
fix(tests): use crawler-friendly search query in Exa integration test #7746

Merged
yuhongsun96 merged 1 commit into main from nikg/fix-flaky-exa-test on Jan 24, 2026

nmgarza5 (Contributor) commented Jan 24, 2026

Description

Fixes the flaky test_web_search_endpoints_with_exa integration test. The previous search query, "latest ai research news", returned URLs from news sites (e.g., chemistryworld.com) that block web crawlers with bot protection, so the test failed whenever such a URL appeared in the results.

Changed to "wikipedia python programming" which returns Wikipedia URLs that are reliably crawlable without bot protection.
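
As an illustration of the idea behind the change, here is a minimal sketch of a guard that keeps only URLs from hosts expected to tolerate crawlers. This is not part of the PR, which only changes the query string; `is_crawler_friendly`, `pick_crawlable`, and the allowlist are hypothetical names introduced for this example.

```python
from urllib.parse import urlparse

# Hypothetical allowlist of host suffixes assumed to allow crawling.
# The PR relies on the query returning Wikipedia pages; it adds no such list.
CRAWLER_FRIENDLY_SUFFIXES = ("wikipedia.org",)


def is_crawler_friendly(url: str) -> bool:
    """Return True if the URL's host is an allowlisted domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(
        host == suffix or host.endswith("." + suffix)
        for suffix in CRAWLER_FRIENDLY_SUFFIXES
    )


def pick_crawlable(urls: list[str], limit: int = 2) -> list[str]:
    """Keep at most `limit` URLs whose hosts are on the allowlist, in order."""
    return [u for u in urls if is_crawler_friendly(u)][:limit]
```

A test could call `pick_crawlable` on the search results before fetching, so that a stray bot-protected result does not fail the run.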

How Has This Been Tested?

The test was failing consistently on PR #7745 due to chemistryworld.com blocking the crawler. Wikipedia URLs should be reliably fetchable.

Additional Options

  • [Optional] Override Linear Check

Summary by cubic

Fixes flaky Exa integration test by using a crawler-friendly query that returns Wikipedia pages. Replaced "latest ai research news" with "wikipedia python programming" to avoid bot-protected news sites.

Written for commit fa233cb. Summary will update on new commits.

The previous query "latest ai research news" returned URLs from news
sites that block web crawlers with bot protection, causing flaky test
failures. Changed to "wikipedia python programming" which returns
Wikipedia URLs that are reliably crawlable.

nmgarza5 requested a review from a team as a code owner January 24, 2026 20:18

greptile-apps bot (Contributor) commented Jan 24, 2026

Greptile Overview

Greptile Summary

Fixes flaky integration test by replacing a search query that returned bot-protected news URLs with one that returns crawler-friendly Wikipedia URLs.

  • Changed query from "latest ai research news" to "wikipedia python programming" in test_web_search_endpoints_with_exa
  • Addresses test failures caused by chemistryworld.com and similar sites blocking web crawlers
  • Wikipedia URLs are reliably crawlable without bot protection, making the test deterministic

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The change is a simple one-line modification to a test query string that addresses a specific, well-documented issue. The new query maintains test validity while improving reliability by using crawler-friendly Wikipedia URLs instead of bot-protected news sites.
  • No files require special attention

Important Files Changed

| Filename | Overview |
| --- | --- |
| backend/tests/integration/tests/web_search/test_web_search_api.py | Changed the search query from one that returned bot-protected news sites to one that returns crawler-friendly Wikipedia URLs, fixing the flaky test |

Sequence Diagram

sequenceDiagram
    participant Test as test_web_search_endpoints_with_exa
    participant ExaAPI as Exa Search Provider
    participant SearchLite as /web-search/search-lite
    participant Crawler as Onyx Web Crawler
    participant OpenURLs as /web-search/open-urls
    participant Search as /web-search/search
    participant Wikipedia as Wikipedia URLs

    Test->>ExaAPI: Activate Exa provider
    ExaAPI-->>Test: Provider ID
    
    Test->>SearchLite: POST {"queries": ["wikipedia python programming"], "max_results": 3}
    SearchLite->>ExaAPI: Search query
    ExaAPI->>Wikipedia: Find matching pages
    Wikipedia-->>ExaAPI: Return Wikipedia URLs
    ExaAPI-->>SearchLite: Search results with URLs
    SearchLite-->>Test: Exa search results
    
    Test->>Test: Extract URLs from results
    Test->>OpenURLs: POST {"urls": [url1, url2]}
    OpenURLs->>Crawler: Fetch content from URLs
    Crawler->>Wikipedia: GET url1
    Wikipedia-->>Crawler: Page content (bot-friendly)
    Crawler->>Wikipedia: GET url2
    Wikipedia-->>Crawler: Page content (bot-friendly)
    Crawler-->>OpenURLs: Crawled content
    OpenURLs-->>Test: Results with content
    
    Test->>Search: POST search request (combined)
    Search->>ExaAPI: Search query
    ExaAPI->>Wikipedia: Find matching pages
    Wikipedia-->>ExaAPI: Return Wikipedia URLs
    ExaAPI-->>Search: Search results
    Search->>Crawler: Fetch content from URLs
    Crawler->>Wikipedia: GET URLs
    Wikipedia-->>Crawler: Page content
    Crawler-->>Search: Full content
    Search-->>Test: Combined search + content results
    
    Test->>Test: Assert all responses valid
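
The first two steps of the diagram above can be sketched as a small function. This is a hedged illustration, not the actual test: the endpoint paths and request payloads are taken from the diagram, while `run_exa_flow`, the injected `post` helper, and the response shape (`{"results": [{"url": ...}]}`) are assumptions made for this example. The combined `/web-search/search` call is omitted because its payload is not shown.

```python
from typing import Any, Callable

# Hypothetical HTTP helper: (path, json_body) -> decoded JSON response.
PostFn = Callable[[str, dict[str, Any]], dict[str, Any]]


def run_exa_flow(
    post: PostFn, query: str = "wikipedia python programming"
) -> dict[str, Any]:
    """Search with the lite endpoint, then fetch content for the first two URLs."""
    # Step 1: search only (payload as shown in the diagram).
    search = post("/web-search/search-lite", {"queries": [query], "max_results": 3})

    # Assumed response shape: {"results": [{"url": "..."}, ...]}.
    urls = [r["url"] for r in search.get("results", []) if r.get("url")][:2]
    if not urls:
        raise ValueError("search returned no URLs to open")

    # Step 2: ask the crawler to fetch the selected URLs.
    opened = post("/web-search/open-urls", {"urls": urls})
    return {"urls": urls, "opened": opened}
```

Passing `post` in as a parameter keeps the sketch independent of any particular HTTP client, so it can be exercised with a stub instead of a live server.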

cubic-dev-ai bot (Contributor) left a comment

No issues found across 1 file

yuhongsun96 added this pull request to the merge queue Jan 24, 2026
Merged via the queue into main with commit eb7b91e Jan 24, 2026
78 checks passed
yuhongsun96 deleted the nikg/fix-flaky-exa-test branch January 24, 2026 21:02
wenxi-onyx mentioned this pull request Jan 26, 2026 (1 task)

2 participants