perf(open-url): parallelize URL fetching with split connect/read timeouts (#8580) to release v2.12 #8908

Merged
Subash-Mohan merged 1 commit into release/v2.12 from hotfix/29958f1a-v2.12 on Mar 2, 2026
@Subash-Mohan Subash-Mohan commented Mar 2, 2026

Cherry-pick of commit 29958f1 to release/v2.12 branch.

Original PR: #8580

  • [Optional] Override Linear Check

Summary by cubic

Parallelize URL fetching in OnyxWebCrawler and add split connect/read timeouts to reduce latency and prevent long hangs. Hardens batch behavior so one bad URL doesn’t affect others.

  • New Features
    • Fetch URLs concurrently with ThreadPoolExecutor (up to 5 workers).
    • Use separate connect/read timeouts (default 5s connect, 15s read) and pass them to ssrf_safe_get; URL utils now accept tuple timeouts.
    • Isolate per-URL failures and HTTP errors; return a safe “failed” result instead of aborting the batch.
    • Added unit tests for parallelism, failure isolation, and tuple timeouts.

Written for commit 22896e4. Summary will update on new commits.
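The concurrency and failure-isolation behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `fetch_all`, `failed_result`, the result dictionary shape, and the injected `fetch_one` callable are all hypothetical names chosen for the example. Only the use of `ThreadPoolExecutor` and the cap of 5 workers come from the summary.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

MAX_WORKERS = 5  # the summary mentions "up to 5 workers"


def failed_result(url: str) -> Dict[str, object]:
    """Hypothetical placeholder returned when a URL cannot be fetched."""
    return {"url": url, "ok": False, "content": ""}


def fetch_all(
    urls: List[str], fetch_one: Callable[[str], str]
) -> List[Dict[str, object]]:
    """Fetch URLs concurrently; one failing URL never aborts the batch."""

    def safe_fetch(url: str) -> Dict[str, object]:
        # Catch everything per URL so an exception stays local to that URL.
        try:
            return {"url": url, "ok": True, "content": fetch_one(url)}
        except Exception:
            return failed_result(url)

    if not urls:
        return []

    # Never start more threads than there are URLs to fetch.
    with ThreadPoolExecutor(max_workers=min(MAX_WORKERS, len(urls))) as pool:
        # Executor.map yields results in the same order as the input URLs.
        return list(pool.map(safe_fetch, urls))
```

Injecting `fetch_one` keeps the sketch independent of any HTTP library. In real code it would wrap whatever performs the request, and the per-URL `try`/`except` is what turns an exception into a "failed" result for that URL alone.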

@Subash-Mohan Subash-Mohan requested a review from a team as a code owner March 2, 2026 03:41
@Subash-Mohan Subash-Mohan added the cherry-pick 🍒 label (Tags the PR to ensure that these are cherry-pick PRs) Mar 2, 2026
@cubic-dev-ai cubic-dev-ai bot left a comment

No issues found across 4 files

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.

greptile-apps bot commented Mar 2, 2026

Greptile Summary

Cherry-picks performance and reliability improvements to URL fetching from #8580 to the release/v2.12 branch.

Key Changes:

  • Parallelized URL fetching using ThreadPoolExecutor with max 5 concurrent workers
  • Split single timeout into separate connect (5s) and read (15s) timeouts for better network control
  • Added error isolation wrapper so one failing URL doesn't abort the entire batch
  • Extracted _failed_result helper to eliminate code duplication
  • Comprehensive test coverage for parallel execution, failure isolation, and timeout handling

The changes are well-isolated, backward-compatible, and improve both performance (parallel fetching) and reliability (isolated failures).
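To illustrate the split-timeout change, here is a small sketch of how a helper might accept either a single number or a (connect, read) tuple. The function name `split_timeout` and the constant names are assumptions made for this example, not identifiers from the PR. The 5s/15s defaults are taken from the summary above. The `requests` library accepts `timeout` in either form, applying a single number to both phases. Whether `ssrf_safe_get` forwards the value to `requests` unchanged is also an assumption.

```python
from typing import Optional, Tuple, Union

# A timeout is either one number or a (connect, read) pair, in seconds.
Timeout = Union[float, Tuple[float, float]]

DEFAULT_CONNECT_TIMEOUT = 5.0  # seconds, default from the PR summary
DEFAULT_READ_TIMEOUT = 15.0    # seconds, default from the PR summary


def split_timeout(timeout: Optional[Timeout] = None) -> Tuple[float, float]:
    """Normalize a timeout into an explicit (connect, read) tuple."""
    if timeout is None:
        return (DEFAULT_CONNECT_TIMEOUT, DEFAULT_READ_TIMEOUT)
    if isinstance(timeout, tuple):
        connect, read = timeout
        return (float(connect), float(read))
    # A single number applies to both the connect and the read phase.
    return (float(timeout), float(timeout))
```

A short connect timeout fails fast on unreachable hosts, while the longer read timeout still allows slow but responsive servers to finish sending the page.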

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • Clean cherry-pick with comprehensive test coverage (parallel execution, failure isolation, timeout handling); the changes are well isolated and improve performance and reliability without breaking backward compatibility
  • No files require special attention

Important Files Changed

| Filename | Overview |
| --- | --- |
| backend/onyx/tools/tool_implementations/open_url/onyx_web_crawler.py | Parallelized URL fetching with ThreadPoolExecutor, split timeouts into connect/read, added error isolation wrapper |
| backend/onyx/utils/url.py | Updated timeout parameter to accept tuple format for split connect/read timeouts |
| backend/tests/unit/onyx/tools/tool_implementations/open_url/test_onyx_web_crawler.py | Added comprehensive tests for parallel execution, failure isolation, and tuple timeout handling |
| backend/tests/unit/onyx/utils/test_url_ssrf.py | Updated test to verify tuple timeout format is passed through correctly |

Last reviewed commit: 22896e4

@Subash-Mohan Subash-Mohan merged commit 7589767 into release/v2.12 Mar 2, 2026
60 checks passed
@Subash-Mohan Subash-Mohan deleted the hotfix/29958f1a-v2.12 branch March 2, 2026 04:41
