fix(web search): removing site: operator from exa query by jessicasingh7 · Pull Request #7248 · onyx-dot-app/onyx

jessicasingh7 · 2026-01-07T02:21:00Z

Description

ENG-3276

cubic-dev-ai

1 issue found across 2 files

Prompt for AI agents (all issues)


Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="backend/onyx/tools/tool_implementations/web_search/web_search_tool.py">

<violation number="1" location="backend/onyx/tools/tool_implementations/web_search/web_search_tool.py:148">
P1: Regex inconsistency: extraction allows space after `site:` but removal doesn't. If user writes `site: example.com`, the domain will be extracted but `site: example.com` won't be removed from the query.</violation>
</file>

cubic-dev-ai · 2026-01-07T02:23:38Z

backend/onyx/tools/tool_implementations/web_search/web_search_tool.py

+            cleaned_query = re.sub(
+                r"site:\S+\s*", "", query, flags=re.IGNORECASE
+            ).strip()


P1: Regex inconsistency: extraction allows space after `site:` but removal doesn't. If user writes `site: example.com`, the domain will be extracted but `site: example.com` won't be removed from the query.

Prompt for AI agents

Check if this issue is valid. If so, understand the root cause and fix it. At backend/onyx/tools/tool_implementations/web_search/web_search_tool.py, line 148:

<comment>Regex inconsistency: extraction allows space after `site:` but removal doesn't. If user writes `site: example.com`, the domain will be extracted but `site: example.com` won't be removed from the query.</comment>

<file context>
@@ -118,12 +122,62 @@ def emit_start(self, turn_index: int) -> None:
+            site_domains = re.findall(r"site:\s*([^\s]+)", query, re.IGNORECASE)
+
+            # Remove site: operator for Exa
+            cleaned_query = re.sub(
+                r"site:\S+\s*", "", query, flags=re.IGNORECASE
+            ).strip()
</file context>

Suggested change

- cleaned_query = re.sub(
-     r"site:\S+\s*", "", query, flags=re.IGNORECASE
- ).strip()
+ cleaned_query = re.sub(
+     r"site:\s*\S+\s*", "", query, flags=re.IGNORECASE
+ ).strip()

✅ Addressed in d520fd2
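
The mismatch is easy to reproduce in isolation. The sketch below runs the extraction pattern and both removal patterns quoted above against one sample query; the sample query and variable names are only illustrative.

```python
import re

query = "site: example.com python"

# Extraction pattern from the PR: tolerates whitespace after "site:".
domains = re.findall(r"site:\s*([^\s]+)", query, re.IGNORECASE)

# Original removal pattern: needs a non-space character right after "site:".
old_cleaned = re.sub(r"site:\S+\s*", "", query, flags=re.IGNORECASE).strip()

# Suggested removal pattern: tolerates whitespace after "site:" as well.
new_cleaned = re.sub(r"site:\s*\S+\s*", "", query, flags=re.IGNORECASE).strip()

print(domains)      # domain is extracted either way
print(old_cleaned)  # operator is left in the query
print(new_cleaned)  # operator is removed
```

With the original removal pattern the domain is extracted but the operator stays in the query text, which is the inconsistency the comment describes.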

greptile-apps · 2026-01-07T02:24:52Z

Greptile Summary

Fixes Exa web search by converting site: operators into Exa's native include_domains parameter instead of passing them in the query string. Exa doesn't support the site: syntax, so searches that contained it failed. The fix extracts the domains from site: operators, removes the operators from the query, and passes the domains via Exa's API parameter. It also fixes the regex mismatch flagged in the previous review: extraction used site:\s*([^\s]+) while removal used site:\S+\s*, so removal now uses site:\s*\S+\s* and accepts the same optional whitespace after site:.

Key changes:

Added _transform_queries_for_provider() to extract domains and clean queries for Exa
Modified _execute_single_search() to accept include_domains parameter with fallback logic
Updated ExaClient.search() to support include_domains parameter
Improved error handling to return structured message when no results found instead of raising exception
Added SectionEnd emission for proper streaming completion signal
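
As a rough illustration of the first key change, the sketch below extracts domains and cleans each query, returning the cleaned queries together with a query-to-domains map. It is not the PR's implementation: the function name `transform_queries` is hypothetical, and the handling of a query that consists only of site: operators (reusing the domains as the query text) is an assumption made for the example.

```python
import re

SITE_PATTERN = re.compile(r"site:\s*(\S+)\s*", re.IGNORECASE)


def transform_queries(queries: list[str]) -> tuple[list[str], dict[str, list[str]]]:
    """Strip site: operators and remember which domains each cleaned query wants."""
    cleaned_queries: list[str] = []
    query_domains_map: dict[str, list[str]] = {}

    for query in queries:
        domains = SITE_PATTERN.findall(query)
        cleaned = SITE_PATTERN.sub("", query).strip()

        # Assumption: if nothing is left after removal, search for the domains themselves.
        if not cleaned and domains:
            cleaned = " ".join(domains)

        if domains:
            query_domains_map[cleaned] = domains
        cleaned_queries.append(cleaned)

    return cleaned_queries, query_domains_map


cleaned, domain_map = transform_queries(["site:example.com python", "rust"])
```

Keying the map by the cleaned query mirrors the `Map: {"python": ["example.com"]}` note in the sequence diagram.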

Confidence Score: 5/5

Safe to merge - addresses reported bug with clean implementation
The PR correctly fixes the Exa site: operator issue by using Exa's native API parameter, fixes the previously reported regex mismatch, includes proper fallback logic, and improves error handling. The changes are focused, well-structured, and follow good practices.
No files require special attention

Important Files Changed

Filename	Overview
backend/onyx/tools/tool_implementations/web_search/web_search_tool.py	Fixed regex pattern mismatch for `site:` operator extraction/removal. Added Exa-specific query transformation with domain extraction and fallback search logic. Improved error handling for empty search results.
backend/onyx/tools/tool_implementations/web_search/clients/exa_client.py	Added `include_domains` parameter to `search()` method to support Exa's native domain filtering API.

Sequence Diagram

sequenceDiagram
    participant User
    participant WebSearchTool
    participant ExaClient
    participant ExaAPI

    User->>WebSearchTool: run(queries=["site:example.com python"])
    WebSearchTool->>WebSearchTool: _transform_queries_for_provider()
    Note over WebSearchTool: Extract domains: ["example.com"]<br/>Clean query: "python"<br/>Map: {"python": ["example.com"]}
    WebSearchTool->>WebSearchTool: emit SearchToolQueriesDelta
    WebSearchTool->>WebSearchTool: _execute_single_search(query="python", include_domains=["example.com"])
    alt include_domains provided
        WebSearchTool->>ExaClient: search(query="python", include_domains=["example.com"])
        ExaClient->>ExaAPI: search_and_contents(query="python", include_domains=["example.com"])
        ExaAPI-->>ExaClient: results
        ExaClient-->>WebSearchTool: WebSearchResult[]
        alt results found
            WebSearchTool->>WebSearchTool: return results
        else no results
            WebSearchTool->>ExaClient: search(query="python", include_domains=None)
            Note over WebSearchTool: Fallback without domain restriction
            ExaClient->>ExaAPI: search_and_contents(query="python", include_domains=None)
            ExaAPI-->>ExaClient: results
            ExaClient-->>WebSearchTool: WebSearchResult[]
        end
    else no include_domains
        WebSearchTool->>ExaClient: search(query="python", include_domains=None)
        ExaClient->>ExaAPI: search_and_contents(query="python", include_domains=None)
        ExaAPI-->>ExaClient: results
        ExaClient-->>WebSearchTool: WebSearchResult[]
    end
    WebSearchTool->>WebSearchTool: emit SearchToolDocumentsDelta
    WebSearchTool->>WebSearchTool: emit SectionEnd
    WebSearchTool->>User: ToolResponse
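
The fallback branch in the diagram can be sketched as a small helper. This is a minimal sketch under stated assumptions: `search_with_domain_fallback`, `SearchFn`, and `fake_search` are hypothetical names, and the real provider call returns `WebSearchResult` objects rather than strings.

```python
from typing import Callable, Optional

# Hypothetical shape of a provider search call: (query, include_domains) -> results.
SearchFn = Callable[[str, Optional[list[str]]], list[str]]


def search_with_domain_fallback(
    search: SearchFn, query: str, include_domains: Optional[list[str]] = None
) -> list[str]:
    """Try the domain-restricted search first, then retry without the restriction."""
    if include_domains:
        results = search(query, include_domains)
        if results:
            return results
    # Either no domains were requested or the restricted search came back empty.
    return search(query, None)


def fake_search(query: str, include_domains: Optional[list[str]]) -> list[str]:
    # Stand-in provider: pretends the restricted search finds nothing.
    return [] if include_domains else [f"result for {query}"]


results = search_with_domain_fallback(fake_search, "python", ["example.com"])
```

One consequence of this design is that a query with a site: restriction can return results from other domains when the restricted search is empty.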

greptile-apps

Additional Comments (1)

backend/onyx/tools/tool_implementations/web_search/clients/exa_client.py, line 36 (link)

style: ternary is redundant: include_domains already defaults to None, and empty list [] would also be falsy. This line effectively just converts [] to None.

_{2 files reviewed, 2 comments}

backend/onyx/tools/tool_implementations/web_search/web_search_tool.py

jessicasingh7 · 2026-01-07T20:52:15Z

@greptile

greptile-apps · 2026-01-07T20:56:51Z

Greptile's behavior is changing!

From now on, if a review finishes with no comments, we will not post an additional "statistics" comment to confirm that our review found nothing to comment on. However, you can confirm that we reviewed your changes in the status check section.

_{This feature can be toggled off in your Code Review Settings by deselecting "Create a status check for each PR".}

backend/onyx/tools/tool_implementations/web_search/clients/exa_client.py

evan-onyx · 2026-01-07T21:27:26Z

backend/onyx/tools/tool_implementations/web_search/web_search_tool.py


+    def _transform_queries_for_provider(
+        self, queries: list[str]
+    ) -> tuple[list[str], dict[str, list[str]]]:


to me this type is a lil too complex, could you make it a BaseModel?

Or maybe define type names, i.e. QueryDomainMap = dict[str, list[str]]. Might make it a bit more readable
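
Both suggestions could look roughly like the sketch below. `QueryDomainMap` is the alias proposed in the comment; `TransformedQueries` is a hypothetical name, and a standard-library dataclass stands in for the pydantic `BaseModel` the comment mentions so the example has no third-party dependency.

```python
from dataclasses import dataclass, field

# Option 1: a named alias for the nested type.
QueryDomainMap = dict[str, list[str]]


# Option 2: a small result object instead of a bare tuple.
# (A dataclass is used here as a stand-in for a pydantic BaseModel.)
@dataclass
class TransformedQueries:
    cleaned_queries: list[str]
    query_domains_map: QueryDomainMap = field(default_factory=dict)


example = TransformedQueries(
    cleaned_queries=["python"],
    query_domains_map={"python": ["example.com"]},
)
```

With the second option, callers read `example.query_domains_map` by name instead of unpacking a tuple by position.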

backend/onyx/tools/tool_implementations/web_search/web_search_tool.py

evan-onyx · 2026-01-07T21:32:16Z

backend/onyx/tools/tool_implementations/web_search/web_search_tool.py

+            cleaned_query = re.sub(
+                r"site:\s*\S+\s*", "", query, flags=re.IGNORECASE
+            ).strip()
+            if not cleaned_query and site_domains:


would be good to have a comment here to explain why this happens/ why we do this

evan-onyx · 2026-01-07T21:32:48Z

backend/onyx/tools/tool_implementations/web_search/web_search_tool.py

+
+            cleaned_queries.append(cleaned_query)
+
+        return cleaned_queries if cleaned_queries else queries, query_domains_map


nit: cleaned_queries or queries, query_domains_map
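
For reference, the two spellings behave the same way here, because both the conditional expression and `or` bind more tightly than the comma that builds the returned tuple. The function names below are hypothetical wrappers around the two return statements.

```python
def with_ternary(cleaned_queries, queries, query_domains_map):
    # Parsed as: ((cleaned_queries if cleaned_queries else queries), query_domains_map)
    return cleaned_queries if cleaned_queries else queries, query_domains_map


def with_or(cleaned_queries, queries, query_domains_map):
    # Parsed as: ((cleaned_queries or queries), query_domains_map)
    return cleaned_queries or queries, query_domains_map


original = ["site:example.com python"]
domain_map = {"python": ["example.com"]}
```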

backend/onyx/tools/tool_implementations/web_search/web_search_tool.py

evan-onyx · 2026-01-07T21:49:11Z

backend/onyx/tools/tool_implementations/web_search/web_search_tool.py

                if not added_any:
                    break

-        if not all_search_results:


imo we probably should propagate this error up?

Seems like you address this in your PR @yuhongsun96 ?

yuhongsun96 · 2026-01-07T21:23:21Z

backend/onyx/tools/tool_implementations/web_search/web_search_tool.py

+        include_domains: list[str] | None = None,
    ) -> list[WebSearchResult]:
        """Execute a single search query and return results."""
+        if include_domains:


why would include domains not work?

backend/onyx/tools/tool_implementations/web_search/web_search_tool.py

Danelegend · 2026-01-09T03:31:50Z

backend/onyx/tools/tool_implementations/web_search/web_search_tool.py

+        """
+        query_domains_map: dict[str, list[str]] = {}
+
+        if not isinstance(self._provider, ExaClient):


If we're checking the instance, might be worth making an abstract method (default to nothing) and override in ExaClient. Could be a bit cleaner.

But also, that could just be more work so ceebs
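
A minimal sketch of that alternative, assuming a shared provider base class exists: the base class passes queries through untouched, and only the Exa-style provider overrides the hook. `WebSearchProvider`, `ExaLikeProvider`, `transform_query`, and `prepare` are hypothetical names, not the repository's actual classes.

```python
import re


class WebSearchProvider:
    """Hypothetical base class; by default a query is passed through untouched."""

    def transform_query(self, query: str) -> tuple[str, list[str]]:
        return query, []


class ExaLikeProvider(WebSearchProvider):
    """Hypothetical provider that cannot handle site: inside the query text."""

    _SITE = re.compile(r"site:\s*(\S+)\s*", re.IGNORECASE)

    def transform_query(self, query: str) -> tuple[str, list[str]]:
        domains = self._SITE.findall(query)
        cleaned = self._SITE.sub("", query).strip()
        return cleaned, domains


def prepare(provider: WebSearchProvider, query: str) -> tuple[str, list[str]]:
    # The caller no longer needs an isinstance check on the provider.
    return provider.transform_query(query)
```

The trade-off is the one the comment names: the tool code gets simpler, at the cost of touching the provider interface.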

Danelegend

Got handed reviewing duty for this. Looks functionally good but couple style things that could change. If we wanna get it in quick tho, here's a tick

yuhong handed me to wrong pr to review haha.

evan-onyx

mostly looks good, one more issue to address

backend/onyx/tools/tool_implementations/web_search/web_search_tool.py

jessicasingh7 requested a review from a team as a code owner January 7, 2026 02:21

cubic-dev-ai bot reviewed Jan 7, 2026

greptile-apps bot reviewed Jan 7, 2026

rebase

8b1916b

jessicasingh7 force-pushed the jessica/web-search-eng-3276 branch from a77afc6 to 8b1916b Compare January 7, 2026 17:48

regex

d520fd2

evan-onyx reviewed Jan 7, 2026

yuhongsun96 reviewed Jan 7, 2026

Danelegend reviewed Jan 9, 2026

Danelegend previously approved these changes Jan 9, 2026

jessicasingh7 and others added 3 commits January 9, 2026 09:21

move logic into exa client

3fb8b96

Merge branch 'main' into jessica/web-search-eng-3276

5cfc1ec

merge conflict

eabe497

evan-onyx reviewed Jan 10, 2026

rm packet

99d835d

jessicasingh7 enabled auto-merge January 12, 2026 09:08

evan-onyx approved these changes Jan 12, 2026

jessicasingh7 added this pull request to the merge queue Jan 12, 2026

Merged via the queue into main with commit cd36baa Jan 12, 2026
73 checks passed

jessicasingh7 deleted the jessica/web-search-eng-3276 branch January 12, 2026 18:27

jessicasingh7 added a commit that referenced this pull request Jan 12, 2026

fix(web search): removing site: operator from exa query (#7248)

2ba670c

jessicasingh7 added a commit that referenced this pull request Jan 21, 2026

fix(web search): removing site: operator from exa query (#7248)

d94a01f

