fix(web search): removing site: operator from exa query#7248
fix(web search): removing site: operator from exa query#7248jessicasingh7 merged 6 commits intomainfrom
Conversation
There was a problem hiding this comment.
1 issue found across 2 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="backend/onyx/tools/tool_implementations/web_search/web_search_tool.py">
<violation number="1" location="backend/onyx/tools/tool_implementations/web_search/web_search_tool.py:148">
P1: Regex inconsistency: extraction allows space after `site:` but removal doesn't. If user writes `site: example.com`, the domain will be extracted but `site: example.com` won't be removed from the query.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| cleaned_query = re.sub( | ||
| r"site:\S+\s*", "", query, flags=re.IGNORECASE | ||
| ).strip() |
There was a problem hiding this comment.
P1: Regex inconsistency: extraction allows space after site: but removal doesn't. If user writes site: example.com, the domain will be extracted but site: example.com won't be removed from the query.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/onyx/tools/tool_implementations/web_search/web_search_tool.py, line 148:
<comment>Regex inconsistency: extraction allows space after `site:` but removal doesn't. If user writes `site: example.com`, the domain will be extracted but `site: example.com` won't be removed from the query.</comment>
<file context>
@@ -118,12 +122,62 @@ def emit_start(self, turn_index: int) -> None:
+ site_domains = re.findall(r"site:\s*([^\s]+)", query, re.IGNORECASE)
+
+ # Remove site: operator for Exa
+ cleaned_query = re.sub(
+ r"site:\S+\s*", "", query, flags=re.IGNORECASE
+ ).strip()
</file context>
| cleaned_query = re.sub( | |
| r"site:\S+\s*", "", query, flags=re.IGNORECASE | |
| ).strip() | |
| cleaned_query = re.sub( | |
| r"site:\s*\S+\s*", "", query, flags=re.IGNORECASE | |
| ).strip() |
✅ Addressed in d520fd2
Greptile SummaryFixed Exa web search by converting Key changes:
Confidence Score: 5/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant User
participant WebSearchTool
participant ExaClient
participant ExaAPI
User->>WebSearchTool: run(queries=["site:example.com python"])
WebSearchTool->>WebSearchTool: _transform_queries_for_provider()
Note over WebSearchTool: Extract domains: ["example.com"]<br/>Clean query: "python"<br/>Map: {"python": ["example.com"]}
WebSearchTool->>WebSearchTool: emit SearchToolQueriesDelta
WebSearchTool->>WebSearchTool: _execute_single_search(query="python", include_domains=["example.com"])
alt include_domains provided
WebSearchTool->>ExaClient: search(query="python", include_domains=["example.com"])
ExaClient->>ExaAPI: search_and_contents(query="python", include_domains=["example.com"])
ExaAPI-->>ExaClient: results
ExaClient-->>WebSearchTool: WebSearchResult[]
alt results found
WebSearchTool->>WebSearchTool: return results
else no results
WebSearchTool->>ExaClient: search(query="python", include_domains=None)
Note over WebSearchTool: Fallback without domain restriction
ExaClient->>ExaAPI: search_and_contents(query="python", include_domains=None)
ExaAPI-->>ExaClient: results
ExaClient-->>WebSearchTool: WebSearchResult[]
end
else no include_domains
WebSearchTool->>ExaClient: search(query="python", include_domains=None)
ExaClient->>ExaAPI: search_and_contents(query="python", include_domains=None)
ExaAPI-->>ExaClient: results
ExaClient-->>WebSearchTool: WebSearchResult[]
end
WebSearchTool->>WebSearchTool: emit SearchToolDocumentsDelta
WebSearchTool->>WebSearchTool: emit SectionEnd
WebSearchTool->>User: ToolResponse
|
There was a problem hiding this comment.
Additional Comments (1)
-
backend/onyx/tools/tool_implementations/web_search/clients/exa_client.py, line 36 (link)style: ternary is redundant:
include_domainsalready defaults toNone, and empty list[]would also be falsy. This line effectively just converts[]toNone.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
2 files reviewed, 2 comments
backend/onyx/tools/tool_implementations/web_search/web_search_tool.py
Outdated
Show resolved
Hide resolved
a77afc6 to
8b1916b
Compare
|
@greptile |
Greptile's behavior is changing!From now on, if a review finishes with no comments, we will not post an additional "statistics" comment to confirm that our review found nothing to comment on. However, you can confirm that we reviewed your changes in the status check section. This feature can be toggled off in your Code Review Settings by deselecting "Create a status check for each PR". |
backend/onyx/tools/tool_implementations/web_search/clients/exa_client.py
Outdated
Show resolved
Hide resolved
|
|
||
| def _transform_queries_for_provider( | ||
| self, queries: list[str] | ||
| ) -> tuple[list[str], dict[str, list[str]]]: |
There was a problem hiding this comment.
to me this type is a lil too complex, could you make it a BaseModel?
There was a problem hiding this comment.
Or maybe define type names ie. QueryDomainMap = dict[str, list[str]]. Might make it a bit more readable
backend/onyx/tools/tool_implementations/web_search/web_search_tool.py
Outdated
Show resolved
Hide resolved
| cleaned_query = re.sub( | ||
| r"site:\s*\S+\s*", "", query, flags=re.IGNORECASE | ||
| ).strip() | ||
| if not cleaned_query and site_domains: |
There was a problem hiding this comment.
would be good to have a comment here to explain why this happens/ why we do this
|
|
||
| cleaned_queries.append(cleaned_query) | ||
|
|
||
| return cleaned_queries if cleaned_queries else queries, query_domains_map |
There was a problem hiding this comment.
nit: cleaned_queries or queries, query_domains_map
backend/onyx/tools/tool_implementations/web_search/web_search_tool.py
Outdated
Show resolved
Hide resolved
| if not added_any: | ||
| break | ||
|
|
||
| if not all_search_results: |
There was a problem hiding this comment.
imo we probably should propagate this error up?
There was a problem hiding this comment.
Seems like you address this in your PR @yuhongsun96 ?
| include_domains: list[str] | None = None, | ||
| ) -> list[WebSearchResult]: | ||
| """Execute a single search query and return results.""" | ||
| if include_domains: |
There was a problem hiding this comment.
why would include domains not work?
backend/onyx/tools/tool_implementations/web_search/web_search_tool.py
Outdated
Show resolved
Hide resolved
| """ | ||
| query_domains_map: dict[str, list[str]] = {} | ||
|
|
||
| if not isinstance(self._provider, ExaClient): |
There was a problem hiding this comment.
If we're checking the instance, might be worth making an abstract method (default to nothing) and override in ExaClient. Could be a bit cleaner.
But also, that could just be more work so ceebs
Danelegend
left a comment
There was a problem hiding this comment.
Got handed reviewing duty for this. Looks functionally good but couple style things that could change. If we wanna get it in quick tho, here's a tick
yuhong handed me to wrong pr to review haha.
evan-onyx
left a comment
There was a problem hiding this comment.
mostly looks good, one more issue to address
Description
ENG-3276
Before vs. After

How Has This Been Tested?
Additional Options