fix(tools): Tool name should align with what llm knows#7352
fix(tools): Tool name should align with what llm knows#7352Danelegend merged 16 commits intomainfrom
Conversation
|
|
||
|
|
||
| DEEP_RESEARCH_TOOL = { | ||
| "name": RESEARCH_AGENT_DB_NAME, |
There was a problem hiding this comment.
Not good to leave this as variable since if code changes (as it does here), then the migration changes. Hardcoding this value.
Greptile OverviewGreptile SummaryWhat This PR FixesThis PR addresses a critical tool name inconsistency between what the LLM is instructed to use and what's stored in the database. The Problem:
The Solution:
VerificationI verified all NAME constants in tool implementations match the migration mappings:
Minor Issue FoundOne style comment on the downgrade migration comment wording - purely cosmetic, doesn't affect functionality. Confidence Score: 4/5
Important Files ChangedFile Analysis
Sequence DiagramsequenceDiagram
participant LLM as LLM
participant ToolImpl as Tool Implementation<br/>(e.g., SearchTool.NAME)
participant DB as Database<br/>(tool.name)
participant ChatUtils as chat_utils.py<br/>(tool_id_to_name_map)
participant History as Message History
Note over ToolImpl: NAME = "internal_search"
Note over DB: BEFORE: name = "SearchTool"<br/>AFTER: name = "internal_search"
rect rgb(240, 240, 255)
Note over DB,ChatUtils: Tool History Reconstruction Flow
ChatUtils->>DB: get_tools(db_session)
DB-->>ChatUtils: List of tools
ChatUtils->>ChatUtils: Build tool_id_to_name_map<br/>{tool.id: tool.name}
ChatUtils->>History: Reconstruct tool calls<br/>using tool_name from map
History->>LLM: Send tool call history<br/>{"function_name": tool.name}
end
rect rgb(255, 240, 240)
Note over LLM,ToolImpl: Before Migration (BROKEN)
LLM->>History: Sees "SearchTool" in history
Note over LLM: Confused! I only know<br/>"internal_search"
end
rect rgb(240, 255, 240)
Note over LLM,ToolImpl: After Migration (FIXED)
LLM->>History: Sees "internal_search" in history
Note over LLM: Perfect! Matches what I know
end
|
| DEEP_RESEARCH_TOOL = { | ||
| "name": RESEARCH_AGENT_DB_NAME, | ||
| "name": "ResearchAgent", |
There was a problem hiding this comment.
There's an inconsistency between this migration and the new 19a896a66d3f migration:
- This migration (potentially already run in production) inserts the tool with
name: "ResearchAgent" - The new migration
19a896a66d3fthen changes it to"research_agent" - But this hardcoded value will persist if someone re-runs or reviews this migration
While the hardcoding is fine (removes the import dependency), the value should match what the final state will be after 19a896a66d3f runs. Consider changing this to:
| DEEP_RESEARCH_TOOL = { | |
| "name": RESEARCH_AGENT_DB_NAME, | |
| "name": "ResearchAgent", | |
| "name": "research_agent", |
However, be aware that modifying an already-run migration can cause issues. If this migration has already run in production, you may want to leave it as-is and let 19a896a66d3f handle the update. But if databases exist where this hasn't run yet, they'll get the wrong initial value.
Prompt To Fix With AI
This is a comment left during a code review.
Path: backend/alembic/versions/c1d2e3f4a5b6_add_deep_research_tool.py
Line: 20:21
Comment:
There's an inconsistency between this migration and the new `19a896a66d3f` migration:
1. This migration (potentially already run in production) inserts the tool with `name: "ResearchAgent"`
2. The new migration `19a896a66d3f` then changes it to `"research_agent"`
3. But this hardcoded value will persist if someone re-runs or reviews this migration
While the hardcoding is fine (removes the import dependency), the value should match what the final state will be after `19a896a66d3f` runs. Consider changing this to:
```suggestion
"name": "research_agent",
```
However, be aware that modifying an already-run migration can cause issues. If this migration has already run in production, you may want to leave it as-is and let `19a896a66d3f` handle the update. But if databases exist where this hasn't run yet, they'll get the wrong initial value.
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
No, we want to keep it consistent with what the name was at that point in time.
There was a problem hiding this comment.
6 issues found across 80 files (changes from recent commits).
Note: This PR contains a large number of files. cubic only reviews up to 75 files per PR, so some files may not have been reviewed.
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="backend/onyx/db/release_notes.py">
<violation number="1">
P1: URL fragment should come after query parameters. The current construction `#{version_anchor}?{urlencode(utm_params)}` places the fragment before the query string, which means the UTM parameters will be treated as part of the fragment and never sent to the server. This breaks all UTM tracking/analytics.</violation>
</file>
<file name="backend/onyx/server/features/notifications/api.py">
<violation number="1">
P1: Docstring says "Get all undismissed notifications" but the code now uses `include_dismissed=True`, which returns dismissed notifications as well. Either the docstring needs to be updated to match the new behavior, or this change is incorrect.</violation>
</file>
<file name="web/src/layouts/actions-layouts.tsx">
<violation number="1">
P2: The `Section` component defaults to `alignItems="center"`, but the original `div` had `align-items: stretch` (CSS default). This may cause children without explicit widths to be centered instead of stretched. Consider adding `alignItems="stretch"` to preserve the original behavior.</violation>
</file>
<file name="backend/alembic/versions/8405ca81cc83_notifications_constraint.py">
<violation number="1">
P2: Comment/code mismatch: Comment mentions cleaning up 'reindex' notifications but the DELETE filters on `title = 'New Notification'`. Please update either the comment to match the actual deletion criteria, or fix the WHERE clause if 'reindex' notifications have a different title.</violation>
</file>
<file name="backend/onyx/server/query_and_chat/session_loading.py">
<violation number="1">
P0: `RESEARCH_AGENT_DB_NAME` is not imported or defined anywhere in the codebase. This will cause a `NameError` at runtime. Based on the existing import and usage on line 460, this should be `RESEARCH_AGENT_IN_CODE_ID`.</violation>
</file>
<file name="web/src/components/dateRangeSelectors/AdminDateRangeSelector.tsx">
<violation number="1">
P2: Restoring the `className="w-auto p-0"` on the new `Popover.Content` is necessary to match the previous calendar layout; otherwise the refresh Popover defaults add padding and change width, causing the calendar/preset list to be misaligned.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
evan-onyx
left a comment
There was a problem hiding this comment.
how does get_built_in_tool_by_id work now? I'm surprised that tool calling with internal tools still works after the migration
|
|
||
| # Mapping of in_code_tool_id to the NAME constant from each tool class | ||
| # These are the currently seeded tool names | ||
| CURRENT_TOOL_NAME_MAPPING = { |
There was a problem hiding this comment.
the keys here look wrong, should be i.e. internal_search
There was a problem hiding this comment.
This is a mapping in_code_id -> name for how it currently looks (for the downgrade).
Basically this is just how it currently looks now, so that we can downgrade appropriately
get_built_in_tool_by_id takes in in_code_tool_id. This hasn't been changed. We're only updating the tool.name value |
|
@greptile |
| # Reverse the migration by setting name back to in_code_tool_id | ||
| # This matches the original pattern where name was the class name |
There was a problem hiding this comment.
The downgrade comment says "setting name back to in_code_tool_id" but the code actually sets it to current_name from CURRENT_TOOL_NAME_MAPPING, which happens to equal the in_code_tool_id. This is correct behavior, but the comment is slightly misleading.
Consider clarifying the comment to be more explicit:
| # Reverse the migration by setting name back to in_code_tool_id | |
| # This matches the original pattern where name was the class name | |
| # Reverse the migration by setting name back to the class name (which matches in_code_tool_id) | |
| # This restores the original pattern where tool.name was the same as the tool class name |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: backend/alembic/versions/d25168c2beee_tool_name_consistency.py
Line: 72:73
Comment:
The downgrade comment says "setting name back to in_code_tool_id" but the code actually sets it to `current_name` from `CURRENT_TOOL_NAME_MAPPING`, which happens to equal the in_code_tool_id. This is correct behavior, but the comment is slightly misleading.
Consider clarifying the comment to be more explicit:
```suggestion
# Reverse the migration by setting name back to the class name (which matches in_code_tool_id)
# This restores the original pattern where tool.name was the same as the tool class name
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
3 issues found across 3 files (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="backend/tests/integration/tests/migrations/test_tool_seeding.py">
<violation number="1" location="backend/tests/integration/tests/migrations/test_tool_seeding.py:100">
P2: The `in_code_tool_id` field is defined in `ToolSeedingExpectedResult` and populated in `EXPECTED_TOOLS` but is never validated in `validate_tool`. Consider adding an assertion to verify `tool[4] == expected.in_code_tool_id`.</violation>
</file>
<file name="web/tests/e2e/utils/tools.ts">
<violation number="1" location="web/tests/e2e/utils/tools.ts:9">
P2: `searchOption` selector uses a misspelled tool name (`intenral_search`), so tests can no longer find the search tool option.</violation>
</file>
<file name="backend/tests/integration/tests/personas/test_unified_assistant.py">
<violation number="1" location="backend/tests/integration/tests/personas/test_unified_assistant.py:41">
P3: Align the assertion failure message with the tool name actually being validated so test failures mention the correct tool.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
There was a problem hiding this comment.
1 issue found across 16 files (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="backend/onyx/tools/tool_implementations/web_search/web_search_tool.py">
<violation number="1">
P2: Dead code: This `if not all_search_results:` branch is unreachable because the same condition already raises a `ToolCallException` earlier in the method (around line 248). If `all_search_results` is empty, the function will have already thrown an exception before reaching this code.
Either remove this dead branch, or if the intent is to handle empty results gracefully without throwing, remove the earlier exception.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
Description
When reconstructing the chat history for the llm, the provided tool name is pulled from the database. Meanwhile, the llm is told the tool name from from a prompt. Currently there is a misalignment between the tool name in the database and the tool name in the prompt.
This migration ensures that tool.name is aligned with what we expect the llm to see. Although this doesn't solve the fundamental problem of needing to align database + prompt name, it solves the immediate problem. A larger followup PR can come to refactor this logic.
How Has This Been Tested?
Run migration and ensure that the database presents us with correct tools.
Validate on braintrust that tool history is consistent with the action that it took.
Additional Options
closes https://linear.app/onyx-app/issue/ENG-3360/tool-inconsistency
Summary by cubic
Aligns tool names in the database with the names the LLM is instructed to use, fixing mismatched tool calls and history reconstruction. Addresses ENG-3360 (tool inconsistency).
Migration
Code Updates
Written for commit e38f2c2. Summary will update on new commits.