feat: Maintain correct docs on replay#7683
Conversation
There was a problem hiding this comment.
1 issue found across 8 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="backend/onyx/tools/fake_tools/research_agent.py">
<violation number="1" location="backend/onyx/tools/fake_tools/research_agent.py:508">
P2: `displayed_docs or search_docs` treats an empty list as falsy and will persist all search docs even when no docs were displayed. Preserve an empty `displayed_docs` by checking for `None` explicitly.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
Greptile OverviewGreptile SummaryThis PR fixes the document selection issue on chat replay (ENG-3135) by separating the concepts of "all fetched documents" from "displayed documents" throughout the chat pipeline. Key Changes:
Impact: Confidence Score: 4/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant SearchTool
participant LLMLoop
participant StateContainer
participant LLMStep
participant SaveChat
participant DB
SearchTool->>SearchTool: Execute search query
SearchTool->>SearchTool: Generate search_docs & final_ui_docs
SearchTool->>LLMLoop: Return SearchDocsResponse<br/>(search_docs, displayed_docs, citation_mapping)
LLMLoop->>StateContainer: add_search_docs(search_docs)<br/>(stores ALL search docs)
LLMLoop->>LLMLoop: Create ToolCallInfo with<br/>displayed_docs or search_docs
LLMLoop->>StateContainer: add_tool_call(tool_call_info)
LLMLoop->>LLMStep: Stream LLM response
LLMStep->>LLMStep: Process citation in answer
LLMStep->>StateContainer: add_emitted_citation(citation_num)<br/>(track citations that appear in text)
LLMStep->>StateContainer: set_citation_mapping(citation_to_doc)
LLMLoop->>SaveChat: save_chat_turn(citation_to_doc,<br/>all_search_docs, emitted_citations)
SaveChat->>DB: Create SearchDoc entries<br/>from all_search_docs
SaveChat->>SaveChat: Build tool_call -> displayed_docs mapping
SaveChat->>SaveChat: Filter citations by emitted_citations
SaveChat->>DB: Link displayed_docs to ToolCalls
SaveChat->>DB: Link all search_docs to ChatMessage
SaveChat->>DB: Save citations mapping (emitted only)
|
| tool_call_arguments=tool_call.tool_args, | ||
| tool_call_response=tool_response.llm_facing_response, | ||
| search_docs=search_docs, | ||
| search_docs=displayed_docs or search_docs, |
There was a problem hiding this comment.
when would this be the case that displayed_docs are None?
There was a problem hiding this comment.
Doesn't the research agents need all search docs?
There was a problem hiding this comment.
LLM filtering step could fail, I think it's ok to default to the larger set. I think it's likely ok to just let it fail also
There was a problem hiding this comment.
Changing this to the display docs is only for saving the tool call in the DB. It's for replaying and it's for the internal search / web search tool
|
|
||
| // Separate cited documents from other documents | ||
| const citedDocumentIds = useMemo(() => { | ||
| // Get citations in order and build a set of cited document IDs |
There was a problem hiding this comment.
Why are we adding a citation order? Earlier citations were in order of importance already I thought
There was a problem hiding this comment.
Mmm it definitely wasn't showing in order in the UI. We just load them from the DB and pass them to the frontend via the relationship table and there is no sorting or anything prior to this so I'm pretty sure it wasn't sorted
| tool_call_arguments=tool_call.tool_args, | ||
| tool_call_response=tool_response.llm_facing_response, | ||
| search_docs=search_docs, | ||
| search_docs=displayed_docs or search_docs, |
There was a problem hiding this comment.
Doesn't the research agents need all search docs?
Description
Previously, when replaying a session, it would show all the docs that came back from the search, now it shows it same as the first pass. Addresses: ENG-3135
Additionally now sorts the cited sources on replay based on the order they appear in the text.
How Has This Been Tested?
Verified with main chat loop with some tools
Verified with deep research
Additional Options
Summary by cubic
Fixes doc selection on chat replay so it shows the same displayed docs as the original turn, not the full search results. Addresses ENG-3135 (doc selection on replay).
Written for commit 84e6067. Summary will update on new commits.