feat: Maintain correct docs on replay by yuhongsun96 · Pull Request #7683 · onyx-dot-app/onyx

yuhongsun96 · 2026-01-23T00:58:33Z

Description

Previously, replaying a session showed all the docs that came back from the search. Now it shows the same docs as the first pass. Addresses: ENG-3135

Additionally, cited sources are now sorted on replay by the order in which they appear in the text.

How Has This Been Tested?

Verified with main chat loop with some tools
Verified with deep research

Additional Options

[Optional] Override Linear Check

Summary by cubic

Fixes doc selection on chat replay so it shows the same displayed docs as the original turn, not the full search results. Addresses ENG-3135 (doc selection on replay).

Bug Fixes
- Added displayed_docs to SearchDocsResponse and saved those per tool call; replay now uses this subset.
- Tracked all fetched search docs and emitted citation numbers in ChatStateContainer; saved only emitted citations, deduped docs, and ordered citations by first appearance.
- Updated save logic to create DB entries from all_search_docs, link tool calls to displayed docs, and build citations from the emitted mapping.

Written for commit 84e6067. Summary will update on new commits.

cubic-dev-ai

1 issue found across 8 files

Prompt for AI agents (all issues)


Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="backend/onyx/tools/fake_tools/research_agent.py">

<violation number="1" location="backend/onyx/tools/fake_tools/research_agent.py:508">
P2: `displayed_docs or search_docs` treats an empty list as falsy and will persist all search docs even when no docs were displayed. Preserve an empty `displayed_docs` by checking for `None` explicitly.</violation>
</file>
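
The pitfall flagged above can be reproduced in isolation. The following is a minimal sketch, not code from the PR: `pick_docs_with_or` and `pick_docs_explicit` are hypothetical helpers, and docs are represented as plain strings for brevity.

```python
from typing import Optional


def pick_docs_with_or(
    displayed_docs: Optional[list[str]], search_docs: list[str]
) -> list[str]:
    # `or` falls back whenever the left side is falsy, and [] is falsy.
    return displayed_docs or search_docs


def pick_docs_explicit(
    displayed_docs: Optional[list[str]], search_docs: list[str]
) -> list[str]:
    # Fall back only when no displayed subset was produced at all.
    return search_docs if displayed_docs is None else displayed_docs


search_docs = ["doc-a", "doc-b", "doc-c"]

# With `or`, an empty displayed set is replaced by the full search results.
with_or_result = pick_docs_with_or([], search_docs)

# With the explicit check, an empty displayed set is preserved,
# and only a missing (None) value triggers the fallback.
explicit_empty_result = pick_docs_explicit([], search_docs)
explicit_none_result = pick_docs_explicit(None, search_docs)
```

Whether the broader fallback is acceptable is a product decision; the sketch only shows where the two forms diverge.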

backend/onyx/tools/fake_tools/research_agent.py

greptile-apps · 2026-01-23T01:02:17Z

Greptile Overview

Greptile Summary

This PR fixes the document selection issue on chat replay (ENG-3135) by separating the concepts of "all fetched documents" from "displayed documents" throughout the chat pipeline.

Key Changes:

- Added displayed_docs field to SearchDocsResponse to distinguish between all search results and the subset shown to users
- Introduced ChatStateContainer tracking for all search docs (deduplicated by document_id) and emitted citations
- Modified save_chat_turn() to create DB entries from all fetched docs while linking only displayed docs to tool calls
- Added citation filtering to save only citations that were actually emitted during streaming
- Applied changes consistently across both main chat loop and research agent flows
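
The tracking described in these changes might look roughly like the sketch below. It is an assumption-laden illustration, not the PR's `ChatStateContainer`: the `Doc` dataclass, the class name `StateContainerSketch`, and the two `get_*` accessors are hypothetical. Only the method names `add_search_docs` and `add_emitted_citation` come from the review text.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    # Hypothetical minimal document record.
    document_id: str
    title: str


class StateContainerSketch:
    """Hypothetical stand-in for the doc and citation tracking."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._docs_by_id: dict[str, Doc] = {}
        self._emitted: list[int] = []

    def add_search_docs(self, docs: list[Doc]) -> None:
        with self._lock:
            for doc in docs:
                # Deduplicate by document_id, keeping the first doc seen.
                self._docs_by_id.setdefault(doc.document_id, doc)

    def add_emitted_citation(self, citation_num: int) -> None:
        with self._lock:
            # Record each citation once, in order of first emission.
            if citation_num not in self._emitted:
                self._emitted.append(citation_num)

    def get_all_search_docs(self) -> list[Doc]:
        with self._lock:
            return list(self._docs_by_id.values())

    def get_emitted_citations(self) -> list[int]:
        with self._lock:
            return list(self._emitted)


state = StateContainerSketch()
state.add_search_docs([Doc("a", "A"), Doc("b", "B")])
state.add_search_docs([Doc("a", "A again"), Doc("c", "C")])
for num in (2, 1, 2):
    state.add_emitted_citation(num)

all_ids = [doc.document_id for doc in state.get_all_search_docs()]
emitted = state.get_emitted_citations()
```

A single lock around both collections keeps the sketch simple; the real container may be structured differently.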

Impact:
When replaying a chat session, the UI now displays the same documents that were shown originally (via displayed_docs) rather than the full search results, ensuring replay accuracy matches the original experience.

Confidence Score: 4/5

- This PR is safe to merge with low risk. The changes are well-structured and maintain backward compatibility.
- The implementation correctly separates displayed docs from all search docs and adds proper citation tracking. The logic is consistent across both chat flows (main loop and research agent). Minor risk exists in the fallback logic and the complexity of the deduplication strategy, but the changes are well-contained and the core logic appears sound.
- Pay attention to backend/onyx/chat/save_chat.py, the most complex file, with multiple deduplication and mapping steps.

Important Files Changed

| Filename | Overview |
| --- | --- |
| backend/onyx/chat/chat_state.py | Added search doc tracking and citation emission tracking to `ChatStateContainer` with thread-safe methods and deduplication support |
| backend/onyx/chat/llm_loop.py | Extracts `displayed_docs`, adds all `search_docs` to state container, and saves `displayed_docs` or `search_docs` fallback to tool call info |
| backend/onyx/chat/llm_step.py | Added tracking of emitted citations by calling `state_container.add_emitted_citation()` whenever a citation is streamed to the frontend |
| backend/onyx/chat/save_chat.py | Refactored to create DB docs from pre-deduplicated `all_search_docs`, link displayed docs to tool calls, and filter citations by emitted set |

Sequence Diagram

sequenceDiagram
    participant SearchTool
    participant LLMLoop
    participant StateContainer
    participant LLMStep
    participant SaveChat
    participant DB

    SearchTool->>SearchTool: Execute search query
    SearchTool->>SearchTool: Generate search_docs & final_ui_docs
    SearchTool->>LLMLoop: Return SearchDocsResponse<br/>(search_docs, displayed_docs, citation_mapping)
    
    LLMLoop->>StateContainer: add_search_docs(search_docs)<br/>(stores ALL search docs)
    LLMLoop->>LLMLoop: Create ToolCallInfo with<br/>displayed_docs or search_docs
    LLMLoop->>StateContainer: add_tool_call(tool_call_info)
    
    LLMLoop->>LLMStep: Stream LLM response
    LLMStep->>LLMStep: Process citation in answer
    LLMStep->>StateContainer: add_emitted_citation(citation_num)<br/>(track citations that appear in text)
    LLMStep->>StateContainer: set_citation_mapping(citation_to_doc)
    
    LLMLoop->>SaveChat: save_chat_turn(citation_to_doc,<br/>all_search_docs, emitted_citations)
    
    SaveChat->>DB: Create SearchDoc entries<br/>from all_search_docs
    SaveChat->>SaveChat: Build tool_call -> displayed_docs mapping
    SaveChat->>SaveChat: Filter citations by emitted_citations
    SaveChat->>DB: Link displayed_docs to ToolCalls
    SaveChat->>DB: Link all search_docs to ChatMessage
    SaveChat->>DB: Save citations mapping (emitted only)
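
The two SaveChat steps in the diagram (filtering citations by the emitted set, and linking only displayed docs to tool calls) can be sketched as follows. This is a simplified illustration under stated assumptions: both function names, the use of string document ids, and the integer DB ids are hypothetical and do not reflect the actual `save_chat_turn()` signature.

```python
def filter_emitted_citations(
    citation_to_doc: dict[int, str], emitted_citations: list[int]
) -> dict[int, str]:
    """Keep only citations that were streamed, in order of first emission."""
    filtered: dict[int, str] = {}
    for num in emitted_citations:
        if num in citation_to_doc and num not in filtered:
            filtered[num] = citation_to_doc[num]
    return filtered


def link_displayed_docs(
    tool_call_docs: dict[str, list[str]], saved_ids: dict[str, int]
) -> dict[str, list[int]]:
    """Map each tool call to the DB ids of the docs it displayed."""
    return {
        call_id: [saved_ids[doc_id] for doc_id in doc_ids if doc_id in saved_ids]
        for call_id, doc_ids in tool_call_docs.items()
    }


# Hypothetical data: three fetched docs, of which the tool call displayed two.
saved_ids = {"doc-a": 101, "doc-b": 102, "doc-c": 103}
tool_call_docs = {"call-1": ["doc-a", "doc-c"]}
citation_to_doc = {1: "doc-a", 2: "doc-b", 3: "doc-c"}
emitted_citations = [3, 1]

citations_to_save = filter_emitted_citations(citation_to_doc, emitted_citations)
tool_call_links = link_displayed_docs(tool_call_docs, saved_ids)
```

Because Python dicts preserve insertion order, `citations_to_save` also carries the first-emission ordering.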

greptile-apps

2 files reviewed, 2 comments

backend/onyx/chat/llm_loop.py

backend/onyx/tools/fake_tools/research_agent.py

jessicasingh7 · 2026-01-23T02:39:20Z

backend/onyx/tools/fake_tools/research_agent.py

                            tool_call_arguments=tool_call.tool_args,
                            tool_call_response=tool_response.llm_facing_response,
-                            search_docs=search_docs,
+                            search_docs=displayed_docs or search_docs,


When would displayed_docs be None?

Don't the research agents need all search docs?

The LLM filtering step could fail, and I think it's OK to default to the larger set. It's likely also OK to just let it fail.

Changing this to the displayed docs only affects how the tool call is saved in the DB. It's for replay, and it's for the internal search / web search tool.

jessicasingh7 · 2026-01-23T02:44:06Z

web/src/sections/document-sidebar/DocumentsSidebar.tsx


-    // Separate cited documents from other documents
-    const citedDocumentIds = useMemo(() => {
+    // Get citations in order and build a set of cited document IDs


Why are we adding a citation order? I thought citations were already in order of importance.

It definitely wasn't showing in order in the UI. We just load them from the DB and pass them to the frontend via the relationship table, and there was no sorting prior to this, so I'm pretty sure they weren't sorted.
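
To make the ordering under discussion concrete, here is a rough sketch of splitting documents into cited ones (ordered by first appearance in the answer text) and the rest. It is written in Python for illustration even though the actual change lives in a TypeScript component. The `[n]` citation marker format, both function names, and the string document ids are assumptions, not details taken from the PR.

```python
import re

# Assumed marker format: citations appear in the text as [1], [2], ...
CITATION_PATTERN = re.compile(r"\[(\d+)\]")


def citation_order(answer_text: str) -> list[int]:
    """Return citation numbers in order of first appearance."""
    seen: list[int] = []
    for match in CITATION_PATTERN.finditer(answer_text):
        num = int(match.group(1))
        if num not in seen:
            seen.append(num)
    return seen


def order_docs_for_sidebar(
    doc_ids: list[str], citation_to_doc: dict[int, str], answer_text: str
) -> tuple[list[str], list[str]]:
    """Split docs into (cited, in citation order) and (all other docs)."""
    cited: list[str] = []
    for num in citation_order(answer_text):
        doc_id = citation_to_doc.get(num)
        if doc_id is not None and doc_id in doc_ids and doc_id not in cited:
            cited.append(doc_id)
    others = [doc_id for doc_id in doc_ids if doc_id not in cited]
    return cited, others


answer = "Replay uses the saved subset [3]. Citations are ordered [1][3]."
cited_docs, other_docs = order_docs_for_sidebar(
    ["doc-a", "doc-b", "doc-c"],
    {1: "doc-a", 2: "doc-b", 3: "doc-c"},
    answer,
)
```

Uncited documents keep their original relative order here; the real sidebar may sort them differently.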

jessicasingh7 · 2026-01-23T02:49:08Z

backend/onyx/tools/fake_tools/research_agent.py

                            tool_call_arguments=tool_call.tool_args,
                            tool_call_response=tool_response.llm_facing_response,
-                            search_docs=search_docs,
+                            search_docs=displayed_docs or search_docs,


Don't the research agents need all search docs?

k

a5c8a14

yuhongsun96 requested a review from a team as a code owner January 23, 2026 00:58

cubic-dev-ai bot reviewed Jan 23, 2026

backend/onyx/tools/fake_tools/research_agent.py

greptile-apps bot reviewed Jan 23, 2026

backend/onyx/chat/llm_loop.py

backend/onyx/tools/fake_tools/research_agent.py

k

84e6067

jessicasingh7 reviewed Jan 23, 2026

yuhongsun96 merged commit 3e4a1f8 into main Jan 23, 2026
78 of 79 checks passed

yuhongsun96 deleted the selected-docs branch January 23, 2026 03:24

yuhongsun96 merged 2 commits into main from selected-docs

2 participants
