fix(opensearch): Use the same method for getting title that the title embedding logic uses; small cleanup for content embedding by acaprau · Pull Request #7638 · onyx-dot-app/onyx

acaprau · 2026-01-21T21:02:50Z

Description

_index_vespa_chunk uses get_title_for_document_index() to populate the title field in a document index entry, which also happens to be same data used to create a title embedding. The OpenSearch flow needs to use this too.

This PR also encapsulates the enrichment logic for creating a content embedding in a similar way that content text enrichment logic is encapsulated in backend/onyx/document_index/chunk_content_enrichment.py.

How Has This Been Tested?

@evan-onyx first wrote this title fix in https://github.com/onyx-dot-app/onyx/pull/7560/changes#diff-d09eebe79e128bf021e791c21a4ff9cf98d85d79e4da36b237885de09bbd5618 and it works.

Additional Options

[Optional] Override Linear Check

Summary by cubic

Use get_title_for_document_index in OpenSearch so the stored title matches the title used for embeddings and corrects fallback behavior. Also add a helper for content-embedding enrichment and rename the text-enrichment helper for clarity.

Bug Fixes
- OpenSearch sets title via get_title_for_document_index(), matching embedder logic and falling back to semantic_identifier when title is None.
Refactors
- Added generate_enriched_content_for_chunk_embedding() and used it in the embedder.
- Renamed generate_enriched_content_for_chunk → generate_enriched_content_for_chunk_text and updated OpenSearch/Vespa call sites.

^{Written for commit f786e38. Summary will update on new commits.}

greptile-apps · 2026-01-21T21:05:15Z

Greptile Summary

Fixed OpenSearch to use get_title_for_document_index() for the title field, ensuring consistency with Vespa and the title embedding logic. This method properly falls back to semantic_identifier when title is None (but not empty string), preventing title/embedding mismatches.

Refactored content enrichment logic by extracting generate_enriched_content_for_chunk_embedding() function that encapsulates the embedding-specific enrichment (uses metadata_suffix_semantic)
Renamed existing function to generate_enriched_content_for_chunk_text() to clarify it's for text/BM25 indexing (uses metadata_suffix_keyword)
Updated all call sites in OpenSearch, Vespa, and embedder to use the appropriate enrichment functions

Confidence Score: 5/5

This PR is safe to merge with no issues found
The changes are straightforward refactoring and bug fixes that improve code consistency. The title fix addresses a real inconsistency between OpenSearch and Vespa/embedder. The content enrichment refactoring follows existing patterns and properly separates concerns between embedding and text indexing. All changes are well-contained with clear intent.
No files require special attention

Important Files Changed

Filename	Overview
backend/onyx/document_index/chunk_content_enrichment.py	Extracted content embedding enrichment logic into a dedicated function, matching the existing pattern for text enrichment
backend/onyx/document_index/opensearch/opensearch_document_index.py	Fixed title field to use `get_title_for_document_index()` matching Vespa and embedder logic; renamed function to use new text enrichment function

Sequence Diagram

sequenceDiagram
    participant Chunk as DocMetadataAwareIndexChunk
    participant Embedder as DefaultIndexingEmbedder
    participant EnrichFunc as generate_enriched_content_for_chunk_embedding
    participant OpenSearch as opensearch_document_index
    participant TextEnrichFunc as generate_enriched_content_for_chunk_text
    participant Doc as Document

    Note over Embedder,EnrichFunc: Content Embedding Flow
    Embedder->>EnrichFunc: Call with DocAwareChunk
    EnrichFunc-->>Embedder: Return enriched content (with metadata_suffix_semantic)
    Embedder->>Doc: get_title_for_document_index()
    Doc-->>Embedder: Return title (or semantic_identifier if title is None)
    
    Note over OpenSearch,TextEnrichFunc: OpenSearch Indexing Flow
    OpenSearch->>Doc: get_title_for_document_index()
    Note right of OpenSearch: Fixed: Now uses same<br/>method as embedder
    Doc-->>OpenSearch: Return consistent title
    OpenSearch->>TextEnrichFunc: Call with DocMetadataAwareIndexChunk
    TextEnrichFunc-->>OpenSearch: Return enriched content (with metadata_suffix_keyword)

cubic-dev-ai

No issues found across 4 files

ty evan

f786e38

acaprau requested a review from a team as a code owner January 21, 2026 21:02

cubic-dev-ai bot reviewed Jan 21, 2026

View reviewed changes

acaprau enabled auto-merge January 21, 2026 21:06

evan-onyx approved these changes Jan 21, 2026

View reviewed changes

acaprau added this pull request to the merge queue Jan 21, 2026

Merged via the queue into main with commit 267042a Jan 21, 2026
78 checks passed

acaprau deleted the andrei/260121/0/opensearch/fix-title-insertion-and-create-func-for-content-embedding-enrichment branch January 21, 2026 21:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(opensearch): Use the same method for getting title that the title embedding logic uses; small cleanup for content embedding#7638

fix(opensearch): Use the same method for getting title that the title embedding logic uses; small cleanup for content embedding#7638
acaprau merged 1 commit intomainfrom
andrei/260121/0/opensearch/fix-title-insertion-and-create-func-for-content-embedding-enrichment

acaprau commented Jan 21, 2026 •

edited by cubic-dev-ai bot

Loading

Uh oh!

greptile-apps bot commented Jan 21, 2026

Uh oh!

cubic-dev-ai bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

acaprau commented Jan 21, 2026 • edited by cubic-dev-ai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

How Has This Been Tested?

Additional Options

Summary by cubic

Uh oh!

greptile-apps bot commented Jan 21, 2026

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

acaprau commented Jan 21, 2026 •

edited by cubic-dev-ai bot

Loading