feat(opensearch): Implement match highlighting#7437

Merged
acaprau merged 4 commits into main from
andrei/260115/0/opensearch/match-highlighting
Jan 15, 2026

@acaprau acaprau commented Jan 15, 2026

Description

This PR implements a feature that returns which part of a chunk's content contributed to a text match.

It also bumps the OpenSearch image tag to 3.4.0 because somewhere between 3.0.0 and that release they shipped a fix for match highlighting; it did not work for hybrid queries beforehand. See opensearch-project/neural-search#1215

Note that the highest version of the python client is 3.1.0, and we happen to be on 3.0.0. What a mess lol. Anyway from my testing things seem to work, I can't imagine that OpenSearch removed client-facing features from 3.0 to 3.4, just added things that maybe the client doesn't reflect.

How Has This Been Tested?

test_opensearch_client

Additional Options

  • [Optional] Override Linear Check

Summary by cubic

Adds match highlighting to OpenSearch search results so we return the exact content snippets that matched a query. Highlights use <hi> tags and work for hybrid search.

  • New Features

    • Enable unified highlighter in hybrid queries (fragment_size 100, 4 fragments, <hi> tags).
    • Capture OpenSearch highlight data in SearchHit.match_highlights and pass to inference chunks (content field).
    • Update content and title fields to use index_options: offsets for faster highlighting.
    • Add tests that assert highlights are present and correctly tagged.
  • Dependencies

    • Requires OpenSearch 3.4.0+ for hybrid query highlighting; dev container bumped to 3.4.0.
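
As a rough illustration of the settings summarized above, the highlight clause and field mapping might look like the following. This is a sketch under assumptions, not the code from this PR: the function names `build_highlight_config` and `build_text_field_mapping` are hypothetical, the `<hi>` tag is taken from the review summaries in this conversation, and the plain `match` query is only a placeholder for the hybrid query the PR actually uses. The keys themselves (`type`, `fragment_size`, `number_of_fragments`, `pre_tags`, `post_tags`, `index_options`) are standard OpenSearch highlighter and mapping parameters.

```python
from typing import Any

# Assumed tag names, based on the "<hi>" tag mentioned in the review summaries.
HIGHLIGHT_PRE_TAG = "<hi>"
HIGHLIGHT_POST_TAG = "</hi>"


def build_highlight_config(field_name: str = "content") -> dict[str, Any]:
    """Return a highlight clause for an OpenSearch search body (hypothetical helper)."""
    return {
        "fields": {
            field_name: {
                "type": "unified",          # unified highlighter
                "fragment_size": 100,       # approximate characters per snippet
                "number_of_fragments": 4,   # at most four snippets per field
                "pre_tags": [HIGHLIGHT_PRE_TAG],
                "post_tags": [HIGHLIGHT_POST_TAG],
            }
        }
    }


def build_text_field_mapping() -> dict[str, Any]:
    """Return a text field mapping that stores offsets in the postings list."""
    return {"type": "text", "index_options": "offsets"}


# Placeholder query: the PR attaches the highlight clause to a hybrid query instead.
search_body: dict[str, Any] = {
    "query": {"match": {"content": "quarterly report"}},
    "highlight": build_highlight_config(),
}
```

Storing offsets at index time lets the unified highlighter locate matches from the postings list rather than re-analyzing the field text on every query.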

Written for commit e13798a. Summary will update on new commits.

@acaprau acaprau requested a review from a team as a code owner January 15, 2026 19:35

@cubic-dev-ai cubic-dev-ai bot left a comment

No issues found across 5 files


greptile-apps bot commented Jan 15, 2026

Greptile Summary

Implements match highlighting for OpenSearch hybrid queries, returning snippets of matched text with search terms wrapped in <hi> tags. The implementation configures OpenSearch to use the unified highlighter with optimized offset indexing, extracts highlights from search results, and propagates them through to inference chunks.

  • Added match_highlights field to SearchHit model to store highlighted snippets per field
  • Configured OpenSearch schema with index_options: offsets for title and content fields to enable efficient highlighting
  • Created _get_match_highlights_configuration() to define highlight parameters (fragment size, count, tags)
  • Updated conversion logic to extract and pass highlights from OpenSearch results to inference chunks
  • Added comprehensive test assertions to verify highlighting behavior with expected tags
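
The extraction step described in the list above could be sketched roughly as follows. The dataclass and helper names here (`SearchHitSketch`, `parse_hit`, `content_highlights`) are hypothetical stand-ins, not the PR's actual `SearchHit` model or conversion function; the only thing assumed about OpenSearch is that each hit in a response may carry a `highlight` object mapping field names to lists of snippets.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class SearchHitSketch:
    """Simplified stand-in for a search hit that carries highlight snippets."""

    source: dict[str, Any]
    score: Optional[float] = None
    # Maps a field name (e.g. "content") to its highlighted snippets.
    match_highlights: dict[str, list[str]] = field(default_factory=dict)


def parse_hit(raw_hit: dict[str, Any]) -> SearchHitSketch:
    """Build a hit from one raw entry of an OpenSearch response's hits list."""
    return SearchHitSketch(
        source=raw_hit.get("_source", {}),
        score=raw_hit.get("_score"),
        # "highlight" is absent when nothing in the hit matched a highlighted field.
        match_highlights=raw_hit.get("highlight") or {},
    )


def content_highlights(hit: SearchHitSketch, field_name: str = "content") -> list[str]:
    """Return the snippets for one field, or an empty list if there are none."""
    return hit.match_highlights.get(field_name, [])
```

Defaulting to an empty dict and an empty list keeps downstream code free of `None` checks when a hit was returned purely on vector similarity and has no text highlights.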

The PR includes a typo in the test file (match_hightlights instead of match_highlights) that needs correction.

Confidence Score: 4/5

  • This PR is safe to merge with minimal risk after fixing the typo
  • The implementation is well-structured with proper test coverage. Only issue is a variable name typo in tests that will cause the test to fail. The feature requires OpenSearch 3.4.0+ as noted in PR description, which is a deployment consideration rather than code issue.
  • Pay attention to backend/tests/external_dependency_unit/opensearch/test_opensearch_client.py due to the typo that will cause test failures

Important Files Changed

| Filename | Overview |
|---|---|
| backend/onyx/document_index/opensearch/client.py | Added match_highlights field to SearchHit and extracted highlights from OpenSearch response |
| backend/onyx/document_index/opensearch/search.py | Added _get_match_highlights_configuration() method and integrated it into hybrid search query |
| backend/tests/external_dependency_unit/opensearch/test_opensearch_client.py | Added assertions to verify match highlights are returned and contain expected highlighted terms |
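
For a sense of what the test assertions on highlights might check, here is a minimal sketch. It is not the code in `test_opensearch_client.py`: the helper names `highlighted_terms` and `assert_highlights_present` are hypothetical, and the `<hi>` tag is assumed from the summary above.

```python
import re

# Assumed tag; matches the shortest text between an opening and closing <hi> tag.
HI_PATTERN = re.compile(r"<hi>(.*?)</hi>")


def highlighted_terms(snippets: list[str]) -> list[str]:
    """Collect every term wrapped in <hi>...</hi> across the given snippets."""
    terms: list[str] = []
    for snippet in snippets:
        terms.extend(HI_PATTERN.findall(snippet))
    return terms


def assert_highlights_present(
    match_highlights: dict[str, list[str]],
    field_name: str,
    expected_term: str,
) -> None:
    """Fail if the field has no snippets or the expected term is not tagged."""
    snippets = match_highlights.get(field_name, [])
    assert snippets, f"no highlights returned for field {field_name!r}"
    tagged = [term.lower() for term in highlighted_terms(snippets)]
    assert expected_term.lower() in tagged, (
        f"{expected_term!r} was not highlighted in {snippets!r}"
    )
```

Comparing case-insensitively is a design choice in this sketch, since an analyzer may match a query term against text with different capitalization.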

Sequence Diagram

sequenceDiagram
    participant Client as Search Client
    participant DQ as DocumentQuery
    participant OSC as OpenSearchClient
    participant OS as OpenSearch
    participant ODI as OpenSearchDocumentIndex
    participant IC as InferenceChunk

    Client->>DQ: get_hybrid_search_query()
    DQ->>DQ: _get_match_highlights_configuration()
    Note over DQ: Configure highlighting with<br/>fragment_size, number_of_fragments,<br/>and highlight tags
    DQ-->>Client: Query with highlight config

    Client->>OSC: search(query_body)
    OSC->>OS: Execute search
    OS-->>OSC: Results with highlight field
    Note over OSC: Extract match_highlights<br/>from hit.get("highlight")
    OSC->>OSC: Create SearchHit with match_highlights
    OSC-->>Client: List of SearchHit with highlights

    Client->>ODI: Process search results
    ODI->>ODI: _convert_retrieved_opensearch_chunk_to_inference_chunk_uncleaned()
    Note over ODI: Extract content field highlights<br/>from highlights dict
    ODI->>IC: Create InferenceChunkUncleaned
    IC-->>ODI: Chunk with match_highlights
    ODI-->>Client: Inference chunks with highlights

@greptile-apps greptile-apps bot left a comment

5 files reviewed, 3 comments


@acaprau acaprau requested a review from evan-onyx January 15, 2026 19:56
@acaprau acaprau added this pull request to the merge queue Jan 15, 2026
Merged via the queue into main with commit e9242ca Jan 15, 2026
75 checks passed
@acaprau acaprau deleted the andrei/260115/0/opensearch/match-highlighting branch January 15, 2026 23:08