feat(opensearch): Fix some stuff around metadata to improve code and match what we store in Vespa by acaprau · Pull Request #7448 · onyx-dot-app/onyx

acaprau · 2026-01-16T00:59:16Z

Description

We now store a metadata list in OpenSearch. It is used to filter on metadata fields and also to reconstruct the metadata dict. This is more space efficient than storing both the dict and the list, which is what we do in Vespa.
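
A minimal sketch of the dict-to-list round trip described above, assuming each metadata entry is stored as a single string made of key, separator, and value. The separator `===` and the function names `metadata_dict_to_list` and `metadata_list_to_dict` are illustrative assumptions, not the helpers added in this PR.

```python
from typing import Dict, List, Union

# Hypothetical separator; the one used by the real codebase may differ.
SEPARATOR = "==="

MetadataValue = Union[str, List[str]]


def metadata_dict_to_list(metadata: Dict[str, MetadataValue]) -> List[str]:
    """Flatten a metadata dict into one 'key===value' string per value."""
    flattened: List[str] = []
    for key, value in metadata.items():
        values = value if isinstance(value, list) else [value]
        flattened.extend(f"{key}{SEPARATOR}{v}" for v in values)
    return flattened


def metadata_list_to_dict(entries: List[str]) -> Dict[str, MetadataValue]:
    """Rebuild the dict; a key seen more than once becomes a list of values."""
    rebuilt: Dict[str, MetadataValue] = {}
    for entry in entries:
        # Split on the first separator only, so a value may contain it.
        key, _, value = entry.partition(SEPARATOR)
        if key not in rebuilt:
            rebuilt[key] = value
            continue
        existing = rebuilt[key]
        if isinstance(existing, list):
            existing.append(value)
        else:
            rebuilt[key] = [existing, value]
    return rebuilt
```

In this sketch a one-element list comes back as a plain string, so the round trip is not lossless for that case; a real implementation has to decide how to handle it.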

Also cleaned up the source links dict we return on retrieval so that its keys are ints, not strs.
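
One common way string keys appear is a JSON round trip, since JSON object keys are always strings. The sketch below assumes the source links are a mapping from an integer to a link that gets serialized as JSON for storage; the helper name `parse_source_links` and the example URLs are hypothetical.

```python
import json
from typing import Dict


def parse_source_links(raw: str) -> Dict[int, str]:
    """Parse a JSON-encoded mapping and cast its keys back to ints."""
    loaded = json.loads(raw)
    return {int(key): link for key, link in loaded.items()}


original = {0: "https://example.com/a", 120: "https://example.com/b"}
stored = json.dumps(original)   # int keys are written out as strings
naive = json.loads(stored)      # keys come back as "0" and "120"
restored = parse_source_links(stored)
```

Here `naive` has str keys while `restored` compares equal to `original`.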

How Has This Been Tested?

I trust CI.

Additional Options

[Optional] Override Linear Check

Summary by cubic

Store metadata in OpenSearch as a flattened list (metadata_list) and add helpers to rebuild the dict at read time. Updates schema, indexing, and retrieval to reduce storage, align with Vespa filtering, and fix a source_links type issue.

Refactors
- Replace metadata dict storage with metadata_list (key-value strings) and add two converters.
- Update schema: metadata → metadata_list, add metadata_suffix, project_ids → user_projects.
- Indexer now writes metadata_list and metadata_suffix; skips empty arrays by sending None.
- Retrieval converts metadata_list back to dict and casts source_links keys to ints.

Migration
- Recreate or reindex the OpenSearch index due to mapping changes.
- Field changes: metadata → metadata_list, project_ids → user_projects, add metadata_suffix.
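
As a rough illustration of the field changes listed above, the sketch below reshapes one old-style document into the new layout. It is not the migration path used by the project (the summary only says to recreate or reindex). The assumptions are that the old `metadata` field held a JSON-encoded dict, that `===` is the separator, and that `metadata_suffix` cannot be recovered from an old document and is left as `None`. The function name `migrate_document` is hypothetical.

```python
import json
from typing import Any, Dict, List

SEPARATOR = "==="  # hypothetical separator


def migrate_document(old: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `old` using the renamed and added fields."""
    new = {k: v for k, v in old.items() if k not in ("metadata", "project_ids")}

    raw = old.get("metadata")
    metadata = json.loads(raw) if isinstance(raw, str) else (raw or {})
    entries: List[str] = []
    for key, value in metadata.items():
        values = value if isinstance(value, list) else [value]
        entries.extend(f"{key}{SEPARATOR}{v}" for v in values)

    # Empty lists are sent as None so no data is stored for the field.
    new["metadata_list"] = entries or None
    new["user_projects"] = old.get("project_ids") or None
    # Assumption: the suffix is not derivable from the old document.
    new["metadata_suffix"] = None
    return new
```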

Written for commit 10341d2. Summary will update on new commits.

cubic-dev-ai

1 issue found across 4 files

Prompt for AI agents (all issues)


Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="backend/onyx/document_index/opensearch/schema.py">

<violation number="1" location="backend/onyx/document_index/opensearch/schema.py:38">
P0: This rename breaks an existing import. `opensearch_document_index.py` imports `PROJECT_IDS_FIELD_NAME` which no longer exists after this rename. This will cause an `ImportError` at runtime. The import and usage in `opensearch_document_index.py` should be updated to use `USER_PROJECTS_FIELD_NAME`.</violation>
</file>


cubic-dev-ai · 2026-01-16T01:02:30Z

backend/onyx/document_index/opensearch/schema.py

 SOURCE_LINKS_FIELD_NAME = "source_links"
 DOCUMENT_SETS_FIELD_NAME = "document_sets"
-PROJECT_IDS_FIELD_NAME = "project_ids"
+USER_PROJECTS_FIELD_NAME = "user_projects"


P0: This rename breaks an existing import. opensearch_document_index.py imports PROJECT_IDS_FIELD_NAME which no longer exists after this rename. This will cause an ImportError at runtime. The import and usage in opensearch_document_index.py should be updated to use USER_PROJECTS_FIELD_NAME.
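
The failure mode can be reproduced in isolation. The sketch below builds a stand-in module in memory (the name `fake_schema` is hypothetical and only mimics a module after such a rename) and shows that a `from ... import` of the old name raises `ImportError` while the new name resolves.

```python
import sys
import types

# Stand-in for a schema module after the rename; only the new name exists.
fake_schema = types.ModuleType("fake_schema")
fake_schema.USER_PROJECTS_FIELD_NAME = "user_projects"
sys.modules["fake_schema"] = fake_schema

try:
    # Old constant name: the module no longer defines it.
    from fake_schema import PROJECT_IDS_FIELD_NAME  # noqa: F401
    import_failed = False
except ImportError:
    import_failed = True

# New constant name: resolves as expected.
from fake_schema import USER_PROJECTS_FIELD_NAME
```

Because the error is raised when the importing module is loaded, a test that simply imports the module is enough to catch this kind of rename.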


greptile-apps · 2026-01-16T01:04:36Z

Greptile Summary

This PR refactors OpenSearch metadata storage to use a flattened list format (metadata_list) instead of a JSON-stringified dict, aligning it with Vespa's storage approach for better space efficiency and filtering support. Key changes:

- New utility functions in models.py for converting between metadata dict and list-of-strings formats
- Schema changes: metadata → metadata_list, project_ids → user_projects, added metadata_suffix field
- Retrieval improvements: source_links keys now properly cast to integers, metadata_suffix correctly read from storage for content cleanup

Critical Issue: The import and usage of PROJECT_IDS_FIELD_NAME in opensearch_document_index.py was not updated to match the schema rename to USER_PROJECTS_FIELD_NAME, which will cause an ImportError at runtime.

Confidence Score: 1/5

- This PR contains an import error that will crash the application at startup when OpenSearch is used.
- The PR has a critical bug where PROJECT_IDS_FIELD_NAME is imported but no longer exists in schema.py after being renamed to USER_PROJECTS_FIELD_NAME. This will cause an ImportError at runtime, preventing the OpenSearch module from loading.
- backend/onyx/document_index/opensearch/opensearch_document_index.py requires immediate attention: line 47 imports a non-existent constant and line 562 uses it.

Important Files Changed

| Filename | Overview |
| --- | --- |
| backend/onyx/connectors/models.py | Added two utility functions for metadata conversion between dict and list formats. Clean, well-documented implementation with proper edge case handling. |
| backend/onyx/context/search/models.py | Minor documentation-only change: Added TODO comment about metadata dict schema improvement. |
| backend/onyx/document_index/opensearch/schema.py | Schema updates: renamed metadata → metadata_list, project_ids → user_projects, added metadata_suffix field. Breaking schema change requiring index recreation. |
| backend/onyx/document_index/opensearch/opensearch_document_index.py | Critical bug: imports PROJECT_IDS_FIELD_NAME which no longer exists in schema.py (renamed to USER_PROJECTS_FIELD_NAME). Will cause ImportError at runtime. |

Sequence Diagram

sequenceDiagram
    participant Chunk as DocMetadataAwareIndexChunk
    participant Convert as Indexer
    participant OS as OpenSearch
    participant Retrieve as Retriever
    participant InfChunk as InferenceChunkUncleaned

    Note over Chunk,InfChunk: Indexing Flow
    Chunk->>Convert: chunk with metadata dict
    Convert->>Convert: get_metadata_str_attributes
    Note right of Convert: Converts dict to list
    Convert->>OS: Store metadata_list, metadata_suffix

    Note over Chunk,InfChunk: Retrieval Flow
    OS->>Retrieve: Return chunk with metadata_list
    Retrieve->>Retrieve: convert_metadata_list_of_strings_to_dict
    Note right of Retrieve: Reconstructs dict from list
    Retrieve->>InfChunk: Return with metadata dict

greptile-apps

Additional Comments (2)

- backend/onyx/document_index/opensearch/opensearch_document_index.py, line 47
  syntax: PROJECT_IDS_FIELD_NAME no longer exists in schema.py (renamed to USER_PROJECTS_FIELD_NAME). This will cause an ImportError at runtime.
- backend/onyx/document_index/opensearch/opensearch_document_index.py, line 562
  syntax: After fixing the import, this usage also needs to be updated to use the new constant name.

4 files reviewed, 2 comments


evan-onyx · 2026-01-16T01:31:14Z

backend/onyx/document_index/opensearch/opensearch_document_index.py

+        # Small optimization, if this list is empty we can supply None to
+        # OpenSearch and it will not store any data at all for this field, which
+        # is different from supplying an empty list.
+        user_projects=chunk.user_project if chunk.user_project else None,


chunk.user_project or None
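
The suggestion relies on `or` returning its right operand when the left one is falsy, so both spellings map an empty list to `None` and pass a non-empty list through unchanged. A small check, using two hypothetical helper functions that stand in for the two expressions:

```python
from typing import List, Optional


def with_conditional(values: Optional[List[int]]) -> Optional[List[int]]:
    """Spelling used in the diff: explicit conditional expression."""
    return values if values else None


def with_or(values: Optional[List[int]]) -> Optional[List[int]]:
    """Spelling suggested in review: short-circuit `or`."""
    return values or None


samples = [[], None, [1], [1, 2]]
results = [(with_conditional(s), with_or(s)) for s in samples]
```

Every pair in `results` holds two equal values, so the shorter form is a drop-in replacement here.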

backend/onyx/document_index/opensearch/schema.py

 SOURCE_LINKS_FIELD_NAME = "source_links"
 DOCUMENT_SETS_FIELD_NAME = "document_sets"
-PROJECT_IDS_FIELD_NAME = "project_ids"
+USER_PROJECTS_FIELD_NAME = "user_projects"


…match what we store in Vespa (#7448)

why do we live just to suffer

4d49135

acaprau requested a review from a team as a code owner January 16, 2026 00:59

cubic-dev-ai bot reviewed Jan 16, 2026


greptile-apps bot reviewed Jan 16, 2026


evan-onyx approved these changes Jan 16, 2026


ok

d015e00

acaprau enabled auto-merge January 16, 2026 03:14

type

10341d2

acaprau added this pull request to the merge queue Jan 16, 2026

Merged via the queue into main with commit e0a9723 Jan 16, 2026
75 checks passed

acaprau deleted the andrei/260115/1/opensearch/some-metadata-stuff branch January 16, 2026 03:51

rohoswagger pushed a commit that referenced this pull request Jan 19, 2026

feat(opensearch): Fix some stuff around metadata to improve code and …

235785e

…match what we store in Vespa (#7448)

jessicasingh7 pushed a commit that referenced this pull request Jan 21, 2026

feat(opensearch): Fix some stuff around metadata to improve code and …

f5616aa

…match what we store in Vespa (#7448)

acaprau merged 3 commits into main from andrei/260115/1/opensearch/some-metadata-stuff
