Merged
Conversation
Contributor
Greptile OverviewGreptile SummaryThis PR refactors how metadata files are stored for file connectors. Previously, the entire Key Changes:
Issues Found:
Confidence Score: 3/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant Client
participant API as API/Connector Endpoints
participant FileStore as File Store
participant DB as PostgreSQL
participant Connector as LocalFileConnector
Note over Client,Connector: File Upload Flow
Client->>API: Upload ZIP with .onyx_metadata.json
API->>API: Extract metadata from ZIP
API->>FileStore: Save metadata file (get file_id)
FileStore-->>API: Return metadata_file_id
API->>FileStore: Save uploaded files
FileStore-->>API: Return file_ids
API->>DB: Store connector config with zip_metadata_file_id
API-->>Client: Return file_paths and zip_metadata_file_id
Note over Client,Connector: Indexing Flow
Connector->>DB: Read connector config
DB-->>Connector: Return config with zip_metadata_file_id
Connector->>FileStore: Load metadata using file_id
FileStore-->>Connector: Return metadata JSON
Connector->>Connector: Parse metadata dict
loop For each file
Connector->>FileStore: Read file content
FileStore-->>Connector: Return file data
Connector->>Connector: Apply metadata overrides
Connector->>Connector: Process and index file
end
Note over Client,Connector: Update Connector Flow
Client->>API: Update files (add/remove)
API->>FileStore: Load current metadata (if exists)
API->>API: Merge old and new metadata
API->>FileStore: Save merged metadata as new file
Note over FileStore: Old metadata file NOT deleted (leak)
API->>DB: Update config with new zip_metadata_file_id
API-->>Client: Return updated config
|
Contributor
There was a problem hiding this comment.
2 issues found across 16 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="backend/onyx/server/documents/connector.py">
<violation number="1" location="backend/onyx/server/documents/connector.py:703">
P2: Overly broad `except Exception` silently drops metadata on any error. If loading fails (e.g., a list item missing `"filename"` key, or an unexpected file store issue), metadata is silently lost and the user won't know their metadata overrides weren't applied. Consider narrowing to specific exceptions (`json.JSONDecodeError`, `KeyError`, `FileNotFoundError`) and/or raising an `HTTPException` for unexpected failures so users are notified.
(Based on your team's feedback about narrowing exception scope rather than catching Exception.) [FEEDBACK_USED]</violation>
<violation number="2" location="backend/onyx/server/documents/connector.py:772">
P2: Orphaned metadata files in file store: when merged metadata is saved as a new file, the old metadata files (`current_zip_metadata_file_id` and `new_zip_metadata_file_id`) are never deleted. Each update cycle leaks previously-stored metadata files. Consider deleting the old files after the new merged file is successfully saved (e.g., `file_store.delete_file(current_zip_metadata_file_id)`).</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
We are loading the metadata file into a dict in memory then saving it as a jsonl in postgres but if the dict is large, it just doesn't get saved and the metadata overrides for the file connector do not get applied. Instead of saving the entire dict, we can just have a pointer to it in postgres and load it when the FileConnector actually runs.
How Has This Been Tested?
Tried uploading a zip with the metadata file which was not working previously, now it works!
Additional Options
Summary by cubic
Store ZIP metadata (.onyx_metadata.json) in the file store and reference it by file_id, loading it at run time. This fixes failures with large metadata, keeps overrides reliable for big ZIP uploads, and allows optional document_id overrides via the id field.
Bug Fixes
Migration
Written for commit e6f91c1. Summary will update on new commits.