Skip to content

fix(vespa): Make ID retrieval always check for tenant ID; Add additional tenant ID checks in the new interface#7480

Merged
acaprau merged 3 commits intomainfrom
andrei/260116/1/opensearch/admin-retrieval-always-needs-tenant-id
Jan 17, 2026
Merged

fix(vespa): Make ID retrieval always check for tenant ID; Add additional tenant ID checks in the new interface#7480
acaprau merged 3 commits intomainfrom
andrei/260116/1/opensearch/admin-retrieval-always-needs-tenant-id

Conversation

@acaprau
Copy link
Copy Markdown
Contributor

@acaprau acaprau commented Jan 16, 2026

Description

We're seeing that tenant ID is not being set for ID retrieval, and this is causing issues with the more restrictive document index interface. This PR gets tenant ID from the context and supplies this for document index endpoints that never supplied it. For endpoints that do supply it, this PR just adds an additional check that the supplied ID matches the context ID.

When we get rid of the old interface entirely all this logic will become simpler.

How Has This Been Tested?

Testing locally now.

Additional Options

  • [Optional] Override Linear Check

Summary by cubic

Ensure all Vespa index operations use the context tenant ID and validate it, fixing ID-based retrieval failures and preventing cross-tenant access with the new interface.

  • Bug Fixes
    • Build TenantState from get_current_tenant_id() + MULTI_TENANT across index, update_single, delete_single, id_based_retrieval, hybrid_retrieval, and random_retrieval.
    • Validate provided tenant_id against context; raise on mismatch. Also check multitenant flag.
    • Stop relying on filters.tenant_id; retrieval always uses context tenant.
    • Pass a consistent TenantState to VespaDocumentIndex.

Written for commit 1e0652b. Summary will update on new commits.

@acaprau acaprau requested a review from a team as a code owner January 16, 2026 23:52
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Jan 16, 2026

Greptile Summary

This PR ensures tenant ID is consistently retrieved from context via get_current_tenant_id() across all document index operations, addressing a critical issue where tenant ID was not being set for ID retrieval.

Key Changes:

  • Added import for get_current_tenant_id from shared_configs.contextvars
  • Modified 4 core methods (index(), update_single(), delete_single(), id_based_retrieval()) to explicitly retrieve tenant ID from context with validation
  • Modified 2 retrieval methods (hybrid_retrieval(), random_retrieval()) to use context-based tenant ID
  • Removed fallback logic that silently defaulted to empty string when tenant_id was missing from filters
  • Added validation checks to ensure supplied tenant ID (via parameters) matches the context tenant ID
  • Removed TODO comment about document_id field validation that was commented out

Architecture Alignment:
The changes align with the new interface pattern where tenant context is captured once during initialization via TenantState rather than passed through method parameters. The new approach provides better security by preventing callers from overriding tenant context and ensures consistency across async/multi-threaded environments.

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk. All changes follow a consistent pattern, add proper validation checks, and align with the architectural migration to context-based tenant awareness.
  • The PR demonstrates solid engineering practices: (1) All changes follow the same pattern consistently across 6 methods; (2) Proper validation added to catch tenant ID mismatches early; (3) Aligns with the documented architecture pattern where tenant context is retrieved once and validated rather than passed through parameters; (4) Removes dangerous fallback logic that could cause silent tenant isolation failures; (5) Changes are focused and don't introduce unnecessary refactoring; (6) Import statements are correctly added; (7) No breaking changes to method signatures.
  • No files require special attention. The single file modified has been thoroughly reviewed and all changes are appropriate and consistent.

Important Files Changed

Filename Overview
backend/onyx/document_index/vespa/index.py This PR strengthens tenant ID validation by consistently retrieving tenant ID from the context via get_current_tenant_id() instead of relying on filter parameters or index_batch_params. Added explicit validation checks in 4 methods (index(), update_single(), delete_single(), and id_based_retrieval()), plus 2 retrieval methods (hybrid_retrieval(), random_retrieval()). Removed empty tenant_id fallback logic that was silently defaulting to empty string. All changes follow the same pattern and are consistent with the architecture.

Sequence Diagram

sequenceDiagram
    participant API as API Endpoint
    participant VespaIndex as VespaIndex (Old Interface)
    participant ContextVar as Context Variable
    participant TenantState as TenantState
    participant VespaDocIndex as VespaDocumentIndex (New Interface)

    API->>ContextVar: Sets CURRENT_TENANT_ID_CONTEXTVAR
    API->>VespaIndex: Calls id_based_retrieval(filters)
    
    activate VespaIndex
    VespaIndex->>ContextVar: get_current_tenant_id()
    ContextVar-->>VespaIndex: tenant_id (from context)
    
    VespaIndex->>TenantState: Create TenantState(tenant_id, multitenant)
    
    alt id_based_retrieval - No Validation (Before PR)
        VespaIndex->>TenantState: Uses filters.tenant_id or empty string
        VespaIndex-->>VespaIndex: May have mismatched tenant_id
    end
    
    alt id_based_retrieval - With Context (After PR)
        VespaIndex->>VespaIndex: Validates tenant_id from context
        VespaIndex->>VespaDocIndex: Create with validated TenantState
    end
    
    VespaIndex->>VespaDocIndex: Create VespaDocumentIndex(tenant_state)
    activate VespaDocIndex
    VespaDocIndex->>VespaDocIndex: Uses validated tenant_state
    VespaDocIndex-->>VespaIndex: Returns InferenceChunk[]
    deactivate VespaDocIndex
    
    VespaIndex-->>API: Returns chunks with correct tenant isolation
    deactivate VespaIndex
Loading

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 1 file

@acaprau acaprau enabled auto-merge January 17, 2026 01:46
@acaprau acaprau added this pull request to the merge queue Jan 17, 2026
Merged via the queue into main with commit 00390c5 Jan 17, 2026
75 checks passed
@acaprau acaprau deleted the andrei/260116/1/opensearch/admin-retrieval-always-needs-tenant-id branch January 17, 2026 02:03
rohoswagger pushed a commit that referenced this pull request Jan 19, 2026
jessicasingh7 pushed a commit that referenced this pull request Jan 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants