Skip to content

chore: Better enforcement of masking#7967

Merged
justin-tahara merged 14 commits intomainfrom
better-masking
Feb 10, 2026
Merged

chore: Better enforcement of masking#7967
justin-tahara merged 14 commits intomainfrom
better-masking

Conversation

@yuhongsun96
Copy link
Copy Markdown
Contributor

@yuhongsun96 yuhongsun96 commented Jan 29, 2026

Description

Standardizing masking of sensitive fields. This static time check now prevents developers from forgetting to mask fields as the value is unusuable and will now raise an error if not explicitly extracted with either a mask applied or not.

How Has This Been Tested?

Test setting/updating LLM keys
Test setting/updating web search keys
Test setting/updating connector credentials
Test Inference flow with tools

Additional Options

  • [Required] I have considered whether this PR needs to be cherry-picked to the latest beta branch.
  • [Optional] Override Linear Check

Summary by cubic

Enforces explicit masking for all secrets by wrapping encrypted columns with SensitiveValue. Backend code unwraps for internal use; API responses mask by default across connectors, OAuth, bots, KV, MCP, tools, and manage/federated APIs.

  • New Features

    • Introduced SensitiveValue with get_value(apply_mask=...). EncryptedString/EncryptedJson return SensitiveValue on reads and accept str/SensitiveValue on writes.
    • Centralized mask_string and mask_credential_dict in onyx.utils.encryption; removed legacy masking from server/utils.
    • Broad adoption with unwrapping before use: connectors and permission sync (GitHub/Gmail/Drive/Jira/SharePoint/Slack/Teams), OAuth token manager and status API, Slack/Discord bots (backend unwraps; manage APIs mask), KV store, MCP (extract_connection_data and tool headers), tools (web search/image/MCP), federated APIs/retrieval, and documents APIs.
    • DB helpers merge credential_json using unwrapped values and expire ORM instances after writes so columns reload as SensitiveValue.
    • Testing and typing: make_mock_sensitive_value helper, SensitiveValue unit/typing tests, and mypy-safe updates across models/APIs.
  • Migration

    • In backend code, unwrap with .get_value(apply_mask=False); use apply_mask=True for anything returned to clients.
    • Update tests/mocks to use make_mock_sensitive_value and expire ORM instances after writes when you need SensitiveValue reloaded.
    • Always pass unwrapped API keys/tokens to SDKs or external clients. No database schema changes.

Written for commit c6bb925. Summary will update on new commits.

@yuhongsun96 yuhongsun96 requested a review from a team as a code owner January 29, 2026 05:10
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Jan 29, 2026

Greptile Overview

Greptile Summary

This PR introduces a comprehensive static-time enforcement mechanism for masking sensitive fields by wrapping all encrypted database fields in a SensitiveValue class. The core change prevents developers from accidentally exposing sensitive data by making encrypted values unusable until explicitly unwrapped with either get_value(apply_mask=True) for API responses or get_value(apply_mask=False) for internal use.

Key Changes:

  • Added new SensitiveValue wrapper class that raises SensitiveAccessError on any direct access (str conversion, iteration, subscripting, JSON serialization)
  • Modified EncryptedString and EncryptedJson SQLAlchemy type decorators to return SensitiveValue instances instead of raw decrypted values
  • Updated 61 files across the codebase to explicitly unwrap sensitive values using get_value(apply_mask=False) for internal operations
  • Moved masking utilities from server/utils.py to encryption.py for centralized management
  • Added comprehensive test coverage for the new SensitiveValue class
  • Updated all type annotations from str | None to SensitiveValue[str] | None for encrypted fields

Impact:
The change standardizes credential handling across connectors, LLM providers, OAuth tokens, web search keys, MCP configurations, and Slack bot tokens. All backend code now explicitly declares whether it needs masked or unmasked values, preventing accidental credential leakage in logs or API responses.

Confidence Score: 5/5

  • This PR is safe to merge - it strengthens security by enforcing explicit masking decisions
  • The implementation is well-designed with comprehensive test coverage, consistent patterns across 61 files, and proper null safety checks. The static-time enforcement prevents accidental credential exposure at compile/runtime rather than relying on developer discipline.
  • No files require special attention - all changes follow consistent patterns

Important Files Changed

Filename Overview
backend/onyx/utils/sensitive.py New SensitiveValue wrapper class that enforces explicit masking decisions, preventing accidental exposure of sensitive data
backend/onyx/db/models.py Updated EncryptedString/EncryptedJson type decorators to return SensitiveValue wrappers, updated all encrypted field type annotations
backend/onyx/server/documents/models.py Updated CredentialSnapshot to handle SensitiveValue and apply masking based on MASK_CREDENTIAL_PREFIX flag
backend/onyx/server/manage/llm/api.py Updated LLM provider operations to unwrap SensitiveValue with get_value(apply_mask=False) for API key access
backend/onyx/server/features/mcp/api.py Updated MCP connection config handling to unwrap SensitiveValue using extract_connection_data helper and get_value
backend/onyx/server/federated/api.py Added null checks for credentials and unwrapped SensitiveValue across all federated connector endpoints

Sequence Diagram

sequenceDiagram
    participant API as API Endpoint
    participant DB as Database Layer
    participant Model as SQLAlchemy Model
    participant SV as SensitiveValue
    participant Enc as Encryption Utils
    participant Consumer as Backend Consumer

    Note over API,Consumer: Writing Sensitive Data Flow
    API->>Model: Set credential_json = {"api_key": "secret"}
    Model->>Enc: encrypt_string_to_bytes(json_str)
    Enc-->>Model: encrypted_bytes
    Model->>DB: Store encrypted_bytes
    
    Note over API,Consumer: Reading Sensitive Data Flow (API Response)
    DB->>Model: Load encrypted_bytes
    Model->>SV: Create SensitiveValue(encrypted_bytes, decrypt_fn)
    SV-->>Model: SensitiveValue wrapper
    Model-->>API: credential.credential_json (SensitiveValue)
    API->>SV: get_value(apply_mask=True)
    SV->>Enc: mask_credential_dict(decrypted_value)
    Enc-->>SV: masked_value
    SV-->>API: Return masked credentials
    API-->>API: Send to client (safe)
    
    Note over API,Consumer: Reading Sensitive Data Flow (Backend Use)
    DB->>Model: Load encrypted_bytes
    Model->>SV: Create SensitiveValue(encrypted_bytes, decrypt_fn)
    SV-->>Model: SensitiveValue wrapper
    Model-->>Consumer: credential.credential_json (SensitiveValue)
    Consumer->>SV: get_value(apply_mask=False)
    SV->>Enc: decrypt_bytes_to_string(encrypted_bytes)
    Enc-->>SV: decrypted_value
    SV-->>Consumer: Return raw credentials
    Consumer-->>Consumer: Use for API calls
    
    Note over API,Consumer: Error Prevention
    API->>SV: str(sensitive_value) ❌
    SV-->>API: SensitiveAccessError
    API->>SV: sensitive_value["key"] ❌
    SV-->>API: SensitiveAccessError
Loading

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 61 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="backend/onyx/tools/tool_implementations/mcp/mcp_tool.py">

<violation number="1" location="backend/onyx/tools/tool_implementations/mcp/mcp_tool.py:147">
P2: Filter denylisted headers from connection config before updating `headers`; otherwise a user-supplied Host header can still bypass the denylist.

(Based on your team's feedback about denying Host headers in MCP headers.) [FEEDBACK_USED]</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Copy link
Copy Markdown
Contributor

@evan-onyx evan-onyx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sure u fix tests before merging but approach looks fine

@justin-tahara justin-tahara force-pushed the better-masking branch 2 times, most recently from ce9227f to 9358a84 Compare February 7, 2026 01:15
@justin-tahara justin-tahara added this pull request to the merge queue Feb 10, 2026
Merged via the queue into main with commit bb5c221 Feb 10, 2026
81 checks passed
@justin-tahara justin-tahara deleted the better-masking branch February 10, 2026 00:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants