chore: Better enforcement of masking by yuhongsun96 · Pull Request #7967 · onyx-dot-app/onyx

yuhongsun96 · 2026-01-29T05:10:02Z

Description

Standardizing masking of sensitive fields. This static time check now prevents developers from forgetting to mask fields as the value is unusuable and will now raise an error if not explicitly extracted with either a mask applied or not.

How Has This Been Tested?

Test setting/updating LLM keys
Test setting/updating web search keys
Test setting/updating connector credentials
Test Inference flow with tools

Additional Options

[Required] I have considered whether this PR needs to be cherry-picked to the latest beta branch.
[Optional] Override Linear Check

Summary by cubic

Enforces explicit masking for all secrets by wrapping encrypted columns with SensitiveValue. Backend code unwraps for internal use; API responses mask by default across connectors, OAuth, bots, KV, MCP, tools, and manage/federated APIs.

New Features
- Introduced SensitiveValue with get_value(apply_mask=...). EncryptedString/EncryptedJson return SensitiveValue on reads and accept str/SensitiveValue on writes.
- Centralized mask_string and mask_credential_dict in onyx.utils.encryption; removed legacy masking from server/utils.
- Broad adoption with unwrapping before use: connectors and permission sync (GitHub/Gmail/Drive/Jira/SharePoint/Slack/Teams), OAuth token manager and status API, Slack/Discord bots (backend unwraps; manage APIs mask), KV store, MCP (extract_connection_data and tool headers), tools (web search/image/MCP), federated APIs/retrieval, and documents APIs.
- DB helpers merge credential_json using unwrapped values and expire ORM instances after writes so columns reload as SensitiveValue.
- Testing and typing: make_mock_sensitive_value helper, SensitiveValue unit/typing tests, and mypy-safe updates across models/APIs.
Migration
- In backend code, unwrap with .get_value(apply_mask=False); use apply_mask=True for anything returned to clients.
- Update tests/mocks to use make_mock_sensitive_value and expire ORM instances after writes when you need SensitiveValue reloaded.
- Always pass unwrapped API keys/tokens to SDKs or external clients. No database schema changes.

^{Written for commit c6bb925. Summary will update on new commits.}

greptile-apps · 2026-01-29T05:13:34Z

Greptile Overview

Greptile Summary

This PR introduces a comprehensive static-time enforcement mechanism for masking sensitive fields by wrapping all encrypted database fields in a SensitiveValue class. The core change prevents developers from accidentally exposing sensitive data by making encrypted values unusable until explicitly unwrapped with either get_value(apply_mask=True) for API responses or get_value(apply_mask=False) for internal use.

Key Changes:

Added new SensitiveValue wrapper class that raises SensitiveAccessError on any direct access (str conversion, iteration, subscripting, JSON serialization)
Modified EncryptedString and EncryptedJson SQLAlchemy type decorators to return SensitiveValue instances instead of raw decrypted values
Updated 61 files across the codebase to explicitly unwrap sensitive values using get_value(apply_mask=False) for internal operations
Moved masking utilities from server/utils.py to encryption.py for centralized management
Added comprehensive test coverage for the new SensitiveValue class
Updated all type annotations from str | None to SensitiveValue[str] | None for encrypted fields

Impact:
The change standardizes credential handling across connectors, LLM providers, OAuth tokens, web search keys, MCP configurations, and Slack bot tokens. All backend code now explicitly declares whether it needs masked or unmasked values, preventing accidental credential leakage in logs or API responses.

Confidence Score: 5/5

This PR is safe to merge - it strengthens security by enforcing explicit masking decisions
The implementation is well-designed with comprehensive test coverage, consistent patterns across 61 files, and proper null safety checks. The static-time enforcement prevents accidental credential exposure at compile/runtime rather than relying on developer discipline.
No files require special attention - all changes follow consistent patterns

Important Files Changed

Filename	Overview
backend/onyx/utils/sensitive.py	New SensitiveValue wrapper class that enforces explicit masking decisions, preventing accidental exposure of sensitive data
backend/onyx/db/models.py	Updated EncryptedString/EncryptedJson type decorators to return SensitiveValue wrappers, updated all encrypted field type annotations
backend/onyx/server/documents/models.py	Updated CredentialSnapshot to handle SensitiveValue and apply masking based on MASK_CREDENTIAL_PREFIX flag
backend/onyx/server/manage/llm/api.py	Updated LLM provider operations to unwrap SensitiveValue with get_value(apply_mask=False) for API key access
backend/onyx/server/features/mcp/api.py	Updated MCP connection config handling to unwrap SensitiveValue using extract_connection_data helper and get_value
backend/onyx/server/federated/api.py	Added null checks for credentials and unwrapped SensitiveValue across all federated connector endpoints

Sequence Diagram

sequenceDiagram
    participant API as API Endpoint
    participant DB as Database Layer
    participant Model as SQLAlchemy Model
    participant SV as SensitiveValue
    participant Enc as Encryption Utils
    participant Consumer as Backend Consumer

    Note over API,Consumer: Writing Sensitive Data Flow
    API->>Model: Set credential_json = {"api_key": "secret"}
    Model->>Enc: encrypt_string_to_bytes(json_str)
    Enc-->>Model: encrypted_bytes
    Model->>DB: Store encrypted_bytes
    
    Note over API,Consumer: Reading Sensitive Data Flow (API Response)
    DB->>Model: Load encrypted_bytes
    Model->>SV: Create SensitiveValue(encrypted_bytes, decrypt_fn)
    SV-->>Model: SensitiveValue wrapper
    Model-->>API: credential.credential_json (SensitiveValue)
    API->>SV: get_value(apply_mask=True)
    SV->>Enc: mask_credential_dict(decrypted_value)
    Enc-->>SV: masked_value
    SV-->>API: Return masked credentials
    API-->>API: Send to client (safe)
    
    Note over API,Consumer: Reading Sensitive Data Flow (Backend Use)
    DB->>Model: Load encrypted_bytes
    Model->>SV: Create SensitiveValue(encrypted_bytes, decrypt_fn)
    SV-->>Model: SensitiveValue wrapper
    Model-->>Consumer: credential.credential_json (SensitiveValue)
    Consumer->>SV: get_value(apply_mask=False)
    SV->>Enc: decrypt_bytes_to_string(encrypted_bytes)
    Enc-->>SV: decrypted_value
    SV-->>Consumer: Return raw credentials
    Consumer-->>Consumer: Use for API calls
    
    Note over API,Consumer: Error Prevention
    API->>SV: str(sensitive_value) ❌
    SV-->>API: SensitiveAccessError
    API->>SV: sensitive_value["key"] ❌
    SV-->>API: SensitiveAccessError

cubic-dev-ai

1 issue found across 61 files

Prompt for AI agents (all issues)


Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="backend/onyx/tools/tool_implementations/mcp/mcp_tool.py">

<violation number="1" location="backend/onyx/tools/tool_implementations/mcp/mcp_tool.py:147">
P2: Filter denylisted headers from connection config before updating `headers`; otherwise a user-supplied Host header can still bypass the denylist.

(Based on your team's feedback about denying Host headers in MCP headers.) [FEEDBACK_USED]</violation>
</file>

_{Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.}

backend/onyx/tools/tool_implementations/mcp/mcp_tool.py

evan-onyx

make sure u fix tests before merging but approach looks fine

backend/onyx/db/mcp.py

backend/onyx/db/models.py

yuhongsun96 requested a review from a team as a code owner January 29, 2026 05:10

cubic-dev-ai bot reviewed Jan 29, 2026

View reviewed changes

backend/onyx/tools/tool_implementations/mcp/mcp_tool.py Show resolved Hide resolved

evan-onyx approved these changes Jan 29, 2026

View reviewed changes

yuhongsun96 force-pushed the better-masking branch from ad8dbab to 6ce1c4a Compare January 29, 2026 18:12

yuhongsun96 commented Jan 29, 2026

View reviewed changes

backend/onyx/db/mcp.py Show resolved Hide resolved

justin-tahara reviewed Feb 6, 2026

View reviewed changes

backend/onyx/db/models.py Show resolved Hide resolved

justin-tahara force-pushed the better-masking branch 2 times, most recently from ce9227f to 9358a84 Compare February 7, 2026 01:15

yuhongsun96 and others added 12 commits February 6, 2026 21:21

k

5e8b3db

k

a79d38c

noqa

c51e686

Fixing all tests

1ce6f33

Try to fix the remaining tests

e3e8965

Fix oauth token test

6fe076e

Trying to fix playwright

10e8f87

Trying to fix the oauth playwright test

46b0609

fixing playwright oauth

fc287b9

merge conflicts

7c479fb

One more time

ceeb8bd

fix test

11d4036

justin-tahara force-pushed the better-masking branch from e8ee796 to 11d4036 Compare February 7, 2026 05:23

justin-tahara added 2 commits February 6, 2026 21:28

fixing mypy

c840583

Let's try this one then

c6bb925

justin-tahara added this pull request to the merge queue Feb 10, 2026

Merged via the queue into main with commit bb5c221 Feb 10, 2026
81 checks passed

justin-tahara deleted the better-masking branch February 10, 2026 00:48

greptile-apps bot mentioned this pull request Feb 10, 2026

fix(embedding): Updating Masking Logic #8301

Merged

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: Better enforcement of masking#7967

chore: Better enforcement of masking#7967
justin-tahara merged 14 commits intomainfrom
better-masking

yuhongsun96 commented Jan 29, 2026 •

edited by cubic-dev-ai bot

Loading

Uh oh!

greptile-apps bot commented Jan 29, 2026

Uh oh!

cubic-dev-ai bot left a comment

Uh oh!

Uh oh!

evan-onyx left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

yuhongsun96 commented Jan 29, 2026 • edited by cubic-dev-ai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

How Has This Been Tested?

Additional Options

Summary by cubic

Uh oh!

greptile-apps bot commented Jan 29, 2026

Greptile Overview

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

evan-onyx left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

yuhongsun96 commented Jan 29, 2026 •

edited by cubic-dev-ai bot

Loading