Skip to content

fix: vertex prompt caching#7339

Merged
Weves merged 2 commits intomainfrom
fix/vertex-prompt-caching
Jan 11, 2026
Merged

fix: vertex prompt caching#7339
Weves merged 2 commits intomainfrom
fix/vertex-prompt-caching

Conversation

@evan-onyx
Copy link
Copy Markdown
Contributor

@evan-onyx evan-onyx commented Jan 11, 2026

Description

We were seeing a variety of errors when users tried to user vertex models through Onyx. Supporting vertex's explicit prompt caching (not allowed to pass the tool calls or system message) will take a while and will likely be tricky to get right. Hopefully they just make it more convenient in the future.

We were previously just optimistically doing what litellm does in the first code example in their docs:
https://docs.litellm.ai/docs/providers/vertex#context-caching

but it seems pretty clear at this point we need the heavier-weight version they describe below it (making explicit calls to the provider).

How Has This Been Tested?

gemini through vertex works

Additional Options

  • [Optional] Override Linear Check

Summary by cubic

Disabled Vertex prompt caching to prevent errors with Gemini via Vertex. Removed cache_control injection and will add explicit caching in a future update.

  • Bug Fixes

    • Stop transforming cacheable messages for Vertex (no cache_control added).
    • Avoids conflicts with tools and system messages during caching.
  • Dependencies

    • Marked fsevents as dev-only in package-lock.json.

Written for commit d1df258. Summary will update on new commits.

@evan-onyx evan-onyx requested a review from a team as a code owner January 11, 2026 00:02
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 2 files

@Weves Weves enabled auto-merge January 11, 2026 00:04
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Jan 11, 2026

Greptile Overview

Greptile Summary

This PR disables Vertex AI's explicit prompt caching to fix errors users were experiencing with Gemini models. The change moves from an optimistic caching approach (adding cache_control parameters) to relying on Vertex's implicit caching mechanism.

Key Changes:

  • Set transform_cacheable=None in prepare_messages_for_caching() to skip cache control transformations
  • Added explanatory comment noting that explicit caching with tools and system messages requires a more sophisticated implementation
  • Updated fsevents package metadata in package-lock.json (appears unrelated to the stated purpose)

Code Quality Issues:

  • The _add_vertex_cache_control function (lines 83-125) is now dead code and should be removed per coding standards
  • Contains a typo: "mechnism" should be "mechanism"
  • The package-lock.json change seems accidental and unrelated to the Vertex caching fix

Functional Impact:
The change correctly addresses the immediate problem by removing cache_control parameters that were conflicting with tools and system messages. Vertex AI will now handle caching implicitly, which should resolve the errors users were seeing. The approach aligns with how the OpenAI provider handles implicit caching (no transformation needed).

Confidence Score: 4/5

  • This PR is safe to merge with minor cleanup needed - the core fix is sound and addresses the reported issue effectively.
  • The functional change (setting transform_cacheable to None) is correct and follows the pattern used by the OpenAI provider for implicit caching. The approach appropriately defers complex explicit caching to a future PR. Score of 4 (not 5) due to: (1) dead code that should be removed per coding standards, (2) a spelling typo in comments, and (3) an apparently unrelated package-lock.json change that may be accidental. These are all minor style/cleanup issues that don't affect functionality.
  • backend/onyx/llm/prompt_cache/providers/vertex.py should have the unused _add_vertex_cache_control function removed (lines 83-125)

Important Files Changed

File Analysis

Filename Score Overview
backend/onyx/llm/prompt_cache/providers/vertex.py 3/5 Disabled Vertex prompt caching by setting transform_cacheable to None. Contains unused dead code (_add_vertex_cache_control function) and a typo in comment.
web/package-lock.json 4/5 Added "dev": true to fsevents package - appears unrelated to PR's stated purpose of fixing Vertex caching.

Sequence Diagram

sequenceDiagram
    participant Client
    participant VertexProvider
    participant prepare_messages_with_cacheable_transform
    participant LLM_API as Vertex AI API

    Note over VertexProvider: Before this PR
    Client->>VertexProvider: prepare_messages_for_caching(prefix, suffix)
    VertexProvider->>prepare_messages_with_cacheable_transform: transform_cacheable=_add_vertex_cache_control
    prepare_messages_with_cacheable_transform->>prepare_messages_with_cacheable_transform: Add cache_control to messages
    prepare_messages_with_cacheable_transform-->>VertexProvider: Messages with cache_control
    VertexProvider->>LLM_API: Send messages with cache_control
    LLM_API-->>VertexProvider: Error (conflicts with tools/system messages)

    Note over VertexProvider: After this PR
    Client->>VertexProvider: prepare_messages_for_caching(prefix, suffix)
    VertexProvider->>prepare_messages_with_cacheable_transform: transform_cacheable=None
    prepare_messages_with_cacheable_transform->>prepare_messages_with_cacheable_transform: No transformation
    prepare_messages_with_cacheable_transform-->>VertexProvider: Messages unchanged
    VertexProvider->>LLM_API: Send messages without cache_control
    LLM_API-->>VertexProvider: Success (implicit caching)
Loading

Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 files reviewed, 3 comments

Edit Code Review Agent Settings | Greptile

not at the message level. This function converts string content to the array format
and adds cache_control to the last content block in each cacheable message.
"""
# NOTE: unfortunately we need a much more sophisticated mechnism to support
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'mechnism' is mis-spelled - should be 'mechanism'

Suggested change
# NOTE: unfortunately we need a much more sophisticated mechnism to support
# NOTE: unfortunately we need a much more sophisticated mechanism to support
Prompt To Fix With AI
This is a comment left during a code review.
Path: backend/onyx/llm/prompt_cache/providers/vertex.py
Line: 92:92

Comment:
'mechnism' is mis-spelled - should be 'mechanism'

```suggestion
    # NOTE: unfortunately we need a much more sophisticated mechanism to support
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines 10310 to +10311
"version": "2.3.2",
"dev": true,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change appears unrelated to the PR's stated purpose of fixing Vertex prompt caching. The addition of "dev": true to the fsevents package is likely an accidental change from your local development environment.

Consider whether this change should be:

  1. Removed from this PR (if accidental)
  2. Included with explanation (if intentional)
  3. Moved to a separate PR focused on dependency management

fsevents is a macOS-specific file system event watcher that's typically an optional dependency. While marking it as dev-only may be correct, it's unclear why this change is bundled with a backend Python fix.

Prompt To Fix With AI
This is a comment left during a code review.
Path: web/package-lock.json
Line: 10310:10311

Comment:
This change appears unrelated to the PR's stated purpose of fixing Vertex prompt caching. The addition of `"dev": true` to the fsevents package is likely an accidental change from your local development environment.

Consider whether this change should be:
1. Removed from this PR (if accidental)
2. Included with explanation (if intentional)
3. Moved to a separate PR focused on dependency management

fsevents is a macOS-specific file system event watcher that's typically an optional dependency. While marking it as dev-only may be correct, it's unclear why this change is bundled with a backend Python fix.

How can I resolve this? If you propose a fix, please make it concise.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Jan 11, 2026

Additional Comments (1)

backend/onyx/llm/prompt_cache/providers/vertex.py
This function is now unused (since transform_cacheable is set to None on line 51) and should be removed entirely. According to the custom instruction "When hardcoding a boolean variable to a constant value, remove the variable entirely and clean up all places where it's used rather than just setting it to a constant" - the same principle applies here when setting a function parameter to a constant (None).

The function was previously called via transform_cacheable=_add_vertex_cache_control but is no longer used anywhere in the codebase. Keeping dead code adds maintenance burden and can be confusing for future developers.

# Remove lines 83-125 entirely - the function is no longer used

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: backend/onyx/llm/prompt_cache/providers/vertex.py
Line: 83:125

Comment:
This function is now unused (since `transform_cacheable` is set to `None` on line 51) and should be removed entirely. According to the custom instruction "When hardcoding a boolean variable to a constant value, remove the variable entirely and clean up all places where it's used rather than just setting it to a constant" - the same principle applies here when setting a function parameter to a constant (None).

The function was previously called via `transform_cacheable=_add_vertex_cache_control` but is no longer used anywhere in the codebase. Keeping dead code adds maintenance burden and can be confusing for future developers.

```suggestion
# Remove lines 83-125 entirely - the function is no longer used
```

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

@Weves Weves added this pull request to the merge queue Jan 11, 2026
Merged via the queue into main with commit 22138bb Jan 11, 2026
203 of 204 checks passed
@Weves Weves deleted the fix/vertex-prompt-caching branch January 11, 2026 00:28
jessicasingh7 pushed a commit that referenced this pull request Jan 12, 2026
Co-authored-by: Weves <chrisweaver101@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants