
fix: kubernetes freezing#7928

Merged
Weves merged 1 commit into main from try-and-fix-kubernetes-freezing on Jan 28, 2026


Weves commented Jan 28, 2026

Description

How Has This Been Tested?

Additional Options

  • [Required] I have considered whether this PR needs to be cherry-picked to the latest beta branch.
  • [Optional] Override Linear Check

Summary by cubic

Fixes Kubernetes sandbox freezes by separating REST and streaming API clients and preventing duplicate session loads in the web app. This stabilizes pod exec operations and stops “Handshake status 200 OK” errors.

  • Bug Fixes
    • Split Kubernetes ApiClient: REST for CRUD, dedicated client for stream/exec; route all exec calls through the streaming client so the WebSocket request patch applied during exec cannot leak into concurrent REST calls.
    • Set loadedSessionIdRef before async work in useBuildSessionController to stop re-entrant fetches and duplicate loads.

Written for commit 0651dc8. Summary will update on new commits.

Weves requested a review from a team as a code owner January 28, 2026 06:23
greptile-apps bot commented Jan 28, 2026

Greptile Overview

Greptile Summary

Fixed Kubernetes sandbox freezing by addressing two critical race conditions:

Backend (Kubernetes Manager)

  • Created separate ApiClient instances for REST operations (_rest_api_client) and streaming/exec operations (_stream_api_client)
  • Routed all 13 k8s_stream calls through the new _stream_core_api to prevent WebSocket patching from leaking into REST calls
  • Prevents "Handshake status 200 OK" errors caused by the kubernetes.stream.stream function monkey-patching the shared ApiClient
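A minimal TypeScript model of the leak described above may help. It assumes only the patch-and-restore behavior the summary describes: while an exec call is in flight, the streaming helper swaps the client's request function for a WebSocket transport and restores it afterwards, so any other caller using the same client during that window also goes out over WebSocket. A plain REST endpoint answers such a request with an ordinary 200 rather than a 101 Switching Protocols upgrade, which is consistent with the “Handshake status 200 OK” error. ApiClient, streamExec, and listPods are hypothetical names, not the Python client's API, and the concurrent caller is simulated with a callback so the interleaving is deterministic.

```typescript
// Hypothetical model of the shared-client leak; not the real Kubernetes client API.
type Transport = "rest" | "websocket";

// Stand-in for a generated API client: every call is funneled through `request`.
class ApiClient {
  request: (path: string) => Transport = () => "rest";
}

// Mirrors the patch-and-restore pattern: swap `request` for a WebSocket
// transport while the exec call runs, then restore the previous function.
function streamExec(
  client: ApiClient,
  concurrentWork: () => Transport,
): { exec: Transport; concurrent: Transport } {
  const previous = client.request;
  client.request = () => "websocket";
  try {
    // Models another caller doing REST work while the patch is active
    // (in the real service this would be another thread or request).
    const concurrent = concurrentWork();
    const exec = client.request("/exec");
    return { exec, concurrent };
  } finally {
    client.request = previous;
  }
}

// A plain CRUD call, such as listing pods.
function listPods(client: ApiClient): Transport {
  return client.request("/api/v1/pods");
}

// One shared client: the REST call made during exec picks up the patch.
const shared = new ApiClient();
const before = streamExec(shared, () => listPods(shared));

// Separate clients: the patch stays confined to the stream client.
const restClient = new ApiClient();
const streamClient = new ApiClient();
const after = streamExec(streamClient, () => listPods(restClient));

console.log(before.concurrent); // "websocket"
console.log(after.concurrent); // "rest"
```

Isolating the clients removes the leak without adding locks: the patch still happens, but nothing except exec traffic ever touches the patched object.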

Frontend (Session Controller)

  • Moved the loadedSessionIdRef.current = existingSessionId assignment so it runs before the async loadSession() call
  • Prevents duplicate session loads when the effect re-runs while still loading
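The ordering change can be sketched without React. In this hypothetical model, MutableRef stands in for the object returned by useRef, loadSession is any async loader, and two back-to-back calls simulate the effect re-running before the first load resolves. Because an async function runs synchronously up to its first await, only the variant that writes the ref before awaiting blocks the second call. loadClaimAfter, loadClaimBefore, and countLoads are illustrative names, not code from useBuildSessionController.ts.

```typescript
// Hypothetical stand-in for the object returned by React's useRef.
interface MutableRef<T> {
  current: T;
}

// Late claim: the ref is only written after the await, so a re-run that
// happens while the load is still in flight passes the guard again.
async function loadClaimAfter(
  ref: MutableRef<string | null>,
  sessionId: string,
  loadSession: (id: string) => Promise<void>,
): Promise<void> {
  if (ref.current === sessionId) return;
  await loadSession(sessionId);
  ref.current = sessionId;
}

// Early claim: the ref is written synchronously, before any await,
// so an overlapping re-run sees it and returns immediately.
async function loadClaimBefore(
  ref: MutableRef<string | null>,
  sessionId: string,
  loadSession: (id: string) => Promise<void>,
): Promise<void> {
  if (ref.current === sessionId) return;
  ref.current = sessionId;
  await loadSession(sessionId);
}

// Simulates an effect that re-runs before the first load resolves and
// counts how many times the loader was actually invoked.
function countLoads(loader: typeof loadClaimBefore): number {
  let calls = 0;
  const ref: MutableRef<string | null> = { current: null };
  const slowLoad = (_id: string): Promise<void> => {
    calls += 1;
    return new Promise<void>(() => {}); // stays pending, like a slow fetch
  };
  void loader(ref, "session-1", slowLoad);
  void loader(ref, "session-1", slowLoad);
  return calls;
}

console.log(countLoads(loadClaimAfter)); // 2 (duplicate load)
console.log(countLoads(loadClaimBefore)); // 1
```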

Confidence Score: 5/5

  • Safe to merge - targeted fixes for specific race conditions with clear root cause analysis
  • Both changes address well-understood race conditions with clean, isolated fixes that don't introduce new dependencies or complexity
  • No files require special attention

Important Files Changed

| Filename | Overview |
| --- | --- |
| backend/onyx/server/features/build/sandbox/kubernetes/kubernetes_sandbox_manager.py | Separated REST and streaming Kubernetes API clients to prevent WebSocket patching leakage; routed all exec calls through dedicated streaming client |
| web/src/app/craft/hooks/useBuildSessionController.ts | Set loadedSessionIdRef before async work to prevent race condition with duplicate session loads |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Frontend as useBuildSessionController
    participant Store as BuildSessionStore
    participant KubeManager as KubernetesSandboxManager
    participant RESTClient as REST ApiClient
    participant StreamClient as Stream ApiClient
    participant K8s as Kubernetes API

    Note over Frontend: User navigates to session
    Frontend->>Frontend: Check loadedSessionIdRef
    Frontend->>Frontend: Set loadedSessionIdRef BEFORE async
    Frontend->>Store: loadSession(sessionId)
    Note over Frontend: Race condition prevented:<br/>Subsequent re-runs see ref is set

    Note over KubeManager: Initialization
    KubeManager->>RESTClient: Create separate REST client
    KubeManager->>StreamClient: Create separate Stream client
    KubeManager->>RESTClient: Assign to _core_api, _batch_api, _networking_api
    KubeManager->>StreamClient: Assign to _stream_core_api

    Note over KubeManager: Regular CRUD operations
    KubeManager->>RESTClient: Standard API calls (list, get, create, delete)
    RESTClient->>K8s: HTTP REST requests
    K8s-->>RESTClient: REST responses

    Note over KubeManager: Exec/Stream operations
    KubeManager->>StreamClient: k8s_stream via _stream_core_api
    Note over StreamClient: kubernetes.stream.stream<br/>monkey-patches this client
    StreamClient->>K8s: WebSocket connection
    K8s-->>StreamClient: Streaming data

    Note over RESTClient,StreamClient: Clients isolated:<br/>WebSocket patch doesn't leak<br/>to REST operations
```

cubic-dev-ai bot left a comment


1 issue found across 2 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="web/src/app/craft/hooks/useBuildSessionController.ts">

<violation number="1" location="web/src/app/craft/hooks/useBuildSessionController.ts:131">
P2: Setting `loadedSessionIdRef` before `loadSession` completes can block retries if the load fails, because the guard treats the session as already loaded.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.


```ts
// Set ref BEFORE any async work to prevent duplicate calls
// if the effect re-runs while we're still loading
loadedSessionIdRef.current = existingSessionId;
```
cubic-dev-ai bot commented Jan 28, 2026


P2: Setting loadedSessionIdRef before loadSession completes can block retries if the load fails, because the guard treats the session as already loaded.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At web/src/app/craft/hooks/useBuildSessionController.ts, line 131:

<comment>Setting `loadedSessionIdRef` before `loadSession` completes can block retries if the load fails, because the guard treats the session as already loaded.</comment>

<file context>
@@ -126,20 +126,22 @@ export function useBuildSessionController({
 
+      // Set ref BEFORE any async work to prevent duplicate calls
+      // if the effect re-runs while we're still loading
+      loadedSessionIdRef.current = existingSessionId;
+
       // Access sessions via getState() to avoid dependency on Map reference
</file context>
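One way to address the review comment, sketched under the same kind of assumptions (a MutableRef standing in for useRef and an arbitrary async loadSession), is to keep the early claim but release it when the load rejects, so a later effect run can retry. This is a hypothetical sketch, not the code merged in this PR; ensureSessionLoaded is an illustrative name.

```typescript
// Hypothetical stand-in for the object returned by React's useRef.
interface MutableRef<T> {
  current: T;
}

// Claims the session before any async work (as in the PR), but releases
// the claim if loading fails. Returns null when the call is skipped.
function ensureSessionLoaded(
  ref: MutableRef<string | null>,
  sessionId: string,
  loadSession: (id: string) => Promise<void>,
): Promise<void> | null {
  // Already claimed by an earlier (possibly still running) effect run.
  if (ref.current === sessionId) return null;
  ref.current = sessionId;
  return loadSession(sessionId).catch((err: unknown) => {
    // Release the claim so a later run can retry, but only if no other
    // session has claimed the ref in the meantime.
    if (ref.current === sessionId) ref.current = null;
    throw err;
  });
}

// Usage: a failed load leaves the ref cleared.
const ref: MutableRef<string | null> = { current: null };
const failing = ensureSessionLoaded(ref, "session-1", () =>
  Promise.reject(new Error("network error")),
);
failing?.catch(() => {
  // The claim was released, so the next effect run would retry.
  console.log(ref.current); // null
});
```

The identity check in the catch handler avoids clearing a ref that a newer session has since claimed. Because the returned promise still rejects, the caller (for example, the effect) is expected to attach its own error handling.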

Weves merged commit 6a02ff9 into main Jan 28, 2026
81 of 83 checks passed
Weves deleted the try-and-fix-kubernetes-freezing branch January 28, 2026 06:32

Labels: None yet
Projects: None yet
2 participants