chore: bump sandbox cpu and memory limits #8208

Merged
wenxi-onyx merged 2 commits into main from whuang/craft-adjust-sandbox-resource-limits
Feb 5, 2026

wenxi-onyx commented Feb 5, 2026

Description

Note: there is no quota constraint on limits vs. requests.

sandbox-nodes-firewalled Nodegroup Resources

Total Nodegroup Capacity:

  • 5 nodes × 8 cores = 40 total CPU cores
  • 5 nodes × ~30 GB = ~150 GB total memory
  • Current utilization: ~1% CPU, 14-18% memory (all pods are synced and mostly dormant, with possibly some light use)

Per-Node Resources:

  • CPU Allocatable: 7.91 cores (7910m)
  • Memory Allocatable: ~29.2 GiB (30624976Ki)
  • Current CPU requests: 19-26% reserved per node
  • Available burst capacity: ~5.8-6.3 cores per node
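
The burst-capacity figure can be sanity-checked from the allocatable CPU and the reserved share. This is a minimal sketch that assumes the 19-26% reservation applies to the 7910m allocatable figure; the function name is illustrative only. The result lands roughly in line with the range quoted above.

```python
# Per-node CPU headroom from the figures quoted in the description.
ALLOCATABLE_MILLICORES = 7910  # CPU allocatable per node


def headroom_cores(allocatable_m: int, reserved_fraction: float) -> float:
    """Cores not already covered by existing CPU requests."""
    return allocatable_m * (1.0 - reserved_fraction) / 1000.0


for reserved in (0.19, 0.26):
    free = headroom_cores(ALLOCATABLE_MILLICORES, reserved)
    print(f"{reserved:.0%} reserved -> {free:.2f} cores free")
```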

Changes: file-sync Sidecar Resource Limits

Updated the file-sync sidecar container to enable faster S3 downloads during pod initialization:

Previous:

  • CPU limit: 1 core
  • Memory limit: 4 GB
  • Init time: ~22 minutes for 850 MB (145k files)

New:

  • CPU limit: 8 cores
  • Memory limit: 8 GB
  • Expected init time: ~4-6 minutes (4-5x faster)
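
As a rough check on the numbers above, the following sketch derives the implied throughput and speedup from the quoted init times. The constants come from the description; the helper names are illustrative. The average file is only about 6 KB, which suggests the sync is dominated by per-object overhead rather than raw bandwidth.

```python
# Throughput and speedup implied by the init times in the description.
SIZE_MB = 850
FILE_COUNT = 145_000
PREVIOUS_MINUTES = 22
EXPECTED_MINUTES = (4, 6)  # expected range after the bump


def throughput(minutes: float) -> tuple[float, float]:
    """Return (MB/s, files/s) for syncing the whole workspace in `minutes`."""
    seconds = minutes * 60
    return SIZE_MB / seconds, FILE_COUNT / seconds


def speedup(new_minutes: float) -> float:
    """Ratio of the previous init time to the new one."""
    return PREVIOUS_MINUTES / new_minutes


prev_mb_s, prev_files_s = throughput(PREVIOUS_MINUTES)
print(f"previous: {prev_mb_s:.2f} MB/s, {prev_files_s:.0f} files/s")
for minutes in EXPECTED_MINUTES:
    print(f"{minutes} min -> {speedup(minutes):.1f}x faster")
```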

Rationale:

  • Uses s5cmd with high parallelism (100+ concurrent workers)
  • Sidecar only bursts during initialization, then idles
  • Low CPU requests (500m) ensure efficient pod scheduling
  • Nodes are currently underutilized with ample burst capacity
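
For readers unfamiliar with s5cmd, the sketch below shows one way a parallel sync command might be assembled. The exact invocation used by the sidecar is not shown in this PR, so the bucket, prefix, destination path, and worker count here are all hypothetical; only the `--numworkers` global flag and the `sync` subcommand are taken from s5cmd itself.

```python
import shlex


def build_s5cmd_sync(bucket: str, prefix: str, dest: str, workers: int = 128) -> list[str]:
    """Build an s5cmd sync command line with an explicit worker count.

    All arguments are placeholders; the sidecar's real values are not in this PR.
    """
    source = f"s3://{bucket}/{prefix.strip('/')}/*"
    return ["s5cmd", "--numworkers", str(workers), "sync", source, dest]


# Hypothetical bucket, prefix, and destination, for illustration only.
cmd = build_s5cmd_sync("example-bucket", "tenants/demo", "/workspace/files/")
print(shlex.join(cmd))
```

With this many workers handling small objects, CPU time is a plausible bottleneck, which is consistent with raising the CPU limit rather than only the memory limit.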

How Has This Been Tested?

Additional Options

  • [Required] I have considered whether this PR needs to be cherry-picked to the latest beta branch.
  • [Optional] Override Linear Check

Summary by cubic

Increase Kubernetes sandbox pod resource limits to reduce CPU throttling and OOMs during heavier workloads.

Limits raised from 1000m CPU to 4000m and from 4Gi memory to 6Gi; requests increased from 50m CPU/128Mi to 250m CPU/256Mi.

Written for commit 47b06df. Summary will update on new commits.

wenxi-onyx requested a review from a team as a code owner February 5, 2026 23:04

cubic-dev-ai bot left a comment

No issues found across 1 file

greptile-apps bot commented Feb 5, 2026

Greptile Overview

Greptile Summary

This PR adjusts Kubernetes resource limits for the sandbox pod’s S3 sync sidecar container, increasing its CPU and memory limits. The change is localized to the pod spec construction in backend/onyx/server/features/build/sandbox/kubernetes/kubernetes_sandbox_manager.py and affects how sandboxes schedule and consume cluster resources under quota/LimitRange constraints.

Confidence Score: 4/5

  • This PR is likely safe to merge, but the new limits can break scheduling in quota-constrained clusters.
  • Change is a single resource-limit bump in the Kubernetes pod spec; main risk is operational (pods becoming unschedulable or crowding other workloads) depending on cluster ResourceQuota/LimitRange and autoscaling behavior.
  • backend/onyx/server/features/build/sandbox/kubernetes/kubernetes_sandbox_manager.py
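
The scheduling concern depends on how requests and limits are treated. The scheduler places pods by their requests, so a limit above a node's allocatable CPU does not by itself block scheduling, while a ResourceQuota or LimitRange on limits can still reject the pod. The sketch below compares the quantities quoted in this PR; the parser is a simplified, hypothetical helper that handles only the suffixes appearing here, not the full Kubernetes quantity grammar.

```python
# Simplified quantity parsing: only 'm' for CPU and Ki/Mi/Gi for memory.
_BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def parse_cpu(quantity: str) -> int:
    """Return millicores for quantities like '500m' or '8'."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def parse_memory(quantity: str) -> int:
    """Return bytes for quantities with a Ki/Mi/Gi suffix, or plain bytes."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


node_cpu = parse_cpu("7910m")            # per-node allocatable CPU
node_mem = parse_memory("30624976Ki")    # per-node allocatable memory
request_cpu = parse_cpu("500m")          # sidecar CPU request
limit_cpu = parse_cpu("8000m")           # new sidecar CPU limit
limit_mem = parse_memory("8Gi")          # new sidecar memory limit

# Requests (summed across pods on the node) are what scheduling is based on.
fits_by_request = request_cpu <= node_cpu
# Limits may overcommit a node; here the CPU limit is slightly above allocatable.
limit_exceeds_node = limit_cpu > node_cpu
print(fits_by_request, limit_exceeds_node, limit_mem <= node_mem)
```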

Important Files Changed

| Filename | Overview |
| --- | --- |
| backend/onyx/server/features/build/sandbox/kubernetes/kubernetes_sandbox_manager.py | Bumps the S3 sync sidecar container resource limits from 1000m/4Gi to 8000m/8Gi; no other logic changes. |
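
To make the change concrete, here is a minimal sketch of the shape of the container `resources` block before and after the bump. The actual code in kubernetes_sandbox_manager.py is not shown on this page, so the helper and constant names are hypothetical. Only the limits and the 500m CPU request are taken from the PR text; the memory request is not stated and is left out.

```python
# Values quoted in the PR; names below are illustrative, not from the codebase.
PREVIOUS_LIMITS = {"cpu": "1000m", "memory": "4Gi"}
NEW_LIMITS = {"cpu": "8000m", "memory": "8Gi"}


def resources_stanza(limits: dict[str, str], cpu_request: str) -> dict[str, dict[str, str]]:
    """Shape of a container `resources` block as it appears in a pod spec."""
    return {"requests": {"cpu": cpu_request}, "limits": dict(limits)}


new_resources = resources_stanza(NEW_LIMITS, cpu_request="500m")

# Which limit keys changed, as (previous, new) pairs.
diff = {
    key: (PREVIOUS_LIMITS[key], NEW_LIMITS[key])
    for key in NEW_LIMITS
    if PREVIOUS_LIMITS[key] != NEW_LIMITS[key]
}
print(new_resources)
print(diff)
```

A dictionary of this shape maps onto the `requests` and `limits` arguments of `V1ResourceRequirements` in the Kubernetes Python client, assuming that is how the manager builds its pod spec.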

Sequence Diagram

sequenceDiagram
  participant Caller as Build/Session Manager
  participant KSM as KubernetesSandboxManager
  participant K8s as Kubernetes API Server

  Caller->>KSM: provision(user/session)
  KSM->>K8s: Create Pod (sandbox + s3-sync sidecar)
  Note over KSM,K8s: PR bumps sidecar limits to 8000m CPU / 8Gi RAM
  K8s-->>KSM: Pod scheduled (or rejected by quota/LimitRange)
  Caller->>KSM: exec incremental sync / workspace ops
  KSM->>K8s: kubectl exec into sidecar/sandbox
  K8s-->>KSM: command output
  Caller-->>KSM: terminate()
  KSM->>K8s: Delete Pod

greptile-apps bot left a comment

1 file reviewed, 1 comment

wenxi-onyx enabled auto-merge February 5, 2026 23:25
wenxi-onyx disabled auto-merge February 5, 2026 23:32
wenxi-onyx merged commit ec4f85f into main Feb 5, 2026
81 checks passed
wenxi-onyx deleted the whuang/craft-adjust-sandbox-resource-limits branch February 5, 2026 23:32