feat(supervisor): custom tolerations for scheduled runs by myftija · Pull Request #3297 · triggerdotdev/trigger.dev

myftija · 2026-03-30T10:33:04Z

Adds support for taint tolerations for scheduled runs. Useful for selectively tolerating taints on dedicated node pools.

The new KUBERNETES_SCHEDULED_RUN_TOLERATIONS env variable accepts a comma-separated list in the format key=value:effect (or key:effect for the Exists operator).

Drive-by: renames all KUBERNETES_SCHEDULE_* affinity env vars to KUBERNETES_SCHEDULED_RUN_* for clarity — this feature isn't used in production yet or published in a tagged image; the name change is fine.

changeset-bot · 2026-03-30T10:33:16Z

⚠️ No Changeset found

Latest commit: 450910e

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

coderabbitai · 2026-03-30T10:33:22Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 29c705a2-15ef-4eff-b2d0-633350f052a2

📥 Commits

Reviewing files that changed from the base of the PR and between 46521c5 and 450910e.

📒 Files selected for processing (1)

apps/supervisor/src/env.ts

📜 Recent review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)

GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
GitHub Check: sdk-compat / Node.js 20.20 (ubuntu-latest)
GitHub Check: sdk-compat / Node.js 22.12 (ubuntu-latest)
GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
GitHub Check: sdk-compat / Cloudflare Workers
GitHub Check: sdk-compat / Deno Runtime
GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
GitHub Check: typecheck / typecheck
GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
GitHub Check: sdk-compat / Bun Runtime

🧰 Additional context used

📓 Path-based instructions (6)

**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead

Files:

apps/supervisor/src/env.ts

**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use function declarations instead of default exports

Files:

apps/supervisor/src/env.ts

**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)

**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries

**/*.ts: Use typecheck to verify changes in apps and internal packages (apps/*, internal-packages/*), not build - building proves almost nothing about correctness
When writing Trigger.dev tasks, always import from @trigger.dev/sdk. Never use @trigger.dev/sdk/v3 or deprecated client.defineJob
Add crumbs as you write code - mark lines with // @Crumbs or wrap blocks in `// `#region` `@crumbs for agentcrumbs debug tracing, then strip before merge

Files:

apps/supervisor/src/env.ts

**/*.{js,ts,jsx,tsx,json,md,yaml,yml}

📄 CodeRabbit inference engine (AGENTS.md)

Format code using Prettier before committing

Files:

apps/supervisor/src/env.ts

apps/supervisor/src/env.ts

📄 CodeRabbit inference engine (apps/supervisor/CLAUDE.md)

Environment configuration should be defined in src/env.ts

Files:

apps/supervisor/src/env.ts

apps/**/*

📄 CodeRabbit inference engine (CLAUDE.md)

When modifying only server components (apps/webapp/, apps/supervisor/, etc.) with no package changes, add a .server-changes/ file instead of a changeset

Files:

apps/supervisor/src/env.ts

🧠 Learnings (8)

📚 Learning: 2026-03-02T12:42:47.652Z

Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: apps/supervisor/CLAUDE.md:0-0
Timestamp: 2026-03-02T12:42:47.652Z
Learning: Applies to apps/supervisor/src/env.ts : Environment configuration should be defined in `src/env.ts`

Applied to files:

apps/supervisor/src/env.ts

📚 Learning: 2026-03-27T18:11:55.459Z

Learnt from: nicktrn
Repo: triggerdotdev/trigger.dev PR: 3114
File: apps/supervisor/src/index.ts:83-98
Timestamp: 2026-03-27T18:11:55.459Z
Learning: In `apps/supervisor/src/index.ts`, `RESOURCE_MONITOR_ENABLED` (env var in `apps/supervisor/src/env.ts`) defaults to `false`. As a result, the local `ResourceMonitor`-based `maxResources`/`skipDequeue` gating in `preDequeue` is inactive in compute mode deployments. Do not flag local resource monitor usage in compute mode as a live bug; it has no practical impact unless `RESOURCE_MONITOR_ENABLED` is explicitly set to `true`.

Applied to files:

apps/supervisor/src/env.ts

📚 Learning: 2026-03-27T11:45:37.910Z

Learnt from: nicktrn
Repo: triggerdotdev/trigger.dev PR: 3114
File: apps/supervisor/src/workloadServer/index.ts:832-840
Timestamp: 2026-03-27T11:45:37.910Z
Learning: In `apps/supervisor/src/workloadManager/compute.ts` and the supervisor restore flow, `TRIGGER_METADATA_URL` does not need to be re-injected on VM restore because it is baked into the instance environment at creation time and the environment is preserved through snapshot/restore. The Kubernetes restore path follows the same pattern. Do not flag the absence of `TRIGGER_METADATA_URL` re-injection on restore as a bug.

Applied to files:

apps/supervisor/src/env.ts

📚 Learning: 2025-08-14T18:35:44.370Z

Learnt from: nicktrn
Repo: triggerdotdev/trigger.dev PR: 2390
File: apps/webapp/app/env.server.ts:764-765
Timestamp: 2025-08-14T18:35:44.370Z
Learning: The BoolEnv helper in apps/webapp/app/utils/boolEnv.ts uses z.preprocess with inconsistent default value types across the codebase - some usages pass boolean defaults (correct) while others pass string defaults (incorrect), leading to type confusion. The helper should enforce boolean-only defaults or have clearer documentation.

Applied to files:

apps/supervisor/src/env.ts

📚 Learning: 2026-03-21T21:23:35.117Z

Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3244
File: apps/webapp/app/components/runs/v3/ai/extractAISummarySpanData.ts:149-150
Timestamp: 2026-03-21T21:23:35.117Z
Learning: In `apps/webapp/app/components/runs/v3/ai/extractAISummarySpanData.ts` (and related AI span extraction files in `apps/webapp/app/components/runs/v3/ai/`), manual JSON.parse with typeof guards and type assertions is intentional. These functions are on the hot path for span rendering, so Zod validation is deliberately avoided for performance reasons. Do not suggest replacing manual JSON parsing with Zod schemas in these files.

Applied to files:

apps/supervisor/src/env.ts

📚 Learning: 2026-03-26T09:02:07.973Z

Learnt from: myftija
Repo: triggerdotdev/trigger.dev PR: 3274
File: apps/webapp/app/services/runsReplicationService.server.ts:922-924
Timestamp: 2026-03-26T09:02:07.973Z
Learning: In `triggerdotdev/trigger.dev`, `TaskRun.annotations` are always written atomically in one operation that conforms exactly to the `RunAnnotations` schema (from `trigger.dev/core/v3`). Using `RunAnnotations.safeParse` in `#parseAnnotations` (e.g., in `apps/webapp/app/services/runsReplicationService.server.ts`) is intentional and correct — there is no risk of partial/legacy annotation payloads causing schema mismatches, so suggesting a relaxed passthrough schema for this parsing is unnecessary.

Applied to files:

apps/supervisor/src/env.ts

📚 Learning: 2026-03-22T13:26:12.060Z

Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3244
File: apps/webapp/app/components/code/TextEditor.tsx:81-86
Timestamp: 2026-03-22T13:26:12.060Z
Learning: In the triggerdotdev/trigger.dev codebase, do not flag `navigator.clipboard.writeText(...)` calls for `missing-await`/`unhandled-promise` issues. These clipboard writes are intentionally invoked without `await` and without `catch` handlers across the project; keep that behavior consistent when reviewing TypeScript/TSX files (e.g., usages like in `apps/webapp/app/components/code/TextEditor.tsx`).

Applied to files:

apps/supervisor/src/env.ts

📚 Learning: 2026-03-22T19:24:14.403Z

Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3187
File: apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts:200-204
Timestamp: 2026-03-22T19:24:14.403Z
Learning: In the triggerdotdev/trigger.dev codebase, webhook URLs are not expected to contain embedded credentials/secrets (e.g., fields like `ProjectAlertWebhookProperties` should only hold credential-free webhook endpoints). During code review, if you see logging or inclusion of raw webhook URLs in error messages, do not automatically treat it as a credential-leak/secrets-in-logs issue by default—first verify the URL does not contain embedded credentials (for example, no username/password in the URL, no obvious secret/token query params or fragments). If the URL is credential-free per this project’s conventions, allow the logging.

Applied to files:

apps/supervisor/src/env.ts

🔇 Additional comments (3)

apps/supervisor/src/env.ts (3)

126-135: LGTM!

Reformatting to explicit .trim().min(1) chains improves readability while maintaining the same validation behavior.

148-165: LGTM!

The rename from KUBERNETES_SCHEDULE_* to KUBERNETES_SCHEDULED_RUN_* provides better clarity about the purpose of these variables.

167-223: Well-structured toleration parsing with proper validation.

The implementation correctly handles both key=value:effect and key:effect formats, validates effects against known Kubernetes taint effects, and rejects empty keys. The ctx.addIssue pattern allows collecting all validation errors in a single parse attempt.

One edge case to be aware of: key=:NoSchedule (empty value) is accepted and produces { key: "key", operator: "Equal", value: "", effect: "NoSchedule" }. This is technically valid in Kubernetes (matches taints with an empty value), but might indicate user error. Consider whether an empty value warning is warranted.

Walkthrough

Kubernetes schedule-affinity environment variables in the supervisor were renamed from KUBERNETES_SCHEDULE_* to KUBERNETES_SCHEDULED_RUN_*. A new optional KUBERNETES_SCHEDULED_RUN_TOLERATIONS env var was added; it is parsed from a comma-separated key=value:effect or key:effect format into Kubernetes toleration objects with Zod validation that rejects malformed entries and invalid effects. Zod schema calls for schedule-related labels were reformatted without changing validation. Pod creation now conditionally sets spec.tolerations for scheduled runs via a new helper, and schedule-related affinity logic was updated to use the renamed env vars.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Description check	❓ Inconclusive	The description explains the feature, format, and rationale, but omits required template sections like Testing, Changelog details, and the checklist confirmation.	Complete the PR description template by adding Testing section (test steps taken), expanding Changelog section with more detail, and confirming the checklist items were followed.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically describes the main feature being added: custom tolerations for scheduled runs in the supervisor module.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch scheduled-run-tolerations

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

feat(supervisor): custom tolerations for scheduled runs

2ed5794

This comment was marked as resolved.

Sign in to view

Do the parsing on startup instead of at run time

46521c5

This comment was marked as resolved.

Sign in to view

Add validation guard for empty keys

450910e

nicktrn approved these changes Mar 30, 2026

View reviewed changes

myftija merged commit 7210bde into main Mar 30, 2026
39 checks passed

myftija deleted the scheduled-run-tolerations branch March 30, 2026 11:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(supervisor): custom tolerations for scheduled runs#3297

feat(supervisor): custom tolerations for scheduled runs#3297
myftija merged 3 commits intomainfrom
scheduled-run-tolerations

myftija commented Mar 30, 2026

Uh oh!

changeset-bot bot commented Mar 30, 2026 •

edited

Loading

Uh oh!

coderabbitai bot commented Mar 30, 2026 •

edited

Loading

❌ Failed checks (1 inconclusive)

Uh oh!

This comment was marked as resolved.

Uh oh!

This comment was marked as resolved.

Uh oh!

This comment was marked as resolved.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

myftija commented Mar 30, 2026

Uh oh!

changeset-bot bot commented Mar 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

⚠️ No Changeset found

Uh oh!

coderabbitai bot commented Mar 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Estimated code review effort

❌ Failed checks (1 inconclusive)

Uh oh!

This comment was marked as resolved.

Uh oh!

This comment was marked as resolved.

Uh oh!

This comment was marked as resolved.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

changeset-bot bot commented Mar 30, 2026 •

edited

Loading

coderabbitai bot commented Mar 30, 2026 •

edited

Loading