This runbook covers four cost anomaly scenarios: unexpected runtime spikes, budget exhaustion, runaway retries, and high-cost sessions. Use the Cloud Billing budget alerts as the primary entry point.
## 1. Detecting Cost Anomalies
### Budget alert channels

| Alert | Threshold | Action |
|---|---|---|
| gpu-budget-50pct | 50% of monthly budget consumed | Review trend; no immediate action |
| gpu-budget-90pct | 90% of monthly budget consumed | Notify Engineering and Finance |
| gpu-budget-exceeded | 100% consumed, or forecast to reach 120% | Activate containment (this runbook) |
| gpu-daily-spike | Daily spend > 2× 7-day rolling average | Investigate within 1 hour |
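If you need to check the gpu-daily-spike condition by hand, the sketch below compares yesterday's Vertex AI spend with the trailing 7-day average. It assumes the same billing export table as the manual cost review query that follows; the XXXXX suffix is a placeholder for your export table ID.

```bash
# Sketch: a spike means yesterday_cost > 2x avg_7d_cost
bq query --use_legacy_sql=false \
  --project=impulse-gpu-runtime \
  "WITH daily AS (
     SELECT DATE(usage_start_time) AS day, SUM(cost) AS day_cost
     FROM \`billing_export.gcp_billing_export_v1_XXXXX\`
     WHERE service.description = 'Vertex AI'
       AND project.id = 'impulse-gpu-runtime'
       AND DATE(usage_start_time) >= DATE_SUB(CURRENT_DATE(), INTERVAL 8 DAY)
     GROUP BY day)
   SELECT
     MAX(IF(day = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY), day_cost, NULL)) AS yesterday_cost,
     AVG(IF(day < DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY), day_cost, NULL)) AS avg_7d_cost
   FROM daily"
```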
### Manual cost review

```bash
# View GPU cost breakdown for the current month (BigQuery billing export)
bq query --use_legacy_sql=false \
  --project=impulse-gpu-runtime \
  "SELECT
     sku.description,
     SUM(cost) AS total_cost,
     SUM(usage.amount) AS total_usage,
     usage.unit
   FROM \`billing_export.gcp_billing_export_v1_XXXXX\`
   WHERE
     DATE(usage_start_time) >= DATE_TRUNC(CURRENT_DATE(), MONTH)
     AND service.description = 'Vertex AI'
     AND project.id = 'impulse-gpu-runtime'
   GROUP BY 1, 4
   ORDER BY 2 DESC"
```
## 2. Unexpected Runtime Spikes

**Symptoms:** GPU accelerator-seconds billed in a given hour are significantly higher than baseline; Cloud Monitoring shows `vertex_ai/custom_jobs/running_jobs` elevated.
### Investigate

```bash
# List all currently running Vertex AI custom jobs
gcloud ai custom-jobs list \
  --project=impulse-gpu-runtime \
  --region=us-central1 \
  --filter="state=JOB_STATE_RUNNING" \
  --format="table(name,displayName,createTime)"

# Check for jobs that have been running unusually long
# (note: date -d requires GNU date; the timestamp is quoted for the filter parser)
gcloud ai custom-jobs list \
  --project=impulse-gpu-runtime \
  --region=us-central1 \
  --filter="state=JOB_STATE_RUNNING AND createTime<'$(date -d '2 hours ago' -Iseconds)'" \
  --format="table(name,displayName,createTime)"
```
### Correlate with job submissions

```bash
# Count job submissions per hour for the last 24 hours (scheduler logs)
# --timestamps makes kubectl prepend an RFC3339 timestamp, so $1 is always
# the time; cut -c1-13 keeps YYYY-MM-DDTHH, i.e. one bucket per hour
kubectl logs -n gpu-runtime -l app=gpu-scheduler --since=24h --timestamps | \
  grep "job_submitted" | \
  awk '{print $1}' | cut -c1-13 | sort | uniq -c
```
### Contain

If you identify an unexpected surge of long-running jobs:

```bash
# Pause new intake immediately
kubectl set env deployment/gpu-scheduler \
  -n gpu-runtime \
  PAUSE_JOB_INTAKE=true
```
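`kubectl set env` triggers a rolling restart of the scheduler. As a quick sanity check before moving on to cancellations, you can read the flag back from the deployment spec:

```bash
# Confirm the intake pause flag landed on the scheduler deployment
kubectl get deployment gpu-scheduler -n gpu-runtime \
  -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="PAUSE_JOB_INTAKE")].value}'
```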
```bash
# Cancel jobs that exceed the expected maximum runtime (e.g. 2 hours)
gcloud ai custom-jobs list \
  --project=impulse-gpu-runtime \
  --region=us-central1 \
  --filter="state=JOB_STATE_RUNNING AND createTime<'$(date -d '2 hours ago' -Iseconds)'" \
  --format="value(name)" | \
while read -r job; do
  gcloud ai custom-jobs cancel "$job" \
    --project=impulse-gpu-runtime \
    --region=us-central1
done
```
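Once the surge is contained and the cancellations have drained, re-enable intake with the same mechanism used in the budget-exhaustion flow below:

```bash
# Resume normal intake after containment
kubectl set env deployment/gpu-scheduler \
  -n gpu-runtime \
  PAUSE_JOB_INTAKE=false
```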
## 3. Budget Exhaustion

**Symptoms:** The gpu-budget-exceeded alert fires; any programmatic budget enforcement wired to the alert may start restricting project spend; new Vertex AI jobs may be rejected.

GCP Budget alerts are informational by default and do not automatically stop resource usage. You must take the containment steps manually.
1. Pause job intake:

   ```bash
   kubectl set env deployment/gpu-scheduler \
     -n gpu-runtime \
     PAUSE_JOB_INTAKE=true
   ```

2. Cancel all non-critical running jobs (coordinate with Product):

   ```bash
   gcloud ai custom-jobs list \
     --project=impulse-gpu-runtime \
     --region=us-central1 \
     --filter="state=JOB_STATE_RUNNING" \
     --format="value(name)" | \
   while read -r job; do
     gcloud ai custom-jobs cancel "$job" \
       --project=impulse-gpu-runtime \
       --region=us-central1
   done
   ```

3. Notify Finance and Engineering with current spend, forecast, and containment actions taken (a query sketch for the month-to-date figure follows this list).
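For the notification, a month-to-date total can be pulled from the same billing export table used in section 1 (the XXXXX suffix is a placeholder, as before):

```bash
# Sketch: month-to-date Vertex AI spend for the Finance/Engineering notification
bq query --use_legacy_sql=false --format=csv \
  --project=impulse-gpu-runtime \
  "SELECT ROUND(SUM(cost), 2) AS month_to_date_usd
   FROM \`billing_export.gcp_billing_export_v1_XXXXX\`
   WHERE DATE(usage_start_time) >= DATE_TRUNC(CURRENT_DATE(), MONTH)
     AND service.description = 'Vertex AI'
     AND project.id = 'impulse-gpu-runtime'"
```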
### Request emergency budget increase

- Log into the GCP Console → Billing → Budgets & Alerts.
- Select the impulse-gpu-runtime budget.
- Click Edit and increase the budget amount.
- Notify the Finance team of the temporary increase and the expected overage.
### Resume job intake

Once Finance approves the temporary budget increase:

```bash
kubectl set env deployment/gpu-scheduler \
  -n gpu-runtime \
  PAUSE_JOB_INTAKE=false
```
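To confirm intake has actually resumed, watch for fresh submissions in the scheduler logs (the same job_submitted event used throughout this runbook):

```bash
# Expect new job_submitted events within a few minutes of unpausing
kubectl logs -n gpu-runtime -l app=gpu-scheduler --since=10m | \
  grep "job_submitted" | tail -5
```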
## 4. Runaway Retries

**Symptoms:** A single `job_id` or a small set of jobs appears repeatedly in the Vertex AI job history; the DLQ message count grows; billing for the same logical job is abnormally high.
### Identify runaway jobs

```bash
# Find job_ids with > 5 submission attempts in the last 24 hours
# (awk keeps only counts above the 5-attempt circuit-breaker threshold)
kubectl logs -n gpu-runtime -l app=gpu-scheduler --since=24h | \
  grep "job_submitted" | \
  jq -r '.job_id' | sort | uniq -c | sort -rn | awk '$1 > 5' | head -20
```
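Once a suspect `job_id` surfaces, pulling its recent scheduler log lines usually shows why each attempt is failing (exit codes, quota errors) before you block it:

```bash
# Review recent scheduler activity for one suspect job
kubectl logs -n gpu-runtime -l app=gpu-scheduler --since=24h | \
  grep "$JOB_ID" | tail -20
```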
### Block a specific job from further retries

```bash
# Mark the job as permanently failed to stop the retry loop
curl -s -X PATCH "https://api.impulselabs.ai/internal/gpu/jobs/$JOB_ID" \
  -H "Authorization: Bearer $IMPULSE_SERVICE_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"status": "FAILED", "error_code": "MAX_RETRIES_EXCEEDED", "retry_blocked": true}'
```
### Verify retry circuit-breaker configuration

The scheduler enforces a maximum of 5 total attempts per `job_id`. Verify this setting:

```bash
kubectl get configmap gpu-scheduler-config -n gpu-runtime -o yaml | \
  grep -E "max_retries|retry"
```
If `max_retries` is set higher than 5 or is missing, update the ConfigMap and restart the scheduler:

```bash
kubectl patch configmap gpu-scheduler-config -n gpu-runtime \
  --type=merge \
  -p '{"data":{"max_retries":"5"}}'
kubectl rollout restart deployment/gpu-scheduler -n gpu-runtime
```
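After patching, read the value back directly rather than grepping the full YAML, and wait for the restart to finish:

```bash
# Confirm the circuit-breaker value and wait for the rollout to complete
kubectl get configmap gpu-scheduler-config -n gpu-runtime \
  -o jsonpath='{.data.max_retries}'
kubectl rollout status deployment/gpu-scheduler -n gpu-runtime
```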
## 5. High-Cost Sessions

**Symptoms:** A small number of user sessions accounts for a disproportionate share of GPU spend; per-session cost exceeds configured caps.
### Identify high-cost sessions

```bash
# Query BigQuery billing for the top sessions this month
# (the UNNEST alias is named "label" to avoid shadowing the labels column)
bq query --use_legacy_sql=false \
  --project=impulse-gpu-runtime \
  "SELECT
     label.value AS session_id,
     SUM(cost) AS total_cost,
     COUNT(*) AS job_count
   FROM \`billing_export.gcp_billing_export_v1_XXXXX\`,
     UNNEST(labels) AS label
   WHERE
     label.key = 'session_id'
     AND DATE(usage_start_time) >= DATE_TRUNC(CURRENT_DATE(), MONTH)
     AND service.description = 'Vertex AI'
   GROUP BY 1
   ORDER BY 2 DESC
   LIMIT 20"
```
### Terminate and cap a high-cost session

```bash
# Get all running jobs for a session and cancel them
curl -s "https://api.impulselabs.ai/internal/gpu/sessions/$SESSION_ID/jobs?status=RUNNING" \
  -H "Authorization: Bearer $IMPULSE_SERVICE_TOKEN" | jq -r '.[].vertex_job_name' | \
while read -r vertex_job; do
  gcloud ai custom-jobs cancel "$vertex_job" \
    --project=impulse-gpu-runtime \
    --region=us-central1
done

# Set a per-session GPU spend cap (update billing enforcement config)
curl -s -X PATCH "https://api.impulselabs.ai/internal/gpu/sessions/$SESSION_ID/limits" \
  -H "Authorization: Bearer $IMPULSE_SERVICE_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"max_gpu_cost_usd": 50.0, "action_on_limit": "terminate"}'
```
### Long-term prevention

| Action | Owner | Timeline |
|---|---|---|
| Implement per-session GPU spend caps at the scheduler level | Backend Engineering | 2 weeks |
| Add session_id label to all Vertex AI custom job submissions | Backend Engineering | 1 week |
| Create per-customer spend anomaly detection alert in Cloud Monitoring | SRE | 1 week |
| Review and enforce max_runtime_seconds per job tier | Product + Backend | 1 week |
## Post-Incident Checklist