Datadog Integration
Use Datadog monitors as health check sources for your rollouts.
Overview
Kuberik can verify deployments by checking DatadogMonitor resource status:
flowchart LR
%% Styles
classDef datadog fill:#632CA6,stroke:#fff,stroke-width:2px,color:#fff
classDef kuberik fill:#4B4BE8,stroke:#fff,stroke-width:2px,color:#fff
classDef crd fill:#2D7E9D,stroke:#fff,stroke-width:2px,color:#fff
classDef boundary fill:#F3F4F6,stroke:#D1D5DB,stroke-width:1px,color:#374151,stroke-dasharray: 5 5
subgraph Datadog
MON[Monitor Status]:::datadog
end
subgraph Cluster
direction TB
DDM[DatadogMonitor CRD]:::crd
subgraph Kuberik
HC[HealthCheck]:::kuberik
RO[Rollout]:::kuberik
end
end
MON --> DDM
DDM --> HC
HC --> RO
class Kuberik boundary
If a Datadog monitor is alerting during bake time, the rollout is marked failed.
Setup
Create DatadogMonitor
Define a monitor that checks your application health:
datadog-monitor.yaml
apiVersion: datadoghq.com/v1alpha1
kind: DatadogMonitor
metadata:
  name: my-app-error-rate
  namespace: default
  annotations:
    # Enable as Kuberik health check
    kuberik.com/health-check: "true"
  labels:
    # Used by Rollout to select this check
    app: my-app
spec:
  name: "My App Error Rate"
  type: metric alert
  query: "avg(last_5m):sum:my_app.errors{env:production} > 10"
  message: "Error rate too high"
  tags:
    - "env:production"
    - "team:platform"
Apply it:
kubectl apply -f datadog-monitor.yaml
Connect to Rollout
Configure your Rollout to select Datadog monitors:
rollout.yaml
apiVersion: kuberik.com/v1alpha1
kind: Rollout
metadata:
  name: my-app
spec:
  releasesImagePolicy:
    name: my-app
  versionHistoryLimit: 5
  bakeTime: 10m
  # Select monitors with matching labels
  healthCheckSelector:
    selector:
      matchLabels:
        app: my-app
How It Works
During bake time:
- Kuberik finds DatadogMonitors annotated with kuberik.com/health-check: "true"
- Filters them by the healthCheckSelector labels
- Checks each monitor’s status
- If any monitor is in the Alert state → rollout fails
- If all monitors are OK → rollout proceeds
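The decision in the last two steps can be sketched as a small shell function. This is an illustration of the rule as described above, not Kuberik's actual implementation: the function name `bake_verdict` is hypothetical, and the handling of an empty selection and of states other than OK and Alert (for example Warn or No Data) is an assumption, since this page does not specify it.

```shell
# Sketch of the bake-time decision (not Kuberik's actual code).
# Arguments: the states of the monitors that passed both filters.
bake_verdict() {
  if [ "$#" -eq 0 ]; then
    echo "no-monitors"               # assumption: case not covered by the docs
    return 0
  fi
  verdict="proceed"
  for state in "$@"; do
    case "$state" in
      Alert) echo "failed"; return 0 ;;   # any alerting monitor fails the rollout
      OK) ;;                               # healthy, keep checking the rest
      *) verdict="undetermined" ;;         # assumption: e.g. Warn or "No Data"
    esac
  done
  echo "$verdict"
}

# Usage:
#   bake_verdict OK OK
#   bake_verdict OK Alert
```

Returning on the first Alert mirrors the "any monitor" wording: one alerting monitor is enough to fail the rollout, whatever the others report.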
Monitor Types
Kuberik supports all Datadog monitor types:
| Type | Use Case |
|---|---|
| metric alert | Error rates, latency thresholds |
| service check | Service availability |
| event alert | Error log patterns |
| process alert | Process health |
Example: Latency Monitor
latency-monitor.yaml
spec:
  name: "P99 Latency"
  type: metric alert
  query: "avg(last_5m):percentile(my_app.request.latency{env:production}, 0.99) > 500"
  message: "P99 latency exceeds 500ms"
Example: Error Rate Monitor
spec:
  name: "Error Rate"
  type: metric alert
  query: "sum(last_5m):sum:my_app.errors{env:production} / sum:my_app.requests{env:production} > 0.01"
  message: "Error rate exceeds 1%"
Best Practices
Use Bake-Specific Monitors
Create monitors specifically for deployment verification:
metadata:
  name: my-app-deploy-check
  annotations:
    kuberik.com/health-check: "true"
spec:
  query: "avg(last_2m):avg:my_app.startup.success{version:${version}} < 0.99"
Appropriate Thresholds
- Set thresholds that catch real issues
- Avoid flaky monitors that alert randomly
- Use last_5m or longer for stability
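One rough way to judge whether a monitor is too flaky to gate deployments is to count how often its state changes over a series of observations. The helper below is a minimal sketch; the name `count_flaps` is hypothetical, and it assumes you have collected the states yourself, oldest first (for example by reading the monitor's status at regular intervals).

```shell
# Hypothetical helper: count state changes in a sequence of observed
# monitor states (oldest first). Many changes in a short window suggest
# a flaky monitor that makes a poor deployment gate.
count_flaps() {
  flaps=0
  previous=""
  for state in "$@"; do
    if [ -n "$previous" ] && [ "$state" != "$previous" ]; then
      flaps=$((flaps + 1))
    fi
    previous="$state"
  done
  echo "$flaps"
}

# Usage:
#   count_flaps OK Alert OK OK Alert
```

A monitor that changes state several times within a single bake window is a candidate for a longer evaluation window or a looser threshold.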
Separate Labels
Use distinct labels for deployment checks vs. alerting:
labels:
  app: my-app
  purpose: deployment-gate  # Only select these for rollouts
Troubleshooting
Monitor not being checked
Verify that the annotation is present:
kubectl get datadogmonitor my-app-error-rate -o yaml
Check that the labels match the Rollout selector:
kubectl describe rollout my-app
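To narrow these checks down, the two helpers below wrap standard kubectl calls. They are a sketch rather than part of Kuberik: the function names are hypothetical, the namespace defaults to default as an assumption, and the monitor name and label in the usage comments are taken from the setup example.

```shell
# Hypothetical helpers wrapping standard kubectl calls.

# Print the value of the kuberik.com/health-check annotation (empty if unset).
# Dots inside the annotation key are escaped for kubectl's JSONPath.
health_check_annotation() {
  kubectl get datadogmonitor "$1" -n "${2:-default}" \
    -o jsonpath='{.metadata.annotations.kuberik\.com/health-check}'
}

# List the DatadogMonitors carrying a given label, with all labels shown,
# to compare against the Rollout's healthCheckSelector.
monitors_with_label() {
  kubectl get datadogmonitor -n "${2:-default}" -l "$1" --show-labels
}

# Usage (requires cluster access):
#   health_check_annotation my-app-error-rate default
#   monitors_with_label app=my-app default
```

If the first call prints nothing, the annotation is missing. If the second call does not list the monitor, its labels do not match the selector you passed.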
False positives
If monitors are too sensitive:
- Increase the evaluation window (last_5m → last_10m)
- Adjust thresholds
- Add no data handling