SLO alerts
Overview
SLO alerts notify you when your service level objectives are at risk. Instead of alerting on raw metrics, SLO alerts track error budgets and burn rates to give you early warning when reliability is degrading, so you can respond before your users are affected.
SLO concepts
Service level indicator (SLI)
The SLI is the ratio of good events to total events, expressed as a percentage:
SLI = (good_events / total_events) * 100
For example, if your service handled 10,000 requests and 9,950 were successful, your SLI is 99.5%.
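The ratio above can be sketched in a few lines of Python. This is only an illustration: the function name `compute_sli` and the choice to return `None` when there are no events are assumptions, not part of the product.

```python
from typing import Optional


def compute_sli(good_events: int, total_events: int) -> Optional[float]:
    """Return the SLI as a percentage, or None when there is nothing to measure."""
    if total_events <= 0:
        # Assumption: with no events there is no meaningful ratio to report.
        return None
    return 100 * good_events / total_events


print(compute_sli(9_950, 10_000))  # 99.5
```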
Target
The target is the desired SLI percentage for a given time window. For example, a target of 99.9% means you aim for no more than 0.1% of events to be failures.
Error budget
The error budget represents how much unreliability you can tolerate within the SLO window:
error_budget = (1 - target / 100) * window_minutes
For a 99.9% target over 30 days (43,200 minutes), the error budget is 43.2 minutes of allowed downtime.
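As a quick check of the arithmetic, the formula can be evaluated directly. The function name `error_budget_minutes` is hypothetical and used only for this sketch.

```python
def error_budget_minutes(target_pct: float, window_minutes: int) -> float:
    """Minutes of unreliability allowed by a target over a window."""
    return (1 - target_pct / 100) * window_minutes


# 30 days = 43,200 minutes; 7 days = 10,080 minutes.
print(round(error_budget_minutes(99.9, 43_200), 1))  # 43.2
print(round(error_budget_minutes(99.9, 10_080), 2))  # 10.08
```

The result is rounded because `1 - 99.9 / 100` is not exactly `0.001` in floating point.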
Burn rate
The burn rate measures how fast you are consuming your error budget relative to the elapsed time in the window:
burn_rate = (error_budget_consumed_pct / 100) / elapsed_fraction
A burn rate of 1.0 means the budget is being consumed at exactly the rate that uses it up at the end of the window. A burn rate above 1.0 means the budget is being consumed faster than that, so the SLO will be breached before the window ends if the rate continues. A burn rate below 1.0 leaves budget to spare.
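A minimal sketch of the burn rate formula, assuming the elapsed time and the window length are both given in minutes. The function and parameter names are illustrative only.

```python
def burn_rate(budget_consumed_pct: float,
              elapsed_minutes: float,
              window_minutes: float) -> float:
    """Budget consumed so far, relative to the share of the window elapsed."""
    if elapsed_minutes <= 0 or window_minutes <= 0:
        raise ValueError("elapsed_minutes and window_minutes must be positive")
    elapsed_fraction = elapsed_minutes / window_minutes
    return (budget_consumed_pct / 100) / elapsed_fraction


# Half the budget is gone after a quarter of a 30-day window (7.5 days).
print(burn_rate(50, 10_800, 43_200))  # 2.0
```

A result of 2.0 means the budget would run out halfway through the window at the current pace.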
SLO window types
Rolling windows
Rolling windows look back a fixed duration from the current time:
| Window | Duration (minutes) |
|---|---|
| 7 days | 10,080 |
| 14 days | 20,160 |
| 28 days | 40,320 |
| 30 days | 43,200 |
| 90 days | 129,600 |
Calendar windows
Calendar windows align to calendar boundaries. The monthly and quarterly durations listed correspond to a 30-day month and a 90-day quarter:
| Window | Duration (minutes) |
|---|---|
| Weekly | 10,080 |
| Monthly | 43,200 |
| Quarterly | 129,600 |
SLO statuses
Each SLO is assigned one of the following statuses based on current performance:
| Status | Description |
|---|---|
| HEALTHY | SLI meets the target |
| AT_RISK | SLI is within the warning threshold or more than 75% of the error budget is consumed |
| BREACHED | SLI is below the target or 100% of the error budget is consumed |
| NO_DATA | Insufficient data to evaluate the SLO |
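The status rules can be read as a small decision function. The sketch below is one possible interpretation, not the product's implementation: the order in which rules are checked, the `warning_margin_pct` parameter, and its default value are all assumptions, since the size of the warning threshold is not stated here.

```python
from typing import Optional


def slo_status(sli_pct: Optional[float],
               target_pct: float,
               budget_consumed_pct: float,
               warning_margin_pct: float = 0.05) -> str:
    """Classify an SLO using the status rules in the table above."""
    if sli_pct is None:
        return "NO_DATA"
    if sli_pct < target_pct or budget_consumed_pct >= 100:
        return "BREACHED"
    # Assumption: "within the warning threshold" means the SLI sits no more
    # than warning_margin_pct percentage points above the target.
    if sli_pct - target_pct <= warning_margin_pct or budget_consumed_pct > 75:
        return "AT_RISK"
    return "HEALTHY"


print(slo_status(99.99, 99.9, 10))  # HEALTHY
print(slo_status(99.99, 99.9, 80))  # AT_RISK
print(slo_status(99.5, 99.9, 40))   # BREACHED
print(slo_status(None, 99.9, 0))    # NO_DATA
```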
Alert types
Burn rate alerts
Burn rate alerts fire when the rate of error budget consumption exceeds a threshold. These are useful for detecting fast-moving incidents that will exhaust your budget quickly.
The alert fires when:
burn_rate >= burn_rate_threshold
Budget consumed alerts
Budget consumed alerts fire when the total percentage of error budget consumed exceeds a threshold. These provide a direct measure of how much budget remains, regardless of the rate.
The alert fires when:
error_budget_consumed_pct >= consumed_percentage
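The two conditions are complementary, which a short sketch makes concrete. The function names are hypothetical; the comparisons mirror the two expressions above.

```python
def burn_rate_alert_fires(burn_rate: float, burn_rate_threshold: float) -> bool:
    """True when the budget is being consumed at or above the threshold rate."""
    return burn_rate >= burn_rate_threshold


def budget_consumed_alert_fires(error_budget_consumed_pct: float,
                                consumed_percentage: float) -> bool:
    """True when the consumed share of the budget reaches the threshold."""
    return error_budget_consumed_pct >= consumed_percentage


# Fast incident early in the window: high rate, little budget used so far.
print(burn_rate_alert_fires(14.4, 14), budget_consumed_alert_fires(5, 75))   # True False

# Slow leak late in the window: modest rate, most of the budget already gone.
print(burn_rate_alert_fires(1.2, 14), budget_consumed_alert_fires(80, 75))   # False True
```

The first case is caught only by the burn rate alert and the second only by the budget consumed alert, which is why the two are often configured together.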
Alert configuration
Schema fields
| Field | Type | Description |
|---|---|---|
| name | String | Alert name |
| sloId | ObjectId | Reference to the SLO definition |
| condition | String | Alert type: burn_rate, budget_remaining, compliance |
| operator | String | Comparison operator: greater_than, less_than |
| threshold | Number | Threshold value (default: 5) |
| timeWindow | String | Evaluation window: 1h, 6h, 12h, 24h, 7d, 30d |
| severity | String | Alert severity: info, warning, critical |
| enabled | Boolean | Whether the alert is active |
| channels | Array | Notification channel IDs to receive alerts |
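Before submitting a configuration, it can help to check it against the allowed values in the table. The following is a client-side sketch only, not the service's own validation: the function name `validate_alert`, the choice of which fields are treated as required, and the assumption that an ObjectId is a 24-character hexadecimal string are all illustrative.

```python
import re
from typing import Any, Dict, List

# Allowed values copied from the schema table above.
ALLOWED_VALUES = {
    "condition": {"burn_rate", "budget_remaining", "compliance"},
    "operator": {"greater_than", "less_than"},
    "timeWindow": {"1h", "6h", "12h", "24h", "7d", "30d"},
    "severity": {"info", "warning", "critical"},
}


def validate_alert(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means nothing was flagged."""
    errors: List[str] = []
    for field, allowed in ALLOWED_VALUES.items():
        value = config.get(field)
        if not isinstance(value, str) or value not in allowed:
            errors.append(f"{field}: expected one of {sorted(allowed)}, got {value!r}")
    name = config.get("name")
    if not isinstance(name, str) or not name:
        errors.append("name: expected a non-empty string")
    # Assumption: ObjectId values are 24-character hexadecimal strings.
    slo_id = config.get("sloId")
    if not isinstance(slo_id, str) or not re.fullmatch(r"[0-9a-fA-F]{24}", slo_id):
        errors.append("sloId: expected a 24-character hexadecimal string")
    # The table documents a default threshold of 5 when none is given.
    threshold = config.get("threshold", 5)
    if isinstance(threshold, bool) or not isinstance(threshold, (int, float)):
        errors.append("threshold: expected a number")
    if "enabled" in config and not isinstance(config["enabled"], bool):
        errors.append("enabled: expected a boolean")
    channels = config.get("channels", [])
    if not isinstance(channels, list) or not all(isinstance(c, str) for c in channels):
        errors.append("channels: expected an array of channel ID strings")
    return errors


alert = {
    "name": "API availability - fast burn",
    "sloId": "648a1b2c3d4e5f6a7b8c9d0e",
    "condition": "burn_rate",
    "operator": "greater_than",
    "threshold": 14,
    "timeWindow": "1h",
    "severity": "critical",
    "enabled": True,
    "channels": ["648a1b2c3d4e5f6a7b8c9d0f"],
}
print(validate_alert(alert))  # []
```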
Evaluation
SLO alerts are evaluated every minute via a scheduled cron job. SLOs are processed in batches of 10 to manage system load.
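The batching behaviour can be pictured with a small helper that splits a list of SLO identifiers into groups of 10. This is a sketch of the general technique, not the scheduler's code; the names `batches` and `BATCH_SIZE` and the placeholder IDs are assumptions.

```python
from typing import Iterator, List, Sequence

BATCH_SIZE = 10  # batch size stated above


def batches(items: Sequence[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Placeholder IDs for illustration only.
slo_ids = [f"slo-{i}" for i in range(25)]
print([len(batch) for batch in batches(slo_ids)])  # [10, 10, 5]
```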
Examples
Fast burn rate alert
Detect rapid error budget consumption that would exhaust the budget well before the SLO window ends:
{
  "name": "API availability - fast burn",
  "sloId": "648a1b2c3d4e5f6a7b8c9d0e",
  "condition": "burn_rate",
  "operator": "greater_than",
  "threshold": 14,
  "timeWindow": "1h",
  "severity": "critical",
  "enabled": true,
  "channels": ["648a1b2c3d4e5f6a7b8c9d0f"]
}
This fires a critical alert when the burn rate exceeds 14x over a one-hour window. At that rate, a 30-day error budget would be fully consumed in roughly two days (30 / 14 ≈ 2.1 days) if the trend continues.
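To see what a given burn rate implies, the time to exhaustion follows from the burn rate definition: a full budget lasts the window length divided by the burn rate. The sketch below assumes the budget is untouched when the burn starts; the function names and the 30-day and 7-day SLO windows are illustrative assumptions.

```python
def hours_to_exhaustion(window_days: float, burn_rate: float) -> float:
    """Hours until a full error budget is gone at a constant burn rate."""
    return window_days * 24 / burn_rate


def budget_pct_spent(alert_window_hours: float,
                     window_days: float,
                     burn_rate: float) -> float:
    """Share of the budget consumed during the alert's evaluation window."""
    return burn_rate * alert_window_hours / (window_days * 24) * 100


print(round(hours_to_exhaustion(30, 14), 1))  # 51.4
print(round(hours_to_exhaustion(7, 14), 1))   # 12.0
print(round(budget_pct_spent(1, 30, 14), 2))  # 1.94
```

For a 30-day SLO, a 14x burn sustained for the one-hour evaluation window has already used about 2% of the budget by the time the alert fires.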
Error budget exhaustion warning
Get an early warning when a significant portion of the error budget has been consumed:
{
  "name": "Checkout flow - budget warning",
  "sloId": "648a1b2c3d4e5f6a7b8c9d1a",
  "condition": "budget_remaining",
  "operator": "greater_than",
  "threshold": 75,
  "timeWindow": "30d",
  "severity": "warning",
  "enabled": true,
  "channels": ["648a1b2c3d4e5f6a7b8c9d1b"]
}
This fires a warning when more than 75% of the 30-day error budget has been consumed, giving the team time to investigate and course-correct before the SLO is breached.