A threshold defines the boundary between normal and abnormal metric behavior. When a metric crosses the threshold for the configured duration, Atatus creates an issue and notifies you.
Threshold levels
Atatus supports two threshold levels per rule — Warning and Critical — each evaluated independently.
| Level | Purpose |
|---|---|
| Critical | Immediate attention required. The metric has crossed a dangerous boundary. |
| Warning | Early signal. The metric is trending in a concerning direction but has not reached a critical state yet. |
Both levels can have different threshold values and are evaluated on every scheduler tick. For example, you might set Warning at 70% CPU and Critical at 90% CPU.
If only a Critical threshold is configured, no Warning issues are created. The Warning threshold is optional.
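As a minimal sketch of how two independently evaluated levels behave for an above-style rule, consider the following. The function name `evaluate_levels` and its return shape are hypothetical illustrations, not part of the Atatus API.

```python
from typing import Dict, Optional


def evaluate_levels(
    value: float,
    critical: float,
    warning: Optional[float] = None,
) -> Dict[str, bool]:
    """Check one metric value against each configured level separately.

    Hypothetical helper: assumes an 'above' operator for both levels.
    """
    # Critical is always evaluated.
    result = {"critical": value > critical}
    # Warning is optional; when it is not configured, it is never reported.
    if warning is not None:
        result["warning"] = value > warning
    return result


# Warning at 70% CPU, Critical at 90% CPU, current value 75%.
print(evaluate_levels(75.0, critical=90.0, warning=70.0))
# Only a Critical threshold configured.
print(evaluate_levels(95.0, critical=90.0))
```

Each level is checked on its own, so a value of 95% with both levels configured breaches Warning and Critical at the same time.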
Evaluation windows
The evaluation window (duration) is the period over which the metric is checked against the threshold. The time function then determines how much of that window must be in breach before an alert triggers. Available windows:
| Duration | Use when |
|---|---|
| 5 minutes | Fast detection for critical metrics (error rates, availability) |
| 10 minutes | Balanced — filters brief spikes while catching sustained issues |
| 15 minutes | Good default for most performance metrics |
| 30 minutes | Reduces noise for metrics with natural variability |
| 60 minutes | Long-term trend detection (capacity, slow degradation) |
Longer durations reduce false positives but delay detection. Shorter durations catch issues faster but may trigger on transient spikes.
Time functions
The time function controls how violations within the evaluation window are counted.
| Function | Behavior | Best for |
|---|---|---|
| all | Alert triggers only if the threshold is breached for every minute in the evaluation window | Sustained issues — CPU staying above 90% for 15 straight minutes |
| any | Alert triggers if the threshold is breached in at least one minute within the evaluation window | Critical metrics where even a single breach matters — error rate spike |
Example: A 10-minute window with all requires 10 consecutive minutes of violation. The same window with any triggers on the first breaching minute.
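The difference between the two time functions can be sketched in a few lines. This is an illustration only: `window_triggers` is a hypothetical name, and the input is assumed to be one boolean per minute of the evaluation window, where `True` means that minute breached the threshold.

```python
from typing import Sequence


def window_triggers(breaches: Sequence[bool], time_function: str) -> bool:
    """Decide whether a window of per-minute breach flags should alert.

    Hypothetical helper, not the Atatus implementation.
    """
    if not breaches:
        # An empty window is treated as "do not trigger" in this sketch.
        return False
    if time_function == "all":
        return all(breaches)
    if time_function == "any":
        return any(breaches)
    raise ValueError(f"unknown time function: {time_function!r}")


# A 10-minute window with a single breaching minute.
one_spike = [False] * 9 + [True]
print(window_triggers(one_spike, "all"))
print(window_triggers(one_spike, "any"))
```

With a single breaching minute, all does not trigger and any does. Only a window of ten breaching minutes satisfies all.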
Operators
| Operator | Triggers when |
|---|---|
| above | Metric value > threshold |
| below | Metric value < threshold |
| equal | Metric value = threshold |
Most alerts use above (e.g., response time above 2 seconds, CPU above 90%). Use below for metrics where a drop is concerning (e.g., throughput drops below 100 requests/minute, or available replicas drop below the desired count).
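The operator table maps directly onto ordinary comparisons. A minimal sketch, assuming the operator names from the table and a hypothetical `breaches` helper:

```python
import operator
from typing import Callable, Dict

# Operator names from the table, mapped to Python comparison functions.
OPERATORS: Dict[str, Callable[[float, float], bool]] = {
    "above": operator.gt,
    "below": operator.lt,
    "equal": operator.eq,
}


def breaches(value: float, op_name: str, threshold: float) -> bool:
    """Return True when `value` violates `threshold` under `op_name`."""
    return OPERATORS[op_name](value, threshold)


print(breaches(2.5, "above", 2.0))   # response time of 2.5 s against a 2 s threshold
print(breaches(80, "below", 100))    # throughput of 80 req/min against a 100 req/min floor
```

Note that above and below are strict comparisons here, so a value exactly equal to the threshold does not breach either of them.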
No-data handling
If there are not enough data points within the evaluation window, the alert returns an indeterminate result and does not trigger. This prevents false positives when metrics are sparse or temporarily missing.
For infrastructure monitoring, the Host Not Reporting rule type specifically detects missing data — it triggers when a host stops sending metrics for the configured duration.
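The indeterminate result for sparse data can be modelled as a third outcome alongside "trigger" and "do not trigger". The sketch below is an assumption-laden illustration: the `min_points` parameter is hypothetical (the minimum number of data points Atatus actually requires is not stated here), missing minutes are represented as `None`, and the rule uses above with the all time function.

```python
from typing import Optional, Sequence


def evaluate_window(
    values: Sequence[Optional[float]],
    threshold: float,
    min_points: int,
) -> Optional[bool]:
    """Evaluate an 'above' + 'all' rule, returning None when data is too sparse.

    `min_points` is a hypothetical knob used only for this illustration.
    """
    present = [v for v in values if v is not None]
    if len(present) < min_points:
        return None  # indeterminate: not enough data, so no alert
    return all(v > threshold for v in present)


print(evaluate_window([95, None, None, None, None], threshold=90, min_points=3))
print(evaluate_window([95, 96, 97, None, 99], threshold=90, min_points=3))
```

Returning `None` rather than `False` keeps "no breach" and "cannot tell" distinguishable for the caller.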
How to choose threshold values
- Start generous, tighten over time. Set initial thresholds higher than you think necessary, then lower them as you learn normal operating ranges.
- Use Warning for early awareness, Critical for action. Warning thresholds give you time to investigate before a situation becomes urgent.
- Check historical data. Look at your metric's normal range over the past week before setting thresholds. Set Warning at the upper end of normal and Critical well beyond it.
- Match duration to the metric's behavior. Metrics with natural variability (CPU, network throughput) need longer durations (15-30 min). Stable metrics (error rate, availability) can use shorter durations (5 min).
- Prefer all over any for most metrics. The all time function prevents false positives from momentary spikes. Use any only for metrics where a single breach is genuinely critical.
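One way to turn the "check historical data" advice into starting values is to derive them from a week of samples. The sketch below is only one possible heuristic: taking the 95th percentile as "the upper end of normal" and multiplying by a `headroom` factor for Critical are both assumptions, and `suggest_thresholds` is a hypothetical helper.

```python
import statistics
from typing import Sequence, Tuple


def suggest_thresholds(
    history: Sequence[float],
    headroom: float = 1.2,
) -> Tuple[float, float]:
    """Suggest (warning, critical) starting values for an 'above' rule.

    Assumptions: Warning is the 95th percentile of historical samples,
    and Critical is Warning scaled by `headroom`. Both are illustrative.
    """
    # quantiles(n=20) returns 19 cut points; the last is the 95th percentile.
    warning = statistics.quantiles(history, n=20)[-1]
    critical = warning * headroom
    return warning, critical


# Example: one sample per value from 1 to 100 standing in for real history.
warning, critical = suggest_thresholds(list(range(1, 101)))
print(f"warning={warning:.2f}, critical={critical:.2f}")
```

For bounded metrics such as percentages, cap the suggested Critical value at a sensible maximum, and treat the output as a starting point to tighten over time.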
Examples
APM response time
Alert when web response time exceeds 2 seconds for 5 consecutive minutes.
| Setting | Value |
|---|---|
| Metric | Web Response Time |
| Operator | above |
| Warning threshold | 1 second |
| Critical threshold | 2 seconds |
| Duration | 5 minutes |
| Time function | all |
Infrastructure CPU
Alert when CPU usage exceeds 90% at any point in a 15-minute window.
| Setting | Value |
|---|---|
| Metric | CPU Used Percentage |
| Operator | above |
| Warning threshold | 80% |
| Critical threshold | 90% |
| Duration | 15 minutes |
| Time function | any |
Kubernetes replicas
Alert when the number of available replicas stays below the desired count for 5 minutes.
| Setting | Value |
|---|---|
| Metric | Replicas Available (Deployment) |
| Operator | below |
| Critical threshold | 3 replicas |
| Duration | 5 minutes |
| Time function | all |