A threshold defines the boundary between normal and abnormal metric behavior. When a metric crosses the threshold for the configured duration, Atatus creates an issue and notifies you.

Threshold levels

Atatus supports two threshold levels per rule — Warning and Critical — each evaluated independently.

| Level | Purpose |
| --- | --- |
| Critical | Immediate attention required. The metric has crossed a dangerous boundary. |
| Warning | Early signal. The metric is trending in a concerning direction but has not reached a critical state yet. |

Both levels can have different threshold values and are evaluated on every scheduler tick. For example, you might set Warning at 70% CPU and Critical at 90% CPU.

The Warning threshold is optional. If only a Critical threshold is configured, no Warning issues are created.
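
The independent evaluation of the two levels can be sketched as follows. This is an illustrative model, not Atatus's implementation: `breached_levels` and its parameters are hypothetical names, and the sketch assumes an above-style comparison on a single sample, ignoring duration.

```python
from typing import List, Optional


def breached_levels(value: float, critical: float,
                    warning: Optional[float] = None) -> List[str]:
    """Return every level whose threshold the sample exceeds.

    Hypothetical helper: each level is checked on its own, so a value
    above Critical is also above a lower Warning threshold.
    """
    levels = []
    # Warning is optional; without it, no Warning result can be produced.
    if warning is not None and value > warning:
        levels.append("warning")
    if value > critical:
        levels.append("critical")
    return levels


print(breached_levels(75, critical=90, warning=70))  # ['warning']
print(breached_levels(95, critical=90, warning=70))  # ['warning', 'critical']
print(breached_levels(75, critical=90))              # []
```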

Evaluation windows

The evaluation window (duration) controls how long a metric must breach the threshold before an alert triggers. Available windows:

| Duration | Use when |
| --- | --- |
| 5 minutes | Fast detection for critical metrics (error rates, availability) |
| 10 minutes | Balanced: filters brief spikes while catching sustained issues |
| 15 minutes | Good default for most performance metrics |
| 30 minutes | Reduces noise for metrics with natural variability |
| 60 minutes | Long-term trend detection (capacity, slow degradation) |

Longer durations reduce false positives but delay detection. Shorter durations catch issues faster but may trigger on transient spikes.
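
This trade-off can be made concrete with a small simulation. The sketch below is not Atatus code. It assumes one sample per minute, an above comparison, a rule that requires every minute in the window to breach, and re-evaluation every minute; `first_trigger_minute` is a hypothetical helper.

```python
from typing import Optional, Sequence


def first_trigger_minute(values: Sequence[float], threshold: float,
                         duration: int) -> Optional[int]:
    """Return the 1-based minute at which the rule first fires, or None.

    The rule fires once `duration` consecutive samples exceed `threshold`.
    """
    consecutive = 0
    for minute, value in enumerate(values, start=1):
        consecutive = consecutive + 1 if value > threshold else 0
        if consecutive >= duration:
            return minute
    return None


# Minutes 1-3 calm, 4-8 a short spike, 9-12 calm, 13-32 a sustained breach.
series = [50] * 3 + [95] * 5 + [50] * 4 + [95] * 20

print(first_trigger_minute(series, threshold=90, duration=5))   # 8
print(first_trigger_minute(series, threshold=90, duration=15))  # 27
```

In this sample, the 5-minute window fires on the short spike at minute 8, while the 15-minute window ignores the spike and fires only at minute 27, 15 minutes into the sustained breach.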

Time functions

The time function controls how violations within the evaluation window are counted.

| Function | Behavior | Best for |
| --- | --- | --- |
| all | Alert triggers only if the threshold is breached for every minute in the evaluation window | Sustained issues, such as CPU staying above 90% for 15 straight minutes |
| any | Alert triggers if the threshold is breached in at least one minute within the evaluation window | Critical metrics where even a single breach matters, such as an error rate spike |

Example: A 10-minute window with all requires 10 consecutive minutes of violation. The same window with any triggers on the first breaching minute.
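
The difference between the two time functions can be sketched with Python's built-in `all` and `any`. This is a minimal illustration, assuming one aggregated value per minute and an above comparison; `window_breached` is a hypothetical name, not an Atatus API.

```python
from typing import Sequence


def window_breached(per_minute_values: Sequence[float], threshold: float,
                    time_function: str) -> bool:
    """Apply the 'all' or 'any' time function to one evaluation window."""
    if not per_minute_values:
        # Empty windows are a no-data case and never trigger in this sketch.
        return False
    breaches = [value > threshold for value in per_minute_values]
    if time_function == "all":
        return all(breaches)
    if time_function == "any":
        return any(breaches)
    raise ValueError(f"unknown time function: {time_function!r}")


# Ten one-minute CPU samples with a single spike above 90%.
cpu = [72, 75, 71, 94, 73, 70, 74, 76, 72, 71]
print(window_breached(cpu, 90, "all"))  # False
print(window_breached(cpu, 90, "any"))  # True
```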

Operators

| Operator | Triggers when |
| --- | --- |
| above | Metric value > threshold |
| below | Metric value < threshold |
| equal | Metric value = threshold |

Most alerts use above (e.g., response time above 2 seconds, CPU above 90%). Use below for metrics where a drop is concerning (e.g., throughput drops below 100 requests/minute, or available replicas drop below the desired count).
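
One way to picture the three operators is as a lookup from operator name to comparison function. The mapping and the `is_violation` helper below are hypothetical and only mirror the table above. How Atatus compares floating-point values for equal is not stated, so the sketch uses exact equality.

```python
import operator

# Hypothetical mapping from operator name to comparison function.
OPERATORS = {
    "above": operator.gt,
    "below": operator.lt,
    "equal": operator.eq,
}


def is_violation(value: float, op: str, threshold: float) -> bool:
    """Return True when `value` violates `threshold` under the named operator."""
    return OPERATORS[op](value, threshold)


print(is_violation(2.4, "above", 2.0))  # True: response time above 2 seconds
print(is_violation(80, "below", 100))   # True: throughput below 100 requests/minute
print(is_violation(3, "equal", 3))      # True
print(is_violation(85, "above", 90))    # False
```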

No-data handling

If there are not enough data points within the evaluation window, the alert returns an indeterminate result and does not trigger. This prevents false positives when metrics are sparse or temporarily missing.

For infrastructure monitoring, the Host Not Reporting rule type specifically detects missing data — it triggers when a host stops sending metrics for the configured duration.
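
The indeterminate outcome for sparse windows can be modeled as a third result alongside triggered and not triggered. The sketch below is an assumption-laden illustration: the `Result` enum, the `evaluate` function, and the `min_points` parameter are hypothetical, since the text does not say how many data points count as enough.

```python
from enum import Enum
from typing import Optional, Sequence


class Result(Enum):
    OK = "ok"
    TRIGGERED = "triggered"
    INDETERMINATE = "indeterminate"


def evaluate(samples: Sequence[Optional[float]], threshold: float,
             min_points: int) -> Result:
    """Evaluate an 'above' rule that requires every present sample to breach.

    `None` marks a minute with no data. `min_points` is an assumed setting.
    """
    present = [s for s in samples if s is not None]
    if len(present) < min_points:
        return Result.INDETERMINATE  # too sparse: do not trigger
    if all(s > threshold for s in present):
        return Result.TRIGGERED
    return Result.OK


print(evaluate([95, None, None, None, None], threshold=90, min_points=5))
print(evaluate([95, 96, 97, 95, 94], threshold=90, min_points=5))
print(evaluate([95, 96, 80, 95, 94], threshold=90, min_points=5))
```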

How to choose threshold values

  1. Start generous, tighten over time. Set initial thresholds looser than you think necessary (higher for above rules, lower for below rules), then tighten them as you learn normal operating ranges.
  2. Use Warning for early awareness, Critical for action. Warning thresholds give you time to investigate before a situation becomes urgent.
  3. Check historical data. Look at your metric's normal range over the past week before setting thresholds. Set Warning at the upper end of normal and Critical well beyond it.
  4. Match duration to the metric's behavior. Metrics with natural variability (CPU, network throughput) need longer durations (15-30 min). Stable metrics (error rate, availability) can use shorter durations (5 min).
  5. Prefer all over any for most metrics. The all time function prevents false positives from momentary spikes. Use any only for metrics where a single breach is genuinely critical.
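
The advice to check historical data can be turned into a rough starting point with percentiles. The sketch below is only one possible heuristic for above rules on positive-valued metrics. The 95th percentile for Warning and the 25% headroom for Critical are arbitrary assumptions, and `suggest_thresholds` is a hypothetical helper that works on values you have exported yourself.

```python
import statistics
from typing import Sequence, Tuple


def suggest_thresholds(history: Sequence[float],
                       warning_percentile: int = 95,
                       critical_headroom: float = 1.25) -> Tuple[float, float]:
    """Suggest (warning, critical) thresholds from past metric values."""
    # n=100 yields 99 cut points; index p-1 is the p-th percentile.
    # method="inclusive" keeps the result within the observed range.
    cut_points = statistics.quantiles(history, n=100, method="inclusive")
    warning = cut_points[warning_percentile - 1]
    # Place Critical a fixed margin beyond the upper end of normal.
    critical = warning * critical_headroom
    return warning, critical


# Sample response times in seconds (made-up data for illustration).
response_times = [0.42, 0.51, 0.48, 0.95, 0.47, 0.53, 0.61, 0.44, 0.58, 0.49]
warning, critical = suggest_thresholds(response_times)
print(f"warning={warning:.2f}s critical={critical:.2f}s")
```

Treat the output as a first guess to review against your own dashboards, then adjust as you learn how the metric behaves.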

Examples

APM response time

Raise a Critical alert when web response time stays above 2 seconds for 5 consecutive minutes, with a Warning at 1 second.

| Setting | Value |
| --- | --- |
| Metric | Web Response Time |
| Operator | above |
| Warning threshold | 1 second |
| Critical threshold | 2 seconds |
| Duration | 5 minutes |
| Time function | all |

Infrastructure CPU

Raise a Critical alert when CPU usage exceeds 90% at any point in a 15-minute window, with a Warning at 80%.

| Setting | Value |
| --- | --- |
| Metric | CPU Used Percentage |
| Operator | above |
| Warning threshold | 80% |
| Critical threshold | 90% |
| Duration | 15 minutes |
| Time function | any |

Kubernetes replicas

Alert when the number of available replicas stays below the desired count (3 in this example) for 5 minutes.

| Setting | Value |
| --- | --- |
| Metric | Replicas Available (Deployment) |
| Operator | below |
| Critical threshold | 3 replicas |
| Duration | 5 minutes |
| Time function | all |
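
The three example rules above can be expressed as plain data and run against sample series to check that they behave as intended. The `Rule` class below is a hypothetical model, not an Atatus API. It assumes one value per minute, treats a short window as indeterminate, and reports only the most severe level breached.

```python
import operator
from dataclasses import dataclass
from typing import Optional, Sequence

COMPARE = {"above": operator.gt, "below": operator.lt, "equal": operator.eq}
COMBINE = {"all": all, "any": any}


@dataclass
class Rule:
    metric: str
    op: str
    critical: float
    duration_minutes: int
    time_function: str
    warning: Optional[float] = None

    def evaluate(self, per_minute: Sequence[float]) -> Optional[str]:
        """Return 'critical', 'warning', or None for the latest window."""
        window = per_minute[-self.duration_minutes:]
        if len(window) < self.duration_minutes:
            return None  # not enough data: indeterminate, do not trigger
        compare, combine = COMPARE[self.op], COMBINE[self.time_function]
        # Check the most severe level first.
        for level, threshold in (("critical", self.critical),
                                 ("warning", self.warning)):
            if threshold is not None and combine(
                    compare(v, threshold) for v in window):
                return level
        return None


apm = Rule("Web Response Time", "above", critical=2, warning=1,
           duration_minutes=5, time_function="all")
cpu = Rule("CPU Used Percentage", "above", critical=90, warning=80,
           duration_minutes=15, time_function="any")
k8s = Rule("Replicas Available (Deployment)", "below", critical=3,
           duration_minutes=5, time_function="all")

print(apm.evaluate([1.4, 1.6, 2.3, 1.8, 1.5]))  # warning
print(cpu.evaluate([60] * 14 + [93]))           # critical
print(k8s.evaluate([2, 2, 2, 2, 2]))            # critical
```

The APM series stays above 1 second for all five minutes but exceeds 2 seconds only once, so under the all time function it reaches Warning, not Critical. The single 93% CPU sample is enough for Critical under any.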