Kubernetes alerts

Atatus Kubernetes alerts monitor the health and performance of your Kubernetes clusters. You can alert on CPU and memory usage for pods, containers, and nodes, track replica availability for deployments and statefulsets, detect pod restart loops, and monitor storage capacity.

Available metrics

Pod metrics

Metric	Summary Function	Unit
CPU Usage	average	millicore
Memory Usage	average	MB
Network Received	average	MB
Network Transmitted	average	MB
Pod Restart Count	average	—

Container metrics

Metric	Summary Function	Unit
CPU Usage	average	millicore
Memory Usage	average	MB
Container Restart Count	average	—

CronJob metrics

Metric	Summary Function	Unit
Active Count	average	—

DaemonSet metrics

Metric	Summary Function	Unit
Replicas Available	average	replicas
Replicas Desired	average	replicas

Deployment metrics

Metric	Summary Function	Unit
Replicas Available	average	replicas
Replicas Desired	average	replicas

Job metrics

Metric	Summary Function	Unit
Pods Succeeded	average	pods
Pods Active	average	—
Pods Failed	average	pods

Node metrics

Metric	Summary Function	Unit
CPU Usage	average	millicore
Memory Usage	average	GB

Storage metrics

Metric	Summary Function	Unit
PersistentVolume Capacity	average	MB
PersistentVolumeClaim Request Storage	average	GB

ReplicaSet metrics

Metric	Summary Function	Unit
Replicas Available	average	replicas
Replicas Desired	average	replicas

StatefulSet metrics

Metric	Summary Function	Unit
Replicas Observed	average	replicas
Replicas Desired	average	replicas
Replicas Ready	average	replicas

Enter CPU thresholds in millicores (1 core = 1000 millicores). Enter memory and network thresholds in MB or GB as shown. The alert engine converts units automatically.
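As a rough illustration of the unit arithmetic involved, the sketch below converts thresholds into the base units that raw columns such as cpuUsageNanocores suggest. The function names are hypothetical and are not part of the Atatus API. The MB-to-bytes factor (binary megabytes) is an assumption; the alert engine may use decimal megabytes instead.

```python
# Illustrative only: these helpers are hypothetical, not the alert engine's code.

# 1 core = 1000 millicores = 1_000_000_000 nanocores
NANOCORES_PER_MILLICORE = 1_000_000

# Assumption: 1 MB = 1024 * 1024 bytes. The engine may use 10**6 instead.
BYTES_PER_MB = 1024 * 1024


def millicores_to_nanocores(millicores: float) -> int:
    """Convert a CPU threshold entered in millicores to nanocores."""
    return int(round(millicores * NANOCORES_PER_MILLICORE))


def mb_to_bytes(megabytes: float, bytes_per_mb: int = BYTES_PER_MB) -> int:
    """Convert a memory or network threshold entered in MB to bytes."""
    return int(round(megabytes * bytes_per_mb))


if __name__ == "__main__":
    print(millicores_to_nanocores(4000))  # a 4-core threshold in nanocores
    print(mb_to_bytes(1024))              # a 1024 MB threshold in bytes
```

Because the conversion happens inside the alert engine, you do not need to do this arithmetic yourself; the sketch only shows why a threshold of 4000 millicores corresponds to 4 cores.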

How queries are executed

Kubernetes alert queries evaluate metrics per resource name, not at the cluster level. Each distinct pod, container, node, or workload is evaluated independently, and each violating resource creates its own incident.

Pod and container queries

A status filter subquery runs first to exclude completed pods (statusPhase != 'succeeded') and containers with empty names, so only active resources are evaluated.
The main query computes the metric value (e.g., avg(cpuUsageNanocores)) per 1-minute bucket, grouped by resource name.
The outer query counts how many buckets violated the threshold.
Filters like cluster, namespace, node, and deployment can narrow the scope via the kubeFilters field.

Restart count queries work differently: they compute the delta of statusRestarts (max - min) within the time window from the kubernetes.state_container table, rather than using an absolute count. This captures restarts that occurred during the evaluation period.
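The delta calculation described above can be sketched in a few lines. This is a minimal illustration, assuming the samples are (container name, statusRestarts) pairs that all fall inside the evaluation window; the function name and the sample data are hypothetical.

```python
def restart_deltas(samples):
    """Return max - min of the restart counter per container.

    samples: iterable of (container_name, status_restarts) pairs,
    all taken from within the evaluation window.
    """
    lowest, highest = {}, {}
    for name, restarts in samples:
        lowest[name] = min(lowest.get(name, restarts), restarts)
        highest[name] = max(highest.get(name, restarts), restarts)
    # A delta of 0 means the container did not restart during the window,
    # no matter how large its lifetime restart counter is.
    return {name: highest[name] - lowest[name] for name in lowest}


if __name__ == "__main__":
    window = [("api", 7), ("api", 7), ("api", 9), ("worker", 2), ("worker", 2)]
    print(restart_deltas(window))  # {'api': 2, 'worker': 0}
```

Using the delta rather than the absolute counter means a container that restarted many times last week but has been stable since does not keep an alert open.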

Workload queries (DaemonSet, Deployment, ReplicaSet, StatefulSet, CronJob, Job)

Workload queries evaluate metrics per workload name. For example, a deployment replicas alert computes avg(replicasAvailable) per 1-minute bucket for each deployment.

This is especially useful for detecting replica mismatches: alert when Replicas Available drops below a threshold to catch under-provisioned deployments.

Node queries

Node queries evaluate metrics per node name from the kubernetes.node table. Memory usage uses the memoryWorkingsetBytes column (working set memory, not total allocated memory), which reflects actual memory pressure.

Storage queries

PersistentVolume and PersistentVolumeClaim queries evaluate per volume name from their respective tables.

Common query pattern

Kubernetes alert queries generally follow this structure (restart count queries replace the average with the max - min delta described above):

SELECT name,
       countIf(value {operator} {threshold}) AS violationCount,
       count() AS totalCount
FROM (
    SELECT name, avg({column}) AS value
    FROM kubernetes.{resource_table}
    WHERE {time_and_account_filters}
      AND name IN ({status_filter})
    GROUP BY {time_bucket}, name
)
GROUP BY name

The inner query computes the metric value per 1-minute bucket per resource. The outer query counts how many buckets violated the threshold. An alert triggers based on the time function (all = every bucket breached, any = at least one bucket breached).
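To make the two-level structure concrete, here is a minimal in-memory sketch of the same logic. It is not the engine's implementation: the function name, the sample format (timestamp in seconds, resource name, value), and the returned dictionary are all assumptions made for illustration.

```python
import operator
from collections import defaultdict
from statistics import mean

# Operator names as used in alert rules, mapped to comparisons.
OPERATORS = {"above": operator.gt, "below": operator.lt, "equal": operator.eq}


def evaluate(samples, op, threshold, time_function):
    """Return {resource_name: triggered} for one evaluation window.

    samples: iterable of (timestamp_seconds, name, value) tuples.
    """
    compare = OPERATORS[op]

    # Inner query: average value per 1-minute bucket per resource.
    buckets = defaultdict(list)
    for ts, name, value in samples:
        buckets[(int(ts) // 60, name)].append(value)

    # Outer query: violationCount and totalCount per resource.
    counts = defaultdict(lambda: [0, 0])
    for (_, name), values in buckets.items():
        counts[name][1] += 1
        if compare(mean(values), threshold):
            counts[name][0] += 1

    # Time function: "all" needs every bucket to breach, "any" needs one.
    triggered = {}
    for name, (violations, total) in counts.items():
        if time_function == "all":
            triggered[name] = violations == total
        elif time_function == "any":
            triggered[name] = violations > 0
        else:
            raise ValueError(f"unknown time function: {time_function}")
    return triggered


if __name__ == "__main__":
    data = [
        (0, "pod-a", 900), (30, "pod-a", 1100), (60, "pod-a", 1200),
        (0, "pod-b", 500), (60, "pod-b", 1200),
    ]
    print(evaluate(data, "above", 950, "all"))  # pod-a only
    print(evaluate(data, "above", 950, "any"))  # pod-a and pod-b
```

In the sample data, pod-a breaches in both buckets (averages 1000 and 1200), while pod-b breaches only in the second, so pod-b triggers under any but not under all.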

Targets

Kubernetes alert rules target resources by name. Each violating resource creates its own incident.

Filters

Use the kubeFilters field to narrow the scope of evaluation:
- cluster: restrict to a specific cluster
- namespace: restrict to a specific namespace
- node / nodeName: restrict to a specific node
- deployment, daemonset, job, cronjob, replicaset: restrict to a specific workload

How alert evaluation works

Operators

Operator	Triggers when
`above`	metric value > threshold
`below`	metric value < threshold
`equal`	metric value = threshold

Evaluation windows

Available durations: 5, 10, 15, 30, or 60 minutes.

Time functions

Function	Behavior
`all`	Triggers only if every 1-minute bucket in the window breaches the threshold.
`any`	Triggers if at least one 1-minute bucket breaches the threshold.

Severity

Configure Warning and Critical thresholds independently.
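The sketch below shows one way a value could be mapped to a severity when both thresholds are set. It assumes that Critical takes precedence when both thresholds are breached; the documentation above does not state this, and the function name is hypothetical.

```python
import operator

# Operator names as used in alert rules, mapped to comparisons.
OPERATORS = {"above": operator.gt, "below": operator.lt, "equal": operator.eq}


def classify(value, op, warning=None, critical=None):
    """Return "critical", "warning", or None for a single metric value.

    Either threshold may be omitted, since the two are configured
    independently. Assumption: Critical wins when both are breached.
    """
    compare = OPERATORS[op]
    if critical is not None and compare(value, critical):
        return "critical"
    if warning is not None and compare(value, warning):
        return "warning"
    return None


if __name__ == "__main__":
    # Pod memory usage in MB with Warning at 800 and Critical at 1024.
    for usage in (700, 900, 1100):
        print(usage, classify(usage, "above", warning=800, critical=1024))
```

With the above operator, the Warning threshold is normally set lower than the Critical threshold; with below, the order is reversed.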

Example configurations

Alert: Detect a deployment replica mismatch when available replicas drop below 3.

Setting	Value
Metric	Replicas Available (Deployment)
Operator	below
Critical threshold	3 replicas
Duration	5 minutes
Time function	all
Filter: namespace	production

Alert: Pod memory usage exceeds 1024 MB.

Setting	Value
Metric	Memory Usage (Pod)
Operator	above
Warning threshold	800 MB
Critical threshold	1024 MB
Duration	10 minutes
Time function	all
Filter: namespace	production

Alert: Node CPU exceeds 4000 millicores (4 cores).

Setting	Value
Metric	CPU Usage (Node)
Operator	above
Critical threshold	4000 millicores
Duration	15 minutes
Time function	all