Atatus Kubernetes alerts monitor the health and performance of your Kubernetes clusters. You can alert on CPU and memory usage for pods, containers, and nodes, track replica availability for deployments and statefulsets, detect pod restart loops, and monitor storage capacity.

Available metrics

Pod metrics

| Metric | Summary Function | Unit |
| --- | --- | --- |
| CPU Usage | average | millicore |
| Memory Usage | average | MB |
| Network Received | average | MB |
| Network Transmitted | average | MB |
| Pod Restart Count | average | |

Container metrics

| Metric | Summary Function | Unit |
| --- | --- | --- |
| CPU Usage | average | millicore |
| Memory Usage | average | MB |
| Container Restart Count | average | |

CronJob metrics

| Metric | Summary Function | Unit |
| --- | --- | --- |
| Active Count | average | |

DaemonSet metrics

| Metric | Summary Function | Unit |
| --- | --- | --- |
| Replicas Available | average | replicas |
| Replicas Desired | average | replicas |

Deployment metrics

| Metric | Summary Function | Unit |
| --- | --- | --- |
| Replicas Available | average | replicas |
| Replicas Desired | average | replicas |

Job metrics

| Metric | Summary Function | Unit |
| --- | --- | --- |
| Pods Succeeded | average | pods |
| Pods Active | average | |
| Pods Failed | average | pods |

Node metrics

| Metric | Summary Function | Unit |
| --- | --- | --- |
| CPU Usage | average | millicore |
| Memory Usage | average | GB |

Storage metrics

| Metric | Summary Function | Unit |
| --- | --- | --- |
| PersistentVolume Capacity | average | MB |
| PersistentVolumeClaim Request Storage | average | GB |

ReplicaSet metrics

| Metric | Summary Function | Unit |
| --- | --- | --- |
| Replicas Available | average | replicas |
| Replicas Desired | average | replicas |

StatefulSet metrics

| Metric | Summary Function | Unit |
| --- | --- | --- |
| Replicas Observed | average | replicas |
| Replicas Desired | average | replicas |
| Replicas Ready | average | replicas |

Enter CPU thresholds in millicores (1 core = 1000 millicores). Enter memory, network, and storage thresholds in the unit listed for each metric (MB or GB). The alert engine converts units automatically.
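As a rough illustration of the kind of conversion involved, the sketch below normalizes thresholds to raw units. It assumes CPU is stored in nanocores and memory in bytes (suggested by column names such as cpuUsageNanocores and memoryWorkingsetBytes) and that 1 MB = 1024 × 1024 bytes. The helper names and the binary MB definition are assumptions, not Atatus internals.

```python
# Hypothetical helpers; the real conversion inside the alert engine may differ.
NANOCORES_PER_MILLICORE = 1_000_000  # 1 core = 1000 millicores = 1e9 nanocores
BYTES_PER_MB = 1024 * 1024           # assumption: binary megabytes
BYTES_PER_GB = 1024 * BYTES_PER_MB   # assumption: binary gigabytes


def millicores_to_nanocores(millicores: float) -> float:
    """Convert a CPU threshold entered in millicores to nanocores."""
    return millicores * NANOCORES_PER_MILLICORE


def memory_to_bytes(value: float, unit: str) -> float:
    """Convert a memory or network threshold entered in MB or GB to bytes."""
    factors = {"MB": BYTES_PER_MB, "GB": BYTES_PER_GB}
    if unit not in factors:
        raise ValueError(f"unsupported unit: {unit}")
    return value * factors[unit]


# A 4000-millicore threshold is 4 cores, i.e. 4e9 nanocores.
cpu_threshold = millicores_to_nanocores(4000)
memory_threshold = memory_to_bytes(1024, "MB")
```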

How queries are executed

Kubernetes alert queries evaluate metrics per resource name, not at the cluster level. Each distinct pod, container, node, or workload is evaluated independently, and each violating resource creates its own incident.

Pod and container queries

  1. A status filter subquery runs first. It excludes completed pods (statusPhase != 'succeeded') and, for container queries, rows with an empty container name, so only active resources are evaluated.
  2. The main query computes the metric value (e.g., avg(cpuUsageNanocores)) per 1-minute bucket, grouped by resource name.
  3. The outer query counts how many buckets violated the threshold.
  4. Filters like cluster, namespace, node, and deployment can narrow the scope via the kubeFilters field.
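Steps 1 to 3 can be mimicked in memory to see what the query produces. This is a minimal sketch, not the engine's implementation: the sample rows, their tuple layout, and the `evaluate` function are hypothetical, and the operator is fixed to "above".

```python
from collections import defaultdict
from statistics import mean

# Hypothetical samples: (pod name, statusPhase, minute bucket, cpuUsageNanocores).
samples = [
    ("api-1", "running", 0, 600_000_000),
    ("api-1", "running", 0, 800_000_000),
    ("api-1", "running", 1, 300_000_000),
    ("batch-7", "succeeded", 0, 900_000_000),
    ("web-2", "running", 0, 200_000_000),
    ("web-2", "running", 1, 250_000_000),
]
threshold = 500_000_000  # 500 millicores expressed in nanocores


def evaluate(samples, threshold):
    # Step 1: status filter, keep only pods that have not completed.
    active = {name for name, phase, _, _ in samples if phase != "succeeded"}

    # Step 2: collect the metric per (pod, 1-minute bucket) so it can be averaged.
    buckets = defaultdict(list)
    for name, _, minute, value in samples:
        if name in active:
            buckets[(name, minute)].append(value)

    # Step 3: count, per pod, how many bucket averages are above the threshold.
    result = defaultdict(lambda: {"violationCount": 0, "totalCount": 0})
    for (name, _), values in buckets.items():
        result[name]["totalCount"] += 1
        if mean(values) > threshold:
            result[name]["violationCount"] += 1
    return dict(result)


counts = evaluate(samples, threshold)
```

With this data, api-1 has one violating bucket out of two, web-2 has none out of two, and the completed pod batch-7 is never evaluated.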

Restart count queries work differently: they compute the delta of statusRestarts (max - min) within the time window from the kubernetes.state_container table, rather than using an absolute count. This captures restarts that occurred during the evaluation period.
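The delta calculation can be illustrated with a few counter readings. The readings and the `restart_deltas` helper below are hypothetical; only the max - min idea comes from the description above.

```python
from collections import defaultdict

# Hypothetical (container name, statusRestarts) readings inside one evaluation window.
readings = [
    ("worker", 12), ("worker", 12), ("worker", 15),
    ("proxy", 3), ("proxy", 3),
]


def restart_deltas(readings):
    """Return restarts per container during the window as max - min of the counter."""
    seen = defaultdict(list)
    for name, restarts in readings:
        seen[name].append(restarts)
    return {name: max(values) - min(values) for name, values in seen.items()}


deltas = restart_deltas(readings)
```

Here worker has 15 lifetime restarts but only 3 within the window, and proxy has 0. A threshold on the absolute counter would keep firing for both long after they stabilized.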

Workload queries (DaemonSet, Deployment, ReplicaSet, StatefulSet, CronJob, Job)

Workload queries evaluate metrics per workload name. For example, a deployment replicas alert computes avg(replicasAvailable) per 1-minute bucket for each deployment.

This is especially useful for detecting replica shortfalls: alert when Replicas Available drops below the number of replicas you expect, which flags under-provisioned deployments.

Node queries

Node queries evaluate metrics per node name from the kubernetes.node table. Memory Usage is read from the memoryWorkingsetBytes column, which holds working set memory rather than total allocated memory and therefore reflects actual memory pressure.

Storage queries

PersistentVolume and PersistentVolumeClaim queries evaluate metrics per volume name, each from its own table.

Common query pattern

Kubernetes queries follow this general structure (restart count queries replace the average with the max - min delta described above):

SELECT name,
       countIf(value {operator} {threshold}) AS violationCount,  -- buckets that breached
       count() AS totalCount                                      -- buckets evaluated
FROM (
    -- One row per resource per 1-minute bucket
    SELECT name, avg({column}) AS value
    FROM kubernetes.{resource_table}
    WHERE {time_and_account_filters}
      AND name IN ({status_filter})  -- active resources only
    GROUP BY {time_bucket}, name
)
GROUP BY name

The inner query computes the metric value per 1-minute bucket per resource. The outer query counts how many buckets violated the threshold. An alert triggers based on the time function (all = every bucket breached, any = at least one bucket breached).
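One plausible way to turn the two counts into a decision is sketched below. The function name and the handling of an empty window are assumptions; the source does not say what happens when no buckets are returned.

```python
def should_trigger(violation_count: int, total_count: int, time_function: str) -> bool:
    """Decide whether a resource breaches, given the outer query's two counts."""
    if total_count == 0:
        return False  # assumption: no data in the window means no alert
    if time_function == "all":
        return violation_count == total_count
    if time_function == "any":
        return violation_count > 0
    raise ValueError(f"unknown time function: {time_function}")
```

For a 5-minute window, 4 violating buckets out of 5 would trigger under any but not under all.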

Targets

Kubernetes alert rules target resources by name. Each violating resource creates its own incident.

Filters

Use the kubeFilters field to narrow the scope of evaluation:

- cluster: restrict to a specific cluster
- namespace: restrict to a specific namespace
- node / nodeName: restrict to a specific node
- deployment, daemonset, job, cronjob, replicaset: restrict to a specific workload
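To show the narrowing effect, the sketch below applies a filter mapping to a few rows. Only the filter key names come from the list above; the row shape, the `apply_kube_filters` helper, exact-match semantics, and AND-combination of multiple filters are all assumptions.

```python
# Hypothetical rows; each dict stands in for one resource seen by the query.
rows = [
    {"name": "api-1", "cluster": "prod-east", "namespace": "production", "deployment": "api"},
    {"name": "api-2", "cluster": "prod-east", "namespace": "staging", "deployment": "api"},
    {"name": "web-1", "cluster": "prod-east", "namespace": "production", "deployment": "web"},
]


def apply_kube_filters(rows, kube_filters):
    """Keep rows whose fields equal every filter value (assumed AND semantics)."""
    return [
        row for row in rows
        if all(row.get(key) == value for key, value in kube_filters.items())
    ]


selected = apply_kube_filters(rows, {"namespace": "production", "deployment": "api"})
```

Only api-1 remains, so only that pod would be evaluated and could open an incident.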

How alert evaluation works

Operators

| Operator | Triggers when |
| --- | --- |
| above | metric value > threshold |
| below | metric value < threshold |
| equal | metric value = threshold |
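The three operators map directly onto strict comparisons. A minimal sketch using Python's standard operator module follows; the `OPERATORS` table and `breaches` helper are illustrative names, not part of Atatus.

```python
import operator

# Map each alert operator name to the comparison it performs.
OPERATORS = {
    "above": operator.gt,
    "below": operator.lt,
    "equal": operator.eq,
}


def breaches(value: float, op: str, threshold: float) -> bool:
    """Return True when `value` violates `threshold` under the named operator."""
    return OPERATORS[op](value, threshold)
```

Because above and below are strict, a value exactly equal to the threshold does not breach under either of them.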

Evaluation windows

Available durations: 5, 10, 15, 30, or 60 minutes.

Time functions

| Function | Behavior |
| --- | --- |
| all | Triggers only if every 1-minute bucket in the window breaches the threshold. |
| any | Triggers if at least one 1-minute bucket breaches the threshold. |

Severity

Configure Warning and Critical thresholds independently.
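The sketch below shows one way two independent thresholds could resolve to a single severity for the above operator. That Critical takes precedence when both are breached is an assumption, as is the `severity` helper itself.

```python
from typing import Optional


def severity(value: float, warning: Optional[float], critical: Optional[float]) -> Optional[str]:
    """Classify a value under the `above` operator; either threshold may be unset."""
    if critical is not None and value > critical:
        return "critical"  # assumption: critical takes precedence over warning
    if warning is not None and value > warning:
        return "warning"
    return None
```

With a Warning threshold of 800 MB and a Critical threshold of 1024 MB, a pod at 900 MB would classify as warning and a pod at 1100 MB as critical.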

Example configurations

Alert: Detect deployment replica mismatch (available replicas drop below 3).

| Setting | Value |
| --- | --- |
| Metric | Replicas Available (Deployment) |
| Operator | below |
| Critical threshold | 3 replicas |
| Duration | 5 minutes |
| Time function | all |
| Filter: namespace | production |
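A short worked example shows how this configuration behaves on a window where the deployment briefly recovers. The per-minute values are made up for illustration.

```python
# Hypothetical per-minute averages of replicasAvailable for one deployment
# over a 5-minute window.
buckets = [2, 2, 3, 2, 2]
threshold = 3

violation_count = sum(1 for value in buckets if value < threshold)  # operator: below
total_count = len(buckets)

triggers_with_all = violation_count == total_count
triggers_with_any = violation_count > 0
```

Four of the five buckets are below 3, so the rule as configured (all) does not trigger for this window, while the same rule with any would. Choosing all trades faster detection for fewer alerts on short dips.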

Alert: Pod memory usage exceeds 1024 MB.

| Setting | Value |
| --- | --- |
| Metric | Memory Usage (Pod) |
| Operator | above |
| Warning threshold | 800 MB |
| Critical threshold | 1024 MB |
| Duration | 10 minutes |
| Time function | all |
| Filter: namespace | production |

Alert: Node CPU exceeds 4000 millicores (4 cores).

| Setting | Value |
| --- | --- |
| Metric | CPU Usage (Node) |
| Operator | above |
| Critical threshold | 4000 millicore |
| Duration | 15 minutes |
| Time function | all |