An issue represents a single metric violation on a specific target (application, host, pod, or other entity). When a rule's threshold is breached, Atatus creates an issue for each affected target and groups it into an incident based on the policy's incident preference.

Issues vs incidents

| Concept | What it represents | Example |
|---|---|---|
| Issue | One metric on one target breached a threshold | "Web Response Time exceeded 2s on checkout-service" |
| Incident | A container that groups one or more related issues | "Production API Health — Critical" (may contain multiple issues) |

A single incident can contain multiple issues. For example, if CPU usage spikes on three hosts, Atatus creates three issues (one per host) and groups them into one or more incidents, depending on the incident preference.
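The grouping step can be sketched in Python. This is a minimal illustration under stated assumptions, not Atatus's implementation. The `Issue` fields, the `group_into_incidents` function, the policy and rule names, and the two grouping keys (`by_policy`, `by_target`) are all hypothetical. The keys stand in for the policy's incident preference, whose actual options are not listed on this page.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List


@dataclass(frozen=True)
class Issue:
    policy: str
    rule: str
    target: str
    metric: str


def group_into_incidents(
    issues: List[Issue], incident_key: Callable[[Issue], Hashable]
) -> Dict[Hashable, List[Issue]]:
    """Group issues so that all issues sharing an incident key land in one incident."""
    incidents: Dict[Hashable, List[Issue]] = defaultdict(list)
    for issue in issues:
        incidents[incident_key(issue)].append(issue)
    return dict(incidents)


# Hypothetical incident preferences, expressed as grouping keys.
def by_policy(issue: Issue) -> Hashable:
    return issue.policy  # one incident for the whole policy


def by_target(issue: Issue) -> Hashable:
    return (issue.policy, issue.target)  # one incident per affected target


# CPU spikes on three hosts: one issue per host.
issues = [
    Issue("Infra Health", "CPU Used Percentage above 90%", host, "CPU Used Percentage")
    for host in ("host-a", "host-b", "host-c")
]
print(len(group_into_incidents(issues, by_policy)))  # 1
print(len(group_into_incidents(issues, by_target)))  # 3
```

The same three issues become one incident or three, depending only on the key. This mirrors how the incident preference, not the issues themselves, decides the grouping.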

Issue details

Every issue records:

| Field | Description |
|---|---|
| Rule | The alert rule that was violated |
| Policy | The alert policy the rule belongs to |
| Target | The application, host, or entity where the violation occurred |
| Metric | The specific metric that breached the threshold (e.g., Web Response Time, CPU Used Percentage) |
| Severity | Critical or Warning, determined by which threshold tier was breached |
| Operator | The comparison used (above, below, or equal) |
| Threshold | The configured threshold value that was breached |
| Duration | How long the issue has been open (for a closed issue, how long it stayed open) |
| Start time | When the violation was first detected |
| Status | Opened or Closed |
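As a rough sketch, an issue record with these fields could be modeled as a Python dataclass. The attribute names, the enum types, and the `closed_at` timestamp used to compute `duration` are hypothetical and chosen for illustration. They are not Atatus's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Severity(Enum):
    CRITICAL = "Critical"
    WARNING = "Warning"


class Operator(Enum):
    ABOVE = "above"
    BELOW = "below"
    EQUAL = "equal"


class Status(Enum):
    OPENED = "Opened"
    CLOSED = "Closed"


@dataclass
class IssueRecord:
    rule: str
    policy: str
    target: str
    metric: str
    severity: Severity
    operator: Operator
    threshold: float
    start_time: datetime
    status: Status = Status.OPENED
    closed_at: Optional[datetime] = None  # hypothetical: when the issue was closed

    def duration(self, now: datetime) -> timedelta:
        """Time open so far for an open issue, or total time open for a closed one."""
        end = self.closed_at if self.closed_at is not None else now
        return end - self.start_time


issue = IssueRecord(
    rule="Web Response Time above 2 seconds",
    policy="Production API Health",
    target="checkout-service",
    metric="Web Response Time",
    severity=Severity.CRITICAL,
    operator=Operator.ABOVE,
    threshold=2.0,
    start_time=datetime(2024, 1, 1, 14, 5),  # arbitrary date for illustration
)
print(issue.duration(datetime(2024, 1, 1, 14, 12)))  # 0:07:00
```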

Issue lifecycle

Threshold breached          Condition resolves
      │                           │
      ▼                           ▼
   Opened ──────────────────► Closed
      │                           ▲
      │       User closes         │
      └───────────────────────────┘
  • Opened — The metric crossed the threshold for the configured duration. Atatus creates the issue, links it to an incident, and sends notifications.
  • Closed — The condition resolved (metric returned to normal) or a user manually closed the issue.

Issues do not have an "Acknowledged" state — acknowledgment happens at the incident level.
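The lifecycle above is a two-state machine, which can be sketched as a transition table. The event names `condition_resolved` and `user_closed` are hypothetical labels for the two arrows in the diagram. An issue only exists once the threshold has been breached, so it always starts in Opened.

```python
from enum import Enum
from typing import Dict, Tuple


class Status(Enum):
    OPENED = "Opened"
    CLOSED = "Closed"


# Allowed transitions: (current status, event) -> next status.
# There is deliberately no "acknowledge" event; acknowledgment belongs to incidents.
TRANSITIONS: Dict[Tuple[Status, str], Status] = {
    (Status.OPENED, "condition_resolved"): Status.CLOSED,
    (Status.OPENED, "user_closed"): Status.CLOSED,
}


def next_status(current: Status, event: str) -> Status:
    """Return the status after `event`, or raise if the transition is not allowed."""
    try:
        return TRANSITIONS[(current, event)]
    except KeyError:
        raise ValueError(
            f"cannot apply {event!r} to an issue that is {current.value}"
        ) from None


print(next_status(Status.OPENED, "condition_resolved").value)  # Closed
```

Any event not in the table, including an acknowledgment or a second close, raises instead of silently changing state.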

Severity levels

| Severity | When it triggers |
|---|---|
| Critical | The metric breached the Critical threshold |
| Warning | The metric breached the Warning threshold but not the Critical threshold |

If both Warning and Critical thresholds are configured and the metric breaches both, the issue is created with Critical severity.
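The severity rule can be sketched as a small function that checks the Critical tier before the Warning tier. This is a simplified sketch: it looks at a single value and ignores the rule's duration and time function. It also assumes that "above" and "below" are strict comparisons. The Warning threshold of 1.5 seconds in the usage example is hypothetical.

```python
from typing import Optional


def breaches(value: float, operator: str, threshold: float) -> bool:
    """Apply the rule's comparison operator. Strict comparisons are an assumption."""
    if operator == "above":
        return value > threshold
    if operator == "below":
        return value < threshold
    if operator == "equal":
        return value == threshold
    raise ValueError(f"unknown operator: {operator!r}")


def severity_for(
    value: float,
    operator: str,
    critical: Optional[float] = None,
    warning: Optional[float] = None,
) -> Optional[str]:
    """Return "Critical", "Warning", or None (no violation) for a single value."""
    # Check Critical first so a value that breaches both tiers is reported as Critical.
    if critical is not None and breaches(value, operator, critical):
        return "Critical"
    if warning is not None and breaches(value, operator, warning):
        return "Warning"
    return None


# Response time rule: Warning above 1.5 s, Critical above 2 s.
print(severity_for(2.5, "above", critical=2.0, warning=1.5))  # Critical
print(severity_for(1.8, "above", critical=2.0, warning=1.5))  # Warning
print(severity_for(1.0, "above", critical=2.0, warning=1.5))  # None
```

With the "below" operator, the tiers run the other way. The Critical threshold is the lower value, and a value under both thresholds is still reported as Critical.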

Closing issues

Issues can be closed in two ways:

  • Automatic — The alerting engine detects that the metric has returned to normal (no longer violating the threshold). The issue is closed automatically and the closedBy field is set to "Atatus".
  • Manual — A user closes the issue from the Alerting page. The closedBy field records the user's name.

When all issues in an incident are closed, the incident is also closed automatically.
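The two closing paths and the incident roll-up can be sketched as follows. The `closed_by` attribute stands in for the `closedBy` field. The class and function names (`Incident`, `close_issue`, `auto_close`, `manual_close`) and the user name "Jane Doe" are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Issue:
    target: str
    status: str = "Opened"
    closed_by: Optional[str] = None  # mirrors the closedBy field


@dataclass
class Incident:
    issues: List[Issue] = field(default_factory=list)
    status: str = "Opened"

    def close_issue(self, issue: Issue, closed_by: str) -> None:
        """Close one issue, then close the incident if no open issues remain."""
        issue.status = "Closed"
        issue.closed_by = closed_by
        if all(i.status == "Closed" for i in self.issues):
            self.status = "Closed"


def auto_close(incident: Incident, issue: Issue) -> None:
    """Engine path: the metric returned to normal."""
    incident.close_issue(issue, closed_by="Atatus")


def manual_close(incident: Incident, issue: Issue, user_name: str) -> None:
    """User path: the issue was closed from the Alerting page."""
    incident.close_issue(issue, closed_by=user_name)


a, b = Issue("host-a"), Issue("host-b")
incident = Incident(issues=[a, b])
auto_close(incident, a)
print(incident.status)  # Opened (host-b is still open)
manual_close(incident, b, "Jane Doe")
print(incident.status, a.closed_by, b.closed_by)  # Closed Atatus Jane Doe
```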

Viewing issues

Navigate to Alerting > Issues to see all issues. You can filter by:

  • Status — Open or Closed
  • Incident — View issues belonging to a specific incident
  • Project — Filter by application or project
  • Search — Search by rule name, target name, or metric

Each issue links to its parent incident and shows a chart of the metric's value over time relative to the configured threshold.
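The filters combine as a logical AND, which can be sketched like this. The `filter_issues` function and its parameter names are hypothetical. Treating search as a case-insensitive substring match over the rule name, target name, and metric is an assumption about how the Alerting page behaves.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Issue:
    rule: str
    target: str
    metric: str
    status: str  # "Opened" or "Closed"
    incident_id: str
    project: str


def filter_issues(
    issues: Iterable[Issue],
    status: Optional[str] = None,
    incident_id: Optional[str] = None,
    project: Optional[str] = None,
    search: Optional[str] = None,
) -> List[Issue]:
    """Apply every filter that is set; unset (None) filters match everything."""
    needle = search.lower() if search else None
    matches = []
    for issue in issues:
        if status is not None and issue.status != status:
            continue
        if incident_id is not None and issue.incident_id != incident_id:
            continue
        if project is not None and issue.project != project:
            continue
        # Search matches rule name, target name, or metric (case-insensitive substring here).
        if needle is not None and not any(
            needle in text.lower() for text in (issue.rule, issue.target, issue.metric)
        ):
            continue
        matches.append(issue)
    return matches


issues = [
    Issue("Slow checkout", "checkout-service", "Web Response Time", "Opened", "inc-1", "shop"),
    Issue("High CPU", "host-a", "CPU Used Percentage", "Closed", "inc-2", "infra"),
]
print([i.target for i in filter_issues(issues, status="Opened", search="response")])
# ['checkout-service']
```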

Example

A policy "Production API Health" has a rule with a Critical threshold: Web Response Time above 2 seconds for 5 minutes, using the all time function (the threshold must be breached at every point in the 5-minute window).

At 14:00, the checkout-service application starts responding slowly. For every minute from 14:00 to 14:05, the average response time exceeds 2 seconds.

At 14:05, the alerting engine evaluates the rule:

  1. Issue created: "Web Response Time exceeded 2 seconds on checkout-service" with severity Critical
  2. The issue is grouped into an incident based on the policy's incident preference
  3. Notifications are sent to the configured channels

At 14:12, response time drops back below 2 seconds and stays there for the full 5-minute evaluation window. The engine then closes the issue automatically and sets its closedBy field to "Atatus".
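The timeline in this example can be replayed with a small simulation of the all time function over per-minute averages. This is a sketch under assumptions. The per-minute values are hypothetical. "Above" is treated as strictly greater than. The issue is assumed to close only once a full 5-minute window is at or below the threshold. Under these assumptions, the sketch opens the issue at 14:05 and closes it at 14:17, after five consecutive normal minutes starting at 14:12.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def evaluate_all(values: List[float], threshold: float, window: int) -> List[Tuple[int, str]]:
    """Replay per-minute values for an 'above `threshold` for `window` minutes' rule.

    Returns (minute offset, event) pairs. The issue opens when every value in the
    trailing window is above the threshold, and closes when every value in the
    trailing window is back at or below it.
    """
    events: List[Tuple[int, str]] = []
    is_open = False
    # `end` is the number of one-minute buckets observed at evaluation time.
    for end in range(window, len(values) + 1):
        recent = values[end - window:end]
        if not is_open and all(v > threshold for v in recent):
            is_open = True
            events.append((end, "opened"))
        elif is_open and all(v <= threshold for v in recent):
            is_open = False
            events.append((end, "closed"))
    return events


# Hypothetical per-minute averages: slow from 14:00 to 14:11, normal from 14:12.
values = [2.5] * 12 + [1.2] * 5
start = datetime(2024, 1, 1, 14, 0)  # arbitrary date; only the time matters
for offset, event in evaluate_all(values, threshold=2.0, window=5):
    print((start + timedelta(minutes=offset)).strftime("%H:%M"), event)
# 14:05 opened
# 14:17 closed
```

A single slow minute inside an otherwise normal window never opens the issue, and a single normal minute never closes it. That is the point of requiring the condition at every point in the window.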