Overview

Analytics alerts let you define rules that monitor API request performance and error rates. These rules evaluate metrics from the analytics.api_requests ClickHouse table, covering throughput, duration percentiles, and HTTP error counts.

Every analytics alert rule uses the Analytics Metric rule type.

Available metrics

Metric Summary Function Unit
Throughput throughput calls
Response Time - avg average seconds
Response Time - p75 average seconds
Response Time - p90 average seconds
Response Time - p95 average seconds
4XX Failure Count throughput
5XX Failure Count throughput
4XX and 5XX Failure Count throughput

All metrics are queried from the analytics.api_requests table.

Enter response time thresholds in seconds. The alert engine converts units automatically.

Targets

Analytics alert rules do not use project-level targets. Use Group By to scope which API endpoints or dimensions the rule evaluates.

Filters

Analytics rules inherit the standard filter mechanism. You can narrow evaluation to specific API endpoints, status codes, or other request attributes using filter conditions.

Group By

Group By is a required field for analytics alert rules. It groups the evaluation by API endpoint or another request dimension. Each group is evaluated independently against the threshold, and each group that violates the threshold creates a separate incident.

Common Group By values include:

  • API endpoint path
  • HTTP method
  • Service name
  • Status code category

Evaluation logic

The supported operators are:

Operator Triggers when
above Value is greater than or equal to the threshold
below Value is less than or equal to the threshold
equal Value equals the threshold

Each group produced by the Group By field is compared against the threshold independently. If three API endpoints all exceed the threshold, three separate incidents are opened.

Examples

API latency alert

Detect when the 95th percentile response time for any API endpoint exceeds an acceptable limit.

Field Value
Metric Response Time - p95
Operator above
Threshold 3 (3 seconds)
Group By API endpoint

This rule evaluates the 95th percentile duration for each API endpoint. If any endpoint's P95 latency exceeds 3 seconds in the evaluation window, an incident is opened for that endpoint. This helps catch slow endpoints before they affect user experience.

5XX error spike

Alert when server error responses surge for any API endpoint, indicating a potential backend failure.

Field Value
Metric 5XX Failure Count
Operator above
Threshold 50
Group By API endpoint

This rule counts 5XX responses per API endpoint. If any endpoint returns more than 50 server errors in the evaluation window, an incident is created. Pairing this with a throughput rule gives visibility into whether errors correlate with traffic spikes.