Tool Reference | Atatus Docs

This page lists all 44 tools available in the Atatus MCP server, organized into six categories. For each tool, you will find a description, its use case, a sample question that triggers it, its inputs, and typical next steps for chaining queries. You do not need to call these tools manually; your AI assistant selects and chains the right tools based on your question.

All tools are read-only except update_error_status, which changes the status of an error group. Write tools require a Read & Write API key. The Access column of the reference tables shows the access level for each tool.

Category	Count
APM / Application	14
Kubernetes	12
Infrastructure	9
Logs	1
Distributed Tracing	5
AI-Powered Analysis	3

Most time-windowed tools accept a time_dur parameter. Refer to Time ranges and Response size and pagination at the bottom of this page for more details.

APM / Application

Tool	Access	Answers
`list_projects`	Read	What projects exist in this account?
`get_apm_metrics`	Read	How is this service doing overall?
`get_service_health_summary`	Read	Is service X healthy? (one-call verdict)
`get_recent_errors`	Read	What is currently breaking?
`get_error_details`	Read	Why is this error happening?
`get_error_events`	Read	Who/what is each occurrence hitting?
`get_error_trends`	Read	When did errors spike?
`get_recent_transactions`	Read	Which endpoint is slow?
`get_transaction_spans`	Read	Why is this transaction slow?
`get_apm_database_calls`	Read	How are my database calls performing?
`get_apm_timeseries`	Read	How did performance trend over time?
`get_recent_deployments`	Read	What did we deploy, and when?
`correlate_deploy_with_incident`	Read	What changed around this incident?
`update_error_status`	Write	Resolve / ignore / reopen an error

`list_projects`

Description: Returns all monitored applications and projects in the account (including name, type, language, project ID, and active status).
Use Case: Helps the assistant discover the required project_id for subsequent queries, so you do not have to look up or memorize IDs.
Sample Question: "What projects do we have in Atatus?"
Inputs: None.

`get_apm_metrics`

Description: Returns project-level APM aggregates (including Apdex score details, average response time, throughput, and failure rate).
Use Case: Provides a high-level view of service health in a single call, unlike endpoint-specific tools.
Sample Question: "How healthy is order-service overall today?"
Inputs: project_id, time_dur, optional transaction substring filter.
Next Steps: Use get_apm_timeseries to see performance trends or get_recent_transactions to find slow endpoints.

`get_service_health_summary`

Description: Evaluates service health to return a status (HEALTHY, WATCH, or DEGRADED) along with golden signals (such as Apdex, error rates, top open errors, slow endpoints, and recent deployments).
Use Case: Quickly checks if a service is running smoothly by consolidating multiple queries into a single response.
Sample Question: "Is checkout-service healthy right now?"
Inputs: project_id, time_dur (default 1h).
Next Steps: If the service is in a WATCH or DEGRADED state, use get_error_details to investigate top errors, get_recent_transactions for latency, or correlate_deploy_with_incident to check recent deployments.
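
The follow-up guidance above can be written down as a small lookup. This is only an illustrative sketch: the `FOLLOW_UPS` table and the `next_tools` helper are hypothetical names, not part of the Atatus MCP server, and the assistant normally makes this choice for you.

```python
# Follow-up tools suggested on this page for each health status.
# This table is an illustration; the server does not expose it.
FOLLOW_UPS = {
    "HEALTHY": [],
    "WATCH": [
        "get_error_details",
        "get_recent_transactions",
        "correlate_deploy_with_incident",
    ],
    "DEGRADED": [
        "get_error_details",
        "get_recent_transactions",
        "correlate_deploy_with_incident",
    ],
}


def next_tools(status):
    """Return the suggested follow-up tools for a health status."""
    key = status.strip().upper()
    if key not in FOLLOW_UPS:
        raise ValueError(f"unknown health status: {status!r}")
    # Return a copy so callers cannot mutate the lookup table.
    return list(FOLLOW_UPS[key])
```

For example, `next_tools("DEGRADED")` returns the three investigation tools, while `next_tools("HEALTHY")` returns an empty list.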

`get_recent_errors`

Description: Lists the top error groups in a project ranked by impact (frequency and affected users).
Use Case: Helps identify what is currently breaking without manually filtering the Errors dashboard.
Sample Question: "Show me the top 5 unresolved errors in checkout-service in the last hour."
Inputs: project_id, time_dur, status, optional filters (like app version, browser, OS, URL).
Next Steps: Use get_error_details to view the stack trace and triggering deployment for the top error group.

`get_error_details`

Description: Returns the stack trace, request context, deployment details, and sample events for a specific error group.
Use Case: Helps diagnose the root cause of an error by showing where it occurred and which request triggered it.
Sample Question: "Tell me everything about error abc123."
Inputs: project_id, error_id.
Next Steps: Use get_error_events to see how occurrences vary across users and environments.

`get_error_events`

Description: Provides paginated instances of an error group, showing full stack traces, request payloads, user info, and environment details.
Use Case: Helps identify patterns across occurrences, such as whether an error only affects a specific browser or environment.
Sample Question: "Show me 5 sample events of error abc123 from the last 6 hours."
Inputs: project_id, error_id, time_dur, limit.

`get_error_trends`

Description: Tracks error frequency over time, showing total, peak, and latest counts.
Use Case: Helps correlate error spikes with specific events, such as deployments or traffic surges.
Sample Question: "Did errors spike in the last 24 hours in order-service?"
Inputs: project_id, time_dur.
Next Steps: Use get_recent_errors to identify which errors drove the spike, or get_recent_deployments to check for recent releases.

`get_recent_transactions`

Description: Lists performance metrics (average/min/max response time, throughput, failure rate, and Apdex) for endpoints.
Use Case: Pinpoints which transactions or endpoints are slowest. Each result includes a transaction ID that can be chained into span analysis.
Sample Question: "Which transactions are slowest in order-service this week?"
Inputs: project_id, time_dur.
Next Steps: Use get_transaction_spans with the transaction ID to identify the bottleneck.

`get_transaction_spans`

Description: Breaks down a transaction into database queries, external HTTP calls, and internal execution time.
Use Case: Helps understand why a specific endpoint is slow by highlighting the slowest layer.
Sample Question: "Why is GET /checkout slow? Show me the spans."
Inputs: project_id, transaction_id, time_dur.
Next Steps: Use get_recent_deployments to check if a recent release caused a regression.

`get_apm_database_calls`

Description: Lists database engines (like PostgreSQL, MySQL, Redis) with call volumes, response times, and slowest queries.
Use Case: Helps identify slow queries and database performance bottlenecks.
Sample Question: "Which database queries are slowest in order-service?"
Inputs: project_id, time_dur, optional database engine filter, order_by (responseTime / throughput), limit.
Next Steps: Use get_slowest_spans (span_type=database) for a ranked list of slow spans across traces.

`get_apm_timeseries`

Description: Returns performance trends over time, including response times, throughput, and HTTP failures.
Use Case: Visualizes performance trends and anomalies, such as sudden traffic drops or latency spikes.
Sample Question: "How did response time trend over the last 24 hours in order-service?"
Inputs: project_id, time_dur, optional transaction filter.
Next Steps: Use get_recent_transactions to find the affected endpoints, or get_recent_deployments to check for correlated releases.

`get_recent_deployments`

Description: Lists deployment markers, including version, release time, deployer, environment, and repository.
Use Case: Helps determine if a deployment caused a recent issue.
Sample Question: "What did we deploy in the last 6 hours?"
Inputs: project_id, limit.
Next Steps: Use correlate_deploy_with_incident or get_error_trends to analyze the impact of the deployment.

`correlate_deploy_with_incident`

Description: Analyzes error rates before and after deployments within a lookback window.
Use Case: Quickly answers if a recent deployment caused an incident, replacing manual lookup sequences.
Sample Question: "What changed around 2:30pm in the payments service?"
Inputs: project_id, optional incident_time (ISO 8601), lookback_minutes (default 60), compare_window_minutes (default 15).
Next Steps: For the deployment with the largest error increase, use get_recent_errors and get_error_details to investigate.
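
To make the two window parameters concrete, here is a rough sketch of how a lookback window and a compare window could fit together. The exact semantics inside the server are not documented on this page, so treat everything below as an assumption: the helper name `comparison_windows` is hypothetical, and it assumes that deployments are selected from the `lookback_minutes` before the incident and that error rates are compared over `compare_window_minutes` on each side of a deployment.

```python
from datetime import datetime, timedelta, timezone


def comparison_windows(incident_time, deploy_times,
                       lookback_minutes=60, compare_window_minutes=15):
    """Return (deployed_at, before_window, after_window) tuples.

    Assumed semantics (not confirmed by the docs): only deployments inside
    the lookback window are considered, and each one gets an equal-length
    window before and after the deployment time.
    """
    lookback_start = incident_time - timedelta(minutes=lookback_minutes)
    span = timedelta(minutes=compare_window_minutes)
    windows = []
    for deployed_at in sorted(deploy_times):
        if lookback_start <= deployed_at <= incident_time:
            before = (deployed_at - span, deployed_at)
            after = (deployed_at, deployed_at + span)
            windows.append((deployed_at, before, after))
    return windows
```

With an incident at 14:30 UTC and the defaults, a deployment at 14:10 would be compared over 13:55 to 14:10 and 14:10 to 14:25, while a deployment at 13:00 would fall outside the lookback window.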

`update_error_status`

Description: Updates the status of an error group to Resolved, Ignored, or Open.
Use Case: Allows automated systems or assistants to manage error status. Requires a Read & Write API key.
Sample Question: "Mark error abc123 as resolved."
Inputs: project_id, error_id, status (Resolved / Ignored / Open).

Important: update_error_status mutates state and is the only write tool in the server. It is unavailable to Read-scoped keys. A Resolved error can reopen if it recurs; an Ignored error stays muted and sends no notifications.
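
As a sketch of the constraints above, a client-side guard could check the key scope and status value before the call is ever sent. The function name and the `"read"` / `"read_write"` scope labels are hypothetical; only the tool name, its inputs, and the three status values come from this page.

```python
# Status values listed for update_error_status on this page.
ALLOWED_STATUSES = ("Resolved", "Ignored", "Open")


def build_update_error_status_args(project_id, error_id, status, key_scope):
    """Validate and build arguments for update_error_status.

    key_scope is a hypothetical label for the API key's access level:
    "read" or "read_write".
    """
    if key_scope != "read_write":
        # The tool is unavailable to Read-scoped keys.
        raise PermissionError("update_error_status needs a Read & Write API key")
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"status must be one of {ALLOWED_STATUSES}")
    return {"project_id": project_id, "error_id": error_id, "status": status}
```

Failing early on the client keeps a mistyped status or a Read-scoped key from turning into a confusing server-side error.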

Kubernetes

Tool	Access	Answers
`get_kubernetes_overview`	Read	Is the cluster OK? (snapshot)
`get_kubernetes_cluster_names`	Read	What clusters/namespaces exist?
`list_kubernetes_pods`	Read	Pod-level investigation
`get_kubernetes_pending_pods`	Read	What's stuck pending?
`list_kubernetes_nodes`	Read	Node capacity & pressure
`list_kubernetes_deployments`	Read	Did the rollout finish?
`get_kubernetes_events`	Read	Why did this fail?
`get_kubernetes_unhealthy_workloads`	Read	What's broken in the workload tier?
`list_kubernetes_storage`	Read	Any storage/PVC issues?
`get_pod_details`	Read	Full picture of one pod (`kubectl describe`)
`list_pods_on_node`	Read	Which pods are eating this node?
`get_pod_logs`	Read	What did this pod log before it died?

`get_kubernetes_overview`

Description: Provides a cluster-wide summary of nodes, pods, deployments, and other key resources.
Use Case: Gives a high-level snapshot of cluster health. If the snapshot comes back empty for your account, use get_kubernetes_unhealthy_workloads instead for a more reliable view.
Sample Question: "Give me a snapshot of the production cluster."
Inputs: time_dur, optional cluster, namespace.
Next Steps: Use get_kubernetes_unhealthy_workloads to investigate specific workload failures.

`get_kubernetes_cluster_names`

Description: Lists monitored clusters and namespaces.
Use Case: Helps verify cluster and namespace names before running other commands. It is a discovery tool and does not need to be called routinely.
Sample Question: "What clusters and namespaces are we monitoring?"
Inputs: time_dur, optional cluster filter.

`list_kubernetes_pods`

Description: Lists pods with status, node, CPU, memory, restarts, and owner workload details.
Use Case: Standard tool for investigating pod health and resource usage, sortable by CPU, memory, or restarts.
Sample Question: "Top 10 pods by CPU in the prod cluster."
Inputs: time_dur, optional cluster / namespace, order_by, limit, page.
Next Steps: Use get_pod_details on a failing pod, then get_pod_logs to review logs.

`get_kubernetes_pending_pods`

Description: Lists pods stuck in a Pending state, highlighting node assignment issues.
Use Case: Helps debug scheduling issues such as resource exhaustion, taints, or affinity rules.
Sample Question: "Are any pods stuck pending right now?"
Inputs: time_dur, optional cluster / namespace.
Next Steps: Use list_kubernetes_nodes to check capacity or get_kubernetes_events to look for scheduling errors.

`list_kubernetes_nodes`

Description: Returns node capacity, resource usage, and pressure conditions (like MemoryPressure, DiskPressure, PIDPressure, or Ready).
Use Case: Useful for debugging scheduling failures or auditing cluster capacity.
Sample Question: "Are any nodes under MemoryPressure?"
Inputs: time_dur, optional cluster, order_by, pagination.
Next Steps: Use list_pods_on_node to see which pods are consuming resources on a pressured node.

`list_kubernetes_deployments`

Description: Tracks desired, available, updated, and unavailable pod counts for deployments.
Use Case: Helps verify if a deployment rollout has completed successfully.
Sample Question: "Are all deployments in the orders namespace healthy?"
Inputs: time_dur, optional cluster / namespace, pagination.
Next Steps: Use get_pod_details and get_pod_logs if a deployment fails to reach its desired replica count.

`get_kubernetes_events`

Description: Retrieves the Kubernetes event stream, including warnings and error reasons.
Use Case: The primary tool for diagnosing workload failures (such as CrashLoopBackOff or ImagePullBackOff).
Sample Question: "Why is pod payment-api-xyz failing?"
Inputs: time_dur, optional cluster / namespace, kind, name.
Next Steps: Use get_pod_details and get_pod_logs to investigate the affected pod.

`get_kubernetes_unhealthy_workloads`

Description: Consolidates unhealthy Deployments, DaemonSets, and StatefulSets into a single view.
Use Case: A quick shortcut for identifying all broken workloads in the cluster.
Sample Question: "Show me everything unhealthy in the cluster."
Inputs: time_dur, optional cluster / namespace.
Next Steps: Use get_kubernetes_events to find the failure reason, then check get_pod_details and get_pod_logs.

`list_kubernetes_storage`

Description: Displays PersistentVolumeClaims (PVCs) and PersistentVolumes (PVs), flagging unbound claims.
Use Case: Helps troubleshoot storage issues for stateful workloads, such as databases or queues.
Sample Question: "Why won't the database pod start? Any storage issues?"
Inputs: time_dur, optional cluster / namespace, limit.
Next Steps: Check get_kubernetes_events for errors related to unbound volumes.

`get_pod_details`

Description: Provides detailed pod state and recent events in a single view (similar to kubectl describe).
Use Case: Offers comprehensive context for a failing pod without requiring multiple queries.
Sample Question: "Tell me everything about pod recommendation-service-bd844bf7-wz8wb."
Inputs: pod_name (required), optional cluster, namespace, time_dur.
Next Steps: Use get_pod_logs to view logs, and get_kubernetes_events to check for node-level warnings.

`list_pods_on_node`

Description: Lists pods running on a specific node, sorted by resource consumption.
Use Case: Identifies resource-heavy pods when a node is under high utilization.
Sample Question: "Top 5 pods on node ip-10-0-3-44 by CPU."
Inputs: node_name (required), time_dur, optional cluster, order_by, limit.
Next Steps: Use get_pod_details and get_pod_logs on the heaviest pods.

`get_pod_logs`

Description: Retrieves log lines for a specific Kubernetes pod.
Use Case: Crucial step for diagnosing application-level crashes (such as CrashLoopBackOff). Note that pod name matching is best-effort.
Sample Question: "Show me the last 100 log lines from pod payments-7f9b8c4d5-xk2lp."
Inputs: pod_name (required), optional namespace, cluster, level, query, time_dur (default 30m), limit (max 200).
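
The input rules for get_pod_logs can be sketched as an argument builder. The helper name is hypothetical, and clamping the limit on the client side is a design choice of this sketch; the page does not say how the server reacts to a limit above 200.

```python
MAX_POD_LOG_LIMIT = 200            # documented maximum for limit
DEFAULT_POD_LOG_TIME_DUR = "30m"   # documented default for time_dur
OPTIONAL_FILTERS = {"namespace", "cluster", "level", "query"}


def build_pod_logs_args(pod_name, limit=100,
                        time_dur=DEFAULT_POD_LOG_TIME_DUR, **filters):
    """Build get_pod_logs arguments, clamping limit to the documented cap."""
    if not pod_name:
        raise ValueError("pod_name is required")
    unknown = set(filters) - OPTIONAL_FILTERS
    if unknown:
        raise ValueError(f"unsupported filters: {sorted(unknown)}")
    args = {
        "pod_name": pod_name,
        "time_dur": time_dur,
        # Keep the limit between 1 and the documented maximum.
        "limit": max(1, min(limit, MAX_POD_LOG_LIMIT)),
    }
    # Only send filters that were actually provided.
    args.update({k: v for k, v in filters.items() if v is not None})
    return args
```

Calling `build_pod_logs_args("payments-7f9b8c4d5-xk2lp", limit=500, level="error")` yields a request capped at 200 lines over the default 30-minute window.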

Infrastructure

Tool	Access	Answers
`get_infrastructure_overview`	Read	How is the fleet? (summary)
`list_infrastructure_hosts`	Read	Which hosts are inactive?
`get_infrastructure_checks`	Read	Are all agents reporting?
`get_infrastructure_metrics`	Read	Any custom infra metric
`list_containers`	Read	Which Docker containers are hot?
`get_container_events`	Read	Why did this container restart?
`list_host_processes`	Read	Which process is eating CPU?
`list_active_plugins`	Read	Is integration X reporting?
`get_host_inventory`	Read	OS / kernel / hardware per host

`get_infrastructure_overview`

Description: Provides a fleet-wide summary of hosts, containers, processes, and average utilization.
Use Case: Quick health check for your infrastructure fleet. Note that if this summary returns empty for your account, you should use list_infrastructure_hosts instead.
Sample Question: "Give me an infrastructure summary."
Inputs: time_dur, optional hostname.
Next Steps: Use list_infrastructure_hosts for a detailed per-host list.

`list_infrastructure_hosts`

Description: Lists hosts with status (Active/Inactive), CPU, memory, disk usage, and reporting timestamps.
Use Case: Identifies inactive hosts or systems under high resource usage.
Sample Question: "List all hosts and tell me which ones are inactive."
Inputs: time_dur, optional hostname, pagination.
Next Steps: For inactive hosts, check get_logs or get_container_events. For overloaded hosts, check list_host_processes.

`get_infrastructure_checks`

Description: Checks status (Up/Down) of agent tasks, process monitors, and integrations.
Use Case: Confirms that telemetry collections and checks are running properly.
Sample Question: "Are any infra checks failing?"
Inputs: time_dur, optional hostname.

`get_infrastructure_metrics`

Description: Queries custom infrastructure metrics (including CPU breakdown, load average, memory, network, and disk).
Use Case: Custom query tool for metrics not covered by standard tools.
Sample Question: "Show me CPU iowait by hostname for the last 6 hours."
Inputs: time_dur, metrics (array), optional group_by, hostname.

`list_containers`

Description: Lists running Docker containers (non-Kubernetes) with resource usage metrics.
Use Case: Monitors container resource usage and health on standalone Docker hosts.
Sample Question: "List the top 10 containers by CPU usage."
Inputs: time_dur, optional hostname, container_name, image_name, order_by, pagination.
Next Steps: Use get_container_events if a container is unstable.

`get_container_events`

Description: Tracks container lifecycle events (such as restarts, exit codes, and OOM kills).
Use Case: Troubleshoots container instability and crashes.
Sample Question: "Why did the worker container restart in the last 24h?"
Inputs: time_dur, optional hostname, container_name, image_name, pagination.

`list_host_processes`

Description: Lists the top processes on a host sorted by CPU or memory usage.
Use Case: Pinpoints the exact process causing high CPU or memory utilization on a host.
Sample Question: "Top 10 processes on app-prod-1 by CPU."
Inputs: hostname (required), time_dur, sort_by (cpu/memory), limit.
Next Steps: Use get_logs to check for error outputs from the offending process.

`list_active_plugins`

Description: Lists active Atatus integrations (like Redis, MongoDB, MySQL, and Nginx).
Use Case: Confirms if specific middleware integrations are reporting data correctly.
Sample Question: "Is Redis monitoring active?"
Inputs: None.

`get_host_inventory`

Description: Details hardware and OS configurations for each host.
Use Case: Useful for system audits, version verification, and upgrade planning.
Sample Question: "Which hosts are still on Ubuntu 18.04?"
Inputs: optional hostname, time_dur, pagination.
Next Steps: Use list_infrastructure_hosts to verify reporting status.

Logs

`get_logs`

Description: Searches unified logs across applications, Kubernetes, Docker, and hosts.
Use Case: The core log-analysis tool for diagnosing application-level issues.
Sample Question: "Show me error logs from host Zenitsu in the last 7 days."
Inputs: hostname, service, level, query, pod_name, namespace, cluster, time parameters, and limits.
Next Steps: Use analyze_logs to group recurring patterns if log volume is high.

Distributed Tracing

Tool	Access	Answers
`search_traces`	Read	Find the slow/failed requests
`get_trace_detail`	Read	Where did this request spend its time?
`get_trace_flame_chart`	Read	Which span is the bottleneck?
`get_slowest_spans`	Read	What are the slowest operations?
`get_service_map`	Read	How do my services depend on each other?

`search_traces`

Description: Searches distributed traces, returning latency and status details.
Use Case: Finds slow or failed microservice requests.
Sample Question: "Find all traces in checkout-service longer than 2 seconds in the last hour."
Inputs: time and project ID, status codes, duration thresholds, environment, and pagination.
Next Steps: Use get_trace_detail or get_trace_flame_chart with the returned trace IDs.

`get_trace_detail`

Description: Returns trace timeline breakdowns, database calls, and downstream service requests.
Use Case: Drills down into a trace to identify which service or query is causing latency.
Sample Question: "Tell me everything about trace abc123."
Inputs: trace_id, project_id, timestamp, and fallback time options.
Next Steps: Use get_trace_flame_chart to identify the bottleneck span.

`get_trace_flame_chart`

Description: Renders parent-child span relationships and timings for a specific trace.
Use Case: Visualizes microservice calls to find bottlenecks.
Sample Question: "Show me the flame chart for trace abc123 — which span is the bottleneck?"
Inputs: trace_id, project_id, time window options.

`get_slowest_spans`

Description: Lists the slowest operations (such as DB queries or remote calls) across all traces in a project.
Use Case: Finds slow queries or operations without checking individual traces.
Sample Question: "Show me the 10 slowest spans in order-service today."
Inputs: project ID, time window, span type, sort order, and pagination.
Next Steps: Use get_trace_detail for the trace associated with a slow span.

`get_service_map`

Description: Renders microservice dependency graphs with request volume, latency, and error rates.
Use Case: Visualizes cascading failures or latency propagation across services.
Sample Question: "Show me the service map for production — anywhere with high error rate?"
Inputs: time_dur, optional environment.
Next Steps: Use search_traces on the service with the highest error rate.

AI-Powered Analysis

Tool	Access	Answers
`analyze_logs`	Read	What are the top recurring log patterns?
`analyze_kubernetes_event_storm`	Read	What is the K8s event storm mostly about?
`analyze_slow_transactions`	Read	Where should I focus optimisation effort?

`analyze_logs`

Description: Clusters log patterns and displays recurrence counts.
Use Case: Helps analyze large log volumes by grouping duplicate entries.
Sample Question: "What's the most common log pattern in order-service today?"
Inputs: service, host, log level, time window, limit, and sample size.
Next Steps: Run get_logs with a query matching the target pattern to view raw entries.

`analyze_kubernetes_event_storm`

Description: Clusters Kubernetes events by error reason and affected object counts.
Use Case: Summarizes system events during failures (like OOM or scheduling issues).
Sample Question: "Why is the prod cluster acting up? Summarise the event storm."
Inputs: time range, cluster, namespace, limits, and sample size.
Next Steps: Use get_kubernetes_events followed by get_pod_details to investigate.

`analyze_slow_transactions`

Description: Identifies optimization targets by analyzing total latency and long-tail performance outliers.
Use Case: Ranks transactions to show where optimization efforts yield the highest impact.
Sample Question: "Where should I focus my optimisation effort in order-service?"
Inputs: project ID, time range, limits, and sample size.
Next Steps: Use get_transaction_spans on the recommended endpoint.

Time ranges

Most tools accept a time_dur parameter to specify the lookback window. You can use standard duration units:

Minutes: 5m, 10m, 15m, 30m, 60m
Hours: 1h, 3h, 6h, 12h, 24h
Days: 1d, 2d, 3d, 7d, 14d
Weeks: 1w, 2w (normalized to 7d and 14d)
Months: 1M, 2M, 3M
Custom: Set time_dur to custom and provide timeStart and timeEnd as ISO 8601 timestamps.
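
The normalization of week values can be illustrated with a small validator. This is a sketch under two assumptions: the helper name `normalize_time_dur` is hypothetical, and the values listed above are treated as the complete accepted set, which the server may not enforce.

```python
# Values listed on this page; the server may accept others.
PRESET_DURATIONS = {
    "5m", "10m", "15m", "30m", "60m",
    "1h", "3h", "6h", "12h", "24h",
    "1d", "2d", "3d", "7d", "14d",
    "1M", "2M", "3M",
}
WEEK_ALIASES = {"1w": "7d", "2w": "14d"}


def normalize_time_dur(value):
    """Map a time_dur string to its normalized form, or raise ValueError."""
    value = value.strip()
    if value == "custom":
        return value
    if value in WEEK_ALIASES:
        return WEEK_ALIASES[value]
    # Matching is case-sensitive: "1M" is a month, "30m" is minutes.
    if value in PRESET_DURATIONS:
        return value
    raise ValueError(f"unsupported time_dur: {value!r}")
```

Here `normalize_time_dur("2w")` returns `"14d"`, and a value such as `"30M"` is rejected because uppercase `M` denotes months.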

Example custom time range:

{
  "time_dur": "custom",
  "timeStart": "2026-04-15T00:00:00Z",
  "timeEnd":   "2026-04-22T00:00:00Z"
}
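
If you build these arguments in code, a small helper can produce the same shape as the example above from timezone-aware datetimes. The function name `custom_time_range` is hypothetical; the three keys and the timestamp format follow the example.

```python
from datetime import datetime, timezone


def custom_time_range(start, end):
    """Build custom time-range arguments from timezone-aware datetimes."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("start and end must be timezone-aware")
    if start >= end:
        raise ValueError("start must be earlier than end")

    def iso_utc(moment):
        # Convert to UTC and format as an ISO 8601 timestamp with a Z suffix.
        return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    return {
        "time_dur": "custom",
        "timeStart": iso_utc(start),
        "timeEnd": iso_utc(end),
    }
```

Requiring timezone-aware values avoids silently interpreting a naive local time as UTC.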

Note that list_projects, get_kubernetes_cluster_names, and list_active_plugins do not accept time parameters.

Note: Log retention is 7 days. Queries for logs older than this return no results.

Response size and pagination

To keep responses clear and stay within context limits, tools return focused, capped results:

Limit: Most list tools accept a limit parameter to restrict the result count.
Pagination: Paginated tools accept a page parameter to walk through subsequent batches of results.
Automatic Truncation: When results exceed the limit, the response will indicate this (e.g., "showing 20 of 134"). Large fields like log lines or stack traces are truncated using a … (truncated) marker.
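
A client that wants to know how many results were left out could look for the truncation notice in the response text. This sketch assumes the notice follows the "showing 20 of 134" wording from the example above; the real wording may differ, and `remaining_results` is a hypothetical helper.

```python
import re

# Assumed notice format, taken from the example on this page.
SHOWING_RE = re.compile(r"showing\s+(\d+)\s+of\s+(\d+)", re.IGNORECASE)


def remaining_results(response_text):
    """Return how many results were not shown, or None if no notice is found."""
    match = SHOWING_RE.search(response_text)
    if match is None:
        return None
    shown, total = int(match.group(1)), int(match.group(2))
    return max(total - shown, 0)
```

A return value of `None` means no notice was found, which is different from a notice reporting zero remaining results.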

If you need more details, refine your query using filters or walk through the results page by page.
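
Walking through pages can be sketched as a simple loop. Several things here are assumptions rather than documented behavior: `call_tool` is a hypothetical callable standing in for whatever MCP client you use, it is assumed to return a list of items, and page numbering is assumed to start at 1.

```python
def collect_pages(call_tool, tool_name, base_args, limit=20, max_pages=5):
    """Fetch up to max_pages pages from a paginated tool and merge the items.

    call_tool(tool_name, args) is a hypothetical client function that
    returns a list of result items for one page.
    """
    items = []
    for page in range(1, max_pages + 1):
        args = {**base_args, "limit": limit, "page": page}
        batch = call_tool(tool_name, args)
        items.extend(batch)
        # A short page means there is nothing left to fetch.
        if len(batch) < limit:
            break
    return items
```

The `max_pages` cap keeps a broad query from pulling an unbounded number of results into the assistant's context.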

Note: Support for Database Monitoring, Alerts, Synthetics, and SLOs is planned for future updates.