This page provides a comprehensive list of all 44 tools available in the Atatus MCP server, organized into six categories. For each tool, you will find its description, the reason it is useful, a sample question that triggers it, its inputs, and typical next steps for chaining queries. You do not need to call these tools manually; your AI assistant will automatically select and chain the correct tools based on your question.

All tools are read-only except update_error_status, which changes an error group's status. Write tools require a Read & Write API key. The Access column of each reference table shows whether a tool reads or writes.

| Category | Tools |
|---|---|
| APM / Application | 14 |
| Kubernetes | 12 |
| Infrastructure | 9 |
| Logs | 1 |
| Distributed Tracing | 5 |
| AI-Powered Analysis | 3 |

Most tools that query a time window accept a time_dur parameter. For accepted values and result limits, see the Time ranges and Response size and pagination sections at the end of this page.


APM / Application

| Tool | Access | What it answers |
|---|---|---|
| list_projects | Read | What projects exist in this account? |
| get_apm_metrics | Read | How is this service doing overall? |
| get_service_health_summary | Read | Is service X healthy? (one-call verdict) |
| get_recent_errors | Read | What is currently breaking? |
| get_error_details | Read | Why is this error happening? |
| get_error_events | Read | Who/what is each occurrence hitting? |
| get_error_trends | Read | When did errors spike? |
| get_recent_transactions | Read | Which endpoint is slow? |
| get_transaction_spans | Read | Why is this transaction slow? |
| get_apm_database_calls | Read | How are my database calls performing? |
| get_apm_timeseries | Read | How did performance trend over time? |
| get_recent_deployments | Read | What did we deploy, and when? |
| correlate_deploy_with_incident | Read | What changed around this incident? |
| update_error_status | Write | Resolve / ignore / reopen an error |

list_projects

  • Description: Returns all monitored applications and projects in the account (including name, type, language, project ID, and active status).
  • Use Case: Helps the assistant discover the required project_id for subsequent queries, so you do not have to look up or memorize IDs.
  • Sample Question: "What projects do we have in Atatus?"
  • Inputs: None.

get_apm_metrics

  • Description: Returns project-level APM aggregates (including Apdex score details, average response time, throughput, and failure rate).
  • Use Case: Provides a high-level view of service health in a single call, unlike endpoint-specific tools.
  • Sample Question: "How healthy is order-service overall today?"
  • Inputs: project_id, time_dur, optional transaction substring filter.
  • Next Steps: Use get_apm_timeseries to see performance trends or get_recent_transactions to find slow endpoints.
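
Apdex, one of the aggregates above, follows a published standard: with a target threshold T, requests at or under T count as satisfied, requests above T but at or under 4T count as tolerating, and the rest count as frustrated. The sketch below computes the score from raw response times to show what the number means. The threshold Atatus applies to your project is not stated on this page, so `threshold_ms` is an assumption you supply.

```python
def apdex(response_times_ms: list[float], threshold_ms: float) -> float:
    """Standard Apdex score: (satisfied + tolerating / 2) / total samples."""
    if not response_times_ms:
        raise ValueError("need at least one sample")
    # Satisfied: at or under the threshold T.
    satisfied = sum(1 for t in response_times_ms if t <= threshold_ms)
    # Tolerating: above T but at or under 4T. Anything slower is frustrated.
    tolerating = sum(1 for t in response_times_ms if threshold_ms < t <= 4 * threshold_ms)
    return (satisfied + tolerating / 2) / len(response_times_ms)


# T = 500 ms: two satisfied, one tolerating (900 ms), one frustrated (2500 ms).
print(apdex([120, 480, 900, 2500], threshold_ms=500))  # 0.625
```

A score of 1.0 means every sampled request met the target; frustrated requests pull the score toward 0.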

get_service_health_summary

  • Description: Evaluates service health to return a status (HEALTHY, WATCH, or DEGRADED) along with golden signals (such as Apdex, error rates, top open errors, slow endpoints, and recent deployments).
  • Use Case: Quickly checks if a service is running smoothly by consolidating multiple queries into a single response.
  • Sample Question: "Is checkout-service healthy right now?"
  • Inputs: project_id, time_dur (default 1h).
  • Next Steps: If the service is in a WATCH or DEGRADED state, use get_error_details to investigate top errors, get_recent_transactions for latency, or correlate_deploy_with_incident to check recent deployments.
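
The Next Steps above amount to a simple routing rule. The sketch below expresses it as two helpers: one builds the arguments for get_service_health_summary with the documented 1h default, and one maps a verdict to the follow-up tools named in this entry. It only returns names and argument dictionaries and does not call the Atatus MCP server; the project ID shown is hypothetical.

```python
# Follow-up tools named in the Next Steps for this entry.
FOLLOW_UPS = {
    "HEALTHY": [],
    "WATCH": ["get_error_details", "get_recent_transactions", "correlate_deploy_with_incident"],
    "DEGRADED": ["get_error_details", "get_recent_transactions", "correlate_deploy_with_incident"],
}


def health_summary_args(project_id: str, time_dur: str = "1h") -> dict:
    """Arguments for get_service_health_summary; time_dur defaults to 1h as documented."""
    return {"project_id": project_id, "time_dur": time_dur}


def next_tools(status: str) -> list[str]:
    """Follow-up tools to consider for a given verdict."""
    if status not in FOLLOW_UPS:
        raise ValueError(f"unknown status: {status!r}")
    return list(FOLLOW_UPS[status])


print(health_summary_args("proj_123"))  # hypothetical project ID
print(next_tools("WATCH"))
# ['get_error_details', 'get_recent_transactions', 'correlate_deploy_with_incident']
```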

get_recent_errors

  • Description: Lists the top error groups in a project ranked by impact (frequency and affected users).
  • Use Case: Helps identify what is currently breaking without manually filtering the Errors dashboard.
  • Sample Question: "Show me the top 5 unresolved errors in checkout-service in the last hour."
  • Inputs: project_id, time_dur, status, optional filters (like app version, browser, OS, URL).
  • Next Steps: Use get_error_details to view the stack trace and triggering deployment for the top error group.

get_error_details

  • Description: Returns the stack trace, request context, deployment details, and sample events for a specific error group.
  • Use Case: Helps diagnose the root cause of an error by showing where it occurred and which request triggered it.
  • Sample Question: "Tell me everything about error abc123."
  • Inputs: project_id, error_id.
  • Next Steps: Use get_error_events to see how occurrences vary across users and environments.

get_error_events

  • Description: Provides paginated instances of an error group, showing full stack traces, request payloads, user info, and environment details.
  • Use Case: Helps identify patterns across occurrences, such as whether an error only affects a specific browser or environment.
  • Sample Question: "Show me 5 sample events of error abc123 from the last 6 hours."
  • Inputs: project_id, error_id, time_dur, limit.

get_error_trends

  • Description: Tracks error frequency over time, showing total, peak, and latest counts.
  • Use Case: Helps correlate error spikes with specific events, such as deployments or traffic surges.
  • Sample Question: "Did errors spike in the last 24 hours in order-service?"
  • Inputs: project_id, time_dur.
  • Next Steps: Use get_recent_errors to identify which errors drove the spike, or get_recent_deployments to check for recent releases.
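
To make the total, peak, and latest figures concrete, here is how they can be derived from a series of per-interval error counts. The bucketed input (a plain list, oldest interval first) is an assumption for illustration; the tool's actual response shape may differ.

```python
from typing import NamedTuple


class TrendSummary(NamedTuple):
    total: int       # all errors in the window
    peak: int        # highest single-interval count
    peak_index: int  # which interval the peak fell in
    latest: int      # most recent interval


def summarize(counts: list[int]) -> TrendSummary:
    """Summarize per-interval error counts (oldest first)."""
    if not counts:
        raise ValueError("empty series")
    peak = max(counts)
    return TrendSummary(sum(counts), peak, counts.index(peak), counts[-1])


print(summarize([3, 4, 2, 41, 38, 5]))
# TrendSummary(total=93, peak=41, peak_index=3, latest=5)
```

Comparing latest with peak is a quick way to tell whether a spike is still ongoing or has already subsided.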

get_recent_transactions

  • Description: Lists performance metrics (average/min/max response time, throughput, failure rate, and Apdex) for endpoints.
  • Use Case: Pinpoints which transactions or endpoints are slowest. Includes a transaction ID to chain into span analysis.
  • Sample Question: "Which transactions are slowest in order-service this week?"
  • Inputs: project_id, time_dur.
  • Next Steps: Use get_transaction_spans with the transaction ID to identify the bottleneck.

get_transaction_spans

  • Description: Breaks down a transaction into database queries, external HTTP calls, and internal execution time.
  • Use Case: Helps understand why a specific endpoint is slow by highlighting the slowest layer.
  • Sample Question: "Why is GET /checkout slow? Show me the spans."
  • Inputs: project_id, transaction_id, time_dur.
  • Next Steps: Use get_recent_deployments to check if a recent release caused a regression.
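
The breakdown works by attributing time to layers. A minimal sketch, assuming each span carries a layer label and a duration (the field names `layer` and `duration_ms` are hypothetical), sums the time per layer and reports the slowest one:

```python
from collections import defaultdict


def slowest_layer(spans: list[dict]) -> tuple[str, float]:
    """Sum span durations per layer and return the layer with the most time."""
    if not spans:
        raise ValueError("no spans")
    totals: dict[str, float] = defaultdict(float)
    for span in spans:
        totals[span["layer"]] += span["duration_ms"]
    layer = max(totals, key=totals.get)
    return layer, totals[layer]


spans = [
    {"layer": "database", "duration_ms": 420.0},
    {"layer": "external_http", "duration_ms": 130.0},
    {"layer": "database", "duration_ms": 310.0},
    {"layer": "internal", "duration_ms": 95.0},
]
print(slowest_layer(spans))  # ('database', 730.0)
```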

get_apm_database_calls

  • Description: Lists database engines (like PostgreSQL, MySQL, Redis) with call volumes, response times, and slowest queries.
  • Use Case: Helps identify slow queries and database performance bottlenecks.
  • Sample Question: "Which database queries are slowest in order-service?"
  • Inputs: project_id, time_dur, optional database engine filter, order_by (responseTime / throughput), limit.
  • Next Steps: Use get_slowest_spans (span_type=database) for a ranked list of slow spans across traces.

get_apm_timeseries

  • Description: Returns performance trends over time, including response times, throughput, and HTTP failures.
  • Use Case: Visualizes performance trends and anomalies, such as sudden traffic drops or latency spikes.
  • Sample Question: "How did response time trend over the last 24 hours in order-service?"
  • Inputs: project_id, time_dur, optional transaction filter.
  • Next Steps: Use get_recent_transactions to find the affected endpoints, or get_recent_deployments to check for correlated releases.

get_recent_deployments

  • Description: Lists deployment markers, including version, release time, deployer, environment, and repository.
  • Use Case: Helps determine if a deployment caused a recent issue.
  • Sample Question: "What did we deploy in the last 6 hours?"
  • Inputs: project_id, limit.
  • Next Steps: Use correlate_deploy_with_incident or get_error_trends to analyze the impact of the deployment.

correlate_deploy_with_incident

  • Description: Analyzes error rates before and after deployments within a lookback window.
  • Use Case: Quickly answers whether a recent deployment caused an incident, replacing a manual sequence of deployment and error lookups.
  • Sample Question: "What changed around 2:30pm in the payments service?"
  • Inputs: project_id, optional incident_time (ISO 8601), lookback_minutes (default 60), compare_window_minutes (default 15).
  • Next Steps: For the deployment with the largest error increase, use get_recent_errors and get_error_details to investigate.
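
The before/after comparison can be pictured as follows: for each deployment inside the lookback window, count errors in the compare window just before and just after the deploy, then rank deployments by the increase. This sketch uses the documented defaults (lookback_minutes=60, compare_window_minutes=15) but is an illustration of the idea, not Atatus's implementation. The input shapes (version/timestamp pairs and a list of error timestamps) are hypothetical.

```python
from datetime import datetime, timedelta


def deploy_error_deltas(deploys, error_times, incident_time,
                        lookback_minutes=60, compare_window_minutes=15):
    """Rank deployments by how much the error count rose after them.

    deploys: list of (version, deployed_at) pairs; error_times: list of datetimes.
    Both shapes are hypothetical stand-ins for the data the tool reads.
    """
    window = timedelta(minutes=compare_window_minutes)
    earliest = incident_time - timedelta(minutes=lookback_minutes)
    rows = []
    for version, deployed_at in deploys:
        if not earliest <= deployed_at <= incident_time:
            continue  # outside the lookback window
        before = sum(1 for t in error_times if deployed_at - window <= t < deployed_at)
        after = sum(1 for t in error_times if deployed_at <= t < deployed_at + window)
        rows.append((version, before, after, after - before))
    # Largest increase first.
    return sorted(rows, key=lambda r: r[3], reverse=True)


def at(hh, mm):
    return datetime(2026, 4, 22, hh, mm)


deploys = [("v1.4.0", at(13, 50)), ("v1.4.1", at(14, 20))]
errors = [at(13, 40), at(13, 55), at(14, 5), at(14, 21), at(14, 22), at(14, 25), at(14, 28)]
print(deploy_error_deltas(deploys, errors, incident_time=at(14, 30)))
# [('v1.4.1', 1, 4, 3), ('v1.4.0', 1, 1, 0)]
```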

update_error_status

  • Description: Updates the status of an error group to Resolved, Ignored, or Open.
  • Use Case: Allows automated systems or assistants to manage error status. Requires a Read & Write API key.
  • Sample Question: "Mark error abc123 as resolved."
  • Inputs: project_id, error_id, status (Resolved / Ignored / Open).

Kubernetes

| Tool | Access | What it answers |
|---|---|---|
| get_kubernetes_overview | Read | Is the cluster OK? (snapshot) |
| get_kubernetes_cluster_names | Read | What clusters/namespaces exist? |
| list_kubernetes_pods | Read | Pod-level investigation |
| get_kubernetes_pending_pods | Read | What's stuck pending? |
| list_kubernetes_nodes | Read | Node capacity & pressure |
| list_kubernetes_deployments | Read | Did the rollout finish? |
| get_kubernetes_events | Read | Why did this fail? |
| get_kubernetes_unhealthy_workloads | Read | What's broken in the workload tier? |
| list_kubernetes_storage | Read | Any storage/PVC issues? |
| get_pod_details | Read | Full picture of one pod (kubectl describe) |
| list_pods_on_node | Read | Which pods are eating this node? |
| get_pod_logs | Read | What did this pod log before it died? |

get_kubernetes_overview

  • Description: Provides a cluster-wide summary of nodes, pods, deployments, and other key resources.
  • Use Case: Gives a high-level snapshot of cluster health. On some accounts this snapshot can come back empty; if it does, use get_kubernetes_unhealthy_workloads for a more reliable view.
  • Sample Question: "Give me a snapshot of the production cluster."
  • Inputs: time_dur, optional cluster, namespace.
  • Next Steps: Use get_kubernetes_unhealthy_workloads to investigate specific workload failures.

get_kubernetes_cluster_names

  • Description: Lists monitored clusters and namespaces.
  • Use Case: Helps verify cluster and namespace names before running other commands. It is a discovery tool and does not need to be called routinely.
  • Sample Question: "What clusters and namespaces are we monitoring?"
  • Inputs: time_dur, optional cluster filter.

list_kubernetes_pods

  • Description: Lists pods with status, node, CPU, memory, restarts, and owner workload details.
  • Use Case: Standard tool for investigating pod health and resource usage, sortable by CPU, memory, or restarts.
  • Sample Question: "Top 10 pods by CPU in the prod cluster."
  • Inputs: time_dur, optional cluster / namespace, order_by, limit, page.
  • Next Steps: Use get_pod_details on a failing pod, then get_pod_logs to review logs.

get_kubernetes_pending_pods

  • Description: Lists pods stuck in a Pending state, highlighting node assignment issues.
  • Use Case: Helps debug scheduling issues such as resource exhaustion, taints, or affinity rules.
  • Sample Question: "Are any pods stuck pending right now?"
  • Inputs: time_dur, optional cluster / namespace.
  • Next Steps: Use list_kubernetes_nodes to check capacity or get_kubernetes_events to look for scheduling errors.

list_kubernetes_nodes

  • Description: Returns node capacity, resource usage, and node conditions (MemoryPressure, DiskPressure, PIDPressure, and Ready).
  • Use Case: Useful for debugging scheduling failures or auditing cluster capacity.
  • Sample Question: "Are any nodes under MemoryPressure?"
  • Inputs: time_dur, optional cluster, order_by, pagination.
  • Next Steps: Use list_pods_on_node to see which pods are consuming resources on a pressured node.
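
In Kubernetes, a healthy node reports the Ready condition as True and each pressure condition as False. A sketch of that check, assuming the conditions arrive as a mapping from condition type to status string (a hypothetical shape; the tool's real response may differ):

```python
PRESSURE_CONDITIONS = ("MemoryPressure", "DiskPressure", "PIDPressure")


def node_problems(conditions: dict[str, str]) -> list[str]:
    """List the conditions that indicate trouble on one node."""
    # A pressure condition is a problem when its status is "True".
    problems = [c for c in PRESSURE_CONDITIONS if conditions.get(c) == "True"]
    # Ready is a problem when it is anything other than "True" (False or Unknown).
    if conditions.get("Ready") != "True":
        problems.append("NotReady")
    return problems


print(node_problems({"Ready": "True", "MemoryPressure": "True",
                     "DiskPressure": "False", "PIDPressure": "False"}))
# ['MemoryPressure']
```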

list_kubernetes_deployments

  • Description: Tracks desired, available, updated, and unavailable pod counts for deployments.
  • Use Case: Helps verify if a deployment rollout has completed successfully.
  • Sample Question: "Are all deployments in the orders namespace healthy?"
  • Inputs: time_dur, optional cluster / namespace, pagination.
  • Next Steps: Use get_pod_details and get_pod_logs if a deployment fails to reach its desired replica count.
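
A rollout is generally complete when every desired replica has been updated to the new spec and is available, with none unavailable. The check below is a simplified version of the conditions `kubectl rollout status` waits for; the field names are assumptions that mirror the four counts this tool reports.

```python
from dataclasses import dataclass


@dataclass
class DeploymentCounts:
    desired: int      # replicas requested in the spec
    updated: int      # replicas running the new pod template
    available: int    # replicas ready to serve
    unavailable: int  # replicas not yet available


def rollout_complete(d: DeploymentCounts) -> bool:
    """True when all desired replicas are updated and available."""
    return d.updated == d.desired and d.available == d.desired and d.unavailable == 0


print(rollout_complete(DeploymentCounts(desired=3, updated=3, available=2, unavailable=1)))
# False
```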

get_kubernetes_events

  • Description: Retrieves the Kubernetes event stream, including warnings and error reasons.
  • Use Case: The primary tool for diagnosing workload failures (such as CrashLoopBackOff or ImagePullBackOff).
  • Sample Question: "Why is pod payment-api-xyz failing?"
  • Inputs: time_dur, optional cluster / namespace, kind, name.
  • Next Steps: Use get_pod_details and get_pod_logs to investigate the affected pod.

get_kubernetes_unhealthy_workloads

  • Description: Consolidates unhealthy Deployments, DaemonSets, and StatefulSets into a single view.
  • Use Case: A quick shortcut for identifying all broken workloads in the cluster.
  • Sample Question: "Show me everything unhealthy in the cluster."
  • Inputs: time_dur, optional cluster / namespace.
  • Next Steps: Use get_kubernetes_events to find the failure reason, then check get_pod_details and get_pod_logs.

list_kubernetes_storage

  • Description: Displays PersistentVolumeClaims (PVCs) and PersistentVolumes (PVs), flagging unbound claims.
  • Use Case: Helps troubleshoot storage issues for stateful workloads, such as databases or queues.
  • Sample Question: "Why won't the database pod start? Any storage issues?"
  • Inputs: time_dur, optional cluster / namespace, limit.
  • Next Steps: Check get_kubernetes_events for errors related to unbound volumes.
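
A PersistentVolumeClaim is in one of three phases: Pending, Bound, or Lost. A claim stuck in Pending or marked Lost is a common reason a stateful pod will not start. A sketch of the flagging step, using a hypothetical list-of-dicts input rather than the tool's real response format:

```python
def unbound_claims(pvcs: list[dict]) -> list[str]:
    """Return "namespace/name (phase)" for every claim that is not Bound."""
    return [
        f"{p['namespace']}/{p['name']} ({p['phase']})"
        for p in pvcs
        if p["phase"] != "Bound"
    ]


pvcs = [
    {"namespace": "db", "name": "data-postgres-0", "phase": "Pending"},
    {"namespace": "db", "name": "data-postgres-1", "phase": "Bound"},
]
print(unbound_claims(pvcs))  # ['db/data-postgres-0 (Pending)']
```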

get_pod_details

  • Description: Provides detailed pod state and recent events in a single view (similar to kubectl describe).
  • Use Case: Offers comprehensive context for a failing pod without requiring multiple queries.
  • Sample Question: "Tell me everything about pod recommendation-service-bd844bf7-wz8wb."
  • Inputs: pod_name (required), optional cluster, namespace, time_dur.
  • Next Steps: Use get_pod_logs to view logs, and get_kubernetes_events to check for node-level warnings.

list_pods_on_node

  • Description: Lists pods running on a specific node, sorted by resource consumption.
  • Use Case: Identifies resource-heavy pods when a node is under high utilization.
  • Sample Question: "Top 5 pods on node ip-10-0-3-44 by CPU."
  • Inputs: node_name (required), time_dur, optional cluster, order_by, limit.
  • Next Steps: Use get_pod_details and get_pod_logs on the heaviest pods.

get_pod_logs

  • Description: Retrieves log lines for a specific Kubernetes pod.
  • Use Case: Crucial step for diagnosing application-level crashes (such as CrashLoopBackOff). Note that pod name matching is best-effort.
  • Sample Question: "Show me the last 100 log lines from pod payments-7f9b8c4d5-xk2lp."
  • Inputs: pod_name (required), optional namespace, cluster, level, query, time_dur (default 30m), limit (max 200).

Infrastructure

| Tool | Access | What it answers |
|---|---|---|
| get_infrastructure_overview | Read | How is the fleet? (summary) |
| list_infrastructure_hosts | Read | Which hosts are inactive? |
| get_infrastructure_checks | Read | Are all agents reporting? |
| get_infrastructure_metrics | Read | Any custom infra metric |
| list_containers | Read | Which Docker containers are hot? |
| get_container_events | Read | Why did this container restart? |
| list_host_processes | Read | Which process is eating CPU? |
| list_active_plugins | Read | Is integration X reporting? |
| get_host_inventory | Read | OS / kernel / hardware per host |

get_infrastructure_overview

  • Description: Provides a fleet-wide summary of hosts, containers, processes, and average utilization.
  • Use Case: Quick health check for your infrastructure fleet. If this summary returns empty for your account, use list_infrastructure_hosts instead.
  • Sample Question: "Give me an infrastructure summary."
  • Inputs: time_dur, optional hostname.
  • Next Steps: Use list_infrastructure_hosts for a detailed per-host list.

list_infrastructure_hosts

  • Description: Lists hosts with status (Active/Inactive), CPU, memory, disk usage, and reporting timestamps.
  • Use Case: Identifies inactive hosts or systems under high resource usage.
  • Sample Question: "List all hosts and tell me which ones are inactive."
  • Inputs: time_dur, optional hostname, pagination.
  • Next Steps: For inactive hosts, check get_logs or get_container_events. For overloaded hosts, check list_host_processes.

get_infrastructure_checks

  • Description: Returns the status (Up/Down) of agent tasks, process monitors, and integrations.
  • Use Case: Confirms that telemetry collection and checks are running properly.
  • Sample Question: "Are any infra checks failing?"
  • Inputs: time_dur, optional hostname.

get_infrastructure_metrics

  • Description: Queries custom infrastructure metrics (including CPU breakdown, load average, memory, network, and disk).
  • Use Case: Custom query tool for metrics not covered by standard tools.
  • Sample Question: "Show me CPU iowait by hostname for the last 6 hours."
  • Inputs: time_dur, metrics (array), optional group_by, hostname.

list_containers

  • Description: Lists running Docker containers (non-Kubernetes) with resource usage metrics.
  • Use Case: Monitors container resource usage and health on standalone Docker hosts.
  • Sample Question: "List the top 10 containers by CPU usage."
  • Inputs: time_dur, optional hostname, container_name, image_name, order_by, pagination.
  • Next Steps: Use get_container_events if a container is unstable.

get_container_events

  • Description: Tracks container lifecycle events (such as restarts, exit codes, and OOM kills).
  • Use Case: Troubleshoots container instability and crashes.
  • Sample Question: "Why did the worker container restart in the last 24h?"
  • Inputs: time_dur, optional hostname, container_name, image_name, pagination.
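
Exit codes in these events follow the Unix convention: 0 is a clean exit, and a code above 128 means the process was killed by a signal (code minus 128). Code 137 (SIGKILL) is what you typically see after an OOM kill, but a manual `docker kill` produces the same code, so look for an explicit OOM event before concluding. A small sketch of that interpretation, with a deliberately short signal table:

```python
# Common Linux signal numbers; extend as needed.
SIGNAL_NAMES = {6: "SIGABRT", 9: "SIGKILL", 11: "SIGSEGV", 15: "SIGTERM"}


def describe_exit(code: int) -> str:
    """Interpret a container exit code using the 128 + signal convention."""
    if code == 0:
        return "clean exit"
    if code > 128:
        sig = code - 128
        name = SIGNAL_NAMES.get(sig, f"signal {sig}")
        hint = " (often an OOM kill)" if sig == 9 else ""
        return f"killed by {name}{hint}"
    return f"application error (exit {code})"


print(describe_exit(137))  # killed by SIGKILL (often an OOM kill)
print(describe_exit(143))  # killed by SIGTERM
```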

list_host_processes

  • Description: Lists the top processes on a host sorted by CPU or memory usage.
  • Use Case: Pinpoints the exact process causing high CPU or memory utilization on a host.
  • Sample Question: "Top 10 processes on app-prod-1 by CPU."
  • Inputs: hostname (required), time_dur, sort_by (cpu/memory), limit.
  • Next Steps: Use get_logs to check for error outputs from the offending process.

list_active_plugins

  • Description: Lists active Atatus integrations (like Redis, MongoDB, MySQL, and Nginx).
  • Use Case: Confirms if specific middleware integrations are reporting data correctly.
  • Sample Question: "Is Redis monitoring active?"
  • Inputs: None.

get_host_inventory

  • Description: Details hardware and OS configurations for each host.
  • Use Case: Useful for system audits, version verification, and upgrade planning.
  • Sample Question: "Which hosts are still on Ubuntu 18.04?"
  • Inputs: optional hostname, time_dur, pagination.
  • Next Steps: Use list_infrastructure_hosts to verify reporting status.

Logs

get_logs

  • Description: Searches unified logs across applications, Kubernetes, Docker, and hosts.
  • Use Case: The core log-analysis tool for diagnosing application-level issues.
  • Sample Question: "Show me error logs from host Zenitsu in the last 7 days."
  • Inputs: hostname, service, level, query, pod_name, namespace, cluster, time parameters, and limits.
  • Next Steps: Use analyze_logs to group recurring patterns if log volume is high.

Distributed Tracing

| Tool | Access | What it answers |
|---|---|---|
| search_traces | Read | Find the slow/failed requests |
| get_trace_detail | Read | Where did this request spend its time? |
| get_trace_flame_chart | Read | Which span is the bottleneck? |
| get_slowest_spans | Read | What are the slowest operations? |
| get_service_map | Read | How do my services depend on each other? |

search_traces

  • Description: Searches distributed traces, returning latency and status details.
  • Use Case: Finds slow or failed requests across microservices.
  • Sample Question: "Find all traces in checkout-service longer than 2 seconds in the last hour."
  • Inputs: project ID, time window, status codes, duration thresholds, environment, and pagination.
  • Next Steps: Use get_trace_detail or get_trace_flame_chart with the returned trace IDs.

get_trace_detail

  • Description: Returns trace timeline breakdowns, database calls, and downstream service requests.
  • Use Case: Drills down into a trace to identify which service or query is causing latency.
  • Sample Question: "Tell me everything about trace abc123."
  • Inputs: trace_id, project_id, timestamp, and fallback time options.
  • Next Steps: Use get_trace_flame_chart to identify the bottleneck span.

get_trace_flame_chart

  • Description: Renders parent-child span relationships and timings for a specific trace.
  • Use Case: Visualizes microservice calls to find bottlenecks.
  • Sample Question: "Show me the flame chart for trace abc123 — which span is the bottleneck?"
  • Inputs: trace_id, project_id, time window options.
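
The bottleneck in a flame chart is usually the span with the largest self time: its own duration minus the time spent in its direct children. A minimal sketch over a parent-child span list, assuming hypothetical fields `id`, `parent_id`, and `duration_ms`. Children that run in parallel can make the naive subtraction negative, so the sketch clamps self time at zero.

```python
from collections import defaultdict


def self_times(spans: list[dict]) -> dict[str, float]:
    """Self time per span: own duration minus the durations of direct children."""
    child_total: dict[str, float] = defaultdict(float)
    for s in spans:
        if s["parent_id"] is not None:
            child_total[s["parent_id"]] += s["duration_ms"]
    return {s["id"]: max(0.0, s["duration_ms"] - child_total[s["id"]]) for s in spans}


def bottleneck(spans: list[dict]) -> str:
    """ID of the span with the largest self time."""
    times = self_times(spans)
    return max(times, key=times.get)


spans = [
    {"id": "root", "parent_id": None, "duration_ms": 1200.0},
    {"id": "auth", "parent_id": "root", "duration_ms": 80.0},
    {"id": "db.query", "parent_id": "root", "duration_ms": 900.0},
    {"id": "render", "parent_id": "root", "duration_ms": 150.0},
]
print(bottleneck(spans))  # db.query
```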

get_slowest_spans

  • Description: Lists the slowest operations (such as DB queries or remote calls) across all traces in a project.
  • Use Case: Finds slow queries or operations without checking individual traces.
  • Sample Question: "Show me the 10 slowest spans in order-service today."
  • Inputs: project ID, time window, span type, sort order, and pagination.
  • Next Steps: Use get_trace_detail for the trace associated with a slow span.

get_service_map

  • Description: Renders microservice dependency graphs with request volume, latency, and error rates.
  • Use Case: Visualizes cascading failures or latency propagation across services.
  • Sample Question: "Show me the service map for production — anywhere with high error rate?"
  • Inputs: time_dur, optional environment.
  • Next Steps: Use search_traces on the service with the highest error rate.
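
The service map is a directed graph whose edges carry request volume, latency, and error rates. Choosing where to start with search_traces then comes down to ranking edges by error rate. This sketch uses hypothetical edge fields and skips edges with too little traffic to be meaningful; the `min_requests` cut-off of 100 is an arbitrary assumption.

```python
def worst_edges(edges: list[dict], min_requests: int = 100, top: int = 3) -> list[tuple]:
    """Rank caller -> callee edges by error rate, ignoring low-traffic edges."""
    rated = [
        (e["source"], e["target"], e["errors"] / e["requests"])
        for e in edges
        if e["requests"] >= min_requests
    ]
    return sorted(rated, key=lambda r: r[2], reverse=True)[:top]


edges = [
    {"source": "frontend", "target": "checkout", "requests": 10_000, "errors": 50},
    {"source": "checkout", "target": "payments", "requests": 4_000, "errors": 240},
    {"source": "payments", "target": "fraud", "requests": 20, "errors": 10},  # too little traffic
]
for source, target, rate in worst_edges(edges):
    print(f"{source} -> {target}: {rate:.1%}")
```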

AI-Powered Analysis

| Tool | Access | What it answers |
|---|---|---|
| analyze_logs | Read | What are the top recurring log patterns? |
| analyze_kubernetes_event_storm | Read | What is the K8s event storm mostly about? |
| analyze_slow_transactions | Read | Where should I focus optimisation effort? |

analyze_logs

  • Description: Clusters log patterns and displays recurrence counts.
  • Use Case: Helps analyze large log volumes by grouping duplicate entries.
  • Sample Question: "What's the most common log pattern in order-service today?"
  • Inputs: service, host, log level, time window, limit, and sample size.
  • Next Steps: Run get_logs with a query matching the target pattern to view raw entries.
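
Pattern clustering generally works by masking the parts of a message that vary (numbers, IDs, IP addresses) so that lines differing only in those values collapse into one template. The sketch below uses simple regular-expression masking as a deliberately basic stand-in. It is not the algorithm Atatus uses, which this page does not describe.

```python
import re
from collections import Counter

# Applied in order: IPs first, then long hex-like tokens, then remaining numbers.
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b[0-9a-f]{8,}\b", re.IGNORECASE), "<HEX>"),
    (re.compile(r"\b\d+(?:\.\d+)?\b"), "<NUM>"),
]


def template(line: str) -> str:
    """Replace variable tokens with placeholders."""
    for pattern, token in MASKS:
        line = pattern.sub(token, line)
    return line


def top_patterns(lines: list[str], limit: int = 5) -> list[tuple[str, int]]:
    """Most common templates with their recurrence counts."""
    return Counter(template(line) for line in lines).most_common(limit)


lines = [
    "Timeout after 3000 ms calling 10.0.3.44",
    "Timeout after 2950 ms calling 10.0.3.51",
    "User 42 logged in",
]
print(top_patterns(lines))
# [('Timeout after <NUM> ms calling <IP>', 2), ('User <NUM> logged in', 1)]
```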

analyze_kubernetes_event_storm

  • Description: Clusters Kubernetes events by error reason and affected object counts.
  • Use Case: Summarizes system events during failures (like OOM or scheduling issues).
  • Sample Question: "Why is the prod cluster acting up? Summarise the event storm."
  • Inputs: time range, cluster, namespace, limits, and sample size.
  • Next Steps: Use get_kubernetes_events followed by get_pod_details to investigate.
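
Clustering an event storm by reason can be sketched as a group-by: count events per reason and the number of distinct objects each reason touches, so that one noisy pod and a cluster-wide problem look different. The event field names are hypothetical.

```python
from collections import defaultdict


def storm_summary(events: list[dict]) -> list[tuple[str, int, int]]:
    """(reason, event count, distinct objects), largest first."""
    counts: dict[str, int] = defaultdict(int)
    objects: dict[str, set] = defaultdict(set)
    for e in events:
        counts[e["reason"]] += 1
        objects[e["reason"]].add((e["kind"], e["namespace"], e["name"]))
    rows = [(reason, counts[reason], len(objects[reason])) for reason in counts]
    return sorted(rows, key=lambda r: (r[1], r[2]), reverse=True)


events = (
    [{"reason": "BackOff", "kind": "Pod", "namespace": "shop", "name": "api-1"}] * 3
    + [
        {"reason": "FailedScheduling", "kind": "Pod", "namespace": "shop", "name": "worker-1"},
        {"reason": "FailedScheduling", "kind": "Pod", "namespace": "shop", "name": "worker-2"},
    ]
)
print(storm_summary(events))
# [('BackOff', 3, 1), ('FailedScheduling', 2, 2)]
```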

analyze_slow_transactions

  • Description: Identifies optimization targets by analyzing which transactions consume the most total time and which show long-tail latency outliers.
  • Use Case: Ranks transactions to show where optimization efforts yield the highest impact.
  • Sample Question: "Where should I focus my optimisation effort in order-service?"
  • Inputs: project ID, time range, limits, and sample size.
  • Next Steps: Use get_transaction_spans on the recommended endpoint.
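
Ranking by total time is what separates this from a plain slowest-endpoint list: an endpoint averaging 80 ms but called a million times costs 80,000 seconds, far more than one averaging 4 s called 300 times (1,200 seconds). A sketch of that ranking plus a simple long-tail flag (p99 far above the median); the field names and the 5x ratio are assumptions, not the tool's documented method.

```python
def rank_transactions(txns: list[dict], tail_ratio: float = 5.0) -> list[tuple]:
    """(name, total seconds, long-tail flag), sorted by total time consumed."""
    rows = []
    for t in txns:
        total_s = t["avg_ms"] * t["count"] / 1000
        long_tail = t["p99_ms"] >= tail_ratio * t["p50_ms"]
        rows.append((t["name"], round(total_s, 1), long_tail))
    return sorted(rows, key=lambda r: r[1], reverse=True)


txns = [
    {"name": "GET /products", "avg_ms": 80, "count": 1_000_000, "p50_ms": 60, "p99_ms": 250},
    {"name": "POST /reports", "avg_ms": 4000, "count": 300, "p50_ms": 3500, "p99_ms": 9000},
    {"name": "GET /search", "avg_ms": 300, "count": 50_000, "p50_ms": 120, "p99_ms": 2400},
]
print(rank_transactions(txns))
# [('GET /products', 80000.0, False), ('GET /search', 15000.0, True), ('POST /reports', 1200.0, False)]
```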

Time ranges

Most tools accept a time_dur parameter that sets the lookback window. You can use the following duration values:

  • Minutes: 5m, 10m, 15m, 30m, 60m
  • Hours: 1h, 3h, 6h, 12h, 24h
  • Days: 1d, 2d, 3d, 7d, 14d
  • Weeks: 1w, 2w (normalized to 7d and 14d)
  • Months: 1M, 2M, 3M (uppercase M; lowercase m means minutes)
  • Custom: Set time_dur to custom and provide timeStart and timeEnd as ISO 8601 timestamps.

Example custom time range:

```json
{
  "time_dur": "custom",
  "timeStart": "2026-04-15T00:00:00Z",
  "timeEnd": "2026-04-22T00:00:00Z"
}
```

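Not every tool takes time_dur. For example, list_projects and list_active_plugins take no inputs at all, and get_error_details and get_recent_deployments take no time window. Check each tool's Inputs line above for the parameters it accepts.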

Response size and pagination

To keep responses clear and stay within context limits, tools return focused, capped results:

  • Limit: Most list tools accept a limit parameter to restrict the result count.
  • Pagination: Paginated tools accept a page parameter to walk through subsequent batches of results.
  • Automatic Truncation: When results exceed the limit, the response will indicate this (e.g., "showing 20 of 134"). Large fields like log lines or stack traces are truncated using a … (truncated) marker.

If you need more details, refine your query using filters or walk through the results page by page.