This page provides a comprehensive list of all 44 tools available in the Atatus MCP server, organized into six categories. For each tool, you will find its description, the reason it is useful, a sample question that triggers it, its inputs, and typical next steps for chaining queries. You do not need to call these tools manually; your AI assistant will automatically select and chain the correct tools based on your question.
Every tool is read-only except `update_error_status`, which performs write operations and therefore requires a Read & Write API key. The Access column of the reference tables shows the access level for each tool.
| Category | Count |
|---|---|
| APM / Application | 14 |
| Kubernetes | 12 |
| Infrastructure | 9 |
| Logs | 1 |
| Distributed Tracing | 5 |
| AI-Powered Analysis | 3 |
Most time-windowed tools accept a time_dur parameter. Refer to Time ranges and Response size and pagination at the bottom of this page for more details.
APM / Application
| Tool | Access | Answers |
|---|---|---|
| `list_projects` | Read | What projects exist in this account? |
| `get_apm_metrics` | Read | How is this service doing overall? |
| `get_service_health_summary` | Read | Is service X healthy? (one-call verdict) |
| `get_recent_errors` | Read | What is currently breaking? |
| `get_error_details` | Read | Why is this error happening? |
| `get_error_events` | Read | Who/what is each occurrence hitting? |
| `get_error_trends` | Read | When did errors spike? |
| `get_recent_transactions` | Read | Which endpoint is slow? |
| `get_transaction_spans` | Read | Why is this transaction slow? |
| `get_apm_database_calls` | Read | How are my database calls performing? |
| `get_apm_timeseries` | Read | How did performance trend over time? |
| `get_recent_deployments` | Read | What did we deploy, and when? |
| `correlate_deploy_with_incident` | Read | What changed around this incident? |
| `update_error_status` | Write | Resolve / ignore / reopen an error |
list_projects
- Description: Returns all monitored applications and projects in the account (including name, type, language, project ID, and active status).
- Use Case: Helps the assistant discover the required `project_id` for subsequent queries, so you do not have to look up or memorize IDs.
- Sample Question: "What projects do we have in Atatus?"
- Inputs: None.
get_apm_metrics
- Description: Returns project-level APM aggregates (including Apdex score details, average response time, throughput, and failure rate).
- Use Case: Provides a high-level view of service health in a single call, unlike endpoint-specific tools.
- Sample Question: "How healthy is order-service overall today?"
- Inputs: `project_id`, `time_dur`, optional `transaction` substring filter.
- Next Steps: Use `get_apm_timeseries` to see performance trends or `get_recent_transactions` to find slow endpoints.
get_service_health_summary
- Description: Evaluates service health to return a status (HEALTHY, WATCH, or DEGRADED) along with golden signals (such as Apdex, error rates, top open errors, slow endpoints, and recent deployments).
- Use Case: Quickly checks if a service is running smoothly by consolidating multiple queries into a single response.
- Sample Question: "Is checkout-service healthy right now?"
- Inputs: `project_id`, `time_dur` (default `1h`).
- Next Steps: If the service is in a WATCH or DEGRADED state, use `get_error_details` to investigate top errors, `get_recent_transactions` for latency, or `correlate_deploy_with_incident` to check recent deployments.
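The triage path described above can be sketched as a small lookup. This is only an illustration of the chaining logic the assistant applies for you: the `FOLLOW_UPS` table and the `next_tools` helper are hypothetical names, not part of the Atatus MCP server, and treating WATCH and DEGRADED identically is an assumption taken from the Next Steps wording.

```python
from typing import Dict, List

# Follow-up tools named in the Next Steps bullet for a non-healthy verdict.
_INVESTIGATE = [
    "get_error_details",
    "get_recent_transactions",
    "correlate_deploy_with_incident",
]

# Hypothetical routing table: health verdict -> tools worth calling next.
FOLLOW_UPS: Dict[str, List[str]] = {
    "HEALTHY": [],
    "WATCH": _INVESTIGATE,
    "DEGRADED": _INVESTIGATE,
}


def next_tools(status: str) -> List[str]:
    """Return the follow-up tools suggested for a health verdict."""
    key = status.strip().upper()
    if key not in FOLLOW_UPS:
        raise ValueError(f"unknown health status: {status!r}")
    # Return a copy so callers cannot mutate the shared table.
    return list(FOLLOW_UPS[key])
```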
get_recent_errors
- Description: Lists the top error groups in a project ranked by impact (frequency and affected users).
- Use Case: Helps identify what is currently breaking without manually filtering the Errors dashboard.
- Sample Question: "Show me the top 5 unresolved errors in checkout-service in the last hour."
- Inputs: `project_id`, `time_dur`, `status`, optional filters (like app version, browser, OS, URL).
- Next Steps: Use `get_error_details` to view the stack trace and triggering deployment for the top error group.
get_error_details
- Description: Returns the stack trace, request context, deployment details, and sample events for a specific error group.
- Use Case: Helps diagnose the root cause of an error by showing where it occurred and which request triggered it.
- Sample Question: "Tell me everything about error abc123."
- Inputs: `project_id`, `error_id`.
- Next Steps: Use `get_error_events` to see how occurrences vary across users and environments.
get_error_events
- Description: Provides paginated instances of an error group, showing full stack traces, request payloads, user info, and environment details.
- Use Case: Helps identify patterns across occurrences, such as whether an error only affects a specific browser or environment.
- Sample Question: "Show me 5 sample events of error abc123 from the last 6 hours."
- Inputs: `project_id`, `error_id`, `time_dur`, `limit`.
get_error_trends
- Description: Tracks error frequency over time, showing total, peak, and latest counts.
- Use Case: Helps correlate error spikes with specific events, such as deployments or traffic surges.
- Sample Question: "Did errors spike in the last 24 hours in order-service?"
- Inputs: `project_id`, `time_dur`.
- Next Steps: Use `get_recent_errors` to identify which errors drove the spike, or `get_recent_deployments` to check for recent releases.
get_recent_transactions
- Description: Lists performance metrics (average/min/max response time, throughput, failure rate, and Apdex) for endpoints.
- Use Case: Pinpoints which transactions or endpoints are slowest. Includes a transaction ID to chain into span analysis.
- Sample Question: "Which transactions are slowest in order-service this week?"
- Inputs: `project_id`, `time_dur`.
- Next Steps: Use `get_transaction_spans` with the transaction ID to identify the bottleneck.
get_transaction_spans
- Description: Breaks down a transaction into database queries, external HTTP calls, and internal execution time.
- Use Case: Helps understand why a specific endpoint is slow by highlighting the slowest layer.
- Sample Question: "Why is GET /checkout slow? Show me the spans."
- Inputs: `project_id`, `transaction_id`, `time_dur`.
- Next Steps: Use `get_recent_deployments` to check if a recent release caused a regression.
get_apm_database_calls
- Description: Lists database engines (like PostgreSQL, MySQL, Redis) with call volumes, response times, and slowest queries.
- Use Case: Helps identify slow queries and database performance bottlenecks.
- Sample Question: "Which database queries are slowest in order-service?"
- Inputs: `project_id`, `time_dur`, optional `database` engine filter, `order_by` (`responseTime`/`throughput`), `limit`.
- Next Steps: Use `get_slowest_spans` (`span_type=database`) for a ranked list of slow spans across traces.
get_apm_timeseries
- Description: Returns performance trends over time, including response times, throughput, and HTTP failures.
- Use Case: Visualizes performance trends and anomalies, such as sudden traffic drops or latency spikes.
- Sample Question: "How did response time trend over the last 24 hours in order-service?"
- Inputs: `project_id`, `time_dur`, optional `transaction` filter.
- Next Steps: Use `get_recent_transactions` to find the affected endpoints, or `get_recent_deployments` to check for correlated releases.
get_recent_deployments
- Description: Lists deployment markers, including version, release time, deployer, environment, and repository.
- Use Case: Helps determine if a deployment caused a recent issue.
- Sample Question: "What did we deploy in the last 6 hours?"
- Inputs: `project_id`, `limit`.
- Next Steps: Use `correlate_deploy_with_incident` or `get_error_trends` to analyze the impact of the deployment.
correlate_deploy_with_incident
- Description: Analyzes error rates before and after deployments within a lookback window.
- Use Case: Quickly answers if a recent deployment caused an incident, replacing manual lookup sequences.
- Sample Question: "What changed around 2:30pm in the payments service?"
- Inputs: `project_id`, optional `incident_time` (ISO 8601), `lookback_minutes` (default `60`), `compare_window_minutes` (default `15`).
- Next Steps: For the deployment with the largest error increase, use `get_recent_errors` and `get_error_details` to investigate.
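To make the inputs above concrete, here is a minimal sketch of how an argument payload for `correlate_deploy_with_incident` might be assembled. The parameter names and defaults come from the Inputs list; the `build_correlate_args` helper, the positivity check, and the choice to serialize in UTC are assumptions made for illustration.

```python
from datetime import datetime, timezone
from typing import Optional


def build_correlate_args(
    project_id: str,
    incident_time: Optional[datetime] = None,
    lookback_minutes: int = 60,
    compare_window_minutes: int = 15,
) -> dict:
    """Assemble arguments for correlate_deploy_with_incident (hypothetical helper)."""
    # Local sanity check, not a documented server rule.
    if lookback_minutes <= 0 or compare_window_minutes <= 0:
        raise ValueError("window sizes must be positive")
    args = {
        "project_id": project_id,
        "lookback_minutes": lookback_minutes,
        "compare_window_minutes": compare_window_minutes,
    }
    if incident_time is not None:
        if incident_time.tzinfo is None:
            raise ValueError("incident_time must be timezone-aware")
        # Serialize as an ISO 8601 timestamp in UTC.
        args["incident_time"] = incident_time.astimezone(timezone.utc).isoformat()
    return args
```

Leaving `incident_time` out keeps the payload to the documented defaults, which mirrors the fact that the parameter is optional.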
update_error_status
- Description: Updates the status of an error group to Resolved, Ignored, or Open.
- Use Case: Allows automated systems or assistants to manage error status. Requires a Read & Write API key.
- Sample Question: "Mark error abc123 as resolved."
- Inputs: `project_id`, `error_id`, `status` (`Resolved`/`Ignored`/`Open`).
update_error_status mutates state and is the only write tool in the server. It is unavailable to Read-scoped keys. A Resolved error can reopen if it recurs; an Ignored error stays muted and sends no notifications.
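Because this is the only tool that mutates state, a client may want to validate a request before sending it. The sketch below is one way to do that; the three status values come from the Inputs list, while `build_status_update` and the `key_scope` strings (`"read"`, `"read_write"`) are hypothetical names used only for illustration.

```python
ALLOWED_STATUSES = ("Resolved", "Ignored", "Open")


def build_status_update(project_id: str, error_id: str, status: str, key_scope: str) -> dict:
    """Validate and assemble arguments for update_error_status (hypothetical helper)."""
    # Write tools are unavailable to Read-scoped keys.
    if key_scope != "read_write":
        raise PermissionError("update_error_status requires a Read & Write API key")
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"status must be one of {ALLOWED_STATUSES}, got {status!r}")
    return {"project_id": project_id, "error_id": error_id, "status": status}
```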
Kubernetes
| Tool | Access | Answers |
|---|---|---|
| `get_kubernetes_overview` | Read | Is the cluster OK? (snapshot) |
| `get_kubernetes_cluster_names` | Read | What clusters/namespaces exist? |
| `list_kubernetes_pods` | Read | Pod-level investigation |
| `get_kubernetes_pending_pods` | Read | What's stuck pending? |
| `list_kubernetes_nodes` | Read | Node capacity & pressure |
| `list_kubernetes_deployments` | Read | Did the rollout finish? |
| `get_kubernetes_events` | Read | Why did this fail? |
| `get_kubernetes_unhealthy_workloads` | Read | What's broken in the workload tier? |
| `list_kubernetes_storage` | Read | Any storage/PVC issues? |
| `get_pod_details` | Read | Full picture of one pod (kubectl describe) |
| `list_pods_on_node` | Read | Which pods are eating this node? |
| `get_pod_logs` | Read | What did this pod log before it died? |
get_kubernetes_overview
- Description: Provides a cluster-wide summary of nodes, pods, deployments, and other key resources.
- Use Case: Gives a high-level snapshot of cluster health. Note that on some accounts this snapshot can come back empty, so you should use `get_kubernetes_unhealthy_workloads` for a more reliable view.
- Sample Question: "Give me a snapshot of the production cluster."
- Inputs: `time_dur`, optional `cluster`, `namespace`.
- Next Steps: Use `get_kubernetes_unhealthy_workloads` to investigate specific workload failures.
get_kubernetes_cluster_names
- Description: Lists monitored clusters and namespaces.
- Use Case: Helps verify cluster and namespace names before running other commands. It is a discovery tool and does not need to be called routinely.
- Sample Question: "What clusters and namespaces are we monitoring?"
- Inputs: `time_dur`, optional `cluster` filter.
list_kubernetes_pods
- Description: Lists pods with status, node, CPU, memory, restarts, and owner workload details.
- Use Case: Standard tool for investigating pod health and resource usage, sortable by CPU, memory, or restarts.
- Sample Question: "Top 10 pods by CPU in the prod cluster."
- Inputs: `time_dur`, optional `cluster`/`namespace`, `order_by`, `limit`, `page`.
- Next Steps: Use `get_pod_details` on a failing pod, then `get_pod_logs` to review logs.
get_kubernetes_pending_pods
- Description: Lists pods stuck in a `Pending` state, highlighting node assignment issues.
- Use Case: Helps debug scheduling issues such as resource exhaustion, taints, or affinity rules.
- Sample Question: "Are any pods stuck pending right now?"
- Inputs: `time_dur`, optional `cluster`/`namespace`.
- Next Steps: Use `list_kubernetes_nodes` to check capacity or `get_kubernetes_events` to look for scheduling errors.
list_kubernetes_nodes
- Description: Returns node capacity, resource usage, and pressure conditions (like MemoryPressure, DiskPressure, PIDPressure, or Ready).
- Use Case: Useful for debugging scheduling failures or auditing cluster capacity.
- Sample Question: "Are any nodes under MemoryPressure?"
- Inputs: `time_dur`, optional `cluster`, `order_by`, pagination.
- Next Steps: Use `list_pods_on_node` to see which pods are consuming resources on a pressured node.
list_kubernetes_deployments
- Description: Tracks desired, available, updated, and unavailable pod counts for deployments.
- Use Case: Helps verify if a deployment rollout has completed successfully.
- Sample Question: "Are all deployments in the orders namespace healthy?"
- Inputs: `time_dur`, optional `cluster`/`namespace`, pagination.
- Next Steps: Use `get_pod_details` and `get_pod_logs` if a deployment fails to reach its desired replica count.
get_kubernetes_events
- Description: Retrieves the Kubernetes event stream, including warnings and error reasons.
- Use Case: The primary tool for diagnosing workload failures (such as CrashLoopBackOff or ImagePullBackOff).
- Sample Question: "Why is pod payment-api-xyz failing?"
- Inputs: `time_dur`, optional `cluster`/`namespace`, `kind`, `name`.
- Next Steps: Use `get_pod_details` and `get_pod_logs` to investigate the affected pod.
get_kubernetes_unhealthy_workloads
- Description: Consolidates unhealthy Deployments, DaemonSets, and StatefulSets into a single view.
- Use Case: A quick shortcut for identifying all broken workloads in the cluster.
- Sample Question: "Show me everything unhealthy in the cluster."
- Inputs: `time_dur`, optional `cluster`/`namespace`.
- Next Steps: Use `get_kubernetes_events` to find the failure reason, then check `get_pod_details` and `get_pod_logs`.
list_kubernetes_storage
- Description: Displays PersistentVolumeClaims (PVCs) and PersistentVolumes (PVs), flagging unbound claims.
- Use Case: Helps troubleshoot storage issues for stateful workloads, such as databases or queues.
- Sample Question: "Why won't the database pod start? Any storage issues?"
- Inputs: `time_dur`, optional `cluster`/`namespace`, `limit`.
- Next Steps: Check `get_kubernetes_events` for errors related to unbound volumes.
get_pod_details
- Description: Provides detailed pod state and recent events in a single view (similar to `kubectl describe`).
- Use Case: Offers comprehensive context for a failing pod without requiring multiple queries.
- Sample Question: "Tell me everything about pod recommendation-service-bd844bf7-wz8wb."
- Inputs: `pod_name` (required), optional `cluster`, `namespace`, `time_dur`.
- Next Steps: Use `get_pod_logs` to view logs, and `get_kubernetes_events` to check for node-level warnings.
list_pods_on_node
- Description: Lists pods running on a specific node, sorted by resource consumption.
- Use Case: Identifies resource-heavy pods when a node is under high utilization.
- Sample Question: "Top 5 pods on node ip-10-0-3-44 by CPU."
- Inputs: `node_name` (required), `time_dur`, optional `cluster`, `order_by`, `limit`.
- Next Steps: Use `get_pod_details` and `get_pod_logs` on the heaviest pods.
get_pod_logs
- Description: Retrieves log lines for a specific Kubernetes pod.
- Use Case: Crucial step for diagnosing application-level crashes (such as CrashLoopBackOff). Note that pod name matching is best-effort.
- Sample Question: "Show me the last 100 log lines from pod payments-7f9b8c4d5-xk2lp."
- Inputs: `pod_name` (required), optional `namespace`, `cluster`, `level`, `query`, `time_dur` (default `30m`), `limit` (max 200).
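As an illustration of the limits listed above, the following sketch builds a `get_pod_logs` argument payload and clamps `limit` to the documented maximum of 200. The `build_pod_log_args` helper and its fallback `limit` of 100 are assumptions; only the parameter names, the `30m` default, and the 200 cap come from the Inputs list.

```python
from typing import Optional

MAX_POD_LOG_LIMIT = 200  # documented maximum for `limit`


def build_pod_log_args(
    pod_name: str,
    *,
    namespace: Optional[str] = None,
    cluster: Optional[str] = None,
    level: Optional[str] = None,
    query: Optional[str] = None,
    time_dur: str = "30m",
    limit: int = 100,  # assumed fallback, not a documented default
) -> dict:
    """Assemble arguments for get_pod_logs (hypothetical helper)."""
    if not pod_name:
        raise ValueError("pod_name is required")
    args = {
        "pod_name": pod_name,
        "time_dur": time_dur,
        # Keep the requested limit within 1..200.
        "limit": max(1, min(limit, MAX_POD_LOG_LIMIT)),
    }
    optional = {"namespace": namespace, "cluster": cluster, "level": level, "query": query}
    # Only send optional filters that were actually provided.
    args.update({k: v for k, v in optional.items() if v is not None})
    return args
```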
Infrastructure
| Tool | Access | Answers |
|---|---|---|
| `get_infrastructure_overview` | Read | How is the fleet? (summary) |
| `list_infrastructure_hosts` | Read | Which hosts are inactive? |
| `get_infrastructure_checks` | Read | Are all agents reporting? |
| `get_infrastructure_metrics` | Read | Any custom infra metric |
| `list_containers` | Read | Which Docker containers are hot? |
| `get_container_events` | Read | Why did this container restart? |
| `list_host_processes` | Read | Which process is eating CPU? |
| `list_active_plugins` | Read | Is integration X reporting? |
| `get_host_inventory` | Read | OS / kernel / hardware per host |
get_infrastructure_overview
- Description: Provides a fleet-wide summary of hosts, containers, processes, and average utilization.
- Use Case: Quick health check for your infrastructure fleet. Note that if this summary returns empty for your account, you should use `list_infrastructure_hosts` instead.
- Sample Question: "Give me an infrastructure summary."
- Inputs: `time_dur`, optional `hostname`.
- Next Steps: Use `list_infrastructure_hosts` for a detailed per-host list.
list_infrastructure_hosts
- Description: Lists hosts with status (Active/Inactive), CPU, memory, disk usage, and reporting timestamps.
- Use Case: Identifies inactive hosts or systems under high resource usage.
- Sample Question: "List all hosts and tell me which ones are inactive."
- Inputs: `time_dur`, optional `hostname`, pagination.
- Next Steps: For inactive hosts, check `get_logs` or `get_container_events`. For overloaded hosts, check `list_host_processes`.
get_infrastructure_checks
- Description: Checks status (Up/Down) of agent tasks, process monitors, and integrations.
- Use Case: Confirms that telemetry collections and checks are running properly.
- Sample Question: "Are any infra checks failing?"
- Inputs: `time_dur`, optional `hostname`.
get_infrastructure_metrics
- Description: Queries custom infrastructure metrics (including CPU breakdown, load average, memory, network, and disk).
- Use Case: Custom query tool for metrics not covered by standard tools.
- Sample Question: "Show me CPU iowait by hostname for the last 6 hours."
- Inputs: `time_dur`, `metrics` (array), optional `group_by`, `hostname`.
list_containers
- Description: Lists running Docker containers (non-Kubernetes) with resource usage metrics.
- Use Case: Monitors container resource usage and health on standalone Docker hosts.
- Sample Question: "List the top 10 containers by CPU usage."
- Inputs: `time_dur`, optional `hostname`, `container_name`, `image_name`, `order_by`, pagination.
- Next Steps: Use `get_container_events` if a container is unstable.
get_container_events
- Description: Tracks container lifecycle events (such as restarts, exit codes, and OOM kills).
- Use Case: Troubleshoots container instability and crashes.
- Sample Question: "Why did the worker container restart in the last 24h?"
- Inputs: `time_dur`, optional `hostname`, `container_name`, `image_name`, pagination.
list_host_processes
- Description: Lists the top processes on a host sorted by CPU or memory usage.
- Use Case: Pinpoints the exact process causing high CPU or memory utilization on a host.
- Sample Question: "Top 10 processes on app-prod-1 by CPU."
- Inputs: `hostname` (required), `time_dur`, `sort_by` (`cpu`/`memory`), `limit`.
- Next Steps: Use `get_logs` to check for error outputs from the offending process.
list_active_plugins
- Description: Lists active Atatus integrations (like Redis, MongoDB, MySQL, and Nginx).
- Use Case: Confirms if specific middleware integrations are reporting data correctly.
- Sample Question: "Is Redis monitoring active?"
- Inputs: None.
get_host_inventory
- Description: Details hardware and OS configurations for each host.
- Use Case: Useful for system audits, version verification, and upgrade planning.
- Sample Question: "Which hosts are still on Ubuntu 18.04?"
- Inputs: optional `hostname`, `time_dur`, pagination.
- Next Steps: Use `list_infrastructure_hosts` to verify reporting status.
Logs
get_logs
- Description: Searches unified logs across applications, Kubernetes, Docker, and hosts.
- Use Case: The core log-analysis tool for diagnosing application-level issues.
- Sample Question: "Show me error logs from host Zenitsu in the last 7 days."
- Inputs: hostname, service, level, query, pod_name, namespace, cluster, time parameters, and limits.
- Next Steps: Use `analyze_logs` to group recurring patterns if log volume is high.
Distributed Tracing
| Tool | Access | Answers |
|---|---|---|
| `search_traces` | Read | Find the slow/failed requests |
| `get_trace_detail` | Read | Where did this request spend its time? |
| `get_trace_flame_chart` | Read | Which span is the bottleneck? |
| `get_slowest_spans` | Read | What are the slowest operations? |
| `get_service_map` | Read | How do my services depend on each other? |
search_traces
- Description: Searches distributed traces, returning latency and status details.
- Use Case: Finding slow or failed microservice requests.
- Sample Question: "Find all traces in checkout-service longer than 2 seconds in the last hour."
- Inputs: time and project ID, status codes, duration thresholds, environment, and pagination.
- Next Steps: Use `get_trace_detail` or `get_trace_flame_chart` with the returned trace IDs.
get_trace_detail
- Description: Returns trace timeline breakdowns, database calls, and downstream service requests.
- Use Case: Drills down into a trace to identify which service or query is causing latency.
- Sample Question: "Tell me everything about trace abc123."
- Inputs: `trace_id`, `project_id`, timestamp, and fallback time options.
- Next Steps: Use `get_trace_flame_chart` to identify the bottleneck span.
get_trace_flame_chart
- Description: Renders parent-child span relationships and timings for a specific trace.
- Use Case: Visualizes microservice calls to find bottlenecks.
- Sample Question: "Show me the flame chart for trace abc123 — which span is the bottleneck?"
- Inputs: `trace_id`, `project_id`, time window options.
get_slowest_spans
- Description: Lists the slowest operations (such as DB queries or remote calls) across all traces in a project.
- Use Case: Finds slow queries or operations without checking individual traces.
- Sample Question: "Show me the 10 slowest spans in order-service today."
- Inputs: project ID, time window, span type, sort order, and pagination.
- Next Steps: Use `get_trace_detail` for the trace associated with a slow span.
get_service_map
- Description: Renders microservice dependency graphs with request volume, latency, and error rates.
- Use Case: Visualizes cascading failures or latency propagation across services.
- Sample Question: "Show me the service map for production — anywhere with high error rate?"
- Inputs: `time_dur`, optional environment.
- Next Steps: Use `search_traces` on the service with the highest error rate.
AI-Powered Analysis
| Tool | Access | Answers |
|---|---|---|
| `analyze_logs` | Read | What are the top recurring log patterns? |
| `analyze_kubernetes_event_storm` | Read | What is the K8s event storm mostly about? |
| `analyze_slow_transactions` | Read | Where should I focus optimisation effort? |
analyze_logs
- Description: Clusters log patterns and displays recurrence counts.
- Use Case: Helps analyze large log volumes by grouping duplicate entries.
- Sample Question: "What's the most common log pattern in order-service today?"
- Inputs: service, host, log level, time window, limit, and sample size.
- Next Steps: Run `get_logs` with a query matching the target pattern to view raw entries.
analyze_kubernetes_event_storm
- Description: Clusters Kubernetes events by error reason and affected object counts.
- Use Case: Summarizes system events during failures (like OOM or scheduling issues).
- Sample Question: "Why is the prod cluster acting up? Summarise the event storm."
- Inputs: time range, cluster, namespace, limits, and sample size.
- Next Steps: Use `get_kubernetes_events` followed by `get_pod_details` to investigate.
analyze_slow_transactions
- Description: Identifies optimization targets by analyzing high total latency time and long-tail performance outliers.
- Use Case: Ranks transactions to show where optimization efforts yield the highest impact.
- Sample Question: "Where should I focus my optimisation effort in order-service?"
- Inputs: project ID, time range, limits, and sample size.
- Next Steps: Use `get_transaction_spans` on the recommended endpoint.
Time ranges
Most tools accept a time_dur parameter to specify the lookback window. You can use standard duration units:
- Minutes: `5m`, `10m`, `15m`, `30m`, `60m`
- Hours: `1h`, `3h`, `6h`, `12h`, `24h`
- Days: `1d`, `2d`, `3d`, `7d`, `14d`
- Weeks: `1w`, `2w` (normalized to `7d` and `14d`)
- Months: `1M`, `2M`, `3M`
- Custom: Set `time_dur` to `custom` and provide `timeStart` and `timeEnd` as ISO 8601 timestamps.
Example custom time range:
{
"time_dur": "custom",
"timeStart": "2026-04-15T00:00:00Z",
"timeEnd": "2026-04-22T00:00:00Z"
}
Note that list_projects, get_kubernetes_cluster_names, and list_active_plugins do not accept time parameters.
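For clients that assemble arguments programmatically, the rules above can be sketched as two small helpers. The key names (`time_dur`, `timeStart`, `timeEnd`) and the week normalization come from this section; the helper names, the requirement for timezone-aware inputs, and the `Z`-suffixed UTC format (chosen to match the example above) are assumptions.

```python
from datetime import datetime, timezone

# Week values are normalized to their day equivalents.
WEEK_ALIASES = {"1w": "7d", "2w": "14d"}

_ISO_UTC = "%Y-%m-%dT%H:%M:%SZ"


def normalize_time_dur(value: str) -> str:
    """Map 1w/2w to 7d/14d; leave every other value unchanged."""
    return WEEK_ALIASES.get(value, value)


def custom_range(start: datetime, end: datetime) -> dict:
    """Build a custom time range payload (hypothetical helper)."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("start and end must be timezone-aware")
    if end <= start:
        raise ValueError("end must be after start")
    return {
        "time_dur": "custom",
        "timeStart": start.astimezone(timezone.utc).strftime(_ISO_UTC),
        "timeEnd": end.astimezone(timezone.utc).strftime(_ISO_UTC),
    }
```

Calling `custom_range` with 15 April 2026 and 22 April 2026 (both midnight UTC) reproduces the example payload shown above.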
Response size and pagination
To keep responses clear and stay within context limits, tools return focused, capped results:
- Limit: Most list tools accept a `limit` parameter to restrict the result count.
- Pagination: Paginated tools accept a `page` parameter to walk through subsequent batches of results.
- Automatic Truncation: When results exceed the limit, the response will indicate this (e.g., "showing 20 of 134"). Large fields like log lines or stack traces are truncated using a `… (truncated)` marker.
If you need more details, refine your query using filters or walk through the results page by page.
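A rough sketch of walking results page by page and detecting the truncation notice might look like the following. The `fetch_page` callable stands in for whatever your MCP client uses to invoke a paginated tool, so it is hypothetical, as are the assumptions that pages start at 1 and that an empty page signals the end of the results.

```python
import re
from typing import Callable, Iterator, List, Optional, Tuple

# Matches notices of the form "showing 20 of 134".
_SHOWING = re.compile(r"showing (\d+) of (\d+)")


def parse_truncation(text: str) -> Optional[Tuple[int, int]]:
    """Return (shown, total) if the text carries a truncation notice, else None."""
    match = _SHOWING.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def walk_pages(fetch_page: Callable[[int], List], max_pages: int = 10) -> Iterator:
    """Yield items from successive pages until a page comes back empty."""
    for page in range(1, max_pages + 1):  # assumes 1-based page numbers
        items = fetch_page(page)
        if not items:
            break
        yield from items
```

The `max_pages` cap is a deliberate guard: it keeps a client from pulling an unbounded number of batches into the assistant's context.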