Observability

10 min read

Observability

Archestra Logs Viewer

Archestra exposes Prometheus metrics and OpenTelemetry traces for monitoring system health, tracking HTTP requests, and analyzing LLM API performance.

Metrics

The endpoint http://localhost:9050/metrics exposes Prometheus-formatted metrics including:

Generative AI Metrics

LLM Metrics

  • llm_request_duration_seconds - LLM API request duration by provider, model, agent_id, agent_name, agent_type, external_agent_id, and status code
  • llm_tokens_total - Token consumption by provider, model, agent_id, agent_name, agent_type, external_agent_id, and type (input/output)
  • llm_cost_total - Estimated cost in USD by provider, model, agent_id, agent_name, agent_type, and external_agent_id. Requires token pricing to be configured in Archestra.
  • llm_blocked_tools_total - Counter of tool calls blocked by tool invocation policies, grouped by provider, model, agent_id, agent_name, agent_type, and external_agent_id
  • llm_time_to_first_token_seconds - Time to first token (TTFT) for streaming requests, by provider, agent_id, agent_name, agent_type, external_agent_id, and model. Helps developers choose models with lower initial response latency.
  • llm_tokens_per_second - Output tokens per second throughput, by provider, agent_id, agent_name, agent_type, external_agent_id, and model. Allows comparing model response speeds for latency-sensitive applications.

Note: agent_id and agent_name are the internal Archestra agent identifier and name. external_agent_id contains the external agent ID passed via the X-Archestra-Agent-Id header — this allows clients to associate metrics with their own agent identifiers. If the header is not provided, the label will be empty. agent_type indicates the type of agent: agent, llm_proxy, mcp_gateway, or profile.

MCP Metrics

  • mcp_tool_calls_total - Total MCP tool calls by agent_id, agent_name, agent_type, mcp_server_name, tool_name, and status (success/error)
  • mcp_tool_call_duration_seconds - MCP tool call execution duration by agent_id, agent_name, agent_type, mcp_server_name, tool_name, and status
  • mcp_server_deployment_status - Current deployment state of self-hosted MCP servers by server_name and state (not_created/pending/running/failed/succeeded). Value is 1 for the active state. Use count(mcp_server_deployment_status{state="running"} == 1) to count running deployments.

Archestra Application Metrics

HTTP Metrics

  • http_request_duration_seconds_count - Total HTTP requests by method, route, and status
  • http_request_duration_seconds_bucket - Request duration histogram buckets
  • http_request_summary_seconds - Request duration summary with quantiles

Process Metrics

  • process_cpu_user_seconds_total - CPU time in user mode
  • process_cpu_system_seconds_total - CPU time in system mode
  • process_resident_memory_bytes - Physical memory usage
  • process_start_time_seconds - Process start timestamp

Node.js Runtime Metrics

  • nodejs_eventloop_lag_seconds - Event loop lag (latency indicator)
  • nodejs_heap_size_used_bytes - V8 heap memory usage
  • nodejs_heap_size_total_bytes - Total V8 heap size
  • nodejs_external_memory_bytes - External memory usage
  • nodejs_active_requests_total - Currently active async requests
  • nodejs_active_handles_total - Active handles (file descriptors, timers)
  • nodejs_gc_duration_seconds - Garbage collection timing by type
  • nodejs_version_info - Node.js version information

Distributed Tracing

Archestra exports OpenTelemetry traces to help you understand request flows and identify performance bottlenecks. Traces can be consumed by any OTLP-compatible backend (Jaeger, Tempo, Honeycomb, Grafana Cloud, etc.).

Configuration

Configure the OpenTelemetry Collector endpoint via environment variable:

ARCHESTRA_OTEL_EXPORTER_OTLP_ENDPOINT=http://your-collector:4318

This base URL is used for both traces (/v1/traces) and logs (/v1/logs). If not specified, it defaults to http://localhost:4318.

Authentication

Archestra supports authentication for OTEL trace export through environment variables. Authentication is optional and can be configured using either basic authentication or bearer token authentication.

Bearer Token Authentication

Bearer token authentication takes precedence over basic authentication when both are configured:

ARCHESTRA_OTEL_EXPORTER_OTLP_AUTH_BEARER=your-bearer-token

This adds an Authorization: Bearer your-bearer-token header to all OTEL requests.

Basic Authentication

For basic authentication, both username and password must be provided:

ARCHESTRA_OTEL_EXPORTER_OTLP_AUTH_USERNAME=your-username
ARCHESTRA_OTEL_EXPORTER_OTLP_AUTH_PASSWORD=your-password

This adds an Authorization: Basic base64(username:password) header to all OTEL requests.

No Authentication

If none of the authentication environment variables are configured, traces will be sent without authentication headers.

Content Capture

Archestra can capture prompt/completion content and tool call arguments/results as span events for full audit trail visibility. This is enabled by default and can be disabled via the ARCHESTRA_OTEL_CAPTURE_CONTENT environment variable.

When enabled, traces include:

  • LLM spans - gen_ai.content.prompt event with the request messages, and gen_ai.content.completion event with the response text
  • MCP spans - gen_ai.content.input event with tool call arguments, and gen_ai.content.output event with tool call results

Content is truncated to 10,000 characters per event by default to avoid oversized spans. This limit is configurable via the ARCHESTRA_OTEL_CONTENT_MAX_LENGTH environment variable.

Metric-to-Trace Exemplars

All LLM and MCP metrics include trace exemplars. When viewing these metrics in Grafana, you can click on individual data points to jump directly to the corresponding trace in Tempo. This requires:

  • Prometheus configured with --enable-feature=exemplar-storage
  • Grafana Prometheus datasource configured with exemplarTraceIdDestinations pointing to your Tempo datasource

Verbose Tracing

By default, traces only contain GenAI-specific spans (LLM calls, MCP tool calls) for a clean, focused view. To also capture internal infrastructure spans (HTTP routes, outgoing HTTP calls, Node.js fetch, etc), set the ARCHESTRA_OTEL_VERBOSE_TRACING environment variable to true. This is useful for debugging but produces significantly more spans.

When Sentry is configured, infrastructure auto-instrumentations are automatically enabled so that Sentry receives full traces for internal debugging. However, the customer-facing OTLP export is filtered to only include GenAI/MCP spans — customers see a clean trace view while Sentry gets the complete picture. Setting ARCHESTRA_OTEL_VERBOSE_TRACING=true disables this filtering, sending all spans to both Sentry and OTLP.

What's Traced

Archestra automatically traces:

  • LLM API calls - Calls to LLM providers with dedicated spans showing model, tokens, and response time
  • MCP tool calls - Tool executions through the MCP Gateway with tool name, server, and duration
  • HTTP requests (verbose mode only) - All API requests with method, route, and status code

Trace attributes follow the OTEL GenAI Semantic Conventions where applicable.

LLM Request Spans

Each LLM API call produces a span with SpanKind.CLIENT (indicating an outbound call to an external LLM API) and includes detailed attributes for filtering and analysis:

Request Attributes:

  • route.category=llm-proxy - All LLM proxy requests
  • gen_ai.operation.name - The operation type (chat, generate_content)
  • gen_ai.provider.name - Provider name (openai, anthropic, gemini, etc.)
  • gen_ai.request.model - Model name (e.g., gpt-4, claude-3-5-sonnet-20241022)
  • gen_ai.request.streaming - Whether the request was streaming (true/false)
  • server.address - Base URL of the LLM provider API
  • gen_ai.agent.id - Internal Archestra agent ID
  • gen_ai.agent.name - Internal Archestra agent name
  • gen_ai.conversation.id - Session ID for grouping related LLM calls (from X-Archestra-Session-Id header)
  • archestra.agent.type - Agent type (agent, llm_proxy, mcp_gateway, profile)
  • archestra.execution.id - Execution ID (from X-Archestra-Execution-Id header)
  • archestra.external_agent_id - Client-provided agent ID (from X-Archestra-Agent-Id header)
  • archestra.label.<key> - Custom agent labels (e.g., archestra.label.environment=production)
  • archestra.user.id - The Archestra user ID who made the request (when available)
  • archestra.user.email - The Archestra user email (when available)
  • archestra.user.name - The Archestra user display name (when available)

Response Attributes:

  • gen_ai.response.model - The model that actually generated the response (may differ from request model)
  • gen_ai.response.id - Provider-assigned response ID
  • gen_ai.usage.input_tokens - Number of input tokens consumed
  • gen_ai.usage.output_tokens - Number of output tokens generated
  • gen_ai.usage.total_tokens - Total tokens (input + output)
  • archestra.cost - Estimated cost in USD (requires token pricing configuration)
  • gen_ai.response.finish_reasons - Why the model stopped generating (e.g., ["stop"], ["tool_calls"], ["end_turn"])

Error Attributes:

  • error.type - The error class name when an exception occurs during the LLM call

Span Names:

Span names follow the GenAI semconv format {operation} {model}:

  • chat gpt-4o-mini - OpenAI, Anthropic, Cohere, and other chat-based providers
  • generate_content gemini-2.0-flash - Gemini content generation calls

MCP Tool Call Spans

Each MCP tool call executed through the MCP Gateway produces a dedicated span:

Span Attributes:

  • route.category=mcp-gateway - All MCP Gateway tool calls
  • gen_ai.operation.name=execute_tool - Operation type
  • gen_ai.tool.name - The full tool name (e.g., github__list_repos)
  • gen_ai.tool.type=function - Tool type (all MCP tools are function-type)
  • gen_ai.tool.call.id - Unique identifier for this tool call invocation
  • mcp.server.name - The MCP server handling the tool call (e.g., github, slack)
  • gen_ai.agent.id - Internal Archestra agent ID
  • gen_ai.agent.name - Internal Archestra agent name
  • gen_ai.conversation.id - Session ID (when available)
  • archestra.agent.type - Agent type
  • archestra.label.<key> - Custom agent labels
  • archestra.user.id - The Archestra user ID (when available)
  • archestra.user.email - The Archestra user email (when available)
  • archestra.user.name - The Archestra user display name (when available)
  • mcp.blocked - Whether the tool call was blocked by a tool invocation policy (true/false). When true, the tool was never executed — the span represents the policy decision. Blocked tool spans have span status ERROR with the blocked reason as the status message.
  • mcp.blocked_reason - Human-readable reason why the tool call was blocked (only present when mcp.blocked=true). Possible values include policy-specific reasons (e.g., "Tool invocation blocked: policy is configured to always block tool call"), untrusted context reasons, or custom reasons configured on individual policies.
  • mcp.is_error_result - Whether the tool returned an error result (true/false). This is distinct from span status ERROR, which indicates an exception during execution. Only present on executed (non-blocked) tool calls.
  • error.type - The error class name when an exception occurs during tool execution

Span Names:

  • execute_tool {tool_name} - e.g., execute_tool github__list_repos

Session Tracking

Archestra supports session-based grouping of LLM and tool call traces via the gen_ai.conversation.id attribute. Pass a session ID via the X-Archestra-Session-Id header in your LLM proxy requests to group all related traces together. This enables viewing the full timeline of LLM calls and tool executions within a single agent session.

Chat Traces

When using the built-in chat feature, each chat turn produces a unified trace that groups LLM calls and MCP tool executions under a single parent span:

chat {agentName}                       ← parent span (SpanKind.SERVER)
├── chat {model}                       ← LLM call via proxy (SpanKind.CLIENT)
├── execute_tool {tool_name}           ← MCP tool execution
└── chat {model}                       ← follow-up LLM call after tool result

The parent span (route.category=chat) carries the agent identity and session ID. LLM proxy calls from chat are linked via W3C traceparent header propagation, so the LLM spans appear as children rather than independent root traces. MCP tool executions run within the same async context and are automatically parented.

This same unified tracing applies to all agent invocation paths:

Invocation Pathroute.categoryarchestra.trigger.source
Chat UIchat
A2A Protocola2a
MS Teamschatopsms-teams
Emailemailemail

The archestra.trigger.source span attribute lets you filter traces by invocation channel (e.g., find all agent executions triggered from MS Teams).

External LLM proxy calls produce independent root traces.

Custom Agent Labels

Labels are key-value pairs that can be configured when creating or updating agents through the Archestra UI. Use them, for example, to logically group agents by environment or application type. Once added, labels automatically appear in:

  • Metrics - As additional label dimensions on all LLM and MCP metrics. Use them to drill down into charts. Note that kebab-case labels will be converted to snake_case here because of Prometheus naming rules.
  • Traces - As archestra.label.<key> span attributes. Use them to filter traces.

Grafana Dashboards

We provide four Grafana dashboards for monitoring Archestra:

  • GenAI Observability — LLM request metrics, token usage, cost analysis, latency, and traces
  • MCP Monitoring — MCP tool call metrics, error rates, duration, and traces
  • Agent Sessions — Session-level agent audit trail with drill-down into LLM calls, MCP tool calls, and correlated logs
  • Application Metrics — HTTP traffic, Node.js runtime health, and resource usage for monitoring your Archestra deployment

To install all four dashboards at once, create a Grafana Service Account token with the Editor basic role, or the fixed:folders:writer RBAC role for more granular access, and run:

GRAFANA_URL=https://your-grafana-instance GRAFANA_TOKEN=glsa_xxx \
  bash <(curl -sL https://raw.githubusercontent.com/archestra-ai/archestra/main/platform/dev/grafana/install-dashboards.sh)

This creates an "Archestra" folder and imports all dashboards. The script is idempotent — safe to re-run after updates to create new dashboards or update existing ones.