Performance & Latency

Overview

This document provides performance metrics and overhead measurements for Archestra Platform. The platform adds approximately 30-50ms latency per request (41ms at p99) while providing enterprise-grade security and policy enforcement for LLM applications.

Current Performance Results

  • Server Configuration: Single-threaded Node.js process
  • Hardware: GCP e2-standard-2 (2 vCPU, 8GB RAM) + Cloud SQL PostgreSQL 16 (8 vCPU, 32GB RAM)
  • Throughput: 155 req/s @ concurrency=10, 272 req/s @ concurrency=500
  • Latency @ concurrency=10:
    • Backend processing: 20-23ms
    • End-to-end: P50=25ms, P95=31ms, P99=41ms
    • Database: <0.5ms (not the bottleneck)
    • LLM: Mock mode (no real LLM API calls) to isolate platform overhead
  • Resource utilization: 0.44% CPU, 222MB RAM
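
The end-to-end figures above were measured at fixed concurrency with LLM calls mocked. A minimal sketch of how such a probe might look is shown below; the port, path, and request body are assumptions (adjust them to your deployment), and it assumes Node 18+ for the global fetch and performance APIs.

```typescript
// load-probe.ts -- minimal closed-loop latency probe (sketch, not the official harness).
// Assumes the platform listens on localhost:9000 and exposes an
// OpenAI-compatible chat completions route; adjust URL and body as needed.
const TARGET_URL = "http://localhost:9000/v1/chat/completions";
const CONCURRENCY = 10;
const REQUESTS_PER_WORKER = 100;

async function worker(latencies: number[]): Promise<void> {
  for (let i = 0; i < REQUESTS_PER_WORKER; i++) {
    const start = performance.now();
    const res = await fetch(TARGET_URL, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "mock",
        messages: [{ role: "user", content: "ping" }],
      }),
    });
    await res.arrayBuffer(); // drain the body so timing covers the full response
    latencies.push(performance.now() - start);
  }
}

function percentile(sorted: number[], p: number): number {
  const idx = Math.min(sorted.length - 1, Math.floor((p / 100) * sorted.length));
  return sorted[idx];
}

async function main(): Promise<void> {
  const latencies: number[] = [];
  await Promise.all(Array.from({ length: CONCURRENCY }, () => worker(latencies)));
  latencies.sort((a, b) => a - b);
  console.log(
    `p50=${percentile(latencies, 50).toFixed(1)}ms`,
    `p95=${percentile(latencies, 95).toFixed(1)}ms`,
    `p99=${percentile(latencies, 99).toFixed(1)}ms`,
  );
}

main();
```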

Hardware Requirements

Minimum Requirements

  • CPU: 2 cores
  • RAM: 4GB
  • Storage: 20GB
  • Database: PostgreSQL (can be shared)

Production Deployment

Kubernetes with HPA (Horizontal Pod Autoscaler):

  • Deploy as Kubernetes deployment with multiple replicas
  • Configure HPA to auto-scale based on CPU/memory metrics
  • Scales automatically to handle traffic spikes
  • Recommended for production environments requiring high availability
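
If you manage manifests in code, the sketch below expresses an autoscaling/v2 HorizontalPodAutoscaler as a plain TypeScript object. The deployment name archestra-platform, the replica bounds, and the 70% CPU target are illustrative assumptions; serialize the object with your preferred tooling and tune the values for your traffic tier.

```typescript
// hpa.ts -- illustrative HPA manifest for the platform deployment (sketch).
// Names and thresholds are assumptions; adapt them to your cluster.
const hpa = {
  apiVersion: "autoscaling/v2",
  kind: "HorizontalPodAutoscaler",
  metadata: { name: "archestra-platform" },
  spec: {
    scaleTargetRef: {
      apiVersion: "apps/v1",
      kind: "Deployment",
      name: "archestra-platform", // must match your Deployment's name
    },
    minReplicas: 2, // keep at least two replicas for availability
    maxReplicas: 8, // scale-out ceiling; align with the sizing table below
    metrics: [
      {
        type: "Resource",
        resource: {
          name: "cpu",
          target: { type: "Utilization", averageUtilization: 70 },
        },
      },
    ],
  },
};

console.log(JSON.stringify(hpa, null, 2)); // pipe into a YAML converter or apply as JSON
```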

| Tier | Requests/Day | Requests/Second | Platform Resources | Database Resources | Architecture |
| --- | --- | --- | --- | --- | --- |
| Small | <100K | 1-100 | 1 instance: 2 vCPU, 4GB RAM | 2 vCPU, 4GB RAM | Single instance + shared DB |
| Medium | 100K-1M | 100-500 | 2-4 instances: 4 vCPU, 8GB RAM each | 4 vCPU, 8GB RAM, read replicas | Load balancer + DB replication |
| Large | 1M-10M | 500-2K | 4-8 instances: 4 vCPU, 16GB RAM each | 8 vCPU, 16GB RAM, connection pooling | Multi-region, dedicated DB cluster |
| Enterprise | >10M | 2K+ | 8+ instances: 8 vCPU, 16GB RAM each | 8+ vCPU, 32GB RAM, sharding | Multi-region, DB cluster + caching |
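
If it helps to encode the sizing table in capacity-planning scripts, a rough tier lookup could look like the sketch below; the thresholds come straight from the table, while the function name and return type are hypothetical.

```typescript
// capacity.ts -- rough tier lookup derived from the sizing table above (sketch).
type Tier = "Small" | "Medium" | "Large" | "Enterprise";

function recommendTier(requestsPerDay: number): Tier {
  if (requestsPerDay < 100_000) return "Small";     // <100K requests/day
  if (requestsPerDay < 1_000_000) return "Medium";  // 100K-1M requests/day
  if (requestsPerDay < 10_000_000) return "Large";  // 1M-10M requests/day
  return "Enterprise";                              // >10M requests/day
}

console.log(recommendTier(250_000)); // "Medium"
```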

Operation-Specific Performance

| Operation | Response Time | Notes |
| --- | --- | --- |
| Chat completion (with tools) | ~30ms | Plus tool metadata persistence |
| Dual LLM quarantine (1 round) | ~2-3s | 2x LLM API calls (provider-dependent) |
| Dual LLM quarantine (3 rounds) | ~6-9s | 6x LLM API calls (provider-dependent) |
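
Dual LLM quarantine latency is dominated by provider round trips: roughly two LLM calls per quarantine round, as reflected in the table. A back-of-the-envelope estimator consistent with those numbers is sketched below; the 1.25s default per-call latency is an assumption that varies widely by provider, model, and prompt size.

```typescript
// quarantine-estimate.ts -- back-of-the-envelope latency estimate (sketch).
// Assumes ~2 LLM calls per quarantine round; per-call latency is provider-dependent.
function estimateQuarantineSeconds(rounds: number, perCallSeconds = 1.25): number {
  const llmCalls = 2 * rounds;
  return llmCalls * perCallSeconds;
}

console.log(estimateQuarantineSeconds(1)); // ~2.5s (table: ~2-3s)
console.log(estimateQuarantineSeconds(3)); // ~7.5s (table: ~6-9s)
```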

Failure Handling

Database Failures:

  • Platform requires database connectivity for operation
  • Recommendation: Use managed PostgreSQL with automatic failover
  • Mitigation: Deploy multiple platform instances across availability zones
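
On the client side of that dependency, driver settings determine how quickly an instance notices a failed-over primary. A hedged sketch using node-postgres is shown below; the connection string, pool size, and timeouts are assumptions, and the platform's actual driver configuration may differ.

```typescript
// db-pool.ts -- example node-postgres pool tuned for managed-PostgreSQL failover (sketch).
import { Pool } from "pg";

const pool = new Pool({
  connectionString: process.env.DATABASE_URL, // e.g. a Cloud SQL endpoint
  max: 20,                        // cap connections per platform instance
  connectionTimeoutMillis: 5000,  // fail fast while the primary is unreachable
  idleTimeoutMillis: 30_000,      // recycle idle connections after a failover
});

// Surface connectivity errors to the caller instead of hanging indefinitely.
export async function query<T>(text: string, params: unknown[] = []): Promise<T[]> {
  const result = await pool.query(text, params);
  return result.rows as T[];
}
```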

LLM Provider Failures:

  • Platform forwards provider errors to clients with error codes and messages
  • Interaction logging occurs after a successful response to prevent data loss
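
As a simplified illustration of those two behaviors (not the platform's actual code), a proxy handler might forward the provider's status and error payload unchanged and persist the interaction only once a successful response is in hand:

```typescript
// forwarding.ts -- simplified error-forwarding pattern (sketch, not platform source).
import express from "express";

const app = express();
app.use(express.json());

app.post("/v1/chat/completions", async (req, res) => {
  const upstream = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify(req.body),
  });
  const body = await upstream.json();

  if (!upstream.ok) {
    // Forward the provider's error code and message to the client unchanged.
    res.status(upstream.status).json(body);
    return;
  }

  await logInteraction(req.body, body); // hypothetical persistence helper
  res.json(body);
});

// Hypothetical helper: writes the request/response pair to the interaction log.
async function logInteraction(request: unknown, response: unknown): Promise<void> {
  /* omitted in this sketch */
}

app.listen(9000);
```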

Platform Instance Failures:

  • Stateless design enables instant failover
  • Deploy behind load balancer for automatic routing
  • No session state - any instance can handle any request
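
In practice, automatic routing relies on health checks: the load balancer only sends traffic to instances that report themselves ready. A minimal sketch of liveness and readiness endpoints is shown below; the paths, port, and the SELECT 1 readiness check are assumptions to be wired into your probe configuration.

```typescript
// health.ts -- minimal liveness/readiness endpoints for load-balancer routing (sketch).
import express from "express";
import { Pool } from "pg";

const app = express();
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Liveness: the process is up and able to serve HTTP.
app.get("/healthz", (_req, res) => {
  res.status(200).send("ok");
});

// Readiness: advertise the instance only while its database is reachable,
// so the load balancer stops routing to instances that cannot serve requests.
app.get("/readyz", async (_req, res) => {
  try {
    await pool.query("SELECT 1");
    res.status(200).send("ready");
  } catch {
    res.status(503).send("database unreachable");
  }
});

app.listen(9000);
```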

Monitoring & Observability

Built-in Monitoring:

  • Interaction logging for all requests/responses
  • Policy evaluation tracking
  • Error logging and tracking
  • Performance metrics available via database queries
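
Because every interaction is logged, basic latency metrics can be pulled straight from the database. The sketch below assumes an interactions table with duration_ms and created_at columns; the actual table and column names depend on the platform's schema.

```typescript
// metrics.ts -- example latency percentiles from logged interactions (sketch).
// Table and column names are assumptions; adapt the query to the real schema.
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

export async function latencyPercentilesLastHour(): Promise<Record<string, number>> {
  const { rows } = await pool.query(`
    SELECT
      percentile_cont(0.50) WITHIN GROUP (ORDER BY duration_ms) AS p50,
      percentile_cont(0.95) WITHIN GROUP (ORDER BY duration_ms) AS p95,
      percentile_cont(0.99) WITHIN GROUP (ORDER BY duration_ms) AS p99
    FROM interactions
    WHERE created_at > now() - interval '1 hour'
  `);
  return rows[0];
}
```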

For detailed information on setting up Prometheus monitoring, distributed tracing with OpenTelemetry, and Grafana dashboards, see the Observability documentation.