# Supported LLM Providers

## Overview

Archestra Platform acts as a security proxy between your AI applications and LLM providers. It currently supports the following LLM providers.
## OpenAI

### Supported OpenAI APIs

- Chat Completions API (`/chat/completions`) - ✅ Fully supported
- Responses API (`/responses`) - ⚠️ Not yet supported (GitHub Issue #720)

### OpenAI Connection Details

- Base URL: `http://localhost:9000/v1/openai/{profile-id}`
- Authentication: Pass your OpenAI API key in the `Authorization` header as `Bearer <your-api-key>`

### Important Notes

- Use the Chat Completions API: Ensure your application uses the `/chat/completions` endpoint (not `/responses`). Many frameworks default to this, but some, like the Vercel AI SDK, require explicit configuration (add `.chat` to the provider instance).
- Streaming: OpenAI streaming responses require your cloud provider's load balancer to support long-lived connections. See Cloud Provider Configuration for more details.
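In practice, routing through the proxy is just a base-URL change. The sketch below uses only the Python standard library; the profile id and model name are illustrative placeholders, not values from your deployment:

```python
import json
import urllib.request

PROFILE_ID = "my-profile"  # hypothetical placeholder; use your Archestra profile id
BASE_URL = f"http://localhost:9000/v1/openai/{PROFILE_ID}"

# Build a Chat Completions request exactly as you would against api.openai.com,
# but address it to the Archestra proxy instead.
def build_request(api_key: str, prompt: str) -> urllib.request.Request:
    body = {
        "model": "gpt-4o-mini",  # illustrative model name
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_request("<your-api-key>", "Hello")
print(req.full_url)  # http://localhost:9000/v1/openai/my-profile/chat/completions
```

Actually sending the request (e.g. `urllib.request.urlopen(req)`) requires a running Archestra instance on port 9000.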
## Anthropic

### Supported Anthropic APIs

- Messages API (`/messages`) - ✅ Fully supported

### Anthropic Connection Details

- Base URL: `http://localhost:9000/v1/anthropic/{profile-id}`
- Authentication: Pass your Anthropic API key in the `x-api-key` header
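A minimal Messages API sketch, assuming the proxy accepts the same request shape as the upstream API: note the `x-api-key` header (not a Bearer token), the `anthropic-version` header, and the required `max_tokens` field. The profile id and model name are illustrative placeholders:

```python
import json
import urllib.request

PROFILE_ID = "my-profile"  # hypothetical placeholder
BASE_URL = f"http://localhost:9000/v1/anthropic/{PROFILE_ID}"

# The Messages API requires max_tokens in the body and authenticates via x-api-key.
body = {
    "model": "claude-sonnet-4-20250514",  # illustrative model name
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello"}],
}
req = urllib.request.Request(
    f"{BASE_URL}/messages",
    data=json.dumps(body).encode("utf-8"),
    headers={
        "x-api-key": "<your-api-key>",
        "anthropic-version": "2023-06-01",
        "Content-Type": "application/json",
    },
)
print(req.full_url)  # http://localhost:9000/v1/anthropic/my-profile/messages
```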
## Google Gemini

Archestra supports both the Google AI Studio (Gemini Developer API) and Vertex AI implementations of the Gemini API.

### Supported Gemini APIs

- Generate Content API (`:generateContent`) - ✅ Fully supported
- Stream Generate Content API (`:streamGenerateContent`) - ✅ Fully supported

### Gemini Connection Details

- Base URL: `http://localhost:9000/v1/gemini/{profile-id}/v1beta`
- Authentication:
  - Google AI Studio (default): Pass your Gemini API key in the `x-goog-api-key` header
  - Vertex AI: No API key required from clients - uses server-side Application Default Credentials (ADC)
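As a sketch, a `:generateContent` call through the proxy follows the standard Gemini REST path (`/models/{model}:generateContent` under the `v1beta` base). The profile id and model name below are illustrative placeholders:

```python
import json
import urllib.request

PROFILE_ID = "my-profile"  # hypothetical placeholder
BASE_URL = f"http://localhost:9000/v1/gemini/{PROFILE_ID}/v1beta"
MODEL = "gemini-2.0-flash"  # illustrative model name

# Gemini's REST body uses "contents" with "parts", not OpenAI-style "messages".
body = {"contents": [{"parts": [{"text": "Hello"}]}]}
req = urllib.request.Request(
    f"{BASE_URL}/models/{MODEL}:generateContent",
    data=json.dumps(body).encode("utf-8"),
    headers={
        "x-goog-api-key": "<your-api-key>",  # omit in Vertex AI mode (server-side ADC)
        "Content-Type": "application/json",
    },
)
print(req.full_url)
```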
### Using Vertex AI

To use Vertex AI instead of Google AI Studio, configure these environment variables:

| Variable | Required | Description |
|---|---|---|
| `ARCHESTRA_GEMINI_VERTEX_AI_ENABLED` | Yes | Set to `true` to enable Vertex AI mode |
| `ARCHESTRA_GEMINI_VERTEX_AI_PROJECT` | Yes | Your GCP project ID |
| `ARCHESTRA_GEMINI_VERTEX_AI_LOCATION` | No | GCP region (default: `us-central1`) |
| `ARCHESTRA_GEMINI_VERTEX_AI_CREDENTIALS_FILE` | No | Path to service account JSON key file |
#### GKE with Workload Identity (Recommended)

For GKE deployments, we recommend using Workload Identity, which provides secure, keyless authentication and eliminates the need for service account JSON key files.

Setup steps:

- Create a GCP service account with Vertex AI permissions:

```shell
gcloud iam service-accounts create archestra-vertex-ai \
  --display-name="Archestra Vertex AI"

gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:archestra-vertex-ai@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"
```

- Bind the GCP service account to the Kubernetes service account:

```shell
gcloud iam service-accounts add-iam-policy-binding \
  archestra-vertex-ai@PROJECT_ID.iam.gserviceaccount.com \
  --role="roles/iam.workloadIdentityUser" \
  --member="serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]"
```

Replace `NAMESPACE` with your Helm release namespace and `KSA_NAME` with the Kubernetes service account name (defaults to `archestra-platform`).
- Configure Helm values to annotate the service account:

```yaml
archestra:
  orchestrator:
    kubernetes:
      serviceAccount:
        annotations:
          iam.gke.io/gcp-service-account: archestra-vertex-ai@PROJECT_ID.iam.gserviceaccount.com
env:
  ARCHESTRA_GEMINI_VERTEX_AI_ENABLED: "true"
  ARCHESTRA_GEMINI_VERTEX_AI_PROJECT: "PROJECT_ID"
  ARCHESTRA_GEMINI_VERTEX_AI_LOCATION: "us-central1"
```
With this configuration, Application Default Credentials (ADC) will automatically use the bound GCP service account—no credentials file needed.
#### Other Environments

For non-GKE environments, Vertex AI supports several authentication methods through Application Default Credentials (ADC):

- Service account key file: Set `ARCHESTRA_GEMINI_VERTEX_AI_CREDENTIALS_FILE` to the path of a service account JSON key file
- Local development: Use `gcloud auth application-default login` to authenticate with your user account
- Cloud environments: Attached service accounts on Compute Engine, Cloud Run, and Cloud Functions are automatically detected
- AWS/Azure: Use workload identity federation to authenticate without service account keys

See the Vertex AI authentication guide for detailed setup instructions for each environment.
## Cerebras

Cerebras provides fast inference for open-source AI models through an OpenAI-compatible API.

### Supported Cerebras APIs

- Chat Completions API (`/chat/completions`) - ✅ Fully supported

### Cerebras Connection Details

- Base URL: `http://localhost:9000/v1/cerebras/{agent-id}`
- Authentication: Pass your Cerebras API key in the `Authorization` header as `Bearer <your-api-key>`
### Important Notes

- Llama models in chat: ⚠️ Not yet supported (GitHub Issue #2058)
## Cohere

Cohere provides enterprise-grade LLMs designed for safe, controllable, and efficient AI applications. The platform offers features like safety guardrails, function calling, and both synchronous and streaming APIs.

### Supported Cohere APIs

- Chat API (`/chat`) - ✅ Fully supported
- Streaming: ✅ Fully supported

### Cohere Connection Details

- Base URL: `http://localhost:9000/v1/cohere/{profile-id}`
- Authentication: Pass your Cohere API key in the `Authorization` header as `Bearer <your-api-key>`

### Environment Variables

| Variable | Required | Description |
|---|---|---|
| `ARCHESTRA_COHERE_BASE_URL` | No | Cohere API base URL (default: `https://api.cohere.ai`) |
| `ARCHESTRA_CHAT_COHERE_API_KEY` | No | Default API key for Cohere (can be overridden per conversation/team/org) |

### Important Notes

- API key: Obtain your API key from the Cohere Dashboard
## Groq

Groq provides low-latency inference for popular open-source models through an OpenAI-compatible API.

### Supported Groq APIs

- Chat Completions API (`/chat/completions`) - ✅ Fully supported (OpenAI-compatible)

### Groq Connection Details

- Base URL: `http://localhost:9000/v1/groq/{profile-id}`
- Authentication: Pass your Groq API key in the `Authorization` header as `Bearer <your-api-key>`

### Environment Variables

| Variable | Required | Description |
|---|---|---|
| `ARCHESTRA_GROQ_BASE_URL` | No | Groq API base URL (default: `https://api.groq.com/openai/v1`) |
| `ARCHESTRA_CHAT_GROQ_API_KEY` | No | Default API key for Groq (can be overridden per conversation/team/org) |

### Getting an API Key

You can generate an API key from the Groq Console.

### Popular Models

- `llama-3.3-70b-versatile`
- `llama-3.1-8b-instant`
- `gemma2-9b-it`

### Important Notes

- OpenAI-compatible API: Groq uses the OpenAI Chat Completions request/response format, which makes it a good fit for existing OpenAI client libraries.
- Base URL includes `/openai/v1`: When configuring a custom Groq endpoint, ensure the base URL points to the OpenAI-compatible API root (for example, `https://api.groq.com/openai/v1`).
## OpenRouter

OpenRouter provides access to many models via a single OpenAI-compatible API, with optional attribution headers for ranking and analytics.

### Supported OpenRouter APIs

- Chat Completions API (`/chat/completions`) - ✅ Fully supported (OpenAI-compatible)

### OpenRouter Connection Details

- Base URL: `http://localhost:9000/v1/openrouter/{profile-id}`
- Authentication: Pass your OpenRouter API key in the `Authorization` header as `Bearer <your-api-key>`

### Environment Variables

| Variable | Required | Description |
|---|---|---|
| `ARCHESTRA_OPENROUTER_BASE_URL` | No | OpenRouter API base URL (default: `https://openrouter.ai/api/v1`) |
| `ARCHESTRA_CHAT_OPENROUTER_API_KEY` | No | Default API key for OpenRouter (can be overridden per conversation/team/org) |
| `ARCHESTRA_OPENROUTER_REFERER` | No | Attribution header `HTTP-Referer` sent to OpenRouter (recommended) |
| `ARCHESTRA_OPENROUTER_TITLE` | No | Attribution header `X-Title` sent to OpenRouter (recommended) |

### Getting an API Key

You can generate an API key from the OpenRouter dashboard.

### Popular Models

- `openrouter/auto`
- `openrouter/openai/gpt-4o-mini`

### Important Notes

- OpenAI-compatible API: OpenRouter uses the OpenAI Chat Completions request/response format.
- Attribution headers: OpenRouter recommends sending `HTTP-Referer` and `X-Title` headers. Archestra can be configured to send these automatically via `ARCHESTRA_OPENROUTER_REFERER` and `ARCHESTRA_OPENROUTER_TITLE`.
## Mistral AI

Mistral AI provides state-of-the-art open and commercial AI models through an OpenAI-compatible API.

### Supported Mistral APIs

- Chat Completions API (`/chat/completions`) - ✅ Fully supported

### Mistral Connection Details

- Base URL: `http://localhost:9000/v1/mistral/{agent-id}`
- Authentication: Pass your Mistral API key in the `Authorization` header as `Bearer <your-api-key>`

### Getting an API Key

You can get an API key from the Mistral AI Console.
## Perplexity AI

Perplexity AI provides AI-powered search and answer engines with real-time web search capabilities through an OpenAI-compatible API.

### Supported Perplexity APIs

- Chat Completions API (`/chat/completions`) - ✅ Fully supported

### Perplexity Connection Details

- Base URL: `http://localhost:9000/v1/perplexity/{agent-id}`
- Authentication: Pass your Perplexity API key in the `Authorization` header as `Bearer <your-api-key>`

### Environment Variables

| Variable | Required | Description |
|---|---|---|
| `ARCHESTRA_PERPLEXITY_BASE_URL` | No | Perplexity API base URL (default: `https://api.perplexity.ai`) |
| `ARCHESTRA_CHAT_PERPLEXITY_API_KEY` | No | Default API key for Perplexity (can be overridden per conversation/team/org) |

### Getting an API Key

You can get an API key from the Perplexity Settings.

### Important Notes

- No tool calling support: Perplexity does NOT support external tool calling. It performs internal web searches and returns results in the response. Use Perplexity for search-augmented generation, not agentic workflows requiring custom tools.
- Search results: Perplexity responses may include `search_results` and `citations` fields containing the web search results used to generate the answer.
- Models: Popular models include `sonar-pro`, `sonar`, and `sonar-deep-research` for different use cases.
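Because the extra fields sit alongside the usual OpenAI-style `choices`, reading them is straightforward. The response below is an illustrative shape, not a real API capture:

```python
import json

# Illustrative Perplexity-style response: OpenAI-shaped "choices" plus
# top-level "citations" and "search_results" fields.
response = json.loads("""{
  "choices": [{"message": {"role": "assistant", "content": "The answer..."}}],
  "citations": ["https://example.com/a", "https://example.com/b"],
  "search_results": [{"title": "Example page", "url": "https://example.com/a"}]
}""")

answer = response["choices"][0]["message"]["content"]
# Fall back to empty lists: the fields may be absent depending on model/settings.
sources = response.get("citations", [])
results = response.get("search_results", [])
print(answer, sources)
```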
## vLLM

vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. It's ideal for self-hosted deployments where you want to run open-source models on your own infrastructure.

### Supported vLLM APIs

- Chat Completions API (`/chat/completions`) - ✅ Fully supported (OpenAI-compatible)

### vLLM Connection Details

- Base URL: `http://localhost:9000/v1/vllm/{profile-id}`
- Authentication: API key is optional. Pass it in the `Authorization` header as `Bearer <your-api-key>` if your vLLM deployment requires auth.

### Setup

- Go to Settings > LLM API Keys and add a new key with provider vLLM
- Set the Base URL to your vLLM server (e.g., `http://your-vllm-host:8000/v1`)
- The API key can be left blank for most self-hosted deployments

The base URL can also be set globally via the `ARCHESTRA_VLLM_BASE_URL` environment variable. Per-key base URLs in the UI take precedence.

### Environment Variables

| Variable | Required | Description |
|---|---|---|
| `ARCHESTRA_VLLM_BASE_URL` | Yes | vLLM server base URL (e.g., `http://localhost:8000/v1` or your vLLM endpoint) |
| `ARCHESTRA_CHAT_VLLM_API_KEY` | No | API key for vLLM server (optional; many deployments don't require auth) |

### Important Notes

- Configure a base URL to enable vLLM: The vLLM provider is only available when `ARCHESTRA_VLLM_BASE_URL` is set or a per-key base URL is configured in the UI. Without either, vLLM won't appear as an option.
- No API key required for most deployments: Unlike cloud providers, self-hosted vLLM typically doesn't require authentication. When adding a vLLM key in the platform, the API key field is marked as optional.
## Ollama

Ollama is a local LLM runner that makes it easy to run open-source large language models on your machine. It's perfect for local development, testing, and privacy-conscious deployments.

### Supported Ollama APIs

- Chat Completions API (`/chat/completions`) - ✅ Fully supported (OpenAI-compatible)

### Ollama Connection Details

- Base URL: `http://localhost:9000/v1/ollama/{profile-id}`
- Authentication: API key is optional. Pass it in the `Authorization` header as `Bearer <your-api-key>` if your Ollama deployment requires auth (e.g., Ollama Cloud).

### Setup

- Go to Settings > LLM API Keys and add a new key with provider Ollama
- Optionally set the Base URL if your Ollama server runs on a non-default host/port
- The API key can be left blank for self-hosted Ollama

The default base URL is `http://localhost:11434/v1`. Override it per-key in the UI or globally via `ARCHESTRA_OLLAMA_BASE_URL`.

### Environment Variables

| Variable | Required | Description |
|---|---|---|
| `ARCHESTRA_OLLAMA_BASE_URL` | No | Ollama server base URL (default: `http://localhost:11434/v1`) |
| `ARCHESTRA_CHAT_OLLAMA_API_KEY` | No | API key for Ollama server (optional; should be used for the Ollama Cloud API) |

### Important Notes

- Enabled by default: Ollama is enabled out of the box with a default base URL of `http://localhost:11434/v1`.
- No API key required: Self-hosted Ollama doesn't require authentication. When adding an Ollama key in the platform, the API key field is marked as optional.
- Model availability: Models must be pulled first using `ollama pull <model-name>` before they can be used through Archestra.
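Since self-hosted Ollama needs no key, a proxied request can omit the `Authorization` header entirely. A minimal sketch (profile id and model name are illustrative placeholders; the model must already be pulled locally):

```python
import json
import urllib.request

PROFILE_ID = "my-profile"  # hypothetical placeholder
BASE_URL = f"http://localhost:9000/v1/ollama/{PROFILE_ID}"

# Pull the model first on the Ollama host, e.g. `ollama pull llama3.2`
# (model name illustrative). No Authorization header for self-hosted Ollama.
body = {"model": "llama3.2", "messages": [{"role": "user", "content": "Hello"}]}
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(req.full_url)
```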
## Zhipu AI

Zhipu AI (Z.ai) is a Chinese AI company offering the GLM (General Language Model) series of large language models. The platform provides both free and commercial models with strong performance in Chinese and English language tasks.

### Supported Zhipu AI APIs

- Chat Completions API (`/chat/completions`) - ✅ Fully supported (OpenAI-compatible)

### Zhipu AI Connection Details

- Base URL: `http://localhost:9000/v1/zhipuai/{profile-id}`
- Authentication: Pass your Zhipu AI API key in the `Authorization` header as `Bearer <your-api-key>`

### Environment Variables

| Variable | Required | Description |
|---|---|---|
| `ARCHESTRA_ZHIPUAI_BASE_URL` | No | Zhipu AI API base URL (default: `https://api.z.ai/api/paas/v4`) |
| `ARCHESTRA_CHAT_ZHIPUAI_API_KEY` | No | Default API key for Zhipu AI (can be overridden per conversation/team/org) |

### Popular Models

- GLM-4.5-Flash (Free tier) - Fast inference model with good performance
- GLM-4.5 - Balanced model for general use
- GLM-4.5-Air - Lightweight model optimized for speed
- GLM-4.6 - Enhanced version with improved capabilities
- GLM-4.7 - Latest model with advanced features

### Important Notes

- OpenAI-compatible API: Zhipu AI's API follows the OpenAI Chat Completions format, making it easy to switch between providers
- API key: Obtain your API key from the Zhipu AI Platform
- Free tier available: The GLM-4.5-Flash model is available on the free tier for testing and development
- Chinese language support: GLM models excel at Chinese language understanding and generation while maintaining strong English capabilities
## xAI (Grok)

xAI is Elon Musk's AI company offering the Grok series of large language models with real-time information access and advanced reasoning capabilities.

### Supported xAI APIs

- Chat Completions API (`/chat/completions`) - ✅ Fully supported (OpenAI-compatible)

### xAI Connection Details

- Base URL: `http://localhost:9000/v1/xai/{profile-id}`
- Authentication: Pass your xAI API key in the `Authorization` header as `Bearer <your-api-key>`

### Environment Variables

| Variable | Required | Description |
|---|---|---|
| `ARCHESTRA_XAI_BASE_URL` | No | xAI API base URL (default: `https://api.x.ai/v1`) |
| `ARCHESTRA_CHAT_XAI_API_KEY` | No | Default API key for xAI (can be overridden per conversation/team/org) |

### Getting an API Key

You can generate an API key from the xAI Console.

### Popular Models

- `grok-2-latest` - Latest Grok model with enhanced capabilities
- `grok-2-mini` - Lightweight variant optimized for speed
- `grok-beta` - Beta version with experimental features

### Important Notes

- OpenAI-compatible API: xAI's API follows the OpenAI Chat Completions format, making it easy to switch between providers
- Real-time information: Grok models have access to real-time information from X (Twitter) for up-to-date responses
- API key: Obtain your API key from the xAI Console
- Rate limits: Be mindful of xAI's rate limits when implementing high-volume applications
## MiniMax

MiniMax is a Chinese AI company offering advanced large language models with strong reasoning capabilities. The platform provides the MiniMax-M2 series with chain-of-thought reasoning capabilities and support for text, images, and multi-turn conversations.

### Supported MiniMax APIs

- Chat Completions API (`/chat/completions`) - ✅ Fully supported (OpenAI-compatible)

### MiniMax Connection Details

- Base URL: `http://localhost:9000/v1/minimax/{profile-id}`
- Authentication: Pass your MiniMax API key in the `Authorization` header as `Bearer <your-api-key>`

### Environment Variables

| Variable | Required | Description |
|---|---|---|
| `ARCHESTRA_CHAT_MINIMAX_API_KEY` | No | Default API key for MiniMax (can be overridden per conversation/team/org) |
| `ARCHESTRA_CHAT_MINIMAX_BASE_URL` | No | MiniMax API base URL (default: `https://api.minimax.io/v1`) |

### Available Models

- MiniMax-M2 - Base model with strong reasoning capabilities ($0.3/$1.2 per M tokens)
- MiniMax-M2.1 - Enhanced model with improved performance ($0.3/$1.2 per M tokens)
- MiniMax-M2.1-lightning - Fast inference variant of M2.1 ($0.6/$2.4 per M tokens)
- MiniMax-M2.5 - Latest model with enhanced capabilities ($0.3/$1.2 per M tokens)
- MiniMax-M2.5-highspeed - Fast inference variant of M2.5 ($0.6/$2.4 per M tokens)

### Important Notes

- OpenAI-compatible API (text-only): MiniMax's API follows the OpenAI Chat Completions format for easy integration. The integration uses text-only messages (no image or multimodal content support).
- Reasoning metadata: MiniMax models support extended thinking through the `reasoning_details` field in responses, which contains the model's reasoning process as structured data (not as `<think>` tags in the message content).
- API Key: Obtain your API key from the MiniMax Platform
- No `/models` endpoint: MiniMax does not provide a models listing API. Available models are hardcoded in the platform configuration
- Chinese and English support: MiniMax models excel at both Chinese and English language tasks
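Since the reasoning arrives as structured data rather than inline `<think>` tags, clients can read it separately from the visible answer. The response below is an illustrative shape only; the placement of `reasoning_details` on the message and the entry fields are assumptions, not a documented schema:

```python
import json

# Illustrative MiniMax-style response (assumed shape, not a real capture):
# the answer lives in "content", the reasoning in "reasoning_details".
response = json.loads("""{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "Final answer.",
      "reasoning_details": [{"text": "step 1..."}]
    }
  }]
}""")

message = response["choices"][0]["message"]
answer = message["content"]
# Fall back to an empty list: the field may be absent for non-reasoning turns.
reasoning = message.get("reasoning_details", [])
print(answer, len(reasoning))
```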
## Amazon Bedrock

### Supported Bedrock APIs

- Converse API (`/converse`) - ✅ Fully supported (AWS Docs)
- Converse Stream API (`/converse-stream`) - ✅ Fully supported (AWS Docs)
- InvokeModel API (`/invoke`) - ⚠️ Not yet supported (AWS Docs)
- OpenAI-compatible API (Mantle) - ⚠️ Not yet supported (AWS Docs)

### Bedrock Connection Details

- Base URL: `http://localhost:9000/v1/bedrock/{profile-id}`
- Authentication: Pass your Amazon Bedrock API key in the `Authorization` header as `Bearer <your-api-key>`
### Environment Variables

| Variable | Required | Description |
|---|---|---|
| `ARCHESTRA_BEDROCK_BASE_URL` | Yes | Bedrock runtime endpoint URL (e.g., `https://bedrock-runtime.us-east-1.amazonaws.com`) |
| `ARCHESTRA_BEDROCK_INFERENCE_PROFILE_PREFIX` | No | Region prefix for cross-region inference profiles (e.g., `us` or `eu`) |
| `ARCHESTRA_CHAT_BEDROCK_API_KEY` | No | Default API key for Bedrock (can be overridden per conversation/team/org) |
#### ARCHESTRA_BEDROCK_BASE_URL

This variable is required to enable the Bedrock provider. It specifies the regional endpoint for the Bedrock Runtime API. The URL follows the AWS regional endpoint format:

```
https://bedrock-runtime.{region}.amazonaws.com
```
#### ARCHESTRA_BEDROCK_INFERENCE_PROFILE_PREFIX
Some Bedrock models, such as Anthropic's Claude, require cross-region inference profiles. Set this variable to enable those models. If not set, only models with on-demand inference support will be available.
For more details, see how inference works in Amazon Bedrock.
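To make the Converse request shape concrete, here is a sketch assuming the proxy mirrors the Bedrock Runtime path `/model/{modelId}/converse`; that path shape, the profile id, and the model id (shown with a `us` cross-region prefix) are all illustrative assumptions:

```python
import json
import urllib.request

PROFILE_ID = "my-profile"  # hypothetical placeholder
BASE_URL = f"http://localhost:9000/v1/bedrock/{PROFILE_ID}"

# With ARCHESTRA_BEDROCK_INFERENCE_PROFILE_PREFIX="us", cross-region models are
# addressed with a "us." prefix. Model id below is illustrative.
MODEL_ID = "us.anthropic.claude-sonnet-4-20250514-v1:0"

# Converse uses content blocks (not plain strings) and inferenceConfig.
body = {
    "messages": [{"role": "user", "content": [{"text": "Hello"}]}],
    "inferenceConfig": {"maxTokens": 512},
}
req = urllib.request.Request(
    f"{BASE_URL}/model/{MODEL_ID}/converse",  # path assumed to mirror the Bedrock Runtime API
    data=json.dumps(body).encode("utf-8"),
    headers={
        "Authorization": "Bearer <your-api-key>",
        "Content-Type": "application/json",
    },
)
print(req.full_url)
```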