Ruoqi in #general: "Hi team — I think I found a prompt-caching bug in the backend and couldn't open a GitHub issue (interactions are restricted to prior contributors), so..." | Archestra Community

#general

Jun 18June 19, 2026

Hi team — I think I found a prompt-caching bug in the backend and couldn't open a GitHub issue (interactions are restricted to prior contributors), so posting here.

TL;DR: On the Bedrock path, applyPromptCacheBreakpoints attaches ttl:"1h" to cache breakpoints for Claude 4.6 models, but on Bedrock only the 4.5 generation supports the 1-hour TTL — 4.6 is 5-minute only. Bedrock rejects the request (it doesn't silently downgrade), so a Bedrock + Opus/Sonnet 4.6 chat fails.

Where: platform/backend/src/routes/chat/normalization/apply-prompt-cache.ts:24

// comment says "4.5 and newer support a 1-hour cache TTL"

const ONEHOURCACHE_MODEL = /claude-(?:sonnet|haiku|opus)-4-[5-9](?!\d)/;

This matches 4.5–4.9, so 4.6 gets tagged 1h-capable. useOneHour (lines 90-94) then writes ttl:"1h" into the Bedrock cachePoint regardless of provider.
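As a quick sanity check of the claim above, here is a minimal sketch that runs the quoted regex against a few model IDs. The regex is copied from the post; the `isTaggedOneHour` helper and the two dated model IDs are illustrative assumptions, not taken from the Archestra codebase.

```typescript
// Regex copied from the post; the helper and sample IDs are hypothetical.
const ONEHOURCACHE_MODEL = /claude-(?:sonnet|haiku|opus)-4-[5-9](?!\d)/;

const sampleModelIds: string[] = [
  "us.anthropic.claude-opus-4-6-v1:0", // ID used in the unit test cited in the post
  "anthropic.claude-sonnet-4-5-20250929-v1:0", // assumed 4.5-generation ID
  "anthropic.claude-opus-4-1-20250805-v1:0", // assumed pre-4.5 ID
];

// Returns true when the regex would tag the model as 1h-capable.
function isTaggedOneHour(modelId: string): boolean {
  return ONEHOURCACHE_MODEL.test(modelId);
}

for (const id of sampleModelIds) {
  console.log(`${id} -> ${isTaggedOneHour(id)}`);
}
```

The first two IDs match and the third does not, so the 4.6 ID is treated the same as a 4.5 ID before any provider check happens.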

Why it's wrong (Bedrock-specific): AWS's docs list the 1h TTL set by exact version — only Opus/Sonnet/Haiku 4.5; the 4.6 rows say 5-minute only. (Note: on the Anthropic-direct API 4.6 does support 1h, so the bug is specific to the Bedrock path.) Unsupported TTL is a hard error on Bedrock — see the parallel report openclaw/openclaw#21986.

Repro: your own unit test already encodes it — apply-prompt-cache.test.ts:282 uses model: "us.anthropic.claude-opus-4-6-v1:0" and asserts ttl:"1h".

Fix: narrow the 1h allowlist (Bedrock → 4.5 generation only) and default to the 5-minute TTL for anything not on it, so newer/unknown models degrade to 5m instead of failing. Happy to open a PR with the patch + updated test if you can whitelist my account for contributions — just let me know.
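One way the proposed fix could look, as a rough sketch only: a per-provider allowlist with a 5-minute default. The names `Provider`, `CacheTtl`, `ONE_HOUR_ALLOWLIST`, and `selectCacheTtl` are hypothetical, and the Bedrock allowlist simply encodes the post's reading of the AWS docs (4.5 generation only), which should be checked against the current documentation before merging anything.

```typescript
// Hypothetical types and names; not the actual Archestra implementation.
type Provider = "bedrock" | "anthropic";
type CacheTtl = "5m" | "1h";

// Per-provider allowlists for the 1-hour TTL.
// Assumption (from the post): Bedrock supports 1h only on the 4.5 generation,
// while the Anthropic-direct API keeps the original 4.5-4.9 pattern.
const ONE_HOUR_ALLOWLIST: Record<Provider, RegExp> = {
  bedrock: /claude-(?:sonnet|haiku|opus)-4-5(?!\d)/,
  anthropic: /claude-(?:sonnet|haiku|opus)-4-[5-9](?!\d)/,
};

// Anything not on the provider's allowlist falls back to the 5-minute TTL,
// so new or unknown models degrade instead of failing the request.
function selectCacheTtl(provider: Provider, modelId: string): CacheTtl {
  return ONE_HOUR_ALLOWLIST[provider].test(modelId) ? "1h" : "5m";
}

console.log(selectCacheTtl("bedrock", "us.anthropic.claude-opus-4-6-v1:0"));
console.log(selectCacheTtl("bedrock", "anthropic.claude-sonnet-4-5-20250929-v1:0"));
console.log(selectCacheTtl("anthropic", "claude-opus-4-6"));
```

With this shape, the test cited in the repro would assert ttl:"5m" for the Bedrock 4.6 model instead of ttl:"1h".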

(Found while doing academic research on prompt-caching bugs in OSS LLM apps; full writeup available if useful.)

1 reply

Yanis 3:35 PM

Quick question: has anyone tried routing ADK agents through Archestra's LLM Proxy? I am able to talk to the agent, but images and files are not going through. They seem to be dropped, and the agent only receives text. Does anyone have experience routing different clients through the LLM proxy?

I am using LiteLLM to route through Archestra, since ADK does not seem to provide a way to change the api_base. I might be completely wrong in my approach, so any help is appreciated.

```python
import os

from google.adk.agents import Agent
from google.adk.models.lite_llm import LiteLlm

# ID of the Archestra LLM proxy to route requests through.
ARCHESTRA_LLM_PROXY_ID = "bc0ecdc9-ff33-43a1-a703-73a1fd1c80fb"
ARCHESTRA_MODEL_ROUTER_URL = f"http://localhost:3000/v1/model-router/{ARCHESTRA_LLM_PROXY_ID}"

root_agent = Agent(
    name="test_agent",
    # LiteLlm is used here only so that api_base can point at the proxy.
    model=LiteLlm(
        model="gemini-2.5-flash",
        api_base=ARCHESTRA_MODEL_ROUTER_URL,
        api_key=os.getenv("ARCHESTRA_API_KEY"),
    ),
    description="Agent to talk to.",
    instruction="You are a helpful agent who can have a discussion with the user.",
)
```

👀1

13 replies

Hamza Nazir 4:46 PM

Hey everyone!

I'm Hamza Nazir, a MERN Stack Developer focused on building full-stack web applications with React, Node.js, Express, and MongoDB.

I joined Archestra to learn, connect with other developers, and contribute to open-source projects. Looking forward to collaborating and building with the community!

:archestra-love:1

2 replies

Read-only live mirror of Archestra.AI Slack


Thread

Ruoqi 11:10 AM