$38k AWS Bedrock bill after prompt caching silently fails in coding agent stack
A developer ran a local coding agent (Droid) in a daily autonomous workflow, chaining requests through an OpenAI-compatible API, LiteLLM, AWS Bedrock, and Claude Opus. Every layer in the stack advertised prompt caching support, giving a false sense of cost efficiency. In reality, caching was nearly inactive: of the roughly 8.24 billion total input tokens processed, only ~101M were written to cache and ~1.67B were served from cache, while approximately 6.47 billion tokens — the overwhelming majority — were billed as full-price uncached input, costing ~$35,600 alone. The agent ran autonomously and unattended, repeatedly sending large context payloads including repo state, tool schemas, instructions, history, and file contents with no effective deduplication. Budget alerts had been configured but acted only as after-the-fact notifications, not hard stops. AWS credits of ~$8,026 partially offset the gross bill of $37,901.73, leaving a net charge of approximately $29,875. No hard spending cap, token-rate limit, or automatic shutoff existed at the Bedrock or API level to interrupt the runaway billing. The failure was entirely silent — no error surfaced, no warning fired, no request was rejected — until the billing statement arrived.