$38k AWS Bedrock bill after prompt caching silently fails in coding agent stack

Est. Damage: ~$30k (net of AWS credits)

Attribution: Anonymous

Independent project · aggregated from public reports and may be unverified — see the primary source below · not affiliated with or endorsed by any company or product named.

Instruction Given to Agent

Prompt

“Normal local coding-agent workflow using Droid routing through an OpenAI-compatible API, LiteLLM, AWS Bedrock, and Claude Opus for repeated coding tasks involving repo state, tool schemas, instructions, conversation history, and file contents.”

Incident Summary

A developer ran a local coding agent (Droid) as a daily autonomous workflow, routing requests through an OpenAI-compatible API, LiteLLM, AWS Bedrock, and Claude Opus. Every layer in the stack advertised prompt caching support, which gave a false sense of cost efficiency. In practice caching was largely ineffective: of roughly 8.24 billion input tokens processed, only ~101M were written to cache and ~1.67B were served from cache, while approximately 6.47 billion (about 79%) were billed as full-price uncached input, costing ~$35,600 on their own. The agent ran unattended, repeatedly sending large context payloads (repo state, tool schemas, instructions, history, and file contents) with no effective deduplication. Budget alerts had been configured, but they acted only as after-the-fact notifications, not hard stops, and no hard spending cap, token-rate limit, or automatic shutoff existed at the Bedrock or API level to interrupt the runaway billing. AWS credits of ~$8,026 partially offset the gross bill of $37,901.73, leaving a net charge of approximately $29,875. The failure was entirely silent: no error surfaced, no warning fired, and no request was rejected until the billing statement arrived.

Case Analysis

Verified Facts

Gross AWS Bedrock bill totaled $37,901.73
AWS credits covered approximately $8,026.54, leaving a net charge of roughly $29,875.19
Agent stack was: Droid → OpenAI-compatible API → LiteLLM → AWS Bedrock → Claude Opus
Uncached input tokens were approximately 6.47 billion, costing roughly $35,600
Cache read input tokens were approximately 1.67 billion, costing roughly $918
Cache write input tokens were approximately 101 million, costing roughly $698
Output tokens were approximately 25 million, costing roughly $698
Budget alerts were configured but did not stop spending; the incident was described as failing silently with no hard cap at the Bedrock or API level
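
The arithmetic behind these figures can be checked with a short Python sketch. It uses only the approximate numbers reported above; the function and variable names are illustrative, and the per-million-token rates it prints are implied by the reported line items, not official pricing.

```python
# Figures as reported in the Verified Facts list (token counts are approximate).
LINE_ITEMS = {
    # name: (tokens, reported cost in USD)
    "uncached_input": (6_470_000_000, 35_600.00),
    "cache_read": (1_670_000_000, 918.00),
    "cache_write": (101_000_000, 698.00),
    "output": (25_000_000, 698.00),
}
GROSS_BILL = 37_901.73
CREDITS = 8_026.54
INPUT_KEYS = ("uncached_input", "cache_read", "cache_write")


def input_shares(items):
    """Fraction of all input tokens that fell into each input category."""
    total = sum(items[k][0] for k in INPUT_KEYS)
    return {k: items[k][0] / total for k in INPUT_KEYS}


def implied_rate_per_million(items):
    """Effective USD per million tokens implied by each reported line item."""
    return {k: cost / (tokens / 1_000_000) for k, (tokens, cost) in items.items()}


shares = input_shares(LINE_ITEMS)
rates = implied_rate_per_million(LINE_ITEMS)
net_charge = round(GROSS_BILL - CREDITS, 2)
line_item_total = sum(cost for _, cost in LINE_ITEMS.values())

for name in LINE_ITEMS:
    share = f"{shares[name]:.1%}" if name in shares else "n/a"
    print(f"{name:15s} share={share:>6s} implied=${rates[name]:.2f}/M")
print(f"net charge: ${net_charge:,.2f}")
print(f"line items sum to ${line_item_total:,.2f} vs gross ${GROSS_BILL:,.2f}")
```

On these figures, about 79% of input tokens were billed uncached and about 20% were cache reads. The implied uncached rate is roughly ten times the implied cache-read rate, which is why the uncached line dominates the bill. The rounded line items sum to within about $12 of the gross total, consistent with rounding in the reported numbers.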

Not Publicly Confirmed

How many days or hours the agent ran before the bill was discovered
Which specific layer in the chain (LiteLLM, Droid, or Bedrock integration) was the proximate cause of caching headers not being correctly forwarded
What specific project or coding task the agent was performing during the runaway period

Operational Lessons

Prompt caching support advertised by every layer does not guarantee that caching activates end to end. Verify cache hit rates in production with real token-usage metrics before running premium models in any high-frequency autonomous workflow
Budget alerts are soft signals, not kill switches; metered AI infrastructure requires hard spending caps at the API or IAM level before agents are used as daily infrastructure
Autonomous agents running unattended with large, repeated context windows (repo state, tool schemas, history) are extremely high-risk cost surfaces. Always test the full stack at small scale before unattended operation
AWS credits and pre-configured thresholds create false confidence; validate the actual enforcement mechanism, not just its existence
Treat prompt caching as an integration that must be explicitly verified, not assumed: log cache write and cache read ratios from the start and set alerts if the cache-hit rate falls below an expected threshold
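
Two of the lessons above, a hard stop on spend and an alert on low cache-hit rate, can be combined in a small in-process guard. The following is a minimal Python sketch under stated assumptions, not a description of the stack in the incident: `Usage`, `SpendGuard`, and `SpendGuardTripped` are hypothetical names, the caller is assumed to map each response's token counts into `Usage`, and `RATES` holds placeholder values close to what the reported line items imply, not official pricing.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Token counts for one request (hypothetical shape; map your provider's usage fields into it)."""
    uncached_input: int
    cache_read: int
    cache_write: int
    output: int


class SpendGuardTripped(RuntimeError):
    """Raised to halt the agent loop: a hard stop, unlike a budget alert."""


# USD per million tokens. Placeholder values (assumption), not official pricing.
RATES = {"uncached_input": 5.5, "cache_read": 0.55, "cache_write": 6.9, "output": 27.9}


@dataclass
class SpendGuard:
    rates: dict
    max_spend_usd: float
    min_cache_read_ratio: float
    warmup_input_tokens: int = 1_000_000  # do not judge the ratio on a tiny sample
    spent_usd: float = 0.0
    input_tokens: int = 0
    cache_read_tokens: int = 0

    def cache_read_ratio(self) -> float:
        """Share of all input tokens so far that were served from cache."""
        return self.cache_read_tokens / self.input_tokens if self.input_tokens else 0.0

    def record(self, u: Usage) -> None:
        """Account for one request, then raise if either limit is breached."""
        self.spent_usd += (
            u.uncached_input * self.rates["uncached_input"]
            + u.cache_read * self.rates["cache_read"]
            + u.cache_write * self.rates["cache_write"]
            + u.output * self.rates["output"]
        ) / 1_000_000
        self.input_tokens += u.uncached_input + u.cache_read + u.cache_write
        self.cache_read_tokens += u.cache_read

        if self.spent_usd >= self.max_spend_usd:
            raise SpendGuardTripped(f"budget exceeded: ${self.spent_usd:.2f}")
        if (self.input_tokens >= self.warmup_input_tokens
                and self.cache_read_ratio() < self.min_cache_read_ratio):
            raise SpendGuardTripped(
                f"cache read ratio {self.cache_read_ratio():.1%} below threshold"
            )


def requests_until_trip(guard: SpendGuard, usage: Usage, limit: int = 1000):
    """Replay identical requests; return (request number, reason) when the guard trips."""
    for i in range(1, limit + 1):
        try:
            guard.record(usage)
        except SpendGuardTripped as exc:
            return i, str(exc)
    return None, ""


# Caching silently broken: every request sends 200k uncached input tokens.
broken = SpendGuard(RATES, max_spend_usd=50.0, min_cache_read_ratio=0.5)
print(requests_until_trip(broken, Usage(200_000, 0, 0, 1_000)))
```

The agent loop would call `record` after every response and let the exception end the run. A guard like this only protects the process that runs it, so it complements rather than replaces enforcement at the provider or gateway level.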

Primary Source

$38k AWS Bedrock bill caused by a simple prompt caching miss (news.ycombinator.com)

More Cases

APM-0003·Cursor·MODERATE

Apr 14, 2025

Cursor support AI hallucinates login policy, triggering mass subscription cancellations

A backend session bug at Cursor IDE began silently logging users out whenever they switched between devices — no warning, no notification. Users contacted Cursor support seeking an explanation. Cursor's AI support system, described as designed to 'mimic human responses,' was the first point of contact. Rather than acknowledging ignorance or escalating, the bot fabricated an authoritative-sounding answer: it told multiple users the forced logouts were 'expected behavior' under a new single-device login restriction policy. No such policy existed. Because the bot presented itself as a human support agent, users had no reason to doubt the response. The hallucinated policy explanation spread rapidly across the developer community — multi-device workflows being non-negotiable for most developers, the fabricated policy was treated as a serious product decision made without any changelog entry or user notice. Within hours, dozens of users publicly canceled their subscriptions. As users began cross-referencing the story and noticing inconsistencies, the primary Reddit thread discussing the incident was locked and then deleted by moderators, with no public resolution or official acknowledgment. The underlying cause turned out to be a backend session bug — not a policy — but by the time that became clear, the cancellations had already happened. The hallucinated support response caused substantially more reputational and subscription damage than the original bug ever could have on its own.

APM-0008·Other / Unknown·MODERATE

Jun 20, 2024

McDonald's pulls IBM drive-thru AI after customers receive $250+ of unwanted McNuggets

McDonald's AI-powered drive-thru ordering system, developed in a joint venture with IBM, failed repeatedly across more than 100 test locations, generating incorrect and excessive orders that enraged customers. In documented incidents, the voice AI misinterpreted customer requests and autonomously added large quantities of items never requested, including over $250 worth of chicken McNuggets and unwanted packs of butter charged to individual customers. Rather than escalating ambiguous or unlikely orders to a human worker, the system processed them as-is. Customers filmed their interactions and posted the footage to social media, turning the failures into a public relations liability. Faced with sustained evidence that the technology could not reliably replace human order-takers, McDonald's announced it was terminating the IBM partnership and removing the AI system from all test restaurants. McDonald's USA chief restaurant officer Mason Smoot acknowledged the discontinuation in a statement but indicated the chain would continue exploring voice ordering solutions more broadly. The rollback ended a pilot that had expanded to over 100 locations.

APM-0070·OpenAI·MODERATE

Jul 29, 2026

Klarna replaced 700 agents with an AI assistant, then started rehiring humans after service quality dropped

Klarna said in 2024 that its OpenAI-powered assistant did the work of 700 customer-service agents. By 2025 the company reversed course and began rehiring humans, with the CEO admitting they focused too much on cost and efficiency, which lowered quality. Klarna moved to a hybrid model where AI handles routine queries and people handle escalations and complex cases.

scope-creep social-blunder
