Case No. APM-0006
Filed April 28, 2026
Severity 3 / 5 · MODERATE

$38k AWS Bedrock bill after prompt caching silently fails in coding agent stack

Est. Damage ~$30k
Attribution Anonymous

Independent project · aggregated from public reports and may be unverified — see the primary source below · not affiliated with or endorsed by any company or product named.

Prompt

Routine local coding-agent workflow: Droid sending requests through an OpenAI-compatible API, LiteLLM, and AWS Bedrock to Claude Opus for repeated coding tasks whose context included repo state, tool schemas, instructions, conversation history, and file contents.

A developer ran a local coding agent (Droid) in a daily autonomous workflow, chaining requests through an OpenAI-compatible API, LiteLLM, AWS Bedrock, and Claude Opus. Every layer in the stack advertised prompt caching support, which gave a false sense of cost efficiency. In reality, caching was largely ineffective: of roughly 8.24 billion total input tokens, only ~101M were written to cache and ~1.67B (about 20%) were served from cache, while approximately 6.47 billion tokens, the overwhelming majority, were billed as full-price uncached input, costing ~$35,600 on their own. The agent ran unattended, repeatedly sending large context payloads (repo state, tool schemas, instructions, history, and file contents) with no effective deduplication. Budget alerts had been configured, but they acted only as after-the-fact notifications, not hard stops. AWS credits of ~$8,026 partially offset the gross bill of $37,901.73, leaving a net charge of approximately $29,875. No hard spending cap, token-rate limit, or automatic shutoff existed at the Bedrock or API level to interrupt the runaway billing. The failure stayed entirely silent until the billing statement arrived: no error surfaced, no warning fired, and no request was rejected.

Verified Facts

  • Gross AWS Bedrock bill totaled $37,901.73
  • AWS credits covered approximately $8,026.54, leaving a net charge of roughly $29,875.19
  • Agent stack was: Droid → OpenAI-compatible API → LiteLLM → AWS Bedrock → Claude Opus
  • Uncached input tokens were approximately 6.47 billion, costing roughly $35,600
  • Cache read input tokens were approximately 1.67 billion, costing roughly $918
  • Cache write input tokens were approximately 101 million, costing roughly $698
  • Output tokens were approximately 25 million, costing roughly $698
  • Budget alerts were configured but did not stop spending; the incident was described as failing silently with no hard cap at the Bedrock or API level
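The reported figures can be cross-checked with a few lines of arithmetic. This is a minimal sketch that uses only the rounded numbers from the list above; the per-million-token rates it derives are implied by those numbers and are not taken from AWS's published pricing.

```python
# Rounded figures from the Verified Facts list for APM-0006.
GROSS_BILL_USD = 37_901.73
CREDITS_USD = 8_026.54

tokens = {
    "uncached_input": 6.47e9,
    "cache_read": 1.67e9,
    "cache_write": 101e6,
    "output": 25e6,
}
reported_cost_usd = {
    "uncached_input": 35_600,
    "cache_read": 918,
    "cache_write": 698,
    "output": 698,
}

# Input-side tokens only; output tokens are billed separately.
total_input = tokens["uncached_input"] + tokens["cache_read"] + tokens["cache_write"]
net_charge = round(GROSS_BILL_USD - CREDITS_USD, 2)

# Shares of all input tokens (cache writes count as input).
cache_read_share = tokens["cache_read"] / total_input
uncached_share = tokens["uncached_input"] / total_input

# Effective USD per million tokens implied by each reported cost/token pair.
# These come from rounded figures, not from a published price list.
implied_rate = {k: reported_cost_usd[k] / (tokens[k] / 1e6) for k in tokens}
uncached_vs_read = implied_rate["uncached_input"] / implied_rate["cache_read"]

# Component costs should roughly sum to the gross bill (difference is rounding).
component_sum = sum(reported_cost_usd.values())

print(f"total input tokens: {total_input / 1e9:.2f}B")
print(f"served from cache:  {cache_read_share:.1%}")
print(f"billed uncached:    {uncached_share:.1%}")
print(f"net charge:         ${net_charge:,.2f}")
print(f"uncached / cache-read rate: {uncached_vs_read:.1f}x")
print(f"component sum vs gross bill: ${component_sum:,} vs ${GROSS_BILL_USD:,.2f}")
```

Run as-is, it shows that roughly a fifth of input tokens were cache reads and that each uncached input token cost about ten times as much as a cached one, which is why the uncached share dominates the bill. The component costs sum to about $12 more than the gross bill, consistent with rounding in the source.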

Not Publicly Confirmed

  • How many days or hours the agent ran before the bill was discovered
  • Which specific layer in the chain (LiteLLM, Droid, or the Bedrock integration) was the proximate cause of prompt-caching directives not being correctly forwarded or applied
  • What specific project or coding task the agent was performing during the runaway period

Operational Lessons

  • Each layer claiming prompt caching support does not guarantee end-to-end caching actually activates — verify cache hit rates in production with real token-usage metrics before running premium models in any high-frequency autonomous workflow
  • Budget alerts are soft signals, not kill switches; metered AI infrastructure requires hard spending caps at the API or IAM level before agents are used as daily infrastructure
  • Autonomous agents running overnight with large, repeated context windows (repo state, tool schemas, history) are extremely high-risk cost surfaces — always test the full stack at small scale before unattended operation
  • AWS credits and pre-configured thresholds create false confidence; validate the actual enforcement mechanism, not just its existence
  • Treat prompt caching as an integration that must be explicitly verified, not assumed: log cache write and cache read ratios from the start and set alerts if the cache-hit rate falls below an expected threshold
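The second and fifth lessons can be combined into a small client-side guard. The sketch below is hypothetical throughout: `UsageGuard`, `Usage`, `BudgetExceeded`, and `CacheMissAlert` are invented names, the token field names would need to be mapped from whatever usage data your provider or proxy actually returns, and the rates in the example are placeholders. It refuses the next request once estimated spend reaches a hard cap, and it raises once the cumulative cache-hit rate (cache reads divided by all input tokens) falls below a floor after a warm-up period, so an unattended loop stops instead of merely logging.

```python
from dataclasses import dataclass, fields


@dataclass
class Usage:
    """Token counts for one model call.

    Hypothetical field names; map them from the usage object your
    provider or proxy actually returns.
    """
    uncached_input: int
    cache_read: int
    cache_write: int
    output: int


class BudgetExceeded(RuntimeError):
    """Raised before a call once cumulative estimated spend reaches the cap."""


class CacheMissAlert(RuntimeError):
    """Raised when the cumulative cache-hit rate falls below the floor."""


class UsageGuard:
    def __init__(self, cap_usd: float, rates_per_mtok: dict,
                 min_hit_rate: float = 0.5, warmup_input_tokens: int = 1_000_000):
        self.cap_usd = cap_usd
        self.rates = rates_per_mtok          # USD per million tokens, keyed by Usage field
        self.min_hit_rate = min_hit_rate
        self.warmup_input_tokens = warmup_input_tokens  # skip hit-rate check until this much input
        self.spent_usd = 0.0
        self.totals = {f.name: 0 for f in fields(Usage)}

    def check_before_call(self) -> None:
        # Hard stop: refuse the next request instead of merely notifying.
        if self.spent_usd >= self.cap_usd:
            raise BudgetExceeded(
                f"estimated spend ${self.spent_usd:.2f} reached cap ${self.cap_usd:.2f}")

    def total_input(self) -> int:
        t = self.totals
        return t["uncached_input"] + t["cache_read"] + t["cache_write"]

    def cache_hit_rate(self) -> float:
        total = self.total_input()
        return self.totals["cache_read"] / total if total else 0.0

    def record(self, usage: Usage) -> None:
        # Accumulate tokens and estimated cost after each response.
        for name in self.totals:
            n = getattr(usage, name)
            self.totals[name] += n
            self.spent_usd += n / 1e6 * self.rates[name]
        # Judge the hit rate only once enough input has been observed.
        if (self.total_input() >= self.warmup_input_tokens
                and self.cache_hit_rate() < self.min_hit_rate):
            raise CacheMissAlert(
                f"cache-hit rate {self.cache_hit_rate():.1%} below floor {self.min_hit_rate:.0%}")


if __name__ == "__main__":
    # Placeholder rates (USD per million tokens); substitute real pricing.
    rates = {"uncached_input": 5.50, "cache_read": 0.55, "cache_write": 6.90, "output": 27.50}
    guard = UsageGuard(cap_usd=50.0, rates_per_mtok=rates)
    try:
        for _ in range(100):
            guard.check_before_call()
            # Simulate a call whose large context is never served from cache.
            guard.record(Usage(uncached_input=200_000, cache_read=0, cache_write=0, output=2_000))
    except (BudgetExceeded, CacheMissAlert) as exc:
        print(f"agent halted: {exc}")
```

Because the cap is checked before each call, a single-threaded run can overshoot by at most one request's cost. A client-side guard only protects code that uses it, so it complements rather than replaces a limit enforced at the proxy, API, or account level. The warm-up threshold exists because the first requests in a session write to the cache rather than read from it.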
Primary source: $38k AWS Bedrock bill caused by a simple prompt caching miss · news.ycombinator.com
More Cases
APM-0008 · Other / Unknown · MODERATE
Jun 20, 2024

McDonald's pulls IBM drive-thru AI after customers receive $250+ of unwanted McNuggets

McDonald's AI-powered drive-thru ordering system, developed in a joint venture with IBM, failed repeatedly across more than 100 test locations, generating incorrect and excessive orders that enraged customers. In documented incidents, the voice AI misinterpreted customer requests and autonomously added large quantities of items that were never requested, including over $250 worth of Chicken McNuggets and unwanted packets of butter charged to individual customers. Rather than escalating ambiguous or implausible orders to a human worker, the system processed them as-is. Customers filmed their interactions and posted the footage to social media, turning the failures into a public relations liability. Faced with sustained evidence that the technology could not reliably replace human order-takers, McDonald's announced it was ending the IBM partnership and removing the AI system from all test restaurants. McDonald's USA chief restaurant officer Mason Smoot acknowledged the discontinuation in a statement but indicated the chain would continue exploring voice ordering solutions more broadly.

APM-0046 · Other / Unknown · LOW
Jun 10, 2026

Sports Illustrated published product reviews under fake AI-generated authors with AI headshots

Futurism reported in November 2023 that Sports Illustrated had published product-review content, supplied by third-party vendor AdVon Commerce, under fabricated author personas. One example was 'Drew Ortiz,' a writer who did not exist and whose headshot was bought from an AI-portrait site. After inquiries, the fake authors vanished from the site. Publisher The Arena Group denied that the articles themselves were AI-written but acknowledged that pseudonyms had been used; the episode damaged SI's credibility.

APM-0003 · Cursor · MODERATE
Apr 14, 2025

Cursor support AI hallucinates login policy, triggering mass subscription cancellations

A backend session bug at Cursor IDE began silently logging users out whenever they switched between devices, with no warning or notification. Users contacted Cursor support seeking an explanation. Cursor's AI support system, described as designed to 'mimic human responses,' was the first point of contact. Rather than acknowledging uncertainty or escalating, the bot fabricated an authoritative-sounding answer: it told multiple users the forced logouts were 'expected behavior' under a new single-device login restriction policy. No such policy existed. Because the bot presented itself as a human support agent, users had no reason to doubt the response. The hallucinated policy explanation spread rapidly across the developer community; because multi-device workflows are non-negotiable for most developers, the fabricated policy was treated as a serious product decision made without any changelog entry or user notice. Within hours, dozens of users publicly canceled their subscriptions. As users began cross-referencing the story and noticing inconsistencies, the primary Reddit thread discussing the incident was locked and then deleted by moderators, with no public resolution or official acknowledgment. By the time it became clear that the session bug, not a policy, was the real cause, the cancellations had already happened. The hallucinated support response caused substantially more reputational and subscription damage than the original bug would have caused on its own.