Failure Categories
12 categories used to classify AI agent failure modes.
The agent caused embarrassment, reputation damage, or interpersonal harm.
The agent confidently generated false information, fabricated APIs, URLs, or facts.
The agent's actions violated legal, regulatory, or policy requirements.
Credentials, secrets, or sensitive data were exposed or mishandled.
The agent transmitted sensitive data to external or unintended destinations.
The agent's actions resulted in significant unexpected financial costs.
Messages, emails, or data were sent to unintended recipients.
The agent permanently deleted files, database records, or storage objects.
The agent produced, committed, or deployed broken, destructive, or insecure code.
The agent entered an unrecoverable loop, causing resource exhaustion or runaway costs.
The agent took actions far beyond the intended scope of the task.
The agent fundamentally misinterpreted a clear instruction and acted on the wrong assumption.