Failure Categories

Twelve categories are used to classify AI agent failure modes.

The agent caused embarrassment, reputation damage, or interpersonal harm.

The agent confidently generated false information, such as fabricated APIs, URLs, or facts.

The agent's actions violated legal, regulatory, or policy requirements.

Credentials, secrets, or sensitive data were exposed or mishandled.

The agent transmitted sensitive data to external or unintended destinations.

The agent's actions resulted in significant unexpected financial costs.

The agent took actions far beyond the intended scope of the task.

The agent entered an unrecoverable loop, causing resource exhaustion or runaway costs.

The agent fundamentally misinterpreted a clear instruction and acted on the wrong assumption.

Messages, emails, or data were sent to unintended recipients.

The agent permanently deleted files, database records, or storage objects.

The agent produced, committed, or deployed broken, destructive, or insecure code.
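
One way to make these categories usable in tooling is to encode them as an enumeration. The following is a minimal sketch, not a definitive schema: the list above gives only descriptions, so every member name (for example `RUNAWAY_LOOP`) and the class name `FailureCategory` are hypothetical labels chosen for illustration. Only the description strings come from the list.

```python
from collections import Counter
from enum import Enum


class FailureCategory(Enum):
    """Twelve failure categories. Member names are hypothetical labels;
    the values are the descriptions from the list above."""

    SOCIAL_HARM = "The agent caused embarrassment, reputation damage, or interpersonal harm."
    HALLUCINATION = "The agent confidently generated false information, such as fabricated APIs, URLs, or facts."
    COMPLIANCE_VIOLATION = "The agent's actions violated legal, regulatory, or policy requirements."
    SECRET_EXPOSURE = "Credentials, secrets, or sensitive data were exposed or mishandled."
    DATA_EXFILTRATION = "The agent transmitted sensitive data to external or unintended destinations."
    FINANCIAL_LOSS = "The agent's actions resulted in significant unexpected financial costs."
    SCOPE_OVERREACH = "The agent took actions far beyond the intended scope of the task."
    RUNAWAY_LOOP = "The agent entered an unrecoverable loop, causing resource exhaustion or runaway costs."
    MISINTERPRETATION = "The agent fundamentally misinterpreted a clear instruction and acted on the wrong assumption."
    MISDIRECTED_MESSAGE = "Messages, emails, or data were sent to unintended recipients."
    DATA_DELETION = "The agent permanently deleted files, database records, or storage objects."
    BROKEN_CODE = "The agent produced, committed, or deployed broken, destructive, or insecure code."

    @property
    def description(self) -> str:
        # The enum value doubles as the human-readable description.
        return self.value


# Hypothetical usage: count labeled incidents per category.
incidents = [
    FailureCategory.RUNAWAY_LOOP,
    FailureCategory.DATA_DELETION,
    FailureCategory.RUNAWAY_LOOP,
]
counts = Counter(incidents)
```

Keeping the description on the enum member means a report can print both the label and its definition from a single source, and an unknown label fails loudly at lookup time instead of being silently accepted.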