Towards a Science of AI Agent Reliability

February 21, 2026

We propose twelve metrics that decompose AI agent reliability along four dimensions (consistency, robustness, predictability, and safety) and use them to evaluate 14 models. We find that recent capability gains have yielded only small improvements in reliability.