When better retrieval makes agents worse
Session Abstract
Agentic systems can break not because information is missing, but because persuasively wrong context gets promoted into action. We examine a recurring pattern: retrieval metrics improve while agent behavior degrades as distractors enter multi-step loops. We show why relevance, reliability, and security are tightly connected in agentic retrieval.
Session Description
In agentic workflows, retrieval is no longer just ranking for a human reader; it is context injection into reasoning and tool use. That shift changes the failure mode. Plausible but incorrect evidence can degrade outcomes disproportionately, and in noisy settings, longer reasoning can make answers worse rather than better. This is inverse scaling under noise: more capable reasoning produces more confident mistakes. In iterative agent loops, those mistakes are recycled and amplified, turning small retrieval defects into workflow-level failures.
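The amplification claim above can be made concrete with a toy calculation. This is an illustrative sketch only: the per-step figure of 0.95 and the independence assumption are hypothetical, not measurements from any real system.

```python
# Toy model: how small per-step retrieval defects compound across
# a multi-step agent loop, assuming steps fail independently.

def chain_success(per_step_success: float, steps: int) -> float:
    """Probability that every step in the loop acts on good evidence."""
    return per_step_success ** steps

# A 5% chance of promoting a distractor at each step leaves only
# about 60% of 10-step workflows untouched by bad context.
print(round(chain_success(0.95, 10), 3))  # → 0.599
```

The point of the toy model is that a retrieval defect that looks negligible in single-shot evaluation becomes the dominant failure mode once the loop is long enough.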
In this talk we’ll break down the main failure modes, including plausible distractors, error compounding across steps, and the gap between traditional retrieval metrics and real task utility. We’ll present design patterns for robust agentic retrieval: stricter evidence selection, sufficiency checks before acting, and explicit pause/retry/escalate behavior when confidence is not warranted. We’ll also connect these patterns to challenges in open agent tooling ecosystems, where attacks via untrusted context have shown that retrieval is a threat surface as well as a ranking problem.
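The pause/retry/escalate pattern above can be sketched in a few lines. This is a minimal illustration, not the implementation presented in the talk; all names (`retrieve`, `is_sufficient`, `act`, `escalate`) are hypothetical placeholders for whatever components a real agent stack provides.

```python
from typing import Callable, Sequence

def act_with_sufficiency_check(
    query: str,
    retrieve: Callable[[str], Sequence[str]],
    is_sufficient: Callable[[Sequence[str]], bool],
    act: Callable[[Sequence[str]], str],
    escalate: Callable[[str], str],
    max_retries: int = 2,
) -> str:
    """Act only when retrieved evidence passes a sufficiency check;
    retry retrieval a bounded number of times, then escalate."""
    for _ in range(max_retries + 1):
        evidence = retrieve(query)
        if is_sufficient(evidence):
            return act(evidence)   # confidence warranted: proceed
    return escalate(query)         # confidence not warranted: hand off
```

The key design choice is that "no action" is an explicit, bounded outcome rather than an error path, so insufficient or suspicious context degrades to escalation instead of being recycled into the next loop iteration.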