How to Tell If Your Agent Used the Right Stuff
Session Abstract
Many so-called “agent failures” are actually context failures in disguise. In this session, we’ll explore how to tell whether your agent truly saw and used the right context, using techniques like tracing and attribution, golden datasets for context-aware evaluation, and targeted probes to test retrieval quality.
Session Description
Your agent answered confidently, did it use the right evidence? We’ll walk through a repeatable debugging workflow for RAG + tool-using agents: instrument traces, inspect retrieved chunks, run attribution and citation checks, and isolate failure modes (missing recall, bad ranking, distractors, stale docs). You’ll learn how to create a lightweight golden set, write probe questions, and track retrieval + answer metrics so improvements are measurable, not vibes.