How to Tell If Your Agent Used the Right Stuff

Data Science

Operations

Stories

Session Abstract

Many so-called “agent failures” are actually context failures in disguise. In this session, we’ll explore how to tell whether your agent truly saw and used the right context, using techniques like tracing and attribution, golden datasets for context-aware evaluation, and targeted probes to test retrieval quality.

Session Description

Your agent answered confidently, but did it use the right evidence? We’ll walk through a repeatable debugging workflow for RAG and tool-using agents: instrument traces, inspect retrieved chunks, run attribution and citation checks, and isolate failure modes (missing recall, bad ranking, distractors, stale docs). You’ll learn how to create a lightweight golden set, write probe questions, and track retrieval and answer metrics so that improvements are measured, not judged on vibes.
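As a rough illustration of tracking retrieval metrics against a golden set, here is a minimal Python sketch. It is not taken from the session itself: the `GoldenExample` structure, the `retrieve` callable, and the choice of recall@k and mean reciprocal rank (MRR) as metrics are all assumptions made for the example. It presumes each golden question is labeled with the IDs of the chunks a correct answer should draw on, and that the retriever returns chunk IDs in ranked order.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class GoldenExample:
    """Hypothetical golden-set record: a question plus the chunk IDs it needs."""
    question: str
    relevant_ids: set[str]


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant chunks that appear in the top-k retrieved IDs."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant chunk, or 0.0 if none was retrieved."""
    for rank, chunk_id in enumerate(retrieved, start=1):
        if chunk_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(
    golden: list[GoldenExample],
    retrieve: Callable[[str], list[str]],  # assumed: question -> ranked chunk IDs
    k: int = 5,
) -> dict[str, float]:
    """Average recall@k and MRR over the golden set."""
    if not golden:
        return {"recall_at_k": 0.0, "mrr": 0.0}
    recalls, rranks = [], []
    for example in golden:
        retrieved = retrieve(example.question)
        recalls.append(recall_at_k(retrieved, example.relevant_ids, k))
        rranks.append(reciprocal_rank(retrieved, example.relevant_ids))
    n = len(golden)
    return {"recall_at_k": sum(recalls) / n, "mrr": sum(rranks) / n}


# Usage with a stand-in retriever (a fixed lookup table, purely for illustration).
golden_set = [
    GoldenExample("q1", {"a", "b"}),
    GoldenExample("q2", {"c"}),
]
fake_index = {"q1": ["a", "x", "b"], "q2": ["y", "c"]}
metrics = evaluate(golden_set, lambda q: fake_index[q], k=2)
print(metrics)
```

Under these assumptions, low recall@k points toward missing recall (the right chunk never reached the agent), while acceptable recall with low MRR suggests a ranking problem or distractors crowding the top results.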

Talk

Apurva Misra
