
Let LLMs Wander: Engineering RL Environments

Session Abstract

What if, instead of learning only from static examples, Language Models could explore crafted environments: little worlds where they can act and improve autonomously?

Join me to see how Reinforcement Learning Environments work, how to build them with open-source tools, and how to use them to evaluate and train LLMs/Agents.

Session Description

Since the release of reasoning Language Models like DeepSeek R1, improving model capabilities has been moving beyond static examples (Supervised Fine-Tuning) toward interaction via Reinforcement Learning.

To enable this, we need RL Environments: controlled worlds where models can act, get rewards, and learn.
An environment is more than a dataset. It is a piece of software that orchestrates interactions with the model, manages state, defines rewards, and verifies outcomes.
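To make this concrete, here is a minimal sketch of what such a piece of software can look like. All names (`CountdownEnv`, `reset`, `step`) are illustrative, not taken from any particular library; the point is that state, rewards, and verification live in code, not in a dataset.

```python
class CountdownEnv:
    """Toy single-turn environment: the model is asked an arithmetic
    question; the reward function verifies the outcome."""

    def reset(self) -> str:
        # State management: remember the expected answer for verification.
        self.expected = "12"
        return "What is 7 + 5? Reply with just the number."

    def step(self, action: str) -> tuple[float, bool]:
        # Reward definition and outcome verification in one place.
        reward = 1.0 if action.strip() == self.expected else 0.0
        done = True  # single-turn: the episode ends after one model reply
        return reward, done


env = CountdownEnv()
prompt = env.reset()
completion = "12"  # stand-in for a model's generation
reward, done = env.step(completion)
print(prompt, "->", completion, "| reward:", reward)
```

Even this toy shows the division of labor: the environment owns the prompt, the hidden state, and the verifier, while the model only supplies actions.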

In this talk, I will walk you through my journey exploring this emerging space from a software engineering perspective.

1. I will start by mapping classic Reinforcement Learning concepts to Language Models.

2. I will then introduce Verifiers, an open-source library for building environments as software artifacts.

3. Building on Verifiers, we’ll look at concrete design patterns, ranging from simple single-turn tasks to multi-turn games to environments for tool-using agents that interact with external systems.

4. I’ll share practical experiences using environments for evaluating and training Small Language Models.
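The patterns above share one general shape: a rollout loop that alternates model actions and environment feedback. Here is a hedged sketch of that loop; the names (`rollout`, `EchoEnv`, `policy`) are illustrative and not from Verifiers or any other library.

```python
def rollout(env, policy, max_turns=8):
    """Collect one episode: alternate model actions and environment
    feedback, accumulating reward until the episode ends."""
    observation = env.reset()
    messages = [{"role": "user", "content": observation}]
    total_reward = 0.0
    for _ in range(max_turns):
        action = policy(messages)  # the model generates a reply
        messages.append({"role": "assistant", "content": action})
        observation, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
        messages.append({"role": "user", "content": observation})
    return messages, total_reward


class EchoEnv:
    """Toy multi-turn environment: rewards the model for replying 'ok'
    over two turns, then ends the episode."""

    def reset(self):
        self.turns = 0
        return "say 'ok'"

    def step(self, action):
        self.turns += 1
        reward = 1.0 if action == "ok" else 0.0
        done = self.turns >= 2
        return "say 'ok' again", reward, done


messages, total = rollout(EchoEnv(), policy=lambda msgs: "ok")
print("transcript length:", len(messages), "| total reward:", total)
```

A single-turn task is just this loop with `done` set after the first step; tool-using agents extend it by letting `env.step` execute tool calls against external systems before returning the next observation.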

By the end of the session, attendees will be able to start building their own Reinforcement Learning environments, little worlds for LLMs. I’ll also share the joys, frustrations, and lessons learned along the way.