Part of the “Tokens for Thoughts” series
By Han Fang, Karthik Abinav Sankararaman and Claude Cowork — the environment, agents, visualizations, and even this post were created through human-AI collaboration.

Naive vs Learning Agent
The NAIVE agent (left) keeps hitting the same wall. The LEARNING agent (right) remembers what’s blocked and finds a way through. That’s RL in 10 seconds.
We'll use the example of an AI agent learning to play Pokemon Red via reinforcement learning (RL). This isn't a traditional ML model crunching pixels; an LLM acts as the "brain" that learns from experience.
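To make the naive-vs-learning distinction concrete, here's a toy sketch (not the actual Pokemon setup): a tiny corridor with one blocked move. The naive agent keeps trying the same action; the learning agent remembers which (position, action) pairs failed and routes around them. All the positions, actions, and the wall here are invented for illustration.

```python
# Toy corridor: positions 0..4. At position 2, "walk" hits a wall;
# "jump" gets past it. Everything here is a made-up example.

WALL = (2, "walk")  # the one (position, action) pair that is blocked

def step(pos, action):
    """Environment transition: returns the agent's new position."""
    if (pos, action) == WALL:
        return pos       # blocked: the agent stays put
    return pos + 1       # any other move advances

def naive_agent(pos, memory):
    return "walk"        # always tries the same action, wall or not

def learning_agent(pos, memory):
    # Skip any action this agent has already seen fail at this position.
    for action in ("walk", "jump"):
        if (pos, action) not in memory:
            return action
    return "walk"

def run(agent, max_steps=20):
    pos, memory, steps = 0, set(), 0
    while pos < 4 and steps < max_steps:
        action = agent(pos, memory)
        new_pos = step(pos, action)
        if new_pos == pos:
            memory.add((pos, action))  # remember the blocked move
        pos = new_pos
        steps += 1
    return steps

print(run(naive_agent))     # 20 -- hits the step cap, stuck at the wall
print(run(learning_agent))  # 5  -- one failed try, then routes around
```

The only difference between the two agents is the `memory` set, which is the whole point: the environment is the same, but the learning agent turns its failures into information.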
Why does this matter? Because the same principles that teach an AI to navigate Route 1 also teach Claude to write better code, answer questions more helpfully, and avoid harmful outputs. RL environments are the training grounds where AI learns from trial and error.
But here's the thing: most tutorials skip past what an RL environment actually is. Papers assume you know. Courses hand-wave it. So let's fix that, using Pokemon as our guide.
Why Pokemon? Three reasons:
Peter Whidden's "Training AI to Play Pokemon" video went viral a couple of years ago and inspired a lot of this work. Go watch it; it's incredible. But even that video assumes you know what an environment is.
Let’s start from scratch.