Now enrolling for 2025

Learn Reinforcement Learning.

Shape the future of AI. Build, visualize, and train intelligent agents with a modern learning platform designed for students in grades 7–10.

Apply Now →
12+
Hands-on modules
7–10
Grade levels
$0
Free forever
Future potential
Built for curious young minds.
🤖
Train Real Agents
Students build and watch agents learn through trial and error — from grid worlds to CartPole — seeing RL in action, not just theory.
📊
Visual Learning
Complex concepts like the Bellman equation and Q-tables are brought to life through animated visualizations and interactive demos.
🧠
Deep Concepts, Simple Language
From explore vs. exploit to deep Q-networks, we translate cutting-edge AI research into something a 7th grader can genuinely understand.
💻
Python-First Projects
Students write real Python code, not just drag-and-drop blocks. They leave with working projects they can show off and build on.
🚀
Built by a Student
ReinforceLearn was created by a high school student who actually does this work — so the curriculum meets students where they are.
🎯
Problem-Centered Design
Every concept is taught through a challenge. Students don't learn RL to pass a test — they learn it to solve interesting problems.

A 12-week journey into AI

From zero to training intelligent agents. Each module builds on the last — no prior AI experience needed, just curiosity. Three phases: foundations, core RL, and advanced deep RL.

Phase 1 — Foundations
01
Week 1 · Introduction
What Is Reinforcement Learning?
The big picture: what makes RL different from supervised and unsupervised learning. Students explore how agents interact with environments through actions and observations, and why reward-driven learning is so powerful.
Agent & Environment · Reward Signal · RL vs Other ML · Real-World Examples
02
Week 2 · Core Concepts
States, Actions & the Markov Property
We formalize the RL framework with Markov Decision Processes. Students learn what a state really is, how action spaces are defined, and why the Markov property is the mathematical foundation everything else is built on.
MDPs · State Spaces · Action Spaces · Markov Property
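To make the MDP idea concrete, here is a minimal sketch of how one might be written down in Python. The two-state "battery robot" world, its state and action names, and its probabilities are all invented for illustration and are not taken from the course materials. Note that `step` looks only at the current state and action, which is the Markov property in code form.

```python
import random

# Hypothetical two-state MDP: TRANSITIONS[state][action] is a list of
# (probability, next_state, reward) tuples.
TRANSITIONS = {
    "charged": {
        "work": [(0.8, "charged", 1.0), (0.2, "low", 1.0)],
        "rest": [(1.0, "charged", 0.0)],
    },
    "low": {
        "work": [(1.0, "low", -1.0)],
        "rest": [(1.0, "charged", 0.0)],
    },
}

def step(state, action, rng):
    """Sample (next_state, reward) using only the current state and action."""
    outcomes = TRANSITIONS[state][action]
    weights = [p for p, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward

rng = random.Random(0)
print(step("low", "rest", rng))  # resting while low always recharges
```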
03
Week 3 · Rewards & Goals
Reward Design & Discount Factors
Designing reward functions is one of the hardest parts of RL. Students learn what makes a good (and bad) reward, why long-term vs short-term rewards matter, and how the discount factor γ controls how far ahead an agent thinks.
Reward Shaping · Discount Factor γ · Sparse Rewards · Return G_t
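A short sketch of how γ changes what a reward is worth, assuming the usual definition of the return as the sum of each later reward multiplied by γ raised to the number of steps until it arrives. The reward sequence below is a made-up sparse-reward episode: nothing for three steps, then 10 at the end.

```python
def discounted_return(rewards, gamma):
    """Return r_0 + gamma*r_1 + gamma^2*r_2 + ... for a finite episode."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g   # fold the future return into the current step
    return g

rewards = [0, 0, 0, 10]     # hypothetical episode with one delayed reward
for gamma in (0.5, 0.9, 1.0):
    print(gamma, discounted_return(rewards, gamma))
```

With γ = 0.5 the delayed reward is worth 1.25 at the start of the episode. With γ = 1.0 it keeps its full value of 10, so the agent is effectively more patient.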
Phase 2 — Core RL
04
Week 4 · Value Functions
Value Functions & the Bellman Equation
The mathematical heart of RL. Students derive the Bellman equation from first principles, implement value and action-value functions, and see how iterating the Bellman backup leads to optimal policies.
V(s) and Q(s,a) · Bellman Equation · Optimal Policy · Policy Evaluation
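As a rough illustration of "iterating the Bellman backup", the sketch below runs value iteration (the Bellman optimality backup) on a hypothetical four-cell corridor. The corridor, the reward of 1 for reaching the last cell, and γ = 0.9 are assumptions chosen to keep the numbers easy to check by hand.

```python
GAMMA = 0.9
N_STATES = 4          # states 0..3; state 3 is the terminal goal
ACTIONS = (-1, +1)    # move left or right

def move(state, action):
    """Deterministic transition: returns (next_state, reward)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def value_iteration(sweeps=50):
    values = [0.0] * N_STATES
    for _ in range(sweeps):
        for s in range(N_STATES - 1):   # the terminal state keeps value 0
            # Bellman optimality backup: best one-step reward plus discounted value
            values[s] = max(
                r + GAMMA * values[s2]
                for s2, r in (move(s, a) for a in ACTIONS)
            )
    return values

print([round(v, 2) for v in value_iteration()])
```

The values rise toward the goal (about 0.81, 0.9, 1.0), so an agent that always steps toward the higher-valued neighbor follows the optimal policy in this toy world.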
05
Week 5 · Tabular RL
Q-Learning & Q-Tables
Students build their first Q-learning agent from scratch. They construct a Q-table, implement the TD update rule, and watch the agent gradually learn to navigate a grid world — seeing RL click in real time.
Q-Tables · TD Learning · Learning Rate α · Grid World Lab
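Here is a minimal sketch of tabular Q-learning, assuming a one-dimensional five-cell corridor in place of a full grid world. The environment, the hyperparameter values, and the random tie-breaking are illustrative choices, not the course's lab code.

```python
import random

N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)                      # index 0 = left, index 1 = right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2   # assumed hyperparameters

def train(episodes=200, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q-table: q[state][action_index]
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            if rng.random() < EPSILON:
                a = rng.randrange(2)            # explore
            else:
                best = max(q[s])                # exploit, breaking ties at random
                a = rng.choice([i for i in (0, 1) if q[s][i] == best])
            s2 = min(max(s + ACTIONS[a], 0), N_STATES - 1)
            r = 1.0 if s2 == GOAL else 0.0
            # TD update: nudge Q(s, a) toward r + gamma * max_a' Q(s2, a')
            q[s][a] += ALPHA * (r + GAMMA * max(q[s2]) - q[s][a])
            s = s2
    return q

q = train()
print(["right" if row[1] > row[0] else "left" for row in q[:GOAL]])
```

After training, the greedy action in every non-goal cell should be "right", which is the shortest path to the reward.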
06
Week 6 · Exploration
Explore vs. Exploit
Should your agent try new things or stick with what works? Students experiment with epsilon-greedy strategies, decaying exploration schedules, and the multi-armed bandit problem — building deep intuition for one of RL's core dilemmas.
Epsilon-Greedy · Exploration Decay · Multi-Armed Bandit · UCB Strategy
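The explore vs. exploit tradeoff can be sketched with an epsilon-greedy agent on a multi-armed bandit. The arm means, the noise level, and the fixed (non-decaying) epsilon below are assumptions for illustration.

```python
import random

def run_bandit(true_means, epsilon=0.1, steps=2000, seed=0):
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms   # running average reward per arm
    counts = [0] * n_arms        # how often each arm was pulled
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                           # explore
        else:
            arm = max(range(n_arms), key=lambda i: estimates[i])  # exploit
        reward = rng.gauss(true_means[arm], 1.0)                  # noisy payout
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts

estimates, counts = run_bandit([0.0, 5.0])   # hypothetical arms: one clearly better
print(counts)
```

Setting `epsilon=0.0` is a useful experiment. The agent can lock onto whichever arm looked good first, because it never tries the alternative.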
07
Week 7 · Policy Methods
Policy Iteration & SARSA
Students learn the difference between on-policy and off-policy learning. We compare SARSA (on-policy) to Q-learning (off-policy), implement both, and explore how policy iteration converges to optimal solutions.
SARSA · On-policy vs Off-policy · Policy Iteration · Convergence
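The on-policy vs off-policy distinction comes down to one term in the update target. The sketch below shows the two targets side by side. The Q-values and γ are made-up numbers, and the function names are hypothetical.

```python
GAMMA = 0.9

def q_learning_target(reward, next_q_values):
    """Off-policy: bootstrap from the best next action, whatever the agent does next."""
    return reward + GAMMA * max(next_q_values)

def sarsa_target(reward, next_q_values, next_action):
    """On-policy: bootstrap from the action the agent actually takes next."""
    return reward + GAMMA * next_q_values[next_action]

next_q = [1.0, 3.0]   # hypothetical Q-values for the two actions in the next state
print(q_learning_target(0.0, next_q))             # bootstraps from 3.0
print(sarsa_target(0.0, next_q, next_action=0))   # bootstraps from 1.0 after an exploratory move
```

When the next action happens to be the greedy one, the two targets are identical. They differ only on exploratory steps.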
Phase 3 — Deep RL
08
Week 8 · Neural Networks
Neural Networks for RL
Before building DQNs, students need to understand neural networks. This week covers the basics: layers, activation functions, backpropagation, and how a network learns to approximate functions — setting the stage for Deep RL.
Feedforward Networks · Activation Functions · Backpropagation · Function Approximation
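As a first taste of function approximation, the sketch below trains a single linear neuron with gradient descent to fit the made-up target y = 2x + 1. It has no hidden layers or activation function, so it shows only the simplest case of the chain-rule update that backpropagation generalizes. The learning rate and epoch count are assumptions.

```python
def train_neuron(data, lr=0.05, epochs=500):
    w, b = 0.0, 0.0                  # the neuron's weight and bias
    for _ in range(epochs):
        for x, y in data:
            pred = w * x + b         # forward pass
            error = pred - y
            # Gradients of 0.5 * error**2 with respect to w and b
            w -= lr * error * x
            b -= lr * error
    return w, b

# Hypothetical training data sampled from y = 2x + 1
data = [(x, 2.0 * x + 1.0) for x in (-2, -1, 0, 1, 2)]
w, b = train_neuron(data)
print(round(w, 3), round(b, 3))
```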
09
Week 9 · Deep Q-Networks
DQNs — When Q-Tables Aren't Enough
When state spaces are too large for a table, we use neural networks. Students learn the DQN architecture, experience replay, and target networks — the innovations that powered DeepMind's Atari-playing agents.
DQN Architecture · Experience Replay · Target Networks · Loss Functions
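Experience replay is the easiest of these pieces to sketch without a deep learning library. Below is a minimal replay buffer, assuming transitions are stored as plain tuples. The class name and method names are hypothetical, not taken from any particular framework.

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity, seed=0):
        self.memory = deque(maxlen=capacity)   # oldest transitions drop off automatically
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Random sampling breaks up the correlation between consecutive steps
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)

buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.push(t, 0, 1.0, t + 1, False)
print(len(buffer), sorted(tr[0] for tr in buffer.memory))  # only the 3 newest remain
```

In a full DQN, each sampled batch would be used for one gradient step on the Q-network, with targets computed from a separate, slowly updated target network.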
10
Week 10 · OpenAI Gym
Training Agents in Real Environments
Students put everything together using OpenAI Gym (now maintained as Gymnasium). They train agents on classic control tasks like CartPole and MountainCar, learn how to read environment APIs, tune hyperparameters, and interpret training curves.
OpenAI Gym · CartPole · Hyperparameter Tuning · Training Curves
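To show what "reading an environment API" looks like without installing anything, here is a toy environment that imitates the reset/step interface. `CorridorEnv` is hypothetical and is not part of any library. The five-value `step` return follows the convention used by Gymnasium, the maintained successor to OpenAI Gym. Older Gym releases returned four values with a single `done` flag.

```python
class CorridorEnv:
    """Hypothetical toy environment with a Gym-style reset/step interface."""

    def __init__(self, length=5, max_steps=20):
        self.length, self.max_steps = length, max_steps

    def reset(self):
        self.position, self.steps = 0, 0
        return self.position, {}                       # observation, info

    def step(self, action):
        """action: 0 = left, 1 = right."""
        self.steps += 1
        delta = 1 if action == 1 else -1
        self.position = min(max(self.position + delta, 0), self.length - 1)
        terminated = self.position == self.length - 1  # reached the goal
        truncated = self.steps >= self.max_steps       # ran out of time
        reward = 1.0 if terminated else 0.0
        return self.position, reward, terminated, truncated, {}

env = CorridorEnv()
obs, info = env.reset()
total, done = 0.0, False
while not done:
    obs, reward, terminated, truncated, info = env.step(1)  # always move right
    total += reward
    done = terminated or truncated
print(obs, total)
```

The same loop shape carries over to real environments such as CartPole. Only the environment object and the way actions are chosen change.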
11
Week 11 · Advanced Topics
Policy Gradients & Actor-Critic
A peek at the frontier. Students are introduced to policy gradient methods — agents that directly optimize their policy rather than learning a value function — and the actor-critic architecture that combines the best of both worlds.
REINFORCE Algorithm · Policy Gradients · Actor-Critic · Advantage Function
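A minimal sketch of the policy gradient idea, assuming the simplest possible setting: a two-action, one-step problem with a softmax policy and no baseline or critic. The rewards, learning rate, and episode count are made up for illustration.

```python
import math
import random

def softmax(prefs):
    exps = [math.exp(p - max(prefs)) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_bandit(rewards=(0.0, 1.0), lr=0.1, episodes=500, seed=0):
    rng = random.Random(seed)
    prefs = [0.0, 0.0]                       # policy parameters, one per action
    for _ in range(episodes):
        probs = softmax(prefs)
        action = rng.choices((0, 1), weights=probs, k=1)[0]
        ret = rewards[action]                # one-step episode: return equals reward
        for i in range(2):
            # d/d prefs[i] of log pi(action) is (1 if i == action else 0) - probs[i]
            grad = (1.0 if i == action else 0.0) - probs[i]
            prefs[i] += lr * ret * grad      # REINFORCE: step along return * grad log pi
    return softmax(prefs)

probs = reinforce_bandit()
print([round(p, 3) for p in probs])
```

No value table appears anywhere. The policy's own parameters are adjusted so that rewarded actions become more likely. Actor-critic methods add a learned value estimate to reduce the noise in `ret`.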
12
Week 12 · Capstone
Build & Present Your Own Agent
The culmination of 12 weeks. Students choose a challenge environment, design their agent architecture, train it from scratch, and present results to peers and mentors. It is a project they can show at science fairs and competitions and include in college applications.
Full RL Pipeline · Environment of Choice · Performance Analysis · Project Showcase
Apply Now — It's Free →

AI education shouldn't wait for college.

ReinforceLearn was built on a simple frustration: reinforcement learning — one of the most powerful and fascinating areas of modern AI — is almost completely absent from K-12 education. Most students don't encounter it until graduate school, if ever.

We believe that's backwards. The concepts behind RL — reward, consequence, exploration, strategy — are deeply intuitive. They map onto how humans and animals learn naturally. Students in 7th grade are ready to understand them.

So we built the curriculum we wished existed. Rigorous enough to be meaningful. Approachable enough to be fun. Built by a student, for students.

🤖
Agent
Observes state, chooses action
↓ action
🌍
Environment
Returns reward + new state
↓ reward
📈
Policy Update
Agent learns to maximize reward
↺ repeat
Our Values
🔬
Rigor Without Gatekeeping
We don't water down the math — we make it accessible. Real concepts, real code, taught with patience.
🌱
Curiosity Over Credentials
No prerequisites. No barriers. If you're curious about AI, you belong here. We start from zero, together.
🤝
Student-Built, Student-First
This curriculum was designed by someone who went through the same confusion. We meet students exactly where they are.

Real results from real students.

Don't take our word for it. Here's what students, parents, and teachers are saying about ReinforceLearn.

★★★★★
"My daughter came home explaining the explore-exploit tradeoff using a restaurant analogy she invented herself. The curriculum teaches kids to think, not just follow steps."
Maya P.
Parent of 9th Grader
★★★★★
"As a CS teacher, I've tried a dozen AI curricula. ReinforceLearn is the only one where students are genuinely building something from scratch, and feeling proud of it."
T. Reyes
CS Teacher, Washington Middle School
★★★★★
"The Bellman equation used to intimidate me. Now I think about it like updating a game strategy. The visual explanations completely changed how I understand AI."
Aiden S.
10th Grade Student
★★★★★
"The capstone project was the highlight of my son's school year. He stayed up until midnight training his CartPole agent and woke me up to show me when it finally worked."
L. Washington
Parent of 7th Grader

Got questions? We've got answers.

Everything you need to know about ReinforceLearn. If you don't see your question here, reach out through the contact page.

Let's build the future of AI education.

Whether you're a student ready to enroll, a teacher looking to bring ReinforceLearn to your classroom, or just curious — we'd love to hear from you.

📧
Email
hello@reinforcelearn.io
📍
Location
Portland, Oregon
Response Time
Usually within 24 hours