Now enrolling · 2025

Learn Reinforcement Learning.

Build, visualize, and train intelligent agents. A modern curriculum designed for students in grades 7–10 — rigorous, free, and actually fun.

Apply Now — It's Free

How RL Works

🤖

Agent

Observes state → chooses action

↓ action sent to environment

🌍

Environment

Returns reward + next state

↓ reward drives learning

📈

Policy Update

Agent improves over time

↺ Repeat until optimal

12+

Hands-on modules

7–10

Grade levels served

Completely free

∞

Future potential

What We Offer

Built for curious young minds.

Every concept is taught through a problem. Students don't learn RL to pass a test — they learn it to build something real.

🤖

Train Real Agents

Students build agents and watch them learn through trial and error, from grid worlds to CartPole, so they see RL working instead of just reading theory on a slide.

📊

Visual Learning

Complex concepts like the Bellman equation and Q-tables are brought to life through animated visualizations and interactive demos.

🧠

Deep Concepts, Simple Language

From explore vs. exploit to deep Q-networks, we translate cutting-edge AI research into something a 7th grader can genuinely understand.

💻

Python-First Projects

Students write real Python code, not drag-and-drop blocks. They leave with working projects they can show off, build on, and present.

🚀

Built by a Student

ReinforceLearn was created by a high school student who actually does this work — so the curriculum meets students exactly where they are.

🎯

Problem-Centered Design

Every module starts with a challenge. Students are curious before they're taught — which makes everything stick.

The Program

A 12-week journey into AI.

From zero to training intelligent agents. Each module builds on the last — no prior AI experience needed, just curiosity. Three phases: foundations, core RL, and advanced deep RL.

Phase 1 — Foundations

Week 1 · Introduction

What Is Reinforcement Learning?

The big picture: what makes RL different from supervised and unsupervised learning. Students explore how agents interact with environments through actions and observations, and why reward-driven learning is so powerful.

Agent & Environment · Reward Signal · RL vs. Other ML · Real-World Examples
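The agent-environment interaction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `LineWorld`, `run_episode`, and `always_right` are hypothetical names made up for this example, not part of the ReinforceLearn curriculum or any library.

```python
class LineWorld:
    """Hypothetical toy environment: start at position 0, reach the goal to earn reward."""

    def __init__(self, goal=3):
        self.goal = goal
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # action is -1 (left) or +1 (right); the position never drops below 0
        self.position = max(0, self.position + action)
        done = self.position == self.goal
        reward = 1.0 if done else 0.0
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode and return the total reward collected."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                  # agent observes state, chooses action
        state, reward, done = env.step(action)  # environment returns reward + next state
        total_reward += reward
        if done:
            break
    return total_reward


def always_right(state):
    """A fixed policy that ignores the state and always moves right."""
    return 1


print(run_episode(LineWorld(goal=3), always_right))
```

A learning agent would replace `always_right` with a policy that changes as rewards come in.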

Week 2 · Core Concepts

States, Actions & the Markov Property

We formalize the RL framework with Markov Decision Processes. Students learn what a state really is, how action spaces are defined, and why the Markov property is the mathematical foundation everything else is built on.

MDPs · State Spaces · Action Spaces · Markov Property

Week 3 · Rewards & Goals

Reward Design & Discount Factors

Designing reward functions is one of the hardest parts of RL. Students learn what makes a good (and bad) reward, why the trade-off between long-term and short-term rewards matters, and how the discount factor γ controls how far ahead an agent thinks.

Reward Shaping · Discount Factor γ · Sparse Rewards · Return G_t
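To make the effect of γ concrete, here is a small sketch that computes the discounted return for a list of rewards. The function name `discounted_return` and the sample reward lists are assumptions chosen for illustration.

```python
def discounted_return(rewards, gamma):
    """Return r_0 + gamma * r_1 + gamma^2 * r_2 + ... for a list of rewards."""
    g = 0.0
    # Work backwards so each step applies G = r + gamma * G_next
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# The same delayed reward of 10 is worth less to a short-sighted agent.
far_sighted = discounted_return([0, 0, 10], gamma=0.9)
short_sighted = discounted_return([0, 0, 10], gamma=0.5)
print(far_sighted, short_sighted)
```

With a smaller γ, the reward two steps away shrinks faster, which is what "thinking less far ahead" means in practice.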

Phase 2 — Core RL

Week 4 · Value Functions

Value Functions & the Bellman Equation

The mathematical heart of RL. Students derive the Bellman equation from first principles, implement value and action-value functions, and see how iterating the Bellman backup leads to optimal policies.

V(s) and Q(s,a) · Bellman Equation · Optimal Policy · Policy Evaluation
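As a rough sketch of what "iterating the Bellman backup" looks like, the example below runs value iteration on a tiny deterministic MDP. The three-state `chain` and the `value_iteration` helper are hypothetical, and the deterministic-transition simplification is an assumption made to keep the code short.

```python
def value_iteration(transitions, gamma=0.9, tol=1e-8):
    """transitions[s][a] = (next_state, reward); states with no actions are terminal."""
    values = {s: 0.0 for s in transitions}
    while True:
        biggest_change = 0.0
        for s, actions in transitions.items():
            if not actions:
                continue  # terminal state keeps value 0
            # Bellman optimality backup for a deterministic MDP:
            # V(s) = max over actions of [reward + gamma * V(next_state)]
            best = max(r + gamma * values[s2] for s2, r in actions.values())
            biggest_change = max(biggest_change, abs(best - values[s]))
            values[s] = best
        if biggest_change < tol:
            return values


# Hypothetical chain: A -> B -> goal, with reward 1 for entering the goal.
chain = {
    "A": {"right": ("B", 0.0), "stay": ("A", 0.0)},
    "B": {"right": ("goal", 1.0), "left": ("A", 0.0)},
    "goal": {},
}
values = value_iteration(chain)
print(values)
```

State B ends up more valuable than state A because it is one step closer to the reward, and acting greedily with respect to these values gives the optimal policy for this chain.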

Week 5 · Tabular RL

Q-Learning & Q-Tables

Students build their first Q-learning agent from scratch. They construct a Q-table, implement the TD update rule, and watch the agent gradually learn to navigate a grid world — seeing RL click in real time.

Q-Tables · TD Learning · Learning Rate α · Grid World Lab
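The TD update rule at the core of tabular Q-learning might be written roughly as follows. This is a minimal sketch, not the course's lab code: the function name, the dictionary-based Q-table, and the default values for α and γ are all assumptions.

```python
from collections import defaultdict


def q_learning_update(q_table, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9, done=False):
    """Apply one Q-learning TD update in place and return the TD error."""
    # Value of the best action available in the next state (0 if the episode ended)
    best_next = 0.0 if done else max(q_table[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    td_error = td_target - q_table[(state, action)]
    # Move the old estimate a fraction alpha of the way toward the target
    q_table[(state, action)] += alpha * td_error
    return td_error


# The Q-table maps (state, action) pairs to estimates; unseen pairs default to 0.
q_table = defaultdict(float)
moves = ["left", "right"]
q_learning_update(q_table, state=0, action="right", reward=1.0,
                  next_state=1, actions=moves, alpha=0.5)
print(q_table[(0, "right")])
```

Repeating this update over many steps in a grid world is what gradually fills in the Q-table.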

Week 6 · Exploration

Explore vs. Exploit

Should your agent try new things or stick with what works? Students experiment with epsilon-greedy strategies, decaying exploration schedules, and the multi-armed bandit problem — building deep intuition for one of RL's core dilemmas.

Epsilon-Greedy · Exploration Decay · Multi-Armed Bandit · UCB Strategy
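One possible way to express epsilon-greedy selection with a decaying exploration schedule is sketched below. The function names, the exponential decay form, and the default parameters are illustrative assumptions; other schedules (such as linear decay) are equally common.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an index into q_values: random with probability epsilon, else the best."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: try any action
    # exploit: pick the action with the highest current estimate
    return max(range(len(q_values)), key=lambda i: q_values[i])


def decayed_epsilon(step, start=1.0, end=0.05, decay=0.99):
    """Exponentially decay epsilon from start, never dropping below end."""
    return max(end, start * decay ** step)


estimates = [0.1, 0.9, 0.3]
for step in (0, 100, 1000):
    eps = decayed_epsilon(step)
    print(step, round(eps, 3), epsilon_greedy(estimates, eps))
```

Early on the agent explores almost every step, and later it mostly exploits what it has learned while still trying something new now and then.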

Week 7 · Policy Methods

Policy Iteration & SARSA

Students learn the difference between on-policy and off-policy learning. We compare SARSA (on-policy) to Q-learning (off-policy), implement both, and explore how policy iteration converges to optimal solutions.

SARSA · On-policy vs Off-policy · Policy Iteration · Convergence
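The on-policy versus off-policy distinction comes down to which next-state value goes into the update target. The sketch below contrasts the two targets; the function names and the sample Q-values are assumptions for illustration.

```python
def sarsa_target(q, reward, next_state, next_action, gamma=0.9):
    """On-policy target: uses the action the agent will actually take next."""
    return reward + gamma * q[(next_state, next_action)]


def q_learning_target(q, reward, next_state, actions, gamma=0.9):
    """Off-policy target: uses the best next action, whatever the agent does next."""
    return reward + gamma * max(q[(next_state, a)] for a in actions)


# Hypothetical Q-values for one state with two actions.
q = {("s1", "left"): 0.2, ("s1", "right"): 1.0}

# Suppose the agent's exploratory policy picked "left" as its next action.
print(sarsa_target(q, reward=0.0, next_state="s1", next_action="left"))
print(q_learning_target(q, reward=0.0, next_state="s1", actions=["left", "right"]))
```

When the next action is exploratory, SARSA's target reflects that choice while Q-learning's target ignores it, which is why the two methods can learn different behavior in risky environments.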

Phase 3 — Deep RL

Week 8 · Neural Networks

Neural Networks for RL

Before building DQNs, students need to understand neural networks. This week covers layers, activation functions, backpropagation, and function approximation: everything that sets the stage for deep RL.

Feedforward Networks · Activation Functions · Backpropagation · Function Approximation

Week 9 · Deep Q-Networks

DQNs — When Q-Tables Aren't Enough

When state spaces are too large for a table, we use neural networks. Students learn the DQN architecture, experience replay, and target networks — the innovations that powered DeepMind's Atari-playing agents.

DQN Architecture · Experience Replay · Target Networks · Loss Functions
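Experience replay can be illustrated without any neural network code. The sketch below shows one simple way to store and sample transitions; the `ReplayBuffer` class and its method names are assumptions for this example rather than the API of a particular library.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of past transitions, sampled at random for training."""

    def __init__(self, capacity):
        # deque with maxlen drops the oldest transition once the buffer is full
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Random sampling breaks up the correlation between consecutive steps
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.push(state=t, action=0, reward=0.0, next_state=t + 1, done=False)
print(len(buffer), buffer.sample(2))
```

In a DQN, each sampled batch would be used for one gradient step on the Q-network, with targets computed from a separate, slowly updated target network.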

Week 10 · OpenAI Gym

Training Agents in Real Environments

Students put everything together using OpenAI Gym (now maintained as Gymnasium), training agents on CartPole and MountainCar, tuning hyperparameters, and interpreting training curves.

OpenAI Gym · CartPole · Hyperparameter Tuning · Training Curves

Week 11 · Advanced Topics

Policy Gradients & Actor-Critic

A peek at the frontier. Students are introduced to policy gradient methods and the actor-critic architecture that combines the best of value-based and policy-based approaches.

REINFORCE Algorithm · Policy Gradients · Actor-Critic · Advantage Function
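For a flavor of how a policy gradient update works, here is a minimal sketch of a single REINFORCE step for a softmax policy over a handful of discrete actions. It assumes the policy is parameterized directly by one logit per action (no neural network), and the names `softmax` and `reinforce_step` are chosen for illustration.

```python
import math


def softmax(logits):
    """Turn a list of logits into action probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, action, ret, lr=0.1):
    """One REINFORCE update: logits += lr * return * grad of log pi(action).

    For a softmax policy, d log pi(action) / d logit_i = 1[i == action] - pi_i.
    """
    probs = softmax(logits)
    return [
        logit + lr * ret * ((1.0 if i == action else 0.0) - p)
        for i, (logit, p) in enumerate(zip(logits, probs))
    ]


logits = [0.0, 0.0]
# The agent took action 1 and the episode's return was positive.
logits = reinforce_step(logits, action=1, ret=1.0)
print(softmax(logits))
```

An action followed by a positive return becomes more likely. Actor-critic methods replace the raw return with an advantage estimate from a learned value function to reduce the variance of this update.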

Week 12 · Capstone

Build & Present Your Own Agent

Students choose a challenge environment, design their agent architecture, train it from scratch, and present results. It is a project they can show at science fairs and competitions and include in college applications.

Full RL Pipeline · Environment of Choice · Performance Analysis · Project Showcase

Apply Now — It's Free →

Our Story

AI education shouldn't wait for college.

ReinforceLearn was built on a simple frustration: reinforcement learning — one of the most powerful areas of modern AI — is almost completely absent from K-12 education. Most students don't encounter it until graduate school, if ever.

We believe that's backwards. The concepts behind RL are deeply intuitive. They map onto how humans and animals learn naturally. Students in 7th grade are ready to understand them.

So we built the curriculum we wished existed. Rigorous enough to be meaningful. Approachable enough to be fun. Built by a student, for students.

The RL Loop

🤖

Agent

Observes state, chooses action

↓ action

🌍

Environment

Returns reward + new state

↓ reward

📈

Policy Update

Agent learns to maximize reward

↺ repeat until optimal

Our Values

🔬

Rigor Without Gatekeeping

We don't water down the math — we make it accessible. Real concepts, real code, taught with patience and clarity.

🌱

Curiosity Over Credentials

No prerequisites. No barriers. If you're curious about AI, you belong here. We start from zero, together.

🤝

Student-Built, Student-First

This curriculum was designed by someone who went through the same confusion. We meet students exactly where they are.

What People Say

Real results from real students.

Don't take our word for it. Here's what students, parents, and teachers are saying about ReinforceLearn.

★★★★★

I went in thinking reinforcement learning was something only PhD students could understand. By Week 3, I had a working Q-learning agent navigating a grid world on my own laptop. The way everything connects — the math, the code, the intuition — it just clicks. I've never felt smarter in my life.

Jordan K.

8th Grade Student, Portland OR

★★★★★

My daughter came home explaining the explore-exploit tradeoff using a restaurant analogy she invented herself. The curriculum teaches kids to think, not just follow steps.

Maya P.

Parent of 9th Grader

★★★★★

As a CS teacher, I've tried a dozen AI curricula. ReinforceLearn is the only one where students are genuinely building something from scratch — and feeling proud of it.

T. Reyes

CS Teacher, Washington Middle School

★★★★★

The Bellman equation used to intimidate me. Now I think about it like updating a game strategy. The visual explanations completely changed how I understand AI.

Aiden S.

10th Grade Student

★★★★★

The capstone project was the highlight of my son's school year. He stayed up until midnight training his CartPole agent and woke me up to show me when it finally worked.

L. Washington

Parent of 7th Grader

Got questions? We've got answers.

Let's build the future of AI education.

Send a Message