
Reinforcement Learning Fundamentals

Build practical intuition for agents, rewards, policies, value functions, and modern RL methods

Quick Course Facts

  • 20 self-paced, online lessons
  • 20 videos and/or narrated presentations
  • Approximately 6.9 hours of course media
About the Reinforcement Learning Fundamentals Course

Reinforcement Learning Fundamentals is an online course that introduces how learning agents make decisions, improve through feedback, and solve sequential problems in Artificial Intelligence. You will build practical intuition for agents, rewards, policies, value functions, and modern RL methods while learning how to reason about real-world reinforcement learning workflows.

Build Practical Reinforcement Learning Skills For Artificial Intelligence

  • Learn when reinforcement learning applies and how agents interact with environments through states, actions, and rewards.
  • Develop a clear understanding of Markov Decision Processes, returns, discounting, policies, and value functions.
  • Study core methods including dynamic programming, Monte Carlo learning, temporal-difference learning, SARSA, and Q-learning.
  • Explore modern RL methods such as Deep Q-Networks, policy gradients, actor-critic approaches, reward design, evaluation, safety, and deployment constraints.

Reinforcement Learning Fundamentals teaches the concepts, mathematics, and practical workflows behind decision-making agents in Artificial Intelligence.

This course begins with the foundations of learning agents, showing what reinforcement learning is, when it is useful, and how the agent-environment loop drives learning. You will examine states, actions, rewards, and sequential decisions so you can understand how Artificial Intelligence systems learn from experience instead of relying only on fixed instructions.

From there, you will formalize reinforcement learning problems using Markov Decision Processes and study returns, discounting, long-term value, policies, value functions, and action-value functions. These lessons help you build practical intuition for agents, rewards, policies, value functions, and modern RL methods without losing sight of how the math connects to implementation choices.

The course then moves into essential reinforcement learning algorithms, including Bellman equations, dynamic programming for known environments, Monte Carlo learning, temporal-difference learning, SARSA, and Q-learning. You will also learn how exploration and exploitation affect training, why action selection matters, and how value-based control methods guide agents toward better decisions.

In the later lessons, Reinforcement Learning Fundamentals introduces larger-scale and more modern RL methods, including function approximation, Deep Q-Networks, policy gradient methods, actor-critic methods, and advantage estimation. You will also study reward design, evaluation, common failure modes, safety, ethics, and real-world deployment constraints before designing a small end-to-end RL experiment. By the end of the course, you will be able to think clearly about reinforcement learning problems, compare major RL approaches, and approach Artificial Intelligence agent design with stronger technical judgment.

Course Lessons

Full lesson breakdown

Lessons are organized by topic area, and each includes a short description of what it covers.

Foundations of Learning Agents

3 lessons

This lesson introduces reinforcement learning as a way to train decision-making agents through interaction, feedback, and delayed consequences. You will learn the core idea behind an agent acting in a…

Lesson 2: Agents, Environments, States, Actions, and Rewards

19 min
This lesson introduces the basic vocabulary of reinforcement learning: agents, environments, states, actions, and rewards. Learners will see how these pieces form a repeated interaction loop wher…

Lesson 3: Sequential Decisions and the Reinforcement Learning Loop

17 min
In this lesson, Professor Victoria Okafor introduces reinforcement learning as a framework for making sequential decisions: choices whose consequences unfold over time rather than ending immediately …

Modeling RL Problems

3 lessons

Lesson 4: Markov Decision Processes and Problem Formalization

22 min
This lesson formalizes reinforcement learning problems as Markov Decision Processes, or MDPs. Learners define the core pieces of an MDP: states, actions, transition dynamics, rewards, discounting, hor…

Lesson 5: Returns, Discounting, and Long-Term Value

20 min
This lesson explains how reinforcement learning agents evaluate outcomes that unfold over time. Learners will distinguish immediate rewards from returns, compute discounted returns, and interpret the …
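
The discounted return described here can be computed in a few lines of Python. A minimal sketch, where the rewards and discount factor are purely illustrative:

```python
def discounted_return(rewards, gamma):
    """Compute G = r_0 + gamma*r_1 + gamma^2*r_2 + ..."""
    g = 0.0
    # Work backwards so each step applies G_t = r_t + gamma * G_{t+1}
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Example: three rewards with discount factor 0.9
print(discounted_return([1.0, 0.0, 2.0], gamma=0.9))  # 1 + 0.9*0 + 0.81*2 = 2.62
```

Working backwards through the episode avoids recomputing powers of gamma for every step.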

Lesson 6: Policies, Value Functions, and Action-Value Functions

21 min
This lesson introduces the three core objects used to describe decision-making in reinforcement learning: policies, state-value functions, and action-value functions. Learners will see how a policy…

Core RL Mathematics

2 lessons

Lesson 7: Bellman Equations and the Principle of Optimality

23 min
This lesson introduces Bellman equations as the mathematical link between immediate reward, future value, and decision quality in reinforcement learning. Learners will see how value functions can be d…
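
In standard MDP notation, the recursive relationship this lesson describes can be written as:

```latex
% Bellman expectation equation: the value of state s under policy \pi
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\,\bigl[ r + \gamma\, V^{\pi}(s') \bigr]

% Bellman optimality equation: replace the policy average with a max over actions
V^{*}(s) = \max_{a} \sum_{s', r} p(s', r \mid s, a)\,\bigl[ r + \gamma\, V^{*}(s') \bigr]
```

The first equation averages over the policy's action choices; the second selects the best action, which is the principle of optimality in equation form.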

Lesson 8: Dynamic Programming for Known Environments

20 min
Dynamic programming is the family of reinforcement learning methods used when the environment model is known: the transition probabilities and rewards are available. In this lesson, Professor Victoria…
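
As a sketch of the dynamic-programming idea, here is value iteration on a tiny hand-specified MDP. The states, actions, transitions, and rewards below are invented for illustration:

```python
# Tiny two-state MDP, specified by hand for illustration.
# transitions[state][action] = list of (probability, next_state, reward)
transitions = {
    "A": {"stay": [(1.0, "A", 0.0)], "go": [(1.0, "B", 1.0)]},
    "B": {"stay": [(1.0, "B", 2.0)], "go": [(1.0, "A", 0.0)]},
}
gamma = 0.9

V = {s: 0.0 for s in transitions}
for _ in range(1000):  # sweep until the values stop changing
    new_V = {}
    for s, actions in transitions.items():
        # Bellman optimality backup: best action's expected one-step return
        new_V[s] = max(
            sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
            for outcomes in actions.values()
        )
    if max(abs(new_V[s] - V[s]) for s in V) < 1e-8:
        V = new_V
        break
    V = new_V

print(V)  # staying in B forever is optimal: V[B] = 2/(1-0.9) = 20, V[A] = 1 + 0.9*20 = 19
```

Because the model (transition probabilities and rewards) is given, no interaction with an environment is needed; the sweep is pure computation.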

Learning from Experience

2 lessons

Lesson 9: Monte Carlo Learning from Episodes

19 min
Monte Carlo learning estimates value functions directly from completed episodes. Instead of requiring a model of transition probabilities or bootstrapping from another estimate, it waits until an epis…

Lesson 10: Temporal-Difference Learning and Bootstrapping

21 min
This lesson introduces temporal-difference learning as the bridge between Monte Carlo learning and dynamic programming. Learners see how an agent can update value estimates after each step by combinin…
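
The per-step update this lesson describes is the TD(0) rule. A minimal sketch, with an illustrative step size and made-up values:

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One TD(0) step: move V[s] toward the bootstrapped target r + gamma * V[s_next]."""
    td_error = r + gamma * V[s_next] - V[s]
    V[s] += alpha * td_error
    return td_error

V = {"s0": 0.0, "s1": 0.5}
err = td0_update(V, "s0", r=1.0, s_next="s1")
# target = 1.0 + 0.9*0.5 = 1.45, so V["s0"] moves from 0.0 to 0.145
```

Unlike Monte Carlo, the update happens after a single step, using the current estimate of the next state's value rather than waiting for the episode to finish.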

Value-Based Control

2 lessons

Lesson 11: SARSA and On-Policy Control

18 min
In this lesson, students learn SARSA as a practical on-policy control method for estimating action values while improving an epsilon-greedy policy. The focus is on how SARSA updates from real experien…

Lesson 12: Q-Learning and Off-Policy Control

20 min
In this lesson, learners move from prediction to value-based control by learning how Q-learning estimates the optimal action-value function directly. The focus is on the Bellman optimality target, the…
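
The Q-learning target bootstraps from the best next-state action value, regardless of which action the behavior policy actually takes next. A minimal tabular sketch, with invented states and values:

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Off-policy Q-learning: the target uses max over next actions (Bellman optimality)."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

actions = ["left", "right"]
Q = {(s, a): 0.0 for s in ["s0", "s1"] for a in actions}
Q[("s1", "right")] = 1.0  # illustrative prior estimate
q_update(Q, "s0", "left", r=0.0, s_next="s1", actions=actions)
# target = 0 + 0.9 * max(0.0, 1.0) = 0.9, so Q[("s0","left")] becomes 0.09
```

The max in the target is what makes this off-policy: SARSA would instead use the action the policy actually selects in the next state.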

Training Reliable Agents

1 lesson

Lesson 13: Exploration, Exploitation, and Action Selection

19 min
This lesson explains the central action-selection problem in reinforcement learning: an agent must use what it already knows while still trying actions that may teach it something better. Learners wil…
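
The most common action-selection rule covered here, epsilon-greedy, fits in a few lines. A sketch with an illustrative epsilon and made-up action values:

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon explore uniformly; otherwise exploit the best-known action."""
    if random.random() < epsilon:
        return random.choice(list(q_values))   # explore: any action
    return max(q_values, key=q_values.get)     # exploit: highest estimated value

q = {"left": 0.2, "right": 0.7}
picks = [epsilon_greedy(q, epsilon=0.1) for _ in range(1000)]
# "right" dominates: chosen on every greedy step plus about half of the exploratory ones
```

Even a small epsilon guarantees that every action keeps getting tried, which is what lets value estimates for currently unfavored actions improve.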

Scaling Reinforcement Learning

2 lessons

Lesson 14: Function Approximation for Large State Spaces

22 min
This lesson explains how reinforcement learning scales beyond small tabular environments by replacing lookup tables with function approximation. Learners will see how features, linear models, neural …


Lesson 15: Deep Q-Networks and Stabilizing Deep RL

24 min
This lesson explains how Deep Q-Networks use neural networks to approximate action-value functions when tabular Q-learning no longer fits the state space. Learners connect the Bellman target from earl…

Policy Optimization

2 lessons

Lesson 16: Policy Gradient Methods

22 min
This lesson introduces policy gradient methods, a family of reinforcement learning algorithms that optimize a parameterized policy directly instead of learning only a value function and deriving acti…
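
As a minimal illustration of optimizing a parameterized policy directly, here is a REINFORCE-style update for a two-armed bandit with a softmax policy. All numbers are illustrative, and real policy-gradient implementations compute gradients with automatic differentiation rather than by hand:

```python
import math
import random

theta = [0.0, 0.0]        # one preference parameter per arm
true_means = [0.2, 0.8]   # illustrative reward probabilities per arm
alpha = 0.1

def softmax(prefs):
    exps = [math.exp(p) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

random.seed(0)
for _ in range(2000):
    probs = softmax(theta)
    a = random.choices([0, 1], weights=probs)[0]
    r = 1.0 if random.random() < true_means[a] else 0.0
    # REINFORCE: grad of log pi(a) w.r.t. theta_i is 1[i == a] - pi(i); scale by reward
    for i in range(2):
        grad = (1.0 if i == a else 0.0) - probs[i]
        theta[i] += alpha * r * grad

probs = softmax(theta)
# after training, the policy should strongly prefer arm 1, the higher-reward arm
```

Because each update is an unbiased estimate of the gradient of expected reward, the policy drifts toward the better arm in expectation without ever computing a value function.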

Lesson 17: Actor-Critic Methods and Advantage Estimation

23 min
This lesson explains how actor-critic methods combine policy learning with value estimation to improve reinforcement learning updates. You will see why pure policy gradient methods can be noisy, how a…

Practical RL Workflows

3 lessons

Lesson 18: Reward Design, Evaluation, and Common Failure Modes

21 min
This lesson focuses on one of the most practical and error-prone parts of reinforcement learning: deciding what the agent should be rewarded for, how to evaluate whether learning is actually working, …

Lesson 19: Safety, Ethics, and Real-World Deployment Constraints

18 min
This lesson examines what changes when reinforcement learning leaves a controlled notebook and starts influencing real systems, users, money, equipment, or policy decisions. It focuses on practical de…

Lesson 20: Designing a Small End-to-End RL Experiment

25 min
In this lesson, students design a compact reinforcement learning experiment from start to finish: choosing a manageable environment, defining observations and actions, shaping a reward signal, selecti…
About Your Instructor
Professor Victoria Okafor

Professor Victoria Okafor guides this AI-built Virversity course with a clear, practical teaching style.