Reinforcement Learning Fundamentals

Build practical intuition for agents, rewards, policies, value functions, and modern RL methods

Quick Course Facts

  • 20 self-paced online lessons
  • 20 videos and/or narrated presentations
  • 6.9 approximate hours of course media

About the Reinforcement Learning Fundamentals Course

Reinforcement Learning Fundamentals is an online course that introduces how learning agents make decisions, improve through feedback, and solve sequential problems in Artificial Intelligence. You will build practical intuition for agents, rewards, policies, value functions, and modern RL methods while learning how to reason about real-world reinforcement learning workflows.

Build Practical Reinforcement Learning Skills For Artificial Intelligence

  • Learn when reinforcement learning applies and how agents interact with environments through states, actions, and rewards.
  • Develop a clear understanding of Markov Decision Processes, returns, discounting, policies, and value functions.
  • Study core methods including dynamic programming, Monte Carlo learning, temporal-difference learning, SARSA, and Q-learning.
  • Explore modern RL methods such as Deep Q-Networks, policy gradients, and actor-critic approaches, along with reward design, evaluation, safety, and deployment constraints.

Reinforcement Learning Fundamentals teaches the concepts, mathematics, and practical workflows behind decision-making agents in Artificial Intelligence.

This course begins with the foundations of learning agents, showing what reinforcement learning is, when it is useful, and how the agent-environment loop drives learning. You will examine states, actions, rewards, and sequential decisions so you can understand how Artificial Intelligence systems learn from experience instead of relying only on fixed instructions.

From there, you will formalize reinforcement learning problems using Markov Decision Processes and study returns, discounting, long-term value, policies, value functions, and action-value functions. These lessons build intuition for each concept without losing sight of how the math connects to implementation choices.

The course then moves into the Bellman equations and the essential reinforcement learning algorithms built on them, including dynamic programming for known environments, Monte Carlo learning, temporal-difference learning, SARSA, and Q-learning. You will also learn how exploration and exploitation affect training, why action selection matters, and how value-based control methods guide agents toward better decisions.

In the later lessons, Reinforcement Learning Fundamentals introduces larger-scale and more modern RL methods, including function approximation, Deep Q-Networks, policy gradient methods, actor-critic methods, and advantage estimation. You will also study reward design, evaluation, common failure modes, safety, ethics, and real-world deployment constraints before designing a small end-to-end RL experiment. By the end of the course, you will be able to think clearly about reinforcement learning problems, compare major RL approaches, and approach Artificial Intelligence agent design with stronger technical judgment.

Course Lessons

Full lesson breakdown

Lessons are organized by topic area, and each includes a short description.

Foundations of Learning Agents

3 lessons

This lesson introduces reinforcement learning as a way to train decision-making agents through interaction, feedback, and delayed consequences. You will learn the core idea behind an agent acting in a…
This lesson introduces the basic vocabulary of reinforcement learning: agents, environments, states, actions, and rewards. Learners will see how these pieces form a repeated interaction loop wher…
In this lesson, Professor Victoria Okafor introduces reinforcement learning as a framework for making sequential decisions: choices whose consequences unfold over time rather than ending immediately …
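
To make the interaction loop concrete, here is a minimal Python sketch of an agent acting in an environment. The corridor environment, its reward values, and the names `CorridorEnv`, `run_episode`, `random_policy`, and `always_right` are illustrative assumptions, not material taken from the course.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: walk right along a corridor to reach the goal."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # the state is simply the current cell index

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clipped to the corridor
        self.position = max(0, min(self.length - 1, self.position + action))
        done = self.position == self.length - 1
        reward = 1.0 if done else -0.1  # small cost per step, bonus at the goal
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode of the agent-environment loop and return the total reward."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                  # the agent chooses an action
        state, reward, done = env.step(action)  # the environment responds
        total_reward += reward
        if done:
            break
    return total_reward


def random_policy(state):
    return random.choice([-1, 1])


def always_right(state):
    return 1
```

Calling `run_episode(CorridorEnv(), always_right)` walks straight to the goal, while `random_policy` usually takes longer and collects more step penalties along the way.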

Modeling RL Problems

3 lessons

This lesson formalizes reinforcement learning problems as Markov Decision Processes, or MDPs. Learners define the core pieces of an MDP: states, actions, transition dynamics, rewards, discounting, hor…
This lesson explains how reinforcement learning agents evaluate outcomes that unfold over time. Learners will distinguish immediate rewards from returns, compute discounted returns, and interpret the …
This lesson introduces the three core objects used to describe decision-making in reinforcement learning: policies, state-value functions, and action-value functions. Learners will see how a policy…
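
As a small illustration of the difference between immediate rewards and returns, the sketch below computes a discounted return from a list of rewards. The function name `discounted_return` and the sample rewards are assumptions made for this example.

```python
def discounted_return(rewards, gamma):
    """Return r_0 + gamma * r_1 + gamma^2 * r_2 + ... for one reward sequence."""
    g = 0.0
    # Work backwards so each step applies G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


print(discounted_return([1, 1, 1], gamma=0.5))  # 1.75
```

A discount factor closer to 1 makes the agent weigh distant rewards almost as heavily as immediate ones, while a smaller value makes it more short-sighted.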

Core RL Mathematics

2 lessons

This lesson introduces Bellman equations as the mathematical link between immediate reward, future value, and decision quality in reinforcement learning. Learners will see how value functions can be d…
Dynamic programming is the family of reinforcement learning methods used when the environment model is known: the transition probabilities and rewards are available. In this lesson, Professor Victoria…
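
When the transition probabilities and rewards are known, a Bellman optimality backup can be applied repeatedly until the values stop changing. The sketch below shows value iteration, one common dynamic programming method, on a hypothetical two-state MDP. The data layout (`transitions[s][a]` as a list of `(probability, next_state, reward)` tuples) and the example MDP are assumptions chosen for illustration.

```python
def value_iteration(transitions, gamma=0.9, tol=1e-8):
    """Estimate optimal state values for a known MDP.

    transitions[s][a] is a list of (probability, next_state, reward) tuples.
    A state with no actions is treated as terminal with value 0.
    """
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:  # terminal state: nothing to back up
                continue
            # Bellman optimality backup: best expected reward plus discounted value
            best = max(
                sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


# Hypothetical MDP: "safe" ends the episode with reward 1; "risky" either
# ends it with reward 3 or returns to the start with reward 0.
mdp = {
    "start": {
        "safe": [(1.0, "end", 1.0)],
        "risky": [(0.5, "end", 3.0), (0.5, "start", 0.0)],
    },
    "end": {},
}

values = value_iteration(mdp)
```

In this example the risky action has the higher long-run value, so the value of `start` converges to the solution of V = 1.5 + 0.45 V.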

Learning from Experience

2 lessons

Monte Carlo learning estimates value functions directly from completed episodes. Instead of requiring a model of transition probabilities or bootstrapping from another estimate, it waits until an epis…
This lesson introduces temporal-difference learning as the bridge between Monte Carlo learning and dynamic programming. Learners see how an agent can update value estimates after each step by combinin…
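
The step-by-step update described above can be sketched as a single TD(0) function. Unlike a Monte Carlo update, it does not wait for the episode to finish: it bootstraps from the current estimate of the next state. The function name `td0_update`, the dictionary-based value table, and the default step size are assumptions for this sketch.

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.9, done=False):
    """Apply one TD(0) update to values[state] and return the TD error."""
    # The target combines the observed reward with the estimated value of the
    # next state; a terminal transition contributes no future value.
    target = reward + (0.0 if done else gamma * values[next_state])
    td_error = target - values[state]
    values[state] += alpha * td_error  # move the estimate toward the target
    return td_error


values = {"a": 0.0, "b": 1.0}
error = td0_update(values, "a", reward=0.5, next_state="b", alpha=0.5)
```

Here the estimate for state `a` moves halfway toward the target of 0.5 + 0.9 × 1.0.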

Value-Based Control

2 lessons

In this lesson, students learn SARSA as a practical on-policy control method for estimating action values while improving an epsilon-greedy policy. The focus is on how SARSA updates from real experien…
In this lesson, learners move from prediction to value-based control by learning how Q-learning estimates the optimal action-value function directly. The focus is on the Bellman optimality target, the…
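
The difference between the two methods in this section comes down to the update target. SARSA uses the value of the action the policy actually takes next, while Q-learning uses the best available next action. The sketch below assumes a nested-dictionary Q-table, and the helper names are hypothetical.

```python
def sarsa_target(q, reward, next_state, next_action, gamma):
    """On-policy target: uses the next action actually selected by the policy."""
    return reward + gamma * q[next_state][next_action]


def q_learning_target(q, reward, next_state, gamma):
    """Off-policy target: uses the greedy (maximum-value) next action."""
    return reward + gamma * max(q[next_state].values())


def update(q, state, action, target, alpha):
    """Move Q(state, action) a fraction alpha of the way toward the target."""
    q[state][action] += alpha * (target - q[state][action])


# Hypothetical Q-table with two states and two actions each.
q = {
    "s": {"left": 0.0, "right": 0.0},
    "s2": {"left": 1.0, "right": 2.0},
}
on_policy = sarsa_target(q, reward=1.0, next_state="s2", next_action="left", gamma=0.5)
off_policy = q_learning_target(q, reward=1.0, next_state="s2", gamma=0.5)
update(q, "s", "right", off_policy, alpha=0.5)
```

When the next action chosen by the policy happens to be the greedy one, the two targets coincide. They differ only on exploratory steps.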

Training Reliable Agents

1 lesson

This lesson explains the central action-selection problem in reinforcement learning: an agent must use what it already knows while still trying actions that may teach it something better. Learners wil…
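
One simple way to balance using current knowledge with trying new actions is epsilon-greedy selection, which appears earlier in the SARSA lesson description. The sketch below is a minimal version. The function signature and the optional `rng` argument are assumptions added so the behavior can be made reproducible.

```python
import random


def epsilon_greedy(action_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the best-valued one.

    action_values maps each action to its current estimated value.
    """
    if rng.random() < epsilon:
        return rng.choice(list(action_values))        # explore
    return max(action_values, key=action_values.get)  # exploit


estimates = {"left": 0.2, "right": 0.8}
greedy_choice = epsilon_greedy(estimates, epsilon=0.0)
```

With `epsilon=0.0` the agent always exploits, and with `epsilon=1.0` it always explores. Many training setups start with a larger value and reduce it over time.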

Scaling Reinforcement Learning

2 lessons

This lesson explains how reinforcement learning scales beyond small tabular environments by replacing lookup tables with function approximation. Learners will see how features, linear models, neural …
This lesson explains how Deep Q-Networks use neural networks to approximate action-value functions when tabular Q-learning no longer fits the state space. Learners connect the Bellman target from earl…
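
As a rough sketch of what replacing the lookup table looks like, the example below stores one weight vector per action and applies a semi-gradient Q-learning update to a linear model. The function names, the feature vectors, and the step size are illustrative assumptions. A Deep Q-Network uses the same kind of Bellman target with a neural network in place of the linear model, plus additions such as experience replay and a target network that this sketch omits.

```python
def q_value(weights, features, action):
    """Linear action value: dot product of the action's weights and the features."""
    return sum(w * x for w, x in zip(weights[action], features))


def semi_gradient_q_update(weights, features, action, reward, next_features,
                           done, alpha=0.1, gamma=0.9):
    """Apply one semi-gradient Q-learning update and return the TD error."""
    if done:
        next_best = 0.0
    else:
        next_best = max(q_value(weights, next_features, a)
                        for a in range(len(weights)))
    td_error = reward + gamma * next_best - q_value(weights, features, action)
    # For a linear model, the gradient of q with respect to the chosen action's
    # weights is the feature vector itself.
    weights[action] = [w + alpha * td_error * x
                       for w, x in zip(weights[action], features)]
    return td_error


# Two actions, two features, all weights initialized to zero.
weights = [[0.0, 0.0], [0.0, 0.0]]
error = semi_gradient_q_update(weights, [1.0, 2.0], action=1, reward=1.0,
                               next_features=[0.0, 0.0], done=True, alpha=0.5)
```

Because the weights are shared across every state with similar features, one update changes the estimates for many states at once, which is both the benefit and the source of instability in this setting.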

Policy Optimization

2 lessons

This lesson introduces policy gradient methods, a family of reinforcement learning algorithms that optimize a parameterized policy directly instead of learning only a value function and deriving acti…
This lesson explains how actor-critic methods combine policy learning with value estimation to improve reinforcement learning updates. You will see why pure policy gradient methods can be noisy, how a…
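
To show how a value estimate can feed into a policy update, the sketch below pairs a one-step advantage estimate (the critic's role) with a softmax policy update (the actor's role). It handles a single state with one preference per action. All names, the learning rate, and the numbers are assumptions for illustration only.

```python
import math


def softmax(preferences):
    """Convert action preferences into a probability distribution."""
    m = max(preferences)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]


def one_step_advantage(reward, value_s, value_next, gamma=0.9, done=False):
    """Estimate how much better the outcome was than the critic expected."""
    return reward + (0.0 if done else gamma * value_next) - value_s


def actor_update(preferences, action, advantage, lr=0.1):
    """Return new preferences after one policy gradient step for one state."""
    probs = softmax(preferences)
    # The gradient of log pi(action) with respect to preference i is
    # 1[i == action] - pi(i); it is scaled by the advantage.
    return [
        p + lr * advantage * ((1.0 if i == action else 0.0) - probs[i])
        for i, p in enumerate(preferences)
    ]


advantage = one_step_advantage(reward=1.0, value_s=0.5, value_next=0.0, done=True)
new_preferences = actor_update([0.0, 0.0], action=0, advantage=advantage, lr=1.0)
```

A positive advantage raises the probability of the action that was taken, and a negative one lowers it. Subtracting the critic's estimate is one way to reduce the noise in the update.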

Practical RL Workflows

3 lessons

This lesson focuses on one of the most practical and error-prone parts of reinforcement learning: deciding what the agent should be rewarded for, how to evaluate whether learning is actually working, …
This lesson examines what changes when reinforcement learning leaves a controlled notebook and starts influencing real systems, users, money, equipment, or policy decisions. It focuses on practical de…
In this lesson, students design a compact reinforcement learning experiment from start to finish: choosing a manageable environment, defining observations and actions, shaping a reward signal, selecti…

About Your Instructor
Professor Victoria Okafor

Professor Victoria Okafor guides this AI-built Virversity course with a clear, practical teaching style.