Sample-based Learning Methods
- Offered by Coursera
Sample-based Learning Methods at Coursera Overview
| Particulars | Details |
| --- | --- |
| Duration | 22 hours |
| Start from | Start Now |
| Total fee | Free |
| Mode of learning | Online |
| Difficulty level | Intermediate |
| Official Website | Explore Free Course |
| Credential | Certificate |
Sample-based Learning Methods at Coursera Highlights
- Shareable Certificate: earn a certificate upon completion
- 100% online: start instantly and learn on your own schedule
- Course 2 of 4 in the Reinforcement Learning Specialization
- Flexible deadlines: reset deadlines in accordance with your schedule
- Intermediate level: probability and expectations, basic linear algebra, basic calculus, Python 3 (at least 1 year of experience), and implementing algorithms from pseudocode
- Approx. 22 hours to complete
- Taught in English, with subtitles in Arabic, French, Portuguese (European), Italian, Vietnamese, German, Russian, English, and Spanish
Sample-based Learning Methods at Coursera Course details
- In this course, you will learn about several algorithms that can learn near-optimal policies through trial-and-error interaction with the environment, that is, by learning from the agent's own experience. Learning from actual experience is striking because it requires no prior knowledge of the environment's dynamics, yet it can still attain optimal behavior. We will cover intuitively simple but powerful Monte Carlo methods and temporal-difference (TD) learning methods, including Q-learning. We will wrap up the course by investigating how to get the best of both worlds: algorithms that combine model-based planning (similar to dynamic programming) with temporal-difference updates to radically accelerate learning.
- By the end of this course, you will be able to:
  - Understand Temporal-Difference (TD) learning and Monte Carlo as two strategies for estimating value functions from sampled experience
  - Understand the importance of exploration when using sampled experience rather than dynamic programming sweeps within a model
  - Understand the connections among Monte Carlo, dynamic programming, and TD
  - Implement and apply the TD algorithm for estimating value functions (a minimal sketch follows this list)
  - Implement and apply Expected Sarsa and Q-learning (two TD methods for control)
  - Understand the difference between on-policy and off-policy control
  - Understand planning with simulated experience (as opposed to classic planning strategies)
  - Implement a model-based approach to RL, called Dyna, which uses simulated experience
  - Conduct an empirical study to see the improvements in sample efficiency when using Dyna
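The listing itself ships no code, but the TD objective above is easy to make concrete. Below is a minimal TD(0) prediction sketch in Python; the toy random-walk environment, constants, and variable names are illustrative assumptions, not course material:

```python
# TD(0) prediction on a toy 5-state random walk (illustrative sketch).
# True values for states 0..4 are 1/6, 2/6, 3/6, 4/6, 5/6.
import random

N_STATES = 5              # non-terminal states 0..4
ALPHA, GAMMA = 0.1, 1.0   # step size and discount (assumed values)

def step(s):
    """Move left or right with equal probability; reward 1 only when
    terminating off the right end. Returns (next_state, reward, done)."""
    s2 = s + random.choice([-1, 1])
    if s2 < 0:
        return None, 0.0, True
    if s2 >= N_STATES:
        return None, 1.0, True
    return s2, 0.0, False

V = [0.0] * N_STATES
for _ in range(5000):                       # episodes
    s, done = N_STATES // 2, False          # start in the middle
    while not done:
        s2, r, done = step(s)
        target = r if done else r + GAMMA * V[s2]
        V[s] += ALPHA * (target - V[s])     # the TD(0) update
        s = s2

print([round(v, 2) for v in V])             # roughly [0.17, 0.33, 0.5, 0.67, 0.83]
```

The same update, applied to an action-value table under an exploration policy, underlies the control methods (Sarsa, Q-learning, Expected Sarsa) covered later in the course.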
Sample-based Learning Methods at Coursera Curriculum
Welcome to the Course!
Course Introduction
Meet your instructors!
Reinforcement Learning Textbook
Read Me: Prerequisites and Learning Objectives
Monte Carlo Methods for Prediction & Control
What is Monte Carlo?
Using Monte Carlo for Prediction
Using Monte Carlo for Action Values
Using Monte Carlo Methods for Generalized Policy Iteration
Solving the Blackjack Example
Epsilon-soft Policies
Why does off-policy learning matter?
Importance Sampling
Off-Policy Monte Carlo Prediction
Emma Brunskill: Batch Reinforcement Learning
Week 1 Summary
Module 1 Learning Objectives
Weekly Reading
Chapter Summary
Graded Quiz
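The off-policy lessons in this module revolve around importance sampling: reweighting returns collected under a behavior policy so that they estimate values under a different target policy. A minimal sketch, in which the two policies, the one-step dynamics, and all numbers are invented for illustration rather than taken from the course:

```python
# Ordinary importance sampling for off-policy Monte Carlo prediction (sketch).
import random

ACTIONS = [0, 1]
pi = {0: 0.9, 1: 0.1}   # target policy probabilities (hypothetical)
b  = {0: 0.5, 1: 0.5}   # behavior policy that actually generates the data

def episode():
    """One-step toy episode: act under b, observe the return."""
    a = random.choices(ACTIONS, weights=[b[0], b[1]])[0]
    g = 1.0 if a == 0 else 0.0          # action 0 pays off (made-up dynamics)
    return a, g

total, n = 0.0, 100_000
for _ in range(n):
    a, g = episode()
    rho = pi[a] / b[a]                  # importance sampling ratio
    total += rho * g                    # reweight the observed return

print(total / n)                        # estimates E_pi[G]; here about 0.9
```

For multi-step episodes the ratio becomes a product of per-step ratios, which is exactly why its variance, and the weighted alternatives this module discusses, matter in practice.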
Temporal Difference Learning Methods for Prediction
What is Temporal Difference (TD) learning?
Rich Sutton: The Importance of TD Learning
The Advantages of Temporal Difference Learning
Comparing TD and Monte Carlo
Andy Barto and Rich Sutton: More on the History of RL
Week 2 Summary
Module 2 Learning Objectives
Weekly Reading
Practice Quiz
Temporal Difference Learning Methods for Control
Sarsa: GPI with TD
Sarsa in the Windy Grid World
What is Q-learning?
Q-learning in the Windy Grid World
How is Q-learning off-policy?
Expected Sarsa
Expected Sarsa in the Cliff World
Generality of Expected Sarsa
Week 3 Summary
Module 3 Learning Objectives
Weekly Reading
Chapter Summary
Practice Quiz
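The control methods in this module differ mainly in the target they bootstrap toward. A minimal sketch, assuming a tabular `Q` indexed as `Q[state][action]` and an epsilon-greedy policy (the names here are our own, not course code):

```python
# Q-learning vs. Expected Sarsa targets (illustrative sketch).

def q_learning_target(Q, s2, r, gamma):
    """Off-policy target: bootstrap from the greedy (max) action value."""
    return r + gamma * max(Q[s2])

def expected_sarsa_target(Q, s2, r, gamma, eps):
    """Target that takes the expectation over an epsilon-greedy policy."""
    n = len(Q[s2])
    greedy = max(range(n), key=lambda a: Q[s2][a])
    probs = [eps / n + (1.0 - eps if a == greedy else 0.0) for a in range(n)]
    return r + gamma * sum(p * q for p, q in zip(probs, Q[s2]))

def td_control_update(Q, s, a, target, alpha):
    """Shared update rule: move Q(s, a) toward the chosen target."""
    Q[s][a] += alpha * (target - Q[s][a])
```

Setting `eps = 0` makes the Expected Sarsa target coincide with the Q-learning target, which is one way to read the "Generality of Expected Sarsa" lesson above.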
Planning, Learning & Acting
What is a Model?
Comparing Sample and Distribution Models
Random Tabular Q-planning
The Dyna Architecture
The Dyna Algorithm
Dyna & Q-learning in a Simple Maze
What if the model is inaccurate?
In-depth with changing environments
Drew Bagnell: Self-Driving, Robotics, and Model-Based RL
Week 4 Summary
Congratulations!
Module 4 Learning Objectives
Weekly Reading
Chapter Summary
Textbook Part 1 Summary
Practice Assessment
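The Dyna architecture in this final module interleaves direct reinforcement learning with planning updates replayed from a learned model. A minimal tabular Dyna-Q sketch, assuming a deterministic environment; the function names, action set, and constants are illustrative choices, not course code:

```python
# Tabular Dyna-Q step (illustrative sketch).
import random
from collections import defaultdict

ACTIONS = (0, 1, 2, 3)          # e.g., the four moves in a grid world (assumed)

def max_q(Q, s):
    return max(Q[(s, a)] for a in ACTIONS)

def dyna_q_step(Q, model, s, a, r, s2, alpha=0.1, gamma=0.95, n_planning=10):
    # (a) direct RL: one Q-learning update from the real transition
    Q[(s, a)] += alpha * (r + gamma * max_q(Q, s2) - Q[(s, a)])
    # (b) model learning: remember the transition (deterministic world assumed)
    model[(s, a)] = (r, s2)
    # (c) planning: n simulated one-step updates from previously seen pairs
    for _ in range(n_planning):
        (ps, pa), (pr, ps2) = random.choice(list(model.items()))
        Q[(ps, pa)] += alpha * (pr + gamma * max_q(Q, ps2) - Q[(ps, pa)])

Q = defaultdict(float)          # action-value table
model = {}                      # (s, a) -> (r, s') learned model
```

The `n_planning` extra updates per real step are where Dyna's sample-efficiency gains, the subject of the module's empirical study, come from.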