Sample-based Learning Methods 

  • Offered by Coursera

Sample-based Learning Methods at Coursera: Overview

Duration: 22 hours
Start from: Start Now
Total fee: Free
Mode of learning: Online
Difficulty level: Intermediate
Official Website: Explore Free Course
Credential: Certificate

Sample-based Learning Methods at Coursera: Highlights

  • Shareable Certificate: earn a certificate upon completion
  • 100% online: start instantly and learn on your own schedule
  • Course 2 of 4 in the Reinforcement Learning Specialization
  • Flexible deadlines: reset deadlines in accordance with your schedule
  • Intermediate level: probabilities and expectations, basic linear algebra, basic calculus, Python 3.0 (at least 1 year), implementing algorithms from pseudocode
  • Approx. 22 hours to complete
  • Language: English. Subtitles: Arabic, French, Portuguese (European), Italian, Vietnamese, German, Russian, English, Spanish

Sample-based Learning Methods at Coursera: Course details

More about this course
  • In this course, you will learn about several algorithms that can learn near-optimal policies through trial-and-error interaction with the environment, that is, by learning from the agent's own experience. Learning from actual experience is striking because it requires no prior knowledge of the environment's dynamics, yet it can still attain optimal behavior. We will cover intuitively simple but powerful Monte Carlo methods and temporal difference learning methods, including Q-learning. We will wrap up the course by investigating how we can get the best of both worlds: algorithms that combine model-based planning (similar to dynamic programming) with temporal difference updates to radically accelerate learning.
  • By the end of this course, you will be able to:
  • - Understand Temporal-Difference learning and Monte Carlo as two strategies for estimating value functions from sampled experience
  • - Understand the importance of exploration when using sampled experience rather than dynamic programming sweeps within a model
  • - Understand the connections between Monte Carlo, Dynamic Programming, and TD
  • - Implement and apply the TD algorithm for estimating value functions
  • - Implement and apply Expected Sarsa and Q-learning (two TD methods for control)
  • - Understand the difference between on-policy and off-policy control
  • - Understand planning with simulated experience (as opposed to classic planning strategies)
  • - Implement a model-based approach to RL, called Dyna, which uses simulated experience
  • - Conduct an empirical study to see the improvements in sample efficiency when using Dyna
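To give a flavor of the kind of TD control method the objectives above describe, here is a minimal tabular Q-learning sketch on a toy chain MDP. This is an illustrative example only, not course material: the environment, states, and hyperparameters are assumptions chosen for brevity.

```python
import random
from collections import defaultdict

# Toy deterministic chain MDP: states 0..4, actions -1 (left) / +1 (right).
# Reaching state 4 yields reward 1 and ends the episode.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)

def step(state, action):
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)  # Q[(state, action)] -> action-value estimate
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy behavior policy
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda a_: Q[(s, a_)])
            s2, r, done = step(s, a)
            # off-policy TD update: bootstrap from the greedy (max) action
            target = r + (0.0 if done else gamma * max(Q[(s2, b)] for b in ACTIONS))
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q

Q = q_learning()
greedy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)]
print(greedy)  # the learned greedy policy should move right toward the goal
```

The update is off-policy because the bootstrap target uses the max over next actions (the greedy target policy), while actions are actually selected epsilon-greedily.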

Sample-based Learning Methods at Coursera: Curriculum

Welcome to the Course!

Course Introduction

Meet your instructors!

Reinforcement Learning Textbook

Read Me: Pre-requisites and Learning Objectives

What is Monte Carlo?

Using Monte Carlo for Prediction

Using Monte Carlo for Action Values

Using Monte Carlo methods for generalized policy iteration

Solving the Blackjack Example

Epsilon-soft policies

Why does off-policy learning matter?

Importance Sampling

Off-Policy Monte Carlo Prediction

Emma Brunskill: Batch Reinforcement Learning

Week 1 Summary

Module 1 Learning Objectives

Weekly Reading

Chapter Summary

Graded Quiz

Temporal Difference Learning Methods for Prediction

What is Temporal Difference (TD) learning?

Rich Sutton: The Importance of TD Learning

The advantages of temporal difference learning

Comparing TD and Monte Carlo

Andy Barto and Rich Sutton: More on the History of RL

Week 2 Summary

Module 2 Learning Objectives

Weekly Reading

Practice Quiz

Temporal Difference Learning Methods for Control

Sarsa: GPI with TD

Sarsa in the Windy Grid World

What is Q-learning?

Q-learning in the Windy Grid World

How is Q-learning off-policy?

Expected Sarsa

Expected Sarsa in the Cliff World

Generality of Expected Sarsa

Week 3 Summary

Module 3 Learning Objectives

Weekly Reading

Chapter Summary

Practice Quiz

Planning, Learning & Acting

What is a Model?

Comparing Sample and Distribution Models

Random Tabular Q-planning

The Dyna Architecture

The Dyna Algorithm

Dyna & Q-learning in a Simple Maze

What if the model is inaccurate?

In-depth with changing environments

Drew Bagnell: self-driving, robotics, and Model Based RL

Week 4 Summary

Congratulations!

Module 4 Learning Objectives

Weekly Reading

Chapter Summary

Text Book Part 1 Summary

Practice Assessment
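The final module's lessons on the Dyna architecture and algorithm can be sketched as a tabular Dyna-Q loop: act, learn directly from real experience, learn a model, then plan by replaying remembered transitions. This is an illustrative toy sketch, not course code; the corridor environment and all hyperparameters are assumptions.

```python
import random
from collections import defaultdict

# Toy deterministic corridor: states 0..5, actions -1 / +1; state 5 is the
# goal (reward 1, episode ends).
GOAL = 5
ACTIONS = (-1, +1)

def step(state, action):
    nxt = min(max(state + action, 0), GOAL)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def dyna_q(episodes=100, planning_steps=10, alpha=0.1, gamma=0.95,
           epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)  # action-value estimates
    model = {}              # model[(s, a)] = (r, s2, done): learned sample model
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.choice(ACTIONS) if rng.random() < epsilon else \
                max(ACTIONS, key=lambda a_: Q[(s, a_)])
            s2, r, done = step(s, a)
            # (a) direct RL: one-step Q-learning update from real experience
            tgt = r + (0.0 if done else gamma * max(Q[(s2, b)] for b in ACTIONS))
            Q[(s, a)] += alpha * (tgt - Q[(s, a)])
            # (b) model learning: remember the observed transition
            model[(s, a)] = (r, s2, done)
            # (c) planning: replay randomly chosen remembered transitions
            for _ in range(planning_steps):
                (ps, pa), (pr, ps2, pdone) = rng.choice(list(model.items()))
                ptgt = pr + (0.0 if pdone else
                             gamma * max(Q[(ps2, b)] for b in ACTIONS))
                Q[(ps, pa)] += alpha * (ptgt - Q[(ps, pa)])
            s = s2
    return Q

Q = dyna_q()
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)]
print(policy)
```

Each real step is followed by several simulated updates drawn from the model, which is why Dyna typically needs far fewer real environment interactions than plain Q-learning.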

Sample-based Learning Methods at Coursera: Admission Process

Important Dates

May 25, 2024: Course Commencement Date
