Columbia University - Decision Making and Reinforcement Learning
- Offered by Coursera
Decision Making and Reinforcement Learning at Coursera Overview
| Detail | Value |
| --- | --- |
| Duration | 47 hours |
| Start from | Start Now |
| Total fee | Free |
| Mode of learning | Online |
| Official Website | Explore Free Course |
| Credential | Certificate |
Decision Making and Reinforcement Learning at Coursera Highlights
- Earn a Certificate upon completion
- Flexible deadlines
- Coursera Labs
Decision Making and Reinforcement Learning at Coursera Course details
- This course is an introduction to sequential decision making and reinforcement learning
- We start with a discussion of utility theory to learn how preferences can be represented and modeled for decision making (see the expected-utility sketch after this list)
- We first model simple decision problems as multi-armed bandit problems and discuss several approaches to evaluative feedback
- The course then models decision problems as finite Markov decision processes (MDPs) and discusses their solutions via dynamic programming algorithms
- The course also touches on the notion of partial observability in real problems, modeled by POMDPs and solved by online planning methods
- Finally, we introduce the reinforcement learning problem and discuss two paradigms: Monte Carlo methods and temporal difference learning
- We conclude the course by noting how the two paradigms lie on a spectrum of n-step temporal difference methods
- An emphasis on algorithms and examples is a key part of this course
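As a taste of the week 1 material, here is a minimal sketch of a maximum-expected-utility decision; the actions, outcome probabilities, and utility values are made up for illustration and are not course material:

```python
# Illustrative only: choosing among actions by maximum expected utility.
# Actions, probabilities, and utilities below are hypothetical.

actions = {
    # action: list of (probability, utility) pairs over possible outcomes
    "take_umbrella": [(0.3, 60), (0.7, 80)],    # rain vs. no rain
    "leave_umbrella": [(0.3, 0), (0.7, 100)],
}

def expected_utility(outcomes):
    """Expected utility of one action: sum of probability * utility."""
    return sum(p * u for p, u in outcomes)

best = max(actions, key=lambda a: expected_utility(actions[a]))
for a, outcomes in actions.items():
    print(f"{a}: EU = {expected_utility(outcomes):.1f}")
print("Rational choice:", best)
```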
Decision Making and Reinforcement Learning at Coursera Curriculum
Decision Making and Utility Theory
Introduction to Decision Making and Reinforcement Learning
Course Logistics
1.1 Rational Agents and Utility Theory
1.2 Preferences and Axioms of Utility Theory
1.3 Uncertain and Multi-Attribute Utilities
1.4 Value of Perfect Information
Course Syllabus
About the Instructor
Academic Honesty Policy
Discussion Forum Etiquette
Pre-Course Survey
Week 1 Lesson Materials
Utility Theory
Bandit Problems
2.1 Multi-Armed Bandits and Action Values
2.2 ε-Greedy Action Selection
2.3 Upper Confidence Bound
Week 2 Lesson Materials
Multi-Armed Bandit Problems
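A minimal sketch of the ε-greedy action selection covered in lesson 2.2, assuming a simple k-armed bandit with made-up arm means (illustrative, not the course's implementation):

```python
import random

# ε-greedy action selection for a k-armed bandit.
# True arm means are hypothetical; the agent estimates them by sample averages.

random.seed(0)
true_means = [0.2, 0.5, 0.8]          # hidden mean reward of each arm
k = len(true_means)
q = [0.0] * k                          # action-value estimates Q(a)
n = [0] * k                            # pull counts N(a)
epsilon = 0.1

for t in range(10_000):
    if random.random() < epsilon:
        a = random.randrange(k)                     # explore: random arm
    else:
        a = max(range(k), key=lambda i: q[i])       # exploit: greedy arm
    reward = random.gauss(true_means[a], 1.0)       # noisy observed reward
    n[a] += 1
    q[a] += (reward - q[a]) / n[a]                  # incremental sample-average update

print("Estimates:", [round(v, 2) for v in q])
```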
Markov Decision Processes
3.1 Markov Decision Process Framework
3.2 Gridworld Example
3.3 Rewards, Utilities, and Discounting
3.4 Policies and Value Functions
3.5 Example: Mini-Gridworld
3.6 Bellman Optimality Equations
Week 3 Lesson Materials
Sequential Decision Problems
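For reference, the Bellman optimality equation from lesson 3.6 in standard notation, where γ is the discount factor from lesson 3.3:

```latex
% Bellman optimality equation for the optimal state-value function
V^*(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[R(s, a, s') + \gamma\, V^*(s')\bigr]
```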
Dynamic Programming
4.1 Time-Limited Values
4.2 Value Iteration
4.3 Value Iteration Implementation
4.4 Policy Iteration
4.5 Example: Mini-Gridworld
4.6 Algorithm Complexity
Week 4 Lesson Materials
Markov Decision Processes
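A minimal value iteration sketch in the spirit of lessons 4.2–4.3, run on a tiny hypothetical two-state MDP (not the course's gridworld):

```python
# Value iteration on a made-up MDP.
# P[s][a] lists (probability, next_state, reward) triples for taking a in s.

P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(0.9, 1, 1.0), (0.1, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "go": [(1.0, 0, 0.0)]},
}
gamma, theta = 0.9, 1e-8
V = {s: 0.0 for s in P}

while True:
    delta = 0.0
    for s in P:
        # Bellman optimality backup (in-place, Gauss-Seidel-style updates)
        v = max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]) for a in P[s])
        delta = max(delta, abs(v - V[s]))
        V[s] = v
    if delta < theta:          # stop once the value function has converged
        break

# Extract a greedy policy from the converged values
policy = {s: max(P[s], key=lambda a: sum(p * (r + gamma * V[s2])
                                         for p, s2, r in P[s][a])) for s in P}
print(V, policy)
```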
Partially Observable Markov Decision Processes
5.1 Partial Observability and POMDP
5.2 Belief States
5.3 Belief Transition Model
5.4 Policies and Value Functions
5.5 Example: Mini-Gridworld
Week 5 Lesson Materials
POMDPs
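A minimal sketch of the belief-state update from lessons 5.2–5.3, assuming made-up two-state transition and observation models:

```python
# Bayes-filter belief update for a two-state POMDP (models are hypothetical).

T = {"a": [[0.7, 0.3], [0.2, 0.8]]}   # T[action][s][s'] = P(s' | s, a)
O = {"obs1": [0.9, 0.4]}              # O[obs][s']      = P(obs | s')

def belief_update(b, action, obs):
    """b'(s') ∝ P(obs | s') * sum_s P(s' | s, a) * b(s)."""
    n = len(b)
    predicted = [sum(T[action][s][s2] * b[s] for s in range(n)) for s2 in range(n)]
    unnorm = [O[obs][s2] * predicted[s2] for s2 in range(n)]
    z = sum(unnorm)                   # normalizer P(obs | b, a)
    return [x / z for x in unnorm]

b = [0.5, 0.5]                        # uniform initial belief
b = belief_update(b, "a", "obs1")
print([round(x, 3) for x in b])
```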
Monte Carlo Methods
6.1 Monte Carlo Methods
6.2 First-Visit MC Prediction
6.3 State-Action Values
6.4 ε-Greedy On-Policy MC Control
6.5 On and Off-Policy MC Control
6.6 Example: Mini-Gridworld
Week 6 Lesson Materials
Monte Carlo RL
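A minimal first-visit Monte Carlo prediction sketch in the spirit of lesson 6.2, using a made-up random-walk environment:

```python
import random

# First-visit MC prediction. A fixed random policy walks states 0..3;
# each step yields reward -1; state 3 is terminal. All details are illustrative.

random.seed(0)
gamma = 1.0

def generate_episode():
    s, episode = 0, []
    while s != 3:
        episode.append((s, -1.0))
        s = max(0, s + random.choice([-1, 1]))   # random walk, reflecting at 0
    return episode

returns = {s: [] for s in range(3)}
for _ in range(5000):
    episode = generate_episode()
    first_index = {}
    for i, (s, _) in enumerate(episode):
        first_index.setdefault(s, i)             # earliest visit of each state
    g = 0.0
    for i in range(len(episode) - 1, -1, -1):    # accumulate returns backwards
        s, r = episode[i]
        g = r + gamma * g
        if first_index[s] == i:                  # first-visit check
            returns[s].append(g)

V = {s: round(sum(rs) / len(rs), 2) for s, rs in returns.items() if rs}
print(V)   # estimated state values under the random policy
```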
Temporal-Difference Learning
7.1 Temporal Difference Learning
7.2 Temporal Difference Prediction
7.3 Batch Updating
7.4 TD Learning for Control
7.5 SARSA vs Q-Learning
Week 7 Lesson Materials
Temporal Difference Learning
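A minimal sketch contrasting the SARSA and Q-learning updates from lesson 7.5; all states, actions, and values here are hypothetical placeholders:

```python
from collections import defaultdict

alpha, gamma = 0.1, 0.99

def sarsa_update(Q, s, a, r, s2, a2):
    """On-policy: bootstrap from the action a2 the agent actually takes next."""
    Q[(s, a)] += alpha * (r + gamma * Q[(s2, a2)] - Q[(s, a)])

def q_learning_update(Q, s, a, r, s2, actions):
    """Off-policy: bootstrap from the greedy action in s2, whatever is taken."""
    best = max(Q[(s2, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])

# Tiny usage example with placeholder states and actions:
Q = defaultdict(float)
sarsa_update(Q, 0, "left", -1.0, 1, "right")
q_learning_update(Q, 0, "left", -1.0, 1, ["left", "right"])
print(dict(Q))
```

The single line that differs between the two updates is the bootstrap target, which is exactly why SARSA is on-policy and Q-learning is off-policy.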
Reinforcement Learning - Generalization
8.1 n-step Temporal Difference Prediction
8.2 n-step SARSA
8.3 Model-Based Methods
8.4 Function Approximation
Week 8 Lesson Materials
Post-Course Survey
Generalization of Tabular Methods
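Finally, a minimal sketch of the n-step TD value update from lesson 8.1, which interpolates between TD(0) (n = 1) and Monte Carlo (n = episode length); all symbols are placeholders:

```python
from collections import deque

def n_step_td_update(V, buffer, alpha, gamma, bootstrap_state=None):
    """Apply one n-step TD update for the oldest transition in `buffer`.

    `buffer` holds the last n (state, reward) transitions in order.
    If bootstrap_state is None, the episode has ended (no bootstrapping).
    """
    s0 = buffer[0][0]
    # n-step return: G = r_1 + γ r_2 + ... + γ^{n-1} r_n + γ^n V(s_{t+n})
    g = 0.0
    for i, (_, r) in enumerate(buffer):
        g += (gamma ** i) * r
    if bootstrap_state is not None:
        g += (gamma ** len(buffer)) * V[bootstrap_state]
    V[s0] += alpha * (g - V[s0])

# Tiny usage example: a two-step update on a made-up trajectory.
V = {0: 0.0, 1: 0.0, 2: 0.0}
buf = deque([(0, -1.0), (1, -1.0)], maxlen=2)
n_step_td_update(V, buf, alpha=0.5, gamma=0.9, bootstrap_state=2)
print(V)
```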