Reinforcement Learning – A Complete Guide

Updated on Dec 22, 2021 10:42 IST

Over the past few years, as fields such as healthcare, robotics, and games have advanced, reinforcement learning has emerged as a potent technique for solving problems that require a sequence of intelligent decisions based on rewards and the local environment.

What is Reinforcement Learning (RL)?

Reinforcement Learning is a branch of machine learning built on a reward-based system, i.e. an agent attempts to maximize the reward it collects from its local environment.

It comprises four necessary components –

Agent – The algorithm that is trained and makes the decisions. For example, PacMan in the PacMan video game can be considered an agent.
Environment – The virtual or real setting in which the agent operates. For example, the maze with its enemies and points in the PacMan video game can be regarded as an environment.
Action – Any move that the agent can make in the given environment. For example, moving up, down, left, or right in a game.
Reward – Feedback from the environment on a move made by the agent; it can be negative or positive.

A basic reinforcement-learning system is an interaction between an agent and a local environment E. At each step the agent observes the environment and chooses an action A to execute. The environment receives the action from the agent and sends back a new observation O and a scalar reward R, which the agent uses to choose its next action. To understand this, let’s take the example of the highly successful Snake game that shipped on early mobile phones.

In that game, the Snake (agent) collects information from the Game (environment) about the Fruit (reward) and earns the reward by taking the right actions, i.e. by moving towards the Fruit without touching the edges. Once a reward is collected, the Game (environment) gives the Snake (agent) new information about the next reward, and the game continues until the Snake touches an edge or there is no Fruit (reward) left to collect.
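
The loop described above can be sketched in a few lines of Python. This is only an illustrative toy, not the actual Snake game: the class `SnakeLikeEnv`, the function `run_episode`, the one-dimensional board, and the reward values of +1 and -1 are all assumptions made for the example.

```python
class SnakeLikeEnv:
    """Toy 1-D stand-in for Snake: the agent walks along a line toward a fruit."""

    def __init__(self, size=5, fruit=4):
        self.size = size        # valid positions are 0 .. size - 1
        self.fruit = fruit      # position of the fruit (the reward)
        self.position = 0

    def reset(self):
        """Start a new episode and return the first observation O."""
        self.position = 0
        return self.position

    def step(self, action):
        """Apply action A (-1 = left, +1 = right); return (O, R, done)."""
        self.position += action
        if self.position < 0 or self.position >= self.size:
            return self.position, -1.0, True   # touched an edge: negative reward
        if self.position == self.fruit:
            return self.position, 1.0, True    # reached the fruit: positive reward
        return self.position, 0.0, False       # nothing happened yet


def run_episode(env, policy, max_steps=50):
    """Let the agent (a policy function) interact with the environment."""
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation)                   # agent chooses A from O
        observation, reward, done = env.step(action)   # environment returns O and R
        total_reward += reward
        if done:
            break
    return total_reward


always_right = lambda observation: +1
always_left = lambda observation: -1
print(run_episode(SnakeLikeEnv(), always_right))
print(run_episode(SnakeLikeEnv(), always_left))
```

An agent that always moves right reaches the fruit, while one that always moves left touches the edge on its first step, so the two episodes end with opposite rewards.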

[Figure: Reinforcement learning based system]

Reinforcement learning can be implemented through the approaches given below –

Value-Based: In a value-based method, the focus is on maximizing a value function. The agent estimates the long-term return it can expect from the present state under its current strategy, and typically acts greedily with respect to those estimates.
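
As one possible illustration of a value-based method, here is a minimal sketch of the tabular Q-learning update. Q-learning is not named in this article; it is used here only as a common example. The function names, the learning rate `alpha`, and the discount factor `gamma` are assumptions chosen for the sketch.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.5, gamma=0.9):
    """Move Q(state, action) toward reward + gamma * max_a Q(next_state, a)."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


def greedy_action(q, state, actions):
    """Pick the action with the highest estimated value in this state."""
    return max(actions, key=lambda a: q[(state, a)])


q = defaultdict(float)            # all value estimates start at 0.0
actions = ["left", "right"]

# One observed transition: in state 0, moving right gave reward 1.0.
q_learning_update(q, state=0, action="right", reward=1.0,
                  next_state=1, actions=actions)
print(greedy_action(q, 0, actions))
```

After this single update the estimate for moving right in state 0 is higher than for moving left, so the greedy choice in that state becomes "right".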

Policy-Based: The primary goal of this method is to come up with a policy, i.e. a rule for choosing an action in every state, that yields the maximum reward in the future. Policies fall into two categories: deterministic (the same state always produces the same action) and stochastic (each action is chosen with some probability).
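
The difference between the two kinds of policy can be shown with a small sketch. The state numbering, the action names, and the probabilities below are hypothetical values picked for illustration only.

```python
import random


def deterministic_policy(state):
    """The same state always maps to the same action."""
    return "right" if state < 3 else "left"


def stochastic_policy(state, rng):
    """The action is sampled from a state-dependent probability distribution."""
    p_right = 0.9 if state < 3 else 0.1
    return "right" if rng.random() < p_right else "left"


rng = random.Random(0)   # seeded so that the run is repeatable
samples = [stochastic_policy(0, rng) for _ in range(1000)]

print(deterministic_policy(0))
print(samples.count("right"), "of 1000 samples chose 'right'")
```

The deterministic policy returns the same action for state 0 every time, whereas the stochastic policy chooses "right" in most samples but not all of them.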

Model-Based: In a model-based method, a virtual model of each particular environment is created, and the agent uses it to learn how to act in that environment.

Model-Free: The agent has no virtual model of the environment and learns directly from the experience it gathers by interacting with it.
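
To make the model-based idea concrete, here is a rough sketch in which the agent is handed a model of a tiny environment and plans over it without taking a single real step. The planning technique shown is value iteration, which this article does not name; the three-state `model` dictionary and its rewards are made up for the example. A model-free agent would have no such dictionary and would have to estimate values from transitions it actually experiences.

```python
# Hypothetical known model: model[state][action] -> (next_state, reward)
model = {
    0: {"left": (0, 0.0), "right": (1, 0.0)},
    1: {"left": (0, 0.0), "right": (2, 1.0)},
    2: {"left": (2, 0.0), "right": (2, 0.0)},   # end state: no further reward
}


def value_iteration(model, gamma=0.9, sweeps=50):
    """Plan with the model: repeatedly back up the best one-step outcome."""
    values = {state: 0.0 for state in model}
    for _ in range(sweeps):
        for state in model:
            values[state] = max(
                reward + gamma * values[next_state]
                for next_state, reward in model[state].values()
            )
    return values


def plan_action(model, values, state, gamma=0.9):
    """Choose the action whose predicted outcome looks best under the model."""
    return max(
        model[state],
        key=lambda a: model[state][a][1] + gamma * values[model[state][a][0]],
    )


values = value_iteration(model)
print(values)
print(plan_action(model, values, 0))
```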

Actor-Critic: Uses a combination of policy and value functions: the policy (the actor) chooses the actions and the value function (the critic) evaluates them.

Comparison between Supervised, Unsupervised, and Reinforcement Learning 

No defined right answer – The goal of unsupervised learning is to search for relevant patterns or groups in the data. Likewise, in supervised learning, the objective is to predict or classify outcomes based on a given dataset. In RL, the agent can at best learn by trial and error. The only guidance is the reward for an action: a positive reward suggests the action was right and a negative reward suggests it was wrong, which indicates whether the agent is making progress.

Stable vs. Unstable – In supervised learning, the primary goal is to compute a quantity or assign a class; similarly, in unsupervised learning the aim is to find and learn the patterns existing in a dataset. Both can be considered stable, as there is a fixed, specified way of computing the outcome. In contrast, reinforcement learning develops a strategy that deduces the “right action” at every step, and that strategy keeps changing as the agent learns; in this sense, RL can be regarded as unstable.

Multi-Decision Process – Reinforcement learning reaches the outcome (maximum reward) through a sequence of many small decisions, whereas supervised and unsupervised learning are one-decision processes: one prediction for one instance.
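
One common way to score a whole sequence of decisions is to add up the rewards it earns, with later rewards counted slightly less. The sketch below assumes a discount factor `gamma`, which this article does not introduce; the function name and the reward lists are likewise made up for illustration.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum the rewards of one episode, weighting step t by gamma ** t."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Two episodes that earn the same single reward, one sooner than the other.
quick = discounted_return([0.0, 1.0])
slow = discounted_return([0.0, 0.0, 0.0, 1.0])
print(quick > slow)
```

Both episodes collect the same reward, but the one that gets there in fewer decisions scores higher, which is why the agent has to care about the whole sequence and not only the final step.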

Essential Exploration – Supervised and unsupervised learning calculate the result directly from the trained model without exploring further. Reinforcement learning, on the other hand, needs to strike a balance between exploring the environment for new ways to get rewards and making the most of the reward sources already discovered.
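
A simple and widely used way to strike this balance is the epsilon-greedy rule, sketched below. The rule is not mentioned in this article and is shown only as one possible approach; the function name, the value of `epsilon`, and the reward estimates are assumptions for the example.

```python
import random


def epsilon_greedy(estimates, epsilon, rng):
    """With probability epsilon explore a random option, otherwise exploit the best."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))                        # explore
    return max(range(len(estimates)), key=lambda i: estimates[i])   # exploit


rng = random.Random(0)
estimates = [0.2, 0.8, 0.5]   # hypothetical reward estimates for three options

choices = [epsilon_greedy(estimates, epsilon=0.1, rng=rng) for _ in range(1000)]
print(choices.count(1), "of 1000 choices picked the best-looking option")
```

With a small `epsilon` the agent mostly exploits the option that currently looks best, but it still tries the others now and then in case its estimates are wrong.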

Applications of Reinforcement Learning

Games – RL is well known as a leading technique for challenging human players and achieving phenomenal performance. For example, the famous program AlphaZero was trained to play chess by playing against itself over and over again, learning through reinforcement learning from the rules of the game alone, without any human help.

Healthcare – In healthcare, reinforcement learning is applied to dynamic treatment regimes (DTRs). Here, a patient's clinical evaluations and observations are taken as input and the output is a treatment plan for every stage. RL can improve long-term effects because it factors in the delayed outcomes of treatments.

Robotics – RL is used widely in robotics. Robots can be trained to grasp different objects without prior knowledge of them, and can therefore be used for making products on a factory assembly line.

Trading – Giant corporations in the financial industry make use of RL for financial trading. These companies first use supervised learning to predict future sales and stock prices. However, a prediction alone cannot decide optimally which stocks to buy, hold, or sell, so this is where RL comes into play, learning which decision to take.

Natural Language Processing (NLP) – RL is employed in different types of NLP tasks such as machine translation, question answering (as in chatbots), and text summarization. For instance, in text summarization, a combination of RL and supervised learning is used to generate a concise, readable summary from a long text.

Limitations of Reinforcement Learning

  • RL is suited to complex problems; for simple, uniform problems it is usually more than is needed.
  • Beyond a certain point, RL can produce diminished results due to an overload of states (observations).
  • RL requires a huge amount of data, which in turn means a large amount of computation.
  • RL assumes that the world is Markovian, i.e. in a sequence of possible events, the probability of each event depends only on the state reached in the previous event.
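
The Markov assumption in the last point can be written compactly as follows. The symbol S_t for the state at step t is introduced here for illustration and does not appear elsewhere in this article.

```latex
\[
P(S_{t+1} \mid S_t) = P(S_{t+1} \mid S_1, S_2, \dots, S_t)
\]
```

In words, once the current state is known, the earlier history adds no further information about what happens next.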
