Machine Learning

Reinforcement Learning

What is Reinforcement Learning?

Reinforcement learning (RL) is a branch of machine learning in which intelligent agents learn through trial and error. Learning operates through a feedback process, and the feedback can be either positive or negative. Reinforcement learning takes place in an interactive environment: the agent acts, receives feedback from its actions and experiences, and aims to maximise the reward gained through this trial-and-error process. Essentially, this type of machine learning is concerned with learning from mistakes.

The following are the fundamental elements of an RL process (a minimal sketch showing how they fit together follows the list):

  • Environment
    The space in which the intelligent agent operates and with which it interacts.
  • Reward function
    The feedback signal the agent receives for its interactions; it defines the goal of the process.
  • State
    The current situation (state) of the intelligent agent within the environment.
  • Policy
    The learning strategy of the intelligent agent, i.e. how it maps states to actions.
  • Value
    The expected long-term reward, i.e. the cumulative future reward the agent would receive from a given state.
  • Actions
    The moves the intelligent agent can make within the environment.
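
To make these elements concrete, the following minimal sketch (in Python) shows how they fit together in a single interaction loop. The GridEnv environment and the random placeholder policy are made up for illustration, not taken from any particular library.

    import random

    class GridEnv:
        """A made-up toy environment: the agent starts at position 0 and must reach position 4."""
        def __init__(self):
            self.state = 0  # current state of the agent

        def step(self, action):
            # action: 0 = move left, 1 = move right
            self.state = max(0, min(4, self.state + (1 if action == 1 else -1)))
            reward = 1.0 if self.state == 4 else 0.0  # reward function: feedback for the move
            done = self.state == 4
            return self.state, reward, done

    def random_policy(state):
        """Placeholder policy: choose an action at random."""
        return random.choice([0, 1])

    env = GridEnv()
    state, total_reward, done = env.state, 0.0, False
    while not done:
        action = random_policy(state)            # the policy selects an action
        state, reward, done = env.step(action)   # the environment returns the next state and reward
        total_reward += reward                   # the agent's goal is to maximise this
    print("episode return:", total_reward)

A real RL algorithm would replace random_policy with a policy that is updated from the observed rewards.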

Types of RL

Reinforcement learning can be categorised into two types: positive and negative.

Positive reinforcement learning refers to adding a favourable stimulus (a reward) after a desired behaviour, which strengthens that behaviour and increases the agent's tendency to repeat it.

Negative reinforcement learning refers to removing or avoiding an unfavourable condition after a behaviour, which likewise strengthens the behaviour that led to the avoidance.

RL Implementations

There are three common approaches to implementing reinforcement learning. The following briefly describes each of them.

Model-based

In model-based RL, the intelligent agent learns or is given a model of the environment and uses this virtual model to simulate interactions and plan ahead. Each problem can be unique, and the agent learns within that specific environment and model representation.

Value-based

Value-based RL aims to learn and maximise (optimise) the value function, i.e. the expected long-term reward; the agent then acts by choosing the actions with the highest estimated value.
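
As a minimal sketch of this idea in Python (the Q-table values below are made up, not learned), a value-based agent simply acts greedily with respect to its estimated action values:

    # Hypothetical learned action values Q[(state, action)]; the numbers are illustrative only.
    Q = {
        ("s0", "left"): 0.1, ("s0", "right"): 0.7,
        ("s1", "left"): 0.4, ("s1", "right"): 0.2,
    }

    def greedy_action(state, actions=("left", "right")):
        """Value-based control: pick the action with the highest estimated value."""
        return max(actions, key=lambda a: Q[(state, a)])

    print(greedy_action("s0"))  # -> "right"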

Policy-based

Policy-based RL optimises the policy itself (either deterministic or stochastic) rather than a value function, aiming to maximise the reward obtained from the agent's actions.
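
A minimal sketch of a stochastic, parameterised policy in Python (the preference values are illustrative placeholders, not learned parameters): a softmax turns per-action preferences into action probabilities, and it is these parameters that a policy-based method adjusts to increase the expected reward.

    import math, random

    # Hypothetical per-action preferences (the policy's parameters) for a single state.
    preferences = {"left": 0.2, "right": 1.5}

    def softmax_policy(prefs):
        """Stochastic policy: convert preferences into action probabilities."""
        exps = {a: math.exp(p) for a, p in prefs.items()}
        total = sum(exps.values())
        return {a: e / total for a, e in exps.items()}

    probs = softmax_policy(preferences)                                    # e.g. {'left': 0.21..., 'right': 0.78...}
    action = random.choices(list(probs), weights=list(probs.values()))[0]  # sample an action
    print(probs, action)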

RL Framework

Two important concepts in reinforcement learning, both associated with dynamic programming, are the Markov Decision Process (MDP) and the Bellman equation. Dynamic programming solves problems by breaking them down into smaller subproblems and combining their solutions.

The Markov decision process (MDP) provides the mathematical framework for modelling decision making in reinforcement learning. Most reinforcement learning problems are defined as MDPs, typically in terms of states, actions, transition probabilities, rewards, and a discount factor.
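
As an illustrative sketch in Python (the states, probabilities, and rewards below are invented for the example), an MDP can be written down as plain data:

    import random

    # A made-up two-state MDP.
    states = ["s0", "s1"]
    actions = ["stay", "move"]
    gamma = 0.9  # discount factor

    # transition[(state, action)] -> list of (next_state, probability)
    transition = {
        ("s0", "stay"): [("s0", 1.0)],
        ("s0", "move"): [("s1", 0.8), ("s0", 0.2)],
        ("s1", "stay"): [("s1", 1.0)],
        ("s1", "move"): [("s0", 1.0)],
    }

    # reward[(state, action)] -> immediate reward
    reward = {
        ("s0", "stay"): 0.0, ("s0", "move"): 1.0,
        ("s1", "stay"): 2.0, ("s1", "move"): 0.0,
    }

    # Sample one transition from state "s0" under action "move".
    next_states, probs = zip(*transition[("s0", "move")])
    print(random.choices(next_states, weights=probs)[0])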

The Bellman equation is one of the main elements of RL. It provides a recursive optimisation equation for the value of decision making: the value of a particular state is expressed as the immediate reward plus the discounted value of the states that follow it.
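
In symbols, using the usual MDP notation (states s, actions a, transition probabilities P, reward R, and discount factor \gamma), the Bellman optimality equation for the state value can be written as

    V^{*}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\big[\, R(s, a, s') + \gamma\, V^{*}(s') \,\big]

The value of a state is the best achievable immediate reward plus the discounted value of the successor state, which is exactly the recursive, subproblem structure that dynamic programming exploits.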

RL Algorithms and Methods

The following are common RL algorithms and methods.

  • Q-Learning is a model-free algorithm that learns the value of taking an action in the agent’s current state. It is an off-policy algorithm, meaning the policy being evaluated and improved differs from the policy used to generate behaviour (a sketch of its update rule follows this list).
  • Deep Q-Networks (DQN) extend Q-learning by using a neural network to approximate the action-value function. DQN is also an off-policy algorithm and handles larger and more complex state spaces.
  • State-action-reward-state-action (SARSA) is an on-policy algorithm: the same policy is used both for acting and for updating. It differs from Q-learning in its update rule, which depends on the current state, current action, reward, next state, and the next action actually taken.
  • Monte Carlo methods learn from complete episodes of experience and adjust their estimates only after the final outcome is known; updates must wait until all interactions in an episode are finished.
  • Temporal Difference (TD) methods are model-free reinforcement learning methods that combine ideas from dynamic programming and Monte Carlo methods. TD updates its estimates at each step rather than waiting for the final outcome; it builds new estimates on top of existing ones, which is known as bootstrapping.
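
To make the off-policy/on-policy distinction concrete, here is a minimal sketch of the two tabular update rules in Python. The learning rate alpha and discount factor gamma are assumed hyperparameters, and Q is a plain dictionary of action values; none of this is tied to a specific library.

    # Tabular update rules; Q maps (state, action) -> estimated value.
    alpha, gamma = 0.1, 0.9  # assumed learning rate and discount factor

    def q_learning_update(Q, s, a, r, s_next, actions):
        """Off-policy: the target uses the best next action, regardless of what is done next."""
        target = r + gamma * max(Q.get((s_next, a2), 0.0) for a2 in actions)
        Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))

    def sarsa_update(Q, s, a, r, s_next, a_next):
        """On-policy: the target uses the next action actually chosen by the current policy."""
        target = r + gamma * Q.get((s_next, a_next), 0.0)
        Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))

    # Example: one update after observing (s="s0", a="right", r=1.0, s'="s1").
    Q = {}
    q_learning_update(Q, "s0", "right", 1.0, "s1", actions=["left", "right"])
    print(Q)

Both rules are temporal-difference updates: they adjust the current estimate towards a target built from the estimate of the next state, rather than waiting for the episode to end.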

There are many more reinforcement learning algorithms and methods; the ones above are only a small selection.

Applications of RL

  • Computer games (gaming)
  • Marketing and advertising
  • Robotics and industrial automation
  • Healthcare and disease diagnostics
  • Enterprise resource management
  • Self-driving cars
  • Stock price predictions in finance

Next: Machine Learning Algorithms