Fundamentals of Reinforcement Learning Training Course


Reinforcement Learning (RL) is a machine learning technique in which a computer program (agent) learns to behave in an environment by performing actions and receiving feedback on their results. The agent receives positive feedback (a reward) for each good action and negative feedback (a penalty) for each bad one.
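The feedback loop described above can be sketched in a few lines of Python. This is a toy illustration, not course material: the environment, actions, and rewards are all hypothetical, and the "agent" simply tracks the average reward each action has earned.

```python
import random

def step(action):
    """Toy environment (hypothetical): action 1 is 'good' (+1), action 0 is 'bad' (-1)."""
    return 1 if action == 1 else -1

def run_episode(n_steps=100, seed=0):
    rng = random.Random(seed)
    totals = {0: 0.0, 1: 0.0}
    counts = {0: 0, 1: 0}
    for _ in range(n_steps):
        action = rng.choice([0, 1])   # explore: pick an action at random
        reward = step(action)         # environment feedback (reward or penalty)
        totals[action] += reward
        counts[action] += 1
    # average reward per action, i.e. the learned "value" of each action
    return {a: totals[a] / max(counts[a], 1) for a in totals}

values = run_episode()
best_action = max(values, key=values.get)
```

After enough steps, the averages reveal which action the feedback favors, without any labeled training data.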

This instructor-led, live training (online or onsite) is aimed at data scientists who wish to go beyond traditional machine learning approaches and teach a computer program to solve problems without relying on labeled data or big data sets.

By the end of this training, participants will be able to:

  • Install and apply the libraries and programming language needed to implement Reinforcement Learning.
  • Create a software agent that is capable of learning through feedback instead of through supervised learning.
  • Program an agent to solve problems where decision making is sequential and finite.
  • Apply knowledge to design software that can learn in a way similar to how humans learn.

Format of the Course

  • Interactive lecture and discussion.
  • Lots of exercises and practice.
  • Hands-on implementation in a live-lab environment.

Course Customization Options

  • To request a customized training for this course, please contact us to arrange it.


Requirements

  • Experience with machine learning
  • Programming experience


Audience

  • Data scientists

Course Outline


Introduction

  • Learning through positive reinforcement

Elements of Reinforcement Learning

Important Terms (Actions, States, Rewards, Policy, Value, Q-Value, etc.)
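These terms map naturally onto simple data structures. Below is an illustrative sketch (all states, actions, and numbers are hypothetical): a Q-table stores an estimated value for every (state, action) pair, and a greedy policy picks the action with the highest Q-value in each state.

```python
states = ["start", "middle", "goal"]
actions = ["left", "right"]

# Q-value: estimated return for taking `action` in `state` (hypothetical numbers)
Q = {
    ("start", "left"): 0.0, ("start", "right"): 0.5,
    ("middle", "left"): 0.1, ("middle", "right"): 0.9,
}

def greedy_policy(state):
    """A policy maps states to actions; this one is greedy with respect to Q."""
    return max(actions, key=lambda a: Q.get((state, a), 0.0))
```

Here the value of a state under the greedy policy is simply the largest Q-value available in that state.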

Overview of Tabular Solution Methods

Creating a Software Agent

Understanding Value-based, Policy-based, and Model-based Approaches

Working with the Markov Decision Process (MDP)
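An MDP can be sketched as nested dictionaries (states, actions, and probabilities below are hypothetical). Each entry `transitions[s][a]` lists `(probability, next_state, reward)` triples; the Markov property means these depend only on the current state and action, not on the history before them.

```python
transitions = {
    "s0": {"go": [(0.8, "s1", 1.0), (0.2, "s0", 0.0)]},
    "s1": {"go": [(1.0, "s1", 0.0)]},
}

def expected_reward(state, action):
    """One-step expected reward under the MDP dynamics."""
    return sum(p * r for p, _, r in transitions[state][action])
```

For example, taking "go" in "s0" yields reward 1.0 with probability 0.8 and 0.0 otherwise, so its one-step expected reward is 0.8.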

How Policies Define an Agent’s Way of Behaving

Using Monte Carlo Methods

Temporal-Difference Learning
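Unlike Monte Carlo, temporal-difference learning updates an estimate after every step, bootstrapping from the next state's current estimate. A sketch of the tabular TD(0) update with hypothetical step-size and discount values:

```python
ALPHA, GAMMA = 0.1, 0.9  # step size and discount (hypothetical choices)

def td0_update(v_s, reward, v_next):
    """V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))."""
    target = reward + GAMMA * v_next   # one-step bootstrapped target
    return v_s + ALPHA * (target - v_s)

# one update from V(s)=0 with r=1 and V(s')=0 moves V(s) to 0.1
```

The quantity `target - v_s` is the TD error; repeated updates move the estimate a fraction `ALPHA` of the way toward each new target.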

n-step Bootstrapping

Approximate Solution Methods

On-policy Prediction with Approximation
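When the state space is too large for a table, the value function is approximated, e.g. linearly as a dot product of a weight vector and a feature vector. A minimal sketch of one semi-gradient TD(0) update under that linear assumption (features and constants are hypothetical):

```python
ALPHA, GAMMA = 0.1, 0.9  # step size and discount (hypothetical)

def semi_gradient_td0(w, x_s, reward, x_next):
    """One linear semi-gradient TD(0) update: v(s) is approximated as w . x(s)."""
    v_s = sum(wi * xi for wi, xi in zip(w, x_s))
    v_next = sum(wi * xi for wi, xi in zip(w, x_next))
    delta = reward + GAMMA * v_next - v_s     # TD error
    # "semi-gradient": the bootstrapped target is treated as a constant,
    # so the gradient of the linear approximation is just the features x(s)
    return [wi + ALPHA * delta * xi for wi, xi in zip(w, x_s)]
```

Only the weights are learned; states sharing features generalize to one another automatically.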

On-policy Control with Approximation

Off-policy Methods with Approximation

Understanding Eligibility Traces
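Eligibility traces let a TD error update not just the current state but recently visited ones too. A sketch of an accumulating trace (constants hypothetical): each visited state's trace is bumped by 1, and every trace decays by gamma * lambda per step.

```python
GAMMA, LAMBDA = 0.9, 0.8  # discount and trace-decay parameters (hypothetical)

def decay_traces(traces, visited_state):
    """Decay all traces, then bump the trace of the state just visited."""
    traces = {s: GAMMA * LAMBDA * e for s, e in traces.items()}
    traces[visited_state] = traces.get(visited_state, 0.0) + 1.0
    return traces
```

A state visited several steps ago still carries a small trace, so it receives a correspondingly small share of each new TD error.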

Using Policy Gradient Methods
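Policy gradient methods adjust the policy's parameters directly in the direction that makes rewarded actions more likely. A minimal REINFORCE-style sketch on a hypothetical two-armed bandit with a softmax policy (all constants are illustrative):

```python
import math, random

def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    s = sum(exps)
    return [e / s for e in exps]

def train(n_steps=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    prefs = [0.0, 0.0]                   # one preference parameter per arm
    for _ in range(n_steps):
        probs = softmax(prefs)
        a = 0 if rng.random() < probs[0] else 1
        reward = 1.0 if a == 1 else 0.0  # hypothetical bandit: arm 1 always pays
        # gradient of log pi(a) w.r.t. pref[k] is (1 if k == a else 0) - probs[k]
        for k in range(2):
            grad = (1.0 if k == a else 0.0) - probs[k]
            prefs[k] += lr * reward * grad
    return softmax(prefs)
```

Because only the rewarded arm triggers updates, its preference grows and the policy converges toward always choosing it.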

Summary and Conclusion
