Applied Reinforcement Learning

Lecturer: Hao Shen
Assistant: Martin Gottwald
Target Audience: Elective, supplementary lecture (Master)
Scope: 2/2 SWS (weekly contact hours, Lecture/Tutorial)
Term: Summer
Registration phase: 15.02.2022 - 14.04.2022
Time & Place:
Lecture: 19.04.2022 - 22.04.2022 (4 days in total)
Online lecture, 9:00 - 17:00
Regular Question Session: Thursdays during the semester, 13:15 - 14:45
Extra Question Session: Thursdays during the semester, 10:00 - 12:00 (if requested)
First session in the semester: see TUMonline calendar


Despite the COVID-19 pandemic, the course will take place, in a purely online format.

Unfortunately, this means that you cannot use our (physical) robots. We replace them with a simulator and adapt the projects accordingly.


Reinforcement learning (RL) is one of the most powerful approaches to solving sequential decision-making problems. A reinforcement learning agent interacts with its environment and uses its experience to make decisions towards solving the problem. The technique has succeeded in various applications in operations research, robotics, game playing, network management, and computational intelligence.

This lecture provides an overview of basic concepts, practical techniques, and programming tools used in reinforcement learning. Specifically, it focuses on the applied aspects of the subject, such as problem solving and implementation. By design, it complements the theoretical treatment of the subject (mathematical derivations, convergence proofs, and bound analysis), which is covered in the lecture "Approximate Dynamic Programming and Reinforcement Learning" in winter semesters.

In this lecture, we will cover the following topics, among others:

  • Reinforcement learning problems as Markov decision processes
  • Dynamic programming (value iteration and policy iteration)
  • Monte Carlo reinforcement learning methods
  • Temporal difference learning (SARSA and Q-learning)
  • Simulation-based reinforcement learning algorithms
  • Linear value function approximation, e.g. tile coding
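
To give a rough impression of the temporal difference topic listed above, here is a minimal sketch of tabular Q-learning. The five-state corridor environment, the function names (step, q_learning, greedy_policy), and all hyperparameter values are assumptions made for this illustration; they are not taken from the course material.

```python
import random

# Hypothetical 5-state corridor: states 0..4, start in state 0, goal is state 4.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = step left, 1 = step right
GOAL = N_STATES - 1


def step(state, action):
    """Deterministic transition; reward 1 on reaching the goal, else 0."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    done = next_state == GOAL
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon; break ties at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # The target bootstraps from the greedy value of the next state;
            # terminal states contribute no future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Greedy action for every non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]


if __name__ == "__main__":
    q_table = q_learning()
    print(greedy_policy(q_table))
```

SARSA would differ only in the target: it bootstraps from the value of the action actually taken in the next state rather than from the maximum over actions.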

We will not cover:

  • Deep Reinforcement Learning in any flavor
  • Deep function approximation architectures that change during the learning process

These methods require extensive hyperparameter tuning, which exceeds the time and computational constraints of the lecture.

The course project is done in groups of three; each group works on a physical robot. Currently we can provide:

  • Poppy Humanoid
  • Poppy Ergo
  • Stem Kit Level 1 & 2
  • Turtlebot
  • Metabot V2
  • E-Puck

It is possible to extend the existing robots during the project (e.g. add new sensors, more construction parts, or additional equipment required for the project).

On completion of this course, students are able to:

  • describe classic scenarios of reinforcement learning problems;
  • explain the basics of reinforcement learning methods;
  • model real engineering problems using reinforcement learning methods;
  • compare, in practice, the performance of the reinforcement learning algorithms covered in the course within the specific projects;
  • select proper reinforcement learning algorithms in accordance with specific problems, and justify their choices;
  • construct and implement reinforcement learning algorithms to solve simple robotics problems on physical systems.

Registration Details

Due to the limited number of available robots, the number of participants has to be restricted. Please follow this procedure:

  • If you are interested in the course, sign up for the waiting list on TUMonline.
  • Attend the block lecture before the semester starts and participate in the hands-on part (mandatory).
  • The slots for the actual course (lab during the semester) will be filled from among all active students. The order is determined by TUMonline and its lottery.
  • Once you are signed up for the actual course, you are enrolled and thus occupy a robot.
    If you quit the course later on, you will prevent other students from taking the course:
    Only sign up if you are sure to stay in the course for the whole semester!

Lecture Details

The course consists of two phases:

  1. four-day block lecture before the semester starts:
    1. lecture-style teaching sessions in the morning
    2. practical part with discussions after lunch
  2. project phase:
    1. weekly tutorial sessions (two hours per week) throughout the semester, where we will be available for questions and feedback
    2. additional sessions if requested or required


Literature

  • Sutton, R. S. & Barto, A. G., Reinforcement Learning: An Introduction. The MIT Press, 1998 (or the second edition, 2018)
  • Bertsekas, D. P. & Tsitsiklis, J., Neuro-Dynamic Programming. Athena Scientific, 1996
  • Bertsekas, D. P., Dynamic Programming and Optimal Control, Vol. 1 & 2.
  • Szepesvári, C., Algorithms for Reinforcement Learning. Morgan & Claypool, 2010 (a draft)

Target Audience and Signup

Students in a Master's degree program. Registration via TUMonline.