Applied Reinforcement Learning
| | |
|---|---|
| Assistant: | Martin Gottwald and Zhiwei Han |
| Targeted audience: | Elective / supplementary lecture (Master) |
| Scope: | 2/2 (SWS Lecture/Tutorial) |
| Registration phase: | 10.02.2023 - 31.03.2023 |
| Time & place: | 03.04.2023 - 05.04.2023 and 12.04.2023 - 14.04.2023 |
| Regular question session: | Wednesday, 15:00 - 16:30, room Z995 (or via NavigaTUM) |
| Extra question session: | on request |
| Guest access to Moodle: | Link (the password is the universal tool in linear algebra, e.g. for solving overdetermined systems of linear equations. Three characters, all upper case, English name.) |
- The room for the lecture has been changed to "Hörsaal 2760@0507" due to a broken projection screen; please use the new link in the table to get to the room. (03.04.23)
- The registration phase has been adjusted to match TUMonline. (13.02.2023)
- The room of the block course has been adjusted. (30.01.2023)
Reinforcement learning (RL) is an approach to solving sequential decision making problems. A reinforcement learning agent interacts with its environment and uses its experience to make decisions towards solving the problem. The technique has succeeded in various applications in operations research, robotics, game playing, network management, and computational intelligence.
This lecture provides an overview of basic concepts, practical techniques, and programming tools used in reinforcement learning. Specifically, it focuses on the application aspects of the subject, such as problem solving and implementations. By design, it aims to complement the theoretical treatment of the subject, such as mathematical derivations, convergence proofs, and bound analysis, which are covered in the lecture "Approximate Dynamic Programming and Reinforcement Learning" in the winter semester.
In this lecture, we will cover the following topics:
- Reinforcement learning problems as Markov decision processes
- Dynamic programming (value iteration and policy iteration)
- Monte Carlo reinforcement learning methods
- Temporal difference learning (SARSA and Q-learning)
- Simulation-based reinforcement learning algorithms
- Linear value function approximation, e.g. tile coding
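To give a flavour of the temporal difference methods listed above, the following is a minimal tabular Q-learning sketch. The chain environment, its reward, and all hyperparameters are made up purely for illustration and are not part of the course material:

```python
import random

# Toy chain MDP (illustrative only): states 0..4, actions 0 = left and
# 1 = right; reaching state 4 yields reward 1 and ends the episode.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4

def step(state, action):
    """Deterministic transition: returns (next_state, reward, done)."""
    next_state = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    done = next_state == GOAL
    return next_state, float(done), done

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Q-learning with an epsilon-greedy behaviour policy (random tie-breaking)."""
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                action = rng.randrange(N_ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in range(N_ACTIONS) if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Off-policy update: bootstrap from the greedy value of the successor.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q = q_learning()
# The learned greedy policy should move right in every non-terminal state.
policy = [q[s].index(max(q[s])) for s in range(GOAL)]
print(policy)
```

The lecture covers this and the other listed algorithms in detail, including how to scale beyond such tabular toy problems with linear function approximation.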
We will not cover:
- Deep Reinforcement Learning in any flavor
- Deep function approximation architectures that change during the learning process
The excessive tuning of hyperparameters these require would exceed the time and computational constraints of the lecture.
The goal of the lecture is that students master the application of reinforcement learning algorithms for solving continuous control problems. This involves being able to:
- describe classic scenarios of reinforcement learning problems
- explain basics of reinforcement learning methods
- select proper reinforcement learning algorithms in accordance with specific problems, and argue their choices
- construct and implement reinforcement learning algorithms
- model an engineering problem as Markov Decision Process
- design and plan experiments
- analyze and evaluate systematically the outcome of experiments
- compare performance of the reinforcement learning algorithms covered by the course
- create a concise report / documentation of the project and its results
The reinforcement learning algorithms will be applied to a simulated game in groups of three.
The course consists of two phases:
- block lecture before the semester starts:
- lecture sessions in the morning covering the RL basics
- practical coding part with discussions and individual feedback after lunch
- project phase:
- weekly consulting sessions (two hours per week) throughout the semester, where we will be available for questions and feedback
- Additional sessions if requested and required
Due to the feedback required for the reports and the project nature of the course, we have to restrict the number of students to 15. Please note the following procedure:
- If you are interested in the course, register for the waiting list in TUMonline
- Room and location are given in the table at the top
- Visit the block lecture before the semester starts and participate in the hands-on part to demonstrate your motivation
- The slots for the actual course (project and reports during the semester) will be filled from the active and motivated students. The order is still determined by TUMonline and its lottery.
- Once you are signed up for the actual course, you are enrolled and thus occupy a slot. If you quit the course later on, you prevent other students from taking it, so: only sign up if you are sure you will stay in the course for the whole semester!
- Sutton, R. S. & Barto, A. G., Reinforcement Learning: An Introduction. The MIT Press, 1998 (or the 2nd edition, 2018)
- Bertsekas, D. P. & Tsitsiklis, J. N., Neuro-Dynamic Programming. Athena Scientific, 1996
- Bertsekas, D. P., Dynamic Programming and Optimal Control, Vol. 1 & 2. Athena Scientific
- Szepesvári, C., Algorithms for Reinforcement Learning. Morgan & Claypool, 2010 (a draft is available online)
Target Audience and Signup
Students in a Master's degree program. Registration via TUMonline.