Approximate Dynamic Programming and Reinforcement Learning

Lecturer:	Hao Shen
Assistants:	Stephan Rappensperger & Martin Gottwald
Contact:	adpr.ldvl@xcit.tum.de
Target audience:	Master's program in Electrical Engineering and Information Technology
ECTS:	6
Scope:	Supplementary lecture (2/2/1 SWS, i.e. weekly hours: lecture / lecture with integrated exercises / question session)
Offered:	Winter semester
Registration:	from 15.09.2023 to 09.02.2024 via TUMonline
Time & place:	Lecture: Thursdays 13:15-14:45; Exercise: Mondays 15:00-16:30; Room (both): Bestelmayer Süd (Z2)
Lectures start:	19.10.2023
Exercises start:	23.10.2023
Note:	If you are interested in the course materials, there is guest access to the Moodle course. The password is the universal tool of linear algebra, used e.g. for solving overdetermined systems of equations (three letters, all uppercase, English name).

Content

Approximate Dynamic Programming (ADP) and Reinforcement Learning (RL) are two closely related paradigms for solving sequential decision-making problems. ADP methods tackle these problems by developing optimal control methods that adapt to uncertain systems over time, while RL algorithms take the perspective of an agent that optimizes its behavior by interacting with its environment and learning from the feedback it receives. Both have been applied successfully in operations research, robotics, game playing, network management, and computational intelligence.

Topics covered include, but are not limited to:

Markov Decision Processes
Dynamic Programming
Approximate Dynamic Programming
Reinforcement Learning
Policy Gradient Algorithms (if time permits)

On completion of this course, students will be able to:

describe classic scenarios in sequential decision-making problems;
explain basic models of ADP/RL methods;
derive the ADP/RL algorithms covered in the course;
characterize the convergence properties of the ADP/RL algorithms covered in the course;
compare the performance of the ADP/RL algorithms covered in the course;
select appropriate ADP/RL algorithms for specific applications.

Registration

Course communication is handled via the Moodle page or the course e-mail address given above.

Register for the lecture with integrated exercises. The question session is only a placeholder in TUMonline and will be held whenever needed.
Use the Moodle guest access described at the top to view the course materials without enrolling.