Generative Multimedia
| Lecturer (assistant) | |
|---|---|
| Number | 0000001215 |
| Type | lecture with integrated exercises |
| Duration | 4 SWS |
| Term | Sommersemester 2026 |
| Language of instruction | English |
| Position within curricula | See TUMonline |
| Dates | See TUMonline |
- 17.04.2026 15:00-16:30 0406, Seminarraum
- 24.04.2026 09:15-10:45 0943, Praktikum
- 24.04.2026 15:00-16:30 0406, Seminarraum
- 08.05.2026 09:15-10:45 0943, Praktikum
- 08.05.2026 15:00-16:30 0406, Seminarraum
- 15.05.2026 09:15-10:45 0943, Praktikum
- 15.05.2026 15:00-16:30 0406, Seminarraum
- 22.05.2026 09:15-10:45 0943, Praktikum
- 22.05.2026 15:00-16:30 0406, Seminarraum
- 29.05.2026 09:15-10:45 0943, Praktikum
- 29.05.2026 15:00-16:30 0406, Seminarraum
- 05.06.2026 09:15-10:45 0943, Praktikum
- 05.06.2026 15:00-16:30 0406, Seminarraum
- 12.06.2026 09:15-10:45 0943, Praktikum
- 12.06.2026 15:00-16:30 0406, Seminarraum
- 19.06.2026 09:15-10:45 0943, Praktikum
- 19.06.2026 15:00-16:30 0406, Seminarraum
- 26.06.2026 09:15-10:45 0943, Praktikum
- 26.06.2026 15:00-16:30 0406, Seminarraum
- 03.07.2026 09:15-10:45 0943, Praktikum
- 03.07.2026 15:00-16:30 0406, Seminarraum
- 10.07.2026 09:15-10:45 0943, Praktikum
- 10.07.2026 15:00-16:30 0406, Seminarraum
- 17.07.2026 09:15-10:45 0943, Praktikum
- 17.07.2026 15:00-16:30 0406, Seminarraum
Admission information
Objectives
- Understand the principles and architectures of modern generative models, including VAEs, GANs, Transformers, diffusion models, and energy-based models, as well as their strengths, limitations, and applications in multimedia generation.
- Implement generative models for multimedia (images, text, and music) and design training strategies for them.
- Critique state-of-the-art generative AI research, including multimodal models and large language models.
Description
- Introduction to probability theory, generative modeling, and Deep Learning
- Generative methods and models: theory, implementation, and applications of Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), Energy-based models and normalizing flows, Transformers, Diffusion models
- Large language models, Multimodal models (language-image), Generative world models for reinforcement learning
- Creative applications: image synthesis, text-to-image generation, and music composition
- Tiny Generative Models for Edge Devices (challenges and techniques for running generative models on resource-constrained hardware, model compression, quantization, and software frameworks)
Prerequisites
Probability theory, Linear Algebra, Optimization
Teaching and learning methods
Lectures introduce theoretical concepts, provide context, and discuss real-world applications. These sessions are interactive, incorporating Q&A and discussions to deepen understanding.
Lab sessions are dedicated to hands-on implementation, where students apply theoretical knowledge to coding assignments. Labs are synchronized with lecture topics to reinforce learning.
Teaching methods
- Presentations and lecture notes
- Interactive coding sessions
- Research paper discussions
- Guest lectures from industry experts (if available)
Learning methods
- Tutorials with programming assignments to build and train generative models
- Essays summarizing and critiquing cutting-edge research papers
Examination
Final Exam [100% of the final grade]: A 120-minute written exam (without aids) assesses students' theoretical knowledge, including mathematical problem formulations and derivations, model architectures, and training strategies.
Programming Assignments: Students implement and train generative models (e.g., VAEs, GANs, diffusion models), showcasing their ability to translate theory into practice. Successful completion of the assignments earns a bonus of 0.3 on the final grade, provided the final exam is passed.
Recommended literature
- Generative Deep Learning: Teaching Machines To Paint, Write, Compose, and Play, 2nd Edition by David Foster, O’Reilly Media, June 2023
- Understanding Deep Learning by Simon Prince, MIT Press, December 2023
- Further references will be distributed through lecture slides
Links
Previous Lab Project Demos
Kick-off meeting announcement
The lab kick-off meeting will take place in person on 17.10.2024 in Seminar Room 0406 (https://nav.tum.de/room/0504.EG.406), from 15:00 to 16:30.