Student projects and final year projects at the Chair of Media Technology

We continuously offer topics for student projects (engineering practice, research practice, working student positions, IDPs) and final-year projects (Bachelor's or Master's theses).

 

Open Theses

Diffusion Model-based Imitation Learning for Robot Manipulation Tasks

Description

Diffusion models are powerful generative models behind many successful applications, such as image, video, and 3D generation from text. They are inspired by non-equilibrium thermodynamics: a Markov chain of diffusion steps slowly adds random noise to the data, and a model is trained to reverse this diffusion process, constructing the desired data samples from noise.
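The forward (noising) half of this Markov chain can be sampled in closed form. The following is an illustrative NumPy toy with an assumed linear noise schedule, not part of any project code:

```python
import numpy as np

def make_alpha_bar(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """Cumulative product of (1 - beta_t) for a linear noise schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def forward_diffuse(x0: np.ndarray, t: int, alpha_bar: np.ndarray,
                    rng: np.random.Generator) -> np.ndarray:
    """Sample x_t ~ q(x_t | x_0): sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * noise."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar(1000)
x0 = np.ones(4)                              # toy "data" sample
x_T = forward_diffuse(x0, 999, alpha_bar, rng)
# After many steps alpha_bar is tiny, so x_T is close to pure Gaussian noise.
```

The learned part of a diffusion policy is the reverse of this process; the sketch above only shows the fixed forward corruption it inverts.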

In this work, we aim to explore the application of diffusion models or their variants to imitation learning and to evaluate them on a real-world Franka robot arm.

Prerequisites

  • Good Programming Skills (Python, C++)
  • Knowledge about Ubuntu/Linux/ROS
  • Motivation to learn and conduct research

Contact

dong.yang@tum.de

(Please attach your CV and transcript)

Supervisor:

Dong Yang

Comparative study of various hand-tracking approaches for Hand-Object Interaction in VR

Description

In this thesis, various hand-tracking approaches will be evaluated and compared for Hand-Object Interaction in Virtual Reality.
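Comparing tracking approaches requires a common accuracy metric; one standard choice is the mean per-joint position error (MPJPE) against ground-truth keypoints. A minimal sketch, where the 21-joint layout and the numbers are purely illustrative:

```python
import numpy as np

def mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean per-joint position error: average Euclidean distance over (J, 3) joints."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

gt = np.zeros((21, 3))                  # toy ground-truth hand (21 joints)
pred = gt + np.array([0.01, 0.0, 0.0])  # tracker output offset by 1 cm in x
error = mpjpe(pred, gt)                 # -> 0.01 (metres)
```

The same metric can be computed per tracker over a shared test set, which makes the comparison independent of each system's internal hand model.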

Prerequisites

- C++ and Python

- Ideally: Blender 3D software, experience with game development using Unreal Engine.

Supervisor:

Rahul Chaudhari

text2anim: 3D Animation of Animal Figurines based on Natural Language Commands

Description

This topic is about translating natural language user commands -- e.g., "[animal] shakes off water as if it's just rained" -- into animations of animal figurines.
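One conceivable first stage is mapping the command text to a discrete action label before driving keyframes in Blender. The vocabulary and labels below are made up for illustration; a real system would use a learned language model rather than keyword matching:

```python
# Hypothetical command vocabulary -> animation clip label (illustrative only).
ACTIONS = {
    "shakes off water": "shake_body",
    "wags": "wag_tail",
    "jumps": "jump",
}

def parse_command(command: str) -> str:
    """Return the first matching animation label, or 'idle' if nothing matches."""
    lowered = command.lower()
    for phrase, label in ACTIONS.items():
        if phrase in lowered:
            return label
    return "idle"

label = parse_command("The dog shakes off water as if it's just rained")
# -> "shake_body"
```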

Prerequisites

Interest and first experience in Computer Graphics, Blender, and Python.

Supervisor:

Rahul Chaudhari

Radar-based Material Classification

Keywords:
signal processing, machine learning, material classification

Description

The work focuses on radar-based material classification. With the rapid development of autonomous driving, drones, home robots, and various smart devices in recent years, material sensing has received increasing attention. Millimeter-wave radar is widely installed on these platforms due to its low cost and robustness in harsh environments. In this work, we therefore study methods for classifying common indoor materials using millimeter-wave radar signals.

In this work, we will collect radar signals from common indoor materials such as wood, metal, and glass. After extracting the required features through radar signal processing, we will use machine learning algorithms to classify the materials.
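The processing chain described above (signal processing for features, then a classifier) can be sketched on synthetic stand-in signals. Everything here is illustrative: real FMCW radar data would be processed into range/Doppler profiles, and a proper learned classifier would replace the nearest-centroid rule:

```python
import numpy as np

def range_profile(iq: np.ndarray) -> np.ndarray:
    """Magnitude spectrum of one windowed radar chirp (range FFT) as a feature vector."""
    return np.abs(np.fft.rfft(iq * np.hanning(len(iq))))

def fit_centroids(features: np.ndarray, labels: np.ndarray) -> dict:
    """Nearest-centroid classifier: mean feature vector per material class."""
    return {c: features[labels == c].mean(axis=0) for c in np.unique(labels)}

def predict(centroids: dict, feature: np.ndarray):
    """Assign the class whose centroid is closest in feature space."""
    return min(centroids, key=lambda c: np.linalg.norm(feature - centroids[c]))

rng = np.random.default_rng(1)
t = np.arange(128)
# Synthetic stand-ins: each "material" reflects at a different beat frequency.
wood  = np.sin(2 * np.pi * 0.10 * t[None, :] + rng.uniform(0, 1, (20, 1))) \
        + 0.1 * rng.standard_normal((20, 128))
metal = np.sin(2 * np.pi * 0.25 * t[None, :] + rng.uniform(0, 1, (20, 1))) \
        + 0.1 * rng.standard_normal((20, 128))
X = np.vstack([[range_profile(s) for s in wood],
               [range_profile(s) for s in metal]])
y = np.array(["wood"] * 20 + ["metal"] * 20)
centroids = fit_centroids(X, y)
pred = predict(centroids, range_profile(metal[0]))
```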

Prerequisites

Programming in Python

Knowledge about Machine Learning

Knowledge about signal processing, particularly on radar signal processing

Contact

mengchen.xiong@tum.de

(Please attach your CV and transcript)

Supervisor:

Ensuring Visual Coherency in LDMs

Description

Generative AI models are demonstrating strong performance in various domains. Models such as Stable Diffusion, trained using billions of images, are capable of generating highly realistic images based on text prompts or input images. Using either multiple text descriptions or images as input, completely different visual styles can be combined. However, the results are not always visually pleasing or coherent.

A common approach is to experiment with different prompts or input images until the desired visual result is achieved. This is a slow and manual method that potentially wastes significant computing resources. Instead of generating images and then assessing their visual coherence by inspection, this thesis is focused on automatically assessing the likelihood of pleasing visual results from the input alone. Specifically, we will focus on assessing the compatibility of inputs for the open-source image generation model Stable Diffusion.

For this, different text or image inputs are encoded into latent space and analyzed with regard to their compatibility. A method needs to be developed that assesses the compatibility of the different inputs. Metrics such as cosine similarity, FID, or Image Aesthetic Assessment (IAA) methods such as the CLIP score can serve as a starting point. After assigning the input an aesthetic score, the same method should then be used to identify potential changes to the input that increase the predicted aesthetic quality. The goal of this work is to allow designing inputs that contain the desired visual concepts while maximizing the likelihood of visually pleasing and coherent outputs.
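As one of the suggested starting points, cosine similarity between encoded inputs gives a crude compatibility proxy. The sketch below uses random vectors as stand-ins for real CLIP-style embeddings; the scoring rule is an assumption, not the thesis method:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = identical direction)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def compatibility_score(embeddings: list) -> float:
    """Mean pairwise cosine similarity as a crude proxy for input compatibility."""
    sims = [cosine_similarity(embeddings[i], embeddings[j])
            for i in range(len(embeddings)) for j in range(i + 1, len(embeddings))]
    return float(np.mean(sims))

rng = np.random.default_rng(0)
base = rng.standard_normal(512)                   # stand-in for a text embedding
similar = base + 0.1 * rng.standard_normal(512)   # a prompt close in latent space
opposite = -base                                  # a maximally conflicting prompt
score_close = compatibility_score([base, similar])
score_far = compatibility_score([base, opposite])
```

High scores would suggest inputs that mix well; low or negative scores flag combinations likely to produce incoherent results, before any image is generated.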

This thesis will be conducted externally at Sureel Inc., a startup specializing in secure and legal generative AI content.

Prerequisites

Requirements: Experience with Python and machine learning

Contact

christopher@sureel.ai

Supervisor:

Eckehard Steinbach - Christopher Kuhn (Sureel Inc.)

Equivariant 3D Object Detection

Keywords:
3D Object Detection, Computer Vision, Deep Learning, Indoor Environments

Description

The thesis focuses on the application of equivariant deep learning techniques for 3D object detection in indoor scenes. Indoor environments, such as homes, offices, and industrial settings, present unique challenges for 3D object detection due to diverse object arrangements, varying lighting conditions, and occlusions. Traditional methods often struggle with these complexities, leading to suboptimal performance. The motivation for this research is to enhance the robustness and accuracy of 3D object detection in these environments, leveraging the inherent advantages of equivariant deep learning. This approach aims to improve the model's ability to recognize objects regardless of their orientation and position in the scene, which is crucial for applications in robotics or augmented reality.

 

The thesis proposes the development of a deep learning model that incorporates equivariant neural networks for 3D object detection, such as the equivariant framework proposed in [1]. The proposed model will be evaluated on a benchmark 3D indoor dataset, such as the Stanford 3D Indoor Spaces Dataset (S3DIS), ScanNet [2], or Replica [3].
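The core property exploited in [1], rotation equivariance, can be checked numerically for a vector-neuron-style linear layer: mixing channels commutes with rotating the 3D coordinates, so f(XR) = f(X)R. This toy NumPy check is a sketch of the idea, not the framework's actual code:

```python
import numpy as np

def vn_linear(W: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Vector-neuron linear layer: mixes channels, leaves 3D coordinates intact.
    X has shape (C, 3) (C channels, each a 3D vector); W has shape (C_out, C)."""
    return W @ X

def rot_z(theta: float) -> np.ndarray:
    """Rotation about the z-axis by angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 3))   # 8 vector-valued feature channels
W = rng.standard_normal((4, 8))   # channel-mixing weights
R = rot_z(0.7)

rotated_then_mapped = vn_linear(W, X @ R.T)
mapped_then_rotated = vn_linear(W, X) @ R.T
# Equivariance: both orders of operations agree.
```

A detector built from such layers produces features that rotate with the scene, which is exactly the robustness to object orientation motivated above.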

 

References

[1] Deng, Congyue, et al. "Vector neurons: A general framework for SO(3)-equivariant networks." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.

[2] Dai, Angela, et al. "ScanNet: Richly-annotated 3D reconstructions of indoor scenes." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.

[3] Straub, Julian, et al. "The Replica dataset: A digital replica of indoor spaces." arXiv preprint arXiv:1906.05797 (2019).

Prerequisites

  • Python and Git
  • Experience with a deep learning framework (PyTorch, TensorFlow)
  • Interest in Computer Vision and Machine Learning

Supervisor:

Adam Misik

3D Hand-Object Reconstruction from Monocular RGB Images

Keywords:
Computer Vision, Hand-Object Interaction

Description

Understanding human hand and object interaction is fundamental for meaningfully interpreting human action and behavior.

With the advent of deep learning and RGB-D sensors, pose estimation of isolated hands or objects has made significant progress.

However, despite a strong link to real applications such as augmented and virtual reality, the joint reconstruction of hand and object has received comparatively little attention.

This task focuses on accurately reconstructing hand-object interactions in three-dimensional space, given a single RGB image.
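A core difficulty of the monocular setting is that perspective projection discards depth: scaling a hand and its distance to the camera by the same factor leaves the pixels unchanged. A small NumPy illustration with assumed, purely illustrative camera intrinsics:

```python
import numpy as np

def project(points_3d: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Pinhole projection of (N, 3) camera-frame points to (N, 2) pixel coordinates."""
    uvw = points_3d @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

K = np.array([[600.0, 0.0, 320.0],   # illustrative intrinsics: fx, fy = 600,
              [0.0, 600.0, 240.0],   # principal point (320, 240)
              [0.0, 0.0, 1.0]])
hand = np.array([[0.00, 0.00, 0.50],
                 [0.02, 0.01, 0.52]])  # two toy 3D joints, ~0.5 m from the camera
pix = project(hand, K)
# Scale ambiguity: doubling both depth and size yields the same pixels.
pix_scaled = project(2.0 * hand, K)
```

Joint hand-object reconstruction methods resolve this ambiguity with priors, e.g. known object size or a parametric hand model.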

 

Prerequisites

  • Programming in Python
  • Knowledge about Deep Learning
  • Knowledge about PyTorch

Contact

xinguo.he@tum.de

Supervisor:

Xinguo He

Intuitive Teleoperation and Behavior Understanding

Description

Building on our previous development, you will implement a demo for real-time human behavior understanding and prediction.

Task 1: dataset generation

Task 2: pipeline and demo implementation

Task 3: algorithm development

Requirements: knowledge of YOLO, OpenCV, and MediaPipe; programming in Python or C++
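One minimal shape such a pipeline could take: per-frame body keypoints (e.g. from a MediaPipe or YOLO-pose detector) are collected into a short window and classified. The velocity-threshold rule below is a made-up stand-in for a learned behavior model:

```python
import numpy as np

def classify_motion(keypoints: np.ndarray, threshold: float = 0.05) -> str:
    """Label a (T, J, 2) keypoint window by the fastest-moving joint's mean
    frame-to-frame displacement (normalized image coordinates)."""
    disp = np.linalg.norm(np.diff(keypoints, axis=0), axis=-1).max(axis=1).mean()
    return "moving" if disp > threshold else "still"

T, J = 30, 17                     # 30 frames, 17 body joints (COCO-style layout)
still = np.zeros((T, J, 2))
wave = np.zeros((T, J, 2))
wave[:, 9, 0] = np.sin(np.linspace(0, 4 * np.pi, T))   # one wrist oscillates
label_still = classify_motion(still)
label_wave = classify_motion(wave)
```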

Supervisor:

Real-Time 3D Object Tracking and Pose Estimation of Textureless Objects

Keywords:
computer vision, machine learning, digital twin

Description

Real-time 3D tracking of objects using one or more cameras is crucial for building a Digital Twin. In this project, you will improve an algorithm for 3D tracking and pose estimation, and use it to update a Digital Twin of a factory environment that is used in robotic manipulation tasks.

We will pay special attention to the tracking of textureless objects and the speed of the algorithm. We will also compare the results obtained with one versus multiple cameras.
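At the core of many pose-estimation pipelines is a rigid alignment step: given corresponding 3D points on the object model and in the observation, recover the rotation and translation. A hedged NumPy sketch of the classic Kabsch/SVD solution (the actual tracker in this project may use a different formulation):

```python
import numpy as np

def kabsch(P: np.ndarray, Q: np.ndarray):
    """Best-fit rotation R and translation t mapping points P onto Q (both (N, 3))."""
    Pc, Qc = P - P.mean(axis=0), Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = Q.mean(axis=0) - R @ P.mean(axis=0)
    return R, t

rng = np.random.default_rng(0)
P = rng.standard_normal((10, 3))                 # model points (e.g. CAD vertices)
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
Q = P @ R_true.T + np.array([0.1, -0.2, 0.3])    # observed pose of the same points
R_est, t_est = kabsch(P, Q)
```

For textureless objects the hard part is establishing the correspondences themselves (e.g. from contours or depth), after which an alignment like this recovers the pose.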

Prerequisites

For this work, good knowledge of C++ is required.

Some knowledge of Python and ROS will be useful, but it is not required.

Contact

diego.prado@tum.de

Supervisor:

Diego Fernandez Prado

HiWi Position: Project Lab Human Activity Understanding

Keywords:
deep learning, ROS, RealSense, Python

Description

A HiWi position is available for the Lab Course Human Activity Understanding

 

The position offers a 6 h/week contract.

 

The lab involves:

  • Practical Sessions, where the students collect data from a color/depth sensor setup.
  • Notebook Sessions, where the students are introduced to a Jupyter notebook with brief theoretical content and homework.
  • Project Sessions, where the students work on their own projects.

The main tasks of this position involve the following:

  • Helping students with data collection in Practical and Project Sessions.
  • Assisting during the Notebook Sessions with regard to the contents of the notebooks and the homework.

Prerequisites

  • Knowledge about ROS
  • Knowledge about Python
  • Basic knowledge of Deep Learning

Contact

marsil.zakour@tum.de

Supervisor:

Marsil Zakour

 

You can find important information about writing your thesis and giving a talk at LMT, as well as templates for PowerPoint and LaTeX, here.