Student Thesis

Diffusion Model-based Imitation Learning for Robot Manipulation Task

Description

Diffusion models are powerful generative models that enable many successful applications, such as image, video, and 3D generation from text. They are inspired by non-equilibrium thermodynamics: a Markov chain of diffusion steps slowly adds random noise to the data, and a model learns to reverse this diffusion process to construct desired data samples from the noise.
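As a rough illustration of the forward (noising) and reverse (denoising) processes described above, the sketch below follows the DDPM formulation under several assumptions: a linear noise schedule with beta from 1e-4 to 0.02, plain Python lists instead of tensors, and a noise estimate `eps_hat` that in practice would come from a trained network (not shown). All function names are hypothetical; this is not the method the thesis will develop.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_0..beta_{T-1}, linearly spaced (assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out

def q_sample(x0, t, abar, rng):
    """Forward process in closed form: x_t = sqrt(abar_t) x_0 + sqrt(1 - abar_t) eps."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    a = abar[t]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]
    return xt, noise

def p_sample_step(xt, eps_hat, t, betas, abar, rng):
    """One reverse step x_t -> x_{t-1}, given a noise estimate eps_hat.

    eps_hat would normally be predicted by a trained network (hypothetical here).
    """
    beta = betas[t]
    coef = beta / math.sqrt(1.0 - abar[t])
    mean = [(x - coef * e) / math.sqrt(1.0 - beta) for x, e in zip(xt, eps_hat)]
    if t == 0:
        return mean  # no noise is added at the final step
    sigma = math.sqrt(beta)  # one common variance choice: sigma_t^2 = beta_t
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
```

Because `q_sample` jumps directly from x_0 to any step t, training can draw a random t per example; sampling then applies `p_sample_step` from t = T-1 down to 0. In one plausible imitation-learning setup (an assumption, not part of this listing), x would be a robot action or action sequence and the noise predictor would also be conditioned on the current observation.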

In this work, we aim to explore the application of diffusion models or their variants to imitation learning and to evaluate them on a real-world Franka robot arm.

Prerequisites

Good programming skills (Python, C++)
Knowledge of Ubuntu/Linux/ROS
Motivation to learn and conduct research

Contact

dong.yang@tum.de

(Please attach your CV and transcript)

Supervisor:

Dong Yang

Comparative study of various hand-tracking approaches for Hand-Object Interaction in VR

Description

In this thesis, various hand-tracking approaches will be evaluated and compared for hand-object interaction in virtual reality.

Prerequisites

- C++ and Python

- Ideally: experience with the Blender 3D software and with game development in Unreal Engine.

Supervisor:

Rahul Chaudhari

text2anim: 3D Animation of Animal Figurines based on Natural Language Commands

Description

This topic is about translating natural-language user commands (e.g., "[animal] shakes off water as if it's just rained") into animations of animal figurines.

Prerequisites

Interest in and first experience with computer graphics, Blender, and Python.

Supervisor:

Rahul Chaudhari

Radar-based Material Classification

Keywords:
signal processing, machine learning, material classification

Description

This work focuses on radar-based material classification. With the rapid development of autonomous driving technology, drones, home robots, and various smart devices in recent years, material sensing has received increasing attention. Millimeter-wave radar is widely installed on these platforms due to its low cost and robustness in harsh environments. In this work, we will therefore study methods for classifying common indoor materials using millimeter-wave radar signals.

To this end, we will collect radar signals from common indoor materials such as wood, metal, and glass. After extracting the required features with radar signal processing methods, we will apply machine learning algorithms to classify the materials.
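The processing chain above (collect signals, extract features, classify) can be sketched in a few lines. This is a toy illustration under strong assumptions: the radar is assumed to be an FMCW sensor whose sampled beat signal becomes a range profile via a DFT, the two features (peak amplitude and spectral energy) are hypothetical choices, and a nearest-centroid rule stands in for whichever machine learning algorithms the thesis actually uses.

```python
import cmath
import math

def dft_magnitude(signal):
    """Normalized magnitude spectrum of a real sampled beat signal (naive DFT).

    Under the FMCW assumption, each bin corresponds to a range cell.
    """
    n = len(signal)
    return [abs(sum(signal[k] * cmath.exp(-2j * math.pi * f * k / n)
                    for k in range(n))) / n
            for f in range(n // 2)]

def extract_features(signal):
    """Hypothetical features: peak reflection amplitude and total spectral energy."""
    spec = dft_magnitude(signal)
    return [max(spec), sum(s * s for s in spec)]

def fit_centroids(features, labels):
    """Average feature vector per material label."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(f))
        for i, v in enumerate(f):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(centroids, feature):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda y: math.dist(centroids[y], feature))
```

A nearest-centroid rule is used here only because it is short. With real measurements, the features would need normalization, and stronger classifiers (e.g., an SVM or a small neural network) would likely be compared.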

Prerequisites

Programming in Python

Knowledge of machine learning

Knowledge of signal processing, particularly radar signal processing

Contact

mengchen.xiong@tum.de

(Please attach your CV and transcript)

Supervisor:

Mengchen Xiong

Equivariant 3D Object Detection

Keywords:
3D Object Detection, Computer Vision, Deep Learning, Indoor Environments

Description

The thesis focuses on applying equivariant deep learning techniques to 3D object detection in indoor scenes. Indoor environments, such as homes, offices, and industrial settings, present unique challenges for 3D object detection due to diverse object arrangements, varying lighting conditions, and occlusions. Traditional methods often struggle with these complexities, leading to suboptimal performance. The motivation for this research is to enhance the robustness and accuracy of 3D object detection in these environments by leveraging the inherent advantages of equivariant deep learning. This approach aims to improve the model's ability to recognize objects regardless of their orientation and position in the scene, which is crucial for applications in robotics and augmented reality.

The thesis proposes developing a deep learning model that incorporates equivariant neural networks for 3D object detection, such as the framework proposed in [1]. The model will be evaluated on a benchmark 3D indoor dataset, such as the Stanford 3D Indoor Spaces Dataset (S3DIS), ScanNet [2], or Replica [3].
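The core idea of the vector neuron framework [1] can be checked numerically: each feature is a list of 3D vectors, and a linear layer only mixes channels, so rotating the input rotates the output in the same way (SO(3)-equivariance). The sketch below assumes plain Python lists and uses a rotation about the z-axis for the check; it is not the reference implementation from [1] and omits the nonlinearities and pooling layers a full detector would need.

```python
import math

def vn_linear(weights, vectors):
    """Vector-neuron linear layer.

    vectors: C_in input channels, each a 3D vector [x, y, z].
    weights: C_out rows of C_in scalars. The weights mix channels only and never
    touch the x/y/z coordinates, which is what makes the layer equivariant.
    """
    return [[sum(w * v[d] for w, v in zip(row, vectors)) for d in range(3)]
            for row in weights]

def rotate(vectors, R):
    """Apply a 3x3 rotation matrix R (row-major) to every vector: v -> R v."""
    return [[sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]
            for v in vectors]

def rot_z(theta):
    """Rotation by theta radians about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def is_equivariant(weights, vectors, R, tol=1e-9):
    """Check that layer(R v) equals R layer(v), element by element."""
    lhs = vn_linear(weights, rotate(vectors, R))
    rhs = rotate(vn_linear(weights, vectors), R)
    return all(abs(a - b) < tol for u, w in zip(lhs, rhs) for a, b in zip(u, w))
```

By contrast, an ordinary fully connected layer applied to the flattened coordinates would generally fail this check, which is why the framework in [1] keeps the 3D axis separate in every layer.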

References

[1] Deng, Congyue, et al. "Vector Neurons: A General Framework for SO(3)-Equivariant Networks." Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021.

[2] Dai, Angela, et al. "ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

[3] Straub, Julian, et al. "The Replica Dataset: A Digital Replica of Indoor Spaces." arXiv preprint arXiv:1906.05797, 2019.

Prerequisites

Python and Git
Experience with a deep learning framework (PyTorch, TensorFlow)
Interest in computer vision and machine learning

Supervisor:

Adam Misik

3D Hand-Object Reconstruction from monocular RGB images

Keywords:
Computer Vision, Hand-Object Interaction

Description

Understanding human hand and object interaction is fundamental for meaningfully interpreting human action and behavior.

With the advent of deep learning and RGB-D sensors, pose estimation of isolated hands or objects has made significant progress.

However, despite a strong link to real applications such as augmented and virtual reality, joint reconstruction of hand and object has received comparatively little attention.

This thesis focuses on accurately reconstructing hand-object interactions in three-dimensional space from a single RGB image.

Prerequisites

Programming in Python
Knowledge of deep learning
Knowledge of PyTorch

Contact

xinguo.he@tum.de

Supervisor:

Xinguo He

Intuitive Teleoperation and Behavior Understanding

Description

Building on our previous development, you will implement a demo for real-time understanding and prediction of human behavior.

Job 1: dataset generation

Job 2: pipeline and demo implementation

Job 3: algorithm development

Requirements: knowledge of YOLO, OpenCV, and MediaPipe; programming in Python or C++

Supervisor:

Xiao Xu

Real-Time 3D Object Tracking and Pose Estimation of Textureless Objects

Keywords:
computer vision, machine learning, digital twin

Description

Real-time 3D tracking of objects using one or more cameras is crucial for building a Digital Twin. In this project, you will improve an algorithm for 3D tracking and pose estimation and use it to update a Digital Twin of a factory environment that is used in robotic manipulation tasks.

We will pay special attention to the tracking of textureless objects and to the speed of the algorithm. We will also compare the results obtained with a single camera and with multiple cameras.

Prerequisites

For this work, good knowledge of C++ is required.

Some knowledge of Python and ROS will be useful, but it is not required.

Contact

diego.prado@tum.de

Supervisor:

Diego Fernandez Prado

HiWi Position in the Project Lab Human Activity Understanding

Keywords:
deep learning, ROS, RealSense, Python

Description

A HiWi position is available for the Lab Course Human Activity Understanding.

The position offers a 6 h/week contract.

The lab involves:

Practical Sessions, where the students collect data from a color/depth sensor setup.
Notebook Sessions, where the students are introduced to Jupyter notebooks with brief theoretical content and homework.
Project Sessions, where the students work on their own projects.

The main tasks of this position involve the following:

Helping students with data collection in Practical and Project Sessions.
Assisting during the notebook sessions with regard to the contents of the notebooks and homework.

Prerequisites

Knowledge of ROS
Knowledge of Python
Basic knowledge of deep learning

Contact

marsil.zakour@tum.de

Supervisor:

Marsil Zakour

Student projects and final year projects at the Chair of Media Technology

Open Thesis

MA, FP: Diffusion Model-based Imitation Learning for Robot Manipulation Task

BA, IDP: Comparative study of various hand-tracking approaches for Hand-Object Interaction in VR

BA, IDP: text2anim: 3D Animation of Animal Figurines based on Natural Language Commands

MA, FP: Radar-based Material Classification

MA: Equivariant 3D Object Detection

MA, FP: 3D Hand-Object Reconstruction from monocular RGB images

SHK: Intuitive Teleoperation and Behavior Understanding

BA, MA, FP: Real-Time 3D Object Tracking and Pose Estimation of Textureless Objects

SHK: HiWi Position in the Project Lab Human Activity Understanding
