Student projects and final year projects at the Chair of Media Technology

We continuously offer topics for student projects (engineering practice, research practice, working student positions, IDPs) and final-year projects (Bachelor's or Master's theses).

 

Open Theses

3D Hand-Object Reconstruction from monocular RGB images

Keywords:
Computer Vision, Hand-Object Interaction

Description

Understanding human hand and object interaction is fundamental for meaningfully interpreting human action and behavior.

With the advent of deep learning and RGB-D sensors, pose estimation of isolated hands or objects has made significant progress.

However, despite a strong link to real applications such as augmented and virtual reality, joint reconstruction of hand and object has received relatively less attention.

This task focuses on accurately reconstructing hand-object interactions in three-dimensional space, given a single RGB image.

 

Prerequisites

  • Programming in Python
  • Knowledge about Deep Learning
  • Knowledge about Pytorch

Contact

xinguo.he@tum.de

Supervisor:

Xinguo He

Development of a Deep Learning Toolkit

Description

Your task is to implement a series of deep-learning applications, toolkits, and the necessary database.

 

Prerequisites

  • Knowledge of Deep Learning
  • Knowledge about PyTorch
  • Programming in Python or C++

Supervisor:

Robust Hand-Object Pose Estimation from Multi-view 2D Keypoints

Description

Hand-object pose estimation is a challenging task due to factors such as occlusion and ambiguity in pose recovery. To overcome these issues, multi-view camera systems are used.

Using 2D keypoint detectors for hands and objects, such as YOLOv8-pose and MMPose, we can lift the 2D detections to 3D. However, the detections are usually noisy, and some keypoints may be missing.

We want to utilize deep learning methods for smoothing, inpainting, and lifting these detections to 3D in order to estimate the poses of the corresponding hands and objects.

The task is formulated as follows:

Given a sequence of noisy 2D keypoints for human hands and an object, captured from calibrated camera views, use a deep learning model to estimate a smooth trajectory of the hand and object poses.
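
For orientation, the classical (non-learned) baseline for the lifting step is direct linear transform (DLT) triangulation from the calibrated views; the thesis would replace or complement this with a learned model that also handles noise and missing keypoints. The sketch below is illustrative only, with synthetic camera matrices and detections.

```python
# Baseline sketch (not the thesis method): lifting one 2D keypoint to 3D by
# direct linear transform (DLT) triangulation from calibrated views.
# Projection matrices and keypoints are assumed given; names are illustrative.
import numpy as np

def triangulate_point(projections, points_2d):
    """projections: list of 3x4 camera projection matrices P = K [R|t].
    points_2d: list of (x, y) pixel detections of the same keypoint, one per view.
    Returns the 3D point minimizing the algebraic DLT error."""
    rows = []
    for P, (x, y) in zip(projections, points_2d):
        # Each view contributes two linear constraints on the homogeneous 3D point X.
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

# Example with two synthetic views of the point (0.1, 0.2, 2.0):
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])                    # reference camera
P2 = K @ np.hstack([np.eye(3), np.array([[-0.2], [0.0], [0.0]])])    # translated camera
X_true = np.array([0.1, 0.2, 2.0, 1.0])
uv = [(P @ X_true)[:2] / (P @ X_true)[2] for P in (P1, P2)]
print(triangulate_point([P1, P2], uv))   # ~ [0.1, 0.2, 2.0]
```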

 

Prerequisites

  • Python
  • Knowledge about Deep Learning
  • Knowledge about Pytorch
  • Previous Knowledge about 3D data processing is a plus.

Contact

marsil.zakour@tum.de

Supervisor:

Marsil Zakour

Realistic differentiable hand model for synthesizing hand-object interaction data

Description

Our hand-object interaction simulator currently uses a hand mesh driven by a bone rig. This approach has several limitations: the size and shape of the hand cannot be easily controlled, and the hand texture cannot be easily changed to another realistic texture. In this thesis, these limitations should be addressed by building on the components below.

  • Replace the fixed hand mesh with a differentiable hand model and drive the hand using motion capture data: https://handover-sim.github.io/
  • Let the virtual hand interact physically with virtual objects (turning the hand into an articulated rigid body after the shape deformation): https://github.com/ikalevatykh/mano_pybullet
  • Integrate hand textures from real hands: https://handtracker.mpi-inf.mpg.de/projects/HandTextureModel/
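
For orientation only: the core idea behind a differentiable hand model is that the posed mesh vertices are a differentiable function of the pose (and shape) parameters, for example via linear blend skinning, so gradients can flow back into those parameters. The minimal PyTorch sketch below illustrates this with toy tensors; the actual thesis would build on the MANO-based components linked above rather than on this code.

```python
# Illustrative-only sketch of the core mechanism of a differentiable hand model:
# linear blend skinning (LBS), where posed mesh vertices depend smoothly on joint
# transforms so gradients reach the pose parameters. MANO-style models (see the
# links above) add learned shape and pose blend shapes on top of this idea.
import torch

def linear_blend_skinning(vertices, skin_weights, joint_transforms):
    """vertices: (V, 3) rest-pose mesh vertices.
    skin_weights: (V, J) per-vertex weights over J joints (rows sum to 1).
    joint_transforms: (J, 4, 4) rigid transforms of the joints.
    Returns posed vertices of shape (V, 3); fully differentiable."""
    num_vertices = vertices.shape[0]
    homogeneous = torch.cat([vertices, torch.ones(num_vertices, 1)], dim=1)   # (V, 4)
    blended = torch.einsum("vj,jab->vab", skin_weights, joint_transforms)     # (V, 4, 4)
    posed = torch.einsum("vab,vb->va", blended, homogeneous)                  # (V, 4)
    return posed[:, :3]

# Toy example: 4 vertices, 2 joints; check that gradients reach a pose parameter.
verts = torch.rand(4, 3)
weights = torch.softmax(torch.rand(4, 2), dim=1)
translation = torch.zeros(3, requires_grad=True)   # pose parameter (joint 1 translation)
transforms = torch.eye(4).repeat(2, 1, 1)
transforms[1, :3, 3] = translation                 # joint 1 follows the parameter
posed = linear_blend_skinning(verts, weights, transforms)
posed.sum().backward()
print(translation.grad)                            # non-zero: the hand is differentiable in its pose
```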

 

Prerequisites

  • Strong familiarity with Python programming and the Linux environment
  • Interest and initial experience in Computer Graphics, VR, Computer Vision, and Deep Learning
  • Ideally also interest and experience in the Blender 3D software

 

Supervisor:

Rahul Chaudhari

Interactive story generation with visual input

Description

Conventional stories for children aged 3 to 6 years are static, independent of the medium (text, video, audio). We aim to make stories interactive by giving the user control over characters, objects, scenes, and timing. This leads to the construction of novel, unique, and personalized stories situated in (partially) familiar environments. We restrict this objective to specific domains consisting of a coherent body of works, such as the children’s book series “Meine Freundin Conni”. The challenges in this thesis include finding a suitable knowledge representation for the domain, learning that representation automatically, and inferring a novel storyline over that representation with active user interaction. Both neural and symbolic approaches should be explored in this direction.

So far we have implemented a text-based interactive story generation system based on Large Language Models. In this thesis, the text input modality should be replaced by visual input. In particular, the story should be driven by real-world motion of figurines and objects, rather than an abstract textual description of the scene and its dynamics.

 

Prerequisites

- Initial experience with 2D/3D Computer Vision and Computer Graphics

- Familiarity with AI incl. Deep Learning (university courses / practical experience)

- Programming in Python

 

Supervisor:

Rahul Chaudhari

Generating 3D facial expressions from speech

Description

Under this topic, the student should investigate Deep Learning approaches for animating faces of characters from speech signals. This involves generating facial expressions as well as lip/mouth movements according to what's being said in the speech signal.

 

References:

[1] NVIDIA audio2face https://www.nvidia.com/de-de/omniverse/apps/audio2face/

[2] Emotional Voice Puppetry https://napier-repository.worktribe.com/preview/3033680/EmotionalVoicePuppetry.pdf

 

Prerequisites

  • Familiarity and initial experience with Computer Vision and Deep Learning
  • Strong Python programming skills
  • Strong interest in 3D Computer Graphics and Computer Vision
  • Familiarity with Blender

 

Supervisor:

Rahul Chaudhari

Real-Time 3D Object Tracking and Pose Estimation of Textureless Objects

Keywords:
computer vision, machine learning, digital twin

Description

Real-time 3D tracking of objects using one or more cameras is crucial for building a Digital Twin. In this project, you will improve an algorithm for 3D tracking and pose estimation and use it to update a Digital Twin of a factory environment that is used in robotic manipulation tasks.

We will pay special attention to the tracking of textureless objects and the speed of the algorithm. We will also compare the results obtained with a single camera versus multiple cameras.

Prerequisites

For this work, good knowledge of C++ is required.

Some knowledge of Python and ROS will be useful, but it is not required.

Contact

diego.prado@tum.de

Supervisor:

Diego Fernandez Prado

HiWi Position: Project Lab Human Activity Understanding

Keywords:
deep learning, ROS, RealSense

Description

A HiWi position is available for the Lab Course Human Activity Understanding.

 

The position offers a 6 h/week contract.

 

The lab involves:

  • Practical Sessions, where the students collect data from a color/depth sensor setup.
  • Notebook Sessions, where the students are introduced to a Jupyter notebook with brief theoretical content and homework.
  • Project Sessions, where the students work on their own projects.

The main tasks of this position involve the following:

  • Helping students with data collection in Practical and Project Sessions.
  • Assisting during the notebook sessions with regard to the contents of the notebooks and homework. 

Prerequisites

  • Knowledge about ROS
  • Knowledge about Python
  • Basic knowledge of Deep Learning

Contact

marsil.zakour@tum.de

Supervisor:

Marsil Zakour

Real-time Multi-sensor Processing Framework Based on ROS

Description

Multi-sensor data can provide rich environmental information for robots. In practical applications, it is necessary to ensure real-time and synchronous processing of the sensor data. In this work, the student needs to design a ROS-based sensor data acquisition and processing framework and deploy it on an existing robot platform. Specifically, the sensors involved in this project include an RGB-D camera, a millimeter-wave radar, a LiDAR, and an IMU. Since there are clock deviations between the different sensors, the student needs to calibrate the clocks so that the timestamps of the collected data are consistent, transmit the collected data to the robot platform in real time, and process it into the required formats, such as point clouds and RGB images.
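
As a starting point, ROS already provides approximate time synchronization of multiple sensor streams via message_filters; a minimal sketch for ROS 1 is shown below. The topic names and message types are placeholders for the actual sensor setup, and the framework to be designed in this work goes well beyond this (clock calibration, real-time transmission, data conversion).

```python
# Minimal sketch (ROS 1, rospy): subscribe to several sensors and receive
# approximately time-aligned message tuples in one callback. Topic names are
# placeholders for the actual robot platform.
import rospy
import message_filters
from sensor_msgs.msg import Image, PointCloud2, Imu

def callback(rgb, depth, lidar, imu):
    # All messages in one call have timestamps within `slop` seconds of each other.
    rospy.loginfo("synced stamps: %s %s %s %s",
                  rgb.header.stamp, depth.header.stamp,
                  lidar.header.stamp, imu.header.stamp)

rospy.init_node("multi_sensor_sync")
subs = [
    message_filters.Subscriber("/camera/color/image_raw", Image),
    message_filters.Subscriber("/camera/depth/image_rect_raw", Image),
    message_filters.Subscriber("/lidar/points", PointCloud2),
    message_filters.Subscriber("/imu/data", Imu),
]
sync = message_filters.ApproximateTimeSynchronizer(subs, queue_size=30, slop=0.05)
sync.registerCallback(callback)
rospy.spin()
```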

Prerequisites

  • Strong familiarity with ROS, C++, and Python programming
  • Experience with hardware and sensors
  • Basic knowledge of robotics

Contact

mengchen.xiong@tum.de

dong.yang@tum.de

(Please attach your CV and transcript)

Supervisor:

Mengchen Xiong, Dong Yang

Synthesizing realistic grasps for hand-object interactions using Reinforcement Learning

Description

This topic is about synthesizing realistic grasps for virtual hand-object interactions using Reinforcement Learning. The work will be based on the D-GRASP paper published at IEEE CVPR 2022: https://eth-ait.github.io/d-grasp/

D-GRASP builds on the RaiSim simulator, a multi-body physics engine for robotics and AI. However, we are interested in porting and extending it for our own hand-object synthetic data generator based on the open-source Blender 3D software.

Prerequisites

  • Strong familiarity with Python programming and some experience with C++ programming
  • Interest and initial experience in Computer Graphics, VR, Computer Vision, and Deep Learning
  • Ideally also interest and experience in the Blender 3D software

 

Supervisor:

Rahul Chaudhari

Programming of a demo for vibrotactile compression

Keywords:
programming, haptics, compression

Description

In this work, you will implement a demo that presents vibrotactile signals to a user and asks for a quality rating. The goal is to process the data in real time, so that the latency introduced by the codec can be evaluated. If time permits, we will also implement a haptic interface with hardware such as the Phantom Omni.

 

 

The data to be processed is very similar to audio, so a C++ audio library (JUCE) will be used.

Along the way, you will also learn clean programming practices and how to use professional tools such as Git.

Prerequisites

The student should have decent programming knowledge, ideally in C++.

Contact

lars.nockenberg@tum.de

Supervision is possible in German and English.

Supervisor:

Lars Nockenberg

Generative Hand-Object Interactions Using Diffusion Models

Keywords:
deep-learning, diffusion, stable-diffusion, action, smpl-x, mano, hand-object-interaction

Description

Recently, there has been increasing success in generating human motion and object grasps. At the same time, an increasing number of datasets capture human hands' interactions with surrounding objects in addition to action labels.

 

One advantage of diffusion models is that they can easily be conditioned on different types of input, such as text embeddings and control parameters.

Your task will include exploring the existing models and implementing a hand-object interaction diffusion model.

Datasets we could use:

  •  https://hoi4d.github.io/
  •  https://taeinkwon.com/projects/h2o/
  •  https://dex-ycb.github.io/

Motion Generation Models

  • https://guytevet.github.io/mdm-page/
  • https://goal.is.tue.mpg.de/
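
To illustrate the conditioning mentioned above, here is a deliberately small sketch of a conditional diffusion training step in PyTorch, where a condition embedding (e.g. an action or text embedding) is concatenated to the noisy pose before the denoiser predicts the noise. The dimensions, the MLP denoiser, and all names are illustrative assumptions, not taken from the datasets or models listed above.

```python
# Minimal sketch of conditioning a diffusion model on extra inputs (here: an
# embedding concatenated to the noisy pose), assuming a simple MLP denoiser over
# flattened MANO-style hand pose vectors. Dimensions and names are illustrative.
import torch
import torch.nn as nn

POSE_DIM, COND_DIM, T = 51, 64, 1000           # e.g. 48 MANO params + 3 translation
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

denoiser = nn.Sequential(                      # predicts the added noise (epsilon)
    nn.Linear(POSE_DIM + COND_DIM + 1, 256), nn.SiLU(),
    nn.Linear(256, 256), nn.SiLU(),
    nn.Linear(256, POSE_DIM),
)
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-4)

def training_step(pose, cond):
    """pose: (B, POSE_DIM) clean hand poses, cond: (B, COND_DIM) condition embedding."""
    B = pose.shape[0]
    t = torch.randint(0, T, (B,))
    a = alphas_cumprod[t].unsqueeze(1)
    noise = torch.randn_like(pose)
    noisy = a.sqrt() * pose + (1 - a).sqrt() * noise      # forward diffusion q(x_t | x_0)
    inp = torch.cat([noisy, cond, t.unsqueeze(1).float() / T], dim=1)
    loss = nn.functional.mse_loss(denoiser(inp), noise)   # simple epsilon-prediction loss
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

print(training_step(torch.randn(8, POSE_DIM), torch.randn(8, COND_DIM)))
```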

Prerequisites

  • Experience and interest in deep learning research.
  • Knowledge and experience with Pytorch.

Contact

marsil.zakour@tum.de

Supervisor:

Marsil Zakour

Generate in Style: Identifying the Existence of Specific Styles in Images Generated by AI

Description

 

Generative AI models are demonstrating strong performance in various domains. Models such as Stable Diffusion, trained on billions of images, are capable of generating highly realistic images from text prompts. However, their capabilities are limited to generating content present in the datasets they were trained on. To incorporate new styles, such as the way an artist paints their pictures, these models can be fine-tuned with data containing the desired style.
Besides fine-tuning the generative model, a desired style can also be transferred to a given image. While style transfer methods are well researched, comparatively little research is available on how to verify whether a style transfer was successful. This thesis focuses on identifying whether a specific style is present in a given image. While style classification networks exist, they require a significant number of images of each new style, plus expensive retraining with all existing style images. Instead, this thesis should investigate how to identify the existence of one specific style in an image when given only a few examples of that style.
For this, a literature review on style transfer should be conducted, and promising directions for identifying the presence of a given style should be identified. Then, a prototype of a style identifier should be designed and evaluated on AI-generated content as well as on human-made images for comparison.
This thesis will be conducted externally at Sureel Inc., a startup specializing in secure and legal generative AI content.
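
One possible (not prescribed) starting point for a few-shot style identifier is to describe an image's style with Gram matrices of pretrained VGG features, as used in classical style transfer work, and compare a query image against the few style exemplars by cosine similarity. The sketch below illustrates this idea; the image paths and layer choices are placeholders.

```python
# Rough sketch (one possible direction, not a prescribed method): describe an
# image's style by Gram matrices of VGG-19 features and compare it against a few
# style exemplars with cosine similarity. Image paths and layer indices are
# placeholders.
import torch
import torchvision
from torchvision import transforms
from PIL import Image

vgg = torchvision.models.vgg19(weights="DEFAULT").features.eval()
LAYERS = {1, 6, 11, 20}                      # a few feature layers at different depths
prep = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
])

@torch.no_grad()
def style_descriptor(path):
    x = prep(Image.open(path).convert("RGB")).unsqueeze(0)
    grams = []
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in LAYERS:
            f = x.flatten(2)                           # (1, C, H*W)
            g = f @ f.transpose(1, 2) / f.shape[-1]    # Gram matrix (1, C, C)
            grams.append(torch.nn.functional.normalize(g.flatten(), dim=0))
        if i == max(LAYERS):
            break
    return torch.cat(grams)

# A handful of exemplars of the target style vs. one generated query image.
exemplars = [style_descriptor(p) for p in ["style_01.jpg", "style_02.jpg"]]
query = style_descriptor("generated.png")
scores = [torch.dot(query, e) / (query.norm() * e.norm()) for e in exemplars]
print(max(scores))   # a high value suggests the query shares the exemplars' style
```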

Prerequisites

Experience with Python and machine learning

Contact

Christopher Kuhn

christopher@sureel.io

Supervisor:

Fine-Tuning Generative AI Models: Quantifying the Impact of New Data

Description

 

Generative AI models are demonstrating strong performance in various domains. Models such as Stable Diffusion, trained on billions of images, are capable of generating highly realistic images from text prompts. However, their capabilities are limited to generating content present in the datasets they were trained on. To incorporate new styles, unique visual concepts, or personal faces, these models can be fine-tuned.
While the visual impact of such fine-tuning can be evaluated through manual inspection, little research exists on quantifying its effects. This thesis aims to investigate the impact of new training data on large generative models. Specifically, we will focus on studying the effects of fine-tuning the open-source text-to-image model Stable Diffusion.
First, an analysis of the changes in the model parameters can be performed. Second, the changes in the model's embeddings or outputs for prompts related to the new data can be analyzed. Additionally, existing methods for quantifying the effects of fine-tuning in machine learning should be researched, and their applicability to latent diffusion models should be discussed.
This thesis will be conducted externally at Sureel Inc., a startup specializing in secure and legal generative AI content.
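
Among the analyses mentioned above, a simple first step is to compare the parameters of the base and fine-tuned checkpoints layer by layer and rank layers by relative change. The sketch below assumes two plain PyTorch state dicts with matching keys; the file names are placeholders, and for Stable Diffusion the same idea would be applied to the UNet and/or text encoder weights.

```python
# Simple starting point (assumption: two compatible state_dict checkpoints):
# measure the relative per-layer parameter change caused by fine-tuning.
import torch

base = torch.load("base_model.pt", map_location="cpu")        # placeholder file names
tuned = torch.load("finetuned_model.pt", map_location="cpu")

report = []
for name, p_base in base.items():
    p_tuned = tuned[name]
    delta = (p_tuned - p_base).float()
    rel_change = delta.norm() / (p_base.float().norm() + 1e-12)
    report.append((name, rel_change.item()))

# Layers with the largest relative change are where the new data "landed".
for name, change in sorted(report, key=lambda r: -r[1])[:10]:
    print(f"{change:.4f}  {name}")
```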

Prerequisites

Experience with Python and machine learning

Contact

Christopher Kuhn

christopher@sureel.io

Supervisor:

iOS app for tracking objects using RGB and depth data

Description

This topic is about the development of an iPhone app for tracking objects in the environment using data from the device's RGB and depth sensors.

Prerequisites

  • Good programming experience with C++ and Python
  • Ideally, experience building iOS apps with Swift and/or Unity ARFoundation
  • This topic is only suitable for you if you have a recent personal Mac development device (ideally at least a MacBook Pro with Apple Silicon M1) and at least an iPhone 12 Pro with a LiDAR depth sensor

Supervisor:

Rahul Chaudhari

Force Rendering for Model Mediated Teleoperation

Keywords:
Haptics, Force Rendering, Digital Twin, Sensors, Robotics

Description

A Digital Twin is a virtual representation of an asset to which it is connected bi-directionally: changes in the real asset are reflected in the digital asset and vice versa.

In this project, you will improve force rendering algorithms to make teleoperation more user-friendly by means of the Digital Twin of a factory.
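
For context, a very common baseline for force rendering is a penalty-based spring-damper rule that pushes the haptic proxy out of the surface it penetrates. The sketch below illustrates this for a flat surface with illustrative gains; the actual work would refine such rules (e.g. in Chai3D) within the factory Digital Twin.

```python
# Minimal sketch of classic penalty-based force rendering (spring-damper) for a
# haptic tool touching a plane. Gains and geometry are illustrative only.
import numpy as np

K = 800.0   # stiffness [N/m]
B = 2.5     # damping  [N*s/m]

def render_force(tool_pos, tool_vel, plane_point, plane_normal):
    """Return the force applied to the haptic device for a flat-surface contact."""
    n = plane_normal / np.linalg.norm(plane_normal)
    penetration = np.dot(plane_point - tool_pos, n)   # > 0 when the tool is inside
    if penetration <= 0.0:
        return np.zeros(3)                            # no contact, no force
    v_n = np.dot(tool_vel, n)                         # velocity along the normal
    return K * penetration * n - B * v_n * n          # spring push-out minus damping

print(render_force(np.array([0.0, 0.0, -0.002]),      # 2 mm below the surface
                   np.array([0.0, 0.0, -0.05]),
                   plane_point=np.zeros(3),
                   plane_normal=np.array([0.0, 0.0, 1.0])))
```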

Prerequisites

Required:

  • Python knowledge
  • Chai3D (ideally you have participated in the Computational Haptics Laboratory)

Recommended (not all of them):

  • Experience in ROS
  • C++ knowledge
  • Robotics knowledge
  • MuJoCo

Contact

diego.prado@tum.de

Supervisor:

Diego Fernandez Prado

Reinforcement Learning for Estimating Virtual Fixture Geometry to Improve Robotic Manipulation

Description

Robotic teleoperation is often used to accomplish complex tasks remotely with a human in the loop. In cases where the task requires very precise manipulation, virtual fixtures can be used to restrict and guide the motion of the robot's end effector while the person teleoperates it. In this thesis, we will analyze the geometry of virtual fixtures depending on the scene and task. We will use reinforcement learning to estimate ideal virtual fixture model parameters. At the end of the thesis, the performance can be evaluated in user experiments.
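
As a rough illustration of how the problem could be framed for reinforcement learning, the sketch below defines a Gymnasium-style environment whose action adjusts virtual fixture parameters and whose reward trades off tracking error against constraint violations. The scene and task dynamics are stubbed with random placeholders, and all names and dimensions are illustrative assumptions.

```python
# Very rough framing sketch (Gymnasium API): actions set virtual fixture
# parameters, the reward penalizes tracking error and constraint violations.
# The dynamics are stubbed; a real setup would simulate the teleoperated robot.
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class VirtualFixtureEnv(gym.Env):
    def __init__(self):
        # Observation: end-effector pose error + simple scene features (stub).
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(12,), dtype=np.float32)
        # Action: fixture geometry, e.g. guidance axis + cone radius + stiffness.
        self.action_space = spaces.Box(-1.0, 1.0, shape=(5,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self._t = 0
        return self._observe(), {}

    def step(self, action):
        self._t += 1
        # Placeholder dynamics: random tracking error and occasional violation.
        tracking_error = float(np.random.rand())
        violation = float(np.random.rand() < 0.1)
        reward = -tracking_error - 5.0 * violation
        terminated = self._t >= 200
        return self._observe(), reward, terminated, False, {}

    def _observe(self):
        return self.observation_space.sample()

env = VirtualFixtureEnv()
obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
print(reward)
```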

Prerequisites

Useful background:

- Machine learning (Reinforcement Learning)

- Robotic simulation

 

Requirements:

- Experience with Python and deep learning frameworks (PyTorch, TensorFlow, ...)

- Experience with an RL framework

- Motivation to achieve a good outcome

 

Contact

(Please provide your CV and transcript in your application)

 

furkan.kaynar@tum.de

diego.prado@tum.de

 

Supervisor:

Diego Fernandez Prado

Multi-level Fingerprinting-based Indoor Localization Scheme

Keywords:
Indoor Localization, Multipath, Fingerprinting
Short Description:
Multi-layer reference map implementation for fingerprinting-based indoor localization.

Description

This work falls within the scope of indoor localization, more precisely fingerprinting-based indoor localization.

Your task will be to investigate the potential and outcomes of using a multi-layer reference map during the "offline phase", an improvement that has not yet been adopted or tested in current state-of-the-art fingerprinting schemes.

The aim is to achieve a better trade-off between performance and cost, yielding a better localization method.
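
For background, the sketch below shows the classic single-layer fingerprinting pipeline that the multi-layer reference map would extend: the offline phase stores RSSI fingerprints at known reference positions, and the online phase matches a new measurement with weighted k-nearest neighbors in signal space. All values are synthetic placeholders.

```python
# Background sketch of classic single-layer fingerprinting (the baseline the
# multi-layer reference map would extend). All values are synthetic placeholders.
import numpy as np

# Offline phase: RSSI vectors (dBm) from 4 access points at known 2D positions.
ref_positions = np.array([[0.0, 0.0], [0.0, 5.0], [5.0, 0.0], [5.0, 5.0], [2.5, 2.5]])
ref_fingerprints = np.array([
    [-40, -70, -68, -80],
    [-72, -42, -80, -69],
    [-69, -81, -41, -71],
    [-80, -70, -70, -43],
    [-60, -61, -62, -60],
], dtype=float)

def locate(measurement, k=3):
    """Online phase: weighted k-NN matching in signal space."""
    d = np.linalg.norm(ref_fingerprints - measurement, axis=1)   # Euclidean distance in dBm space
    idx = np.argsort(d)[:k]
    w = 1.0 / (d[idx] + 1e-6)                                    # closer fingerprints weigh more
    return (w[:, None] * ref_positions[idx]).sum(0) / w.sum()

print(locate(np.array([-58, -63, -64, -62.0])))   # should land near (2.5, 2.5)
```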

Prerequisites

Required: 

 - Python and/or Matlab

 - Basic knowledge in signal processing and wireless communication

 - Analytical thinking and creativity 

 

Contact

To get more info/details and initiate contact: 

 

Majdi.abdmoulah@tum.de

(Please attach your CV and transcript)

Supervisor:

Majdi Abdmoulah

Securing Audio with AI and Blockchain: A Study of Digital Watermarking Techniques

Description

This thesis project will examine the integration of artificial intelligence (AI) and blockchain technology for digital watermarking of audio. Digital watermarking is a technique used to embed hidden information, such as ownership or copyright information, into digital audio files. The goal of this project is to develop new AI-based techniques for digital watermarking that can be secured and protected using blockchain technology.
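
For context, the sketch below shows a conventional (non-AI) spread-spectrum watermarking baseline, where a pseudo-random sequence keyed by a secret seed is added to the audio at low amplitude and detected by correlation; the AI-based techniques to be developed in this thesis would go beyond this. All parameters and the host signal are synthetic placeholders.

```python
# Conceptual baseline sketch: spread-spectrum audio watermarking with a keyed
# pseudo-random sequence, detected by correlation. Synthetic placeholders only.
import numpy as np

FS, SEED, ALPHA = 44100, 1234, 0.01   # sample rate, secret key, embedding strength

def embed(audio, seed=SEED, alpha=ALPHA):
    rng = np.random.default_rng(seed)
    chip = rng.choice([-1.0, 1.0], size=audio.shape)   # pseudo-random +/-1 sequence
    return audio + alpha * chip

def detect(audio, seed=SEED):
    rng = np.random.default_rng(seed)
    chip = rng.choice([-1.0, 1.0], size=audio.shape)
    corr = np.dot(audio, chip) / len(audio)            # ~ALPHA if the watermark is present
    return corr > ALPHA / 2

t = np.arange(FS) / FS
host = 0.5 * np.sin(2 * np.pi * 440 * t)               # 1 s of a 440 Hz tone as host signal
print(detect(embed(host)), detect(host))               # -> True False
```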

Prerequisites:

  • Strong background in signal processing and digital audio
  • Familiarity with machine learning and AI techniques
  • Basic understanding of blockchain technology and its applications
  • Experience with programming languages such as Python and JavaScript
  • Strong analytical and problem-solving skills
  • Strong written and verbal communication skills

This project is an exciting opportunity to work at the intersection of AI and blockchain, where you will have the chance to apply your skills and knowledge to the development of new technologies that could have a significant impact on the audio industry. You will be working with an innovative startup in the heart of Silicon Valley, where you will have the opportunity to contribute to the development of cutting-edge technology. If you are passionate about AI, blockchain, and signal processing and are looking for a challenging and rewarding research experience, this thesis project is for you!

Please send your CV and Transcript of Records. Tell me why you are interested in this topic:

 

Contact

tamay@sureel.io

Supervisor:

Eckehard Steinbach - Dr.-Ing. Tamay Aykut (Sureel)

Securing Images/Videos with AI and Blockchain: A Study of Digital Watermarking Techniques

Description

This thesis project will examine the integration of artificial intelligence (AI) and blockchain technology for digital watermarking of images and videos. Digital watermarking is a technique used to embed hidden information, such as ownership or copyright information, into digital image and video files. The goal of this project is to develop new AI-based techniques for digital watermarking that can be secured and protected using blockchain technology.

Prerequisites:

  • Strong background in signal processing and digital images
  • Familiarity with machine learning and AI techniques
  • Basic understanding of blockchain technology and its applications
  • Experience with programming languages such as Python and JavaScript
  • Strong analytical and problem-solving skills
  • Strong written and verbal communication skills

This project is an exciting opportunity to work at the intersection of AI and blockchain, where you will have the chance to apply your skills and knowledge to the development of new technologies that could have a significant impact on the media industry. You will be working with an innovative startup in the heart of Silicon Valley, where you will have the opportunity to contribute to the development of cutting-edge technology. If you are passionate about AI, blockchain, and signal processing and are looking for a challenging and rewarding research experience, this thesis project is for you!

Please send your CV and Transcript of Records. Tell me why you are interested in this topic:

 

Contact

tamay@sureel.io

Supervisor:

Eckehard Steinbach - Dr.-Ing. Tamay Aykut (Sureel)

Unlocking the Potential of AI and Blockchain: Generative Multimedia

Description

We are excited to offer a unique and innovative thesis project that combines cutting-edge technology with digital multimedia. As an "AI and Blockchain Generative Media Researcher," you will have the opportunity to explore the potential of using AI algorithms to generate one-of-a-kind pieces of media content and using blockchain technology to protect the rights of the original creators via smart licensing mechanisms.

This project is an external thesis with a startup in San Francisco, which will give you the chance to work with real-world industry experts and gain valuable experience in a startup environment. This is not just about writing a thesis; it is about making a real-world impact on the media technology industry. You will have the chance to conduct research, explore new possibilities, and create something truly unique.

Prerequisites:

  • Strong background in signal processing and digital images
  • Familiarity with machine learning and AI techniques
  • Basic understanding of blockchain technology and its applications
  • Experience with programming languages such as Python and JavaScript
  • Strong analytical and problem-solving skills
  • Strong written and verbal communication skills

Don't miss this opportunity to be part of a revolutionary project that combines your passion for computer science and art. Apply now and take the first step in unlocking the potential of AI and blockchain technology in generative art, with the added bonus of gaining valuable experience working with a startup in San Francisco.

Please send your CV and Transcript of Records. Tell me why you are interested in this topic:

 

Contact

tamay@sureel.io

Supervisor:

Eckehard Steinbach - Dr.-Ing. Tamay Aykut (Sureel)

Decentralized DRM for Multimedia: Blockchain-Powered Encryption and Encoding

Description

We are excited to offer a cutting-edge thesis project that explores the potential of using blockchain technology to decentralize Digital Rights Management (DRM) in the music and video streaming industry. As a "Web3 DRM Researcher," you will have the opportunity to investigate the use of encryption and encoding techniques, powered by blockchain technology, to secure and protect digital content in a decentralized way.

This project will involve research on the current state of DRM technology used by companies like Spotify and Netflix, and the challenges they face in ensuring the security and protection of digital content. You will then explore the potential of blockchain technology to address these challenges, and investigate the implementation of encryption and encoding techniques to secure and protect digital content in a decentralized manner.

Prerequisites:

  • Strong background in signal processing and audio/video encryption/encoding
  • Familiarity with machine learning and AI techniques
  • Basic understanding of blockchain technology and its applications
  • Experience with programming languages such as Python and JavaScript
  • Strong analytical and problem-solving skills
  • Strong written and verbal communication skills

This is an exciting opportunity for a student to work on a cutting-edge project that has the potential to make a real-world impact on the music and video streaming industry. Apply now and take the first step in decentralizing web3 DRM using blockchain technology.

Please send your CV and Transcript of Records. Tell me why you are interested in this topic:

 

Contact

Supervisor:

Eckehard Steinbach - Dr.-Ing. Tamay Aykut (Sureel)

 

You can find important information about writing your thesis and giving a talk at LMT, as well as templates for PowerPoint and LaTeX, here.