Development of a Deep Learning Toolkit
Description
Your task is to implement a series of deep learning applications and toolkits, together with the database they require.
Requirements: knowledge of deep learning, PyTorch, and programming in Python or C++
Supervisor:
Robust Hand-Object Pose estimation from Multi-view 2D Keypoints
Description
Hand-object pose estimation is challenging because of factors such as occlusion and ambiguity in pose recovery. Multi-view camera systems are used to mitigate these problems.
Using 2D keypoint detectors for hands and objects, such as YOLOv8-pose and MMPose, we can lift the 2D detections to 3D. However, the detections are usually noisy, and some keypoints may be missing.
We want to use deep learning methods to smooth, inpaint, and lift these detections to 3D in order to estimate the pose of the corresponding hands and objects.
The task is formulated as follows:
Given a sequence of noisy 2D keypoints for human hands and an object, captured from calibrated camera views, use a deep learning model to estimate a smooth trajectory of the hand and object poses.
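As a point of reference for the learned model, a classical (non-learned) baseline for the smoothing and inpainting steps could look like the sketch below. It rests on assumptions that are not part of the topic description: one coordinate of one keypoint is stored as a list of floats, `None` marks a missed detection, gaps are filled by linear interpolation, and a centered moving average does the smoothing. The function names `inpaint_linear` and `smooth_moving_average` are hypothetical.

```python
from typing import List, Optional


def inpaint_linear(track: List[Optional[float]]) -> List[float]:
    """Fill missing values (None) by linear interpolation between valid neighbours.

    Gaps at the start or end are filled with the nearest valid value.
    """
    valid = [i for i, v in enumerate(track) if v is not None]
    if not valid:
        raise ValueError("track contains no valid detections")
    filled = list(track)
    # Leading and trailing gaps: copy the nearest valid value.
    for i in range(valid[0]):
        filled[i] = track[valid[0]]
    for i in range(valid[-1] + 1, len(track)):
        filled[i] = track[valid[-1]]
    # Interior gaps: interpolate linearly between the surrounding detections.
    for left, right in zip(valid, valid[1:]):
        for i in range(left + 1, right):
            w = (i - left) / (right - left)
            filled[i] = (1 - w) * track[left] + w * track[right]
    return filled


def smooth_moving_average(track: List[float], radius: int = 1) -> List[float]:
    """Centered moving average; the window is truncated at the sequence borders."""
    out = []
    for i in range(len(track)):
        lo, hi = max(0, i - radius), min(len(track), i + radius + 1)
        out.append(sum(track[lo:hi]) / (hi - lo))
    return out


# Toy data (assumed): x-coordinate of one keypoint in one camera view.
x_track = [None, 10.0, None, None, 16.0, 30.0, 20.0, None]
x_filled = inpaint_linear(x_track)
x_smooth = smooth_moving_average(x_filled, radius=1)
```

A baseline like this ignores the multi-view geometry and the hand and object structure, which is where a learned model is expected to do better.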
Prerequisites
- Python
- Knowledge of deep learning
- Knowledge of PyTorch
- Previous knowledge of 3D data processing is a plus.
Contact
marsil.zakour@tum.de
Supervisor:
Realistic differentiable hand model for synthesizing hand-object interaction data
Description
Our hand-object interaction simulator currently uses a hand mesh driven by a bone rig. This approach has several limitations. The size and shape of the hand cannot be easily controlled. The hand texture cannot be easily changed to another realistic texture. In this thesis, the above limitations should be addressed by relying on the components below.
To replace the fixed hand mesh with a differentiable hand model and drive the hand using motion capture data: https://handover-sim.github.io/
To let the virtual hand interact physically with virtual objects (turns the hand into an articulated rigid body after the shape deformation): https://github.com/ikalevatykh/mano_pybullet
For integrating hand textures from real hands: https://handtracker.mpi-inf.mpg.de/projects/HandTextureModel/
Prerequisites
- Strong familiarity with Python programming and the Linux environment
- Interest and initial experience in computer graphics, VR, computer vision, and deep learning
- Ideally also interest and experience in the Blender 3D software
Supervisor:
Interactive story generation with visual input
Description
Conventional stories for children aged 3 to 6 are static, regardless of the medium (text, video, audio). We aim to make stories interactive by giving the user control over characters, objects, scenes, and timing. This will lead to the construction of novel, unique, and personalized stories situated in (partially) familiar environments. We restrict this objective to specific domains consisting of a coherent body of works, such as the children’s book series “Meine Freundin Conni”. The challenges in this thesis include finding a suitable knowledge representation for the domain, learning that representation automatically, and inferring a novel storyline over that representation with active user interaction. Both neural and symbolic approaches should be explored.
So far we have implemented a text-based interactive story generation system based on Large Language Models. In this thesis, the text input modality should be replaced by visual input. In particular, the story should be driven by real-world motion of figurines and objects, rather than an abstract textual description of the scene and its dynamics.
Prerequisites
- Initial experience with 2D/3D computer vision and computer graphics
- Familiarity with AI, including deep learning (university courses or practical experience)
- Programming in Python
Supervisor:
Generating 3D facial expressions from speech
Description
Under this topic, the student should investigate Deep Learning approaches for animating faces of characters from speech signals. This involves generating facial expressions as well as lip/mouth movements according to what's being said in the speech signal.
References:
[1] NVIDIA audio2face https://www.nvidia.com/de-de/omniverse/apps/audio2face/
[2] Emotional Voice Puppetry https://napier-repository.worktribe.com/preview/3033680/EmotionalVoicePuppetry.pdf
Prerequisites
- Familiarity and initial experience with computer vision and deep learning
- Strong Python programming skills
- Strong interest in 3D computer graphics and computer vision
- Familiarity with Blender
Supervisor:
Real-Time 3D Object Tracking and Pose Estimation of Textureless Objects
computer vision, machine learning, digital twin
Description
Real-time 3D tracking of objects using one or more cameras is crucial for building a Digital Twin. In this project, you will improve an algorithm for 3D tracking and pose estimation, and use it to update a Digital Twin of a factory environment that is used in robotic manipulation tasks.
We will pay special attention to the tracking of textureless objects and to the speed of the algorithm. We will also compare the results obtained with a single camera and with multiple cameras.
Prerequisites
Good knowledge of C++ is required for this work.
Some knowledge of Python and ROS is useful but not required.
Contact
diego.prado@tum.de
Supervisor:
HiWi Position: Project Lab Human Activity Understanding
deep-learning, ros, real-sense
Description
A HiWi (student assistant) position is available for the lab course Human Activity Understanding.
The position comes with a contract of 6 h/week.
The lab involves:
- Practical Sessions where the students collect data from a color/depth sensor setup.
- Notebook Sessions where the students are introduced to a Jupyter notebook with brief theoretical content and homework.
- Project Sessions, where the students are working on their own projects.
The main tasks of this position involve the following:
- Helping students with data collection in Practical and Project Sessions.
- Assisting during the notebook sessions with regard to the contents of the notebooks and homework.
Prerequisites
- Knowledge of ROS
- Knowledge of Python
- Basic knowledge of deep learning
Contact
marsil.zakour@tum.de
Supervisor:
Real-time Multi-sensor Processing Framework Based on ROS
Description
Multi-sensor data can provide rich environmental information for robots. In practical applications, sensor data must be processed synchronously and in real time. In this work, the student will design a ROS-based framework for sensor data acquisition and processing and deploy it on an existing robot platform. The sensors involved in this project are an RGB-D camera, a millimeter-wave radar, a LiDAR, and an IMU. The clocks of the different sensors deviate from one another. The student needs to calibrate the clocks against a common reference so that the timestamps of the collected data are consistent, transmit the collected data to the robot platform in real time, and process them into the required formats, such as point clouds and RGB images.
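The timestamp alignment described above can be illustrated with a minimal sketch. It assumes, beyond what the topic states, that each sensor's clock differs from the reference clock by a constant offset that has already been estimated, and that samples are paired by nearest timestamp within a tolerance. All function names and numeric values are hypothetical.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple


def correct_offset(stamps: List[float], offset: float) -> List[float]:
    """Map sensor-local timestamps onto the reference clock (constant offset assumed)."""
    return [t - offset for t in stamps]


def nearest_index(sorted_stamps: List[float], t: float) -> int:
    """Index of the timestamp in sorted_stamps that is closest to t."""
    pos = bisect_left(sorted_stamps, t)
    if pos == 0:
        return 0
    if pos == len(sorted_stamps):
        return len(sorted_stamps) - 1
    before, after = sorted_stamps[pos - 1], sorted_stamps[pos]
    return pos if after - t < t - before else pos - 1


def match_streams(
    ref: List[float], other: List[float], tol: float
) -> List[Tuple[int, Optional[int]]]:
    """Pair each reference sample with the closest sample of the other stream.

    Pairs whose time difference exceeds tol get None instead of an index.
    """
    pairs = []
    for i, t in enumerate(ref):
        j = nearest_index(other, t)
        pairs.append((i, j if abs(other[j] - t) <= tol else None))
    return pairs


# Assumed values: the camera clock is the reference, the LiDAR clock runs 0.5 s ahead.
camera_stamps = [0.00, 0.10, 0.20, 0.30]
lidar_raw = [0.51, 0.61, 0.69, 0.95]
lidar_stamps = correct_offset(lidar_raw, offset=0.5)
pairs = match_streams(camera_stamps, lidar_stamps, tol=0.02)
```

In a ROS setup, the `message_filters` package provides an approximate-time synchronizer for this kind of matching; estimating the offsets themselves (and any clock drift) is the harder part and is not shown here.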
Prerequisites
- Strong familiarity with ROS, C++, and Python programming
- Experience with hardware and sensors
- Basic knowledge of robotics
Contact
mengchen.xiong@tum.de
dong.yang@tum.de
(Please attach your CV and transcript)
Supervisor:
Synthesizing realistic grasps for hand-object interactions using Reinforcement Learning
Description
This topic is about synthesizing realistic grasps for virtual hand-object interactions using Reinforcement Learning. The work will be based on the D-GRASP paper published at IEEE CVPR 2022: https://eth-ait.github.io/d-grasp/
D-GRASP is built into the RaiSim simulator, which is a multi-body physics engine for robotics and AI. We are however interested in porting and extending it for our own hand-object synthetic data generator based on the open-source Blender 3D software.
Prerequisites
- Strong familiarity with Python programming and some experience with C++ programming
- Interest and initial experience in computer graphics, VR, computer vision, and deep learning
- Ideally also interest and experience in the Blender 3D software
Supervisor:
Programming of a demo for vibrotactile compression
programming, haptics, compression
Description
In this work, you will implement a demo that presents vibrotactile signals to a user and asks for a quality rating. The goal is to process the data in real time so that the latency introduced by the codec can be evaluated. If enough time is left, we will also implement a haptic interface with hardware such as the Phantom Omni.
The data to be processed closely resembles audio, so a C++ audio library (JUCE) will be used.
Along the way, you will also learn clean programming and the use of professional tools such as Git.
Prerequisites
The student should have decent programming knowledge, ideally in C++.
Contact
lars.nockenberg@tum.de
Supervision is possible in German and English.
Supervisor:
Generative Hand-Object Interactions Using Diffusion Models
deep-learning, diffusion, stable-diffusion, action, smpl-x, mano, hand-object-interaction
Description
Recently, there has been increasing success in the generation of human motion and object grasps. At the same time, a growing number of datasets capture the interaction of human hands with surrounding objects, together with action labels.
One advantage of diffusion models is that they can easily be conditioned on different types of input, such as text embeddings and control parameters.
Your task will include exploring the existing models and implementing the hand-object interaction diffusion model.
Datasets we could use:
- https://hoi4d.github.io/
- https://taeinkwon.com/projects/h2o/
- https://dex-ycb.github.io/
Motion generation models:
- https://guytevet.github.io/mdm-page/
- https://goal.is.tue.mpg.de/
Prerequisites
- Experience and interest in deep learning research
- Knowledge of and experience with PyTorch
Contact
marsil.zakour@tum.de
Supervisor:
Generate in Style: Identifying the Existence of Specific Styles in Images Generated by AI
Description
Generative AI models are demonstrating strong performance in various domains. Models such as Stable Diffusion, trained using billions of images, are capable of generating highly realistic images based on text prompts. However, their capabilities are limited to generating content present in established datasets. To incorporate new styles, such as the way an artist paints their pictures, these models can be fine-tuned with data containing the desired style.
Besides fine-tuning the generative model, a desired style can also be transferred to a given image. While style transfer methods are well researched, comparatively little research is available on how to identify whether the style transfer was successful. This thesis focuses on identifying whether a specific style is present in a given image. While style classification networks exist, they require a large number of images of each new style, plus expensive retraining with all existing style images. Instead, this thesis should investigate how to identify the presence of one specific style in an image when given only a few examples of that style. For this, a literature review on style transfer should be conducted and promising directions for identifying the presence of a given style should be identified. Then, a prototype of a style identifier should be designed and evaluated on AI-generated content, with human-made images as a comparison.
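One possible direction for the few-shot setting, shown here only as a rough sketch and not as the method of this thesis, is prototype matching: average the embeddings of the few available style examples and compare a query image's embedding against that mean. The sketch assumes that embeddings come from some pretrained image encoder (not shown); the 3-dimensional vectors, the threshold of 0.8, and all function names are made up for illustration.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def style_prototype(examples: List[Sequence[float]]) -> List[float]:
    """Mean embedding of the few available examples of the style."""
    dim = len(examples[0])
    return [sum(e[d] for e in examples) / len(examples) for d in range(dim)]


def has_style(query: Sequence[float], prototype: Sequence[float],
              threshold: float = 0.8) -> bool:
    """Declare the style present if the query is close enough to the prototype."""
    return cosine_similarity(query, prototype) >= threshold


# Toy embeddings (assumed): three examples of the target style.
style_examples = [[0.9, 0.1, 0.0], [1.0, 0.0, 0.1], [0.8, 0.2, 0.1]]
prototype = style_prototype(style_examples)
in_style = has_style([0.95, 0.1, 0.05], prototype)
out_of_style = has_style([0.0, 0.1, 1.0], prototype)
```

Whether a generic embedding separates style from content well enough is exactly the kind of question the literature review would need to answer.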
This thesis will be conducted externally at Sureel Inc., a startup specializing in secure and legal generative AI content.
Prerequisites
Experience with Python and machine learning
Contact
Christopher Kuhn
christopher@sureel.io
Supervisor:
Fine-Tuning Generative AI Models: Quantifying the Impact of New Data
Description
Generative AI models are demonstrating strong performance in various domains. Models such as Stable Diffusion, trained using billions of images, are capable of generating highly realistic images based on text prompts. However, their capabilities are limited to generating content present in established datasets. To incorporate new styles, unique visual concepts, or personal faces, these models can be fine-tuned.
While the visual impact of such fine-tuning can be evaluated through manual inspection, little research exists on quantifying its effects. This thesis aims to investigate the impact of new training data on large generative models. Specifically, we will focus on studying the effects of fine-tuning the open-source text-to-image model Stable Diffusion.
To this end, the changes in the model parameters can first be analyzed. Second, the changes in the embeddings or outputs of the model when given prompts related to the new data can be analyzed. Additionally, existing methods for quantifying the effects of fine-tuning in machine learning should be reviewed, and their applicability to latent diffusion models should be discussed.
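As a simple illustration of the parameter analysis, the relative L2 change per layer between the base and the fine-tuned weights can be computed as in the sketch below. This is one possible metric, not one prescribed by the topic. Weights are represented as plain lists keyed by layer name; the layer names and values are invented, and in practice they would be read from the two model checkpoints.

```python
import math
from typing import Dict, List


def relative_l2_change(
    base: Dict[str, List[float]], tuned: Dict[str, List[float]]
) -> Dict[str, float]:
    """Per-layer ||w_tuned - w_base||_2 / ||w_base||_2."""
    changes = {}
    for name, w_base in base.items():
        w_tuned = tuned[name]
        diff = math.sqrt(sum((t - b) ** 2 for t, b in zip(w_tuned, w_base)))
        norm = math.sqrt(sum(b * b for b in w_base))
        changes[name] = diff / norm
    return changes


# Hypothetical layer names and toy weights.
base_weights = {"attn.to_q": [3.0, 4.0], "attn.to_k": [1.0, 0.0]}
tuned_weights = {"attn.to_q": [3.0, 4.5], "attn.to_k": [1.0, 0.0]}
changes = relative_l2_change(base_weights, tuned_weights)
most_changed = max(changes, key=changes.get)
```

Ranking layers by this value gives a first picture of where fine-tuning concentrated its updates.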
This thesis will be conducted externally at Sureel Inc., a startup specializing in secure and legal generative AI content.
Prerequisites
Experience with Python and machine learning
Contact
Christopher Kuhn
christopher@sureel.io
Supervisor:
iOS app for tracking objects using RGB and depth data
Description
This topic is about the development of an iPhone app for tracking objects in the environment using data from the device's RGB and depth sensors.
Prerequisites
- Good programming experience with C++ and Python
- Ideally, experience building iOS apps with Swift and/or Unity ARFoundation
- This topic is only suitable for you if you have a recent personal Mac development device (ideally at least a MacBook Pro with Apple Silicon M1) and at least an iPhone 12 Pro with a LiDAR depth sensor
Supervisor:
Force Rendering for Model Mediated Teleoperation
Haptics, Force Rendering, Digital Twin, Sensors, Robotics
Description
A Digital Twin is a virtual representation of an asset to which it is connected bidirectionally: changes in the real asset are reflected in the digital asset and vice versa.
In this project, you will improve force rendering algorithms to make teleoperation more user-friendly by means of the Digital Twin of a factory.
Prerequisites
Required:
- Python knowledge
- Chai3D (ideally you have participated in the Computational Haptics Laboratory)
Recommended (not all of them):
- Experience in ROS
- C++ knowledge
- Robotics knowledge
- MuJoCo
Contact
diego.prado@tum.de
Supervisor:
Reinforcement Learning for Estimating Virtual Fixture Geometry to Improve Robotic Manipulation
Description
Robotic teleoperation is often used to accomplish complex tasks remotely with a human in the loop. In cases where the task requires very precise manipulation, virtual fixtures can be used to restrict and guide the motion of the robot's end effector while the person teleoperates. In this thesis, we will analyze the geometry of virtual fixtures depending on the scene and task. We will use reinforcement learning to estimate ideal virtual fixture model parameters. At the end of the thesis, the performance can be evaluated with user experiments.
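To make the notion of virtual fixture model parameters concrete, the sketch below shows one simple guidance fixture: a line segment toward which the end effector is pulled by a virtual spring. The segment endpoints and the stiffness are examples of parameters that a learning method could estimate. The fixture type, function names, and values are assumptions for illustration and are not taken from the topic description.

```python
from typing import Tuple

Vec = Tuple[float, float, float]


def closest_point_on_segment(p: Vec, a: Vec, b: Vec) -> Vec:
    """Orthogonal projection of p onto the segment a-b, clamped to the endpoints."""
    ab = tuple(bi - ai for ai, bi in zip(a, b))
    ap = tuple(pi - ai for ai, pi in zip(a, p))
    denom = sum(c * c for c in ab)
    t = 0.0 if denom == 0 else sum(x * y for x, y in zip(ap, ab)) / denom
    t = max(0.0, min(1.0, t))
    return tuple(ai + t * c for ai, c in zip(a, ab))


def guidance_force(p: Vec, a: Vec, b: Vec, stiffness: float) -> Vec:
    """Spring force that pulls the end effector at p toward the fixture segment."""
    q = closest_point_on_segment(p, a, b)
    return tuple(stiffness * (qi - pi) for qi, pi in zip(q, p))


# Assumed fixture: unit segment along the x-axis, stiffness 100 N/m.
force = guidance_force(
    (0.5, 0.2, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), stiffness=100.0
)
```

Here the end effector is 0.2 m off the segment in y, so the force points back along negative y with a magnitude of about 20 N.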
Prerequisites
Useful background:
- Machine learning (reinforcement learning)
- Robotic simulation
Requirements:
- Experience with Python and deep learning frameworks (PyTorch, TensorFlow, ...)
- Experience with an RL framework
- Motivation to achieve a good outcome
Contact
(Please provide your CV and transcript in your application)
furkan.kaynar@tum.de
diego.prado@tum.de
Supervisor:
Multi-level Fingerprinting-based Indoor Localization Scheme
Indoor Localization, Multipath, Fingerprinting
Multi-layer reference map implementation for fingerprinting-based indoor localization.
Description
This work falls within the scope of indoor localization, more precisely fingerprinting-based indoor localization.
Your task will be to investigate the potential and outcomes of using a multi-layer reference map during the "offline phase", an improvement idea that has not yet been adopted or tested in current state-of-the-art fingerprinting schemes.
The aim is to achieve a better trade-off between performance and cost, yielding a better localization method.
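For orientation, a conventional single-layer fingerprinting scheme can be sketched as follows: the offline phase stores signal measurements at known positions in a reference map, and the online phase estimates a position from the k reference points closest in signal space. This is only a baseline sketch; the multi-layer map proposed in this topic is not shown. The use of RSSI as the fingerprint feature, the numeric values, and the function name `locate` are assumptions.

```python
import math
from typing import List, Tuple

# ((x, y) position in metres, RSSI per access point in dBm)
Fingerprint = Tuple[Tuple[float, float], List[float]]


def locate(
    reference_map: List[Fingerprint], measurement: List[float], k: int = 2
) -> Tuple[float, float]:
    """Online phase: mean position of the k reference points closest in signal space."""
    ranked = sorted(reference_map, key=lambda fp: math.dist(fp[1], measurement))
    nearest = ranked[:k]
    x = sum(pos[0] for pos, _ in nearest) / len(nearest)
    y = sum(pos[1] for pos, _ in nearest) / len(nearest)
    return (x, y)


# Offline phase (toy values): fingerprints recorded at four known positions.
reference_map = [
    ((0.0, 0.0), [-40.0, -70.0, -80.0]),
    ((4.0, 0.0), [-55.0, -55.0, -75.0]),
    ((8.0, 0.0), [-70.0, -40.0, -70.0]),
    ((4.0, 4.0), [-60.0, -60.0, -50.0]),
]
estimate = locate(reference_map, [-50.0, -60.0, -78.0], k=2)
```

The cost side of the trade-off shows up in the size of `reference_map`: every entry has to be measured on site during the offline phase.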
Prerequisites
Required:
- Python and/or MATLAB
- Basic knowledge of signal processing and wireless communication
- Analytical thinking and creativity
Contact
For more information and to get in touch:
Majdi.abdmoulah@tum.de
(Please attach your CV and transcript)
Supervisor:
Securing Audio with AI and Blockchain: A Study of Digital Watermarking Techniques
Description
This thesis project will examine the integration of artificial intelligence (AI) and blockchain technology for digital watermarking of audio. Digital watermarking is a technique used to embed hidden information, such as ownership or copyright information, into digital audio files. The goal of this project is to develop new AI-based techniques for digital watermarking that can be secured and protected using blockchain technology.
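To illustrate what embedding hidden information into audio means at the sample level, the sketch below uses classical least-significant-bit (LSB) embedding on integer PCM samples. It is a textbook baseline chosen for illustration, not the AI-based technique this project aims to develop, and it involves no blockchain component. LSB watermarks are fragile and do not survive lossy compression. The function names and the toy signal are assumptions.

```python
from typing import List


def embed_lsb(samples: List[int], payload: bytes) -> List[int]:
    """Write the payload bits (MSB first) into the least significant bit of each sample."""
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(samples):
        raise ValueError("payload too long for this signal")
    marked = list(samples)
    for i, bit in enumerate(bits):
        marked[i] = (marked[i] & ~1) | bit
    return marked


def extract_lsb(samples: List[int], n_bytes: int) -> bytes:
    """Read n_bytes back from the least significant bits of the first samples."""
    out = bytearray()
    for b in range(n_bytes):
        value = 0
        for s in samples[8 * b: 8 * b + 8]:
            value = (value << 1) | (s & 1)
        out.append(value)
    return bytes(out)


# Toy signal (assumed): 16 integer PCM samples, enough for a 2-byte payload.
pcm = [1000, -2000, 3000, -4000, 5000, -6000, 7000, -8000] * 2
marked = embed_lsb(pcm, b"TU")
recovered = extract_lsb(marked, 2)
max_error = max(abs(a - b) for a, b in zip(pcm, marked))
```

Each sample changes by at most one quantization step, which is why the mark is inaudible but also trivially destroyed by re-encoding.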
Prerequisites:
- Strong background in signal processing and digital audio
- Familiarity with machine learning and AI techniques
- Basic understanding of blockchain technology and its applications
- Experience with programming languages such as Python and JavaScript
- Strong analytical and problem-solving skills
- Strong written and verbal communication skills
This project is an exciting opportunity to work at the intersection of AI and blockchain, where you will have the chance to apply your skills and knowledge to the development of new technologies that could have a significant impact on the audio industry. You will be working with an innovative startup in the heart of Silicon Valley, where you will have the opportunity to contribute to the development of cutting-edge technology. If you are passionate about AI, blockchain, and signal processing and are looking for a challenging and rewarding research experience, this thesis project is for you!
Please send your CV and Transcript of Records. Tell me why you are interested in this topic:
Contact
tamay@sureel.io
Supervisor:
Securing Images/Videos with AI and Blockchain: A Study of Digital Watermarking Techniques
Description
This thesis project will examine the integration of artificial intelligence (AI) and blockchain technology for digital watermarking of images/videos. Digital watermarking is a technique used to embed hidden information, such as ownership or copyright information, into image files. The goal of this project is to develop new AI-based techniques for digital watermarking that can be secured and protected using blockchain technology.
Prerequisites:
- Strong background in signal processing and digital images
- Familiarity with machine learning and AI techniques
- Basic understanding of blockchain technology and its applications
- Experience with programming languages such as Python and JavaScript
- Strong analytical and problem-solving skills
- Strong written and verbal communication skills
This project is an exciting opportunity to work at the intersection of AI and blockchain, where you will have the chance to apply your skills and knowledge to the development of new technologies that could have a significant impact on the media industry. You will be working with an innovative startup in the heart of Silicon Valley, where you will have the opportunity to contribute to the development of cutting-edge technology. If you are passionate about AI, blockchain, and signal processing and are looking for a challenging and rewarding research experience, this thesis project is for you!
Please send your CV and Transcript of Records. Tell me why you are interested in this topic:
Contact
tamay@sureel.io
Supervisor:
Unlocking the Potential of AI and Blockchain: Generative Multimedia
Description
We are excited to offer a unique and innovative thesis project that combines cutting-edge technology with digital multimedia. As an "AI and Blockchain Generative Media Researcher," you will have the opportunity to explore the potential of using AI algorithms to generate one-of-a-kind pieces of media content and using blockchain technology to protect the rights of the original creators via smart licensing mechanisms.
This project is an external thesis with a startup in San Francisco, which will give you the chance to work with real-world industry experts and gain valuable experience in a startup environment. This is not just about writing a thesis, it's about making a real-world impact on the media technology industry. You will have the chance to conduct research, explore new possibilities and create something truly unique.
Prerequisites:
- Strong background in signal processing and digital images
- Familiarity with machine learning and AI techniques
- Basic understanding of blockchain technology and its applications
- Experience with programming languages such as Python and JavaScript
- Strong analytical and problem-solving skills
- Strong written and verbal communication skills
Don't miss this opportunity to be part of a revolutionary project that combines your passion for computer science and art. Apply now and take the first step in unlocking the potential of AI and blockchain technology in generative art, with the added bonus of gaining valuable experience working with a startup in San Francisco.
Please send your CV and Transcript of Records. Tell me why you are interested in this topic:
Contact
tamay@sureel.io
Supervisor:
Decentralized DRM for Multimedia: Blockchain-Powered Encryption and Encoding
Description
We are excited to offer a cutting-edge thesis project that explores the potential of using blockchain technology to decentralize Digital Rights Management (DRM) in the music and video streaming industry. As a "Web3 DRM Researcher," you will have the opportunity to investigate the use of encryption and encoding techniques, powered by blockchain technology, to secure and protect digital content in a decentralized way.
This project will involve research on the current state of DRM technology used by companies like Spotify and Netflix, and the challenges they face in ensuring the security and protection of digital content. You will then explore the potential of blockchain technology to address these challenges, and investigate the implementation of encryption and encoding techniques to secure and protect digital content in a decentralized manner.
Prerequisites:
- Strong background in signal processing and audio/video encryption/encoding
- Familiarity with machine learning and AI techniques
- Basic understanding of blockchain technology and its applications
- Experience with programming languages such as Python and JavaScript
- Strong analytical and problem-solving skills
- Strong written and verbal communication skills
This is an exciting opportunity for a student to work on a cutting-edge project that has the potential to make a real-world impact on the music and video streaming industry. Apply now and take the first step in decentralizing web3 DRM using blockchain technology.
Please send your CV and Transcript of Records. Tell me why you are interested in this topic: