Context-based 3D Animations of Vehicles, Human and Animal Figurines
Description
The goal of this thesis is to animate 3D objects such as vehicles, humans, and animals based on multimodal contextual information.
A simple example: the object's real-world 3D trajectory can be used to classify whether it is currently moving or idle. Based on the classification result, the corresponding animation is played on the object: a breathing animation if the object is idle, and a walking/running animation if the object is in motion.
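As an illustration, here is a minimal sketch of that idle/moving decision, assuming the trajectory arrives as an array of 3D positions sampled at a fixed frame rate; the frame rate and speed threshold are placeholder values, not tuned constants:

```python
import numpy as np

def classify_motion(traj, fps=30.0, speed_thresh=0.2):
    """Label a 3D trajectory of shape (T, 3), in meters, as 'idle' or 'moving'.

    fps and speed_thresh (m/s) are illustrative defaults.
    """
    # Per-frame speed from consecutive position differences.
    speeds = np.linalg.norm(np.diff(traj, axis=0), axis=1) * fps
    return "moving" if np.median(speeds) > speed_thresh else "idle"

# The result selects the clip: "breathing" when idle, "walking"/"running" when moving.
```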
This idea can be extended further to produce more complex animations. For example, if a dog gets wet due to rain in an evolving story, the subsequent animation produced should be "shaking off water from the body".
Possible steps include:
- Using our Large Language Model-based system to generate novel animations.
- Designing and evaluating a novel Machine Learning model that decides which animation to play based on the 3D trajectories of the objects, the semantic and geometric configuration of the 3D scene graph, user input, and the context of an evolving story (a toy sketch follows this list). The 3D trajectories can be obtained from our already operational pose tracking system.
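The toy sketch below shows one possible shape such a model could take, assuming the trajectory is a sequence of 3D positions and the scene-graph/story context has already been embedded into a fixed-size vector; all layer sizes and the animation vocabulary are invented for illustration:

```python
import torch
import torch.nn as nn

class AnimationSelector(nn.Module):
    """Toy model: 3D trajectory + context embedding -> animation logits."""
    def __init__(self, traj_dim=64, ctx_dim=128, n_animations=8):
        super().__init__()
        # Encode the (B, T, 3) trajectory with a GRU.
        self.traj_enc = nn.GRU(input_size=3, hidden_size=traj_dim, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(traj_dim + ctx_dim, 128),
            nn.ReLU(),
            nn.Linear(128, n_animations),
        )

    def forward(self, traj, ctx):
        # traj: (B, T, 3) positions; ctx: (B, ctx_dim) scene/story embedding.
        _, h = self.traj_enc(traj)
        return self.head(torch.cat([h[-1], ctx], dim=-1))  # (B, n_animations)
```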
Prerequisites
- Working knowledge of Blender
- Python
- Initial experience in training and evaluation of Machine Learning models
Supervisor:
3D Scene Navigation Using Free-hand Gestures
3D, Blender, Python, hand tracking, gesture recognition
Description
The goal of this bachelor thesis project is to design and evaluate a 3D scene navigation system based on free-hand gestures.
Possible steps include:
- Modeling a 3D world in Blender (an existing pre-designed world may also be used, e.g. from Sketchfab)
- Designing a distinct set of hand gestures that allows comprehensive navigation of the 3D world (i.e. to control camera translation and rotation based on hand gestures). It should be possible for the user to navigate to any place in the 3D world quickly, efficiently, and intuitively.
- The Google MediaPipe framework can be used to detect and track hand keypoints (see the sketch after this list). On top of that, a novel gesture recognition model should be trained and evaluated.
- Comparing, contrasting, and benchmarking the performance of this system against the standard keyboard+mouse-based navigation capabilities offered by Blender.
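As a starting point, here is a minimal sketch of keypoint extraction with the legacy MediaPipe Hands solution; the 63 landmark coordinates per frame could serve as input features for the gesture recognition model:

```python
import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(max_num_hands=1, min_detection_confidence=0.5)
cap = cv2.VideoCapture(0)
for _ in range(300):  # process ~10 s of webcam video at 30 fps
    ok, frame = cap.read()
    if not ok:
        break
    # MediaPipe expects RGB input; OpenCV delivers BGR.
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        # 21 landmarks with normalized x, y and relative depth z.
        lm = results.multi_hand_landmarks[0].landmark
        features = [(p.x, p.y, p.z) for p in lm]  # input to a gesture classifier
cap.release()
```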
Start date: 01.04.2025
Prerequisites
- Working knowledge of Blender and Python
- Interest in 3D worlds and human-computer interaction
Supervisor:
Self-Supervised IMU Denoising for Visual-Inertial SLAM
Self-supervised Learning, IMU denoising
Description
In Visual-Inertial SLAM (Simultaneous Localization and Mapping), inertial measurement units (IMUs) are crucial for estimating motion. However, IMU data often contains noise that accumulates over time and degrades SLAM performance. Self-supervised machine learning techniques can denoise IMU data automatically, without requiring labeled datasets. By leveraging self-supervised training, this project explores how neural networks can learn to distinguish useful IMU signal patterns from noise, improving the accuracy of motion estimation and the robustness of Visual-Inertial SLAM systems.
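One possible starting point is masked reconstruction on raw 6-axis IMU streams (3-axis accelerometer + 3-axis gyroscope), sketched below; the Transformer size and masking ratio are assumptions, not a prescribed design:

```python
import torch
import torch.nn as nn

class IMUDenoiser(nn.Module):
    """Tiny Transformer encoder that reconstructs masked IMU samples."""
    def __init__(self, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(6, d_model)  # accel (3) + gyro (3) per timestep
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.out = nn.Linear(d_model, 6)

    def forward(self, x):  # x: (B, T, 6)
        return self.out(self.encoder(self.embed(x)))

def masked_reconstruction_loss(model, x, mask_ratio=0.15):
    # Self-supervised objective: zero out random timesteps, predict them back.
    mask = torch.rand(x.shape[:2], device=x.device) < mask_ratio  # (B, T)
    x_in = x.clone()
    x_in[mask] = 0.0
    pred = model(x_in)
    return ((pred[mask] - x[mask]) ** 2).mean()
```

A model trained this way could then be applied to the raw IMU stream before it enters the SLAM front end.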
Prerequisites
- Knowledge of Machine Learning and Transformer architectures.
- Motivation to learn and research.
- Good coding skills in C++ and Python.
- Project experience in Machine Learning (PyTorch) is a plus.
Contact
xin.su@tum.de
Supervisor:
Scene Graph-based Real-time Scene Understanding for Assistive Robot Manipulation Tasks
Description
With the rapid development of embodied intelligent robots, real-time and accurate scene understanding is crucial for robots to complete tasks efficiently and effectively. Scene graphs represent objects and their relations in a scene via a graph structure. Previous studies have generated scene graphs from images or 3D scenes, often with the assistance of large language models (LLMs).
In this work, we investigate the application of scene graphs to assisting the human operator during teleoperated manipulation tasks. Leveraging scene graphs generated in real time, the robot system can obtain a comprehensive understanding of the scene and reason about the best way to complete the manipulation task given the current robot state.
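As a toy illustration of the data structure involved (all object names and relations below are invented), a scene graph can be as simple as typed nodes plus relation triples:

```python
# Toy scene graph: nodes are objects with attributes, edges are relation triples.
scene_graph = {
    "nodes": {
        "table_1": {"category": "table", "position": (0.0, 0.0, 0.75)},
        "cup_1": {"category": "cup", "position": (0.1, 0.2, 0.80)},
        "gripper": {"category": "robot_gripper", "state": "open"},
    },
    "edges": [
        ("cup_1", "on_top_of", "table_1"),
        ("gripper", "close_to", "cup_1"),
    ],
}

def related_subjects(graph, relation, obj):
    """Hypothetical helper: all subjects standing in `relation` to `obj`."""
    return [s for (s, r, o) in graph["edges"] if r == relation and o == obj]

print(related_subjects(scene_graph, "on_top_of", "table_1"))  # ['cup_1']
```

In the actual system, such a graph would be generated and updated in real time from sensor data, and could be serialized as text for an LLM to reason over.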
Prerequisites
- Good Programming Skills (Python, C++)
- Knowledge about Ubuntu/Linux/ROS
- Motivation to learn and conduct research
Contact
dong.yang@tum.de
(Please attach your CV and transcript)
Supervisor:
Equivariant 3D Object Detection
3D Object Detection, Computer Vision, Deep Learning, Indoor Environments
Description
The thesis focuses on the application of equivariant deep learning techniques for 3D object detection in indoor scenes. Indoor environments, such as homes, offices, and industrial settings, present unique challenges for 3D object detection due to diverse object arrangements, varying lighting conditions, and occlusions. Traditional methods often struggle with these complexities, leading to suboptimal performance. The motivation for this research is to enhance the robustness and accuracy of 3D object detection in these environments, leveraging the inherent advantages of equivariant deep learning. This approach aims to improve the model's ability to recognize objects regardless of their orientation and position in the scene, which is crucial for applications in robotics and augmented reality.
The thesis proposes the development of a deep learning model that incorporates equivariant neural networks for 3D object detection, such as the Vector Neurons framework proposed in [1]. The proposed model will be evaluated on a benchmark indoor 3D dataset, such as the Stanford 3D Indoor Spaces Dataset (S3DIS), ScanNet [2], or Replica [3].
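To make the equivariance idea concrete, below is a minimal sketch of a vector-neuron-style linear layer in the spirit of [1]: features are channels of 3D vectors, and the layer mixes channels without touching the spatial axis, so it commutes with any rotation by construction:

```python
import torch
import torch.nn as nn

class VNLinear(nn.Module):
    """Linear layer over channels of 3D vectors; rotation-equivariant by design."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(c_out, c_in) / c_in ** 0.5)

    def forward(self, x):  # x: (B, C_in, 3, N) -> (B, C_out, 3, N)
        return torch.einsum("oi,bivn->bovn", self.weight, x)

# Equivariance check: rotating the input rotates the output identically.
x = torch.randn(2, 8, 3, 16)
Q, _ = torch.linalg.qr(torch.randn(3, 3))  # random orthogonal 3x3 matrix
layer = VNLinear(8, 4)
rotated_then_mapped = layer(torch.einsum("ij,bcjn->bcin", Q, x))
mapped_then_rotated = torch.einsum("ij,bcjn->bcin", Q, layer(x))
assert torch.allclose(rotated_then_mapped, mapped_then_rotated, atol=1e-5)
```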
References
[1] Deng, Congyue, et al. "Vector Neurons: A General Framework for SO(3)-Equivariant Networks." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.
[2] Dai, Angela, et al. "ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
[3] Straub, Julian, et al. "The Replica Dataset: A Digital Replica of Indoor Spaces." arXiv preprint arXiv:1906.05797 (2019).
Prerequisites
- Python and Git
- Experience with a deep learning framework (PyTorch, TensorFlow)
- Interest in Computer Vision and Machine Learning