Synthetic data generation for multi-sensor (visual-inertial) rigs in large-scale indoor 3D environments
Description
Synthetic data is revolutionizing Computer Vision by enabling the generation of large-scale, annotated datasets for training and evaluating algorithms. Real-world data collection is often limited by cost, privacy, and scalability. By simulating realistic sensor data in virtual 3D environments, we can accelerate research in 3D scene understanding, object detection, SLAM, and multi-modal perception.
Objective: In this thesis, you will design and implement a system to generate synthetic sensor data for multi-sensor rigs (e.g., cameras, LiDARs) in large-scale indoor 3D environments. The system will allow users to:
- Define custom sensor rigs (e.g., combinations of 2D/3D cameras, LiDARs, IMUs).
- Insert these rigs into a virtual world and control their trajectories.
- Render sequential data such as images, panoramas, and 3D point clouds for Computer Vision applications.
Your Tasks:
- Environment Setup: Choose and set up a suitable 3D simulation environment (e.g., NVIDIA Omniverse, CARLA, or Unity/Unreal Engine).
- Sensor Rig Definition: Develop an API to define and configure multi-sensor rigs with realistic sensor models (see the sketch after this task list).
- Trajectory Control: Implement functionality to move the rig along pre-defined or user-controlled trajectories within the virtual world.
- Data Generation: Render and export sequential sensor data (images, depth maps, point clouds, etc.) for use in downstream tasks like SLAM, object detection, or scene understanding.
- Validation: Evaluate the realism and utility of the generated data, potentially by training or testing a model on the synthetic dataset.
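To make the sensor rig definition task more concrete, below is a minimal, simulator-agnostic sketch of what such an API could look like in Python. All names (SensorRig, Camera, Lidar, Imu, Pose) and default parameters are hypothetical placeholders and would have to be mapped onto the scene and sensor API of the chosen engine (e.g., Omniverse or CARLA).

```python
# Minimal, simulator-agnostic sketch of a sensor-rig definition API.
# All names and defaults are hypothetical; they would be mapped onto the
# actual simulator's scene/sensor API (e.g., Omniverse, CARLA).
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Pose:
    """Mounting pose of a sensor relative to the rig origin."""
    translation: Tuple[float, float, float] = (0.0, 0.0, 0.0)   # metres
    rotation_rpy: Tuple[float, float, float] = (0.0, 0.0, 0.0)  # radians


@dataclass
class Camera:
    name: str
    resolution: Tuple[int, int] = (1280, 720)
    fov_deg: float = 90.0
    pose: Pose = field(default_factory=Pose)


@dataclass
class Lidar:
    name: str
    channels: int = 64
    rotation_hz: float = 10.0
    max_range_m: float = 100.0
    pose: Pose = field(default_factory=Pose)


@dataclass
class Imu:
    name: str
    rate_hz: float = 200.0
    pose: Pose = field(default_factory=Pose)


@dataclass
class SensorRig:
    """A named collection of sensors moved through the scene as one rigid body."""
    name: str
    cameras: List[Camera] = field(default_factory=list)
    lidars: List[Lidar] = field(default_factory=list)
    imus: List[Imu] = field(default_factory=list)


# Example: a stereo camera + LiDAR + IMU rig for indoor capture.
rig = SensorRig(
    name="indoor_vi_rig",
    cameras=[
        Camera("cam_left", pose=Pose(translation=(-0.06, 0.0, 0.0))),
        Camera("cam_right", pose=Pose(translation=(0.06, 0.0, 0.0))),
    ],
    lidars=[Lidar("lidar_top", pose=Pose(translation=(0.0, 0.0, 0.15)))],
    imus=[Imu("imu0")],
)
```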
Prerequisites
- Strong background in 3D Computer Graphics and Computer Vision.
- Proficiency in C++ and Python.
- Experience with 3D simulation tools (e.g., Omniverse, CARLA, Blender) is a plus.
- Self-motivated, creative, and eager to tackle challenging technical problems.
Contact: Interested candidates should send a CV, transcript, and a brief statement of motivation to the thesis supervisor.
Supervisor:
Rate distortion performance analysis of learned HDR image compression
Image coding, HDR imaging, Compression, RD optimization
Description
HDR imaging and videography improve visual quality at the cost of a sizable storage overhead. Compression techniques that reduce file sizes while keeping the quality acceptable would benefit both the HDR image and video pipelines, and their design can require considerable sophistication. This topic offers the student an opportunity to investigate the rate-distortion performance of different compression schemes and to study the design of conventional algorithmic coding of HDR images as well as modern learned HDR compression networks.
The student assigned to this topic will study and implement HDR image or video compression schemes, perform a detailed rate-distortion analysis (benchmarking) of the codecs across a number of objective quality metrics, compare bitrate savings, and propose potential improvements towards HDR image or video compression.
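As an illustration of the benchmarking part, the sketch below computes the Bjøntegaard delta rate (BD-rate), a standard summary of the average bitrate difference between two codecs over their common quality range. The rate/PSNR arrays in the example are made-up placeholder measurements; in the actual work they would come from running the codecs under test, and PSNR would typically be complemented by HDR-oriented quality metrics.

```python
import numpy as np


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta rate: average bitrate difference (in %) of the
    test codec against the anchor over their common quality range.
    Inputs are matching arrays of bitrate (e.g. bpp) and PSNR (dB)."""
    # Fit cubic polynomials log(rate) as a function of PSNR for both codecs.
    log_ra = np.polyfit(psnr_anchor, np.log(rate_anchor), 3)
    log_rt = np.polyfit(psnr_test, np.log(rate_test), 3)

    # Integrate only over the PSNR interval covered by both curves.
    lo = max(np.min(psnr_anchor), np.min(psnr_test))
    hi = min(np.max(psnr_anchor), np.max(psnr_test))
    int_a, int_t = np.polyint(log_ra), np.polyint(log_rt)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)

    # Back to the linear rate domain: relative bitrate change in percent.
    return (np.exp(avg_t - avg_a) - 1.0) * 100.0


# Hypothetical RD points for two codecs (four rate points each).
anchor_bpp = np.array([0.25, 0.50, 1.00, 2.00])
anchor_psnr = np.array([34.1, 37.0, 40.2, 43.5])
test_bpp = np.array([0.22, 0.45, 0.90, 1.80])
test_psnr = np.array([34.3, 37.2, 40.4, 43.6])

print(f"BD-rate: {bd_rate(anchor_bpp, anchor_psnr, test_bpp, test_psnr):.2f} %")
```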
Prerequisites
- Knowledge of image/video codecs such as JPEG-XT, JPEG-XL, H.265 (HEVC), and H.266 (VVC)
- Knowledge of HDR imaging/videography (including colour theory)
- Hands-on coding skills in C++, PyTorch, and shell for running readily available codecs and prototyping experimental ideas.
- Availability for a 3- or 6-month full-time placement, depending on the topic registration type
Some references:
- Cao, Peibei, et al. "Learned HDR Image Compression for Perceptually Optimal Storage and Display." European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2024.
- Shen, Xuelin, et al. "Breaking Boundaries: Unifying Imaging and Compression for HDR Image Compression." IEEE Transactions on Image Processing (2025).
Contact
Contact Hongjie You (hongjie.you@tum.de) for this topic and attach your tabular CV and transcript (Leistungsnachweis from TUM).
Please mention your desired topic type (Forschungspraxis, Masterarbeit, etc.) and start date for further discussion.
Supervisor:
Motion HDR photography with Gain Map and coding efficiency evaluation
Motion photo, Live Photo, HDR, Gain Map, Video coding
Leverage gain map technology to enable motion HDR photography by encoding and embedding gain map bitstreams into image containers.
Description
Live Photos (iOS) and Motion Photos (Android) are widely adopted on terminal devices, especially mobile phones, to enhance the visual experience by embedding a few seconds of video into a photograph. However, they cannot yet deliver an HDR effect in the embedded footage and therefore do not leverage the full potential of HDR displays, even though HDR still images are mature and standardised through the gain map and its metadata. A gain map scales the brightness of the SDR rendition towards the HDR rendition, which makes HDR imaging backward compatible and enables better realism.
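To make the gain map mechanism concrete, the sketch below applies a decoded gain map to a linear SDR base image, following the general log2-gain formulation used by gain map formats such as Ultra HDR. It is an illustrative simplification only: offsets, per-channel gain maps, gamma encoding of the map, and the exact weighting rules defined in the specifications are omitted, and all parameter names are placeholders.

```python
import numpy as np


def apply_gain_map(sdr_linear, gain_map, min_log2_gain, max_log2_gain,
                   display_headroom, max_content_boost):
    """Simplified gain-map application (illustrative only, not the exact
    Ultra HDR / ISO gain map math): scale a linear-light SDR image towards
    its HDR rendition according to the available display headroom.

    sdr_linear        : HxWx3 float array, linear SDR base image in [0, 1]
    gain_map          : HxW float array, gain map normalised to [0, 1]
    min/max_log2_gain : metadata giving the log2 gain range of the map
    display_headroom  : display peak / SDR white (1.0 = pure SDR display)
    max_content_boost : content peak / SDR white from the metadata
    """
    # Recover the per-pixel log2 gain from the normalised map.
    log2_gain = min_log2_gain + gain_map * (max_log2_gain - min_log2_gain)

    # Weight the gain by how much of the content boost the display can show.
    denom = max(np.log2(max(max_content_boost, 1.0 + 1e-6)), 1e-6)
    w = np.clip(np.log2(max(display_headroom, 1.0)) / denom, 0.0, 1.0)

    # HDR rendition: SDR scaled by the (weighted) per-pixel gain.
    return sdr_linear * (2.0 ** (w * log2_gain))[..., None]
```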
The scope of the topic includes, but is not limited to, an investigation of the feasibility of this approach and an evaluation of the coding efficiency of gain map bitstreams across a number of readily available codecs.
In addition, the student may have the opportunity to contribute to the ISO standardisation of next-generation HDR photography, with a specific focus on the motion/live image format.
This topic is intended as a Master's student's Research Internship (Forschungspraxis) or its extension into a Master's Thesis.
Prerequisites
- Practical C++ and Python programming experience (we will start with the Ultra HDR image format from the Android developer documentation).
- Knowledge of (HDR) image processing, video coding, and motion image formats.
- Willingness to read up on motion/live photo specifications.
- Independent problem-solving skills and willingness to learn and contribute to ISO standards.
Interested students should contact Hongjie You and attach to the email their tabular CV, transcript (Leistungsnachweis) from TUMonline, and a short description of their motivation, skills, and availability.
Contact
Hongjie You
hongjie.you@tum.de
Supervisor:
HDR gain map Python implementation and parameter tuning
HDR, gain map
Description
HDR gain map technology stores an SDR rendition and a gain map within a single image file; the gain map is used to scale the SDR image towards, or fully recover, the HDR rendition for viewing on displays with different HDR headrooms.
This topic includes a Python implementation of gain map technology: encoding gain maps and storing them with their metadata, and decoding gain maps to recover the HDR rendition. The student is also required to evaluate the impact of different gain map parameters and to investigate the coding efficiency of the gain map under certain conditions.
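As a possible starting point, here is a simplified sketch of the encoding direction: computing a single-channel gain map and its metadata from an aligned linear SDR/HDR pair. It deliberately ignores per-rendition offsets, per-channel maps, gamma encoding of the map, and the exact metadata fields of the gain map specifications; the function name and parameters are placeholders.

```python
import numpy as np


def encode_gain_map(sdr_linear, hdr_linear, offset=1e-6):
    """Simplified gain-map encoding (illustrative, not the full spec):
    compute a single-channel gain map and its metadata from an aligned
    linear SDR / linear HDR pair of the same resolution (HxWx3 floats)."""
    # Per-pixel luminance (Rec. 709 weights) to obtain a single-channel map.
    weights = np.array([0.2126, 0.7152, 0.0722])
    y_sdr = sdr_linear @ weights
    y_hdr = hdr_linear @ weights

    # Log2 ratio between HDR and SDR; the offset avoids division by zero.
    log2_gain = np.log2((y_hdr + offset) / (y_sdr + offset))

    # Metadata: the range needed to de-normalise the map at decode time.
    min_log2, max_log2 = float(log2_gain.min()), float(log2_gain.max())

    # Normalise to [0, 1] so the map can be stored as an 8-bit image.
    gain_map = (log2_gain - min_log2) / max(max_log2 - min_log2, 1e-6)
    return gain_map, {"min_log2_gain": min_log2, "max_log2_gain": max_log2}
```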
Prerequisites
- Coding skills in Python and C++ for the implementation
- Knowledge of image processing and file formats
- Availability for 3 months.
Please attach your tabular CV and the transcript issued by TUMonline.
Contact
Hongjie You (hongjie.you@tum.de)
Supervisor:
Optimizing Multimodal Tactile Codecs with Cross-Modal Vector Quantization
Description
To achieve better user immersion and interaction fidelity, developing a multimodal tactile codec is necessary. Exploiting the correlation between modalities to compress multimodal signals into compact latent representations is a key challenge in multimodal codecs. VQ-VAE introduces a discrete latent variable space to achieve efficient coding, and it is promising for extension to multimodal scenarios. This project aims to use multimodal vector quantization to encode multiple tactile signals into a shared latent space. This unified representation will reduce redundancy while preserving the information needed for reconstruction.
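For orientation, the sketch below shows a minimal VQ-VAE style vector quantization layer with a single codebook and straight-through gradients; extending such a layer so that the codebook is shared across tactile modalities is the kind of design this project would investigate. All dimensions and hyperparameters are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VectorQuantizer(nn.Module):
    """Minimal VQ-VAE style quantizer: maps continuous latents to the
    nearest entry of a learned codebook (straight-through gradients)."""

    def __init__(self, num_codes=512, code_dim=64, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta  # weight of the commitment term

    def forward(self, z):
        # z: (batch, code_dim) continuous latents from the encoder.
        # Squared distances to every codebook entry: (batch, num_codes).
        d = (z.pow(2).sum(1, keepdim=True)
             - 2 * z @ self.codebook.weight.t()
             + self.codebook.weight.pow(2).sum(1))
        indices = d.argmin(dim=1)
        z_q = self.codebook(indices)

        # Codebook and commitment losses as in VQ-VAE.
        loss = F.mse_loss(z_q, z.detach()) + self.beta * F.mse_loss(z, z_q.detach())

        # Straight-through estimator: copy gradients from z_q back to z.
        z_q = z + (z_q - z).detach()
        return z_q, indices, loss
```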
Prerequisites
- Knowledge of deep learning
- Programming skills (Python)
- Motivation for research
Contact
wenxuan.wei@tum.de
Supervisor:
Multimodal Tactile Data Compression through Shared-Private Representations
Description
The Tactile Internet relies on the real-time transmission of multimodal tactile data to enhance user immersion and fidelity. However, most existing tactile codecs are limited to vibrotactile data and cannot transmit richer multimodal signals.
This project aims to develop a novel tactile codec that supports multimodal data with a shared-private representation framework. A shared network will extract common semantic information from two modalities, while private networks capture modality-specific features. By sharing the common representations during reconstruction, the codec is expected to reduce the volume of data that needs to be transmitted.
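To illustrate the shared-private idea, the sketch below separates two tactile modalities into one shared representation and two private representations. The layer sizes, the averaging-based fusion of the shared features, and the modality dimensions are placeholder assumptions, not the intended final design.

```python
import torch
import torch.nn as nn


class SharedPrivateEncoder(nn.Module):
    """Sketch of a shared-private encoder for two tactile modalities:
    a shared network extracts the common representation, while private
    networks keep modality-specific detail."""

    def __init__(self, dim_a=256, dim_b=256, common_dim=128,
                 shared_dim=64, private_dim=32):
        super().__init__()
        # Per-modality projections into a common feature size.
        self.proj_a = nn.Linear(dim_a, common_dim)
        self.proj_b = nn.Linear(dim_b, common_dim)
        # Shared encoder applied to both modalities (weights are shared).
        self.shared = nn.Sequential(nn.Linear(common_dim, common_dim), nn.ReLU(),
                                    nn.Linear(common_dim, shared_dim))
        # Private encoders, one per modality.
        self.private_a = nn.Sequential(nn.Linear(common_dim, private_dim), nn.ReLU())
        self.private_b = nn.Sequential(nn.Linear(common_dim, private_dim), nn.ReLU())

    def forward(self, x_a, x_b):
        h_a, h_b = self.proj_a(x_a), self.proj_b(x_b)
        # Shared part: a single fused representation (here a simple average)
        # that would only need to be transmitted once.
        shared = 0.5 * (self.shared(h_a) + self.shared(h_b))
        return shared, self.private_a(h_a), self.private_b(h_b)


# Example: a batch of 8 windows from two hypothetical tactile modalities.
enc = SharedPrivateEncoder()
shared, priv_a, priv_b = enc(torch.randn(8, 256), torch.randn(8, 256))
```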
Prerequisites
- Knowledge of deep learning
- Programming skills (Python)
- Motivation for research
Contact
wenxuan.wei@tum.de