Motion HDR photography with Gain Map and coding efficiency evaluation
Motion photo, Live Photo, HDR, Gain Map, Video coding
Leverage gain map technology to enable motion HDR photography by encoding gain map bitstreams and embedding them into image containers.
Description
Live Photo (iOS) and Motion Photo (Android) have been widely adopted in terminal devices, especially mobile phones, to deliver an enhanced visual experience by embedding a few seconds of video into a photograph. However, these formats cannot yet deliver an HDR effect in the video footage and therefore do not exploit the full potential of HDR displays, even though HDR still images have matured and been standardised around the gain map and its metadata. A gain map scales the brightness of the SDR rendition up to the HDR rendition; HDR imaging built this way remains backward compatible while enabling better realism.
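For illustration, a minimal sketch of how an SDR rendition is scaled towards HDR for a given display headroom, assuming a normalized log2 gain map and metadata in the style of the public Ultra HDR formulation (function and parameter names are illustrative, not part of the topic material):

```python
import numpy as np

def apply_gain_map(sdr_linear, gain_map, min_log2_gain, max_log2_gain,
                   display_headroom, offset_sdr=1 / 64, offset_hdr=1 / 64):
    """Scale a linear SDR rendition towards HDR using a normalized gain map.

    gain_map is assumed normalized to [0, 1]; min/max_log2_gain come from the
    gain map metadata; display_headroom is the target display's peak-to-SDR
    luminance ratio (1.0 means a pure SDR display).
    """
    # Per-pixel log2 gain recovered from the normalized map and its metadata.
    log2_gain = min_log2_gain + gain_map * (max_log2_gain - min_log2_gain)

    # Weight between 0 (keep SDR) and 1 (full HDR), driven by available headroom.
    weight = np.clip(np.log2(display_headroom) / max(max_log2_gain, 1e-6), 0.0, 1.0)

    # Apply the weighted gain in the linear domain.
    return (sdr_linear + offset_sdr) * np.exp2(weight * log2_gain) - offset_hdr
```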
The scope of the topic covers not only a feasibility study but also, among other aspects, an evaluation of the coding efficiency of gain map bitstreams across a number of readily available codecs.
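As a rough illustration of such an evaluation, a hypothetical rate-distortion sweep over one readily available codec (Pillow's JPEG encoder is used here as a stand-in; the codec choice, quality range, and metric are placeholders):

```python
import io
import numpy as np
from PIL import Image

def psnr(a, b):
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return 10 * np.log10(255.0 ** 2 / mse)

def rate_distortion_sweep(gain_map_u8, codec="JPEG", qualities=range(10, 100, 10)):
    """Compress a single-channel 8-bit gain map at several quality levels
    and report (quality, bits-per-pixel, PSNR) points for one codec."""
    img = Image.fromarray(gain_map_u8, mode="L")
    points = []
    for q in qualities:
        buf = io.BytesIO()
        img.save(buf, format=codec, quality=q)
        bpp = 8 * buf.getbuffer().nbytes / gain_map_u8.size
        buf.seek(0)
        decoded = np.asarray(Image.open(buf))
        points.append((q, bpp, psnr(gain_map_u8, decoded)))
    return points
```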
In addition, the student may have the opportunity to contribute to the ISO standardisation of next-generation HDR photography, with a particular focus on the motion/live image format.
This topic is intended as a Master student's Research Internship (Forschungspraxis) or its extension to a Master's Thesis.
Prerequisites
Practical C++ and Python programming experience (we will start with the Ultra HDR image format from the Android developer documentation).
Knowledge of (HDR) image processing, video coding, and motion image formats.
(Reading up on motion/live photo specifications is recommended.)
Independent problem-solving skills and willingness to learn and contribute to ISO standards.
Interested students should contact Hongjie You and attach to the email a tabular CV, the transcript (Leistungsnachweis) from TUMonline, and a short description of their motivation, skills, and availability.
Contact
Hongjie You
hongjie.you@tum.de
Supervisor:
Collaborative Robotic Grasping during Teleoperation Tasks
Description
This topic focuses on improving robotic grasping networks and advancing embodied intelligence. Grasping is a fundamental capability in robotic manipulation and often plays a decisive role in the overall success of a task. Despite significant progress in learning-based grasping, current models still struggle with generalization and robustness in unstructured environments. Our goal is to enhance the success rate of existing grasping models and deploy them in real-world scenarios, where they can provide intelligent assistance during teleoperation tasks. By leveraging pre-trained grasping networks, we aim to reduce the human operator's workload, increase autonomy, and improve manipulation efficiency in complex and dynamic settings. This work offers a unique opportunity to work at the intersection of perception, control, and learning—pushing the boundaries of what robots can achieve through smarter, more adaptive grasping.
Prerequisites
- Good Programming Skills (Python, C++)
- Knowledge about Ubuntu/Linux/ROS
- Motivation to learn and conduct research
Contact
dong.yang@tum.de
(Please attach your CV and transcript)
Supervisor:
(Stereo) Depth Estimation in Challenging Conditions on Edge Devices
Description
This research focuses on enhancing stereo depth estimation techniques to operate effectively under challenging conditions on edge devices. The project aims to develop robust algorithms that can accurately estimate depth information in environments with varying lighting and weather conditions. By optimizing these algorithms for edge devices, the research ensures real-time processing and low-latency responses, which are crucial for portable navigation aids. The effectiveness of these improvements will be validated through a series of experiments, evaluating their performance in real-world scenarios.
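As a minimal baseline sketch of classical stereo depth estimation (OpenCV's semi-global block matching; the calibration values and file names are illustrative assumptions, not project specifics):

```python
import cv2
import numpy as np

# Load a rectified stereo pair (placeholder file names).
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

matcher = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,   # must be divisible by 16
    blockSize=5,
    P1=8 * 5 * 5,         # smoothness penalties
    P2=32 * 5 * 5,
    uniquenessRatio=10,
)
# SGBM returns fixed-point disparities scaled by 16.
disparity = matcher.compute(left, right).astype(np.float32) / 16.0

focal_px, baseline_m = 700.0, 0.12   # assumed calibration values
valid = disparity > 0
depth_m = np.zeros_like(disparity)
depth_m[valid] = focal_px * baseline_m / disparity[valid]
```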
Supervisor:
Obstacle Detection and Avoidance Systems Using Meta Aria Smart Glasses
Description
This research focuses on testing and evaluating obstacle detection and avoidance solutions using Meta Aria smart glasses (and other available smart glasses technologies). The project will explore the integration of various detection algorithms and avoidance strategies with these wearable devices to assess their effectiveness in real-world environments.
Supervisor:
HDR gain map Python implementation and parameter tuning
HDR, gain map
Description
HDR gain map technology stores an SDR rendition and a gain map within a single image file; the gain map is used to partially scale or fully recover the HDR rendition for viewing on displays with different HDR headrooms.
This topic includes an implementation of gain map technology in Python: encoding gain maps and storing them with their metadata, and decoding gain maps to recover the HDR rendition. The student is also required to evaluate the impact of different gain map parameters and to investigate the coding efficiency of the gain map under certain conditions.
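A minimal sketch of the encoding side, assuming linear single-channel SDR/HDR input and the common log2-ratio formulation (the normalization and metadata fields shown are illustrative):

```python
import numpy as np

def encode_gain_map(hdr_linear, sdr_linear, offset=1 / 64):
    """Compute a normalized 8-bit gain map and its metadata from linear
    HDR and SDR renditions of the same image (single-channel sketch)."""
    log2_gain = np.log2((hdr_linear + offset) / (sdr_linear + offset))
    min_g, max_g = float(log2_gain.min()), float(log2_gain.max())
    scale = max(max_g - min_g, 1e-6)                      # avoid division by zero
    gain_map = (log2_gain - min_g) / scale                # normalized to [0, 1]
    gain_map_u8 = np.round(gain_map * 255.0).astype(np.uint8)
    metadata = {"min_log2_gain": min_g, "max_log2_gain": max_g, "offset": offset}
    return gain_map_u8, metadata
```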
Prerequisites
Coding skills in Python and C++ for the implementation
Knowledge of image processing and file formats
Availability for 3 months.
Please attach your tabular CV and the transcript issued by TUMonline.
Contact
Hongjie You (hongjie.you@tum.de)
Supervisor:
Hand-Object Interaction Reconstruction via Diffusion Model
Diffusion Model; Computer Vision
Description
This topic explores the use of diffusion models—an advanced generative AI technique—to reconstruct hand-object interactions from RGB images or videos. By learning the complex dynamics of hand movements and object manipulation, the model generates accurate 3D representations, benefiting applications in augmented reality, robotics, and human-computer interaction.
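For orientation, a toy sketch of the reverse (denoising) process of a DDPM-style diffusion model in PyTorch; the noise-prediction network eps_model and the noise schedule are placeholders, not the project's actual architecture:

```python
import torch

@torch.no_grad()
def ddpm_sample(eps_model, shape, timesteps=1000, device="cpu"):
    """Toy DDPM reverse process: start from Gaussian noise and iteratively
    denoise with a learned noise predictor (which could be conditioned on an
    RGB image of the hand-object scene)."""
    betas = torch.linspace(1e-4, 0.02, timesteps, device=device)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(shape, device=device)   # e.g. latent 3D hand/object parameters
    for t in reversed(range(timesteps)):
        t_batch = torch.full((shape[0],), t, device=device)
        eps = eps_model(x, t_batch)
        mean = (x - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x
```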
Prerequisites
- Programming in Python
- Knowledge about Deep Learning
- Knowledge about Pytorch
Contact
xinguo.he@tum.de
Supervisor:
Optimizing Multimodal Tactile Codecs with Cross-Modal Vector Quantization
Description
To achieve better user immersion and interaction fidelity, a multimodal tactile codec is needed. Exploiting cross-modal correlation to compress multimodal signals into compact latent representations is a key challenge for such codecs. VQ-VAE introduces a discrete latent space for efficient coding and is a promising candidate for extension to multimodal scenarios. This project aims to use multimodal vector quantization to encode multiple tactile signals into a shared latent space. This unified representation should reduce redundancy while preserving the information needed for reconstruction.
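As a starting point, a minimal PyTorch sketch of a VQ-VAE-style quantizer with one codebook shared across the modality encoders (class and parameter names are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedVectorQuantizer(nn.Module):
    """Quantize latent vectors from several tactile modalities against one
    shared codebook (VQ-VAE style, straight-through gradient estimator)."""
    def __init__(self, num_codes=512, dim=64, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.codebook.weight.data.uniform_(-1 / num_codes, 1 / num_codes)
        self.beta = beta

    def forward(self, z):                    # z: (batch, dim) from any modality encoder
        d = torch.cdist(z, self.codebook.weight)   # distances to all codewords
        idx = d.argmin(dim=1)                       # nearest-codeword indices
        z_q = self.codebook(idx)
        # Codebook and commitment losses as in VQ-VAE.
        loss = F.mse_loss(z_q, z.detach()) + self.beta * F.mse_loss(z, z_q.detach())
        z_q = z + (z_q - z).detach()                # straight-through estimator
        return z_q, idx, loss
```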
Prerequisites
- Knowledge of deep learning
- Programming skills (Python)
- Motivation for research
Contact
wenxuan.wei@tum.de
Supervisor:
Multimodal Tactile Data Compression through Shared-Private Representations
Description
The Tactile Internet relies on the real-time transmission of multimodal tactile data to enhance user immersion and fidelity. However, most existing tactile codecs are limited to vibrotactile data and cannot transmit richer multimodal signals.
This project aims to develop a novel tactile codec that supports multimodal data with a shared-private representation framework. A shared network will extract common semantic information from two modalities, while private networks capture modality-specific features. By sharing the common representations during reconstruction, the codec is expected to reduce the volume of data that needs to be transmitted.
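A minimal PyTorch sketch of the intended shared-private split for two modalities (dimensions and layer sizes are illustrative placeholders):

```python
import torch
import torch.nn as nn

class SharedPrivateCodec(nn.Module):
    """Sketch of a two-modality codec: one shared encoder for common semantics,
    one private encoder per modality, and per-modality decoders that consume
    the concatenation of the shared and private codes."""
    def __init__(self, dim_a, dim_b, latent=32):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(dim_a + dim_b, 64), nn.ReLU(), nn.Linear(64, latent))
        self.priv_a = nn.Sequential(nn.Linear(dim_a, 64), nn.ReLU(), nn.Linear(64, latent))
        self.priv_b = nn.Sequential(nn.Linear(dim_b, 64), nn.ReLU(), nn.Linear(64, latent))
        self.dec_a = nn.Linear(2 * latent, dim_a)
        self.dec_b = nn.Linear(2 * latent, dim_b)

    def forward(self, xa, xb):
        s = self.shared(torch.cat([xa, xb], dim=-1))   # transmitted once for both modalities
        za, zb = self.priv_a(xa), self.priv_b(xb)      # modality-specific residual information
        return self.dec_a(torch.cat([s, za], dim=-1)), self.dec_b(torch.cat([s, zb], dim=-1))
```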
Prerequisites
- Knowledge of deep learning
- Programming skills (Python)
- Motivation for research
Contact
wenxuan.wei@tum.de