Open Theses

Context-based 3D Animations of Vehicle, Human, and Animal Figurines

Description

The goal of this thesis is to animate 3D objects such as vehicles, humans, and animals based on multimodal contextual information.

A simple example: real-world 3D trajectory data of the object can be used to classify whether a given object is moving or idle. Based on the classification result, the corresponding animation is played on the object -- a breathing animation if the object is idle, and a walking/running animation if the object is in motion.
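A minimal sketch of this idle/moving decision, assuming the trajectory is available as timestamped 3D positions (the threshold value and animation names are purely illustrative):

    import numpy as np

    def classify_motion(positions, timestamps, speed_threshold=0.05):
        """Label a short trajectory window as 'idle' or 'moving'.
        positions: (N, 3) array of 3D positions in metres; timestamps: (N,) seconds."""
        positions = np.asarray(positions, dtype=float)
        timestamps = np.asarray(timestamps, dtype=float)
        # Mean speed over the window: total path length divided by elapsed time.
        path_length = np.linalg.norm(np.diff(positions, axis=0), axis=1).sum()
        mean_speed = path_length / max(timestamps[-1] - timestamps[0], 1e-6)
        return "moving" if mean_speed > speed_threshold else "idle"

    # Example: a figurine tracked for one second (positions are made up).
    state = classify_motion([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.3, 0.0, 0.0]],
                            [0.0, 0.5, 1.0])
    animation_clip = {"idle": "breathing", "moving": "walking"}[state]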

This idea can be extended further to produce more complex animations. For example, if a dog gets wet due to rain in an evolving story, the subsequent animation produced should be "shaking off water from the body".

Possible steps include:

  • Using our Large Language Model based system to generate novel animations.
  • Designing and evaluating a novel Machine Learning model that decides which animation to play based on the 3D trajectories of the objects, the semantic and geometric configuration of the 3D scenegraph, user input, and the context of an evolving story. The 3D trajectories can be obtained from our already operational pose tracking system.

Prerequisites

  • Working knowledge of Blender
  • Python
  • Initial experience in training and evaluation of Machine Learning models

Supervisor:

Rahul Chaudhari

3D Scene Navigation Using Free-hand Gestures

Keywords:
3D, Blender, Python, hand tracking, gesture recognition

Description

The goal of this bachelor thesis project is to design and evaluate a 3D scene navigation system based on free-hand gestures.

Possible steps include:

  • Modeling a 3D world in Blender (an existing pre-designed world may also be used, e.g. from Sketchfab)
  • Designing a distinct set of hand gestures that allows comprehensive navigation of the 3D world (i.e. to control camera translation and rotation based on hand gestures). It should be possible for the user to navigate to any place in the 3D world quickly, efficiently, and intuitively.
  • The Google MediaPipe framework can be used to detect and track hand keypoints (see the sketch after this list). On top of that, a novel gesture recognition model should be trained and evaluated.
  • Comparing, contrasting, and benchmarking the performance of this system against the standard keyboard+mouse-based navigation capabilities offered by Blender.
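As a starting point, hand keypoints can be obtained from a webcam stream with MediaPipe; the gesture recognition model and the mapping from gestures to camera commands are the parts to be developed in the thesis and are only stubbed out here (a minimal sketch, not the final design):

    import cv2
    import mediapipe as mp

    hands = mp.solutions.hands.Hands(max_num_hands=1, min_detection_confidence=0.5)
    cap = cv2.VideoCapture(0)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB input; OpenCV delivers BGR frames.
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            landmarks = results.multi_hand_landmarks[0].landmark
            # 21 keypoints with normalized x, y, z: the feature vector a
            # gesture recognition model would consume.
            features = [(p.x, p.y, p.z) for p in landmarks]
            # gesture = gesture_model.predict(features)    # to be trained in the thesis
            # apply_camera_motion(gesture)                  # hypothetical mapping to camera control
    cap.release()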

Start date: 01.04.2025

Prerequisites

  • Working knowledge of Blender and Python
  • Interest in 3D worlds and human-computer interaction

Supervisor:

Rahul Chaudhari

Ongoing Theses

Master's Theses

Real-time registration of noisy, incomplete and partially-occluded 3D pointclouds

Description

This topic is about the registration of 3D pointclouds belonging to certain objects in the scene, rather than about registering different pointclouds of the scene itself.

State-of-the-art (SOTA) pointcloud registration models/algorithms should be first reviewed, and promising candidates should be selected for evaluation based on the criteria listed below.

  • The method must work in real-time (at least 25 frames per second) for at least 5 different objects at the same time.
  • The method must be robust to noise in the pointclouds, which come from an Intel RealSense D435 RGB+Depth camera.
  • The method must be able to robustly track the objects of interest even if they are occluded partially by other objects.

The best-suited method must then be extended or improved in a novel way, or a completely novel method should be developed.

Both classical and Deep Learning based methods must be considered.
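As one example of a classical baseline to benchmark learned methods against, point-to-plane ICP from Open3D can be run per object and per frame (a minimal sketch; the file names are placeholders, and in a tracking setting the previous frame's pose would serve as the initialization):

    import numpy as np
    import open3d as o3d

    # Placeholders: a clean object model and a noisy, partial crop from a depth frame.
    source = o3d.io.read_point_cloud("object_model.ply")
    target = o3d.io.read_point_cloud("depth_frame_crop.ply")

    # Downsampling keeps per-frame cost low; normals enable point-to-plane ICP,
    # which tends to be more robust on noisy depth data.
    source = source.voxel_down_sample(voxel_size=0.005)
    target = target.voxel_down_sample(voxel_size=0.005)
    target.estimate_normals(o3d.geometry.KDTreeSearchParamHybrid(radius=0.02, max_nn=30))

    result = o3d.pipelines.registration.registration_icp(
        source, target, max_correspondence_distance=0.02,
        init=np.eye(4),  # previous-frame pose in a tracker
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPlane())
    print(result.fitness, result.transformation)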

Related work:

  • DeepGMR: https://github.com/wentaoyuan/deepgmr
  • 3D Object Tracking with Transformer: https://github.com/3bobo/lttr


Prerequisites

  • First experience with 3D data processing / Computer Vision
  • Python programming, ideally also familiarity with C++
  • Familiarity with Linux and the command line

Supervisor:

Rahul Chaudhari

Learning 3D skeleton animations of animals from videos

Description

Under this topic, the student should investigate how to learn 3D skeleton animations of animals from videos. The 2D skeleton should first be extracted automatically from the video; a state-of-the-art 3D animal shape-and-pose model (SMAL, see references below) should then be fitted to the extracted skeleton.
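The core of the fitting step is an optimization of the model's pose and shape parameters against the detected 2D keypoints via a reprojection loss. The sketch below shows only this pattern; the dummy joint model, the pinhole projection, and the parameter dimensions are hypothetical stand-ins for the real SMAL forward pass and camera model:

    import torch

    NUM_JOINTS = 33  # illustrative; use the joint count of the chosen SMAL variant

    def joints_3d(pose, shape):
        # Placeholder for the SMAL forward pass (pose/shape -> 3D joints).
        return pose.view(NUM_JOINTS, 3) + shape.mean()

    def project(points_3d, focal=1000.0):
        # Placeholder pinhole projection onto the image plane.
        z = points_3d[:, 2:3] + 2.0
        return focal * points_3d[:, :2] / z.clamp(min=1e-3)

    def fit_frame(keypoints_2d, confidences, n_iters=200):
        pose = torch.zeros(NUM_JOINTS * 3, requires_grad=True)
        shape = torch.zeros(41, requires_grad=True)  # SMAL-style shape coefficients
        optim = torch.optim.Adam([pose, shape], lr=0.01)
        for _ in range(n_iters):
            optim.zero_grad()
            joints_2d = project(joints_3d(pose, shape))
            # Confidence-weighted reprojection error plus a weak shape prior.
            loss = (confidences[:, None] * (joints_2d - keypoints_2d) ** 2).sum()
            loss = loss + 1e-3 * shape.pow(2).sum()
            loss.backward()
            optim.step()
        return pose.detach(), shape.detach()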

References

  • https://smal.is.tue.mpg.de/index.html
  • https://smalr.is.tue.mpg.de/
  • https://github.com/silviazuffi/smalr_online
  • https://github.com/silviazuffi/gloss_skeleton
  • https://github.com/silviazuffi/smalst
  • https://github.com/benjiebob/SMALify
  • https://github.com/benjiebob/SMALViewer
  • https://bmvc2022.mpi-inf.mpg.de/0848.pdf

Dataset

  • https://research.google.com/youtube8m/explore.html
  • https://youtube-vos.org/dataset/vos/
  • https://data.vision.ee.ethz.ch/cvl/youtube-objects/
  • https://blog.roboflow.com/youtube-video-computer-vision/
  • https://github.com/gtoderici/sports-1m-dataset/ (this dataset seems to provide raw videos from YT)
  • https://github.com/pandorgan/APT-36K
  • https://calvin-vision.net/datasets/tigdog/: contains all the videos, the behavior labels, the landmarks, and the segmentation masks for all three object classes (dog, horse, tiger)
  • https://github.com/hellock/WLD (raw videos)
  • https://sutdcv.github.io/Animal-Kingdom/
  • https://sites.google.com/view/animal-pose/

Prerequisites

  • Background in Computer Vision, Optimization techniques, and Deep Learning
  • Python programming

Supervisor:

Rahul Chaudhari

Interactive story generation with visual input

Description

Conventional stories for children aged 3 to 6 years are static, independent of the medium (text, video, audio). We aim to make stories interactive by giving the user control over characters, objects, scenes, and timing. This leads to novel, unique, and personalized stories situated in (partially) familiar environments. We restrict this objective to specific domains consisting of a coherent body of work, such as the children’s book series “Meine Freundin Conni”. The challenges in this thesis include finding a suitable knowledge representation for the domain, learning that representation automatically, and inferring a novel storyline over that representation with active user interaction. In this direction, both neural and symbolic approaches should be explored.

So far we have implemented a text-based interactive story generation system based on Large Language Models. In this thesis, the text input modality should be replaced by visual input. In particular, the story should be driven by real-world motion of figurines and objects, rather than an abstract textual description of the scene and its dynamics.
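One possible bridge between the pose tracker and the existing text-based system is to summarize tracked motion as a short textual scene event and feed that to the LLM in place of typed input. The sketch below is purely illustrative; the event wording, the threshold, and the call into the story system are assumptions:

    import numpy as np

    def summarize_motion(name, positions, move_threshold=0.05):
        """Turn a tracked 3D trajectory of a figurine into a one-sentence scene event."""
        positions = np.asarray(positions, dtype=float)
        displacement = positions[-1] - positions[0]
        if np.linalg.norm(displacement) < move_threshold:
            return f"{name} stays in place."
        direction = "to the right" if displacement[0] > 0 else "to the left"
        return f"{name} moves {direction}."

    event = summarize_motion("The dog figurine", [[0.0, 0, 0], [0.1, 0, 0], [0.3, 0, 0]])
    # story_system.continue_story(event)  # hypothetical call into the existing LLM pipeline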


Prerequisites

  • First experience with 2D/3D Computer Vision and Computer Graphics
  • Familiarity with AI incl. Deep Learning (university courses / practical experience)
  • Programming in Python


Supervisor:

Rahul Chaudhari

Deep Learning models for zero-shot object detection and segmentation

Description

In computer vision, data labeling is essential for training powerful machine learning models: accurate annotations are the foundation for teaching algorithms to interpret visual information. Labeling visual data comes with its own challenges, including the complexity of the data, the need for precise annotations, and the scale of the datasets involved. Overcoming these challenges is crucial for computer vision systems that extract valuable insights and identify objects reliably across a wide range of applications.

The development of automatic annotation pipelines for 2D and 3D labeling is therefore crucial, leveraging recent advances in computer vision to enable automatic, efficient, and accurate labeling of visual data.

This master's thesis will focus on automatically labeling images and videos, specifically generating 2D/3D labels (i.e., 2D/3D bounding boxes and segmentation masks). The automatic labeling pipeline has to generalize to any type of image and video content, such as household objects, toys, and indoor/outdoor environments.

The automatic labeling pipeline will be developed based on zero-shot detection and segmentation models such as GroundingDINO and segment-anything, in addition to similar methods (see Awesome Segment Anything). Additionally, the labeling pipeline, including the used models, will be implemented in the autodistill code base, and the performance will be tested by training and evaluating smaller target models for specific tasks.
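A minimal labeling-and-distillation sketch with autodistill, assuming the GroundedSAM and YOLOv8 plugins and using illustrative prompts, paths, and class names:

    from autodistill.detection import CaptionOntology
    from autodistill_grounded_sam import GroundedSAM
    from autodistill_yolov8 import YOLOv8

    # The ontology maps text prompts (for the zero-shot base model) to label names.
    ontology = CaptionOntology({"toy figurine": "figurine", "household object": "object"})

    # Label a folder of unlabeled images with GroundingDINO + SAM (GroundedSAM).
    base_model = GroundedSAM(ontology=ontology)
    base_model.label(input_folder="./images", extension=".jpg", output_folder="./dataset")

    # Distill the automatic labels into a small target model for deployment.
    target_model = YOLOv8("yolov8n.pt")
    target_model.train("./dataset/data.yaml", epochs=100)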

Sub-tasks:

  • Automatic generation of 2D labels for images and videos, such as 2D bounding boxes and segmentation masks (see Grounded-Segment-Anything, segment-any-moving, and Segment-and-Track-Anything).
  • Automatic generation of 3D labels for images and videos, such as 3D bounding boxes and segmentation masks (see 3D-Box-Segment-Anything, SegmentAnything3D, segment-any-moving, and Segment-and-Track-Anything).
  • Implementation of a 2D/3D labeling tool to modify and improve the automatic 2D/3D labels (see DLTA-AI).
  • Implementation of the automatic labeling pipeline, the used base models, and some target models in the autodistill code base, enabling easy end-to-end labeling, training, and deployment for tasks such as 2D/3D object detection and segmentation.
  • A comprehensive overview of the performance and limitations of current zero-shot models for automatic labeling in tasks such as 2D/3D object detection and segmentation.
  • Suggestions for future work to overcome the limitations of the used methods.

Bonus tasks:

  • Adding image augmentation and editing methods to the labeling pipeline and tool to generate more data (see EditAnything).
  • Implementing one-shot labeling methods to generate labels for unique objects (see Personalize-SAM and Matcher).

Prerequisites

Interest and first experience in Computer Vision, Deep Learning, Python programming, and 3D data.

Supervisor:

Rahul Chaudhari

VR-based 3D synthetic data generation for interactive Computer Vision tasks

Description

Under this topic, the student will extend our existing VR-based synthetic data generation tool for hand-object interactions. Furthermore, the student will generate synthetic data using this tool and evaluate state-of-the-art Computer Vision and Deep Learning models for tracking hand-object interactions in 3D.

Prerequisites

  • Strong familiarity with Python programming
  • Interest and first experience in Computer Graphics, VR, Computer Vision, and Deep Learning
  • Ideally also interest and experience in Blender 3D software

Supervisor:

Rahul Chaudhari

iOS app for tracking objects using RGB and depth data

Description

This topic is about the development of an iPhone app for tracking objects in the environment using data from the device's RGB and depth sensors.

Prerequisites

  • Good programming experience with C++ and Python
  • Ideally, experience building iOS apps with Swift and/or Unity ARFoundation
  • This topic is only suitable for you if you have a recent personal Mac development machine (ideally at least a MacBook Pro with Apple Silicon M1) and at least an iPhone 12 Pro with a LiDAR depth sensor

Supervisor:

Rahul Chaudhari

Student Assistant Jobs

HiWi / Working Student for Blender tasks

Keywords:
3D, Blender, Python
Short Description:
This is a working student position for a variety of tasks in the Blender environment: 3D modelling of characters / objects, character rigging, animation, interactive rendering, etc. Part of the job is to automate certain workflows or tasks in Blender using the Blender Python API.

Description

This is a working student position for a variety of tasks in the Blender environment:

  • 3D modelling of characters / objects,
  • character rigging,
  • animation,
  • interactive rendering, etc.
  • automating certain workflows or tasks in Blender using the Blender Python API (a minimal example follows below).
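As a small example of the kind of automation involved, the Blender Python API can insert keyframes and render an animation from a script (the object, frame range, and output path are illustrative):

    import bpy

    # Give the active object a simple two-keyframe move along the X axis.
    obj = bpy.context.active_object
    obj.location = (0.0, 0.0, 0.0)
    obj.keyframe_insert(data_path="location", frame=1)
    obj.location = (2.0, 0.0, 0.0)
    obj.keyframe_insert(data_path="location", frame=48)

    # Render the resulting animation; '//' means relative to the .blend file.
    scene = bpy.context.scene
    scene.frame_start, scene.frame_end = 1, 48
    scene.render.filepath = "//renders/shot_"
    bpy.ops.render.render(animation=True)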


Prerequisites

  • Strong interest in 3D Computer Graphics and Gaming.
  • Very strong familiarity with Blender
  • Comfortable programming in Python
  • Ideally: also familiarity with development environments on Linux and Windows.

Please send a description of your interest and experience regarding the above points together with your application.

Contact

https://www.ce.cit.tum.de/lmt/team/mitarbeiter/chaudhari-rahul/

Supervisor:

Rahul Chaudhari