Seminar on Topics in Signal Processing

Lecturer (assistant)
Scope: 3 SWS
Semester: Winter semester 2021/22
Position in curricula: See TUMonline
Dates: See TUMonline


Note: Please use TUMonline to register for the seminar.


Every participant works on his/her own topic. The goal of the seminar is to train and enhance the ability to work independently on a scientific topic. Every participant is supervised individually by an experienced researcher. This supervisor helps the student to get started, provides links to the relevant literature and gives feedback and advice on draft versions of the paper and the presentation slides.


The major goals of the seminar are to learn how to do scientific research and to learn and practice presentation techniques. Each student has to prepare a scientific talk about the topic he or she has registered for. The students have to collect the required literature, understand its contents, and prepare a presentation about it that summarizes the topic.

Teaching and Learning Methods

The main teaching methods are:
- Computer-based presentations by the students
- Independent work with recent, high-quality scientific publications

Coursework and Examination

- Scientific paper (30%)
- Interaction with the supervisor and working attitude (20%)
- Presentation and discussion (50%)


Main subject for WS21/22: Machine Learning and Computer Vision for Digital Twinning

The kick-off meeting for the seminar is on 22.10.2021 at 13:15 in Seminar Room 0406.

The available topics are given below with further details:

Neural rendering refers to a set of generative deep learning methods that enable the extraction and manipulation of scene properties such as semantic information, geometry, and illumination [1]. As the field is relatively new, most methods revolve around the idea of representing scene properties implicitly with neural networks. Among the first examples are occupancy networks, which map coordinates to occupancy values [2], and the DeepSDF network, which maps coordinates to signed distance function values [3]. Further work extends these networks to large everyday-life scenes, for example by applying them locally within voxels of the scene [4]. While the aforementioned methods require some form of supervision, either occupancy or signed distance function values, recent works use differentiable rendering to backproject color values from posed images [5]. This idea is extended further by Neural Radiance Fields (NeRF) [6], which achieve state-of-the-art results by estimating density values in addition to color information. This topic requires the student to investigate the current state of the literature on using implicit scene representation methods to extract 3D information, such as geometry, from a large scene given environmental data such as RGB(-D) images or point clouds obtained from LiDAR scans.
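To make the idea of an implicit representation concrete: in [2] and [3] a neural network learns the mapping from a 3D coordinate to an occupancy or signed distance value. As a minimal stand-in for such a learned function, the analytic signed distance function of a sphere illustrates the interface (all values here are illustrative, not from any of the cited methods):

```python
import numpy as np

def sphere_sdf(points, center=np.zeros(3), radius=1.0):
    """Signed distance to a sphere: negative inside, positive outside.

    In DeepSDF [3], a neural network plays the role of this function.
    """
    return np.linalg.norm(points - center, axis=-1) - radius

def occupancy(points, center=np.zeros(3), radius=1.0):
    """Occupancy representation [2]: 1 inside the surface, 0 outside."""
    return (sphere_sdf(points, center, radius) < 0).astype(np.float32)

# Query the implicit representation at arbitrary 3D coordinates.
pts = np.array([[0.0, 0.0, 0.0],   # sphere center -> sdf = -1, occupied
                [2.0, 0.0, 0.0]])  # outside       -> sdf = +1, free
print(sphere_sdf(pts))   # [-1.  1.]
print(occupancy(pts))    # [1. 0.]
```

The surface of the shape is the zero level set of the SDF; methods such as [4] learn many such local functions, one per voxel, to scale to large scenes.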

Supervision: Cem Eteke (


[1] Tewari, Ayush, et al. "State of the art on neural rendering." Computer Graphics Forum. Vol. 39. No. 2. 2020.

[2] Mescheder, Lars, et al. "Occupancy networks: Learning 3d reconstruction in function space." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.

[3] Park, Jeong Joon, et al. "Deepsdf: Learning continuous signed distance functions for shape representation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.

[4] Chabra, Rohan, et al. "Deep local shapes: Learning local sdf priors for detailed 3d reconstruction." European Conference on Computer Vision. Springer, Cham, 2020.

[5] Niemeyer, Michael, et al. "Differentiable volumetric rendering: Learning implicit 3d representations without 3d supervision." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.

[6] Mildenhall, Ben, et al. "Nerf: Representing scenes as neural radiance fields for view synthesis." European conference on computer vision. Springer, Cham, 2020.

To improve traditional production systems in meeting the market demand for customized products, robots are increasingly used in industry. This has led to a manufacturing trend towards hybrid and more flexible production systems [1]. In human-robot collaboration or human-machine interaction [2], we often need a predictive system that helps the robot understand the human's current activity, predict the next steps, and provide assistance. RGB-D cameras are widely used in these fields. For example, the prediction can be implemented directly with an end-to-end model on RGB videos [3]. Alternatively, RGB-D information can be used to extract semantic information and the spatio-temporal relationships (spatio-temporal scene graphs) between people and objects, which are then fed to a predictive model for human activity prediction [4][5].

You will explore the state-of-the-art approaches by understanding the techniques mentioned above for predicting human activity from RGB-D sensors, and consider how this technology can be utilized in Digital Twinning.

Supervision: Yuankai Wu (  


[1] Erkoyuncu, J.A.; del Amo, I.F.; Ariansyah, D.; Bulka, D.; Roy, R. A design framework for adaptive digital twins. CIRP Ann. 2020, 69, 145–148.

[2] Kousi, N.; Gkournelos, C.; Aivaliotis, S.; Lotsaris, K.; Bavelos, A.C.; Baris, P.; Michalos, G.; Makris, S. Digital Twin for Designing and Reconfiguring Human–Robot Collaborative Assembly Lines. Appl. Sci. 2021, 11, 4620.

[3] Hussein, Noureldien, Efstratios Gavves, and Arnold WM Smeulders. "Timeception for complex action recognition." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.

[4] Dreher, Christian RG, Mirko Wächter, and Tamim Asfour. "Learning object-action relations from bimanual human demonstration using graph networks." IEEE Robotics and Automation Letters 5.1 (2019): 187-194.

[5] Ji, Jingwei, et al. "Action genome: Actions as compositions of spatio-temporal scene graphs." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.


Robot autonomy has been steadily improving for decades; however, there are still cases in which a robot cannot accomplish a task autonomously. Especially in unstructured settings such as household environments, robots face more uncertainty and novel situations. Keeping a human in the loop during robotic tasks is a well-established approach for handling such autonomy failures through human support.

With developments in digital twin technology, robotic task planning no longer has to be open-loop: the twin can provide predictions about the outcome of a planned operation, including its physical interactions with the environment. This can help decrease task failure rates and increase safety. A digital twin-based planning interface can also reduce the cognitive load on the supporting human, thereby allowing non-experts to support the robotic planning process. In this topic, we will investigate various digital twin-based interfaces for assisting robotic tasks semi-autonomously.

Supervision: Furkan Kaynar (


[1] Verner, Igor, et al. "Digital Twin of the Robot Baxter for Learning Practice in Spatial Manipulation Tasks." International Conference on Remote Engineering and Virtual Instrumentation. Springer, Cham, 2019.

[2] Kuts, Vladimir, et al. "Digital twin based synchronised control and simulation of the industrial robotic cell using virtual reality." Journal of Machine Engineering 19 (2019).

[3] Garg, Gaurav, Vladimir Kuts, and Gholamreza Anbarjafari. "Digital Twin for FANUC Robots: Industrial Robot Programming and Simulation Using Virtual Reality." Sustainability 13.18 (2021): 10336.

[4] Wang, Xi, et al. "Interactive and Immersive Process-Level Digital Twin for Collaborative Human–Robot Construction Work." Journal of Computing in Civil Engineering 35.6 (2021): 04021023.

[5] Petzoldt, Christoph, et al. "Control architecture for digital twin-based human-machine interaction in a novel container unloading system." Procedia Manufacturing 52 (2020): 215-220.


A robotic digital twin is a virtual representation of a robot and all physical elements, along with the dynamics of how they operate and interact. Humans pour liquids every day while eating, cleaning, and cooking; however, it is highly challenging to transfer these skills to robots. Realistic simulated liquid-pouring experiments generated in a virtual environment can therefore be a great enabler for teaching robots how to pour. After extensive trial and error in safe digital twin experiments, the learned skill can ultimately be transferred to a reliable real-life robotic scenario. The lack of visual and physical realism and the limited simulation environments and resources, particularly for precise pouring, motivated us to design this seminar topic. An in-depth literature review and an extensive comparison would be a great starting point!

Supervision: Edwin Babaians (


[1] Kennedy, Monroe, Karl Schmeckpeper, Dinesh Thakur, Chenfanfu Jiang, Vijay Kumar, and Kostas Daniilidis. "Autonomous precision pouring from unknown containers." IEEE Robotics and Automation Letters 4, no. 3 (2019): 2317-2324. 

[2] Schenck, Connor, and Dieter Fox. "Visual closed-loop control for pouring liquids." In 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 2629-2636. IEEE, 2017. 

[3] Wu, Hongtao, and Gregory S. Chirikjian. "Can I Pour Into It? Robot Imagining Open Containability Affordance of Previously Unseen Objects via Physical Simulations." IEEE Robotics and Automation Letters 6, no. 1 (2020): 271-278.

[4] Z. Pan and D. Manocha, "Feedback motion planning for liquid pouring using supervised learning," 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017, pp. 1252-1259, doi: 10.1109/IROS.2017.8202300. 

[5] Valassakis, Eugene, Zihan Ding, and Edward Johns. "Crossing the gap: A deep dive into zero-shot sim-to-real transfer for dynamics." In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 5372-5379. IEEE, 2020. 

[6] Juliani, Arthur, Vincent-Pierre Berges, Ervin Teng, Andrew Cohen, Jonathan Harper, Chris Elion, Chris Goy et al. "Unity: A general platform for intelligent agents." arXiv preprint arXiv:1809.02627 (2018).

Among other things, there are two important requirements for every Digital Twin:

1) Having a precise representation of the environment as a 3D model

2) Knowing the exact position of all moving objects within this 3D model

To create the 3D model, many different portable scanning devices already exist. However, it is desirable to repeat the scanning procedure as often as possible to incorporate changes in the environment. Indoor localization can be done with various sensors (WiFi, Bluetooth, or LiDAR) at different precisions. Since nearly all autonomous agents within a Digital Twin are equipped with at least one camera, Visual SLAM has proven to address both requirements at once: it can provide localization with high accuracy and update the 3D model with semantic information and/or point cloud data in real time. In recent years, more and more Deep Visual SLAM systems [1-4] have been proposed and show promising results. In this topic, the student will therefore investigate the advantages and disadvantages of Deep Visual SLAM systems compared to traditional systems. In addition, the application to Digital Twins should be included in the study.

Supervision: Sebastian Eger (


[1] Teed, Zachary, and Jia Deng. 2021. “DROID-SLAM: Deep Visual SLAM for Monocular, Stereo, and RGB-D Cameras.” ArXiv:2108.10869 [Cs], August.

[2] Bloesch, Michael, Jan Czarnowski, Ronald Clark, Stefan Leutenegger, and Andrew J. Davison. 2019. “CodeSLAM - Learning a Compact, Optimisable Representation for Dense Visual SLAM.” ArXiv:1804.00874 [Cs], April.

[3] Jatavallabhula, Krishna Murthy, Ganesh Iyer, and Liam Paull. 2020. “∇SLAM: Dense SLAM Meets Automatic Differentiation.” In 2020 IEEE International Conference on Robotics and Automation (ICRA), 2130–37.

[4] Yang, Nan, Lukas von Stumberg, Rui Wang, and Daniel Cremers. 2020. “D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry.” In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 1278–89. Seattle, WA, USA: IEEE


Effective human-machine interaction is an important requirement for an efficient production process. However, operating machines requires comprehensive knowledge about the task and the machine [3]. Digital twins rely both on a knowledge base of data accumulated over time and on a real-time representation that adapts to changes in the physical counterpart [2], allowing bidirectional coupling between the virtual and physical models [4]. Augmented Reality (AR), in turn, is an interface choice for digital twins that allows visualization of and interaction with the twin models in real time. Digital twin systems have many use cases, including construction cyber-physical systems [4], product assembly simulation [1], and production and design [2], where AR simplifies the operator's task. Your goal is to survey the literature for a suitable use case combining Digital Twins with AR as an interface, and to understand and explain the underlying building blocks, with a focus on the state-of-the-art solutions provided for some of them.

Supervision: Marsil Zakour (


[1] Chan Qiu, Shien Zhou, Zhenyu Liu, Qi Gao, Jianrong Tan,  Digital assembly technology based on augmented reality and digital twins: a review, Virtual Reality & Intelligent Hardware, Volume 1, Issue 6, 2019, Pages 597-610, ISSN 2096-5796,

[2] Zexuan Zhu, Chao Liu, Xun Xu, Visualisation of the Digital Twin data in manufacturing by using Augmented Reality, Procedia CIRP, Volume 81, 2019, Pages 898-903, ISSN 2212-8271,

[3] Xin Ma, Fei Tao, Meng Zhang, Tian Wang, Ying Zuo, Digital twin enhanced human-machine interaction in product lifecycle, Procedia CIRP, Volume 83, 2019, Pages 789-793, ISSN 2212-8271,

[4] Syed Mobeen Hasan, Kyuhyup Lee, Daeyoon Moon, Soonwook Kwon, Song Jinwoo & Seojoon Lee (2021) Augmented reality and digital twin system for interaction with construction machinery, Journal of Asian Architecture and Building Engineering, DOI: 10.1080/13467581.2020.1869557

Modern indoor mapping systems (e.g. NavVis [1], MatterPort [2], etc.) make it possible to create a digital twin of an environment. With sufficient data, machine learning techniques can be used to have the digital twin include semantic information. However, real-world environments are dynamic and change over time. This raises the question of how an object detector, which has been trained to detect specific object classes, can deal with the changes that can occur. Specifically, new unknown objects have to be detected and added to the set of objects to be learned, and known classes should not be 'forgotten' by the object detector as the environment evolves. In this seminar topic, we investigate current trends in open world (or open set) object detection and incremental learning to approach this problem.

Supervision: Martin Piccolrovazzi (




[3] Joseph et al. Towards Open World Object Detection, CVPR 2021,

[4] Dhamija et al. The Overlooked Elephant of Object Detection: Open Set, WACV 2020,

[5] Miller et al. Dropout Sampling for Robust Object Detection in Open-Set Conditions, ICRA 2018

[6] Castro et al. End-to-End Incremental Learning, ECCV 2019

An important field in digital twinning is object detection and tracking, since knowing where objects are is vital for a complete digital twin. Extracting the required semantic information from environment scans can be either static (object detection) or a process over time (object tracking), depending heavily on the modality used. When using multiple images of a scanned area, object detection is often run separately on each image. This results in an assignment problem, where every detected object in one image must be matched with the objects detected in the other images [1,2,3]. In contrast, when using point cloud data representing a complete environment scan, usually only one representation of each object is given, and hence only one detection can and needs to be performed [4]. For this topic, the student should first gain a general overview of the two approaches. Afterwards, more in-depth research on the topic should be done and the results presented.
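The assignment problem mentioned above is classically solved with the Hungarian algorithm on a cost matrix, as in the filtering pipelines of [2,3]. A minimal sketch using SciPy's linear_sum_assignment, with made-up detection centroids and Euclidean distance as the matching cost:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Detected object centroids (x, y) in two images (illustrative values).
dets_img1 = np.array([[10.0, 20.0], [50.0, 60.0], [90.0, 15.0]])
dets_img2 = np.array([[52.0, 58.0], [11.0, 21.0], [88.0, 17.0]])

# Cost matrix: pairwise Euclidean distance between all detection pairs.
cost = np.linalg.norm(dets_img1[:, None, :] - dets_img2[None, :, :], axis=-1)

# Hungarian algorithm: minimum-total-cost one-to-one assignment.
rows, cols = linear_sum_assignment(cost)
for r, c in zip(rows, cols):
    print(f"detection {r} in image 1 <-> detection {c} in image 2")
# detection 0 <-> 1, detection 1 <-> 0, detection 2 <-> 2
```

Real trackers replace the plain distance with learned appearance similarities or motion-model likelihoods, but the assignment step itself has this shape.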

Supervision: Michael Adam (


[1] Ciaparrone, Gioele, et al. "Deep learning in video multi-object tracking: A survey." Neurocomputing 381 (2020): 61-88.

[2] Scheidegger, Samuel, et al. "Mono-camera 3d multi-object tracking using deep learning detections and pmbm filtering." 2018 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2018.

[3] Weng, Xinshuo, and Kris Kitani. "A baseline for 3d multi-object tracking." arXiv preprint arXiv:1907.03961 1.2 (2019): 6.

[4] Sommer, Markus, et al. "Automated Generation of a Digital Twin of a Manufacturing System by Using Scan and Convolutional Neural Networks." Transdisciplinary Engineering for Complex Socio-technical Systems–Real-life Applications: Proceedings of the 27th ISTE International Conference on Transdisciplinary Engineering, July 1–July 10, 2020. Vol. 12. IOS Press, 2020.

3D point clouds are a very common representation of an environment captured by sensors such as LiDAR or radar, and they are also part of the output of RGB-D sensors. 3D LiDAR sensors capture sparse 360-degree scans in the horizontal direction; in comparison, solid-state LiDARs capture a smaller field of view, but with higher density. Some applications require a full surface reconstruction of an object or building instead of a sampled point cloud, for example when it is to be used in a simulation. 3D triangular meshes consist of vertices (points) and faces, the surfaces connecting the vertices. The major challenge is to identify which vertices should be connected by a face so that the underlying structure is represented in the best way. The student is required to first work through the literature on state-of-the-art 3D surface reconstruction methods from point clouds, understand their parameterization, and finally demonstrate a successful 3D reconstruction given a 3D point cloud of an object.
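The vertices-plus-faces structure described above can be illustrated with the smallest closed triangular mesh, a tetrahedron; the example (illustrative, not a reconstruction method) shows how faces index into the vertex array and how per-face geometry such as surface area is computed:

```python
import numpy as np

# A triangular mesh: vertices are 3D points, faces are index triples
# into the vertex array. Four vertices and four faces form a tetrahedron.
vertices = np.array([[0.0, 0.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])
faces = np.array([[0, 1, 2],
                  [0, 1, 3],
                  [0, 2, 3],
                  [1, 2, 3]])

def surface_area(vertices, faces):
    """Sum the area of each triangle: half the cross-product magnitude."""
    a, b, c = (vertices[faces[:, i]] for i in range(3))
    return 0.5 * np.linalg.norm(np.cross(b - a, c - a), axis=1).sum()

# Three unit right triangles (0.5 each) plus the slanted face (sqrt(3)/2).
print(round(surface_area(vertices, faces), 4))  # 2.366
```

Surface reconstruction methods such as Poisson reconstruction [1] produce exactly this kind of vertex/face output; the research question is how to choose the connectivity from unorganized input points.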

Supervision: Martin Oelsch (


[1] Kazhdan et al., Poisson Surface Reconstruction with Envelope Constraints, Computer Graphics Forum, 2020

A paradigm shift from static to more dynamic ways of manufacturing is taking place. Digital twins are a key element of this new manufacturing reality, in part because they help close the simulation-to-reality gap. With modern simulation platforms it is possible to easily generate enormous amounts of photorealistic synthetic data, which can be further augmented with domain randomization. This data can then be used to train deep learning models with results comparable to those obtained by training on real data, but without losing time on labeling. One of the many uses of such synthetic data is the estimation of the robot's pose, which can be employed to achieve more accurate control of the robot.
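Domain randomization, as used in [1], perturbs rendering parameters so that a model trained on synthetic images generalizes to real ones. A minimal sketch of the idea on the image side (the parameter ranges are illustrative; real pipelines also randomize textures, lighting, and camera pose in the simulator):

```python
import numpy as np

rng = np.random.default_rng(0)

def domain_randomize(image, rng):
    """Apply random brightness, contrast, and additive noise to one render."""
    brightness = rng.uniform(0.7, 1.3)          # global intensity scaling
    contrast = rng.uniform(0.8, 1.2)            # stretch around the mean
    noise = rng.normal(0.0, 0.02, image.shape)  # sensor-like noise
    out = (image - image.mean()) * contrast + image.mean()
    out = out * brightness + noise
    return np.clip(out, 0.0, 1.0)               # keep values in [0, 1]

# One synthetic render (here a placeholder array with values in [0, 1]).
render = rng.uniform(0.0, 1.0, (64, 64, 3))
variants = [domain_randomize(render, rng) for _ in range(4)]
print(len(variants), variants[0].shape)  # 4 randomized variants of one scene
```

Each variant keeps the same ground-truth pose label as the original render, which is what makes the labeling free.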

Supervision: Diego Fernandez Prado (


[1] Lambrecht, Jens, and Linh Kästner. "Towards the usage of synthetic data for marker-less pose estimation of articulated robots in RGB images." 2019 19th International Conference on Advanced Robotics (ICAR). IEEE, 2019.

[2] Lambrecht, Jens. "Robust few-shot pose estimation of articulated robots using monocular cameras and deep-learning-based keypoint detection." 2019 7th International Conference on Robot Intelligence Technology and Applications (RiTA). IEEE, 2019.

[3] Zuo, Yiming, et al. "Craves: Controlling robotic arm with a vision-based economic system." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.

[4] Xia, Kaishu, et al. "Towards Semantic Integration of Machine Vision Systems to Aid Manufacturing Event Understanding." Sensors 21.13 (2021): 4276.

[5] Lee, Timothy E., et al. "Camera-to-robot pose estimation from a single image." 2020 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2020.

As a very promising technology, digital twins can help robots better perceive and understand their environment, with applications in many fields such as the automotive industry, healthcare services, and manufacturing operations. Good object detection is indispensable for obtaining accurate digital twins. In current research, robots can already deliver relatively accurate object detection results in good-visibility environments, where computer vision performs well. However, for common poor-visibility conditions such as bad weather and poor illumination, the performance of object detection needs further study. Therefore, this topic investigates the robot's object detection ability under weak visibility and the solutions proposed for this problem.

Supervision: Mengchen Xiong (


[1] Yang W, Yuan Y, Ren W, et al. Advancing image understanding in poor visibility environments: A collective benchmark study[J]. IEEE Transactions on Image Processing, 2020, 29: 5737-5752.

[2] Islam M J, Xia Y, Sattar J. Fast underwater image enhancement for improved visual perception[J]. IEEE Robotics and Automation Letters, 2020, 5(2): 3227-3234.

[3] Guan J, Madani S, Jog S, et al. Through Fog High-Resolution Imaging Using Millimeter Wave Radar[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 11464-11473.

[4] Bijelic M, Gruber T, Mannan F, et al. Seeing through fog without seeing fog: Deep multimodal sensor fusion in unseen adverse weather[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 11682-11692.

In bilateral teleoperation with haptic feedback, a human sends movement commands to a remote robot and receives force feedback from the environment. Traditional bilateral telemanipulation control approaches often fail to provide a transparent interaction due to long and varying delays, packet loss, and limited bandwidth. Model-Augmented Haptic Telemanipulation (MATM) and Model-Mediated Teleoperation (MMT) offer a solution by using a virtual model (e.g. a digital twin) of the remote environment on the local side. Predicted feedback from this virtual model can then be provided instantaneously to the human operator, leading to high transparency. MATM further suggests using another model of task and robot knowledge on the remote (robot) side to enable shared autonomy, which is particularly helpful under high delay. In this topic, we will study the MATM and MMT concepts and discuss different approaches to generating the virtual model of the remote environment.
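The "predicted feedback from the local model" can be as simple as a virtual spring rendered against the locally estimated surface, a common local-model choice in the MMT literature (the stiffness value and positions below are illustrative):

```python
def predicted_force(tool_pos, surface_pos, stiffness=200.0):
    """Spring contact model: force proportional to penetration depth.

    In MMT, a local model like this renders feedback to the operator
    immediately, while the true environment response arrives later over
    the delayed communication channel and is used to update the model.
    """
    penetration = surface_pos - tool_pos  # > 0 means tool is inside surface
    return stiffness * max(0.0, penetration)

# Operator pushes the tool 5 mm past the estimated surface at z = 0.100 m.
print(round(predicted_force(tool_pos=0.095, surface_pos=0.100), 6))  # 1.0 (N)

# Free motion: no contact, no predicted force.
print(predicted_force(tool_pos=0.200, surface_pos=0.100))  # 0.0
```

Approaches such as [3] estimate the surface geometry (e.g. from point clouds) and update the model parameters online as real measurements arrive.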

Supervision: Basak Gülecyüz (


[1] T. Hulin et al., "Model-Augmented Haptic Telemanipulation: Concept, Retrospective Overview, and Current Use Cases," Frontiers in Robotics and AI, 2021.

[2] X. Xu, et al., "Model-Mediated Teleoperation: Toward Stable and Transparent Teleoperation Systems," in IEEE Access, 2016.

[3] X. Xu, et al., "Point Cloud-Based Model-Mediated Teleoperation With Dynamic and Perception-Based Model Updating," in IEEE Transactions on Instrumentation and Measurement, Nov. 2014.

[4] H. Beik-Mohammadi et al., "Model Mediated Teleoperation with a Hand-Arm Exoskeleton in Long Time Delays Using Reinforcement Learning," 2020 29th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), 2020

[5]  Ni D, et al., "Translational objects dynamic modeling and correction for point cloud augmented virtual reality-based teleoperation," Advances in Mechanical Engineering, January 2018.