Seminar on Topics in Signal Processing

Lecturer (assistant)
Duration: 3 SWS
Term: Winter semester 2021/22
Language of instruction: English
Position within curricula: See TUMonline
Dates: See TUMonline

Admission information

See TUMonline
Note: Please use TUMonline to register for the seminar:


Every participant works on his/her own topic. The goal of the seminar is to train and enhance the ability to work independently on a scientific topic. Every participant is supervised individually by an experienced researcher. This supervisor helps the student to get started, provides links to the relevant literature and gives feedback and advice on draft versions of the paper and the presentation slides.


The major goals of the seminar are to learn how to do scientific research and to learn and practice presentation techniques. Each student has to prepare a scientific talk about the topic he or she has registered for. The students have to collect the required literature, understand its contents, and prepare a presentation about it that summarizes the topic.

Teaching and learning methods

The main teaching methods are:
- Computer-based presentations by the students
- The students mainly work with recent, high-quality scientific publications


- Scientific paper (30%)
- Interaction with the supervisor and working attitude (20%)
- Presentation and discussion (50%)


Main subject for WS21/22: Machine Learning and Computer Vision for Digital Twinning

The kick-off meeting for the seminar is on 22.10.2021 at 13:15 in Seminar Room 0406.


The available topics are given below with further details:


Neural rendering refers to the set of generative deep learning methods that enable the extraction and manipulation of scene properties such as semantic information, geometry and illumination [1]. As the field is relatively new, most methods revolve around the idea of representing scene properties implicitly with neural networks. Among the first examples are occupancy networks, which map coordinates to occupancy values [2], and the DeepSDF network, which maps coordinates to signed distance function values [3]. Further work extends these networks to large everyday scenes, for example by applying them locally within voxels of the scene [4]. While the aforementioned methods require some form of supervision, either occupancy values or signed distance function values, recent works utilize differentiable rendering to backproject color values from posed images [5]. This idea is further extended by Neural Radiance Fields (NeRF) [6], which achieve state-of-the-art results by estimating density values in addition to color information. For this topic, students will investigate the current literature on using implicit scene representations to extract 3D information, such as geometry, from a large scene given environmental data such as RGB(-D) images or point clouds obtained from LiDAR scans.
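To make the two implicit representations mentioned above concrete, the following minimal sketch evaluates a signed distance function and the corresponding occupancy value for a query coordinate. It is only an illustration: an analytic sphere stands in for the learned network, and the function names `sphere_sdf` and `occupancy_from_sdf` are hypothetical and do not come from the cited papers.

```python
import math


def sphere_sdf(point, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance from `point` to the surface of a sphere.

    Negative inside the sphere, zero on the surface, positive outside.
    An analytic shape is used here in place of a trained network.
    """
    return math.dist(point, center) - radius


def occupancy_from_sdf(sdf_value):
    """Occupancy value: 1 if the point is inside or on the surface, else 0."""
    return 1 if sdf_value <= 0.0 else 0


# Query a few coordinates, as one would query a learned implicit function.
for query in [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (2.0, 0.0, 0.0)]:
    distance = sphere_sdf(query)
    print(query, distance, occupancy_from_sdf(distance))
```

In a learned setting, the analytic function would be replaced by a network that is trained to reproduce such values for a given shape or scene; the surface is then the set of points where the signed distance is zero.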

Supervision: Cem Eteke (


[1] Tewari, Ayush, et al. "State of the art on neural rendering." Computer Graphics Forum. Vol. 39. No. 2. 2020.

[2] Mescheder, Lars, et al. "Occupancy networks: Learning 3d reconstruction in function space." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.

[3] Park, Jeong Joon, et al. "Deepsdf: Learning continuous signed distance functions for shape representation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.

[4] Chabra, Rohan, et al. "Deep local shapes: Learning local sdf priors for detailed 3d reconstruction." European Conference on Computer Vision. Springer, Cham, 2020.

[5] Niemeyer, Michael, et al. "Differentiable volumetric rendering: Learning implicit 3d representations without 3d supervision." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.

[6] Mildenhall, Ben, et al. "Nerf: Representing scenes as neural radiance fields for view synthesis." European conference on computer vision. Springer, Cham, 2020.

To help traditional production systems meet the market demand for customized products, robots are increasingly used in industry. This has led to a manufacturing trend towards hybrid and more flexible production systems [1]. In human-robot collaboration or human-machine interaction [2], a predictive system is often needed to help the robot understand the human's current activity, predict the next steps, and assist the human in further work. RGB-D cameras are widely used in these fields. For example, the prediction can be implemented directly with an end-to-end model on RGB videos [3]. Alternatively, the RGB-D information can be used to extract semantic information and the spatio-temporal relationships between people and objects (spatio-temporal scene graphs), which are then passed to a predictive model for human activity prediction [4][5].

In this topic, you will explore state-of-the-art approaches for predicting human activity with RGB-D sensors, building on the techniques mentioned above, and examine how this technology can be used in digital twinning.

Supervision: Yuankai Wu (  


[1] Erkoyuncu, J.A.; del Amo, I.F.; Ariansyah, D.; Bulka, D.; Roy, R. A design framework for adaptive digital twins. CIRP Ann. 2020, 69, 145–148.

[2] Kousi, N.; Gkournelos, C.; Aivaliotis, S.; Lotsaris, K.; Bavelos, A.C.; Baris, P.; Michalos, G.; Makris, S. Digital Twin for Designing and Reconfiguring Human–Robot Collaborative Assembly Lines. Appl. Sci. 2021, 11, 4620.

[3] Hussein, Noureldien, Efstratios Gavves, and Arnold WM Smeulders. "Timeception for complex action recognition." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.

[4] Dreher, Christian RG, Mirko Wächter, and Tamim Asfour. "Learning object-action relations from bimanual human demonstration using graph networks." IEEE Robotics and Automation Letters 5.1 (2019): 187-194.

[5] Ji, Jingwei, et al. "Action genome: Actions as compositions of spatio-temporal scene graphs." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.

Remote operation and assistance technologies are used in many applications, including robotic teleoperation and industrial telemaintenance systems. Digital twinning aims to increase the transparency and task success of robotic teleoperation setups by providing the supporting human with a representation of the remote environment and the robotic system. In combination with Internet of Things technology, digital twinning is also used for remote maintenance and inspection systems and for remote collaboration. In this seminar topic, we will investigate, compare and contrast different remote operation and assistance systems that benefit from digital twinning.

Supervision: Furkan Kaynar (


[1] Cichon, Torben, and Jürgen Roßmann. "Robotic teleoperation: Mediated and supported by virtual testbeds." 2017 IEEE International Symposium on Safety, Security and Rescue Robotics (SSRR). IEEE, 2017.

[2] Laaki, Heikki, Yoan Miche, and Kari Tammi. "Prototyping a digital twin for real time remote control over mobile networks: Application of remote surgery." IEEE Access 7 (2019): 20325-20336.

[3] Havard, Vincent, et al. "Digital twin and virtual reality: a co-simulation environment for design and assessment of industrial workstations." Production & Manufacturing Research 7.1 (2019): 472-489.

[4] Ladwig, Philipp, et al. "Remote Guidance for Machine Maintenance Supported by Physical LEDs and Virtual Reality." Proceedings of Mensch und Computer 2019. 2019. 255-262.

[5] Tsokalo, Ievgenii A., et al. "Remote robot control with human-in-the-loop over long distances using digital twins." 2019 IEEE Global Communications Conference (GLOBECOM). IEEE, 2019.


A robotic digital twin is a virtual representation of a robot and all physical elements, along with the dynamics of how they operate and interact. Humans pour liquids every day while eating, cleaning and cooking. However, transferring these skills to robots is very challenging. In this regard, realistic simulated liquid pouring experiments generated in a virtual environment can be a great enabler for teaching robots how to pour. After extensive trial and error in safe digital twin experiments, the learned skill can ultimately be transferred to a reliable real-life robotic scenario. The lack of visual and physical realism, together with limited simulation environments and resources, particularly for precise pouring, motivated this seminar topic. An in-depth literature review and extensive comparison would be a great starting point!

Supervision: Edwin Babaians (


[1] Kennedy, Monroe, Karl Schmeckpeper, Dinesh Thakur, Chenfanfu Jiang, Vijay Kumar, and Kostas Daniilidis. "Autonomous precision pouring from unknown containers." IEEE Robotics and Automation Letters 4, no. 3 (2019): 2317-2324. 

[2] Schenck, Connor, and Dieter Fox. "Visual closed-loop control for pouring liquids." In 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 2629-2636. IEEE, 2017. 

[3] Wu, Hongtao, and Gregory S. Chirikjian. "Can I Pour Into It? Robot Imagining Open Containability Affordance of Previously Unseen Objects via Physical Simulations." IEEE Robotics and Automation Letters 6, no. 1 (2020): 271-278.

[4] Z. Pan and D. Manocha, "Feedback motion planning for liquid pouring using supervised learning," 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017, pp. 1252-1259, doi: 10.1109/IROS.2017.8202300. 

[5] Valassakis, Eugene, Zihan Ding, and Edward Johns. "Crossing the gap: A deep dive into zero-shot sim-to-real transfer for dynamics." In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 5372-5379. IEEE, 2020. 

[6] Juliani, Arthur, Vincent-Pierre Berges, Ervin Teng, Andrew Cohen, Jonathan Harper, Chris Elion, Chris Goy et al. "Unity: A general platform for intelligent agents." arXiv preprint arXiv:1809.02627 (2018).


Among other things, there are two important requirements for every Digital Twin:

1) Having a precise representation of the environment as a 3D model

2) Knowing the exact position of all moving objects within this 3D model

Many different portable scanning devices already exist for creating the 3D model. However, it is desirable to repeat the scanning procedure as often as possible to incorporate changes in the environment. Indoor localization can be done with various sensors (WiFi, Bluetooth or LiDAR) at different levels of precision. Since nearly all autonomous agents within a Digital Twin are equipped with at least one camera, Visual SLAM has proven able to address both requirements at the same time: it can provide localization with high accuracy and update the 3D model with semantic information and/or point cloud data in real time. In recent years, a growing number of Deep Visual SLAM systems [1-4] have been proposed, and they show promising results. In this topic, the student shall therefore investigate the advantages and disadvantages of Deep Visual SLAM systems compared to traditional systems. The study should also cover the application to Digital Twins.

Supervision: Sebastian Eger (


[1] Teed, Zachary, and Jia Deng. 2021. “DROID-SLAM: Deep Visual SLAM for Monocular, Stereo, and RGB-D Cameras.” ArXiv:2108.10869 [Cs], August.

[2] Bloesch, Michael, Jan Czarnowski, Ronald Clark, Stefan Leutenegger, and Andrew J. Davison. 2019. “CodeSLAM - Learning a Compact, Optimisable Representation for Dense Visual SLAM.” ArXiv:1804.00874 [Cs], April.

[3] Jatavallabhula, Krishna Murthy, Ganesh Iyer, and Liam Paull. 2020. “∇SLAM: Dense SLAM Meets Automatic Differentiation.” In 2020 IEEE International Conference on Robotics and Automation (ICRA), 2130–37.

[4] Yang, Nan, Lukas von Stumberg, Rui Wang, and Daniel Cremers. 2020. “D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry.” In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 1278–89. Seattle, WA, USA: IEEE


Effective human-machine interaction is an important requirement for an efficient production process. However, operating machines requires comprehensive knowledge about the task and the machine [3]. Digital twins rely on both a knowledge base of data accumulated over time and a real-time presentation that adapts to changes in the physical part [2], allowing bidirectional coupling between the virtual and the physical model [4]. Augmented Reality (AR), in turn, is an interface choice for digital twins that allows visualization of and interaction with the twin models in real time. Digital twin (DT) systems have many use cases in which AR simplifies the operator's task, including construction cyber-physical systems [4], product assembly simulation [1], and production and design [2]. Your goal is to survey the literature for a suitable use case that combines Digital Twins with AR as an interface, and to understand and explain the underlying building blocks, with a focus on the state-of-the-art (SOTA) solutions available for some of them.

Supervision: Marsil Zakour (


[1] Chan Qiu, Shien Zhou, Zhenyu Liu, Qi Gao, Jianrong Tan,  Digital assembly technology based on augmented reality and digital twins: a review, Virtual Reality & Intelligent Hardware, Volume 1, Issue 6, 2019, Pages 597-610, ISSN 2096-5796,

[2] Zexuan Zhu, Chao Liu, Xun Xu, Visualisation of the Digital Twin data in manufacturing by using Augmented Reality, Procedia CIRP, Volume 81, 2019, Pages 898-903, ISSN 2212-8271,

[3] Xin Ma, Fei Tao, Meng Zhang, Tian Wang, Ying Zuo, Digital twin enhanced human-machine interaction in product lifecycle, Procedia CIRP, Volume 83, 2019, Pages 789-793, ISSN 2212-8271,

[4] Syed Mobeen Hasan, Kyuhyup Lee, Daeyoon Moon, Soonwook Kwon, Song Jinwoo & Seojoon Lee (2021) Augmented reality and digital twin system for interaction with construction machinery, Journal of Asian Architecture and Building Engineering, DOI: 10.1080/13467581.2020.1869557

Modern indoor mapping systems (e.g. NavVis [1], MatterPort [2]) make it possible to create a digital twin of an environment. With sufficient data, machine learning techniques can be used to enrich the digital twin with semantic information. However, real-world environments are dynamic and change over time. This raises the question of how an object detector that has been trained to detect specific object classes can deal with the changes that can occur. Specifically, new unknown objects have to be detected and added to the set of objects to be learned, while known classes should not be 'forgotten' by the object detector as the environment evolves. In this seminar topic, we investigate current trends in open world (or open set) object detection and incremental learning to approach this problem.

Supervision: Martin Piccolrovazzi (




[3] Joseph et al. Towards Open World Object Detection, CVPR 2021,

[4] Dhamija et al. The Overlooked Elephant of Object Detection: Open Set, WACV 2020,

[5] Miller et al. Dropout Sampling for Robust Object Detection in Open-Set Conditions, ICRA 2018

[6] Castro et al. End-to-End Incremental Learning, ECCV 2019


An important field in digital twinning is object detection and tracking, since knowing where objects are located is vital for a complete digital twin. Extracting this semantic information from environment scans can be either a static task (object detection) or a process over time (object tracking), depending heavily on the modality used. When multiple images of a scanned area are used, object detection is often performed separately on each image. This results in an assignment problem, in which each object detected in one image must be matched with the objects detected in the other images [1,2,3]. In contrast, point cloud data representing a complete environment scan usually contains only one representation of each object, so only one detection can and needs to be made [4]. For this topic, the student should first get a general overview of the two approaches. Afterwards, more in-depth research on the topic should be carried out and the results presented.
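The assignment problem described above can be sketched as follows. This is a minimal illustration, not the method of any cited paper: it assumes two images taken from nearly the same viewpoint, so that bounding-box overlap (intersection over union, IoU) is a usable similarity, and it finds the best one-to-one matching by brute force. The names `iou`, `match_detections` and the threshold `min_iou` are hypothetical; for larger problems the Hungarian algorithm is commonly used instead of enumerating permutations.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(boxes_a, boxes_b, min_iou=0.3):
    """Return index pairs (i, j) matching boxes_a[i] to boxes_b[j].

    Exhaustively searches for the one-to-one assignment with the highest
    total IoU, then drops pairs whose overlap is below `min_iou`.
    Only suitable for a handful of detections per image.
    """
    if len(boxes_a) > len(boxes_b):
        # Solve the mirrored problem and flip the index pairs back.
        swapped = match_detections(boxes_b, boxes_a, min_iou)
        return sorted((i, j) for j, i in swapped)

    best_score, best_perm = -1.0, ()
    for perm in permutations(range(len(boxes_b)), len(boxes_a)):
        score = sum(iou(boxes_a[i], boxes_b[j]) for i, j in enumerate(perm))
        if score > best_score:
            best_score, best_perm = score, perm
    return [(i, j) for i, j in enumerate(best_perm)
            if iou(boxes_a[i], boxes_b[j]) >= min_iou]


# Two detections in the first image, three in the second (made-up values).
image_1 = [(0, 0, 10, 10), (20, 20, 30, 30)]
image_2 = [(21, 21, 31, 31), (1, 0, 11, 10), (50, 50, 60, 60)]
print(match_detections(image_1, image_2))
```

The unmatched third box in the second image would be treated as a newly appearing object in a tracking setting.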

Supervision: Michael Adam (


[1] Ciaparrone, Gioele, et al. "Deep learning in video multi-object tracking: A survey." Neurocomputing 381 (2020): 61-88.

[2] Scheidegger, Samuel, et al. "Mono-camera 3d multi-object tracking using deep learning detections and pmbm filtering." 2018 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2018.

[3] Weng, Xinshuo, and Kris Kitani. "A baseline for 3d multi-object tracking." arXiv preprint arXiv:1907.03961 1.2 (2019): 6.

[4] Sommer, Markus, et al. "Automated Generation of a Digital Twin of a Manufacturing System by Using Scan and Convolutional Neural Networks." Transdisciplinary Engineering for Complex Socio-technical Systems–Real-life Applications: Proceedings of the 27th ISTE International Conference on Transdisciplinary Engineering, July 1–July 10, 2020. Vol. 12. IOS Press, 2020.

3D point clouds are a very common representation of an environment captured by sensors such as LiDAR or radar, and they are also part of the output of RGB-D sensors. 3D LiDAR sensors capture sparse 360-degree scans in the horizontal direction. In comparison, solid-state LiDARs capture a smaller field of view, but with a higher density. Some applications require a full surface reconstruction of an object or a building instead of a sampled point cloud, for example when it is to be used in a simulation. 3D triangular meshes consist of vertices (points) and faces, which are the surfaces connecting the vertices. The major challenge is to identify which vertices should be connected by a face in order to best represent the underlying structure. The student is required to first become familiar with the literature on state-of-the-art 3D surface reconstruction methods from point clouds, understand their parameterization, and finally demonstrate a successful method of 3D reconstruction given a 3D point cloud of an object.
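As a small illustration of the mesh representation described above, the sketch below stores a triangular mesh as a list of vertices plus a list of faces (index triples), and computes two simple properties: the surface area and whether the mesh is closed. The tetrahedron data and the helper names `triangle_area`, `surface_area` and `is_closed` are assumptions made for this example; it does not perform surface reconstruction from a point cloud.

```python
import math
from collections import Counter

# A tetrahedron: four vertices (3D points) and four triangular faces.
# Each face is a triple of indices into the vertex list.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
faces = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]


def triangle_area(a, b, c):
    """Area of the triangle abc, computed as half the cross product norm."""
    ab = [b[i] - a[i] for i in range(3)]
    ac = [c[i] - a[i] for i in range(3)]
    cross = (
        ab[1] * ac[2] - ab[2] * ac[1],
        ab[2] * ac[0] - ab[0] * ac[2],
        ab[0] * ac[1] - ab[1] * ac[0],
    )
    return 0.5 * math.sqrt(sum(x * x for x in cross))


def surface_area(vertices, faces):
    """Total area of all faces of the mesh."""
    return sum(triangle_area(*(vertices[i] for i in face)) for face in faces)


def is_closed(faces):
    """True if every undirected edge is shared by exactly two faces."""
    edge_count = Counter()
    for i, j, k in faces:
        for u, v in ((i, j), (j, k), (k, i)):
            edge_count[(min(u, v), max(u, v))] += 1
    return all(count == 2 for count in edge_count.values())


print(surface_area(vertices, faces), is_closed(faces))
```

A reconstruction method has to produce the `faces` list from the points alone; checks like `is_closed` are one simple way to inspect the result, for example when the mesh is meant for use in a simulation.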

Supervision: Martin Oelsch (


[1] Kazhdan et al., Poisson Surface Reconstruction with Envelope Constraints, Computer Graphics Forum, 2020

A paradigm shift from static to more dynamic ways of manufacturing is taking place. Digital twins are a key element of this new manufacturing reality, in part because they help to close the simulation-to-reality gap. Modern simulation platforms make it easy to generate enormous amounts of photorealistic synthetic data, which can be further augmented with domain randomization. This data can then be used to train deep learning models with results comparable to those obtained by training with real data, but without losing time labeling it. One of the many uses of this synthetic data is the estimation of the robot's pose, which can be employed to achieve more accurate control of the robot.
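The idea of domain randomization mentioned above can be sketched as sampling the visual parameters of each synthetic scene at random before rendering it. The snippet below is only a hedged illustration: the parameter names, value ranges and the functions `sample_scene_parameters` and `generate_dataset_configs` are hypothetical and are not tied to any particular simulation platform; the rendering step itself is omitted.

```python
import random


def sample_scene_parameters(rng):
    """Draw one random scene configuration (all names and ranges are assumed)."""
    return {
        "light_intensity": rng.uniform(0.2, 1.5),
        "camera_distance_m": rng.uniform(0.8, 2.5),
        "camera_yaw_deg": rng.uniform(-180.0, 180.0),
        "background_texture": rng.choice(["concrete", "wood", "metal", "noise"]),
        "robot_color_rgb": tuple(rng.random() for _ in range(3)),
    }


def generate_dataset_configs(num_scenes, seed=0):
    """Create reproducible configurations for `num_scenes` synthetic images."""
    rng = random.Random(seed)
    return [sample_scene_parameters(rng) for _ in range(num_scenes)]


# Each configuration would be passed to a simulator that renders an image;
# the simulator already knows the robot pose, so the labels are obtained
# without manual annotation.
for config in generate_dataset_configs(3, seed=42):
    print(config)
```

Using a seeded generator keeps the synthetic dataset reproducible, which makes it easier to compare models trained on the same randomized scenes.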

Supervision: Diego Fernandez Prado (


[1] Lambrecht, Jens, and Linh Kästner. "Towards the usage of synthetic data for marker-less pose estimation of articulated robots in RGB images." 2019 19th International Conference on Advanced Robotics (ICAR). IEEE, 2019.

[2] Lambrecht, Jens. "Robust few-shot pose estimation of articulated robots using monocular cameras and deep-learning-based keypoint detection." 2019 7th International Conference on Robot Intelligence Technology and Applications (RiTA). IEEE, 2019.

[3] Zuo, Yiming, et al. "Craves: Controlling robotic arm with a vision-based economic system." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.

[4] Xia, Kaishu, et al. "Towards Semantic Integration of Machine Vision Systems to Aid Manufacturing Event Understanding." Sensors 21.13 (2021): 4276.

[5] Lee, Timothy E., et al. "Camera-to-robot pose estimation from a single image." 2020 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2020.

As a very promising technology, digital twins can help robots better perceive and understand the environment, which is useful in many fields, such as the automotive industry, healthcare services and manufacturing operations. Good object detection ability is indispensable for obtaining accurate digital twins. In current research, robots are able to produce relatively accurate object detection results in good-visibility environments, in which computer vision performs well. However, for common poor-visibility environments, such as bad weather and poor illumination, the performance of object detection needs further study. Therefore, this topic is to investigate the robot's object detection ability under poor visibility and to understand the solutions to this problem.

Supervision: Mengchen Xiong (


[1] Yang W, Yuan Y, Ren W, et al. Advancing image understanding in poor visibility environments: A collective benchmark study[J]. IEEE Transactions on Image Processing, 2020, 29: 5737-5752.

[2] Islam M J, Xia Y, Sattar J. Fast underwater image enhancement for improved visual perception[J]. IEEE Robotics and Automation Letters, 2020, 5(2): 3227-3234.

[3] Guan J, Madani S, Jog S, et al. Through Fog High-Resolution Imaging Using Millimeter Wave Radar[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 11464-11473.

[4] Bijelic M, Gruber T, Mannan F, et al. Seeing through fog without seeing fog: Deep multimodal sensor fusion in unseen adverse weather[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 11682-11692.


Supervision: TBA