Open Theses

Equivariant 3D Object Detection

Keywords:
3D Object Detection, Computer Vision, Deep Learning, Indoor Environments

Description

The thesis focuses on the application of equivariant deep learning techniques to 3D object detection in indoor scenes. Indoor environments, such as homes, offices, and industrial settings, present unique challenges for 3D object detection due to diverse object arrangements, varying lighting conditions, and occlusions. Traditional methods often struggle with these complexities, leading to suboptimal performance. The motivation for this research is to enhance the robustness and accuracy of 3D object detection in these environments by leveraging the inherent advantages of equivariant deep learning. This approach aims to improve the model's ability to recognize objects regardless of their orientation and position in the scene, which is crucial for applications in robotics and augmented reality.

The thesis proposes the development of a deep learning model that incorporates equivariant neural networks for 3D object detection, for example building on the Vector Neurons framework proposed in [1]. The model will be evaluated on a benchmark 3D indoor dataset, such as the Stanford 3D Indoor Spaces Dataset (S3DIS), ScanNet [2], or Replica [3].
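
To illustrate the core idea, the following minimal sketch (in PyTorch) shows an SO(3)-equivariant linear layer in the spirit of Vector Neurons [1]: the learned weights only mix feature channels and never act on the 3D coordinate axis, so rotating the input features rotates the output features in exactly the same way. The tensor layout and class name are illustrative assumptions, not the reference implementation.

import torch
import torch.nn as nn

class VNLinear(nn.Module):
    """Rotation-equivariant linear layer: mixes channels, never the 3D axis."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.map = nn.Linear(in_channels, out_channels, bias=False)

    def forward(self, v):
        # v: (B, C_in, 3, N) vector features -> (B, C_out, 3, N)
        return self.map(v.transpose(1, -1)).transpose(1, -1)

# Quick equivariance check with a random orthogonal matrix Q: f(Q v) == Q f(v)
B, C, N = 2, 8, 128
v = torch.randn(B, C, 3, N)
Q, _ = torch.linalg.qr(torch.randn(3, 3))   # random orthogonal 3x3 matrix
layer = VNLinear(C, 16)
out_of_rotated = layer(torch.einsum("ij,bcjn->bcin", Q, v))
rotated_output = torch.einsum("ij,bcjn->bcin", Q, layer(v))
print(torch.allclose(out_of_rotated, rotated_output, atol=1e-5))  # True

Stacked with equivariant non-linearities and pooling, such layers produce features that transform predictably under rotations of the scene, which is the property the detection model is meant to exploit.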

References

[1] Deng, Congyue, et al. "Vector Neurons: A General Framework for SO(3)-Equivariant Networks." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.

[2] Dai, Angela, et al. "ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.

[3] Straub, Julian, et al. "The Replica Dataset: A Digital Replica of Indoor Spaces." arXiv preprint arXiv:1906.05797 (2019).

Prerequisites

  • Python and Git
  • Experience with a deep learning framework (PyTorch, TensorFlow)
  • Interest in Computer Vision and Machine Learning

Supervisor:

Adam Misik

Ongoing Theses

Master's Theses

Scene Graph-based Indoor Localization

Keywords:
3D Computer Vision, Deep Learning, Indoor Localization

Description

This thesis investigates 3D scene graph representations and deep learning for localization in complex indoor environments.

Prerequisites

  • Python and Git
  • Experience with a deep learning framework (PyTorch, TensorFlow)
  • Interest in Computer Vision and Machine Learning

Supervisor:

Adam Misik

Leveraging Multimodal Data for Scan2CAD-based 3D Reconstruction

Keywords:
3D Computer Vision, Deep Learning, Scan-to-CAD

Description

3D reconstruction of indoor environments is essential for applications such as virtual reality, simulation, and robotics. Scan2CAD is a state-of-the-art approach to 3D reconstruction and modeling that aligns CAD models to scanned point clouds [1]. However, it focuses primarily on geometric structure and does not capture the color, texture, or material properties of objects in the scene, which limits its usefulness in applications where object appearance is important.

The proposed master's thesis aims to improve Scan2CAD-based reconstruction by exploiting multiple modalities, such as RGB images in addition to geometry and CAD models, with the goal of improving both the accuracy and the visual fidelity of the reconstruction. The thesis can draw inspiration from the approaches presented in [2, 3, 4].
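
As a rough illustration of one possible direction, the sketch below fuses per-point geometry and color features into a single global descriptor by late fusion. The encoders are simple placeholders (a real pipeline would use a point-cloud backbone and an image backbone), and the whole design is an assumption for illustration; it is not the Scan2CAD or ROCA architecture.

import torch
import torch.nn as nn

class LateFusionDescriptor(nn.Module):
    """Encodes geometry (xyz) and appearance (rgb) separately, then fuses them."""
    def __init__(self, feat_dim=256):
        super().__init__()
        # Placeholder per-point encoders; a real pipeline would use a point-cloud
        # backbone (e.g. PointNet-like) and an image/texture backbone instead.
        self.geo_encoder = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, feat_dim))
        self.rgb_encoder = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, feat_dim))
        self.fuse = nn.Sequential(nn.Linear(2 * feat_dim, feat_dim), nn.ReLU())

    def forward(self, xyz, rgb):
        # xyz, rgb: (B, N, 3); max-pool over points for a global descriptor
        geo = self.geo_encoder(xyz).max(dim=1).values
        app = self.rgb_encoder(rgb).max(dim=1).values
        return self.fuse(torch.cat([geo, app], dim=-1))  # (B, feat_dim) fused descriptor

# Example: descriptors for a colored scan, usable e.g. for CAD retrieval scoring
model = LateFusionDescriptor()
desc = model(torch.rand(4, 2048, 3), torch.rand(4, 2048, 3))
print(desc.shape)  # torch.Size([4, 256])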

References

[1] Avetisyan, Armen, et al. "Scan2CAD: Learning CAD Model Alignment in RGB-D Scans." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.

[2] Wald, Johanna, et al. "RIO: 3D Object Instance Re-Localization in Changing Indoor Environments." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019.

[3] Gümeli, Can, Angela Dai, and Matthias Nießner. "ROCA: Robust CAD Model Retrieval and Alignment from a Single Image." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.

[4] Siddiqui, Yawar, et al. "Texturify: Generating Textures on 3D Shape Surfaces." Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part III. Cham: Springer Nature Switzerland, 2022.

Prerequisites

  • Python and Git
  • Experience with a deep learning framework (PyTorch, TensorFlow)
  • Interest in Computer Vision and Machine Learning

Contact

Please send your CV and Transcript of Records to:

adam.misik@tum.de

Supervisor:

Adam Misik

Global Camera Localization in Lidar Maps

Keywords:
Contrastive Learning, Localization, Camera, Lidar, Point Clouds

Description

Visual localization is a fundamental problem in computer vision with applications in robotics, autonomous driving, and augmented reality. A common approach to visual localization matches 2D features of an image query against points in a previously acquired 3D map [1]. A shortcoming of this approach is the viewpoint dependency of the query features, which leads to poor results when the viewing angle between the query and the map varies. Other effects, such as photometric inconsistencies, further limit the potential of 2D image features for localization in a 3D map.

Recently, localization based on point cloud to point cloud matching has been introduced [2, 3]. Once a map is created with a 3D sensor such as lidar, the device can be localized by directly matching a query lidar point cloud against the previously created global map. The advantages of this approach are its higher robustness to viewpoint variations and the direct depth information available in both the 3D map and the 3D query [1].

A common assumption in visual localization approaches is that the same modality is used for both the query and the map generation. However, this is often not the case, especially for devices used in robotics and augmented reality. To perform point cloud-based localization in these common cases, cross-source point cloud retrieval and registration must be addressed. This work investigates such an approach.
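
One possible building block is a contrastive retrieval objective that embeds camera-derived and lidar-derived submaps into a shared space. The sketch below shows an InfoNCE-style loss under the assumption that row i of both embedding batches describes the same place; the encoders that produce the embeddings are left open.

import torch
import torch.nn.functional as F

def cross_source_infonce(cam_emb, lidar_emb, temperature=0.07):
    """cam_emb, lidar_emb: (B, D); row i of both tensors describes the same place."""
    cam = F.normalize(cam_emb, dim=-1)
    lid = F.normalize(lidar_emb, dim=-1)
    logits = cam @ lid.t() / temperature                    # (B, B) cosine similarities
    targets = torch.arange(cam.size(0), device=cam.device)  # positives on the diagonal
    # Symmetric loss: camera-to-lidar and lidar-to-camera retrieval
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Example with random embeddings standing in for encoder outputs
loss = cross_source_infonce(torch.randn(16, 256), torch.randn(16, 256))
print(loss.item())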

References

[1] T. Caselitz, B. Steder, M. Ruhnke, and W. Burgard, “Monocular camera localization in 3D LiDAR maps,” in 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Oct. 2016, pp. 1926–1931. doi: 10.1109/IROS.2016.7759304.

[2] J. Du, R. Wang, and D. Cremers, “DH3D: Deep Hierarchical 3D Descriptors for Robust Large-Scale 6DoF Relocalization,” in Computer Vision – ECCV 2020, vol. 12349, A. Vedaldi, H. Bischof, T. Brox, and J.-M. Frahm, Eds. Cham: Springer International Publishing, 2020, pp. 744–762. doi: 10.1007/978-3-030-58548-8_43.

[3] J. Komorowski, M. Wysoczanska, and T. Trzcinski, “EgoNN: Egocentric Neural Network for Point Cloud Based 6DoF Relocalization at the City Scale.” arXiv, Oct. 24, 2021. Accessed: Oct. 31, 2022. [Online]. Available: http://arxiv.org/abs/2110.12486

Prerequisites

  • Python and Git
  • Experience with SLAM
  • Experience with a deep learning framework (PyTorch, TensorFlow)
  • Interest in Computer Vision and Machine Learning

Contact

Please send your CV and Transcript of Records to:

adam.misik@tum.de

Supervisor:

Adam Misik

Uncertainty Quantification for Deep Learning-based Point Cloud Registration

Keywords:
Uncertainty Quantification, Point Cloud Registration, Bayesian Inference, Deep Learning

Description

The problem of registering point clouds can be reduced to estimating a Euclidean transformation between two sets of 3D points [1]. Once the transformation is estimated, it can be used to register two point clouds in a common coordinate system.
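
For reference, when correspondences between the two point sets are known, the rigid transformation has a closed-form solution via the Kabsch/SVD method. The sketch below is a minimal NumPy version of this classical step, which learning-based methods effectively replace or initialize.

import numpy as np

def kabsch(src, dst):
    """src, dst: (N, 3) corresponding points. Returns R, t with dst ~ src @ R.T + t."""
    src_c = src - src.mean(axis=0)
    dst_c = dst - dst.mean(axis=0)
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)           # SVD of the cross-covariance
    d = np.sign(np.linalg.det(Vt.T @ U.T))              # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst.mean(axis=0) - R @ src.mean(axis=0)
    return R, t

# Example: recover a known rotation and translation from noiseless correspondences
rng = np.random.default_rng(0)
src = rng.normal(size=(100, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(Q) < 0:
    Q = -Q                                              # ensure a proper rotation (det = +1)
dst = src @ Q.T + np.array([0.5, -1.0, 2.0])
R_est, t_est = kabsch(src, dst)
print(np.allclose(R_est, Q, atol=1e-6), t_est)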

Applications of point cloud registration include 3D reconstruction, localization, and change detection. However, these applications rely on a high similarity between the point clouds and do not account for disturbances such as noise, occlusions, or outliers. Such defects degrade the quality of the point clouds and thus the accuracy of the registration-dependent application. One approach to deal with these effects is to quantify the registration uncertainty. The general idea is to use the uncertainty as an indicator of registration quality: if it is too high, a new registration iteration or a re-scan is required.

In this project, we investigate uncertainty quantification for current learning-based approaches to point cloud registration [1, 2, 3]. First, several methods for uncertainty quantification are selected [4]; of particular interest are approaches based on Bayesian inference. These methods are then adapted to current point cloud registration frameworks and evaluated on benchmark datasets such as ModelNet or ShapeNet. The evaluation should cover different types of scan perturbations.
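
As one concrete example of a technique surveyed in [4], the sketch below applies Monte Carlo dropout to a learned registration model: dropout stays active at test time, the network is run several times, and the spread of the predicted transformations serves as a simple uncertainty proxy. The model interface is a hypothetical placeholder; in practice a network such as DeepGMR or PREDATOR [2, 3] would take its place.

import torch

@torch.no_grad()
def mc_dropout_registration(model, src, dst, num_samples=20):
    """src, dst: (1, N, 3) point clouds; model(src, dst) is assumed to return a (1, 4, 4) transform."""
    model.eval()
    for m in model.modules():
        if isinstance(m, torch.nn.Dropout):
            m.train()        # keep dropout stochastic while batch-norm statistics stay frozen
    samples = torch.stack([model(src, dst) for _ in range(num_samples)])  # (T, 1, 4, 4)
    mean_transform = samples.mean(dim=0)
    # Entry-wise standard deviation as a simple uncertainty proxy; more principled
    # measures would work with a distribution on the SE(3) manifold instead.
    uncertainty = samples.std(dim=0)
    return mean_transform, uncertainty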

References

[1] Huang, Xiaoshui, et al. A Comprehensive Survey on Point Cloud Registration. arXiv:2103.02690, arXiv, 5 Mar. 2021. arXiv.org, http://arxiv.org/abs/2103.02690.

[2] Yuan, Wentao, et al. DeepGMR: Learning Latent Gaussian Mixture Models for Registration. arXiv:2008.09088, arXiv, 20 Aug. 2020. arXiv.org, http://arxiv.org/abs/2008.09088.

[3] Huang, Shengyu, et al. “PREDATOR: Registration of 3D Point Clouds with Low Overlap.” 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2021, pp. 4265–74. DOI.org (Crossref), https://doi.org/10.1109/CVPR46437.2021.00425.

[4] Abdar, Moloud, et al. “A Review of Uncertainty Quantification in Deep Learning: Techniques, Applications and Challenges.” Information Fusion, vol. 76, Dec. 2021, pp. 243–97. ScienceDirect, https://doi.org/10.1016/j.inffus.2021.05.008.

Prerequisites

  • Python and Git
  • Experience with a deep learning framework (PyTorch, TensorFlow)
  • Interest in Computer Vision and Machine Learning

Contact

Please send your CV and Transcript of Records to:

adam.misik@tum.de

Supervisor:

Adam Misik

Research Internships (Forschungspraxis)

Camera-Lidar Dataset for Localization Tasks

Keywords:
Camera, Lidar, Dataset Creation, SLAM, Machine Learning

Description

In this project, we will create a camera-lidar dataset that can be used to improve visual localization. The idea is to extend existing localization datasets with corresponding camera-lidar point cloud pairs [1].

To generate the lidar submaps, we will use the procedure proposed in [2]. For the camera submaps, we will use visual SLAM and extract the 3D reconstruction [3]. The lidar and camera submaps will then be linked based on odometry or timestamp information provided with the localization dataset.
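
A minimal sketch of the linking step is shown below, assuming each submap is stored as a dictionary with a timestamp in seconds and the camera submaps are sorted by time; the field names and the matching tolerance are illustrative assumptions.

import bisect

def link_submaps(lidar_submaps, camera_submaps, max_dt=0.1):
    """Pair each lidar submap with the camera submap closest in time (within max_dt seconds)."""
    cam_times = [c["timestamp"] for c in camera_submaps]
    pairs = []
    for lidar in lidar_submaps:
        i = bisect.bisect_left(cam_times, lidar["timestamp"])
        # Inspect the neighbours around the insertion point for the closest timestamp
        candidates = [j for j in (i - 1, i) if 0 <= j < len(cam_times)]
        if not candidates:
            continue
        j = min(candidates, key=lambda k: abs(cam_times[k] - lidar["timestamp"]))
        if abs(cam_times[j] - lidar["timestamp"]) <= max_dt:
            pairs.append((lidar, camera_submaps[j]))
    return pairs

# Example with toy metadata
lidar = [{"timestamp": 0.00, "points": None}, {"timestamp": 1.02, "points": None}]
camera = [{"timestamp": 0.03, "cloud": None}, {"timestamp": 2.50, "cloud": None}]
print(len(link_submaps(lidar, camera)))  # 1: only the first lidar submap finds a match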

References

[1] Maddern, Will, et al. "1 year, 1000 km: The Oxford RobotCar dataset." The International Journal of Robotics Research 36.1 (2017): 3-15.

[2] Uy, Mikaela Angelina, and Gim Hee Lee. "PointNetVLAD: Deep Point Cloud Based Retrieval for Large-Scale Place Recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.

[3] Mur-Artal, Raul, Jose Maria Martinez Montiel, and Juan D. Tardos. "ORB-SLAM: A Versatile and Accurate Monocular SLAM System." IEEE Transactions on Robotics 31.5 (2015): 1147-1163.

Prerequisites

  • Python and Git
  • C++ basics
  • Interest in SLAM and Computer Vision 

Contact

Please send your CV and Transcript of Records to:

adam.misik@tum.de

Supervisor:

Adam Misik