Scene Graph-based Real-time Scene Understanding for Assistive Robot Manipulation Task
Description
As embodied intelligent robots develop rapidly, real-time and accurate scene understanding is crucial for completing tasks efficiently and effectively. A scene graph represents the objects in a scene as nodes and the relations between them as edges. Previous studies have generated scene graphs from images or 3D scenes, in some cases with the assistance of large language models (LLMs).
In this work, we investigate how scene graphs can assist the human operator during teleoperated manipulation tasks. Using scene graphs generated in real time, the robot system can build a comprehensive understanding of the scene and reason about the best way to complete the manipulation task given the current robot state.
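To make the idea of a scene graph concrete, the following is a minimal sketch in Python. The class `SceneGraph`, its methods, the object names, and the `"on"` predicate are all hypothetical and chosen only for illustration; they are not part of the project's actual system, which would generate the graph from sensor data in real time.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Minimal directed scene graph: nodes are objects, edges are relations."""
    # object name -> attribute dict (e.g. class label); hypothetical layout
    nodes: dict = field(default_factory=dict)
    # list of (subject, predicate, object) triples
    edges: list = field(default_factory=list)

    def add_object(self, name, **attributes):
        self.nodes[name] = attributes

    def add_relation(self, subject, predicate, obj):
        if subject not in self.nodes or obj not in self.nodes:
            raise KeyError("both endpoints must be added as objects first")
        self.edges.append((subject, predicate, obj))

    def relations_of(self, name):
        """Return all triples in which `name` is the subject or the object."""
        return [e for e in self.edges if name in (e[0], e[2])]

    def blockers(self, target):
        """Objects resting directly on `target` (assumed to block grasping it)."""
        return [s for (s, p, o) in self.edges if p == "on" and o == target]


# Toy scene: a cup stands on a book, which lies on a table.
graph = SceneGraph()
graph.add_object("table", label="table")
graph.add_object("book", label="book")
graph.add_object("cup", label="cup")
graph.add_relation("book", "on", "table")
graph.add_relation("cup", "on", "book")
```

In this toy scene, `graph.blockers("book")` returns `["cup"]`, which is the kind of query a manipulation planner could use to decide that the cup should be moved before the book is grasped.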
Prerequisites
- Good programming skills (Python, C++)
- Knowledge of Ubuntu/Linux and ROS
- Motivation to learn and conduct research
Contact
dong.yang@tum.de
(Please attach your CV and transcript)
Supervisor:
Equivariant 3D Object Detection
3D Object Detection, Computer Vision, Deep Learning, Indoor Environments
Description
The thesis focuses on applying equivariant deep learning techniques to 3D object detection in indoor scenes. Indoor environments, such as homes, offices, and industrial settings, present particular challenges for 3D object detection because of diverse object arrangements, varying lighting conditions, and occlusions. Traditional methods often struggle with these complexities, which leads to suboptimal performance. The motivation for this research is to make 3D object detection in these environments more robust and accurate by exploiting the built-in symmetry properties of equivariant networks. This approach aims to improve the model's ability to recognize objects regardless of their orientation and position in the scene, which is crucial for applications in robotics or augmented reality.
The thesis proposes developing a deep learning model for 3D object detection that incorporates equivariant neural networks, such as the Vector Neurons framework [1]. The proposed model will be evaluated on a benchmark 3D indoor dataset, such as the Stanford 3D Indoor Spaces Dataset (S3DIS), ScanNet [2], or Replica [3].
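The equivariance property itself can be checked numerically. The sketch below uses only the Python standard library and assumes the linear-layer formulation of Vector Neurons [1], in which a feature is a list of 3D vectors (a C x 3 matrix) and the layer mixes channels without mixing the x, y, z coordinates. The helper names (`matmul`, `rotation_z`, `rotation_x`, `vn_linear`) and the random weights are illustrative assumptions; this is a single layer, not a detection model.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def rotation_z(theta):
    """3x3 rotation about the z-axis by angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rotation_x(theta):
    """3x3 rotation about the x-axis by angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def vn_linear(weight, features):
    """Vector-neuron-style linear layer: (C_out x C_in) @ (C_in x 3)."""
    return matmul(weight, features)


rng = random.Random(0)
c_in, c_out = 4, 2
# Each row of `features` is one 3D vector channel.
features = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(c_in)]
weight = [[rng.uniform(-1, 1) for _ in range(c_in)] for _ in range(c_out)]
# An arbitrary rotation, composed from two axis rotations.
R = matmul(rotation_z(0.7), rotation_x(-1.2))

# Rotating the input and then applying the layer ...
rotate_then_layer = vn_linear(weight, matmul(features, R))
# ... should match applying the layer and then rotating the output.
layer_then_rotate = matmul(vn_linear(weight, features), R)

max_diff = max(
    abs(x - y)
    for row_a, row_b in zip(rotate_then_layer, layer_then_rotate)
    for x, y in zip(row_a, row_b)
)
```

The two results agree up to floating-point error because matrix multiplication is associative: W(VR) = (WV)R. A layer that treated the x, y, z coordinates as ordinary independent channels would generally not satisfy this identity.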
References
[1] Deng, Congyue, et al. "Vector Neurons: A General Framework for SO(3)-Equivariant Networks." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.
[2] Dai, Angela, et al. "ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
[3] Straub, Julian, et al. "The Replica Dataset: A Digital Replica of Indoor Spaces." arXiv preprint arXiv:1906.05797. 2019.
Prerequisites
- Python and Git
- Experience with a deep learning framework (PyTorch, TensorFlow)
- Interest in Computer Vision and Machine Learning