In computer vision, data labeling is essential for training capable machine learning models: accurate annotations are what teach algorithms to interpret visual information. However, labeling visual data poses specific challenges, including the complexity of the data itself, the need for precise annotations (e.g., pixel-accurate masks), and the sheer size of modern datasets. Overcoming these challenges is crucial for computer vision systems that extract useful information from images, recognize objects, and find use across a wide range of industries.
It is therefore important to develop automatic annotation pipelines for 2D and 3D labeling across various tasks, leveraging recent advances in computer vision to label visual data automatically, efficiently, and accurately.
This master's thesis will focus on automatically labeling images and videos, specifically on generating 2D/3D labels (i.e., 2D/3D bounding boxes and segmentation masks). The automatic labeling pipeline must generalize to any type of image or video, such as household objects, toys, and indoor/outdoor environments.
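A tight 2D bounding box is fully determined by a segmentation mask, so one simple way to keep the two label types consistent is to derive boxes from masks. The following is a minimal sketch, assuming masks are stored as lists of 0/1 rows and boxes as inclusive (x_min, y_min, x_max, y_max) pixel coordinates. The names `mask_to_box`, `Mask`, and `Box2D` are hypothetical and are not part of any of the tools named here.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: a binary mask as a list of rows, 1 = object pixel.
Mask = List[List[int]]
# Hypothetical box convention: (x_min, y_min, x_max, y_max), inclusive pixel coordinates.
Box2D = Tuple[int, int, int, int]


def mask_to_box(mask: Mask) -> Optional[Box2D]:
    """Return the tightest axis-aligned box around all foreground pixels, or None if the mask is empty."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not ys:
        return None
    xs = [x for row in mask for x, v in enumerate(row) if v]
    return (min(xs), min(ys), max(xs), max(ys))


if __name__ == "__main__":
    mask = [
        [0, 0, 0, 0, 0],
        [0, 1, 1, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0],
    ]
    print(mask_to_box(mask))  # (1, 1, 3, 2)
```

Deriving boxes from masks, rather than storing both independently, means a manual correction to a mask automatically yields an updated box.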
The automatic labeling pipeline will be built on zero-shot detection and segmentation models such as GroundingDINO and segment-anything, as well as similar methods (see Awesome Segment Anything). The labeling pipeline, including the models it uses, will be implemented in the autodistill code base, and its performance will be evaluated by training and testing smaller target models on specific tasks.
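A common way to combine these model families is the two-stage design used by Grounded-Segment-Anything: a text-prompted detector proposes boxes, and each box then prompts a segmenter for a mask. The sketch below shows only the data flow. The detector and segmenter are hypothetical callables standing in for GroundingDINO and segment-anything, not their real APIs. The `ontology` dictionary (text prompt to class name) is similar in spirit to autodistill's caption ontologies but is not its interface. The 0.35 score threshold is an arbitrary placeholder.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Mask = List[List[int]]


@dataclass
class Detection:
    prompt: str  # the text prompt that produced this detection
    box: Box
    score: float


@dataclass
class Label:
    class_name: str
    box: Box
    mask: Mask


# Hypothetical interfaces: a text-prompted detector (the role GroundingDINO plays)
# and a box-prompted segmenter (the role segment-anything plays).
Detector = Callable[[object, List[str]], List[Detection]]
Segmenter = Callable[[object, Box], Mask]


def auto_label(image, ontology: Dict[str, str], detect: Detector, segment: Segmenter,
               score_threshold: float = 0.35) -> List[Label]:
    """Label one image: detect boxes from text prompts, then segment inside each kept box."""
    labels = []
    for det in detect(image, list(ontology)):
        if det.score < score_threshold:
            continue  # drop low-confidence detections
        mask = segment(image, det.box)
        # The detector is assumed to report one of the prompts it was given.
        labels.append(Label(ontology[det.prompt], det.box, mask))
    return labels


if __name__ == "__main__":
    # Stub models, for illustration only.
    def fake_detect(image, prompts):
        return [Detection(prompts[0], (0, 0, 2, 2), 0.9),
                Detection(prompts[0], (1, 1, 3, 3), 0.1)]

    def fake_segment(image, box):
        return [[1]]

    labels = auto_label(None, {"stuffed toy": "toy"}, fake_detect, fake_segment)
    print([(l.class_name, l.box) for l in labels])  # [('toy', (0, 0, 2, 2))]
```

With the models behind plain callables, one detector or segmenter can be swapped for another without touching the labeling loop. This helps when comparing several zero-shot models.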
Sub-tasks:
- Automatic generation of 2D labels for images and videos, such as 2D bounding boxes and segmentation masks (see Grounded-Segment-Anything, segment-any-moving, and Segment-and-Track-Anything).
- Automatic generation of 3D labels for images and videos, such as 3D bounding boxes and segmentation masks (see 3D-Box-Segment-Anything, SegmentAnything3D, segment-any-moving, and Segment-and-Track-Anything).
- Implementation of a 2D/3D labeling tool for correcting and refining the automatically generated 2D/3D labels (see DLTA-AI).
- Integration of the automatic labeling pipeline, the base models it uses, and selected target models into the autodistill code base, enabling easy end-to-end labeling, training, and deployment for tasks such as 2D/3D object detection and segmentation.
- A comprehensive overview of the performance and limitations of current zero-shot models when used for automatic labeling in tasks such as 2D/3D object detection and segmentation.
- Suggestions for future work to overcome the limitations of the methods used.
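For the 3D sub-task, one simple baseline is to lift a 2D mask into 3D using an aligned depth map and the pinhole camera model, then take the axis-aligned box of the resulting points. This is not the method of any repository listed above. It is a minimal sketch that assumes a metric depth map aligned with the image (e.g., from an RGB-D sensor), known intrinsics `fx, fy, cx, cy`, and zero depth marking missing measurements. The function names are hypothetical.

```python
from typing import List, Optional, Tuple

Point3D = Tuple[float, float, float]
Box3D = Tuple[Point3D, Point3D]  # (min corner, max corner) in camera coordinates


def backproject(u: int, v: int, z: float, fx: float, fy: float, cx: float, cy: float) -> Point3D:
    """Pinhole camera model: pixel (u, v) at depth z -> 3D point in the camera frame."""
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)


def mask_to_box3d(mask: List[List[int]], depth: List[List[float]],
                  fx: float, fy: float, cx: float, cy: float) -> Optional[Box3D]:
    """Axis-aligned 3D box (camera frame) around the back-projected mask pixels."""
    points = [
        backproject(u, v, depth[v][u], fx, fy, cx, cy)
        for v, row in enumerate(mask)
        for u, inside in enumerate(row)
        if inside and depth[v][u] > 0  # skip background pixels and missing depth
    ]
    if not points:
        return None
    lo = tuple(min(p[i] for p in points) for i in range(3))
    hi = tuple(max(p[i] for p in points) for i in range(3))
    return (lo, hi)


if __name__ == "__main__":
    mask = [[1, 1], [1, 1]]
    depth = [[2.0, 2.0], [4.0, 0.0]]  # bottom-right pixel has no depth
    print(mask_to_box3d(mask, depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0))
    # ((0.0, 0.0, 2.0), (2.0, 4.0, 4.0))
```

Such boxes are crude. Depth noise at mask edges can inflate them, and they are axis-aligned in the camera frame rather than oriented to the object, which is exactly the kind of limitation the performance overview is meant to quantify.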
Bonus tasks:
- Addition of image augmentation and editing methods to the labeling pipeline and tool to generate more training data (see EditAnything).
- Implementation of one-shot labeling methods to generate labels for unique objects (see Personalize-SAM and Matcher).
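Geometric augmentations only produce usable training data if the labels are transformed together with the image. As a minimal sketch, here is a horizontal flip applied consistently to an image, its mask, and its 2D box. It assumes list-of-rows images and masks and inclusive (x_min, y_min, x_max, y_max) pixel boxes; the function names are hypothetical. Generative editing, as in EditAnything, is a different technique and is not shown.

```python
from typing import List, Tuple, TypeVar

T = TypeVar("T")
Box2D = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive pixel coordinates


def hflip_grid(grid: List[List[T]]) -> List[List[T]]:
    """Mirror an image or mask (list of rows) left-to-right."""
    return [row[::-1] for row in grid]


def hflip_box(box: Box2D, width: int) -> Box2D:
    """Mirror an inclusive pixel box for an image of the given width."""
    x_min, y_min, x_max, y_max = box
    return (width - 1 - x_max, y_min, width - 1 - x_min, y_max)


def hflip_sample(image, mask, box: Box2D):
    """Flip image, mask, and box together so the labels stay aligned with the pixels."""
    width = len(image[0])
    return hflip_grid(image), hflip_grid(mask), hflip_box(box, width)


if __name__ == "__main__":
    image = [[10, 20, 30, 40], [50, 60, 70, 80]]
    mask = [[1, 1, 0, 0], [0, 1, 0, 0]]
    print(hflip_sample(image, mask, (0, 0, 1, 1)))
    # ([[40, 30, 20, 10], [80, 70, 60, 50]], [[0, 0, 1, 1], [0, 0, 1, 0]], (2, 0, 3, 1))
```

The flipped box equals the tight box of the flipped mask, which is a cheap sanity check for any label-aware augmentation.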