Upon successful completion of this module, students understand the challenges in Human Activity Understanding and can design processes for the automatic, sensor-based recognition of ongoing human activity.
Students are able to collect and use synthetic data as well as multi-camera sequential data in egocentric and stationary setups, to annotate and extract relevant semantic information, and to work with representations of spatial and temporal data.
Students are able to apply AI models and algorithms to extract the information available in a scene, and to recognize and predict human activity based on the extracted information.
They are ultimately able to analyze and evaluate the results of the various algorithms involved as well as the solutions they have designed.
Sensor data collection and annotation
- Multi-sensor and multi-view data collection and processing, including color/depth/IMU
- Synthetic data generation for Human Actions
- Accelerated ground truth annotation using interactive instance segmentation and tracking
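A recurring step in multi-sensor recording is temporal alignment of streams that run at different rates, e.g. a ~30 Hz camera against a ~200 Hz IMU. A minimal sketch of nearest-timestamp matching (the example timestamps are illustrative, not from an actual recording setup):

```python
import bisect

def nearest_timestamp_index(timestamps, query):
    """Return the index of the timestamp closest to `query` in a sorted list."""
    pos = bisect.bisect_left(timestamps, query)
    if pos == 0:
        return 0
    if pos == len(timestamps):
        return len(timestamps) - 1
    before, after = timestamps[pos - 1], timestamps[pos]
    # Pick whichever neighbor is temporally closer to the query.
    return pos if after - query < query - before else pos - 1

def align_imu_to_frames(frame_ts, imu_ts):
    """For each camera frame, pick the index of the nearest IMU sample."""
    return [nearest_timestamp_index(imu_ts, t) for t in frame_ts]

frames = [0.00, 0.033, 0.066]                                # ~30 Hz camera
imu = [0.000, 0.005, 0.010, 0.030, 0.035, 0.060, 0.070]      # ~200 Hz IMU
print(align_imu_to_frames(frames, imu))  # -> [0, 4, 6]
```

Real recorders additionally have to handle clock offsets and drift between devices; this sketch assumes all streams share one clock.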
Semantic inference building blocks
- Object detection
- Human and Object pose estimation
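Detection outputs are commonly matched against ground truth via Intersection-over-Union (IoU) of bounding boxes. A minimal sketch for axis-aligned boxes, assuming the common (x1, y1, x2, y2) corner convention:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two 10x10 boxes overlapping in a 5x5 region: 25 / (100 + 100 - 25)
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # -> 0.142857...
```

A detection is typically counted as correct when its IoU with a ground-truth box exceeds a threshold such as 0.5.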
Graph representation of spatial and temporal data
- 3D scene graphs
- Semantic graphs
- Spatio-Temporal graphs
- Knowledge Bases (Ontologies)
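One common way to organize spatial and temporal structure in a single representation is a graph whose nodes are (frame, entity) pairs: spatial edges connect entities within a frame, and temporal edges link the same entity across consecutive frames. A toy sketch (the entity names and positions are purely illustrative):

```python
def build_st_graph(frames):
    """Build a spatio-temporal graph from per-frame entity dicts.

    frames: list of dicts mapping entity_id -> position.
    Nodes are (frame_index, entity_id) pairs. Spatial edges connect all
    entity pairs within a frame; temporal edges connect the same entity
    across consecutive frames.
    """
    spatial, temporal = [], []
    for t, entities in enumerate(frames):
        ids = sorted(entities)
        # Fully connect entities observed in the same frame.
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                spatial.append(((t, ids[i]), (t, ids[j])))
        # Link each entity to its previous-frame instance, if present.
        if t > 0:
            for e in entities:
                if e in frames[t - 1]:
                    temporal.append(((t - 1, e), (t, e)))
    return spatial, temporal

frames = [{"hand": (0.0, 0.0), "cup": (1.0, 0.0)},
          {"hand": (0.1, 0.0), "cup": (1.0, 0.0)}]
spatial, temporal = build_st_graph(frames)
```

In practice, spatial edges are often restricted (e.g. by proximity or detected interactions) rather than fully connected, and edges carry relation labels; this sketch only shows the node/edge layout.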
Sequential deep learning models for Human Activity Recognition and Anticipation
- Recurrent Neural Networks
- Graph Networks
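Recurrent models process a sequence step by step, maintaining a hidden state that is updated from each new observation. A deliberately minimal, scalar Elman-style recurrence (the weights are arbitrary illustrative values; real models use learned weight matrices and vector-valued states):

```python
import math

def rnn_step(x, h, w_xh, w_hh, b):
    """One Elman RNN step for a scalar state: h' = tanh(w_xh*x + w_hh*h + b)."""
    return math.tanh(w_xh * x + w_hh * h + b)

def rnn_forward(xs, w_xh=0.5, w_hh=0.9, b=0.0):
    """Run the scalar recurrent cell over a sequence; return the final hidden state."""
    h = 0.0  # initial hidden state
    for x in xs:
        h = rnn_step(x, h, w_xh, w_hh, b)
    return h
```

The final hidden state summarizes the whole sequence and would feed a classifier head in an activity recognition model; for anticipation, the recurrence is instead rolled forward beyond the observed steps.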
Teaching and learning methods
- Supervised weekly lab sessions, with several introductory lectures by research assistants at the beginning of the course, and supervised practical implementation based on the provided skeleton code.
- Individual methods and solutions introduced by the student
- Lectures on the theoretical basics of project planning, technical management, and tools for collaboration (SCRUM, Gitlab, Wiki, etc.)
- Final project: individual and group work with independent planning, execution and documentation
- Seminar: Presentation of intermediate and final results and discussion (reflection, feedback).
The following media forms will be used:
- Script and review articles from the technical literature
- Tutorials and software documentation
- Development Environment (virtual machines on server)
- Simulation environment
- Data collection setup
- [20%] Implementation of introductory practical tasks in the field of Human Activity Understanding in Python and C++: data acquisition and processing, recognition of people and objects in the scene, and obtaining a semantic understanding of ongoing activity (2 data-acquisition campaigns, 8 programming tasks).
- [60%] Hands-on project work: creating project plans and presenting them (8-10 minute presentation), regularly discussing work progress and next steps with the supervisor (2 meetings), technical problem solving, and using appropriate tools for efficient teamwork (4 lab sessions).
- [20%] Approximately 10-minute presentation of results including a demo, followed by an approximately 10-minute discussion.