ICS data sets

We are providing three public data sets for the purposes of activity and object recognition.

The first data set (cooking) is dedicated to exploring methods for the recognition of human everyday activities in realistic scenarios. It captures human motions during two cooking scenarios. A unique feature of this data set is that the motions were recorded with three external cameras and two egocentric cameras.

The second data set (objects) is intended for analyzing techniques for object recognition under different constraints and lighting conditions. Its images were obtained from the cameras of the iCub humanoid robot, in both low and high resolution, which results in considerable variability of lighting and background across the images.

The third data set, the Household Activities from Virtual Environments Data Set (HAVE), contains recordings of household activities performed in virtual reality; it is described in detail below.

Please note:

This data is free for use in research projects. If you publish results obtained using this data, we would appreciate it if you acknowledged our work and sent the citation of your published paper to info@ics.ei.tum.de.


The Household Activities from Virtual Environments Data Set (HAVE) consists of 240 recordings in virtual reality, collected at the Automatica Trade Fair 2018 in Munich. The recordings are split among three household scenarios: "setting the table", "washing the dishes" and "cleaning the living room". The data set is of an exploratory nature, and each scenario name served as the sole instruction the participants received (e.g. "please set the table"), in order to collect a wide variety of styles.


The cooking data set covers realistic scenarios, such as making pancakes or making sandwiches, because these scenarios contain several goal-directed movements of the kind typically observed in natural environments.


The object data set contains 16,500 training images, 8,250 validation images and 8,250 testing images, each with a resolution of 240×240 pixels. The 16,500 training images result from 500 images per setting × 3 settings × 11 objects.
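For orientation, the following minimal Python sketch recomputes the split sizes stated above. It is not an official loader; only the numeric constants come from the description, and all names are chosen for illustration.

    # Minimal sketch: recompute the object data set split sizes from the
    # figures stated above (500 images per object and setting, 3 settings,
    # 11 objects). All identifiers are illustrative assumptions.
    IMAGES_PER_OBJECT_AND_SETTING = 500
    NUM_SETTINGS = 3
    NUM_OBJECTS = 11

    train = IMAGES_PER_OBJECT_AND_SETTING * NUM_SETTINGS * NUM_OBJECTS
    assert train == 16_500                 # matches the stated training size

    val, test = 8_250, 8_250               # stated validation and testing sizes
    assert val == test == train // 2       # each equals half of the training set

    print(f"train={train}, val={val}, test={test}")
    # -> train=16500, val=8250, test=8250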