Technical specification of the recordings

We recorded two real-world, challenging tasks: pancake making and sandwich making. The experimental setup, shown in the figure above, was similar for both tasks. It consists of three cameras placed at different positions and a gaze camera with attached markers.

The figure above also shows the location of the sensors in the kitchen and the objects used to prepare the sandwiches. The number of subjects, repetitions, and conditions is as follows: a) the pancake scenario has 1 subject, 9 repetitions, and 1 condition (normal); b) the sandwich scenario has 8 subjects, 8 repetitions, and 2 conditions (normal and fast speed).
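As a rough illustration, the recording design described above can be written down as a small table in code. This is only a sketch: the names `Scenario`, `SCENARIOS`, and `max_trials` are hypothetical, and the trial count assumes a full design in which every subject performs every repetition under every condition, which the text does not state explicitly. Treat the resulting numbers as an upper bound under that assumption, not as the actual number of recordings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Hypothetical container for the per-scenario figures quoted above."""
    name: str
    subjects: int
    repetitions: int
    conditions: tuple

    def max_trials(self) -> int:
        # ASSUMPTION: repetitions are counted per subject and per condition.
        # The source text does not confirm this, so this is an upper bound.
        return self.subjects * self.repetitions * len(self.conditions)


# Values taken directly from the description above.
SCENARIOS = [
    Scenario("pancake", subjects=1, repetitions=9, conditions=("normal",)),
    Scenario("sandwich", subjects=8, repetitions=8, conditions=("normal", "fast")),
]

if __name__ == "__main__":
    for s in SCENARIOS:
        print(s.name, s.subjects, s.repetitions, s.conditions, s.max_trials())
```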

All recordings share the same configuration and contain the following modalities:

  • Three static video cameras. The videos were captured with Sanyo HD2000 digital cameras at 60 frames per second, with a bitrate of 24012 kbps and a high resolution of 1920 x 1080 pixels.
  • One mounted camera, part of the EyeSeeCam [Schneider et al., 2009], that captures images from the top of the subject’s head and therefore provides a first-person view. This camera records at 25 frames per second. The video dimensions are 720 x 406 pixels.
  • One gaze camera that rotates to follow the user’s eye motion. This video is distinctive because it shows the subject’s focus of attention during the execution of the task. The camera records at 25 Hz. The video dimensions are 720 x 406 pixels, the same as the mounted camera, but the image is zoomed in on the user’s point of focus. This video is noisier because the camera’s own rotation adds to the motion of the user’s head.
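Because the static cameras run at 60 frames per second while the mounted and gaze cameras run at 25, frame indices do not correspond one-to-one across modalities. The sketch below shows one way to map a frame in one stream to the temporally closest frame in another. It assumes constant frame rates and, by default, that all streams start at the same instant; neither assumption is stated in the description above, and real recordings may need a measured synchronization offset. The names `STREAMS` and `matching_frame` and the stream keys are hypothetical.

```python
from fractions import Fraction

# Frame rates and resolutions as listed above; the keys are illustrative names.
STREAMS = {
    "static": {"fps": 60, "size": (1920, 1080), "count": 3},
    "head": {"fps": 25, "size": (720, 406), "count": 1},
    "gaze": {"fps": 25, "size": (720, 406), "count": 1},
}


def matching_frame(frame_idx, src, dst, offset_s=0):
    """Return the index of the dst frame closest in time to a src frame.

    offset_s is the number of seconds added to the src timestamp to obtain
    the dst timestamp. The default of 0 ASSUMES both streams start together.
    """
    # Exact rational arithmetic avoids floating-point drift on long videos.
    t = Fraction(frame_idx, STREAMS[src]["fps"]) + Fraction(offset_s)
    return max(0, round(t * STREAMS[dst]["fps"]))
```

For example, under these assumptions `matching_frame(25, "head", "static")` returns 60, since both indices correspond to the one-second mark.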