FIGRIM Fixation Dataset

We provide eye fixation data for a total of 2,787 images spanning 21 indoor and outdoor scene categories. These images are split into two sets: 630 target images and over 2K filler images. For the 630 target images, we provide eye fixation data for an average of 16 observers per image, memorability scores, pre-computed fixation maps for training and testing saliency models, and complete annotations for all the objects in each image. For the 2K filler images, we provide eye fixation data for an average of 15 observers per image and pre-computed fixation maps for training and testing saliency models. These images are a subset of the FIGRIM fine-grained image memorability dataset.

@article{figrim,
    title={Intrinsic and Extrinsic Effects on Image Memorability},
    author={Bylinskii, Zoya and Isola, Phillip and Bainbridge, Constance and Torralba, Antonio and Oliva, Aude},
    journal={Vision Research},
    volume={116},
    pages={165--178},
    year={2015},
    publisher={Elsevier}
}

Data collection

Images

A total of 630 target images were chosen from the FIGRIM fine-grained image memorability dataset by sampling 30 images from each of FIGRIM's 21 indoor and outdoor scene categories. Over 2K filler images were chosen in roughly equal proportions from the same scene categories (94-105 images per category). In a single session, a participant saw a sequence of about 1000 images, of which about 157-158 were targets randomly sampled from the 630 target images. These targets repeated a total of 3 times throughout the image sequence, spaced 50-60 filler images apart. Filler images occurred only once in the entire image sequence.
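The counts above can be cross-checked with a little arithmetic. The sketch below is only a sanity check, not part of the released code: it assumes the sequence length is exactly 1000, takes the upper end of the 157-158 target range, and reads "repeated a total of 3 times" as 3 presentations per target. All variable and function names are illustrative.

```python
# Figures taken from the dataset description.
TOTAL_IMAGES = 2787
N_TARGETS = 630
N_CATEGORIES = 21

# Assumptions for this sketch (the text says "about 1000" and "157-158").
SEQUENCE_LENGTH = 1000
TARGETS_PER_SESSION = 158
PRESENTATIONS_PER_TARGET = 3  # assumed reading of "repeated a total of 3 times"


def session_breakdown(sequence_length, targets_per_session, presentations):
    """Split one session into target and filler presentations."""
    target_presentations = targets_per_session * presentations
    filler_presentations = sequence_length - target_presentations
    return target_presentations, filler_presentations


n_fillers = TOTAL_IMAGES - N_TARGETS
targets_per_category = N_TARGETS // N_CATEGORIES
mean_fillers_per_category = n_fillers / N_CATEGORIES
target_pres, filler_pres = session_breakdown(
    SEQUENCE_LENGTH, TARGETS_PER_SESSION, PRESENTATIONS_PER_TARGET
)

print("filler images in the dataset:", n_fillers)
print("targets per category:", targets_per_category)
print("mean fillers per category:", round(mean_fillers_per_category, 1))
print("target presentations per session:", target_pres)
print("filler presentations per session:", filler_pres)
```

Under these assumptions, roughly half of each session consists of target presentations and the rest of fillers shown once each.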

Eye tracking

All images in the sequence were presented for 2 s each, separated by a fixation cross lasting 0.4 s. Participants were instructed to respond to each image to indicate whether or not it had appeared previously in the sequence (forced-choice response at the end of each image presentation). All participant eye fixations and keypresses were recorded. Images were presented at 1000 x 1000 pixels on a 19-inch CRT monitor with a resolution of 1280 x 1024 pixels, 22 inches from the chin rest mount. Images subtended 30 degrees of visual angle. Eye tracking was performed on an SR Research EyeLink 1000 desktop system at a sampling rate of 500 Hz.
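The stated visual angle can be approximately recovered from the display geometry. The sketch below assumes square pixels and that the full 19-inch diagonal is viewable, neither of which is stated above, so the result is only an estimate. It also computes how many raw gaze samples one 2 s presentation yields at 500 Hz. Function and variable names are illustrative.

```python
import math


def visual_angle_deg(size, distance):
    """Visual angle (degrees) of a stimulus of a given size viewed head-on.

    `size` and `distance` must be in the same unit.
    """
    return math.degrees(2 * math.atan(size / (2 * distance)))


# Values from the description above.
SCREEN_DIAGONAL_IN = 19.0
SCREEN_RES_PX = (1280, 1024)
IMAGE_SIZE_PX = 1000
VIEWING_DISTANCE_IN = 22.0
PRESENTATION_S = 2.0
SAMPLING_RATE_HZ = 500

# Assumption: square pixels, full nominal diagonal viewable.
diagonal_px = math.hypot(*SCREEN_RES_PX)
pixels_per_inch = diagonal_px / SCREEN_DIAGONAL_IN
image_size_in = IMAGE_SIZE_PX / pixels_per_inch

angle = visual_angle_deg(image_size_in, VIEWING_DISTANCE_IN)
samples_per_image = int(PRESENTATION_S * SAMPLING_RATE_HZ)

print(f"image size on screen: {image_size_in:.2f} in")
print(f"visual angle: {angle:.1f} degrees")
print("gaze samples per image:", samples_per_image)
```

With these assumptions the estimate comes out at about 29.5 degrees, consistent with the reported 30 degrees.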

Target image downloads

README file

target images

download file structure with all images (630 targets across 21 scene categories)

fixation data

download file structure with fixation maps (jpg images containing heatmap of fixations per image file)
download file structure with fixation locations (Matlab file containing binary matrix per image file)
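The two fixation formats are related: a continuous fixation map is commonly obtained by blurring the binary fixation-location matrix with a Gaussian. The sketch below illustrates that idea on a toy 9 x 9 grid using only the standard library. It does not reproduce how the provided maps were computed; the kernel width (`sigma`), the zero padding at the borders, and the peak normalization are all assumptions. In practice a library routine such as `scipy.ndimage.gaussian_filter` would be the usual choice.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at 3 sigma."""
    radius = int(3 * sigma)
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(grid, kernel):
    """Convolve every row with the kernel, treating out-of-bounds as zero."""
    radius = len(kernel) // 2
    width = len(grid[0])
    blurred = []
    for row in grid:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, weight in enumerate(kernel):
                xx = x + k - radius
                if 0 <= xx < width:
                    acc += weight * row[xx]
            new_row.append(acc)
        blurred.append(new_row)
    return blurred


def transpose(grid):
    return [list(col) for col in zip(*grid)]


def fixation_map(binary, sigma):
    """Blur a binary fixation matrix (separable Gaussian) and scale to [0, 1]."""
    kernel = gaussian_kernel(sigma)
    blurred = blur_rows(binary, kernel)                        # horizontal pass
    blurred = transpose(blur_rows(transpose(blurred), kernel))  # vertical pass
    peak = max(max(row) for row in blurred)
    if peak == 0:
        return blurred  # no fixations: leave the all-zero map as is
    return [[value / peak for value in row] for row in blurred]


# Toy example: two fixations on a 9 x 9 grid (real matrices are image-sized).
binary = [[0] * 9 for _ in range(9)]
binary[4][4] = 1
binary[2][6] = 1
heatmap = fixation_map(binary, sigma=1.0)

for row in heatmap:
    print(" ".join(f"{value:.2f}" for value in row))
```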

metadata and object annotations

download Matlab file with LabelMe object annotations and individual participant recordings (fixations, memory scores)

Filler image downloads

README file

filler images

download file structure with all images (over 2k fillers across 21 scene categories)

fixation data

metadata

download Matlab file with individual participant recordings (fixations, memory scores)

Code

Code to help display object annotations, plot fixations, and otherwise access the data structure.

Contact

Zoya Bylinskii
E-mail: zoya[at]mit.edu

This work is supported by the National Science Foundation (under Grant No. 1016862 to A.O.), the Natural Sciences and Engineering Research Council of Canada, as well as Google, Xerox, and MIT CSAIL Big Data Initiative Awards. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation and other funding agencies. All materials on this website, including images, data, and visualizations, can be used for academic research purposes ONLY.