ABSTRACT
How best to evaluate a saliency model's ability to predict where humans look in images is an open research question. The choice of evaluation metric depends on how saliency is defined and how the ground truth is represented. Metrics differ in how they rank saliency models because they differ in how false positives and false negatives are treated, whether viewing biases are accounted for, whether spatial deviations are factored in, and how the saliency maps are pre-processed. In this paper, we provide an analysis of eight evaluation metrics and their properties. With the help of systematic experiments and visualizations of metric computations, we add interpretability to saliency scores and more transparency to the evaluation of saliency models. Building on the differences in metric properties and behaviors, we make recommendations for metric selection under specific assumptions and for specific applications.
ABSTRACT
When an observer looks at an image, their eyes fixate on a few select points. Fixations from different observers are often consistent: observers tend to look at the same locations. We investigate how image resolution affects fixation locations and consistency across humans through an eye-tracking experiment. We showed 168 natural images and 25 pink noise images at different resolutions to 64 observers. Each image was shown at eight resolutions (height between 4 and 512 pixels) and upsampled to 860 × 1024 pixels for display. The total amount of visual information available ranged from 1/8 to 16 cycles per degree, respectively. We measured how well one observer's fixations predict another observer's fixations on the same image at different resolutions, using the area under the receiver operating characteristic (ROC) curve as a metric. We found that: (1) Fixations on lower resolution images can predict fixations on higher resolution images. (2) Human fixations are biased toward the center at all resolutions, and this bias is stronger at lower resolutions. (3) Human fixations become more consistent as resolution increases until around 16-64 pixels (1/2 to 2 cycles per degree), after which consistency remains relatively constant despite the spread of fixations away from the center. (4) Fixation consistency depends on image complexity.
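The inter-observer measure described above can be illustrated with a minimal sketch. It is not the procedure used in the study: the Gaussian blur, its width `sigma`, the choice of all non-fixated pixels as negatives, and the function names `fixation_map` and `auc` are assumptions made only for illustration. One observer's fixations are turned into a prediction map, and the area under the ROC curve is computed as the probability that a pixel fixated by a second observer scores higher than a non-fixated pixel.

```python
import numpy as np

def fixation_map(fixations, shape, sigma=2.0):
    """Sum an isotropic Gaussian centred on each (row, col) fixation.

    sigma is a hypothetical blur width in pixels, not a value from the study.
    """
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    out = np.zeros(shape, dtype=float)
    for r, c in fixations:
        out += np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma ** 2))
    return out

def auc(pred_map, fixations):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    Positives are the fixated pixels, negatives are all other pixels
    (an assumption); ties between a positive and a negative count as 0.5.
    """
    mask = np.zeros(pred_map.shape, dtype=bool)
    for r, c in fixations:
        mask[r, c] = True
    pos = pred_map[mask]
    neg = pred_map[~mask]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

# Toy data: observer A's fixations predict two hypothetical observers B.
shape = (32, 32)
observer_a = [(8, 8), (20, 24)]
observer_b_near = [(9, 8), (21, 23)]   # looks close to where A looked
observer_b_far = [(30, 2), (2, 30)]    # looks elsewhere

pred = fixation_map(observer_a, shape)
auc_near = auc(pred, observer_b_near)
auc_far = auc(pred, observer_b_far)
print(auc_near, auc_far)
```

In this toy setting the observer who fixates near observer A's locations receives the higher score, which is the sense in which a higher AUC indicates more consistent fixations. The pairwise comparison is quadratic in memory, so it suits small maps only.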