Visual saliency estimation: from heuristic to data-driven

Visual attention is one of the most important mechanisms the human brain uses to process visual information. By handling "visually salient" information with high priority and ignoring less important visual subsets, the brain neatly sidesteps its information-processing bottleneck. In this manner, the human brain, with only 10-20 billion neurons, demonstrates an impressive capability to process the roughly 10 billion bits of information the retina receives every second. Such a mechanism, once fully studied and understood, may offer insights for research on brain-like and brain-inspired computing.

The mechanisms of visual attention have in fact been studied by neurobiologists and psychologists for centuries, to the point that, as William James famously observed, "everyone knows what attention is." In computer vision, however, the question is how to turn human knowledge about visual attention and saliency into computational models. Early computational models of visual saliency focused on directly simulating characteristics of the attention system described in neurobiological and psychological experiments. Such heuristic models can predict visual attention and saliency to some extent, but they often fail on unexpected scenes, since it is very difficult to directly simulate all the attention mechanisms of the human brain. As a result, recent studies have turned to a new methodology: developing saliency models in a data-driven manner. These studies treat the attention mechanism of the human brain as a black box and adopt machine learning methods to infer the mechanisms reflected in the attentive behavior recorded in eye-tracking experiments. Because learning-based methods enable the development of complex attention/saliency models, data-driven models often outperform heuristic models in predicting human fixations.
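At its simplest, the data-driven view treats saliency estimation as supervised learning: each pixel or region is described by a feature vector (e.g., local contrast, edge energy), and eye-tracking data supplies fixated/non-fixated labels to train a classifier. The sketch below uses entirely synthetic features and labels with a plain logistic-regression scorer; it is an illustration of the methodology, not any specific published saliency model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 2000 "pixels", each described by 3 features,
# with a binary label from eye tracking: fixated (1) or not (0).
# Here both features and labels are synthetic.
n, d = 2000, 3
X = rng.normal(size=(n, d))
true_w = np.array([2.0, -1.0, 0.5])          # hidden "attention" weights
p_fix = 1.0 / (1.0 + np.exp(-X @ true_w))
labels = (p_fix > rng.random(n)).astype(float)

# Logistic regression by gradient descent: learn a mapping from
# features to fixation probability, i.e. a data-driven saliency score.
w = np.zeros(d)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w -= 0.1 * (X.T @ (p - labels)) / n

saliency_score = 1.0 / (1.0 + np.exp(-X @ w))  # per-pixel score in [0, 1]
```

In real pipelines the features come from image filters or deep networks and the labels from fixation maps, but the black-box principle is the same: fit whatever mapping best explains the recorded attentive behavior.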

Recently, Dr. Jia Li, an associate professor at the School of Computer Science and Engineering of Beihang University, published his latest work on visual saliency estimation in IEEE TPAMI [1]. The work proposes a template-based contrast computation theory to explain the process of saliency computation in the frequency domain. As the first attempt to "learn" a saliency model in the frequency domain, it explains the correlation between Fourier coefficients and visual saliency. By characterizing the inherent relationship between spatial and spectral saliency models, the work enables the development of cascaded saliency filters learned in the frequency domain. Experimental results show that the proposed approach outperforms dozens of state-of-the-art models on public benchmarks.

Jia Li, Associate Professor, School of Computer Science and Engineering, Beihang University

[1] Finding the Secret of Image Saliency in the Frequency Domain. IEEE TPAMI, in press, 2015.