Deep Learning of Invariant Features via Simulated Fixations in Video

Zou, Will; Zhu, Shenghuo; Yu, Kai; Ng, Andrew

Will Zou, Shenghuo Zhu, Kai Yu, Andrew Y. Ng

Advances in Neural Information Processing Systems 25 (NIPS 2012)

Abstract

We apply salient feature detection and tracking in videos to simulate fixations and smooth pursuit in human vision. With tracked sequences as input, a hierarchical network of modules learns invariant features using a temporal slowness constraint. The network encodes invariances that become increasingly complex at higher levels of the hierarchy. Although learned from videos, our features are spatial rather than spatio-temporal, and are well suited to extracting features from still images. We apply our features to four datasets (COIL-100, Caltech 101, STL-10, PubFig) and observe a consistent improvement of 4% to 5% in classification accuracy. With this approach, we achieve a state-of-the-art recognition accuracy of 61% on the STL-10 dataset.
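As a rough illustration of what a temporal slowness constraint penalizes, the sketch below sums the L1 distance between feature vectors of consecutive frames in a tracked sequence. The abstract does not give the exact formulation, so the L1 form, the function name `slowness_penalty`, and the toy feature values are all assumptions made for illustration only.

```python
def slowness_penalty(features):
    """Sum of L1 distances between feature vectors of consecutive frames.

    `features` is a list of equal-length lists, one per frame of a tracked
    sequence. This is an assumed, simplified form of a slowness term, not
    necessarily the objective used in the paper.
    """
    total = 0.0
    for prev, curr in zip(features, features[1:]):
        total += sum(abs(a - b) for a, b in zip(prev, curr))
    return total


# Hypothetical feature vectors for two three-frame sequences.
steady = [[1.0, 0.5], [1.0, 0.5], [1.0, 0.5]]  # features stay constant
jumpy = [[1.0, 0.5], [0.0, 1.5], [1.0, 0.5]]   # features change every frame

print(slowness_penalty(steady))
print(slowness_penalty(jumpy))
```

A representation whose features change little as the tracked patch moves or deforms gets a low penalty, which is the sense in which minimizing such a term encourages invariance.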
