Part of Advances in Neural Information Processing Systems 10 (NIPS 1997)
Daniel D. Lee, H. Seung
We have constructed an inexpensive video based motorized tracking system that learns to track a head. It uses real time graphical user inputs or an auxiliary infrared detector as supervisory signals to train a convolutional neural network. The inputs to the neural network consist of normalized luminance and chrominance images and motion information from frame differences. Subsampled images are also used to provide scale invariance. During the online training phases the neural network rapidly adjusts the input weights depending up on the reliability of the different channels in the surrounding environment. This quick adaptation allows the system to robustly track a head even when other objects are moving within a cluttered background.