{"title": "Global Optimisation of Neural Network Models via Sequential Sampling", "book": "Advances in Neural Information Processing Systems", "page_first": 410, "page_last": 416, "abstract": null, "full_text": "Global Optimisation of Neural Network Models via Sequential Sampling \n\nJoao FG de Freitas, Cambridge University Engineering Department, Cambridge CB2 1PZ England, jfgf@eng.cam.ac.uk [Corresponding author] \n\nMahesan Niranjan, Cambridge University Engineering Department, Cambridge CB2 1PZ England, niranjan@eng.cam.ac.uk \n\nArnaud Doucet, Cambridge University Engineering Department, Cambridge CB2 1PZ England, ad2@eng.cam.ac.uk \n\nAndrew H Gee, Cambridge University Engineering Department, Cambridge CB2 1PZ England, ahg@eng.cam.ac.uk \n\nAbstract \n\nWe propose a novel strategy for training neural networks using sequential sampling-importance resampling algorithms. This global optimisation strategy allows us to learn the probability distribution of the network weights in a sequential framework. It is well suited to applications involving on-line, nonlinear, non-Gaussian or non-stationary signal processing. \n\n1 INTRODUCTION \n\nThis paper addresses sequential training of neural networks using powerful sampling techniques. Sequential techniques are important in many applications of neural networks involving real-time signal processing, where data arrival is inherently sequential. Furthermore, one might wish to adopt a sequential training strategy to deal with non-stationarity in signals, so that information from the recent past is lent more credence than information from the distant past. One way to sequentially estimate neural network models is to use a state space formulation and the extended Kalman filter (Singhal and Wu 1988; de Freitas, Niranjan and Gee 1998). 
This involves local linearisation of the output equation, which is easy to perform since we only need the derivatives of the output with respect to the unknown parameters. This approach has been employed by several authors, including ourselves. \n\nHowever, the local linearisation leading to the EKF algorithm is a gross simplification of the probability densities involved. Nonlinearity of the output model induces multimodality in the resulting distributions, and a Gaussian approximation of these densities will lose important details. The approach we adopt in this paper is one of sampling. In particular, we discuss the use of 'sampling-importance resampling' and 'sequential importance sampling' algorithms, also known as particle filters (Gordon, Salmond and Smith 1993; Pitt and Shephard 1997), to train multi-layer neural networks. \n\n2 STATE SPACE NEURAL NETWORK MODELLING \n\nWe start from a state space representation to model the neural network's evolution in time. A transition equation describes the evolution of the network weights, while a measurement equation describes the nonlinear relation between the inputs and outputs of a particular physical process, as follows: \n\nw_{k+1} = w_k + d_k   (1) \ny_k = g(w_k, x_k) + v_k   (2) \n\nwhere y_k ∈ R^o denotes the output measurements and x_k ∈ R
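The state space model of Eqs. (1)-(2) can be sketched with a minimal sampling-importance resampling (SIR) update: propagate weight particles through the random-walk transition, weight them by a Gaussian measurement likelihood, and resample. The single-hidden-unit form of `g`, the noise levels `q_std`/`r_std`, and all variable names are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def g(w, x):
    # assumed toy network: one hidden tanh unit, weights w = (a, b, c)
    return w[..., 0] * np.tanh(w[..., 1] * x + w[..., 2])

def sir_step(particles, x_k, y_k, q_std=0.05, r_std=0.1):
    """One SIR update for the model w_{k+1} = w_k + d_k, y_k = g(w_k, x_k) + v_k."""
    n = particles.shape[0]
    # Eq. (1): random-walk transition of the weight particles
    particles = particles + q_std * rng.standard_normal(particles.shape)
    # Eq. (2): importance weights from the Gaussian measurement likelihood
    residual = y_k - g(particles, x_k)
    logw = -0.5 * (residual / r_std) ** 2
    w = np.exp(logw - logw.max())
    w /= w.sum()
    # resample particles in proportion to their normalised weights
    idx = rng.choice(n, size=n, p=w)
    return particles[idx], w

# synthetic data stream generated by a fixed 'true' network
w_true = np.array([1.0, 2.0, -0.5])
particles = rng.standard_normal((500, 3))
for _ in range(200):
    x = rng.uniform(-1.0, 1.0)
    y = g(w_true, x) + 0.1 * rng.standard_normal()
    particles, w = sir_step(particles, x, y)

w_est = particles.mean(axis=0)  # posterior mean estimate of the weights
```

After processing the stream, the particle cloud approximates the posterior distribution over the network weights, so point estimates (such as the posterior mean above) and uncertainty estimates both come for free, which is the advantage over a single-point EKF fit.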