David Barber, Bertrand Mesot
We introduce a method for approximate smoothed inference in a class of switching linear dynamical systems, based on a novel form of Gaussian Sum smoother. This class includes the switching Kalman Filter and the more general case of switch transitions dependent on the continuous latent state. The method improves on the standard Kim smoothing approach by dispensing with one of the key approximations, thus making fuller use of the available future information. Whilst the only central assumption required is projection to a mixture of Gaussians, we show that an additional conditional independence assumption results in a simpler but stable and accurate alternative. Unlike the alternative, unstable Expectation Propagation procedure, our method consists only of a single forward and backward pass and is reminiscent of the standard smoothing 'correction' recursions in the simpler linear dynamical system. The algorithm performs well both on toy experiments and in a large-scale application to noise-robust speech recognition.
1 Switching Linear Dynamical System

The Linear Dynamical System (LDS) is a key temporal model in which a latent linear process generates the observed series. For complex time-series which are not well described globally by a single LDS, we may break the time-series into segments, each modeled by a potentially different LDS. This is the basis for the Switching LDS (SLDS) [2, 3, 4, 5] where, for each time $t$, a switch variable $s_t \in \{1, \ldots, S\}$ describes which of the LDSs is to be used. The observation (or 'visible') $v_t \in \mathbb{R}^V$ is linearly related to the hidden state $h_t \in \mathbb{R}^H$ with additive noise by

$$v_t = B(s_t) h_t + \eta^v(s_t), \qquad p(v_t \mid h_t, s_t) = \mathcal{N}\left(B(s_t) h_t, \Sigma_v(s_t)\right) \tag{1}$$

where $\mathcal{N}(\mu, \Sigma)$ denotes a Gaussian distribution with mean $\mu$ and covariance $\Sigma$. The transition dynamics of the continuous hidden state $h_t$ is linear,

$$h_t = A(s_t) h_{t-1} + \eta^h(s_t), \qquad p(h_t \mid h_{t-1}, s_t) = \mathcal{N}\left(A(s_t) h_{t-1}, \Sigma_h(s_t)\right) \tag{2}$$

The switch $s_t$ may depend on both the previous $s_{t-1}$ and $h_{t-1}$. This is an augmented SLDS (aSLDS), and defines the model

$$p(v_{1:T}, h_{1:T}, s_{1:T}) = \prod_{t=1}^{T} p(v_t \mid h_t, s_t)\, p(h_t \mid h_{t-1}, s_t)\, p(s_t \mid h_{t-1}, s_{t-1})$$
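The generative process defined by equations (1) and (2) can be illustrated by ancestral sampling. The following is a minimal sketch for the standard SLDS only, in which the switch follows a fixed transition table $p(s_t \mid s_{t-1})$; in the augmented case that table would be replaced by a distribution that also depends on $h_{t-1}$. The function name `sample_slds`, the argument names, and the Gaussian form chosen for $p(h_1 \mid s_1)$ (parameters `mu0`, `Sigma0`) are assumptions made for illustration and do not come from the article.

```python
import numpy as np

def sample_slds(A, B, Sigma_h, Sigma_v, trans, prior_s, mu0, Sigma0, T, rng):
    """Draw one sequence (s, h, v) from a standard SLDS (illustrative sketch).

    Shapes: A (S, H, H), B (S, V, H), Sigma_h (S, H, H), Sigma_v (S, V, V).
    trans[i, j] = p(s_t = j | s_{t-1} = i); prior_s[j] = p(s_1 = j).
    mu0 (S, H) and Sigma0 (S, H, H) parameterize p(h_1 | s_1), which is
    ASSUMED Gaussian here. Switch states are indexed 0..S-1.
    """
    S, V, H = B.shape
    s = np.zeros(T, dtype=int)
    h = np.zeros((T, H))
    v = np.zeros((T, V))
    for t in range(T):
        if t == 0:
            # Initial switch and hidden state from their priors.
            s[t] = rng.choice(S, p=prior_s)
            h[t] = rng.multivariate_normal(mu0[s[t]], Sigma0[s[t]])
        else:
            # Switch transition p(s_t | s_{t-1}), then equation (2).
            s[t] = rng.choice(S, p=trans[s[t - 1]])
            h[t] = rng.multivariate_normal(A[s[t]] @ h[t - 1], Sigma_h[s[t]])
        # Emission, equation (1).
        v[t] = rng.multivariate_normal(B[s[t]] @ h[t], Sigma_v[s[t]])
    return s, h, v
```

Calling this with `rng = np.random.default_rng(0)` and two switch states with different `A` matrices gives a series whose dynamics change whenever the sampled switch changes, which is the segment-wise behaviour described above.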
The standard SLDS considers only switch transitions $p(s_t \mid s_{t-1})$. At time $t = 1$, $p(s_1 \mid h_0, s_0)$ simply denotes the prior $p(s_1)$, and $p(h_1 \mid h_0, s_1)$ denotes $p(h_1 \mid s_1)$. The aim of this article is to address how to perform inference in the aSLDS. In particular we desire the filtered estimate $p(h_t, s_t \mid v_{1:t})$ and the smoothed estimate $p(h_t, s_t \mid v_{1:T})$, for any $1 \le t \le T$. Both filtered and smoothed inference in the SLDS are intractable, scaling exponentially with time.
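To see where the exponential scaling comes from in the standard SLDS, the filtered posterior over the hidden state can be written as a sum over switch histories. This is a standard identity, stated here under the assumption that $p(h_1 \mid s_1)$ is Gaussian; it is not a result specific to this article.

```latex
p(h_t \mid v_{1:t})
  = \sum_{s_{1:t}} p(s_{1:t} \mid v_{1:t}) \, p(h_t \mid s_{1:t}, v_{1:t})
```

With the switch sequence $s_{1:t}$ fixed, the model reduces to a time-varying LDS, so each $p(h_t \mid s_{1:t}, v_{1:t})$ is a single Gaussian. The sum runs over $S^t$ switch sequences, so the exact filtered posterior is a Gaussian mixture whose number of components grows exponentially with $t$.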