{"title": "Switching state space model for simultaneously estimating state transitions and nonstationary firing rates", "book": "Advances in Neural Information Processing Systems", "page_first": 2271, "page_last": 2279, "abstract": "We propose an algorithm for simultaneously estimating state transitions among neural states, the number of neural states, and nonstationary firing rates using a switching state space model (SSSM). This model enables us to detect state transitions based not only on discontinuous changes in mean firing rates but also on discontinuous changes in the temporal profiles of firing rates, e.g., temporal correlation. We derive a variational Bayes algorithm for a non-Gaussian SSSM whose non-Gaussian property is caused by binary spike events. Synthetic data analysis reveals that our algorithm outperforms previous methods in estimating state transitions, the number of neural states, and nonstationary firing rates. We also analyze neural data recorded from the medial temporal area. The statistically detected neural states probably coincide with the transient and sustained states that have previously been detected heuristically. Estimated parameters suggest that our algorithm detects state transitions based on discontinuous changes in the temporal correlation of firing rates, transitions that previous methods cannot detect.
This result suggests that our algorithm is advantageous in real-data analysis.", "full_text": "Switching state space model for simultaneously estimating state transitions and nonstationary firing rates

Anonymous Author(s)
Affiliation
Address
email

Abstract

We propose an algorithm for simultaneously estimating state transitions among neural states, the number of neural states, and nonstationary firing rates using a switching state space model (SSSM). This algorithm enables us to detect state transitions on the basis of not only discontinuous changes in mean firing rates but also discontinuous changes in the temporal profiles of firing rates, e.g., temporal correlation. We construct a variational Bayes algorithm for a non-Gaussian SSSM whose non-Gaussian property is caused by binary spike events. Synthetic data analysis reveals that our algorithm outperforms previous methods in estimating state transitions, the number of neural states, and nonstationary firing rates. We also analyze neural data recorded from the medial temporal area. The statistically detected neural states probably coincide with the transient and sustained states that have been detected heuristically. Estimated parameters suggest that our algorithm detects state transitions on the basis of discontinuous changes in the temporal correlation of firing rates, transitions that previous methods cannot detect. This result suggests that our algorithm is advantageous in real-data analysis.

1 Introduction

Elucidating neural encoding is one of the most important issues in neuroscience. 
Recent studies have suggested that cortical neuron activities transit among neural states in response to applied sensory stimuli [1-3]. Abeles et al. detected transitions among neural states using a hidden Markov model whose output distribution is a multivariate Poisson distribution (a multivariate-Poisson hidden Markov model, mPHMM) [1]. Kemere et al. indicated a correspondence between the times of state transitions and the times when input properties change [2]. They also suggested that the number of neural states corresponds to the number of input properties. Assessing neural states and their transitions thus plays a significant role in elucidating neural encoding. Firing rates have state-dependent properties because their means and temporal correlations differ significantly among neural states [1]. We call the times of state transitions change points. Change points are times when the statistics of time-series data change significantly, causing nonstationarity in the data. In this study, stationarity means that time-series data have temporally uniform statistical properties; by this definition, data that are not stationary are nonstationary.

Previous studies have detected change points on the basis of discontinuous changes in mean firing rates using an mPHMM. In this model, the firing rate in each neural state takes a constant value. In motor cortex, however, average firing rates and preferred directions actually change dynamically during motor planning and execution [4]. This makes it necessary to estimate state-dependent, instantaneous firing rates. 
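To make the notion of a change point concrete, the following sketch (our illustration only, with hypothetical 10 Hz and 60 Hz rate values, not data from the paper) generates a binned binary spike train whose mean rate jumps at a known time; each segment is stationary, while the full series is nonstationary because its statistics change at the jump:

```python
import numpy as np

rng = np.random.default_rng(0)
dt = 0.001                                               # bin width Delta = 1 ms
rates = np.r_[np.full(2000, 10.0), np.full(2000, 60.0)]  # step change at bin 2000
spikes = rng.random(rates.size) < rates * dt             # at most one spike per bin

# empirical rates (Hz) on either side of the change point differ sharply
r_pre = spikes[:2000].mean() / dt
r_post = spikes[2000:].mean() / dt
```

The change point here is the single bin where the generating statistics shift; a detector that only tracks the mean would find this jump, which motivates the harder case (changes in temporal profile) discussed below.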
On the other hand, when place cells burst within their place field [5], the inter-burst intervals correspond to the θ rhythm frequency. Medial temporal (MT) area neurons show oscillatory firing rates when the target speed is modulated in the manner of a sinusoidal function [6]. These results indicate that change points also need to be detected when the temporal profiles of firing rates change discontinuously.

One solution is to simultaneously estimate both change points and instantaneous firing rates. A switching state space model (SSSM) [7] can model nonstationary time-series data that include change points. An SSSM defines two or more system models, one of which is modeled to generate observation data through an observation model. It can model nonstationary time-series data while switching system models at change points. Each system model estimates stationary state variables in the region that it handles. Recent studies have focused on constructing algorithms for estimating firing rates from single-trial data to account for trial-by-trial variations in neural activities [8]. However, these previous methods assume firing-rate stationarity within a trial; they cannot estimate nonstationary firing rates that include change points. An SSSM may be used to estimate nonstationary firing rates from single-trial data.

We propose an algorithm for simultaneously estimating state transitions among neural states and nonstationary firing rates using an SSSM. We expect to be able to estimate change points when not only mean firing rates but also temporal profiles of firing rates change discontinuously. 
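The switching mechanism described above can be sketched generatively (our own illustration with hypothetical parameter values, not the paper's model): a slowly switching Markov chain over labels selects which system model is active in each bin, and the active model emits the observed rate, producing piecewise-stationary data whose change points are the bins where the label flips:

```python
import numpy as np

rng = np.random.default_rng(1)
M = 1000
A = np.array([[0.995, 0.005],
              [0.005, 0.995]])     # slow label switching (assumed values)
means = np.array([5.0, 40.0])      # each system model's mean rate (Hz)

labels = np.empty(M, dtype=int)
labels[0] = 0
for m in range(1, M):
    labels[m] = rng.choice(2, p=A[labels[m - 1]])

# each system model contributes a noisy but stationary rate in its own regions
rates = means[labels] + rng.normal(0.0, 1.0, size=M)
change_points = np.flatnonzero(np.diff(labels)) + 1
```

Inference in an SSSM inverts this sketch: given only the observations, it recovers the label sequence (hence the change points) together with each system model's state variables.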
Our algorithm consists of a non-Gaussian SSSM whose non-Gaussian property is caused by binary spike events. Learning and estimation consist of variational Bayes [9,10] and local variational methods [11,12]. Automatic relevance determination (ARD), induced by the variational Bayes method [13], enables us to estimate the number of neural states after pruning redundant ones. For simplicity, we focus on analyzing single-neuron data. Although many studies have discussed state transitions by analyzing multi-neuron data, some of them have suggested that single-neuron activities reflect state transitions in a recurrent neural network [14]. Note that we can easily extend our algorithm to multi-neuron analysis using the often-used assumption that change points are common among recorded neurons [1-3].

2 Definitions of Probabilistic Model

2.1 Likelihood Function

Observation time T consists of K time bins of width Δ (ms), and each bin includes at most one spike (Δ ≪ 1). The spike timings are t = {t_1, ..., t_S}, where S is the total number of observed spikes. We define η_k such that η_k = +1 if the kth bin includes a spike and η_k = −1 otherwise (k = 1, ..., K). The likelihood function is defined by the Bernoulli distribution

p(t|λ) = ∏_{k=1}^{K} (λ_k Δ)^{(1+η_k)/2} (1 − λ_k Δ)^{(1−η_k)/2},   (1)

where λ = {λ_1, ..., λ_K} and λ_k is the firing rate in the kth bin. The product of firing rate and bin width corresponds to the spike-occurrence probability, and λ_k Δ ∈ [0, 1) since Δ ≪ 1. The logit transformation exp(2x_k) = λ_k Δ/(1 − λ_k Δ) (x_k ∈ (−∞, ∞)) lets us handle the nonnegativity of firing rates [11]. 
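The logit parameterization can be checked numerically: under exp(2x_k) = λ_kΔ/(1 − λ_kΔ), the per-bin Bernoulli log-likelihood of eq. (1) reduces to η_k x_k − log 2 cosh x_k, the form that reappears in the coarse-grained likelihood below. A small sketch of this identity (our own check, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(2)
K = 500
x = rng.normal(-2.0, 0.5, size=K)          # logit-scale rates (assumed values)
p = np.exp(2 * x) / (1 + np.exp(2 * x))    # lambda_k * Delta = sigmoid(2 x_k)
eta = np.where(rng.random(K) < p, 1, -1)   # +1 spike, -1 no spike

# eq. (1): Bernoulli log-likelihood
ll_bernoulli = np.sum((1 + eta) / 2 * np.log(p) + (1 - eta) / 2 * np.log(1 - p))
# equivalent logit form: sum_k (eta_k x_k - log 2 cosh x_k)
ll_logit = np.sum(eta * x - np.log(2 * np.cosh(x)))
assert np.allclose(ll_bernoulli, ll_logit)
```

The identity holds because log(λΔ) = 2x − log(1 + e^{2x}) and log(1 − λΔ) = −log(1 + e^{2x}), so both forms equal (1 + η)x − log(1 + e^{2x}) per bin.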
Hereinafter, we call x = {x_1, ..., x_K} the “firing rates”. Since K is large because Δ ≪ 1, the computational cost and memory consumption matter. We thus use coarse graining [15]. Observation time T consists of M coarse bins of width r = CΔ (ms). A coarse bin includes many spikes, and the firing rate in each coarse bin is constant. The likelihood function obtained by applying the logit transformation and the coarse graining to eq. (1) is

p(t|x) = ∏_{m=1}^{M} exp(η̂_m x_m − C log 2 cosh x_m),   (2)

where η̂_m = ∑_{u=1}^{C} η_{(m−1)C+u}.

2.2 Switching State Space Model

An SSSM consists of N system models; for each model, we define a prior distribution. We define label variables z^n_m such that z^n_m = 1 if the nth system model generates an observation in the mth bin and z^n_m = 0 otherwise (n = 1, ..., N, m = 1, ..., M). We call N the number of labels and the nth system model the nth label.

Figure 1: Graphical model representation of an SSSM.

The joint distribution is defined by

p(t, x, z|θ′) = p(t|x, z) p(z|π, a) p(x|µ, β),   (3)

where x = {x^1, ..., x^N}, x^n = {x^n_1, ..., x^n_M}, z = {z^1_1, ..., z^1_M, ..., z^N_1, ..., z^N_M}, and θ′ = {π, a, µ, β} are parameters. The likelihood function, including label variables, is given by

p(t|x, z) = ∏_{n=1}^{N} ∏_{m=1}^{M} [exp(η̂_m x^n_m − C log 2 cosh x^n_m)]^{z^n_m}.   (4)

We define the prior distributions of the label variables as

p(z_1|π) = ∏_{n=1}^{N} (π_n)^{z^n_1} δ(∑_{n=1}^{N} π_n − 1),   (5)

p(z_{m+1}|z_m, a) = ∏_{n=1}^{N} ∏_{k=1}^{N} (a_nk)^{z^n_m z^k_{m+1}} δ(∑_{k=1}^{N} a_nk − 1),   (6)

where π_n and a_nk are the probabilities that the nth label is selected at the initial time and that the nth label switches to the kth one, respectively. The prior distributions of the firing rates are Gaussian:

p(x) = ∏_{n=1}^{N} p(x^n|β_n, µ^n) = ∏_{n=1}^{N} √(|β_n Λ|/(2π)^M) exp(−(β_n/2)(x^n − µ^n)^T Λ (x^n − µ^n)),   (7)

where β_n and µ^n respectively mean the temporal correlation and the mean values of the nth-label firing rates (n = 1, ..., N). Here, for simplicity, we introduced Λ, the structure of the temporal correlation, satisfying p(x^n|β_n, µ^n) ∝ ∏_m exp(−(β_n/2)((x_m − µ_m) − (x_{m−1} − µ_{m−1}))^2). Figure 1 depicts a graphical model representation of an SSSM.

Ghahramani & Hinton (2000) did not introduce a priori knowledge about the label switching frequencies. However, in many cases, the time scale of state transitions is probably slower than that of the temporal variation of firing rates. We define prior distributions of π and a that introduce a priori knowledge about label switching frequencies using Dirichlet distributions:

p(π|γ_n) = C(γ_n) ∏_{n=1}^{N} (π_n)^{γ_n − 1} δ(∑_{n=1}^{N} π_n − 1),  p(a|γ_nk) = ∏_{n=1}^{N} [C(γ_nk) ∏_{k=1}^{N} (a_nk)^{γ_nk − 1} δ(∑_{k=1}^{N} a_nk − 1)],   (8)

where C(γ_n) = Γ(∑_{n=1}^{N} γ_n)/(Γ(γ_1) ··· Γ(γ_N)) and C(γ_nk) = Γ(∑_{k=1}^{N} γ_nk)/(Γ(γ_n1) ··· Γ(γ_nN)).   (9)
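Since the product form above penalizes first differences of x^n − µ^n, Λ is, up to boundary terms which this sketch ignores, the tridiagonal matrix D^T D built from the first-difference operator D. A minimal check of this assumed structure (our own construction, not the authors' code):

```python
import numpy as np

M = 6
# (M-1) x M first-difference operator D: (D d)_m = d_{m+1} - d_m
D = np.eye(M - 1, M, k=1) - np.eye(M - 1, M)
Lam = D.T @ D                                 # candidate precision structure

d = np.random.default_rng(3).normal(size=M)   # stands for x^n - mu^n
quad = d @ Lam @ d
# the quadratic form equals the sum of squared first differences, as in eq. (7)
assert np.allclose(quad, np.sum(np.diff(d) ** 2))
# Lam is tridiagonal: zero outside the three central diagonals
assert np.allclose(Lam, np.triu(np.tril(Lam, 1), -1))
```

The tridiagonal structure is what later makes the posterior precision W^n = C L^n + β_n Λ solvable in O(M) time.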
C(γ_n) and C(γ_nk) correspond to the normalization constants of p(π|γ_n) and p(a|γ_nk), respectively. Γ(u) is the gamma function, defined by Γ(u) = ∫_0^∞ dt t^{u−1} exp(−t). γ_n and γ_nk are hyperparameters that control the probability that the nth label is selected at the initial time and that the nth label switches to the kth one. We define the prior distributions of µ^n and β_n using non-informative priors. Since we do not have a priori knowledge about neural states, µ and β, which characterize each neural state, should be estimated from scratch.

3 Estimation and Learning of the non-Gaussian SSSM

It is generally computationally difficult to calculate the marginal posterior distribution in an SSSM [7]. We thus use the variational Bayes method to calculate approximated posterior distributions q(w) and q(θ) that minimize the variational free energy

F[q] = ∫∫ dw dθ q(w)q(θ) log(q(w)q(θ)/p(t, w, θ)) = U[q] − S[q],   (10)

where w = {z, x} are hidden variables, θ = {π, a} are parameters, U[q] = −∫∫ dw dθ q(w)q(θ) log p(t, w, θ), and S[q] = −∫∫ dw dθ q(w)q(θ) log q(w)q(θ). We denote q(w) and q(θ) as test distributions. The variational free energy satisfies

log p(t) = −F[q] + KL(q(w)q(θ) ‖ p(w, θ|t)),   (11)

where KL(q(w)q(θ) ‖ p(w, θ|t)) is the Kullback-Leibler divergence between the test distributions and the posterior distribution p(w, θ|t), defined by KL(q(y) ‖ p(y|t)) = ∫ dy q(y) log(q(y)/p(y|t)). Since the marginal likelihood log p(t) takes a constant value, minimizing the variational free energy indirectly minimizes the Kullback-Leibler divergence. The variational Bayes method requires conjugacy between the likelihood function (eq. (4)) and the prior distribution (eq. (7)). However, eqs. (4) and (7) are not conjugate to each other because of the binary spike events. The local variational method enables us to construct a variational Bayes algorithm for a non-Gaussian SSSM.

3.1 Local Variational Method

The local variational method, which was proposed by Jaakkola & Jordan [11], approximately transforms a non-Gaussian distribution into a quadratic-form distribution by introducing variational parameters. Watanabe et al. have proven the effectiveness of this method in estimating stationary firing rates [12]. The exponential function in eq. (4) includes f(x^n_m) = log 2 cosh x^n_m, which is a concave function of y = (x^n_m)^2. The concavity can be confirmed by showing the negativity of the second-order derivative of f(x^n_m) with respect to (x^n_m)^2. Considering the tangent line of f(x^n_m) with respect to (x^n_m)^2 at (x^n_m)^2 = (ξ^n_m)^2, we get a lower bound for eq. 
(4):

p_ξ(t|x, z) = ∏_{n=1}^{N} ∏_{m=1}^{M} [exp(η̂_m x^n_m − (C tanh ξ^n_m/(2ξ^n_m))((x^n_m)^2 − (ξ^n_m)^2) − C log 2 cosh ξ^n_m)]^{z^n_m},   (12)

where ξ^n_m is a variational parameter. Equation (12) satisfies the inequality p_ξ(t|x, z) ≤ p(t|x, z). We use eq. (12) as the likelihood function instead of eq. (4). The conjugacy between eqs. (12) and (7) enables us to construct the variational Bayes algorithm. Using eq. (12), we find that the variational free energy

F_ξ[q] = ∫∫ dw dθ q(w)q(θ) log(q(w)q(θ)/p_ξ(t, w, θ)) = U_ξ[q] − S[q]   (13)

satisfies the inequality F_ξ[q] ≥ F[q], where U_ξ[q] = −∫∫ dw dθ q(w)q(θ) log p_ξ(t, w, θ). Since the inequality log p(t) ≥ −F[q] ≥ −F_ξ[q] is satisfied, the test distributions that minimize F_ξ[q] indirectly minimize F[q], which is analytically intractable. Using the EM algorithm to estimate the variational parameters improves the approximation accuracy of F_ξ[q] [16].

3.2 Variational Bayes Method

We assume test distributions that satisfy the constraints q(w) = ∏_{n=1}^{N} q(x^n|µ^n, β_n) q(z) and q(θ) = q(π)q(a), where µ = {µ^1, ..., µ^N} and β = {β_1, ..., β_N}. Under the constraints ∫ dx q(x|µ, β) = 1, ∑_z q(z) = 1, ∫ dπ q(π) = 1, and ∫ da q(a) = 1, we can obtain the test distributions of the hidden variables x^n and z that minimize eq. (13) as follows:

q(x^n|µ^n, β_n) = √(|W^n|/(2π)^M) exp(−(1/2)(x^n − µ̂^n)^T W^n (x^n − µ̂^n)),   (14)

q(z) ∝ ∏_{n=1}^{N} exp(π̂_n)^{z^n_1} ∏_{m=1}^{M−1} ∏_{k=1}^{N} exp(â_nk)^{z^n_m z^k_{m+1}} ∏_{m=1}^{M} exp(b̂^n_m)^{z^n_m},   (15)

where W^n = CL^n + β_n Λ, µ̂^n = (W^n)^{−1}(w^n + β_n Λ µ^n), π̂_n = ⟨log π_n⟩, â_nk = ⟨log a_nk⟩, b̂^n_m = η̂_m ⟨x^n_m⟩ − (C tanh ξ^n_m/(2ξ^n_m))(⟨(x^n_m)^2⟩ − (ξ^n_m)^2) − C log 2 cosh ξ^n_m, L^n is the diagonal matrix whose (m, m) component is ⟨z^n_m⟩ tanh ξ^n_m/ξ^n_m, and w^n is the vector whose mth component is ⟨z^n_m⟩ η̂_m. ⟨·⟩ means the average obtained using the test distribution q(·). The computational cost of calculating the inverse of each W^n is O(M) because Λ is tridiagonal and L^n is diagonal.

In the calculation of q(x^n), ⟨z^n_m⟩ controls the effective variance of the likelihood function. A higher ⟨z^n_m⟩ means the data are reliable for the nth label in the mth bin, and a lower ⟨z^n_m⟩ means the data are unreliable. Under the constraint ∑_{n=1}^{N} z^n_m = 1, all labels estimate their firing rates on the basis of this divide-and-conquer principle of data reliability. Using the equality (ξ^n_m)^2 = ⟨(x^n_m)^2⟩ that will be derived in the next section, we obtain b̂^n_m = η̂_m ⟨x^n_m⟩ − C log 2 cosh(⟨x^n_m⟩(1 + (W^n)^{−1}_{(m,m)}/⟨x^n_m⟩^2)^{1/2}) in eq. (15). When the mth bin includes many (few) spikes, the nth label tends to be selected if it estimates the highest (lowest) firing rate among the labels. 
But the variance of the nth label, (W^n)^{−1}_{(m,m)}, penalizes that label's selection probability.

We can also obtain the test distributions of the parameters π and a as

q(π) = C(γ̂_n) ∏_{n=1}^{N} (π_n)^{γ̂_n − 1} δ(∑_{n=1}^{N} π_n − 1),   (16)

q(a) = ∏_{n=1}^{N} [C(γ̂_nk) ∏_{k=1}^{N} (a_nk)^{γ̂_nk − 1} δ(∑_{k=1}^{N} a_nk − 1)],   (17)

where C(γ̂_n) = Γ(∑_{n=1}^{N} γ̂_n)/(Γ(γ̂_1) ··· Γ(γ̂_N)) and C(γ̂_nk) = Γ(∑_{k=1}^{N} γ̂_nk)/(Γ(γ̂_n1) ··· Γ(γ̂_nN)). C(γ̂_n) and C(γ̂_nk) correspond to the normalization constants of q(π) and q(a), and γ̂_n = ⟨z^n_1⟩ + γ_n, γ̂_nk = ∑_{m=1}^{M−1} ⟨z^n_m z^k_{m+1}⟩ + γ_nk.

We can see that γ_n in γ̂_n controls the probability that the nth label is selected at the initial time, and γ_nk in γ̂_nk biases the probability of the transition from the nth label to the kth label. A forward-backward algorithm enables us to calculate the first- and second-order statistics of q(z). 
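Because q(z) in eq. (15) has the form of an HMM posterior with initial, transition, and emission potentials, the standard scaled forward-backward recursions yield the first-order statistics ⟨z^n_m⟩ in O(MN^2) time. The sketch below is generic (our own, with random placeholder potentials rather than the paper's π̂, â, b̂ values):

```python
import numpy as np

def forward_backward(log_pi, log_A, log_b):
    """Posterior marginals gamma[m, n] = <z^n_m> for a label chain with
    initial (log_pi), transition (log_A), and emission (log_b) log-potentials."""
    M, N = log_b.shape
    A = np.exp(log_A)
    b = np.exp(log_b - log_b.max(axis=1, keepdims=True))  # per-bin rescaling
    alpha = np.empty((M, N))
    beta = np.empty((M, N))
    a0 = np.exp(log_pi - log_pi.max()) * b[0]
    alpha[0] = a0 / a0.sum()
    for m in range(1, M):                  # forward pass with normalization
        alpha[m] = (alpha[m - 1] @ A) * b[m]
        alpha[m] /= alpha[m].sum()
    beta[-1] = 1.0
    for m in range(M - 2, -1, -1):         # backward pass with normalization
        beta[m] = A @ (b[m + 1] * beta[m + 1])
        beta[m] /= beta[m].sum()
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

rng = np.random.default_rng(4)
N, M = 3, 50
A = np.full((N, N), 0.025)
np.fill_diagonal(A, 0.95)                  # rows sum to 1 for N = 3
gamma = forward_backward(np.log(np.full(N, 1.0 / N)), np.log(A),
                         rng.normal(size=(M, N)))
```

The per-step normalizations only rescale alpha and beta by constants per bin, so the row-normalized product alpha * beta still recovers the exact marginals while avoiding numerical underflow for large M.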
Since an SSSM involves many local solutions, we search for a global one using deterministic annealing, which has been proven effective for estimation and learning in an SSSM [7].

3.3 EM Algorithm

The EM algorithm enables us to estimate the variational parameters ξ and the parameters µ and β. In the EM algorithm, the calculation of the Q function is computationally difficult because it requires us to calculate averages using the true posterior distribution. We thus calculate the Q function using the test distributions instead of the true posterior distributions as follows:

Q̃(µ, β, ξ | µ^(t′), β^(t′), ξ^(t′)) = ∫ dx ∑_z q(x|µ^(t′), β^(t′)) q(z) q(π) q(a) log p_ξ(t, x, z, π, a|µ, β).   (18)

Since Q̃(µ, β, ξ | µ^(t′), β^(t′), ξ^(t′)) = −U_ξ[q], maximizing the Q function with respect to µ, β, and ξ is equivalent to minimizing the variational free energy (eq. (10)). The update rules

µ^n_m = ⟨x^n_m⟩,  β_n = M / Tr[Λ((W^n)^{−1} + (⟨x^n⟩ − µ^n)(⟨x^n⟩ − µ^n)^T)],  (ξ^n_m)^2 = ⟨(x^n_m)^2⟩   (19)

maximize the Q function. The following table summarizes our algorithm.

Summary of our algorithm:
Initialize the model parameters. Set γ_n and γ_nk. t′ ← 1.
Perform the following VB and EM algorithms until F_ξ[q] converges:
  ξ^(t′), µ^(t′), β^(t′) ← ξ, µ, β
  Variational Bayes algorithm: perform the VB-E and VB-M steps until F_{ξ^(t′)}[q] converges.
    VB-E step: compute q(x|µ^(t′), β^(t′)) and q(z) using eqs. (14) and (15).
    VB-M step: compute q(π) and q(a) using eqs. (16) and (17).
  EM algorithm: compute ξ, µ, β using eq. (19).
  t′ ← t′ + 1

4 Results

The estimated firing rate in the mth bin is defined by x̃_m = ⟨x^{ñ_m}_m⟩, where ñ_m satisfies ñ_m = arg max_n ⟨z^n_m⟩. The estimated change point m̃_r = m̃CΔ satisfies ⟨z^n_m̃⟩ > ⟨z^k_m̃⟩ (∀k ≠ n) and ⟨z^n_{m̃+1}⟩ < ⟨z^k_{m̃+1}⟩ (∃k ≠ n). The estimated number of labels Ñ is given by Ñ = N − (the number of pruned labels), where we assume that the nth label is pruned out if ⟨z^n_m⟩ < 10^{−5} (∀m). We call our algorithm “the variational Bayes switching state space model” (VB-SSSM).

4.1 Synthetic Data Analysis and Comparison with Previous Methods

We artificially generate spike trains from arbitrarily set firing rates with an inhomogeneous gamma process. Throughout this study, we set κ, which means the spike irregularity, to 2.4 in generating spike trains. We additionally confirmed that the following results are invariant if we generate spikes using an inhomogeneous Poisson or inverse Gaussian process.

In this section, we set the parameters to N = 5, T = 4000, Δ = 0.001, r = 0.04, γ_n = 1, γ_nk = 100 (n = k) or 2.5 (n ≠ k). The hyperparameters γ_nk represent the a priori knowledge that the time scale of transitions among labels is sufficiently slower than that of firing-rate variations.

4.1.1 Accuracy of change-point detection

This section discusses the comparative results between the VB-SSSM and the mPHMM regarding the accuracy of change-point detection and number-of-labels estimation. 
We used the EM algorithm to estimate the label variables in the mPHMM [1-3].

Figure 2: Comparative results of change-point detection for the VB-SSSM and the mPHMM. (a) and (c): Arbitrarily set firing rates for validating the accuracy of change-point detection when firing rates include discontinuous changes in mean value (a) or temporal correlation (c). (b) and (d): Comparative results corresponding to the firing rates in (a) and (c), respectively. The stronger the white color becomes, the more dominant the label is in the bin.

Since the mPHMM is useful in analyzing multi-trial data, in the estimation of the mPHMM we used ten spike trains under the assumption that change points were common among the ten spike trains. On the other hand, the VB-SSSM uses single-trial data. Fig. 2(a) displays the arbitrarily set firing rates used to verify change-point detection accuracy when mean firing rates change discontinuously. The firing rate at time t (ms) was set to λ_t = 0.0 (t ∈ [0, 1000), t ∈ [2000, 3000)), λ_t = 60.0 (t ∈ [1000, 2000)), and λ_t = 110.0 (t ∈ [3000, 4000]). The upper graph in fig. 2(b) indicates the label variables estimated with the VB-SSSM and the lower indicates those estimated with the mPHMM. In the VB-SSSM, ARD estimated the number of labels to be three after pruning redundant labels. As a result of ten-trial data analysis, the VB-SSSM estimated the number of labels to be three in nine out of ten spike trains. The estimated change points were 1000±0.0, 2000±0.0, and 2990±16.9 ms. The true change points were 1000, 2000, and 3000 ms.

Fig. 2(c) plots the arbitrarily set firing rates for verifying change-point detection accuracy when the temporal correlation changes discontinuously. The firing rate at time t (ms) was set to λ_t = λ_{t−1} + 2.0 z_t (t ∈ [0, 2000)) and λ_t = λ_{t−1} + 20.0 z_t (t ∈ [2000, 4000]), where z_t is a standard normal random variable that satisfies ⟨z_t⟩ = 0, ⟨z_t z_t′⟩ = δ_tt′ (δ_tt′ = 1 (t = t′), 0 (t ≠ t′)). Fig. 2(d) shows the comparative results between the VB-SSSM and the mPHMM. ARD estimated the number of labels to be two after pruning redundant labels. As a result of ten-trial data analysis, our algorithm estimated the number of labels to be two in nine out of ten spike trains. The estimated change point was 1933±315.1 ms and the true change point was 2000 ms.

4.1.2 Accuracy of firing-rate estimation

This section discusses the accuracy of nonstationary firing-rate estimation. The comparative methods include kernel smoothing (KS), kernel bandwidth optimization (KBO) [17], adaptive kernel smoothing (KSA) [18], Bayesian adaptive regression splines (BARS) [19], and Bayesian binning (BB) [20]. We used a Gaussian kernel in KS, KBO, and KSA. The kernel widths σ were set to σ = 30 ms (KS30), σ = 50 ms (KS50), and σ = 100 ms (KS100) in KS. In KSA, we used the bin widths estimated using KBO. Cunningham et al. have reviewed all of these compared methods [8]. A firing rate at time t (ms) was set to λ_t = 5.0 (t ∈ [0, 480), t ∈ [3600, 4000]), λ_t = 90.0 × exp(−11(t − 480)/4000) (t ∈ [480, 2400)), and λ_t = 80.0 × exp(−0.5(t − 2400)/4000) (t ∈ [2400, 3600)), and we reset λ_t to 5.0 if λ_t < 5.0. We set these firing rates assuming an experiment in which transient and persistent inputs are applied to an observed neuron in a series. 
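For reference, the fixed-bandwidth Gaussian kernel smoothers used as baselines (the KS family) can be sketched as follows; this is our own generic version with σ = 50 ms in the spirit of KS50, not the exact code used in the comparison:

```python
import numpy as np

def kernel_smooth_rate(spike_bins, dt, sigma):
    """Gaussian-kernel firing-rate estimate (Hz); dt and sigma in seconds."""
    t = np.arange(-4 * sigma, 4 * sigma + dt, dt)
    kernel = np.exp(-t ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum() * dt            # kernel integrates to 1
    return np.convolve(spike_bins, kernel, mode="same")

rng = np.random.default_rng(5)
dt = 0.001
true_rate = np.full(4000, 20.0)            # hypothetical constant 20 Hz rate
spikes = (rng.random(4000) < true_rate * dt).astype(float)
est = kernel_smooth_rate(spikes, dt, sigma=0.05)
```

A fixed bandwidth trades bias against variance uniformly over the trial, which is exactly why such smoothers blur sharp change points, whereas the SSSM handles them by switching labels.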
Note that input information, such as timings, properties, and sequences, is entirely unknown.

Figure 3: Results of firing-rate estimation. (a): Estimated firing rates. Vertical bars above the abscissa are the spikes used for estimation. (b): Averaged label variables ⟨z^n_m⟩. (c): Estimated firing rates using each label. (d): Mean absolute error ± standard deviation when applying our algorithm and other methods to estimate the firing rates plotted in (a). * indicates p<0.01 and ** indicates p<0.005.

Fig. 3(a) plots the estimated firing rates (red line). Fig. 3(b) plots the estimated label variables, and fig. 3(c) plots the estimated firing rates when all labels other than the pruned ones were used. ARD estimated the number of labels to be three after pruning redundant labels. As a result of the analysis of ten spike trains, the VB-SSSM estimated the number of labels to be three in eight out of ten spike trains. The change points were estimated at 420±82.8, 2385±20.7, and 3605±14.1 ms. The true change points were 480, 2400, and 3600 ms.

The mean absolute error (MAE) is defined by MAE = (1/K) ∑_{k=1}^{K} |λ_k − λ̂_k|, where λ_k and λ̂_k are the true and estimated firing rates in the kth bin. All the methods estimated the firing rates ten times. Fig. 3(d) shows the mean MAE values averaged across the ten trials and the standard deviations. We investigated the significant differences in firing-rate estimation among all the methods using the Wilcoxon signed rank test. Both the VB-SSSM and BB show high performance. Note that the VB-SSSM can estimate not only firing rates but also change points and the number of neural states.

4.2 Real Data Analysis

In area MT, neurons preferentially respond to the movement directions of visual inputs [21]. We analyzed neural data recorded from area MT of a rhesus monkey when random dots were presented. These neural data are available from the Neural Signal Archive (http://www.neuralsignal.org), and detailed experimental setups are described by Britten et al. [22]. The input onset corresponds to t = 0 (ms), and the end of the recording corresponds to t = 2000 (ms). This section discusses our analysis of the neural data included in nsa2004.1 j001 T2. These data were recorded from the same neuron of the same subject. Parameters were set as follows: T = 2000, Δ = 0.001, N = 5, r = 0.02, γ_n = 1 (n = 1, ..., 5), γ_nk = 100 (n = k) or 2.5 (n ≠ k).

Fig. 4 shows the analysis results when the random dots have 3.2% coherence. Fig. 4(a) plots the estimated firing rates (red line) and a Kolmogorov-Smirnov plot (K-S plot) (inset) [23]. Since the true firing rates for the real data are entirely unknown, we evaluated the reliability of the estimated values from confidence intervals. The black and gray lines in the inset denote the K-S plot and the 95% confidence intervals. The K-S plot supported the reliability of the estimated firing rates since it fits within the 95% confidence intervals. Fig. 4(b) depicts the estimated label variables, and fig. 4(c) shows the estimated firing rates using all labels other than the pruned ones. The VB-SSSM estimates the number of labels to be two. 
We call the label appearing immediately after the input onset "the 1st neural state" and the label appearing after it "the 2nd neural state". The 1st and 2nd neural states in fig. 4 might correspond to the transient and sustained states [6] that have been detected heuristically, e.g., by assuming that the sustained state lasts for a constant time [24].

Figure 4: Estimated results when applying the VB-SSSM to area MT neural data. (a): Estimated firing rates. Vertical bars above the abscissa axes are the spikes used for the estimates. The inset shows the Kolmogorov-Smirnov goodness-of-fit result; the solid and gray lines correspond to the K-S plot and the 95% confidence interval. (b): Averaged label variables using the test distribution. (c): Estimated firing rates using each label. (d) and (e): Estimated parameters in the 1st and the 2nd neural states.

We analyzed all 105 spike trains recorded under presentations of random dots with 3.2%, 6.4%, 12.8%, and 99.9% coherence, excluding the neural data in which the total spike count was less than 20.
The VB-SSSM estimated the number of labels to be two in 25 of 30 spike trains (3.2%), 19 of 30 (6.4%), 26 of 30 (12.8%), and 16 of 16 (99.9%). In summary, the number of labels was estimated to be two in 85 of 101 spike trains.

Figs. 4(d) and (e) show the parameters estimated from the 19 spike trains whose estimated number of labels was two (6.4% coherence). The horizontal axis denotes the trials, arranged in ascending order. Figs. 4(d) and (e) correspond to the estimated temporal correlation β and the time average of µ, defined by ⟨µ^n⟩ = (1/T_n) Σ_{t=1}^{T_n} µ_t^n, where T_n denotes the sojourn time in the nth label or the total observation time T. The estimated temporal correlation differed significantly between the 1st and 2nd neural states (Wilcoxon signed-rank test, p<0.00005). On the other hand, the estimated mean firing rates did not differ significantly between these neural states (Wilcoxon signed-rank test, p>0.1). Our algorithm thus detected the change points on the basis of discontinuous changes in temporal correlations. We observed similar tendencies for all random-dot coherence conditions (data not shown). We confirmed that the mPHMM could not detect these change points (data not shown), which could also be deduced from the results shown in fig. 2(d). These results suggest that our algorithm is effective in real-data analysis.

5 Discussion

We proposed an algorithm for simultaneously estimating state transitions, the number of neural states, and nonstationary firing rates using single-trial data.

There are several ways to extend our research to the analysis of multi-neuron data. The simplest assumes that the time of state transitions is common among all recorded neurons [1-3].
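The per-trial, per-state comparison behind figs. 4(d) and (e) can be sketched as follows: compute the time average of µ over each state's sojourn, collect one value per trial and state, and apply a paired Wilcoxon signed-rank test across trials. The numbers below are synthetic stand-ins for the recorded estimates, and `state_time_average` is our illustrative helper, not part of the paper's algorithm:

```python
import numpy as np
from scipy.stats import wilcoxon

def state_time_average(mu, labels, state):
    """<mu^n> = (1/T_n) * sum of mu_t over the bins assigned to label n."""
    mask = labels == state
    return mu[mask].mean()

# synthetic stand-ins: per-trial estimates of the temporal correlation
# beta in the 1st and 2nd neural states (19 trials, as in fig. 4(d))
rng = np.random.default_rng(1)
beta1 = rng.normal(3.0e5, 2.0e4, size=19)
beta2 = rng.normal(1.0e5, 2.0e4, size=19)
stat, p = wilcoxon(beta1, beta2)   # paired test across trials

# time average of mu for a toy label sequence
mu = np.array([1.0, 2.0, 3.0, 4.0])
labels = np.array([1, 1, 2, 2])
avg1 = state_time_average(mu, labels, 1)
```

Because the test is paired per trial, it asks whether the parameter shifts consistently at the state transition, which is exactly the evidence used above to argue that β, rather than the mean firing rate, drives the detected change points.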
Since this assumption can partially include the effect of inter-neuron interactions, we can define prior distributions that are independent between neurons. Because there are no loops in the statistical dependencies of firing rates under these conditions, the variational Bayes method can be applied directly.

One important topic for future study is the optimization of the coarse bin width r = CΔ. A bin width that is too wide obscures both the times of change points and the temporal profiles of nonstationary firing rates. A bin width that is too narrow, on the other hand, increases the computational cost and worsens the estimation accuracy. Watanabe et al. proposed an algorithm for estimating the optimal bin width by maximizing the marginal likelihood [15], which is probably applicable to our algorithm.

[1] Abeles, M. et al. (1995), PNAS, pp. 609-616.
[2] Kemere, C. et al. (2008), J. Neurophysiol. 100(7):2441-2452.
[3] Jones, L. M. et al. (2007), PNAS 104(47):18772-18777.
[4] Rickert, J. et al. (2009), J. Neurosci. 29(44):13870-13882.
[5] Harvey, C. D. et al. (2009), Nature 461(15):941-946.
[6] Lisberger et al. (1999), J. Neurosci. 19(6):2224-2246.
[7] Ghahramani, Z. and Hinton, G. E. (2000), Neural Compt. 12(4):831-864.
[8] Cunningham, J. P. et al. (2007), Neural Netw.
22(9):1235-1246.
[9] Attias, H. (1999), Proc. 15th Conf. on UAI.
[10] Beal, M. (2003), Ph.D. thesis, University College London.
[11] Jaakkola, T. S. and Jordan, M. I. (2000), Stat. and Compt. 10(1):25-37.
[12] Watanabe, K. and Okada, M. (2009), Lecture Notes in Computer Science 5506:655-662.
[13] Corduneanu, A. and Bishop, C. M. (2001), Artificial Intelligence and Statistics:27-34.
[14] Fujisawa, S. et al. (2005), Cerebral Cortex 16(5):639-654.
[15] Watanabe, K. et al. (2009), IEICE E92-D(7):1362-1368.
[16] Bishop, C. M. (2006), Pattern Recognition and Machine Learning, Springer.
[17] Shimazaki, H. and Shinomoto, S. (2007), Neural Coding Abstract:120-123.
[18] Richmond, B. J. et al. (1990), J. Neurophysiol. 64(2):351-369.
[19] Dimatteo, I. et al. (2001), Biometrika 88(4):1055-1071.
[20] Endres, D. et al. (2008), Adv. in NIPS 20:393-340.
[21] Maunsell, J. H. and Van Essen, D. C. (1983), J. Neurophysiol. 49(5):1127-1147.
[22] Britten, K. H. et al. (1992), J. Neurosci. 12:4745-4765.
[23] Brown, E. N. et al. (2002), Neural Compt. 14(2):325-346.
[24] Bair, W. and Koch, C. (1996), Neural Compt. 8(6):1185-1202.