{"title": "A multi-agent control framework for co-adaptation in brain-computer interfaces", "book": "Advances in Neural Information Processing Systems", "page_first": 2841, "page_last": 2849, "abstract": "In a closed-loop brain-computer interface (BCI), adaptive decoders are used to learn parameters suited to decoding the user's neural response. Feedback to the user provides information which permits the neural tuning to also adapt. We present an approach to model this process of co-adaptation between the encoding model of the neural signal and the decoding algorithm as a multi-agent formulation of the linear quadratic Gaussian (LQG) control problem. In simulation we characterize how decoding performance improves as the neural encoding and adaptive decoder optimize, qualitatively resembling experimentally demonstrated closed-loop improvement. We then propose a novel, modified decoder update rule which is aware of the fact that the encoder is also changing and show it can improve simulated co-adaptation dynamics. Our modeling approach offers promise for gaining insights into co-adaptation as well as improving user learning of BCI control in practical settings.", "full_text": "A multi-agent control framework for co-adaptation in\n\nbrain-computer interfaces\n\n\u2217 Josh Merel1, \u2217 Roy Fox2, Tony Jebara3, Liam Paninski4\n\n1Department of Neurobiology and Behavior, 3Department of Computer Science,\n\n4Department of Statistics, Columbia University, New York, NY 10027\n\n2School of Computer Science and Engineering, Hebrew University, Jerusalem 91904, Israel\n\njsm2183@columbia.edu, royf@cs.huji.ac.il,\n\njebara@cs.columbia.edu, liam@stat.columbia.edu\n\nAbstract\n\nIn a closed-loop brain-computer interface (BCI), adaptive decoders are used to\nlearn parameters suited to decoding the user\u2019s neural response. Feedback to the\nuser provides information which permits the neural tuning to also adapt. 
We\npresent an approach to model this process of co-adaptation between the encod-\ning model of the neural signal and the decoding algorithm as a multi-agent for-\nmulation of the linear quadratic Gaussian (LQG) control problem. In simulation\nwe characterize how decoding performance improves as the neural encoding and\nadaptive decoder optimize, qualitatively resembling experimentally demonstrated\nclosed-loop improvement. We then propose a novel, modi\ufb01ed decoder update rule\nwhich is aware of the fact that the encoder is also changing and show it can im-\nprove simulated co-adaptation dynamics. Our modeling approach offers promise\nfor gaining insights into co-adaptation as well as improving user learning of BCI\ncontrol in practical settings.\n\n1 Introduction\n\nNeural signals from electrodes implanted in cortex [1], electrocorticography (ECoG) [2], and elec-\ntroencephalography (EEG) [3] all have been used to decode motor intentions and control motor\nprostheses. Standard approaches involve using statistical models to decode neural activity to control\nsome actuator (e.g. a cursor on a screen [4], a robotic manipulator [5], or a virtual manipulator [6]).\nPerformance of of\ufb02ine decoders is typically different from the performance of online, closed-loop\ndecoders where the user gets immediate feedback and neural tuning changes are known to occur\n[7, 8]. In order to understand how decoding will be performed in closed-loop, it is necessary to\nmodel how the decoding algorithm updates and neural encoding updates interact in a coordinated\nlearning process, termed co-adaptation.\n\nThere have been a number of recent efforts to learn improved adaptive decoders speci\ufb01cally tailored\nfor the closed loop setting [9, 10], including an approach relying on stochastic optimal control theory\n[11]. In other contexts, emphasis has been placed on training users to improve closed-loop control\n[12]. 
Some efforts towards modeling the co-adaptation process have sought to model properties\nof different decoders when used in closed-loop [13, 14, 15], with emphasis on ensuring the stabil-\nity of the decoder and tuning the adaptation rate. One recent simulation study also demonstrated\nhow modulating task dif\ufb01culty can improve the rate of co-adaptation when feedback noise limits\nperformance [16]. However, despite speculation that exploiting co-adaptation will be integral to\nstate-of-the-art BCI [17], general models of co-adaptation and methods which exploit those models\nto improve co-adaptation dynamics are lacking.\n\n\u2217These authors contributed equally.\n\n1\n\n\fWe propose that we should be able to leverage our knowledge of how the encoder changes in order\nto better update the decoder. In the current work, we present a simple model of the closed-loop co-\nadaptation process and show how we can use this model to improve decoder learning on simulated\nexperiments. Our model is a novel control setting which uses a split Linear Quadratic Gaussian\n(LQG) system. Optimal decoding is performed by Linear Quadratic Estimation (LQE), effectively\nthe Kalman \ufb01lter model. Encoding model updates are performed by the Linear Quadratic Regulator\n(LQR), the dual control problem of the Kalman \ufb01lter. The system is split insofar as each agent has\ndifferent information available and each performs optimal updates given the state of the other side\nof the system. We take advantage of this model from the decoder side by anticipating changes in\nthe encoder and pre-emptively updating the decoder to match the estimate of the further optimized\nencoding model. We demonstrate that this approach can improve the co-adaptation process.\n\n2 Model framework\n\n2.1 Task model\n\nFor concreteness, we consider a motor-cortical neuroprosthesis setting. 
We assume a naive user,\nplaced into a BCI control setting, and propose a training scheme which permits the user and decoder\nto adapt. We provide a visual target cue at a 3D location and the user controls the BCI via neural sig-\nnals which, in a natural setting, relate to hand kinematics. The target position is moved each timestep\nto form a trajectory through the 3D space reachable by the user\u2019s hand. The BCI user receives visual\nfeedback via the displayed location of their decoded hand position. The user\u2019s objective is to control\ntheir cursor to be as close to the continuously moving target cursor as possible. A key feature of this\nscheme is that we know the \u201cintention\u201d of the user, assuming it corresponds to the target.\nThe complete graphical model of this system is provided in \ufb01gure 1. xt in our simulations is a three\ndimensional position vector (Cartesian Coordinates) corresponding to the intended hand position.\nThis variable could be replaced or augmented by other variables of interest (e.g. velocity). We\nrandomly evolve the target signal using a linear-Gaussian drift model (eq. (1)). The neural encoding\nmodel is linear-Gaussian in response to intended position xt and feedback \u02c6xt\u22121 (eq. (2)), giving\na vector of neural responses ut (e.g. local \ufb01eld potential or smoothed \ufb01ring rates of neural units).\nSince we do not observe the whole brain region, we must subsample the number of neural units\nfrom which we collect information. The transformation C is conceptually equivalent to electrode\nsampling and yt is the observable neural response vector via the electrodes (eq. (3)). Lastly, \u02c6xt is\nthe decoded hand position estimate, which also serves as visual feedback (eq. 
(4)).

xt = P xt−1 + ξt;  ξt ∼ N(0, Q)  (1)
ut = A xt + B x̂t−1 + ηt;  ηt ∼ N(0, R)  (2)
yt = C ut + εt;  εt ∼ N(0, S)  (3)
x̂t = F yt + G x̂t−1.  (4)

During training, the decoding system is allowed access to the target position, interpreted as the real intention xt. The decoded x̂t is only used as feedback, to inform the user of the gradually learned dynamics of the decoder. After training, the system is tested on a task with the same parameters of the trajectory dynamics, but with the actual intention only known to the user, and hidden from the decoder. A natural objective is to minimize tracking error, measured as accumulated mean squared error between the target and neurally decoded pose over time.

Figure 1: Graphical model relating target signal (xt), neural response (ut), electrode observation of neural response (yt), and decoded feedback signal (x̂t).

For contemporary BCI applications, the Kalman filter is a reasonable baseline decoder, so we do not consider even simpler models. However, for other applications one might wish to consider a model in which the state at each timestep is encoded independently. It is possible to find a closed form for the optimal encoder and decoder that minimizes the error in this case [18, 19].

Sections 2.2 and 2.3 describe the model presented in figure 1 as seen from the distinct viewpoints of the two agents involved: the encoder and the decoder. The encoder observes xt and x̂t−1, and selects A and B to generate a control signal ut. The decoder observes yt, and selects F and G to estimate the intention as x̂t.
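As a concrete illustration, the generative model of eqs. (1)-(4) can be simulated directly. This is a minimal sketch: the dimensions, noise levels, and parameter values below are illustrative placeholders of our own choosing, not the values used in the paper's simulations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: 3-D intention, 10 neural units, 5 electrodes.
dx, du, dy = 3, 10, 5

# Hypothetical parameters; in the paper A, B (encoder) and F, G (decoder)
# are adapted over time, while C (electrode sampling) is fixed.
P = 0.95 * np.eye(dx)                  # intention dynamics, eq. (1)
A = rng.normal(size=(du, dx))          # tuning to intention, eq. (2)
B = 0.1 * rng.normal(size=(du, dx))    # tuning to feedback, eq. (2)
C = np.eye(dy, du)                     # electrode subsampling, eq. (3)
F = 0.1 * rng.normal(size=(dx, dy))    # decoder gain, eq. (4)
G = 0.5 * np.eye(dx)                   # decoder mean dynamics, eq. (4)
Q = 0.01 * np.eye(dx)
R = 0.1 * np.eye(du)
S = 0.1 * np.eye(dy)

def simulate(T):
    """Run the generative model of eqs. (1)-(4) for T timesteps."""
    x = np.zeros(dx)
    xhat = np.zeros(dx)
    xs, xhats = [], []
    for _ in range(T):
        x = P @ x + rng.multivariate_normal(np.zeros(dx), Q)             # eq. (1)
        u = A @ x + B @ xhat + rng.multivariate_normal(np.zeros(du), R)  # eq. (2)
        y = C @ u + rng.multivariate_normal(np.zeros(dy), S)             # eq. (3)
        xhat = F @ y + G @ xhat                                          # eq. (4)
        xs.append(x)
        xhats.append(xhat)
    return np.array(xs), np.array(xhats)

xs, xhats = simulate(200)
mse = np.mean(np.sum((xs - xhats) ** 2, axis=1))  # tracking error objective
```

With fixed (untrained) parameters the tracking error is of course large; the rest of the paper is about how the two agents adapt A, B, F, G to reduce it.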
We assume that both agents are free to perform unconstrained optimization on their parameters.

2.2 Encoding model and optimal decoder

Our encoding model is quite simple, with neural units responding in a linear-Gaussian fashion to intended position xt and feedback x̂t−1 (eq. (2)). This is a standard model of neural responses for BCI. The matrices A and B effectively correspond to the tuning response functions of the neural units, and we will allow these parameters to be adjusted under the control of the user. The matrix C corresponds to the observation of the neural units by the electrodes, so we treat it as fixed (in our case C will down-sample the neurons). For this paper, we assume noise covariances are fixed and known, but this can be generalized. Given the encoder, the decoder will estimate the intention xt, which follows a hidden Markov chain (eq. (1)). The observations available to the decoder are the electrode samples yt (eqs. (2) and (3))

yt = CA xt + CB x̂t−1 + ε′t;  ε′t ∼ N(0, RC)  (5)
RC = C R Cᵀ + S.  (6)

Given all the electrode samples up to time t, the problem of finding the most likely hidden intention is a Linear-Quadratic Estimation problem (figure 2); its standard solution, the Kalman filter, is widely used as a decoder in similar contexts. To choose an appropriate Kalman gain F and mean dynamics G, the decoding system needs a good model of the dynamics of the underlying intention process (P, Q of eq. (1)) and the electrode observations (CA, CB, and RC of eqs. (5) & (6)). We can assume that P and Q are known since the decoding algorithm is controlled by the same experimenter who specifies the intention process for the training phase. We discuss the estimation
We discuss the estimation\nof the observation model in section 4.\n\nxt\n\nCA\n\nCB\n\n\u02c6xt\u22121\n\nP\n\nyt\n\nF\n\nG\n\nxt+1\n\nCA\n\nCB\n\n\u02c6xt\n\nyt+1\n\nF\n\nG\n\nxt\n\nA\n\nB\n\n\u02c6xt+1\n\n\u02c6xt\u22121\n\nP\n\nxt+1\n\nut\n\nF C\n\nG\n\nA\n\nB\n\nut+1\n\nF C\n\nG\n\n\u02c6xt\n\n\u02c6xt+1\n\nFigure 2: Decoder\u2019s point of view \u2013 target\nsignal (xt) directly generates observed re-\nsponses (yt), with the encoding model col-\nlapsed to omit the full signal (ut). De-\ncoded feedback signal (\u02c6xt) is generated by\nthe steady state Kalman \ufb01lter.\n\nFigure 3: Encoder\u2019s point of view \u2013 target sig-\nnal (xt) and decoded feedback signal (\u02c6xt\u22121)\ngenerate neural response (ut). Model of de-\ncoder collapses over responses (yt) which are\nunseen by the encoder side.\n\nGiven an encoding model, and assuming a very long horizon 1, there exist standard methods to\noptimize the stationary value of the decoder parameters [20]. The stationary covariance \u03a3 of xt\ngiven \u02c6xt\u22121 is the unique positive-de\ufb01nite \ufb01xed point of the Riccati equation\n\n\u03a3 = P \u03a3P T \u2212 P \u03a3(CA)T (RC + (CA)\u03a3(CA)T )\u22121(CA)\u03a3P T + Q.\n\nThe Kalman gain is then\n\nwith mean dynamics\n\nF = \u03a3(CA)T ((CA)\u03a3(CA)T + RC )\u22121\n\nG = P \u2212 F (CA)P \u2212 F (CB).\n\n(7)\n\n(8)\n\n(9)\n\n1Our task is control of the BCI for arbitrarily long duration, so it makes sense to look for the stationary\ndecoder. Similarly the BCI user will look for a stationary encoder. We could also handle the \ufb01nite horizon case\n(see section 2.3 for further discussion).\n\n3\n\n\fWe estimate \u02c6xt using eq.\n(4), and this is the most likely value, as well as the expected value,\nof xt given the electrode observations y1, . . . , yt. 
Using this estimate as the decoded intention is equivalent to minimizing the expectation of a quadratic cost

clqe = Σt ½ ‖xt − x̂t‖².  (10)

2.3 Model of co-adaptation

At the same time as the decoder-side agent optimizes the decoder parameters F and G, the encoder-side agent can optimize the encoder parameters A and B. We formulate encoder updates for the BCI application as a standard LQR problem. This framework requires that the encoder-side agent has an intention model (same as eq. (1)) and a model of the decoder. The decoder model combines eqs. (3) and (4) into

x̂t = F C ut + G x̂t−1 + F εt.  (11)

This model is depicted in figure 3. We assume that the encoder has access to a perfect estimate of the intention-model parameters P and Q (task knowledge). We also assume that the encoder is free to change its parameters A and B arbitrarily given the decoder-side parameters (which it can estimate as discussed in section 4).

As a model of real neural activity, there must be some cost to increasing the power of the neural signal. Without such a cost, the solutions diverge. We add an additional cost term (a regularizer), which is quadratic in the magnitude of the neural response ut and penalizes a large neural signal:

clqr = Σt ½ ‖xt − x̂t‖² + ½ utᵀ R̃ ut.  (12)

Since the decoder has no direct influence on this additional term, it can be viewed as optimizing for this target cost function as well. The LQR problem is solved similarly to eq. (7), by assuming a very long horizon and optimizing the stationary value of the encoder parameters [20].

We next formulate our objective function in terms of standard LQR parameters. The control depends on the joint process of the intention and the feedback (xt, x̂t−1), but the cost is defined between xt and x̂t. To compute the expected cost given xt, x̂t−1 and ut, we use eq.
(11) to get

E ‖x̂t − xt‖² = ‖F C ut + G x̂t−1 − xt‖² + const  (13)
= (G x̂t−1 − xt)ᵀ(G x̂t−1 − xt) + (F C ut)ᵀ(F C ut) + 2 (G x̂t−1 − xt)ᵀ(F C ut) + const.

Equation (13) provides the error portion of the quadratic objective of the LQR problem. The standard solution for the stationary case involves computing the Hessian V of the cost-to-go in the joint state [xt; x̂t−1] as the unique positive-definite fixed point of the Riccati equation

V = P̃ᵀ V P̃ − (Ñ + P̃ᵀ V D̃)(R̃ + S̃ + D̃ᵀ V D̃)⁻¹(Ñᵀ + D̃ᵀ V P̃) + Q̃.  (14)

Here P̃ is the process dynamics for the joint state of xt and x̂t−1, and D̃ is the controllability of this dynamics. Q̃, S̃ and Ñ are the cost parameters, which can be determined by inspection of eq. (13). R̃ is the Hessian of the neural response cost term, which is chosen in simulations so that the resulting increase in neural signal strength is reasonable.

P̃ = [P 0; 0 G],  D̃ = [0; F C],  Q̃ = [I −Gᵀ; −G GᵀG],  S̃ = (F C)ᵀ(F C),  Ñ = [−F C; Gᵀ(F C)].

In our formulation, the encoding model (A, B) is equivalent to the feedback gain

[A B] = −(D̃ᵀ V D̃ + R̃ + S̃)⁻¹(Ñᵀ + D̃ᵀ V P̃).  (15)

This is the optimal stationary control, and is generally not optimal for shorter planning horizons. In the co-adaptation setting, the encoding model (At, Bt) regularly changes to adapt to the changing decoder. This means that (At, Bt) is only used for one timestep (or a few) before it is updated.
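The encoder-side solve of eqs. (14)-(15) can be sketched by value iteration on the joint-state Riccati equation. The block matrices below follow the definitions in the text; the function name, the value-iteration approach (in place of a dedicated Riccati solver), and the toy dimensions are our own illustrative choices.

```python
import numpy as np

def stationary_encoder(P, G, FC, Rt, n_iter=500):
    """Build the joint-state LQR of section 2.3 and iterate the Riccati
    equation (14) by value iteration; return the gain [A B] of eq. (15)."""
    dx = P.shape[0]
    Pt = np.block([[P, np.zeros((dx, dx))], [np.zeros((dx, dx)), G]])  # joint dynamics
    Dt = np.vstack([np.zeros((dx, FC.shape[1])), FC])                  # controllability
    Qt = np.block([[np.eye(dx), -G.T], [-G, G.T @ G]])                 # state cost
    St = FC.T @ FC                                                     # control part of error cost
    Nt = np.vstack([-FC, G.T @ FC])                                    # state-control cross term
    V = np.zeros((2 * dx, 2 * dx))
    for _ in range(n_iter):
        M = np.linalg.inv(Rt + St + Dt.T @ V @ Dt)
        V = Pt.T @ V @ Pt - (Nt + Pt.T @ V @ Dt) @ M @ (Nt.T + Dt.T @ V @ Pt) + Qt
    # eq. (15): optimal stationary gain, columns ordered as [A | B]
    return -np.linalg.inv(Dt.T @ V @ Dt + Rt + St) @ (Nt.T + Dt.T @ V @ Pt)

# Toy usage: 2-D intention, 3 neural units; FC is the decoder-electrode map.
rng = np.random.default_rng(2)
P = 0.5 * np.eye(2)
G = 0.3 * np.eye(2)
FC = 0.2 * rng.normal(size=(2, 3))
Rt = np.eye(3)
AB = stationary_encoder(P, G, FC, Rt)
A, B = AB[:, :2], AB[:, 2:]
```

Since both P and G are stable here, value iteration from V = 0 converges; in production one would instead use a dedicated discrete-time Riccati solver.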
The effective planning horizon is thus shortened from its ideal infinity, and now depends on the rate and magnitude of the perturbations introduced in the encoding model. Eq. (14) can be solved for this finite horizon, but here for simplicity we assume the encoder updates introduce small or infrequent enough changes to keep the planning horizon very long, and the stationary control close to optimal.

Figure 4: (a) Each curve plots single-trial changes in decoding mean squared error (MSE) over the whole timeseries as a function of the number of update half-iterations. The encoder is updated in even steps, the decoder in odd ones. Distinct curves are for multiple, random initializations of the encoder. (b) Plots the corresponding changes in encoder parameter updates; the y-axis, ρ, is the correlation between the vectorized encoder parameters after each update and the final values.

3 Perfect estimation setting

We can consider co-adaptation in a hypothetical setting where each agent has instant access to a perfect estimate of the other's parameters as soon as they change. To keep this setting comparable to the setting of section 4, where parameter estimation is needed, we only allow each agent access to those variables that it could, in principle, estimate. We assume both agents know the parameters P and Q of the intention dynamics, that the encoder knows F C and G of eq. (11), and that the decoder knows CA, CB and RC of eqs. (5) and (6). These are the same parameters needed by each agent for its own re-optimization.
This process of parameter updates is performed by alternating between the decoder update equations (7)-(9) and the encoder update equations (14)-(15). Since the agents take turns minimizing the expected infinite-horizon objective of eq. (12) given the other, this cost will tend to decrease, approximately converging.

Note that neither of these steps depends explicitly on the observed values of the neural signal ut or the decoded output x̂t. In other words, co-adaptation can be simulated without ever actually generating the stochastic process of intention, encoding and decoding. However, this process and the signal-feedback loop become crucial when estimation is involved, as in section 4. Then each agent's update indirectly depends on its observations through its estimated model of the other agent.

To examine the dynamics in this idealized setting, we hold fixed the target trajectory x1...T as well as the realization of the noise terms. We initialize the simulation with a random encoding model and observe empirically that, as the encoder and the decoder are updated in alternation, the error rapidly reduces to a plateau. As the improvement saturates, the joint encoder-decoder pair approximates a locally optimal solution to the co-adaptation problem. Figure 4(a) plots the error as a function of the number of model update iterations; the different curves correspond to distinct, random initializations of the encoder parameters A, B with everything else held fixed. We emphasize that for a fixed encoder, the first decoder update would yield the infinite-horizon optimal update if the encoder could not adapt, and the error can be interpreted relative to this initial optimal decoding (see supplementary fig. 1(a) for a depiction of the initial error, and fig. 1(b) for the improvement due to encoder adaptation).
This method obtains optimized encoder-decoder pairs with moderate sensitivity to the initial parameters of the encoding model. Interpreted in the context of BCI, this suggests that the initial tuning of the observed neurons may affect the local optima attainable for BCI performance under standard co-adaptation. We may also be able to optimize the final error by cleverly choosing updates to the decoder parameters in a fashion which shifts which optimum is reached. Figure 4(b) displays the corresponding approximate convergence of the encoder parameters: as the error decreases, the encoder parameters settle to a stable set (the actual final values across initializations vary).

Parameters free from the standpoint of the simulation are the neural noise covariance RC and the Hessian R̃ of the neural signal cost. We set these to reasonable values: the noise to a moderate level, and the cost sufficiently high as to prevent an exceedingly large neural signal which would swamp the noise and yield arbitrarily low error (see supplement). In an experimental setting, these parameters would be set by the physical system and would need to be estimated beforehand.

4 Partially observable setting with estimation

More realistic than a model of co-adaptation in which the decoder-side and encoder-side agents automatically know each other's parameters is one in which the rate of updating is limited by the partial knowledge each agent has about the other. In each timestep, each agent will update its estimate of the other agent's parameters, and then use the current estimates to re-optimize its own parameters. In this work we use a recursive least squares (RLS) algorithm, presented in supplement section 3, for this estimation. RLS has a forgetting factor λ which regulates how quickly the routine expects the parameters it estimates to change. This co-adaptation process is detailed in procedure 1.
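The estimator each agent runs can be sketched as a generic multi-output RLS with exponential forgetting. This is a standard textbook form assumed for illustration (the paper's exact variant is in its supplement); the class name and default constants are our own.

```python
import numpy as np

class ForgettingRLS:
    """Recursive least squares for y ≈ W z with forgetting factor lam.
    lam = 1 weighs all history equally; lam < 1 discounts old samples,
    letting the estimate track a slowly changing W."""

    def __init__(self, dim_in, dim_out, lam=0.98, delta=100.0):
        self.W = np.zeros((dim_out, dim_in))   # current parameter estimate
        self.Pmat = delta * np.eye(dim_in)     # inverse input-correlation estimate
        self.lam = lam

    def update(self, z, y):
        z = np.asarray(z, float)
        k = self.Pmat @ z / (self.lam + z @ self.Pmat @ z)   # gain vector
        err = y - self.W @ z                                 # prediction error
        self.W += np.outer(err, k)                           # correct all output rows
        self.Pmat = (self.Pmat - np.outer(k, z @ self.Pmat)) / self.lam
        return self.W

# Toy usage: recover a fixed linear map from streaming observations.
rng = np.random.default_rng(3)
W_true = rng.normal(size=(2, 4))
rls = ForgettingRLS(dim_in=4, dim_out=2, lam=0.99)
for _ in range(300):
    z = rng.normal(size=4)
    rls.update(z, W_true @ z)
```

In the co-adaptation loop, the decoder side would feed in z = (xt, x̂t−1) and y = yt to track [CA CB], and the encoder side would track the decoder parameters analogously.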
We elect to use the same estimation routine for each agent and assume that the user performs ideal-observer-style optimal estimation. In general, if more knowledge is available about how a real BCI user updates their estimates of the decoder parameters, such a model could easily be used. We could also explore in simulation how various suboptimal estimation models employed by the user affect co-adaptation.

As noted previously, we will assume the noise model is fixed and that the decoder side knows the neural signal noise covariance RC (eq. (6)). The encoder side will use a scaled identity matrix as the estimate of the electrodes-decoder noise model. To jointly estimate the decoder parameters and the noise model, an EM-based scheme would be a natural approach (such estimation of the BCI user's internal model of the decoder has been treated explicitly in [21]).

Procedure 1 standard co-adaptation
for t = 1 to lengthTraining do
  Encoder-side
    Get xt and x̂t−1
    Update encoder-side estimate of the decoder parameters F C, G (RLS)
    Update optimal encoder A, B using the current decoder estimate (LQR)
    Encode current intention using A, B and send signal yt
  Decoder-side
    Get xt and yt
    Update decoder-side estimate of the encoder parameters CA, CB (RLS)
    Update optimal decoder F, G using the current encoder estimate (LQE)
    Decode current signal using F, G and display as feedback x̂t
end for

Standard co-adaptation yields improvements in decoding performance over time as the encoder and decoder agents estimate each other's parameters and update based on those estimates. Accordingly, this model will improve the encoder-decoder pair over time, as in the blue curves of figure 5 below.

5 Encoder-aware decoder updates

In this section, we present an approach to model the encoder updates from the decoder side.
We will use this to "take an extra step" towards optimizing the decoder for what the anticipated future encoder ought to look like.

In the most general case, the encoder can update At and Bt in an unconstrained fashion at each timestep t. From the decoder side, we do not know C and therefore we cannot know F C, an estimate of which is needed by the user to update the encoder. However, the decoder sets F and can predict updates to [CA CB] directly, instead of to [A B] as the actual encoder does (equation 15). We emphasize that this update is not actually how the user will update the encoder; rather, it captures how the encoder ought to change the signals observed by the decoder (from the decoder's perspective).

Figure 5: In each subplot, the blue line corresponds to decreasing error as a function of simulated time from standard co-adaptation (procedure 1). The green line corresponds to the improved one-step-ahead co-adaptation (procedure 2). Plots from left to right have a decreasing RLS forgetting factor used by the encoder side to estimate the decoder parameters. Curves depict the median error across 20 simulations with confidence intervals of 25% and 75% quantiles. Error at each timestep is appropriately cross-validated: it corresponds to taking the encoder-decoder pair of that timestep and computing error on "test" data.

We can find the update [CApred CBpred] by solving a modified version of the LQR problem presented in section 2.3, eq.
(15):

[CApred CBpred] = −(D̃′ᵀ V D̃′ + R̃′ + S̃′)⁻¹(Ñ′ᵀ + D̃′ᵀ V P̃),  (16)

with terms defined similarly to section 2.3, except

D̃′ = [0; F],  S̃′ = Fᵀ F,  Ñ′ = [−F; Gᵀ F].  (17)

We also note that the quadratic penalty used in this approximation has been transformed from a cost on the responses of all of the neural units to a cost only on the observed ones. R̃′ serves as a regularization parameter which now must be tuned so that the decoder-side estimate of the encoding update is reasonable. For simplicity we let R̃′ = γI for some constant, coarsely tuned γ, though in general this cost need not be a scaled identity matrix. Equations (16) & (17) only use information available at the decoder side, with terms dependent on F C having been replaced by terms dependent instead on F. These predictions will be used only to engineer decoder update schemes that improve co-adaptation (as in procedure 2).

Procedure 2 r-step-ahead co-adaptation
for t = 1 to lengthTraining do
  Encoder-side
    As in procedure 1
  Decoder-side
    Get xt and yt
    Update decoder-side estimate of the encoder parameters CA, CB (RLS)
    Update optimal decoder F, G using the current encoder estimate (LQE)
    for r = 1 to numStepsAhead do
      Anticipate encoder update CApred, CBpred to the updated decoder F, G (modified LQR)
      Update r-step-ahead optimal decoder F, G using CApred, CBpred (LQE)
    end for
    Decode current signal using the r-step-ahead F, G and display as feedback x̂t
end for

The ability to compute decoder-side approximate encoder updates opens the opportunity to improve encoder-decoder update dynamics by anticipating encoder-side adaptation to guide the process towards faster convergence, and possibly to better solutions.
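A sketch of the decoder-side prediction of eqs. (16)-(17) follows, reusing the same value-iteration Riccati solve as in section 2.3 but with F in place of F C and R̃′ = γI. The function name, the value-iteration approach, and the toy parameters are our own illustrative choices.

```python
import numpy as np

def predict_encoder_update(P, G, F, gamma=1.0, n_iter=500):
    """Decoder-side prediction of eqs. (16)-(17): how the observed encoding
    [CA CB] ought to change, using only decoder-side quantities."""
    dx = P.shape[0]
    dy = F.shape[1]                # F maps electrode space to intention space
    Pt = np.block([[P, np.zeros((dx, dx))], [np.zeros((dx, dx)), G]])
    Dt = np.vstack([np.zeros((dx, dy)), F])        # D' of eq. (17)
    Qt = np.block([[np.eye(dx), -G.T], [-G, G.T @ G]])
    St = F.T @ F                                   # S' of eq. (17)
    Nt = np.vstack([-F, G.T @ F])                  # N' of eq. (17)
    Rp = gamma * np.eye(dy)                        # tuned regularizer R' = gamma * I
    V = np.zeros((2 * dx, 2 * dx))
    for _ in range(n_iter):
        M = np.linalg.inv(Rp + St + Dt.T @ V @ Dt)
        V = Pt.T @ V @ Pt - (Nt + Pt.T @ V @ Dt) @ M @ (Nt.T + Dt.T @ V @ Pt) + Qt
    # eq. (16): predicted [CApred CBpred], shape (dy, 2*dx)
    return -np.linalg.inv(Dt.T @ V @ Dt + Rp + St) @ (Nt.T + Dt.T @ V @ Pt)

# Toy usage with a 3-D intention and 5 electrode channels.
rng = np.random.default_rng(4)
P = 0.9 * np.eye(3)
G = 0.5 * np.eye(3)
F = 0.2 * rng.normal(size=(3, 5))
pred = predict_encoder_update(P, G, F)
CA_pred, CB_pred = pred[:, :3], pred[:, 3:]
```

In the r-step-ahead loop, the decoder would alternate this prediction with an LQE refit of F, G against the predicted CA_pred, CB_pred.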
For the current estimate of the encoder, we update the optimal decoder, anticipate the encoder update by the method of the previous section, and then update the decoder in response to the anticipated encoder update. This procedure allows r-step-ahead updating as presented in procedure 2. Figure 5 demonstrates how the one-step-ahead scheme can improve the co-adaptation dynamics. It is not a priori obvious that this method would help: the decoder-side estimate of the encoder update is not identical to the actual update. An encoder-side agent more permissive of rapid changes in the decoder may better handle r-step-ahead co-adaptation. We have also tried r-step-ahead updates for r > 1. However, this did not outperform the one-step-ahead method, and in some cases yielded a decline relative to standard co-adaptation. These simulations are susceptible to the setting of the forgetting factor used by each agent in the RLS estimation, the initial uncertainty of the parameters, and the quadratic cost R̃′ used in the one-step-ahead approximation. The encoder-side RLS parameters in a real setting will be determined by the BCI user, and R̃′ should be tuned (as a regularization parameter).

The encoder-side forgetting factor would correspond roughly to the plasticity of the BCI user with respect to the task. A high forgetting factor permits the user to tolerate very large changes in the decoder, and a low forgetting factor corresponds to the user assuming more decoder stability. From left to right in the subplots of figure 5, the encoder-side forgetting factor decreases; the regime where augmenting co-adaptation may offer the most benefit corresponds to a user that is most uncertain about the decoder and willing to tolerate decoder changes. Whether or not co-adaptation gains are possible in our model depends upon parameters of the system.
Nevertheless, for appropriately selected parameters, attempting to augment the co-adaptation should not hurt performance even if the user were outside of the regime where the most benefit is possible. A real user will likely perform their half of co-adaptation suboptimally relative to our idealized BCI user, and the structure of such suboptimalities will likely increase the opportunity for co-adaptation to be augmented. The timescale of these simulation results is unspecified, but would correspond to the timescale on which the biological neural encoding can change. This varies by task and by the implicated brain region, ranging from a few training sessions [22, 23] to days [24].

6 Conclusion

Our work represents a step in the direction of exploiting co-adaptation to jointly optimize the neural encoding and the decoder parameters, rather than simply optimizing the decoder parameters without taking the encoder parameter adaptation into account. We model the process of co-adaptation that occurs in closed-loop BCI use between the user and the decoding algorithm. Moreover, the results using our modified decoding update demonstrate a proof of concept that reliable improvement can be obtained relative to naive adaptive decoders by encoder-aware updates to the decoder in a simulated system. It is still open how well methods based on this approach will extend to experimental data.

BCI is a two-agent system, and we may view co-adaptation as we have formulated it within multi-agent control theory. As both agents adapt to reduce the error of the decoded intention given their respective estimates of the other agent, a fixed point of this co-adaptation process is a Nash equilibrium. This equilibrium is only known to be unique in the case where the intention at each timestep is independent [25]. In our more general setting, there may be more than one encoder-decoder pair for which each is optimal given the other.
Moreover, there may exist non-linear encoders with which non-linear decoders can be in equilibrium. These connections will be explored in future work.

Obviously our model of the neural encoding and the process by which the neural encoding model is updated are idealizations. Future experimental work will determine how well our co-adaptive model can be applied to the real neuroprosthetic context. For rapid, low-cost experiments, it might be best to begin with human closed-loop experiments intended to simulate a BCI [26]. As the Kalman filter is a standard decoder, it will be useful to begin experimental investigations with this choice (as analyzed in this work). More complicated decoding schemes also appear to improve decoding performance [23] by better accounting for the non-linearities in the real neural encoding, and such methods scale to BCI contexts with many output degrees of freedom [27]. An important extension of the co-adaptation model presented in this work is to non-linear encoding and decoding schemes. Even in more complicated, realistic settings, we hope the framework presented here will offer similar practical benefits for improving BCI control.

Acknowledgments

This project is supported in part by the Gatsby Charitable Foundation. Liam Paninski receives support from an NSF CAREER award.

References

[1] M. D. Serruya, N. G. Hatsopoulos, L. Paninski, M. R. Fellows, and J. P. Donoghue, "Instant neural control of a movement signal," Nature, vol. 416, no. 6877, pp. 141–142, 2002.

[2] K. J. Miller et al., "Cortical activity during motor execution, motor imagery, and imagery-based online feedback," PNAS, vol. 107, no. 9, pp. 4430–4435, 2010.

[3] D. J. McFarland, W. A. Sarnacki, and J. R. Wolpaw, "Electroencephalographic (EEG) control of three-dimensional movement," Journal of Neural Engineering, vol. 7, no. 3, p. 036007, 2010.

[4] V.
Gilja et al., \u201cA high-performance neural prosthesis enabled by control algorithm design.,\u201d Nat Neurosci,\n\n2012.\n\n[5] L. R. Hochberg et al., \u201cReach and grasp by people with tetraplegia using a neurally controlled robotic\n\narm,\u201d Nature, vol. 485, no. 7398, pp. 372\u2013375, 2012.\n\n[6] D. Putrino et al., \u201cDevelopment of a closed-loop feedback system for real-time control of a high-\n\ndimensional brain machine interface,\u201d Conf Proc IEEE EMBS, vol. 2012, pp. 4567\u20134570, 2012.\n\n[7] S. Koyama et al., \u201cComparison of brain-computer interface decoding algorithms in open-loop and closed-\n\nloop control.,\u201d Journal of Computational Neuroscience, vol. 29, no. 1-2, pp. 73\u201387, 2010.\n\n[8] J. M. Carmena et al., \u201cLearning to control a brainmachine interface for reaching and grasping by pri-\n\nmates,\u201d PLoS Biology, vol. 1, no. 2, p. E42, 2003.\n\n[9] V. Gilja et al., \u201cA brain machine interface control algorithm designed from a feedback control perspec-\n\ntive.,\u201d Conf Proc IEEE Eng Med Biol Soc, vol. 2012, pp. 1318\u201322, 2012.\n\n[10] Z. Li, J. E. ODoherty, M. A. Lebedev, and M. A. L. Nicolelis, \u201cAdaptive decoding for brain-machine\n\ninterfaces through bayesian parameter updates.,\u201d Neural Comput., vol. 23, no. 12, pp. 3162\u2013204, 2011.\n\n[11] K. Kowalski, B. He, and L. Srinivasan, \u201cDynamic analysis of naive adaptive brain-machine interfaces,\u201d\n\nNeural Comput., vol. 25, no. 9, pp. 2373\u20132420, 2013.\n\n[12] C. Vidaurre, C. Sannelli, K.-R. Muller, and B. Blankertz, \u201cMachine-learning based co-adaptive calibration\n\nfor brain-computer interfaces,\u201d Neural Computation, vol. 816, no. 3, pp. 791\u2013816, 2011.\n\n[13] M. Lagang and L. Srinivasan, \u201cStochastic optimal control as a theory of brain-machine interface opera-\n\ntion,\u201d Neural Comput., vol. 25, pp. 374\u2013417, Feb. 2013.\n\n[14] R. Heliot, K. Ganguly, J. Jimenez, and J. M. 
Carmena, "Learning in closed-loop brain-machine interfaces: Modeling and experimental validation," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 40, no. 5, pp. 1387-1397, 2010.

[15] S. Dangi, A. L. Orsborn, H. G. Moorman, and J. M. Carmena, "Design and analysis of closed-loop decoder adaptation algorithms for brain-machine interfaces," Neural Computation, pp. 1-39, Apr. 2013.

[16] Y. Zhang, A. B. Schwartz, S. M. Chase, and R. E. Kass, "Bayesian learning in assisted brain-computer interface tasks," Conf Proc IEEE Eng Med Biol Soc, vol. 2012, pp. 2740-2743, 2012.

[17] S. Waldert et al., "A review on directional information in neural signals for brain-machine interfaces," Journal of Physiology-Paris, vol. 103, no. 3-5, pp. 244-254, 2009.

[18] G. P. Papavassilopoulos, "Solution of some stochastic quadratic Nash and leader-follower games," SIAM J. Control Optim., vol. 19, pp. 651-666, Sept. 1981.

[19] E. Doi and M. S. Lewicki, "Characterization of minimum error linear coding with sensory and neural noise," Neural Computation, vol. 23, no. 10, pp. 2498-2510, 2011.

[20] M. Athans, "The discrete time linear-quadratic-Gaussian stochastic control problem," Annals of Economic and Social Measurement, vol. 1, pp. 446-488, September 1972.

[21] M. D. Golub, S. M. Chase, and B. M. Yu, "Learning an internal dynamics model from control demonstration," 30th International Conference on Machine Learning, 2013.

[22] R. Shadmehr, M. A. Smith, and J. W. Krakauer, "Error correction, sensory prediction, and adaptation in motor control," Annual Review of Neuroscience, vol. 33, pp. 89-108, 2010.

[23] L. Shpigelman, H. Lalazar, and E. Vaadia, "Kernel-ARMA for hand tracking and brain-machine interfacing during 3D motor control," in NIPS, pp. 1489-1496, 2008.

[24] A. C. Koralek, X. Jin, J. D. Long II, R. M. Costa, and J. M. Carmena, "Corticostriatal plasticity is necessary for learning intentional neuroprosthetic skills," Nature, vol. 483, no. 7389, pp. 331-335, 2012.

[25] T. Basar, "On the uniqueness of the Nash solution in linear-quadratic differential games," International Journal of Game Theory, vol. 5, no. 2-3, pp. 65-90, 1976.

[26] J. P. Cunningham et al., "A closed-loop human simulator for investigating the role of feedback control in brain-machine interfaces," Journal of Neurophysiology, vol. 105, no. 4, pp. 1932-1949, 2010.

[27] Y. T. Wong et al., "Decoding arm and hand movements across layers of the macaque frontal cortices," Conf Proc IEEE Eng Med Biol Soc, vol. 2012, pp. 1757-1760, 2012.