Paper ID: 1087

Title: Inference by Reparameterization in Neural Population Codes

The paper attempts to map a popular approximate inference algorithm, Loopy Belief Propagation (LBP), onto the brain. The proposed solution combines Probabilistic Population Codes (PPCs) to represent the parameters of the distribution with Tree-based Reparameterization (TRP) to overcome the biologically implausible requirements of standard LBP. An example set of equations is derived for a Gaussian graphical model, and a linear dynamical system whose stationary solution coincides with the discrete version of the algorithm is put forward. Simulations show good recovery of the ground truth and a decent amount of robustness to noise.

The proposed algorithm is an interesting extension to the line of research around probabilistic population codes. It is the first to discuss inference in non-trivial graphical models. Both the background and the algorithm are well described, and all figures and the concrete example are clear, informative and useful. The only aspect of the study I would appreciate expanding is the biological setting and the discussion of implications for neural systems. E.g., when you introduce the Gaussian model, please be concrete about what its "real-world" counterpart could be. Make it clear that the brain would need to learn relationships between the variables to efficiently compute the marginals. Also, would you expect the brain to be using a variety of trees for updates (as in Fig. 2C)?

3-Expert (read the paper in detail, know the area, quite certain of my opinion)

A fundamental challenge for probabilistic interpretations of brain computation is to figure out how neural networks can represent complex probability distributions and use them to guide actions. Two main classes of solutions have been proposed: PPCs, which work well for univariate distributions and very restricted parametric families of multivariate distributions, and sampling-based representations, which have more flexibility but at the cost of potentially long sampling times. This paper considers a tempting alternative within the class of variational algorithms, using a continuous-time approximation of TRP. The authors show that the resulting network dynamics are biologically plausible (and similar to traditional PPCs) and perform numerically not much worse than the original algorithm. Overall, a neat idea which is worth exploring further, in particular in terms of neural implications. A few comments:

- The main advantages of such a representation seem to me to be twofold: a) easy access to posterior marginals throughout the computation (which is probably what drives individual decisions), and b) convergence speed. It would be nice to have a better illustration of this in the results. In particular, how long does it take to reach convergence in neural terms (some way to tie the dynamics to the membrane time constant? how slow are the slow dynamics relative to this)? How much faster is this compared to a standard sampling-based alternative implementation?
- The idea of state-space expansion when mapping random variables is something which has been put forward in the context of sampling as well (Savin & Deneve, NIPS 2013). It would be interesting to make these parallels more explicit.
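To make the comparison with sampling-based alternatives concrete, here is a toy sketch of the baseline one would time against the network: a Gibbs sampler for a small Gaussian in information form. The model values and the sampler are my own illustration, not taken from the paper; the point is that the marginal-mean estimate only stabilizes after many sweeps, which is the cost the reviewer asks to quantify.

```python
import numpy as np

# Bivariate Gaussian in information form: p(x) ∝ exp(-x'Jx/2 + h'x).
# J and h are illustrative values, not from the paper.
J = np.array([[1.0, 0.5],
              [0.5, 1.0]])
h = np.array([1.0, 1.0])
true_mean = np.linalg.solve(J, h)  # exact marginal means

rng = np.random.default_rng(0)
x = np.zeros(2)
samples = []
for _ in range(20000):
    # One Gibbs sweep: each conditional is Gaussian with
    # precision J[i,i] and mean (h[i] - J[i,j]*x[j]) / J[i,i].
    for i in range(2):
        j = 1 - i
        mu = (h[i] - J[i, j] * x[j]) / J[i, i]
        x[i] = mu + rng.standard_normal() / np.sqrt(J[i, i])
    samples.append(x.copy())

est_mean = np.mean(samples, axis=0)
print(est_mean, true_mean)  # empirical vs exact marginal means
```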

It would be nice to have a clearer understanding of what the different scales of the dynamics mean in neural terms, beyond the fact that the network activity looks complicated. Figs. 4, 6: what are the units for the time axis (fast time scale = membrane time constant? slow time scale = synaptic facilitation/depression, NMDA/GABA-B currents, or something even slower)? I am still not 100% sure how the graphical model structure translates into the network connectivity structure. Minor typo: last page, "biologically plauble interpretable".

3-Expert (read the paper in detail, know the area, quite certain of my opinion)

The authors present a novel way in which neurons encoding probabilistic information as a PPC (probabilistic population code) can perform approximate loopy Belief Propagation. The novelty is the application of tree-based reparametrization in the context of PPC.

Overall I find this paper interesting to read, well presented and well thought out. I will briefly state what I found to be the stronger and weaker points of this paper. My main concern is the insistence on 'biological compatibility' while the argumentation runs on a very abstract level. E.g., line 88 seems fairly weakly supported by the preceding section 2; as that section is written, it seems to me that it is specifically eq. (1) and the existence of divisive normalization that are well supported. A convincing model would use a detailed (biophysical) neuron model that can actually support all the required computation. This would also make this model of inference appealing for potential neuromorphic implementations. As a consequence, the scope of the paper is somewhat limited, insofar as PPCs are only one of a number of hypotheses on how the brain performs something like probabilistic inference. The key strengths are well described by the first two paragraphs of the conclusion. The highly distributed representation that this model entails is impressive. The idea of using multiple time scales in loopy belief propagation to counteract overcounting is nice (though I am no expert in this field) and I hope I will see more work on this in the future.

2-Confident (read it all; understood it all reasonably well)

The paper outlines a neural network implementation of loopy belief propagation using the tree-based reparameterization of Wainwright, Jaakkola, and Willsky. The work is significant in that it provides a plausible implementation of LBP in neural networks and makes deep connections to distributed representations, divisive normalization, and probabilistic population codes.
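For readers less familiar with the setting, here is a minimal sketch of plain Gaussian belief propagation in information form, the exact algorithm that the paper's network approximates. The code is my own illustration (a 3-node chain, so BP is exact), not the authors' TRP network:

```python
import numpy as np

# Gaussian model in information form: p(x) ∝ exp(-x'Jx/2 + h'x).
# A 3-node chain is a tree, so BP converges to the exact marginals.
J = np.array([[2.0, -0.5, 0.0],
              [-0.5, 2.0, -0.5],
              [0.0, -0.5, 2.0]])
h = np.array([1.0, 0.0, 1.0])
edges = [(0, 1), (1, 2)]
nbrs = {0: [1], 1: [0, 2], 2: [1]}

# Message m[(i, j)] = (dJ, dh): precision/potential node i sends to j.
m = {(i, j): (0.0, 0.0) for i, j in edges + [(j, i) for i, j in edges]}
for _ in range(20):  # iterate to a fixed point
    for (i, j) in list(m):
        Ji = J[i, i] + sum(m[(k, i)][0] for k in nbrs[i] if k != j)
        hi = h[i] + sum(m[(k, i)][1] for k in nbrs[i] if k != j)
        m[(i, j)] = (-J[i, j] ** 2 / Ji, -J[i, j] * hi / Ji)

# Beliefs: marginal precision and mean at each node.
Jb = np.array([J[i, i] + sum(m[(k, i)][0] for k in nbrs[i]) for i in range(3)])
hb = np.array([h[i] + sum(m[(k, i)][1] for k in nbrs[i]) for i in range(3)])
means, variances = hb / Jb, 1.0 / Jb

Sigma = np.linalg.inv(J)  # exact answer for comparison
print(means, Sigma @ h)
```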

Although sections 1-4 are very well written and provide a clear outline of the authors' main ideas, I have two concerns: 1) I am concerned that the experiments presented in section 5 insufficiently demonstrate the quality of the method. For example, it is impossible to determine which dots correspond to which network architectures. 2) It is difficult to determine the necessity of the separation of time scales in equation (11). After all, we could simply have re-written equation (6) so that the RHS of the second set of equations was only a function of pseudomarginals from iteration n. If the separation of time scales is necessary, by how much? How would this have affected the dynamics? Or the convergence?
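To illustrate what is at stake in the time-scale question, here is a generic two-timescale toy system (my own sketch, not the paper's equation (11)): when the fast variable relaxes much faster than the slow one, it tracks the slow variable quasi-statically, which is the regime such separations are meant to exploit.

```python
# Toy fast/slow system integrated with forward Euler:
#   du/dt = (-u + v) / tau_fast   (fast variable slaved to v)
#   dv/dt = (-v + target) / tau_slow
# With tau_slow >> tau_fast, u stays close to v after a short transient.
tau_fast, tau_slow, dt, target = 0.01, 1.0, 0.001, 1.0
u, v = 0.0, 0.0
max_gap = 0.0
for step in range(int(5.0 / dt)):
    u += dt * (-u + v) / tau_fast
    v += dt * (-v + target) / tau_slow
    if step > 100:  # skip the initial fast transient
        max_gap = max(max_gap, abs(u - v))

print(u, v, max_gap)  # u and v near target, gap stays small
```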

2-Confident (read it all; understood it all reasonably well)

This paper proposes a biologically plausible neural implementation of probabilistic inference based on Tree-based Reparameterization and Probabilistic Population Codes.

1. The authors give the example of a Gaussian graphical model, in which the non-linear functions of equation (10) are all quadratic. What about other graphical models? How would other non-linear computations be implemented with neural circuits? 2. It is not clear to me how inference over discrete distributions would be implemented within this framework.
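The point about quadratic nonlinearities can be seen directly in the Gaussian marginalization step: eliminating one variable from a Gaussian in information form is a Schur complement, i.e. a quadratic interaction followed by a division. The code below is my own illustration of that standard identity, not the paper's equation (10):

```python
import numpy as np

def marginalize_out_x2(J, h):
    """Marginal of x1 for a 2-D Gaussian p(x) ∝ exp(-x'Jx/2 + h'x).

    The Schur complement is exactly a quadratic interaction
    (J12**2, J12*h2) followed by divisive normalization by J22.
    """
    J_marg = J[0, 0] - J[0, 1] ** 2 / J[1, 1]
    h_marg = h[0] - J[0, 1] * h[1] / J[1, 1]
    return J_marg, h_marg

J = np.array([[3.0, 1.0], [1.0, 2.0]])  # illustrative values
h = np.array([0.5, -1.0])
J_marg, h_marg = marginalize_out_x2(J, h)

# Sanity check against the moment form: mean and variance of x1.
Sigma = np.linalg.inv(J)
mu = Sigma @ h
print(h_marg / J_marg, mu[0])      # marginal mean, two ways → 0.4
print(1.0 / J_marg, Sigma[0, 0])   # marginal variance, two ways → 0.4
```

For non-Gaussian models, the analogous elimination step would generally not reduce to a quadratic-plus-division form, which is exactly the reviewer's concern.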

2-Confident (read it all; understood it all reasonably well)

The authors present a biologically plausible nonlinear recurrent network of neurons that (1) can represent a multivariate probability distribution using population codes (PPCs), and (2) can perform inference analogous to Loopy Belief Propagation. The algorithm is based on a reparameterization of the joint distribution to obtain approximate marginal probabilities. The paper is technically sound and clear, and provides a good overview of the different approaches that are combined in this novel algorithm. It is finally validated on synthetic data.

The paper, while presenting a quite novel algorithm, devotes a long introduction to reviewing the different techniques (PPC, LBP, TRP) before actually presenting the proposed neural network. This leaves little room to detail the algorithm in terms of (1) biological plausibility, (2) mathematical soundness (convergence proof?), and (3) application to real data. Also, the extension of the discrete algorithm to a continuous dynamical system seems very attractive but should be justified mathematically, or at least by showing that both versions converge similarly on synthetic data. An application to real data could also illustrate the potential impact of the algorithm.

2-Confident (read it all; understood it all reasonably well)