{"title": "Push-pull Feedback Implements Hierarchical Information Retrieval Efficiently", "book": "Advances in Neural Information Processing Systems", "page_first": 5701, "page_last": 5710, "abstract": "Experimental data has revealed that in addition to feedforward connections, there exist abundant feedback connections in a neural pathway. Although the importance of feedback in neural information processing has been widely recognized in the field, the detailed mechanism of how it works remains largely unknown. Here, we investigate the role of feedback in hierarchical information retrieval. Specifically, we consider a hierarchical network storing the hierarchical categorical information of objects, and information retrieval goes from rough to fine, aided by dynamical push-pull feedback from higher to lower layers. We elucidate that the push (positive) and pull (negative) feedbacks suppress the interferences due to neural correlations between different and the same categories, respectively, and their joint effect improves retrieval performance significantly. Our model agrees with the push-pull phenomenon observed in neural data and sheds light on our understanding of the role of feedback in neural information processing.", "full_text": "Push-pull Feedback Implements Hierarchical\n\nInformation Retrieval Ef\ufb01ciently\n\nXiao Liu1\n\nXiaolong Zou1\n\nZilong Ji2\n\nGengshuo Tian3\n\nYuanyuan Mi4\n\nTiejun Huang1\n\nK. Y. 
Michael Wong5

Si Wu1

1School of Electronics Engineering & Computer Science, IDG/McGovern Institute for Brain Research, Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China.
2State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, China.
3Department of Mathematics, Beijing Normal University, China.
4Center for Neurointelligence, Chongqing University, China.
5Department of Physics, Hong Kong University of Science and Technology, China.

{xiaoliu23,xiaolz,tjhuang,siwu}@pku.edu.cn, jizilong@mail.bnu.edu.cn, gengshuo_tian@163.com, miyuanyuan0102@cqu.edu.cn, phkywong@ust.hk

Abstract

Experimental data has revealed that in addition to feedforward connections, there exist abundant feedback connections in a neural pathway. Although the importance of feedback in neural information processing has been widely recognized in the field, the detailed mechanism of how it works remains largely unknown. Here, we investigate the role of feedback in hierarchical information retrieval. Specifically, we consider a hierarchical network storing the hierarchical categorical information of objects, and information retrieval goes from rough to fine, aided by dynamical push-pull feedback from higher to lower layers. We elucidate that the push (positive) and pull (negative) feedbacks suppress the interferences due to neural correlations between different and the same categories, respectively, and their joint effect improves retrieval performance significantly. Our model agrees with the push-pull phenomenon observed in neural data and sheds light on our understanding of the role of feedback in neural information processing.

1 Introduction

Deep neural networks (DNNs), which mimic hierarchical information processing in the ventral visual pathway, have achieved great success in object recognition [15].
The structure of DNNs mainly contains feedforward connections from lower to higher layers. The experimental data, however, has revealed that there also exist abundant feedback connections from higher to lower layers, whose number is even larger than that of feedforward ones [23]. It has been widely suggested that these feedback connections play an important role in visual information processing. For instance, the theory of analysis-by-synthesis proposes that the feedback connections, in coordination with the feedforward ones, enable the neural system to recognize an object in an interactive manner [16]; that is, the feedforward pathway extracts the object information from external inputs, while the feedback pathway generates hypotheses about the object, and the interaction between the two pathways accomplishes the recognition task.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

Based on a similar idea, the theory of predictive coding proposes that the feedback from the higher cortex predicts the output of the lower cortex [22]. Although the importance of feedback has been widely recognized in the field, computational models elucidating how it works exactly remain poorly developed. Interestingly, the experimental data has unveiled a salient characteristic of feedback in the visual system [8, 6]. Fig.1A displays the neural population activities in V1 when a monkey was performing a contour integration task [8]. In response to the visual stimulus, the neural activity in V1 increased at the early phase, displaying the push characteristic; and decreased at the late phase, displaying the pull characteristic.
Multi-unit recording revealed that in the pull phase, there was strong negative feedback from the higher cortex (V4) [6]; while in the push phase, although the contributions of the feedforward input and feedback were mixed, causality analysis confirmed that there indeed existed a feedback component [7].

Figure 1: A. The push-pull phenomenon of neural population activity. Data was recorded at V1 of an awake monkey. The visual stimulus was a contour embedded in noises, with onset at t = 0. The blue curve shows the change of neural response over time, which increased at the early phase and decreased at the late phase, when the monkey recognized the contour. The red curve represents the condition when the monkey did not recognize the contour, and the green curve the condition of no visual stimulus. Adapted from [8]. B. A three-layer network storing hierarchical categories of objects, denoted, from bottom to top, as child, parent, and grandparent patterns, respectively. Between layers, neurons are connected by both feedforward and feedback connections. C. The branching tree displaying the categorical relationships between hierarchical memory patterns.

The categorization of objects is based on their similarity/dissimilarity either at the image level or in the semantic sense, and it is organized hierarchically, in the sense that objects belonging to the same category are more similar than those belonging to different ones. The experimental data has revealed that the brain encodes these similarities using overlapping neural representations, with the greater the similarity, the larger the correlation between neural responses [26]. However, it is well known that neural networks have difficulty storing and retrieving correlated memory patterns; a small amount of correlation in the Hopfield model will deteriorate memory retrieval dramatically [9].
Notably, this instability is intrinsic to a neural network, as it utilizes synapses to carry out both information encoding and retrieval: the synaptic strengths are affected by the correlations of stored patterns, which in turn interfere with the retrieval of a stored pattern. Thus, a dilemma is raised to neural coding: on one hand, to encode the categorical relationships between objects, a neural system needs to exploit correlated neural representations; on the other hand, to retrieve information reliably, these pattern correlations are harmful. How to achieve reliable hierarchical information retrieval in a neural network remains unresolved in the field [2, 13].

In the present study, motivated by the push-pull phenomenon in neural data, we investigate the role of feedback in hierarchical information retrieval. Specifically, we consider that a neural system employs a rough-to-fine retrieval procedure, in which higher categorical representations of objects are retrieved first, since they are less correlated than lower categorical ones and hence have better retrieval accuracy; subsequently, through feedback, the retrieved higher categorical information facilitates the retrieval of lower categorical representations. We elucidate that the optimal feedback should be dynamical, varying from positive (push) to negative (pull) over time, and that the two components suppress the interferences due to pattern correlations from different and the same categories, respectively. Using synthetic and real data, we demonstrate that the push-pull feedback implements hierarchical information retrieval efficiently.

2 A model for Hierarchical Information Representation

To elucidate the role of feedback clearly, we consider a simple model for hierarchical information representation. The model is kept simple to illustrate insights derived from the role played by different sources of interferences during different stages of dynamical retrieval. Specifically, the model consists of three layers which store three-level hierarchical memory patterns. For convenience, we call the three layers, from bottom to top, child, parent, and grandparent layers, respectively, to reflect their ascending category relationships (Fig.1B). Neurons in the same layer are connected recurrently with each other to function as an associative memory. Between layers, neurons communicate via feedforward and feedback connections. Denote the state of neuron i in layer l at time t as $x_i^l(t)$ for $i = 1, \ldots, N$, which takes the value $\pm 1$; the symmetric recurrent connection from neuron j to i in layer l as $W_{ij}^l$; the feedforward connection from neuron j of layer l to neuron i of layer l+1 as $W_{ij}^{l+1,l}$; and the feedback connection from neuron j of layer l+1 to neuron i of layer l as $W_{ij}^{l,l+1}$. The neuronal dynamics follows the Hopfield model [10], which is written as

$$x_i^l(t+1) = \mathrm{sign}\left[h_i^l(t)\right], \quad (1)$$

where sign(h) = 1 for h > 0 and sign(h) = −1 otherwise. $h_i^l(t)$ is the total input received by the neuron, which is given by (only the result for layer 1 is shown; the results for other layers are similar),

$$h_i^1(t) = \sum_j W_{ij}^1 x_j^1(t) + \sum_j W_{ij}^{1,2} x_j^2(t). \quad (2)$$

We generate synthetic patterns to study information retrieval, which are denoted as: $\{\xi^\alpha\}$ for $\alpha = 1, \ldots, P_\alpha$ represents the grandparent patterns, $\{\xi^{\alpha,\beta}\}$ for $\beta = 1, \ldots, P_\beta$ the parent patterns of grandparent $\alpha$, and $\{\xi^{\alpha,\beta,\gamma}\}$ for $\gamma = 1, \ldots$
, $P_\gamma$ the child patterns of parent $(\alpha, \beta)$, where $P_\alpha$, $P_\beta$, and $P_\gamma$ denote the number of grandparent patterns, the number of parent patterns belonging to the same grandparent, and the number of child patterns belonging to the same parent, respectively. These hierarchical memory patterns are constructed as follows [1] (Fig.1C).

First, grandparent patterns are statistically independent of each other. The value of each element in a grandparent pattern is drawn from the distribution

$$P(\xi_i^\alpha) = \frac{1}{2}\delta(\xi_i^\alpha + 1) + \frac{1}{2}\delta(\xi_i^\alpha - 1), \quad (3)$$

where $\delta(x) = 1$ for $x = 0$ and $\delta(x) = 0$ otherwise. Each element of a grandparent has equal probabilities of taking a value of 1 or −1.

Secondly, for each grandparent pattern, its descending parent patterns are drawn from the distribution

$$P(\xi_i^{\alpha,\beta}) = \left(\frac{1+b_2}{2}\right)\delta(\xi_i^{\alpha,\beta} - \xi_i^\alpha) + \left(\frac{1-b_2}{2}\right)\delta(\xi_i^{\alpha,\beta} + \xi_i^\alpha), \quad (4)$$

where $0 < b_2 < 1$ implies that each element of a parent pattern has a probability of $(1+b_2)/2 > 0.5$ to have the same value as the corresponding element of the grandparent.
This establishes the relationship between a grandparent and the parent patterns.

Thirdly, for each parent pattern, its descending child patterns are drawn from the distribution

$$P(\xi_i^{\alpha,\beta,\gamma}) = \left(\frac{1+b_1}{2}\right)\delta(\xi_i^{\alpha,\beta,\gamma} - \xi_i^{\alpha,\beta}) + \left(\frac{1-b_1}{2}\right)\delta(\xi_i^{\alpha,\beta,\gamma} + \xi_i^{\alpha,\beta}), \quad (5)$$

where $0 < b_1 < 1$, which specifies the relationship between a parent and its child patterns.

The above stochastic pattern generation process specifies the categorical relationships among memory patterns, in the sense that patterns in the same group have stronger correlation than those belonging to different groups. For example, the correlation between two child patterns belonging to the same parent (siblings) is given by $\sum_i \xi_i^{\alpha,\beta,\gamma}\xi_i^{\alpha,\beta,\gamma'}/N = b_1^2$, referred to as the intra-class correlation; the correlation between two child patterns belonging to different parents but sharing the same grandparent (cousins) is given by $\sum_i \xi_i^{\alpha,\beta,\gamma}\xi_i^{\alpha,\beta',\gamma'}/N = b_1^2 b_2^2$, referred to as the inter-class correlation; and the correlation between two child patterns belonging to different grandparents is given by $\sum_i \xi_i^{\alpha,\beta,\gamma}\xi_i^{\alpha',\beta',\gamma'}/N = 0$. These correlation values satisfy the hierarchical relationship, i.e., $b_1^2 > b_1^2 b_2^2 > 0$. Other correlation relationships can be obtained similarly (see Sec.1 in Supplementary Information (SI)).

Each layer of the network behaves as an associative memory. Using the standard Hebbian learning rule, the recurrent connections between neurons in the same layer are constructed to be $W_{ij}^1 = \sum_{\alpha,\beta,\gamma}\xi_i^{\alpha,\beta,\gamma}\xi_j^{\alpha,\beta,\gamma}/N$, $W_{ij}^2 = \sum_{\alpha,\beta}\xi_i^{\alpha,\beta}\xi_j^{\alpha,\beta}/N$, and $W_{ij}^3 = \sum_\alpha \xi_i^\alpha \xi_j^\alpha/N$. The feedforward connections from lower to higher layers are set to be $W_{ij}^{2,1} = \sum_{\alpha,\beta,\gamma}\xi_i^{\alpha,\beta}\xi_j^{\alpha,\beta,\gamma}/N$ and $W_{ij}^{3,2} = \sum_{\alpha,\beta}\xi_i^\alpha \xi_j^{\alpha,\beta}/N$. It is easy to understand the effect of feedforward connections. For example, if layer 1 is at the state of the memory pattern $\xi^{\alpha_0,\beta_0,\gamma_0}$, then the feedforward input to layer 2 is given by $\sum_j W_{ij}^{2,1}\xi_j^{\alpha_0,\beta_0,\gamma_0} \approx \xi_i^{\alpha_0,\beta_0}$, which contributes to improving the retrieval of the parent pattern $\xi^{\alpha_0,\beta_0}$ at layer 2. The form of feedback connections is the focus of this study and will be introduced later.

To quantify the retrieval performance, we define a macroscopic variable $m(t)$, measuring the overlap between the neural state $x(t)$ and a memory pattern, which is calculated to be [9] (again, for simplicity, only the result for layer 1 is shown),

$$m^{\alpha,\beta,\gamma}(t) = \frac{1}{N}\sum_{i=1}^N \xi_i^{\alpha,\beta,\gamma} x_i^1(t), \quad (6)$$

where $-1 < m^{\alpha,\beta,\gamma}(t) < 1$ represents the retrieval accuracy of the memory pattern $\xi^{\alpha,\beta,\gamma}$; the larger the value of m, the higher the retrieval accuracy.

3 Information retrieval without feedback

To elucidate the role of feedback, it is valuable to first check information retrieval without feedback, and without loss of generality, we focus on layer 1. Following the standard stability analysis [9], we consider that the initial state of layer 1 is a memory pattern, $x^1(0) = \xi^{\alpha_0,\beta_0,\gamma_0}$, and investigate what are the key factors determining the retrieval performance.
After one step of iteration, we get the retrieval accuracy,

$$m^{\alpha_0,\beta_0,\gamma_0}(1) = \frac{1}{N}\sum_{i=1}^N \xi_i^{\alpha_0,\beta_0,\gamma_0} x_i^1(1) = \frac{1}{N}\sum_{i=1}^N \xi_i^{\alpha_0,\beta_0,\gamma_0}\,\mathrm{sign}\left[h_i^1(0)\right] = \frac{1}{N}\sum_{i=1}^N \mathrm{sign}\left[\xi_i^{\alpha_0,\beta_0,\gamma_0} h_i^1(0)\right]. \quad (7)$$

We see that the retrieval of a memory pattern is determined by the alignment between the neural input and the memory pattern, which is further written as (see Sec.2 in SI),

$$\xi_i^{\alpha_0,\beta_0,\gamma_0} h_i^1(0) = \xi_i^{\alpha_0,\beta_0,\gamma_0} \sum_j W_{ij}^1 x_j^1(0) = 1 + C_i + \widetilde{C}_i. \quad (8)$$

Here, the input received by the neuron is decomposed into the signal and noise parts, and the latter is further divided into two components, $C_i$ and $\widetilde{C}_i$, which represent, respectively, the interferences to memory retrieval due to: 1) the correlation of the pattern to be retrieved with siblings from the same parent, called the intra-class interference; 2) the correlation of the pattern to be retrieved with cousins from the same grandparent but different parents, called the inter-class interference. It can be checked that in the limits of large N, $P_\gamma$ and $P_\beta$, the intra- and inter-class interferences, $C_i$ and $\widetilde{C}_i$, satisfy the distributions $P(C_i) = \mathcal{N}(E_C, V_C)(1+b_1)/2 + \mathcal{N}(-E_C, V_C)(1-b_1)/2$ and $P(\widetilde{C}_i) = \mathcal{N}(E_{\widetilde{C}}, V_{\widetilde{C}})(1+b_1 b_2)/2 + \mathcal{N}(-E_{\widetilde{C}}, V_{\widetilde{C}})(1-b_1 b_2)/2$, where $\mathcal{N}(E, V)$ represents a normal distribution with mean E and variance V, and $E_C = b_1^3(P_\gamma - 1)$, $E_{\widetilde{C}} = b_1^3 b_2^3 P_\gamma(P_\beta - 1)$, $V_C = b_1^4(P_\gamma - 1)(1 - b_1^2)$, $V_{\widetilde{C}} = b_1^4 b_2^4 P_\gamma(P_\beta - 1)(1 - b_1^2 b_2^2)$ (see Sec.2 in SI).

The breadth of the above noise distributions, as a consequence of pattern correlations, implies that even starting from a noiseless state, the network dynamics still incur retrieval instability [9], and an error occurs when the noise is large (i.e., $C_i + \widetilde{C}_i < -1$).

4 Hierarchical Information Retrieval with the push-pull feedback

According to the above theoretical analysis, to improve memory retrieval, the key is to suppress the inter- and intra-class noises due to pattern correlations. Note that, in practice, the correlations between higher categorical patterns tend to be smaller than those between lower categorical patterns. For example, the similarity between cats and dogs is usually smaller than that between two sub-types of cats. In our model, this corresponds to the condition $b_1 > b_2$. For an associative memory, this implies that given the same amount of input information (e.g., an ambiguous image of a Siamese cat), the parent pattern (e.g., a cat) can be better retrieved than the child pattern (e.g., a Siamese cat). Thus, we consider a rough-to-fine retrieval procedure, in which the parent pattern in layer 2 is first retrieved, whose result is subsequently fed back to layer 1 to improve the retrieval of the child pattern.

Below, for the convenience of analysis, we assume that the parent pattern is first perfectly retrieved (m = 1) and explore the appropriate form of feedback which can efficiently utilize the parent information to enhance the retrieval of the child pattern. Later, we carry out simulations demonstrating that the model works in general cases when the parent pattern is not perfectly retrieved.

4.1 The form and the role of push feedback

We first show that a push (positive) feedback of a proper form can suppress the inter-class interference in memory retrieval effectively.
Without loss of generality, we consider that for a given input, the corresponding child pattern to be retrieved in layer 1 is $\xi^{\alpha_0,\beta_0,\gamma_0}$ and that the corresponding parent pattern in layer 2 is $\xi^{\alpha_0,\beta_0}$. Consider the push feedback of the below form,

$$W_{ij}^{1,2} = \frac{1}{N P_\gamma}\sum_{\alpha,\beta,\gamma} \xi_i^{\alpha,\beta,\gamma}\xi_j^{\alpha,\beta}, \quad (9)$$

which follows the standard Hebb rule between the parent and their child patterns, and its contribution is intuitively understandable. Given that the parent pattern $\xi^{\alpha_0,\beta_0}$ in layer 2 is retrieved, its push feedback to neuron i in layer 1 is calculated to be $\sum_j W_{ij}^{1,2}\xi_j^{\alpha_0,\beta_0} \approx \sum_\gamma \xi_i^{\alpha_0,\beta_0,\gamma}/P_\gamma$. Obviously, this positive current increases the activities of all child patterns belonging to the parent, i.e., those $\xi^{\alpha_0,\beta_0,\gamma}$ for any $\gamma$, and it has little influence on other child patterns from different parents, i.e., those $\xi^{\alpha_0,\beta,\gamma}$ for $\beta \neq \beta_0$. Due to the competition between memory patterns in the network dynamics, this effectively suppresses the inter-class interference in memory retrieval (Fig.2A).

Figure 2: A. Illustrating the effect of push-feedback. The parent pattern is $\xi^{1,1}$, whose feedback contribution to sibling patterns is measured by $\sum_\gamma m^{1,1,\gamma}/P_\gamma$, and to cousin patterns by $\sum_\gamma m^{1,2,\gamma}/P_\gamma$. The results averaged over 100 trials are shown. B. Illustrating the effect of pull-feedback. The distributions of the intra-class noise $C_i$ without feedback and the noise $C_i^*$ with the pull-feedback are presented (see Eqs.(11,14) in SI). Retrieval errors occur when $C_i < -1$ or $C_i^* < -1$ (indicated by the yellow line). The parameters are N = 2000, $P_\alpha$ = 2, $P_\beta$ = 10, $P_\gamma$ = 70, $b_1$ = 0.2, and $b_2$ = 0.15.

4.2 The form and the role of pull feedback

We further show that a pull (negative) feedback of an appropriate form can suppress the intra-class interference in memory retrieval. Consider the pull feedback of the below form,

$$W_{ij}^{1,2} = -b_1\delta_{ij}. \quad (10)$$

Given that the parent pattern $\xi^{\alpha_0,\beta_0}$ in layer 2 is retrieved, its negative feedback to neuron i in layer 1 is calculated to be $\sum_j W_{ij}^{1,2}\xi_j^{\alpha_0,\beta_0} = -b_1\xi_i^{\alpha_0,\beta_0}$. In the large $P_\gamma$ limit, the parent pattern is approximated by the mean of its child patterns (Sec.1 in SI); thus, in effect the pull feedback subtracts a portion of this mean value from sibling patterns. The retrieval accuracy of the target child pattern after applying the pull feedback is calculated to be (Sec.3 in SI),

$$m^{\alpha_0,\beta_0,\gamma_0} = \frac{1}{N}\sum_{i=1}^N \mathrm{sign}\left[1 + C_i^* + \widetilde{C}_i\right], \quad (11)$$

where $C_i^* \equiv C_i - b_1\xi_i^{\alpha_0,\beta_0,\gamma_0}\xi_i^{\alpha_0,\beta_0}$ is the new noise term after applying the pull-feedback.
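Both feedback components can be probed in the same one-step setting. The sketch below uses illustrative parameters and assumes, as in the analysis above, that the parent pattern is already perfectly retrieved in layer 2, so the two feedback currents are formed directly from the stored patterns:

```python
import numpy as np

rng = np.random.default_rng(2)
N, Pa, Pb, Pg, b1, b2 = 1500, 2, 5, 20, 0.3, 0.2

# Hierarchical +-1 patterns, Eqs. (3)-(5)
grand = rng.choice([-1, 1], size=(Pa, N))
parent = np.where(rng.random((Pa, Pb, N)) < (1 - b2) / 2,
                  -grand[:, None, :], grand[:, None, :])
child = np.where(rng.random((Pa, Pb, Pg, N)) < (1 - b1) / 2,
                 -parent[:, :, None, :], parent[:, :, None, :])
pats = child.reshape(-1, N)
W = pats.T @ pats / N
np.fill_diagonal(W, 0)

target, par = pats[0], parent[0, 0]      # child xi^{a0,b0,g0} and its parent
h = W @ target.astype(float)             # input after one step from the child

# Push current, Eq. (9): ~ mean of the sibling patterns; it overlaps much
# more with siblings than with cousins (inter-class suppression).
push = child[0, 0].mean(axis=0)
m_push_sib = push @ pats[1] / N          # overlap with a sibling
m_push_cous = push @ child[0, 1, 0] / N  # overlap with a cousin

# Pull current, Eq. (10): -b1 * parent; it shifts the error-prone negative
# branch of the intra-class noise toward zero (C_i -> C_i^*).
m_plain = np.sign(h) @ target / N
m_pull = np.sign(h - b1 * par) @ target / N
```

In this setting the push current aligns far more with siblings than with cousins, and adding the pull current raises the one-step overlap of the target child above the no-feedback value, in line with Fig.2.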
As shown in Fig.2B, with the pull feedback, the negative tail of the noise distribution (where retrieval errors occur) is considerably reduced.

4.3 The joint effect of the push-pull feedback

Summarizing the above results, we come to the conclusion that to achieve good information retrieval, the neural feedback needs to be dynamical, exerting the push and pull components at different stages, so that they can suppress the inter- and intra-class interferences, respectively.

To better demonstrate the joint effect of the push-pull feedback, we consider a continuous version of the Hopfield model, so that the network state changes smoothly and the joint effects of push- and pull-feedbacks are integrated over time (the discrete Hopfield model still works, but the overall effect is less significant). The network dynamics are given by [11]

$$\tau \frac{dh_i^n}{dt} = -h_i^n + \sum_j W_{ij}^n x_j^n + \sum_k W_{ik}^{n,m}(t) x_k^m + I_i^{ext,n}, \quad (12)$$

$$x_i^n = f(h_i^n), \quad (13)$$

where $h_i^n$ and $x_i^n$ denote the synaptic input and the firing rate of neuron i in layer n, respectively, and their relationship is given by a sigmoid function, $f(x) = \arctan(8\pi x)/\pi + 1/2$, so that $0 < x_i^n < 1$. To match the range of the firing rate $x_i^n$, we also rescale all the hierarchical patterns $\xi_i$ into $\{0, 1\}$. The parameter $\tau$ is the time constant. The recurrent and feedforward connections follow the standard Hebb rule as described above. The feedback connections are slightly modified from Eqs.(9-10) to accommodate positive values of neural activities in the continuous model. They are given by: the push-feedback $W_{ik}^{1,2} = a_+\sum_{\alpha,\beta,\gamma}(\xi_i^{\alpha,\beta,\gamma} - \langle\xi\rangle)(\xi_k^{\alpha,\beta} - \langle\xi\rangle)/(N P_\gamma)$, with $a_+$ being a positive number, and the pull-feedback $W_{ik}^{1,2} = -a_- b_1\delta_{ik}$, with $a_-$ being a positive number. $I^{ext}$ is the external input conveying the object information. The push and pull feedbacks are applied sequentially, with each of them lasting on the order of $\tau$ ($\tau \sim 10-20$ ms), as suggested by the data [6]. For details of the model, see Sec.4.1 in SI.

Fig.3 displays a typical example of the memory retrieval process in the network, demonstrating that: 1) the neural population activity at layer 1 exhibits the push-pull phenomenon, agreeing qualitatively with the experimental data (Fig.3A compared to Fig.1A); 2) the retrieval accuracy of layer 1 with the push-pull feedback is improved significantly compared to the case of no feedback (Fig.3B). Interestingly, we note that when the push feedback is applied, the retrieval accuracy of the target child pattern decreases a little. This is because the push feedback only aims at reducing the inter-class interference without differentiating sibling patterns.

We evaluate the performance of the model by varying the amplitude of pattern correlations and confirm that the push-pull feedback always improves the network performance statistically (Sec.4.2 in SI).

5 Applying to Real Images

We test our model in the processing of real images. As shown in the top of Fig.4A, the dataset we use consists of $P_\beta$ = 2 types of animals, cats and dogs, corresponding to parents in our model. For each type of animal, it is further divided into $P_\gamma$ = 9 sub-types, corresponding to children. A total of 1800 images, with K = 100 for each sub-type of animals, are chosen from ImageNet. It has been shown that the neural representations generated by a DNN (after being trained with ImageNet) capture the
The red curve is\nthe case without feedback. (cid:104)x(cid:105) is obtained by averaging over all neurons in layer 1. B. The retrieval\naccuracies of the child (red curve) and the parent patterns (yellow curve) as functions of time in the\nsame trial as in A. The blue curve is the case without feedback. The lower panel in both A and B\ndisplays the time course of applying an external input (t \u2208 (0, 4\u03c4 )), the push-feedback (t \u2208 (\u03c4, 2\u03c4 )),\nand the pull-feedback (t \u2208 (2\u03c4, 3\u03c4 )). The child pattern conveyed by the external input is \u03be1,1,1, and\nthe corresponding parent pattern is \u03be1,1. The parameters used are: N = 2000, P\u03b3 = 25, P\u03b2 = 4,\nP\u03b1 = 2, b1 = 0.2, b2 = 0.1, \u03c4 = 5,aext = 1,a1\next = 0.1, a+ = 1, a\u2212 = 10, \u03bb1=0.1,\n\u03bb2=0.1.\n\nr = 1, a2\n\nr = 2,a2\n\ncategorical relationships between objects, in the sense that the overlap between neural representations\nre\ufb02ect the closeness of objects in category, rather than their similarity in pixels [26]. This indicates\nthat the memory patterns are hierarchically organized. We therefore pre-process images by \ufb01ltering\nthem through VGG, a type of DNN [24], and use the neural representations generated by VGG\n(i.e., the neural activities before the read-out layer) to construct the memory patterns. The details\nof pre-processing are described in Sec.5 in SI. The lower panel of Fig.4A shows the correlations\nbetween the memory patterns generated by VGG, which exhibits a hierarchical structure, i.e., siblings\nfrom the same parent have stronger correlations than cousins from different parents, similar to the\ncorrelation structure in our model.\n\nFigure 4: The model performances with real images. A. The dataset. Top panel: example images,\none for each sub-type of cat or dog. Lower panel: the correlations between child patterns after\npre-processing by VGG. Cat: 1 \u2212 9; Dog: 10 \u2212 18. B. 
Retrieval accuracies of the child (cat A, blue\ncurve) and parent (cat, yellow curve) patterns as functions of time in a typical trial. The red curve is\nthe case without feedback. C. Different effects of the push and pull feedbacks in the example trial\nas in B. The blue, purple, and green curves represent, respectively, the retrieval accuracies of the\ntarget child (cat A, a Siamese cat), the siblings (other sub-types of cats), and other child patterns\n(all sub-types of dogs) in layer 1. In B-C, the image presented to the network is cat Siamese, and\nthe lower panel displays the time course of applying the external input (0, 4\u03c4 ), the push feedback\n(\u03c4, 2.4\u03c4 ), and the pull feedback (2.4\u03c4, 3.8\u03c4 ). The parameters: N = 4096, a1\next = 6,\next = 1, a+ = 2, a\u2212 = 1.5, a21 = 1. Other parameters are the same as in Fig.3.\na2\n\nr = 1, a2\n\nr = 2, a1\n\nWe present each image to the network and measure its retrieval accuracy by calculating m\u03b2,\u03b3, i.e., the\noverlap between the network response and the memory pattern corresponding to the image. Fig.4B\nshows a typical example of the retrieval process. We see that the retrieval accuracy of layer 1 keeps\n\n7\n\nTime ( )0 1 23 4 5InputPushPullTime ( )PushRetrieval AccuracyuPll00.20.40.60.81012345without fbwith fbm1,1,11,1,1mm1,1Population Activity < >00.20.40.60.81<x>without fbwith fb<x>0 1 2 3 4 50 1 2 3 4 5InputABACBPullInputRetrieval AccuracyPush-0.200.20.40.6PullInputPushRetrieval Accuracy00.20.40.60.813691215183 6 9 12 15 18catdogcatdogmcat Awithout fbmcat Awith fbmcat01234Time(\u03c4)00.20.40.60.81012340123401234Time(\u03c4)mcat A<m>cat<m>dog 0.8 0.6 0.4 0.2 0-0.1\fincreasing when the push and pull feedbacks are applied sequentially, and the result is signi\ufb01cantly\nimproved compared to the case without feedback. 
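The continuous dynamics of Eqs. (12)-(13), with the sequential input/push/pull schedule used in Figs. 3-4, can be sketched as a simple Euler integration. This is a toy-sized sketch with illustrative parameters, not the paper's tuned values; the sibling mean stands in for the retrieved parent, following the large-$P_\gamma$ approximation of Sec. 4.2:

```python
import numpy as np

rng = np.random.default_rng(3)
N, Pg, b1 = 200, 5, 0.3
tau, dt, T = 5.0, 0.1, 20.0

def f(h):
    """Sigmoid transfer of Eq. (13): output always lies in (0, 1)."""
    return np.arctan(8 * np.pi * h) / np.pi + 0.5

# Toy {0,1} memory patterns (one parent family) and covariance-rule weights
xi = (rng.random((Pg, N)) < 0.5).astype(float)
xc = xi - xi.mean()
W = xc.T @ xc / N

h = np.zeros(N)
I_ext = xi[0] - 0.5                 # external input conveying pattern 0
sib_mean = (xi - 0.5).mean(axis=0)  # parent proxy: mean of its children
push = sib_mean                     # push current, as in Eq. (9)
pull = -b1 * sib_mean               # pull current, as in Eq. (10)

for step in range(int(T / dt)):
    t = step * dt
    # Schedule as in Fig.3: input throughout, push in (tau, 2tau), pull after
    fb = push if tau < t < 2 * tau else (pull if 2 * tau <= t < 3 * tau else 0.0)
    x = f(h)
    h += dt / tau * (-h + W @ x + I_ext + fb)   # Euler step of Eq. (12)

x = f(h)
```

The sketch only demonstrates the integration scheme and the feedback schedule; the firing rates stay bounded in (0, 1) by construction of f, and the synaptic inputs remain finite.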
Over 1800 images, the averaged improvement is 71.04% (measured at the moment when the pull feedback stops).

To illustrate the individual effects of the push and pull feedbacks, we also calculate the retrieval accuracies of sibling and cousin patterns. As shown in Fig.4C, we see that: 1) at the early phase of push feedback, both the retrieval accuracies of the target child pattern and its siblings increase, whereas the retrieval accuracy of cousins drops, indicating that the push feedback has the effect of suppressing the inter-class interference; 2) at the later phase of pull feedback, the retrieval accuracy of the target child pattern experiences another significant increase, much larger than that for other child patterns, indicating that the pull feedback has the effect of suppressing the intra-class interference.

6 Conclusion and Discussion

The present study investigates the role of feedback in hierarchical information retrieval. Hierarchical associative memory models have been studied previously [21, 1, 25, 20], but these works considered only a single-layer network without feedback. To our knowledge, our paper is the first one studying the contributions of feedback. In machine learning, there have been studies that utilize the semantics-based higher category knowledge of objects as side information to enhance image recognition [14, 19, 3], but they are very different from our network model in the use of dynamical feedback between layers to enhance information retrieval.

Feedback connections have been widely observed in neural signalling pathways, but their exact computational functions remain largely unclear. Here, in the task of information retrieval, our study reveals that the neural feedback, which varies from positive (push) to negative (pull) over time, contributes to the suppression of the inter- and intra-class noises in information retrieval.
This push-pull characteristic agrees with the push-pull phenomenon of neural activities observed in the experiments [8, 6]. Notably, the neural systems have the resources to realize such a dynamical feedback, and the push and pull components are likely implemented via different signal pathways. For instance, the push feedback may be realized via direct excitatory synapses from higher to lower layers, and the stopping of push feedback can be controlled by short-term synaptic depression; on the other hand, the pull feedback may go through a separate path mediated by inhibitory interneurons, which is naturally delayed compared to the direct excitatory path [12].

Through studying feedback, the present study also addresses a dilemma in neural coding, which concerns the conflicting roles of neural correlation: on one hand, pattern correlations are essential to encode the categorical relationships between objects; on the other hand, they inevitably incur interference in memory retrieval. To diminish the correlation interference, we propose that neural systems employ a rough-to-fine information retrieval procedure. Upon receiving the external information, the higher categorical pattern is first retrieved, and the result is subsequently utilized to enhance the retrieval of the lower categorical pattern via dynamical push-pull feedback. In such a way, the highly correlated neural representations of objects are reliably retrieved. The idea of rough-to-fine information retrieval is in agreement with the concept of "global first" in cognitive science, which states that the brain extracts first the global (e.g., topological), rather than the local (e.g., Euclidean), geometrical features of objects [4]. This phenomenon has been confirmed by a large volume of psychophysical experiments [5].
Here, our study unveils a computational advantage of "global first" not realized previously: extracting global features first, aided by the push-pull feedback, serves as an efficient strategy to overcome the interference due to neural correlations. Experimental findings have suggested that the dorsal pathway [17], and/or the subcortical pathway from the retina to the superior colliculus [18], carry out the rapid computation of extracting global features of objects, while, along the ventral pathway, the push-pull feedback assists the feedforward input to extract the fine structures of objects in a relatively slow manner. In our future work, we will extend the present study to explore the role of feedback in biologically more detailed models.

Acknowledgments

This work was supported by BMSTC (Beijing Municipal Science and Technology Commission) under grant No. Z171100000117007 (D.H. Wang & Y.Y. Mi), the National Natural Science Foundation of China (No. 31771146, 11734004, Y.Y. Mi), the National Natural Science Foundation of China (No. 61425025, T.J. Huang), the Beijing Nova Program (No. Z181100006218118, Y.Y. Mi), a grant from Guangdong Province (No. 2018B03033800, Si Wu & Y.Y. Mi), and grants from the Research Grants Council of Hong Kong (grant numbers 16322616, 16306817 and 16302419, K. Y. Michael Wong). This work also received support from Huawei Technology Co., Ltd.

References

[1] S. Amari. Statistical neurodynamics of associative memory. Neural Networks, 1, 63-73 (1988).

[2] B. Blumenfeld, S. Preminger, D. Sagi, M. Tsodyks. Dynamics of memory representations in networks with novelty-facilitated synaptic plasticity. Neuron, 52, 383-394 (2006).

[3] S. Chandar, S. Ahn, H. Larochelle, P. Vincent, G. Tesauro, Y. Bengio. Hierarchical memory networks. arXiv preprint arXiv:1605.07427 (2016).

[4] L. Chen. Topological structure in visual perception. Science, 218, 699-700 (1982).

[5] L. Chen.
The topological approach to perceptual organization. Visual Cognition, 12, 553-637 (2005).

[6] M. Chen, Y. Yan, X. Gong, C. Gilbert, H. Liang, W. Li. Incremental integration of global contours through interplay between visual cortical areas. Neuron, 82, 682-694 (2014).

[7] R. Chen, F. Wang, H. Liang, W. Li. Synergistic processing of visual contours across cortical layers in V1 and V2. Neuron, 96(6), 1388-1402 (2017).

[8] A. Gilad, E. Meirovithz, H. Slovin. Population responses to contour integration: early encoding of discrete elements and late perceptual grouping. Neuron, 78, 389-402 (2013).

[9] J. Hertz, A. S. Krogh, R. G. Palmer. Introduction to the Theory of Neural Computation. Addison-Wesley (1991).

[10] J. J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences of the USA, 79, 2554-2558 (1982).

[11] J. J. Hopfield. Neurons with graded response have collective computational properties like those of two-state neurons. Proceedings of the National Academy of Sciences of the USA, 81, 3088-3092 (1984).

[12] E. R. Kandel, J. H. Schwartz, T. M. Jessell. Principles of Neural Science, 4th ed. McGraw-Hill, New York (2000).

[13] E. Kropff, A. Treves. Uninformative memories will prevail: the storage of correlated representations and its consequences. HFSP Journal, 1, 249-262 (2007).

[14] S. Kumar, M. Hebert. A hierarchical field framework for unified context-based classification. In Tenth IEEE International Conference on Computer Vision (ICCV 2005), vol. 2 (2005).

[15] Y. LeCun, Y. Bengio, G. Hinton. Deep learning. Nature, 521, 436-444 (2015).

[16] T. Lee, D. Mumford. Hierarchical Bayesian inference in the visual cortex. Journal of the Optical Society of America A, 20, 1434-1448 (2003).

[17] L. Liu, F. Wang, K. Zhou, et al.
Perceptual integration rapidly activates dorsal visual pathway to guide local processing in early visual areas. PLoS Biology, 15, e2003646 (2017).

[18] J. McFadyen, M. Mermillod, J. B. Mattingley, et al. A rapid subcortical amygdala route for faces irrespective of spatial frequency and emotion. Journal of Neuroscience, 37, 3864-3874 (2017).

[19] F. Morin, Y. Bengio. Hierarchical probabilistic neural network language model. AISTATS, 5, 246-252 (2005).

[20] M. Okada. Tutorial series on brain-inspired computing, part 3: brain science, information science and associative memory model. New Generation Computing, 24, 185-201 (2006).

[21] N. Parga, M. Virasoro. The ultrametric organization of memories in a neural network. Journal de Physique, 47, 1857-1864 (1986).

[22] R. Rao, D. Ballard. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2, 79-87 (1999).

[23] A. Sillito, J. Cudeiro, H. Jones. Always returning: feedback and sensory processing in visual cortex and thalamus. Trends in Neurosciences, 29, 307-316 (2006).

[24] K. Simonyan, A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).

[25] N. Sourlas. Multilayer neural networks for hierarchical patterns. Europhysics Letters, 7, 749-753 (1988).

[26] D. L. Yamins, H. Hong, C. F. Cadieu, E. A. Solomon, D. Seibert, J. J. DiCarlo. Performance-optimized hierarchical models predict neural responses in higher visual cortex.
Proceedings of the National Academy of Sciences, 111(23), 8619-8624 (2014).