{"title": "Representing Part-Whole Relationships in Recurrent Neural Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 563, "page_last": 570, "abstract": null, "full_text": "Representing Part-Whole Relationships in Recurrent Neural Networks\nViren Jain (2), Valentin Zhigulin (1,2), and H. Sebastian Seung (1,2)\n(1) Howard Hughes Medical Institute and (2) Brain & Cog. Sci. Dept., MIT\nviren@mit.edu, valentin@mit.edu, seung@mit.edu\n\nAbstract\nThere is little consensus about the computational function of top-down synaptic connections in the visual system. Here we explore the hypothesis that top-down connections, like bottom-up connections, reflect part-whole relationships. We analyze a recurrent network with bidirectional synaptic interactions between a layer of neurons representing parts and a layer of neurons representing wholes. Within each layer, there is lateral inhibition. When the network detects a whole, it can rigorously enforce part-whole relationships by ignoring parts that do not belong. The network can complete the whole by filling in missing parts. The network can refuse to recognize a whole, if the activated parts do not conform to a stored part-whole relationship. Parameter regimes in which these behaviors happen are identified using the theory of permitted and forbidden sets [3, 4]. The network behaviors are illustrated by recreating Rumelhart and McClelland's \"interactive activation\" model [7].\n\nIn neural network models of visual object recognition [2, 6, 8], patterns of synaptic connectivity often reflect part-whole relationships between the features that are represented by neurons. For example, the connections of Figure 1 reflect the fact that feature B both contains simpler features A1, A2, and A3, and is contained in more complex features C1, C2, and C3. Such connectivity allows neurons to follow the rule that existence of the part is evidence for existence of the whole. 
By combining synaptic input from multiple sources of evidence for a feature, a neuron can \"decide\" whether that feature is present. [Footnote 1: Synaptic connectivity may reflect other relationships besides part-whole. For example, invariances can be implemented by connecting detectors of several instances of the same feature to the same target, which is consequently an invariant detector of the feature.]\n\n[Figure 1 diagram: C1, C2, C3 (top); B (middle); A1, A2, A3 (bottom).]\nFigure 1: The synaptic connections (arrows) of neuron B represent part-whole relationships. Feature B both contains simpler features and is contained in more complex features. The synaptic interactions are drawn one-way, as in most models of visual object recognition. Existence of the part is regarded as evidence for existence of the whole. This paper makes the interactions bidirectional, allowing the existence of the whole to be evidence for the existence of its parts.\n\nThe synapses shown in Figure 1 are purely bottom-up, directed from simple to complex features. However, there are also top-down connections in the visual system, and there is little consensus about their function. One possibility is that top-down connections also reflect part-whole relationships. They allow feature detectors to make decisions using the rule that existence of the whole is evidence for existence of its parts. In this paper, we analyze the dynamics of a recurrent network in which part-whole relationships are stored as bidirectional synaptic interactions, rather than the unidirectional interactions of Figure 1. The network has a number of interesting computational capabilities. When the network detects a whole, it can rigorously enforce part-whole relationships by ignoring parts that do not belong. The network can complete the whole by filling in missing parts. The network can refuse to recognize a whole, if the activated parts do not conform to a stored part-whole relationship. 
Parameter regimes in which these behaviors happen are identified using the recently developed theory of permitted and forbidden sets [3, 4]. Our model is closely related to the interactive activation model of word recognition, which was proposed by McClelland and Rumelhart to explain the word superiority effect studied by visual psychologists [7]. Here our concern is not to model a psychological effect, but to characterize mathematically how computations involving part-whole relationships can be carried out by a recurrent network.\n\n1 Network model\n\nSuppose that we are given a set of part-whole relationships specified by $\\xi_i^a = 1$ if part $i$ is contained in whole $a$, and $\\xi_i^a = 0$ otherwise. We assume that every whole contains at least one part, and every part is contained in at least one whole. The stimulus drives a layer of neurons that detect parts. These neurons also interact with a layer of neurons that detect wholes. We will refer to part-detectors as \"P-neurons\" and whole-detectors as \"W-neurons.\" The part-whole relationships are directly stored in the synaptic connections between P- and W-neurons. If $\\xi_i^a = 1$, the $i$th neuron in the P layer and the $a$th neuron in the W layer have an excitatory interaction of strength $\\gamma$. If $\\xi_i^a = 0$, the neurons have an inhibitory interaction of strength $\\sigma$. Furthermore, the P-neurons inhibit each other with strength $\\beta$, and the W-neurons inhibit each other with strength $\\alpha$. All of these interactions are symmetric, and all activation functions are the rectification nonlinearity $[z]^+ = \\max\\{z, 0\\}$. Then the dynamics of the network takes the form\n\n$$\\dot{W}_a + W_a = \\Big[ \\gamma \\sum_i P_i \\xi_i^a - \\sigma \\sum_i (1 - \\xi_i^a) P_i - \\alpha \\sum_{b \\neq a} W_b \\Big]^+ , \\quad (1)$$\n\n$$\\dot{P}_i + P_i = \\Big[ \\gamma \\sum_a W_a \\xi_i^a - \\sigma \\sum_a (1 - \\xi_i^a) W_a - \\beta \\sum_{j \\neq i} P_j + B_i \\Big]^+ , \\quad (2)$$\n\nwhere $B_i$ is the input to the P layer from the stimulus. Figure 2 shows an example of a network with two wholes. Each whole contains two parts. 
One of the parts is contained in both wholes.\n\n[Figure 2 diagram: W layer with neurons Wa and Wb; P layer with neurons P1, P2, P3 receiving inputs B1, B2, B3; legend: excitation, inhibition.]\nFigure 2: Model in example configuration: $\\xi = \\{(1, 1, 0), (0, 1, 1)\\}$.\n\nWhen a stimulus is presented, it activates some of the P-neurons, which activate some of the W-neurons. The network eventually converges to a stable steady state. We will assume that $\\alpha > 1$. In the Appendix, we prove that this leads to unconditional winner-take-all behavior in the W layer. In other words, no more than one W-neuron can be active at a stable steady state. If a single W-neuron is active, then a whole has been detected. Potentially there are also many P-neurons active, indicating detection of parts. This representation may have different properties, depending on the choice of parameters $\\beta$, $\\gamma$, and $\\sigma$. As discussed below, these include rigorous enforcement of part-whole relationships, completion of wholes by \"filling in\" missing parts, and non-recognition of parts that do not conform to a whole.\n\n2 Enforcement of part-whole relationships\n\nSuppose that a single W-neuron is active at a stable steady state, so that a whole has been detected. Part-whole relationships are said to be enforced if the network always ignores parts that are not contained in the detected whole, despite potentially strong bottom-up evidence for them. It can be shown that enforcement follows from the inequality\n\n$$\\sigma^2 + \\beta^2 + \\gamma^2 + 2\\sigma\\beta\\gamma > 1, \\quad (3)$$\n\nwhich guarantees that neuron $i$ in the P layer is inactive if neuron $a$ in the W layer is active and $\\xi_i^a = 0$. When part-whole relations are enforced, prior knowledge about legal combinations of parts strictly constrains what may be perceived. This result is proven in the Appendix, and only an intuitive explanation is given here. Enforcement is easiest to understand when there is interlayer inhibition ($\\sigma > 0$). In this case, the active W-neuron directly inhibits the forbidden P-neurons. The case of $\\sigma = 0$ is more subtle. 
Then enforcement is mediated by lateral inhibition in the P layer. Excitatory feedback from the W-neuron has the effect of counteracting the lateral inhibition between the P-neurons that belong to the whole. As a result, these P-neurons become strongly activated enough to inhibit the rest of the P layer.\n\n3 Completion of wholes by filling in missing parts\n\nIf a W-neuron is active, it excites the P-neurons that belong to the whole. As a result, even if one of these P-neurons receives no bottom-up input ($B_i = 0$), it is still active. We call this phenomenon \"completion,\" and it is guaranteed to happen when\n\n$$\\gamma > \\sqrt{\\beta}. \\quad (4)$$\n\nThe network may thus \"imagine\" parts that are consistent with the recognized whole, but are not actually present in the stimulus. As with enforcement, this condition depends on top-down connections. In the special case $\\gamma = \\sqrt{\\beta}$, the interlayer excitation between a W-neuron and its P-neurons exactly cancels out the lateral inhibition between the P-neurons at a steady state. So the recurrent connections effectively vanish, letting the activity of the P-neurons be determined by their feedforward inputs. When the interlayer excitation is stronger than this, the inequality (4) holds, and completion occurs.\n\n4 Non-recognition of a whole\n\nIf there is no interlayer inhibition ($\\sigma = 0$), then a single W-neuron is always active, assuming that there is some activity in the P layer. To see this, suppose for the sake of contradiction that all the W-neurons are inactive. Then they receive no inhibition to counteract the excitation from the P layer. This means some of them must be active, which contradicts our assumption. This means that the network always recognizes a whole, even if the stimulus is very different from any part-whole combination that is stored in the network. However, if interlayer inhibition is sufficiently strong (large $\\sigma$), the network may refuse to recognize a whole. 
Neurons in the P layer are activated, but there is no activity in the W layer. Formal conditions on $\\sigma$ can be derived, but are not given here because of space limitations. In case of non-recognition, constraints on the P-layer are not enforced. It is possible for the network to detect a configuration of parts that is not consistent with any stored whole.\n\n5 Example: Interactive Activation model\n\nTo illustrate the computational capabilities of our network, we use it to recreate the interactive activation (IA) model of McClelland and Rumelhart. Figure 3 shows numerical simulations of a network containing three layers of neurons representing strokes, letters, and words, respectively. There are 16 possible strokes in each of four letter positions. For each stroke, there are two neurons, one signaling the presence of the stroke and the other signaling its absence. Letter neurons represent each letter of the alphabet in each of four positions. Word neurons represent each of 1200 common four-letter words. The letter and word layers correspond to the P and W layers that were introduced previously. There are bidirectional interactions between the letter and word layers, and lateral inhibition within the layers. The letter neurons also receive input from the stroke neurons, but this interaction is unidirectional. Our network differs in two ways from the original IA model. First, all interactions involving letter and word neurons are symmetric. In the original model, the interactions between the letter and word layers were asymmetric. In particular, inhibitory connections only ran from letter neurons to word neurons, and not vice versa. Second, the only nonlinearity in our model is rectification. These two aspects allow us to apply the full machinery of the theory of permitted and forbidden sets. Figure 3 shows the result of presenting the stimulus \"MO M\" for four different settings of parameters. 
In each of the four cases, the word layer of the network converges to the same result, detecting the word \"MOON\", which is the closest stored word to the stimulus. However, the activity in the letter layer is different in the four cases.\n\n[Figure 3 panels: input; rows labeled completion and noncompletion; columns labeled enforcement and non-enforcement; each panel shows P layer, reconstruction, and W layer.]\nFigure 3: Simulation of 4 different parameter regimes in a letter-word recognition network. Within each panel, the middle column presents a feature-layer reconstruction based on the letter activity shown in the left column. W layer activity is shown in the right column. The top row shows the network state after 10 iterations of the dynamics. The bottom row shows the steady state.\n\nIn the left column, the parameters obey the inequality (3), so that part-whole relationships are enforced. The activity of the letter layer is visualized by activating the strokes corresponding to each active letter neuron. The activated letters are part of the word \"MOON\". In the top left, the inequality (4) is satisfied, so that the missing \"O\" in the stimulus is filled in. In the bottom left, completion does not occur. In the simulations of the right column, parameters are such that part-whole relationships are not enforced. Consequently, the word layer is much more active. Bottom-up input provides evidence for several other letters, which is not suppressed. In the top right, the inequality (4) is satisfied, so that the missing \"O\" in the stimulus is filled in. In the bottom right, the \"O\" neuron is not activated in the third position, so there is no completion. 
However, some letter neurons for the third position are activated, due to the input from neurons that indicate the absence of strokes.\n\n[Figure 4 panels: input; non-recognition event; multi-stability.]\nFigure 4: Simulation of a non-recognition event and example of multistability.\n\nFigure 4 shows simulations for large $\\sigma$, deep in the enforcement regime where non-recognition is a possibility. From one initial condition, the network converges to a state in which no W neurons are active, a non-recognition. From another initial condition, the network detects the word \"NORM\". Deep in the enforcement regime, the top-down feedback can be so strong that the network has multiple stable states, many of which bear little resemblance to the stimulus at all. This is a problematic aspect of this network. It can be prevented by setting parameters at the edge of the enforcement regime.\n\n6 Discussion\n\nWe have analyzed a recurrent network that performs computations involving part-whole relationships. The network can fill in missing parts and suppress parts that do not belong. These two computations are distinct and can be dissociated from each other, as shown in Figure 3. While these two computations can also be performed by associative memory models, they are not typically dissociable in these models. For example, in the Hopfield model pattern completion and noise suppression are both the result of recall of one of a finite number of stereotyped activity patterns. We believe that our model is more appropriate for perceptual systems, because its behavior is piecewise linear, due to its reliance on the rectification nonlinearity. Therefore, analog aspects of computation are able to coexist with the part-whole relationships. 
Furthermore, in our model the stimulus is encoded in maintained synaptic input to the network, rather than as an initial condition of the dynamics.\n\nA Appendix: Permitted and forbidden sets\n\nOur mathematical results depend on the theory of permitted and forbidden sets [3, 4], which is summarized briefly here. The theory is applicable to neural networks with rectification nonlinearity, of the form $\\dot{x}_i + x_i = [b_i + \\sum_j W_{ij} x_j]^+$. Neuron $i$ is said to be active when $x_i > 0$. For a network of $N$ neurons, there are $2^N$ possible sets of active neurons. For each active set, consider the submatrix of $W_{ij}$ corresponding to the synapses between active neurons. If all eigenvalues of this submatrix have real parts less than or equal to unity, then the active set is said to be permitted. Otherwise the active set is said to be forbidden. A set is permitted if and only if there exists an input vector $b$ such that those neurons are active at a stable steady state. Permitted sets can be regarded as memories stored in the synaptic connections $W_{ij}$. If $W_{ij}$ is a symmetric matrix, the nesting property holds: every subset of a permitted set is permitted, and every superset of a forbidden set is forbidden. The present model can be seen as a general method for storing permitted sets in a recurrent network. This method introduces a neuron for each permitted set, relying on a unary or \"grandmother cell\" representation. In contrast, Xie et al. [9] used lateral inhibition in a single layer of neurons to store permitted sets. By introducing extra neurons, the present model achieves superior storage capacity, much as unary models of associative memory [1] surpass distributed models [5].\n\nA.1 Unconditional winner-take-all in the W layer\n\nThe synapses between two W-neurons have strengths\n\n$$\\begin{pmatrix} 0 & -\\alpha \\\\ -\\alpha & 0 \\end{pmatrix}.$$\n\nThe eigenvalues of this matrix are $\\pm\\alpha$. Therefore two W-neurons constitute a forbidden set if $\\alpha > 1$. 
By the nesting property, it follows that any set of more than two W-neurons is also forbidden, and that the W layer has the unconditional winner-take-all property.\n\nA.2 Part-whole combinations as permitted sets\n\nTheorem 1. Suppose that $\\beta < 1$. If $\\gamma^2 < \\beta + (1 - \\beta)/k$, then any combination of $k \\geq 1$ parts consistent with a whole corresponds to a permitted set.\n\nProof. Consider $k$ parts belonging to a whole. They are represented by one W-neuron and $k$ P-neurons, with synaptic connections given by the $(k + 1) \\times (k + 1)$ matrix\n\n$$M = \\begin{pmatrix} -\\beta (\\mathbf{1}\\mathbf{1}^T - I) & \\gamma \\mathbf{1} \\\\ \\gamma \\mathbf{1}^T & 0 \\end{pmatrix}, \\quad (5)$$\n\nwhere $\\mathbf{1}$ is the $k$-dimensional vector whose elements are all equal to one. Two eigenvectors of $M$ are of the form $(\\mathbf{1}^T, c)$, and have the same eigenvalues as the $2 \\times 2$ matrix\n\n$$\\begin{pmatrix} -\\beta (k - 1) & \\gamma k \\\\ \\gamma & 0 \\end{pmatrix}.$$\n\nThis matrix has eigenvalues less than one when $\\gamma^2 < \\beta + (1 - \\beta)/k$ and $\\beta (k - 1) + 2 > 0$. The other $k - 1$ eigenvectors are of the form $(\\mathbf{d}^T, 0)$, where $\\mathbf{d}^T \\mathbf{1} = 0$. These have eigenvalues $\\beta$. Therefore all eigenvalues of $M$ are less than one if the condition of the theorem is satisfied.\n\nA.3 Constraints on combining parts\n\nHere, we derive conditions under which the network can enforce the constraint that steady state activity be confined to parts that constitute a whole.\n\nTheorem 2. Suppose that $\\beta > 0$ and $\\sigma^2 + \\beta^2 + \\gamma^2 + 2\\sigma\\beta\\gamma > 1$. If a W-neuron is active, then only P-neurons corresponding to parts contained in the relevant whole can be active at a stable steady state.\n\nProof. Consider P-neurons $P_i$, $P_j$, and W-neuron $W_a$. Suppose that $\\xi_i^a = 1$ but $\\xi_j^a = 0$. As shown in Figure 5, the matrix of connections is given by\n\n$$W = \\begin{pmatrix} 0 & \\gamma & -\\sigma \\\\ \\gamma & 0 & -\\beta \\\\ -\\sigma & -\\beta & 0 \\end{pmatrix}. \\quad (6)$$\n\n[Figure 5 diagram: W-neuron Wa connected to P-neurons Pi and Pj.]\nFigure 5: A set of one W-neuron and two P-neurons is forbidden if one part belongs to the whole and the other does not.\n\nThis set is permitted if all eigenvalues of $W - I$ have negative real parts. The characteristic equation of $W - I$ is $\\lambda^3 + b_1 \\lambda^2 + b_2 \\lambda + b_3 = 0$, where $b_1 = 3$, $b_2 = 3 - \\sigma^2 - \\beta^2 - \\gamma^2$ and $b_3 = 1 - 2\\sigma\\beta\\gamma - \\sigma^2 - \\beta^2 - \\gamma^2$. 
According to the Routh-Hurwitz theorem, all the eigenvalues have negative real parts if and only if $b_1 > 0$, $b_3 > 0$ and $b_1 b_2 > b_3$. Clearly, the first condition is always satisfied. The second condition is more restrictive than the third. It is satisfied only when $\\sigma^2 + \\beta^2 + \\gamma^2 + 2\\sigma\\beta\\gamma < 1$. Hence, one of the eigenvalues has a positive real part when this condition is broken, i.e., when $\\sigma^2 + \\beta^2 + \\gamma^2 + 2\\sigma\\beta\\gamma > 1$. By the nesting property, any larger set of P-neurons inconsistent with the W-neuron is also forbidden.\n\nA.4 Completion of wholes\n\nTheorem 3. If $\\gamma > \\sqrt{\\beta}$ and a single W-neuron $a$ is active at a steady state, then $P_i > 0$ for all $i$ such that $\\xi_i^a = 1$.\n\nProof. Suppose that the detected whole has $k$ parts. At the steady state\n\n$$P_i = \\frac{B_i \\xi_i^a + (\\gamma^2 - \\beta) P_{tot}}{1 - \\beta}, \\quad (7)$$\n\nwhere\n\n$$P_{tot} = \\sum_i P_i = \\frac{\\sum_{i=1}^{k} B_i \\xi_i^a}{1 - \\beta + (\\beta - \\gamma^2) k}.$$\n\nA.5 Preventing runaway\n\nIf feedback loops cause the network activity to diverge, then the preceding analyses are not relevant. Here we give a sufficient condition guaranteeing that runaway instability does not happen. It is not a necessary condition. Interestingly, the condition implies the condition of Theorem 1.\n\nTheorem 4. Suppose that P and W obey the dynamics of Eqs. (1) and (2), and define the objective function\n\n$$E = \\frac{1 - \\alpha}{2} \\sum_a W_a^2 + \\frac{\\alpha}{2} \\Big( \\sum_a W_a \\Big)^2 + \\frac{1 - \\beta}{2} \\sum_i P_i^2 + \\frac{\\beta}{2} \\Big( \\sum_i P_i \\Big)^2 - \\sum_i B_i P_i - \\gamma \\sum_{i,a} P_i W_a \\xi_i^a + \\sigma \\sum_{i,a} (1 - \\xi_i^a) P_i W_a. \\quad (8)$$\n\nThen $E$ is a Lyapunov-like function that, given $\\beta > \\gamma^2 - \\frac{1 - \\gamma^2}{N - 1}$, ensures convergence of the dynamics to a stable steady state.\n\nProof. (sketch) Differentiation of $E$ with respect to time shows that $E$ is nonincreasing in the nonnegative orthant and constant only at steady states of the network dynamics. We must also show that $E$ is radially unbounded, which is true if the quadratic part of $E$ is copositive definite. Note that the last term of $E$ is lower-bounded by zero and the magnitude of the previous term is upper-bounded by $\\gamma \\sum_{i,a} P_i W_a$. We assume $\\alpha > 1$. 
Thus we can use Cauchy's inequality, $\\sum_i P_i^2 \\geq (\\sum_i P_i)^2 / N$, and the fact that $\\sum_a W_a^2 \\leq (\\sum_a W_a)^2$ for $W_a \\geq 0$, to derive\n\n$$E \\geq \\frac{1 - \\beta + N\\beta}{2N} \\Big( \\sum_i P_i \\Big)^2 + \\frac{1}{2} \\Big( \\sum_a W_a \\Big)^2 - \\gamma \\sum_a W_a \\sum_i P_i - \\sum_i B_i P_i. \\quad (9)$$\n\nIf $\\beta > \\gamma^2 - \\frac{1 - \\gamma^2}{N - 1}$, the quadratic form in the inequality is positive definite and $E$ is radially unbounded.\n\nReferences\n[1] E. B. Baum, J. Moody, and F. Wilczek. Internal representations for associative memory. Biol. Cybern., 59:217-228, 1988.\n[2] K. Fukushima. Neocognitron: a self organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol Cybern, 36(4):193-202, 1980.\n[3] R.H. Hahnloser, R. Sarpeshkar, M.A. Mahowald, R.J. Douglas, and H.S. Seung. Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature, 405(6789):947-51, Jun 22 2000.\n[4] R.H. Hahnloser, H.S. Seung, and J.-J. Slotine. Permitted and forbidden sets in symmetric threshold-linear networks. Neural Computation, 15:621-638, 2003.\n[5] J.J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci U S A, 79(8):2554-8, Apr 1982.\n[6] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Backpropagation applied to handwritten zip code recognition. Neural Comput., 1:541-551, 1989.\n[7] J. L. McClelland and D. E. Rumelhart. An interactive activation model of context effects in letter perception: Part i. an account of basic findings. Psychological Review, 88(5):375-407, Sep 1981.\n[8] M Riesenhuber and T Poggio. Hierarchical models of object recognition in cortex. Nat Neurosci, 2(11):1019-25, Nov 1999.\n[9] X. Xie, R.H. Hahnloser, and H. S. Seung. Selectively grouping neurons in recurrent networks of lateral inhibition. 
Neural Computation, 14:2627-2646, 2002.\n", "award": [], "sourceid": 2765, "authors": [{"given_name": "Viren", "family_name": "Jain", "institution": null}, {"given_name": "Valentin", "family_name": "Zhigulin", "institution": null}, {"given_name": "H.", "family_name": "Seung", "institution": null}]}