{"title": "Encoding Labeled Graphs by Labeling RAAM", "book": "Advances in Neural Information Processing Systems", "page_first": 1125, "page_last": 1132, "abstract": null, "full_text": "Encoding Labeled Graphs by Labeling \n\nRAAM \n\nAlessandro Sperduti* \n\nDepartment of Computer Science \n\nPisa University \n\nCorso Italia 40, 56125 Pisa, Italy \n\nAbstract \n\nIn this paper we propose an extension to the RAAM by Pollack. \nThis extension, the Labeling RAAM (LRAAM), can encode la(cid:173)\nbeled graphs with cycles by representing pointers explicitly. Data \nencoded in an LRAAM can be accessed by pointer as well as by \ncontent. Direct access by content can be achieved by transform(cid:173)\ning the encoder network of the LRAAM into an analog Hopfield \nnetwork with hidden units. Different access procedures can be \ndefined depending on the access key. Sufficient conditions on the \nasymptotical stability of the associated Hopfield network are briefly \nintroduced. \n\n1 \n\nINTRODUCTION \n\nIn the last few years, several researchers have tried to demonstrate how symbolic \nstructures such as lists, trees, and stacks can be represented and manipulated in a \nconnectionist system, while still preserving all the computational characteristics of \nconnectionism (and extending them to the symbolic representations) (Hinton, 1990; \nPlate, 1991; Pollack, 1990; Smolensky, 1990; Touretzky, 1990). The goal is to high(cid:173)\nlight the potential of the connectionist approach in handling domains of structured \ntasks. The common background of their ideas is an attempt to achieve distal access \nand consequently compositionality. The RAAM model, proposed by Pollack (Pol(cid:173)\nlack, 1990), is one example of how a neural network can discover compact recursive \n\n\"Work partially done while at the International Computer Science Institute, Berkeley. 
\n\n1125 \n\n\f1126 \n\nSperduti \n\nOutput Layer \n\nHidden Layer \n\nInput Layer \n\nLabel \n\nFigure 1: The network for a general LRAAM. The first layer of the network imple(cid:173)\nments an encoder; the second layer, the corresponding decoder. \n\ndistributed representations of trees with a fixed branching factor. \n\nThis paper presents an extension of the RAAM, the Labeling RAAM (LRAAM). \nAn LRAAM allows one to store a label for each component of the structure to be \nrepresented, so as to generate reduced representations of labeled graphs. Moreover, \ndata encoded in an LRAAM can be accessed not only by pointer but also by content. \nIn Section 2 we present the network and we discuss some technical aspects of the \nmodel. The possibility to access data by content is presented in Section 3. Some \nstability results are introduced in Section 4, and the paper is closed by discussion \nand conclusions in Section 5. \n\n2 THE NETWORK \n\nThe general structure of the network for an LRAAM is shown in Figure 1. The \nnetwork is trained by backpropagation to learn the identity function. The idea is to \nobtain a compressed representation (hidden layer activation) of a node of a labeled \ngraph by allocating a part of the input (output) of the network to represent the \nlabel (Nl units) and the remaining part to represent one or more pointers. This \nrepresentation is then used as pointer to the node. To allow the recursive use of these \ncompressed representations, the part of the input (output) layer which represents \na pointer must be of the same dimension as the hidden layer (N H units) . Thus, a \ngeneral LRAAM is implemented by a NJ - N H - NJ feed-forward network, where \nNJ = Nl + nN H, and n is the number of pointer fields. \nLabeled graphs can be easily encoded using an LRAAM. Each node of the graph \nonly needs to be represented as a record, with one field for the label and one \nfield for each pointer to a connected node. 
The pointers only need to be logical pointers, since their actual values will be the patterns of hidden activation of the network. At the beginning of learning, their values are set at random. A graph is represented by a list of these records, and this list constitutes the initial training set for the LRAAM. During training the representations of the pointers are consistently updated according to the hidden activations. Consequently, the training set is dynamic. For example, the network for the graph shown in Figure 2 can be trained as follows: \n\ninput - hidden - output \n(L1 d_n2 d_n4 d_n5) - d_n1 - (L1' d_n2' d_n4' d_n5') \n(L2 d_n3 d_n4 nil) - d_n2 - (L2' d_n3' d_n4' nil') \n(L3 d_n6 nil nil) - d_n3 - (L3' d_n6' nil' nil') \n(L4 d_n6 d_n3 nil) - d_n4 - (L4' d_n6' d_n3' nil') \n(L5 d_n4 d_n6 nil) - d_n5 - (L5' d_n4' d_n6' nil') \n(L6 nil nil nil) - d_n6 - (L6' nil' nil' nil') \n\nwhere L_i and d_ni are respectively the label of and the pointer (reduced descriptor) to the i-th node, and primed symbols denote output activations. For the sake of simplicity, the void pointer is represented by a single symbol, nil, but each instance of it must be considered as being different. This statement will be made clear in the next section. \n\nOnce the training is complete, the patterns of activation representing pointers can be used to retrieve information. Thus, for example, if the activity of the hidden units of the network is clamped to d_n1, the output of the network becomes (L1, d_n2, d_n4, d_n5), enabling further retrieval of information by decoding d_n2, or d_n4, or d_n5, and so on. Note that more labeled graphs can be encoded in the same LRAAM. 
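The training scheme above can be sketched numerically. The following is a minimal illustrative implementation, not the paper's original code: the two-node toy graph, layer sizes, learning rate, and the simplification of keeping void-pointer fields at fixed random values (the paper instead feeds their output activations back) are all assumptions.

```python
import numpy as np

# Sketch of LRAAM training: an N_I - N_H - N_I sigmoid autoencoder whose pointer
# fields are overwritten with the current hidden activations after every epoch,
# so the training set is dynamic ("moving target").
rng = np.random.default_rng(0)
N_l, N_H, n = 4, 3, 2                  # label units, hidden units, pointer fields
N_I = N_l + n * N_H

labels = np.array([[1., 0., 0., 1.],   # node 0
                   [0., 1., 1., 0.]])  # node 1
children = [[1, None], [None, None]]   # logical pointers; None stands for nil

W1 = rng.normal(0, 0.3, (N_H, N_I)); b1 = np.zeros(N_H)   # encoder
W2 = rng.normal(0, 0.3, (N_I, N_H)); b2 = np.zeros(N_I)   # decoder
sig = lambda z: 1.0 / (1.0 + np.exp(-z))

# Pointer representations start at random values (here kept fixed when void,
# a simplification of the paper's scheme).
ptr = rng.uniform(0.2, 0.8, (2, n, N_H))

def pattern(i):
    return np.concatenate([labels[i], ptr[i, 0], ptr[i, 1]])

for epoch in range(3000):
    for i in range(2):                     # one backprop step per record
        x = pattern(i)
        h = sig(W1 @ x + b1)
        y = sig(W2 @ h + b2)
        dy = (y - x) * y * (1 - y)         # squared-error + sigmoid gradient
        dh = (W2.T @ dy) * h * (1 - h)
        W2 -= 0.5 * np.outer(dy, h); b2 -= 0.5 * dy
        W1 -= 0.5 * np.outer(dh, x); b1 -= 0.5 * dh
    # consistently update non-void pointer fields from current hidden codes
    codes = np.array([sig(W1 @ pattern(i) + b1) for i in range(2)])
    for i in range(2):
        for f in range(n):
            if children[i][f] is not None:
                ptr[i, f] = codes[children[i][f]]

# Retrieval by pointer: clamp the hidden layer to node 0's code and decode.
out = sig(W2 @ codes[0] + b2)
label_error = float(np.mean((out[:N_l] - labels[0]) ** 2))
```

After training, decoding a node's hidden code should approximately reproduce its label and the codes of its children, which can then be decoded in turn.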
\n\n2.1 THE VOID POINTER PROBLEM \n\nIn the RAAM model there is a termination problem in the decoding of a compressed \nrepresentation: due to approximation errors introduced during decoding, it is not \nclear when a decoded pattern is a terminal or a nonterminal. One solution is to test \nfor \"binary-ness\", which consists in checking whether all the values of a pattern are \nabove 1 - T or below T, T > 0, T \u00ab 1. However, a nonterminal may also pass the \ntest for \"binary-ness\". \n\nOne advantage of LRAAM over RAAM is the possibility to solve the problem by \nallocating one bit of the label for each pointer to represent if the pointer is void or \nnot. This works better than fixing a particular pattern for the void pointer, such \nas a pattern with all the bits to 1 or 0 or -1 (if symmetrical sigmoids are used). \nSimulations performed with symmetrical sigmoids showed that the configurations \nwith all bits equal to 1 or -1 were also used by non void pointers, whereas the \nconfiguration with all bits set to zero considerably reduced the rate of convergence. \nU sing a part of the label to solve the problem is particularly efficient, since the \npointer fields are free to take on any configuration when they are void, and this \nincreases the freedom of the system. To facilitate learning, the output activation \nof the void pointers in one epoch is used as an input activation in the next epoch. \nExperimentation showed fast convergence to different fixed points for different void \n\nFigure 2: An example of a labeled graph. \n\n\f1128 \n\nSperduti \n\npointers. For this reason, we claimed that each occurrence of the void pointer is \ndifferent, and that the nil symbol can be considered as a \"don't care\" symbol. \n\n2.2 REPRESENTATION OF THE TRAINING SET \n\nAn important question about the way a graph is represented in the training set \nis which aspects of the representation itself can make the encoding task harder \nor easier. 
In (Sperduti, 1993a) we made a theoretical analysis of the constraints imposed by the representation on the set of weights of the LRAAM, under the hypotheses of perfect learning (zero total error after learning) and linear output units. Our findings were: \n\ni) pointers to nodes belonging to the same cycle of length k and represented in the same pointer field p must be eigenvectors of the matrix (W^(p))^k, where W^(p) is the connection matrix between the hidden layer and the output units representing the pointer field p; \n\nii) confluent pointers, i.e., pointers to the same node represented in the same pointer field p (of different nodes), contribute to reducing the rank of the matrix W^(p); the actual rank is however dependent on the constraints imposed by the label field and the other pointer fields. \n\nWe have observed that different representations of the same structure can lead to very different learning performances. However, representations with roughly the same number of non-void pointers for each pointer field, with cycles represented in different pointer fields, and with confluent pointers seem to be more effective. \n\n3 ACCESS BY CONTENT \n\nRetrieval of coded information is performed in the RAAM through the pointers. All the terminals and nonterminals can be retrieved recursively starting from the pointer to the whole tree encoded in a RAAM. If direct access to a component of the tree is required, the pointer to the component must be stored and used on demand. \n\nData encoded in an LRAAM can also be accessed directly by content. In fact, an LRAAM network can be transformed into an analog Hopfield network with one hidden layer and an asymmetric connection matrix by feeding back its output into its input units. 
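The feedback construction just described can be sketched as follows. This is a hedged illustration, with a trained encoder (W1, b1) and decoder (W2, b2) assumed given; the function name, tolerance, and the convention that the label field occupies the first units of the input layer are assumptions, not from the paper.

```python
import numpy as np

# Access by content: feed the LRAAM's output back to its input, clamp the
# label field to the access key, and let the free units relax to a stable state.
sig = lambda z: 1.0 / (1.0 + np.exp(-z))

def access_by_label(W1, b1, W2, b2, key, max_iter=500, tol=1e-8, seed=0):
    N_l = len(key)
    x = np.random.default_rng(seed).uniform(0, 1, W1.shape[1])
    x[:N_l] = key                          # clamp the label field to the key
    h = sig(W1 @ x + b1)
    for _ in range(max_iter):
        y = sig(W2 @ h + b2)               # decode ...
        y[:N_l] = key                      # ... keeping clamped units fixed
        h_new = sig(W1 @ y + b1)           # re-encode (feedback step)
        if np.max(np.abs(h_new - h)) < tol:
            break                          # reached a (numerical) fixed point
        x, h = y, h_new
    return x, h                            # stable input state and hidden code
```

The returned hidden code plays the role of the retrieved pointer: it can be decoded further, exactly as a pointer obtained by recursive decoding.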
1 Experimental results have shown that there is a high correlation between the elements of W^(h) (the set of weights from the input to the hidden layer) and the corresponding elements of W^(o)T (the set of weights from the hidden to the output layer). This is particularly true for weights corresponding to units of the label field. Such a result is not a total surprise, since in the case of a static training set the error function of a linear encoder network has been proven to have a unique global minimum corresponding to the projection onto the subspace generated by the first principal vectors of a covariance matrix associated with the training set (Baldi & Hornik, 1989). This implies that the weight matrices are transposes of each other unless there is an invertible transformation between them (see also (Bourlard & Kamp, 1988)). \n\nFigure 3: The labeled graph encoded in a 16-3-16 LRAAM (5450 epochs), and the labeled tree encoded in an 18-6-18 LRAAM (1719 epochs). \n\nBecause each pattern is structured in different fields, different access procedures can be defined on the Hopfield network according to the type of access key. An access procedure is defined by: \n\n1. choosing one or more fields in the input layer according to the access key(s); \n2. clamping the output of such units to the access key(s); \n3. setting randomly the output of the remaining units in the network; \n4. letting the remaining units of the network relax into a stable state. \n\nA validation test of the reached stable state can be performed by: \n\n1. unfreezing the clamped units in the input layer; \n2. if the stable state is no longer stable, the result of the procedure is considered wrong and another run is performed; \n3. 
otherwise the stable state is considered a success. \n\nThis validation test, however, can sometimes fail to detect an erroneous retrieval (error) because of the existence of spurious stable states that share the same known information with the desired one. \n\nThe results obtained by the access procedures on an LRAAM codifying the graph and on an LRAAM codifying the tree shown in Figure 3 are reported in Table 1. For each procedure 100 trials were performed. The \"mean\" column in the table reports the mean number of iterations employed by the Hopfield network to converge. The access procedure by outgoing pointers was applied only for the tree. It can be seen from Table 1 that the performances of the access procedures were high for the graph (no errors and no wrong retrievals), but not so good for the tree, in particular for the access by label procedure, due to spurious memories. It is interesting to note that the access by label procedure is very efficient for the leaves of the tree. This feature can be used to build a system with two identical networks, one accessed by pointer and the other by content. The search for a label proceeds simultaneously in the two networks. The network accessed by pointer will be very fast to respond when the label is located on a node at the lower levels of the tree, and the network accessed by content will be able to respond correctly and very fast (2) when the label is located on a node at the higher levels of the tree. \n\n2 Assuming an analog implementation of the Hopfield network. 
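The validation test described above can be sketched directly. This is an illustrative fragment under the same assumptions as before (trained weights W1, b1, W2, b2 given); the step count and tolerance are assumptions.

```python
import numpy as np

# Validation of a retrieved state x: unfreeze the clamped units and accept the
# retrieval only if x remains a (numerical) fixed point of the unconstrained
# feedback dynamics.
sig = lambda z: 1.0 / (1.0 + np.exp(-z))

def validate(W1, b1, W2, b2, x, steps=50, tol=1e-4):
    y = x.copy()
    for _ in range(steps):                 # run the net with nothing clamped
        y = sig(W2 @ sig(W1 @ y + b1) + b2)
    return bool(np.max(np.abs(y - x)) < tol)   # still stable -> success
```

As the text notes, a state can pass this test and still be a spurious memory that happens to agree with the key, so passing is necessary but not sufficient for correctness.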
\n\n\f1130 \n\nSperduti \n\nkey(s) \n\n(d1 , d 2) \n(d3,d4) \n(d5, d6) \n(d7 , ds) \n(d9, d lO ) \n\nd~~) \n\n(d12 ,d13 ) \nld14, d 15 ) \n(*) one pointer \n\n49% \n10% \n40% \n78% \n9% \n14% \n14% \n28% \n\nGRAPH: Access by Label \n\nsuccess wrong \n100% \n100% \n100% \n100% \n100% \n100% \n100% \n\nio \ni1 \ni2 \ni3 \ni4 \n15 \ni6 \nTREE: Access by Children Pointers \n\nerror mean \n7.35 \n0% \n36.05 \n0% \n6.04 \n0% \n0% \n3.99 \n23.12 \n0% \n18.12 \n0% \n29.26 \n0% \n\n0% \n0% \n0% \n0% \n0% \n0% \n0% \n\n51% \n90% \n60% \n22% \n91% \n86% \n86% \n72% \n\n0% \n0% \n0% \n0% \n0% \n0% \n0% \n0% \n\n6.29 \n8.55 \n12.48 \n6.57 \n6.22 \n14.01 \n7.87 \n6.07 \n\nTREE: Access by Label \nsuccess wrong \n100% \n6% \n53% \n0% \n0% \n0% \n51% \n58% \n43% \n0% \n0% \n0% \n0% \n71% \n0% \n0% \n\nerror mean \n16.48 \n0% \n14.57 \n0% \n16.92 \n0% \n18.07 \n0% \n32.64 \n3% \n16.03 \n0% \n27.50 \n0% \n27.10 \n0% \n62.45 \n0% \n80% \n14.75 \n19.11 \n0% \n10.83 \n0% \n19.12 \n0% \n23.87 \n0% \n12.09 \n0% \n13.11 \n0% \n\n0% \n94% \n47% \n100% \n97% \n100% \n49% \n42% \n57% \n20% \n100% \n100% \n100% \n29% \n100% \n100% \n\nkey \nio \nit \ni2 \ni3 \ni4 \n15 \n16 \ni7 \nis \ni9 \n11O \nIII \nlt2 \nit3 \n114 \n115 \n\nTable 1: Results obtained by the access procedures. \n\n4 STABILITY RESULTS \n\nIn the LRAAM model two stability problems are encountered. The first one arises \nwhen considering the decoding of a pointer along a cycle of the encoded structures. \nSince the decoding process suffers, in general, of approximation errors, it may hap(cid:173)\npen that the decoding diverges from the correct representations of the pointers \nbelonging to the cycle. Thus, it is fundamental to discover under which conditions \nthe representations obtained for the pointers are asymptotically stable with respect \nto the pointer transformation. In fact, if the representations are asymptotically \nstable, the errors introduced by the decoding function are automatically corrected. 
\nThe following theorem can be proven (Sperduti, 1993b): \n\nTheorem 1 A decoding sequence \n\nl(i;+I) = F(p';)(l(iJ\u00bb), \n\nj = 0, .. . ,L \n\nwith l(iL+d = l(t o) , satisfying \n\nm L Ibikl < 1, \n\nk=l \n\ni = 1, ... ,m \n\n(1) \n\n(2) \n\nfor some index Pi'l' q = 0, ... , L, is asymptotically stable, where btk is the (i, k) th \nelement of a matrix B, given by \nB = J(P\"I) (l( i'l) )J(P\"I-l ) (l( i'l_ J)) ... J(p'{J) (l( io) )J(p, L \\ l(iL\u00bb) ... J(P\"I+l ) (d (i'l+d). \n\nIn the statement of the theorem, F(p;) (l) = F(D(p; )l+~;\u00bb) is the transformation \nof the reduced descriptor (pointer) d by the pointer field Pj, and J(pJ)(l) is its \n\n\fEncoding Labeled Graphs by Labeling RAAM \n\n1131 \n\nJacobian matrix. As a corollary of this theorem we have that if at least one pointer \nbelonging to the cycle has saturated components, then the cycle is asymptotically \nstable with respect to the decoding process. Moreover, the theorem can be applied \nwith a few modifications to the stability analysis of the fixed points of the associated \nHopfield network. \n\nThe second stability problem consists into the discovering of sufficient conditions \nunder which the property of asymptotical stability of a fixed point in one particular \nconstrained version of the associated Hopfield network, i.e., an access procedure, \ncan be extended to related fixed points of different constrained versions of it, i.e., \naccess procedures with more information or different information. The result of \nTheorem 1 was used to derive three theorems regarding this issue (see (Sperduti, \n1993b) ). \n\n5 DISCUSSION AND CONCLUSIONS \n\nThe LRAAM model can be seen from various perspectives. It can be considered as \nan extension of the RAAM model, which allows one to encode not only trees with \ninformation on the leaves, but also labeled graphs with cycles. On the other hand, \nit can be seen as an approximate method to build analog Hopfield networks with \na hidden layer. 
An LRAAM is probably somewhere in between. In fact, although it extends the representational capabilities of the RAAM model, it does not possess the same synthetic capabilities as the RAAM, since it explicitly uses the concept of pointer. Different subsets of units are thus used to codify labels and pointers. In the RAAM model, using the same set of units to codify labels and reduced representations is a more natural way of integrating a previously developed reduced descriptor as a component of a new structure. In fact, this ability was Pollack's original rationale behind the RAAM model, since with this ability it is possible to fill a linguistic role with the reduced descriptor of a complex sentence. In the LRAAM model the same target can be reached, but less naturally. There are two possible solutions. One is to store the pointer of some complex sentence (or structure, in general), which was previously developed, in the label of a new structure. The other solution would be to have a particular label value which tells us that the information we are looking for can be retrieved using one conventional or particular pointer among the current ones. \n\nAn issue strictly correlated with this is that, even if in an LRAAM it is possible to encode a cycle, what we get from the LRAAM is not an explicit reduced representation of the cycle, but several pointers to the components of the cycle, forged in such a way that the information on the cycle is only represented implicitly in each of them. However, the ability to synthesize reduced descriptors for structures with cycles is what makes the difference between the LRAAM and the RAAM. The only system that we know of which is able to represent labeled graphs is the DUAL system proposed by Dyer (Dyer, 1991). It is able to encode small labeled graphs representing relationships among entities. 
However, the DUAL system cannot be considered as being on the same level as the LRAAM, since it devises a reduced representation of a set of functions relating the components of the graph rather than a reduced representation of the graph itself. Potentially, Holographic Reduced Representations (Plate, 1991) are also able to encode cyclic graphs. \n\nThe LRAAM model can also be seen as an extension of the Hopfield network philosophy. A relevant aspect of the use of the Hopfield network associated with an LRAAM is that the access procedures defined on it can efficiently exploit subsets of the weights. In fact, their use corresponds to generating several smaller networks from a large network, one for each kind of access procedure, each specialized on a particular feature of the stored data. Thus, by training a single network, we get several useful smaller networks. \n\nIn conclusion, an LRAAM has several advantages over a standard RAAM. Firstly, it is more powerful, since it allows one to encode directed graphs where each node has a bounded number of outgoing arcs. Secondly, an LRAAM allows direct access to the components of the encoded structure not only by pointer, but also by content. Concerning the applications where LRAAMs can be exploited, we believe there are at least three possibilities: in knowledge representation, by encoding Conceptual Graphs (Sowa, 1984); in unification, by representing terms in restricted domains (Knight, 1989); and in image coding, by storing Quadtrees (Samet, 1984). \n\nReferences \n\nP. Baldi & K. Hornik. (1989) Neural networks and principal component analysis: Learning from examples without local minima. Neural Networks, 2:53-58. \n\nH. Bourlard & Y. Kamp. (1988) Auto-association by multilayer perceptrons and singular value decomposition. Biological Cybernetics, 59:291-294. \n\nM. G. Dyer. 
(1991) Symbolic NeuroEngineering for Natural Language Processing: A Multilevel Research Approach, volume 1 of Advances in Connectionist and Neural Computation Theory, pages 32-86. Ablex. \n\nG. E. Hinton. (1990) Mapping part-whole hierarchies into connectionist networks. Artificial Intelligence, 46:47-75. \n\nK. Knight. (1989) Unification: A multidisciplinary survey. ACM Computing Surveys, 21:93-124. \n\nT. Plate. (1991) Holographic reduced representations. Technical Report CRG-TR-91-1, Department of Computer Science, University of Toronto. \n\nJ. B. Pollack. (1990) Recursive distributed representations. Artificial Intelligence, 46(1-2):77-106. \n\nH. Samet. (1984) The quadtree and related hierarchical data structures. ACM Computing Surveys, 16:187-260. \n\nP. Smolensky. (1990) Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artificial Intelligence, 46:159-216. \n\nJ. F. Sowa. (1984) Conceptual Structures: Information Processing in Mind and Machine. Addison-Wesley. \n\nA. Sperduti. (1993a) Labeling RAAM. TR 93-029, ICSI, Berkeley. \n\nA. Sperduti. (1993b) On some stability properties of the LRAAM model. TR 93-031, ICSI, Berkeley. \n\nD. S. Touretzky. (1990) Boltzcons: Dynamic symbol structures in a connectionist network. Artificial Intelligence, 46:5-46. ", "award": [], "sourceid": 860, "authors": [{"given_name": "Alessandro", "family_name": "Sperduti", "institution": null}]}