{"title": "Introduction to a System for Implementing Neural Net Connections on SIMD Architectures", "book": "Neural Information Processing Systems", "page_first": 804, "page_last": 813, "abstract": null, "full_text": "INTRODUCTION TO A SYSTEM FOR IMPLEMENTING NEURAL NET CONNECTIONS ON SIMD ARCHITECTURES \n\nSherryl Tomboulian \n\nInstitute for Computer Applications in Science and Engineering \n\nNASA Langley Research Center, Hampton VA 23665 \n\nABSTRACT \n\nNeural networks have attracted much interest recently, and using parallel architectures to simulate neural networks is a natural and necessary application. The SIMD model of parallel computation is chosen because systems of this type can be built with large numbers of processing elements. However, such systems are not naturally suited to generalized communication. A method is proposed that allows an implementation of neural network connections on massively parallel SIMD architectures. The key to this system is an algorithm that allows the formation of arbitrary connections between the \"neurons\". A feature is the ability to add new connections quickly. It also has error recovery ability and is robust over a variety of network topologies. Simulations of the general connection system, and its implementation on the Connection Machine, indicate that the time and space requirements are proportional to the product of the average number of connections per neuron and the diameter of the interconnection network. \n\nINTRODUCTION \n\nNeural networks hold great promise for biological research, artificial intelligence, and even as general computational devices. However, to study systems in a realistic manner, it is highly desirable to be able to simulate a network with tens of thousands or hundreds of thousands of neurons. This suggests the use of parallel hardware. 
The most natural method of exploiting parallelism would have each processor simulating a single neuron. \n\nConsider the requirements of such a system. There should be a very large number of processing elements which can work in parallel. The computation that occurs at these elements is simple and based on local data. The processing elements must be able to have connections to other elements. All connections in the system must be able to be traversed in parallel. Connections must be added and deleted dynamically. \n\nGiven current technology, the only type of parallel model that can be constructed with tens of thousands or hundreds of thousands of processors is an SIMD architecture. In exchange for being able to build a system with so many processors, there are some inherent limitations. SIMD stands for single instruction, multiple data [1], which means that all processors can work in parallel, but they must do exactly the same thing at the same time. This machine model is sufficient for the computation required within a neuron; however, in such a system it is difficult to implement arbitrary connections between neurons. The Connection Machine [2] provides such a model, but uses a device called the router to deliver messages. The router is a complex piece of hardware that uses significant chip area, and without the additional hardware for the router, a machine could be built with significantly more processors. Since one of the objectives is to maximize the number of \"neurons\", it is desirable to eliminate the extra cost of a hardware router and instead use a software method. \n\nThis work was supported by the National Aeronautics and Space Administration under NASA Contract No. NAS1-18010-7 while the author was in residence at ICASE. \n\n\u00a9 American Institute of Physics 1988 
\n\nExisting software algorithms for forming connections on SIMD machines are not sufficient for the requirements of neural networks. They restrict the form of graph (neural network) that can be embedded to permutations [3,4] or sorts [5,6] combined with [7], the methods are network specific, and adding a new connection is highly time consuming. \n\nThe software routing method presented here is a unique algorithm which allows arbitrary neural networks to be embedded in machines with a wide variety of network topologies. The advantages of such an approach are numerous: a new connection can be added dynamically in the same amount of time that it takes to perform a parallel traversal of all connections. The method has error recovery ability in case of network failures. This method has relationships with natural neural models. When a new connection is to be formed, the two neurons being connected are activated, and then the system forms the connection without any knowledge of the \"address\" of the neuron-processors and without any instruction as to the method of forming the connecting path. The connections are entirely distributed; a processor only knows that connections pass through it - it doesn't know a connection's origin or final destination. \n\nSome neural network applications have been implemented on massively parallel architectures, but they have run into restrictions due to communication. An implementation on the Connection Machine [8] discovered that it was more desirable to cluster processors in groups, and have each processor in a group represent one connection, rather than having one processor per neuron, because the router is designed to deliver one message at a time from each processor. This approach is contrary to the more natural paradigm of having one processor represent a neuron. 
The MPP [9], a massively parallel architecture with processors arranged in a mesh, has been used to implement neural nets [10], but because of a lack of generalized communication software, the method for edge connections is a regular communication pattern with all neurons within a specified distance. This is not an unreasonable approach, since within the brain neurons are usually locally connected, but there is also a need for longer connections between groups of neurons. The algorithms presented here can be used on both machines to facilitate arbitrary connections with an irregular number of connections at each processor. \n\nMACHINE MODEL \n\nAs mentioned previously, since we desire to build a system with a large number of processing elements, the only technology currently available for building such large systems is the SIMD architecture model. In the SIMD model there is a single control unit and a very large number of slave processors that can execute the same instruction stream simultaneously. It is possible to disable some processors so that only some execute an instruction, but it is not possible to have two processors performing different instructions at the same time. The processors have exclusively local memory which is small (only a few thousand bits), and they have no facilities for local indirect addressing. In this scheme an instruction involves both a particular operation code and the local memory address. All processors must do the same thing to the same areas of their local memory at the same time. \n\nThe basic model of computation is bit-serial - each instruction operates on a bit at a time. To perform multiple bit operations, such as integer addition, requires several instructions. 
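The bit-serial style of computation can be illustrated with a short sketch (mine, not from the paper): each loop iteration plays the role of one single-bit instruction, the way a one-bit ALU would process the operands.

```python
def bit_serial_add(a_bits, b_bits):
    """Add two little-endian bit lists one bit at a time; each loop
    iteration corresponds to one single-bit machine instruction."""
    carry = 0
    out = []
    for a, b in zip(a_bits, b_bits):
        s = a ^ b ^ carry                      # sum bit
        carry = (a & b) | (carry & (a ^ b))    # carry out
        out.append(s)
    out.append(carry)                          # final carry extends the result
    return out

def to_bits(n, width):
    return [(n >> i) & 1 for i in range(width)]

def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))

# An 8-bit addition takes 8 single-bit steps rather than one word operation.
print(from_bits(bit_serial_add(to_bits(100, 8), to_bits(27, 8))))  # 127
```

This is why the time for arithmetic grows with the operand width, as noted below for integer and floating point neural net models.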
This model is chosen because it requires less hardware logic, and so would allow a machine to be built with a larger number of processors than could otherwise be achieved with a standard word-oriented approach. Of course, the algorithms presented here will also work for machines with more complex instruction abilities; the machine model described satisfies the minimal requirements. \n\nAn important requirement for connection formation is that the processors are connected in some topology. For instance, the processors might be connected in a grid so that each processor has a North, South, East, and West neighbor. The methods presented here work for a wide variety of network topologies. The requirements are: (1) there must be some path between any two processors; (2) every neighbor link must be bi-directional, i.e. if A is a neighbor of B, then B must be a neighbor of A; (3) the neighbor relations between processors must have a consistent invertible labeling. A more precise definition of the labeling requirements can be found in [11]. It suffices that most networks [12], including grid, hypercube, cube connected cycles [13], shuffle exchange [14], and mesh of trees [15], are admissible under the scheme. Additional requirements are that the processors be able to read from or write to their neighbors' memories, and that at least one of the processors acts as a serial port between the processors and the controller. \n\nCOMPUTATIONAL REQUIREMENTS \n\nThe machine model described here is sufficient for the computational requirements of a neuron. Adopt the paradigm that each processor represents one neuron. While several different models of neural networks exist with slightly different features, they are all fairly well characterized by computing a sum or product of the neighbors' values, and if a certain threshold is exceeded, then the processor neuron will fire, i.e. activate other neurons. 
The machine model described here is more efficient at boolean computation, such as described by McCulloch and Pitts [16], since it is bit serial. Neural net models using integers and floating point arithmetic [17,18] will also work, but will be somewhat slower since the time for computation is proportional to the number of bits of the operands. \n\nThe only computational difficulty lies in the fact that the system is SIMD, which means that the processes are synchronous. For some neural net models this is sufficient [18]; however, others require asynchronous behavior [17]. This can easily be achieved simply by turning the processors on and off based on a specified probability distribution. (For a survey of some different neural networks see [19].) \n\nCONNECTION ASSUMPTIONS \n\nMany models of neural networks assume fully connected systems. This model is considered unrealistic, and the method presented here will work better for models that contain more sparsely connected systems. While the method will work for dense connections, the time and space required is proportional to the number of edges, and becomes prohibitively expensive. \n\nOther than the sparseness assumption, there are no restrictions on the topological form of the network being simulated. For example, multiple layered systems, slightly irregular structures, and completely random connections are all handled easily. The system does function better if there is locality in the neural network. These assumptions seem to fit the biological model of neurons. \n\nTHE CONNECTION FORMATION METHOD \n\nA fundamental part of a neural network implementation is the realization of the connections between neurons. This is done using a software scheme first presented in [11,20]. The original method was intended for realizing directed graphs in SIMD architectures. 
Since a neural network is a graph with the neurons being vertices and the connections being arcs, the method maps perfectly to this system. Henceforth the terms neuron and vertex, and the terms arc and connection, will be used interchangeably. \n\nThe software system presented here for implementing the connections has several parts. Each processor will be assigned exactly one neuron. (Of course some processors may be \"free\" or unallocated, but even \"free\" processors participate in the routing process.) Each connection will be realized as a path in the topology of processors. A labeling of these paths in time and space is introduced which allows efficient routing algorithms, and a set-up strategy is introduced that allows new connections to be added quickly. \n\nThe standard computer science approach to forming the connection would be to store the addresses of the processors to which a given neuron is connected. Then, using a routing algorithm, messages could be passed to the processors with the specified destination. However, the SIMD architecture does not lend itself to standard message passing schemes because processors cannot do indirect addressing, so buffering of values is difficult and costly. \n\nInstead, a scheme is introduced which is closer to the natural neuron-synapse structures. Instead of having an address for each connection, the connection is actually represented as a fixed path between the processors, using time as a virtual dimension. The path a connection takes through the network of processors is statically encoded in the local memories of the neurons that it passes through. To achieve this, the following data structures will be resident at each processor. \n\nALLOCATED ------ boolean flag indicating whether this processor is assigned a vertex (neuron) in the graph \nVERTEX LABEL --- label of graph vertex (neuron) \nHAS_NEIGHBOR[1 .. neighbor_limit] --- flags indicating the existence of neighbors \nSLOTS[1 .. T] --- arc path information: \n    START ------ new arc starts here \n    DIRECTION -- direction to send {1 .. neighbor_limit, FREE} \n    END -------- arc ends here \n    ARC LABEL -- label of arc \n\nThe ALLOCATED and VERTEX LABEL fields indicate that the processor has been assigned a vertex in the graph (neuron). The HAS_NEIGHBOR field is used to indicate whether a physical wire exists in the particular direction; it allows irregular network topologies and boundary conditions to be supported. The SLOTS data structure is the key to realizing the connections. It is used to instruct the processor where to send a message and to ensure that paths are constructed in such a way that no collisions will occur. \n\nSLOTS is an array with T elements. The value T is called the time quantum. Traversing all the edges of the embedded graph in parallel will take a certain amount of time since messages must be passed along through a sequence of neighboring processors. Forming these parallel connections will be considered an uninterruptible operation which will take T steps. The SLOTS array is used to tell the processors what they should do on each relative time position within the time quantum. \n\nOne of the characteristics of this algorithm is that a fixed path is chosen to represent the connection between two processors, and once chosen it is never changed. For example, consider the grid below. \n\n  |  |  |  |  | \n--A--B--C--D--E-- \n  |  |  |  |  | \n--F--G--H--I--J-- \n  |  |  |  |  | \n\nFig. 1. Grid Example \n\nIf there is an arc between A and H, there are several possible paths: East-East-South, East-South-East, and South-East-East. Only one of these paths will be chosen between A and H, and that same path will always be used. Besides being invariant in space, paths are also invariant in time. 
As stated above, traversal is done within a time quantum T. Paths do not have to start at time 1, but can be scheduled to start at some relative offset within the time quantum. Once the starting time for the path has been fixed, it is never changed. Another requirement is that a message cannot be buffered; it must proceed along the specified directions without interruption. For example, if the path is of length 3 and it starts at time 1, then it will arrive at time 4. Alternatively, if it starts at time 2 it will arrive at time 5. Further, it is necessary to place the paths so that no collisions occur; that is, no two paths can be at the same processor at the same instant in time. Essentially time adds an extra dimension to the topology of the network, and within this space-time network all data paths must be non-conflicting. The rules for constructing paths that fulfill these requirements are listed below. \n\n\u2022 At most one connection can enter a processor at a given time, and at most one connection can leave a processor at a given time. It is possible to have both one coming and one going at the same time. Note that this does not mean that a processor can have only one connection; it means that it can have only one connection during any one of the T time steps. It can have as many as T connections going through it. \n\n\u2022 Any path between two processors (u,v) representing a connection must consist of steps at contiguous times. For example, if the path from processor u to processor v is u,f,g,h,v, then if the arc from u-f is assigned time 1, f-g must have time 2, g-h time 3, and h-v time 4. Likewise if u-f occurs at time 5, then arc h-v will occur at time 8. \n\nWhen these rules are followed when forming paths, the SLOTS structure can be used to mark the paths. Each path goes through neighboring processors at successive time steps. 
For each of these time steps the DIRECTION field of the SLOTS structure is marked, telling the processor in which direction it should pass a message if it receives one at that time. SLOTS serves both to instruct the processors how to send messages, and to indicate that a processor is busy at a certain time slot so that when new paths are constructed it can be guaranteed that they won't conflict with current paths. \n\nConsider the following example. Suppose we are given the directed graph with vertices A,B,C,D and edges A->C, B->C, B->D, and D->A. This is to be done where A,B,C, and D have been assigned to successive elements of a linear array. (A linear array is not a good network for this scheme, but is a convenient source of examples.) \n\nFig. 2. Graph Example: logical connections A->C, B->C, B->D, D->A \n\nA,B,C,D are successive members in a linear array \n\n1---2---3---4 \nA---B---C---D \n\nFirst, A->C can be completed with the map East-East, so Slots[A][1].direction = E, Slots[B][2].direction = E, Slots[C][2].end = 1. \n\nB->C can be done with the map East; it can start at time 1, since Slots[B][1].direction and Slots[C][1].end are free. \n\nB->D goes through C then to D; its map is East-East. B is occupied at times 1 and 2. It is free at time 3, so Slots[B][3].direction = E, Slots[C][4].direction = E, Slots[D][4].end = 1. \n\nD->A must go through C,B,A, using map West-West-West. D is free at time 1, C is free at time 2, but B is occupied at time 3. D is free at time 2, but C is occupied at time 3. It can start from D at time 3: Slots[D][3].direction = W, Slots[C][4].direction = W, Slots[B][5].direction = W, Slots[A][5].end = 1. \n\nEvery processor acts as a conduit for its neighbors' messages. No processor knows where any message is going to or coming from, but each processor knows what it must do to establish the local connections. 
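The slot bookkeeping in the example above can be sketched in a few lines of Python. This is my own simplified model, not the paper's code: it tracks only the outgoing DIRECTION entries, places the first three arcs exactly as in the text, and shows the collision check rejecting a trial placement of D->A at time 1 (its third hop would need B at time 3, which B->D already holds).

```python
T = 5
FREE = None
# Per-processor SLOTS tables for the linear array 1---2---3---4 = A---B---C---D.
slots = {p: {"direction": [FREE] * (T + 1),   # indices 1..T are used
             "start": [0] * (T + 1),
             "end": [0] * (T + 1)} for p in "ABCD"}

def place(hops, dest, start_time):
    """Reserve contiguous DIRECTION slots for one arc-path.
    hops: list of (processor, direction) pairs; dest: final processor."""
    t = start_time
    for proc, _ in hops:                       # check every hop before committing
        if slots[proc]["direction"][t] is not FREE:
            raise ValueError(f"{proc} occupied at time {t}")
        t += 1
    t = start_time
    slots[hops[0][0]]["start"][t] = 1          # a new arc starts here
    for proc, d in hops:
        slots[proc]["direction"][t] = d
        t += 1
    slots[dest]["end"][t - 1] = 1              # arc ends on its last step

place([("A", "E"), ("B", "E")], "C", 1)        # A->C: East-East from time 1
place([("B", "E")], "C", 1)                    # B->C: East at time 1
place([("B", "E"), ("C", "E")], "D", 3)        # B->D: East-East from time 3

# A trial placement of D->A starting at time 1 collides, as in the text.
try:
    place([("D", "W"), ("C", "W"), ("B", "W")], "A", 1)
except ValueError as e:
    print(e)   # B occupied at time 3
```

A fuller model would also track incoming messages and END slots, since the paper's rules limit both entries and exits per time step.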
\n\nThe use of contiguous time slots is vital to the correct operation of the system. If all edge-paths are established according to the above rules, there is a simple method for making the connections. The paths have been restricted so that there will be no collisions, and paths' directions use consecutive time slots. Hence if all arcs at time i send a message to their neighbors, then each processor is guaranteed no more than 1 message coming to it. The end of a path is specified by setting a separate bit that is tested after each message is received. A separate start bit indicates when a path starts. The start bit is needed because the SLOTS array just tells the processors where to send a message, regardless of how that message arrived. The start array indicates when a message originates, as opposed to arriving from a neighbor. \n\nThe following algorithm is basic to the routing system. \n\nfor i = 1 to T \n    FORALL processors \n        /* if an arc starts or is passing through at this time */ \n        if SLOT[i].START = 1 or active = 1 \n            for j = 1 to neighbor_limit \n                if SLOT[i].direction = j \n                    write message bit to in-box of neighbor j; \n            set active = 0; \n    FORALL processors that just received a message \n        if end[i] \n            move in-box to message-destination; \n        else \n            move in-box to out-box; \n            set active bit = 1; \n\nThis code follows the method mentioned above. The time slots are looped through and the messages are passed in the appropriate directions as specified in the SLOTS array. Two bits, in-box and out-box, are used for message passing so that an out-going message won't be overwritten by an in-coming message before it gets transferred. The inner loop for j = 1 to neighbor_limit checks each of the possible neighbor directions and sends the message to the correct neighbor. For instance, in a grid the neighbor limit is 4, for North, South, East, and West neighbors. 
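The traversal loop above can be simulated serially. The following sketch (my own, under the assumptions of the worked example: linear array A-B-C-D, T = 5, slot entries for the arcs A->C, B->C and B->D as placed in the text) plays each time slot in lockstep and records where messages are delivered.

```python
T = 5
procs = ["A", "B", "C", "D"]
east = {"A": "B", "B": "C", "C": "D"}
west = {v: k for k, v in east.items()}
step = {"E": east, "W": west}

direction = {p: [None] * (T + 1) for p in procs}  # indices 1..T are used
start = {p: [0] * (T + 1) for p in procs}
end = {p: [0] * (T + 1) for p in procs}

# Slot entries exactly as placed in the worked example.
start["A"][1] = 1; direction["A"][1] = "E"; direction["B"][2] = "E"; end["C"][2] = 1
start["B"][1] = 1; direction["B"][1] = "E"; end["C"][1] = 1
start["B"][3] = 1; direction["B"][3] = "E"; direction["C"][4] = "E"; end["D"][4] = 1

out_box = {p: None for p in procs}   # message held for forwarding next slot
delivered = []

for t in range(1, T + 1):
    in_box = {p: None for p in procs}
    # All processors act in lockstep: send if an arc starts here or is active.
    for p in procs:
        d = direction[p][t]
        if d is not None and (start[p][t] or out_box[p] is not None):
            # Messages are tagged with their origin only for this demo;
            # a real processor never knows where a message came from.
            payload = p if start[p][t] else out_box[p]
            in_box[step[d][p]] = payload
            out_box[p] = None
    # Receivers either deliver (END set) or queue for the next slot.
    for p in procs:
        if in_box[p] is not None:
            if end[p][t]:
                delivered.append((in_box[p], p, t))
            else:
                out_box[p] = in_box[p]

print(delivered)  # [('B', 'C', 1), ('A', 'C', 2), ('B', 'D', 4)]
```

Note how the separate in-box and out-box keep a message received at time t from overwriting one still waiting to be forwarded, just as the two bits do in the algorithm above.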
The time complexity of data movement is O(T x neighbor_limit). \n\nSETTING UP CONNECTIONS \n\nOne of the goals in developing this system was to have a method for adding new connections quickly. Paths are added so that they don't conflict with any previously constructed path. Once a path is placed it will not be re-routed by the basic placement algorithm; it will always start at the same spot at the same time. The basic idea of the method for placing a connection is to start from the source processor and in parallel examine all possible paths outward from it that do not conflict with pre-established paths and which adhere to the sequential time constraint. As the trial paths are flooding the system, they are recorded in temporary storage. At the end of this deluge of trial paths all possible paths will have been examined. If the destination processor has been reached, then a path exists under the current time-space restrictions. Using the stored information a path can be backtraced and recorded in the SLOTS structure. This is similar to the Lee-Moore routing algorithm [21,22] for finding a path in a system, but with the sequential time restriction. \n\nFor example, suppose that the connection (u,v) is to be added. First it is assumed that processors for u and v have already been determined; otherwise (as a simplification) assume a random allocation from a pool of free processors. A parallel breadth-first search will be performed starting from the source processor. During the propagation phase a processor which receives a message checks its SLOTS array to see if it is busy on that time step; if not, it will propagate to its neighbors on the next time step. For instance, suppose a trial path starts at time 1 and moves to a neighboring processor, but that neighbor is already busy at time 1 (as can be seen by examining the DIRECTION slot). 
Since a path that would go through this neighbor at this time is not legal, the trial path would commit suicide, that is, it stops propagating itself. If the processor slot for time 2 was free, the trial path would attempt to propagate to all of its neighbors at time 3. \n\nUsing this technique paths can be constructed with essentially no knowledge of the relative locations of the \"neurons\" being connected or the underlying topology. Variations on the outlined method, such as choosing the shortest path, can improve the choice of paths with very little overhead. If the entire network were known ahead of time, an off-line method could be used to construct the paths more efficiently; work on off-line methods is underway. However, the simple elegance of this basic method holds great appeal for systems that change slowly over time in unpredictable ways. \n\nPERFORMANCE \n\nAdding an edge (assuming one can be added), deleting any set of edges, or traversing all the edges in parallel, all have time complexity O(T x neighbor_limit). If it is assumed that neighbor_limit is a small constant then the complexity is O(T). Since T is related both to the time and space needed, it is a crucial factor in determining the value of the algorithms presented. Some analytic bounds on T were presented in [11], but it is difficult to get a tight bound on T for general interconnection networks and dynamically changing graphs. A simulator was constructed to examine the behavior of the algorithms. Besides the simulated data, the algorithms mentioned were actually implemented for the Connection Machine. The data produced by the simulator is consistent with that produced by the real machine. 
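The set-up search described above can be rendered serially as a breadth-first search over (processor, time) states. This sketch is my own simplification, not the paper's implementation: `busy` stands in for "the relevant SLOTS entry is occupied" and is just a set of (processor, time) pairs, and trial paths that hit a busy slot simply stop propagating.

```python
from collections import deque

def find_path(src, dst, neighbors, busy, T):
    """Return a list of (processor, time) hops from src to dst using
    contiguous free time slots, or None if no placement exists."""
    parent = {}
    queue = deque()
    for t0 in range(1, T + 1):          # trial paths may start at any offset
        if (src, t0) not in busy:
            parent[(src, t0)] = None
            queue.append((src, t0))
    while queue:
        p, t = queue.popleft()
        if p == dst:                    # destination reached: backtrace trail
            path, state = [], (p, t)
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        if t + 1 > T:
            continue                    # this trial path "commits suicide"
        for q in neighbors[p]:
            if (q, t + 1) not in busy and (q, t + 1) not in parent:
                parent[(q, t + 1)] = (p, t)
                queue.append((q, t + 1))
    return None

# Linear array A-B-C-D, with B's slot at time 3 already taken (as in the
# worked example): the search routes around the conflict in time, not space.
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
busy = {("B", 3)}
print(find_path("D", "A", neighbors, busy, T=8))
# -> [('D', 2), ('C', 3), ('B', 4), ('A', 5)]
```

The real system performs all these probes simultaneously, one wavefront per time step, and the `parent` trail corresponds to the temporary storage that the backtrace reads when recording the path into SLOTS.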
The major result is that the size of T appears proportional to the average degree of the graph times the diameter of the interconnection network [20]. \n\nFURTHER RESEARCH \n\nThis paper has been largely concerned with a system that can realize the connections in a neural network when the two neurons to be joined have been activated. The tests conducted have been concerned with the validity of the method for implementing connections, rather than with a full simulation of a neural network. Clearly this is the next step. \n\nA natural extension of this method is a system which can form its own connections based solely on the activity of certain neurons, without having to explicitly activate the source and destination neurons. This is an exciting avenue, and further results should be forthcoming. \n\nAnother area of research involves the formation of branching paths. The current method takes an arc in the neural network and realizes it as a unique path in space-time. A variation that has similarities to dendritic structure would allow a path coming from a neuron to branch and go to several target neurons. This extension would allow for a much more economical embedding system. Simulations are currently underway. \n\nCONCLUSIONS \n\nA method has been outlined which allows the implementation of neural net connections on a class of parallel architectures which can be constructed with very large numbers of processing elements. To economize on hardware so as to maximize the number of processing elements buildable, it was assumed that the processors only have local connections; no dedicated hardware is provided for general communication. Some simple algorithms have been presented which allow neural nets with arbitrary connections to be embedded in SIMD architectures having a variety of topologies. 
The time for performing a parallel traversal and for adding a new connection appears to be proportional to the diameter of the topology times the average number of arcs in the graph being embedded. In a system where the topology has diameter O(log N), and where the degree of the graph being embedded is bounded by a constant, the time is apparently O(log N). This makes it competitive with existing methods for SIMD routing, with the advantages that there are no a priori requirements for the form of the data, and the topological requirements are extremely general. Also, with our approach new arcs can be added without reconfiguring the entire system. The simplicity of the implementation and the flexibility of the method suggest that it could be an important tool for using SIMD architectures for neural network simulation. \n\nBIBLIOGRAPHY \n\n1. M.J. Flynn, \"Some computer organizations and their effectiveness\", IEEE Trans. Comput., Vol C-21, No 9, pp. 948-960. \n2. W. Hillis, \"The Connection Machine\", MIT Press, Cambridge, Mass, 1985. \n3. D. Nassimi, S. Sahni, \"Parallel Algorithms to Set-up the Benes Permutation Network\", Proc. Workshop on Interconnection Networks for Parallel and Distributed Processing, April 1980. \n4. D. Nassimi, S. Sahni, \"Benes Network and Parallel Permutation Algorithms\", IEEE Transactions on Computers, Vol C-30, No 5, May 1981. \n5. D. Nassimi, S. Sahni, \"Parallel Permutation and Sorting Algorithms and a New Generalized Connection Network\", JACM, Vol 29, No 3, July 1982, pp. 642-667. \n6. K.E. Batcher, \"Sorting Networks and their Applications\", Proceedings of AFIPS 1968 SJCC, 1968, pp. 307-314. \n7. C. Thompson, \"Generalized connection networks for parallel processor intercommunication\", IEEE Trans. Computers, Vol C-27, Dec 1978, pp. 1119-1125. \n8. Nathan H. 
Brown, Jr., \"Neural Network Implementation Approaches for the Connection Machine\", presented at the 1987 conference on Neural Information Processing Systems - Natural and Synthetic. \n9. K.E. Batcher, \"Design of a massively parallel processor\", IEEE Trans. on Computers, Sept 1980, pp. 836-840. \n10. H.M. Hastings, S. Waner, \"Neural Nets on the MPP\", Frontiers of Massively Parallel Scientific Computation, NASA Conference Publication 2478, NASA Goddard Space Flight Center, Greenbelt, Maryland, 1986. \n11. S. Tomboulian, \"A System for Routing Arbitrary Communication Graphs on SIMD Architectures\", Doctoral Dissertation, Dept of Computer Science, Duke University, Durham, NC. \n12. T. Feng, \"A Survey of Interconnection Networks\", Computer, Dec 1981, pp. 12-27. \n13. F. Preparata and J. Vuillemin, \"The Cube Connected Cycles: a Versatile Network for Parallel Computation\", Comm. ACM, Vol 24, No 5, May 1981, pp. 300-309. \n14. H. Stone, \"Parallel processing with the perfect shuffle\", IEEE Trans. Computers, Vol C-20, Feb 1971, pp. 153-161. \n15. T. Leighton, \"Parallel Computation Using Meshes of Trees\", Proc. International Workshop on Graph Theory Concepts in Computer Science, 1983. \n16. W.S. McCulloch and W. Pitts, \"A Logical Calculus of the Ideas Immanent in Nervous Activity\", Bulletin of Mathematical Biophysics, Vol 5, 1943, pp. 115-133. \n17. J.J. Hopfield, \"Neural networks and physical systems with emergent collective computational abilities\", Proc. Natl. Acad. Sci., Vol 79, April 1982, pp. 2554-2558. \n18. T. Kohonen, \"Self-Organization and Associative Memory\", Springer-Verlag, Berlin, 1984. \n19. R.P. Lippmann, \"An Introduction to Computing with Neural Nets\", IEEE ASSP Magazine, April 1987, pp. 4-22. \n20. S. Tomboulian, \"A System for Routing Directed Graphs on SIMD Architectures\", ICASE Report No. 87-14, NASA Langley Research Center, Hampton, VA. \n21. C.Y. 
Lee, \"An algorithm for path connections and its applications\", IRE Trans. Elec. Comput., Vol EC-10, Sept 1961, pp. 346-365. \n22. E.F. Moore, \"Shortest path through a maze\", Annals of the Computation Laboratory, Vol 30, Cambridge, MA: Harvard Univ. Press, 1959, pp. 285-292. \n", "award": [], "sourceid": 35, "authors": [{"given_name": "Sherryl", "family_name": "Tomboulian", "institution": null}]}