{"title": "Invariant Object Recognition Using a Distributed Associative Memory", "book": "Neural Information Processing Systems", "page_first": 830, "page_last": 839, "abstract": null, "full_text": "830 \n\nInvariant Object Recognition Using a Distributed Associative Memory \n\nHarry Wechsler and George Lee Zimmerman \n\nDepartment or Electrical Engineering \n\nUniversity or Minnesota \nMinneapolis, MN 55455 \n\nAbstract \n\nThis paper describes an approach to 2-dimensional object recognition. Complex-log con(cid:173)\nformal mapping is combined with a distributed associative memory to create a system \nwhich recognizes objects regardless of changes in rotation or scale. Recalled information \nfrom the memorized database is used to classify an object, reconstruct the memorized ver(cid:173)\nsion of the object, and estimate the magnitude of changes in scale or rotation. The system \nresponse is resistant to moderate amounts of noise and occlusion. Several experiments, us(cid:173)\ning real, gray scale images, are presented to show the feasibility of our approach. \n\nIntroduction \n\nThe challenge of the visual recognition problem stems from the fact that the projec(cid:173)\n\ntion of an object onto an image can be confounded by several dimensions of variability \nsuch as uncertain perspective, changing orientation and scale, sensor noise, occlusion, and \nnon-uniform illumination. A vision system must not only be able to sense the identity of an \nobject despite this variability, but must also be able to characterize such variability -- be(cid:173)\ncause the variability inherently carries much of the valuable information about the world. \nOur goal is to derive the functional characteristics of image representations suitable for in(cid:173)\nvariant recognition using a distributed associative memory. 
The main question is that of finding appropriate transformations such that interactions between the internal structure of the resulting representations and the distributed associative memory yield invariant recognition. As Simon [1] points out, all mathematical derivation can be viewed simply as a change of representation, making evident what was previously true but obscure. This view can be extended to all problem solving. Solving a problem then means transforming it so as to make the solution transparent. \n\nWe approach the problem of object recognition with three requirements: classification, reconstruction, and characterization. Classification implies the ability to distinguish objects that were previously encountered. Reconstruction is the process by which memorized images can be drawn from memory given that a distorted version exists at the input. Characterization involves extracting information about how the object has changed from the way in which it was memorized. Our goal in this paper is to discuss a system which is able to recognize memorized 2-dimensional objects regardless of geometric distortions like changes in scale and orientation, and can characterize those transformations. The system also allows for noise and occlusion and is tolerant of memory faults. \n\nThe following sections, Invariant Representation and Distributed Associative Memory, respectively, describe the various components of the system in detail. The Experiments section presents the results from several experiments we have performed on real data. The paper concludes with a discussion of our results and their implications for future research. \n\n\u00a9 American Institute of Physics 1988 \n\n1. Invariant Representation \n\nThe goal of this section is to examine the various components used to produce the vectors which are associated in the distributed associative memory. 
The block diagram which describes the various functional units involved in obtaining an invariant image representation is shown in Figure 1. The image is complex-log conformally mapped so that rotation and scale changes become translation in the transform domain. Along with the conformal mapping, the image is also filtered by a space variant filter to reduce the effects of aliasing. The conformally mapped image is then processed through a Laplacian in order to solve some problems associated with the conformal mapping. The Fourier transforms of both the conformally mapped image and the Laplacian processed image produce the four output vectors. The magnitude output vector |F1| is invariant to linear transformations of the object in the input image. The phase output vector \u03a62 contains information concerning the spatial properties of the object in the input image. \n\n1.1 Complex-Log Mapping and Space Variant Filtering \n\nThe first box of the block diagram given in Figure 1 consists of two components: complex-log mapping and space variant filtering. Complex-log mapping transforms an image from rectangular coordinates to polar exponential coordinates. This transformation changes rotation and scale into translation. If the image is mapped onto a complex plane then each pixel (x,y) on the Cartesian plane can be described mathematically by z = x + jy. The complex-log mapped points w are described by \n\nw = ln(z) = ln(|z|) + j\u03b8z  (1) \n\nwhere \u03b8z is the angle of z. Our system sampled 256x256 pixel images to construct 64x64 complex-log mapped images. Samples were taken along radial lines spaced 5.6 degrees apart. Along each radial line the step size between samples increased by powers of 1.08. These numbers are derived from the number of pixels in the original image and the number of samples in the complex-log mapped image. 
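The sampling scheme just described lends itself to a short numerical sketch. The code below is our own illustrative reconstruction, not code from the paper: the function name and the choice of pinning the outermost sample to the 128-pixel image radius are assumptions, while the 64 uniform angular steps of 360/64 = 5.625 degrees match the roughly 5.6-degree spacing stated above.

```python
import numpy as np

def complex_log_sample_points(n_angles=64, n_radii=64, max_radius=128.0, growth=1.08):
    # Sample coordinates for a complex-log (polar exponential) mapping.
    # Angles are uniform (360/64 = 5.625 degrees apart); radial positions
    # grow geometrically by powers of 1.08, with the outermost sample
    # pinned to max_radius (half of a 256-pixel image side -- an assumption).
    r0 = max_radius / growth ** (n_radii - 1)
    radii = r0 * growth ** np.arange(n_radii)            # geometric radial steps
    thetas = 2 * np.pi * np.arange(n_angles) / n_angles  # uniform angular steps
    x = radii[:, None] * np.cos(thetas)[None, :]
    y = radii[:, None] * np.sin(thetas)[None, :]
    return np.stack([x, y], axis=-1)                     # (n_radii, n_angles, 2)
```

Because the radii form a geometric sequence, scaling the object by a factor of 1.08 shifts every sample by exactly one radial step, and rotating it by one angular step shifts every sample by one column: this is the scale-and-rotation-to-translation property the mapping relies on.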
An excellent examination of the different conditions involved in selecting the appropriate number of samples for a complex-log mapped image is given in [2]. The non-linear sampling can be split into two distinct parts along each radial line. Toward the center of the image the samples are dense enough that no anti-aliasing filter is needed. Samples taken at the edge of the image are large and an anti-aliasing filter is necessary. The image filtered in this manner has a circular region around the center which corresponds to an area of highest resolution. The size of this region is a function of the number of angular samples and radial samples. The filtering is done, at the same time as the sampling, by convolving truncated Bessel functions with the image in the space domain. The width of the Bessel function's main lobe is inversely proportional to the eccentricity of the sample point. \n\nA problem associated with the complex-log mapping is sensitivity to center misalignment of the sampled image. Small shifts from the center cause dramatic distortions in the complex-log mapped image. Our system assumes that the object is centered in the image frame. Slight misalignments are considered noise. Large misalignments are considered as translations and could be accounted for by changing the gaze in such a way as to bring the object into the center of the frame. The decision about what to bring into the center of the frame is an active function and should be determined by the task. An example of a system which could be used to guide the translation process was developed by Anderson and Burt [3]. 
Their pyramid system analyzes the input image at different temporal and spatial resolution levels. Their smart sensor was then able to shift its fixation such that interesting parts of the image (i.e., something large and moving) were brought into the central part of the frame for recognition. \n\n[Figure 1. Block Diagram of the System. The image passes through complex-log mapping and space variant filtering, then a Laplacian; Fourier transforms of both results feed the distributed associative memory, followed by inverse processing and reconstruction, rotation and scale estimation, and classification.] \n\n1.2 Fourier Transform \n\nThe second box in the block diagram of Figure 1 is the Fourier transform. The Fourier transform of a 2-dimensional image f(x,y) is given by \n\nF(u,v) = \u222b\u222b f(x,y) e^{-j(ux+vy)} dx dy  (2) \n\nwhere the integrals run from -\u221e to \u221e, and can be described by two 2-dimensional functions corresponding to the magnitude |F(u,v)| and phase \u03a6(u,v). The magnitude component of the Fourier transform, which is invariant to translation, carries much of the contrast information of the image. The phase component of the Fourier transform carries information about how things are placed in an image. Translation of f(x,y) corresponds to the addition of a linear phase component. The complex-log mapping transforms rotation and scale into translation, and the magnitude of the Fourier transform is invariant to those translations, so that |F1| will not change significantly with rotation and scale of the object in the image. \n\n1.3 Laplacian \n\nThe Laplacian that we use is a difference-of-Gaussians (DOG) approximation to the \u2207\u00b2G function as given by Marr [4]: \n\n\u2207\u00b2G = (1/(\u03c0\u03c3\u2074)) [1 - r\u00b2/(2\u03c3\u00b2)] e^{-r\u00b2/(2\u03c3\u00b2)}  (3) \n\nThe result of convolving the Laplacian with an image can be viewed as a two step process. 
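The DOG approximation can be sketched as a small convolution kernel. This is our own minimal illustration, with parameter names chosen by us; the 1.6 width ratio between the two Gaussians follows Marr's general suggestion rather than a value stated here.

```python
import numpy as np

def dog_kernel(sigma=2.0, ratio=1.6, size=None):
    # Difference-of-Gaussians approximation to the Laplacian of a Gaussian:
    # a narrow center Gaussian of width sigma minus a broader surround
    # Gaussian of width ratio * sigma, each normalized to unit sum so the
    # kernel integrates to zero and flat image regions map to zero.
    if size is None:
        size = int(6 * ratio * sigma) | 1   # odd support wide enough for the surround
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx ** 2 + yy ** 2

    def unit_gauss(s):
        g = np.exp(-r2 / (2 * s ** 2))
        return g / g.sum()

    return unit_gauss(sigma) - unit_gauss(ratio * sigma)
```

Convolving an image with this single kernel carries out both steps at once, the Gaussian blur and the isotropic second derivative, and its zero integral is what sends regions of little change to zero.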
The image is blurred by a Gaussian kernel of a specified width \u03c3. Then the isotropic second derivative of the blurred image is computed. The width of the Gaussian kernel is chosen such that the conformally mapped image is visible -- approximately 2 pixels in our experiments. The Laplacian sharpens the edges of the object in the image and sets any region that did not change much to zero. Below we describe the benefits of using the Laplacian. \n\nThe Laplacian eliminates the stretching problem encountered by the complex-log mapping due to changes in object size. When an object is expanded the complex-log mapped image will translate. The pixels vacated by this translation will be filled with more pixels sampled from the center of the scaled object. These new pixels will not be significantly different from the displaced pixels, so the result looks like a stretching in the complex-log mapped image. The Laplacian of the complex-log mapped image will set the new pixels to zero because they do not significantly change from their surrounding pixels. The Laplacian eliminates high frequency spreading due to the finite structure of the discrete Fourier transform and enhances the differences between memorized objects by accentuating edges and de-emphasizing areas of little change. \n\n2. Distributed Associative Memory (DAM) \n\nThe particular form of distributed associative memory that we deal with in this paper is a memory matrix which modifies the flow of information. Stimulus vectors are associated with response vectors and the result of this association is spread over the entire memory space. Distributing in this manner means that information about a small portion of the association can be found in a large area of the memory. New associations are placed over the older ones and are allowed to interact. 
This means that the size of the memory matrix stays the same regardless of the number of associations that have been memorized. Because the associations are allowed to interact with each other an implicit representation of structural relationships and contextual information can develop, and as a consequence a very rich level of interactions can be captured. There are few restrictions on what vectors can be associated, so there can exist extensive indexing and cross-referencing in the memory. Distributed associative memory captures a distributed representation which is context dependent. This is quite different from the simplistic behavioral model [5]. \n\nThe construction stage assumes that there are n pairs of m-dimensional vectors that are to be associated by the distributed associative memory. This can be written as \n\nM si = ri  for i = 1, ..., n  (4) \n\nwhere si denotes the ith stimulus vector and ri denotes the ith corresponding response vector. We want to construct a memory matrix M such that when the kth stimulus vector sk is projected onto the space defined by M the resulting projection will be the corresponding response vector rk. More specifically, we want to solve the following equation: \n\nM S = R  (5) \n\nwhere S = [s1 | s2 | ... | sn] and R = [r1 | r2 | ... | rn]. A unique solution for this equation does not necessarily exist for any arbitrary group of associations that might be chosen. Usually, the number of associations n is smaller than m, the length of the vectors to be associated, so the system of equations is underconstrained. 
The constraint used to solve for a unique matrix M is that of minimizing the square error, ||MS - R||\u00b2, which results in the solution \n\nM = R S\u207a  (6) \n\nwhere S\u207a is known as the Moore-Penrose generalized inverse of S [6]. \n\nThe recall operation projects an unknown stimulus vector s onto the memory space M. The resulting projection yields the response vector r: \n\nr = M s  (7) \n\nIf the memorized stimulus vectors are independent and the unknown stimulus vector s is one of the memorized vectors sk, then the recalled vector will be the associated response vector rk. If the memorized stimulus vectors are dependent, then the vector recalled by one of the memorized stimulus vectors will contain the associated response vector and some crosstalk from the other stored response vectors. \n\nThe recall can be viewed as the weighted sum of the response vectors. The recall begins by assigning weights according to how well the unknown stimulus vector matches with the memorized stimulus vectors using a linear least squares classifier. The response vectors are multiplied by the weights and summed together to build the recalled response vector. The recalled response vector is usually dominated by the memorized response vector that is closest to the unknown stimulus vector. \n\nAssume that there are n associations in the memory and each of the associated stimulus and response vectors has m elements. This means that the memory matrix has m\u00b2 elements. Also assume that the noise that is added to each element of a memorized stimulus vector is independent, zero mean, with a variance of \u03c3i\u00b2. The recall from the memory is then \n\nM (sk + ni) = rk + no  (8) \n\nwhere ni is the input noise vector and no is the output noise vector. The ratio of the average output noise variance to the average input noise variance is \n\n\u03c3o\u00b2/\u03c3i\u00b2 = (1/m) Tr[M M\u1d40]  (9) \n\nFor the autoassociative case this simplifies to \n\n\u03c3o\u00b2/\u03c3i\u00b2 = n/m  (10) \n\nThis says that when a noisy version of a memorized input vector is applied to the memory the recall is improved by a factor corresponding to the ratio of the number of memorized vectors to the number of elements in the vectors. For the heteroassociative memory matrix a similar formula holds as long as n is less than m [7]: \n\n\u03c3o\u00b2/\u03c3i\u00b2 \u2248 n/m  (11) \n\nFault tolerance is a byproduct of the distributed nature and error correcting capabilities of the distributed associative memory. By distributing the information, no single memory cell carries a significant portion of the information critical to the overall performance of the memory. \n\n3. Experiments \n\nIn this section we discuss the results of computer simulations of our system. Images of objects are first preprocessed through the subsystem outlined in Section 1. The output of such a subsystem is four vectors: |F1|, \u03a61, |F2|, and \u03a62. We construct the memory by associating the stimulus vector |F1| with the response vector \u03a62 for each object in the database. To perform a recall from the memory the unknown image is preprocessed by the same subsystem to produce the vectors |F1|, \u03a61, |F2|, and \u03a62. The resulting stimulus vector |F1| is projected onto the memory matrix to produce a response vector which is an estimate of the memorized phase \u03a62. The estimated phase vector \u03a6\u03022 and the magnitude |F1| are used to reconstruct the memorized object. The difference between the estimated phase \u03a6\u03022 and the unknown phase \u03a62 is used to estimate the amount of rotation and scale experienced by the object. \n\nThe database of images consists of twelve objects: four keys, four mechanical parts, and four leaves. The objects were chosen for their essentially two-dimensional structure. 
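The construction and recall described in Section 2 can be sketched numerically. The sketch below is ours: random vectors stand in for the image-derived stimulus and response vectors, and the dimensions mimic the twelve-object database. Assuming linearly independent stimulus vectors, recall of a memorized stimulus is exact, and projecting pure noise through an autoassociative matrix attenuates its variance by roughly n/m, matching the recall-improvement claim above.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 12, 4096                      # 12 associations of m-element vectors
S = rng.standard_normal((m, n))      # stimulus vectors as columns (stand-ins)
R = rng.standard_normal((m, n))      # response vectors as columns (stand-ins)

# Construction: M = R S+, the minimum-square-error solution of M S = R.
M = R @ np.linalg.pinv(S)

# Recall: projecting a memorized stimulus recovers its associated response.
recall = M @ S[:, 6]

# Autoassociative noise behaviour: S S+ projects onto the n-dimensional
# span of the memorized vectors, so white noise keeps only about n/m of
# its variance after recall.
M_auto = S @ np.linalg.pinv(S)
noise = rng.standard_normal(m)
ratio = (M_auto @ noise).var() / noise.var()   # expected to be near n/m
```

With n much smaller than m the crosstalk and noise terms are strongly suppressed, which is why the recalled response is dominated by the closest memorized association.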
Each object was photographed using a digitizing video camera against a black background. We emphasize that all of the images used in creating and testing the recognition system were taken at different times using various camera rotations and distances. The images are digitized to 256x256, eight bit quantized pixels, and each object covers an area of about 40x40 pixels. This small object size relative to the background is necessary due to the non-linear sampling of the complex-log mapping. The objects were centered within the frame by hand. This is the source of much of the noise and could have been done automatically using the object's center of mass or some other criterion determined by the task. The orientation of each memorized object was arbitrarily chosen such that their major axis was vertical. The 2-dimensional images that are the output from the invariant representation subsystem are scanned horizontally to form the vectors for memorization. The database used for these experiments is shown in Figure 2. \n\nFigure 2. The Database of Objects Used in the Experiments \n\na) Original  b) Unknown  c) Recall: rotated 135\u00b0  d) Memory: 6 \n\nFigure 3. Recall Using a Rotated and Scaled Key. SNR: -3.37 dB \n\nThe first example of the operation of our system is shown in Figure 3. Figure 3a) is the image of one of the keys as it was memorized. Figure 3b) is the unknown object presented to our system. The unknown object in this case is the same key that has been rotated by 180 degrees and scaled. Figure 3c) is the recalled, reconstructed image. The rounded edges of the recalled image are artifacts of the complex-log mapping. Notice that the reconstructed recall is the unrotated memorized key with some noise caused by errors in the recalled phase. Figure 3d) is a histogram which graphically displays the classification vector which corresponds to S\u207as. The histogram shows the interplay between the memorized images and the unknown image. The \"6\" on the bargraph indicates which of the twelve classes the unknown object belongs to. The histogram gives a value which is the best linear estimate of the image relative to the memorized objects. Another measure, the signal-to-noise ratio (SNR), is given at the bottom of the recalled image. SNR compares the variance of the ideal recall after processing with the variance of the difference between the ideal and actual recall. This is a measure of the amount of noise in the recall. The SNR does not carry much information about the quality of the recall image because the noise measured by the SNR is due to many factors such as misalignment of the center, changing reflections, and dependence between other memorized objects -- each affecting quality in a variety of ways. Rotation and scale estimates are made using a vector D corresponding to the difference between the unknown vector \u03a62 and the recalled vector \u03a6\u03022. In an ideal situation D will be a plane whose gradient indicates the exact amount of rotation and scale the recalled object has experienced. In our system the recalled vector \u03a6\u03022 is corrupted with noise, which means rotation and scale have to be estimated. The estimate is made by letting the first order difference of D at each point in the plane vote for a specified range of rotation or scale. \n\na) Original  b) Unknown  c) Recall  d) Memory: 4 \n\nFigure 4. Recall Using a Scaled and Rotated \"S\" with Occlusion \n\nFigure 4 is an example of occlusion. The unknown object in this case is an \"s\" curve which is larger and slightly tilted from the memorized \"s\" curve. 
A portion of the bottom curve was occluded. The resulting reconstruction is very noisy but has filled in the missing part of the bottom curve. The noisy recall is reflected in both the SNR and the interplay between the memories shown by the histogram. \n\na) Ideal recall  b) 30% removed  c) 50% removed  d) 75% removed \n\nFigure 5. Recall for Memory Matrix Randomly Set to Zero \n\nFigure 5 is the result of randomly setting the elements of the memory matrix to zero. Figure 5a) shows the ideal recall. Figure 5b) is the recall after 30 percent of the memory matrix has been set to zero. Figure 5c) is the recall for 50 percent and Figure 5d) is the recall for 75 percent. Even when 90 percent of the memory matrix has been set to zero a faint outline of the pin could still be seen in the recall. This result is important in two ways. First, it shows that the distributed associative memory is robust in the presence of noise. Second, it shows that a completely connected network is not necessary and as a consequence a scheme for data compression of the memory matrix could be found. \n\n4. Conclusion \n\nIn this paper we demonstrate a computer vision system which recognizes 2-dimensional objects invariant to rotation or scale. The system combines an invariant representation of the input images with a distributed associative memory such that objects can be classified, reconstructed, and characterized. The distributed associative memory is resistant to moderate amounts of noise and occlusion. Several experiments, demonstrating the ability of our computer vision system to operate on real, gray-scale images, were presented. \n\nNeural network models, of which the distributed associative memory is one example, were originally developed to simulate biological memory. They are characterized by a large number of highly interconnected simple processors which operate in parallel. 
An excellent review of the many neural network models is given in [8]. The distributed associative memory we use is linear, and as a result there are certain desirable properties which will not be exhibited by our computer vision system. For example, feedback through our system will not improve recall from the memory. Recall could be improved if a non-linear element, such as a sigmoid function, is introduced into the feedback loop. Non-linear neural networks, such as those proposed by Hopfield [9] or Anderson et al. [10], can achieve this type of improvement because each memorized pattern is associated with stable points in an energy space. The price to be paid for the introduction of non-linearities into a memory system is that the system will be difficult to analyze and can be unstable. Implementing our computer vision system using a non-linear distributed associative memory is a goal of our future research. \n\nWe are presently extending our work toward 3-dimensional object recognition. Much of the present research in 3-dimensional object recognition is limited to polyhedral, non-occluded objects in a clean, highly controlled environment. Most systems are edge based and use a generate-and-test paradigm to estimate the position and orientation of recognized objects. We propose to use an approach based on characteristic views [11] or aspects [12] which suggests that the infinite 2-dimensional projections of a 3-dimensional object can be grouped into a finite number of topological equivalence classes. An efficient 3-dimensional recognition system would require a parallel indexing method to search for object models in the presence of geometric distortions, noise, and occlusion. Our object recognition system using distributed associative memory can fulfill those requirements with respect to characteristic views. \n\nReferences \n\n[1] Simon, H. 
A. (1984), The Sciences of the Artificial (2nd ed.), MIT Press. \n\n[2] Massone, L., G. Sandini, and V. Tagliasco (1985), \"Form-invariant\" topological mapping strategy for 2D shape recognition, CVGIP, 30, 169-188. \n\n[3] Anderson, C. H., P. J. Burt, and G. S. Van Der Wal (1985), Change detection and tracking using pyramid transform techniques, Proc. of the SPIE Conference on Intelligence, Robots, and Computer Vision, Vol. 579, 72-78. \n\n[4] Marr, D. (1982), Vision, W. H. Freeman. \n\n[5] Hebb, D. O. (1949), The Organization of Behavior, New York: Wiley. \n\n[6] Kohonen, T. (1984), Self-Organization and Associative Memory, Springer-Verlag. \n\n[7] Stiles, G. S. and D. L. Denq (1985), On the effect of noise on the Moore-Penrose generalized inverse associative memory, IEEE Trans. on PAMI, 7, 3, 358-360. \n\n[8] McClelland, J. L., D. E. Rumelhart, and the PDP Research Group (Eds.) (1986), Parallel Distributed Processing, Vol. 1, 2, MIT Press. \n\n[9] Hopfield, J. J. (1982), Neural networks and physical systems with emergent collective computational abilities, Proc. Natl. Acad. Sci. USA, 79, April 1982. \n\n[10] Anderson, J. A., J. W. Silverstein, S. A. Ritz, and R. S. Jones (1977), Distinctive features, categorical perception, and probability learning: some applications of a neural model, Psychol. Rev., 84, 413-451. \n\n[11] Chakravarty, I., and H. Freeman (1982), Characteristic views as a basis for 3-D object recognition, Proc. SPIE on Robot Vision, 336, 37-45. \n\n[12] Koenderink, J. J., and A. J. Van Doorn (1979), Internal representation of solid shape with respect to vision, Biol. Cybern., 32, 4, 211-216. \n", "award": [], "sourceid": 81, "authors": [{"given_name": "Harry", "family_name": "Wechsler", "institution": null}, {"given_name": "George", "family_name": "Zimmerman", "institution": null}]}