{"title": "Contour-Map Encoding of Shape for Early Vision", "book": "Advances in Neural Information Processing Systems", "page_first": 282, "page_last": 289, "abstract": null, "full_text": "282 \n\nKanerva \n\nContour-Map Encoding of Shape for Early Vision \n\nResearch Institute for Advanced Computer Science \n\nPentti Kanerva \n\nMail Stop 230-5, NASA Ames Research Center \n\nMoffett Field, California 94035 \n\nABSTRACT \n\nContour maps provide a general method for \nrecognizing two-dimensional shapes. All but \nblank images give rise to such maps, and people \nare good at recognizing objects and shapes \nfrom them. The maps are encoded easily in \nlong feature vectors that are suitable for \nrecognition by an associative memory. These \nproperties of contour maps suggest a role for \nthem in early visual perception. The prevalence \nof direction-sensitive neurons in the visual \ncortex of mammals supports this view. \n\nINTRODUCTION \n\nEarly vision refers here to the first stages of visual \nperception of an experienced (adult human) observer. \nOverall, visual perception results in the identification of \nwhat is being viewed: We recognize an image as the letter A \nbecause it looks to us like other As we have seen. Early \nvision is the beginning of this process of identification-(cid:173)\nthe making of the first guess. \n\nEarly vision cannot be based on special or salient \n\nfeatures. For example, we normally think of the letter A \nas being composed of two slanted strokes, / and \\, meeting \nat the top and connected in the middle by a horizontal \nstroke, -. The strokes and their coincidences define all \nthe features of A. However, we recognize the As in Figure 1 \neven though the strokes and the features, if present at all, \ndo not stand out in the images. \n\n\fContour-Map Encoding of Shape for Early Vision \n\n283 \n\nMost telling about human vision is that we can recognize \n\nsuch As after seeing more or less normal As only. 
The challenge of early vision, then, is to find general encoding mechanisms that turn these quite dissimilar images of the same object into similar internal representations while leaving the representations of different objects dissimilar, and to find basic pattern-recognition mechanisms that work with these representations. Since our main work is on associative memories, we have been interested in ways to encode images into long feature vectors suitable for such memories. The contour-map method of this paper encodes a variety of images into vectors for associative memories. \n\nREPRESENTING AN IMAGE AS A CONTOUR MAP \n\nImages take many forms: line drawings, silhouettes, outlines, dot-matrix pictures, gray-scale pictures, color pictures, and the like, and pictures that combine all these elements. Common to all is that they occupy a region of (two-dimensional) space. An early representation of an image should therefore be concerned with how the image controls its space or, in technical terms, how it might be represented as a field. \n\nLet us consider first a gray-scale image. It defines a field by how dark it is in different places (image intensity--a scalar field--the image itself is the field). A related field is given by how the darkness changes from place to place (gradient of intensity--a vector field). Neither one is quite right for recognizing As because reversing the field (turning dark to light and light to dark) leaves us with the \"same\" A. However, the dark-and-light reversal leaves the contour lines of the image unchanged (i.e., lines of uniform intensity--technically a tangent field perpendicular to the gradient field). My proposal is to base initial recognition on the contour lines. \n\nIn line drawings and black-and-white images, which have only two darkness levels or \"colors\", the contour lines are not well defined. 
This is overcome by propagating the lines and the edges of the image outward and inward over areas of uniform image intensity, in the manner of contour lines, roughly parallel to the lines and the edges. Figure 2 shows only a few such lines, but, in fact, the image is covered with them, running roughly parallel to each other. As a rule, exactly one contour line runs through any given point. Computing its direction is discussed near the end of the paper. \n\nFIGURE 1. Various kinds of As. \n\nENCODING THE CONTOUR MAP \n\nTable 1 shows how the direction of the contour at a point can be encoded in three trits (-1, 0, 1 ternary variables). The code divides 180 degrees into six equal sectors and assigns a codeword to each sector. The distance between two codewords is the number of (Hamming) units by which the words differ (L1 distance). The code is circular, and the distance between codewords is related directly to the difference in direction: Directions 30, 60, and 90 degrees apart are encoded with words that are 2, 4, and 6 units apart, respectively. The code wraps around, as do tangents, so that directions 180 degrees apart are encoded the same. For finer discrimination we would use some finer circular code. The zero-word 000, which is equally far from all other words in the code, is used for points at which the direction of the contour is ill-defined, such as the very centers of circles. 
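As a minimal Python sketch of this code (the function names are ours, and the particular assignment of codewords to sectors follows our reading of Table 1), the encoder and the codeword distance can be written as: \n\n```python
# Coarse circular code: six 30-degree sectors of contour direction,
# each encoded as a word of three trits (-1, 0, +1).
# Sector k covers directions 30*k +/- 15 degrees; 180 wraps to 0.
# (Sketch only -- names and table ordering are our reconstruction.)

SECTOR_CODES = [
    ( 1, -1,  1),   # 0 degrees (and 180): horizontal
    (-1, -1,  1),   # 30 degrees
    (-1,  1,  1),   # 60 degrees
    (-1,  1, -1),   # 90 degrees: vertical
    ( 1,  1, -1),   # 120 degrees
    ( 1, -1, -1),   # 150 degrees
]
ZERO_WORD = (0, 0, 0)   # ill-defined direction

def encode_direction(theta_deg):
    # Map a direction in degrees to its 3-trit codeword.
    sector = int(round(theta_deg / 30.0)) % 6
    return SECTOR_CODES[sector]

def distance(u, v):
    # L1 distance between codewords; adjacent sectors differ by 2 units.
    return sum(abs(a - b) for a, b in zip(u, v))
```\n\nWith this table, directions 30, 60, and 90 degrees apart come out 2, 4, and 6 units apart, and the zero-word 000 is 3 units from every sector codeword, as the text requires. 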
\n\nThis encoding makes the direction of the contour at any point on a map into a three-component vector. To encode the entire map, the vector field is sampled at a fixed, finite set of points, and the encodings of the sample points are concatenated in fixed order into a long vector. In preliminary studies we have used small sample sizes: 7 x 5 (= 35) sample points, each encoded into three trits, for a total vector of (3 x 35 =) 105 trits, and 8 x 8 sample points by three trits for a total vector of 192 trits. \n\nFIGURE 2. Propagating the contour. \n\nFor an example, Figure 3 shows the digit 4 drawn on a 21-by-15-pixel grid. It also shows a 7 x 5 sampling grid laid over the image and the direction of the contour at the sample points (shown by short line segments). Below the image are the three-trit encodings of the sample points starting at the upper left corner and progressing by rows, concatenated into a 105-trit encoding of the entire image. In this encoding, + means +1 and - means -1. \n\nFrom Positions of the Code to Directional Sensors \n\nEach position of the three-trit code can be thought of as a directional sensor. For example, the center position senses contours at 90 degrees, plus or minus 45 degrees: It is 1 when the direction of the contour is closer to vertical than to horizontal (see Table 1). Similarly, each position of the long (105-trit) code for the entire map can be thought of as a sensor for a specific direction--plus or minus--at a specific location on the map. \n\nAn array of sensors will thus encode an image. The sensors are like the direction-sensitive cells of the visual cortex. 
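A Python sketch of the full-map encoding (names are ours, and the sector table again follows our reading of Table 1): a map sampled on a grid becomes one long trit vector, and two maps are compared by the distance between their vectors. \n\n```python
# Sketch: encode a grid of sampled contour directions (in degrees,
# or None where the direction is ill-defined) into one long trit
# vector, and compare two maps by the distance between their vectors.

CODES = [( 1, -1,  1), (-1, -1,  1), (-1,  1,  1),
         (-1,  1, -1), ( 1,  1, -1), ( 1, -1, -1)]

def encode_map(samples):
    # samples: list of rows, each a list of directions or None.
    vec = []
    for row in samples:
        for theta in row:
            if theta is None:
                vec.extend((0, 0, 0))   # zero-word 000
            else:
                vec.extend(CODES[int(round(theta / 30.0)) % 6])
    return vec

def map_distance(u, v):
    # Sum of pointwise codeword distances; this equals the L1
    # distance between the two concatenated long vectors.
    return sum(abs(a - b) for a, b in zip(u, v))
```\n\nA 7 x 5 grid gives a vector of length 3 x 35 = 105, and the vector distance approximates the integrated angular difference between the two maps. 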
Such cells, of course, are not laid down with perfect regularity over the cortex, but that does not mean that they could not perform as encoders. Accordingly, a direction-sensitive cell can be thought of as a feature detector that encodes for a certain direction at a certain location in the visual or attentional field. An irregular array of randomly oriented sensors laid over images would produce perfectly good encodings of their contour maps. \n\nTABLE 1 \n\nCoarse Circular Code for Direction of Contour \n\nDirection, degrees    Codeword \n  0 ± 15               1 -1  1 \n 30 ± 15              -1 -1  1 \n 60 ± 15              -1  1  1 \n 90 ± 15              -1  1 -1 \n120 ± 15               1  1 -1 \n150 ± 15               1 -1 -1 \n180 ± 15               1 -1  1 \n. . .                  . . . \nUndefined              0  0  0 \n\n-++ -++ -++ --+ ++- \n-++ -++ -++ -+- -+- \n--+ --+ --+ -+- -+- \n--+ -++ 000 -+- -+- \n000 +-+ +-+ +-+ --+ \n+-- +-+ +-+ -+- -+- \n+-- +-- ++- ++- -++ \n\nFIGURE 3. Encoding an image. \n\nCOMPARING TWO CONTOUR MAPS \n\nHow closely do two contour maps resemble each other? For simplicity, we will compare maps of equal size (and shape) only. The maps are compared point to point. The difference at a point is the difference in the direction of the contour at that point on the two maps--that is, the magnitude of the lesser of the two angles made by the two contour lines that run through the two points that correspond to each other on the two maps. The maximum difference at a point is therefore 90 degrees. The entire maps are then compared by adding the pointwise differences over all the points (by integrating over the area of the map). \n\nThe purpose of the encoding is to make the comparing of maps simple. 
The code is so constructed that the difference of two maps at a point is roughly proportional to the distance between the two (3-trit) codewords--one from each map--for that point. We need not even concern ourselves with finding the lesser of the two angles made by the crossing of the two contours; the distance between codewords accounts for that automatically. \n\nEntire maps are then compared by adding together the distances at the (35) sample points. This is equivalent to computing the distance between the (105-trit) codewords for the two maps. This distance is proportional to the difference between the maps--only approximately so, because the maps are sampled at a small number of points and because the direction at each point is coded coarsely. \n\nCOMPUTING THE DIRECTION OF THE CONTOUR \n\nWe have not explored widely how to compute contours from images and merely outline here one method, not exactly biological, that works for line drawings and two-tone images and that can be generalized to gray-scale images and even to many multicolor images. We have also experimented with the oriented difference-of-Gaussian filters of Parent and Zucker (1985) and with the cortex transforms of Watson (1987). The contours are based on a simple model of attraction, akin to gravity, by assuming that the lines and the edges of the image attract according to their distance from the point. The net attraction at any point on the image defines a gradient field, and the contours are perpendicular to it. \n\nIn practice we work with pixels and assume, for the sake of the gravity model, that pixels of the same color--same as that of the sample point P for which we are computing the direction--have mass zero and those of the opposite color have mass one. 
For the direction to be independent of scale, the attractive force must be inversely proportional to some power of the distance. Powers greater than 2 make the computation local. For example, power 7 means that one pixel, twice as far as another, contributes only 1/128 as much as the other to the net force. To make the attraction somewhat insensitive to noise, a small constant, 3, is added to the distance. (The values 7 and 3 were chosen after a small amount of experimentation.) Hence, pixel X (of mass 1) attracts P with a force of magnitude \n\n[d(P,X) + 3]^(-7) \n\nin the direction of X, where d(P,X) is the (Euclidean) distance between P and X. The vector sum of the forces over all pixels X (of mass 1) then is the attractive force at point P, and the direction of the contour at P is perpendicular to it. The magnitude of the vector sum is scaled by dividing it by the sum of the magnitudes of its components. This scaled magnitude indicates how well the direction is defined in the image. \n\nWhen this computation is made at a point on a (one-pixel-wide) line, the result is a zero vector (the gradient at the top of a ridge is zero). However, we want to use the direction of the line itself as the direction of the contour. To this end, we compute at each sample point P another vector that detects linear features, such as lines. This computation is based on the above attraction model, modified as follows: Pixels of the same color as P's now have mass one and those of the opposite color have mass zero (the pixel at P being always regarded as having mass zero); and the direction of the force, instead of being the angle from P to X, is twice that angle. The doubling of the angle makes attractive forces in opposite directions (along a line) reinforce each other and in perpendicular directions cancel each other out. 
The angle of the net force is then halved, and the magnitude of the force is scaled as above. \n\nThe two computations yield two vectors, both representing the direction of the contour at a point. They can be combined into a single vector by doubling their angles, to eliminate 180-degree ambiguities, by adding together the resulting vectors, and by halving the angle of the sum. The direction of the result gives the direction of the contour, and the magnitude of the result indicates how well this direction is defined. If the magnitude is below some threshold, the direction is taken to be undefined and is encoded with 000. \n\nSOME COMPARISONS \n\nThe method is very general, which is at once its virtue and its limitation. The virtue is that it works where more specific methods fail, the limitation that the specific methods are needed for specific problems. \n\nIn our preliminary experiments with handwritten Zip-code digits, low-pass filtering (blurring) an image, as a method of encoding it, and contour maps resulted in similar rates of recognition by a sparse distributed memory. Higher rates on this same task were gotten by Denker et al. (1989) by encoding the image in terms of features specific to handwriting. \n\nTo get an idea of the generality of contour maps, Figure 4 shows encoded maps of ten normal digits like that in Figure 3, and of three unusual digits barely recognizable by humans. The labels for the unusual ones and for their maps, 8a, 8b, and 9a, tell what digits they were intended to be. Table 2 of distances between the encoded maps shows that 8 gives only the second-best match to 8a and 8b, whereas the digit closest to 9a indeed is 9. This suggests that a system trained on normal letters and digits would do \n\nFIGURE 4. Contour maps of digits. Unusual text. 
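The direction computation of the preceding section can be rendered as the following non-optimized Python sketch (the function names are ours; the constants 7 and 3, the perpendicularity of the contour to the net pull, and the angle doubling for the line detector are as described above): \n\n```python
import math

def attraction(img, p, same_color):
    # Net pull at p from pixels of mass one in a binary (0/1) image.
    # Edge field: opposite-color pixels have mass one. Line detector
    # (same_color=True): same-color pixels do, and each force angle
    # is doubled so along-line forces add and cross-line forces cancel.
    px, py = p
    fx = fy = total = 0.0
    for y, row in enumerate(img):
        for x, val in enumerate(row):
            if (x, y) == p or (val == img[py][px]) != same_color:
                continue
            d = math.hypot(x - px, y - py)
            w = (d + 3.0) ** -7          # constants 3 and 7 are the paper's
            ang = math.atan2(y - py, x - px)
            if same_color:
                ang *= 2.0               # double the angle
            fx += w * math.cos(ang)
            fy += w * math.sin(ang)
            total += w
    mag = math.hypot(fx, fy)
    ang = math.atan2(fy, fx)
    if same_color:
        ang /= 2.0                       # halve the angle of the net force
    return (mag / total if total else 0.0), ang

def contour_direction(img, p):
    # Combine the edge-based and line-based estimates by doubling the
    # angles (removing 180-degree ambiguity), adding, and halving.
    edge_mag, edge_ang = attraction(img, p, False)
    line_mag, line_ang = attraction(img, p, True)
    edge_ang += math.pi / 2.0            # contour is perpendicular to pull
    x = edge_mag * math.cos(2 * edge_ang) + line_mag * math.cos(2 * line_ang)
    y = edge_mag * math.sin(2 * edge_ang) + line_mag * math.sin(2 * line_ang)
    mag = math.hypot(x, y)
    theta = math.degrees(math.atan2(y, x) / 2.0) % 180.0
    return mag, theta                    # small mag: encode as 000
```\n\nOn a one-pixel-wide vertical stroke the edge field cancels (the ridge-top gradient is zero) and the line detector supplies a direction near 90 degrees, as the text describes. 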
TABLE 2 \n\nDistances Between Normal and Unusual Digits of Figure 4 \n\n      0    1    2    3    4    5    6    7    8    9 \n8a   62   95   80   91   77   83   87   99   79   73 \n8b   38   71   88   74   90   65   73  103   67   83 \n9a   70   89   66   64  109   86   88   62   51   59 \n\na fair job at recognizing the 'NIPS 1989' at the bottom of Figure 4. Systems that encode characters as bit maps, or that take them as composed of strokes, likewise trained, would not do nearly as well. Going back to the As of Figure 1, they can, with one exception, be recognized based on the map of a normal A. Logograms are a rich source of images of this kind. They are excellent for testing a vision system for generality. Finally, other oriented fields, not just contour maps, can be encoded with methods similar to this for recognition by an associative memory. \n\nAcknowledgements \n\nThis research was supported by the National Aeronautics and Space Administration (NASA) with cooperative agreement No. NCC2-387 with the Universities Space Research Association. The idea of contour maps was inspired by the gridfonts of Douglas Hofstadter (1985). The first experiments with the contour-map method were done by Bruno Olshausen. The gravity model arose from discussions with Lauri Kanerva. David Rogers made the computer-drawn illustrations. \n\nReferences \n\nDenker, J.S., Gardner, W.R., Graf, H.P., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D., Baird, H.S., and Guyon, I. (1989) Neural Network Recognizer for Hand-Written Zip Code Digits. In D.S. Touretzky (ed.), Advances in Neural Information Processing Systems, Volume I. San Mateo, California: Kaufmann. 323-331. \n\nHofstadter, D.R. (1985) Metamagical Themas. New York: Basic Books. \n\nParent, P., and Zucker, S.W. 
(1985) Trace Inference, Curvature Consistency, and Curve Detection. Report CIM-86-3, McGill Research Center for Intelligent Machines, Montreal, Canada. \n\nWatson, A.B. (1987) The Cortex Transform: Rapid Computation of Simulated Neural Images. Computer Vision, Graphics, and Image Processing 39(3):311-327. \n", "award": [], "sourceid": 190, "authors": [{"given_name": "Pentti", "family_name": "Kanerva", "institution": null}]}