{"title": "The Fidelity of Local Ordinal Encoding", "book": "Advances in Neural Information Processing Systems", "page_first": 1279, "page_last": 1286, "abstract": "", "full_text": "The Fidelity of Local Ordinal Encoding\n\nJavid Sadr, Sayan Mukherjee, Keith Thoresz, Pawan Sinha\n\nCenter for Biological and Computational Learning\nDepartment of Brain and Cognitive Sciences, MIT\n\nCambridge, Massachusetts, 02142 USA\nfsadr,sayan,thorek,sinhag@ai.mit.edu\n\nAbstract\n\nA key question in neuroscience is how to encode sensory stimuli\nsuch as images and sounds. Motivated by studies of response prop-\nerties of neurons in the early cortical areas, we propose an encoding\nscheme that dispenses with absolute measures of signal intensity\nor contrast and uses, instead, only local ordinal measures. In this\nscheme, the structure of a signal is represented by a set of equalities\nand inequalities across adjacent regions. In this paper, we focus\non characterizing the (cid:12)delity of this representation strategy. We\ndevelop a regularization approach for image reconstruction from\nordinal measures and thereby demonstrate that the ordinal repre-\nsentation scheme can faithfully encode signal structure. We also\npresent a neurally plausible implementation of this computation\nthat uses only local update rules. The results highlight the robust-\nness and generalization ability of local ordinal encodings for the\ntask of pattern classi(cid:12)cation.\n\n1\n\nIntroduction\n\nBiological and arti(cid:12)cial recognition systems face the challenge of grouping together\ndi(cid:11)ering proximal stimuli arising from the same underlying object. How well the\nsystem succeeds in overcoming this challenge is critically dependent on the nature\nof the internal representations against which the observed inputs are matched. 
The representation schemes should be capable of efficiently encoding object concepts while being tolerant to their appearance variations.

In this paper, we introduce and characterize a biologically plausible representation scheme for encoding signal structure. The scheme employs a simple vocabulary of local ordinal relations, of the kind that early sensory neurons are capable of extracting. Our results so far suggest that this scheme possesses several desirable characteristics, including tolerance to object appearance variations, computational simplicity, and low memory requirements. We develop and demonstrate our ideas in the visual domain, but they are intended to be applicable to other sensory modalities as well.

The starting point for our proposal lies in studies of the response properties of neurons in the early sensory cortical areas. These response properties constrain

Figure 1: (a) A schematic contrast response curve for a primary visual cortex neuron. The response of the neuron saturates at low contrast values. (b) An idealization of (a). This unit can be thought of as an ordinal comparator, providing information only about contrast polarity but not its magnitude.

the kinds of measurements that can plausibly be included in our representation scheme. In the visual domain, many striate cortical neurons have rapidly saturating contrast response functions [1, 4]. Their tendency to reach ceiling-level responses at low contrast values renders these neurons sensitive primarily to local ordinal, rather than metric, relations. We propose to use an idealization of such units as the basic vocabulary of our representation scheme (figure 1). In this scheme, objects are encoded as sets of local ordinal relations across image regions.
As discussed below, this very simple idea seems well suited to handling the photometric appearance variations that real-world objects exhibit.

Figure 2: The challenge for a representation scheme: to construct stable descriptions of objects despite radical changes in appearance.

As figure 2 shows, variations in illumination significantly alter the individual brightness of different parts of the face, such as the eyes, cheeks, and forehead. Therefore, absolute image brightness distributions are unlikely to be adequate for classifying all of these images as depicting the same underlying object. Even the contrast magnitudes across different parts of the face change greatly under different lighting conditions. While the absolute luminance and contrast magnitude information is highly variable across these images, Thoresz and Sinha [9] have shown that one can identify some stable ordinal measurements. Figure 3 shows several pairs of average brightness values over localized patches for each of the three images included in figure 2. Certain regularities are apparent. For instance, the average brightness of the left eye is always less than that of the forehead, irrespective of the lighting conditions. The relative magnitudes of the two brightness values may change, but the sign of the inequality does not. In other words, the ordinal relationship between the average brightnesses of the <left-eye, forehead> pair is invariant under lighting changes. Figure 3 shows several other such pair-wise invariances. It seems, therefore, that local ordinal relations may encode the stable facial attributes across different illumination conditions. An additional advantage to using ordinal relations is their natural robustness to sensor noise.
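This invariance is easy to verify computationally: any monotonically increasing intensity transform (a crude stand-in for a global illumination change) preserves the sign of every pairwise brightness difference, even though the absolute values change drastically. A minimal sketch; the patch values and transforms below are illustrative choices of ours, not measurements from figure 3:

```python
import numpy as np

# Hypothetical mean brightnesses of three face patches
# (left eye, forehead, cheek) -- illustrative values only
patches = np.array([60.0, 180.0, 120.0])

# Monotonically increasing intensity transforms standing in
# for different lighting conditions
transforms = [
    lambda v: 0.4 * v + 10.0,           # dimming with an offset
    lambda v: 255 * (v / 255) ** 2.2,   # gamma compression
    lambda v: np.sqrt(v) * 12.0,        # brightening
]

def ordinal_code(v):
    # Sign of every pairwise brightness difference between patches
    return np.sign(v[:, None] - v[None, :])

base = ordinal_code(patches)
for t in transforms:
    # Absolute values change drastically, but the ordinal code does not
    assert np.array_equal(ordinal_code(t(patches)), base)
```

Strictly increasing maps preserve order (and equality), which is exactly why the ordinal code survives the lighting changes illustrated in figure 2.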
Thus, it would seem that local ordinal representations may be well suited for devising compact representations, robust against large photometric variations, for at least some classes of objects. Notably, for similar reasons, ordinal measures have also been shown to be a powerful tool for simple, efficient, and robust stereo image matching [3].

Figure 3: The absolute brightnesses and their relative magnitudes change under different lighting conditions, but several pair-wise ordinal relationships stay invariant.

In what follows, we address an important open question regarding the expressiveness of the ordinal representation scheme. Given that this scheme ignores absolute luminance and contrast magnitude information, an obvious question that arises is whether such a crude representation strategy can encode object/image structure with any fidelity.

2 Information Content of Local Ordinal Encoding

Figure 4 shows how we define ordinal relations between an image region pa and its immediate neighbors pb = {pa1, . . . , pa8}. In the conventional rectilinear grid, when all image regions pa are considered, four of the eight relations are redundant; we encode the remaining four as {1, 0, −1} based on the difference in luminance between two neighbors being positive, zero, or negative, respectively. To demonstrate the richness of information encoded by this scheme, we compare the original image to one produced by a function that reconstructs the image using local ordinal relationships as constraints.
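The four non-redundant relations per pixel described above can be computed with simple shifted differences. A small sketch; the function name and the particular choice of the four independent neighbor directions are ours:

```python
import numpy as np

def ordinal_relations(img):
    """Local ordinal code of an image: the sign (in {-1, 0, +1}) of the
    luminance difference for each of the four non-redundant neighbor
    pairs on a rectilinear grid (right, down, and the two diagonals).
    Illustrative sketch; the remaining four of the eight relations are
    the negations of these."""
    img = np.asarray(img, dtype=float)
    right      = np.sign(img[:, 1:] - img[:, :-1])     # horizontal pairs
    down       = np.sign(img[1:, :] - img[:-1, :])     # vertical pairs
    down_right = np.sign(img[1:, 1:] - img[:-1, :-1])  # diagonal pairs
    down_left  = np.sign(img[1:, :-1] - img[:-1, 1:])  # anti-diagonal pairs
    return right, down, down_right, down_left
```

Each returned array holds the sign of the neighbor's luminance minus the pixel's; together the four maps carry the same information as all eight relations at every pixel.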
Our reconstruction function has the form

    f(x) = w · φ(x),    (1)

where x = {i, j} is the position of a pixel, f(x) is its intensity, φ is a map from the input space into a high (possibly infinite) dimensional space, w is a hyperplane in this high-dimensional space, and u · v denotes an inner product.

Infinitely many reconstruction functions could satisfy the given ordinal constraints. To make the problem well-posed we regularize [10] the reconstruction function subject to the ordinal constraints, as done in ordinal regression for ranking document retrieval results [5]. Our regularization term is a norm in a Reproducing Kernel Hilbert Space (RKHS) [2, 11].

Figure 4: Ordinal relationships between an image region pa and its neighbors. The neighbors' relations to the pixel of interest in the example shown: I(pa) < I(pa1), I(pa) = I(pa2), I(pa) < I(pa3), I(pa) < I(pa4), I(pa) > I(pa5), I(pa) < I(pa6), I(pa) < I(pa7), I(pa) < I(pa8).
Minimizing the norm in a RKHS subject to the ordinal constraints corresponds to the following convex constrained quadratic optimization problem:

    min_{ξ,w}  (1/2)||w||² + C Σ_p ξ_p    (2)

subject to

    θ(δ_p) w · (φ(x_pa) − φ(x_pb)) ≥ |δ_p| − ξ_p,  ∀p, and ξ ≥ 0,    (3)

where the function θ(y) = +1 for y ≥ 0 and −1 otherwise; p is the index over all pairwise ordinal relations between all pixels pa and their local neighbors pb (as depicted in figure 4); ξ_p are slack variables which are penalized by C (the trade-off between smoothness and the ordinal constraints); and δ_p take integer values {−1, 0, 1} denoting the ordinal relation (less than, equal to, or greater than, respectively) between pa and pb. For the case δ_p = 0 the inequality in (3) becomes a strict equality.

Taking the dual of (2) subject to constraints (3) results in the following convex quadratic optimization problem, which has only box constraints:

    max_α  Σ_p |δ_p| α_p − (1/2) Σ_p Σ_q α_p α_q K̃_pq    (4)

subject to

     0 ≤ α_p ≤ C  if δ_p > 0,
    −C ≤ α_p ≤ C  if δ_p = 0,
    −C ≤ α_p ≤ 0  if δ_p < 0,    (5)

where α_p are the dual Lagrange multipliers, and the elements of the matrix K̃ have the form

    K̃_pq = (φ(x_pa) − φ(x_pb)) · (φ(x_qa) − φ(x_qb))
         = K(x_pa, x_qa) − K(x_pb, x_qa) − K(x_pa, x_qb) + K(x_pb, x_qb),

where K(y, x) = φ(y) · φ(x), using the standard kernel trick [8].
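Because each entry of K̃ reduces to four ordinary kernel evaluations, the dual never needs φ explicitly. A sketch of building K̃ for a list of constraint pairs, using a Gaussian kernel for concreteness (the function names are ours):

```python
import numpy as np

def K(y, x, sigma=1.0):
    """Gaussian kernel K(y, x) = exp(-||x - y||^2 / (2 sigma^2))."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.exp(-np.dot(d, d) / (2 * sigma**2))

def K_tilde(pairs, sigma=1.0):
    """K~_pq = K(x_pa, x_qa) - K(x_pb, x_qa) - K(x_pa, x_qb) + K(x_pb, x_qb)
    for a list of constraint pairs (x_pa, x_pb), one per ordinal relation."""
    n = len(pairs)
    Kt = np.zeros((n, n))
    for p, (pa, pb) in enumerate(pairs):
        for q, (qa, qb) in enumerate(pairs):
            Kt[p, q] = (K(pa, qa, sigma) - K(pb, qa, sigma)
                        - K(pa, qb, sigma) + K(pb, qb, sigma))
    return Kt
```

K̃ is symmetric and positive semi-definite (it is the Gram matrix of the difference vectors φ(x_pa) − φ(x_pb)), which is what makes the dual a convex problem.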
In this paper we use only Gaussian kernels, K(y, x) = exp(−||x − y||²/2σ²). The reconstruction function f(x), obtained by optimizing (4) subject to the box constraints (5), has the following form:

    f(x) = Σ_p α_p (K(x, x_pa) − K(x, x_pb)).    (6)

Note that in general many of the α_p values may be zero; these terms do not contribute to the reconstruction, and the corresponding constraints in (3) were not required. The remaining α_p with absolute value less than C satisfy the inequality constraints in (3), whereas those with absolute value at C violate them.

Figure 5: Reconstruction results from the regularization approach. (a) Original images. (b) Reconstructed images. (c) Absolute difference between original and reconstruction. (d) Histogram of absolute difference.

Figure 5 depicts two typical reconstructions performed by this algorithm. The difference images and error histograms suggest that the reconstructions closely match the source images.

3 Discussion

Our reconstruction results suggest that the local ordinal representation can faithfully encode image structure. Thus, even though individual ordinal relations are insensitive to absolute luminance or contrast magnitude, a set of such relations implicitly encodes metric information. In the context of the human visual system, this result suggests that the rapidly saturating contrast response functions of the early visual neurons do not significantly hinder their ability to convey accurate image information to subsequent cortical stages.

An important question that arises here concerns the strengths and limitations of local ordinal encoding.
The first key limitation is that for any choice of neighborhood size over which ordinal relations are extracted, there are classes of images for which the local ordinal representation will be unable to encode the metric structure. For a neighborhood of size n, an image with regions of different luminance embedded in a uniform background and mutually separated by a distance greater than n would constitute such an image. In general, sparse images present a problem for this representation scheme, as might foveal or cortical "magnification," for example. This issue could be addressed by using ordinal relations across multiple scales, perhaps in an adaptive way that varies with the smoothness or sparseness of the stimulus.

Second, the regularization approach above seems biologically implausible. Our intent in using this approach for reconstructions was to show via well-understood theoretical tools the richness of information that local ordinal representations provide. In order to address the neural plausibility requirement, we have developed a simple relaxation-based approach with purely local update rules of the kind that can easily be implemented by cortical circuitry. Each unit communicates only with its immediate neighbors and modifies its value incrementally up or down (starting from an arbitrary state) depending on the number of ordinal relations in the positive or negative direction. This computation is performed iteratively until the network settles to an equilibrium state.

Figure 6: Reconstruction results from the relaxation approach.
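This relaxation can be sketched in code. The step size, iteration count, and the sign convention (chosen here so that each update reduces the disagreement between the reconstruction's ordinal relations and the target's) are our own assumptions, not the paper's parameters:

```python
import numpy as np

def theta(y):
    """Idealized ordinal comparator of figure 1(b): +1 for y >= 0, -1 otherwise."""
    return 1.0 if y >= 0 else -1.0

def relax(I, steps=200, delta=0.5, seed=0):
    """Relaxation reconstruction in the spirit of the local update rule:
    starting from an arbitrary state, each unit nudges its value by delta
    for every immediate neighbor whose ordinal relation disagrees with the
    original image's. Purely local: each unit reads only its 8 neighbors."""
    I = np.asarray(I, dtype=float)
    H, W = I.shape
    R = np.random.default_rng(seed).uniform(I.min(), I.max(), (H, W))
    nbrs = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            if (dy, dx) != (0, 0)]
    for _ in range(steps):
        upd = np.zeros_like(R)
        for y in range(H):
            for x in range(W):
                for dy, dx in nbrs:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < H and 0 <= nx < W:
                        # sign chosen so R[y, x] moves toward the target relation
                        upd[y, x] += (theta(I[y, x] - I[ny, nx])
                                      - theta(R[y, x] - R[ny, nx]))
        R += delta * upd
    return R
```

Once every local ordinal relation of R matches that of I, every update term vanishes and the state is an equilibrium, which is the settling behavior described above.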
The update rule can be formally stated as

    R_{pa,t+1} = R_{pa,t} + Δ Σ_{pb} (θ(R_{pa,t} − R_{pb,t}) − θ(I_pa − I_pb)),    (7)

where R_{pa,t} is the intensity of the reconstructed pixel pa at step t, I_pa is the intensity of the corresponding pixel in the original image, Δ is a positive update rate, and θ and pb are as described above. Figure 6 shows four examples of image reconstructions performed using a relaxation-based approach.

A third potential limitation is that the scheme does not appear to constitute a compact code. If each pixel must be encoded in terms of its relations with all of its eight neighbors, where each relation takes one of three values, {−1, 0, 1}, then what has been gained over the original image, where each pixel is encoded by 8 bits? There are three ways to address this question.

1. Eight relations per pixel is highly redundant; four are sufficient. In fact, as shown in figure 7, the scheme can also tolerate several missing relations.

Figure 7: Five reconstructions, shown here to demonstrate the robustness of local ordinal encoding to missing inputs. From left to right: reconstructions based on 100%, 80%, 60%, 40%, and 20% of the full set of immediate neighbor relations.

2. An advantage to using ordinal relations is that they can be extracted and transmitted much more reliably than metric ones. These relations share the same spirit

Figure 8: A small collection of ordinal relations (a), though insufficient for high-fidelity reconstruction, is very effective for pattern classification despite significant appearance variations. (b) Results of using a local ordinal relationship based template to detect face patterns. The program places white dots at the centers of patches classified as faces.
(From Thoresz and Sinha, in preparation.)

as loss functions used in robust statistics [6] and trimmed or Winsorized estimators.

3. The intent of the visual system is often not to encode/reconstruct images with perfect fidelity, but rather to encode the most stable characteristics that can aid in classification. In this context, a few ordinal relations may suffice for encoding objects reliably. Figure 8 shows the results of using fewer than 20 relations for detecting faces. Clearly, such a small set would not be sufficient for reconstructions, but it works well for classification. Its generalization ability arises because it defines an equivalence class of patterns.

In summary, the ordinal representation scheme provides a neurally plausible strategy for encoding signal structure. While in this paper we focus on demonstrating the fidelity of this scheme, we believe that its true strength lies in defining equivalence classes of patterns, enabling generalization over appearance variations in objects. Several interesting directions remain to be explored. These include the study of ordinal representations across multiple scales, learning schemes for identifying subsets of ordinal relations consistent across different instances of an object, and the relationship of this work to multi-dimensional scaling [12] and to the use of truncated, quantized wavelet coefficients as "signatures" for fast, multiresolution image querying [7].

Acknowledgements

We would like to thank Gadi Geiger, Antonio Torralba, Ryan Rifkin, Gonzalo Ramos, and Tabitha Spagnolo. Javid Sadr is a Howard Hughes Medical Institute Pre-Doctoral Fellow.

References

[1] A. Anzai, M. A. Bearse, R. D. Freeman, and D. Cai. Contrast coding by cells in the cat's striate cortex: monocular vs. binocular detection. Visual Neuroscience, 12:77-93, 1995.

[2] N. Aronszajn. Theory of reproducing kernels. Trans. Amer. Math.
Soc., 68:337-404, 1950.

[3] D. Bhat and S. Nayar. Ordinal measures for image correspondence. In IEEE Conf. on Computer Vision and Pattern Recognition, pages 351-357, 1996.

[4] G. C. DeAngelis, I. Ohzawa, and R. D. Freeman. Spatiotemporal organization of simple-cell receptive fields in the cat's striate cortex. I. General characteristics and postnatal development. J. Neurophysiology, 69:1091-1117, 1993.

[5] R. Herbrich, T. Graepel, and K. Obermayer. Support vector learning for ordinal regression. In Proc. of the Ninth Intl. Conf. on Artificial Neural Networks, pages 97-102, 1999.

[6] P. Huber. Robust Statistics. John Wiley and Sons, New York, 1981.

[7] C. E. Jacobs, A. Finkelstein, and D. H. Salesin. Fast multiresolution image querying. In Computer Graphics Proc., Annual Conf. Series (SIGGRAPH 95), pages 277-286, 1995.

[8] T. Poggio. On optimal nonlinear associative recall. Biological Cybernetics, 19:201-209, 1975.

[9] K. Thoresz and P. Sinha. Qualitative representations for recognition. Vision Sciences Society Abstracts, 1:81, 2001.

[10] A. N. Tikhonov and V. Y. Arsenin. Solutions of Ill-posed Problems. W. H. Winston, Washington, D.C., 1977.

[11] G. Wahba. Spline Models for Observational Data. Series in Applied Mathematics, Vol. 59, SIAM, Philadelphia, 1990.

[12] F. W. Young and C. H. Null. MDS of nominal data: the recovery of metric information with ALSCAL. Psychometrika, 53(3):367-379, 1978.
", "award": [], "sourceid": 2018, "authors": [{"given_name": "Javid", "family_name": "Sadr", "institution": null}, {"given_name": "Sayan", "family_name": "Mukherjee", "institution": null}, {"given_name": "Keith", "family_name": "Thoresz", "institution": null}, {"given_name": "Pawan", "family_name": "Sinha", "institution": null}]}