{"title": "A VLSI Neural Network for Color Constancy", "book": "Advances in Neural Information Processing Systems", "page_first": 370, "page_last": 376, "abstract": null, "full_text": "A VLSI Neural Network for Color Constancy \n\nAndrew Moore \nComputation and Neural Systems Program, 116-81 \nCalifornia Institute of Technology \nPasadena, CA 91125 \n\nGeoffrey Fox\u00b7 \nDept. of Physics \nCalifornia Institute of Technology \nPasadena, CA 91125 \n\nJohn Allman \nDept. of Biology, 216-76 \nCalifornia Institute of Technology \nPasadena, CA 91125 \n\nRodney Goodman \nDept. of Electrical Engineering, 116-81 \nCalifornia Institute of Technology \nPasadena, CA 91125 \n\n\u00b7Present address: Dept. of Physics, Syracuse University, Syracuse, NY 13244 \n\nAbstract \n\nA system for color correction has been designed, built, and tested successfully; \nthe essential components are three custom chips built using subthreshold \nanalog CMOS VLSI. The system, based on Land's Retinex theory \nof color constancy, produces colors similar in many respects to those \nproduced by the visual system. Resistive grids implemented in analog \nVLSI perform the smoothing operation central to the algorithm at video \nrates. With the electronic system, the strengths and weaknesses of the \nalgorithm are explored. \n\n1 A MODEL FOR COLOR CONSTANCY \n\nHumans have the remarkable ability to perceive object colors as roughly constant \neven if the color of the illumination is varied widely. Edwin Land, founder of the \nPolaroid Corporation, models the computation that results in this ability as three \nidentical center-surround operations performed independently in three color planes, \nsuch as red, green, and blue (Land, 1986). The basis for this model is as follows. \n\nConsider first an array of grey papers with different reflectances. (Land designated \nthese arrays Mondrians, since they resemble the works of the Dutch painter Piet \nMondrian.) Land illuminated a Mondrian with a gradient of illumination, ten times \nbrighter at the top than at the bottom, so that the flux reaching the eye from a \ndark grey patch at the top was identical to the flux from a light grey patch at the bottom. \nSubjects reported that the top paper was dark grey and the bottom paper was \nlight grey. Land accounted for this with a center-minus-surround model. At each \npoint in an image, the incoming light is compared to a spatial average of light in \nthe neighborhood of the point in question. Near the top of the Mondrian, the \nabundance of white is sensed and subtracted from the central sensor to normalize \nthe central reading with respect to neighboring values, weighted with distance; \nnear the bottom, the abundance of dark is sensed and used to correct the central \nreading. Land proposed that the weighting function of the surround is a monotonic \ndecreasing function of distance, such as $1/r^2$. \n\nIn earlier work, similar experiments were carried out with color Mondrians (Land, \n1977; McCann et al., 1976). However, instead of varying the intensity of illumination, \nLand and his colleagues varied the color of the illumination. The color of \npatches in a Mondrian remained nearly constant despite large changes in the illuminant \ncolor. This is the phenomenon of color constancy: the ability of observers \nto judge, under a wide variety of lighting conditions, the approximate reflectance \nor intrinsic color of objects. Land and his colleagues proposed a variety of different \nmodels for this phenomenon, collectively referred to as Retinex models. (The term \nRetinex was coined by Land since he was not sure whether the computation was \ngoing on in the retina, the cortex, or both.) 
In his most recent paper on the subject \n(Land, 1986), Land simply extended the black-and-white model to the three color \ndimensions. In each of three independent color planes, the color at a given point is \ncompared to that of the points surrounding it, weighted as $1/r^2$. \n\n2 EFFICIENT CALCULATION OF THE SURROUND \n\nIn practical terms, the Retinex algorithm corresponds to subtracting from an image \na blurred version of itself. The distance weighting (type of blurring) Land proposes \nvaries as $1/r^2$, so the operation is a center minus surround operation, where the \nsurround is the center convolved with a $1/r^2$ kernel: \n\n$$ output_i = I'_i - I'_i * (1/r^2) $$ (1) \n\nwhere $I_i$ is the signal or lightness in color plane $i$, $I'_i$ is the log of the signal, \nand $*$ denotes two-dimensional convolution. The \nlogs are important since the signal is composed of illuminant times reflectance and \nthe log of a product is a sum. By subtracting the blurred version of the image after \ntaking logs, the illuminant is subtracted away in the ideal case (but see below). \n\nThis type of Retinex algorithm, then, has a psychophysical basis and sound computational \nunderpinnings (Hurlbert, 1986). But the complexity is too great. Since the \nrequired surround is so large, such a convolution across an N x N pixel image entails \non the order of $N^4$ operations. On a chip, this corresponds to explicit connections \nfrom each pixel to most if not all other pixels. \n\nA similar operation can be carried out much more efficiently by switching from \na convolution to a resistive grid calculation. The operations are similar since the \nweighting of neighboring points (Green's function) in a resistive grid decreases in \nthe limit as the exponential of the distance from a given location on a resistive grid \n(Mead, 1989). Again, the kernel is a monotonic decreasing function. 
With this type \nof kernel, the operation in each Retinex (color channel) is \n\n$$ output_i = I'_i - I'_i * e^{-|r|/\u03bb} $$ (2) \n\nwhere $*$ denotes convolution and \u03bb is the length constant or extent of weighting in the grid. Since the calculation \nis purely local, the complexity is reduced dramatically from $O(N^4)$ to $O(N^2)$. \nOn a chip, a local computation corresponds to connections only between nearest-neighbor \npixels. \n\n3 EVALUATION OF THE ALGORITHM WITH COMPUTER SIMULATIONS \n\n3.1 STRENGTHS AND WEAKNESSES OF THE ALGORITHM \n\nImages of a subject holding a color poster were captured under fluorescent and \nincandescent light with an RGB video camera and a 24-bit frame grabber. First, \nthe camera was adjusted so that the color looked good under fluorescent light. Next, \nwithout readjusting the camera, the fluorescents were turned off and the subject was \nilluminated with incandescent light. The results were unacceptable. The skin color \nwas very red, and, since the incandescent lamp was not very bright, the background \nwas lost in darkness. The two images were processed with the Land algorithm, using \nresistive grids to form the surround for subtraction. Details of the simulations and \ncolor images can be found in (Moore et al., 1991). For the good, fluorescent image, \nthe processing improved the image contrast somewhat. For the poor, incandescent \nimage, the improvement was striking. Skin color was nearly normal, shadows were \nsoftened, and the background was pulled out of darkness. \n\nComputer simulation also pointed out two weaknesses of the algorithm: color Mach \nbands and the greying out of large monochromatic regions. Color Mach bands \narise from this algorithm in the following way. Suppose that a strongly colored \nregion, e.g. red, abuts a grey region. In the grey region, the surround subtracted \nat a given point has a strong red component. 
Therefore, after subtraction of the \nsurround, a grey point is rendered as grey minus red, or equivalently, grey plus \nthe complementary color of red, which is blue-green. Since the surround weighting \ndecreases with distance, the points in the image closest to the red area are strongly \ntinged with blue-green, while points further away are less discolored. Induction of \nthis sort in black-and-white images is known as the Mach band effect. An analogous \ninduction effect in color is intrinsic to this algorithm. \n\nGreying out of large colored areas is also an intrinsic weakness of the algorithm. \nThe surrounds used in the simulations are quite large, with a length constant of \nnearly one third of the image. Often a large portion of an image is of a single color, \ne.g. a blue sky commonly fills the upper half of many natural scenes. In the sky \nregion, the surround samples mostly blue, and with subtraction, blue is subtracted \nfrom blue, leaving a grey sky. This effect illustrates the essence of the algorithm: \nit operates under a grey world assumption. The image for which this algorithm \nis ideal is richly colored, with reds and their green complements, yellows and their \nblue complements, and whites with their black complements. In such images, the \nlarge surround is sampling the color of a grey \"mirror\", since the sum of a color \nand its complement is grey. If this condition holds, the color subtracted when the \nsurround is subtracted from a point in the image is the color of the illuminant; \nthe surround acts as a dull grey mirror which reflects the illuminant. [Many color \nconstancy schemes rely on this assumption; for a review see (Lennie and D'Zmura, \n1988).] \n\n3.2 AN EXTENSION TO THE LAND ALGORITHM \n\nThese two weaknesses arise from too much surround subtraction in solidly colored \nareas. 
One way to minimize these effects is to modulate the surround with a measure \nof image structure, which we call edginess, before subtraction. So, while for the \noriginal algorithm, the operation is output = center - surround, to ameliorate \ninduction effects and lessen reliance on the grey world assumption, the surround \nweight should be modified pointwise. In particular, if edginess is given a value \nclose to zero in homogeneous regions like the blue sky, and is given a value close \nto one in detailed areas, a better formulation is output = center - surround \u00d7 \nedginess. In this relation, the surround is effectively zeroed in smooth areas before \nit is subtracted, so that induction is diminished and more of the original color is \nretained. The extended algorithm, then, is a working compromise between color \nconstancy via strict application of the grey world assumption and no color constancy \nat all. To compute a measure of spatial structure, the average magnitude of the \nfirst spatial derivatives is found at each point in each color plane and smoothed on \na second resistive grid; the output at a given point is multiplied with the surround value \nfrom the corresponding point of the first resistive grid. In our simulations, the modified \nalgorithm reduces (but does not eliminate) color Mach bands, and returns color to \nlarge monochromatic regions such as the sky in the example image discussed \nabove, at the cost of one additional resistive grid per color channel. This extension \nis not the whole answer, however. If a large region is highly textured (for example, \nif there is a flock of birds in the sky), edginess is high, the surround is subtracted \nat near full strength, and the sky is rendered grey in the textured region. This is \na subject of continuing research. We implemented the original algorithm, but not \nthis extension of it, using analog VLSI. 
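
The original operation (output = center - surround) and the edginess-weighted extension of Section 3.2 can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the simulation code used for the results above. The function names (resistive_grid, retinex_plane, edginess, extended_retinex_plane), the parameters length_const and n_iter, and the test arrays are hypothetical. The analog grid is replaced here by a Jacobi relaxation of a nearest-neighbor network; the derivatives are taken on the log plane; and edginess is scaled to the range 0 to 1 by dividing by its maximum. The text does not specify any of these choices.

```python
import numpy as np


def resistive_grid(x, length_const, n_iter=500):
    # Software stand-in for one resistive grid (an assumption, not the chip):
    # Jacobi relaxation of  v - length_const**2 * laplacian(v) = x
    # with zero-flux borders. Each node only uses its 4 nearest neighbours.
    # The result is approximate; larger length constants need more iterations.
    lam2 = float(length_const) ** 2
    x = np.asarray(x, dtype=float)
    v = x.copy()
    for _ in range(n_iter):
        p = np.pad(v, 1, mode='edge')
        neighbours = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
        v = (x + lam2 * neighbours) / (1.0 + 4.0 * lam2)
    return v


def retinex_plane(intensity, length_const=4.0):
    # Original operation in one color plane: log center minus smoothed surround.
    # Intensities must be strictly positive because of the logarithm.
    log_i = np.log(intensity)
    return log_i - resistive_grid(log_i, length_const)


def edginess(log_i, length_const=4.0):
    # Average magnitude of the first spatial derivatives, smoothed on a
    # second grid. Dividing by the maximum is an assumed normalization.
    gy, gx = np.gradient(log_i)
    magnitude = 0.5 * (np.abs(gx) + np.abs(gy))
    smooth = resistive_grid(magnitude, length_const)
    peak = smooth.max()
    return smooth / peak if peak > 0 else np.zeros_like(smooth)


def extended_retinex_plane(intensity, length_const=4.0):
    # Extended operation: output = center - surround * edginess.
    log_i = np.log(intensity)
    surround = resistive_grid(log_i, length_const)
    return log_i - surround * edginess(log_i, length_const)


rng = np.random.default_rng(0)
scene = rng.uniform(0.1, 1.0, size=(16, 16))  # one richly varied color plane
flat = np.full((16, 16), 0.5)                 # one large uniform region

plain_scene = retinex_plane(scene)
plain_scaled = retinex_plane(2.5 * scene)     # same scene, stronger illuminant
plain_flat = retinex_plane(flat)
extended_flat = extended_retinex_plane(flat)
```

Under these assumptions, scaling a plane by a constant (a change in illuminant strength for that channel) leaves the original output unchanged, a uniform plane is driven to zero by the original operation (the greying-out weakness), and the extended operation leaves the uniform plane at its log value because edginess is zero there.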
\n\n4 VLSI IMPLEMENTATION OF THE RETINEX ALGORITHM \n\nTo realize a real-time electronic system for video camera color correction based on \nLand's algorithm, the three color outputs of a video camera are fed onto three \nseparate resistive grids built from subthreshold analog CMOS VLSI. Each 48 by 47 \nnode resistive grid was built using 2 micron design rules and contains about 60,000 \ntransistors. The circuit details within each pixel are similar to those of the analog \nretina (Mead, 1989); technical details of the system may be found in (Moore et al., \n1991). \n\nComputer simulations are quite costly in terms of time and disk storage. With a \nreal-time system, it is possible to intensively investigate the strengths and weaknesses \nof this color correction algorithm quickly and economically. \n\n4.1 REAL-TIME VERIFICATION OF ALGORITHM STRENGTHS \n\n4.1.1 Dynamic range enhancement \n\nA common problem with video imaging is that the range of an image exceeds the \ndynamic range of the camera sensors. For example, consider an image comprised \nof an indoor scene and an outdoor scene viewed through a window. The outdoor \nillumination (e.g., direct sunlight) can be one thousand times or more brighter than \nthe indoor illumination (e.g., artificial lights or indirect sunlight). A video camera \ncan only capture one portion of the scene with fidelity. By opening up the camera \niris so that a lot of light falls on the camera sensors, the indoor scene looks good, \nbut the outdoor scene is awash in white. Conversely, by closing the camera iris so \nthat less light falls on the camera sensors, the outdoor scene looks good, but the \nindoor scene is rendered as deep black. \n\nIn fact, the image information is often not lost in this troublesome situation. Most \nsensors are not linear, but instead have a response function that resembles a hyperbolic \ntangent. 
Rather than saturating at the extremes of the response range, most \nsensors compress information near those response extremes. With a center-surround \nprocessing stage following a camera, the information \"squashed\" near the camera \nrange limits can be recovered. In extremely bright portions of an image, white is \nsubtracted from white, \"pulling\" the signal toward the mid-range, so that details in \nthat portion of the scene become defined. Similarly, in dark portions of the scene, \ndark is subtracted from dark and the details of the indoor portion of the example \nimage are visible. Thus the Land algorithm as applied to video imaging can enhance \nthe dynamic range of video cameras. [This strength of the algorithm was predicted \nfrom the similar capability of the (black-and-white) silicon retina (Mead, 1989), which \nhas a dynamic range that far exceeds the range of conventional cameras since it \nincorporates light sensors and center-surround processing on one chip.] \n\n4.1.2 Color constancy \n\nFor a richly colored scene, the Land algorithm can remove strongly colored illumination, \nwith some qualifications. We constructed a color Mondrian with many \ndifferently colored patches of paper, and illuminated it with ordinary fluorescent \nlight plus various colored lights. Under a wide range of conditions, the color of \nthe Mondrian as viewed on a video monitor changes with the illumination, while \nthe Mondrian viewed directly looks fairly stable to an observer. After passing the images through the electronic \ncolor compensation system, the image is also fairly stable for a wide variety \nof illumination conditions. There is a significant difference, however, between what \nan observer sees and what the corrected camera image reports. 
The video images \npassed through the electronic implementation of the Land algorithm take on the \nilluminant color somewhat in portions of the image that are brighter than average, and \ntake on the complementary color of the illuminant in portions that are darker than \naverage. For example, for a blue illumination, the raw video image looks bluer all \nover. The processed image changes in a different way. White patches are faintly \nblue (much less so than in the raw image), and black patches (which remain \nblack in the raw image) are tinged with yellow. There is psychophysical evidence \nthat the same effects are noted by human observers (see Jameson and Hurvich, \n1989, for a review), but in our experience they are much less pronounced than those produced by \nthe Land algorithm. Still, the overall effect of constancy in the \nprocessed images is convincing as compared to the raw images. \n\n4.2 REAL-TIME VERIFICATION OF ALGORITHM WEAKNESSES \n\n4.2.1 Color Mach bands and greying of large regions \n\nTo our surprise, the color Mach band effect, explained above, is less pronounced \nthan we expected; for many scenes the induction effects are not noticeable. It \nis possible to see the Mach bands clearly by placing colored cards on a grey \nbackground: the complementary color of the card surrounds the card as a halo \nthat diminishes with distance from the card. \n\nSince the Retinex algorithm relies on the grey world assumption, the algorithm fails \nwhere this assumption fails to hold. With the real-time system, we have demonstrated \nthis in many ways. For example, if the video camera is pointed at the color \nMondrian and the hand of a Caucasian investigator (with a reddish skin tone) is \nslowly moved in front of the camera lens, the Mondrian in the background slowly \ngrows more green. Green is the complementary color of red. 
Another example of \npractical importance is revealed by zooming in on a particular patch of the Mondrian. \nAs more and more of the image is filled with this patch, the patch grows \ngreyer and greyer, because the correction system subtracts the patch color from \nitself. \n\n4.2.2 Scene dependence of color constancy \n\nAs described above, we were impressed with this algorithm after simulating it on \na digital computer. The skin tone of a subject, deeply reddened by incandescent \nlight, was dramatically improved by the algorithm. In the computer study, the \nsubject's face was, by accident rather than design, just in the middle of a large \nwhite patch and a large black patch. The electronic system yields perfect constancy \nof skin tone with this configuration also, but not for an arbitrary configuration. \nIn short, the color constancy afforded by this algorithm is scene dependent; to \nconsistently produce perfect color constancy of an object with the real-time system, \nit is necessary to place the object carefully within a scene. We are still investigating \nthis weakness of the algorithm. Whether it is camera dependent (i.e., the result of \ncamera nonlinearities) remains to be seen. \n\n5 CONCLUSION \n\nAfter studying the psychophysics and the computational issues in color constancy, \nwe obtained encouraging preliminary results for a particular version of Land's Retinex algorithm \nin computer simulation. In order to study the algorithm intensively, \nan electronic system was developed; the system uses three resistive grids built from \nsubthreshold analog CMOS VLSI to form a blurred version of the image for subtraction \nfrom the original. It was found that the system produces images that are \nmore constant, in a sense, than raw video images when the illuminant color varies. 
\nHowever, the constancy is more apparent than real; if absolute constancy of a particular \nobject is desired, that object must be carefully placed in its surroundings. \nThe real-time system allowed us to address this and other such practical issues of \nthe algorithm for the first time. \n\nAcknowledgements \n\nWe are grateful to many of our colleagues at Caltech and elsewhere for discussions \nand support in this endeavor. A.M. was supported by fellowships from the Parsons \nFoundation and the Pew Charitable Trust and by research assistantships from the Office \nof Naval Research, the Joint Tactical Fusion Program and the Center for Research in \nParallel Computation. We are grateful to DARPA for MOSIS fabrication services, \nand to Hewlett Packard for computing support in the Mead Lab. The California \nInstitute of Technology has filed for a U.S. patent for this and other related work. \n\nReferences \n\nA. Hurlbert. (1986) Formal connections between lightness algorithms. J. Opt. Soc. \nAm. A 3:1684-1693. \n\nD. Jameson & L.M. Hurvich. (1989) Essay concerning color constancy. Ann. Rev. \nPsychol. 40:1-22. \n\nE.H. Land. (1977) The Retinex theory of color vision. Scientific American 237:108-128. \n\nE.H. Land. (1986) An alternative technique for the computation of the designator \nin the retinex theory of color vision. Proc. Natl. Acad. Sci. USA 83:3078-3080. \n\nP. Lennie & M. D'Zmura. (1988) Mechanisms of color vision. CRC Crit. Rev. \nNeurobiol. 3(4):333-400. \n\nJ.J. McCann, S.P. McKee, & T.H. Taylor. (1976) Quantitative studies in Retinex \ntheory. Vision Res. 16:445-458. \n\nC.A. Mead. (1989) Analog VLSI and Neural Systems. Reading, MA: Addison-Wesley. \n\nA. Moore, J. Allman, & R. Goodman. (1991) A Real-time Neural System for Color \nConstancy. IEEE Trans. Neural Networks 2(2) In press. 
\n", "award": [], "sourceid": 396, "authors": [{"given_name": "Andrew", "family_name": "Moore", "institution": null}, {"given_name": "John", "family_name": "Allman", "institution": null}, {"given_name": "Geoffrey", "family_name": "Fox", "institution": null}, {"given_name": "Rodney", "family_name": "Goodman", "institution": null}]}