{"title": "Unsupervised Color Decomposition Of Histologically Stained Tissue Samples", "book": "Advances in Neural Information Processing Systems", "page_first": 667, "page_last": 674, "abstract": "", "full_text": "Unsupervised Color Decomposition of\nHistologically Stained Tissue Samples\n\nA. Rabinovich\n\nDepartment of Computer Science\nUniversity of California, San Diego\n\namrabino@ucsd.edu\n\nS. Agarwal\n\nDepartment of Computer Science\nUniversity of California, San Diego\n\nsagarwal@cs.ucsd.edu\n\nC. A. Laris\nQ3DM, Inc.\n\nclaris@q3dm.com\n\nJ.H. Price\n\nDepartment of Bioengineering\nUniversity of California, San Diego\n\njhprice@ucsd.edu\n\nS. Belongie\n\nDepartment of Computer Science\nUniversity of California, San Diego\n\nsjb@cs.ucsd.edu\n\nAbstract\n\nAccurate spectral decomposition is essential for the analysis and\ndiagnosis of histologically stained tissue sections. In this paper we\npresent the first automated system for performing this decomposition.\nWe compare the performance of our system with ground\ntruth data and report favorable results.\n\n1 Introduction\n\nPotentially cancerous tissue samples are analyzed by staining them with a combination\nof two or more dyes. We consider the problem of recovering the amount of\ndye absorbed for each of the stains from a stack of hyperspectral images of the tissue\nsample. Since the exact spectral profile of the dyes varies from one experiment\nto the next and is not available to the pathologist, the problem is an instance of\nblind source separation. The problem is of special interest to clinical and research\npathologists as the amount of dye absorbed by the sample is used to determine a\nquantitative estimate of the amount of cancerous cells present in the tissue.\nThe current state-of-the-art solution requires an expert to hand-click representative\npoints in the tissue image to indicate \u201cpure\u201d dye spectra. 
This procedure requires\nhuman intervention and hence is time-consuming and error-prone.\nIn this paper we present the first system capable of performing this color decomposition\nin a fully automated manner. We also describe a novel procedure for acquiring\nthe ground truth data and quantifying the performance of our system.\n\nThe organization of the paper is as follows. In Section 2 we address the problem of\nimage alignment in hyperspectral stacks. Section 3 presents the problem of color\nunmixing and proposes two unsupervised techniques as solutions. Data acquisition\nand experiments are discussed in Section 4. Section 5 summarizes the study and\nprovides concluding remarks.\n\n2 Multi-Spectral Alignment\n\nColor unmixing is a challenging problem in itself, but it is complicated further by the\npracticalities of multispectral imaging: the component spectral images are usually\nmisaligned, due to chromatic aberration and shifting of the stage. If the images\ncomprising the spectral stack are out of alignment by as little as half a pixel, the\nestimated stain percentages at a given pixel can be altered drastically. This can\nresult in large inaccuracies in the resulting cancer diagnosis.\nEmpirically, we have observed that the misalignments between images in the spectral\nstack can be modeled as small affine transforms, i.e. global translation, stretching,\nand rotation. Letting I(x) and J(x) denote two images, where x = (x, y)^T,\nthis assumption is expressed as\n\nJ(Ax + d) = I(x)\n\nwhere A is the 2 \u00d7 2 matrix of affine coefficients\n\nA = \\begin{bmatrix} a_{11} & a_{12} \\\\ a_{21} & a_{22} \\end{bmatrix}\n\nand d is a 2D translation vector.\nIn the case of unimodal images, the iterative method of Shi and Tomasi [12] has\nbeen very successful for the estimation of differential (subpixel) affine transforms,\ne.g. from frame to frame in a video sequence. 
However, feeding cross-modal images\ndirectly to this algorithm is ineffective since they violate the brightness constancy\nassumption [3]. We have observed, however, that the high spatial-frequency structures,\ne.g. edges and lines, tend to be consistent throughout the stack. This forms\nthe basis of our alignment technique. We use the Shi-Tomasi algorithm on a bandpass\nfiltered version of the images in the stack. To perform the filtering we apply a\nLaplacian of Gaussian (LoG) kernel [8], expressed as\n\nh(x) = \\nabla^2 e^{-\\|x\\|^2 / 2\\sigma^2}\n\nwhere \u03c3 controls the width of the filter, to each image. The LoG kernel acts as a\nbandpass filter, suppressing constant regions and smooth shading, admitting edges\nand lines, and suppressing high-frequency noise. We empirically determined the\noptimal parameters for the filtering to be \u03c3 = 0.5 and a window size of 10 pixels. With\nthis step used as preprocessing, Shi and Tomasi\u2019s algorithm is able to register the\npair of images. An example of a synthesized color image composed of a 3D spectral\nstack is shown with and without this registration step in Figure 1; the blurring\ncaused by misalignment and the subsequent sharpening resulting from registration\nis evident.\n\n3 Color Unmixing\n\nOnce the registration problem is adequately addressed, we can proceed with the\ndetermination of stain concentrations. The problem in its full generality is an\ninstance of the blind source separation problem. Given a spectral stack of n_s images
Given a spectral stack of ns images\n\n\fFigure 1: Synthesized color image representation of the same tissue core from a 10\ndimensional spectral stack (a) with and (b) without di\ufb00erential a\ufb03ne registration.\n\nobtained from imaging a tissue sample stained with nd dyes, with ns > nd, we wish\nto recover the staining due to each individual dye.\nIn an ideal world, the spectral pro\ufb01le of each dye would be exactly aligned with one\nof the spectral bands, and the absorptions measured therein would directly yield\nthe stain concentrations. Realistically, however, the spectral pro\ufb01le of the dyes\noverlap and extend over several spectral bands, and the goal of recovering the nd\ncomponents representing the dye percentages requires more careful analysis.\nThe problem of unmixing the dyes can be formulated as a matrix factorization\nproblem:\n\nX = AS\n\n(1)\nHere X is an ns \u00d7 l column matrix, where l is the number of pixels and the entry\nXij is the brightness of the ith pixel in the image in to the jth spectral band. The\nmatrix A is an ns \u00d7 nd matrix where each column of the matrix corresponds to the\none of the dyes used in staining the tissue. S is a ns \u00d7 l matrix, with the entry Sij\nindicating the contribution of the ith dye to the jth pixel.\nThe current state of the art solution for this problem in the \ufb01eld of automated\npathology is Color Deconvolution [11], which yields acceptable results, but requires\nmanual interaction in the form of mouse clicks on seed colors for the dyes. This is\nan example of a supervised technique. However, given the data matrix X, there are\na number of ways in which Equation (1) can be solved in a completely automatic\nmanner without any human intervention. The three main classes of such meth-\nods are Principal Component Analysis (PCA), Non-negative Matrix Factorization\n(NMF) and Independent Component Analysis (ICA).\nIn this work we assume that staining is an additive process. 
Once a part of a tissue\nhas been stained with a dye, addition of another stain can only increase the staining.\nThe additivity of the stains combined with the physical constraint that each dye\ncolor will have a non-negative response in each frequency band implies that A and\nS are restricted to the class of non-negative matrices.\nMethods based on PCA work by enforcing orthogonality constraints on the columns\nof A and are not well suited for recovering the factorization AS. PCA depends\nheavily on cancellation effects, i.e. a balancing of positive and negative terms as\noccurs with Gibbs\u2019 phenomenon in Fourier series. This will result in PCA returning\nA and S with negative entries which have no physical basis. In the following we\nshall investigate the use of algorithms based on NMF and ICA.\n\n3.1 Non-negative Matrix Factorization\n\nNMF is in principle well suited to the task of color unmixing, as it finds a factorization\nof X into A and S such that\n\n[A, S] = \\arg\\min_{A,S} \\|X - AS\\| \\quad \\text{subject to} \\quad A_{ij} \\geq 0, \\; S_{ij} \\geq 0    (2)\n\nThe above problem is underconstrained; it has a scale ambiguity. Given a solution\n[A, S] of the above problem, [\u03b1A, S/\u03b1] for any \u03b1 > 0 is also a solution to this problem.\nWe resolve this ambiguity by constraining each column of A to have unit norm. This\ndoes not affect the final solution, since only the proportion of each stain is needed\nin the final analysis; the exact intensity of the constituent stain is not important.\nThe choice of the norm \u2016 \u00b7 \u2016 decides the particular algorithm used for performing\nthe minimization. We have implemented an iterative algorithm for recovering the\nnon-negative factorization of a matrix due to Lee & Seung [7]. We use the L2 norm\nas a measure of the error.\n\n3.2 Independent Component Analysis\n\nAn alternate approach to matrix factorization is Independent Component Analysis\n(ICA) [4]. 
While Non-negative Matrix Factorization is based on enforcing a\nnon-negativity constraint, it says nothing about the image formation process. ICA\nis based on a generative view of the data, where the data is assumed to be a\nresult of superposing a number of stochastically independent processes. In the\ncase of histological staining, this corresponds to assuming that each dye stains the\ntissue independently of all the other dyes. The rows of the matrix S represent the\nindividual stochastic processes and the columns of A code their interactions.\nWe implemented the Joint Approximate Diagonalization of Eigenmatrices (JADE)\nalgorithm to recover the independent components of X [2]. This algorithm calculates\nthe ICA decomposition of X by calculating the eigenvalue decomposition of\nthe cumulant tensor of the data. The eigenvectors of the cumulant tensor correspond\nto the independent components of the mixture.\n\n4 Experimental Results\n\n4.1 Sample Preparation and Data Acquisition\n\nThe histologically stained tissues used in this study were derived from human biopsies.\nThe tissues were fixed in Bouin\u2019s solution and embedded in paraffin. Dewaxed\ntissue sections were exposed to polyclonal antibodies (PAB) generated against synthetic\npeptides and confirmed to be specific for the proteins of interest. The sections\nwere stained using a diaminobenzidine (DAB)-based detection method employing\nthe Envision-Plus-Horseradish Peroxidase (HRP) system using an automated staining\ntechnique [5, 6]. The DAB immunohistochemistry stain used for the tissue\nsamples shown here covers the majority of the visible range of the color spectrum\nunder the transmission of white light.\nGreat care must be taken in the acquisition of color images since the extraction of\nspectral information is highly dependent on the quality of the raw data. 
Hyperspectral\nimaging has been shown to be the best means of doing so.\n\nA spectral image stack can be acquired using a number of different approaches. We\nuse a setup based on a set of fixed bandpass filters. The filters are placed in the\noptical path of the light in front of the light source or camera and transmit only\nthe desired wavelength bands.\nIn the following experiments the images were acquired on a scanning cytometer\n[1, 9, 10] with a 20x, 0.5 NA Fluor Nikon objective lens using a set of 10 equally\nspaced bandpass filters ranging from 413 nm to 663 nm. The dynamic range of each\nof the spectral bands was maximized by controlling the gain and the exposure of\nthe imaging system. This is required to ensure an accurate hyperspectral-to-RGB\nreconstruction for result visualization. It is important to note that the gain and\nexposure coefficients were inverted prior to the unmixing as they have no bearing\non the staining process.\nIn order to quantitatively evaluate the decomposition provided by NMF and ICA,\nwe prepared a set of ground truth data using the following procedure. Using a set of\nfour tissue samples, we first applied the DAB stain and captured the hyperspectral\nimage stack. We then added the hematoxylin stain and acquired a second image\nstack. The second stack serves as the input to our algorithm and the resulting\ndecomposition, which estimates the DAB staining, is compared with the first stack,\nwhich serves as the ground truth.\nWe now experimentally evaluate the use of NMF and ICA for the color decomposition\nproblem. While reconstruction error represents a simple quantitative measure,\nit does not provide a standard for judging how accurately the estimated components\nrepresent the dye concentrations. We quantify the performance by comparing\nthe ground truth single-stained image to the corresponding automatically extracted\ncomponent of the doubly-stained tissue sample. 
Figure 2 reports the performance\nof the two algorithms. The error measure used is\n\nerror = 100 \\times \\frac{\\sum_i (I_i - \\hat{I}_i)^2}{\\sum_i I_i^2}    (3)\n\nwhere the sum is over all pixels, and I_i and \\hat{I}_i denote the ground truth and the\nestimate, respectively. Figure 3 shows the results of applying NMF and ICA to an\nimage patch.\n\n         NMF     ICA\nset1     18.15   12.81\nset2     18.79   14.99\nset3      4.47   19.42\nset4      5.04   18.12\noverall  12.65   18.75\n\nFigure 2: This table shows the percent error for the two unmixing algorithms\nacross the four image sets. The four sets of images are available at\nhttp://vision.ucsd.edu/.\n\n(a) DAB only   (b) DAB & Hematoxylin   (c) NMF   (d) ICA\n\nFigure 3: Color unmixing using Non-negative Matrix Factorization and Independent\nComponent Analysis. Figure (a) shows a segment of the tissue stained using DAB,\n(b) shows the same tissue segment with DAB and Hematoxylin staining. The image\nin figure (b) serves as input to the two unmixing algorithms, the output of which is\nshown in (c) and (d). Figure (c) shows the DAB stain estimate produced by NMF\nand (d) shows the DAB staining estimated by ICA.\n\n5 Discussion\n\nThe above experiments indicate that both NMF and ICA are capable of performing\ncolor decomposition of tissue samples stained with multiple histological dyes. However,\nthere remain a number of sources of error, both during image acquisition and\nin the decomposition stage. These include errors due to imperfect focussing\nin the various spectral bands and distortion in the acquired images which cannot\nbe accounted for by optical flow based alignment methods such as Shi & Tomasi\u2019s\nalgorithm. The principal source of discrepancy between the decomposition and the\nground truth images, however, is the chemical interaction between the\nvarious dyes used for staining. Measurement error due to dye interaction can be as\nhigh as 15% [13]. 
In this light, both ICA and NMF provide good results, and we\nexpect that improvements in the image acquisition and registration procedure will\nresult in systems capable of delivering performance close to the theoretical optimum.\nIn conclusion, we have addressed the problem of image registration for the planes in\na hyperspectral stack for spectral information extraction and we proposed the use of\ntwo unsupervised algorithms, Non-negative Matrix Factorization and Independent\nComponent Analysis, for extracting the contributions of various histological stains\nto the overall spectral composition throughout the tissue sample. We demonstrate\nthe performance of these algorithms by comparing them with ground truth data.\nWe intend to address errors in the image acquisition and registration to further\nreduce the decomposition error in future work.\n\nReferences\n\n[1] M. Bravo-Zanoguera, B. V. Massenbach, A. L. Kellner, and J. H. Price.\nHigh-performance autofocus circuit for biological microscopy. Review of Scientific\nInstruments, 69(11):3966\u20133977, 1998.\n\n[2] Jean-Fran\u00e7ois Cardoso and Antoine Souloumiac. Blind beamforming for non Gaussian\nsignals. IEE Proceedings-F, 140(6), December 1993.\n\n[3] B. K. P. Horn and B. G. Schunck. Determining optical flow. Artificial Intelligence,\n17:185\u2013204, 1981.\n\n[4] A. Hyv\u00e4rinen, J. Karhunen, and E. Oja. Independent Component Analysis. John\nWiley & Sons, 2001.\n\n[5] S. Krajewski, M. Krajewska, L.M. Ellerby, K. Welsh, Z. Xie, Q.L. Deveraux, G.S.\nSalvesen, D.E. Bredesen, R.E. Rosenthal, G. Fiskum, and J.C. Reed. Release of\ncaspase-9 from mitochondria during neuronal apoptosis and cerebral ischemia. Proc\nNatl Acad Sci USA, 96:5752\u20135757, 1999.\n\n[6] S. Krajewski, M. Krajewska, A. Shabaik, T. Miyashita, H.G. Wang, and J.C. Reed.\nImmunohistochemical determination of in vivo distribution of Bax, a dominant\ninhibitor of Bcl-2. 
American Journal of Pathology, 145:1323\u20131336, 1994.\n\n[7] D. D. Lee and H. S. Seung. Learning the parts of objects with nonnegative matrix\nfactorization. Nature, 401:788\u2013791, 1999.\n\n[8] David Marr. Vision: A Computational Investigation into the Human Representation\nand Processing of Visual Information. W. H. Freeman & Co., 1983.\n\n[9] J. H. Price. Scanning cytometry for cell monolayers. PhD thesis, University of\nCalifornia, San Diego, 1990.\n\n[10] J. H. Price, E. A. Hunter, and D. A. Gough. Accuracy of least squares designed spatial\nFIR filters for segmentation of images of fluorescence stained cell nuclei. Cytometry,\n25:303\u2013316, 1996.\n\n[11] Arnout C. Ruifrok and Dennis A. Johnston. Quantification of histochemical staining\nby color deconvolution. Analyt Quant Cytol Histol, 23:291\u2013299, 2001.\n\n[12] Jianbo Shi and Carlo Tomasi. Good features to track. In Proc. IEEE Conf. Comput.\nVision and Pattern Recognition, pages 593\u2013600, 1994.\n\n[13] R. J. Wordinger, G. W. Miller, and D. S. Nicodemus, editors. Manual of\nImmunoperoxidase Techniques. American Society of Clinical Pathologists, 1985.\n", "award": [], "sourceid": 2497, "authors": [{"given_name": "Andrew", "family_name": "Rabinovich", "institution": null}, {"given_name": "Sameer", "family_name": "Agarwal", "institution": null}, {"given_name": "Casey", "family_name": "Laris", "institution": null}, {"given_name": "Jeffrey", "family_name": "Price", "institution": null}, {"given_name": "Serge", "family_name": "Belongie", "institution": null}]}