{"title": "Unmixing Hyperspectral Data", "book": "Advances in Neural Information Processing Systems", "page_first": 942, "page_last": 948, "abstract": null, "full_text": "Unmixing Hyperspectral Data\n\nLucas Parra, Clay Spence, Paul Sajda\n\nSarnoff Corporation, CN-5300, Princeton, NJ 08543, USA\n\n{lparra, cspence, psajda}@sarnoff.com\n\nAndreas Ziehe, Klaus-Robert M\u00fcller\n\nGMD FIRST.IDA, Kekulestr. 7, 12489 Berlin, Germany\n\n{ziehe,klaus}@first.gmd.de\n\nAbstract\n\nIn hyperspectral imagery one pixel typically consists of a mixture of the reflectance spectra of several materials, where the mixture coefficients correspond to the abundances of the constituent materials. We assume linear combinations of reflectance spectra with some additive normal sensor noise and derive a probabilistic MAP framework for analyzing hyperspectral data. As the material reflectance characteristics are not known a priori, we face the problem of unsupervised linear unmixing. The incorporation of different prior information (e.g. positivity and normalization of the abundances) naturally leads to a family of interesting algorithms, for example in the noise-free case yielding an algorithm that can be understood as constrained independent component analysis (ICA). Simulations underline the usefulness of our theory.\n\n1 Introduction\n\nCurrent hyperspectral remote sensing technology can form images of ground surface reflectance at a few hundred wavelengths simultaneously, with wavelengths ranging from 0.4 to 2.5 \u00b5m and spatial resolutions of 10-30 m. The applications of this technology include environmental monitoring and mineral exploration and mining. The benefit of hyperspectral imagery is that many different objects and terrain types can be characterized by their spectral signature. 
\n\nThe first step in most hyperspectral image analysis systems is to perform a spectral unmixing to determine the original spectral signals of some set of prime materials. The basic difficulty is that for a given image pixel the spectral reflectance patterns of the surface materials are in general not known a priori. However, there are general physical and statistical priors which can be exploited to potentially improve spectral unmixing. In this paper we address the problem of unmixing hyperspectral imagery through incorporation of physical and statistical priors within an unsupervised Bayesian framework.\n\nWe begin by presenting the linear superposition model for the measured reflectances. We then discuss the advantages of unsupervised over supervised systems. We derive a general maximum a posteriori (MAP) framework to find the material spectra and infer the abundances. Interestingly, depending on how the priors are incorporated, the zero noise case yields (i) a simplex approach or (ii) a constrained ICA algorithm. Assuming non-zero noise, our MAP estimate utilizes a constrained least squares algorithm. The two latter approaches are new algorithms, whereas the simplex algorithm has been previously suggested for the analysis of hyperspectral data.\n\nLinear Modeling. To a first approximation the intensities $X = (x_{i\\lambda})$ measured in each spectral band $\\lambda = 1, \\ldots, L$ for a given pixel $i = 1, \\ldots, N$ are linear combinations of the reflectance characteristics $S = (s_{m\\lambda})$ of the materials $m = 1, \\ldots, M$ present in that area. Possible errors of this approximation and sensor noise are taken into account by adding a noise term $N = (n_{i\\lambda})$. 
In matrix form this can be summarized as\n\n$$X = AS + N, \\quad \\text{subject to: } A\\mathbf{1}_M = \\mathbf{1}_L, \\; A \\geq 0, \\qquad (1)$$\n\nwhere matrix $A = (a_{im})$ represents the abundance of material $m$ in the area corresponding to pixel $i$, with positivity and normalization constraints. Note that ground inclination or a changing viewing angle may cause an overall scale factor for all bands that varies with the pixels. This can be incorporated in the model by simply replacing the constraint $A\\mathbf{1}_M = \\mathbf{1}_L$ with $A\\mathbf{1}_M \\leq \\mathbf{1}_L$, which does not affect the discussion in the remainder of the paper. This is clearly a simplified model of the physical phenomena. For example, with spatially fine grained mixtures, called intimate mixtures, multiple reflections may cause departures from this first order model. Additionally, there are a number of inherent spatial variations in real data, such as inhomogeneous vapor and dust particles in the atmosphere, that will cause a departure from the linear model in equation (1). Nevertheless, in practical applications a linear model has produced reasonable results for areal mixtures.\n\nSupervised vs. Unsupervised techniques. Supervised spectral unmixing relies on prior knowledge about the reflectance patterns $S$ of candidate surface materials, sometimes called endmembers, or on expert knowledge and a series of semi-automatic steps to find the constituent materials in a particular scene. Once the user identifies a pixel $i$ containing a single material, i.e. $a_{im} = 1$ for a given $m$ and $i$, the corresponding spectral characteristics of that material can be taken directly from the observations, i.e., $s_{m\\lambda} = x_{i\\lambda}$ [4]. Given knowledge about the endmembers, one can simply find the abundances by solving a constrained least squares problem. 
\n\nThe problem with such supervised techniques is that finding the correct $S$ may require substantial user interaction and the result may be error prone, as a pixel that actually contains a mixture can be misinterpreted as a pure endmember. Another approach obtains endmembers directly from a database. This is also problematic because the actual surface material on the ground may not match the database entries, due to atmospheric absorption or other noise sources. Finding close matches is an ambiguous process, as some endmembers have very similar reflectance characteristics and may match several entries in the database.\n\nUnsupervised unmixing, in contrast, tries to identify the endmembers and mixtures directly from the observed data $X$ without any user interaction. There are a variety of such approaches. In one approach a simplex is fit to the data distribution [7, 6, 2]. The resulting vertex points of the simplex represent the desired endmembers, but this technique is very sensitive to noise, as a few boundary points can potentially change the location of the simplex vertex points considerably. Another approach by Szu [9] tries to find abundances that have the highest entropy subject to constraints that the amount of materials is as evenly distributed as possible, an assumption which is clearly not valid in many actual surface material distributions. A relatively new approach considers modeling the statistical information across wavelength as statistically independent AR processes [1]. This leads directly to the contextual linear ICA algorithm [5]. However, the approach in [1] does not take into account constraints on the abundances, noise, or prior information. 
Most importantly, the method in [1] can only integrate information from a small number of pixels at a time (the same as the number of endmembers). Typically, however, we will have only a few endmembers but many thousand pixels.\n\n2 The Maximum A Posteriori Framework\n\n2.1 A probabilistic model of unsupervised spectral unmixing\n\nOur model has observations or data $X$ and hidden variables $A$, $S$, and $N$ that are explained by the noisy linear model (1). We estimate the values of the hidden variables by using MAP\n\n$$p(A, S|X) = \\frac{p(X|A, S)\\,p(A, S)}{p(X)} = \\frac{p_n(X|A, S)\\,p_a(A)\\,p_s(S)}{p(X)}, \\qquad (2)$$\n\nwith $p_a(A)$, $p_s(S)$, $p_n(N)$ as the a priori assumptions of the distributions. With MAP we estimate the most probable values for given priors after observing the data,\n\n$$A_{MAP}, S_{MAP} = \\arg\\max_{A,S}\\, p(A, S|X). \\qquad (3)$$\n\nNote that for maximization the constant factor $p(X)$ can be ignored. Our first assumption, which is indicated in equation (2), is that the abundances are independent of the reflectance spectra as their origins are completely unrelated: (A0) $A$ and $S$ are independent.\n\nThe MAP algorithm is entirely defined by the choices of priors that are guided by the problem of hyperspectral unmixing: (A1) $A$ represent probabilities for each pixel $i$. (A2) $S$ are independent for different materials $m$. (A3) $N$ are normal i.i.d. for all $i$, $\\lambda$. In summary, our MAP framework includes the assumptions A0-A3.\n\n2.2 Including Priors\n\nPriors on the abundances. Positivity and normalization of the abundances can be represented as\n\n$$p_a(A) \\propto \\prod_{i=1}^{N} \\delta\\Big(1 - \\sum_{m=1}^{M} a_{im}\\Big) \\prod_{m=1}^{M} \\Theta(a_{im}), \\qquad (4)$$\n\nwhere $\\delta(\\cdot)$ represents the Kronecker delta function and $\\Theta(\\cdot)$ the step function. With this choice a point not satisfying the constraint will have zero a posteriori probability. This prior introduces no particular bias of the solutions other than the abundance constraints. 
It does, however, assume the abundances of different pixels to be independent.\n\nPrior on spectra. Usually we find systematic trends in the spectra that cause significant correlation. However, such an overall trend can be subtracted and/or filtered from the data, leaving only independent signals that encode the variation from that overall trend. For example, one can capture the conditional dependency structure with a linear auto-regressive (AR) model and analyze the resulting \"innovations\" or prediction errors [3]. In our model we assume that the spectra represent independent instances of an AR process having a white innovation process $e_{m\\lambda}$ distributed according to $p_e(e)$. With a Toeplitz matrix $T$ of the AR coefficients we can write $e_m = s_m T$. The AR coefficients can be found in a preprocessing step on the observations $X$. If $S$ now represents the innovation process itself, our prior can be represented as\n\n$$p_s(S) \\propto p_e(ST) = \\prod_{m=1}^{M} \\prod_{\\lambda=1}^{L} p_e\\Big(\\sum_{\\lambda'=1}^{L} s_{m\\lambda'}\\, t_{\\lambda'\\lambda}\\Big). \\qquad (5)$$\n\nAdditionally, $p_e(e)$ is parameterized by a mean and scale parameter and potentially parameters determining the higher moments of the distributions. For brevity we ignore the details of the parameterization in this paper.\n\nPrior on the noise. As outlined in the introduction, there are a number of problems that can cause the linear model $X = AS$ to be inaccurate (e.g. multiple reflections, inhomogeneous atmospheric absorption, and detector noise). As it is hard to treat all these phenomena explicitly, we suggest pooling them into one noise variable that we assume for simplicity to be normally distributed with a wavelength dependent noise variance $\\sigma_\\lambda$,\n\n$$p(X|A, S) = p_n(N) = \\mathcal{N}(X - AS, \\Sigma) = \\prod_{\\lambda=1}^{L} \\mathcal{N}(x_\\lambda - A s_\\lambda, \\sigma_\\lambda \\mathbf{1}), \\qquad (6)$$\n\nwhere $\\mathcal{N}(\\cdot, \\cdot)$ represents a zero mean Gaussian distribution, and $\\mathbf{1}$ the identity matrix indicating the independent noise at each pixel.\n\n2.3 MAP Solution for Zero Noise Case\n\nLet us consider the noise-free case. Although this simplification may be inaccurate, it will allow us to greatly reduce the number of free hidden variables, from $NM + ML$ to $M^2$. In the noise-free case the variables $A$, $S$ are then deterministically dependent on each other through an $NL$-dimensional $\\delta$-distribution, $p_n(X|A, S) = \\delta(X - AS)$. We can remove one of these variables from our discussion by integrating (2). It is instructive to first consider removing $A$,\n\n$$p(S|X) \\propto \\int dA\\, \\delta(X - AS)\\, p_a(A)\\, p_s(S) = |S^{-1}|\\, p_a(XS^{-1})\\, p_s(S). \\qquad (7)$$\n\nWe omit tedious details and assume $L = M$ and invertible $S$ so that we can perform the variable substitution that introduces the Jacobian determinant $|S^{-1}|$. Let us consider the influence of the different terms. The Jacobian determinant measures the volume spanned by the endmembers $S$. Maximizing its inverse will therefore try to shrink the simplex spanned by $S$. The term $p_a(XS^{-1})$ should guarantee that all data points map into the inside of the simplex, since the term should contribute zero or low probability for points that violate the constraint. Note that these two terms, in principle, define the same objective as the simplex envelope fitting algorithms previously mentioned [2].\n\nIn the present work we are more interested in the algorithm that results from removing $S$ and finding the MAP estimate of $A$. We obtain (cf. Eq. (7))\n\n$$p(A|X) \\propto \\int dS\\, \\delta(X - AS)\\, p_a(A)\\, p_s(S) = |A^{-1}|\\, p_s(A^{-1}X)\\, p_a(A). \\qquad (8)$$\n\nFor now we assumed $N = M$.$^1$ If $p_s(S)$ factors over $m$, i.e. endmembers are independent, maximizing the first two terms represents the ICA algorithm. 
However, the prior on $A$ will restrict the solutions to satisfy the abundance constraints and bias the result depending on the detailed choice of $p_a(A)$, so we are led to constrained ICA.\n\nIn summary, depending on which variable we integrate out we obtain two methods for solving the spectral unmixing problem: the known technique of simplex fitting and a new constrained ICA algorithm.\n\n$^1$ In practice we more frequently have $N > M$. In that case the observations $X$ can be mapped into an $M$-dimensional subspace using the singular value decomposition (SVD), $X = UDV^T$. The discussion then applies to the reduced observations $\\tilde{X} = U_M^T X$, with $U_M$ being the first $M$ columns of $U$.\n\n2.4 MAP Solution for the Noisy Case\n\nCombining the choices for the priors made in section 2.2 (Eqs. (4), (5) and (6)) with (2) and (3) we obtain\n\n$$A_{MAP}, S_{MAP} = \\arg\\max_{A,S} \\prod_{\\lambda=1}^{L} \\Big\\{ \\prod_{i=1}^{N} \\mathcal{N}(x_{i\\lambda} - a_i s_\\lambda, \\sigma_\\lambda) \\prod_{m=1}^{M} p_e\\Big(\\sum_{\\lambda'=1}^{L} s_{m\\lambda'}\\, t_{\\lambda'\\lambda}\\Big) \\Big\\}, \\qquad (9)$$\n\nsubject to $A\\mathbf{1}_M = \\mathbf{1}_L$, $A \\geq 0$. The logarithm of the cost function in (9) is denoted by $L = L(A, S)$. Its gradient with respect to the hidden variables is\n\n$$\\frac{\\partial L}{\\partial s_m} = -A^T n_m\\, \\mathrm{diag}(\\sigma)^{-1} - f_s(s_m), \\qquad (10)$$\n\nwhere $N = X - AS$, $n_m$ are the $M$ column vectors of $N$, and $f_s(s) = -\\frac{\\partial \\ln p_e(s)}{\\partial s}$. In (10) $f_s$ is applied to each element of $s_m$.\n\nThe optimization with respect to $A$ for given $S$ can be implemented as a standard weighted least squares (LS) problem with a linear constraint and positivity bounds. Since the constraints apply for every pixel independently, one can solve $N$ separate constrained LS problems of $M$ unknowns each. We alternate between gradient steps for $S$ and explicit solutions for $A$ until convergence. Any additional parameters of $p_e(e)$ such as scale and mean may be obtained in a maximum likelihood (ML) sense by maximizing $L$. 
Note that the nonlinear optimization is not subject to constraints; the constraints apply only in the quadratic optimization.\n\n3 Experiments\n\n3.1 Zero Noise Case: Artificial Mixtures\n\nIn our first experiment we use mineral data from the United States Geological Survey (USGS)$^2$ to build artificial mixtures for evaluating our unsupervised unmixing framework. Three target endmembers were chosen (Almandine WS479, Montmorillonite+Illi CM42 and Dickite NMNH106242). A spectral scene of 100 samples was constructed by creating a random mixture of the three minerals. Of the 100 samples, there were no pure samples (i.e. no mineral had more than an 80% abundance in any sample). Figure 1A shows the spectra of the endmembers recovered by the constrained ICA technique of section 2.3, where the constraints were implemented with penalty terms added to the conventional maximum likelihood ICA algorithm. These are nearly identical to the spectra of the true endmembers, shown in figure 1B, which were used for mixing. Interesting to note is the scatter-plot of the 100 samples across two bands (figure 1C). The open circles are the absorption values at these two bands for endmembers found by the MAP technique. Given that each mixed sample consists of no more than 80% of any endmember, the endmember points on the scatter-plot are quite distant from the cluster. A simplex fitting technique would have significant difficulty recovering the endmembers from this clustering.\n\n$^2$ See http://speclab.cr.usgs.gov/spectral.lib.456.descript/decript04.html\n\n[Figure 1 panels: A, found endmembers; B, target endmembers; C, observed X and found S. Horizontal axis: wavelength in A and B, wavelength=30 in C.]\n\nFigure 1: Results for noise-free artificial mixture. A: recovered endmembers using the MAP technique. B: \"true\" target endmembers. C: scatter plot of samples across 2 bands showing the absorption of the three endmembers computed by MAP (open circles).\n\n3.2 Noisy Case: Real Mixtures\n\nTo validate the noise model MAP framework of section 2.4 we conducted an experiment using ground truthed USGS data representing real mixtures. We selected 10x10 blocks of pixels from three different regions$^3$ in the AVIRIS data of the Cuprite, Nevada mining district. We separate these 300 mixed spectra assuming two endmembers and an AR detrending with 5 AR coefficients and the MAP techniques of section 2.4. Overall brightness was accounted for as explained in the linear modeling of section 1. The endmembers are shown in figure 2A and B in comparison to laboratory spectra from the USGS spectral library for these minerals [8]. Figure 2C shows the corresponding abundances, which match the ground truth; region (III) mainly consists of Muscovite while regions (I)+(II) contain (areal) mixtures of Kaolinite and Muscovite.\n\n4 Discussion\n\nHyperspectral unmixing is a challenging practical problem for unsupervised learning. Our probabilistic approach leads to several interesting algorithms: (1) simplex fitting, (2) constrained ICA and (3) constrained least squares that can efficiently use multi-channel information. 
An important element of our approach is the explicit use of prior information. Our simulation examples show that we can recover the endmembers, even in the presence of noise and model uncertainty. The approach described in this paper does not yet exploit local correlations between neighboring pixels that are well known to exist. Future work will therefore exploit not only spectral but also spatial prior information for detecting objects and materials.\n\nAcknowledgments\n\nWe would like to thank Gregg Swayze at the USGS for assistance in obtaining the data.\n\n$^3$ The regions were from the image plate2.cuprite95.alpha.2um.image.wlocals.gif in ftp://speclab.cr.usgs.gov/pub/cuprite/gregg.thesis.images/, at the coordinates (265,710) and (275,697), which contained Kaolinite and Muscovite 2, and (143,661), which only contained Muscovite 2.\n\n[Figure 2 panels: A, Muscovite; B, Kaolinite; C, abundances. Horizontal axis in A and B: wavelength.]\n\nFigure 2: A: Spectra of computed endmember (solid line) vs Muscovite sample spectra from the USGS data base library. Note we show only part of the spectrum since the discriminating features are located only between band 172 and 220. B: Computed endmember (solid line) vs Kaolinite sample spectra from the USGS data base library. C: Abundances for Kaolinite and Muscovite for three regions (lighter pixels represent higher abundance). Region 1 and region 2 have similar abundances for Kaolinite and Muscovite, while region 3 contains more Muscovite.\n\nReferences\n\n[1] J. Bayliss, J. A. Gualtieri, and R. 
Cromp. Analyzing hyperspectral data with independent component analysis. In J. M. Selander, editor, Proc. SPIE Applied Image and Pattern Recognition Workshop, volume 9, P.O. Box 10, Bellingham WA 98227-0010, 1997. SPIE.\n\n[2] J.W. Boardman and F.A. Kruse. Automated spectral analysis: a geologic example using AVIRIS data, north Grapevine Mountains, Nevada. In Tenth Thematic Conference on Geologic Remote Sensing, pages 407-418, Ann Arbor, MI, 1994. Environmental Research Institute of Michigan.\n\n[3] S. Haykin. Adaptive Filter Theory. Prentice Hall, 1991.\n\n[4] F. Maselli, M. Pieri, and C. Conese. Automatic identification of end-members for the spectral decomposition of remotely sensed scenes. Remote Sensing for Geography, Geology, Land Planning, and Cultural Heritage (SPIE), 2960:104-109, 1996.\n\n[5] B. Pearlmutter and L. Parra. Maximum likelihood blind source separation: A context-sensitive generalization of ICA. In M. Mozer, M. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems 9, pages 613-619, Cambridge MA, 1997. MIT Press.\n\n[6] J.J. Settle. Linear mixing and the estimation of ground cover proportions. International Journal of Remote Sensing, 14:1159-1177, 1993.\n\n[7] M.O. Smith, J.B. Adams, and A.R. Gillespie. Reference endmembers for spectral mixture analysis. In Fifth Australian remote sensing conference, volume 1, pages 331-340, 1990.\n\n[8] U.S. Geological Survey. USGS digital spectral library. Open File Report 93-592, 1993.\n\n[9] H. Szu and C. Hsu. Landsat spectral demixing a la superresolution of blind matrix inversion by constraint MaxEnt neural nets. In Wavelet Applications IV, volume 3078, pages 147-160. SPIE, 1997. 
", "award": [], "sourceid": 1714, "authors": [{"given_name": "Lucas", "family_name": "Parra", "institution": null}, {"given_name": "Clay", "family_name": "Spence", "institution": null}, {"given_name": "Paul", "family_name": "Sajda", "institution": null}, {"given_name": "Andreas", "family_name": "Ziehe", "institution": null}, {"given_name": "Klaus-Robert", "family_name": "M\u00fcller", "institution": null}]}