{"title": "The Concave-Convex Procedure (CCCP)", "book": "Advances in Neural Information Processing Systems", "page_first": 1033, "page_last": 1040, "abstract": null, "full_text": "The  Concave-Convex Procedure  (CCCP) \n\nA.  L.  Yuille  and  Anand Rangarajan  * \nSmith-Kettlewell Eye Research Institute, \n\n2318  Fillmore Street, \n\nSan Francisco,  CA  94115,  USA. \n\nTel.  (415)  345-2144.  Fax.  (415)  345-8455. \n\nEmail yuille@ski.org \n\n*  Prof.  Anand  Rangarajan.  Dept.  of  CISE,  Univ.  of  Florida  Room  301,  CSE \nBuilding  Gainesville,  FL  32611-6120  Phone:  (352)  392  1507  Fax:  (352)  392  1220 \ne-mail:  anand@cise.ufl.edu \n\nAbstract \n\nWe  introduce the  Concave-Convex procedure  (CCCP)  which  con(cid:173)\nstructs discrete  time  iterative  dynamical  systems  which  are  guar(cid:173)\nanteed to monotonically decrease global optimization/energy func(cid:173)\ntions.  It can be applied to  (almost)  any optimization problem and \nmany existing algorithms can be interpreted in terms of CCCP.  In \nparticular, we  prove relationships to some applications of Legendre \ntransform techniques.  We  then illustrate CCCP by applications to \nPotts models, linear assignment,  EM  algorithms,  and  Generalized \nIterative Scaling  (GIS).  CCCP  can be  used  both as  a  new  way  to \nunderstand existing optimization algorithms and as a procedure for \ngenerating new algorithms. \n\n1 \n\nIntroduction \n\nThere is  a  lot of interest in designing discrete time dynamical systems for  inference \nand learning  (see,  for  example,  [10],  [3],  [7],  [13]). \n\nThis  paper describes  a  simple  geometrical  Concave-Convex procedure  (CCCP)  for \nconstructing discrete time dynamical systems which can be guaranteed to decrease \nalmost  any  global  optimization/energy  function  (see  technical  conditions  in  sec(cid:173)\ntion  (2)). 
\n\nWe  prove that there is  a  relationship  between  CCCP  and optimization techniques \nbased on introducing auxiliary variables using Legendre transforms.  We  distinguish \nbetween  Legendre  min-max and  Legendre  minimization.  In the former,  see  [6],  the \nintroduction  of  auxiliary  variables  converts  the  problem  to  a  min-max  problem \nwhere the goal is  to find  a saddle point.  By contrast, in  Legendre  minimization, see \n[8],  the  problem  remains  a  minimization  one  (and  so  it  becomes  easier  to  analyze \n\n\fconvergence).  CCCP relates to Legendre minimization only and gives a geometrical \nperspective which complements the algebraic manipulations  presented in  [8]. \n\nCCCP  can  be  used  both  as  a  new  way  to  understand  existing  optimization  algo(cid:173)\nrithms  and  as  a  procedure  for  generating  new  algorithms.  We  illustrate  this  by \ngiving  examples  from  Potts  models,  EM,  linear  assignment,  and  Generalized  It(cid:173)\nerative  Scaling.  Recently,  CCCP  has  also  been  used  to  construct  algorithms  to \nminimize the Bethe/Kikuchi free  energy  [13]. \n\nWe  introduce  CCCP  in  section  (2)  and  relate  it  to  Legendre  transforms  in  sec(cid:173)\ntion  (3).  Then we  give examples in  section  (4). \n\n2  The  Concave-Convex Procedure  (CCCP) \n\nThe key results of CCCP are summarized by Theorems  1,2, and 3. \n\nTheorem  1  shows that any function ,  subject  to  weak conditions,  can be expressed \nas  the sum of a  convex and concave part  (this  decomposition is  not  unique).  This \nimplies that CCCP can be applied to (almost)  any optimization problem. \nTheorem 1.  Let E(x)  be  an  energy function  with  bounded Hessian [J2 E(x)/8x8x. \nThen  we  can  always  decompose  it into  the  sum  of a  convex function  and  a  concave \nfunction. \n\nProof.  
Select any convex function F(x) with positive definite Hessian with eigenvalues bounded below by \epsilon > 0. Then there exists a positive constant \lambda such that the Hessian of E(x) + \lambda F(x) is positive definite and hence E(x) + \lambda F(x) is convex. Hence we can express E(x) as the sum of a convex part, E(x) + \lambda F(x), and a concave part -\lambda F(x).

Figure 1: Decomposing a function into convex and concave parts. The original function (Left Panel) can be expressed as the sum of a convex function (Centre Panel) and a concave function (Right Panel). (Figure courtesy of James M. Coughlan).

Our main result is given by Theorem 2, which defines the CCCP procedure and proves that it converges to a minimum or saddle point of the energy.

Theorem 2. Consider an energy function E(x) (bounded below) of form E(x) = E_{vex}(x) + E_{cave}(x), where E_{vex}(x), E_{cave}(x) are convex and concave functions of x respectively. Then the discrete iterative CCCP algorithm x^t \mapsto x^{t+1} given by:

\nabla E_{vex}(x^{t+1}) = -\nabla E_{cave}(x^t),   (1)

is guaranteed to monotonically decrease the energy E(x) as a function of time and hence to converge to a minimum or saddle point of E(x).

Proof. The convexity and concavity of E_{vex}(.) and E_{cave}(.) mean that E_{vex}(x_2) \geq E_{vex}(x_1) + (x_2 - x_1) \cdot \nabla E_{vex}(x_1) and E_{cave}(x_4) \leq E_{cave}(x_3) + (x_4 - x_3) \cdot \nabla E_{cave}(x_3), for all x_1, x_2, x_3, x_4. Now set x_1 = x^{t+1}, x_2 = x^t, x_3 = x^t, x_4 = x^{t+1}. Using the algorithm definition (i.e. \nabla E_{vex}(x^{t+1}) = -\nabla E_{cave}(x^t)) we find that E_{vex}(x^{t+1}) + E_{cave}(x^{t+1}) \leq E_{vex}(x^t) + E_{cave}(x^t), which proves the claim.

We can get a graphical illustration of this algorithm by the reformulation shown in figure (2) (suggested by James M. Coughlan).
Think of decomposing the energy function E(x) into E_1(x) - E_2(x), where both E_1(x) and E_2(x) are convex. (This is equivalent to decomposing E(x) into a convex term E_1(x) plus a concave term -E_2(x)). The algorithm proceeds by matching points on the two terms which have the same tangents. For an input x_0 we calculate the gradient \nabla E_2(x_0) and find the point x_1 such that \nabla E_1(x_1) = \nabla E_2(x_0). We next determine the point x_2 such that \nabla E_1(x_2) = \nabla E_2(x_1), and repeat.

Figure 2: A CCCP algorithm illustrated for Convex minus Convex. We want to minimize the function in the Left Panel. We decompose it (Right Panel) into a convex part (top curve) minus a convex term (bottom curve). The algorithm iterates by matching points on the two curves which have the same tangent vectors, see text for more details. The algorithm rapidly converges to the solution at x = 5.0.

We can extend Theorem 2 to allow for linear constraints on the variables x, for example \sum_i c_i^\mu x_i = \alpha^\mu, where the {c_i^\mu}, {\alpha^\mu} are constants. This follows directly because properties such as convexity and concavity are preserved when linear constraints are imposed. We can change to new coordinates defined on the hyperplane defined by the linear constraints. Then we apply Theorem 1 in this coordinate system.

Observe that Theorem 2 defines the update as an implicit function of x^{t+1}. In many cases, as we will show, it is possible to solve for x^{t+1} directly. In other cases we may need an algorithm, or inner loop, to determine x^{t+1} from \nabla E_{vex}(x^{t+1}).
In these cases we will need the following theorem, where we re-express CCCP in terms of minimizing a time sequence of convex update energy functions E^{t+1}(x^{t+1}) to obtain the updates x^{t+1} (i.e. at the t-th iteration of CCCP we need to minimize the energy E^{t+1}(x^{t+1})). We include linear constraints in Theorem 3.

Theorem 3. Let E(x) = E_{vex}(x) + E_{cave}(x), where x is required to satisfy the linear constraints \sum_i c_i^\mu x_i = \alpha^\mu, where the {c_i^\mu}, {\alpha^\mu} are constants. Then the update rule for x^{t+1} can be formulated as minimizing a time sequence of convex update energy functions E^{t+1}(x^{t+1}):

E^{t+1}(x^{t+1}) = E_{vex}(x^{t+1}) + x^{t+1} \cdot \nabla E_{cave}(x^t) + \sum_\mu \lambda^\mu \{\sum_i c_i^\mu x_i^{t+1} - \alpha^\mu\},   (2)

where the Lagrange parameters {\lambda^\mu} impose the linear constraints.

Proof. Direct calculation.

The convexity of E^{t+1}(x^{t+1}) implies that there is a unique minimum corresponding to x^{t+1}. This means that if an inner loop is needed to calculate x^{t+1} then we can use standard techniques such as conjugate gradient descent (or even CCCP).

3 Legendre Transformations

The Legendre transform can be used to reformulate optimization problems by introducing auxiliary variables [6]. The idea is that some of the formulations may be more effective (and computationally cheaper) than others. We will concentrate on Legendre minimization, see [7] and [8], instead of the Legendre min-max emphasized in [6]. An advantage of Legendre minimization is that mathematical convergence proofs can be given. (For example, [8] proved convergence results for the algorithm implemented in [7].)

In Theorem 4 we show that Legendre minimization algorithms are equivalent to CCCP. The CCCP viewpoint emphasizes the geometry of the approach and complements the algebraic manipulations given in [8].
(Moreover, our results of the previous section show the generality of CCCP while, by contrast, the Legendre transform methods have been applied only on a case by case basis.)

Definition 1. Let F(x) be a convex function. For each value y let F^*(y) = \min_x \{F(x) + y \cdot x\}. Then F^*(y) is concave and is the Legendre transform of F(x). Moreover, F(x) = \max_y \{F^*(y) - y \cdot x\}.

Property 1. F(.) and F^*(.) are related by \frac{\partial F^*}{\partial y}(y) = \{\frac{\partial F}{\partial x}\}^{-1}(-y) and -\frac{\partial F}{\partial x}(x) = \{\frac{\partial F^*}{\partial y}\}^{-1}(x). (By \{\frac{\partial F^*}{\partial y}\}^{-1}(x) we mean the value y such that \frac{\partial F^*}{\partial y}(y) = x.)

Theorem 4. Let E_1(x) = f(x) + g(x) and E_2(x, y) = f(x) + x \cdot y + h(y), where f(.), h(.) are convex functions and g(.) is concave. Then applying CCCP to E_1(x) is equivalent to minimizing E_2(x, y) with respect to x and y alternatively (for suitable choices of g(.) and h(.)).

Proof. We can write E_1(x) = f(x) + \min_y \{g^*(y) + x \cdot y\}, where g^*(.) is the Legendre transform of g(.) (identify g(.) with F^*(.) and g^*(.) with F(.) in Definition 1). Thus minimizing E_1(x) with respect to x is equivalent to minimizing E_2(x, y) = f(x) + x \cdot y + g^*(y) with respect to x and y. (Alternatively, we can set g^*(y) = h(y) in the expression for E_2(x, y) and obtain the cost function E_1(x) = f(x) + g(x).) Alternating minimization over x and y gives: (i) \partial f/\partial x = -y to determine x_{t+1} in terms of y_t, and (ii) \partial g^*/\partial y = -x to determine y_t in terms of x_t, which, by Property 1 of the Legendre transform, is equivalent to setting y = \partial g/\partial x. Combining these two stages gives CCCP:

\frac{\partial f}{\partial x}(x_{t+1}) = -\frac{\partial g}{\partial x}(x_t).
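The CCCP update (1) is easy to exercise on a one-dimensional example. The sketch below is our own illustration (not the function used in figure (2)): it minimizes the double-well energy E(x) = x^4/4 - 2x^2 with the decomposition E_vex(x) = x^4/4 and E_cave(x) = -2x^2, for which the implicit update can be solved in closed form as x_{t+1} = (4 x_t)^{1/3}.

```python
import math

# Energy E(x) = x**4/4 - 2*x**2, decomposed (one choice among many) as
#   E_vex(x)  =  x**4/4          (convex)
#   E_cave(x) = -2*x**2          (concave)
# CCCP update (1): E_vex'(x_{t+1}) = -E_cave'(x_t)
#   => x_{t+1}**3 = 4*x_t  =>  x_{t+1} = cbrt(4*x_t)

def energy(x):
    return x**4 / 4 - 2 * x**2

def cccp_step(x):
    s = 4 * x
    return math.copysign(abs(s) ** (1 / 3), s)  # real cube root

x = 1.0                       # arbitrary starting point
energies = [energy(x)]
for _ in range(50):
    x = cccp_step(x)
    energies.append(energy(x))

# Theorem 2: the energy is monotonically non-increasing, and the
# iterate converges to the minimum at x = 2 (where x**3 - 4*x = 0).
assert all(b <= a + 1e-12 for a, b in zip(energies, energies[1:]))
assert abs(x - 2.0) < 1e-6
```

Here the update is explicit; when \nabla E_{vex} cannot be inverted in closed form, the inner-loop formulation of Theorem 3 applies instead.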
4 Examples of CCCP

We now illustrate CCCP by giving four examples: (i) discrete time dynamical systems for the mean field Potts model, (ii) an EM algorithm for the elastic net, (iii) a discrete (Sinkhorn) algorithm for solving the linear assignment problem, and (iv) the Generalized Iterative Scaling (GIS) algorithm for parameter estimation.

Example 1. Discrete Time Dynamical Systems for the Mean Field Potts Model. These attempt to minimize discrete energy functions of form E[V] = \sum_{i,j,a,b} T_{ijab} V_{ia} V_{jb} + \sum_{ia} \theta_{ia} V_{ia}, where the {V_{ia}} take discrete values {0, 1} with linear constraints \sum_i V_{ia} = 1, \forall a.

Discussion. Mean field algorithms minimize a continuous effective energy E_{eff}[S; T] to obtain a minimum of the discrete energy E[V] in the limit as T \mapsto 0. The {S_{ia}} are continuous variables in the range [0, 1] and correspond to (approximate) estimates of the mean states of the {V_{ia}}. As described in [12], to ensure that the minima of E[V] and E_{eff}[S; T] all coincide (as T \mapsto 0) it is sufficient that T_{ijab} be negative definite. Moreover, this can be attained by adding a term -K \sum_{ia} V_{ia}^2 to E[V] (for sufficiently large K) without altering the structure of the minima of E[V]. Hence, without loss of generality, we can consider \sum_{i,j,a,b} T_{ijab} V_{ia} V_{jb} to be a concave function.

We impose the linear constraints by adding a Lagrange multiplier term \sum_a p_a \{\sum_i V_{ia} - 1\} to the energy, where the {p_a} are the Lagrange multipliers. The effective energy becomes:

E_{eff}[S] = \sum_{i,j,a,b} T_{ijab} S_{ia} S_{jb} + \sum_{ia} \theta_{ia} S_{ia} + T \sum_{ia} S_{ia} \log S_{ia} + \sum_a p_a \{\sum_i S_{ia} - 1\}.   (3)

We can then incorporate the Lagrange multiplier term into the convex part.
This gives: E_{vex}[S] = T \sum_{ia} S_{ia} \log S_{ia} + \sum_a p_a \{\sum_i S_{ia} - 1\} and E_{cave}[S] = \sum_{i,j,a,b} T_{ijab} S_{ia} S_{jb} + \sum_{ia} \theta_{ia} S_{ia}. Taking derivatives yields: \frac{\partial}{\partial S_{ia}} E_{vex}[S] = T \{1 + \log S_{ia}\} + p_a and \frac{\partial}{\partial S_{ia}} E_{cave}[S] = 2 \sum_{j,b} T_{ijab} S_{jb} + \theta_{ia}. Applying CCCP by setting \frac{\partial E_{vex}}{\partial S_{ia}}(S^{t+1}) = -\frac{\partial E_{cave}}{\partial S_{ia}}(S^t) gives T \{1 + \log S_{ia}(t+1)\} + p_a = -2 \sum_{j,b} T_{ijab} S_{jb}(t) - \theta_{ia}. We solve for the Lagrange multipliers {p_a} by imposing the constraints \sum_i S_{ia}(t+1) = 1, \forall a. This gives a discrete update rule:

S_{ia}(t+1) = \frac{e^{(-1/T)\{2 \sum_{j,b} T_{ijab} S_{jb}(t) + \theta_{ia}\}}}{\sum_c e^{(-1/T)\{2 \sum_{j,b} T_{cjab} S_{jb}(t) + \theta_{ca}\}}}.   (4)

Algorithms of this type were derived in [10], [3] using different design principles.

Our second example relates to the ubiquitous EM algorithm. In general EM and CCCP give different algorithms but in some cases they are identical. The EM algorithm seeks to estimate a variable f^* = \arg\max_f \log \sum_{\{l\}} P(f, l), where \{f\}, \{l\} are variables that depend on the specific problem formulation. It was shown in [4] that this is equivalent to minimizing the following effective energy with respect to the variables f and P(l): E_{eff}[f, P(l)] = -\sum_l P(l) \log P(f, l) + \sum_{\{l\}} P(l) \log P(l). To apply CCCP to an effective energy like this we need either: (a) to decompose E_{eff}[f, P(l)] into convex and concave functions of f, P(l), or (b) to eliminate either variable and obtain a convex concave decomposition in the remaining variable (cf. Theorem 4). We illustrate (b) for the elastic net [2]. (See Yuille and Rangarajan, in preparation, for an illustration of (a)).
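Before turning to Example 2, the Potts update (4) can be checked numerically. The sketch below is our own toy construction (the sizes n, m, the temperature, the random \theta_{ia}, and the coupling tensor are all arbitrary choices, not from the paper): we build a symmetric negative-semidefinite T_{ijab}, so that the quadratic term is genuinely concave, and verify that the update preserves the constraints \sum_i S_{ia} = 1 and monotonically decreases the effective energy, as Theorem 2 guarantees.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, T = 5, 3, 1.0                      # n sites, m states, temperature

# Symmetric, negative-semidefinite couplings: T_ijab = Q[(i,a),(j,b)]
# with Q = -B B^T (scaled down to keep the exponentials tame).
B = rng.normal(size=(n * m, n * m))
Q = -(B @ B.T) / (n * m)
Tc = Q.reshape(n, m, n, m).transpose(0, 2, 1, 3)   # Tc[i, j, a, b]
theta = rng.normal(size=(n, m))

def energy(S):
    # E_eff[S] = sum T_ijab S_ia S_jb + sum theta_ia S_ia + T sum S_ia log S_ia
    return (np.einsum('ia,ijab,jb->', S, Tc, S)
            + (theta * S).sum() + T * (S * np.log(S)).sum())

def update(S):
    # S_ia(t+1) proportional to exp[-(1/T){2 sum_jb T_ijab S_jb + theta_ia}],
    # normalized over i so that sum_i S_ia = 1 for every a, as in (4).
    W = np.exp(-(2 * np.einsum('ijab,jb->ia', Tc, S) + theta) / T)
    return W / W.sum(axis=0, keepdims=True)

S = np.full((n, m), 1.0 / n)             # feasible starting point
energies = [energy(S)]
for _ in range(100):
    S = update(S)
    energies.append(energy(S))

assert np.allclose(S.sum(axis=0), 1.0)                              # constraints hold
assert all(b <= a + 1e-8 for a, b in zip(energies, energies[1:]))   # monotone decrease
```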
Example 2. The elastic net attempts to solve the Travelling Salesman Problem (TSP) by finding the shortest tour through a set of cities at positions \{x_i\}. The elastic net is represented by a set of nodes at positions \{y_a\} with variables \{S_{ia}\} that determine the correspondence between the cities and the nodes of the net. Let E_{eff}[S, y] be the effective energy for the elastic net; then the \{y\} variables can be eliminated and the resulting E_S[S] can be minimized using CCCP. (Note that the standard elastic net only enforces the second set of linear constraints.)

Discussion. The elastic net energy function can be expressed as [11]:

E_{eff}[S, y] = \sum_{ia} S_{ia} |x_i - y_a|^2 + \gamma \sum_{a,b} y_a A_{ab} y_b + T \sum_{i,a} S_{ia} \log S_{ia},   (5)

where we impose the conditions \sum_a S_{ia} = 1, \forall i and \sum_i S_{ia} = 1, \forall a.

The EM algorithm can be applied to estimate the \{y_a\}. Alternatively we can solve for the \{y_a\} variables to obtain y_b = \sum_{ia} P_{ab} S_{ia} x_i, where \{P_{ab}\} = \{\delta_{ab} + 2\gamma A_{ab}\}^{-1}. We substitute this back into E_{eff}[S, y] to get a new energy E_S[S] given by:

E_S[S] = -\sum_{i,j,a,b} S_{ia} S_{jb} P_{ba} x_i \cdot x_j + T \sum_{i,a} S_{ia} \log S_{ia}.   (6)

Once again this is a sum of a concave and a convex part (the first term is concave because of the minus sign and the fact that \{P_{ba}\} and x_i \cdot x_j are both positive semi-definite). We can now apply CCCP and obtain the standard EM algorithm for this problem. (See Yuille and Rangarajan, in preparation, for more details.)

Our final example is a discrete iterative algorithm to solve the linear assignment problem. This algorithm was reported by Kosowsky and Yuille in [5], where it was also shown to correspond to the well-known Sinkhorn algorithm [9]. We now show that both Kosowsky and Yuille's linear assignment algorithm, and hence Sinkhorn's algorithm, are examples of CCCP (after a change of variables).
Example 3. The linear assignment problem seeks to find the permutation matrix \{\Pi_{ia}\} which minimizes the energy E[\Pi] = \sum_{ia} \Pi_{ia} A_{ia}, where \{A_{ia}\} is a set of assignment values. As shown in [5], this is equivalent to minimizing the (convex) E_P[p] energy given by E_P[p] = \sum_a p_a + \frac{1}{\beta} \sum_i \log \sum_a e^{-\beta(A_{ia} + p_a)}, where the solution is given by \Pi^*_{ia} = e^{-\beta(A_{ia} + p_a)} / \sum_b e^{-\beta(A_{ib} + p_b)} rounded off to the nearest integer (for sufficiently large \beta). The iterative algorithm to minimize E_P[p] (which can be re-expressed as Sinkhorn's algorithm, see [5]) is of form:

e^{\beta p_a^{t+1}} = \sum_i \frac{e^{-\beta A_{ia}}}{\sum_b e^{-\beta(A_{ib} + p_b^t)}},   (7)

and can be re-expressed as CCCP.

Discussion. By performing the change of coordinates \beta p_a = -\log r_a, \forall a (for r_a > 0, \forall a) we can re-express the E_P[p] energy as:

E_r[r] = -\frac{1}{\beta} \sum_a \log r_a + \frac{1}{\beta} \sum_i \log \sum_a e^{-\beta A_{ia}} r_a.   (8)

Observe that the first term of E_r[r] is convex and the second term is concave (this can be verified by calculating the Hessian). Applying CCCP gives the update rule:

\frac{1}{r_a^{t+1}} = \sum_i \frac{e^{-\beta A_{ia}}}{\sum_b e^{-\beta A_{ib}} r_b^t},   (9)

which corresponds to equation (7).

Example 4. The Generalized Iterative Scaling (GIS) algorithm [1] for estimating parameters in parallel.

Discussion. The GIS algorithm is designed to estimate the parameter \lambda of a distribution P(x; \lambda) = e^{\lambda \cdot \phi(x)}/Z[\lambda] so that \sum_x P(x; \lambda) \phi(x) = h, where h are observation data (with components indexed by \mu). It is assumed that \phi_\mu(x) \geq 0, \forall \mu, x, h_\mu \geq 0, \forall \mu, and \sum_\mu \phi_\mu(x) = 1, \forall x and \sum_\mu h_\mu = 1. (All estimation problems of this type can be transformed into this form [1].)
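Returning briefly to Example 3: the update (9) is straightforward to run. The sketch below is our own toy instance (the size n, the value of \beta, and the random cost matrix A_{ia} are arbitrary choices): it iterates (9), reads off \Pi_{ia} = e^{-\beta A_{ia}} r_a / \sum_b e^{-\beta A_{ib}} r_b, and checks that the result is (approximately) doubly stochastic, as it must be at the fixed point of (9).

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta = 4, 10.0
A = rng.uniform(size=(n, n))            # assignment values A_ia
K = np.exp(-beta * A)                   # K_ia = exp(-beta * A_ia)

r = np.ones(n)
for _ in range(1000):
    # update (9): 1/r_a = sum_i K_ia / sum_b K_ib r_b
    r = 1.0 / (K / (K @ r)[:, None]).sum(axis=0)

# Pi_ia = K_ia r_a / sum_b K_ib r_b: rows sum to one by construction,
# and at the fixed point of (9) the columns sum to one as well.
Pi = K * r / (K * r).sum(axis=1, keepdims=True)
assert np.allclose(Pi.sum(axis=1), 1.0)
assert np.allclose(Pi.sum(axis=0), 1.0, atol=1e-6)
# For sufficiently large beta, rounding Pi recovers the minimizing permutation.
```

Written out this way, the iteration is visibly the alternating row/column rescaling of the positive matrix K, i.e. Sinkhorn's algorithm.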
Darroch and Ratcliff [1] prove that the following GIS algorithm is guaranteed to converge to the value \lambda^* that minimizes the (convex) cost function E(\lambda) = \log Z[\lambda] - \lambda \cdot h and hence satisfies \sum_x P(x; \lambda^*) \phi(x) = h. The GIS algorithm is given by:

\lambda_{t+1} = \lambda_t - \log h_t + \log h,   (10)

where h_t = \sum_x P(x; \lambda_t) \phi(x) (evaluate \log h componentwise: (\log h)_\mu = \log h_\mu).

To show that GIS can be reformulated as CCCP, we introduce a new variable \beta = e^\lambda (componentwise). We reformulate the problem in terms of minimizing the cost function E_\beta[\beta] = \log Z[\log \beta] - h \cdot (\log \beta). A straightforward calculation shows that -h \cdot (\log \beta) is a convex function of \beta with first derivative -h/\beta (where the division is componentwise). The first derivative of \log Z[\log \beta] is (1/\beta) \sum_x \phi(x) P(x; \log \beta) (evaluated componentwise). To show that \log Z[\log \beta] is concave requires computing its Hessian and applying the Cauchy-Schwarz inequality, using the fact that \sum_\mu \phi_\mu(x) = 1, \forall x and that \phi_\mu(x) \geq 0, \forall \mu, x. We can therefore apply CCCP to E_\beta[\beta], which yields 1/\beta_{t+1} = (1/\beta_t) \times (1/h) \times h_t (componentwise), which is GIS (by taking logs and using \log \beta = \lambda).

5 Conclusion

CCCP is a general principle which can be used to construct discrete time iterative dynamical systems for almost any energy minimization problem. It gives a geometric perspective on Legendre minimization (though not on Legendre min-max).

We have illustrated that several existing discrete time iterative algorithms can be reinterpreted in terms of CCCP (see Yuille and Rangarajan, in preparation, for other examples). Therefore CCCP gives a novel way of thinking about and classifying existing algorithms.
Moreover, CCCP can also be used to construct novel algorithms. See, for example, recent work [13] where CCCP was used to construct a double loop algorithm to minimize the Bethe/Kikuchi free energy (which are generalizations of the mean field free energy).

There are interesting connections between our results and those known to mathematicians. After this work was completed we found that a result similar to Theorem 2 had appeared in an unpublished technical report by D. Geman. There also are similarities to the work of Hoang Tuy, who has shown that any arbitrary closed set is the projection of a difference of two convex sets in a space with one more dimension. (See http://www.mai.liu.se/Opt/MPS/News/tuy.html).

Acknowledgements

We thank James Coughlan and Yair Weiss for helpful conversations. Max Welling gave useful feedback on this manuscript. We thank the National Institute of Health (NEI) for grant number R01-EY 12691-01.

References

[1] J.N. Darroch and D. Ratcliff. "Generalized Iterative Scaling for Log-Linear Models". The Annals of Mathematical Statistics. Vol. 43, No. 5, pp 1470-1480. 1972.

[2] R. Durbin, R. Szeliski and A.L. Yuille. "An Analysis of an Elastic Net Approach to the Traveling Salesman Problem". Neural Computation. 1, pp 348-358. 1989.

[3] I.M. Elfadel. "Convex potentials and their conjugates in analog mean-field optimization". Neural Computation. Vol. 7, No. 5, pp 1079-1104. 1995.

[4] R. Hathaway. "Another Interpretation of the EM Algorithm for Mixture Distributions". Statistics and Probability Letters. Vol. 4, pp 53-56. 1986.

[5] J. Kosowsky and A.L. Yuille. "The Invisible Hand Algorithm: Solving the Assignment Problem with Statistical Physics". Neural Networks. Vol. 7, No. 3, pp 477-490. 1994.
[6] E. Mjolsness and C. Garrett. "Algebraic Transformations of Objective Functions". Neural Networks. Vol. 3, pp 651-669. 1990.

[7] A. Rangarajan, S. Gold, and E. Mjolsness. "A Novel Optimizing Network Architecture with Applications". Neural Computation. 8(5), pp 1041-1060. 1996.

[8] A. Rangarajan, A.L. Yuille, S. Gold, and E. Mjolsness. "A Convergence Proof for the Softassign Quadratic Assignment Problem". In Proceedings of NIPS'96. Denver, Colorado. 1996.

[9] R. Sinkhorn. "A Relationship Between Arbitrary Positive Matrices and Doubly Stochastic Matrices". Ann. Math. Statist. 35, pp 876-879. 1964.

[10] F.R. Waugh and R.M. Westervelt. "Analog neural networks with local competition: I. Dynamics and stability". Physical Review E. 47(6), pp 4524-4536. 1993.

[11] A.L. Yuille. "Generalized Deformable Models, Statistical Physics and Matching Problems". Neural Computation. 2, pp 1-24. 1990.

[12] A.L. Yuille and J.J. Kosowsky. "Statistical Physics Algorithms that Converge". Neural Computation. 6, pp 341-356. 1994.

[13] A.L. Yuille. "A Double-Loop Algorithm to Minimize the Bethe and Kikuchi Free Energies". Neural Computation. In press. 2002.
", "award": [], "sourceid": 2125, "authors": [{"given_name": "Alan", "family_name": "Yuille", "institution": null}, {"given_name": "Anand", "family_name": "Rangarajan", "institution": null}]}