{"title": "Softening Discrete Relaxation", "book": "Advances in Neural Information Processing Systems", "page_first": 438, "page_last": 444, "abstract": null, "full_text": "Softening Discrete Relaxation \n\nAndrew M. Finch,  Richard C.  Wilson and Edwin R.  Hancock \n\nDepartment of Computer Science, \n\nUniversity of York,  York,  YO1  5DD,  UK \n\nAbstract \n\nThis paper describes a new framework for relational graph matching. The starting point is a recently reported Bayesian consistency measure which gauges structural differences using Hamming distance. The main contributions of the work are threefold. Firstly, we demonstrate how the discrete components of the cost function can be softened. The second contribution is to show how the softened cost function can be used to locate matches using continuous non-linear optimisation. Finally, we show how the resulting graph matching algorithm relates to the standard quadratic assignment problem. \n\n1  Introduction \n\nGraph matching [6, 5, 7, 2, 3, 12, 11] is a topic of central importance in pattern perception. The main computational issues are how to compare inexact relational descriptions [7] and how to search efficiently for the best match [8]. These two issues have recently stimulated interest in the connectionist literature [9, 6, 5, 10]. For instance, Simic [9], Suganathan et al. [10] and Gold et al. [6, 5] have addressed the issue of how to measure relational distance expressively. Both Gold and Rangarajan [6] and Suganathan et al. [10] have shown how non-linear optimisation techniques such as mean-field annealing [10] and graduated assignment [6] can be applied to find optimal matches. \n\nIn a recent series of papers we have developed a Bayesian framework for relational graph matching [2, 3, 11, 12]. 
The novelty resides in the fact that relational consistency is gauged by a probability distribution that uses Hamming distance to measure structural differences between the graphs under match. This new framework has not only been used to match complex infra-red [3] and radar imagery [11], it has also been used to successfully control a graph-edit process [12] of the sort originally proposed by Sanfeliu and Fu [7]. The optimisation of this relational consistency measure has hitherto been confined to the use of discrete update procedures [11, 2, 3]. Examples include discrete relaxation [7, 11], simulated annealing [4, 3] and genetic search [2]. Our aim in this paper is to consider how the optimisation of the relational consistency measure can be realised by continuous means [6, 10]. Specifically, we consider how the matching process can be effected using a non-linear technique similar to mean-field annealing [10] or graduated assignment [6]. In order to achieve this goal we must transform our discrete cost function [11] into a form suitable for optimisation by continuous techniques. The key idea is to exploit the apparatus of statistical physics [13] to compute the effective Gibbs potentials for our discrete relaxation process. The potentials are in fact weighted sums of Hamming distance enumerated over the consistent relations of the model graph. The quantities of interest in the optimisation process are the derivatives of the global energy function computed from the Gibbs potentials. In the case of our weighted sum of Hamming distance, these derivatives take on a particularly interesting form which provides an intuitive insight into the dynamics of the update process. 
An experimental evaluation of the technique reveals not only that it is successful in matching noise-corrupted graphs, but that it significantly outperforms the optimisation of the standard quadratic energy function. \n\n2  Relational Consistency \n\nOur overall goal in this paper is to formulate a non-linear optimisation technique for matching relational graphs. We use the notation $G = (V, E)$ to denote the graphs under match, where $V$ is the set of nodes and $E$ is the set of edges. Our aim in matching is to associate nodes in a graph $G_D = (V_D, E_D)$ representing data to be matched against those in a graph $G_M = (V_M, E_M)$ representing an available relational model. Formally, the matching is represented by a function $f : V_D \\to V_M$ from the nodes in the data graph $G_D$ to those in the model graph $G_M$. We represent the structure of the two graphs using a pair of connection matrices. The connection matrix for the data graph consists of the binary array \n\n$$ D_{ab} = \\begin{cases} 1 & \\text{if } (a, b) \\in E_D \\\\ 0 & \\text{otherwise} \\end{cases} \\tag{1} $$\n\nwhile that for the model graph is \n\n$$ M_{\\alpha\\beta} = \\begin{cases} 1 & \\text{if } (\\alpha, \\beta) \\in E_M \\\\ 0 & \\text{otherwise} \\end{cases} \\tag{2} $$\n\nThe current state of match between the two graphs is represented by the function $f : V_D \\to V_M$. In other words, the statement $f(a) = \\alpha$ means that the node $a \\in V_D$ is matched to the node $\\alpha \\in V_M$. The binary representation of the current state of match is captured by a set of assignment variables which convey the following meaning \n\n$$ s_{a\\alpha} = \\begin{cases} 1 & \\text{if } f(a) = \\alpha \\\\ 0 & \\text{otherwise} \\end{cases} \\tag{3} $$\n\nThe basic goal of the matching process is to optimise a consistency measure which gauges the structural similarity of the matched data graph and the model graph. \n
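The following Python/NumPy sketch illustrates equations (1) to (3) concretely. It builds the two connection matrices and the binary assignment variables from edge lists and a node mapping. It is not the authors' code. The helper names `connection_matrix` and `assignment_matrix` and the toy graphs are hypothetical, and edges are assumed to be undirected, so each connection matrix is symmetric.

```python
import numpy as np


def connection_matrix(n_nodes, edges):
    # Binary connection matrix as in eqs. (1)/(2): entry (i, j) is 1 iff
    # (i, j) is an edge. Edges are assumed undirected, so both entries are set.
    C = np.zeros((n_nodes, n_nodes), dtype=int)
    for i, j in edges:
        C[i, j] = 1
        C[j, i] = 1
    return C


def assignment_matrix(f, n_data, n_model):
    # Binary assignment variables as in eq. (3): s[a, alpha] = 1 iff f(a) = alpha.
    s = np.zeros((n_data, n_model), dtype=int)
    for a, alpha in f.items():
        s[a, alpha] = 1
    return s


# Hypothetical toy graphs: a data path 0-1-2-3 and a model cycle 0-1-2-3-0.
D = connection_matrix(4, [(0, 1), (1, 2), (2, 3)])
M = connection_matrix(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
s = assignment_matrix({0: 0, 1: 1, 2: 2, 3: 3}, 4, 4)  # identity match f
```

Because $f$ is a function, every row of `s` contains exactly one 1.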
\nIn a recent series of papers, Wilson and Hancock [11, 12] have shown how consistency of match can be modelled using a Bayesian framework. The basic idea is to construct a probability distribution which models the effect of memoryless matching errors in generating departures from consistency between the data and model graphs. Suppose that $S_\\alpha = \\{\\alpha\\} \\cup \\{\\beta \\mid (\\alpha, \\beta) \\in E_M\\}$ represents the set of nodes that form the immediate contextual neighbourhood of the node $\\alpha$ in the model graph. Further suppose that $\\Gamma_a = \\{f(a)\\} \\cup \\{f(b) \\mid (a, b) \\in E_D\\}$ represents the set of matches assigned to the contextual neighbourhood of the node $a \\in V_D$ of the data graph. Central to Wilson and Hancock's modelling of relational consistency is the idea of regarding the complete set of model-graph relations as mutually exclusive causes from which the potentially corrupt matched model-graph relations arise. As a result, the probability of the matched configuration $\\Gamma_a$ can be expressed as a mixture distribution over the corresponding space of model-graph configurations \n\n$$ p(\\Gamma_a) = \\sum_{\\alpha \\in V_M} p(\\Gamma_a \\mid S_\\alpha) \\, P(S_\\alpha) \\tag{4} $$\n\nThe modelling of the match confusion probabilities $p(\\Gamma_a \\mid S_\\alpha)$ draws on the assumption that the error process is independent of location. This allows $p(\\Gamma_a \\mid S_\\alpha)$ to be factorised over its component matches. Individual label errors are further assumed to act with a memoryless probability $P_e$. With these ingredients the probability of the matched neighbourhood $\\Gamma_a$ reduces to [11, 12] \n\n$$ p(\\Gamma_a) = \\frac{K_a}{|V_M|} \\sum_{\\alpha \\in V_M} \\exp[-\\mu H(a, \\alpha)] \\tag{5} $$\n\nwhere $K_a = (1 - P_e)^{|\\Gamma_a|}$ and the exponential constant is related to the probability of label errors, i.e. $\\mu = \\ln \\frac{1 - P_e}{P_e}$. \n
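The reduction from (4) to (5) can be checked numerically. The sketch below reflects our reading of the error model, not the authors' derivation. It assumes three things:

- each of the |Γ_a| component matches is independently in error with probability P_e;
- H of those components disagree with S_α;
- the prior P(S_α) = 1/|V_M| is uniform.

Under these assumptions, p(Γ_a | S_α) = (1 - P_e)^(|Γ_a| - H) P_e^H = K_a exp(-μH), which is exactly the summand in (5). The function names and the Hamming distances are hypothetical.

```python
import math


def mixture_direct(H_values, n_gamma, p_e):
    # Eq. (4) with a uniform prior P(S_alpha) = 1/|V_M| and a factorised,
    # memoryless error model: H mismatched components, each with prob. p_e.
    n_model = len(H_values)
    return sum((1 - p_e) ** (n_gamma - h) * p_e ** h for h in H_values) / n_model


def mixture_closed_form(H_values, n_gamma, p_e):
    # Eq. (5): p = K_a / |V_M| * sum_alpha exp(-mu * H(a, alpha)).
    K_a = (1 - p_e) ** n_gamma
    mu = math.log((1 - p_e) / p_e)
    return K_a / len(H_values) * sum(math.exp(-mu * h) for h in H_values)


H_values = [0, 1, 3, 2]  # hypothetical Hamming distances to four model neighbourhoods
print(mixture_direct(H_values, 4, 0.1), mixture_closed_form(H_values, 4, 0.1))
```

Note that μ vanishes as P_e approaches 1/2 and grows without bound as P_e approaches 0. This is the hardening behaviour exploited later.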
\nConsistency of match is gauged by the \"Hamming distance\" $H(a, \\alpha)$ between the matched relation $\\Gamma_a$ and the set of consistent neighbourhood structures $S_\\alpha, \\forall \\alpha \\in V_M$ from the model graph. According to our binary representation of the matching process, the distance measure is computed using the connectivity matrices and the assignment variables in the following manner \n\n$$ H(a, \\alpha) = \\sum_{b \\in V_D} \\sum_{\\beta \\in V_M} M_{\\alpha\\beta} D_{ab} (1 - s_{b\\beta}) \\tag{6} $$\n\nThe probability distribution $p(\\Gamma_a)$ may be regarded as providing a natural way of modelling departures from consistency at the neighbourhood level. Matching consistency is graded by Hamming distance, and controlled hardening may be induced by reducing the label-error probability $P_e$ towards zero. \n\n3  The Effective Potential for Discrete Relaxation \n\nWe commence the development of our graduated assignment approach to discrete relaxation by computing an effective Gibbs potential $U(\\Gamma_a)$ for the matching configuration $\\Gamma_a$. In other words, we aim to replace the compound exponential probability distribution appearing in equation (5) by the single Gibbs distribution \n\n$$ p(\\Gamma_a) = \\frac{1}{Z_a} \\exp[-\\mu U(\\Gamma_a)] \\tag{7} $$\n\nOur route to the effective potential is provided by statistical physics. If we represent $p(\\Gamma_a)$ by an equivalent Gibbs distribution with an identical partition function, then the equilibrium configurational potential is related to the partial derivative of the log-probability with respect to the coupling constant $\\mu$ in the following manner [13] \n\n$$ U(\\Gamma_a) = -\\frac{\\partial \\ln p(\\Gamma_a)}{\\partial \\mu} \\tag{8} $$\n\nUpon substituting for $p(\\Gamma_a)$ from equation (5) \n\n$$ U(\\Gamma_a) = \\frac{\\sum_{\\alpha \\in V_M} H(a, \\alpha) \\exp[-\\mu H(a, \\alpha)]}{\\sum_{\\alpha \\in V_M} \\exp[-\\mu H(a, \\alpha)]} \\tag{9} $$\n\nIn other words, the neighbourhood Gibbs potentials are simply weighted sums of Hamming distance between the data and model graphs. \n
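The following is a minimal NumPy sketch of equations (6) and (9). It assumes dense connection matrices and treats K_a as a constant, which is what the passage from (8) to (9) does. As a quick check of (8), a central finite difference of ln Σ_α exp[-μH(a, α)] with respect to μ reproduces U(Γ_a). The helper names and toy graphs are illustrative, not taken from the paper. In the `einsum` strings, the letters g and d stand for α and β.

```python
import numpy as np


def hamming_distances(D, M, s):
    # Eq. (6): H[a, alpha] = sum_b sum_beta M[alpha, beta] * D[a, b] * (1 - s[b, beta]).
    # Works for binary or softened (continuous) s alike.
    return np.einsum('ab,gd,bd->ag', D, M, 1.0 - s)


def matching_probabilities(H, mu):
    # xi[a, alpha] = exp(-mu H) / sum_alpha' exp(-mu H); the row minimum is
    # subtracted first for numerical stability (it cancels in the ratio).
    w = np.exp(-mu * (H - H.min(axis=1, keepdims=True)))
    return w / w.sum(axis=1, keepdims=True)


def effective_potential(H, mu):
    # Eq. (9): U(Gamma_a) is the xi-weighted average Hamming distance.
    return (matching_probabilities(H, mu) * H).sum(axis=1)


def log_mixture(H, mu):
    # ln sum_alpha exp(-mu H[a, alpha]), i.e. ln p(Gamma_a) up to the
    # mu-independent constant ln(K_a / |V_M|) when K_a is held fixed.
    m = H.min(axis=1)
    return -mu * m + np.log(np.exp(-mu * (H - m[:, None])).sum(axis=1))


# Hypothetical toy graphs: data path 0-1-2-3, model cycle 0-1-2-3-0.
D = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
M = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], dtype=float)
s = np.eye(4)  # identity match
H = hamming_distances(D, M, s)

mu, eps = 1.5, 1e-5
U = effective_potential(H, mu)
# Eq. (8): U = -d ln p / d mu, checked by a central finite difference.
U_fd = -(log_mixture(H, mu + eps) - log_mixture(H, mu - eps)) / (2 * eps)
```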
\nIn fact, the local clique potentials display an interesting barrier property. The potential is concentrated at Hamming distance $H \\approx 1/\\mu$, where $H \\exp[-\\mu H]$ is maximal. Both very large and very small Hamming distances contribute insignificantly to the energy function, i.e. $\\lim_{H \\to 0} H \\exp[-\\mu H] = 0$ and $\\lim_{H \\to \\infty} H \\exp[-\\mu H] = 0$. \n\nWith the neighbourhood matching potentials to hand, we construct a global \"matching energy\" by summing the contributions over the nodes of the data graph \n\n$$ \\mathcal{E} = \\sum_{a \\in V_D} U(\\Gamma_a) \\tag{10} $$\n\n4  Optimising the Global Cost Function \n\nWe are now in a position to develop a continuous update algorithm by softening the discrete ingredients of our graph matching potential. The idea is to compute the derivatives of the global energy given in equation (10) and to effect the softening process using the soft-max idea of Bridle [1]. \n\n4.1  Softassign \n\nThe energy function represented by equations (9) and (10) is defined over the discrete matching variables $s_{a\\alpha}$. The basic idea underpinning this paper is to realise a continuous process for updating the assignment variables. The optimal step size is determined by computing the partial derivatives of the global matching energy with respect to the assignment variables. We commence by computing the derivatives of the contributing neighbourhood Gibbs potentials, i.e. \n\n$$ \\frac{\\partial U(\\Gamma_a)}{\\partial s_{b\\beta}} = \\sum_{\\alpha \\in V_M} \\xi_{a\\alpha} \\left[1 - \\mu \\left(H(a, \\alpha) - U(\\Gamma_a)\\right)\\right] \\frac{\\partial H(a, \\alpha)}{\\partial s_{b\\beta}} $$\n\nwhere \n\n$$ \\xi_{a\\alpha} = \\frac{\\exp[-\\mu H(a, \\alpha)]}{\\sum_{\\alpha' \\in V_M} \\exp[-\\mu H(a, \\alpha')]} \\tag{11} $$\n\nTo further develop this result, we must compute the derivatives of the Hamming distances. From equation (6) it follows that \n\n$$ \\frac{\\partial H(a, \\alpha)}{\\partial s_{b\\beta}} = -M_{\\alpha\\beta} D_{ab} \\tag{12} $$\n\nIt is now a straightforward matter to show that the derivative of the global matching energy is equal to \n\n$$ \\frac{\\partial \\mathcal{E}}{\\partial s_{b\\beta}} = -\\sum_{a \\in V_D} \\sum_{\\alpha \\in V_M} \\xi_{a\\alpha} M_{\\alpha\\beta} D_{ab} \\left[1 - \\mu \\left(H(a, \\alpha) - U(\\Gamma_a)\\right)\\right] \\tag{13} $$\n\nWe would like our continuous matching variables to remain constrained to lie within the range $[0, 1]$. 
Rather than using a linear update rule, we exploit Bridle's soft-max ansatz [1]. In doing this we arrive at an update process which has many features in common with the well-known mean-field equations of statistical physics \n\n$$ s_{a\\alpha} \\leftarrow \\frac{\\exp\\left[-\\frac{1}{T} \\frac{\\partial \\mathcal{E}}{\\partial s_{a\\alpha}}\\right]}{\\sum_{\\alpha' \\in V_M} \\exp\\left[-\\frac{1}{T} \\frac{\\partial \\mathcal{E}}{\\partial s_{a\\alpha'}}\\right]} \\tag{14} $$\n\nThe mathematical structure of this update process is important and deserves further comment. The quantity $\\xi_{a\\alpha}$ defined in equation (11) naturally plays the role of a matching probability. The first term appearing under the square bracket in equation (13) can therefore be thought of as analogous to the optimal update direction for the standard quadratic cost function [10, 6]; we will discuss this relationship in more detail in Section 4.2. The second term modifies this principal update direction by taking into account the weighted fluctuations in the Hamming distance about the effective potential or average Hamming distance. If the average fluctuation is zero, then there is no net modification to the update direction. When the net fluctuation is non-zero, the direction of update is modified so as to compensate for the movement of the mean value of the effective potential. This corrective tracking process provides an explicit mechanism for maintaining contact with the minimum of the effective potential under rescaling effects induced by changes in the value of the coupling constant $\\mu$. Moreover, since the fluctuation term is itself proportional to $\\mu$, it has an insignificant effect for $P_e \\approx 1/2$ but dominates the update process when $P_e \\to 0$. \n
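Equations (10) to (14) combine into a single update step. The sketch below assumes dense NumPy arrays and a fixed coupling constant μ. It computes the energy (10), its gradient (13) and the soft-max update (14). The function names are hypothetical. Subtracting the row minimum of H and the row maximum of the soft-max argument is a standard numerical stabilisation that leaves (11) and (14) unchanged.

```python
import numpy as np


def hamming_distances(D, M, s):
    # Eq. (6), with continuous (softened) assignment variables s.
    return np.einsum('ab,gd,bd->ag', D, M, 1.0 - s)


def energy_and_gradient(D, M, s, mu):
    H = hamming_distances(D, M, s)
    w = np.exp(-mu * (H - H.min(axis=1, keepdims=True)))
    xi = w / w.sum(axis=1, keepdims=True)        # eq. (11), indexed [a, alpha]
    U = (xi * H).sum(axis=1)                     # eq. (9)
    E = U.sum()                                  # eq. (10)
    # Eq. (13): dE/ds[b, beta] = -sum_a sum_alpha xi[a, alpha] M[alpha, beta]
    #           D[a, b] [1 - mu (H[a, alpha] - U[a])]
    G = xi * (1.0 - mu * (H - U[:, None]))
    grad = -np.einsum('ag,gd,ab->bd', G, M, D)
    return E, grad


def softassign_step(grad, T):
    # Eq. (14): row-wise soft-max of -grad / T over the model nodes alpha'.
    z = -grad / T
    z = z - z.max(axis=1, keepdims=True)  # stabilise; cancels in the ratio
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)
```

Equation (13) is obtained analytically from (6), (9) and (10). Comparing `grad` with a finite-difference estimate of the returned energy is therefore a cheap way to catch indexing mistakes. In an annealing loop, `softassign_step` would be called repeatedly while T is lowered.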
\n4.2  Quadratic Assignment Problem \n\nBefore we proceed to experiment with the new graph matching process, it is interesting to briefly review the standard quadratic formulation of the matching problem investigated by Simic [9], Suganathan et al. [10] and Gold and Rangarajan [6]. The common feature of these algorithms is to commence from the quadratic cost function \n\n$$ \\mathcal{E}_Q = -\\frac{1}{2} \\sum_{a \\in V_D} \\sum_{b \\in V_D} \\sum_{\\alpha \\in V_M} \\sum_{\\beta \\in V_M} D_{ab} M_{\\alpha\\beta} s_{a\\alpha} s_{b\\beta} \\tag{15} $$\n\nIn this case the derivative of the global cost function is linear in the assignment variables, i.e. (using the symmetry of the connection matrices) \n\n$$ \\frac{\\partial \\mathcal{E}_Q}{\\partial s_{b\\beta}} = -\\sum_{a \\in V_D} \\sum_{\\alpha \\in V_M} D_{ab} M_{\\alpha\\beta} s_{a\\alpha} \\tag{16} $$\n\nApart from $\\xi_{a\\alpha}$ playing the role of $s_{a\\alpha}$, this step is equivalent to the one used in the update of equation (14) provided that $\\mu = 0$, i.e. $P_e \\to 1/2$. The update is realised by applying the soft-max ansatz of equation (14). In the next section, we will provide some experimental comparison with the resulting matching process. However, it is important to stress that the update process adopted here is very simplistic and leaves considerable scope for further refinement. For instance, Gold and Rangarajan [6] have exploited the doubly stochastic properties of Sinkhorn matrices to ensure two-way symmetry in the matching process. \n\n5  Experiments and Conclusions \n\nOur main aim in this section is to compare the non-linear update equations with the optimisation of the quadratic matching criterion described in Section 4.2. The data for our study are provided by synthetic Delaunay graphs. These graphs are constructed by generating random dot patterns. Each random dot is used to seed a Voronoi cell. The Delaunay triangulation is the region adjacency graph for the Voronoi cells. In order to pose demanding tests of our matching technique, we have added controlled amounts of corruption to the synthetic graphs. This is effected by deleting and adding a specified fraction of the dots from the initial random patterns. \n
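The corruption protocol can be sketched as follows. The sketch assumes that dots are drawn uniformly in the unit square and that deletion and addition use the same count, so each pattern keeps its size. The function name is hypothetical. The Delaunay graph itself is not constructed here; it could, for instance, be built from each pattern with `scipy.spatial.Delaunay`.

```python
import numpy as np


def corrupt_dot_pattern(points, fraction, rng):
    # Delete round(fraction * n) randomly chosen dots and add the same number of
    # new uniformly distributed dots, so the pattern keeps its overall size.
    n = len(points)
    k = int(round(fraction * n))
    keep = np.sort(rng.choice(n, size=n - k, replace=False))
    new_points = rng.random((k, points.shape[1]))
    corrupted = np.vstack([points[keep], new_points])
    # keep[i] is the original index of the i-th surviving dot; the last k rows
    # of corrupted are the added (spurious) dots.
    return corrupted, keep


rng = np.random.default_rng(1)
points = rng.random((50, 2))  # 50 random dots in the unit square (assumed)
corrupted, survivors = corrupt_dot_pattern(points, 0.2, rng)
```

The `survivors` array provides the ground truth needed later. Only surviving dots have a correct counterpart in the original pattern.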
\nThe associated Delaunay graph is therefore subject to structural corruption. We measure the degree of corruption by the fraction of surviving nodes in the corrupted Delaunay graph. \n\nOur experimental protocol has been as follows. For a series of different corruption levels, we have generated a sample of 100 random graphs. The graphs contain 50 nodes each. According to the specified corruption level, we have both added and deleted a predefined fraction of nodes at random locations in the initial graphs so as to maintain their overall size. For each graph we measure the quality of match by computing the fraction of the surviving nodes for which the assignment variables indicate the correct match. The value of the temperature $T$ in the update process has been controlled using a logarithmic annealing schedule of the form suggested by Geman and Geman [4]. We initialise the assignment variables uniformly across the set of matches by setting $s_{a\\alpha} = 1/|V_M|$ for all $a, \\alpha$. \n\nWe have compared the results obtained with two different versions of the matching algorithm. The first of these involves updating the softened assignment variables by applying the non-linear update equation given in (14). The second matching algorithm involves applying the same optimisation apparatus to the quadratic cost function defined in equation (15) in a simplified form of the quadratic assignment algorithm [6, 10]. \n\nFigure 1 shows the final fraction of correct matches for both algorithms. The data curves show the correct matching fraction averaged over the graph samples as a function of the corruption fraction. The main conclusion that can be drawn from these plots is that the new matching technique described in this paper significantly outperforms its conventional quadratic counterpart described in Section 4.2. \n
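The quadratic baseline and the evaluation pieces can be sketched as below for comparison. The sketch assumes that E_Q takes the form -1/2 Σ D_ab M_αβ s_aα s_bβ with symmetric connection matrices. Under that assumption, its gradient reduces to (16), which is linear in s. Several further details are our own assumptions rather than details given in the paper:

- the annealing constant c;
- the k ≥ 1 indexing of the logarithmic schedule T_k = c / ln(1 + k);
- the argmax rule for reading off a match.

All function names are hypothetical.

```python
import math

import numpy as np


def quadratic_energy(D, M, s):
    # Assumed form of eq. (15): E_Q = -1/2 sum D[a, b] M[alpha, beta] s[a, alpha] s[b, beta].
    return -0.5 * np.einsum('ab,gd,ag,bd->', D, M, s, s)


def quadratic_gradient(D, M, s):
    # Eq. (16): dE_Q/ds[b, beta] = -sum_a sum_alpha D[a, b] M[alpha, beta] s[a, alpha].
    # Linear in s; relies on D and M being symmetric.
    return -D.T @ s @ M


def log_schedule(c, k):
    # Logarithmic annealing of the form T_k = c / ln(1 + k), k >= 1 (after [4]).
    return c / math.log(1 + k)


def fraction_correct(s, truth):
    # truth maps each surviving data node to its correct model node; a match
    # counts as correct when the largest assignment variable picks that node.
    hits = sum(int(np.argmax(s[a]) == alpha) for a, alpha in truth.items())
    return hits / len(truth)
```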
\nThe main difference between the two techniques resides in the fact that our new method relies on updating with derivatives of the energy function that are non-linear in the assignment variables, whereas the quadratic derivative (16) is linear in them. \n\nTo conclude, our main contribution in this paper has been to demonstrate how the discrete Bayesian relational consistency measure of Wilson and Hancock [11] can be cast in a form that is amenable to continuous non-linear optimisation. We have shown how the method relates to the standard quadratic assignment algorithm extensively studied in the connectionist literature [6, 9, 10]. Moreover, an experimental analysis reveals that the method offers superior robustness to noise. \n\n[Figure 1 plot: fraction of correct matches (vertical axis) against fraction of graph corrupt (horizontal axis, 0 to 0.8), with curves labelled Quadratic Assignment and Softened Discrete Relaxation.] \n\nFigure 1: Experimental comparison: softened discrete relaxation (dotted curve); matching using the quadratic cost function (solid curve). \n\nReferences \n\n[1] Bridle J.S., \"Training stochastic model recognition algorithms can lead to maximum mutual information estimation of parameters\", NIPS 2, pp. 211-217, 1990. \n\n[2] Cross A.D.J., R.C. Wilson and E.R. Hancock, \"Genetic search for structural matching\", Proceedings ECCV96, LNCS 1064, pp. 514-525, 1996. \n\n[3] Cross A.D.J. and E.R. Hancock, \"Relational matching with stochastic optimisation\", IEEE Computer Society International Symposium on Computer Vision, pp. 365-370, 1995. \n\n[4] Geman S. and D. Geman, \"Stochastic relaxation, Gibbs distributions and Bayesian restoration of images\", IEEE PAMI, PAMI-6, pp. 721-741, 1984. \n\n[5] Gold S., A. Rangarajan and E. Mjolsness, \"Learning with pre-knowledge: Clustering with point and graph-matching distance measures\", Neural Computation, 8, pp. 787-804, 1996. \n\n[6] Gold S. and A. Rangarajan, \"A graduated assignment algorithm for graph matching\", IEEE PAMI, 18, pp. 377-388, 1996. \n\n[7] Sanfeliu A. and K.S. Fu, \"A distance measure between attributed relational graphs for pattern recognition\", IEEE SMC, 13, pp. 353-362, 1983. \n\n[8] Shapiro L. and R.M. Haralick, \"Structural description and inexact matching\", IEEE PAMI, 3, pp. 504-519, 1981. \n\n[9] Simic P., \"Constrained nets for graph matching and other quadratic assignment problems\", Neural Computation, 3, pp. 268-281, 1991. \n\n[10] Suganathan P.N., E.K. Teoh and D.P. Mital, \"Pattern recognition by graph matching using Potts MFT networks\", Pattern Recognition, 28, pp. 997-1009, 1995. \n\n[11] Wilson R.C., A.N. Evans and E.R. Hancock, \"Relational matching by discrete relaxation\", Image and Vision Computing, 13, pp. 411-421, 1995. \n\n[12] Wilson R.C. and E.R. Hancock, \"Relational matching with dynamic graph structures\", Proceedings of the Fifth International Conference on Computer Vision, pp. 450-456, 1995. \n\n[13] Yuille A., \"Generalised deformable models, statistical physics and matching problems\", Neural Computation, 2, pp. 1-24, 1990. \n", "award": [], "sourceid": 1308, "authors": [{"given_name": "Andrew", "family_name": "Finch", "institution": null}, {"given_name": "Richard", "family_name": "Wilson", "institution": null}, {"given_name": "Edwin", "family_name": "Hancock", "institution": null}]}