{"title": "Convergence Rates of Algorithms for Visual Search: Detecting Visual Contours", "book": "Advances in Neural Information Processing Systems", "page_first": 641, "page_last": 647, "abstract": null, "full_text": "Convergence Rates of Algorithms for Visual Search: Detecting Visual Contours

A.L. Yuille
Smith-Kettlewell Inst.
San Francisco, CA 94115

James M. Coughlan
Smith-Kettlewell Inst.
San Francisco, CA 94115

Abstract

This paper formulates the problem of visual search as Bayesian inference and defines a Bayesian ensemble of problem instances. In particular, we address the problem of the detection of visual contours in noise/clutter by optimizing a global criterion which combines local intensity and geometry information. We analyze the convergence rates of A* search algorithms using results from information theory to bound the probability of rare events within the Bayesian ensemble. This analysis determines characteristics of the domain, which we call order parameters, that determine the convergence rates. In particular, we present a specific admissible A* algorithm with pruning which converges, with high probability, in expected time O(N) in the size of the problem. In addition, we briefly summarize extensions of this work which address fundamental limits of target contour detectability (i.e. algorithm-independent results) and the use of non-admissible heuristics.

1 Introduction

Many problems in vision, such as the detection of edges and object boundaries in noise/clutter, see figure (1), require the use of search algorithms. Though many algorithms have been proposed, see Yuille and Coughlan (1997) for a review, none of them are clearly optimal and it is difficult to judge their relative effectiveness. One approach has been to compare the results of algorithms on a representative dataset of images.
This is clearly highly desirable, though determining a representative dataset is often rather subjective.

In this paper we are specifically interested in the convergence rates of A* algorithms (Pearl 1984). It can be shown (Yuille and Coughlan 1997) that many algorithms proposed to detect visual contours are special cases of A*. We would like to understand what characteristics of the problem domain determine the convergence rates.

Figure 1: The difficulty of detecting the target path in clutter depends, by our theory (Yuille and Coughlan 1998), on the order parameter K. The larger K, the less computation required. Left, an easy detection task with K = 3.1. Middle, a hard detection task with K = 1.6. Right, an impossible task with K = -0.7.

We formulate the problem of detecting object curves in images as one of statistical estimation. This assumes statistical knowledge of the images and the curves, see section (2). Such statistical knowledge has often been used in computer vision for determining optimization criteria to be minimized. We want to go one step further and use this statistical knowledge to determine good search strategies by defining a Bayesian ensemble of problem instances. For this ensemble, we can prove that certain curve and boundary detection algorithms, with high probability, converge in expected time linear in the size of the problem. Our analysis helps determine important characteristics of the problem, which we call order parameters, which quantify the difficulty of the problem.

The next section (2) of this paper describes the basic statistical assumptions we make about the domain and describes the mathematical tools used in the remaining sections. In section (3) we specify our search algorithm and establish convergence rates. We conclude by placing this work in a larger context and summarizing recent extensions.
2 Statistical Background

Our approach assumes that both the intensity properties and the geometrical shapes of the target path (i.e. the edge contour) can be determined statistically. This path can be considered to be a set of elementary path segments joined together. We first consider the intensity properties along the edge and then the geometric properties. The set of all possible paths can be represented by a tree structure, see figure (2).

The image properties at segments lying on the path are assumed to differ, in a statistical sense, from those off the path. More precisely, we can design a filter $\phi(\cdot)$ with output $\{y_x = \phi(I(x))\}$ for a segment at point $x$ so that:

$P(y_x) = P_{on}(y_x)$, if $x$ lies on the true path,
$P(y_x) = P_{off}(y_x)$, if $x$ lies off the true path. (1)

For example, we can think of the $\{y_x\}$ as being values of the edge strength at point $x$ and $P_{on}, P_{off}$ being the probability distributions of the response of $\phi(\cdot)$ on and off an edge. The set of possible values of the random variable $y_x$ is the alphabet, with alphabet size $M$ (i.e. $y_x$ can take any of $M$ possible values). See (Geman and Jedynak 1996) for examples of distributions for $P_{on}, P_{off}$ used in computer vision applications.

We now consider the geometry of the target contour. We require the path to be made up of connected segments $x_1, x_2, \ldots, x_N$. There will be a Markov probability distribution $P_g(x_{i+1}|x_i)$ which specifies prior probabilistic knowledge of the target.

It is convenient, in terms of the graph search algorithms we will use, to consider that each point $x$ has a set of $Q$ neighbours. Following terminology from graph theory, we refer to $Q$ as the branching factor. We will assume that the distribution $P_g$ depends only on the relative positions of $x_{i+1}$ and $x_i$. In other words, $P_g(x_{i+1}|x_i) = P_{\Delta g}(x_{i+1} - x_i)$.
An important special case is when the probability distribution is uniform for all branches (i.e. $P_{\Delta g}(\Delta x) = U(\Delta x) = 1/Q, \forall \Delta x$). The joint distribution $P(X, Y)$ of the road geometry $X$ and filter responses $Y$ determines the Bayesian ensemble.

By standard Bayesian analysis, the optimal path $X^* = \{x_1^*, \ldots, x_N^*\}$ maximizes the log posterior:

$E(X) = \sum_i \log \frac{P_{on}(y(x_i))}{P_{off}(y(x_i))} + \sum_i \log \frac{P_{\Delta g}(x_{i+1} - x_i)}{U(x_{i+1} - x_i)}$, (2)

where the sum over $i$ is taken over all points on the target. $U(x_{i+1} - x_i)$ is the uniform distribution and its presence merely changes the log posterior $E(X)$ by a constant value. It is included to make the form of the intensity and geometric terms similar, which simplifies our later analysis. We will refer to $E(X)$ as the reward of the path $X$, which is the sum of the intensity rewards $\log \frac{P_{on}(y(x_i))}{P_{off}(y(x_i))}$ and the geometric rewards $\log \frac{P_{\Delta g}(x_{i+1} - x_i)}{U(x_{i+1} - x_i)}$.

It is important to emphasize that our results can be extended to higher-order Markov chain models (provided they are shift-invariant). We can, for example, define the $x$ variable to represent the spatial orientation and position of a small edge segment. This will allow our theory to apply to models, such as snakes, used in recent successful vision applications (Geman and Jedynak 1996). (It is straightforward to transform the standard energy function formulation of snakes into a Markov chain by discretizing and replacing the derivatives by differences. The smoothness constraints, such as membrane and thin plate terms, will transform into first and second order Markov chain connections respectively.) Recent work by Zhu (1998) shows that Markov chain models of this type can be learnt using Minimax Entropy Learning theory from a representative set of examples. Indeed Zhu goes further by demonstrating that other Gestalt grouping laws can be expressed in this framework and learnt from representative data.

Most Bayesian vision theories have stopped at this point.
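To make the reward concrete, here is a minimal numerical sketch of equation (2) in Python; the binary filter alphabet and all distributions (`P_on`, `P_off`, `P_dg`) are invented for illustration and are not taken from the paper:

```python
import math

# Illustrative distributions (not from the paper): binary filter alphabet.
P_on  = {0: 0.2, 1: 0.8}            # edge-strength distribution on the path
P_off = {0: 0.8, 1: 0.2}            # edge-strength distribution off the path
Q = 3                               # branching factor
P_dg = {-1: 0.25, 0: 0.5, 1: 0.25}  # geometric prior over relative moves
U = 1.0 / Q                         # uniform branch distribution

def reward(ys, dxs):
    """Path reward E(X) of eq. (2): intensity plus geometric log-likelihood ratios."""
    r = 0.0
    for y in ys:
        r += math.log(P_on[y] / P_off[y])   # intensity reward
    for dx in dxs:
        r += math.log(P_dg[dx] / U)         # geometric reward
    return r

# A path whose measurements look edge-like scores higher than a clutter path.
r_true    = reward([1, 1, 1], [0, 0, 0])
r_clutter = reward([0, 0, 0], [0, 0, 0])
```

Note that the geometric term rewards moves the prior considers more likely than chance ($P_{\Delta g} > U$) and penalizes the rest, exactly as the constant $U$ term in equation (2) is designed to do.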
The statistics of the problem domain are used only to determine the optimization criterion to be minimized and are not exploited to analyze the complexity of algorithms for performing the optimization. In this paper, we go a stage further. We use the statistics of the problem domain to define a Bayesian ensemble and hence to determine the effectiveness of algorithms for optimizing criteria such as (2). To do this requires the use of Sanov's theorem for calculating the probability of rare events (Cover and Thomas 1991). For the road tracking problem this can be re-expressed as the following theorem, derived in (Yuille and Coughlan 1998):

Theorem 1. The probabilities that the spatially averaged log-likelihoods on, and off, the true curve are above, or below, threshold $T$ are bounded above as follows:

$\Pr\{\frac{1}{n}\sum_{i=1}^{n} \{\log \frac{P_{on}(y(x_i))}{P_{off}(y(x_i))}\}_{on} < T\} \le (n+1)^M 2^{-nD(P_T\|P_{on})}$, (3)

$\Pr\{\frac{1}{n}\sum_{i=1}^{n} \{\log \frac{P_{on}(y(x_i))}{P_{off}(y(x_i))}\}_{off} > T\} \le (n+1)^M 2^{-nD(P_T\|P_{off})}$, (4)

where the subscripts on and off mean that the data is generated by $P_{on}$, $P_{off}$ respectively, and $P_T(y) = P_{on}^{\lambda(T)}(y) P_{off}^{1-\lambda(T)}(y)/Z(T)$, where $0 \le \lambda(T) \le 1$ is a scalar which depends on the threshold $T$ and $Z(T)$ is a normalization factor. The value of $\lambda(T)$ is determined by the constraint $\sum_y P_T(y) \log \frac{P_{on}(y)}{P_{off}(y)} = T$.

In the next section, we will use Theorem 1 to determine a criterion for pruning the search based on comparing the intensity reward to a threshold $T$ (pruning will also be done using the geometric reward). The choice of $T$ involves a trade-off. If $T$ is large (i.e. close to $D(P_{on}\|P_{off})$) then we will rapidly reject false paths but we might also prune out the target (true) path. Conversely, if $T$ is small (close to $-D(P_{off}\|P_{on})$) then it is unlikely we will prune out the target path but we may waste a lot of time exploring false paths.
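Theorem 1 can be checked numerically. A sketch under invented distributions (log base 2, matching the $2^{-nD}$ form of the bounds); the geometric-mixture family $P_T$ is formed explicitly and the constraint fixing $T$ is evaluated directly:

```python
import math

P_on  = {0: 0.2, 1: 0.8}   # illustrative on-edge distribution (not from the paper)
P_off = {0: 0.8, 1: 0.2}
M = len(P_on)              # alphabet size

def kl(p, q):
    """Kullback-Leibler divergence D(p||q) in bits."""
    return sum(p[y] * math.log2(p[y] / q[y]) for y in p)

def tilted(lam):
    """Geometric mixture P_T proportional to P_on^lam * P_off^(1-lam) of Theorem 1."""
    w = {y: P_on[y] ** lam * P_off[y] ** (1 - lam) for y in P_on}
    Z = sum(w.values())
    return {y: wy / Z for y, wy in w.items()}

def sanov_bound(n, lam):
    """Return (T, bound), where bound = (n+1)^M 2^{-n D(P_T||P_on)} as in eq. (3)."""
    PT = tilted(lam)
    T = sum(PT[y] * math.log2(P_on[y] / P_off[y]) for y in PT)  # the constraint on lam
    return T, (n + 1) ** M * 2.0 ** (-n * kl(PT, P_on))
```

At $\lambda = 1$, $P_T = P_{on}$, so $T = D(P_{on}\|P_{off})$ and the exponent vanishes; lowering $\lambda$ lowers $T$ and makes the bound on pruning out the true path decay exponentially in $n$.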
In this paper we choose $T$ large and write the fall-off factors (i.e. the exponents in the bounds of equations (3,4)) as $D(P_T\|P_{on}) = \epsilon_1(T)$, $D(P_T\|P_{off}) = D(P_{on}\|P_{off}) - \epsilon_2(T)$, where $\epsilon_1(T), \epsilon_2(T)$ are positive and $(\epsilon_1(T), \epsilon_2(T)) \to (0,0)$ as $T \to D(P_{on}\|P_{off})$. We perform a similar analysis for the geometric rewards by substituting $P_{\Delta g}, U$ for $P_{on}, P_{off}$. We choose a threshold $\hat{T}$ satisfying $-D(U\|P_{\Delta g}) < \hat{T} < D(P_{\Delta g}\|U)$. The results of Theorem 1 apply with the obvious substitutions. In particular, the alphabet factor becomes $Q$ (the branching factor). Once again, in this paper, we choose $\hat{T}$ to be large and obtain fall-off factors $D(P_{\hat{T}}\|P_{\Delta g}) = \hat{\epsilon}_1(\hat{T})$, $D(P_{\hat{T}}\|U) = D(P_{\Delta g}\|U) - \hat{\epsilon}_2(\hat{T})$.

3 Tree Search: A*, heuristics, and block pruning

We now consider a specific example, motivated by Geman and Jedynak (1996), of searching for a path through a search tree. In Geman and Jedynak the path corresponds to a road in an aerial image and they assume that they are given an initial point and direction on the target path. They have a branching factor $Q = 3$ and, in their first version, the prior probability of branching is taken to be the uniform distribution (later they consider more sophisticated priors). They assume that no path segments overlap, which means that the search space is a tree of size $Q^N$ where $N$ is the size of the problem (i.e. the longest length). The size of the problem requires an algorithm that converges in O(N) time, and they demonstrate an algorithm which empirically performs at this speed. But no proof of convergence rates is given in their paper. It can be shown, see (Yuille and Coughlan 1997), that the Geman and Jedynak algorithm is a close approximation to A* which uses pruning.
(Observe that Geman and Jedynak's tree representation is a simplifying assumption of the Bayesian model which assumes that once a path diverges from the true path it can never recover, although we stress that the algorithm is able to recover from false starts; for more details see Coughlan and Yuille 1998.)

We consider an algorithm which uses an admissible A* heuristic and a pruning mechanism. The idea is to examine the paths chosen by the A* heuristic. As the length of a candidate path reaches an integer multiple of $N_0$ we prune it based on its intensity reward and its geometric reward evaluated on the previous $N_0$ segments, which we call a segment block. The reasoning is that few false paths will survive this pruning for long but the target path will survive with high probability.

We prune on the intensity by eliminating all paths whose intensity reward, averaged over the last $N_0$ segments, is below a threshold $T$ (recall that $-D(P_{off}\|P_{on}) < T < D(P_{on}\|P_{off})$ and we will usually select $T$ to take values close to $D(P_{on}\|P_{off})$). In addition, we prune on the geometry by eliminating all paths whose geometric rewards, averaged over the last $N_0$ segments, are below $\hat{T}$ (where $-D(U\|P_{\Delta g}) < \hat{T} < D(P_{\Delta g}\|U)$, with $\hat{T}$ typically being close to $D(P_{\Delta g}\|U)$). More precisely, we discard a path provided (for any integer $z \ge 0$):

$\frac{1}{N_0} \sum_{i=zN_0+1}^{(z+1)N_0} \log \frac{P_{on}(y_i)}{P_{off}(y_i)} < T$, or $\frac{1}{N_0} \sum_{i=zN_0+1}^{(z+1)N_0} \log \frac{P_{\Delta g}(\Delta x_i)}{U(\Delta x_i)} < \hat{T}$. (5)

There are two important issues to address: (i) With what probability will the algorithm converge? (ii) How long do we expect it to take to converge? The next two subsections put bounds on these issues.

3.1 Probability of Convergence

Because of the pruning, there is a chance that there will be no paths which survive pruning.
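The block-pruning test of equation (5) is simple to state in code. A sketch with invented distributions and thresholds (log base 2, as in Theorem 1):

```python
import math

# Illustrative distributions (not from the paper).
P_on  = {0: 0.2, 1: 0.8}
P_off = {0: 0.8, 1: 0.2}
Q = 3
P_dg = {-1: 0.25, 0: 0.5, 1: 0.25}   # geometric prior; U = 1/Q

def survives_block(ys, dxs, T, T_hat):
    """Pruning test of eq. (5) on one block of N0 segments.

    The path survives only if BOTH the block-averaged intensity reward
    and the block-averaged geometric reward clear their thresholds."""
    n0 = len(ys)
    r_int  = sum(math.log2(P_on[y] / P_off[y]) for y in ys) / n0
    r_geom = sum(math.log2(P_dg[dx] * Q) for dx in dxs) / n0   # log(P_dg/U)
    return r_int >= T and r_geom >= T_hat

# An edge-like block survives modest thresholds; a clutter-like block is cut.
ok  = survives_block([1, 1, 1, 1], [0, 0, 0, 0], T=0.5, T_hat=0.2)
cut = survives_block([0, 0, 1, 0], [1, -1, 1, -1], T=0.5, T_hat=0.2)
```

In a full implementation this test would be applied each time a candidate path's length reaches a multiple of $N_0$, interleaved with the A* expansions.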
To put a bound on this we calculate the probability that the target (true) path survives the pruning. This gives a lower bound on the probability of convergence (because there could be false paths which survive even if the target path is mistakenly pruned out).

The pruning rule removes path segments for which the intensity reward $r_I$ or the geometric reward $r_g$ fails the pruning test. The probability of failure by removing a block segment of the true path, with rewards $r_I^*, r_g^*$, is $\Pr(r_I^* < T \text{ or } r_g^* < \hat{T}) \le \Pr(r_I^* < T) + \Pr(r_g^* < \hat{T}) \le (N_0+1)^M 2^{-N_0\epsilon_1(T)} + (N_0+1)^Q 2^{-N_0\hat{\epsilon}_1(\hat{T})}$, where we have used Theorem 1 to put bounds on the probabilities. The probability of pruning out any $N_0$ segments of the true path can therefore be made arbitrarily small by choosing $N_0, T, \hat{T}$ so as to make $N_0\epsilon_1$ and $N_0\hat{\epsilon}_1$ large.

It should be emphasized that the algorithm will not necessarily converge to the exact target path. The admissible nature of the heuristic means that the algorithm will converge to the path with highest reward which has survived the pruning. It is highly probable that this path is close to the target path. Our recent results (Coughlan and Yuille 1998, Yuille and Coughlan 1998) enable us to quantify this claim.

3.2 Bounding the Number of False Paths

Suppose we face a $Q$-nary tree. We can order the false paths by the stage at which they diverge from the target (true) path, see figure (2). For example, at the first branch point the target path lies on only one of the $Q$ branches and there are $Q-1$ false branches which generate the first set of false paths $F_1$. Now consider all the $Q-1$ false branches at the second target branch point; these generate the set $F_2$. As we follow along the true path we keep generating these false sets $F_i$. The set of all paths is therefore the target path plus the union of the $F_i$ $(i = 1, \ldots, N)$. To determine convergence rates we must bound the amount of time we spend searching the $F_i$.
If the expected time to search each $F_i$ is constant then searching for the target path will take at most constant $\cdot N$ steps.

Consider the set $F_i$ of false paths which leave the true path at stage $i$. We will apply our analysis to block segments of $F_i$ which are completely off the true path. If $(i-1)$ is an integer multiple of $N_0$ then all block segments of $F_i$ will satisfy this condition. Otherwise, we will start our analysis at the next block and make the worst case assumption that all path segments up till this next block will be searched. Since the distance to the next block is at most $N_0 - 1$, this gives a maximum number of $Q^{N_0-1}$ starting blocks for any branch of $F_i$. Each $F_i$ also has $Q-1$ branches and so this gives a generous upper bound of $(Q-1)Q^{N_0-1}$ starting blocks for each $F_i$.

Figure 2: The target path is shown as the heavy line. The false path sets are labelled as $F_1, F_2$, etc., with the numbering depending on how soon they leave the target path. The branching factor $Q = 3$.

For each starting block, we wish to compute (or bound) the expected number of blocks that are explored thereafter. This requires computing the fertility of a block, the average number of paths in the block that survive pruning. Provided the fertility is smaller than one, we can then apply results from the theory of branching processes to determine the expected number of blocks searched in $F_i$.

The fertility $q$ is the number of paths that survive the geometric pruning times the probability that each survives the intensity pruning. This can be bounded (using Theorem 1) by $q \le \bar{q}$ where:

$\bar{q} = Q^{N_0}(N_0+1)^Q 2^{-N_0\{D(P_{\Delta g}\|U) - \hat{\epsilon}_2(\hat{T})\}}(N_0+1)^M 2^{-N_0\{D(P_{on}\|P_{off}) - \epsilon_2(T)\}} = (N_0+1)^{Q+M} 2^{-N_0\{D(P_{on}\|P_{off}) - H(P_{\Delta g}) - \epsilon_2(T) - \hat{\epsilon}_2(\hat{T})\}}$, (6)

where we used the fact that $D(P_{\Delta g}\|U) = \log Q - H(P_{\Delta g})$.

Observe that the condition $\bar{q} < 1$ can be satisfied provided $D(P_{on}\|P_{off}) - H(P_{\Delta g}) > 0$. This condition is intuitive: it requires that the edge detector information, quantified by $D(P_{on}\|P_{off})$, must be greater than the uncertainty in the geometry measured by $H(P_{\Delta g})$. In other words, the better the edge detector and the more predictable the path geometry, the smaller $\bar{q}$ will be.

We now apply the theory of branching processes to determine the expected number of blocks explored from a starting block in $F_i$: $\sum_{z=0}^{\infty} \bar{q}^z = 1/(1-\bar{q})$. The number of branches of $F_i$ is $(Q-1)$, the total number of segments explored per block is at most $Q^{N_0}$, and we explore at most $Q^{N_0-1}$ segments before reaching the first block. The total number of $F_i$ is $N$. Therefore the total number of segments wastefully explored is at most $N(Q-1)\frac{Q^{2N_0-1}}{1-\bar{q}}$. We summarize this result in a theorem:

Theorem 2. Provided $\bar{q} = (N_0+1)^{Q+M} 2^{-N_0 K} < 1$, where the order parameter $K = D(P_{on}\|P_{off}) - H(P_{\Delta g}) - \epsilon_2(T) - \hat{\epsilon}_2(\hat{T})$, then the expected number of false segments explored is at most $N(Q-1)\frac{Q^{2N_0-1}}{1-\bar{q}}$.

Comment. The requirement that $\bar{q} < 1$ is chiefly determined by the order parameter $K = D(P_{on}\|P_{off}) - H(P_{\Delta g}) - \epsilon_2(T) - \hat{\epsilon}_2(\hat{T})$. Our convergence proof requires that $K > 0$ and will break down if $K < 0$. Is this a limitation of our proof? Or does it correspond to a fundamental difficulty in solving this tracking problem?

In more recent work (Yuille and Coughlan 1998) we extend the concept of order parameters and show that they characterize the difficulty of visual search problems independently of the algorithm. In other words, as $K \to 0$ the problem becomes impossible to solve by any algorithm. There will be too many false paths which have better rewards than the target path. As $K \to 0$ there is a phase transition in the ease of solving the problem.
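The order parameter and the Theorem 2 bound are easy to evaluate numerically. A sketch with an invented, fairly reliable edge detector (all numbers illustrative, not from the paper); the small $\epsilon$ terms are dropped, so $K$ reduces to $D(P_{on}\|P_{off}) - H(P_{\Delta g})$:

```python
import math

# Illustrative distributions (not from the paper): a fairly reliable detector.
P_on  = {0: 0.1, 1: 0.9}
P_off = {0: 0.9, 1: 0.1}
Q = 3
P_dg = {-1: 0.25, 0: 0.5, 1: 0.25}
M = len(P_on)

def kl(p, q):
    """Kullback-Leibler divergence D(p||q) in bits."""
    return sum(p[y] * math.log2(p[y] / q[y]) for y in p)

def entropy(p):
    """Shannon entropy H(p) in bits."""
    return -sum(v * math.log2(v) for v in p.values())

# Order parameter K = D(P_on||P_off) - H(P_dg), ignoring the eps terms.
K = kl(P_on, P_off) - entropy(P_dg)

def expected_false_segments(N, N0):
    """Theorem 2 bound N(Q-1) Q^(2*N0-1) / (1 - q_bar), with
    q_bar = (N0+1)^(Q+M) 2^(-N0*K); returns None when q_bar >= 1
    (i.e. the bound is vacuous for that block size)."""
    q_bar = (N0 + 1) ** (Q + M) * 2.0 ** (-N0 * K)
    if q_bar >= 1:
        return None
    return N * (Q - 1) * Q ** (2 * N0 - 1) / (1 - q_bar)
```

With this detector $K \approx 1.0 > 0$, so for a large enough block size $N_0$ the polynomial factor $(N_0+1)^{Q+M}$ is beaten by $2^{-N_0 K}$, $\bar{q}$ drops below one, and the wasted search is linear in $N$; for small $N_0$, or for $K \le 0$, the bound gives no guarantee, illustrating the phase-transition role of $K$.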
4 Conclusion

Our analysis shows it is possible to detect certain types of image contours in linear expected time (with given starting points). We have shown how the convergence rates depend on order parameters which characterize the problem domain. In particular, the entropy of the geometric prior and the Kullback-Leibler distance between $P_{on}$ and $P_{off}$ allow us to quantify intuitions about the power of geometrical assumptions and edge detectors to solve these tasks.

Our more recent work (Yuille and Coughlan 1998) has extended this work by showing that the order parameters can be used to specify the intrinsic (algorithm-independent) difficulty of the search problem and that phase transitions occur when these order parameters take critical values. In addition, we have proved convergence rates for A* algorithms which use inadmissible heuristics or combinations of heuristics and pruning (Coughlan and Yuille 1998).

As shown in (Yuille and Coughlan 1997), many of the search algorithms proposed to solve vision search problems, such as (Geman and Jedynak 1996), are special cases of A* (or close approximations). We therefore hope that the results of this paper will throw light on the success of these algorithms and may suggest practical improvements and speed-ups.

Acknowledgements

We want to acknowledge funding from NSF with award number IRI-9700446, from the Center for Imaging Sciences funded by ARO DAAH049510494, and from an ASOSRF contract 49620-98-1-0197 to ALY. We would like to thank L. Xu, D. Snow, S. Konishi, D. Geiger, J. Malik, and D. Forsyth for helpful discussions.

References

[1] J.M. Coughlan and A.L. Yuille. "Bayesian A* Tree Search with Expected O(N) Convergence Rates for Road Tracking." Submitted to Artificial Intelligence. 1998.

[2] T.M.
Cover and J.A. Thomas. Elements of Information Theory. Wiley Interscience Press. New York. 1991.

[3] D. Geman and B. Jedynak. "An active testing model for tracking roads in satellite images". IEEE Trans. Patt. Anal. and Machine Intel. Vol. 18, No. 1, pp 1-14. January 1996.

[4] J. Pearl. Heuristics. Addison-Wesley. 1984.

[5] A.L. Yuille and J. Coughlan. "Twenty Questions, Focus of Attention, and A*". In Energy Minimization Methods in Computer Vision and Pattern Recognition. Ed. M. Pelillo and E. Hancock. Springer-Verlag. (Lecture Notes in Computer Science 1223). 1997.

[6] A.L. Yuille and J.M. Coughlan. "Visual Search: Fundamental Bounds, Order Parameters, Phase Transitions, and Convergence Rates." Submitted to Pattern Analysis and Machine Intelligence. 1998.

[7] S.C. Zhu. "Embedding Gestalt Laws in Markov Random Fields". Submitted to IEEE Computer Society Workshop on Perceptual Organization in Computer Vision.
", "award": [], "sourceid": 1513, "authors": [{"given_name": "Alan", "family_name": "Yuille", "institution": null}, {"given_name": "James", "family_name": "Coughlan", "institution": null}]}