{"title": "Provable Submodular Minimization using Wolfe's Algorithm", "book": "Advances in Neural Information Processing Systems", "page_first": 802, "page_last": 809, "abstract": "Owing to several applications in large scale learning and vision problems, fast submodular function minimization (SFM) has become a critical problem. Theoretically, unconstrained SFM can be performed in polynomial time (Iwata and Orlin 2009), however these algorithms are not practical. In 1976, Wolfe proposed an algorithm to find the minimum Euclidean norm point in a polytope, and in 1980, Fujishige showed how Wolfe's algorithm can be used for SFM. For general submodular functions, the Fujishige-Wolfe minimum norm algorithm seems to have the best empirical performance. Despite its good practical performance, theoretically very little is known about Wolfe's minimum norm algorithm -- to our knowledge the only result is an exponential time analysis due to Wolfe himself. In this paper we give a maiden convergence analysis of Wolfe's algorithm. We prove that in t iterations, Wolfe's algorithm returns a O(1/t)-approximate solution to the min-norm point. We also prove a robust version of Fujishige's theorem which shows that an O(1/n^2)-approximate solution to the min-norm point problem implies exact submodular minimization. As a corollary, we get the first pseudo-polynomial time guarantee for the Fujishige-Wolfe minimum norm algorithm for submodular function minimization. In particular, we show that the min-norm point algorithm solves SFM in O(n^7F^2)-time, where $F$ is an upper bound on the maximum change a single element can cause in the function value.", "full_text": "Provable Submodular Minimization using\n\nWolfe\u2019s Algorithm\n\nDeeparnab Chakrabarty\u2217\n\nPrateek Jain\u2217\n\nPravesh Kothari\u2020\n\nAbstract\n\nOwing to several applications in large scale learning and vision problems, fast\nsubmodular function minimization (SFM) has become a critical problem. 
Theoretically, unconstrained SFM can be performed in polynomial time [10, 11]. However, these algorithms are typically not practical. In 1976, Wolfe [21] proposed an algorithm to find the minimum Euclidean norm point in a polytope, and in 1980, Fujishige [3] showed how Wolfe's algorithm can be used for SFM. For general submodular functions, this Fujishige-Wolfe minimum norm algorithm seems to have the best empirical performance.

Despite its good practical performance, very little is known about Wolfe's minimum norm algorithm theoretically. To our knowledge, the only result is an exponential time analysis due to Wolfe [21] himself. In this paper we give a maiden convergence analysis of Wolfe's algorithm. We prove that in t iterations, Wolfe's algorithm returns an O(1/t)-approximate solution to the min-norm point on any polytope. We also prove a robust version of Fujishige's theorem which shows that an O(1/n²)-approximate solution to the min-norm point on the base polytope implies exact submodular minimization. As a corollary, we get the first pseudo-polynomial time guarantee for the Fujishige-Wolfe minimum norm algorithm for unconstrained submodular function minimization.

1 Introduction

An integer-valued¹ function f : 2^X → Z defined over subsets of some finite ground set X of n elements is submodular if it satisfies the following diminishing marginal returns property: for every S ⊆ T ⊆ X and i ∈ X \ T, f(S ∪ {i}) − f(S) ≥ f(T ∪ {i}) − f(T). Submodularity arises naturally in several applications such as image segmentation [17], sensor placement [18], etc., where minimizing an arbitrary submodular function is an important primitive.

In submodular function minimization (SFM), we assume access to an evaluation oracle for f which for any subset S ⊆ X returns the value f(S).
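For a concrete feel of the diminishing-returns condition, it can be checked exhaustively on a toy example. The graph-cut function below is a standard submodular function; the snippet itself is illustrative and not from the paper.

```python
from itertools import combinations

# Hypothetical toy example (not from the paper): the cut function of an
# undirected graph is a classic submodular function.
EDGES = [(0, 1), (1, 2), (2, 3), (0, 2)]
GROUND = range(4)

def cut(S):
    """f(S) = number of edges crossing the cut (S, V \\ S)."""
    S = set(S)
    return sum(1 for u, v in EDGES if (u in S) != (v in S))

def is_submodular(f, ground):
    """Exhaustively verify f(S+i) - f(S) >= f(T+i) - f(T) for all S <= T, i outside T."""
    subsets = [set(c) for r in range(len(ground) + 1)
               for c in combinations(ground, r)]
    for S in subsets:
        for T in subsets:
            if not S <= T:
                continue
            for i in set(ground) - T:
                if f(S | {i}) - f(S) < f(T | {i}) - f(T):
                    return False
    return True

print(is_submodular(cut, list(GROUND)))  # True for any graph-cut function
```

Any such callable plays the role of the evaluation oracle below: one call computes one value f(S).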
We denote the time taken by the oracle to answer a single query as EO. The objective is to find a set T ⊆ X satisfying f(T) ≤ f(S) for every S ⊆ X.

In 1981, Grötschel, Lovász and Schrijver [8] demonstrated the first polynomial time algorithm for SFM using the ellipsoid algorithm. This algorithm, however, is practically infeasible due to the running time and the numerical issues in implementing the ellipsoid algorithm. In 2001, Schrijver [19] and Iwata et al. [9] independently designed combinatorial polynomial time algorithms for SFM. Currently, the best algorithm is by Iwata and Orlin [11] with a running time of O(n⁵EO + n⁶).

However, from a practical standpoint, none of the provably polynomial time algorithms exhibit good performance on instances of SFM encountered in practice (see §4). This, along with the widespread applicability of SFM in machine learning, has inspired a large body of work on practically fast procedures (see [1] for a survey). But most of these procedures focus either on special submodular functions such as decomposable functions [16, 20] or on constrained SFM problems [13, 12, 15, 14].

Fujishige-Wolfe's Algorithm for SFM: For any submodular function f, the base polytope Bf of f is defined as follows:

Bf = {x ∈ Rⁿ : x(A) ≤ f(A), ∀A ⊂ X, and x(X) = f(X)},    (1)

where x(A) := ∑_{i∈A} x_i and x_i is the i-th coordinate of x ∈ Rⁿ. Fujishige [3] showed that if one can obtain the minimum norm point on the base polytope, then one can solve SFM. Finding the minimum norm point, however, is a non-trivial problem; at present, to our knowledge, the only polynomial time algorithm known is via the ellipsoid method.

∗Microsoft Research, 9 Lavelle Road, Bangalore 560001.
†University of Texas at Austin (Part of the work done while interning at Microsoft Research)
¹One can assume any function is integer-valued after suitable scaling.
Wolfe [21] described an iterative procedure to find minimum norm points in polytopes as long as linear functions could be (efficiently) minimized over them. Although the base polytope has exponentially many constraints, a simple greedy algorithm can minimize any linear function over it. Therefore, using Wolfe's procedure on the base polytope coupled with Fujishige's theorem becomes a natural approach to SFM. This was suggested as early as 1984 in Fujishige [4] and is now called the Fujishige-Wolfe algorithm for SFM.

This approach towards SFM was revitalized in 2006 when Fujishige and Isotani [6, 7] announced encouraging computational results regarding the minimum norm point algorithm. In particular, this algorithm significantly out-performed all known provably polynomial time algorithms. Theoretically, however, little is known regarding the convergence of Wolfe's procedure except for the finite, but exponential, running time Wolfe himself proved. Nor is the situation any better for its application on the base polytope. Given the practical success, we believe this is an important, and intriguing, theoretical challenge.

In this work, we make some progress towards analyzing the Fujishige-Wolfe method for SFM and, in fact, Wolfe's algorithm in general. In particular, we prove the following two results:

• We prove (in Theorem 4) that for any polytope B, Wolfe's algorithm converges to an ε-approximate solution in O(1/ε) steps. More precisely, in O(nQ²/ε) iterations, Wolfe's algorithm returns a point x with ‖x‖² ≤ ‖x∗‖² + ε, where Q = max_{p∈B} ‖p‖.

• We prove (in Theorem 5) a robust version of a theorem by Fujishige [3] relating min-norm points on the base polytope to SFM. In particular, we prove that an approximate min-norm point solution provides an approximate solution to SFM as well.
More precisely, if x satisfies ‖x‖² ≤ z⊤x + ε² for all z ∈ Bf, then f(Sx) ≤ min_S f(S) + 2nε, where Sx can be constructed efficiently using x.

Together, these two results give us our main result, which is a pseudopolynomial bound on the running time of the Fujishige-Wolfe algorithm for submodular function minimization.

Theorem 1 (Main Result). Fix a submodular function f : 2^X → Z. The Fujishige-Wolfe algorithm returns the minimizer of f in O((n⁵EO + n⁷)F²) time, where F := max_{i∈[n]} max(|f({i})|, |f([n]) − f([n] \ {i})|).

Our analysis suggests that the Fujishige-Wolfe algorithm is dependent on F and has a worse dependence on n than the Iwata-Orlin [11] algorithm. To verify this, we conducted an empirical study on several standard SFM problems. However, for the considered benchmark functions, the running time of the Fujishige-Wolfe algorithm seemed to be independent of F and exhibited a better dependence on n than the Iwata-Orlin algorithm. This is described in §4.

2 Preliminaries: Submodular Functions and Wolfe's Algorithm

2.1 Submodular Functions and SFM

Given a ground set X on n elements, without loss of generality we think of it as the first n integers [n] := {1, 2, . . . , n}. Let f be a submodular function. Since submodularity is translation invariant, we assume f(∅) = 0. For a submodular function f, we write Bf ⊆ Rⁿ for the associated base polyhedron of f defined in (1). Given x ∈ Rⁿ, one can find the minimum value of q⊤x over q ∈ Bf in O(n log n + nEO) time using the following greedy algorithm: renumber indices such that x₁ ≤ ··· ≤ xₙ, and set q∗ᵢ = f([i]) − f([i − 1]).
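The greedy step just described can be sketched in code. We assume, for illustration, that the evaluation oracle is available as a Python callable on frozensets; this interface is our own convention, not the paper's.

```python
def greedy_lo(x, f):
    """Linear optimization over the base polytope B_f: given x in R^n and a
    submodular-function oracle f (a Python callable on frozensets, an
    interface assumed for illustration), return q minimizing x^T q over B_f.

    Following Section 2.1: sort coordinates of x in ascending order, then
    take marginal gains q*_i = f([i]) - f([i-1]) along that order.
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])  # x_(1) <= ... <= x_(n)
    q = [0.0] * n
    prefix = frozenset()
    prev = f(prefix)  # f(emptyset), assumed normalized to 0 in the paper
    for i in order:
        prefix = prefix | {i}
        cur = f(prefix)
        q[i] = cur - prev  # marginal gain of adding i to the growing prefix
        prev = cur
    return q
```

For a modular f (a sum of per-element weights) the marginal gains recover the weights regardless of x, which is a quick sanity check of the construction.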
Then, it can be proved that q∗ ∈ Bf and that it minimizes x⊤q over q ∈ Bf.

The connection between the SFM problem and the base polytope was first established in the following minimax theorem of Edmonds [2].

Theorem 2 (Edmonds [2]). Given any submodular function f with f(∅) = 0, we have

min_{S⊆[n]} f(S) = max_{x∈Bf} ∑_{i : xᵢ<0} xᵢ.

The following theorem of Fujishige [3] shows the connection between finding the minimum norm point in the base polytope Bf of a submodular function f and the problem of SFM on input f. This forms the basis of Wolfe's algorithm. In §3.2, we prove a robust version of this theorem.

Theorem 3 (Fujishige's Theorem [3]). Let f : 2^[n] → Z be a submodular function and let Bf be the associated base polyhedron. Let x∗ be the optimal solution to min_{x∈Bf} ‖x‖. Define S = {i | x∗ᵢ < 0}. Then, f(S) ≤ f(T) for every T ⊆ [n].

2.2 Wolfe's Algorithm for the Minimum Norm Point of a Polytope

We now present Wolfe's algorithm for computing the minimum-norm point in an arbitrary polytope B ⊆ Rⁿ. We assume a linear optimization oracle (LO) which takes as input a vector x ∈ Rⁿ and outputs a vector q ∈ arg min_{p∈B} x⊤p.

We start by recalling some definitions. The affine hull of a finite set S ⊆ Rⁿ is aff(S) = {y | y = ∑_{z∈S} αz · z, ∑_{z∈S} αz = 1}. The affine minimizer of S is defined as y = arg min_{z∈aff(S)} ‖z‖², and y satisfies the following affine minimizer property: for any v ∈ aff(S), v⊤y = ‖y‖². The procedure AffineMinimizer(S) returns (y, α), where y is the affine minimizer and α = (αs)_{s∈S} is the set of coefficients expressing y as an affine combination of points in S.
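A naive AffineMinimizer can be sketched using the closed form α ∝ (B⊤B)⁻¹1, where B's columns are the points of S. This is a minimal pure-Python sketch; it assumes the points of S are affinely independent so that B⊤B is invertible, and the tiny Gaussian-elimination helper is our own implementation detail.

```python
def affine_minimizer(S):
    """Return (y, alpha): y = argmin over aff(S) of ||z||^2, plus its affine
    coefficients. Uses alpha proportional to (B^T B)^{-1} 1, assuming the
    points of S (a list of equal-length coordinate lists) are affinely
    independent (a simplifying assumption for this sketch)."""
    m, n = len(S), len(S[0])
    # M = B^T B, where B's columns are the points of S.
    M = [[sum(S[i][k] * S[j][k] for k in range(n)) for j in range(m)]
         for i in range(m)]
    w = solve(M, [1.0] * m)            # w = (B^T B)^{-1} 1
    s = sum(w)
    alpha = [wi / s for wi in w]       # normalize so coefficients sum to 1
    y = [sum(alpha[i] * S[i][k] for i in range(m)) for k in range(n)]
    return y, alpha

def solve(A, b):
    """Tiny dense linear solve via Gauss-Jordan with partial pivoting."""
    m = len(A)
    A = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(m):
            if r != c and A[c][c] != 0:
                f = A[r][c] / A[c][c]
                for k in range(c, m + 1):
                    A[r][k] -= f * A[c][k]
    return [A[i][m] / A[i][i] for i in range(m)]
```

As a check, the output y satisfies the affine minimizer property v⊤y = ‖y‖² for every v ∈ S.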
The AffineMinimizer procedure can be naively implemented in O(|S|³ + n|S|²) time as follows. Let B be the n × |S| matrix where each column is a point in S. Then α = (B⊤B)⁻¹1 / (1⊤(B⊤B)⁻¹1) and y = Bα.

Algorithm 1 Wolfe's Algorithm

1. Let q be an arbitrary vertex of B. Initialize x ← q. We always maintain x = ∑_{i∈S} λᵢqᵢ as a convex combination of a subset S of vertices of B. Initialize S = {q} and λ₁ = 1.
2. WHILE(true): (MAJOR CYCLE)
   (a) q := LO(x).   // Linear optimization: q ∈ arg min_{p∈B} x⊤p.
   (b) IF ‖x‖² ≤ x⊤q + ε² THEN break.   // Termination condition. Output x.
   (c) S := S ∪ {q}.
   (d) WHILE(true): (MINOR CYCLE)
       i. (y, α) = AffineMinimizer(S).   // y = arg min_{z∈aff(S)} ‖z‖.
       ii. IF αᵢ ≥ 0 for all i THEN break.   // If y ∈ conv(S), then end the minor loop.
       iii. ELSE:
            // If y ∉ conv(S), update x to the intersection of the boundary of conv(S) and the segment joining y and the previous x. Delete points from S which are not required to describe the new x as a convex combination.
            θ := min_{i: αᵢ<0} λᵢ/(λᵢ − αᵢ).
            Update x ← θy + (1 − θ)x.   // By definition of θ, the new x lies in conv(S).
            Update λᵢ ← θαᵢ + (1 − θ)λᵢ.   // This sets the coefficients of the new x. Recall, x = ∑ᵢ λᵢqᵢ.
            S = {i : λᵢ > 0}.   // Delete points which have λᵢ = 0. This deletes at least one point.
   (e) Update x ← y.   // After the minor loop terminates, x is updated to be the affine minimizer of the current set S.
3. RETURN x.

When ε = 0, the algorithm on termination (if it terminates) returns the minimum norm point in B, since ‖x‖² ≤ x⊤x∗ ≤ ‖x‖ · ‖x∗‖. For completeness, we sketch Wolfe's argument in [21] of finite termination. Note that |S| ≤ n always; otherwise the affine minimizer is 0, which either terminates the program or starts a minor cycle which decrements |S|. Thus, the number of minor cycles in a major cycle is at most n, and it suffices to bound the number of major cycles. Each major cycle is associated with a set S whose affine minimizer, which is the current x, lies in the convex hull of S. Wolfe calls such sets corrals. Next, we show that ‖x‖ strictly decreases across iterations (major or minor cycles) of the algorithm, which proves that no corral repeats, thus bounding the number of major cycles by the number of corrals. The latter is at most (N choose n), where N is the number of vertices of B.

Consider iteration j which starts with xⱼ and ends with xⱼ₊₁. Let Sⱼ be the set S at the beginning of iteration j. If the iteration is a major cycle, then xⱼ₊₁ is the affine minimizer of Sⱼ ∪ {qⱼ}, where qⱼ = LO(xⱼ). Since xⱼ⊤qⱼ < ‖xⱼ‖² (the algorithm doesn't terminate in iteration j) and xⱼ₊₁⊤qⱼ = ‖xⱼ₊₁‖² (affine minimizer property), we get xⱼ ≠ xⱼ₊₁, and so ‖xⱼ₊₁‖ < ‖xⱼ‖ (since the affine minimizer is unique). If the iteration is a minor cycle, then xⱼ₊₁ = θxⱼ + (1 − θ)yⱼ, where yⱼ is the affine minimizer of Sⱼ and θ < 1. Since ‖yⱼ‖ < ‖xⱼ‖ (yⱼ ≠ xⱼ since yⱼ ∉ conv(Sⱼ)), we get ‖xⱼ₊₁‖ < ‖xⱼ‖.

3 Analysis

Our refined analysis of Wolfe's algorithm is encapsulated in the following theorem.

Theorem 4. Let B be an arbitrary polytope such that the maximum Euclidean norm of any vertex of B is at most Q. After O(nQ²/ε²) iterations, Wolfe's algorithm returns a point x ∈ B which satisfies ‖x‖² ≤ x⊤q + ε², for all points q ∈ B.
In particular, this implies ‖x‖² ≤ ‖x∗‖² + 2ε².

The above theorem shows that Wolfe's algorithm converges to the minimum norm point at a 1/t-rate. We stress that the above holds for any polytope. To apply this to SFM, we prove the following robust version of Fujishige's theorem connecting the minimum norm point in the base polytope and the set minimizing the submodular function value.

Theorem 5. Fix a submodular function f with base polytope Bf. Let x ∈ Bf be such that ‖x‖² ≤ x⊤q + ε² for all q ∈ Bf. Renumber indices such that x₁ ≤ ··· ≤ xₙ. Let S = {1, 2, . . . , k}, where k is the smallest index satisfying (C1) x_{k+1} ≥ 0 and (C2) x_{k+1} − x_k ≥ ε/n. Then, f(S) ≤ f(T) + 2nε for any subset T ⊆ [n]. In particular, if ε = 1/(4n) and f is integer-valued, then S is a minimizer.

Theorem 4 and Theorem 5 imply our main theorem.

Theorem 1 (Main Result). Fix a submodular function f : 2^X → Z. The Fujishige-Wolfe algorithm returns the minimizer of f in O((n⁵EO + n⁷)F²) time, where F := max_{i∈[n]} max(|f({i})|, |f([n]) − f([n] \ {i})|).

Proof. The vertices of Bf are well understood: for every permutation σ of [n], we have a vertex with x_{σ(i)} = f({σ(1), . . . , σ(i)}) − f({σ(1), . . . , σ(i − 1)}). By submodularity of f, we get |xᵢ| ≤ F for all i. Therefore, for any point x ∈ Bf, ‖x‖² ≤ nF². Choose ε = 1/(4n). From Theorem 4 we know that if we run O(n⁴F²) iterations of Wolfe's algorithm, we will get a point x ∈ Bf such that ‖x‖² ≤ x⊤q + ε² for all q ∈ Bf. Theorem 5 implies this solves the SFM problem. The running time of each iteration is dominated by the time for the subroutine to compute the affine minimizer of S, which is at most O(n³), and the linear optimization oracle. For Bf, LO(x) can be implemented in O(n log n + nEO) time.
This proves the theorem.

We prove Theorem 4 and Theorem 5 in §3.1 and §3.2, respectively.

3.1 Analysis of Wolfe's Min-norm Point Algorithm

The stumbling block in the analysis of Wolfe's algorithm is the interspersing of major and minor cycles, which oscillates the size of S, preventing it from being a good measure of progress. Instead, in our analysis, we use the norm of x as the measure of progress. We have already seen that ‖x‖ strictly decreases. It would be nice to quantify how much the decrease is, say, across one major cycle. This, at present, is out of our reach even for major cycles which contain two or more minor cycles in them. However, we can prove a significant drop in norm in major cycles which have at most one minor cycle in them. We call such major cycles good. The next easy, but very useful, observation is the following: one cannot run for too long without encountering a good major cycle.

Lemma 1. In any consecutive 3n + 1 iterations, there exists at least one good major cycle.

Proof. Consider a run of r iterations in which all major cycles are bad, and therefore contain ≥ 2 minor cycles. Say there are k major cycles and r − k minor cycles; then r − k ≥ 2k, implying r ≥ 3k. Let S_I be the set S at the start of these iterations and S_F be the set at the end. We have |S_F| ≤ |S_I| + k − (r − k) ≤ |S_I| + 2k − r ≤ n − r/3. Therefore, r ≤ 3n, since |S_F| ≥ 0.

Before proceeding, we introduce some notation.

Definition 1. Given a point x ∈ B, let us denote err(x) := ‖x‖² − ‖x∗‖². Given points x and q, let Δ(x, q) := ‖x‖² − x⊤q, and let Δ(x) := max_{q∈B} Δ(x, q) = ‖x‖² − min_{q∈B} x⊤q.
Observe that Δ(x) ≥ err(x)/2, since Δ(x) ≥ ‖x‖² − x⊤x∗ ≥ (‖x‖² − ‖x∗‖²)/2.

We now use t to index all good major cycles. Let x_t be the point x at the beginning of the t-th good major cycle. The next theorem shows that the norm drops significantly across good major cycles.

Theorem 6. For t iterating over good major cycles, err(x_t) − err(x_{t+1}) ≥ Δ²(x_t)/8Q².

We now complete the proof of Theorem 4 using Theorem 6.

Proof of Theorem 4. Using Theorem 6, we get err(x_t) − err(x_{t+1}) ≥ err(x_t)²/32Q², since Δ(x) ≥ err(x)/2 for all x. We claim that in t∗ ≤ 64Q²/ε² good major cycles, we reach x_{t∗} with err(x_{t∗}) ≤ ε². To see this, rewrite as follows:

err(x_{t+1}) ≤ err(x_t) (1 − err(x_t)/32Q²),    for all t.

Now let e₀ := err(x₀). Define t₀, t₁, . . . such that for all k ≥ 1 we have err(x_t) > e₀/2^k for t ∈ [t_{k−1}, t_k). That is, t_k is the first time t at which err(x_t) ≤ e₀/2^k. Note that for t ∈ [t_{k−1}, t_k), we have err(x_{t+1}) ≤ err(x_t)(1 − e₀/(32Q² · 2^k)). This implies that within 32Q² · 2^k/e₀ time units after t_{k−1}, we will have err(x_t) ≤ err(x_{t_{k−1}})/2; here we have used the fact that (1 − δ)^{1/δ} < 1/2 when δ < 1/32. That is, t_k ≤ t_{k−1} + 32Q² · 2^k/e₀. We are interested in t∗ = t_K, where 2^K = e₀/ε². We get t∗ ≤ (32Q²/e₀)(1 + 2 + ··· + 2^K) ≤ 64Q² · 2^K/e₀ = 64Q²/ε².

Next, we claim that in t∗∗ < t∗ + t′ good major cycles, where t′ = 8Q²/ε², we obtain an x_{t∗∗} with Δ(x_{t∗∗}) ≤ ε². This is because, if not, then, using Theorem 6, in each of the good major cycles t∗ + 1, t∗ + 2, . . . , t∗ + t′, err(x) falls additively by more than ε⁴/8Q², and thus err(x_{t∗+t′}) < err(x_{t∗}) − ε² ≤ 0, which is a contradiction. Therefore, in O(Q²/ε²) good major cycles, the algorithm obtains an x = x_{t∗∗} with Δ(x) ≤ ε², proving Theorem 4.

The rest of this subsection is dedicated to proving Theorem 6.

Proof of Theorem 6: We start off with a simple geometric lemma.

Lemma 2. Let S be a subset of Rⁿ and suppose y is the minimum norm point of aff(S). Let x and q be arbitrary points in aff(S). Then,

‖x‖² − ‖y‖² ≥ Δ(x, q)²/4Q²,    (2)

where Q is an upper bound on ‖x‖, ‖q‖.

Proof. Since y is the minimum norm point in aff(S), we have x⊤y = q⊤y = ‖y‖². In particular, ‖x − y‖² = ‖x‖² − ‖y‖². Therefore,

Δ(x, q) = ‖x‖² − x⊤q = ‖x‖² − x⊤y + y⊤q − x⊤q = (y − x)⊤(q − x) ≤ ‖y − x‖ · ‖q − x‖ ≤ ‖y − x‖(‖x‖ + ‖q‖) ≤ 2Q‖y − x‖,

where the first inequality is Cauchy-Schwarz and the second is the triangle inequality. The lemma now follows by squaring the above expression and observing that ‖y − x‖² = ‖x‖² − ‖y‖².

The above lemma takes care of major cycles with no minor cycles in them.

Lemma 3 (Progress in a Major Cycle with no Minor Cycles). Let t be the index of a good major cycle with no minor cycles. Then err(x_t) − err(x_{t+1}) ≥ Δ²(x_t)/4Q².

Proof. Let S_t be the set S at the start of the t-th good major cycle, and let q_t be the point minimizing x_t⊤q. Let S = S_t ∪ {q_t} and let y be the minimum norm point in aff(S). Since there are no minor cycles, y ∈ conv(S).
Abusing notation, let x_{t+1} = y be the iterate at the start of the next major cycle (and not the next good major cycle). Since the norm monotonically decreases, it suffices to prove the lemma statement for this x_{t+1}. Now apply Lemma 2 with x = x_t, q = q_t, and S = S_t ∪ {q_t}. We get err(x_t) − err(x_{t+1}) = ‖x_t‖² − ‖y‖² ≥ Δ(x_t, q_t)²/4Q² = Δ(x_t)²/4Q².

Now we have to argue about major cycles with exactly one minor cycle. The next observation is a useful structural result.

Lemma 4 (New Vertex Survives a Minor Cycle). Consider any (not necessarily good) major cycle. Let x_t, S_t, q_t be the parameters at the beginning of this cycle, and let x_{t+1}, S_{t+1}, q_{t+1} be the parameters at the beginning of the next major cycle. Then, q_t ∈ S_{t+1}.

Proof. Clearly S_{t+1} ⊆ S_t ∪ {q_t}, since q_t is added and then minor cycles possibly remove some points from S. Suppose q_t ∉ S_{t+1}. Then S_{t+1} ⊆ S_t. But x_{t+1} is the affine minimizer of S_{t+1} and x_t is the affine minimizer of S_t. Since S_t is the larger set, we get ‖x_t‖ ≤ ‖x_{t+1}‖. This contradicts the strict decrease in the norm.

Lemma 5 (Progress in an Iteration with Exactly One Minor Cycle). Suppose the t-th good major cycle has exactly one minor cycle. Then, err(x_t) − err(x_{t+1}) ≥ Δ(x_t)²/8Q².

Proof. Let x_t, S_t, q_t be the parameters at the beginning of the t-th good major cycle. Let y be the affine minimizer of S_t ∪ {q_t}. Since there is one minor cycle, y ∉ conv(S_t ∪ {q_t}). Let z = θx_t + (1 − θ)y be the intermediate x, that is, the point in the line segment [x_t, y] which lies in conv(S_t ∪ {q_t}). Let S′ be the set after the single minor cycle is run. Since there is just one minor cycle, we get that x_{t+1} (abusing notation once again, since the next major cycle may not be good) is the affine minimizer of S′.

Let A ≜ ‖x_t‖² − ‖y‖².
From Lemma 2, and using the fact that q_t is the minimizer of x_t⊤q over all q, we have:

A = ‖x_t‖² − ‖y‖² ≥ Δ²(x_t)/4Q².    (3)

Recall, z = θx_t + (1 − θ)y for some θ ∈ [0, 1]. Since y is the min-norm point of aff(S_t ∪ {q_t}) and x_t ∈ aff(S_t ∪ {q_t}), we get ‖z‖² = θ²‖x_t‖² + (1 − θ²)‖y‖². This yields:

‖x_t‖² − ‖z‖² = (1 − θ²)(‖x_t‖² − ‖y‖²) = (1 − θ²)A.    (4)

Further, recall that S′ is the set after the only minor cycle in the t-th iteration is run; thus, from Lemma 4, q_t ∈ S′. Also z ∈ conv(S′) by definition, and since there is only one minor cycle, x_{t+1} is the affine minimizer of S′. We can therefore apply Lemma 2 with z, q_t and x_{t+1} to get

‖z‖² − ‖x_{t+1}‖² ≥ Δ²(z, q_t)/4Q².    (5)

Now we lower bound Δ(z, q_t). By definition of z, we have:

z⊤q_t = θx_t⊤q_t + (1 − θ)y⊤q_t = θx_t⊤q_t + (1 − θ)‖y‖²,

where the last equality follows since y⊤q_t = ‖y‖² (as q_t ∈ S_t ∪ {q_t} and y is the affine minimizer of S_t ∪ {q_t}). This gives

Δ(z, q_t) = ‖z‖² − z⊤q_t
    = (θ²‖x_t‖² + (1 − θ²)‖y‖²) − (θx_t⊤q_t + (1 − θ)‖y‖²)
    = θ(‖x_t‖² − x_t⊤q_t) − θ(1 − θ)(‖x_t‖² − ‖y‖²)
    = θ(Δ(x_t) − (1 − θ)A).    (6)

From (4), (5), and (6), we get

err_t − err_{t+1} ≥ (1 − θ²)A + θ²(Δ(x_t) − (1 − θ)A)²/4Q².    (7)

We need to show that the RHS is at least Δ(x_t)²/8Q². Intuitively, if θ is small (close to 0), the first term implies this using (3), and if θ is large (close to 1), then the second term implies this.
The following paragraph formalizes this intuition for any θ.

If (1 − θ²)A > Δ(x_t)²/8Q², we are done. Therefore, we assume (1 − θ²)A ≤ Δ(x_t)²/8Q². In this case, using the fact that Δ(x_t) ≤ ‖x_t‖² + ‖x_t‖‖q_t‖ ≤ 2Q², we get

(1 − θ)A ≤ (1 − θ²)A ≤ Δ(x_t) · Δ(x_t)/8Q² ≤ Δ(x_t)/4.

Substituting in (7), and using (3), we get

err_t − err_{t+1} ≥ (1 − θ²)Δ(x_t)²/4Q² + 9θ²Δ(x_t)²/64Q² ≥ Δ(x_t)²/8Q².    (8)

This completes the proof of the lemma.

Lemma 3 and Lemma 5 complete the proof of Theorem 6.

3.2 A Robust Version of Fujishige's Theorem

In this section we prove Theorem 5, which we restate below.

Theorem 5. Fix a submodular function f with base polytope Bf. Let x ∈ Bf be such that ‖x‖² ≤ x⊤q + ε² for all q ∈ Bf. Renumber indices such that x₁ ≤ ··· ≤ xₙ. Let S = {1, 2, . . . , k}, where k is the smallest index satisfying (C1) x_{k+1} ≥ 0 and (C2) x_{k+1} − x_k ≥ ε/n. Then, f(S) ≤ f(T) + 2nε for any subset T ⊆ [n]. In particular, if ε = 1/(4n) and f is integer-valued, then S is a minimizer.

Before proving the theorem, note that setting ε = 0 gives Fujishige's theorem, Theorem 3.

Proof. We claim that the following inequality holds. Below, [i] := {1, . . . , i}.

∑_{i=1}^{n−1} (x_{i+1} − x_i) · (f([i]) − x([i])) ≤ ε².    (9)

We prove this shortly. Let S and k be as defined in the theorem statement. Note that ∑_{i∈S: xᵢ≥0} xᵢ ≤ nε, since (C2) doesn't hold for any index i < k with xᵢ ≥ 0. Furthermore, since x_{k+1} − x_k ≥ ε/n, we get, using (9), f(S) − x(S) ≤ nε. Therefore, f(S) ≤ ∑_{i∈S: xᵢ<0} xᵢ + 2nε, which implies the theorem due to Theorem 2.

Now we prove (9). Let z ∈ Bf be the point which minimizes z⊤x. By the greedy algorithm described in §2.1, we know that zᵢ = f([i]) − f([i − 1]). Next, we write x in a different basis as follows: x = ∑_{i=1}^{n−1} (xᵢ − x_{i+1})1_{[i]} + xₙ1_{[n]}. Here 1_{[i]} is shorthand for the vector which has 1's in the first i coordinates and 0's everywhere else. Taking the dot product with (x − z), we get

‖x‖² − x⊤z = (x − z)⊤x = ∑_{i=1}^{n−1} (xᵢ − x_{i+1})(x⊤1_{[i]} − z⊤1_{[i]}) + xₙ(x⊤1_{[n]} − z⊤1_{[n]}).    (10)

Since zᵢ = f([i]) − f([i − 1]), we get that x⊤1_{[i]} − z⊤1_{[i]} equals x([i]) − f([i]); the last term of (10) vanishes, since x([n]) = f([n]) = z([n]) for points of the base polytope. Therefore the RHS of (10) is the LHS of (9). The LHS of (10), by the assumption of the theorem, is at most ε², implying (9).

4 Discussion and Conclusions

Figure 1: Running time comparison of Iwata-Orlin's (IO) method [11] vs. Wolfe's method. (a): s-t mincut function. (b): Iwata's 3-groups function [16]. (c): Total number of iterations required by Wolfe's method for solving s-t mincut with increasing F. [Panels (a)-(c) are plots; only the caption is reproduced here.]

We have shown that the Fujishige-Wolfe algorithm solves SFM in O((n⁵EO + n⁷)F²) time, where F is the maximum change in the value of the function on addition or deletion of an element. Although this is the first pseudopolynomial time analysis of the algorithm, we believe there is room for improvement and hope our work triggers more interest.

Note that our analysis of the Fujishige-Wolfe algorithm is weaker than the best known method in terms of time complexity (the IO method of [11]) on two counts: a) dependence on n, b) dependence on F.
In contrast, we found this algorithm significantly outperforming the IO algorithm empirically; we show two plots here. In Figure 1(a), we run both on Erdős–Rényi graphs with p = 0.8 and randomly chosen s, t nodes. In Figure 1(b), we run both on the Iwata group functions [16] with 3 groups. Perhaps more interestingly, in Figure 1(c), we ran the Fujishige-Wolfe algorithm on the simple path graph where s, t were the end points, and changed the capacities on the edges of the graph, which changed the parameter F. As can be seen, the number of iterations of the algorithm remains constant even for exponentially increasing F.

References

[1] Francis Bach. Convex analysis and optimization with submodular functions: a tutorial. CoRR, abs/1010.4207, 2010.

[2] Jack Edmonds. Matroids, submodular functions and certain polyhedra. Combinatorial Structures and Their Applications, pages 69-87, 1970.

[3] Satoru Fujishige. Lexicographically optimal base of a polymatroid with respect to a weight vector. Math. Oper. Res., 5:186-196, 1980.

[4] Satoru Fujishige. Submodular systems and related topics. Math. Programming Study, 1984.

[5] Satoru Fujishige. Submodular functions and optimization. Elsevier, 2005.

[6] Satoru Fujishige, Takumi Hayashi, and Shigueo Isotani. The minimum-norm-point algorithm applied to submodular function minimization and linear programming. 2006.

[7] Satoru Fujishige and Shigueo Isotani. A submodular function minimization algorithm based on the minimum-norm base. Pacific Journal of Optimization, 7:3, 2011.

[8] Martin Grötschel, László Lovász, and Alexander Schrijver. The ellipsoid method and its consequences in combinatorial optimization. Combinatorica, 1(2):169-197, 1981.

[9] Satoru Iwata, Lisa Fleischer, and Satoru Fujishige. A combinatorial, strongly polynomial-time algorithm for minimizing submodular functions.
In STOC, pages 97-106, 2000.

[10] Satoru Iwata, Lisa Fleischer, and Satoru Fujishige. A combinatorial strongly polynomial algorithm for minimizing submodular functions. J. ACM, 48(4):761-777, 2001.

[11] Satoru Iwata and James B. Orlin. A simple combinatorial algorithm for submodular function minimization. In SODA, pages 1230-1237, 2009.

[12] Rishabh Iyer, Stefanie Jegelka, and Jeff Bilmes. Curvature and optimal algorithms for learning and minimizing submodular functions. CoRR, abs/1311.2110, 2013.

[13] Rishabh Iyer, Stefanie Jegelka, and Jeff Bilmes. Fast semidifferential-based submodular function optimization. In ICML (3), pages 855-863, 2013.

[14] Rishabh K. Iyer and Jeff A. Bilmes. Submodular optimization with submodular cover and submodular knapsack constraints. In NIPS, pages 2436-2444, 2013.

[15] Stefanie Jegelka, Francis Bach, and Suvrit Sra. Reflection methods for user-friendly submodular optimization. In NIPS, pages 1313-1321, 2013.

[16] Stefanie Jegelka, Hui Lin, and Jeff A. Bilmes. On fast approximate submodular minimization. In NIPS, pages 460-468, 2011.

[17] Pushmeet Kohli and Philip H. S. Torr. Dynamic graph cuts and their applications in computer vision. In Computer Vision: Detection, Recognition and Reconstruction, pages 51-108. 2010.

[18] Andreas Krause, Ajit Paul Singh, and Carlos Guestrin. Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies. Journal of Machine Learning Research, 9:235-284, 2008.

[19] Alexander Schrijver. A combinatorial algorithm minimizing submodular functions in strongly polynomial time. J. Comb. Theory, Ser. B, 80(2):346-355, 2000.

[20] Peter Stobbe and Andreas Krause. Efficient minimization of decomposable submodular functions. In NIPS, pages 2208-2216, 2010.

[21] Philip Wolfe.
Finding the nearest point in a polytope. Math. Programming, 11:128-149, 1976.", "award": [], "sourceid": 538, "authors": [{"given_name": "Deeparnab", "family_name": "Chakrabarty", "institution": "Microsoft"}, {"given_name": "Prateek", "family_name": "Jain", "institution": "Microsoft Research"}, {"given_name": "Pravesh", "family_name": "Kothari", "institution": "UT Austin"}]}