{"title": "Submodular Maximization via Gradient Ascent: The Case of Deep Submodular Functions", "book": "Advances in Neural Information Processing Systems", "page_first": 7978, "page_last": 7988, "abstract": "We study the problem of maximizing deep submodular functions (DSFs) subject to a matroid constraint. DSFs are an expressive class of submodular functions that include, as strict subfamilies, the facility location, weighted coverage, and sums of concave composed with modular functions. We use a strategy similar to the continuous greedy approach, but we show that the multilinear extension of any DSF has a natural and computationally attainable concave relaxation that we can optimize using gradient ascent. Our results show a guarantee of $\\max_{0<\\delta<1}(1-\\epsilon-\\delta-e^{-\\delta^2\\Omega(k)})$ with a running time of $O(\\nicefrac{n^2}{\\epsilon^2})$ plus time for pipage rounding\nto recover a discrete solution, where $k$ is the rank of the matroid constraint. This bound is often better than the standard $1-1/e$ guarantee of the continuous greedy algorithm, but runs much faster. Our bound also holds even for fully curved ($c=1$) functions where the guarantee of $1-c/e$ degenerates to $1-1/e$ where $c$ is the curvature of $f$. We perform computational experiments that support our theoretical results.", "full_text": "Submodular Maximization via Gradient Ascent:\n\nThe Case of Deep Submodular Functions\n\nDepts. of Electrical & Computer Engineering\u2021, Computer Science and Engineering$, and Genome Sciences\u2217\n\nWenruo Bai\u2021, William S Noble\u2217$, Jeff A. Bilmes\u2021$\n\nSeattle, WA 98195\n\n{wrbai,wnoble,bilmes}@uw.edu\n\nAbstract\n\nWe study the problem of maximizing deep submodular functions (DSFs) [13, 3]\nsubject to a matroid constraint. DSFs are an expressive class of submodular\nfunctions that include, as strict subfamilies, the facility location, weighted coverage,\nand sums of concave composed with modular functions. 
We use a strategy similar to the continuous greedy approach [6], but we show that the multilinear extension of any DSF has a natural and computationally attainable concave relaxation that we can optimize using gradient ascent. Our results show a guarantee of max_{0<δ<1}(1 − ε − δ − e^{−δ²Ω(k)}) with a running time of O(n²/ε²) plus time for pipage rounding [6] to recover a discrete solution, where k is the rank of the matroid constraint. This bound is often better than the standard 1 − 1/e guarantee of the continuous greedy algorithm, but runs much faster. Our bound also holds even for fully curved (c = 1) functions, where the guarantee of 1 − c/e degenerates to 1 − 1/e, where c is the curvature of f [37]. We perform computational experiments that support our theoretical results.

1 Introduction

A set function f : 2^V → R+ is called submodular [15] if f(A) + f(B) ≥ f(A ∪ B) + f(A ∩ B) for all A, B ⊆ V, where V = [n] is the ground set. An equivalent definition of submodularity states that f(v|A) ≥ f(v|B) for all A ⊆ B ⊆ V and v ∈ V \ B, where f(v|A) ≜ f({v} ∪ A) − f(A) is the gain of element v given A. This property of diminishing returns models well concepts such as information, diversity, and representativeness. Recent studies have shown that submodularity is natural for a large number of real-world machine learning applications such as information gathering [23], probabilistic models [12], image segmentation [22], string alignment [28], document and speech summarization [27, 26], active learning [39], genomic assay selection [40], and protein subset selection [25], as well as many others.

In addition to having a variety of natural applications in machine learning, the optimization properties of submodular functions appear to be ever more auspicious.
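Both definitions above can be checked mechanically on small ground sets. The following is a minimal sketch (the coverage function COVER is an invented toy, not from the paper) that exhaustively verifies the diminishing-returns form f(v|A) ≥ f(v|B) for all A ⊆ B ⊆ V and v ∈ V \ B:

```python
from itertools import chain, combinations

# Invented toy: f(A) = number of items covered, a monotone submodular function.
COVER = {0: {1, 2}, 1: {2, 3}, 2: {3, 4, 5}}

def f(A):
    return len(set().union(*[COVER[v] for v in A]))

def gain(v, A):
    """The gain f(v | A) = f({v} ∪ A) − f(A)."""
    return f(A | {v}) - f(A)

def powerset(V):
    return chain.from_iterable(combinations(V, r) for r in range(len(V) + 1))

V = set(COVER)
# Diminishing returns: the gain of v can only shrink as the context grows.
diminishing = all(
    gain(v, set(A)) >= gain(v, set(B))
    for A in powerset(V) for B in powerset(V)
    if set(A) <= set(B)
    for v in V - set(B))
print(diminishing)  # True
```

The same exhaustive loop, run with the first definition f(A) + f(B) ≥ f(A ∪ B) + f(A ∩ B), would succeed as well, since the two are equivalent.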
On one hand, the submodular minimization problem can be solved exactly in polynomial time [29, 11, 15]; recent studies mostly focus on improving running times [24, 7]. On the other hand, submodular maximization is harder: no polynomial-time algorithm can find the optimal solution in general. A good approximate solution, however, is usually acceptable, and a simple greedy algorithm can find a constant-factor 1 − 1/e approximate solution for the monotone non-decreasing¹ submodular maximization problem subject to a k-cardinality constraint [32]. Although submodular maximization is a purely combinatorial problem, there are also approaches that solve it via continuous relaxation (e.g., the multilinear extension). For example, [6] offers a randomized continuous greedy algorithm that achieves the same 1 − 1/e bound for monotone non-decreasing submodular maximization subject to a more general matroid independence constraint. If the function's curvature c is taken into account, this approach yields an improved guarantee of no worse than 1 − c/e [37]. Recent studies showed that stochastic projected gradient methods [18, 31] can be useful for maximizing continuous DR-submodular functions [35].² The best guarantee is (1 − 1/e)OPT − ε after O(1/ε³) iterations of such gradient methods [31].

¹A submodular function is said to be monotone non-decreasing if f(v|A) ≥ 0 for all v ∈ V and A ⊆ V.

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

The above results apply to any non-negative monotone submodular function. In practice, solving a given problem requires applying the algorithm to a specific submodular function, for example, set cover [16, 20], facility location [38], feature-based [21], graph cut [19], and deep submodular functions (DSF) [13, 3].
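The simple greedy algorithm mentioned above is only a few lines. The sketch below (the coverage instance is a made-up example) implements the classic gain-maximizing loop for a k-cardinality constraint:

```python
def greedy_max(f, V, k):
    """Greedy for monotone submodular maximization under |A| <= k:
    repeatedly add the element with the largest marginal gain f(v | A).
    This is the algorithm achieving the 1 - 1/e factor of [32]."""
    A = set()
    for _ in range(k):
        best_v, best_gain = None, 0.0
        for v in V - A:
            g = f(A | {v}) - f(A)
            if g > best_gain:
                best_v, best_gain = v, g
        if best_v is None:   # no remaining element has positive gain
            break
        A.add(best_v)
    return A

# Made-up coverage instance: choose 2 of 4 sets to cover the most items.
COVER = {0: {1, 2, 3}, 1: {3, 4}, 2: {4, 5, 6}, 3: {1, 6}}
f = lambda A: len(set().union(*[COVER[v] for v in A]))
A = greedy_max(f, set(COVER), 2)
print(sorted(A), f(A))  # [0, 2] 6 -- all six items covered
```

The lazy-evaluation trick [30], used in the experiments of Section 5, speeds this loop up by caching stale gains in a priority queue, exploiting the fact that gains only decrease.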
When working with a specific sub-class of functions, we can benefit from knowing the specific form and its mathematical properties. For example, in the simplest case, maximizing a modular function (i.e., a function f for which both f and −f are submodular) under a matroid constraint can be solved exactly by a greedy algorithm. [20] showed such a benefit for submodular maximization in the specific case of weighted coverage functions.

In our work, we focus on DSF maximization under a matroid constraint. Introduced in [13, 3], DSFs are a generalization of set coverage, facility location, and feature-based functions. Importantly, the class of DSFs is a strict superset of the union of these three, which means that any method designed for a general DSF can be applied to set coverage, facility location, and feature-based functions, but not vice versa. For example, √(m(A)) is concave over modular; a feature-based function has the form of a sum of concave composed with modular functions, such as √(m₁(A)) + log(1 + m₂(A)); while a two-layer DSF has a nested composition of the form √(√(m₁(A)) + arctan(m₂(A))) + [m₃(A) + √(m₄(A))]^{1/4}. In [3], it was shown that the expressivity of DSFs strictly grows with the number of layers.

To our knowledge, there have been no studies on the specific problem of DSF maximization. On the one hand, we can use the generic greedy or continuous greedy algorithms for a DSF, since DSFs are monotone submodular, but we should not be surprised if better bounds than 1 − 1/e can be achieved using the structure and properties of a DSF. The major contribution of the present work is to show that a very natural and computationally easy-to-obtain concave extension of DSFs is a nearly tight relaxation of the DSF's multilinear extension. Therefore, given this extension, we can use projected gradient ascent (Algorithm
1 [4]) to maximize the concave extension and obtain a fractional solution, and then use pipage rounding [2, 6] to recover a discrete solution.

Our approach has the following advantages over the continuous greedy algorithm with only oracle access to the submodular function:

1. Easy concave extension: A natural concave extension of any DSF is easy to obtain, unlike the multilinear extension, which often itself needs to be approximated using sampling.

2. Better guarantee for large k: Our method has a guarantee of max_{0<δ<1}(1 − ε − δ − e^{−δ²Ω(k)}), where k is the rank of the matroid constraint (Corollary 2). A more complete formulation is max_{0<δ<1}(1 − ε)(1 − δ)[1 − |V(1)| e^{−δ² wmin k / wmax}], where wmin/wmax is the ratio of the smallest to the largest DSF element in the first weight layer of a DSF and |V(1)| is the size of the feature layer (see Figure 2). Importantly, this bound holds even when the curvature [37] of the DSF is c = 1, so that the 1 − c/e bound of [37] is at its worst value of 1 − 1/e (Lemma 7 in Appendix). We compare our bound with the traditional 1/2 (for the greedy) and 1 − 1/e (for the continuous greedy) bounds in Figure 1. We show that our bound is better than that of the continuous greedy algorithm (1 − 1/e) for large k (> 10² ∼ 10⁴, depending on |V(1)| and wmin/wmax).

3. Improved running time: Other than the fact that a natural concave extension of a DSF is readily available, the running time of our method is O(n²ε⁻²) and is thus better than the O(n⁷) cost for the continuous greedy algorithm.
Most of the continuous greedy algorithm's running time is for estimating the multilinear extension (O(n⁵) [6]), while in our method, calculating the DSF concave extension needs only one evaluation of the original function.

1.1 Background and Related work

[13, 3] introduced deep submodular functions, where [3] discussed their theoretical properties and [13] their training in a fashion similar to how deep neural networks may be trained. Particularly relevant to the present study, [3] showed that while DSFs cannot express all submodular functions, k-layer DSFs strictly generalize (k−1)-layer DSFs. Moreover, the following classes of functions are all strict subclasses of DSFs [3].

²The multilinear extension is a special case of continuous DR-submodularity.

Figure 1: Guarantees of the proposed method stated in Theorem 3. Solid lines are the proposed guarantees with respect to the rank of the matroid constraint; dashed lines are guarantees for the continuous greedy algorithm and the greedy algorithm. Our guarantee is proportional to 1 − ε, and in the figure we use ε = 0.01 for illustration. (a) fixes |V(1)| = 10, and each trace is for a different wmin/wmax, the ratio of the smallest feature to the largest feature. (b) fixes wmin/wmax = 0.1, and each trace is for a different |V(1)|, the size of the feature layer (see Figure 2).

1. Sums of concave composed with non-negative modular functions plus an arbitrary modular function (SCMM), also called feature-based functions [21] or "decomposable" submodular functions [36]. These functions take the form f(A) = Σᵢ αᵢ φᵢ(mᵢ(A)) + m±(A), where the αᵢ are non-negative numbers, the φᵢ are monotone non-decreasing concave functions, the mᵢ are non-negative modular functions, and m± is an arbitrary modular function.

2. Weighted cardinality truncation (WCT) functions: f(A) = Σᵢ αᵢ min(|A ∩ Vᵢ|, βᵢ), where the Vᵢ are subsets of V, the αᵢ are non-negative numbers, and the βᵢ are non-negative integers.

3. Weighted coverage (WC) functions, which take the form f(A) = Σᵢ αᵢ min(|A ∩ Vᵢ|, 1). See below.

4. Facility location (FL) functions: f(A) = Σ_{i∈V} max_{j∈A} w_{ij}, where w_{ij} is a matrix of non-negative numbers. It is a subclass of weighted coverage functions [20].

In particular, we have the following chain relationship between these classes of functions: FL ⊂ WC ⊂ WCT ⊂ SCMM ⊂ DSF ⊂ All-Submodular-Functions [3]. In the present paper, we address any function that can be represented as a DSF.

In [20], submodular maximization of the special case of weighted coverage (WC) functions was studied, using an approach that took a concave relaxation of the multilinear extension of such functions. Let U be a set and m : 2^U → R+ be a non-negative modular function. The ground set V = {B₁, B₂, . . . , Bₙ} is a collection of subsets of U. A weighted coverage function f(S) : 2^V → R+ is defined as f(S) = m(∪_{Bᵢ∈S} Bᵢ). An equivalent formula is f(S) = Σ_{u∈U} m(u) min(1, |S ∩ C_u|), where C_u = {Bᵢ | u ∈ Bᵢ}, which reveals that the weighted coverage function is actually a simple example of a one-layer DSF. In [20], Karimi et al. show that the multilinear extension of f has a natural concave relaxation F̄(x) = Σ_{u∈U} m(u) min(1, 1_{C_u} · x) within a 1 − 1/e approximation. They first optimize the concave relaxation and claim that the solution is also a good maximizer for the multilinear extension by the 1 − 1/e approximation. They further show that their approach yields solutions that match the 1 − 1/e guarantee of the continuous greedy algorithm, while reducing the computational cost by several orders of magnitude, mostly because they do not need to compute the multilinear extension.

Our framework in the present paper is a strict generalization of this previous method in the following ways: (1) the weighted coverage function class is a subclass of DSFs, and Karimi et al.'s proposed concave extension is a special case of a more general DSF concave extension; (2) we use a similar algorithmic approach, which thus also has the advantage of a better running time than the continuous greedy algorithm; and (3) we offer a still better bound for large k, where k is the rank of the matroid.

As an example application, we note that DSFs generalize feature-based functions, which are useful for various summarization tasks [21, 41, 17]. A feature-based function has the form f(A) = Σ_{u∈U} w_u φ_u(m_u(A)), where U is a set of features, w_u > 0 is a feature weight for u ∈ U, m_u(A) = Σ_{a∈A} m_u(a) is a feature-specific non-negative modular function, and φ_u is a feature-specific monotone non-decreasing concave function. Immediately, we have that feature-based functions are DSFs. Our proposed methods, therefore, offer a good bound for maximizing such functions whenever min_{u∈U} [min_{v∈V} m_u(v) / max_{v∈V} m_u(v)] · k is large, which is fairly common in practice.

2 Background and Problem Setup

We assume every set function f in this paper is normalized (i.e., f(∅) = 0). A function m is modular if and only if m and −m are both submodular. A normalized modular function m(A) always has the form m(A) = Σ_{v∈A} m(v) = w · 1_A, where A ⊆ V, w and 1_A are n-dimensional vectors, w = (m(1), m(2), . . . , m(n)), and 1_A ∈ R^V+ is 0 at coordinate i for i ∉ A and 1 for i ∈ A.

2.1 Matroid and matroid polytopes

A matroid M = (V, I) consists of a ground set V and a family I of subsets of V with the following three properties:

1. ∅ ∈ I.
2. If A ∈ I, then B ∈ I for all B ⊆ A.
3. For all A, B ∈ I, if |A| > |B|, then there exists an element v ∈ A \ B s.t. B ∪ {v} ∈ I.

The sets I ∈ I are the independent sets of the matroid. The third property ensures that the maximal independent sets always have the same size, equal to the rank r_M = k of the matroid. Matroids can be generalized to the continuous domain via the matroid polytope P = conv(1_A : A ∈ I), where "conv" denotes the convex hull.

2.2 Deep Submodular Functions (DSFs)

A DSF [13, 3] f is a natural generalization of feature-based functions and can be defined on a directed graph (Figure 2). The graph has K + 1 layers, where the first layer V = V(0) is the function's ground set, and the additional layers V(1), V(2), V(3), . . . , V(K) are sets of "features", "meta features", "meta-meta features", etc. The size of V(i) is dᵢ = |V(i)| for i = 0, 1, 2, . . . , K. Note that the size of the final layer V(K) is always 1, because a DSF maps a set to a real number. For any i = 1, 2, . . .
, K, two successive layers V(i−1) and V(i) are connected by a matrix w^(i) ∈ R+^{dᵢ×dᵢ₋₁}. Therefore, matrix w^(i) is indexed by (vᵢ, vᵢ₋₁) for vᵢ ∈ V(i) and vᵢ₋₁ ∈ V(i−1), and w^(i)_{vᵢ}(vᵢ₋₁) is the element in row vᵢ and column vᵢ₋₁. We may think of w^(i)_{vᵢ} : 2^{V(i−1)} → R+ as a modular function defined on subsets of V(i−1). Further, let φ_{vᵢ} : R+ → R+ be a non-negative, non-decreasing concave function. Thus, each element vᵢ ∈ V(i) has a modular function w^(i)_{vᵢ} and a concave function φ_{vᵢ}, for i = 1, 2, . . . , K. In this setting, a K-layer DSF f : 2^V → R+ can be expressed, for any A ⊆ V, as

f(A) = f̄(A) + m±(A),   (1)

where

f̄(A) = φ_{v_K}( Σ_{v_{K−1} ∈ V(K−1)} w^(K)_{v_K}(v_{K−1}) φ_{v_{K−1}}( · · · Σ_{v₂ ∈ V(2)} w^(3)_{v₃}(v₂) φ_{v₂}( Σ_{v₁ ∈ V(1)} w^(2)_{v₂}(v₁) φ_{v₁}( Σ_{a∈A} w^(1)_{v₁}(a) )) · · · )).   (2)

Figure 2: A layered DSF with K = 3 layers.

2.2.1 Concave functions φ_{vᵢ} and continuity

In a DSF, φ_{vᵢ} is a normalized (i.e., φ_{vᵢ}(0) = 0) monotone non-decreasing concave function defined on [0, +∞). Via concavity, this implies that the function must also be continuous on (0, +∞). The only point that need not be continuous is x = 0, i.e., we may have lim_{x→0+} φ_{vᵢ}(x) > 0 = φ_{vᵢ}(0). When used in a DSF, however, the set of possible input values to φ_{vᵢ}(x) is countable. Let β > 0 be the smallest strictly positive possible input to φ_{vᵢ}(x). We define another φ_{0,vᵢ} : R+ → R+ s.t. φ_{0,vᵢ}(x) ≜ φ_{vᵢ}(x) for x ≥ β and φ_{0,vᵢ}(x) ≜ (φ_{vᵢ}(β)/β) x for 0 ≤ x < β. φ_{0,vᵢ} is normalized, monotone non-decreasing concave, and continuous on [0, +∞). Moreover, replacing φ_{vᵢ}(x) with φ_{0,vᵢ}(x) leaves the DSF's valuation unchanged for any set. Therefore, w.l.o.g., we assume that all concave functions are also right-continuous at x = 0.

2.2.2 Final modular term m±

Recall that f(A) = f̄(A) + m±(A), where f̄(A) has the form of nested concave over modular and is always monotone non-decreasing, and m±(A) is a simple modular function that can be negative. Although [13, 3] claim that the final modular function is sometimes useful in applications, this final term can change the optimization properties of f, since f̄ is monotone non-decreasing while f may be non-monotone. In this work, we focus on the monotone non-decreasing DSF case where m± ≥ 0.³

2.3 DSF maximization

The problem we consider is DSF maximization, i.e.,

Problem 1: max_{A∈M} f(A)   (3)

where f is a DSF and M is a matroid independence constraint. In this work, we focus on solving this problem with the knowledge that f is a DSF.

3 Continuous extension of submodular functions

Although a submodular function is discrete, providing one value for each A ⊆ V, it is often useful to view such functions continuously. The bridge between the discrete and continuous worlds is made by a continuous extension of a submodular function, which is some function from the hypercube [0, 1]ⁿ to R that agrees with f on the hypercube vertices [14]. This includes the Lovász extension [29], which is the convex closure of the function, and also the multilinear extension [6], which is an approximation of the concave closure.
In general, most continuous methods [6, 20] follow a similar strategy: they first find a continuous extension of f, then optimize it to obtain a fractional solution, and finally finish by rounding the continuous solution back to a discrete final solution set. In our framework, we use an extension that is tailor-made for a DSF.

3.1 A DSF's Natural Concave Extension

DSF functions have the form of a nested sum of concave of modular (Equation (2)). [3] shows that there exists a natural concave extension of f, obtained by replacing the discrete variables with real values in the nested form: F(x) = F̄(x) + m± · x, where

F̄(x) = φ_{v_K}( Σ_{v_{K−1} ∈ V(K−1)} w^(K)_{v_K}(v_{K−1}) φ_{v_{K−1}}( · · · Σ_{v₂ ∈ V(2)} w^(3)_{v₃}(v₂) φ_{v₂}( Σ_{v₁ ∈ V(1)} w^(2)_{v₂}(v₁) φ_{v₁}( w^(1)_{v₁} · x )) · · · )).   (4)

Thus, f(A) in Equation (1) satisfies f(A) = F(1_A) for all A ⊆ V. In fact, we have the following:

Corollary 1 ([3]). The DSF concave extension F(x) : [0, 1]ⁿ → R is an extension of a DSF f and is concave.

In [3], it is claimed that the extension is potentially useful for maximizing DSFs, possibly in a constrained fashion, followed by appropriate rounding methods, but the authors leave this as an open question. In the present work, we address this claim and answer this question in the affirmative. Before presenting our algorithm, we first discuss the relationship between the DSF's natural concave extension and the multilinear extension.

3.2 Multilinear extension

The concave closure of a submodular function f is defined as max_{p ∈ Δₙ(x)} Σ_{S⊆V} p_S f(S), where Δₙ(x) = { p ∈ R^{2ⁿ} : Σ_{S⊆V} p_S = 1, p_S ≥ 0 ∀S ⊆ V, and Σ_{S⊆V} p_S 1_S = x }. The concave closure is NP-hard even to evaluate [14]; hence, the multilinear extension is often used. We first specify the following definition:

Definition 1. For a given n-dimensional vector x ∈ [0, 1]ⁿ, define D_x to be a distribution over sets A s.t. Pr(A) = Π_{v∈A} x_v Π_{v∈V\A}(1 − x_v).

If we sample a random set A from D_x, then the event v ∈ A is independent of u ∈ A whenever v ≠ u, and Pr(v ∈ A) = x_v. With these definitions, we may define the multilinear extension as:

Definition 2 (Multilinear Extension). L_f(x) = E_{A∼D_x} f(A).

Calinescu et al. [6] showed that we may solve the following continuous problem instead of solving Problem 1 directly:

Problem 2: max_{x∈P} L_f(x)   (5)

where P is the matroid polytope of M.

Unfortunately, two problems remain with the multilinear extension L_f(x). First, calculating its exact value is not feasible in general, and even estimating it needs O(n⁵) time [6]. Second, it is not concave. Therefore, finding the global maximizer of Problem 2 is in general not feasible. However, Calinescu et al. [6] developed a continuous greedy algorithm that finds x̂ s.t. L_f(x̂) ≥ (1 − 1/e) L_f(x*), where x* ∈ argmax_{x∈P} L_f(x).

³Note that if m± is non-negative, it can be merged into f̄(A), which is equivalent to m± = 0.
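The contrast between the two extensions can be seen numerically. The sketch below uses a made-up one-layer DSF f(A) = √|A| (not one of the paper's experiments): it estimates L_f(x) from Definition 2 by sampling D_x, and compares the estimate with the natural concave extension F(x) = √(1 · x), illustrating the upper bound L_f(x) ≤ F(x), which here is just Jensen's inequality:

```python
import math, random

random.seed(0)
n = 8
f = lambda A: math.sqrt(len(A))   # one-layer DSF with unit weights
F = lambda x: math.sqrt(sum(x))   # its natural concave extension

def multilinear_mc(x, samples=20000):
    """Monte Carlo estimate of L_f(x) = E_{A ~ D_x} f(A):
    each v is included independently with probability x_v (Definition 1)."""
    total = 0.0
    for _ in range(samples):
        A = {v for v in range(n) if random.random() < x[v]}
        total += f(A)
    return total / samples

x = [0.5] * n
est = multilinear_mc(x)
print(round(est, 3), F(x))  # the estimate stays below F(x) = 2.0
```

Even on this tiny instance, a stable estimate of L_f needs tens of thousands of oracle calls, while F(x) needs one evaluation; this is the running-time gap the paper exploits.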
It is not hard to show that L_f(x*) ≥ f(A*), where A* ∈ argmax_{A∈M} f(A), since L_f(1_{A*}) = f(A*). Therefore, L_f(x̂) ≥ (1 − 1/e) f(A*). Next, we describe how they round x̂.

Rounding Rounding is a methodology that returns a discrete set from a fractional vector. "Pipage rounding" was first designed by Ageev et al. [2] and modified by Calinescu et al. [6] for submodular maximization, using a convexity property of the multilinear extension. It maintains the quality of the solution in expectation, i.e., E_{Â ∼ PIPAGEROUNDING(x̂)} f(Â) ≥ L_f(x̂), while satisfying the matroid constraint, thus finishing the proof sketch of the 1 − 1/e bound for the continuous greedy algorithm. Another rounding technique is swap rounding [9], which can be seen as a replacement for pipage rounding with a better running time of O(nk²). In the special case of a simple partition matroid [8, 10]⁴, a simple rounding technique [5] is equivalent to pipage rounding, with a much easier implementation and a linear running time. In our work, we can use any proper rounding technique.

In this work, we show that given any DSF, it is not necessary to compute the multilinear extension at all. This is based on the following theorem:

Theorem 1. For all f ∈ DSF, its DSF concave extension F, and for all x ∈ [0, 1]ⁿ, we have

(1 − δ)[1 − |V(1)| e^{−δ²Δ(x)/2}] F(x) ≤ L_f(x) ≤ F(x), where Δ(x) = min_{v₁∈V(1)} (w^(1)_{v₁} · x) / (max_{v∈V} w_{v₁}(v)).

Proof. See Appendix A.

In Theorem 1, the term Δ(x) is fairly complex to interpret, but help can be gained by considering a lower bound on Δ(x) offered by the following lemma:

Lemma 1. Δ(x) ≥ ‖x‖₁ wmin / wmax, where wmax = max_{v₁∈V(1)} max_{v∈V} w_{v₁}(v) and wmin = min_{v₁∈V(1)} min_{v∈V} w_{v₁}(v). If x is an extreme point of the matroid polytope, then Δ(x) ≥ k wmin / wmax, where k is the rank of the matroid.

By applying Lemma 1 to Theorem 1, so that Δ(x) = Ω(k), and noticing that |V(1)| e^{−δ²Ω(k)} = e^{−δ²Ω(k)+log|V(1)|} = e^{−δ²Ω(k)}, we have the following result.

Proposition 1. max_{0<δ<1}(1 − δ)[1 − e^{−δ²Ω(k)}] F(x) ≤ L_f(x) ≤ F(x).

In Figure 1, we show that the coefficient of the lower bound converges to a value close to 1 as k → +∞. Theorem 1 is one of the major results of the present work. It gives a concave relaxation (i.e., the natural concave extension of a DSF) of the non-concave multilinear extension L_f. In this sense, we claim that the multilinear extension L_f is close to the DSF's natural concave extension F. Not surprisingly, maximizing a concave function is much easier than maximizing the multilinear extension, for a variety of reasons.

Lemma 2. Any concave problem solver that finds a solution x̂ such that F(x̂) ≥ (1 − ε)F(x*_F) will satisfy L_f(x̂) ≥ (1 − ε)(1 − δ)[1 − |V(1)| e^{−δ²Δ(x̂)/2}] L_f(x*_L), where x*_F and x*_L are the maximizers of the corresponding functions subject to matroid polytope membership.

Proof. See Appendix B.

⁴I = {A ⊆ V : |A ∩ Vᵢ| ≤ 1 ∀i}, where the {Vᵢ} are a partition of V.

4 Projected Gradient Ascent

Following the general framework of [6, 20], we first find a fractional solution of the concave extension and then employ pipage rounding to obtain a feasible set.
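To get a feel for how guarantees of the Theorem 1 / Proposition 1 form behave before presenting the algorithm, the following sketch evaluates max_{0<δ<1}(1 − ε)(1 − δ)(1 − |V(1)| e^{−δ² (wmin/wmax) k}) on a grid of δ. The values ε = 0.01, |V(1)| = 10, and wmin/wmax = 0.1 are illustrative, chosen to mirror Figure 1(a); the numbers are for intuition only:

```python
import math

eps, d1, ratio = 0.01, 10, 0.1   # illustrative values only

def guarantee(k, steps=10000):
    """Grid search over 0 < delta < 1 for the bound's best coefficient."""
    best = 0.0
    for i in range(1, steps):
        d = i / steps
        val = (1 - eps) * (1 - d) * (1 - d1 * math.exp(-d * d * ratio * k))
        best = max(best, val)
    return best

for k in (10**2, 10**4, 10**6):
    print(k, round(guarantee(k), 3))
# The coefficient approaches 1 - eps as k grows, eventually beating the
# continuous greedy bound 1 - 1/e ~= 0.632, consistent with Figure 1.
```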
This approach offers the aforementioned guarantee for any member of the DSF family, regardless of its curvature.

4.1 Supergradient

For a concave function F : P → R, where P ⊆ Rⁿ is a compact convex set, the set of supergradients of F at x is defined as

∂F(x) = { g ∈ Rⁿ : F(y) − F(x) ≤ g · (y − x) ∀y ∈ P }.   (6)

Given the formula of the DSF concave extension F(x), it is easy to compute a supergradient as follows:

g(x)_e = Σ_{v_{K−1} ∈ V(K−1)} · · · Σ_{v₁ ∈ V(1)} φ'_{v_K}(·) φ'_{v_{K−1}}(·) · · · φ'_{v₂}(·) φ'_{v₁}(·) w^(K)_{v_K}(v_{K−1}) · · · w^(2)_{v₂}(v₁) w^(1)_{v₁}(e),   (7)

where e ∈ [n] is a coordinate, and φ'_{v₁}(·) is the derivative of the concave function φ_{v₁}(x) at its current evaluation point if it is differentiable, or any supergradient of φ_{v₁}(x) if it is not. In fact, the way to calculate the supergradient of a DSF is exactly the same as what the backpropagation algorithm needs in deep neural network (DNN) training, and this was used in [13] to train DSFs. This is also one of the reasons for the name deep submodular functions. Therefore, all of the toolkits available for DNN training, with provisions for automatic symbolic differentiation (e.g., PyTorch [33] and TensorFlow [1]), can be used to maximize a DSF. Since they are optimized for fast GPU computing, they can offer great practical and computational advantages over traditional submodular maximization procedures.

Algorithm 1: Projected Gradient Ascent [4]
input: DSF concave extension F, matroid polytope P, learning rate η, maximum number of iterations T
Let x^(0) ← argmin_{x∈P} ‖x‖₂
for t = 1, 2, . . . , T do
    compute a supergradient g(x^(t−1)) ∈ ∂F(x^(t−1));
    x^(t) ← argmin_{x∈P} ‖x − (x^(t−1) + η g(x^(t−1)))‖₂²;  // i.e., project x^(t−1) + η g(x^(t−1)) onto P
end
return (1/T) Σ_{t=1}^T x^(t)

4.2 Projected gradient ascent

We utilize the following theorem from [4, 7] (modified for the concave, rather than convex, case) to establish our bounds for DSF-based submodular maximization.

Theorem 2 ([4, 7]). For any concave function F : Rⁿ+ → R, let R² = sup_{x∈P} ‖x‖₂² and B² = sup_{x∈P} ‖g(x)‖₂². Algorithm 1 with learning rate η = (R/B)√(2/T) will obtain a fractional solution x̂ s.t. F(x̂) ≥ max_{x∈P} F(x) − RB√(2/T).

Applying Theorem 2 to Algorithm 1 and using our proposed concave function F(x), we have the following result:

Lemma 3. For any 0 < ε < 1, Algorithm 1 will obtain a fractional x̂ s.t. F(x̂) ≥ (1 − ε) max_{x∈P} F(x) with running time T = O(n²ε⁻²).

Proof. See Appendix C.

Thus, we have an approximate solution to the concave maximization problem, and using this, in concert with Lemma 2, we arrive at the following, which offers a guarantee for our proposed method.

Figure 3: (a) DSF structure, (b) performance comparison, solution value vs. k, (c) running time vs. k.
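Algorithm 1 is straightforward to sketch when the projection onto P is simple. The toy below assumes a k-uniform (cardinality) matroid, whose polytope is {x ∈ [0,1]ⁿ : Σᵢ xᵢ ≤ k}, so the Euclidean projection reduces to clipping a shifted vector, with the shift found by bisection; the concave objective F(x) = Σᵢ wᵢ log(1 + xᵢ) and its weights are an invented stand-in for a real DSF concave extension:

```python
import math

def project(x, k, iters=60):
    """Euclidean projection onto {x in [0,1]^n : sum(x) <= k}."""
    clip = lambda v: [min(1.0, max(0.0, vi)) for vi in v]
    y = clip(x)
    if sum(y) <= k:          # clipping alone is the projection here
        return y
    lo, hi = 0.0, max(x)     # otherwise shift by theta before clipping
    for _ in range(iters):
        theta = (lo + hi) / 2
        if sum(clip([xi - theta for xi in x])) > k:
            lo = theta
        else:
            hi = theta
    return clip([xi - hi for xi in x])

def projected_gradient_ascent(grad, n, k, eta=0.02, T=2000):
    """Algorithm 1: step along a supergradient, project, return the average iterate."""
    x, avg = [0.0] * n, [0.0] * n
    for _ in range(T):
        g = grad(x)
        x = project([xi + eta * gi for xi, gi in zip(x, g)], k)
        avg = [a + xi for a, xi in zip(avg, x)]
    return [a / T for a in avg]

w = [4.0, 1.0, 9.0, 1.0]
F = lambda x: sum(wi * math.log1p(xi) for wi, xi in zip(w, x))
grad = lambda x: [wi / (1.0 + xi) for wi, xi in zip(w, x)]  # gradient of F

x_hat = projected_gradient_ascent(grad, n=4, k=2)
print(round(F(x_hat), 3))  # close to the optimum F([1,0,1,0]) = 13*log(2)
```

For a general matroid polytope the projection step is the expensive part; for a partition matroid it again decomposes into small per-block projections of this form.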
Algorithm 1 with pipage rounding will give \u02c6X such that Ef ( \u02c6X) \u2265 max0<\u03b4<1(1 \u2212\n\u0001)(1 \u2212 \u03b4)\n\nmaxX\u2286M f (X) with running time T = O(n2\u0001\u22122)\n\n1 \u2212 |V (1)|e\n\n\u2212 \u03b42wmink\n\nwmax\n\nIn Figure 1, we have a comparison of this bound with the traditional 1/2 and 1 \u2212 1/e bounds. We \ufb01nd\nour proposed bound approaches 1 when k \u2192 +\u221e and beats other bounds for large k (> 104 \u223c 106,\ndepending on wmin/wmax).\nCorollary 2. Algorithm 1 with with pipage rounding will give \u02c6X such that Ef ( \u02c6X) \u2265 max0<\u03b4<1(1\u2212\n\u0001 \u2212 \u03b4 \u2212 e\u2212\u03b42\u2126(k)) maxX\u2286M f (X) with running time T = O(n2\u0001\u22122)\n\n5 Experiments\n\nw(2)\n\ni\n\ni\n\n(cid:113)\n\n(1)f1,1(A) + w(2)\n\n(2)f1,2(A) for i \u2208 V (2), where w(2)\n\nIn this section, we perform a number of synthetic dataset experiments in order to demonstrate proof\nof concept and also to offer empirical evidence supporting our bounds above. While the results of the\npaper are primarily theoretical, the results of this section show that our methods can yield practical\nbene\ufb01t and also demonstrate the potential of the above methods for large-scale DSF-constrained\nmaximization.\nFigure 3 shows the structure of the DSF f : 2V \u2192 R+ to be maximized.\nIt is a three-layer\nDSF having ground set V = V (0) with |V | = n. We partition the ground set V into blocks\nV1 \u222a V2 \u222a V3 s.t. |V1| = |V2| = |V3| = t, where t = |V |/3. In the next layer V (1), the inner\npart of f consists of two concave-composed-with-modular functions, f1,1(A) = min(|X \u2229 [V1 \u222a\nV3]| + \u03b1|X \u2229 V2|, t) and f1,2(A) = \u03b1|X \u2229 [V1 \u222a V3]| + |X \u2229 V2| where \u03b1 = 0.1 is a parameter. In\nthe subsequent layer V (2), every node is concave over the weighted sum of f1,1(A) and f1,2(A),\n(cid:80)\ni.e., f2,i(A) =\nis a 2-dimensional\nuniformly at random vector from [0, 1]2. 
Finally, for the last layer V^(3), the entire function is f(A) = ∑_{i∈V^(2)} w^(3)(i) f_{2,i}(A), where w^(3) is a |V^(2)|-dimensional vector whose entries are again drawn uniformly at random from [0, 1]. The matroid constraint is a partition matroid s.t. X is independent if |X ∩ {v_{1,i}, v_{2,i}}| ≤ 1 for i = 1, 2, . . . , t, where we label V₁ = {v_{1,i}}_{i=1}^t and V₂ = {v_{2,i}}_{i=1}^t. The rank of this matroid is therefore k = 2t. We repeat the experiment on 30 random DSFs; for each DSF, we maximize with each algorithm and report the average function value. Figure 3(b) shows the performance of our method compared to the combinatorial greedy algorithm using the lazy evaluation trick [30]. We see that our method offers a solution that is consistently better than the standard greedy for all k. Regarding running time, we find that while our method is slower than lazy greedy for small k, it becomes faster than lazy greedy for large k (Figure 3c). For a fair comparison, both algorithms were implemented in Python and run on a single CPU. We anticipate that our method will run even faster on parallel GPU machines, which can be accomplished easily using any modern DNN toolkit (e.g., PyTorch [33] or TensorFlow [1]).

Acknowledgments: This material is based upon work supported by the National Science Foundation under Grant No. IIS-1162606, the National Institutes of Health under award R01GM103544, and by a Google, a Microsoft, and an Intel research award. This research is also supported by the CONIX Research Center, one of six centers in JUMP, a Semiconductor Research Corporation (SRC) program sponsored by DARPA.

References

[1] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S.
Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.

[2] Alexander A. Ageev and Maxim I. Sviridenko. Pipage rounding: A new method of constructing algorithms with proven performance guarantee. Journal of Combinatorial Optimization, 8(3):307–328, 2004.

[3] Jeffrey Bilmes and Wenruo Bai. Deep submodular functions. arXiv, abs/1701.08939, January 2017.

[4] Sébastien Bubeck et al. Convex optimization: Algorithms and complexity. Foundations and Trends in Machine Learning, 8(3-4):231–357, 2015.

[5] G. Calinescu, C. Chekuri, M. Pál, and J. Vondrák. Maximizing a monotone submodular function subject to a matroid constraint. SIAM Journal on Computing, 40(6):1740–1766, 2011.

[6] Gruia Calinescu, Chandra Chekuri, Martin Pál, and Jan Vondrák. Maximizing a submodular set function subject to a matroid constraint. In International Conference on Integer Programming and Combinatorial Optimization, pages 182–196. Springer, 2007.

[7] Deeparnab Chakrabarty, Yin Tat Lee, Aaron Sidford, and Sam Chiu-wai Wong. Subquadratic submodular function minimization. arXiv preprint arXiv:1610.09800, 2016.

[8] Chandra Chekuri and Amit Kumar. Maximum coverage problem with group budget constraints and applications. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 72–83.
Springer, 2004.

[9] Chandra Chekuri, Jan Vondrák, and Rico Zenklusen. Dependent randomized rounding for matroid polytopes and applications. arXiv preprint arXiv:0909.4348, 2009.

[10] Gerard Cornuejols, Marshall L. Fisher, and George L. Nemhauser. Exceptional paper: Location of bank accounts to optimize float: An analytic study of exact and approximate algorithms. Management Science, 23(8):789–810, 1977.

[11] W. H. Cunningham. On submodular function minimization. Combinatorica, 5(3):185–192, 1985.

[12] J. Djolonga and A. Krause. From MAP to marginals: Variational inference in Bayesian submodular models. In Neural Information Processing Society (NIPS), Montreal, CA, December 2014.

[13] Brian W. Dolhansky and Jeff A. Bilmes. Deep submodular functions: Definitions and learning. In Advances in Neural Information Processing Systems, pages 3396–3404, 2016.

[14] Shaddin Dughmi. Submodular functions: Extensions, distributions, and algorithms. A survey. arXiv preprint arXiv:0912.0322, 2009.

[15] S. Fujishige. Submodular Functions and Optimization, volume 58. Elsevier Science, 2005.

[16] Toshihiro Fujito. Approximation algorithms for submodular set cover with applications. IEICE Transactions on Information and Systems, 83(3):480–487, 2000.

[17] Michael Gygli, Helmut Grabner, and Luc Van Gool. Video summarization by learning submodular mixtures of objectives. In Proceedings CVPR 2015, pages 3090–3098, 2015.

[18] Hamed Hassani, Mahdi Soltanolkotabi, and Amin Karbasi. Gradient methods for submodular maximization. In Advances in Neural Information Processing Systems, pages 5841–5851, 2017.

[19] Stefanie Jegelka and Jeff Bilmes. Submodularity beyond submodular energies: coupling edges in graph cuts. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 1897–1904.
IEEE, 2011.

[20] Mohammad Karimi, Mario Lucic, Hamed Hassani, and Andreas Krause. Stochastic submodular maximization: The case of coverage functions. In Advances in Neural Information Processing Systems, pages 6856–6866, 2017.

[21] Katrin Kirchhoff and Jeff Bilmes. Submodularity for data selection in machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 131–141, 2014.

[22] Pushmeet Kohli, M. Pawan Kumar, and Philip H. S. Torr. P3 & beyond: Move making algorithms for solving higher order functions. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 31(9):1645–1656, 2009.

[23] Andreas Krause, Carlos Guestrin, Anupam Gupta, and Jon Kleinberg. Near-optimal sensor placements: Maximizing information while minimizing communication cost. In Proceedings of the 5th International Conference on Information Processing in Sensor Networks, pages 2–10. ACM, 2006.

[24] Yin Tat Lee, Aaron Sidford, and Sam Chiu-wai Wong. A faster cutting plane method and its implications for combinatorial and convex optimization. In Foundations of Computer Science (FOCS), 2015 IEEE 56th Annual Symposium on, pages 1049–1065. IEEE, 2015.

[25] Maxwell W. Libbrecht, Jeffrey A. Bilmes, and William Stafford Noble. Choosing non-redundant representative subsets of protein sequence data sets using submodular optimization. Proteins: Structure, Function, and Bioinformatics, 2018.

[26] H. Lin, J. Bilmes, and S. Xie. Graph-based submodular selection for extractive summarization. In ASRU, 2009.

[27] Hui Lin and Jeff Bilmes. A class of submodular functions for document summarization. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, pages 510–520. Association for Computational Linguistics, 2011.

[28] Hui Lin and Jeff Bilmes.
Word alignment via submodular maximization over matroids. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Short Papers, Volume 2, pages 170–175. Association for Computational Linguistics, 2011.

[29] László Lovász. Submodular functions and convexity. In Mathematical Programming: The State of the Art, pages 235–257. Springer, 1983.

[30] M. Minoux. Accelerated greedy algorithms for maximizing submodular set functions. Optimization Techniques, pages 234–243, 1978.

[31] Aryan Mokhtari, Hamed Hassani, and Amin Karbasi. Conditional gradient method for stochastic submodular maximization: Closing the gap. In International Conference on Artificial Intelligence and Statistics, pages 1886–1895, 2018.

[32] George L. Nemhauser, Laurence A. Wolsey, and Marshall L. Fisher. An analysis of approximations for maximizing submodular set functions - I. Mathematical Programming, 14(1):265–294, 1978.

[33] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. 2017.

[34] Prabhakar Raghavan. Probabilistic construction of deterministic algorithms: approximating packing integer programs. Journal of Computer and System Sciences, 37(2):130–143, 1988.

[35] Tasuku Soma and Yuichi Yoshida. A generalization of submodular cover via the diminishing return property on the integer lattice. In Advances in Neural Information Processing Systems, pages 847–855, 2015.

[36] P. Stobbe and A. Krause. Efficient minimization of decomposable submodular functions. In NIPS, 2010.

[37] Maxim Sviridenko, Jan Vondrák, and Justin Ward. Optimal approximation for submodular and supermodular optimization with bounded curvature.
In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1134–1148. Society for Industrial and Applied Mathematics, 2015.

[38] Adrian Vetta. Nash equilibria in competitive societies, with applications to facility location, traffic routing and auctions. In Foundations of Computer Science, 2002. Proceedings. The 43rd Annual IEEE Symposium on, pages 416–425. IEEE, 2002.

[39] Kai Wei, Rishabh Iyer, and Jeff Bilmes. Submodularity in data subset selection and active learning. In International Conference on Machine Learning (ICML), Lille, France, 2015.

[40] Kai Wei, Maxwell W. Libbrecht, Jeffrey A. Bilmes, and William Noble. Choosing panels of genomics assays using submodular optimization (TR). bioRxiv, 2016.

[41] Kai Wei, Yuzong Liu, Katrin Kirchhoff, Chris Bartels, and Jeff Bilmes. Submodular subset selection for large-scale speech training data. In Proceedings of ICASSP, Florence, Italy, 2014.