{"title": "Adaptive Maximization of Pointwise Submodular Functions With Budget Constraint", "book": "Advances in Neural Information Processing Systems", "page_first": 1244, "page_last": 1252, "abstract": "We study the worst-case adaptive optimization problem with budget constraint that is useful for modeling various practical applications in artificial intelligence and machine learning. We investigate the near-optimality of greedy algorithms for this problem with both modular and non-modular cost functions. In both cases, we prove that two simple greedy algorithms are not near-optimal but the best between them is near-optimal if the utility function satisfies pointwise submodularity and pointwise cost-sensitive submodularity respectively. This implies a combined algorithm that is near-optimal with respect to the optimal algorithm that uses half of the budget. We discuss applications of our theoretical results and also report experiments comparing the greedy algorithms on the active learning problem.", "full_text": "Adaptive Maximization of Pointwise Submodular\n\nFunctions With Budget Constraint\n\nNguyen Viet Cuong1\n\nHuan Xu2\n\n1Department of Engineering, University of Cambridge, vcn22@cam.ac.uk\n\n2Stewart School of Industrial & Systems Engineering, Georgia Institute of Technology,\n\nhuan.xu@isye.gatech.edu\n\nAbstract\n\nWe study the worst-case adaptive optimization problem with budget constraint that\nis useful for modeling various practical applications in arti\ufb01cial intelligence and\nmachine learning. We investigate the near-optimality of greedy algorithms for this\nproblem with both modular and non-modular cost functions. In both cases, we\nprove that two simple greedy algorithms are not near-optimal but the best between\nthem is near-optimal if the utility function satis\ufb01es pointwise submodularity and\npointwise cost-sensitive submodularity respectively. 
This implies a combined\nalgorithm that is near-optimal with respect to the optimal algorithm that uses half\nof the budget. We discuss applications of our theoretical results and also report\nexperiments comparing the greedy algorithms on the active learning problem.\n\n1\n\nIntroduction\n\nConsider problems where we need to adaptively make a sequence of decisions while taking into\naccount the outcomes of previous decisions. For instance, in the sensor placement problem [1, 2], one\nneeds to sequentially place sensors at some pre-speci\ufb01ed locations, taking into account the working\nconditions of previously deployed sensors. The aim is to cover as large an area as possible while\nkeeping the cost of placement within a given budget. As another example, in the pool-based active\nlearning problem [3, 4], one needs to sequentially select unlabeled examples and query their labels,\ntaking into account the previously observed labels. The aim is to learn a good classi\ufb01er while ensuring\nthat the cost of querying does not exceed some given budget.\nThese problems can usually be considered under the framework of adaptive optimization with budget\nconstraint. In this framework, the objective is to \ufb01nd a policy for making decisions that maximizes the\nvalue of some utility function. With a budget constraint, such a policy must have a cost no higher than\nthe budget given by the problem. Adaptive optimization with budget constraint has been previously\nstudied in the average case [2, 5, 6] and worst case [7]. In this paper, we focus on this problem in the\nworst case.\nIn contrast to previous works on adaptive optimization with budget constraint (both in the average\nand worst cases) [2, 8], we consider not only modular cost functions but also general, possibly\nnon-modular, cost functions on sets of decisions. 
For example, in the sensor placement problem, the\ncost of a set of deployed sensors may be the weight of the minimum spanning tree connecting those\nsensors, where the weight of the edge between any two sensors is the distance between them.1 In\nthis case, the cost of deploying a sensor is not \ufb01xed, but depends on the set of previously deployed\nsensors. This setting allows the cost function to be non-modular, and thus is more general than the\nsetting in previous works, which usually assume the cost to be modular.\n\n1This cost function is reasonable in practice if we think of it as the minimal necessary communication cost to\n\nkeep the sensors connected (rather than the placement cost).\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\n\fWhen cost functions are modular, we focus on the useful class of pointwise submodular utility\nfunctions [2, 7, 8] that has been applied to interactive submodular set cover and active learning\nproblems [7, 8]. With this class of utilities, we investigate the near-optimality of greedy policies for\nworst-case adaptive optimization with budget constraint. A policy is near-optimal if its worst-case\nutility is within a constant factor of the optimal worst-case utility. We \ufb01rst consider two greedy\npolicies: one that maximizes the worst-case utility gain and one that maximizes the worst-case utility\ngain per unit cost increment at each step. If the cost is uniform and modular, it is known that these\ntwo policies are equivalent and near-optimal [8]; however, we show in this paper that they cannot\nachieve near-optimality with non-uniform modular costs. Despite this negative result, we can prove\nthat the best between these two greedy policies always achieves near-optimality. This suggests we\ncan combine the two policies into one greedy policy that is near-optimal with respect to the optimal\nworst-case policy that uses half of the budget. 
We discuss applications of our theoretical results to the\nbudgeted adaptive coverage problem and the budgeted pool-based active learning problem, both of\nwhich can be modeled as worst-case adaptive optimization problems with budget constraint. We also\nreport experimental results comparing the greedy policies on the latter problem.\nWhen cost functions are general and possibly non-modular, we propose a novel class of utility\nfunctions satisfying a property called pointwise cost-sensitive submodularity. This property is a\ngeneralization of cost-sensitive submodularity to the adaptive setting. In essence, cost-sensitive\nsubmodularity means the utility is more submodular than the cost. Submodularity [9] and point-\nwise submodularity are special cases of cost-sensitive submodularity and pointwise cost-sensitive\nsubmodularity respectively when the cost is modular. With this new class of utilities, we prove\nsimilar near-optimality results for the greedy policies as in the case of modular costs. Our proofs\nbuild upon the proof techniques for worst-case adaptive optimization with uniform modular costs [8]\nand non-adaptive optimization with non-uniform modular costs [10] but go beyond them to handle\ngeneral, possibly non-uniform and non-modular, costs.\n\n2 Worst-case Adaptive Optimization with Budget Constraint\nWe now formalize the framework for worst-case adaptive optimization with budget constraint. Let X\nbe a \ufb01nite set of items (or decisions) and Y be a \ufb01nite set of possible states (or outcomes). Each item\nin X can be in any particular state in Y. Let h : X \u2192 Y be a deterministic function that maps each\nitem x \u2208 X to its state h(x) \u2208 Y. We call h a realization. 
Let H ≜ Y^X = {h | h : X → Y} be the realization set consisting of all possible realizations.\nWe consider the problem where we sequentially select a subset of items from X as follows: we select an item, observe its state, then select the next item, observe its state, etc. After some iterations, our observations so far can be represented as a partial realization, which is a partial function from X to Y. An adaptive strategy to select items takes into account the states of all previous items when deciding the next item to select. Each adaptive strategy can be encoded as a deterministic policy for selecting items, where a policy is a function from a partial realization to the next item to select. A policy can be represented by a policy tree in which each node is an item to be selected and the edges below a node correspond to its states.\nWe assume there is a cost function c : 2^X → R≥0, where 2^X is the power set of X. For any set of items S ⊆ X, c(S) is the cost incurred if we select the items in S and observe their states. For simplicity, we also assume c(∅) = 0 and c(S) > 0 for S ≠ ∅. If c is modular, then c(S) = Σ_{x∈S} c({x}) for all S. In general, c can be non-modular. We shall consider the modular cost setting in Section 3 and the non-modular cost setting in Section 4.\nFor a policy π, we define the cost of π as the maximum cost incurred by a set of items selected along any path of the policy tree of π. Note that if we fix a realization h, the set of items selected by the policy π is fixed, and we denote this set by x^π_h. The set x^π_h corresponds to a path of the policy tree of π, and thus the cost of π can be formally defined as c(π) ≜ max_{h∈H} c(x^π_h).\nIn the worst-case adaptive optimization problem, we have a utility function f : 2^X × H → R≥0 that we wish to maximize in the worst case. The utility function f(S, h) depends on a set S of selected items and a realization h that determines the states of all items. Essentially, f(S, h) denotes the value of selecting S, given that the true realization is h. We assume f(∅, h) = 0 for all h.\nFor a policy π, we define its worst-case utility as fworst(π) ≜ min_{h∈H} f(x^π_h, h). Given a budget K > 0, our goal is to find a policy π* whose cost does not exceed K and that maximizes fworst. Formally, π* ≜ arg max_π fworst(π) subject to c(π) ≤ K. We call this the problem of worst-case adaptive optimization with budget constraint.\n\n3 Modular Cost Setting\n\nIn this section, we consider the setting where the cost function is modular. This setting is very common in the literature (e.g., see [2, 10, 11, 12]). We describe the assumptions on the utility function, the greedy algorithms for worst-case adaptive optimization with budget constraint, and the analyses of these algorithms. Proofs in this section are given in the supplementary material.\n\n3.1 Assumptions on the Utility Function\n\nAdaptive optimization with an arbitrary utility function is often infeasible, so we focus on a useful class of utility functions: the pointwise monotone submodular functions. Recall that a set function g : 2^X → R is submodular if it satisfies the following diminishing returns property: for all A ⊆ B ⊆ X and x ∈ X \\ B, g(A ∪ {x}) − g(A) ≥ g(B ∪ {x}) − g(B). Furthermore, g is monotone if g(A) ≤ g(B) for all A ⊆ B. In our setting, the utility function f(S, h) depends on both the selected items and the realization, and we assume it satisfies the pointwise submodularity, pointwise monotonicity, and minimal dependency properties below.\nDefinition 1 (Pointwise Submodularity). 
A utility function f(S, h) is pointwise submodular if the set function f_h(S) ≜ f(S, h) is submodular for all h ∈ H.\nDefinition 2 (Pointwise Monotonicity). A utility function f(S, h) is pointwise monotone if the set function f_h(S) ≜ f(S, h) is monotone for all h ∈ H.\nDefinition 3 (Minimal Dependency). A utility function f(S, h) satisfies minimal dependency if the value of f(S, h) only depends on the items in S and their states (with respect to the realization h).\n\nThese properties are useful for worst-case adaptive optimization and were also considered in [8] for uniform modular costs. Pointwise submodularity is an extension of submodularity, and pointwise monotonicity an extension of monotonicity, to the adaptive setting. Minimal dependency is needed to make sure the value of f only depends on what has already been observed. Without this property, the value of f may be unpredictable and hard to reason about. The three assumptions above hold for the practical utility functions that we describe in Section 5.1.\n\n3.2 Greedy Algorithms and Theoretical Results\n\nOur paper focuses on greedy algorithms (or greedy policies) to maximize the worst-case utility under a budget constraint. We are interested in a theoretical guarantee for these policies: the near-optimality guarantee. Specifically, a policy is near-optimal if its worst-case utility is within a constant factor of the optimal worst-case utility. In this section, we consider two intuitive greedy policies and prove that each of these policies is individually not near-optimal, but that the best between them always is. We shall also discuss a combined policy and its guarantee.\n\n3.2.1 Two Greedy Policies\n\nWe consider two greedy policies in Figure 1. These policies are described in the general form and can be used for both modular and non-modular cost functions. 
In these policies, D is the partial realization that we have observed so far, and X_D ≜ {x ∈ X | (x, y) ∈ D for some y ∈ Y} is the domain of D (i.e., the set of selected items in D). For any item x, we write δ(x | D) to denote the worst-case utility gain if x is selected after we observe D. That is,\n\nδ(x | D) ≜ min_{y∈Y} {f(X_D ∪ {x}, D ∪ {(x, y)}) − f(X_D, D)}.    (1)\n\nIn this definition, note that we have extended the utility function f to take a partial realization as the second parameter (instead of a full realization). This extension is possible because the utility function is assumed to satisfy minimal dependency, and thus its value only depends on the partial realization that we have observed so far. In the policy π1, for any item x ∈ X and any S ⊆ X, we define:\n\nΔc(x | S) ≜ c(S ∪ {x}) − c(S),    (2)\n\nwhich is the cost increment of selecting x after S has been selected. If the cost function c is modular, then Δc(x | S) = c({x}).\n\nCost-average Greedy Policy π1:\nD ← ∅; U ← X;\nrepeat\n    Pick x* ∈ U that maximizes δ(x* | D)/Δc(x* | X_D);\n    if c(X_D ∪ {x*}) ≤ K then\n        Observe state y* of x*;\n        D ← D ∪ {(x*, y*)};\n    end\n    U ← U \\ {x*};\nuntil U = ∅;\n\nCost-insensitive Greedy Policy π2:\nD ← ∅; U ← X;\nrepeat\n    Pick x* ∈ U that maximizes δ(x* | D);\n    if c(X_D ∪ {x*}) ≤ K then\n        Observe state y* of x*;\n        D ← D ∪ {(x*, y*)};\n    end\n    U ← U \\ {x*};\nuntil U = ∅;\n\nFigure 1: Two greedy policies for adaptive optimization with budget constraint.\n\nThe two greedy policies in Figure 1 are intuitive. 
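Both policies share one selection loop and differ only in the ranking criterion. The following minimal Python sketch mirrors Figure 1; the utility, item names, costs, and budget in the example are illustrative assumptions, and f takes a partial realization as its second argument, which minimal dependency permits.

```python
def worst_case_gain(f, D, x, states):
    """delta(x | D): worst-case utility gain of selecting item x after observing D."""
    XD = frozenset(D)
    return min(f(XD | {x}, {**D, x: y}) - f(XD, D) for y in states)

def greedy_policy(items, states, f, cost, budget, realization, cost_average):
    """Run pi_1 (cost_average=True) or pi_2 (cost_average=False) from Figure 1."""
    D = {}           # observed partial realization: item -> state
    U = set(items)   # items not yet considered
    while U:
        XD = frozenset(D)
        if cost_average:
            def score(x):  # worst-case gain per unit cost increment
                return worst_case_gain(f, D, x, states) / (cost(XD | {x}) - cost(XD))
        else:
            def score(x):  # raw worst-case gain
                return worst_case_gain(f, D, x, states)
        best = max(U, key=score)
        if cost(XD | {best}) <= budget:   # still affordable: pay, select, observe
            D[best] = realization[best]
        U.remove(best)
    return set(D)

# Illustrative instance: modular utility and modular cost (all values are assumptions).
w = {'a': 1.0, 'b': 1.5}
item_costs = {'a': 1, 'b': 4}
f = lambda S, D: sum(w[x] for x in S)           # pointwise modular, hence pointwise submodular
cost = lambda S: sum(item_costs[x] for x in S)  # modular cost
h = {'a': 0, 'b': 0}                            # the hidden true realization
```

With budget K = 4 in this toy instance, the cost-average policy returns {'a'} (best gain per unit cost) while the cost-insensitive policy returns {'b'} (best raw gain); the counter-examples behind Theorem 1 exploit exactly this kind of divergence.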
The cost-average policy π1 greedily selects the items that maximize the worst-case utility gain per unit cost increment, as long as they are still affordable under the remaining budget. On the other hand, the cost-insensitive policy π2 simply ignores the items' costs and greedily selects the affordable items that maximize the worst-case utility gain.\nAnalyses of π1 and π2: Given the two greedy policies, we are interested in their near-optimality: whether they provide a constant factor approximation to the optimal worst-case utility. Unfortunately, we can show that these policies are not near-optimal. This negative result is stated in Theorem 1 below. The proof of this theorem constructs counter-examples where the policies are not near-optimal.\nTheorem 1. For any πi ∈ {π1, π2} and α > 0, there exists a worst-case adaptive optimization problem with a utility f, a modular cost c, and a budget K such that f satisfies the assumptions in Section 3.1 and fworst(πi)/fworst(π*) < α, where π* is the optimal policy for the problem.\n\n3.2.2 A Near-optimal Policy\n\nAlthough the greedy policies π1 and π2 are not near-optimal, we now show that the best between them is in fact near-optimal. More specifically, let us define a policy π such that:\n\nπ ≜ π1 if fworst(π1) > fworst(π2), and π ≜ π2 otherwise.    (3)\n\nTheorem 2 below states that π is near-optimal for the worst-case adaptive optimization problem with budget constraint.\nTheorem 2. Let f be a utility that satisfies the assumptions in Section 3.1 and π* be the optimal policy for the worst-case adaptive optimization problem with utility f, a modular cost c, and a budget K. The policy π defined by Equation (3) satisfies fworst(π) > (1/2)(1 − 1/e) fworst(π*).\nThe constant factor (1/2)(1 − 1/e) in Theorem 2 is slightly worse than the constant factor (1 − 1/√e) for the non-adaptive budgeted maximum coverage problem [10]. If we apply this theorem to a problem with a uniform cost, i.e., c({x}) = c({x′}) for all x and x′, then π1 = π2 and fworst(π) = fworst(π1) = fworst(π2). Thus, from Theorem 2, fworst(π1) = fworst(π2) > (1/2)(1 − 1/e) fworst(π*). Although this implies the greedy policy is near-optimal, the constant factor (1/2)(1 − 1/e) in this case is not as good as the constant factor (1 − 1/e) in [8] for the uniform modular cost setting. We also note that Theorem 2 still holds if we replace the cost-insensitive policy π2 with only the first item that it selects (see its proof for details). In other words, we can terminate π2 right after it selects the first item and the near-optimality in Theorem 2 is still guaranteed.\n\n3.2.3 A Combined Policy\n\nWith Theorem 2, a naive approach to the worst-case adaptive optimization problem with budget constraint is to estimate fworst(π1) and fworst(π2) (without actually running these policies) and use the best between them. However, exact estimation of these quantities is intractable because it would require a consideration of all realizations (an exponential number of them) to find the worst-case realization for these policies. This is very different from the non-adaptive setting [10, 12, 13], where we can easily find the best policy because there is only one realization.\nFurthermore, in the adaptive setting, we cannot roll back once we run a policy. For example, we cannot run π1 and π2 at the same time to determine which one is better without doubling the budget.\n\n1. 
Run π1 with budget K/2 (half of the total budget), and let the set of selected items be S1.\n2. Starting with the empty set, run π2 with budget K/2, and let the set of items selected in this step be S2. For simplicity, we allow S2 to overlap with S1.\n3. Return S1 ∪ S2.\n\nFigure 2: The combined policy π1/2.\n\nThis is because we have to pay the cost every time we want to observe the state of an item, and the next item selected would depend on the previous states. Thus, the adaptive setting in our paper is more difficult than the non-adaptive setting considered in previous works [10, 12, 13]. If we consider a Bayesian setting with some prior on the set of realizations [2, 4, 14], we can sample a subset of realizations from the prior to estimate fworst. However, this method does not provide any guarantee for the estimation.\nGiven these difficulties, a more practical approach is to run both π1 and π2 using half of the budget for each policy and combine the selected sets. Details of this combined policy (π1/2) are in Figure 2. Using Theorem 2, we can show that π1/2 is near-optimal compared to the optimal worst-case policy that uses half of the budget. Theorem 3 below states this result. We note that the theorem still holds if the order of running π1 and π2 is exchanged in the policy π1/2.\nTheorem 3. Assume the same setting as in Theorem 2. Let π*_{1/2} be the optimal policy for the worst-case adaptive optimization problem with budget K/2. The policy π1/2 satisfies fworst(π1/2) > (1/2)(1 − 1/e) fworst(π*_{1/2}).\nSince Theorem 3 only compares π1/2 with the optimal policy π*_{1/2} that uses half of the budget, a natural question is whether or not the policies π1 and π2 running with the full budget have a similar guarantee compared to π*_{1/2}. Using the same counter-example for π2 in the proof of Theorem 1, we can easily show in Theorem 4 that this guarantee does not hold for the cost-insensitive policy π2.\nTheorem 4. For any α > 0, there exists a worst-case adaptive optimization problem with a utility f, a modular cost c, and a budget K such that f satisfies the assumptions in Section 3.1 and fworst(π2)/fworst(π*_{1/2}) < α, where π*_{1/2} is the optimal policy for the problem with budget K/2.\nAs regards the cost-average policy π1, it remains open whether running it with the full budget provides any constant factor approximation to the worst-case utility of π*_{1/2}. However, in the supplementary material, we show that it is not possible to construct a counter-example for this case using a modular utility function, so a counter-example (if there is any) should use a more sophisticated utility.\n\n4 Non-Modular Cost Setting\n\nWe first define cost-sensitive submodularity, a generalization of submodularity that takes into account a general, possibly non-modular, cost on sets of items. We then state the assumptions on the utility function and the near-optimality results of the greedy algorithms for this setting.\nCost-sensitive Submodularity: Let c be a general cost function that is strictly monotone, i.e., c(A) < c(B) for all A ⊂ B. Hence, Δc(x | S) > 0 for all S and x ∉ S. Assume c satisfies the triangle inequality: c(A ∪ B) ≤ c(A) + c(B) for all A, B ⊆ X. We define cost-sensitive submodularity as follows.\nDefinition 4 (Cost-sensitive Submodularity). A set function g : 2^X → R is cost-sensitively submodular w.r.t. 
a cost function c if it satisfies: for all A ⊆ B ⊆ X and x ∈ X \\ B,\n\n(g(A ∪ {x}) − g(A)) / Δc(x | A) ≥ (g(B ∪ {x}) − g(B)) / Δc(x | B).    (4)\n\nIn essence, cost-sensitive submodularity is a generalization of submodularity and means that g is more submodular than the cost c. When c is modular, cost-sensitive submodularity is equivalent to submodularity. If g is cost-sensitively submodular w.r.t. a submodular cost, it is also submodular. Since c satisfies the triangle inequality, it cannot be super-modular, but it can be non-submodular (see the supplementary material for an example).\nWe state some useful properties of cost-sensitive submodularity in Theorem 5. In this theorem, αg1 + βg2 is the function g(S) = αg1(S) + βg2(S) for all S ⊆ X, and αc1 + βc2 is the function c(S) = αc1(S) + βc2(S) for all S ⊆ X. The proof of this theorem is in the supplementary material.\nTheorem 5. (a) If g1 and g2 are cost-sensitively submodular w.r.t. a cost function c, then αg1 + βg2 is also cost-sensitively submodular w.r.t. c for all α, β ≥ 0.\n(b) If g is cost-sensitively submodular w.r.t. cost functions c1 and c2, then g is also cost-sensitively submodular w.r.t. αc1 + βc2 for all α, β ≥ 0 such that α + β > 0.\n(c) For any integer n ≥ 1, if g is monotone and c(S) = Σ_{i=1}^{n} a_i (g(S))^i with non-negative coefficients a_i ≥ 0 such that Σ_{i=1}^{n} a_i > 0, then g is cost-sensitively submodular w.r.t. c.\n(d) If g is monotone and c(S) = α e^{g(S)} for α > 0, then g is cost-sensitively submodular w.r.t. c.\nThis theorem specifies various cases where a function g is cost-sensitively submodular w.r.t. a cost c. Note that neither g nor c needs to be submodular for this theorem to hold. 
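Inequality (4) can be checked exhaustively on small ground sets. A minimal sketch, using an illustrative coverage-style g together with the polynomial cost form from Theorem 5(c); the regions and coefficients below are assumptions made for this example:

```python
from itertools import chain, combinations

def subsets(xs):
    """All subsets of an iterable, as tuples."""
    xs = list(xs)
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

def is_cost_sensitively_submodular(ground, g, c, eps=1e-12):
    """Brute-force check of inequality (4) over all A <= B and x outside B.

    Requires c to be strictly monotone so that both denominators are positive.
    """
    for B in map(set, subsets(ground)):
        for A in map(set, subsets(B)):
            for x in ground - B:
                lhs = (g(A | {x}) - g(A)) / (c(A | {x}) - c(A))
                rhs = (g(B | {x}) - g(B)) / (c(B | {x}) - c(B))
                if lhs < rhs - eps:
                    return False
    return True

# Illustrative monotone coverage function g and cost c(S) = g(S) + g(S)^2,
# an instance of the polynomial form in Theorem 5(c).
regions = {'a': {1, 2}, 'b': {2, 3}, 'c': {4}}
g = lambda S: len(set().union(*(regions[x] for x in S))) if S else 0
c = lambda S: g(S) + g(S) ** 2
```

Replacing c with a modular cost such as c(S) = |S| reduces the check to ordinary submodularity of g, matching the remark that cost-sensitive submodularity collapses to submodularity for modular costs.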
Parts (a,b) state that\ncost-sensitive submodularity is preserved for linear combinations of either g or c. Parts (c,d) state\nthat if c is a polynomial (respectively, exponential) of g with non-negative (respectively, positive)\ncoef\ufb01cients, then g is cost-sensitively submodular w.r.t. c.\nAssumptions on the Utility: In this setting, we also assume the utility f (S, h) satis\ufb01es pointwise\nmonotonicity and minimal dependency. Furthermore, we assume it satis\ufb01es the pointwise cost-\nsensitive submodularity property below. This property is an extension of cost-sensitive submodularity\nto the adaptive setting and is also a generalization of pointwise submodularity for a general cost. If\nthe cost is modular, pointwise cost-sensitive submodularity is equivalent to pointwise submodularity.\nDe\ufb01nition 5 (Pointwise Cost-sensitive Submodularity). A utility f (S, h) is pointwise cost-sensitively\nsubmodular w.r.t. a cost c if, for all h, fh(S) (cid:44) f (S, h) is cost-sensitively submodular w.r.t. c.\nTheoretical Results: Under the above assumptions, near-optimality guarantees in Theorems 2\nand 3 for the greedy algorithms in Section 3.2 still hold. This result is stated and proven in the\nsupplementary material. The proof requires a sophisticated combination of the techniques for worst-\ncase adaptive optimization with uniform modular costs [8] and non-adaptive optimization with\nnon-uniform modular costs [10]. Unlike [10], our proof deals with policy trees instead of sets and we\ngeneralize previous techniques, originally used for modular costs, to handle general cost functions.\n\n5 Applications and Experiments\n\n5.1 Applications\n\nWe discuss two applications of our theoretical results in this section: the budgeted adaptive coverage\nproblem and the budgeted pool-based active learning problem. 
These problems were considered in [2] for the average case, while we study them here in the worst case, where the difficulty, as shown above, is that simple policies such as π1 and π2 are not near-optimal, in contrast to the former case.\nBudgeted Adaptive Coverage: In this problem, we are given a set of locations where we need to place some sensors to gather spatial information about the surrounding environment. If sensors are deployed at a set of sensing locations, we have to pay a cost depending on where the locations are. After a sensor is deployed at a location, it may be in one of a few possible states (e.g., due to a partial failure of the sensor), leading to various degrees of information covered by the sensor. The budgeted adaptive coverage problem can be stated as: given a cost budget K, where should we place the sensors to cover as much spatial information as possible?\nWe can model this problem as a worst-case adaptive optimization problem with budget K. Let X be the set of all possible locations where sensors may be deployed, and let Y be the set of all possible states of the sensors. For each set of locations S ⊆ X, c(S) is the cost of deploying sensors there. For a location x and a state y, let R_{x,y} be the geometric shape associated with the spatial information covered if we put a sensor at x and its state is y. We can define the utility function f(S, h) = |∪_{x∈S} R_{x,h(x)}|, which is the cardinality (or volume) of the covered region. If we fix a realization h, this utility is monotone submodular [11]. Thus, f(S, h) is pointwise monotone submodular. Since this function also satisfies minimal dependency, we can apply the policy π1/2 to this problem and get the guarantee in Theorem 3 if the cost function c is modular.\n\nTable 1: AUCs (normalized to [0,100]) of four learning policies.\n\nCost | Data set 1 (PL / LC / ALC / BLC) | Data set 2 (PL / LC / ALC / BLC) | Data set 3 (PL / LC / ALC / BLC)\nR1 | 79.8 / 85.6 / 93.9 / 92.0 | 69.0 / 69.3 / 83.1 / 77.5 | 76.7 / 79.7 / 94.0 / 90.1\nR2 | 80.7 / 85.0 / 63.0 / 63.6 | 70.9 / 70.4 / 50.5 / 51.8 | 78.6 / 82.6 / 51.9 / 54.7\nM1 | 92.5 / 93.0 / 96.5 / 95.9 | 84.6 / 86.7 / 91.7 / 92.6 | 90.7 / 91.0 / 96.9 / 96.3\nM2 | 86.9 / 87.4 / 91.2 / 90.1 | 72.5 / 73.1 / 62.1 / 67.4 | 79.4 / 86.3 / 74.1 / 78.2\n\nBudgeted Pool-based Active Learning: For pool-based active learning, we are given a finite set of unlabeled examples and need to adaptively query the labels of some selected examples from that set to train a classifier. Every time we query an example, we have to pay a cost and then get to see its label. In the next iteration, we can use the labels observed so far to select the next example to query. The budgeted pool-based active learning problem can be stated as: given a budget K, which examples should we query to train a good classifier?\nWe can model this problem as a worst-case adaptive optimization problem with budget K. Let X be the set of unlabeled examples and Y be the set of all possible labels. For each set of examples S ⊆ X, c(S) is the cost of querying their labels. A realization h is a labeling of all examples in X. For pool-based active learning, previous works [2, 8, 14] have shown that the version space reduction utility is pointwise monotone submodular and satisfies minimal dependency. This utility is defined as f(S, h) = Σ_{h′: h′(S) ≠ h(S)} p0[h′], where p0 is a prior on H and h(S) is the labels of S according to
Thus, we can apply \u03c01/2 to this problem with the guarantee in Theorem 3 if the cost c is modular.\nWith the utility above, the greedy criterion that maximizes \u03b4(x\u2217 | D) in the cost-insensitive policy\n\u03c02 is equivalent to the well-known least con\ufb01dence criterion x\u2217 = arg minx maxy pD[y; x] =\narg maxx miny{1 \u2212 pD[y; x]}, where pD is the posterior after observing D and pD[y; x] is\nthe probability that x has label y. On the other hand, the greedy criterion that maximizes\n\u03b4(x\u2217 | D)/\u2206c(x\u2217 | XD) in the cost-average policy \u03c01 is equivalent to:\n\n(cid:26) miny{1 \u2212 pD[y; x]}\n\n(cid:27)\n\nx\u2217 = arg max\n\nx\n\n\u2206c(x | XD)\n\n.\n\n(5)\n\nWe prove this equation in the supplementary material. Theorem 3 can also be applied if we consider\nthe total generalized version space reduction utility [8] that incorporates an arbitrary loss. This utility\nwas also shown to be pointwise monotone submodular and satisfy minimal dependency [8], and thus\nthe theorem still holds in this case for modular costs.\n\n5.2 Experiments\nWe present experimental results for budgeted pool-based active learning with various modular cost\nsettings. We use 3 binary classi\ufb01cation data sets extracted from the 20 Newsgroups data [15]:\nalt.atheism/comp.graphics (data set 1), comp.sys.mac.hardware/comp.windows.x (data set 2), and\nrec.motorcycles/rec.sport.baseball (data set 3). Since the costs are modular, they are put on individual\nexamples, and the total cost is the sum of the selected examples\u2019 costs. We will consider settings\nwhere random costs and margin-dependent costs are put on training data.\nWe compare 4 data selection strategies: passive learning (PL), cost-insensitive greedy policy or least\ncon\ufb01dence (LC), cost-average greedy policy (ALC), and budgeted least con\ufb01dence (BLC). LC and\nALC have been discussed in Section 5.1, and BLC is the corresponding policy \u03c01/2. 
These three strategies are active learning algorithms. For comparison, we train a logistic regression model with budgets 50, 100, 150, and 200, and approximate its area under the learning curve (AUC) using the accuracies on a separate test set. In Table 1, bold numbers indicate the best scores, and underlines indicate that BLC is the second best among the active learning algorithms.\nExperiments with Random Costs: In this setting, costs are assigned randomly to the training examples in 2 scenarios. In scenario R1, some random examples have a cost drawn from Gamma(80, 0.1) and the other examples have cost 1. From the results for this scenario in Table 1, ALC is better than LC, and BLC is the second best among the active learning algorithms. In scenario R2, all examples with label 1 have a cost drawn from Gamma(45, 0.1) and the others (examples with label 0) have cost 1. From Table 1, LC is better than ALC in this scenario, which is due to the bias of ALC toward examples with label 0. In this scenario, BLC is again the second best among the active learning algorithms, although it is still significantly worse than LC.\nExperiments with Margin-Dependent Costs: In this setting, costs are assigned to training examples based on their margins with respect to a classifier trained on the whole data set. Specifically, we first train a logistic regression model on all the data and compute its probabilistic prediction for each training example. The margin of an example is then the scaled distance between 0.5 and its probabilistic prediction. We also consider 2 scenarios. In scenario M1, we put higher costs on examples with lower margins. From Table 1, ALC is better than LC in this scenario. BLC performs better than both ALC and LC on data set 2, and is the second best among the active learning algorithms on data sets 1 and 3. In scenario M2, we put higher costs on examples with larger margins. 
From Table 1, ALC is better than LC on data set 1, while LC is better than ALC on data sets 2 and 3. On all data sets, BLC is the second best among the active learning algorithms.

Note that our experiments are not intended to show that BLC is better than LC and ALC. In fact, our theoretical results imply that either LC or ALC will perform well, although we may not know in advance which one. Our experiments therefore demonstrate cases where one of these methods performs badly, and where BLC can be a more robust choice that often performs in between the two.

6 Related Work

Our work is related to [7, 8, 10, 12] but is more general than these works. Cuong et al. [8] considered a similar worst-case setting to ours, but they assumed the utility is pointwise submodular and the cost is uniform modular. Our work is more general than theirs in two aspects: (1) pointwise cost-sensitive submodularity is a generalization of pointwise submodularity, and (2) our cost function is general and may be neither uniform nor modular. These generalizations make the problem more complicated, as the simple greedy policies that are near-optimal in [8] are no longer near-optimal (see Section 3.2). Thus, we need to combine two simple greedy policies to obtain a new near-optimal policy.

Guillory & Bilmes [7] were the first to consider worst-case adaptive submodular optimization, particularly in the interactive submodular set cover problem [7, 16]. In [7], the utility is also pointwise submodular, and they look for a policy that achieves at least a certain value of utility w.r.t. an unknown target realization while at the same time minimizing the cost of this policy. Their final utility, which is derived from the individual utilities of various realizations, is submodular. Our work, in contrast, tries to maximize the worst-case utility directly given a cost budget.

Khuller et al.
[10] considered the budgeted maximum coverage problem, which is the non-adaptive version of our problem with a modular cost. For this problem, they showed that the best of two non-adaptive greedy policies achieves near-optimality compared to the optimal non-adaptive policy. Similar results were also shown in [13] with a better constant, and in [12] for the outbreak detection problem. Our work is a generalization of [10, 12] to the adaptive setting with general cost functions, and we achieve the same constant factor as [12]. Furthermore, the class of utility functions in our work is even more general than the coverage utilities in these works.

Our concept of cost-sensitive submodularity is a generalization of submodularity [9] to general costs. Submodularity has been successfully applied in many applications [1, 17, 18, 19, 20]. Besides pointwise submodularity, there are other ways to extend submodularity to the adaptive setting, e.g., adaptive submodularity [2, 21, 22] and approximately adaptive submodularity [23]. For adaptive submodular utilities, Golovin & Krause [2] proved that greedily maximizing the average utility gain in each step is near-optimal in both the average and worst cases. However, pointwise submodularity does not imply adaptive submodularity, nor vice versa. Thus, our assumptions in this paper apply to a different class of utilities than those in [2].

7 Conclusion

We studied worst-case adaptive optimization with budget constraint, where the cost can be either modular or non-modular and the utility satisfies pointwise submodularity or pointwise cost-sensitive submodularity respectively. We proved a negative result about two greedy policies for this problem but also showed a positive result for the best between them. We used this result to derive a combined policy which is near-optimal compared to the optimal policy that uses half of the budget.
We discussed applications of our theoretical results and reported experiments for the greedy policies on the pool-based active learning problem.

Acknowledgments

This work was done when both authors were at the National University of Singapore. The authors were partially supported by the Agency for Science, Technology and Research (A*STAR) of Singapore through SERC PSF Grant R266000101305.

References

[1] Andreas Krause and Carlos Guestrin. Nonmyopic active learning of Gaussian processes: An exploration-exploitation approach. In ICML, 2007.

[2] Daniel Golovin and Andreas Krause. Adaptive submodularity: Theory and applications in active learning and stochastic optimization. JAIR, 2011.

[3] Andrew McCallum and Kamal Nigam. Employing EM and pool-based active learning for text classification. In ICML, 1998.

[4] Nguyen Viet Cuong, Nan Ye, and Wee Sun Lee. Robustness of Bayesian pool-based active learning against prior misspecification. In AAAI, 2016.

[5] Brian C. Dean, Michel X. Goemans, and J. Vondrák. Approximating the stochastic knapsack problem: The benefit of adaptivity. In FOCS, 2004.

[6] Arash Asadpour, Hamid Nazerzadeh, and Amin Saberi. Stochastic submodular maximization. In Internet and Network Economics, 2008.

[7] Andrew Guillory and Jeff Bilmes. Interactive submodular set cover. In ICML, 2010.

[8] Nguyen Viet Cuong, Wee Sun Lee, and Nan Ye. Near-optimal adaptive pool-based active learning with general loss. In UAI, 2014.

[9] G. L. Nemhauser and L. A. Wolsey. Best algorithms for approximating the maximum of a submodular set function. Mathematics of Operations Research, 3(3):177–188, 1978.

[10] Samir Khuller, Anna Moss, and Joseph Seffi Naor. The budgeted maximum coverage problem. Information Processing Letters, 70(1):39–45, 1999.

[11] Andreas Krause and Carlos Guestrin. Near-optimal observation selection using submodular functions.
In AAAI, 2007.

[12] Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanBriesen, and Natalie Glance. Cost-effective outbreak detection in networks. In KDD, 2007.

[13] Maxim Sviridenko. A note on maximizing a submodular set function subject to a knapsack constraint. Operations Research Letters, 32(1):41–43, 2004.

[14] Nguyen Viet Cuong, Wee Sun Lee, Nan Ye, Kian Ming A. Chai, and Hai Leong Chieu. Active learning for probabilistic hypotheses using the maximum Gibbs error criterion. In NIPS, 2013.

[15] Thorsten Joachims. A probabilistic analysis of the Rocchio algorithm with TFIDF for text categorization. DTIC Document, 1996.

[16] Andrew Guillory and Jeff A. Bilmes. Simultaneous learning and covering with adversarial noise. In ICML, 2011.

[17] Andreas Krause and Carlos Guestrin. Submodularity and its applications in optimized information gathering. ACM Transactions on Intelligent Systems and Technology, 2(4):32, 2011.

[18] Andrew Guillory and Jeff A. Bilmes. Online submodular set cover, ranking, and repeated active learning. In NIPS, 2011.

[19] Andrew Guillory. Active Learning and Submodular Functions. PhD thesis, University of Washington, 2012.

[20] Kai Wei, Rishabh Iyer, and Jeff Bilmes. Submodularity in data subset selection and active learning. In ICML, 2015.

[21] Shervin Javdani, Yuxin Chen, Amin Karbasi, Andreas Krause, Drew Bagnell, and Siddhartha S. Srinivasa. Near optimal Bayesian active learning for decision making. In AISTATS, 2014.

[22] Alkis Gotovos, Amin Karbasi, and Andreas Krause. Non-monotone adaptive submodular maximization. In IJCAI, 2015.

[23] Matt J. Kusner. Approximately adaptive submodular maximization.
In NIPS Workshop on Discrete and Combinatorial Problems in Machine Learning, 2014.