{"title": "Extensions of Generalized Binary Search to Group Identification and Exponential Costs", "book": "Advances in Neural Information Processing Systems", "page_first": 154, "page_last": 162, "abstract": "Generalized Binary Search (GBS) is a well known greedy algorithm for identifying an unknown object while minimizing the number of yes\" or \"no\" questions posed about that object, and arises in problems such as active learning and active diagnosis. Here, we provide a coding-theoretic interpretation for GBS and show that GBS can be viewed as a top-down algorithm that greedily minimizes the expected number of queries required to identify an object. This interpretation is then used to extend GBS in two ways. First, we consider the case where the objects are partitioned into groups, and the objective is to identify only the group to which the object belongs. Then, we consider the case where the cost of identifying an object grows exponentially in the number of queries. In each case, we present an exact formula for the objective function involving Shannon or Renyi entropy, and develop a greedy algorithm for minimizing it.\"", "full_text": "Extensions of Generalized Binary Search to Group\n\nIdenti\ufb01cation and Exponential Costs\n\n2Institute for Translational Sciences, 3Dept. of Preventative Medicine and Community Health,\n\nUniversity of Texas Medical Branch, Galveston, TX 77555\n\nGowtham Bellala1, Suresh K. Bhavnani2,3,4, Clayton Scott1\n\n1Department of EECS, University of Michigan, Ann Arbor, MI 48109\n\n4School of Biomedical Informatics, University of Texas, Houston, TX 77030\n\ngowtham@umich.edu, skbhavnani@gmail.com, clayscot@umich.edu\n\nAbstract\n\nGeneralized Binary Search (GBS) is a well known greedy algorithm for identify-\ning an unknown object while minimizing the number of \u201cyes\u201d or \u201cno\u201d questions\nposed about that object, and arises in problems such as active learning and active\ndiagnosis. 
Here, we provide a coding-theoretic interpretation for GBS and show that GBS can be viewed as a top-down algorithm that greedily minimizes the expected number of queries required to identify an object. This interpretation is then used to extend GBS in two ways. First, we consider the case where the objects are partitioned into groups, and the objective is to identify only the group to which the object belongs. Then, we consider the case where the cost of identifying an object grows exponentially in the number of queries. In each case, we present an exact formula for the objective function involving Shannon or Rényi entropy, and develop a greedy algorithm for minimizing it.

1 Introduction

In applications such as active learning [1, 2, 3, 4], disease/fault diagnosis [5, 6, 7], toxic chemical identification [8], computer vision [9, 10] or the adaptive traveling salesman problem [11], one often encounters the problem of identifying an unknown object while minimizing the number of binary questions posed about that object. In these problems, there is a set Θ = {θ1, ..., θM} of M different objects and a set Q = {q1, ..., qN} of N distinct subsets of Θ known as queries. An unknown object θ is generated from this set Θ with a certain prior probability distribution Π = (π1, ..., πM), i.e., πi = Pr(θ = θi), and the goal is to uniquely identify this unknown object through as few queries from Q as possible, where a query q ∈ Q returns a value 1 if θ ∈ q, and 0 otherwise. For example, in active learning, the objects are classifiers and the queries are the labels for fixed test points. In active diagnosis, objects may correspond to faults, and queries to alarms. This problem has been generically referred to as binary testing or object/entity identification in the literature [5, 12].
We will refer to this problem as object identification. Our attention is restricted to the case where Θ and Q are finite, and the queries are noiseless.

The goal in object identification is to construct an optimal binary decision tree, where each internal node in the tree is associated with a query from Q, and each leaf node corresponds to an object from Θ. Optimality is often with respect to the expected depth of the leaf node corresponding to the unknown object θ. In general the determination of an optimal tree is NP-complete [13]. Hence, various greedy algorithms [5, 14] have been proposed to obtain a suboptimal binary decision tree. A well studied algorithm for this problem is known as the splitting algorithm [5] or generalized binary search (GBS) [1, 2]. This is the greedy algorithm which selects a query that most evenly divides the probability mass of the remaining objects [1, 2, 5, 15].

GBS assumes that the end goal is to rapidly identify individual objects. However, in applications such as disease diagnosis, where Θ is a collection of possible diseases, it may only be necessary to identify the intervention or response to an object, rather than the object itself. In these problems, the object set Θ is partitioned into groups and it is only necessary to identify the group to which the unknown object belongs. We note below that GBS is not necessarily efficient for group identification.

To address this problem, we first present a new interpretation of GBS from a coding-theoretic perspective by viewing the problem of object identification as constrained source coding. Specifically, we present an exact formula for the expected number of queries required to identify an unknown object in terms of the Shannon entropy of the prior distribution Π, and show that GBS is a top-down algorithm that greedily minimizes this cost function.
Then, we extend this framework to the problem of group identification and derive a natural extension of GBS for this problem.

We also extend the coding-theoretic framework to the problem of object (or group) identification where the cost of identifying an object grows exponentially in the number of queries, i.e., the cost of identifying an object using d queries is λ^d for some fixed λ > 1. Applications where such a scenario arises have been discussed earlier in the context of source coding [16], random search trees [17] and design of alphabetic codes [18], for which efficient optimal or greedy algorithms have been presented. In the context of object/group identification, the exponential cost function has certain advantages in terms of avoiding deep trees (which is crucial in time-critical applications) and being more robust to misspecification of the prior probabilities. However, to the best of our knowledge, there does not exist an algorithm that constructs a good suboptimal decision tree for the problem of object/group identification with exponential costs. Once again, we show below that GBS is not necessarily efficient for minimizing the exponential cost function, and propose an improved greedy algorithm that generalizes GBS.

1.1 Notation

We denote an object identification problem by a pair (B, Π), where B is a known M × N binary matrix with bij equal to 1 if θi ∈ qj, and 0 otherwise. A decision tree T constructed on (B, Π) has a query from the set Q at each of its internal nodes, with the leaf nodes terminating in the objects from Θ. For a decision tree with L leaves, the leaf nodes are indexed by the set L = {1, ..., L} and the internal nodes are indexed by the set I = {L + 1, ..., 2L − 1}. At any node 'a', let Qa ⊆ Q denote the set of queries that have been performed along the path from the root node up to that node.
An object θi reaches node 'a' if it agrees with the true θ on all queries in Qa, i.e., the binary values in B for the rows corresponding to θi and θ are the same over the columns corresponding to queries in Qa. At any internal node a ∈ I, let l(a), r(a) denote the "left" and "right" child nodes, and let Θa ⊆ Θ denote the set of objects that reach node 'a'. Thus, the sets Θl(a) ⊆ Θa, Θr(a) ⊆ Θa correspond to the objects in Θa that respond 0 and 1 to the query at node 'a', respectively. We denote by πΘa := Σ_{i: θi ∈ Θa} πi the probability mass of the objects reaching node 'a' in the tree. Finally, we denote the Shannon entropy of a proportion π ∈ [0, 1] by H(π) := −π log2 π − (1 − π) log2(1 − π), and that of a vector Π = (π1, ..., πM) by H(Π) := −Σ_i πi log2 πi, where we use the limit lim_{π→0} π log2 π = 0 to define the value of 0 log2 0.

2 GBS Greedily Minimizes the Expected Number of Queries

We begin by noting that object identification reduces to the standard source coding problem [19] in the special case when Q is complete, meaning, for any S ⊆ Θ there exists a query q ∈ Q such that either q = S or Θ \ q = S. Here, the problem of constructing an optimal binary decision tree is equivalent to constructing optimal variable-length binary prefix codes, for which there exists an efficient optimal algorithm known as the Huffman algorithm [20].
It is also known that the expected length of any binary prefix code (i.e., the expected depth of any binary decision tree) is bounded below by the Shannon entropy of the prior distribution Π [19].

For the problem of object identification, where Q is not complete, the entropy lower bound is still valid, but Huffman coding cannot be implemented. In this case, GBS is a greedy, top-down algorithm that is analogous to Shannon-Fano coding [21, 22]. We now show that GBS is actually greedily minimizing the expected number of queries required to identify an object.

First, we define a parameter called the reduction factor on the binary matrix/tree combination that provides a useful quantification of the expected number of queries required to identify an object.

Definition 1 (Reduction factor). Let T be a decision tree constructed on the pair (B, Π). The reduction factor at any internal node 'a' in the tree is defined by ρa = max{πΘl(a), πΘr(a)}/πΘa. Note that 0.5 ≤ ρa ≤ 1.

Given an object identification problem (B, Π), let T(B, Π) denote the set of decision trees that can uniquely identify all the objects in the set Θ. We assume that the rows of B are distinct so that T(B, Π) ≠ ∅. For any decision tree T ∈ T(B, Π), let {ρa}_{a∈I} denote the set of reduction factors and let di denote the number of queries required to identify object θi in the tree. Then the expected number of queries required to identify an unknown object using a tree (or, the expected depth of a tree) is L1(Π) = Σ_i πi di. Note that the cost function depends on both Π and d = (d1, ..., dM). However, we do not show the dependence on d explicitly.

Theorem 1.
For any T ∈ T(B, Π), the expected number of queries required to identify an unknown object is given by

    L1(Π) = H(Π) + Σ_{a∈I} πΘa [1 − H(ρa)].    (1)

Theorems 1, 2 and 3 are special cases of Theorem 4, whose proof is sketched in the Appendix. Complete proofs are given in the Supplemental Material. Since H(ρa) ≤ 1, this theorem recovers the result that L1(Π) is bounded below by the Shannon entropy H(Π). It presents the exact formula for the gap in this lower bound. It also follows from the above result that a tree attains the entropy bound iff the reduction factors are equal to 0.5 at each internal node in the tree. Using this result, minimizing L1(Π) can be formulated as the following optimization problem:

    min_{T∈T(B,Π)} H(Π) + Σ_{a∈I} πΘa [1 − H(ρa)].    (2)

Since Π is fixed, this optimization problem reduces to minimizing Σ_{a∈I} πΘa [1 − H(ρa)] over T(B, Π). As mentioned earlier, finding a globally optimal solution for this optimization problem is NP-complete [13]. Instead, we may take a top-down approach and minimize the objective function by minimizing the term Ca := πΘa [1 − H(ρa)] at each internal node, starting from the root node. Note that the only term that depends on the query chosen at node 'a' in this cost function is ρa. Hence the algorithm reduces to minimizing ρa (i.e., choosing a split as balanced as possible) at each internal node a ∈ I. In other words, greedy minimization of (2) is equivalent to GBS.
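The greedy construction and the identity of Theorem 1 can be illustrated with a minimal sketch (in Python; the function names `gbs` and `H` and the data layout are ours, not from the paper). The recursion picks, at each node, the query whose split of the remaining probability mass is most balanced, and returns both the expected depth Σ_a πΘa and the gap term Σ_a πΘa [1 − H(ρa)]:

```python
import math

def H(p):
    """Binary Shannon entropy of a proportion p, with 0 log2 0 := 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def gbs(B, pi, rows):
    """Build a GBS tree on the objects indexed by `rows`.

    B is an M x N 0/1 matrix (rows = objects, columns = queries) and
    pi the prior. Returns (expected depth, gap term) summed over the
    internal nodes of the subtree rooted at this node.
    """
    mass = sum(pi[i] for i in rows)
    if len(rows) == 1:
        return 0.0, 0.0  # leaf: object identified
    best = None
    for q in range(len(B[0])):
        left = [i for i in rows if B[i][q] == 0]
        right = [i for i in rows if B[i][q] == 1]
        if left and right:  # query must actually split this node
            rho = max(sum(pi[i] for i in left),
                      sum(pi[i] for i in right)) / mass
            if best is None or rho < best[0]:
                best = (rho, left, right)
    rho, left, right = best
    cost_l, gap_l = gbs(B, pi, left)
    cost_r, gap_r = gbs(B, pi, right)
    # every object at this node incurs one more query, contributing `mass`
    return mass + cost_l + cost_r, mass * (1 - H(rho)) + gap_l + gap_r
```

On a small instance with distinct rows, the returned expected depth equals H(Π) plus the returned gap, as Theorem 1 asserts for any tree in T(B, Π).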
In the next section, we show how this framework can be extended to derive greedy algorithms for the problems of group identification and object identification with exponential costs.

3 Extensions of GBS

3.1 Group Identification

In group identification¹, the goal is not to determine the unknown object θ ∈ Θ, but rather the group to which it belongs, in as few queries as possible. Here, in addition to B and Π, the group labels for the objects are also provided, where the groups are assumed to be disjoint.

We denote a group identification problem by (B, Π, y), where y = (y1, ..., yM) denotes the group labels of the objects, yi ∈ {1, ..., K}. Let {Θk}_{k=1}^K be the partition of Θ, where Θk = {θi ∈ Θ : yi = k}. It is important to note here that the group identification problem cannot be simply reduced to an object identification problem with groups {Θ1, ..., ΘK} as "meta objects," since the objects within a group need not respond the same to each query. For instance, consider the toy example shown in Figure 1, where the objects θ1, θ2 and θ3 belonging to group 1 cannot be collapsed into a single meta object, as these objects respond differently to queries q1 and q3.

In this context, we also note that GBS can fail to produce a good solution for a group identification problem, as it does not take the group labels into consideration while choosing queries. Once again, consider the toy example shown in Figure 1, where query q2 is sufficient to identify the group of an unknown object, whereas GBS requires 2 queries to identify the group when the unknown object is either θ2 or θ4. Here, we propose a natural extension of GBS to the problem of group identification.

¹Golovin et al.
[23] simultaneously studied the problem of group identification in the context of object identification with persistent noise. Their algorithm is an extension of that in [24].

Figure 1: Toy example. The binary matrix B, group labels y and prior Π are:

         q1   q2   q3   group label y     Π
    θ1    0    1    1         1          0.25
    θ2    1    1    0         1          0.25
    θ3    0    1    0         1          0.25
    θ4    1    0    0         2          0.25

Figure 2: Decision tree constructed using GBS. The root query q1 sends θ1 and θ3 (group 1) to a leaf, while θ2 and θ4 require a second query, q2, to separate groups 1 and 2.

Note that when constructing a tree for group identification, a greedy, top-down algorithm terminates splitting when all the objects at the node belong to the same group. Hence, a tree constructed in this fashion can have multiple objects ending in the same leaf node and multiple leaves ending in the same group. For a tree with L leaves, we denote by Lk ⊂ L = {1, ..., L} the set of leaves that terminate in group k. Similar to Θk ⊆ Θ, we denote by Θk_a ⊆ Θa the set of objects belonging to group k that reach node 'a' in a tree. Also, in addition to the reduction factor defined in Section 2, we define a new parameter called the group reduction factor for each group k ∈ {1, ..., K} at each internal node.

Definition 2 (Group reduction factor). Let T be a decision tree constructed on a group identification problem (B, Π, y). The group reduction factor for any group k at an internal node 'a' is defined by ρk_a = max{πΘk_l(a), πΘk_r(a)}/πΘk_a.

Given (B, Π, y), let T(B, Π, y) denote the set of decision trees that can uniquely identify the groups of all objects in the set Θ. For any decision tree T ∈ T(B, Π, y), let dj denote the depth of leaf node j ∈ L. Let random variable X denote the number of queries required to identify the group of an unknown object θ.
Then, the expected number of queries required to identify the group of an unknown object using the given tree is equal to

    L1(Π) = Σ_{k=1}^K Pr(θ ∈ Θk) E[X | θ ∈ Θk] = Σ_{k=1}^K πΘk [ Σ_{j∈Lk} (πΘj/πΘk) dj ].    (3)

Theorem 2. For any T ∈ T(B, Π, y), the expected number of queries required to identify the group of an unknown object is given by

    L1(Π) = H(Πy) + Σ_{a∈I} πΘa [ 1 − H(ρa) + Σ_{k=1}^K (πΘk_a/πΘa) H(ρk_a) ]    (4)

where Πy = (πΘ1, ..., πΘK) denotes the probability distribution of the object groups induced by the labels y and H(·) denotes the Shannon entropy.

Note that the term in the summation in (4) is non-negative. Hence, the above result implies that L1(Π) is bounded below by the Shannon entropy of the probability distribution of the groups. It also follows from this result that this lower bound is achieved iff the reduction factor ρa is equal to 0.5 and the group reduction factors {ρk_a}_{k=1}^K are equal to 1 at every internal node in the tree. Also, note that the result in Theorem 1 is a special case of this result where each group is of size 1, leading to ρk_a = 1 for all groups at every internal node.

Using this result, the problem of finding a decision tree with minimum L1(Π) can be formulated as:

    min_{T∈T(B,Π,y)} Σ_{a∈I} πΘa [ 1 − H(ρa) + Σ_{k=1}^K (πΘk_a/πΘa) H(ρk_a) ].    (5)

This optimization problem, being a generalized version of that in (2), is NP-complete.
Hence, we may take a top-down approach and minimize the objective function greedily by minimizing the term πΘa [ 1 − H(ρa) + Σ_{k=1}^K (πΘk_a/πΘa) H(ρk_a) ] at each internal node, starting from the root node. Note that the terms that depend on the query chosen at node 'a' are ρa and ρk_a. Hence the algorithm reduces to minimizing Ca := 1 − H(ρa) + Σ_{k=1}^K (πΘk_a/πΘa) H(ρk_a) at each internal node 'a'.

Group-GBS (GGBS)
Initialize: L = {root node}, Qroot = ∅
while some a ∈ L has more than one group
    Choose query q* = arg min_{q ∈ Q\Qa} Ca(q)
    Form child nodes l(a), r(a)
    Replace 'a' with l(a), r(a) in L
end
Ca = 1 − H(ρa) + Σ_{k=1}^K (πΘk_a/πΘa) H(ρk_a)

Figure 3: Greedy algorithm for group identification

λ-GBS
Initialize: L = {root node}, Qroot = ∅
while some a ∈ L has more than one object
    Choose query q* = arg min_{q ∈ Q\Qa} Ca(q)
    Form child nodes l(a), r(a)
    Replace 'a' with l(a), r(a) in L
end
Ca = (πΘl(a)/πΘa) Dα(Θl(a)) + (πΘr(a)/πΘa) Dα(Θr(a))

Figure 4: Greedy algorithm for object identification with exponential costs

Note that this objective function consists of two terms: the first term [1 − H(ρa)] favors queries that evenly distribute the probability mass of the objects at node 'a' to its child nodes (regardless of the group), while the term Σ_k (πΘk_a/πΘa) H(ρk_a) favors queries that transfer an entire group of objects to one of its child nodes. This algorithm, which we refer to as Group Generalized Binary Search (GGBS), is summarized in Figure 3.
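The per-node costs Ca from Figures 3 and 4 can be sketched as follows (a minimal Python sketch; the function names and the index-list interface are our own, not from the paper). Each function scores one candidate query at a node by the set of objects it sends to the left child:

```python
import math
from collections import defaultdict

def binary_entropy(p):
    """H(p) = -p log2 p - (1-p) log2(1-p), with 0 log2 0 := 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def ggbs_cost(pi, y, objects, left):
    """GGBS cost C_a = 1 - H(rho_a) + sum_k (mass_k/mass) * H(rho_a^k).

    `objects` indexes the objects reaching node a, `left` those answering
    0 to the candidate query; pi is the prior and y the group labels.
    """
    left = set(left)
    mass = sum(pi[i] for i in objects)
    mass_l = sum(pi[i] for i in objects if i in left)
    cost = 1.0 - binary_entropy(max(mass_l, mass - mass_l) / mass)
    group_mass = defaultdict(float)    # probability mass of group k at node a
    group_mass_l = defaultdict(float)  # mass of group k sent to the left child
    for i in objects:
        group_mass[y[i]] += pi[i]
        if i in left:
            group_mass_l[y[i]] += pi[i]
    for k, m_k in group_mass.items():
        rho_k = max(group_mass_l[k], m_k - group_mass_l[k]) / m_k
        cost += (m_k / mass) * binary_entropy(rho_k)
    return cost

def d_alpha(pi, objects, alpha):
    """D_alpha(Theta_a) = [ sum_i (pi_i / pi_Theta_a)^alpha ]^(1/alpha)."""
    mass = sum(pi[i] for i in objects)
    return sum((pi[i] / mass) ** alpha for i in objects) ** (1.0 / alpha)

def lambda_gbs_cost(pi, objects, left, lam):
    """lambda-GBS cost from Figure 4, with alpha = 1/(1 + log2 lambda)."""
    alpha = 1.0 / (1.0 + math.log2(lam))
    left = set(left)
    l_objs = [i for i in objects if i in left]
    r_objs = [i for i in objects if i not in left]
    mass = sum(pi[i] for i in objects)
    mass_l = sum(pi[i] for i in l_objs)
    return ((mass_l / mass) * d_alpha(pi, l_objs, alpha)
            + ((mass - mass_l) / mass) * d_alpha(pi, r_objs, alpha))
```

On the toy example of Figure 1, `ggbs_cost` scores q2 (which keeps both groups intact) strictly lower than q1, matching the discussion above.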
Finally, as an interesting connection with greedy decision-tree algorithms for multi-class classification, it can be shown that GGBS is equivalent to the decision-tree splitting algorithm used in the C4.5 software package, based on the entropy impurity measure [25].

3.2 Exponential Costs

Now assume the cost of identifying an object is defined by Lλ(Π) := logλ( Σ_i πi λ^{di} ), where λ > 1 and di corresponds to the depth of object θi in a tree. In the limiting cases where λ tends to 1 and ∞, this cost function reduces to the average depth and worst-case depth, respectively. That is,

    L1(Π) = lim_{λ→1} Lλ(Π) = Σ_{i=1}^M πi di,    L∞(Π) := lim_{λ→∞} Lλ(Π) = max_{i∈{1,...,M}} di.

As mentioned in Section 2, GBS is tailored to minimize L1(Π), and hence may not produce a good suboptimal solution for the exponential cost function with λ > 1. Thus, we derive an extension of GBS for the problem of exponential costs. Here, we use a result by Campbell [26] which states that the exponential cost Lλ(Π) of any tree T is bounded below by the α-Rényi entropy, given by Hα(Π) := (1/(1−α)) log2( Σ_i πi^α ), where α = 1/(1 + log2 λ). We consider a general object identification problem and derive an explicit formula for the gap in this lower bound. We then use this formula to derive a family of greedy algorithms that minimize the exponential cost function Lλ(Π) for λ > 1. Note that the entropy bound reduces to the Shannon entropy H(Π) and log2 M, in the limiting cases where λ tends to 1 and ∞, respectively.

Theorem 3.
For any λ > 1 and any T ∈ T(B, Π), the exponential cost Lλ(Π) is given by

    λ^{Lλ(Π)} = λ^{Hα(Π)} + Σ_{a∈I} πΘa [ (λ−1) λ^{da} − Dα(Θa) + (πΘl(a)/πΘa) Dα(Θl(a)) + (πΘr(a)/πΘa) Dα(Θr(a)) ]

where da denotes the depth of any internal node 'a' in the tree, Θa denotes the set of objects that reach node 'a', πΘa = Σ_{i: θi ∈ Θa} πi, α = 1/(1 + log2 λ) and Dα(Θa) := [ Σ_{i: θi ∈ Θa} (πi/πΘa)^α ]^{1/α}.

The term in the summation over internal nodes I in the above result corresponds to the gap in Campbell's lower bound. This result suggests a top-down greedy approach to minimize Lλ(Π), which is to minimize the term (λ−1) λ^{da} − Dα(Θa) + (πΘl(a)/πΘa) Dα(Θl(a)) + (πΘr(a)/πΘa) Dα(Θr(a)) at each internal node, starting from the root node. Noting that the terms that depend on the query chosen at node 'a' are πΘl(a), πΘr(a), Dα(Θl(a)) and Dα(Θr(a)), this reduces to minimizing Ca := (πΘl(a)/πΘa) Dα(Θl(a)) + (πΘr(a)/πΘa) Dα(Θr(a)) at each internal node. This algorithm, which we refer to as λ-GBS, is summarized in Figure 4. Also, it can be shown by the application of L'Hôpital's rule that in the limiting case where λ → 1, λ-GBS reduces to GBS, and in the case where λ → ∞, λ-GBS reduces to GBS with uniform prior πi = 1/M.
The latter algorithm is GBS but with the true prior Π replaced by a uniform distribution.

3.3 Group Identification with Exponential Costs

Finally, we complete our discussion by considering the problem of group identification with exponential costs. Here, the cost of identifying the group of an object given a tree T ∈ T(B, Π, y) is defined to be Lλ(Π) = logλ( Σ_{j∈L} πΘj λ^{dj} ), which reduces to (3) in the limiting case as λ → 1, and to max_{j∈L} dj, i.e., the worst-case depth of the tree, in the case where λ → ∞.

Theorem 4. For any λ > 1 and any T ∈ T(B, Π, y), the exponential cost Lλ(Π) of identifying the group of an object is given by

    λ^{Lλ(Π)} = λ^{Hα(Πy)} + Σ_{a∈I} πΘa [ (λ−1) λ^{da} − Dα(Θa) + (πΘl(a)/πΘa) Dα(Θl(a)) + (πΘr(a)/πΘa) Dα(Θr(a)) ]

where Πy = (πΘ1, ..., πΘK) denotes the probability distribution of the object groups induced by the labels y, Dα(Θa) := [ Σ_{k=1}^K (πΘk_a/πΘa)^α ]^{1/α}, with α = 1/(1 + log2 λ).

Note that the definition of Dα(Θa) in this theorem is a generalization of that in Theorem 3.
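As a numerical illustration of these exponential costs (a Python sketch; the helper names are ours, and the leaf masses and depths below are read off the toy example of Figures 1 and 2), one can check that the tree using q2 alone beats the GBS tree of Figure 2 under Lλ, and that both respect Campbell's Rényi-entropy lower bound:

```python
import math

def exp_cost(leaf_masses, depths, lam):
    """Exponential cost L_lambda = log_lambda( sum_j pi_Theta_j * lam^d_j )."""
    return math.log(sum(p * lam ** d for p, d in zip(leaf_masses, depths)), lam)

def renyi_entropy(masses, lam):
    """alpha-Renyi entropy of a distribution, with alpha = 1/(1 + log2 lambda)."""
    alpha = 1.0 / (1.0 + math.log2(lam))
    return math.log2(sum(p ** alpha for p in masses)) / (1.0 - alpha)

# Toy example of Figure 1 (uniform prior; groups of mass 0.75 and 0.25).
# Tree asking q2 at the root: both group-leaves sit at depth 1.
# GBS tree of Figure 2: leaf {th1, th3} at depth 1, th2 and th4 at depth 2.
lam = 2.0
cost_q2_tree = exp_cost([0.25, 0.75], [1, 1], lam)       # = 1.0
cost_gbs_tree = exp_cost([0.5, 0.25, 0.25], [1, 2, 2], lam)  # = log2(3)
bound = renyi_entropy([0.75, 0.25], lam)                 # Campbell lower bound
```

Here the q2-only tree attains cost 1, the GBS tree costs log2 3 ≈ 1.585, and both lie above the Rényi bound (≈ 0.90 at λ = 2).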
As mentioned earlier, Theorems 1-3 are special cases of the above theorem, where Theorem 2 follows as λ → 1 and Theorem 1 follows when, in addition, each group is of size one. This result also implies a top-down, greedy algorithm to minimize Lλ(Π), which is to choose a query that minimizes Ca := (πΘl(a)/πΘa) Dα(Θl(a)) + (πΘr(a)/πΘa) Dα(Θr(a)) at each internal node. Once again, it can be shown by the application of L'Hôpital's rule that in the limiting case where λ → 1, this reduces to GGBS, and in the case where λ → ∞, this reduces to choosing a query that minimizes the maximum number of groups in the child nodes [27].

4 Performance of the Greedy Algorithms

We compare the performance of the proposed algorithms to that of GBS on synthetic data generated using different random data models.

4.1 Group Identification

For fixed M = |Θ| and N = |Q|, we consider a random data model where each query q ∈ Q is associated with a pair of parameters (γw(q), γb(q)) ∈ [0.5, 1]². Here, γw(q) reflects the correlation of the object responses within a group, and γb(q) captures the correlation of object responses between groups. When γw(q) is close to 0.5, each object within a group is equally likely to exhibit 0 or 1 as its response to query q, whereas, when it is close to 1, most of the objects within a group are highly likely to exhibit the same query response.
Similarly, when γb(q) is close to 0.5, each group is equally likely to exhibit 0 or 1 as its response to the query, where a group response corresponds to the majority vote of the object responses within a group, while, as γb(q) tends to 1, most of the groups are highly likely to exhibit the same response.

Figure 5: Beta distribution over the range [0.5, 1] for different values of β when α = 1

Figure 6: Expected number of queries required to identify the group of an object using GBS and GGBS

Figure 7: Exponential cost incurred in identifying an object using GBS and λ-GBS

Given these correlation values (γw(q), γb(q)) for a query q, the object responses to query q (i.e., the binary column of 0's and 1's corresponding to query q in B) are generated as follows:
1. Flip a fair coin to generate a Bernoulli random variable, x
2. For each group k ∈ {1, ..., K}, assign a binary label bk, where bk = x with probability γb(q)
3. For each object in group k, assign bk as the object response to q with probability γw(q)
Given the correlation parameters (γw(q), γb(q)), ∀q ∈ Q, a random dataset can be created by following the above procedure for each query.

We compare the performances of GBS and GGBS on random datasets generated using the above model. We demonstrate the results on datasets of size N = 200 (# of queries) and M = 400 (# of objects), where we randomly partitioned the objects into 15 groups and assumed a uniform prior on the objects. For each dataset, the correlation parameters are drawn from independent beta distributions over the range [0.5, 1], i.e., γw(q) ∼ Beta(1, βw) and γb(q) ∼ Beta(1, βb), where βw, βb ∈ {0.5, 0.75, 0.95, 1, 2, 4, 8}. Figure 5 shows the density function (pdf) of Beta(1, β) for different values of β.
Note that β = 1 corresponds to a uniform distribution, while for β < 1 the distribution is right-skewed and for β > 1 it is left-skewed.

Figure 6 compares the mean value of the cost function L1(Π) for GBS and GGBS over 100 randomly generated datasets, for each value of (βw, βb). This shows the improved performance of GGBS over GBS in group identification. In particular, note that GGBS achieves performance close to the entropy bound as βw decreases. This is due to the increased number of queries with γw(q) close to 1 in the dataset. As the correlation parameter γw(q) tends to 1, choosing that query keeps the groups intact, i.e., the group reduction factors ρk_a tend to 1 for these queries. Such queries offer significant gains in group identification, but can be overlooked by GBS.

4.2 Object Identification with Exponential Costs

We consider the same random data model as above, where we set K = M, i.e., each group is comprised of one object. Thus, the only correlation parameter that determines the structure of the dataset is γb(q), q ∈ Q. Figure 7 demonstrates the improved performance of λ-GBS over standard GBS, and GBS with uniform prior, over a range of λ values, for a dataset generated using the above random data model with γb(q) ∼ Beta(1, 1) = unif[0.5, 1]. Each curve in the figure corresponds to the average value of the cost function Lλ(Π) as a function of λ over 100 repetitions. In each repetition, the prior is generated according to Zipf's law, i.e., πj = j^{−δ} / Σ_{i=1}^M i^{−δ}, for j = 1, ..., M and δ ≥ 0, after randomly permuting the objects. Note that in the special case when δ = 0, this reduces to the uniform distribution, and as δ increases, it tends to a skewed distribution with most of the probability mass concentrated on a few objects.

Similar experiments have been performed on datasets generated using γb(q) ∼ Beta(α, β) for different values of α, β.
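The three-step response-generation procedure above can be sketched as follows (a minimal Python sketch; the function names and interface are ours, not from the paper):

```python
import random

def generate_query_column(group_sizes, gamma_w, gamma_b, rng=random):
    """Generate one binary column of B under the random data model.

    group_sizes: number of objects in each group.
    gamma_w, gamma_b: within-group and between-group correlation
    parameters in [0.5, 1] for this query.
    """
    x = rng.randint(0, 1)  # step 1: fair coin
    column = []
    for size in group_sizes:
        # step 2: group label b_k agrees with x with probability gamma_b
        b_k = x if rng.random() < gamma_b else 1 - x
        for _ in range(size):
            # step 3: object response agrees with b_k with prob. gamma_w
            column.append(b_k if rng.random() < gamma_w else 1 - b_k)
    return column

def generate_dataset(group_sizes, gammas, rng=random):
    """Full M x N matrix B, one column per query's (gamma_w, gamma_b) pair."""
    cols = [generate_query_column(group_sizes, gw, gb, rng) for gw, gb in gammas]
    return [list(row) for row in zip(*cols)]  # transpose to objects x queries
```

In the degenerate case γw(q) = γb(q) = 1, every object copies the coin flip x, so each generated column is constant, matching the model's interpretation of the correlation parameters.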
In all our experiments, we observed λ-GBS to consistently perform better than both standard GBS and GBS with a uniform prior. In addition, the performance of λ-GBS has been observed to be very close to the entropy bound. Finally, Figure 7 also reflects that λ-GBS converges to GBS as λ → 1, and to GBS with uniform prior as λ → ∞.

5 Conclusions

In this paper, we show that generalized binary search (GBS) is a top-down algorithm that greedily minimizes the expected number of queries required to identify an object. We then use this interpretation to extend GBS in two ways. First, we consider the case where the objects are partitioned into groups, and the goal is to identify only the group of the unknown object. Second, we consider the problem where the cost of identifying an object grows exponentially in the number of queries. The algorithms are derived in a common framework. In particular, we prove exact formulas for the cost function in each case that close the gap between previously known lower bounds related to Shannon and Rényi entropy. These exact formulas are then optimized in a greedy, top-down manner to construct a decision tree. We demonstrate the improved performance of the proposed algorithms over GBS through simulations. An important open question and the direction of our future work is to relate these greedy algorithms to the global optimizers of their respective cost functions.

Acknowledgements

G. Bellala and C. Scott were supported in part by NSF Awards No. 0830490 and 0953135. S. Bhavnani was supported in part by CDC/NIOSH grant No.
R21OH009441.

6 Appendix: Proof Sketch for Theorem 4

Define two new functions $\widetilde{L}_\lambda$ and $\widetilde{H}_\alpha$ as
$$\widetilde{L}_\lambda := \sum_{j \in L} \pi_{\Theta_j} \left[ \sum_{h=0}^{d_j - 1} \lambda^h \right] = \sum_{j \in L} \pi_{\Theta_j} \, \frac{\lambda^{d_j} - 1}{\lambda - 1} \qquad \text{and} \qquad \widetilde{H}_\alpha := \left( \sum_{k=1}^{K} \pi_{\Theta_k}^{\alpha} \right)^{\frac{1}{\alpha}} - 1,$$
where $\widetilde{L}_\lambda$ is related to the cost function $L_\lambda(\Pi)$ as $\lambda^{L_\lambda(\Pi)} = (\lambda - 1)\widetilde{L}_\lambda + 1$, and $\widetilde{H}_\alpha$ is related to the $\alpha$-Rényi entropy $H_\alpha(\Pi_y)$ as
$$H_\alpha(\Pi_y) = \frac{1}{1 - \alpha} \log_2 \left( \sum_{k=1}^{K} \pi_{\Theta_k}^{\alpha} \right) = \frac{1}{\alpha \log_2 \lambda} \log_2 \left( \sum_{k=1}^{K} \pi_{\Theta_k}^{\alpha} \right) = \log_\lambda \left( \sum_{k=1}^{K} \pi_{\Theta_k}^{\alpha} \right)^{\frac{1}{\alpha}} \qquad (6a)$$
$$\Longrightarrow \quad \lambda^{H_\alpha(\Pi_y)} = \left( \sum_{k=1}^{K} \pi_{\Theta_k}^{\alpha} \right)^{\frac{1}{\alpha}} = \widetilde{H}_\alpha + 1, \qquad (6b)$$
where we use the definition of $\alpha$, i.e., $\alpha = \frac{1}{1 + \log_2 \lambda}$, in (6a). Now, we note from Lemma 1 that
$$\widetilde{L}_\lambda = \sum_{a \in I} \lambda^{d_a} \pi_{\Theta_a} \quad \Longrightarrow \quad \lambda^{L_\lambda(\Pi)} = 1 + \sum_{a \in I} (\lambda - 1) \lambda^{d_a} \pi_{\Theta_a}, \qquad (7)$$
where $d_a$ denotes the depth of internal node $a$ in the tree $T$. Similarly, we note from (6b) and Lemma 2 that
$$\lambda^{H_\alpha(\Pi_y)} = 1 + \sum_{a \in I} \left[ \pi_{\Theta_a} D_\alpha(\Theta_a) - \pi_{\Theta_{l(a)}} D_\alpha(\Theta_{l(a)}) - \pi_{\Theta_{r(a)}} D_\alpha(\Theta_{r(a)}) \right]. \qquad (8)$$
Finally, the result follows from (7) and (8) above.

Lemma 1.
The function $\widetilde{L}_\lambda$ can be decomposed over the internal nodes in a tree $T$, as $\widetilde{L}_\lambda = \sum_{a \in I} \lambda^{d_a} \pi_{\Theta_a}$, where $d_a$ denotes the depth of internal node $a \in I$ and $\pi_{\Theta_a}$ is the probability mass of the objects at that node.

Lemma 2. The function $\widetilde{H}_\alpha$ can be decomposed over the internal nodes in a tree $T$, as
$$\widetilde{H}_\alpha = \sum_{a \in I} \left[ \pi_{\Theta_a} D_\alpha(\Theta_a) - \pi_{\Theta_{l(a)}} D_\alpha(\Theta_{l(a)}) - \pi_{\Theta_{r(a)}} D_\alpha(\Theta_{r(a)}) \right],$$
where $D_\alpha(\Theta_a) := \left[ \sum_{k=1}^{K} \left( \pi_{\Theta_k^a} / \pi_{\Theta_a} \right)^{\alpha} \right]^{\frac{1}{\alpha}}$ and $\pi_{\Theta_a}$ denotes the probability mass of the objects at any internal node $a \in I$.

The above two lemmas can be proved using induction over subtrees rooted at any internal node $a$ in the tree. The details may be found in the Supplemental Material.

References
[1] S. Dasgupta, "Analysis of a greedy active learning strategy," Advances in Neural Information Processing Systems, 2004.
[2] R. Nowak, "Generalized binary search," Proceedings of the 46th Allerton Conference on Communications, Control and Computing, pp. 568–574, 2008.
[3] ——, "Noisy generalized binary search," Advances in Neural Information Processing Systems, vol. 22, pp. 1366–1374, 2009.
[4] D. Golovin and A. Krause, "Adaptive submodularity: A new approach to active learning and stochastic optimization," Proceedings of the International Conference on Learning Theory (COLT), 2010.
[5] D. W. Loveland, "Performance bounds for binary testing with arbitrary weights," Acta Informatica, 1985.
[6] F. Yu, F. Tu, H. Tu, and K.
Pattipati, "Multiple disease (fault) diagnosis with applications to the QMR-DT problem," Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, vol. 2, pp. 1187–1192, October 2003.
[7] J. Shiozaki, H. Matsuyama, E. O'Shima, and M. Iri, "An improved algorithm for diagnosis of system failures in the chemical process," Computational Chemical Engineering, vol. 9, no. 3, pp. 285–293, 1985.
[8] S. Bhavnani, A. Abraham, C. Demeniuk, M. Gebrekristos, A. Gong, S. Nainwal, G. Vallabha, and R. Richardson, "Network analysis of toxic chemicals and symptoms: Implications for designing first-responder systems," Proceedings of the American Medical Informatics Association, 2007.
[9] D. Geman and B. Jedynak, "An active testing model for tracking roads in satellite images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 1, pp. 1–14, 1996.
[10] M. J. Swain and M. A. Stricker, "Promising directions in active vision," International Journal of Computer Vision, vol. 11, no. 2, pp. 109–126, 1993.
[11] A. Gupta, R. Krishnaswamy, V. Nagarajan, and R. Ravi, "Approximation algorithms for optimal decision trees and adaptive TSP problems," 2010, available online at arXiv.org:1003.0722.
[12] M. Garey, "Optimal binary identification procedures," SIAM Journal on Applied Mathematics, vol. 23, no. 2, pp. 173–186, 1972.
[13] L. Hyafil and R. Rivest, "Constructing optimal binary decision trees is NP-complete," Information Processing Letters, vol. 5, no. 1, pp. 15–17, 1976.
[14] S. R. Kosaraju, T. M. Przytycka, and R. S. Borgstrom, "On an optimal split tree problem," Proceedings of the 6th International Workshop on Algorithms and Data Structures (WADS), pp. 11–14, 1999.
[15] R. M. Goodman and P.
Smyth, "Decision tree design from a communication theory standpoint," IEEE Transactions on Information Theory, vol. 34, no. 5, 1988.
[16] P. A. Humblet, "Generalization of Huffman coding to minimize the probability of buffer overflow," IEEE Transactions on Information Theory, vol. IT-27, no. 2, pp. 230–232, March 1981.
[17] F. Schulz, "Trees with exponentially growing costs," Information and Computation, vol. 206, 2008.
[18] M. B. Baer, "Rényi to Rényi – source coding under siege," Proceedings of the IEEE International Symposium on Information Theory, pp. 1258–1262, July 2006.
[19] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley, 1991.
[20] D. A. Huffman, "A method for the construction of minimum-redundancy codes," Proceedings of the Institute of Radio Engineers, 1952.
[21] C. E. Shannon, "A mathematical theory of communication," Bell Systems Technical Journal, vol. 27, pp. 379–423, July 1948.
[22] R. M. Fano, Transmission of Information. MIT Press, 1961.
[23] D. Golovin, D. Ray, and A. Krause, "Near-optimal Bayesian active learning with noisy observations," to appear in the Proceedings of Neural Information Processing Systems (NIPS), 2010.
[24] S. Dasgupta, "Coarse sample complexity bounds for active learning," Advances in Neural Information Processing Systems, 2006.
[25] G. Bellala, S. Bhavnani, and C. Scott, "Group-based query learning for rapid diagnosis in time-critical situations," Tech. Rep., 2009, available online at arXiv.org:0911.4511.
[26] L. L. Campbell, "A coding problem and Rényi's entropy," Information and Control, vol. 8, no. 4, pp. 423–429, August 1965.
[27] G. Bellala, S. Bhavnani, and C. Scott, "Query learning with exponential query costs," Tech.
Rep., 2010, available online at arXiv.org:1002.4019.