{"title": "Active Learning by Querying Informative and Representative Examples", "book": "Advances in Neural Information Processing Systems", "page_first": 892, "page_last": 900, "abstract": "Most active learning approaches select either informative or representative unlabeled instances to query their labels. Although several active learning algorithms have been proposed to combine the two criteria for query selection, they are usually ad hoc in finding unlabeled instances that are both informative and representative. We address this challenge by a principled approach, termed QUIRE, based on the min-max view of active learning. The proposed approach provides a systematic way for measuring and combining the informativeness and representativeness of an instance. Extensive experimental results show that the proposed QUIRE approach outperforms several state-of-the-art active learning approaches.", "full_text": "Active Learning by Querying Informative and Representative Examples

Sheng-Jun Huang¹   Rong Jin²   Zhi-Hua Zhou¹

¹National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China
²Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824

{huangsj, zhouzh}@lamda.nju.edu.cn
rongjin@cse.msu.edu

Abstract

Most active learning approaches select either informative or representative unlabeled instances to query their labels. Although several active learning algorithms have been proposed to combine the two criteria for query selection, they are usually ad hoc in finding unlabeled instances that are both informative and representative. We address this challenge by a principled approach, termed QUIRE, based on the min-max view of active learning. The proposed approach provides a systematic way for measuring and combining the informativeness and representativeness of an instance.
Extensive experimental results show that the proposed QUIRE approach outperforms several state-of-the-art active learning approaches.

1 Introduction

In this work, we focus on pool-based active learning, which selects an unlabeled instance from a given pool for manual labeling. Two main criteria, informativeness and representativeness, are widely used for active query selection. Informativeness measures the ability of an instance to reduce the uncertainty of a statistical model, while representativeness measures whether an instance represents the overall input patterns of the unlabeled data well [16]. Most active learning algorithms deploy only one of the two criteria for query selection, which can significantly limit their performance: approaches favoring informative instances usually do not exploit the structure of the unlabeled data, leading to serious sample bias and consequently undesirable performance; approaches favoring representative instances may require querying a relatively large number of instances before the optimal decision boundary is found. Although several active learning algorithms [19, 8, 11] have been proposed to find unlabeled instances that are both informative and representative, they are usually ad hoc in measuring the informativeness and representativeness of an instance, leading to suboptimal performance.

In this paper, we propose a new active learning approach by QUerying Informative and Representative Examples (QUIRE for short). The proposed approach is based on the min-max view of active learning [11], which provides a systematic way for measuring and combining informativeness and representativeness.
A key feature of the proposed approach is that it measures both the informativeness and the representativeness of an instance by its prediction uncertainty: the informativeness of an instance x is measured by its prediction uncertainty based on the labeled data, while the representativeness of x is measured by its prediction uncertainty based on the unlabeled data.

The rest of this paper is organized as follows: Section 2 reviews related work on active learning; Section 3 presents the proposed approach in detail; Section 4 reports experimental results; and Section 5 concludes this work with issues to be addressed in the future.

Figure 1: An illustrative example for selecting informative and representative instances: (a) a binary classification problem; (b) an approach favoring informative instances; (c) an approach favoring representative instances; (d) our approach.

2 Related Work

Querying the most informative instances is probably the most popular approach to active learning. Exemplar approaches include query-by-committee [17, 6, 10], uncertainty sampling [13, 12, 18, 2] and optimal experimental design [9, 20]. The main weakness of these approaches is that they are unable to exploit the abundance of unlabeled data: the selection of query instances is determined solely by a small number of labeled examples, which makes it prone to sample bias. Another school of active learning selects the instances that are most representative of the unlabeled data. These approaches aim to exploit the cluster structure of unlabeled data [14, 7], usually by a clustering method. The main weakness of these approaches is that their performance heavily depends on the quality of the clustering results [7].

Several active learning algorithms have tried to combine the informativeness measure with the representativeness measure to find the optimal query instances.
In [19], the authors propose a sampling algorithm that exploits both the cluster information and the classification margins of unlabeled instances. One limitation of this approach is that, since clustering is performed only on the instances within the classification margin, it is unable to exploit the unlabeled instances outside the margin. In [8], Donmez et al. extended the active learning approach in [14] by dynamically balancing the uncertainty and the density of instances for query selection. This approach is ad hoc in combining the measures of informativeness and representativeness, leading to suboptimal performance.

Our work is based on the min-max view of active learning, which was first proposed in the study of batch mode active learning [11]. Unlike [11], which measures the representativeness of an instance by its similarity to the remaining unlabeled instances, our measure of representativeness takes into account the cluster structure of the unlabeled instances as well as the class assignments of the labeled examples, leading to a better selection of unlabeled instances for active learning.

3 QUIRE: QUery Informative and Representative Examples

We start with a synthetic example that illustrates the importance of querying instances that are both informative and representative. Figure 1(a) shows a binary classification problem with each class represented by a different marker. We examine three active learning algorithms by allowing each of them to sequentially select 15 data points. Figures 1(b) and 1(c) show the data points selected by an approach favoring informative instances (i.e., [18]) and by an approach favoring representative instances (i.e., [7]), respectively.
As Figure 1(b) shows, due to sample bias, the approach preferring informative instances tends to choose data points close to the horizontal line, leading to an incorrect decision boundary. On the other hand, as Figure 1(c) shows, the approach preferring representative instances is able to identify an approximately correct decision boundary, but it converges slowly. Figure 1(d) shows the data points selected by the proposed approach, which favors data points that are both informative and representative. In this example, the proposed algorithm finds an accurate decision boundary more efficiently than the other two approaches.

We denote by $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_{n_l}, y_{n_l}), x_{n_l+1}, \dots, x_n\}$ the training data set, which consists of $n_l$ labeled instances and $n_u = n - n_l$ unlabeled instances, where each instance $x_i = [x_{i1}, x_{i2}, \dots, x_{id}]^\top$ is a $d$-dimensional vector and $y_i \in \{-1, +1\}$ is the class label of $x_i$.

Active learning selects one instance $x_s$ from the pool of unlabeled data to query its class label. For convenience, we divide the data set $D$ into three parts: the labeled data $D_l$, the currently selected instance $x_s$, and the rest of the unlabeled data $D_u$. We also use $D_a = D_u \cup \{x_s\}$ to represent all the unlabeled instances. We use $y = [y_l, y_s, y_u]$ for the class label assignment of the entire data set, where $y_l$, $y_s$ and $y_u$ are the class labels assigned to $D_l$, $x_s$ and $D_u$, respectively. Finally, we denote by $y_a = [y_s, y_u]$ the class assignment for all the unlabeled instances.

3.1 The Framework

To motivate the proposed approach, we first re-examine margin-based active learning from the min-max viewpoint [11].
Let $f^*$ be a classification model trained on the labeled examples, i.e.,

$$f^* = \arg\min_{f \in \mathcal{H}} \; \frac{\lambda}{2}\|f\|_{\mathcal{H}}^2 + \sum_{i=1}^{n_l} \ell(y_i, f(x_i)), \tag{1}$$

where $\mathcal{H}$ is a reproducing kernel Hilbert space endowed with kernel function $\kappa(\cdot, \cdot): \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$, and $\ell(\cdot, \cdot)$ is the loss function. Given the classifier $f^*$, the margin-based approach chooses the unlabeled instance closest to the decision boundary, i.e.,

$$s^* = \arg\min_{n_l < s \le n} |f^*(x_s)|. \tag{2}$$

It is shown in the supplementary document that this criterion can be approximated by

$$s^* = \arg\min_{n_l < s \le n} \mathcal{L}(D_l, x_s), \tag{3}$$

where

$$\mathcal{L}(D_l, x_s) = \max_{y_s = \pm 1} \min_{f \in \mathcal{H}} \; \frac{\lambda}{2}\|f\|_{\mathcal{H}}^2 + \sum_{i=1}^{n_l} \ell(y_i, f(x_i)) + \ell(y_s, f(x_s)). \tag{4}$$

We can also write Eq. (3) in a minimax form, $\min_{n_l < s \le n} \mathcal{A}(D_l, x_s)$, where

$$\mathcal{A}(D_l, x_s) = \min_{f \in \mathcal{H}} \max_{y_s = \pm 1} \; \frac{\lambda}{2}\|f\|_{\mathcal{H}}^2 + \sum_{i=1}^{n_l} \ell(y_i, f(x_i)) + \ell(y_s, f(x_s)).$$

This min-max view guarantees that the selected instance $x_s$ leads to a small value of the objective function regardless of its class label $y_s$. To select queries that are both informative and representative, we extend the evaluation function $\mathcal{L}(D_l, x_s)$ to include all the unlabeled data. Hypothetically, if we knew the class assignment $y_u$ for the remaining unlabeled instances in $D_u$, the evaluation function could be modified as

$$\mathcal{L}(D_l, D_u, y_u, x_s) = \max_{y_s = \pm 1} \min_{f \in \mathcal{H}} \; \frac{\lambda}{2}\|f\|_{\mathcal{H}}^2 + \sum_{i=1}^{n} \ell(y_i, f(x_i)). \tag{5}$$

The problem is that the class assignment $y_u$ is unknown. According to the manifold assumption [3], we expect a good solution for $y_u$ to result in a small value of $\mathcal{L}(D_l, D_u, y_u, x_s)$.
We therefore approximate the solution for $y_u$ by minimizing $\mathcal{L}(D_l, D_u, y_u, x_s)$, which leads to the following evaluation function for query selection:

$$\widehat{\mathcal{L}}(D_l, D_u, x_s) = \min_{y_u \in \{\pm 1\}^{n_u - 1}} \mathcal{L}(D_l, D_u, y_u, x_s) = \min_{y_u \in \{\pm 1\}^{n_u - 1}} \max_{y_s = \pm 1} \min_{f \in \mathcal{H}} \; \frac{\lambda}{2}\|f\|_{\mathcal{H}}^2 + \sum_{i=1}^{n} \ell(y_i, f(x_i)). \tag{6}$$

3.2 The Solution

For computational simplicity, in the rest of this work we choose the quadratic loss function $\ell(y, \hat{y}) = (y - \hat{y})^2/2$.¹ It is straightforward to show that

$$\min_{f \in \mathcal{H}} \; \frac{\lambda}{2}\|f\|_{\mathcal{H}}^2 + \frac{1}{2}\sum_{i=1}^{n} (y_i - f(x_i))^2 = \frac{\lambda}{2}\, y^\top L y,$$

where $L = (K + \lambda I)^{-1}$ and $K = [\kappa(x_i, x_j)]_{n \times n}$ is the $n \times n$ kernel matrix. Since the positive factor $\lambda/2$ does not change the min-max selection, the evaluation function $\widehat{\mathcal{L}}(D_l, D_u, x_s)$ simplifies to

$$\widehat{\mathcal{L}}(D_l, D_u, x_s) = \min_{y_u \in \{-1, +1\}^{n_u - 1}} \; \max_{y_s \in \{-1, +1\}} \; y^\top L y. \tag{7}$$

¹Although the quadratic loss may not be ideal for classification, it yields classification results competitive with other loss functions such as the hinge loss [15].

Our goal is to compute this quantity efficiently for each unlabeled instance. For convenience of presentation, we use the subscript $u$ for the rows/columns of a matrix $M$ that correspond to the unlabeled instances in $D_u$, the subscript $l$ for the rows/columns of $M$ that correspond to the labeled instances in $D_l$, and the subscript $s$ for the row/column of $M$ that corresponds to the selected instance. We also use the subscript $a$ for the rows/columns of $M$ that correspond to all the unlabeled instances (i.e., $D_u \cup \{x_s\}$). With these conventions, and using $y_s^2 = 1$, we rewrite the objective $y^\top L y$ as

$$y^\top L y = y_l^\top L_{l,l}\, y_l + L_{s,s} + y_u^\top L_{u,u}\, y_u + 2 y_u^\top (L_{u,l}\, y_l + L_{u,s}\, y_s) + 2 y_s\, y_l^\top L_{l,s}.$$

Since this objective is concave (linear) in $y_s$ and convex (quadratic) in $y_u$, we can switch the minimization over $y_u$ with the maximization over $y_s$ in (7).
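The quadratic-loss identity and the block expansion of $y^\top L y$ can be checked numerically. The NumPy sketch below is an illustration, not the authors' implementation; the random data, the RBF width `gamma`, and the split into labeled, selected, and unlabeled indices are arbitrary assumptions. By the representer theorem the minimizer is $f = \sum_j \alpha_j \kappa(x_j, \cdot)$ with $\alpha = (K + \lambda I)^{-1} y = L y$, so the sketch compares the objective at this $\alpha$ (and at perturbed coefficients) with $\frac{\lambda}{2} y^\top L y$, which equals $\frac{1}{2} y^\top L y$ for the setting $\lambda = 1$ used in the experiments.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian (RBF) kernel; gamma is an illustrative choice, not a value from the paper
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def ridge_objective(K, y, alpha, lam):
    # (lam/2)||f||_H^2 + (1/2) sum_i (y_i - f(x_i))^2 for f = sum_j alpha_j k(x_j, .)
    f = K @ alpha
    return 0.5 * lam * alpha @ K @ alpha + 0.5 * np.sum((y - f) ** 2)

rng = np.random.default_rng(0)
n, lam = 8, 1.0
X = rng.normal(size=(n, 2))
y = rng.choice([-1.0, 1.0], size=n)
K = rbf_kernel(X, X)
L = np.linalg.inv(K + lam * np.eye(n))

# Closed form of the minimum, attained at alpha = L y
closed_form = 0.5 * lam * y @ L @ y
at_optimum = ridge_objective(K, y, L @ y, lam)
# Perturbing alpha can only increase the (convex) objective
perturbed = [ridge_objective(K, y, L @ y + 0.1 * rng.normal(size=n), lam) for _ in range(5)]

# Block expansion with y = [y_l, y_s, y_u]: 3 labeled, index 3 selected, the rest unlabeled
l, s, u = np.arange(3), 3, np.arange(4, n)
yl, ys, yu = y[l], y[s], y[u]
expanded = (yl @ L[np.ix_(l, l)] @ yl + L[s, s] + yu @ L[np.ix_(u, u)] @ yu
            + 2 * yu @ (L[np.ix_(u, l)] @ yl + L[u, s] * ys)
            + 2 * ys * yl @ L[l, s])
print(at_optimum, closed_form, y @ L @ y, expanded)
```

The expansion only holds because $y_s^2 = 1$; for a relaxed $y_s$ the $L_{s,s}$ term would carry a factor $y_s^2$.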
By relaxing $y_u$ to continuous variables, the solution to $\min_{y_u} y^\top L y$ is given by

$$\hat{y}_u = -L_{u,u}^{-1}(L_{u,l}\, y_l + L_{u,s}\, y_s), \tag{8}$$

leading to the following expression for the evaluation function $\widehat{\mathcal{L}}(D_l, D_u, x_s)$:

$$\begin{aligned}
\widehat{\mathcal{L}}(D_l, D_u, x_s) &= L_{s,s} + y_l^\top L_{l,l}\, y_l + \max_{y_s = \pm 1} \Big\{ 2 y_s L_{s,l}\, y_l - (L_{u,l}\, y_l + L_{u,s}\, y_s)^\top L_{u,u}^{-1} (L_{u,l}\, y_l + L_{u,s}\, y_s) \Big\} \\
&\propto L_{s,s} - \frac{\det(L_{a,a})}{L_{s,s}} + 2\left| \left(L_{s,l} - L_{s,u} L_{u,u}^{-1} L_{u,l}\right) y_l \right|,
\end{aligned} \tag{9}$$

where the last step follows from the relation

$$\det\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} = \det(A_{22})\, \det\!\left(A_{11} - A_{12} A_{22}^{-1} A_{21}\right).$$

Note that although $y_u$ is relaxed to real numbers, our empirical studies show that in most cases the entries of $\hat{y}_u$ fall between $-1$ and $+1$.

Remark. The evaluation function $\widehat{\mathcal{L}}(D_l, D_u, x_s)$ essentially consists of two components: $L_{s,s} - \det(L_{a,a})/L_{s,s}$ and $|(L_{s,l} - L_{s,u} L_{u,u}^{-1} L_{u,l})\, y_l|$. Minimizing the first component is equivalent to minimizing $L_{s,s}$, because $L_{a,a}$ is independent of the selected instance $x_s$ and $t - c/t$ is increasing in $t > 0$ for $c = \det(L_{a,a}) > 0$ ($L$ is positive definite). Since $L = (K + \lambda I)^{-1}$, we have (absorbing $\lambda I$ into $K$ for brevity)

$$L_{s,s} = \left[ K_{s,s} - (K_{s,l}, K_{s,u}) \begin{pmatrix} K_{l,l} & K_{l,u} \\ K_{u,l} & K_{u,u} \end{pmatrix}^{-1} \begin{pmatrix} K_{l,s} \\ K_{u,s} \end{pmatrix} \right]^{-1} \approx \frac{1}{K_{s,s}} \left[ 1 + \frac{1}{K_{s,s}} (K_{s,l}, K_{s,u}) \begin{pmatrix} K_{l,l} & K_{l,u} \\ K_{u,l} & K_{u,u} \end{pmatrix}^{-1} \begin{pmatrix} K_{l,s} \\ K_{u,s} \end{pmatrix} \right].$$

Therefore, to choose an instance with small $L_{s,s}$, we select the instance with large self-similarity $K_{s,s}$.
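The first line of Eq. (9), and the origin of its absolute-value term, can be checked with a short NumPy sketch. It is an illustration under stated assumptions, not the authors' code: the toy data, the unit RBF width, the labels, and the helper names `quire_score` and `abs_term_form` are hypothetical. `quire_score` plugs the relaxed $\hat{y}_u$ of Eq. (8) into $y^\top L y$ for each $y_s = \pm 1$ and keeps the larger value; `abs_term_form` writes the same quantity as a $y_s$-independent part plus $2|(L_{s,l} - L_{s,u}L_{u,u}^{-1}L_{u,l})\,y_l|$. It does not rely on the determinant-based second line of Eq. (9).

```python
import numpy as np

def quire_score(L, lab, y_lab, s, unl):
    # First line of Eq. (9): relaxed min over y_u (Eq. (8)), then max over y_s = -1, +1
    M = np.linalg.inv(L[np.ix_(unl, unl)])                # L_{u,u}^{-1}
    best = -np.inf
    for ys in (-1.0, 1.0):
        y_hat_u = -M @ (L[np.ix_(unl, lab)] @ y_lab + L[unl, s] * ys)   # Eq. (8)
        y = np.zeros(L.shape[0])
        y[lab], y[s], y[unl] = y_lab, ys, y_hat_u
        best = max(best, y @ L @ y)                        # y^T L y at the relaxed minimizer
    return best

def abs_term_form(L, lab, y_lab, s, unl):
    # Same quantity: a y_s-independent part plus 2|(L_sl - L_su L_uu^{-1} L_ul) y_l|
    M = np.linalg.inv(L[np.ix_(unl, unl)])
    Lul, Lsu = L[np.ix_(unl, lab)], L[s, unl]
    g = (L[s, lab] - Lsu @ M @ Lul) @ y_lab
    const = (y_lab @ (L[np.ix_(lab, lab)] - Lul.T @ M @ Lul) @ y_lab
             + L[s, s] - Lsu @ M @ L[unl, s])
    return const + 2.0 * abs(g)

# Toy setting (all values are illustrative assumptions)
rng = np.random.default_rng(0)
n, lam = 10, 1.0
X = rng.normal(size=(n, 2))
sq = (X**2).sum(1)[:, None] + (X**2).sum(1)[None, :] - 2.0 * X @ X.T
K = np.exp(-np.maximum(sq, 0.0))                           # RBF kernel with unit width
L = np.linalg.inv(K + lam * np.eye(n))
lab, y_lab = np.arange(3), np.array([1.0, -1.0, 1.0])      # D_l and its labels
pool = np.arange(3, n)                                     # D_a: all unlabeled instances

scores = {}
for s in pool:
    unl = pool[pool != s]                                  # D_u excludes the candidate x_s
    scores[int(s)] = (quire_score(L, lab, y_lab, s, unl),
                      abs_term_form(L, lab, y_lab, s, unl))
s_star = min(scores, key=lambda k: scores[k][0])           # query the smallest score
print(s_star, scores[s_star])
```

Each candidate here inverts $L_{u,u}$ from scratch, which is exactly the cost that Theorem 1 in Section 3.3 is designed to avoid.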
When the self-similarity $K_{s,s}$ is constant (e.g., $K_{s,s} = 1$ for the RBF kernel), the first component does not affect query selection.

To analyze the effect of the second component, we approximate it as

$$2\left|\left(L_{s,l} - L_{s,u} L_{u,u}^{-1} L_{u,l}\right) y_l\right| \approx 2|L_{s,l}\, y_l| + 2\left|L_{s,u} L_{u,u}^{-1} L_{u,l}\, y_l\right| \approx 2|L_{s,l}\, y_l| + 2|L_{s,u}\, \hat{y}_u|. \tag{10}$$

The first term in this approximation measures the confidence in predicting $x_s$ using only the labeled data, which corresponds to the informativeness of $x_s$. The second term measures the prediction confidence using only the predicted labels of the unlabeled data, which can be viewed as a measure of representativeness. This is because a representative instance $x_s$ is expected to share a large similarity with many of the unlabeled instances in the pool; as a result, the prediction for $x_s$ by the unlabeled data in $D_u$ is determined by the average of their assigned class labels $\hat{y}_u$. If we assume that the classes are evenly distributed over the unlabeled data, we should expect low confidence in predicting the class label of $x_s$ from the unlabeled data. It is important to note that, unlike existing work that measures representativeness only by the cluster structure of unlabeled data, our proposed measure of representativeness depends on $\hat{y}_u$, which essentially combines the cluster structure of the unlabeled data with the class assignments of the labeled data. Given high-dimensional data, there could be many possible cluster structures consistent with the unlabeled data, and it is unclear which one is consistent with the target classification problem. It is therefore critical to take the label information into account when exploiting the cluster structure of unlabeled data.

Algorithm 1 The QUIRE Algorithm
  Input:
    D: a data set of n instances
  Initialize:
    D_l = ∅; n_l = 0   % no labeled data is available at the very beginning
    D_u = D; n_u = n   % the pool of unlabeled data
    Calculate K
  repeat
    Calculate $L_{a,a}^{-1}$ using Proposition 2, and $\det(L_{a,a})$
    for s = 1 to n_u do
      Calculate $L_{u,u}^{-1}$ according to Theorem 1
      Calculate $\widehat{\mathcal{L}}(D_l, D_u, x_s)$ using Eq. (9)
    end for
    Select the $x_{s^*}$ with the smallest $\widehat{\mathcal{L}}(D_l, D_u, x_{s^*})$ and query its label $y_{s^*}$
    $D_l = D_l \cup \{(x_{s^*}, y_{s^*})\}$; $D_u = D_u \setminus \{x_{s^*}\}$
  until the number of queries or the required accuracy is reached

3.3 Efficient Algorithm

Computing the evaluation function $\widehat{\mathcal{L}}(D_l, D_u, x_s)$ in Eq. (9) requires computing $L_{u,u}^{-1}$ for every unlabeled instance $x_s$, which leads to a high computational cost when the number of unlabeled instances is very large. The theorem below allows us to improve the computational efficiency dramatically.

Theorem 1. Let

$$L_{a,a}^{-1} = \begin{pmatrix} L_{s,s} & L_{s,u} \\ L_{u,s} & L_{u,u} \end{pmatrix}^{-1} = \begin{pmatrix} a & -b^\top \\ -b & D \end{pmatrix}.$$

Then

$$L_{u,u}^{-1} = D - \frac{1}{a}\, b\, b^\top.$$

The proof can be found in the supplementary document. As indicated by Theorem 1, we only need to compute $L_{a,a}^{-1}$ once; for each $x_s$, the corresponding $L_{u,u}^{-1}$ can then be computed directly from $L_{a,a}^{-1}$. The following proposition simplifies the computation of $L_{a,a}^{-1}$.

Proposition 2. $L_{a,a}^{-1} = (\lambda I_a + K_{a,a}) - K_{a,l}(\lambda I_l + K_{l,l})^{-1} K_{l,a}$.

Proposition 2 follows directly from the formula for the inverse of a block matrix. As indicated by Proposition 2, we only need to compute $(\lambda I_l + K_{l,l})^{-1}$.
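Theorem 1 and Proposition 2 can be checked against direct inversion. In the sketch below (illustrative only; the data, the kernel width, and the helper name `luu_inv_theorem1` are assumptions), $L_{a,a}^{-1}$ is formed once from an $n_l \times n_l$ inverse, and each candidate's $L_{u,u}^{-1}$ is obtained by deleting that candidate's row and column of $L_{a,a}^{-1}$ and applying the rank-one correction $D - bb^\top/a$. This replaces an $O(n_u^3)$ inversion per candidate with an $O(n_u^2)$ update.

```python
import numpy as np

rng = np.random.default_rng(1)
n, lam = 12, 1.0
X = rng.normal(size=(n, 3))
sq = (X**2).sum(1)[:, None] + (X**2).sum(1)[None, :] - 2.0 * X @ X.T
K = np.exp(-0.5 * np.maximum(sq, 0.0))            # RBF kernel; width 0.5 is an arbitrary choice
L = np.linalg.inv(K + lam * np.eye(n))

lab = np.arange(4)                                 # labeled indices D_l
a_idx = np.arange(4, n)                            # all unlabeled indices D_a = D_u plus x_s

# Proposition 2: only (lam I_l + K_ll) has to be inverted
Kll_inv = np.linalg.inv(lam * np.eye(len(lab)) + K[np.ix_(lab, lab)])
Laa_inv = (lam * np.eye(len(a_idx)) + K[np.ix_(a_idx, a_idx)]
           - K[np.ix_(a_idx, lab)] @ Kll_inv @ K[np.ix_(lab, a_idx)])

def luu_inv_theorem1(Laa_inv, pos):
    # Theorem 1 with x_s at position `pos` of D_a: L_uu^{-1} = D - b b^T / a
    keep = np.delete(np.arange(Laa_inv.shape[0]), pos)
    a = Laa_inv[pos, pos]
    b = -Laa_inv[keep, pos]
    D = Laa_inv[np.ix_(keep, keep)]
    return D - np.outer(b, b) / a

errors = []
for pos in range(len(a_idx)):
    u_idx = np.delete(a_idx, pos)                  # D_u: unlabeled instances other than x_s
    direct = np.linalg.inv(L[np.ix_(u_idx, u_idx)])
    errors.append(np.abs(luu_inv_theorem1(Laa_inv, pos) - direct).max())
prop2_error = np.abs(Laa_inv - np.linalg.inv(L[np.ix_(a_idx, a_idx)])).max()
print(prop2_error, max(errors))
```

The candidate $x_s$ need not be the first row of $L_{a,a}$: Theorem 1 applies after a symmetric permutation, which is what deleting row and column `pos` does.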
Since the number of labeled examples is usually small compared to the size of the unlabeled data, computing $L_{a,a}^{-1}$ via Proposition 2 is in general efficient. The pseudo-code of QUIRE is summarized in Algorithm 1. Excluding the time for computing the kernel matrix, the computational complexity of our algorithm is just $O(n_u)$.

4 Experiments

We compare QUIRE with the following five baseline approaches: (1) RANDOM: randomly select query instances; (2) MARGIN: margin-based active learning [18], a representative approach that selects informative instances; (3) CLUSTER: hierarchical-clustering-based active learning [7], a representative approach that chooses representative instances; (4) IDE: active learning that selects informative and diverse examples [11]; and (5) DUAL: a dual strategy for active learning that exploits both informativeness and representativeness for query selection. Note that the original algorithm in [11] is designed for batch mode active learning.
We turn it into an active learning algorithm that selects a single instance in each iteration by setting the parameter k = 1.

Figure 2: Comparison on classification accuracy. Each panel plots accuracy (%) against the number of queried examples for Random, Margin, Cluster, IDE, DUAL and QUIRE on (a) austra, (b) digit1, (c) g241n, (d) isolet, (e) titato, (f) vehicle, (g) wdbc, (h) letterDvsP, (i) letterEvsF, (j) letterIvsJ, (k) letterMvsN and (l) letterUvsV.

Twelve data sets are used in our study; their statistics are given in the supplementary document. Digit1 and g241n are benchmark data sets for semi-supervised learning [5]; austra, isolet, titato, vehicle and wdbc are UCI data sets [1]; letter is a multi-class data set [1] from which we select five pairs of letters that are relatively difficult to distinguish, i.e., D vs P, E vs F, I vs J, M vs N and U vs V, and construct a binary data set for each pair. Each data set is randomly divided into two parts of equal size, one used as the test data and the other as the unlabeled data for active learning. We assume that no labeled data is available at the very beginning of active learning. For MARGIN, IDE and DUAL, instances are randomly selected when no classification model is available, which happens only at the beginning. In each iteration, an unlabeled instance is first selected to solicit its class label, and the classification model is then retrained with the additional labeled instance.
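A compact end-to-end sketch of this protocol with QUIRE as the query strategy is given below. It follows Algorithm 1 (Proposition 2 once per iteration, Theorem 1 per candidate, and the first line of Eq. (9) as the score), but it is an illustration rather than the authors' implementation. The synthetic two-Gaussian data, the RBF width `gamma = 0.5`, the budget of 10 queries, the helper names `rbf` and `quire_select`, and the kernel ridge classifier used in place of the LibSVM SVM from the paper are all assumptions.

```python
import numpy as np

def rbf(A, B, gamma=0.5):
    # RBF kernel; gamma = 0.5 is an illustrative choice, not the paper's setting
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def quire_select(K, lab, y_lab, unl, lam=1.0):
    # One iteration of Algorithm 1: return the pool index with the smallest Eq. (9) score
    L = np.linalg.inv(K + lam * np.eye(K.shape[0]))
    # Proposition 2: L_aa^{-1} from an n_l x n_l inverse (no correction if nothing is labeled)
    Laa_inv = lam * np.eye(len(unl)) + K[np.ix_(unl, unl)]
    if len(lab) > 0:
        Kll_inv = np.linalg.inv(lam * np.eye(len(lab)) + K[np.ix_(lab, lab)])
        Laa_inv = Laa_inv - K[np.ix_(unl, lab)] @ Kll_inv @ K[np.ix_(lab, unl)]
    yLy_lab = y_lab @ L[np.ix_(lab, lab)] @ y_lab
    best_idx, best_score = None, np.inf
    for pos, s in enumerate(unl):
        keep = np.delete(np.arange(len(unl)), pos)
        u = unl[keep]
        a, b = Laa_inv[pos, pos], -Laa_inv[keep, pos]
        M = Laa_inv[np.ix_(keep, keep)] - np.outer(b, b) / a    # Theorem 1: L_uu^{-1}
        Lul_y = L[np.ix_(u, lab)] @ y_lab
        score = -np.inf
        for ys in (-1.0, 1.0):                                  # max over y_s
            c = Lul_y + L[u, s] * ys
            score = max(score, 2.0 * ys * (L[s, lab] @ y_lab) - c @ M @ c)
        score += L[s, s] + yLy_lab                              # first line of Eq. (9)
        if score < best_score:
            best_idx, best_score = s, score
    return best_idx

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (40, 2)), rng.normal(2.0, 1.0, (40, 2))])
y = np.r_[-np.ones(40), np.ones(40)]
perm = rng.permutation(len(y))
pool, test = perm[:40], perm[40:]              # two halves of equal size
Xp, yp = X[pool], y[pool]
K = rbf(Xp, Xp)
lab, unl = np.array([], dtype=int), np.arange(len(pool))   # no labels at the start
for _ in range(10):                            # query budget (illustrative)
    s = quire_select(K, lab, yp[lab], unl)
    lab = np.append(lab, s)                    # the oracle reveals yp[s]
    unl = unl[unl != s]
# Kernel ridge classifier as a stand-in for the LibSVM SVM used in the paper
alpha = np.linalg.solve(K[np.ix_(lab, lab)] + np.eye(len(lab)), yp[lab])
acc = np.mean(np.sign(rbf(X[test], Xp[lab]) @ alpha) == y[test])
print(len(lab), acc)
```

With no labels, the score reduces to $L_{s,s} - L_{s,u}L_{u,u}^{-1}L_{u,s}$, so the first query is driven purely by the unlabeled data, which is consistent with QUIRE not needing a random start.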
We evaluate the classi\ufb01cation model by its performance on the holdout test data.\nBoth classi\ufb01cation accuracy and Area Under ROC curve (AUC) are used for evaluation metrics. For\nevery data set, we run the experiment for ten times, each with a random partition of the data set. We\nalso conduct experiments with a few initially labeled examples and have similar observation. Due to\nthe space limit, we put in the supplementary document the experimental results with a few initially\nlabeled examples. In all the experiments, the parameter \u03bb is set to 1 and a RBF kernel with default\n\n6\n\n\fTable 1: Comparison on AUC values (mean \u00b1 std). The best performance and its comparable\nperformances based on paired t-tests at 95% signi\ufb01cance level are highlighted in boldface.\n\nData\n\nAlgorithms\n\nNumber of queries (percentage of the unlabeled data)\n\n5%\n\n10%\n\n20%\n\n30%\n\n40%\n\n50%\n\n80%\n\naustra\n\ndigit1\n\ng241n\n\nisolet\n\ntitato\n\nvehicle\n\nwdbc\n\nletterDvsP\n\nletterEvsF\n\nletterIvsJ\n\nletterMvsN\n\nletterUvsV\n\nRANDOM\nMARGIN\nCLUSTER\nIDE\nDUAL\nQUIRE\n\nRANDOM\nMARGIN\nCLUSTER\nIDE\nDUAL\nQUIRE\n\nRANDOM\nMARGIN\nCLUSTER\nIDE\nDUAL\nQUIRE\n\nRANDOM\nMARGIN\nCLUSTER\nIDE\nDUAL\nQUIRE\n\nRANDOM\nMARGIN\nCLUSTER\nIDE\nDUAL\nQUIRE\n\nRANDOM\nMARGIN\nCLUSTER\nIDE\nDUAL\nQUIRE\n\nRANDOM\nMARGIN\nCLUSTER\nIDE\nDUAL\nQUIRE\n\nRANDOM\nMARGIN\nCLUSTER\nIDE\nDUAL\nQUIRE\n\nRANDOM\nMARGIN\nCLUSTER\nIDE\nDUAL\nQUIRE\n\nRANDOM\nMARGIN\nCLUSTER\nIDE\nDUAL\nQUIRE\n\nRANDOM\nMARGIN\nCLUSTER\nIDE\nDUAL\nQUIRE\n\nRANDOM\nMARGIN\nCLUSTER\nIDE\nDUAL\nQUIRE\n\n.868\u00b1.027\n.751\u00b1.137\n.877\u00b1.045\n.858\u00b1.101\n.866\u00b1.037\n.887\u00b1.014\n.945\u00b1.009\n.941\u00b1.028\n.938\u00b1.035\n.954\u00b1.011\n.929\u00b1.014\n.976\u00b1.006\n.713\u00b1.040\n.700\u00b1.057\n.720\u00b1.038\n.727\u00b1.030\n.722\u00b1.040\n.757\u00b1.035\n.995\u00b1.006\n.965\u00b1.052\n.998\u00b1.002\n.998\u00b1.003\n.993\u00b1.008\n.997\u00b1.002\n.762\u00b1
.033\n.645\u00b1.096\n.717\u00b1.087\n.735\u00b1.040\n.708\u00b1.069\n.736\u00b1.037\n.818\u00b1.064\n.693\u00b1.078\n.771\u00b1.088\n.731\u00b1.141\n.680\u00b1.074\n.750\u00b1.137\n.984\u00b1.006\n.967\u00b1.038\n.981\u00b1.007\n.983\u00b1.006\n.955\u00b1.025\n.985\u00b1.006\n.990\u00b1.004\n.994\u00b1.005\n.988\u00b1.008\n.992\u00b1.006\n.978\u00b1.005\n.998\u00b1.001\n.977\u00b1.020\n.987\u00b1.008\n.975\u00b1.016\n.977\u00b1.014\n.976\u00b1.011\n.988\u00b1.009\n.943\u00b1.025\n.882\u00b1.096\n.952\u00b1.022\n.934\u00b1.030\n.819\u00b1.120\n.951\u00b1.023\n.977\u00b1.010\n.964\u00b1.040\n.971\u00b1.017\n.969\u00b1.017\n.950\u00b1.025\n.986\u00b1.007\n.992\u00b1.005\n.998\u00b1.002\n.990\u00b1.008\n.995\u00b1.004\n.983\u00b1.014\n.999\u00b1.001\n\n.894\u00b1.022\n.838\u00b1.119\n.888\u00b1.029\n.885\u00b1.058\n.878\u00b1.036\n.901\u00b1.010\n.969\u00b1.006\n.972\u00b1.009\n.952\u00b1.018\n.973\u00b1.007\n.953\u00b1.009\n.986\u00b1.003\n.769\u00b1.021\n.751\u00b1.048\n.770\u00b1.024\n.786\u00b1.029\n.751\u00b1.019\n.825\u00b1.019\n.998\u00b1.002\n.999\u00b1.001\n.999\u00b1.002\n.999\u00b1.002\n.999\u00b1.001\n.999\u00b1.001\n.861\u00b1.031\n.753\u00b1.078\n.806\u00b1.054\n.906\u00b1.029\n.782\u00b1.064\n.861\u00b1.025\n.864\u00b1.039\n.828\u00b1.077\n.845\u00b1.056\n.849\u00b1.106\n.706\u00b1.114\n.912\u00b1.024\n.986\u00b1.005\n.990\u00b1.002\n.987\u00b1.004\n.984\u00b1.008\n.964\u00b1.016\n.990\u00b1.004\n.995\u00b1.002\n.999\u00b1.001\n.995\u00b1.004\n.997\u00b1.002\n.986\u00b1.001\n.999\u00b1.001\n.988\u00b1.009\n.999\u00b1.001\n.991\u00b1.003\n.995\u00b1.003\n.993\u00b1.003\n.999\u00b1.000\n.966\u00b1.017\n.960\u00b1.027\n.961\u00b1.017\n.969\u00b1.011\n.897\u00b1.058\n.963\u00b1.013\n.992\u00b1.002\n.991\u00b1.014\n.986\u00b1.009\n.988\u00b1.007\n.972\u00b1.011\n.996\u00b1.003\n.996\u00b1.004\n1.00\u00b1.000\n.996\u00b1.009\n.999\u00b1.001\n.986\u00b1.008\n1.00\u00b1.000\n\n.897\u00b1.023\n.885\u00b1.043\n.894\u00b1.015\n.902\u00b1.012\n.875\u00b1.018\n.906\u
00b1.016\n.979\u00b1.005\n.989\u00b1.002\n.963\u00b1.019\n.987\u00b1.002\n.975\u00b1.004\n.990\u00b1.002\n.822\u00b1.018\n.830\u00b1.022\n.815\u00b1.018\n.840\u00b1.017\n.822\u00b1.011\n.857\u00b1.020\n.999\u00b1.001\n1.00\u00b1.000\n1.00\u00b1.000\n.999\u00b1.001\n.999\u00b1.001\n.999\u00b1.001\n.954\u00b1.023\n.946\u00b1.043\n.908\u00b1.031\n.996\u00b1.003\n.900\u00b1.027\n.991\u00b1.004\n.925\u00b1.032\n.883\u00b1.105\n.927\u00b1.022\n.878\u00b1.093\n.817\u00b1.061\n.956\u00b1.025\n.990\u00b1.004\n.993\u00b1.003\n.991\u00b1.003\n.990\u00b1.004\n.972\u00b1.015\n.993\u00b1.003\n.997\u00b1.002\n.999\u00b1.000\n.997\u00b1.002\n.998\u00b1.001\n.988\u00b1.004\n.999\u00b1.001\n.994\u00b1.002\n1.00\u00b1.000\n.997\u00b1.004\n.999\u00b1.000\n.996\u00b1.002\n1.00\u00b1.000\n.980\u00b1.004\n.986\u00b1.005\n.976\u00b1.008\n.979\u00b1.006\n.934\u00b1.030\n.976\u00b1.011\n.994\u00b1.003\n.999\u00b1.000\n.994\u00b1.003\n.997\u00b1.002\n.974\u00b1.007\n.998\u00b1.001\n.998\u00b1.001\n1.00\u00b1.000\n1.00\u00b1.000\n1.00\u00b1.000\n.990\u00b1.008\n1.00\u00b1.000\n\n.901\u00b1.022\n.909\u00b1.010\n.896\u00b1.015\n.912\u00b1.008\n.876\u00b1.016\n.912\u00b1.009\n.984\u00b1.003\n.992\u00b1.002\n.974\u00b1.011\n.991\u00b1.002\n.982\u00b1.005\n.992\u00b1.002\n.854\u00b1.016\n.864\u00b1.019\n.835\u00b1.021\n.866\u00b1.016\n.838\u00b1.022\n.884\u00b1.013\n1.00\u00b1.000\n1.00\u00b1.000\n1.00\u00b1.000\n1.00\u00b1.001\n1.00\u00b1.000\n1.00\u00b1.000\n.979\u00b1.011\n.998\u00b1.001\n.971\u00b1.021\n.999\u00b1.001\n.981\u00b1.012\n.999\u00b1.001\n.949\u00b1.026\n.981\u00b1.014\n.955\u00b1.018\n.957\u00b1.037\n.875\u00b1.035\n.985\u00b1.007\n.991\u00b1.004\n.993\u00b1.003\n.992\u00b1.003\n.992\u00b1.003\n.988\u00b1.009\n.993\u00b1.003\n.998\u00b1.001\n.999\u00b1.001\n.998\u00b1.001\n.999\u00b1.001\n.990\u00b1.004\n.999\u00b1.001\n.997\u00b1.002\n1.00\u00b1.000\n.999\u00b1.001\n.999\u00b1.000\n.996\u00b1.002\n1.00\u00b1.000\n.983\u00b1.005\n.989\u00b1.006\n.985\u00b1.007\n.980\u00b1.006\n.954
\u00b1.017\n.989\u00b1.010\n.996\u00b1.002\n.999\u00b1.000\n.997\u00b1.002\n.998\u00b1.001\n.980\u00b1.008\n.999\u00b1.000\n.999\u00b1.000\n1.00\u00b1.000\n1.00\u00b1.000\n1.00\u00b1.000\n.991\u00b1.008\n1.00\u00b1.000\n\n.909\u00b1.015\n.911\u00b1.012\n.903\u00b1.014\n.913\u00b1.009\n.879\u00b1.013\n.914\u00b1.009\n.985\u00b1.003\n.992\u00b1.002\n.985\u00b1.002\n.992\u00b1.002\n.985\u00b1.003\n.992\u00b1.002\n.873\u00b1.015\n.896\u00b1.012\n.860\u00b1.022\n.883\u00b1.013\n.865\u00b1.016\n.900\u00b1.009\n1.00\u00b1.000\n1.00\u00b1.000\n1.00\u00b1.000\n1.00\u00b1.000\n1.00\u00b1.001\n1.00\u00b1.001\n.991\u00b1.007\n1.00\u00b1.000\n.989\u00b1.010\n1.00\u00b1.001\n.995\u00b1.006\n1.00\u00b1.000\n.968\u00b1.016\n.993\u00b1.005\n.973\u00b1.010\n.977\u00b1.010\n.908\u00b1.035\n.989\u00b1.006\n.991\u00b1.004\n.993\u00b1.003\n.992\u00b1.003\n.993\u00b1.003\n.992\u00b1.003\n.993\u00b1.003\n.998\u00b1.001\n.999\u00b1.001\n.999\u00b1.001\n.999\u00b1.001\n.996\u00b1.001\n.999\u00b1.001\n.998\u00b1.001\n1.00\u00b1.000\n1.00\u00b1.000\n.999\u00b1.000\n.996\u00b1.002\n1.00\u00b1.000\n.985\u00b1.005\n.991\u00b1.004\n.987\u00b1.006\n.982\u00b1.008\n.959\u00b1.014\n.991\u00b1.004\n.997\u00b1.001\n.999\u00b1.000\n.998\u00b1.001\n.998\u00b1.001\n.983\u00b1.007\n.999\u00b1.000\n1.00\u00b1.000\n1.00\u00b1.000\n1.00\u00b1.000\n1.00\u00b1.000\n.993\u00b1.007\n1.00\u00b1.000\n\n.909\u00b1.012\n.914\u00b1.009\n.907\u00b1.015\n.914\u00b1.007\n.881\u00b1.013\n.915\u00b1.007\n.988\u00b1.003\n.992\u00b1.002\n.988\u00b1.003\n.992\u00b1.002\n.987\u00b1.003\n.992\u00b1.002\n.886\u00b1.012\n.911\u00b1.008\n.880\u00b1.013\n.899\u00b1.011\n.881\u00b1.012\n.912\u00b1.006\n1.00\u00b1.000\n1.00\u00b1.000\n1.00\u00b1.000\n1.00\u00b1.000\n1.00\u00b1.000\n1.00\u00b1.000\n.997\u00b1.004\n1.00\u00b1.000\n.997\u00b1.003\n1.00\u00b1.000\n.999\u00b1.001\n1.00\u00b1.000\n.975\u00b1.013\n.993\u00b1.005\n.978\u00b1.011\n.985\u00b1.009\n.947\u00b1.035\n.991\u00b1.005\n.991\u00b1.004\n.993\u00b1.003\n.993\u00b1.003\n
.993\u00b1.003\n.992\u00b1.003\n.993\u00b1.003\n.998\u00b1.001\n.999\u00b1.001\n.999\u00b1.001\n.999\u00b1.001\n.998\u00b1.001\n.999\u00b1.001\n.999\u00b1.001\n1.00\u00b1.000\n1.00\u00b1.000\n1.00\u00b1.000\n.998\u00b1.001\n1.00\u00b1.000\n.987\u00b1.004\n.991\u00b1.004\n.989\u00b1.005\n.985\u00b1.005\n.953\u00b1.015\n.991\u00b1.004\n.997\u00b1.001\n.999\u00b1.000\n.998\u00b1.001\n.998\u00b1.001\n.983\u00b1.007\n.999\u00b1.000\n1.00\u00b1.000\n1.00\u00b1.000\n1.00\u00b1.000\n1.00\u00b1.000\n.995\u00b1.005\n1.00\u00b1.000\n\n.917\u00b1.011\n.915\u00b1.008\n.913\u00b1.011\n.916\u00b1.007\n.904\u00b1.008\n.916\u00b1.007\n.991\u00b1.002\n.992\u00b1.002\n.992\u00b1.002\n.992\u00b1.002\n.991\u00b1.002\n.992\u00b1.002\n.906\u00b1.014\n.918\u00b1.008\n.909\u00b1.009\n.916\u00b1.010\n.912\u00b1.007\n.920\u00b1.009\n1.00\u00b1.000\n1.00\u00b1.000\n1.00\u00b1.000\n1.00\u00b1.000\n1.00\u00b1.000\n1.00\u00b1.000\n1.00\u00b1.000\n1.00\u00b1.000\n1.00\u00b1.000\n1.00\u00b1.000\n1.00\u00b1.000\n1.00\u00b1.000\n.989\u00b1.006\n.992\u00b1.005\n.992\u00b1.006\n.991\u00b1.006\n.980\u00b1.016\n.992\u00b1.005\n.993\u00b1.003\n.993\u00b1.003\n.993\u00b1.003\n.993\u00b1.003\n.992\u00b1.004\n.993\u00b1.003\n.999\u00b1.001\n.999\u00b1.001\n.999\u00b1.001\n.999\u00b1.001\n.999\u00b1.001\n.999\u00b1.001\n1.00\u00b1.000\n1.00\u00b1.000\n1.00\u00b1.000\n1.00\u00b1.000\n1.00\u00b1.000\n1.00\u00b1.000\n.990\u00b1.004\n.991\u00b1.004\n.991\u00b1.004\n.990\u00b1.004\n.988\u00b1.004\n.991\u00b1.004\n.998\u00b1.001\n.999\u00b1.000\n.999\u00b1.000\n.999\u00b1.000\n.998\u00b1.001\n.999\u00b1.000\n1.00\u00b1.000\n1.00\u00b1.000\n1.00\u00b1.000\n1.00\u00b1.000\n.999\u00b1.000\n1.00\u00b1.000\n\nparameters is used (performances with linear kernel are not as stable as that with RBF kernel).\nLibSVM [4] is used to train a SVM classi\ufb01er for all active learning approaches in comparison.\n\n7\n\n\fTable 2: Win/tie/loss counts of QUIRE versus the other methods with varied numbers of 
queries.\n\nNumber of queries (percentage of the unlabeled data)\n\nAlgorithms  5%       10%      20%      30%      40%      50%      80%      In All\nRANDOM      4/8/0    8/4/0    9/3/0    9/2/1    10/2/0   10/2/0   6/6/0    56/27/1\nMARGIN      6/6/0    4/7/1    2/8/2    2/8/2    0/11/1   0/11/1   1/11/0   15/62/7\nCLUSTER     6/6/0    7/5/0    8/4/0    11/1/0   9/3/0    6/6/0    3/9/0    50/34/0\nIDE         6/6/0    6/5/1    6/5/1    8/4/0    8/4/0    8/4/0    2/10/0   44/38/2\nDUAL        8/4/0    10/2/0   11/1/0   10/2/0   10/2/0   11/1/0   9/3/0    69/15/0\nIn All      30/30/0  35/23/2  36/21/3  40/17/3  37/22/1  35/24/1  21/39/0  234/176/10\n\n4.1 Results\n\nFigure 2 shows the classification accuracy of the different active learning approaches with varied numbers of queries. Table 1 shows the AUC values when 5%, 10%, 20%, 30%, 40%, 50% and 80% of the unlabeled data are used as queries. For each case, the best result and the results comparable to it are highlighted in boldface, based on paired t-tests at the 95% significance level. Table 2 summarizes the win/tie/loss counts of QUIRE versus the other methods based on the same test. We also perform the Wilcoxon signed-rank test at the 95% significance level and obtain almost the same results, which are reported in the supplementary document.\n\nFirst, we observe that the RANDOM approach tends to yield decent performance when the number of queries is very small. However, as the number of queries increases, this simple approach loses its edge and is often less effective than the other active learning approaches. MARGIN, the most commonly used approach for active learning, does not perform well at the beginning of the learning process. As the number of queries increases, we observe that MARGIN catches up with the other approaches and yields decent performance. 
This phenomenon can be attributed to the fact that with only a few training examples, the learned decision boundary tends to be inaccurate; as a result, the unlabeled instances closest to the decision boundary may not be the most informative ones. The performance of CLUSTER is mixed: it works well on some data sets but performs poorly on others. We attribute this inconsistency to the fact that the cluster structure identified from the unlabeled data may not always be consistent with the target classification model. IDE behaves similarly to CLUSTER: it achieves good performance on certain data sets and fails on others. DUAL does not yield good performance on most data sets, despite our best efforts to tune its parameters. We attribute the failure of DUAL to our experimental setup, in which no initially labeled examples are provided. Further study shows that starting with a few initially labeled examples does improve the performance of DUAL, though it is still significantly outperformed by QUIRE. Detailed results can be found in the supplementary document. Finally, we observe that in most cases QUIRE significantly outperforms the baseline methods, as indicated by Figure 2 and Tables 1 and 2. We attribute the success of QUIRE to the principle of choosing unlabeled instances that are both informative and representative, and to the specially designed computational framework that appropriately measures and combines informativeness and representativeness. The computational costs are reported in the supplementary document.\n\n5 Conclusion\n\nWe propose a new active learning approach, called QUIRE, that is designed to find unlabeled instances that are both informative and representative. 
The proposed approach is based on the min-max view of active learning, which provides a systematic way of measuring and combining informativeness and representativeness. Our current work is restricted to binary classification. In the future, we plan to extend this work to multi-class learning. We also plan to develop a mechanism that allows users to control the tradeoff between informativeness and representativeness according to their domain, thereby incorporating domain knowledge into active learning algorithms.\n\nAcknowledgements\n\nThis work was supported in part by the NSFC (60635030), 973 Program (2010CB327903), JiangsuSF (BK2008018) and NSF (IIS-0643494).\n\nReferences\n\n[1] A. Asuncion and D. J. Newman. UCI machine learning repository, 2007.\n\n[2] M. F. Balcan, A. Z. Broder, and T. Zhang. Margin based active learning. In Proceedings of the 20th Annual Conference on Learning Theory, pages 35\u201350, 2007.\n\n[3] M. Belkin, P. Niyogi, and V. Sindhwani. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research, 7:2399\u20132434, 2006.\n\n[4] C. C. Chang and C. J. Lin. LIBSVM: A library for support vector machines, 2001.\n\n[5] O. Chapelle, B. Sch\u00f6lkopf, and A. Zien, editors. Semi-supervised learning. MIT Press, Cambridge, MA, 2006.\n\n[6] I. Dagan and S. P. Engelson. Committee-based sampling for training probabilistic classifiers. In Proceedings of the 12th International Conference on Machine Learning, pages 150\u2013157, 1995.\n\n[7] S. Dasgupta and D. Hsu. Hierarchical sampling for active learning. In Proceedings of the 25th International Conference on Machine Learning, pages 208\u2013215, 2008.\n\n[8] P. Donmez, J. G. Carbonell, and P. N. Bennett. Dual strategy active learning. In Proceedings of the 18th European Conference on Machine Learning, pages 116\u2013127, 2007.\n\n[9] P. Flaherty, M. I. Jordan, and A. 
P. Arkin. Robust design of biological experiments. In Advances in Neural Information Processing Systems 18, pages 363\u2013370, 2005.\n\n[10] Y. Freund, H. S. Seung, E. Shamir, and N. Tishby. Selective sampling using the query by committee algorithm. Machine Learning, 28(2\u20133):133\u2013168, 1997.\n\n[11] S. C. H. Hoi, R. Jin, J. Zhu, and M. R. Lyu. Semi-supervised SVM batch mode active learning for image retrieval. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2008.\n\n[12] D. D. Lewis and J. Catlett. Heterogeneous uncertainty sampling for supervised learning. In Proceedings of the 11th International Conference on Machine Learning, pages 148\u2013156, 1994.\n\n[13] D. D. Lewis and W. A. Gale. A sequential algorithm for training text classifiers. In Proceedings of the 17th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, pages 3\u201312, 1994.\n\n[14] H. T. Nguyen and A. W. M. Smeulders. Active learning using pre-clustering. In Proceedings of the 21st International Conference on Machine Learning, pages 623\u2013630, 2004.\n\n[15] R. Rifkin, G. Yeo, and T. Poggio. Regularized least squares classification. In J. A. K. Suykens, G. Horvath, S. Basu, C. Micchelli, and J. Vandewalle, editors, Advances in Learning Theory: Methods, Models and Applications, NATO Science Series III: Computer and Systems Sciences, volume 190, pages 131\u2013154, 2003.\n\n[16] B. Settles. Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin\u2013Madison, 2009.\n\n[17] H. S. Seung, M. Opper, and H. Sompolinsky. Query by committee. In Proceedings of the 5th ACM Workshop on Computational Learning Theory, pages 287\u2013294, 1992.\n\n[18] S. Tong and D. Koller. Support vector machine active learning with applications to text classification. 
In Proceedings of the 17th International Conference on Machine Learning, pages 999\u20131006, 2000.\n\n[19] Z. Xu, K. Yu, V. Tresp, X. Xu, and J. Wang. Representative sampling for text classification using support vector machines. In Proceedings of the 25th European Conference on Information Retrieval Research, pages 393\u2013407, 2003.\n\n[20] K. Yu, J. Bi, and V. Tresp. Active learning via transductive experimental design. In Proceedings of the 23rd International Conference on Machine Learning, pages 1081\u20131088, 2006.", "award": [], "sourceid": 694, "authors": [{"given_name": "Sheng-jun", "family_name": "Huang", "institution": null}, {"given_name": "Rong", "family_name": "Jin", "institution": null}, {"given_name": "Zhi-Hua", "family_name": "Zhou", "institution": ""}]}