{"title": "On the Recursive Teaching Dimension of VC Classes", "book": "Advances in Neural Information Processing Systems", "page_first": 2164, "page_last": 2171, "abstract": "The recursive teaching dimension (RTD) of a concept class $C \\subseteq \\{0, 1\\}^n$, introduced by Zilles et al. [ZLHZ11], is a complexity parameter measured by the worst-case number of labeled examples needed to learn any target concept of $C$ in the recursive teaching model. In this paper, we study the quantitative relation between RTD and the well-known learning complexity measure VC dimension (VCD), and improve the best known upper and (worst-case) lower bounds on the recursive teaching dimension with respect to the VC dimension. Given a concept class $C \\subseteq \\{0, 1\\}^n$ with $VCD(C) = d$, we first show that $RTD(C)$ is at most $d 2^{d+1}$. This is the first upper bound for $RTD(C)$ that depends only on $VCD(C)$, independent of the size of the concept class $|C|$ and its~domain size $n$. Before our work, the best known upper bound for $RTD(C)$ is $O(d 2^d \\log \\log |C|)$, obtained by Moran et al. [MSWY15]. We remove the $\\log \\log |C|$ factor. We also improve the lower bound on the worst-case ratio of $RTD(C)$ to $VCD(C)$. We present a family of classes $\\{ C_k \\}_{k \\ge 1}$ with $VCD(C_k) = 3k$ and $RTD(C_k)=5k$, which implies that the ratio of $RTD(C)$ to $VCD(C)$ in the worst case can be as large as $5/3$. Before our work, the largest ratio known was $3/2$ as obtained by Kuhlmann [Kuh99]. 
Since then, no finite concept class $C$ has been known to satisfy $RTD(C) > (3/2) VCD(C)$.", "full_text": "On the Recursive Teaching Dimension\n\nof VC Classes\n\nXi Chen\n\nDepartment of Computer Science\n\nColumbia University\n\nxichen@cs.columbia.edu\n\nYu Cheng\n\nDepartment of Computer Science\nUniversity of Southern California\n\nyu.cheng.1@usc.edu\n\nBo Tang\n\nDepartment of Computer Science\n\nOxford University\n\ntangbonk1@gmail.com\n\nAbstract\n\nThe recursive teaching dimension (RTD) of a concept class C \u2286 {0, 1}n, introduced\nby Zilles et al. [ZLHZ11], is a complexity parameter measured by the worst-case\nnumber of labeled examples needed to learn any target concept of C in the recursive\nteaching model. In this paper, we study the quantitative relation between RTD and\nthe well-known learning complexity measure VC dimension (VCD), and improve\nthe best known upper and (worst-case) lower bounds on the recursive teaching\ndimension with respect to the VC dimension.\nGiven a concept class C \u2286 {0, 1}n with VCD(C) = d, we \ufb01rst show that RTD(C)\nis at most d \u00b7 2d+1. This is the \ufb01rst upper bound for RTD(C) that depends only on\nVCD(C), independent of the size of the concept class |C| and its domain size n.\nBefore our work, the best known upper bound for RTD(C) is O(d2d log log |C|),\nobtained by Moran et al. [MSWY15]. We remove the log log |C| factor.\nWe also improve the lower bound on the worst-case ratio of RTD(C) to VCD(C).\nWe present a family of classes {Ck}k\u22651 with VCD(Ck) = 3k and RTD(Ck) = 5k,\nwhich implies that the ratio of RTD(C) to VCD(C) in the worst case can be as\nlarge as 5/3. Before our work, the largest ratio known was 3/2 as obtained by\nKuhlmann [Kuh99]. 
Since then, no finite concept class C has been known to satisfy RTD(C) > (3/2) · VCD(C).\n\n1 Introduction\n\nIn computational learning theory, one of the fundamental challenges is to understand how different information complexity measures arising from different learning models relate to each other. These complexity measures determine the worst-case number of labeled examples required to learn any concept from a given concept class. For example, one of the most notable results along this line of research is that the sample complexity in PAC-learning is linearly related to the VC dimension [BEHW89]. Recall that the VC dimension of a concept class C ⊆ {0, 1}^n [VC71], denoted by VCD(C), is the maximum size of a shattered subset of [n] = {1, . . . , n}, where we say Y ⊆ [n] is shattered if for every binary string b of length |Y|, there is a concept c ∈ C such that c|Y = b. Here we use c|X to denote the projection of c on X. As the best-studied information complexity measure, VC dimension is known to be closely related to many other complexity parameters, and it serves as a natural parameter to compare against across various models of learning and teaching.\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\nInstead of the PAC-learning model where the algorithm takes random samples, we consider an interactive learning model where a helpful teacher selects representative examples and presents them to the learner, with the objective of minimizing the number of examples needed. The notion of a teaching set was introduced in mathematical models for teaching. The teaching set of a concept c ∈ C is a set of indices (or examples) X ⊆ [n] that uniquely identifies c from C.
Formally, given a concept class C ⊆ {0, 1}^n (a set of binary strings of length n), X ⊆ [n] is a teaching set for a concept c ∈ C (a binary string in C) if X satisfies c|X ≠ c′|X for all other concepts c′ ∈ C.\n\nThe teaching dimension of a concept class C is the smallest number t such that every c ∈ C has a teaching set of size no more than t [GK95, SM90]. However, teaching dimension does not always capture the cooperation in teaching and learning (as we will see in Example 2), and a more optimistic and realistic notion of recursive teaching dimension has been introduced and studied extensively in the literature [Kuh99, DSZ10, ZLHZ11, WY12, DFSZ14, SSYZ14, MSWY15].\n\nDefinition 1. The recursive teaching dimension of a class C ⊆ {0, 1}^n, denoted by RTD(C), is the smallest number t such that one can order all the concepts of C as a sequence c1, . . . , c|C| where every concept ci, i < |C|, has a teaching set of size no more than t in {ci, . . . , c|C|}.\n\nHence, RTD(C) measures the worst-case number of labeled examples needed to learn any target concept in C, if the teacher and the learner are cooperative. We would like to emphasize that an optimal ordered sequence (as in Definition 1) can be derived by the teacher and the learner separately, without any communication: they can place all concepts of C that have the smallest teaching dimension at the beginning of the sequence, then remove these concepts from C and proceed recursively.\n\nBy definition, RTD(C) is always bounded from above by the teaching dimension of C, but it can be much smaller. We use the following example to illustrate the difference between the teaching dimension and the recursive teaching dimension.\n\nExample 2. Consider the class C ⊆ {0, 1}^n with n + 1 concepts: the empty concept 0 and all the singletons. For example, when n = 3, C = {000, 100, 010, 001}.
Each singleton concept has teaching dimension 1, while the teaching dimension of the empty concept 0 is n, because the teacher has to reveal all labels to distinguish 0 from the other concepts. However, if the teacher and the learner are cooperative, every concept can be taught with one label: if the teacher reveals a “0” label, the learner can safely assume that the target concept must be 0, because otherwise the teacher would present a “1” label instead. Equivalently, in the setting of Definition 1, the teacher and the learner can order the concepts so that the singleton concepts appear before the empty concept 0. Then every concept has a teaching set of size 1 that distinguishes it from the later concepts in the sequence, and thus the recursive teaching dimension of C is 1.\n\nIn this paper, we study the quantitative relationship between the recursive teaching dimension (RTD) and the VC dimension (VCD). A bound on the RTD that depends only on the VCD would imply a close connection between learning from random samples and teaching (under the recursive teaching model): the same structural properties that make a concept class easy to learn would also bound the number of examples needed to teach it. Moreover, the recursive teaching dimension is known to be closely related to sample compression schemes [LW86, War03, DKSZ16], and a better understanding of the relationship between RTD and VCD might help resolve the long-standing sample compression conjecture [War03], which states that every concept class has a sample compression scheme of size linear in its VCD.\n\n1.1 Our Results\n\nOur main result (Theorem 3) is an upper bound of d · 2^{d+1} on RTD(C) when VCD(C) = d. This is the first upper bound for RTD(C) that depends only on VCD(C), but not on |C|, the size of the concept class, or n, the domain size. Previously, Moran et al.
[MSWY15] showed an upper bound of O(d · 2^d log log |C|) for RTD(C); our result removes the log log |C| factor, and answers positively an open problem posed in [MSWY15].\n\nOur proof reveals examples iteratively so as to minimize the number of remaining concepts. Given a concept class C ⊆ {0, 1}^n, we pick a set of examples Y ⊆ [n] and their labels b ∈ {0, 1}^Y such that the set of remaining concepts (those with projection c|Y = b) is nonempty and has the smallest size among all choices of Y and b. We then prove by contradiction (using the assumption VCD(C) = d) that, if the size of Y is large enough (but still depends only on VCD(C)), the remaining concepts must have VC dimension at most d − 1. This procedure gives us a recursive formula, which we solve to obtain the claimed upper bound on the RTD of classes of VC dimension d.\n\nWe also improve the lower bound on the worst-case factor by which RTD may exceed VCD. We present a family of classes {Ck}k≥1 (Figure 4) with VCD(Ck) = 3k and RTD(Ck) = 5k, which shows that the worst-case ratio between RTD(C) and VCD(C) is at least 5/3. Before our work, the largest known multiplicative gap between RTD(C) and VCD(C) was a ratio of 3/2, given by Kuhlmann [Kuh99]. (Later, Doliwa et al. [DFSZ14] exhibited the smallest such class CW, with RTD(CW) = (3/2) · VCD(CW), known as Warmuth's class.) Since then, no finite concept class C with RTD(C) > (3/2) · VCD(C) has been found.\n\nInstead of exhaustively searching through all small concept classes, we achieve our improvement on the lower bound by formulating the existence of a concept class with the desired RTD, VCD, and domain size as a boolean satisfiability problem. We then run state-of-the-art SAT solvers on these formulae to discover a concept class C0 with VCD(C0) = 3 and RTD(C0) = 5.
Based on the concept class C0, one can produce a family of concept classes {Ck}k≥1 with VCD(Ck) = 3k and RTD(Ck) = 5k by taking the Cartesian product of k copies of C0: Ck = C0 × · · · × C0.\n\n2 Upper Bound on the Recursive Teaching Dimension\n\nIn this section, we prove the following upper bound on RTD(C) with respect to VCD(C).\n\nTheorem 3. Let C ⊆ {0, 1}^n be a class with VCD(C) = d. Then RTD(C) ≤ 2^{d+1}(d − 2) + d + 4.\n\nGiven a class C, we use TSmin(C) to denote the smallest integer t such that at least one concept c ∈ C has a teaching set of size t. (Notice that TSmin(C) is different from the teaching dimension, which is defined as the smallest t such that every c ∈ C has a teaching set of size at most t.)\n\nTheorem 3 follows directly from Lemma 4 and the observation that the VC dimension of a concept class does not increase when a concept is removed: after removing a concept from C, the new class C′ still has VCD(C′) ≤ d, and one can apply Lemma 4 again to obtain another concept that has a teaching set of the desired size in C′, and repeat this process.\n\nLemma 4. Let C ⊆ {0, 1}^n be a class with VCD(C) = d. Then TSmin(C) ≤ 2^{d+1}(d − 2) + d + 4.\n\nWe start with some intuition by reviewing the proof of Kuhlmann [Kuh99] that every class C with VCD(C) = 1 must have a concept c ∈ C with a teaching set of size 1. Given an index i ∈ [n] and a bit b ∈ {0, 1}, we use C^i_b to denote the set of concepts c ∈ C such that c_i = b. The proof starts by picking an index i and a bit b such that C^i_b is nonempty and has the smallest size among all choices of i and b. The proof then proceeds to show that C^i_b contains a unique concept, which by the definition of C^i_b has the teaching set {i} of size 1. To see why C^i_b must be a singleton set, assume for contradiction that it contains more than one concept. Then there exists an index j ≠ i and two concepts c, c′ ∈ C^i_b such that c_j = 0 and c′_j = 1. Since VCD(C) = 1, {i, j} cannot be shattered, and thus all the concepts c* ∈ C with c*_i = 1 − b must share the same value of c*_j, say c*_j = 0. As a result, it is easy to verify that C^j_1 is a nonempty proper subset of C^i_b, contradicting the choice of i and b at the beginning.\n\nMoran et al. [MSWY15] used a similar approach to show that every so-called (3, 6)-class C has TSmin(C) at most 3. They define a class C ⊆ {0, 1}^n to be a (3, 6)-class if for any three indices i, j, k ∈ [n], the projection of C onto {i, j, k} has at most 6 patterns. (In contrast, VCD(C) = 2 means that the projection of C has at most 7 patterns, so C being a (3, 6)-class is a stronger condition than VCD(C) = 2.) The proof of [MSWY15] starts by picking two indices i, j ∈ [n] and two bits b1, b2 ∈ {0, 1} such that C^{i,j}_{b1,b2}, i.e., the set of c ∈ C such that c_i = b1 and c_j = b2, is nonempty and has the smallest size among all choices of i, j and b1, b2. They then prove by contradiction that VCD(C^{i,j}_{b1,b2}) = 1, and combine this with [Kuh99] to conclude that TSmin(C) ≤ 3.\n\nOur proof extends this approach further. Given a concept class C ⊆ {0, 1}^n with VCD(C) = d, let k = 2^d(d − 1) + 1. We pick a set Y* ⊂ [n] of k indices and a string b* ∈ {0, 1}^k such that C^{Y*}_{b*}, the set of c ∈ C such that the projection c|Y* = b*, is nonempty and has the smallest size among all choices of Y and b. We then prove by contradiction (using the assumption VCD(C) = d) that C^{Y*}_{b*} must have VC dimension at most d − 1. This gives us a recursive formula that bounds the TSmin of classes of VC dimension d, which we solve to obtain the upper bound stated in Lemma 4. We now prove Lemma 4.\n\nProof of Lemma 4.
We prove by induction on d. Let\n\nf(d) = max_{C : VCD(C) ≤ d} TSmin(C).\n\nOur goal is to prove the following upper bound for f(d):\n\nf(d) ≤ 2^{d+1}(d − 2) + d + 4, for all d ≥ 1. (1)\n\nThe base case of d = 1 follows directly from [Kuh99].\n\nFor the induction step, we show that condition (1) holds for some d > 1, assuming that it holds for d − 1. Take any concept class C ⊆ {0, 1}^n with VCD(C) ≤ d. Let k = 2^d(d − 1) + 1. If n ≤ k then we are already done because\n\nTSmin(C) ≤ n ≤ k = 2^d(d − 1) + 1 ≤ 2^{d+1}(d − 2) + d + 4,\n\nwhere the last inequality holds for all d ≥ 1. Assume in the rest of the proof that n > k. Then any set of k indices Y ⊂ [n] partitions C into 2^k (possibly empty) subsets, denoted by\n\nC^Y_b = {c ∈ C : c|Y = b}, for each b ∈ {0, 1}^k.\n\nWe follow the approach of [Kuh99] and [MSWY15] to choose a set of k indices Y* ⊂ [n] as well as a string b* ∈ {0, 1}^k such that C^{Y*}_{b*} is nonempty and has the smallest size among all nonempty C^Y_b, over all choices of Y and b. Without loss of generality we assume below that Y* = [k] and b* = 0 is the all-zero string. For notational convenience, we also write C_b to denote C^{Y*}_b for b ∈ {0, 1}^k.\n\nNotice that if C_{b*} = C^{Y*}_{b*} has VC dimension at most d − 1, then we have\n\nTSmin(C) ≤ k + f(d − 1) ≤ 2^{d+1}(d − 2) + d + 4,\n\nusing the inductive hypothesis. This is because, according to the definition of f, one of the concepts c ∈ C_{b*} has a teaching set T ⊆ [n] \\ Y* of size at most f(d − 1) that distinguishes it from the other concepts of C_{b*}.
Thus, [k] ∪ T is a teaching set of c in the original class C, of size at most k + f(d − 1).\n\nFigure 1: An illustration for the proof of Lemma 4 (TSmin(C) ≤ 6 when d = 2). We prove by contradiction that the smallest nonempty set C^{Y*}_{b*}, after fixing five bits, has VCD(C^{Y*}_{b*}) = 1, where Y* = {1, 2, 3, 4, 5} and b* = 0. In this example, we have Z = {6, 7}, Y′ = {2, 3, 4, 6, 7} and b′ = 0. Note that C^{Y′}_0 is indeed a nonempty proper subset of C^{Y*}_0.\n\nFinally, we prove by contradiction that C_{b*} has VC dimension at most d − 1. Assume that C_{b*} has VC dimension d. Then by definition there exists a set Z ⊆ [n] \\ Y* of d indices that is shattered by C_{b*} (i.e., all 2^d possible strings appear in C_{b*} on Z). Observe that for each i ∈ Y*, the union of all C_b with b_i = 1 (recall that b* is the all-zero string) must miss at least one string on Z, which we denote by p_i (chosen arbitrarily if more than one is missing); otherwise, C would have a shattered set of size d + 1, namely Z ∪ {i}, contradicting the assumption that VCD(C) ≤ d. (See Figure 1 for an example when d = 2 and k = 5.) However, given that there are only 2^d possibilities for each p_i (and |Y*| = k = 2^d(d − 1) + 1), it follows from the pigeonhole principle that there exists a subset K ⊂ Y* of size d such that p_i = p for every i ∈ K, for some p ∈ {0, 1}^d.
Let Y′ = (Y* \\ K) ∪ Z be a new set of k indices and let b′ = 0^{k−d} ◦ p. Then C^{Y′}_{b′} is a nonempty proper subset of C^{Y*}_{b*}, a contradiction with our choice of Y* and b*. This finishes the induction and the proof of the lemma.\n\n3 Lower Bound on the Worst-Case Recursive Teaching Dimension\n\nIn this section, we present an improved lower bound on the worst-case factor by which RTD(C) may exceed VCD(C). Recall the definition of TSmin(C), the smallest number of examples needed to teach some concept c ∈ C. By definition we always have RTD(C) ≥ TSmin(C) for any class C. Kuhlmann [Kuh99] first found a class C such that RTD(C) = TSmin(C) = 3 and VCD(C) = 2, with domain size n = 16 and |C| = 24. Since then, no class C with RTD(C) > (3/2) · VCD(C) has been found. Recently, Doliwa et al. [DFSZ14] gave the smallest such class CW (Warmuth's class, as shown in Figure 2), with RTD(CW) = TSmin(CW) = 3, VCD(CW) = 2, n = 5, and |CW| = 10.
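The stated parameters of Warmuth's class are small enough to verify by brute force. The sketch below (ours, not part of the paper) builds CW as the cyclic rotations of 00011 and 01011, and computes VCD, TSmin, and RTD directly from the definitions; all helper names are our own, and the RTD routine uses the standard greedy plan of repeatedly removing the concepts that are easiest to teach.

```python
from itertools import combinations

# Warmuth's class C_W: all five cyclic rotations of 00011 and of 01011
# (10 concepts over n = 5 indices), as described for Figure 2.
def rotations(c):
    return {c[i:] + c[:i] for i in range(len(c))}

CW = sorted(rotations("00011") | rotations("01011"))

def vcd(C, n):
    """VC dimension: largest |Y| such that all 2^|Y| patterns appear on Y."""
    best = 0
    for size in range(1, n + 1):
        for Y in combinations(range(n), size):
            if len({tuple(c[i] for i in Y) for c in C}) == 2 ** size:
                best = max(best, size)
    return best

def teaching_dim(c, C, n):
    """Size of a smallest teaching set of c within C."""
    others = [d for d in C if d != c]
    for size in range(n + 1):
        for X in combinations(range(n), size):
            if all(any(c[i] != d[i] for i in X) for d in others):
                return size
    return n

def rtd(C, n):
    """RTD via the canonical plan: strip minimally-teachable concepts."""
    C, worst = list(C), 0
    while C:
        t = min(teaching_dim(c, C, n) for c in C)
        worst = max(worst, t)
        C = [c for c in C if teaching_dim(c, C, n) > t]
    return worst
```

Running `vcd(CW, 5)`, `min(teaching_dim(c, CW, 5) for c in CW)`, and `rtd(CW, 5)` reproduces the values 2, 3, and 3 quoted above.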
We can view CW as taking all five possible rotations of the two concepts (0, 0, 0, 1, 1) and (0, 1, 0, 1, 1).\n\nFigure 2: (a) Warmuth's class CW with RTD(CW) = 3 and VCD(CW) = 2, consisting of the ten concepts 00011, 00110, 01100, 11000, 10001, 01011, 10110, 01101, 11010, 10101 over x1, . . . , x5; (b) the succinct representation of CW, with one concept selected from each rotation-equivalent set of concepts: 00011 and 01011. In the original figure, the teaching set of each concept is marked with an underline.\n\nGiven CW one can obtain a family of classes {Ck}k≥1 by taking the Cartesian product of k copies:\n\nCk = C_W^k = CW × · · · × CW,\n\nand it follows from the next lemma that RTD(Ck) = TSmin(Ck) = 3k and VCD(Ck) = 2k.\n\nLemma 5 (Lemma 16 of [DFSZ14]). Given two concept classes C1 and C2, let C1 × C2 = {(c1, c2) | c1 ∈ C1, c2 ∈ C2}. Then\n\nTSmin(C1 × C2) = TSmin(C1) + TSmin(C2),\nRTD(C1 × C2) ≤ RTD(C1) + RTD(C2), and\nVCD(C1 × C2) = VCD(C1) + VCD(C2).\n\nLemma 5 allows us to focus on finding small concept classes with RTD(C) > (3/2) · VCD(C). The first attempt to find such classes is to exhaustively search over all possible binary matrices and then compute and compare their VCD and RTD. But brute-force search quickly becomes infeasible as the domain size n gets larger; for example, even the class CW has fifty 0/1 entries. Instead, we formulate the existence of a class with certain desired RTD, VCD, and domain size as a boolean satisfiability problem, and then run state-of-the-art Boolean Satisfiability (SAT) solvers to decide whether the resulting formula is satisfiable.\n\nWe briefly describe how to construct an equivalent boolean formula in conjunctive normal form (CNF).
For a fixed domain size n, we have 2^n basic variables x_c, each describing whether a concept c ∈ {0, 1}^n is included in C or not. We need the VC dimension to be at most VCD, which is equivalent to requiring that no set S ⊆ [n] of size |S| = VCD + 1 is shattered by C. So we define auxiliary variables y_{(S,b)}, for each set S of size |S| = VCD + 1 and every string b ∈ {0, 1}^S, indicating whether the pattern b appears in the projection of C on S or not. These auxiliary variables are determined by the basic variables, and for every such S, at least one of the 2^|S| patterns must be missing on S.\n\nFor the minimum teaching dimension to be at least RTD, we cannot teach any row with RTD − 1 labels. So for every concept c and every set of indices T ⊆ [n] of size |T| = RTD − 1, we need at least one other concept c′ ≠ c satisfying c|T = c′|T, so that c′ is there to “confuse” c on T. As an example, we list one clause of each type, from the SAT instance with n = 5, VCD = 2, and RTD = 3:\n\nx_{01011} → y_{({1,2,3}, 010)},    ⋁_b ¬y_{({1,2,3}, b)},    x_{01011} → ⋁_{b ≠ 011} x_{(01, b)}.\n\nNote that there are many ways to formulate our problem as a SAT instance. For example, we could directly use a boolean variable for each entry of the matrix, but in our experiments the SAT solvers run faster using the formulation described above. The SAT solvers we use are Lingeling [Bie15] and Glucose [AS14] (based on MiniSAT [ES03]).
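To make the encoding concrete, here is a small sketch (ours, not the authors' code) that emits the clauses just described as lists of signed integers in the DIMACS convention; the variable numbering and the helper name `encode` are our own choices.

```python
from itertools import combinations, product

def encode(n, vcd, rtd):
    """CNF asking for a nonempty C in {0,1}^n with VCD(C) <= vcd
    and TSmin(C) >= rtd. Returns (clauses, number_of_variables)."""
    concepts = list(product((0, 1), repeat=n))
    x = {c: i + 1 for i, c in enumerate(concepts)}   # basic variables x_c
    clauses, nxt = [], len(concepts) + 1
    # VC constraint: no (vcd+1)-subset S of indices is shattered.
    for S in combinations(range(n), vcd + 1):
        y = {}                                        # auxiliary y_(S,b)
        for b in product((0, 1), repeat=vcd + 1):
            y[b], nxt = nxt, nxt + 1
        for c in concepts:                            # x_c -> y_(S, c|S)
            clauses.append([-x[c], y[tuple(c[i] for i in S)]])
        # at least one of the 2^(vcd+1) patterns must be missing on S
        clauses.append([-v for v in y.values()])
    # Teaching constraint: no concept is identified by rtd-1 labels.
    for c in concepts:
        for T in combinations(range(n), rtd - 1):
            confusers = [x[d] for d in concepts
                         if d != c and all(d[i] == c[i] for i in T)]
            clauses.append([-x[c]] + confusers)
    clauses.append([x[c] for c in concepts])          # C is nonempty
    return clauses, nxt - 1
```

For the paper's instance (n = 5, VCD = 2, RTD = 3) this produces 32 basic variables plus 8 auxiliary variables for each of the C(5,3) = 10 index triples; the clause list can then be handed to any DIMACS-compatible solver such as Lingeling or Glucose.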
We are able to rediscover CW and rule out the existence of concept classes for certain small values of (VCD, RTD, n); see Figure 3.\n\nFigure 3: The satisfiability of the boolean formulae for small values of VCD(C), RTD(C), and n:\n\nVCD(C)  RTD(C)  n (domain size)  Satisfiable  Concept Class\n2       3       5                Yes          CW (Figure 2)\n2       4       7                No\n3       5       7                No\n3       6       8                No\n4       6       7                No\n4       7       8                No\n3       5       12               Yes          Figure 4\n\nUnfortunately, for n > 8 even these SAT solvers are no longer feasible. We use another heuristic to speed up the SAT solvers when we conjecture the formula to be satisfiable: we add additional clauses so that the formula has fewer solutions (but is hopefully still satisfiable) and is faster to solve. More specifically, we bundle all the rotation-equivalent concepts; that is, if we include a concept, we must also include all of its rotations. With this restriction, we can reduce the number of variables by having one for each rotation-equivalent set; we can also reduce the number of clauses, since if S is not shattered, then all rotations of S are also not shattered.\n\nWe manage to find a class C0 with RTD(C0) = TSmin(C0) = 5, VCD(C0) = 3, and domain size n = 12. A succinct representation of C0 is given in Figure 4, where all rotation-equivalent concepts (i.e., rows) are omitted. The first 8 rows each represent 12 concepts, and the last row represents 4 concepts (because it is more symmetric), for a total of |C0| = 100 concepts. We also include a text file with the entire concept class C0 (as a 100 × 12 matrix) in the supplemental material.
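The additivity in Lemma 5 is easy to check computationally on a small case. The sketch below (ours, with our own helper names) takes the singleton class from Example 2, forms its Cartesian product with itself, and confirms by brute force that VCD and TSmin both add up.

```python
from itertools import combinations

def vcd(C, n):
    """Largest size of an index set on which C realizes all patterns."""
    best = 0
    for size in range(1, n + 1):
        for Y in combinations(range(n), size):
            if len({tuple(c[i] for i in Y) for c in C}) == 2 ** size:
                best = max(best, size)
    return best

def teaching_dim(c, C, n):
    """Smallest |X| such that c|X differs from every other concept in C."""
    others = [d for d in C if d != c]
    for size in range(n + 1):
        for X in combinations(range(n), size):
            if all(any(c[i] != d[i] for i in X) for d in others):
                return size
    return n

def tsmin(C, n):
    return min(teaching_dim(c, C, n) for c in C)

# Example 2's class on n = 3: the empty concept plus all singletons.
C = ["000", "100", "010", "001"]
# The Cartesian product C x C, written as concatenated strings over n = 6.
CxC = [a + b for a in C for b in C]
```

Here `vcd(C, 3)` and `tsmin(C, 3)` are both 1, and for the 16-concept product `CxC` both quantities become 2, matching the TSmin and VCD equalities of Lemma 5; the same product construction applied to C0 yields the family {Ck} used in the lower bound.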
Applying Lemma 5, we obtain a family of concept classes {Ck}k≥1, where Ck = C0 × · · · × C0 is the Cartesian product of k copies of C0, that satisfies RTD(Ck) = 5k and VCD(Ck) = 3k.\n\n4 Conclusion and Open Problem\n\nWe improve the best known upper and lower bounds for the worst-case recursive teaching dimension with respect to the VC dimension. Given a concept class C with d = VCD(C), we improve the upper bound RTD(C) = O(d · 2^d log log |C|) of Moran et al. [MSWY15] to 2^{d+1}(d − 2) + d + 4, removing the log log |C| factor as well as the dependency on |C|. In addition, we improve the lower bound max_C (RTD(C)/VCD(C)) ≥ 3/2 of Kuhlmann [Kuh99] to max_C (RTD(C)/VCD(C)) ≥ 5/3.\n\nOur results are a step towards answering the following question, posed by Simon and Zilles [SZ15]: Is RTD(C) = O(VCD(C))?\n\nWhile Kuhlmann [Kuh99] showed that RTD(C) = 1 when VCD(C) = 1, the simplest case that is still open is to give a tight bound on RTD(C) when VCD(C) = 2: Doliwa et al. [DFSZ14] presented a concept class C (Warmuth's class) with RTD(C) = 3, while our Theorem 3 shows that RTD(C) ≤ 6.\n\nFigure 4: The succinct representation of a concept class C0 with RTD(C0) = 5 and VCD(C0) = 3, over x1, . . . , x12, one row per rotation-equivalence class (in the original figure, the teaching set of each concept is marked with an underline):\n\n0 0 0 0 0 1 0 1 0 1 0 1\n0 0 0 0 0 1 1 1 0 1 0 1\n0 0 0 0 1 1 0 1 0 1 0 1\n0 0 0 1 0 1 1 1 0 1 0 1\n0 0 0 1 1 1 0 1 0 1 0 1\n0 0 1 1 0 1 0 1 0 1 0 1\n0 0 1 1 0 1 1 1 0 1 0 1\n0 1 0 1 0 1 1 1 0 1 1 1\n0 1 1 1 0 1 1 1 0 1 1 1\n\nAcknowledgments\n\nWe thank the anonymous reviewers for their helpful comments and suggestions. We also thank Joseph Bebel for pointing us to the SAT solvers. This work was done in part while the authors were visiting the Simons Institute for the Theory of Computing.
Xi Chen is supported by NSF grants CCF-1149257 and CCF-1423100. Yu Cheng is supported in part by Shang-Hua Teng's Simons Investigator Award. Bo Tang is supported by ERC grant 321171.\n\nReferences\n\n[AS14] G. Audemard and L. Simon. Glucose 4.0. 2014. Available at http://www.labri.fr/perso/lsimon/glucose.\n\n[BEHW89] A. Blumer, A. Ehrenfeucht, D. Haussler, and M. K. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. J. ACM, 36(4):929–965, 1989.\n\n[Bie15] A. Biere. Lingeling, Plingeling and Treengeling. 2015. Available at http://fmv.jku.at/lingeling.\n\n[DFSZ14] T. Doliwa, G. Fan, H.-U. Simon, and S. Zilles. Recursive teaching dimension, VC-dimension and sample compression. Journal of Machine Learning Research, 15(1):3107–3131, 2014.\n\n[DKSZ16] M. Darnstädt, T. Kiss, H.-U. Simon, and S. Zilles. Order compression schemes. Theoretical Computer Science, 620:73–90, 2016.\n\n[DSZ10] T. Doliwa, H.-U. Simon, and S. Zilles. Recursive teaching dimension, learning complexity, and maximum classes. In Proceedings of the 21st International Conference on Algorithmic Learning Theory, pages 209–223, 2010.\n\n[ES03] N. Eén and N. Sörensson. An extensible SAT-solver. In Theory and Applications of Satisfiability Testing, 6th International Conference, SAT 2003, pages 502–518, 2003.\n\n[GK95] S. A. Goldman and M. J. Kearns. On the complexity of teaching. Journal of Computer and System Sciences, 50(1):20–31, 1995.\n\n[Kuh99] C. Kuhlmann. On teaching and learning intersection-closed concept classes. In Proceedings of the 4th European Conference on Computational Learning Theory, pages 168–182, 1999.\n\n[LW86] N. Littlestone and M. Warmuth. Relating data compression and learnability. Technical report, University of California, Santa Cruz, 1986.\n\n[MSWY15] S. Moran, A. Shpilka, A. Wigderson, and A. Yehudayoff. Compressing and teaching for low VC-dimension.
In Proceedings of the 56th IEEE Annual Symposium on Foundations of Computer Science, pages 40–51, 2015.\n\n[SM90] A. Shinohara and S. Miyano. Teachability in computational learning. In Proceedings of the 1st International Workshop on Algorithmic Learning Theory (ALT), pages 247–255, 1990.\n\n[SSYZ14] R. Samei, P. Semukhin, B. Yang, and S. Zilles. Algebraic methods proving Sauer's bound for teaching complexity. Theoretical Computer Science, 558:35–50, 2014.\n\n[SZ15] H.-U. Simon and S. Zilles. Open problem: Recursive teaching dimension versus VC dimension. In Proceedings of the 28th Conference on Learning Theory, pages 1770–1772, 2015.\n\n[VC71] V. N. Vapnik and A. Ya. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and Its Applications, 16:264–280, 1971.\n\n[War03] M. K. Warmuth. Compressing to VC dimension many points. In Proceedings of the 16th Annual Conference on Computational Learning Theory and 7th Kernel Workshop, COLT/Kernel 2003, pages 743–744, 2003.\n\n[WY12] A. Wigderson and A. Yehudayoff. Population recovery and partial identification. In Proceedings of the 53rd IEEE Annual Symposium on Foundations of Computer Science, pages 390–399, 2012.\n\n[ZLHZ11] S. Zilles, S. Lange, R. Holte, and M. Zinkevich. Models of cooperative teaching and learning. Journal of Machine Learning Research, 12:349–384, 2011.", "award": [], "sourceid": 1129, "authors": [{"given_name": "Xi", "family_name": "Chen", "institution": "Columbia University"}, {"given_name": "Yu", "family_name": "Cheng", "institution": "University of Southern California"}, {"given_name": "Bo", "family_name": "Tang", "institution": "University of Oxford"}]}