{"title": "Subsampled Power Iteration: a Unified Algorithm for Block Models and Planted CSP's", "book": "Advances in Neural Information Processing Systems", "page_first": 2836, "page_last": 2844, "abstract": "We present an algorithm for recovering planted solutions in two well-known models, the stochastic block model and planted constraint satisfaction problems (CSP), via a common generalization in terms of random bipartite graphs. Our algorithm matches up to a constant factor the best-known bounds for the number of edges (or constraints) needed for perfect recovery and its running time is linear in the number of edges used. The time complexity is significantly better than both spectral and SDP-based approaches.The main contribution of the algorithm is in the case of unequal sizes in the bipartition that arises in our reduction from the planted CSP. Here our algorithm succeeds at a significantly lower density than the spectral approaches, surpassing a barrier based on the spectral norm of a random matrix.Other significant features of the algorithm and analysis include (i) the critical use of power iteration with subsampling, which might be of independent interest; its analysis requires keeping track of multiple norms of an evolving solution (ii) the algorithm can be implemented statistically, i.e., with very limited access to the input distribution (iii) the algorithm is extremely simple to implement and runs in linear time, and thus is practical even for very large instances.", "full_text": "Subsampled Power Iteration: a Uni\ufb01ed Algorithm for\n\nBlock Models and Planted CSP\u2019s\n\nVitaly Feldman\n\nIBM Research - Almaden\n\nvitaly@post.harvard.edu\n\nWill Perkins\n\nUniversity of Birmingham\n\nw.f.perkins@bham.ac.uk\n\nSantosh Vempala\n\nGeorgia Tech\n\nvempala@cc.gatech.edu\n\nAbstract\n\nWe present an algorithm for recovering planted solutions in two well-known mod-\nels, the stochastic block model and planted constraint satisfaction problems (CSP),\nvia 
a common generalization in terms of random bipartite graphs. Our algorithm matches up to a constant factor the best-known bounds for the number of edges (or constraints) needed for perfect recovery and its running time is linear in the number of edges used. The time complexity is significantly better than both spectral and SDP-based approaches.\nThe main contribution of the algorithm is in the case of unequal sizes in the bipartition that arises in our reduction from the planted CSP. Here our algorithm succeeds at a significantly lower density than the spectral approaches, surpassing a barrier based on the spectral norm of a random matrix.\nOther significant features of the algorithm and analysis include (i) the critical use of power iteration with subsampling, which might be of independent interest; its analysis requires keeping track of multiple norms of an evolving solution; (ii) the algorithm can be implemented statistically, i.e., with very limited access to the input distribution; (iii) the algorithm is extremely simple to implement and runs in linear time, and thus is practical even for very large instances.\n\n1 Introduction\n\nA broad class of learning problems fits into the framework of obtaining a sequence of independent random samples from an unknown distribution, and then (approximately) recovering this distribution using as few samples as possible. 
We consider two natural instances of this framework: the stochastic block model, in which a random graph is formed by choosing edges independently at random with probabilities that depend on whether an edge crosses a planted partition, and planted k-CSP's (or planted k-SAT), in which width-k boolean constraints are chosen independently at random with probabilities that depend on their evaluation on a planted assignment to a set of boolean variables.\nWe propose a natural bipartite generalization of the stochastic block model, and then show that planted k-CSP's can be reduced to this model, thus unifying graph partitioning and planted CSP's into one problem. We then give an algorithm for solving random instances of the model. Our algorithm is optimal up to a constant factor in terms of the number of sampled edges and running time for the bipartite block model; for planted CSP's the algorithm matches up to log factors the best possible sample complexity in several restricted computational models and the best-known bounds for any algorithm. A key feature of the algorithm is that when one side of the bipartition is much larger than the other, our algorithm succeeds at significantly lower edge densities than using Singular Value Decomposition (SVD) on the rectangular adjacency matrix. Details are in Sec. 5.\nThe bipartite block model begins with two vertex sets, V1 and V2 (of possibly unequal size), each with a balanced partition, (A1, B1) and (A2, B2) respectively. Edges are added independently at random between V1 and V2 with probabilities that depend on which parts the endpoints are in: edges between A1 and A2 or B1 and B2 are added with probability δp, while the other edges are added with probability (2 − δ)p, where δ ∈ [0, 2] and p is the overall edge density. To obtain the stochastic block model we can identify V1 and V2. 
To reduce planted CSP's to this model, we first reduce the problem to an instance of noisy r-XOR-SAT, where r is the complexity parameter of the planted CSP distribution defined in [19] (see Sec. 2 for details). We then identify V1 with literals, and V2 with (r − 1)-tuples of literals, and add an edge between literal l ∈ V1 and tuple t ∈ V2 when the r-clause consisting of their union appears in the formula. The reduction leads to a bipartition with V2 much larger than V1.\nOur algorithm is based on applying power iteration with a sequence of matrices subsampled from the original adjacency matrix. This is in contrast to previous algorithms that compute the eigenvectors (or singular vectors) of the full adjacency matrix. Such an algorithm, for the special case of square matrices, was previously proposed and analyzed in a different context by Korada et al. [25]. Our algorithm has several advantages.\n\n• Up to a constant factor, the algorithm matches the best-known (and in some cases the best-possible) edge or constraint density needed for complete recovery of the planted partition or assignment. The algorithm for planted CSP's finds the planted assignment using O(n^{r/2} · log n) clauses for a clause distribution of complexity r (see Sec. 2 for the formal definition), nearly matching computational lower bounds for SDP hierarchies [30] and the class of statistical algorithms [19].\n• The algorithm is fast, running in time linear in the number of edges or constraints used, unlike other approaches that require computing eigenvectors or solving semi-definite programs.\n• The algorithm is conceptually simple and easy to implement. In fact it can be implemented in the statistical query model, with very limited access to the input graph [19].\n• It is based on the idea of iteration with subsampling, which may have further applications in the design and analysis of algorithms.\n• Most notably, the algorithm succeeds where generic spectral approaches fail. For the case of the planted CSP, when |V2| ≫ |V1|, our algorithm succeeds at a polynomial factor sparser density than the approaches of McSherry [28], Coja-Oghlan [7], and Vu [33]. The algorithm succeeds despite the fact that the 'energy' of the planted vector with respect to the random adjacency matrix is far below the spectral norm of the matrix. In previous analyses, this was believed to indicate failure of the spectral approach. See Sec. 5.\n\n1.1 Related work\n\nThe algorithm of Mossel, Neeman and Sly [29] for the standard stochastic block model also runs in near linear time, while other known algorithmic approaches for planted partitioning that succeed near the optimal edge density [28, 7, 27] perform eigenvector or singular vector computations and thus require superlinear time, though a careful randomized implementation of low-rank approximations can reduce the running time of McSherry's algorithm substantially [2].\nFor planted satisfiability, the algorithm of Flaxman for planted 3-SAT works for a subset of planted distributions (those with distribution complexity at most 2 in our definition below) using O(n) constraints, while the algorithm of Coja-Oghlan, Cooper, and Frieze [8] works for planted 3-SAT distributions that exclude unsatisfied clauses and uses O(n^{3/2} ln^{10} n) constraints.\nThe only previous algorithm that finds the planted assignment for all distributions of planted k-CSP's is the SDP-based algorithm of Bogdanov and Qiao [5] with the folklore generalization to r-wise independent predicates (cf. [30]). Similar to our algorithm, it uses Õ(n^{r/2}) constraints. This algorithm effectively solves the noisy r-XOR-SAT instance and therefore can also be used to solve our general version of planted satisfiability using Õ(n^{r/2}) clauses (via the reduction in Sec. 
4).\nNotably, for both this algorithm and ours, having a completely satisfying planted assignment plays no special role: the number of constraints required depends only on the distribution complexity. To the best of our knowledge, our algorithm is the first for the planted k-SAT problem that runs in linear time in the number of constraints used.\nIt is important to note that in planted k-CSP's, the planted assignment becomes recoverable with high probability after at most O(n log n) random clauses, yet the best known efficient algorithms require n^{Ω(r/2)} clauses. Problems exhibiting this type of behavior have attracted significant interest in learning theory [4, 12, 31, 15, 32, 3, 10, 16] and some of the recent hardness results are based on the conjectured computational hardness of the k-SAT refutation problem [10, 11].\nOur algorithm is arguably simpler than the approach in [5] and substantially improves the running time even for small k. Another advantage of our approach is that it can be implemented using restricted access to the distribution of constraints referred to as statistical queries [24, 17]. Roughly speaking, for the planted SAT problem this access allows an algorithm to evaluate multi-valued functions of a single clause on randomly drawn clauses, or to estimate expectations of such functions, without direct access to the clauses themselves. Recently, in [19], lower bounds on the number of clauses necessary for a polynomial-time statistical algorithm to solve planted k-CSPs were proved. It is therefore important to understand the power of such algorithms for solving planted k-CSPs. A statistical implementation of our algorithm gives an upper bound that nearly matches the lower bound for the problem. 
See [19] for the formal details of the model and the statistical implementation of our algorithm.\nKorada, Montanari and Oh [25] analyzed the 'Gossip PCA' algorithm, which for the special case of an equal bipartition is the same as our subsampled power iteration. The assumptions, model, and motivation in the two papers are different and the results incomparable. In particular, while our focus and motivation are on general (nonsquare) matrices, their work considers extracting a planting of rank k greater than 1 in the square setting. Their results also assume an initial vector with non-trivial correlation with the planted vector. The nature of the guarantees is also different.\n\n2 Model and results\n\nBipartite stochastic block model:\nDefinition 1. For δ ∈ [0, 2] \\ {1}, n1, n2 even, and P1 = (A1, B1), P2 = (A2, B2) bipartitions of vertex sets V1, V2 of size n1, n2 respectively, we define the bipartite stochastic block model B(n1, n2, P1, P2, δ, p) to be the random graph in which edges between vertices in A1 and A2 and B1 and B2 are added independently with probability δp, and edges between vertices in A1 and B2 and B1 and A2 with probability (2 − δ)p.\nHere δ is a fixed constant while p will tend to 0 as n1, n2 → ∞. Note that setting n1 = n2 = n, and identifying A1 with A2 and B1 with B2, gives the usual stochastic block model (with loops allowed); for edge probabilities a/n and b/n, we have δ = 2a/(a + b) and p = (a + b)/2n, the overall edge density. For our application to k-CSP's, it will be crucial to allow vertex sets of very different sizes, i.e. n2 ≫ n1.\nThe algorithmic task for the bipartite block model is to recover one or both partitions (completely or partially) using as few edges and as little computational time as possible. 
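Definition 1 can be sampled directly. The following minimal sketch (the function name and NumPy encoding are ours, not the paper's) fixes P1 and P2 to be the first-half/second-half splits of V1 and V2 and returns the n1 × n2 biadjacency matrix together with the planted ±1 labelings of the two sides:

```python
import numpy as np

def sample_bipartite_block_model(n1, n2, delta, p, rng=None):
    """Sample B(n1, n2, P1, P2, delta, p) with half/half partitions.

    Returns the n1 x n2 biadjacency matrix A and the +/-1 side labels
    u (V1) and v (V2); +1 marks A1/A2, -1 marks B1/B2.
    """
    rng = np.random.default_rng(rng)
    u = np.repeat([1, -1], [n1 // 2, n1 - n1 // 2])
    v = np.repeat([1, -1], [n2 // 2, n2 - n2 // 2])
    # Edge probability is delta*p when the endpoints' parts agree
    # (A1-A2 or B1-B2) and (2 - delta)*p when they disagree.
    prob = np.where(np.outer(u, v) == 1, delta * p, (2 - delta) * p)
    A = (rng.random((n1, n2)) < prob).astype(np.int8)
    return A, u, v
```

For instance, at the extreme δ = 2 every edge respects the planted partition, so all edges fall between agreeing parts.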
In this work we will assume that n1 ≤ n2, and we will be concerned with the algorithmic task of recovering the partition P1 completely, as this will allow us to solve the planted k-CSP problems described below. We define complete recovery of P1 as finding the exact partition with high probability over the randomness in the graph and in the algorithm.\nTheorem 1. Assume n1 ≤ n2. There is a constant C so that the Subsampled Power Iteration algorithm described below completely recovers the partition P1 in the bipartite stochastic block model B(n1, n2, P1, P2, δ, p) with probability 1 − o(1) as n1 → ∞ when p ≥ C log n1 / ((δ − 1)^2 √(n1 n2)). Its running time is O(√(n1 n2) · log n1 / (δ − 1)^2).\nNote that for the usual stochastic block model this gives an algorithm using O(n log n) edges and O(n log n) time, which is the best possible for complete recovery since that many edges are needed for every vertex to appear in at least one edge. With edge probabilities a log n/n and b log n/n, our results require (a − b)^2 ≥ C(a + b) for some absolute constant C, matching the dependence on a and b in [6, 28] (see [1] for a discussion of the best possible threshold for complete recovery).\nFor any n1, n2, at least √(n1 n2) edges are necessary for even non-trivial partial recovery, as below that threshold the graph consists only of small components (and even if a correct partition is found on each component, correlating the partitions of different components is impossible). 
Similarly, at least Ω(√(n1 n2) log n1) edges are needed for complete recovery of P1, since below that density there are vertices in V1 joined only to vertices of degree 1 in V2.\nFor very lopsided graphs, with n2 ≫ n1 log^2 n1, the running time is sublinear in the size of V2; this requires careful implementation and is essential to achieving the running time bounds for planted CSP's described below.\n\nPlanted k-CSP's: We now describe a general model for planted satisfiability problems introduced in [19]. For an integer k, let Ck be the set of all ordered k-tuples of literals over the variables x1, . . . , xn (the variables and their negations), with no repetition of variables. For a k-tuple of literals C and an assignment σ, σ(C) denotes the vector of values that σ assigns to the literals in C. A planting distribution Q : {±1}^k → [0, 1] is a probability distribution over {±1}^k.\nDefinition 2. Given a planting distribution Q : {±1}^k → [0, 1], and an assignment σ ∈ {±1}^n, we define the random constraint satisfaction problem F_{Q,σ}(n, m) by drawing m k-clauses from Ck independently according to the distribution\n\nQ_σ(C) = Q(σ(C)) / Σ_{C′ ∈ Ck} Q(σ(C′)),\n\nwhere σ(C) is the vector of values that σ assigns to the k-tuple of literals comprising C.\nDefinition 3. The distribution complexity r(Q) of the planting distribution Q is the smallest integer r ≥ 1 so that there is some S ⊆ [k], |S| = r, for which the discrete Fourier coefficient Q̂(S) is non-zero.\nIn other words, the distribution complexity of Q is r if Q is an (r − 1)-wise independent distribution on {±1}^k but not an r-wise independent distribution. The uniform distribution over all clauses, Q ≡ 2^{−k}, has Q̂(S) = 0 for all |S| ≥ 1, and so we define its complexity to be ∞. 
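Definition 3 can be checked directly from the Fourier coefficients. The sketch below (the function name, dictionary encoding, and None-for-infinity convention are our own) computes r(Q) for a distribution Q given as a map from {±1}^k tuples to probabilities:

```python
from itertools import combinations, product
from math import prod

def distribution_complexity(Q, k, tol=1e-12):
    """Smallest r >= 1 with a nonzero Fourier coefficient Q^(S), |S| = r
    (Definition 3); returns None for the uniform distribution, whose
    complexity is defined to be infinity."""
    cube = list(product((1, -1), repeat=k))
    for r in range(1, k + 1):
        for S in combinations(range(k), r):
            # Q^(S) = E_{x ~ uniform({+-1}^k)}[Q(x) * chi_S(x)],
            # where chi_S(x) = prod_{i in S} x_i.
            coeff = sum(Q[x] * prod(x[i] for i in S) for x in cube) / 2 ** k
            if abs(coeff) > tol:
                return r
    return None
```

For example, for k = 2 the distribution supported on the two satisfying XOR assignments (1, 1) and (−1, −1) is 1-wise independent but not 2-wise independent, so its complexity is 2.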
The uniform distribution does not reveal any information about σ, and so inference is impossible. For any Q that is not the uniform distribution over clauses, we have 1 ≤ r(Q) ≤ k.\nNote that the uniform distribution on k-SAT clauses with at least one satisfied literal under σ has distribution complexity r = 1. Having r = 1 means that there is a bias towards either true or false literals. In this case, a very simple algorithm is effective: for each variable, count the number of times it appears negated and not negated, and take the majority vote. For distributions with complexity r ≥ 2, the expected numbers of true and false literals in the random formula are equal, and so this simple algorithm fails.\nTheorem 2. For any planting distribution Q, there exists an algorithm that for any assignment σ, given an instance of F_{Q,σ}(n, m), completely recovers the planted assignment σ for m = O(n^{r/2} log n) using O(n^{r/2} log n) time, where r ≥ 2 is the distribution complexity of Q. For distribution complexity r = 1, there is an algorithm that gives non-trivial partial recovery with O(n^{1/2}) constraints and complete recovery with O(n log n) constraints.\n\n3 The algorithm\n\nWe now present our algorithm for the bipartite stochastic block model. We define vectors u and v of dimension n1 and n2 respectively, indexed by V1 and V2, with ui = 1 for i ∈ A1, ui = −1 for i ∈ B1, and similarly for v. To recover the partition P1 it suffices to find either u or −u. We will find this vector by multiplying a random initial vector x0 by a sequence of centered adjacency matrices and their transposes.\nWe form these matrices as follows: let Gp be the random bipartite graph drawn from the model B(n1, n2, P1, P2, δ, p), and T a positive integer. Then form T different bipartite graphs G1, . . .
, GT on the same vertex sets V1, V2 by placing each edge from Gp uniformly and independently at random into one of the T graphs. The resulting graphs have the same marginal distribution.\nNext we form the n1 × n2 adjacency matrices A1, . . . , AT for G1, . . . , GT, with rows indexed by V1 and columns by V2, and a 1 in entry (i, j) if vertex i ∈ V1 is joined to vertex j ∈ V2. Finally we center the matrices by defining Mi = Ai − (p/T) J, where J is the n1 × n2 all ones matrix.\nThe basic iterative steps are the multiplications y = M^T x and x = M y.\n\nAlgorithm: Subsampled Power Iteration.\n\n1. Form T = 10 log n1 matrices M1, . . . , MT by uniformly and independently assigning each edge of the bipartite block model to a graph G1, . . . , GT, then forming the matrices Mi = Ai − (p/T) J, where Ai is the adjacency matrix of Gi and J is the all ones matrix.\n2. Sample x ∈ {±1}^{n1} uniformly at random and let x0 = x/√n1.\n3. For i = 1 to T/2 let\n\ny_i = M^T_{2i−1} x_{i−1} / ‖M^T_{2i−1} x_{i−1}‖;   x_i = M_{2i} y_i / ‖M_{2i} y_i‖;   z_i = sgn(x_i).\n\n4. For each coordinate j ∈ [n1] take the majority vote of the signs of z^i_j for all i ∈ {T/4, . . . , T/2} and call this vector v:\n\nv_j = sgn( Σ_{i=T/4}^{T/2} z^i_j ).\n\n5. Return the partition indicated by v.\n\nThe analysis of the subsampled power iteration algorithm proceeds in four phases, during which we track the progress of the two vectors xi and yi, as measured by their inner products with u and v respectively. We define Ui := u · xi and Vi := v · yi. Here we give an overview of each phase:\n\n• Phase 1. Within log n1 iterations, |Ui| reaches log n1. 
We show that conditioned on the value of Ui, there is at least a 1/2 chance that |Ui+1| ≥ 2|Ui|; that Ui never gets too small; and that in log n1 steps, a run of log log n1 doublings pushes the magnitude of Ui above log n1.\n• Phase 2. After reaching log n1, |Ui| makes steady, predictable progress, doubling at each step whp until it reaches Θ(√n1), at which point we say xi has strong correlation with u.\n• Phase 3. Once xi is strongly correlated with u, we show that zi+1 agrees with either u or −u on a large fraction of coordinates.\n• Phase 4. We show that taking the majority vote of the coordinate-by-coordinate signs of zi over O(log n1) additional iterations gives complete recovery whp.\n\nRunning time: If n2 = Θ(n1), then a straightforward implementation of the algorithm runs in time linear in the number of edges used: each entry of xi = M yi (resp. yi = M^T x_{i−1}) can be computed as a sum over the edges in the graph associated with M. The rounding and majority vote are both linear in n1. However, if n2 ≫ n1, then simply initializing the vector yi will take too much time. In this case, we have to implement the algorithm more carefully.\nSay we have a vector x_{i−1} and want to compute xi = M_{2i} yi without storing the vector yi. Instead of computing yi = M^T_{2i−1} x_{i−1}, we create a set Si ⊂ V2 of all vertices with degree at least 1 in the current graph G_{2i−1} corresponding to the matrix M_{2i−1}. The size of Si is bounded by the number of edges in G_{2i−1}, and checking membership can be done in constant time with a data structure of size O(|Si|) that requires expected time O(|Si|) to create [21].\nRecall that M_{2i−1} = A_{2i−1} − qJ, where q = p/T. 
Then we can write\n\nyi = (A_{2i−1} − qJ)^T x_{i−1} = ŷ − q (Σ_{j=1}^{n1} x^{i−1}_j) 1_{n2} = ŷ − qL 1_{n2},\n\nwhere ŷ is 0 on coordinates j ∉ Si, L = Σ_{j=1}^{n1} x^{i−1}_j, and 1_{n2} is the all ones vector of length n2.\nThen to compute xi = M_{2i} yi, we write\n\nxi = (A_{2i} − qJ) yi = (A_{2i} − qJ)(ŷ − qL 1_{n2}) = (A_{2i} − qJ) ŷ − qL A_{2i} 1_{n2} + q^2 L J 1_{n2} = A_{2i} ŷ − qJ ŷ − qL A_{2i} 1_{n2} + q^2 L n2 1_{n1}.\n\nWe bound the running time of the computation as follows: we can compute ŷ in time linear in the number of edges of G_{2i−1} using Si. Given ŷ, computing A_{2i} ŷ is linear in the number of edges of G_{2i}, and computing qJ ŷ is linear in the number of non-zero entries of ŷ, which is bounded by the number of edges of G_{2i−1}. Computing L = Σ_{j=1}^{n1} x^{i−1}_j is linear in n1 and gives q^2 L n2 1_{n1}. Computing qL A_{2i} 1_{n2} is linear in the number of edges of G_{2i}. All together this gives our linear time implementation.\n\n4 Reduction of planted k-CSP's to the block model\n\nHere we describe how solving the bipartite block model suffices to solve the planted k-CSP problems. Consider a planted k-SAT problem F_{Q,σ}(n, m) with distribution complexity r. Let S ⊆ [k], |S| = r, be such that Q̂(S) = η ≠ 0. Such an S exists from the definition of the distribution complexity. We assume that we know both r and this set S, as trying all possibilities (smallest first) requires only a constant factor (2^r) more time.\nWe will restrict each k-clause in the formula to an r-clause, by taking the r literals specified by the set S. 
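As a concrete sketch of this restriction step and of the edge construction used later in this section (the function names and the signed-integer literal encoding are ours, not the paper's):

```python
def restrict_clause(clause, S):
    """Restrict an ordered k-clause to the r literals indexed by S.

    Literals are encoded as nonzero ints: +i for variable x_i,
    -i for its negation.
    """
    return tuple(clause[i] for i in sorted(S))

def clause_to_edge(r_clause):
    """Map an r-clause (l1, ..., lr) to the bipartite block model edge
    between the literal l1 in V1 and the (r-1)-tuple (l2, ..., lr) in V2."""
    return r_clause[0], tuple(r_clause[1:])
```

Applying `restrict_clause` to every drawn k-clause and then `clause_to_edge` to each resulting r-clause yields the edges of the bipartite graph analyzed below.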
If the distribution Q is known to be symmetric with respect to the order of the k literals in each clause, or if clauses are given as unordered sets of literals, then we can simply sample a random set of r literals (without replacement) from each clause.\nWe will show that restricting to these r literals from each k-clause induces a distribution on r-clauses defined by Q^δ : {±1}^r → R+ of the form Q^δ(C) = δ/2^r for |C| even, Q^δ(C) = (2 − δ)/2^r for |C| odd, for some δ ∈ [0, 2], δ ≠ 1, where |C| is the number of TRUE literals in C under σ. This reduction allows us to focus on algorithms for the specific case of a parity-based distribution on r-clauses with distribution complexity r.\nRecall that for a function f : {−1, 1}^k → R, its Fourier coefficients are defined for each subset S ⊂ [k] as\n\nf̂(S) = E_{x∼{−1,1}^k}[f(x) χ_S(x)],\n\nwhere the χ_S are the Walsh basis functions of {±1}^k with respect to the uniform probability measure, i.e., χ_S(x) = Π_{i∈S} x_i.\nLemma 1. If the function Q : {±1}^k → R+ defines a distribution Q_σ on k-clauses with distribution complexity r and planted assignment σ, then for some S ⊆ [k], |S| = r and δ ∈ [0, 2] \\ {1}, choosing the r literals with indices in S from a clause drawn randomly from Q_σ yields a random r-clause from Q^δ_σ.\nProof. From Definition 3 we have that there exists an S with |S| = r such that Q̂(S) ≠ 0. 
Note that by definition,\n\nQ̂(S) = E_{x∼{±1}^k}[Q(x) χ_S(x)] = (1/2^k) Σ_{x∈{±1}^k} Q(x) χ_S(x) = (1/2^k) ( Σ_{x∈{±1}^k : x_S even} Q(x) − Σ_{x∈{±1}^k : x_S odd} Q(x) ) = (1/2^k) (Pr[x_S even] − Pr[x_S odd]),\n\nwhere x_S is x restricted to the coordinates in S, and so if we take δ = 1 + 2^k Q̂(S), the distribution induced by restricting k-clauses to the r-clauses specified by S is Q^δ_σ. Note that by the definition of the distribution complexity, Q̂(T) = 0 for any 1 ≤ |T| < r, and so the original and induced distributions are uniform over any set of r − 1 coordinates.\nFirst consider the case r = 1. Restricting each clause to S for |S| = 1 induces a noisy 1-XOR-SAT distribution in which a random true literal appears with probability δ and a random false literal appears with probability 2 − δ. The simple majority vote algorithm described above suffices: set each variable to +1 if it appears more often positively than negated in the restricted clauses of the formula; to −1 if it appears more often negated; and choose randomly if it appears equally often. Using c√t log(1/ε) clauses for c = O(1/|1 − δ|^2), this algorithm will give an assignment that agrees with σ (or −σ) on n/2 + t√n variables with probability at least 1 − ε; using cn log n clauses it will recover σ exactly with probability 1 − o(1).\nNow assume that r ≥ 2. We describe how the parity distribution Q^δ_σ on r-constraints induces a bipartite block model. Let V1 be the set of 2n literals of the given variable set, and V2 the collection of all (r − 1)-tuples of literals. We have n1 = |V1| = 2n and n2 = |V2| = (2n choose r − 1). We partition each set into two parts as follows: A1 ⊂ V1 is the set of false literals under σ, and B1 the set of true literals. A2 ⊂ V2 is the set of (r − 1)-tuples with an even number of true literals under σ, and B2 the set of (r − 1)-tuples with an odd number of true literals.\nFor each r-constraint (l1, l2, . . . , lr), we add an edge in the block model between l1 ∈ V1 and the tuple (l2, . . . , lr) ∈ V2. A constraint drawn according to Q^δ_σ induces a random edge between A1 and A2 or B1 and B2 with probability δ/2, and between A1 and B2 or B1 and A2 with probability 1 − δ/2, exactly the distribution of a single edge in the bipartite block model. Recovering the partition P1 = A1 ∪ B1 in this bipartite block model partitions the literals into true and false sets, giving σ (up to sign). Now the model in Defn. 2 is that of m clauses selected independently with replacement according to a given distribution, while in Defn. 1 each edge is present independently with a given probability. Reducing from the first to the second can be done by Poissonization; details are given in the full version [18].\nThe key feature of our bipartite block model algorithm is that it uses Õ(√(n1 n2)) edges (i.e. p = Õ((n1 n2)^{−1/2})), corresponding to Õ(n^{r/2}) clauses in the planted CSP.\n\n5 Comparison with spectral approach\n\nAs noted above, many approaches to graph partitioning problems and planted satisfiability problems use eigenvectors or singular vectors. These algorithms are essentially based on the signs of the top eigenvector of the centered adjacency matrix being correlated with the planted vector. 
This is fairly straightforward to establish when the average degree of the random graph is large enough. However, in the stochastic block model, for example, when the average degree is a constant, vertices of large degree dominate the spectrum and the straightforward spectral approach fails (see [26] for a discussion and references).\nIn the case of the usual block model, n1 = n2 = n, while our approach has a fast running time, it does not save on the number of edges required as compared to the standard spectral approach: both require Ω(n log n) edges. However, when n2 ≫ n1, e.g. n1 = Θ(n), n2 = Θ(n^{k−1}) as in the case of the planted k-CSP's for odd k, this is no longer the case.\nConsider the general-purpose partitioning algorithm of [28]. Let G be the matrix of edge probabilities: Gij is the probability that the edge between vertices i and j is present. Let Gu, Gv denote columns of G corresponding to vertices u, v. Let σ^2 be an upper bound on the variance of an entry in the adjacency matrix, sm the size of the smallest part in the planted partition, q the number of parts, δ the failure probability of the algorithm, and c a universal constant. Then the condition for the success of McSherry's partitioning algorithm is:\n\nmin_{u,v in different parts} ‖Gu − Gv‖^2 > c q σ^2 (n/sm + log(n/δ)).\n\nIn our case, we have q = 4, n = n1 + n2, sm = n1/2, σ^2 = Θ(p), and ‖Gu − Gv‖^2 = 4(δ − 1)^2 p^2 n2. When n2 ≫ n1 log n, the condition requires p = Ω(1/n1), while our algorithm succeeds when p = Ω(log n1/√(n1 n2)). In our application to planted CSP's with odd k and n1 = 2n, n2 = (2n choose k − 1), this gives a polynomial factor improvement.\nIn fact, previous spectral approaches to planted CSP's or random k-SAT refutation worked for even k using n^{k/2} constraints [23, 9, 14], while algorithms for odd k only worked for k = 3 and used considerably more complicated constructions and techniques [13, 22, 8]. In contrast to previous approaches, our algorithm unifies the algorithm for planted k-CSP's for odd and even k, works for odd k > 3, and is particularly simple and fast.\nWe now describe why previous approaches faced a spectral barrier for odd k, and how our algorithm surmounts it. The previous spectral algorithms for even k constructed a graph similar to the one in the reduction above: vertices are k/2-tuples of literals, with edges between two tuples if their union appears as a k-clause. The distribution induced in this case is the stochastic block model. For odd k, such a reduction is not possible, and one might try a bipartite graph, with either the reduction described above, or with ⌊k/2⌋-tuples and ⌈k/2⌉-tuples (our analysis works for this reduction as well). However, with Õ(n^{k/2}) clauses, the spectral approach of computing the largest or second largest singular vector of the adjacency matrix does not work.\nConsider M from the distribution M(p). Let u be the n1 dimensional vector indexed as the rows of M whose entries are 1 if the corresponding vertex is in A1 and −1 otherwise. Define the n2 dimensional vector v analogously. The next propositions summarize properties of M.\nProposition 1. E(M) = (δ − 1) p u v^T.\nProposition 2. Let M1 be the rank-1 approximation of M drawn from M(p). 
Then ‖M1 − E(M)‖ ≤ 2‖M − E(M)‖.

The above propositions suffice to show high correlation between the top singular vector and the vector u when n2 = Θ(n1) and p = Ω(log n1/n1). This is because the norm of E(M) is p√(n1n2); this is higher than O(√(pn2)), the norm of M − E(M) for this range of p. Therefore the top singular vector of M will be correlated with the top singular vector of E(M). The latter is a rank-1 matrix with u as its left singular vector.

However, when n2 ≫ n1 (e.g. k odd) and p = Õ((n1n2)^(−1/2)), the norm of the zero-mean matrix M − E(M) is in fact much larger than the norm of E(M). Letting x(i) be the vector of length n1 with a 1 in the ith coordinate and zeroes elsewhere, we see that ‖Mx(i)‖2 ≈ √(pn2), and so ‖M − E(M)‖ = Ω(√(pn2)), while ‖E(M)‖ = O(p√(n1n2)); the former is Ω((n2/n1)^(1/4)) while the latter is O(1). In other words, the top singular value of M is much larger than the value obtained by the vector corresponding to the planted assignment! The picture is in fact richer: the straightforward spectral approach succeeds for p ≫ n1^(−2/3) n2^(−1/3), while for p ≪ n1^(−2/3) n2^(−1/3), the top left singular vector of the centered adjacency matrix is asymptotically uncorrelated with the planted vector [20]. In spite of this, one can exploit correlations to recover the planted vector below this threshold with our resampling algorithm, which in this case provably outperforms the spectral algorithm.

Acknowledgements

S. Vempala was supported in part by NSF award CCF-1217793.

References

[1] E. Abbe, A. S. Bandeira, and G. Hall. Exact recovery in the stochastic block model. arXiv preprint arXiv:1405.3267, 2014.

[2] D. Achlioptas and F. McSherry.
Fast computation of low rank matrix approximations. In STOC, pages 611–618, 2001.

[3] Q. Berthet and P. Rigollet. Complexity theoretic lower bounds for sparse principal component detection. In COLT, pages 1046–1066, 2013.

[4] A. Blum. Learning boolean functions in an infinite attribute space. Machine Learning, 9:373–386, 1992.

[5] A. Bogdanov and Y. Qiao. On the security of Goldreich's one-way function. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 392–405. 2009.

[6] R. B. Boppana. Eigenvalues and graph bisection: An average-case analysis. In FOCS, pages 280–285, 1987.

[7] A. Coja-Oghlan. Graph partitioning via adaptive spectral techniques. Combinatorics, Probability & Computing, 19(2):227, 2010.

[8] A. Coja-Oghlan, C. Cooper, and A. Frieze. An efficient sparse regularity concept. SIAM Journal on Discrete Mathematics, 23(4):2000–2034, 2010.

[9] A. Coja-Oghlan, A. Goerdt, A. Lanka, and F. Schädlich. Certifying unsatisfiability of random 2k-SAT formulas using approximation techniques. In Fundamentals of Computation Theory, pages 15–26. Springer, 2003.

[10] A. Daniely, N. Linial, and S. Shalev-Shwartz. More data speeds up training time in learning halfspaces over sparse vectors. In NIPS, pages 145–153, 2013.

[11] A. Daniely and S. Shalev-Shwartz. Complexity theoretic limitations on learning DNF's. CoRR, abs/1404.3378, 2014.

[12] S. Decatur, O. Goldreich, and D. Ron. Computational sample complexity. SIAM Journal on Computing, 29(3):854–879, 1999.

[13] U. Feige and E. Ofek. Easily refutable subformulas of large random 3CNF formulas. In Automata, Languages and Programming, pages 519–530. Springer, 2004.

[14] U. Feige and E. Ofek. Spectral techniques applied to sparse random graphs. Random Structures & Algorithms, 27(2):251–275, 2005.

[15] V. Feldman.
Attribute efficient and non-adaptive learning of parities and DNF expressions. Journal of Machine Learning Research, (8):1431–1460, 2007.

[16] V. Feldman. Open problem: The statistical query complexity of learning sparse halfspaces. In COLT, pages 1283–1289, 2014.

[17] V. Feldman, E. Grigorescu, L. Reyzin, S. Vempala, and Y. Xiao. Statistical algorithms and a lower bound for planted clique. In STOC, pages 655–664, 2013.

[18] V. Feldman, W. Perkins, and S. Vempala. Subsampled power iteration: a unified algorithm for block models and planted CSP's. CoRR, abs/1407.2774, 2014.

[19] V. Feldman, W. Perkins, and S. Vempala. On the complexity of random satisfiability problems with planted solutions. In STOC, pages 77–86, 2015.

[20] L. Florescu and W. Perkins. Spectral thresholds in the bipartite stochastic block model. arXiv preprint arXiv:1506.06737, 2015.

[21] M. L. Fredman, J. Komlós, and E. Szemerédi. Storing a sparse table with O(1) worst case access time. Journal of the ACM (JACM), 31(3):538–544, 1984.

[22] J. Friedman, A. Goerdt, and M. Krivelevich. Recognizing more unsatisfiable random k-SAT instances efficiently. SIAM Journal on Computing, 35(2):408–430, 2005.

[23] A. Goerdt and M. Krivelevich. Efficient recognition of random unsatisfiable k-SAT instances by spectral methods. In STACS 2001, pages 294–304. Springer, 2001.

[24] M. Kearns. Efficient noise-tolerant learning from statistical queries. JACM, 45(6):983–1006, 1998.

[25] S. B. Korada, A. Montanari, and S. Oh. Gossip PCA. In SIGMETRICS, pages 209–220, 2011.

[26] F. Krzakala, C. Moore, E. Mossel, J. Neeman, A. Sly, L. Zdeborová, and P. Zhang. Spectral redemption in clustering sparse networks. PNAS, 110(52):20935–20940, 2013.

[27] L. Massoulié. Community detection thresholds and the weak Ramanujan property.
In STOC, pages 1–10, 2014.

[28] F. McSherry. Spectral partitioning of random graphs. In FOCS, pages 529–537, 2001.

[29] E. Mossel, J. Neeman, and A. Sly. A proof of the block model threshold conjecture. arXiv preprint arXiv:1311.4115, 2013.

[30] R. O'Donnell and D. Witmer. Goldreich's PRG: Evidence for near-optimal polynomial stretch. In Conference on Computational Complexity, 2014.

[31] R. Servedio. Computational sample complexity and attribute-efficient learning. Journal of Computer and System Sciences, 60(1):161–178, 2000.

[32] S. Shalev-Shwartz, O. Shamir, and E. Tromer. Using more data to speed-up training time. In AISTATS, pages 1019–1027, 2012.

[33] V. Vu. A simple SVD algorithm for finding hidden partitions. arXiv preprint arXiv:1404.3918, 2014.