{"title": "Coded Distributed Computing for Inverse Problems", "book": "Advances in Neural Information Processing Systems", "page_first": 709, "page_last": 719, "abstract": "Computationally intensive distributed and parallel computing is often bottlenecked by a small set of slow workers known as stragglers. In this paper, we utilize the emerging idea of ``coded computation'' to design a novel error-correcting-code inspired technique for solving linear inverse problems under specific iterative methods in a parallelized implementation affected by stragglers. Example machine-learning applications include inverse problems such as personalized PageRank and sampling on graphs. We provably show that our coded-computation technique can reduce the mean-squared error under a computational deadline constraint. In fact, the ratio of mean-squared error of replication-based and coded techniques diverges to infinity as the deadline increases. Our experiments for personalized PageRank performed on real systems and real social networks show that this ratio can be as large as $10^4$. Further, unlike coded-computation techniques proposed thus far, our strategy combines outputs of all workers, including the stragglers, to produce more accurate estimates at the computational deadline. This also ensures that the accuracy degrades ``gracefully'' in the event that the number of stragglers is large.", "full_text": "Coded Distributed Computing for Inverse Problems\n\nYaoqing Yang, Pulkit Grover and Soummya Kar\n\nCarnegie Mellon University\n\n{yyaoqing, pgrover, soummyak}@andrew.cmu.edu\n\nAbstract\n\nComputationally intensive distributed and parallel computing is often bottlenecked\nby a small set of slow workers known as stragglers. 
In this paper, we utilize the\nemerging idea of \u201ccoded computation\u201d to design a novel error-correcting-code\ninspired technique for solving linear inverse problems under speci\ufb01c iterative\nmethods in a parallelized implementation affected by stragglers. Example machine-\nlearning applications include inverse problems such as personalized PageRank and\nsampling on graphs. We provably show that our coded-computation technique can\nreduce the mean-squared error under a computational deadline constraint. In fact,\nthe ratio of mean-squared error of replication-based and coded techniques diverges\nto in\ufb01nity as the deadline increases. Our experiments for personalized PageRank\nperformed on real systems and real social networks show that this ratio can be\nas large as 104. Further, unlike coded-computation techniques proposed thus far,\nour strategy combines outputs of all workers, including the stragglers, to produce\nmore accurate estimates at the computational deadline. This also ensures that the\naccuracy degrades \u201cgracefully\u201d in the event that the number of stragglers is large.\n\nIntroduction\n\n1\nThe speed of distributed computing is often affected by a few slow workers known as the \u201cstragglers\u201d\n[1\u20134]. This issue is often addressed by replicating tasks across workers and using this redundancy to\nignore some of the stragglers. Recently, methods from error-correcting codes (ECC) have been used\nfor speeding up distributed computing [5\u201315], which build on classical works on algorithm-based\nfault-tolerance [16]. The key idea is to treat stragglers as \u201cerasures\u201d and use ECC to retrieve the result\nafter a subset of fast workers have \ufb01nished. In some cases, (e.g. [6, 8] for matrix multiplications),\ntechniques that utilize ECC achieve scaling-sense speedups in average computation time compared to\nreplication. 
In this work, we propose a novel coding-inspired technique to deal with stragglers in distributed computing of linear inverse problems using iterative solvers [17].
Existing techniques that use coding to deal with stragglers treat straggling workers as "erasures", that is, they ignore the computation results of the stragglers. In contrast, when using iterative methods for linear inverse problems, even if the computation result at a straggler has not converged, the proposed algorithm does not ignore the result, but instead combines it (with appropriate weights) with the results from other workers. This is in part because the results of iterative methods often converge gradually to the true solutions. We use the small example shown in Fig. 1 to illustrate this idea. Suppose we want to solve two linear inverse problems with solutions x*_1 and x*_2. We "encode the computation" by adding an extra linear inverse problem with solution x*_1 + x*_2 (see Section 3), and distribute these three problems to three workers. Using this method, the solutions x*_1 and x*_2 can be obtained from the results of any combination of two fast workers that first return their solutions.
But what if we have a computational deadline, Tdl, by which only one worker converges? The natural extension of existing strategies (e.g., [6]) will declare a failure because it needs at least two workers to respond. However, our strategy does not require convergence: even intermediate results can be utilized to estimate the solutions. In other words, our strategy degrades gracefully as the number of stragglers increases, or as the deadline is pulled earlier. Indeed, we show that it is suboptimal to ignore stragglers as erasures, and design strategies that treat the difference from the optimal solution as "soft" additive noise (see Section 3).

31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.

Figure 1: A comparison between the existing scheme in [6] and the proposed algorithm.

We use an algorithm that is similar to weighted least squares for decoding, giving each worker a weight based on its proximity to convergence. In this way, we can expect to fully utilize the computation results from all workers and obtain better speedup.
Theoretically, we show that for a specified deadline time Tdl, under certain conditions on the worker speed distributions, the coded linear inverse solver using structured codes has smaller mean-squared error than the replication-based linear solver (Theorem 4.4). In fact, under more relaxed conditions on the worker speed distributions, as the computation time Tdl increases, the ratio of the mean-squared errors (MSE) of the replication-based and coded linear solvers can get arbitrarily large (Theorem 4.5)!
For validation of our theory, we performed experiments comparing coded and replication-based computation on a graph mining problem, namely personalized PageRank [18], using the classical power-iteration method [19]. We conduct experiments on the Twitter and Google Plus social networks under a deadline on computation time, using a given number of workers on a real computation cluster (Section 6). We observe that the MSE of coded PageRank is smaller than that of replication by a factor of 10^4 at Tdl = 2 seconds. From an intuitive perspective, the advantage of coding over replication is that coding utilizes the diversity of all heterogeneous workers, whereas replication cannot (see Section 7 for details).
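The three-worker example in Fig. 1 can be made concrete. Below is a minimal Python sketch (the 2 × 2 matrix M and the inputs are our own illustrative choices, not from the paper): the third worker solves the "parity" problem with input r1 + r2, and when worker 2 straggles, x*_2 is recovered by linearity from the two fast workers.

```python
import numpy as np

# Toy (3, 2) example from Fig. 1: solve M x = r for two inputs, plus a
# third "parity" problem with input r1 + r2. The matrix and inputs here
# are illustrative assumptions, not from the paper.
M = np.array([[4.0, 1.0], [1.0, 3.0]])
r1 = np.array([1.0, 2.0])
r2 = np.array([3.0, 5.0])

def jacobi(r, iters):
    """Jacobi iteration x <- D^{-1}(r - L x) for the splitting M = D + L."""
    D = np.diag(np.diag(M))
    L = M - D
    x = np.zeros_like(r)
    for _ in range(iters):
        x = np.linalg.solve(D, r - L @ x)
    return x

# Workers 1 and 3 are fast (converged); worker 2 straggles.
x1 = jacobi(r1, 100)            # solution of M x = r1
x3 = jacobi(r1 + r2, 100)       # solution of M x = r1 + r2 (parity)

# Erasure-style decoding: by linearity of the inverse, x2 = x3 - x1.
x2_decoded = x3 - x1
print(np.allclose(x2_decoded, np.linalg.solve(M, r2)))  # True
```

Note that this classical erasure view needs two converged workers; the algorithm proposed in this paper instead combines partial results from all three, including the straggler.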
To compare with existing coded technique in [6], we adapt it to\ninverse problems by inverting only the partial results from the fast workers. However, from our\nexperiments, if only the results from the fast workers are used, the error ampli\ufb01es due to inverting an\nill-conditioned submatrix during decoding (Section 6). This ill-conditioning issue of real-number\nerasure codes has also been recognized in a recent communication problem [20]. In contrast, our\nnovel way of combining all the partial results including those from the stragglers helps bypass the\ndif\ufb01culty of inverting an ill-conditioned matrix.\nThe focus of this work is on utilizing computations to deliver the minimal MSE in solving linear\ninverse problems. Our algorithm does not reduce the communication cost. However, because each\nworker performs sophisticated iterative computations in our problem, such as the power-iteration\ncomputations, the time required for computation dominates that of communication (Section 5.2).\nThis is unlike some recent works (e.g.[21\u201324]) where communication costs are observed to dominate\nbecause the per-processor computation is smaller.\nFinally, we summarize our main contributions in this paper:\n\n\u2022 We propose a coded computing algorithm for multiple instances of a linear inverse problem;\n\u2022 We theoretically analyze the mean-squared error of coded, uncoded and replication-based\niterative linear solvers under a deadline constraint, and show scaling sense advantage of\ncoded solvers in theory and orders of magnitude smaller error in data experiments.\n\u2022 This is the \ufb01rst work that treats stragglers as soft errors instead of erasures, which leads to\n\ngraceful degradation in the event that the number of stragglers is large.\n\n2 System Model and Problem Formulation\n2.1 Preliminaries on Solving Linear Systems using Iterative Methods\n\nConsider the problem of solving k inverse problems with the same linear transform matrix M 
and different inputs r_i: M x_i = r_i, i = 1, 2, . . . , k. When M is a square matrix, the closed-form solution is x_i = M^{-1} r_i. When M is a non-square matrix, the regularized least-squares solution is x_i = (M^T M + λI)^{-1} M^T r_i, i = 1, 2, . . . , k, with an appropriate regularization parameter λ. Since matrix inversion is expensive, iterative methods are often used. We now look at two standard iterative methods, namely the Jacobi method [17] and the gradient descent method. For a square matrix M = D + L, where D is diagonal, the Jacobi iteration is written as x_i^{(l+1)} = D^{-1}(r_i − L x_i^{(l)}). Under certain conditions on D and L ([17, p.115]), the computation result converges to the true solution. One example is the PageRank algorithm discussed in Section 2.2. For the ℓ2-minimization problem with a non-square M, the gradient descent method has the form x_i^{(l+1)} = ((1 − ελ)I − εM^T M) x_i^{(l)} + εM^T r_i, where ε is an appropriate step size. We can see that both the Jacobi iteration and the gradient descent iteration mentioned above have the form

x_i^{(l+1)} = B x_i^{(l)} + K r_i, i = 1, 2, . . . , k,    (1)

for two appropriate matrices B and K, which solves the following equation with true solution x*_i:

x*_i = B x*_i + K r_i, i = 1, 2, . . . , k.    (2)

Therefore, subtracting (2) from (1), we have that the computation error e_i^{(l)} = x_i^{(l)} − x*_i satisfies

e_i^{(l+1)} = B e_i^{(l)}.    (3)

For the iterative method to converge, we always assume the spectral radius ρ(B) < 1 (see [17, p.115]).
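Both updates are instances of the common form (1). A small Python sketch (with assumed matrices and inputs, not the paper's data) instantiates B and K for each method and checks that the iterates reach the corresponding closed-form solutions:

```python
import numpy as np

rng = np.random.default_rng(1)

def run(B, K, r, iters=500):
    """Generic iteration x <- B x + K r from (1)."""
    x = np.zeros(B.shape[0])
    for _ in range(iters):
        x = B @ x + K @ r
    return x

# Jacobi: square, diagonally dominant M = D + L (assumed example).
M = np.array([[5.0, 1.0, 0.5], [1.0, 4.0, 1.0], [0.5, 1.0, 6.0]])
r = np.array([1.0, 2.0, 3.0])
D = np.diag(np.diag(M))
L = M - D
B_jac, K_jac = -np.linalg.inv(D) @ L, np.linalg.inv(D)
print(np.allclose(run(B_jac, K_jac, r), np.linalg.solve(M, r)))

# Gradient descent for regularized least squares with a non-square M.
M2 = rng.standard_normal((6, 3))
r2 = rng.standard_normal(6)
lam, eps = 0.1, 0.05
B_gd = (1 - eps * lam) * np.eye(3) - eps * M2.T @ M2
K_gd = eps * M2.T
x_closed = np.linalg.solve(M2.T @ M2 + lam * np.eye(3), M2.T @ r2)
print(np.allclose(run(B_gd, K_gd, r2, iters=5000), x_closed))
```

In both cases the fixed point of x = Bx + Kr is exactly the closed-form solution, which is the identity (2) exploited throughout the paper.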
We will study iterative methods that have the form (1) throughout this paper.

2.2 Motivating Applications of Linear Inverse Problems

Our coded computation technique requires solving multiple inverse problems with the same linear transform matrix M. One such problem is personalized PageRank. For a directed graph, the PageRank algorithm [19] aims to measure the nodes' importance by solving the linear problem x = (d/N) 1_N + (1 − d)Ax, where d = 0.15 is called the "teleport" probability, N is the number of nodes and A is the column-normalized adjacency matrix. The personalized PageRank problem [18] considers a more general equation x = dr + (1 − d)Ax, for any possible vector r ∈ R^N that satisfies 1^T r = 1. Compared to PageRank [19], personalized PageRank [18] incorporates r as the preference of different users or topics. A classical method to solve PageRank is power iteration, which repeats the computation x^{(l+1)} = dr + (1 − d)Ax^{(l)} until convergence. This iterative method has the form (1) and is essentially the Jacobi method mentioned above. Another example application is the sampling and recovery problem in the emerging field of graph signal processing [25, 26], a non-square system, which is discussed in Supplementary Section 8.1.

2.3 Problem Formulation: Distributed Computing and the Straggler Effect

Consider solving k linear inverse problems M x_i = r_i, i = 1, 2, . . . , k on n > k workers using the iterative method (1), where each worker solves one inverse problem. Due to the straggler effect, the computation at different workers can proceed at different speeds. The goal is to obtain minimal MSE in solving the linear inverse problems before a deadline time Tdl. Suppose that by time Tdl, the i-th worker has completed l_i iterations of (1). Then, from (3), the residual error at the i-th worker is

e_i^{(l_i)} = B^{l_i} e_i^{(0)}.    (4)

For our theoretical results, we sometimes need the following assumption.
Assumption 1. We assume that the optimal solutions x*_i, i = 1, 2, . . . , k, are i.i.d.
Denote by µ_E and C_E respectively the mean and the covariance of each x*_i. Note that Assumption 1 is equivalent to the assumption that the inputs r_i, i = 1, 2, . . . , k are i.i.d., because r_i and x*_i are related by the linear equation (2). For the personalized PageRank problem discussed above, this assumption is reasonable because queries from different users or topics are unrelated. Assume we have estimated the mean µ_E beforehand and we start with the initial estimate x_i^{(0)} = µ_E. Then, e_i^{(0)} = x_i^{(0)} − x*_i has mean 0_N and covariance C_E. We also extend our results to the case when the x*_i's (or equivalently, the r_i's) are correlated. Since the extension is rather long and may hinder the understanding of the main paper, we provide it in Supplementary Sections 8.2 and 8.5.

2.4 Preliminaries on Error Correcting Codes

We will use "encode" and "decode" to denote the preprocessing and post-processing performed before and after the parallel computation. In this paper, the encoder multiplies the inputs to the parallel workers by a "generator matrix" G, and the decoder multiplies the outputs of the workers by a "decoding matrix" L (see Algorithm 1). We call a code an (n, k) code if the generator matrix has size k × n. We often use generator matrices G with orthonormal rows, which means G_{k×n} G^T_{n×k} = I_k. An example of such a matrix is the submatrix formed by any k rows of an n × n orthonormal matrix (e.g., a Fourier matrix). Under this assumption, G_{k×n} can be augmented by another matrix H_{(n−k)×n} to form an n × n orthonormal matrix, i.e., the square matrix F_{n×n} = [G_{k×n}; H_{(n−k)×n}] satisfies F^T F = I_n.

3 Coded Distributed Computing of Linear Inverse Problems

The proposed coded linear inverse algorithm (Algorithm 1) has three stages: (1) preprocessing (encoding) at the central controller, (2) parallel computing at n > k parallel workers, and (3) post-processing (decoding) at the central controller. As we show later in the analysis of the computation error, the entries trace(C^{(l_i)}) of the diagonal matrix Λ are the expected MSE at each worker prior to decoding. The decoding matrix L_{k×n} in the decoding step (7) is chosen to be (GΛ^{-1}G^T)^{-1}GΛ^{-1} to reduce the mean-squared error of the estimates of the linear inverse solutions, by assigning different weights to different workers based on the estimated accuracy of their computation (which is what Λ provides). This particular choice of L is inspired by the weighted least-squares solution.

Algorithm 1 Coded Distributed Linear Inverse

Input: Input vectors [r_1, r_2, . . . , r_k], generator matrix G_{k×n}, the linear system matrices B and K defined in (1).
Initialize (Encoding): Encode the input vectors and the initial estimates by multiplying by G:

[s_1, s_2, . . . , s_n] = [r_1, r_2, . . . , r_k] · G,    (5)
[y_1^{(0)}, y_2^{(0)}, . . . , y_n^{(0)}] = [x_1^{(0)}, x_2^{(0)}, . . . , x_k^{(0)}] · G.    (6)

Parallel Computing:
for i = 1 to n (in parallel) do
    Send s_i and y_i^{(0)} to the i-th worker. Execute the iterative method (1) with initial estimate y_i^{(0)} and input s_i at each worker.
end for
After a deadline time Tdl, collect all linear inverse results y_i^{(l_i)} from the n workers. The superscript l_i in y_i^{(l_i)} represents that the i-th worker finished l_i iterations. Denote by Y^{(Tdl)} the collection of all results: Y^{(Tdl)}_{N×n} = [y_1^{(l_1)}, y_2^{(l_2)}, . . . , y_n^{(l_n)}].
Post Processing (decoding at the central controller):
Compute an estimate of the linear inverse solutions using the following matrix multiplication:

X̂^T = L · (Y^{(Tdl)})^T := (GΛ^{-1}G^T)^{-1}GΛ^{-1}(Y^{(Tdl)})^T,    (7)

where the estimate X̂_{N×k} = [x̂_1, x̂_2, . . . , x̂_k] and the matrix Λ is

Λ = diag[trace(C^{(l_1)}), . . . , trace(C^{(l_n)})],    (8)

where the matrices C^{(l_i)}, i = 1, . . . , n are defined as

C^{(l_i)} = B^{l_i} C_E (B^T)^{l_i}.    (9)

In the computation of Λ, if the traces trace(C^{(l_i)}) are not available, one can use precomputed estimates of these traces, as discussed in Supplementary Section 8.9, with negligible computational complexity and theoretically guaranteed accuracy.

3.1 Bounds on Performance of the Coded Linear Inverse Algorithm

Define l = [l_1, l_2, . . . , l_n] as the vector of the numbers of iterations at all workers. E[·|l] denotes the conditional expectation taken with respect to the randomness of the optimal solutions x*_i (see Assumption 1), conditioned on a fixed iteration number l_i at each worker, i.e., E[X|l] = E[X|l_1, l_2, . . . , l_n]. Define X*_{N×k} = [x*_1, x*_2, . . . , x*_k] as the matrix composed of all the true solutions.
Theorem 3.1. Define E = X̂ − X*, i.e., the error of the decoding result (7). Assume that the solutions of the linear inverse problems are chosen i.i.d. (across all problems) according to a distribution with covariance C_E.
Then, the error covariance of E satisfies

E[||E||^2 | l] ≤ σ_max(G^T G) · trace[(GΛ^{-1}G^T)^{-1}],    (10)

where ||·|| is the Frobenius norm, σ_max(G^T G) is the maximum eigenvalue of G^T G, and the matrix Λ is defined in (8). Further, when G has orthonormal rows,

E[||E||^2 | l] ≤ trace[(GΛ^{-1}G^T)^{-1}].    (11)

Proof overview. See Supplementary Section 8.3 for the complete proof. Here we provide the main intuition by analyzing a "scalar version" of the linear inverse problem, in which the matrix B is equal to a scalar a. For B = a, the inputs and the initial estimates in (5) and (6) are vectors instead of matrices. As we show in Supplementary Section 8.3, if we encode both the inputs and the initial estimates using (5) and (6), we also "encode" the error:

[ε_1^{(0)}, ε_2^{(0)}, . . . , ε_n^{(0)}] = [e_1^{(0)}, e_2^{(0)}, . . . , e_k^{(0)}] · G =: E_0 G,    (12)

where ε_i^{(0)} = y_i^{(0)} − y*_i is the initial error at the i-th worker, e_i^{(0)} = x_i^{(0)} − x*_i is the initial error of the i-th linear inverse problem, and E_0 := [e_1^{(0)}, e_2^{(0)}, . . . , e_k^{(0)}]. Suppose var[e_i^{(0)}] = c_e, which is the scalar version of C_E under Assumption 1. From (4), the error satisfies

ε_i^{(l_i)} = a^{l_i} ε_i^{(0)}, i = 1, 2, . . . , n.    (13)

Denote D = diag{a^{l_1}, a^{l_2}, . . . , a^{l_n}}. Therefore, from (12) and (13), the error before the decoding step (7) can be written as

[ε_1^{(l_1)}, ε_2^{(l_2)}, . . . , ε_n^{(l_n)}] = [ε_1^{(0)}, ε_2^{(0)}, . . . , ε_n^{(0)}] · D = E_0 G D.    (14)

We can show (see Supplementary Section 8.3 for details) that after the decoding step (7), the error vector is also multiplied by the decoding matrix L = (GΛ^{-1}G^T)^{-1}GΛ^{-1}:

E^T = L [ε_1^{(l_1)}, ε_2^{(l_2)}, . . . , ε_n^{(l_n)}]^T = L D^T G^T E_0^T.    (15)

Thus,

E[||E||^2 | l] = E[trace[E^T E] | l] = trace[L D^T G^T E[E_0^T E_0 | l] G D L^T]
             (a)= c_e trace[L D^T G^T G D L^T]
             (b)≤ c_e σ_max(G^T G) trace[L D^T D L^T] = σ_max(G^T G) trace[L (c_e D^T D) L^T]
             (c)= σ_max(G^T G) trace[L Λ L^T]
             (d)= σ_max(G^T G) trace[(GΛ^{-1}G^T)^{-1}],    (16)

where (a) holds because E_0 := [e_1^{(0)}, e_2^{(0)}, . . . , e_k^{(0)}] has i.i.d. entries with var[e_i^{(0)}] = c_e, so E[E_0^T E_0 | l] = c_e I_k; (b) holds because G^T G ⪯ σ_max(G^T G) I_n; (c) holds because c_e D^T D = Λ, which follows from the fact that for a scalar linear system matrix B = a the entries of the Λ matrix in (8) satisfy

trace(C^{(l_i)}) = a^{l_i} c_e a^{l_i} = c_e a^{2 l_i},    (17)

which are exactly the diagonal entries of c_e D^T D. Finally, (d) is obtained by directly plugging in L := (GΛ^{-1}G^T)^{-1}GΛ^{-1}. Inequality (11) then holds because when G has orthonormal rows, σ_max(G^T G) = 1.

Additionally, we note that in (10), the term trace[(GΛ^{-1}G^T)^{-1}] resembles the MSE of an ordinary weighted least-squares solution, and the term σ_max(G^T G) represents the "inaccuracy" of using the weighted least-squares solution as the decoding result, because the inputs to different workers become correlated when the i.i.d. inputs are multiplied by the matrix G (see (5)).

4 Comparison with Uncoded Schemes and Replication-based Schemes

In this section, we often assume (and will state explicitly in each theorem) that the numbers of iterations l_i at different workers are i.i.d. We use E_f[·] to denote the expectation over the randomness of both the linear inverse solutions x*_i and the numbers of iterations l_i (this is different from the notation E[·|l]).
Assumption 2. Within time Tdl, the number of iterations of the linear inverse computation (see (1)) at each worker follows an i.i.d. distribution l_i ∼ f(l).

4.1 Comparison between the coded and uncoded linear inverse before a deadline

First, we compare the coded linear inverse scheme with an uncoded scheme, which uses the first k workers to solve the k linear inverse problems in (2) without coding. The following theorem quantifies the overall mean-squared error of the uncoded scheme given l_1, l_2, . . . , l_k. The proof is in Supplementary Section 8.6.

Theorem 4.1. In the uncoded scheme, the error satisfies

E[||E_uncoded||^2 | l] = E[ ||[e_1^{(l_1)}, . . . , e_k^{(l_k)}]||^2 | l ] = Σ_{i=1}^{k} trace(C^{(l_i)}).

Further, when the i.i.d. Assumption 2 holds,

E_f[||E_uncoded||^2] = k E_f[trace(C^{(l_1)})].    (18)

Next, we compare the overall mean-squared errors of the coded and uncoded linear inverse algorithms. Note that this comparison is not fair because the coded algorithm uses more workers than the uncoded one. However, we still include Theorem 4.2 because we need it for the fair comparison between coded and replication-based linear inverse. The proof is in Supplementary Section 8.4.

Theorem 4.2. (Coded linear inverse beats uncoded) Suppose the i.i.d. Assumptions 1 and 2 hold and suppose G is a k × n submatrix of an n × n Fourier transform matrix F, i.e., F_{n×n} = [G_{k×n}; H_{(n−k)×n}]. Then, the expected error of the coded linear inverse is strictly less than that of the uncoded one:

E_f[||E_uncoded||^2] − E_f[||E_coded||^2] ≥ E_f[trace(J_2 J_4^{-1} J_2^T)],    (19)

where J_2 and J_4 are submatrices of the n × n matrix F Λ F^T := [J_1, J_2; J_2^T, J_4] and the matrix Λ is defined in (8). That is, (J_1)_{k×k} is GΛG^T, (J_2)_{k×(n−k)} is GΛH^T, and (J_4)_{(n−k)×(n−k)} is HΛH^T.

4.2 Comparison between the replication-based and coded linear inverse before a deadline

Consider an alternative way of doing linear inverse computations using n > k workers. In this paper, we only consider the case when n − k < k, i.e., the number of extra workers is only slightly larger than the number of problems (both in theory and in experiments). Since we have n − k extra workers, a natural approach is to pick any n − k of the linear inverse problems and replicate them on these extra n − k workers.
After we obtain two computation results for the same equation, we use two natural "decoding" strategies for this replication-based linear inverse: (i) choose the result from the worker with the higher number of iterations; (ii) compute the weighted average with weights w_1/(w_1 + w_2) and w_2/(w_1 + w_2), where w_1 = 1/sqrt(trace(C^{(l_1)})) and w_2 = 1/sqrt(trace(C^{(l_2)})), and l_1 and l_2 are the numbers of iterations completed at the two workers (recall that trace(C^{(l_i)}) represents the residual MSE at the i-th worker).

Theorem 4.3. The replication-based schemes satisfy the following lower bound on the MSE:

E_f[||E_rep||^2] > E_f[||E_uncoded||^2] − (n − k) E_f[trace(C^{(l_1)})].    (20)

Proof overview. The goal is to obtain a lower bound on the MSE of replication-based linear inverse and compare it with an upper bound on the MSE of coded linear inverse. Note that if an extra worker is used to replicate the computation at the i-th worker, i.e., the linear inverse problem with input r_i is solved on two workers, the expected error of the result of the i-th problem could at best be reduced from E_f[trace(C^{(l_1)})] (see Theorem 4.1) to zero.¹ Therefore, n − k extra workers make the error decrease by at most (and strictly less than) (n − k) E_f[trace(C^{(l_1)})].

Using this lower bound, we can provably show that coded linear inverse beats replication-based linear inverse when certain conditions are satisfied. One crucial condition is that the distribution of the random variable trace(C^{(l)}) (i.e., the expected MSE at each worker) satisfies a "variance heavy-tail" property, defined as follows.

Definition 1. The random variable trace(C^{(l)}) is said to have the "ρ-variance heavy-tail" property if

var_f[trace(C^{(l)})] > ρ E_f^2[trace(C^{(l)})],    (21)

for some constant ρ > 1. Notice that the term trace(C^{(l)}) is essentially the remaining MSE after l iterations at a single machine. Therefore, this property simply means that the remaining error at a single machine has large variance. For the coded linear inverse, we will use a "Fourier code", whose generator matrix G is a submatrix of a Fourier matrix. This particular choice of code is only for ease of analysis in comparing coded and replication-based linear inverse; in practice, the code that minimizes the mean-squared error should be chosen.

¹Although this is clearly a loose bound, it makes for convenient comparison with coded linear inverse.

Theorem 4.4. (Coded linear inverse beats replication) Suppose the i.i.d. Assumptions 1 and 2 hold and G is the k × n submatrix formed by k rows of an n × n Fourier matrix F. Further, suppose (n − k) = o(√n). Then, the expected error of the coded linear inverse satisfies

lim_{n→∞} 1/(n − k) · [E_f[||E_uncoded||^2] − E_f[||E_coded||^2]] ≥ var_f[trace(C^{(l_1)})] / E_f[trace(C^{(l_1)})].    (22)

Moreover, if the random variable trace(C^{(l)}) satisfies the ρ-variance heavy-tail property for ρ > 1, coded linear inverse outperforms replication-based linear inverse in the following sense:

lim_{n→∞} 1/(n − k) · [E_f[||E_uncoded||^2] − E_f[||E_rep||^2]] < (1/ρ) · lim_{n→∞} 1/(n − k) · [E_f[||E_uncoded||^2] − E_f[||E_coded||^2]].    (23)

Proof overview. See Supplementary Section 8.7 for a complete and rigorous proof. Here we only provide the main intuition behind the proof. From Theorem 4.2, we have E_f[||E_uncoded||^2] − E_f[||E_coded||^2] ≥ E_f[trace(J_2 J_4^{-1} J_2^T)]. Therefore, to prove (22), the main technical difficulty is to simplify the term trace(J_2 J_4^{-1} J_2^T). For a Fourier matrix F, we are able to show that the matrix F Λ F^T = [J_1, J_2; J_2^T, J_4] (see Theorem 4.2) is a Toeplitz matrix, which provides a good structure for studying its behavior. We then use the Gershgorin circle theorem [27] (with some algebraic manipulations) to show that the maximum eigenvalue of J_4 satisfies σ_max(J_4) ≈ E_f[trace(C^{(l_1)})], and, separately, some algebraic manipulations show that

trace(J_2 J_2^T) ≈ (n − k) var_f[trace(C^{(l_1)})]    (24)

for large matrix size n. Since trace(J_2 J_4^{-1} J_2^T) ≥ trace(J_2 (σ_max(J_4))^{-1} J_2^T) = (1/σ_max(J_4)) · trace(J_2 J_2^T), we obtain

trace(J_2 J_4^{-1} J_2^T) ≥ (n − k) var_f[trace(C^{(l_1)})] / E_f[trace(C^{(l_1)})]    (25)

for large n. Then, (22) can be proved by plugging (25) into (19). After that, we can combine (22), (20) and the variance heavy-tail property to prove (23).

4.3 Asymptotic Comparison between Coded, Uncoded and Replication-based Linear Inverse as the Deadline Tdl → ∞

Assumption 3. We assume the computation time of one power iteration is fixed at each worker for each linear inverse computation, i.e., there exist n independent (not necessarily identically distributed) random variables v_1, v_2, . . . , v_n such that l_i = ⌈Tdl/v_i⌉, i = 1, 2, . . . , n.

This assumption is validated by experiments in Supplementary Section 8.13. The k-th order statistic of a sample is its k-th smallest value. Suppose the order statistics of the sequence v_1, v_2, . . . , v_n are v_{i_1} < v_{i_2} < . . . < v_{i_n}, where {i_1, i_2, . . . , i_n} is a permutation of {1, 2, . . . , n}.
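To see how the ρ-variance heavy-tail property of Definition 1 can arise under Assumption 3, consider the scalar case trace(C^{(l)}) = c_e a^{2⌈Tdl/v⌉}. A short Python sketch with an assumed two-point speed distribution (a rare but very slow straggler; our own illustration, not the paper's measured speeds) computes the ratio var/E² exactly:

```python
import math

# Scalar case: trace(C^(l)) = c_e * a^(2l) with l = ceil(T_dl / v).
# Assumed two-point per-iteration-time distribution: v = 1 with prob 0.9
# (fast worker), v = 4 with prob 0.1 (rare straggler). Not from the paper.
a, c_e = 0.85, 1.0
speeds = [(1.0, 0.9), (4.0, 0.1)]

def heavy_tail_ratio(T_dl):
    """Return var[trace(C^(l))] / E[trace(C^(l))]^2 (cf. Definition 1)."""
    vals = [(c_e * a ** (2 * math.ceil(T_dl / v)), p) for v, p in speeds]
    mean = sum(x * p for x, p in vals)
    var = sum((x - mean) ** 2 * p for x, p in vals)
    return var / mean ** 2

# The ratio grows with the deadline: the rare straggler's large residual
# error dominates the variance, approaching (1 - p)/p = 9 in this example.
print(heavy_tail_ratio(5.0), heavy_tail_ratio(40.0))
```

For small deadlines the ratio is below 1, but as Tdl grows it approaches (1 − p)/p, so the heavy-tail condition ρ > 1 holds whenever stragglers are rare and slow enough.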
Denote by [k] the set {1, 2, . . . , k} and by [n] the set {1, 2, . . . , n}.
Theorem 4.5. (Error exponent comparison when Tdl → ∞) Suppose the i.i.d. Assumption 1 and Assumption 3 hold, and suppose n − k < k. Then, the error exponents of the coded, uncoded and replication-based computation schemes satisfy

lim_{Tdl→∞} −(1/Tdl) log E[||E_coded||^2 | l] ≥ (2/v_{i_k}) log(1/(1 − d)),    (26)

lim_{Tdl→∞} −(1/Tdl) log E[||E_rep||^2 | l] = lim_{Tdl→∞} −(1/Tdl) log E[||E_uncoded||^2 | l] = (2/max_{i∈[k]} v_i) log(1/(1 − d)).    (27)

The error exponents satisfy coded > replication = uncoded. Here the expectation E[·|l] is taken only with respect to the randomness of the linear inverse solutions x_i, i = 1, 2, . . . , k.

Proof overview. See Supplementary Section 8.8 for a detailed proof. The main intuition behind this result is the following. As Tdl approaches infinity, the error of uncoded computation is dominated by the slowest worker among the first k workers, which has per-iteration time max_{i∈[k]} v_i. For the replication-based scheme, since the number of extra workers satisfies n − k < k, there is a non-zero probability (which does not change with Tdl) that the n − k extra workers do not replicate the computation of the slowest among the first k workers. Therefore, replication with n − k < k does not improve the error exponent, because the error is dominated by this slowest worker. For coded computation, we show in Supplementary Section 8.8 that the slowest n − k workers among all n workers do not affect the error exponent, which means that the error is dominated by the k-th fastest worker, which has per-iteration time v_{i_k}.
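The ordering of the exponents in Theorem 4.5 reflects a simple order-statistics fact: the k-th smallest per-iteration time among all n workers never exceeds the maximum over the first k workers. A short Python sketch (with assumed exponential speed samples, not the paper's measurements) computes both exponents for the PageRank case d = 0.15:

```python
import math
import random

random.seed(0)
n, k, d = 120, 100, 0.15

# Assumed per-iteration times v_i (illustrative, not measured data).
v = [random.expovariate(1.0) + 0.5 for _ in range(n)]

# Error exponents from Theorem 4.5 (PageRank case, error decays as (1-d)^l):
exp_coded = (2.0 / sorted(v)[k - 1]) * math.log(1.0 / (1.0 - d))  # k-th fastest of n
exp_uncoded = (2.0 / max(v[:k])) * math.log(1.0 / (1.0 - d))      # slowest of first k

# The k-th smallest of all n times is never larger than the maximum over
# the first k, so the coded exponent is always at least the uncoded one.
print(exp_coded >= exp_uncoded)  # True
```

The inequality holds deterministically for any realization of the v_i, which is exactly why the comparison in Theorem 4.5 does not depend on the speed distribution.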
Since the $k$-th fastest worker among all $n$ workers cannot be slower than the slowest one among the first (unordered) $k$ workers, the error exponent of coded linear inverse is larger than that of the uncoded and the replication-based linear inverse.

5 Analyzing the Computational Complexity

5.1 Encoding and decoding complexity

We first show that the encoding and decoding complexity of Algorithm 1 is in a scaling sense smaller than that of the computation at each worker. This ensures that straggling comes from the parallel workers, not the encoder or decoder. The proof of Theorem 5.1 is in Supplementary Section 8.10. In our experiment on the Google Plus graph (see Section 6) for computing PageRank, the computation time at each worker is 30 seconds and the encoding and decoding time at the central controller is about 1 second.

Theorem 5.1. The computational complexity of the encoding and decoding is $\Theta(nkN)$, where $N$ is the number of rows in the matrix $B$ and $k$, $n$ depend on the number of available workers, assuming that each worker performs a single linear inverse computation. For a general dense matrix $B$, the computational complexity of computing the linear inverse at each worker is $\Theta(N^2 l)$, where $l$ is the number of iterations in the specified iterative algorithm. The complexity of encoding and decoding is smaller than that of the computation at each worker for large $B$ matrices (large $N$).

5.2 Analysis of the cost of communication versus computation

In this work, we focus on optimizing the computation cost. However, what if the computation cost is small compared to the overall cost, including the communication cost? If so, optimizing the computation cost is not very useful. In Theorem 5.2 (proof in Supplementary Section 8.11), we show that the computation cost is larger than the communication cost in a scaling sense.

Theorem 5.2.
The ratio between the number of operations (computation) and the number of bits transmitted (communication) at the $i$-th worker is $\mathrm{COST}_{\mathrm{computation}}/\mathrm{COST}_{\mathrm{communication}} = \Theta(l_i \bar{d})$ operations per integer, where $l_i$ is the number of iterations at the $i$-th worker, and $\bar{d}$ is the average number of non-zeros in each row of the $B$ matrix.

6 Experiments on Real Systems

We test the performance of the coded linear inverse algorithm for the PageRank problem on the Twitter graph and the Google Plus graph from the SNAP datasets [28]. The Twitter graph has 81,306 nodes and 1,768,149 edges, and the Google Plus graph has 107,614 nodes and 13,673,453 edges. We use the HTCondor framework in a cluster to conduct the experiments. The task is to solve $k = 100$ personalized PageRank problems in parallel using $n = 120$ workers. The uncoded algorithm picks the first $k$ workers and uses one worker for each PageRank problem. The two replication-based schemes replicate the computation of the first $n - k$ PageRank problems in the extra $n - k$ workers (see Section 4.2). The coded PageRank uses $n$ workers to solve these $k = 100$ equations using Algorithm 1. We use a $(120, 100)$ code whose generator matrix is the submatrix composed of the first 100 rows of a $120 \times 120$ DFT matrix. The computation results are shown in the left two figures in Fig. 2. Note that the two graphs are of different sizes, so the computations in the two experiments take different amounts of time. From Fig. 2, we can see that the mean-squared error of the uncoded and replication-based schemes is larger than that of coded computation by a factor of $10^4$ for large deadlines.

We also compare Algorithm 1 with the coded computing algorithm proposed in [6]. As discussed in Figure 1, the original coded technique in [6] ignores partial results and is suboptimal even in the toy example of three workers.
However, it has a natural extension to iterative methods, which will be discussed in detail later.

Figure 2: From left to right: (1,2) Experimentally computed overall MSE of uncoded, replication-based and coded personalized PageRank on the Twitter and Google Plus graphs on a cluster with 120 workers. The ratio of MSE for replication-based schemes and coded linear inverse increases as Tdl increases. (3) Comparison between an extended version of the algorithm in [6] and Algorithm 1 on the Google Plus graph. The figure shows that naively extending the general coded method using a matrix inverse introduces error amplification. (4) Comparison of different codes. In this experiment the DFT-code outperforms the other candidates in MSE.

The third figure in Fig. 2 shows the comparison between the performance of Algorithm 1 and this extension of the algorithm from [6]. This extension uses the (unfinished) partial results from the $k$ fastest workers to retrieve the required PageRank solutions. More concretely, suppose $S \subset [n]$ is the index set of the $k$ fastest workers. Then, this extension retrieves the solutions to the original $k$ PageRank problems by solving the equation $Y_S = [x_1^*, x_2^*, \ldots, x_k^*] \cdot G_S$, where $Y_S$ is composed of the (partial) computation results obtained from the fastest $k$ workers and $G_S$ is the $k \times k$ submatrix composed of the columns of the generator matrix $G$ with indexes in $S$. However, since there is some remaining error at each worker (i.e., the computation results $Y_S$ have not converged yet), the matrix-inverse-based decoding from [6] magnifies the error due to the large condition number of $G_S$. This is why the algorithm in [6] should not be naively extended to the coded linear inverse problem.

One question remains: what is the best code design for the coded linear inverse algorithm?
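The error amplification just described is easy to reproduce. The numpy sketch below builds the DFT-based generator matrix (first k rows of an n × n DFT matrix, as in Section 6, with sizes shrunk for brevity), picks a random index set S of k "fastest" workers, and reports the condition number of GS; a large value means any unconverged residual in YS is magnified by matrix-inverse decoding, which Algorithm 1 avoids:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 24, 20   # shrunk from the paper's (120, 100) code for brevity

# Generator matrix: first k rows of an n x n DFT matrix.
F = np.fft.fft(np.eye(n))
G = F[:k, :]

# S: indexes of the k "fastest" workers (chosen at random here); G_S is k x k.
S = np.sort(rng.choice(n, size=k, replace=False))
G_S = G[:, S]

# Condition number of G_S: roughly the worst-case factor by which the
# workers' residual error is amplified under inverse-based decoding.
print("cond(G_S) =", np.linalg.cond(G_S))
```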
Although we do not have a concrete answer to this question, we have tested different codes (with different generator matrices $G$) in the Twitter graph experiment, all using Algorithm 1. The results are shown in the fourth figure in Fig. 2. The generator matrix used for the "binary" curve has i.i.d. binary entries in $\{-1, 1\}$. The generator matrix used for the "sparse" curve has random binary sparse entries. The generator matrix for the "Gaussian" curve has i.i.d. standard Gaussian entries. In this experiment, the DFT-code performs the best. However, finding the best code in general remains meaningful future work.

7 Conclusions

By studying coding for iterative algorithms designed for distributed inverse problems, we aim to introduce new applications and analytical tools to the problem of coded computing with stragglers. Since these iterative algorithms designed for inverse problems commonly have decreasing error with time, the partial computation results at stragglers can provide useful information for the final outputs. Note that this is unlike recent works on coding for multi-stage computing problems [29, 30], where the computation error can accumulate with time and coding has to be applied repeatedly to suppress this error accumulation. An important connection worth discussing is the diversity gain in this coded computing problem. The distributed computing setting in this work resembles random fading channels, which means coding can be used to exploit straggling diversity just as coding is used in communication channels to turn diverse channel fading into an advantage. What makes coding even more suitable in our setting is that the diversity gain achievable here through replication is actually smaller than what replication achieves in fading channels.
This is because for two computers that solve the same equation $Mx_i = r_i$, the remaining error at the slow worker is a deterministic multiple of the remaining error at the fast worker (see equation (3)). Therefore, taking a weighted average of the two computation results through replication does not reduce error as it would over independent fading channels. How diversity gain can be achieved optimally here is worth deeper investigation. Our next goals are two-fold: (1) extend the current method to solving a single large-scale inverse problem, such as graph mining with graphs that exceed the memory of a single machine; (2) carry out experiments on faster distributed systems such as Amazon EC2.

References

[1] J. Dean and L. A. Barroso. The tail at scale. Communications of the ACM, 56(2):74–80, 2013.

[2] G. Joshi, Y. Liu, and E. Soljanin. On the delay-storage trade-off in content download from coded distributed storage systems. IEEE Journal on Selected Areas in Communications, 32(5):989–997, 2014.

[3] D. Wang, G. Joshi, and G. Wornell. Efficient task replication for fast response times in parallel computation. In ACM SIGMETRICS Performance Evaluation Review, volume 42, pages 599–600. ACM, 2014.

[4] D. Wang, G. Joshi, and G. Wornell. Using straggler replication to reduce latency in large-scale parallel computing. ACM SIGMETRICS Performance Evaluation Review, 43(3):7–11, 2015.

[5] L. Huang, S. Pawar, H. Zhang, and K. Ramchandran. Codes can reduce queueing delay in data centers.
In IEEE International Symposium on Information Theory Proceedings (ISIT), pages 2766–2770. IEEE, 2012.

[6] K. Lee, M. Lam, R. Pedarsani, D. Papailiopoulos, and K. Ramchandran. Speeding up distributed machine learning using codes. In IEEE International Symposium on Information Theory (ISIT), pages 1143–1147. IEEE, 2016.

[7] R. Tandon, Q. Lei, A. G. Dimakis, and N. Karampatziakis. Gradient coding. 2016.

[8] S. Dutta, V. Cadambe, and P. Grover. Short-dot: Computing large linear transforms distributedly using coded short dot products. In Advances in Neural Information Processing Systems, pages 2092–2100, 2016.

[9] N. S. Ferdinand and S. C. Draper. Anytime coding for distributed computation. In 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 954–960. IEEE, 2016.

[10] S. Li, M. A. Maddah-Ali, and A. S. Avestimehr. A unified coding framework for distributed computing with straggling servers. In IEEE Globecom Workshops (GC Wkshps), pages 1–6. IEEE, 2016.

[11] A. Reisizadehmobarakeh, S. Prakash, R. Pedarsani, and S. Avestimehr. Coded computation over heterogeneous clusters. In IEEE International Symposium on Information Theory (ISIT), pages 2408–2412. IEEE, 2017.

[12] S. Li, M. A. Maddah-Ali, and A. S. Avestimehr. Coding for distributed fog computing. IEEE Communications Magazine, 55(4):34–40, 2017.

[13] Q. Yu, M. A. Maddah-Ali, and A. S. Avestimehr. Polynomial codes: an optimal design for high-dimensional coded matrix multiplication. In Advances in Neural Information Processing Systems, 2017.

[14] K. Lee, C. Suh, and K. Ramchandran. High-dimensional coded matrix multiplication. In IEEE International Symposium on Information Theory (ISIT), pages 2418–2422. IEEE, 2017.

[15] K. Lee, R. Pedarsani, D. Papailiopoulos, and K. Ramchandran. Coded computation for multicore setups.
In IEEE International Symposium on Information Theory (ISIT), pages 2413–2417. IEEE, 2017.

[16] K.-H. Huang et al. Algorithm-based fault tolerance for matrix operations. IEEE Transactions on Computers, 100(6):518–528, 1984.

[17] Y. Saad. Iterative Methods for Sparse Linear Systems. SIAM, 2003.

[18] T. H. Haveliwala. Topic-sensitive PageRank. In Proceedings of the 11th International Conference on World Wide Web, pages 517–526. ACM, 2002.

[19] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab, 1999.

[20] M. Haikin and R. Zamir. Analog coding of a source with erasures. In IEEE International Symposium on Information Theory, pages 2074–2078. IEEE, 2016.

[21] A. G. Dimakis, P. B. Godfrey, Y. Wu, M. J. Wainwright, and K. Ramchandran. Network coding for distributed storage systems. IEEE Transactions on Information Theory, 56(9):4539–4551, 2010.

[22] M. Sathiamoorthy, M. Asteris, D. Papailiopoulos, A. G. Dimakis, R. Vadali, S. Chen, and D. Borthakur. XORing elephants: Novel erasure codes for big data. In Proceedings of the VLDB Endowment, volume 6, pages 325–336. VLDB Endowment, 2013.

[23] M. A. Maddah-Ali and U. Niesen. Decentralized coded caching attains order-optimal memory-rate tradeoff. IEEE/ACM Transactions on Networking, 23(4):1029–1040, 2015.

[24] S. Li, M. A. Maddah-Ali, and A. S. Avestimehr. Coded MapReduce. In 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 964–971. IEEE, 2015.

[25] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Processing Magazine, 30(3):83–98, 2013.

[26] A. Sandryhaila and J. M. F. Moura. Discrete signal processing on graphs.
IEEE Transactions on Signal Processing, 61(7):1644–1656, 2013.

[27] G. H. Golub and C. F. Van Loan. Matrix Computations, volume 3. JHU Press, 2012.

[28] J. Leskovec and J. J. Mcauley. Learning to discover social circles in ego networks. In Advances in Neural Information Processing Systems, pages 539–547, 2012.

[29] Y. Yang, P. Grover, and S. Kar. Computing linear transformations with unreliable components. IEEE Transactions on Information Theory, 2017.

[30] Y. Yang, P. Grover, and S. Kar. Rate distortion for lossy in-network linear function computation and consensus: Distortion accumulation and sequential reverse water-filling. IEEE Transactions on Information Theory, 2017.

[31] X. Wang, P. Liu, and Y. Gu. Local-set-based graph signal reconstruction. IEEE Transactions on Signal Processing, 63(9):2432–2444, 2015.

[32] S. K. Narang, A. Gadde, E. Sanou, and A. Ortega. Localized iterative methods for interpolation in graph structured data. In 2013 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pages 491–494. IEEE, 2013.

[33] S. Chen, R. Varma, A. Sandryhaila, and J. Kovačević. Discrete signal processing on graphs: Sampling theory. IEEE Transactions on Signal Processing, 63(24):6510–6523, 2015.

[34] S. Chen, Y. Yang, C. Faloutsos, and J. Kovačević. Monitoring Manhattan's traffic at 5 intersections? In 2016 IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2016.

[35] A. M. Mood, F. A. Graybill, and D. C. Boes. Introduction to the Theory of Statistics, 3rd edition. 1974.

[36] H. Zhang and F. Ding. On the Kronecker products and their applications.
Journal of Applied Mathematics, 2013, 2013.