{"title": "A primal-dual method for conic constrained distributed optimization problems", "book": "Advances in Neural Information Processing Systems", "page_first": 5049, "page_last": 5057, "abstract": "We consider cooperative multi-agent consensus optimization problems over an undirected network of agents, where only those agents connected by an edge can directly communicate. The objective is to minimize the sum of agent-specific composite convex functions over agent-specific private conic constraint sets; hence, the optimal consensus decision should lie in the intersection of these private sets. We provide convergence rates in sub-optimality, infeasibility and consensus violation; examine the effect of underlying network topology on the convergence rates of the proposed decentralized algorithms; and show how to extend these methods to handle time-varying communication networks.", "full_text": "A primal-dual method for conic constrained\n\ndistributed optimization problems\n\nNecdet Serhat Aybat\n\nErfan Yazdandoost Hamedani\n\nDepartment of Industrial Engineering\n\nDepartment of Industrial Engineering\n\nPenn State University\n\nUniversity Park, PA 16802\n\nnsa10@psu.edu\n\nPenn State University\n\nUniversity Park, PA 16802\n\nevy5047@psu.edu\n\nAbstract\n\nWe consider cooperative multi-agent consensus optimization problems over an\nundirected network of agents, where only those agents connected by an edge\ncan directly communicate. The objective is to minimize the sum of agent-\nspeci\ufb01c composite convex functions over agent-speci\ufb01c private conic constraint\nsets; hence, the optimal consensus decision should lie in the intersection of these\nprivate sets. 
We provide convergence rates in sub-optimality, infeasibility and consensus violation; examine the effect of underlying network topology on the convergence rates of the proposed decentralized algorithms; and show how to extend these methods to handle time-varying communication networks.

1 Introduction

Let G = (N, E) denote a connected undirected graph of N computing nodes, where N ≜ {1, ..., N} and E ⊆ N × N denotes the set of edges – without loss of generality assume that (i, j) ∈ E implies i < j. Suppose nodes i and j can exchange information only if (i, j) ∈ E, and each node i ∈ N has a private (local) cost function Φ_i : R^n → R ∪ {+∞} such that

    Φ_i(x) ≜ ρ_i(x) + f_i(x),   (1)

where ρ_i : R^n → R ∪ {+∞} is a possibly non-smooth convex function, and f_i : R^n → R is a smooth convex function. We assume that f_i is differentiable on an open set containing dom ρ_i with a Lipschitz continuous gradient ∇f_i, of which the Lipschitz constant is L_i; and the prox map of ρ_i,

    prox_{ρ_i}(x) ≜ argmin_{y∈R^n} { ρ_i(y) + (1/2) ‖y − x‖² },   (2)

is efficiently computable for i ∈ N, where ‖·‖ denotes the Euclidean norm. Let N_i ≜ {j ∈ N : (i, j) ∈ E or (j, i) ∈ E} denote the set of neighboring nodes of i ∈ N, and d_i ≜ |N_i| is the degree of node i ∈ N. Consider the following minimization problem:

    min_{x∈R^n} Σ_{i∈N} Φ_i(x)   s.t.   A_i x − b_i ∈ K_i,  ∀i ∈ N,   (3)

where A_i ∈ R^{m_i×n}, b_i ∈ R^{m_i} and K_i ⊆ R^{m_i} is a closed, convex cone. Suppose that projections onto K_i can be computed efficiently, while the projection onto the preimage A_i^{−1}(K_i + b_i) is assumed to be impractical; e.g., when K_i is the positive semidefinite cone, projection onto the preimage requires solving an SDP.
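For instance, for ρ_i(x) = λ‖x‖₁ (as in the LASSO instance discussed below), the prox map in (2) has the well-known closed form of coordinate-wise soft-thresholding. A minimal sketch; the test vector and threshold are illustrative, not from the paper:

```python
import numpy as np

def prox_l1(x, t):
    """prox_{t*||.||_1}(x), cf. (2): coordinate-wise soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

x = np.array([3.0, -0.5, 0.2, -2.0])
p = prox_l1(x, 1.0)
# p == [2.0, 0.0, 0.0, -1.0]: each coordinate is pulled toward 0 by at most t = 1
```

Coordinates with magnitude below the threshold are set exactly to zero, which is why this prox is cheap to evaluate even when n is large.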
Our objective is to solve (3) in a decentralized fashion using the computing nodes N and exchanging information only along the edges E. In Section 2 and Section 3, we consider (3) when the topology of the connectivity graph is static and time-varying, respectively.

This computational setting, i.e., decentralized consensus optimization, appears as a generic model for various applications in signal processing, e.g., [1, 2], machine learning, e.g., [3, 4, 5], and statistical inference, e.g., [6]. Clearly, (3) can also be solved in a "centralized" fashion by communicating all the private functions Φ_i to a central node, and solving the overall problem at this node.

30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.

However, such an approach can be very expensive both from communication and computation perspectives when compared to distributed algorithms, which are far more scalable to increasing problem data and network sizes. In particular, suppose (A_i, b_i) ∈ R^{m×(n+1)} and Φ_i(x) = λ‖x‖₁ + ‖A_i x − b_i‖² for some given λ > 0 for i ∈ N such that m ≪ n and N ≫ 1. Hence, (3) is a very large scale LASSO problem with distributed data. To solve (3) in a centralized fashion, the data {(A_i, b_i) : i ∈ N} needs to be communicated to the central node. This can be prohibitively expensive, and may also violate privacy constraints – in case some node i does not want to reveal the details of its private data. Furthermore, it requires that the central node has large enough memory to be able to accommodate all the data.
On the other hand, at the expense of slower convergence, one can completely do away with a central node, and seek consensus among all the nodes on an optimal decision using "local" decisions communicated by the neighboring nodes. From a computational perspective, for certain cases, computing partial gradients locally can be more efficient when compared to computing the entire gradient at a central node. With these considerations in mind, we propose decentralized algorithms that can compute solutions to (3) using only local computations, without explicitly requiring the nodes to communicate the functions {Φ_i : i ∈ N}; thereby circumventing all privacy, communication and memory issues. Examples of constrained machine learning problems that fit into our framework include multiple kernel learning [7] and primal linear support vector machine (SVM) problems. In the numerical section we implement the proposed algorithms on the primal SVM problem.

1.1 Previous Work

There has been active research [8, 9, 10, 11, 12] on solving convex-concave saddle point problems min_x max_y L(x, y). In [9], primal-dual proximal algorithms are proposed for convex-concave problems with known saddle-point structure min_x max_y L_s(x, y) ≜ Φ(x) + ⟨Tx, y⟩ − h(y), where Φ and h are convex functions, and T is a linear map. These algorithms converge with rate O(1/k) for the primal-dual gap; they can be modified to yield a convergence rate of O(1/k²) when either Φ or h is strongly convex, and a linear rate, O(1/e^k), when both Φ and h are strongly convex.
More recently, in [11] Chambolle and Pock extend their previous work in [9], using simpler proofs, to handle composite convex primal functions, i.e., the sum of smooth and (possibly) non-smooth functions, and to deal with proximity operators based on Bregman distance functions.

Consider min_{x∈R^n} { Σ_{i∈N} Φ_i(x) : x ∈ ∩_{i∈N} X_i } over G = (N, E). Although the unconstrained consensus optimization problem, i.e., X_i = R^n, is well studied – see [13, 14] and the references therein – the constrained case is still an immature and recently developing area of active research [13, 14, 15, 16, 17, 18, 19]. Other than a few exceptions, e.g., [15, 16, 17], the methods in the literature require that each node compute a projection onto the privately known set X_i, in addition to consensus and (sub)gradient steps, e.g., [18, 19]. Moreover, among those few exceptions that do not use projections onto X_i when Π_{X_i} is not easy to compute, only [15, 16] can handle agent-specific constraints without assuming global knowledge of the constraints by all agents. However, no rate results in terms of suboptimality, local infeasibility, and consensus violation exist for the primal-dual distributed methods in [15, 16] when implemented for the agent-specific conic constraint sets X_i = {x : A_i x − b_i ∈ K_i} studied in this paper. In [15], a consensus-based distributed primal-dual perturbation (PDP) algorithm using a square summable but not summable step-size sequence is proposed. The objective is to minimize a composition of a global network function (smooth) with the summation of local objective functions (smooth), subject to local compact sets and inequality constraints on the summation of agent-specific constrained functions. They showed that the local primal-dual iterate sequence converges to a global optimal primal-dual solution; however, no rate result was provided.
The proposed PDP method can also handle non-smooth constraints with similar convergence guarantees. Finally, while we were preparing this paper, we became aware of a very recent work [16] related to ours. The authors proposed a distributed algorithm on a time-varying communication network for solving saddle-point problems subject to consensus constraints. The algorithm can also be applied to solve consensus optimization problems with inequality constraints that can be written as a summation of local convex functions of local and global variables. Under some assumptions, it is shown that, using a carefully selected decreasing step-size sequence, the ergodic average of the primal-dual sequence converges with rate O(1/√k) in terms of saddle-point evaluation error; however, when applied to constrained optimization problems, no rate in terms of either suboptimality or infeasibility is provided.

Contribution. We propose primal-dual algorithms for distributed optimization subject to agent-specific conic constraints. By assuming a composite convex structure on the primal functions, we show that our proposed algorithms converge with rate O(1/k), where k is the number of consensus iterations. To the best of our knowledge, this is the best rate result for our setting. Indeed, an ε-optimal and ε-feasible solution can be computed within O(1/ε) consensus iterations for the static topology, and within O(1/ε^{1+1/p}) consensus iterations for the dynamic topology for any rational p ≥ 1, although the O(1) constant gets larger for large p. Moreover, these methods are fully distributed, i.e., the agents are not required to know any global parameter depending on the entire network topology, e.g., the second smallest eigenvalue of the Laplacian; instead, we only assume that agents know who their neighbors are.
Due to limited space, we defer all technical proofs to the appendix.

1.2 Preliminary

Let X and Y be finite-dimensional vector spaces. In a recent paper, Chambolle and Pock [11] proposed a primal-dual algorithm (PDA) for the following convex-concave saddle-point problem:

    min_{x∈X} max_{y∈Y} L(x, y) ≜ Φ(x) + ⟨Tx, y⟩ − h(y),  where  Φ(x) ≜ ρ(x) + f(x),   (4)

ρ and h are possibly non-smooth convex functions, f is a convex function with a Lipschitz continuous gradient defined on dom ρ with constant L, and T is a linear map. Briefly, given x^0, y^0 and algorithm parameters ν_x, ν_y > 0, PDA consists of two proximal-gradient steps:

    x^{k+1} ← argmin_x ρ(x) + f(x^k) + ⟨∇f(x^k), x − x^k⟩ + ⟨Tx, y^k⟩ + (1/ν_x) D_x(x, x^k),   (5a)
    y^{k+1} ← argmin_y h(y) − ⟨T(2x^{k+1} − x^k), y⟩ + (1/ν_y) D_y(y, y^k),   (5b)

where D_x and D_y are Bregman distance functions corresponding to some continuously differentiable strongly convex ψ_x and ψ_y such that dom ψ_x ⊃ dom ρ and dom ψ_y ⊃ dom h. In particular, D_x(x, x̄) ≜ ψ_x(x) − ψ_x(x̄) − ⟨∇ψ_x(x̄), x − x̄⟩, and D_y is defined similarly. In [11], a simple proof of ergodic convergence is provided for (5); indeed, it is shown that, when the convexity modulus for ψ_x and ψ_y is 1, if ν_x, ν_y > 0 are chosen such that (1/ν_x − L)(1/ν_y) ≥ σ²_max(T), then

    L(x̄^K, y) − L(x, ȳ^K) ≤ (1/K) [ (1/ν_x) D_x(x, x^0) + (1/ν_y) D_y(y, y^0) − ⟨T(x − x^0), y − y^0⟩ ],   (6)

for all (x, y) ∈ X × Y, where x̄^K ≜ (1/K) Σ_{k=1}^K x^k and ȳ^K ≜ (1/K) Σ_{k=1}^K y^k.

First, we define the notation used throughout the paper.
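To make the recursion in (5a)-(5b) concrete, the sketch below runs PDA with Euclidean Bregman distances (ψ_x = ψ_y = ½‖·‖²) on a small instance with ρ ≡ 0, f(x) = ½‖x − c‖², and h(y) = ½‖y‖², so that (4) reduces to min_x ½‖x − c‖² + ½‖Tx‖² with closed-form solution (I + T⊤T) x* = c. The dimensions, data and step sizes are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3
T = rng.standard_normal((m, n))
c = rng.standard_normal(n)

L = 1.0                               # Lipschitz constant of grad f for f(x) = 0.5||x - c||^2
sig = np.linalg.norm(T, 2)            # sigma_max(T)
nu_x = 1.0 / (L + 1.1 * sig)          # then (1/nu_x - L)(1/nu_y) = 1.1*sig^2 > sigma_max(T)^2
nu_y = 1.0 / sig

x, y = np.zeros(n), np.zeros(m)
for _ in range(2000):
    # (5a): with rho = 0 the prox is the identity, so this is a gradient step
    x_new = x - nu_x * ((x - c) + T.T @ y)
    # (5b): prox step in y using the extrapolated point 2*x_new - x
    y = (y + nu_y * T @ (2 * x_new - x)) / (1.0 + nu_y)
    x = x_new

x_star = np.linalg.solve(np.eye(n) + T.T @ T, c)
# x should now be close to x_star
```

Since both f and h are strongly convex here, the iterates converge quickly; the point of the sketch is only the structure of the two half-steps and the extrapolation 2x^{k+1} − x^k.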
Next, in Theorem 1.1, we discuss a special case of (4), which will help us prove the main results of this paper, and also allow us to develop decentralized algorithms for the consensus optimization problem in (3). The proposed algorithms in this paper can distribute the computation over the nodes such that each node's computation is based on the local topology of G and the private information only available to that node.

Notation. Throughout the paper, ‖·‖ denotes the Euclidean norm. Given a convex set S, let σ_S(·) denote its support function, i.e., σ_S(θ) ≜ sup_{w∈S} ⟨θ, w⟩; let I_S(·) denote the indicator function of S, i.e., I_S(w) = 0 for w ∈ S and +∞ otherwise; and let P_S(w) ≜ argmin{‖v − w‖ : v ∈ S} denote the projection onto S. For a closed convex set S, we define the distance function as d_S(w) ≜ ‖P_S(w) − w‖. Given a convex cone K ⊆ R^m, let K* denote its dual cone, i.e., K* ≜ {θ ∈ R^m : ⟨θ, w⟩ ≥ 0 ∀w ∈ K}, and K° ≜ −K* denote the polar cone of K. Note that for a given cone K ⊆ R^m, σ_K(θ) = 0 for θ ∈ K° and +∞ if θ ∉ K°, i.e., σ_K(θ) = I_{K°}(θ) for all θ ∈ R^m. A cone K is called proper if it is closed, convex, pointed, and has nonempty interior. Given a convex function g : R^n → R ∪ {+∞}, its convex conjugate is defined as g*(w) ≜ sup_{θ∈R^n} ⟨w, θ⟩ − g(θ). ⊗ denotes the Kronecker product, and I_n is the n × n identity matrix.

Definition 1.
Let X ≜ Π_{i∈N} R^n and X ∋ x = [x_i]_{i∈N}; Y ≜ Π_{i∈N} R^{m_i} × R^{m_0}, Y ∋ y = [θ^⊤ λ^⊤]^⊤ and θ = [θ_i]_{i∈N} ∈ R^m, where m ≜ Σ_{i∈N} m_i, and Π denotes the Cartesian product. Given parameters γ > 0 and κ_i, τ_i > 0 for i ∈ N, let D_γ ≜ (1/γ) I_{m_0}, D_κ ≜ diag([(1/κ_i) I_{m_i}]_{i∈N}), and D_τ ≜ diag([(1/τ_i) I_n]_{i∈N}). Defining ψ_x(x) ≜ (1/2) x^⊤ D_τ x and ψ_y(y) ≜ (1/2) θ^⊤ D_κ θ + (1/2) λ^⊤ D_γ λ leads to the following Bregman distance functions: D_x(x, x̄) = (1/2) ‖x − x̄‖²_{D_τ}, and D_y(y, ȳ) = (1/2) ‖θ − θ̄‖²_{D_κ} + (1/2) ‖λ − λ̄‖²_{D_γ}, where the Q-norm is defined as ‖z‖_Q ≜ (z^⊤ Q z)^{1/2} for Q ≻ 0.

Theorem 1.1. Let X, Y, and Bregman functions D_x, D_y be defined as in Definition 1. Suppose Φ(x) ≜ Σ_{i∈N} Φ_i(x_i), and h(y) ≜ h_0(λ) + Σ_{i∈N} h_i(θ_i), where {Φ_i}_{i∈N} are composite convex functions defined as in (1), and {h_i}_{i∈N} are closed convex with simple prox-maps. Given A_0 ∈ R^{m_0×n|N|} and {A_i}_{i∈N} such that A_i ∈ R^{m_i×n}, let T = [A^⊤ A_0^⊤]^⊤, where A ≜ diag([A_i]_{i∈N}) ∈ R^{m×n|N|} is a block-diagonal matrix. Given the initial point (x^0, y^0), the PDA iterate sequence {x^k, y^k}_{k≥1}, generated according to (5a) and (5b) when ν_x = ν_y = 1, satisfies (6) for all K ≥ 1 if

    Q̄(A, A_0) ≜ [ D̄_τ   −A^⊤   −A_0^⊤ ;
                  −A     D_κ    0 ;
                  −A_0   0      D_γ ]  ⪰ 0,

where D̄_τ ≜ diag([(1/τ_i − L_i) I_n]_{i∈N}). Moreover, if a saddle point exists for (4), and Q̄(A, A_0) ≻ 0, then {x^k, y^k}_{k≥1} converges to a saddle point of (4); hence, {x̄^k, ȳ^k}_{k≥1} converges to the same point.

Although the proof of Theorem 1.1 follows along the lines of [11], we provide the proof in the appendix for the sake of completeness, as it will be used repeatedly to derive our results.

Next we discuss how (5) can be implemented to compute an ε-optimal solution to (3) in a distributed way using only O(1/ε) communications over the communication graph G while respecting node-specific privacy requirements. Later, in Section 3, we consider the scenario where the topology of the connectivity graph is time-varying, and propose a distributed algorithm that requires O(1/ε^{1+1/p}) communications for any p ≥ 1. Finally, in Section 4 we test the proposed algorithms by solving the primal SVM problem in a decentralized manner. These results are shown under Assumption 1.1.

Assumption 1.1. The duality gap for (3) is zero, and a primal-dual solution to (3) exists. A sufficient condition for this is the existence of a Slater point, i.e., there exists x̄ ∈ relint(dom Φ) such that A_i x̄ − b_i ∈ int(K_i) for i ∈ N, where dom Φ = ∩_{i∈N} dom Φ_i.

2 Static Network Topology

Let x_i ∈ R^n denote the local decision vector of node i ∈ N. By taking advantage of the fact that G is connected, we can reformulate (3) as the following distributed consensus optimization problem:

    min_{x_i∈R^n, i∈N} { Σ_{i∈N} Φ_i(x_i) | x_i = x_j : λ_{ij}, ∀(i, j) ∈ E,  A_i x_i − b_i ∈ K_i : θ_i, ∀i ∈ N },   (7)

where λ_{ij} ∈ R^n and θ_i ∈ R^{m_i} are the corresponding dual variables.
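Since σ_{K_i} = I_{K_i°}, the dual variables θ_i attached to the conic constraints in (7) effectively live in the polar cones K_i°. For the nonnegative orthant K = R^m_+ both projections are coordinate-wise clips, and the Moreau decomposition w = P_K(w) + P_{K°}(w) gives a quick sanity check of the cone notation; the data below is illustrative:

```python
import numpy as np

def proj_cone(w):
    """P_K for K = R^m_+ (coordinate-wise clipping from below)."""
    return np.maximum(w, 0.0)

def proj_polar(w):
    """P_{K°} for K = R^m_+; here K* = K, so K° = -K* = R^m_-."""
    return np.minimum(w, 0.0)

w = np.array([1.5, -2.0, 0.0, 3.0])
pk, pp = proj_cone(w), proj_polar(w)
# Moreau decomposition: w = P_K(w) + P_{K°}(w) with <P_K(w), P_{K°}(w)> = 0
```

The two pieces are orthogonal because each coordinate is nonzero in at most one of them; this is the identity behind the closed-form dual update used later.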
Let x = [x_i]_{i∈N} ∈ R^{n|N|}. The consensus constraints x_i = x_j for (i, j) ∈ E can be formulated as M x = 0, where M ∈ R^{n|E|×n|N|} is a block matrix such that M = H ⊗ I_n, where H is the oriented edge-node incidence matrix; i.e., the entry H_{(i,j),l}, corresponding to edge (i, j) ∈ E and l ∈ N, is equal to 1 if l = i, −1 if l = j, and 0 otherwise. Note that M^⊤M = H^⊤H ⊗ I_n = Ω ⊗ I_n, where Ω ∈ R^{|N|×|N|} denotes the graph Laplacian of G, i.e., Ω_{ii} = d_i, Ω_{ij} = −1 if (i, j) ∈ E or (j, i) ∈ E, and 0 otherwise.

For any closed convex set S, we have σ*_S(·) = I_S(·); therefore, using the fact that σ*_{K_i} = I_{K_i} for i ∈ N, one can obtain the following saddle point problem corresponding to (7):

    min_x max_y L(x, y) ≜ Σ_{i∈N} ( Φ_i(x_i) + ⟨θ_i, A_i x_i − b_i⟩ − σ_{K_i}(θ_i) ) + ⟨λ, M x⟩,   (8)

where y = [θ^⊤ λ^⊤]^⊤ for λ = [λ_{ij}]_{(i,j)∈E} ∈ R^{n|E|}, θ = [θ_i]_{i∈N} ∈ R^m, and m ≜ Σ_{i∈N} m_i.

Next, we study the distributed implementation of PDA in (5a)-(5b) to solve (8). Let Φ(x) ≜ Σ_{i∈N} Φ_i(x_i), and h(y) ≜ Σ_{i∈N} σ_{K_i}(θ_i) + ⟨b_i, θ_i⟩. Define the block-diagonal matrix A ≜ diag([A_i]_{i∈N}) ∈ R^{m×n|N|} and T = [A^⊤ M^⊤]^⊤.
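The identity M^⊤M = Ω ⊗ I_n used above can be checked numerically on a small graph; the 3-node path graph and the dimension n = 2 below are illustrative choices:

```python
import numpy as np

# Path graph on N = 3 nodes with edges (0,1) and (1,2); decision dimension n = 2.
N, n = 3, 2
edges = [(0, 1), (1, 2)]

# Oriented edge-node incidence matrix H: +1 at the smaller endpoint, -1 at the larger.
H = np.zeros((len(edges), N))
for e, (i, j) in enumerate(edges):
    H[e, i], H[e, j] = 1.0, -1.0

M = np.kron(H, np.eye(n))     # consensus operator: M x = 0 iff x_i = x_j along every edge
Omega = H.T @ H               # graph Laplacian: degrees on the diagonal, -1 on edges
# M^T M equals Omega kron I_n, and M annihilates consensus vectors 1 (x) xbar
```

Node degrees on this path are (1, 2, 1), which appear on the diagonal of Ω exactly as in the definition above.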
Therefore, given the initial iterates x^0, θ^0, λ^0 and parameters γ > 0, τ_i, κ_i > 0 for i ∈ N, choosing D_x and D_y as defined in Definition 1, and setting ν_x = ν_y = 1, the PDA iterations in (5a)-(5b) take the following form:

    x^{k+1} ← argmin_x ⟨λ^k, M x⟩ + Σ_{i∈N} [ ρ_i(x_i) + ⟨∇f_i(x_i^k), x_i⟩ + ⟨A_i x_i − b_i, θ_i^k⟩ + (1/2τ_i) ‖x_i − x_i^k‖² ],   (9a)
    θ_i^{k+1} ← argmin_{θ_i} σ_{K_i}(θ_i) − ⟨A_i(2x_i^{k+1} − x_i^k) − b_i, θ_i⟩ + (1/2κ_i) ‖θ_i − θ_i^k‖²,  i ∈ N,   (9b)
    λ^{k+1} ← argmin_λ { −⟨M(2x^{k+1} − x^k), λ⟩ + (1/2γ) ‖λ − λ^k‖² } = λ^k + γ M(2x^{k+1} − x^k).   (9c)

Since K_i is a cone, prox_{κ_i σ_{K_i}}(·) = P_{K_i°}(·); hence, θ_i^{k+1} can be written in closed form as

    θ_i^{k+1} = P_{K_i°}( θ_i^k + κ_i (A_i(2x_i^{k+1} − x_i^k) − b_i) ),  i ∈ N.

Using the recursion in (9c), we can write λ^{k+1} as a partial summation of the primal iterates {x^ℓ}_{ℓ=0}^k, i.e., λ^k = λ^0 + γ Σ_{ℓ=0}^{k−1} M(2x^{ℓ+1} − x^ℓ). Let λ^0 ← γ M x^0, s^0 ← x^0, and s^k ≜ x^k + Σ_{ℓ=1}^k x^ℓ for k ≥ 1; hence, λ^k = γ M s^k. Using the fact that M^⊤M = Ω ⊗ I_n, we obtain

    ⟨M x, λ^k⟩ = γ ⟨x, (Ω ⊗ I_n) s^k⟩ = γ Σ_{i∈N} ⟨x_i, Σ_{j∈N_i} (s_i^k − s_j^k)⟩.

Thus, the PDA iterations given in (9) for the static graph G can be computed in a decentralized way, via the node-specific computations in Algorithm DPDA-S displayed in Fig. 1 below.

Algorithm DPDA-S ( x^0, θ^0, γ, {τ_i, κ_i}_{i∈N} )
Initialization: s_i^0 ← x_i^0,  i ∈ N
Step k: (k ≥ 0)
1. x_i^{k+1} ← prox_{τ_i ρ_i}( x_i^k − τ_i ( ∇f_i(x_i^k) + A_i^⊤ θ_i^k + γ Σ_{j∈N_i} (s_i^k − s_j^k) ) ),  i ∈ N
2. s_i^{k+1} ← x_i^{k+1} + Σ_{ℓ=1}^{k+1} x_i^ℓ,  i ∈ N
3. θ_i^{k+1} ← P_{K_i°}( θ_i^k + κ_i (A_i(2x_i^{k+1} − x_i^k) − b_i) ),  i ∈ N

Figure 1: Distributed Primal Dual Algorithm for Static G (DPDA-S)

The convergence rate for DPDA-S, given in (6), follows from Theorem 1.1 with the help of the following technical lemma, which provides a sufficient condition for Q̄(A, A_0) ≻ 0.

Lemma 2.1. Given {τ_i, κ_i}_{i∈N} and γ such that γ > 0, and τ_i, κ_i > 0 for i ∈ N, let A_0 = M and A ≜ diag([A_i]_{i∈N}). Then Q̄ ≜ Q̄(A, A_0) ⪰ 0 if {τ_i, κ_i}_{i∈N} and γ are chosen such that

    (1/τ_i − L_i − 2γ d_i) (1/κ_i) ≥ σ²_max(A_i),  ∀ i ∈ N,   (10)

and Q̄ ≻ 0 if (10) holds with strict inequality, where Q̄(A, A_0) is defined in Theorem 1.1.

Remark 2.1. Choosing τ_i = (c_i + L_i + 2γ d_i)^{−1} and κ_i = c_i / σ²_max(A_i) for any c_i > 0 satisfies (10).

Next, we quantify the suboptimality and infeasibility of the DPDA-S iterate sequence.

Theorem 2.2. Suppose Assumption 1.1 holds. Let {x^k, θ^k, λ^k}_{k≥0} be the sequence generated by Algorithm DPDA-S, displayed in Fig. 1, initialized from an arbitrary x^0 and θ^0 = 0. Let the step sizes {τ_i, κ_i}_{i∈N} and γ be chosen satisfying (10) with strict inequality.
Then {x^k, θ^k, λ^k}_{k≥0} converges to {x*, θ*, λ*}, a saddle point of (8) such that x* = 1 ⊗ x* and (x*, θ*) is a primal-dual optimal solution to (3); moreover, the following error bounds hold for all K ≥ 1:

    |Φ(x̄^K) − Φ(x*)| ≤ Θ₁/K,    ‖λ*‖ ‖M x̄^K‖ + Σ_{i∈N} ‖θ_i*‖ d_{K_i}(A_i x̄_i^K − b_i) ≤ Θ₁/K,

where Θ₁ ≜ (2/γ) ‖λ*‖² − (γ/2) ‖M x^0‖² + Σ_{i∈N} [ (1/2τ_i) ‖x_i* − x_i^0‖² + (4/κ_i) ‖θ_i*‖² ], and x̄^K ≜ (1/K) Σ_{k=1}^K x^k.

3 Dynamic Network Topology

In this section we develop a distributed primal-dual algorithm for solving (3) when the communication network topology is time-varying. We assume a compact domain; i.e., let D_i ≜ max_{x_i, x_i′ ∈ dom ρ_i} ‖x_i − x_i′‖ and B ≜ max_{i∈N} D_i < ∞. Let C be the set of consensus decisions:

    C ≜ { x ∈ R^{n|N|} : x_i = x̄, ∀i ∈ N, for some x̄ ∈ R^n s.t. ‖x̄‖ ≤ B };

then one can reformulate (3) in a decentralized way as follows:

    min_x max_y L(x, y) ≜ Σ_{i∈N} ( Φ_i(x_i) + ⟨θ_i, A_i x_i − b_i⟩ − σ_{K_i}(θ_i) ) + ⟨λ, x⟩ − σ_C(λ),   (11)

where y = [θ^⊤ λ^⊤]^⊤ such that λ ∈ R^{n|N|}, θ = [θ_i]_{i∈N} ∈ R^m, and m ≜ Σ_{i∈N} m_i.

Next, we consider the implementation of PDA in (5) to solve (11). Let Φ(x) ≜ Σ_{i∈N} Φ_i(x_i), and h(y) ≜ σ_C(λ) + Σ_{i∈N} σ_{K_i}(θ_i) + ⟨b_i, θ_i⟩. Define the block-diagonal matrix A ≜ diag([A_i]_{i∈N}) ∈ R^{m×n|N|} and T = [A^⊤ I_{n|N|}]^⊤.
Therefore, given the initial iterates x^0, θ^0, λ^0 and parameters γ > 0, τ_i, κ_i > 0 for i ∈ N, choosing D_x and D_y as defined in Definition 1, and setting ν_x = ν_y = 1, the PDA iterations given in (5) take the following form: starting from μ^0 = λ^0, compute for i ∈ N

    x_i^{k+1} ← argmin_{x_i} ρ_i(x_i) + ⟨∇f_i(x_i^k), x_i⟩ + ⟨A_i x_i − b_i, θ_i^k⟩ + ⟨x_i, μ_i^k⟩ + (1/2τ_i) ‖x_i − x_i^k‖²,   (12a)
    θ_i^{k+1} ← argmin_{θ_i} σ_{K_i}(θ_i) − ⟨A_i(2x_i^{k+1} − x_i^k) − b_i, θ_i⟩ + (1/2κ_i) ‖θ_i − θ_i^k‖²,   (12b)
    λ^{k+1} ← argmin_μ σ_C(μ) − ⟨2x^{k+1} − x^k, μ⟩ + (1/2γ) ‖μ − μ^k‖²,   μ^{k+1} ← λ^{k+1}.   (12c)

Using the extended Moreau decomposition for proximal operators, λ^{k+1} can be written as

    λ^{k+1} = argmin_μ σ_C(μ) + (1/2γ) ‖μ − (μ^k + γ(2x^{k+1} − x^k))‖² = prox_{γσ_C}( μ^k + γ(2x^{k+1} − x^k) )
            = μ^k + γ(2x^{k+1} − x^k) − γ P_C( (1/γ) μ^k + 2x^{k+1} − x^k ).   (13)

Let 1 ∈ R^{|N|} be the vector of all ones, and B_0 ≜ {x ∈ R^n : ‖x‖ ≤ B}. Note that P_{B_0}(x) = x min{1, B/‖x‖}. For any x = [x_i]_{i∈N} ∈ R^{n|N|}, P_C(x) can be computed as

    P_C(x) = 1 ⊗ p(x),  where  p(x) ≜ argmin_{ξ∈B_0} Σ_{i∈N} ‖ξ − x_i‖² = argmin_{ξ∈B_0} ‖ξ − (1/|N|) Σ_{i∈N} x_i‖².   (14)

Let B ≜ {x : ‖x_i‖ ≤ B, i ∈ N} = Π_{i∈N} B_0. Hence, we can write P_C(x) = P_B((W ⊗ I_n) x), where W ≜ (1/|N|) 1 1^⊤ ∈ R^{|N|×|N|}. Equivalently,

    P_C(x) = P_B(1 ⊗ p̃(x)),  where  p̃(x) ≜ (1/|N|) Σ_{i∈N} x_i.   (15)

Although the x-step and θ-step of the PDA implementation in (12) can be computed locally at each node, computing λ^{k+1} requires communication among the nodes. Indeed, evaluating the average operator p̃(·) is not a simple operation in a decentralized computational setting which only allows for communication among neighbors. In order to overcome this issue, we will approximate the p̃(·) operator using multi-consensus steps, and analyze the resulting iterations as an inexact primal-dual algorithm. In [20], this idea has been exploited within a distributed primal algorithm for unconstrained consensus optimization problems. We define a consensus step as one round of exchanging local variables among neighboring nodes – the details of this operation will be discussed shortly.

Since the connectivity network is dynamic, let G^t = (N, E^t) be the connectivity network at the time the t-th consensus step is realized, for t ∈ Z_+. We adopt the information exchange model in [21].

Assumption 3.1. Let V^t ∈ R^{|N|×|N|} be the weight matrix corresponding to G^t = (N, E^t) at the time of the t-th consensus step, and N_i^t ≜ {j ∈ N : (i, j) ∈ E^t or (j, i) ∈ E^t}. Suppose for all t ∈ Z_+: (i) V^t is doubly stochastic; (ii) there exists ζ ∈ (0, 1) such that for i ∈ N, V_{ij}^t ≥ ζ if j ∈ N_i^t, and V_{ij}^t = 0 if j ∉ N_i^t ∪ {i}; (iii) G^∞ = (N, E^∞) is connected, where E^∞ ≜ {(i, j) ∈ N × N : (i, j) ∈ E^t for infinitely many t ∈ Z_+}, and there exists Z ∋ T° > 1 such that if (i, j) ∈ E^∞, then (i, j) ∈ E^t ∪ E^{t+1} ∪ ... ∪ E^{t+T°−1} for all t ≥ 1.

Lemma 3.1 ([21]). Let Assumption 3.1 hold, and define W^{t,s} ≜ V^t V^{t−1} ... V^{s+1} for t ≥ s + 1.
Given s ≥ 0, the entries of W^{t,s} converge to 1/N as t → ∞ with a geometric rate; i.e., for all i, j ∈ N, one has |W_{ij}^{t,s} − 1/N| ≤ Γ α^{t−s}, where Γ ≜ 2(1 + ζ^{−T̄})/(1 − ζ^{T̄}), α ≜ (1 − ζ^{T̄})^{1/T̄}, and T̄ ≜ (N − 1) T°.

Consider the k-th iteration of PDA as shown in (12). Instead of computing λ^{k+1} exactly according to (13), we propose to approximate λ^{k+1} with the help of Lemma 3.1 and set μ^{k+1} to this approximation. In particular, let t_k be the total number of consensus steps done before the k-th iteration of PDA, and let q_k ≥ 1 be the number of consensus steps within iteration k. For x = [x_i]_{i∈N}, define

    R^k(x) ≜ P_B( (W^{t_k+q_k, t_k} ⊗ I_n) x )   (16)

to approximate P_C(x) in (13). Note that R^k(·) can be computed in a distributed fashion requiring q_k communications with the neighbors for each node. Indeed,

    R^k(x) = [R_i^k(x)]_{i∈N}  such that  R_i^k(x) ≜ P_{B_0}( Σ_{j∈N} W_{ij}^{t_k+q_k, t_k} x_j ).   (17)

Moreover, the approximation error, R^k(x) − P_C(x), for any x can be bounded as in (18) due to the non-expansivity of P_B and using Lemma 3.1. From (15), we get for all i ∈ N,

    ‖R_i^k(x) − P_{B_0}(p̃(x))‖ = ‖P_{B_0}( Σ_{j∈N} W_{ij}^{t_k+q_k, t_k} x_j ) − P_{B_0}( (1/N) Σ_{j∈N} x_j )‖ ≤ ‖Σ_{j∈N} (W_{ij}^{t_k+q_k, t_k} − 1/N) x_j‖ ≤ √N Γ α^{q_k} ‖x‖.   (18)

Thus, (15) implies that ‖R^k(x) − P_C(x)‖ ≤ N Γ α^{q_k} ‖x‖.
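The effect of the multi-consensus approximation in (16)-(17) is easy to see numerically: exact P_C averages all local blocks and clips to B_0, while R^k replaces the exact average with q rounds of doubly stochastic mixing, and the error decays geometrically in q, as predicted by Lemma 3.1. The Metropolis-type mixing matrix on a fixed 3-node path below is an illustrative choice (a static special case of Assumption 3.1); the radius B and data are illustrative too:

```python
import numpy as np

B = 10.0                                   # radius of B_0 (never active for this data)
N, n = 3, 2

def proj_ball(v, r=B):                     # P_{B_0}(v) = v * min(1, r/||v||)
    nv = np.linalg.norm(v)
    return v if nv <= r else v * (r / nv)

# Doubly stochastic mixing matrix for the path 0-1-2, positive on edges and diagonal.
V = np.array([[2/3, 1/3, 0.0],
              [1/3, 1/3, 1/3],
              [0.0, 1/3, 2/3]])

x = np.array([[1.0, -2.0], [2.0, -1.0], [3.0, 0.0]])   # one n-vector per node

pc = np.stack([proj_ball(x.mean(axis=0))] * N)          # exact P_C(x), cf. (15)

def R(x, q):
    """q mixing rounds followed by per-node ball projection, cf. (16)-(17)."""
    W = np.linalg.matrix_power(V, q)
    return np.stack([proj_ball(W[i] @ x) for i in range(N)])

errs = [np.linalg.norm(R(x, q) - pc) for q in (1, 4, 8, 16)]
# errs shrinks geometrically with q (the second eigenvalue of V is 2/3 here)
```

Each extra mixing round costs one neighbor exchange per node, which is exactly the communication/accuracy trade-off that the choice of q_k controls in DPDA-D.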
Next, to obtain an inexact variant of (12), we replace the exact computation in (12c) with the inexact iteration rule:

    μ^{k+1} ← μ^k + γ(2x^{k+1} − x^k) − γ R^k( (1/γ) μ^k + 2x^{k+1} − x^k ).   (19)

Thus, the PDA iterations given in (12) can be computed inexactly, but in a decentralized way under dynamic connectivity, via the node-specific computations in Algorithm DPDA-D displayed in Fig. 2 below.

Algorithm DPDA-D ( x^0, θ^0, γ, {τ_i, κ_i}_{i∈N}, {q_k}_{k≥0} )
Initialization: μ_i^0 ← 0,  i ∈ N
Step k: (k ≥ 0)
1. x_i^{k+1} ← prox_{τ_i ρ_i}( x_i^k − τ_i ( ∇f_i(x_i^k) + A_i^⊤ θ_i^k + μ_i^k ) ),  i ∈ N
2. θ_i^{k+1} ← P_{K_i°}( θ_i^k + κ_i (A_i(2x_i^{k+1} − x_i^k) − b_i) ),  i ∈ N
3. r_i ← (1/γ) μ_i^k + 2x_i^{k+1} − x_i^k,  i ∈ N
4. For ℓ = 1, ..., q_k:  r_i ← Σ_{j ∈ N_i^{t_k+ℓ} ∪ {i}} V_{ij}^{t_k+ℓ} r_j,  i ∈ N
5. End For
6. μ_i^{k+1} ← μ_i^k + γ(2x_i^{k+1} − x_i^k) − γ P_{B_0}(r_i),  i ∈ N

Figure 2: Distributed Primal Dual Algorithm for Dynamic G^t (DPDA-D)

Next, we define the proximal error sequence {e^k}_{k≥1} as in (20), which will be used later for analyzing the convergence of Algorithm DPDA-D displayed in Fig. 2:

    e^{k+1} ≜ P_C( (1/γ) μ^k + 2x^{k+1} − x^k ) − R^k( (1/γ) μ^k + 2x^{k+1} − x^k );   (20)

hence, μ^k = λ^k + γ e^k for k ≥ 1 when (12c) is replaced with (19).
In the rest, we assume $\mu^0 = 0$. The following observation will also be useful for proving error bounds for the DPDA-D iterate sequence. For each $i \in \mathcal{N}$, the definition of $\mathcal{R}^k_i$ in (17) implies that $\mathcal{R}^k_i(\mathbf{x}) \in \mathcal{B}_0$ for all $\mathbf{x}$; hence, from (19),
$$\|\mu^{k+1}_i\| \leq \big\|\mu^k_i + \gamma(2x^{k+1}_i - x^k_i)\big\| + \gamma\,\Big\|\mathcal{R}^k_i\Big(\tfrac{1}{\gamma}\mu^{k} + 2\mathbf{x}^{k+1} - \mathbf{x}^{k}\Big)\Big\| \leq \|\mu^k_i\| + 4\gamma B. \qquad (21)$$
Thus, we trivially obtain the following bound on $\|\mu^k\|$:
$$\|\mu^k\| \leq 4\gamma\sqrt{N}\,B\,k. \qquad (22)$$
Moreover, for any $\mu$ and $\lambda$ we have
$$\sigma_{\mathcal{C}}(\mu) = \sup_{\mathbf{x}\in\mathcal{C}}\,\langle\lambda,\mathbf{x}\rangle + \langle\mu-\lambda,\mathbf{x}\rangle \leq \sigma_{\mathcal{C}}(\lambda) + \sqrt{N}\,B\,\|\mu-\lambda\|.$$

Theorem 3.2. Suppose Assumption 1.1 holds. Starting from $\mu^0 = 0$, $\theta^0 = 0$, and an arbitrary $\mathbf{x}^0$, let $\{\mathbf{x}^k, \theta^k, \mu^k\}_{k\geq 0}$ be the iterate sequence generated by Algorithm DPDA-D, displayed in Fig. 2, using $q_k = \sqrt[p]{k}$ consensus steps at the $k$-th iteration for all $k \geq 1$, for some rational $p \geq 1$. Let the primal-dual step-sizes $\{\tau_i, \kappa_i\}_{i\in\mathcal{N}}$ and $\gamma$ be chosen such that the following holds:
$$\Big(\frac{1}{\tau_i} - L_i - \gamma\Big)\frac{1}{\kappa_i} > \sigma^2_{\max}(A_i), \qquad \forall\, i \in \mathcal{N}. \qquad (23)$$
Then $\{\mathbf{x}^k, \theta^k, \mu^k\}_{k\geq 0}$ converges to $\{\mathbf{x}^*, \theta^*, \lambda^*\}$, a saddle point of (11) such that $\mathbf{x}^* = \mathbf{1} \otimes x^*$ and $(x^*, \theta^*)$ is a primal-dual optimal solution to (3). Moreover, the following bounds hold for all $K \geq 1$:
$$|\Phi(\bar{\mathbf{x}}^K) - \Phi(\mathbf{x}^*)| \leq \frac{\Theta_2 + \Theta_3(K)}{K}, \qquad \|\lambda^*\|\,d_{\tilde{\mathcal{C}}}(\bar{\mathbf{x}}^K) + \sum_{i\in\mathcal{N}}\|\theta^*_i\|\,d_{\mathcal{K}_i}(A_i\bar{x}^K_i - b_i) \leq \frac{\Theta_2 + \Theta_3(K)}{K},$$
where $\bar{\mathbf{x}}^K \triangleq \frac{1}{K}\sum_{k=1}^{K}\mathbf{x}^k$, $\Theta_2 \triangleq 2\|\lambda^*\|\big(\gamma\|\lambda^*\| + \|\mathbf{x}^0 - \mathbf{x}^*\|\big) + \sum_{i\in\mathcal{N}}\big[\frac{1}{\tau_i}\|x^*_i - x^0_i\|^2 + \frac{4}{\kappa_i}\|\theta^*_i\|^2\big]$, and $\Theta_3(K) \triangleq 8N^2B^2\Gamma\,\sum_{k=1}^{K}\alpha^{q_k}\big[2\gamma k^2 + \big(\gamma + \frac{\|\lambda^*\|}{\sqrt{N}B}\big)k\big]$. Moreover, $\sup_{K\in\mathbb{Z}_+}\Theta_3(K) < \infty$; hence, $\frac{1}{K}\Theta_3(K) = \mathcal{O}(\frac{1}{K})$.

Remark 3.1. Note that the suboptimality, infeasibility, and consensus violation at the $K$-th iteration are $\mathcal{O}(\Theta_3(K)/K)$, where $\Theta_3(K)$ represents the accumulation of approximation errors; moreover, $\Theta_3(K)$ can be bounded above for all $K \geq 1$ as $\Theta_3(K) \leq R\sum_{k=1}^{K}\alpha^{q_k}k^2$ for some constant $R > 0$. Since $\sum_{k=1}^{\infty}\alpha^{\sqrt[p]{k}}k^2 < \infty$ for any $p \geq 1$, if one chooses $q_k = \sqrt[p]{k}$ for $k \geq 1$, then the total number of communications per node until the end of the $K$-th iteration can be bounded above by $\sum_{k=1}^{K}q_k = \mathcal{O}(K^{1+1/p})$. For large $p$, $q_k$ grows slowly, which makes the method more practical at the cost of a longer convergence time, due to an increase in the $\mathcal{O}(1)$ constant. Note that $q_k = (\log k)^2$ also works and grows very slowly. We assume the agents know $q_k$ as a function of $k$ at the outset; hence, synchronicity can be achieved by simply counting local communications with each neighbor.

4 Numerical Section

We tested DPDA-S and DPDA-D on a primal linear SVM problem in which the data is distributed among the computing nodes in $\mathcal{N}$. For the static case, the communication network $\mathcal{G} = (\mathcal{N},\mathcal{E})$ is a connected graph generated by randomly adding edges to a spanning tree, itself generated uniformly at random, until a desired algebraic connectivity is achieved.
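The communication counts and the boundedness of the error accumulation discussed in Remark 3.1 can be checked in a few lines. The contraction factor $\alpha = 0.5$ below is an arbitrary illustrative value, not one derived from a particular network.

```python
import math

def total_comms(K, q):
    """Total number of consensus rounds performed per node over K PDA iterations."""
    return sum(q(k) for k in range(1, K + 1))

K = 10_000
for p in (1, 2, 4):
    tot = total_comms(K, lambda k, p=p: math.ceil(k ** (1.0 / p)))
    # O(K^{1+1/p}): the ratio to K^{1+1/p} stays bounded as K grows
    print(f"q_k = ceil(k^(1/{p})): {tot} rounds, ratio = {tot / K ** (1 + 1.0 / p):.3f}")

# the slowly growing schedule q_k = (log k)^2 mentioned in Remark 3.1
print("q_k = ceil(log(k)^2):", total_comms(K, lambda k: math.ceil(math.log(k + 1) ** 2)))

# Theta_3(K) is driven by sums of the form sum_k alpha^{q_k} k^2, which remain
# bounded because alpha^{k^(1/p)} decays faster than any polynomial grows
alpha = 0.5
acc = [0.0]
for k in range(1, K + 1):
    acc.append(acc[-1] + alpha ** math.ceil(math.sqrt(k)) * k ** 2)
print("partial sums at K = 100, 1000, 10000:", acc[100], acc[1000], acc[10000])
```

The printed partial sums flatten out quickly, illustrating $\sup_K \Theta_3(K) < \infty$, while the total communication grows like $K^{1+1/p}$, matching the trade-off described in the remark.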
For the dynamic case, for each consensus round $t \geq 1$, $\mathcal{G}^t$ is generated as in the static case, and $V^t \triangleq I - \frac{1}{c}\Omega^t$, where $\Omega^t$ is the Laplacian of $\mathcal{G}^t$ and the constant $c > d^t_{\max}$. We also ran DPDA-S and DPDA-D on line and complete graphs to observe the effect of topology; for the dynamic case with line topology, each $\mathcal{G}^t$ is a random line graph.

Let $\mathcal{S} \triangleq \{1, 2, \ldots, s\}$ and $\mathcal{D} \triangleq \{(x_\ell, y_\ell) \in \mathbb{R}^n \times \{-1,+1\} : \ell \in \mathcal{S}\}$ be a set of feature-vector and label pairs. Suppose $\mathcal{S}$ is partitioned into $\mathcal{S}_{\mathrm{test}}$ and $\mathcal{S}_{\mathrm{train}}$, i.e., the index sets for the test and training data, and let $\{\mathcal{S}_i\}_{i\in\mathcal{N}}$ be a partition of $\mathcal{S}_{\mathrm{train}}$ among the nodes in $\mathcal{N}$. Let $\mathbf{w} = [w_i]_{i\in\mathcal{N}}$, $\mathbf{b} = [b_i]_{i\in\mathcal{N}}$, and $\xi \in \mathbb{R}^{|\mathcal{S}_{\mathrm{train}}|}$ with $w_i \in \mathbb{R}^n$ and $b_i \in \mathbb{R}$ for $i \in \mathcal{N}$. Consider the following distributed SVM problem:
$$\min_{\mathbf{w},\mathbf{b},\xi}\ \Big\{\ \frac{1}{2}\sum_{i\in\mathcal{N}}\|w_i\|^2 + C\,|\mathcal{N}|\sum_{i\in\mathcal{N}}\sum_{\ell\in\mathcal{S}_i}\xi_\ell:\ \ y_\ell(w_i^{\top}x_\ell + b_i) \geq 1 - \xi_\ell,\ \ \xi_\ell \geq 0,\ \ \ell \in \mathcal{S}_i,\ i \in \mathcal{N};\ \ w_i = w_j,\ b_i = b_j,\ (i,j) \in \mathcal{E}\ \Big\}$$

Similar to [3], $\{x_\ell\}_{\ell\in\mathcal{S}}$ is generated from a two-dimensional multivariate Gaussian distribution with covariance matrix $\Sigma = [1,\ 0;\ 0,\ 2]$ and mean vector either $m_1 = [-1,-1]^{\top}$ or $m_2 = [1,1]^{\top}$, each with equal probability. The experiment was performed with $C = 2$, $|\mathcal{N}| = 10$, and $s = 900$, such that $|\mathcal{S}_{\mathrm{test}}| = 600$ and $|\mathcal{S}_i| = 30$ for $i \in \mathcal{N}$, i.e., $|\mathcal{S}_{\mathrm{train}}| = 300$, and $q_k = \sqrt{k}$. We ran DPDA-S and DPDA-D on line, random, and complete graphs, where the random graph is generated such that the algebraic connectivity is approximately 4. Relative suboptimality, relative consensus violation, i.e., $\max_{(i,j)\in\mathcal{E}} \|[w_i^{\top}\ b_i]^{\top} - [w_j^{\top}\ b_j]^{\top}\| \,/\, \|[w^{*\top}\ b^*]^{\top}\|$, and absolute feasibility violation are plotted against the iteration counter in Fig.
3, where $[w^{*\top}\ b^*]$ denotes the optimal solution to the centralized problem. As expected, convergence is slower when the connectivity of the graph is weaker. Furthermore, a visual comparison between DPDA-S, local SVMs (for two of the nodes), and the centralized SVM on the same training and test data sets is given in Fig. 4 and Fig. 5 in the appendix.

Figure 3: Static (top) and Dynamic (bottom) network topologies: line, random, and complete graphs

References

[1] Qing Ling and Zhi Tian. Decentralized sparse signal recovery for compressive sleeping wireless sensor networks. IEEE Transactions on Signal Processing, 58(7):3816–3827, 2010.

[2] Ioannis D. Schizas, Alejandro Ribeiro, and Georgios B. Giannakis. Consensus in ad hoc WSNs with noisy links – Part I: Distributed estimation of deterministic signals. IEEE Transactions on Signal Processing, 56(1):350–364, 2008.

[3] Pedro A. Forero, Alfonso Cano, and Georgios B. Giannakis. Consensus-based distributed support vector machines. Journal of Machine Learning Research, 11:1663–1707, 2010.

[4] Ryan McDonald, Keith Hall, and Gideon Mann. Distributed training strategies for the structured perceptron. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 456–464. Association for Computational Linguistics, 2010.

[5] F. Yan, S. Sundaram, S. Vishwanathan, and Y. Qi. Distributed autonomous online learning: Regrets and intrinsic privacy-preserving properties. IEEE Transactions on Knowledge and Data Engineering, 25(11):2483–2493, 2013.

[6] Gonzalo Mateos, Juan Andrés Bazerque, and Georgios B. Giannakis. Distributed sparse linear regression. IEEE Transactions on Signal Processing, 58(10):5262–5276, 2010.

[7] Francis R. Bach, Gert R. G. Lanckriet, and Michael I. Jordan.
Multiple kernel learning, conic duality, and the SMO algorithm. In Proceedings of the Twenty-First International Conference on Machine Learning, page 6. ACM, 2004.

[8] Angelia Nedić and Asuman Ozdaglar. Subgradient methods for saddle-point problems. Journal of Optimization Theory and Applications, 142(1):205–228, 2009.

[9] Antonin Chambolle and Thomas Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 40(1):120–145, 2011.

[10] Bingsheng He and Xiaoming Yuan. Convergence analysis of primal-dual algorithms for a saddle-point problem: from contraction perspective. SIAM Journal on Imaging Sciences, 5(1):119–149, 2012.

[11] Antonin Chambolle and Thomas Pock. On the ergodic convergence rates of a first-order primal–dual algorithm. Mathematical Programming, 159(1):253–287, 2016.

[12] Yunmei Chen, Guanghui Lan, and Yuyuan Ouyang. Optimal primal-dual methods for a class of saddle point problems. SIAM Journal on Optimization, 24(4):1779–1814, 2014.

[13] A. Nedić and A. Ozdaglar. Cooperative distributed multi-agent optimization. In Convex Optimization in Signal Processing and Communications, pages 340–385. Cambridge University Press, 2010.

[14] A. Nedić. Distributed optimization. In Encyclopedia of Systems and Control, pages 1–12. Springer, 2014.

[15] Tsung-Hui Chang, Angelia Nedić, and Anna Scaglione. Distributed constrained optimization by consensus-based primal-dual perturbation method. IEEE Transactions on Automatic Control, 59(6):1524–1538, 2014.

[16] David Mateos-Núñez and Jorge Cortés. Distributed subgradient methods for saddle-point problems. In 2015 54th IEEE Conference on Decision and Control (CDC), pages 5462–5467, December 2015.

[17] Deming Yuan, Shengyuan Xu, and Huanyu Zhao.
Distributed primal–dual subgradient method for multi-agent optimization via consensus algorithms. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 41(6):1715–1724, 2011.

[18] Angelia Nedić, Asuman Ozdaglar, and Pablo A. Parrilo. Constrained consensus and optimization in multi-agent networks. IEEE Transactions on Automatic Control, 55(4):922–938, 2010.

[19] Kunal Srivastava, Angelia Nedić, and Dušan M. Stipanović. Distributed constrained optimization over noisy networks. In 2010 49th IEEE Conference on Decision and Control (CDC), pages 1945–1950. IEEE, 2010.

[20] Albert I. Chen and Asuman Ozdaglar. A fast distributed proximal-gradient method. In 2012 50th Annual Allerton Conference on Communication, Control, and Computing, pages 601–608. IEEE, 2012.

[21] Angelia Nedić and Asuman Ozdaglar. Distributed subgradient methods for multi-agent optimization. IEEE Transactions on Automatic Control, 54(1):48–61, 2009.

[22] Ralph Tyrell Rockafellar. Convex Analysis. Princeton University Press, 2015.

[23] H. Robbins and D. Siegmund. A convergence theorem for non negative almost supermartingales and some applications. In Optimizing Methods in Statistics (Proc. Sympos., Ohio State Univ., Columbus, Ohio, 1971), pages 233–257. Academic Press, New York, 1971.