Paper ID: 4549

Title: Robust and Communication-Efficient Collaborative Learning

This paper proposes a new algorithm to solve a stochastic optimization problem distributively over the nodes of a graph of computing units. As usual, the algorithm involves a communication step and a computing step at each iteration, and it only involves local computations. To be robust to the effect of stragglers among the computing units, the proposed algorithm imposes a deadline on computation time at each iteration. To reduce the communication cost, only quantized versions of the local iterates are exchanged during communication steps. Two theorems are given: Th 1, where the loss is strongly convex and a bound is provided for the expected squared distance to the solution; Th 2, where the loss is non-convex and a bound is provided for the expected squared norm of the gradient **at the average iterate**. A bound is also provided for the distance to consensus.

The problem is relevant. Deadlines, local computations, and quantization have each been considered in the collaborative-learning literature, but separately. The explanations are a bit technical but clear, and the paper is well written. Numerical simulations are provided.

I have two comments for the authors.

1. Assumption 2. It is assumed that the variance of the quantization is bounded, whereas some authors allow it to grow with \|x\|. Could you remove this assumption?
2. Eq. 10 should include the cost of computing the average iterate, since in this paper communication time is a bottleneck.

***********************************************************************************

I appreciate the authors' response, especially the considerations for the convex setting and for the variance of the quantization (not for the convex setting, of course)
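For readers unfamiliar with the setup, one iteration of the kind of scheme described above (quantized gossip averaging followed by a local gradient step) can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: the quantizer, mixing matrix `W`, and function names are assumptions, and the deadline mechanism is only indicated in a comment (each node would contribute a gradient computed from however many samples it finished before the deadline).

```python
import numpy as np

def quantize(x, levels=16, rng=None):
    # Unbiased stochastic quantization onto a uniform grid (illustrative;
    # the paper's quantizer may differ). E[quantize(x)] = x.
    rng = rng or np.random.default_rng()
    scale = np.max(np.abs(x)) + 1e-12
    y = x / scale * levels
    low = np.floor(y)
    q = low + (rng.random(x.shape) < (y - low))  # round up w.p. (y - low)
    return q / levels * scale

def decentralized_step(X, W, grads, lr):
    # X:     (n_nodes, dim) array of local iterates.
    # W:     doubly stochastic mixing matrix of the communication graph.
    # grads: (n_nodes, dim) local stochastic gradients; under a deadline,
    #        each row would be whatever gradient estimate the node
    #        assembled before time ran out.
    Q = np.array([quantize(x) for x in X])  # nodes exchange quantized iterates
    mixed = W @ Q                           # gossip/averaging step
    return mixed - lr * grads               # local gradient step
```

The quantized exchange is what reduces communication, while the deadline (abstracted into `grads` here) is what removes the dependence on the slowest node.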

I have read the authors' response. Overall, I think that this is a solid paper with significant theoretical and algorithmic contributions, and it is also well written. A minor concern is that the paper establishes convergence results only under strongly convex and non-convex settings; it is unclear whether the results can be directly derived for the convex setting, i.e., the case of \mu = 0 in Section 4.1. I hope that results for the convex setting can be added in the final version.

This paper combines gradient methods with deadline and quantization techniques in a decentralized optimization algorithm. Although the convergence analysis in the nonconvex setting looks interesting and solid, the combination limits the novelty of this paper.

1. The experiments are too simple to show the efficacy of the proposed algorithms, using only the MNIST and CIFAR-10 datasets.
2. The size of the tested neural network is too small. In addition, what is the activation function of the fully connected neural network? If the ReLU activation was used, the tested loss function is not smooth.
3. Since the performance of first-order stochastic methods is sensitive to the learning rate, the authors should report the best results for each solver with well-tuned learning rates.