Part of Advances in Neural Information Processing Systems 35 (NeurIPS 2022) Main Conference Track
Vincent Cohen-Addad, Alessandro Epasto, Vahab Mirrokni, Shyam Narayanan, Peilin Zhong
We study the differentially private (DP) k-means and k-median clustering problems for n points in d-dimensional Euclidean space in the massively parallel computation (MPC) model. We provide two near-optimal algorithms, where near-optimality holds in three aspects: both achieve (1) O(1) parallel computation rounds, (2) total computational work that is near-linear in n and polynomial in k (i.e., near-linear running time when n is a sufficiently large polynomial in k), and (3) O(1) relative approximation with poly(k, d) additive error. Note that an Ω(1) relative approximation is provably necessary even for any polynomial-time non-private algorithm, and Ω(k) additive error is a provable lower bound for any polynomial-time DP k-means/median algorithm. Our two algorithms provide a tradeoff between the relative approximation and the additive error: the first achieves O(1) relative approximation with Õ(k^2.5 + k^1.01·√d) additive error, and the second achieves a (1+γ) relative approximation to the optimal non-private algorithm for an arbitrarily small constant γ > 0, with poly(k, d) additive error at the cost of a larger polynomial dependence on k and d. To achieve our results, we develop a general framework that partitions the data and reduces the DP clustering problem for the entire dataset to DP clustering problems for each part. To control the blow-up of the additive error introduced by each part, we develop a novel charging argument that may be of independent interest.
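To make concrete how additive error enters DP clustering, the sketch below shows one Lloyd-style k-means update made private with the standard Laplace mechanism on per-cluster counts and coordinate sums. This is a textbook baseline for intuition only, not the paper's algorithm; the function name, noise scales, and the assumption that coordinates lie in [0, 1] are all illustrative choices.

```python
import numpy as np


def dp_kmeans_step(points, centers, eps, rng):
    """One k-means update privatized by adding Laplace noise to each
    cluster's point count and coordinate sums (a standard baseline
    mechanism, NOT the algorithm from the paper). Assumes coordinates
    lie in [0, 1], so one point changes each sum and count by at most 1;
    splitting the budget gives scale 2/eps for counts and sums."""
    k, d = centers.shape
    # Assign each point to its nearest current center.
    dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    new_centers = np.empty_like(centers)
    for j in range(k):
        members = points[labels == j]
        noisy_count = len(members) + rng.laplace(scale=2.0 / eps)
        noisy_sum = members.sum(axis=0) + rng.laplace(scale=2.0 / eps, size=d)
        if noisy_count < 1.0:
            # Cluster too small relative to the noise: keep the old center.
            new_centers[j] = centers[j]
        else:
            new_centers[j] = noisy_sum / noisy_count
    return new_centers


rng = np.random.default_rng(0)
points = rng.random((500, 2))   # synthetic data in [0, 1]^2
centers = rng.random((3, 2))
for _ in range(5):
    centers = dp_kmeans_step(points, centers, eps=1.0, rng=rng)
```

The noise on the per-cluster sums is what produces the additive (rather than multiplicative) error term: small clusters get centers that are off by roughly d/eps regardless of how well-separated the data is, which is why controlling the additive blow-up across parts is the crux of the partition-based framework.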