Multi-Swap k-Means++

Beretta, Lorenzo; Cohen-Addad, Vincent; Lattanzi, Silvio; Parotsidis, Nikos

Multi-Swap k-Means++

Part of Advances in Neural Information Processing Systems 36 (NeurIPS 2023) Main Conference Track

Bibtex Paper Supplemental

Authors

Lorenzo Beretta, Vincent Cohen-Addad, Silvio Lattanzi, Nikos Parotsidis

Abstract

The $k$ -means++ algorithm of Arthur and Vassilvitskii (SODA 2007) is often the practitioners' choice algorithm for optimizing the popular $k$ -means clustering objective and is known to give an $O(\log k)$ -approximation in expectation. To obtain higher quality solutions, Lattanzi and Sohler (ICML 2019) proposed augmenting $k$ -means++ with $O(k \log \log k)$ local-search steps obtained through the $k$ -means++ sampling distribution to yield a $c$ -approximation to the $k$ -means clustering problem, where $c$ is a large absolute constant. Here we generalize and extend their local-search algorithm by considering larger and more sophisticated local-search neighborhoods hence allowing to swap multiple centers at the same time. Our algorithm achieves a $9 + \varepsilon$ approximation ratio, which is the best possible for local search. Importantly we show that our algorithm is practical, namely easy to implement and fast enough to run on a variety of classic datasets, and outputs solutions of better cost.

Multi-Swap k-Means++

Authors

Abstract

Name Change Policy