Part of Advances in Neural Information Processing Systems 34 (NeurIPS 2021)
Yong Liu
Kernel k-means is one of the most popular approaches to clustering, and its theoretical properties have been investigated for decades. However, the existing state-of-the-art risk bounds are of order O(k/√n), which do not match the stated lower bound Ω(√(k/n)) in terms of k, where k is the number of clusters and n is the size of the training set. In this paper, we study the statistical properties of kernel k-means and Nyström-based kernel k-means, and obtain optimal clustering risk bounds, which improve the existing risk bounds. In particular, based on a refined upper bound on the Rademacher complexity [21], we first derive an optimal risk bound of rate O(√(k/n)) for the empirical risk minimizer (ERM), and further extend it to general cases beyond ERM. Then, we analyze the statistical effect of the computational approximations in Nyström kernel k-means, and prove that it achieves the same statistical accuracy as exact kernel k-means while using only Ω(√(nk)) Nyström landmark points. We further relax the requirement on the number of landmark points from Ω(√(nk)) to Ω(√n) under a mild condition. Finally, we validate our theoretical findings via numerical experiments.
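To make the Nyström construction discussed in the abstract concrete, below is a minimal, hypothetical Python sketch (not the paper's implementation): it samples m = ⌈√(nk)⌉ landmark points, builds the Nyström feature map Φ = K_nm K_mm^{-1/2}, and runs linear k-means on Φ, which is equivalent to kernel k-means under the Nyström-approximated kernel. The RBF kernel, the `gamma` width, and scikit-learn's `KMeans` are assumptions for illustration.

```python
# Illustrative sketch of Nystrom-based kernel k-means with m ~ sqrt(n*k)
# landmarks; the kernel choice and all parameter names are assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import rbf_kernel


def nystroem_kernel_kmeans(X, n_clusters, gamma=1.0, rng=None):
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    # Landmark budget m = Omega(sqrt(n*k)), as in the paper's analysis.
    m = int(np.ceil(np.sqrt(n * n_clusters)))
    landmarks = X[rng.choice(n, size=m, replace=False)]

    # Nystrom feature map: Phi = K_nm @ K_mm^{-1/2} (pseudo-inverse sqrt).
    K_nm = rbf_kernel(X, landmarks, gamma=gamma)          # shape (n, m)
    K_mm = rbf_kernel(landmarks, landmarks, gamma=gamma)  # shape (m, m)
    w, V = np.linalg.eigh(K_mm)
    w = np.clip(w, 1e-12, None)                           # numerical safety
    K_mm_inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    Phi = K_nm @ K_mm_inv_sqrt                            # approximate features

    # Linear k-means on the Nystrom features approximates kernel k-means.
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(Phi)


if __name__ == "__main__":
    X = np.random.default_rng(0).normal(size=(500, 5))
    labels = nystroem_kernel_kmeans(X, n_clusters=4)
    print(labels[:10])
```

The point of the m ≈ √(nk) budget is computational: the kernel evaluations drop from O(n²) to O(nm) while, per the paper's result, the clustering risk bound is unchanged.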