Part of Advances in Neural Information Processing Systems 35 (NeurIPS 2022) Main Conference Track
Sagi Lotan, Ernesto Evgeniy Sanches Shayda, Dan Feldman
The input to the {line-sets k-median} problem is an integer k≥1, and a set L={L1,…,Ln}that contains n sets of lines in Rd. The goal is to compute a set C of k centers (points in Rd) that minimizes the sum ∑L∈Lmin of Euclidean distances from each set to its closest center, where \mathrm{dist}(\ell,c):=\min_{x\in \ell}\norm{x-c}_2.An \emph{\varepsilon-coreset} for this problem is a weighted subset of sets in \mathcal{L} that approximates this sum up to 1 \pm \varepsilon multiplicative factor, for every set C of k centers. We prove that \emph{every} such input set \set{L} has a small \varepsilon-coreset, and provide the first coreset construction for this problem and its variants. The coreset consists of O(\log^2n) weighted line-sets from \set{L}, and is constructed in O(n\log n) time for every fixed d, k\geq 1 and \varepsilon \in (0,1). The main technique is based on a novel reduction to a fair clustering'' of colored points to colored centers. We then provide a coreset for this coloring problem, which may be of independent interest. Open source code and experiments are also provided.