Tight Dimensionality Reduction for Sketching Low Degree Polynomial Kernels

Part of Advances in Neural Information Processing Systems 32 (NeurIPS 2019)

AuthorFeedback »Bibtex »Bibtex »MetaReview »Metadata »Paper »Reviews »Supplemental »


Michela Meister, Tamas Sarlos, David Woodruff


We revisit the classic randomized sketch of a tensor product of $q$ vectors $x_i\in\mathbb{R}^n$. The $i$-th coordinate $(Sx)_i$ of the sketch is equal to $\prod_{j = 1}^q \langle u^{i, j}, x^j \rangle / \sqrt{m}$, where $u^{i,j}$ are independent random sign vectors. Kar and Karnick (JMLR, 2012) show that if the sketching dimension $m = \Omega(\epsilon^{-2} C_{\Omega}^2 \log (1/\delta))$, where $C_{\Omega}$ is a certain property of the point set $\Omega$ one wants to sketch, then with probability $1-\delta$, $\|Sx\|_2 = (1\pm \epsilon)\|x\|_2$ for all $x\in\Omega$. However, in their analysis $C_{\Omega}^2$ can be as large as $\Theta(n^{2q})$, even for a set $\Omega$ of $O(1)$ vectors $x$. We give a new analysis of this sketch, providing nearly optimal bounds. Namely, we show an upper bound of $m = \Theta \left (\epsilon^{-2} \log(n/\delta) + \epsilon^{-1} \log^q(n/\delta) \right ),$ which by composing with CountSketch, can be improved to $\Theta(\epsilon^{-2}\log(1/(\delta \epsilon)) + \epsilon^{-1} \log^q (1/(\delta \epsilon))$. For the important case of $q = 2$ and $\delta = 1/\poly(n)$, this shows that $m = \Theta(\epsilon^{-2} \log(n) + \epsilon^{-1} \log^2(n))$, demonstrating that the $\epsilon^{-2}$ and $\log^2(n)$ terms do not multiply each other. We also show a nearly matching lower bound of $m = \Omega(\eps^{-2} \log(1/(\delta)) + \eps^{-1} \log^q(1/(\delta)))$. In a number of applications, one has $|\Omega| = \poly(n)$ and in this case our bounds are optimal up to a constant factor. This is the first high probability sketch for tensor products that has optimal sketch size and can be implemented in $m \cdot \sum_{i=1}^q \textrm{nnz}(x_i)$ time, where $\textrm{nnz}(x_i)$ is the number of non-zero entries of $x_i$. Lastly, we empirically compare our sketch to other sketches for tensor products, and give a novel application to compressing neural networks.