Part of Advances in Neural Information Processing Systems 30 (NIPS 2017)
Xingguo Li, Jarvis Haupt, David Woodruff
We study the least squares regression problem \min_{\Theta} \|\mathcal{A}(\Theta) - b\|_2^2, where \Theta is a low-rank tensor, defined as \Theta = \sum_{r=1}^{R} \theta_1^{(r)} \circ \cdots \circ \theta_D^{(r)}, for vectors \theta_d^{(r)} \in \mathbb{R}^{p_d} for all r \in [R] and d \in [D], where R is small compared with p_1, \ldots, p_D. Here, \circ denotes the outer product of vectors, and \mathcal{A}(\Theta) is a linear function of \Theta. This problem is motivated by the fact that the number of parameters in \Theta is only R \cdot \sum_{d=1}^{D} p_d, which is significantly smaller than the \prod_{d=1}^{D} p_d parameters in ordinary least squares regression. We consider the above CP decomposition model of tensors \Theta, as well as the Tucker decomposition. For both models we show how to apply data dimensionality reduction techniques based on {\it sparse} random projections \Phi \in \mathbb{R}^{m \times n}, with m \ll n, to reduce the problem to a much smaller problem \min_{\Theta} \|\Phi \mathcal{A}(\Theta) - \Phi b\|_2^2, for which \|\Phi \mathcal{A}(\Theta) - \Phi b\|_2^2 = (1 \pm \varepsilon) \|\mathcal{A}(\Theta) - b\|_2^2 holds simultaneously for all \Theta. We obtain a significantly smaller dimension and sparsity in the randomized linear mapping \Phi than is possible for ordinary least squares regression. Finally, we give a number of numerical simulations supporting our theory.
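To make the setup concrete, the following is a minimal numerical sketch, not the paper's algorithm or code: it stands in a hypothetical dense matrix A for the abstract linear operator \mathcal{A}, forms CP-structured tensors \Theta from factor vectors, draws a CountSketch-style sparse projection \Phi with a single nonzero per column (one standard family of sparse random projections; the specific construction, sizes p, R, n, m, and random design are illustrative assumptions), and compares the sketched residual norm to the original one.

```python
# Illustrative sketch only, under the assumptions stated above.
import numpy as np

rng = np.random.default_rng(0)

# Assumed problem sizes: a D = 3 tensor with modes p = (8, 9, 10) and CP rank R = 2.
p, R = (8, 9, 10), 2
N = int(np.prod(p))          # ambient dimension of vec(Theta)
n, m = 2000, 400             # number of measurements and sketch size (both assumed)

def cp_tensor(factors):
    """Form Theta = sum_r theta_1^(r) o theta_2^(r) o theta_3^(r) from factor matrices."""
    Theta = np.zeros(p)
    for r in range(R):
        Theta += np.einsum('i,j,k->ijk', *(f[:, r] for f in factors))
    return Theta

# Hypothetical linear measurement operator A acting on vec(Theta), plus a right-hand side b.
A = rng.standard_normal((n, N)) / np.sqrt(n)
b = rng.standard_normal(n)

# CountSketch-style sparse projection: each column of Phi has a single +/-1 entry.
rows = rng.integers(0, m, size=n)
signs = rng.choice([-1.0, 1.0], size=n)
Phi = np.zeros((m, n))
Phi[rows, np.arange(n)] = signs

# Compare original and sketched residual norms for a few random CP tensors Theta.
for trial in range(3):
    factors = [rng.standard_normal((pd, R)) for pd in p]
    x = cp_tensor(factors).ravel()
    orig = np.linalg.norm(A @ x - b)
    sketched = np.linalg.norm(Phi @ (A @ x - b))
    print(f"trial {trial}: ||A(Theta)-b|| = {orig:.3f}, ||Phi A(Theta) - Phi b|| = {sketched:.3f}")
```

The point of the comparison loop is only to mirror the guarantee stated in the abstract at a few sample points; the paper's contribution is that, with an appropriately chosen sparse \Phi, the (1 \pm \varepsilon) distortion holds simultaneously for all low-rank \Theta with a sketch dimension m far smaller than what ordinary least squares regression would require.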