Part of Advances in Neural Information Processing Systems 35 (NeurIPS 2022) Main Conference Track
Martin Jørgensen, Michael A. Osborne
Modern approximations to Gaussian processes are suitable for ``tall data'', with a cost that scales well in the number of observations, but under-perform on ``wide data'', scaling poorly in the number of input features. That is, as the number of input features grows, good predictive performance requires the number of summarising variables, and their associated cost, to grow rapidly. We introduce a kernel that allows the number of summarising variables to grow exponentially with the number of input features, while requiring only linear cost in both the number of observations and input features. This scaling is achieved through our introduction of the ``Bezier buttress'', which allows approximate inference without computing matrix inverses or determinants. We show that our kernel has close similarities to some of the most used kernels in Gaussian process regression, and empirically demonstrate the kernel's ability to scale to both tall and wide datasets.
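The central claim, that the number of summarising variables can grow exponentially in the input dimension while the cost stays linear, is the kind of statement a small worked example can make concrete. The sketch below is not the paper's construction; it only illustrates, under the assumption of a tensor-product Bernstein basis whose coefficients factorise across dimensions (all function names here are hypothetical), how a sum over M^D basis functions can be evaluated in O(D*M) time as a product of per-dimension sums.

```python
import numpy as np
from math import comb
from itertools import product

def bernstein_basis(t, M):
    """Bernstein basis of degree M-1 evaluated at a scalar t in [0, 1]."""
    n = M - 1
    return np.array([comb(n, j) * t**j * (1 - t)**(n - j) for j in range(M)])

def separable_eval(x, W):
    """Evaluate the sum over all M**D tensor-product basis functions in O(D*M).

    Assumes the coefficient tensor factorises across dimensions (rank one):
        sum_{j_1..j_D} prod_d W[d][j_d] * b_{j_d}(x_d)
          = prod_d ( sum_j W[d][j] * b_j(x_d) ),
    which is what makes the linear-per-dimension cost possible in this toy setting.
    """
    return float(np.prod([bernstein_basis(xd, len(w)) @ w for xd, w in zip(x, W)]))

def brute_force_eval(x, W):
    """Same quantity, computed by explicitly summing over all M**D index tuples."""
    D = len(W)
    bases = [bernstein_basis(x[d], len(W[d])) for d in range(D)]
    total = 0.0
    for idx in product(*[range(len(w)) for w in W]):
        total += np.prod([W[d][j] * bases[d][j] for d, j in enumerate(idx)])
    return float(total)

rng = np.random.default_rng(0)
D, M = 4, 3                           # 3**4 = 81 basis functions overall
x = rng.uniform(size=D)               # one input point with D features
W = [rng.normal(size=M) for _ in range(D)]  # per-dimension coefficients
assert np.isclose(separable_eval(x, W), brute_force_eval(x, W))
```

The brute-force and separable evaluations agree, but the former touches all M^D terms while the latter touches only D*M; this is merely an illustration of how tensor-product structure can decouple the number of basis functions from the evaluation cost, not a description of the Bezier buttress itself.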