Part of Advances in Neural Information Processing Systems 36 (NeurIPS 2023) Main Conference Track
Joshua Engels, Benjamin Coleman, Vihan Lakshman, Anshumali Shrivastava
We study the problem of \emph{vector set search} with \emph{vector set queries}. This task is analogous to traditional near-neighbor search, with the exception that both the query and each element in the collection are \textit{sets} of vectors. We identify this problem as a core subroutine for semantic search applications and find that existing solutions are unacceptably slow. Towards this end, we present a new approximate search algorithm, DESSERT (\bf DESSERT \bf Effeciently \bf Searches \bf Sets of \bf Embeddings via \bf Retrieval \bf Tables). DESSERT is a general tool with strong theoretical guarantees and excellent empirical performance. When we integrate DESSERT into ColBERT, a state-of-the-art semantic search model, we find a 2-5x speedup on the MS MARCO and LoTTE retrieval benchmarks with minimal loss in recall, underscoring the effectiveness and practical applicability of our proposal.