Part of Advances in Neural Information Processing Systems 22 (NIPS 2009)
Paris Smaragdis, Madhusudana Shashanka, Bhiksha Raj
In this paper we present an algorithm for separating mixed sounds from a monophonic recording. Our approach uses training data to learn representations of the types of sounds that compose the mixture. In contrast to popular methods that attempt to extract compact, generalizable models for each sound from training data, we employ the training data itself as a representation of the sources in the mixture. We show that mixtures of known sounds can be described as sparse combinations of the training data itself, and that doing so produces significantly better separation results than similar systems based on compact statistical models.
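As a rough illustration of the idea of describing a mixture as a combination of training data, the following is a minimal sketch and not the authors' algorithm. It assumes that each sound is represented by non-negative magnitude-spectrum frames, that the training frames of both sources are stacked as columns of a fixed dictionary, and that the non-negative weights are fit with standard multiplicative updates for the KL divergence. The explicit sparsity constraint described in the abstract is omitted here, and the names `separate_frame`, `train_a`, and `train_b` are hypothetical.

```python
import numpy as np


def separate_frame(x, dict_a, dict_b, n_iter=200, eps=1e-12):
    """Split one non-negative spectrum frame x into two source estimates.

    dict_a and dict_b hold training frames of each source as columns.
    Assumption: plain KL multiplicative updates, no sparsity prior.
    """
    d = np.hstack([dict_a, dict_b])            # (n_bins, n_a + n_b)
    w = np.full(d.shape[1], 1.0 / d.shape[1])  # uniform positive start
    col_sums = d.sum(axis=0) + eps
    for _ in range(n_iter):
        approx = d @ w + eps
        # Multiplicative update keeps the weights non-negative.
        w *= (d.T @ (x / approx)) / col_sums
    approx = d @ w + eps
    n_a = dict_a.shape[1]
    part_a = dict_a @ w[:n_a]                  # portion explained by source A
    part_b = dict_b @ w[n_a:]                  # portion explained by source B
    # Share the observed mixture in proportion to each source's portion.
    est_a = x * part_a / approx
    est_b = x * part_b / approx
    return est_a, est_b, w


# Synthetic stand-in data (random values, not real audio).
rng = np.random.default_rng(0)
train_a = rng.random((20, 5))
train_b = rng.random((20, 5))
mixture = train_a[:, 2] + train_b[:, 4]
est_a, est_b, weights = separate_frame(mixture, train_a, train_b)
```

Because the two estimates are obtained by sharing the observed frame proportionally, they sum back to the mixture by construction. A real system would apply this per frame of a spectrogram and add a sparsity-promoting term on the weights.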