Part of Advances in Neural Information Processing Systems 15 (NIPS 2002)
Gil-jin Jang, Te-Won Lee
We present a new technique for achieving source separation when given only a single-channel recording. The main idea is to exploit the inherent time structure of sound sources by learning a priori sets of basis filters in the time domain that encode the sources in a statistically efficient manner. We derive a learning algorithm using a maximum likelihood approach, given the observed single-channel data and the sets of basis filters. For each time point, we infer the source signals and their contribution factors. This inference is possible due to the prior knowledge of the basis filters and the associated coefficient densities. A flexible model for density estimation allows accurate modeling of the observation. Our experimental results exhibit a high level of separation performance for mixtures of two music signals as well as for the separation of two voice signals.
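As a rough illustration of the inference step described above, the toy sketch below separates one mixed segment into two sources by maximizing the log-prior of their basis coefficients, subject to the two estimates summing to the observation. It is not the authors' algorithm. Everything specific in it is an assumption made for illustration: the bases are fixed (spikes and a DCT) rather than learned, the coefficient density is a smoothed Laplacian rather than a flexible fitted model, the contribution factors are fixed to 1, and the names `dct_basis`, `objective`, and `separate` are hypothetical.

```python
import numpy as np

def dct_basis(n):
    """Orthonormal DCT-II basis; columns are the basis functions."""
    k = np.arange(n)[:, None]          # frequency index
    t = np.arange(n)[None, :]          # time index
    w = np.sqrt(2.0 / n) * np.cos(np.pi * (t + 0.5) * k / n)
    w[0, :] /= np.sqrt(2.0)            # rescale the constant row to unit norm
    return w.T

def objective(x1, y, A1, A2, eps=1e-4):
    """Negative log-prior of both sources under a smoothed Laplacian (assumed)."""
    s1 = A1.T @ x1                     # coefficients of source 1
    s2 = A2.T @ (y - x1)               # coefficients of source 2 = y - x1
    return np.sum(np.sqrt(s1**2 + eps)) + np.sum(np.sqrt(s2**2 + eps))

def separate(y, A1, A2, n_iter=2000, eps=1e-4):
    """Gradient descent on the negative log-prior, keeping x1 + x2 = y."""
    step = np.sqrt(eps) / 2.0          # 1/L for orthonormal bases, so each step descends
    x1 = 0.5 * y                       # uninformed start: split the mixture evenly
    for _ in range(n_iter):
        s1 = A1.T @ x1
        s2 = A2.T @ (y - x1)
        grad = A1 @ (s1 / np.sqrt(s1**2 + eps)) - A2 @ (s2 / np.sqrt(s2**2 + eps))
        x1 = x1 - step * grad
    return x1, y - x1

# Synthetic single-channel mixture: a few spikes plus a few DCT components.
rng = np.random.default_rng(0)
n = 64
A1, A2 = np.eye(n), dct_basis(n)       # assumed known bases (orthonormal)
s1_true, s2_true = np.zeros(n), np.zeros(n)
s1_true[rng.choice(n, 3, replace=False)] = rng.normal(0.0, 3.0, 3)
s2_true[rng.choice(n, 3, replace=False)] = rng.normal(0.0, 3.0, 3)
x1_true, x2_true = A1 @ s1_true, A2 @ s2_true
y = x1_true + x2_true                  # the only signal the separator sees

x1_hat, x2_hat = separate(y, A1, A2)
```

The constraint is enforced by parameterizing the second source as `y - x1`, so only one estimate is optimized. Comparing `x1_hat` with `x1_true` gives a feel for how much the assumed priors alone can recover in this simplified setting.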