Part of Advances in Neural Information Processing Systems 16 (NIPS 2003)
Noam Shental, Aharon Bar-Hillel, Tomer Hertz, Daphna Weinshall
Density estimation with Gaussian Mixture Models is a popular generative technique that is also used for clustering. We develop a framework to incorporate side information in the form of equivalence constraints into the model estimation procedure. Equivalence constraints are defined on pairs of data points, indicating whether the points arise from the same source (positive constraints) or from different sources (negative constraints). Such constraints can be gathered automatically in some learning problems, and are a natural form of supervision in others. For the estimation of model parameters we present a closed-form EM procedure that handles positive constraints, and a Generalized EM procedure using a Markov net that handles negative constraints. Using publicly available data sets we demonstrate that such side information can lead to considerable improvement in clustering tasks, and that our algorithm is preferable to two other suggested methods using the same type of side information.
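As a rough illustration of how positive constraints can keep EM in closed form, the sketch below fits a 1-D Gaussian mixture under one assumed modeling choice: points linked by positive constraints are grouped into a "chunklet" whose members all share a single latent component drawn from the mixing weights. Under that assumption the E-step computes one posterior per chunklet by summing its points' log-likelihoods, and the M-step updates stay closed form. The function names, the variance floor, and the 1-D restriction are hypothetical simplifications, not necessarily the paper's exact estimator.

```python
import math


def log_normal(x, mu, var):
    """Log density of the 1-D Gaussian N(mu, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def em_with_chunklets(chunklets, init_means, n_iter=50, var_floor=1e-6):
    """Hypothetical EM for a 1-D GMM where all points of a chunklet share one component.

    chunklets:  list of non-empty lists of floats; an unconstrained point is a singleton.
    init_means: initial component means (one per component).
    Returns (weights, means, variances, resp), where resp[c][j] is the posterior
    probability that chunklet c was generated by component j.
    """
    k = len(init_means)
    points = [x for c in chunklets for x in c]
    n = len(points)
    grand_mean = sum(points) / n
    grand_var = max(sum((x - grand_mean) ** 2 for x in points) / n, var_floor)

    weights = [1.0 / k] * k
    means = [float(m) for m in init_means]
    variances = [grand_var] * k
    resp = []

    for _ in range(n_iter):
        # E-step: a single posterior per chunklet, combining the log-likelihoods
        # of all of its points under the same component.
        resp = []
        for c in chunklets:
            logs = [
                math.log(max(weights[j], 1e-300))
                + sum(log_normal(x, means[j], variances[j]) for x in c)
                for j in range(k)
            ]
            norm = logsumexp(logs)
            resp.append([math.exp(v - norm) for v in logs])

        # M-step: closed-form updates. Mixing weights count chunklets;
        # means and variances weight every point by its chunklet's posterior.
        for j in range(k):
            weights[j] = sum(r[j] for r in resp) / len(chunklets)
            mass = sum(r[j] * len(c) for r, c in zip(resp, chunklets))
            if mass < 1e-12:
                continue  # component has no support; keep its previous mean/variance
            means[j] = sum(r[j] * sum(c) for r, c in zip(resp, chunklets)) / mass
            spread = sum(
                r[j] * sum((x - means[j]) ** 2 for x in c)
                for r, c in zip(resp, chunklets)
            )
            variances[j] = max(spread / mass, var_floor)

    return weights, means, variances, resp


if __name__ == "__main__":
    # Hypothetical toy data: two positively constrained pairs plus two singletons.
    data = [[0.0, 0.2], [-0.1], [10.0, 10.3], [9.8]]
    w, mu, var, post = em_with_chunklets(data, init_means=[1.0, 9.0])
    print("weights:", w)
    print("means:", mu)
```

When every point is its own singleton chunklet, these updates reduce to ordinary GMM EM. The negative-constraint case, which the abstract handles with a Generalized EM procedure over a Markov net, is not covered by this sketch.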