David Barber, Felix Agakov
We propose a simple information-theoretic approach to soft clustering based on maximizing the mutual information I(x; y) between the unknown cluster labels y and the training patterns x with respect to the parameters of specifically constrained encoding distributions. The constraints are chosen such that patterns are likely to be clustered similarly if they lie close to specific unknown vectors in the feature space. The method may be conveniently applied to learning the optimal affinity matrix, which corresponds to learning the parameters of the kernelized encoder. The procedure does not require computing eigenvalues of the Gram matrices, which makes it potentially attractive for clustering large data sets.
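As an illustrative sketch only (not the paper's kernelized method), the infomax clustering objective can be evaluated for a simple prototype-based encoder: p(y|x) is taken to be a softmax over negative squared distances to assumed prototype vectors m_y, and the empirical mutual information is I(x; y) = H(y) - H(y|x). The prototype encoder, the inverse-temperature parameter beta, and the toy data are assumptions for illustration.

```python
import numpy as np

def soft_assignments(X, M, beta=1.0):
    """Soft encoder p(y|x): softmax over -beta * squared distance to prototypes M."""
    d2 = ((X[:, None, :] - M[None, :, :]) ** 2).sum(axis=-1)  # (n, k) squared distances
    logits = -beta * d2
    logits -= logits.max(axis=1, keepdims=True)               # numerical stability
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)

def mutual_information(P, eps=1e-12):
    """Empirical I(x; y) = H(y) - H(y|x), with rows of P holding p(y|x)."""
    py = P.mean(axis=0)                                        # marginal p(y)
    H_y = -(py * np.log(py + eps)).sum()
    H_y_given_x = -(P * np.log(P + eps)).sum(axis=1).mean()
    return H_y - H_y_given_x

# Two well-separated blobs; informative vs. degenerate prototype placements.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([-3.0, 0.0], 0.3, (50, 2)),
               rng.normal([3.0, 0.0], 0.3, (50, 2))])
M_good = np.array([[-3.0, 0.0], [3.0, 0.0]])   # prototypes at the blob centres
M_bad = np.zeros((2, 2))                        # both prototypes collapsed at the origin
mi_good = mutual_information(soft_assignments(X, M_good))
mi_bad = mutual_information(soft_assignments(X, M_bad))
```

Prototypes at the cluster centres give near-deterministic, balanced assignments, so I(x; y) approaches its upper bound log 2, while collapsed prototypes yield uniform assignments and zero information. In the paper's setting the encoder is instead kernelized, so the optimization is over parameters of the affinity (Gram) matrix rather than explicit prototype vectors.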