Geoffrey E. Hinton, Russ R. Salakhutdinov
We show how to model documents as bags of words using family of two-layer, undirected graphical models. Each member of the family has the same number of binary hidden units but a different number of ``softmax visible units. All of the softmax units in all of the models in the family share the same weights to the binary hidden units. We describe efficient inference and learning procedures for such a family. Each member of the family models the probability distribution of documents of a specific length as a product of topic-specific distributions rather than as a mixture and this gives much better generalization than Latent Dirichlet Allocation for modeling the log probabilities of held-out documents. The low-dimensional topic vectors learned by the undirected family are also much better than LDA topic vectors for retrieving documents that are similar to a query document. The learned topics are more general than those found by LDA because precision is achieved by intersecting many general topics rather than by selecting a single precise topic to generate each word.