Part of Advances in Neural Information Processing Systems 25 (NIPS 2012)
Robert Gens, Pedro Domingos
Sum-product networks are a new deep architecture that can perform fast, exact in- ference on high-treewidth models. Only generative methods for training SPNs have been proposed to date. In this paper, we present the ﬁrst discriminative training algorithms for SPNs, combining the high accuracy of the former with the representational power and tractability of the latter. We show that the class of tractable discriminative SPNs is broader than the class of tractable generative ones, and propose an efﬁcient backpropagation-style algorithm for computing the gradient of the conditional log likelihood. Standard gradient descent suffers from the diffusion problem, but networks with many layers can be learned reliably us- ing “hard” gradient descent, where marginal inference is replaced by MPE infer- ence (i.e., inferring the most probable state of the non-evidence variables). The resulting updates have a simple and intuitive form. We test discriminative SPNs on standard image classiﬁcation tasks. We obtain the best results to date on the CIFAR-10 dataset, using fewer features than prior methods with an SPN architec- ture that learns local image structure discriminatively. We also report the highest published test accuracy on STL-10 even though we only use the labeled portion of the dataset.