Yoshua Bengio, Samy Bengio
The curse of dimensionality is severe when modeling high-dimensional discrete data: the number of possible combinations of the variables explodes exponentially. In this paper we propose a new architecture for modeling high-dimensional data that requires resources (parameters and computations) that grow only at most as the square of the number of variables, using a multi-layer neural network to represent the joint distribution of the variables as the product of conditional distributions. The neural network can be interpreted as a graphical model without hidden random variables, but in which the conditional distributions are tied through the hidden units. The connectivity of the neural network can be pruned by using dependency tests between the variables. Experiments on modeling the distribution of several discrete data sets show statistically significant improvements over other methods such as naive Bayes and comparable Bayesian networks, and show that significant improvements can be obtained by pruning the network.
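To make the factorization concrete, the joint distribution is written as P(x_1, ..., x_n) = P(x_1) P(x_2 | x_1) ... P(x_n | x_1, ..., x_{n-1}), with every conditional computed by one network. The following is a minimal sketch of that idea, not the exact architecture of the paper. It assumes binary variables, a fixed variable ordering, and one group of tanh hidden units per variable; the function names `init_params` and `log_prob` and the parameter shapes are hypothetical choices made for illustration. Hidden group i sees only the variables before x_i, and the output for x_i reuses all hidden groups up to i, which is one way the conditionals could be tied through shared hidden units.

```python
import itertools
import numpy as np


def init_params(n_vars, n_hidden, seed=0):
    """Random parameters for the sketch (shapes are an illustrative assumption)."""
    rng = np.random.default_rng(seed)
    return {
        # W[i]: weights from the inputs to hidden group i (only columns < i are used)
        "W": rng.normal(scale=0.5, size=(n_vars, n_hidden, n_vars)),
        "b": np.zeros((n_vars, n_hidden)),
        # V[i, j]: weights from hidden group j to the output for variable i (j <= i)
        "V": rng.normal(scale=0.5, size=(n_vars, n_vars, n_hidden)),
        "c": np.zeros(n_vars),
    }


def log_prob(x, params):
    """log P(x) as a sum of log-conditionals log P(x_i | x_1, ..., x_{i-1})."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    hidden = []
    total = 0.0
    for i in range(n):
        # Hidden group i depends only on the preceding variables x[:i].
        h_i = np.tanh(params["W"][i][:, :i] @ x[:i] + params["b"][i])
        hidden.append(h_i)
        # The output for x_i reuses hidden groups 0..i, shared with later outputs.
        logit = params["c"][i] + sum(
            params["V"][i, j] @ hidden[j] for j in range(i + 1)
        )
        # Numerically stable log-sigmoid for the Bernoulli conditional.
        if x[i] == 1:
            total += -np.logaddexp(0.0, -logit)
        else:
            total += -np.logaddexp(0.0, logit)
    return total


if __name__ == "__main__":
    params = init_params(n_vars=4, n_hidden=3)
    # Summing P(x) over all 2^4 configurations should give 1 (up to rounding).
    z = sum(np.exp(log_prob(x, params)) for x in itertools.product([0, 1], repeat=4))
    print(z)
```

In this sketch the weight arrays `W` and `V` each hold n × n × h entries for n variables and h hidden units per group, so for fixed h the parameter count grows as the square of n, in line with the resource claim above. Pruning by dependency tests is not shown; under the same assumptions it would amount to zeroing selected entries of `W` and `V`.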