Efficient Large-Scale Distributed Training of Conditional Maximum Entropy Models

Part of Advances in Neural Information Processing Systems 22 (NIPS 2009)



Ryan McDonald, Mehryar Mohri, Nathan Silberman, Dan Walker, Gideon Mann


Training conditional maximum entropy models on massive data requires significant time and computational resources. In this paper, we investigate three common distributed training strategies: distributed gradient, majority voting ensembles, and parameter mixtures. We analyze the worst-case runtime and resource costs of each and present a theoretical foundation for the convergence of parameters under parameter mixtures, the most efficient strategy. We present large-scale experiments comparing the different strategies and demonstrate that parameter mixtures over independent models use fewer resources and achieve loss comparable to standard approaches.
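The parameter-mixture strategy described above can be sketched as follows: train one model per data shard independently, then average the resulting parameter vectors. This is a minimal illustration, not the paper's experimental setup; the use of binary logistic regression (a simple conditional maxent model), plain gradient descent, and synthetic data are all assumptions made for the sake of a self-contained example.

```python
import numpy as np

def train_maxent(X, y, epochs=200, lr=0.1):
    """Train a binary conditional maxent model (logistic regression)
    on one shard via batch gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))       # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)       # gradient of the log-loss
    return w

def parameter_mixture(shards):
    """Train an independent model on each shard, then mix (average)
    the learned parameter vectors into a single model."""
    weights = [train_maxent(X, y) for X, y in shards]
    return np.mean(weights, axis=0)

# Synthetic illustration: split one dataset across 4 "machines".
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
true_w = np.array([1.5, -2.0, 0.5])            # hypothetical true parameters
y = (rng.uniform(size=400) < 1.0 / (1.0 + np.exp(-X @ true_w))).astype(float)
shards = [(X[i::4], y[i::4]) for i in range(4)]

w_mix = parameter_mixture(shards)              # single averaged parameter vector
```

Each shard is trained with no communication until the final averaging step, which is what makes this strategy cheaper in resources than distributed-gradient training, where machines must exchange gradients at every iteration.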