Part of Advances in Neural Information Processing Systems 16 (NIPS 2003)
Cynthia Rudin, Ingrid Daubechies, Robert E. Schapire
In order to understand AdaBoost's dynamics, especially its ability to maximize margins, we derive an associated simplified nonlinear iterated map and analyze its behavior in low-dimensional cases. We find stable cycles for these cases, which can explicitly be used to solve for AdaBoost's output. By considering AdaBoost as a dynamical system, we are able to prove Rätsch and Warmuth's conjecture that AdaBoost may fail to converge to a maximal-margin combined classifier when given a 'non-optimal' weak learning algorithm. AdaBoost is known to be a coordinate descent method, but other known algorithms that explicitly aim to maximize the margin (such as AdaBoost* and arc-gv) are not. We consider a differentiable function for which coordinate ascent will yield a maximum margin solution. We then make a simple approximation to derive a new boosting algorithm whose updates are slightly more aggressive than those of arc-gv.
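To make the "iterated map" viewpoint concrete, the following is a minimal sketch of AdaBoost's weight update written as a map on the weight vector alone. It is an illustration under stated assumptions, not necessarily the exact map derived in the paper: the matrix `M` (entries `M[i][j] = +1` if weak classifier `j` classifies example `i` correctly, `-1` otherwise), the 3×3 example, the starting weights, and all function names are hypothetical. It also assumes an "optimal" weak learner that always picks the classifier with the largest edge. The closed form `d_i / (1 + M[i][j] * r)` follows algebraically from the usual normalized exponential update, and the sketch checks this numerically.

```python
import math

def edges(d, M):
    # Edge of each weak classifier j under weights d: sum_i d[i] * M[i][j].
    return [sum(d[i] * M[i][j] for i in range(len(d))) for j in range(len(M[0]))]

def simplified_map_step(d, M):
    # Assumed "optimal" weak learner: choose the classifier with the largest edge.
    e = edges(d, M)
    j = max(range(len(e)), key=lambda k: e[k])
    r = e[j]
    # Closed-form update; no explicit step size or normalization constant needed.
    d_next = [d[i] / (1.0 + M[i][j] * r) for i in range(len(d))]
    return d_next, j, r

def exponential_step(d, M):
    # Textbook AdaBoost update with step size alpha, for comparison.
    e = edges(d, M)
    j = max(range(len(e)), key=lambda k: e[k])
    r = e[j]
    alpha = 0.5 * math.log((1.0 + r) / (1.0 - r))
    w = [d[i] * math.exp(-alpha * M[i][j]) for i in range(len(d))]
    z = sum(w)
    return [x / z for x in w], j, r

# Hypothetical low-dimensional case: each weak classifier errs on exactly one example.
M = [[-1, 1, 1],
     [1, -1, 1],
     [1, 1, -1]]
d0 = [0.5, 0.3, 0.2]

trajectory = [d0]
for _ in range(20):
    d_next, j, r = simplified_map_step(trajectory[-1], M)
    trajectory.append(d_next)

for t, d in enumerate(trajectory[:7]):
    print(t, [round(x, 4) for x in d])
```

Printing the first few iterates lets one inspect the trajectory of the weights directly; whether and how quickly it settles into a cycle depends on `M` and the starting point, so the sketch makes no claim about that.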