Cynthia Rudin, Ingrid Daubechies, Robert E. Schapire
In order to understand AdaBoost's dynamics, especially its ability to maximize margins, we derive an associated simplified nonlinear iterated map and analyze its behavior in low-dimensional cases. We find stable cycles for these cases, which can explicitly be used to solve for AdaBoost's output. By considering AdaBoost as a dynamical system, we are able to prove Rätsch and Warmuth's conjecture that AdaBoost may fail to converge to a maximal-margin combined classifier when given a 'non-optimal' weak learning algorithm. AdaBoost is known to be a coordinate descent method, but other known algorithms that explicitly aim to maximize the margin (such as AdaBoost∗ and arc-gv) are not. We consider a differentiable function for which coordinate ascent will yield a maximum margin solution. We then make a simple approximation to derive a new boosting algorithm whose updates are slightly more aggressive than those of arc-gv.
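To make the iterated map concrete, the following is a minimal sketch (not code from the paper) of the dynamics AdaBoost induces on the training weights d_t when the weak learner is 'optimal', i.e., returns the classifier of maximum edge. Assuming the standard AdaBoost step α_t = ½ ln((1+r_t)/(1−r_t)), the normalized weight update reduces to d_{t+1,i} = d_{t,i}/(1 + M_{i,j_t} r_t), where M_{ij} = y_i h_j(x_i). The 3×3 matrix M below is a hypothetical low-dimensional example chosen to exhibit cycling, not an instance taken from the paper.

```python
import numpy as np

# Hypothetical margin matrix M[i, j] = y_i * h_j(x_j) in {-1, +1}:
# each weak classifier j misclassifies exactly one training example.
M = np.array([[-1,  1,  1],
              [ 1, -1,  1],
              [ 1,  1, -1]])

d = np.array([0.2, 0.3, 0.5])        # initial weights on the simplex
for t in range(12):
    edges = d @ M                     # edge r_j of each weak classifier
    j = np.argmax(edges)              # 'optimal' weak learner: max edge
    r = edges[j]
    d = d / (1.0 + M[:, j] * r)       # self-normalizing AdaBoost update
    print(t, j, np.round(d, 4))
```

Running this sketch, the chosen classifier index rotates through 0, 1, 2 while the weight vector settles toward a stable 3-cycle, illustrating the kind of cyclic behavior the paper analyzes.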