Grigoris Karakoulas, John Shawe-Taylor
Following recent results [9, 8] showing the importance of the fat-shattering dimension in explaining the beneficial effect of a large margin on generalization performance, the current paper investigates the implications of these results for imbalanced datasets and develops two approaches to setting the threshold. The approaches are incorporated into ThetaBoost, a boosting algorithm for dealing with unequal loss functions. The performance of ThetaBoost and of the two approaches is evaluated experimentally.
Keywords: Computational Learning Theory, Generalization, fat-shattering, large margin, PAC estimates, unequal loss, imbalanced datasets