Part of Advances in Neural Information Processing Systems 14 (NIPS 2001)
Ran El-Yaniv, Oren Souroujon
We present a powerful meta-clustering technique called Iterative Double Clustering (IDC). The IDC method is a natural extension of the recent Double Clustering (DC) method of Slonim and Tishby that exhibited impressive performance on text categorization tasks. Using synthetically generated data we empirically find that whenever the DC procedure is successful in recovering some of the structure hidden in the data, the extended IDC procedure can incrementally compute a significantly more accurate classification. IDC is especially advantageous when the data exhibits high attribute noise. Our simulation results also show the effectiveness of IDC in text categorization problems. Surprisingly, this unsupervised procedure can be competitive with a (supervised) SVM trained with a small training set. Finally, we propose a simple and natural extension of IDC for semi-supervised and transductive learning where we are given both labeled and unlabeled examples.
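The double-clustering idea behind DC and IDC — cluster the features (e.g. words), re-represent each document over the feature clusters, then cluster the documents, and in IDC keep alternating — can be sketched as below. This is a minimal illustration, not the authors' implementation: plain k-means stands in for the information-bottleneck clustering used by Slonim and Tishby, and all names and parameters (`cluster_rows`, `idc`, `rounds`, etc.) are assumptions for illustration.

```python
import numpy as np

def cluster_rows(X, k, n_iter=50, seed=0):
    """Plain k-means over the rows of X. This is only a stand-in for the
    information-bottleneck clustering used in the actual DC/IDC method."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assign each row to its nearest center, then recompute centers
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def idc(counts, k_feat, k_doc, rounds=5):
    """Sketch of Iterative Double Clustering on a document-by-feature
    count matrix: alternately cluster features and documents, each side
    represented through the other side's current clustering."""
    docs = counts.astype(float)
    # first pass: cluster features (columns) on the raw counts
    feat_labels = cluster_rows(docs.T, k_feat)
    for _ in range(rounds):
        # represent each document by its total mass over feature clusters
        doc_repr = np.stack(
            [docs[:, feat_labels == j].sum(1) for j in range(k_feat)], axis=1)
        doc_labels = cluster_rows(doc_repr, k_doc)
        # represent each feature by its total mass over document clusters
        feat_repr = np.stack(
            [docs[doc_labels == i].sum(0) for i in range(k_doc)], axis=1)
        feat_labels = cluster_rows(feat_repr, k_feat)
    return doc_labels

# usage on toy block-structured data (two document groups, two word groups)
counts = np.zeros((20, 12))
counts[:10, :6] = 5.0
counts[10:, 6:] = 5.0
labels = idc(counts, k_feat=2, k_doc=2, rounds=2)
```

Each round coarsens one side of the matrix before clustering the other, which is what gives the iterative scheme its robustness to attribute noise in the abstract's experiments.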