Isabelle Guyon, Steve Gunn, Asa Ben-Hur, Gideon Dror
The NIPS 2003 workshops included a feature selection competition organized by the authors. We provided participants with five datasets from different application domains and called for classification results using a minimal number of features. The competition took place over a period of 13 weeks and attracted 78 research groups. Participants were asked to make on-line submissions on the validation and test sets, with performance on the validation set being presented immediately to the participant and performance on the test set presented to the participants at the workshop. In total, 1863 entries were made on the validation sets during the development period and 135 entries on all test sets for the final competition. The winners used a combination of Bayesian neural networks with ARD priors and Dirichlet diffusion trees. Other top entries used a variety of methods for feature selection, which combined filters and/or wrapper or embedded methods using Random Forests, kernel methods, or neural networks as a classification engine. The results of the benchmark (including the predictions made by the participants and the features they selected) and the scoring software are publicly available. The benchmark is available at www.nipsfsc.ecs.soton.ac.uk for post-challenge submissions to stimulate further research.
1 Introduction
Recently, the quality of research in Machine Learning has been raised by the sustained data sharing efforts of the community. Data repositories include the well-known UCI Machine Learning repository, and dozens of other sites. Yet, this has not diminished the importance of organized competitions. In fact, the proliferation of datasets combined with the creativity of researchers in designing experiments makes it hardly possible to compare one paper with another. A number of large conferences have regularly organized competitions (e.g. KDD, CAMDA, ICDAR, TREC, ICPR, and CASP). The NIPS workshops offer an ideal forum for organizing such competitions. In 2003, we organized a competition on the theme of feature selection, the results of which were presented at a workshop on feature extraction, which attracted 98 participants. We are presently preparing a book combining tutorial chapters and papers from the proceedings of that workshop. In this paper, we present to the NIPS community a concise summary of our challenge design and the findings of the result analysis.
2 Benchmark design
We formatted five datasets (Table 1) from various application domains. All datasets are two-class classification problems. The data were split into three subsets: a training set, a validation set, and a test set. All three subsets were made available at the beginning of the benchmark, on September 8, 2003. The class labels for the validation set and the test set were withheld. The identities of the datasets and of the features (some of which were artificially generated random features) were kept secret. The participants could submit prediction results on the validation set and get their performance results and ranking on-line for a period of 12 weeks. By December 1st, 2003, which marked the end of the development period, the participants had to turn in their results on the test set. Immediately after that, the validation set labels were revealed. On December 8th, 2003, the participants could make submissions of test set predictions, after having trained on both the training and the validation set. Some details on the benchmark design are provided in this section.
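The data release protocol described above can be illustrated with a minimal Python sketch. It is not the software used in the challenge: the function names `split_benchmark` and `reveal_validation_labels`, the toy data, and the subset sizes are all hypothetical and chosen only for illustration. The sketch shows the two stages of the protocol: participants first receive all three subsets with only the training labels, and after the development period the validation labels are revealed so that final models can be trained on the training and validation sets together.

```python
import random


def split_benchmark(examples, labels, n_train, n_valid, seed=0):
    """Shuffle the examples and split them into training, validation, and test subsets.

    Returns (public, hidden). `public` is what participants would receive:
    all three sets of examples, but only the training labels. `hidden`
    holds the withheld validation and test labels.
    """
    order = list(range(len(examples)))
    random.Random(seed).shuffle(order)
    tr = order[:n_train]
    va = order[n_train:n_train + n_valid]
    te = order[n_train + n_valid:]
    public = {
        "X_train": [examples[i] for i in tr],
        "y_train": [labels[i] for i in tr],
        "X_valid": [examples[i] for i in va],
        "X_test": [examples[i] for i in te],
    }
    hidden = {
        "y_valid": [labels[i] for i in va],
        "y_test": [labels[i] for i in te],
    }
    return public, hidden


def reveal_validation_labels(public, hidden):
    """After the development period, merge training and validation data for final training."""
    X_final = public["X_train"] + public["X_valid"]
    y_final = public["y_train"] + hidden["y_valid"]
    return X_final, y_final


# Toy two-class data (synthetic, for illustration only): 100 examples, 5 real features.
rng = random.Random(42)
real = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(100)]
labels = [1 if row[0] > 0 else -1 for row in real]
# Append 3 artificially generated random features to every example.
examples = [row + [rng.gauss(0, 1) for _ in range(3)] for row in real]

public, hidden = split_benchmark(examples, labels, n_train=60, n_valid=20)
X_final, y_final = reveal_validation_labels(public, hidden)
print(len(public["X_train"]), len(public["X_valid"]), len(public["X_test"]))
print(len(X_final), len(y_final))
```

Keeping the withheld labels in a separate structure mirrors the separation between what the organizers hold and what the participants see; the test labels are never merged into the data available for training.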