NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Paper ID: 5286
Title: Model Similarity Mitigates Test Set Overuse


This paper is concerned with an observation about adaptive data analysis. It builds on the empirical finding that, despite statistical lower bounds, common practices of adaptive data analysis do not result in overfitting. The authors show empirically that this is a consequence of the models entered in Kaggle competitions behaving similarly, and they also propose and analyze a simple theoretical model. The reviewers found this an interesting direction and the results generally well executed.

This paper was thoroughly discussed by the committee, and the discussion included the PC Chair. One topic raised by the committee was the tight relationship between this paper and paper 4929, "A Meta-Analysis of Overfitting in Machine Learning," which provides a large-scale analysis of overfitting similar to that performed by Recht et al. (ICML 2019). The committee felt that, given the contribution of Recht et al. (ICML 2019), paper 5286 provides the methodological progress expected of a NeurIPS contribution. After a great deal of deliberation, the committee concluded that there is no room to accept both papers, and that between papers 4929 and 5286, paper 5286 should be accepted.

Recommendation: Throughout the process, multiple committee members read both papers 4929 and 5286. Some senior committee members were aware that the two papers' author lists overlap significantly. The work in 5286 appears to be strengthened by the findings in 4929, and all members of the committee felt that merging the two papers would yield a much stronger result and statement. The committee decided not to force this solution, but recommends it to the authors.