NeurIPS 2019
Sun Dec 8th – Sat Dec 14th, 2019, Vancouver Convention Center
Paper ID: 6217
Title:A Zero-Positive Learning Approach for Diagnosing Software Performance Regressions

This paper describes a system for localizing the source of performance regressions in code. The idea is to measure hardware performance counters (HPCs) at a per-function granularity and, when a regression is detected, localize it by finding the function with the most anomalous counter readings. Anomaly detection is done by training autoencoders on the HPCs; a further idea is to cluster functions with similar behavior profiles so that an autoencoder need not be learned for every function in a large code base.

This is a controversial paper because there is little methodological novelty. R1 gave the lowest score and asks whether we want to allow this kind of paper at NeurIPS, worrying that if we accept any application of ML, NeurIPS risks becoming too broad. R3 gave the highest initial score and finds the paper of high quality. R2 also supports acceptance but agrees that the methodological novelty is limited. None of the reviewers was an expert on the application, so I solicited an extra review from an expert in the application area. The expert's opinion is that the paper presents a good problem formulation and would be a solid foundation for future ML research (though there is a simplifying assumption in the work that does not hold for all performance regressions).

R1's question is worth considering. For me (AC), the reasons to prefer this paper to the average ICSE ML paper are that it introduces a problem I have not seen in an ML venue, uses an interesting source of data that will be new to ML people (HPCs), follows reasonable modern ML practice (autoencoders for anomaly detection), would benefit from more advanced ML, and could get ML people excited about the problem (the paper is easy to read for an ML audience).
I agree that we don't want NeurIPS to only have this kind of paper, but having a mix of methodological, application, and infrastructure (e.g., software toolkits, datasets) papers is healthy IMHO.
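For ML readers unfamiliar with the setup, the localization idea summarized above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the HPC data is synthetic, the function names are invented, and for simplicity the autoencoder is linear, whose optimum is the PCA projection and can therefore be computed in closed form rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "normal" per-function HPC profiles for one cluster of similar
# functions: 4 counters whose values co-vary pairwise (counters 0/1 track
# one underlying quantity, counters 2/3 another).
a = rng.uniform(0.8, 1.2, size=(200, 1))
b = rng.uniform(1.8, 2.2, size=(200, 1))
normal = np.hstack([a, a, b, b]) + rng.normal(0, 0.01, size=(200, 4))

# Zero-positive training: fit only on normal runs. A linear autoencoder's
# optimum is the projection onto the top principal subspace, so we compute
# it in closed form (SVD) instead of gradient training.
mean = normal.mean(axis=0)
_, _, Vt = np.linalg.svd(normal - mean, full_matrices=False)
V = Vt[:2].T  # top-2 "code" directions (hidden width 2)

def anomaly_score(x):
    # Reconstruction error: encode into the 2-D code, decode, compare.
    r = mean + (x - mean) @ V @ V.T
    return float(np.sum((r - x) ** 2))

# Counters observed after a suspected regression; hypothetical function "g"
# breaks the learned correlation structure (counter 0 up, counter 1 down).
observed = {
    "f": np.array([1.0, 1.0, 2.0, 2.0]),
    "g": np.array([1.6, 0.4, 2.0, 2.0]),
    "h": np.array([0.9, 0.9, 1.9, 1.9]),
}
scores = {name: anomaly_score(x) for name, x in observed.items()}
culprit = max(scores, key=scores.get)
print(culprit)  # → g
```

The function whose counters reconstruct worst is flagged as the likely source of the regression; clustering functions with similar profiles (as the paper proposes) amortizes one such detector across many functions.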