Paper ID: 1486
Title: Hidden Technical Debt in Machine Learning Systems
Current Reviews

Submitted by Assigned_Reviewer_1

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
I enjoyed reading this paper. It is thought-provoking and addresses an issue that should be of great concern to the ML community: how to reduce technical debt and hence make ML software systems more successful and sustainable for a broader set of real-life applications. In the long run, this will be essential for ML to fulfill the high expectations of industry and society. Certainly, it is important for researchers in ML to be aware that, from a software engineering perspective, increasing the complexity of ML systems and algorithms for marginal accuracy gains may be valuable academically but may, in the long run, undermine the impact of ML research. Having said that, I have two main concerns with this paper: 1) I don't see that it is in the scope of the NIPS call for papers; overall, it reads more like a systems paper. 2) Certain aspects should be dealt with in more depth. The authors should try to provide more empirical evidence for their claims, and the solutions they propose often stay at a very high level. Are there any novel design patterns that could be derived from their analysis? Many of the issues the authors address pertain to pipelines for processing data and extracting features. There is a vast amount of computer systems research dealing with data management services; specific references to that work might be appropriate and would help put this paper in context.
Q2: Please summarize your review in 1-2 sentences
The paper addresses a technical issue of great relevance to machine learning systems in a broader sense. However, I don't see that it is really in the scope of the NIPS call for papers; moreover, certain aspects are treated only superficially.

Submitted by Assigned_Reviewer_2

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)


(Light Review) This well-written and thought-provoking paper catalogues the myriad real-world engineering risks associated with integrating machine learning into engineering systems. The paper does a fantastic job of outlining the various pitfalls; as a researcher and practitioner, I can personally vouch for having encountered nearly all of them. This could be an extremely valuable guide for teams and companies that are new to (or considering) integrating machine learning into their systems, helping them plan for the potential risks. The paper doesn't offer quite as much in terms of fixes, though it does offer some best practices (and anti-practices).

I think this paper would generate great discussion at the conference and would be often cited (and more often *used*) in both research and industrial settings. I'd definitely support the inclusion of this paper at NIPS.

Q2: Please summarize your review in 1-2 sentences


This well-written and thought-provoking paper does a fantastic job cataloguing the myriad real-world engineering risks associated with integrating machine learning into engineering systems. While a bit untraditional for NIPS, I think this paper would be of immense value to practitioners in both industry and research. I'd definitely support the inclusion of this paper at NIPS.


Submitted by Assigned_Reviewer_3

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
The paper shines a light on the systems aspects of the design and maintenance of ML systems.

This is not a typical NIPS paper as it does not propose new methodology.

However, there are barely any papers of its ilk, despite the fact that most major corporations that make decisions based on data employ some version of an ML system.

Technical debt leads to high maintenance overhead, which carries significant monetary costs for companies.

By summarizing the major issues associated with the maintenance of such systems (and thus potentially helping readers avoid some of them), this paper will have a *very* significant impact on the development of new ML systems.

I very much enjoyed reading the paper.

The paper resonates very strongly with me: after many years as a university ML researcher, I recently left to join industry, and over the last two years I have observed first-hand the systems issues described in the paper.

The authors did a great job summarizing the main pitfalls.

I could add a few points to the taxonomy, even though it is already quite comprehensive:

- (Intro) Deploying ML systems carries its own overhead as well, manifesting in running A/B experiments (and then cleaning up after them) and maintaining an archive of artefacts for deployed models.

- (ML-System Anti-Patterns) When reading about Glue Code, I cannot help but think of incorporating open source packages into the system, especially if additional custom modifications need to be made. Using these packages leads to several potential (unmentioned) headaches: (1) someone needs to maintain the package internally, (2) whenever there is an updated version of the package, it needs to be re-integrated into the internal code-base, and (3) if internal modifications were made, they often need to be committed back to the open source project (to maintain future compatibility); this may require jumping through administrative hoops.

- (Configuration Debt) It is worth mentioning that interruptions in data logging may lead not only to stale features but also to (potentially cascading) failures in feature builds.

- (Monitoring and Testing) An additional aspect could be changes in available bandwidth, both to extract/build the features and to train the model.

Typos:
- Page 2 (Correction Cascades): $a$ -> $m_a$ (two places)
- Page 3, line 131: the -> them
Q2: Please summarize your review in 1-2 sentences
A very detailed paper on the potential pitfalls of maintaining a machine learning system.

The paper is very timely as ML is maturing and is being put in production across many industries; in my opinion, it is a very useful read for anyone building or working with an essential ML system.

Submitted by Assigned_Reviewer_4

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
I think this is an interesting position paper. It talks about challenges in designing and maintaining ML systems. It takes a software engineering perspective and analyzes current systems that seem to use ML algorithms as black boxes.

I see that this is an applied paper, but I think it would have been substantially strengthened by actual case studies of real systems that showcase these problems. In its current state, it is an amalgamation of different design decisions that one should consider. It would also help to prioritize the different criteria rather than just listing them, specifically with ML systems in mind. This again would require actual case studies and would, I think, provide valuable insight into the design of other similar systems.

Lastly, since this paper doesn't really have a technical/empirical contribution but is, as I mentioned before, a position paper catered more towards software engineering, I am not sure how suitable it is for NIPS. I think more applied venues, such as the industry tracks of data mining/ML conferences or conferences at the intersection of software engineering and data analytics, might be more suitable.
Q2: Please summarize your review in 1-2 sentences
This is a position paper which I think would be more suitable for more applied venues. It would also help to have case studies to ground the suggested ideas.

Author Feedback
Q1:Author rebuttal: Please respond to any concerns raised in the reviews. There are no constraints on how you want to argue your case, except for the fact that your text should be limited to a maximum of 5000 characters. Note however, that reviewers and area chairs are busy and may not read long vague rebuttals. It is in your own interest to be concise and to the point.
We thank the reviewers for their careful reviews.

One of the key questions raised by reviewers 1, 3, and 6 was whether this paper is suited to NIPS as opposed to more applied / engineering / systems conferences. Our opinion is that highlighting the systems-level complications and costs hidden in ML is important for the NIPS community in particular, because of this community's large and growing influence on real-world systems and applied work. Indeed, a major portion of even "academic" researchers at NIPS have an industry affiliation of one form or another. Thus, we agree with reviewers 2, 4, and 5 that this topic area would be of high value to the NIPS community.

Reviewer 1 notes that this paper's topic area is not listed in the NIPS call for papers. In our opinion, CFPs should be interpreted as guidelines rather than exhaustive lists; otherwise, new areas of investigation might never surface. The CFP itself states that papers of interest are "not limited to" the listed topic areas. We also note that the inclusion of the word Systems in the NIPS conference title suggests that a discussion of systems-level issues for ML is not inappropriate for this venue. A significant portion of advances in ML in recent years have been at the systems level, bringing into scope not just algorithms and proofs, but also hardware, networking, and other topics traditionally seen as "engineering". In this view, a paper on ML-specific technical debt may be seen as highly relevant to NIPS.

Overall, we agree with reviewer 3 that case studies would be another excellent way to study this problem, and with reviewer 1 that additional empirical evidence would further strengthen the paper. As the first paper to explore these issues in detail, we felt that breadth was also a priority. Unfortunately, the 8-page limit creates a tension between breadth and depth. If there are specific places where the reviewers feel content could be cut to make room for additional depth in other areas, we would be open to these suggestions. Fortunately (or unfortunately?), there is no lack of material from which to draw.

Reviewer 1 suggests incorporating more references from traditional data management literature. We will do so, but also note that the key difference between traditional data management and ML data management is that ML data directly impacts system behavior, making this a much more difficult problem area.

Reviewer 6 notes that the paper does not provide many "clean" mechanisms for dealing with ML-related technical debt. We think this is, in some sense, a fundamental difficulty of the area: there is no magic bullet for resolving complex system-level entanglements. However, as reviewer 2 notes, the paper does provide a number of best practices to emulate and anti-practices to avoid.

On the whole, it appears that the reviewers generally felt this work was valuable if untraditional. Again, we appreciate their insightful comments.