Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
This paper provides an interesting new method for neural network architecture search by encoding knowledge about constraints in IoT devices. As a result, the authors are able to search many more architectures than previous works, because they do not need to train as many networks to find the best ones for the IoT use case, and they are able to find networks that provide higher accuracy under the same latency / memory constraints compared to previous work. This paper contains several typographical errors (35$ instead of $35) and some other errors; for example, the definitions of "half" and "float" in Section 3.2 are backwards. These errors are minor and do not significantly detract from the paper's content. There are also some small issues with clarity. The last paragraph in Section 3.1 is one sentence, which makes it hard to read. Also, what exactly is the relationship between the manual and automatic workflows? How do they interact with each other? And within the automatic workflow, what is the difference between the automatic search and "random configuration" as shown in Figure 4? This figure in particular was hard to understand. Overall, the paper is an interesting contribution; however, I am unsure of its significance, as it does not seem like much of an improvement over the cited previous works.
**Originality**: The novelty of the work is that the authors search in the context of IoT devices, evaluating measured latency, memory footprint, and prediction accuracy at the same time to satisfy certain constraints. This is new compared to prior works, according to the Related Work section and my understanding. Their method of evolving a parameterized search space also seems new in the context of Neural Architecture Search. **Quality**: The submission is complete and not a work-in-progress. The experiment design is reasonable, the baseline comparisons against previous methods are fair, and the results support their claims. The authors did not mention the weaknesses of this method, and there are a number of typos yet to be fixed. **Clarity**: The method description is clear and easy to understand, since the method is not very complicated. Most of the figures are self-explanatory, though some require a deeper look into the text. **Significance**: Their results show that they can search for better model hyper-parameters (kernel size, channel size) for IoT devices, and that their method performs better than prior work. They can also deal with target space constraints (such as memory size) by evolving the input search space, which is useful when deployed models have such constraints. Their search over different floating-point configurations also prompts hardware designers and researchers to design specialized architectures for smaller deep networks that run on CPUs. Since they target IoT devices, the results can also be useful for non-researchers with affordable hardware.
This paper studies the NAS problem in IoT platform scenarios with specific hardware considerations. The idea is to break down the original search space into a set of subspaces through sampling, and to conduct the final, actual search in the typically much narrower space formed by the union of these sampled subspaces. In IoT platform scenarios, specific hardware considerations are applied in sampling the space. The whole work of the paper is reported through a case study on image classification using the CIFAR-10 dataset, for which the best results to date are reported. The only methodological contribution, as I see it, is the idea of breaking down the original space into subspaces and conducting the search in the narrower space formed by their union. While this could be claimed as a contribution, as was done in the paper, I don’t feel comfortable buying this idea, as I see that this methodology has a lot of open issues. For example, how many sampled subspaces are reasonably sufficient to deliver a good solution? What sort of theoretical assurance is there of reaching a good or better solution using this approach, even if we force each sampled subspace to cover a reference model, as was done in the paper? In terms of empirical analysis, the paper only gives one case study on image classification with the CIFAR-10 dataset. In real-world IoT applications, the problems we face may be far more challenging than image classification on CIFAR-10. Consequently, I am not sure how much societal impact this work could generate. Finally, I would like to comment on the presentation of the paper. The paper has a lot of grammatical errors and awkward sentences. I encourage the authors to proofread the paper carefully before sending it out for review. The above was my initial review. After reading the authors’ response, I am willing to buy the arguments regarding the significance and novelty of the work.
Specifically, regarding the claimed improvements over SOTA, I rank their significance in the order 1 (most), 3 (moderate), and 2 (least). On improvement 1, I still have the technical reservation stated above and am not completely convinced by the authors’ response: I was expecting an insightful discussion on determining the number of subspaces, as well as its relationship with the given prior, such as the reference models. Given the authors’ response, I am raising the overall score to 6.