Sun Dec 8 through Sat Dec 14, 2019, at the Vancouver Convention Center
This paper provides an active learning approach that jointly targets the selection of examples and the most suitable architecture for learning an image recognition classifier. The model searches for effective architectures while applying active learning. The authors apply three query techniques: softmax response, MC-dropout, and coresets. They show that the model outperforms a traditional active learning approach based on a static architecture. The paper is well written and clearly presented. The general idea of architecture learning is clearly not novel, but what really matters is the instantiation of the idea. The authors present a concrete approach, which is reasonable and supported by some theoretical considerations. However, I have some doubts about the generality of the results. The baseline architecture is included in the set of networks the model can generate. This means that: (i) the approach cannot be claimed to improve a standard model for image classification, as there are more powerful architectures; and (ii) the paper just shows that the proposed approach can select a richer architecture than the baseline.
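For concreteness, the simplest of the three query techniques mentioned above, softmax response, selects the unlabeled points whose top predicted class probability is lowest. A minimal sketch (the function name and toy data are illustrative, not the paper's API):

```python
import numpy as np

def softmax_response_query(probs: np.ndarray, budget: int) -> np.ndarray:
    """Select the `budget` unlabeled points whose top softmax
    probability is lowest, i.e. the least confident predictions."""
    confidence = probs.max(axis=1)          # top class probability per example
    return np.argsort(confidence)[:budget]  # indices of least-confident points

# Toy pool of 4 examples over 3 classes.
pool_probs = np.array([
    [0.90, 0.05, 0.05],  # confident
    [0.40, 0.35, 0.25],  # uncertain
    [0.34, 0.33, 0.33],  # most uncertain
    [0.70, 0.20, 0.10],
])
print(softmax_response_query(pool_probs, budget=2))  # → [2 1]
```

MC-dropout and coresets follow the same pool-scoring pattern, but score with predictive variance under dropout and with geometric cover of the pool, respectively.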
This paper proposes a method for active learning (AL) in which each AL iteration optimizes over both the network architecture and the underlying parameters, as opposed to other methods, which fix the architecture and only optimize the parameters. These two optimizations are done separately: first a local search is performed among models of monotonically increasing complexity, and then the parameters of the obtained architecture are optimized. The authors combine this method with three different active learning algorithms and show that it improves the performance of all three.

The paper is very well written and clear. The problem of architecture optimization is also of great importance in the field. One confusing thing is that the title and introduction sound as if this is an AL paper, whereas it mostly addresses architecture optimization within an AL framework (which could be replaced by any querying function, including passive random selection). Therefore, one would expect the paper to give a more comprehensive review of existing methods for architecture search (rather than AL methods). The supplementary material does include a review of existing architecture search methods, some of which should be moved to the main text.

It is not clear why a new network architecture search is needed, and what makes it more suitable for AL settings. Any existing NAS method, including those based on reinforcement learning, Bayesian optimization, etc., could be used within AL iterations; why should iNAS be a better choice for this purpose? Moreover, is the proposed algorithm scalable to large data sets, given that it has to train and test multiple model architectures in each AL iteration? Is the model fine-tuned in each iNAS iteration, or is it trained from scratch? In the early AL iterations, when there are few labeled samples, how reliable is the architecture selection, given that the validation fold would be very small?
Finally, it might not always be the case that more data induces a deeper (more complicated) model. One can even imagine a scenario in which a smaller number of samples leads to selecting a model of higher complexity, since the underlying pattern in the data distribution might not be visible with fewer samples. Why did the authors allow the algorithm to move only towards increasing complexity in each iNAS iteration?
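To make the scheme under discussion concrete, here is a toy, self-contained sketch of an AL loop whose architecture step only accepts monotonically deeper models. Everything here is illustrative: `depth` stands in for a full architecture spec, `toy_val_score` for real training and validation, and random sampling for the querying function; none of it is the paper's actual implementation.

```python
import random

def toy_val_score(depth: int, n_labeled: int) -> float:
    # Pretend deeper models only help once enough labels are available;
    # a small penalty breaks ties in favor of the smaller network.
    capacity = min(depth, n_labeled // 20 + 1)
    return capacity - 0.01 * depth

def al_round(depth, labeled, unlabeled, budget):
    # Architecture step: greedily accept a deeper model while it scores better
    # (this is the monotone, increasing-complexity-only search in question).
    while toy_val_score(depth + 1, len(labeled)) > toy_val_score(depth, len(labeled)):
        depth += 1
    # Query step: any acquisition function can be plugged in; here, random.
    batch = set(random.sample(sorted(unlabeled), min(budget, len(unlabeled))))
    return depth, labeled | batch, unlabeled - batch

depth, labeled, unlabeled = 1, set(range(20)), set(range(20, 200))
for _ in range(5):
    depth, labeled, unlabeled = al_round(depth, labeled, unlabeled, budget=30)
print(depth, len(labeled))  # → 8 170
```

In this toy, depth grows as labels accumulate. The reviewer's concern is exactly that the `while` loop can never move back to a shallower model, even if a later, larger labeled set were to favor one.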
The main novelty of this work lies in the proposed algorithm for selecting an optimal network architecture given a pre-defined AL selection criterion. The advantage is that the approach can select on-the-fly a potentially less complex model under the same budget, while also attaining improved performance on the classification task. The authors also provide theoretical motivation for this approach using the notion of tight bounds on the VC-dimension of deep neural nets. Experimental validation on three benchmark datasets shows improvements (sometimes marginal) over fixed baseline architectures. The approach is well presented and the paper is easy to read.