NeurIPS 2020

RandAugment: Practical Automated Data Augmentation with a Reduced Search Space

Review 1

Summary and Contributions: This paper tackles automated data augmentation by first identifying a problem with policy search on proxy tasks, a standard practice in the field. It then proposes a reduced search space for finding good data augmentation strategies. The resulting search space is simple enough for grid search, yet still yields competitive performance compared to other automated data augmentation methods.

Strengths: The paper provides a very insightful analysis of how data augmentation policy search is sub-optimal when performed on proxy tasks. More precisely, as the authors point out quite accurately, proxy tasks typically use a smaller dataset/smaller model. As shown in Figure 1, these two dimensions both have a non-trivial (and sometimes counter-intuitive) impact on the final accuracy, showing that search on proxy tasks may not be a good strategy. I find the information (both this analysis and the proposed simple search space) provided by this paper useful for demystifying important aspects of automated data augmentation, and it can perhaps provide inspiration for deeper insights into other applications of AutoML as well. More precisely, most AutoML-based methods (automated data augmentation in particular) boast about their “flexibility” while making drastic simplifications. But if the flexibility comes with a large search space that requires approximation, could the approximation be so bad that the large search space is simply useless? This paper certainly provides a concrete thesis that cautions against this trend. In practice, this paper proposes a much simpler algorithm than existing automated data augmentation algorithms, one that achieves SOTA results on classification/detection tasks. This by itself is a good contribution.

Weaknesses: The paper only shows that a proxy task plus a complicated search space may not work as well as a simple search space used without much approximation. It doesn’t really tell us what happens if a complicated search space can be efficiently explored on the real task. In this sense, the paper is only a reflection of current practice, without providing a clear direction forward. In fact, the simplification in this paper (reducing the search space to the number of ops to apply and the shared magnitude of the ops) seems like overkill. By doing so, it misses an opportunity to answer some interesting questions, such as: “Is assigning a different magnitude to different ops useful at all in automated data augmentation?” Intuitively, there should be an intermediate search space that is perhaps simpler than most used in the literature, yet still simple enough that the search can be done on the target task. In a sense, this paper seems to mix two related, but distinct, goals prematurely. Goal 1: have an efficient search algorithm on the target task (not a proxy task). Goal 2: replace an approximate search algorithm (such as the RL used in AutoAugment) with a grid search. While goal 1 is clearly justified by careful analysis, goal 2 seems like an afterthought. Update: The rebuttal addresses most of my concerns. I think it is on balance a good paper and should be accepted. I have revised my score to "accept". I have read the other reviews and compared notes with the rebuttal. I agree with the authors that the paper shouldn't be discounted as not novel. My reasoning is that if data augmentation (with the current design of augmentation strategies) does not really need a full AutoML solution, then a paper that finds this empirically has value.

Correctness: The paper has good empirical analysis. The comparisons seem correct and justify the conclusions.

Clarity: The paper is in general well written. However, there are a few problems that should be addressed. Lines 158-160: Why can RandAugment represent K^N policies? If it only searches over N, wouldn’t that mean it can only search among N options? Perhaps it should be K^N outcomes? Lines 167-168: What does M mean precisely? How does the search over M work; is it just selecting which schedule to use? What are the ranges of M and N? The paper doesn’t seem to make this clear.

Relation to Prior Work: The literature survey is comprehensive and useful.

Reproducibility: No

Additional Feedback:

Review 2

Summary and Contributions: This paper proposes an efficient method for searching for data augmentation policies. It conducts a simple grid search on a vastly reduced search space. The experimental results show competitive performance with previous work on multiple tasks.

Strengths: 1. This work empirically studies the relationship between the optimal strength of data augmentation and the model size/training set size. The observation indicates that proxy-task-based search methods may be sub-optimal for learning and transferring augmentation policies. 2. This work designs a small policy search space. It significantly reduces the computation cost, getting rid of a separate expensive search phase and proxy tasks. 3. Experimental results are shown for CIFAR-10/100, SVHN, ImageNet, and COCO datasets. This method achieves equal or better performance over some previous methods.

Weaknesses: 1. The method of this paper is not very complicated. The authors make the point clearly that their work uncovers the power of random augmentation. However, somehow I think some conventional augmentation procedures do not seem to be in an entirely different category. Personally, I would like to see more innovation in this work. 2. This work uses a much smaller search space than much previous work, resulting in fixed searched policies. However, PBA [2] points out that an augmentation function that can reduce generalization error at the end of training is not necessarily a good function at the initial phases. So the end goal of PBA is to learn a schedule of augmentation policies as opposed to a fixed policy. Besides, previous works [3,4] also replace the fixed augmentation policy with a dynamic schedule of augmentation policies along the training process, and achieve better performance than this paper with competitive computational costs. All of these may indicate the limitations of fixed policies. Reference: [1] Cubuk, Ekin D., et al. "AutoAugment: Learning augmentation strategies from data." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019. [2] Ho, Daniel, et al. "Population based augmentation: Efficient learning of augmentation policy schedules." International Conference on Machine Learning. 2019. [3] Zhang, Xinyu, et al. "Adversarial AutoAugment." International Conference on Learning Representations. 2020. [4] Lin, Chen, et al. "Online hyper-parameter learning for auto-augmentation strategy." Proceedings of the IEEE International Conference on Computer Vision. 2019.

Correctness: A claim in L013 may be controversial ("our method achieves equal or better performance over previous automated augmentation strategies"), because other previous works achieve competitive performance with this paper.

Clarity: The paper is well written and easy to follow.

Relation to Prior Work: Yes, it clearly discussed how this work differs from some previous automated data augmentation work. However, several strongly related works are missing from the references (see weakness references [3,4]).

Reproducibility: Yes

Additional Feedback: Update: The rebuttal addresses some of my concerns. I have revised my score. I think overall this work gives a practical implementation of better augmentation compared with current default settings and could increase the performance of baseline models.

Review 3

Summary and Contributions: The authors propose a novel automated data augmentation method named "RandAugment". It uses a vastly smaller search space than previous works, thus reducing the computational expense. Moreover, within the proposed search space, the authors use a random search strategy that achieves promising results on classification and object detection tasks in automated data augmentation. I have read the rebuttal and will keep my score.

Strengths: 1. Compared to previous works, the proposed method reduces the search space from 10^32 to only 100. Moreover, the proposed method is easy to implement thanks to the random search or grid search. 2. The ablation experiments seem adequately designed. It is interesting that posterize has a negative impact on these datasets and that rotate/shear/translate are the most effective transformations, which is consistent with common sense. 3. The experiments verify that the proxy task may provide sub-optimal results and that the proposed method can be used directly on large datasets. Furthermore, RandAugment is largely insensitive to the selection of transformations for different datasets. 4. The paper is well written and easy to understand.

Weaknesses: 1. Although the experimental results show the effect of the proposed method, the contribution of this work may be incremental. The idea of this paper seems to come from the paper "Evaluating the Search Phase of Neural Architecture Search", which showed that random search performs better than elaborate search strategies. Could you highlight your main contributions and the differences from that work, as well as why such differences make sense? 2. It is encouraged to give the details of the search spaces of previous works like AA, Fast AA, and PBA. 3. It would be better to include the time cost in the other experiment tables, as in Table 1.

Correctness: Yes

Clarity: Yes

Relation to Prior Work: Yes, the authors show the distinctions between their work and a similar one proposed by E. D. Cubuk

Reproducibility: Yes

Additional Feedback: 1. There is a typo on page 3, i.e., `discritizes'. 2. Reference [35] is missing the paper name.

Review 4

Summary and Contributions: This work proposes a simple and small search space to reduce the cost of augmentation search. It samples N transformation operations from the pool and sets a single distortion magnitude M shared by all sampled operations. The method leads to significant improvements on multiple tasks, such as image classification and object detection, at minor cost.
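The two-parameter scheme summarized above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the transform pool, its operations, and the magnitude scaling below are hypothetical stand-ins for the 14 image transformations the paper actually uses.

```python
import random

# Hypothetical pool of transforms acting on a scalar "image" x.
# Each takes the input and a normalized magnitude level in [0, 1].
TRANSFORMS = {
    "identity": lambda x, level: x,
    "add": lambda x, level: x + level,          # stand-in for e.g. brightness
    "scale": lambda x, level: x * (1 + level),  # stand-in for e.g. contrast
}

def randaugment(x, n, m, max_m=30):
    """Apply n transforms sampled uniformly (with replacement) from the
    pool; all share the single distortion magnitude m in [0, max_m]."""
    level = m / max_m
    for name in random.choices(list(TRANSFORMS), k=n):
        x = TRANSFORMS[name](x, level)
    return x
```

Because the policy is fully specified by the pair (N, M), the whole space can be explored by a naive grid, e.g. evaluating every combination of `n in (1, 2, 3)` and `m in range(0, 31, 5)`.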

Strengths: The paper proposes an extremely simple search space, yet the empirical improvements are significant. It conducts comprehensive empirical experiments with well-described training details.

Weaknesses: - Although the empirical study shows good results, the novelty of this paper is limited. - The paper claims a simple grid search is sufficient to get a good result. However, a random search might give better results at the same cost, because the optimal parameter may not be included in the grid. - The advantage of this method is obvious when the pool of transforms is large, but it is unclear whether it remains outstanding when the number of transformations is small. Not all scenarios have a list of 14 augmentations, so such an ablation study is important.
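The grid-versus-random point above can be made concrete: at the same evaluation budget, random search can propose magnitudes between the grid points, whereas the grid is confined to its lattice. A hypothetical sketch of the two candidate generators (ranges and budget are illustrative assumptions):

```python
import random

def grid_candidates(n_vals, m_vals):
    # Grid search: every (N, M) pair on a fixed lattice.
    return [(n, m) for n in n_vals for m in m_vals]

def random_candidates(n_range, m_range, budget):
    # Random search: M drawn continuously, so values between grid
    # points are reachable at the same evaluation cost.
    return [(random.randint(*n_range), random.uniform(*m_range))
            for _ in range(budget)]

grid = grid_candidates([1, 2, 3], [5, 10, 15, 20, 25, 30])
rand = random_candidates((1, 3), (0.0, 30.0), budget=len(grid))
```

If the true optimum lies off the lattice (say M ≈ 12.7), only the random candidates have any chance of landing near it, which is the basis of this concern.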

Correctness: Yes

Clarity: The paper is easy to follow.

Relation to Prior Work: Yes

Reproducibility: Yes

Additional Feedback: - Conduct an experiment with random search instead of grid search, and compare the results. Faster convergence and more robust performance are expected from random search given a large pool of augmentations. - Conduct an experiment with a smaller pool of transformation candidates and compare with the baselines. It is possible that other augmentation search algorithms may be comparable or even superior, since they might produce better results in a reasonable amount of time given a small pool to search from. Update after Rebuttal: I've read the feedback and other reviews. The feedback on novelty is not persuasive enough for me. The statement "...many new semi-supervised and self-supervised papers utilize this method to achieve SOTA..." speaks more to usefulness than to novelty. On the other hand, it is still clear that the method brings improvement. Therefore, I'd like to keep my score.