{"title": "Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty", "book": "Advances in Neural Information Processing Systems", "page_first": 15663, "page_last": 15674, "abstract": "Self-supervision provides effective representations for downstream tasks without requiring labels. However, existing approaches lag behind fully supervised training and are often not thought beneficial beyond obviating or reducing the need for annotations. We find that self-supervision can benefit robustness in a variety of ways, including robustness to adversarial examples, label corruption, and common input corruptions. Additionally, self-supervision greatly benefits out-of-distribution detection on difficult, near-distribution outliers, so much so that it exceeds the performance of fully supervised methods. These results demonstrate the promise of self-supervision for improving robustness and uncertainty estimation and establish these tasks as new axes of evaluation for future self-supervised learning research.", "full_text": "Using Self-Supervised Learning Can Improve Model\n\nRobustness and Uncertainty\n\nDan Hendrycks\n\nUC Berkeley\n\nMantas Mazeika\u2217\n\nUIUC\n\nhendrycks@berkeley.edu\n\nmantas3@illinois.edu\n\nSaurav Kadavath*\n\nUC Berkeley\n\nDawn Song\nUC Berkeley\n\nsauravkadavath@berkeley.edu\n\ndawnsong@berkeley.edu\n\nAbstract\n\nSelf-supervision provides effective representations for downstream tasks without\nrequiring labels. However, existing approaches lag behind fully supervised training\nand are often not thought bene\ufb01cial beyond obviating or reducing the need for\nannotations. We \ufb01nd that self-supervision can bene\ufb01t robustness in a variety of\nways, including robustness to adversarial examples, label corruption, and common\ninput corruptions. 
Additionally, self-supervision greatly benefits out-of-distribution detection on difficult, near-distribution outliers, so much so that it exceeds the performance of fully supervised methods. These results demonstrate the promise of self-supervision for improving robustness and uncertainty estimation and establish these tasks as new axes of evaluation for future self-supervised learning research.

1 Introduction

Self-supervised learning holds great promise for improving representations when labeled data are scarce. In semi-supervised learning, recent self-supervision methods are state-of-the-art [Gidaris et al., 2018, Dosovitskiy et al., 2016, Zhai et al., 2019], and self-supervision is essential in video tasks where annotation is costly [Vondrick et al., 2016, 2018]. To date, however, self-supervised approaches lag behind fully supervised training on standard accuracy metrics, and research has existed in a mode of catching up to supervised performance. Additionally, when used in conjunction with fully supervised learning on a fully labeled dataset, self-supervision has little impact on accuracy. This raises the question of whether large labeled datasets render self-supervision needless.

We show that while self-supervision does not substantially improve accuracy when used in tandem with standard training on fully labeled datasets, it can improve several aspects of model robustness, including robustness to adversarial examples [Madry et al., 2018], label corruptions [Patrini et al., 2017, Zhang and Sabuncu, 2018], and common input corruptions such as fog, snow, and blur [Hendrycks and Dietterich, 2019]. Importantly, these gains are masked if one looks at clean accuracy alone, for which performance stays constant. Moreover, we find that self-supervision greatly improves out-of-distribution detection for difficult, near-distribution examples, a long-standing and underexplored problem.
In fact, using self-supervised learning techniques on CIFAR-10 and ImageNet for out-of-distribution detection, we are even able to surpass fully supervised methods.

These results demonstrate that self-supervision need not be viewed as a collection of techniques allowing models to catch up to full supervision. Rather, using the two in conjunction provides strong regularization that improves robustness and uncertainty estimation even if clean accuracy does not change. Importantly, these methods can improve robustness and uncertainty estimation without requiring larger models or additional data [Schmidt et al., 2018, Kurakin et al., 2017]. They can be used with task-specific methods for additive effect with no additional assumptions. With self-supervised learning, we make tangible progress on adversarial robustness, label corruption, common input corruptions, and out-of-distribution detection, suggesting that future self-supervised learning methods could also be judged by their utility for uncertainty estimates and model robustness. Code and our expanded ImageNet validation dataset are available at https://github.com/hendrycks/ss-ood.

*Equal Contribution.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

2 Related Work

Self-supervised learning. A number of self-supervised methods have been proposed, each exploring a different pretext task. Doersch et al. [2015] predict the relative position of image patches and use the resulting representation to improve object detection. Dosovitskiy et al. [2016] create surrogate classes to train on by transforming seed image patches. Similarly, Gidaris et al. [2018] predict image rotations (Figure 1).
Other approaches include using colorization as a proxy task [Larsson et al., 2016], deep clustering methods [Ji et al., 2018], and methods that maximize mutual information [Hjelm et al., 2019] with high-level representations [van den Oord et al., 2018, Hénaff et al., 2019]. These works focus on the utility of self-supervision for learning without labeled data and do not consider its effect on robustness and uncertainty.

Figure 1: Predicting rotation requires modeling shape. Texture alone is not sufficient for determining whether the zebra is flipped, although it may be sufficient for classification under ideal conditions. Thus, training with self-supervised auxiliary rotations may improve robustness.

Robustness. Improving model robustness refers to the goal of ensuring machine learning models are resistant across a variety of imperfect training and testing conditions. Hendrycks and Dietterich [2019] look at how models can handle common real-world image corruptions (such as fog, blur, and JPEG compression) and propose a comprehensive set of distortions to evaluate real-world robustness. Another robustness problem is learning in the presence of corrupted labels [Nettleton et al., 2010, Patrini et al., 2017]. To this end, Hendrycks et al. [2018] introduce the Gold Loss Correction (GLC), a method that uses a small set of trusted labels to improve accuracy in this setting. With high degrees of label corruption, models start to overfit the misinformation in the corrupted labels [Zhang and Sabuncu, 2018, Hendrycks et al., 2019a], suggesting a need for ways to supplement training with reliable signals from unsupervised objectives. Madry et al. [2018] explore adversarial robustness and propose PGD adversarial training, where models are trained with a minimax robust optimization objective. Zhang et al.
[2019] improve upon this work with a modified loss function and develop a better understanding of the trade-off between adversarial accuracy and natural accuracy.

Out-of-distribution detection. Out-of-distribution detection has a long history. Traditional methods such as one-class SVMs [Schölkopf et al., 1999] have been revisited with deep representations [Ruff et al., 2018], yielding improvements on complex data. A central line of recent exploration has been with out-of-distribution detectors using supervised representations. Hendrycks and Gimpel [2017] propose using the maximum softmax probability of a classifier for out-of-distribution detection. Lee et al. [2018] expand on this by generating synthetic outliers and training the representations to flag these examples as outliers. However, Hendrycks et al. [2019b] find that training against a large and diverse dataset of outliers enables far better out-of-distribution detection on unseen distributions. In these works, detection is most difficult for near-distribution outliers, which suggests a need for new methods that force the model to learn more about the structure of in-distribution examples.

3 Robustness

3.1 Robustness to Adversarial Perturbations

Improving robustness to adversarial inputs has proven difficult, with adversarial training providing the only longstanding gains [Carlini and Wagner, 2017, Athalye et al., 2018]. In this section, we demonstrate that auxiliary self-supervision in the form of predicting rotations [Gidaris et al., 2018] can

Method                          Clean   20-step PGD   100-step PGD
Normal Training                 94.8    0.0           0.0
Adversarial Training            84.2    44.8          44.8
+ Auxiliary Rotations (Ours)    83.5    50.4          50.4

Table 1: Results for our defense. All results use ε = 8.0/255. For 20-step adversaries α = 2.0/255, and for 100-step adversaries α = 0.3/255.
More steps do not change results, so the attacks converge. Self-supervision through rotations provides large gains over standard adversarial training.

improve upon standard Projected Gradient Descent (PGD) adversarial training [Madry et al., 2018]. We also observe that self-supervision can provide gains when combined with stronger defenses such as TRADES [Zhang et al., 2019] and is not broken by gradient-free attacks such as SPSA [Uesato et al., 2018].

Setup. The problem of defending against bounded adversarial perturbations can be formally expressed as finding model parameters θ for the classifier p that minimize the objective

    min_θ E_{(x,y)∼D} [ max_{x′∈S} L_CE(y, p(y | x′); θ) ],  where S = {x′ : ‖x − x′‖ < ε}.   (1)

In this paper, we focus on ℓ∞ norm bounded adversaries. Madry et al. [2018] propose that PGD is "a universal first-order adversary." Hence, we first focus on defending against PGD. Let PGD(x) be the Kth step of PGD,

    x^{k+1} = Π_S ( x^k + α sign(∇_x L_CE(y, p(y | x^k); θ)) ),  with x^0 = x + U(−ε, ε),   (2)

where K is a preset parameter which characterizes the number of steps that are taken, Π_S is the projection operator for the ℓ∞ ball S, and L_CE(y, p(y | x′); θ) is the loss we want to optimize. Normally, this loss is the cross-entropy between the model's softmax classification output for x and the ground truth label y. For evaluating robust accuracy, we use 20-step and 100-step adversaries. For the 20-step adversary, we set the step size α = 2/255. For the 100-step adversary, we set α = 0.3/255 as in Madry et al. [2018]. During training, we use 10-step adversaries with α = 2/255.

In all experiments, we use 40-2 Wide Residual Networks [Zagoruyko and Komodakis, 2016]. For training, we use SGD with Nesterov momentum of 0.9 and a batch size of 128.
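The PGD update in Equation (2) is easy to sketch. Below is a minimal NumPy illustration, not the paper's training code: it attacks a toy linear softmax classifier whose input gradient is available in closed form, and the model, ε, and α values here are placeholders.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def ce_grad_wrt_input(W, b, x, y):
    """Gradient of the cross-entropy loss w.r.t. the input of a linear classifier."""
    p = softmax(W @ x + b)
    p[y] -= 1.0                  # softmax probabilities minus one-hot label
    return W.T @ p

def pgd_attack(W, b, x, y, eps, alpha, steps, rng):
    """K-step l_inf PGD (Eq. 2): random start, signed-gradient ascent, projection."""
    x_adv = x + rng.uniform(-eps, eps, size=x.shape)   # x^0 = x + U(-eps, eps)
    for _ in range(steps):
        g = ce_grad_wrt_input(W, b, x_adv, y)
        x_adv = x_adv + alpha * np.sign(g)             # ascend the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)       # project onto the l_inf ball S
    return x_adv
```

The same loop applies unchanged to a deep network once the input gradient is supplied by automatic differentiation instead of the closed form above.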
We use an initial learning rate of 0.1, a cosine learning rate schedule [Loshchilov and Hutter, 2016], and weight decay of 5 × 10^-4. For data augmentation, we use random cropping and mirroring. Hyperparameters were chosen as standard values and are used in subsequent sections unless otherwise specified.

Method. We explore improving representation robustness beyond standard PGD training with auxiliary rotation-based self-supervision in the style of Gidaris et al. [2018]. In our approach, we train a classification network along with a separate auxiliary head, which takes the penultimate vector from the network as input and outputs a 4-way softmax distribution. This head is trained along with the rest of the network to predict the amount of rotation applied to a given input image (from 0°, 90°, 180°, and 270°). Our overall loss during training can be broken down into a supervised loss and a self-supervised loss

    L(x, y; θ) = L_CE(y, p(y | PGD(x)); θ) + λ L_SS(PGD(x); θ).   (3)

Figure 2: The effect of attack strength on an ε = 8/255 adversarially trained model. The attack strengths are ε ∈ {4/255, 5/255, ..., 10/255}. Since the accuracy gap widens as ε increases, self-supervision's benefits are masked when observing the clean accuracy alone.

Note that the self-supervised component of the loss does not require the ground truth training label y as input. The supervised loss does not make use of our auxiliary head, while the self-supervised loss only makes use of this head.
When λ = 0, our total loss falls back to the loss used for PGD training. For our experiments, we use λ = 0.5 and the following rotation-based self-supervised loss

    L_SS(x; θ) = (1/4) Σ_{r ∈ {0°, 90°, 180°, 270°}} L_CE(one_hot(r), p_rot_head(r | R_r(x)); θ),   (4)

where R_r(x) is a rotation transformation and L_CE is the cross-entropy between the auxiliary head's output and the ground-truth label r ∈ {0°, 90°, 180°, 270°}. In order to adapt the PGD adversary to the new training setup, we modify the loss used in the PGD update equation (2) to maximize both the rotation loss and the classification loss. In the Appendix, we find that this modification is optional and that the main source of improvement comes from the rotation loss itself. We report results with the modification here, for completeness. The overall loss that PGD will try to maximize for each training image is L_CE(y, p(y | x); θ) + L_SS(x; θ). At test time, the PGD loss does not include the L_SS term, as we want to attack the image classifier and not the rotation classifier.

Results and analysis. We are able to attain large improvements over standard PGD training by adding self-supervised rotation prediction. Table 1 contains results of our model against PGD adversaries with K = 20 and K = 100. In both cases, we are able to achieve a 5.6% absolute improvement over classical PGD training. In Figure 2, we observe that our method of adding auxiliary rotations actually provides larger gains over standard PGD training as the maximum perturbation distance ε increases.
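The rotation loss in Equation (4) and the combined objective in Equation (3) can be sketched as follows; `rot_head`, which maps an image to 4 logits, is a placeholder for the auxiliary head and is an assumption of this illustration.

```python
import numpy as np

ROTATIONS = [0, 90, 180, 270]

def rotate(x, deg):
    """Rotate an HxW (or HxWxC) image by a multiple of 90 degrees."""
    return np.rot90(x, k=deg // 90, axes=(0, 1))

def cross_entropy(logits, label):
    """Cross-entropy between a logit vector and an integer class label."""
    logits = logits - logits.max()
    return -(logits[label] - np.log(np.exp(logits).sum()))

def rotation_ss_loss(rot_head, x):
    """Eq. (4): average cross-entropy of the rotation head over all four rotations."""
    losses = [cross_entropy(rot_head(rotate(x, deg)), idx)
              for idx, deg in enumerate(ROTATIONS)]
    return sum(losses) / 4.0

def total_loss(supervised_ce, rot_head, x, lam=0.5):
    """Eq. (3): supervised loss plus lambda times the self-supervised rotation loss."""
    return supervised_ce + lam * rotation_ss_loss(rot_head, x)
```

A head with uniform outputs incurs a rotation loss of log 4, while a head that identifies every rotation drives the loss toward zero; setting `lam=0` recovers plain PGD training, as the text notes.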
The figure also shows that our method can withstand up to 11% larger perturbations than PGD training without any drop in performance.

In order to demonstrate that our method does not rely on gradient obfuscation, we attempted to attack our models using SPSA [Uesato et al., 2018] and failed to notice any performance degradation compared to standard PGD training. In addition, since our self-supervised method has the nice property of being easily adaptable to supplement other supervised defenses, we also studied the effect of adding self-supervised rotations to stronger defenses such as TRADES [Zhang et al., 2019]. We found that self-supervision is able to help in this setting as well. Our best-performing TRADES + rotations model gives a 1.22% boost over standard TRADES and a 7.79% boost over standard PGD training in robust accuracy. For implementation details, see code.

3.2 Robustness to Common Corruptions

Setup. In real-world applications of computer vision systems, inputs can be corrupted in various ways that may not have been encountered during training. Improving robustness to these common corruptions is especially important in safety-critical applications. Hendrycks and Dietterich [2019] create a set of fifteen test corruptions and four validation corruptions to measure input corruption robustness. These corruptions fall into noise, blur, weather, and digital categories. Examples include shot noise, zoom blur, snow, and JPEG compression.

We use the CIFAR-10-C validation dataset from Hendrycks and Dietterich [2019] and compare the robustness of normally trained classifiers to classifiers trained with an auxiliary rotation prediction loss. As in previous sections, we predict all four rotations in parallel in each batch. We use 40-2 Wide Residual Networks and the same optimization hyperparameters as before.
We do not tune on the validation corruptions, so we report average performance over all corruptions. Results are in Figure 3.

Results and analysis. The baseline of normal training achieves a clean accuracy of 94.7% and an average accuracy over all corruptions of 72.3%. Training with auxiliary rotations maintains clean accuracy at 95.5% but increases the average accuracy on corrupted images by 4.6% to 76.9%. Thus, the benefits of self-supervision to robustness are masked by similar accuracy on clean images. Performance gains are spread across corruptions, with a small loss of performance in only one corruption type, JPEG compression. For glass blur, accuracy improves by 11.4%, and for Gaussian noise it improves by 11.6%. Performance is also improved by 8.9% on contrast and shot noise and 4.2% on frost, indicating substantial gains in robustness on a wide variety of corruptions. These results demonstrate that self-supervision can regularize networks to be more robust even if clean accuracy is not affected.

3.3 Robustness to Label Corruptions

Setup. Training classifiers on corrupted labels can severely degrade performance. Thus, several prior works have explored training deep neural networks to be robust to label noise in the multi-class classification setting [Sukhbaatar et al., 2014, Patrini et al., 2017, Hendrycks et al., 2018]. We use the problem setting from these works. Let x, y, and ỹ be an input, clean label, and potentially corrupted label respectively. Given a dataset D̃ of (x, ỹ) pairs for training, the task is to obtain high classification accuracy on a test dataset D_test of cleanly-labeled (x, y) pairs.

Figure 3: A comparison of the accuracy of usual training compared to training with auxiliary rotation self-supervision on the nineteen CIFAR-10-C corruptions. Each bar represents an average over all five corruption strengths for a given corruption type.

Given a cleanly-labeled training dataset D, we generate the corrupted dataset D̃ with a corruption matrix C, where C_ij = p(ỹ = j | y = i) is the probability of a ground truth label i being corrupted to j. Where K is the number of classes, we construct C according to C = (1 − s) I_K + s 11^T / K. In this equation, s is the corruption strength, which lies in [0, 1]. At a corruption strength of 0, the labels are unchanged, while at a corruption strength of 1 the labels have an equal chance of being corrupted to any class. To measure performance, we average performance on D_test over corruption strengths from 0 to 1 in increments of 0.1 for a total of 11 experiments.

Methods. Training without loss correction methods or self-supervision serves as our first baseline, which we call No Correction in Table 2. Next, we compare to the state-of-the-art Gold Loss Correction (GLC) [Hendrycks et al., 2018]. This is a two-stage loss correction method based on Sukhbaatar et al. [2014] and Patrini et al. [2017]. The first stage of training estimates the matrix C of conditional corruption probabilities, which partially describes the corruption process. The second stage uses the estimate of C to train a corrected classifier that performs well on the clean label distribution. The GLC assumes access to a small dataset of trusted data with cleanly-labeled examples. Thus, we specify the amount of trusted data available in experiments as a fraction of the training set. This setup is also known as a semi-verified setting [Charikar et al., 2017].

To investigate the effect of self-supervision, we use the combined loss L_CE(y, p(y | x); θ) + λ L_SS(x; θ), where the first term is standard cross-entropy loss and the second term is the auxiliary rotation loss defined in Section 3.1. We call this Rotations in Table 2. In all experiments, we set λ = 0.5.
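The corruption process C = (1 − s) I_K + s 11^T / K is easy to instantiate; a small NumPy sketch (the generator and class count are illustrative):

```python
import numpy as np

def corruption_matrix(num_classes, strength):
    """C = (1 - s) * I_K + s * 11^T / K; row i is p(corrupted label | true label i)."""
    K = num_classes
    return (1.0 - strength) * np.eye(K) + strength * np.ones((K, K)) / K

def corrupt_labels(labels, strength, num_classes, rng):
    """Draw each corrupted label from the row of C indexed by the true label."""
    C = corruption_matrix(num_classes, strength)
    return np.array([rng.choice(num_classes, p=C[y]) for y in labels])
```

At strength 0 the matrix is the identity and labels pass through unchanged; at strength 1 every row is uniform, so a corrupted label is equally likely to be any class, matching the description above.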
Gidaris et al. [2018] demonstrate that predicting rotations can yield effective representations for subsequent fine-tuning on target classification tasks. We build on this approach and pre-train with the auxiliary rotation loss alone for 100 epochs, after which we fine-tune for 40 epochs with the combined loss.

We use 40-2 Wide Residual Networks [Zagoruyko and Komodakis, 2016]. Hyperparameters remain unchanged from Section 3.1. To select the number of fine-tuning epochs, we use a validation split of the CIFAR-10 training dataset with clean labels and select a value to bring accuracy close to that of Normal Training. Results are in Table 2 and performance curves are in Figure 4.

Analysis. We observe large gains in robustness from auxiliary rotation prediction. Without loss corrections, we reduce the average error by 5.6% on CIFAR-10 and 5.2% on CIFAR-100. This corresponds to an 11% relative improvement over the baseline of normal training on CIFAR-100

Figure 4: Error curves for label corruption comparing normal training to training with auxiliary rotation self-supervision. Auxiliary rotations improve performance when training without loss corrections and are complementary with the GLC loss correction method.

                     CIFAR-10                        CIFAR-100
                     Normal Training   Rotations     Normal Training   Rotations
No Correction        27.4              21.8          52.6              47.4
GLC (5% Trusted)     14.6              10.5          48.3              43.2
GLC (10% Trusted)    11.6              9.6           39.1              36.8

Table 2: Label corruption results comparing normal training to training with auxiliary rotation self-supervision. Each value is the average error over 11 corruption strengths.
All values are percentages. The reliable training signal from self-supervision improves resistance to label noise.

and a 26% relative improvement on CIFAR-10. In fact, auxiliary rotation prediction with no loss correction outperforms the GLC with 5% trusted data on CIFAR-100. This is surprising given that the GLC was developed specifically to combat label noise.

We also observe additive effects with the GLC. On CIFAR-10, the GLC with 5% trusted data obtains 14.6% average error, which is reduced to 10.5% with the addition of auxiliary rotation prediction. Note that doubling the amount of trusted data to 10% yields 11.6% average error. Thus, using self-supervision can enable better performance than doubling the amount of trusted data in a semi-verified setting. On CIFAR-100, we observe similar complementary gains from auxiliary rotation prediction. Qualitatively, we can see in Figure 4 that performance degradation as the corruption strength increases is softer with auxiliary rotation prediction.

On CIFAR-100, error at 0% corruption strength is 2.3% higher with auxiliary rotation predictions. This is because we selected the number of fine-tuning epochs on CIFAR-10 at 0% corruption strength, for which the degradation is only 1.3%. Fine-tuning for longer can eliminate this gap, but also leads to overfitting label noise [Zhang and Sabuncu, 2018]. Controlling this trade-off between robustness and performance on clean data is application-specific. However, past a corruption strength of 20%, auxiliary rotation predictions improve performance for all tested corruption strengths and methods.

4 Out-of-Distribution Detection

Self-supervised learning with rotation prediction enables the detection of harder out-of-distribution examples.
In the following two sections, we show that self-supervised learning improves out-of-distribution detection when the in-distribution consists of multiple classes or just a single class.

4.1 Multi-Class Out-of-Distribution Detection

Setup. In the following experiment, we train a CIFAR-10 classifier and use it as an out-of-distribution detector. When given an example x, we write the classifier's posterior distribution over the ten classes as p(y | x). Hendrycks and Gimpel [2017] show that p(y | x) can enable the detection of out-of-distribution examples. They show that the maximum softmax probability max_c p(y = c | x) tends to be higher for in-distribution examples than for out-of-distribution examples across a range of tasks, enabling the detection of OOD examples.

We evaluate each OOD detector using the area under the receiver operating characteristic curve (AUROC) [Davis and Goadrich, 2006]. Given an input image, an OOD detector produces an anomaly score. The AUROC is equal to the probability that an out-of-distribution example has a higher anomaly score than an in-distribution example. Thus an OOD detector with a 50% AUROC is at random-chance levels, and one with a 100% AUROC is without a performance flaw.

Method. We train a classifier with an auxiliary self-supervised rotation loss.
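The AUROC as described — the probability that a random out-of-distribution example receives a higher anomaly score than a random in-distribution example, counting ties as one half — can be computed by brute force from two lists of scores:

```python
import numpy as np

def auroc(in_scores, out_scores):
    """AUROC: probability that a random OOD example gets a higher anomaly
    score than a random in-distribution example (ties count as 1/2)."""
    in_scores = np.asarray(in_scores, dtype=float)[:, None]
    out_scores = np.asarray(out_scores, dtype=float)[None, :]
    greater = (out_scores > in_scores).mean()   # fraction of pairs ranked correctly
    ties = (out_scores == in_scores).mean()
    return greater + 0.5 * ties
```

A perfect detector scores 1.0, a detector that ranks every pair backwards scores 0.0, and identical score distributions give 0.5, matching the random-chance interpretation in the text.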
The loss during training is L_CE(y, p(y | x)) + Σ_{r ∈ {0°, 90°, 180°, 270°}} L_CE(one_hot(r), p_rot_head(r | R_r(x))), and we only train on in-distribution CIFAR-10 training examples. After training is complete, we score in-distribution CIFAR-10 test set examples and OOD examples with the formula KL[U ‖ p(y | x)] + (1/4) Σ_{r ∈ {0°, 90°, 180°, 270°}} L_CE(one_hot(r), p_rot_head(r | R_r(x))). We use the KL divergence of the softmax prediction to the uniform distribution U since it combines well with the rotation score, and because Hendrycks et al. [2019b] show that KL[U ‖ p(y | x)] performs similarly to the maximum softmax probability baseline max_c p(y = c | x).

The training loss is standard cross-entropy loss with auxiliary rotation prediction. The detection score is the KL divergence detector from prior work with a rotation score added to it. The rotation score consists of the cross-entropy of the rotation softmax distribution to the categorical distribution over rotations with probability 1 at the current rotation and 0 everywhere else. This is equivalent to the negative log probability assigned to the true rotation. Summing the cross-entropies over the rotations gives the total rotation score.

Method             AUROC
Baseline           91.4%
Rotations (Ours)   96.2%

Figure 5: OOD detection performance of the maximum softmax probability baseline and our method using self-supervision. Full results are in the Appendix.

Results and Analysis. We evaluate this proposed method against the maximum softmax probability baseline [Hendrycks and Gimpel, 2017] on a wide variety of anomalies with CIFAR-10 as the in-distribution data. For the anomalies, we select Gaussian, Rademacher, Blobs, Textures, SVHN, Places365, LSUN, and CIFAR-100 images. We observe performance gains across the board and report average AUROC values in Figure 5.
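A shape-level sketch of the two terms in the detection score; `clf_logits`, `rot_head`, and `rotate` are placeholder callables assumed for this illustration, and the rotation term is the average negative log probability assigned to the true rotation, as described above.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def kl_uniform(probs):
    """KL[U || p] for a K-way probability vector p."""
    K = len(probs)
    u = np.full(K, 1.0 / K)
    return float(np.sum(u * np.log(u / probs)))

def rotation_score(rot_head, rotate, x):
    """Average negative log probability the auxiliary head assigns to the
    rotation actually applied; grows as rotation prediction degrades."""
    total = 0.0
    for idx, deg in enumerate([0, 90, 180, 270]):
        probs = softmax(rot_head(rotate(x, deg)))
        total += -np.log(probs[idx])
    return total / 4.0

def ood_score(clf_logits, rot_head, rotate, x):
    """Combined score from the text: KL[U || p(y|x)] plus the rotation term."""
    return kl_uniform(softmax(clf_logits(x))) + rotation_score(rot_head, rotate, x)
```

Note the two components respond to different signals: the KL term reflects the classifier's softmax confidence, while the rotation term rises when the auxiliary head misjudges rotations, which is the behavior the detector exploits on OOD inputs.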
On average, the rotation method increases the AUROC by 4.8%.

This method does not require additional data as in Outlier Exposure [Hendrycks et al., 2019b], although combining the two could yield further benefits. As is, the performance gains are of comparable magnitude to more complex methods proposed in the literature [Xie et al., 2018]. This demonstrates that self-supervised auxiliary rotation prediction can augment OOD detectors based on fully supervised multi-class representations. More detailed descriptions of the OOD datasets and the full results on each anomaly type with additional metrics are in the Appendix.

4.2 One-Class Learning

Setup. In the following experiments, we take a dataset consisting of k classes and train a model on one class. This model is used as an out-of-distribution detector. For the source of OOD examples, we use the examples from the remaining unseen k − 1 classes. Consequently, for the datasets we consider, the OOD examples are near the in-distribution and make for a difficult OOD detection challenge.

4.2.1 CIFAR-10

Baselines. One-class SVMs [Schölkopf et al., 1999] are an unsupervised out-of-distribution detection technique which models the training distribution by finding a small region containing most of the training set examples; points outside this region are deemed OOD. In our experiment, OC-SVMs operate on the raw CIFAR-10 pixels.
Deep SVDD [Ruff et al., 2018] uses convolutional networks to extract features from the raw pixels while modeling one class, like OC-SVMs.

             OC-SVM   Deep SVDD   Geometric   RotNet   DIM    IIC    Supervised (OE)   Ours   Ours + OE
Airplane     65.6     61.7        76.2        71.9     72.6   68.4   87.6              77.5   90.4
Automobile   40.9     65.9        84.8        94.5     52.3   89.4   93.9              96.9   99.3
Bird         65.3     50.8        77.1        78.4     60.5   49.8   78.6              87.3   93.7
Cat          50.1     59.1        73.2        70.0     53.9   65.3   79.9              80.9   88.1
Deer         75.2     60.9        82.8        77.2     66.7   60.5   81.7              92.7   97.4
Dog          51.2     65.7        84.8        86.6     51.0   59.1   85.6              90.2   94.3
Frog         71.8     67.7        82.0        81.6     62.7   49.3   93.3              90.9   97.1
Horse        51.2     67.3        88.7        93.7     59.2   74.8   87.9              96.5   98.8
Ship         67.9     75.9        89.5        90.7     52.8   81.8   92.6              95.2   98.7
Truck        48.5     73.1        83.4        88.8     47.6   75.7   92.1              93.3   98.5
Mean         58.8     64.8        82.3        83.3     57.9   67.4   87.3              90.1   95.6

Table 3: AUROC values of different OOD detectors trained on one of ten CIFAR-10 classes. Test-time out-of-distribution examples are from the remaining nine CIFAR-10 classes. In-distribution examples are examples belonging to the row's class. Our self-supervised technique surpasses a fully supervised model. All values are percentages.

RotNet [Gidaris et al., 2018] is a successful self-supervised technique which learns its representations by predicting whether an input is rotated 0°, 90°, 180°, or 270°. After training RotNet, we use the softmax probabilities to determine whether an example is in- or out-of-distribution. To do this, we feed the network the original example (0°) and record RotNet's softmax probability assigned to the 0° class. We then rotate the example 90° and record the probability assigned to the 90° class. We do the same for 180° and 270°, and add up these probabilities.
The sum of these probabilities for in-distribution examples will tend to be higher than the sum for OOD examples, so the negative of this sum is the anomaly score. Next, Golan and El-Yaniv [2018] (Geometric) predict transformations such as rotations and whether an input is horizontally flipped; we are the first to connect this method to self-supervised learning, and we improve their method. Deep InfoMax [Hjelm et al., 2019] networks learn representations which have high mutual information with the input; for detection we use the scores of the discriminator network. A recent self-supervised technique is Invariant Information Clustering (IIC) [Ji et al., 2018], which teaches networks to cluster images without labels by learning representations which are invariant to geometric perturbations such as rotations, scaling, and skewing. For our supervised baseline, we use a deep network which performs logistic regression, and for the negative class we use Outlier Exposure. In Outlier Exposure, the network is exposed to examples from a real, diverse dataset consisting of out-of-distribution examples. Done correctly, this process teaches the network to generalize to unseen anomalies. For the outlier dataset, we use 80 Million Tiny Images [Torralba et al., 2008] with CIFAR-10 and CIFAR-100 examples removed. Crucial to the success of the supervised baseline is our loss function choice. To ensure the supervised baseline learns from hard examples, we use the Focal Loss [Lin et al., 2017].
For our self-supervised one-class OOD detector, we use a deep network to predict geometric transformations and thereby surpass previous work and the fully supervised network. Examples are rotated 0°, 90°, 180°, or 270° then translated 0 or ±8 pixels vertically and horizontally. These transformations are composed together, and the network has three softmax heads: one for predicting rotation (R), one for predicting vertical translations (T_v), and one for predicting horizontal translations (T_h). Concretely, the anomaly score for an example x is

\sum_{r \in R} \sum_{s \in T_v} \sum_{t \in T_h} \Big[\, p_{\text{rot\_head}}(r \mid G(x)) + p_{\text{vert\_transl\_head}}(s \mid G(x)) + p_{\text{horiz\_transl\_head}}(t \mid G(x)) \,\Big],

where G is the composition of the rotation, vertical translation, and horizontal translation specified by r, s, and t respectively. The set R is the set of rotations, and p_rot_head(r | ·) is the softmax probability assigned to rotation r by the rotation predictor. Likewise for the translations, with T_v, T_h, s, t, p_vert_transl_head, and p_horiz_transl_head. The backbone architecture is a 16-4 WideResNet [Zagoruyko and Komodakis, 2016] trained with a dropout rate of 0.3 [Srivastava et al., 2014]. We choose a 16-4 network because there are fewer training samples. Networks are trained with a cosine learning rate schedule [Loshchilov and Hutter, 2016], an initial learning rate of 0.1, Nesterov momentum, and a batch size of 128. Data are augmented with standard cropping and mirroring. Our RotNet and supervised baseline use the same backbone architecture and training hyperparameters. When training our method with Outlier Exposure, we encourage the network to have uniform softmax responses on out-of-distribution data. For Outlier Exposure to work successfully, we apply the aforementioned geometric transformations to the outlier images so that the in-distribution data and the outliers are as similar as possible.

Results are in Table 3.
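The three-head anomaly score defined in the Method paragraph can be sketched as follows. This is our illustrative sketch, not the released implementation; `heads_fn` is a placeholder for the three-headed network, returning the rotation, vertical-translation, and horizontal-translation softmax vectors for a transformed image.

```python
import itertools
import numpy as np

SHIFTS = (-8, 0, 8)  # candidate vertical/horizontal translations in pixels

def transform(x, r, s_idx, t_idx):
    """Compose a rotation by r quarter-turns with a pixel translation."""
    out = np.rot90(x, k=r, axes=(0, 1))
    return np.roll(out, (SHIFTS[s_idx], SHIFTS[t_idx]), axis=(0, 1))

def geometric_anomaly_score(heads_fn, x):
    """For every composed transformation G of x, add the probability each
    softmax head assigns to the transformation parameter actually applied.

    `heads_fn` stands in for the trained network: it maps a transformed
    image to a (p_rot, p_vert, p_horiz) triple of softmax vectors.
    """
    score = 0.0
    # 4 rotations x 3 vertical shifts x 3 horizontal shifts = 36 copies.
    for r, s, t in itertools.product(range(4), range(3), range(3)):
        g_x = transform(x, r, s, t)
        p_rot, p_vert, p_horiz = heads_fn(g_x)
        score += p_rot[r] + p_vert[s] + p_horiz[t]
    return score
```

As with the RotNet score, a network that confidently recovers the applied transformations yields a high score for in-distribution inputs.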
Notice many self-supervised techniques perform better than methods specifically designed for one-class learning. Also notice that our self-supervised technique outperforms Outlier Exposure, the state-of-the-art fully supervised method, which itself requires access to out-of-distribution samples to train. Consequently, a model trained with self-supervision can surpass a fully supervised model. Combining our self-supervised technique with supervision through Outlier Exposure nearly solves this CIFAR-10 task.

4.2.2 ImageNet

Dataset. We now turn to a harder dataset to test self-supervised techniques. For this experiment, we select 30 classes from ImageNet [Deng et al., 2009]. See the Appendix for the classes.

Method. As before, we demonstrate that a self-supervised model can surpass a fully supervised model. The fully supervised model is trained with Outlier Exposure using ImageNet-22K outliers (with ImageNet-1K images removed). The architectural backbone for these experiments is a ResNet-18. Images are resized such that the smallest side has 256 pixels, while the aspect ratio is maintained. Images are randomly cropped to the size 224 × 224 × 3. Since the images are larger than those of CIFAR-10, new additions to the self-supervised method are possible. In particular, we can teach the network to predict whether an image has been resized. In addition, since we would like the network to more easily learn shape and compare regions across the whole image, we find that self-attention [Woo et al., 2018] is useful for this task. Other architectural changes, such as using a Wide RevNet [Behrmann et al., 2018] instead of a Wide ResNet, can increase the AUROC from 65.3% to 77.5%. AUROCs are shown in Table 4.
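The AUROC values reported in these tables can be computed directly from the two populations of anomaly scores via the rank-statistic formulation of AUROC; a minimal NumPy sketch (the function name is ours):

```python
import numpy as np

def auroc(in_scores, out_scores):
    """Probability that a randomly chosen out-of-distribution example
    receives a higher anomaly score than a randomly chosen
    in-distribution example (ties count half)."""
    in_s = np.asarray(in_scores, dtype=float)
    out_s = np.asarray(out_scores, dtype=float)
    # Compare every OOD score against every in-distribution score.
    diffs = out_s[:, None] - in_s[None, :]
    return (diffs > 0).mean() + 0.5 * (diffs == 0).mean()
```

Multiplying by 100 gives the percentage values reported in Tables 3 and 4; an uninformative detector scores 0.5 (50%), a perfect one 1.0 (100%).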
Self-supervised methods outperform the fully supervised baseline by a large margin, yet there remains considerable room for improvement on large-scale OOD detection.

Method                                                  AUROC
Supervised (OE)                                          56.1
RotNet                                                   65.3
RotNet + Translation                                     77.9
RotNet + Self-Attention                                  81.6
RotNet + Translation + Self-Attention                    84.8
RotNet + Translation + Self-Attention + Resize (Ours)    85.7

Table 4: AUROC values of supervised and self-supervised OOD detectors. Each AUROC value is an average over 30 models, each trained with one of the 30 classes as the in-distribution set; the test out-of-distribution samples come from the remaining 29 classes. The self-supervised methods greatly outperform the supervised method. All values are percentages.

5 Conclusion

In this paper, we applied self-supervised learning to improve the robustness and uncertainty of deep learning models beyond what was previously possible with purely supervised approaches. We found large improvements in robustness to adversarial examples, label corruption, and common input corruptions. For all types of robustness that we studied, we observed consistent gains by supplementing current supervised methods with an auxiliary rotation loss. We also found that self-supervised methods can drastically improve out-of-distribution detection on difficult, near-distribution anomalies, and that in CIFAR and ImageNet experiments, self-supervised methods outperform fully supervised methods. Self-supervision had the largest improvement over supervised techniques in our ImageNet experiments, where the larger input size meant that we were able to apply a more complex self-supervised objective.
Our results suggest that future work in building more robust models and better data representations could benefit greatly from self-supervised approaches.

5.1 Acknowledgments

This material is in part based upon work supported by the National Science Foundation Frontier Grant. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

References

Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, July 2018.

Jens Behrmann, Will Grathwohl, Ricky T. Q. Chen, David Duvenaud, and Jörn-Henrik Jacobsen. Invertible residual networks. ArXiv, abs/1811.00995, 2018.

Nicholas Carlini and David Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods, 2017.

Moses Charikar, Jacob Steinhardt, and Gregory Valiant. Learning from untrusted data. STOC, 2017.

Jesse Davis and Mark Goadrich. The relationship between precision-recall and ROC curves. In International Conference on Machine Learning, 2006.

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. CVPR, 2009.

Carl Doersch, Abhinav Gupta, and Alexei A. Efros. Unsupervised visual representation learning by context prediction. In Proceedings of the IEEE International Conference on Computer Vision, pages 1422–1430, 2015.

Alexey Dosovitskiy, Philipp Fischer, Jost Tobias Springenberg, Martin Riedmiller, and Thomas Brox. Discriminative unsupervised feature learning with exemplar convolutional neural networks.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(9):1734–1747, 2016.

Spyros Gidaris, Praveer Singh, and Nikos Komodakis. Unsupervised representation learning by predicting image rotations. In International Conference on Learning Representations, 2018.

Izhak Golan and Ran El-Yaniv. Deep anomaly detection using geometric transformations. CoRR, abs/1805.10917, 2018.

Dan Hendrycks and Thomas Dietterich. Benchmarking neural network robustness to common corruptions and perturbations. ICLR, 2019.

Dan Hendrycks and Kevin Gimpel. A baseline for detecting misclassified and out-of-distribution examples in neural networks. ICLR, 2017.

Dan Hendrycks, Mantas Mazeika, Duncan Wilson, and Kevin Gimpel. Using trusted data to train deep networks on labels corrupted by severe noise. NeurIPS, 2018.

Dan Hendrycks, Kimin Lee, and Mantas Mazeika. Using pre-training can improve model robustness and uncertainty. Proceedings of the International Conference on Machine Learning, 2019a.

Dan Hendrycks, Mantas Mazeika, and Thomas Dietterich. Deep anomaly detection with outlier exposure. In International Conference on Learning Representations, 2019b.

R Devon Hjelm, Alex Fedorov, Samuel Lavoie-Marchildon, Karan Grewal, Phil Bachman, Adam Trischler, and Yoshua Bengio. Learning deep representations by mutual information estimation and maximization. In International Conference on Learning Representations, 2019.

Olivier J. Hénaff, Ali Razavi, Carl Doersch, S. M. Ali Eslami, and Aaron van den Oord. Data-efficient image recognition with contrastive predictive coding, 2019.

Xu Ji, João F. Henriques, and Andrea Vedaldi. Invariant information distillation for unsupervised image segmentation and clustering. CoRR, abs/1807.06653, 2018.

Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial machine learning at scale. ICLR, 2017.

Gustav Larsson, Michael Maire, and Gregory Shakhnarovich.
Learning representations for automatic\n\ncolorization. In European Conference on Computer Vision, pages 577\u2013593. Springer, 2016.\n\nKimin Lee, Honglak Lee, Kibok Lee, and Jinwoo Shin. Training con\ufb01dence-calibrated classi\ufb01ers for\n\ndetecting out-of-distribution samples. ICLR, 2018.\n\nTsung-Yi Lin, Priya Goyal, Ross B. Girshick, Kaiming He, and Piotr Doll\u00e1r. Focal loss for dense\n\nobject detection. ICCV, 2017.\n\nIlya Loshchilov and Frank Hutter. SGDR: stochastic gradient descent with warm restarts. ICLR,\n\n2016.\n\nAleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu.\n\nTowards deep learning models resistant to adversarial attacks. ICLR, 2018.\n\nDavid F Nettleton, Albert Orriols-Puig, and Albert Fornells. A study of the effect of different types\n\nof noise on the precision of supervised learning techniques. Artif Intell Rev, 2010.\n\nGiorgio Patrini, Alessandro Rozza, Aditya Menon, Richard Nock, and Lizhen Qu. Making deep\n\nneural networks robust to label noise: a loss correction approach. CVPR, 2017.\n\nLukas Ruff, Robert A. Vandermeulen, Nico G\u00f6rnitz, Lucas Deecke, Shoaib A. Siddiqui, Alexander\nBinder, Emmanuel M\u00fcller, and Marius Kloft. Deep one-class classi\ufb01cation. In Proceedings of the\n35th International Conference on Machine Learning, volume 80, pages 4393\u20134402, 2018.\n\nLudwig Schmidt, Shibani Santurkar, Dimitris Tsipras, Kunal Talwar, and Aleksander Madry. Adver-\n\nsarially robust generalization requires more data. NeurIPS, 2018.\n\nBernhard Sch\u00f6lkopf, Robert Williamson, Alex Smola, John Shawe-Taylor, and John Platt. Support\nvector method for novelty detection. 
In Proceedings of the 12th International Conference on Neural Information Processing Systems, NIPS'99, pages 582–588, Cambridge, MA, USA, 1999. MIT Press.

Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 2014.

Sainbayar Sukhbaatar, Joan Bruna, Manohar Paluri, Lubomir Bourdev, and Rob Fergus. Training convolutional networks with noisy labels. ICLR Workshop, 2014.

Antonio Torralba, Rob Fergus, and William T. Freeman. 80 million tiny images: A large data set for nonparametric object and scene recognition. Pattern Analysis and Machine Intelligence, 2008.

Jonathan Uesato, Brendan O'Donoghue, Aaron van den Oord, and Pushmeet Kohli. Adversarial risk and the dangers of evaluating against weak attacks. arXiv preprint arXiv:1802.05666, 2018.

Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding. NeurIPS, 2018.

Carl Vondrick, Hamed Pirsiavash, and Antonio Torralba. Anticipating visual representations from unlabeled video. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2016. doi: 10.1109/cvpr.2016.18.

Carl Vondrick, Abhinav Shrivastava, Alireza Fathi, Sergio Guadarrama, and Kevin Murphy. Tracking emerges by colorizing videos. In The European Conference on Computer Vision (ECCV), September 2018.

Sanghyun Woo, Jongchan Park, Joon-Young Lee, and In So Kweon. CBAM: Convolutional block attention module. In The European Conference on Computer Vision (ECCV), September 2018.

Cihang Xie, Yuxin Wu, Laurens van der Maaten, Alan Yuille, and Kaiming He. Feature denoising for improving adversarial robustness. arXiv preprint, 2018.

Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. BMVC, 2016.

Xiaohua Zhai, Avital Oliver, Alexander Kolesnikov, and Lucas Beyer.
S4L: Self-supervised semi-supervised learning, 2019.

Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric P. Xing, Laurent El Ghaoui, and Michael I. Jordan. Theoretically principled trade-off between robustness and accuracy. arXiv preprint arXiv:1901.08573, 2019.

Zhilu Zhang and Mert Sabuncu. Generalized cross entropy loss for training deep neural networks with noisy labels. In Advances in Neural Information Processing Systems, pages 8778–8788, 2018.