{"title": "Theoretical evidence for adversarial robustness through randomization", "book": "Advances in Neural Information Processing Systems", "page_first": 11838, "page_last": 11848, "abstract": "This paper investigates the theory of robustness against adversarial attacks. It focuses on the family of randomization techniques that consist in injecting noise into the network at inference time. These techniques have proven effective in many contexts, but lack theoretical arguments. We close this gap by presenting a theoretical analysis of these approaches, hence explaining why they perform well in practice. More precisely, we make two new contributions. The first one relates the randomization rate to robustness to adversarial attacks. This result applies to the general family of exponential distributions, and thus extends and unifies the previous approaches. The second contribution consists in devising a new upper bound on the adversarial risk gap of randomized neural networks. We support our theoretical claims with a set of experiments.", "full_text": "Theoretical evidence for adversarial robustness through randomization

Rafael Pinot1,2, Laurent Meunier1,3, Alexandre Araujo1,4, Hisashi Kashima5,6, Florian Yger1, Cédric Gouy-Pailler2, Jamal Atif1

1Université Paris-Dauphine, PSL Research University, CNRS, LAMSADE, Paris, France
2Institut LIST, CEA, Université Paris-Saclay
3Facebook AI Research, Paris, France
4Wavestone, Paris, France
5Kyoto University, Kyoto, Japan
6RIKEN Center for AIP, Japan

Abstract

This paper investigates the theory of robustness against adversarial attacks. It focuses on the family of randomization techniques that consist in injecting noise into the network at inference time. These techniques have proven effective in many contexts, but lack theoretical arguments. We close this gap by presenting a theoretical analysis of these approaches, hence explaining why they perform well in practice.
More precisely, we make two new contributions. The first one relates the randomization rate to robustness to adversarial attacks. This result applies to the general family of exponential distributions, and thus extends and unifies the previous approaches. The second contribution consists in devising a new upper bound on the adversarial risk gap of randomized neural networks. We support our theoretical claims with a set of experiments.

1 Introduction

Adversarial attacks are some of the most puzzling and burning issues in modern machine learning. An adversarial attack refers to a small, imperceptible change of an input maliciously designed to fool the result of a machine learning algorithm. Since the seminal work of [42] exhibiting this intriguing phenomenon in the context of deep learning, a wealth of results have been published on designing attacks [18, 34, 32, 23, 6, 31] and defenses [18, 35, 20, 29, 39, 27], or on trying to understand the very nature of this phenomenon [17, 40, 15, 16]. Most methods remain unsuccessful at defending against powerful adversaries [6, 28, 1]. Among the defense strategies, randomization has proven effective in some contexts. It consists in injecting random noise (both during the training and inference phases) inside the network architecture, i.e. at a given layer of the network. Noise can be drawn from Gaussian [26, 24, 37], Laplace [24], Uniform [44], or Multinomial [12] distributions. Remarkably, most of the considered distributions belong to the Exponential family. Despite these significant efforts, several theoretical questions remain unanswered.
Among these, we tackle the following, for which we provide principled and theoretically-founded answers:

Q1: To what extent does a noise drawn from the Exponential family preserve robustness (in a sense to be defined) to adversarial attacks?

A1: We introduce a definition of robustness to adversarial attacks that is suitable to the randomization defense mechanism. As this mechanism can be described as a non-deterministic querying process, called probabilistic mapping in the sequel, we propose a formal definition of robustness relying on a metric/divergence between probability measures. A key question then arises about the appropriate metric/divergence for our context. This requires tools for comparing divergences w.r.t. the introduced robustness definition. The Renyi divergence turned out to be a measure of choice, since it satisfies most of the desired properties (coherence, strength, and computational tractability). Finally, thanks to the existing links between the Renyi divergence and the Exponential family, we prove that methods based on noise injection from the Exponential family ensure robustness to adversarial attacks (cf. Theorem 1).

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

Q2: Can we guarantee a good accuracy under attack for classifiers defended with these noises?

A2: We present an upper bound on the drop of accuracy (under attack) of the methods defended with noise drawn from the Exponential family (cf. Theorem 2). We also provide a certificate on accuracy under attack for this kind of noise (cf. Theorem 3). We illustrate this result by training different randomized models with Laplace and Gaussian distributions on CIFAR10/CIFAR100. These experiments highlight the trade-off between accuracy and robustness that depends on the amount of noise one injects in the network.
Our theoretical and experimental conclusion is that randomized defenses are competitive with the state of the art [28] given the intensity of the injected noise.

Outline of the paper: We present in Section 2 the related work on randomized defenses against adversarial examples. Section 3 introduces the definition of robustness relying on a metric/divergence between probability measures, and discusses the key role of the Renyi divergence. We state in Section 4 our main results on the robustness and accuracy of Exponential family-based defenses. Section 5 presents extensive experiments supporting our theoretical findings. Section 6 provides concluding remarks.

2 Related works

Injecting noise into algorithms to improve their robustness has been used for ages in detection and signal processing tasks [46, 7, 30]. It has also been extensively studied in several machine learning and optimization fields, e.g. robust optimization [4] and data augmentation techniques [36]. Recently, noise injection techniques have been adopted by the adversarial defense community, especially for neural networks, with very promising results. Randomization techniques are generally oriented towards one of the following objectives: experimental robustness or provable robustness.

Experimental robustness: The first technique explicitly using randomization at inference time as a defense appeared during the 2017 NIPS defense challenge [44]. This method uniformly samples over geometric transformations of the image to select a substitute image to feed the network. Then [12] proposed to use stochastic activation pruning based on a multinomial distribution for adversarial defense. Several papers [26, 37] propose to inject Gaussian noise directly on the activations of selected layers, both at training and inference time.
While these works hypothesize that noise injection makes the network robust to adversarial perturbations, they do not provide any formal justification on the nature of the noise they use or on the loss of accuracy/robustness of the network.

Provable robustness: In [24], the authors proposed a randomization method exploiting the link between differential privacy [14] and adversarial robustness. Their framework, called "randomized smoothing"1, inherits some theoretical results from the differential privacy community, allowing them to evaluate the level of accuracy under attack of their method. Initial results from [24] have been refined in [25] and [9]. Our work belongs to this line of research. However, our framework does not treat exactly the same class of defenses. Notably, we provide theoretical arguments supporting the defense strategy based on randomization techniques relying on the Exponential family, and derive a new bound on the adversarial risk gap, which completes the results obtained so far on certified robustness. Furthermore, our main focus is on the network randomized by noise injection; "randomized smoothing" instead uses this network to create a new classifier robust to attacks.

Since the initial discovery of adversarial examples, a wealth of non-randomized defense approaches have also been proposed, inspired by various machine learning domains such as adversarial training [18, 27], image reconstruction [29, 39] or robust learning [18, 27]. Even if these methods have their own merits, a thorough evaluation made by [1] shows that most defenses can be easily broken with known powerful attacks [27, 6, 8]. Adversarial training, which consists in training a model directly on adversarial examples, came out as the best defense on average.
Defenses based on randomization could be overcome by the Expectation Over Transformation technique proposed by [2], which consists in taking the expectation over the network's randomness to craft the perturbation. In this paper, to ensure that our results are not biased by obfuscated gradients, we follow the principles of [1, 5] and evaluate our randomized networks with this technique. We show that randomized defenses are still competitive given the intensity of the noise injected in the network.

1 Name introduced in [9], which came later than [24].

3 General definitions of risk and robustness

3.1 Risk, robustness and probabilistic mappings

Let us consider two spaces X (with norm ‖.‖_X) and Y. We consider the classification task that seeks a hypothesis (classifier) h : X → Y minimizing the risk of h w.r.t. some ground-truth distribution D over X × Y. The risk of h w.r.t. D is defined as Risk(h) := E_{(x,y)∼D}[1(h(x) ≠ y)]. Given a classifier h : X → Y, and some input x ∈ X with true label y_true ∈ Y, to generate an adversarial example, the adversary seeks a τ such that h(x + τ) ≠ y_true, with some budget α over the perturbation (i.e. with ‖τ‖_X ≤ α). α represents the maximum amount of perturbation one can add to x without being spotted (the perturbation remains humanly imperceptible). The overall goal of the adversary is to find a perturbation crafting strategy that both maximizes the risk of h and keeps the value of ‖τ‖_X small. To measure this risk "under attack", we define the notion of adversarial α-radius risk of h w.r.t. D as follows:

    Risk_α(h) := E_{(x,y)∼D} [ sup_{‖τ‖_X ≤ α} 1(h(x + τ) ≠ y) ].

In practice, the adversary does not have any access to the ground-truth distribution.
The literature has proposed several surrogate versions of Risk_α(h) (see [13] for more details) to overcome this issue. We focus our analysis on the one used in e.g. [42] or [15], denoted the α-radius prediction-change risk of h w.r.t. D_X (marginal of D for X), and defined as

    PC-Risk_α(h) := P_{x∼D_X} [∃τ ∈ B(α) s.t. h(x + τ) ≠ h(x)]

where for any α ≥ 0, B(α) := {τ ∈ X s.t. ‖τ‖_X ≤ α}.

As we will inject some noise in our classifier in order to defend against adversarial attacks, we need to introduce the notion of "probabilistic mapping". Let Y be the output space, and F_Y a σ-algebra over Y. Let us also denote P(Y) the set of probability measures over (Y, F_Y).

Definition 1 (Probabilistic mapping). Let X be an arbitrary space, and (Y, F_Y) a measurable space. A probabilistic mapping from X to Y is a mapping M : X → P(Y). To obtain a numerical output out of this probabilistic mapping, one needs to sample y according to M(x).

This definition does not depend on the nature of Y as long as (Y, F_Y) is measurable. In that sense, Y could be either the label space or any intermediate space corresponding to the output of an arbitrary hidden layer of a neural network. Moreover, any mapping can be considered as a probabilistic mapping, whether it explicitly injects noise (as in [24, 37, 12]) or not: a deterministic mapping is characterized by a Dirac measure. Accordingly, the definition of a probabilistic mapping is fully general and treats networks with or without noise injection equally. There exists no definition of robustness against adversarial attacks that complies with the notion of probabilistic mappings. We settle that by generalizing the notion of prediction-change risk initially introduced in [13] for deterministic classifiers.
Let M be a probabilistic mapping from X to Y, and d_{P(Y)} some metric/divergence on P(Y). We define the (α, ε)-radius prediction-change risk of M w.r.t. D_X and d_{P(Y)} as

    PC-Risk_α(M, ε) := P_{x∼D_X} [∃τ ∈ B(α) s.t. d_{P(Y)}(M(x + τ), M(x)) > ε].

These generalized notions allow us to analyze noise injection defense mechanisms (Theorems 1 and 2). We can also define adversarial robustness (and later the adversarial gap) thanks to these notions.

Definition 2 (Adversarial robustness). Let d_{P(Y)} be a metric/divergence on P(Y). A probabilistic mapping M is called d_{P(Y)}-(α, ε, γ) robust if PC-Risk_α(M, ε) ≤ γ, and d_{P(Y)}-(α, ε) robust if γ = 0.

It is difficult in general to show that a classifier is d_{P(Y)}-(α, ε, γ) robust. However, we can derive some bounds for particular divergences that will ensure robustness up to a certain level (Theorem 1). It is worth noting that our definition of robustness depends on the considered metric/divergence between probability measures. Lemma 1 gives some insights on the monotonicity of the robustness according to the parameters and the probability metric/divergence at hand.

Lemma 1. Let M be a probabilistic mapping, and let d_1 and d_2 be two metrics on P(Y).
If there exists a non-decreasing function φ : R → R such that ∀μ_1, μ_2 ∈ P(Y), d_1(μ_1, μ_2) ≤ φ(d_2(μ_1, μ_2)), then the following assertion holds: M is d_2-(α, ε, γ)-robust ⟹ M is d_1-(α, φ(ε), γ)-robust.

As suggested in Definition 2 and Lemma 1, any given choice of metric/divergence instantiates a particular notion of adversarial robustness, and it should be carefully selected.

3.2 On the choice of the metric/divergence for robustness

The aforementioned formulation naturally raises the question of the choice of the metric used to defend against adversarial attacks. The main notions that govern the selection of an appropriate metric/divergence are coherence, strength, and computational tractability. A metric/divergence is said to be coherent if it naturally fits the task at hand (e.g. classification tasks are intrinsically linked to discrete/trivial metrics, conversely to regression tasks). The strength of a metric/divergence refers to its ability to cover (dominate) a wide class of others in the sense of Lemma 1. In the following, we focus on both the total variation metric and the Renyi divergence, which we consider respectively the most coherent with the classification task using probabilistic mappings, and the strongest divergence. We first discuss how the total variation metric is coherent with randomized classifiers but suffers from computational issues. The Renyi divergence provides good guarantees about adversarial robustness, enjoys nice computational properties, in particular when considering Exponential family distributions, and is strong enough to dominate a wide range of metrics/divergences, including total variation.

Let μ_1 and μ_2 be two measures in P(Y), both dominated by a third measure ν.
The trivial distance d_T(μ_1, μ_2) := 1(μ_1 ≠ μ_2) is the simplest distance one can define between μ_1 and μ_2. In the deterministic case, it is straightforward to compute (since the numerical output of the algorithm characterizes its associated measure), but this is not the case in general. In fact, one might not have access to the true distribution of the mapping, but just to the numerical outputs. Therefore, one needs to consider more sophisticated metrics/divergences, such as the total variation distance d_TV(μ_1, μ_2) := sup_{Y ∈ F_Y} |μ_1(Y) − μ_2(Y)|. The total variation distance is one of the most broadly used probability metrics. It admits several very simple interpretations, and is a very useful tool in many mathematical fields such as probability theory, Bayesian statistics, coupling or transportation theory. In transportation theory, it can be rewritten as the solution of the Monge–Kantorovich problem with the cost function c(y_1, y_2) = 1(y_1 ≠ y_2):

    d_TV(μ_1, μ_2) = inf ∫_{Y×Y} 1(y_1 ≠ y_2) dπ(y_1, y_2),

where the infimum is taken over all joint probability measures π on (Y × Y, F_Y ⊗ F_Y) with marginals μ_1 and μ_2. According to this interpretation, it seems quite natural to consider the total variation distance as a relaxation of the trivial distance on [0, 1] (see [43] for details). In the deterministic case, the total variation and the trivial distance coincide. In general, the total variation allows a finer analysis of probabilistic mappings than the trivial distance, but it suffers from a high computational complexity. In the remainder of the paper, we show how to ensure robustness with regard to the TV distance.

Finally, denoting by g_1 and g_2 the respective probability densities w.r.t. ν, the Renyi divergence of order λ [38] writes as

    d_{R,λ}(μ_1, μ_2) := (1/(λ−1)) log ∫_Y g_2(y) (g_1(y)/g_2(y))^λ dν(y).
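As a quick numerical illustration of these quantities for discrete distributions over finitely many labels, the sketch below (NumPy; helper names are ours) computes the total variation distance and the Renyi divergence, and checks three properties used in the sequel: the Renyi divergence is non-decreasing in λ, recovers the Kullback-Leibler divergence as λ → 1, and dominates the total variation distance (checked here through the classical Pinsker inequality for the KL case):

```python
import numpy as np

def tv_distance(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

def renyi_divergence(p, q, lam):
    """Renyi divergence of order lam > 1 between two discrete distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.log(np.sum(q * (p / q) ** lam)) / (lam - 1.0)

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.6, 0.3, 0.1])

kl = np.sum(p * np.log(p / q))                   # Kullback-Leibler divergence
d_near_1 = renyi_divergence(p, q, 1.0 + 1e-6)    # ~ KL as lam -> 1
d2 = renyi_divergence(p, q, 2.0)
d4 = renyi_divergence(p, q, 4.0)                 # non-decreasing in lam: d4 >= d2
tv = tv_distance(p, q)                           # Pinsker: tv <= sqrt(kl / 2)
```

The last comment is the KL-based Pinsker inequality; the paper's Proposition 1 below gives the analogous domination for any order λ ≥ 1.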
The Renyi divergence is a generalized divergence defined for orders λ in (1, ∞); it tends to the Kullback–Leibler divergence (denoted d_KL) when λ → 1, and to the maximum divergence when λ → ∞. It also has the very special property of being non-decreasing w.r.t. λ. This divergence is very common in machine learning, especially in its Kullback–Leibler form, which is widely used as the loss function (cross entropy) of classification algorithms. It enjoys the desired properties since it bounds the TV distance and is tractable. Furthermore, Proposition 1 proves that Renyi-robustness implies TV-robustness, making it a suitable surrogate for the trivial distance.

Proposition 1 (Renyi-robustness implies TV-robustness). Let M be a probabilistic mapping. Then for all λ ≥ 1 and ε > 0, there exists ε′ > 0 s.t. if M is d_{R,λ}-(α, ε, γ)-robust then M is d_TV-(α, ε′, γ)-robust.

A crucial property of Renyi-robustness is the data processing inequality, a well-known inequality from information theory which states that "post-processing cannot increase information" [10, 3]. In our case, if we consider a Renyi-robust probabilistic mapping, composing it with a deterministic mapping maintains Renyi-robustness at the same level.

Proposition 2 (Data processing inequality). Let us consider a probabilistic mapping M : X → P(Y), and denote by ρ : Y → Y′ a deterministic function. If U ∼ M(x), then the probability measure M′(x) s.t. ρ(U) ∼ M′(x) defines a probabilistic mapping M′ : X → P(Y′).
For any λ > 1, if M is d_{R,λ}-(α, ε, γ) robust, then M′ is also d_{R,λ}-(α, ε, γ) robust.

The data processing inequality will later allow us to inject additive noise in any layer of a neural network while preserving Renyi-robustness.

4 Defense mechanisms based on Exponential family noise injection

4.1 Robustness through Exponential family noise injection

Until now, the question of which class of noise to add has been treated ad hoc. We choose here to investigate one particular class of noise closely linked to the Renyi divergence, namely Exponential family distributions, and demonstrate their interest. Let us first recall what the Exponential family is.

Definition 3 (Exponential family). Let Θ be an open convex set of R^n, and θ ∈ Θ. A measure ν, dominated by μ (either the Lebesgue or the counting measure), is said to be part of the Exponential family of parameter θ (denoted EF(θ, t, k)) if it has the following p.d.f.:

    p_F(z, θ) = exp{⟨t(z), θ⟩ − u(θ) + k(z)}

where t(z) is a sufficient statistic, k a carrier measure (either for a Lebesgue or a counting measure), and u(θ) = log ∫_z exp{⟨t(z), θ⟩ + k(z)} dz.

To show the robustness of randomized networks with noise injected from the Exponential family, one needs to define the notion of sensitivity of a given deterministic function.

Definition 4 (Sensitivity of a function). For any α ≥ 0 and any two norms ‖.‖_A and ‖.‖_B, the α-sensitivity of f w.r.t. ‖.‖_A and ‖.‖_B is defined as

    Δ_α^{A,B}(f) := sup_{x,y ∈ X, ‖x−y‖_A ≤ α} ‖f(x) − f(y)‖_B.

Let us consider an n-layer feedforward neural network N(.) = φ_n ∘ ... ∘ φ_1(.).
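Definition 4 is easy to instantiate: for a linear map f(x) = Wx with ℓ2 norms on both sides, the α-sensitivity is exactly α·σ_max(W), attained along the top right singular direction. A toy NumPy check (the matrix W and budget α are our own example values, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 6))   # toy linear map f(x) = W x
alpha = 0.1                       # perturbation budget

# For a linear map and l2 norms, the alpha-sensitivity of Definition 4 is exact:
# sup_{||x - y||_2 <= alpha} ||W (x - y)||_2 = alpha * sigma_max(W).
sigma_max = np.linalg.svd(W, compute_uv=False)[0]
exact = alpha * sigma_max

# The supremum is attained along the top right singular direction...
_, _, Vt = np.linalg.svd(W)
achieved = np.linalg.norm(W @ (alpha * Vt[0]))

# ...while random search over perturbations of norm alpha can only lower-bound it.
taus = rng.standard_normal((10_000, 6))
taus = alpha * taus / np.linalg.norm(taus, axis=1, keepdims=True)
estimate = np.linalg.norm(taus @ W.T, axis=1).max()
```

For a general (non-linear) network truncated at layer i, the sensitivity has no such closed form, which is why the theorem below only assumes an upper bound on it.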
For any i ∈ [n], we define N_{|i}(.) = φ_i ∘ ... ∘ φ_1(.), the neural network truncated at layer i. Theorem 1 shows that injecting noise drawn from an Exponential family distribution ensures robustness to adversarial example attacks in the sense of Definition 2.

Theorem 1 (Exponential family ensures robustness). Let us denote N_X^i(.) = φ_n ∘ ... ∘ φ_{i+1}(N_{|i}(.) + X) with X a random variable. Let us also consider two arbitrary norms ‖.‖_A and ‖.‖_B, respectively on X and on the output space of N_{|i}.

• If X ∼ EF(θ, t, k) where t and k have non-decreasing moduli of continuity ω_t and ω_k, then for any α ≥ 0, N_X^i(.) defines a probabilistic mapping that is d_{R,λ}-(α, ε) robust with ε = ‖θ‖_2 ω_t^{B,2}(Δ_α^{A,B}(φ)) + ω_k^{B,1}(Δ_α^{A,B}(φ)), where ‖.‖_2 is the norm corresponding to the scalar product in the definition of the Exponential family density function and ‖.‖_1 is the absolute value on R. The notion of modulus of continuity is defined in the supplementary material.

• If X is a centered Gaussian random variable with a non-degenerate covariance matrix Σ, then for any α ≥ 0, N_X^i(.) defines a probabilistic mapping that is d_{R,λ}-(α, ε) robust with ε = λ Δ_α^{A,2}(φ)^2 / (2 σ_min(Σ)), where ‖.‖_2 is the canonical Euclidean norm on R^n.

In simpler words, the previous theorem ensures stability of the distribution of the network's output when injecting noise: intuitively, if two inputs are close w.r.t. ‖.‖_A, the output distributions of the network will be close in the sense of the Renyi divergence. It is well known that in the case of deterministic neural networks, the Lipschitz constant grows as the number of layers increases [19].
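The Gaussian case of the theorem can be sanity-checked numerically in one dimension: for noise injected directly on the input (so that the relevant sensitivity is the perturbation size itself), the Renyi divergence between the smoothed outputs N(x, σ²) and N(x+τ, σ²) has the classical closed form λτ²/(2σ²), which is exactly the ε above. A NumPy sketch (all constants are our own toy values):

```python
import numpy as np

def gauss_pdf(y, m, sigma):
    return np.exp(-(y - m) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

def renyi_gauss_numeric(m1, m2, sigma, lam):
    """d_{R,lam}(N(m1, s^2), N(m2, s^2)) by trapezoidal integration on a grid."""
    y = np.linspace(min(m1, m2) - 12 * sigma, max(m1, m2) + 12 * sigma, 400_001)
    f = gauss_pdf(y, m2, sigma) * (gauss_pdf(y, m1, sigma) / gauss_pdf(y, m2, sigma)) ** lam
    integral = (0.5 * (f[:-1] + f[1:]) * np.diff(y)).sum()
    return np.log(integral) / (lam - 1)

sigma, lam, tau = 0.5, 2.0, 0.3               # noise level, Renyi order, perturbation size
epsilon = lam * tau ** 2 / (2 * sigma ** 2)   # Theorem 1 (Gaussian case) with Delta = tau
numeric = renyi_gauss_numeric(0.0, tau, sigma, lam)
```

The two values agree to numerical precision, which is the stability property the theorem asserts for this simplest setting.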
By injecting noise at layer i, the notion of robustness only depends on the sensitivity of the first i layers of the network and not on the following ones. In that sense, randomization provides a more precise control on the "continuity" of the neural network. In the next section, we show that, thanks to the notion of robustness w.r.t. probabilistic mappings, one can bound the loss of accuracy of a randomized neural network when it is attacked.

4.2 Bound on the risk gap under attack and certified accuracy

The notions of risk and adversarial risk can easily be generalized to encompass probabilistic mappings.

Definition 5 (Risks for probabilistic mappings). Let M be a probabilistic mapping from X to Y. The risk and the α-radius adversarial risk of M w.r.t. D are defined as

    Risk(M) := E_{(x,y)∼D} [ E_{y′∼M(x)} [1(y′ ≠ y)] ],

    Risk_α(M) := E_{(x,y)∼D} [ sup_{‖τ‖_X ≤ α} E_{y′∼M(x+τ)} [1(y′ ≠ y)] ].

The definition of adversarial risk for a probabilistic mapping can be matched with the concept of Expectation over Transformation (EoT) attacks [1]. Indeed, EoT attacks aim at computing the best opponent in expectation for a given random transformation. In the adversarial risk definition, the adversary chooses the perturbation which has the greatest probability of fooling the model, which is a stronger objective than the EoT objective. Theorem 2 provides a bound on the gap between the adversarial risk and the regular risk.

Theorem 2 (Adversarial risk gap bound in the randomized setting). Let M be the probabilistic mapping at hand.
Let us suppose that M is d_{R,λ}-(α, ε) robust for some λ ≥ 1. Then

    | Risk_α(M) − Risk(M) | ≤ 1 − e^{−ε} E_x [ e^{−H(M(x))} ]

where H is the Shannon entropy, H(p) = −Σ_i p_i log(p_i).

This theorem gives a control on the loss of accuracy under attack w.r.t. the robustness parameter ε and the entropy of the predictor. It provides a trade-off between the quantity of noise added in the network and the accuracy under attack. Intuitively, when the noise increases, for any input, the output distribution tends towards the uniform distribution; then ε → 0 and H(M(x)) → log(K), and the risk and the adversarial risk both tend to 1/K, where K is the number of classes in the classification problem. Conversely, if no noise is injected, for any input, the output distribution is a Dirac distribution; then, if the prediction for the adversarial example is not the same as for the regular one, ε → ∞ and H(M(x)) → 0. Hence, the noise needs to be designed to preserve both accuracy and robustness to adversarial attacks. In Section 5, we give an illustration of this bound when M is a neural network with noise injected at the input level, as presented in Theorem 1. In practice, we do not have access to the real value of the entropy, but we estimate it with classical estimators [33].

Our framework is general enough to encompass several known accuracy certificates from the literature, e.g. the one provided in [24]. Interestingly, we can introduce the following one, based on our definition of robustness.

Theorem 3. Let x ∈ X, and M be a probabilistic mapping with values in R^K. If M is d_{R,λ}-(α, ε) robust, and if there exist k* and δ* ∈ (0, 1) s.t.
E_{y∼M(x)}[y_{k*}] > e^{2ε′} max_{i≠k*} E_{y∼M(x)}[y_i] + (1 + e^{ε′}) δ*, with ε′ = ε + log(1/δ*)/(λ−1), then for the classifier f : x ↦ argmax_{k∈[K]} E_{y∼M(x)}[y_k] there is no perturbation τ ∈ B(α) such that f(x) ≠ f(x + τ).

As the main focus of this work is to give theoretical evidence for randomization techniques, the numerical experiments mainly focus on Theorems 1 and 2 and not on certificates (Theorem 3).

5 Numerical experiments

To illustrate our theoretical findings, we train randomized neural networks with a simple method which consists in injecting a noise drawn from an Exponential family distribution into the image during training and inference. This section aims to answer Q2 stated in the introduction, by tackling the following sub-questions:

Q2.1: How does the randomization impact the accuracy of the network? And how does the theoretical trade-off between accuracy and robustness apply in practice?

Q2.2: What is the accuracy under attack of randomized neural networks against powerful iterative attacks? And how do randomized neural networks compare to state-of-the-art defenses given the intensity of the injected noise?

5.1 Experimental setup

We present our results and analysis on the CIFAR-10, CIFAR-100 [22] and ImageNet [11] datasets. For CIFAR-10 and CIFAR-100, we use a Wide ResNet architecture [45], a variant of the ResNet model from [21]. We use 28 layers with a widen factor of 10.
We train all networks for 200 epochs with a batch size of 400, dropout 0.3, and Leaky ReLU activations with a slope of 0.1 on R^−. We minimize the cross-entropy loss with momentum 0.9 and use a piecewise constant learning rate starting at 0.1 and decreased to 0.02, 0.004 and 0.00008 after respectively 7500, 15000 and 20000 steps. The networks achieve a TOP-1 accuracy of 95.8% on CIFAR-10 and 79.1% on CIFAR-100 on test images. For ImageNet [11], we use an Inception ResNet v2 [41], which is the state-of-the-art architecture for this dataset, and achieve a TOP-1 accuracy of 80%. For the training on ImageNet, we use the same hyper-parameter setting as the original implementation. We train the network for 120 epochs with a batch size of 256, dropout 0.8 and ReLU as the activation function. All evaluations were done with a single crop on the non-blacklisted subset of the validation set.

To transform these classical networks into probabilistic mappings, we inject noise drawn from Laplace and Gaussian distributions, each with various standard deviations. While the noise could theoretically be injected anywhere in the network, we inject it on the image for simplicity. More experiments with noise injected in the first layer of the network are presented in the supplementary material.

To evaluate our models under attack, we use three powerful iterative attacks with different norms: the ElasticNet attack (EAD) [8] with ℓ1 distortion, the Carlini & Wagner attack (C&W) [6] with ℓ2 distortion, and the Projected Gradient Descent attack (PGD) [27] with ℓ∞ distortion.

Figure 1: (a) Impact of the standard deviation of the injected noise on accuracy in a randomized model on CIFAR-10 with a Wide ResNet architecture. (b) and (c): illustration of the guaranteed accuracy of different randomized models with Gaussian (b) and Laplace (c) noise, given the norm of the adversarial perturbation. The accuracies and entropies are estimated empirically.
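The probabilistic mapping used in these experiments, together with the entropy term of Theorem 2, is estimated by Monte Carlo. The procedure can be sketched on a toy stand-in model as follows (NumPy; the 3-class linear-softmax "network" and all constants are our own illustrative choices, not the Wide ResNet setup, and the expectation over x in Theorem 2 is reduced to a single input):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Toy deterministic "network": a linear map followed by softmax (3 classes, 8 features).
W = rng.standard_normal((3, 8))

def randomized_predict(x, sigma, dist="gauss", n_samples=2000):
    """Monte Carlo estimate of the output distribution M(x) of the probabilistic
    mapping obtained by injecting noise on the input."""
    if dist == "gauss":
        noise = rng.normal(0.0, sigma, size=(n_samples, x.size))
    else:  # "laplace"
        noise = rng.laplace(0.0, sigma, size=(n_samples, x.size))
    votes = softmax((x + noise) @ W.T).argmax(axis=1)
    return np.bincount(votes, minlength=W.shape[0]) / n_samples

x = rng.standard_normal(8)
mx = randomized_predict(x, sigma=0.1)

# Plug-in estimate of the entropy term and of the risk-gap bound of Theorem 2,
# with epsilon taken from Theorem 1 (Gaussian case, noise on the input, Delta = alpha).
lam, alpha, sigma = 2.0, 0.05, 0.1
epsilon = lam * alpha ** 2 / (2 * sigma ** 2)
entropy = -np.sum(mx[mx > 0] * np.log(mx[mx > 0]))
gap_bound = 1 - np.exp(-epsilon) * np.exp(-entropy)
```

As in the discussion of Theorem 2, larger noise pushes mx towards the uniform distribution (larger entropy, smaller ε), and the bound tightens for small perturbations at the price of accuracy.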
All standard deviations and attack intensities are given for images with pixel values scaled between −1 and 1. Precise descriptions of our numerical experiments and of the attacks used for evaluation are deferred to the supplementary material.

Attacks against randomized defenses: It has been pointed out by [2, 5] that in a white-box setting, an attacker with complete knowledge of the system will know the distribution of the noise injected in the network. As such, to create a stronger adversarial example, the attacker can take the expectation of the loss or of the logits of the randomized network during the computation of the attack. This technique is called Expectation over Transformation (EoT), and we use a Monte Carlo method with 80 simulations to approximate the best perturbation for a randomized network.

5.2 Experimental results

Trade-off between accuracy and intensity of noise (Q2.1): When injecting noise as a defense mechanism, regardless of the distribution it is drawn from, we observe (as in Figure 1(a)) that the accuracy decreases when the noise intensity grows. In that sense, noise needs to be calibrated to preserve both accuracy and robustness against adversarial attacks, i.e. it needs to be large enough to preserve robustness and small enough to preserve accuracy. Figure 1(a) shows the loss of accuracy on CIFAR-10 from 0.95 to 0.82 (respectively 0.95 to 0.84) with noise drawn from a Gaussian distribution (respectively Laplace) with a standard deviation from 0.01 to 0.5. Figures 1(b) and 1(c) illustrate the theoretical lower bound on accuracy under attack of Theorem 2 for different distributions and standard deviations. The entropy term of Theorem 2 has been estimated using a Monte Carlo method with 10^4 simulations. The trade-off between accuracy and robustness from Theorem 2 thus appears w.r.t. the noise intensity. With small noise, the accuracy is high, but the guaranteed accuracy drops fast w.r.t. the magnitude of the adversarial perturbation.
Conversely, with bigger noise, the accuracy is lower but decreases slowly w.r.t. the magnitude of the adversarial perturbation. These figures also show that Theorem 2 gives strong accuracy guarantees against small adversarial perturbations. The next paragraph shows that, in practice, randomized networks achieve much higher accuracy under attack than the theoretical bound, and against much larger perturbations.

Table 1: Accuracy under attack on the CIFAR-10 dataset with a randomized Wide ResNet architecture. We compare the accuracy on natural images and under attack with different noises over 3 iterative attacks (the number of steps is next to the name), each computed with 80 Monte Carlo simulations for the EoT attack. The first line is the baseline; no noise has been injected.

Distribution  Sd    Natural  ℓ1 – EAD 60  ℓ2 – C&W 60  ℓ∞ – PGD 20
-             -     0.958    0.035        0.034        0.384
Normal        0.01  0.954    0.193        0.294        0.408
Normal        0.50  0.824    0.448        0.523        0.587
Laplace       0.01  0.955    0.208        0.313        0.389
Laplace       0.50  0.846    0.464        0.494        0.589

Table 2: Accuracy under attack of randomized neural networks with different distributions and standard deviations versus adversarial training by Madry et al. [27]. The PGD attack was run with 20 steps, an epsilon of 0.06 and a step size of 0.006 (input space between −1 and +1).
The Carlini&Wagner attack uses 30 steps, 9 binary search steps and a 0.01 learning rate. The first line refers to the baseline without attack.

Attack    Steps  Madry et al. [27]  Normal 0.32  Laplace 0.32  Normal 0.5  Laplace 0.5
-         -      0.873              0.876        0.891         0.824       0.846
ℓ∞ – PGD  20     0.456              0.566        0.576         0.587       0.589
ℓ2 – C&W  30     0.468              0.512        0.502         0.489       0.479

Performance of randomized networks under attack and comparison to the state of the art (Q2.2): While Figures 1(b) and 1(c) illustrate a theoretical robustness against growing adversarial perturbations, Table 1 illustrates this trade-off experimentally. It compares the accuracy under attack of a deterministic network with that of randomized networks with Gaussian and Laplace noise, both with low (0.01) and high (0.5) standard deviations. Randomized networks with small noise suffer no loss in accuracy but gain little robustness, while high noise leads to higher robustness at the expense of accuracy (∼11 points). Table 2 compares the accuracy and the accuracy under attack of randomized networks with Gaussian and Laplace distributions and different standard deviations against adversarial training [27]. We observe that the accuracy on natural images of both noise injection methods is similar to that of [27]. Moreover, both methods are more robust than adversarial training against the PGD and C&W attacks. In all experiments, to construct an EoT attack, we use 80 Monte Carlo simulations at every step of the attack. These experiments show that randomized defenses can be competitive given the intensity of the noise injected in the network. Note that these experiments were run with an EoT of size 80; for much bigger EoT sizes, these results would be mitigated.
Nevertheless, the accuracy would never drop below the bounds illustrated in Figures 1(b) and 1(c), since Theorem 2 gives a bound on the worst-case attack strategy (including EoT).

6 Conclusion and future work

This paper brings new contributions to the field of provable defenses against adversarial attacks. Principled answers have been provided to key questions on the interest of randomization techniques and on their loss of accuracy under attack. The obtained bounds have been illustrated in practice by conducting thorough experiments on baseline datasets such as CIFAR and ImageNet. We show in particular that a simple method based on injecting noise drawn from the Exponential family is competitive compared to baseline approaches while leading to provable guarantees. Future work will focus on investigating other noise distributions, belonging or not to the Exponential family, on combining randomization with more sophisticated defenses, and on devising new tight bounds on the adversarial risk gap.
Acknowledgements: This work was granted access to the OpenPOWER prototype from GENCI-IDRIS under the Preparatory Access AP010610510 made by GENCI. R. Pinot benefited from a JSPS Summer Program Fellowship during this work (Grant number SP18218). L. Meunier and J. Atif would also like to thank Adrien Balp from Société Générale for his support.

References

[1] A. Athalye, N. Carlini, and D. Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In J. Dy and A. Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 274–283, Stockholmsmässan, Stockholm, Sweden, 10–15 Jul 2018. PMLR.

[2] A. Athalye, L. Engstrom, A. Ilyas, and K. Kwok. Synthesizing robust adversarial examples. In J. Dy and A.
Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 284–293, Stockholmsmässan, Stockholm, Sweden, 10–15 Jul 2018. PMLR.

[3] N. J. Beaudry and R. Renner. An intuitive proof of the data processing inequality. Quantum Info. Comput., 12(5-6):432–441, May 2012.

[4] A. Ben-Tal, L. El Ghaoui, and A. Nemirovski. Robust optimization, volume 28. Princeton University Press, 2009.

[5] N. Carlini, A. Athalye, N. Papernot, W. Brendel, J. Rauber, D. Tsipras, I. Goodfellow, and A. Madry. On evaluating adversarial robustness. arXiv preprint arXiv:1902.06705, 2019.

[6] N. Carlini and D. Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pages 39–57. IEEE, 2017.

[7] F. Chapeau-Blondeau and D. Rousseau. Noise-enhanced performance for an optimal Bayesian estimator. IEEE Transactions on Signal Processing, 52(5):1327–1334, 2004.

[8] P.-Y. Chen, Y. Sharma, H. Zhang, J. Yi, and C.-J. Hsieh. EAD: Elastic-net attacks to deep neural networks via adversarial examples. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

[9] J. M. Cohen, E. Rosenfeld, and J. Z. Kolter. Certified adversarial robustness via randomized smoothing. CoRR, abs/1902.02918, 2019.

[10] T. M. Cover and J. A. Thomas. Elements of information theory. John Wiley & Sons, 2012.

[11] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR09, 2009.

[12] G. S. Dhillon, K. Azizzadenesheli, J. D. Bernstein, J. Kossaifi, A. Khanna, Z. C. Lipton, and A. Anandkumar. Stochastic activation pruning for robust adversarial defense. In International Conference on Learning Representations, 2018.

[13] D. Diochnos, S. Mahloujifar, and M. Mahmoody.
Adversarial risk and robustness: General definitions and implications for the uniform distribution. In Advances in Neural Information Processing Systems, pages 10380–10389, 2018.

[14] C. Dwork, A. Roth, et al. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science, 9(3–4):211–407, 2014.

[15] A. Fawzi, H. Fawzi, and O. Fawzi. Adversarial vulnerability for any classifier. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 1186–1195. Curran Associates, Inc., 2018.

[16] A. Fawzi, S.-M. Moosavi-Dezfooli, and P. Frossard. Robustness of classifiers: from adversarial to random noise. In Advances in Neural Information Processing Systems, pages 1632–1640, 2016.

[17] A. Fawzi, S.-M. Moosavi-Dezfooli, P. Frossard, and S. Soatto. Empirical study of the topology and geometry of deep networks. In IEEE CVPR, 2018.

[18] I. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations, 2015.

[19] H. Gouk, E. Frank, B. Pfahringer, and M. Cree. Regularisation of neural networks by enforcing Lipschitz continuity. arXiv preprint arXiv:1804.04368, 2018.

[20] C. Guo, M. Rana, M. Cisse, and L. van der Maaten. Countering adversarial images using input transformations. In International Conference on Learning Representations, 2018.

[21] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.

[22] A. Krizhevsky and G. Hinton. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.

[23] A. Kurakin, I. Goodfellow, and S. Bengio. Adversarial examples in the physical world.
arXiv preprint arXiv:1607.02533, 2016.

[24] M. Lecuyer, V. Atlidakis, R. Geambasu, D. Hsu, and S. Jana. Certified robustness to adversarial examples with differential privacy. In 2019 IEEE Symposium on Security and Privacy (SP), pages 727–743, 2019.

[25] B. Li, C. Chen, W. Wang, and L. Carin. Second-order adversarial attack and certifiable robustness. CoRR, abs/1809.03113, 2018.

[26] X. Liu, M. Cheng, H. Zhang, and C.-J. Hsieh. Towards robust neural networks via random self-ensemble. In European Conference on Computer Vision, pages 381–397. Springer, 2018.

[27] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018.

[28] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018.

[29] D. Meng and H. Chen. MagNet: a two-pronged defense against adversarial examples. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pages 135–147. ACM, 2017.

[30] S. Mitaim and B. Kosko. Adaptive stochastic resonance. Proceedings of the IEEE, 86(11):2152–2183, 1998.

[31] S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard. Universal adversarial perturbations. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 86–94. IEEE, 2017.

[32] S.-M. Moosavi-Dezfooli, A. Fawzi, and P. Frossard. DeepFool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2574–2582, 2016.

[33] L. Paninski. Estimation of entropy and mutual information. Neural Computation, 15(6):1191–1253, 2003.

[34] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami.
The limitations of deep learning in adversarial settings. In Security and Privacy (EuroS&P), 2016 IEEE European Symposium on, pages 372–387. IEEE, 2016.

[35] N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In 2016 IEEE Symposium on Security and Privacy (SP), pages 582–597. IEEE, 2016.

[36] L. Perez and J. Wang. The effectiveness of data augmentation in image classification using deep learning. arXiv preprint arXiv:1712.04621, 2017.

[37] A. S. Rakin, Z. He, and D. Fan. Parametric noise injection: Trainable randomness to improve deep neural network robustness against adversarial attack. arXiv preprint arXiv:1811.09310, 2018.

[38] A. Rényi. On measures of entropy and information. Technical report, Hungarian Academy of Sciences, Budapest, Hungary, 1961.

[39] P. Samangouei, M. Kabkab, and R. Chellappa. Defense-GAN: Protecting classifiers against adversarial attacks using generative models. In International Conference on Learning Representations, 2018.

[40] C.-J. Simon-Gabriel, Y. Ollivier, B. Schölkopf, L. Bottou, and D. Lopez-Paz. Adversarial vulnerability of neural networks increases with input dimension. arXiv preprint arXiv:1802.01421, 2018.

[41] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In Thirty-First AAAI Conference on Artificial Intelligence, 2017.

[42] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations, 2014.

[43] C. Villani. Optimal transport: old and new, volume 338. Springer Science & Business Media, 2008.

[44] C. Xie, J. Wang, Z. Zhang, Z. Ren, and A. Yuille. Mitigating adversarial effects through randomization.
In International Conference on Learning Representations, 2018.

[45] S. Zagoruyko and N. Komodakis. Wide residual networks. In E. R. Hancock, R. C. Wilson, and W. A. P. Smith, editors, Proceedings of the British Machine Vision Conference (BMVC), pages 87.1–87.12. BMVA Press, September 2016.

[46] S. Zozor and P.-O. Amblard. Stochastic resonance in discrete time nonlinear AR(1) models. IEEE Transactions on Signal Processing, 47(1):108–122, 1999.