{"title": "Partially Encrypted Deep Learning using Functional Encryption", "book": "Advances in Neural Information Processing Systems", "page_first": 4517, "page_last": 4528, "abstract": "Machine learning on encrypted data has received a lot of attention thanks to recent breakthroughs in homomorphic encryption and secure multi-party computation. It allows outsourcing computation to untrusted servers without sacrificing privacy of sensitive data. We propose a practical framework to perform partially encrypted and privacy-preserving predictions which combines adversarial training and functional encryption. We first present a new functional encryption scheme to efficiently compute quadratic functions so that the data owner controls what can be computed but is not involved in the calculation: it provides a decryption key which allows one to learn a specific function evaluation of some encrypted data. We then show how to use it in machine learning to partially encrypt neural networks with quadratic activation functions at evaluation time and we provide a thorough analysis of the information leaks based on indistinguishability of data items of the same label. Last, since several encryption schemes cannot deal with the last thresholding operation used for classification, we propose a training method to prevent selected sensitive features from leaking which adversarially optimizes the network against an adversary trying to identify these features. 
This is of great interest for several existing works using partially encrypted machine learning as it comes with almost no cost on the model's accuracy and significantly improves data privacy.", "full_text": "Partially Encrypted Machine Learning\n\nusing Functional Encryption\n\nTh\u00e9o Ryffel1, 2, Edouard Dufour-Sans1, Romain Gay1,3,\n\nFrancis Bach2, 1 and David Pointcheval1, 2\n\n1D\u00e9partement d\u2019informatique de l\u2019ENS, ENS, CNRS, PSL University, Paris, France\n\n2INRIA, Paris, France\n\n3University of California, Berkeley\n\n{theo.ryffel,edufoursans,romain.gay,francis.bach,david.pointcheval}@ens.fr\n\nAbstract\n\nMachine learning on encrypted data has received a lot of attention thanks to recent\nbreakthroughs in homomorphic encryption and secure multi-party computation. It\nallows outsourcing computation to untrusted servers without sacri\ufb01cing privacy of\nsensitive data. We propose a practical framework to perform partially encrypted and\nprivacy-preserving predictions which combines adversarial training and functional\nencryption. We \ufb01rst present a new functional encryption scheme to ef\ufb01ciently\ncompute quadratic functions so that the data owner controls what can be computed\nbut is not involved in the calculation: it provides a decryption key which allows one\nto learn a speci\ufb01c function evaluation of some encrypted data. We then show how\nto use it in machine learning to partially encrypt neural networks with quadratic\nactivation functions at evaluation time, and we provide a thorough analysis of\nthe information leaks based on indistinguishability of data items of the same\nlabel. Last, since most encryption schemes cannot deal with the last thresholding\noperation used for classi\ufb01cation, we propose a training method to prevent selected\nsensitive features from leaking, which adversarially optimizes the network against\nan adversary trying to identify these features. 
This is interesting for several existing works using partially encrypted machine learning, as it comes with little reduction in the model's accuracy and significantly improves data privacy.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

1 Introduction

As both public opinion and regulators become increasingly aware of data privacy issues, the area of privacy-preserving machine learning has emerged with the aim of reshaping the way machine learning deals with private data. Breakthroughs in fully homomorphic encryption (FHE) [15, 18] and secure multi-party computation (SMPC) [19, 39] have made computation on encrypted data practical, and implementations of neural networks for encrypted predictions have flourished [34-36, 8, 13]. However, these protocols require the data owner who encrypts the inputs and the parties performing the computations to interact in order to obtain decrypted results. This is undesirable in some settings, such as spam filtering, where the email recipient should not need to be online for the email server to classify incoming email as spam or not. Functional encryption (FE) [12, 32], in contrast, needs no interaction to compute over encrypted data: it allows users to receive specific functional evaluations of encrypted data in plaintext. For a function f, a functional decryption key can be generated such that, given any ciphertext with underlying plaintext x, a user can use this key to obtain f(x) without learning x or any other information than f(x). It stands in between traditional public-key encryption, where data can only be directly revealed, and FHE, where data can be manipulated but cannot be revealed: it allows users to tightly control what is disclosed about their data.

1.1 Use cases

Spam filtering. 
Consider the following scenario: Alice uses a secure email protocol which makes use of functional encryption. Bob uses Alice's public key to send her an email, which lands on the server of Alice's email provider. Alice has given the server keys that enable it to process the email and take a predefined set of appropriate actions without her being online. The server could learn how urgent the email is and decide accordingly whether to alert Alice. It could also detect whether the message is spam and store it in the spam box right away.

Privacy-preserving enforcement of content policies. Another use case is to enable platforms, such as messaging apps, to maintain user privacy through end-to-end encryption while filtering out content that is illegal or does not adhere to the site's policies regarding, for instance, abusive speech or explicit images.

These applications are not currently feasible within a reasonable computing time, as the construction of FE for all kinds of circuits is essentially equivalent to indistinguishability obfuscation [7, 21], concrete instances of which have been shown insecure, let alone efficient. However, there exist practical FE schemes for the inner-product functionality [1, 2] and, more recently, for quadratic computations [6], which are usable for practical applications.

1.2 Our contributions

We introduce a new FE scheme to compute quadratic forms which outperforms that of Baltico et al. [6] in terms of complexity, and provide an efficient implementation of this scheme. We show how to use it to build privacy-preserving neural networks which perform well on simple image classification problems. 
Specifically, we show that the first layers of a polynomial network can be run on encrypted inputs using this quadratic scheme.

In addition, we present an adversarial training technique applied to these first layers to improve privacy, so that their output, which is in plaintext, cannot be used by adversaries to recover specific sensitive information at test time. This adversarial procedure is generic for semi-encrypted neural networks and aims at reducing information leakage, as the decrypted output is not directly the classification result but an intermediate layer (i.e., the neuron outputs of the neural network before thresholding). This has been overlooked in other popular encrypted classification schemes (even in FHE-based constructions like [20] and [15]), where the argmax operation used to select the class label is performed in the clear, as it is either not possible with FE, or quite inefficient with FHE and SMPC.

We demonstrate the practicality of our approach using a dataset inspired by MNIST [27], made of images of digits written in two different fonts. We show how to classify the encrypted digit images in less than 3 seconds with over 97.7% accuracy, while making font prediction a hard task for a whole set of adversaries.

This paper builds on a preliminary version available on the Cryptology ePrint Archive at eprint.iacr.org/2018/206. All code and implementations can be found online at github.com/LaRiffle/collateral-learning and github.com/edufoursans/reading-in-the-dark.

2 Background Knowledge

2.1 Quadratic and Polynomial Neural Networks

Polynomial neural networks are a class of networks which only use linear elements (fully connected linear layers, convolutions with average pooling) and model activation functions with polynomial approximations, when not simply the square function. 
Despite these simplifications, they have proved satisfactorily accurate for relatively simple tasks ([20] learns on MNIST and [5] on CIFAR10 [26]). The simplicity of the operations they build on guarantees good efficiency, especially for gradient computations, and works like [28] have shown that they can achieve convergence rates similar to those of networks with non-linear activations.

In particular, they have been used for several early-stage implementations in cryptography [20, 18, 14] to demonstrate the usability of new protocols for machine learning. However, the argmax or other thresholding function present at the end of a classifier network to select the class among the output neurons cannot be conveniently handled, so several protocol implementations (among which ours) run polynomial networks on encrypted inputs but take the argmax over the decrypted output of the network. This results in potential information leakage which could be maliciously exploited.

2.2 Functional Encryption

Functional encryption extends the notion of public-key encryption, where one uses a public key pk and a secret key sk to respectively encrypt and decrypt data. More precisely, pk is still used to encrypt data, but for a given function f, sk can be used to derive a functional decryption key dkf which is shared with users so that, given a ciphertext of x, they can decrypt f(x) but not x. In particular, someone having access to dkf cannot learn anything about x other than f(x). Note also that functions cannot be composed, since decryption happens within the function evaluation. Hence, only single quadratic functions can be securely evaluated. A formal definition of functional encryption is provided in Appendix A.1.

Perfect correctness. 
Perfect correctness is achieved in functional encryption: ∀x ∈ X, f ∈ F, Pr[Dec(dkf, ct) = f(x)] = 1, where dkf ← KeyGen(msk, f) and ct ← Enc(pk, x). Note that this property is a very strict condition, which is not satisfied by existing fully homomorphic encryption (FHE) schemes such as [16, 22].

2.3 Indistinguishability and security

To assess the security of our framework, we first consider the security of the FE scheme itself and make sure that nothing can be learned beyond what the function is supposed to output given an encryption of x. Second, we analyze how sensitive the output f(x) is with respect to the private input x. For both studies, we rely on indistinguishability [23], a classical security notion which can be summed up in the following game: an adversary provides two input items to the challenger (here our FE algorithm), and the challenger chooses one item, encrypts it and returns the output. The adversary should not be able to detect which input was used. This is known as IND-CPA security in cryptography and a formal definition can be found in Appendix A.2.

We first prove that our quadratic FE scheme achieves IND-CPA security; then, we use a relaxed version of indistinguishability to measure the sensitivity of the FE output. More precisely, we make the hypothesis that our input data can be used to predict public labels but also sensitive private ones, respectively ypub and ypriv. Our quadratic FE scheme q aims at predicting ypub, whereas an adversary would like to infer ypriv. In this case, the security game consists of the adversary providing two inputs (x0, x1) labelled with the same ypub but a different ypriv, and then trying to distinguish which one was selected by the challenger, given its output q(xb), b ∈ {0, 1}. 
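This relaxed game can be simulated directly. The sketch below is our own illustration (the toy function q, the data pairs and the adversary are not from the paper): it estimates an adversary's advantage at guessing which of two same-ypub inputs was used, from the revealed output alone.

```python
import random

def advantage(q, adversary, pairs, trials=1000):
    """Estimate the adversary's advantage in the relaxed game: for pairs
    (x0, x1) sharing ypub but differing in ypriv, the challenger reveals
    q(xb) for a random bit b and the adversary guesses b."""
    wins = 0
    for _ in range(trials):
        x0, x1 = random.choice(pairs)
        b = random.randint(0, 1)
        guess = adversary(q(x1) if b else q(x0), x0, x1)
        wins += guess == b
    return abs(wins / trials - 0.5)

# Toy example (ours): q leaks the first coordinate, which here encodes the
# private label, so a trivial adversary reaches the maximal advantage 0.5.
q = lambda x: x[0]
pairs = [((0, 0), (1, 0)), ((0, 1), (1, 1))]
adv = lambda out, x0, x1: 0 if out == q(x0) else 1
adv_estimate = advantage(q, adv, pairs)
```

A leakage-free q would leave any adversary with advantage close to 0.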
One way to do this is to measure the ability of an adversary to predict ypriv for items which all belong to the same ypub class.

In particular, note that we do not consider approaches based on input reconstruction (as done by [17]), because in many cases the adversary is not interested in reconstructing the whole input, but rather wants to gain insight into specific characteristics.

Another way to see this problem is that we want the sensitive label ypriv to be independent of the decrypted output q(x) (which is a proxy for the prediction), given the true public label ypub. This independence notion is known as separation and is used as a fairness criterion in [9] when the sensitive features could be misused for discrimination.

3 Our Context for Private Inference

3.1 Classifying in two directions

We are interested in specific types of datasets (x⃗i)i=1,...,n which have public labels ypub but also private ones ypriv. Moreover, these different types of labels should be entangled, meaning that they should not be easily separable, unlike, say, the color and the shape of an object in an image, which can be simply separated. For example, in the spam filtering use case mentioned above, ypub would be a spam flag, and ypriv would be some marketing information highlighting areas of interest of the email recipient, like technology, culture, etc. In addition, to simplify our analysis, we assume that classes are balanced for all types of labels, and that labels are independent of each other given the input: ∀x⃗, P(ypub, ypriv | x⃗) = P(ypub | x⃗) P(ypriv | x⃗). To illustrate our approach in the case of image recognition, we propose a synthetic dataset inspired by MNIST which consists of 60,000 grayscale images of 28 × 28 pixels representing digits drawn in two fonts with some distortion, as shown in Figure 1. 
Here, the public label ypub is the digit on the image and the private one ypriv is the font used to draw it.

Figure 1: Artificial dataset inspired by MNIST with two types of labels.

We define two tasks: a main task which tries to predict ypub using a partially-encrypted polynomial neural network with functional encryption, and a collateral task which is performed by an adversary who tries to leverage the output of the FE-encrypted network at test time to predict ypriv. Our goal is to perform the main task with high accuracy while making the collateral one as bad as random predictions. In terms of indistinguishability, given a dataset where the same digit is drawn, it should be infeasible to detect which font was used.

3.2 Equivalence with a Quadratic Functional Encryption scheme

We now introduce our new framework for quadratic functional encryption and show that it can be used to partially encrypt a polynomial network.

3.2.1 Functional Encryption for Quadratic Polynomials

We build an efficient FE scheme for the set of quadratic functions defined as F_{n,Bx,By,Bq} ⊂ {q : [−Bx, Bx]^n × [−By, By]^n → Z}, where q is described by a set of bounded coefficients {q_{i,j} ∈ [−Bq, Bq]}_{i,j∈[n]} and, for all vectors (x⃗, y⃗), we have q(x⃗, y⃗) = Σ_{i,j∈[n]} q_{i,j} x_i y_j.

A complete version of our scheme is given in Figure 2, but here are the main ideas and notations. First note that we use bilinear groups, i.e., a set of prime-order groups G1, G2 and GT together with a bilinear map e : G1 × G2 → GT, called a pairing, which satisfies e(g1^a, g2^b) = e(g1, g2)^(ab) for any exponents a, b ∈ Z: one can compute quadratic polynomials in the exponent. Here, g1, g2 are generators of G1 and G2, and gT := e(g1, g2) is a generator of the target group GT. 
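To see why a pairing lets one evaluate quadratic polynomials "in the exponent", here is a deliberately insecure toy model of our own (group elements are represented directly by their exponents mod p, so discrete logarithms are trivial); it only checks the algebra e(g1^a, g2^b) = gT^(ab).

```python
# Toy pairing model (NOT secure, for illustration only): we track the
# exponent of each group element instead of a real group element, so
# e(g1^a, g2^b) = gT^(a*b) becomes plain modular multiplication.
p = 2**61 - 1  # prime modulus, chosen arbitrarily for the toy

def pairing(a, b):  # exponent of e(g1^a, g2^b) in base gT
    return (a * b) % p

def quadratic_in_exponent(qs, xs, ys):
    """Exponent of prod_{i,j} e(g1^{x_i}, g2^{y_j})^{q_ij}, i.e., q(x, y)
    = sum_{i,j} q_ij * x_i * y_j computed only through pairings."""
    acc = 0
    for i, row in enumerate(qs):
        for j, qij in enumerate(row):
            acc = (acc + qij * pairing(xs[i], ys[j])) % p
    return acc

xs, ys = [1, 2], [3, 4]
qs = [[1, 0], [2, 5]]
direct = sum(qs[i][j] * xs[i] * ys[j] for i in range(2) for j in range(2)) % p
```

In the real scheme the exponents are hidden inside group elements, so this computation reveals gT^(q(x,y)) rather than the inputs themselves.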
A pair of vectors (s⃗, t⃗) is first selected and constitutes the private key msk, while the public key is (g1^s⃗, g2^t⃗). Encrypting (x⃗, y⃗) roughly consists of masking g1^x⃗ with g1^s⃗ and g2^y⃗ with g2^t⃗, which allows any user to compute gT^(q(x⃗,y⃗)−q(s⃗,t⃗)) for any quadratic function q, using the pairing. The functional decryption key for a specific q is g2^(q(s⃗,t⃗)), which allows one to get gT^(q(x⃗,y⃗)). Last, taking the discrete logarithm gives access to q(x⃗, y⃗) (the discrete logarithm for small exponents is easy). Security uses the fact that it is hard to compute msk from pk (the discrete logarithm for the large exponents s⃗, t⃗ is hard to compute). More details are given in Appendix B.¹

Theorem 3.1 (Security, correctness and complexity) The FE scheme provided in Figure 2:
• is IND-CPA secure in the Generic Bilinear Group Model,
• verifies log(out) = q(x⃗, y⃗) and satisfies perfect correctness,
• has an overall decryption complexity of 2n²(E + P) + P + D,
where E, P and D respectively denote exponentiation, pairing and discrete logarithm complexities.

Our scheme outperforms previous schemes for quadratic FE with the same security assumption, like the one from [6, Sec. 4], which achieves 3n²(E + P) + 2P + D complexity and uses larger ciphertexts and decryption keys. Note that the efficiency of decryption can be further optimized for the quadratic polynomials that are relevant to our application (see Section 3.2.2).

¹Note that we only present a simplified scheme here. 
In particular, the actual encryption is randomized, which is necessary to achieve IND-CPA security.

SetUp(1^λ, F_{n,Bx,By,Bq}): PG := (G1, G2, p, g1, g2, e) ← GGen(1^λ), s⃗, t⃗ ←$ Z_p^n, msk := (s⃗, t⃗), pk := (PG, g1^s⃗, g2^t⃗). Return (pk, msk).

Enc(pk, (x⃗, y⃗)): γ ←$ Z_p, W ←$ GL2, and for all i ∈ [n], a⃗_i := (W^(−1))^T (x_i, γs_i)^T, b⃗_i := W (y_i, −t_i)^T.
Return ct := (g1^γ, {g1^(a⃗_i), g2^(b⃗_i)}_{i∈[n]}) ∈ G1 × (G1² × G2²)^n.

KeyGen(msk, q): Return dk_q := (g2^(q(s⃗,t⃗)), q) ∈ G2 × F_{n,Bx,By,Bq}.

Dec(pk, ct := (g1^γ, {g1^(a⃗_i), g2^(b⃗_i)}_{i∈[n]}), dk_q := (g2^(q(s⃗,t⃗)), q)):
out := e(g1^γ, g2^(q(s⃗,t⃗))) · Π_{i,j∈[n]} e(g1^(a⃗_i), g2^(b⃗_j))^(q_{i,j}). Return log(out) ∈ Z.

Figure 2: Our functional encryption scheme for quadratic polynomials.

Computing the discrete logarithm for decryption. Our decryption requires computing discrete logarithms of group elements in base gT, but contrary to previous works like [25], this computation is independent of the ciphertext and of the functional decryption key used to decrypt. 
This allows values to be pre-computed and dramatically speeds up decryption.

3.2.2 Equivalence of the FE scheme with a Quadratic Network

We classify data which can be represented as a vector x⃗ ∈ [0, B]^n (in our case, the size B = 255 and the dimension n = 784), and we first build models (q_i)_{i∈[ℓ]} for each public label i ∈ [ℓ], such that our prediction ypub for x⃗ is argmax_{i∈[ℓ]} q_i(x⃗).

Quadratic polynomial on R^n. The most straightforward way to use our FE scheme would be to learn a model (Q_i)_{i∈[ℓ]} ∈ (R^(n×n))^ℓ, which we would then round to integers, such that q_i(x⃗) = x⃗^T Q_i x⃗, ∀i ∈ [ℓ]. This is an unnecessarily powerful model in the case of MNIST, as it has ℓn² parameters (n = 784), and the resulting number of pairings to compute would be unreasonably large.

Linear homomorphism. The encryption algorithm of our FE scheme is linearly homomorphic with respect to the plaintext: given an encryption of (x⃗, y⃗) under the secret key msk := (s⃗, t⃗), one can efficiently compute an encryption of (u⃗^T x⃗, v⃗^T y⃗) under the secret key msk' := (u⃗^T s⃗, v⃗^T t⃗) for any linear combination u⃗, v⃗ (see proof in Appendix B.2). Any vector v⃗ is a column, and v⃗^T is a row. Therefore, if q can be written q(x⃗, y⃗) = (Ux⃗)^T M (Vy⃗) for all (x⃗, y⃗), with projection matrices U, V ∈ Z_p^(d×n) and M ∈ Z_p^(d×d), it is more efficient to first compute the encryption of (Ux⃗, Vy⃗) from the encryption of (x⃗, y⃗), and then to apply the functional decryption to these ciphertexts, because their underlying plaintexts are of reduced dimension d < n. 
This reduces the number of exponentiations from 2n² to 2dn and the number of pairing computations from 2n² to 2d² for a single q_i. This is a major efficiency improvement for small d, as pairings are the main bottleneck in the computation.

Projection and quadratic polynomial on R^d. We can use this and apply the quadratic polynomials to projected vectors: we learn P ∈ R^(d×n) and (Q_i)_{i∈[ℓ]} ∈ (R^(d×d))^ℓ, and our model is q_i(x⃗) = (Px⃗)^T Q_i (Px⃗), ∀i ∈ [ℓ]. We then only need 2ℓd² pairings, and since the same P is used for all q_i, we compute the encryption of Px⃗ from the encryption of x⃗ only once. Better yet, we can also perform the pairings only once, and then compute the scores by exponentiating the same pairing results with different coefficients, thus requiring only 2d² pairing evaluations, independently of ℓ.

Degree-2 polynomial network with one hidden layer. To further reduce the number of pairings, we restrict ourselves to diagonal matrices, and thus rename Q_i to D_i. We find that the gain in efficiency associated with computing only 2d pairings is worth the small drop in accuracy. The resulting model is actually a polynomial network of degree 2 with one hidden layer of d neurons, where the activation function is the square. In the following experiments we take d = 40.

Our final encrypted model can thus be written as q_i(x⃗) = (Px⃗)^T D_i (Px⃗), ∀i ∈ [ℓ], where we add a bias term to x⃗ by replacing it with x⃗ = (1 x_1 . . . x_n).

Full network. The result of the quadratic functions (q_i(x⃗))_{i∈[ℓ]} (i.e., of the private quadratic network) is now visible in the clear. 
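In plaintext, the resulting model is just a projection, a coordinate-wise square, and one weighted sum per class. A minimal sketch of this plaintext view (dimensions follow the paper; the random weights are stand-ins of ours, not the trained model):

```python
import numpy as np

# Plaintext view of the encrypted model: q_i(x) = (Px)^T D_i (Px) with a
# shared projection P and per-class diagonal matrices D_i, i.e. a degree-2
# polynomial network with one hidden layer and the square activation.
rng = np.random.default_rng(0)
n, d, ell = 784, 40, 10            # input size, hidden size, #classes
P = rng.standard_normal((d, n))    # shared projection, d x n
D = rng.standard_normal((ell, d))  # row i holds the diagonal of D_i

def scores(x):
    h = (P @ x) ** 2               # hidden layer: square activation
    return D @ h                   # q_i(x) = sum_j D[i, j] * (Px)_j^2

x = rng.standard_normal(n)
ypub = int(np.argmax(scores(x)))   # predicted public label
```

Under FE, only the ℓ scores (and hence the argmax) become visible; P and the D_i stay on the evaluator's side as the function embedded in the decryption keys.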
As mentioned above, we cannot compose this block several times, as it contains a decryption, so this is currently the best we can achieve as an encrypted computation with FE. Instead of simply applying the argmax to the cleartext output of this privately-evaluated quadratic network to get the label, we observe that adding more plaintext layers on top of it helps improve the overall accuracy of the main task. We therefore have a neural network composed of a private and a public part, as illustrated in Figure 3.

Figure 3: Semi-encrypted network using quadratic FE.

Figure 4: Semi-encrypted network with an adversary trying to recover private labels from the private quadratic network.

3.3 Threat of Collateral Learning

A typical adversary would have read access to the main task classification process. It would leverage the output of the quadratic network to try to learn the font used on ciphered images. To do this, all that is needed is to train another network on top of the quadratic network so that it learns to predict the font, assuming some access to labeled samples (which is the case if the adversary itself encrypts images and provides them to the main task at evaluation time). Note that in this case the private network is not updated by the collateral network, as we assume it is only provided in read access after the main task is trained. Figure 4 summarizes the setting.

We implemented this scenario using as adversary a neural network composed of a first layer acting as a decompression step, where we increase the number of neurons from 10 back to 28 × 28, with a classical² convolutional neural network (CNN) on top. This structure is reminiscent of autoencoders [38], where the bottleneck is the public output of the private net and the challenge of this autoencoder is to correctly memorize the public label while forgetting the private one. 
What we observed is striking: in less than 10 epochs, the collateral network leverages the 10 public neuron outputs and achieves 93.5% accuracy for font prediction. As expected, it gets even worse when the adversary is assessed with the indistinguishability criterion, because in that case the adversary can work on a dataset where only a specific digit is represented: this reduces the variability of the samples and makes it easier to distinguish the font; the probability of success is then 96.9%. We call this phenomenon of learning unexpected features collateral learning, and we show in the next section how to implement counter-measures to this threat in order to improve privacy.

²https://github.com/pytorch/examples/blob/master/mnist/main.py

4 Defeating Collateral Learning

4.1 Reducing information leakage

Our first approach is based on the observation that we leak many bits of information. We first investigate whether we can reduce the number of outputs of the privately-evaluated network, as adding extra layers on top of the private network makes it no longer necessary to keep 10 of them. The intuition is that if the information relevant to the main task can fit in fewer than 10 neurons, then the extra neurons leak unnecessary information. We therefore face a trade-off between reducing too much and losing desired information, or keeping too large an output and having an important leakage. We can observe this through the respective accuracies shown in Figure 5, where the main and adversarial networks are CNNs as in Section 3.3, with 10 epochs of training using 7-fold cross validation. What we observe here is interesting: the main task does not exhibit significant weaknesses even with size 3, where accuracy drops to 97.1%, which is still very good although 2% under the best accuracy. In return, the collateral accuracy starts to decrease significantly when the output size is below 7. 
At size 4, it is only 76.4% on average, i.e., 18% less than the baseline. We will keep an output size of 3 or 4 for the next experiments to keep the main accuracy almost unchanged.

Figure 5: Trade-off between main and collateral accuracies depending on the private output size.

Another hyperparameter that we can consider is weight compression: how many bits do we need to represent the weights of the private network's layers? This is of interest for the FE scheme, as we need to convert all weights to integers, and those integers will be small provided that the compression rate is high. Small weight integers mean that the output of the private network has a relatively low amplitude and can therefore be efficiently decrypted using the discrete logarithm. We managed to express all weights, and even the input image, using 4-bit values with limited impact on the main accuracy and almost none on the collateral one. Details about compression can be found in Appendix C.1.

4.2 Adversarial training

We propose a new approach to actively adapt against collateral learning. The main idea is to simulate adversaries and to try to defeat them. To do this, we use semi-adversarial training and simultaneously optimize the main classification objective and the opposite of the collateral objective of a given simulated adversary. The function that we want to minimize at each iteration step can be written:

min_{θq} [ min_{θpub} Lpub(θq, θpub) − α min_{θpriv} Lpriv(θq, θpriv) ].

This approach is inspired by [29], where the authors train some objective against nuisance parameters to build a classifier independent of these nuisances. Private features leaking in our scheme can indeed be considered a nuisance. 
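The alternating minimization behind this objective can be sketched on a toy linear problem. This is our own illustration of the update pattern (data, step sizes and parameter names are ours, not the paper's implementation): a shared representation θq feeds a public head and a simulated adversary, and θq descends Lpub − α·Lpriv while the two heads each descend their own loss.

```python
import numpy as np

# Toy semi-adversarial training: z = X @ tq, public head predicts X[:,0],
# the simulated adversary predicts the "private" direction X[:,1].
# Everything is linear with squared losses so gradients are closed-form.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 2))
ypub, ypriv = X[:, 0], X[:, 1]        # public and private targets

tq = np.array([1.0, 1.0])             # private network parameters
tpub = tpriv = 0.0                    # public head and simulated adversary
alpha, lr = 1.7, 0.01

def step(update_q):
    global tq, tpub, tpriv
    z = X @ tq
    epub, epriv = tpub * z - ypub, tpriv * z - ypriv
    tpub -= lr * np.mean(2 * epub * z)       # minimize Lpub over theta_pub
    tpriv -= lr * np.mean(2 * epriv * z)     # minimize Lpriv over theta_priv
    if update_q:                             # minimize Lpub - alpha*Lpriv over theta_q
        gq = (X.T @ (2 * epub) * tpub - alpha * X.T @ (2 * epriv) * tpriv) / len(X)
        tq -= lr * gq

for _ in range(100): step(False)      # pre-training (theta_q frozen)
for _ in range(300): step(True)       # semi-adversarial phase
for _ in range(100): step(False)      # recover phase (theta_q frozen again)
```

On this toy problem the adversarial term drives θq away from the private direction (the second coordinate) while the public direction remains usable by the public head.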
However, our approach goes one step further, as we do not just stack one network above another; our global network structure is fork-like: the common basis is the private network and the two forks are the main and collateral classifiers. This allows us to have a better classifier for the main task, one which is not as sensitive to the adversarial training as the scheme exposed in [29, Figure 1]. Another difference is that the collateral classifier is a specific model of an adversary, which we discuss in detail in the next section. We define in Figure 6 the 3-step procedure used to implement this semi-adversarial training using partial back-propagation.

Pre-training: Initial phase where both tasks learn and strengthen before the joint optimization
  Minimize Lpub(θq, θpub)
  Minimize Lpriv(Frozen(θq), θpriv)

Semi-adversarial training: The joint optimization phase, where θpub and θpriv are updated depending on the variations of θq, and θq is optimized to reduce the loss L = Lpub − αLpriv
  Minimize Lpub(Frozen(θq), θpub)
  Minimize Lpriv(Frozen(θq), θpriv)
  Minimize L = Lpub(θq, Frozen(θpub)) − αLpriv(θq, Frozen(θpriv))

Recover phase: Both tasks recover from the perturbations induced by the adversarial phase; θq does not change anymore
  Minimize Lpub(Frozen(θq), θpub)
  Minimize Lpriv(Frozen(θq), θpriv)

Figure 6: Our semi-adversarial training scheme.

5 Experimental Results

Accurate main task and poor collateral results. In Figures 7 and 8 we show that the output size has an important influence on the performance of the two tasks. For this experiment, we use α = 1.7 as detailed in Appendix C.2, the adversary uses the same CNN as stated above, and the main network is a simple feed-forward network (FFN) with 4 layers. We observe that both networks behave better when the output size increases, but the improvement is not synchronous, which makes it possible to have a main task with high accuracy while the collateral task is still very inaccurate. In our example, this corresponds to an output size between 3 and 5. Note that the collateral result is the accuracy on the distinction task, i.e., the digit is fixed for the adversary, which trains to distinguish two fonts during a 50-epoch recover phase using 7-fold cross validation, after 50 epochs of semi-adversarial training have been spent to reduce leakage from the private network.

Figure 7: Influence of the output size on the main task accuracy with adversarial training.

Figure 8: Influence of the output size on the collateral task accuracy with adversarial training.

Generalizing resistance against multiple adversaries. In practice, it is very likely that the adversary will use a different model than the one against which the protection has been built. We have therefore investigated how building resistance against a model M can provide resistance against other models. Our empirical results tend to show that models with fewer parameters than M do not perform well. In return, models with more parameters can behave better, provided that their complexity does not become excessive for the considered task, as that would not provide any additional advantage and would just lead to learning noise. In particular, the CNN already mentioned above seems to be a sufficiently complex model to resist a wide range of feed-forward (FFN) and convolutional networks, as illustrated in Figure 9, where the measure used is indistinguishability of the font for a fixed digit. 
This study is not exhaustive, as the adversary can change the activation function (here we use ReLU) or even the training parameters (optimizer, batch size, dropout, etc.), but these do not seem to provide any decisive advantage.
We also assessed the resistance to a large range of other models from the sklearn library [33] and report the collateral accuracy in Figure 10. As can be observed, some models such as k-nearest neighbors or random forests perform better than neural networks, even if their accuracy remains relatively low. One reason may be that they operate in a very different manner from the model on which the adversarial training is performed: k-nearest neighbors, for example, just considers distances between points.

Figure 9: Collateral accuracy depending on the adversarial network complexity, seen as the log of the number of parameters.

Linear Ridge Regression        53.5 ± 0.5%
Logistic Regression            52.5 ± 0.6%
Quad. Discriminant Analysis    54.9 ± 0.3%
SVM (RBF kernel)               57.9 ± 0.4%
Gaussian Process Classifier    53.8 ± 0.3%
Gaussian Naive Bayes           53.2 ± 0.5%
K-Neighbors Classifier         58.1 ± 0.7%
Decision Tree Classifier       56.8 ± 0.4%
Random Forest Classifier       58.9 ± 0.2%
Gradient Boosting Classifier   58.9 ± 0.2%

Figure 10: Accuracy on the distinction task for different adversarial learning models.

Runtime. Training in semi-adversarial mode can take quite a long time, depending on the level of privacy one wants to achieve. However, the runtime during the test phase is much faster: it is dominated by the FE scheme part, which can be broken down into 4 steps: functional key generation, encryption of the input, evaluation of the function, and discrete logarithm. Regarding encryption and evaluation, the main overhead comes from the exponentiations and pairings, which are implemented in the crypto library charm [3].
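The last step, recovering the network's output from a group element, is a discrete logarithm over a small known range. A minimal baby-step giant-step sketch shows why bounding the output makes this step cheap; the group parameters below are hypothetical demonstration values (the paper's scheme works in a pairing group via charm, not in a plain prime field).

```python
# Baby-step giant-step discrete log restricted to a small range.
# Decryption yields g**x for a network output x; because the weights'
# amplitude is reduced, x lies in a small known range and can be
# recovered in O(sqrt(bound)) group operations instead of a full dlog.
import math

def bounded_dlog(g, h, p, bound):
    """Return x in [0, bound) with pow(g, x, p) == h, else None."""
    m = math.isqrt(bound) + 1
    baby = {pow(g, j, p): j for j in range(m)}   # baby steps g**j
    step = pow(g, -m, p)                          # g**(-m); Python >= 3.8
    gamma = h
    for i in range(m):                            # giant steps
        if gamma in baby:
            return i * m + baby[gamma]
        gamma = gamma * step % p
    return None

# Hypothetical demo parameters (p = 998244353 is a well-known prime).
p, g = 998244353, 3
x = 123456
assert bounded_dlog(g, pow(g, x, p), p, 1_000_000) == x
```

Negative outputs can be handled the same way by shifting the search range, and the per-value cost stays proportional to the square root of the range width.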
The discrete logarithm, in contrast, is very efficient thanks to the reduction of the weights' amplitude detailed in Section 4.1.

Functional key generation    94 ± 5 ms
Encryption time              12.1 ± 0.3 s
Evaluation time              2.97 ± 0.07 s
Discrete logarithms time     24 ± 9 ms

Table 1: Average runtime for the FE scheme using a 2.7 GHz Intel Core i7 and 16 GB of RAM.

Table 1 shows that encryption time is longer than evaluation time, but a single encryption can be used with several decryption keys dkqi to perform multiple evaluation tasks.

6 Conclusion

We have shown that functional encryption can be used for practical applications where machine learning is applied to sensitive data. We have raised awareness about the potential information leakage when the network is not fully encrypted, and have proposed semi-adversarial training as a solution to prevent targeted sensitive features from leaking against a vast family of adversaries.
However, it remains an open problem to provide privacy-preserving methods for all features except the public ones, as they can be hard to identify in advance. On the cryptography side, extending the range of functions supported in functional encryption would help increase provable data privacy, and adding the ability to hide the function evaluated would be of interest for sensitive neural networks.

Acknowledgments

This work was supported in part by the European Community's Seventh Framework Programme (FP7/2007-2013 Grant Agreement no. 339563 – CryptoCloud), the European Community's Horizon 2020 Project FENTEC (Grant Agreement no. 780108), the Google PhD fellowship, and the French FUI ANBLIC Project.

References

[1] Michel Abdalla, Florian Bourse, Angelo De Caro, and David Pointcheval. Simple functional encryption schemes for inner products. In Jonathan Katz, editor, PKC 2015, volume 9020 of LNCS, pages 733–751.
Springer, Heidelberg, March / April 2015.

[2] Shweta Agrawal, Benoît Libert, and Damien Stehlé. Fully secure functional encryption for inner products, from standard assumptions. In Matthew Robshaw and Jonathan Katz, editors, CRYPTO 2016, Part III, volume 9816 of LNCS, pages 333–362. Springer, Heidelberg, August 2016.

[3] Joseph A. Akinyele, Christina Garman, Ian Miers, Matthew W. Pagano, Michael Rushanan, Matthew Green, and Aviel D. Rubin. Charm: a framework for rapidly prototyping cryptosystems. Journal of Cryptographic Engineering, 3(2):111–128, 2013.

[4] Miguel Ambrona, Gilles Barthe, Romain Gay, and Hoeteck Wee. Attribute-based encryption in the generic group model: Automated proofs and new constructions. In Bhavani M. Thuraisingham, David Evans, Tal Malkin, and Dongyan Xu, editors, ACM CCS 2017, pages 647–664. ACM Press, October / November 2017.

[5] Ahmad Al Badawi, Jin Chao, Jie Lin, Chan Fook Mun, Jun Jie Sim, Benjamin Hong Meng Tan, Xiao Nan, Khin Mi Mi Aung, and Vijay Ramaseshan Chandrasekhar. The AlexNet moment for homomorphic encryption: HCNN, the first homomorphic CNN on encrypted data with GPUs. Cryptology ePrint Archive, Report 2018/1056, 2018.

[6] Carmen Elisabetta Zaira Baltico, Dario Catalano, Dario Fiore, and Romain Gay. Practical functional encryption for quadratic functions with applications to predicate encryption. In Jonathan Katz and Hovav Shacham, editors, CRYPTO 2017, Part I, volume 10401 of LNCS, pages 67–98. Springer, Heidelberg, August 2017.

[7] Boaz Barak, Oded Goldreich, Russell Impagliazzo, Steven Rudich, Amit Sahai, Salil P. Vadhan, and Ke Yang. On the (im)possibility of obfuscating programs. In Joe Kilian, editor, CRYPTO 2001, volume 2139 of LNCS, pages 1–18. Springer, Heidelberg, August 2001.

[8] Mauro Barni, Pierluigi Failla, Riccardo Lazzeretti, Ahmad-Reza Sadeghi, and Thomas Schneider. Privacy-preserving ECG classification with branching programs and neural networks. IEEE Transactions on Information Forensics and Security, 6(2):452–468, 2011.

[9] Solon Barocas, Moritz Hardt, and Arvind Narayanan. Fairness and Machine Learning. fairmlbook.org, 2018.

[10] Gilles Barthe, Edvard Fagerholm, Dario Fiore, John C. Mitchell, Andre Scedrov, and Benedikt Schmidt. Automated analysis of cryptographic assumptions in generic group models. In Juan A. Garay and Rosario Gennaro, editors, CRYPTO 2014, Part I, volume 8616 of LNCS, pages 95–112. Springer, Heidelberg, August 2014.

[11] Dan Boneh and Matthew K. Franklin. Identity-based encryption from the Weil pairing. SIAM Journal on Computing, 32(3):586–615, 2003.

[12] Dan Boneh, Amit Sahai, and Brent Waters. Functional encryption: Definitions and challenges. In Yuval Ishai, editor, TCC 2011, volume 6597 of LNCS, pages 253–273. Springer, Heidelberg, March 2011.

[13] Raphael Bost, Raluca Ada Popa, Stephen Tu, and Shafi Goldwasser. Machine learning classification over encrypted data. In NDSS, 2015.

[14] Florian Bourse, Michele Minelli, Matthias Minihold, and Pascal Paillier. Fast homomorphic evaluation of deep discretized neural networks. Cryptology ePrint Archive, Report 2017/1114, 2017. https://eprint.iacr.org/2017/1114.

[15] Florian Bourse, Michele Minelli, Matthias Minihold, and Pascal Paillier. Fast homomorphic evaluation of deep discretized neural networks. In Advances in Cryptology – CRYPTO 2018 – 38th Annual International Cryptology Conference, Santa Barbara, CA, USA, August 19-23, 2018, Proceedings, Part III, pages 483–512, 2018.

[16] Z. Brakerski and V. Vaikuntanathan. Efficient fully homomorphic encryption from (standard) LWE.
In 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science, pages 97–106, October 2011.

[17] Sergiu Carpov, Caroline Fontaine, Damien Ligier, and Renaud Sirdey. Illuminating the dark or how to recover what should not be seen. IACR Cryptology ePrint Archive, 2018:1001, 2018.

[18] Ilaria Chillotti, Nicolas Gama, Mariya Georgieva, and Malika Izabachène. Faster fully homomorphic encryption: Bootstrapping in less than 0.1 seconds. In Jung Hee Cheon and Tsuyoshi Takagi, editors, Advances in Cryptology – ASIACRYPT 2016, pages 3–33, Berlin, Heidelberg, 2016. Springer Berlin Heidelberg.

[19] Ivan Damgård, Valerio Pastro, Nigel Smart, and Sarah Zakarias. Multiparty computation from somewhat homomorphic encryption. In Reihaneh Safavi-Naini and Ran Canetti, editors, Advances in Cryptology – CRYPTO 2012, pages 643–662, Berlin, Heidelberg, 2012. Springer Berlin Heidelberg.

[20] Nathan Dowlin, Ran Gilad-Bachrach, Kim Laine, Kristin Lauter, Michael Naehrig, and John Wernsing. CryptoNets: Applying neural networks to encrypted data with high throughput and accuracy. Technical report, February 2016.

[21] Sanjam Garg, Craig Gentry, Shai Halevi, Mariana Raykova, Amit Sahai, and Brent Waters. Candidate indistinguishability obfuscation and functional encryption for all circuits. In 54th FOCS, pages 40–49. IEEE Computer Society Press, October 2013.

[22] Craig Gentry, Amit Sahai, and Brent Waters. Homomorphic encryption from learning with errors: Conceptually-simpler, asymptotically-faster, attribute-based. In Ran Canetti and Juan A. Garay, editors, CRYPTO 2013, Part I, volume 8042 of LNCS, pages 75–92. Springer, Heidelberg, August 2013.

[23] Shafi Goldwasser and Silvio Micali. Probabilistic encryption. Journal of Computer and System Sciences, 28(2):270–299, 1984.

[24] Antoine Joux. A one round protocol for tripartite Diffie-Hellman.
Journal of Cryptology,\n\n17(4):263\u2013276, September 2004.\n\n[25] Sam Kim, Kevin Lewi, Avradip Mandal, Hart Montgomery, Arnab Roy, and David J. Wu.\nFunction-hiding inner product encryption is practical. In Dario Catalano and Roberto De Prisco,\neditors, SCN 18, volume 11035 of LNCS, pages 544\u2013562. Springer, Heidelberg, September\n2018.\n\n[26] Alex Krizhevsky. Learning multiple layers of features from tiny images. University of Toronto,\n\n05 2012.\n\n[27] Yann LeCun and Corinna Cortes. MNIST handwritten digit database. 2010.\n\n[28] Roi Livni, Shai Shalev-Shwartz, and Ohad Shamir. On the computational ef\ufb01ciency of training\n\nneural networks. CoRR, abs/1410.1141, 2014.\n\n[29] Gilles Louppe, Michael Kagan, and Kyle Cranmer. Learning to pivot with adversarial networks.\nIn I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Gar-\nnett, editors, Advances in Neural Information Processing Systems 30, pages 981\u2013990. Curran\nAssociates, Inc., 2017.\n\n[30] Ueli M. Maurer. Abstract models of computation in cryptography (invited paper). In Nigel P.\nSmart, editor, 10th IMA International Conference on Cryptography and Coding, volume 3796\nof LNCS, pages 1\u201312. Springer, Heidelberg, December 2005.\n\n[31] V. I. Nechaev. Complexity of a determinate algorithm for the discrete logarithm. Mathematical\n\nNotes, 55(2):165\u2013172, 1994.\n\n[32] Adam O\u2019Neill. De\ufb01nitional issues in functional encryption. Cryptology ePrint Archive, Report\n\n2010/556, 2010.\n\n[33] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel,\nP. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher,\nM. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine\nLearning Research, 12:2825\u20132830, 2011.\n\n[34] M. Sadegh Riazi, Christian Weinert, Oleksandr Tkachenko, Ebrahim M. Songhori, Thomas\nSchneider, and Farinaz Koushanfar. 
Chameleon: A hybrid secure computation framework for machine learning applications. In Proceedings of the 2018 on Asia Conference on Computer and Communications Security, ASIACCS '18, pages 707–721, New York, NY, USA, 2018. ACM.

[35] Theo Ryffel, Andrew Trask, Morten Dahl, Bobby Wagner, Jason Mancuso, Daniel Rueckert, and Jonathan Passerat-Palmbach. A generic framework for privacy preserving deep learning. CoRR, abs/1811.04017, 2018.

[36] Microsoft SEAL (release 3.2). https://github.com/Microsoft/SEAL, February 2019. Microsoft Research, Redmond, WA.

[37] Victor Shoup. Lower bounds for discrete logarithms and related problems. In Walter Fumy, editor, EUROCRYPT'97, volume 1233 of LNCS, pages 256–266. Springer, Heidelberg, May 1997.

[38] Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, ICML '08, pages 1096–1103, New York, NY, USA, 2008. ACM.

[39] Sameer Wagh, Divya Gupta, and Nishanth Chandran. SecureNN: Efficient and private neural network training. PETS 2019, February 2019.