{"title": "Hiding Images in Plain Sight: Deep Steganography", "book": "Advances in Neural Information Processing Systems", "page_first": 2069, "page_last": 2079, "abstract": "Steganography is the practice of concealing a secret message within another, ordinary, message.  Commonly, steganography is used to unobtrusively hide a small message within the noisy regions of a larger image.  In this study, we attempt to place a full size color image within another image of the same size.  Deep neural networks are simultaneously trained to create the hiding and revealing processes and are designed to specifically work as a pair.  The system is trained on images drawn randomly from the ImageNet database, and works well on natural images from a wide variety of sources.  Beyond demonstrating the successful application of deep learning to hiding images, we carefully examine how the result is achieved and explore extensions.  Unlike many popular steganographic methods that encode the secret message within the least significant bits of the carrier image, our approach compresses and distributes the secret image's representation across all of the available bits.", "full_text": "Hiding Images in Plain Sight:\n\nDeep Steganography\n\nShumeet Baluja\nGoogle Research\n\nGoogle, Inc.\n\nshumeet@google.com\n\nAbstract\n\nSteganography is the practice of concealing a secret message within another,\nordinary, message. Commonly, steganography is used to unobtrusively hide a small\nmessage within the noisy regions of a larger image. In this study, we attempt\nto place a full size color image within another image of the same size. Deep\nneural networks are simultaneously trained to create the hiding and revealing\nprocesses and are designed to speci\ufb01cally work as a pair. The system is trained on\nimages drawn randomly from the ImageNet database, and works well on natural\nimages from a wide variety of sources. 
Beyond demonstrating the successful\napplication of deep learning to hiding images, we carefully examine how the result\nis achieved and explore extensions. Unlike many popular steganographic methods\nthat encode the secret message within the least signi\ufb01cant bits of the carrier image,\nour approach compresses and distributes the secret image\u2019s representation across\nall of the available bits.\n\n1 Introduction to Steganography\n\nSteganography is the art of covered or hidden writing; the term itself dates back to the 15th century,\nwhen messages were physically hidden. In modern steganography, the goal is to covertly communicate\na digital message. The steganographic process places a hidden message in a transport medium, called\nthe carrier. The carrier may be publicly visible. For added security, the hidden message can also be\nencrypted, thereby increasing the perceived randomness and decreasing the likelihood of content\ndiscovery even if the existence of the message is detected. Good introductions to steganography and\nsteganalysis (the process of discovering hidden messages) can be found in [1\u20135].\nThere are many well-publicized nefarious applications of steganographic information hiding, such as\nplanning and coordinating criminal activities through hidden messages in images posted on public\nsites \u2013 making the communication and the recipient dif\ufb01cult to discover [6]. Beyond the multitude of\nmisuses, however, a common use case for steganographic methods is to embed authorship information,\nthrough digital watermarks, without compromising the integrity of the content or image.\nThe challenge of good steganography arises because embedding a message can alter the appearance\nand underlying statistics of the carrier. The amount of alteration depends on two factors: \ufb01rst, the\namount of information that is to be hidden. A common use has been to hide textual messages in\nimages. 
The amount of information that is hidden is measured in bits-per-pixel (bpp). Often, the\namount of information is set to 0.4bpp or lower. The longer the message, the larger the bpp, and\ntherefore the more the carrier is altered [6, 7]. Second, the amount of alteration depends on the carrier\nimage itself. Hiding information in the noisy, high-frequency \ufb01lled, regions of an image yields less\nhumanly detectable perturbations than hiding in the \ufb02at regions. Work on estimating how much\ninformation a carrier image can hide can be found in [8].\n\n31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.\n\n\fFigure 1: The three components of the full system. Left: Secret-Image preparation. Center: Hiding\nthe image in the cover image. Right: Uncovering the hidden image with the reveal network; this is\ntrained simultaneously, but is used by the receiver.\n\nThe most common steganography approaches manipulate the least signi\ufb01cant bits (LSB) of images to\nplace the secret information - whether done uniformly or adaptively, through simple replacement or\nthrough more advanced schemes [9, 10]. Though often not visually observable, statistical analysis\nof image and audio \ufb01les can reveal whether the resultant \ufb01les deviate from those that are unaltered.\nAdvanced methods attempt to preserve the image statistics, by creating and matching models of the\n\ufb01rst and second order statistics of the set of possible cover images explicitly; one of the most popular\nis named HUGO [11]. 
HUGO is commonly employed with relatively small messages (< 0.5bpp).\nIn contrast to the previous studies, we use a neural network to implicitly model the distribution of\nnatural images as well as embed a much larger message, a full-size image, into a carrier image.\nDespite recent impressive results achieved by incorporating deep neural networks with steganalysis [12\u201314], there have been relatively few attempts to incorporate neural networks into the hiding\nprocess itself [15\u201319]. Some of these studies have used deep neural networks (DNNs) to select which\nLSBs to replace in an image with the binary representation of a text message. Others have used\nDNNs to determine which bits to extract from the container images. In contrast, in our work, the\nneural network determines where to place the secret information and how to encode it ef\ufb01ciently;\nthe hidden message is dispersed throughout the bits in the image. A decoder network, which has been\ntrained simultaneously with the encoder, is used to reveal the secret image. Note that the networks\nare trained only once and are independent of the cover and secret images.\nIn this paper, the goal is to visually hide a full N \u00d7 N \u00d7 RGB pixel secret image in another\nN \u00d7 N \u00d7 RGB cover image, with minimal distortion to the cover image (each color channel is 8\nbits). However, unlike previous studies, in which a hidden text message must be sent with perfect\nreconstruction, we relax the requirement that the secret image is losslessly received. Instead, we\nare willing to \ufb01nd acceptable trade-offs in the quality of the carrier and secret image (this will be\ndescribed in the next section). We also provide brief discussions of the discoverability of the existence\nof the secret message. Previous studies have demonstrated that hidden message bit rates as low as\n0.1bpp can be discovered; our bit rates are 10\u00d7\u201340\u00d7 higher. 
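For concreteness, bpp is simply the payload size divided by the number of cover pixels; hiding a full-size RGB secret in a same-sized cover corresponds to a nominal payload of 24 bpp. The small calculation below is our own illustration (the side length `n` is hypothetical), not code from the system described here:

```python
# Illustrative bits-per-pixel (bpp) arithmetic; our own sketch, not the paper's code.
def bpp(payload_bits: int, cover_pixels: int) -> float:
    """Hidden payload size in bits divided by the number of cover-image pixels."""
    return payload_bits / cover_pixels

n = 256                             # hypothetical cover/secret side length
secret_bits = n * n * 3 * 8         # full-size RGB secret, 8 bits per channel
print(bpp(secret_bits, n * n))      # 24.0 -- versus ~0.4 bpp for typical text payloads
```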
Though visually hard to detect, given\nthe large amount of hidden information, we do not expect the existence of a secret message to be\nhidden from statistical analysis. Nonetheless, we will show that commonly used methods do not \ufb01nd\nit, and we give promising directions on how to trade-off the dif\ufb01culty of existence-discovery with\nreconstruction quality, as required.\n\n2 Architectures and Error Propagation\n\nThough steganography is often con\ufb02ated with cryptography, in our approach, the closest analogue\nis image compression through auto-encoding networks. The trained system must learn to compress\nthe information from the secret image into the least noticeable portions of the cover image. The\narchitecture of the proposed system is shown in Figure 1.\nThe three components shown in Figure 1 are trained as a single network; however, it is easiest to\ndescribe them individually. The leftmost, Prep-Network, prepares the secret image to be hidden. This\ncomponent serves two purposes. First, in cases in which the secret-image (size M \u00d7 M) is smaller\nthan the cover image (N \u00d7 N), the preparation network progressively increases the size of the secret\nimage to the size of the cover, thereby distributing the secret image\u2019s bits across the entire N \u00d7 N\n\n2\n\n\fFigure 2: Transformations made by the preparation network (3 examples shown). Left: Original\nColor Images. Middle: the three channels of information extracted by the preparation network that\nare input into the middle network. Right: zoom of the edge-detectors. The three color channels are\ntransformed by the preparation-network. In the most easily recognizable example, the 2nd channel\nactivates for high frequency regions, e.g. textures and edges (shown enlarged (right)).\n\npixels. (For space reasons, we do not provide details of experiments with smaller images, and instead\nconcentrate on full size images). 
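A crude stand-in for this first, resizing purpose is a fixed upsampling from M × M to N × N. In the sketch below, plain nearest-neighbor repetition replaces the learned, progressive resizing that the actual preparation network performs; it only illustrates how the secret's bits come to cover the full field:

```python
import numpy as np

def upsample_secret(secret: np.ndarray, n: int) -> np.ndarray:
    """Nearest-neighbor stand-in for the prep network's progressive resizing.

    secret: (m, m, channels) array; returns an (n, n, channels) array.
    The real network learns this mapping; here it is a fixed, illustrative resize.
    """
    m = secret.shape[0]
    assert n % m == 0, "illustration assumes n is a multiple of m"
    r = n // m
    return secret.repeat(r, axis=0).repeat(r, axis=1)

secret = np.random.rand(32, 32, 3)       # hypothetical M x M secret
prepped = upsample_secret(secret, 64)    # spread across the full N x N field
print(prepped.shape)                     # (64, 64, 3)
```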
The more important purpose, relevant to all sizes of hidden images,\nis to transform the color-based pixels to more useful features for succinctly encoding the image \u2013\nsuch as edges [20, 21], as shown in Figure 2.\nThe second/main network, the Hiding Network, takes as input the output of the preparation-network\nand the cover image, and creates the Container image. The input to this network is an N \u00d7 N pixel\n\ufb01eld, with depth-concatenated RGB channels of the cover image and the transformed channels of\nthe secret image. Over 30 architectures for this network were attempted for our study with varying\nnumbers of hidden layers and convolution sizes; the best consisted of 5 convolution layers that had 50\n\ufb01lters each of {3 \u00d7 3, 4 \u00d7 4, 5 \u00d7 5} patches. Finally, the right-most network, the Reveal Network,\nis used by the receiver of the image; it is the decoder. It receives only the Container image (not the\ncover nor secret image). The decoder network removes the cover image to reveal the secret image.\nAs mentioned earlier, our approach borrows heavily from auto-encoding networks [22]; however,\ninstead of simply encoding a single image through a bottleneck, we encode two images such that the\nintermediate representation (the container image) appears as similar as possible to the cover image.\nThe system is trained by reducing the error shown below (c and s are the cover and secret images\nrespectively, and \u03b2 weights their reconstruction errors):\n\nL(c, c\u2032, s, s\u2032) = ||c \u2212 c\u2032|| + \u03b2||s \u2212 s\u2032||\n\n(1)\n\nIt is important to note where the errors are computed and the weights that each error affects, see\nFigure 3. In particular, note that the error term ||c \u2212 c\u2032|| does not apply to the weights of the\nreveal-network that receives the container image and extracts the secret image. 
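A minimal sketch of Equation 1 as a sum-of-squares objective (our own simplification; the comments record which networks each term trains, per the description above):

```python
import numpy as np

def stego_loss(c, c_hat, s, s_hat, beta=1.0):
    """Equation 1 as a sum-of-squares sketch (our own minimal version).

    cover_term trains only the preparation and hiding networks; secret_term
    back-propagates through all three, including the reveal network. Here we
    simply compute the two terms on arrays.
    """
    cover_term = np.sum((c - c_hat) ** 2)    # ||c - c'||: no gradient to the reveal net
    secret_term = np.sum((s - s_hat) ** 2)   # ||s - s'||: gradient to all three nets
    return cover_term + beta * secret_term

c = np.zeros((4, 4, 3))
s = np.ones((4, 4, 3))
print(stego_loss(c, c + 0.1, s, s - 0.2, beta=0.75))
```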
On the other hand,\nall of the networks receive the error signal \u03b2||s \u2212 s\u2032|| for reconstructing the hidden image. This\nensures that the representations formed early in the preparation network as well as those used for\nreconstruction of the cover image also encode information about the secret image.\n\nFigure 3: The three networks are trained as a single, large, network. Error term 1 affects only the \ufb01rst\ntwo networks. Error term 2 affects all 3. S is the secret image, C is the cover image.\n\n3\n\n\fTo ensure that the networks do not simply encode the secret image in the LSBs, a small amount of\nnoise is added to the output of the second network (i.e., into the generated container image) during\ntraining. The noise was designed such that the LSB was occasionally \ufb02ipped; this ensured that the\nLSB was not the sole container of the secret image\u2019s reconstruction. Later, we will discuss where the\nsecret image\u2019s information is placed. Next, we examine how the network performs in practice.\n\n3 Empirical Evaluation\n\nThe three networks were trained as described above using Adam [23]. For simplicity, the reconstructions minimized the sum of squares error of the pixel difference, although other image metrics could\nhave easily been substituted [24, 25]. The networks were trained using randomly selected pairs of\nimages from the ImageNet training set [26].\nQuantitative results are shown in Figure 4, as measured by the SSE per pixel, per channel. The\ntesting was conducted on 1,000 image pairs taken from ImageNet images (not used in training). For\ncomparison, also shown is the result of using the same network for only encoding the cover image\nwithout the secret image (i.e., \u03b2 = 0). This gives the best reconstruction error of the cover using\nthis network (this is unattainable while also encoding the secret image). Also shown in Figure 4 are\nhistograms of errors for the cover and secret reconstructions. 
As can be seen, there are few large pixel errors.\n\nSystem | \u03b2 | Cover | Secret\nDeep-Stego | 0.75 | 2.8 | 3.6\nDeep-Stego | 1.00 | 3.0 | 3.2\nDeep-Stego | 1.25 | 6.4 | 2.8\nCover Only | 0.00 | 0.1 | (n/a)\n\nFigure 4: Left: Number of intensity values off (out of 256) for each pixel, per channel, on cover and\nsecret image. Right: Distribution of pixel errors for cover and secret images, respectively.\n\nFigure 5 shows the results of hiding six images, chosen to show varying error rates. These images are\nnot taken from ImageNet to demonstrate that the networks have not over-trained to characteristics of\nthe ImageNet database, and work on a range of pictures taken with cell phone cameras and DSLRs.\nNote that most of the reconstructed cover images look almost identical to the original cover images,\ndespite encoding all the information to reconstruct the secret image. The differences between the\noriginal and cover images are shown in the rightmost columns (magni\ufb01ed 5\u00d7 in intensity).\nConsider how these error rates compare to creating the container through simple LSB substitution:\nreplacing the 4 least signi\ufb01cant bits (LSBs) of the cover image with the 4 most signi\ufb01cant bits\n(MSBs) of the secret image. In this procedure, to recreate the secret image, the MSBs are copied\nfrom the container image, and the remaining bits set to their average value across the training dataset.\nDoing this, the average pixel error per channel on the cover image\u2019s reconstruction is 5.4 (in a range\nof 0-255). The average error on the reconstruction of the secret image (when using the average value\nfor the missing LSBs) is approximately 4.0.\u00b9 Why is the error for the cover image\u2019s reconstruction\nlarger than 4.0? 
The higher error for the cover image\u2019s reconstruction re\ufb02ects the fact that the\ndistribution of bits in the natural images used is different for the MSBs and LSBs; therefore, even\nthough the secret and cover image are drawn from the same distribution, when the MSBs from the\nsecret image are used in place of the LSBs, larger errors occur than simply using the average values\nof the LSBs. Most importantly, these error rates are signi\ufb01cantly higher than those achieved by our\nsystem (Figure 4).\n\n\u00b9Note that an error of 4.0 is expected when the average value is used to \ufb01ll in the LSBs: removing 4 bits\nfrom a pixel\u2019s encoding yields 16x fewer intensities that can be represented. By selecting the average value to\nreplace the missing bits, the maximum error can be 8, and the average error is 4, assuming uniformly distributed\nbits. To avoid any confusion, we point out that though it is tempting to consider using the average value for the\ncover image also, recall that the LSBs of the cover image are where the MSBs of the secret image are stored.\nTherefore, those bits must be used in this encoding scheme, and hence the larger error.\n\n4\n\n\fFigure 5: 6 Hiding Results. Left pair of each set: original cover and secret image. Center pair: cover\nimage embedded with the secret image, and the secret image after extraction from the container.\nRight pair: Residual errors for cover and hidden \u2013 enhanced 5\u00d7. The errors per pixel, per channel are\nthe smallest in the top row: (3.1, 4.5), and largest in the last (4.5, 7.9).\n\nWe close this section with a demonstration of the limitations of our approach. Recall that the networks\nwere trained on natural images found in the ImageNet challenge. Though this covers a very large\nrange of images, it is illuminating to examine the effects when other types of images are used. 
Five\nsuch images are shown in Figure 6. In the \ufb01rst row, a pure white image is used as the cover, to\nexamine the visual effects of hiding a colorful secret image. This simple case was not encountered\nin training with ImageNet images. The second and third rows change the secret image to bright\npink circles and uniform noise. As can be seen, even though the container image (4th column)\ncontains only limited noise, the recovered secret image is extremely noisy. In the \ufb01nal two rows, the\ncover image is changed to circles, and uniform noise, respectively. As expected, the errors for the\nreconstruction of the cover and secret are now large, though the secret image remains recognizable.\n\n3.1 What if the original cover image became accessible?\n\nFor many steganographic applications, it can safely be assumed that access to the original cover\nimage (without the secret image embedded) is impossible for an attacker. However, what if the\noriginal cover image was discovered? What could then be ascertained about the secret image, even\nwithout access to the decoding network? In Figure 5, we showed the difference image between the\noriginal cover and the container with 5x enhancement \u2013 almost nothing was visible. We reexamine\n\n5\n\n\fFigure 6: Results with images outside the set of natural images.\n\nthe residual image at 5x, 10x, and 20x enhancement (with clipping at 255 where appropriate), see\nFigure 7. In the \ufb01rst row, note that the residual (at 20x) strongly resembles the cover image. In the\nsecond row, the residual is a combination of the cover and secret image, and in the third row, we see\nthe most troubling result \u2013 features of the secret image are revealed. (Recall that this happens only\nwhen the original, unperturbed image, is available for comparison). There are many standard methods\nfor obfuscation, such as adding cryptographic encodings of the secret image before embedding it\ninto the cover image. 
We demonstrate another avenue that can be used in conjunction with any other\napproach: modifying the network\u2019s error function.\nIn addition to the two error terms described, we add an error term that minimizes the pixel-wise\ncorrelation between the residual of the cover image and the secret image, corr(Rc, S), where Rc =\n||C \u2212 C\u2032|| and S is the secret image. Many weightings for this term were empirically tested. In the\nresults shown in Figure 7 (Bottom), it is scaled to approximately (0.15 \u00d7 number of pixels \u00d7 channels).\nMinimizing the residual\u2019s correlation with the secret image removed many of the secret image\u2019s\nfeatures from the residuals \u2013 even when 20x magni\ufb01cation is used. Naturally, the robustness and\nresilience come at a price; the quality of some of the reconstructions has decreased, as shown (see\nsaturation of reds in the \ufb01rst image).\n\n4 Where is the Secret Image Encoded?\n\nThe primary focus of this paper is to concretely demonstrate that it is possible to encode a large\namount of information in an image with limited visually noticeable artifacts. However, no explicit\nattempt has been made to actively hide the existence of that information from machine detection.\nThough we cannot expect to completely hide the fact that up to 1/2 of the information is part of a\nhidden message, measures can be taken to make it more dif\ufb01cult to discover. First, however, we must\ndetermine where the information of the secret image resides.\nIs the network simply hiding the information about the secret image in the least signi\ufb01cant bits of the\ncover image? Tools exist to seek out hidden information in the LSBs. 
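For reference, the classic LSB-substitution scheme that such tools target takes only a few lines. This is a generic sketch with k = 4 (matching the baseline of Section 3), not the method proposed in this paper:

```python
import numpy as np

def lsb_embed(cover: np.ndarray, secret: np.ndarray, k: int = 4) -> np.ndarray:
    """Replace the k LSBs of the cover with the k MSBs of the secret (uint8 arrays)."""
    return (cover & ~np.uint8(2**k - 1)) | (secret >> (8 - k))

def lsb_extract(container: np.ndarray, k: int = 4) -> np.ndarray:
    """Recover the secret's k MSBs; fill the lost low bits with their average value."""
    fill = np.uint8(2 ** (8 - k) // 2)          # midpoint of the missing bits' range
    return ((container & np.uint8(2**k - 1)) << (8 - k)) | fill

cover = np.random.randint(0, 256, (8, 8, 3), dtype=np.uint8)
secret = np.random.randint(0, 256, (8, 8, 3), dtype=np.uint8)
container = lsb_embed(cover, secret)
recovered = lsb_extract(container)
# With the midpoint fill, each channel is off by at most 2^(8-k)/2 = 8 intensities.
print(np.abs(recovered.astype(int) - secret.astype(int)).max() <= 8)  # True
```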
One such publicly available\nsteganalysis toolkit, StegExpose, was used to test the detectability of our hidden images [27\u201329].\nPer the description of the tool: \u201cStegExpose rating algorithm is derived from an intelligent and\nthoroughly tested combination of pre-existing pixel based steganalysis methods including Sample\nPairs by Dumitrescu (2003), RS Analysis by Fridrich (2001), Chi Square Attack by Westfeld (2000)\nand Primary Sets by Dumitrescu (2002)\u201d [27]. In addition to the default settings (threshold = 0.2),\nthe detection thresholds were varied throughout a large range. The ROC curve for StegExpose is\nshown in Figure 8. Note that it deviates little from random guessing (the green line).\nStegExpose should have been able to \ufb01nd the information if it were simply placed in the LSBs.\nWe turn to a second method to \ufb01nd where the information is stored. The images used in the study\n\n6\n\n\fFigure 7: Top 3 rows. If the original image is leaked and is subtracted from the container image, the\nresidual can be computed. With enough enhancement (20x), some of the secret image is revealed.\nBottom 3 rows: by explicitly creating an error term that minimized the correlation between the\nresidual and the secret image, the residual reveals less about the secret image; however, the pixel\nerrors for the container rise (note the less saturated colors in some of the red regions).\n\nFigure 8: ROC curves: True Positive Rate vs. False Positive\nRate for StegExpose when trying to detect images embedded\nvia the proposed method.\n\nare composed, at each pixel, of 24 bits (8 \u00d7 (R, G, B)). If we \ufb02ip the \ufb01rst bit of the R channel\nof all the pixels in the container image, we can measure its effect on the container image itself and\nalso, by propagating the modi\ufb01ed image through the reveal network, on the\nreconstruction of the secret image. 
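This probe is straightforward to script. The sketch below flips one bit-plane of a container and measures the per-channel error; the trained reveal network is the missing piece (feeding `probed` through it, rather than comparing containers, is what spreads the error across channels):

```python
import numpy as np

def flip_bit_plane(img: np.ndarray, channel: int, bit: int) -> np.ndarray:
    """Flip one bit position of one color channel for every pixel (uint8 image)."""
    out = img.copy()
    out[..., channel] ^= np.uint8(1 << bit)
    return out

def channel_errors(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Mean absolute error per color channel."""
    return np.abs(a.astype(int) - b.astype(int)).mean(axis=(0, 1))

container = np.random.randint(0, 256, (16, 16, 3), dtype=np.uint8)
probed = flip_bit_plane(container, channel=0, bit=7)   # flip the R channel's MSB

# On the container itself, the perturbation stays in the flipped channel and its
# size equals the bit's significance (2^7 = 128 here); only after passing `probed`
# through the trained reveal network does the error spread across all channels.
print(channel_errors(container, probed))   # [128., 0., 0.]
```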
The effects are striking, see Figure 9.\nIn Figure 9, the left half of the \ufb01gure should be considered the \u201ccontrol\u201d. The upper-left bar-chart\nshows that changing a bit in the Red channel of the container has an effect only on the\nred channel of the container, and that the magnitude is proportional to the signi\ufb01cance of the bit; this\nis exactly as expected and holds true for the Green and Blue channels as well (shown on the diagonal).\nMuch more interesting is the right side of Figure 9. Here, we see that a bit \ufb02ip in any bit position\nin any color channel of the container image has an effect across all color channels in the secret\n\n7\n\n\fFigure 9: Sensitivity to Bits in the Container Image. (Left:) Control case, the container image.\nChanging a bit in the container image obviously has no effect on other bits. (Right): However, that\nsame bit \ufb02ip in the container image has effects across all color bands in the recovered secret image.\nFor example, if we want to measure the effect of \ufb02ipping the 2nd bit in the Blue channel, we would\nlook at the marked entries (marked with *).\n\nimage\u2019s reconstruction. Moreover, the error does not follow the standard signi\ufb01cance ordering of the\nbit positions. In other words, the information for the secret image is spread across the color channels;\nhence the reason it was not detected earlier.\nDespite StegExpose not \ufb01nding the encodings, we suspected that this amount of information should\nbe readily discoverable. It has recently been demonstrated in multiple studies that deep networks\nare competitive with, and often better than, the more established methods of steganalysis that use\nhand-selected image features [12\u201314]. Interestingly, many published steganalysis techniques are not blind:\nthey assume that they are detecting hidden images only from known hiding methods and even that\naccess to the statistics about the cover image distribution is available. 
This makes the steganalysis\ntask easier while making the hiding task much more dif\ufb01cult. We will not discuss the real-world\nlimitations of that assumption here, but will follow that example.\nWe trained the steganalysis networks as binary classi\ufb01ers, using the unperturbed ImageNet images\nas negative samples, and our containers as positive examples. We discovered that the steganalysis\nnetworks performed well with 24 \u00d7 (N \u00d7 N ) binary input layers. Each of the 24 layers represented a\nbit in the pixel representation of the input image. The networks were trained in an analogous manner\nto others in this study; again a multitude of architectures and learning rates were attempted.\nThe fully convolutional networks were able to obtain between 90-95% combined classi\ufb01cation rates.\nFor comparison, the same networks were retrained to \ufb01nd Least-Signi\ufb01cant-Bit substitution (where\nthe L least signi\ufb01cant bits of each color channel in the cover image are replaced with the L most\nsigni\ufb01cant bits from the secret image). The networks were trained and tested with L = 1, 2, 3, 4; all\nachieved over 99% combined rates. An additional set of tests revealed that even when a single bit\nwas randomly placed in one of the 4 bit positions, the detection rate remained over 99%. Given these\ndetection rates on even a single bit, the high detection rates with a full image are not surprising.\nNonetheless, this leads to an immediate direction for future work. In this study, the steganalysis\nnetworks were trained and used after the complete encoding system was created. However, using\npre-trained and/or simultaneously trained steganalysis networks in an adversarial learning framework,\nsuch as Generative-Adversarial-Networks (GAN) [30], during the training of steganography nets\nprovides a method to incorporate an obfuscation based error metric. 
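As an implementation note, the 24-plane binary input described above is simple to construct. The sketch below is our own; in particular, the plane ordering (channel-major, LSB first) is our choice, as the text only specifies that each of the 24 layers is one bit of the pixel representation:

```python
import numpy as np

def to_bit_planes(img: np.ndarray) -> np.ndarray:
    """Decompose an (n, n, 3) uint8 image into 24 binary (n, n) planes.

    Ordering is channel-major, least significant bit first -- an arbitrary
    but fixed convention for this illustration.
    """
    planes = [(img[..., c] >> b) & 1 for c in range(3) for b in range(8)]
    return np.stack(planes).astype(np.uint8)   # shape (24, n, n)

img = np.random.randint(0, 256, (8, 8, 3), dtype=np.uint8)
print(to_bit_planes(img).shape)   # (24, 8, 8)
```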
The adversary provides a\nsupplemental error signal based upon the discoverability of the hidden message that is minimized in\naddition to the reconstruction errors. The authors of [31] have recently successfully explored a very\nsimilar procedure with small messages hidden in images.\n\n5 Discussion & Future Work\n\nIn this section, we brie\ufb02y discuss a few observations found in this study and present ideas for future\nwork. First, let\u2019s consider the possibility of training a network to recover the hidden images after the\nsystem has been deployed and without access to the original network. One can imagine that if an\n\n8\n\n\fattacker was able to obtain numerous instances of container images that were created by the targeted\nsystem, and if, in each instance, at least one of the two component images (cover or secret image) was\nalso given, a network could be trained to recover both constituent components. What can an attacker\ndo without having access to this ground-truth \u201ctraining\u201d data? Using a smoothness constraint or other\ncommon heuristic from more classic image decomposition and blind source separation [32\u201334] may\nbe a \ufb01rst alternative. With many of these approaches, obtaining even a modest amount of training\ndata would be useful in tuning and setting parameters and priors. If such an attack is expected, it\nremains open to further research how much the techniques described in Section 3.1 can be adapted to\nmitigate the effectiveness of these attempts.\nAs described in the previous section, in its current form, the correct detection of the existence (not\nnecessarily the exact content) of a hidden image is indeed possible. The discovery rate is high because\nof the amount of information hidden compared to the cover image\u2019s data (1:1 ratio). This is far\nmore than state-of-the-art systems that transmit reliably undetected messages. 
We presented one of\nmany methods to make it more dif\ufb01cult to recover the contents of the hidden image by explicitly\nreducing the similarity of the cover image\u2019s residual to the hidden image. Though beyond the scope\nof this paper, we can make the system substantially more resilient by supplementing the presented\nmechanisms as follows. Before hiding the secret image, the pixels are permuted (in-place) in one\nof M previously agreed upon ways. The permuted-secret-image is then hidden by the system, as is\nthe key (an index into M). This makes recovery dif\ufb01cult even by looking at the residuals (assuming\naccess to the original image is available) since the residuals have no spatial structure. The use of\nthis approach must be balanced with (1) the need to send a permutation key (though this can be sent\nreliably in only a few bytes), and (2) the fact that the permuted-secret-image is substantially more\ndif\ufb01cult to encode, thereby potentially increasing the reconstruction errors throughout the system.\nFinally, it should be noted that in order to employ this approach, the trained networks in this study\ncannot be used without retraining. The entire system must be retrained as the hiding networks can no\nlonger exploit local structure in the secret image for encoding information.\nThis study opens a new avenue for exploration with steganography and, more generally, in placing\nsupplementary information in images. Several previous methods have attempted to use neural networks\nto either augment or replace a small portion of an image-hiding system. We have demonstrated\na method to create a fully trainable system that provides visually excellent results in unobtrusively\nplacing a full-size, color image into another image. Although the system has been described in the\ncontext of images, the same system can be trained for embedding text, different-sized images, or\naudio. 
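The permutation defense described above amounts to a keyed pixel shuffle shared by sender and receiver. A minimal sketch follows; here the key directly seeds the permutation, standing in for an index into the M agreed-upon orderings:

```python
import numpy as np

def permute(img: np.ndarray, key: int) -> np.ndarray:
    """Shuffle pixels according to a keyed permutation (illustrative key handling)."""
    n = img.shape[0] * img.shape[1]
    order = np.random.default_rng(key).permutation(n)
    return img.reshape(n, -1)[order].reshape(img.shape)

def unpermute(img: np.ndarray, key: int) -> np.ndarray:
    """Invert permute() given the same key."""
    n = img.shape[0] * img.shape[1]
    order = np.random.default_rng(key).permutation(n)
    flat = np.empty_like(img.reshape(n, -1))
    flat[order] = img.reshape(n, -1)   # scatter pixels back to their original slots
    return flat.reshape(img.shape)

secret = np.random.randint(0, 256, (8, 8, 3), dtype=np.uint8)
scrambled = permute(secret, key=3)     # what the system would hide instead of `secret`
print(np.array_equal(unpermute(scrambled, key=3), secret))   # True
```

Because the scrambled image has no spatial structure, any residual that leaks it is equally structureless, which is the point of the defense.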
Additionally, by using spectrograms of audio \ufb01les as images, the techniques described here\ncan readily be used on audio samples.\nThere are many immediate and long-term avenues for expanding this work. Three of the most\nimmediate are listed here. (1) To make a complete steganographic system, hiding the existence of the\nmessage from statistical analyzers should be addressed. This will likely necessitate a new objective in\ntraining (e.g. an adversary), as well as, perhaps, encoding smaller images within large cover images.\n(2) The proposed embeddings described in this paper are not intended for use with lossy image \ufb01les.\nIf lossy encodings, such as JPEG, are required, then working directly with the DCT coef\ufb01cients instead\nof the spatial domain is possible [35]. (3) For simplicity, we used a straightforward SSE error metric\nfor training the networks; however, error metrics more closely associated with human vision, such as\nSSIM [24], can be easily substituted.\n\nReferences\n[1] Gary C Kessler and Chet Hosmer. An overview of steganography. Advances in Computers, 83(1):51\u2013107,\n\n2011.\n\n[2] Gary C Kessler. An overview of steganography for the computer forensics examiner. Forensic Science\n\nCommunications, 6(3), 2014.\n\n[3] Gary C Kessler. An overview of steganography for the computer forensics examiner (web), 2015.\n\n[4] Jussi Parikka. Hidden in plain sight: The steganographic image.\nhttps://unthinking.photography/themes/fauxtography/hidden-in-plain-sight-the-steganographic-image,\n2017.\n\n[5] Jessica Fridrich, Jan Kodovsk\u00fd, Vojt\u011bch Holub, and Miroslav Goljan. Breaking HUGO \u2013 the process discovery.\n\nIn International Workshop on Information Hiding, pages 85\u2013101. Springer, 2011.\n\n9\n\n\f[6] Jessica Fridrich and Miroslav Goljan. Practical steganalysis of digital images: State of the art. In Electronic\n\nImaging 2002, pages 1\u201313. 
International Society for Optics and Photonics, 2002.

[7] Hamza Ozer, Ismail Avcibas, Bulent Sankur, and Nasir D Memon. Steganalysis of audio based on audio quality metrics. In Electronic Imaging 2003, pages 55–66. International Society for Optics and Photonics, 2003.

[8] Farzin Yaghmaee and Mansour Jamzad. Estimating watermarking capacity in gray scale images based on image complexity. EURASIP Journal on Advances in Signal Processing, 2010(1):851920, 2010.

[9] Jessica Fridrich, Miroslav Goljan, and Rui Du. Detecting LSB steganography in color and gray-scale images. IEEE Multimedia, 8(4):22–28, 2001.

[10] Abdelfatah A Tamimi, Ayman M Abdalla, and Omaima Al-Allaf. Hiding an image inside another image using variable-rate steganography. International Journal of Advanced Computer Science and Applications (IJACSA), 4(10), 2013.

[11] Tomáš Pevný, Tomáš Filler, and Patrick Bas. Using high-dimensional image models to perform highly undetectable steganography. In International Workshop on Information Hiding, pages 161–177. Springer, 2010.

[12] Yinlong Qian, Jing Dong, Wei Wang, and Tieniu Tan. Deep learning for steganalysis via convolutional neural networks. In SPIE/IS&T Electronic Imaging, pages 94090J–94090J. International Society for Optics and Photonics, 2015.

[13] Lionel Pibre, Jérôme Pasquet, Dino Ienco, and Marc Chaumont. Deep learning is a good steganalysis tool when embedding key is reused for different images, even if there is a cover source mismatch. Electronic Imaging, 2016(8):1–11, 2016.

[14] Lionel Pibre, Jérôme Pasquet, Dino Ienco, and Marc Chaumont. Deep learning for steganalysis is better than a rich model with an ensemble classifier, and is natively robust to the cover source-mismatch. arXiv preprint arXiv:1511.04855, 2015.

[15] Sabah Husien and Haitham Badi. Artificial neural network for steganography.
Neural Computing and Applications, 26(1):111–116, 2015.

[16] Imran Khan, Bhupendra Verma, Vijay K Chaudhari, and Ilyas Khan. Neural network based steganography algorithm for still images. In Emerging Trends in Robotics and Communication Technologies (INTERACT), 2010 International Conference on, pages 46–51. IEEE, 2010.

[17] V Kavitha and KS Easwarakumar. Neural based steganography. PRICAI 2004: Trends in Artificial Intelligence, pages 429–435, 2004.

[18] Alexandre Santos Brandao and David Calhau Jorge. Artificial neural networks applied to image steganography. IEEE Latin America Transactions, 14(3):1361–1366, 2016.

[19] Robert Jarušek, Eva Volna, and Martin Kotyrba. Neural network approach to image steganography techniques. In Mendel 2015, pages 317–327. Springer, 2015.

[20] Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, and Pierre-Antoine Manzagol. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11(Dec):3371–3408, 2010.

[21] Anthony J Bell and Terrence J Sejnowski. The "independent components" of natural scenes are edge filters. Vision Research, 37(23):3327–3338, 1997.

[22] Geoffrey E Hinton and Ruslan R Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006.

[23] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR, 2015.

[24] Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.

[25] Andrew B Watson. DCT quantization matrices visually optimized for individual images. In Proc.
SPIE, 1993.

[26] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael S. Bernstein, Alexander C. Berg, and Fei-Fei Li. Imagenet large scale visual recognition challenge. CoRR, abs/1409.0575, 2014.

[27] Benedikt Boehm. StegExpose - a tool for detecting LSB steganography. CoRR, abs/1410.6656, 2014.

[28] StegExpose - GitHub. https://github.com/b3dk7/StegExpose.

[29] darknet.org.uk. StegExpose - steganalysis tool for detecting steganography in images. https://www.darknet.org.uk/2014/09/stegexpose-steganalysis-tool-detecting-steganography-images/, 2014.

[30] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.

[31] Jamie Hayes and George Danezis. ste-gan-ography: Generating steganographic images via adversarial training. arXiv preprint arXiv:1703.00371, 2017.

[32] J-F Cardoso. Blind signal separation: statistical principles. Proceedings of the IEEE, 86(10):2009–2025, 1998.

[33] Aapo Hyvärinen, Juha Karhunen, and Erkki Oja. Independent Component Analysis, volume 46. John Wiley & Sons, 2004.

[34] Li Shen and Chuohao Yeo. Intrinsic images decomposition using a local and global sparse representation of reflectance. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 697–704. IEEE, 2011.

[35] Hossein Sheisi, Jafar Mesgarian, and Mostafa Rahmani. Steganography: DCT coefficient replacement method and compare with Jsteg algorithm. International Journal of Computer and Electrical Engineering, 4(4):458, 2012.