{"title": "Deep, complex, invertible  networks for inversion of transmission effects in multimode optical fibres", "book": "Advances in Neural Information Processing Systems", "page_first": 3280, "page_last": 3291, "abstract": "We use complex-weighted, deep networks to invert the effects of multimode optical fibre distortion of a coherent input image. We generated experimental data based on collections of optical fibre responses to greyscale input images generated with coherent light, by measuring only image amplitude  (not amplitude and phase as is typical) at the output of \\SI{1}{\\metre} and \\SI{10}{\\metre} long, \\SI{105}{\\micro\\metre} diameter multimode fibre. This data is made available as the {\\it Optical fibre inverse problem} Benchmark collection. The experimental data is used to train complex-weighted models with a range of regularisation approaches. A {\\it unitary regularisation} approach for complex-weighted networks is proposed which performs well in robustly inverting the fibre transmission matrix, which fits well with the physical theory. A key benefit of the unitary constraint is that it allows us to learn a forward unitary model and analytically invert it to solve the inverse problem. We demonstrate this approach, and show how it can improve performance by incorporating knowledge of the phase shift induced by the spatial light modulator.", "full_text": "Deep, complex, invertible networks for inversion of\n\ntransmission effects in multimode optical \ufb01bres\n\nOis\u00edn Moran,1 Piergiorgio Caramazza,2 Daniele Faccio,2 Roderick Murray-Smith1,*\n\n1School of Computing Science, University of Glasgow, Scotland.\n\noisin@inscribe.ai, Roderick.Murray-Smith@glasgow.ac.uk,\n2School of Physics and Astronomy, University of Glasgow, Scotland.\n\npiergiorgio.caramazza@gmail.com, Daniele.Faccio@glasgow.ac.uk\n\nAbstract\n\nWe use complex-weighted, deep networks to invert the effects of multimode optical\n\ufb01bre distortion of a coherent input image. 
We generated experimental data based\non collections of optical \ufb01bre responses to greyscale input images generated with\ncoherent light, by measuring only image amplitude (not amplitude and phase as is\ntypical) at the output of 1 m and 10 m long, 105 \u00b5m diameter multimode \ufb01bre. This\ndata is made available as the Optical \ufb01bre inverse problem Benchmark collection.\nThe experimental data is used to train complex-weighted models with a range\nof regularisation approaches. A unitary regularisation approach for complex-\nweighted networks is proposed which performs well in robustly inverting the \ufb01bre\ntransmission matrix, which is compatible with the physical theory. A bene\ufb01t of\nthe unitary constraint is that it allows us to learn a forward unitary model and\nanalytically invert it to solve the inverse problem. We demonstrate this approach,\nand outline how it has the potential to improve performance by incorporating\nknowledge of the phase shift induced by the spatial light modulator.\n\n1\n\nIntroduction\n\nThe ability to better transmit images over multimode \ufb01bre (MMF) has applications in medicine,\ncryptography and communications in general. However, as pointed out by Stasio [2017], MMFs are\nnot normally utilised for imaging because they do not act as a relay optical element. This makes their\nuse for focussing and imaging impractical, without sophisticated compensation for their transmission\nproperties. In this paper we demonstrate that a deep network combining multiple complex dense\nlayers with orthogonal regularisation and conventional autoencoders can successfully invert speckle\nimages generated over a 10 m long distorted 105 \u00b5m MMF without phase information. Previous work\nhas either required phase information, or been limited to short (e.g. 
30 cm) and straight fibres.\n\n1.1 Deep learning challenges\n\nThis particular imaging application has a number of features which are potentially challenging for the\nmachine learning community. The speckle images used as inputs have a non-local relationship with\nthe pixels in the inverted image, making local patch-based approaches impossible, and leading to\nchallenging memory problems. The statistics of speckle images are very different from typical images,\nmeaning that \u2018off-the-shelf\u2019 deep convolutional network approaches, which assume locally-spatial\nstructure, cannot be applied directly. There are clear circular correlations in the images, but finding\nthese requires solving the inversion, leading to a chicken-and-egg problem. The non-locality also\nmeans that position-invariant approaches like convolution layers and max-pooling should not be\napplied before having brought the image back to an appropriate spatial arrangement.\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00e9al, Canada.\n\nFrom our knowledge of the optics, we know that the transformation of the image to speckles can\nbe represented by a complex-valued, orthogonal transmission matrix T, but this matrix will be very\nlarge (e.g. for 350 \u00d7 350 resolution images in and out, T will have 350^4 \u2248 15 billion entries). The\nlonger and more distorted the optical fibre, the smaller the speckles at the output, so the higher the\ncamera resolution required to capture the details.\nFor instrumentation convenience and cost in real-world applications, we wish to see how far we can\ngo with amplitude-only sensing, i.e. we do not want to have the complexity of using an interferometer,\nwhich means we must assume no phase information for the speckle outputs. 
This means our model\n\ufb01tting of the transmission process is signi\ufb01cantly underconstrained.\n\n2 Background\n\n2.1 Modelling optical \ufb01bre transmission\n\nThe transmission of optical information by means of a guiding system has always been of paramount\ninterest, e.g.\nfor medical and communication systems. One particular challenge has been the\nrealisation of ever thinner waveguides able to transmit actual images. This could be decisive whenever\nwe want to obtain images of otherwise inaccessible zones, for example due to limitations related to\nlight depth penetration. However, when a wave is con\ufb01ned its propagation is constrained to follow a\nprecise mathematical relationship. In other words, the con\ufb01ning system permits only certain possible\nwave propagations, called \u2018modes\u2019. These constitute an orthogonal basis for the space of all the\npossible solutions of the propagation.\nSingle-mode \ufb01bres can be considered for the transmission of images, but as a single-mode waveguide\ncan carry just a single channel of amplitude information (and a phase one, since light propagation is\ncomplex), an image would require either a single-mode \ufb01bre bundle or a scanning system [Seibel\net al., 2006]. To date, the minimum diameter size for both of these systems is around 1 mm.\nMultimode \ufb01bres (MMF) have clear advantages because, as pointed out in [Choi et al., 2012b], the\ndensity of modes for unit of area is 1\u20132 orders of magnitude greater than that of a \ufb01bre bundle.\nMoreover, whenever using coherent light, single-mode \ufb01bres tend to couple between each other if\ntoo close (i.e. light crosses over from one \ufb01bre to the others), blurring the image and reducing its\ncontrast. 
The challenge with multi-mode \ufb01bres is that since different modes propagate with different\nvelocities in these types of \ufb01bre, interference between these generates a scrambled image, as shown\nin Figures 1 and 3.\nIn principle, the ability to perfectly model the transmission through an optical \ufb01bre would allow\nimaging through MMFs. The modes of the \ufb01bre form an orthogonal basis for the \ufb01bre, which can\nbe considered a linear system. If for each input (mode), the output can be calculated, the transfer\nfunction of this linear system can be constructed, allowing us to infer an unknown input for a\ngiven system output. However, in real optical \ufb01bres, mechanical deformations of the \ufb01bre, surface\nroughness, bending and even extremely small temperature variations lead to mode-to-mode coupling\nand changes in the refractive index, which makes it very challenging to predict the output \ufb01eld\nanalytically or numerically. Nonetheless, however complex, the actual propagation still remains\nlinear and deterministic. Therefore, similarly to complex random media, it has been demonstrated\nthat it is possible to build up an input orthogonal basis such that, by acquiring the output amplitude\nand phase relative to each mode, we can in principle, identify the transmission matrix (TM), T , that\nmaps the input into the output [ \u02c7Ci\u017em\u00e1r and Dholakia, 2011, 2012].1\nOther techniques have achieved scanner-free endoscopy reconstruction (on 1m long \ufb01bre). 
Choi et al.\n[2012a] present an empirical approach but based on measurement of both amplitude and phase of\nthe output light \ufb01eld, and requiring 500 measurement repetitions at different incident angles, and\nimproved the resolution [Mahalati et al., 2013] with just single-pixel measurements relative to a\nsequence of random illuminations, and implemented phase conjugation [Papadopoulos et al., 2012]\nthat allows the restoring process to proceed without calculating the full TM. Finally, a step further\nwas the realisation of a procedure that could include possible bends in the \ufb01bre. This was realised in\n\n1Note the non-locality of this system\u2014the ability to project a focused spot in one speci\ufb01c position of the\noutput relies on the knowledge of the whole matrix T , and would require inputs from all around the input image.\nThis will create challenges when we apply memory-intensive deep networks to this problem. See Figure 1.\n\n2\n\n\fFigure 1: Image of an input of single pixels and their speckle images at the output. Note how a single\npixel generates a response over the full space of the output sensor, and at much \ufb01ner detail than the\nresolution of the pixel. Also, the amplitude of output of the two-pixel example is not the sum of\namplitudes of the single pixel responses, due to complex-valued interactions of modes in the \ufb01bre.\n\n[Pl\u00f6schner et al., 2015] by having a precise characterisation of the \ufb01bre and an accurate theoretical\nmodel, for straight lengths up to 30 cm.2\n\n2.2 Related machine learning background\n\n2.2.1 Complex deep networks\n\nHirose [2003, 2012] reviews the long history of the \ufb01eld of complex neural networks. 
More recent\nwork includes [Tygert et al., 2016], and [Trabelsi et al., 2018], the latter of which provides and\nrigorously tests the key atomic components for complex-valued deep neural networks\u2014including\ncomplex convolutions, complex batch-normalisation, complex weight initialisation, and complex\nactivation functions. Guberman [2016] investigates the dif\ufb01culties in training complex-valued models\ndue to the lack of order over the complex \ufb01eld and \ufb01nds them signi\ufb01cantly less vulnerable to\nover-\ufb01tting.\n\n2.3\n\nInversion\n\nThere are two approaches to using machine learning to solve the inverse problem. We can identify\nthe \u2018causal\u2019 or \u2018forward\u2019 model from image to speckle, then numerically invert the forward model\nand optimise to \ufb01nd the input most likely to have generated that image. The alternative we test in this\npaper is whether we can directly learn an inverse model. This approach has been used in the past in a\nvariety of applications, e.g. control [Cabrera and Narendra, 1999], and single-pixel imaging [Higham\net al., 2018], and the two approaches can be combined as illustrated in human motor control [Wolpert\nand Kawato, 1998]. There has been growing interest in invertible neural networks recently [Dinh\net al., 2016, Ardizzone et al., 2018, Grathwohl et al., 2018]. In this paper we explore both forward\nand inverse modelling approaches to complex-valued inversion.\n\n3 Experimental setup\n\nThe experiment is carried out with the setup illustrated in Figure 2. In order to generate greyscale\nimages, a continuous wave (CW) laser source, with wavelength \u03bb = 532 nm, is used. 
The laser light\n\n2After this NeurIPS submission, [Borhani et al., 2018, Fan et al., 2018] published deep, convolutional\nlearning encoders inferring images from the speckle patterns, and demonstrate handwritten digit reconstruction.\nAs pointed out in [Borhani et al., 2018], these approaches are more limited to images from the training classes.\nWhile Rahmani et al. [2018] do show simple binary images from outside the training set as examples of \u2018transfer\nlearning\u2019, there is still limited evidence of true general, detailed imaging outside the training set.\n\n3\n\n\fFigure 2: Experimental con\ufb01guration\n\nbeam is modulated in amplitude by means of a phase-only spatial light modulator (SLM) along with\na polarised beam-splitter (PBS) and a half-wave plate (\u03bb/2) in the con\ufb01guration reported in Figure 2.\nThe SLM is controlled by a computer which generates the hologram that will be projected onto\nit. In this way, the resulting greyscale image, with values ranging from 0 to 100, is coupled into a\nmultimode step-index \ufb01bre by means of a doublet collimator lens (C1). At the output of the \ufb01bre,\nanother collimator (C2) along with a second lens (L) are used to image the output of the \ufb01bre onto a\nCMOS camera. The data acquired with the camera are digitised in the range from 0 to 255. Moreover,\nthe pixel resolution of a single acquisition is 350 \u00d7 350. Our \ufb01eld of view is reduced by the output\nlenses. However, in principle, it is possible to remove the lenses and expect the neural network to\ninclude the free-air propagation operator, from the \ufb01bre output to the camera, in its learning process\nas well. For this experiment two multimode \ufb01bre (core diameter = 105 \u00b5m and NA = 0.22) with\ndifferent lengths, respectively 1 m and 10 m, have been used. The sensor noise has an average error\nof ca. 1% of total signal per pixel.\nThe input images have a resolution of 28 \u00d7 28 pixels. 
Several different image datasets were used:\nMNIST [Lecun et al., 1998], Fashion-MNIST [Xiao et al., 2017] and random images. MNIST is\nthe common dataset containing ten classes of handwritten digits (0 to 9), whereas Fashion-MNIST\nconsists of ten different classes of clothing. As introduced before, all input image values are in the\nrange 0 to 100. Figure 3 gives examples of the input and output images.\nThe acquisition procedure is very straightforward: once the first hologram is loaded on the SLM, an\nimage is captured at the output of the fibre. Then, we move to the second hologram and so on for the\nentire training and testing dataset. Finally, we want to point out that our experiment does not rely\non any phase acquisition, so the training and testing processes are implemented with just amplitude\nimages. In the same way, no scanning procedure has been applied.\n\nFigure 3: Examples of input and output (speckle) images from the experiment. The inference task is\nthen to regenerate the input image from the speckle image.\n\n3.1 Dataset description\n\nThe dataset generated with the experimental equipment is composed of several public sets, including:\nMNIST [Lecun et al., 1998], Fashion-MNIST [Xiao et al., 2017], and some images of Muybridge\u2019s\nhistoric stop-motion films [Muybridge, 1955]. Examples of input image and speckle response are\nshown in Figure 3. Further, we generated datasets composed of $N_i^2$ binary Hadamard bases, and\n60,000 random binary images.\n\n3.1.1 Optical fibre inverse problem Benchmark collection\n\nWe share this dataset of 90,000 images repeated at 4 fibre lengths. The images are acquired at\nfibre lengths of 1 m and 10 m. Input images are at 28 \u00d7 28 pixel resolution for compatibility with the\nwidely used MNIST and Fashion-MNIST benchmarks. Speckle images are recorded at 350 \u00d7 350\npixels. 
We provide accompanying code and sample models, with the intention that this can be used\nas a benchmark for this type of inference, for both the machine learning and optics communities.\n\n4 Models used\n\nOur approach was to transform the speckle image back towards the original image space with a\ncomplex layer, as in Figure 4,3 followed by a denoising autoencoder which can compensate for\nimperfect reconstruction of the target image. Table 1 outlines the model performances.\n\nFigure 4: Inversion pipeline. A speckled image is transformed to the inferred input image that\ngenerated it, via an initial complex affine transformation.\n\n4.1 Nonlinear compensation for SLM effects and laser power drop-off\n\nThe spatial light modulator (SLM) in the configuration shown in Figure 2 introduces nonlinear\neffects as a function of pixel value to intensity and phase (Figure 5). We characterise the intensity\nusing a photodiode (after the PBS). The phase is obtained by splitting the beam reflected by the SLM\n(after the PBS) in two and making the beams interfere, fixing the pixel value of one of the beams and\nvarying the pixel value of the other one between 0 and 100, and measuring the result with a camera. The\nintensity of the laser has a roughly Gaussian decay as we move away from the focal point, requiring\nanother nonlinear layer to learn this effect. We use a general Hadamard elementwise multiplication\nlayer to capture these drop-off effects and any scaling issues needed in the unitary regularisation case.\n\nFigure 5: Optical nonlinearities. Graphs of Experimental Intensity and Phase mappings induced by\nSpatial-light modulator as a function of pixel value. 
Right, parameters of the Hadamard layer show\nthe learned function of intensity drop-off of the laser.\n\n3The approach used is monochrome, but RGB images can be communicated as monochrome channels\nthrough the complex transformation, and first integrated as RGB channels in later layers.\n\n4.2 Complex dense layer for inverting T\n\nAs described earlier, we know the transmission matrix T is complex-valued, so implementation of\nthe initial dense layer of the network with complex-valued weights is appropriate. Trabelsi et al.\n[2018] investigated the usage of various different complex-valued activation functions, namely:\nComplex ReLU or CReLU,4 modReLU [Arjovsky et al., 2016], and zReLU [Guberman, 2016] and\nfound that models using the CReLU function had a lower classification error on all three tested\ndatasets: CIFAR-10, CIFAR-100 and SVHN* (a reduced training set of SVHN). Following this\nwe decided to focus on activation functions with separable real and imaginary parts, focusing in\nparticular on complex-valued analogues of the standard variants of the ReLU, such as the PReLU\n[He et al., 2015]. This is convenient, as if the complex numbers are represented as two real-valued\nchannels these complex-valued activation functions are easily computable by performing a standard\nreal-valued activation on each channel separately. 
The specific complex-weighted network used in\nthis paper was implemented with Keras [Chollet et al., 2015] and TensorFlow [Abadi et al., 2015].\nThis layer tends to be a memory-expensive element, scaling as $O(N_i^2 N_o^2)$ and, due to the non-local\nnature of the inversion process, we cannot work on rectangular sub-patches of the speckle images.\n\n4.3 Weight regularisation approaches\n\n4.3.1 Unitary, complex dense layer regularisation\n\nOrthogonal regularisation of weights was introduced in [Brock et al., 2016] to ensure an efficient use\nof the representational capability of the layer. Our motivation is related to the physics of the problem,\nwhich suggests that the transmission channel can be represented by a limited number of orthogonal\nmodes. For the fibre used in this experiment there are 9000 modes. We therefore extend Brock et al.\u2019s\napproach to the complex-valued domain, by proposing a unitary regularisation, where the complex\nweight matrix W of the complex layer is pushed towards a unitary matrix for $W \\in \\mathbb{C}^{m \\times m}$, or more\ngenerally toward a semi-unitary matrix for rectangular W, by a regularising term\n\n$L_{\\text{unitary}}(W) = \\lVert W W^{*} - I \\rVert_1$ for $W \\in \\mathbb{C}^{m \\times n}$, (1)\n\nwhere $W^{*}$ is the conjugate transpose of W.\n\n4.3.2 Amplitude and phase weight regularisation for complex layer\n\nAs part of the implementation of the complex-weighted dense layer, we can choose different options\nfor weight regularisation. One such option is amplitude & phase regularisation, implemented as\na weighted sum of amplitude and phase penalisation terms with two parameters $\\alpha_r$ and $\\alpha_\\phi$ to\ntrade off phase and amplitude penalties: $L(W, \\alpha_r, \\alpha_\\phi) = \\sum_i \\alpha_r |W_i|^2 + \\alpha_\\phi \\angle W_i$. 
Amplitude-only\nregularisation ($\\alpha_\\phi = 0$) is equivalent to the standard per-channel l2 penalisation applied to both\nchannels independently: $L(W) = \\sum_i (\\Re\\{W_i\\}^2 + \\Im\\{W_i\\}^2) = \\sum_i |W_i|^2$. Phase regularisation could\nbe useful for tasks with a target phase, or where something is known about the phase characteristics\nof the system involved. However, it is not explored in this paper.\n\n4.4 Inversion of the forward function\n\nA complex square matrix W is unitary if its conjugate transpose $W^{*}$ is its inverse, so if we enforce\nunitarity in the complex dense layer, we automatically have an analytic inverse at negligible\ncomputational cost, and we can directly optimise the parameters in both a forward and inverse manner, as\nshown in Figure 6. We do this by creating an autoencoder-like structure for the forward model, going\nfrom image to speckle by multiplying by our estimate of complex matrix T, and then back to the\noriginal image again via its inverse $T^{*}$. This has the advantage of being able to incorporate input\nphase information into the optimisation of the model parameters for the forward path. As discussed\nin section 4.1, the SLM induces a phase shift which is a nonlinear function of pixel amplitude, as well\nas a nonlinear modulation of the pixel amplitude itself. Future physical experiments could use this\napproach to explicitly manipulate phase information on the input to better identify T, without having\nto measure output phase on the speckles. The inverse path goes from speckle to image, but has no\nphase information (although there is the option of including the inferred phase from the forward path,\nshown by a dotted line in Figure 6). This model was implemented in Keras/TensorFlow, and produced\nsuccessful inversions when trained simultaneously in forward/backward directions. 
As our GPU\nmemory was insuf\ufb01cient to allow more than 56 \u00d7 56 transmission matrices in this more complex\nmodel, the results did not improve over the direct inverse method at the higher 112 pixel resolution.\n\n4Not to be confused with the Concatenated ReLU introduced in [Shang et al., 2016]\n\n6\n\n\fFigure 6: Model structure of the forward and inverse model. Cyan outputs are those for which we\nhave target training data. Coloured layers indicate tied weights between associated forward and\ninverse layers. During training, inferred speckle phases from the forward model can potentially be\nused to augment training via the inverse model. Controlling image phase input (much easier than\nsensing phase output) can also augment learning in this approach.\n\n5 Experimental results\n\nIn our experimental analysis we \ufb01rst compare approaches to invert the complex transmission matrix,\nthen investigate how we can further re\ufb01ne these images. All results here are on test data not used\nduring training. We split the data into 80% for training, 20% for test.\n\n5.1\n\nInversion of the transmission matrix\n\nHere we compare a real-valued baseline with a complex-valued dense layer with and without l2\nweight regularisation and with unitary regularisation. We also used a multiscale approach where\nmultiple average pooling layers after the complex dense layer fed into an output vector composed\nof a pyramid of halving resolutions of the target images (28 \u00d7 28, 14 \u00d7 14 and 7 \u00d7 7). The results\nare summarised in Figure 7 and Table 1. The complex-weighted models consistently learn faster\nand to a better accuracy than real-weighted ones. The evidence for differences among the various\nregularisation techniques is not strong for this dataset. 
Although there is some numerical difference\nin MSE, the visual difference in Figure 7 is minimal.\n\n(a) No regularisation\n\n(b) l2 weight regularisation\n\n(c) Unitary regularisation\n\nFigure 7: The impact of regularisation on inference by a single complex-valued layer with various\nregularisation methods. Speckle images of 112 \u00d7 112 resolution were used on 1m data.\n\nTable 1: Model comparisons where each single layer model with 19,669,776 parameters (9,835,280\nfor real-valued models) was trained for 300 epochs, or until convergence, on a speckle resolution of\n112 \u00d7 112, with \u03bb = 0.03 where the model was l2 regularised.\n\nModel name | Regularisation | MSE\nReal-valued | l2 weight regularisation | 1034.62\nReal-valued | None | 1025.96\nComplex | Unitary regularisation | 989.28\nComplex | Multiscale no regularisation | 988.10\nComplex | l2 weight regularisation | 962.33\nComplex | None | 960.31\n\n5.1.1 Comparison of MSE and SSIM results\n\nWhile a mean squared error cost function is commonly used, there is often a mismatch between MSE\nand the perceived quality of an image. Zhao et al. [2015] provides a more in-depth analysis of using\nperceptually-based loss functions in neural networks. 
We compared the perceptually-motivated SSIM\n[Wang et al., 2004] with MSE and found that for poorer quality models it denoised somewhat, but did\nnot add much subjective value to the optimised models for this application.5\n\n5.1.2 Impact of speckle resolution\n\nWe evaluate the impact of varying speckle image resolution for fixed target image resolution (28 \u00d7 28).\nWe test on speckle inputs of 14, 28, 56, 112, & 224 square, shown in Figure 8. Inversion was by a\nsingle complex-valued layer minimising MSE, with l2 amplitude weight regularisation. The quality\nof the final estimate of the inverted image increases steadily with increasing speckle information, even\nwhen at 4 or 8 times the target image resolution. This is critical for inversion of longer transmission\nfibres, as the size of speckles decreases the longer and more distorted the optical fibre.\n\nFigure 8: Impact of speckle resolution on quality of inferred image. From left to right we test on\nspeckle image inputs of $N_i$ = 14, 28, 56, 112, and 224 pixels. This highlights the challenge to the\nmachine learning community: increasing resolution far beyond the target resolution will improve\nthe accuracy of the inverted images (up to the capacity of the fibre), but these cause problems for a\nstraightforward application of the complex-weighted transformation, as the memory requirements\nscale $O(N_i^2 N_o^2)$.\n\n5We used the DSSIM implementation in the keras-contrib package. https://github.com/keras-team/keras-contrib\n\n5.1.3 Comparison of 1m and 10m results\n\nWhile transmission of arbitrary images over 1m of bent fibre is already significantly longer than\nprevious comparable work in the optics community, our results using a 10m fibre go an order of\nmagnitude beyond that and approach the realm of communications. 
Figure 9 shows stills from a\nvideo from [Muybridge, 1955] at 1m and 10m, highlighting generalisation to content quite different\nfrom the MNIST and fashion training data.\n\nFigure 9: Comparison of inverse performance for 1m (upper) and 10m (lower) \ufb01bres, showing input,\nspeckle image, inferred output. Note changes in speckle patterns for the longer distance\u2014an increase\nin number, and decrease in size of speckles. Inverted images become noisier with increasing distance.\n\nThe outcome of the inverse transformation has inevitable errors\u2014distributed speckled noise and\nsystematic errors due to the missing phase information and the low precision of the SLM. Further\nimprovements in speci\ufb01c domains which could be characterised in an appropriate training collection,\ncould be gained by subsequent layers including, for example, the use of denoising convolutional\nautoencoder [Vincent et al., 2008]. Such denoising will become more valuable for longer distance\n\ufb01bres, where the losses in quality of the inversion from the complex layer become more signi\ufb01cant.\n\n6 Conclusions\n\nWe have presented an application of deep, complex-weighted networks to create the best known\nresults on inversion of optical \ufb01bre image with a non-scanning, interferometer-free approach to\nsensing. This allows direct analogue image transfer over bent optical \ufb01bre at distances not achieved\nbefore without measurement of phase information. One concrete advantage of this approach is that it\nallows real-time video rate analogue image transmission along very thin (105 \u00b5m) multi-mode \ufb01bre\n(scanning approaches would take N 2 longer to communicate each N \u00d7 N image).\nWe contribute the Optical \ufb01bre inverse problem6 benchmark dataset. This can act as an initial\nchallenge set for machine learning researchers looking for an interesting challenge which can not\nbe directly attacked by conventional convolutional networks. 
It brings challenging requirements of\nnon-local patches on input and the need for better models of non-local relationships between pixels if\nwe are to be able to work with the smaller speckles associated with longer optical \ufb01bres.\nDespite these challenges, we achieved world-leading performance via the use of relatively simple,\ncomplex-weighted networks, which proved better than real-weighted networks in representing the\ninverse transmission matrix, and which can generalise to a wide range of images. Furthermore, we\ntested unitary, complex-weight regularisation, which improved performance compared to real-valued\ndense layers, is compatible with our physical understanding of the optical \ufb01bre inversion problem,\nand enables analytic invertibility of the trained network.\n\n6Code and data can be found at https://github.com/rodms/opticalfibreml\n\n9\n\n\fReferences\nMart\u00edn Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S.\nCorrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew\nHarp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath\nKudlur, Josh Levenberg, Dandelion Man\u00e9, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah,\nMike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent\nVanhoucke, Vijay Vasudevan, Fernanda Vi\u00e9gas, Oriol Vinyals, Pete Warden, Martin Wattenberg,\nMartin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on\nheterogeneous systems, 2015. URL https://www.tensorflow.org/. Software available from\ntensor\ufb02ow.org.\n\nL. Ardizzone, J. Kruse, S. Wirkert, D. Rahner, E. W. Pellegrini, R. S. Klessen, L. Maier-Hein,\nC. Rother, and U. K\u00f6the. Analyzing Inverse Problems with Invertible Neural Networks. ArXiv\ne-prints, August 2018.\n\nMartin Arjovsky, Amar Shah, and Yoshua Bengio. 
Unitary evolution recurrent neural networks. In Proceedings of the 33rd International Conference on Machine Learning - Volume 48, ICML\u201916, pages 1120\u20131128. JMLR.org, 2016. URL http://dl.acm.org/citation.cfm?id=3045390.3045509.\n\nNavid Borhani, Eirini Kakkava, Christophe Moser, and Demetri Psaltis. Learning to see through multimode fibers. Optica, 5:960, August 2018.\n\nAndrew Brock, Theodore Lim, James M. Ritchie, and Nick Weston. Neural photo editing with introspective adversarial networks. CoRR, abs/1609.07093, 2016. URL http://arxiv.org/abs/1609.07093.\n\nJoao BD Cabrera and Kumpati S Narendra. Issues in the application of neural networks for tracking based on inverse control. IEEE Transactions on Automatic Control, 44(11):2007\u20132027, 1999.\n\nYoungwoon Choi, Changhyeong Yoon, Moonseok Kim, Taeseok Daniel Yang, Christopher Fang-Yen, Ramachandra R. Dasari, Kyoung Jin Lee, and Wonshik Choi. Scanner-free and wide-field endoscopic imaging by using a single multimode optical fiber. Physical Review Letters, 109(20), 2012a. ISSN 00319007. doi: 10.1103/PhysRevLett.109.203901.\n\nYoungwoon Choi, Changhyeong Yoon, Moonseok Kim, Taeseok Daniel Yang, Christopher Fang-Yen, Ramachandra R Dasari, Kyoung Jin Lee, and Wonshik Choi. Scanner-free and wide-field endoscopic imaging by using a single multimode optical fiber. Physical review letters, 109(20):203901, 2012b.\n\nFran\u00e7ois Chollet et al. Keras. https://keras.io, 2015.\n\nTom\u00e1\u0161 \u010ci\u017em\u00e1r and Kishan Dholakia. Shaping the light transmission through a multimode optical fibre: complex transformation analysis and applications in biophotonics. Optics Express, 19(20):18871\u201318884, 2011.\n\nTom\u00e1\u0161 \u010ci\u017em\u00e1r and Kishan Dholakia. Exploiting multimode waveguides for pure fibre-based imaging. Nature communications, 3:1027, 2012.\n\nL. Dinh, J. Sohl-Dickstein, and S. Bengio. 
Density estimation using Real NVP. ArXiv e-prints, May 2016.\n\nPengfei Fan, Tianrui Zhao, and Lei Su. Deep learning the high variability and randomness inside multimode fibres. arXiv:1807.09351, 18th July 2018.\n\nW. Grathwohl, R. T. Q. Chen, J. Bettencourt, I. Sutskever, and D. Duvenaud. FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models. ArXiv e-prints, October 2018.\n\nNitzan Guberman. On complex valued convolutional neural networks. arXiv preprint arXiv:1602.09046, 2016.\n\nKaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep Residual Learning for Image Recognition. Arxiv.Org, 7(3):171\u2013180, 2015. ISSN 1664-1078. doi: 10.3389/fpsyg.2013.00124. URL http://arxiv.org/pdf/1512.03385v1.pdf.\n\nCatherine F Higham, Roderick Murray-Smith, Miles J Padgett, and Matthew P Edgar. Deep learning for real-time single-pixel video. Scientific reports, 8(1):2369, 2018.\n\nAkira Hirose. Complex-valued neural networks: An introduction. In Complex-Valued Neural Networks: Theories and Applications, pages 1\u20136. World Scientific, 2003.\n\nAkira Hirose. Complex-valued neural networks, 2nd edition, volume 400. Springer Science & Business Media, 2012.\n\nY. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278\u20132324, Nov 1998. ISSN 0018-9219. doi: 10.1109/5.726791.\n\nReza Nasiri Mahalati, Ruo Yu Gu, and Joseph M Kahn. Resolution limits for imaging through multi-mode fiber. Optics express, 21(2):1656\u20131668, 2013.\n\nEadweard Muybridge. The Human Figure in Motion, volume 11. Courier Corporation, 1955.\n\nIoannis N Papadopoulos, Salma Farahi, Christophe Moser, and Demetri Psaltis. Focusing and scanning light through a multimode optical fiber using digital phase conjugation. 
Optics express, 20(10):10583\u201310590, 2012.\n\nMartin Pl\u00f6schner, Tom\u00e1\u0161 Tyc, and Tom\u00e1\u0161 \u010ci\u017em\u00e1r. Seeing through chaos in multimode fibres. Nature Photonics, 9(8):529\u2013535, 2015. ISSN 17494893. doi: 10.1038/nphoton.2015.112.\n\nBabak Rahmani, Damien Loterie, Georgia Konstantinou, Demetri Psaltis, and Christophe Moser. Multimode optical fiber transmission with a deep learning network. Light: Science and Applications, 7(69), October 2018.\n\nEric J Seibel, Richard S Johnston, and C David Melville. A full-color scanning fiber endoscope. In Optical Fibers and Sensors for Medical Diagnostics and Treatment Applications VI, volume 6083, page 608303. International Society for Optics and Photonics, 2006.\n\nWenling Shang, Kihyuk Sohn, Diogo Almeida, and Honglak Lee. Understanding and improving convolutional neural networks via concatenated rectified linear units. In Proceedings of the 33rd International Conference on Machine Learning - Volume 48, ICML\u201916, pages 2217\u20132225. JMLR.org, 2016. URL http://dl.acm.org/citation.cfm?id=3045390.3045624.\n\nNicolino Stasio. Multimode fiber optical imaging using wavefront control. PhD thesis, EPFL, 2017.\n\nChiheb Trabelsi, Olexa Bilaniuk, Ying Zhang, Dmitriy Serdyuk, Sandeep Subramanian, Joao Felipe Santos, Soroush Mehri, Negar Rostamzadeh, Yoshua Bengio, and Christopher J Pal. Deep complex networks. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=H1T2hmZAb.\n\nMark Tygert, Joan Bruna, Soumith Chintala, Yann LeCun, Serkan Piantino, and Arthur Szlam. A mathematical motivation for complex-valued convolutional networks. Neural computation, 28(5):815\u2013825, 2016.\n\nPascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. Extracting and composing robust features with denoising autoencoders. 
In Proceedings of the 25th international conference on Machine learning, pages 1096\u20131103. ACM, 2008.\n\nZhou Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600\u2013612, April 2004. ISSN 1057-7149. doi: 10.1109/TIP.2003.819861.\n\nDaniel M Wolpert and Mitsuo Kawato. Multiple paired forward and inverse models for motor control. Neural networks, 11(7-8):1317\u20131329, 1998.\n\nHan Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms, 2017.\n\nHang Zhao, Orazio Gallo, Iuri Frosio, and Jan Kautz. Loss functions for neural networks for image processing. CoRR, abs/1511.08861, 2015. URL http://arxiv.org/abs/1511.08861.\n\nAcknowledgements\n\nWe acknowledge funding from the Engineering and Physical Sciences Research Council on the QuantIC EPSRC grant EP/M01326X/1. R. Murray-Smith also acknowledges the support of the EPSRC Closed-loop data science grant EP/R018634/1. We would like to thank Francesco Tonolini and John Williamson for useful discussions and Maya Levitsky for discussions and support on simulation experiments.", "award": [], "sourceid": 1670, "authors": [{"given_name": "Ois\u00edn", "family_name": "Moran", "institution": "Inscribe.ai"}, {"given_name": "Piergiorgio", "family_name": "Caramazza", "institution": "University of Glasgow"}, {"given_name": "Daniele", "family_name": "Faccio", "institution": "University of Glasgow"}, {"given_name": "Roderick", "family_name": "Murray-Smith", "institution": "University of Glasgow"}]}