{"title": "Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images", "book": "Advances in Neural Information Processing Systems", "page_first": 2843, "page_last": 2851, "abstract": "We address a central problem of neuroanatomy, namely, the automatic segmentation of neuronal structures depicted in stacks of electron microscopy (EM) images. This is necessary to efficiently map 3D brain structure and connectivity. To segment {\\em biological} neuron membranes, we use a special type of deep {\\em artificial} neural network as a pixel classifier. The label of each pixel (membrane or non-membrane) is predicted from raw pixel values in a square window centered on it. The input layer maps each window pixel to a neuron. It is followed by a succession of convolutional and max-pooling layers which preserve 2D information and extract features with increasing levels of abstraction. The output layer produces a calibrated probability for each class. The classifier is trained by plain gradient descent on a $512 \\times 512 \\times 30$ stack with known ground truth, and tested on a stack of the same size (ground truth unknown to the authors) by the organizers of the ISBI 2012 EM Segmentation Challenge. Even without problem-specific post-processing, our approach outperforms competing techniques by a large margin in all three considered metrics, i.e. \\emph{rand error}, \\emph{warping error} and \\emph{pixel error}. For pixel error, our approach is the only one outperforming a second human observer.", "full_text": "Deep Neural Networks Segment Neuronal\nMembranes in Electron Microscopy Images\n\nDan C. Cires\u00b8an\u2217\n\nIDSIA\n\nUSI-SUPSI\nLugano 6900\n\ndan@idsia.ch\n\nAlessandro Giusti\n\nIDSIA\n\nUSI-SUPSI\nLugano 6900\n\nalessandrog@idsia.ch\n\nLuca M. 
Gambardella\n\nIDSIA\n\nUSI-SUPSI\nLugano 6900\n\nluca@idsia.ch\n\nJ\u00fcrgen Schmidhuber\n\nIDSIA\n\nUSI-SUPSI\nLugano 6900\n\njuergen@idsia.ch\n\nAbstract\n\nWe address a central problem of neuroanatomy, namely, the automatic segmentation\nof neuronal structures depicted in stacks of electron microscopy (EM) images.\nThis is necessary to efficiently map 3D brain structure and connectivity. To\nsegment biological neuron membranes, we use a special type of deep artificial\nneural network as a pixel classifier. The label of each pixel (membrane or\nnon-membrane) is predicted from raw pixel values in a square window centered on it.\nThe input layer maps each window pixel to a neuron. It is followed by a succession\nof convolutional and max-pooling layers which preserve 2D information and\nextract features with increasing levels of abstraction. The output layer produces\na calibrated probability for each class. The classifier is trained by plain gradient\ndescent on a 512 \u00d7 512 \u00d7 30 stack with known ground truth, and tested on a\nstack of the same size (ground truth unknown to the authors) by the organizers of\nthe ISBI 2012 EM Segmentation Challenge. Even without problem-specific\npost-processing, our approach outperforms competing techniques by a large margin in\nall three considered metrics, i.e. rand error, warping error and pixel error. For\npixel error, our approach is the only one outperforming a second human observer.\n\n1\n\nIntroduction\n\nHow is the brain structured? The recent field of connectomics [2] is developing high-throughput\ntechniques for mapping connections in nervous systems, one of the most important and ambitious\ngoals of neuroanatomy. The main tool for studying connections at the neuron level is serial-section\nTransmitted Electron Microscopy (ssTEM), resolving individual neurons and their shapes. 
After\npreparation, a sample of neural tissue is typically sectioned into 50-nanometer slices; each slice is\nthen recorded as a 2D grayscale image with a pixel size of about 4 \u00d7 4 nanometers (see Figure 1).\nThe visual complexity of the resulting stacks makes them hard to handle. Reliable automated segmentation\nof neuronal structures in ssTEM stacks so far has been infeasible. A solution to this\nproblem, however, is essential for any automated pipeline reconstructing and mapping neural connections\nin 3D. Recent advances in automated sample preparation and imaging make this increasingly\nurgent, as they enable acquisition of huge datasets [6, 21], whose manual analysis is simply\ninfeasible.\n\n\u2217webpage: http://www.idsia.ch/~ciresan\n\n1\n\n\fFigure 1: Left: the training stack (one slice shown). Right: corresponding ground truth; black lines\ndenote neuron membranes. Note the complexity of image appearance.\n\nOur solution is based on a Deep Neural Network (DNN) [12, 13] used as a pixel classifier. The\nnetwork computes the probability of a pixel being a membrane, using as input the image intensities\nin a square window centered on the pixel itself. An image is then segmented by classifying all of\nits pixels. The DNN is trained on a different stack with similar characteristics, in which membranes\nwere manually annotated.\nDNNs are inspired by convolutional neural networks introduced in 1980 [16], improved in the 1990s\n[25], refined and simplified in the 2000s [5, 33], and brought to their full potential by making them\nboth large and deep [12, 13]. Lately, DNNs proved their efficiency on data sets extending from\nhandwritten digits (MNIST) [10, 12] and handwritten characters [11] to 3D toys (NORB) [13] and faces\n[35]. Training huge nets requires months or even years on CPUs, where high data transfer latency\nprevented multi-threading code from saving the situation. 
Our fast GPU implementation [10, 12]\novercomes this problem, speeding up single-threaded CPU code by up to two orders of magnitude.\nMany other types of learning classi\ufb01ers have been applied to segmentation of TEM images, where\ndifferent structures are not easily characterized by intensity differences, and structure boundaries\nare not correlated with high image gradients, due to noise and many confounding micro-structures.\nIn most binary segmentation problems, classi\ufb01ers are used to compute one or both of the follow-\ning probabilities: (a) probability of a pixel belonging to each class; (b) probability of a boundary\ndividing two adjacent pixels. Segmentation through graph cuts [7] uses (a) as the unary term, and\n(b) as the binary term. Some use an additional term to account for the expected geometry of neuron\nmembranes[23].\nWe compute pixel probabilities only (point (a) above), and directly obtain a segmentation by mild\nsmoothing and thresholding, without using graph cuts. Our main contribution lies therefore in the\nclassi\ufb01er itself. Others have used off-the-shelf random forest classi\ufb01ers to compute unary terms of\nneuron membranes [22], or SVMs to compute both unary and binary terms for segmenting mito-\nchondria [28, 27]. The former approach uses haar-like features and texture histograms computed on\na small region around the pixel of interest, whereas the latter uses sophisticated rotational [17] and\nray [34] features computed on superpixels [3]. Feature selection mirrors the researcher\u2019s expecta-\ntion of which characteristics of the image are relevant for classi\ufb01cation, and has a large impact on\nclassi\ufb01cation accuracy. 
In our approach, we bypass such problems, using raw pixel values as inputs.\nDue to their convolutional structure, the \ufb01rst layers of the network automatically learn to compute\nmeaningful features during training.\nThe main contribution of the paper is a practical state-of-the-art segmentation method for neuron\nmembranes in ssTEM data, described in Section 2. It outperforms existing methods as validated\nin Section 3. The contribution is particularly meaningful because our approach does not rely on\nproblem-speci\ufb01c postprocessing: fruitful application to different biomedical segmentation problems\nis therefore likely.\n\n2\n\n\fFigure 2: Overview of our approach (see text).\n\n2 Methods\n\nFor each pixel we consider two possible classes, membrane and non-membrane. The DNN classi\ufb01er\n(Section 2.1) computes the probability of a pixel p being of the former class, using as input the\nraw intensity values of a square window centered on p with an edge of w pixels\u2014w being an odd\nnumber to enforce symmetry. When a pixel is close to the image border, its window will include\npixels outside the image boundaries; such pixels are synthesized by mirroring the pixels in the actual\nimage across the boundary (see Figure 2).\nThe classi\ufb01er is \ufb01rst trained using the provided training images (Section 2.2). After training, to\nsegment a test image, the classi\ufb01er is applied to all of its pixels, thus generating a map of mem-\nbrane probabilities\u2014i.e., a new real-valued image the size of the input image. Binary membrane\nsegmentation is obtained by mild postprocessing techniques discussed in Section 2.3, followed by\nthresholding.\n\n2.1 DNN architecture\n\nA DNN [13] consists of a succession of convolutional, max-pooling and fully connected layers. It\nis a general, hierarchical feature extractor that maps raw pixel intensities of the input image into a\nfeature vector to be classi\ufb01ed by several fully connected layers. 
All adjustable parameters are jointly optimized through minimization of the misclassification error over the training set.\nEach convolutional layer performs a 2D convolution of its input maps with a square filter. The\nactivations of the output maps are obtained by summing the convolutional responses, which are\npassed through a nonlinear activation function.\nThe biggest architectural difference between our DNN and earlier CNNs [25] is the use of max-pooling\nlayers [30, 32, 31] instead of sub-sampling layers. Their outputs are given by the maximum activation\nover non-overlapping square regions. Max-pooling layers are fixed, non-trainable layers which select\nthe most promising features. The DNN also has many more maps per layer, and thus many more\nconnections and weights.\nAfter 1 to 4 stages of convolutional and max-pooling layers, several fully connected layers further\ncombine the outputs into a 1D feature vector. The output layer is always a fully connected layer\nwith one neuron per class (two in our case). Using a softmax activation function for the last layer\nguarantees that each neuron\u2019s output activation can be interpreted as the probability of a particular\ninput image belonging to that class.\n\n2.2 Training\n\nTo train the classifier, we use all available slices of the training stack, i.e., 30 images with a 512\u00d7512\nresolution. For each slice, we use all membrane pixels as positive examples (on average, about\n50000), and the same amount of pixels randomly sampled (without repetitions) among all\nnon-membrane pixels. This amounts to 3 million training examples in total, in which both classes are\nequally represented.\nAs is often the case in TEM images\u2014but not in other modalities such as phase-contrast\nmicroscopy\u2014the appearance of structures is not affected by their orientation. 
We take advantage of\n\n3\n\n\fthis property, and synthetically augment the training set at the beginning of each epoch by randomly\nmirroring each training instance, and/or rotating it by \u00b190\u25e6.\n\n2.3 Postprocessing of network outputs\n\nBecause each class is equally represented in the training set but not in the testing data, the network\noutputs cannot be directly interpreted as probability values; instead, they tend to severely overesti-\nmate the membrane probability. To \ufb01x this issue, a polynomial function post-processor is applied to\nthe network outputs.\nTo compute its coef\ufb01cients, a network N is trained on 20 slices of the training volume Ttrain and\ntested on the remaining 10 slices of the same volume (Ttest, for which ground truth is available). We\ncompare all outputs obtained on Ttest (a total of 2.6 million instances) to ground truth, to compute\nthe transformation relating the network output value and the actual probability of being a membrane;\nfor example, we measure that, among all pixels of Ttest which were classi\ufb01ed by N as having a 50%\nprobability of being membrane, only about 18% have in fact such a ground truth label; the reason\nbeing the different prevalence of membrane instances in Ttrain (i.e. 50%) and in Ttest (roughly 20%).\nThe resulting function is well approximated by a monotone cubic polynomial, whose coef\ufb01cients\nare computed by least-squares \ufb01tting. The same function is then used to calibrate the outputs of all\ntrained networks.\nAfter calibration (a grayscale transformation in image processing terms), network outputs are spa-\ntially smoothed by a 2-pixel-radius median \ufb01lter. 
This results in regularization of membrane boundaries after thresholding.\n\n2.4 Foveation and nonuniform sampling\n\nWe experimented with two related techniques for improving the network performance by manipulating\nits input data, namely foveation and nonuniform sampling (see Figure 3).\nFoveation is inspired by the structure of human photoreceptor topography [14], and has recently been\nshown to be very effective for improving nonlocal-means denoising algorithms [15]. It imposes a\nspatially-variant blur on the input window pixels, such that full detail is kept in the central section\n(fovea), while the peripheral parts are defocused by means of a convolution with a disk kernel, to\nremove fine details. The network, whose task is to classify the center pixel of the window, is then\nforced to disregard such peripheral fine details, which are most likely irrelevant, while still retaining\nthe general structure of the window (context).\n\nFigure 3: Input windows with w = 65, from the training set. First row shows the original window\n(Plain); other rows show effects of foveation (Fov), nonuniform sampling (Nu), and both (Fov+Nu).\nSamples on the left and right correspond to instances of class Membrane and Non-membrane, respectively.\nThe leftmost image illustrates how a checkerboard pattern is affected by such transformations.\n\nNonuniform sampling is motivated by the observation that (in this and other applications) larger\nwindow sizes w generally result in significant performance improvements. However, a large w\n\n4\n\n\fresults in much bigger networks, which take longer to train and, at least in theory, require larger\namounts of training data to retain their generalization ability. With nonuniform sampling, image\npixels are directly mapped to neurons only in the central part of the window; elsewhere, their source\npixels are sampled with decreasing resolution as the distance from the window center increases. 
As a result, the image in the window is deformed in a fisheye-like fashion, and covers a larger area of\nthe input image with fewer neurons.\nSimultaneously applying both techniques is a way of exploiting data at multiple resolutions\u2014fine at\nthe center, coarse in the periphery of the window.\n\n2.5 Averaging outputs of multiple networks\n\nWe observed that large networks with different architectures often exhibit significant output differences\nfor many image parts, despite being trained on the same data. This suggests that these\npowerful and flexible classifiers exhibit relatively large variance but low bias. It is therefore reasonable\nto attempt to reduce such variance by averaging the calibrated outputs of several networks with\ndifferent architectures.\nThis was experimentally verified. The submissions obtained by averaging the outputs of multiple\nlarge networks scored significantly better in all metrics than the single networks.\n\n3 Experimental results\n\nAll experiments are performed on a computer with a Core i7 950 3.06GHz processor, 24GB of RAM,\nand four GTX 580 graphics cards. A GPU implementation [12] accelerates the forward propagation\nand back propagation routines by a factor of 50.\nWe validate our approach on the publicly available dataset [9] provided by the organizers of the ISBI\n2012 EM Segmentation Challenge [1], which represents two portions of the ventral nerve cord of a\nDrosophila larva. The dataset is composed of two 512 \u00d7 512 \u00d7 30 stacks, one used for training, one\nfor testing. Each stack covers a 2 \u00d7 2 \u00d7 1.5 \u00b5m volume, with a resolution of 4 \u00d7 4 \u00d7 50 nm/pixel.\nFor the training stack, a manually annotated ground truth segmentation is provided. For the testing\nstack, the organizers obtained (but did not distribute) two manual segmentations by different expert\nneuroanatomists. 
One is used as ground truth, the other to evaluate the performance of a second\nhuman observer and provide a meaningful comparison for the algorithms\u2019 performance.\nA segmentation of the testing stack is evaluated through an automated online system, which com-\nputes three error metrics in relation to the hidden ground truth:\nRand error: de\ufb01ned as 1\u2212 Frand, where Frand represents the F1 score of the Rand index [29], which\n\nmeasures the accuracy with which pixels are associated to their respective neurons.\n\nWarping error: a segmentation metric designed to account for topological disagreements [19];\nit accounts for the number of neuron splits and mergers required to obtain the candidate\nsegmentation from ground truth.\n\nPixel error: de\ufb01ned as 1 \u2212 Fpixel, where Fpixel represents the F1 score of pixel similarity.\nThe automated system accepts a stack of grayscale images, representing membrane probability val-\nues for each pixel; the stack is thresholded using 9 different threshold values, obtaining 9 binary\nstacks. For each of the stacks, the system computes the error measures above, and returns the mini-\nmum error.\nPixel error is clearly not a suitable indicator of segmentation quality in this context, and is reported\nmostly for reference. Rand and Warping error metrics have various strengths and weaknesses, with-\nout clear consensus in favor of any. The former tends to provide a more consistent measure but\npenalizes even slightly misplaced borders, which would not be problematic in most practical ap-\nplications. 
The latter has a more intuitive interpretation, but completely disregards non-topological\nerrors.\nWe train four networks N1, N2, N3 and N4, with slightly different architectures, and window sizes\nw = 65 (for N1, N2, N3) and w = 95 (for N4); all networks use foveation and nonuniform sampling,\n\n5\n\n\fFigure 4: Above, from left to right: part of a source image from the test set; corresponding calibrated\noutputs of networks N1, N2, N3 and N4; average of such outputs; average after \ufb01ltering. Below, the\nperformance of each network, as well as the signi\ufb01cantly better performance due to averaging their\noutputs. All results are computed after median \ufb01ltering (see text).\n\nexcept N3, which uses neither. As the input window size increases, the network depth also increases\nbecause we keep the convolutional \ufb01lter sizes small. The architecture of N4 is the deepest, and is\nreported in Table 1.\nTraining time for one epoch varies from approximately 170 minutes for N1 (w = 65) to 340 minutes\nfor N4 (w = 95). All nets are trained for 30 epochs, which leads to a total training time of several\ndays. 
However, once networks are trained, application to new images is relatively fast: classifying\nthe 8 million pixels comprising the whole testing stack takes 10 to 30 minutes on four GPUs.\nThe implementation is currently being further optimized (with foreseen speedups of at least one\norder of magnitude) in view of application to huge, terapixel-class datasets [6, 21].\n\nTable 1: 11-layer architecture for network N4, w = 95.\n\nLayer | Type | Maps and neurons | Kernel size\n0 | input | 1 map of 95x95 neurons | -\n1 | convolutional | 48 maps of 92x92 neurons | 4x4\n2 | max pooling | 48 maps of 46x46 neurons | 2x2\n3 | convolutional | 48 maps of 42x42 neurons | 5x5\n4 | max pooling | 48 maps of 21x21 neurons | 2x2\n5 | convolutional | 48 maps of 18x18 neurons | 4x4\n6 | max pooling | 48 maps of 9x9 neurons | 2x2\n7 | convolutional | 48 maps of 6x6 neurons | 4x4\n8 | max pooling | 48 maps of 3x3 neurons | 2x2\n9 | fully connected | 200 neurons | 1x1\n10 | fully connected | 2 neurons | 1x1\n\nThe outputs of four such networks are shown in Figure 4, along with their performance after filtering.\nBy averaging the outputs of all networks, results improve significantly. The final result for one slice\nof the test stack is shown in Figure 5.\nOur results are compared to competing methods in Table 2.\nSince our pure pixel classifier method aims at minimizing pixel error, Rand and warping errors are\njust minimized as a side-effect, but never explicitly accounted for during segmentation. In contrast,\nsome competing segmentation approaches adopt different post-processing techniques directly\noptimizing the Rand error.\n\n6\n\n\fFigure 5: Left: slice 16 of the test stack. Right: corresponding output.\n\nTable 2: Results of our approach and competing algorithms. For comparison, the first two rows\nreport the performance of the second human observer and of a simple thresholding approach.\n\nGroup | Rand error [\u00b710\u22123] | Warping error [\u00b710\u22126] | Pixel error [\u00b710\u22123]\nSecond Human Observer | 27 | 344 | 67\nSimple Thresholding | 445 | 15522 | 222\nOur approach | 48 | 434 | 60\nLaptev et al. [24] (1) | 65 | 556 | 83\nLaptev et al. [24] (2) | 70 | 525 | 79\nSumbul et al. | 76 | 646 | 65\nLiu et al. [26] (1) | 84 | 1602 | 134\nKaynig et al. [23] | 84 | 1124 | 157\nLiu et al. [26] (2) | 89 | 1134 | 78\nKamentsky et al. [20] | 90 | 1512 | 100\nBurget et al. [8] | 139 | 2641 | 102\nTan et al. [36] | 153 | 685 | 88\nBas et al. [4] | 162 | 1613 | 109\nIftikhar et al. [18] | 230 | 16156 | 150\n\nNevertheless, their results are inferior. But such post-processing techniques\u2014which unlike our\ngeneral classifier are specific to this particular problem\u2014could be successfully applied to fine-tune\nour outputs, further improving results. Preliminary results in this direction are encouraging: the\nproblem-specific postprocessing techniques in [20] and [24], operating on our segmentation, reduce\nthe Rand error to 36\u00b710\u22123 and 32\u00b710\u22123, respectively. Further research along these lines is planned\nfor the near future.\n\n4 Discussion and conclusions\n\nThe main strength of our approach to neuronal membrane segmentation in EM images lies in a\ndeep and wide neural network trained by online back-propagation to become a very powerful pixel\nclassifier with a superhuman pixel-error rate, made possible by an optimized GPU implementation\nmore than 50 times faster than equivalent code on standard microprocessors.\n\n7\n\n\fOur approach outperforms all other approaches in the competition, despite not even being tailored\nto this particular segmentation task. Instead, the DNN acts as a generic image classifier, using raw\npixel intensities as inputs, without ad-hoc post-processing. 
This opens interesting perspectives on\napplying similar techniques to other biomedical image segmentation tasks.\n\nAcknowledgments\n\nThis work was partially supported by the Supervised Deep / Recurrent Nets SNF grant, Project Code\n140399.\n\nReferences\n\n[1] Segmentation of neuronal structures in EM stacks challenge - ISBI 2012. http://tinyurl.com/d2fgh7g.\n\n[2] The Open Connectome Project. http://openconnectomeproject.org.\n\n[3] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. S\u00fcsstrunk. SLIC superpixels. Technical Report\n149300 EPFL, June 2010.\n\n[4] Erhan Bas, Mustafa G. Uzunbas, Dimitris Metaxas, and Eugene Myers. Contextual grouping in a concept:\na multistage decision strategy for EM segmentation. In Proc. of ISBI 2012 EM Segmentation Challenge.\n\n[5] Sven Behnke. Hierarchical Neural Networks for Image Interpretation, volume 2766 of Lecture Notes in\nComputer Science. Springer, 2003.\n\n[6] Davi D. Bock, Wei-Chung A. Lee, Aaron M. Kerlin, Mark L. Andermann, Greg Hood, Arthur W. Wetzel,\nSergey Yurgenson, Edward R. Soucy, Hyon S. Kim, and R. Clay Reid. Network anatomy and in vivo\nphysiology of visual cortical neurons. Nature, 471(7337):177\u2013182, 2011.\n\n[7] Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. Pattern\nAnalysis and Machine Intelligence, IEEE Transactions on, 23(11):1222\u20131239, 2001.\n\n[8] Radim Burget, Vaclav Uher, and Jan Masek. Trainable Segmentation Based on Local-level and\nSegment-level Feature Extraction. In Proc. of ISBI 2012 EM Segmentation Challenge.\n\n[9] Albert Cardona, Stephan Saalfeld, Stephan Preibisch, Benjamin Schmid, Anchi Cheng, Jim Pulokas,\nPavel Tomancak, and Volker Hartenstein. An integrated micro- and macroarchitectural analysis of the\nDrosophila brain by computer-assisted serial section electron microscopy. PLoS Biol, 8(10):e1000502, 2010.\n\n[10] Dan Claudiu Ciresan, Ueli Meier, Luca Maria Gambardella, and J\u00fcrgen Schmidhuber. 
Deep, big, simple\nneural nets for handwritten digit recognition. Neural Computation, 22(12):3207\u20133220, 2010.\n\n[11] Dan Claudiu Ciresan, Ueli Meier, Luca Maria Gambardella, and J\u00fcrgen Schmidhuber. Convolutional\nneural network committees for handwritten character classification. In International Conference on\nDocument Analysis and Recognition, pages 1250\u20131254, 2011.\n\n[12] Dan Claudiu Ciresan, Ueli Meier, Jonathan Masci, Luca Maria Gambardella, and J\u00fcrgen Schmidhuber.\nFlexible, high performance convolutional neural networks for image classification. In International Joint\nConference on Artificial Intelligence, pages 1237\u20131242, 2011.\n\n[13] Dan Claudiu Ciresan, Ueli Meier, and J\u00fcrgen Schmidhuber. Multi-column deep neural networks for image\nclassification. In Computer Vision and Pattern Recognition, pages 3642\u20133649, 2012.\n\n[14] C.A. Curcio, K.R. Sloan, R.E. Kalina, and A.E. Hendrickson. Human photoreceptor topography. The\nJournal of Comparative Neurology, 292(4):497\u2013523, 1990.\n\n[15] A. Foi and G. Boracchi. Foveated self-similarity in nonlocal image filtering. In Proceedings of SPIE,\nvolume 8291, page 829110, 2012.\n\n[16] Kunihiko Fukushima. Neocognitron: A self-organizing neural network for a mechanism of pattern recognition\nunaffected by shift in position. Biological Cybernetics, 36(4):193\u2013202, 1980.\n\n[17] G. Gonz\u00e1lez, F. Fleuret, and P. Fua. Learning rotational features for filament detection. In Computer\nVision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 1582\u20131589. IEEE, 2009.\n\n[18] Saadia Iftikhar and Afzal Godil. The Detection of Neuronal Structures using a Patch-based Multi-features\nand Support Vector Machines Learning Algorithm. In Proc. of ISBI 2012 EM Segmentation Challenge.\n\n[19] Viren Jain, Benjamin Bollmann, Mark Richardson, Daniel R. Berger, Moritz Helmstaedter, Kevin L. Briggman,\nWinfried Denk, Jared B. 
Bowden, John M. Mendenhall, Wickliffe C. Abraham, Kristen M. Harris,\nN. Kasthuri, Ken J. Hayworth, Richard Schalek, Juan Carlos Tapia, Jeff W. Lichtman, and H. Sebastian\nSeung. Boundary Learning by Optimization with Topological Constraints. In CVPR, pages 2488\u20132495.\nIEEE, 2010.\n\n8\n\n\f[20] Lee Kamentsky. Segmentation of EM images of neuronal structures using CellProfiler. In Proc. of ISBI\n2012 EM Segmentation Challenge.\n\n[21] Bobby Kasthuri. Mouse Visual Cortex Dataset in the Open Connectome Project. http://openconnectomeproject.org/Kasthuri11/.\n\n[22] V. Kaynig, T. Fuchs, and J. Buhmann. Geometrical consistent 3D tracing of neuronal processes in ssTEM\ndata. Medical Image Computing and Computer-Assisted Intervention\u2013MICCAI 2010, pages 209\u2013216,\n2010.\n\n[23] V. Kaynig, T. Fuchs, and J.M. Buhmann. Neuron geometry extraction by perceptual grouping in ssTEM\nimages. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 2902\u2013\n2909. IEEE, 2010.\n\n[24] Dmitry Laptev, Alexander Vezhnevets, Sarvesh Dwivedi, and Joachim Buhmann. Segmentation of Neuronal\nStructures in EM stacks. In Proc. of ISBI 2012 EM Segmentation Challenge.\n\n[25] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition.\nProceedings of the IEEE, 86(11):2278\u20132324, November 1998.\n\n[26] Ting Liu, Mojtaba Seyedhosseini, Elizabeth Jurrus, and Tolga Tasdizen. Neuron Segmentation in EM\nImages using Series of Classifiers and Watershed Tree. In Proc. of ISBI 2012 EM Segmentation Challenge.\n\n[27] A. Lucchi, K. Smith, R. Achanta, G. Knott, and P. Fua. Supervoxel-Based Segmentation of Mitochondria\nin EM Image Stacks With Learned Shape Features. Medical Imaging, IEEE Transactions on, (99):1\u20131,\n2012.\n\n[28] A. Lucchi, K. Smith, R. Achanta, V. Lepetit, and P. Fua. A fully automated approach to segmentation of\nirregularly shaped cellular structures in EM images. 
Medical Image Computing and Computer-Assisted\nIntervention\u2013MICCAI 2010, pages 463\u2013471, 2010.\n\n[29] W.M. Rand. Objective criteria for the evaluation of clustering methods. Journal of the American Statistical\nAssociation, 66(336):846\u2013850, 1971.\n\n[30] Maximilian Riesenhuber and Tomaso Poggio. Hierarchical models of object recognition in cortex. Nat.\nNeurosci., 2(11):1019\u20131025, 1999.\n\n[31] Dominik Scherer, Andreas M\u00fcller, and Sven Behnke. Evaluation of pooling operations in convolutional\narchitectures for object recognition. In International Conference on Artificial Neural Networks, 2010.\n\n[32] Thomas Serre, Lior Wolf, and Tomaso Poggio. Object recognition with features inspired by visual cortex.\nIn Proc. of Computer Vision and Pattern Recognition Conference, 2005.\n\n[33] Patrice Y. Simard, Dave Steinkraus, and John C. Platt. Best practices for convolutional neural networks\napplied to visual document analysis. In Seventh International Conference on Document Analysis and\nRecognition, pages 958\u2013963, 2003.\n\n[34] K. Smith, A. Carleton, and V. Lepetit. Fast ray features for learning irregular shapes. In Computer Vision,\n2009 IEEE 12th International Conference on, pages 397\u2013404. IEEE, 2009.\n\n[35] Daniel Strigl, Klaus Kofler, and Stefan Podlipnig. Performance and scalability of GPU-based convolutional\nneural networks. In 18th Euromicro Conference on Parallel, Distributed, and Network-Based\nProcessing, 2010.\n\n[36] Xiao Tan and Changming Sun. Membrane extraction using two-step classification and post-processing.\nIn Proc. 
of ISBI 2012 EM Segmentation Challenge.\n\n9\n\n\f", "award": [], "sourceid": 1292, "authors": [{"given_name": "Dan", "family_name": "Ciresan", "institution": null}, {"given_name": "Alessandro", "family_name": "Giusti", "institution": null}, {"given_name": "Luca", "family_name": "Gambardella", "institution": null}, {"given_name": "J\u00fcrgen", "family_name": "Schmidhuber", "institution": null}]}