{"title": "U-Time: A Fully Convolutional Network for Time Series Segmentation Applied to Sleep Staging", "book": "Advances in Neural Information Processing Systems", "page_first": 4415, "page_last": 4426, "abstract": "Neural networks are becoming more and more popular for the analysis of physiological time-series. The most successful deep learning systems in this domain combine convolutional and recurrent layers to extract useful features to model temporal relations. Unfortunately, these recurrent models are difficult to tune and optimize. In our experience, they often require task-specific modifications, which makes them challenging to use for non-experts. We propose U-Time, a fully feed-forward deep learning approach to physiological time series segmentation developed for the analysis of sleep data. U-Time is a temporal fully convolutional network based on the U-Net architecture that was originally proposed for image segmentation. U-Time maps sequential inputs of arbitrary length to sequences of class labels on a freely chosen temporal scale. This is done by implicitly classifying every individual time-point of the input signal and aggregating these classifications over fixed intervals to form the final predictions. We evaluated U-Time for sleep stage classification on a large collection of sleep electroencephalography (EEG) datasets. 
In all cases, we found that U-Time reaches or outperforms current state-of-the-art deep learning models while being much more robust in the training process and without requiring architecture or hyperparameter adaptation across tasks.", "full_text": "U-Time: A Fully Convolutional Network for Time Series Segmentation Applied to Sleep Staging\n\nMathias Perslev, Department of Computer Science, University of Copenhagen, map@di.ku.dk\nMichael Hejselbak Jensen, Department of Computer Science, University of Copenhagen, mhejselbak@gmail.com\nSune Darkner, Department of Computer Science, University of Copenhagen, darkner@di.ku.dk\nPoul Jørgen Jennum, Danish Center for Sleep Medicine, Rigshospitalet, Denmark, poul.joergen.jennum@regionh.dk\nChristian Igel, Department of Computer Science, University of Copenhagen, igel@diku.dk\n\nAbstract\n\nNeural networks are becoming more and more popular for the analysis of physiological time-series. The most successful deep learning systems in this domain combine convolutional and recurrent layers to extract useful features to model temporal relations. Unfortunately, these recurrent models are difficult to tune and optimize. In our experience, they often require task-specific modifications, which makes them challenging to use for non-experts. We propose U-Time, a fully feed-forward deep learning approach to physiological time series segmentation developed for the analysis of sleep data. U-Time is a temporal fully convolutional network based on the U-Net architecture that was originally proposed for image segmentation. U-Time maps sequential inputs of arbitrary length to sequences of class labels on a freely chosen temporal scale. This is done by implicitly classifying every individual time-point of the input signal and aggregating these classifications over fixed intervals to form the final predictions. 
We evaluated U-Time for sleep stage classification on a large collection of sleep electroencephalography (EEG) datasets. In all cases, we found that U-Time reaches or outperforms current state-of-the-art deep learning models while being much more robust in the training process and without requiring architecture or hyperparameter adaptation across tasks.\n\n1 Introduction\n\nDuring sleep our brain goes through a series of changes between different sleep stages, which are characterized by specific brain and body activity patterns [Kales and Rechtschaffen, 1968, Iber and AASM, 2007]. Sleep staging refers to the process of mapping these transitions over a night of sleep. This is of fundamental importance in sleep medicine, because the sleep patterns combined with other variables provide the basis for diagnosing many sleep related disorders [Sateia, 2014]. The stages can be determined by measuring the neuronal activity in the cerebral cortex (via electroencephalography, EEG), eye movements (via electrooculography, EOG), and/or the activity of facial muscles (via electromyography, EMG) in a polysomnography (PSG) study (see Figure S.1 in the Supplementary Material).\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\nThe classification into stages is done manually. This is a difficult and time-consuming process, in which expert clinicians inspect and segment the typically 8-24 hours long multi-channel signals. Contiguous, fixed-length intervals of 30 seconds are considered, and each of these segments is classified individually. Algorithmic sleep staging aims at automating this process. 
Recent work shows that such systems can be highly robust (even compared to human performance) and may play an important role in developing novel biomarkers for sleep disorders and other (e.g., neurodegenerative and psychiatric) diseases [Stephansen et al., 2018, Warby et al., 2014, Schenck et al., 2014]. Deep learning is becoming increasingly popular for the analysis of physiological time-series [Faust et al., 2018] and has already been applied to sleep staging [Robert et al., 1998, Ronzhina et al., 2012, Faust et al., 2019]. Today's best systems are based on a combination of convolutional and recurrent layers [Supratak et al., 2017, Biswal et al., 2017]. While recurrent neural networks are conceptually appealing for time series analysis, they are often difficult to tune and optimize in practice, and it has been found that for many tasks across domains recurrent models can be replaced by feed-forward systems without sacrificing accuracy [Bai et al., 2018, Chen and Wu, 2017, Vaswani et al., 2017].\n\nThis study introduces U-Time, a feed-forward neural network for sleep staging. U-Time, as opposed to recurrent architectures, can be directly applied across datasets of significant variability without any architecture or hyperparameter tuning. The task of segmenting the time series is treated similarly to image segmentation with the popular U-Net architecture [Ronneberger et al., 2015]. This allows segmenting an entire PSG in a single forward pass and outputting sleep stages at any temporal resolution. Fixing a temporal embedding, which is a common argument against feed-forward approaches to time series analysis, is no problem, because in our setting the full time series is available at once and is processed entirely (or in large chunks) at different scales by the special network architecture.\n\nIn the following, we present our general approach to classifying fixed-length continuous segments of physiological time series. 
In Section 3, we apply it to sleep stage classification and evaluate it on 7 different PSG datasets using a fixed architecture and hyperparameter set. In addition, we performed many experiments with a state-of-the-art recurrent architecture, trying to improve its performance over U-Time and to assess its robustness against architecture and hyperparameter changes. These experiments are listed in the Supplementary Material. Section 4 summarizes our main findings, before we conclude in Section 5.\n\n2 Method\n\nU-Time is a fully convolutional encoder-decoder network. It is inspired by the popular U-Net architecture originally proposed for image segmentation [Ronneberger et al., 2015, Koch et al., 2019b, Perslev et al., 2019] and so-called temporal convolutional networks [Lea et al., 2016]. U-Time adopts basic concepts from U-Net for 1D time-series segmentation by mapping a whole sequence to a dense segmentation in a single forward pass.\n\nLet x ∈ R^(τS×C) be a physiological signal with C channels sampled at rate S for τ seconds. Let e be the frequency at which we want to segment x, that is, the goal is to map x to ⌊τ·e⌋ labels, where each label is based on i = S/e sampled points. In sleep staging, 30 second intervals are typically considered (i.e., e = 1/30 Hz). The input x to U-Time consists of T fixed-length connected segments of the signal, each of length i. U-Time predicts the T labels at once. Specifically, the model f(x; θ): R^(T×i×C) → R^(T×K) with parameters θ maps x to class confidence scores for predicting K classes for all T segments. That is, the model processes 1D signals of length t = T·i in each channel. The segmentation frequency e is variable. For instance, a U-Time model trained to segment with e = 1/30 Hz may output sleep stages at a higher frequency at inference time. 
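The bookkeeping between sampling rate S, segmentation frequency e, segment length i = S/e, and the input/output tensor shapes can be made concrete in a few lines. This is an illustrative sanity check using the numbers stated in the text, not the authors' implementation:

```python
# Shape bookkeeping for U-Time's input/output contract, following the
# notation in the text: a C-channel signal sampled at S Hz is segmented
# at frequency e = 1/30 Hz, giving i = S / e samples per label.

S = 100              # sampling rate in Hz
epoch_sec = 30       # one label per 30-second segment, i.e. e = 1/30 Hz
C = 1                # a single EEG channel, as in the experiments

i = S * epoch_sec    # samples per labelled segment: i = S / e = 3000
T = 35               # connected segments per forward pass (paper setting)
t = T * i            # total input samples per channel: 105000

K = 5                # sleep stages {W, N1, N2, N3, R}
input_shape = (T, i, C)    # f maps R^(T x i x C) ...
output_shape = (T, K)      # ... to R^(T x K)

assert t / (S * 60) == 17.5   # 105000 samples at 100 Hz = 17.5 minutes
```

With these settings, one forward pass covers 17.5 minutes of signal, matching the experimental setup described in Section 3.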
In fact, the extreme case of e = S, in which every individual time-point of x gets assigned a stage, is technically possible, although difficult (or even infeasible) to evaluate (see for example Figure 3). U-Time, in contrast to other approaches, allows for this flexibility, because it learns an intermediate representation of the input signal where a confidence score for each of the K classes is assigned to each time point. From this dense segmentation the final predictions over longer segments of time are computed by projecting the fine-grained scores down to match the rate e at which human annotated labels are available.\n\nFigure 1: Illustrative example of how U-Time maps a potentially very long input sequence (here only T = 4 for visual purposes) to segmentations at a chosen temporal scale (here e = 1/30 Hz) by first segmenting the signal at every data-point and then aggregating these scores to form final predictions.\n\nThe U-Time model f consists of three logical submodules: The encoder f_enc takes the raw physiological signal and represents it by a deep stack of feature maps, where the input is sub-sampled several times. The decoder f_dec learns a mapping from the feature stack back to the input signal domain that gives a dense, point-wise segmentation. A segment classifier f_segment uses the dense segmentation to predict the final sleep stages at a chosen temporal resolution. These steps are illustrated in Figure 1. An architecture overview is provided in Figure 2 and detailed in Supplementary Table S.2.\n\nEncoder: The encoder consists of four convolution blocks. All convolutions in the three submodules preserve the input dimensionality through zero-padding. Each block in the encoder performs two consecutive convolutions with 5-dimensional kernels dilated to width 9 [Yu and Koltun, 2015] followed by batch normalization [Ioffe and Szegedy, 2015] and max-pooling. 
In the four blocks, the pooling windows are 10, 8, 6, and 4, respectively. Two additional convolutions are applied to the fully down-sampled signal. The aggressive down-sampling reduces the input dimensionality by a factor of 1920 at the lowest layers. This 1) drastically reduces computational and memory requirements even for very long inputs, 2) enforces learning abstract features in the bottom layers, and 3), combined with stacked dilated convolutions, provides a large receptive field at the last convolution layer of the encoder. Specifically, the maximum theoretical receptive field of U-Time corresponds to approx. 5.5 minutes given a 100 Hz signal (see Luo et al. [2017] for further information on theoretical and effective receptive fields).\n\nThe input x to the encoder could be an entire PSG record (T = ⌊τ·e⌋) or a subset. As the model is based on convolution operations, the total input length t need not be static either, but could change between training and testing or even between individual mini-batches. While t is adjustable, it must be large enough so that all max-pooling operations of the encoder are defined, which in our implementation amounts to t_min = 1920, or 19.2 seconds of a 100 Hz signal. A too small t reduces performance by preventing the model from exploiting long-range temporal relations.\n\nDecoder: The decoder consists of four transposed-convolution blocks [Long et al., 2014], each performing nearest-neighbour up-sampling [Odena et al., 2016] of its input followed by convolution with kernel sizes 4, 6, 8 and 10, respectively, and batch normalization. The resulting feature maps are concatenated (along the filter dimension) with the corresponding feature maps computed by the encoder at the same scale. Two convolutional layers, both followed by batch normalization, process the concatenated feature maps in each block. 
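The relation between the four pooling windows, the overall downsampling factor of 1920, and the minimum input length t_min can be verified arithmetically. A minimal sketch (the decoder's mirror-image up-sampling is modelled here as simple integer scaling, not actual network layers):

```python
# The encoder's four max-pooling windows jointly downsample the input by
# their product, which also sets the minimum admissible input length.

pool_windows = [10, 8, 6, 4]

factor = 1
for w in pool_windows:
    factor *= w                    # combined downsampling: 10*8*6*4 = 1920

t_min = factor                     # shortest input with every pool defined
seconds_at_100hz = t_min / 100     # 19.2 s of a 100 Hz signal

# The decoder mirrors the encoder with nearest-neighbour up-sampling by
# factors 4, 6, 8 and 10, so any input length that is a multiple of 1920
# is restored exactly after the down/up round trip:
length = 3 * t_min                 # an arbitrary valid multiple of 1920
for w in pool_windows:
    length //= w                   # encoder max-pooling
for w in reversed(pool_windows):
    length *= w                    # decoder up-sampling
assert length == 3 * t_min
```

This also illustrates why t must be a multiple of the pooling product in practice: otherwise some pooling step would not divide its input evenly.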
Finally, a point-wise convolution with K filters (of size 1) results in K scores for each sample of the input sequence. In combination, the encoder and decoder map a t × C input signal to t × K confidence scores. We may interpret the decoder output as class confidence scores assigned to every sample point of the input signal, but in most applications we are not able to train the encoder-decoder network in a supervised setting, as labels are only provided or even defined over segments of the input signal.\n\nFigure 2: Structural overview of the U-Time architecture (encoder block ×4, decoder block ×4 with skip connections, and a segment classifier mapping the dense segmentation to sleep stages). Please refer to Supplementary Figure S.2 for an extended, larger version.\n\nSegment classifier: The segment classifier serves as a trainable link between the intermediate representation defined by the encoder-decoder network and the label space. It aggregates the sample-wise scores to predictions over longer periods of time. For periods of i time steps, the segment classifier performs channel-wise mean pooling with width i and stride i followed by point-wise convolution (kernel size 1). This aggregates and re-weights class confidence scores to produce scores of lower temporal resolution. In training, where we only have T labels available, the segment classifier maps the dense t × K segmentation to a T × K-dimensional output.\n\nBecause the segment classifier relies on the mean activation over a segment of decoder output, learning the full function f (encoder + decoder + segment classifier) drives the encoder-decoder sub-network to output class confidence scores distributed over the segment. 
As the input to the segment classifier does not change in expectation if e (the segmentation frequency) is changed, U-Time can output classifications on shorter temporal scales at inference time. Such scores may provide important insight into the individual sleep stage classifications by highlighting regions of uncertainty or fast transitions between stages on shorter than 30 second scales. Figure 3 shows an example.\n\n3 Experiments and Evaluation\n\nOur brain is in either an awake or sleeping state, where the latter is further divided into rapid-eye-movement sleep (REM) and non-REM sleep. Non-REM sleep is further divided into multiple states. In their pioneering work, Kales and Rechtschaffen [1968] originally described four non-REM stages, S1, S2, S3 and S4. However, the American Academy of Sleep Medicine (AASM) provides a newer characterization [Iber and AASM, 2007], which most importantly changes the non-REM naming convention to N1, N2, and N3, grouping the original stages S3 and S4 into a single stage N3. We use this 5-class system and refer to Table S.1 in the Supplementary Material for an overview of primary features describing each of the AASM sleep stages.\n\nWe evaluated U-Time for sleep-stage segmentation of raw EEG data. Specifically, U-Time was trained to output a segmentation of an EEG signal into K = 5 sleep stages according to the AASM, where each segment lasts 30 seconds (e = 1/30 Hz). We fixed T = 35 in our experiments. That is, for an S = 100 Hz signal we got an input of t = 105000 samples spanning 17.5 minutes.\n\nOur experiments were designed to gauge the performance of U-Time across several, significantly different sleep study cohorts when no task-specific modifications are made to the architecture or hyperparameters between them. 
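The aggregation performed by the segment classifier (mean pooling with width i and stride i, followed by a kernel-size-1 convolution over the class dimension) can be sketched in plain Python. The weights below are random stand-ins; in U-Time they are learned, and the toy sizes are chosen only for readability:

```python
import random

random.seed(0)

T, i, K = 4, 30, 5   # toy sizes: 4 segments, 30 time points each, 5 classes

# Dense per-time-point class confidence scores, shape (T*i, K)
dense = [[random.random() for _ in range(K)] for _ in range(T * i)]

# Mean pooling with width i and stride i: average scores within each
# non-overlapping segment of i time points.
pooled = []
for s in range(T):
    window = dense[s * i:(s + 1) * i]
    pooled.append([sum(col) / i for col in zip(*window)])

# A point-wise (kernel size 1) convolution over the class dimension is
# just a K x K linear map plus bias; random stand-in weights here.
W = [[random.gauss(0, 1) for _ in range(K)] for _ in range(K)]
b = [random.gauss(0, 1) for _ in range(K)]
segment_scores = [
    [sum(p[j] * W[j][k] for j in range(K)) + b[k] for k in range(K)]
    for p in pooled
]

assert len(segment_scores) == T and len(segment_scores[0]) == K
```

Because the pooling is a plain mean, re-running it with a smaller width (a larger e) over the same dense scores yields predictions at a finer temporal scale, which is what enables the inference-time flexibility described above.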
In the following, we describe the data pre-processing, optimization, and evaluation in detail, followed by a description of the datasets considered in our experiments.\n\nPreprocessing: All EEG signals were re-sampled at S = 100 Hz using polyphase filtering with automatically derived FIR filters. Across the datasets, sleep stages were scored by at least one human expert at temporal resolution e = 1/30 Hz. When stages were scored according to the Kales and Rechtschaffen [1968] manual, we merged sleep stages S3 and S4 into a single N3 stage to comply with the AASM standard. We discarded the rare and typically boundary-located sleep stages such as 'movement' and 'non-scored' and their corresponding PSG signals, producing the identical label set {W, N1, N2, N3, R} for all the datasets. EEG signals were individually scaled for each record to median 0 and inter-quartile range (IQR) 1.\n\nSome records display extreme values, typically near the start or end of the PSG studies when electrodes are placed or the subject is entering or leaving the bed. To stabilize the pre-processing scaling as well as learned batch normalization, all 30 second segments that included one or more values higher than 20 times the global IQR of that record were set to zero. Note that this only applied if the segment was scored by the human observer (almost always classified 'wake', as these segments typically occur outside the 'in-bed' region), as they would otherwise be discarded. We set the values to zero instead of discarding them to maintain temporal consistency between neighboring segments.\n\nOptimization: U-Time was optimized using a fixed set of hyperparameters for all datasets. 
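The record-wise robust scaling and artifact handling described in the preprocessing paragraph above can be sketched as follows. This is a minimal stand-in for the authors' pipeline (the `preprocess` helper and its segment-length argument are illustrative, and the 20×-IQR check is applied two-sided here):

```python
import statistics

def iqr(values):
    # statistics.quantiles with n=4 returns the three quartile cut points
    q = statistics.quantiles(values, n=4)
    return q[2] - q[0]

def preprocess(signal, seg_len):
    """Robust-scale a 1D record to median 0 / IQR 1, then zero out any
    segment of seg_len samples containing values beyond 20 x the global
    IQR. Sketch of the preprocessing described in the text."""
    med = statistics.median(signal)
    scale = iqr(signal) or 1.0          # guard against a zero IQR
    x = [(v - med) / scale for v in signal]
    # After scaling the global IQR is 1, so the threshold is simply 20.
    out = []
    for s in range(0, len(x), seg_len):
        seg = x[s:s + seg_len]
        if max(abs(v) for v in seg) > 20:
            seg = [0.0] * len(seg)      # keep the segment, zero its values
        out.extend(seg)
    return out
```

Zeroing rather than dropping the offending segments preserves the temporal alignment between neighboring 30 second epochs, as noted above.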
We used the Adam optimizer [Kingma and Ba, 2014] with learning rate η = 5 · 10^-6, minimizing the generalized dice cost function with uniform class weights [Sudre et al., 2017, Crum et al., 2006], L(y, ŷ) = 1 − 2 (Σ_k^K Σ_n^N y_kn · ŷ_kn) / (Σ_k^K Σ_n^N (y_kn + ŷ_kn)). This cost function is useful in sleep staging, because the classes may be highly imbalanced. To further counter class imbalance, we selected batches of size B = 12 on-the-fly during training according to the following scheme: 1) we uniformly sample a class from the label set {W, N1, N2, N3, R}, 2) we select a random sleep period corresponding to the chosen class from a random PSG record in the dataset, 3) we shift the chosen sleep segment to a random position within the T = 35 width window of sleep segments. This scheme does not fully balance the batches, as the 34 remaining segments of the input window are still subject to class imbalance.\n\nTraining of U-Time was stopped after 150 consecutive epochs of no validation loss improvement (see also Cross-validation below). We defined one epoch as ⌈L/(T·B)⌉ gradient steps, where L is the total number of sleep segments in the dataset, T is the number of fixed-length connected segments input to the model and B is the batch size. Note that we found applying regularization unnecessary when optimizing U-Time, as overfitting was negligible even on the smallest of the datasets considered here (see Sleep Staging Datasets below).\n\nModel specification and hyperparameter selection: The encoder and decoder parts of the U-Time architecture are 1D variants of the 2D U-Net type model that we have found to perform excellently across medical image segmentation problems (described in [Koch et al., 2019b, Perslev et al., 2019]). However, U-Time uses larger max-pooling windows and dilated convolution kernels. 
These changes were introduced in order to increase the theoretical receptive field of U-Time and were made based on our physiological understanding of sleep staging rather than hyperparameter tuning. The only choice we made based on data was the loss function, where we compared dice loss and cross entropy using 5-fold cross-validation on the Sleep-EDF-39 dataset (see below). We did not modify the architecture or any hyperparameters (e.g., learning rates) after observing results on any of the remaining datasets. Our minimal hyperparameter search minimizes the risk of unintentional method-level overfitting. U-Time as applied here has a total of approximately 1.2 million trainable parameters. Note that this is at least one order of magnitude lower than typical CNN-LSTM architectures such as DeepSleepNet [Supratak et al., 2017]. We refer to Table S.2 and Figure S.2 in the Supplementary Material for a detailed model specification as well as to Table S.3 in the Supplementary Material for a detailed list of hyperparameters.\n\nCross-validation: We evaluated U-Time on 7 sleep EEG datasets (see below) with no task-specific architectural modifications. For a fair comparison with published results, we adopted the evaluation setting that was most frequent in the literature for each dataset. In particular, we adopted the number of cross-validation (CV) splits, which are given in the results Table 2 below. All reported CV scores result from single, non-repeated CV experiments. It is important to stress that CV was always performed on a per-subject basis. 
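A per-subject split, as used here, can be sketched as follows. The record/subject naming and the `subject_level_splits` helper are hypothetical; the point is only that subjects, not individual records, are partitioned into folds:

```python
import random

def subject_level_splits(records, n_splits, seed=0):
    """Group (subject, record) pairs by subject, then split the subjects
    into CV folds, so no subject contributes to both training and test
    data. Illustrative sketch, not the authors' pipeline."""
    subjects = sorted({subj for subj, _ in records})
    random.Random(seed).shuffle(subjects)
    folds = [subjects[k::n_splits] for k in range(n_splits)]
    for k in range(n_splits):
        test_subj = set(folds[k])
        test = [r for r in records if r[0] in test_subj]
        train = [r for r in records if r[0] not in test_subj]
        yield train, test

# A subject with two recordings never straddles the train/test boundary:
records = [("s1", "night1"), ("s1", "night2"), ("s2", "night1"),
           ("s3", "night1"), ("s4", "night1")]
for train, test in subject_level_splits(records, n_splits=2):
    assert {s for s, _ in train}.isdisjoint({s for s, _ in test})
```

Splitting on records instead of subjects would leak subject-specific signal characteristics into the test folds, which is exactly the overoptimism the footnote below warns against.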
The entire EEG record (or multiple records, if one subject was recorded multiple times) was considered a single entity in the CV split process.¹ On all datasets except SVUH-UCD, ⌈5%⌉ of the training records of each split were used for validation to implement early-stopping based on the validation F1 score [Sørensen, 1948, Dice, 1945]. For SVUH-UCD, a fixed number of training epochs (800) was used in all splits, because the dataset is too small to provide a representative validation set.\n\n¹ Not doing so leads to data from the same subject being in both training and test sets and, accordingly, to overoptimistic results. This effect is very pronounced. Therefore, we do not discuss published results where training and test set were not split on a per-subject basis.\n\nEvaluation & metrics: In Table 2 we report the per-class F1/dice scores computed over raw confusion matrices summed across all records and splits. This procedure was chosen to be comparable to the relevant literature. The table summarizes our results and published results for which the evaluation strategy was described clearly. Specifically, we only compare to studies in which CV has been performed on a subject level and not segment level. In addition, we only compare to studies that either report F1 scores directly or provide other metrics or confusion matrices from which we could derive the F1 score. We only compare to EEG based methods.\n\nLSTM comparison: We re-implemented the successful DeepSleepNet CNN-LSTM model [Supratak et al., 2017] for two purposes. First, we tried to push the performance of this model to the level of U-Time on the Sleep-EDF-39 and DCSM datasets (see below) through a series of hyperparameter experiments summarized in Table S.13 & Table S.14 in the Supplementary Material. Second, we used DeepSleepNet to establish a unified, state-of-the-art baseline. Because the DeepSleepNet system as introduced in Supratak et al. 
[2017] was trained for a fixed number of epochs without early stopping, we argue that direct application of the original implementation to new data would favour our U-Time model. Therefore, we re-implemented DeepSleepNet and plugged it into our U-Time training pipeline. This ensures that the models use the same early stopping mechanisms, class-balancing sampling schemes, and TensorFlow implementations. We employed pre-training and fine-tuning of the CNN and CNN-LSTM subnetworks, respectively, as in Supratak et al. [2017]. We observed overfitting using the original settings, which we mitigated by reducing the default pre-training learning rate by a factor of 10. For Sleep-EDF-39 and DCSM, DeepSleepNet was manually tuned in an attempt to reach maximum performance (see Supplementary Material). We did not evaluate DeepSleepNet on SVUH-UCD because of the small dataset size.\n\nImplementation: U-Time is publicly available at https://github.com/perslev/U-Time. The software includes a command-line interface for initializing, training and evaluating models through CV experiments automatically distributed and controlled over multiple GPUs. The code is based on TensorFlow [Abadi et al., 2015]. We ran all experiments on an NVIDIA DGX-1 GPU cluster using 1 GPU for each CV split experiment. However, U-Time can be trained on a conventional 8-12 GB memory GPU. Because U-Time can score a full PSG in a single forward pass, segmenting 10+ hours of signal takes only seconds on a laptop CPU.\n\nSleep Staging Datasets: We evaluated U-Time on several public and non-public datasets covering many real-life sleep-staging scenarios. The PSG records considered in our experiments have been collected over multiple decades at multiple sites using various instruments and recording protocols to study sleep in both healthy and diseased individuals. We briefly describe each dataset and refer to the original papers for details. 
Please refer to Table 1 for an overview and a list of used EEG channels.\n\nSleep-EDF: A public PhysioNet database [Kemp et al., 2000, Goldberger et al., 2000] often used for benchmarking automatic sleep stage classification algorithms. As of 2019, the sleep-cassette subset of the database consists of 153 whole-night polysomnographic sleep recordings of healthy Caucasians aged 25-101 taking no sleep-related medication. We utilize both the full Sleep-EDF database (referred to as Sleep-EDF-153) as well as a subset of 39 samples (referred to as Sleep-EDF-39) that corresponds to an earlier version of the Sleep-EDF database that has been extensively studied in the literature. Note that for these two datasets specifically, we only considered the PSGs starting from 30 minutes before to 30 minutes after the first and last non-wake sleep stage as determined by the ground truth labels in order to stay comparable with literature such as Supratak et al. [2017].\n\nPhysionet 2018: The objective of the 2018 Physionet challenge [Ghassemi et al., 2018, Goldberger et al., 2000] was to detect arousal during sleep from PSG data contributed by the Massachusetts General Hospital's Computational Clinical Neurophysiology Laboratory. Sleep stages were also provided for the training set. We evaluated U-Time on splits of the 994 subjects in the training set.\n\nDCSM: A non-public database provided by the Danish Center for Sleep Medicine (DCSM), Rigshospitalet, Glostrup, Denmark, comprising 255 whole-night PSG recordings of patients visiting the center for diagnosis of non-specific sleep related disorders. Subjects vary in demographic characteristics, diagnostic background and sleep/non-sleep related medication usage.\n\nTable 1: Datasets overview. 
The Scoring column reports the annotation protocol (R&K = Rechtschaffen and Kales, AASM = American Academy of Sleep Medicine), Sample Rate lists the original rate (in Hz), and Size gives the number of subjects included in our study after exclusions.\n\nDataset | Size | Sample Rate | Channel | Scoring | Disorders\nS-EDF-39 | 39 | 100 | Fpz-Cz | R&K | None\nS-EDF-153 | 153 | 100 | Fpz-Cz | R&K | None\nPhysio-2018 | 994 | 200 | C3-A2 | AASM | Non-specific sleep disorders\nDCSM | 255 | 256 | C3-A2 | AASM | Non-specific sleep disorders\nISRUC | 99 | 200 | C3-A2 | AASM | Non-specific sleep disorders\nCAP | 101 | 100-512 | C4-A1/C3-A2 | R&K | 7 types of sleep disorders\nSVUH-UCD | 25 | 128 | C3-A2 | R&K | Sleep apnea, primary snoring\n\nISRUC: Sub-group 1 of this public database [Khalighi et al., 2016] comprises all-night PSG recordings of 100 adult, sleep-disordered individuals, some of which were under the effect of sleep medication. Recordings were independently scored by two human experts, allowing performance comparison between the algorithmic solution and human expert raters. We excluded subject 40 due to a missing channel.\n\nCAP: A public database [Terzano et al., 2002] storing 108 PSG recordings of 16 healthy subjects and 92 pathological patients diagnosed with one of bruxism, insomnia, narcolepsy, nocturnal frontal lobe epilepsy, periodic leg movements, REM behavior disorder, or sleep-disordered breathing. We excluded subjects brux1, nfle6, nfle25, nfle27, nfle33, n12 and n16 due to missing C4-A1 and C3-A2 channels or due to inconsistent meta-data information.\n\nSVUH-UCD: The St. Vincent's University Hospital / University College Dublin Sleep Apnea Database [Goldberger et al., 2000] contains 25 full overnight PSG records of randomly selected individuals under diagnosis for either obstructive sleep apnea, central sleep apnea or primary snoring.\n\n4 Results\n\nWe applied U-Time with fixed architecture and hyperparameters to 7 PSG datasets. 
Table 2 lists the class-wise F1 scores computed globally (i.e., on the summed confusion matrices over all records) for U-Time applied to a single EEG channel (see Table 1), our re-implemented DeepSleepNet (CNN-LSTM) baseline and alternative models from the literature. Table S.12 in the Supplementary Material further reports a small number of preliminary multi-channel U-Time experiments, which we discuss below. Table S.5 to Table S.11 in the Supplementary Material display raw confusion matrices corresponding to the scores of Table 2. In Table S.4 in the Supplementary Material, we report the mean, standard deviation, minimum and maximum per-class F1 scores computed across individual EEG records, which may be more relevant from a practical perspective.\n\nEven without task-specific modifications, U-Time reached high performance scores for large and small datasets (such as Physionet-18 and Sleep-EDF-39), healthy and diseased populations (such as Sleep-EDF-153 and DCSM), and across different EEG channels, sample rates, acquisition protocols, and sites. On all datasets, U-Time performed, to our knowledge, at least as well as any automated method from the literature that allows for a fair comparison, even if the method was tailored towards the individual dataset. In all cases, U-Time performed similarly to or better than the CNN-LSTM baseline.\n\nWe attempted to push the performance of the CNN-LSTM architecture of our re-implemented DeepSleepNet [Supratak et al., 2017] to the performance of U-Time on both the Sleep-EDF-39 and DCSM datasets. These hyperparameter experiments are given in Table S.13 and Table S.14 in the Supplementary Material. However, across 13 different architectural changes to the DeepSleepNet model, we did not observe any improvement over the published baseline version on the Sleep-EDF-39 dataset, indicating that the model architecture is already highly optimized for the particular study cohort. 
We found that relatively modest changes to the DeepSleepNet architecture can lead to large changes in performance, especially for the N1 and REM sleep stages. On the DCSM dataset, a smaller version of the DeepSleepNet (smaller CNN filters, specifically) improved performance slightly over the DeepSleepNet baseline.\n\nTable 2: U-Time results across 7 datasets (global F1 scores). U-Time and our CNN-LSTM baseline process single-channel EEG data. Referenced models process single- or multi-channel EEG data. References: [1] [Supratak et al., 2017], [2] [Vilamala et al., 2017], [3] [Phan et al., 2018], [4] [Tsinalis et al., 2016], [5] [Andreotti et al., 2018].\n\nDataset | Model | Eval. Records | CV | W | N1 | N2 | N3 | REM | mean\nS-EDF-39 | U-Time | 39 | 20 | 0.87 | 0.52 | 0.86 | 0.84 | 0.84 | 0.79\nS-EDF-39 | CNN-LSTM [1] | 39 | 20 | 0.85 | 0.47 | 0.86 | 0.85 | 0.82 | 0.77\nS-EDF-39 | VGGNet [2] | 39 | 20 | 0.81 | 0.47 | 0.85 | 0.83 | 0.82 | 0.76\nS-EDF-39 | CNN [3] | 39 | 20 | 0.77 | 0.41 | 0.87 | 0.86 | 0.82 | 0.75\nS-EDF-39 | Autoenc. [4] | 39 | 20 | 0.72 | 0.47 | 0.85 | 0.84 | 0.81 | 0.74\nS-EDF-153 | U-Time | 153 | 10 | 0.92 | 0.51 | 0.84 | 0.75 | 0.80 | 0.76\nS-EDF-153 | CNN-LSTM | 153 | 10 | 0.91 | 0.47 | 0.81 | 0.69 | 0.79 | 0.73\nPhysio-18 | U-Time | 994 | 5 | 0.83 | 0.59 | 0.83 | 0.79 | 0.84 | 0.77\nPhysio-18 | CNN-LSTM | 994 | 5 | 0.82 | 0.58 | 0.83 | 0.78 | 0.85 | 0.77\nDCSM | U-Time | 255 | 5 | 0.97 | 0.49 | 0.84 | 0.83 | 0.82 | 0.79\nDCSM | CNN-LSTM | 255 | 5 | 0.96 | 0.39 | 0.82 | 0.80 | 0.82 | 0.76\nISRUC | U-Time | 99 | 10 | 0.87 | 0.55 | 0.79 | 0.87 | 0.78 | 0.77\nISRUC | CNN-LSTM | 99 | 10 | 0.84 | 0.46 | 0.70 | 0.83 | 0.72 | 0.71\nISRUC | Human obs. | 99 | - | 0.92 | 0.54 | 0.80 | 0.85 | 0.90 | 0.80\nCAP | U-Time | 101 | 5 | 0.78 | 0.29 | 0.76 | 0.80 | 0.76 | 0.68\nCAP | CNN [5] | 104 | 5 | 0.77 | 0.35 | 0.76 | 0.78 | 0.76 | 0.68\nCAP | CNN-LSTM | 101 | 5 | 0.77 | 0.28 | 0.69 | 0.77 | 0.75 | 0.65\nSVUH-UCD | U-Time | 25 | 25 | 0.75 | 0.51 | 0.79 | 0.86 | 0.73 | 0.73\n\n5 Discussion and Conclusions\n\nU-Time is a novel approach to time-series segmentation that leverages the power of fully convolutional encoder-decoder structures. 
It first implicitly segments the input sequence at every time point and then applies an aggregation function to produce the desired output.

We developed U-Time for sleep staging, and this study evaluated it on seven different sleep PSG datasets. For all tasks, we used the same U-Time network architecture and hyperparameter settings. This not only rules out overfitting by parameter or structure tweaking, but also shows that U-Time is robust enough to be used by non-experts – which is of key importance for clinical practice. In all cases, the model reached or surpassed state-of-the-art models from the literature as well as our CNN-LSTM baseline. In our experience, CNN-LSTM models require careful optimization, which indicates that they may not generalize well to other cohorts. This is supported by the observed drop in CNN-LSTM baseline performance when transferred to, for example, the ISRUC dataset. We further found that the CNN-LSTM baseline shows large F1 score variations, in particular for sleep stage N1, under small changes of the architecture (see Table S.13 in the Supplementary Material). In contrast, U-Time reached state-of-the-art performance across the datasets without being tuned for each task. Our results show that U-Time can learn sleep staging based on various input channels across both healthy and diseased subjects. We attribute the general robustness of U-Time to its fully convolutional, feed-forward-only architecture.

Readers not familiar with sleep staging should be aware that even human experts from the same clinical site may disagree when segmenting a PSG.2 While human performance varies between datasets, the mean F1 overlap between typical expert annotators is at or slightly above 0.8 [Stephansen et al., 2018]. This is also the case on the ISRUC dataset, as seen in Table 2.
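The per-time-point classification and interval aggregation described above can be sketched roughly as follows (our illustration under simplifying assumptions, not the authors' implementation): given one softmax confidence vector per input sample, mean-pooling over fixed windows, e.g. 30-second epochs, yields one stage prediction per window.

```python
import numpy as np

def aggregate_dense_scores(scores, points_per_segment):
    """Aggregate per-time-point class scores into segment labels.
    `scores`: array of shape (T, n_classes), one (softmax) confidence
    vector per input sample. Mean-pooling over fixed windows gives one
    confidence vector per segment; argmax yields the predicted stage."""
    T, n_classes = scores.shape
    assert T % points_per_segment == 0, "T must split into whole segments"
    windows = scores.reshape(-1, points_per_segment, n_classes)
    segment_conf = windows.mean(axis=1)     # (n_segments, n_classes)
    return segment_conf.argmax(axis=1)      # one stage label per segment
```

Shrinking `points_per_segment` re-uses the same dense outputs to produce labels at a finer temporal scale than the 30-second training labels, which is the effect visualized in Figure 3.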
2 This is true in particular for the N1 sleep stage, which is difficult to detect due to its transitional nature and non-strict separation from the awake and deep sleep stages.

Figure 3: Visualization of the class confidence scores of U-Time trained on C = 3 input channels on the Sleep-EDF-153 dataset when the segmentation frequency e is set to match the input signal frequency. Here, U-Time outputs 100 sleep stage scores per second. The top, colored letters give the ground-truth labels for each 30-second segment. The height of the colored bars in the bottom frame gives the softmax (probability-like) scores for each sleep stage at each point in time.

U-Time performs at the level of the human experts on the three non-REM sleep stages of the ISRUC dataset, while inferior on the REM sleep stage and slightly below on the wake stage. However, human annotators have the advantage of being able to inspect several channels, including the EOG (eye movement), which often provides important information for separating the wake and REM sleep stages. This is because the EEG activity in wake and REM stages is similar, while – as the name suggests – characteristic eye movements are indicative of REM sleep (see Table S.1 in the Supplementary Material). In this study we chose to use only a single EEG channel to compare to other single-channel studies in the literature. It is highly likely that U-Time for sleep staging would benefit from receiving multiple input channels. This is supported by our preliminary multi-channel results reported in Supplementary Table S.12. On ISRUC and other datasets, the inclusion of an EOG channel improved classification of the REM sleep stage.

We observed the lowest U-Time performance on the CAP dataset, although on par with the model of Andreotti et al. [2018], which requires multiple input channels.
The CAP dataset is difficult because it contains recordings from patients suffering from seven different sleep-related disorders, each of which is represented by only a few subjects, and because of the need to learn both the C4-A1 and C3-A2 channels simultaneously.

Besides its accuracy, robustness, and flexibility, U-Time has a couple of other advantageous properties. Being fully feed-forward, it is fast in practice, as computations may be distributed efficiently on GPUs. The input window T can be dynamically adjusted, making it possible to score an entire PSG record in a single forward pass and to obtain full-night sleep stage classifications almost instantaneously in clinical practice. Because of its special architecture, U-Time can output sleep stages at a higher temporal resolution than provided by the training labels. This may be of importance in a clinical setting for explaining the system's predictions, as well as in sleep research, where sleep stage dynamics on shorter time scales are of great interest [Koch et al., 2019a]. Figure 3 shows an example.

While U-Time was developed for sleep staging, we expect its basic design to be readily applicable to other time series segmentation tasks as well. Based on our results, we conclude that fully convolutional, feed-forward architectures such as U-Time are a promising alternative to recurrent architectures for time series segmentation, reaching similar or higher performance scores while being much more robust with respect to the choice of hyperparameters.

References

M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O.
Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. URL https://www.tensorflow.org/.

F. Andreotti, H. Phan, N. Cooray, C. Lo, M. T. M. Hu, and M. De Vos. Multichannel sleep stage classification and transfer learning using convolutional neural networks. In 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pages 171-174, 2018. doi: 10.1109/EMBC.2018.8512214.

S. Bai, J. Z. Kolter, and V. Koltun. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. CoRR, abs/1803.01271, 2018. URL http://arxiv.org/abs/1803.01271.

S. Biswal, J. Kulas, H. Sun, B. Goparaju, M. B. Westover, M. T. Bianchi, and J. Sun. SLEEPNET: automated sleep staging system via deep learning. CoRR, abs/1707.08262, 2017. URL http://arxiv.org/abs/1707.08262.

Q. Chen and R. Wu. CNN is all you need. CoRR, abs/1712.09662, 2017. URL http://arxiv.org/abs/1712.09662.

W. R. Crum, O. Camara, and D. L. G. Hill. Generalized overlap measures for evaluation and validation in medical image analysis. IEEE Transactions on Medical Imaging, 25(11):1451-1461, 2006. doi: 10.1109/TMI.2006.880587.

L. R. Dice. Measures of the amount of ecologic association between species. Ecology, 26(3):297-302, 1945. doi: 10.2307/1932409.

O. Faust, Y. Hagiwara, T. J. Hong, O. S. Lih, and U. R. Acharya. Deep learning for healthcare applications based on physiological signals: A review. Computer Methods and Programs in Biomedicine, 161:1-13, 2018. doi: 10.1016/j.cmpb.2018.04.005.

O. Faust, H. Razaghi, R. Barika, E. J. Ciaccio, and U. R. Acharya. A review of automated sleep stage scoring based on physiological signals for the new millennia. Computer Methods and Programs in Biomedicine, 176:81-91, 2019. doi: 10.1016/j.cmpb.2019.04.032.

M. M. Ghassemi, B. E. Moody, L. H. Lehman, C.
Song, Q. Li, H. Sun, R. G. Mark, M. B. Westover, and G. D. Clifford. You snooze, you win: The PhysioNet/Computing in Cardiology Challenge 2018. In 2018 Computing in Cardiology Conference (CinC), volume 45, pages 1-4, 2018. doi: 10.22489/CinC.2018.049.

A. L. Goldberger, L. A. N. Amaral, L. Glass, J. M. Hausdorff, P. C. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C.-K. Peng, and H. E. Stanley. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation, 101(23):e215-e220, 2000. doi: 10.1161/01.CIR.101.23.e215.

C. Iber and AASM. The AASM manual for the scoring of sleep and associated events: rules, terminology and technical specifications. American Academy of Sleep Medicine, Westchester, IL, 2007.

S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. CoRR, abs/1502.03167, 2015. URL http://arxiv.org/abs/1502.03167.

A. Kales and A. Rechtschaffen. A manual of standardized terminology, techniques and scoring system for sleep stages of human subjects. Allan Rechtschaffen and Anthony Kales, editors. U.S. National Institute of Neurological Diseases and Blindness, Neurological Information Network, Bethesda, Md, 1968.

B. Kemp, A. H. Zwinderman, B. Tuk, H. A. C. Kamphuisen, and J. J. L. Oberye. Analysis of a sleep-dependent neuronal feedback loop: the slow-wave microcontinuity of the EEG. IEEE Transactions on Biomedical Engineering, 47(9):1185-1194, 2000. doi: 10.1109/10.867928.

S. Khalighi, T. Sousa, J. M. dos Santos, and U. Nunes. ISRUC-Sleep: A comprehensive public dataset for sleep researchers. Computer Methods and Programs in Biomedicine, 124:180-192, 2016. doi: 10.1016/j.cmpb.2015.10.013.

D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014. URL http://arxiv.org/abs/1412.6980.

H. Koch, P. Jennum, and J. A. E. Christensen.
Automatic sleep classification using adaptive segmentation reveals an increased number of rapid eye movement sleep transitions. Journal of Sleep Research, 28(2):e12780, 2019a. doi: 10.1111/jsr.12780.

T. L. Koch, M. Perslev, C. Igel, and S. S. Brandt. Accurate segmentation of dental panoramic radiographs with U-Nets. In IEEE International Symposium on Biomedical Imaging (ISBI), pages 15-19, 2019b. doi: 10.1109/ISBI.2019.8759563.

C. Lea, R. Vidal, A. Reiter, and G. D. Hager. Temporal convolutional networks: A unified approach to action segmentation. CoRR, abs/1608.08242, 2016. URL http://arxiv.org/abs/1608.08242.

J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. CoRR, abs/1411.4038, 2014. URL http://arxiv.org/abs/1411.4038.

W. Luo, Y. Li, R. Urtasun, and R. S. Zemel. Understanding the effective receptive field in deep convolutional neural networks. CoRR, abs/1701.04128, 2017. URL http://arxiv.org/abs/1701.04128.

A. Odena, V. Dumoulin, and C. Olah. Deconvolution and checkerboard artifacts. Distill, 2016. doi: 10.23915/distill.00003.

M. Perslev, E. B. Dam, A. Pai, and C. Igel. One network to segment them all: A general, lightweight system for accurate 3D medical image segmentation. In Medical Image Computing and Computer-Assisted Intervention (MICCAI), volume 11765 of LNCS, pages 30-38. Springer, 2019. doi: 10.1007/978-3-030-32245-8_4.

H. Phan, F. Andreotti, N. Cooray, O. Y. Chén, and M. D. Vos. Joint classification and prediction CNN framework for automatic sleep stage classification. CoRR, abs/1805.06546, 2018. URL http://arxiv.org/abs/1805.06546.

C. Robert, C. Guilpin, and A. Limoge. Review of neural network applications in sleep research. Journal of Neuroscience Methods, 79(2):187-193, 1998. doi: 10.1016/S0165-0270(97)00178-7.

O. Ronneberger, P. Fischer, and T. Brox.
U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention (MICCAI), volume 9351 of LNCS, pages 234-241. Springer, 2015. doi: 10.1007/978-3-319-24574-4_28.

M. Ronzhina, O. Janoušek, J. Kolářová, M. Nováková, P. Honzík, and I. Provazník. Sleep scoring using artificial neural networks. Sleep Medicine Reviews, 16(3):251-263, 2012. doi: 10.1016/j.smrv.2011.06.003.

M. J. Sateia. International classification of sleep disorders, third edition. Chest, 146(5):1387-1394, 2014. doi: 10.1378/chest.14-0970.

C. Schenck, J. Montplaisir, B. Frauscher, B. Hogl, J.-F. Gagnon, R. Postuma, K. Sonka, P. Jennum, M. Partinen, I. Arnulf, V. C. de Cock, Y. Dauvilliers, P.-H. Luppi, A. Heidbreder, G. Mayer, F. Sixel-Döring, C. Trenkwalder, M. Unger, P. Young, Y. Wing, L. Ferini-Strambi, R. Ferri, G. Plazzi, M. Zucconi, Y. Inoue, A. Iranzo, J. Santamaria, C. Bassetti, J. Möller, B. Boeve, Y. Lai, M. Pavlova, C. Saper, P. Schmidt, J. Siegel, C. Singer, E. S. Louis, A. Videnovic, and W. Oertel. Corrigendum to "Rapid eye movement sleep behavior disorder: devising controlled active treatment studies for symptomatic and neuroprotective therapy—a consensus statement from the international rapid eye movement sleep behavior disorder study group" [Sleep Med 14(8) (2013) 795-806]. Sleep Medicine, 15(1):157, 2014. doi: 10.1016/j.sleep.2013.11.001.

T. J. Sørensen. A Method of Establishing Groups of Equal Amplitude in Plant Sociology Based on Similarity of Species Content and Its Application to Analyses of the Vegetation on Danish Commons. Biologiske Skrifter, 5(4):1-35, 1948. Kongelige Danske Videnskabernes Selskab.

J. Stephansen, A. Olesen, M. Olsen, A. Ambati, E. Leary, H. Moore, O. Carrillo, L. Lin, F. Yan, Y. Sun, Y. Dauvilliers, S. Scholz, L. Barateau, B. Hogl, A. Stefani, S.
Hong, T. Kim, F. Pizza, G. Plazzi, S. Vandi, E. Antelmi, D. Perrin, S. Kuna, P. Schweitzer, C. Kushida, P. Peppard, H. Sørensen, P. Jennum, and E. Mignot. Neural network analysis of sleep stages enables efficient diagnosis of narcolepsy. Nature Communications, 9, 2018. doi: 10.1038/s41467-018-07229-3.

C. H. Sudre, W. Li, T. Vercauteren, S. Ourselin, and M. J. Cardoso. Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations. CoRR, abs/1707.03237, 2017. URL http://arxiv.org/abs/1707.03237.

A. Supratak, H. Dong, C. Wu, and Y. Guo. DeepSleepNet: A model for automatic sleep stage scoring based on raw single-channel EEG. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 25(11):1998-2008, 2017. doi: 10.1109/TNSRE.2017.2721116.

M. G. Terzano, L. Parrino, A. Smerieri, R. Chervin, S. Chokroverty, C. Guilleminault, M. Hirshkowitz, M. Mahowald, H. Moldofsky, A. Rosa, R. Thomas, and A. Walters. Atlas, rules, and recording techniques for the scoring of cyclic alternating pattern (CAP) in human sleep. Sleep Medicine, 3(2):187-199, 2002. doi: 10.1016/S1389-9457(02)00003-5.

O. Tsinalis, P. M. Matthews, and Y. Guo. Automatic sleep stage scoring using time-frequency analysis and stacked sparse autoencoders. Annals of Biomedical Engineering, 44(5):1587-1597, 2016. doi: 10.1007/s10439-015-1444-y.

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin. Attention is all you need. CoRR, abs/1706.03762, 2017. URL http://arxiv.org/abs/1706.03762.

A. Vilamala, K. H. Madsen, and L. K. Hansen. Deep convolutional neural networks for interpretable analysis of EEG sleep stage scoring. CoRR, abs/1710.00633, 2017. URL http://arxiv.org/abs/1710.00633.

P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, S. van der Walt, M. Brett, J. Wilson, K. J.
Millman, N. Mayorov, A. R. J. Nelson, E. Jones, R. Kern, E. Larson, C. J. Carey, I. Polat, Y. Feng, E. W. Moore, J. VanderPlas, D. Laxalde, J. Perktold, R. Cimrman, I. Henriksen, E. A. Quintero, C. R. Harris, A. M. Archibald, A. H. Ribeiro, F. Pedregosa, P. van Mulbregt, and SciPy 1.0 Contributors. SciPy 1.0: Fundamental algorithms for scientific computing in Python. CoRR, abs/1907.10121, 2019. URL http://arxiv.org/abs/1907.10121.

S. Warby, S. Wendt, P. Welinder, E. Munk, O. Carrillo, H. Sørensen, P. Jennum, P. Peppard, P. Perona, and E. Mignot. Sleep-spindle detection: crowdsourcing and evaluating performance of experts, non-experts and automated methods. Nature Methods, 11:385-392, 2014. doi: 10.1038/nmeth.2855.

F. Yu and V. Koltun. Multi-scale context aggregation by dilated convolutions. CoRR, abs/1511.07122, 2015. URL http://arxiv.org/abs/1511.07122.