{"title": "Active Learning Applied to Patient-Adaptive Heartbeat Classification", "book": "Advances in Neural Information Processing Systems", "page_first": 2442, "page_last": 2450, "abstract": "While clinicians can accurately identify different types of heartbeats in electrocardiograms (ECGs) from different patients, researchers have had limited success in applying supervised machine learning to the same task. The problem is made challenging by the variety of tasks, inter- and intra-patient differences, an often severe class imbalance, and the high cost of getting cardiologists to label data for individual patients. We address these difficulties using active learning to perform patient-adaptive and task-adaptive heartbeat classification. When tested on a benchmark database of cardiologist annotated ECG recordings, our method had considerably better performance than other recently proposed methods on the two primary classification tasks recommended by the Association for the Advancement of Medical Instrumentation. Additionally, our method required over 90% less patient-specific training data than the methods to which we compared it.", "full_text": "Active Learning Applied to Patient-Adaptive\n\nHeartbeat Classi\ufb01cation\n\nJenna Wiens\nCSAIL, MIT\n\njwiens@csail.mit.edu\n\nJohn V. Guttag\nCSAIL, MIT\n\nguttag@csail.mit.edu\n\nAbstract\n\nWhile clinicians can accurately identify different types of heartbeats in electro-\ncardiograms (ECGs) from different patients, researchers have had limited success\nin applying supervised machine learning to the same task. The problem is made\nchallenging by the variety of tasks, inter- and intra-patient differences, an often\nsevere class imbalance, and the high cost of getting cardiologists to label data\nfor individual patients. We address these dif\ufb01culties using active learning to per-\nform patient-adaptive and task-adaptive heartbeat classi\ufb01cation. 
When tested on\na benchmark database of cardiologist annotated ECG recordings, our method had\nconsiderably better performance than other recently proposed methods on the two\nprimary classification tasks recommended by the Association for the Advance-\nment of Medical Instrumentation. Additionally, our method required over 90%\nless patient-specific training data than the methods to which we compared it.\n\n1 Introduction\n\nIn 24 hours an electrocardiogram (ECG) can record over 100,000 heartbeats for a single patient.\nOf course, a physician is not likely to look at all of them. Automated analysis of long-term ECG\nrecordings can help physicians understand a patient\u2019s physiological state and his/her risk for adverse\ncardiovascular outcomes [1] [2]. Often, an important step in such analysis is labeling the different\ntypes of heartbeats. This labeling reduces an ECG to a set of symbols transferable across patients.\nTrained clinicians can successfully identify over a dozen different types of heartbeats in ECG record-\nings. However, researchers have had limited success using supervised machine learning techniques\nto do the same. The problem is made challenging by the inter-patient differences present in the mor-\nphology and timing characteristics of the ECGs produced by compromised cardiovascular systems.\nThe variation in the physiological systems that produce the data means that a classifier trained on\neven a large set of patients will yield unpredictable results when applied to a new cardiac patient.\nFor this reason, global classifiers are highly unreliable and therefore not widely used in practice [3].\nHu et al were among the first to describe an automatic patient-adaptive ECG beat classifier [4]. It\ndistinguished ventricular ectopic beats (VEBs) from non-VEBs. 
This work employed a mixture of\nexperts approach, combining a global classifier with a local classifier trained on the first 5 minutes of\nthe test patient\u2019s record. Similarly, de Chazal et al augmented the performance of a global heartbeat\nclassifier by including patient-specific expert knowledge for each test patient. Their local classifier\nwas trained on the first 500 labeled beats of each record [3]. More recently, Ince et al developed\na patient-adaptive classification scheme using artificial neural networks by incorporating the first 5\nminutes of each test recording in the training set [5].\nBased on the results from these three studies, it is clear that patient-adaptive classifiers provide\nincreased classification accuracy. Unfortunately, patient-adaptive classifiers are not used in practice\nbecause they require an unrealistic amount of labor to produce a cardiologist-labeled patient-specific\ntraining set. Furthermore, by sampling all of the patient-specific training data from one portion of\nthe ECG, one is at risk for over-fitting to that patient\u2019s physiological state in time. Given a long-\nterm record, which is likely to contain high intra-patient differences, it is likely that constructing the\ntraining set in this manner will not yield a good representation of the patient\u2019s ECG.\nThere has been some success with hand-coded rule-based algorithms for heartbeat classification.\nHamilton et al developed a rule-based algorithm for detecting one type of particularly dangerous\nectopic heartbeat, the premature ventricular contraction (PVC) [6]. While reasonably accurate, rule-\nbased algorithms are inflexible, since they can only be used for a single classification task. 
And to be\nuseful in practice, a classi\ufb01er should not only be capable of adapting to new patients, but also to new\nclassi\ufb01cation problems, since the classi\ufb01cation task in question can change depending on the patient\nor even the clinician. Since the \ufb01eld of ECG research is continuously evolving, tools to analyze the\nsignal should be capable of adapting.\nIn this paper, we show how active learning can be successfully applied to the problems of both\npatient-adaptive and task-adaptive heartbeat classi\ufb01cation. We developed our method with a clin-\nical setting in mind: initially it requires no labeled data, it has no user-speci\ufb01ed parameters, and\nachieves good performance on an imbalanced data set. Applied to data from the MIT-BIH Arrhyth-\nmia Database our method outperforms current state-of-the-art machine learning heartbeat classi\ufb01-\ncation techniques and uses less training data. Moreover, our approach outperforms a rule-based\nalgorithm designed to detect an important class of abnormal beat. Finally, we discuss how the clas-\nsi\ufb01cation method performed when used in a prospective experiment with two cardiologists.\n\n2 Background\n\nWe begin with a brief background on the signal of interest, the ECG. Since we will consider dif-\nferent heartbeat classi\ufb01cation tasks we \ufb01rst present a few examples of heartbeat classes and ECG\nabnormalities.\n\n2.1 The ECG and ECG Abnormalities\n\nAn ECG records a patient\u2019s cardiac electrical activity by measuring the potential differences at the\nsurface of the patient\u2019s body. In most healthy patients, the ECG, measured from Lead II, begins with\na P-wave, is followed by a QRS complex and ends with a T-wave. Figure 1(a) shows an example\nof the ECG of a normal sinus rhythm beat (N). 
The exact morphology and timing of the different\nportions of the wave depend on the patient and lead placement.\n\n[Figure 1, panels (a)-(c): ECG traces plotted as amplitude (mV) versus time (s). Plot annotations: P, Q, R, S, T, RR interval, QT interval, pre-RR interval, post-RR interval.]\n\nFigure 1: Normal sinus rhythm beats like the ones shown in (a) originate from the pacemaker cells\nof the sinoatrial node. Premature ventricular contractions (b) and atrial premature beats (c) are two\nexamples of ectopic beats.\n\nCardiac abnormalities can disrupt the heart\u2019s normal sinus rhythm, and, depending on their type\nand frequency, can vary from benign to life threatening. Examples of ectopic beats (beats that\ndo not originate in the sinoatrial node) are shown in Figures 1(b) and 1(c). Premature ventricular\ncontractions (PVCs) originate in the ventricles instead of in the pacemaker cells of the sinoatrial\nnode. They are common in patients who have suffered an acute myocardial infarction [7] and may\nindicate that a patient is at increased risk for more serious ventricular arrhythmias and sudden cardiac\ndeath [8]. When the electrical impulse originates from the atria, an atrial premature beat is recorded\nby the ECG as shown in Figure 1(c). Atrial premature beats tend not to be life threatening.\nBecause of their specific timing and morphology characteristics these two types of abnormal beats\nare generally distinguishable by trained cardiologists, but there are many exceptions. Not only can\nabnormalities vary from patient to patient, but the same recording may contain beats that belong\nto the same class but all look quite different. 
Figure 2 shows an example of an ECG containing\nmultiform PVCs.\n\nFigure 2: Each PVC is marked by a \u201cV\u201d and each normal sinus rhythm beat is marked by a \u201c\u00b7\u201d. The\nPVC morphology varies greatly among patients and even within recordings from a single patient.\n\n3 Methods\n\nIn this section we describe the two main components of our heartbeat classification scheme. We\nbegin with the process of feature extraction and then present the classification method.\n\n3.1 Feature Extraction\n\nBefore extracting feature vectors, we pre-process and segment the ECG. We used PhysioNet\u2019s au-\ntomated R-peak detector to detect the R-peaks of each heartbeat [9]. Next, we removed baseline\nwander from the signals using the method described in [10]. Once pre-processed, the data was seg-\nmented into individual heartbeats based on fixed intervals before and after the R-peak, so that each\nbeat contained the same number of samples.\nOur goal was to develop a feature vector that worked well not only across patients but also across dif-\nferent heartbeat classification tasks. This led us to use a combination of the ECG features proposed\nin [10], [11], and [12]. The elements of the feature vector, x, are described in Table 1.\n\nTable 1: Heartbeat features used in experiments.\n\nFeatures         Description\nx1, ..., x60     Wavelet coefficients from the last 5 levels of a 6 level wavelet decomposition using a Daubechies 2 wavelet\nx61, x62, x63    The normalized energy in different segments of the beat\nx64, x65, x66    The pre and post RR intervals normalized by a local average, and the average RR interval\nx67              Morphological distance between the current beat and the record\u2019s median beat\n\nThe last, and most novel, feature in Table 1 is a measure of the morphological distance between\nthe represented beat and the median beat for a patient (recalculated every 500 beats). 
The feature is\nbased on the dynamic time warping algorithm used in [12] to measure the morphological distance\nbetween a fixed interval that contains a portion of the Q-T intervals of two beats.\n\n3.2 Classification\n\nOur goal was to develop a clinically useful patient-adaptive heartbeat classification method for solv-\ning different binary heartbeat classification problems. We designed the classifier for use in a clinical\nsetting, where physicians have little time to label beats, let alone tune classifier parameters. Thus,\nit was important that the method should require few cardiologist-labeled heartbeats, and have no\nuser-defined parameters. Based on these goals we developed the algorithm presented below, which\ncombines different ideas from the literature [13-16].\n\nInputs:\n(a) Unlabeled data {x1, ..., xn}\n(b) Max number of initial clusters per clustering, k\n(c) SVM cost parameter C\n(d) Stopping precision \u03b5\n\n1. Cluster the data using hierarchical clustering with two different linkage criteria, yielding at most 2k clusters.\n2. Query the centroid of each cluster. Add these points to the initially empty set of labeled examples.\n3. If the expert labeled all the points as belonging to the same class, stop, else k = 1.\n4. Train a linear SVM based on the labeled examples.\n5. Apply the SVM to all of the data.\n6. If all data that lies on or within the margin is labeled, stop.\n7. Re-cluster data that lie on or within the margin using hierarchical clustering with k = k + 1.\n8. Query the point from each cluster that lies closest to the current SVM decision boundary.\n9. Repeat steps 4-8 until the change in the margin is within \u03b5 of zero.\n\nMany proposed techniques for SVM active learning assume one starts with some set of labeled data\nor, as in [13], the initial training examples are randomly selected. In our application, we start with a\npool of completely unlabeled data. 
Furthermore, since there is often a severe class imbalance (e.g.,\nsome multi-thousand beat recordings contain fewer than a handful of PVCs), choosing a small or even\nmoderate number of random samples is unlikely to be an effective approach to finding representative\nsamples of a record. The choice of initial queries is crucial. If beats from only one class are queried\nthe algorithm could stop prematurely. More generally, the selection of the first set of queries is\nindependent of the binary task, and therefore the first query should contain at least one example\nfrom each of the beat classes contained in the record. We use clustering in an effort to quickly\nidentify representative samples from each class.\nWe experimented with different clustering techniques before choosing hierarchical clustering. On\naverage hierarchical clustering outperformed other popular clustering techniques like k-means. We\nbelieve this can be attributed to the fact that hierarchical clustering has the ability to produce a\nvariety of different clusters by modifying the linkage criterion. We chose to use two complementary\nlinkage criteria in an attempt to address the intra-patient variation present in ECG records. The first\ncriterion is average linkage. Average linkage defines the distance between two clusters, q and r, as the\naverage distance between all pairs of objects in q and r. This linkage is biased toward producing\nclusters with similar variances, and has the tendency to merge clusters with small variances. The\nsecond linkage criterion is Ward\u2019s linkage [17], defined in Equation 1.\n\nd(q, r) = ss(qr) \u2212 [ss(q) + ss(r)]    (1)\n\nwhere ss(qr) is the within-cluster sum of squares for the resulting cluster when q and r are com-\nbined. 
The within-cluster sum of squares, ss(x), is defined as the sum of squares of the distances\nbetween all objects in the cluster and the centroid of the cluster:\n\nss(x) = \\sum_{i=1}^{n_x} \\left| x_i - \\frac{1}{n_x} \\sum_{j=1}^{n_x} x_j \\right|^2    (2)\n\nWard\u2019s linkage tends to join clusters with a small number of points, and is biased towards\nproducing clusters with approximately the same number of samples. If presented with an outlier,\nWard\u2019s method tends to assign it to the cluster with the closest centroid, whereas the average linkage\ntends to assign it to the densest cluster, where it will have the smallest impact on the maximum\nvariance [18].\nOnce the initial queries are labeled, we train a linear SVM, and apply this SVM to all of the data.\nWe use linear SVMs because most heartbeat classification tasks are close to linearly separable and\nbecause linear SVMs require few tuning parameters. Next, we re-cluster the data on or within the\nmargin of the SVM, incrementing the max number of clusters with each iteration. We then query a\nbeat from each cluster that is closest to the SVM decision boundary.\nAs described above, our algorithm would halt when no unlabeled data lay on or within the margin.\nFor some records, however, e.g., those with fusion beats - a fusion of normal and abnormal beats\n- many beats can lie within the margin of the SVM and thus a clinician might end up labeling\nhundreds of beats that add little useful information. Intuitively, one should stop querying when\nadditional training data has little to no effect on the solution. The algorithm, therefore, terminates\nwhen the change in the margin between iterations is within \u03b5.\n\n4 Experiments & Results\n\nWe implemented our algorithm in MATLAB, and used SVMlight [19] to train the linear SVM at\neach iteration. We held the cost parameter of the linear SVM constant, at C = 100, throughout all\nexperiments. 
This value was selected based on previous cross-validation experiments. The stopping\nprecision \u03b5 was held constant at \u03b5 = 10^{-3}. Typical ECG recordings contain beats from 2 to 5 classes\nbut can contain more; based on this a priori knowledge, we conservatively set k = 10. This value\nwas held constant throughout all experiments.\nTo test the utility of our proposed approach for heartbeat classification we ran a series of experi-\nments on data from different patients, and for different classification tasks. First, we compare the\nperformance of a classifier obtained using our approach to two classifiers recently presented in the\nliterature. Next, we directly measure the impact active learning has on the classification of heart-\nbeats by creating our own passive learning classifier using the same pre-processing and features as\nour proposed active learning method. Finally, we test our method using actual cardiologists.\nIn our experiments we report the classification performance in terms of sensitivity (SE), specificity\n(SP), and positive predictive value (PPV). As an overall measure of performance we use the F-score:\n\nF = \\frac{2 \\cdot SE \\cdot PPV}{SE + PPV}    (3)\n\nThe F-score is a commonly-accepted performance evaluation measure in medicine and information\nretrieval where one data class (often the positive class) is more important than the other [20]. We\nuse this measure since the problem of heartbeat classification suffers from severe class imbalance,\nand thus the SE (aka recall) and the PPV (aka precision) are more important than SP.\n\n4.1 Classification Performance\n\nWe tested performance on the MIT-BIH Arrhythmia Database (MITDB) [9], a widely used bench-\nmark database that contains 48 half-hour ECG recordings, sampled at 360Hz, from 47 different\npatients. Twenty-three of these records, labeled 100 to 124, were selected at random from a source\nof 4000 recordings. 
The remaining 25 records, labeled 200 to 234 were selected because they con-\ntain rare clinical activity that might not have been represented had all 48 records been chosen at\nrandom. The database contains approximately 109,000 cardiologist labeled heartbeats. Each beat\nis labeled as belonging to one of 16 different classes. In some sense, the data in the MITDB is\ntoo good. It was collected at 360Hz, which is a higher sampling rate than is typical for the Holter\nmonitors used to gather most long term clinical data. To simulate this kind of data, we resampled\nthe pre-processed ECG signal at 128Hz.\nWe consider the two main classi\ufb01cation tasks proposed by the Association for the Advancement of\nMedical Instrumentation (AAMI): detecting ventricular ectopic beats (VEBs), and detecting supra-\nventricular ectopic beats (SVEBs). These two tasks have been the focus of other researchers in-\nvestigating patient-adaptive heartbeat classi\ufb01cation. Recently, Ince et al [5] and de Chazal et al\n[3] described methods that combine global information with patient-speci\ufb01c information. Ince et al\ntrained a global classi\ufb01er on 245 hand chosen beats from the MITDB, and then adapted the global\nclassi\ufb01er by training on labeled data from the \ufb01rst \ufb01ve minutes of each test record. Their reported\nresults of testing on 44 of the 48 records - all records with paced beats were excluded - from the\nMITDB are reported in Table 2. De Chazal et al trained their global classi\ufb01er on all of the data from\n22 patients in the MITDB, and then adapted the global classi\ufb01er by training on labeled data for the\n\ufb01rst 500 beats of each test record. Their reported results of testing on 22 records -different from the\nones used in the global training set- from the MITDB are also reported in Table 2.\nFor the same two classi\ufb01cation tasks we tested our proposed approach and we report the results\nwhen tested on the records reported on in [5] and [3]. 
In these experiments we exclude the queried\nbeats from the test set, testing only on data the expert hasn\u2019t seen. This was also done in [5] and [3].\nSince we query far fewer beats than the other methods, we end up testing on many more beats.\n\nTable 2: Our proposed method outperforms other classifiers for two common classification tasks.\n\nClassifier    | VEB: SE  SP  PPV  F-Score     | SVEB: SE  SP  PPV  F-Score\nInce et al    | 84.6%  98.7%  87.4%  86.0%    | 63.5%  99.0%  53.7%  58.2%\nProposed (1)  | 99.0%  99.9%  99.2%  99.1%    | 88.3%  100.0%  99.2%  93.4%\nChazal et al  | 94.3%  99.7%  96.2%  95.2%    | 87.7%  96.2%  47.0%  61.2%\nProposed (2)  | 99.6%  99.9%  99.3%  99.5%    | 92.0%  100.0%  99.5%  95.6%\n(1) for the 44 records in common\n(2) for the 22 records in common\n\nAs Table 2 shows, the method proposed here does considerably better than the methods proposed in\n[5] and [3] for each task. For the task of classifying VEBs vs. non-VEBs, our method on average\nused 45 labeled beats (compared to roughly 350 beats for [5] and 500 beats for [3]) per record. For\nthe task of detecting SVEBs, our method used even fewer labeled beats. Recognizing SVEBs is\nconsiderably more difficult than detecting VEBs since the class imbalance problem is even more\nsevere and supra-ventricular beats are harder to distinguish from normal sinus rhythm beats.\n\nTable 3: Our algorithm outperforms a rule-based classifier designed specifically for the task of\ndetecting PVCs.\n\nClassifier      | SE  SP  PPV  F-Score\nHamilton et al  | 92.8%  98.4%  79.5%  85.7%\nProposed (3)    | 99.0%  100.0%  99.3%  99.1%\n(3) for all 48 records\n\nHamilton et al proposed a rule-based classifier for classifying PVCs vs. non-PVCs. Their software\nis freely available online, from eplimited.com. We applied their software to all of the records (see\nTable 3). Their method does particularly poorly on the four records containing paced beats. 
Omitting\nthese four records, the F-Score increases to 91.4%, still worse than our method. One advantage of\nthe rule-based algorithm is that it does not require a labeled training set, whereas on average we\nrequire 45 labeled beats per record. However, unlike our method the rule-based algorithm can only\nbe used for one task.\n\n4.2 The Impact of Active Learning\n\nWe hypothesize that the difference in performance between our method and the other learning-based\nmethods discussed above is attributable partly to the design of our feature vector and partly to the\nmethod of choosing training data. In order to test this hypothesis we ran an experiment that directly\ncompares the effect of actively vs. passively selecting the training set, with all other parameters kept\nthe same (e.g., identical pre-processing, identical feature vectors, etc.).\nFor each of the 48 records in the MITDB we compare a VEB vs. non-VEB classifier built using our\napproach to a linear SVM classifier trained on the first 500 beats of each record. For each patient\nwe record the number of queries made, as well as the performance of each classifier. Table 4 shows\nthe classification results for each method across all patients. The column headed \u201c#Q\u201d gives the\nnumber of beats used for training each classifier, while the column headed \u201cTP\u201d, for true positives,\ngives the number of correctly labeled VEBs. The last row gives the totals across all records for each\nclassification method.\nOverall, our classification approach achieves an F-score over 99%, and the passive technique\nachieves an F-score of 94%. Compared to the passive approach, active learning used over 90%\nless training data, and resulted in over 85% fewer misclassified heartbeats. These results empha-\nsize the fact that active learning can be used to dramatically reduce the labor cost of producing\nhighly accurate classifiers. 
That the passive technique performed better than [5] and almost as well\nas [3], despite not having any global training data, suggests that our feature vector provides some\nadvantage.\n\nTable 4: Active versus passive learning. Active learning outperforms a passive approach, and uses\nover 90% less data.\n\nActive vs. Passive VEB Classification Results\n\nRecord | Proposed: #Q TP TN FP FN | Passive: #Q TP TN FP FN\n100 | 22 1 2258 0 0 | 500 0 2269 0 1\n101 | 19 0 1851 0 0 | 500 0 1861 0 0\n102 | 28 4 2162 0 0 | 500 3 2181 0 1\n103 | 20 0 2073 0 0 | 500 0 2082 0 0\n104 | 30 2 2214 0 0 | 500 1 2224 0 1\n105 | 54 41 2501 0 0 | 500 41 2521 1 0\n106 | 50 520 1497 0 0 | 500 507 1506 0 13\n107 | 31 59 2070 0 0 | 500 11 2076 0 48\n108 | 52 17 1717 0 0 | 500 17 1740 4 0\n109 | 45 38 2463 0 0 | 500 36 2492 0 2\n111 | 22 1 2107 0 0 | 500 0 2122 0 1\n112 | 20 0 2529 0 0 | 500 0 2537 0 0\n113 | 19 0 1783 0 0 | 500 0 1793 0 0\n114 | 51 43 1807 0 0 | 500 42 1833 2 1\n115 | 20 0 1942 0 0 | 500 0 1951 0 0\n116 | 34 109 2283 0 0 | 500 106 2301 0 3\n117 | 20 0 1523 0 0 | 500 0 1533 0 0\n118 | 30 16 2242 0 0 | 500 4 2260 0 12\n119 | 32 444 1523 0 0 | 500 444 1542 0 0\n121 | 24 1 1849 0 0 | 500 0 1860 0 1\n122 | 20 0 2464 0 0 | 500 0 2473 0 0\n123 | 26 3 1500 0 0 | 500 3 1513 0 0\n124 | 32 41 1558 0 6 | 500 30 1570 0 17\n200 | 124 825 1717 0 1 | 500 799 1773 0 27\n201 | 45 198 1737 0 0 | 500 0 1764 0 198\n202 | 41 19 2088 0 0 | 500 19 2114 1 0\n203 | 103 410 2456 15 34 | 500 397 2453 79 47\n205 | 36 70 2574 0 1 | 500 65 2583 0 6\n207 | 109 203 2016 4 7 | 500 190 2060 59 20\n208 | 90 986 1916 0 6 | 500 977 1953 5 15\n209 | 29 1 2993 0 0 | 500 0 3002 0 1\n210 | 90 190 2392 1 5 | 500 180 2434 16 15\n212 | 20 0 2736 0 0 | 500 0 2746 0 0\n213 | 137 215 2985 20 5 | 500 157 3016 12 63\n214 | 53 256 1988 0 0 | 500 254 2002 1 2\n215 | 52 164 3168 0 0 | 500 164 3196 1 0\n217 | 61 162 2009 0 0 | 500 159 2045 0 3\n219 | 41 63 2065 0 1 | 500 52 2089 0 12\n220 | 20 0 2035 0 0 | 500 0 2045 0 0\n221 | 33 396 2022 0 0 | 500 393 2030 0 3\n222 | 20 0 2472 0 0 | 500 0 2480 0 0\n223 | 86 473 2094 7 0 | 500 321 2119 11 152\n228 | 66 362 1662 0 0 | 500 356 1690 0 6\n230 | 30 1 2242 0 0 | 500 0 2253 0 1\n231 | 24 2 1552 0 0 | 500 2 1567 0 0\n232 | 20 0 1771 0 0 | 500 0 1779 0 0\n233 | 91 830 2216 0 0 | 500 810 2245 1 20\n234 | 26 3 2731 0 0 | 500 0 2749 0 3\nTotals | 2148 7169 102573 47 66 | 24000 6540 102427 193 695\n\n4.3 Experiments with Clinicians\n\nTo get a sense of the feasibility of using our approach in an actual clinical setting, we ran an ex-\nperiment with two cardiologists and data from another cohort of patients admitted with NSTEACS.\nThe ECG tracings in this database, unlike those in the MITDB, are not particularly clean, i.e., they\ncontain a considerable amount of noise and many artifacts. This makes them more representative of\nthe data with which an algorithm in clinical use is likely to have to deal. 
We considered 4 randomly\nchosen records, from a subset of patients who had experienced at least one episode of ventricular\ntachycardia in the 7-day period following randomization. For each record, we consider the first\nhalf-hour, giving us a test set of 8230 heartbeats.\nIn these experiments we used a slightly different stopping criterion developed earlier. As our algo-\nrithm chose beats to be labeled, each cardiologist was presented with an ECG plot of the heartbeat\nto be labeled and the beats surrounding it, like the one shown in Figure 3. The cardiologist was\nthen asked to label it according to the following key: 1 = clearly non-PVC, 2 = ambiguous non-PVC,\n3 = ambiguous PVC, 4 = clearly PVC. Because the cardiologists made different choices about how\nsome beats should be labeled, one was asked to label an average of 15 beats/record and the other\nroughly 20 beats/record. The whole process took each cardiologist about 90 seconds per record.\n\nFigure 3: The classifiers trained using active learning both labeled the delineated beat as a PVC,\nwhereas the rule-based algorithm labeled it as a non-PVC.\n\nTable 5: Comparison of active learning using two different experts and Hamilton et al. Results are the sum\nacross four records.\n\nAll Records (8230 beats total)\n\nClassifier      | Size Training Data  TP  TN  FP  FN\nExpert #1       | 60  191  8038  0  1\nExpert #2       | 83  192  8035  3  0\nHamilton et al  | 0  190  8035  3  2\n\nSince the records had not been previously labeled (and it seemed unreasonable to ask our experts\nto label all of them), we used the PVC classification software from [6] to provide a label to which\nwe could compare the labels generated by our method. This gave us three independently generated\nlabels for each beat. When all three classifiers agreed, we assumed that the beat was correctly\nclassified. Out of a possible 8230 disagreements there were only 6. 
We asked a third expert to\nadjudicate all 6 disagreements, and used this as the gold standard to calculate the results for the\nthree classi\ufb01ers shown in Table 5.\n\n5 Summary & Conclusion\n\nThe goal of this work was to produce a clinically useful technique for automatically classifying\nactivity in ECG recordings. The problem is made challenging by the intra- and inter-patient differ-\nences present in the morphology and timing characteristics of the ECG produced by compromised\ncardiovascular systems and by the variability in the classi\ufb01cation tasks that a clinician might want to\nperform. We propose to address these dif\ufb01culties with a method for using active learning to perform\npatient-adaptive and task-adaptive heartbeat classi\ufb01cation.\nWhen tested on the most widely used benchmark database of cardiologist annotated ECG record-\nings, our method had better performance than other recently proposed methods on the two primary\nclassi\ufb01cation tasks recommended by AAMI. Additionally, our method required over 90% less train-\ning data than the methods to which it was compared. We also showed that our method compares\nfavorably to a state-of-the-art hand coded algorithm for a third common classi\ufb01cation task.\nTo test out the practical applicability of our method, we conducted a small study with two cardiol-\nogists. Both cardiologists were able to use our tool with minimal training, and achieved excellent\nclassi\ufb01cation results with a small amount of labor per record.\nThese preliminary results are highly encouraging, and suggest that active learning can be used prac-\ntically in a clinical setting to not only reduce the labor cost but also garner additional improvements\nin performance. Of course, there is still room for improvement. In all experiments we used identical\ninput parameters; further tuning of these parameters may improve results. 
However, in a clinical set-\nting parameter tuning is impractical, and thus more work to investigate automated parameter tuning\nis needed. Based on preliminary experiments we believe that by first learning the optimal number\nof initial clusters for each record one can improve performance while decreasing the total number\nof required labels. It may also be possible to further reduce the amount of required expert labor by\nstarting with a global classifier and then adapting it using active learning.\n\nAcknowledgments\n\nWe would like to thank Benjamin Scirica, Collin Stultz, and Zeeshan Syed for sharing their expert\nknowledge in cardiology and for their participation in our experiments. This work was supported in\npart by the NSERC and by Quanta Computer Inc.\n\nReferences\n\n[1] D. V. Exner, K. M. Kavanagh, M. P. Slawnych et al, and for the REFINE Investigators. Noninvasive risk\nassessment early after a myocardial infarction: The REFINE study. J Am Coll Cardiol, 50(24):2275\u2013\n2284, 2007.\n\n[2] Z. Syed, B. Scirica, S. Mohanavel, P. Sung, C. Cannon, P. Stone, C. Stultz, and J. V. Guttag. Relation to\ndeath within 90 days of non-st-elevation acute coronary syndromes to variability in electrocardiographic\nmorphology. Am J of Cardiol, 103(3), 2009.\n\n[3] P. de Chazal and R. B. Reilly. A Patient-Adapting Heartbeat Classifier Using ECG Morphology and\nHeartbeat Interval Features. Biomedical Engineering, IEEE Transactions on, 53(12):2535\u20132543, Dec.\n2006.\n\n[4] Y. H. Hu, S. Palreddy, and W.J. Tompkins. A Patient-Adaptable ECG Beat Classifier Using a Mixture of\nExperts Approach. Biomedical Engineering, IEEE Transactions on, 44(9):891\u2013900, Sept. 1997.\n\n[5] T. Ince, S. Kiranyaz, and M. Gabbouj. A generic and robust system for automated patient-specific\nclassification of ecg signals. 
IEEE Transactions on Biomedical Engineering, 56(5), May 2009.\n\n[6] P. Hamilton. Open Source ECG Analysis. In Computers in Cardiology, volume 29, pages 101\u2013104, 2002.\n\n[7] J. Bigger, F. Dresdale, R. Heissenbuttel, et al. Ventricular arrhythmias in ischemic heart disease: mechanism, prevalence, significance, and management. Prog Cardiovasc Dis, 19:255, 1977.\n\n[8] T. Smilde, D. van Veldhuisen, and M. van den Berg. Prognostic value of heart rate variability and ventricular arrhythmias during 13-year follow up in patients with mild to moderate heart failure. Clinical Research in Cardiology, 98(4):233\u2013239, 2009.\n\n[9] A. L. Goldberger, L. A. N. Amaral, L. Glass, et al. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation, 101(23):e215\u2013e220, 2000 (June 13). Circulation Electronic Pages: http://circ.ahajournals.org/cgi/content/full/101/23/e215.\n\n[10] P. de Chazal, M. O\u2019Dwyer, and R. B. Reilly. Automatic Classification of Heartbeats Using ECG Morphology and Heartbeat Interval Features. IEEE Transactions on Biomedical Engineering, 51:1196\u20131206, 2004.\n\n[11] K. Sternickel. Automatic pattern recognition in ECG time series. Computer Methods and Programs in Biomedicine, 68:109\u2013115, 2002.\n\n[12] Z. Syed, J. Guttag, and C. Stultz. Clustering and Symbolic Analysis of Cardiovascular Signals: Discovery and Visualization of Medically Relevant Patterns in Long-term Data Using Limited Prior Knowledge. EURASIP Journal on Advances in Signal Processing, 2007:97\u2013112, 2007.\n\n[13] S. Tong and D. Koller. Support vector machine active learning with applications to text classification. Journal of Machine Learning Research, 2:45\u201366, 2002.\n\n[14] S. Dasgupta and D. Hsu. Hierarchical sampling for active learning.
In ICML \u201908: Proceedings of the 25th international conference on Machine learning, pages 208\u2013215, New York, NY, USA, 2008. ACM.\n\n[15] Z. Xu, K. Yu, V. Tresp, X. Xu, and J. Wang. Representative sampling for text classification using support vector machines. In Proceedings of the twenty-fifth European Conference on Information Retrieval, pages 393\u2013407. Springer, 2003.\n\n[16] H. T. Nguyen and A. Smeulders. Active learning using pre-clustering. In Proceedings of the twenty-first international conference on Machine learning, page 79, New York, NY, USA, 2004. ACM.\n\n[17] J. H. Ward. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):234\u2013244, 1963.\n\n[18] S. Kamvar, D. Klein, and C. Manning. Interpreting and extending classical agglomerative clustering algorithms using a model-based approach. In Proceedings of the nineteenth International Conference on Machine Learning, pages 283\u2013290, 2002.\n\n[19] T. Joachims. Making Large-scale Support Vector Machine Learning Practical. MIT Press, Cambridge, MA, USA, 1999.\n\n[20] M. Sokolova, N. Japkowicz, and S. Szpakowicz. Beyond Accuracy, F-score and ROC: a Family of Discriminant Measures for Performance Evaluation, volume 4304 of Lecture Notes in Computer Science, pages 1015\u20131021. Springer Berlin/Heidelberg, 2006.\n", "award": [], "sourceid": 984, "authors": [{"given_name": "Jenna", "family_name": "Wiens", "institution": null}, {"given_name": "John", "family_name": "Guttag", "institution": null}]}