1 ECG beats classification using waveform similarity and RR interval Ahmad Khoureich Ka Abstract—This paper present an electrocardiogram (ECG) of each patient’s electrocardiogram. Therefore the method beatclassification methodbasedonwaveform similarityandRR proposed in this paper is a patient-specific classifier. Wavalet 1 interval. The purpose of the method is to classify six types of transformhasbeensuccessfully usedin the processingofnon 1 heart beats (normal beat, atrial premature beat, paced beat, 0 premature ventricular beat, left bundle branch block beat and stationary signals like electrocardiograms[15]. Thereforeit is 2 rightbundlebranchblockbeat).Theelectrocardiogramsignalis employed in this study for ECG signal denoising and beat first denoised using wavelet transform based techniques. Heart length reduction. A beats database containing five classes of n a beats of 128 samples data centered on the R peak are extracted annotated beats is created for the classifier. And each time a J from the ECG signal and thence reduced to 16 samples data to patient’sECG beatshaveto be classified the fivefirst minutes constitute a feature. RR intervals surrounding the beat are also 0 manually annotated beats of the ECG record is integrated exploited as feature. A database of annotated beats is built for 1 theclassifierforwaveformcomparisontounknownbeats.Tested into that beats database. So the classifier experience grows on46 recordsintheMIT/BIHarrhythmiadatabase,themethod up each time an ECG record is submited to it for automatic ] M shows classification rate of 97.52%. annotation. The beats of the patient’s ECG are first clustered Index Terms—ECG beat classification, RR interval, wavelet by similarity in the shape of their waveform and then each Q transform, patient adaptation. cluster is classified by consideringthe greater similarity of its . o elements to the beats in the classifier beats database. i b I. INTRODUCTION - q WITH the introduction of the string galvanometer by II. ECG SIGNAL PRE-PROCESSING [ Willem Einthoven the electrocardiogram (ECG) has The purpose of this work is to classify six types of heart become one of the most important tools in the diagnosis of 1 beats which are : Normal beats (N), Premature Ventricular v heart deseases. The ECG is the graphicaldisplay of electrical Contractions(PVC), Paced Beats (PB), Atrial Premature Beat 6 activity of the heart recorded from electrodes on the body (APB), Left Bundle Branch Block beats (LBBB) and Right 3 surface [1]. From the plot of an ECG a cardiologist can 8 Bundle Branch Block beats(RBBB). A description of these analyse the shape of the waveform and determine the nature 1 arrhythmias can be found at [1]. Forty six (46) ECG signals ofdeseasesaffictingtheheart.TheabnormalbeatsintheECG . recordedwiththeMason-LikarIIlead(MLII)aretakenfrom 1 pointing to a particuliar desease can be rare and widespread 0 theMIT/BIHarrhythmiadatabaseforthecreationofthebeats in the span of a large record. Therefore, the work of the 1 database and the evaluation of the classifier. 1 cardiologisttrackingdownabnormalitiescanbetedious.Thus The ECG records are generaly noisy, they present a base- : it becomes very helpful to use computer-based diagnosis. v line wander and high frequency noise. Techniques based Besides the fact the ECG record can be noisy, the main i on discrete wavelet transform are used to overcome these X problem in computer based classification is the wide variety disturbances. r in the shape of beatsbelongingto the same class and beatsof a similarshapebelongingtodifferentsclasses[2],[3].Therefore the algorithms in computer-based diagnosis are generaly of A. Discrete Wavelet Transform three steps: EGC beat detection, extraction of usefull features The Discrete Wavelet Transform is mainly based on the from beats and classification. For beat detection a number multiresolution analysis of the wavelet transform introduced of methods are available in the literature [4], [5]. Feature by S. Mallat [16]. With the discrete wavelet transform any extraction can be done in time domain [6], in frequency domain [7], by multiscale decomposition [8], by multifractal function f ∈ L2(R) can be uniquely represented in terms of analysis [9] or by statistical means [2]. The classification an L2-convergentseries: can be performed by neural networks [10], [11], mixture of ∞ experts [12], switchable scheme [13]. x(t)= α φ (t)+ β ψ (t), (1) j0k j0k jk jk Although statistical methods of ECG beats recognition [2], Xk jX=j0Xk [13], [14], reports good recognition accuracy, we believe as where {φ } is an orthonormal system from the scaling in the work of Yu Hen Hu [12] that a good approach in j0k function and {ψ } an orthonormal system from the mother ECG beat classification is to take into account specificities jk wavelet, and A. K. Ka: Institut de Recherche Mathe´matique de Rennes. Universite´ de Rennes 1 - Campus de Beaulieu 35042 Rennes Cedex - France. Universite´ α = x(t)φ (t)dt, β = x(t)ψ (t)dt, CheikhAntaDiopdeDakar.e-mail:[email protected]. j0k Z j0k jk Z jk 2 arethewaveletcoefficients.Thecascadealgorithm[16]allows RR interval will be very small compared to the mean RR thecomputationofthelowerlevelcoefficientsfromthehigher of none premature beats in the ECG signal. The distinction level ones and vice versa: between PVC and APB is that usually a PVC is followed by a compensatory pause i.e., the RR interval between two αjk = hl−2kαj+1,l and βjk = λl−2kαj+1,l (2) QRS enclosing PVC equals twice the normal RR interval; in X X l l contrast,APBsareusuallyfollowedbynocompensatorypause where λk = (−1)k+1h1−k, j and k are integers and {hk} i.e., the RR interval between two QRS enclosing APB is less are the mother wavelet coefficients. The index j indicates the than twice the normal RR interval [1]. resolution level of the multiresolution analysis. The inverse Finally RR ,RR (respectivelythe RR interval before and b a transformation is given by the following equation: after the R peak) along with RR the mean RR of no m premature beats are considered as features in this study. αj+1,l = hl−2kαjk+ λl−2kβjk (3) TheRpeaksaretakenfromtheannotationfilesoftheMIT- X X k k BIH arrhythmia database. From the ECG record, beats are extracted using 128 samples centered at the R peak. After B. ECG signal noise reduction techniques that,thebeatsizeisreducedto16samplesbydiscretewavelet ECGrecordsaregeneralycorruptedbynoisefromdifferent transformasfollows:withthemultiresolutionapprochthe128 sources. The noise can appear as a baseline wander and/or a samples beats at resolution level j0 is denoted by high frequency oscillation along the signal. 2j0−1 Inthiswork,forbaselinewandercancellationwehaveused themethodproposedin[17]foritsabilitytoeliminatebaseline b128(t)= αj0kφj0k(t), βj0k =0, (4) X drift without including distortion in the signal. On the other k=0 handwe havedeltwiththe highoscillationnoisebyusingthe its decomposition to a resolution level j0−3 is soft thresholding technique proposed by D. Donoho [18]. 2j0−3−1 j0 2j−1 We have made a java implementation of these two noise reduction techniques and run the program on a noisy ECG b128(t)= αj0−3,kφj0−3,k(t)+ βjkψjk(t), Xk=0 j=Xj0−3 kX=0 signal. The result is shown in figure 1. (5) (a) φ and ψ are scale functions and wavelet functions at the 1500 corresponding resolution level. 1400 1300 The beats at resolution level j0 can be characterized by 1200 2j0 = 128 samples per length unit, then j0 =7, thus the last 1100 equation becomes 1000 900 24−1 7 2j−1 800 0 1000 2000 3000 4000 5000 6000 b128(t)= α4,kφ4,k(t)+ βjkψjk(t) (6) (b) Xk=0 Xj=4 kX=0 350 300 Bysettingallthedetailcoefficientsβ tozeroweconsider jk 250 200 the reduced beat 150 100 24−1 50 0 b(t)= α4,kφ4,k(t) (7) -50 Xk=0 -100 0 1000 2000 3000 4000 5000 6000 asafeatureinourmethod.Thereductionofthebeatsizefrom Fig. 1. Noise cancellation result on record number 113, MLII, from MIT- 128to16samplessaveshardwarememoryandacceleratesthe ArrhythmiaDatabase: (a)Original signal(b)Denoisedsignal. processing speed. III. PROPOSED METHOD B. Classification A. Feature extraction LetAbetheorthonormalfamilyofscalingfunctions{φ4,k} at resolution level 24. From equation (7) we see that A It is known that the morphology of the QRS complex along with the instantanuous RR interval (interval between generatesavectorspaceB⊂L2(R)containingallthebeatsin two successives R peaks) play an important role in the heart our study. Therefore the scalar product in L2(R) can also be definedforeverytwovectorsofB:|Ψi= Ψ |φi and deseasesdiagnosis[13],[14].FortheclassificationofanECG φ∈A φ |Ψ′i = Ψ′|φi by hΨ|Ψ′i = P Ψ Ψ′, where beat the ratio of the RR interval before it to the one after φ∈A φ φ∈A φ φ it is a usefull feature. For beats such as N, LBBB, RBBB Ψφ denoPte the complex conjugate of ΨPφ (it coincides with and PB this ratio is near or equal to 1 and for beats in the Ψφ if it is real). We can define a similarity function on B as classe of APB and PVC this ratio is rather less than 1. But follow: sometimesithappensthatPVCsorAPBsoccuringroups[1], hΨ|Ψ′i σ: B×B→[0,1], σ(Ψ,Ψ′)= (8) in such case the ratio can be near or equal to 1 but the kΨk·kΨ′k 3 k·k denotes a Hilbert norm on B. Applied on two beats the (RBBB), artial premature beat (APB), premature ventricular similarity function indicates the degree of proximity in their contraction (PVC) and paced beat (PB) are considered. The shape. Such similarity functionshave been used previously in RR intervals are calculated from the position of the R peak literature in many different contexts, see [19] for a recent up documented in the annotation files of the MIT/BIH database. to date review. The mean RR (RR ) used in the identification of atrial m As stated in [20] we believe that computer analysis cannot premature beats (APB) is calculated using the 5 first minutes substitute physician’s interpretation of ECG. Therefore our of the signal. For the clustering of beats before classification, classifier use a database of known beats (reduced length we consider that two beats belong to the same cluster if their version of the original 128 samples beats) taken within the similarity is greater or equals to 0.95. five first minutes of each record in the MIT/BIH arrhythmia database. The classifier beat database contains five classes of TABLEI CLASSIFICATIONRESULTS beats which are N, LBBB, RBBB, PB and PVC. The APB beat type is not present in that database because APBs can Record# #Beat #Misclassification Classif. rate(%) be similar to N, LBBB and RBBB, therefore its identification 100 1892 0 100 101 1514 2 99.87 is mainly based on the ratio of the previous RR interval to 103 1721 5 99.71 the following one (RRb/RRa). The beats database acts as 105 2141 71 96.68 the classifier knowledge. It grows up each time a patient’s 106 1688 52 96.92 ECG record is treated because the manually annotated beats 107 1776 11 99.38 108 1468 53 96.39 in the five first minutes are included in it. This practice is 109 2089 11 99.47 conformed to the AAMI (American Association of Medical 111 1768 41 97.68 Instrumentation) recommended procedure which allows the 112 2101 2 99.90 113 1498 3 99.80 use of the first 5 minutes of data in an ECG record to fine 114 1590 44 97.23 tune the classifier [12]. 115 1628 1 99.94 After the denoising step, the classification of a patient’s 116 2007 23 98.85 ECG record is done with the following steps: 117 1277 3 99.77 118 1907 42 97.80 (1) Beats are extracted using 128 samples data centered on 119 1653 0 100 the R peaks. For each beat the RR interval before (RRb) and 121 1551 5 99.68 after (RR ) its R peak are taken. Afterwards beat length are 122 2044 0 100 a 123 1262 0 100 reducedto16samplesusingdiscretewaveletdecompositionas 124 1334 12 99.10 described in the feature extraction section. The resulted beat 200 2156 76 96.47 is normalized to reduce waveform differencies in the same 201 1408 44 96.88 class by subtracting the mean value and then dividing by the 202 1860 63 96.61 203 2467 592 76 standard deviation [13]. 205 2180 6 99.72 (2) The beats within the 5 first minutes of the ECG are 207 1475 21 98.58 included in the classifier beat database. And the mean RR 208 2122 19 99.10 209 2509 109 95.66 (RR ) of beats in those 5 minutes is taken as feature. m 210 2164 57 97.37 (3) The beats in the 25 minutes remaining are first hierar- 212 2275 7 99.69 chical clustered by similatity before classification. 213 2450 40 98.35 (4) The class of a beat is identified using its highest 214 1866 86 95.39 215 2780 104 96.26 similarity to beats in the classifier database. 217 1608 12 99.25 (5) If the class is found to be N or LBBB or RBBB, the 219 1764 44 97.51 ratio RR /RR is calculated. 220 1685 36 97.86 b a 221 2011 2 99.90 If RRb/RRa < (1−ǫ1) ou (RRa +RRb) < (2RRm − 222 1892 279 85.25 ǫ2) then the class is changed to APB otherwise the class is 223 2166 265 87.77 unchanged.Theoptimalǫ1 andǫ2 valuewillbeidentifiedwith 228 1695 29 98.29 experiments. 230 1849 2 99.89 231 1270 12 99.06 (6)Theclassofaclusteristheclassofitselementwhichhas 232 1478 16 98.92 the highestsimilarity to a beatin the classifier ECG database. 233 2543 44 98.27 234 2231 0 100 Average 97.52 IV. RESULTSAND DISCUSSION The classifier was tested with the 25 minutes of data from The performance of the classifier is evaluated in terms each30minutesECGsignalsrecordedwiththeMason-Likar of accuracy rate by ECG signal and overall accuracy. The II lead (MLII) in the MIT/BIH arrhythmia database. The 5 classification results are summarized in Table 1. firstminutesofthoseECGsignalswereusedtobuildthebeats Theoverallclassificationresultisfoundtobe97.52%.This databaseusedbytheclassifier.IneachECGrecordusedinthis is a good recognition accuracy in regard to the fact exposed study only beats in the class of normal beat (N), left bundle in[21]thatthepercentageofECGscorrectlyclassified bythe branch block beat (LBBB), right bundle branch block beat computer programs have a median of 91.3%. 4 Insomerecords(203,222and223)theclassificationrateis REFERENCES rather low. An examination of these records indicates a high [1] F. G. Yanowitz, ”THE ALAN E. LINDSAY ECG TUTORIAL V6.0”, variation of length of the RR intervals around normal beats. University ofUtahSchoolofMedicine (July2007). This variations causes normal beats to be classified as APBs. [2] S.Osowski,T.H.Linh,“ECGbeatrecognitionusingfuzzyhybridneural network,” IEEETrans.Biomed. Eng.(2001)48(11),1265-1271. Another fact to take in consideration is that misclassification [3] L.Shyu,W.Hu,“Intelligent HybridMethodsforECGClassification-A can result from errors in the position of R peaks provided by Review,” Journal of Medical and Biological Engineering, 28(1): 1-10 the manually annotated files in the MIT-BIH database. Our (2007). [4] V. X. Afonso, W. J. Tompkins, T. Q. Nguyen, and S. Luo, “ECG similarity function can give bad results for two beats in the beat detection using filter banks”. IEEE Transactions on Biomedical same class when their R peak are misaligned at the top of Engineering, (1999)46(2),192202. theirQRScomplex.Wearecurrentlydeveloppingapromising [5] B.U.Kohler,C.HennigandR.Orglmeister,“Theprinciplesofsoftware QRSdetection,” IEEEEngineeringinMedicineandBiology,21:42-57, method for the detection of the summit of R peaks for better 2002. alignment when comparing two beats. [6] P. De Chazal, and R. B. Reilly, “Automatic classification of ECG A little difficulty in the application of our method can be beats using waveform shape and heart beat interval features,” IEEE International Conference on Acoustic, Speech and Signal Processing the creation of the beats database used by the classifier. We (ICASSP03),HongKong,China(2003)Vol.2,pp.269-272. think, it is possible to find a way to reduce the 5 minutes [7] U. R. Acharya, A. Kumar, P. S. Bhat, C. M. Lim, S. S. Lyengar, N. a cardiologist have to annotate. But we stay convinced as Kannathal,etal,“Classificationofcardiacabnormalitiesusingheartrate signals,”MedicalandBiologicalEngineeringandComputing,(2004)42, in [21] that Computer-based interpretation of the ECG is an 288-293. adjunct to the electrocardiographer, and all computer-based [8] G. K. Prasad, and J. S. Sahambi, “Classification of ECG arrhythmias reports require physician overreading. usingmulti-resolution analysis andneuralnetworks,”IEEEConference onConvergent Technologies (Tecon2003), Bangalore, India Vol. 1,pp. A comparison of our method to other works in the field of 227-231. automatic ECG beat classification is summerized in table 2. [9] P. Ivanov, M. QDY, R. Bartsch, et al, “Levels of complexity in scale- But actually, efficency comparison of methods is not straigh- invariant neuralsignals,”PhysicalReview E79,041920(2009). [10] Y.zbaya,R.Ceylana,B.Karlikb,“Afuzzyclusteringneuralnetworkar- forward due to the difference in the test conditions (type of chitectureforclassificationofECGarrhythmias,”ComputersinBiology beats to classify and number of beats used for test). andMedicine 36(2006)376 388. [11] S. Osowski, T. Markiewicz, L. Tran Hoai, “Recognition and classi- fication system of arrhythmia using ensemble of neural networks,” TABLEII Measurement 41(2008)610-617. COMPARATIVERESULTSOFDIFFERENTECGBEATCLASSIFICATION [12] Y.H.Hu,S.Palreddy, andW.J.Tompkins,“APatient-adaptable ECG METHODS beatclassifierusingamixtureofexpertsapproach,” IEEETransactions onBiomedical Engineering, (1997)44(9),891-900. Method Numberofbeattype Accuracy(%) [13] S.-NienYu,K.-ToChou, “A switchable scheme forECGbeat classifi- ICA[13] 6 99.51 cationbasedonindependentcomponentanalysis,”ExpertSystemswith FTNN[22] 3 98 Applications 33(2007)824-829. MOE[12] 4 94 [14] M.Engin,“ECGbeatclassificationusingneuro-fuzzynetwork,”Pattern MRANN[8] 13 96.79 Recognition Letters 25(2004)1715-1722. FHNN[2] 7 96.6 [15] L.Senhadji, G.Carrault, J.J.BellangerandG.Passariello, “Comparing Ourmethod 6 97.52 Wavelet TransformsforRecognizing Cardiac Patterns. IEEEEngineer- inginMedicine andBiology,” 14:167-173,1995. [16] S. Mallat, “A theory for multiresolution signal decomposition: the It is interesting to note that some methods published in the wavelet representation,” IEEE Pattern Analysis. and Machine Intelli- literature are not tested on a large number of beats as we gence.,1989,vol.11,no.7,pp.674-693. [17] A.Khawaja.Thesis:“AutomaticECGAnalysisusingPrincipalCompo- do.For exemplein theIndependentComponentAnalysis[13] nent Analysis and Wavelet Transformation,” Vol. 3Karlsruhe Transac- method, authors have used per ECG signal 100 beats for tionsonBiomedical Engeneering(2007). training and 100 beats for testing even if the signal contains [18] D.Donoho,I.Johnstone,“Adaptingtounknownsmoothnessviawavelet shrinkage,” J.ASA,1995,vol.90,pp.1200-1223. more than 2000 beats. Since the morphology of beats of the [19] T. Sierocinski. Thesis: “Me´thodes probabilistes, floues et quantiques same type not only changes from patient to patient but also pourl’extraction del’information biologique,” Universite´ deRennes 1, withinthesamepatient[3],thenumberofbeatsfortestingcan IRMAR,Ecoledoctoral MATISSE(2008). [20] A. H. Kadish, “ACC/AHA Competence Statement on Electrocardiog- impact on the recognition accuracy. In the table 1 if we get raphy and Ambulatory Electrocardiography,” Journal of The American ridoftherecords203,222and223whichgivebadresultsthe HeartAssociation, Circulation 2001;104;3169-3178. overall classification rate of our method increases from 97.52 [21] P. Kligfield, “Standardization and Interpretation of the ECG, Part I,” JournalofTheAmericanHeartAssociation,Circulation2007;115;1306- to 98.53%. 1324. [22] K. Minami, H. Nakajima, T. Toyoshima, “Real-time discrimination ofventricular tachyarrhythmia with Fourier-transform neural network,” V. CONCLUSION IEEETransactions onBiomedical Engineering, 46(2),179185. In this paper, we present a patient-adaptable ECG beat classification method based on a similarity function and a beats database which acts as the classifier knowlegde. Dis- crete wavelet transform is also used for the ECG signal preprocessing. The method uses a simple approach and runs with low processing cost in comparison with those using neural networks or fuzzy logic. A promising accuracy in the classification of six types of heart beats has been reached.