sensors — Article

LiteNet: Lightweight Neural Network for Detecting Arrhythmias at Resource-Constrained Mobile Devices

Ziyang He 1, Xiaoqing Zhang 1, Yangjie Cao 1,2,*, Zhi Liu 3, Bo Zhang 2 and Xiaoyan Wang 4

1 Collaborative Innovation Center for Internet Healthcare, Zhengzhou University, 75 University North Road, Erqi District, Zhengzhou 450000, China; [email protected] (Z.H.); [email protected] (X.Z.)
2 School of Software Engineering, Zhengzhou University, 97 Culture Road, Jinshui District, Zhengzhou 450000, China; [email protected]
3 Department of Mathematical and Systems Engineering, Shizuoka University, 5-627, 3-5-1 Johoku, Hamamatsu 432-8561, Japan; [email protected]
4 College of Engineering, Ibaraki University, 4-12-1 Nakanarusawa, Hitachi, Ibaraki 316-8511, Japan; [email protected]
* Correspondence: [email protected]; Tel.: +86-371-638-89129

Received: 1 April 2018; Accepted: 12 April 2018; Published: 17 April 2018

Abstract: By running applications and services closer to the user, edge processing provides many advantages, such as short response time and reduced network traffic. Deep-learning based algorithms provide significantly better performance than traditional algorithms in many fields but demand more resources, such as higher computational power and more memory. Hence, designing deep learning algorithms that are more suitable for resource-constrained mobile devices is vital. In this paper, we build a lightweight neural network, termed LiteNet, which uses a deep learning algorithm design to diagnose arrhythmias, as an example of how we design deep learning schemes for resource-constrained mobile devices. Compared to other deep learning models with an equivalent accuracy, LiteNet has several advantages. It requires less memory, incurs lower computational cost, and is more feasible for deployment on resource-constrained mobile devices. It can be trained faster than other neural network algorithms and requires less communication across different processing units during distributed training.
It uses filters of heterogeneous size in a convolutional layer, which contributes to the generation of various feature maps. The algorithm was tested using the MIT-BIH electrocardiogram (ECG) arrhythmia database; the results showed that LiteNet outperforms comparable schemes in diagnosing arrhythmias and in its feasibility for use at mobile devices.

Keywords: deep learning algorithms; lightweight neural network; resource-constrained mobile devices; electrocardiogram

Sensors 2018, 18, 1229; doi:10.3390/s18041229

1. Introduction

Cardiovascular diseases (CVDs) are the main cause of mortality in the world. The World Health Organization (WHO) reported that the total number of people who died from CVDs was approximately 17.5 million in 2012 and 17.7 million in 2015, and that the total number of deaths due to CVDs continues to grow every year [1]. Therefore, CVDs pose a great threat to human health. CVDs mainly consist of arrhythmias, high blood pressure, coronary artery disease, and cardiomyopathy [2]. The electrocardiogram (ECG) is a standard piece of equipment for testing for arrhythmias; however, handling a large number of ECG samples manually is laborious and time-consuming. A computer-aided arrhythmia diagnosis system [3] on carry-on mobile devices can automatically notify patients in real time, thus improving the efficiency of daily arrhythmia detection.

In recent years, smart mobile devices, such as smart watches, mobile phones and other wearable devices, have become more and more popular, resulting in the development of a large number of sensors and mobile applications. Nowadays, various mobile devices are starting to be equipped with ECG sensors for ECG recording. Most mobile devices nowadays are still limited in both computational power and memory capacity, and are thus unfit for existing resource-demanding data analysis approaches. Therefore, in existing approaches, mobile devices often simply collect and store ECG data for analysis by human experts at a later point in time.
Automatic analysis schemes try to transmit these signals to a remote server for analysis, which may easily overwhelm the limited network bandwidth of mobile devices and cause a prolonged delay. Therefore, designing an automatic analysis algorithm that runs on resource-constrained mobile devices or on the network edge [4–10] can help reduce both human labor and response time, with virtually no network bandwidth consumption; hence, the development of this technology is of fundamental importance. In this paper, we address real-time automatic arrhythmia detection by investigating an ECG arrhythmia detection mechanism that runs on resource-constrained mobile devices. We propose a light-weight edge computing algorithm based on deep learning for this task.

Deep-learning based classification algorithms [11], which often outperform traditional algorithms substantially, are emerging and are more and more widely supported by the information technology society [12]. They have been successfully applied in many fields, such as image classification, speech and natural language processing, scene labeling, etc. Deep-learning based algorithms are useful for identifying different wave types and the complicated relationships among them in time series. Furthermore, deep-learning based algorithms outperform hand-made feature extraction methods assembled with traditional classifiers, and can achieve equivalent accuracies for both noise-free and noisy data. Therefore, numerous deep learning models have been proposed as useful tools for arrhythmia detection in ECG signals [13,14]. These models range from simple feed-forward networks and back-propagation neural networks to complex networks such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs). The computational complexity of deep-learning-based algorithms in both the training and working phases is strongly related to algorithm scale.
To our knowledge, most deep-learning-based algorithms mainly focus on improving arrhythmia detection accuracy, while paying little attention to algorithm scale or model size reduction (e.g., memory and computational power requirements), leaving these models poorly suited for resource-constrained mobile devices. Therefore, designing a light-weight, deep-learning based algorithm for resource-constrained mobile devices with comparable accuracy remains a challenge. Various strategies have recently been used to build small and efficient neural networks. Inspired by GoogleNet [15], SqueezeNet [16] and MobileNets [17], we devised a light-weight, CNN-based neural network for resource-constrained mobile devices, namely LiteNet, for accurately diagnosing arrhythmias in real time. LiteNet focuses on balancing the tradeoff between the model size and the accuracy of the output. In summary, the contributions of this paper are as follows:

• We propose a light-weight neural network model, named LiteNet, which can not only be trained faster than traditional deep learning algorithms on remote servers, but which is also dramatically more resource-friendly when working on mobile devices. LiteNet can therefore be trained on low-capacity servers and the trained model can be installed on resource-constrained mobile devices for arrhythmia detection with low resource consumption.
• Filters with heterogeneous sizes in each convolutional layer are designed to get various feature combinations, in order to achieve high accuracy. Both the size of each filter and the total number of filters can be adapted within each convolutional layer, which helps substantially in obtaining different feature maps in a convolutional layer.
• LiteNet verifies that the Adam optimizer [18] can be used to optimize its stochastic objective function. It can improve the accuracy of the model compared with the traditional gradient descent optimizer while requiring minimal parameter tuning in the training process.
• We conducted extensive experiments to evaluate the performance of LiteNet in terms of both accuracy and efficiency.
Experimental results confirm that LiteNet outperforms recent state-of-the-art networks in that it achieves comparable or even higher accuracy with much higher resource-efficiency. LiteNet is thus well suited for resource-constrained mobile devices.

The remainder of this paper is organized as follows. Section 2 discusses related work. Section 3 introduces the algorithm of LiteNet. The experiments and results are elucidated in Section 4. Finally, we present our conclusions in Section 5.

2. Related Work

We next briefly review related work on edge computing and arrhythmia detection. For each subject we highlight the differences between existing approaches and LiteNet.

2.1. Edge Processing

Recently, the number of mobile applications running on mobile devices has increased drastically; this has promoted the development of mobile edge computing (MEC), including the connected base station and mobile devices. By running applications and services closer to the user, MEC brings many advantages, such as shorter response time and reduced network traffic. However, compared to the traditional cloud, computational resources such as CPU cycles and memory are limited. To address this situation, some research [19,20] has focused on how to perform optimal resource allocation on resource-constrained mobile devices, and conducted MEC in various scenarios [21–23].

Edge processing algorithms play a crucial role in providing efficient MEC. Although deep-learning based algorithms usually have excellent performance, they demand high computational capability and large memory. Hence, traditional deep learning algorithms are too heavy for mobile devices. In the literature, there has been an emerging interest in designing light-weight deep learning models. GoogleNet introduced the Inception module, shown in Figure 1a, for better feature extraction at low computational cost. Convolution kernels of heterogeneous and small sizes (1 × 2, 1 × 3 and 1 × 1) are combined in this module.
However, if directly applied to mobile devices, this module may overwhelm their memory capacity due to its large feature maps. In contrast, the Fire module in SqueezeNet, shown in Figure 1b, focuses on reducing the parameter volume by limiting the number of feature maps. SqueezeNet replaces a standard convolution layer with two specially designed layers: a squeeze layer and an expand layer. Squeeze layers use filters of size 1 × 1, and the number of filters is at least 2× less than in the preceding layer, while expand layers use a mixture of filters of size 1 × 1 and 1 × 3 to guarantee a certain accuracy in a parameter-volume-controlled manner. We adapt the 1 × 1 squeeze layer in LiteNet for parameter compression, and term it the squeeze convolution layer in our work. In MobileNets, Howard et al. used a depthwise separable convolution [24] strategy that factorizes a standard convolution into a depthwise convolution and a pointwise convolution, as shown in Figure 1c. Depthwise convolution applies one convolution kernel to each feature map, while pointwise convolution is used to combine the outputs of the preceding layer. Such a structure can therefore effectively reduce the computational load by at least 2× compared to standard convolutions.

In this paper, LiteNet combines the strengths of these modules to achieve high feature extraction ability while maintaining a low computational cost through parameter volume compression. We describe LiteNet, which applies a lightweight deep-learning based algorithm to identify arrhythmias, in detail below. Figure 2 shows an example of the deep learning scheme for resource-constrained mobile devices. As shown in Figure 2, resource-constrained edge devices first collect ECG data from the users through smart sensors; the LiteNet model, deployed on resource-constrained mobile devices (mobile phone, smart watch and ECG monitor), is then used to detect different arrhythmias with the collected data; finally, the LiteNet model analyzes the ECG data and produces an arrhythmia identification result to the users in real time.
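The roughly 2× saving attributed above to depthwise separable convolutions can be checked with back-of-the-envelope arithmetic. The sketch below is ours; the kernel width, channel counts and sequence length are illustrative assumptions, not values taken from the paper:

```python
# Multiply-accumulate (MAC) counts for 1-D convolutions over a length-L
# sequence with c_in input channels, c_out output channels and kernel
# width k (assuming the output keeps length L).

def standard_conv_macs(L, k, c_in, c_out):
    # Each of the L output positions combines k * c_in inputs, per filter.
    return L * k * c_in * c_out

def depthwise_separable_macs(L, k, c_in, c_out):
    depthwise = L * k * c_in        # one k-wide filter per input channel
    pointwise = L * c_in * c_out    # 1 x 1 convolution mixes channels
    return depthwise + pointwise

# Illustrative numbers (our assumptions, not taken from the paper):
L, k, c_in, c_out = 360, 3, 6, 6
ratio = standard_conv_macs(L, k, c_in, c_out) / depthwise_separable_macs(L, k, c_in, c_out)
print(ratio)   # 2.0 -- consistent with the "at least 2x" reduction noted above
```

The ratio grows with the kernel width and the number of output channels, which is why the factorization pays off most for wider kernels.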
Figure 1. (a) Inception module; (b) Fire module; (c) Depthwise separable convolution that factorizes a standard convolution into a depthwise convolution and a pointwise convolution.

Figure 2. System illustration of arrhythmia detection based on ECG using LiteNet.

2.2. Arrhythmia Detection

Arrhythmia detection using ECG is an important research topic, and numerous algorithms in both traditional machine learning and deep learning have been developed.

With conventional machine learning, the diagnosis of arrhythmias using ECG requires several processes: data preprocessing, feature extraction, normalization and classification. Due to the existence of baseline drift and ECG noise (e.g., muscle motion), it is vital for traditional machine learning methods to perform efficient and accurate de-noising operations before feature extraction can be employed. Common solutions, such as the low-pass linear-phase filter, high-pass linear-phase filter, median filter, and mean median filter, are usually used for such de-noising tasks. Classical feature extraction approaches, such as continuous wavelet transform (CWT) [25], S-Transform (ST), discrete Fourier transform (DFT), principal component analysis (PCA), Daubechies wavelet (Db4) [26], and independent component analysis (ICA) [27], can then be applied. Researchers in [28] used three machine-learning based algorithms, namely Discrete Wavelet Transform (DWT) [29], Empirical Mode Decomposition (EMD) [29] and Discrete Cosine Transform (DCT), to obtain coefficients from ECG signals. Then, the researchers adopted the Locality Preserving Projection (LPP) method to reduce the number of these coefficients and used the F-value measure to rate the LPP features. Finally, the best coefficients were fed into a K-Nearest Neighbor (KNN) model for arrhythmia diagnosis. The results from the experiments proved that the machine-learning-related algorithms that they devised achieved excellent performance.
Similarly, Personalized Features Selection and Support Vector Machines were used to identify arrhythmias in [30,31], respectively. They are able to extract features accurately and achieve good results. However, traditional machine learning approaches to classifying ECG data usually require complex data preprocessing, so embedding them into mobile devices imposes a heavy workload on the device.

On the deep learning side, Pranav Rajpurkar et al. [13] devised a 34-layer convolutional neural network (CNN) for arrhythmia detection with single-lead ECG signals. They compared the model's performance with that of cardiologists, and showed that the proposed model outperformed the cardiologists. One explanation for their excellent results is that they adopted the residual connection strategy to alleviate the degradation problem. Yi Zheng et al. [32] introduced a multi-channel convolutional neural network (MCNN) for detecting arrhythmias using multi-lead time-series ECG signals. They adopted two-lead ECG signals to test the MCNN model, and the results from the experiments showed that an accuracy of 94.67% was achieved. Overall, the deep learning schemes proposed for ECG diagnosis exhibit excellent performance. Rajendra Acharya et al. [33] compared the performance of a CNN for noisy and noise-free ECG datasets using a publicly available arrhythmia database. The results from the experiments show that the model achieved the same arrhythmia detection accuracy for noisy and noise-free ECG signals. This proves that the removal of noise is not necessary in deep learning algorithms for ECG diagnosis. Most of the current deep learning models focus on improving accuracy, often resulting in models too large to be embedded in mobile devices.

3. Method of LiteNet

In this section, we first introduce the one-dimensional convolution kernel designed for LiteNet. We then describe the core modules of LiteNet, as well as the overall architecture, followed by the introduction of the Adam optimizer, adapted for the backpropagation training process.

3.1.
One-Dimensional Convolution Kernel

CNN is a well-known deep learning architecture inspired by the natural visual perception mechanism of living creatures. A classic CNN consists of cascaded convolutional layers and pooling layers. Each convolutional layer calculates the inner product of the linear filter and the underlying receptive field of an input segment, and applies a nonlinear activation function. The resulting outputs are called feature maps. The pooling layer is a vital component of a CNN; it reduces the computational cost by cutting connections between convolutional layers.

However, CNN is designed for two-dimensional input such as image pixels. For one-dimensional ECG time-series signals, the convolution kernel needs to be adapted. We introduce the modified equation for one-dimensional convolution as:

    y[n] = x[n] * h[n] = \sum_{k=0}^{m-1} x[k] h[n-k],   (1)

where x[n] is the input sequence of length m, h[n] is the kernel sequence and y[n] is the output sequence. Our proposed LiteNet adopts Equation (1) as the kernel function. For example, if the length of x[n] is 3 and the length of h[n] is 3, then the length of the output is 5. Let the input sequence be x[n] = [x_1, x_2, x_3] and the kernel sequence h[n] = [h_1, h_2, h_3]. h(−k) reverses the sequence h(k), while h(n−k) translates h(−k) by n points. The output sequence therefore is

    y[0] = x[0]h[0−0] + x[1]h[0−1] + x[2]h[0−2] = h_1 x_1
    y[1] = x[0]h[1−0] + x[1]h[1−1] + x[2]h[1−2] = h_1 x_2 + h_2 x_1
    y[2] = x[0]h[2−0] + x[1]h[2−1] + x[2]h[2−2] = h_1 x_3 + h_2 x_2 + h_3 x_1   (2)
    y[3] = x[0]h[3−0] + x[1]h[3−1] + x[2]h[3−2] = h_2 x_3 + h_3 x_2
    y[4] = x[0]h[4−0] + x[1]h[4−1] + x[2]h[4−2] = h_3 x_3

The concrete convolution process is explained in Figure 3.

Figure 3. One-dimensional convolution process.

3.2. Lite Module

In this paper, we devise an efficient CNN microarchitecture, named the Lite module, as shown in Figure 4. It constitutes the core layers of LiteNet. The Lite module consists of a 1 × 1 squeeze convolutional layer and a variant of the inception module. At the bottom, the squeeze convolutional layer (green) has filters of size 1 × 1.
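The one-dimensional convolution of Equation (1), and the length-3 worked example in Section 3.1, can be checked with a few lines of plain Python (a minimal sketch; the function name and test values are ours):

```python
def conv1d_full(x, h):
    """Full 1-D convolution y[n] = sum_k x[k] * h[n-k], as in Equation (1)."""
    m, l = len(x), len(h)
    y = [0.0] * (m + l - 1)        # output length is m + l - 1 (here 3 + 3 - 1 = 5)
    for n in range(len(y)):
        for k in range(m):
            if 0 <= n - k < l:     # h is reversed and shifted by n points
                y[n] += x[k] * h[n - k]
    return y

x = [1.0, 2.0, 3.0]                # x1, x2, x3
h = [1.0, 0.0, -1.0]               # h1, h2, h3
print(conv1d_full(x, h))           # [1.0, 2.0, 2.0, -2.0, -3.0]
```

Expanding the five outputs by hand reproduces Equation (2) term by term, e.g. y[2] = h1·x3 + h2·x2 + h3·x1 = 3 + 0 − 1 = 2.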
Above it sits the variant of the inception module; the current design of this modified inception module is restricted to filter sizes of 1 × 1, 1 × 2 and 1 × 3. The key motivation for using small filter sizes is the reduced computational cost between convolutional layers. Furthermore, it uses two different convolution strategies: standard convolutions (blue), and depthwise/pointwise convolutions. Additionally, an optional residual connection [34] is adopted in the Lite module. The Lite module has the following advantages:

• It can reduce the parameter volume efficiently. It relies heavily on a squeeze convolutional layer and a depthwise separable convolutional layer, which cut down on the parameter volume.
• The single 1 × 1 standard convolutional layer is able to enhance abstract representations of local features and cluster correlated feature maps [35].
• Large activation maps can be generated by Lite modules due to postponed down-sampling, which contributes to the high accuracy of the results.
• The Lite module contains filters of heterogeneous size, which facilitates the exploration of different feature maps for key feature extraction.
• The optional residual connection can eliminate the effect of the gradient vanishing problem in deep networks.

Figure 4. Lite module.

3.3. LiteNet Architecture

Now that the Lite module has been introduced, we describe the LiteNet architecture in detail. LiteNet is built from Lite module layers, which represent an efficient convolution module design approach, as described above. They can sharply reduce both the parameter volume and the computational cost. As illustrated in Figure 5, a basic LiteNet model consists of a single Lite module, which we use in this study, as shown in Figure 5a. An extended LiteNet model contains a stack of Lite modules, as shown in Figure 5b.

Figure 5. LiteNet Architecture: (a) basic LiteNet architecture; (b) extended LiteNet architecture.

LiteNet takes as input a time series of ECG signals. The basic network begins with a standard convolutional layer, followed by one Lite module, two fully connected layers (dense) and a softmax layer.
For multi-class problems in deep learning, it is standard to use softmax as a classifier [36]. In this study, arrhythmia detection has 5 possible classes, so the softmax layer has 5 units, represented by p_i, where i = 1, ..., 5; the p_i form a probability distribution. Therefore,

    \sum_{i=1}^{5} p_i = 1   (3)

Let x_k be the outputs of the upper-layer units and W_{ki} the weights connecting the upper layer to the softmax layer; the total input into the softmax layer, given by z_i, is

    z_i = \sum_k x_k W_{ki}   (4)

The softmax layer calculates the final likelihood of each output, computed as follows:

    p_i = exp(z_i) / \sum_{j=1}^{5} exp(z_j)   (5)

The predicted class î would be

    î = argmax_i p_i = argmax_i z_i   (6)

Furthermore, we use the cross-entropy function (loss function) to determine how close the actual output is to the expected output. The smaller the value of the cross-entropy, the closer the two probability distributions are:

    H(p, q) = − \sum_x p(x) log q(x)   (7)

H(p, q) is the cross-entropy; p and q represent the expected output and the actual output, respectively, and p(x) and q(x) represent their probability distributions. For example, with N = 3, expected output p = (p_1, p_2, p_3), and actual outputs m = (m_1, m_2, m_3) and n = (n_1, n_2, n_3), we have

    H(p, m) = −(p_1 log m_1 + p_2 log m_2 + p_3 log m_3) = k_1   (8)
    H(p, n) = −(p_1 log n_1 + p_2 log n_2 + p_3 log n_3) = k_2

If k_1 is less than k_2, then m is closer to the expected output; otherwise, n is closer. At real run time, these are all M × N matrices, where M is the batch size and N is the number of classes. All in all, the smaller the cross-entropy, the better the classification performance.

For a specified accuracy, we applied several activation functions, e.g., tanh, sigmoid, rectified linear unit, and leaky rectified linear unit (LeakyReLU) [37], to basic LiteNet and found that the LeakyReLU activation function outperforms the other activation functions. Therefore, LeakyReLU is used as the activation function for both convolutional layers and dense layers. The number of filters per Lite module can be adapted, depending on either the size of the input or the required accuracy.

Table 1 summarizes the structure of basic LiteNet.
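The softmax and cross-entropy computations of Equations (5)–(8) can be sketched in plain Python (our illustration; the upper-layer inputs z are invented values, not outputs of the trained network):

```python
import math

def softmax(z):
    """Equation (5): p_i = exp(z_i) / sum_j exp(z_j)."""
    m = max(z)                              # subtract max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def cross_entropy(p, q):
    """Equation (7): H(p, q) = -sum_x p(x) log q(x)."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

z = [2.0, 1.0, 0.1, -1.0, 0.5]              # invented upper-layer inputs, one per class
p = softmax(z)
predicted = max(range(len(p)), key=lambda i: p[i])   # Equation (6): argmax
target = [1.0, 0.0, 0.0, 0.0, 0.0]          # one-hot expected output
loss = cross_entropy(target, p)
print(predicted, loss)
```

Since the exponential is monotonic, taking the argmax over p gives the same predicted class as taking it over z, which is why Equation (6) lists both forms.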
The key characteristics of the different layers in basic LiteNet are detailed as follows:

• Standard convolutional layer: Most CNN-based architectures begin with few feature maps and a large filter size. We also use this design strategy in this paper. The standard convolutional layer has five filters and convolves with a filter size of 1 × 5.
• Max-pooling layer: LiteNet performs two max-pooling operations with a stride of two, one after the standard convolutional layer and one after the Lite module layer. The max-pooling operation lowers the computational cost between convolutional layers.
• Lite module layer: The use of small filter sizes reduces the computational cost and enhances the abstract representations of local features in a heterogeneous convolutional layer. The Lite module has filter sizes of 1 × 1, 1 × 2 and 1 × 3 for this purpose, as shown in Figure 4, and the feature map settings of the Lite module layer are listed in Table 1.
• Fully connected layers: Two fully connected layers are used in basic LiteNet, as in most deep learning architectures. The first and second layers consist of 30 and 20 units, respectively, which yields the expected classification performance.
• Dropout [38] layer: To tackle the overfitting problem, the dropout technique is adopted. We place a dropout layer after the two dense layers and set the dropout rate at 0.3.
• Softmax layer: The softmax layer has five units. The softmax function is used as a classifier to predict the five classes.

Table 1. Summary of the basic LiteNet model for this work.

    Layer                          Kernel Size   Stride   No. of Filters
    Standard Conv.                 1 × 5         1        5
    Max-Pooling                    1 × 2         2        5
    Lite Module
        Squeeze Conv.              1 × 1         1        3
        Standard Conv.             1 × 1         1        6
                                   1 × 2         1        6
                                   1 × 3         1        6
        Depthwise Conv.            1 × 2         1        6
                                   1 × 3         1        6
        Pointwise Conv.            1 × 1         1        6
                                   1 × 1         1        6
    Max-Pooling                    1 × 2         2        18
    Dense                          —             —        30
    Dense                          —             —        20

3.4. Adam Optimizer

The training process of LiteNet is carried out by a backpropagation [39] approach. Stochastic gradient descent optimization is of vital practical importance in the backpropagation training process for deep learning models. Conventional stochastic gradient descent algorithms usually require special tuning tricks and large memory during the training process.
It is time-consuming, laborious and difficult to set optimal hyper-parameters for deep learning models. To initialize hyper-parameters easily with little tuning during the training process, we adopt the Adam optimizer [18], a first-order gradient-based descent optimizer for stochastic objective functions that is based on adaptive estimates of lower-order moments: it computes individual adaptive learning rates for different parameters from estimates of the first and second moments of the gradients. It is very easy to implement and computationally efficient. This approach can also improve the classification performance compared with classic gradient descent algorithms, and requires little memory and little tuning in the training process.

4. Experiments and Results

4.1. Dataset and Data Processing

According to the Association for the Advancement of Medical Instrumentation (AAMI) [40], non-life-threatening arrhythmias can be divided into 5 main classes: N (normal beat), S (supraventricular ectopic beat), V (ventricular ectopic beat), F (fusion beat), and Q (unknown). In this paper, we used the datasets from the MIT-BIH Arrhythmia database [41]. This database consists of 48 half-hour-long recordings of Lead II ECG signals at a sample rate of 360 Hz from 48 subjects. Furthermore, these recordings were annotated by at least two cardiologists. 109,449 ECG samples are extracted from this database; the numbers of samples in the five classes are 90,952, 2781, 8039, 7235 and 802, respectively. Two datasets are extracted from the MIT-BIH Arrhythmia database: dataset A (set A) and dataset B (set B). Since a single heartbeat normally lasts between 1 and 2 s, we split the ECG data into 1-s periods to generate set A, and generate set B with a 2-s-period split. We test the performance of LiteNet on both sets. To perform such a split, we first locate each R-peak, and take uniform-length signal sections both preceding and following the R-peak: 180 samples and 360 samples on each side of an R-peak are selected for a data piece of set A and set B, respectively. Therefore, the lengths of the ECG samples of set A and set B are 360 and 720, respectively.
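The R-peak-centered windowing just described can be sketched in a few lines (our illustration; the R-peak indices below are made up, and boundary beats without a full window on both sides are simply skipped, since the paper does not specify its boundary handling):

```python
def segment_around_r_peaks(signal, r_peaks, half_window):
    """Cut fixed-length windows centered on each R-peak.

    half_window = 180 gives the 360-sample set A pieces;
    half_window = 360 gives the 720-sample set B pieces.
    """
    segments = []
    for r in r_peaks:
        # keep only beats with a complete window on both sides (assumption)
        if r - half_window >= 0 and r + half_window <= len(signal):
            segments.append(signal[r - half_window:r + half_window])
    return segments

# Toy signal and invented R-peak positions (not real MIT-BIH data):
signal = list(range(2000))
r_peaks = [400, 900, 1500]
set_a_pieces = segment_around_r_peaks(signal, r_peaks, 180)
print(len(set_a_pieces), len(set_a_pieces[0]))   # 3 360
```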
Both datasets consist of original data (including noise) to simulate real arrhythmia detection.

LiteNet is utilized to automatically and efficiently detect the five classes of arrhythmias on wearable medical equipment with little overhead. Both set A and set B are used for training and assessing LiteNet. However, the five classes are imbalanced in both set A and set B. To overcome this imbalance, a data synthesis strategy is used: we synthesize data by translating the starting point and by adding noise. After augmentation, the number of samples in each class is 93,000, and the total number of ECG segments across the five classes has been increased to 465,000. Each ECG segment is normalized using the Z-score method to solve the problem of amplitude scaling and eliminate the offset effect.

Furthermore, we used the ten-fold cross-validation method, a common measure in both traditional machine learning and deep learning, to obtain evaluations on the data. The ECG datasets were divided into 10 independent folds with an equal number of samples in each fold. Each time, 9 folds are used for training and the remaining fold is used as the testing dataset, with no data intersection. This is repeated ten times at the same learning rate, and the average of the ten evaluation results is returned, as shown in Figure 6. The training dataset and testing dataset differ on each of the ten rounds of cross-validation, which can be summarized as follows:

    D = D_1 \cup D_2 \cup ... \cup D_{10},   D_i \cap D_j = \emptyset (i \neq j)   (9)

D represents the total ECG data; D_i and D_j represent independent subsets.

Figure 6. Ten-fold cross-validation.
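The Z-score normalization and the fold partition of Equation (9) can be sketched as follows (our illustration; the shuffling and per-segment normalization details are assumptions, as the paper does not spell them out):

```python
import random

def z_score(segment):
    """Normalize one ECG segment to zero mean and unit standard deviation."""
    n = len(segment)
    mean = sum(segment) / n
    std = (sum((v - mean) ** 2 for v in segment) / n) ** 0.5
    return [(v - mean) / std for v in segment]

def ten_folds(samples, k=10, seed=0):
    """Partition samples into k disjoint, equal-sized folds (Equation (9))."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)       # fixed seed for reproducibility
    fold_size = len(samples) // k
    return [[samples[i] for i in idx[f * fold_size:(f + 1) * fold_size]]
            for f in range(k)]

# Toy segments (not real ECG data): 100 segments of 8 samples each.
segments = [[float(i + j) for j in range(8)] for i in range(100)]
folds = ten_folds(segments)
print(len(folds), len(folds[0]))           # 10 10
norm = z_score(segments[0])
```

On each cross-validation round, one fold serves as the test set and the union of the other nine as the training set, so every sample is tested exactly once.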