Ancestral Dynamic Voting Algorithm for Mutual Exclusion in Partitioned Distributed Systems
Hindawi Publishing Corporation
International Journal of Distributed Sensor Networks
Volume 2013, Article ID 120308, 15 pages
http://dx.doi.org/10.1155/2013/120308

Research Article

Ancestral Dynamic Voting Algorithm for Mutual Exclusion in Partitioned Distributed Systems

Faraneh Zarafshan,1 Abbas Karimi,2 S. A. R. Al-Haddad,1 M. Iqbal Saripan,1 and Shamala Subramaniam3

1 Department of Computer Systems Engineering, Faculty of Engineering, Universiti Putra Malaysia, 43400 Serdang, Selangor, Malaysia
2 Department of Computer Engineering, Faculty of Engineering, Arak Branch, Islamic Azad University, Arak 38453, Iran
3 Department of Communication Technology & Networks, Faculty of Computer Science and IT, Universiti Putra Malaysia, 43400 Serdang, Selangor, Malaysia

Correspondence should be addressed to Faraneh Zarafshan; [email protected]

Received 1 April 2013; Revised 29 July 2013; Accepted 4 September 2013

Academic Editor: Chase Qishi Wu

Copyright © 2013 Faraneh Zarafshan et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Data replication is a well-known redundancy technique in fault-tolerant distributed systems. However, it raises the problem of mutual exclusion on replicated data. Mutual exclusion becomes difficult when a distributed system is partitioned into two or more isolated groups of sites. In this study, a new dynamic algorithm is presented as a solution for mutual exclusion in partitioned distributed systems. The correctness of the algorithm is proven, and simulation is used for availability analysis. Simulations show that the new algorithm, the ancestral dynamic voting algorithm, improves the availability and lifetime of service in faulty environments, regardless of the number of sites and the topology of the system. The algorithm also prolongs the lifetime of the mutual exclusion service for full and partial topologies, especially in situations where there is no majority. Furthermore, it requires fewer transmitted messages. Finally, it is simple and easy to implement.
1. Introduction

In distributed systems, redundancy of data, or data replication, is a well-known approach to achieve fault tolerance [1], increase the availability of databases [2], decrease the response time of service and communication costs [3], and share loads by distributing computational loads over multiple sites rather than imposing them on a single site [4]. Besides these advantages, data replication allows distributed systems to have concurrent access to sites for common data. If multiple sites simultaneously modify the common data, different contents of the same data exist in the distributed system, leading to data inconsistency. Mutual exclusion (MutEx) aims to prevent data inconsistency when multiple sites attempt to access the same data at the same time.

Solutions for distributed MutEx are categorized into two main classes [5]: token-based algorithms [6–9] and permission-based algorithms [5, 10–16]. In the first class, a particular message known as the token circulates in the distributed system [5], and only the site that holds the token is permitted to enter the critical section (CS). The other class, permission-based algorithms, uses the permission of multiple (usually a majority of) sites, so that the failure of one site (or more, depending on the algorithm) is tolerable.

Permission-based algorithms are further categorized into coterie-based and voting-based. A coterie is a set of sites whose permission is required to commit an update: every site in the coterie must issue permission before a site can enter its CS. Voting, in contrast, relies on permission from a predetermined number of sites, regardless of which sites agree on the commitment.

Nowadays, distributed systems face new requirements as they migrate to large-scale applications. Social networks, sensor networks, and the Internet of Things are instances of applications of distributed systems in which limitations on the number of sites or on the type of topology are not acceptable.
Voting has a higher degree of replication, with no limitations on the number of sites and no need for complicated computations; it copes better with dynamic changes in the system [17] and is easier to implement for large-scale systems. Coteries, however, are not suitable for a large number of sites, because the number of coteries becomes large with 5 sites or more unless heuristics are used; otherwise, voting is very time consuming for six or more sites [18]. Therefore, the higher number of transmitted messages is an issue with voting algorithms. Moreover, some voting algorithms are limited by topology issues, and some others make unrealistic assumptions about the site information they use.

In addition to consistency issues, partition tolerance and availability are further concerns for distributed systems (refer to [19] for more details). In this study, a new algorithm is introduced for partitioned distributed systems, based on dynamic voting algorithms. Similar to other voting-based algorithms, every site that wishes to access the CS must issue and broadcast its request to a group of connected sites, or distinguished partition (DP). The site enters its CS if no site in the DP sends a rejection. This algorithm guarantees MutEx, is deadlock free, and needs fewer transmitted messages. It can also tolerate a higher number of faults and has a higher availability when facing consecutive failures. Furthermore, the new algorithm is simulated, and its availability and number of messages are compared with those of the major partition-tolerant voting algorithms.

The remainder of this paper is organized as follows. Section 2 gives background and a literature review on static and dynamic voting algorithms for distributed MutEx. Section 3 deals with the newly proposed voting algorithm; proof of correctness is discussed in Section 4, while Section 5 presents the experimental results and performance analysis of dynamic voting algorithms. The number of transmitted messages is also discussed in that section. Finally, conclusions and future work are given in Section 6.

2. Background and Literature Review

Two types of voting-based algorithms, static and dynamic, are used for MutEx in distributed systems. The simplest form of static voting algorithm is majority voting, presented by Thomas [4]. This algorithm performs a majority vote among all the sites in the distributed system and uses different strategies for locking (including semaphores [20] and monitors [21]) or timestamps [22, 23]. A site requesting to enter the CS to update replicated copies of data may do so if and only if it gets permission from a majority of the n sites, that is, ⌈(n+1)/2⌉ of them. Otherwise, it must reject the request and wait until it becomes eligible to enter the CS. This algorithm transmits 2n+2 and ⌈n/2⌉+n+3 messages in the worst and best cases, respectively [4]. Majority voting is the general form of many static and dynamic voting algorithms. One static descendant is relative consensus voting [15], in which multiple update requests are considered and ordered in one voting process based on some priority measure.

Weighted voting, as presented by Gifford [2], assigns some votes to each site. Each read/write request proceeds if at least r votes (for read) or w votes (for write) are obtained from one or more sites [2]. Weighted voting commits an update if the cumulative votes of the agreeing sites reach at least a majority; it thus guarantees MutEx even if no majority of sites exists. In order to guarantee serial consistency, weighted voting uses locks. Therefore, the number of transmitted messages is higher than in majority voting, that is, 3n+3 messages in the worst case; however, 4 messages are transmitted in the best case (if the first site has the majority of votes or a quorum). Some static voting algorithms [16, 24–27] arise from the weighted voting algorithm, with a logical structure over all sites being chosen for quorum selection and consensus.

Static voting algorithms [2, 4] use predetermined information on the sites and the distributed system. This information is fixed during the system's lifetime, so in case of emergency or failure the algorithm is not able to change its policy. There are situations where a distributed system is partitioned into two or more isolated groups of sites [5, 11] with no intersection between the isolated groups. These groups are known as partitions. All sites inside a partition can communicate with one another, but they cannot communicate with sites in other partitions [28]. If static algorithms are used in these situations, the partitions can have parallel access to the CS, because the sites inside each partition look at the partition as an integrated system, when in fact they belong to one partition among multiple partitions of a distributed system.

In a partitioned distributed system with no majority partition, every site must reject incoming update requests to avoid data inconsistency. In this case, the distributed system is halted until a recovery or repair process is executed and a DP is found. This interruption decreases the availability of MutEx and is known as the problem of halted states [5, 29]. The problem cannot be solved by static voting algorithms, because they cannot adapt to dynamic changes of system connectivity.

Dynamic voting algorithms [30–32] are the main solutions to avoid halted states. They increase the availability of distributed MutEx in partitioned distributed systems [29] and use two main mechanisms to prolong MutEx: either increasing the votes (or the priority) of one partition to be higher than those of the other partitions, or enabling the algorithm to service noncritical applications during the halted state (e.g., [33]). In this study, the first mechanism is discussed, with the most recently successful partition becoming the DP, so that the problem of halted states is prevented or reduced. The major dynamic voting algorithms are group consensus [29], autonomous reassignment [29], dynamic linear [30, 31], hybrid [31, 34], and dynamic weighted voting [32], all of which have been designed based on majority voting, either directly or indirectly.

Group consensus requires a coordinator to assign each site a number of votes and allows only the group with the majority of votes to perform the requested operation [29]. Once a site has discovered the failure of the coordinator, it initiates itself as a candidate for coordination. This site becomes the new coordinator and installs new votes if and only if at least a majority of the sites send consensus. A sequence number is assigned to each site, and the sites with the latest data always have the greatest sequence number. Vote reassignment [18, 35, 36] is done using the information sent by other sites on the topology of the system [29], in such a way that at most one partition forms the majority partition.
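The quorum rules above, Thomas's ⌈(n+1)/2⌉ majority and Gifford's r/w vote thresholds, can be captured as two small predicates. The following is a minimal Python sketch; the function names are ours, not from the paper, and the r+w and 2w overlap constraints are the standard ones that make read/write and write/write quorums intersect in weighted voting:

```python
import math

def majority_quorum_met(n_sites: int, permissions: int) -> bool:
    """Thomas's majority voting: an update needs ceil((n+1)/2) permissions."""
    return permissions >= math.ceil((n_sites + 1) / 2)

def weighted_quorum_ok(votes: dict, r: int, w: int) -> bool:
    """Gifford's weighted voting (sketch): reads need r votes, writes need
    w votes; MutEx between a read and a write, and between two writes,
    requires the two overlap constraints below."""
    total = sum(votes.values())
    return r + w > total and 2 * w > total

# Example: 5 sites, one vote each -> a write quorum of 3 suffices.
print(majority_quorum_met(5, 3))                               # True
print(weighted_quorum_ok({s: 1 for s in "ABCDE"}, r=3, w=3))   # True
```

With 5 one-vote sites, 2 permissions fall short of the majority, and r=1, w=2 would let a read and a write miss each other, so both predicates reject those configurations.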
Autonomous reassignment [29] solves group consensus's single-coordinator problem by distributing the algorithm over every site in the distributed system; each site is responsible for managing its own votes [29]. A mechanism is then required to restrict a site from changing its votes arbitrarily and putting MutEx at risk: each site must obtain the approval of a certain number of sites prior to every reassignment. Autonomous reassignment is more flexible, simpler, and quicker than group consensus; however, it has a lower degree of fault tolerance [29]. Although the algorithm has benefits, including a lower number of messages, faster action, and flexibility, it requires global information on the total number of votes (including the votes of sites in other partitions). This requirement is not realistic, as it contradicts the concept of isolated partitions [11].

As a pessimistic technique to deal with partitioning, the dynamic linear algorithm [30] was proposed with the aim of increasing the availability of MutEx in distributed systems, in addition to guaranteeing MutEx. In this algorithm, with n sites owning replicated data, two parameters of the latest update, the version number (VN) and the number of sites or site copies (SC), are assigned to each site i (1 ≤ i ≤ n). A site is permitted to accept a request if and only if it belongs to a DP. To determine a DP, the requesting site broadcasts messages to the other connected sites. The partition of connected sites then sends back its VN and SC. If a majority of the sites in the partition (i.e., ⌈n/2⌉) holds the maximum version number, M = max(VN_j | j ∈ partition), the partition is a DP. The requesting site is then permitted to accept the request, to install the new VN and SC, and to ask the others to update their data. The sites of other partitions are not allowed to change their data. To avoid concurrent access to the CS, this algorithm uses a lock mechanism [31]. The lock mechanism cannot by itself guarantee MutEx in systems needing dynamic vote reassignment (partitioned distributed systems) [11]; however, it can be combined with different forms of dynamic voting algorithms [29], as dynamic linear voting is very simple to understand and to implement.

Hybrid voting uses static voting under nonfaulty conditions and switches to dynamic voting once partitioning occurs. The aim of hybrid voting [31] is to increase the availability of dynamic linear voting in situations where the number of sites with the current copy of the data is exactly n/2. In addition to this mechanism, the hybrid algorithm uses a policy to keep the system up if two out of three sites participated in the latest update and hold the current data. To do so, a parameter denoted the distinguished site (DS) is added to the information stored at each site i; using this parameter, the sites participating in the latest update can be recognized. It has been proven that hybrid voting achieves a higher availability of MutEx than all static voting algorithms [31]. The availability analysis of the majority, dynamic linear, and hybrid algorithms demonstrates better performance for the hybrid algorithm in systems with a high repair rate [31]. However, the results of these algorithms have not been examined for environments with a high probability of failure. Note that the term "hybrid voting" differs from "hybrid mutual exclusion" (as used in [37, 38]), which combines the ideas of token-based and permission-based algorithms.

In dynamic weighted voting [32], each site has a predefined weight, and a DP is a partition whose accumulated votes (weights) are greater than the sum of the weights associated with the site(s) holding the maximum version number [32]. Every site keeps information on the version numbers and weights of the other sites at the time of partitioning. A coordinator is responsible for forwarding the update request to the sites in the majority partition and for determining the commitment or rejection of an update [32]. As a result, dynamic weighted voting is a centralized algorithm with a lower degree of fault tolerance than distributed algorithms. A further difficulty with this approach is obtaining fair and proper values for the weights: if the weight of a site is very large, the partition including that site is always considered a DP. Hence, even failed sites get repaired and can rejoin the partition even if they are not useful for the consensus.

Studies in [37, 38] introduce two forms of hybrid mutual exclusion algorithms as combinations of token-based and permission-based algorithms. The study in [37] uses Maekawa's quorums [24] for every site in the distributed system. In the first phase, a site broadcasts messages to its quorum asking for permission. Once permission is received, it creates a token message. Subsequently, the algorithm forms a logical tree with the token holder as its root. Once the token holder releases the CS, it sends the token to one of its children in the tree. This algorithm guarantees MutEx as long as no failure of sites or links happens; it does not consider fault tolerance or network partitioning. Another hybrid MutEx algorithm is suggested in [38]. It is basically a token-based algorithm which uses tokens to obtain permission for the CS and utilizes coteries (similar to quorums) to reduce the amount of communication. It should be noted that hybrid MutEx targets group MutEx.

All voting algorithms are capable of enhancing the availability of distributed systems to achieve MutEx. However, they are inefficient at keeping the system available in the case of consecutive failures (e.g., Gifford's weighted voting [2] and Thomas's majority voting [4]), or they require complicated calculations (e.g., autonomous reassignment [29], hybrid voting [34], and Davcev's dynamic voting [32]) or accurate knowledge of the topology of the distributed system (e.g., group consensus [29]).
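The DP test of dynamic linear voting, under which a majority of the sites that took part in the latest update must hold the maximum version number M, can be sketched as follows. This is an illustrative Python rendering with a function name of our own; we use a strict majority, which coincides with the paper's ⌈n/2⌉ for odd n, while the exactly-n/2 tie is the case hybrid voting addresses with the distinguished site:

```python
import math

def is_dp_dynamic_linear(partition_vn: dict, n_latest: int) -> bool:
    """Sketch of the dynamic linear DP test: the partition is distinguished
    if the sites holding the maximum version number M form a majority of
    the n_latest sites (SC) that participated in the latest update."""
    m = max(partition_vn.values())                    # M = max VN seen
    holders = sum(1 for vn in partition_vn.values() if vn == m)
    return holders >= math.ceil((n_latest + 1) / 2)   # strict majority

# Partition {S1, S2, S3} after S4 separates; all three hold VN = 3
# out of SC = 4 sites in the latest update: 3 >= 3, so this is a DP.
print(is_dp_dynamic_linear({"S1": 3, "S2": 3, "S3": 3}, n_latest=4))  # True
```

The isolated partition {S4} with an outdated replica fails the same test, so its site must reject incoming update requests.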
A preliminary study [39] on the ancestral dynamic voting (ADV) algorithm shows improvement in the availability of distributed systems whose sites or links are vulnerable to failure. ADV guarantees MutEx in partitioned distributed systems. It is simple and needs no special assumptions on the number of sites or the topology of the network. In the present study, fault tolerance, lifetime of service, and number of messages are investigated under link failure and site failure, partially and fully connected topologies, and two different configurations of the system (fresh start and consecutive); in [39], only availability had been discussed.

Unlike token-based and some static algorithms, which use a logical topology structure, ADV does not need to know the topology. ADV uses the ancestor to give privilege to one partition among multiple partitions. The sites are not required to obtain permission from all the sites in the DP to enter the CS. ADV therefore differs from primary copy [40, 41] and coordinator cohort [42], in which one primary site is fixed and issues the permissions for the CS. The situation differs in that ADV uses a nonfixed coordinator in every cycle and a voting technique for decision making. Furthermore, issues related to network partitioning are not addressed in coordinator cohort and primary copy.

3. Ancestral Dynamic Voting (ADV) Algorithm

In this study, ADV aims to work with any topology, whether partially or fully connected. To avoid parallel access of sites to the CS, a flag, CS_Flag, is used. Ancestor is a label for the site that established the latest successful commitment; the number of sites with replicated data lies between 0 and n. In a distributed system comprising n sites, each site is labelled S_k, k = 1, 2, ..., n, and keeps local information on its version number (VN), ancestor (Anc), and CS_Flag, besides the latest information on the VN and Anc of the other sites in its partition. This information is dynamically modified according to the site's reconnections and disconnections; initially, every site is connected to every other site in the network.

ADV is distributed among all the sites, and the failure of one or more sites cannot interrupt the service as long as the ancestor of each voting cycle is up. A site is supposed to be able to communicate with the others unless it has failed or is disconnected. For a single site, the failure of another site is not distinguishable from the failure of the corresponding link unless a special fault detector is used; in this study, timeouts are used for failure detection (similar to [43]). It is assumed that the ancestor does not fail while the CS is in use, but it can disconnect or leave the current partition to form a new DP.

The main assumption of ADV is that the sites update their information exactly as the establisher has sent it to them; they do not behave arbitrarily as in the general Byzantine problem. The definition of a partition is similar to that of other voting algorithms; that is, an isolated group of sites cannot communicate with the sites outside the group. There is no assumption on how the sites are connected, and there is no single coordinator in the system. Sites do not need to know the information of other partitions and act only on the information of their own partition.

A distributed system initially forms one partition P; however, some sites or links may at times be disconnected from the partition due to failures. Every single site k sees this partition from its own point of view as partition P_k, so that initially P_k = P. In the case of partitioning, some partitions may have outdated data (because they do not receive updates from the DP), and reading from such partitions may yield outdated information. However, concurrent read operations are allowed if the site belongs to a DP. Therefore, for each query, reading permission from one site is enough [11]. This study concentrates on writing on replicated data, that is, on updates. For simplicity, all the sites are supposed to have the same priority. If two or more requests arrive concurrently, only one is serviced, chosen based on the version numbers; if the version numbers are the same, one of them is chosen randomly. Issues relating to queuing theory, storage, and synchronization delay are not taken into account in this study.

CS_Flag announces a site's status regarding the CS. Initially, all the sites are in the normal state, that is, CS_Flag = 0. A site goes to the CS state, that is, CS_Flag = 1, once it receives the permission of the other sites for the CS. CS_Flag is set in two cases: when the site is the establisher of the current voting cycle, or when the site has been asked by another site to participate in voting (i.e., it received Msg1). In the first case, if CS_Flag has already been set, the site has to send back a rejection message to the sender to avoid data inconsistency; the establisher thereby detects the concurrent access to the CS and rejects the request.

3.1. Structure of the Algorithm. ADV is executed by three subprocedures. The first procedure is for initialization and is performed by every single site in the distributed system. The second procedure is executed by a site which needs to update shared data and generates requests for the CS. The third procedure is the event handler and performs the proper actions once an event is triggered. The sites are normally waiting for incoming requests, whether from an application or from other sites; ADV is therefore an event-triggered algorithm.

Each request carries at least the label of the sender or receiver(s) and the content of the message. Five types of messages are defined in ADV. They are as follows:

Msg1: request to enter the CS,
Msg2: permission,
Msg3: request to update data structures and release the CS,
Rejection: rejection of a CS request,
Failure: report of a site failure to the other sites.

Figure 1 summarizes the relations between messages and site states. As seen in Figure 1, three messages are transferred for a successful update of the CS.

There are two timers, the request_timer and the release_timer, which trigger the failure events.
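The per-site state (VN, Anc, CS_Flag, and the partition view P_k) and the five message types can be summarized in a short sketch. This is an illustrative Python rendering, not the paper's code; the class layout and any names beyond VN, Anc, and CS_Flag are our own:

```python
from dataclasses import dataclass, field
from enum import Enum, auto

class Msg(Enum):
    MSG1 = auto()     # request to enter the CS
    MSG2 = auto()     # permission (carries VN and Anc)
    MSG3 = auto()     # update data structures and release the CS
    REJECT = auto()   # rejection of a CS request
    FAILURE = auto()  # report of a site failure to the other sites

@dataclass
class Site:
    label: str
    n: int                                        # sites in the system
    vn: int = 0                                   # version number (VN)
    anc: str = "S1"                               # ancestor (Anc)
    cs_flag: bool = False                         # CS_Flag; normal state is 0
    partition: set = field(default_factory=set)   # P_k, this site's view

    def initialize(self) -> None:
        """Mirrors Procedure 1 (ADV_initialize): first ancestor is S1,
        all VN start at 0, and P_k initially contains every site."""
        self.cs_flag = False
        self.partition = {f"S{i}" for i in range(1, self.n + 1)}
        self.anc = "S1"
        self.vn = 0

s = Site("S2", n=5)
s.initialize()
print(s.anc, s.vn, len(s.partition))   # S1 0 5
```

The event handler of the third subprocedure would then dispatch on the `Msg` variant of each incoming message.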
The request_timer is embedded in the ancestor's routine and detects the failure or disconnection of the sites to which the request for the CS has been sent, whereas the release_timer is used to detect the disconnection or partitioning of a site from the ancestor. The value of the request_timer is selected according to the network communication delays. The release_timer is longer than the request_timer, because it spans at least two network communication delays plus the period during which the other site is in the CS. The relation between these timers is delineated in Figure 2.

3.1.1. Initialization of Sites. Procedure ADV_initialize sets the initial variables on every site in the network and is called before any event handling by ADV. To avoid ambiguity, the first ancestor of the partition is assumed to be the first site, that is, S1. The ancestor of every site is then updated by ADV on every successful CS entry. As no updates are performed in the initial state, all VN are set to 0. As every site has its own view of its partition, the view of a site S_k (k = 1, 2, ..., n) of its partition is denoted by P_k. The procedure for the initialization of a site is as follows:

Void ADV_initialize(S_k) {
    CS_Flag = 0;
    P_k = {S1, ..., S_N};
    Anc = S1;
    VN = 0;
}

Procedure 1: ADV_initialize(S_k) sets the initial local variables on every site S_k.

Figure 1: Relation between messages and site states.

3.1.2. Generating a CS Request. When a site needs to update replicated data, it performs the procedure ADV_CSing, sending its request to all its partition mates. Other sites may give priority among concurrent incoming CS requests by considering the requesters' version numbers; therefore, VN_k is also sent along with the request. The time at which a site generates a CS request depends on the application it performs or on those performed by other sites.

S_k needs to update the replicated data and sends Msg1 to the other sites in P_k. In this case, the replicated data is the CS: to avoid inconsistency, only one site is eligible to write this value, in a mutually exclusive manner. The site sets the request_timer before sending a request for the CS and tracks the request by decreasing this timer against a local time of request. When the request_timer has timed out, S_k realises that either the other sites have failed or it has been disconnected from them. It can then resend its request a limited number of times, or reject the request if it has already made that number of unsuccessful attempts, as seen in Procedure 2.

3.1.3. Event Handling. All the sites normally wait for an event from other sites or even from themselves. When an event arrives, an event handler decides on the event based on the type of message delivered to the site. Procedure ADV_EventHandler is executed on every site, as seen in Procedure 3.

When S_i receives an update request for the CS, it is considered the establisher of the current voting cycle. The establisher can access the CS if and only if it belongs to a DP. Prior to that, it needs to communicate with all the sites in its partition by sending them Msg1. Once a site S_k has received Msg1, it checks its CS_Flag. If CS_Flag has already been set, the site is participating in another update commitment and, to avoid data inconsistency, it must reject the request. The site also rejects the CS request if there is no ancestor; otherwise, it sets its CS_Flag and participates in the voting by sending Msg2 and its VN to the establisher. Site S_k performs the procedure Update_request upon receiving Msg1 from the request establisher (here S_i), as seen in Procedure 4.

S_i may fail after receiving the permission from S_k, while S_k is waiting for job accomplishment and CS release by S_i. In this scenario, a deadlock would occur, with some sites waiting to enter the CS while no site is in the CS [5]. ADV avoids this by means of the release_timer, so that S_k does not have to wait forever. This timer should be long enough: if it is set to a short value, MutEx may not be guaranteed, because S_k may conclude that S_i has failed while it is actually up and doing its job in the CS for a longer time.

If no message arrives before the release_timer's timeout, S_k labels S_i as faulty and sends a message to the other sites in P_k, announcing that it has released the CS. In the normal case, S_i executes the procedure Vote_reply upon receiving the reply messages, in the form of Msg2 with each site's VN, before its request_timer's timeout.

In procedure Vote_reply, the establisher of the CS request receives the replies (i.e., Msg2, VN, and Anc) from the other sites in partition P_i and decides based on their VN and Anc, as seen in Procedure 5. S_i seeks the maximum version number M among the k = 1, 2, ..., n sites in the partition. A site with VN_k = M is found, and S_i checks whether that site's ancestor exists in the partition. If so, the partition is a DP and S_i is eligible to commit the update. Once the establisher finds itself inside the DP, it accepts the update request, increases its version number, assigns its own label as the new ancestor, and asks the other sites to update their information by broadcasting Msg3.

Every site that receives Msg3 updates its information to the new VN and Anc of the establisher and resets its CS_Flag, as presented in the procedure Commitment_release (Procedure 6).

It is always possible for a repaired site to rejoin the distributed system, or to update if its data are obsolete.
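The establisher's decision in Vote_reply, which finds the maximum VN among the replies and checks that the ancestor recorded by a max-VN site is present in the partition, can be sketched as follows. This is a hedged Python rendering of that decision; the function name and the tuple encoding of the replies are ours:

```python
def decide_and_commit(establisher: str, replies: dict) -> tuple:
    """Sketch of ADV's Vote_reply decision.  `replies` maps each reachable
    site in P_i to its (VN, Anc) pair, including the establisher's own
    entry.  Returns (committed, new_vn)."""
    partition = set(replies)
    max_vn = max(vn for vn, _ in replies.values())
    # Site(s) holding the maximum version number M.
    holders = [s for s, (vn, _) in replies.items() if vn == max_vn]
    # DP test: the ancestor recorded by a max-VN holder must be reachable.
    if any(replies[s][1] in partition for s in holders):
        return True, max_vn + 1   # commit: bump VN; establisher becomes Anc
    return False, max_vn          # not a DP: reject the update

# Partition {S1, S2, S4, S5}; S4 holds VN = 2 and its ancestor S4 is present,
# so the update is committed with the new VN = 3.
ok, vn = decide_and_commit("S4", {
    "S1": (2, "S4"), "S2": (2, "S4"), "S4": (2, "S4"), "S5": (1, "S1"),
})
print(ok, vn)   # True 3
```

The example mirrors Scenario 1 of Section 5: the partition containing the ancestor is the DP, while an isolated partition whose recorded ancestor is unreachable returns a rejection.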
Void ADV_CSing(S_k) {
    If (S_k wants to enter CS) {
        Set request_timer;
        Set number_of_attempts;
        Send (Msg1, VN_k) to P_k;
        While (number_of_attempts != 0)
            While (request_timer != 0) {
                request_timer--;
                If ((request_timer == 0) && (number_of_attempts != 0)) {
                    Send (Msg1, VN_k) to P_k;
                    number_of_attempts--;
                }
                Else reject the CS request;
            }
    }
}

Procedure 2: ADV_CSing(S_k).

Void ADV_EventHandler(void) {
    While (1) {
        If any event arrives at a site S_k
            Switch (Event) {
                Case Msg1: Update_request();
                Case Msg2: Vote_reply();
                Case Msg3: Commitment_release();
                Case Failure: Remove the failed site from P_k;
                Case Reject: {
                    CS_Flag = 0;
                    Broadcast the rejection to P_k;
                }
            }
    }
}

Procedure 3: ADV_EventHandler().

Obsoleteness can occur due to link failures or failures of sites during an update commitment. In this case, the obsolete or failed site must send its update request through a site by which it connects with the partition. If a site S_j wants to join partition P_i through site S_i, it sends its request to S_i, and S_i acts on behalf of S_j. S_i checks whether its partition is a DP and whether the other sites in the partition agree on a new site joining. To ensure that the new site's reconnection does not put MutEx at risk, S_i must make sure that it is not part of an ongoing MutEx mission: if its CS_Flag is set, it waits until the flag is released by the establisher. It then runs a procedure similar to MutEx and informs the other sites that a new site has been added to partition P_i. In other words, the same process used to commit an application request must be followed for the reconnection or joining of a site.

The concept of a DP in ADV depends on the existence of the ancestor. If the ancestor fails, ADV stops servicing the other sites. This problem is common in coordinator-based algorithms. To detect and elect a new ancestor, techniques such as [29] can be added on top of ADV.

4. Proof of Correctness

This section proves that ADV solves the problem of MutEx in partitioned distributed systems and prevents deadlocks. The algorithm works correctly if every site updates its information. The theorems and proofs follow.

Theorem 1. ADV guarantees MutEx in a nonpartitioned distributed system.

Proof.
As long as no failure occurs, a distributed system is nonpartitioned. The proof is discussed for two states: the initial state and normal states.

Void Update_request(void) {
    If (CS_Flag == 0 && Anc != Null) {
        CS_Flag = 1;
        Send (Msg2, VN, Anc) to S_i;
        Set release_timer;
        // if a site does not receive any CS release before the release_timer
        // expires, it releases the CS and broadcasts this to partition P
        While (release_timer != 0)
            release_timer--;
        If (release_timer == 0 and CS_Flag == 1) {
            // no CS release has arrived from S_i before the release_timer
            // timed out
            Label S_i as faulty;
            If (S_i != Anc)
                Broadcast (Failure, S_i) to P_k;
            Else
                Wait until Anc replies;
        }
    }
    Else
        Send (Reject) to S_i;
}

Procedure 4: Procedure Update_request().

Void Vote_reply(void) {
    Store VN and Anc of sites ∈ P_i;
    If Msg2 does not arrive from a site S_j ∈ P_i and its request_timer has timed out {
        Label S_j as faulty;
        Broadcast (Failure, S_j) to P_i;
    }
    If (Find S_j ∈ P_i | VN_j = max{VN ∈ P_i}) and (Anc_j ∈ P_i) {
        VN_i = VN_j + 1;
        Anc_i = S_i;
        Broadcast (Msg3, VN_i, Anc_i) to P_i;
    }
}

Procedure 5: Procedure Vote_reply().

Void Commitment_release(void) {
    VN_j = VN_i;
    Anc_j = Anc_i;
    CS_Flag = 0;
}

Procedure 6: Procedure Commitment_release().

(1) In the Initial State. In a nonpartitioned distributed system, initially every site considers S1 as Anc. If a site S_i needs to access the CS, it issues Msg1 to all the other sites. The other sites set their CS_Flag once Msg1 is received and keep it set until a CS release message arrives from S_i. Every site with CS_Flag = 1 rejects other CS requests. Therefore, two or more sites cannot enter the CS concurrently, and MutEx is guaranteed in a nonpartitioned distributed system.

(2) In Normal States. Suppose, for contradiction, that MutEx is not guaranteed in a nonpartitioned distributed system, that is, two sites S_i and S_j access the CS simultaneously. Then S_i and S_j hold two different data items with the same version number. As the data are different, two different ancestors (Anc_i and Anc_j) have coordinated two different updates. But if S_i and Anc_i are not connected to S_j and Anc_j, the system is partitioned. This contradicts the assumption that the distributed system is nonpartitioned.
accessCS,itissuesMsg1toallothersites.Othersitessettheir CS FlagonceMsg1isreceived,andtheykeepitsetuntilaCS Theorem 2. ADV guarantees MutEx in partitioned distrib- release message arrives from 𝑆𝑖. Every site with 𝐶𝑆 𝐹𝑙𝑎𝑔 = utedsystem. 8 InternationalJournalofDistributedSensorNetworks Proof Node A Node B (requester) (ancestor) (1) In Initial State. Initially one site 𝑆1 is considered as the . . . . ancestor of initial partition. The initial partition is divided . . into two partitions, 𝑃𝑖 and 𝑃𝑗, due to the failure of site 𝑆𝑘. SeSnedtrreeqquueesstt(tMimsegr11)) The method of contradiction is used to prove the theorem. Msg1 Th𝑃𝑖,eanadlg𝑆o𝑗riftrhommd𝑃o𝑗easccneopttgtuwaoraunptdeaetMeruetqEuxeswtshseinmsuitletasn𝑆e𝑖ofuroslmy. timer Send permission(Msg2) In this situation, updates are coordinated by two DPs with est Setreleasetimer u two different ancestors, that is, 𝐴𝑛𝑐𝑖 and 𝐴𝑛𝑐𝑗. Based on Req the algorithm, the repaired site 𝑆𝑘 is eligible to rejoin other Msg2 partitions,aslongasthedestinationisaDP.As𝑃𝑖and𝑃𝑗have Enter CS theirownancestors(i.e.,𝑆𝑖and𝑆𝑗)andboth𝑃𝑖and𝑃𝑗areDPs, r twopartitionsaremergedandtheinitialpartitionisformed, .. me . ti butwithtwodifferentancestors.Thissituationiscontraryto e s thebasicassumptions. Send release CS ea el R (2)InNormalState.Whenadistributedsystemispartitioned (Msg3) intotwopartitions𝑃𝑖 and𝑃𝑗,oneofthefollowingsituations Msg3 occurs.Formorethantwopartitions,MutExcanbeproved inasimilarway. Figure2:Relationbetweenrequest timerandrelease timer. (a)𝑆𝑗disconnectspartition𝑃𝑖fromitsancestor𝐴𝑛𝑐𝑖.Site 𝑆𝑗 in 𝑃𝑗 receives a request for update. 𝑆𝑗 does not detecttheancestor𝐴𝑛𝑐𝑖,and𝑃𝑗isnotDP.Therefore, the site 𝑆𝑗 does not update the data, and MutEx is 1 satisfied. 2 (b)Site𝑆𝑖in𝑃𝑗receivesanupdaterequest.𝑆𝑖isconnected 5 to 𝐴𝑛𝑐𝑖. Therefore, 𝑃𝑖 is DP, and 𝑆𝑖 updates the common data. As other sites in 𝑃𝑗 cannot enter the CS,MutExissatisfied. 
(c) As in Theorem 1, the method of contradiction can simply show that two different ancestors do not exist in the two different partitions Pi and Pj. Since only one ancestor exists in the system, only one DP is formed. Furthermore, all the sites in the DP update their VN and Anc to the same values before releasing their flags.

Figure 3: A distributed system with two partitions.

Based on the philosophy of a partitioned distributed system, except for one partition, there are some partitions whose sites should not enter the CS. In fact, this definition indicates that the sites in the nondistinguished partitions must starve! Thus, in MutEx algorithms for partitioned distributed systems, starvation freedom is not realistic. The only privilege for these types of algorithms is the higher availability of MutEx.

Theorem 3. ADV is deadlock free.

Proof. One reason for deadlock to occur is waiting loops, in which sites Si, Sj, and Sk wait to be released by one another in a chain fashion. ADV utilizes one ancestor to issue release messages. By the assumptions that the ancestor does not fail while the CS is in use and that the other sites have to wait for release messages from the ancestor, a chain cannot be formed.

The other reason for deadlock is waiting infinitely to enter the CS. Based on the ADV algorithm, the CS is awaited by another site only if its CS_Flag is set. However, ADV benefits from the release_timer, which is set once a site sets its CS_Flag, so that a site Sk does not wait for infinity. When the timer expires, the site releases its CS_Flag, sends a message to the other sites in the partition announcing that it has released the CS, and asks them to do the same. Therefore, waiting for infinity does not occur in ADV.

5. Experimental Results

For more clarity, the ADV algorithm is illustrated in Figure 3, which shows a distributed system with 5 sites. The links displayed with dashed lines have already failed, because S3 has separated and formed an isolated group. Therefore, the distributed system has two partitions, {S3} and {S1, S2, S4, S5}, where two updates have already been committed and the latest establisher was S4.

Scenario 1. Site S4 has received an update request (the current VN is 2). Its lock is granted, and the ancestor of the establisher is S4 (S4 is the establisher and the ancestor as well). Therefore, based on the algorithm, partition P = {S1, S2, S4, S5} is a DP, and S4 commits the update by broadcasting the new VN = 3 and Anc = S4 to the sites in P. Once the commitment is accomplished, the sites are asked to release their locks.

Scenario 2. S2 establishes a voting process by sending Msg1 to its neighbours {S1, S3, S4} and waits for Tw seconds. As S3 has already failed, S2 does not receive any reply from S3 after Tw seconds. S2 then supposes that S3 is a failed site, reports it to the other sites, and continues its operations regardless of the failed site. It is noted that if S3 becomes the establisher, it rejects every incoming request because it does not belong to the DP.

Scenario 3. S4 receives two lock requests simultaneously from sites S2 and S5. In this case, a site may be chosen randomly to be serviced first. The other sites are asked to reject their update requests. If a neighbour cannot set its lock for any reason, it sends back a rejection so that S4 can be informed to reject the update request.

Scenario 4. S3 is repaired and finds S2 as its neighbour, but it cannot arbitrarily rejoin the partition to which S2 belongs. In this case, S3 asks S2 for update commitment. As S2 belongs to a DP, S3 is allowed to connect, and it can receive the latest VN and Anc through S2 too. The ADV algorithm is thus expected to guarantee MutEx and to improve the availability of MutEx when failures occur.

For the purposes of simulation and experimental results, the ADV, dynamic linear, and hybrid voting algorithms are discussed as follows. One effective point in the fault tolerance of an algorithm is coordination. A centrally coordinated algorithm is vulnerable to failure because the coordinator is a single point of failure: the reliability and availability of the system are always functions of the reliability and availability of the coordinator. Therefore, such algorithms are avoided, except when they are required due to implementation limitations or when they are utilized to upgrade or improve the performance of old-fashioned systems.

In Test 1, the system is in the initial condition. When the version numbers are zero, there is no ancestor, and the site copies and distinguished sites refer to all the other sites with common data. S1 receives a request to update the common data. As the system initially forms one partition and this partition is the DP, the update is committed. The version numbers and the other indices are then updated in Test 2.

When a link fails, site S4 is separated. This site can form a partition {S4}. Given S2 as the establisher, ADV can accept the update because the ancestor of S2, that is, S1, belongs to the partition {S1, S2, S3}. The dynamic linear and hybrid algorithms also commit the update because card I = 3 (the number of sites with the maximum version number) is more than N/2, that is, 2 (N being the maximum SC of the sites in the partition). The results of the update commitment are displayed in Test 3.

In Test 3, the system's condition is similar to that of the former test, except that the establisher is changed to S4. None of the algorithms can commit the update. Therefore, to avoid data inconsistency, the request must be rejected.

Another partitioning happens in Test 4, with S1 as the establisher. Consequently, there are three partitions: {S1, S2}, {S3}, and {S4}. With regard to the sites' information, all the algorithms reach consensus and update the associated information.

The hybrid algorithm improves the availability of the dynamic linear algorithm if less than the majority of the current sites remains. The sites have a higher repair rate than failure rate (the probability of repair is thus much higher than that of failure). Since card I equals N/2, the dynamic linear algorithm cannot continue working and rejects the request in Test 5. However, Test 5 differs from the other cases in one major way, because only one site can remain in the partition. In this particular case, both the ADV and hybrid algorithms are able to commit. Therefore, ADV and hybrid improve the availability of dynamic linear when less than the majority of the sites holds the current data.
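The card I versus N/2 test applied by the dynamic linear and hybrid commitments above can be sketched as follows. This is an illustrative simplification, not the authors' implementation: the function name and the (VN, SC) tuple encoding are ours, and the tie-breaking through distinguished sites that hybrid voting adds is omitted.

```python
def dynamic_linear_accepts(partition):
    """Rough sketch of the majority test described in the text: I is the
    set of sites holding the maximum version number, and N is the largest
    site-copies (SC) value recorded by those sites; the update is
    committed only when card I is a strict majority of N."""
    max_vn = max(vn for vn, sc in partition)
    i_set = [(vn, sc) for vn, sc in partition if vn == max_vn]
    n = max(sc for vn, sc in i_set)
    return len(i_set) > n / 2

# Three sites at the newest version with SC = 4: card I = 3 > N/2 = 2, accept.
print(dynamic_linear_accepts([(1, 4), (1, 4), (1, 4)]))  # True
# Two sites at the newest version with SC = 4: card I = 2 = N/2, reject.
print(dynamic_linear_accepts([(3, 4), (3, 4)]))          # False
```

With these numbers the sketch reproduces the Test 5 outcome for dynamic linear (reject when card I equals N/2), whereas hybrid voting would still accept by consulting the distinguished sites.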
In this section, the benefits and weaknesses of ADV versus the hybrid and dynamic linear algorithms are analysed, and different scenarios of failure injection, topology, and simulation environment are presented. First, however, the performance measures used for the comparison and analysis should be discussed, as in the following section.

For this study, group consensus and dynamic weighted voting are ignored in the analysis and comparison. Autonomous reassignment could be a good candidate for analysis; however, it needs to know the total number of votes, including the votes of the sites in other partitions. This condition is not realistic or possible in many real situations, so it is also neglected in this study. The remainder of this paper discusses the dynamic voting algorithms, namely hybrid and dynamic linear, which are utilized for analysis and comparison. Table 1 displays the results of 5 consecutive tests on the ADV, hybrid, and dynamic linear voting algorithms.

5.1. Performance Measures. Two main measures are used for the analysis and comparison of dynamic voting algorithms: the number of messages transmitted and the availability of MutEx. Availability has widely been used in the analysis and comparison of dynamic voting algorithms and is defined as the probability that an update request can be serviced at any time t [34]. This measure requires not only the existence of a partition but also the capability of at least one site in the partition to be operational in order to service the request.

Test numbers are presented in the first column of Table 1. The system configuration and the information required by each voting algorithm appear in the second column, including the site that has received the update request (the establisher). A request can be examined if and only if a nonfaulty establisher has obtained the request.
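The per-site bookkeeping that Table 1 records can be pictured as a small record, shown below together with the ADV-style DP test from Procedure 5 (a newest-version site's recorded ancestor must itself be a member of the partition). This is a sketch under our own naming: the field and function names are not from the paper, and the record merely groups the quantities each algorithm consults.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SiteState:
    sid: int            # site identifier (1 for S1, 2 for S2, ...)
    vn: int             # version number (VN), consulted by all three algorithms
    sc: int             # site copies (SC), consulted by dynamic linear and hybrid
    ds: frozenset       # distinguished sites (DS), consulted by hybrid
    anc: Optional[int]  # ancestor (Anc), consulted by ADV; None before any update

def adv_is_dp(partition):
    """ADV's distinguished-partition test: the ancestor recorded by a site
    holding the newest version must be reachable inside the partition."""
    members = {s.sid for s in partition}
    newest = max(partition, key=lambda s: s.vn)
    return newest.anc in members

# Test-4-like configuration: {S1, S2} at VN = 2 with Anc = S2 is a DP,
# whereas the isolated {S4}, still pointing at ancestor S1, is not.
dp = [SiteState(1, 2, 3, frozenset({1, 2, 3}), 2),
      SiteState(2, 2, 3, frozenset({1, 2, 3}), 2)]
print(adv_is_dp(dp))                                                # True
print(adv_is_dp([SiteState(4, 1, 4, frozenset({1, 2, 3, 4}), 1)]))  # False
```

The record makes the information requirements of the three algorithms explicit: dynamic linear reads vn and sc, ADV reads vn and anc, and hybrid reads vn, sc, and ds.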
The version numbers (VN), site copies (SC), distinguished sites (DS), and the ancestor of each site (Anc) are presented in the system configuration. If dynamic linear voting is taken into account, SC and VN need to be known, whereas ADV needs VN and Anc. Obviously, VN, SC, and DS are all essential for hybrid voting.

Table 1: Comparison of ADV, dynamic linear, and hybrid voting algorithms for 5 consecutive tests.

Test 1 (Dynamic linear: Accept; Hybrid: Accept; ADV: Accept)
         S1       S2       S3       S4
  VN     0        0        0        0
  Anc    —        —        —        —
  SC     4        4        4        4
  DS     S1–S4    S1–S4    S1–S4    S1–S4

Test 2 (Dynamic linear: Accept; Hybrid: Accept; ADV: Accept)
         S1       S2       S3       S4
  VN     1        1        1        1
  Anc    1        1        1        1
  SC     4        4        4        4
  DS     S1–S4    S1–S4    S1–S4    S1–S4

Test 3 (Dynamic linear: Reject; Hybrid: Reject; ADV: Reject)
         S1       S2       S3       S4
  VN     2        2        2        1
  Anc    2        2        2        1
  SC     3        3        3        4
  DS     S1–S3    S1–S3    S1–S3    S1–S4

Test 4 (Dynamic linear: Accept; Hybrid: Accept; ADV: Accept)
         S1       S2       S3       S4
  VN     2        2        2        1
  Anc    2        2        1        1
  SC     3        3        3        4
  DS     S1–S3    S1–S3    S1–S3    S1–S4

Test 5 (Dynamic linear: Reject; Hybrid: Accept; ADV: Accept)
         S1       S2       S3       S4
  VN     3        3        2        1
  Anc    1        1        1        1
  SC     2        2        3        4
  DS     S1       S1       S1–S3    S1–S4

A second definition measures availability as the proportion of time during which a distributed system services the requests, whether critical or noncritical. By this definition, synchronization delays, that is, the time from when a process enters the CS until the next process comes into the CS [16], and interruption periods during connectivity changes (as discussed in [17]) should also be taken into account. However, issues relating to synchronization and delays are not considered in this study. In this paper, a simplified version of the first definition is used: availability is measured by the number of times that a distributed system can service a request and update the replicated data successfully [39].

Though the availability improvement of MutEx is the main objective of this study, it can be compromised by high message transmission. For a higher number of sites, a higher number of messages is required to be transmitted. A high number of message transmissions for every commitment causes extra communication cost, more time for the sites to process messages, and increased traffic in the network. The effective time dedicated to servicing update and query requests is then affected. Therefore, there should be a tradeoff between these two measures.

At least 3 messages are required for the ADV algorithm to commit or reject an update request, with 2 of the 3 messages used for communication between the application and the establisher. The establisher can be any nonfaulty site in the distributed system. One more message is enough for the update commitment or rejection if the establisher is the ancestor or if it does not agree on an update. Otherwise, j − 1 messages should be sent to the other sites, 1 ≤ j ≤ N, for lock activation; j − 1 messages from the sites to the establisher announce that the lock is granted (together with other information) or announce disagreement; and j more messages are required for the commitment or rejection and the lock release. Hence, the ADV algorithm transmits 3j − 2 + 2 messages (i.e., 3j messages) in the worst case, where 1 ≤ j ≤ N.

The dynamic linear and hybrid algorithms require 4j − 2 messages (1 ≤ j ≤ N) for commitment and 2j + 1 for rejecting the request. It is noted that the difference between these two algorithms lies in the way of choosing the DP, not in the protocol for sending messages. Obviously, ADV needs fewer messages compared with the dynamic linear and hybrid algorithms.

5.2. Simulation of Test Harness. Software simulations are used in this section to study the availability of the dynamic voting algorithms dealing with a high rate of failures. In the simulation environment, the issues related to interruption during connectivity changes [17] are not taken into account. The simulation of the voting algorithms is event-driven: the algorithm acts or reacts when a change occurs, whether a message is received or is required to be sent. This assumption has already been used in the simulation of protocols in distributed systems [44–46] and in dynamic voting algorithms [29]. MATLAB can be used effectively for the purpose of simulation, and its distributed computing and real-time toolboxes can also be utilized or linked to this study's simulator in future works.

Simply, all sites are assumed to have common data. The sites are labelled as Si, i = 1, 2, ..., n, where n denotes the number of sites in the distributed system. To have the same simulation assumptions, the information on the simulation
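The event-driven behaviour described in Section 5.2 can be illustrated with a minimal event loop. This is a generic sketch in Python rather than the MATLAB simulator used in the study, and all names (EventQueue, the handler signatures, the sample handlers) are ours.

```python
import heapq

class EventQueue:
    """Minimal event-driven core: events are (time, seq, handler, payload)
    tuples ordered by timestamp; a handler may schedule further events,
    mirroring a site that acts or reacts when a change occurs."""
    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker so equal-time events keep insertion order

    def schedule(self, time, handler, payload):
        heapq.heappush(self._heap, (time, self._seq, handler, payload))
        self._seq += 1

    def run(self):
        log = []
        while self._heap:
            time, _, handler, payload = heapq.heappop(self._heap)
            log.append((time, payload))
            handler(self, time, payload)
        return log

# A site reacting to a received message by sending a reply one time unit later.
def on_request(q, t, payload):
    q.schedule(t + 1, on_reply, "reply from " + payload)

def on_reply(q, t, payload):
    pass  # terminal event: nothing further to schedule

q = EventQueue()
q.schedule(0, on_request, "S2")
print(q.run())  # [(0, 'S2'), (1, 'reply from S2')]
```

Each voting algorithm would plug its own handlers (request, vote reply, commitment, timer expiry) into such a loop; the queue itself stays algorithm-agnostic.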
