ebook img

Repair for Distributed Storage Systems with Erasure Channels PDF

1.3 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Repair for Distributed Storage Systems with Erasure Channels

Repair for Distributed Storage Systems with Erasure Channels Majid Gerami, Ming Xiao Communication Theory Lab, Royal Institute of Technology, KTH, Sweden, E-mail: {gerami, mingx}@kth.se S Abstract—We study the repair problem of distributed storage systems in erasure networks where the packets 3 transmitted from surviving nodes to the new node might 1 be lost. The fundamental storage-bandwidth tradeoff is 0 2 calculated by multicasting analysis in erasure networks. Theoptimaltradeoffboundcanbeasymptoticallyachieved n a whenthenumberoftransmission(packets)goestoinfinity. node1 J For a limited number of transmission, we study the DC node4 9 probabilityofsuccessfulregenerating.Then,weinvestigate 2 two approaches of increasing the probability of successful node3 regenerating,namely, by connecting more surviving nodes node2 ] T or by increasing the storage space of nodes. Using more Fig. 1. A distributed storage system in a wireless network. Base station I nodes may pose larger delay and in certain situation distributes afileamongstorage nodes onitscoverage area. A data collector . s it might not be possible to connect to more nodes too. (DC)canrecoverthesourcefile,eventhoughitisoutofbasestationcoverage, c We show that in addition to reducing repair bandwidth, after meeting certain number of storages node. In this example, the source [ file is coded by (n= 4,k = 2)-MDS code. Thus, the DC by downloading increasing storage space can also increase reliability for fromanytwostoragenodescanrebuildthesourcefile. 1 repair. v Index Terms—Network coding, Distributed Storage Systems, 4 Erasure Channels. 5 to an erasure channel.When a packeterasure happens,all the 0 information in the packet is lost. 7 I. INTRODUCTION . In a distributed storage system, a source file is generally 1 Distributed storage systems have attracted substantial re- encodedbyerasurecodesanddistributedonredundantnodes. 0 3 search interests recently for increasing applications in e.g., Thus, the file is still available even though some storage 1 file sharing, cloud storage. The main benefit of distributing nodes fail. The maximum reliability by using erasure codes v: a file within a network is to provide reliable storage using is achieved when a file contains k fragments and is encoded i unreliable storage nodes. Distributed storage systems also to n fragments, and every k out of n fragments can rebuild X bring applications in wireless networks where links between the original file. The codes are known as MDS (Maximum r storage nodes may be unreliable. For instance, consider an Distance Separable) codes. Then, an (n,k) erasure code has a application of distributed storage systems in a delay tolerant the MDS property if every k out of n fragments can rebuild network (DTN) with wireless channels, as shown in Fig. 1. theoriginalfile.Forinstance,ifafileofsizeM isdividedinto A DTN is a kind of networks in which there might not be k fragments and encoded by (n,k) MDS ((4,2) here) codes direct links between a source and destinations and services onthe applicationof Fig. 1, thena receiver(ordata collector) can tolerate an acceptable level of incurred delay [1]. In Fig. by observingat least k,(k =2) storage nodescan rebuild the 1, a base station distributes a source file at the mobile nodes source file. which may travel towards arbitrary directions. Then a data Tomaintainthereliabilityofthesystem,whenanodefails, collector (DC) by meeting a certain number of these mobile anewnodeshouldberegenerated.Intheregeneratingprocess, storage nodes can rebuild the source file. surviving nodes transmit sufficient data to the new node such Unreliability in distributed storage systems might stem that the new node can maintain the MDS property, but the from disk failure, but it is not limited to this reason. For new node may have different coded symbols. This is called example, in the network in Fig. 1, a node failure may be the functional repair. We only consider the functional repair becauseamobilenodeleavethesystem.Indistributedstorage in the paper. A naive approach for repair is to rebuild the systemsinwirelessnetworks,unreliabilitymayalsooccurina source file (by download M fragmentsfrom surviving nodes) transmission process due to link failure, congestion, or buffer and then regenerate the new node with the MDS property. overflowandsoon.Wemodelanunreliablelinkinthenetwork However, downloading M fragments for regenerating M/k fragmentsis not efficientin term of bandwidth.Reference [2] Optimal bandwidth−storage tradeoff with different link failure probabilities 0.26 introduces network coding to distributed stroage systems in error-free networks (lossless network) and models the repair 0.25 ε=0.1 ε=0.2 processbyaninformationflowgraphintoamulticastproblem. 0.24 ε=0.3 Moreover,itisshownthatbyslightlyincreasingnodestorage, the bandwidth can be reduced considerably [2]. In [2], the α0.23 fundamental bandwidth-storage tradeoff was established. The 0.22 codes on the optimum tradeoff are called regenerating codes. 0.21 Two extreme scenarios on the fundamental bandwidth- storagetradeoffare minimum storageregenerating(MSR)and 0.02.2 5 0.3 0.35 0.4 0.45 0.5 0.55 0.6 γ minimum bandwidthregenerating(MBR) codes. For the MSR code, like the MDS codes, a file of size M is divided into Fig. 2. Fundamental bandwidth-storage tradeoff in erasure networks, n = k fragments and then encoded to n fragments. Hence, each 10,k=5.Higherlinkfailureprobability(ε)poseshighertrafficontherepair. node stores M/k fragments. Unlike MDS codes, MSR codes have the minimum bandwidth in regenerating a new node. That is, for d numberof surviving nodes joining in the repair probabilityofsuccessfulregeneratingcanbefurtherincreased. process, the minimum bandwidth by each surviving node (β) This approach suits for applications when requesting more is calculated by survivingstoragenodesmaycauselargedelayinthe repairor when there is not possible to connect to more storage nodes. α = M/k, MSR Otherrelatedworksonnetworkcodingforerasurenetworks βMSR = k(d−Mk+1). (1) are as follows. The capacity of wireless erasure networks has been studied in [7]. In references [8], [9], the probability ForMBRcodes,eachnodestoresmorefragmentsthanM/k of successful reconstruction of a source file is studied in butusedrepairbandwidthisexactlythesameasnodestorage, erasure networks. However, the papers do not studied the and is calculated by probability of successful regenerating. References [1], [8] α = 2dM , studyapproachesindelaytolerantnetworks.In[8],theauthor MBR k(2d−k+1) show that there is no unique answer to the question whether β = 2M . (2) MBR k(2d−k+1) coding for distributing a file maximizes the probability of After seminal work in [2], references [3] studies the code successful reconstruction or not. construction and achievablity of the functional and exact The rest of this paper is orangized as follows. In Section repair.In[4],cooperativeregeneratingcodesareconsideredto II we study the fundamental bandwidth-storage tradeoff in reducethe bandwidthin the scenarioof multiple-nodefailure. erasurenetworks.In Section III we studythe maximumprob- Reference [5] suggests surviving nodes cooperation in order abilityofsuccessfulregenerating.InSectionIVweproposeto to minimizes the cost of repair in multi-hop networks. Yet, in use more storage for increasing the probability of successful most of previous work of regenerating codes, it is assumed repair. Finally we conclude the paper in Section V. that links between storage nodes are perfect, without any error and erasure. In distributed storage systems especially II. FUNDAMENTALBANDWIDTH-STORAGE TRADEOFF IN in the wireless networks, as the example in Fig. 1, packets DISTRIBUTED STORAGE NETWORKS WITH ERASURE on the channels might get lost. Then, the redundant data CHANNELS needs to be transmitted for repair. Recently [6] has suggested In [2] the fundamental bandwidth-storage tradeoff was a regenerating code which is resistant to a specific number found by cut analysis in a multicast network where links of path failures by requesting more nodes to join the repair haveno errororerasure (lossless networks).Forthe multicast process. Particularly, in their approach the code resistant to t problem in erasure networks, the capacity of networks was ′ numberofpathfailure,itisrequiredtotransmitfromd =d+t found in [7]. Reference [7] proved the multicast capacity can survivingnodesinsteadof dnodes(thatforperfectchannels). be achieved in erasure networks by linear network coding. However, reference [6] has not studied the probability of Using similar approaches to [2], [7], we can find the fun- successful regenerating and how to construct the optimal (in damental tradeoff of distributed storage systems in erasure regard of probability of successful regenerating) codes. We networks. To have a closed-formformula for the fundamental call a regenerating process is successful when the new node bandwidth-storagetradeoffwitherasurechannels,weconsider beside surviving nodes has MDS property. We shall consider the scenario when all the links in the network have equal theprobabilityofsuccessfulregeneratingintherepairprocess erasure probabilities ε. Then, we formulate the tradeoff as ′ and show that finding the optimum values for d, and d follows. depends on the erasure probability of the links. Thus, we Proposition 1: Consider a repair process in a distributed can find the code maximizing the probability of successful storage system where channels from d surviving nodes (d ≤ regenerating, given the constraints of bandwidth and storage (n − 1)) to the new node have an erasure probability ε. capacity.We also find that by using slightly morestorage, the For any α ≥ α∗(M,n,k,d,γ,ε) there is a linear network ProbabilityofsuccessfulregeneratinginaDSSwithn=10,k=5 cαo∗d(eMt,on,akc,hdi,eγve,ε)th,eitpisoinimt p(ons,skib,lde,αto,γfi,nεd),aacnoddef.orHeαre<n bility) 1 dd==75,d,d′′==96,γ,γ′′==34.6 denotes the number of storage nodes, and every k storage ba0.8 o nodescanrebuildthesourcefile. disthenumberofsurviving gpr nabnoandddeγswipidsathrtt-oiscttaioplraartgeinpegatirrtahdbeeaonrfedfpwcaiaidrn,thba.enTdchaeαlcreuiflsoarteeea,dcthahse,nofudnedastmoreangtael, ulregeneratin00..46 α∗ =(MM−/kg(ki−)(i1−p)γ iiff γγ ∈∈[[1ff1−−((0iεε)),,+f(1∞i−−ε1))), (3) (successfps0.020 0.2 0.4 0.6 0.8 1 ε(erasureprobability) where, 2Md Fig.3. Aschemewithahigherprobability ofsuccess inoneregionmight f(i)= , (4) hasasmallerprobability ofsuccess inotherregion. (2k−i−1)i+2k(d−k+1) and, (2d+2k+i+1)i process is successful when all γ = dβ data are received. g(i)= . (5) 2d However, for an erasure networks, failure at links can cause Proof: The proof is similar to that in [2]. The only the repair unsuccessful. Thus, to increase the probability of ′ ′ difference is that in the erasure network the new node in successful regenerating, one approache is to use d (d ≥ d) average receives (1−ε)β from each surviving node. survivingnodesforrepair. Each survivingnode still transmits Fig. 2 shows the fundamental bandwidth-storage tradeoffs β fragments of data. Hence, the probability of successful whenlinkshaveerasureprobabilitiesε=0.1,0.2,0.3,respec- regeneratingis that the new node receives from at least d out ′ tively.Asweexpect,largererasureprobabilitiesleadtohigher of d surviving nodes, which is calculated by trafficintherepairprocess.Theseoptimumbandwidth-storage ′ d ′ tradeoffcanbeachievedasymptotically,i.e.,whenthenumber d ′ p = (1−ε)iεd −i. (6) of transmission packets goes to infinity. When feedback is s i available on the network the optimal bound can be achieved Xi=d(cid:18) (cid:19) byretransmission(potentiallyinfinite).However,foranetwork In Fig. 3, the probability of successful regenerating with without feedback, rateless random linear network codes [11] various erasure probabilities using two regenerating schemes can asymptotically achieve the optimal tradeoff.For a limited hasbeenshown.Intheexample,thedistributedstoragesystem number of transmission (retransmission) we are interested to contains n = 10 storage nodes and every k = 5 nodes maximize the probability of successful repair. We note that a can rebuild the source file. In the first scheme a minimum- repairprocessis called of success when the new node has the bandwidth regenerating (MBR) code with parameters d = ′ MDS property (along with surviving nodes). 7,d =9isused.Inthesecondschemeaminimum-bandwidth ′ regenerating code with parameters d = 5,d = 6 is used. III. MAXIMIZING THEPROBABILITY OF SUCCESSFUL Using(2),thetransmittedbandwidthfortheformerisγ′ =3.6 REGENERATINGps andforthe latteris γ′ =4. FromFig. 3, we can see thateven Obviously, one approach to provide redundancy (for era- thoughthebandwidthofthesecondschemeishigherthanthe sure)forrepairisto connectmoresurvivingnodesto the new first one, it has lower probabilities of success for ε < 0.4. node. That is, for the repair in a lossless network γ = dβ This example motivates us to further investigate the problem packets are transmitted from d nodes (each node transmits β of maximizing the probability of successful regenerating for ′ ′ fragments). Then in an easure network, d (d ≥ d) nodes the given repair bandwidth. ′ ′ ′ transmitting γ = d β packets may be necessary. One inter- Given the constraint on the transmitted bandwidth (γ = estingscenarioisthatwiththeconstraintofthetotalrepairing d′β),dandd′ canbeoptimizedtomaximizetheprobabilityof ′ bandwidthin the network (γ ≤γth), howto find the optimal successfulrepair.Theoptimizationproblemcanbeformulated repair (maximizing the probability of successful regenerating as follows, p ). We will show that solving the problem depends on the s link erasure probability. max ps d,d′ (7) To formulate the problem, consider a repair process in a ′ s. t. d β ≤γ distributed storage system. Links from surviving nodes to the th new node fails with probability ε. To simplify illustration, we For illustration, we use the optimization problem in the ′ assume all the linkshave equalerasure probabilities.To com- previousexampleandfindoptimaldandd forgiven γ ≤5 th ′ batlinkerasures,theredundantdatashouldbetransmitted.Ina overdifferenterasureprobabilities.Theoptimal dandd with lossless network forregeneratinga new node with parameters different erasure probabilities have been shown in Fig. 4. We (n,k,d,α,γ = dβ). By receiving γ = dβ data, the new can see that for the network with high erasure probabilities, node is regenerated successfully. Therefore, a regenerating using fewer surviving nodes in the repair process maximizes Maximizingpsfordifferentintegervaluesofdandd′ node1 10 a1 secondtransmission 9 d′ b1 f1=a1+2b1 numberofstoragenodes 5678 d baaSb1122 ∞∞∞∞ nnooddee32aa11++2bb11ab++22aa22++b22b2 f3=4ffa121==+a251ab21+++2bb412a2+5b2 4 node4 30 0.2 0.4 0.6 0.8 3aa11++22bb11++32aa22++b32b2 65aa11++97bb11++68aa22++67bb22 ε(erasureprobability) node5 Fig.5. Retransmission isusedtorecover thelostpacket. Fig.4. Foranetworkwithlowε(erasureprobability), maximizing ps uses a larger number of surviving nodes for repair. For a network with high ε, maximizing ps findsasmallernumberofsurviving nodesforrepair. node1 f1+3f2+2f3 secondtransmission a1 b1 f1+3f2+2f3 ∞ pussi.nCgomnvoerresesluyr,vfivoirntghenondeetwsomrakxwimitihzelsowps.erasure probability, S ∞ node2 ab22 f1=a1+2b1 f2=2a2+b2 IV. INCREASING STORAGE SPACE OF NODES ba11 ∞ node3 To provide redundancy, one approach is to use more sur- ab22 aa11++2bb11++aa22++b22b2 f3=4a1+5b1+4a2+5b2 ∞ viving nodes for repair. However, connecting to more storage node4 nodesmayleadtolargerdelay.Forinstance,inFig.1,thenew 3aa11++22bb11++32aa22++b32b2 65aa11++97bb11++68aa22++67bb22 node5 nodemayhaveto waitlongertimeto meetmorenodeswhich are moving. Another approach of improving reliability is to Fig. 6. Larger storage capacity increases the probability of successful retransmitthelostpacketsfromthesurvivingnodestothenew regeneration thanretransmission. node.However,thiswillleadtolargerdelayandalsofeedback may be necessary. In what follows, we propose to increase each surviving node transmits β additional fragments to the storing capacity of the nodes to increase the probability of 2 newnode.Tosimplifycutanalysisonthenetwork,wesplitα successful regenerating. 1 and α storage fragmentsof each node,as shown in Fig. 7 (it Forillustration,consideranexampleofadistributedstorage 2 can be proved that there is no loss in separating α and α ). system with four nodes shown in Fig. 5. Assume M = 4, 1 2 Notethatβ fragmentsareencodedfromα storedfragments, n = 4, k = 2, d = 3 and each node stores two fragments 1 1 andβ fragmentsareencodedfromotherα storedfragments. (α = 2). By (1), the optimal repair-bandwidth in a lossless 2 2 In total, dβ +dβ fragments are transmitted from surviving network is β = 1. Then, when a node fails, the new node is 1 2 nodes. However, due to link failures, the new node receives regenerated by acquiring β = 1 fragment (linearly encoded d β and d β fragments where d ,d < d. We shall show from data stored in surviving node and denoted by f , f 1 1 2 2 1 2 1 2 how to calculate p from d ,d ,β ,β as follows. and f respectively) from each of three surviving nodes. We s 1 2 1 2 3 Proposition 2: To regenerate a new node, which receives assume the channels with the erasure probability ε = 0.1. from d storage nodes by β fragments from α fragments of We also assume no feedback to surviving nodes. Thus the 1 1 eachsurvivingnodeandβ fragmentsfromotherα fragments retransmissionof fragment f givesthe probabilityof success 2 2 ps = ( 2i=1(1 − ε)iε(2−i)1)(1 − ε)2 = 0.8019. Yet if a onfettwhoerknocdoed,inthgeinftdheβne+wdnβode+cank−b1emreigne(nαera,t(edd−biy)βlin)ea≥r linear combination of f ,f , and f can be stored in node 1 1 2 2 i=1 1 1 P 1 2 3 M.Hered andd arethenumberoflinksinthenetworkfor 1, as shown in Fig. 6, then the probability of success is 1 2 P p = 4 (1 − ε)iε(4−i) = 0.9477 (since any 3 out of transmittingβ1andβ2fragmentswithouterasure,respectively. s i=3 Then, the probability of successful repair is, 4 transmitted fragments can regenerate the new node). Note P that two schemes use the same bandwidth for repair. The d d d d example shows that increasing storage space can increase the ps = (1−ε)d1+d2ε2d−(d1+d2)I(d1,d2) d d probabilityof sucessful repair. In the following,we formulate dX1=0dX2=0(cid:18) 1(cid:19)(cid:18) 2(cid:19) (8) this problem in a general setting. where, Consider the repair process in a distributed storage system where each node stores α = α1 + α2 fragments. Assume 1 if d1β1+d2β2 kα1 fragments from k nodes can rebuild the original file. I(d ,d )= ≥M − k−1min(α ,(d−i)β ), (9) If channels are error-free then a new node is regenerated by 1 2  i=1 1 1 0 otherwise transmitting β fragments encoded from α stored data of a P 1 1 node. To increase the probability of successful regenerating, Proof: Ownto space limitation, we skip the proof here. d1 node1 cut node1 α1 β1 β1 ∞ α1+α2 ∞∞ nodαe12 β1... α1 β1 node2 . α1 ∞ . α1+α2 ⇛ S ∞ nodαe.1n ... α1 S .. ∞ β2 . . . ∞ noαd1e+nα2 ∞∞ nnooαddee221 β2 d2 . ... ... . nod..en α2 α2 Fig.7. Cutanalysis forthenetwork withadditional storage spaceα2 whichcanincrease ps. Maximizingprobabilityofsuccessfulregenerating.Thesystemhasn=10,k=5,d=9. lowbandwidthregion,theoptimalsolutionisstilltouseMSR 6 codes.OurinvestigationshowsthatMBRcodesleadtohigher probabilityof successfulregenerationonlywhen there is high 5.5 storage space and large bandwidth available. 5 4.5 V. CONCLUSIONS h 4 We study the regeneration problem for distributed storage αt systemswhenthechannelsareunreliable.Weshowthefunda- 3.5 mentalstorage-bandwidthtradeoffforerasure networks.Then 3 the probabilityof successful regeneratingis analyzed.We use 2.5 two approaches to maximize the probability of successful 2 regenerating. The first is to use more surviving nodes for 1.5 repair. The second is to increase the storage space of storing 2 3 4 5 6 7 8 9 γth nodes.Weshowthattheoptimalsolutionreliesonthechannel- erasure probabilities. We show that increasing storage space Fig.8. ThefilledblueareaisfortheMSRcodeprovidingahigherprobability can also increase reliability for repair. ofsuccessfulregenerating thantheMBRcode. REFERENCES [1] S. Jain, K. Fall, and R. Patra, “Routing in a delay tolerant network,” By the Proposition 2, we can find the optimal storage Proc.ACMSIGCOMM,2004. allocation to achieve the maximum probability of successful [2] A. G. Dimakis, P. B. Godfrey, Y. Wu, M. O. Wainwright, and K. Ramachandran,“Networkcodingfordistributedstoragesystems,”IEEE regenerating for a given storage space and repair bandwidth. Trans.Inform.Theory,vol.56,no.9,pp.45394551, September2010. Moreformally,assumethatstoragespacepernodeisbounded [3] Y.Wu,“Existenceandconstructionofcapacity-achievingnetworkcodes by α and the maximum possible total repair-bandwidth is for distributed storage,” IEEE Journal on Selected Areas in Commun., th vol.28,no.2,pp.277-288,Feb.2010. γth. Then the optimization storage allocation is formulated [4] K. W. Shum, “Cooperative regenerating codes for distributed storage as, systems,”InIEEEInt.Conf.Comm.(ICC),Kyoto, Jun.2011. [5] M.Gerami,M.Xiao,andM.Skoglund,“Optimum-costrepairinmulti- hop distributed storage systems,” IEEE International Symposium on max p s Information Theory(ISIT)2011. α1,α2 [6] K. V. Rashmi, N. B. Shah, K. Ramchandran, and P. V. Kumar, “ s. t. α +α ≤α 1 2 th (10) Regeneratingcodesforerrorsanderasuresindistributedstorage,” IEEE dβ1+dβ2 ≤γth International Symposium onInformation Theory(ISIT)2012. β ≤α . [7] A. F.Dana, R.Gowaikar R. Palanki, B. Hassibi, M.Effros,“Capacity 2 2 ofwirelesserasurenetworks,” IEEETrans.onInform.Theory,Vol.52, pp.789-804,March2006. We use an exampleforillustration,which finds the optimal [8] S.Jain,M.Demmer,R.Patra,andK.Fall,“UsingRedundancytoCope storageallocationbetweenMSRandMBRcodesinanetwork with Failures in a Delay Tolerant Network,” Proc. ACM SIGCOMM, with n = 10,k = 5 and ε = 0.1. We know that MSR codes 2005. [9] D. Leong, A. G. Dimakis, and T. Ho, “Distributed storage allocation uselessstorage,buttheyneedlargerrepair-bandwidth.On the problems,” in Proc. Workshop Netw. Coding, Theory, and Appl. (Net- otherhand,MBRcodesusemorestorageateachstoragenode, Cod),Jun.2009. butthey have lowerrepair-bandwidth.We use the storageand [10] D. Leong, A. G. Dimakis, and T. Ho, “Distributed storage allocations foroptimaldelay,” CoRRabs/1106.2581: (2011) bandwidth as variables, as shown in Fig. 8. Then we observe [11] D.S.Lun,M.Medard,R.Koetter,andM.Effros,“Oncodingforreliable thatforlow storagespace, theoptimalsolutionis touse MSR communication overpacketnetworks,”Elesevier PhysicalCommunica- codes. It is interesting to see that for the large storage and tion2008.

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.