JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING 29, 211-218 (1995)

Ensuring Correct Rollback Recovery in Distributed Shared Memory Systems (1)

BOB JANSSENS (2) AND W. KENT FUCHS

Center for Reliable and High-Performance Computing, Coordinated Science Laboratory, University of Illinois, 1308 West Main Street, Urbana, Illinois 61801

(1) This paper contains material previously presented at the 13th SRDS [13]. This research was supported in part by the Office of Naval Research under Contract N00014-91-J-1283, and by the National Aeronautics and Space Administration (NASA) under Grant NASA NAG 1-613, in cooperation with the Illinois Computer Laboratory for Aerospace Systems and Software (ICLASS).
(2) E-mail: [email protected].

Distributed shared memory (DSM) implemented on a cluster of workstations is an increasingly attractive platform for executing parallel scientific applications. Checkpointing and rollback techniques can be used in such a system to allow the computation to progress in spite of the temporary failure of one or more processing nodes. This paper presents the design of an independent checkpointing method for DSM that takes advantage of DSM's specific properties to reduce error-free and rollback overhead. The scheme reduces the dependencies that need to be considered for correct rollback to those resulting from transfers of pages. Furthermore, in-transit messages can be recovered without the use of logging. We extend the scheme to a DSM implementation using lazy release consistency, where the frequency of dependencies is further reduced. © 1995 Academic Press, Inc.

1. INTRODUCTION

Distributed shared memory (DSM) provides the programming advantages of a shared-memory image in scalable parallel systems. Checkpointing and rollback are commonly used to recover from detected processor errors in environments where high reliability is essential. Recent trends toward using workstation clusters for parallel scientific computing make recoverability useful even when reliability demands are less critical. In a workstation network, a node may kill a process or completely reboot, either due to a system exception or due to direct action by a user. With checkpointing and rollback, an application can recover from such an event without restarting computation from the beginning. Checkpointing is also useful for process migration to reduce adverse impact on other users [18].

In parallel systems, dependencies between processing nodes can cause the overall system state to be incorrect when one node rolls back. The problem of rolling back to a consistent global state has been widely investigated for message-passing systems. It is possible to directly apply this research to shared memory by modeling the system in terms of message passing. However, previous work in shared-memory recovery has used a laxer model of dependencies, relying on the intuition that only messages that transfer actual application data should cause dependencies. This assumption simplifies the implementation of a recoverable DSM, since many control dependencies can be ignored. Furthermore, the performance overhead of handling dependencies is reduced, and the potential for rollback propagation is decreased.

This paper presents the design of an independent checkpointing method for DSM that takes advantage of DSM's specific properties to reduce error-free and rollback overhead. By using periodic checkpointing, by ensuring that a node's interaction with other nodes is atomic, and by recovering the page table independently through ownership timestamps, the number of dependencies that can cause rollback propagation is reduced. Additionally, the remaining dependencies are unidirectional, allowing the algorithm to handle in-transit messages without the use of message logging. By using a passive server model of DSM we show that the dependencies in our recovery algorithm can be derived from the traditional message-passing model. We extend our checkpointing method to a DSM with lazy release consistency [7, 15], further reducing dependencies to only synchronization interactions. As described, our schemes are designed for software-implemented DSM (shared virtual memory) and independent checkpointing. However, the ideas presented can also be extended to hardware implementations and to other distributed system checkpointing methods [14].
To ensure correct recovery in a parallel system, a rollback needs to result in a consistent global state [6]. If the execution is partially deterministic, logging and message replay can be used, albeit at a high cost [9]. The simplest recovery method in general nondeterministic parallel systems is coordinated checkpointing [6, 8], where nodes synchronize both to checkpoint and to roll back. To avoid coordination overhead, independent checkpointing can be used [4, 23]. Its main disadvantages are the need to track all dependencies, the need to implement logging to enable recovery of in-transit messages, and the potential for rollback propagation.

Various distributed system recovery techniques have been applied to shared memory. Earlier techniques have used communication-induced checkpointing [2, 5, 12, 24]. Schemes using coordinated checkpointing have also been developed, both in bus-based systems [2, 3] and in DSM systems [10, 11]. Various DSM schemes based on logging and deterministic replay have also been designed [19, 20, 22].

The distributed system model, where every message causes a dependency between nodes, is too strict for shared-memory parallel programs. A more relaxed dependency model can be used for rollback recovery in shared memory only if there is no possibility of deadlock due to nodes waiting for messages that may never arrive. Recovery schemes designed for bus-based shared-memory systems have generally used the relaxed model; in these systems deadlock is avoided by the bounded transmission delay of the bus. In DSM systems, other measures have to be taken to avoid messaging deadlock if the relaxed dependency model is used. A dependency pattern and an atomic interaction model similar to the one described in this paper have been used to design a coordinated checkpointing scheme [11].

2. DESIGN OF A RECOVERABLE DSM

Our recoverable DSM algorithm uses a fixed distributed manager (FDM) protocol for maintaining coherence [17]. In this protocol, every node maintains ownership information for a fixed subset of shared pages. A page fault to a shared page causes a request to its manager, which forwards it to the owner. A node's page table indicates that it has either exclusive write access (W), read access (R), or invalid access (I) to a page. A copyset of nodes that have a copy of a page is maintained to allow their invalidation when a node obtains exclusive write access.

Pseudo-code for our checkpointing and rollback recovery algorithm as integrated into the DSM algorithm is given in Fig. 1. A page fault initiates an interaction, in which a local fault handler is called, which then consults a request server on the owner via the manager. Every node calls its checkpoint routine every T seconds. Every checkpoint on a node starts a new checkpoint interval by incrementing ckp_interval. This value is appended to every message that transfers a page of data between nodes. The receiving node uses the value to update its dependency table, which is used for dependency tracking [4, 23]. When an error is detected, the rollback initiation routine constructs a consistent global checkpoint by requesting every other node's current dependency table. It then sends appropriate rollback commands to the nodes. These nodes may need to roll back multiple checkpoint intervals. Unless a garbage collection algorithm [23] is used, all checkpoints are saved until the end of program execution. Unlike message-passing recovery algorithms [4, 23], in-transit messages do not need to be replayed from a log during reexecution.
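Before turning to the full pseudo-code in Fig. 1, the interval and dependency-table bookkeeping can be illustrated in C. This is a minimal sketch under our own assumptions; names such as NUM_NODES, dep_table, and interval_to_attach are illustrative and do not come from the paper.

    #define NUM_NODES 8                   /* assumed cluster size (illustrative)     */

    static int ckp_interval;              /* this node's current checkpoint interval */
    static int dep_table[NUM_NODES];      /* latest interval observed from each node */

    /* Called every T seconds; every checkpoint starts a new interval. */
    static void take_checkpoint(void)
    {
        /* ... save dependency table, ownership timestamps, and user state ... */
        ckp_interval++;
    }

    /* Value piggybacked on every message that transfers a page of data. */
    static int interval_to_attach(void)
    {
        return ckp_interval;
    }

    /* Called by the receiver of a page: only page transfers create
     * dependencies in this scheme, so only they update the table.     */
    static void record_dependency(int sender, int sender_interval)
    {
        if (sender_interval > dep_table[sender])
            dep_table[sender] = sender_interval;
    }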
Read Fault Handler
    send read request to manager of page;
    receive page and ckp_interval from owner;
    update dependency table;
    access = R;

Write Fault Handler
    send write request to manager of page;
    receive page, copyset, ownership timestamp and ckp_interval from owner;
    update dependency table;
    send invalidates to all members of copyset;
    increment and save ownership timestamp;
    access = W;

Read Request Server
    access = R;
    add requester to copyset;
    send page and ckp_interval to requester;

Write Request Server
    access = I;
    send page, copyset, ownership timestamp and ckp_interval to requesting node;

Manager
    if (owner == unknown)
        request ownership timestamps from all nodes;
        owner = node with largest timestamp;
    endif
    forward request to owner;
    if (request == write) owner = requester;

Checkpoint
    if (in_interaction)
        wait for completion;
    endif
    save dependency table;
    save ownership timestamps;
    save user state;
    increment ckp_interval;

Rollback Initiation
    request dependency info from other nodes;
    determine recovery line;
    send rollback request to other nodes;

Rollback Server
    if (waiting)
        set timeout timer;
        if (timer expires) must_rb = 1;
    endif
    send dependency table to initiator;
    receive rollback request;
    if (rb_interval != current)
        restore user state for rb_interval;
        reset page table;
        restore ownership timestamps;
    endif

FIG. 1. Pseudo-code for the independent checkpointing algorithm.

2.1. Maintaining Atomicity of Server Events

The recovery algorithm ensures that a node's part of an interaction is executed atomically. This avoids deadlock and spurious messages caused by partially completed interactions. When the checkpoint routine is called, checkpointing is delayed if an interaction is in progress. It is not possible, however, to delay a rollback if an error is detected during an interaction. Consider the situation in Fig. 2, where a request for read access from node A has been forwarded, by the page's manager on node M, to node B. If node A rolls back while waiting, it will receive the reply from node B unexpectedly after rollback. To handle such spurious replies, a sequence number uniquely identifying the interaction is attached to each message. In the example, node A assigns a unique number to the request message, and B attaches this number to its reply. When node A rolls back, it will have no record of the sequence number sent by B, so it rejects the message.

[Figure 2: node A issues a read request R(x) and waits; the page's manager on node M forwards the read request to node B, which replies directly to A.]
FIG. 2. Situation resulting from an incomplete interaction.

If node B or node M rolls back while handling the request from A, node A will not receive a reply from node B. To handle such cases of potential deadlock, a node waiting for a reply that receives a request for dependency information waits for a fixed amount of time, and if no reply is received, indicates that it must roll back by setting the must_rb flag in its dependency table. In the example, if node B decides to roll back before sending the reply, all other nodes in the system that receive B's request for dependency information while waiting for a reply set a timer. In the absence of other rollbacks, all nodes other than A will most likely receive the reply before timing out. Node A does time out, sending its dependency table to B with must_rb set. Node B then constructs its consistent global checkpoint taking into account that node A must roll back.
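The sequence-number check for rejecting spurious replies might be realized as in the following sketch. This is our own illustration of the idea, with hypothetical names (next_seq, pending_seq), not code from the paper.

    static unsigned int next_seq;     /* per-node counter for outgoing requests         */
    static unsigned int pending_seq;  /* number of the interaction in progress, or 0    */

    /* Tag an outgoing request with a number identifying the interaction. */
    static unsigned int start_interaction(void)
    {
        pending_seq = ++next_seq;
        return pending_seq;
    }

    /* A reply is accepted only if it carries the number of the interaction
     * currently in progress.  After a rollback the restored state holds no
     * record of the number node B attached, so a late reply is rejected.    */
    static int accept_reply(unsigned int reply_seq)
    {
        return pending_seq != 0 && reply_seq == pending_seq;
    }

    static void end_interaction(void)
    {
        pending_seq = 0;
    }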
2.2. Page Table Recovery with Ownership Timestamps

Our design uses an ownership timestamp scheme in which every node keeps track of the last time it became owner of a page. The scheme allows all directory information besides the ownership timestamps to be lost after rollback without affecting correct execution. Every time ownership is transferred, the old owner sends its current value of the page's ownership timestamp to the new owner. Upon receiving the value, the new owner increments it and then stores it as its ownership timestamp for the page. Periodically, when the timestamp overflows, all nodes need to synchronize to reset their timestamps to ensure correct ordering. Ownership timestamps are saved together with the state of the user application during checkpointing.

After a node rolls back, all the page table information except for the ownership timestamps is unknown. The first access to a page after rollback causes a fault. If the manager did not roll back, the protocol proceeds as usual, restoring access rights to the requester. If the manager did roll back, it has no ownership information, and it queries all nodes for their ownership timestamps for the page. It can then determine the owner of the page by comparing all the ownership timestamps received. Due to rollbacks on other nodes, it is possible that the manager has incorrect ownership information. Therefore every node keeps track of the pages it owns. If a node receives a request for a page it does not own, it rejects the request, and ownership timestamps are used to find the correct owner.
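A manager that has lost its directory can rebuild the ownership record for a page roughly as sketched below. This is our own illustration; the message layer is omitted and the convention that a node which never owned the page reports a timestamp of 0 is an assumption of the sketch.

    /* Determine the owner of a page from ownership timestamps collected
     * from all nodes after the manager rolled back.  A node that never
     * owned the page is assumed to report 0 here.                        */
    static int owner_from_timestamps(const unsigned long ts[], int num_nodes)
    {
        int owner = 0;
        for (int n = 1; n < num_nodes; n++)
            if (ts[n] > ts[owner])
                owner = n;            /* largest timestamp = most recent owner */
        return owner;
    }

    /* Ownership transfer: the old owner hands over its timestamp and the
     * new owner stores an incremented copy, so later comparisons still
     * order owners correctly.                                             */
    static unsigned long take_ownership(unsigned long old_owner_ts)
    {
        return old_owner_ts + 1;
    }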
2.3. Performance Impact

Our DSM recovery scheme reduces the two main drawbacks of independent checkpointing techniques: the high error-free overhead of message logging and dependency tracking, and the potentially high overhead of recovery due to rollback propagation. Dependencies between checkpoint intervals can cause a domino effect, where cascading rollbacks force reexecution of a large part of the program. To determine the reduction in dependencies obtained by our scheme, we performed trace-driven measurements with multiprocessor address traces from five parallel scientific programs running on an Encore Multimax. The traces were generated by the TRAPEDS address tracer from execution on seven processors. Each trace contains at least 10 million memory references per processor [21]. Table I describes the characteristics of the traces used.

TABLE I
Address Trace Characteristics

                                   Total number of       Data reads                Data writes
Program   Description              references         Total        Shared       Total        Shared
gravsim   N-body simulator          92,178,814      33,266,880   12,484,455    6,392,078     251,694
fsim      Fault simulator          149,918,375      50,950,933   39,326,911    3,958,919     999,127
tgen      Test generator           101,264,382      32,613,809   16,550,450    4,461,889     642,796
pace      Circuit extractor         87,861,165      23,266,576    1,286,787    7,842,338     348,524
phigure   Global router            132,998,231      38,244,233    4,281,207   11,530,981   1,876,400

Figure 3 presents simulation results for the frequency of messages in the DSM applications. There are about 10,000 messages per million memory references, all of which cause dependencies in a traditional message-passing approach to independent checkpointing. The frequency of dependency-carrying messages is decreased by a factor of about 3.5. This means the overhead of dependency tracking is decreased. More importantly, the potential for rollback propagation and the domino effect is reduced. Since our scheme uses periodic checkpointing, with approximately the same period on each node, only a small percentage of dependencies occur between different checkpoint intervals and cause rollback propagation. The lower the frequency of dependency-carrying messages, the higher the probability that the latest set of checkpoints represents a consistent global state and can be used for recovery. In traditional message-passing independent checkpointing schemes, runtime analysis can be used to avoid logging all but 1% of these messages [23]. Our scheme does not need to perform the runtime analysis, nor does it need to incur logging overhead for any messages.

[Figure 3: dependency-carrying messages per million references versus page size (4 bytes to 4K bytes), comparing all messages with the messages that carry dependencies in the recoverable DSM scheme.]
FIG. 3. Dependency frequency comparison.
3. MODELING DSM DEPENDENCIES

In order to reason about the correctness of the DSM recovery scheme, it is necessary to extend the traditional message-passing model of execution. We describe our passive server model for DSM execution with rollbacks, and then analyze our recovery algorithm for DSM to show that only page-transfer dependencies remain.

3.1. The Passive Server Model

Program execution in a message-passing distributed system is modeled as a set of processes and a set of reliable channels. Program execution is represented by a pair P = (E, →), where E is a set of events and → is the dependence relation defined over E. Events within a process are ordered by the →XO (execution order) relation. Events on different processes are ordered by the →M (message) relation, where a →M b means that event a sent a message and event b received it. The → relation is the union of the other two: → = →XO ∪ →M. Every event represents an atomic action which may change the state in one of the processes. A special checkpoint event can be inserted between two events to record the current state of the process.

When a process needs to roll back, it communicates with all other processes to determine a consistent set of checkpoints. Upon receiving notification of a rollback, a process may either need to roll back to a checkpoint, or it may continue operation. If it continues, we can treat the current volatile state as a virtual checkpoint [23]. A global checkpoint is a set of real and virtual checkpoints, one per process. Consider two events a and b, where b occurs in the execution order before the global checkpoint and a occurs in the execution order after the global checkpoint. A global checkpoint is consistent if there are no two such events such that a → b or b → a. A global checkpoint is also consistent if lost messages can be retrieved during reexecution and there are no two such events such that a → b.

To simplify reasoning about consistency of global checkpoints it is useful to treat the →M relation as bidirectional. To do this we replace every dependency a →M b by a causal dependency a →C b and a backward dependency b →B a. Consider again two events a and b, where b occurs in the execution order before a global checkpoint and a occurs in the execution order after a global checkpoint. The requirements for consistency are now that there are no two such events such that a →C b, and that there are no two such events such that a →B b and the message between a and b is unlogged.
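These conditions can be phrased as a small check over recorded dependencies. The sketch below is our own interval-based reading of the model, not code from the paper: each recorded dependency is a page transfer tagged with the sender's and receiver's checkpoint intervals, and a candidate global checkpoint is the per-node interval up to which state is kept.

    /* One recorded page-transfer dependency: node src sent the page while
     * in interval src_interval; node dst received it in dst_interval.      */
    struct dep {
        int src, src_interval;
        int dst, dst_interval;
    };

    /* A candidate recovery line keeps, on every node n, all state up to and
     * including interval line[n].  It is inconsistent if a causal dependency
     * leads from a discarded event (sent after the sender's checkpoint) to a
     * kept event (received before the receiver's), i.e. an orphan transfer.
     * Backward dependencies need no check in this scheme because a node that
     * re-faults after rollback simply fetches the page again.                */
    static int line_is_consistent(const struct dep *deps, int ndeps, const int *line)
    {
        for (int i = 0; i < ndeps; i++) {
            if (deps[i].src_interval > line[deps[i].src] &&
                deps[i].dst_interval <= line[deps[i].dst])
                return 0;   /* orphan: receive kept, matching send rolled back */
        }
        return 1;
    }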
Our passive server model for DSM systems is derived from the message-passing model. We model program execution in DSM systems as a set of client processes, which run the application program, and a set of passive server processes, which provide a shared-memory image to the clients. Events in the servers are always triggered by the receipt of a request message, either from a client or from another server. A write or read event in a client, together with the events it causes in the servers, may be collectively called an interaction. The passive server model differs from the message-passing model in that it collects all the events in a process during an interaction into one single event. Figure 4 illustrates the interactions in the FDM algorithm in terms of the passive server model. For every causal dependency between processes, there is a backward dependency which is not shown.

[Figure 4: the four FDM interactions — (a) clean read, (b) dirty read, (c) clean write, and (d) dirty write — shown as events on the local node L, the manager node M, and the remote node R, with the page transfers, copyset updates, and access-right changes they cause.]
FIG. 4. FDM algorithm interactions in terms of the passive server model.

3.2. Eliminating Dependencies

We now analyze the FDM recovery algorithm to verify that all but causal page-transfer dependencies can be ignored. Consider the role of the manager in an interaction. On a read interaction, the state on the manager's node (node M) is the same before the interaction as afterwards. If node M rolls back, the ownership information it maintains is lost, but it can be recovered by using ownership timestamps. So all dependencies involving node M in a read interaction can be eliminated. In a write interaction, node M changes state; it records the new owner of the page. If the new owner rolls back, node M may contain erroneous ownership information. Any request that is routed to the wrong owner by M will be rejected, however, and the timestamps will be used to find the correct owner. So again we can ignore all dependencies with node M. Therefore, by using ownership timestamps, the function of the manager has been made redundant and does not have to be considered for rollback to a consistent state.

Having eliminated the dependencies with the node that contains a page's manager, we can now analyze interactions solely in terms of the dependence between the local (L) and remote (R) nodes. In a read interaction to a clean page (Fig. 4a), when node L rolls back, the state at the recovery line is the same as if node R had also rolled back, except for the extra member of the copyset. Since the copyset is allowed to be a superset of all the nodes that have readable copies, the recovery line is consistent. So the backward dependency L →B R can be eliminated. When the read interaction involves a dirty page (Fig. 4b), a rollback of node L causes a situation where node R has lost write permission without a guarantee that a copy of the dirty page has been saved on another node. However, node R is still the owner, so any further requests will be supplied from its copy of the page. Therefore the dependency L →B R can again be eliminated. So, in a read interaction, there remains only the causal dependency, R →C L, from the remote node to the local node.

Next, we consider a remote write access (Figs. 4c and 4d). Ignoring invalidations, the interactions for a clean and a dirty write are identical, with the access permission of the page on the remote node changing from W to I. If node L rolls back and reexecutes the write access, the request is directed by the manager to node R. Node R rejects the request because it has given up ownership. This rejection will cause the ownership timestamps to be used to find the correct copy of the page. Therefore, the dependency L →B R is eliminated. The causal dependency R →C L remains since it transmits a block of data.

If the block is readable by more than one remote node when the local node asks for write access, all the copies in the remote nodes will be invalidated. A node L can safely roll back past an interaction in which it invalidated node R'. In the global state after rollback, it will appear as if node R' has been invalidated spontaneously, and at the next access node R' can ask the owner for a new copy of the block. Therefore there is no dependency L → R'. If node R' rolls back past the interaction, the access rights of all its blocks are set to unknown. Therefore any access to a block that was invalidated before rollback will ask the owner for a new copy, just as if the block had been invalidated. So the remaining R' → L dependency is eliminated, resulting in a dependency-free invalidation interaction.

4. RECOVERABLE DSM WITH LAZY RELEASE CONSISTENCY
In software DSM systems, false sharing due to large page sizes and high per-message overhead can make generating speedup with traditional protocols difficult. Lazy release consistency (LRC) successfully overcomes these drawbacks, approaching the performance of a bus-based multiprocessor on a high-speed workstation network [7, 15]. We use the low number of messages transmitted in a DSM with lazy release consistency to develop a recovery algorithm that further reduces the number of dependencies that need to be considered.

Our algorithm uses a modified version of the lazy update version of LRC [7]. Instead of using the traditional sequential consistency ordering of shared-memory accesses [16], LRC only enforces ordering between intervals in the computation delineated by acquire and release synchronization accesses. As long as the programmer introduces enough synchronization in the program to avoid data races, the system is indistinguishable from a sequentially consistent system [1].

Pseudo-code for our recoverable LRC algorithm is given in Fig. 5. A multiple-writer protocol is used, where a twin of a page is created locally on a write fault and updates are propagated to other nodes by comparing a page with its twin and encoding the result in a diff. Ordering is guaranteed by using vector timestamps (vt) [15]. Execution on the processors is divided into intervals by synchronization accesses. Every processor keeps track of which intervals it is aware of by updating its vector timestamp on any interaction. On an acquire, the vector timestamps are used to propagate write notices of all modifications to memory locations that occurred before the acquire. To limit the dependencies to acquire interactions, unlike the original LRC algorithm [7], our algorithm also sends all diffs together with the write notices. Periodic garbage collection deletes write notices that have been propagated to all nodes. There is no concept of ownership of memory blocks; all the information on the contents of pages is transferred directly from the releaser to the acquirer of a lock. Locks are implemented separately from data pages. Every lock has a manager which keeps a record of its last acquirer in the last_acq variable.

Write Fault Handler
    create twin of page;
    access = W;

Interval Creation
    create write notices for every twinned page;

Acquire
    send acquire request, vt, and ckp_interval to lock manager;
    receive node id and last_acq timestamp from lock holder;
    increment and save last_acq timestamp;
    receive vt, write notices, all diffs, and ckp_interval from last acquirer;
    update dependency table;
    apply diffs;

Acquire Server
    send node id and last_acq timestamp to requester;
    wait until lock is released;
    update dependency table;
    send vt, write notices, all diffs, and ckp_interval to requester;

Lock Manager
    if (last_acq == unknown)
        request last_acq timestamp from all nodes;
        last_acq = node with largest timestamp;
    endif
    forward request to last acquirer;
    set last_acq to requester;

Checkpoint
    if (in_interaction)
        wait for completion;
    endif
    save dependency table;
    save node state;

Rollback Initiation
    request dependency info from other nodes;
    determine recovery line;
    send rollback request to other nodes;

Rollback Server
    if (waiting)
        if (lock holder == unknown)
            set timeout timer;
            if (timer expires) must_rb = 1;
        else if (lock holder == requester)
            must_rb = 1;
        endif
    endif
    send dependency table to initiator;
    receive rollback request;
    if (rb_interval != current)
        restore user state;
        set last_acq records to unknown;
    endif

FIG. 5. Pseudo-code for the recoverable LRC algorithm.

During checkpointing, all of a node's state is saved. At recovery, however, the last_acq records in the lock managers, and any record of a waiter at a lock in a node, are set to unknown. A last_acq timestamp, analogous to the ownership timestamp used for pages in the previous algorithm, is used to recover unknown last_acq records.

The only interaction in our algorithm occurs during an acquire, as illustrated in Fig. 6. To make the interaction atomic, the node holding the lock replies with its id and its last_acq timestamp when it first receives the acquire request (see Fig. 6b). When a rollback request is received, any node waiting for a lock compares the id of the rollback initiator with the id of the node holding its lock. If they are equal, it sets must_rb, guaranteeing that it will roll back out of the partially completed interaction.

[Figure 6: the acquire interaction among the local node L, the lock manager node M, and the remote node R holding the lock; vector timestamps, write notices, and diffs flow from the releaser to the acquirer. In (b) the lock holder additionally returns its id and last_acq timestamp to the waiting acquirer as soon as the request arrives.]
FIG. 6. (a) Acquire interaction in LRC algorithm. (b) Extra message to support rollback recovery.

The last_acq timestamp scheme ensures that all dependencies with the lock manager can be ignored. The only dependency occurs when a node succeeds in acquiring a lock. However, since our LRC algorithm does not have an independent mechanism to recover all coherence information, the backward dependency cannot be eliminated. Rather than logging messages, our algorithm records a bidirectional dependency on an acquire.
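The twin-and-diff mechanism used by the multiple-writer protocol described above can be illustrated as follows. This is a generic sketch of the technique under our own assumptions (a 4 KB page of 32-bit words, diffs as index/value pairs, error handling omitted); it is not the implementation used in the paper.

    #include <stdlib.h>
    #include <string.h>

    #define PAGE_WORDS 1024                  /* assumed 4 KB page of 32-bit words */

    /* On the first write fault in an interval, keep an unmodified copy (twin). */
    static unsigned int *make_twin(const unsigned int *page)
    {
        unsigned int *twin = malloc(PAGE_WORDS * sizeof *twin);
        memcpy(twin, page, PAGE_WORDS * sizeof *twin);
        return twin;
    }

    /* A diff records only the words that changed since the twin was made.
     * Returns the number of (index, value) pairs written to out[].          */
    static int make_diff(const unsigned int *page, const unsigned int *twin,
                         unsigned int out[][2])
    {
        int n = 0;
        for (int i = 0; i < PAGE_WORDS; i++)
            if (page[i] != twin[i]) {
                out[n][0] = (unsigned int)i;
                out[n][1] = page[i];
                n++;
            }
        return n;
    }

    /* The acquirer applies received diffs to bring its copy of the page up
     * to date with the modifications made before the acquire.               */
    static void apply_diff(unsigned int *page, unsigned int diff[][2], int n)
    {
        for (int i = 0; i < n; i++)
            page[diff[i][0]] = diff[i][1];
    }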
Figure 7 shows the results of simulations with the shared-memory address traces for our sequential consistency and LRC algorithms. The LRC algorithm reduces the dependency frequency by about a factor of 3.

[Figure 7: dependency-carrying messages per million references versus page size (4 bytes to 4K bytes) for the sequential consistency and lazy release consistency algorithms.]
FIG. 7. Dependency frequencies with different memory consistency models.

5. CONCLUSIONS

Checkpointing and rollback recovery algorithms for message-passing systems are grounded in well-established theory. Research on recoverable shared memory has generally assumed a looser dependency relation for data transfers only. By using a passive server model, our work shows that this dependency pattern for shared memory can be derived from the dependency pattern for message passing, and therefore can be used in architectures where a shared-memory image is provided via physically distributed memory. The model allows the implementation of efficient recoverable DSM algorithms. Since the need for logging is eliminated, and the potential for rollback propagation across a set of checkpoints is decreased, our method is especially applicable to periodic independent checkpointing. We applied our technique in the design of periodic checkpointing algorithms for both the sequential consistency and the lazy release consistency memory models.

ACKNOWLEDGMENTS

Our work benefited from discussions with Alain Gefflaut at IRISA and Gaurav Suri, Yi-Min Wang, Nuno Neves, and Sujoy Basu at Illinois.
REFERENCES

1. Adve, S. V., and Hill, M. D. A unified formalization of four shared-memory models. IEEE Trans. Parallel Distrib. Systems 4, 6 (June 1993), 613-624.
2. Ahmed, R. E., Frazier, R. C., and Marinos, P. M. Cache-aided rollback error recovery (CARER) algorithms for shared-memory multiprocessor systems. Proc. 20th Int. Symp. Fault-Tolerant Comput., 1990, pp. 82-88.
3. Banâtre, M., et al. An architecture for tolerating processor failures in shared-memory multiprocessors. Tech. Rep. 707, IRISA, Rennes, France, Mar. 1993.
4. Bhargava, B., and Lian, S.-R. Independent checkpointing and concurrent rollback for recovery in distributed systems--an optimistic approach. Proc. 7th Symp. Reliable Distributed Syst., 1988, pp. 3-12.
5. Bernstein, P. A. Sequoia: A fault-tolerant tightly coupled multiprocessor for transaction processing. Computer 21, 2 (Feb. 1988), 37-45.
6. Chandy, K. M., and Lamport, L. Distributed snapshots: Determining global states of distributed systems. ACM Trans. Comput. Systems 3, 1 (Feb. 1985), 63-75.
7. Dwarkadas, S., et al. Evaluation of release consistent software distributed shared memory on emerging network technology. Proc. 20th Int. Symp. Comput. Arch., May 1993, pp. 144-155.
8. Elnozahy, E. N., Johnson, D. B., and Zwaenepoel, W. The performance of consistent checkpointing. Proc. 11th Symp. Reliable Distributed Syst., 1992, pp. 39-47.
9. Elnozahy, E. N., and Zwaenepoel, W. On the use and implementation of message logging. Proc. 24th Int. Symp. Fault-Tolerant Comput., June 1994, pp. 298-307.
10. Gefflaut, A., Morin, C., and Banâtre, M. Tolerating node failures in cache only memory architectures. Proc. Supercomputing '94, Nov. 1994.
11. Janakiraman, G., and Tamir, Y. Coordinated checkpointing-rollback error recovery for distributed shared memory multicomputers. Proc. 13th Symp. Reliable Distributed Syst., Oct. 1994, pp. 42-51.
12. Janssens, B., and Fuchs, W. K. Relaxing consistency in recoverable distributed shared memory. Proc. 23rd Int. Symp. Fault-Tolerant Comput., June 1993, pp. 155-163.
13. Janssens, B., and Fuchs, W. K. Reducing interprocessor dependence in recoverable distributed shared memory. Proc. 13th Symp. Reliable Distributed Syst., Oct. 1994, pp. 34-41.
14. Janssens, B., and Fuchs, W. K. Recoverable distributed shared memory under sequential and relaxed consistency. Tech. Rep. CRHC-95-10, Center for Reliable and High-Performance Computing, Univ. of Illinois, Urbana, IL, May 1995.
15. Keleher, P., et al. Distributed shared memory on standard workstations and operating systems. Proc. Winter Usenix Conf., Jan. 1994, pp. 115-131.
16. Lamport, L. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Trans. Comput. C-28, 9 (Sep. 1979), 690-691.
17. Li, K., and Hudak, P. Memory coherence in shared virtual memory systems. ACM Trans. Comput. Systems 7, 4 (Nov. 1989), 321-359.
18. Litzkow, M., and Solomon, M. Supporting checkpointing and process migration outside the UNIX kernel. Proc. Usenix Winter Conf., 1992.
19. Neves, N., Castro, M., and Guedes, P. A checkpoint protocol for an entry consistent shared memory system. Proc. 13th ACM Symp. Principles Distributed Comput., Aug. 1994.
20. Richard, G. G., III, and Singhal, M. Using logging and asynchronous checkpointing to implement recoverable distributed shared memory. Proc. 12th Symp. Reliable Distributed Syst., 1993, pp. 58-67.
21. Stunkel, C. B., Janssens, B., and Fuchs, W. K. Address tracing of parallel systems via TRAPEDS. Microprocessors Microsystems 16, 5 (1992), 249-261.
22. Suri, G., Janssens, B., and Fuchs, W. K. Reduced overhead logging for rollback recovery in distributed shared memory. Proc. 25th Int. Symp. Fault-Tolerant Comput., June 1995.
23. Wang, Y.-M., and Fuchs, W. K. Optimistic message logging for independent checkpointing in message-passing systems. Proc. 11th Symp. Reliable Distributed Syst., 1992, pp. 147-154.
24. Wu, K.-L., and Fuchs, W. K. Recoverable distributed shared virtual memory. IEEE Trans. Comput. 39, 4 (Apr. 1990), 460-469.

BOB JANSSENS is a Ph.D. degree candidate in electrical and computer engineering at the University of Illinois at Urbana-Champaign. He is a research assistant in the Coordinated Science Laboratory, focusing on recoverable distributed shared-memory systems. His research interests and experience include computer architecture, operating systems, parallel and distributed computing, and fault-tolerant computing. He received B.S. and M.S. degrees in 1987 and 1991, respectively, from the Department of Electrical and Computer Engineering at the University of Illinois. He has held a summer research position at IBM T. J. Watson Research Center in Yorktown Heights, New York, and a guest scientist position at Siemens Research in Munich, Germany.

W. KENT FUCHS received the B.S.E. degree from Duke University in 1977. In 1984 he received the M.Div. degree from Trinity Evangelical Divinity School in Deerfield, Illinois, and in 1985 he received the Ph.D. degree from the University of Illinois. He is currently a professor in the Department of Electrical and Computer Engineering and the Coordinated Science Laboratory at the University of Illinois. He has received several awards for his research in dependable computing. He has edited special issues of Computer and IEEE Transactions on Computers. He is currently a member of the editorial board for IEEE Transactions on Computers and IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. He is a fellow of the IEEE.

Received November 1, 1994; revised April 27, 1995; accepted May 8, 1995