3-Phase Commit Protocol

Three-Phase Commit Protocol

Srinivas R. Gaddam

Introduction

Fault-tolerant computer systems prevent the disruption of services provided to users. A system can be designed to be fault-tolerant in two ways. The techniques used to implement them are as follows :

Voting Protocols:They are used to design systems that mask failures.These systems continue to perform their specified functions despite failures.
Commit Protocols:They are used to design systems that exhibit a well defined behaviour in the event of a failure. These systems may or may not perform the specified function during failures, but they may facilitate actions suitable for recovery.

There are two kinds of commit protocols :

Two-Phase Commit protocol - a blocking protocol
Three-Phase Commit protocol - a non-blocking protocol

Here, we study the three-phase commit protocol only.
You might want to read the Definitions and Conditions that cause Blocking before going further.

Overview

While the two-phase commit protocol guarantees global atomicity, its biggest drawback is that it is a blocking protocol. Whenever the coordinator fails, cohort sites will have to wait for its recovery. This is undesirable as these sites may be holding locks on the resources. In the event of message loss, the two-phase protocol will result in the sending of more messages.

If transactions must be resilient to site failures

commit protocols must not block in the event of site failures
failed sites, upon recovery must all reach the same conclusion regarding the outcome of the transaction
operational sites should agree on the outcome of the transaction by examining their local states
this decision must be consistent with the final outcome of the operational sites

We will see how these conditions are met by this protocol.

Finite State Automata of the protocol

The Protocol

Assumptions

each site uses the write-ahead-log protocol
atmost one site can fail during the execution of the transaction

Basic Idea is that before the commit protocol begins, all the sites are in state q. If the coordinator fails while in state q1, all the cohorts perform the timeout transition, thus aborting the transition. Upon recovery, the coordinator performs the failure transition.

Phase I

During the first phase, the coordinator is in state w1, and each cohort is either in state a (in which case the site has already sent an abort message to the coordinator) or w or q depending on whether it has received the Commit_Request message or not. If a cohort fails, the coordinator times out waiting for the agreed message from the failed cohort. In this case, the coordinator aborts the transaction and sends abort messages to all the cohorts.

Phase II

In the second phase, the coordinator sends a Prepare message to all the cohorts if all the cohorts have sent Agreed messages in Phase I. Otherwise, the coordinator will send an Abort message to all the cohorts.On receiving a Prepare message, a cohort sends an acknowledge message to the coordinator. If the coordinator fails before sending Prepae messages (i.e., in state w1), it aborts the transaction upon recovery, according to the failure transition. The cohorts time out waiting for the prepare message, and also abort the transaction as per the timeout transition.

Phase III

In the third phase, on receiving acknowledgements to the Prepare messages from all the cohorts, the coordinator sends a Commit message to all the cohorts. A cohort, on receiving a Commit message, commits the transaction. If the coordinator fails before sending the commit message (i.e., in state p1), it commits the transaction upon recovery, according to the failure transition from state p1. The cohorts time out waiting for the Commit message. They commit the transaction according to the timeout transition from state pi. However, if a cohort fails before sending an acknowledgement message to a Prepare message, the coordinator timesout in state p1. The coordinator aborts the transaction and sends Abort messages to all the cohorts. The failed cohort, upon recovery, will abort the transaction according to the failure transaction from state wi.

Definitions

Synchronous protocols

A protocol is said to be synchronous within one state transition if one site never leads another site by more than one state transition during the execution of the protocol.

Concurrency set

Let si denote the state of a site i. The set of all the states of every site that may be concurrent with it is known as the concurrency set of si (denoted by C(si). For example, consider a system having two sites, using the Two-Phase Commit protocol. If site 2's state is w2, then C(w2) = { c1, a1, w1 }. Likewise, C(q2) = { q1, w1 }. Note that, a1, c1 do not belong to C(q2) because the two-phase commit protocol is synchronous within one state transition.

Sender set

Let s be an arbitrary state of a site, and let M be the set of all messages that are received in
state s. The sender set for s, denoted S(s) = { i | site i sends m and m belongs to M }

Conditions that cause Blocking

Let us see the conditions that lead to the Two-Phase commit protocol blocks. Consider a simple case where only one site remains operational and all other sites have failed. This site has to proceed based solely on its local state. Let s denote the state of the site at this point. If C(s) contains both commit and abort states, then the site cannot decide to abort the transaction because someother may be in the commit state. On the other hand, the site cannot decide to commit the transaction because someother site may be in the abort state. In other words, the site has to block until all the failed sites recover.

How to eliminate Blocking

To make the two-phase commit protocol a nonblocking protocol, we need to make sure that C(wi) does not contain both abort and commit states. This can be done by introducing a buffer state p1. We also introduce a buffer state pi, for the cohorts. The resulting final state automaton is shown in the figure. Now, in a system containing only two sites, C(w1) = { q2, w2, a2}, and C(w2) = { a1, p1, w1}.

Failure Transitions

In order to perform independant recovery at a failed site, the failed site should be able to reach a final decision based solely on its local state. A failure transition occurs at a failed site at the instant it fails (or immediately after it recovers from the failure). The site upon recovery assumes this local state caused by the failure transition. The transitions are performed according to the following rules

Rule 1:
For every nonfinal state s (i.e., qi, wi, pi) in the protocol: if C(s) contains a commit, then assign a failure transition from s to a commit state in its FSA; otherwise, assign a failure transition from s to an abort state in its FSA.

Timeout Transitions

Operational sites perform a timeout transition in the event of another site's failure. If site i is waiting for a message from site j (i.e., j belongs to S(i)) and site j has failed, then site i times out. Based on the type of message from j, we can determine in what state site j failed. Once the state of j in known, we can determine the final state of j due to the failure transition at j. This observation leads to the timeout transitions in the commit protocol at the operational sites.

Rule 2:
For each nonfinal state s, if site j is in S(s), and site j has a failure transition to a commit(abort) state, then assign a timeout transition from state s to a commit (abort) state in the FSA.

The rationale behind this rule is as follows. The failed site makes a transition to a commit (abort) state using the failure transition (Rule 1). Therefore, operational sites must make the same transition in order to ensure that the final outcome of the transaction is identical at all the sites.

References

Skeen, D., "A Formal Model of Crash Recovery in a Distributed System," IEEE Transactions on Software Engineering, vol.9, no.3, May 1983, pp.219-228.
Singhal, M and Shivaratri, N., Advanced Concepts in Operating Systems, McGraw-Hill 1994, pp.335-342.

Go to Operating Systems page

Go to Sreenu's page

Last updated: 29 Apr 1995 / sreenu@csgrad.cs.vt.edu