3-Phase Commit Protocol
Three-Phase Commit Protocol
Introduction
Fault-tolerant computer systems prevent the disruption of services provided to users. A system can be designed to be fault-tolerant in two ways. The techniques used to implement them are as follows :
- Voting Protocols:They are used to design systems that mask failures.These systems continue to perform their specified functions despite failures.
- Commit Protocols:They are used to design systems that exhibit a well defined behaviour in the event of a failure. These systems may or may not perform the specified function during failures, but they may facilitate actions suitable for recovery.
There are two kinds of commit protocols :
- Two-Phase Commit protocol - a blocking protocol
- Three-Phase Commit protocol - a non-blocking protocol
Here, we study the three-phase commit protocol only.
You might want to read the Definitions and Conditions that cause Blocking before going further.
Overview
While the two-phase commit protocol guarantees global atomicity, its
biggest drawback is that it is a blocking protocol.
Whenever the coordinator fails, cohort sites will have to wait for its
recovery. This is undesirable as these sites may be holding locks on
the resources. In the event of message loss, the two-phase protocol
will result in the sending of more messages.
If transactions must be resilient to site failures
- commit protocols must not block in the event of site failures
- failed sites, upon recovery must all reach the same conclusion regarding the outcome of the transaction
- operational sites should agree on the outcome of the transaction by examining their local states
- this decision must be consistent with the final outcome of the operational sites
We will see how these conditions are met by this protocol.
Finite State Automata of the protocol
The Protocol
Assumptions
- each site uses the write-ahead-log protocol
- atmost one site can fail during the execution of the transaction
Basic Idea is that before the commit protocol begins, all the
sites are in state q. If the coordinator fails while in state q1, all
the cohorts perform the timeout transition, thus aborting the
transition. Upon recovery, the coordinator performs the failure
transition.
Phase I
During the first phase, the coordinator is in state w1, and each
cohort is either in state a (in which case the site has already sent
an abort message to the coordinator) or w or q depending on whether it
has received the Commit_Request message or not. If a cohort fails, the
coordinator times out waiting for the agreed message from the failed
cohort. In this case, the coordinator aborts the transaction and sends
abort messages to all the cohorts.
Phase II
In the second phase, the coordinator sends a Prepare message to all
the cohorts if all the cohorts have sent Agreed messages in Phase I.
Otherwise, the coordinator will send an Abort message to all the
cohorts.On receiving a Prepare message, a cohort sends an acknowledge
message to the coordinator. If the coordinator fails before sending
Prepae messages (i.e., in state w1), it aborts the transaction upon
recovery, according to the failure transition. The cohorts time out
waiting for the prepare message, and also abort the transaction as per
the timeout transition.
Phase III
In the third phase, on receiving acknowledgements to the Prepare
messages from all the cohorts, the coordinator sends a Commit message
to all the cohorts. A cohort, on receiving a Commit message, commits
the transaction. If the coordinator fails before sending the commit
message (i.e., in state p1), it commits the transaction upon recovery,
according to the failure transition from state p1. The cohorts time
out waiting for the Commit message. They commit the transaction
according to the timeout transition from state pi. However, if a
cohort fails before sending an acknowledgement message to a Prepare
message, the coordinator timesout in state p1. The coordinator aborts
the transaction and sends Abort messages to all the cohorts. The
failed cohort, upon recovery, will abort the transaction according to
the failure transaction from state wi.
Synchronous protocols
A protocol is said to be synchronous within one state
transition if one site never leads another site by more than one state
transition during the execution of the protocol.
Concurrency set
Let si denote the state of a site i. The set of all the states of
every site that may be concurrent with it is known as the concurrency
set of si (denoted by C(si). For example, consider a system having two
sites, using the
Two-Phase Commit protocol. If site 2's state is w2,
then C(w2) = { c1, a1, w1 }. Likewise, C(q2) = { q1, w1 }. Note that,
a1, c1 do not belong to C(q2) because the two-phase commit protocol is synchronous within one state transition.
Sender set
Let s be an arbitrary state of a site, and let M be the set of all messages that are received in
state s. The sender set for s, denoted S(s) = { i | site i sends m and m belongs to M }
Let us see the conditions that lead to the Two-Phase commit protocol
blocks. Consider a simple case where only one site remains operational
and all other sites have failed. This site has to proceed based solely
on its local state. Let s denote the state of the site at this point.
If C(s) contains both commit and abort states, then the site cannot
decide to abort the transaction because someother may be in the commit
state. On the other hand, the site cannot decide to commit the
transaction because someother site may be in the abort state. In other
words, the site has to block until all the failed sites recover.
To make the two-phase commit protocol a nonblocking protocol, we need
to make sure that C(wi) does not contain both abort and commit states.
This can be done by introducing a buffer state p1. We also introduce a
buffer state pi, for the cohorts. The resulting final state automaton
is shown in the figure. Now, in a system containing only two
sites, C(w1) = { q2, w2, a2}, and C(w2) = { a1, p1, w1}.
In order to perform independant recovery at a failed site, the failed
site should be able to reach a final decision based solely on its
local state. A failure transition occurs at a failed site at
the instant it fails (or immediately after it recovers from the
failure). The site upon recovery assumes this local state caused by
the failure transition. The transitions are performed according to the
following rules
Rule 1:
For every nonfinal state s (i.e., qi, wi, pi) in the protocol: if C(s)
contains a commit, then assign a failure transition from s to a commit
state in its FSA; otherwise, assign a failure transition from s to an
abort state in its FSA.
Operational sites perform a timeout transition in the event of
another site's failure. If site i is waiting for a message from site j
(i.e., j belongs to S(i)) and site j has failed, then site i times
out. Based on the type of message from j, we can determine in what
state site j failed. Once the state of j in known, we can determine
the final state of j due to the failure transition at j. This
observation leads to the timeout transitions in the commit protocol
at the operational sites.
Rule 2:
For each nonfinal state s, if site j is in S(s), and site j has a
failure transition to a commit(abort) state, then assign a timeout
transition from state s to a commit (abort) state in the FSA.
The rationale behind this rule is as follows. The failed site makes a
transition to a commit (abort) state using the failure transition
(Rule 1). Therefore, operational sites must make the same transition
in order to ensure that the final outcome of the transaction is
identical at all the sites.
References
- Skeen, D., "A Formal Model of Crash Recovery in a Distributed System," IEEE Transactions on Software Engineering, vol.9, no.3, May 1983, pp.219-228.
- Singhal, M and Shivaratri, N., Advanced Concepts in Operating
Systems, McGraw-Hill 1994, pp.335-342.
Go to Operating Systems page
Go to Sreenu's page
Last updated: 29 Apr 1995 / sreenu@csgrad.cs.vt.edu