[Bernstein09] 9.3. Synchronizing Updates to Replicated Data

来源:百度文库 编辑:神马文学网 时间:2024/05/01 03:35:50

9.3. Synchronizing Updates to Replicated Data

One-Copy Serializability

Onpossible goal of replication is to have replicas behave functionallylike nonreplicated servers. This goal can be stated precisely by theconcept of one-copy serializability, which extends the concept ofserializability to a system where multiple replicas are present. Anexecution is one-copy serializableif it has the same effect as a serial execution on a one-copy database.We would like a system to ensure that its executions are one-copyserializable. In such a system, the user is unaware that data isreplicated.

In asystem that produces serializable executions, what can go wrong thatwould cause it to violate one-copy serializability? The answer issimple, though perhaps not obvious: a transaction might read a copy ofa data item, say x, that was not written by the last transaction that wrote other copies of x. For example, consider a system that has two copies of x, stored at locations A and B, denoted xA and xB. Suppose we express execution histories using the notation of Section 6.1, where r, w, and crepresent read, write, and commit operations, respectively, andsubscripts are transaction identifiers. Consider the followingexecution history:

H = r1[xA] w1[xA] w1[xB] c1 r2[xB] w2[xB] c2 r3[xA] w3[xA] w3[xB]c1

This is a serial execution. Each transaction reads just one copy of x; since the copies are supposed to be identical, any copy will do. The only difficulty with it is that transaction T2 did not write into copy xA. This might have happened because copy xA was unavailable when T2 executed. Rather than delaying the execution of T2 until after xA recovered, the system allowed T2 to finish and commit. Since we see r3[xA] executed after c2, apparently xA recovered before T3 started. However, r3[xA] read a stale value of xA, the one written by T1, not T2.

When xA recovered, it should have been refreshed with the newly updated value of x that is stored in xB. However, we do not see a write operation into xA after T2 committed and before r3[xA] executed. We therefore conclude that when r3[xA] executed, xA still had the value that T1 wrote.

Clearly, the behavior of H is not what we would expect in a one-copy database. In a one-copy database, T3 would read the value of x written by T2, not T1. There is no other serial execution of T1, T2, and T3that has the same effect as H. Therefore, H does not have the sameeffect as any serial execution on a one-copy database. Thus, it is notone-copy serializable.

One obvious implication of one-copy serializability is that each transaction that writes into a data item x should write into all copies of x.However, when replication is used for improved availability, this isn’talways possible. The whole point is to be able to continue to operateeven when some copies are unavailable. Therefore, the not-so-obviousimplication of one-copy serializability is that each transaction thatreads a data item x must read a copy of x that was written by the most recent transaction before it that wrote into any copy of x. This sometimes requires careful synchronization.

Still,during normal operation, each transaction’s updates should be appliedto all replicas. There are two ways to arrange this: replicate updateoperations or replicate requests. In the first case, each requestcauses one transaction to execute. That transaction generates updateoperations, each of which is applied to all replicas. In the secondcase, the request message is sent to all replicas and causes a separatetransaction to execute at each replica. We discuss each case, in turn.

Replicating Updates

There are two approaches to sending a transaction’s updates to replicas: synchronous and asynchronous. In the synchronous approach, when a transaction updates a data item, say x, the update is sent to all replicas of x.These updates of the replicas execute within the context of thetransaction. This is called synchronous because all replicas are, ineffect, updated at the same time (see Figure 9.4a).Although sometimes this is feasible, often it is not, because itproduces a heavy distributed transaction load. In particular, itimplies that all transactions that update replicated data have to usetwo-phase commit, which entails significant communications cost.

Figure 9.4. Synchronous vs. Asynchronous Replication. Insynchronous replication, each transaction updates all copies at thesame time. In asynchronous replication, a transaction updates only onereplica immediately. Its updates are propagated to the other replicaslater.


Fortunately, looser synchronization can be used, which allows replicas to be updated independently. This is called asynchronous replication, where a transaction directly updates one replica and the update is propagated to other replicas later (see Figure 9.4b).

Asynchronousupdates from different transactions can conflict. If they are appliedto replicas in arbitrary orders, then the replicas will not beidentical. For example, suppose transactions T1 and T2 update x, which has copies xA and xB. If T1 updates xA before T2, but T1 updates xB after T2, then xA and xBend up with different values. The usual way to avoid this problem is toensure that the updates are applied in the same order to all replicas.By executing updates in the same order, all replicas go through thesame sequence of states. Thus, each query (i.e., read-only transaction)at any replica sees a state that could have been seen at any otherreplica. And if new updates were shut off and all in-flight updateswere applied to all replicas, the replicas wouldbe identical. Therefore, users working with one replica see the samebehavior that they would see with any other replica. In this sense, allreplicas behave exactly the same way.

Applyingupdates in the same order to all replicas requires somesynchronization. This synchronization can degrade performance, becausesome operations are delayed until other operations have time tocomplete. Much of the complexity in replication comes from cleversynchronization techniques that minimize this performance degradation.

Whethersynchronous or asynchronous replication is used, applying updates toall replicas is sometimes impossible, because some replicas are down.The system could stop accepting updates when this happens, but this israrely acceptable since it decreases availability. If some replicas docontinue processing updates while other replicas are down, then whenthe down replicas recover, some additional work is needed to recoverthe failed replicas to a satisfactory state. Some of the complexity inreplication comes from ways of coping with unavailable servers andhandling their recovery.

Replicas can be down either because a system has failed or because communication has failed (see Figure 9.5).The latter is more dangerous, because it may lead to two or moreindependently functioning partitions of the network, each of whichallows updates to the replicas it knows about. If a resource hasreplicas in both partitions,those replicas can be independently updated. When the partitions arereunited, they may discover they have processed incompatible updates.For example, they might both have sold the last item from inventory.Such executions are not one-copy serializable, since it could not bethe result of a serial execution on a one-copy database. There are twosolutions to this problem. One is to ensure that if a partition occurs,only one partition is allowed to process updates. The other is to allowmultiple partitions to process updates and to reconcile theinconsistencies after the partitions are reunited—something that oftenrequires human intervention.

Figure 9.5. Node and Communications Failures. Replica1, Replica 2, and Replica 3 are connected by a network. In (a), Replica3 fails. In (b), the connection to Replica 3 fails. Replica 1 andReplica 2 cannot distinguish these two situations, yet the system’sbehavior is quite different.


Circumventingthese performance and availability problems usually involvescompromises. To configure a system with replicated servers, one mustunderstand the behavior of the algorithms used for update propagationand synchronization. These algorithms are the main subject of thischapter.

Replicating Requests

An alternative to sending updates to all replicas is to send the requests to run the original transactions to all replicas (see Figure 9.6).To ensure that all the replicas end up as exact copies of each other,the transactions should execute in the same order at all replicas.Depending on the approach selected, this is either slow or tricky. Aslow approach is to run the requests serially at one replica and thenforce the requests to run in the same order as the other replicas. Thisensures they run in the same order at all replicas, but it allows noconcurrency at each replica and therefore would be an inefficient useof each replica’s resources.

Figure 9.6. Replicating Requests. Eachtransaction runs independently against each replica. In both cases,conflicting updates must be applied in the same order against allreplicas.


Thetrickier approach is to allow concurrency within each replica and usesome fancy synchronization across replicas to ensure that timingdifferences at the different replicas don’t lead to different executionorders at different replicas. For example, in HP’s Reliable TransactionRouter (RTR), a replicated request can be executed at two or morereplicas concurrently as a single distributed transaction. Since itruns as a transaction, it is serialized with respect to otherreplicated requests (which also run as transactions). It therefore canexecute concurrently with other requests. Transaction synchronization(e.g., locking) ensures that the requests are processed in the sameorder at all replicas. As usual, transaction termination issynchronized using two-phase commit. However, unlike ordinary two-phasecommit, if one of the replicas fails while a transaction is beingcommitted, theother continues running and commits the transaction. This is useful incertain applications, such as securities trading (e.g., stock markets),where the legal definition of fairness dictates that transactions mustexecute in the order they were submitted, so it is undesirable to aborta transaction due to the failure of a replica.

Replicatingupdates is a more popular approach than replicating requests, by far.Therefore, we will focus on that approach for the rest of this chapter.