[Bernstein09] 9.8. Summary

The main goal of replication is to improve availability, since a service is available even if some of its replicas are not. Replication can also improve response time, since the capacity of a set of replicated servers can be greater than the capacity of a single server.

The most widely used approach to replication is to replicate the resource (i.e., the database) in addition to the server that manages it. This requires synchronizing updates with queries and with each other when these operations execute on different replicas, so that the effects are indistinguishable from a nonreplicated system. The synchronization mechanism must allow for replicas, or communications between replicas, to be down for long periods. Communication failures are especially troublesome, since noncommunicating replicas may process conflicting updates that they are unable to synchronize until after they reconnect.

One popular approach to replication is to designate one replica as the primary copy and to allow update transactions to originate only at that replica. Updates on the primary are distributed and applied to other replicas, called secondaries, in the order in which they executed at the primary. Since all replicas process the same updates in the same order, the replicas converge toward the same state as the primary.
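
To make the mechanism concrete, here is a minimal Python sketch of a secondary applying the primary's update stream in order; the Update and Secondary structures are illustrative, not from the book:

```python
from dataclasses import dataclass

@dataclass
class Update:
    seq_no: int    # position in the primary's update stream
    item: str      # identifier of the modified data item
    value: object  # new value produced at the primary

class Secondary:
    def __init__(self):
        self.store = {}        # this replica's copy of the database
        self.last_applied = 0  # seq_no of the last update applied here

    def apply(self, update: Update):
        # Updates must be applied in exactly the order they executed at
        # the primary; a gap means the stream arrived out of order.
        assert update.seq_no == self.last_applied + 1, "gap in update stream"
        self.store[update.item] = update.value
        self.last_applied = update.seq_no
```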

The stream of updates sent from the primary can be quite large, so it is worth minimizing its size by only including data items that are modified and by filtering out aborted transactions. The stream can be generated by processing the resource manager’s log or by using triggers to generate the update stream directly from updates on the primary copy.
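
A rough sketch of the filtering step, assuming a simplified log record layout (the dictionaries below are illustrative): scan the log, keep only the writes of committed transactions, and let later writes to an item supersede earlier ones, yielding a compact set of final values to ship:

```python
def update_stream(log_records):
    # Transactions with a commit record; everything else is filtered out.
    committed = {r["txn"] for r in log_records if r["type"] == "commit"}
    writes = {}  # item -> latest committed value
    for r in log_records:
        if r["type"] == "write" and r["txn"] in committed:
            writes[r["item"]] = r["value"]  # later writes overwrite earlier ones
    return writes

log = [
    {"type": "write", "txn": 1, "item": "x", "value": 10},
    {"type": "write", "txn": 2, "item": "y", "value": 20},
    {"type": "commit", "txn": 1},
    # txn 2 never commits, so its write to y is dropped from the stream
]
assert update_stream(log) == {"x": 10}
```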

An alternative to propagating updates is to send the original transaction requests to all secondaries and ensure that the transactions execute in the same order at all secondaries and the primary, either by physically running them in that order, which is slow, or by synchronizing their execution between primary and secondaries, which can be tricky.
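
A sketch of this alternative, assuming transactions are deterministic functions of the database state (all names here are illustrative):

```python
def run_in_order(requests, state):
    # requests: (seq_no, transaction) pairs; all replicas agree on seq_nos
    for _, txn in sorted(requests, key=lambda r: r[0]):
        txn(state)  # deterministic, so every replica computes the same state
    return state

def transfer(db):
    db["a"] -= 5
    db["b"] = db.get("b", 0) + 5

# Two replicas given the same requests in the same order converge.
r1 = run_in_order([(1, transfer)], {"a": 100})
r2 = run_in_order([(1, transfer)], {"a": 100})
assert r1 == r2 == {"a": 95, "b": 5}
```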

In any case, when a secondary fails and subsequently recovers, it must catch up by processing the updates produced by the primary while it was down. If the primary fails, the remaining secondaries must elect a new primary and ensure it has the most up-to-date view of the updates that executed before the primary failed.
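
Catch-up then amounts to replaying the portion of the primary's update stream past the secondary's last applied sequence number. This sketch reuses the illustrative Secondary class above and assumes the primary retains its old updates:

```python
def catch_up(secondary, primary_log):
    # primary_log: all retained Update records, ordered by seq_no
    for update in primary_log:
        if update.seq_no > secondary.last_applied:
            secondary.apply(update)
```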

When a primary or secondary fails, the remaining replicas must check that they have a majority or quorum of copies, to ensure that they are the only group of communicating replicas. For if there were two partitions of replicas that could communicate within the partition but not between partitions, then the two partitions could process conflicting updates that would be hard to reconcile after the groups were reunited.
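
The quorum test itself is simple, as this sketch shows: a group may continue only if it holds a strict majority of the copies, so no two disjoint groups can both qualify:

```python
def has_quorum(reachable, total):
    # A strict majority: two disjoint groups cannot both satisfy this.
    return len(reachable) > total // 2

assert has_quorum({"r1", "r2"}, total=3)  # 2 of 3 may proceed
assert not has_quorum({"r3"}, total=3)    # the minority partition must wait
```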

Sometimes partitioning is a planned and frequent event, as with laptop computers that contain replicas but are only periodically connected to the network. This requires that every partition be allowed to process updates, allowing for multiple masters, not just one primary. Some variation of Thomas’ write rule is often used for these situations: each data item is tagged with the timestamp of the latest update to it. An update is applied only if its timestamp is larger than the data item’s tag in the database. That way, updates can arrive in different orders, sometimes with long delays, yet the replicas will all eventually have the same value, namely the one produced by the update with the largest timestamp.
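
A minimal sketch of Thomas’ write rule (the database layout is illustrative): each item stores its value together with the timestamp of the latest applied update, and a stale update is silently discarded:

```python
def apply_update(db, item, value, ts):
    _, current_ts = db.get(item, (None, -1))
    if ts > current_ts:
        db[item] = (value, ts)  # the update with the larger timestamp wins
    # otherwise discard: a later-timestamped update has already been applied

replica = {}
apply_update(replica, "x", "new", ts=7)
apply_update(replica, "x", "old", ts=3)  # arrives late and is dropped
assert replica["x"] == ("new", 7)
```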

The problem with this approach is that an update can be lost if it’s overwritten by another update with a larger timestamp that didn’t see the output of the earlier update. One way to avoid this problem is to use version vectors in place of timestamps. Each version vector tells which updates were received by the replica before producing the current version of the data item. This enables more accurate conflict detection at the cost of more information attached to each data item. An optimization used in Microsoft Sync Framework avoids this per-data-item version vector in most cases, but still requires version vectors for data items involved in a conflict or received out of order.
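
A sketch of the comparison that version vectors enable (the representation is illustrative): if neither vector dominates the other, the two versions were produced concurrently and are flagged as a conflict rather than one silently overwriting the other:

```python
def dominates(v1, v2):
    # v1 reflects at least every update that v2 reflects.
    return all(v1.get(r, 0) >= v2.get(r, 0) for r in set(v1) | set(v2))

def compare(v1, v2):
    if dominates(v1, v2):
        return "v1 supersedes v2"
    if dominates(v2, v1):
        return "v2 supersedes v1"
    return "conflict"  # concurrent: neither saw the other's update

assert compare({"A": 2, "B": 1}, {"A": 1, "B": 1}) == "v1 supersedes v2"
assert compare({"A": 2, "B": 0}, {"A": 1, "B": 1}) == "conflict"
```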

The CAP conjecture says that a system can offer at most two of the following three properties: data consistency, system availability, and tolerance to network partitions. The primary-copy approach with synchronous replication ensures data consistency and partition-tolerance. It gives up on the availability of replicas that are outside the quorum. Asynchronous replication gives up on instantaneous consistency, ensuring eventual consistency instead, which improves availability further in some cases. Multimaster replication offers availability and partition-tolerance at the cost of data consistency.

The primary copy and multimaster algorithms described here are the ones used most widely in practice. However, since replication has been much studied by database researchers, there are many other published algorithms beyond the ones in this chapter.

Another form of replication is data sharing, where data manager replicas share access to a common resource, such as a database. Since two data managers can access the same page of the database, some synchronization is needed between the data managers. This is usually done with a global lock manager that is accessible to all data managers. A data manager sets a global lock before operating on a page. If it updates a page, then it flushes the page to stable storage before releasing the lock. This ensures the next data manager that reads the page will see the latest version.

Synchronization is also needed to enable all the data managers to write to the shared log. This can be done with a global log server. Data managers call the log server to append records to the log. Alternatively, a log space server can be used to allocate log pages to each data manager, which can then write to those log pages without further synchronization.
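
A minimal sketch of the page-locking protocol, with a threading.Lock standing in for the global lock manager and a dictionary standing in for stable storage; the essential point is that the flush completes before the lock is released:

```python
import threading

page_locks = {"p1": threading.Lock()}  # stand-in for a global lock manager
stable_storage = {"p1": 0}             # stand-in for the shared database

def update_page(page_id, new_value):
    with page_locks[page_id]:                # set the global lock on the page
        stable_storage[page_id] = new_value  # "flush" while still holding it
    # The lock is released only after the write reached stable storage,
    # so the next data manager to lock the page reads the latest version.

update_page("p1", 42)
assert stable_storage["p1"] == 42
```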