[Bernstein09] 9.2. Replicated Servers

来源:百度文库 编辑:神马文学网 时间:2024/05/01 00:24:10

9.2. Replicated Servers

The Primary-Backup Model

Tomaximize a server’s availability, we should try to maximize its meantime between failures (MTBF) and minimize its mean time to repair(MTTR). After doing the best we can at this, we can still expectperiods of unavailability. To improve availability further requiresthat we introduce some redundant processing capability by configuringeach server as two server processes: a primary server that is doing thereal work, and a backup server that is standing by, ready to take overimmediately after the primary fails (see Figure 9.1).The goal is to reduce MTTR: If the primary server fails, then we do notneed to wait for a new server to be created. As soon as the failure isdetected, the backup server can immediately become the primary andstart recovering to the state the former primary had after executingits last non-redoable operation, such as sending a message to anATM to dispense money. If it recovered to an earlier state, it wouldend up redoing the operation, which would be incorrect. Since we areinterested primarily in transactional servers, this means recovering toa state that includes the effects of all transactions that committed atthe former primary and no other transactions. For higher availability,more backup servers can be used to guard against the possibility thatthe primary and backup fail.

Figure 9.1. Primary-Backup Model. The primary server does the real work. The backup server is standing by, ready to take over after the primary fails.


Thistechnique is applicable to resource managers and to servers that runordinary applications, such as request controllers and transactionservers. When a server of either type fails, it needs to be recreated.Having a backup server avoids having to create the backup server atrecovery time.

Ifthere are many clients and some are connected by slow communicationlines, then it can take a long time to recreate sessions with thebackup server. To avoid doing this at recovery time, each clientconnected to the primary server should also have a backup communicationsession with the backup server. This further decreases (i.e., improves)MTTR.

Ingeneral, the degree of readiness of the backup server is a criticalfactor in determining MTTR. If a backup server is kept up to date sothat it is always ready to take over when the primary fails withpractically no delay, then it is called a hot backup.If it has done some preparation to reduce MTTR but still has asignificant amount of work to do before it is ready to take over fromthe primary, then it is called a warm backup. If it has done no preparation, then it is called a cold backup.

Asin the case of a server that has no backup, when the primary serverfails, some external agent, such as a monitoring process, has to detectthe failure and then cause the backup server to become the primary. Thedelay in detecting failures contributes to MTTR, so fast failuredetection is important for high availability.

Oncethe backup server has taken over for the failed primary, it may beworthwhile to create a backup for the new primary. An alternative is towait until the former primary recovers, at which time it can become thebackup. Then, if desired, the former backup (which is the new primary)could be told to fail, so that the original primary becomes primaryagain and the backup is restarted as the backup again. This restoresthe system to its original configuration, which was tuned to work well.The cost is a brief period of downtime while the secondary and primaryswitch roles.

Whentelling a backup to become the primary, some care is needed to avoidending up with two servers believing they’re the primary. For example,if the monitor process gets no response from the primary, it mayconclude that the primary is dead. But the primary may actually beoperating. It may just be slow because its system is overloaded (e.g.,a network storm is swamping its operating system), and it thereforehasn’t sent an “I’m alive” message in a long time, which the monitorinterprets as a failure of the primary. If the monitor then tells thebackup to become the primary, then two processes will be operating asprimary. If both primaries perform operations against the sameresource, they may conflict with each other and corrupt that resource.For example, if the resourceis a disk they might overwrite each other, or if the resource is acommunications line they may send conflicting messages.

Oneway to avoid ending up with two primaries is to require the primary toobtain a lock that only one process can hold. This lock could beimplemented in hardware as part of the resource. For example, somenetworking techniques, such as reflective memory, and most disksystems, such as SCSI and Fiber Channel (as it runs over SCSI), allow alock on a resource over their shared bus. Or it could be implemented insoftware using a global lock manager, which is supported by someoperating systems that are designed for multiserver clusters and asindependent components in some distributed systems. Another solution isto use a third “watchdog” process, which is described in Section 9.4, Primary Recovery with One Secondary.

Replicating the Resource

Aserver usually depends on a resource, typically a database. Whenreplicating a server, an important consideration is whether toreplicate the server’s resource too. The most widely-used approach toreplication is to replicate the resource (i.e., the database) inaddition to the server that manages it (see Figure 9.2).This has two benefits. First, it enables the system to recover from afailure of a resource replica as well as a failure of a server process.And second, by increasing the number of copies of the resource, itoffers performance benefits when access to the resource is thebottleneck. For example, the backup can do real work, such as processqueries, and not just maintain the backup replica so it can take overwhen there is a failure.

Figure 9.2. Replicating a Server and Its Resource. Both the server and the resource are replicated, which enables recovery from a resource failure.


Themain technical challenge in implementing this approach to replicationis to synchronize updates with queries and each other when theseoperations execute on different replicas. This approach of replicatingresources, and its associated technical challenges, is the main subjectof this chapter, covered in Sections 9.3 through 9.6

Replicating the Server with a Shared Resource

Anotherapproach is to replicate the server without replicating the resource,so that all copies of the server share the same copy of the resource(see Figure 9.3). This is useful in a configuration where processors share storage,such as a storage area network. In a primary-backup configuration, ifthe primary fails and the resource is still available, the backupserver on another processor can continue to provide service (see Figure 9.3a).

Figure 9.3. Replicated Server with Shared Resource. In(a), the primary and backup server share the resource, but only one ofthem uses the resource at any given time. In (b), many servers sharethe resource and can concurrently process requests that require accessto the resource.


Thisprimary-backup approach improves availability, but not performance. Ifthe server is the bottleneck and not the resource, then performance canbe improved by allowing multiple servers to access the resourceconcurrently, as shown in Figure 9.3b. This approach, often called data sharing,introduces the problem of conflicts between transactions that executein different servers and read and write the same data item in theresource. One solution is to partition the resource and assign eachpartition to one server. That way, each server can treat the partitionas a private resource and therefore use standard locking and recoveryalgorithms. If a server fails, its partition is assigned to anotherserver, like in the primary-backup approach. Another solution is toallow more than one server to access the same data item. This solutionrequires synchronization between servers and is discussed in Section 9.7.