[Bernstein09] Section 4.3. Client Recovery

4.3. Client Recovery
An important reason to use queuing instead of direct TP is to address certain client and server failure situations. In this section, we systematically explore the various failure situations that can arise. We do this from a client's perspective, to determine what a client should do in each case.
We will assume the request-reply model of Figure 4.5. That is, a client runs Transaction 1 to construct and submit a request, and later runs Transaction 3 to receive and process the reply. Its goal is exactly-once behavior: Transaction 2 executes exactly once, and its reply is processed in Transaction 3 exactly once.
Let us assume that there is no failure of the client, of the communications between the client and the queues, or of the queues themselves. In this case, the client's behavior is straightforward. It submits a request. Since there are no failures between the client and the request queue, the client receives an acknowledgment that the request was successfully enqueued. The client then waits for a reply. If it waits too long, then there is presumably a problem with the server (it is down, disconnected, or busy), and the client can take appropriate action, such as sending a message to a system administrator. The important point is that there is no ambiguity about the state of the request: it is either in the request queue, in the reply queue, or being processed.
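In the failure-free case, the client's logic reduces to "submit, wait with a timeout, escalate if the wait is too long." The sketch below assumes Python's in-process `queue.Queue` as a stand-in for the request and reply queues (a real queue manager would be persistent and usually remote); the `submit_and_wait` function, the `notify_admin` callback, and the timeout value are hypothetical.

```python
import queue


def submit_and_wait(request, request_queue, reply_queue, timeout_s, notify_admin):
    """Failure-free client: enqueue the request, then wait for its reply."""
    # With no failures, the enqueue succeeds, so from now on the request is
    # in the request queue, being processed, or answered in the reply queue.
    request_queue.put(request)
    try:
        return reply_queue.get(timeout=timeout_s)
    except queue.Empty:
        # Waiting too long suggests the server is down, disconnected, or busy.
        notify_admin(f"no reply to {request!r} after {timeout_s}s")
        return None
```

The timeout only signals a problem worth reporting; it does not make the request's state ambiguous, because nothing between the client and the queues has failed.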
Suppose the client fails or loses connectivity to the queues, or the queues fail. This could happen for a variety of reasons, such as the failure of the client application or machine, the failure of the machine that stores the queues, a network failure, or a burst of traffic that overloads one of these components and makes it unresponsive because of processing delays. At some point, the failed or unresponsive components recover and run normally again, so the client can communicate with the queues. At this point the client needs to run recovery actions to resynchronize with the queues. What exactly should it do?
To keep things simple, let's assume that the client processes one request at a time. That is, it processes the reply to each request before it submits another request, so it has at most one request outstanding. In that case, at the time the client recovers, the last request it submitted is in one of four possible states:
A. Transaction 1 did not run and commit. Either it didn't run at all, or it aborted. Either way, the request was not submitted. The client should resubmit the request (if possible) or else continue with a new request.
B. Transaction 1 committed but Transaction 2 did not. So the request was submitted, but it hasn't executed yet. The client must wait until the reply is produced and then process it.
C. Transaction 2 committed but Transaction 3 did not. The request was submitted and executed, but the client hasn't processed the reply yet. The client can process the reply right away.
D. Transaction 3 committed. The request was submitted and executed, and the client already processed the reply. So the client's last request is done, and the client can continue with a new request.
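The four states are determined by the latest of the three transactions to have committed. This sketch just encodes that case analysis; it assumes the transactions commit in order (Transaction 2 cannot commit before Transaction 1, nor Transaction 3 before Transaction 2), and the `RequestState` and `state_of` names are illustrative.

```python
from enum import Enum


class RequestState(Enum):
    A = "not submitted: resubmit the request or start a new one"
    B = "submitted but not executed: wait for the reply, then process it"
    C = "executed, reply not processed: process the reply now"
    D = "reply processed: continue with a new request"


def state_of(t1_committed: bool, t2_committed: bool, t3_committed: bool) -> RequestState:
    # Transactions commit in order, so the latest committed one decides the state.
    if t3_committed:
        return RequestState.D
    if t2_committed:
        return RequestState.C
    if t1_committed:
        return RequestState.B
    return RequestState.A
```

After a failure, the client cannot observe these three commit flags directly, which is why it needs the additional state information discussed next.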
To determine what recovery action to take, the client needs to figure out which of the four states it is in.
If each client has a private reply queue, it can make some headway in this analysis. Since the client processes one request at a time, the reply queue is either empty or has one reply in it. So, if the reply queue is nonempty, then the system must be in state C, and the client should go ahead and process the reply. If not, it could be in state A, B, or D.
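Looking only at a private reply queue narrows the possibilities but does not resolve them. A minimal sketch, assuming at most one outstanding request; the `possible_states` function is hypothetical.

```python
def possible_states(reply_queue_nonempty: bool) -> set:
    # With one outstanding request, the private reply queue holds 0 or 1 replies.
    # A queued reply means Transaction 2 committed but Transaction 3 did not.
    if reply_queue_nonempty:
        return {"C"}
    # An empty queue fits A (never submitted), B (not yet executed),
    # or D (reply already dequeued and processed).
    return {"A", "B", "D"}
```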
To disambiguate these states, some additional state information needs to be stored somewhere. If the client has access to persistent storage that supports transaction semantics, it can use that storage for state information. The client marks each request with a globally unique identifier (ID) and stores the request in persistent storage before enqueuing it in the request queue (see LastRequest in Transaction 0 in Figure 4.6). In persistent storage the client also keeps the IDs of the last request it enqueued and the last reply it dequeued, denoted LastEnqueuedID and LastDequeuedID, respectively. It updates these IDs as part of Transactions 1 and 3, which enqueue a request and dequeue a reply, as shown in Figure 4.6. In that figure, the expression R.ID denotes the ID of request R.
Figure 4.6. Client Maintains Request State. The client stores the ID of the last request it enqueued and the last reply it dequeued, in Transactions 1 and 3, respectively.
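The client-side transactions described above can be sketched as follows. This is not the figure's code but a minimal Python sketch under several assumptions: the hypothetical `ClientSystem` class stands in for both the client's transactional storage and the queues, atomicity is simulated by restoring a snapshot on abort (a real system would use two-phase commit across the two resource managers), and each reply carries the ID of the request it answers.

```python
import copy
import uuid
from contextlib import contextmanager


class ClientSystem:
    """Hypothetical stand-in for the client's transactional storage plus the
    request and reply queues, updated atomically within one transaction."""

    def __init__(self):
        self.store = {"LastRequest": None, "LastEnqueuedID": None, "LastDequeuedID": None}
        self.request_queue = []
        self.reply_queue = []

    @contextmanager
    def transaction(self):
        snapshot = copy.deepcopy((self.store, self.request_queue, self.reply_queue))
        try:
            yield
        except Exception:
            # Abort: roll back every update made inside the transaction.
            self.store, self.request_queue, self.reply_queue = snapshot
            raise


def transaction0(system, body):
    # Construct the request, tag it with a globally unique ID, and save it.
    with system.transaction():
        system.store["LastRequest"] = {"ID": str(uuid.uuid4()), "body": body}


def transaction1(system):
    # Enqueue the saved request and record its ID in the same transaction.
    with system.transaction():
        r = system.store["LastRequest"]
        system.request_queue.append(dict(r))
        system.store["LastEnqueuedID"] = r["ID"]


def transaction3(system, process):
    # Dequeue the reply, process it, and record its ID in the same transaction.
    with system.transaction():
        reply = system.reply_queue.pop(0)
        process(reply)
        system.store["LastDequeuedID"] = reply["ID"]
```

Because the dequeue and the update of LastDequeuedID commit or abort together, a crash inside Transaction 3 leaves the reply in the queue and LastDequeuedID unchanged, so the two can never disagree.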

At recovery time, the client reads LastRequest, LastEnqueuedID, and LastDequeuedID from persistent storage. It uses them to analyze the state of LastRequest as follows:
If LastRequest.ID ≠ LastEnqueuedID, then the system must be in state A. That is, the last request that the client constructed was not successfully submitted to the request queue. Either the client failed before running Transaction 1, or Transaction 1 aborted because of the client failure or some other error. The client can either resubmit the request or delete it, depending on the behavior expected by the end user.
If LastRequest.ID = LastDequeuedID, then the client dequeued (and presumably processed) the reply to the last request the client submitted, so the system is in state D. In this case, the request ID has helped the client match up the last request with its reply, in addition to helping it figure out which state it is in.
If the reply queue is nonempty, the client should dequeue the reply and process it (i.e., state C). Notice that in this case, LastRequest.ID = LastEnqueuedID and LastRequest.ID ≠ LastDequeuedID, so the previous two cases do not apply.
Otherwise, the client should wait until the reply appears before dequeuing it (i.e., state B).
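The four recovery rules can be expressed as a single decision function. The sketch assumes one outstanding request, a private reply queue, and replies tagged with the ID of their request; the `recover` name and its parameters are hypothetical. The order of the tests matters: state C is only checked after the first two rules have ruled out A and D.

```python
def recover(last_request_id, last_enqueued_id, last_dequeued_id, reply_queue_nonempty):
    """Classify the last request's state from the client's persistent state."""
    if last_request_id != last_enqueued_id:
        return "A"  # Transaction 1 never committed: resubmit or delete the request
    if last_request_id == last_dequeued_id:
        return "D"  # reply already dequeued: continue with a new request
    if reply_queue_nonempty:
        return "C"  # dequeue and process the reply now
    return "B"      # enqueued but not yet executed: wait for the reply
```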
This recovery procedure assumes that the client uses a persistent storage system that supports transaction semantics. This is a fairly strong assumption. The client may not have such storage available. Even if it does, the application developer may want to avoid using it for performance reasons: since the queue manager and the persistent storage are independent resource managers, Transactions 1 and 3 need the two-phase commit protocol, which incurs some cost.
This cost can be avoided by storing the state information in the queue manager itself, so that Transactions 1 and 3 involve only one resource manager and need no two-phase commit. For example, the client could store LastEnqueuedID and LastDequeuedID in a separate queue dedicated to this purpose. Alternatively, the queue manager could maintain LastEnqueuedID and LastDequeuedID as the state of a persistent session between the client and the queue manager. The client signs up with the queue manager by opening a session. The session information is recorded in the queue manager's persistent storage, so the queue manager can remember that the client is connected. If the client loses connectivity with the server and later reconnects, the queue manager remembers that it already has a session with the client, because it maintains that information in persistent storage. So when the client attempts to reconnect, the system re-establishes the existing session. Since the session state includes the request and reply IDs, the client can ask for them as input to its recovery activity.
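The persistent-session approach can be sketched as a queue manager that keeps per-client session records in its own storage and returns the existing record when a client reconnects. This is a minimal illustration, not a real product's API: the `QueueManager` class and its methods are hypothetical, and a JSON file written via a temporary file, `fsync`, and an atomic rename stands in for the queue manager's transactional storage.

```python
import json
import os


class QueueManager:
    """Hypothetical queue manager that keeps per-client session state
    (LastEnqueuedID, LastDequeuedID) in its own persistent storage."""

    def __init__(self, path):
        self.path = path
        self.sessions = {}
        if os.path.exists(path):
            with open(path) as f:
                self.sessions = json.load(f)

    def _persist(self):
        # Write to a temporary file, force it to disk, then atomically
        # rename it over the old file so the stored sessions are never half-written.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.sessions, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)

    def open_session(self, client_id):
        # Reconnecting re-establishes the existing session instead of creating a new one.
        if client_id not in self.sessions:
            self.sessions[client_id] = {"LastEnqueuedID": None, "LastDequeuedID": None}
            self._persist()
        return self.sessions[client_id]

    def record_enqueue(self, client_id, request_id):
        # In a real queue manager, this update would be part of the same
        # local transaction that enqueues the request.
        self.sessions[client_id]["LastEnqueuedID"] = request_id
        self._persist()

    def record_dequeue(self, client_id, request_id):
        # Likewise part of the transaction that dequeues the reply.
        self.sessions[client_id]["LastDequeuedID"] = request_id
        self._persist()
```

After a restart, the reconnecting client calls `open_session` again and feeds the returned IDs into the same recovery analysis it would otherwise have run against its own storage.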
The recovery scenario that we just described is based on the assumption that the client waits for a reply to each request before submitting another one. That is, the client never has more than one request outstanding. What if this assumption doesn't hold? In that case, it is not enough for the system to maintain the ID of the last request enqueued and the last reply dequeued. Rather, it needs to remember enough information to help the client resolve the state of all outstanding requests. For example, it could retain the ID of every request that has not been processed and the IDs of the last n replies the client has dequeued. Periodically, the client can tell the queue manager the IDs of recently dequeued replies for which it has a persistent record, thereby freeing the queue manager from maintaining that information. Many variations of this type of scheme are possible.
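One such variation can be sketched as follows. The hypothetical `OutstandingTracker` keeps the IDs of requests whose replies have not yet been dequeued (one interpretation of "not been processed") and a bounded window of the last n dequeued reply IDs, and it forgets replies the client acknowledges. All names are illustrative assumptions.

```python
from collections import deque


class OutstandingTracker:
    """Hypothetical queue-manager bookkeeping for a client with many
    requests outstanding."""

    def __init__(self, n):
        self.unprocessed = set()                # enqueued, reply not yet dequeued
        self.recent_dequeued = deque(maxlen=n)  # oldest IDs drop off automatically

    def on_enqueue(self, request_id):
        self.unprocessed.add(request_id)

    def on_dequeue(self, request_id):
        self.unprocessed.discard(request_id)
        self.recent_dequeued.append(request_id)

    def acknowledge(self, request_ids):
        # The client has recorded these replies persistently, so the
        # queue manager no longer needs to remember them.
        acked = set(request_ids)
        self.recent_dequeued = deque(
            (r for r in self.recent_dequeued if r not in acked),
            maxlen=self.recent_dequeued.maxlen,
        )

    def status(self, request_id):
        if request_id in self.unprocessed:
            return "outstanding"     # state B or C: check the reply queue
        if request_id in self.recent_dequeued:
            return "reply dequeued"  # state D
        return "unknown"             # never enqueued, or already forgotten
```

The bound n limits the queue manager's memory; anything that falls out of the window must be resolved from the client's own persistent records.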
This scenario assumes that after a client processes a reply, it no longer needs to know anything about that request's state. For example, suppose a client runs two requests. It submits Request1, the server processes Request1 and sends Reply1, and the client processes Reply1. Then the client submits Request2, the server processes Request2 and sends Reply2, and the client processes Reply2. At this point, the client can find out about the state of Request2, but not about Request1, at least not by using the recovery procedure just described.
Finding out the state of old requests is clearly desirable functionality. Indeed, it's functionality that we often depend on in our everyday lives, such as finding out whether we paid for a shipment that hasn't arrived or whether we were credited for mileage on an old flight. However, this functionality usually is not offered by a queuing system or by queued transaction protocols like the ones we have been discussing. Rather, if it is offered, it needs to be supported by the application as another transaction type: a lookup function for old requests. To support this type of lookup function, the application needs to maintain a record of requests that it has already processed. In financial systems, these records are needed in any case, to support the auditability required by accounting rules. However, even when they're not required, they're often maintained as a convenience to customers.
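An application-level lookup of this kind might look like the following minimal sketch. The `RequestLog` class is hypothetical, and an in-memory dictionary stands in for a persistent table. In a real system, `record` would run inside the transaction that executes the request, so that the record and the request's effects commit together.

```python
class RequestLog:
    """Hypothetical application-maintained record of executed requests."""

    def __init__(self):
        self._records = {}

    def record(self, request_id, outcome):
        # Assumed to run inside the transaction that executes the request.
        self._records[request_id] = outcome

    def lookup(self, request_id):
        # A separate transaction type for old requests; None means the
        # application has no record of the request.
        return self._records.get(request_id)
```

With such a log, the client in the earlier example can still ask about Request1 after it has processed Reply2, which the queue-based recovery procedure alone cannot answer.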