Tour Web Services Atomic Transaction operations
Beginner's guide to classic transactions, data recovery, and mapping to WS-AtomicTransaction

Level: Introductory
Thomas Freund, Senior Technical Staff Member, IBM
Daniel House (dhouse@us.ibm.com), Senior Technical Staff Member, IBM
02 Sep 2004
Explore how transactions work in one common and classic form to preserve data integrity, and apply that classical transaction description to the operations of the new Web Services Atomic Transactions (WS-AT) and related Web Services Coordination (WS-C) specifications. Mapping classical to Web services transactions helps you discover that Web Services Atomic Transactions embodies age-old common industry best practices for one kind of transaction.
Whereas classical transaction processing most often occurred using non-universal means and interoperated in an ad hoc manner, if at all, the new WS-AT embodiment is based on widely accepted interoperability standards such as XML and WSDL. Using Web services mechanisms results in better flexibility and interoperability.
Most people know that transactions are vital to any business. Fewer people know how transaction processing works behind the scenes. In this paper, we illustrate how transactions work using a simple example that addresses, and provides a solution for, a common transactional problem: losing a customer's money. First we illustrate the problem using a classic style of transaction processing, then we map the old onto a new Web services-based mechanism with new advantages in flexibility and interoperability.

To really understand how WS-AtomicTransaction defines transactional exchanges, you need to understand Web services and various XML technologies that underlie Web services. For example, it helps to understand how WSDL (Web Services Description Language) is used to describe and publish Web services in a standard way.
However, trying to describe XML, WSDL, and Web services to talk about transactions is like describing chemistry to talk about cooking. So this article ignores much of the underlying technology and focuses on transactions, data recovery, and mapping the classical concepts to their equivalents in WS-Coordination and WS-AtomicTransaction.
Not losing money is quite important. Just ask Waldo. Waldo's situation typifies the need for a transaction. Waldo uses an ATM (or a browser) to move some money from one account to another account. These accounts may be in different branches of the same financial institution, or may be in different institutions.
It is never acceptable to Waldo for his money to disappear. Should Waldo ever doubt the safety of his money, he would probably switch financial institutions.
Waldo's money is represented by data in two databases that cooperate to ensure that the data they contain are always in a known and consistent state. That is, these two databases allow actions or tasks between them to be within a common activity or work scope (see Figure 1). Put yet another way, a single transaction can manipulate data in both databases and something will guarantee that only one of two possible outcomes occurs: all the changes are successfully made, or none of the changes are made at all.

The something that guarantees the common outcome of all the actions is a protocol supported by both databases, and some supporting middleware. The protocol the databases use to keep data (such as Waldo's balances) coordinated is called two-phase commit, or simply 2PC. Our example uses a common variation of 2PC called presumed abort, where the default behavior in the absence of a successful outcome is to roll back, or undo, all actions in the activity.
From a programming perspective, there are different ways to specify that multiple actions should be within the scope of a single transaction. One particularly clear way to specify transactional behavior is shown in Listing 1. The code is the small piece of logic running somewhere behind the ATM Waldo is using -- perhaps within the datacenter of one of the financial institutions involved.
Clearly a lot is left out of Listing 1. We are only going to show enough to use this example to illustrate the 2PC protocol being used to coordinate two actions: taking money out of one account and putting it in another account.
TransferCash(fromAcct, toAcct, amount)
    BeginTransaction
    fromAcct -= amount
    toAcct += amount
    CommitTransaction
    Return
Another way to specify that a transaction is needed is to use J2EE's Container Managed Transactions (CMT), but we use the code in Listing 1 for now because it is a clear and easy match for Waldo's simple transaction. With J2EE CMT, the BeginTransaction and CommitTransaction are implied automatically, without explicit lines of code.
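The intent of Listing 1 can also be shown as runnable code. Here is a minimal single-database sketch in Python using the standard sqlite3 module; the accounts table and its columns are hypothetical. A single database connection gives the all-or-nothing guarantee by itself -- Waldo's real transfer spans two databases, which is exactly why the 2PC protocol described in this article is needed.

```python
import sqlite3

def transfer_cash(conn, from_acct, to_acct, amount):
    # BeginTransaction ... CommitTransaction from Listing 1:
    # `with conn` commits if the body succeeds and rolls back if it
    # raises, so both balance changes happen or neither does.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?",
            (amount, from_acct))
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (amount, to_acct))
```

Any exception inside the with block (including Waldo cancelling) undoes both updates, which is the rollback outcome of the classic transaction.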


Abstractly, a classic transaction is just a grouping of recoverable actions, the guaranteed outcome of which is that either all the actions are taken, or none of them are taken (see Figure 1). For our simple purposes, a recoverable action is anything that modifies protected data. For example, taking money out of one of Waldo's accounts (fromAcct -= amount) is a recoverable action that can be reversed up to the end of the transaction.
In Waldo's case, his transaction comprises two actions: taking money out of one account and putting money into another account. It's okay for both of these actions to occur, and it's even okay if neither of these actions occurs. It's never okay for one action to occur without the other also occurring, which would result in corrupt data and either Waldo's net worth or the bank's assets disappearing or appearing from nowhere. Hence, both actions need to be within a single transaction with a single outcome: either both actions occur (a commit outcome), or neither action occurs (a rollback outcome).
Assuming no errors happen, the code in Listing 1 shows that a commit outcome is desired. The code could just as easily have specified rollback instead of commit (for when Waldo cancels the operation at the ATM), which means reverse all actions in the transactional work scope (between beginning and end). The transaction monitor, which is the underlying middleware helping the code in Listing 1 support transaction processing, would automatically specify rollback if the program suffered an unhandled exception. Such an automatic rollback on the part of the transaction monitor is a protection mechanism to make sure that data is not corrupted -- for example, even if the ATM application fails unexpectedly, the middleware will "clean up" and guarantee the outcome.
For this introductory paper, we ignore truly catastrophic events, such as one of Waldo's banks being entirely swallowed into a sinkhole, thereby precluding final outcome processing of his transaction. Really bad events, which are orders of magnitude less common than regular outcome processing, are the subject of something called heuristics. Heuristics are beyond our scope and are likely to involve human intervention at one of the banks (the one that didn't fall into a sinkhole).
Now let's see how one common variant of 2PC (presumed abort) can be used to effect Waldo's transaction and move money from one account to another in a recoverable way. A key part of this illustration is to see that no matter what kind of failure occurs (ignoring sinkholes), data integrity is preserved and Waldo remains a loyal customer.
Figure 2 shows Waldo's transaction on a timeline with all of the interacting components needed to execute the logic shown in Listing 1. The ATM application itself is the top line. The next two lines represent the account databases that the application manipulates. The databases will be transactional participants. The next line is a transactional coordinator, or middleware that will orchestrate the 2PC protocol.
The state of the transaction dictates recovery processing in the event of a failure. The colored line at the very bottom indicates the state of Waldo's transaction at different points in time.
The lines for Database-1, Database-2, and Coordinator represent both time (flowing left to right) and some key records written onto a recovery log. The recovery log is used to ensure data integrity during recovery processing, which we will illustrate later.
Safety of the recovery log is vital. It is critical that the recovery log is not lost, so it is protected with redundancy and security commensurate with the value of the data (and Waldo's money is very important): secured access, RAID storage, physically separated redundant storage, and so on.
Most optimizations, checkpoint processing (aggregating intermediate results so that recovery actions never need to read the entire log), and "fringe cases" are beyond our scope. For example, we are a little liberal about what it means to write to the log. Some records must be written to the log before processing can continue, but many can be written "lazily." In the two cases where it really matters (numbered steps 12 and 13 below), we specify that the log is forced, meaning the write reaches stable storage before any further steps are taken.
We are using just one of many possible variations on 2PC for our example. Our databases use their logs for before and after images of data (Undo and Do records) and state information. We could have simplified even further and skipped the Do records, but sizeable and high performance databases would probably use them, so we did too (usage becomes clear in the following steps).
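The difference between lazy and forced log writes can be sketched in a few lines of Python; the record format and class name here are invented for illustration.

```python
import json
import os

class RecoveryLog:
    """Append-only recovery log. Ordinary records may sit in a buffer,
    but a forced record must reach stable storage before we proceed."""

    def __init__(self, path):
        self.f = open(path, "a")

    def write(self, record, force=False):
        self.f.write(json.dumps(record) + "\n")
        if force:
            # Harden the record: drain the user-space buffer, then ask
            # the OS to push it all the way to the physical device.
            self.f.flush()
            os.fsync(self.f.fileno())

    def close(self):
        self.f.flush()
        os.fsync(self.f.fileno())
        self.f.close()
```

A coordinator using this sketch would call write({"tx": tx_id, "state": "T02"}, force=True) at step 12, while the databases could write their Undo and Do records lazily.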


Now let's walk through Waldo's transaction. Below, when we talk about the ATM application, take it to mean either the application itself, or some middleware supporting the application. For example, when we say the application begins a transactional scope, it could be that middleware begins the transactional scope on behalf of the application.

Here is narration to help explain the numbered steps shown in Figure 2 (the application pseudocode is shown in Listing 1):
The ATM application indicates the beginning of a transactional scope. A coordinator may or may not write anything on a log to remember this transaction -- as the recovery rules later show, using presumed abort (which this example does), nothing really needs to be hardened on the log at this time. The Coordinator creates a context, or unique identifier for this transaction and some other information about it. Importantly, this transaction context flows back to the application. The context flows with other interactions between the application and resource managers; it is the context that helps glue together a whole set of actions into one transactional activity.
The application takes money out of Database-1. The context (from step 1) is inserted into this flow.
Database-1 sees the request for action, but also sees the transactional context. Database-1 uses this context to contact the transactional Coordinator and register interest in this transaction or activity (so that the coordinator will help Database-1 through 2PC processing later to guarantee a commit or rollback outcome of all actions). Coordinator remembers that Database-1 is a participant in the transaction.
Database-1 looks at the request to modify recoverable data. It writes records to a recovery log, plus transaction state information. One record describes the database change to be made if the decision later is to commit (the Do Record). The other record describes the database change to be made if the decision is to rollback (the Undo Record).
In this case, the Undo record says make Waldo's balance = X and the Do record says make Waldo's balance = X-$. X is the amount of the balance before this transaction ever started and $ is the amount to transfer. Notice we are only looking at the recovery log -- not database files.
The Do records are not strictly required if the DBMS makes database file updates when the application requests it to, instead of waiting. However, waiting to write the data can have advantages for performance and concurrency. In addition, the Do records may be used for audit or other advanced reasons. Since they are so useful, our example databases use them.
As may become clear later, the database doesn't actually have to write the Do, Undo, and state records to its log just yet, although it might. It might not write anything at all, because we haven't yet reached the point in the protocol where the database guarantees being able to commit or rollback (see step 12). For example purposes, we assume the records are written as shown on the timeline (see Figure 2) and that the database holds locks on the database data at some point (for isolation -- so no other program can modify that same data).
Return to the application.
Similarly to step 2, the application requests to manipulate the other database, Database-2. The application wants to add in the amount taken out of Database-1.
Database-2 registers interest in the transaction with the coordinator the same way Database-1 did. The coordinator remembers that Database-2 is a participant in the transaction.
Database-2 writes Undo and Do records and state to its recovery log, again just as Database-1 did.
Return to the application.
The application chooses to commit the transaction. The coordinator now takes over. When Commit is received, Phase 1 of 2PC begins.
In Phase 1, the coordinator goes down the list of all participants (Database-1 and Database-2 in this example) who expressed interest in this transaction, asking each one to Prepare. Prepare means get ready to receive the order to either commit or rollback.
Database-1 and Database-2 both respond with Prepared, meaning they are ready to be told the final outcome (commit or rollback all the changes made) and to support it. They must have hardened something (at least on their logs) by this point, because responding Prepared means they guarantee being able to commit or rollback when told -- actions up to this point were just tentative.
If either Database had some kind of failure preparing (a different scenario), it would respond Rollback instead of Prepared and the coordinator would broadcast Rollback to all participants.
The Coordinator forces a log record indicating a Transition to Phase 2 (T02). This is sometimes referred to as the atomic instant. Once this record is hardened on a log, we know and have recorded that: all participants are prepared to go either way (commit or rollback); the ultimate outcome of the transaction is known (commit in our example); and the outcome is guaranteed by recovery processing (illustrated later).
If this record fails to make it to the log for any reason, the ultimate outcome will be to Rollback (we are using presumed abort in this example). The recovery processing will enforce the outcome.
The Coordinator informs each participant that the decision is to commit the changes. The participants can then do whatever they need to do, such as perhaps writing the results to the real database data or some other optimizations they might devise.
The participants return to the Coordinator with Committed. Once the Coordinator knows that all the participants acknowledged the Commit order with Committed, it can forget about this transaction: the transaction was acknowledged by all to be done.
Return to the application.
At some point, since it knows the participants have succeeded in the 2PC flow by acknowledging the common outcome, the coordinator writes an end indicator on its log, but this is just an optimization (since checkpointing is beyond our scope).
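The coordinator's side of these steps can be condensed into a small sketch of presumed-abort 2PC. The class and method names below are invented for illustration, and in-memory lists stand in for real recovery logs.

```python
class Participant:
    """A recoverable resource manager, such as one of the databases."""
    def __init__(self, name):
        self.name = name
        self.state = "active"

    def prepare(self):
        # Harden enough on the recovery log to guarantee either outcome,
        # then promise that guarantee to the coordinator.
        self.state = "prepared"
        return "Prepared"

    def commit(self):
        self.state = "committed"
        return "Committed"

    def rollback(self):
        self.state = "rolledback"
        return "Rolledback"

class Coordinator:
    def __init__(self):
        self.participants = []   # filled in by registration (steps 3 and 8)
        self.log = []            # stand-in for the coordinator's recovery log

    def register(self, participant):
        self.participants.append(participant)

    def commit_transaction(self):
        # Phase 1: ask every registered participant to prepare.
        votes = [p.prepare() for p in self.participants]
        if any(v != "Prepared" for v in votes):
            # Presumed abort: any vote other than Prepared means rollback,
            # and nothing needs to be hardened on the coordinator's log.
            for p in self.participants:
                p.rollback()
            return "rolledback"
        self.log.append("T02")   # the atomic instant (forced, in real life)
        # Phase 2: broadcast the commit decision to every participant.
        for p in self.participants:
            p.commit()
        self.log.append("End")   # step 16's lazy optimization
        return "committed"
```

If either participant had voted anything other than Prepared in Phase 1, no T02 record would exist, so a restarted coordinator would presume abort -- which is exactly what makes logging nothing before step 12 safe.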
Figure 2 illustrated that a lot of things go on behind the scenes. Next we illustrate how the recoverable resource managers (participants) in this example can recover from failures and ensure data integrity.


Back to top
Recovery processing isn't really part of two-phase commit. Rather, 2PC enables recovery. That is, because all the resource managers used 2PC, they are able to perform actions such as those described here and guarantee data integrity. Integrity is maintained even across databases by making sure that all the recoverable actions either go forward or go back.
Figure 3 is a copy of Figure 2 with three red vertical lines inserted. These red vertical lines represent failures. We assume the worst possible failure that is not permanent: everything fails due to a massive, region-wide power failure. Smaller failures are mostly just a subset of this massive failure (for example, the application fails, a database fails, or the coordinator fails). After the failure, the databases and the coordinator can eventually restart and use their recovery logs and recovery rules to ensure data integrity, as illustrated in Figure 3.
We won't really care much about the ATM application itself. If the application alone fails, middleware underneath it will drive rollback (assuming it dies prior to saying Commit). If the application does go down and come back up, it can be blissfully ignorant of what it was doing before the failure, since the middleware and databases guarantee data integrity. This makes the application itself vastly easier to write and maintain.
Data recovery uses rules based on the state of the transaction as recorded by the coordinator and participants. Here are our recovery rules (for this example, using 2PC with presumed abort and our specific optimizations). After a failure:
Rule A: If the transaction is not known by the coordinator (in other words, the failure occurred before any record reached the coordinator's log), then the rule is to Undo any actions (our databases use their Undo records).
Rule B: If the transaction is known by the coordinator, but the transaction state is Active or In-Doubt, then the rule is to Undo any actions (our databases again use their Undo records).
Rule C: If the transaction is known by the coordinator and the transaction state is In-Commit, then the rule is to harden the actions (our databases use their Do records -- other DBMSs might already have written data directly to their database media and not require Do records). Likewise, if the state is In-Rollback, then the rule is to Undo any actions (our databases again use their Undo records).
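These three rules collapse into one small function. The sketch below uses illustrative state names; `undo` and `do` stand for the two log records the databases wrote earlier.

```python
def recover(coordinator_state, undo, do):
    """Pick the recovery action for one in-doubt transaction at restart.
    `coordinator_state` is the coordinator's answer when asked about the
    transaction, or None if it has no record of it (presumed abort)."""
    if coordinator_state is None:
        return undo                          # Rule A: unknown -> undo
    if coordinator_state in ("Active", "In-Doubt"):
        return undo                          # Rule B: undecided -> undo
    if coordinator_state == "In-Commit":
        return do                            # Rule C: committed -> apply Do
    if coordinator_state == "In-Rollback":
        return undo                          # Rule C: rolled back -> undo
    raise ValueError("unexpected state: " + coordinator_state)
```

In the first failure scenario below, Database-1 gets no record back from the coordinator and so applies its Undo record, restoring balance = X.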

Failure 1. Database-1 looks at its recovery log and notices that it has an incomplete transaction. Database-1 contacts the coordinator (how to do this is encoded in the state information on the recovery log for this transaction -- or is otherwise known). The coordinator, however, never wrote any record to its log and tells Database-1 it has no knowledge of this transaction. Database-1 therefore uses recovery Rule A and applies its Undo record, making the data consistent again from Database-1's point of view. Database-1 can now forget about this transaction.
Database-2 finds nothing on its log about this transaction, and there is nothing for it to do.
Coordinator also finds nothing on its log, and has nothing to do.
When Waldo returns to the ATM, all his accounts are in the same state they were in before he started the transaction. He thought the transaction was in process when a massive power failure took out the ATM and the two banks he was transferring money between. He is relieved when he finds that his balances have not changed at all.
Failure 2. Database-1 looks at its recovery log and notices that it had an incomplete transaction. Database-1 contacts the coordinator. The Coordinator tells Database-1 it has no knowledge of this transaction. Database-1 uses recovery Rule A and applies its Undo record. Once done, Database-1 can forget this transaction.
Database-2 does the same as Database-1, applies its Undo record and forgets the transaction.
Coordinator hadn‘t logged anything concerning this transaction, so it has nothing to do.
As with Failure 1, when Waldo returns to the ATM, his accounts are again in the same state as before he started the transaction that was interrupted by the massive power failure.
Failure 3. Given where Failure 3 occurs (exactly as shown in Figure 3), both Database-1 and Database-2 were given the Commit, but they might not have logged anything about the Commit order, and so they might not know when restarting whether they completed the transaction or not. If a database knows it already committed the transaction, because it finds information to that effect on its log, it can forget the transaction.
If a database can only deduce that it was prepared, then it will try to reconnect to the coordinator and get the outcome again (the transaction is probably In-Commit or In-Rollback, but the database isn‘t sure). Once the database gets the outcome from the coordinator (which was hardened as T02 on its log), the database will apply Rule C and use its Do or Undo records.
The Coordinator doesn't know that it already told Database-1 and Database-2 the outcome, because the End record didn't make it to the coordinator's log before the failure. So the coordinator will probably try to contact both database participants and both will respond that they are done or they need to be given the outcome (alternately the coordinator could do nothing and wait to be asked). Once the coordinator knows all participants have been given the outcome and they completed, it can forget the transaction.
Once again, when Waldo returns to the ATM after the failure, his money is again in a known state, but this time the transaction completed successfully (money was taken out of one account and put into the other). Waldo is very impressed with the quality of service provided by his financial institutions, who successfully completed his transaction even during the massive power failure.
It's important to remember that this example used just one kind of 2PC with some specific optimizations -- and in fact some things were left out (for example, when can one of the databases forget about a transaction?). More optimizations, and lots of heuristic processing, go on in any real system.
A useful transactional protocol requires that all the recovery cases for failure be covered in some way, either in the protocol or in the participants, and be enabled by the protocol. This example only illustrated a few cases of failures.


In Figures 2 and 3 we didn't mention how Database-1 contacted the Coordinator, nor did we specify how the Application called the databases. In fact, we didn't specify the mechanisms for anything to contact anything else. In the past, these were mostly non-universal mechanisms that sometimes only worked between certain combinations of entities (applications, resource managers, and coordinators or transaction monitors).
The combination of Web services, Web Services Coordination (WS-C), and Web Services Atomic Transactions (WS-AT) maps all of the flows shown in Figure 2 and specifies precise communications mechanisms for achieving the same results. However, instead of working only between certain combinations, the Web services-based flows can work with just about anything.

In Figure 4, the classic flows (Figures 2 and 3) are converted to Web services as follows. Significantly changed steps are numbered and described below. As before, when we say application, take it to mean the application or some helper middleware. Likewise, when we say database, it might mean the actual database, or some helper middleware.
The Application uses the Activation Service defined in WS-Coordination to obtain a transactional context.
The application invokes a Web Service exposed by Database-1 (alternatively exposed by an application server that then talks to Database-1) to subtract money from Waldo‘s balance. Unbeknownst to the application, the context flows along with the Web services invocation.
Database-1 uses information in the context to invoke the Registration Service defined in WS-Coordination to register interest in this transaction.
The application invokes a Web service exposed by Database-2 to add money to Waldo‘s balance. Just like in step 2, the context flows along with the Web services invocation.
Just like step 3, Database-2 uses information in the context to invoke the Registration Service and register interest in this transaction.
The application uses the Completion Protocol defined in WS-AtomicTransaction to indicate that it wishes to commit the transaction. For example, J2EE using Container Managed Transactions could use the Completion Protocol when a program in a CMT completes successfully.
The databases and coordinator participate in 2PC flows, as defined in the WS-AtomicTransaction 2PC Protocol.
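The context that glues these steps together can be sketched as plain data. The field names below echo WS-Coordination's CoordinationContext (a unique identifier, a coordination type, and a Registration Service endpoint), but the URIs, helper functions, and their signatures are hypothetical Python, not wire format.

```python
import uuid

def activate(coordinator_url):
    """Step 1: the Activation Service mints a context that then flows,
    invisibly to the application, with every Web services invocation."""
    return {
        "identifier": "urn:uuid:" + str(uuid.uuid4()),
        "coordination_type": "urn:example:wsat",   # hypothetical URI
        "registration_service": coordinator_url + "/registration",
    }

def on_request(db_name, context, registered):
    """Steps 3 and 5: a database that sees a context on an incoming
    request registers interest with the Registration Service the
    context names; re-registration for the same transaction is a no-op."""
    if db_name not in registered:
        registered[db_name] = context["registration_service"]
```

The application never touches these fields directly; middleware attaches the context to each invocation, which is what lets Database-1 and Database-2 find the same coordinator.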
From Figure 4, it is clear that atomic transactions using Web services (WS-C and WS-AT) are substantially the same as without Web services (Figures 2 and 3, for example). The primary differences are almost cosmetic from the outside and involve how entities communicate with each other, not the substance of what they communicate. However, these differences in how the entities communicate have a big impact on flexibility and interoperability.
You can achieve universal interoperability with Web services because, instead of changing resource manager X to interoperate with transaction monitor Y, you can change both X and Y to use Web services and then interoperate with many other resource managers and transaction monitors. So instead of two-at-a-time interoperability, or interoperability only within a specific kind of domain, n-way universal interoperability is possible.
Recovery processing using Web services between the interested parties is again the same as before Web services. Resource managers are the only ones who know their resources and how to commit them or roll them back.
As an example, suppose Failure 1 (from Figure 3) happens, but after the conversion to Web services. Database-1 comes back up and, just as before Web services, it reads its log and realizes that it needs to contact the Coordinator. Information on how to contact the coordinator is in the state saved on its recovery log; with Web services it will be an endpoint reference (EPR, defined in WS-Addressing). Database-1 contacts the Coordinator at that EPR with a message defined in WS-AT called Replay. Replay causes the Coordinator to resend the last protocol message to Database-1, which lets Database-1 deduce the transaction state -- and then apply the appropriate recovery rule.
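The restarted participant's side of that exchange can be sketched as follows; everything here is illustrative except the Replay message itself, which WS-AT defines.

```python
def recover_after_restart(own_log_state, send_replay):
    """What a restarted database does for one unfinished transaction.
    `own_log_state` is what it finds on its own recovery log, and
    `send_replay` sends Replay to the coordinator's EPR and returns
    the protocol message the coordinator re-sends."""
    if own_log_state == "committed":
        return "forget"                 # already finished; nothing to do
    if own_log_state == "prepared":
        outcome = send_replay()         # ask the coordinator to resend
        return "apply Do" if outcome == "Commit" else "apply Undo"
    return "apply Undo"                 # never prepared: presumed abort
```

In the Failure 1 scenario, the coordinator has no record of the transaction, so its answer amounts to Rollback and the database applies its Undo record -- the same outcome as in the classic walkthrough.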


Data integrity depends on a set of actions atomically moving data from one well-defined state to another. Our example showed Waldo's transaction with two actions inside a common all-or-nothing activity: taking money out of one account and putting it into another account. What if Mrs. Waldo entered into this picture by using an ATM across town at the same time and attempted to take money from one of the same accounts?
If Mrs. Waldo tries to take money out of the account being held by Database-1 between numbered steps 4 and 16 of Figure 2, her attempted transaction might fail or might have to wait for Waldo's transaction to finish. The reason is that Waldo is manipulating that same account data in Database-1. The database record(s) for that account is locked (depending on the database management system, more than just a single balance might be locked).
Having the database records locked in this case is both a good thing and a bad thing. It's a good thing because data integrity must be preserved, and so the actions representing Waldo's transaction need to complete (in other words, be isolated) before another transaction is allowed to manipulate the same data; otherwise, the recovery steps to guarantee data integrity might be extremely complex. However, having the database records locked is also a bad thing because, for example, Mrs. Waldo can't access the money.
Holding locks for long periods can reduce the amount of concurrency that a database can support. Lack of concurrency inconveniences Mrs. Waldo, but it can be an even bigger concern in other usage examples.
How long is Mrs. Waldo blocked from the joint account? Suppose a failure occurs between numbered steps 4 and 16 from Figure 3. Database-1 records remain locked somewhere in that span, perhaps for an indeterminate length of time. In this example, observe that access to Database-1 records can depend on the availability of Database-2 (because between steps 4 and 16 there is also a lot of Database-2 processing, while Database-1 locks are held).
Mrs. Waldo is accidentally demonstrating a drawback of 2PC: database locks are needed to guarantee isolation and this can have negative consequences for data access (and concurrency). This illustrates one reason that 2PC is generally considered appropriate for more controlled environments, where the participants (such as Database-1 and Database-2) can be expected to behave according to relatively strict policies.
Relaxing isolation while still preserving transactional semantics (and eventual data integrity) would allow better flexibility, but 2PC (classic or Web services) is not well-suited for this. Relaxed isolation, among other things, is the subject of a different Web services transaction specification aimed at more loosely controlled environments: Web Services Business Activity (WS-BA).
WS-BA is outside the scope of this article, but will be the subject of a future article.
Most people will probably not directly use the various features defined in WS-C and WS-AT. Rather, middleware will make much of this processing transparent (and easier). However, it's nice to know what goes on behind the scenes, because it can help you use these facilities most efficiently, and because you never know -- you might need to write a resource manager someday.


We looked at classical transaction processing and how it enables data integrity across a number of actions by forcing an all-or-none semantic to the set of actions. Transaction processing using Web services is logically very similar to classical transaction processing. There is a relatively straightforward map from classic to Web services transactions and we mapped the Durable 2PC variety of Web Services Atomic Transaction as an illustration.