[Bernstein09] 2.7. Summary

This chapter covered major software abstractions needed to make it easy to build reliable TP applications with good performance: transaction bracketing, threads, remote procedure calls, state management, and scalability techniques.

Transaction Bracketing

Transaction bracketing offers the programmer commands to start, commit, and abort a transaction. The operations on data that execute after the Start command and before the Commit or Abort are part of the transaction. In the chained model, a new transaction begins immediately after a Commit or Abort, so all operations are part of some transaction.
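As a minimal sketch of explicit bracketing, the following uses Python's sqlite3 module in autocommit mode so that the Start, Commit, and Abort steps are visible as BEGIN, COMMIT, and ROLLBACK statements. The accounts table and the transfer and make_bank helpers are assumptions made for illustration, not something the chapter defines.

```python
import sqlite3

def make_bank():
    # isolation_level=None puts the connection in autocommit mode,
    # so transactions exist only where we bracket them explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
    return conn

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst inside one bracketed transaction."""
    conn.execute("BEGIN")  # Start
    try:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute("COMMIT")  # Commit: both updates become durable together
        return True
    except Exception:
        conn.execute("ROLLBACK")  # Abort: neither update survives
        return False
```

A transfer that would overdraw the source account is rolled back as a unit, so the balances are left exactly as they were before the Start.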

The transaction composability problem arises when a program running a transaction calls another program. There is a choice of bracketing semantics, depending on whether the callee should or should not execute within the caller’s transaction. If so, the caller’s transaction ID must be passed to the callee.

In the nested transaction model, the callee executes in a subtransaction. A subtransaction abort undoes its actions, but leaves its parent transaction intact. It is up to the top-level transaction to decide whether to commit all its committed subtransactions’ work, thereby making its results durable. Savepoints are a related technology that enable single-threaded transactions to back up to a previous state, much like subtransaction abort.
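The savepoint behavior described above can be sketched with SQLite's SAVEPOINT statements through Python's sqlite3 module. The log table and the savepoint name child are illustrative assumptions. Rolling back to the savepoint undoes only the work done after it, while the enclosing transaction continues and later commits.

```python
import sqlite3

def run_with_savepoint():
    conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit bracketing
    conn.execute("CREATE TABLE log (step TEXT)")

    conn.execute("BEGIN")
    conn.execute("INSERT INTO log VALUES ('parent work')")

    conn.execute("SAVEPOINT child")               # mark a state to back up to
    conn.execute("INSERT INTO log VALUES ('child work')")
    conn.execute("ROLLBACK TO SAVEPOINT child")   # undo only the child's insert
    conn.execute("RELEASE SAVEPOINT child")       # discard the savepoint itself

    conn.execute("INSERT INTO log VALUES ('more parent work')")
    conn.execute("COMMIT")                        # parent decides what becomes durable

    return [row[0] for row in conn.execute("SELECT step FROM log ORDER BY rowid")]
```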

Another approach to transaction bracketing is to tag each component with an attribute that indicates whether an invocation of the component should run in a new transaction, in the caller’s transaction, or in no transaction. This approach commonly is used in object-oriented transactional middleware products, instead of explicit commands to start, commit, and abort.

A transaction program needs to provide compensation steps and exception handling code for transaction failures and system failures. The programming model needs a way to expose the reason for the exception, the state that is available if the exception handler executes after an abort or recovery from a system failure, and whether the handler itself executes within a transaction.

Processes and Threads

Each program executing in a processor has an address space and control thread, called its processor state. In a multiprogrammed computer system, each program’s processor state can be temporarily stored in main memory or on disk and reloaded when the program resumes execution. A TP system architecture must take into account whether its related programs are running in the same or different address spaces, since this can affect performance and management.

The behavior of a TP system is affected by whether the components of the system share an address space and control thread. Although it is possible to deploy all components of the TP system in single-threaded processes, doing so leads to a system with a large number of processes, typically at least one per executing transaction. Better performance and scalability usually are obtained with multithreaded processes, due to reduced main memory requirements, fewer context switching operations, and finer-grained tuning opportunities. If middleware implements the multithreading, then it must intercept synchronous I/O to avoid blocking the entire process during such operations. The more popular approach is to use threads supported by the operating system.
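A multithreaded server process built on operating system threads might look like the following sketch, which uses Python's standard ThreadPoolExecutor. The handle_request and serve functions and the tp-worker thread name prefix are illustrative assumptions. A small, fixed set of threads in one address space serves many requests.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

def handle_request(request_id):
    # Stand-in for transaction logic; reports which worker thread ran it.
    return request_id, threading.current_thread().name

def serve(request_ids, workers=4):
    """Run all requests on a fixed pool of OS threads inside one process."""
    with ThreadPoolExecutor(max_workers=workers,
                            thread_name_prefix="tp-worker") as pool:
        # map() returns results in request order, whichever thread ran each one.
        return list(pool.map(handle_request, request_ids))
```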

When multithreading is unavailable, an alternative is server classes, where multiple copies of a TP component are replicated in multiple single-threaded servers. Executing the same code in a pool of single-threaded processes can produce similar benefits to executing the same code in multiple threads of the same process.

Remote Procedure Calls

A remote procedure call (RPC) mechanism provides a programming model and runtime environment that allows a program to invoke a program in another address space as if it were a local call within the same address space. With an RPC, the programmer either receives a return from the call or an error indicating that the program didn’t run, just as if the call were performed locally.

An RPC mechanism uses an interface definition language to produce a proxy linked to the local program and a stub linked to the program in the remote address space. Proxies and stubs abstract distributed computing details such as data marshaling and the communications protocol from the programs involved in the call. In an object-oriented RPC, the proxy may use a local object as a surrogate for the remote object being called. A transactional RPC uses the proxies and stubs to propagate the transaction context, including the transaction ID, from the caller to the callee.
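The division of labor between proxy and stub can be sketched in a single process, with JSON standing in for the marshaling format and a plain function call standing in for the network. Everything named here (Proxy, Stub, AccountService, the message fields) is a hypothetical illustration, and a real RPC system would generate the proxy and stub from an interface definition rather than write them by hand.

```python
import json

class Stub:
    """Server side: unmarshals the request, calls the target, marshals the reply."""
    def __init__(self, target):
        self.target = target

    def dispatch(self, wire_bytes):
        msg = json.loads(wire_bytes)
        method = getattr(self.target, msg["method"])
        # The propagated transaction ID is handed to the callee explicitly.
        result = method(msg["tx_id"], *msg["args"])
        return json.dumps({"result": result}).encode()

class Proxy:
    """Client side: looks like a local object, marshals each call onto the wire."""
    def __init__(self, transport, tx_id):
        self._transport = transport   # callable: request bytes -> reply bytes
        self._tx_id = tx_id           # caller's transaction context

    def __getattr__(self, name):
        def call(*args):
            request = json.dumps(
                {"method": name, "tx_id": self._tx_id, "args": args}).encode()
            reply = self._transport(request)
            return json.loads(reply)["result"]
        return call

class AccountService:
    """The remote program; records which transaction each call belonged to."""
    def __init__(self):
        self.seen_tx = []

    def debit(self, tx_id, account, amount):
        self.seen_tx.append(tx_id)
        return f"debited {amount} from {account}"
```

The caller writes proxy.debit("acct-1", 25) as if the service were local, and the transaction ID travels with the request without appearing in the caller's argument list.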

Before performing an RPC, a client needs to create a binding to the server, for example, by looking up the server’s network address in a registry service. To perform an RPC securely, the client and server need to be authenticated and the server needs to check that the client is authorized to do the call. The runtime needs to monitor each call to ensure it succeeds. If the client runtime does not receive an acknowledgment, then it can either repeat the call or ping the server, depending on whether the server is idempotent. The runtime might also translate parameters between different machine formats to enable interoperability.
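The choice between repeating a call and pinging the server can be sketched as follows. The send and ping callables, the use of TimeoutError to signal a missing acknowledgment, and the max_attempts limit are all assumptions made for illustration.

```python
def call_with_retry(send, ping, request, idempotent, max_attempts=3):
    """Issue a call; on a missing acknowledgment, retry only if that is safe.

    send(request) returns a reply or raises TimeoutError when no
    acknowledgment arrives. ping() returns True if the server responds.
    """
    for _ in range(max_attempts):
        try:
            return send(request)
        except TimeoutError:
            if not idempotent:
                # The first attempt may have executed, so resending could
                # apply the operation twice. Check liveness instead.
                alive = ping()
                raise RuntimeError(f"call outcome unknown; server alive={alive}")
            # Idempotent: repeating the call cannot change the result.
    raise TimeoutError(f"no acknowledgment after {max_attempts} attempts")
```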

All this functionality has a price. RPCs are much slower than local procedure calls and simple message passing. But since RPC functionality usually is needed, the only alternative is to move the burden to the application programmer, which makes application programming more time-consuming. Hence, transactional RPC is a popular feature of transactional middleware and underlying platforms.

Shared State

To process transaction requests correctly, components of a TP system need to share state information about users, security tokens, transaction IDs, and the locations and characteristics of other system components. When all the components are deployed within a single address space, they can easily share this state. When the components are distributed across multiple address spaces, this sharing becomes more challenging.

This problem can be circumvented by using stateless servers, which do not share state with the client that calls them. Instead of sharing state, each client request includes all state information that the server needs for processing the request. For example, a browser can retain a cookie, which is the server state that is stored at the client and passed by the client to the server in each request.
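A stateless server with cookie-style state might be sketched like this. The shopping-cart scenario, the request and response dictionaries, and the handle and Client names are illustrative assumptions. The handler keeps nothing between calls, and the client returns the state it was last given.

```python
import json

def handle(request):
    """Stateless handler: everything it needs arrives in the request."""
    cart = json.loads(request.get("cookie") or "[]")
    if request["action"] == "add":
        cart.append(request["item"])
    # Hand the updated state back to the client instead of storing it here.
    return {"body": f"{len(cart)} item(s) in cart", "set_cookie": json.dumps(cart)}

class Client:
    """Plays the browser: retains the cookie and sends it with each request."""
    def __init__(self):
        self.cookie = None

    def send(self, action, item=None):
        response = handle({"action": action, "item": item, "cookie": self.cookie})
        self.cookie = response["set_cookie"]
        return response["body"]
```

Because handle holds no per-client data, any copy of the server could process any request, which is what makes this style easy to scale out.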

One important type of shared state is a transaction ID, which identifies a transaction context and is shared across programs that participate in the same transactional unit of work. A transaction context typically is associated with a thread of execution and can be propagated from one program to another, for example when using an RPC.

A communication session is a way of sharing state between processes on different machines. Typical session state includes transaction context and security information. By creating a shared session, two processes avoid having to pass state information on every interaction. However, sessions require messages to set up and memory to store their state. Thus, they are primarily useful when information is shared for a relatively long period.

Scalability Techniques

Several abstractions are needed to help a TP system scale up and scale out to handle large loads efficiently, including caching, resource pooling, and data partitioning and replication. Using these abstractions improves the ability of a TP system to share access to data.

Caching is a technique that stores a copy of persistent data in shared memory for faster access. The major benefit of caching is faster access to data. If the true value of the data in its permanent home needs to be updated, then synchronization is required to keep the cache values acceptably up to date.
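One simple synchronization policy is write-through, where every update goes to the permanent home and the cache in the same step. The sketch below assumes a plain dictionary as the permanent store and a single process. It is one possible policy, not the only one the chapter has in mind.

```python
class WriteThroughCache:
    def __init__(self, store):
        self._store = store   # dict standing in for the data's permanent home
        self._cache = {}
        self.misses = 0

    def get(self, key):
        if key not in self._cache:
            self.misses += 1                  # slow path: go to the permanent home
            self._cache[key] = self._store[key]
        return self._cache[key]

    def put(self, key, value):
        # Update the permanent home and the cached copy together,
        # so a later get() never returns a stale value.
        self._store[key] = value
        self._cache[key] = value
```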

Resource pooling is a mechanism that reuses a resource for many client programs, rather than creating a new resource for each program that needs the resource. For example, database connections can be pooled. A database connection is allocated to a program when it needs to use the database and returned to the pool when a program’s task is completed.
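A connection pool along these lines can be sketched with a thread-safe queue. The ConnectionPool class and its factory parameter are illustrative assumptions. In-memory SQLite connections stand in for real database connections.

```python
import queue
import sqlite3
from contextlib import contextmanager

class ConnectionPool:
    def __init__(self, size, factory):
        self._idle = queue.Queue()
        self.created = 0
        for _ in range(size):
            self._idle.put(factory())   # pay the creation cost once, up front
            self.created += 1

    @contextmanager
    def connection(self, timeout=None):
        conn = self._idle.get(timeout=timeout)   # blocks while all are in use
        try:
            yield conn                            # allocated to the caller's task
        finally:
            self._idle.put(conn)                  # returned when the task completes
```

A caller borrows a connection with `with pool.connection() as conn:` and the connection goes back to the pool when the block exits, even if the task raises an exception.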

Partitioning is a technique for improving scalability by segmenting resources into related groups that can be assigned to different processors. When a resource type is partitioned, the TP system routes requests for the resource to the partition that contains it. For example, if a database is partitioned, an access to a data item is routed to the database partition that contains the data item.
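Routing a request to the partition that holds a data item can be sketched with hash partitioning, which is one common scheme among several (range partitioning is another). The partition_for function and PartitionedStore class are hypothetical, and each partition is a dictionary standing in for a separate database.

```python
import hashlib

def partition_for(key, num_partitions):
    """Map a key to a partition index using a stable hash."""
    # Python's built-in hash() of a str varies between runs, so use SHA-256.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

class PartitionedStore:
    def __init__(self, num_partitions):
        self.partitions = [{} for _ in range(num_partitions)]

    def _route(self, key):
        return self.partitions[partition_for(key, len(self.partitions))]

    def put(self, key, value):
        self._route(key)[key] = value

    def get(self, key):
        return self._route(key)[key]
```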

Replication is a technique for improving scalability by spreading the workload across multiple identical servers. Clients can either push their work onto particular servers or enqueue the work and have the servers pull from the queues. A client may have affinity for a server that has cached data that it frequently accesses, in which case it prefers sending its workload to that server. Replication can also be used to improve availability by using backup replicas to handle the load of a failed replica. One major challenge of replication is to keep updatable replicas mutually consistent at an affordable cost.
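The push model with server affinity and failover can be sketched as a small router. The ReplicaRouter class, its round-robin assignment of new clients, and the mark_down method are illustrative assumptions. The sketch covers only request routing and does not address keeping replicas consistent.

```python
class ReplicaRouter:
    def __init__(self, replicas):
        self.replicas = list(replicas)
        self.up = set(self.replicas)
        self.affinity = {}    # client -> replica it last used (warm cache there)
        self._next = 0        # round-robin cursor for clients with no usable affinity

    def mark_down(self, replica):
        self.up.discard(replica)

    def route(self, client):
        preferred = self.affinity.get(client)
        if preferred in self.up:
            return preferred                  # keep using the replica with cached data
        if not self.up:
            raise RuntimeError("no replicas available")
        while True:                           # terminates: at least one replica is up
            candidate = self.replicas[self._next % len(self.replicas)]
            self._next += 1
            if candidate in self.up:
                self.affinity[client] = candidate   # surviving replica takes over
                return candidate
```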