Inside SQL Azure - TechNet Articles - Home - ...

来源:百度文库 编辑:神马文学网 时间:2024/04/28 18:02:53
Resources For IT Professionals  
    Page Details
    • First published by Wayne Walter Berry
    • When: 1 Nov 2010 10:37 AM
    • Revisions: 1
    • Comments: 0
    Options Subscribe to Article (RSS) Can't find it? Write it! Post an Article Can You Improve This Article? Positively! Click Sign In to add the tip, solution, correction or comment that will help other users.

    Report inappropriate content using the Report Abuse link in the footer.

    Inside SQL Azure

    Wiki > TechNet Articles > Inside SQL Azure

    Inside SQL Azure

    Writer: Kalen Delaney

    Technical Reviewers: Lindsey Allen, Wayne Berry, Lubor Kollar, Michael Thomassy

    Summary

    With Microsoft SQL Azure, you can create SQL Server databases in the cloud. Using SQL Azure, you can provision and deploy your relational databases and your database solutions to the cloud, without the startup cost, administrative overhead, and physical resource management required by on-premises databases. The paper will examine the internals of the SQL Azure databases, and how they are managed in the Microsoft Data Centers, to provide you high availability and immediate scalability in a familiar SQL Server development environment.

    Introduction

    SQL Azure Database is Microsoft’s cloud-based relational database service. Cloud computing refers to the applications delivered as services over the Internet and includes the systems, both hardware and software, providing those services from centralized data centers. This introductory section will present basic information about what SQL Azure is and what it is not, define general terminology for use in describing SQL Azure databases and applications, and provide an overview of the rest of the paper.

    What Is SQL Azure?

    Many, if not most, cloud-based databases provide storage using a virtual machine (VM) model. When you purchase your subscription from the vendor and set up an account, you are provided with a VM hosted in a vendor-managed data center. However, what you do with that VM is then entirely isolated from anything that goes on in any other VM in the data center. Although the VM may come with some specified applications preinstalled, in your VM you can install additional applications to provide for your own business needs, in your own personalized environment. Even though your applications can run in isolation, you are dependent on the hosting company to provide the physical infrastructure, and the performance of your applications is impacted by the load on the data center machines, from other VMs using the same CPU, memory, disk I/O, and network resources.

    Microsoft SQL Azure uses a completely different model. The Microsoft data centers have installed large-capacity SQL Server instances on commodity hardware that are used to provide data storage to the SQL Azure databases created by subscribers. One SQL Server database in the data center hosts multiple client databases created through the SQL Azure interface. In addition to the data storage, SQL Azure provides services to manage the SQL Server instances and databases. More details regarding the relationship between the SQL Azure databases and the databases in the data centers, as well as the details regarding the various services that interact with the databases, are provided in later in this paper.

    What Is in This Paper

    In this paper, we describe the underlying architecture of the SQL Server databases in the Microsoft SQL Azure data centers, in order to explain how SQL Azure provides high availability and immediate scalability for your data. We tell you how SQL Azure provides load balancing, throttling, and online upgrades, and we show how SQL Server enables you to focus on your logical database and application design, instead of having to worry about the physical implementation and management of your servers in an on-premises data center.

    If you are considering moving your SQL Server databases to the cloud, knowing how SQL Azure works will provide you with the confidence that SQL Azure can meet your data storage and availability needs.

    What Is NOT in This Paper

    This paper is not a tutorial on how to develop an application using SQL Azure, but some of the basics of setting up your environment will be included to set a foundation. We will provide a list of links where you can get more tutorial-type information.

    This paper will not cover specific Windows Azure features such Windows Azure Table store and Blob store.

    Also, this paper is not a definitive list of SQL Azure features. Although SQL Azure does not support all the features of SQL Server, the list of supported features grows with every new SQL Azure release. Because things are changing so fast with SQL Azure service updates every few months and new releases several times a year, users will need to get in the habit of checking the online documentation regularly. New service updates are announced on the SQL Azure team blog, found here: http://blogs.msdn.com/b/sqlazure/. In addition to announcing the updates, the blog is also your main source for information about new features included in the updates, as well as new options and new tools.

    You can check to see the date of the service update you are connected to by examining the @@version function. At this writing, we receive the following result:

    Query:

    SELECT @@version;

    Result:

    Microsoft SQL Azure (RTM) - 10.25.9386.0

    Jul 21 2010 12:47:47

    Copyright (c) 1988-2009 Microsoft Corporation

    Terminology

    The terms used in this paper will be based on the definitions in “Above the Clouds: A Berkeley View of Cloud Computing”, EECS Technical Report, February 2009. In some cases, there are terms used specific to Microsoft’s implementation of database cloud computing, with SQL Azure, that were not discussed in the aforementioned report, and those are defined as they are introduced. This section lists and defines the basic terms used in the paper.

    Cloud computing –refers to both applications delivered as services over the Internet and the hardware and software in the data centers that provide those services; in the case of SQL Azure, the data centers are set up and managed by Microsoft.

    The cloud – specifically refers to the hardware and software in the data centers.

    Target Audience

    This paper is intended to provide information about SQL Azure for SQL Server database developers, DBAs, and corporate decision makers.

    Prerequisites

    Some SQL Server knowledge is expected, and Windows knowledge would be helpful.

    Setting Up SQL Azure

    Although this section will include some basic information that is also available in more generic SQL Azure papers and documentation, we strongly recommend that you first read the paper entitled “Getting Started with SQL Azure,” which you can download from this page:
    http://msdn.microsoft.com/en-us/windowsazure/sqlazure

    Subscriptions

    To use SQL Azure, you must first create a Windows Azure platform account, which allows you to access all the Windows Azure-related services, such as Windows Azure, Windows Azure AppFabric, and SQL Azure. Creating a Windows Azure platform account does not mean that you have to use Windows Azure in your environment, although it is considered best practice to run applications that access a SQL Azure database on a Windows Azure platform. The Windows Azure platform account is used to set up and manage your subscriptions and to bill for consumption of any of the Windows Azure services including SQL Azure, and running SQL Azure does not require Windows Azure. You can buy your subscription from the Microsoft Online Services Customer Portal (https://mocp.microsoftonline.com/site/default.aspx). After your purchase is complete, you receive a confirmation notification in email with instructions for accessing your account in the SQL Azure platform. This confirmation includes the server name that you use in your connection strings or that you provide when registering a server in SQL Server Management Studio. In SQL Server 2008 R2, SQL Server Management Studio includes built-in support for SQL Azure.

    After you create a Windows Azure platform account, you can use the SQL Azure portal (https://sql.azure.com/ProjectViewer.aspx) to create a single SQL Azure server and its associated administrator account. To create additional SQL Azure servers, create additional subscriptions. The SQL Azure portal provides a user interface that you can use to create servers and to quickly create databases on those servers.

    Each subscription allows one instance of SQL Server to be defined, which will initially include only a master database. For each server, you will also need to configure the firewall settings, to determine which connections will be allowed access.

    We cover more details about SQL Azure login and user security later in this paper. In addition, for more information about connection settings, logins, users, and configuring the SQL Azure Firewall, see the following articles:

    • SQL Azure Firewall (http://msdn.microsoft.com/en-us/library/ee621782.aspx)
    • Managing Databases and Logins in SQL Azure (http://msdn.microsoft.com/en-us/library/ee336235.aspx)
    • Strong Passwords (http://msdn.microsoft.com/en-us/library/ms161962(SQL.100).aspx)
    • How to Configure the SQL Azure Firewall (http://msdn.microsoft.com/en-us/library/ee621783.aspx)
    • Troubleshooting SQL Azure (http://msdn.microsoft.com/en-us/library/ee730906.aspx)

    Databases

    Each SQL Azure server always includes a master database. Up to 149 additional databases can be created for each SQL Azure server. Databases can be created using the SQL Azure Portal or the Transact-SQL CREATE DATABASE command. There is no graphical interface support through SQL Server Management Studio for creating a database. When you are connected to your SQL Azure server, you can create a database using Transact-SQL by using a query template. To open a query template, in SQL Server Management Studio, right click Databases and then click New Database.

    The SQL Azure Portal interface for creating a database is shown in Figure 1. The only options that can be specified are the name and the maximum size of the database. There are two editions of SQL Azure databases: Web and Business, and when you create a database using the SQL Azure Portal, the maximum size you specify determines the edition you create. A Web Edition database can have a MAXSIZE value of 1 GB or 5 GB. A Business Edition database can have a MAXSIZE value of 10 GB, 20 GB, 30 GB, 40 GB, or 50 GB. If the size of your database reaches the limit you set in MAXSIZE, you receive an error code 40544. When this happens, you cannot insert data, update data, or create new database objects. However, you can still read and delete data, truncate tables, drop tables and indexes, and rebuild indexes. If you delete data or drop objects to free storage space, there can be as much as a fifteen-minute delay before you can insert new data or modify existing data.

    Figure 1: Creating a new Web Edition SQL Azure database

    To change the name, edition, or maximum size of your database after creation, you can use the ALTER DATABASE command. The SQL Azure ALTER DATABASE command only allows three options to be specified (name, maximum size, and edition) as shown here:

    ALTER DATABASE database_name

    {

    MODIFY NAME = new_database_name

    |MODIFY (MAXSIZE = {1 | 5 | 10 | 20 | 30 | 40 | 50} GB)

    |MODIFY (EDITION = {'web' | 'business'})

    }

    Pricing for your SQL Azure subscription is per database, and it is based on the edition. The per-database fee is amortized over the month and charged on a daily basis. You pay for the user databases you have on the days you have them. There is no fee for the master database. You should be aware, however, that if you drop and create a database with the same name several times in one day, each CREATE counts as a new database for that day.

    There are also additional charges for data transfer volume, any time data comes into or out of the data center. You have your choice of running your application code on your own premises and connecting to your SQL Azure database in the data center, or of running your application code in Windows Azure, which is hosted in the same data center as your SQL Azure database. The first model is referred to as the code-far model and the second is referred to as code-near. The code-near model avoids the additional data transfer charges and is very close to an on-premises programming model. In either case, you should be aware of the Internet network latency that cannot be mitigated using either model.

    The SQL Azure data access model does not support cross-database queries in the current version, and the USE command is not supported. A connection is made to a single database; if you need data from another database, you must make a new connection. In SQL Server Management Studio, to specify the database to which you want to connect interactively, click Options in the New Connection dialog box. Programmatically, you can specify the database in your application’s connection string. You can code cross-database joins or comparisons in the application after the data has been returned from the appropriate databases.

    The master Database

    When first creating a SQL Azure server, you have only one database available, called the master database. The master database in SQL Azure is slightly different from the corresponding database in SQL Server. Like a SQL Server master database, the SQL Azure master database is the store for login names, but that’s about the limit of the similarities. SQL Azure doesn’t support server-level roles. However, SQL Azure introduces two new database roles, available only in the master database: dbmanager and loginmanager.

    • dbmanager – The SQL Azure dbmanager role is similar to the dbcreator role for an on-premises instance of SQL Server. It is required in order to create databases.
    • loginmanager - Like the securityadmin role for an on-premises instance of SQL Server, the loginmanager role in SQL Azure is required in order to create logins.

    The SQL Azure master database has some other key differences from an on-premises SQL Server master database:

    • The master database in SQL Azure contains some catalog views that are unavailable in an on-premises instance of SQL Server. These include: sys.firewall_rules, sys.bandwidth_usage, and sys.database_usage.
    • Although users can be created in your SQL Azure master database, those users cannot take full control of the database; in other words, no one other than server-level principal login (created by the initial SQL Azure server creation process) can become a database owner of the master database.
    • Guest access to the master database is disabled (this is also true for all databases created on your SQL Azure server.)

    Security

    Most security issues for your SQL Azure databases are managed by Microsoft within the SQL Azure data center, with very little setup required by the users. However, there are some security issues that it will be useful to be aware of to fully understand how your data is kept secure.

    Access to a SQL Azure Database

    Just as when accessing an on-premises database, a user must have a valid login and password in order to connect to the SQL Azure database. Because SQL Azure supports only standard security, each login must be explicitly created. An administrator can create an unlimited number of logins using SQL Server Management Studio for SQL Server 2008 R2 or using Transact-SQL. Common login names such as ‘sa’, ‘admin’, ‘administrator’, ‘root’, and ‘guest’ are not allowed, requiring an illegal user to guess at valid login names.

    Passwords are required to be strong, following the Windows and SQL Server guidelines for password strength.

    In addition, the firewall can be configured on each SQL Azure server to only allow traffic from specified IP addresses to access the SQL Azure server. By default, no IP addresses are included in the list. This helps to greatly reduce any chance of a denial-of-service (DoS) attack. All communications between clients and SQL Azure must be SSL encrypted, and clients should always connect with Encrypt = True to ensure that there is no risk of man-in-the-middle attacks. DoS attacks are further reduced by a service called DoSGuard that actively tracks failed logins from IP addresses and if it notices too many failed logins from the same IP address within a period of time, the IP address is blocked from accessing any resources in the service.

    Database Users

    The security model within a database is identical to that in SQL Server. Users are created and mapped to login names. Users can be assigned to roles, and users can be granted permissions. Data in each database is protected from users in other databases because the connections from the client application are established directly to the connecting user’s database. There is no way to break out of this connection, and as mentioned earlier, the USE command is not supported in SQL Azure.

    For more basic information about security with SQL Azure, see the white paper Security Guidelines for SQL Azure (http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=4322517e-9d80-4ad3-8a75-bf0a10aa64d9).

    Compatibility with SQL Server

    Although we talk about your SQL Azure database being similar to a SQL Server database, there are some important differences. In particular, many SQL Server features are not currently supported in a SQL Azure database.

    Because this list is changing with every SQL Azure Service Update, be sure to check the list of supported and unsupported features. For more information, see SQL Server Feature Limitations (SQL Azure Database) (http://msdn.microsoft.com/en-us/library/ff394115(v=MSDN.10).aspx).

    SQL Azure Architecture Overview

    As discussed earlier, each SQL Azure database is associated with its own subscription. From the subscriber’s perspective, SQL Azure provides logical databases for application data storage. In reality, each subscriber’s data is actually stored multiple times, replicated across three SQL Server databases that are distributed across three physical servers in a single data center. Many subscribers may share the same physical database, but the data is presented to each subscriber through a logical database that abstracts the physical storage architecture and uses automatic load balancing and connection routing to access the data. The logical database that the subscriber creates and uses for database storage is referred to as a SQL Azure database.

    Logical Databases on a SQL Azure Server

    SQL Azure subscribers access the actual databases, which are stored on multiple machines in the data center, through the logical server. The SQL Azure Gateway service acts as a proxy, forwarding the Tabular Data Stream (TDS) requests to the logical server. It also acts as a security boundary providing login validation, enforcing your firewall and protecting the instances of SQL Server behind the gateway against denial-of-service attacks. The Gateway is composed of multiple computers, each of which accepts connections from clients, validates the connection information and then passes on the TDS to the appropriate physical server, based on the database name specified in the connection. Figure 2 shows the complex physical architecture represented by the single logical server.

    Figure 2: A logical server and its databases distributed across machines in the data center (SQL stands for SQL Server)

    In Figure 2, the logical server provides access to three databases: DB1, DB3, and DB4. Each database physically exists on one of the actual SQL Server instances in the data center. DB1 exists as part of a database on a SQL Server instance on Machine 6, DB3 exists as part of a database on a SQL Server instance on Machine 4, and DB4 exists as part of a SQL Server instance on Machine 5. There are other SQL Azure databases existing within the same SQL Server instances in the data center (such as DB2), available to other subscribers and completely unavailable and invisible to the subscriber going through the logical server shown here.

    Each database hosted in the SQL Azure data center has three replicas: one primary replica and two secondary replicas. All reads and writes go through the primary replica, and any changes are replicated to the secondary replicas asynchronously. The replicas are the central means of providing high availability for your SQL Azure databases. For more information about how the replicas are managed, see “High Availability with SQL Azure” later in this paper.

    In Figure 2, the logical server contains three databases: DB1, DB2, and DB3. The primary replica for DB1 is on Machine 6 and the secondary replicas are on Machine 4 and Machine 5. For DB3, the primary replica is on Machine 4, and the secondary replicas are on Machine 5 and on another machine not shown in this figure. For DB4, the primary replica is on Machine 5, and the secondary replicas are on Machine 6 and on another machine not shown in this figure. Note that this diagram is a simplification. Most production Microsoft SQL Azure data centers have hundreds of machines with hundreds of actual instances of SQL Server to host the SQL Azure replicas, so it is extremely unlikely that if multiple SQL Azure databases have their primary replicas on the same machine, their secondary replicas will also share a machine.

    The physical distribution of databases that all are part of one logical instance of SQL Server means that each connection is tied to a single database, not a single instance of SQL Server. If a connection were to issue a USE command, the TDS might have to be rerouted to a completely different physical machine in the data center; this is the reason that the USE command is not supported for SQL Azure connections.

    Network Topology

    Four distinct layers of abstraction work together to provide the logical database for the subscriber’s application to use: the client layer, the services layer, the platform layer, and the infrastructure layer. Figure 3 illustrates the relationship between these four layers.

    Figure 3: Four layers of abstraction provide the SQL Azure logical database for a client application to use

    The client layer resides closest to your application, and it is used by your application to communicate directly with SQL Azure. The client layer can reside on-premises in your data center, or it can be hosted in Windows Azure. Every protocol that can generate TDS over the wire is supported. Because SQL Azure provides the TDS interface as SQL Server, you can use familiar tools and libraries to build client applications for data that is in the cloud.

    The infrastructure layer represents the IT administration of the physical hardware and operating systems that support the services layer. Because this layer is technically not a part of SQL Azure, it is not discussed further in this paper.

    The services and platform layers are discussed in detail in the next sections.

    Services Layer

    The services layer contains the machines that run the gateway services, which include connection routing, provisioning, and billing/metering. These services are provided by four groups of machines. Figure 4 shows the groups and the services each group includes.

    Figure 4: Four groups of machines provide the services layer in SQL Azure

    The front-end cluster contains the actual gateway machines. The utility layer machines validate the requested server and database and manage the billing. The service platform machines monitor and manage the health of the SQL Server instances within the data center, and the master cluster machines keep track of which replicas of which databases physically exist on each actual SQL Server instance in the data center.

    The numbered flow lines in Figure 4 indicate the process of validating and setting up a client connection:

    1. When a new TDS connection comes in, the gateway, hosted in the front-end cluster, is able to establish a connection with the client. A minimal parser verifies that the command is one that should be passed to the database, and is not something such as a CREATE DATABASE, which must be handled in the utility layer.
    2. The gateway performs the SSL handshake with the client. If the client refuses to use SSL, the gateway disconnects. All traffic must be fully encrypted. The protocol parser also includes a “Denial of Service” guard, which keeps track of IP addresses, and if too many requests come from the same IP address or range of addresses, further connections are denied from those addresses.
    3. Server name and login credentials supplied by the user must be verified. Firewall validation is also performed, only allowing connections from the range of IP addresses specified in the firewall configuration.
    4. After a server is validated, the master cluster is accessed to map the database name used by the client to a database name used internally. The master cluster is a set of machines maintaining this mapping information. For SQL Azure, partition means something much different than it means on your on-premises SQL Server instances. For SQL Azure, a partition is a piece of a SQL Server database in the data center that maps to one SQL Azure database. In Figure 2, for example, each of the databases contains three partitions, because each hosts three SQL Azure databases.
    5. After the database is found, authentication of the user name is performed, and the connection is rejected if the authentication fails. The gateway verifies that it has found the database that the user actually wants to connect to.
    6. After all connection information is determined to be acceptable, a new connection can be set up.
    7. This new connection goes straight from the user to the back-end (data) node.
    8. After the connection is established, the gateway’s job is only to proxy packets back and forth from the client to the data platform.

    Platform Layer

    The platform layer includes the computers hosting the actual SQL Server databases in the data center. These computers are called the data nodes. As described in the previous section on logical databases on a SQL Azure server, each SQL Azure database is stored as part of a real SQL Server database, and it is replicated twice onto other SQL Server instances on other computers. Figure 5 provides more details on how the data nodes are organized. Each data node contains a single SQL Server instance, and each instance has a single user database, divided into partitions. Each partition contains one SQL Azure client database, either a primary or secondary replica.

    Figure 5: The actual data nodes are part of the platform layer

    A SQL Server database on a typical data node can host up to 650 partitions. Within the data center, you manage these hosting databases just as you would manage an on-premises SQL Server database, with regular maintenance and backups being performed within the data center. There is one log file shared by all the hosted databases on the data node, which allows for better logging throughput with sequential I/O/group commits. Unlike on-premises databases, in SQL Azure, the database log files pre-allocate and zero out gigabytes of log file space before the space is needed, thus avoiding stalls due to autogrow operations.

    Another difference between log management in the SQL Azure data center and in on-premises databases is that every commit needs to be a quorum commit. That is, the primary replica and at least one of the secondary replicas must confirm that the log records have been written before the transaction is considered to be committed.

    Figure 5 also indicates that each data node machine hosts a set of processes referred to as the fabric. The fabric processes perform the following tasks:

    • Failure detection: notes when a primary or secondary replica becomes unavailable so that the Reconfiguration Agent can be triggered
    • Reconfiguration Agent: manages the re-establishment of primary or secondary replicas after a node failure
    • PM (Partition Manager) Location Resolution: allows messages to be sent to the Partition Manger
    • Engine Throttling: ensures that one logical server does not use an disproportionate amount of the node’s resources, or exceed its physical limits
    • Ring Topology: manages the machines in a cluster as a logical ring, so that each machine has two neighbors that can detect when the machine goes down

    The machines in the data center are all commodity machines with components that are of low-to-medium quality and low-to-medium performance capacity. At this writing, a commodity machine is a SKU with 32 GB RAM, 8 cores, and 12 disks, with a cost of around $3,500. The low cost and the easily available configuration make it easy to quickly replace machines in case of a failure condition. In addition, Windows Azure machines use the same commodity hardware, so that all machines in the data center, whether used for SQL Azure or for Windows Azure, are interchangeable.

    The term cluster refers to a collection of machines in the data center plus the operating system and network. A cluster can have up to 1,000 machines, and at this time, most data centers have one cluster of machines in the platform layer, over which SQL Azure database replicas can be spread. The SQL Azure architecture does not require a single cluster, and if more than 1,000 machines are needed, or if there is need for a set of machines to dedicate all their capacity to a single use, machines can be grouped into multiple clusters.

    High Availability with SQL Azure

    SQL Azure Database offers the high availability of an enterprise data center without the administrative overhead usually associated with managing your own on-premises servers.

    Concept

    The goal for Microsoft SQL Azure is to maintain 99.9 percent availability for the subscribers’ databases. This goal is achieved by the use of commodity hardware that can be quickly and easily replaced in the case of machine or drive failure and the management of the replicas, one primary and two secondary, for each SQL Azure database.

    Failure Detection

    Management in the data centers needs to detect not only a complete failure of a machine, but also conditions where machines are slowly degenerating and communication with them is affected. The concept of quorum commit, discussed earlier, addresses these conditions. First, a transaction is not considered to be committed unless the primary replica and at least one secondary replica can confirm that the transaction log records were successfully written to disk. Second, if both a primary replica and a secondary replica must report success, small failures that might not prevent a transaction from committing but that might point to a growing problem can be detected.

    Reconfiguration

    The process of replacing failed replicas is called reconfiguration. Reconfiguration can be required due to failed hardware or to an operating system crash, or to a problem with the instance of SQL Server running on the node in the data center. Reconfiguration can also be necessary when an upgrade is performed, whether for the operating system, for SQL Server, or for SQL Azure.

    All nodes are monitored by six peers, each on a different rack than the failed machine. The peers are referred to as neighbors. A failure is reported by one of the neighbors of the failed node, and the process of reconfiguration is carried out for each database that has a replica on the failed node. Because each machine holds replicas of hundreds of SQL Azure databases (some primary replicas and some secondary replicas), if a node fails, the reconfiguration operations are performed hundreds of times. There is no prioritization in handling the hundreds of failures when a node fails; the Partition Manager randomly selects a failed replica to handle, and when it is done with that one, it chooses another, until all of the replica failures have been dealt with.

    If a node goes down because of a reboot, that is considered a clean failure, because the neighbors receive a clear exception message.

    Another possibility is that a machine stops responding for an unknown reason, and an ambiguous failure is detected. In this case, an arbitrator process determines whether the node is really down.

    Although this discussion centers on the failure a single replica, it is really the failure of a node that is detected and dealt with. A node contains an entire SQL Server instance with multiple partitions containing replicas from up to 650 different databases. Some of the replicas will be primary and some will be secondary. When a node fails, the processes described earlier are performed for each affected database. That is, for some of the databases, the primary replica fails, and the arbitrator chooses a new primary replica from the existing secondary replicas, and for other databases, a secondary replica fails, and a new secondary replica is created.

    The majority of the replicas of any SQL Azure database must confirm the commit. At this time, user databases maintain three replicas, so a quorum commit would require two of the replicas to acknowledge the transaction. A metadata store, which is part of the Gateway components in the data centers, maintains five replicas and so needs three confirmations to satisfy a quorum commit. The master cluster, which maintains seven replicas, needs four of them to confirm a transaction. However, for the master cluster, even if all seven replicas fail, the information is recoverable, because mechanisms are in place to rebuild the master cluster automatically in case of such a massive failure.

    Failure of Primary Replica

    Because all reads and writes take place on the primary replica first, it is immediately evident if and when the primary replica has failed and work cannot continue. When reconfiguring for a failed primary replica, the Partition Manager chooses one of the secondary replicas as the new primary. In general, the secondary replica on the node with the least load is chosen as the new primary replica. This process of promoting a secondary replica to the status of primary is virtually undetectable by most users, so that as far as the users running the applications are concerned, there is no down time. The gateway will pass a “disconnection” message to the client application, and it should immediately attempt to reconnect. Because it may take up to 30 seconds for the information about the new primary replica to propagate through all the gateway servers, the best practice is to try to reconnect several times with a smaller wait time after each failed attempt.

    Failure of Secondary Replica

    When a secondary replica fails, the database is left with only the two replicas needed for the quorum commit, with no spares. The reconfiguration process is similar to the process that follows failure of the primary replica, in which one of the secondary replicas is promoted to primary. In both cases, only one secondary replica remains, so the Partition Manager attempts, after a short wait to determine whether the failure is permanent, to re-create a new secondary replica.

    In some cases, such as an operating system failure or an upgrade, what appears to be a secondary replica failure may not actually be a failure. Because a secondary replica on a failed node may only be out of service temporarily, a new replica is not built immediately. If the secondary replica comes back online, checkdisk and other verification commands are run to confirm the health of the replica.

    If a replica stays in the failed state for more than two hours, the Partition Manager starts building a replacement replica. There are of course some cases in which this fixed time-out is not optimum, such as in the cases where the machine fails due to an unrecoverable hardware problem. Future enhancements to SQL Azure might include the ability to differentiate different types of replica failure, and the ability to handle nonrecoverable failures more quickly.

    If a node does permanently fail, to create a new secondary replica, a machine is chosen from those in the cluster that has sufficient disk space and CPU capacity. This new machine is now used for the new secondary replica. A copy of the database is made from the primary replica, and then it is connected to the existing configuration. The time required for copying the entire database contents is the limiting factor in the maximum manageable size of SQL Azure databases.

    Availability Guarantees

    As mentioned earlier, the goal for Microsoft SQL Azure is to maintain 99.9 percent availability. Because of the way that database replicas are distributed across multiple servers and the efficient algorithms for promoting secondary replicas to primary, up to 15 percent of the machines in the data center can be down and the availability can still be guaranteed. In addition, even with the loss of up to 15 percent of the machines, the supportable workload will not be reduced.

    Scalability with SQL Azure

    One of the biggest benefits of hosting your databases with SQL Azure is the built-in scalability. You add more databases only when and if you need them, and if the need is only temporary, you can then drop the unneeded databases. There are two components within SQL Azure that allow this scalability by continuously monitoring the load on each node. One component is Engine Throttling, which ensures that the server doesn’t get overloaded. The other component is the Load Balancer, which ensures that a server isn’t continuously in the throttled state. In this section, we’ll look at these two components and discuss how engine throttling applies when predefined limits are reached and how load balancing works as the number of hosted database increases.

    Throttling

    Because of the multitenant use of each SQL Server in the data center, it is possible that one subscriber’s application could render the entire instance of SQL Server ineffective by imposing heavy loads. For example, under full recovery mode, inserting lots of large rows, especially ones containing large objects, can fill up the transaction log and eventually the drive that the transaction log resides on. In addition each instance of SQL Server in the data center shares the machine with other critical system processes that cannot be starved – most relevantly the fabric process that monitors the health of the system, which must be running for SQL Azure to stay up.

    To keep a data center server’s resources from being overloaded and jeopardizing the health of the entire machine, the load on each machine is monitored by the Engine Throttling component. In addition, each database replica is monitored to make sure that statistics such as log size, log write duration, CPU usage, the actual physical database size limit, and the SQL Azure user database size are all below target limits. If the limits are exceeded, the result can be that a SQL Azure database rejects reads or writes for 10 seconds at a time. Occasionally, violation of resource limits may result in the SQL Azure database permanently rejecting reads and writes (depending on the resource type in question). In addition, the Engine Throttling component also monitors for long-running transactions; if transactions take longer than five minutes and consume more than the minimum amount of critical system resources (again, this includes log space, lock memory, and other resources) the Engine Throttling component kills such transactions.

    The user is notified of a limit violation via one of three error messages, depending on the type of resource. For each type of resource, the response to the violation is different. The three types of limits and the responses when each type is exceeded are:

    • System operating system resource limits, such as CPU and memory. If these limits are exceeded, the client application receives a message suggesting that the query or batch be retried after 10 seconds.
    • SQL Azure database size limits, based on the edition and maximum size chosen during database creation. If this limit is exceeded, the client application receives a message requesting that data be deleted, or that the maximum size be increased or the SKU changed from “Web” to “Business”.
    • Physical SQL Server database size limits based on the actual data center machine capacity. If this limit is exceeded, the client application receives a message indicating that the system is experiencing issues that are being investigated.

    When the resource usage falls below the specified limits, the throttling condition is lifted.

    Load Balancer

    At this time, although there are availability guarantees with SQL Azure, there are no performance guarantees. Part of the reason for this is the multitenant problem: many subscribers with their own SQL Azure databases share the same instance of SQL Server and the same computer, and it is impossible to predict the workload that each subscriber’s connections will be requesting. However, not having guarantees doesn’t mean that performance is not a critical aspect of the design of the SQL Azure infrastructure. SQL Azure provides load balancing services that evaluate the load on each machine in the data center.

    When a new SQL Azure database is added to the cluster, the Load Balancer determines the locations of the new primary and secondary replicas based on the current load on the machines.

    If one machine gets loaded too heavily, the Load Balancer can move a primary replica to a machine that is less loaded. The simplest way to do this move is to switch a primary and secondary replica for a SQL Azure database that is performing sluggishly. This switch can have a major and immediate impact on performance, because all reads and writes take place on the primary replica.

    SQL Azure Development Considerations

    Although developing SQL Server applications for SQL Azure is very similar to developing applications for an on-premises instance of SQL Server, there are some key differences that you need to be aware of. As mentioned earlier, make sure you monitor the SQL Azure Team blog, where some of the most crucial differences are discussed, and where new features are announced when they are added. In addition, you should download and read the white paper SQL Server vs. SQL Azure (http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=86f12b41-1eba-4567-9ac8-02eaa7d12034), which is about the differences between SQL Server and SQL Azure.

    In the section, we point out some of the most important differences that you need to be aware of before you start planning your first SQL Azure project.

    Connections, Disconnections

    When using a web-enabled database like SQL Azure, you should be prepared to handle unexpected dropping of connections, including planning for them in your code. The best way to handle connection loss is to re-establish the connection and then re-execute any commands or queries that failed.

    If a network problem causes a disconnection, SQL Azure cannot return a meaningful error to the application before the session ends. However, if you attempt to reuse this connection (like when you use connection pooling with SQL Server), you receive an error message informing you that a transport-level error has occurred.

    Like any other database, SQL Azure will at times terminate sessions due to errors, resource shortages, and other transient reasons. In these situations SQL Azure always attempts to return a specific error message if the client connection has an active request. However, it may not always be possible to return an error message to a client application if there are no pending requests. For example, if you are connected to your database through SQL Server Management Studio for longer than 30 minutes without an active request, your session will time out, and because there are no active requests, SQL Azure can’t return an error message.

    Clustered Index Requirement

    Unlike SQL Server, every table in your SQL Azure databases needs to have a clustered index. You might be aware that when you declare a primary key on a table, a clustered index is created on the primary key column(s) by default. Keep in mind that a clustered index on the primary key is not the only way to create your clustered index, and sometimes it is not the best choice. Clustered indexes sort and store the data rows in the table based on their key values. There can only be one clustered index per table, because the data rows themselves can only be sorted in one order, and it is up to you to make sure every table has a clustered index.

    SQL Azure does not allow heap tables – a heap table, by definition, is a table that doesn't have a clustered index. However, this rule applies only to your SQL Azure databases. Temporary tables exist in the tempdb database that is part of the underlying SQL Server instances in the data center, and those tables can be heaps.

    When additional nonclustered indexes are built on tables that have clustered indexes, those additional indexes reference the table data using the clustered key. In SQL Server, a nonclustered index on a heap references data using physical addresses of the data. Because of the way SQL Azure can store multiple SQL Azure databases within one SQL Server database, referencing actual physical locations would restrict the flexibility of the SQL Azure system.

    All indexes in SQL Server and SQL Azure are stored as B-Trees. The underlying high availability and replication technology in SQL Azure is based on replicating B-Tree rows. Because of this structure, you can maintain machines independently, and your system can benefit from I/O optimizations that would not be possible with heaps.

    Transaction Management

    SQL Azure supports local transactions with the normal Transact-SQL commands BEGIN TRANSACTION, ROLLBACK TRANSACTION, and COMMIT TRANSACTION. They work exactly the way they do on SQL Server.

    Every SQL Azure databases is configured with the database options to enable snapshot isolation. Both READ_COMMITTED_SNAPSHOT and ALLOW_SNAPSHOT_ISOLATION database options are set to ON. The default isolation level in both SQL Server and SQL Azure is READ COMMITTED, and because of the database option READ_COMMITTED_SNAPSHOT, your SQL Azure transactions run in optimistic concurrency. You cannot change the database options in a SQL Azure database. You may be able to control the isolation level explicitly on a connection using the SET TRANSACTION ISOLATION LEVEL command prior to beginning a transaction. However, you cannot change the default setting, READ COMMITTED with optimistic concurrency, to the SQL Server default setting, READ COMMITTED with pessimistic concurrency. The only way to duplicate the default SQL Server behavior is to use the locking hint WITH (READCOMMITTEDLOCK) with every table in every transaction.

    SQL Azure Management

    Because your SQL Azure databases are hosted within larger SQL Server instances on machines in the data centers, the management work that you need to do is very limited. However, some maintenance tasks are still necessary. You can use familiar SQL Server tools and methods to perform these tasks. This section discusses common database management tasks such as troubleshooting and maintenance. It also covers tools you use to manage SQL Azure, including supported metadata objects, well as jobs, alerts, and other tools.

    Maintenance and Upgrades

    As a DBA, your primary job is typically to manage regular backups and performance consistency checks. These tasks are now handled by Microsoft within the data centers. DBAs are not responsible for the physical management and placement of files and filegroups, or of monitoring file growth. All physical aspects of dealing with your databases are handled in the data center.

    Upgrades are also a typical DBA task. In SQL Azure, those are also handled in the data center, one replica at a time. In fact, SQL Azure upgrades are released much more often than SQL Server upgrades. Currently, a new SQL Azure version is released every couple of months. Upgrades are handled using the same reconfiguration process used when a node goes down, because the node is made unavailable while the upgrade is being performed. In addition to upgrades of SQL Azure itself, the hosting SQL Server instance, as well as the operating system software, will occasionally need upgrades or patches, necessitating bringing the node offline temporarily.

    For more information about the patching process, see the blog post Patching SQL Azure (http://blogs.msdn.com/b/sqlazure/archive/2010/04/30/10004818.aspx).

    Troubleshooting

    Because you have no control over the physical provisioning or configuration of your machines or of the database files, your troubleshooting requirements are minimized. You may need to perform troubleshooting in areas that involve poorly performing queries and concurrency problems, such as blocking. Because your SQL Azure database runs by default in READ COMMITTED SNAPSHOT isolation level, you may not encounter many blocking problems.

    The techniques for troubleshooting blocking problems or suboptimal execution plans are basically the same in SQL Azure as for SQL Server, and a thorough discussion of techniques is beyond the scope of this paper. However, this paper does describe the most useful tools available for monitoring and troubleshooting your SQL Azure applications. Just like in SQL Server, some of the main tools available for troubleshooting are the dynamic management views (DMVs). However, SQL Azure supports only a fraction of the full set of dynamic management views that SQL Server provides, and some of the available dynamic management views work slightly differently. For example, in SQL Server, users need the VIEW SERVER STATE permission to see the contents of many of the views, but in SQL Azure, users need the VIEW DATABASE STATE permission. In SQL Server, views such as sys.dm_tran_locks, sys.dm_exec_requests, and sys.dm_exec_query_stats show details for processes and queries across the entire server instance. In SQL Azure, these views return information about your SQL Azure database only. For more information about supported dynamic management views, see “Metadata” later in this paper.

    For more information about troubleshooting in SQL Azure, see Monitoring SQL Azure Using Dynamic Management Views (http://msdn.microsoft.com/en-us/library/ff394114.aspx).

    Metadata

    Metadata objects in SQL Azure are a subset of those available in SQL Server. Metadata objects in SQL Server are divided into three categories: compatibility views, catalog views, and dynamic management views. Compatibility views handle backward compatibility, and SQL Azure only supports one of these views, called sys.syslanguages. Catalog views are the preferred interface for accessing SQL Server metadata regarding objects, users, and schemas, and many of the SQL Server catalog views are supported in SQL Azure. Some of the ones that aren’t supported deal with physical structures such as sys.filegroups and sys.allocation_units, or with features that are not supported in SQL Azure, such as sys.fulltext_indexes and sys.service_broker_endpoints.

    SQL Azure partially supports three categories of dynamic management views useful for troubleshooting, monitoring, and system analysis:

    • Database-related dynamic management views.Only one view in this category is supported:
      • sys.dm_db_partition_stats.
    • Execution-related dynamic management views and functions. Seven objects in this category are supported, and the information returned pertains only to activity in the current SQL Azure database:
      • sys.dm_exec_connections
      • sys.dm_exec_query_plan
      • sys.dm_exec_query_stats
      • sys.dm_exec_requests
      • sys.dm_exec_sessions
      • sys.dm_exec_sql_text
      • sys.dm_exec_text_query_plan
    • Transaction-related dynamic management views. Four views in this category are supported, and again, the information returned pertains only to activity in the current SQL Azure database:
      • sys.dm_tran_active_transactions
      • sys.dm_tran_database_transactions
      • sys.dm_tran_locks
      • sys.dm_tran_session_transactions

    For the full list of both supported and unsupported metadata objects (which changes with each service update to SQL Azure), see System Views (SQL Azure Database) (http://msdn.microsoft.com/en-us/library/ee336238.aspx).

    Jobs and Alerts

    Many DBA tasks in an on-premises instance of SQL Server are managed using jobs and alerts through SQL Server Agent. Because SQL Azure does not support SQL Server Agent, setting up any kind of recurring task is a bit more problematic. However, many tasks are not required or possible with SQL Azure, such as backup operations. In addition, many common warnings for which you might want to configure alerts, such as database size and system resource usage, cannot be directly observed in SQL Azure, so jobs and alerts are really not a feature in a SQL Azure DBA’s life.

    If you need functionality that is similar to SQL Server Agent for SQL Azure for other tasks, you can use a Windows Azure worker role and your own custom code. For a series of blog posts that describe the details of how to set this up, see I Miss You SQL Server Agent (http://blogs.msdn.com/b/sqlazure/archive/2010/07/30/10044271.aspx).

    Tools

    As of this writing, the supported graphical interface for working with SQL Azure databases is the same tool, SQL Server Management Studio, you use to work with on-premises databases. However, Microsoft is working on a lightweight and easy-to-use database management tool for SQL Azure databases, which is being referred to by the code name Project “Houston”. It is designed specifically for Web developers and other technology professionals who want to quickly develop, deploy, and manage their data-driven applications in the cloud. Project “Houston” provides a web-based database management tool for basic database management tasks such as authoring and executing queries, as well as designing and editing a database schema. CTP1 of Project “Houston” was made available for download and testing in July. For more information, see the blog post CTP1 of Microsoft Project Code-Named “Houston” (http://blogs.msdn.com/b/sqlazure/archive/2010/07/21/10040258.aspx).

    Future Plans for SQL Azure

    The list of features and capabilities of SQL Azure is changing rapidly, and Microsoft is working continuously to release more enhancements.

    For example, in SQL Azure building big databases means harnessing the easy provisioning of SQL Azure databases and spreading large tables across many databases. Because of the SQL Azure architecture, this also means scaling out the processing power. To assist with scaling out, Microsoft plans to make it easier to manage tables partitioned across a large set of databases. Initially, querying is expected to remain restricted to one database at a time, so developers will have to handle the access to multiple databases to retrieve data. In later versions, improvements are planned in query fan-out, to make the partitioning across databases more transparent to user applications.

    Another feature in development is the ability to take control of your backups. Currently, backups are performed in the data centers to protect your data against disk or system problems. However, there is no way currently to control your own backups to provide protection against logical errors and use a RESTORE operation to return to an earlier point in time when a backup was made. The new feature involves the ability to make your own backups of your SQL Azure databases to your own on-premises storage, and the ability to restore those backups either to an on-premises database or to a SQL Azure database. Eventually Microsoft plans to provide the ability to perform SQL Azure backups across data centers and also make log backups so that point-in-time recovery can be implemented.

    Conclusion

    Using SQL Azure, you can provision and deploy your relational databases and your database solutions to the cloud, without the startup cost, administrative overhead, and physical resource management required by on-premises databases. In this paper, we examined the internals of the SQL Azure databases. By creating multiple replicas of each user database, spread across multiple machines in the Microsoft data centers, SQL Azure can provide high availability and immediate scalability in a familiar SQL Server development environment.

    For more information:

    http://www.microsoft.com/sqlserver/: SQL Server Web site

    http://msdn.microsoft.com/en-us/windowsazure/sqlazure/default.aspx: SQL Azure Web site

    http://blogs.msdn.com/b/sqlazure/: SQL Azure Team blog

    http://msdn.microsoft.com/en-us/library/ee336279.aspx: SQL Azure documentation

    http://technet.microsoft.com/en-us/sqlserver/: SQL Server TechCenter

    http://msdn.microsoft.com/en-us/sqlserver/: SQL Server DevCenter

    SQL Azure