Database High Availability in Azure
The Cloud Has Perpetual Up-Time and Resiliency
A common perception is that the cloud is infinitely scalable, with perpetual up-time and resiliency built in. The cloud does enable resiliency, and hyperscale clouds like Azure offer service level agreements (SLAs) of 99.95% or better for Infrastructure as a Service (IaaS) virtual machines. Even so, it is up to application developers to design cloud services for high availability by planning and deploying workloads around the resiliency-enabling features of the cloud.
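To make the SLA figure concrete, the arithmetic below converts an availability percentage into a downtime budget. It is a minimal sketch: the function name `allowed_downtime_minutes` and the 30-day month are assumptions chosen for illustration, not part of any Azure SLA definition.

```python
def allowed_downtime_minutes(sla_percent, days=30):
    """Return the downtime budget, in minutes, over a period of `days` days.

    Assumption: the period is a flat 30-day month unless stated otherwise.
    """
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


# Compare a few common availability targets.
for sla in (99.9, 99.95, 99.99):
    budget = allowed_downtime_minutes(sla)
    print(f"{sla}% -> {budget:.1f} minutes per 30-day month")
```

At 99.95%, the budget is 21.6 minutes in a 30-day month, which leaves little room for a tier that depends on a single VM.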
Azure and Availability Sets
To help meet the 99.95% SLA, Azure supports availability sets, which automatically spread member VMs across five update domains by default and across up to three fault domains.
An update domain is a group of VMs and underlying virtualization hosts that Azure can reboot at the same time during planned maintenance. A fault domain (in effect, a rack) is a group of hosts that share a common power source and network switch. So an Azure availability set automatically spreads its member VMs across different virtualization hosts in five update domains, which in turn are spread across up to three racks.
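The spreading can be pictured with a toy model. The sketch below gives each VM an update domain label and a fault domain label by simple round-robin. Both the `place_vms` function and the round-robin rule are assumptions made for illustration, not a description of how Azure actually places VMs.

```python
def place_vms(vm_names, update_domains=5, fault_domains=3):
    """Assign each VM an (update_domain, fault_domain) pair by round-robin.

    Hypothetical model for illustration only; not Azure's placement algorithm.
    """
    placement = {}
    for index, name in enumerate(vm_names):
        placement[name] = (index % update_domains, index % fault_domains)
    return placement


# Seven web servers in one availability set: the sixth and seventh VMs
# wrap around and share update domains with the first and second.
web_vms = [f"web-{i}" for i in range(7)]
for name, (ud, fd) in place_vms(web_vms).items():
    print(f"{name}: update domain {ud}, fault domain {fd}")
```

In this model, a set with five or fewer VMs never has two VMs in the same update domain, so no single host reboot takes out more than one of them.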
Deploy Each Application Tier into Its Own Availability Set
For high availability, virtual machines intended for the same purpose are deployed into an Azure availability set. This means VMs for a multi-tiered cloud service are placed into their own tier-specific availability sets.
Perhaps the best-known example is the web role, where multiple web server VMs are created and configured to function with an Azure load balancer. The same strategy can be used with any tier, which makes Azure availability sets a general-purpose resiliency feature when designing a cloud service for high availability.
If the hosts in an update domain have to reboot due to planned maintenance, or if an unexpected issue develops on one of those hosts, the application tier remains available because its VMs are spread across different update domains. Likewise, the tier remains available when its own VMs need to undergo maintenance, provided they are not all taken down at the same time.
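The availability argument can be checked mechanically. The sketch below takes a hypothetical mapping of VM names to update domain numbers and asks whether at least one VM keeps running no matter which single update domain reboots. The function names and the sample tiers are assumptions for illustration.

```python
def vms_still_running(vm_update_domains, rebooting_domain):
    """Return the VMs that are not in the update domain being rebooted."""
    return sorted(
        vm for vm, domain in vm_update_domains.items()
        if domain != rebooting_domain
    )


def tier_survives_any_single_reboot(vm_update_domains):
    """True if every single-domain reboot leaves at least one VM running."""
    domains_in_use = set(vm_update_domains.values())
    return all(
        vms_still_running(vm_update_domains, domain)
        for domain in domains_in_use
    )


# Hypothetical tiers: VM name -> update domain number.
web_tier = {"web-0": 0, "web-1": 1, "web-2": 2}
single_vm_tier = {"db-0": 0}

print(tier_survives_any_single_reboot(web_tier))
print(tier_survives_any_single_reboot(single_vm_tier))
```

A tier with a single VM, or with all of its VMs in one update domain, fails this check, which is the case an availability set is meant to prevent.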
But What About the Database Tier?
When thinking of database clusters, many people are reminded of legacy physical fail-over server clusters with shared block storage allocated from a storage area network. While it is possible to use shared block storage with virtual machines, scalable cloud solutions don’t need to follow legacy physical models blindly, and that includes the requirement for shared block storage.
In Azure, the combination of Windows Server 2012 R2 fail-over clusters, Azure availability sets, and SQL Server 2012 Always-On (AlwaysOn Availability Groups) can be used to design high availability for a cloud service’s database tier.
No Dependencies on Shared SAN-Allocated Volumes
Unlike legacy fail-over clusters, a Windows Server 2012 R2 fail-over cluster does not need a shared storage area network (SAN)-allocated volume as a quorum disk. Instead, a simple CIFS/SMB file share (known as the file-share witness) can serve as the functional equivalent of a legacy cluster’s quorum disk.
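The role of the file-share witness is easiest to see as vote counting. The sketch below assumes a simple static majority rule in which each node and the witness hold one vote. `has_quorum` and its parameters are hypothetical names, and the model ignores refinements such as dynamic quorum.

```python
def has_quorum(nodes_up, total_nodes, has_witness=False, witness_up=False):
    """Return True if the votes present form a strict majority.

    Assumption: one vote per node, plus one vote for the witness if the
    cluster is configured with one.
    """
    total_votes = total_nodes + (1 if has_witness else 0)
    votes_present = nodes_up + (1 if has_witness and witness_up else 0)
    return votes_present > total_votes / 2


# Two-node cluster, one node lost.
print(has_quorum(nodes_up=1, total_nodes=2))
print(has_quorum(nodes_up=1, total_nodes=2, has_witness=True, witness_up=True))
```

In a two-node cluster, losing one node leaves one vote out of two, which is not a majority. With a witness, the surviving node and the witness together hold two of three votes, so the cluster keeps running.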
Likewise, SQL Server 2012 Always-On does not depend on shared SAN-allocated volumes either. High availability for databases is achieved with Always-On groups and replication.
Always-On Database Replicas and Replication
An Always-On group is a fail-over environment for a set of databases that are intended to fail over together. It is created from two or more Windows Server 2012 R2 fail-over cluster nodes, each running SQL Server 2012.
Each Always-On group has a set of primary databases and one or more sets of secondary databases to which data is replicated. The set of primary databases lives on the cluster node that is currently the primary replica. The other cluster nodes are secondary replicas, each holding a set of secondary databases. The secondary replicas serve as potential fail-over targets for the Always-On group: a secondary replica can become the new primary replica as a result of a fail-over action.
Reads can occur at the primary replica or, where a secondary replica is configured to allow read access, at any such secondary replica. Writes occur exclusively at the primary replica, and the changes are then replicated to the secondary replicas.
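The read and write rules can be illustrated with a toy in-memory model. The classes below are hypothetical and stand in for replicas holding key-value data. As a simplifying assumption, each write is copied to every secondary immediately, whereas a real Always-On group replicates transaction log records and can do so synchronously or asynchronously.

```python
class Replica:
    """One cluster node's copy of the group's databases (toy key-value store)."""

    def __init__(self, name):
        self.name = name
        self.rows = {}


class AvailabilityGroup:
    """Toy model: writes go to the primary only, reads may go to any replica."""

    def __init__(self, primary, secondaries):
        self.primary = primary
        self.secondaries = list(secondaries)

    def write(self, replica, key, value):
        if replica is not self.primary:
            raise PermissionError(
                f"{replica.name} is read-only; write to {self.primary.name}"
            )
        self.primary.rows[key] = value
        # Assumption: immediate copy to every secondary, for simplicity.
        for secondary in self.secondaries:
            secondary.rows[key] = value

    def read(self, replica, key):
        return replica.rows.get(key)


node_a, node_b = Replica("node-a"), Replica("node-b")
group = AvailabilityGroup(primary=node_a, secondaries=[node_b])
group.write(node_a, "order-1", "shipped")
print(group.read(node_b, "order-1"))  # read served by the secondary
```

Attempting `group.write(node_b, ...)` raises `PermissionError` in this model, mirroring the rule that writes are accepted only at the primary replica.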
Always-On Database Fail-Overs
For planned fail-overs due to scheduled maintenance – patching of an IaaS virtual machine running SQL Server, notification of Azure maintenance, etc. – a manual Always-On fail-over can be performed so the primary replica is not on a node that is affected by the scheduled maintenance.
Automatic Always-On fail-overs can also be configured. If the primary replica is suddenly affected by an unexpected issue, a synchronized secondary replica can automatically take over as the new primary replica.
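Both kinds of fail-over amount to a role swap, which the sketch below models with replica names only. The `AvailabilityGroup` dataclass, the function names, and the policy of promoting the first healthy secondary are all assumptions for illustration. In a real deployment the cluster makes this decision, and only a suitably configured, synchronized secondary is eligible.

```python
from dataclasses import dataclass, field


@dataclass
class AvailabilityGroup:
    primary: str
    secondaries: list = field(default_factory=list)


def manual_fail_over(group, target):
    """Planned fail-over: promote `target` and demote the old primary."""
    if target not in group.secondaries:
        raise ValueError(f"{target} is not a secondary replica")
    group.secondaries.remove(target)
    group.secondaries.append(group.primary)
    group.primary = target


def automatic_fail_over(group, healthy_nodes):
    """Unplanned fail-over: if the primary is down, promote a healthy secondary.

    Assumption: the first healthy secondary in list order is chosen.
    """
    if group.primary in healthy_nodes:
        return group.primary  # primary is fine, nothing to do
    for candidate in list(group.secondaries):
        if candidate in healthy_nodes:
            manual_fail_over(group, candidate)
            return candidate
    raise RuntimeError("no healthy secondary replica available")


group = AvailabilityGroup(primary="sql-1", secondaries=["sql-2", "sql-3"])

# Planned: move the primary role off sql-1 before patching it.
manual_fail_over(group, "sql-2")
print(group.primary)

# Unplanned: sql-2 stops responding, so a healthy secondary takes over.
automatic_fail_over(group, healthy_nodes={"sql-1", "sql-3"})
print(group.primary)
```

In both cases the old primary is kept in the list of secondaries, reflecting that it can rejoin the group as a fail-over target once it is healthy again.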
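Fail-overs are largely transparent to the tiers and clients that depend on the Always-On databases, provided they connect through the availability group listener and can retry a dropped connection. The cloud service’s database role continues to be available.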
Conclusion
Just as the cloud can be highly scalable, the cloud can also be resilient, but this does not happen automatically. Application developers have to design resiliency and high availability into a cloud service, preferably from the ground up. The design can take advantage of technology elements offered at different levels: resiliency-enabling features from an IaaS virtual machine’s OS, from the database tier’s relational database management system (RDBMS), and even from the cloud platform itself. The combination of Azure availability sets, Windows Server 2012 R2 fail-over clusters, and SQL Server 2012 Always-On illustrates this point well.