Azure Backup for SQL Server aka How I Blew Up A Production Environment

Back in June 2018, Microsoft announced the public preview of their new Azure Backup capability: SQL databases running in Azure VMs. Testing it out, I found that many features proved useful for the project we were working on (easy to set up and manage, support for SQL 2012 and later, central management of all database backups, point-in-time restores from the Azure portal, and support for Availability Groups).

We were excited to deploy this in our production environment as it solved many issues we were running into with other backup solutions (ex: Managed Backup available in 2014 but not 2012, fragmented management for each instance, multiple repositories for database backups, etc).

Having experience with a few SQL backup solutions (Managed Backup in SQL 2014+, Automated Backup in the Azure portal, and Ola Hallengren’s SQL Server Maintenance Solution), I was reasonably confident we could enable the Azure Backup solution for SQL databases with relatively little impact…

This is the part where you point at me and laugh…because then things went horribly wrong.

So, what happened? The primary instance for one of our production SQL Availability Groups completely locked up and all databases started to time out when anything tried to connect to them. When we finally managed to log onto the server, we found that disk throughput for the SQL database and log volumes was maxed out.

Until we opened a case with Microsoft support, we were unaware of two VERY important points with the preview for Azure Backup for SQL Server (these are now documented in the Microsoft document link listed below).

  1. Azure Backup does NOT support running a full backup (copy-only) from secondary replicas. Full backups can only run on the primary replica.
  2. By default, Azure Backup will attempt to back up 50 databases at once.

Because of this, the resulting events occurred on the primary SQL replica:

  • Azure Backup for SQL Server was enabled and full backups were kicked-off.
  • Azure Backup attempted to back up 50 databases at the same time.
  • The read IO on the SQL data disks maxed-out their throughput (for 1TB premium disks, this caps at 200MB/s) as Azure Backup attempted to transfer data from the primary replica to the Recovery Services Vault.
  • Since the primary replica was still technically “up” on the network, no failover occurred.
  • Attempts to connect to these databases timed-out as all disk IO was occupied by the backup process, so no reads/writes on the SQL disks could occur
  • To resolve, we disabled Azure Backup for the SQL databases and killed any backup processes on the primary instance.

Microsoft did implement a workaround for this issue. While you still can’t run a full backup from secondary replicas (at least, not yet), you can throttle the number of databases that Azure Backup will attempt to back up. See https://docs.microsoft.com/en-us/azure/backup/backup-azure-sql-database#faq.

To learn more about the Azure Backup for SQL Server solution, I highly recommend reading through Microsoft’s thorough documentation, which provides step-by-step instructions for setting this up.  See https://docs.microsoft.com/en-us/azure/backup/backup-azure-sql-database.

Hopefully, this will be integrated into the Azure Backup policy in the near future rather than requiring a manual change on each SQL instance.

Update:

As of 3/18/2019, the Azure Backup for SQL server solution is now Generally Available. Along with this, the solution was updated to include support for copy-only full backups from secondary replicas in an Availability Group. Note that secondary replicas must be set to read-only in the AG settings and backup preferences must be set to Secondary or Secondary Only.

See updated documentation at:

https://docs.microsoft.com/en-us/azure/backup/backup-azure-sql-database

If you’re interested in learning more about related Hybrid IT & Cloud topics, please take a look at these recent posts. And if you need answers to any questions around RSAT—or any aspect of Windows Server—please don’t hesitate to contact us. We’d love to help you out.

 

Share on

linkedin sharing button twitter sharing button

Ready to get started?

Enter your information to keep the conversation going.
Location image
4 Sentry Parkway East, Suite 300, Blue Bell PA, 19422

Email Image

info@anexinet.com

Phono Image610 239 8100


Location Image4 Sentry Parkway East, Suite 300, Blue Bell PA, 19422
Phono Image610 239 8100