Zerto is a VM-replication technology that offers a lot of flexibility around failover and testing. It uses vCenter APIs to manage just about everything. As changes occur on the source VMs, they are sent to the recovery site and written to journal volumes. These journal volumes store this data based on the RPO. Zerto also generates checkpoints that serve as point-in-time recovery options. Journals are limited in size but can be resized as needed. VMs are organized into Virtual Protection Groups (VPGs), which recover as a group. The RPO is calculated across all VMs in a VPG. Selected VMs within a group can be failed over, but no other VMs can be failed over until the previous failover is complete.
Zerto has four distinct Disaster Recovery operations that allow you to bring virtual machines online at a recovery site. This post reviews the nuances and use cases to consider for each of these four operations.
Zerto DR Operation #1: Failover Test
This is your basic non-disruptive failover test option. It allows you to quickly test the failover of a VPG while maintaining replication from the source to the journal. A new VM is spun up based on the recovery disks and journal of the source VM, and a scratch disk is created to hold writes made on the recovered VM. The VM is brought online using the Test network defined for the VPG or VM. Replication between the Production and Recovery sites is maintained, and the production VM remains available for the entire process.
Considerations for Failover Test operation
As mentioned, the recovered VM writes changes to the scratch disk. Once this disk is full, no more writes can occur on the VM and it becomes unresponsive. The scratch disk is the same size as the journal disk, and its size cannot be modified during the Failover Test operation. The only way to change it is to end the test, resize the journal, and start the test over. This operation is therefore suitable for short-term tests. To use the Failover Test operation for longer durations, you must increase the journal size accordingly before starting the test. This test does not include point-in-time recovery, so the failover always uses the latest checkpoint available.
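To make the sizing concern concrete, here is a minimal back-of-the-envelope sketch in Python. It assumes the scratch disk matches the journal size, as described above, and that the recovered VM writes new data at a constant average rate. The function names, the headroom parameter, and the figures are hypothetical and are not part of any Zerto tooling.

```python
def scratch_runway_hours(journal_size_gb: float, write_rate_mb_per_s: float) -> float:
    """Estimate how many hours a test can run before the scratch disk fills.

    Assumes the scratch disk is the same size as the journal and that the
    recovered VM writes at a constant average rate (a simplification).
    """
    if journal_size_gb <= 0 or write_rate_mb_per_s <= 0:
        raise ValueError("journal size and write rate must be positive")
    size_mb = journal_size_gb * 1024
    return size_mb / write_rate_mb_per_s / 3600


def required_journal_gb(test_hours: float, write_rate_mb_per_s: float,
                        headroom: float = 0.2) -> float:
    """Journal size (GB) needed for a test of the given length.

    headroom is an assumed safety margin expressed as a fraction (0.2 = 20%).
    """
    if test_hours <= 0 or write_rate_mb_per_s <= 0 or headroom < 0:
        raise ValueError("inputs must be positive (headroom may be zero)")
    needed_mb = test_hours * 3600 * write_rate_mb_per_s
    return needed_mb * (1 + headroom) / 1024


if __name__ == "__main__":
    # Hypothetical numbers: 150 GB journal, 2 MB/s of writes during the test.
    print(f"Runway: {scratch_runway_hours(150, 2):.1f} hours")
    print(f"Journal for a 48 h test: {required_journal_gb(48, 2):.0f} GB")
```

If the estimated runway is shorter than the planned test, that is a signal to either resize the journal beforehand or consider the Clone operation instead.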
Zerto DR Operation #2: Clone
This operation clones the VMs to the recovery site and uses the Production network defined for the VPG or for the VM. Like the Failover Test operation, this is non-disruptive. A new VM is created based on the checkpoint you choose for the source VPG. The difference here is that the recovery disks are duplicated at the recovery site. No scratch disk is required, as changes are written directly to the recovered VM's disks. Replication between the production and recovery VPG is maintained.
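Choosing a checkpoint amounts to picking the most recent recovery point at or before the moment you want to return to. The following Python sketch illustrates that selection logic only. The checkpoint list and the function name are hypothetical, and it does not call any Zerto API.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Iterable, Optional


def pick_checkpoint(checkpoints: Iterable[datetime],
                    target: datetime) -> Optional[datetime]:
    """Return the latest checkpoint at or before target, or None if none exists."""
    ordered = sorted(checkpoints)
    # bisect_right gives the count of checkpoints that are <= target.
    idx = bisect_right(ordered, target)
    return ordered[idx - 1] if idx else None


if __name__ == "__main__":
    # Hypothetical checkpoints taken five minutes apart.
    available = [
        datetime(2024, 1, 1, 10, 0),
        datetime(2024, 1, 1, 10, 5),
        datetime(2024, 1, 1, 10, 10),
    ]
    print(pick_checkpoint(available, datetime(2024, 1, 1, 10, 7)))
```

A result of None means the requested time is older than anything still held in the journal.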
Considerations for Clone operation
Because this process clones the VM and its storage without requiring a scratch disk, this operation is suitable for longer tests or point-in-time recovery. This comes at the cost of additional storage, as the recovery disks are duplicated. Additionally, the VMs are brought up on the Production network defined for the VPG. If this is identical to the actual production network, you must ensure the VM obtains a new IP address and that DNS changes are taken into consideration. Alternatively, the Production network setting can be changed to a network that will not have these routing issues.
Zerto DR Operation #3: Move
The Move operation can be used to migrate VMs from the production site to the recovery site. This is a disruptive operation, as the source VM is taken offline. It requires that a commit or rollback be performed afterward, which either commits the VPG to the recovery site or rolls the VPG back to the production site. This operation also has the option of reverse protection. With this enabled, a journal is created on the source side to allow migration back to the source.
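The commit-or-rollback requirement can be pictured as a small state machine. The Python sketch below is purely illustrative: the class, state names, and the choice to create the source-side journal at commit time are assumptions made for the example, not Zerto's implementation or API.

```python
from enum import Enum


class MoveState(Enum):
    PROTECTING = "protecting"         # VPG replicating from production to recovery
    AWAITING_DECISION = "awaiting"    # VMs moved, waiting for commit or rollback
    COMMITTED = "committed"           # VPG now lives at the recovery site
    ROLLED_BACK = "rolled_back"       # VPG returned to the production site


class MoveOperation:
    """Hypothetical model of a Move that must end in a commit or a rollback."""

    def __init__(self, reverse_protection: bool = False):
        self.state = MoveState.PROTECTING
        self.reverse_protection = reverse_protection
        self.source_journal_created = False

    def start(self) -> None:
        if self.state is not MoveState.PROTECTING:
            raise RuntimeError("move already started")
        self.state = MoveState.AWAITING_DECISION

    def _require_pending(self) -> None:
        if self.state is not MoveState.AWAITING_DECISION:
            raise RuntimeError("no move is awaiting a commit or rollback")

    def commit(self) -> None:
        self._require_pending()
        self.state = MoveState.COMMITTED
        # Assumption: with reverse protection, the source-side journal
        # is set up once the move is committed.
        self.source_journal_created = self.reverse_protection

    def rollback(self) -> None:
        self._require_pending()
        self.state = MoveState.ROLLED_BACK


if __name__ == "__main__":
    move = MoveOperation(reverse_protection=True)
    move.start()
    move.commit()
    print(move.state.name, move.source_journal_created)
```

The point of the model is that a started Move is never left open-ended: it has exactly two exits, and only the commit path leads to reverse protection.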
Considerations for Move operation
This operation assumes both sites are healthy and is intended for planned migrations. To ensure data integrity, it’s recommended that the source VMs are shut down gracefully and that a manual checkpoint is created before the move.
Zerto DR Operation #4: Failover
Not to be confused with the Failover Test operation, this is a disruptive operation that fails the VPG over to the target site. The VMs in the VPG are brought online on the Production network. This operation has a few different options, depending on the scenario. If the production site is online, reverse protection can be enabled to create a journal at the source site.
Considerations for Failover operation
This operation should generally be reserved for an actual disaster. If the production site is online, the production VMs will be migrated to the failover site. The only reason to use the Failover operation during a test is to simulate an actual disaster and to confirm that the application downtime is acceptable to all parties involved in the testing.
Choosing the right operation
With appropriate planning, choosing the right operation is not difficult. Consider the length of time you require for testing. For shorter tests, the Failover Test operation is appropriate. For longer tests, the Clone operation would be better, provided you have enough storage space and network isolation. If you can accept application downtime during a test, the Failover operation can be used. For actual disasters, the Failover operation is the choice, and for migrations, the Move operation.
Lastly, if your organization needs more help with Zerto DR operations (or any VM-replication technology), or with any aspect of disaster recovery, please don’t hesitate to reach out to us. We’d love to help you get started.
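As a recap, the selection guidance from this post can be sketched as a small Python helper. The function name, parameters, and purpose labels are hypothetical, and the fallback for long tests without spare storage reflects the earlier note about enlarging the journal before a Failover Test.

```python
def choose_operation(purpose: str,
                     long_running: bool = False,
                     spare_storage: bool = False,
                     isolated_network: bool = False,
                     simulate_disaster: bool = False) -> str:
    """Suggest one of the four operations based on the guidance in this post.

    purpose is one of "disaster", "migration", or "test" (labels assumed here).
    """
    if purpose == "disaster":
        return "Failover"
    if purpose == "migration":
        return "Move"
    if purpose != "test":
        raise ValueError(f"unknown purpose: {purpose!r}")

    if simulate_disaster:
        # Only sensible when application downtime is acceptable to everyone.
        return "Failover"
    if long_running and spare_storage and isolated_network:
        return "Clone"
    # Short tests, or long tests without room for duplicated disks; for the
    # latter, the journal must be enlarged before the test starts.
    return "Failover Test"


if __name__ == "__main__":
    print(choose_operation("test"))
    print(choose_operation("test", long_running=True,
                           spare_storage=True, isolated_network=True))
    print(choose_operation("migration"))
```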