When taking the VMware vCenter Data Recovery lab at VMworld Europe 2009 I was totally surprised by this new product. The product, which will be part of the new vSphere line of products, is a complete disk-based backup and restore solution. In my opinion it's mainly aimed at SMB organizations, however.
The product is built on top of the VCB framework (utilizing the new VMware Consolidated Backup API). The main engine is provided by a virtual appliance and it fully integrates into vCenter Server. One of the strong features is that no agents are needed to back up your VMs and it is fully agnostic to the guest OS. So whatever guest OS you are using doesn't matter, vCenter Data Recovery will back up your VM. On Windows, however, the VMware Tools add VSS support to quiesce the guest OS and applications, so that an accurate copy can be created.
One other cool feature is that it utilizes data de-duplication techniques to save on storage space needed for backups. Data de-duplication is a method of reducing storage needs by eliminating redundant data. Each block of data (vCenter Data Recovery dynamically creates blocks from 2k to 64k) is processed using a hash algorithm (such as MD5 or SHA-1). This process generates a unique number for each block of data which is then stored in an index.
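The hashing and indexing described above can be sketched in a few lines of Python. This is a minimal illustration, assuming fixed-size blocks for simplicity (vCenter Data Recovery dynamically chooses block sizes between 2k and 64k), and it is not the product's actual implementation:

```python
import hashlib

def dedup_store(data: bytes, block_size: int = 4096):
    """Split data into blocks, hash each block, and store each unique block once."""
    index = []   # ordered list of hashes describing the file
    store = {}   # hash -> block; each unique block is written only once
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        digest = hashlib.sha1(block).hexdigest()
        if digest not in store:     # new block: write it to the backup disk
            store[digest] = block
        index.append(digest)        # always record the hash in the index
    return index, store

# Four 4 KB blocks, of which only two are unique
data = b"A" * 8192 + b"B" * 4096 + b"A" * 4096
index, store = dedup_store(data)
print(len(index), len(store))   # 4 blocks referenced, 2 blocks stored
```

The file is fully described by its hash index, while the repeated blocks occupy backup storage only once.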
Let me try to explain this. Let's take a small line of text, chop it into words (blocks) and give each unique word a unique number (hash):
Every file can then be represented by a hash index. In our example the index will be 1,2,3,4.
Now let’s take another line of text:
Because some hashes are already in the backup index (the words "the" and "is"), there's no need to store those blocks again. We just put the hashes in the hash index, which will be 1,5,3,6, and only blocks 5 and 6 are actually written to the backup disk.
Instead of using 8 blocks on our backup disk we are now only using 6 blocks to store both lines of text.
While the storage savings in our example are very small, imagine a 20MB Word document where only 1 word is changed, or a 10GB vmdk containing nothing more than a Windows 2003 installation. Now imagine you have 100 of those 10GB vmdks, because data de-duplication works across VMs :-)
One potential problem with de-duplication is hash collisions. In rare cases it’s theoretically possible that the hash algorithm produces the same hash number for two different blocks of data. When a hash collision occurs the new data won’t be stored because the hash already exists. This is called a false positive, and results in data loss. There are however techniques to reduce this possibility.
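One such technique is a byte-for-byte comparison whenever a hash match is found, so a colliding block is detected instead of silently dropped. The sketch below shows this idea; it is an illustration of the general mitigation, not necessarily what vCenter Data Recovery does internally:

```python
import hashlib

def store_block(store, block):
    """Store a block, verifying actual contents on a hash match."""
    digest = hashlib.sha1(block).hexdigest()
    if digest in store:
        # Hash already seen: compare the real bytes before trusting it.
        if store[digest] != block:
            raise RuntimeError("hash collision detected: refusing to drop data")
        return digest              # true duplicate, safe to de-duplicate
    store[digest] = block          # new block, write it out
    return digest

store = {}
store_block(store, b"block one")
store_block(store, b"block one")   # duplicate: stored only once
print(len(store))                  # 1
```

The extra comparison costs an additional read, which is why some products accept the (astronomically small) collision risk instead.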
Now let’s get back to vCenter Data Recovery.
One of the first things I noticed was that it's really easy to configure the product and start creating backups. Just follow the wizard and you are up and running in no time. Backup jobs can be created at several levels. Instead of creating a backup job at the VM level it can be created at the cluster level. When created at the cluster level, every new VM that gets deployed in the cluster will automatically be protected. You can also specifically mark a VM to be excluded from the backup job. When a VM is protected by a backup job it gets a distinct icon, so you can easily visually determine whether a VM is compliant with the backup policy.
As a backup destination you can configure a complete LUN or even a CIFS share.
There are pre-defined retention policies available or you can create your own policy. The retention policy defines how many backups will be retained on the backup disk.
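To make the retention idea concrete, here is a hedged sketch of the simplest possible policy, "keep the N most recent restore points". The actual pre-defined policies in the product are richer than this, so treat the rule below as an illustrative assumption:

```python
from datetime import date, timedelta

def apply_retention(backup_dates, keep=7):
    """Keep the `keep` most recent backups; everything older expires."""
    kept = sorted(backup_dates, reverse=True)[:keep]
    expired = [d for d in backup_dates if d not in kept]
    return kept, expired

# Ten daily backups; a keep-7 policy expires the oldest three
dates = [date(2009, 3, 1) + timedelta(days=i) for i in range(10)]
kept, expired = apply_retention(dates, keep=7)
print(len(kept), len(expired))   # 7 3
```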
When the wizard is finished you are ready to go and your VMs are protected. When a backup of a VM is started, a snapshot is created on that VM. The vmdk then gets hot-added to the Data Recovery virtual appliance, which creates the actual backup. This method of backup is completely LAN-free! When the backup is finished the vmdk is removed from the virtual appliance again and finally the snapshot is removed.
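The sequence above can be sketched as code, which also makes the cleanup ordering explicit (the vmdk must be detached and the snapshot removed even if the backup fails). The classes and method names below are hypothetical stand-ins, not real vSphere API calls:

```python
log = []   # records the order of operations for illustration

class Snapshot:
    def remove(self): log.append("remove snapshot")

class VM:
    vmdk = "vm1.vmdk"
    def create_snapshot(self):
        log.append("create snapshot")
        return Snapshot()

class Appliance:
    def hot_add(self, vmdk):
        log.append("hot-add vmdk")
        return vmdk
    def dedup_backup(self, disk): log.append("backup blocks")
    def hot_remove(self, disk): log.append("hot-remove vmdk")

def backup_vm(vm, appliance):
    snap = vm.create_snapshot()             # 1. snapshot the running VM
    try:
        disk = appliance.hot_add(vm.vmdk)   # 2. attach the vmdk to the appliance
        try:
            appliance.dedup_backup(disk)    # 3. back up the blocks, LAN-free
        finally:
            appliance.hot_remove(disk)      # 4. detach the vmdk again
    finally:
        snap.remove()                       # 5. delete the snapshot

backup_vm(VM(), Appliance())
print(log)
```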
Because the actual backup is performed by the virtual appliance running on the ESX host, expect some overhead in disk I/O and CPU utilization due to the de-duplication process.
As I always say: "backup is the beginning of a restore". vCenter Data Recovery has a very neat feature called Restore Rehearsal. When you perform a Restore Rehearsal a new VM named <vmname>_rehearsal gets created and the data is restored to this VM's disks. The network is automatically disconnected on this VM since it's a rehearsal. When you are convinced that the restore succeeded you simply discard the VM. This reminds me of the days way back when I had to test restore procedures on physical boxes. Damn, life is so much easier now!
The final product will include file-level restore functionality, which was not yet available in the lab, as well as an option to perform an integrity check.