Recovering from VMDKs on NetApp NFS Datastores

Ok, so the last post went over the scenario of recovering entire VMs – what if you just want one file? As I mentioned, we used to recover the whole VM to another place, copy the file out, then delete our copy. But that was far from elegant, and again, a pain if the file they wanted was in a snapvaulted location.

How much do you trust the filesystem to be consistent? Well, we take a “crash-consistent” snapshot every morning, where the NetApp system effectively spools off a version of the underlying VMDK file without telling the virtual machine using it. Our recovery rate, over the last two years and 1050 VMs, has been 100%. It’s not a solution for everyone and everything – for the VMs running high-transaction-load DBs, like Oracle (yup, we went there!) and Exchange, we use NFS or iSCSI, and use NetApp’s SnapManager products to quiesce the applications and take snapshots of their storage the instant they are flushed.

So, our crash-consistent snapshots: how do we get files back out of them? Remember the secured recovery console VM from the previous post? And the Inception reference? Add a few more layers into that.

The basic premise is that we mount the NTFS filesystems inside the recovery VM using NTFS-3G, and use e2tools to copy files out of the ext3 partitions.

But to get to that point, you have a few problems. The first is turning your read-only VMDK (NetApp snapshots are read-only) into a block device. losetup -r /dev/loop0 /path/to/VMDK will do that. Then, find the partitions inside this device: kpartx -a -v /dev/loop0. At this point you can just mount the NTFS partitions from the Windows VMs, but the Linux systems have a few more tricks up their sleeves…
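Strung together, the front half of the process looks something like this. Treat it as a sketch rather than a copy-paste recipe: the paths and the loop device are placeholders, and in our case the file we point losetup at is the big -flat.vmdk extent, not the small descriptor file.

    # Attach the read-only VMDK from the snapshot directory as a loop device
    losetup -r /dev/loop0 /path/to/VMDK
    # Map its partition table into device-mapper nodes (/dev/mapper/loop0p1, ...)
    kpartx -a -v /dev/loop0
    # Windows guests: mount the NTFS partition read-only with NTFS-3G and copy away
    mkdir -p /mnt/recover
    mount -t ntfs-3g -o ro /dev/mapper/loop0p1 /mnt/recover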

We use LVM, for flexible volume management. It’s burnt into our template, which means all of our VMs have the same VG and LV names. The first thing we did to prepare this recovery VM was to rename its volume groups to avoid conflicts: a simple vgrename, then edit /etc/fstab, then mkinitrd – in that order. If you run mkinitrd before the /etc/fstab edit, the initrd will try to load root from a non-existent location.
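For the curious, the one-off prep on the recovery VM looked roughly like this. The names are made up for the example (our real VG name differs), and depending on your distro you may also need to touch the bootloader config.

    # Rename the template's volume group so recovered guests can't clash with it
    vgrename VolGroup00 recovervg
    # Point /etc/fstab at the new VG name
    sed -i 's/VolGroup00/recovervg/g' /etc/fstab
    # Rebuild the initrd *after* the fstab edit, so root is looked for at the new name
    mkinitrd -f /boot/initrd-$(uname -r).img $(uname -r)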

Having prepared our recovery VM in advance, we scan for volume groups inside the /dev/loop0 partitions using vgscan, then bring them online with vgchange -ay VGname.
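In practice that's just two commands (VGname here stands in for whatever volume group name your template uses):

    # Find any volume groups living on the newly mapped partitions
    vgscan
    # Activate the guest's VG so its LVs appear under /dev/VGname/
    vgchange -ay VGname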

At this point, you’d think we could just mount the LVs, wouldn’t you?

Quick primer on the ext3 filesystem – it’s ext2, with a journal to enable easy recovery after crashes. In these crash-consistent VMDK snapshots there’s an unflushed journal, and the filesystem is flagged as in use and as having one. Linux’s ext3 implementation will attempt to replay that journal when the filesystem is mounted. Even if you tell it not to load the journal (noload), it will still attempt to make your read-only filesystem read-write to mark the filesystem as clean. And if you try to mount it as ext2, it will also complain, since there’s a journal there. ext3 journals can be removed, but guess what? That’s a read-write operation too. All of these things are perfectly reasonable, and there for very good reasons. Just not what I’m after, since this is a 100% read-only situation, and I can’t make it read-write even if I wanted to.
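To make the dead end concrete, these are the sorts of mounts that won't get you there. The device path is a placeholder, and the comments just restate the behaviour described above, so take them as a sketch rather than gospel:

    # Plain ext3 mount wants to replay the journal, which means writing to a read-only device
    mount -t ext3 -o ro /dev/VGname/LogVol00 /mnt/recover
    # noload skips the replay, but the mount still wants write access to mark the filesystem clean
    mount -t ext3 -o ro,noload /dev/VGname/LogVol00 /mnt/recover
    # The ext2 driver refuses because of the unrecovered journal sitting on the filesystem
    mount -t ext2 -o ro /dev/VGname/LogVol00 /mnt/recover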

So we looked at a couple of options: union filesystems (rejected – they would have wanted to copy the whole VMDK if we made a change), guestfish (actually works OK, but it’s very resource-heavy – it essentially boots the VM inside it), and eventually we were pointed at e2tools. It’s in early beta and hasn’t been updated in seven years, but it seems perfectly functional.
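Because e2tools works through the ext2 filesystem library rather than the kernel, it never needs a mount at all. Usage is along these lines (the LV and file paths are placeholders, not our real ones):

    # List a directory inside the unmounted ext3 LV (device:path syntax)
    e2ls -l /dev/VGname/LogVol00:/etc
    # Copy a single file out of the ext3 filesystem onto local disk
    e2cp /dev/VGname/LogVol00:/etc/httpd/conf/httpd.conf /tmp/recovered/httpd.conf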

At this point we’ve copied our files out, with plain cp or e2cp, so how do we get them back to the VM? We’re still working on that, but the current plan is to use mkisofs to turn them into an .iso and mount that to the VM for the end admin to copy them out of.
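That last step would be something like the following. It's a sketch only – we haven't settled on it yet, and the volume label and paths are invented:

    # Wrap the recovered files in an ISO image (Joliet + Rock Ridge so Windows and Linux guests are both happy)
    mkisofs -o /mnt/transfer/recovered-files.iso -J -r -V RECOVERY /tmp/recovered/
    # Then attach the .iso to the VM's virtual CD drive and let the admin copy the files off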

Then, once all the copies are done, you need to tear down the LVM with vgchange -an, remove the partition mappings from the kernel with kpartx -d, then detach the loop device with losetup -d, and you’re done! We will be automating a lot of this with some shell scripts (think startrecover and stoprecover to take care of the loop/LVM setup and teardown), but even now it’s a lot quicker than what we had.
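The teardown mirrors the setup (same placeholders as before):

    # Deactivate the guest's volume group
    vgchange -an VGname
    # Remove the partition mappings
    kpartx -d /dev/loop0
    # Detach the loop device
    losetup -d /dev/loop0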

Pretty neat, huh?

Recovering VMDKs on NetApp NFS Datastores

In my day job, I look after the day-to-day server operations of a university that makes extensive use of VMware and NetApp storage. When I started there and saw they were using NFS for their datastores, I reserved judgement on whether they were crazy-smart or just crazy. Thankfully it was the former – crazy-smart.

Using NetApp NFS for VMDK storage allows us to do all sorts of cool stuff, especially with regard to backups, recovery and migration. But it had been tedious, especially if someone wanted a single file restored from their VM: we had to copy the entire VMDK out of the snapshot directory, mount it on another VM somewhere, find the file, and get it back to the customer somehow. And if it was on our secondary filer, we had to make a FlexClone, mount that onto one of the 96 ESX hosts we had, copy the file out, and so on.

Wheels spin sometimes, and an idea comes to you. Remember Inception? And all the layers? Going deeper? It’s like that.

/home/user/file.txt -> ext3 -> LVM LV -> LVM VG -> LVM PV -> /dev/sda1 -> ESX -> VMDK -> NFS Datastore -> NetApp Data OnTap -> WAFL -> Disks ..

Over the last week, my co-workers and I have been building up a system to make this easier and less disruptive to the infrastructure (which is good for everyone – the fewer changes you have to make to production, the better). The gist is this:

We have a secured VM with a couple of NICs – one a standard access port, the other a VLAN trunk carrying our NAS networks, including the one the VM blades use to mount their storage.

Inside this VM, we do magic…

So, 96 blades – that’s a fairly large VM infrastructure. We have two separate environments, in six clusters and two routing domains, running a total of 1050+ VMs at last count, each cluster with its own datastores, diverse physical locations, and so on. One of the service improvement projects I got our great team to implement was a set of datastores mounted onto all the clusters, routed where needed. Performance didn’t have to be great, just good enough, and on 10Gb NFS, yeah, it’s pretty good. We have an ISOs datastore, a Templates datastore and a Transfer datastore. The Transfer one was new – the others we’d had for a while.

On our secured VM, we have the Transfer datastore mounted read-write using NFS, as well as the snapvault repository versions of our datastores (mounted read-only for safety, though the files are read-only anyway). This now means that if we have to do a full VM recovery, we have a simple process –

  • Shut down the VM
  • Edit the settings to remove the hard drives you want to recover (I know, it sounds wrong to me too, but trust me…)
  • Storage vMotion the VM onto the Transfer datastore (which, since it doesn’t have any disks, is quick)
  • Locate the version of the VMDK you want in the .snapshot directory of the snapvault location (we have a simple shell script to list all the versions)
  • Copy the VMDK files (remember the -flat.vmdk) from the snapvault location into the appropriate directory on the Transfer datastore, using cp &, then run watch ls -l on the destination if you want a progress indicator (see the sketch after this list)
  • Re-add the storage in the VMware settings, pointing it at the place you just copied the files to
  • Power on the VM, check it works, then hand control back to the customer and start a Storage vMotion to relocate the storage back onto the correct primary datastore
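The mounting and copying, from inside the secured VM, looks roughly like this. Treat it as a sketch – the filer names, export paths, snapshot name and VM name are all placeholders, and our listing script is really just a wrapper around the ls of the .snapshot directory:

    # Transfer datastore, read-write, so we can drop recovered VMDKs into it
    mount -t nfs -o rw filer1:/vol/transfer /mnt/transfer
    # Snapvault destination of the primary datastore, read-only for safety
    mount -t nfs -o ro filer2:/vol/sv_datastore01 /mnt/sv_datastore01
    # See which snapvault snapshots of the datastore are available
    ls /mnt/sv_datastore01/.snapshot/
    # Copy the descriptor and the (large) -flat.vmdk in the background
    mkdir -p /mnt/transfer/myvm
    cp /mnt/sv_datastore01/.snapshot/sv.0/myvm/myvm.vmdk \
       /mnt/sv_datastore01/.snapshot/sv.0/myvm/myvm-flat.vmdk \
       /mnt/transfer/myvm/ &
    # Crude progress indicator: watch the -flat.vmdk grow at the destination
    watch ls -l /mnt/transfer/myvm/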

All done! No messing around on the NetApp making FlexClones, mounting them, cleaning them up, etc. Depending on your level of risk tolerance, you could copy the VMDK straight back to the primary datastore (also mounted via NFS), but we consider the small delay of the Storage vMotion a price worth paying for peace of mind.