25 Mar, 2011
Recovering from VMDKs on NetApp NFS Datastores
Posted by: alex in: Gibbering
OK, so the last post went over recovering entire VMs. What if you just want one file? As I mentioned, we used to recover the whole VM to another place, copy the file out, then delete our copy. But that was far from elegant, and again, a pain if the file they wanted was in a snapvaulted location.
How much do you trust the filesystem to be consistent? Well, we take a “crash-consistent” snapshot every morning, where the NetApp system effectively spools off a version of the underlying VMDK file without telling the virtual machine using it. Our recovery rate, over the last two years and 1050 VMs, has been 100%. It’s not a solution for everyone and everything: for the VMs running high-transaction-load DBs, like Oracle (yup, we went there!) and Exchange, we use NFS or iSCSI, and use NetApp’s SnapManager products to quiesce the applications and take snapshots of their storage in the instant they are flushed.
So, our crash-consistent snapshots: how do we get files back out of them? Remember the secured recovery console VM in the previous post? Remember the Inception reference in the previous post? Add a few more layers to that.
The basic premise is that we mount the NTFS filesystems in the VM, using NTFS-3G, and use e2tools to copy files out of ext3 partitions.
But to get to those points, you have a few problems. The first is to turn your read-only VMDK (NetApp snapshots are read-only) into a device. losetup -r /dev/loop0 /path/to/VMDK will do that. Then, find the partitions inside this device: kpartx -a -v /dev/loop0. At this point, you can just mount the NTFS partitions from the Windows VMs, but the Linux systems have a few more tricks up their sleeves.
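Strung together, the loop device and partition mapping steps might look like the sketch below. The function names, the mount point and the DRY_RUN switch are my own illustrative additions, not part of any existing tool; with DRY_RUN=1 the commands are only printed, which is handy for checking the order before pointing it at a real snapshot.

```shell
#!/bin/sh
# Sketch only. With DRY_RUN=1 each command is printed instead of executed.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Attach a snapshot VMDK read-only and map the partitions inside it.
attach_vmdk() {
    vmdk=$1       # path to the VMDK inside the snapshot (placeholder)
    loopdev=$2    # a free loop device, e.g. /dev/loop0
    run losetup -r "$loopdev" "$vmdk"
    run kpartx -a -v "$loopdev"   # mappings appear as /dev/mapper/loop0p1, ...
}

# Mount one mapped NTFS partition read-only through NTFS-3G.
mount_ntfs() {
    part=$1       # e.g. /dev/mapper/loop0p1
    mnt=$2        # an existing, empty directory (hypothetical)
    run mount -t ntfs-3g -o ro "$part" "$mnt"
}
```

Something like attach_vmdk /path/to/VMDK /dev/loop0 followed by mount_ntfs /dev/mapper/loop0p1 /mnt/recover would cover the Windows case, assuming /mnt/recover exists.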
We use LVM for flexible volume management. It’s burnt into our template, which means all of our VMs have the same VG and LV names. The first thing we did to prepare this recovery VM was to rename its volume groups to avoid conflicts. A simple vgrename, an edit of /etc/fstab and a mkinitrd, in that order. If you do mkinitrd before the /etc/fstab edit, the initrd will load root from a nonexistent location.
Having prepared our recovery VM in advance, we scan for volume groups inside the /dev/loop0 partitions using vgscan, then bring them online with vgchange -ay VGname.
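For the record, the one-off rename on the recovery VM could be sketched roughly as follows. Renaming a volume group is done with vgrename. The VG names here are made up, the sed expression assumes fstab refers to the volumes as /dev/VGname/LVname, and the mkinitrd arguments assume a Red Hat-style mkinitrd; other distributions take different arguments, so treat this as an outline of the ordering rather than a tested recipe.

```shell
#!/bin/sh
# Sketch only. With DRY_RUN=1 each command is printed instead of executed.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Rename the recovery VM's own volume group, then fix fstab, then rebuild
# the initrd. The order matters: fstab must be correct before mkinitrd runs.
rename_template_vg() {
    old=$1    # VG name baked into the template (hypothetical: VolGroup00)
    new=$2    # new, unique name for the recovery VM (hypothetical)
    run vgrename "$old" "$new"
    # Assumes fstab entries of the form /dev/<VG>/<LV>.
    run sed -i "s|/dev/$old/|/dev/$new/|g" /etc/fstab
    kver=$(uname -r)
    # Assumes Red Hat-style syntax: mkinitrd -f <image> <kernel-version>.
    run mkinitrd -f "/boot/initrd-$kver.img" "$kver"
}
```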
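As a small sketch of that step (the function name and the DRY_RUN switch are illustrative additions of mine), with an lvs call tacked on at the end to show which logical volumes turned up:

```shell
#!/bin/sh
# Sketch only. With DRY_RUN=1 each command is printed instead of executed.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Find and activate the volume group that lives inside the mapped partitions.
activate_snapshot_vg() {
    vg=$1                   # the template's VG name, as reported by vgscan
    run vgscan              # rescan block devices, including /dev/mapper/loop0p*
    run vgchange -ay "$vg"  # activate every LV in that VG
    run lvs "$vg"           # list the LVs that are now available
}
```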
At this point, you’d think we could just mount the LVs, wouldn’t you?
Quick primer on the ext3 filesystem: it’s ext2, with a journal to enable easy recovery after crashes. In these crash-consistent VMDK snapshots, there’s an unflushed journal, and the filesystem is flagged as in use and as having a journal. Linux’s ext3 implementation will attempt to replay the journal of an ext3 filesystem, if present, when it is mounted. Even if you tell it not to load the journal (noload), it will still attempt to make your read-only filesystem read-write to mark the filesystem as clean. And if you try to mount it as ext2, it will also complain, since there’s a journal there. ext3 journals can be removed, but guess what? It’s a read-write operation. All of these things are perfectly reasonable, and there for very, very good reasons. Just not what I’m after, since this is a 100% read-only situation, and I can’t make it read-write even if I wanted to.
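If you want to see that state for yourself, the superblock header printed by dumpe2fs -h lists a needs_recovery feature flag while the journal is still waiting to be replayed. A tiny helper along these lines (the function name is mine, and the device path in the comment is a placeholder) can check for it without writing anything:

```shell
#!/bin/sh
# Reads `dumpe2fs -h` output on stdin and succeeds if the superblock still
# carries the needs_recovery feature flag (journal not yet replayed).
needs_recovery() {
    grep '^Filesystem features:' | grep -qw needs_recovery
}

# Example use against an activated LV (placeholder path):
#   dumpe2fs -h /dev/VGname/LVname 2>/dev/null | needs_recovery \
#       && echo "unreplayed journal present"
```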
So we looked at a couple of options: union filesystems (rejected, since they wanted to copy the whole VMDK as soon as we made a change), guestfish (actually works OK, but is very resource-heavy, as it essentially boots the VM inside it), and eventually we were pointed at e2tools. It’s in early beta, and it hasn’t been updated in 7 years, but it seems perfectly functional.
At this point, we’ve copied our files out, with just cp or e2cp, so how do we get them to the VM? We’re still working on that, but the current plan is to use mkisofs to turn them into an .iso, and mount that on the VM for the end-admin to copy them out of.
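e2tools addresses files as device:/path, so pulling a single file out of an activated LV might look something like this. The function name, the LV path and the destination directory are hypothetical, and the DRY_RUN switch is only there so you can see the commands before running them.

```shell
#!/bin/sh
# Sketch only. With DRY_RUN=1 each command is printed instead of executed.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# List the containing directory, then copy one file out of an ext3 LV
# without mounting it (e2tools reads the device directly).
fetch_file() {
    lv=$1      # e.g. /dev/VGname/LVname (placeholder)
    path=$2    # absolute path of the file inside that filesystem
    dest=$3    # local directory on the recovery VM
    run e2ls -l "$lv:$(dirname "$path")"
    run e2cp "$lv:$path" "$dest"
}
```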
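Since that part is still a plan, take this as a guess at the shape of it rather than what we actually run. The directory, the ISO path and the volume label are invented for the example; -J and -R add Joliet and Rock Ridge extensions so long filenames survive on both Windows and Linux guests.

```shell
#!/bin/sh
# Sketch only. With DRY_RUN=1 each command is printed instead of executed.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Wrap a directory of recovered files into an ISO image.
make_iso() {
    src=$1    # directory holding the recovered files (hypothetical)
    iso=$2    # output image, somewhere the target VM can attach it from
    run mkisofs -o "$iso" -J -R -V RECOVERED "$src"
}
```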
Then, once all the copies are done, you need to tear down the LVM with vgchange -an, delete the partitions from the kernel with kpartx -d, then remove the loop device with losetup -d and you’re done! We will be automating a lot of this with some shell scripts (think – startrecover, stoprecover to take care of the loop/LVM setup), but even now it’s a lot quicker than what we had.
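A stoprecover along those lines might end up looking like the sketch below. To be clear, this is an illustration of the teardown order, not our finished script; the function name and arguments are assumptions. The steps are chained with && so that a failure early on (say, an LV that is still in use) stops the teardown before the loop device is pulled out from underneath it.

```shell
#!/bin/sh
# Sketch only. With DRY_RUN=1 each command is printed instead of executed.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Undo the setup in reverse order: LVM first, then partition maps, then loop.
stop_recover() {
    vg=$1         # volume group activated from the snapshot
    loopdev=$2    # loop device the VMDK is attached to, e.g. /dev/loop0
    run vgchange -an "$vg" &&
        run kpartx -d "$loopdev" &&
        run losetup -d "$loopdev"
}
```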
Pretty neat, huh?