(Note: This is a guest post by Rackspace SharePoint Consultant Todd Klindt)
Some of my favorite technology is sexy. It has flash. It might have substance, too, but it definitely has flash. My fancy Windows Phone is a good example. It’s a beautiful device. It has all the right curves in all the right places and it catches people’s eyes. On top of that, it’s a fantastic device to use.
SharePoint is also pretty sexy…in a utilitarian way. For technology to really get us excited it has to have a bit of sex appeal. You know what’s not sexy, though? SharePoint backup. Nothing against backups, but they are definitely not the prettiest girl at the ball. Unfortunately, because backups aren’t sexy, nobody gets as excited about them. Not even their moms! But we ignore backups at our own peril. It sometimes seems like no matter how many horror stories we hear about someone losing data, we never get serious about backups until something horrible happens to us.
I’ve been in the IT field a long time. Some might argue too long, but that’s another topic entirely. I have loads of stories about data loss – those are a dime a dozen. The “fun” stories are the ones with massive data loss when backups just weren’t in place, or were in place but broke somehow. In this post I’m going to tell two hysterical – at least to us – stories about companies that did not have a proper backup strategy…and how it bit them. Then I’ll tell you how Rackspace is working with AvePoint to make sure I don’t have any more funny stories about our customers losing their data.
SharePoint Backups After Dark
My first major SharePoint Disaster Recovery (with emphasis on “disaster”) story happened to me in 2004. At the time, I was the lead technical person for a large SharePoint 2003 farm. For those of you who didn’t have the pleasure of working with SharePoint 2003, let me set the stage for you. This was the dark ages of SharePoint. It took a physical crank to start your SharePoint 2003 servers, and they ran on steam and the tears of mermaids. Our SharePoint 2003 farm was large, considering the average farm size at the time. It had around 1,400 site collections and 10,000 sub sites. At the time, drive space was pretty expensive, so my boss and I decided we would use the “simple recovery model” for our SharePoint databases. This meant we could not use Microsoft SQL Server’s transaction logs to recover data. The only data we would be able to recover was data that existed at the moment backups were taken. While drive space was expensive, backup tapes were really expensive and we were burning through them at a rate that gave my boss an ulcer. We also had to walk uphill to get to the tape library. Because of the cost of tapes, and not wanting to do all that walking uphill, we decided to do backups nightly, at midnight. That meant we could only ever recover data that existed at roughly midnight. Remember that bit – it becomes important for the next part of the story and really adds to the humor.
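For the curious, the tradeoff we made can be sketched in T-SQL. This is just an illustration, not our actual scripts – the database name and backup paths here are hypothetical:

```sql
-- Simple recovery model: SQL Server reclaims transaction log space
-- automatically (saving disk), but you can only restore to the moment
-- a backup was taken -- no point-in-time recovery.
ALTER DATABASE WSS_Content SET RECOVERY SIMPLE;
BACKUP DATABASE WSS_Content TO DISK = N'X:\Backups\WSS_Content.bak';

-- Full recovery model: the transaction log is preserved until backed up,
-- so regular log backups let you restore to (nearly) any point in time --
-- at the cost of storing and managing those log backups.
ALTER DATABASE WSS_Content SET RECOVERY FULL;
BACKUP DATABASE WSS_Content TO DISK = N'X:\Backups\WSS_Content.bak';
BACKUP LOG WSS_Content TO DISK = N'X:\Backups\WSS_Content.trn';
```

With the simple model and one backup a night, anything created and destroyed between two midnights is simply gone – which is exactly the trap our user fell into.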
My boss and I thought we’d done a good job explaining our backup system to our users, but as is often the case, a few people slipped through the cracks. One of those cases popped up one sleepy Saturday night. I’d just gotten into my most comfy pair of Batman pajamas and I was settling down with a bowl of popcorn (Orville Redenbacher, of course) when my phone rang. It was the helpdesk. Someone had called them and needed to speak to me immediately. I had long fancied myself a very important person, so this didn’t surprise me. It was one of our users and he was all in a tizzy. It turns out he had taken quite a shine to our little SharePoint farm and thought all of his stuff should go in there. He had a lot of data he wanted to move into SharePoint. He hired a dozen or so temps to come in on a Saturday and work tirelessly all day entering the data into a huge SharePoint list. So far, so good. Unfortunately, toward the end of the day, someone who opened the list in datasheet view did something similar to the old “Ctrl-A, Delete” gag that we all know and love so much. I find that one works best with my email inbox. The end result was that the list he and his workers had spent all day populating was now as empty as my mailbox on Valentine’s Day in grade school. It was all gone. Vanished. Poof! He called the helpdesk, asking, nay, DEMANDING we recover his list. I talked to him a bit on the phone, and between swear words (his, not mine), I tried to explain to him that I couldn’t get any of it back because it hadn’t existed at midnight. Apparently he was not convinced. A few minutes after I got off the phone with him I got a second, calmer call from my grandboss. I guess he’d just gotten an irate call from my new friend, demanding that I try harder to get his data back. So off came the Batman pajamas, on went my work clothes, and off to the office I went.
When I got into the office, I called the user and tried again to explain his data was all gone, no “backsies.” He suggested we call Microsoft support and see if they had any ideas. I’ll spare you all the painful details, but I didn’t head home until around 3:00 the next morning. And you know how much data I was able to restore for the user? The stuff he’d had at midnight the night before. I am definitely not on his Christmas card list.
That story is a great example of how a bad backup strategy affected a user. Time and money were a constraint, and IT ended up taking it out on the end user. I’m happy to say that that incident helped us justify a better backup strategy…one that got me sworn at way less often.
Todd Saves the Day
My second story is my favorite, but that might be because I’m the hero and not the butt of the joke. In this story, a company is moving its entire on-premises SharePoint environment to a hosting provider. It’s important to note that this hoster was not Rackspace. The company for which I was working – we’ll call them “Acme Widgets” here – had seven SharePoint farms of various flavors: development, test, production…the whole lot. All of the farms were virtualized, with around 30 guests in all. It was a pretty impressive setup. When they got ready to move the whole shebang over to the hoster (that still is not Rackspace), Acme gave the hoster read/write access to the Storage Area Network (SAN) where all the Virtual Machine (VM) files were stored so they could copy them over. In a move I lovingly refer to as “reverse replication,” the hoster copied their blank file system over the top of Acme’s SAN. In an instant, all of Acme’s SharePoint farms were gone. Poof!
This isn’t where the story gets good, though. Acme had backups, so the company started down that road to recover them. Acme was a big company, and things were broken out pretty well. First, the SQL Server team was called and apprised of the situation. They had backups of all the databases and they started restoring them. Next, they called the Virtualization team and asked them to restore the actual guest VMs. The virtualization team told them that was the Windows Team’s responsibility. No problem – that makes sense. Acme’s SharePoint Team called the Windows Team and asked them to restore the guest VMs. This is where things get funny. The Windows Team told the SharePoint Team that backing up the guest VMs was the Virtualization Team’s responsibility. Whoopsie. All this time they each thought the other team was backing the guest VMs up, but no one was. That’s when I came onto the scene, complete with super hero cape.
Because of how SharePoint stores its data, and since the SQL Team wasn’t asleep at the wheel, I was able to rebuild all seven of Acme’s farms and not lose any of its important SharePoint data. Acme lost a couple of days of productivity, but taking into account the size of the catastrophe, the company considered itself lucky. Acme learned a lot about SharePoint backup and recovery scenarios, and I’ll never pay full price for a jet motor or anvil again.
Unlike our first story, this wasn’t an issue of the end users not knowing the backup policy, or the backup policy not being appropriate for the data being stored. There were two issues here. First, there wasn’t enough communication between the groups involved. Acme’s SharePoint Team didn’t do a good job coordinating things. And second, the backups were never tested. It would have only taken a single restore exercise for the SharePoint Team to realize that nobody was backing up the guest VMs. That would have been a much better time to discover that little nugget of information.
How Rackspace Prevents SharePoint Backup Tales of Terror
Here at Rackspace, we don’t like getting sworn at, so we take our backups very seriously. We go above and beyond to make sure we never need to have that sad talk with a customer about how their data has gone to live on the farm with that beloved dog from their childhood. Over the years, we have used various backup solutions with the goal of making sure the data was backed up and could be easily restored. We are constantly reevaluating this to make sure we are using the best solution. Recently, we had our socks blown off by AvePoint’s DocAve Backup and Restore, and I’m very impressed by its capabilities. I won’t drag you into the weeds and bore you with technical jargon like RTO, RPO, or BBQ. Instead, I’ll just tell you that DocAve is going to make Rackspace’s backups more robust, restores more seamless and painless, and we may all lose weight while still eating what we want.
While backups aren’t normally sexy, to us nerds at Rackspace, they’re plenty sexy. We love super nerdy things, and we love taking fanatical care of our customers and their data. In our quest to find the perfect backup solution for our SharePoint products, we’ve fallen head over heels for DocAve Backup and Restore. And while its sexiness might not be immediately apparent to Rackspace customers on the outside, they can rest easy knowing it is keeping their data safe and our engineers happy.
To learn more about the partnership between AvePoint and Rackspace, please visit AvePoint’s website.