Disaster recovery goes virtual

Business continuity capabilities don’t have to be gold-plated to get the job done anymore.

As federal technology executives gain experience using virtualization technology to reduce the number of physical servers eating up space and power in their data centers, many are starting to discover that virtualization can also offer similar efficiency and cost-cutting benefits for their business continuity capabilities.

Mike Rosier, senior systems administrator at Fermi National Accelerator Laboratory, explains how he and his colleagues are using virtualization to create a more resilient IT infrastructure for the lab for a fraction of the cost of traditional business continuity options.

Federal Computer Week: Can you provide an overview of the general use of server virtualization at your organization?

Mike Rosier: At Fermilab, we've been using modern server virtualization technologies for over 5 years. In fact, I'm sure we were utilizing earlier implementations back in our mainframe days.

Some of the early reasons we decided to invest in virtualization were to address power and cooling issues in our computer rooms. This was at a time we were trying to keep up with the growing demands for development and test systems. The procurement costs for dedicated physical servers were also eating into our yearly budgets.

At this point in time, we've migrated between 60 and 70 percent of the physical systems that we originally identified as being good candidates for virtualization. As virtualization technologies continue to improve, we'll look to identify even more systems that may have originally been excluded from our list.

We’re supporting a wide variety of systems as virtual machines, including those used for test, development, integration, and production environments. Some of these include web servers, file servers, custom application servers, data acquisition systems, email servers, monitoring systems, authentication systems, terminal servers, and print servers.

In recent years, we’ve significantly reduced the number of new physical system purchases and now consider virtualization as a first option. Although it might not be a perfect match every time, we’re seeing fewer systems that do not make sense to set up as virtual machines.

FCW: Was using virtualization to support business continuity objectives part of the original impetus and business case for server virtualization, or was it a second-stage objective?

Rosier: For the most part, using virtualization to support business continuity objectives has been a second-stage objective until recently. While our virtual infrastructure continues to grow and mature, we see more and more customers looking to virtualization as a way to avoid costly clustering solutions, which can provide quick restoration of service after hardware failure or data loss. As service providers have become more and more aware of the capabilities virtualization can provide, we’ve spent just as much time discussing backup/replication options and failover strategies as we spend discussing virtual machine sizing and application-specific requirements.

FCW: Can you describe with a little more technical detail how virtualization supports business continuity objectives?

Rosier: In order to describe how virtualization supports business continuity objectives, it helps to understand just what those objectives are in your environment. Some customers require their systems to be available 24x7, while others are satisfied with 8x5. Not only do systems need to be available, but they also need to perform adequately.

Virtualization can help you meet your objectives by letting you focus more of your efforts on configuring a relatively small number of redundant systems capable of providing enough failover capacity to weather varying types of outages. In our environment, we’re using technologies such as [network interface controller] teaming, redundant storage adapters and paths, live virtual machine migration, full virtual machine image and file-level backups, and cloning/replication. We’re also utilizing multiple data centers using separate power/cooling feeds to meet our business continuity objectives.

Virtualization has given us the ability to migrate workloads from one building to another without impacting production operations.
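
[Editor's sketch] The idea of keeping "enough failover capacity" on a small pool of redundant hosts can be illustrated with a rough capacity check. This is not Fermilab's tooling; the `Host` type, the `survives_host_failures` function, and the use of memory as the only sizing dimension are all assumptions made for illustration. The check is aggregate only: it ignores CPU, storage, and whether individual virtual machines actually fit on individual hosts.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical description of one virtualization host."""
    name: str
    memory_gb: int


def survives_host_failures(hosts, vm_memory_gb, failures=1):
    """Return True if the total VM memory still fits on the pool after
    losing the `failures` largest hosts (the worst case for capacity).

    Aggregate check only: it does not verify per-host placement.
    """
    if failures >= len(hosts):
        return False  # nothing left to run the workloads on
    # Sort ascending and keep the smallest hosts, i.e. drop the largest ones.
    surviving = sorted(h.memory_gb for h in hosts)[: len(hosts) - failures]
    return sum(vm_memory_gb) <= sum(surviving)


if __name__ == "__main__":
    pool = [Host("host-a", 256), Host("host-b", 256), Host("host-c", 256)]
    vms = [200, 200, 100]  # memory reserved per virtual machine, in GB
    print(survives_host_failures(pool, vms, failures=1))
    print(survives_host_failures(pool, vms, failures=2))
```

A real sizing exercise would extend the same comparison to CPU, storage paths, and network capacity, and would account for the second data center mentioned above.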

FCW: How does using virtualization for business continuity compare technically and cost-wise to prior approaches for achieving availability objectives?

Rosier: When you compare the use of virtualization technologies to prior approaches for achieving availability objectives, you'll quickly notice how simple it can be to achieve server, storage and network redundancy. With today's technology, you can also easily achieve data center redundancy using fairly low cost solutions compared to what was available in the past.

Some of the earlier solutions providing business continuity required costly clustering software or hardware, and specific knowledge of how each of those solutions functioned in order to quickly recover a workload onto a different system. With the advancement of virtualization technologies, it becomes easier to provide the ability to recover from a hardware failure or to separate certain virtual machines from each other onto different physical servers. Less complexity generally translates into greater reliability.

Virtualization allows you to achieve high availability for a greater number of systems for a fraction of the cost if you consider what it might take to provide “like” hardware for each of your systems. Since the physical server hardware is often abstracted from the guest operating systems, most virtualization platforms make it easy to automatically restart or keep a guest running after a full system failure.

In some cases, we've been able to take a bit of a hybrid approach to providing business continuity for key applications. Mixing virtual machines and dedicated physical servers into a single cluster can be an option that not only saves on hardware costs, but also gives you a foot in each door if vendor certification is a concern. In many cases, devices such as load balancers can certainly function for both virtual and physical systems belonging to the same cluster.

FCW: What pitfalls would you warn others to avoid when using virtualization to support continuity objectives?

Rosier: You should always be prepared to discuss your strengths and weaknesses. If you find shortcomings in your environment, be sure to investigate, identify, and prioritize cost-effective solutions that can be easily integrated into your virtual infrastructure. Virtualization technologies continue to evolve rapidly, so make sure you keep up with industry trends before making any large, strategic purchases.

Make sure your customers and management chain have a clear understanding of what your capabilities are and what you're protected against when failures occur. For example, high availability means different things to different people. Sometimes it means no downtime and sometimes it means minimal downtime. Make sure this is clear up front. You should simulate failures and test your resiliency from time to time. You certainly don't want to find out that you're not protected against something you invested in heavily to avoid. It could reflect poorly on you and, by extension, your organization.
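
[Editor's sketch] One way to make "minimal downtime" concrete up front is to turn an availability target into an hours-per-year budget for a given service window, such as the 24x7 and 8x5 windows mentioned earlier. The function names and the 99.9 percent figure below are illustrative assumptions, not targets taken from the interview.

```python
HOURS_PER_YEAR = 365 * 24  # full 24x7 service window, ignoring leap years


def service_hours(hours_per_day, days_per_week, weeks_per_year=52):
    """Hours per year in which the service is expected to be available."""
    return hours_per_day * days_per_week * weeks_per_year


def allowed_downtime_hours(availability_pct, window_hours=HOURS_PER_YEAR):
    """Downtime budget, in hours per year, implied by an availability target."""
    return window_hours * (1 - availability_pct / 100)


if __name__ == "__main__":
    target = 99.9  # hypothetical availability target, in percent
    print(f"24x7: {allowed_downtime_hours(target):.2f} hours/year")
    print(f"8x5:  {allowed_downtime_hours(target, service_hours(8, 5)):.2f} hours/year")
```

The same percentage yields a very different budget depending on the window it is measured against, which is one reason to agree on both numbers with customers before an outage occurs.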

Depending on the size and structure of your organization, you may need to engage members of other groups to help you meet your business continuity objectives. Don't miss this step! Unless you have a clear understanding of all the resources your virtual infrastructure is dependent on or interacts with, there's a good chance you'll make assumptions that can cost you time and money in the future.

For example, if you purchased iSCSI storage arrays as a cost-effective way to provide storage to a new data center, you might soon learn that there are network switches in your path that do not support or are not configured to support jumbo frames, which can be a requirement for certain workloads. Or, maybe you discover that your fibre channel switches and servers might support 8Gb connections, but your fiber [cables] support less than half of that in a reliable manner.
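
[Editor's sketch] Both examples come down to the same rule: a path is only as capable as its weakest element. A minimal sketch of that check follows, assuming you have already inventoried each element in the path. The `Hop` type, the function names, and the default requirements (a 9000-byte MTU as a common jumbo-frame setting, and 8 Gb links) are assumptions for illustration and would need to be replaced with the values your workloads actually require.

```python
from dataclasses import dataclass


@dataclass
class Hop:
    """Hypothetical inventory record for one element in a storage path."""
    name: str
    mtu: int      # largest frame size the element is configured to pass
    gbps: float   # link speed the element reliably sustains


def path_limits(hops):
    """Return the effective (MTU, Gbps) of the path: the minimum over all hops."""
    return min(h.mtu for h in hops), min(h.gbps for h in hops)


def weakest_hops(hops, required_mtu=9000, required_gbps=8.0):
    """Return the names of hops that fall short of either requirement."""
    return [h.name for h in hops
            if h.mtu < required_mtu or h.gbps < required_gbps]


if __name__ == "__main__":
    path = [
        Hop("server-nic", 9000, 10),
        Hop("edge-switch", 1500, 10),  # not configured for jumbo frames
        Hop("array-port", 9000, 10),
    ]
    print(path_limits(path))
    print(weakest_hops(path))
```

The inventory itself is the hard part, and it usually has to come from the network and facilities groups that own each element.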

As they say, “The devil is in the details,” so try to learn those details by communicating with the experts in each area before you purchase and deploy a technology intended to address business continuity objectives.