A DNS Failover Service – Is it right for you?

Over the years, DNS Failover has become a very popular service, largely because it is relatively inexpensive and fairly easy to deploy and manage. But is it the right solution for your organization? In this article, we’ll outline what DNS failover is and provide an overview of the benefits and appropriate uses for this type of technology.

What is DNS Failover?

Before weighing its benefits, we need to define what DNS failover is. Simply put, DNS failover is an add-on feature of a DNS service, built from two main components. The first component monitors the servers or devices behind a set of IP addresses to determine their state (up or down). The second uses that state information to automatically update the IP address associated with a particular DNS record, such as an ‘A’ or ‘AAAA’ record. A service may also include components that deliver email or SMS alerts when an outage is detected or a change is made, but any successful DNS Failover solution needs at least these two essential elements.
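To make the two components concrete, here is a minimal Python sketch. It is only an illustration under our own assumptions: the names tcp_check and failover_step are hypothetical, the "zone" is a plain dictionary standing in for a real DNS provider's API, and a simple TCP connection attempt stands in for whatever monitoring a real service performs.

```python
import socket
from typing import Callable, Dict


def tcp_check(ip: str, port: int = 80, timeout: float = 3.0) -> bool:
    """Component 1 (monitoring): True if a TCP connection succeeds in time."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat the server as down.
        return False


def failover_step(zone: Dict[str, str], name: str, primary: str, backup: str,
                  check: Callable[[str], bool] = tcp_check) -> str:
    """Component 2 (updating): point the record at the primary if it is up,
    otherwise at the backup. Returns the address now published."""
    target = primary if check(primary) else backup
    if zone.get(name) != target:
        # Assumption: a real service would call its DNS provider's API here.
        zone[name] = target
    return target
```

A monitoring loop would call failover_step on a schedule. Passing a different check function lets you swap the TCP probe for an HTTP or application-level test.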

Common Applications for DNS Failover

So now that we know what DNS Failover is, what is the most common use for it and what applications can benefit from this method of automating DNS updates?

Perhaps the most obvious use that comes to mind is controlling web traffic. Failing over between redundant web servers is probably the most frequently used application. From the failover system’s perspective, it doesn’t matter where the two IP addresses are, so long as they are different. It can therefore be used to fail over between two servers at the same datacenter, between servers at different datacenters (even across the globe), or between two network connections on the same server, each with its own IP address. An example of the last case is a server at a corporate office with one IP address on each of two different T1 lines.

Naturally, what can be done for web servers can also be done for mail servers, FTP servers, streaming media servers or VoIP servers, just to name a few. Essentially, if it has an IP address and can be monitored externally, DNS Failover can route traffic to it when it is up, and redirect to another server when it is down.

Common Features

Most well-thought-out DNS Failover implementations should have at least a few basic features. First, and probably paramount to failover success, is monitoring. The more advanced the monitoring feature set, the more accurately you can detect outages. At a minimum, we would expect the ability to fine-tune the monitors and to run them from multiple geographic locations, so that a network problem near a single monitoring site is not mistaken for a server outage and failover occurs only when it is actually needed. It should go without saying that the monitoring and alerting functionality should be able to send alerts via email, SMS or both.
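One way to combine results from several monitoring locations is a simple majority rule. The sketch below is our own illustration, not a description of any particular service: the function name is_down, the location labels and the 0.5 threshold are all assumptions.

```python
from typing import Dict


def is_down(reports: Dict[str, bool], quorum: float = 0.5) -> bool:
    """Decide whether a server should be treated as down.

    reports maps a monitoring location to True if its probe succeeded.
    The server counts as down only when more than `quorum` of the
    locations report a failure.
    """
    if not reports:
        # No monitoring data at all: do not trigger a failover.
        return False
    failures = sum(1 for ok in reports.values() if not ok)
    return failures / len(reports) > quorum
```

With three locations and the default threshold, a single failing location is ignored, while two failing locations trigger failover.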

Another important feature, in our opinion, is some flexibility in how traffic is routed. The most common traffic routing configurations are sequential and round-robin. Sequential lets you list servers in priority order, so that if the server with a priority of 1 goes down, traffic is sent to the server with a priority of 2, and so forth. Round-robin publishes a DNS record for every active and available server so that traffic is distributed among them as evenly as possible. When a server goes down or otherwise becomes unavailable, it is simply removed from the pool, and traffic continues to be distributed among the remaining server(s).
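The difference between the two modes comes down to which addresses get published. The following sketch is a simplified illustration; the Server type and both function names are hypothetical, and a real service would publish the resulting addresses as DNS records rather than return a list.

```python
from typing import List, NamedTuple


class Server(NamedTuple):
    ip: str
    priority: int  # lower number = preferred (used by sequential mode)
    up: bool       # current state as reported by monitoring


def sequential(servers: List[Server]) -> List[str]:
    """Publish only the healthy server with the best (lowest) priority."""
    healthy = [s for s in servers if s.up]
    if not healthy:
        return []
    return [min(healthy, key=lambda s: s.priority).ip]


def round_robin(servers: List[Server]) -> List[str]:
    """Publish one record for every healthy server."""
    return [s.ip for s in servers if s.up]


pool = [
    Server("192.0.2.10", priority=1, up=False),
    Server("198.51.100.20", priority=2, up=True),
    Server("203.0.113.30", priority=3, up=True),
]
print(sequential(pool))   # ['198.51.100.20']
print(round_robin(pool))  # ['198.51.100.20', '203.0.113.30']
```

In this example the priority 1 server is down, so sequential mode publishes the priority 2 address, while round-robin publishes both remaining servers.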

And finally, you’ll probably want a feature that prevents the automatic re-announcement of a server’s IP address when it comes back online. For a server providing static web content, re-adding it automatically may not pose a problem, but for a database server that must be resynchronized before it can safely take traffic again, this little feature is critical.
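This behavior can be modeled as a record that fails over automatically but returns to the primary only when told to. The class below is a minimal sketch under our own assumptions; FailoverRecord, auto_failback and restore_primary are hypothetical names, not any provider's API.

```python
class FailoverRecord:
    """Tracks which address is published for one DNS record."""

    def __init__(self, primary: str, backup: str, auto_failback: bool = False):
        self.primary = primary
        self.backup = backup
        self.auto_failback = auto_failback
        self.active = primary

    def report(self, primary_up: bool) -> str:
        """Apply the latest monitoring result and return the active address."""
        if not primary_up:
            self.active = self.backup
        elif self.auto_failback:
            self.active = self.primary
        # With auto_failback off, a recovered primary stays out of service
        # until restore_primary() is called.
        return self.active

    def restore_primary(self) -> str:
        """Manual step, e.g. after a database has been resynchronized."""
        self.active = self.primary
        return self.active
```

With auto_failback left at False, the record keeps pointing at the backup even after the primary passes its checks again, which gives an operator time to resynchronize data first.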

Limitations of DNS Failover

The biggest limitation of using DNS Failover for application switching is the difficulty of detecting an outage quickly and making a change that actually propagates across the Internet fast enough to seamlessly redirect users, both current and future. Even the best DNS Failover solutions, which detect an outage accurately and update DNS quickly, still leave customers at the mercy of a well-known problem: caching. This is an extremely tough issue to resolve because it is not within the DNS Failover service provider’s control.

Caching issues are created when ISPs ignore TTL values, or they can be caused by something as simple as default browser settings, which cache content (and often DNS lookups as well) in order to improve page load times. A TTL of 60 seconds is about as low as is commonly used in practice, although the DNS protocol itself permits lower values, and for some ISPs this is simply too short. AOL, for example, is known to ignore low TTL values and substitute something much higher, say 15 minutes. But apart from those ISPs that ignore TTL values, a 60 second setting in theory *should* propagate across the Internet relatively quickly.
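A rough way to reason about this is to add the time needed to detect an outage to the longest time a resolver may keep serving the old answer. The function below is a back-of-the-envelope estimate under our own assumptions: the parameter names are hypothetical, the check interval and failure count are example values, and browser or application caches are ignored.

```python
def worst_case_seconds(check_interval: int, failures_required: int,
                       record_ttl: int, resolver_min_ttl: int = 0) -> int:
    """Estimate the longest time a user may keep reaching a dead server.

    check_interval     seconds between monitoring checks
    failures_required  consecutive failed checks before failover triggers
    record_ttl         TTL published on the DNS record
    resolver_min_ttl   floor imposed by a resolver that ignores low TTLs
    """
    detection = check_interval * failures_required
    effective_ttl = max(record_ttl, resolver_min_ttl)
    return detection + effective_ttl


# Checks every 30 seconds, 2 failures required, 60 second TTL:
print(worst_case_seconds(30, 2, 60))                        # 120
# Same setup behind a resolver that enforces a 15 minute minimum:
print(worst_case_seconds(30, 2, 60, resolver_min_ttl=900))  # 960
```

Under these example numbers, a well-behaved resolver exposes users to about two minutes of downtime, while a resolver that substitutes a 15 minute TTL stretches that to 16 minutes.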

Is there anything better?

Depending on your application, Cloud Load Balancing might be a better fit. That, of course, should be dictated by your specific requirements for uptime, your need for flexibility in balancing traffic among servers, and how critical it is to avoid the caching problems caused by ignored TTL values or just plain old browser settings. Cloud Load Balancing is similar in design, but offers extremely powerful traffic management capabilities similar to those of hardware load balancers (except on a global scale) for serious online organizations.