Amazon Route 53 DNS failover

Tuesday, April 02 2013

While it is no longer shiny and new, I only recently got a chance to sit down and play with Amazon Route 53’s DNS failover feature. So far, I have found it to be simple and very useful for straightforward cases where DNS failover is acceptable.

My use case

I run EVE Market Data Relay (EMDR), which is a distributed EVE Online market data distribution system. All pieces of the infrastructure have at least one redundant copy, and the only single point of failure is the DNS service itself. We can afford to lose a little bit of data during failover, but a complete outage is something we can’t have.

Sitting at the top of the system are two HTTP gateways on different machines at different ISPs. These are set up as a weighted record set, with the two gateways weighted 50/50 (requests are divided evenly).
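As a rough sketch of what such a weighted record set looks like, the snippet below builds two records using the field names from the Route 53 ChangeResourceRecordSets API (the same shape boto3 accepts under "ResourceRecordSet"). The hostname, set identifiers, IP addresses, and TTL are placeholders for illustration, not EMDR's real values, and nothing here talks to AWS.

```python
def weighted_a_record(name, set_identifier, ip_address, weight, ttl=60):
    """Build one weighted A record as a plain dict (no AWS call is made)."""
    return {
        "Name": name,
        "Type": "A",
        "SetIdentifier": set_identifier,  # distinguishes records sharing a name
        "Weight": weight,
        "TTL": ttl,  # assumed value; the post does not state the TTL
        "ResourceRecords": [{"Value": ip_address}],
    }


def traffic_share(records):
    """Return each record's expected share of DNS answers (weight / total)."""
    total = sum(r["Weight"] for r in records)
    return {r["SetIdentifier"]: r["Weight"] / total for r in records}


# Hypothetical hostname and documentation-range IPs.
gateways = [
    weighted_a_record("gateway.example.com.", "gateway-1", "192.0.2.10", 50),
    weighted_a_record("gateway.example.com.", "gateway-2", "198.51.100.10", 50),
]
shares = traffic_share(gateways)
print(shares)
```

Only the ratio between the weights matters here, so 1/1 would split requests the same way as 50/50.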

We introduce the Route 53 magic by adding in a health check and associating it with each of the two resource record sets. The health check involves Route 53 servers around the world periodically calling a pre-determined URL on the two HTTP gateways in search of a non-error HTTP status code. If any of the entries fails more than three times (they check roughly every 30 seconds), said entry is removed from the weighted set.
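The behaviour described above can be approximated with a toy model. This is only a sketch of the idea, not how Route 53 is implemented: the class name, the consecutive-failure threshold of three, and the rule that 2xx and 3xx responses count as passing are all assumptions made for illustration.

```python
class HealthTracker:
    """Toy model: unhealthy after `threshold` consecutive failed checks."""

    def __init__(self, threshold=3):
        self.threshold = threshold  # assumed value for this sketch
        self.consecutive_failures = 0

    def record(self, status_code):
        # Assumption: 2xx and 3xx count as passing, anything else as failing.
        if 200 <= status_code < 400:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1

    @property
    def healthy(self):
        return self.consecutive_failures < self.threshold


def active_gateways(trackers):
    """Return the names of gateways that stay in the weighted set."""
    return [name for name, tracker in trackers.items() if tracker.healthy]


trackers = {"gateway-1": HealthTracker(), "gateway-2": HealthTracker()}
for status in (200, 500, 500, 500):  # gateway-2 starts returning errors
    trackers["gateway-1"].record(200)
    trackers["gateway-2"].record(status)

remaining = active_gateways(trackers)
print(remaining)
```

A single passing check resets the counter in this model, so a gateway that merely hiccups once is not pulled from rotation.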

By the time that Route 53 picks up on the failure, yanks the entry from the weighted set, and most fast ISP DNS servers notice the change, about two minutes have elapsed.
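A back-of-the-envelope version of that estimate, assuming detection takes three failed checks at 30-second intervals and that resolvers honour a 30-second record TTL (the TTL is my assumption; it is not stated above):

```python
def failover_window_seconds(check_interval=30, failure_threshold=3, record_ttl=30):
    """Rough worst-case time until well-behaved resolvers stop
    handing out the failed gateway."""
    detection = check_interval * failure_threshold  # time to notice the failure
    return detection + record_ttl  # plus time for cached answers to expire


window = failover_window_seconds()
print(window)
```

Resolvers that ignore TTLs will take longer than this, which is why the figure is only approximate.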

Why this is a good fit for EMDR

With EVE Market Data Relay, it’s not the end of the world if 50% of user-submitted data gets lost over the minute and a half it takes for Route 53 to remove the unhealthy gateway. It’s highly likely that another user will re-submit the very same data that was lost. Even if we never see the data, the loss of a few data points here and there doesn’t hurt us much in our case.
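To put a number on that, here is a small sketch of the expected loss. The submission rate is a made-up figure for illustration; the 50% share and 90-second window come from the paragraph above.

```python
def expected_lost_submissions(requests_per_second, unhealthy_share, outage_seconds):
    """Estimate submissions sent to the dead gateway before it is removed."""
    return requests_per_second * unhealthy_share * outage_seconds


# Hypothetical rate of 10 submissions per second.
lost = expected_lost_submissions(
    requests_per_second=10, unhealthy_share=0.5, outage_seconds=90
)
print(lost)
```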

With that said, DNS failover in general can be sub-optimal in a few basic cases:

  • You don’t want to leave failover up to the many crappy ISP DNS servers around the net. Not all will pick up the change in a timely manner.
  • You can’t afford to lose some requests here and there. DNS failover isn’t seamless, so your application would need to be smart enough on both ends if data loss is unacceptable.

For simpler cases like mine, it’s wonderful.


In my case, Route 53 is health checking two servers that are external to AWS, which means I spend a whopping $1.50/month on Route 53’s DNS failover.

Assorted useful bits of documentation

More details on how the health checks work can be found in the Route 53 documentation.

Refer to the Amazon Route 53 DNS failover documentation for the full run-down.

We all wear many hats, and it’s great!

Tuesday, February 12 2013

Something that I had hoped for and have found in working at a small business (Pathwright) is the variability I see from day to day. You have a general idea of what you need to be doing, what your goals are in the immediate, mid, and longer term, but it ...


Fabric task for notifying New Relic of a code deploy

Monday, February 11 2013

A brief example Fabric task for notifying New Relic of code deploys.


Ansible first impressions

Friday, February 08 2013

After brief visits with Puppet and Chef for config management, I’ve set my sights on Ansible. It’s late and I’ve been staring at this stuff for way too long today, but here are some early observations:

  • I really like that it is written in Python. Puppet and ...

Amazon Elastic Transcoder Review

Wednesday, January 30 2013

Amazon Elastic Transcoder was released just a few short days ago. Given that we do a lot of encoding at Pathwright, this was of high interest to us. A year or two ago, we wrote media-nommer which is similar to Amazon’s Transcoder, and it has worked well for us ...


MUD tech is fun/cool, but…

Tuesday, January 08 2013

As software development evolves, there are an ever-expanding number of ways to put together very complex, elaborate systems that are fun to geek out on. Multi-processing is becoming increasingly prevalent, distributed systems are a boon to cases with massive scalability or reliability requirements, and there are all kinds of neat ...


python-route53 released!

Wednesday, November 14 2012

After some more time in the cooker, python-route53 1.0 has landed on PyPI. This is a stand-alone Route 53 package, independent from the one in boto. The major highlights are:

  • Python 2.7 and 3.x compatibility.
  • Extremely simple API.
  • Powered by requests.

Read the documentation, see the source ...


python-route53 feedback wanted

Thursday, November 08 2012

Late last night (or early this morning), I finished the draft of python-route53, a stand-alone Route 53 module with Python 3.x and Python 2.7 compatibility. Route 53 is an excellent DNS service offered by Amazon Web Services. It exposes everything through an API.

My intentions with python-route53 are ...


python-fedex and colormath re-licensed under BSD

Tuesday, October 23 2012

I am happy to announce that python-fedex and python-colormath have been re-licensed under the BSD License. At the time these two packages were created, there were reasons for GPL’ing these. However, said reasons have long since been removed, so it’s BSD time!

My involvement with both of these ...


python-bluefin 1.3 released

Tuesday, October 16 2012

python-bluefin 1.3 has been released, now with improved error handling. The major feature in this release is that we have smoothed over some inconsistencies in Bluefin’s error handling.

Instead of setting an HTTP status code indicating an error like they do for most of the Bluefin API errors ...
