Amazon Route 53 DNS failover
While it is no longer shiny and new, I only recently got a chance to sit down and play with Amazon Route 53’s DNS failover feature. So far, I have found it simple to set up and very useful for basic cases where DNS failover is acceptable.
My use case
I run EVE Market Data Relay (EMDR), which is a distributed EVE Online market data distribution system. All pieces of the infrastructure have at least one redundant copy, and the only single point of failure is the DNS service itself. We can afford to lose a little bit of data during failover, but a complete outage is something we can’t have.
Sitting at the top of the system are two HTTP gateways on different machines at different ISPs. These are set up as a weighted record set, with both gateways given the same weight, so requests are divided evenly (50/50).
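As a rough sketch, an evenly weighted pair like this could be described with a change batch in the shape that boto3's Route 53 client accepts for change_resource_record_sets. The host name, IP addresses, set identifiers, and TTL below are placeholders for illustration, not EMDR's real values.

```python
def weighted_record(name, ip, set_id, weight, ttl=60):
    """Build one weighted A record in the Route 53 change-batch format."""
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": name,
            "Type": "A",
            "SetIdentifier": set_id,  # must be unique per record in the set
            "Weight": weight,         # relative to the other records' weights
            "TTL": ttl,
            "ResourceRecords": [{"Value": ip}],
        },
    }

# Placeholder identifiers and documentation-range IPs (hypothetical).
gateways = [("gateway-1", "192.0.2.10"), ("gateway-2", "198.51.100.20")]
change_batch = {
    "Changes": [
        weighted_record("upload.example.com.", ip, set_id, weight=50)
        for set_id, ip in gateways
    ]
}

# Share of traffic each record should receive: weight / sum of weights.
total = sum(c["ResourceRecordSet"]["Weight"] for c in change_batch["Changes"])
for c in change_batch["Changes"]:
    rrs = c["ResourceRecordSet"]
    print(rrs["SetIdentifier"], rrs["Weight"] / total)
```

With boto3 installed and credentials configured, this dictionary could be passed as the ChangeBatch argument to the client's change_resource_record_sets call, along with your hosted zone ID.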
We introduce the Route 53 magic by adding a health check and associating it with each of the two resource record sets. The health check involves Route 53 servers around the world periodically calling a pre-determined URL on the two HTTP gateways, looking for a non-error HTTP status code. If any of the entries fails more than three times (they check roughly every 30 seconds), said entry is removed from the weighted set.
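To make the removal rule concrete, here is a toy simulation of the idea. It is not how Route 53 is implemented internally (the real service combines results from many checkers); the threshold of three consecutive failures and the "2xx or 3xx counts as a pass" rule are assumptions for this sketch, and the gateway names are made up.

```python
FAILURE_THRESHOLD = 3  # assumed: consecutive failed checks before removal


class HealthTracker:
    """Track consecutive check failures per endpoint."""

    def __init__(self, endpoints, threshold=FAILURE_THRESHOLD):
        self.threshold = threshold
        self.failures = {name: 0 for name in endpoints}

    def record(self, name, status_code):
        """Record one check result; a 2xx/3xx status resets the counter."""
        if 200 <= status_code < 400:
            self.failures[name] = 0
        else:
            self.failures[name] += 1

    def healthy(self):
        """Endpoints still eligible to be returned in DNS answers."""
        return [n for n, f in self.failures.items() if f < self.threshold]


tracker = HealthTracker(["gateway-1", "gateway-2"])
for status in (500, 500, 500):
    tracker.record("gateway-1", status)  # gateway-1 keeps failing
    tracker.record("gateway-2", 200)     # gateway-2 stays up
print(tracker.healthy())
```

After the third failed check, only gateway-2 remains in the healthy list, which mirrors the unhealthy entry being pulled from the weighted set.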
By the time Route 53 picks up on the failure, yanks the entry from the weighted set, and the faster ISP DNS servers notice the change, about two minutes have elapsed.
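A back-of-the-envelope version of that estimate, as a small sketch. The 30-second check interval comes from the description above; the count of three failed checks and the 30-second record TTL are assumptions chosen for illustration.

```python
CHECK_INTERVAL_S = 30  # from the text: checks run roughly every 30 seconds
FAILED_CHECKS = 3      # assumed number of failed checks before removal
RECORD_TTL_S = 30      # hypothetical TTL on the weighted records

# Time for Route 53 to decide the gateway is unhealthy.
detection_s = CHECK_INTERVAL_S * FAILED_CHECKS

# A well-behaved resolver may keep serving the old answer for up to one TTL.
caught_up_s = detection_s + RECORD_TTL_S

print(f"detection ~{detection_s}s, resolvers caught up by ~{caught_up_s}s")
```

Resolvers that ignore the TTL will take longer than this, which is the main caveat with DNS-based failover.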
Why this is a good fit for EMDR
With EVE Market Data Relay, it’s not the end of the world if 50% of user-submitted data gets lost over the minute and a half it takes for Route 53 to remove the unhealthy gateway. It’s highly likely that another user will re-submit the very same data that was lost. Even if we never see the data, the loss of a few data points here and there doesn’t hurt us much in our case.
With that said, DNS failover in general can be sub-optimal in a few basic cases:
- You don’t want to leave failover up to the many crappy ISP DNS servers around the net. Not all will pick up the change in a timely manner.
- You can’t afford to lose some requests here and there. DNS failover isn’t seamless, so your application would need to be smart enough on both ends if data loss is unacceptable.
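If losing requests is not acceptable, the client side of "smart enough on both ends" usually means retrying. Here is a minimal sketch of retry with exponential backoff; the send callable, the payload, and the delay values are hypothetical stand-ins, not part of EMDR.

```python
import time


def submit_with_retry(send, payload, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send(payload) until it succeeds or the attempts run out.

    `send` is a hypothetical callable that raises OSError on network failure.
    """
    for attempt in range(attempts):
        try:
            return send(payload)
        except OSError:
            if attempt == attempts - 1:
                raise  # out of attempts; let the caller decide what to do
            sleep(base_delay * 2 ** attempt)  # back off: 1s, 2s, 4s, ...


# Demo: a fake sender that fails twice, then succeeds.
calls = []


def flaky_send(payload):
    calls.append(payload)
    if len(calls) < 3:
        raise ConnectionError("gateway unreachable")
    return "accepted"


result = submit_with_retry(flaky_send, {"order": 1}, sleep=lambda s: None)
print(result, len(calls))
```

The server side would also need to tolerate duplicates, since a retried request may have been processed the first time.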
For simpler cases like mine, it’s wonderful.
In my case, Route 53 is health checking two servers that are external to AWS, which means I spend a whopping $1.50/month on Route 53’s DNS failover.
We all wear many hats, and it’s great!
Fabric task for notifying New Relic of a code deploy
A brief example Fabric task for notifying New Relic of code deploys.
Ansible first impressions
Amazon Elastic Transcoder Review
MUD tech is fun/cool, but…
As software development evolves, there are an ever-expanding number of ways to put together very complex, elaborate systems that are fun to geek out on. Multi-processing is becoming increasingly prevalent, distributed systems are a boon to cases with massive scalability or reliability requirements, and there are all kinds of neat ...
- Python 2.7 and 3.x compatibility.
- Extremely simple API
- Powered by requests
python-route53 feedback wanted
Late last night (or early this morning), I finished the draft of python-route53, a stand-alone Route 53 module with Python 3.x and Python 2.7 compatibility. Route 53 is an excellent DNS service offered by Amazon Web Services. It exposes everything through an API.
My intentions with python-route53 are ...
python-fedex and colormath re-licensed under BSD
I am happy to announce that python-fedex and python-colormath have been re-licensed under the BSD License. At the time these two packages were created, there were reasons for GPL’ing these. However, said reasons have long since been removed, so it’s BSD time!
My involvement with both of these ...
python-bluefin 1.3 released
python-bluefin 1.3 has been released, now with improved error handling. The major feature in this release is that we have smoothed over some inconsistencies in Bluefin’s error handling.
Instead of setting an HTTP status code indicating an error like they do for most of the Bluefin API errors ...