nginx AWS ELB name resolution with resolvers

If you are running nginx as a proxy in front of an Amazon Web Services Elastic Load Balancer (ELB), it is not safe to merely define an upstream using the ELB’s hostname and call it a day. By default, nginx performs name resolution only at startup, caching the resolved IP address indefinitely.

ELB instances scale up and down with your traffic levels, often changing IP addresses in the process. It appears that increased traffic leads to Amazon spawning a new, beefier ELB instance, then updating the DNS record to point at it. Amazon keeps the old ELB instance around for a little while to give clients time to pick up the new record, but the old instance (and its old IP) is retired after a short period. We need nginx to periodically re-resolve the load balancer’s hostname so the IP address change doesn’t cause a service interruption.

The fix

Fortunately, this one is really simple to remedy. You need only use the resolver config directive in your nginx config. Specifying a DNS server with the resolver directive tells nginx to check with that server every five minutes (by default) to see whether the upstream ELB has changed IPs. The lookup happens in a non-blocking manner and should pose no real threat to your server’s throughput.

The other critical piece is that you must append $request_uri to whatever proxy_pass value you’ve specified. When proxy_pass contains a variable, nginx resolves the hostname at request time rather than once at startup; without it, the DNS caching behavior remains and you are no better off. See the example below.

Example

http {
   [...]

   # Causes DNS resolution of upstreams every 5 minutes.
   resolver 172.16.0.23;

   [...]

   server {
      [...]

      location / {
         proxy_pass http://somewhere.com$request_uri;
      }

      [...]
   }
}

The resolver directive can be used in http, server, and location sections, so you can get as specific or as broad as you’d like.
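
For instance, to re-resolve only a single proxied location, you can scope the directive like this (a minimal sketch; the location path, hostname, and resolver IP are placeholders):

location /api/ {
   # This resolver only affects lookups triggered within this location.
   resolver 172.16.0.23;
   proxy_pass http://somewhere.com$request_uri;
}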

The future fix

A later version of nginx will honor DNS TTLs, so look forward to that. I’ll try to remember to update this article when this lands.
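
That release is also expected to grow a valid parameter on resolver for those who want a fixed re-resolution interval instead of the TTL. A sketch of what that should look like (check your nginx version’s docs before relying on it):

# Re-resolve upstream hostnames every 30 seconds, regardless of DNS TTL.
resolver 172.16.0.23 valid=30s;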

Rabbits for the celery

I run an Arch Linux desktop as my primary development workstation. We use celery pretty heavily on some of our Django projects, and while working to bring my local environment at least somewhat closer to our production setup, I found there isn’t a non-AUR package for rabbitmq-server.

Of course you can just download an AUR package, but that’s not nerdy enough. I managed to get a GitHub project to act as a pacman repository, and will share it here in case anyone else would like to stay up to date with rabbitmq-server without mucking with AUR packages themselves.

Just to be clear, there is no real benefit to installing from my repository other than not having to download PKGBUILDs and handle compiling and installing yourself. I just yanked the highest-voted rabbitmq-server package off the AUR and pulled it into a repository.

If there are other common packages that you use and I might also use, leave a comment and I might be convinced to pull them in.

Repository: https://github.com/duointeractive/archduo
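
To use it, you’d add a section to /etc/pacman.conf along these lines (a sketch; the Server URL is my guess at the layout, so check the repository’s README for the canonical line):

# /etc/pacman.conf
[archduo]
# The Server URL below is an assumption; see the archduo README for
# the real one.
Server = https://github.com/duointeractive/archduo/raw/master/$arch

After that, a pacman -Sy followed by pacman -S rabbitmq-server should pull the package in.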

Amazon EC2 and long restart delays

For the benefit of others either considering Amazon’s EC2 or already there, I thought I’d point something out. I am not sure if this is an Ubuntu EC2 AMI issue, an EC2 issue, an EBS issue, or some combination of all three, but we are experiencing some really erratic restart times. We run our EC2 instances with the following basic configuration:

  • Size: small and medium high-cpu
  • Root device: EBS
  • Distro: Ubuntu 10.10 (the latest)

Symptoms

The behavior we are seeing is that even our simplest instances, with no extra EBS volumes, periodically hang on boot when restarted. A restart can take anywhere from under 60 seconds to 5 minutes, or never complete at all. In the ‘not at all’ case, the instance never gets far enough for us to even SSH in, and the syslog shown in the web-based management console looks out of date.

In the case of a complete hang on restart (it seems to be 50-50 right now), we have to reboot the machine again from the web-based AWS EC2 management console. This second reboot usually results in a full startup.
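
If you’d rather not click through the console, the same reboot can be issued with the EC2 API tools (the instance ID here is a placeholder):

# Reboot a hung instance from the command line; substitute your own
# instance ID.
ec2-reboot-instances i-12345678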

Our hunch

From what we can tell, this may be EBS-related. Even though we specifically set nobootwait on our swap partition (an ephemeral drive that is included as standard with the Ubuntu AMI), it seems like Ubuntu may be freaking out when it can’t reach the root EBS volume. It’s also possible that, despite the nobootwait flag in the fstab, the same thing is happening with the swap partition as well. We haven’t had a lot of time to try different combinations, as we’re insanely busy right now.
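
For reference, the relevant fstab entry looks something like this (a sketch; the device name is illustrative and may differ on your instance type):

# /etc/fstab -- swap on the ephemeral drive; device name is illustrative.
/dev/sda3   none   swap   sw,nobootwait   0   0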

If anyone else has experienced something similar, please chime in.

Toshiba Tecra A4 on Ubuntu Ibex Alpha

I haven’t had enough time to really get some wear-and-tear testing in, but it looks like, as of today, the latest Ubuntu Ibex alpha release runs almost flawlessly on the Toshiba Tecra A4 (PTA40E). Wireless and the wired NIC work without any kernel boot options, and performance is much improved without noapic.

There are just a few minor annoyances, like what appears to be a remaining quirk with the ipw2200 driver restarting in some rare cases, though it happens a lot less frequently. Dimming on battery disconnect and the function keys are still broken, but neither is a big deal.

In any case, it’s definitely usable now, and Ibex is looking pretty sharp.