Like many, I’ve been playing with Docker as it has been developing. I felt reasonably comfortable creating images and running isolated containers, but I had yet to tinker with automatically provisioning and networking multiple containers together.
To remedy this, I spent a weekend writing a simple distributed, Dockerized image crawler using Twisted, Docker, and fig. The full source (under a BSD license) can be found in the GitHub repo.
The README covers a lot of this in more detail, but the application is composed of three container types (a sketch of a fig.yml wiring them together follows the list):
- An HTTP web API for enqueuing jobs. For the sake of this example, you only
run one of these, but it’d be easy to adapt the code to support multiple instances.
- A worker container that carries out the image crawling. You can run as many of these as you’d like.
- A Redis container that is used for tracking job state and results.
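For reference, here’s a minimal fig.yml along these lines. This is just a sketch, not the repo’s actual config; the service names, port, and wiring are illustrative:

```yaml
# Hypothetical fig.yml wiring the three container types together.
web:
  build: .
  ports:
    - "8000:8000"   # HTTP API for enqueuing jobs
  links:
    - redis
worker:
  build: .
  links:
    - web           # workers PULL jobs from the web container over ZeroMQ
    - redis
redis:
  image: redis
```

With something like this in place, `fig up` brings up the whole stack, and `fig scale worker=4` spins up extra workers.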
The worker daemons connect to the HTTP web API container over ZeroMQ, PULLing jobs that the HTTP daemon PUSHes. If this were anything but a toy app, you’d probably want a proper message broker that could offer stronger delivery and persistence guarantees.
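As a rough sketch of the PUSH/PULL pattern, here’s what the two halves look like in plain pyzmq (the repo itself runs through Twisted, so the real code differs; the address and job payload below are made up):

```python
import zmq

context = zmq.Context()

# HTTP API side: bind a PUSH socket and fan enqueued jobs out to workers.
sender = context.socket(zmq.PUSH)
sender.bind("tcp://*:5557")
sender.send_json({"url": "http://example.com/some-page/"})

# Worker side (this half lives in a separate worker container): connect
# a PULL socket and block until a job arrives. ZeroMQ round-robins
# PUSHed messages across all connected PULL sockets, which is what lets
# you run as many workers as you'd like.
receiver = context.socket(zmq.PULL)
receiver.connect("tcp://web:5557")  # "web" being the fig link alias
job = receiver.recv_json()
print(job["url"])
```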
Assorted Observations
- I may have been doing something wrong, but boot2docker seems to be pretty rough with volume mounting. Mounting a directory that lives within the boot2docker VM looks easy enough, but sharing from the host machine into a container didn’t update in real time. I expect this will improve with time, or I could have missed something.
- fig is a great idea, and is very handy. I’m looking forward to seeing where
this concept is taken over the next few years.
- There is an official python:2.7 image on Docker Hub that serves as a great starting point. They’ve got all sorts of other versions and alternative implementations (PyPy) covered. A minimal Dockerfile sketch follows below.
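A Dockerfile built on top of that image can be very short; a hypothetical one for the worker might look like this (the file names are made up):

```dockerfile
FROM python:2.7

# Copy and install dependencies first so Docker can cache this layer
# across rebuilds that only touch application code.
COPY requirements.txt /app/requirements.txt
RUN pip install -r /app/requirements.txt

COPY . /app
WORKDIR /app

CMD ["python", "worker.py"]
```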