python-colormath 3.0.0 released

After a lengthy delay, python-colormath 3.0.0 has been released. The big changes are requiring networkx>=2.0 and shifting our Python 3 support from 3.3/3.4 to 3.5/3.6. See the release notes for the full details.

Impending Archival

I'll probably give this release a while to simmer in the wild to make sure I didn't break anything. At some arbitrary point in the next few months, I intend to archive python-colormath on GitHub so users understand that it will no longer be maintained. This codebase is now ten years old, needs more love than I can afford to give it, and I'm very far from this subject matter these days.

If you are interested in taking the reins, please get in touch with me and we can talk potential fit!

Update: Jan 2018

We have some preliminary interest, so the project remains un-archived. We're working through a few updates and will see if activity remains steady enough to transfer the project over!

Fire up an interactive bash Pod within a Kubernetes cluster

In those cases where you need a throw-away interactive shell within your cluster:

$ kubectl run my-shell --rm -i --tty --image ubuntu -- bash

You may, of course, use a different image or shell. Some of the arguments explained:

  • my-shell: This ends up being the name of the Deployment that is created. Your pod name will typically be this plus a unique hash or ID at the end.
  • --rm: Delete any resources we've created once we detach. When you exit out of your session, this cleans up the Deployment and Pod.
  • -i/--tty: The combination of these two is what allows us to attach to an interactive session.
  • --: Delimits the end of the kubectl run options from the positional arg (bash).
  • bash: Overrides the container's CMD. In this case, we want to launch bash as our container's command.
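
If you end up launching these shells from a script, the same invocation can be assembled in Python. This is only a sketch: build_kubectl_shell_command and run_shell are hypothetical helpers (not part of kubectl or any library), and they use nothing beyond the flags explained above.

```python
import subprocess


def build_kubectl_shell_command(name="my-shell", image="ubuntu", shell="bash"):
    """Return the argv list for a throw-away interactive shell.

    Hypothetical helper for illustration; it mirrors the flags explained above.
    """
    return [
        "kubectl", "run", name,
        "--rm",            # clean up what was created once we detach
        "-i", "--tty",     # attach an interactive terminal session
        "--image", image,
        "--",              # everything after this is the container command
        shell,
    ]


def run_shell(**kwargs):
    """Launch the shell in this terminal (requires kubectl on your PATH)."""
    return subprocess.call(build_kubectl_shell_command(**kwargs))


if __name__ == "__main__":
    # Print the command instead of running it, so this is safe to try anywhere.
    print(" ".join(build_kubectl_shell_command(image="alpine", shell="sh")))
```

Building the command as a list (rather than one string) avoids shell quoting problems if the name or image ever comes from user input.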

Note that you'll need to update your apt cache before you can install packages:

$ apt update
$ apt install cowsay

Once you are done with your tinkering:

$ exit

Post-exit, the Deployment and Pod that kubectl created will both be stopped and deleted, taking with them our container and anything we did within it.

Simple outbound request rate limiting with App Engine

I've been doing a lot of playing with Google App Engine (GAE) of late, since it is a cheap/free way for me to quickly toss ideas against the wall and see what sticks. One of the tinker projects I've been working on is a sub-Reddit stat tracker (warning: hastily put together and un-finished) that records and ranks technical sub-Reddit activity over time.

Retrieving the data I needed required polling the Reddit API. There is an existing high-quality Python API client (PRAW), but I ran into a GAE + requests + HTTPS issue that prevented me from using it (PRAW uses requests). Said issue will be fixed in requests 2.10.0, but there's been no indication that requests 2.10.0 is arriving anytime soon. This was significant because PRAW handles rate limiting and OAuth authentication for you. Rather than wait for requests 2.10.0 or fork PRAW to use GAE's urlfetch service (which supports HTTPS), I decided to hit Reddit's public API directly without authenticating.

Warning: The days of unauthenticated Reddit API access may be coming to an end. I don't recommend unauthenticated API access for anyone doing anything more serious than a tinker project like mine.

Early goings

My initial draft used App Engine cron and task queues to schedule and parallelize the work. Once an hour, cron triggered a background task that would create sub-tasks for each sub-Reddit I'm tracking. These sub-tasks would hit the Reddit API once or twice, muddle through the return value, and toss some values into a Custom Metric on Google Stackdriver.
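
The fan-out step can be sketched roughly as follows. Everything named here is illustrative rather than my actual code: the sub-Reddit names, the /tasks/scan URL, and the fan_out helper are made up, and enqueue stands in for whatever actually adds a task to the queue (on App Engine's Python runtime that would be something along the lines of taskqueue.add).

```python
def fan_out(subreddits, enqueue):
    """Create one sub-task per tracked sub-Reddit and return how many were queued.

    `enqueue` is any callable taking (url, params). It is injected so that
    this sketch runs without the App Engine SDK installed.
    """
    count = 0
    for name in subreddits:
        # Hypothetical worker URL; the worker is what later polls the Reddit API.
        enqueue("/tasks/scan", {"subreddit": name})
        count += 1
    return count


if __name__ == "__main__":
    queued = []
    tracked = ["python", "golang", "rust"]  # example names only
    fan_out(tracked, lambda url, params: queued.append((url, params)))
    for url, params in queued:
        print(url, params["subreddit"])
```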

Since the first proof-of-concept wasn't rate limited, I ran into HTTP 429 (Too Many Requests) as I tracked more and more sub-Reddits.

App Engine Queue definitions to the rescue

I only needed to do a full scan of all of my tracked sub-Reddits once per hour. I had to make sure that I wasn't making more than 30 API calls per minute. I wanted to spread the requests out evenly, rather than exhaust my quota at the beginning of each minute. I also wanted to do this with minimal complexity.
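
To put rough numbers on that budget, here is a back-of-the-envelope sketch. The 30 calls per minute and the "once or twice" per sub-Reddit figures come from the text above; the constant and function names are made up for illustration.

```python
CALLS_PER_MINUTE = 30          # the per-minute budget stated above
MAX_CALLS_PER_SUBREDDIT = 2    # each sub-task hits the API once or twice


def even_spacing_seconds(calls_per_minute):
    """Seconds between calls if they are spread evenly over a minute."""
    return 60.0 / calls_per_minute


def max_subreddits_per_hour(calls_per_minute, calls_per_subreddit):
    """Worst-case number of sub-Reddits a single hourly scan can cover."""
    return (calls_per_minute * 60) // calls_per_subreddit


if __name__ == "__main__":
    print(even_spacing_seconds(CALLS_PER_MINUTE))  # 2.0
    print(max_subreddits_per_hour(CALLS_PER_MINUTE, MAX_CALLS_PER_SUBREDDIT))  # 900
```

In other words, one call every two seconds is the target, and an hourly scan has room for several hundred sub-Reddits before the budget becomes the bottleneck.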

Fortunately, App Engine task queues can be configured with a queue.yaml file in your project root. There are two directives in here that are particularly interesting:

  • rate - How often jobs are popped from the queue and distributed to your workers. The number of jobs that are popped at a time is determined by max_concurrent_requests. For example, a value of 30/m will mean the queue is popped at most 30 times per minute.
  • max_concurrent_requests - The max number of concurrently executing jobs.

Since the unauthenticated Reddit API rate limit is 30 requests per minute, I was able to enforce this at the queue level by using a rate of 30/m and a max_concurrent_requests of 1. Here is my full queue.yaml file. The end result:

  • Tasks are popped from the queue up to 30 times a minute (rate = 30/m).
  • We only pop one task at a time (max_concurrent_requests = 1).
  • We won't pop a new task until the currently running one is ACK'd.
  • As long as everything works as described in the docs, we stay under the rate limit at all times.
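
To convince yourself that these two settings are enough, here is a small simulation. It is a simplified model of my own, not App Engine's actual scheduler: it assumes tasks start no more often than the rate allows and never overlap, and it ignores details such as the bucket_size setting, which can permit short bursts. The function names are hypothetical.

```python
def simulate_dispatch(task_durations, rate_per_minute=30):
    """Return the start time (in seconds) of each task.

    Simplified model: one task runs at a time, and a new task starts only
    when (a) the previous task has finished and (b) at least one rate
    interval has passed since the previous start.
    """
    interval = 60.0 / rate_per_minute
    starts = []
    next_allowed = 0.0   # earliest start permitted by the rate
    free_at = 0.0        # when the single worker slot becomes free
    for duration in task_durations:
        start = max(next_allowed, free_at)
        starts.append(start)
        next_allowed = start + interval
        free_at = start + duration
    return starts


def max_starts_in_window(starts, window=60.0):
    """Largest number of starts inside any half-open window [t, t + window)."""
    best = 0
    for i, t in enumerate(starts):
        in_window = sum(1 for s in starts[i:] if s < t + window)
        best = max(best, in_window)
    return best


if __name__ == "__main__":
    # 99 quick tasks (0.3 s each) plus one slow task to show the concurrency cap.
    durations = [0.3] * 50 + [5.0] + [0.3] * 49
    starts = simulate_dispatch(durations)
    print(max_starts_in_window(starts))  # 30
```

Under this model the busiest minute never sees more than 30 task starts, and a slow task simply delays the next one rather than letting two requests overlap.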

As a result, we went from rate limiting errors all over the place to staying under Reddit's rate limit.

So you can rate limit. What's the big deal?

Rate limiting is not an especially difficult thing to implement, but I thought it was interesting to see how easy App Engine made this. My code doesn't know or care that it's being rate limited, which is nice. The most beautiful lines of code are the ones you don't have to write at all!

In the future, I'll want to either move over to PRAW when requests 2.10.0 lands or implement the bare minimum for OAuth authentication with App Engine's urlfetch service. At that point, I'll be able to twiddle my rate and concurrency values to get some more throughput.

Nothing earth-shattering here, but I thought I'd share!