Minecraft, Python, and nerdery

A little over a month ago, I was finally pulled into the rapidly growing thing that is Minecraft. Like many of you, I ended up happily breaking blocks and constructing crude huts and castles into the wee hours of the night.

As is the case with many other things I enjoy, I found myself wondering “What Python nerdery can I get into with Minecraft?” Much like the Minecraft client, the server is written in Java, which is not something I play with for fun. After some Googling around, I stumbled across the Bravo project, an effort to write a custom Minecraft server in Python. “Bingo!”

Bravo is built on top of Twisted and is aimed at being a much more efficient, extensible alternative to the “Notchian” official server. Development is still pretty early, but it is already just about suitable for those who want to run creative servers.

Lending a hand

One thing I immediately found out about the Bravo community is that they are immensely patient and helpful with any questions or ideas. I lurk on their IRC channel (#Bravo on Freenode), and have been very impressed so far. For these and other reasons, I can strongly recommend this project for Pythonistas looking for a way to apply their talents to one of their hobbies (Minecraft!).

The Bravo issue tracker has all kinds of stuff waiting to be implemented or fixed up. A lot of these are not terribly difficult, and the maintainer has been great about handling pull requests and providing good feedback.

If you’re not sure where to start, or have questions, the IRC room (#Bravo on FreeNode) is great.

tl;dr version

Bravo is a custom Minecraft server written in Python. It is early in development, but is already suitable for creative stuff. The community is friendly, and you should consider perusing their issue tracker.

Source: https://github.com/MostAwesomeDude/bravo

Docs: http://www.docs.bravoserver.org/index.html

IRC: #Bravo on FreeNode

S3 access log parsing/storage with Tamarin

We have been helping one of our clients move their massive collection of audio and video media to S3 over the last few weeks. After most of the files were in place, we saw that the usage report for one of the buckets was showing much higher usage than expected. We ran some CSV usage report dumps to try to get a better idea of what was going on, but found ourselves wanting more details. For example:

  • Who are the biggest consumers of our media? (IP Addresses)
  • What are the most frequently downloaded files?
  • Are there any patterns suggesting that we are having our content scraped by bots or malicious users?
  • How do the top N users compare to the average user in resource consumption?

Enter: Bucket Logging

One of S3’s many useful features is Server Access Logging. The basic idea is that you go to the bucket you’d like to log, enable bucket logging, and tell S3 where to dump the logs. You then end up with a bunch of log keys in a format that resembles something you’d get from Apache or Nginx. We ran some quick and dirty scripts against a few days’ worth of data, but quickly found ourselves wanting to be able to form more specific queries on the fly without having to maintain a bunch of utility scripts. We also needed to prepare for the scenario where we need to automatically block users who consume disproportionately large amounts of bandwidth.
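
If you prefer to set this up from Python rather than the AWS console, boto can flip the switch for you. A minimal sketch, assuming boto 2.x; the bucket names and target prefix below are made up:

    import boto

    conn = boto.connect_s3()  # credentials come from the environment or boto config
    media_bucket = conn.get_bucket('example-media-bucket')
    log_bucket = conn.get_bucket('example-media-logs')

    # Give S3's log delivery group permission to write into the log bucket,
    # then point the media bucket's access logs at it.
    log_bucket.set_as_logging_target()
    media_bucket.enable_logging(log_bucket, target_prefix='media/')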

Tamarin screeches its way into existence

The answer for us ended up being to write an S3 access log parser with pyparsing, dumping the results into a Django model. We did the necessary leg work to get the parser working, and tossed it up on GitHub as Tamarin. Complete documentation is linked at the end of this section.

Tamarin contains no real analytical tools itself; it is just a parser, two Django models, and a log puller (which retrieves S3 log keys and tosses them at the parser). Our analytical needs are going to be different from the next person’s, and we like to keep apps like this as focused as possible. We may well release apps in the future that leverage Tamarin for things like the automated blocking of bandwidth hogs we mentioned, or apps that plot out pretty graphs. However, these are best left up to other apps so Tamarin can stay light, simple, and easy to tweak as needed.
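
Once the log puller has stuffed the records into the database, answering the questions above is plain Django ORM work. A rough sketch of the sort of queries we mean; the model and field names here (S3LogRecord, remote_ip, bytes_sent, key_name) are hypothetical, so check Tamarin’s models for the real ones:

    from django.db.models import Count, Sum

    # Hypothetical import path and names; see Tamarin's models for the real ones.
    from tamarin.models import S3LogRecord

    # Top bandwidth consumers, by IP address.
    top_ips = (S3LogRecord.objects
               .values('remote_ip')
               .annotate(total_bytes=Sum('bytes_sent'))
               .order_by('-total_bytes')[:10])

    # Most frequently downloaded keys.
    hot_keys = (S3LogRecord.objects
                .values('key_name')
                .annotate(hits=Count('id'))
                .order_by('-hits')[:10])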

Going back to our customer with higher-than-expected bandwidth usage, we ended up finding that aside from a few bots from Nigeria and Canada, usage patterns were pretty normal. The media that was uploaded into that bucket was never tracked for bandwidth usage on the old setup, so the high numbers were actually legitimate. With this in mind, we were able to go back to our client and present concrete evidence that they simply had a lot more traffic than previously imagined.

Where to go from here

If anyone ends up using Tamarin, please do leave a comment for me with any interesting queries you’ve built. We can toss some of them up on the documentation site for other people to draw inspiration from.

Source: https://github.com/duointeractive/tamarin

Documentation: http://duointeractive.github.com/tamarin/

GitHub Project: https://github.com/duointeractive/tamarin

Amazon Simple Email Service: Not quite there

We’ve been working on transitioning a larger customer’s website from an Exchange-based email server to Amazon Simple Email Service, in an effort to improve reliability and eliminate one more bit of reliance on internal infrastructure. I’ll share our experiences here for others considering the same thing.

The Good

  • The setup process was simple (subscribing to the service, getting production access).
  • The API calls to SES are quick and reliable.
  • Sending mail is easy.

The Bad

  • It makes sense to have to verify ownership of an address to use as a “reply-to” in headers, but it’s a real pain when you send from a bunch of addresses under one domain. There’s no way to wildcard-verify the entire domain (see the sketch after this list). Not a deal-breaker, but kind of annoying.
  • There is no way to see delivery failures. Again, not a deal-breaker, but not ideal.
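
For reference, the verification dance is exposed directly through boto. A minimal sketch, assuming boto 2.x; the addresses are placeholders:

    import boto

    conn = boto.connect_ses()

    # Every sender/reply-to address has to be verified individually -- no wildcards.
    conn.verify_email_address('support@example.com')
    conn.verify_email_address('billing@example.com')

    # See which addresses have actually confirmed their verification emails.
    print(conn.list_verified_email_addresses())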

The Ugly

  • For higher-volume sites, migrating to SES is awful. You start with a quota of 1,000 emails per 24 hours, and have to increase the limit by consistently sending enough email to put you in the neighborhood of that quota. See the Growing your quota documentation for more specifics. More on this later.
  • Automated quota increasing hasn’t worked for us. We’ve been sending right near our quota for 6 days now without a bump. We’ve hit our limit a few times and lost a handful of emails as a result, since we were counting on the quota increasing as advertised.

Quotas: Low, inflexible, and buggy

Take one look at AWS’s migration strategies for SES and consider the annoyance these represent for transitioning a large, active website. The onus is on the developer or administrator to come up with some way to send only 1,000 emails a day until SES gets around to bumping quotas up. There is no way to go through a verification process (like PayPal or some other email services offer) and just pay for what you use. There is no real special consideration for those who need it. There are forms to request a higher starting quota, but everyone still begins at the 1,000-per-24-hour limit.

In our case, we worked around the migration annoyances. We send about 1,500-3,000 emails a day, so we just manually switched back and forth between our old SMTP/Exchange setup and the new SES backend. We’ve been doing this for 6 days now, whereas our quota should have been bumped after day 3. Low quotas are one thing; not raising the limits after the documented period is even worse.
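
Keeping tabs on where we stood against the quota is easy enough with boto. A minimal sketch, assuming boto 2.x; the exact shape of the response dict is from memory and may differ slightly between versions:

    import boto

    conn = boto.connect_ses()
    quota = conn.get_send_quota()
    result = quota['GetSendQuotaResponse']['GetSendQuotaResult']

    print("Max per 24 hours: %s" % result['Max24HourSend'])
    print("Sent last 24 hours: %s" % result['SentLast24Hours'])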

The Recommendation

For very small sites, or those that are starting out with SES, it can be a great fit. It’s cheaper than comparable services, it’s simple to get set up with, and it’s fast and reliable. For sites that already have a higher email volume, I’d suggest avoiding SES until it matures. It’s still not documented clearly in some places, and the quota bugginess and inflexibility mentioned earlier in this post are show-stoppers.

This one needs another six months in the cooker. I’m sure later iterations will improve the service, as has been the case with the other products offered in AWS.

New IMC and IRC extensions for Evennia MUD server

Evennia, the Twisted+Django MUD server, has just finished bringing in shiny new support for IRC and IMC (inter-MUD communication) as of revision 1456. This allows users to bind a local game channel to a remote IRC or IMC room. Evennia transparently sends/receives messages between the game server and the remote IRC/IMC server, while players are able to talk over said channel just like they would a normal one.

It is even possible to bridge an IRC room to an IMC channel, with the Evennia server acting as a hub for messages. The next step may be a Jabber extension (any takers?).

If you’re curious, feel free to drop by #evennia on FreeNode to pester the developers.

django-ses + celery = Sea Cucumber

Maintaining, monitoring, and keeping a mail server in good standing can be pretty time-consuming. Having to worry about things like PTR records and being blacklisted over a false positive stinks pretty badly. We also didn’t want to have to run and manage yet another machine. Fortunately, the recently released Amazon Simple Email Service takes care of this for us, with no fuss and at very cheap rates.

We (DUO Interactive) started using django-ses in production a few weeks ago, and things have hummed along without a hitch. We were able to drop django-ses into our deployment with maybe three lines altered. It’s just an email backend, so this is to be expected.
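
For context, the “three lines” are just settings. Roughly what they look like, per the django-ses README; the credential values are placeholders:

    # settings.py
    EMAIL_BACKEND = 'django_ses.SESBackend'
    AWS_ACCESS_KEY_ID = 'your-access-key-id'
    AWS_SECRET_ACCESS_KEY = 'your-secret-access-key'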

Our initial deployment was for a project running on Amazon EC2, so the latency between it and SES was tiny, and reliability has been great. However, we wanted to be able to make use of SES on our Django projects that were outside of Amazon’s network. Also, even projects internal to AWS should have delivery re-tries and non-blocking sending (more on that later).

Slow-downs and hiccups and errors, oh my!

The big problem we saw with using django-ses on a deployment external to Amazon Web Services was that any kind of momentary slow-down or API error (they happen, but very rarely) resulted in a lost email. The django-ses email backend uses boto’s new SES API, which is blocking, so we also saw email-sending views slow down when there were bumps in network performance. This was obviously just bad design on our part, as views should not block waiting for email to be handed off to an external service.

django-ses is meant to be as simple as possible. We wanted to take django-ses’s simplicity and add the following:

  • Non-blocking calls for email sending from views. The user shouldn’t see a visible slow-down.
  • Automatic re-try for API calls to SES that fail. Ensures messages get delivered.
  • The ability to send emails through SES quickly, reliably, and efficiently from deployments external to Amazon Web Services.

The solution: Sea Cucumber

We ended up taking Harry Marr’s excellent django-ses and adapting it to use the (also awesome) django-celery. Celery has all of the things we needed built in (auto retry, async hand-off of background tasks), and we already have it in use for a number of other purposes. The end result is the now open-sourced Sea Cucumber Django app. It was more appropriate to fork the project, rather than tack something on to django-ses, as what we wanted to do did not mesh well with what was already there.
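
This is not Sea Cucumber’s actual code, but here is a rough sketch of the pattern, assuming celery 2.x’s task decorator and boto’s SES API (the task name and retry settings are made up):

    import boto
    from celery.task import task

    @task(max_retries=5, default_retry_delay=60)
    def send_ses_email(source, subject, body, to_addresses):
        """Hand one message off to SES, retrying if the API call fails."""
        conn = boto.connect_ses()
        try:
            conn.send_email(source, subject, body, to_addresses)
        except Exception as exc:
            # Re-queue the task; celery retries it up to max_retries times.
            raise send_ses_email.retry(exc=exc)

The email backend then boils down to turning each outgoing message into a .delay() call on a task like this one, so the view hands the work off and returns immediately.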

An additional perk is that combining Sea Cucumber with django-celery’s handy admin views for monitoring tasks lets us have peace of mind that everything is working as it should.

Requirements

  • boto 2.0b4+
  • Django 1.2 and up, but we won’t turn down patches for restoring compatibility with earlier versions.
  • Python 2.5+
  • celery 2.x and django-celery 2.x

Using Sea Cucumber

  • You may install Sea Cucumber via pip: pip install seacucumber
  • You’ll also probably want to make sure you have the latest boto: pip install --upgrade boto
  • Register for SES.
  • Look at the Sea Cucumber README. (A minimal configuration sketch follows this list.)
  • Profit.
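
Once those steps are done, wiring it into a project looks something like the following. The backend path is taken from the README at the time of writing, and the addresses are placeholders:

    # settings.py
    EMAIL_BACKEND = 'seacucumber.backend.SESBackend'

    # Elsewhere in your project -- sending is unchanged; the SES hand-off
    # happens in a celery task behind the scenes.
    from django.core.mail import send_mail

    send_mail('Welcome!', 'Thanks for signing up.',
              'noreply@example.com', ['newuser@example.com'])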

Getting help

If you run into any issues, have questions, or would like to offer suggestions or ideas, you are encouraged to file them on our issue tracker. We also haunt the #duo room on FreeNode on weekdays.

Credit where it’s due

Harry Marr put a ton of work into boto’s SES support within a day of Amazon releasing the service. He then went on to write django-ses. We are extremely grateful for all of his hard work, and thank him for cranking out a good chunk of code that Sea Cucumber still uses.