Sphinx and Design Documents

As I’ve matured as an individual and a developer, I find that I like to spend longer and longer in the planning phase before writing too much code. I’ll tinker with the technical problems I’m most concerned about, make sure my idea is even feasible, then jump into design doc mode.

Depending on the complexity of the project, this can be as simple as a Google Doc (especially if collaborating with less technical people), or as “fancy” as Sphinx documentation. I find the latter to be strangely therapeutic, though (I don’t pretend to be normal).

What is Sphinx?

Sphinx is a wonderful tool used for writing documentation for software. In addition to being able to generate documentation from docstrings (autodoc) for API references, it also makes it really easy to write user/admin guides, and design docs. Your documentation can refer to individual functions, methods, classes, and modules (if they are autodoc’d), and you have the full power and ease of reStructuredText at your fingertips.
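
Getting autodoc going is mostly a matter of flipping a couple of switches in your project’s conf.py. Here’s a minimal sketch; the project name and the exact extension list are just the ones I tend to reach for, not required values:

# conf.py -- generated by sphinx-quickstart, then trimmed down.
# The specific values below are illustrative placeholders.
extensions = [
    'sphinx.ext.autodoc',    # pull API reference docs out of docstrings
    'sphinx.ext.viewcode',   # link rendered docs back to highlighted source
]

project = 'Dawn of the Titans'
master_doc = 'index'    # the document that holds the root toctree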

You may have seen Sphinx in action if you’ve browsed Django or Python’s documentation. While the most avid users are Python projects, other languages are now supported.

What does the end result look like?

In the case of my latest project, a MUD named Dawn of the Titans, the end result looks like this.

I tend to start out with scribbling in a Scratch Pad section, just brain dumping high-level stuff, even if it doesn’t make 100% sense.

As things solidify, they end up graduating to their final homes in the Administrator or Player Documentation sections (in this project’s case). So I am effectively writing much of the guides before I write the actual code. As the code falls into place, my developer documentation references the autodoc’d functions/methods/classes/modules, and so on.

reStructuredText is about as simple as Wiki markup, and is really similar to how developers have traditionally formatted comments and READMEs. It’s not much extra burden to write in reST as opposed to just doodling in a text editor.
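
As an example of how little ceremony is involved, here’s a hypothetical docstring written with reST’s info fields. Autodoc would pick this up and render it in the API reference, where the guides can cross-reference it (the function itself is made up for illustration):

def teleport(obj_id, destination_id):
    """
    Moves the object with ``obj_id`` to the room with ``destination_id``.

    :param str obj_id: The ID of the in-game object to move.
    :param str destination_id: The ID of the destination room.
    :returns: ``True`` if the move succeeded, ``False`` otherwise.

    .. note:: An illustrative example, not actual game code.
    """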

When does it make sense to use Sphinx?

  • You don’t mind spending the initial time to get Sphinx set up.
  • You either know, or want to know, reST.
  • The ability to gradually shift to “final” documentation without changing formats is appealing.
  • You value Sphinx’s autodoc and other extensions for API references, and cross-referencing such API documentation throughout your user/admin/dev guides is appealing.

When does it not make sense to use Sphinx?

  • You are in a huge hurry.
  • People without the time or knowledge to write reST or use version control need to be able to update the documentation.
  • The project is simple enough that a Google Doc won’t be too clumsy.

Why so complicated?

You do indeed need to learn some basic reStructuredText to go the route that I have, and there is some initial setup work that can be avoided with something like Google Docs. However, the great thing about using something like Sphinx is that my design docs gradually morph into very complete, thorough user/administrator/developer guides. Thanks to Sphinx’s autodoc extension, I also have API reference generation baked in for my user/admin/dev guides to reference. For a better example of a more mature project that was designed and eventually documented in Sphinx, see media-nommer. It’s a good preview of what the Dawn of the Titans documentation may eventually resemble.

There are a few moving pieces (Sphinx+reST+Revision Control), but each one is reasonably simple to work with. You can expect to be reasonably proficient within a day or two, if you go through First Steps with Sphinx. There are some quirks, and the error handling isn’t wonderful, but the end product is great.

Use Read the Docs

The other thing that makes this absolutely wonderful is Read the Docs. This lovely service compiles and hosts your final documentation for all to see. You can even hook in a GitHub (or other service) post-commit hook to automatically pull and re-compile your docs after each commit.

Here is my workflow for most tinkering:

  • Navigate to a file on GitHub, hit the “Edit this file” button.
  • Make my changes.
  • Enter commit message, hit “Commit Changes”.
  • Read the Docs gets the post-commit notification from GitHub, re-compiles the docs.
  • After a 15-30 second delay, my docs are updated on the web for all to see and comment on.

I can, of course, still edit all of the docs locally in my favorite editor, and compile everything locally to review before committing (for more major edits).

Python and AWS Cookbook (Ebook) 50% off!

Mitch Garnaat’s excellent Python and AWS Cookbook is now 50% off ($6.49) in Ebook format (ePub, Mobi, PDF). The book features some great recipes, straight from the maintainer of boto.

While the book isn’t unfriendly to those looking at boto for the first time, it really shines for those who have done some tinkering with boto in the past. Mitch gets right to the point, providing ample explanations for each recipe. The EC2 sections were particularly useful for me. We use boto heavily at DUO, but I managed to learn some great new tricks with instance management by reading through the examples.

For about $7, this is a great read, and a great way to show appreciation for an excellent project.

Addendum: There is no monetary motivation for my cheerleading; I just thought I’d share this with others (the special is easy to miss).

media-nommer scampers closer to initial release

Our Python-based, open source (BSD) distributed encoding system, media-nommer, is inching closer to what we’ll call an initial 1.0 release. We’d love to have other developers take a look, try our documentation, and help us chase down flaws before we stick a 1.0 on this thing and put it out there for others to rip to pieces.

Here are the basics:

  • An orchestrator/management daemon runs on some arbitrary machine of your choice (your own, in EC2, Rackspace, wherever). It provides a simple JSON API that your applications can send encoding jobs to (see the sketch after this list). It also handles spinning up new instances and communicating job state to your applications.
  • Encoding nodes are spawned on Amazon EC2, with lots of configurable options to determine resource usage. The system can scale as far as EC2 will let you keep spinning up instances.
  • Encoding is handled through “Nommers”. Each Nommer wraps a different kind of encoding software. The only Nommer currently is FFmpegNommer, which wraps ffmpeg. We’d love to see some other audio and video-related Nommers added (mencoder, anyone?).
  • The EC2 encoder nodes are never in direct contact with the master management daemon. The management daemon can be stopped and restarted at a later date (or on another machine) without any interruptions or data loss. No firewall holes needed.
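
To give a feel for the flow, here’s a rough sketch of what submitting a job from an application might look like. The endpoint path and payload fields below are hypothetical placeholders, not media-nommer’s actual API; check the project’s docs for the real thing:

import json
import requests

# Hypothetical job payload -- the field names are illustrative only.
job = {
    'source': 's3://my-bucket/raw/intro.avi',
    'destination': 's3://my-bucket/encoded/intro.mp4',
    'nommer': 'FFmpegNommer',
    'options': {'vcodec': 'libx264', 'abitrate': '128k'},
}

# POST the job to the management daemon's (hypothetical) JSON API.
response = requests.post('http://mgmt.example.com:8001/jobs/',
                         data=json.dumps(job))
print(response.json())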

We are dogfooding media-nommer like crazy on two very large projects at DUO, and it’s working really well for us. However, we’d love to have some other eyes and hands on the project, so please do consider checking it out. If you find yourself using Zencoder or Encoding.com with any kind of regularity, media-nommer just may save you a lot of money.

CouchDB as a MUD server data store

I’ve been using CouchDB as the data store for my in-development MUD, Dawn of the Titans. So far it’s been very enjoyable to work with through the CouchDB Python module. I’ll take a moment to share my experiences, for those who might be interested.

To provide some background, my MUD server is built specifically for the game I’m working on, but I’ve been developing it in the open on GitHub. The whole thing is built on Twisted, and is loosely styled after TinyMUX 2 (with Python in place of C++ and SoftCode).

Why CouchDB?

This is probably the first question on most people’s minds. There was nothing overly scientific about the choice of CouchDB. I wasn’t really interested in querying whatever DB I used, since I wanted to keep almost everything memory-resident. I didn’t need much scalability, and I didn’t need a relational database. The only thing I really needed a DB for was persistence.

In the end, I thought CouchDB’s use of JSON documents was pretty neat, and figured they’d allow for a really simple way to store objects. It was also a chance to learn something new (which was the biggest factor of all).

The Perks

So far, CouchDB has been a joy to work with. I realize that almost all of these perks are possible with any number of relational/non-relational DBs, but I still give an approving nod to CouchDB in these cases.

The biggest benefit to using CouchDB is that my in-game object loading code looks something like this (pseudocode):

# Retrieve the object's document from CouchDB. This is a JSON
# dict, whose keys are the object's various attributes (name,
# description, location, etc).
object_doc = self._db[doc_id]
# A simplified example of how a Room is loaded. The document's keys
# are expanded as kwargs to the Room's __init__. Presto, loaded object.
loaded_object = RoomObject(**object_doc)

This may not seem like anything earth-shattering (it really isn’t), but it makes things very simple to manage and expand on. RoomObject’s __init__ method knows exactly what to do with a CouchDB object document.
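
For the curious, here’s a hypothetical sketch of what such an __init__ might look like (the attribute names are placeholders, not my actual implementation):

class RoomObject(object):
    def __init__(self, name='Unnamed Room', description='', zone=None,
                 **other_attribs):
        self.name = name
        self.description = description
        self.zone = zone
        # Any remaining document keys (including CouchDB's _id and
        # _rev) simply become attributes on the loaded object.
        for key, value in other_attribs.items():
            setattr(self, key, value)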

Since CouchDB just deals with JSON documents, I can add/remove keys (which become object attributes in-game) without hassling with schema modifications or explicitly specifying fields. I’ve been surprised at how little time I spend mucking with the DB. I’m free to just focus on dealing in the realm of my MUD server, without worrying too much about my data store.
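
For example, persisting a brand new attribute is as simple as setting a key and saving the document. A minimal sketch with the CouchDB Python module (the database and document names are made up):

import couchdb

server = couchdb.Server('http://localhost:5984/')
db = server['dott_objects']    # hypothetical database name

doc = db['room_1234']          # hypothetical document ID
doc['ambient_sound'] = 'Wind howls through the ruins.'
db.save(doc)                   # no schema migration required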

Another great thing about CouchDB is Futon, the web-based management console for CouchDB. It has made editing objects a breeze. I do this constantly when tinkering with objects. My set of in-game building commands is currently very limited, so this helps me keep getting things done while those materialize.

The last cool thing I’ll mention is that saving objects to the DB can easily be made asynchronous, and you can bulk-submit objects (instead of saving one at a time). While async DB operations are possible with most DBs, CouchDB’s calls/queries are really easy to work into a Twisted project (without resorting to threads/processes), since the calls are all just HTTP requests (which can be deferred via Twisted’s HTTP client). Take a look at Paisley for an example of how simple it is to perform non-blocking queries.
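
If memory serves, a non-blocking fetch with Paisley looks roughly like the following (the database/document names are placeholders, and the import path is from memory; consult Paisley’s own docs for the authoritative API):

from paisley import CouchDB
from twisted.internet import reactor

couch = CouchDB('localhost', 5984)

def on_doc(doc):
    # The Deferred fires with the parsed JSON document.
    print('Loaded: %r' % doc)
    reactor.stop()

# openDoc() returns a Deferred instead of blocking the event loop.
couch.openDoc('dott_objects', 'room_1234').addCallback(on_doc)
reactor.run()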

The downsides

There is really only one downside (for what I’m doing) to CouchDB, and many non-relational stores in general: I have to manually ensure that all ‘fake relations’ stay valid, and fail gracefully if they end up invalid. For example, let’s say I have an Exit that leads to ‘Test Room’. If said room is deleted, the exit should be un-linked or deleted. Leaving the exit in place means that someone attempting to travel through it (to the now non-existent room) would see an error message, since the ‘destination’ attribute points to an invalid object ID.
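
Failing gracefully is cheap, at least. A minimal sketch (the exit/player objects are hypothetical; db.get() is the CouchDB Python module’s “return None instead of raising” lookup):

# exit_doc['destination'] holds the ID of the destination room's doc.
dest_doc = db.get(exit_doc['destination'])
if dest_doc is None:
    # The room was deleted out from under the exit -- fail gracefully.
    player.message('That exit leads nowhere. A staffer has been notified.')
else:
    player.move_to(RoomObject(**dest_doc))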

Most MUD servers have to do this kind of cleanup on their own anyway; I’m just somewhat spoiled from my time spent with Postgres/MySQL/SQLite on Evennia, which cleaned up after me (CASCADE!). So this is far from a show-stopper, just something I’m not used to.

The only other thing that could possibly be construed as a downside is that querying CouchDB feels clumsy to me. This is almost certainly due to knowledge gaps on my end. I didn’t really need querying in my case anyway, so no harm here.

In summation

Used in the context of a MUD, CouchDB has been awesome to work with. I’ve enjoyed it much more than my last MUD server project, which used relational DBs. In the end, the choice of data store should be made based on what lets you spend more time on your game, rather than on serialization/persistence/object saving. For any sanely designed MUD, you’re not likely to hit performance issues, and it all comes down to a matter of preference.

PyPy’s call for donations (NumPy)

Disclaimer: I am not at all involved with PyPy development, planning, or management. You are about to see cheerleading, but it’s not because this is my project.

PyPy has recently posted “Call for donations - PyPy to support Numpy!” There has been some initial groundwork laid by Alex Gaynor and others, but it looks like they’re ready to go full speed ahead with the effort now. This is great news for PyPy and Python.

NumPy is one of the de facto scientific computing packages in the Python ecosystem. All kinds of other modules depend on NumPy, and it’s used heavily in research, engineering, and other general sciency stuff.

PyPy is a speedy alternative implementation of Python. In many cases, PyPy is able to handily wallop traditional CPython. As of this article’s writing, PyPy’s Speed Center says “The geometric average of all benchmarks is 0.21, or 4.9 times faster than CPython”.

So… why do I care?

Python, being an interpreted language, often falls behind closer-to-the-metal languages and compilers. The scientific community, along with a good number of other modules, relies heavily on NumPy. The problem is, CPython isn’t nearly as fast as some of the alternatives. However, what Python loses in speed, it makes up for in ease of use and readability.

With a PyPy-compatible NumPy, we can greatly reduce our speed woes and open up PyPy compatibility with a very large set of existing modules. The end result: PyPy is one step closer to being ready for everyday use, and it gains a “killer package”.

Lots of drops in the bucket = ??

There’s only so much we, as individuals, can do financially, but do consider making a small donation to the cause. The Python community is large, and contributions will quickly accumulate into something useful. For those who are firmly entrenched in CPython, consider what an aggressive, experimental Python implementation does for the greater Python ecosystem (a subject for another post or discussion).