Oft-misused adjectives: Bloated and Lightweight

In software development, we like to re-purpose everyday adjectives. We’ll call a project “unstable” or “mature”. Maybe we give a nod where it’s due and say a piece of software is “elegant”. For the most part, this works pretty well. However, I’m going to take a moment to rant about two adjectives I see constantly misused:

  • Bloated
  • Lightweight

Every time I see either of these used, I ignore what follows. There are cases where these two adjectives make sense, but I’d argue those cases are few and far between. Some of my hangups may be completely irrational, so take this mostly as a rant, because who doesn’t like a good rant?

Let’s start with “bloated”.

Bloated

The Dictionary.com definition:

bloat·ed [bloh-tid] adjective

  1. swollen; puffed up; overlarge.
  2. excessively vain; conceited.
  3. excessively fat; obese.

Applied in the context of software, this is often used to describe a project that is larger than it needs to be. It is almost always a derogatory term, used to describe something overly complex. However, in most cases it is too vague and non-specific to be of any use. How “large” should the project be? What defines “large”? If it were stripped down, would it still be useful to the same audience? Is it really a problem that you aren’t using all of the bells and whistles in a module?

When you find yourself considering using “bloated”, ask yourself to more specifically define your grievances. If the software/library/module is a sprawling, badly organized mess, you probably want something like “sloppy” or “tangled”. If you are having a hard time wrapping your head around the basic usage of said software, maybe you’re looking for “complex” or “ergonomically challenged”. These point to more specific issues.

“Bloated” often appears when a developer is having a hard time understanding a piece of software. Perhaps the real problem is bad documentation, but the dev may instead just call said software “bloated” and move on. I also see it used to justify re-inventing the wheel or NIH Syndrome due to a lack of understanding of the original library.

Now that I’ve ranted for a bit, here’s a summary:

  • Find a better way to describe your objection. Be more specific!
  • Understand that a general-purpose library/module/software product has to cover more than just your usage case. If your usage case is specific and narrow, you’re more likely to toss around the “B word”.
  • It’s OK if you don’t use all of a software product’s features. Disk space is cheap.
  • If a software product is hard to understand, it may not be “bloated”. It may have ergonomic or documentation issues. It could just be messy code. Mention these issues instead of referring to the more vague “bloat”.

There are legitimate usage cases for this term, but there is almost always a better, more specific way to describe why you dislike a piece of software.

Lightweight

A library being described as “bloated” is almost always followed up with a recommendation for a more “lightweight” alternative. The Dictionary.com definition:

light·weight [lahyt-weyt] adjective

  1. light in weight.
  2. being lighter in weight, texture, etc., than another item or object of identical use, quality, or function: a lightweight topcoat; a lightweight alloy for ship construction.

This is probably the more obnoxious of the two terms for me. The intent is to describe a module as being smaller and simpler than an alternative. The amusing thing is that the supposedly more “lightweight” alternative is often of similar size and complexity to the “bloated” one. “Lightweight” is used to paint a module in a positive light, the opposite of “bloated” with its negative connotation.

As is the case with “bloated”, “lightweight” is a cop-out used to justify why something is better. When you find yourself considering the use of “lightweight”:

  • Re-visit and re-define your objection to the first library you were considering. Is it sloppy? Does the documentation suck? Does the API have ergonomic issues? Does it use too much of your system’s resources? If so, pick a more descriptive adjective.
  • Once you have a concise reason for not liking your first candidate library, see if your alternative does things better. If so, specifically mention what it does better. Does it have better documentation? Alternative is “better documented”. Is the API easier to use? Alternative is subjectively “easier to use”. Is the code better organized and of higher quality? Alternative is “more elegantly designed”.

“Lightweight” doesn’t tell us much. Amusingly, it is often used by developers who set out to re-invent a more “bloated” piece of software by stripping a ton of features out, pooping out some documentation, then calling it a day. Said alternative often lacks the community to see much maintenance, and it often lacks widely used features that its predecessor had.

There are legitimate usage cases for this term, but there is almost always a better, more specific way to describe why an alternative to something is better.

Summary

In closing, it’s important to remember that open source software is created and developed by other real humans. They tend to take some amount of pride in their work. It’s OK to criticize software, it often motivates positive change. However, it’s important to be specific and fair with criticism. Don’t cop out when describing what makes you dislike a certain library/module/software product. Get specific, be fair, and maybe everyone will benefit.

When (and when not) to use EC2

There is a lot of advice on the internet regarding the suitability of EC2. One can easily find all kinds of benchmarks and price comparisons, long rants on cloud vs bare metal, and any number of other reasons for or against using EC2. I feel that a lot of these articles miss the mark entirely, and figured I’d toss up my own attempt at guidelines for choosing (or avoiding) EC2.

This article will attempt to provide a simple list of some (but not all) cases where EC2 is and isn’t a good fit. I will undoubtedly miss some, and there are exceptions to every case, but take these as a set of general guidelines. Also, for the sake of brevity, I assume that you have a basic grasp of what EC2 is. If not, check out the EC2 page before continuing.

EC2’s killer feature

Before we get to the fun stuff, I want to take a moment to highlight EC2’s killer feature: Elasticity. Here are some important bits extracted from the AWS EC2 page:

Amazon EC2’s simple web service interface allows you to obtain and
configure capacity with minimal friction.
[...]
Amazon EC2 reduces the time required to obtain and boot new server instances
to minutes, allowing you to quickly scale capacity, both up and down, as
your computing requirements change. Amazon EC2 changes the economics of
cloud computing servers by allowing you to pay only for capacity that
you actually use.

EC2’s primary goal is to make it easy to provision and destroy VMs as needed. They’ve got an advanced set of HTTP APIs for managing your fleet, a very good set of imaging/provisioning utilities, a wide variety of instance sizes to choose from, and even excellent auto-scaling capabilities that integrate with ELB. The hourly billing allows you to handle bursts of traffic without a long-term commitment, and you can also fire up experimental/tinker instances without much expense.
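To make the elasticity point concrete, here is a minimal sketch of the kind of scale-up/scale-down decision that hourly billing makes practical. The capacity and minimum-fleet numbers are made up for illustration; in practice you’d feed real metrics into logic like this and launch or terminate instances through the EC2 API or an Auto Scaling group.

```python
import math

def desired_instances(requests_per_sec, capacity_per_instance=250, minimum=2):
    """How many instances we'd want running for the current load.

    capacity_per_instance and minimum are illustrative placeholders,
    not real benchmarks.
    """
    needed = math.ceil(requests_per_sec / capacity_per_instance)
    return max(minimum, needed)

# Traffic burst: scale out for a few hours...
peak = desired_instances(900)   # 900 req/s -> 4 instances

# ...then back down once things quiet, paying only for the hours used.
quiet = desired_instances(100)  # 100 req/s -> back to the 2-instance floor
```

With a fixed fleet you’d have to provision for the peak around the clock; with elasticity you pay for the extra instances only while the burst lasts.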

There are other services that also offer some of these things individually, but EC2 is at the front of the pack when it comes to provisioning instances and scaling up/down.

It’s not all sunshine and roses

“That sounds great! Why would I ever want to use anything else?”

  • The performance per dollar is atrocious. Be prepared to shell out for a larger instance to get consistent performance. This is more of a barrier for projects with smaller budgets. In the case of a very well-funded project, infrastructure spend is (probably) much less of a concern.
  • Amazon Web Services in general features a pretty steep learning curve before one can make informed infrastructure decisions. You’ll need to learn about security groups, EBS, EC2’s SSH key management system, how snapshots/AMIs work, performance characteristics, common sources of failures, etc.
  • The lower end instances (below m1.large) are embarrassingly underpowered and erratic. For services or applications with low resource requirements, this may or may not be an issue. Make sure you benchmark and test before committing to a reservation!
  • EBS has had serious reliability issues and is very inconsistent performance-wise without Provisioned IOPS or RAID. The majority of the larger EC2 outages have been due to EBS issues.
The customer service is abysmal. Unless you pay for a Support Plan, your only option is to post to a public forum and hope that someone from AWS replies. I understand that AWS operates at a huge scale, but I expect better from a company full of brilliant people like Amazon. Telling paying customers (without a support plan) to post in the forums for help is inexcusable. A support plan should be for above-and-beyond service.

“That was discouraging. Let’s get back to the good stuff.”

EC2 is potentially a good fit for your application when…

You want/need to be able to scale up and down to handle traffic. This can be done automatically via ELB (after some setup work), via the EC2 HTTP API, or through the more traditional web-based management console.
  • Your application has to have consistently fast, low latency access to at least one other AWS service.
Your application pumps enough traffic into/out of other AWS services that you don’t want to pay the higher external traffic tolls.
You plan on using a number of managed services that typically pair with EC2: for example, ElastiCache, RDS, and CloudSearch. These can be a life-saver if your team doesn’t have the ops/administrative skill to manage the equivalent EC2 instances yourselves, though they’re not always a good value for teams with some ops chops in-house.
  • You have enough budgeted to pay for the instance sizes that are appropriate for your performance needs. For many/most, this probably involves buying at least some instance reservations.
  • EC2 would allow you to streamline your operations processes enough to allow you to run with a smaller ops team. Spending more for infrastructure convenience is a lot less expensive than hiring another employee.

EC2 is probably not a good fit for your application when…

  • You don’t need to scale up/down much. This is one of the biggest strengths of EC2. If you aren’t using it, you can go with something much cheaper/faster/more reliable/more consistent.
  • You can’t afford to pay for redundancy, but your application has high stability/availability requirements. EC2 instances fail, EBS volumes fail or slow down randomly, entire availability zones fail. The systems supporting EC2 are incredibly complex and operate at a massive scale. Stuff happens. If you can’t afford some measure of cross-availability-zone redundancy (at minimum) and your application has stability requirements, EC2 is probably not for you.
The need to purchase 1-3 year instance reservations to get decent hourly rates bothers you. The On Demand rates can be incredibly expensive if you run instances with them for extended periods of time. A reservation requires the upfront payment of a large chunk of the next 1-3 years’ costs. This can be a problem for those without enough liquid capital. It also means that adding additional capacity can be a larger business decision.
  • Your application has components that require very consistent performance, but you can’t afford one of the very large instance sizes that have few to no neighbors on the host machine. Very high activity DB servers often fall into this category.
You want to eventually incorporate bare metal servers into your fleet. EC2 currently only offers virtualized instances, though you can rent out an entire host machine for yourself. Alternatively, you can use Direct Connect to bridge to certain data centers, but you’ll want to make sure this fits your architecture and budget.
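The reservation trade-off above is easy to sanity-check with back-of-the-envelope math. The dollar figures below are placeholders, not actual AWS rates; plug in real pricing for the instance type you care about.

```python
# Hypothetical rates -- NOT real AWS pricing, for illustration only.
HOURS_PER_YEAR = 24 * 365        # 8760

on_demand_rate = 0.50            # $/hour, On Demand
reserved_rate = 0.20             # $/hour after reserving
reservation_upfront = 1000.00    # one-time upfront payment

# Cost of running one instance around the clock for a year.
on_demand_annual = on_demand_rate * HOURS_PER_YEAR
reserved_annual = reservation_upfront + reserved_rate * HOURS_PER_YEAR

# Hours of uptime before the upfront payment pays for itself.
breakeven_hours = reservation_upfront / (on_demand_rate - reserved_rate)
```

With these made-up numbers, the reservation wins handily for an always-on instance, but you’d need thousands of hours of uptime to break even; an instance that only runs occasionally is better off On Demand.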

Advice on deciding for or against EC2

Read through the cases outlined above again and keep a tally of how many apply to your project. If a good number of the cases under “good fit” apply, EC2 is probably worth further consideration. If more than one or two of the scenarios under “not a good fit” are true, you should tread more carefully.

Exceptions, omissions, etc.

“But you forgot X case, or your advice isn’t appropriate for my usage case!”

This article attempts to be a decent one-size-fits-most guide, but there are exceptions and other scenarios that I didn’t cover. If you know some of these, you probably don’t need this guide. If I missed something super obvious, leave a comment and I’ll update the article.

I don’t compare prices against other providers in this article, but I’d like to point out that part of EC2’s price goes toward convenience and flexibility. You are paying extra for the ability to interact with things like ELB, CloudWatch, and other services. Perhaps I’ll follow up with another article expanding on this if there’s interest…

In the end, it will come down to you identifying your requirements and finding the best fit.

On a related note, if you need help or advice with infrastructure planning, get in touch with me via the Contact link on the top navbar.

EMDR Map Sheds its Snake Skin

As a fun exercise, I set out to re-write the WebSocket server behind EMDR Map in GoLang.

The Python version

The initial version of the WebSocket server powering the map was developed with Python, gevent-websockets, and ZeroMQ. While the Python version was pretty simple, it was heavy on memory and didn’t free resources very quickly after disconnections. My biggest gripe was that I wasn’t entirely happy with how the Greenlets interacted with one another. Time to needlessly re-invent the wheel for fun and profit!

The GoLang version

After cobbling something together, I found that some resources were saved, but nothing earth-shattering. More importantly, I feel that the channels and goroutines pattern makes a lot more sense for this particular project than my coroutines and internal ZeroMQ sockets.
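For readers unfamiliar with the pattern, the core of it is a fan-out: one feed publishes market messages, and each connected WebSocket client reads from its own channel. Go expresses this with channels and goroutines; here is a rough Python analogue using `queue.Queue` and threads. All names are hypothetical; this is not the actual EMDR Map code.

```python
import queue
import threading

_SENTINEL = object()  # signals "feed closed" to every subscriber

class Broadcaster:
    """One publisher fanning messages out to many per-client queues."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self):
        """Give a new client its own queue (its 'channel')."""
        q = queue.Queue()
        self._subscribers.append(q)
        return q

    def publish(self, message):
        """Fan a message out to every connected client."""
        for q in self._subscribers:
            q.put(message)

    def close(self):
        for q in self._subscribers:
            q.put(_SENTINEL)

def client_loop(q, received):
    """Drain one client's queue until the feed closes."""
    while True:
        msg = q.get()
        if msg is _SENTINEL:
            break
        received.append(msg)

# Two "clients", each consuming from its own queue in its own thread.
broadcaster = Broadcaster()
inboxes = [broadcaster.subscribe() for _ in range(2)]
results = [[] for _ in inboxes]
threads = [threading.Thread(target=client_loop, args=(q, out))
           for q, out in zip(inboxes, results)]
for t in threads:
    t.start()

for n in range(3):
    broadcaster.publish({"order_id": n})  # pretend market data
broadcaster.close()
for t in threads:
    t.join()
```

In Go, each `client_loop` would be a goroutine ranging over a channel, which removes most of the explicit bookkeeping above; that is roughly why the pattern felt like a better fit than coroutines plus internal ZeroMQ sockets.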

I apologize for the lack of build instructions or documentation of any sort, but such is the norm for my experiments like this!

Linode NextGen

As of today, Linode announced the completion of their upgrade effort, dubbed “Linode NextGen”. Upgrading an international fleet of servers is nothing to sneeze at, but they pulled it off with flying colors.

In the last few months, we saw Linode:

  • Double the amount of RAM per instance
  • Bump all instances up to eight virtual cores (from four)
  • Invest heavily in improving their network
  • Raise the instance outbound cap by 5x
  • Increase outbound monthly transfer by 10x

They have allowed much higher usage of their resources without compromising on performance. This wasn’t a matter of just upping the quotas and calling it a day.

Why does it matter?

One need only take a look at lowendbox or ServerBear to see that there are plenty of affordable VPS providers. Linode is still nowhere close to being the cheapest, but that’s not really Linode’s game. They’re going to give you something a little faster, a little more roomy, and they’re going to keep you happy with their support. While it’s entirely possible you’ll find someone with similar specs, you’ll be hard pressed to find a competitor with the strength of this offering from top to bottom (price, hardware, network, service/support).

Linode has traditionally been a little more expensive, very developer-centric, and has (in my experience) had a pretty good customer service story. This latest round of upgrades doesn’t push Linode down into the “budget” category (nor should it), but it does make a good chunk of their competitors in the same category/price range look inadequate. For example, I’m not sure how I could justify using Rackspace for my own purposes after these adjustments.

We all win

Regardless of whether you use Linode or even like them, let’s be clear about one thing: When upgrades and bigger jumps like this happen, we all win. Other providers are going to look at this and will have to decide whether their current offerings need a shot in the arm. Linode is no industry juggernaut, but they are well known enough for this to cause a few ripples.

Let’s sit back and see who makes the next big jump.

Amazon Route 53 DNS failover

While it is no longer shiny and new, I just recently got a chance to sit down and play with Amazon Route 53’s DNS failover feature. So far, I have found it to be simple and very useful for cases where DNS fail-over is acceptable.

My usage case

I run EVE Market Data Relay (EMDR), which is a distributed EVE Online market data distribution system. All pieces of the infrastructure have at least one redundant copy, and the only single point of failure is the DNS service itself. We can afford to lose a little bit of data during fail-over, but a complete outage is something we can’t have.

Sitting at the top of the system are two HTTP gateways on different machines at different ISPs. These are set up as a weighted record set, with each gateway weighing in at 50/50 (requests are divided evenly).

We introduce the Route 53 magic by adding a health check and associating it with each of the two resource record sets. The health check involves Route 53 servers around the world periodically calling a pre-determined URL on the two HTTP gateways in search of a non-error HTTP status code. If an entry fails more than three times (checks happen roughly every 30 seconds), that entry is removed from the weighted set.

By the time that Route 53 picks up on the failure, yanks the entry from the weighted set, and most fast ISP DNS servers notice the change, about two minutes have elapsed.
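For the curious, a setup like the one described above boils down to a `ChangeBatch` payload for Route 53’s ChangeResourceRecordSets API: two weighted A records, each tied to a health check. Below is a sketch of that payload as a plain dict; the domain, IPs, and health-check IDs are placeholders, and in practice you’d send it through an SDK call such as boto’s `change_resource_record_sets`.

```python
import json

def weighted_record(name, ip, set_identifier, weight, health_check_id, ttl=60):
    """Build one weighted, health-checked A record change entry."""
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": name,
            "Type": "A",
            "SetIdentifier": set_identifier,   # distinguishes the two records
            "Weight": weight,                  # 50/50 split across gateways
            "TTL": ttl,                        # keep low so failover propagates
            "ResourceRecords": [{"Value": ip}],
            "HealthCheckId": health_check_id,  # unhealthy -> pulled from the set
        },
    }

# Placeholder names/IDs -- substitute your own zone, hosts, and health checks.
change_batch = {
    "Comment": "Weighted gateways with health-check failover",
    "Changes": [
        weighted_record("gateway.example.com.", "192.0.2.10",
                        "gateway-1", 50, "hc-1111-placeholder"),
        weighted_record("gateway.example.com.", "192.0.2.20",
                        "gateway-2", 50, "hc-2222-placeholder"),
    ],
}

print(json.dumps(change_batch, indent=2))
```

When a health check goes red, Route 53 stops answering with that record, and the surviving gateway absorbs all of the traffic until the check passes again.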

Why this is a good fit for EMDR

With EVE Market Data Relay, it’s not the end of the world if 50% of user-submitted data gets lost over the minute and a half it takes for Route 53 to remove the unhealthy gateway. It’s highly likely that another user will re-submit the very same data that was lost. Even if we never see the data, the loss of a few data points here and there doesn’t hurt us much in our case.

With that said, DNS failover in general can be sub-optimal in a few basic cases:

  • You don’t want to leave failover up to the many crappy ISP DNS servers around the net. Not all will pick up the change in a timely manner.
  • You can’t afford to lose some requests here and there. DNS failover isn’t seamless, so your application would need to be smart enough on both ends if data loss is unacceptable.

For more simple cases like mine, it’s wonderful.

Price

In my case, Route 53 is health checking two servers that are external to AWS, which means I spend a whopping $1.50/month on Route 53’s DNS failover.

Assorted useful bits of documentation

More details on how the health checks work can be found in the Route 53 documentation.

Refer to the Amazon Route 53 DNS failover documentation for the full run-down.