If you are looking for how to create a render farm using Amazon’s AWS/EC2 platform, you’ve come to the right place. I’ve seen several posts and questions on CG user forums about using Amazon EC2 instances as render farm nodes. I think the attraction is due to Amazon’s marketing of AWS, which states you can “Instantly deploy your application. Scale resources up or down based on demand.” They like to portray their service as having no limits to scalability. Last I checked, we’re still living in a very finite world. Allow me to explain why using AWS for rendering is an extremely bad idea. Seriously, spend a few minutes reading this to the end. It will save you loads of grief and piles of money.
In my arguments, I’m going to completely ignore the licensing, deployment, job distribution, frame delivery, security, and farm management costs and problems. Keep in mind, though, that some of those *by themselves* are more than enough to dissuade a user from implementing EC2 instances as render farm nodes. Let’s just assume we live in an alternate universe where those issues don’t exist. I’d like to analyze only the price and performance of EC2 instances compared to physical hardware.
First, let’s look at performance, since that’s the cornerstone of my argument. If, like us, you have extensive experience with various virtualization technologies (VMware, Virtuozzo, Xen, Hyper-V…), you’ll know that there is a performance penalty you pay for the ability to virtualize. That penalty varies by type of virtualization and may shrink as technologies improve, but there will always be some penalty due to virtualization overhead. For this topic, we’ll limit the discussion to Xen, the virtualization technology Amazon uses to run its EC2 instances. The Xen website doesn’t state what the virtualization overhead is, so let’s assume 30%, since our EC2 instances are going to run Windows with PV drivers. A little online research will show that’s in the ballpark, and it may even be optimistic.
That leads us to Amazon’s instance types. Hmmm…too bad we have to pay for tons and tons of storage we’ll never use. I guess we’ll pick the “High-CPU Extra Large Instance,” since that seems to give us the most compute power for the least money. So, what would that node really do for us? Look further in Amazon’s documentation and you’ll find that an instance of that type is given “20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each)”. What the heck does that mean? We have to dig deeper still to find Amazon’s definition: “One EC2 Compute Unit provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.”
It’s pretty darn cool that they’re comparing their performance to seven-year-old hardware. I wish I could do that and get away with it. So, let’s say their “High-CPU Extra Large Instance” has the equivalent processing power of a fictitious CPU containing 8 cores, running at 2.75 GHz (1.1 × 2.5), with an instruction set from 2007. I don’t know about you, but I’m already yawning. We still have to lop off the virtualization overhead, though, so let’s knock that down to 8 cores at roughly 1.9 GHz running arcane instruction sets. Oh, and we have to assume that performance never, ever drops below theoretical…which, if you’ve ever actually used EC2, you’re already laughing.
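The back-of-the-envelope math above can be sketched in a few lines of Python. Note the 30% overhead figure and the 1.1 GHz midpoint per EC2 Compute Unit are my assumptions from earlier, not Amazon-published numbers:

```python
# Estimate the effective clock speed of a "High-CPU Extra Large" EC2
# instance. The overhead and GHz-per-ECU values are assumptions.
CORES = 8                  # virtual cores in the instance (Amazon's spec)
ECU_PER_CORE = 2.5         # EC2 Compute Units per core (Amazon's spec)
GHZ_PER_ECU = 1.1          # midpoint of Amazon's 1.0-1.2 GHz definition
VIRT_OVERHEAD = 0.30       # assumed Xen/Windows-PV virtualization penalty

raw_ghz = ECU_PER_CORE * GHZ_PER_ECU             # 2.75 GHz per core
effective_ghz = raw_ghz * (1 - VIRT_OVERHEAD)    # ~1.93 GHz per core

print(f"raw clock equivalent:     {raw_ghz:.2f} GHz/core")
print(f"after virtualization tax: {effective_ghz:.2f} GHz/core")
```

Tweak `VIRT_OVERHEAD` to taste; even a charitable 15% still leaves you with sub-2.4 GHz cores of 2007-era Xeon.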
So, how does your render application perform with 8 sub-1.9 GHz cores’ worth of 2007 Xeon? Sure, it runs, but slowly…and at what cost? Oh, and let’s hope your render app needs less than 6 GB of memory during its run, since that’s all you’ll get at best.
Fine, you say, I can live with the performance penalty by simply starting up more instances. Better get out your wallet and start filling out request forms to get past Amazon’s default limit of 20 instances per zone. So much for infinite scaling. Only Amazon knows what they’d limit you to per zone.
So, at today’s rates, Amazon charges $1.14/hr. for the Windows EC2 instance we’ve defined above. Let’s convert that into a GHz-hr cost, since that’s what most render farm services use. The math yields about $0.075/GHz-hr for the node. That’s already well above our service’s highest-priority queue rate, and you don’t even have any of the other necessary pieces of the render farm pie! Add software licensing for your render application, plug-ins, and farm management software. Add the enormous amount of time you’ll spend setting all of this up (installing software, getting licensing to work in the EC2 cloud, figuring out how to start/stop/enslave EC2 instances, debugging the whole thing, moving your scene file(s) and assets to the AWS cloud, and waiting as you sluggishly download the rendered results) and you have a guaranteed losing situation.
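The price conversion is simple division. This sketch uses the figures already quoted: $1.14/hr for the Windows instance and the overhead-adjusted estimate of roughly 1.9 GHz per core:

```python
# Convert EC2's hourly instance price into the $/GHz-hr metric that
# render farm services quote. All inputs are estimates from the text.
HOURLY_RATE = 1.14             # $/hr, Windows High-CPU Extra Large
CORES = 8                      # virtual cores in the instance
EFFECTIVE_GHZ_PER_CORE = 1.9   # overhead-adjusted clock estimate

total_ghz = CORES * EFFECTIVE_GHZ_PER_CORE      # 15.2 GHz of usable compute
cost_per_ghz_hr = HOURLY_RATE / total_ghz

print(f"${cost_per_ghz_hr:.3f}/GHz-hr")         # prints $0.075/GHz-hr
```

And remember, that $0.075/GHz-hr buys only the bare node: no render software licenses, no farm management, no bandwidth for pushing assets up and pulling frames down.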
Let’s take a minute to summarize:
Performance = yawn
Price = yuck
Hopefully this is all you need to see to realize that EC2 instances don’t make good render farm nodes. I’ve even seen a couple public render farm services claiming they use EC2 instances. Given the reality of the above, be prepared to be simultaneously underwhelmed by their performance and overwhelmed by their price. EC2 simply isn’t the render farm nirvana that many expect.
EC2 instances as render farm nodes = doesn’t compute
We charge as low as $0.005/GHz-hr for render jobs, have fully automated job submission and result delivery, operate over 256-bit AES encrypted tunnels by default, run directly on actual, current hardware, and save you all the headache/heartache/eyeache of the countless hours you’d spend staring at a screen trying to turn AWS into a render farm. Seriously, AWS or EC2 and “render farm” don’t belong in the same sentence.
Sign up for our services and see for yourself why Pixel Plow enables you to Render Endlessly!