Dealing with those pesky “write failed” errors

Perhaps this has happened to you. You test render a frame from your scene file, and it looks OK. You send it to render on the Pixel Plow farm, and the job errors out. Sometimes those error messages claim that a “write failed”, or something similar. Other times, the render app just seems to stop with no stated reason other than a non-zero exit code. We’ve been seeing this on a steadily increasing basis. There are multiple reasons for errors like this, and they can stem from anything between a render app/engine/plugin trying to be more efficient and the same just being poorly written. We’d like to give you a few general pointers on this topic, though, so you can be better equipped to submit render jobs that don’t error out.

Several render engines these days are trying to outdo each other with optimizations. One way they can achieve higher performance is to pre-compute, pre-create, or otherwise convert textures into a different format at or before render time. If this is done, and cached, prior to render time, it need not be done again during the render…thereby shortening the render time of all render passes that use the converted textures. This is great if you’re doing look dev on your machine and running several iterative renders over the course of the day using the same texture set. It might cause some problems when you attempt to render the same scene on a farm, though.

Consider what the software is actually doing. It’s reading texture files, performing some kind of higher math, and then writing files back out to your project folder. These written files often have the same base file name, but the extension is different to avoid naming collisions with the original texture files. That’s all fine and dandy, until you try to have more than one machine do that simultaneously, like a render farm does. If your scene file is instructing the render app, engine, or plugin to perform these conversions at render time and cache them, you’ve just instructed multiple machines to do that. This is sort of like combining matter and anti-matter, and that gives computers a headache. Only one machine will be able to write those converted texture files, and it’s usually the one that starts writing first. All the rest will not be able to write to the files, their conversions will all fail, and the render app will crash or exit. Arnold is guilty of enabling this with its “auto convert textures to .tx” option. Renderman has a similar function. Redshift has an increasing set of data that is being pre-computed and cached. Those are just a few of the problem children, but there are more in that family of black sheep. Look for options with “convert”, “cache”, or anything with “pre-” in the name. Keep in mind that those settings probably won’t work farm-side, and plan accordingly. Of course, if you want to do the conversions/caching on your machine and configure the app to use pre-converted objects, by all means do so! That can only benefit you at render time on the farm.
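If you go the pre-conversion route, it helps to know which textures still lack a converted copy before you submit. Here is a minimal Python sketch of that check. It assumes the converted file sits next to the original with the same base name and a different extension (“.tx” is used as the default because that’s the Arnold example above). The function name and the list of source extensions are our own inventions for illustration, not part of any render app.

```python
from pathlib import Path

# Assumption: the source texture extensions used in your project.
# Adjust this set to match your own texture library.
SOURCE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".exr", ".tga"}


def find_unconverted_textures(texture_dir, cache_extension=".tx"):
    """Return source textures that have no converted file beside them.

    A texture counts as converted when a file with the same base name
    and `cache_extension` exists in the same folder.
    """
    missing = []
    for path in sorted(Path(texture_dir).rglob("*")):
        # Skip folders and anything that is not a known source texture.
        if not path.is_file() or path.suffix.lower() not in SOURCE_EXTENSIONS:
            continue
        # Same base name, different extension, same folder.
        if not path.with_suffix(cache_extension).exists():
            missing.append(path)
    return missing
```

An empty list means every texture already has its converted twin, so no render node should need to write one at render time. Anything the function returns is a texture to convert locally before you submit.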

The next likely time for write failures during a render job is actually clear at the tail end of the process. When output files are going to be written by the render app, the app needs to know a few things. 1) Where should it write the output files? 2) What should it name the output file(s)? 3) What file format/type should the file contain? You’d think this should be child’s play by now. After all, it’s 2017. Haven’t we figured out how to do this perfectly every time? Unfortunately not. As an example of a render engine violating #1, Renderman can write its output to subdirectories of the directory you specify in your scene file or via a CLI switch at render time. What it lacks the ability to do, however, is create those subdirectories prior to writing to them. Then, when it comes time to write the output files to a location that doesn’t exist, it fails and dies. It does pre-create the subdirectories if launched from a Maya GUI, though. That’s not how a farm is going to run Maya. Pretty unfortunate, huh?
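The fix for a missing output subdirectory is simply to make sure it exists before the render app tries to write into it. Below is a minimal Python sketch of what a wrapper script could do before launching a command-line render. The function name and the subdirectory names in the usage note are hypothetical, and this is not a description of how Renderman, Maya, or our agent handles it.

```python
import os


def ensure_output_dirs(output_root, subdirs):
    """Create each output subdirectory under `output_root` if it is missing.

    Returns the full paths that are now guaranteed to exist.
    """
    ready = []
    for sub in subdirs:
        full_path = os.path.join(output_root, sub)
        # exist_ok=True makes this safe to call repeatedly, and safe when
        # another process has already created the same folder.
        os.makedirs(full_path, exist_ok=True)
        ready.append(full_path)
    return ready
```

Calling something like `ensure_output_dirs("renders", ["beauty", "masks"])` before the render starts means the later writes have somewhere to land. The `exist_ok=True` flag matters on a farm, since several nodes may try to create the same folder at nearly the same moment.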

#2 seems easy enough to construct, but that’s only at a casual glance. When a render farm is writing output files, each one of those files must have a frame number within the filename. The output file names can’t all be “super cool image.png”, but have to be “super cool image_0001.png” for frame #1. Naming schemes can often be modified by the scene file itself, so it’s important to make sure you’ve selected a naming scheme that writes frame numbers to the output files, and isolates them from the rest of the filename via some sort of separator character like a dash, underscore, period, or similar. We try to handle this automatically in many cases, and we’ll probably be adding more as we see people having common problems with one or more render apps. Ahem…Vue!
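To make the naming rule concrete, here is a small Python sketch that builds a frame-numbered file name and checks whether an existing name carries a frame number set off by a separator. The four-digit padding, the underscore default, and both function names are assumptions chosen to match the example above. Your render app’s own naming tokens will look different.

```python
import re

# A frame number must be preceded by a dash, underscore, or period
# and followed by the file extension.
FRAME_PATTERN = re.compile(
    r"^(?P<base>.+)[._-](?P<frame>\d+)\.(?P<ext>[A-Za-z0-9]+)$"
)


def frame_filename(base, frame, ext, padding=4, separator="_"):
    """Build an output name such as 'super cool image_0001.png'."""
    return f"{base}{separator}{frame:0{padding}d}.{ext}"


def parse_frame_number(filename):
    """Return the frame number in `filename`, or None if there isn't one."""
    match = FRAME_PATTERN.match(filename)
    return int(match.group("frame")) if match else None
```

With these, `frame_filename("super cool image", 1, "png")` gives “super cool image_0001.png”, while `parse_frame_number("super cool image.png")` returns `None` because there is no frame number to find. A name like “image0001.png” also returns `None`, since nothing separates the digits from the rest of the name.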

As to the file format of the output, well, it has to be defined. Whether the render app knows what to do from settings in the scene file or from command line switches, the definition has to come from somewhere. If our agent doesn’t ask you for the output format, then you can rest assured you’ll need to define it in your scene file. Render engines that have VFBs or render viewer windows that don’t require writing output when test rendering can lull you into a sense of complacency. You still need to tell the render app what output format you want it to create.

As stated earlier, we’re seeing these problems with increasing frequency. We continue to fight the settings that don’t work on a farm as we come across them and have the ability to auto-correct them, in a way far better than what your smartphone does to your texts. Since we haven’t had time to address all of them, and there are new offending settings being made all the time, you’ll have to meet us half-way. Hopefully this article has given you some insight into farm operation and an appreciation for some of the many intricacies of what we do. Keep on fighting, ’cause we’re right there with you.