Those of you who follow us closely know that our farm is asymmetric from a hardware perspective. This is certainly true when it comes to the physical memory available per node: we have nodes with 32GB, 64GB, and 96GB of memory. If you’ve seen the price of memory recently, you’ll know that it isn’t cheap, since one of the three fabs burnt to the ground last year. This makes upgrading a large farm a rather expensive proposition.
The good news, though, is that Superdev is at it again with his usual wit-over-problems approach. He realized that Harpoon, our awesome new scheduler, was already tracking average memory consumption on a per-job basis. This allows it to automatically restrict jobs to nodes with more memory if needed. Pretty slick, huh? Until now, we had been detecting out-of-memory situations by parsing the render app log files of failed frames, and then automatically re-queuing those frames with a higher minimum memory threshold.
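For the curious, here’s a rough sketch of what that older, reactive approach boils down to. This is not our actual code: the function name `bump_min_memory_gb` is made up for illustration, and the only thing taken from our real setup is the three memory tiers.

```python
from typing import Optional

# Memory tiers on the farm, in GB (from this post).
NODE_TIERS_GB = (32, 64, 96)


def bump_min_memory_gb(current_min_gb: int) -> Optional[int]:
    """Hypothetical helper: after an out-of-memory failure, return the next
    larger memory tier to re-queue on, or None if already at the largest."""
    larger = [tier for tier in sorted(NODE_TIERS_GB) if tier > current_min_gb]
    return larger[0] if larger else None
```

The catch with working this way is that a frame has to fail before its threshold moves up, so a hungry job can burn a failed attempt on each smaller tier before it lands on a node that fits.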
Well, he took that a step further, now that Harpoon is in direct control of scheduling. We now initially schedule jobs only on nodes with 96GB of memory. Once the first frame/tile completes and its memory consumption is recorded, the job is released for scheduling on lower-memory nodes, provided they have enough memory to safely complete it. This way, we’re not waiting for out-of-memory errors to occur before bumping jobs up to the next minimum memory threshold. This top-down approach keeps out-of-memory errors from happening in the first place, thereby preventing the re-queues that delay job completion. Of course, this entire mechanism is completely automatic, and it doesn’t require any user interaction or involvement before, during, or after the job. I mean, after all, how many of you know in advance how much memory your scene file will need in order to render correctly?
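If you like to see ideas as code, the top-down logic can be sketched roughly like this. To be clear, this is an illustration and not Harpoon’s real implementation: the `Job` class, the `min_node_memory_gb` function, and especially the 25% safety margin are all assumptions made up for the example. Only the 32/64/96GB tiers come from our actual farm.

```python
from dataclasses import dataclass
from typing import Optional

# Memory tiers on the farm, in GB (from this post).
NODE_TIERS_GB = (32, 64, 96)

# Assumed headroom over the measured usage; the real margin is not stated here.
SAFETY_MARGIN = 1.25


@dataclass
class Job:
    name: str
    # Memory used by the first completed frame/tile, in GB.
    # Stays None until that first frame/tile has finished.
    first_frame_gb: Optional[float] = None


def min_node_memory_gb(job: Job) -> Optional[int]:
    """Return the smallest memory tier the job may be scheduled on,
    or None if even the largest tier is too small."""
    if job.first_frame_gb is None:
        # Nothing measured yet: start on the largest nodes only.
        return max(NODE_TIERS_GB)
    needed_gb = job.first_frame_gb * SAFETY_MARGIN
    for tier in sorted(NODE_TIERS_GB):
        if tier >= needed_gb:
            return tier
    return None
```

So a brand new job is pinned to 96GB nodes, and a job whose first frame used 20GB would be released all the way down to the 32GB nodes under these assumed numbers. No failed frames are needed to find the right tier.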
It should go without saying, but we’ll say it anyway. The largest memory nodes on Pixel Plow currently have 96GB available to them. If your scene file requires more than that to render, we can’t render it here. In those cases, it would be wise to do a bit of scene file optimization. Get in contact with your render app and engine devs to find out the best approach to take for memory optimization.
This is just one of the pretty darn slick things we can do with Harpoon. Stay tuned for more.