Sunday, April 18, 2010

How Fast is Fast?

In pretty much every review or comment about Krakatoa, I see something along the lines of "Krakatoa renders really fast". Don't get me wrong, I like reading that, but let's be fair - speed is relative (ask Albert E.) and people don't say this because Krakatoa is actually fast, but because most mesh-based particle rendering solutions for 3ds Max are slower in comparison.

All this good press makes my job harder as I try to push for speed improvements in the development of future Krakatoa versions (you might not know this, but my middle name is "Speeeed!"). We are in the business of selling software to users, and if a single copy of Krakatoa can get the job done on time, why buy more? There was even an anecdotal case where an important customer of Krakatoa wanted to buy additional licenses, but before the transaction could be finalized, they told us they got their project finished with the few seats they already had because "Krakatoa renders so fast". Obviously, while a fast product is good for the image of the product, it could be quite detrimental to sales - thankfully, the client still bought those additional licenses so no harm done, but it got us thinking...

What would a company do if its existence depended solely on software sales? Things like limiting the Evaluation version to do a lot less (most of the particle management features of Krakatoa like partitioning, converting particle files between formats, deforming and modifying, importing back into Particle Flow etc. are basically free) or making Krakatoa deliberately slower come to mind. Thankfully, the main reason for Krakatoa's existence is the internal need of our visual effects production for a fast particle renderer. And watching Krakatoa in real-world production proves my point - Krakatoa can be damn slow and we have to do something about it!

So how fast is fast? Some people might answer "real time on the GPU would be good enough". Unfortunately, this is not the right answer. There have been some examples of real-time CUDA-accelerated particle rendering in NVIDIA demos, but this is not where we are going, and the reason is the amount of data we deal with. The main objective of Krakatoa is the "fast rendering of vast amounts of particles". In this objective, the second part is more important than the first. With the arrival of 64-bit computing and the growth in installed RAM, our workstations can typically fit around 700 MP (Million Particles, or MegaPoints) in 16 GB of RAM, and we often go there or near. A typical graphics card with 1 GB of RAM can handle less than 1/16th of that amount, so we don't care about that approach just yet, although it would be great for fast tests with a fraction of the particle count.
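For the curious, the memory figures above can be checked with a quick back-of-the-envelope calculation. This is only a sketch: it assumes "GB" means 2**30 bytes and that a particle takes as much room on the graphics card as it does in system RAM, neither of which is stated above, and the function names are made up for illustration.

```python
GIB = 1024 ** 3  # assumption: "GB" in the text means 2**30 bytes


def bytes_per_particle(ram_bytes, particle_count):
    """Average memory footprint of one particle, in bytes."""
    return ram_bytes / particle_count


def particles_that_fit(target_ram_bytes, reference_ram_bytes, reference_particles):
    """Scale a known particle count to a different amount of memory.

    Assumes the per-particle footprint is identical in both places
    (an illustrative simplification). Integer arithmetic avoids rounding noise.
    """
    return reference_particles * target_ram_bytes // reference_ram_bytes


workstation_ram = 16 * GIB
workstation_particles = 700_000_000  # the ~700 MP quoted in the text

per_particle = bytes_per_particle(workstation_ram, workstation_particles)
gpu_particles = particles_that_fit(1 * GIB, workstation_ram, workstation_particles)

print(f"{per_particle:.1f} bytes per particle on average")
print(f"{gpu_particles / 1e6:.2f} MP at most on a 1 GB card")
```

The second figure is an upper bound: it is exactly 1/16th of the workstation count, and a real card presumably needs part of its memory for things other than particles, hence "less than 1/16th".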

Another thing to keep in mind is that various areas of Krakatoa have different impact on speed - some are as fast as we could make them, some could be made faster, some can become bottlenecks depending on the settings. For example, the loading of particles which also includes the evaluation of materials, maps, deformation modifiers, MagmaFlows and particle culling was sped up in v1.5.0 via multi-threading and is now several times faster than in 1.1.x. The sorting portion has been multi-threaded for years and is probably as fast as it could be. The drawing in the Lighting and the Final Passes of Particle Rendering has remained single-threaded since the first version of Krakatoa and has been measured at around 2 MP/second. Given the increasing number of cores in today's machines, this is an area that could improve a lot! The evaluation and processing of Matte Objects was also sped up in 1.5.0 by simply switching from a raytracer to a rasterizer, but it could also benefit from multi-threading the depth map generation.

Adding a new light adds another sorting pass (already fast) and another drawing pass (not as fast as it could be). The moment you check the Motion Blur option, you ask Krakatoa to draw the Final Pass several times. If the number of samples is 8, Krakatoa has to do 8 times the drawing work, and the cost typically scales linearly (it actually takes 8 times longer to draw the particles if you ignore the loading time or render with PCache and LCache enabled). But if you have 8 cores in your workstation and Krakatoa used them all to speed up the drawing, 8-pass motion blur would "cost" as much as one pass does right now. Wouldn't that be great?
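To put numbers on that argument, here is a deliberately idealized cost model in Python. It is a sketch, not a description of how Krakatoa schedules its work: it assumes every pass redraws all particles, that loading time is ignored, and that extra cores divide the drawing work perfectly. The function name and the 100 MP example are illustrative; the 2 MP/second rate is the single-threaded figure mentioned earlier.

```python
def draw_time_seconds(particles_mp, passes, mp_per_second, cores=1):
    """Idealized drawing time.

    particles_mp   -- particle count in millions (MP)
    passes         -- number of drawing passes (e.g. motion blur samples)
    mp_per_second  -- drawing speed of a single core, in MP/second
    cores          -- cores used for drawing; perfect scaling is assumed
    """
    return particles_mp * passes / (mp_per_second * cores)


single_pass = draw_time_seconds(100, passes=1, mp_per_second=2.0)
blur_one_core = draw_time_seconds(100, passes=8, mp_per_second=2.0)
blur_eight_cores = draw_time_seconds(100, passes=8, mp_per_second=2.0, cores=8)

print(f"1 pass, 1 core:    {single_pass:.0f} s")
print(f"8 passes, 1 core:  {blur_one_core:.0f} s")
print(f"8 passes, 8 cores: {blur_eight_cores:.0f} s")
```

Under these assumptions the 8-pass render on 8 cores takes exactly as long as a single pass on one core. Real scaling is rarely perfect, so treat the last figure as a best case.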

You betcha! So that's what the next version of Krakatoa will do (and more). And the more particles you throw at it, the better it will scale. On my workstation, the pure drawing speed of 100 million particles comes close to 17.5 MP/second! The generation of a Matte depth map from 100 million polygons, something that is also performed before each motion blur pass, went down from 32 seconds to 4 (8x4=32, you know?). And we are not even half-done yet. Add to that the loading speedup with PRT Volumes already reported in a previous blog post and you will see how this new build is shaping up as the fastest Krakatoa you have ever seen. In the same post I mentioned it might ship as v1.6.0 - that suspicion turned out to be true, and we have already started updating the documentation for this upcoming 1.6.0 build, which should be expected sometime before Siggraph.
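If you want to turn those measurements into speedup factors, the arithmetic is short enough to sketch in Python. The inputs are the numbers quoted in this post (about 2 MP/second for the old single-threaded drawing, close to 17.5 MP/second now, and 32 versus 4 seconds for the Matte depth map); the helper names are made up for illustration.

```python
def speedup_from_rates(old_rate, new_rate):
    """Speedup when performance is given as throughput (higher is better)."""
    return new_rate / old_rate


def speedup_from_times(old_seconds, new_seconds):
    """Speedup when performance is given as elapsed time (lower is better)."""
    return old_seconds / new_seconds


drawing_speedup = speedup_from_rates(old_rate=2.0, new_rate=17.5)
matte_speedup = speedup_from_times(old_seconds=32.0, new_seconds=4.0)

# Time to draw 100 MP once at each rate, in seconds.
old_draw_time = 100 / 2.0
new_draw_time = 100 / 17.5

print(f"drawing: {drawing_speedup:.2f}x faster")
print(f"matte depth map: {matte_speedup:.0f}x faster")
print(f"100 MP draw: {old_draw_time:.0f} s before, {new_draw_time:.1f} s now")
```

Keep in mind that the old and new drawing rates were not necessarily measured on the same scene, so the drawing factor is a rough indication rather than a benchmark.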

I am quite excited about these improvements, which we haven't even passed to the Beta testers yet as they happened in the last 10 days. All this wouldn't have been possible without our relentless production team, which not only pushes Krakatoa to places it has never been before, but also constantly bitches about how slow it is! Love you guys! :)

So don't panic, we won't make Krakatoa slower just to sell more network render licenses and Deadline seats to run them, or disable more features so you cannot play with it for free. We understand that a faster Krakatoa is easier to love, and a Krakatoa you love is one you would, if not buy yourself, at least recommend to your employer (the guy with the wallet ;))...

Stay tuned!