Performance considerations

Topics: Developer Forum
Nov 22, 2006 at 11:13 AM
I believe that performance considerations should be a reasonably high priority in the development of the Animation Library. I suggest we discuss possible performance approaches and optimizations in this thread.

There is a good starting point for performance considerations in the DirectX SDK documentation: the "Top Issues for Windows Titles" article (look for it in the "DirectX Sample Browser" in the SDK).

As expected, the article lists the #1 top issue as "CPU-Limited Performance" (underused GPU, overloaded CPU). This will remain true for XNA-based games. So a natural strategy for the Animation lib should be to push as much processing as possible to the video card. To a significant degree, the Animation lib has followed this approach from its earliest versions, but there is still a lot to improve.

A good reference application is MultiAnimation from the DirectX SDK samples. It's the sample that demonstrates numerous Tiny characters walking around the scene (with an "Add" button to add more characters). On my PC (P4 2GHz + GeForce6600) it is capable of displaying more than 200 walking Tinys without any signs of framerate degradation, and with 200 characters the CPU load remains low (under 20%). I think it can display even more animations; I just got tired of clicking the "Add" button to add characters. I suspect that a GeForce6600 card is capable of showing over 1000 simultaneous animations with the MultiAnimation sample.

The MultiAnimation sample uses basically the same algorithm as the XNA Animation lib: skinning is done on the GPU using shaders. However, the Animation lib certainly has more options than the MultiAnimation sample, and some of them affect performance.

The current version of the Animation library (change set 11760) is fairly slow, but I believe this should change soon with the introduction of the "use provided keyframes" option (that is, turning off run-time interpolation). I believe this is a very important option for common usage scenarios.

Nov 22, 2006 at 10:43 PM
I agree, performance is a huge priority. Right now, a lot of the code does not have optimizations, but with UsePrecomputedInterpolatons set to true, the performance can be good.

I added the Clone() method to the animation controller, and this allows cloned controllers to share the same interpolation table, which greatly increases performance. I tested this out yesterday on my 3 GHz single-core processor and could run over 200 animations of Tiny without a frame rate drop.
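
To illustrate the sharing idea described above, here is a minimal, library-neutral sketch in Python (not the Animation Library's actual C# API). The class name, fields, and the placeholder table contents are all hypothetical; the only point is that a clone copies its cheap per-instance state and keeps a reference to the same read-only table.

```python
class AnimationController:
    """Hypothetical controller: shared read-only pose table, per-instance time."""

    def __init__(self, pose_table, frame_rate):
        self.pose_table = pose_table  # shared between clones, never mutated
        self.frame_rate = frame_rate
        self.elapsed = 0.0            # per-instance playback position in seconds

    def clone(self):
        # Reuse the same table object instead of copying or recomputing it.
        return AnimationController(self.pose_table, self.frame_rate)

    def update(self, dt):
        self.elapsed += dt

    def current_pose(self):
        # Map elapsed time to a frame index, looping at the end of the table.
        frame = int(self.elapsed * self.frame_rate) % len(self.pose_table)
        return self.pose_table[frame]


# Placeholder entries standing in for precomputed bone poses (an assumption).
table = [("pose", i) for i in range(60)]
original = AnimationController(table, frame_rate=30)
copy = original.clone()
copy.update(0.5)  # advances only the clone's playback position
```

With this layout, memory and setup cost for the table are paid once, and each extra character costs only a few fields.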

If we use precomputed interpolations, there isn't too much room to improve performance. There are minor things we can do here and there, but I spent a lot of time yesterday trying to figure out ways to optimize it.

Another thing to keep in mind is that the current shader is very expensive. Phong shading does not come cheap, and Phong shading with 3 lights can be very costly. Perhaps we should change the shader to use a cheaper type of lighting.

Finally, the bone palette is set inefficiently, and the world parameters are buffered every frame so that the animation controller doesn't change the initial value. I'll change this in the next release.

With these changes, there will be very little room left to improve performance for the precomputed and provided-keyframes modes (which will have the same performance), and I expect that my computer will be able to run 250-300 or more concurrent animations of tiny.x.

On another note, I'm home for Thanksgiving holiday, so won't have as much time to program for the next few days (and I'll have to set up my old computer which can barely run shader 2.0)
Nov 22, 2006 at 10:45 PM
And UsePrecomputedInterpolations does not use runtime interpolation; it calculates the interpolated values beforehand and puts them in a jagged array (which is another thing I will change; I will make the array two-dimensional instead of jagged).
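
As a rough illustration of precomputing interpolations, the Python sketch below samples a keyframe track once at a fixed frame rate and then answers runtime queries with a plain index lookup. It is not the library's code: the function names are hypothetical, and values are simple tuples blended linearly, whereas real bone transforms would be matrices or quaternions.

```python
def lerp(a, b, t):
    """Linearly blend two equal-length tuples."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def precompute_table(keyframes, duration, frame_rate):
    """Sample a track of (time, value) keyframes once per frame.

    keyframes must be sorted by time and contain at least two entries.
    """
    frame_count = int(duration * frame_rate) + 1
    table = []
    k = 0  # index of the keyframe that starts the current segment
    for frame in range(frame_count):
        t = frame / frame_rate
        # Advance to the segment that contains time t.
        while k + 2 < len(keyframes) and keyframes[k + 1][0] <= t:
            k += 1
        (t0, v0), (t1, v1) = keyframes[k], keyframes[k + 1]
        amount = min(max((t - t0) / (t1 - t0), 0.0), 1.0)
        table.append(lerp(v0, v1, amount))
    return table


def sample(table, time, frame_rate):
    """Runtime lookup: no interpolation, just an index clamped to the table."""
    index = min(int(time * frame_rate), len(table) - 1)
    return table[index]


keyframes = [(0.0, (0.0,)), (1.0, (10.0,)), (2.0, (0.0,))]
table = precompute_table(keyframes, duration=2.0, frame_rate=4)
```

For a full skeleton the same idea gives one entry per frame per bone, and since every frame then has the same number of bones, a rectangular layout is a natural fit.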
Nov 22, 2006 at 10:54 PM
Once I get my computer set up, I'll test these various configurations, and I'll also test changing the bone table only in the Update call as opposed to the Draw call. Hopefully I can get performance that is equal to the sample's.
Nov 23, 2006 at 7:17 AM
I looked at the MultiAnimation sample. It uses a far simpler lighting algorithm, which increases performance a lot, and it splits up vertex shader calls based on the number of bone influences, which is a good idea.

Otherwise, it is the same algorithm as the Animation Component's, but is not exactly a shining example of extensible code (comments such as "we are ignoring the texture file in the animation here, and just loading our own" come to mind...)
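
The bone-influence split mentioned above can be sketched as follows. This is a hedged Python illustration, not the sample's code: it assumes the split is made per draw batch by the largest number of nonzero blend weights on any vertex in that batch, so each group can be drawn with a shader variant that blends only that many bones. All names here are hypothetical.

```python
from collections import defaultdict


def influence_count(weights, epsilon=1e-6):
    """Number of bones that actually affect a vertex."""
    return sum(1 for w in weights if w > epsilon)


def max_influences(batch):
    """Largest influence count among the vertices of one draw batch."""
    return max(influence_count(weights) for weights in batch)


def group_batches(batches):
    """Map influence count -> indices of batches needing that shader variant."""
    groups = defaultdict(list)
    for index, batch in enumerate(batches):
        groups[max_influences(batch)].append(index)
    return dict(groups)


# Each batch is a list of per-vertex blend weights (up to 4 bones per vertex).
batches = [
    [(1.0, 0.0, 0.0, 0.0), (0.5, 0.5, 0.0, 0.0)],
    [(1.0, 0.0, 0.0, 0.0)],
    [(0.25, 0.25, 0.25, 0.25)],
]
groups = group_batches(batches)
```

Batches in the 1-influence group skip the blending work entirely, which is where the saving comes from on rigidly attached parts of a mesh.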
Nov 23, 2006 at 7:22 AM
And sorry for my 1 million posts but I'm off to bed! Finally got everything installed on this run-down computer, and I'll see if I can put all this theory into action tomorrow.
Nov 23, 2006 at 8:48 AM
In no way is the MultiAnimation sample a shining example of extensible code.
It's just a useful demo that shows what the performance limits are.

Please note that MultiAnimation does not seem to be limited by CPU performance. On my PC, its performance seems to be limited by GPU fillrate. That should be the typical situation in games: the CPU is needed for AI, so pushing animation code to the GPU as much as possible is quite reasonable.

The current version of the animation library, with all the options configured for higher speed, is somewhat slower and puts more strain on the CPU (as can be seen with Task Manager). That means there is some room for improvement (most probably by trading quality for performance via more configurable options).

I don't have a profiler so I did not have a chance to analyze Animation Lib in-depth, but I wanted to start this discussion thread so that we start thinking about performance trade-offs.
Nov 23, 2006 at 8:48 AM
Oh, and I wish you happy Thanksgiving!
Nov 25, 2006 at 9:18 AM
Hmm, I've been testing out a lot of things for performance, and I'm pretty close to the point where I don't see how to optimize performance any further. All I do for animations is set the world parameter and matrix palette, and the performance is still lower than the MultiAnimation sample's.

I assume it has something to do with how parameters are set in XNA... I'm trying to figure this out.
Nov 25, 2006 at 9:24 AM
It is not my code that is slowing it down - even when all I do is:
foreach (ModelMesh mesh in model.Meshes)

The CPU usage jumps up. This is with ZERO updating, so it must be something with XNA...
Nov 25, 2006 at 9:52 AM
I've found the source: the slowdown occurs in the vertex shader because of the vertex blending on the GPU. This shows up as CPU usage.

So, I'll focus on optimizing the shader. This is the best performance possible, I think. After I submit my next changeset, there is really very little or nothing that can be done to further increase performance.
Nov 28, 2006 at 4:24 PM
Here is some useful discussion of animation in XNA:

Nov 28, 2006 at 5:26 PM
leclerc9 is my handle for the XNA forums, just for future reference.
Nov 28, 2006 at 5:30 PM
I still have that code lying around which separates models based on the number of bone influences. I will try this out sometime, because it seems that in this case the controller is impeded more by the GPU than by the CPU.

The key question is:
Are the draw calls slow because of XNA SOFTWARE or because of the shader HARDWARE?
Oct 12, 2007 at 9:59 AM

dastle wrote:
I agree, performance is a huge priority. Right now, a lot of the code does not have optimizations, but with UsePrecomputedInterpolatons set to true, the performance can be good.

I added the Clone() method to animation controller, and this allows them to share the same interpolation table, which greatly increases performance.

Hi dastle! Have you implemented the Clone() function you're mentioning here? I can't find it, nor 'UsePrecomputedInterpolatons', if it exists. I could really use this kind of functionality, as my scene is getting pretty animation-intensive with lots of duplicate animation controllers.