Time for another XNA question. This time it is purely from a technical design standpoint, though.
My situation is this: I've created a particle engine based on GPU calculations. It is far from complete, but it works. My GPU easily handles 10k particles without breaking a sweat, and I wouldn't be surprised if I could add a bunch more.
My problem: Whenever I have a lot of particles created at the same time, my frame rate hates me. Why? A lot of CPU usage, even though I have reduced the CPU work to almost nothing but memory operations.
Creation of particles is still done by CPU-calls such as:
- Method wants to create particle and makes a call.
- Quad is created in form of vertices and stored in a buffer
- Buffer is inserted into GPU and my CPU can focus on other things
When I have about 4 emitters creating one particle per frame, my FPS drops (sure, only by 4 frames per second, but 15 emitters drop my FPS to 25).
Creation of a particle:
    //### As you can see, not a lot of action here. ###
    // Build the six vertices (two triangles) of the new particle's quad.
    ParticleVertex[] tmpVertices = ParticleQuad.Vertices(Position, Velocity, this.TimeAlive);
    particleVertices[i] = tmpVertices[0];
    particleVertices[i + 1] = tmpVertices[1];
    particleVertices[i + 2] = tmpVertices[2];
    particleVertices[i + 3] = tmpVertices[3];
    particleVertices[i + 4] = tmpVertices[4];
    particleVertices[i + 5] = tmpVertices[5];
    // Upload the whole vertex array to the GPU again.
    particleVertexBuffer.SetData(particleVertices);
My thoughts are that maybe I shouldn't create particles that often, maybe there is a way to let the GPU create everything, or maybe I just don't know how you do this stuff. ;)
Edit: If I didn't create particles that often, what would be the workaround for still making it look good?
So I am posting here in the hope that you know how a good particle engine should be designed, and whether I took a wrong turn somewhere.
There is no way to have the GPU create everything (short of using Geometry Shaders, which require SM4.0).
If I were creating a particle system for maximum CPU efficiency, I would pre-create (just to pick a number for sake of example) 100 particles in a vertex and index buffer like this:
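A minimal sketch of how such buffers might be filled, assuming a hypothetical vertex layout that stores only the particle's index and which corner of the quad the vertex is (the `ParticleVertex` fields and the `ParticleBuffers` helper here are illustrative names, not taken from any sample):

```csharp
using System;

// Hypothetical vertex layout: which particle this vertex belongs to,
// and which corner of that particle's quad it is.
public struct ParticleVertex
{
    public float ParticleIndex; // 0, 1, 2 ... particleCount - 1
    public float CornerX;       // -1 or +1
    public float CornerY;       // -1 or +1
}

public static class ParticleBuffers
{
    // Four vertices per particle, one for each corner of the quad.
    public static ParticleVertex[] BuildVertices(int particleCount)
    {
        float[] cornerX = { -1f, 1f, 1f, -1f };
        float[] cornerY = { -1f, -1f, 1f, 1f };
        var vertices = new ParticleVertex[particleCount * 4];
        for (int p = 0; p < particleCount; p++)
        {
            for (int c = 0; c < 4; c++)
            {
                vertices[p * 4 + c].ParticleIndex = p;
                vertices[p * 4 + c].CornerX = cornerX[c];
                vertices[p * 4 + c].CornerY = cornerY[c];
            }
        }
        return vertices;
    }

    // Six indices per particle: two triangles sharing the quad's diagonal.
    // 16-bit indices are assumed, which is plenty for a few thousand particles.
    public static short[] BuildIndices(int particleCount)
    {
        var indices = new short[particleCount * 6];
        for (int p = 0; p < particleCount; p++)
        {
            int v = p * 4, i = p * 6;
            indices[i + 0] = (short)(v + 0);
            indices[i + 1] = (short)(v + 1);
            indices[i + 2] = (short)(v + 2);
            indices[i + 3] = (short)(v + 0);
            indices[i + 4] = (short)(v + 2);
            indices[i + 5] = (short)(v + 3);
        }
        return indices;
    }
}
```

In XNA the two arrays would then be uploaded once, with SetData on a VertexBuffer and an IndexBuffer, instead of once per created particle.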
And the cool thing is that you only need to do this once - you can reuse the same vertex buffer and index buffer for all your particle systems (provided they are big enough for your largest particle system).
Then I would have a vertex shader that would take the following input:
That vertex shader (again like the XNA Particle 3D Sample) could then determine the position of a particle's vertex based on its initial velocity and the time that that particle had been in the simulation.
The time for each particle would be (pseudo code):
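One way this could be written, shown here in C# rather than shader code, is sketched below. It assumes each particle's offset is its index times a constant release interval; `releaseInterval` and the `ParticleClock` helper are hypothetical names used only for illustration:

```csharp
using System;

public static class ParticleClock
{
    // Age of one particle, wrapped into [0, particleLifetime).
    // releaseInterval is the assumed constant delay between consecutive particles.
    public static float ParticleTime(float currentTime, int particleIndex,
                                     float releaseInterval, float particleLifetime)
    {
        float t = (currentTime - particleIndex * releaseInterval) % particleLifetime;
        // C#'s % keeps the sign of the dividend, so shift negative results up.
        return t < 0f ? t + particleLifetime : t;
    }
}
```

Note that a particle which has not been released yet also gets a wrapped, valid-looking time, which is why the range of particles actually drawn matters.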
In other words, as time advances, particles will be released at a constant rate (due to the offset). And whenever a particle dies at time = particleLifetime (or is it at 1.0? floating-point modulus is confusing), time loops back around to time = 0.0 so that the particle re-enters the animation.
Then, when it came time to draw my particles, I would have my buffers, shader and shader parameters set, and call DrawIndexedPrimitives. Now here's the clever bit: I would set startIndex and primitiveCount such that no particle starts out mid-animation. When the particle system first starts I'd draw 1 particle (2 primitives), and by the time that particle is about to die, I'd be drawing all 100 particles, the 100th of which would just be starting.
Then, a moment later, the 1st particle's timer would loop around and make it the 101st particle.
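As a rough sketch of the start-up phase, the primitive count could be derived from the elapsed time like this, assuming the first particle is released at time 0 and one more follows every `releaseInterval` seconds (the `ParticleStartup` helper and its parameter names are hypothetical):

```csharp
using System;

public static class ParticleStartup
{
    // How many particles have been released so far, capped at the buffer size.
    public static int ActiveParticles(float elapsed, float releaseInterval, int particleCount)
    {
        if (elapsed < 0f) return 0;
        // Once a full lifetime has passed, every particle in the buffer is live.
        if (elapsed >= particleCount * releaseInterval) return particleCount;
        int released = (int)(elapsed / releaseInterval) + 1; // first particle is released at time 0
        return Math.Min(released, particleCount);
    }

    // Two triangles per particle quad.
    public static int PrimitiveCount(float elapsed, float releaseInterval, int particleCount)
    {
        return ActiveParticles(elapsed, releaseInterval, particleCount) * 2;
    }
}
```

During start-up startIndex would stay at 0 and only primitiveCount grows; once the buffer is full, the whole buffer is drawn every frame.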
(If I only wanted 50 particles in my system, I'd just set my particle lifetime to 0.5 and only ever draw the first 50 of the 100 particles in the vertex/index buffer.)
And when it came time to turn off the particle system - simply do the same in reverse - set the startIndex and primitiveCount such that particles stop being drawn after they die.
Now I must admit that I've glossed over the maths involved and some details about using quads for particles - but it should not be too hard to figure out. The basic principle to understand is that you're treating your vertex/index buffer as a circular buffer of particles.
One downside of a circular buffer is that, when you stop emitting particles, unless you stop when the current time is a multiple of the particle lifetime, you will end up with the active set of particles straddling the ends of the buffer with a gap in the middle - thus requiring two draw calls (a bit slower). To avoid this you could wait until the time is right before stopping - for most systems this should be OK, but it might look weird for some (e.g. a "slow" particle system that needs to stop instantly).
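To illustrate the straddling case, here is a sketch that works out which index ranges are still alive after emission has stopped. It assumes six indices per particle, a lifetime equal to the particle count times the release interval, and that currentTime is at or after stopTime; `DrawRange` and `ParticleShutdown` are hypothetical names:

```csharp
using System;
using System.Collections.Generic;

public struct DrawRange
{
    public int StartIndex;     // first index in the index buffer
    public int PrimitiveCount; // number of triangles to draw
}

public static class ParticleShutdown
{
    // Returns zero, one or two ranges covering the particles that are still alive.
    public static List<DrawRange> LiveRanges(float currentTime, float stopTime,
                                             float releaseInterval, int particleCount)
    {
        float lifetime = particleCount * releaseInterval;
        // Last particle released strictly before emission stopped.
        int last = (int)Math.Ceiling(stopTime / releaseInterval) - 1;
        // Oldest particle that has not yet reached the end of its lifetime.
        int first = Math.Max(0, (int)Math.Floor((currentTime - lifetime) / releaseInterval) + 1);

        var ranges = new List<DrawRange>();
        int alive = last - first + 1;
        if (alive <= 0) return ranges; // everything has died

        int firstSlot = first % particleCount; // position in the circular buffer
        int beforeWrap = Math.Min(alive, particleCount - firstSlot);
        ranges.Add(new DrawRange { StartIndex = firstSlot * 6, PrimitiveCount = beforeWrap * 2 });
        if (alive > beforeWrap) // the live set wraps past the end of the buffer
            ranges.Add(new DrawRange { StartIndex = 0, PrimitiveCount = (alive - beforeWrap) * 2 });
        return ranges;
    }
}
```

With 4 particles and a release interval of 0.25, stopping at time 1.0 leaves a single range, while stopping at 1.25 leaves two ranges a moment later, each needing its own draw call.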
Another downside to this method is that particles must be released at a constant rate - although that is usually pretty typical for particle systems (obviously this is per-system and the rate is adjustable). With a little tweaking an explosion effect (all particles released at once) should be possible.
All that being said: If possible, it may be worthwhile using an existing particle library.