optimizing a grid-based particle system

536 views Asked by At

I've implemented a game somewhat similar to this one in Java and currently find that I'm hitting a ceiling number of particles of ~80k. My game board is a 2D array of references to 'Particle' objects, each of which must be updated every frame. Different kinds of 'Particle' have different behaviors and may move or change their state in response to environmental conditions such as wind or adjacent particles.

Some possible 'rules' that might be in effect:

  • If a Particle of type lava is adjacent to a Particle of type water, both disappear, and the lava is replaced by obsidian
  • If a gas Particle is adjacent to a Lava, Fire, Ember, etc. Particle, it will ignite, and produce fire and smoke
  • If a sufficient number of dust particles are stacked on top of one another, those at lower levels, as if under pressure, can become sedimentary rock

I've searched around and haven't been able to find any algorithms or data structures that seem particularly well-suited to speeding up the task. It seems that some kind of memoization might be useful? Would a quad tree be of any use here? I've seen them used in the somewhat similar Conway's Game of Life with the Hashlife algorithm. Or, is it the case that I'm not going to be able to do too much to increase the speed?

3

There are 3 answers

1
dk1111 On

i don't understand your problem clearly but I think cuda or OpenGL (GPU programming in general) can easily handle your ref link: https://dan-ball.jp/en/javagame/dust/

3
Persixty On

Hashlife will work in principle but there are two reasons why you might not get as much out of it as Conway Life.

Firstly it relies on recurring patterns. The more cell states you have and the less structured the plane the fewer cache hits you'll encounter and the more you'll be working with brute force.

Secondly as another poster noted rules that involve non-local effects will either mean your primitives (in Conway Life 4x4) will need to be bigger so you will have abandon divide and conquer at say 8x8 or 16x16 or whatever size guarantees you can correctly calculate the middle portion in n/2 time. That's made the worse by the diversity of states. In Conway Life it's common to pre-calculate all 4x4 gridsor at least have nearly all relevant ones in cache. With 2 states there are only 65536 4x4 grids (peanuts on modern platforms) but with only 3 there are 43046721. If you have to have 8x8 primitives it gets very big very quickly and beyond any realistic storage.

So the larger the primitive and the more states you have that becomes quickly unrealistic.

One way to address that primitive size is to have the rock rule propagate pressure. So a Rock+n (n representing pressure) becomes Rock+(n+1) in the next generation if it has Rock+m where m>=n above it. Up to some threshold k where it turns to sedimentary Rock.

That means cells are still only dependent on their immediate neighbours but again multiplies up the number of states.

If you have cell types like the 'Bird' in the example given and you have velocities that you don't keep to a minimum (say -1,0,1 in either direction) you'll totally collapse memoization. Even then the chaotic nature of such rules may make cache hits on those areas vanishingly small.

If your rules don't lead to steady states (or repeating cycles) like Conway Life often does the return on memoization will be limited unless your plane is mostly empty.

8
AudioBubble On

I'd use a fixed NxN grid for this mainly because there are too many points moving around each frame to benefit from the recursive subdividing nature of the quad-tree. This is a case where a straightforward data structure with the data representations and memory layouts tuned appropriately can make all the difference in the world.

The main thing I'd do for Java here is actually avoid modeling each particle as an object. It should be raw data like just some plain old data like floats or ints. You want to be able to work with contiguity guarantees for spatial locality with sequential processing and not pay for the cost of padding and the 8-byte overhead per class instance. Split cold fields away from hot fields.

For example, you don't necessarily need to know a particle's color to move it around and apply physics. As a result, you don't want an AoS representation here which has to load in a particle's color into cache lines during the sequential physics pass only to evict it and not use it. Cram as much relevant memory used together into a cache line as you can by separating it away from the irrelevant memory for a particular pass.

Each cell in the grid should just store an index into a particle, with each particle storing an index to the next particle in the cell (a singly-linked list, but an intrusive one which requires allocating no nodes and just uses indices into arrays). A -1 can be used to indicate the end of the list as well as empty cells.

To find collisions between particles of interest, look in the same cell as the particle you're testing, and you can do this in parallel where each thread handles one or more cells worth of particles.

The NxN grid should be very fine given the boatload of moving particles you can have per frame. Play with how many cells you create to find something optimal for your input sizes. You might even have multiple grids. If certain particles don't interact with each other, don't put them in the same grid. Don't worry about the memory usage of the grid here. If each grid cell just stores a 32-bit index to the first particle in the cell, then a 200x200 grid only takes 160 kilobytes with a 32-bit next index overhead per particle.

I made something similar to this some years back in C using the technique above (but not with as many interesting particle interactions as the demo game) which could handle about 10 mil particles before it started to go below 30 FPS and on older hardware with only 2 cores. It did use C as well as SIMD and multithreading, but I think you can get a very speedy solution in Java handling a boatload of particles at once if you do the above.

Data structure: enter image description here

As particles move from one cell to the next, all you do is manipulate a couple of integers to move them from one cell to the other. Cells don't "own memory" or allocate any. They're just 32-bit indices.

To figure out which cell a particle occupies, just do:

cell_x = (int)(particle_x[particle_index] / cell_size)
cell_y = (int)(particle_y[particle_index] / cell_size)
cell_index = cell_y * num_cols + cell_x

... much cheaper constant-time operation than traversing a tree structure and having to rebalance it as particles move around.