Entity Component System Framework that is CPU cache friendly


I cannot find a single framework implementation that is CPU cache friendly, meaning one where the data that systems traverse in each game-loop cycle is stored in contiguous memory.

Let's see: each system traverses the specific entities that satisfy its conditions, e.g. an entity must contain components A, B and C to be processed by system X. So I need contiguous memory containing all those entities and components (not references to them, since chasing references is not cache friendly and produces lots of cache misses) so that system X can fetch them from RAM as fast as possible while it runs.

But right after system X finishes, system Y starts operating on the set of entities that satisfies its own conditions, e.g. all entities containing A and B. That is the same set of entities system X processed, plus some other entities that have A and B. So now there are two contiguous blocks of memory holding duplicate data. Data duplication is bad in itself for well-known reasons. And it also requires synchronization, which again is not cache friendly: you have to find entities in one vector and update them with the new data contained in the other vector.

This is just one of my thoughts. I have considered other, more realistic, data models for an Entity Component System framework, but every model I could come up with has the same problem: during each game-loop cycle you cannot avoid lots of cache misses caused by non-contiguous data.

Can anyone suggest an implementation, article, or example on this topic that can help me understand what data model should be used to achieve a cache-friendly design? This is one of the most critical things for game performance.


There are 3 answers

AudioBubble

I think this problem requires too much effort to solve perfectly to justify the time it takes. I already spent too much time on it myself in the past, coming up with convoluted solutions, only to settle back on a simpler one. Your biggest hotspots aren't necessarily going to come from non-compulsory cache misses during entity/component traversal. Many systems have their share of hefty per-entity work that can be sped up, and many components are large enough to diminish the benefit of sorting them so that multiple neighbors fit into the minimum number of cache lines.

That said, if you just want to sort components into cache-friendly memory access patterns for one or two crucial systems without overlapping conflicts, and maybe only for the tiniest and most numerous component types where that's bound to help the most, that's easy enough to do with some post-processing here and there. I recommend doing that in response to your hotspots instead.

And often just some basic sorting will help you reduce your share of cache misses for all systems, regardless of what combo of components they process. If you start out with a rep like this (what I use):

[diagram: the initial representation, with each component type stored in its own array in entity order]

And after a while of running the game state and removing and adding components sporadically, you end up with something like this:

[diagram: the same representation after many removals and additions, with components scattered out of entity order]

You can untangle the mess and sort it out like so:

[diagram: the representation after sorting, with components back in entity order]

That can be done very cheaply with a radix sort, sorting the elements based on the entity index that owns them as the key in linear time. With a decent implementation you can generally sneak this in without noticing any hiccups to frame rates. I drew the diagram in a different way than the above table of data (just to be clear what component belongs to what entity), but same idea. Just radix sort the array of components based on the entity index (entity ID), update the links (use a parallel array to map before/after indices which gets sorted along with the component data using entity index as the key), and now everything is all nice and neat and not tangled up with sporadic access patterns.

This might not give a system interested in a particular combo of components a perfectly contiguous set (there might be some gaps, as in the above diagram), but at least it won't be going back and forth and back again in memory, potentially loading a memory region into a cache line only to evict it and then come back and reload it, and there's a good chance that many components will end up being accessed contiguously.

And if that's not good enough, then given the specific entities which have exactly the components a system's query is interested in, you can sort those components to the top of the array, closing any gaps, and now you have perfect contiguity for, say, the system processing entities that contain both motion and rendering components. This can also be done in linear time with a post-process here and there, maybe applied periodically after a number of components have been removed and added.

I've never found the need to go this far. I just do the generalized sort against entity IDs now and then to generally improve the access patterns of all the systems (without an optimal layout for any one system). Your use case might call for the optimal version, but I would suggest focusing on the key systems with big hotspots that really benefit from it.

junkdog

Adam Martin (t=machine) recently posted "Data Structures for Entity Systems: Contiguous memory"; it's the only article I'm aware of that deals specifically with memory layout in ECS.

You don't specify a language, but in the Java world, entreri and artemis-odb (via PackedComponents; disclaimer: artemis-odb is my port) handle what Adam refers to as "Iteration 1: a BigArray per ComponentType".

Adam

I would go with junkdog's answer (since I wrote the linked article ;)), but here's another, different take on it:

If you want cache-friendly design, you need to list:

  1. Your microprocessor
  2. Your processor architecture
  3. Your bus architecture
  4. ...
  5. Your per-sub-frame working-set size
  6. Your total working set / RAM for all game-objects
  7. The amount of interconnections in your specific game
  8. ... etc

Depending on how tight or loose those requirements are, you will have different easy (or hard) decisions to make about your design. Game developers rewrite memory management frequently. They don't do this because they're stupid; they do it because it's easy and worthwhile to (re-)optimize per project (is this a AAA title or an AA title? Are graphics more important, or network latency? etc.) and per hardware (on PC, the target hardware changes every month).

I recommend you pick a set of hardware, create a simple ES-based game, and start trying to design cache-friendly memory usage. Document it publicly, make it all open source, and see if you can get other people interested in running your benchmarks.