I have a very basic space set up with a large number (10000) of circles falling under gravity. Only one of my cores seems to be active at any point in time, but as far as I know collision resolution is extremely parallelizable with space partitioning, which the Chipmunk engine does indeed implement.
The issue persists even when I create the space with threaded=True on Linux and call space.use_spatial_hash(). I really need the extra performance that parallel threads could provide. What are my options? Should I move to pybullet instead, with planar constraints to emulate 2D?

You also need to set the number of threads to use, for example:
your_space.threads = 2
This is described in the docs for Space.__init__.
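Putting the two settings together, a minimal sketch could look like the following. The helper name make_space, the circle count, the layout and the gravity value are made up for illustration; only threaded=True and the threads attribute come from the answer itself.

```python
import pymunk


def make_space(n_circles=200, threads=2):
    """Build a space of falling circles with the threaded solver enabled.

    make_space, n_circles and the grid layout are illustrative assumptions.
    """
    # threaded=True has to be passed at construction time.
    space = pymunk.Space(threaded=True)
    # On its own threaded=True is not enough: the thread count must be set too.
    space.threads = threads
    space.gravity = (0.0, -900.0)

    radius = 5.0
    for i in range(n_circles):
        body = pymunk.Body(1.0, pymunk.moment_for_circle(1.0, 0.0, radius))
        # Lay the circles out on a grid so that they do not start overlapping.
        body.position = ((i % 20) * 12.0, (i // 20) * 12.0)
        space.add(body, pymunk.Circle(body, radius))
    return space


space = make_space()
for _ in range(60):
    space.step(1 / 60)
print("threads in use:", space.threads)
```

Reading space.threads back after setting it is a quick way to check whether the setting took effect on your platform.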
You can verify that it works by running the threaded_space example:
python -m pymunk.examples.threaded_space
If I do that and watch CPU usage in htop, I can easily see when it goes from 1 to 2 threads.

I should also add that, unfortunately, multi-threading does not improve performance much in most cases. It can only help if there are a lot of collisions going on at the same time, and even then the gain is small. (It may give a bigger improvement on ARM, but I cannot test that.)
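To see how much the extra thread helps for your own scene, you could time the same simulation with 1 and 2 threads. This is only a rough sketch: the function name time_steps, the scene (circles piling up on a floor) and the sizes are assumptions, and the numbers will vary by machine, so no expected output is shown.

```python
import time

import pymunk


def time_steps(threads, n_circles=500, steps=100):
    """Return the wall-clock seconds taken by `steps` steps with `threads` threads."""
    space = pymunk.Space(threaded=True)
    space.threads = threads
    space.gravity = (0.0, -900.0)

    # A static floor so the circles pile up and actually collide.
    floor = pymunk.Segment(space.static_body, (-50.0, -10.0), (400.0, -10.0), 5.0)
    space.add(floor)

    radius = 5.0
    for i in range(n_circles):
        body = pymunk.Body(1.0, pymunk.moment_for_circle(1.0, 0.0, radius))
        body.position = ((i % 25) * 11.0, (i // 25) * 11.0)
        space.add(body, pymunk.Circle(body, radius))

    start = time.perf_counter()
    for _ in range(steps):
        space.step(1 / 60)
    return time.perf_counter() - start


timings = {n: time_steps(n) for n in (1, 2)}
for n, seconds in timings.items():
    print(f"{n} thread(s): {seconds:.3f} s")
```

If the two timings are close, the scene probably does not have enough simultaneous collisions for the threaded solver to matter.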