I've been working on a series of automatic load-testing scripts, and I've noticed that when averaged out, there's no difference between running a cluster of 2 processes and 4 processes on a Heroku dyno (in this case, a Hapi.js server which just immediately returns a reply), despite the dyno reporting itself as having four available CPUs. The difference between 1 and 2 processes is huge, nearly a 100% increase in throughput.
My guess is that Intel hyperthreading makes the dyno report twice as many cores as are physically available, and Node doesn't really gain anything from the extra logical cores, but there seems to be very little information available about the specs of Heroku dynos. Is this accurate, or is there another reason performance caps out at 2 processes on a server with no I/O?
This is due to several reasons:
If you're doing CPU-intensive stuff, you'll need to scale horizontally across dynos. If you're doing I/O-intensive stuff, you should be fine vertically scaling to larger dynos over time =)
UPDATE: To add more info here, this is the way virtualization works. EC2 boxes (and any Linux servers) will always report the total number of CPUs of the host machine, not the VM. Hope that helps.
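Since the reported CPU count may not match what the dyno can actually use, one option is to not size the cluster from `os.cpus().length` alone. Here's a rough sketch of that idea, not a definitive setup. The helper names `pickWorkerCount` and `startCluster` are made up for illustration, and I'm assuming the worker count is supplied through an environment variable called `WEB_CONCURRENCY`; check what your platform actually provides, or set it yourself based on your load-test results.

```javascript
const cluster = require('cluster');
const os = require('os');

// Hypothetical helper: prefer an explicitly configured worker count
// (assumed here to live in WEB_CONCURRENCY) over the CPU count the OS reports.
function pickWorkerCount(env, reportedCpus) {
  const requested = Number.parseInt(env.WEB_CONCURRENCY, 10);
  if (Number.isInteger(requested) && requested > 0) {
    return requested;
  }
  // Fall back to the reported count, but never fork fewer than one worker.
  return Math.max(1, reportedCpus);
}

// Hypothetical helper: fork workers in the primary process,
// and run the supplied server start function in each worker.
function startCluster(startWorker) {
  // isPrimary exists on newer Node versions, isMaster on older ones.
  if (cluster.isPrimary || cluster.isMaster) {
    const reported = os.cpus().length;
    const count = pickWorkerCount(process.env, reported);
    console.log(`OS reports ${reported} CPUs, forking ${count} workers`);
    for (let i = 0; i < count; i += 1) {
      cluster.fork();
    }
  } else {
    startWorker();
  }
}
```

You'd call `startCluster` with whatever function boots your Hapi server, then rerun the load test with different `WEB_CONCURRENCY` values to find where throughput stops improving.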