Consider a situation where there is a really large table (up to several TB) and, for some reason, an application wants to scan this table in full. There are multiple CPU cores on both the application side and the PostgreSQL server side, which suggests splitting the table into multiple parts and reading them concurrently in parallel threads. What would be the best way to achieve that?
Does it make sense to emulate table "pagination" with OFFSET / LIMIT like this?
SELECT * FROM table LIMIT 1000000 OFFSET 0
SELECT * FROM table LIMIT 1000000 OFFSET 1000000
SELECT * FROM table LIMIT 1000000 OFFSET 2000000
(...)
Each SELECT would then be executed in a distinct application thread.
Or is it worth trying PostgreSQL's native table partitioning, so that each partition (a distinct table, in fact) could be read independently?
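For reference, this is roughly the layout I have in mind; the table name, columns, and range boundaries below are made up for illustration:

CREATE TABLE big_table (
    id      bigint NOT NULL,
    payload text
) PARTITION BY RANGE (id);

CREATE TABLE big_table_p0 PARTITION OF big_table FOR VALUES FROM (0) TO (10000000);
CREATE TABLE big_table_p1 PARTITION OF big_table FOR VALUES FROM (10000000) TO (20000000);
(...)

-- each application thread would then read one partition directly
SELECT * FROM big_table_p0;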
Another important point is consistency: the table may of course change while it is being scanned. Does PostgreSQL provide any kind of transaction mechanism that would let independent SELECTs work against a consistent view of the table, without seeing data brought in by later requests?
PostgreSQL has support for parallel queries.
But parallelization isn't a magic bullet.
I'm not going to go into detail here; the documentation covers how to configure Postgres to use parallel workers and perform parallel sequential scans.
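As a rough sketch of what that configuration looks like (the table name big_table and the specific values are placeholders, and the server-wide limits max_worker_processes and max_parallel_workers must also allow enough workers):

SET max_parallel_workers_per_gather = 8;   -- workers the planner may put under one Gather node
SET min_parallel_table_scan_size = '8MB';  -- size threshold for even considering a parallel scan

EXPLAIN SELECT count(*) FROM big_table;
-- you want to see a plan shaped roughly like:
--   Finalize Aggregate
--     ->  Gather
--           Workers Planned: 8
--           ->  Partial Aggregate
--                 ->  Parallel Seq Scan on big_table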
At a certain point it doesn't matter how many CPUs you have: you're going to run into the limits of disk I/O. Even the best SSDs will only read around 10 GB/s, so several terabytes will take at least a few minutes of pure read time (for example, 3 TB at 10 GB/s is roughly 300 seconds before any CPU work). It would be better to question whether you can avoid having to seq scan the table at all.
Possibly, but table partitioning is also not a magic performance bullet. Without more details I can't say.
A 3rd option is to move this massive blob of data into a data warehouse optimized for these sorts of big queries.
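As for the consistency part of the question: yes, PostgreSQL lets several independent connections share one snapshot so they all see the same version of the table (this is the same mechanism parallel pg_dump uses). A minimal sketch, assuming one coordinating connection plus one connection per worker; big_table and the id range are placeholders:

-- coordinator connection: open a transaction, export its snapshot,
-- and keep this transaction open until all workers have finished
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT pg_export_snapshot();        -- returns an id such as '00000003-0000001B-1'

-- each worker connection: adopt that snapshot before its first query
BEGIN ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '00000003-0000001B-1';           -- the id returned above
SELECT * FROM big_table WHERE id >= 0 AND id < 1000000;   -- this worker's slice
COMMIT;

Splitting the work by ranges of an indexed key, as in the WHERE clause above, also scales much better than large OFFSETs, because OFFSET still has to read and discard all the preceding rows.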