In a Postgres DB (9.4), we have a table with 35 million rows. The data is almost static: it is first loaded with a COPY, then a few fields are calculated.
An example of calculation is:
UPDATE mytable
SET the_name = the_rank || '_' || the_number || '_' || the_name
WHERE the_partition=5;
the_partition can take 5 different values, so the computation is not done on the whole table; instead, 5 queries are needed to update the_name, one per partition value, as shown below.
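Concretely, the recalculation is just the same statement repeated per partition value (assuming for the example that the values are simply 1 to 5; the exact values don't matter):

UPDATE mytable
SET the_name = the_rank || '_' || the_number || '_' || the_name
WHERE the_partition=1;
-- ... and the same statement for the_partition = 2, 3, 4 and 5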
When 500 000 rows are updated, it takes about 30 s. When 5 000 000 rows are updated, it should take about 300 s, i.e. 5 minutes.
But for 15 000 000 rows, it still hadn't finished after many hours...
I suspect autovacuum of not doing its job properly, so I tried to make it more aggressive, expecting it to vacuum the table after every 500 000 updated rows:
ALTER TABLE mytable SET (autovacuum_vacuum_scale_factor = 0.00);
ALTER TABLE mytable SET (autovacuum_analyze_scale_factor = 0.00);
ALTER TABLE mytable SET (autovacuum_vacuum_threshold = 500000);
ALTER TABLE mytable SET (autovacuum_analyze_threshold = 500000);
ALTER TABLE mytable SET (autovacuum_vacuum_cost_limit = 10000);
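To check whether autovacuum actually runs on the table, I look at the standard statistics view pg_stat_user_tables (dead tuple count and last autovacuum run):

SELECT relname, n_live_tup, n_dead_tup, last_autovacuum, autovacuum_count
FROM pg_stat_user_tables
WHERE relname = 'mytable';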
But as I understand it, a big update of 15 000 000 rows runs in a single transaction, so autovacuum cannot reclaim anything while it runs, and 15 000 000 new row versions accumulate until the query finishes (since PostgreSQL keeps a copy of each updated row). So how can I make the computation time roughly linear?
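The only workaround I can think of is to split each partition's update into batches committed separately, so that dead rows can be vacuumed between batches, for example over ranges of an id column (assuming mytable has such a column), roughly like:

UPDATE mytable
SET the_name = the_rank || '_' || the_number || '_' || the_name
WHERE the_partition=5 AND id BETWEEN 1 AND 500000;
-- commit, optionally run VACUUM mytable, then repeat for the next id range

Is something like that really necessary, or is there a better way?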