Function in Postgres taking too long; how to trace how many records have been updated


I have a scenario where I need to update a huge number of records (around 5 million) in a fact table based on business keys coming from a dimension table (1.2 million records).

The update rewrites the dimension surrogate key stored in the fact table: the dimension ended up with duplicate records, so the old surrogate key values referenced by the fact table need to be replaced with the new ones.

So I have created a procedure that does the following (a simplified sketch appears after the list):

  1. Get all the business keys from the dimension that have duplicate values; the loop fires once for each combination of these business keys.
  2. Get the surrogate key from the dimension where the record is old and needs to be replaced.
  3. Get the surrogate key from the dimension where the record is new and will replace the old value.
  4. Run an update that replaces the old surrogate key in the fact table with the new one.
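
A simplified sketch of that procedure is below. The table and column names (`dim_customer`, `fact_sales`, `business_key`, `surrogate_key`, `customer_sk`) and the `is_current` flag that tells old rows from new ones are placeholders, not my real schema:

```sql
CREATE OR REPLACE PROCEDURE fix_surrogate_keys()
LANGUAGE plpgsql
AS $$
DECLARE
    rec    RECORD;
    old_sk BIGINT;
    new_sk BIGINT;
BEGIN
    -- 1. Loop over every business key that has duplicate dimension rows
    FOR rec IN
        SELECT business_key
        FROM dim_customer
        GROUP BY business_key
        HAVING COUNT(*) > 1
    LOOP
        -- 2. Surrogate key of the old row, which must be replaced
        SELECT surrogate_key INTO old_sk
        FROM dim_customer
        WHERE business_key = rec.business_key
          AND NOT is_current;

        -- 3. Surrogate key of the new row, which replaces the old value
        SELECT surrogate_key INTO new_sk
        FROM dim_customer
        WHERE business_key = rec.business_key
          AND is_current;

        -- 4. Rewrite the fact rows that still point at the old key
        UPDATE fact_sales
        SET customer_sk = new_sk
        WHERE customer_sk = old_sk;
    END LOOP;
END;
$$;
```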

For smaller tables it worked fine, but for this huge table it has been running for more than 4 days and the process is still executing.

Is there any way to make the process faster? I have already created indexes on the tables.
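
One idea I considered (untested) is replacing the per-key loop with a single set-based UPDATE over an old-to-new mapping, using the same placeholder names as above:

```sql
-- Untested sketch: build the old -> new surrogate-key mapping once,
-- then update the fact table in a single set-based statement.
WITH mapping AS (
    SELECT o.surrogate_key AS old_sk,
           n.surrogate_key AS new_sk
    FROM dim_customer o
    JOIN dim_customer n
      ON n.business_key = o.business_key
     AND n.is_current
    WHERE NOT o.is_current
)
UPDATE fact_sales f
SET customer_sk = m.new_sk
FROM mapping m
WHERE f.customer_sk = m.old_sk;
```

Would something along these lines be expected to outperform the loop?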

When I check how many records have been updated, it shows 0, because the procedure only commits after the entire process finishes. Can we commit in between so that we can track progress?
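
If intermediate commits are possible, I imagine something like the following, which commits every 1,000 business keys. My understanding is that this requires PostgreSQL 11 or later, where a procedure invoked with CALL may issue COMMIT (functions may not); the batch size and counter are my own guesses:

```sql
CREATE OR REPLACE PROCEDURE fix_surrogate_keys_batched()
LANGUAGE plpgsql
AS $$
DECLARE
    rec       RECORD;
    processed BIGINT := 0;
BEGIN
    FOR rec IN
        SELECT business_key
        FROM dim_customer
        GROUP BY business_key
        HAVING COUNT(*) > 1
    LOOP
        -- ... same per-key lookup and UPDATE as in the sketch above ...

        processed := processed + 1;
        IF processed % 1000 = 0 THEN
            COMMIT;  -- rows updated so far become visible to other sessions
            RAISE NOTICE '% business keys processed', processed;
        END IF;
    END LOOP;
    COMMIT;  -- final commit for the remainder
END;
$$;
```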

The database I am using is PostgreSQL. After creating the function in the database, I call it through a simple Pentaho ETL execute step.
