I am new to Oracle and I am trying to improve the performance of updating a huge table (500+ million records) - no matter what I try, the performance is poor.

I am trying to UPDATE a table with 500 million records using 50 parallel SQL*Plus sessions with BULK COLLECT LIMIT and FORALL, which takes a very long time (Elapsed: 04:32:30.29).

  CURSOR NIT_CURSOR IS
  SELECT SA_ROWID, NEW_TRX_1, NEW_TRX_2
  FROM NIT_SA
  WHERE MOD(DBMS_ROWID.rowid_block_number(SA_ROWID),&2) = &1;
Note: &2 is 50, whereas &1 is the instance number passed from the shell (0-49).
SA_ROWID is the actual ROWID from the SA table.

Now, the UPDATE statement is as below:

  C_COMMIT         CONSTANT PLS_INTEGER := 10000;

  OPEN NIT_CURSOR;
  LOOP
    FETCH NIT_CURSOR BULK COLLECT INTO BLK_NIT_SA LIMIT C_COMMIT;
    EXIT WHEN BLK_NIT_SA.COUNT = 0;

    FORALL indx IN 1 .. BLK_NIT_SA.COUNT
      UPDATE SA
      SET INS_TRX_ID = BLK_NIT_SA(indx).NEW_TRX_1,
          UPD_TRX_ID = BLK_NIT_SA(indx).NEW_TRX_2
      WHERE ROWID = BLK_NIT_SA(indx).SA_ROWID;

    COMMIT;
  END LOOP;
  CLOSE NIT_CURSOR;

This script is invoked by KSH in 50 parallel SQL*Plus sessions:

typeset -A sqlFile
sqlFile[2.3.sql]=50
for counter_i in "${!sqlFile[@]}"
do
  # Brace expansion does not work with variables; use an arithmetic loop instead.
  for ((counter_j = 0; counter_j < ${sqlFile[$counter_i]}; counter_j++))
  do
     sqlplus -s ${DATABASE_CONNECT} @${counter_i} "${SPOOLFILE}" ${sqlFile[$counter_i]} ${counter_j} &
  done
done
wait   # wait for all background sqlplus sessions to finish

As per my understanding, since I am accessing rows of table SA directly by ROWID, which should be the fastest access path, 50 parallel sessions should process the data much faster. Instead, I observe the opposite: as soon as I increase the number of parallel SQL*Plus sessions from 50 to 100, the elapsed time of each process grows by about 2 hours, i.e. from 4.5 hours to 7.5 hours.

I intend to finish it in 1.5 to 2 hours; I am not sure if that is realistic.

Can someone please help me with the above?

2 Answers

Jon Heller:

Converting the PL/SQL into a single SQL statement and using statement parallelism should significantly improve the performance:

alter session enable parallel dml;

merge /*+ parallel(50) */ into sa using
(
    select sa_rowid, new_trx_1, new_trx_2
    from nit_sa
) nit_sa
on (sa.rowid = nit_sa.sa_rowid)
when matched then update set
    sa.ins_trx_id = nit_sa.new_trx_1,
    sa.upd_trx_id = nit_sa.new_trx_2;

The above statement will read from the table only once. Oracle will automatically divide the table into pieces called granules; there is no need to manually partition the table.

Parallelism is easy in theory - just add a hint and the run time can improve by an order of magnitude. But getting it right in practice can be tricky. Parallelism requires Enterprise Edition, sufficient resources, a sane configuration, etc. And running parallel DML will lock the entire table until the statement is complete - no other DML can run on the table at the same time. (Although other processes can still read from the table.)
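As a quick sanity check that parallel DML actually kicked in, you can query the session statistics or the actual execution plan. (A sketch, assuming you have access to the V$ views and DBMS_XPLAN; verify the statistic name on your Oracle version.)

  -- In the same session, after the MERGE but before COMMIT:
  select * from v$pq_sesstat where statistic = 'DML Parallelized';

  -- Or inspect the actual execution plan of the last statement;
  -- look for PX operators and an absence of "PDML disabled" notes:
  select * from table(dbms_xplan.display_cursor);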

Ramki:

Updating 500+ million records in place is not a good idea. It would help if you shared the table structure and which columns need to be updated. We can consider the following options:

  1. If the required new values can be calculated dynamically, create a view on top of the table, so that whenever we read the view we get the required values.

  2. Create a new table and insert all the data from the existing table as required (with the new values for those columns), then drop the old table and rename the new one.
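Option 2 can be sketched roughly as below. (A hedged sketch, not a drop-in script: the table name SA_NEW, the degree of parallelism, the NOLOGGING choice, and the join back to NIT_SA by ROWID are assumptions based on the question; the unchanged-column list must be filled in from the real table definition, and indexes, constraints, and grants must be recreated before the swap.)

  alter session enable parallel ddl;

  create table sa_new parallel 16 nologging as
  select /*+ parallel(16) */
         -- ... all unchanged columns of SA go here ...
         nvl(n.new_trx_1, s.ins_trx_id) as ins_trx_id,
         nvl(n.new_trx_2, s.upd_trx_id) as upd_trx_id
  from sa s
  left join nit_sa n on n.sa_rowid = s.rowid;

  -- Recreate indexes, constraints, and grants on SA_NEW, then swap:
  drop table sa;
  rename sa_new to sa;

A direct-path CREATE TABLE AS SELECT writes the data once above the high-water mark, which is usually far cheaper than updating every row in place and generating undo/redo for each change.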