Should I use NOT IN or NOT EXISTS when UPDATING DATA

75 views Asked by At

sql

UPDATE PERSON
SET ENROLMENT_DATE = CURRENT_TIMESTAMP
WHERE NOT EXISTS (SELECT * FROM FEE_PAYMENT WHERE FEE_PAYMENT.PERSON_ID = PERSON.PERSON_ID);

UPDATE PERSON
SET ENROLMENT_DATE = CURRENT_TIMESTAMP
WHERE PERSON_ID NOT IN (SELECT  FEE_PAYMENT.PERSON_ID FROM FEE_PAYMENT);

Which one is better and why? I like the NOT IN in this case because it reads easier but I know I can have problems if I stumble upon a null value.

2

There are 2 answers

0
Littlefoot On

I guess your best option is to actually compare both versions of your code, check what explain plan says and then decide which one to use.

Though, note that - on small data sets - you won't notice any difference.

An alternative might also be merge, e.g.

merge into person a
  using fee_payment b
  on (a.person_id = b.person_id)
when matched then update set
  a.enrollment_date = current_timestamp;

Also, at the first sight, index on both person_id columns would probably improve performance.

0
Paul W On

From a performance standpoint, it usually doesn't matter which you choose. Oracle will rewrite either of them to the form it thinks best for performance, either a nested loops against an index or a hash anti-join. Which of the two ways you've written it won't compel it to select one or the other.

That being said, it is much harder to control subqueries with hints than it is regular joins. So if Oracle makes a mistake about cardinality estimates and chooses the wrong join method or doesn't run the subquery at the proper time relative to other joins in the main query block, it is often helpful to rewrite these as normal joins so that you can easily control its behavior with hints (assuming the programmer has sufficient knowledge of how Oracle works and how hints affect the optimizer to safely leverage them). For NOT IN / NOT EXISTS that means writing an outer join and testing for nullity on the join key. For selects it would be:

SELECT p.*
  FROM person p,
       fee_payment fp
 WHERE p.person_id = fp.person_id(+)
   AND fp.person_id IS NULL

Or using ANSI-syntax:

SELECT p.*
  FROM person p
       LEFT OUTER JOIN fee_payment fp ON p.person_id = fp.person_id
 WHERE fp.person_id IS NULL

You can then easily hint it if needed:

SELECT /*+ ORDERED USE_NL_WITH_INDEX(fp) */ p.*
  FROM person p,
       fee_payment fp
 WHERE p.person_id = fp.person_id(+)
   AND fp.person_id IS NULL

Or

SELECT /*+ ORDERED USE_HASH(fp) */ p.*
  FROM person p,
       fee_payment fp
 WHERE p.person_id = fp.person_id(+)
   AND fp.person_id IS NULL

Or however you need it.

With update and delete statements this usually requires using merge instead so you can use a normal select with joins to isolate the rows to be modified:

 MERGE INTO person tgt
 USING (SELECT /*+ ORDERED USE_HASH(fp) PARALLEL(4) */ -- or whatever you need
               p.rowid row_id
          FROM person p,
               fee_payment fp
         WHERE p.person_id = fp.person_id(+)
           AND fp.person_id IS NULL) src
   ON (tgt.rowid = src.row_id)
 WHEN MATCHED THEN UPDATE SET tgt.enrolment_date = current_timestamp