netezza left outer join query performance

823 views Asked by At

I have a question related to Netezza query performance .I have 2 tables Table A and Table B and Table B is the sub set of Table A with data alteration .I need to update those new values to table A from table B We can have 2 approaches here

1) Left outer join and select relevant columns and insert in target table

2) Insert table a data into target table and update those values from tableB using join

I tried both and logically both are same.But Explain plan is giving different cost

for normal select

a)Sub-query Scan table "TM2" (cost=0.1..1480374.0 rows=8 width=4864 conf=100)

update

b)Hash Join (cost=356.5..424.5 rows=2158 width=27308 conf=21)

for left outer join

Sub-query Scan table "TM2" (cost=51.0..101474.8 rows=10000000 width=4864 conf=100)

From this I feel left outer join is better .Can anyone put some thought on this and guide

Thanks

1

There are 1 answers

1
Jeremy Fortune On

The reason that the cost of insert into table_c select ... from table_a; update table_c set ... from table_b; is higher is because you're inserting, deleting, then inserting. Updates in Netezza mark the records to be updated as deleted, then inserts new rows with the updated values. Once the data is written to an extent, it's never (to my knowledge) altered.

With insert into table_c select ... from table_a join table_b using (...); you're only inserting once, thereby only updating all the zone maps once. The cost will be noticeably lower.

Netezza does an excellent job of keeping you away from the disk on reads, but it will write to the disk as often as you tell it to. In the case of updates, seemingly more so. Try to only write as often as is necessary to gain benefits of new distributions and co-located joins. Any more than that, and you're just using excess commit actions.