We are using Direct-Path approach to fill one of our Data Marts. We are also pointing the exact subpartition to which insert should be done. Here is the sql:
INSERT /*+ APPEND_VALUES */ INTO DWH.DM_CORE_AC_SALDOS SUBPARTITION(SYS_SUBP7709) VALUES (:B1 ,:B2 ,:B3 ,:B4 ,:B5 ,:B6 ,:B7 ,:B8 ,:B9 ,:B10 ,:B11 ,:B12 ,:B13 ,:B14 ,:B15 ,:B16 ,:B17 ,:B18 ,:B19 ,:B20 ,:B21 )
All process is controlled by dbms_parallel_execute with parallel level of 40. That is there are 40 separate sessions doing their own insert to separate subpartitions. However, the process is taking too long to finish, in sessions windows we can see that sessions are not locking each other, but they have wait class 'User I/O' and wait state 'WAITING' . Here is the snapshot of how sessions windows looks during the process:
Is there anything we are doing wrong? Maybe we should have set some database parameters right or changed a table structure (initrans or pctfree) ?
Edit : Almost all sessions have event 'db file sequential read'. I suppose reason why it is running slow may be because of an index maintenance operations. I checked file# from dba_data_files and most of the show files allocated for index tablespace. I read that direct path insert does index maintanance in the end of insert so may be this is a problem. In this case should I avoid using direct path insert?

Using direct path writes takes exclusive locks on the corresponding table/partitions. Only one session at a time can do direct-path writes on an object.
Running 40 parallel processes won't help here. You're running 40 transactions one after the other. If you want to use direct path writes, you're better off running it all the same transaction; ditch dbms_parallel_execute.