For a while I've trying to get the time to write this question and explain the issue as best as I can, so forgive me in advance for the long text.
My environment:
- Oracle Database 12.2 running on Red Hat 7 ( R.A.C 2 nodes ) - 16CPU and 64GB RAM per node.
- Parallel Force Local is set to TRUE to force that parallel server processes can only execute on the same node where the SQL statement was started.
We have a very big database that contains a lot of schemas which serve several applications. Most of the applications are in reality batch processing engines in PL/SQL that treat hundredths of millions of records, so most of the big tables have been configured with PARALLEL DEGREE DEFAULT for performance reasons. Tables are partitioned and with advanced compression.
Besides some reporting BI tools for exploitation purposes, many end users have access with SQL Developer to the System ( only in read mode ) for QA checks. I never liked, but sometimes you have to accept how things are.
To control a bit the things, I designed a specific logon trigger that covers not only auditing features but also some aspects of the incoming sessions:
- An End User accessing with SQL Developer might open only 2 sessions.
- An End User accessing with SQL Developer runs an execute immediate ALTER SESSION DISABLE PARALLEL QUERY. Unfortunately, I know that some users are activating it by themselves. The ALTER SESSION ENABLE/DISABLE PARALLEL QUERY is inherited by the CREATE SESSION privilege or CONNECT role, so I can't do anything about this.
- An End User accessing with SQL Developer is assigned to a specific profile with restrictions in terms of CPU , DISK READS, etc.
The logon triggers allows or not the access based on a set of additional rules , but for the purpose of the question they don't matter.
Let's see how queries running on SQL Developer behave against tables with PARALLEL enable:
Scenario
I have a table containing 8 billion records split by different partitions. A user log in with sql developer and runs this query
SELECT COUNT(*) FROM MY_SCHEMA.MY_TABLE PARTITION ( MY_PARTITION ) ;
183.940.801 rows
As the table has no indexes, the CBO runs a TABLE FULL SCAN in PARALLEL using as many slaves as it considers. It takes 6 seconds to complete. So far, nothing wrong.
Meanwhile I am monitoring the session ( you can see all sessions active when the query is running )
INSTANCE SID USERNAME PROGRAM SQL_ID STATUS
---------- ------- -------------------- -------------------------------------------------- -------------------- --------
2 6 FDM_ADM_GRID [email protected] (P00N) dtf8d89xg7muq ACTIVE
2 128 FDM_ADM_GRID [email protected] (P004) dtf8d89xg7muq ACTIVE
2 140 FDM_ADM_GRID [email protected] (P007) dtf8d89xg7muq ACTIVE
2 256 FDM_ADM_GRID [email protected] (P00O) dtf8d89xg7muq ACTIVE
2 284 FDM_ADM_GRID [email protected] (P00D) dtf8d89xg7muq ACTIVE
2 388 FDM_ADM_GRID [email protected] (P00C) dtf8d89xg7muq ACTIVE
2 400 FDM_ADM_GRID [email protected] (P00P) dtf8d89xg7muq ACTIVE
2 510 FDM_ADM_GRID [email protected] (P00E) dtf8d89xg7muq ACTIVE
2 621 FDM_ADM_GRID [email protected] (P00F) dtf8d89xg7muq ACTIVE
2 641 FDM_ADM_GRID [email protected] (P00Q) dtf8d89xg7muq ACTIVE
2 739 FDM_ADM_GRID [email protected] (P008) dtf8d89xg7muq ACTIVE
2 771 FDM_ADM_GRID [email protected] (P005) dtf8d89xg7muq ACTIVE
2 888 FDM_ADM_GRID [email protected] (P00R) dtf8d89xg7muq ACTIVE
2 893 FDM_ADM_GRID [email protected] (P00G) dtf8d89xg7muq ACTIVE
2 996 FDM_ADM_GRID [email protected] (P00H) dtf8d89xg7muq ACTIVE
2 1010 FDM_ADM_GRID [email protected] (P00S) dtf8d89xg7muq ACTIVE
2 1015 FDM_ADM_GRID SQL Developer dtf8d89xg7muq ACTIVE
2 1109 FDM_ADM_GRID [email protected] (P00T) dtf8d89xg7muq ACTIVE
2 1116 FDM_ADM_GRID [email protected] (P00I) dtf8d89xg7muq ACTIVE
2 1230 FDM_ADM_GRID [email protected] (P00J) dtf8d89xg7muq ACTIVE
2 1254 FDM_ADM_GRID [email protected] (P00U) dtf8d89xg7muq ACTIVE
2 1352 FDM_ADM_GRID [email protected] (P009) dtf8d89xg7muq ACTIVE
2 1376 FDM_ADM_GRID [email protected] (P001) dtf8d89xg7muq ACTIVE
2 1383 FDM_ADM_GRID [email protected] (P006) dtf8d89xg7muq ACTIVE
2 1477 FDM_ADM_GRID [email protected] (P00V) dtf8d89xg7muq ACTIVE
2 1488 FDM_ADM_GRID [email protected] (P000) dtf8d89xg7muq ACTIVE
2 1506 FDM_ADM_GRID [email protected] (P00K) dtf8d89xg7muq ACTIVE
2 1604 FDM_ADM_GRID [email protected] (P002) dtf8d89xg7muq ACTIVE
2 1617 FDM_ADM_GRID [email protected] (P00L) dtf8d89xg7muq ACTIVE
2 1620 FDM_ADM_GRID [email protected] (P00A) dtf8d89xg7muq ACTIVE
2 1740 FDM_ADM_GRID [email protected] (P003) dtf8d89xg7muq ACTIVE
2 1743 FDM_ADM_GRID [email protected] (P00M) dtf8d89xg7muq ACTIVE
2 1851 FDM_ADM_GRID [email protected] (P00B) dtf8d89xg7muq ACTIVE
Once the query finishes
SQL> r
1* select inst_id as instance , sid, username, program, sql_id, status from gv$session where username = 'FDM_ADM_GRID'
INSTANCE SID USERNAME PROGRAM SQL_ID STATUS
---------- ------- -------------------- -------------------------------------------------- -------------------- --------
2 1015 FDM_ADM_GRID SQL Developer INACTIVE
So far so good. Now let's run this other query in SQL Developer
SELECT * FROM MY_SCHEMA.MY_TABLE partition ( MY_PARTITION ) fetch first 1000 rows only;
The query retrieves the first 1000 rows almost instantly. But let's see the difference in the database
When it was running
SQL> r
1* select inst_id as instance , sid, username, program, sql_id, status from gv$session where username = 'FDM_ADM_GRID'
INSTANCE SID USERNAME PROGRAM SQL_ID STATUS
---------- ------- -------------------- -------------------------------------------------- -------------------- --------
2 6 FDM_ADM_GRID [email protected] (P00N) 9jyvj64ag15mv ACTIVE
2 128 FDM_ADM_GRID [email protected] (P007) 9jyvj64ag15mv ACTIVE
2 140 FDM_ADM_GRID [email protected] (P004) 9jyvj64ag15mv ACTIVE
2 256 FDM_ADM_GRID [email protected] (P00D) 9jyvj64ag15mv ACTIVE
2 284 FDM_ADM_GRID [email protected] (P00O) 9jyvj64ag15mv ACTIVE
2 388 FDM_ADM_GRID [email protected] (P00P) 9jyvj64ag15mv ACTIVE
2 400 FDM_ADM_GRID [email protected] (P00C) 9jyvj64ag15mv ACTIVE
2 510 FDM_ADM_GRID [email protected] (P00E) 9jyvj64ag15mv ACTIVE
2 621 FDM_ADM_GRID [email protected] (P00F) 9jyvj64ag15mv ACTIVE
2 641 FDM_ADM_GRID [email protected] (P00Q) 9jyvj64ag15mv ACTIVE
2 739 FDM_ADM_GRID [email protected] (P008) 9jyvj64ag15mv ACTIVE
2 771 FDM_ADM_GRID [email protected] (P005) 9jyvj64ag15mv ACTIVE
2 888 FDM_ADM_GRID [email protected] (P00G) 9jyvj64ag15mv ACTIVE
2 893 FDM_ADM_GRID [email protected] (P00R) 9jyvj64ag15mv ACTIVE
2 996 FDM_ADM_GRID [email protected] (P00H) 9jyvj64ag15mv ACTIVE
2 1010 FDM_ADM_GRID [email protected] (P00S) 9jyvj64ag15mv ACTIVE
2 1015 FDM_ADM_GRID SQL Developer ACTIVE
2 1109 FDM_ADM_GRID [email protected] (P00I) 9jyvj64ag15mv ACTIVE
2 1116 FDM_ADM_GRID [email protected] (P00T) 9jyvj64ag15mv ACTIVE
2 1230 FDM_ADM_GRID [email protected] (P00J) 9jyvj64ag15mv ACTIVE
2 1254 FDM_ADM_GRID [email protected] (P00U) 9jyvj64ag15mv ACTIVE
2 1352 FDM_ADM_GRID [email protected] (P006) 9jyvj64ag15mv ACTIVE
2 1376 FDM_ADM_GRID [email protected] (P009) 9jyvj64ag15mv ACTIVE
2 1383 FDM_ADM_GRID [email protected] (P001) 9jyvj64ag15mv ACTIVE
2 1477 FDM_ADM_GRID [email protected] (P000) 9jyvj64ag15mv ACTIVE
2 1488 FDM_ADM_GRID [email protected] (P00V) 9jyvj64ag15mv ACTIVE
2 1506 FDM_ADM_GRID [email protected] (P00K) 9jyvj64ag15mv ACTIVE
2 1604 FDM_ADM_GRID [email protected] (P002) 9jyvj64ag15mv ACTIVE
2 1617 FDM_ADM_GRID [email protected] (P00A) 9jyvj64ag15mv ACTIVE
2 1620 FDM_ADM_GRID [email protected] (P00L) 9jyvj64ag15mv ACTIVE
2 1740 FDM_ADM_GRID [email protected] (P003) 9jyvj64ag15mv ACTIVE
2 1743 FDM_ADM_GRID [email protected] (P00M) 9jyvj64ag15mv ACTIVE
2 1851 FDM_ADM_GRID [email protected] (P00B) 9jyvj64ag15mv ACTIVE
As soon as the query completes, I check back again, however now all the slaves are still there and in status ACTIVE. I would have though that as soon as the QC is finished and marked as inactive, Oracle would have closed all slave processes. But it doesn't
`INSTANCE SID USERNAME PROGRAM` SQL_ID STATUS
---------- ------- -------------------- -------------------------------------------------- -------------------- --------
2 6 FDM_ADM_GRID [email protected] (P00N) 9jyvj64ag15mv ACTIVE
2 128 FDM_ADM_GRID [email protected] (P007) 9jyvj64ag15mv ACTIVE
2 140 FDM_ADM_GRID [email protected] (P004) 9jyvj64ag15mv ACTIVE
2 256 FDM_ADM_GRID [email protected] (P00D) 9jyvj64ag15mv ACTIVE
2 284 FDM_ADM_GRID [email protected] (P00O) 9jyvj64ag15mv ACTIVE
2 388 FDM_ADM_GRID [email protected] (P00P) 9jyvj64ag15mv ACTIVE
2 400 FDM_ADM_GRID [email protected] (P00C) 9jyvj64ag15mv ACTIVE
2 510 FDM_ADM_GRID [email protected] (P00E) 9jyvj64ag15mv ACTIVE
2 621 FDM_ADM_GRID [email protected] (P00F) 9jyvj64ag15mv ACTIVE
2 641 FDM_ADM_GRID [email protected] (P00Q) 9jyvj64ag15mv ACTIVE
2 739 FDM_ADM_GRID [email protected] (P008) 9jyvj64ag15mv ACTIVE
2 771 FDM_ADM_GRID [email protected] (P005) 9jyvj64ag15mv ACTIVE
2 888 FDM_ADM_GRID [email protected] (P00G) 9jyvj64ag15mv ACTIVE
2 893 FDM_ADM_GRID [email protected] (P00R) 9jyvj64ag15mv ACTIVE
2 996 FDM_ADM_GRID [email protected] (P00H) 9jyvj64ag15mv ACTIVE
2 1010 FDM_ADM_GRID [email protected] (P00S) 9jyvj64ag15mv ACTIVE
2 1015 FDM_ADM_GRID SQL Developer INACTIVE
2 1109 FDM_ADM_GRID [email protected] (P00I) 9jyvj64ag15mv ACTIVE
2 1116 FDM_ADM_GRID [email protected] (P00T) 9jyvj64ag15mv ACTIVE
2 1230 FDM_ADM_GRID [email protected] (P00J) 9jyvj64ag15mv ACTIVE
2 1254 FDM_ADM_GRID [email protected] (P00U) 9jyvj64ag15mv ACTIVE
2 1352 FDM_ADM_GRID [email protected] (P006) 9jyvj64ag15mv ACTIVE
2 1376 FDM_ADM_GRID [email protected] (P009) 9jyvj64ag15mv ACTIVE
2 1383 FDM_ADM_GRID [email protected] (P001) 9jyvj64ag15mv ACTIVE
2 1477 FDM_ADM_GRID [email protected] (P000) 9jyvj64ag15mv ACTIVE
2 1488 FDM_ADM_GRID [email protected] (P00V) 9jyvj64ag15mv ACTIVE
2 1506 FDM_ADM_GRID [email protected] (P00K) 9jyvj64ag15mv ACTIVE
2 1604 FDM_ADM_GRID [email protected] (P002) 9jyvj64ag15mv ACTIVE
2 1617 FDM_ADM_GRID [email protected] (P00A) 9jyvj64ag15mv ACTIVE
2 1620 FDM_ADM_GRID [email protected] (P00L) 9jyvj64ag15mv ACTIVE
2 1740 FDM_ADM_GRID [email protected] (P003) 9jyvj64ag15mv ACTIVE
2 1743 FDM_ADM_GRID [email protected] (P00M) 9jyvj64ag15mv ACTIVE
2 1851 FDM_ADM_GRID [email protected] (P00B) 9jyvj64ag15mv ACTIVE
As long as the session remains open, the QC will remain inactive and the slaves ACTIVE, thereby although they are not doing anything, they still count as parallel servers. Either the session is closed or the user runs another another query for me to notice changes in parallel use. But there won't be any if the user goes to take a coffee, or goes to launch or is doing something else. With more than 100 users concurrently working, you might imagine the headache. I had to design some workarounds:
- I had to create a new control inside the trigger to identify the inactive time of QC vs slaves in status ACTIVE, in order to identify how many sessions the user has already opened.
- I had to create a cleaning process to disconnect sessions in this status after a window time of 1 hour
- I can't use limit sessions in Profiles because they don't difference between QC or SLAVES.
- No matter how many things I configured, sometimes I ran out of parallel processes and if a batch process is executed in business hours ( which happens a lot ) I faced sometimes lack of parallel availability for those important processes due the number of slaves occupied by the inactive sessions.
My questions are the following:
- Why the slaves remain in status ACTIVE when the QC has already finished ? Should not the slaves be terminated as soon as the QC has delivered the results ?
- Why this behaviour does not happen neither in SQLPLUS nor in Java Pool solutions ( as SAP Business Objects ) when they run very similar queries ?
- Is there a way to disable parallel capabilities from the end users, no matter they try to activate them by ENABLE PARALLEL QUERY or through HINTS ?
I apologise for the long question, but I did not want to leave anything. I would really appreciate any insight about this.
Thank you all.
Your queries are not truly completing. Although your query only fetches the first 1000 rows, SQL Developer is only fetching the first 50 rows of those 1000 rows. The IDE won't close the cursor until you scroll to the last row. Once you retrieve all the data, those parallel processes disappear. Make sure you see "All Rows Fetched: 1000 in X seconds", instead of ""Fetched 50 rows in Y seconds". (I wish SQL Developer would make it more visually obvious that there are additional rows waiting.) You won't see this problem in SQL*Plus because SQL*Plus always grabs all the rows.
When only the first N rows are fetched, those parallel processes are "ACTIVE" but are not doing anything. You should be able to ignore those sessions since they are not using any significant resources.
If you're worried just about the number of parallel sessions, you might want to adjust your expectations. I used to be in the same situation as you - constantly telling users that their (incomplete) queries were hogging all the parallel sessions. Eventually, I discovered that it was only a problem because I had created an artificially scarce resource. Oracle parallel processes are usually lightweight, and databases can support way more parallel processes than most people think they can.
What are your parameter values for PARALLEL_MAX_SERVERS, PARALLEL_THREADS_PER_CPU, and CPU_COUNT? Look at the default value for PARALLEL_MAX_SERVERS. Per the manual, the default number is:
PARALLEL_MAX_SERVERS = PARALLEL_THREADS_PER_CPU * CPU_COUNT * concurrent_parallel_users * 5
.Most DBAs see a maximum number of parallel threads in the hundreds, panic, and then decrease that number. And then we start yelling at developers for using an unimportant resource that was artificially limited. Instead, we should crank the number back up to the default, and just ignore random parallel sessions. If a user isn't exceeding IO or CPU limits, it shouldn't matter how many parallel threads they use.
(With the possible exception of preventing massive parallel query session usage. Put your users in a different profile, and set their SESSIONS_PER_USER to a few dozen. Do NOT limit it to just 1 or 2. IDEs need extra sessions for multiple tabs, background processes that grab metadata, and debug sessions. If you set the limit to 2, your developers will not be able to use an IDE properly.)
EDIT (response to comments)
I'm not sure if you can read much into the status of the query coordinator. The QC does several things, but ideally it will be idle most of the time while the parallel sessions handle most of the work.
With the producer/consumer model, half of the parallel sessions may be receiving data but not really doing anything - like they're just memory structures in some operations. Parallel sessions may switch between active and inactive, as not all steps will need as many sessions. But we wouldn't want Oracle to close sessions in the middle, as they may be needed later and we wouldn't want to waste time opening and closing sessions.
There are dozens of factors that affect the degree of parallelism, but as far as I know increasing PARALLEL_MAX_SERVERS will not affect the number of parallel servers requested for a single statement. (But if the statement was already asking for more servers than the max, increasing the parameter may affect the number of sessions allocated).
It might feel like SQL statements are just randomly grabbing all the parallel sessions, but ultimately DOP calculations almost always follow deterministic rules. It's just that the rules are so complicated, it's difficult to tell how it works. For example, one common point of confusion is that whenever a query adds sorting or grouping, the number of parallel sessions is doubled.