Always true conditional aggregate vs count

160 views Asked by At

I've come across coding that is using a 1=1 conditional aggregate instead of a count

select sum(case when 1=1 then 1 else 0 end)

In my eyes, this is equivalent to:

select count(1)

At first I figured this was a place holder where the developer had intended to revisit with some additional logic in this conditional aggregate, but then I found it in another script, and then another. This prompted me to ask around and I discovered a previous developer had stressed that this sum condition is in all cases more efficient and faster than a count function would ever be (stressed it well enough that other developers followed the standard later on). Seems quite counter intuitive to me, forcing the system to evaluate a 1=1 condition might be minor, but it's still additional work over the count function. Figured I'd consult on here before I came back with a firm statement.

a) is there any truth to what this developer was saying the conditional aggregate will be faster than the count

completely side question: b) was there ever a db system that would evaluate the condition aggregate quicker than a count?

This is an oracle 11g database, though I suspect the scripts were written for oracle 8i

And as bonus points...I've been asked to optimize this code. Will removing replacing this with a count function improve speed at all? Record counts exceed 100 million at times.

2

There are 2 answers

0
William Robertson On BEST ANSWER

Executive summary: it makes no difference and it never has in Oracle, since at least version 6 (1989), which was when I first started hearing about clever ways to count faster by selecting primary key columns etc, as if Oracle was not aware that people sometimes count things.

You can see what the parser/optimiser does with the expression by using it in a filter and checking the 'Predicates' section of the execution plan.

create table demo
( demo_id integer generated always as identity constraint demo_pk primary key
, othercolumn integer );

insert into demo (othercolumn) select dbms_random.value(0,1000)
from dual connect by rownum <= 10000;

commit;

call dbms_stats.gather_table_stats(user, 'demo');

Normal count(*) (Oracle 12.1):

select count(*) from demo
having count(*) > 1

Plan hash value: 1044424301

--------------------------------------------------------------------------
| Id  | Operation              | Name    | Rows  | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |         |     1 |     7   (0)| 00:00:01 |
|*  1 |  FILTER                |         |       |            |          |
|   2 |   SORT AGGREGATE       |         |     1 |            |          |
|   3 |    INDEX FAST FULL SCAN| DEMO_PK | 10000 |     7   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(COUNT(*)>1)

Clever superfast expression:

select sum(case when 1=1 then 1 else 0 end) from demo
having sum(case when 1=1 then 1 else 0 end) > 0

Plan hash value: 1044424301

--------------------------------------------------------------------------
| Id  | Operation              | Name    | Rows  | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |         |     1 |     7   (0)| 00:00:01 |
|*  1 |  FILTER                |         |       |            |          |
|   2 |   SORT AGGREGATE       |         |     1 |            |          |
|   3 |    INDEX FAST FULL SCAN| DEMO_PK | 10000 |     7   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(SUM(1)>0)

Notice the Predicates section, which shows that the sum expression has been evaluated and replaced with sum(1). (I don't have time to dig into tracefiles right now, but I am pretty sure they would show that the rewrite occurred before CBO optimisation.)

Here's what it does with count(1), another expression sometimes thought to be more efficient than the standard one:

select count(1) from demo
having count(1) > 1

Plan hash value: 1044424301

--------------------------------------------------------------------------
| Id  | Operation              | Name    | Rows  | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |         |     1 |     7   (0)| 00:00:01 |
|*  1 |  FILTER                |         |       |            |          |
|   2 |   SORT AGGREGATE       |         |     1 |            |          |
|   3 |    INDEX FAST FULL SCAN| DEMO_PK | 10000 |     7   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(COUNT(*)>1)

And here's the plan without filters:

select sum(case when 1=1 then 1 else 0 end) as rowcount
from   demo

Plan hash value: 2242940774

-------------------------------------------------------------------------
| Id  | Operation             | Name    | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |         |     1 |     7   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE       |         |     1 |            |          |
|   2 |   INDEX FAST FULL SCAN| DEMO_PK | 10000 |     7   (0)| 00:00:01 |
-------------------------------------------------------------------------

As you can see, they are all the same (apart from differences in my artificial filter condition).

Also, sum(1) does not give the same results as count(*) when there are no rows:

  select sum(case when 1=1 then 1 else 0 end) as sum1
       , count(*)
  from   demo
  where  1=2

      SUM1   COUNT(*)
---------- ----------
                    0
6
Littlefoot On

The simplest way to find the answer is, I guess, explaining both options and see what Oracle says.

First, a usual COUNT option:

SQL> set autotrace on explain
SQL> select /*+ choose */ count(*) from tob_stavke_rac;

  COUNT(*)
----------
  53195373


Execution Plan
----------------------------------------------------------
Plan hash value: 3099656827

--------------------------------------------------------------------------------
| Id  | Operation             | Name           | Rows  | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                |     1 | 30846   (2)| 00:00:02 |
|   1 |  SORT AGGREGATE       |                |     1 |            |          |
|   2 |   INDEX FAST FULL SCAN| SRC_S_STA_FK_I |    53M| 30846   (2)| 00:00:02 |
--------------------------------------------------------------------------------

Then, unusual SUM with CASE:

SQL> select /*+ choose */ sum(case when 1 = 1 then 1 else 0 end) from tob_stavke_rac;

SUM(CASEWHEN1=1THEN1ELSE0END)
-----------------------------
                     53195373


Execution Plan
----------------------------------------------------------
Plan hash value: 3099656827

--------------------------------------------------------------------------------
| Id  | Operation             | Name           | Rows  | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                |     1 | 30846   (2)| 00:00:02 |
|   1 |  SORT AGGREGATE       |                |     1 |            |          |
|   2 |   INDEX FAST FULL SCAN| SRC_S_STA_FK_I |    53M| 30846   (2)| 00:00:02 |
--------------------------------------------------------------------------------

SQL>

In that database, it appears that there's no advantage. Maybe it was the case back in Oracle 8i (which is, as of today, a 20-years old piece of software), but today, in 12c version, I wouldn't say so. Besides, Oracle tends to rewrite query if the Optimizer concludes that - if rewrited - it works citius, altius, fortius (a tribute to Olympic Games in South Corea).

[EDIT, showing how explain plan looks in RBO]

Some info is, as you can see, missing ...

SQL> select count(*) from tob_stavke_rac;

  COUNT(*)
----------
  53195373


Execution Plan
----------------------------------------------------------
Plan hash value: 3371741006

---------------------------------------------
| Id  | Operation          | Name           |
---------------------------------------------
|   0 | SELECT STATEMENT   |                |
|   1 |  SORT AGGREGATE    |                |
|   2 |   TABLE ACCESS FULL| TOB_STAVKE_RAC |
---------------------------------------------

Note
-----
   - rule based optimizer used (consider using cbo)

SQL>