Assume I have the following piece of code in the SELECT clause of a query that is executed on Spark:

...
MEAN(CASE
         WHEN (col1 = 'A'
               AND (col3 = 'A' OR col4 = 'B')) THEN col2
     END) AS testing,
...

What would be the output of this query when col2 is NULL? Will the rows where col2 is NULL be ignored by the MEAN function?

3 Answers

1
Neville Kuyt

Disclaimer - I don't know Apache Spark!

I've created a SQL Fiddle - http://sqlfiddle.com/#!9/6f7d5e/3.

If col2 is NULL, that row is not included in the average, unless all the matching rows are NULL, in which case the result itself is NULL.
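To sanity-check this on Spark itself, here is a minimal sketch using an inline table (the column names and sample values are invented to mirror the question, and AVG is used in place of MEAN; the fiddle above is a generic SQL engine, but the behaviour should be the same in Spark SQL):

SELECT AVG(CASE
               WHEN col1 = 'A' AND (col3 = 'A' OR col4 = 'B') THEN col2
           END) AS testing
FROM VALUES
    ('A', 100,  'A', 'X'),   -- matches the condition, contributes 100
    ('A', 200,  'X', 'B'),   -- matches the condition, contributes 200
    ('A', NULL, 'A', 'B'),   -- matches, but col2 is NULL, so it is skipped
    ('B', 400,  'A', 'B')    -- does not match, the CASE yields NULL, also skipped
AS t(col1, col2, col3, col4);
-- Expected result: (100 + 200) / 2 = 150.0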

1
Gordon Linoff

When col2 is NULL, the CASE expression evaluates to NULL, so the result for that row will be NULL. It will have the type of col2 -- this might matter in some databases (or if you are saving the result to a table).

What is the MEAN() function? To calculate the average, use AVG(), the standard SQL aggregate function for averages.
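For the edge case where the CASE never matches any row, a small sketch with invented data shows that the aggregate returns NULL rather than 0:

SELECT AVG(CASE WHEN col1 = 'Z' THEN col2 END) AS testing
FROM VALUES ('A', 100), ('A', 200) AS t(col1, col2);
-- No row satisfies the WHEN condition, so every CASE result is NULL
-- and AVG returns NULL, not 0.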

0
Salman A

I believe you meant AVG. It ignores NULL values. So if the result of the CASE expression is:

100
200
300
NULL

Then the result would be (100 + 200 + 300) / 3 = 200 rather than (100 + 200 + 300) / 4 = 150. And if the result of the CASE expression is:

NULL
NULL
NULL
NULL

Then the result will be NULL instead of 0.
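A quick way to see all of this side by side is the following sketch (the column name val is invented; the extra SUM(val) / COUNT(*) column is only there to show what counting the NULL row would have produced, and the constant-NULL column illustrates the all-NULL case):

SELECT AVG(val)               AS nulls_ignored,  -- (100 + 200 + 300) / 3 = 200.0
       SUM(val) / COUNT(*)    AS nulls_counted,  -- (100 + 200 + 300) / 4 = 150.0
       AVG(CAST(NULL AS INT)) AS all_null        -- every input is NULL, so the result is NULL
FROM VALUES (100), (200), (300), (CAST(NULL AS INT)) AS t(val);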