Window function ignore nulls not working in Databricks

366 views Asked by At

I am new to Databricks and was required to implement the snowflake code in Databricks.

The snowflake table, code and output look like below:

table:

id col1 hn
ee1 null 1
ee1 null 2
ee1 test 3
ee1 test 4
ee1 test2 5

Query used:

SELECT ID, FIRST_VALUE(col1) ignore nulls OVER (PARTITION BY ID ORDER BY hn) AS first_value, LAST_VALUE(col1) ignore nulls OVER (PARTITION BY ID ORDER BY hn) AS last_value FROM table

Output:

id first_value last_value
ee1 test test2
ee1 test test2
ee1 test test2
ee1 test test2
ee1 test test2

When I tried the same query in Databricks using Spark SQL, ignore nulls did not work properly.

Can anyone provide the equivalent query for this in Databricks?

1

There are 1 answers

0
Lukasz Szozda On BEST ANSWER

The key point is the window frame specification:

SELECT ID, 
  FIRST_VALUE(col1) ignore nulls OVER (PARTITION BY ID ORDER BY hn) AS first_value, 
  LAST_VALUE(col1) ignore nulls OVER (PARTITION BY ID ORDER BY hn 
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_value 
FROM table;

If not defined explicitly the default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW