I am new to Databricks and was required to implement the snowflake code in Databricks.
The snowflake table, code and output look like below:
table:
| id | col1 | hn |
|---|---|---|
| ee1 | null | 1 |
| ee1 | null | 2 |
| ee1 | test | 3 |
| ee1 | test | 4 |
| ee1 | test2 | 5 |
Query used:
SELECT ID, FIRST_VALUE(col1) ignore nulls OVER (PARTITION BY ID ORDER BY hn) AS first_value, LAST_VALUE(col1) ignore nulls OVER (PARTITION BY ID ORDER BY hn) AS last_value FROM table
Output:
| id | first_value | last_value |
|---|---|---|
| ee1 | test | test2 |
| ee1 | test | test2 |
| ee1 | test | test2 |
| ee1 | test | test2 |
| ee1 | test | test2 |
When I tried the same query in Databricks using Spark SQL, ignore nulls did not work properly.
Can anyone provide the equivalent query for this in Databricks?
The key point is the window frame specification:
If not defined explicitly the default is
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW