I have the following dataset:
|value|
+-----+
|    1|
|    2|
|    3|
I want to create a new column newValue that takes the value of newValue from the previous row and does something with it. For simplicity, just increment it by 3. If there is no previous row, as is the case for the first row, value should be taken instead. The result should look like this:
|value|newValue|
+-----+--------+
|    1|       1|
|    2|       4| # newValue previous row (1) + 3
|    3|       7| # newValue previous row (4) + 3
I tried it with the following code, but it seems that the new column newValue does not exist yet when trying to access the previous row. How can I access the newly created column within withColumn?
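import org.apache.spark.sql.Dataset
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{coalesce, lag}
import spark.implicits._ // assumes an active SparkSession named spark
val data = Seq(1, 2, 3)
val dataset: Dataset[Int] = data.toDS()
val windowSpec = Window.orderBy("value")
val result = dataset.withColumn("newValue", coalesce(lag("newValue", 1).over(windowSpec) + 3, $"value"))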
This leads to the following error message:
org.apache.spark.sql.AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `newValue` cannot be resolved. Did you mean one of the following? [`value`]
I believe all you need is a running sum with some constant value (3).
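The running-sum idea can be sketched roughly as follows. This is only one possible reading of it, not necessarily the exact code meant above: the object name RunningSumSketch and the helper withNewValue are hypothetical, and the sketch assumes the increment is always the constant 3. Because every row adds 3 to the previous newValue, newValue equals the first value plus 3 for each preceding row, so no reference to newValue itself is needed.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, first, lit, sum}

object RunningSumSketch {
  // Hypothetical helper: newValue = first value + 3 * (number of preceding rows).
  def withNewValue(ds: Dataset[Int]): DataFrame = {
    // Running frame: from the first row up to and including the current row.
    val w = Window
      .orderBy("value")
      .rowsBetween(Window.unboundedPreceding, Window.currentRow)

    // sum(lit(3)) counts 3 per row including the current one, so subtract 3
    // to leave the first row equal to its own value.
    ds.withColumn(
      "newValue",
      first(col("value")).over(w) + sum(lit(3)).over(w) - lit(3)
    )
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("running-sum-sketch")
      .getOrCreate()
    import spark.implicits._

    withNewValue(Seq(1, 2, 3).toDS()).show()
    spark.stop()
  }
}
```

This only works because the update is a fixed increment that can be rewritten as a cumulative sum. A window without partitionBy also moves all rows into a single partition, so Spark logs a performance warning for it on larger data.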
Output