Why does the mapGroupsWithState function exist for a static DataFrame?


When do we need to use the mapGroupsWithState function on a static DataFrame?

According to the documentation (https://spark.apache.org/docs/2.2.0/api/java/org/apache/spark/sql/streaming/GroupState.html):

In case of a batch Dataset, there is only one invocation and state object will be empty as there is no prior state. Essentially, for batch Datasets, [map/flatMap]GroupsWithState is equivalent to [map/flatMap]Groups and any updates to the state and/or timeouts have no effect.

Then why does this method exist for a static DataFrame?


There is 1 answer

Jungtaek Lim On

Spark tries to ensure that the same query can run as both a batch query and a streaming query, and the majority of operations support both. (The marketing term is "unified batch and streaming".) That is why mapGroupsWithState is also available on a static DataFrame, even though state updates and timeouts have no effect there. This unification is best-effort: streaming queries have practical restrictions, so some operations or workloads are not supported in a streaming query.
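To illustrate the point, here is a minimal sketch of calling mapGroupsWithState on a static Dataset. The Event and Total case classes, the updateTotal function, and the sample data are hypothetical names made up for this example; it assumes a local Spark 2.2+ installation with the spark-sql dependency on the classpath.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout}

object BatchMapGroupsWithState {
  // Hypothetical input and output types, used only for this illustration.
  case class Event(user: String, amount: Long)
  case class Total(user: String, total: Long)

  // The same function could be passed to a batch or a streaming query.
  // On a batch Dataset it is invoked once per key, and the state is empty.
  def updateTotal(user: String, events: Iterator[Event], state: GroupState[Long]): Total = {
    val previous = state.getOption.getOrElse(0L) // no prior state in batch
    val total = previous + events.map(_.amount).sum
    state.update(total) // has no lasting effect on a batch Dataset
    Total(user, total)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("batch-mapGroupsWithState")
      .getOrCreate()
    import spark.implicits._

    // A static (batch) Dataset built from in-memory sample data.
    val events = Seq(Event("a", 1L), Event("a", 2L), Event("b", 5L)).toDS()

    val totals = events
      .groupByKey(_.user)
      .mapGroupsWithState[Long, Total](GroupStateTimeout.NoTimeout)(updateTotal _)

    totals.show()
    spark.stop()
  }
}
```

If the static Dataset were swapped for a streaming source, the grouping and the updateTotal function should be able to stay unchanged, although the streaming query would also need a sink with the update output mode. In that case the state would carry over between micro-batches instead of being empty on every invocation.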