When do we need to use the mapGroupsWithState function on a static DataFrame?
As per the documentation (https://spark.apache.org/docs/2.2.0/api/java/org/apache/spark/sql/streaming/GroupState.html):
In case of a batch Dataset, there is only one invocation and state object will be empty as there is no prior state. Essentially, for batch Datasets, [map/flatMap]GroupsWithState is equivalent to [map/flatMap]Groups and any updates to the state and/or timeouts have no effect.
Then why does this method exist for static DataFrames?
Spark tries to ensure that the same query can run in both batch and streaming mode, so the majority of operations support both. (The marketing phrase is "Unified batch and streaming".) mapGroupsWithState is accepted on a batch Dataset for the same reason: a query written for streaming can also run unchanged on static data. The unification is best-effort, though. Streaming queries have practical restrictions, so some operations or workloads are not supported in a streaming query.
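
To illustrate, here is a minimal sketch of one stateful function applied to a static Dataset. The names `Event`, `Total`, `updateTotal` and `runningTotals` are hypothetical and only serve the example; it assumes Spark SQL 2.2 or later on the classpath and a local master.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout}

// Hypothetical input and output types, used only for this illustration.
case class Event(user: String, amount: Long)
case class Total(user: String, total: Long)

object UnifiedExample {
  // Stateful update function, written once for both batch and streaming.
  def updateTotal(user: String, events: Iterator[Event], state: GroupState[Long]): Total = {
    // In a batch query the state is always empty here, so `previous` is 0.
    val previous = state.getOption.getOrElse(0L)
    val total = previous + events.map(_.amount).sum
    // In a batch query this update has no effect, as the documentation states.
    state.update(total)
    Total(user, total)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]") // assumption: local run for demonstration
      .appName("mapGroupsWithState-on-batch")
      .getOrCreate()
    import spark.implicits._

    // The query definition does not care whether `events` is static or streaming.
    def runningTotals(events: Dataset[Event]): Dataset[Total] =
      events
        .groupByKey(_.user)
        .mapGroupsWithState[Long, Total](GroupStateTimeout.NoTimeout)(updateTotal _)

    // Static Dataset: the function is invoked once per group.
    val batch = Seq(Event("a", 1L), Event("a", 2L), Event("b", 5L)).toDS()
    runningTotals(batch).show()

    // For a streaming Dataset[Event], the same runningTotals(...) call would be
    // followed by .writeStream.outputMode("update") instead of .show().

    spark.stop()
  }
}
```

On the static input, the state starts empty for every key, so each total is simply the sum of that group's rows. The benefit is that the query logic can be developed and tested against static data, then pointed at a stream without rewriting it.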