Declaring the output schema when using gapply in SparkR

I would like to use gapply according to https://spark.apache.org/docs/latest/sparkr.html#gapply

The problem is I am returning a list of 2 dataframes.

return(list(df1, df2))

How do I declare the output schema in this case?

There is 1 answer

user9279745 On

You cannot use a function that returns an arbitrary list. As per the gapply documentation (emphasis mine):

The function func takes as argument a key - grouping columns and a data frame - a local R data.frame. The output of func is a local R data.frame.
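To make that contract concrete, here is a minimal sketch modeled on the example in the SparkR guide. It assumes a running Spark installation and uses R's built-in `faithful` dataset purely for illustration; the function name `max_eruption` is made up for this example. The point is that func returns exactly one local data.frame, and the declared schema describes that data.frame's columns.

```r
library(SparkR)
sparkR.session()

# `faithful` is a built-in R dataset, used here only for illustration.
df <- createDataFrame(faithful)

# The declared schema must match the columns of the data.frame returned by func.
schema <- structType(
  structField("waiting", "double"),
  structField("max_eruption", "double")
)

# func receives the grouping key and a local R data.frame,
# and returns exactly one local R data.frame (not a list of them).
max_eruption <- function(key, x) {
  data.frame(key, max(x$eruptions))
}

result <- gapply(df, "waiting", max_eruption, schema)
head(collect(result))
```

Because func only ever sees a local data.frame, it can be tried on a plain R subset such as `faithful[faithful$waiting == 79, ]` before handing it to gapply.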

You might be able to make it work by treating each data.frame as a single Row of a type equivalent to something like struct<col1:array<typeofcol1>, col2:array<typeofcol2>, ..., coln:array<typeofcoln>>, but only as long as both output data.frames have an identical schema.
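The struct-of-arrays encoding is not shown here. As a simpler variant that relies on the same "identical schema" condition, the two data.frames can be stacked with rbind and tagged with a marker column, so that func still returns a single data.frame. This is only a sketch under assumptions: `two_summaries`, the `part` column, and the max/min summaries are hypothetical stand-ins for whatever df1 and df2 actually contain.

```r
library(SparkR)
sparkR.session()

# `faithful` is a built-in R dataset, used here only for illustration.
df <- createDataFrame(faithful)

# Hypothetical func: builds two results with identical columns,
# then stacks them into one data.frame with a marker column.
two_summaries <- function(key, x) {
  df1 <- data.frame(waiting = key[[1]], value = max(x$eruptions))
  df2 <- data.frame(waiting = key[[1]], value = min(x$eruptions))
  df1$part <- "df1"
  df2$part <- "df2"
  rbind(df1, df2)
}

# One schema describes the stacked output, including the marker column.
schema <- structType(
  structField("waiting", "double"),
  structField("value", "double"),
  structField("part", "string")
)

result <- gapply(df, "waiting", two_summaries, schema)

# Split the result back into the two logical outputs on the Spark side.
out1 <- filter(result, result$part == "df1")
out2 <- filter(result, result$part == "df2")
```

If the two data.frames do not share a schema, this does not apply; running gapply twice, once per output with its own schema, is then the more straightforward route.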