I have a SparkSQL DataFrame.
Some entries in this data are empty but they don't behave like NULL or NA. How could I remove them? Any ideas?
In R I can easily remove them, but in SparkR it says that there is a problem with the S4 system/methods.
Thanks.
SparkR Column provides a long list of useful methods, including `isNull` and `isNotNull`.
Please keep in mind that there is no distinction between `NA` and `NaN` in SparkR.
If you prefer operations on a whole data frame, there is a set of NA functions, including `fillna` and `dropna`.
Both can be adjusted to consider only some subset of columns (`cols`), and `dropna` has some additional useful parameters. For example, you can specify the minimum number of non-null columns a row must have to be kept.
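Here is a minimal sketch of how these calls might fit together. It assumes SparkR 2.x or later (older releases pass a `sqlContext` to `createDataFrame`) and a local Spark installation. The sample data and all variable names are made up for illustration.

```r
library(SparkR)

# Assumption: SparkR 2.x+ with a local Spark installation available.
sparkR.session(master = "local[1]")

# Hypothetical sample data; NA values in a local data.frame
# become nulls once the data is moved into Spark.
local_df <- data.frame(
  x = c(1, NA, 3, NA),
  y = c("foo", "bar", NA, NA),
  stringsAsFactors = FALSE
)
df <- createDataFrame(local_df)

# Column-level checks: keep rows where x is not null,
# or rows where y is null.
x_present <- filter(df, isNotNull(df$x))
y_missing <- filter(df, isNull(df$y))

# Data-frame-level: drop rows with a null in any column.
complete_rows <- dropna(df)

# Only look at a subset of columns when deciding what to drop.
x_complete <- dropna(df, cols = "x")

# Keep rows that have at least one non-null value.
at_least_one <- dropna(df, minNonNulls = 1)

# Replace nulls in the numeric column x with 0 instead of dropping rows.
filled <- fillna(df, 0, cols = "x")

head(complete_rows)
```

Call `sparkR.session.stop()` when you are done. Note that `minNonNulls`, when given, takes precedence over the `how` argument of `dropna`.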