I am using Spark with Java (not Scala or Python).
I have to change my code so that my Spark query selects all columns rather than a specific set of columns (like using select *). Previously, when I selected a specific set of columns, it was easy to know the exact position/index of each column because they appeared in the order of my select. Now that I am selecting all columns, I do not know the order exactly.
I need the position/index of particular columns so that I can use the .isNullAt() function, because it requires a position/index rather than the string column name.
I am wondering: does dataframe.columns() give me an array whose indexes/positions match those used by the DataFrame methods that require an index/position? Could I then search that array with my string column name to get back the correct index?
From your question I'm guessing you're trying to get the index of a field in a row so you can check nullity.
Indeed, you could use ds.columns(), since it returns the column names in schema order, and then use the index from there.
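For example, a minimal sketch (assuming ds is your Dataset<Row> after the select *, and "myColumn" is a placeholder for the column you care about):

```java
import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// ds is assumed to be your Dataset<Row>; "myColumn" is a placeholder name.
String[] cols = ds.columns();                       // column names in schema order
int idx = Arrays.asList(cols).indexOf("myColumn");  // -1 if the column is absent

Row first = ds.first();
boolean valueIsMissing = first.isNullAt(idx);       // the same index works for Row accessors
```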
Nevertheless, I would advise using another method instead, since it keeps the logic inside the row processing and will be more robust: .fieldIndex(String fieldName).
See the documentation: https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Row.html#fieldIndex(java.lang.String)
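For instance (again a sketch; ds, "myColumn", and the string mapping are just illustrative):

```java
import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;

// ds is assumed to be your Dataset<Row>; "myColumn" is a placeholder column name.
Dataset<String> result = ds.map((MapFunction<Row, String>) row -> {
    int idx = row.fieldIndex("myColumn");  // index resolved from the row's own schema
    return row.isNullAt(idx) ? "missing" : row.getString(idx);
}, Encoders.STRING());
```

This way the index is looked up by name inside the row processing itself, so the code keeps working even if the column order produced by select * ever changes.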