I'm dealing with legacy pandas code, and because of the data platform I'm working on (Databricks), I have to use Spark to read CSVs. I'm able to remove rows *before* the header thanks to the schema definition, but how can I remove the *last* rows (footer lines) when reading with spark.read.csv? I'm using mode=FAILFAST for data quality purposes, so I can't just filter the bad rows out later. I find it a bit hard to believe that such a basic transformation can't be done easily.
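For context, here is roughly what my read looks like (the schema, column names, and path below are placeholders, not my real ones):

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.getOrCreate()  # provided automatically on Databricks

# Placeholder schema standing in for my real one
schema = StructType([
    StructField("id", StringType(), nullable=True),
    StructField("amount", DoubleType(), nullable=True),
])

df = (
    spark.read
    .option("header", "true")    # treat the first line as the header and skip it
    .option("mode", "FAILFAST")  # abort on any row that doesn't match the schema
    .schema(schema)
    .csv("/path/to/file.csv")    # placeholder path
)
```

This handles the leading rows fine, but I don't see any read option for dropping trailing rows, and with FAILFAST anything at the end of the file that doesn't fit the schema makes the whole read fail before I ever get a DataFrame to clean up.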
Thanks for your help, and have a nice weekend!