What exactly is the format for Hive LazySimpleSerDe?
A format like ParquetHiveSerDe tells me that Hive will read the HDFS files in parquet format.
But what is LazySimpleSerDe? Why not call it something explicit like CommaSepHiveSerDe or TabSepHiveSerDe, given LazySimpleSerDe is for delimited files?
LazySimpleSerDe is a fast and simple SerDe for delimited text. It does not recognize quoted values, but it can work with different delimiters, not only commas; the default field delimiter is Ctrl-A (\001). If you specify STORED AS TEXTFILE in the table DDL, LazySimpleSerDe will be used. For quoted values use OpenCSVSerde: it is not as fast as LazySimpleSerDe, but it handles quoted values correctly. LazySimpleSerDe is kept simple for the sake of performance, and it creates objects lazily (a field is deserialized only when it is accessed), which is why it is preferable for text files whenever possible.
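As a rough sketch of the two choices, here is how the DDL might look for a comma-delimited file. The table and column names are hypothetical and only serve the illustration.

```sql
-- Hypothetical table and column names, for illustration only.

-- Delimited text without quoting: LazySimpleSerDe is used.
CREATE TABLE events_plain (
  id   INT,
  name STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- CSV with quoted values: OpenCSVSerde understands the quotes.
-- Note that OpenCSVSerde exposes every column as STRING.
CREATE TABLE events_quoted (
  id   STRING,
  name STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar"     = "\""
)
STORED AS TEXTFILE;
```

For an input line such as 1,"Smith, John", the first table splits on every comma, so the name column ends up as "Smith (with the quote character kept), while the second table returns Smith, John as a single value.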
See this example with a pipe-delimited (|) file format: https://stackoverflow.com/a/68095278/2700344

The show create table command for such a table prints the SerDe class as org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe; STORED AS TEXTFILE is a shortcut.
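To make the shortcut concrete, the two definitions below should be roughly equivalent for a pipe-delimited file. The table and column names are hypothetical, and the exact text printed by show create table can differ between Hive versions.

```sql
-- Hypothetical table name and columns, for illustration only.

-- Short form: the SerDe is implied.
CREATE TABLE pipe_short (
  id   INT,
  name STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;

-- Long form: SerDe and input/output formats spelled out explicitly.
CREATE TABLE pipe_long (
  id   INT,
  name STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('field.delim' = '|')
STORED AS
  INPUTFORMAT  'org.apache.hadoop.mapred.TextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
```

This also explains the name: the SerDe is not tied to a comma or a tab, since the delimiter is just a property of the table.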