I am wondering if someone knows how to skip a record that we are reading from a JSON file.
Here is the error:
[UNKNOWN_FIELD_EXCEPTION.NEW_FIELDS_IN_RECORD_WITH_FILE_PATH] Encountered unknown fields during parsing:
Here is the code that is failing:
from pyspark.sql import functions as F

sent = spark.readStream.format('cloudFiles') \
    .option('cloudFiles.format', 'json') \
    .option('multiline', 'true') \
    .option('cloudFiles.inferColumnTypes', 'true') \
    .option('cloudFiles.schemaLocation', checkpoint_path) \
    .load(raw_files) \
    .withColumn('load_ts', F.current_timestamp()) \
    .writeStream \
    .format('delta') \
    .option('checkpointLocation', checkpoint_path) \
    .trigger(availableNow=True) \
    .option('mergeSchema', 'true') \
    .toTable(b_write_path)
Thanks!
I have not seen any documentation on how to fix this error.
This depends on what you want to do with that data. By default, Databricks Auto Loader uses the addNewColumns schema evolution mode, which fails the stream when it encounters new columns, but after a restart it handles them correctly. You can instead use either rescue or none as the schema evolution mode, as shown in the sketch below. In rescue mode, data for new columns is put into a so-called "rescued data" column, which you can analyze if necessary, and the process won't fail. In none mode, new columns are simply ignored and the process won't fail. See the docs for more details.
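Here is a minimal sketch of the rescue variant, reusing the raw_files, checkpoint_path, and b_write_path names from your snippet; swap 'rescue' for 'none' if you just want unknown fields dropped. With rescue, the inferred schema stays fixed and unexpected fields land in the rescued-data column (named _rescued_data by default), so mergeSchema is no longer needed on the write:

from pyspark.sql import functions as F

# Same pipeline, but tell Auto Loader not to fail (or evolve the schema) on new fields
sent = spark.readStream.format('cloudFiles') \
    .option('cloudFiles.format', 'json') \
    .option('multiline', 'true') \
    .option('cloudFiles.inferColumnTypes', 'true') \
    .option('cloudFiles.schemaLocation', checkpoint_path) \
    .option('cloudFiles.schemaEvolutionMode', 'rescue') \
    .load(raw_files) \
    .withColumn('load_ts', F.current_timestamp()) \
    .writeStream \
    .format('delta') \
    .option('checkpointLocation', checkpoint_path) \
    .trigger(availableNow=True) \
    .toTable(b_write_path)

You can then inspect the affected records afterwards, for example with spark.read.table(b_write_path).filter('_rescued_data IS NOT NULL').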