BigQuery job fails if input data doesn't match the table datatype


I'm used to running Hive queries on Hadoop, and in my experience the job doesn't fail even if the input data doesn't match the table's datatype. For example, if the input value "abc" routes to a table column with an INTEGER datatype, NULL is inserted for that particular value.

In BigQuery, I have observed that the job fails when a data value does not strictly match the column datatype:

file-00000000: CSV table encountered too many errors, giving up. Rows: 1; errors: 1. (error code: invalid) query: Could not parse '2017-11-23' as a timestamp. Required format is YYYY-MM-DD HH:MM[:SS[.SSSSSS]]; Could not parse '2017-11-23' as datetime for field date (position 1) starting at location 0 (error code: invalidQuery)

In this example, since '2017-11-23' doesn't match the TIMESTAMP format, NULL should have been inserted into the column rather than the job failing.

Is there any way to change this behavior? My job shouldn't fail regardless of the input data. Any insights?

1 Answer

Elliott Brossard (accepted answer):

A couple of options:

  • Use STRING for the column type instead. You can then transform the data in a query, e.g. using SAFE_CAST(timestamp_string AS TIMESTAMP), which returns NULL for invalid timestamps instead of raising an error.
  • Use the configuration.load.maxBadRecords property to skip rows with invalid data. Note, however, that this omits the entire row when any value in it is bad.
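To illustrate the first option, here is a minimal SQL sketch of querying a STRING column loaded from the raw file; the table and column names (my_dataset.events, date_string) are hypothetical:

```sql
-- The column was loaded as STRING, so the load job cannot fail on it.
-- SAFE_CAST returns NULL instead of an error for unparseable values,
-- mirroring the Hive behavior described in the question.
SELECT
  SAFE_CAST(date_string AS TIMESTAMP) AS event_ts
FROM my_dataset.events;
```

A value like '2017-11-23 10:00:00' converts normally, while 'abc' simply becomes NULL in event_ts.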