I am very new to Google Cloud Platform and I'm trying to create a table in BigQuery from ~60,000 csv.gz
files stored in a GCP bucket.
To do this, I've opened Cloud Shell, and I'm trying the following:
$ bq --location=US mk my_data
$ bq --location=US \
load --null_marker='' \
--source_format=CSV --autodetect \
my_data.my_table gs://my_bucket/*.csv.gz
This throws the following error:
BigQuery error in load operation: Error processing job 'my_job:bqjob_r3eede45779dc9a51_0000017529110a63_1':
Error while reading data, error message:
FAILED_PRECONDITION: Invalid gzip file: bytes are missing
I don't know how to find which file might be problematic when loading the files. I've checked a few of the files, and they are all valid .gz
files that I can open with any csv reader after decompression, but I don't know how to check through all the files to find a problematic one.
Thank you in advance for any help with this!
To loop through your bucket and test each file, you can use the eval command in a shell script.
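As a rough sketch of that kind of loop (not necessarily the exact script intended above), the following streams each object through `gzip -t` and prints the names of the ones that fail. It reads the listing through a plain pipe rather than `eval`. The function names `check_gz_stream` and `scan_bucket` are made up for illustration, and the bucket pattern is the one from the question. With ~60,000 objects this runs sequentially, so expect it to be slow.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, bucket path comes from the question.

# Read a gzip stream on stdin and print "BAD: <label>" if it fails
# gzip's integrity test (for example, a truncated file).
check_gz_stream() {
  if ! gzip -t 2>/dev/null; then
    echo "BAD: $1"
  fi
}

# List every object matching the pattern and test each one in turn.
scan_bucket() {
  local pattern="$1"
  gsutil ls "$pattern" | while read -r obj; do
    # </dev/null keeps gsutil cat from consuming the listing on stdin.
    gsutil cat "$obj" </dev/null | check_gz_stream "$obj"
  done
}

# Example (commented out so the file can be sourced safely):
# scan_bucket 'gs://my_bucket/*.csv.gz' > bad_files.txt
```

Anything listed in `bad_files.txt` is a candidate for the file BigQuery is rejecting. Note that a failed `gsutil cat` (e.g. a permissions problem) would also show up as `BAD`, so it is worth re-checking those objects by hand.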
Another option is to download all your files locally, if possible, and process them from there.
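For the local route, something along these lines should work, assuming you have enough disk space for all the files. The directory `./gz_check` and the function name `find_bad_gz` are just placeholders; `gsutil -m cp` does the copy in parallel, and `gzip -t` checks each file without decompressing it to disk.

```shell
#!/usr/bin/env bash
# Sketch only: directory and function names are hypothetical.

# Print "BAD: <path>" for every .gz file under the given directory
# that fails gzip's integrity test.
find_bad_gz() {
  local dir="$1"
  find "$dir" -type f -name '*.gz' -print0 |
    while IFS= read -r -d '' f; do
      gzip -t "$f" 2>/dev/null || echo "BAD: $f"
    done
}

# Example (commented out so the file can be sourced safely):
# mkdir -p ./gz_check
# gsutil -m cp 'gs://my_bucket/*.csv.gz' ./gz_check/
# find_bad_gz ./gz_check
```

If a file shows up as bad locally, it is worth re-downloading that one object to rule out a failed transfer before concluding that the copy in the bucket is corrupt.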