How to Handle NUL (ASCII 0) Data Error When Loading TSV GZIP File from Google Cloud Storage into BigQuery


I'm encountering a NUL (ASCII 0) data error while trying to load a gzip-compressed, tab-separated (TSV) file from Google Cloud Storage (GCS) into BigQuery using the GCSToBigQueryOperator in Apache Airflow. The presence of NUL characters in the file appears to break the load job. How can I address this error and load the data successfully?

Code:

    from airflow.providers.google.cloud.transfers.gcs_to_bigquery import (
        GCSToBigQueryOperator,
    )

    task = GCSToBigQueryOperator(
        task_id='task',
        bucket=bucket_name,
        source_objects=['places/dt=2024-01-01/*'],
        destination_project_dataset_table='dataset.tablename',
        source_format='csv',
        write_disposition='WRITE_TRUNCATE',
        autodetect=True,
        quote_character='',           # treat quote characters as ordinary data
        field_delimiter='\t',         # tab-separated input
        encoding='UTF-8',
        allow_jagged_rows=True,
        ignore_unknown_values=True,
        allow_quoted_newlines=True,
        skip_leading_rows=1,          # skip the TSV header row
        dag=dag,
    )

Error:

    Error while reading data, error message: Bad character (ASCII 0) encountered.;
    line_number: 611503 byte_offset_to_start_of_line: 181596443 column_index: 0
    column_name: "fsq_id" column_type: STRING value: "Atakum\000"
    File: gs://bucket_name/places/dt=2024-03-24/places_tr.tsv.gz
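
One possible workaround (a sketch under assumptions, not a confirmed fix): BigQuery's CSV/TSV loader rejects NUL bytes outright, so the file can be sanitized in GCS before the load task runs. The helper and task names below (strip_nul_bytes, clean_nul) are hypothetical, and the sketch assumes the google-cloud-storage client library is available to the Airflow workers. It decompresses the gzipped TSV, drops every \000 byte, re-uploads the object in place, and is wired upstream of the existing load task:

    import gzip

    from airflow.operators.python import PythonOperator
    from google.cloud import storage


    def strip_nul_bytes(bucket_name: str, object_name: str) -> None:
        """Download a gzipped TSV from GCS, remove NUL bytes, re-upload in place."""
        client = storage.Client()
        blob = client.bucket(bucket_name).blob(object_name)

        # Decompress, drop every \x00 byte, and recompress.
        # This holds the whole object in memory, so it suits files up to a
        # few hundred MB; stream in chunks for anything larger.
        raw = gzip.decompress(blob.download_as_bytes())
        blob.upload_from_string(
            gzip.compress(raw.replace(b"\x00", b"")),
            content_type="application/gzip",
        )


    clean_nul = PythonOperator(
        task_id='clean_nul',  # hypothetical task id
        python_callable=strip_nul_bytes,
        op_kwargs={
            'bucket_name': bucket_name,
            'object_name': 'places/dt=2024-03-24/places_tr.tsv.gz',
        },
        dag=dag,
    )

    clean_nul >> task  # sanitize before GCSToBigQueryOperator runs

Since source_objects uses a wildcard, a real cleanup step would need to iterate over every object under the prefix (e.g. via client.list_blobs); the sketch handles only the single file named in the error, for brevity.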


