DBI::dbWriteTable error when writing to BigQuery -- trying to coerce string to int

158 views Asked by At

In an AI notebook, I have the following:

%%R
tempdf %>% summary() %>% print()

DBI::dbWriteTable(
    conn=clinvar_conn, 
    name=table_name, 
    value=tempdf, 
    overwrite=T
)

Giving

    CHROM               POS                 ID                REF           
 Length:775501      Length:775501      Length:775501      Length:775501     
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
     ALT                QUAL              FILTER              INFO          
 Length:775501      Length:775501      Length:775501      Length:775501     
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  


Error: Job 'eacri-genomics.job_om9Un5yZVSHjLpt0d4ZzyF743zjy.US' failed
✖ Error while reading data, error message: JSON processing encountered too many errors, giving up. Rows: 740479; errors: 1; max bad: 0; error percent: 0 [invalid]
✖ Error while reading data, error message: JSON parsing error in row starting at position 392688729: Could not convert value 'string_value: "X"' to integer. Field: CHROM; Value: X [invalid]
Run `rlang::last_error()` to see where the error occurred.

I made sure to cast all the data as character. Why is it trying to convert "X" to an integer?

1

There are 1 answers

0
Simon.S.A. On

Re-posting comment as an answer for completeness.

Position 392688729 seems very high if you only have 775501 rows. This makes me suspect that some of your data-types are not what they are assumed to be. Given that the error message mentions JSON, perhaps some of your fields are JSON objects and casting them to string does not work how you might assume.

Two possibilities to investigate:

  1. some of your text fields are longer than the maximum number of characters
  2. there is an un-escaped quote in your input that means multiple values are being misread as a single value