I'm trying to load a CSV into Python, but the load keeps failing because one of the fields uses '\N' to represent null values in a column that is an integer. I can't figure out how to deal with this; I'd like to convert it on the way in.
It would be great if I could ignore the error and insert the rest of the record into the table, but that doesn't seem to be possible.
Any help would be much appreciated.
So the following code
con.sql("INSERT INTO getNBBOtimes SELECT * FROM read_csv_auto('G:/temp/timeexport.csv')")
results in the following error
InvalidInputException Traceback (most recent call last)
<timed eval> in <module>
InvalidInputException: Invalid Input Error: Could not convert string '\N' to INT64 in column "column3", at line 856438.
Parser options:
file=G:/temp/timeexport.csv
delimiter=',' (auto detected)
quote='"' (auto detected)
escape='"' (auto detected)
header=0 (auto detected)
sample_size=20480
ignore_errors=0
all_varchar=0.
Consider either increasing the sample size (SAMPLE_SIZE=X [X rows] or SAMPLE_SIZE=-1 [all rows]), or skipping column conversion (ALL_VARCHAR=1)
I figured I would try to handle the error on the way in, but nothing seems to work
con.sql("CREATE TABLE test1 as seLECT NULLIF(column1,'\\N') , NULLIF(column2,'\\N'),NULLIF(column3,'\\N'),NULLIF(column4,'\\N'),NULLIF(column2,'\\N') FROM read_csv_auto('G:/temp/timeexport.csv')")
returns the following error:
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 46-47: malformed \N character escape
I tried this
con.sql("CREATE TABLE test1 as seLECT NULLIF(column1,repr('\\N')) , NULLIF(column2,repr('\\N')),NULLIF(column3,repr('\\N')),NULLIF(column4,(repr'\\N')),NULLIF(column2,repr('\\N')) FROM read_csv_auto('G:/temp/timeexport.csv')")
and got this error
CatalogException: Catalog Error: Scalar Function with name repr does not exist!
Did you mean "exp"?
You haven't provided any sample data, so let's assume you're starting with:
We start by creating our target table:
We can use a SQL IF statement to read in the file:
Which gets us:
...which is what I think you were after.
You can make your solution using NULLIF work if you're willing to treat all columns as VARCHAR:
Which gets us:
You could then use a second select to convert those varchar values to int64.