I am preparing a CSV for loading onto a table using the SQL command:
COPY table FROM filename WITH (FORMAT csv, HEADER)
This particular table has a varchar[]
(array of strings) column, and some of the items in these arrays contain special characters such as double quotes ("
) and backslashes (\
), some of which can break the CSV format or the array notation. Examples:
"quote"
x1"x2
C:\Users
I've made a few attempts so far (attempt is the actual content of the column in the CSV file):
{"""quote""","x1""x2","C:\Users"}
ERROR: extra data after last expected column
CONTEXT: COPY table, line 1: "{"""quote""","x1""x2","C:\Users"}"
"{"""quote""","x1""x2","C:\Users"}"
Inserted value is missing characters:
{quote,x1x2,C:Users}
"{"\""quote\""","x1\""x2","C:\\Users"}"
Inserted value is missing characters:
{"quote\"",x1x2,"C:\\Users"}
What is the correct syntax to insert values with double quotes and backslashes from a CSV?
PostgreSQL version is 13.
The varchar[] column must be formatted using a CSV-formatted version of PostgreSQL array notation. Unfortunately, as my example shows, this can get quite hairy.
Here are some examples how the column should be quoted:
PostgreSQL has this to say on quoting array elements:
As with all CSV documents, if your column data contains quotes, spaces, commas, or other special characters, the column must be CSV-quoted. Or just always quote them to be safe. Normal CSV output libraries will auto-quote columns in your data.