I am preparing a CSV for loading onto a table using the SQL command:
COPY table FROM filename WITH (FORMAT csv, HEADER)
This particular table has a varchar[] (array of strings) column, and some of the items in these arrays contain special characters such as double quotes (") and backslashes (\), some of which can break the CSV format or the array notation. Examples:
"quote"
x1"x2
C:\Users
I've made a few attempts so far (attempt is the actual content of the column in the CSV file):
{"""quote""","x1""x2","C:\Users"}
ERROR: extra data after last expected column
CONTEXT: COPY table, line 1: "{"""quote""","x1""x2","C:\Users"}"
"{"""quote""","x1""x2","C:\Users"}"
Inserted value is missing characters:
{quote,x1x2,C:Users}
"{"\""quote\""","x1\""x2","C:\\Users"}"
Inserted value is missing characters:
{"quote\"",x1x2,"C:\\Users"}
What is the correct syntax to insert values with double quotes and backslashes from a CSV?
PostgreSQL version is 13.
The varchar[] column must be formatted using a CSV-formatted version of PostgreSQL array notation. Unfortunately, as my example shows, this can get quite hairy.
Here are some examples how the column should be quoted:
PostgreSQL has this to say on quoting array elements:
As with all CSV documents, if your column data contains quotes, spaces, commas, or other special characters, the column must be CSV-quoted. Or just always quote them to be safe. Normal CSV output libraries will auto-quote columns in your data.