I'm using BCP to bulk upload data from a CSV file to SQL Azure (because BULK INSERT is not supported). This command runs and uploads the rows:
bcp [resource].dbo.TableName in C:\data.csv -t "," -r "0x0a" -c -U bcpuser@resource -S tcp:resource.database.windows.net
But data.csv is UTF8 encoded and contains non-ASCII strings. These get corrupted. I've tried changing the -c option to -w:
bcp [resource].dbo.TableName in C:\data.csv -t "," -r "0x0a" -w -U bcpuser@resource -S tcp:resource.database.windows.net
But then I get '0 rows copied'.
What am I doing wrong and how do I bulk insert Unicode characters using BCP?
The UTF-8 encoding is the primary issue. Using -w by itself won't help because in Microsoft-land, the term "Unicode" nearly always refers to UTF-16 Little Endian. The solution depends on which version of BCP you are using, as a UTF-8 option was added in the newest version (13.0 / 2016).
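You can check which version of the BCP utility is installed by running it with the -v switch, which just reports the version number and copyright:

bcp -v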
If you are using BCP that came with SQL Server prior to SQL Server 2016 (version 13.0), then you need to convert the CSV file to UTF-16 Little Endian (LE), as that is what Windows / SQL Server / .NET use for all strings, and use the -w switch. I got this to work by encoding a file as "UCS-2 LE BOM" in Notepad++, whereas that same import file failed using the -c switch.
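For example, here is a rough sketch of that pre-2016 workflow. The PowerShell re-encoding step is just one way to produce the UTF-16 LE file (C:\data-utf16.csv is a placeholder name, and Set-Content -Encoding Unicode writes UTF-16 LE with a BOM); note that Set-Content also normalizes line endings, so the row terminator from the question may need adjusting:

Get-Content -Path C:\data.csv -Encoding UTF8 | Set-Content -Path C:\data-utf16.csv -Encoding Unicode

bcp [resource].dbo.TableName in C:\data-utf16.csv -t "," -r "0x0a" -w -U bcpuser@resource -S tcp:resource.database.windows.net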
If you are using BCP that came with SQL Server 2016 (version 13.0) or newer, then you can simply add -c -C 65001 to the command line. -C is for "code page", and 65001 is the code page for UTF-8. The MSDN page for the bcp Utility covers this in its explanation of the -C switch.
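Applied to the command from the question (same placeholder server, database, and credentials), that would look something like:

bcp [resource].dbo.TableName in C:\data.csv -t "," -r "0x0a" -c -C 65001 -U bcpuser@resource -S tcp:resource.database.windows.net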
UPDATE
Support for UTF-8 / code page 65001 was added to SQL Server 2014 via SP2, as noted in this Microsoft KB article:
UTF-8 encoding support for the BCP utility and BULK INSERT Transact-SQL command in SQL Server 2014 SP2