Insert rows with Unicode characters using BCP

6.3k views Asked by At

I'm using BCP to bulk upload data from a CSV file to SQL Azure (because BULK INSERT is not supported). This command runs and uploads the rows:

bcp [resource].dbo.TableName in C:\data.csv -t "," -r "0x0a" -c -U bcpuser@resource -S tcp:resource.database.windows.net

But data.csv is UTF8 encoded and contains non-ASCII strings. These get corrupted. I've tried changing the -c option to -w:

bcp [resource].dbo.TableName in C:\data.csv -t "," -r "0x0a" -w -U bcpuser@resource -S tcp:resource.database.windows.net

But then I get '0 rows copied'.

What am I doing wrong and how do I bulk insert Unicode characters using BCP?

2

There are 2 answers

0
Solomon Rutzky On BEST ANSWER

But data.csv is UTF8 encoded

The UTF-8 encoding is the primary issue. Using -w won't help because in Microsoft-land, the term "Unicode" nearly always refers to UTF-16 Little Endian.

The solution will depend on which version of BCP you are using as an option was added in the newest version (13.0 / 2016):

  • If you are using BCP that came with SQL Server prior to SQL Server 2016 (version 13.0) then you need to convert the csv file to UTF-16 Little Endian (LE) as that is what Windows / SQL Server / .NET use for all strings. And use the -w switch.

    I got this to work encoding a file as "UCS-2 LE BOM" in Notepad++, whereas that same import file failed using the -c switch.

  • If you are using BCP that came with SQL Server 2016 (version 13.0) or newer, then you can simply add -c -C 65001 to the command line. -C is for "code page", and 65001 is the code page for UTF-8.

The MSDN page for bcp Utility states (in the explanation of the -C switch):

Versions prior to version 13 (SQL Server 2016) do not support code page 65001 (UTF-8 encoding). Versions beginning with 13 can import UTF-8 encoding to earlier versions of SQL Server.

UPDATE

Support for UTF-8 / code page 65001 was added to SQL Server 2014 via SP2, as noted in this Microsoft KB article:

UTF-8 encoding support for the BCP utility and BULK INSERT Transact-SQL command in SQL Server 2014 SP2

1
Pho On

The answer from Solomon helped me in my struggle with Unicode and SQL Server 2014. I would like to share my experience about Unicode here. I hope this helps the next person who suffers from Unicode problems with BCP.

I have had a hard time figuring out the UTF and Unicode of SQL Server 2014. I am using Powershell to upload using BCP to a SQL Server 2014 SP2 database. My files are in Dutch, UTF-8 without BOM. I used Powershell to convert the files into microsoft's Unicode:

Get-ChildItem "C:\Documents\ProjectA" -filter *.CSV |
ForEach-Object {
    $path = $_.basename + '.unicode.CSV' 
    get-content $_ | Set-Content -Encoding Unicode -path $path 
}

Then I used BCP without format file:

Get-ChildItem "C:\Documents\ProjectA" -filter *.unicode.CSV |
 ForEach-Object { 
   try { $output = bcp ProjectA.dbo.auditlog in $_.FullName -w "-t," -T -F2 
            if ($LASTEXITCODE)
            {  throw $output
            }
    catch
    { $Output >> C:\Documents\ProjectA\BCPCommandFailed$(get-date -f yyyy-MM-dd).log
    }
}

The conversion into Unicode causes file sizes to double e.g. from 11,630KB into 23,259KB. Template file whether XML or non-XML did not work.