In SQL Server BULK INSERT, how do I use higher ASCII characters for field and row terminators?


I have a bulk insert that works on SQL Server 2000 that I'm trying to run on SQL Server 2008 R2, but it's not working as I had hoped. I've been successfully running these bulk inserts into SQL 2000 with the following:

Format file:

8.0
9
1 SQLCHAR 0 0 "ù" 1 Col1 ""
2 SQLCHAR 0 0 "ù" 2 Col2 ""
3 SQLCHAR 0 0 "ù" 3 Col3 ""
4 SQLCHAR 0 0 "ù" 4 Col4 ""
5 SQLCHAR 0 0 "ù" 5 Col5 ""
6 SQLCHAR 0 0 "ú" 6 Col6 ""
7 SQLCHAR 0 0 "" 0 Col7 ""
8 SQLCHAR 0 0 "" 0 Col8 ""
9 SQLCHAR 0 0 "" 0 Col9 ""

Data file:

101ù110115100ùC02BCD72-083E-46EE-AA68-848F2F36DB4Dù0ù1ùCú
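For illustration, here is how the sample row breaks down under those terminators (a quick Python sketch; the character codes are 249 for ù and 250 for ú):

```python
# Sample row from the data file: fields end with ù (code 249), rows with ú (code 250).
row = "101ù110115100ùC02BCD72-083E-46EE-AA68-848F2F36DB4Dù0ù1ùC" + chr(250)

record = row.rstrip(chr(250))    # strip the row terminator
fields = record.split(chr(249))  # split on the field terminator

print(fields)  # six values, matching Col1..Col6; Col7-Col9 are skipped by the format file
```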

Bulk insert command:

bulk insert Database1.dbo.Table1
            from 'C:\DataFile.dat'
            with 
                (
                      formatfile = 'C:\FormatFile.fmt'
                    , tablock
                    , check_constraints
                    , maxerrors = 0
                )

Now that I'm running on a SQL 2008 R2 box, I'm getting the following error:

Bulk load: An unexpected end of file was encountered in the data file.

If I change my field terminators from ASCII 249 (ù) to commas (,) and my row terminators from ASCII 250 (ú) to semicolons (;), everything runs. However, that isn't really an option (the data will certainly contain those characters), and I'd rather not pick some arbitrary string like !@#$%^&*() as a delimiter (I'd have to edit more code that way).

I've tried a few combinations of codepage, datafiletype, collation, SQL compatibility level, and format file version, but to no avail (not that I have the expertise to know how all of those would interact here). Various parts of the BULK INSERT MSDN docs mention special rules for ASCII characters greater than 127 or less than 32, but I can't quite make out how those rules would affect the delimiters.

What can I do to make this run on my new server while touching as little code as possible?

UPDATE (solution)

Thanks to @Adam Wenger's comment, I have found a solution. To deal with extended ASCII characters in my data, I am no longer using a format file and am writing the bulk insert data file to the filesystem as Unicode (not ANSI), even though there really aren't any Unicode-only characters in my data. Here is my new bulk insert statement (note 'widechar'):

bulk insert Database1.dbo.Table1
from 'C:\DataFile.dat'
with (
      check_constraints
    , datafiletype = 'widechar'
    , fieldterminator = 'ù'
    , maxerrors = 0
    , rowterminator = 'ú'
    , tablock
)

I could not get a format file to work with extended ASCII characters (above 127), no matter what I tried. I simply dropped the format file and now put additional field terminators in the data file to represent the columns I'm not importing (those columns have defaults).
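For reference, a minimal sketch of how the data file can be written out as Unicode, assuming Python; the file name and row values here are placeholders:

```python
# Write the bulk-insert data file as UTF-16 little-endian (what BULK INSERT
# calls "widechar") instead of ANSI. The path and values are placeholders.
FIELD_TERM = "\u00f9"  # ù (ASCII 249)
ROW_TERM = "\u00fa"    # ú (ASCII 250)

# Nine fields per row now: the three skipped columns get empty values,
# since there is no format file to exclude them.
rows = [
    ["101", "110115100", "C02BCD72-083E-46EE-AA68-848F2F36DB4D",
     "0", "1", "C", "", "", ""],
]

with open("DataFile.dat", "w", encoding="utf-16-le") as f:
    for row in rows:
        f.write(FIELD_TERM.join(row) + ROW_TERM)
```

With nine fields per row, each row carries eight field terminators plus the row terminator, so the widechar BULK INSERT can map every column positionally.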

1 Answer

Adam Wenger (accepted answer):

Specifying DATAFILETYPE = 'widechar' in your WITH clause should remove the need for the format file: you can specify the "widechar" field and row terminators directly in the WITH clause of the BULK INSERT as well. I referenced the MSDN article on using Unicode character format to import data.

BULK INSERT Database1.dbo.Table1
FROM 'C:\DataFile.dat'
WITH ( TABLOCK
   , CHECK_CONSTRAINTS
   , MAXERRORS = 0
   , DATAFILETYPE = 'widechar'
   , FIELDTERMINATOR = 'ù'
   , ROWTERMINATOR = 'ú'
)