MSSQL Bulk Insert CSV - Multiple columns include commas

I'm running SQL Server 2008 R2 through SQL Server Management Studio, and I have a large .csv file with ~30 columns and ~10,000,000 rows. I downloaded the data from an internet source as roughly 900 separate files and concatenated them into a single file with a .cmd script, so I now have one complete file ready to insert into the database.

Unfortunately, the large numbers in several columns are recorded as quoted strings containing internal commas (e.g., "50,000,000" or "3,972,721"); the double quotes are part of the file's formatting. BULK INSERT treats these commas as field delimiters rather than as part of the string.
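For illustration, a raw line looks something like this (a made-up example, not from the actual file):

ABC,XYZ,"3,972,721",2014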

This is my insert statement:

BULK INSERT Database.dbo.table
    FROM '\\server\filepath\file.csv'
    WITH
        (
            FIELDTERMINATOR = ',',
            ROWTERMINATOR = '\n'
        );

In the table I created to hold the data, the columns in question have the varchar(max) data type, as anything else I've tried throws an error.

The value "3,972,721" currently spills over into extra columns, shifting everything after it and distorting the rest of the table:

COL1        COL2        COL3
"3          972         721"

I would like the correct output for "3,972,721" to be a single column:

COL1
3972721

I'm looking either for a way to force the BULK INSERT command to cast these strings as integers (with the added constraint that some of the values include a trailing + symbol, for example "2,500,000,000+"), or for another method of insertion entirely, such as bcp.
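(For example, if the values could first be landed intact in a varchar staging column, a cleanup pass like this sketch would handle the conversion; the staging table and RawValue column names here are placeholders, and bigint is needed because values like 2,500,000,000 overflow int:)

-- Sketch only: strip quotes, commas, and the trailing +, then convert.
-- Assumes the raw text survived the load into staging.RawValue intact.
SELECT CAST(REPLACE(REPLACE(REPLACE(RawValue, '"', ''), ',', ''), '+', '') AS bigint)
FROM Database.dbo.staging;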

Thank you!

1 Answer

Patrick Tucci

You can specify a text qualifier with BULK INSERT by using a format file. This related Stack Overflow question has good documentation on how to do it:

Bulk insert with text qualifier in SQL Server
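For reference, here's a minimal sketch of a non-XML format file, assuming a simplified three-column file in which every field is quoted (the column names, lengths, and collation are placeholders, and the 12.0 version header should match your SQL Server version; the first entry is a zero-length dummy field that consumes the leading quote on each row):

12.0
4
1   SQLCHAR   0   0      "\""      0   FirstQuote   ""
2   SQLCHAR   0   8000   "\",\""   1   Col1         SQL_Latin1_General_CP1_CI_AS
3   SQLCHAR   0   8000   "\",\""   2   Col2         SQL_Latin1_General_CP1_CI_AS
4   SQLCHAR   0   8000   "\"\r\n"  3   Col3         SQL_Latin1_General_CP1_CI_AS

You then point BULK INSERT at the format file instead of specifying terminators inline (the .fmt path here is hypothetical):

BULK INSERT Database.dbo.table
    FROM '\\server\filepath\file.csv'
    WITH (FORMATFILE = '\\server\filepath\file.fmt');

Also worth noting: on SQL Server 2017 and later, BULK INSERT can parse quoted CSV fields natively, which sidesteps the format file entirely:

BULK INSERT Database.dbo.table
    FROM '\\server\filepath\file.csv'
    WITH
        (
            FORMAT = 'CSV',
            FIELDQUOTE = '"',
            ROWTERMINATOR = '\n'
        );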