Mysql Using CSV Files Stored In Blob Field With Load Data Infile

281 views Asked by At

I am working with a database that has a series of csv files stored in blob fields.

I'd like to select the correct file (.csv) using a select with where clause, then insert records for each line in the .csv by splitting each comma separated column.

Normally, if the file were located on disk I'd use:

LOAD DATA INFILE 'file-path'
INTO TABLE some-table
FIELDS TERMINATED BY ','
IGNORE 1 ROWS;

or if inserting records from a select I'd use:

INSERT INTO target-table
SELECT * FROM source-table
WHERE condition;

I've also tried to pass as a variable the result of the select on the table containing the files to LOAD DATA INFILE as the file path without success.

Please advise. Thank you.

1

There are 1 answers

0
user1191247 On

AFAIK, there is no way to use LOAD DATA INFILE to load data directly from a CSV file stored in a blob.

A quote from the MySQL docs:

The LOAD DATA statement reads rows from a text file into a table at a very high speed.

Assuming a table my_blobs like:

id blob_data
1 "col head 1","col head 2","col head 3"
"row1 col1","row1 col2","row 1 col3"
"row 2 col1","row 2 col2","row 2 col3"
...
2 ...
3 ...

A simple approach is to export all the CSV data from the table into a single file and then import it. This only works if the structure of the files in the blobs is the same and there are no (possibly limited) encoding differences.

SELECT TRIM(TRAILING '\n' FROM blob_data) FROM my_blobs
INTO OUTFILE '/path/to/file.csv'
FIELDS ESCAPED BY '';

LOAD DATA INFILE '/path/to/file.csv'
INTO TABLE `some-table`
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n';

If the data in your blobs has header rows, then the headers will get imported multiple times, but they can be removed after the import.

A more convoluted (but flexible) approach is to dump the contents of the column using SELECT ... INTO DUMPFILE and then read the file back in to the target table using LOAD DATA INFILE. This is more flexible as you can make changes to the LOAD DATA INFILE based on meta data in the my_blobs table.

You could use a query like this to write the dump/load script:

SELECT CONCAT(
    'SELECT blob_data FROM my_blobs WHERE id = ', id,' INTO DUMPFILE \'/path/to/csv_export_', id,'.csv\';\n',
    'LOAD DATA INFILE \'/path/to/csv_export_', id,'.csv\'\nINTO TABLE `some-table`\nFIELDS TERMINATED BY \',\' ENCLOSED BY \'"\'\nLINES TERMINATED BY \'\\n\'\nIGNORE 1 ROWS;\n\n'
) FROM my_blobs
INTO OUTFILE '/path/to/process_csv_files.sql'
FIELDS ESCAPED BY '' ENCLOSED BY '';

Which will create the file /path/to/process_csv_files.sql with content like:

SELECT blob_data FROM my_blobs WHERE id = 1 INTO DUMPFILE '/path/to/csv_export_1.csv';
LOAD DATA INFILE '/path/to/csv_export_1.csv'
INTO TABLE `some-table`
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;


SELECT blob_data FROM my_blobs WHERE id = 2 INTO DUMPFILE '/path/to/csv_export_2.csv';
LOAD DATA INFILE '/path/to/csv_export_2.csv'
INTO TABLE `some-table`
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;


SELECT blob_data FROM my_blobs WHERE id = 3 INTO DUMPFILE '/path/to/csv_export_3.csv';
LOAD DATA INFILE '/path/to/csv_export_3.csv'
INTO TABLE `some-table`
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;

Note (from MySQL docs):

A given SELECT statement can contain at most one INTO clause, although as shown by the SELECT syntax description (see Section 13.2.13, “SELECT Statement”), the INTO can appear in different positions:

  • Before FROM. Example:

    SELECT * INTO @myvar FROM t1;
    
  • Before a trailing locking clause. Example:

    SELECT * FROM t1 INTO @myvar FOR UPDATE;
    
  • At the end of the SELECT. Example:

    SELECT * FROM t1 FOR UPDATE INTO @myvar;
    

The INTO position at the end of the statement is supported as of MySQL 8.0.20, and is the preferred position. The position before a locking clause is deprecated as of MySQL 8.0.20; expect support for it to be removed in a future version of MySQL. In other words, INTO after FROM but not at the end of the SELECT produces a warning.

Another quote from the docs:

The file name must be given as a literal string. On Windows, specify backslashes in path names as forward slashes or doubled backslashes.

# Typical Windows path with secure_file_priv set
'C:\\ProgramData\\MySQL\\MySQL Server 8.0\\Uploads\\file.csv'
# or
'C:/ProgramData/MySQL/MySQL Server 8.0/Uploads/file.csv'