Postgres Bulk load using control card


We have a case where we are loading data from a delimited flat file into a Postgres table.

File Content:

A|B|C
1.1|2016-12-20|3

I want to load only columns A, B & C, but the table has 10 columns.

In Oracle, SQL*Loader lets us use a control file to pick the fields we need and to manipulate them as required:

OPTIONS (DIRECT=TRUE, PARALLEL=FALSE)
LOAD DATA
INFILE FILENAME
APPEND
INTO TABLE TABLENAME
FIELDS TERMINATED BY '|'
TRAILING NULLCOLS
(
A DECIMAL EXTERNAL,
B "to_date(:B,'YYYY-MM-DD')",
C "case when REGEXP_LIKE(:C,'^[0-9.,]') then :C else '0' end",
)

We used the COPY command to load columns A, B & C. But the tricky part is that COPY cannot apply any manipulation to the source data the way SQL*Loader can. So we cannot use the COPY command directly, and I set that option aside.
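The usual workaround with plain COPY is to load the raw fields into a staging table and do the manipulation in SQL afterwards. A rough sketch with psycopg2 (the connection string, file path and target_table are made up for illustration):

import psycopg2

# Assumed connection string and table names; adjust to your environment.
conn = psycopg2.connect("dbname=mydb")
cur = conn.cursor()

# Stage the three raw fields as text, then transform in SQL.
cur.execute("CREATE TEMP TABLE staging (a text, b text, c text)")

with open("/path/to/input_data_file.txt") as f:
    next(f)  # skip the header line A|B|C
    cur.copy_expert("COPY staging FROM STDIN WITH (FORMAT csv, DELIMITER '|')", f)

# Roughly the same manipulation as the SQL*Loader control card above.
cur.execute("""
    INSERT INTO target_table (a, b, c)
    SELECT a::numeric,
           to_date(b, 'YYYY-MM-DD'),
           CASE WHEN c ~ '^[0-9.,]' THEN c ELSE '0' END
    FROM staging
""")
conn.commit()

This costs one extra write of the staged rows, which is the performance trade-off we were trying to avoid.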

Other option:

The pg_bulkload utility was much faster than COPY, and it also takes a control file. But the options are limited, as below:

#
# sample_csv.ctl -- Control file to load CSV input data
#
OUTPUT = table_name                   # [<schema_name>.]table_name
INPUT = /path/to/input_data_file.csv  # Input data location (absolute path)
TYPE = CSV                            # Input file type
QUOTE = "\""                          # Quoting character
ESCAPE = \                            # Escape character for Quoting
DELIMITER = ","                       # Delimiter

But I can't find any option to:

  • Specify which fields need to be loaded
  • Do manipulation as in the Oracle control file

Is there any way to accomplish the above scenario using pg_bulkload, with some reference? I can't find any other sources. I assume pg_bulkload is the fastest way to load the data; kindly correct me if I'm wrong.

Are any other options available without compromising performance? Kindly share your thoughts.


There are 2 answers

JosMac

Check pgloader: it can also load data from CSV-like files, and it can do conversions on the fly. There is a description and an example at http://pgloader.io/howto/csv.html
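For illustration, a pgloader command file for the sample data above might look roughly like this (the database name mydb and the table name target_table are assumptions; only the three wanted fields are listed):

LOAD CSV
     FROM '/path/to/input_data_file.txt' (a, b, c)
     INTO postgresql:///mydb?target_table (a, b, c)
     WITH skip header = 1,
          fields terminated by '|';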

Anand Rajakrishnan

I am using the copy_from utility from psycopg2, which uses PostgreSQL COPY. You can provide the input data via STDIN (a file-like object). This works pretty fast and is comparable with SQL*Loader in Oracle (I haven't tested it against SQL*Loader's direct-path load option). Otherwise, it works pretty well. https://www.psycopg.org/docs/cursor.html#cursor.copy_from

I use the below syntax:

cur.copy_from(dataIterator, schemaTable, sep=chr(31), null='', columns=objColList)

where,

  • dataIterator is a subclass of io.TextIOBase
  • schemaTable is the target table into which data is being loaded
  • sep is the column separator
  • objColList is the list of columns that need to be loaded in the target table
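Put together, a minimal self-contained sketch (the connection string, target_table and column names are made up; any io.TextIOBase subclass can replace the StringIO buffer):

import io
import psycopg2

conn = psycopg2.connect("dbname=mydb")  # assumed connection string
cur = conn.cursor()

# chr(31), the ASCII unit separator, rarely occurs in real data,
# which makes it a safe column delimiter.
buf = io.StringIO("1.1\x1f2016-12-20\x1f3\n")

# Load only columns a, b and c; the table's other columns keep their defaults.
cur.copy_from(buf, "target_table", sep=chr(31), null="", columns=("a", "b", "c"))
conn.commit()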