We have a case where we are loading data from a flat file into a Postgres table. It is a pipe-delimited file.
File Content:
A|B|C
1.1|2016-12-20|3
I want to load only columns A, B, and C, but the table has 10 columns.
In Oracle, using SQL*Loader, we can use a control file to pick only the fields needed and do manipulation as required:
OPTIONS (DIRECT=TRUE, PARALLEL=FALSE)
LOAD DATA
INFILE FILENAME
APPEND
INTO TABLE TABLENAME
FIELDS TERMINATED BY '|'
TRAILING NULLCOLS
(
A DECIMAL EXTERNAL,
B "to_date(:B,'YYYY-MM-DD')",
C "case when REGEXP_LIKE(:C,'^[0-9.,]') then :C else '0' end"
)
I used the COPY command to load columns A, B, and C. But the tricky part here is that COPY cannot do any manipulation of the source data the way SQL*Loader can, so I left that option.
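The closest workaround I found is to COPY the raw fields into a staging table and do the manipulation in SQL afterwards. A minimal sketch, with a staging table and file path of my own naming (add casts to match the real column types):

-- staging table holding the three raw fields as text
CREATE UNLOGGED TABLE staging (a text, b text, c text);

-- COPY can pick target columns, but cannot transform values
-- (use \copy in psql if the file sits on the client side)
COPY staging (a, b, c)
FROM '/path/to/input_data_file'
WITH (FORMAT csv, DELIMITER '|', HEADER true);

-- manipulation equivalent to the SQL*Loader expressions above
INSERT INTO tablename (a, b, c)
SELECT a::numeric,
       to_date(b, 'YYYY-MM-DD'),
       CASE WHEN c ~ '^[0-9.,]' THEN c ELSE '0' END
FROM staging;

That loads correctly, but it writes the data twice, which is the other reason I set COPY aside.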
Other option:
The pg_bulkload utility was much faster than COPY, and it also uses a control file. But its options are limited, like below:
#
# sample_csv.ctl -- Control file to load CSV input data
#
OUTPUT = table_name # [<schema_name>.]table_name
INPUT = /path/to/input_data_file.csv # Input data location (absolute path)
TYPE = CSV # Input file type
QUOTE = "\"" # Quoting character
ESCAPE = \ # Escape character for Quoting
DELIMITER = "," # Delimiter
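For reference, the utility is then run against that control file; as I read the documentation, the invocation is something like the following (the -d flag for the database name is my assumption from the docs):

$ pg_bulkload sample_csv.ctl -d dbname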
But I can't find any option to
- Specify the fields required to be loaded
- Do manipulation as in the Oracle control file
Is there any way to accomplish the above scenario using pg_bulkload, with some reference? I can't find any other sources. I assume pg_bulkload is the fastest way to load the data; kindly correct me if I'm wrong.
Are any other options available that don't compromise performance? Kindly provide me some thoughts on it.
Check pgloader - it can also load data from CSV-like files, and it can do conversions on the fly. A description and an example are here: http://pgloader.io/howto/csv.html
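For this particular file, a pgloader command file could be a sketch like the one below (the connection string and table name are placeholders; per-column conversions can be added in TARGET COLUMNS with a using (...) expression, as the linked howto shows):

LOAD CSV
     FROM '/path/to/input_data_file'
          HAVING FIELDS (a, b, c)
     INTO postgresql:///dbname?tablename
          TARGET COLUMNS (a, b, c)
     WITH skip header = 1,
          fields terminated by '|';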