Postgres-pgloader-transformation in columns

1.7k views Asked by At

Loading flat file to postgres table.I need to do few transformations while reading the file and load it.

Like
-->Check for characters, if it is present, default some value. Reg_Exp can be used in oracle. How the functions can be called in below syntax?
-->TO_DATE function from text format
-->Check for Null and defaulting some value
-->Trim functions
-->Only few columns from source file should be loaded
-->Defaulting values, say for instance, source file has only 3 columns. But we need to load 4 columns. One column should be defaulted with some value

LOAD CSV
    FROM 'filename'
    INTO postgresql://role@host:port/database_name?tablename
        TARGET COLUMNS  
            (  
                alphanm,alphnumnn,nmrc,dte
            ) 

    WITH truncate,  
         skip header = 0,  
         fields optionally enclosed by '"',  
         fields escaped by double-quote,  
         fields terminated by '|',
         batch rows = 100,
         batch size = 1MB,     
         batch concurrency = 64

    SET work_mem to '32 MB', maintenance_work_mem to '64 MB';

Kindly help me, how this can be accomplished used pgloader?

Thanks

1

There are 1 answers

0
Dimitri Fontaine On

Here's a self-contained test case for pgloader that reproduces your use-case, as best as I could understand it:

/*  
  Sorry pgloader version "3.3.2" compiled with SBCL 1.2.8-1.el7 Doing kind
  of POC, to implement in real time work. Sample data from file:
  raj|B|0.5|20170101|ABCD Need to load only first,second,third and fourth
  column; Table has three column, third column should be defaulted with some
  value. Table structure: A B C-numeric D-date E-(Need to add default value)
*/

LOAD CSV
     FROM inline
        (
            alphanm,
            alphnumnn,
            nmrc,
            dte [date format 'YYYYMMDD'],
            other
        )
     INTO postgresql:///pgloader?so.raja
        (
            alphanm,
            alphnumnn,
            nmrc,
            dte,
            col text using "constant value"
        )

     WITH truncate,
          fields optionally enclosed by '"',
          fields escaped by double-quote,
          fields terminated by '|'

      SET work_mem to '12MB',
          standard_conforming_strings to 'on'

   BEFORE LOAD DO
   $$ drop table if exists so.raja; $$,
   $$ create table so.raja (
       alphanm   text,
       alphnumnn text,
       nmrc      numeric,
       dte       date,
       col       text
      );
   $$;

raj|B|0.5|20170101|ABCD

Now here's the extract from running the pgloader command:

$ pgloader 41287414.load 
2017-08-15T12:35:10.258000+02:00 LOG Main logs in '/private/tmp/pgloader/pgloader.log'
2017-08-15T12:35:10.261000+02:00 LOG Data errors in '/private/tmp/pgloader/'
2017-08-15T12:35:10.261000+02:00 LOG Parsing commands from file #P"/Users/dim/dev/temp/pgloader-issues/stackoverflow/41287414.load"
2017-08-15T12:35:10.422000+02:00 LOG report summary reset
             table name       read   imported     errors      total time
-----------------------  ---------  ---------  ---------  --------------
                  fetch          0          0          0          0.007s 
            before load          2          2          0          0.016s 
-----------------------  ---------  ---------  ---------  --------------
                so.raja          1          1          0          0.019s 
-----------------------  ---------  ---------  ---------  --------------
        Files Processed          1          1          0          0.021s 
COPY Threads Completion          2          2          0          0.038s 
-----------------------  ---------  ---------  ---------  --------------
      Total import time          1          1          0          0.426s 

And here's the content of the target table when the command is done:

$ psql -q -d pgloader -c 'table so.raja'
 alphanm │ alphnumnn │ nmrc │    dte     │      col       
═════════╪═══════════╪══════╪════════════╪════════════════
 raj     │ B         │  0.5 │ 2017-01-01 │ constant value
(1 row)