Using PostgreSQL with Pentaho

462 views Asked by At

I have created a DB join in one of my Pentaho Steps. I am shifting my environment from Oracle to PostgreSQL. Being totally new to PostgreSQL I am unable to get rid of the bugs. The join is as following:

select t.contract,
   t.calender,
   t.mspprovider,
   t.mspcustomer,
   t.objectname,
   t.granularity,
   t.timeperiod,
   cast(max(t.value) as numeric) as Value,
   null as correctedvalue,
   t.valueunit,
   max(t.violated) as violated,
   null as violatedcorr,
   t.severity,
   t.relation,
   t.target,
   t.targetunit,
   t.targetperiod,
   t.calculationtype,
   t.aggregationtype,
   current_date as createdate,
   current_date as modifydate
from (select a.contract,
           a.calender,
           a.mspprovider,
           a.mspcustomer,
           a.objectname,
           ? as granularity,
           ? as timeperiod,
           Round(Round((sum(a.value)) / sum(a.anzahl)) / 3600 + 0.0004, 3) - 0.001, 2) as value,
           *****'Hour'***** as valueunit
           *****(case
             when Round(Round((sum(a.value) / sum(a.anzahl)) / 3600 + 0.0004, 3) - 0.001, 2) > b.target Then
              'YES'
             Else
              'NO'
           END)** as violated***,
           a.severity,
           a.relation,
           b.target,
           b.targetunit,
           b.targetperiod,
           a.calculationtype as calculationtype,
           b.calculationtype as aggregationtype
      from (select a.contract,
                   a.calender,
                   a.mspprovider,
                   a.mspcustomer,
                   a.objectname,
                 cast(sum(a.durationmodified) as numeric) as Value,  
                   a.severity,
                   a.relation,
                   a.calculationType as calculationtype,
                   a.originatorid,
                   a.calculationversion,
                   count(*) as anzahl
              from result_slalom a,
                   (select a.contract,
                           a.calender,
                           a.mspprovider,
                           a.mspcustomer,
                           a.objectname,
                           a.eventid,
                           a.calculationtype,
                           max(a.calculationversion) as calculationversion
                      from result_slalom a
                     group by a.contract,
                              a.calender,
                              a.mspprovider,
                              a.mspcustomer,
                              a.objectname,
                              a.eventid,
                              a.calculationtype
                     order by 1 asc, 2 asc, 6 asc, 7 asc, 8 asc) b
             where a.endkeydate >= ?
               and a.endkeydate < ?
               and b.contract = a.contract
               and b.calender = a.calender
               and b.mspprovider = a.mspprovider
               and b.mspcustomer = a.mspcustomer
               and b.objectname = a.objectname
               and b.eventid = a.eventid
               and b.calculationtype = a.calculationtype
               and a.calculationversion = b.calculationversion
               and a.impact = 'YES'
             group by a.contract,
                      a.calender,
                      a.mspprovider,
                      a.mspcustomer,
                      a.objectname,
                      a.severity,
                      a.relation,
                      a.originatorid,
                      a.calculationtype,
                      a.calculationversion
             order by 1 asc, 2 asc, 7 asc, 9 asc) a,
           contract_target_mapping b
     where b.contract = a.contract
       and b.calender = a.calender
       and b.calculationtype = 'M' || a.calculationtype
       and b.severity = a.severity
       and b.target not in ('Dynamic')
     group by a.contract,
              a.calender,
              a.mspprovider,
              a.mspcustomer,
              a.objectname,
              a.severity,a.relation,
              b.target,
              b.targetunit,
              b.targetperiod,
              a.calculationtype,
              b.calculationtype 
    union
    select c.contract,
           c.calender,
           c.mspprovider,
           c.mspcustomer,
           o.objectname as objectname,
           ? as granularity,
           ? as timeperiod,
            null as value,
           'Hour' as valueunit,
           'NO' as violated,
           c.severity,
           c.relation,
           c.target,
           c.targetunit,
           c.targetperiod,
           SubStr(c.calculationtype, 2, 200) as calculationtype,
           'M' || SubStr(c.calculationtype, 2, 200) as aggregationtype
      from contract_target_mapping c
           ***Inner Join originator o on o.mspprovider = c.mspprovider and o.mspcustomer = c.mspcustomer
     where c.CalculationType in ('MTTA','MTTR') ) t***
 group by t.contract,
      t.calender,
      t.mspprovider,
      t.mspcustomer,
      t.objectname,
      t.granularity,
      t.timeperiod,
      t.valueunit,
      t.severity,
      t.relation,
      t.target,
      t.targetunit,
      t.targetperiod,
      t.calculationtype,
      t.aggregationtype


order by 1 asc, 2 asc, 12 asc

This join works fine in Oracle. Tried using it in PostgreSQL but it throws a few errors and the errors are thrown at the following places (Also marked in Bold in the query) :

  1. 'Hour' as valueunit,

  2. (case when Round(Round((sum(a.value) / sum(a.anzahl)) / 3600 + 0.0004, 3) - 0.001, 2) > b.target Then 'YES' Else 'NO' END) as violated,

  3. Inner Join originator o on o.mspprovider = c.mspprovider and o.mspcustomer = c.mspcustomer where c.CalculationType in ('MTTA','MTTR') ) t

I haven't proceeded further as I am stuck onto these errors. Please feel free to point out any other errors if you see. Any help would be much appreciated. Thanks in advance.

1

There are 1 answers

0
AlainD On BEST ANSWER

I understand that it is quick and easy to copy/paste a query in the input table field.

However, the input table step is an implementation not an abstraction, meaning that your are dependent of the sql dialect, and also on more hidden petty things like reserved words [value just before 'Hours'], scope [violated is defined twice], double quotes instead of simple, date format, etc...

If you want to use kettle as an abstraction, do nothing more than a "SELECT FROM table WHERE ORDER BY" in the input table steps and do the logic in Kettle.

After a while you'll find it is simpler and, on complex queries like yours, sometimes faster. Although, I do admit, a copy/paste is tempting for a quick migration.

Also replace the ? with ${parameter}, where parameter is the name of a field defined in a previous step. It's much simpler to maintain and, you'll discover, in a few month that it is useful to reuse the transformation in loops and complex jobs.