I have created a DB join in one of my Pentaho Steps. I am shifting my environment from Oracle to PostgreSQL. Being totally new to PostgreSQL I am unable to get rid of the bugs. The join is as following:
select t.contract,
t.calender,
t.mspprovider,
t.mspcustomer,
t.objectname,
t.granularity,
t.timeperiod,
cast(max(t.value) as numeric) as Value,
null as correctedvalue,
t.valueunit,
max(t.violated) as violated,
null as violatedcorr,
t.severity,
t.relation,
t.target,
t.targetunit,
t.targetperiod,
t.calculationtype,
t.aggregationtype,
current_date as createdate,
current_date as modifydate
from (select a.contract,
a.calender,
a.mspprovider,
a.mspcustomer,
a.objectname,
? as granularity,
? as timeperiod,
Round(Round((sum(a.value)) / sum(a.anzahl)) / 3600 + 0.0004, 3) - 0.001, 2) as value,
*****'Hour'***** as valueunit
*****(case
when Round(Round((sum(a.value) / sum(a.anzahl)) / 3600 + 0.0004, 3) - 0.001, 2) > b.target Then
'YES'
Else
'NO'
END)** as violated***,
a.severity,
a.relation,
b.target,
b.targetunit,
b.targetperiod,
a.calculationtype as calculationtype,
b.calculationtype as aggregationtype
from (select a.contract,
a.calender,
a.mspprovider,
a.mspcustomer,
a.objectname,
cast(sum(a.durationmodified) as numeric) as Value,
a.severity,
a.relation,
a.calculationType as calculationtype,
a.originatorid,
a.calculationversion,
count(*) as anzahl
from result_slalom a,
(select a.contract,
a.calender,
a.mspprovider,
a.mspcustomer,
a.objectname,
a.eventid,
a.calculationtype,
max(a.calculationversion) as calculationversion
from result_slalom a
group by a.contract,
a.calender,
a.mspprovider,
a.mspcustomer,
a.objectname,
a.eventid,
a.calculationtype
order by 1 asc, 2 asc, 6 asc, 7 asc, 8 asc) b
where a.endkeydate >= ?
and a.endkeydate < ?
and b.contract = a.contract
and b.calender = a.calender
and b.mspprovider = a.mspprovider
and b.mspcustomer = a.mspcustomer
and b.objectname = a.objectname
and b.eventid = a.eventid
and b.calculationtype = a.calculationtype
and a.calculationversion = b.calculationversion
and a.impact = 'YES'
group by a.contract,
a.calender,
a.mspprovider,
a.mspcustomer,
a.objectname,
a.severity,
a.relation,
a.originatorid,
a.calculationtype,
a.calculationversion
order by 1 asc, 2 asc, 7 asc, 9 asc) a,
contract_target_mapping b
where b.contract = a.contract
and b.calender = a.calender
and b.calculationtype = 'M' || a.calculationtype
and b.severity = a.severity
and b.target not in ('Dynamic')
group by a.contract,
a.calender,
a.mspprovider,
a.mspcustomer,
a.objectname,
a.severity,a.relation,
b.target,
b.targetunit,
b.targetperiod,
a.calculationtype,
b.calculationtype
union
select c.contract,
c.calender,
c.mspprovider,
c.mspcustomer,
o.objectname as objectname,
? as granularity,
? as timeperiod,
null as value,
'Hour' as valueunit,
'NO' as violated,
c.severity,
c.relation,
c.target,
c.targetunit,
c.targetperiod,
SubStr(c.calculationtype, 2, 200) as calculationtype,
'M' || SubStr(c.calculationtype, 2, 200) as aggregationtype
from contract_target_mapping c
***Inner Join originator o on o.mspprovider = c.mspprovider and o.mspcustomer = c.mspcustomer
where c.CalculationType in ('MTTA','MTTR') ) t***
group by t.contract,
t.calender,
t.mspprovider,
t.mspcustomer,
t.objectname,
t.granularity,
t.timeperiod,
t.valueunit,
t.severity,
t.relation,
t.target,
t.targetunit,
t.targetperiod,
t.calculationtype,
t.aggregationtype
order by 1 asc, 2 asc, 12 asc
This join works fine in Oracle. Tried using it in PostgreSQL but it throws a few errors and the errors are thrown at the following places (Also marked in Bold in the query) :
'Hour' as valueunit,
(case when Round(Round((sum(a.value) / sum(a.anzahl)) / 3600 + 0.0004, 3) - 0.001, 2) > b.target Then 'YES' Else 'NO' END) as violated,
Inner Join originator o on o.mspprovider = c.mspprovider and o.mspcustomer = c.mspcustomer where c.CalculationType in ('MTTA','MTTR') ) t
I haven't proceeded further as I am stuck onto these errors. Please feel free to point out any other errors if you see. Any help would be much appreciated. Thanks in advance.
I understand that it is quick and easy to copy/paste a query in the input table field.
However, the input table step is an implementation not an abstraction, meaning that your are dependent of the sql dialect, and also on more hidden petty things like reserved words [value just before 'Hours'], scope [violated is defined twice], double quotes instead of simple, date format, etc...
If you want to use kettle as an abstraction, do nothing more than a "SELECT FROM table WHERE ORDER BY" in the input table steps and do the logic in Kettle.
After a while you'll find it is simpler and, on complex queries like yours, sometimes faster. Although, I do admit, a copy/paste is tempting for a quick migration.
Also replace the ? with
${parameter}
, where parameter is the name of a field defined in a previous step. It's much simpler to maintain and, you'll discover, in a few month that it is useful to reuse the transformation in loops and complex jobs.