Calculating Variance from different data Sources Oracle SQL

579 views Asked by At

I am attempting to create a list of variances based on data that I get from two difference sources. This data contains a date, a series of references and columns containing numeric counts etc.

The idea behind this is to check that the data from Data Source 1 has the same numeric count as the data from Data Source 2, and then logging the variances of each 5 minute interval.

Here I have the required code to create the table and sample data of a simplified scenario

Create Table ABP_PROFILE 
(  ABP_DATE            Date          not Null, 
  ABP_SOURCE_UID      Number(10)  not Null, 
  ABP_REFERENCE_1     Varchar2(30)  not Null, 
  ABP_CHARGE          Number(18,6), 
  ABP_COUNT           Number(18)
);

insert into ABP_PROFILE (ABP_DATE, ABP_SOURCE_UID, ABP_REFERENCE_1, ABP_CHARGE, ABP_COUNT)
values (to_date('15-06-2015 00:05:00', 'dd-mm-yyyy hh24:mi:ss'), 1, 'Another Reference', 757.500000, 101);

insert into ABP_PROFILE (ABP_DATE, ABP_SOURCE_UID, ABP_REFERENCE_1, ABP_CHARGE, ABP_COUNT)
values (to_date('15-06-2015 00:05:00', 'dd-mm-yyyy hh24:mi:ss'), 1, 'Some Reference', 2954.000000, 211);

insert into ABP_PROFILE (ABP_DATE, ABP_SOURCE_UID, ABP_REFERENCE_1, ABP_CHARGE, ABP_COUNT)
values (to_date('15-06-2015 00:05:00', 'dd-mm-yyyy hh24:mi:ss'), 2, 'Another Reference', 757.500000, 101);

insert into ABP_PROFILE (ABP_DATE, ABP_SOURCE_UID, ABP_REFERENCE_1, ABP_CHARGE, ABP_COUNT)
values (to_date('15-06-2015 00:05:00', 'dd-mm-yyyy hh24:mi:ss'), 2, 'Some Reference', 2954.000000, 211);

insert into ABP_PROFILE (ABP_DATE, ABP_SOURCE_UID, ABP_REFERENCE_1, ABP_CHARGE, ABP_COUNT)
values (to_date('15-06-2015 00:10:00', 'dd-mm-yyyy hh24:mi:ss'), 1, 'Another Reference', 5300.250000, 191);

insert into ABP_PROFILE (ABP_DATE, ABP_SOURCE_UID, ABP_REFERENCE_1, ABP_CHARGE, ABP_COUNT)
values (to_date('15-06-2015 00:10:00', 'dd-mm-yyyy hh24:mi:ss'), 1, 'Some Reference', 9568.000000, 208);

insert into ABP_PROFILE (ABP_DATE, ABP_SOURCE_UID, ABP_REFERENCE_1, ABP_CHARGE, ABP_COUNT)
values (to_date('15-06-2015 00:10:00', 'dd-mm-yyyy hh24:mi:ss'), 2, 'Another Reference', 5300.250000, 5555);

insert into ABP_PROFILE (ABP_DATE, ABP_SOURCE_UID, ABP_REFERENCE_1, ABP_CHARGE, ABP_COUNT)
values (to_date('15-06-2015 00:10:00', 'dd-mm-yyyy hh24:mi:ss'), 2, 'Some Reference', 1111.000000, 208);

Here I have created a BASIC SQL query to how what it is I want to do.

With A_DATA As (
  Select  ABP_DATE              As A_DATE,
          ABP_REFERENCE_1       As A_REFERENCE_1, 
          ABP_CHARGE            As A_CHARGE, 
          ABP_COUNT             As A_COUNT
  From    ABP_PROFILE           
  Where   ABP_SOURCE_UID = 1
), B_DATA As (
  Select  ABP_DATE              As B_DATE,
          ABP_REFERENCE_1       As B_REFERENCE_1, 
          ABP_CHARGE            As B_CHARGE, 
          ABP_COUNT             As B_COUNT
  From    ABP_PROFILE           
  Where   ABP_SOURCE_UID = 2
)
Select  A_DATE,
        A_REFERENCE_1,
        B_CHARGE  - A_CHARGE  As ChargeDifference,
        B_COUNT   - A_COUNT   As CountDifference
From    A_DATA,
        B_DATA
Where   A_DATE        = B_DATE
And     A_REFERENCE_1 = B_REFERENCE_1
;

This does a join based on the Date and Reference from the two data sources. I need to have a much more versatile solution which needs to also show variances if data from one side is missing, yes a full outer join can be used for this, but I want to explore other options.

I've been looking into Analytic Functions and I am sure there is one out there which can do what I want it to do. I'm wondering if any Oracle SQL Experts have any ideas that can help here.

FYI I am running 11gR2 Enterprise

1

There are 1 answers

1
Ponder Stibbons On BEST ANSWER

Solution with FULL JOIN seems to be more readable, but if you search for alternative - here it is - with functions lag() and lead():

with data as (
  select abp_date dt, abp_source_uid id, abp_reference_1 ref, 
      abp_charge charge, abp_count cnt,
      lag(abp_source_uid) over (partition by abp_date, abp_reference_1 
                                order by abp_source_uid) lgid,
      lead(abp_charge)    over (partition by abp_date, abp_reference_1 
                                order by abp_source_uid) ldcharge,
      lead( abp_count)    over (partition by abp_date, abp_reference_1 
                                order by abp_source_uid) ldcnt
    from abp_profile a)
select dt, ref, 
    case when id = 1 then nvl(ldcharge, 0) - charge else charge end chrg_diff,
    case when id = 1 then nvl(ldcnt, 0) - cnt else cnt end cnt_diff
  from data
  where id = 1 or id = 2 and lgid is null;

... and your modified query transformed to full join version, which I made to compare results:

With A_DATA As (
  Select  ABP_DATE              As A_DATE,
          ABP_REFERENCE_1       As A_REFERENCE_1, 
          ABP_CHARGE            As A_CHARGE, 
          ABP_COUNT             As A_COUNT
  From    ABP_PROFILE           
  Where   ABP_SOURCE_UID = 1
), B_DATA As (
  Select  ABP_DATE              As B_DATE,
          ABP_REFERENCE_1       As B_REFERENCE_1, 
          ABP_CHARGE            As B_CHARGE, 
          ABP_COUNT             As B_COUNT
  From    ABP_PROFILE           
  Where   ABP_SOURCE_UID = 2
)
Select  nvl(A_DATE, b_date) dt,
        nvl(A_REFERENCE_1, b_reference_1) ref,
        nvl(B_CHARGE, 0) - nvl(A_CHARGE, 0)  As Chrg_Diff,
        nvl(B_COUNT, 0)   - nvl(A_COUNT, 0)  As Cnt_Diff
From    A_DATA 
full join B_DATA on A_DATE = B_DATE and A_REFERENCE_1 = B_REFERENCE_1;

SQLFiddle

Both queries qives same results. In examples I added two rows to show how to deal with missing data. Here I used nvl(..., 0) but of course you can leave nulls or add column(s) informing of such situation.