Hive query to return single row based on eff and exp date

186 views Asked by At

I have a table with the following data.

enter image description here

I am expecting row which needs to be returned is with exp_dt "2020-09-22". But when run below query it returning both the rows. I am not understanding why it is returning the first row also when it has eff_dt "2020-09-19".

select id,cd,eff_dt,exp_dt,post_dt from table 
where from_unixtime(unix_timestamp(eff_dt,"yyyy-MM-dd")) <= from_unixtime(unix_timestamp("2020-09-21","yyyy-MM-dd"))
and from_unixtime(unix_timestamp(exp_dt,"yyyy-MM-dd")) >= from_unixtime(unix_timestamp("2020-09-21","yyyy-MM-dd"));

Is there any issue with my query? I am expecting 2nd row to be returned.

3

There are 3 answers

2
Gordon Linoff On BEST ANSWER

Use < for the comparison to exp_date:

select id,cd,eff_dt,exp_dt,post_dt
from table 
where from_unixtime(unix_timestamp('2020-09-21', 'yyyy-MM-dd')) >= from_unixtime(unix_timestamp(eff_dt, 'yyyy-MM-dd')) and
      from_unixtime(unix_timestamp('2020-09-22', 'yyyy-MM-dd')) < from_unixtime(unix_timestamp(exp_dt, 'yyyy-MM-dd'))

I reversed the comparison order. I find it easier to follow the logic with the constants first.

0
leftjoin On

You do not need from_unixtime(unix_timestamp()) because dates are already in correct format and argument is in the same yyyy-MM-dd format.

The issue in your query is that you are using equal for both eff and exp dates To find latest record on date use this query:

select id,cd,eff_dt,exp_dt,post_dt from table 
where eff_dt <= "2020-09-21"
  and exp_dt >  "2020-09-21";

There should be no records when eff_dt = exp_dt in SCD2 if you have only date (without time component). dates can be equal only if you are using timestamps, and time is different, in this case convert your argument date to timestamp before checking.

SCD2 should be designed in such way that fact record can be mapped to exactly one record of SCD2.

0
mck On

Does this capture the edge case of equal same day expiry and solve your problem at the same time?

select id,cd,eff_dt,exp_dt,post_dt from table 
where 
    (from_unixtime(unix_timestamp(eff_dt,"yyyy-MM-dd")) <= from_unixtime(unix_timestamp("2020-09-21","yyyy-MM-dd"))
     and
     from_unixtime(unix_timestamp(exp_dt,"yyyy-MM-dd")) > from_unixtime(unix_timestamp("2020-09-21","yyyy-MM-dd"))
    )
    or
    (from_unixtime(unix_timestamp(eff_dt,"yyyy-MM-dd")) = from_unixtime(unix_timestamp("2020-09-21","yyyy-MM-dd"))
     and
     from_unixtime(unix_timestamp(exp_dt,"yyyy-MM-dd")) = from_unixtime(unix_timestamp("2020-09-21","yyyy-MM-dd"))
    )
;

In fact I suspect exp is always >= eff, so maybe only one condition

from_unixtime(unix_timestamp(eff_dt,"yyyy-MM-dd")) <= from_unixtime(unix_timestamp("2020-09-21","yyyy-MM-dd"))

is enough...?