I am trying to split one record into multiple records based on start and end date and a value column in hive Below is the same input

id      startdate       enddate     value
1       01/02/2017      10/02/2017  1000
2       01/02/2019      02/02/2019  5000

Sample output

id      startdate       enddate     value
1       01/02/2017      01/31/2017  100
1       02/02/2017      02/28/2017  100
1       03/02/2017      03/31/2017  100
1       04/02/2017      04/30/2017  100
1       05/02/2017      05/31/2017  100
1       06/02/2017      06/30/2017  100
1       07/02/2017      07/31/2017  100
1       08/02/2017      08/31/2017  100
1       09/02/2017      09/30/2017  100
1       10/02/2017      10/02/2017  100
2       01/02/2019      01/31/2019  2500
2       01/02/2019      02/02/2019  2500

I have a table data with columns id, startdate,enddate and value. For each record, I am trying to split it on a monthly basis if there is difference of 10 months between start and end date then one record should be converted to 10 records. How can we do this in hive? Appreciate help

1 Answers

0
Vijiy On

Can you try below.

with dt_str as (select explode(split('01/02/2017,02/02/2017,03/02/2017,04/02/2017,05/02/2017,06/02/2017,07/02/2017,08/02/2017,09/02/2017,10/02/2017', ',')) as date1), -- this should be your date table where you should have all the dates

ip_rec as (select 1 as id, '01/02/2017' as start_dt, '10/02/2017' as end_dt, 1000 as value),

res1 as (select t1.*, t2.date1 from ip_rec t1, dt_str t2 where  
from_unixtime(unix_timestamp(t2.date1, 'MM/dd/yyyy'), 'yyyyMMdd')>= from_unixtime(unix_timestamp(t1.start_dt, 'MM/dd/yyyy'), 'yyyyMMdd') and from_unixtime(unix_timestamp(t2.date1, 'MM/dd/yyyy'), 'yyyyMMdd')<= from_unixtime(unix_timestamp(t1.end_dt, 'MM/dd/yyyy'), 'yyyyMMdd')),

res2 as (select id, count(*) cnt from res1 group by id) -- get number of records for each id

select t1.id, t1.date1 as start_date, last_day(from_unixtime(unix_timestamp(t1.date1, 'MM/dd/yyyy'), 'yyyy-MM-dd')) as end_date, value/cnt from res1 t1 inner join res2 t2 on t1.id=t2.id;

Results -

OK
1       01/02/2017      2017-01-31      100.0
1       02/02/2017      2017-02-28      100.0
1       03/02/2017      2017-03-31      100.0
1       04/02/2017      2017-04-30      100.0
1       05/02/2017      2017-05-31      100.0
1       06/02/2017      2017-06-30      100.0
1       07/02/2017      2017-07-31      100.0
1       08/02/2017      2017-08-31      100.0
1       09/02/2017      2017-09-30      100.0
1       10/02/2017      2017-10-31      100.0
Time taken: 52.142 seconds, Fetched: 10 row(s)

Hope this helps.