joining two dataset to get optimal result

68 views Asked by At

Thanks in advance! this is little long question but hopefully it is clear and concise.

I have two different query bringing two data sets.

both datasets are bringing data in same grain for similar area.

first data set has multiple snapshots per month of a season , so it have three different metrics value for each season for same grain for a season.

Dataset 1 ( country , season , month , metric is column names ) (1. A , autumn, 9/1, 1 is the record for column)

country season month metric

  1. A, autumn , 9/1 , 1
  2. A, autumn , 10/1 , 2
  3. A, autumn , 11/1 , 3

where as the second data set has one snapshots (just one month out of season) for a season, so it will have just one snapshot per season.

Dataset 2

country season month metric

  1. A , autumn , 9/1 , 4

my current approach of full outer join is giving me the following output

country season month metric1 metric2

  1. A, autumn, 9/1, 1, 4

  2. A, autumn, 10/1, 2,

  3. A, autumn , 11/1, 3 ,

**Notice the metric2 for 10/1 and 11/1 are blank or zero if use nvl but what we need is to populate same value 4 for these two months as well how can we achieve this

or FINAL OUTPUT should look something like this

country season month metric1 metric2

  1. A, autumn, 9/1, 1, 4

  2. A, autumn, 10/1, 2, 4

  3. A, autumn , 11/1, 3 , 4

3

There are 3 answers

0
Somy On

Change the join condition.

Instead of just using

Dataset1.country = Dataset2.country 
and Dataset1.month = Dataset2.month

Use the conditions in join as -

Dataset1.country = Dataset2.country 
and Dataset1.month >= Dataset2.month

Now I don’t know about the data type of month. I have put the logic but to actually compare you may need to do type casting as required for proper comparison using >=

0
user9601310 On

I'm not familiar with Snowflake at all but assuming it uses fairly standard SQL, this should be as simple as:

SELECT a.country, a.season, a.month, a.metric AS metric1, b.metric AS metric2
FROM dataset1 a
INNER JOIN dataset2 b
ON a.country = b.country AND a.season = b.season

i.e. It shouldn't need a full outer join and should need to match only on country and season. P.S. It would help if you showed the SQL you've tried as part of your question.

0
Gordon Linoff On

Use a left join and a window function. Here is one method:

SELECT d1.country, d1.season, d1.month, d1.metric AS metric1,
       COALESCE(d2.metric,
                LAG(d2.metric IGNORE NULLS) OVER (PARTITION BY d1.country ORDER BY d1.month)
               ) AS metric2
FROM dataset1 d1 LEFT JOIN
     dataset2 d2
     ON d2.country = d1.country AND d2.season = d1.season;

You don't specify the exact logic for imputing the value, so this takes the most recent non-NULL value.