I have data for every month of 2019, but only through September for 2020. Each row contains a MonthNo. value, corresponding to the calendar month, and a user ID entry. It looks like this:
| Month | Year | ID | MonthNo. |
|-----------|------|--------|----------|
| January | 2019 | 611330 | 01 |
| January | 2019 | 174519 | 01 |
| January | 2019 | 380747 | 01 |
| February | 2019 | 882347 | 02 |
| February | 2019 | 633797 | 02 |
| February | 2019 | 863219 | 02 |
| March | 2019 | 189924 | 03 |
| March | 2019 | 241922 | 03 |
| March | 2019 | 563335 | 03 |
| April | 2019 | 648660 | 04 |
| April | 2019 | 363710 | 04 |
| April | 2019 | 606284 | 04 |
| May | 2019 | 296508 | 05 |
| May | 2019 | 287650 | 05 |
| May | 2019 | 599909 | 05 |
| June | 2019 | 513844 | 06 |
| June | 2019 | 891633 | 06 |
| June | 2019 | 138250 | 06 |
| July | 2019 | 126235 | 07 |
| July | 2019 | 853840 | 07 |
| July | 2019 | 713104 | 07 |
| August | 2019 | 180511 | 08 |
| August | 2019 | 451735 | 08 |
| August | 2019 | 818095 | 08 |
| September | 2019 | 512621 | 09 |
| September | 2019 | 674079 | 09 |
| September | 2019 | 914015 | 09 |
| October | 2019 | 132859 | 10 |
| October | 2019 | 560572 | 10 |
| October | 2019 | 272557 | 10 |
| November | 2019 | 984001 | 11 |
| November | 2019 | 815688 | 11 |
| November | 2019 | 902748 | 11 |
| December | 2019 | 880285 | 12 |
| December | 2019 | 167629 | 12 |
| December | 2019 | 772039 | 12 |
| January | 2020 | 116886 | 01 |
| January | 2020 | 386078 | 01 |
| February | 2020 | 291060 | 02 |
| February | 2020 | 970032 | 02 |
| March | 2020 | 907555 | 03 |
| March | 2020 | 560827 | 03 |
| April | 2020 | 938039 | 04 |
| April | 2020 | 721640 | 04 |
| May | 2020 | 131719 | 05 |
| May | 2020 | 415596 | 05 |
| June | 2020 | 589375 | 06 |
| June | 2020 | 623663 | 06 |
| July | 2020 | 577748 | 07 |
| July | 2020 | 999572 | 07 |
| August | 2020 | 630975 | 08 |
| August | 2020 | 442278 | 08 |
| September | 2020 | 993318 | 09 |
| September | 2020 | 413214 | 09 |
This example table has exactly 3 records for every month in 2019 and exactly 2 records for every month in 2020. So when I add a calculated field called MonthNotYearTraffic, defined by
// Averages ID count by month number only, intentionally ignoring year.
avgOver(count(ID), [{MonthNo.}])
I expect the following results
| MonthNo. | MonthNotYearTraffic |
|----------|---------------------|
| 01 | 2.5 |
| 02 | 2.5 |
| 03 | 2.5 |
| 04 | 2.5 |
| 05 | 2.5 |
| 06 | 2.5 |
| 07 | 2.5 |
| 08 | 2.5 |
| 09 | 2.5 |
| 10 | 3 |
| 11 | 3 |
| 12 | 3 |
since months 10-12 only have the three abovementioned 2019 entries. But instead, the results are:
I've tried several different approaches, alone and in combination (some of which I know to be insane, others I'm unsure about):
- at first not relying on custom, calculated fields
- by partitioning on both month and year in the calculated field definition
- by messing with level aware aggregations
- by ensuring that the fields being aggregated by are strings/dimensions
No dice.
This seems like it should be a straightforward technique, so any pointers would be nice. Thank you.
It looks as if you need to partition the count of your IDs by month and then divide that count by the count of years in which you have user IDs in that month.
Using your sample data I was able to get your desired output.
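To make the arithmetic concrete, here is a minimal Python sketch of that idea (not the QuickSight expression itself): count the IDs per month number, then divide by the number of distinct years that have any rows for that month. The `rows` list and the `month_not_year_traffic` helper are hypothetical names made up for illustration; the data only mirrors the shape of the sample table (3 rows per month in 2019, 2 rows per month for January through September 2020), not the actual ID values.

```python
from collections import defaultdict

# Hypothetical stand-in for the sample table: one (year, month_no) pair per ID row.
# 2019 has 3 rows for each of the 12 months; 2020 has 2 rows for months 1-9.
rows = [(2019, m) for m in range(1, 13) for _ in range(3)]
rows += [(2020, m) for m in range(1, 10) for _ in range(2)]


def month_not_year_traffic(rows):
    """Per month number: total row count divided by the number of distinct years."""
    id_count = defaultdict(int)   # month_no -> number of ID rows across all years
    years_seen = defaultdict(set)  # month_no -> set of years with data for that month
    for year, month_no in rows:
        id_count[month_no] += 1
        years_seen[month_no].add(year)
    return {
        f"{month_no:02d}": id_count[month_no] / len(years_seen[month_no])
        for month_no in sorted(id_count)
    }


result = month_not_year_traffic(rows)
for month_no, traffic in result.items():
    print(month_no, traffic)
```

For months 01 through 09 this gives (3 + 2) / 2 = 2.5, and for months 10 through 12 it gives 3 / 1 = 3, which matches the expected table in the question. In QuickSight the same two pieces would be a count of ID partitioned by MonthNo. and a distinct count of Year over the same partition.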