avgOver in QuickSight

I have data for every month of 2019 but only through September of 2020. Each row contains a MonthNo. value, corresponding to the calendar month, and a user ID. It looks like this:

| Month     | Year | ID     | MonthNo. |
|-----------|------|--------|----------|
| January   | 2019 | 611330 | 01       |
| January   | 2019 | 174519 | 01       |
| January   | 2019 | 380747 | 01       |
| February  | 2019 | 882347 | 02       |
| February  | 2019 | 633797 | 02       |
| February  | 2019 | 863219 | 02       |
| March     | 2019 | 189924 | 03       |
| March     | 2019 | 241922 | 03       |
| March     | 2019 | 563335 | 03       |
| April     | 2019 | 648660 | 04       |
| April     | 2019 | 363710 | 04       |
| April     | 2019 | 606284 | 04       |
| May       | 2019 | 296508 | 05       |
| May       | 2019 | 287650 | 05       |
| May       | 2019 | 599909 | 05       |
| June      | 2019 | 513844 | 06       |
| June      | 2019 | 891633 | 06       |
| June      | 2019 | 138250 | 06       |
| July      | 2019 | 126235 | 07       |
| July      | 2019 | 853840 | 07       |
| July      | 2019 | 713104 | 07       |
| August    | 2019 | 180511 | 08       |
| August    | 2019 | 451735 | 08       |
| August    | 2019 | 818095 | 08       |
| September | 2019 | 512621 | 09       |
| September | 2019 | 674079 | 09       |
| September | 2019 | 914015 | 09       |
| October   | 2019 | 132859 | 10       |
| October   | 2019 | 560572 | 10       |
| October   | 2019 | 272557 | 10       |
| November  | 2019 | 984001 | 11       |
| November  | 2019 | 815688 | 11       |
| November  | 2019 | 902748 | 11       |
| December  | 2019 | 880285 | 12       |
| December  | 2019 | 167629 | 12       |
| December  | 2019 | 772039 | 12       |
| January   | 2020 | 116886 | 01       |
| January   | 2020 | 386078 | 01       |
| February  | 2020 | 291060 | 02       |
| February  | 2020 | 970032 | 02       |
| March     | 2020 | 907555 | 03       |
| March     | 2020 | 560827 | 03       |
| April     | 2020 | 938039 | 04       |
| April     | 2020 | 721640 | 04       |
| May       | 2020 | 131719 | 05       |
| May       | 2020 | 415596 | 05       |
| June      | 2020 | 589375 | 06       |
| June      | 2020 | 623663 | 06       |
| July      | 2020 | 577748 | 07       |
| July      | 2020 | 999572 | 07       |
| August    | 2020 | 630975 | 08       |
| August    | 2020 | 442278 | 08       |
| September | 2020 | 993318 | 09       |
| September | 2020 | 413214 | 09       |

This example table has exactly 3 records for every month in 2019, and exactly 2 records for every month in 2020. So when I add a calculated field called MonthNotYearTraffic, defined by

// Averages ID count by month number only, intentionally ignoring year.

avgOver(count(ID), [{MonthNo.}])

I expect the following results

| MonthNo. | MonthNotYearTraffic |
|----------|---------------------|
| 01       | 2.5                 |
| 02       | 2.5                 |
| 03       | 2.5                 |
| 04       | 2.5                 |
| 05       | 2.5                 |
| 06       | 2.5                 |
| 07       | 2.5                 |
| 08       | 2.5                 |
| 09       | 2.5                 |
| 10       | 3                   |
| 11       | 3                   |
| 12       | 3                   |

since months 10-12 only have the three abovementioned 2019 entries. But instead, the results are:

[image: actual results]

I've tried this several different ways, including combinations of the following (some of which I know to be unreasonable, others I'm unsure about):

  • at first, not relying on custom calculated fields
  • by partitioning on both month and year in the calculated field definition
  • by experimenting with level-aware aggregations
  • by ensuring the fields being aggregated by are strings/dimensions

No dice.

This seems like it should be a straightforward technique, so any pointers would be appreciated. Thank you.

There are 2 answers

0
DataNut On BEST ANSWER

It looks as if you need to partition the count of your IDs by month and then divide that count by the count of years in which you have user IDs in that month.

Using your sample data I was able to get your desired output.

MonthNotYearTraffic = countOver(ID, [Month], PRE_FILTER) / distinctCountOver(Year, [Month], PRE_FILTER)

[image: resulting output]
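
To sanity-check the arithmetic behind that formula outside QuickSight, here is a minimal Python sketch. It assumes the shape of the sample data from the question (3 rows per month in 2019, 2 rows per month for January through September 2020); the ID strings are placeholders, not the real IDs, and the variable names are only illustrative.

```python
from collections import defaultdict

# Same shape as the question's sample table; ID values are placeholders.
rows = [(2019, m, f"a{m}-{i}") for m in range(1, 13) for i in range(3)]
rows += [(2020, m, f"b{m}-{i}") for m in range(1, 10) for i in range(2)]

id_count = defaultdict(int)    # rows per month number (the countOver part)
years_seen = defaultdict(set)  # distinct years per month number (the distinctCountOver part)

for year, month_no, _id in rows:
    id_count[month_no] += 1
    years_seen[month_no].add(year)

# Total rows for the month divided by the number of years that have that month.
month_not_year_traffic = {
    m: id_count[m] / len(years_seen[m]) for m in sorted(id_count)
}

for m, value in month_not_year_traffic.items():
    print(f"{m:02d}  {value}")
```

Months 01 to 09 come out as 5 / 2 = 2.5 and months 10 to 12 as 3 / 1 = 3.0, which matches the expected table in the question.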

1
JD D On

I think the problem is that avgOver only behaves as you expect when the visual shows the data at the same granularity as the first table in your question. Since you are only showing the MonthNo. field, the visual has a single aggregated row for each MonthNo. value, so each partition contains only one row and avgOver simply divides the count by 1.

Maybe try something like count(ID) / count({MonthNo.})
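
The effect described above can be imitated in plain Python. This is only a sketch of the reasoning, not of QuickSight's internals: it assumes the visual first aggregates count(ID) per group and that the average is then taken over those aggregated rows within each MonthNo. partition. The helper `avg_over_month` is hypothetical.

```python
from collections import Counter
from statistics import mean

# Same shape as the question's sample: (Year, MonthNo.) per row.
rows = [(2019, m) for m in range(1, 13) for _ in range(3)]
rows += [(2020, m) for m in range(1, 10) for _ in range(2)]

def avg_over_month(group_key):
    """Count rows per visual group, then average those counts per month."""
    counts = Counter(group_key(year, month) for year, month in rows)
    per_month = {}
    for key, n in counts.items():
        month = key[-1]  # the month number is the last element of each key
        per_month.setdefault(month, []).append(n)
    return {m: mean(v) for m, v in sorted(per_month.items())}

# Visual grouped by MonthNo. only: one aggregated row per partition.
month_only = avg_over_month(lambda y, m: (m,))
# Visual grouped by Year and MonthNo.: one aggregated row per year in each partition.
year_and_month = avg_over_month(lambda y, m: (y, m))

print(month_only[1], year_and_month[1])    # January
print(month_only[10], year_and_month[10])  # October
```

With only the month in the grouping, January averages a single count of 5, so the result is 5 rather than 2.5. With year and month both in the grouping, January averages the counts 3 and 2 and gives 2.5, while October gives 3 in both cases.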