I am new to Vega-Lite and trying to aggregate my data by week. The existing option to display the data by week is not suitable for me because I'd like the week to start on Monday (rather than Sunday as it is right now) + in fact the week numbers are wrong.
Below is my basic code.
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"data": {
"values": [
{"date": "2020-09-29", "count": "13", "outcome": "invalid"},
{"date": "2020-09-29", "count": "14", "outcome": "fail"},
{"date": "2020-09-29", "count": "20", "outcome": "pass"},
{"date": "2020-09-27", "count": "70", "outcome": "invalid"},
{"date": "2020-09-27", "count": "30", "outcome": "fail"},
{"date": "2020-09-27", "count": "20", "outcome": "pass"},
{"date": "2020-09-26", "count": "5", "outcome": "invalid"},
{"date": "2020-09-26", "count": "15", "outcome": "fail"},
{"date": "2020-09-26", "count": "13", "outcome": "pass"}
]
},
"width": 280,
"height": 200,
"mark": {"type": "bar", "tooltip": true},
"encoding": {
"x": {
"title": "Week",
"field": "date",
"type": "ordinal",
"timeUnit": "week",
"axis": {"format": "%W"}
},
"y": {
"title": "Number of tests",
"field": "count",
"aggregate": "sum",
"type": "quantitative",
"axis": {"orient": "right"}
},
"color": {
"field": "outcome",
"type": "nominal",
"scale": {
"domain": ["invalid", "fail", "pass"],
"range": ["#c7c7c7", "#8fd7f9", "#ef9292"]
},
"legend": {"title": "Test results"}
}
}
}
I could in principle calculate the counts per week using something like the window function in the snippet below, but I have multiple instances of each date and I do not want to collapse across the "outcome" variable. Moreover, my data can start at any arbitrary date, so calculating the week number starting from 0 is not an option either.
{"calculate": "day(datum.date) == 0", "as": "sundays"},
{
"window": [{"op": "sum", "field": "sundays", "as": "week"}],
"sort": "date"
}
I also thought of a less elegant solution - taking steps of 7 days in the x axis and aggregating in the y axis (while making sure that the data starts on a Monday). This gives me a correct total count per week, but then I am struggling with labeling the X axis correctly with week numbers.
Finally, even if I was OK with starting the weeks on Sundays (so using the basic code that I give above), I am seeing unexpected week numbers. For some reason (and perhaps that's because I don't know how to count week numbers correctly), the week numbers displayed are 37 and 38 (as can be seen in the attached image) when in fact they should be 39 and 40. How do I solve this?
I'd be grateful for any tips.
Vega's week timeUnit has well-defined behavior, spelled-out in the timeUnit documentation:
There is currently no alternative week definition built-in to the package, but you can make use of vega expressions within transforms to compute arbitrary quantities from your data.
If I've done the calculations correctly, I think this will give you the ISO weeks that you are after:
Brief explanation of the transforms:
startingDay = 5
, then January 1st is a Friday, so day 4 of the year is the Monday of the first week containing a Thursday. IfstartingDay = 4
, then January 1st is a Thursday, so day -2 is the Monday of the first week containing a Thursday.Note that we use
utc
versions of timeUnits when parsingdatum.date
in order to correctly handle incomplete timestamps like2020-09-29
. If we had not, the ISOweek would be incorrect for the 1st of January.