Vega-Lite: week starting from Monday and wrong week numbers in general

675 views Asked by At

I am new to Vega-Lite and trying to aggregate my data by week. The existing option to display the data by week is not suitable for me because I'd like the week to start on Monday (rather than Sunday as it is right now) + in fact the week numbers are wrong.

Below is my basic code.

{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "data": {
    "values": [
      {"date": "2020-09-29", "count": "13", "outcome": "invalid"},
      {"date": "2020-09-29", "count": "14", "outcome": "fail"},
      {"date": "2020-09-29", "count": "20", "outcome": "pass"},
      {"date": "2020-09-27", "count": "70", "outcome": "invalid"},
      {"date": "2020-09-27", "count": "30", "outcome": "fail"},
      {"date": "2020-09-27", "count": "20", "outcome": "pass"},
      {"date": "2020-09-26", "count": "5", "outcome": "invalid"},
      {"date": "2020-09-26", "count": "15", "outcome": "fail"},
      {"date": "2020-09-26", "count": "13", "outcome": "pass"}
    ]
  },
  "width": 280,
  "height": 200,
  "mark": {"type": "bar", "tooltip": true},
  "encoding": {
   "x": {
      "title": "Week",
      "field": "date",
      "type": "ordinal",
      "timeUnit": "week",
      "axis": {"format": "%W"}
    },
    "y": {
      "title": "Number of tests",
      "field": "count",
      "aggregate": "sum",
      "type": "quantitative",
      "axis": {"orient": "right"}
    },
    "color": {
      "field": "outcome",
      "type": "nominal",
      "scale": {
        "domain": ["invalid", "fail", "pass"],
        "range": ["#c7c7c7", "#8fd7f9", "#ef9292"]
      },
      "legend": {"title": "Test results"}
    }
  }
}

rendered plot

I could in principle calculate the counts per week using something like the window function in the snippet below, but I have multiple instances of each date and I do not want to collapse across the "outcome" variable. Moreover, my data can start at any arbitrary date, so calculating the week number starting from 0 is not an option either.

{"calculate": "day(datum.date) == 0", "as": "sundays"},
    {
      "window": [{"op": "sum", "field": "sundays", "as": "week"}],
      "sort": "date"
    }

I also thought of a less elegant solution - taking steps of 7 days in the x axis and aggregating in the y axis (while making sure that the data starts on a Monday). This gives me a correct total count per week, but then I am struggling with labeling the X axis correctly with week numbers.

Finally, even if I was OK with starting the weeks on Sundays (so using the basic code that I give above), I am seeing unexpected week numbers. For some reason (and perhaps that's because I don't know how to count week numbers correctly), the week numbers displayed are 37 and 38 (as can be seen in the attached image) when in fact they should be 39 and 40. How do I solve this?

I'd be grateful for any tips.

1

There are 1 answers

0
jakevdp On

Vega's week timeUnit has well-defined behavior, spelled-out in the timeUnit documentation:

"week": Sunday-based weeks. Days before the first Sunday of the year are considered to be in week 0, the first Sunday of the year is the start of week 1, the second Sunday week 2, etc..

There is currently no alternative week definition built-in to the package, but you can make use of vega expressions within transforms to compute arbitrary quantities from your data.

If I've done the calculations correctly, I think this will give you the ISO weeks that you are after:

{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "data": {
    "values": [
      {"date": "2020-09-29", "count": "14", "outcome": "fail"},
      {"date": "2020-09-29", "count": "20", "outcome": "pass"},
      {"date": "2020-09-27", "count": "70", "outcome": "invalid"},
      {"date": "2020-09-27", "count": "30", "outcome": "fail"},
      {"date": "2020-09-27", "count": "20", "outcome": "pass"},
      {"date": "2020-09-26", "count": "5", "outcome": "invalid"},
      {"date": "2020-09-26", "count": "15", "outcome": "fail"},
      {"date": "2020-09-26", "count": "13", "outcome": "pass"}
    ]
  },
  "transform": [
    {"calculate": "day(datetime(utcyear(datum.date), 0, 1))", "as": "startingDay"},
    {"calculate": "(4 - datum.startingDay) % 7 - 2", "as": "mondayOfFirstWeek"},
    {"calculate": "1 + floor((utcdayofyear(datum.date) - datum.mondayOfFirstWeek) / 7)", "as": "ISOweek"}
  ],
  "width": 280,
  "height": 200,
  "mark": {"type": "bar", "tooltip": true},
  "encoding": {
   "x": {
      "title": "Week",
      "field": "ISOweek",
      "type": "ordinal"
    },
    "y": {
      "title": "Number of tests",
      "field": "count",
      "aggregate": "sum",
      "type": "quantitative",
      "axis": {"orient": "right"}
    },
    "color": {
      "field": "outcome",
      "type": "nominal",
      "scale": {
        "domain": ["invalid", "fail", "pass"],
        "range": ["#c7c7c7", "#8fd7f9", "#ef9292"]
      },
      "legend": {"title": "Test results"}
    }
  }
}

enter image description here

Brief explanation of the transforms:

  • {"calculate": "day(datetime(utcyear(datum.date), 0, 1))", "as": "startingDay"},
    
    This computes the day of the week that falls on January 1st for the given year (Sunday=0, Monday=1... Saturday=6).
  • {"calculate": "(4 - datum.startingDay) % 7 - 2", "as": "mondayOfFirstWeek"},
    
    This computes the day of the year on which the first week starts. So, for example, if startingDay = 5, then January 1st is a Friday, so day 4 of the year is the Monday of the first week containing a Thursday. If startingDay = 4, then January 1st is a Thursday, so day -2 is the Monday of the first week containing a Thursday.
  • {"calculate": "1 + floor((utcdayofyear(datum.date) - datum.mondayOfFirstWeek) / 7)", "as": "ISOweek"}
    
    This counts the rounded number of 7-day weeks from the first monday identified above.

Note that we use utc versions of timeUnits when parsing datum.date in order to correctly handle incomplete timestamps like 2020-09-29. If we had not, the ISOweek would be incorrect for the 1st of January.