How to order a categorical variable in a boxplot using the Julia package Gadfly


Gadfly does not seem to use the (level) order of categorical variables:

using CSV
using DataFrames
using Gadfly
using HTTP

url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv"

tips = CSV.File(HTTP.get(url).body) |> DataFrame
categorical!(tips, :day)
ordered!(tips.day, true)
levels!(tips.day, ["Thur", "Fri", "Sat", "Sun"])

Gadfly.plot(tips, x=:day, y=:total_bill, color=:smoker, Geom.boxplot)


Should the plot not inherit the order specified in the categorical variable?

I found a way to order the categorical values, but it feels like a workaround because the order has to be specified a second time:

Gadfly.plot(tips, x=:day, y=:total_bill, color=:smoker, Geom.boxplot,
    Scale.x_discrete(levels=levels(tips.day)))


Any suggestions on how to solve this?


There is 1 answer

Mattriks

In Gadfly, the order of values on a discrete x axis is determined by the order in which they appear in the dataframe, so the level order of a CategoricalArray is currently not used. This might not be supported in the future either, because DataFrames plans to drop CategoricalArrays (https://github.com/JuliaData/DataFrames.jl/issues/2321).
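
As a rough sketch of what that implies: if the axis order follows row order, then sorting the rows by the desired day order before plotting should give the intended axis order without involving categorical levels at all. The data frame below is a small made-up stand-in for the tips data, and `day_order`, `rank`, and `sorted_tips` are names introduced here for illustration only.

```julia
using DataFrames
using Gadfly

# Made-up stand-in for the tips data (values are illustrative, not from tips.csv)
tips = DataFrame(
    day        = ["Sun", "Sat", "Thur", "Fri", "Sun", "Thur", "Fri", "Sat"],
    total_bill = [16.99, 20.65, 27.20, 28.97, 10.34, 22.76, 12.46, 18.29],
    smoker     = ["No", "Yes", "No", "Yes", "Yes", "No", "No", "Yes"],
)

# Desired order of the days on the x axis
day_order = ["Thur", "Fri", "Sat", "Sun"]

# Map each day to its position in day_order
rank = Dict(d => i for (i, d) in enumerate(day_order))

# Reorder the rows so that the days first appear in the desired order
sorted_tips = tips[sortperm([rank[d] for d in tips.day]), :]

# Assumption (per the answer): a discrete x axis follows the row order of the data
p = Gadfly.plot(sorted_tips, x=:day, y=:total_bill, color=:smoker, Geom.boxplot)
```

This still states the order once by hand, so it is not obviously better than passing `levels=levels(tips.day)` to `Scale.x_discrete`. It mainly shows where the axis order comes from.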