I'm trying to create histograms per-group then return a summary. Per this answer, I can use {braces} and print to avoid issues in creating one plot then moving onto another, however this doesn't seem to acknowledge grouping:
library(dplyr)
library(ggplot2)
library(magrittr) # provides %T>%
data(mtcars)
mtcars |>
  group_by(cyl) %T>%
  {print(ggplot(.) +
    geom_histogram(aes(x = carb)))} |>
  summarise(meancarb = mean(carb))
The above code works insofar as it creates a single histogram then the summary, however:
mtcars %T>%
  {print(ggplot(.) +
    geom_histogram(aes(x = carb)))} |>
  group_by(cyl) |>
  summarise(meancarb = mean(carb))
The above code produces exactly the same output, i.e. confirming that group_by isn't being acknowledged.
Does anyone know why the grouping isn't being used to create one histogram per unique cyl? Ideally I'd love to work out how to use Tee pipes for this kind of thing more often, including saving the output under unique names, before continuing down the main pipe. In general it feels like Tee pipes are underused, possibly because of the dearth of info about them, so if anyone has any cool examples to share, that would be great for the community.
Thanks!
Edit
Following divibisan's comment about dplyr::group_map (or group_walk):
mtcars |>
  group_by(cyl) %T>%
  group_walk(.f = ~ ggplot(.) +
    geom_histogram(aes(x = carb))) |>
  summarise(meancarb = mean(carb, na.rm = TRUE),
            sd3 = sd(carb, na.rm = TRUE) * 3)
This creates the summary table but no plot(s). Output identical for map and walk. Output also the same if I replace %T>% with |>. Ostensibly group_walk is doing the same as %T>%. With |> and group_map, I get:
Error in UseMethod("summarise"): no applicable method for 'summarise' applied to an object of class "list"
mtcars |>
  group_by(cyl) %T>%
  {print(group_walk(.f = ~ ggplot(.) +
    geom_histogram(aes(x = carb))))} |>
  summarise(meancarb = mean(carb, na.rm = TRUE),
            sd3 = sd(carb, na.rm = TRUE) * 3)
With print and braces:
Error in h(simpleError(msg, call)) : error in evaluating the argument 'x' in selecting a method for function 'print': argument ".data" is missing, with no default
Braces no print:
Error in group_map(.data, .f, ..., .keep = .keep): argument ".data" is missing, with no default
Print no braces: same as braces no print.
Edit 2
More interesting ideas coming forth, thanks to Ricardo, this:
mtcars |>
  group_split(cyl) |>
  map(.f = ~ ggplot(.) +
    geom_histogram(aes(x = carb)))
Works insofar as it produces 1 plot per group. Success! But: I can't find any combination of Tee/pipes which tees off mtcars for the group_split AND map, and then resumes the main pipeline:
mtcars %T>%
  group_split(cyl) %T>%
  map(.f = ~ ggplot(.) +
    geom_histogram(aes(x = carb))) |>
  summarise(meancarb = mean(carb))
Error in map(): In index: 1. With name: mpg. Caused by error in fortify(): data must be a <data.frame>, or an object coercible by fortify(), not a double vector.
Also anything other than 2 pipes means the plots aren't created.
Trying this another way around, by reordering the pipe structure (which won't always be possible/desirable):
mtcars |>
  group_by(cyl) %T>%
  summarise(meancarb = mean(carb)) |>
  ungroup() |>
  group_split(cyl) |>
  map(.f = ~ ggplot(.) +
    geom_histogram(aes(x = carb)))
This creates the 3 plots but doesn't print the summary. Any combination of {braces} and/or print around the summary line gives:
Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'mean': object 'carb' not found.
Does anyone know whether the Tee pipe is explicitly for a single command, i.e. you can't pipe another command onto the tee branch, and then return to the main pipe? Thanks all
Edit 3
Thanks zephyr. Followup question: how to do multi-command tee pipes without a formula-format first command?
mtcars |>
  summarise(sdd = sd(carb, na.rm = TRUE))
Works fine, prints a single value.
mtcars %T>%
  summarise(sdd = sd(carb, na.rm = TRUE)) |>
  summarise(
    meancarb = mean(carb, na.rm = TRUE),
    sd3 = sd(carb, na.rm = TRUE) * 3
  )
Doesn't print the value, performs the calculation invisibly then continues. Any combination of print and {braces} I've tried results in:
Error: function '{' not supported in RHS call of a pipe
or
Error in is.data.frame(x) : object 'carb' not found
Say I wanted, e.g.:
mtcars |>
  summarise(~{
    print(sdd = sd(carb))
    write_csv(file = "tmp.csv")
    .x
  }) |>
  summarise(meancarb = mean(carb))
Any thoughts? Thanks again!
You were on the right track with group_walk(), but you need to put the print() inside the mapped function.
Note you can get the same result without using %T>% by assigning the plot to a name in your anonymous function and returning the original dataframe after printing.
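A minimal sketch of both variants, assuming dplyr, ggplot2 and magrittr are installed and R is at least 4.1 (for |>). The binwidth = 1 argument and the choice of group_modify() for the second variant are assumptions made for illustration. The print() matters because a ggplot object is only drawn when it is printed, and group_walk() discards whatever the mapped function returns.

```r
library(dplyr)
library(ggplot2)
library(magrittr) # provides %T>%

# Variant 1: tee off into group_walk() and print inside the mapped function.
# .x is the subset of rows belonging to one cyl group.
res_tee <- mtcars |>
  group_by(cyl) %T>%
  group_walk(~ print(
    ggplot(.x) +
      geom_histogram(aes(x = carb), binwidth = 1) # binwidth is an assumption
  )) |>
  summarise(meancarb = mean(carb))

# Variant 2: no tee pipe. The anonymous function stores the plot under a name,
# prints it, then returns the group's rows so the pipeline can carry on.
# group_modify() is assumed here; it reassembles the returned rows into a
# grouped data frame.
res_plain <- mtcars |>
  group_by(cyl) |>
  group_modify(~ {
    p <- ggplot(.x) +
      geom_histogram(aes(x = carb), binwidth = 1)
    print(p)
    .x
  }) |>
  summarise(meancarb = mean(carb))
```

Both pipelines should draw one histogram per cyl group and then return the same three-row summary, so res_tee and res_plain can be compared directly.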