Often I need to `spread` multiple value columns, as in this question. I do it often enough that I'd like to write a function for it.
For example, given the data:
library(dplyr)
library(tidyr)
set.seed(42)
# tibble() replaces the deprecated data_frame()
dat <- tibble(id = rep(1:2, each = 2),
              grp = rep(letters[1:2], times = 2),
              avg = rnorm(4),
              sd = runif(4))
> dat
# A tibble: 4 x 4
     id grp          avg        sd
  <int> <chr>      <dbl>     <dbl>
1     1 a      1.3709584 0.6569923
2     1 b     -0.5646982 0.7050648
3     2 a      0.3631284 0.4577418
4     2 b      0.6328626 0.7191123
I'd like to create a function that returns something like:
# A tibble: 2 x 5
     id     a_avg      b_avg      a_sd      b_sd
  <int>     <dbl>      <dbl>     <dbl>     <dbl>
1     1 1.3709584 -0.5646982 0.6569923 0.7050648
2     2 0.3631284  0.6328626 0.4577418 0.7191123
How can I do that?
We'll return to the answer provided in the linked question, but for the moment let's start with a more naive approach.
One idea would be to `spread` each value column individually and then join the results. (I used a `full_join` just in case we run into situations where not all combinations of the join columns appear in all of them.)
Let's start with a function that works like `spread` but allows you to pass the `key` and `value` columns as characters.
The key ideas here are to unquote the arguments `key_col` and `value_cols[i]` using the `!!` operator, and to use the `sep` argument in `spread` to control the resulting value column names.
If we wanted to convert this function to accept unquoted arguments for the key and value columns, we could modify it accordingly.
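The two functions described above can be sketched roughly as follows. This is a minimal sketch, not the original code: the names `spread_chr` and `spread_nq`, the default `sep = "_"`, and the use of base R's `Reduce` for the join are assumptions made for illustration. Because `spread` builds names as key name, then `sep`, then key value, the results carry names such as `grp_avg_a` and not the `a_avg` form shown in the question.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question.
set.seed(42)
dat <- tibble(id  = rep(1:2, each = 2),
              grp = rep(letters[1:2], times = 2),
              avg = rnorm(4),
              sd  = runif(4))

# Hypothetical helper: spread each value column on its own, then join.
# key_col and value_cols are character vectors of column names.
spread_chr <- function(data, key_col, value_cols, fill = NA,
                       convert = FALSE, drop = TRUE, sep = "_") {
  # Every remaining column is treated as an identifier (assumed non-empty).
  id_cols <- setdiff(names(data), c(key_col, value_cols))

  pieces <- vector("list", length(value_cols))
  for (i in seq_along(value_cols)) {
    pieces[[i]] <- spread(
      data[, c(id_cols, key_col, value_cols[i]), drop = FALSE],
      key = !!key_col,          # unquote the string so spread() sees a column name
      value = !!value_cols[i],
      fill = fill, convert = convert, drop = drop,
      # putting the value column name into sep keeps the new names distinct
      sep = paste0(sep, value_cols[i], sep)
    )
  }

  # full_join keeps ids that are missing from some of the pieces
  Reduce(function(x, y) full_join(x, y, by = id_cols), pieces)
}

# Hypothetical variant taking unquoted column names; value columns go in `...`.
spread_nq <- function(data, key_col, ..., fill = NA,
                      convert = FALSE, drop = TRUE, sep = "_") {
  key_quo  <- rlang::enquo(key_col)
  val_quos <- rlang::quos(...)
  # turn the captured expressions back into plain column-name strings
  key_chr <- unname(tidyselect::vars_select(names(data), !!key_quo))
  val_chr <- unname(tidyselect::vars_select(names(data), !!!val_quos))
  spread_chr(data, key_chr, val_chr, fill = fill,
             convert = convert, drop = drop, sep = sep)
}

wide_chr <- spread_chr(dat, key_col = "grp", value_cols = c("avg", "sd"))
wide_nq  <- spread_nq(dat, grp, avg, sd)
```

Under these assumptions both calls should return one row per `id` with four value columns. Delegating from `spread_nq` to `spread_chr` keeps the joining logic in one place, so the only difference between the two is how the column names are captured.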
The change here is that we capture the unquoted arguments with `rlang::quos` and `rlang::enquo` and then simply convert them back to characters using `tidyselect::vars_select`.
Returning to the solution in the linked question that uses a sequence of `gather`, `unite` and `spread`, we can use what we've learned to wrap that sequence in a function.
This relies on the same rlang techniques as the last example. We're using some unusual names like `..var..` for our intermediate variables in order to reduce the chances of name collisions with existing columns in our data frame.
Also, we're using the `sep` argument in `unite` to control the resulting column names, so in this case when we `spread` we force `sep = NULL`.
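The `gather`, `unite`, `spread` wrapper described above might look like the following. It is a minimal sketch under stated assumptions: the function name `spread_multi`, the intermediate names `..val..` and `..grp..` (only `..var..` appears in the text), and the argument defaults are illustrative, and `tidyselect::all_of` assumes tidyselect 1.0.0 or later.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question.
set.seed(42)
dat <- tibble(id  = rep(1:2, each = 2),
              grp = rep(letters[1:2], times = 2),
              avg = rnorm(4),
              sd  = runif(4))

# Hypothetical wrapper: key column first, value columns in `...`, all unquoted.
spread_multi <- function(data, key_col, ..., fill = NA,
                         convert = FALSE, drop = TRUE, sep = "_") {
  key_quo  <- rlang::enquo(key_col)
  val_quos <- rlang::quos(...)
  # convert the captured key expression back into a column-name string
  key_chr <- unname(tidyselect::vars_select(names(data), !!key_quo))

  data %>%
    # long format: one row per id, key value and value column
    gather(key = "..var..", value = "..val..", !!!val_quos) %>%
    # build the final column names from key value and variable, e.g. "a_avg"
    unite(col = "..grp..", tidyselect::all_of(c(key_chr, "..var..")), sep = sep) %>%
    # the names are already complete, so spread() must not prefix them
    spread(key = "..grp..", value = "..val..", fill = fill,
           convert = convert, drop = drop, sep = NULL)
}

wide <- spread_multi(dat, grp, avg, sd)
```

With the example data this should produce one row per `id` and the columns `a_avg`, `a_sd`, `b_avg` and `b_sd`, which matches the naming asked for in the question, although possibly in a different column order.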