How to convert from wide to long format in longitudinal data?

102 views Asked by At

My dataset looks something like this:

Id    smoke_2m  Smoke_3m  smoke_4m alcohol_3m  age sex

1      2          3        5          7         30   M
2      3          3        5          6         26   M
3      7          5        7          2         25   F

I want my output as follows:

Id  Time  smoking  alcohol  age sex
1    2m        2      -       30  M
1    3m        3      7       30  M
1    4m        5      -       30  M
2    2m        3      -       26  M
2    3m        3      6       26  M
2    4m        5      -       26  M
3    2m        7      -       25  F
3    3m        5      2       25  F
3    4m        7      -       25  F

Here for alcohol only 3m values are available so I want the values for other time points (2m,4m) to be empty (-) or filled by the previous value all three alcohol time points would be for eg 7

I tried using this code below however it only performs for one variable i.e smoking

data <- data %>%
  pivot_longer(
    cols = c(smoke_2m, smoke_3m, smoke_4m),  
    names_to = "time_point",                     
    values_to = "smoking_status",
)

How to compile for all variables?

1

There are 1 answers

0
r2evans On

The trick here is to use names_pattern, which allows for its special-case ".value" as one of names_to.

pivot_longer(quux, cols = -c(Id, sex, age),
             names_pattern = "(.*)_(.*)", names_to = c(".value", "Time"))
# # A tibble: 9 × 7
#      Id   age sex   Time  smoke Smoke alcohol
#   <int> <int> <chr> <chr> <int> <int>   <int>
# 1     1    30 M     2m        2    NA      NA
# 2     1    30 M     3m       NA     3       7
# 3     1    30 M     4m        5    NA      NA
# 4     2    26 M     2m        3    NA      NA
# 5     2    26 M     3m       NA     3       6
# 6     2    26 M     4m        5    NA      NA
# 7     3    25 F     2m        7    NA      NA
# 8     3    25 F     3m       NA     5       2
# 9     3    25 F     4m        7    NA      NA

Inferring that you intend for Smoke_3m to be grouped within smoke, though, let's fix that:

quux |>
  rename_with(.fn = tolower, .cols = starts_with("Smoke")) |>
  pivot_longer(cols = -c(Id, sex, age),
               names_pattern = "(.*)_(.*)", names_to = c(".value", "Time"))
# # A tibble: 9 × 6
#      Id   age sex   Time  smoke alcohol
#   <int> <int> <chr> <chr> <int>   <int>
# 1     1    30 M     2m        2      NA
# 2     1    30 M     3m        3       7
# 3     1    30 M     4m        5      NA
# 4     2    26 M     2m        3      NA
# 5     2    26 M     3m        3       6
# 6     2    26 M     4m        5      NA
# 7     3    25 F     2m        7      NA
# 8     3    25 F     3m        5       2
# 9     3    25 F     4m        7      NA

Data

quux <- structure(list(Id = 1:3, smoke_2m = c(2L, 3L, 7L), Smoke_3m = c(3L, 3L, 5L), smoke_4m = c(5L, 5L, 7L), alcohol_3m = c(7L, 6L, 2L), age = c(30L, 26L, 25L), sex = c("M", "M", "F")), class = "data.frame", row.names = c(NA, -3L))