Sequentially number across groups without restarting the sequence in R


I want to create the "turn" column shown in the example data frame below; my real dataset has thousands of rows. The column should give the running number of the current speaking turn. Consecutive rows spoken by the same speaker count as one turn, even when the sentences are on different rows. When that speaker talks again after someone else has spoken, it is a new turn, numbered by its position in the overall sequence of turns (so nick's turns are 1, 3, and 5).

df <- data.frame(
  line = c(1:9),
  speaker = c("nick", "nick", "nick", "bob", "nick", "ann", "ann", "nick", "bob"),
  sentence = c("hi", "how are you?", "what's up?", "i'm good", "me too", "hi guys", "any plans for the weekend", "no", "ya, the movies"),
  turn = c(1, 1, 1, 2, 3, 4, 4, 5, 6))

I have tried:

  • group_by(speaker) %>% mutate(turn_curgroupid = cur_group_id()) - this numbers the speakers in alphabetical order of their names, so each speaker always gets the same number, e.g., nick is always 3 when his turns should be numbered 1, 3, and 5:
   line speaker sentence      turn turn_curgroupid
1     1 nick    hi               1               3
2     2 nick    how are you?     1               3
3     3 nick    what's up?       1               3
4     4 bob     i'm good         2               2
5     5 nick    me too           3               3
6     6 ann     hi guys          4               1
  • seq_along(speaker) (still grouped by speaker) - this counts the rows within each speaker rather than the turns, e.g., the three rows of nick's first turn are numbered 1:3 instead of all being 1:
   line speaker sentence      turn turn_seqalong
1     1 nick    hi               1             1
2     2 nick    how are you?     1             2
3     3 nick    what's up?       1             3
4     4 bob     i'm good         2             1
5     5 nick    me too           3             4
6     6 ann     hi guys          4             1

Thanks for your help.

Answer by Jon Spring (best answer)
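library(dplyr)  # consecutive_id() requires dplyr >= 1.1.0

df |>
  mutate(turn2 = cumsum(speaker != lag(speaker, 1, "")),  # add 1 whenever the speaker changes
         turn3 = consecutive_id(speaker))                 # same numbering, built in
         # H/T @andre-wildberg for mentioning this useful dplyr 1.1.0 function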

Result

  line speaker                  sentence turn turn2 turn3
1    1    nick                        hi    1     1     1
2    2    nick              how are you?    1     1     1
3    3    nick                what's up?    1     1     1
4    4     bob                  i'm good    2     2     2
5    5    nick                    me too    3     3     3
6    6     ann                   hi guys    4     4     4
7    7     ann any plans for the weekend    4     4     4
8    8    nick                        no    5     5     5
9    9     bob            ya, the movies    6     6     6
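
If dplyr 1.1.0 is not available, the same numbering can be sketched in base R. This is only an illustration of the idea, not part of the original answer: the column names turn_rle and turn_cumsum are made up for the example, and as.character() is used on the assumption that speaker might be stored as a factor in the real data.

```r
# Example data from the question (only the columns needed here)
df <- data.frame(
  line = 1:9,
  speaker = c("nick", "nick", "nick", "bob", "nick", "ann", "ann", "nick", "bob")
)

spk <- as.character(df$speaker)  # guard in case speaker is a factor

# Option 1: rle() collapses consecutive repeats into runs;
# each run is one turn, repeated for as many rows as the run is long
runs <- rle(spk)
df$turn_rle <- rep(seq_along(runs$lengths), times = runs$lengths)

# Option 2: compare each row with the previous one and count the changes;
# the first row always starts a new turn
df$turn_cumsum <- cumsum(c(TRUE, spk[-1] != spk[-length(spk)]))

print(df)
```

Both columns count a new turn each time the speaker differs from the row above, which is the same logic as the cumsum() and lag() version in the answer.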