Duplicating rows in dataframe based on column value


I am trying to duplicate rows based on the value of a column. My dataframe (df) currently looks like:

Species name   Visits
Apis m         4
Bombus l       7

And so on (there are 34 more columns, which all need to be repeated). I want it to look like:

Species name
Apis m
Apis m
Apis m
Apis m
Bombus l
Bombus l
Bombus l
Bombus l
Bombus l
Bombus l
Bombus l

This is a fairly large dataset of 1767 observations already; there are 190 'Species Name' values and each one has been visited several hundred times.

I'm very new to R (and coding!) so everything is very 'trial and error'. I found a solution on Stack Overflow using "splitstackshape" but am getting the error

"Error in .subset2(x, i, exact = exact) : recursive indexing failed at level 2".

This is my code:

expandRows(df, df$Visits, 
           count.is.col = TRUE, drop = TRUE)

There are questions about other instances of this error, but none related to the 'expandRows' function. The column is stored as an integer and I've removed any null values from the 'Visits' column.

Any pointers as to what my issue might be or other ideas of how to do this would be much appreciated.

Danielle

Edit: Reprex below. I'm not sure what 'could not find function' relates to, as the code appeared to run outside of reprex. Also, note that it includes the actual column names and df; I simplified these in the example above.

expandRows(BombusL, BombusL$No.of.Interaction.Records, count.is.col = TRUE, 
    drop = TRUE)
#> Error in expandRows(BombusL, BombusL$No.of.Interaction.Records, count.is.col = TRUE, : could not find function "expandRows"

There are 2 answers

TarJae (best answer)

Update (as uncount is already mentioned):

With your code:

df.expanded <- df[rep(row.names(df), df$Visits), 1:2]
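Since the question mentions many more columns that all need repeating, a variant that keeps every column except the count may be closer to what is wanted. This is only a sketch on toy data: the `Site` column is made up to stand in for the extra columns, and `idx` is just an illustrative name.

```r
# Toy data standing in for the real data frame (Site is a made-up extra column)
df <- data.frame(
  Species.name = c("Apis m", "Bombus l"),
  Visits = c(4L, 7L),
  Site = c("A", "B"),
  stringsAsFactors = FALSE
)

# Repeat each row index as many times as its Visits value
idx <- rep(seq_len(nrow(df)), times = df$Visits)

# Keep every column except the count, however many there are
df.expanded <- df[idx, setdiff(names(df), "Visits"), drop = FALSE]

# Reset the row names (otherwise they come out as 1, 1.1, 1.2, ...)
rownames(df.expanded) <- NULL

nrow(df.expanded)  # 11, i.e. 4 + 7
```

Selecting columns by name with `setdiff()` rather than by position avoids having to know how many columns there are.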

Or you could use slice with seq_len(n()):

library(dplyr)
df %>%  
  slice(rep(seq_len(n()), Visits)) %>% 
  select(-Visits)

Output:

   Species.name
   <chr>       
 1 Apis m      
 2 Apis m      
 3 Apis m      
 4 Apis m      
 5 Bombus l    
 6 Bombus l    
 7 Bombus l    
 8 Bombus l    
 9 Bombus l    
10 Bombus l    
11 Bombus l    
Ran K

You can try uncount from the tidyr/tidyverse package

library(tidyr)

data <- data.frame(Species = c("Apis m","Nimbus"),Visits = c(4,7))
data %>% 
  uncount(Visits)
#>     Species
#> 1    Apis m
#> 1.1  Apis m
#> 1.2  Apis m
#> 1.3  Apis m
#> 2    Nimbus
#> 2.1  Nimbus
#> 2.2  Nimbus
#> 2.3  Nimbus
#> 2.4  Nimbus
#> 2.5  Nimbus
#> 2.6  Nimbus

Created on 2021-04-25 by the reprex package (v2.0.0)
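
As for the two errors in the question, a likely explanation (an assumption based on how `expandRows` is documented, not something confirmed by the asker) is this. With `count.is.col = TRUE`, the `count` argument is expected to be a column name or index, not the column itself, so passing `df$Visits` would trigger the "recursive indexing failed" error. The "could not find function" message in the reprex is consistent with reprex running in a fresh session where `splitstackshape` has not been loaded. A minimal sketch on toy data:

```r
# reprex runs in a clean session, so the package must be loaded inside it
library(splitstackshape)

# Toy data standing in for the real data frame
df <- data.frame(
  Species.name = c("Apis m", "Bombus l"),
  Visits = c(4L, 7L),
  stringsAsFactors = FALSE
)

# Pass the column NAME, not df$Visits, when count.is.col = TRUE
expanded <- expandRows(df, "Visits", count.is.col = TRUE, drop = TRUE)

nrow(expanded)  # 11, i.e. 4 + 7
```

With the real data the call would presumably be `expandRows(BombusL, "No.of.Interaction.Records")`, using the column name given in the question's edit.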