R zip dataframe rows


I have a dataframe in R with the following row-structure

rowA1
rowA2
rowA3
rowB1
rowB2
rowB3
rowC1
rowC2
rowC3

I want to reorder it like this:

rowA1
rowB1
rowC1
rowA2
rowB2
rowC2
rowA3
rowB3
rowC3

How?

Edit: to clarify, the data frame was originally built from 3 data frames using bind_rows(A, B, C); each data frame had 3 rows.

Note that rowA1 etc. are placeholders; the rows could contain any columns and values.

There are 4 answers

Julian On BEST ANSWER

One dplyr option (instead of substr you can use readr::parse_number as Jon suggested):

tibble::tribble(~row,
                "rowA1",
                "rowA2",
                "rowA3",
                "rowB1",
                "rowB2",
                "rowB3",
                "rowC1",
                "rowC2",
                "rowC3") |> 
  dplyr::mutate(nbr = substr(row, 5,5)) |> 
  dplyr::arrange(nbr) |> 
  dplyr::select(-nbr)
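
As a rough sketch of the readr::parse_number() variant mentioned above: substr(row, 5, 5) only picks up a single digit, whereas parse_number() extracts the whole number, which matters once suffixes reach 10. The labels below are made up for illustration, and the sketch assumes the dplyr and readr packages are installed.

```r
library(dplyr)
library(readr)

# Made-up labels with a two-digit suffix (not from the question)
df <- tibble::tibble(
  row = c("rowA1", "rowA2", "rowA10", "rowB1", "rowB2", "rowB10")
)

result <- df |>
  mutate(nbr = parse_number(row)) |>  # numeric suffix: 1, 2, 10, ...
  arrange(nbr) |>                     # ties keep their original order
  select(-nbr)

result$row
# "rowA1" "rowB1" "rowA2" "rowB2" "rowA10" "rowB10"
```

Like the substr() version, this relies on the number being encoded in a column, which may not hold for arbitrary data.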
zx8754 On

Import as fixed width file, then order based on two columns:

# for a real file, replace textConnection(...) with file = "myfilename.txt"
x <- read.fwf(textConnection("rowA1
rowA2
rowA3
rowB1
rowB2
rowB3
rowC1
rowC2
rowC3"), widths = c(3, 1, 1))

x[ order(x$V3, x$V2), ]
#    V1 V2 V3
# 1 row  A  1
# 4 row  B  1
# 7 row  C  1
# 2 row  A  2
# 5 row  B  2
# 8 row  C  2
# 3 row  A  3
# 6 row  B  3
# 9 row  C  3
Darren Tsai On

You can remove the non-numeric parts of those strings and then order():

df <- data.frame(id = 1:9,
x = c(
"rowA1",
"rowA2",
"rowA3",
"rowB1",
"rowB2",
"rowB3",
"rowC1",
"rowC2",
"rowC3"))

# as.numeric() so that multi-digit suffixes sort numerically (10 after 2)
df[order(as.numeric(sub('^\\D+', '', df$x))), ]

#   id     x
# 1  1 rowA1
# 4  4 rowB1
# 7  7 rowC1
# 2  2 rowA2
# 5  5 rowB2
# 8  8 rowC2
# 3  3 rowA3
# 6  6 rowB3
# 9  9 rowC3

Its tidyverse equivalent is

library(dplyr)
library(stringr)

df %>%
  # as.numeric() so that multi-digit suffixes sort numerically (10 after 2)
  arrange(as.numeric(str_extract(x, '\\d+')))
SamR On

It appears that you're using the word "zip" in the sense of the Python zip() function:

The zip() function returns a zip object, which is an iterator of tuples where the first item in each passed iterator is paired together, and then the second item in each passed iterator are paired together etc.

So in this case, rather than reordering based on the values in the rows, this means reordering the rows of three bound data frames by position: the first row of each input data frame forms rows 1, 2 and 3 of the result, the second row of each input data frame forms rows 4, 5 and 6, and so on.

Method 1: Reordering after binding

This can be used on your data as posted (i.e. you have already used bind_rows()). If you have bound a different number of data frames together you can change the num_df_bound argument to the appropriate value.

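reorder_rows <- function(dat, num_df_bound = 3) {

    n  <- nrow(dat)
    rows_per_df  <- n / num_df_bound  # assumes equal-sized data frames

    # for each within-frame position i, take row i of every bound frame
    # (stepping by num_df_bound instead is only right when it equals rows_per_df)
    row_order  <- vapply(seq_len(rows_per_df),
        \(i) seq(from = i, to = n, by = rows_per_df),
        numeric(num_df_bound)
    ) |> as.numeric()

    dat[row_order, , drop = FALSE]

}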

reorder_rows(dat)
#       x
# 1 rowA1
# 4 rowB1
# 7 rowC1
# 2 rowA2
# 5 rowB2
# 8 rowC2
# 3 rowA3
# 6 rowB3
# 9 rowC3

However, this approach will not produce the expected output if the bound data frames do not all have an equal number of rows. Method 2 is preferable in that case.
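
For the equal-size case, the interleaved order can also be read off a matrix of row indices. This is only a minimal sketch: interleave_rows() is a hypothetical helper name (not from any package), and the two-frame input is made up to show a case where the number of frames differs from the number of rows per frame.

```r
# interleave_rows() is a hypothetical helper, not part of any package
interleave_rows <- function(dat, num_df_bound) {
  n <- nrow(dat)
  # Only meaningful when every bound data frame had the same number of rows
  stopifnot(n %% num_df_bound == 0)
  # One column per original data frame, one row per position within it
  idx <- matrix(seq_len(n), ncol = num_df_bound)
  # Reading the matrix row by row yields the interleaved order
  dat[as.vector(t(idx)), , drop = FALSE]
}

# Made-up example: two bound data frames with three rows each
two_frames <- data.frame(
  x = c("rowA1", "rowA2", "rowA3", "rowB1", "rowB2", "rowB3")
)

interleaved <- interleave_rows(two_frames, num_df_bound = 2)
interleaved$x
# "rowA1" "rowB1" "rowA2" "rowB2" "rowA3" "rowB3"
```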

Method 2: Adding an index before binding

base R approach

Simply add an index before you bind your data frames and then sort on that index after binding. Like method 1 this will work with any number of data frames. However, it will also work if your data frames have an unequal number of rows. I'll set out how to do this using only base R functions, and also with tidyverse and data.table functions.

dat1 <- data.frame(x = c("rowA1", "rowA2", "rowA3"))
dat2 <- data.frame(x = c("rowB1", "rowB2", "rowB3"))
dat3 <- data.frame(x = c("rowC1", "rowC2", "rowC3"))

to_bind  <- list(dat1, dat2, dat3)

dat  <- to_bind |>
    lapply(
        \(df) {
            df$index  <- seq_len(nrow(df))
            df
        }
    ) |>
    do.call(rbind, args = _)

dat[order(dat$index),]
#       x index
# 1 rowA1     1
# 4 rowB1     1
# 7 rowC1     1
# 2 rowA2     2
# 5 rowB2     2
# 8 rowC2     2
# 3 rowA3     3
# 6 rowB3     3
# 9 rowC3     3

This requires at least R 4.2 as it uses the _ pipe placeholder. However, it could be trivially rewritten by assigning an intermediate variable instead of using pipes.
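
To illustrate both points, here is a sketch of the same idea written with intermediate variables instead of pipes, applied to data frames with unequal numbers of rows. The inputs a, b and c_df are made up for this example.

```r
# Made-up inputs with 3, 1 and 2 rows respectively
a    <- data.frame(x = c("rowA1", "rowA2", "rowA3"))
b    <- data.frame(x = "rowB1")
c_df <- data.frame(x = c("rowC1", "rowC2"))

# Add a within-frame position to each data frame before binding
indexed <- lapply(list(a, b, c_df), function(df) {
  df$index <- seq_len(nrow(df))
  df
})

bound  <- do.call(rbind, indexed)
# order() keeps tied rows in their original order, so A comes before B before C
result <- bound[order(bound$index), ]

result$x
# "rowA1" "rowB1" "rowC1" "rowA2" "rowC2" "rowA3"
```

Frames that run out of rows simply drop out of the later rounds, which is why this works where a fixed-stride reordering does not.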

tidyverse approach

As you are using dplyr::bind_rows() already, you may prefer a tidyverse approach:


library(dplyr)
library(purrr)
to_bind |>
    map(
        \(df) df |>
            mutate(index = row_number())
    ) |>
    list_rbind() |>
    arrange(index)
# <same output>
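
If the index was not added beforehand, it can be rebuilt after binding as long as each row's source frame is known. A possible sketch, assuming the frames are passed to bind_rows() as a named list with its .id argument; the column names source and index are illustrative choices, not anything from the question.

```r
library(dplyr)

dat1 <- data.frame(x = c("rowA1", "rowA2", "rowA3"))
dat2 <- data.frame(x = c("rowB1", "rowB2", "rowB3"))
dat3 <- data.frame(x = c("rowC1", "rowC2", "rowC3"))

# .id records which list element each row came from
bound <- bind_rows(list(A = dat1, B = dat2, C = dat3), .id = "source")

result <- bound |>
  group_by(source) |>
  mutate(index = row_number()) |>  # position within each source frame
  ungroup() |>
  arrange(index, source)

result$x
# "rowA1" "rowB1" "rowC1" "rowA2" "rowB2" "rowC2" "rowA3" "rowB3" "rowC3"
```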

data.table approach

For completeness here is the same approach using data.table. If your data is large this should be quicker than the other methods because setorder() modifies in place, rather than copying.


library(data.table)
dat  <- to_bind |>
    lapply(
        \(df) {
            setDT(df)
            df[, index := .I]
        }
    ) |>
    rbindlist() |>
    setorder(index)
# <also creates the rows in the same order>
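
Since setDT() converts its argument by reference, a variant that leaves the input data frames untouched may sometimes be preferable. The following is a sketch under that assumption, using rbindlist()'s idcol argument and rowid(); the column names source and index are illustrative.

```r
library(data.table)

dat1 <- data.frame(x = c("rowA1", "rowA2", "rowA3"))
dat2 <- data.frame(x = c("rowB1", "rowB2", "rowB3"))
dat3 <- data.frame(x = c("rowC1", "rowC2", "rowC3"))

# rbindlist() builds a new data.table; idcol numbers the source frames
dt <- rbindlist(list(dat1, dat2, dat3), idcol = "source")

# rowid() gives each row's position within its source frame
dt[, index := rowid(source)]

# Sort in place by position, then by source frame
setorder(dt, index, source)

dt$x
# "rowA1" "rowB1" "rowC1" "rowA2" "rowB2" "rowC2" "rowA3" "rowB3" "rowC3"
```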

Input data (method 1)

dat  <- structure(list(x = c("rowA1", "rowA2", "rowA3", "rowB1", "rowB2",
"rowB3", "rowC1", "rowC2", "rowC3")), class = "data.frame", row.names = c(NA,
-9L))