Convert character string and data frame into another data frame


I have a one row data frame that looks like this:

            Donor  Treatment Timepoint
  MK434_016   WT5 ST002_50uM       6hr

And a character vector that looks like this:

[1] "AAACAAGCAAACAAGAATTCGGTT-1" "AAACAAGCAAACAATCATTCGGTT-1" "AAACAAGCAAACCTGAATTCGGTT-1" "AAACAAGCAAACTTGGATTCGGTT-1"
[5] "AAACAAGCAAAGACCCATTCGGTT-1" "AAACAAGCAAAGGTAAATTCGGTT-1"

I'd like to merge the two to create a data frame that looks like this:

                           Donor  Treatment Timepoint
AAACAAGCAAACAAGAATTCGGTT-1   WT5 ST002_50uM       6hr
AAACAAGCAAACAATCATTCGGTT-1   WT5 ST002_50uM       6hr
AAACAAGCAAACCTGAATTCGGTT-1   WT5 ST002_50uM       6hr
etc...

I've tried merging them in several different ways using rbind() or paste() but can't figure out how to get the full data frame I'm looking for.


There are 2 answers

r2evans (best answer)

I'll first join them with the barcodes in a regular column rather than as row names, since some tools honor row names, some ignore them, and some actively remove them.

df2 <- cbind(df1[rep(1, length(strings)),], data.frame(barcode = strings))
df2
#             Donor  Treatment Timepoint                    barcode
# MK434_016     WT5 ST002_50uM       6hr AAACAAGCAAACAAGAATTCGGTT-1
# MK434_016.1   WT5 ST002_50uM       6hr AAACAAGCAAACAATCATTCGGTT-1
# MK434_016.2   WT5 ST002_50uM       6hr AAACAAGCAAACCTGAATTCGGTT-1
# MK434_016.3   WT5 ST002_50uM       6hr AAACAAGCAAACTTGGATTCGGTT-1
# MK434_016.4   WT5 ST002_50uM       6hr AAACAAGCAAAGACCCATTCGGTT-1
# MK434_016.5   WT5 ST002_50uM       6hr AAACAAGCAAAGGTAAATTCGGTT-1

From here, if you really want to move the barcodes out of the column and into the row names, it is simple enough:

rownames(df2) <- df2$barcode
df2$barcode <- NULL
df2
#                            Donor  Treatment Timepoint
# AAACAAGCAAACAAGAATTCGGTT-1   WT5 ST002_50uM       6hr
# AAACAAGCAAACAATCATTCGGTT-1   WT5 ST002_50uM       6hr
# AAACAAGCAAACCTGAATTCGGTT-1   WT5 ST002_50uM       6hr
# AAACAAGCAAACTTGGATTCGGTT-1   WT5 ST002_50uM       6hr
# AAACAAGCAAAGACCCATTCGGTT-1   WT5 ST002_50uM       6hr
# AAACAAGCAAAGGTAAATTCGGTT-1   WT5 ST002_50uM       6hr
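
If the intermediate barcode column is never needed, the same layout can probably be reached by assigning the row names directly. This is only a sketch: `df3` is a name made up for illustration, the data is rebuilt inline with just three barcodes so the block runs on its own, and it assumes the barcodes are unique, since data frame row names cannot repeat.

```r
# Example data, rebuilt inline (three barcodes only, to keep it short)
df1 <- data.frame(Donor = "WT5", Treatment = "ST002_50uM", Timepoint = "6hr")
rownames(df1) <- "MK434_016"
strings <- c("AAACAAGCAAACAAGAATTCGGTT-1",
             "AAACAAGCAAACAATCATTCGGTT-1",
             "AAACAAGCAAACCTGAATTCGGTT-1")

# Assumption: barcodes are unique; duplicated row names would be an error
stopifnot(anyDuplicated(strings) == 0)

# Repeat the single row once per barcode, then overwrite the row names
df3 <- df1[rep(1, length(strings)), ]
rownames(df3) <- strings
df3
```

The trade-off is the one mentioned at the top: once the barcodes live only in the row names, any tool that drops row names will silently lose them.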

A quick dplyr version (column_to_rownames() comes from the tibble package):

library(dplyr)
df1[rep(1, length(strings)),] %>%
  `rownames<-`(NULL) %>%
  mutate(barcode = strings) %>%
  tibble::column_to_rownames("barcode")
#                            Donor  Treatment Timepoint
# AAACAAGCAAACAAGAATTCGGTT-1   WT5 ST002_50uM       6hr
# AAACAAGCAAACAATCATTCGGTT-1   WT5 ST002_50uM       6hr
# AAACAAGCAAACCTGAATTCGGTT-1   WT5 ST002_50uM       6hr
# AAACAAGCAAACTTGGATTCGGTT-1   WT5 ST002_50uM       6hr
# AAACAAGCAAAGACCCATTCGGTT-1   WT5 ST002_50uM       6hr
# AAACAAGCAAAGGTAAATTCGGTT-1   WT5 ST002_50uM       6hr

Data

df1 <- structure(
  list(Donor = "WT5", Treatment = "ST002_50uM", Timepoint = "6hr"),
  class = "data.frame", row.names = "MK434_016"
)
strings <- c(
  "AAACAAGCAAACAAGAATTCGGTT-1", "AAACAAGCAAACAATCATTCGGTT-1",
  "AAACAAGCAAACCTGAATTCGGTT-1", "AAACAAGCAAACTTGGATTCGGTT-1",
  "AAACAAGCAAAGACCCATTCGGTT-1", "AAACAAGCAAAGGTAAATTCGGTT-1"
)

IceCreamToucan

Using the data from @r2evans' answer (reframe() requires dplyr 1.1.0 or later):

library(dplyr)

df1 %>% 
  reframe(barcode = strings, across(everything()))
#>                      barcode Donor  Treatment Timepoint
#> 1 AAACAAGCAAACAAGAATTCGGTT-1   WT5 ST002_50uM       6hr
#> 2 AAACAAGCAAACAATCATTCGGTT-1   WT5 ST002_50uM       6hr
#> 3 AAACAAGCAAACCTGAATTCGGTT-1   WT5 ST002_50uM       6hr
#> 4 AAACAAGCAAACTTGGATTCGGTT-1   WT5 ST002_50uM       6hr
#> 5 AAACAAGCAAAGACCCATTCGGTT-1   WT5 ST002_50uM       6hr
#> 6 AAACAAGCAAAGGTAAATTCGGTT-1   WT5 ST002_50uM       6hr

Created on 2023-10-27 with reprex v2.0.2
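
The reframe() result keeps the barcodes in a regular column. To get the row-name layout shown in the question, one option is to pipe the result into tibble::column_to_rownames(). This is a sketch, assuming dplyr 1.1.0 or later and the tibble package are installed; `out` is a hypothetical name, and the data is rebuilt inline with three barcodes so the block runs on its own.

```r
library(dplyr)

# Example data, rebuilt inline (three barcodes only, to keep it short)
df1 <- data.frame(Donor = "WT5", Treatment = "ST002_50uM", Timepoint = "6hr")
rownames(df1) <- "MK434_016"
strings <- c("AAACAAGCAAACAAGAATTCGGTT-1",
             "AAACAAGCAAACAATCATTCGGTT-1",
             "AAACAAGCAAACCTGAATTCGGTT-1")

# reframe() recycles the single row of df1 against the barcode vector;
# column_to_rownames() then moves the barcode column into the row names
out <- df1 %>%
  reframe(barcode = strings, across(everything())) %>%
  tibble::column_to_rownames("barcode")
out
```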