I am trying to subset one FASTA file (containing multiple sequences) into several smaller ones based on IDs I stored in a list of data frames (and
I have a FASTA called fastafile like this:
fastafile <- dput(fastafile)
structure(list(r1 = "acatattggaggccgaaacaatgaggcgtgatcaactcagtatatcac",
r2 = "ctaacctctcccagtgtggaacctctatctcatgagaaagctgggatgag",
r3 = "atttcctcctgctgcccgggaggtaacaccctggacccctggagtctgca",
r4 = "acatattggaggccgaaacaatgaggcgtgatcaactcagtatatcgg",
r5 = "ctaacctctcccagtgtggaacctctatctcatgagaaagctgggatgg",
r6 = "atttcctcctgctgcccgggaggtaacaccctggacccctggagtctgg"), .Names = c("r1",
"r2", "r3", "r4", "r5", "r6"))
that I loaded using seqinr package like that:
fastafile <- read.fasta(file = "fastafile.fasta",
seqtype = c("DNA","AA"),
as.string = TRUE, set.attributes = FALSE)
I load a table with my IDs and some expression values
GOI <- read.table(header = TRUE, text = "ID T1 T2
1 r1 1.1 2.1
2 r2 1.2 2.2
3 r3 1.1 2.2
4 r4 1.2 2.1
5 r5 1.1 2.1
6 r6 1.2 2.2")
and split them into manageable subsets
GOI.split <- split(GOI,rep(1:3,each=2))
giving me
> GOI.split
$`1`
ID T1 T2
1 r1 1.1 2.1
2 r2 1.2 2.2
$`2`
ID T1 T2
3 r3 1.1 2.2
4 r4 1.2 2.1
$`3`
ID T1 T2
5 r5 1.1 2.1
6 r6 1.2 2.2
Now I would like to subset my sequences based on the IDs in the GOI.split
data frames. In this mock example it should be two sequences per list item. To get the subset for the first one of the listed data frames I can say:
FASTA.1 <- fastafile[c(which(names(fastafile) %in% GOI.split[[1]][,1]))]
# $r1
# [1] "acatattggaggccgaaacaatgaggcgtgatcaactcagtatatcac"
#
# $r2
# [1] "ctaacctctcccagtgtggaacctctatctcatgagaaagctgggatgag"
(and so on) however I would like to subset for all data frames in one swift move to have a list with my desired fastas (3 list items containing, in this case, 2 sequences each). I tried:
FASTAs <- lapply(fastafile, function(i)
{fastafile[c(which(names(fastafile) %in% GOI.split[[i]][ ,1]))]})
Could somebody please tell me why this is not working and what I have to do instead.
Thanks
This can be done with the following code:
As to why your
lapply
code is not working. One reason is that you are passing infastafile
, and you should be passing in indices.So you are trying this:
When you should do this:
So, to fix it, pass in
1:length(GOI.split)
instead offastafile
: