I'm trying to update a variable (popsnp) in a higher scope within an lapply, on the basis of a match. I can't quite figure out the syntax for updating the values, though. What I have currently overwrites any previously existing values with NA:
lapply(1:22, function(i){
in.name<-paste("/data/mdp14aps/ld/chr", i, ".ld", sep="")
out.name<-paste("/data/mdp14aps/R/ldatachr", i, ".rda", sep="")
ldata<-read.csv(in.name, sep="", header=TRUE,
colClasses=c(NA,NA,NA,NA,NA,NA,"NULL"))
freq<-count(ldata, c("SNP_A", "CHR_A", "BP_A"))
#the part I'm not sure about
popsnp$chrom<<-freq[match(popsnp$marker, freq$SNP_A),2]
popsnp$position<<-freq[match(popsnp$marker, freq$SNP_A),3]
popsnp$freq<<-freq[match(popsnp$marker, freq$SNP_A),4]
save(ldata,file=out.name)
rm(ldata, freq)
})
I want to preserve the values I'm setting between iterations of lapply so I end up with popsnp containing all values of chrom, position and freq, not just the last iteration.
I feel like this should be straightforward, but I'm still somewhat unfamiliar with R.
A toy example:
test<-data.frame(A = c("a", "b", "c", "d", "e"), B = c(rep(NA,5)))
test1<-data.frame(A = c("a", "b"), B = c(1, 2))
test2<-data.frame(A = c("c", "d", "e"), B = c(3, 4, 5))
test$B<-test1[match(test$A, test1$A), 2]
test$B<-test2[match(test$A, test2$A), 2]
I want test$B to have the values from 1-5 in it.
Update for your Toy Example
You need to subset both sides of your assignment, and also convert your conditions to logical subsetting vectors.
I recommend you look at each element individually so you can see what's happening.
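As a minimal sketch of that advice applied to the toy example (the helper names in1 and in2 are illustrative, and stringsAsFactors = FALSE is added only so the keys are compared as plain strings on older R versions): build a logical vector marking the rows of test that have a match, then assign only to those rows, so rows without a match keep whatever they already held instead of being overwritten with NA.

```r
test  <- data.frame(A = c("a", "b", "c", "d", "e"), B = rep(NA_real_, 5),
                    stringsAsFactors = FALSE)
test1 <- data.frame(A = c("a", "b"), B = c(1, 2), stringsAsFactors = FALSE)
test2 <- data.frame(A = c("c", "d", "e"), B = c(3, 4, 5), stringsAsFactors = FALSE)

# Logical vector: TRUE for each row of test whose key appears in test1
in1 <- test$A %in% test1$A
# Left side: only the matching rows. Right side: the values for those same rows.
test$B[in1] <- test1$B[match(test$A[in1], test1$A)]

# Same pattern for the second lookup table; rows a and b are left untouched
in2 <- test$A %in% test2$A
test$B[in2] <- test2$B[match(test$A[in2], test2$A)]

test$B
# [1] 1 2 3 4 5
```

Printing in1, match(test$A[in1], test1$A) and test$B after each step shows what each piece contributes.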
Previously...
I'm not exactly sure I understand what your example is trying to accomplish, so here are some general points on subsetting.
Best practice is to always use TRUE / FALSE conditions while subsetting to avoid future errors. You could subset by row number, but it ALWAYS gets messy.
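A small illustration of the difference, using a made-up data frame (df, id, value and keep are hypothetical names, not from the question): the logical condition is recomputed from the data each time, whereas hard-coded row numbers only stay correct as long as the rows never move.

```r
df <- data.frame(id = c("a", "b", "c", "d"),
                 value = c(10, 20, 30, 40),
                 stringsAsFactors = FALSE)

# One TRUE/FALSE per row, derived from the data itself
keep <- df$value > 15 & df$id != "d"

# Read the matching rows (b and c)
selected <- df[keep, ]

# Update only the matching rows; the others are left alone
df$value[keep] <- 0

# The row-number equivalent picks the same rows today, but would silently
# point at different rows if df were sorted or filtered beforehand
by_number <- df[c(2, 3), ]
```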
It's important to note that your use of <<- pushes your change of the variable to the parent environment, outside of the scope of your function. This can lead to unexpected results in the future. It's better to supply the variable you want to change and then return it again at the end of your manipulation function. This way you have a clear sequence of events.
Final Update
Lastly, with respect to dropping unnecessary columns: typical practice is to drop them after import, either by name (best practice) or by column number (which breaks as soon as the data layout changes).
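To tie these points together, here is a rough sketch on the toy data rather than the real files. The function update_from, its arguments, and the unused column are all hypothetical names introduced for illustration. The data frame is passed in and returned instead of being modified with <<-, only matched rows are overwritten, and the unwanted column is dropped by name after the data is loaded.

```r
# Hypothetical helper: copy the columns in `cols` from `lookup` into `target`
# for rows whose key matches, and return the updated `target`
update_from <- function(target, lookup, key, cols) {
  idx <- match(target[[key]], lookup[[key]])
  hit <- !is.na(idx)                      # logical: which rows found a match
  for (col in cols) {
    target[[col]][hit] <- lookup[[col]][idx[hit]]
  }
  target
}

test <- data.frame(A = c("a", "b", "c", "d", "e"), B = NA_real_,
                   stringsAsFactors = FALSE)

# Stand-ins for the per-chromosome tables read in each iteration
pieces <- list(
  data.frame(A = c("a", "b"), B = c(1, 2), unused = c("x", "y"),
             stringsAsFactors = FALSE),
  data.frame(A = c("c", "d", "e"), B = c(3, 4, 5), unused = c("x", "y", "z"),
             stringsAsFactors = FALSE)
)

for (piece in pieces) {
  # Drop the unneeded column by name rather than by position
  piece <- piece[, !(names(piece) %in% "unused"), drop = FALSE]
  # Reassign the returned value, so each iteration builds on the last
  test <- update_from(test, piece, key = "A", cols = "B")
}

test$B
# [1] 1 2 3 4 5
```

A plain for loop is used here instead of lapply because the loop body exists to update test step by step, so there is no list of results worth collecting.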