Building model matrix to correct for batch effect with biological and technical replicates


I recently ran mass spectrometry on my samples. Each sample was run through the machine three times. However, there was a long gap between the first run and the subsequent second and third runs (which were run together), so I would like to correct for the resulting batch effect. My data set looks something like this:

Sample      | Biological Rep 1                  | Biological Rep 2                  |
Condition   | C1     | C2     | C3     | C4     | C1     | C2     | C3     | C4     |
Tech repeat |1 |2 |2 |1 |2 |2 |1 |2 |2 |1 |2 |2 |1 |2 |2 |1 |2 |2 |1 |2 |2 |1 |2 |2 |

In the tech repeat row, 1 marks the technical repeat that was run first, and 2 marks the second and third repeats, which were run at the same time.

Here is a sample data frame, where B1 represents biological repeat 1 and B2 represents biological repeat 2, C1-C4 represent the conditions, and .1-.3 represent the technical repeats.

Protein <- c("Protein1", "Protein2", "Protein3")
B1C1.1 <- c(15, 41, 32)
B1C1.2 <- c(3, 10, 14)
B1C1.3 <- c(4, 6, 9)
B1C2.1 <- c(10, 13, 19)
B1C2.2 <- c(5, 11, 15)
B1C2.3 <- c(4, 6, 9)
B1C3.1 <- c(15, 41, 32)
B1C3.2 <- c(3, 10, 14)
B1C3.3 <- c(4, 6, 9)
B1C4.1 <- c(10, 13, 19)
B1C4.2 <- c(5, 11, 15)
B1C4.3 <- c(4, 6, 9)
B2C1.1 <- c(15, 41, 32)
B2C1.2 <- c(3, 10, 14)
B2C1.3 <- c(4, 6, 9)
B2C2.1 <- c(10, 13, 19)
B2C2.2 <- c(5, 11, 15)
B2C2.3 <- c(4, 6, 9)
B2C3.1 <- c(15, 41, 32)
B2C3.2 <- c(3, 10, 14)
B2C3.3 <- c(4, 6, 9)
B2C4.1 <- c(10, 13, 19)
B2C4.2 <- c(5, 11, 15)
B2C4.3 <- c(4, 6, 9)
df <- data.frame(Protein,
                 B1C1.1, B1C1.2, B1C1.3, B1C2.1, B1C2.2, B1C2.3,
                 B1C3.1, B1C3.2, B1C3.3, B1C4.1, B1C4.2, B1C4.3,
                 B2C1.1, B2C1.2, B2C1.3, B2C2.1, B2C2.2, B2C2.3,
                 B2C3.1, B2C3.2, B2C3.3, B2C4.1, B2C4.2, B2C4.3)
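As a cross-check on the naming scheme described above, the column names can be parsed back into per-run annotation. This is only a sketch in base R: the variable names (`samples`, `bio_rep`, `condition`, `tech_rep`, `batch`, `targets`) are my own and not part of the original code, and it assumes every column follows the `BxCy.z` pattern exactly.

```r
# Rebuild the 24 column names in the same order as the data frame above
samples <- paste0("B", rep(1:2, each = 12),
                  "C", rep(rep(1:4, each = 3), times = 2),
                  ".", rep(1:3, times = 8))

# Pull each piece of the BxCy.z name out with a regular expression
bio_rep   <- factor(sub("^B([0-9])C[0-9]\\.[0-9]$", "\\1", samples))
condition <- factor(sub("^B[0-9]C([0-9])\\.[0-9]$", "\\1", samples))
tech_rep  <- as.integer(sub("^B[0-9]C[0-9]\\.([0-9])$", "\\1", samples))

# Technical repeat 1 was run first; repeats 2 and 3 were run together
batch <- factor(ifelse(tech_rep == 1, 1, 2))

# One row of annotation per run
targets <- data.frame(sample = samples, bio_rep, condition, batch)

# Every condition should appear in both batches
table(targets$condition, targets$batch)
```

Tabulating the factors against each other like this makes it easier to see whether a factor built by hand with `rep()` really lines up with the column order.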

The model matrix I have tried to build looks like this:

library(limma)

tr1 <- as.factor(rep(c(1, 2, 2), 8)) # batch: technical repeat 1 vs technical repeats 2/3
ms1 <- as.factor(c(rep(1, 6), rep(2, 6), rep(3, 6), rep(4, 6))) # 4 samples, each run 6 times
ex1 <- as.factor(c(rep(1, 3), rep(2, 3), rep(3, 3), rep(4, 3), rep(1, 3), rep(2, 3), rep(3, 3), rep(4, 3))) # 2 biological repeats for each sample, each run three times
design1 <- model.matrix(~ ex1 + ms1 + tr1)
block <- c(1:6, 1:6, 1:6, 1:6)
dupcor <- duplicateCorrelation(df, design = design1, block = block)
fit <- lmFit(df, design1, block = block, correlation = dupcor$consensus.correlation)

However, when I run the code, I get the following note:

Note: design matrix not of full rank (1 coef not estimable).
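To see where the note comes from, the design can be rebuilt in base R and its rank compared with its number of columns. This is a diagnostic sketch only, reusing the three factors exactly as defined above; the final check is my own reading of why one coefficient is lost.

```r
# Same factors as in the question
tr1 <- as.factor(rep(c(1, 2, 2), 8))
ms1 <- as.factor(c(rep(1, 6), rep(2, 6), rep(3, 6), rep(4, 6)))
ex1 <- as.factor(c(rep(1, 3), rep(2, 3), rep(3, 3), rep(4, 3),
                   rep(1, 3), rep(2, 3), rep(3, 3), rep(4, 3)))
design1 <- model.matrix(~ ex1 + ms1 + tr1)

ncol(design1)     # 8 coefficients requested
qr(design1)$rank  # 7 linearly independent columns

# With this column order, ms1 levels 2 and 4 cover exactly the runs
# where ex1 is 3 or 4, so these two sums of columns are identical
all(design1[, "ex13"] + design1[, "ex14"] ==
    design1[, "ms12"] + design1[, "ms14"])  # TRUE
```

Because `ms1` groups the columns in blocks of six while the data are ordered by biological repeat and then condition, `ms1` partly repeats information already carried by `ex1`, which is what makes one coefficient not estimable. limma also provides `nonEstimable()`, which should report the name of the affected coefficient for a design like this.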

How can I work around this problem? Any input would be greatly appreciated! Thank you
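For reference, one possible way to encode the layout described above so that the design is full rank is sketched here. It is an assumption on my part, not a confirmed solution: the names `condition`, `bio_rep`, `batch`, `bio_sample` and `design2` are hypothetical, the factors are built to match the column order B1C1.1 ... B2C4.3, and whether biological repeat belongs in the fixed effects is a modelling choice.

```r
# Factors matching the column order B1C1.1, B1C1.2, ..., B2C4.3
condition  <- factor(rep(rep(paste0("C", 1:4), each = 3), times = 2))
bio_rep    <- factor(rep(c("B1", "B2"), each = 12))
batch      <- factor(rep(c("first", "later", "later"), times = 8))

# One block per biological sample (its three technical repeats)
bio_sample <- rep(1:8, each = 3)

design2 <- model.matrix(~ condition + bio_rep + batch)
ncol(design2)     # 6 coefficients
qr(design2)$rank  # 6, so every coefficient is estimable

# How this might plug into limma (not run here; assumes df from above):
# library(limma)
# expr <- as.matrix(df[, -1]); rownames(expr) <- df$Protein
# dupcor <- duplicateCorrelation(expr, design = design2, block = bio_sample)
# fit <- lmFit(expr, design2, block = bio_sample,
#              correlation = dupcor$consensus.correlation)
```

The idea is that `batch` absorbs the shift between the first run and the later runs, while blocking on `bio_sample` treats the three runs of each biological sample as correlated technical repeats rather than independent observations.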