Linear multivariate regression in R


I want to model a factory that takes an input of, say, x tonnes of raw material, which is then processed. In the first step, waste materials are removed and a product P1 is created. The "rest" of the material is then processed once again, and another product P2 is created.

The problem is that I want to know how much raw material it takes to produce, say, 1 tonne of product P1 and how much raw material it takes to produce 1 tonne of P2.

I know the amount of raw materials, the amount of finished product P1 and P2 but nothing more.

In my mind, this can be modelled through multivariate regression, using P1 and P2 as dependent variables and the total raw material as the independent variable, and finding the factors < 1 for each finished product. Does this seem right?

Also, how can this be achieved using R? From googling, I've found how to conduct multivariable regression, but not multivariate regression in R.

EDIT:

Trying to use:

datas <- read.table("datass.csv",header = TRUE, sep=",")

rawMat <- matrix(datas[,1])
P1 <- matrix(datas[,2])
P2 <- matrix(datas[,3])
fit <- lm(formula = P1 ~ rawMat)
fit

fit2 <-lm(formula = P2 ~ rawMat)
fit2

gave me results which are certainly not aligned with reality. fit2, for instance, returned 0.1381, where the value should be around 0.8. How can I factor in Y1 as well? fit2 more or less gave me the average P2/rawMat, but rawMat is the same raw material used to produce both products, so I would like to have something like 0.8 as the factor for P1, and around the same for the factor of P2.

The R output was only:

 Coefficients:
 (Intercept)      rawMat   
  -65.6702       0.1381  

for fit2. Why doesn't it include "rawMat1", "rawMat2" as in J.R.'s solution?

EDIT2: datass.csv contains 3 columns: the first holds the raw material required to produce both products P1 and P2, the second the tonnes of P1 produced, and the last the same for P2.
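(For reference: with rawMat as a single column, lm() estimates only one slope per response, which is why no rawMat1/rawMat2 terms appear. Both products can still be fitted in one multivariate call by binding them into a response matrix. A minimal sketch with simulated data assuming the datass.csv layout described above; the factors 0.8 and 0.15 are made-up illustration values, not the real ones:)

```r
# Simulated stand-in for datass.csv: one raw-material column, with P1 and P2
# each a noisy fixed fraction of the same raw material.
set.seed(1)
rawMat <- runif(100, 50, 150)
P1 <- 0.80 * rawMat + rnorm(100, 0, 2)
P2 <- 0.15 * rawMat + rnorm(100, 0, 1)

# cbind() turns the two products into a two-column response matrix, so a
# single lm() call fits both regressions against the one predictor at once.
fit <- lm(cbind(P1, P2) ~ rawMat)
coef(fit)  # one coefficient column per response; only one rawMat slope each
```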

Answer by J.R. (accepted):

Multivariate multiple regression can be done with lm(). This is very well documented, but here follows a little example:

rawMat <- matrix(rnorm(200), ncol=2)
noise <- matrix(rnorm(200, 0, 0.2), ncol=2)
B <- matrix( 1:4, ncol=2)
P <- t( B %*% t(rawMat)) + noise 

fit <- lm(P ~ rawMat)
summary( fit )

with summary output:

Response Y1 :

Call:
lm(formula = Y1 ~ rawMat)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.50710 -0.14475 -0.02501  0.11955  0.51882 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.007812   0.019801  -0.395    0.694    
rawMat1      1.002428   0.020141  49.770   <2e-16 ***
rawMat2      3.032761   0.020293 149.445   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1978 on 97 degrees of freedom
Multiple R-squared:  0.9964,    Adjusted R-squared:  0.9963 
F-statistic: 1.335e+04 on 2 and 97 DF,  p-value: < 2.2e-16


Response Y2 :

Call:
lm(formula = Y2 ~ rawMat)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.60435 -0.11004  0.02105  0.11929  0.42539 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.02287    0.01930   1.185    0.239    
rawMat1      2.05474    0.01964 104.638   <2e-16 ***
rawMat2      4.00162    0.01978 202.256   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1929 on 97 degrees of freedom
Multiple R-squared:  0.9983,    Adjusted R-squared:  0.9983 
F-statistic: 2.852e+04 on 2 and 97 DF,  p-value: < 2.2e-16

EDIT: In your case, with a data.frame named datas, you could do something like:

datas <- data.frame( y1 = P[,1], y2=P[,2], x1 = rawMat[,1], x2 = rawMat[,2])
fit <- lm( as.matrix(datas[ ,1:2]) ~ as.matrix(datas[,3:4]) )

or instead:

fit <- with(datas, lm( cbind(y1,y2) ~ x1+x2 ))
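Either form returns an "mlm" fit whose coefficients come back as a matrix with one column per response. A small follow-up sketch, re-running the answer's simulation, showing how to read them off (the seed is arbitrary):

```r
# Re-create the answer's simulated data: B holds the true coefficients,
# i.e. y1 = 1*x1 + 3*x2 and y2 = 2*x1 + 4*x2.
set.seed(42)
rawMat <- matrix(rnorm(200), ncol = 2)
noise  <- matrix(rnorm(200, 0, 0.2), ncol = 2)
B      <- matrix(1:4, ncol = 2)
P      <- t(B %*% t(rawMat)) + noise

fit <- lm(P ~ rawMat)
coef(fit)  # 3 x 2 matrix: rows (Intercept), rawMat1, rawMat2; one column per response
```

The estimated slopes should land close to the entries of B, just as in the summary output above.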