Subscript out of bounds error when plotting LDA output


I am trying to run a linear discriminant analysis in R. My data frame has 102 rows and 24 columns, and the observations fall into two groups. I ran the following R code:

mydata<-read.table()

head(mydata)
  Factor   TL   SL   FL   HL   HH EHH   BH  BW   CL  CH FNL  DFH  AFL  AFH PFL
1      1 86.0 68.4 77.5 15.4 14.1 9.4 21.3 4.7 14.2 9.8 6.8 13.0 10.2 10.2 1.7
2      1 71.8 57.4 65.1 14.3 12.1 8.2 16.3 4.1  9.1 6.5 5.5 10.4  8.9  7.8 1.1
3      1 82.9 64.3 72.8 15.3 13.1 8.3 19.1 4.7 10.7 9.5 7.7 12.4 10.9  8.1 1.6
4      1 74.2 56.5 55.7 14.3 11.8 7.2 18.7 5.2  7.5 5.7 5.6 11.8  9.4  7.8 1.2
5      1 66.8 52.1 61.1 13.1 10.9 7.9 15.5 5.5  7.2 5.4 4.2 10.1  6.5  5.5 1.1
6      1 72.6 58.9 61.7 13.5 12.4 8.2 18.3 6.1  9.7 7.6 6.8 10.4  5.6  8.9 1.2
   PFH ABFL ABFH Sin_P Posh_P  B_P  B_M B_M_B
1 13.7  1.8  9.4  16.3   34.6 39.6 48.1  29.1
2  9.4  1.2  6.3   9.4   30.5 32.8 38.4  23.8
3 12.2  1.7  9.1  16.4   34.6 39.5 44.8  30.1
4 11.1  1.3  5.7  14.3   31.6 29.1 41.1  23.2
5  9.2  1.1  6.8  14.8   30.2 29.1 36.3  23.4
6  9.8  1.9  8.5  15.4   30.9 32.9 41.9  25.1

library(MASS)
ord <- lda(Factor ~ ., mydata)
ord
Call:
lda(Factor ~ ., data = mydata)

Prior probabilities of groups:
  1   2 
0.5 0.5 

Group means:
        TL       SL       FL       HL       HH      EHH       BH       BW
1 73.29020 57.99412 64.90392 14.15686 13.33137 8.347059 16.41373 5.821569
2 76.44118 61.42745 68.01569 14.48627 12.54510 8.227451 16.15294 7.586275
        CL       CH      FNL      DFH      AFL      AFH      PFL      PFH
1 8.427451 6.449020 6.070588 11.70980 8.611765 8.233333 1.360784 10.92157
2 8.752941 6.619608 6.954902 12.99412 8.821569 9.013725 2.754902 11.37255
      ABFL     ABFH    Sin_P   Posh_P      B_P      B_M    B_M_B
1 1.482353 7.982353 14.78235 32.70196 32.94314 39.09235 23.77157
2 1.698039 8.639216 15.40196 33.13725 33.78431 40.99020 24.82745

Coefficients of linear discriminants:
                LD1
TL     -0.158877362
SL      0.085504033
FL     -0.001151154
HL      0.001549496
HH     -0.006513463
EHH    -0.457378984
BH     -0.071013364
BW      0.682076101
CL      0.124730256
CH      0.064695108
FNL     0.059726102
DFH     0.193330210
AFL    -0.121504298
AFH     0.126553648
PFL     0.092334665
PFH     0.162660412
ABFL    0.041923390
ABFH   -0.168389200
Sin_P  -0.071962994
Posh_P -0.093672821
B_P     0.082480896
B_M     0.030929099
B_M_B   0.037913734

However, when I try to plot the output, I get this error:

library(ggord)
ggord(ord, mydata$Factor)
Error in predict(ord_in)$x[, axes] : subscript out of bounds

I found that the problem is that the output contains just LD1, and LD2 is not available. Can anyone help me solve this? My data is available at this link: https://www.dropbox.com/preview/Foruhar/morph.txt


There are 2 answers

missuse On BEST ANSWER

LDA produces at most min(n, c-1) discriminants, where c is the number of classes and n is the number of features. With two classes you therefore get only LD1. ggord needs two dimensions to plot, so the call fails when it tries to index a second axis that does not exist. Instead, try a histogram or density plot of LD1 colored by class. Your data link is not valid (it works only for you), so here is an example on generated data:

LD1_proj = c(rnorm(50)-1, rnorm(50)+1)
class = rep(c(1,2), each = 50)
df = data.frame(LD1 = LD1_proj, class = as.factor(class))

library(ggplot2)
ggplot(data = df)+
  geom_density(aes(LD1, fill = class), alpha = 0.1)


With your own data, you would do something like this:

pred = predict(ord, mydata)
LD1_proj = pred$x[, 1]  # scores on the single discriminant, LD1
class = mydata$Factor

and continue as above
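The whole workflow can be sketched end to end on data that everyone can load. This is only an illustration: the built-in iris data set stands in for the original data (an assumption, since the posted link is not accessible), and the names two, fit, and scores are made up for the example. It also shows the min(n, c-1) rule in action: three classes give two discriminants, two classes give one.

```r
library(MASS)     # lda()
library(ggplot2)  # density plot

# Three classes, four features: min(4, 3 - 1) = 2 discriminants
ncol(lda(Species ~ ., data = iris)$scaling)

# Assumption: two species of iris stand in for the two-group data
two <- droplevels(subset(iris, Species != "setosa"))

fit <- lda(Species ~ ., data = two)
ncol(fit$scaling)  # two classes: only LD1 is available

# Project the observations onto LD1 and keep the true class labels
pred <- predict(fit, two)
scores <- data.frame(LD1 = pred$x[, 1], class = two$Species)

# One density curve per class along the single discriminant
p <- ggplot(scores) +
  geom_density(aes(LD1, fill = class), alpha = 0.3)
print(p)
```

Checking ncol(fit$scaling) before plotting is a cheap way to decide between a one-dimensional plot like this and a two-dimensional ordination plot.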

Keerthesh Kumar On

After you have run the LDA, you need to put the predictions into a data frame. Below is the sample code I used to create a density plot.

library(MASS)
sm.lda <- lda(Direction ~ Lag1+Lag2, data = SMTrain)
print(sm.lda)

p <- predict(sm.lda, SMTrain)
#--- converting the prediction to data.frame (class = predicted class) --- #
p.df <- data.frame(LD1 = p$x[, 1], class = p$class)
print(p.df)

#--- plotting the density plot --- #
library(ggplot2)
ggplot(p.df) + geom_density(aes(LD1, fill = class), alpha = 0.2)
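If ggplot2 is not needed, a quicker look is possible with the plot method that MASS provides for lda fits. As far as I know, when the fit has a single discriminant it draws one histogram of LD1 per group. The sketch below again uses a two-species subset of iris as a stand-in (an assumption, not the data from the question).

```r
library(MASS)

# Assumption: a two-species subset of iris stands in for the two-group data
two <- droplevels(subset(iris, Species != "setosa"))
fit <- lda(Species ~ ., data = two)

# With only LD1 available, this draws stacked histograms, one per group
plot(fit)
```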