Issue with subsetting data; not picking up observations for one category


I'm trying to subset data in R by one column of my spreadsheet into three categories: Cod, Haddock and Whiting. The Haddock subset comes back with no observations when there should be 51, while the other two categories subset fine with all observations accounted for. What could cause this? The spreadsheet looks fine and has no obvious problems, but is there something I could be overlooking?

Thanks

edit:

OK, here's part of the data set:

OpCode                 Species      DistanceFromCoast
SA_F1_280714_C4_1   Atlantic cod    583.69
SA_F1_280714_C4_1   Haddock         583.69
SA_F1_280714_C4_1   Whiting         583.69
SA_F1_290714_C2_10  Atlantic cod    892.51
SA_F1_290714_C2_10  Haddock         892.51
SA_F1_290714_C2_10  Whiting         892.51
SA_F1_280714_C4_6   Haddock         1080.5
SA_F1_280714_C4_6   Whiting         1080.5
SA_F1_280714_C4_6   Atlantic cod    1080.5
SA_F1_280714_C4_7   Whiting         1030.59
SA_F1_280714_C4_7   Haddock         1030.59
SA_F1_280714_C4_7   Atlantic cod    1030.59

There is 1 answer

Rorschach On

Maybe something is off with the class of your variables. Try checking the structure of the data frame:

str(dat)
# 'data.frame':   12 obs. of  3 variables:
#  $ OpCode           : Factor w/ 4 levels "SA_F1_280714_C4_1",..: 1 1 1 4 4 4 2 2 2 3 ...
#  $ Species          : Factor w/ 3 levels "Atlantic Cod",..: 1 2 3 1 2 3 2 3 1 3 ...
#  $ DistanceFromCoast: num  584 584 584 893 893 ...
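
If the classes look fine, it may also be worth checking the labels themselves. The sketch below assumes one possible cause that the question does not confirm: a stray trailing space in the Haddock label, which makes an exact match return zero rows. The small data frame and the names n_before and n_after are made up for illustration.

```r
# Hypothetical stand-in for the asker's data; the trailing space in
# "Haddock " is an assumption, not something shown in the question.
dat <- data.frame(
  Species = c("Atlantic cod", "Haddock ", "Whiting"),
  DistanceFromCoast = c(583.69, 583.69, 583.69),
  stringsAsFactors = FALSE
)

# Wrap each distinct label in quotes so hidden whitespace becomes visible
print(sprintf('"%s"', unique(as.character(dat$Species))))

# An exact match finds nothing because of the trailing space
n_before <- nrow(subset(dat, Species == "Haddock"))

# Strip leading/trailing whitespace, then match again
dat$Species <- trimws(as.character(dat$Species))
n_after <- nrow(subset(dat, Species == "Haddock"))

print(c(before = n_before, after = n_after))
```

If the quoted labels show no extra characters, the cause lies elsewhere (for example a different spelling or capitalisation in the spreadsheet).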

A grouping operation should work fine:

library(dplyr)
dat %>% group_by(Species) %>%
  summarise(Ave.Dist = mean(DistanceFromCoast))
#        Species Ave.Dist
# 1 Atlantic Cod 896.8225
# 2      Haddock 896.8225
# 3      Whiting 896.8225
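
If you still want a separate data frame per species, as in the question, a minimal sketch with base R's split() could look like this. The data frame is rebuilt from the 12 rows posted in the question; by_species and haddock are illustrative names.

```r
# Rebuild the 12 sample rows from the question
dat <- data.frame(
  OpCode = rep(c("SA_F1_280714_C4_1", "SA_F1_290714_C2_10",
                 "SA_F1_280714_C4_6", "SA_F1_280714_C4_7"), each = 3),
  Species = c("Atlantic cod", "Haddock", "Whiting",
              "Atlantic cod", "Haddock", "Whiting",
              "Haddock", "Whiting", "Atlantic cod",
              "Whiting", "Haddock", "Atlantic cod"),
  DistanceFromCoast = rep(c(583.69, 892.51, 1080.5, 1030.59), each = 3)
)

# split() returns a named list with one data frame per species
by_species <- split(dat, dat$Species)

# Row counts per species; a category with zero rows would show up here
print(sapply(by_species, nrow))

# Pull out a single category by name
haddock <- by_species[["Haddock"]]
print(haddock)
```

Comparing the counts from sapply() against what you expect (51 for Haddock in the full data) is a quick way to see which category is losing rows.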

To graph by group using ggplot2, you need to specify a grouping aesthetic in aes (e.g. color, fill, shape or group).

library(ggplot2)
ggplot(dat, aes(x=Species, y=DistanceFromCoast, fill=Species)) + geom_bar(stat="identity")
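
Note that with several rows per species, stat="identity" stacks the values, so each bar shows the sum of the distances. If one bar per species showing the mean is what you want, a sketch under that assumption is to aggregate first and then plot with geom_col(). The names means and p are illustrative, and the data is again the sample from the question.

```r
library(ggplot2)

# Sample rows from the question (Species and distance columns only)
dat <- data.frame(
  Species = rep(c("Atlantic cod", "Haddock", "Whiting"), times = 4),
  DistanceFromCoast = rep(c(583.69, 892.51, 1080.5, 1030.59), each = 3)
)

# Mean distance per species, computed with base R
means <- aggregate(DistanceFromCoast ~ Species, data = dat, FUN = mean)

# One bar per species; fill maps the grouping variable to colour
p <- ggplot(means, aes(x = Species, y = DistanceFromCoast, fill = Species)) +
  geom_col()
print(p)
```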