In a complex data frame I have a column with recalled net salary, including NAs that I want to exclude, plus a column with the year in which the study was conducted, ranging from 1992 to 2010, more or less like this:
q32 pgssyear
2000 1992
1000 1992
NA 1992
3000 1994
etc.
If I try to draw a boxplot like:
boxplot(dataset$q32~pgssyear,data=dataset, main="Recalled Net Salary per Month (PLN)",
xlab="Year", ylab="Net Salary")
it seems to work, however NAs might distort the calculations, so I wanted to get rid of them:
boxplot(na.omit(dataset$q32)~pgssyear,data=dataset, main="Recalled Net Salary per Month (PLN)",
xlab="Year", ylab="Net Salary")
Then I get a warning message that the lengths of pgssyear and q32 do not match, most likely because I removed the NAs from q32 only. So I tried to shorten pgssyear so that it does not include the rows that correspond to NAs in the q32 column:
pgssyearprim <- subset(dataset$pgssyear, dataset$q32!= NA )
However, pgssyearprim then comes back as an empty factor:
pgssyearprim
factor(0)
Levels: 1992 1993 1994 1995 1997 1999 2002 2005 2008 2010
and I get the same warning message if I introduce it to the boxplot formula...
Of course they wouldn't match: you removed some of the data only from the LHS with
na.omit(dataset$q32)~pgssyear
. Instead use
!is.na(dataset$q32)
as the subset argument to boxplot, so that the same rows are dropped from both sides of the formula.
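A minimal sketch of that suggestion, using a small made-up data frame in place of the real one (the values below are invented for illustration; only the column names q32 and pgssyear come from the question). It also shows why the `!= NA` attempt returned `factor(0)`: any comparison with NA evaluates to NA, never TRUE, so `subset()` keeps nothing. As far as I can tell, the formula method of `boxplot` ignores missing values by default anyway, so the filtered and unfiltered calls should give the same statistics.

```r
# Toy data standing in for the real survey data (values are made up)
dataset <- data.frame(
  q32      = c(2000, 1000, NA, 3000, 1500, NA),
  pgssyear = factor(c(1992, 1992, 1992, 1994, 1994, 1994))
)

# A comparison with NA is always NA, so this condition selects no rows
cmp <- dataset$q32 != NA

# Drop rows with missing q32 via the subset argument; the formula refers to
# columns by name, so both sides stay the same length
bp <- boxplot(q32 ~ pgssyear, data = dataset, subset = !is.na(q32),
              main = "Recalled Net Salary per Month (PLN)",
              xlab = "Year", ylab = "Net Salary")

# Same statistics without the subset, computed without drawing the plot
bp_default <- boxplot(q32 ~ pgssyear, data = dataset, plot = FALSE)

bp$n  # number of non-missing observations per year
```

If you need the filtered year column on its own, `dataset$pgssyear[!is.na(dataset$q32)]` is the working counterpart of the `subset(..., dataset$q32 != NA)` attempt.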