How can I select height size and just two factor levels?

I would like to select the rows within a height interval (height > 0.50 and height < 1.00) and, within that same interval, keep just two levels of pattern (for example, norm and esti). Example data:

sp  diameter    height  breaking    pattern
actcon  15,10   9,50    no  norm
actcon  8,90    9,46    no  norm
actcon  7,00    9,40    no  norm
actcon  12,50   9,25    no  norm
actcon  8,60    9,00    no  norm
actcon  7,70    8,76    no  norm
actcon  0,80    0,50    yes norm
actcon  0,50    0,50    no  norm
actcon  0,90    0,53    yes norm
actcon  0,55    0,54    no  norm
actcon  0,65    0,54    no  norm
actcon  1,10    0,50    no  curv
actcon  0,85    0,93    no  norm
actcon  1,20    0,94    no  norm
actcon  1,30    0,94    no  deit
actcon  0,90    0,94    no  norm
actcon  2,10    0,94    yes norm
actcon  1,00    0,95    no  norm
actcon  0,90    0,95    no  norm
actcon  0,80    0,95    no  norm
actcon  1,00    0,95    no  norm
actcon  1,05    0,96    no  norm
actcon  1,00    0,96    no  norm
actcon  0,90    1,30    no  esti
There are 2 answers

1
Gavin Simpson

[I see @akrun added the same solution in a comment as I was writing this]

You want:

subdf <- subset(yourdf, subset = (height > 0.50 & height < 1.00) &
                                 pattern %in% c("norm", "esti"))

which gives

> subdf
       sp diameter height breaking pattern
9  actcon     0.90   0.53      yes    norm
10 actcon     0.55   0.54       no    norm
11 actcon     0.65   0.54       no    norm
13 actcon     0.85   0.93       no    norm
14 actcon     1.20   0.94       no    norm
16 actcon     0.90   0.94       no    norm
17 actcon     2.10   0.94      yes    norm
18 actcon     1.00   0.95       no    norm
19 actcon     0.90   0.95       no    norm
20 actcon     0.80   0.95       no    norm
21 actcon     1.00   0.95       no    norm
22 actcon     1.05   0.96       no    norm
23 actcon     1.00   0.96       no    norm
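The output above assumes that height was read in as a numeric column, whereas the question's data uses decimal commas. A minimal, self-contained sketch of the whole step, assuming the data arrives as whitespace-separated text (only a handful of the question's rows are copied here, and the name txt is just for illustration):

```r
# A few rows copied from the question; decimals use commas, hence dec = ","
txt <- "sp diameter height breaking pattern
actcon 15,10 9,50 no norm
actcon 0,80 0,50 yes norm
actcon 0,90 0,53 yes norm
actcon 1,10 0,50 no curv
actcon 1,30 0,94 no deit
actcon 0,85 0,93 no norm
actcon 0,90 1,30 no esti"

# stringsAsFactors = TRUE so that sp, breaking and pattern become factors
yourdf <- read.table(text = txt, header = TRUE, dec = ",",
                     stringsAsFactors = TRUE)

# Keep rows strictly inside the height interval whose pattern is norm or esti
subdf <- subset(yourdf, height > 0.50 & height < 1.00 &
                          pattern %in% c("norm", "esti"))
subdf
```

Without dec = ",", values such as 0,53 would not be parsed as numbers and the comparison on height would not behave as intended.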

Note that factors keep levels that no longer occur in the data after subsetting; pattern still has 4 levels here:

> str(subdf)
'data.frame':   13 obs. of  5 variables:
 $ sp      : Factor w/ 1 level "actcon": 1 1 1 1 1 1 1 1 1 1 ...
 $ diameter: num  0.9 0.55 0.65 0.85 1.2 0.9 2.1 1 0.9 0.8 ...
 $ height  : num  0.53 0.54 0.54 0.93 0.94 0.94 0.94 0.95 0.95 0.95 ...
 $ breaking: Factor w/ 2 levels "no","yes": 2 1 1 1 1 1 2 1 1 1 ...
 $ pattern : Factor w/ 4 levels "curv","deit",..: 4 4 4 4 4 4 4 4 4 4 ...

If you want to remove those unused levels, you can do

subdf <- droplevels(subdf)

> str(subdf)
'data.frame':   13 obs. of  5 variables:
 $ sp      : Factor w/ 1 level "actcon": 1 1 1 1 1 1 1 1 1 1 ...
 $ diameter: num  0.9 0.55 0.65 0.85 1.2 0.9 2.1 1 0.9 0.8 ...
 $ height  : num  0.53 0.54 0.54 0.93 0.94 0.94 0.94 0.95 0.95 0.95 ...
 $ breaking: Factor w/ 2 levels "no","yes": 2 1 1 1 1 1 2 1 1 1 ...
 $ pattern : Factor w/ 1 level "norm": 1 1 1 1 1 1 1 1 1 1 ...

But dropping the unused levels might not be the right thing to do, depending on what you need from the data later on.
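One place where the choice shows up is in tabulations: unused levels are still counted, with a count of zero. A small sketch on a made-up factor (the values below are illustrative, not the question's data):

```r
# A toy factor with the same four levels as pattern in the question
pattern <- factor(c("norm", "norm", "curv", "deit", "esti"))

# Subsetting a factor keeps all of its original levels
kept <- pattern[pattern %in% c("norm", "esti")]

with_unused <- table(kept)               # curv and deit appear with count 0
dropped     <- table(droplevels(kept))   # only esti and norm remain

with_unused
dropped
```

If the zero counts are meaningful for your analysis (for example, to show that a pattern never occurs in the height interval), keep the levels; otherwise droplevels() tidies them away.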

2
costebk08

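You could also install the "dplyr" package and use its filter function, with the and operator & for the height interval and the or operator | for the pattern levels. Using dplyr in general can be a simpler way of cleaning data.

install.packages("dplyr")
library(dplyr)
filter(yourdf, height > 0.5 & height < 1, pattern == "norm" | pattern == "esti")

This code takes yourdf (the original data frame) and keeps the rows where height is greater than 0.5 and less than 1, and where pattern is "norm" or "esti". The two height conditions have to be joined with &, not |: height > 0.5 | height < 1 is true for every non-missing height, so it would not restrict the interval at all. If you want to continue working with this subset you will have to assign the result to a new object. This does not alter your original data.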
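As a quick check of how the two height conditions combine, the sketch below compares & and | on a few made-up heights in base R (the values are illustrative, not taken from the question's data frame):

```r
# Illustrative heights: below, on the boundary of, inside, and above the interval
height <- c(0.30, 0.50, 0.53, 0.94, 1.30)

either <- height > 0.5 | height < 1   # at least one condition holds
both   <- height > 0.5 & height < 1   # both conditions hold

either
both
height[both]
```

Every value passes the | version, because any number is either above 0.5 or below 1, so selecting an interval needs &.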