I am new to R. I want to run a chi-square outlier test, using the outliers package, on a variable x$indel,
removing the detected outlier from the data and repeating
until the returned p.value is > 0.01.
Here is what I tried:
while (chisq.out.test(x$indel)$p.value < 0.01)
{
  # str: string contains the outlier value and some text
  # n: extract the outlier value and transform to numeric
  str <- chisq.out.test(x$indel)$alternative
  print(str)
  n <- as.numeric(unlist(regmatches(str,
         gregexpr("[[:digit:]]+\\.*[[:digit:]]*", str))))
  x <- x[x$indel < n, ]
  print(nrow(x))
}
Below is the x$indel column:
c(0.287749287749, 0.324786324786, 0.330484330484, 0.293447293447,
0.293447293447, 0.31339031339, 0.31339031339, 0.327635327635,
0.344729344729, 0.327635327635, 0.304843304843, 0.296296296296,
0.433048433048, 0.700854700855, 0.467236467236, 0.31339031339,
0.373219373219, 0.293447293447, 0.304843304843, 0.293447293447,
0.407407407407, 0.301994301994, 0.307692307692, 0.301994301994,
0.381766381766, 0.307692307692)
When I paste this code into the console, nothing happens. What's wrong?
Generate some data with "outliers"
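A minimal sketch of such data, assuming roughly normal values with two extreme points appended by hand. The seed, sample size, mean, and standard deviation are arbitrary choices for illustration, not values from the question.

```r
# Simulated data for illustration only (not the asker's data)
set.seed(42)  # arbitrary seed so the example is reproducible

# 30 "ordinary" values plus one unusually high and one unusually low value
x <- c(rnorm(30, mean = 0.3, sd = 0.03), 0.7, -0.1)

summary(x)
```

Including a negative value is deliberate: it exercises both the "lowest value" case and a minus sign in the reported outlier.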
Replaced all x$indel with x. The problem with using a data.frame is that when you remove values from the column and try to replace the original list, you'll get a complaint about dimension mismatch.

Also improved the regex to handle negative numbers, and improved the subset logic to deal with "highest value" and "lowest value" cases.
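
The changes described above might look roughly like the following sketch, which works on a plain numeric vector and uses the values from the question. It departs from the regex approach in one respect: rather than parsing the number out of the alternative text, it reads only whether the text reports the "highest value" or the "lowest value" and then drops the maximum or minimum directly, which avoids comparing against a rounded printed value. The helper name remove_outliers and its alpha argument are hypothetical, not part of the outliers package.

```r
library(outliers)  # provides chisq.out.test()

# Values of x$indel copied from the question, as a plain numeric vector
x <- c(0.287749287749, 0.324786324786, 0.330484330484, 0.293447293447,
       0.293447293447, 0.31339031339, 0.31339031339, 0.327635327635,
       0.344729344729, 0.327635327635, 0.304843304843, 0.296296296296,
       0.433048433048, 0.700854700855, 0.467236467236, 0.31339031339,
       0.373219373219, 0.293447293447, 0.304843304843, 0.293447293447,
       0.407407407407, 0.301994301994, 0.307692307692, 0.301994301994,
       0.381766381766, 0.307692307692)

# remove_outliers() is a hypothetical helper, not a function from the package
remove_outliers <- function(x, alpha = 0.01) {
  repeat {
    if (length(x) < 3) break                 # too few points left to test
    res <- chisq.out.test(x)
    # Stop once the test is no longer significant (also stops on NA/NaN)
    if (!isTRUE(res$p.value < alpha)) break
    print(res$alternative)
    # The alternative text starts with "highest value" or "lowest value"
    if (grepl("^highest", res$alternative)) {
      x <- x[-which.max(x)]                  # drop the largest value
    } else {
      x <- x[-which.min(x)]                  # drop the smallest value
    }
    print(length(x))
  }
  x
}

cleaned <- remove_outliers(x)
```

Note that which.max() and which.min() drop only the first occurrence of a tied extreme, so duplicated extremes are removed one iteration at a time. To apply the result back to a data.frame, one option is to keep the rows whose indel value is still present, for example x[x$indel %in% cleaned, , drop = FALSE].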