R chi-square outlier test in a while loop

I am new to R. I want to run a chi-square outlier test from the outliers library on a variable x$indel, removing the reported outlier from the data each time, until the returned p.value is > 0.01. Here is what I tried:

while(chisq.out.test(x$indel)$p.value < 0.01)
{
    # str: string contains the outlier value and some text 
    #   n: extract the outlier value and transform to numeric 
    str <- chisq.out.test(x$indel)$alternative
    print(str)

    n <- as.numeric(unlist(regmatches(str,
             gregexpr("[[:digit:]]+\\.*[[:digit:]]*",str))))
    x <- x[x$indel < n,]
    print(nrow(x))
}

Below is the x$indel column:

    c(0.287749287749, 0.324786324786, 0.330484330484, 0.293447293447,
      0.293447293447, 0.31339031339, 0.31339031339, 0.327635327635,
      0.344729344729, 0.327635327635, 0.304843304843, 0.296296296296,
      0.433048433048, 0.700854700855, 0.467236467236, 0.31339031339,
      0.373219373219, 0.293447293447, 0.304843304843, 0.293447293447,
      0.407407407407, 0.301994301994, 0.307692307692, 0.301994301994,
      0.381766381766, 0.307692307692)

When I paste this code into the console, nothing happens. What's wrong?

There is 1 answer

Answer by Vlo (BEST ANSWER)

Generate some data with "outliers":

x = round(rnorm(100, 100, 100), 2)

I replaced every x$indel with x, a plain numeric vector. The problem with using a data.frame is that when you remove values from one column and try to assign the shorter result back to that column, R complains about a dimension mismatch.
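To illustrate the dimension mismatch, here is a minimal base R sketch. The data frame `df`, its `id` column, and the values are made up for illustration and are not the asker's data. Assigning a shortened vector back into one column fails, while subsetting whole rows keeps the data frame consistent.

```r
# Hypothetical data frame, for illustration only
df <- data.frame(id = 1:5, indel = c(0.29, 0.31, 0.30, 0.70, 0.32))
n <- 0.70  # pretend this value was reported as the outlier

# Replacing one column with a shorter vector raises an error,
# so the assignment is wrapped in try() to keep the script running
res <- try(df$indel <- df$indel[df$indel != n], silent = TRUE)
print(inherits(res, "try-error"))

# Subsetting whole rows works and leaves the other columns aligned
df_clean <- df[df$indel != n, , drop = FALSE]
print(nrow(df_clean))
```

If the real data has to stay in a data frame, row subsetting of this kind is one way to keep the loop working without switching to a bare vector.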

I also improved the regex to handle negative numbers, and changed the subsetting so that it works for both the "highest value" and the "lowest value" cases.
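The difference between the two regexes can be checked in base R without running the test at all. In this sketch the message strings and numbers are made up, and `extract_outlier` is a hypothetical helper name. The strings only follow the "... value <number> is an outlier" shape that the look-around pattern expects.

```r
# Made-up messages in the "<highest|lowest> value <number> is an outlier" shape
msgs <- c("highest value 412.37 is an outlier",
          "lowest value -158.02 is an outlier")

# Hypothetical helper: keep the text between "value" and "is an outlier"
extract_outlier <- function(s) {
  m <- regmatches(s, regexpr("(?<=value)(.*)(?=is an outlier)", s, perl = TRUE))
  as.numeric(trimws(m))  # the match still carries surrounding spaces
}

vals <- vapply(msgs, extract_outlier, numeric(1), USE.NAMES = FALSE)
print(vals)

# The digit-only pattern from the question loses the minus sign
old <- as.numeric(unlist(regmatches(msgs[2],
         gregexpr("[[:digit:]]+\\.*[[:digit:]]*", msgs[2]))))
print(old)
```

The look-around version keeps the sign because it captures everything between the two fixed phrases instead of matching digits only.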

library(outliers)

while(chisq.out.test(x)$p.value < 0.01)
{
  # str: string of the form "<highest|lowest> value <number> is an outlier"
  #   n: the outlier value extracted from str, converted to numeric
  str <- chisq.out.test(x)$alternative
  print(str)
  n <- as.numeric(unlist(regmatches(str,
                                    gregexpr("(?<=value)(.*)(?=is an outlier)", str, perl = TRUE))))
  # drop every element equal to the reported outlier (works for high and low outliers)
  x <- x[x != n]
  print(length(x))
}
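An alternative that avoids parsing the message string is to remove the extreme row by position. The sketch below is not the outliers package: `extreme_pvalue` is a hypothetical base R stand-in that assumes the statistic is the squared distance of the most extreme value from the mean, divided by the sample variance, and compared against a chi-square distribution with 1 degree of freedom. Check that assumption against the package documentation before relying on it. The data are the values from the question.

```r
# Values from the question, kept in a data frame
df <- data.frame(indel = c(
  0.287749287749, 0.324786324786, 0.330484330484, 0.293447293447,
  0.293447293447, 0.31339031339, 0.31339031339, 0.327635327635,
  0.344729344729, 0.327635327635, 0.304843304843, 0.296296296296,
  0.433048433048, 0.700854700855, 0.467236467236, 0.31339031339,
  0.373219373219, 0.293447293447, 0.304843304843, 0.293447293447,
  0.407407407407, 0.301994301994, 0.307692307692, 0.301994301994,
  0.381766381766, 0.307692307692))

# Hypothetical stand-in for the package test (ASSUMED statistic, see above)
extreme_pvalue <- function(v) {
  i <- which.max(abs(v - mean(v)))          # position of the most extreme value
  chi <- (v[i] - mean(v))^2 / var(v)
  list(index = i, p.value = 1 - pchisq(chi, df = 1))
}

while (nrow(df) > 2) {
  res <- extreme_pvalue(df$indel)
  # stop when the most extreme value is no longer significant
  if (is.na(res$p.value) || res$p.value >= 0.01) break
  print(df$indel[res$index])
  df <- df[-res$index, , drop = FALSE]      # remove the row by position
}
print(nrow(df))
```

Removing by index sidesteps both the regex and the exact floating-point comparison in `x != n`, and it works on the data frame directly.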