What is an accurate way to remove outliers from a data set and make the data consistent using JavaScript?

Let's say we have the following set of data: 2.33, 2.19, 4.7, 2.69, 2.8, 2.12, 3.01, 2.5, 1.98, 2.34

How do I pick the consistent data from the above sample by eliminating the outliers using JavaScript or any other mathematical method which can be implemented in JavaScript?

My approach was to calculate the average, the standard deviation, a Min value (avg - std dev), and a Max value (avg + std dev), and then keep only the data that falls in the range between the Min and Max values.
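A minimal sketch of that approach, assuming the population standard deviation (dividing by n; the exact variant is not specified here). The helper names `mean` and `stdDev` are illustrative only.

```javascript
// Sample from the question
const data = [2.33, 2.19, 4.7, 2.69, 2.8, 2.12, 3.01, 2.5, 1.98, 2.34];

// Arithmetic mean
const mean = arr => arr.reduce((sum, x) => sum + x, 0) / arr.length;

// Population standard deviation (assumption: divides by n, not n - 1)
const stdDev = arr => {
    const m = mean(arr);
    return Math.sqrt(mean(arr.map(x => (x - m) ** 2)));
};

const avg = mean(data);
const sd = stdDev(data);
const min = avg - sd;
const max = avg + sd;

// Keep only the values inside [min, max]
const consistent = data.filter(x => x >= min && x <= max);
console.log(consistent);
```

For this sample the range is roughly 1.92 to 3.41, so only 4.7 falls outside it.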

Are there any better approaches we can follow to improve accuracy?

There is 1 answer

سعيد

I don't think your approach is sufficient, because the average and the standard deviation are themselves skewed by the extreme values you are trying to detect. You need to make sure a number is really extremely high or extremely low before deciding whether it's an outlier. To do this, we need to find Q1 and Q3 and calculate the IQR, which is Q3 - Q1.
Q1 and Q3 are quartiles (learn more: https://www.statisticshowto.com/what-are-quartiles/). IQR is the interquartile range (learn more: https://www.statisticshowto.com/probability-and-statistics/interquartile-range/).

With all of this, we can check for outliers, which are extremely low or extremely high values:
An extremely high value is any value that is greater than Q3 + (1.5 * IQR).
An extremely low value is any value that is lower than Q1 - (1.5 * IQR).

So, in code:

const dataSet = [2, 2.5, 2.25, 4, 1, -3, 10, 20];

// Return an ascending-sorted copy so the input array is not mutated
const asc = arr => [...arr].sort((a, b) => a - b);

const quartile = (arr, q) => {
    const sorted = asc(arr);
    const pos = (sorted.length - 1) * q;
    const base = Math.floor(pos);
    const rest = pos - base;
    if (sorted[base + 1] !== undefined) {
        return sorted[base] + rest * (sorted[base + 1] - sorted[base]);
    } else {
        return sorted[base];
    }
};

const Q1 = quartile(dataSet, .25);
const Q3 = quartile(dataSet, .75);
const IQR = Q3 - Q1;

// Keep every value that lies within [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]
const noneOutliers = [];
dataSet.forEach(number => {
    if (number > (Q3 + (1.5 * IQR)) || number < (Q1 - (1.5 * IQR))) {
        console.log(`${number} is an outlier`);
    } else {
        noneOutliers.push(number);
    }
});
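
The same steps can be wrapped in a reusable function and applied to the data from the question. This is only a sketch: `removeOutliers` and its `k` parameter are hypothetical names, and the quantile uses the same linear interpolation as the `quartile` function above.

```javascript
// Hypothetical helper: returns a new array without the IQR outliers.
// k is the fence multiplier (1.5 is the value used in this answer).
const removeOutliers = (data, k = 1.5) => {
    // Sort a copy so the caller's array keeps its original order
    const sorted = [...data].sort((a, b) => a - b);

    // Linear-interpolation quantile over the sorted copy
    const quantile = q => {
        const pos = (sorted.length - 1) * q;
        const base = Math.floor(pos);
        const rest = pos - base;
        const next = sorted[base + 1];
        return next !== undefined
            ? sorted[base] + rest * (next - sorted[base])
            : sorted[base];
    };

    const q1 = quantile(0.25);
    const q3 = quantile(0.75);
    const iqr = q3 - q1;
    const low = q1 - k * iqr;
    const high = q3 + k * iqr;

    // Keep values inside the fences, in their original order
    return data.filter(x => x >= low && x <= high);
};

const sample = [2.33, 2.19, 4.7, 2.69, 2.8, 2.12, 3.01, 2.5, 1.98, 2.34];
const cleaned = removeOutliers(sample);
console.log(cleaned);
```

For the sample in the question, Q1 is about 2.225 and Q3 about 2.7725, so the fences are roughly 1.40 and 3.59 and only 4.7 is dropped.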

The quartile function I used is from this answer: How to get median and quartiles/percentiles of an array in JavaScript (or PHP)?

For the method, you can check this video: https://www.youtube.com/watch?v=9aDHbRb4Bf8