Finding numbers greater than the average - why is my IF statement not working properly?

I am testing a program with different text files containing randomly generated numbers. The Java program is meant to add the numbers from the text file together, take their average, and then (using an IF statement) find the numbers from the text file that are greater than the average, put those values into an ArrayList, and print the average and the ArrayList as output. However, when I run my program with a different text file (I tested two; one worked and the other currently does not), the results printed in the shell are not correct: the majority of the values are greater than the average, but I get a few that are not, and by a margin of at most three.

Here is my code:

package homework.pkg23.average;

import java.util.Scanner;
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;

public class Homework23Average {

    public static void main(String[] args) {
        ArrayList exes = new ArrayList ();
        double x = 0;
        double y = 0;
        Scanner inputStream = null;

        try {
            inputStream = new Scanner (new File ("MyInput.txt"));
        }
        catch (FileNotFoundException e) {
            System.out.println ("File not found, program aborted:");
            System.exit (1);
        }
        int count = 0;
        while (inputStream.hasNextDouble ()) {
            count ++;
            x = inputStream.nextDouble ();
            y += x;
            if (x > y/count) // x values greater than the mean
                exes.add (x);
        }
        System.out.println ("The value(s) greater than the mean, " 
                            + y/count + ", are (is):");
        exes.forEach (System.out::println);
        inputStream.close ();
    }

}

When running this from the file, I get an average of 79.67, but my output looks like this:

The value(s) greater than the mean, 79.67, are (is):
128.0
93.0
143.0
111.0
95.0
116.0
136.0
129.0
141.0
78.0    <-- NOTICE: value is less than the average
93.0
105.0
90.0
90.0
144.0
116.0
136.0
138.0
75.0    <-- NOTICE: value is less than the average
80.0
126.0
75.0    <-- NOTICE: value is less than the average
80.0
98.0
114.0
116.0
86.0
78.0    <-- NOTICE: value is less than the average
123.0
145.0
103.0
111.0
91.0
134.0
119.0
91.0
121.0
113.0
129.0
91.0
116.0
85.0
85.0
126.0
145.0
98.0
115.0
83.0
127.0
119.0
97.0
125.0
121.0
123.0
86.0
108.0
100.0
134.0

I cannot figure out for the life of me why these values are slipping through. I tested this program on another text file containing fewer input values and everything worked fine. I am new to Java as this is my second program after the "Hello World" program, and I do not have extensive knowledge of Java syntax.

There are 3 answers

4
Eran On BEST ANSWER

When you decide whether to add a number to the list of values greater than the mean, you base the decision on the partial average of the numbers processed up to that point. That is why the output contains elements lower than the final average.

For example, suppose the first element is 1 and the second is 2. Then the average of the first two elements is 1.5, and since 2 > 1.5, you'll add it to your output list. However, if the next elements are larger than 2, the final average might be higher than 2, so your output will have a number lower than the final average.

Your code can err in the other direction too: a number near the beginning of the input may be mistakenly considered lower than the average even though it's higher than the final average.
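
To make both failure modes concrete, here is a small sketch that reproduces the question's comparison against the running mean. The class name, method name, and sample arrays are made up for illustration; they are not from the original program.

```java
import java.util.ArrayList;
import java.util.List;

public class RunningMeanDemo {

    // Same logic as the loop in the question: each value is compared with
    // the mean of the values read so far, not with the final mean.
    static List<Double> aboveRunningMean(double[] values) {
        List<Double> picked = new ArrayList<>();
        double sum = 0;
        int count = 0;
        for (double v : values) {
            count++;
            sum += v;
            if (v > sum / count) {
                picked.add(v);
            }
        }
        return picked;
    }

    public static void main(String[] args) {
        // Hypothetical input, final mean 6.6: the 2 is accepted because the
        // running mean at that point is only 1.5.
        System.out.println(aboveRunningMean(new double[] {1, 2, 10, 10, 10}));
        // prints [2.0, 10.0, 10.0, 10.0]

        // Hypothetical input, final mean 3.25: the 10 is rejected because
        // it is compared with a running mean of 10.
        System.out.println(aboveRunningMean(new double[] {10, 1, 1, 1}));
        // prints []
    }
}
```

Note that with this logic the first value can never be selected, since it is compared with a mean that equals itself.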

In order to get the correct output, you must have two iterations. The first would calculate the average and store all the input numbers, and the second would find the numbers higher than the average.
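
A minimal sketch of that two-pass approach, assuming the same MyInput.txt file as in the question. The class and helper names (AboveMean, mean, greaterThan) are illustrative, and the empty-file check is an addition that the original program does not have.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class AboveMean {

    // Arithmetic mean of a non-empty list.
    static double mean(List<Double> values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    // Second pass: keep only the values strictly greater than the mean.
    static List<Double> greaterThan(List<Double> values, double mean) {
        List<Double> result = new ArrayList<>();
        for (double v : values) {
            if (v > mean) {
                result.add(v);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // First pass: read and store every number from the file.
        List<Double> values = new ArrayList<>();
        try (Scanner in = new Scanner(new File("MyInput.txt"))) {
            while (in.hasNextDouble()) {
                values.add(in.nextDouble());
            }
        } catch (FileNotFoundException e) {
            System.out.println("File not found, program aborted:");
            System.exit(1);
        }
        if (values.isEmpty()) {
            System.out.println("No numbers found in the file.");
            return;
        }

        // The mean is only known once all values have been read.
        double mean = mean(values);
        System.out.println("The value(s) greater than the mean, "
                           + mean + ", are (is):");
        greaterThan(values, mean).forEach(System.out::println);
    }
}
```

Storing the numbers in a list avoids reading the file twice; the second pass runs over the list in memory.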

1
user2357112 On

You're comparing values to the running average, not the final average. The comparison for the nth item doesn't take into account the (n+1)th item and beyond.

0
chiastic-security On

If you want this to work, then you have to go through the inputs twice, once to calculate the mean and once to work out which ones exceed it.

When you read in the first value, you have no idea what the mean is, so you can't decide whether it's greater than it!

The key point is that the mean is a property of the inputs as a whole, so until you've considered them all, you can't know the mean.