Using WEKA Filters in Java - Oversampling and Undersampling

I'm having trouble figuring out how to use WEKA filters in Java code. The help I've found seems a little dated, and I'm using WEKA 3.8.5. I'm running 3 tests. Test 1: no filter; Test 2: weka.filters.supervised.instance.SpreadSubsample -M 1.0; and Test 3: weka.filters.supervised.instance.Resample -B 1.0 -Z 130.3.

If my research is correct, I should import the filters like this. What I can't work out is how to pass the options "-M 1.0" to SpreadSubsample (my undersampling test) and "-B 1.0 -Z 130.3" to Resample (my oversampling test).

import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.supervised.instance.Resample; 
import weka.filters.supervised.instance.SpreadSubsample;
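As a rough guide to what these two option sets ask for, here is a small pure-Java sketch of the target sizes (no Weka needed). It assumes SpreadSubsample caps every class at (smallest class count × M), and that Resample draws ⌊N × Z / 100⌋ instances, picking the class uniformly for each draw when B = 1.0. The real filters sample randomly, so the Resample per-class figure is only an expectation. The class counts in `main` are hypothetical; the 284,807 total is the instance count reported in the Test 1 output.

```java
import java.util.Arrays;

/**
 * Back-of-the-envelope sizes for the two filter configurations.
 * Assumed behaviour (not taken from the Weka source):
 *  - SpreadSubsample -M m keeps at most (smallest class count * m) per class;
 *  - Resample -Z z draws floor(N * z / 100) instances in total;
 *  - Resample -B 1.0 picks the class uniformly for each draw.
 */
public class SamplingSizes {

    /** Per-class counts after SpreadSubsample with spread m (m > 0; M = 0 means "no limit" and is not modelled). */
    static long[] spreadSubsampleCounts(long[] classCounts, double m) {
        long min = Arrays.stream(classCounts).min().orElse(0);
        long cap = (long) (min * m);
        return Arrays.stream(classCounts).map(c -> Math.min(c, cap)).toArray();
    }

    /** Total output size after Resample with -Z zPercent. */
    static long resampleSize(long total, double zPercent) {
        return (long) (total * zPercent / 100.0);
    }

    /** Expected per-class count after Resample with -B 1.0 (uniform class choice). */
    static double expectedPerClassUniform(long sampleSize, int numClasses) {
        return (double) sampleSize / numClasses;
    }

    public static void main(String[] args) {
        // Hypothetical two-class counts: {majority, minority}
        long[] counts = {9_900, 100};
        System.out.println("SpreadSubsample -M 1.0: "
                + Arrays.toString(spreadSubsampleCounts(counts, 1.0)));

        // Total taken from "Total Number of Instances" in the Test 1 output
        long total = 284_807;
        long resampled = resampleSize(total, 130.3);
        System.out.println("Resample -Z 130.3 total: " + resampled);
        System.out.println("Resample -B 1.0 expected per class: "
                + expectedPerClassUniform(resampled, 2));
    }
}
```

In other words, Test 2 shrinks the majority class down to the size of the minority class, while Test 3 grows the dataset by 30.3% and, on average, splits it evenly between the classes by drawing minority instances with replacement.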

And here is Test 1 (my no-filter test):

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;

public class Fraud {
    public static void main(String[] args) {
        try {
            // Creating the J48 decision-tree classifier
            J48 j48Classifier = new J48();

            // Setting the path for the dataset
            String fraudDataset = "C:\\Users\\Owner\\Desktop\\CreditCard\\CreditCard.arff";

            // Reading the ARFF file; try-with-resources closes the reader afterwards
            try (BufferedReader bufferedReader = new BufferedReader(new FileReader(fraudDataset))) {
                // Creating the dataset instances
                Instances datasetInstances = new Instances(bufferedReader);

                // The class attribute is the last attribute
                datasetInstances.setClassIndex(datasetInstances.numAttributes() - 1);

                Evaluation evaluation = new Evaluation(datasetInstances);

                // Cross-validate the model with 10 folds
                evaluation.crossValidateModel(j48Classifier, datasetInstances, 10, new Random(1));
                System.out.println(evaluation.toSummaryString("\nResults", false));
            }
        } catch (Exception e) {
            // Catching exceptions
            System.out.println("Error occurred!!!! \n" + e.getMessage());
        }

        System.out.print("DT Successfully executed.");
    }
}

The results of my code are:
Results
Correctly Classified Instances      284649               99.9445 %
Incorrectly Classified Instances       158                0.0555 %
Kappa statistic                          0.8257
Mean absolute error                      0.0008
Root mean squared error                  0.0232
Relative absolute error                 24.2995 %
Root relative squared error             55.9107 %
Total Number of Instances           284807     

DT Successfully executed.

Does anyone know how I can add the filters, with the settings I want, to the code for Tests 2 and 3? Any help will be appreciated. I will run the 3 tests multiple times and compare the results to see which of the 3 works best.

Answer by fracpete:

-M 1.0 and -B 1.0 -Z 130.3 are the options that you supply to the filters from the command-line. These filters implement the weka.core.OptionHandler interface, which offers the setOptions and getOptions methods.
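Because both filters are OptionHandlers, one small helper can apply any command-line style option string and then read the effective configuration back. This is a minimal sketch; the class name `OptionHandlerDemo` and the `configure` helper are hypothetical, while `setOptions`, `getOptions`, `Utils.splitOptions` and `Utils.joinOptions` are the Weka calls described above.

```java
import weka.core.OptionHandler;
import weka.core.Utils;
import weka.filters.supervised.instance.Resample;
import weka.filters.supervised.instance.SpreadSubsample;

public class OptionHandlerDemo {

    /** Hypothetical helper: applies an option string to any OptionHandler. */
    static <T extends OptionHandler> T configure(T handler, String options) throws Exception {
        handler.setOptions(Utils.splitOptions(options));
        return handler;
    }

    public static void main(String[] args) throws Exception {
        SpreadSubsample spread = configure(new SpreadSubsample(), "-M 1.0");
        Resample resample = configure(new Resample(), "-B 1.0 -Z 130.3");

        // getOptions() returns the full configuration, including defaults
        // that were not set explicitly (such as the random seed)
        System.out.println("SpreadSubsample " + Utils.joinOptions(spread.getOptions()));
        System.out.println("Resample " + Utils.joinOptions(resample.getOptions()));
    }
}
```

Printing `getOptions()` this way is a cheap check that the option string was parsed as intended before running a long cross-validation.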

For example, SpreadSubsample can be instantiated and configured like this:

import weka.filters.supervised.instance.SpreadSubsample;
import weka.core.Utils;
...
SpreadSubsample spread = new SpreadSubsample();
// Utils.splitOptions generates an array from an option string
spread.setOptions(Utils.splitOptions("-M 1.0"));
// alternatively:
// spread.setOptions(new String[]{"-M", "1.0"});
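If you prefer compile-checked code over option strings, each flag also has a matching bean property: SpreadSubsample's -M corresponds to `setDistributionSpread`, and Resample's -B and -Z to `setBiasToUniformClass` and `setSampleSizePercent`. A sketch of both test configurations built that way (the class and method names are hypothetical):

```java
import weka.filters.supervised.instance.Resample;
import weka.filters.supervised.instance.SpreadSubsample;

public class FilterSetters {

    /** Test 2: undersample so that all classes end up the same size (-M 1.0). */
    static SpreadSubsample undersampler() {
        SpreadSubsample spread = new SpreadSubsample();
        spread.setDistributionSpread(1.0);
        return spread;
    }

    /** Test 3: oversample towards a uniform class distribution (-B 1.0 -Z 130.3). */
    static Resample oversampler() {
        Resample resample = new Resample();
        resample.setBiasToUniformClass(1.0);
        resample.setSampleSizePercent(130.3);
        return resample;
    }

    public static void main(String[] args) {
        System.out.println("SpreadSubsample spread = " + undersampler().getDistributionSpread());
        Resample r = oversampler();
        System.out.println("Resample bias = " + r.getBiasToUniformClass()
                + ", size % = " + r.getSampleSizePercent());
    }
}
```

Both styles produce the same filter state; setters catch a misspelled property at compile time, whereas a wrong flag in an option string only fails at runtime.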

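In order to apply the filters, you should use the FilteredClassifier approach: the filter is then built only on the training data of each cross-validation fold, and test instances pass through it unchanged, so every fold is still evaluated on the original class distribution. E.g., for SpreadSubsample you would do something like this: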

import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.trees.J48;
import weka.filters.supervised.instance.SpreadSubsample;
import weka.core.Utils;
...
// base classifier
J48 j48 = new J48();
// filter
SpreadSubsample spread = new SpreadSubsample();
spread.setOptions(Utils.splitOptions("-M 1.0"));
// meta-classifier
FilteredClassifier fc = new FilteredClassifier();
fc.setFilter(spread);
fc.setClassifier(j48);

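Then evaluate the fc object on your dataset, e.g. by passing it to evaluation.crossValidateModel in place of j48Classifier. For Test 3, do the same with a Resample filter configured with "-B 1.0 -Z 130.3".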
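Putting the pieces together, all three tests could be run side by side roughly as follows. This is a sketch under a few assumptions: the ARFF path is passed as the first command-line argument, the data is loaded with `ConverterUtils.DataSource` instead of the BufferedReader used in Test 1, and the hypothetical `runs` variable controls how many cross-validation seeds are tried, since the question mentions repeating the tests. Per-class statistics and the confusion matrix are printed too, because on heavily imbalanced data the overall accuracy can be dominated by the majority class.

```java
import java.util.Random;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.Utils;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.supervised.instance.Resample;
import weka.filters.supervised.instance.SpreadSubsample;

public class FraudComparison {

    /** Wraps a fresh J48 in a FilteredClassifier so the filter only sees training folds. */
    static Classifier withFilter(Filter filter) {
        FilteredClassifier fc = new FilteredClassifier();
        fc.setFilter(filter);
        fc.setClassifier(new J48());
        return fc;
    }

    /** Test 1: plain J48; Test 2: SpreadSubsample -M 1.0; Test 3: Resample -B 1.0 -Z 130.3. */
    static Classifier[] buildTests() throws Exception {
        SpreadSubsample spread = new SpreadSubsample();
        spread.setOptions(Utils.splitOptions("-M 1.0"));
        Resample resample = new Resample();
        resample.setOptions(Utils.splitOptions("-B 1.0 -Z 130.3"));
        return new Classifier[] {new J48(), withFilter(spread), withFilter(resample)};
    }

    public static void main(String[] args) throws Exception {
        // Assumption: args[0] is the path to the ARFF file
        Instances data = DataSource.read(args[0]);
        data.setClassIndex(data.numAttributes() - 1);

        String[] names = {
            "Test 1: no filter",
            "Test 2: SpreadSubsample -M 1.0",
            "Test 3: Resample -B 1.0 -Z 130.3"
        };
        int runs = 3; // hypothetical number of repetitions (one CV seed per run)
        Classifier[] tests = buildTests();

        for (int t = 0; t < tests.length; t++) {
            for (int seed = 1; seed <= runs; seed++) {
                Evaluation eval = new Evaluation(data);
                // crossValidateModel trains a copy of the classifier on each fold
                eval.crossValidateModel(tests[t], data, 10, new Random(seed));
                System.out.println(eval.toSummaryString("\n" + names[t] + " (seed " + seed + ")", false));
                System.out.println(eval.toClassDetailsString());
                System.out.println(eval.toMatrixString());
            }
        }
    }
}
```

Changing the `Random` seed varies the fold assignment between runs; the filters keep their own default seed (-S) unless you set it as well.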