I'm using Weka to develop a classifier for detecting semantic relations. Let's suppose I have a multiclass dataset. Initially, the dataset contains 4 numeric features (possibly more) and a class attribute whose valid values are "HYPERNYM", "SYNONYM" or "NO" (abbreviated HYPER and SYN below), i.e., three classes. Example instances could be:
feat1 feat2 feat3 feat4 class
....
0.32 0.45 0.15 5 NO
0.26 0.48 0.93 20 HYPER
0.65 0.32 0.43 13 NO
0.43 0.19 0.89 45 SYN
...
This is a typical classification problem. However, the dataset suffers from class imbalance (the number of instances of one class, e.g. the positive class, is far smaller than the number of instances of another class, e.g. the negative class) and class overlap (instances of different classes have very similar feature values).
My question is: how can I represent each instance in a 2D plot so that I can visualize the degree of overlap between classes?
I found a picture illustrating the kind of plot I have in mind, something like a scatter plot, but I don't know how to produce it.
Is there an easy way to make a similar figure in R or with Weka?
You can use Multidimensional Scaling (MDS) to first reduce the dimensionality of your data and then plot it. MDS tries to preserve the pairwise distances between points when projecting them into a lower-dimensional space.
Here is an example in R for the iris dataset
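A minimal sketch of that approach, assuming classical (metric) MDS with base R's `cmdscale()` on Euclidean distances. The column range `1:4` and the `Species` column are specific to `iris`. Standardizing with `scale()` is an added step: in the question's data, `feat4` ranges far wider than the other features and would otherwise dominate the distances. Semi-transparent points (via `adjustcolor()`) make regions where classes overlap easier to see.

```r
# Classical MDS of the four numeric iris features, projected to 2D.
feats <- scale(iris[, 1:4])        # standardize so no single feature dominates distances
d <- dist(feats)                   # Euclidean distance matrix between instances
fit <- cmdscale(d, k = 2)          # 150 x 2 matrix of MDS coordinates

# Semi-transparent colours: overlapping classes show up as blended regions.
cols <- adjustcolor(c("red", "darkgreen", "blue"), alpha.f = 0.5)
plot(fit[, 1], fit[, 2],
     col = cols[as.integer(iris$Species)], pch = 19,
     xlab = "MDS dimension 1", ylab = "MDS dimension 2",
     main = "Classical MDS of iris")
legend("topright", legend = levels(iris$Species), col = cols, pch = 19)
```

For the Weka data, the same steps should apply after loading the ARFF file, e.g. with `foreign::read.arff()`, and colouring by the class column instead of `Species`.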
Alternatively, you could reduce it to 3 dimensions and plot it with the scatterplot3d package.
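A sketch of the 3D variant, again assuming classical MDS via `cmdscale()`, now with `k = 3`. `scatterplot3d` is a CRAN package that may not be installed, so the code only calls it when available and otherwise falls back to a `pairs()` plot of the three MDS dimensions.

```r
# Classical MDS to three dimensions.
feats <- scale(iris[, 1:4])
fit3 <- cmdscale(dist(feats), k = 3)   # 150 x 3 coordinate matrix

# One colour per instance, chosen by its class.
cols <- c("red", "darkgreen", "blue")[as.integer(iris$Species)]

if (requireNamespace("scatterplot3d", quietly = TRUE)) {
  # 3D scatter plot of the MDS coordinates.
  scatterplot3d::scatterplot3d(fit3[, 1], fit3[, 2], fit3[, 3],
                               color = cols, pch = 19,
                               xlab = "Dim 1", ylab = "Dim 2", zlab = "Dim 3")
} else {
  # Fallback: all pairwise 2D views of the three MDS dimensions.
  pairs(fit3, col = cols, pch = 19, labels = c("Dim 1", "Dim 2", "Dim 3"))
}
```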
As for the class imbalance problem, I'm not sure how you would like to represent it in the scatter plot. One option is to increase the size of the points from the minority classes.
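One way to sketch that suggestion, assuming point size is scaled so that point area is inversely proportional to class frequency (`cex` grows with the square root of the majority-to-class ratio). Because `iris` is balanced, the example builds an artificially imbalanced subset (50 setosa, 10 versicolor, 15 virginica) purely for illustration; the legend also reports each class count.

```r
# Artificially imbalanced subset of iris (illustration only).
sub <- iris[c(1:50, 51:60, 101:115), ]

fit <- cmdscale(dist(scale(sub[, 1:4])), k = 2)

# Majority class gets cex = 1; rarer classes get proportionally larger points,
# with area (not radius) inversely proportional to class frequency.
freq <- table(sub$Species)
size_by_class <- sqrt(as.numeric(max(freq) / freq))
names(size_by_class) <- names(freq)
point_size <- size_by_class[as.character(sub$Species)]

cols <- adjustcolor(c("red", "darkgreen", "blue"), alpha.f = 0.5)
plot(fit, col = cols[as.integer(sub$Species)], pch = 19, cex = point_size,
     xlab = "MDS dimension 1", ylab = "MDS dimension 2")
legend("topright", legend = paste0(names(freq), " (n=", freq, ")"),
       col = cols, pch = 19)
```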