What should I do with unknown data while creating Weka ARFF files?


I am trying to format my dataset as a Weka ARFF file. This is a sample of my ARFF file:

@relation my_relation
@attribute 'attrib_1' numeric
@attribute 'attrib_2' numeric
@attribute 'attrib_3' numeric
...
@attribute 'class' {1,2,3,4,5}
@data
6,6,55,0,0,0,18.9,0,1,2,'?',14,15,20,'?','?','?','?',28,29,1
54,25,19,4.85,0,1,10,13,'?','?','?','?','?','?',15,16,19,20,21,0,3
...

My features are numeric (real-valued), but some values are missing for each feature in different instances. How should I indicate that a feature value is missing? I used '?' for missing values, but this error occurs when I try to open mydata.arff:

number expected, read token[?], line 746

Edit: I changed the '?' to ? and tried to load the file. This time the following error occurs:

nominal value not declared in header, read Token[86], line 746

There is 1 answer

G5W (best answer)

This is too long to fit into a comment. I think I can see a likely problem with your data: it contains some bad characters. You are probably reading this in a web browser. If so, view the HTML source for this page and scroll down to your data. In Internet Explorer, I was able to save this web page as a text file and then look at the text in an editor to see the bad characters. In many places throughout the data, I see &zwnj;&#8203; in the text. These are zero-width characters (see zwnj, the zero-width non-joiner U+200C, and 8203, the zero-width space U+200B). That is, they are present in the data but do not show up on the screen, not even as blank space. Because your data contains these spurious characters, WEKA cannot read it. Please check whether your original file contains these hidden characters.
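
If the original file does contain them, one way to check and clean it is a small script. The following is only a sketch in Python, not something taken from the question: the function names find_zero_width and strip_zero_width are made up for illustration, the set of characters it looks for is an assumption (the two mentioned above plus two other common zero-width code points), and it assumes the file is UTF-8 encoded.

```python
import sys

# Zero-width characters that commonly sneak in through copy-paste from web pages.
# This list is an assumption; extend it if your file contains others.
ZERO_WIDTH = {
    "\u200b": "ZERO WIDTH SPACE",
    "\u200c": "ZERO WIDTH NON-JOINER",
    "\u200d": "ZERO WIDTH JOINER",
    "\ufeff": "ZERO WIDTH NO-BREAK SPACE (BOM)",
}


def find_zero_width(text):
    """Return a list of (line_number, column, name), both 1-based, for each hit."""
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in ZERO_WIDTH:
                hits.append((line_no, col, ZERO_WIDTH[ch]))
    return hits


def strip_zero_width(text):
    """Return a copy of text with every character in ZERO_WIDTH removed."""
    return text.translate({ord(ch): None for ch in ZERO_WIDTH})


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical file names): python clean_arff.py mydata.arff mydata_clean.arff
    src, dst = sys.argv[1], sys.argv[2]
    with open(src, encoding="utf-8") as f:
        raw = f.read()
    for line_no, col, name in find_zero_width(raw):
        print(f"line {line_no}, column {col}: {name}")
    with open(dst, "w", encoding="utf-8") as f:
        f.write(strip_zero_width(raw))
```

The script reports where the hidden characters sit before writing a cleaned copy, so you can compare the reported line numbers with the line number in WEKA's error message. It writes to a separate output file and leaves the original untouched.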