ARFF file extension to csv binary executable


Thanks in advance for the help.

I'm looking for a binary executable to convert an .arff into a .csv in a bash script. Ideally something that I could run along the lines of

#! /bin/sh
... some stuff....
conversionFunc input.arff output.csv
... some more stuff ...

While looking into writing this myself, I found that Weka provides a library that would let me do this. However, I have not been able to locate it: I have Weka installed on my Mac, and even after searching the installation I could not find the library.

Does anyone know where I can find such an executable, or where I can get hold of the Weka Java library that would let me write it myself?


There are 2 answers

1
knb On BEST ANSWER

Clone this GitHub repository. It contains an arff2csv tool in the "tools" subdirectory.

arff2csv is designed to run in pipelines of Unix command-line tools.

https://github.com/jeroenjanssens/data-science-at-the-command-line

arff2csv is a one-line shell script that calls another shell script, which in turn calls weka.jar, so it needs Java installed on your machine. Note that arff2csv needs Weka version 3.6; in my experiments, the newer v3.7 did not work.

The script expects this environment variable to be set:

export WEKAPATH=/path/to/wekajar-dirname

and then you can do

cat /opt/smallapps/weka-stable/data/breast-cancer.arff | arff2csv > breast-cancer.arff.csv
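To get the file-to-file form asked for in the question, the pipe can be wrapped in a small shell function. This is only a sketch: the name arff_file_to_csv is made up for illustration, and it assumes arff2csv is on your PATH and WEKAPATH is exported as described above.

```shell
#!/bin/sh
# Hypothetical helper: file-to-file wrapper around the pipe-based arff2csv tool.
# Assumes arff2csv is on PATH and WEKAPATH is already exported.
arff_file_to_csv() {
    if [ "$#" -ne 2 ]; then
        echo "usage: arff_file_to_csv input.arff output.csv" >&2
        return 2
    fi
    # arff2csv reads ARFF on stdin and writes CSV on stdout,
    # so plain redirection is enough (no cat needed).
    arff2csv < "$1" > "$2"
}

# Example call inside a larger script:
#   arff_file_to_csv input.arff output.csv
```

The function returns whatever exit status arff2csv produces, so the calling script can check it with `|| exit 1` if the conversion must succeed.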

Large ARFF files take some time to process.

You can read Jeroen Janssens's book (see the repository README) for a bit more info.
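Since arff2csv ultimately just runs a converter from weka.jar, the same conversion can probably be done without the repository's scripts. The sketch below assumes weka.jar sits in the directory named by WEKAPATH and that Weka's weka.core.converters.CSVSaver class accepts -i and -o for the input and output files; check this against your Weka version. The function name arff_to_csv is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: call Weka's CSV converter directly, without arff2csv.
# Assumption: WEKAPATH is the directory that contains weka.jar.
arff_to_csv() {
    input=$1
    output=$2
    # ${VAR:?msg} aborts with the message if WEKAPATH is unset or empty.
    java -cp "${WEKAPATH:?set WEKAPATH to the directory containing weka.jar}/weka.jar" \
        weka.core.converters.CSVSaver -i "$input" -o "$output"
}

# Example call inside a larger script:
#   export WEKAPATH=/path/to/wekajar-dirname
#   arff_to_csv input.arff output.csv
```

This also answers the "where is the library" part of the question: the library is weka.jar itself, which ships inside the Weka installation directory.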

0
reynoldsnlp On

Try a web search for arff2csv. It looks like there are lots of utilities out there.