I am currently trying to import some comma-delimited text data into an array using the numpy library in Python. I am using the following code:
data = np.genfromtxt(fname, delimiter=',')
I get the following error:
Line #2 (got 12 columns instead of 11)
for every line after the header.
The reason for this appears to be that one of the columns contains a comma, which the file handles by wrapping that column's data in text qualifiers ("). If I use the Python csv library, this is handled by default, e.g.:
reader = csv.reader(open(fname, newline=''))
I know that I could import the data using the csv library and then convert it to an array, but I wondered if it is possible to do this from one of numpy's functions that convert text data to an array, such as genfromtxt. I have checked out the help on genfromtxt, but none of the arguments listed appear to describe what I was looking for, unless I am missing something.
In case it helps here is a sample of a few lines from the file:
survived,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked
0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
It is the name column that I assume is causing the issue.
Numpy arrays are not well-suited for categorical data like you have here. You may be better off using pandas:
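A minimal sketch of that approach, assuming pandas is installed. The sample rows from the question are inlined through io.StringIO in place of the real file, and the selection of numeric columns at the end is only an illustration of getting back to a NumPy array:

```python
import io

import pandas as pd

# Sample rows copied from the question; io.StringIO stands in for the real file.
sample = io.StringIO(
    'survived,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked\n'
    '0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S\n'
    '1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C\n'
    '1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S\n'
)

# read_csv treats double quotes as text qualifiers by default (quotechar='"'),
# so the comma inside the name column does not split the field.
df = pd.read_csv(sample)

# If a plain NumPy array is still needed, select the numeric columns
# (an illustrative choice) and convert them to floats.
numeric_columns = ["survived", "pclass", "age", "sibsp", "parch", "fare"]
numeric = df[numeric_columns].to_numpy(dtype=float)

print(df["name"].tolist())
print(numeric.shape)
```

With a real file, passing the path (fname in the question) to pd.read_csv in place of the StringIO object should behave the same way.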