How do you allow for text qualifiers using numpy genfromtxt

Question

1k views Asked by ChrisProsser At 09 December 2013 at 12:16

I am currently trying to import some comma delimited text data into an array using the numpy library in Python. I am using the following code:

data = np.genfromtxt(fname, delimiter=',')

I get the following error:

Line #2 (got 12 columns instead of 11)

for every line after the header.

The reason appears to be that one of the columns contains a comma, which the file handles by wrapping that column's data in text qualifiers ("). With Python's csv library this is handled by default, e.g.:

import csv
reader = csv.reader(open(fname, newline=''))

I know that I could import the data using the csv library and then convert it to an array, but I wondered whether this is possible with one of numpy's functions that convert text data to an array, such as genfromtxt. I have checked the help on genfromtxt, but none of the arguments listed appears to do what I am looking for, unless I am missing something.

In case it helps here is a sample of a few lines from the file:

survived,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked
0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S

It is the name column that I assume is causing the issue.
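
For comparison, the csv-then-convert route mentioned above could look something like the following. This is only a sketch: the sample rows are inlined through io.StringIO instead of being read from fname, and every field is kept as a string until a column is explicitly cast.

```python
import csv
import io

import numpy as np

# Sample rows inlined for illustration; a real script would open the file instead.
sample = '''survived,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked
0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
'''

# csv.reader treats the double quote as a text qualifier by default,
# so the comma inside the name field does not split the column.
rows = list(csv.reader(io.StringIO(sample)))
header, records = rows[0], rows[1:]

# All fields arrive as strings, so this is a 2-D array of strings.
data = np.array(records)

# Cast individual columns once they are known to be numeric.
ages = data[:, header.index("age")].astype(float)
```

The drawback is that the result is a plain string array, so each numeric column needs its own cast.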

There are 2 answers

Lee On 09 December 2013 at 14:11

One way around this is to add another field name to the header, so that you have twelve field names, with separate surname and forename columns:

survived,pclass,surname,forename,sex,age,sibsp,parch,ticket,fare,cabin,embarked

If you then import like so:

data = np.genfromtxt(fname, delimiter=',',names=True,dtype=None)

It should work:

data['surname']
array(['"Braund', '"Cumings', '"Heikkinen'], 
      dtype='|S10')

Note that you may also want to strip out the " marks in the original file.
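
As a rough sketch of the whole workaround, here it is with the sample rows inlined via io.StringIO rather than read from fname. The encoding="utf-8" argument is an assumption for recent NumPy versions, so that text columns come back as str, and the quote marks are stripped after loading instead of in the file. Bear in mind that this only works while every name contains exactly one comma.

```python
import io

import numpy as np

# Header edited by hand: "name" is split into "surname" and "forename".
sample = '''survived,pclass,surname,forename,sex,age,sibsp,parch,ticket,fare,cabin,embarked
0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
'''

# genfromtxt splits on every comma, so each data row now has twelve
# fields, matching the twelve names in the edited header.
data = np.genfromtxt(
    io.StringIO(sample),
    delimiter=",",
    names=True,
    dtype=None,
    encoding="utf-8",
)

# The quote marks are still attached to the two halves of the name;
# strip them (and the space that followed the comma) after loading.
surnames = np.char.strip(data["surname"], '"')
forenames = np.char.strip(data["forename"], '" ')
```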

chthonicdaemon On 09 December 2013 at 12:24 BEST ANSWER

Numpy arrays are not well-suited for categorical data like you have here. You may be better off using pandas:

import pandas
data = pandas.read_csv(fname)
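
A slightly fuller sketch of the same idea, again with the sample rows inlined via io.StringIO instead of fname. read_csv treats " as the quote character by default, so the name column stays in one piece. Calling to_numpy() on a column is one way to get a NumPy array back where one is needed.

```python
import io

import pandas

# Sample rows inlined for illustration; normally this would be read_csv(fname).
sample = '''survived,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked
0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
'''

# Quoted fields are parsed as single values, commas included.
data = pandas.read_csv(io.StringIO(sample))

# Column types are inferred per column; numeric columns convert
# straight to NumPy arrays.
ages = data["age"].to_numpy()
first_name = data.loc[0, "name"]
```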
