I have a csv-file containing a lot of data that I want to read as a masked array. I've done so using the following:
data=np.recfromcsv(filename,case_sensitive=True,usemask=True)
which works just fine. However, my problem is that the data are either strings, integers, or floats. What I want to do now is convert all the integers into floats, i.e. turn all the "1"s into "1.0"s etc. while preserving everything else.
Additionally, I am looking for a generic solution. So simply specifying the desired types manually won't do since the csv-file (including the number of columns) changes.
I've tried astype, but since the array also has string entries that won't work. Or am I missing something?
Thanks.
I haven't used recfromcsv, but looking at its code I see it uses np.genfromtxt, followed by a masked records construction.
I'd suggest giving a small sample csv text (3 or so lines), and showing the resulting data. We need to see the dtype in particular.
It may also be useful to start with genfromtxt, skipping the masked array stuff for now. I don't think that's where the sticky point is in converting dtypes in structured arrays.
In any case, we need something more concrete to explore.
You can't change the dtype of structured fields in-place. You have to make a new array with a new dtype, and copy values from the old to the new.
numpy.lib.recfunctions has some functions that can help in changing structured arrays.
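As a rough illustration of the "new array, copy values" route, here is a minimal sketch. The sample array and its field names (name, count, value) are made up for the example, and it assumes a flat structured dtype with no nested fields.

```python
import numpy as np

# Made-up sample array: one string field, one integer field, one float field.
old = np.array(
    [("one", 1, 2.5), ("two", 2, 3.5)],
    dtype=[("name", "U3"), ("count", "i8"), ("value", "f8")],
)

# Build the new dtype: integer fields (kind 'i' or 'u') become float64,
# every other field keeps its original dtype.
new_dt = np.dtype(
    [
        (name, "f8" if old.dtype[name].kind in "iu" else old.dtype[name])
        for name in old.dtype.names
    ]
)

# Allocate an array with the new dtype and copy the values field by field.
new = np.zeros(old.shape, dtype=new_dt)
for name in old.dtype.names:
    new[name] = old[name]

print(new.dtype)
```

Copying field by field keeps the string columns untouched and only casts the integer columns.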
===========
I suspect that it will be simpler to spell out the dtypes when calling genfromtxt than to change dtypes in an existing array.
You could try one read with dtype=None and a limited number of lines to get the column count and base dtype. Then edit that, substituting floats for ints as needed. Now read the whole file with the new dtype. Look in the recfunctions code if you need ideas on how to edit dtypes.
For example:
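A first read along these lines might look like the sketch below. The sample text and column names are invented stand-ins for the real csv file, and max_rows limits the read to the first couple of data rows.

```python
import numpy as np

# Invented sample data standing in for the real csv file.
txt = """name,count,value,flag
one,1,2.5,0
two,2,3.5,1
six,3,4.5,0"""

# Let genfromtxt infer a dtype per column from the first two data rows.
arr = np.genfromtxt(
    txt.splitlines(),
    delimiter=",",
    names=True,      # take field names from the header line
    dtype=None,      # infer each column's dtype
    encoding=None,   # read strings as str rather than bytes
    max_rows=2,      # only look at a limited number of rows
)

print(arr.dtype)
```

The printed dtype shows which columns came out as strings, integers, or floats; the exact integer width depends on the platform.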
A crude dtype editor:
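Something like the following would do as a sketch. The function name ints_to_floats is my own, and it assumes the descr list holds plain (name, typestring) pairs, which is what a flat structured dtype with simple fields normally gives.

```python
import numpy as np

def ints_to_floats(descr):
    """Return a copy of a dtype.descr list with integer fields changed to float64.

    Assumes every entry is a plain (name, typestring) pair.
    """
    new_descr = []
    for name, typestr in descr:
        # Kind 'i' is signed integer, 'u' is unsigned integer.
        if np.dtype(typestr).kind in "iu":
            typestr = "<f8"
        new_descr.append((name, typestr))
    return new_descr

print(ints_to_floats([("name", "<U3"), ("count", "<i8"), ("value", "<f8")]))
```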
And applying this to default dtype:
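A sketch of the two-pass read, again with invented sample data. One caveat: string widths are inferred from the rows actually read, so a probe over only a few rows can produce string fields too narrow for later rows. To sidestep that, this sketch probes the whole sample.

```python
import numpy as np

# Invented sample data standing in for the real csv file.
txt = """name,count,value,flag
one,1,2.5,0
two,2,3.5,1
three,3,4.5,0"""
lines = txt.splitlines()

# First pass: let genfromtxt infer the dtype of each column.
probe = np.genfromtxt(lines, delimiter=",", names=True,
                      dtype=None, encoding=None)

# Edit the inferred dtype: integer fields become float64, the rest stay.
dt = [
    (name, "<f8" if np.dtype(typestr).kind in "iu" else typestr)
    for name, typestr in probe.dtype.descr
]

# Second pass: read the data again with the edited dtype.
data = np.genfromtxt(lines, delimiter=",", names=True,
                     dtype=dt, encoding=None)

print(data.dtype)
print(data)
```

Nothing here depends on the number of columns, so the same steps should carry over when the csv layout changes.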
=====================
astype works if the target dtype matches. For example, if I read the txt with dtype=None, and then use the derived dt, it works:
Same for arr.astype('U3,int,float,int'), which also has 4 compatible fields.
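A minimal sketch of the astype route, using a small hand-built array in place of the genfromtxt result (the field names and values are invented):

```python
import numpy as np

# Hand-built stand-in for an array read with dtype=None.
arr = np.array(
    [("one", 1, 2.5, 0), ("two", 2, 3.5, 1)],
    dtype=[("name", "U3"), ("count", "i8"), ("value", "f8"), ("flag", "i8")],
)

# Derived dtype: same field names, integer fields swapped for float64.
dt = [
    (name, "<f8" if np.dtype(typestr).kind in "iu" else typestr)
    for name, typestr in arr.dtype.descr
]

# astype casts field by field; the string field is left as a string.
converted = arr.astype(dt)

print(converted.dtype)
print(converted)
```

The cast goes through because source and target have the same number of fields and each pair of fields is castable (string to string, int to float, float to float).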