I'm trying to use genfromtxt to read a CSV file that contains missing values such as 'na' and '-'. I need to find the minimum value in the data, but the missing values are returned as -1.
This is my code:
data = np.genfromtxt('median-resale-prices-for-registered-applications-by-town-and-flat-type.csv',
skip_header=1,
dtype=[('quarter', 'U7'), ('town', 'U50'), ('flat_type', 'U10'), ('price', 'i8')], delimiter=",",
missing_values=['na','-'], filling_values=[0])
min_price = np.min(data['price'])
print(min_price)
And this is what I get in return:
-1
I have also tried isnan():
print("Original data: " + str(data.shape))
null_rows = np.isnan(data['price'])
print(null_rows)
nonnull_values = data[null_rows==False]
print("Filtered data: " + str(nonnull_values.shape))
However, Python did not treat the 'na' and '-' values as NaN:
Original data: (9360,)
[False False False ... False False False]
Filtered data: (9360,)
Is there something wrong with my code?
With the sample, adapted from the comment:
Accepting that last field as floats (no fill stuff):
genfromtxt normally uses nan for values it can't parse as floats. For integers, it apparently uses -1 instead:
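A minimal sketch of that difference. The four rows in txt are a made-up sample in the same layout as the question's file, not the real data; only the dtype of the price field changes between the two calls.

```python
import numpy as np

# Hypothetical sample (assumption): same columns as the question's CSV.
txt = """quarter,town,flat_type,price
2007-Q2,ANG MO KIO,1-room,na
2007-Q2,ANG MO KIO,2-room,-
2007-Q2,ANG MO KIO,3-room,172000
2007-Q2,ANG MO KIO,4-room,260000"""

text_fields = [('quarter', 'U7'), ('town', 'U50'), ('flat_type', 'U10')]

# price as float: entries that can't be parsed fall back to nan
as_float = np.genfromtxt(txt.splitlines(), delimiter=',', skip_header=1,
                         dtype=text_fields + [('price', 'f8')])

# price as integer: there is no integer nan, so the fallback is -1
as_int = np.genfromtxt(txt.splitlines(), delimiter=',', skip_header=1,
                       dtype=text_fields + [('price', 'i8')])

print(as_float['price'])
print(as_int['price'])
```

So the -1 in the question is the default stand-in for an unparsable integer, which is also why np.isnan finds nothing in the integer field.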
After some fiddling, I got this to work. The key was to use a single value of filling_values, not a list. Looking at the code (via [source] in the docs), I see we can use a dict, specifying different values for different columns.
There are more details in the code than in the documentation. I haven't used these parameters much, so each time I have to learn more.
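A sketch of the three ways of passing filling_values, again on a made-up sample rather than the real file. As far as I can tell from the source, a list is matched to columns one by one, so [0] only covers the first column and the price column keeps its -1 default. A scalar is used for every column, and a dict keyed by column index targets just the one you name (index 3 is the price column in this layout).

```python
import numpy as np

# Hypothetical sample (assumption): same columns as the question's CSV.
txt = """quarter,town,flat_type,price
2007-Q2,ANG MO KIO,1-room,na
2007-Q2,ANG MO KIO,2-room,-
2007-Q2,ANG MO KIO,3-room,172000
2007-Q2,ANG MO KIO,4-room,260000"""

dt = [('quarter', 'U7'), ('town', 'U50'), ('flat_type', 'U10'), ('price', 'i8')]


def load(fill):
    """Parse the sample with the given filling_values argument."""
    return np.genfromtxt(txt.splitlines(), delimiter=',', skip_header=1,
                         dtype=dt, filling_values=fill)


with_list = load([0])       # 0 is applied to column 0 only
with_scalar = load(0)       # 0 is applied to every column
with_dict = load({3: 0})    # 0 is applied to column 3 (price) only

print(with_list['price'])
print(with_scalar['price'])
print(with_dict['price'])
```

Note that filling with 0 makes 0 the minimum, so for the original goal the fill value still has to be excluded (or the field read as float and the nan entries masked).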
edit
In today's question you seem to have forgotten all that you learned here.
data is a structured array. With f8 you get nan for the missing values, not -1. And you attempt to treat the array as a list of tuples. Why not continue to treat it as a structured array?
The price field:
Use the mask to select, or "delete", elements from the 1d array:
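Roughly like this, assuming the price field was loaded as f8. The sample rows are made up for illustration; mask and valid are just names chosen here.

```python
import numpy as np

# Hypothetical sample (assumption): same columns as the question's CSV.
txt = """quarter,town,flat_type,price
2007-Q2,ANG MO KIO,1-room,na
2007-Q2,ANG MO KIO,2-room,-
2007-Q2,ANG MO KIO,3-room,172000
2007-Q2,ANG MO KIO,4-room,260000"""

dt = [('quarter', 'U7'), ('town', 'U50'), ('flat_type', 'U10'), ('price', 'f8')]
data = np.genfromtxt(txt.splitlines(), delimiter=',', skip_header=1, dtype=dt)

price = data['price']     # 1d float array holding one field
mask = np.isnan(price)    # True where the price was missing
valid = data[~mask]       # keep only the records with a real price

print(mask)
print(valid.shape)
print(valid['price'].min())
print(np.nanmin(price))   # same minimum, skipping nan without a mask
```

Indexing with ~mask keeps whole records, so the town and flat type of the cheapest entry stay available alongside the price.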
But here's a list based approach:
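One possible list-based version, as a sketch on made-up sample rows: convert the structured array to a list of tuples with tolist() and filter with a comprehension. The names rows, valid_rows and min_row are only illustrative.

```python
import math

import numpy as np

# Hypothetical sample (assumption): same columns as the question's CSV.
txt = """quarter,town,flat_type,price
2007-Q2,ANG MO KIO,1-room,na
2007-Q2,ANG MO KIO,2-room,-
2007-Q2,ANG MO KIO,3-room,172000
2007-Q2,ANG MO KIO,4-room,260000"""

dt = [('quarter', 'U7'), ('town', 'U50'), ('flat_type', 'U10'), ('price', 'f8')]
data = np.genfromtxt(txt.splitlines(), delimiter=',', skip_header=1, dtype=dt)

rows = data.tolist()      # list of plain Python tuples, one per record

# Drop tuples whose last item (the price) is nan
valid_rows = [row for row in rows if not math.isnan(row[3])]

# Record with the lowest price
min_row = min(valid_rows, key=lambda row: row[3])
print(min_row)
```

This gives the same answer as the mask, but the boolean indexing is usually the more natural fit once the data is already in a numpy array.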