Error when calculating average scores with NumPy: 'ufunc add' did not contain a loop


I'm encountering an issue when trying to calculate the average scores of students using NumPy. The code I've written is giving me the following error:

Traceback (most recent call last):
    ...
    average_scores = np.nanmean(numeric_columns, axis=1)
    ...
numpy.core._exceptions._UFuncNoLoopError: ufunc 'add' did not contain a loop with signature matching types

CODE

import numpy as np

# Defining anything that could be missing in someone else's data
missing_values = ['N/A', 'NA', 'nan', 'NaN', 'NULL', '', '']

# Defining each of the data types
dtype = [('Student Name', 'U50'), ('Math', 'float'), 
         ('Science', 'float'), ('English', 'float'), 
         ('History', 'float'), ('Art', 'float')]

# Load data into a numpy array
data = np.genfromtxt('grades.csv', delimiter=',', 
                     names=True, dtype=dtype,
                     encoding=None, missing_values=missing_values,
                     filling_values=np.nan, ndmin=2)


# Get all the field names (column names) in the structured array
field_names = data.dtype.names


# Extract the numeric columns by checking their data type
numeric_columns = data[[field for field in field_names if data[field].dtype == float]]


# Calculate the average score for each student
average_scores = np.nanmean(numeric_columns, axis=1)

print(average_scores)

Here is my data in the 'grades.csv' file:

Student Name,Math,Science,English,History,Art
Alice,90,88,94,85,78
Bob, 85,92,,88,90
Charlie,78,80,85,85,79
David,94,,90,92,84
Eve,92,88,92,90,88
Frank,,95,94,86,95

WHAT I'VE TRIED

I've tried loading the data, filtering the numeric columns, and calculating the average scores using np.nanmean(). I've also made sure to handle missing values appropriately.

EXPECTATIONS

I expected the code to calculate and print the average scores for each student without errors.

REQUEST FOR HELP

I'd appreciate any assistance in understanding the cause of the error and how to resolve it.


There are 2 answers

ahjim0m0 On

np.nanmean() is the right function here, since it ignores NaN values (see the documentation).

The problem is that your numeric_columns is still a structured (compound-dtype) array rather than a plain numeric array. You can resolve it by converting to a homogeneous (single-type) array, for example with astype().

Try this:
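(The code in the original answer was posted as an image and is no longer available; below is a minimal sketch of the astype idea, continuing from the question's numeric_columns. The column_stack assembly is an assumed reconstruction, not the original screenshot.)

import numpy as np

# Cast each float field and stack the columns into a homogeneous 2-D array
# (one row per student) that np.nanmean can reduce over.
fields = numeric_columns.dtype.names
plain = np.column_stack([numeric_columns[name].astype(float).ravel() for name in fields])

average_scores = np.nanmean(plain, axis=1)
print(average_scores)   # roughly [87.   88.75  81.4   90.   90.   92.5 ]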

hpaulj On

With your sample txt:

In [2]: txt='''Student Name,Math,Science,English,History,Art
   ...: Alice,90,88,94,85,78
   ...: Bob, 85,92,,88,90
   ...: Charlie,78,80,85,85,79
   ...: David,94,,90,92,84
   ...: Eve,92,88,92,90,88
   ...: Frank,,95,94,86,95'''

You've handled genfromtxt as well as anyone I've seen on SO:

In [3]: # Defining anything that could be missing in someone else's data
   ...: missing_values = ['N/A', 'NA', 'nan', 'NaN', 'NULL', '', '']
   ...: 
   ...: # Defining each of the data types
   ...: dtype = [('Student Name', 'U50'), ('Math', 'float'), 
   ...:          ('Science', 'float'), ('English', 'float'), 
   ...:          ('History', 'float'), ('Art', 'float')]
   ...: 
   ...: # Load data into a numpy array
   ...: data = np.genfromtxt(txt.splitlines(), delimiter=',', 
   ...:                      names=True, dtype=dtype,
   ...:                      encoding=None, missing_values=missing_values,
   ...:                      filling_values=np.nan, ndmin=2)

data is a structured array; IPython's display is the repr, so it shows the dtype:

In [4]: data
Out[4]: 
array([[('Alice', 90., 88., 94., 85., 78.)],
       [('Bob', 85., 92., nan, 88., 90.)],
       [('Charlie', 78., 80., 85., 85., 79.)],
       [('David', 94., nan, 90., 92., 84.)],
       [('Eve', 92., 88., 92., 90., 88.)],
       [('Frank', nan, 95., 94., 86., 95.)]],
      dtype=[('Student_Name', '<U50'), ('Math', '<f8'), ('Science', '<f8'), ('English', '<f8'), ('History', '<f8'), ('Art', '<f8')])

In [5]: field_names = data.dtype.names
   ...: # Extract the numeric columns by checking their data type
   ...: numeric_columns = data[[field for field in field_names if data[field].dtype == float]]

In [6]: numeric_columns
Out[6]: 
array([[(90., 88., 94., 85., 78.)],
       [(85., 92., nan, 88., 90.)],
       [(78., 80., 85., 85., 79.)],
       [(94., nan, 90., 92., 84.)],
       [(92., 88., 92., 90., 88.)],
       [(nan, 95., 94., 86., 95.)]],
      dtype={'names': ['Math', 'Science', 'English', 'History', 'Art'], 'formats': ['<f8', '<f8', '<f8', '<f8', '<f8'], 'offsets': [200, 208, 216, 224, 232], 'itemsize': 240})

This is a (6,1) shaped array with 5 fields. The size-1 dimension is there because you specified ndmin=2; without it the shape would be (6,).

nanmean can't work with this compound dtype. There are several ways of converting it to a simple float array. astype/view work (sketched after the next cell); so does:

In [7]: x=np.array(numeric_columns.tolist())
In [8]: x.shape
Out[8]: (6, 1, 5)
In [10]: np.nanmean(x[:,0,:], axis=1)
Out[10]: array([87.  , 88.75, 81.4 , 90.  , 90.  , 92.5 ])
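
The astype/view route mentioned above would look roughly like this (a sketch, not part of the original answer: it packs the selected fields into a contiguous all-float layout, then reinterprets that memory as a plain 2-D array):

# Pack the float fields contiguously, then view the buffer as plain floats.
packed = numeric_columns.astype([(name, float) for name in numeric_columns.dtype.names])
z = packed.view(float).reshape(len(packed), -1)   # plain (6, 5) float array
np.nanmean(z, axis=1)                             # same values as Out[10] above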

Another converter:

In [12]: import numpy.lib.recfunctions as rf
In [13]: y=rf.structured_to_unstructured(numeric_columns[:,0])
In [14]: y
Out[14]: 
array([[90., 88., 94., 85., 78.],
       [85., 92., nan, 88., 90.],
       [78., 80., 85., 85., 79.],
       [94., nan, 90., 92., 84.],
       [92., 88., 92., 90., 88.],
       [nan, 95., 94., 86., 95.]])
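
From there, np.nanmean(y, axis=1) should reproduce the same per-student averages as the tolist route in Out[10] above:

np.nanmean(y, axis=1)   # ~ array([87.  , 88.75, 81.4 , 90.  , 90.  , 92.5 ])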