Numpy masked_array sum


I would expect the result of a summation for a fully masked array to be zero, but instead "masked" is returned. How can I get the function to return zero?

>>> a = np.asarray([1, 2, 3, 4])
>>> b = np.ma.masked_array(a, mask=~(a > 2))
>>> b
masked_array(data = [-- -- 3 4],
             mask = [ True  True False False],
       fill_value = 999999)

>>> b.sum()
7
>>> b = np.ma.masked_array(a, mask=~(a > 5))
>>> b
masked_array(data = [-- -- -- --],
             mask = [ True  True  True  True],
       fill_value = 999999)


>>> b.sum()
masked
>>> np.ma.sum(b)
masked

Here's another unexpected thing:

>>> b.sum() + 3
masked

Answer by hpaulj (best answer)

In your last case (b1 here is your fully masked array):

In [197]: bs=b1.sum()
In [198]: bs.data
Out[198]: array(0.0)
In [199]: bs.mask
Out[199]: array(True, dtype=bool)
In [200]: repr(bs)
Out[200]: 'masked'
In [201]: str(bs)
Out[201]: '--'

If I specify keepdims, I get a different array:

In [208]: bs=b1.sum(keepdims=True)
In [209]: bs
Out[209]: 
masked_array(data = [--],
             mask = [ True],
       fill_value = 999999)
In [210]: bs.data
Out[210]: array([0])
In [211]: bs.mask
Out[211]: array([ True], dtype=bool)

Here's the relevant part of the MaskedArray.sum code:

def sum(self, axis=None, dtype=None, out=None, keepdims=np._NoValue):
    kwargs = {} if keepdims is np._NoValue else {'keepdims': keepdims}

    _mask = self._mask
    newmask = _check_mask_axis(_mask, axis, **kwargs)
    # No explicit output
    if out is None:
        result = self.filled(0).sum(axis, dtype=dtype, **kwargs)
        rndim = getattr(result, 'ndim', 0)
        if rndim:
            result = result.view(type(self))
            result.__setmask__(newmask)
        elif newmask:
            result = masked
        return result
    ....

It's the

 newmask = np.ma.core._check_mask_axis(b1.mask, axis=None)
 ...
 elif newmask: result = masked

lines that produce the masked value in your case. newmask is True when all values are masked, and False if some are not. The choice to return np.ma.masked is deliberate.

The core of the calculation is:

In [218]: b1.filled(0).sum()
Out[218]: 0

The rest of the code decides whether to return a scalar or a masked array.
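
To get zero rather than masked, one option is to apply that core step yourself. This is a minimal sketch, not part of NumPy: the helper name sum_or_zero is hypothetical, and it assumes that counting masked entries as 0 is what you want.

```python
import numpy as np

def sum_or_zero(arr, axis=None):
    """Sum a masked array, counting masked entries as 0 (hypothetical helper)."""
    # np.ma.filled returns a plain ndarray with masked entries replaced by 0,
    # so the sum can never come back as np.ma.masked.
    return np.ma.filled(arr, 0).sum(axis=axis)

a = np.asarray([1, 2, 3, 4])
partly_masked = np.ma.masked_array(a, mask=~(a > 2))
fully_masked = np.ma.masked_array(a, mask=~(a > 5))

partial_total = sum_or_zero(partly_masked)  # only 3 and 4 contribute
full_total = sum_or_zero(fully_masked)      # every entry is replaced by 0
```

The trade-off is that a result of 0 no longer tells you whether the data summed to zero or was entirely masked.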

============

And for your addition:

In [232]: np.ma.masked+3
Out[232]: masked

It looks like np.ma.masked is a special value that propagates itself through calculations, somewhat like np.nan.
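
If you would rather keep calling sum() and handle the masked result afterwards, you can test for it with np.ma.is_masked before doing further arithmetic. A small sketch, assuming that substituting 0 for a fully masked sum is acceptable in your case; the variable names are only illustrative:

```python
import numpy as np

a = np.asarray([1, 2, 3, 4])
fully_masked = np.ma.masked_array(a, mask=~(a > 5))

total = fully_masked.sum()
shifted = total + 3  # the masked value propagates through the addition

# np.ma.is_masked reports whether a value carries any masked entry.
total_is_masked = np.ma.is_masked(total)
shifted_is_masked = np.ma.is_masked(shifted)

# Substitute 0 only when the sum came back masked (one possible convention).
safe_total = 0 if total_is_masked else total
safe_shifted = safe_total + 3
```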