Counts with 2 variables


In a research study I have 2 variables:

x = number of objects remembered
y = % tasks completed correctly

as follows:

import numpy as np

x = np.array([2,2,2,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,6,6,6,6,7,7])
y = np.array([1.0, 1.0, 1.0, 0.33, 0.33, 0.66, 0.66, 1.0, 1.0, 1.0, 1.0, 0.75, 0.75, 0.5, 1.0, 1.0, 0.6, 0.4, 0.5,0.75, 1.0,1.0,0.6,0.5,0.75])

I would like to count how many times each (x, y) combination occurs, giving a result like:

WMC  Percent  Count
2    100      3
3    33       2
3    66       2
etc.

I note that scipy.stats.itemfreq and np.bincount only work on one variable.
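One way around the one-variable limitation of np.bincount is to fold both variables into a single integer first. A minimal sketch, assuming x holds small non-negative integers and y holds fractions between 0 and 1 that can be rounded to whole percentages; the encoding `x * 101 + percent` is an illustrative choice, not a NumPy feature:

```python
import numpy as np

x = np.array([2,2,2,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,6,6,6,6,7,7])
y = np.array([1.0, 1.0, 1.0, 0.33, 0.33, 0.66, 0.66, 1.0, 1.0, 1.0, 1.0, 0.75, 0.75,
              0.5, 1.0, 1.0, 0.6, 0.4, 0.5, 0.75, 1.0, 1.0, 0.6, 0.5, 0.75])

pct = np.rint(y * 100).astype(int)  # fraction -> whole percent, 0..100
key = x * 101 + pct                 # distinct integer per (x, pct) pair, since 0 <= pct <= 100
counts = np.bincount(key)           # one bin per possible key
present = np.flatnonzero(counts)    # keys that actually occur, in ascending order

# Decode each key back into its (WMC, Percent) pair and attach the count
table = np.column_stack((present // 101, present % 101, counts[present]))
print(table)
```

Because the keys are sorted, the rows come out ordered by WMC and then by percent. The bincount array has one slot per possible key, so this only makes sense while x stays small.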


There are 4 answers

Answer by Wolph

If you have NumPy 1.9.0 or higher, you can call np.unique with return_counts=True, which returns two arrays: the unique values and their counts. Without an axis argument, however, np.unique flattens a 2-D input, so it counts individual numbers rather than (x, y) pairs.

Here's a slightly modified version of the numpy.unique method which works for your case:

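import numpy as np

def unique(ar):
    # Sort rows by column 0, then column 1 (lexsort's last key is the primary key)
    ar = ar[np.lexsort((ar[:, 1], ar[:, 0]))]
    # True where a row differs from the previous one, i.e. where a new pair starts
    flag = np.concatenate(([True], (ar[1:] != ar[:-1]).any(axis=1)))
    # Start index of each run, plus the row count as an end sentinel
    idx = np.concatenate((np.nonzero(flag)[0], [len(ar)]))
    # Columns: x value, y value, run length (= count)
    return np.column_stack((ar[flag][:, 0], ar[flag][:, 1], np.diff(idx)))

print(unique(np.column_stack((x, y))))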

Result:

[[ 2.    1.    3.  ]
 [ 3.    0.33  2.  ]
 [ 3.    0.66  2.  ]
 [ 3.    1.    1.  ]
 [ 4.    0.5   1.  ]
 [ 4.    0.75  2.  ]
 [ 4.    1.    3.  ]
 [ 5.    0.4   1.  ]
 [ 5.    0.5   1.  ]
 [ 5.    0.6   1.  ]
 [ 5.    1.    2.  ]
 [ 6.    0.6   1.  ]
 [ 6.    0.75  1.  ]
 [ 6.    1.    2.  ]
 [ 7.    0.5   1.  ]
 [ 7.    0.75  1.  ]]
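With NumPy 1.13 or later, np.unique also accepts axis=0, which treats each row as a single item, so the pairs can be counted without a custom function. A minimal sketch, assuming the two arrays are stacked into one two-column float array:

```python
import numpy as np

x = np.array([2,2,2,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,6,6,6,6,7,7])
y = np.array([1.0, 1.0, 1.0, 0.33, 0.33, 0.66, 0.66, 1.0, 1.0, 1.0, 1.0, 0.75, 0.75,
              0.5, 1.0, 1.0, 0.6, 0.4, 0.5, 0.75, 1.0, 1.0, 0.6, 0.5, 0.75])

pairs = np.column_stack((x, y))  # shape (25, 2); float dtype because y is float
# Distinct rows (sorted lexicographically) and how often each occurs
rows, counts = np.unique(pairs, axis=0, return_counts=True)
result = np.column_stack((rows, counts))  # columns: WMC, fraction correct, count
print(result)
```

This produces the same three columns as the function above.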
Answer by John Sharp

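Earlier in your code, why not construct a dictionary linking "number of objects remembered" to "% tasks completed correctly"?

i.e.

completed_tasks = {2: 1.0, 3: 0.33, 4: 0.66}

Then you can append the completed-tasks value to each (value, count) row for x. The original suggestion used scipy.stats.itemfreq, which is deprecated and absent from current SciPy releases; np.unique(x, return_counts=True) returns the same values and counts:

import numpy as np

values, counts = np.unique(x, return_counts=True)
# New rows instead of append() in a comprehension; .get() gives None for unmapped x
a = [[v, c, completed_tasks.get(v)] for v, c in zip(values, counts)]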
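Because each x value in this data occurs with several different y values (x = 3 appears with 0.33, 0.66 and 1.0), a dict keyed by x alone can hold only one of them. A sketch of the same dictionary idea that keeps them all, assuming a mapping from each x to a collections.Counter of its y values (the name by_wmc is illustrative):

```python
from collections import Counter, defaultdict

import numpy as np

x = np.array([2,2,2,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,6,6,6,6,7,7])
y = np.array([1.0, 1.0, 1.0, 0.33, 0.33, 0.66, 0.66, 1.0, 1.0, 1.0, 1.0, 0.75, 0.75,
              0.5, 1.0, 1.0, 0.6, 0.4, 0.5, 0.75, 1.0, 1.0, 0.6, 0.5, 0.75])

# Map each WMC value to a Counter of the y values observed with it
by_wmc = defaultdict(Counter)
for wmc, frac in zip(x.tolist(), y.tolist()):
    by_wmc[wmc][frac] += 1

# Flatten into (WMC, percent, count) rows, sorted by WMC then by fraction
rows = [(wmc, round(frac * 100), n)
        for wmc in sorted(by_wmc)
        for frac, n in sorted(by_wmc[wmc].items())]
print(rows[:3])  # [(2, 100, 3), (3, 33, 2), (3, 66, 2)]
```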
Answer by Andrey Sobolev

I would use collections.Counter for that purpose:

>>> import numpy as np
>>> x = np.array([2,2,2,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,6,6,6,6,7,7])
>>> y = np.array([1.0, 1.0, 1.0, 0.33, 0.33, 0.66, 0.66, 1.0, 1.0, 1.0, 1.0, 0.75, 0.75, 0.5, 1.0, 1.0, 0.6, 0.4, 0.5,0.75, 1.0,1.0,0.6,0.5,0.75])
>>> from collections import Counter
>>> c = Counter(zip(x,y))
>>> c
Counter({(2, 1.0): 3, (4, 1.0): 3, (3, 0.66000000000000003): 2, (5, 1.0): 2, (3, 0.33000000000000002): 2, (6, 1.0): 2, (4, 0.75): 2, (7, 0.5): 1, (6, 0.59999999999999998): 1, (5, 0.40000000000000002): 1, (5, 0.59999999999999998): 1, (3, 1.0): 1, (7, 0.75): 1, (6, 0.75): 1, (5, 0.5): 1, (4, 0.5): 1})
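The Counter's keys are (x, y) tuples, so turning it into the WMC / Percent / Count layout from the question only needs a sort. A minimal sketch, assuming percentages should be rounded to whole numbers; converting with tolist() first keeps the keys as plain Python numbers:

```python
from collections import Counter

import numpy as np

x = np.array([2,2,2,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,6,6,6,6,7,7])
y = np.array([1.0, 1.0, 1.0, 0.33, 0.33, 0.66, 0.66, 1.0, 1.0, 1.0, 1.0, 0.75, 0.75,
              0.5, 1.0, 1.0, 0.6, 0.4, 0.5, 0.75, 1.0, 1.0, 0.6, 0.5, 0.75])

c = Counter(zip(x.tolist(), y.tolist()))  # (x, y) pair -> number of occurrences

# Tuples sort by WMC first, then by fraction
rows = [(wmc, round(frac * 100), n) for (wmc, frac), n in sorted(c.items())]

print("WMC\tPercent\tCount")
for row in rows:
    print(*row, sep="\t")
```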
Answer by mhawke

Not sure if it suits your case, but you can do this with itertools.groupby() on the sorted, zipped arrays:

import numpy as np
from itertools import groupby

x = np.array([2,2,2,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,6,6,6,6,7,7])
y = np.array([1.0, 1.0, 1.0, 0.33, 0.33, 0.66, 0.66, 1.0, 1.0, 1.0, 1.0, 0.75, 0.75, 0.5, 1.0, 1.0, 0.6, 0.4, 0.5,0.75, 1.0,1.0,0.6,0.5,0.75])

print("WMC\tPercent\tCount")
for key, group in groupby(sorted(zip(x, y))):
    # round() rather than int() so e.g. 0.29 * 100 = 28.999... becomes 29, not 28
    print("{}\t{}\t{}".format(key[0], int(round(key[1] * 100)), len(list(group))))

Output

WMC Percent Count
2   100 3
3   33  2
3   66  2
3   100 1
4   50  1
4   75  2
4   100 3
5   40  1
5   50  1
5   60  1
5   100 2
6   60  1
6   75  1
6   100 2
7   50  1
7   75  1
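itertools.groupby() only merges consecutive equal items, which is why the zipped pairs are sorted before grouping. A small illustration on a hypothetical three-item sample (not the study data): without sorting, the same key shows up in more than one group.

```python
from itertools import groupby

pairs = [(5, 1.0), (5, 0.6), (5, 1.0)]  # hypothetical sample: (5, 1.0) is not contiguous

unsorted_groups = [(key, len(list(g))) for key, g in groupby(pairs)]
sorted_groups = [(key, len(list(g))) for key, g in groupby(sorted(pairs))]

print(unsorted_groups)  # [((5, 1.0), 1), ((5, 0.6), 1), ((5, 1.0), 1)]
print(sorted_groups)    # [((5, 0.6), 1), ((5, 1.0), 2)]
```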

Updated to produce a NumPy array:

import numpy as np
from itertools import groupby

x = np.array([2,2,2,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,6,6,6,6,7,7])
y = np.array([1.0, 1.0, 1.0, 0.33, 0.33, 0.66, 0.66, 1.0, 1.0, 1.0, 1.0, 0.75, 0.75, 0.5, 1.0, 1.0, 0.6, 0.4, 0.5,0.75, 1.0,1.0,0.6,0.5,0.75])

results = np.array([(key[0], int(round(key[1] * 100)), len(list(group)))
                        for key, group in groupby(sorted(zip(x, y)))])

Output

>>> results
array([[  2, 100,   3],
       [  3,  33,   2],
       [  3,  66,   2],
       [  3, 100,   1],
       [  4,  50,   1],
       [  4,  75,   2],
       [  4, 100,   3],
       [  5,  40,   1],
       [  5,  50,   1],
       [  5,  60,   1],
       [  5, 100,   2],
       [  6,  60,   1],
       [  6,  75,   1],
       [  6, 100,   2],
       [  7,  50,   1],
       [  7,  75,   1]])
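If you want to refer to the columns by name, the same rows can go into a NumPy structured array. A sketch, assuming field names that simply mirror the question's headings (WMC, Percent, Count are illustrative choices, not required names), with the percent rounded to the nearest integer:

```python
import numpy as np
from itertools import groupby

x = np.array([2,2,2,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,6,6,6,6,7,7])
y = np.array([1.0, 1.0, 1.0, 0.33, 0.33, 0.66, 0.66, 1.0, 1.0, 1.0, 1.0, 0.75, 0.75,
              0.5, 1.0, 1.0, 0.6, 0.4, 0.5, 0.75, 1.0, 1.0, 0.6, 0.5, 0.75])

# (WMC, percent, count) rows built with the same sorted-groupby approach
rows = [(wmc, int(round(frac * 100)), len(list(group)))
        for (wmc, frac), group in groupby(sorted(zip(x.tolist(), y.tolist())))]

# Structured array: each column is accessible by its field name
table = np.array(rows, dtype=[("WMC", int), ("Percent", int), ("Count", int)])
print(table["Count"].sum())      # total number of observations
print(table[table["WMC"] == 3])  # all rows for WMC = 3
```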