There are two NumPy arrays, `groups` and `selectors`, where `selectors` is an array containing integers that need to be grouped:
```python
import numpy as np

np.random.seed(0)
selectors = np.random.randint(0, 300, 5)
# [172  47 117 192 251]
```
`groups` is a structured array mapping the first index (int) of each group to its label (str):
```python
# Generate groups 'a' to 't' and their first index
start = ord('a')
groups = []
for i in range(20):
    e = (i*i, chr(start+i))
    groups.append(e)
groups = np.array(groups, dtype=[('index', np.uint32), ('selector', '|U1')])
groups = np.sort(groups, order='index')
# [(  0, 'a') (  1, 'b') (  4, 'c') (  9, 'd') ( 16, 'e') ( 25, 'f')
#  ( 36, 'g') ( 49, 'h') ( 64, 'i') ( 81, 'j') (100, 'k') (121, 'l')
#  (144, 'm') (169, 'n') (196, 'o') (225, 'p') (256, 'q') (289, 'r')
#  (324, 's') (361, 't')]
```
Given these example arrays, the desired result after grouping is a dictionary of `np.ndarray`s/lists:

```python
{
    "g": [47],
    "k": [117],
    "n": [172, 192],
    "p": [251]
}
```
Is there a quicker way to perform this grouping in NumPy than nesting two loops, as shown below? This would be useful for large `selectors` arrays with 10-100 million rows against a `groups` array with 100-1000 rows.
Using Nested Loops
```python
results = {}
for s in selectors:
    for i in range(len(groups)-1):
        if s >= groups[i][0] and s < groups[i+1][0]:
            j = i
            break
    else:
        j = i + 1
    try:
        results[groups[j][1]].append(s)
    except KeyError:
        results[groups[j][1]] = [s]

print(results)
# {'n': [172, 192], 'g': [47], 'k': [117], 'p': [251]}
```
If you use binary search on each selector, you effectively change the running time of your routine from O(len(groups) * len(selectors)) to O(log2(len(groups)) * len(selectors)). The Python documentation on the `bisect` module explains how to use it to find the right-most element less than or equal to a specified value.
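The same binary search can be applied to the whole `selectors` array in one call with `np.searchsorted`, NumPy's vectorized equivalent of `bisect`. A minimal sketch, reusing the example data from the question; `side='right'` minus one yields, for each selector, the index of the right-most group start that is less than or equal to it:

```python
import numpy as np

np.random.seed(0)
selectors = np.random.randint(0, 300, 5)  # [172  47 117 192 251]

# Group boundaries and labels, as constructed in the question
# (starts must be sorted for binary search to work)
starts = np.array([i * i for i in range(20)], dtype=np.uint32)
labels = np.array([chr(ord('a') + i) for i in range(20)])

# One vectorized binary search over all selectors at once
idx = np.searchsorted(starts, selectors, side='right') - 1

# Collect selectors under their group label
results = {}
for label, s in zip(labels[idx], selectors):
    results.setdefault(label, []).append(int(s))

print(results)
# {'n': [172, 192], 'g': [47], 'k': [117], 'p': [251]}
```

The final dictionary-building loop is still Python-level, but it runs once per selector with no inner search, so the O(log2(len(groups))) lookup cost is paid inside a single `searchsorted` call.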