Optimizing two-dimensional arrays

596 views Asked by At

I'm coding in Python 2.7. I have two 2D arrays of coordinate tuples.

array1 = [[[00_RA,00_DEC] [01_RA,01_DEC] ... [0N_RA,0N_DEC]]
          [[10_RA,10_DEC] [11_RA,11_DEC] ... [1N_RA,1N_DEC]]
          ...
          [[M0_RA,M0_DEC] [M1_RA,M1_DEC] ... [MN_RA,MN_DEC]]]

array2 = [[[00_ra,00_dec] [01_ra,01_dec] ... [0n_ra,0n_dec]]
          [[10_ra,10_dec] [11_ra,11_dec] ... [1n_ra,1n_dec]]
          ...
          [[m0_ra,m0_dec] [m1_ra,m1_dec] ... [mn_ra,mn_dec]]]

I want to find the coordinates of the entries of both arrays that appear in the other array. The following code works, but takes a very long time to run, especially with M,N,m,n being typically between 100-1000 each.

indices = []
for M in xrange(len(array1)):
    array1_row = array1[M]
    for N in xrange(len(array1_row)):
        array1_coord = array1_row[N]
        RA = array1_coord[0]
        DEC = array1_coord[1]
        for m in xrange(len(array2)):
            array2_row = array2[m]
            for n in xrange(len(array2_row)):
                array2_coord = array2_row[n]
                ra = array2_coord[0]
                dec = array2_coord[1]
                if ra == RA and dec == DEC:
                    indices.append((M,N,m,n))

I am trying to optimize this using list comprehensions. I think the following should work:

indices = [(M,N,m,n) for M in xrange(len(array1)) for N in xrange(len(array1[M])) for m in xrange(len(array2)) for n in xrange(len(array2[m])) if array1[M,N,0] == array2[m,n,0] and array1[M,N,1] == array2[m,n,1]]

This is taking much longer to run though, even for a single row. (I stopped it after a few hours of running but it didn't throw an error in that time). Am I optimizing this in the best way? What can I do to make this quicker?

2

There are 2 answers

0
FMc On BEST ANSWER

You need a different data structure for fast lookups: dict rather than list. Here's an example:

array1 = [
    [[1,2], [1,5]],
    [[3,2], [7,5]],
]

array2 = [
    [[3,2], [9,9]],
    [[1,2], [1,5]],
]

lookup = {}

for r, row in enumerate(array1):
    for c, val in enumerate(row):
        pair = tuple(val)
        lookup[pair] = (r, c)

for r, row in enumerate(array2):
    for c, val in enumerate(row):
        pair = tuple(val)
        if pair in lookup:
            print dict(
                pair = pair, 
                array2_indexes = (r, c),
                array1_indexes = lookup[pair],
            )

Output:

{'array1_indexes': (1, 0), 'pair': (3, 2), 'array2_indexes': (0, 0)}
{'array1_indexes': (0, 0), 'pair': (1, 2), 'array2_indexes': (1, 0)}
{'array1_indexes': (0, 1), 'pair': (1, 5), 'array2_indexes': (1, 1)}
1
div On

Provided code has a runtime compexity of θ(M * N * m * n). It can be reduced to θ(M * N + m * n) by using a set. This can be done by first hashing values of first array in the set and then checking for values of other array whether they are already present in the set or not.

One thing to note, you can't add a list in a set(??they are mutable) so you will have to convert them to tuples before adding to the set.