Identify conserved sequences in lists of integers

119 views Asked by At

I have lists of integers :

   [[1,2,3,4,5,6,7,8,9],
    [1,2,-7,-6,-5,-4,-3,10,11,12],
    [3,4,-5,6,7,8,11,12,2,2],
    [etc]]
  • a number can be found 0, 1 or multiple times in each list
  • the sign can be negative

I need to find the motifs that are conserved in all the lists. Here, the result would be only one motif (which I found manually) :

 [[3,4,5,6,7],
 [-7,-6,-5,-4,-3],
 [3,4,-5,6,7]]

By "motif", I mean a sequence of numbers (at least 2 digits) that is found in all the lists : here for an example, the numbers 3,4,5,6,7 are found in this consecutive order in all the lists, though the order is reversed in the second list. Ideally, the detection of the motifs would allows a small number of differences

Any ideas ?

I thought that using networkX could help me identify "cliques" but I do not find a function that would help me solve this problem.

2

There are 2 answers

0
Ajay On

a is your list

In [78]: b=[[abs(i) for i in elem] for elem in a]

In [81]: c=list(set.intersection(*map(set, b)))


In [84]: [list({i for i in elem if abs(i) in c}) for elem in a]
Out[84]: [[2, 3, 4, 5, 6, 7], [2, -7, -6, -5, -4, -3], [3, 4, -5, 6, 7,  2]]
5
Mazdak On

You can first find one intersect between all of sub lists based on the absolute values of the sub lists then loop over all sub lists and find the desire intersection :

def find_intersection(m_list):
        temp=[map(abs,i) for i in m_list]
        v=set(temp[0])
        for k in temp[1:]:
            v=v.intersection(k)

        for i,k in enumerate(m_list):
            m_list[i]={t for t in k if abs(t) in v}

        return m_list


l=[[1, 2, -3, 4, -5, 6, 7, 8, 9], [1, 2, -7, -6, -5, -4, -3, 10, 11, 12], [3, 4, -5, 6, 7, 8, 11, 12, 2, 2]]

print find_intersection(l)

result :

[set([2, 4, 6, 7, -5, -3]), set([2, -7, -6, -5, -4, -3]), set([2, 3, 4, 6, 7, -5])]