using enumerate to iterate over a dictionary of lists to extract information


I got some help earlier today about how to obtain positional information from a dictionary using enumerate(). I will provide the code shortly. However, now that I've found this cool tool, I want to implement it in a different manner to obtain some more information from my dictionary.

I have a dictionary:

length = {'A': [(0,21), (30,41), (70,80), (95,200)], 'B': [(0,42), (70,80)]..etc}

and a file:

A    73
B    15
etc

What I want to do now is find the gap between consecutive pairs in my list: the min of the next pair minus the max of the current pair. For example, the gap between (0,21) and (30,41) is 30 - 21 = 9. Then I want to add up all these gaps until I reach the pair (range) that contains the number from my file (if that makes sense).

Here is the code that I've been working on:

import csv
with open('Exome_agg_cons_snps_pct_RefSeq_HGMD_reinitialized.txt') as f:
    reader = csv.DictReader(f,delimiter="\t")
    for row in reader:
        snppos = row['snp_rein']
        name = row['isoform']
        snpos = int(snppos)
        if name in exons:
            y = exons[name]
            for sd, i  in enumerate(exons[name]):
                while not snpos<=max(i):
                    intron = min(i+1) - max(i) #this doesn't work unfortunately. It says I can't add 1 to i
                    totalintron = 0 + intron
                if snpos<=max(i):
                    exonmin = min(i)
                    exonnumber = sd+1
                    print exonnumber,name,totalintron
                    break

I think it's the sd (indexer) that is confusing me. I don't know how to use it in this context. The commented portions are other avenues I've tried that didn't work. Any help? I know this is a confusing question and my code might be a little mixed up, but that's because I can't even get an output to correct my other mistakes yet.

I want my output to look like this based on the file provided:

exon   name    introntotal    
3    A    38
1    B    0

There are 2 answers

3
Nir Friedman On BEST ANSWER

To try to provide some help for this question: a critical part of the problem is that I don't think enumerate does what you think it does. enumerate just numbers the things you are iterating over. So when you go through your for loop, sd will first be 0, then 1, and so on, and that's all. In your case, you want to look at adjacent list entries (it seems?), and for that the more idiomatic ways of looping in Python aren't nearly as clean. So you could do something like:

...
y = exons[name]

for index in range(len(y) - 1): # the - 1 is to prevent going out of bounds
    first_max = max(y[index])
    second_min = min(y[index+1])
    ... # do more stuff, I didn't completely follow what you're trying to do

I will add for the hardcore Pythonistas: you can of course do some clever stuff to write this more idiomatically and avoid the C-style loop that I wrote, but I think that getting into zip and so on might be a bit confusing for somebody new to Python.
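For anyone who does want to see the zip version, here is a minimal sketch. It assumes the goal is the exon number plus the sum of the gaps before that exon, and that a position falling in a gap is assigned to the next exon. The function name locate_with_zip is made up for illustration; the sample data is the dictionary from the question.

```python
def locate_with_zip(ranges, pos):
    """Return (exon_number, intron_total) for the first range whose max is >= pos.

    Returns None if pos lies beyond the last range.
    """
    # zip(ranges, ranges[1:]) yields each range together with its successor,
    # so gaps[k] is the distance between range k and range k + 1.
    gaps = [min(nxt) - max(cur) for cur, nxt in zip(ranges, ranges[1:])]
    for number, cur in enumerate(ranges, start=1):
        if pos <= max(cur):
            # Only the gaps before this exon count toward the total.
            return number, sum(gaps[:number - 1])
    return None


exons = {'A': [(0, 21), (30, 41), (70, 80), (95, 200)],
         'B': [(0, 42), (70, 80)]}

print(locate_with_zip(exons['A'], 73))  # (3, 38)
print(locate_with_zip(exons['B'], 15))  # (1, 0)
```

Slicing with ranges[1:] copies the list, which is fine for short exon lists like these.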

0
junnytony On

The issue is that you're using the output of enumerate() incorrectly.

enumerate() returns the index (position) first then the item

Ex:

x = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
for i, item in enumerate(x):
    print(i, item)

# prints (in Python 2; Python 3 prints "0 10" and so on, without the parentheses)
#(0, 10)
#(1, 11)
#(2, 12)
#(3, 13)
#(4, 14)
#(5, 15)
#(6, 16)
#(7, 17)
#(8, 18)
#(9, 19)

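So in your loop, sd is the index and i is the (start, end) tuple, which is why i+1 fails. If you want i to be the index, you should switch i and sd: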

for i, sd in enumerate(exons[name]):
    # do something
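With the index available, the next range is exons[name][i + 1], not i + 1 on the tuple itself. Here is a rough sketch of the whole lookup, assuming a position that falls in a gap belongs to the following exon. The function name locate_with_index is only for illustration, and the data is copied from the question.

```python
def locate_with_index(ranges, pos):
    """Return (exon_number, intron_total), or None if pos is past the last range."""
    total = 0
    for i, exon in enumerate(ranges):
        if pos <= max(exon):
            # Exon numbers are 1-based, the index is 0-based.
            return i + 1, total
        if i + 1 < len(ranges):
            # Use the index to reach the next range and add the gap to it.
            total += min(ranges[i + 1]) - max(exon)
    return None


exons = {'A': [(0, 21), (30, 41), (70, 80), (95, 200)],
         'B': [(0, 42), (70, 80)]}

for name, snp_pos in [('A', 73), ('B', 15)]:
    print(name, locate_with_index(exons[name], snp_pos))
# A (3, 38)
# B (1, 0)
```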

As other commenters suggested, reading the Python documentation is usually a good place to start resolving issues, especially if you're not sure how a function does what it does :)