Bioinformatic retrieval of GO terms and comparison against a user-specified term

I have the following function (that belongs to a class):

import Bio
from bioservices import KEGGParser, UniProt, QuickGO

def locate_common_GO(self,list_of_genes,GO_term):
    
    #initialize variables and classes
    q = QuickGO()
    a = Retrieve_Data()
    b=[]

    #get the uniprot IDS using hugo2uniprot. hugo2uniprot is a custom method of my Retrieve_Data class (which uses bioservices module) simply for getting a uniprot ID from a gene symbol. 
    for i in range(0,len(list_of_genes)):
        b.append(a.hugo2uniprot(list_of_genes[i],'hsa'))
        print 'Gene: {} \t UniProtID: {}'.format(list_of_genes[i],b[i])


    #search for GO terms and store as dictionary. Keys are the gene name and a list of GO terms are values. 
    GO_dict = {}
    for i in range(0,len(b)):
        q = QuickGO()
        GO_dict[list_of_genes[i]]= q.Annotation(protein=b[i], frmt="tsv", _with=True,tax=9606, source="UniProt", col="goName")
    keys = GO_dict.keys()

    #This bit should search the dictionary values for a term supplied by the user (stored in the variable 'GO_term'). 
    #If the user supplied term is present in the retrieved list of GO terms I want it to add the dictionary key (i.e. the gene name) to a list named 'matches'. 
    matches = []
    for gene in range(0,len(keys)):
        if GO_term in GO_dict[keys[gene]].splitlines():
            matches.append(keys[i])
    return matches 

The problem is that, despite supplying a gene list with known common GO terms, the output of this function is always the same gene name repeated. For example, 'TGFB1' and 'COL9A2' both have the GO term 'proteinaceous extracellular matrix', yet the output is the list ['COL9A2','COL9A2'] when it should be ['COL9A2','TGFB1']. Does anybody have suggestions on how to fix this? I think I'm close, but I can't find a solution.

There is 1 answer

Answer by Lev Levitsky (best answer)

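You always append keys[i] to matches, but i does not change in that loop: it still holds the last value it was given in the earlier loop over b, so the same item is appended every time. The loop variable here is gene, so you probably want to append keys[gene] instead.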
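
One way to avoid this kind of index mix-up is to iterate over the dictionary items directly instead of over a range. The following is only a minimal sketch: find_matches and the sample dictionary are hypothetical names, and the sample strings are made-up stand-ins for the text returned by QuickGO, assumed to hold one GO name per line.

```python
def find_matches(go_dict, go_term):
    """Return the genes whose annotation text has go_term as a whole line."""
    matches = []
    # Iterating over (key, value) pairs removes the need for a numeric index,
    # so a stale index from an earlier loop cannot be reused by accident.
    for gene, annotation_text in go_dict.items():
        if go_term in annotation_text.splitlines():
            matches.append(gene)
    return matches


# Hypothetical stand-in for the GO_dict built in the question
# (assumption: one GO name per line in each value).
sample = {
    "TGFB1": "proteinaceous extracellular matrix\ncell growth",
    "COL9A2": "proteinaceous extracellular matrix\ncollagen trimer",
    "TP53": "DNA binding",
}

# Sorted because dictionary order is not guaranteed in older Python versions.
result = sorted(find_matches(sample, "proteinaceous extracellular matrix"))
```

With the sample data above, result contains both 'COL9A2' and 'TGFB1'. The same loop body could replace the final loop of locate_common_GO, using GO_dict in place of sample.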