How to parse a Uniprot Dat file to retrieve GO in python?

Question

1.2k views · Asked by Muhammad Zeeshan on 31 July 2017 at 16:14

I have tried Biopython's SeqIO and other parsers, but I couldn't find a good tool for parsing UniProt DAT files.

https://omics.pnl.gov/software/uniprot-dat-file-parser

I tried this one, but it doesn't provide any gene annotations.

http://biopython.org/wiki/SeqIO

This page mostly covers FASTA input rather than DAT files, for example:

from Bio import SeqIO
for record in SeqIO.parse("Fasta/f002", "fasta"):
    print("%s %i" % (record.id, len(record)))

There are 2 answers

**Christian Ebeling** · Answer 1 · 2017-08-28T16:30:33+00:00

Dear Muhammad Zeeshan,

You can use the query functions of the Python library pyuniprot to get the sequence (or many other things).

Install it (with pip or git clone) and update it. Find out which taxonomy identifiers match your organisms; the example here uses human, mouse, and rat. Don't run a full update for all organisms, because it takes a very long time.

import pyuniprot
pyuniprot.update(taxids=[9606, 10090, 10116])  # human, mouse, rat

Then use the following Python code, assuming 1433E_HUMAN and A4_HUMAN are the identifiers of interest:

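import pyuniprot

# Create a query object
query = pyuniprot.query()
# Look up the entries by their UniProt entry names
entries = query.entry(name=('1433E_HUMAN', 'A4_HUMAN'))
# Collect the sequence string of each entry
seqs = [x.sequence.sequence for x in entries]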

**Peter Cock** · Answer 2 · 2017-08-01T13:48:48+00:00

Those look like what Biopython calls "swiss" format, the plain text format used at SwissProt prior to it being called UniProt. Try:

from Bio import SeqIO
for record in SeqIO.parse("example.dat", "swiss"):
    print("%s %i" % (record.id, len(record)))

See also the table for formats at http://biopython.org/wiki/SeqIO
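
Since the question asks specifically about GO terms: in the UniProt flat-file format these appear on DR (database cross-reference) lines that start with "GO;". The snippet below is a minimal sketch that reads those lines directly with plain string handling instead of Biopython, so it only illustrates where the annotations sit in the file. The function name `iter_go_terms`, the entry name DEMO1_HUMAN, and the sample text are made up for illustration, and the sketch assumes that term names contain no semicolons. Biopython's `Bio.SwissProt` parser should expose the same DR lines through `record.cross_references` if you would rather not parse them by hand.

```python
import io


def iter_go_terms(handle):
    """Yield (entry_name, go_id, aspect, term, evidence) for every GO
    cross-reference found in a UniProt flat file (.dat) handle."""
    entry_name = None
    for line in handle:
        # The first two characters are the line code; content starts at column 6.
        code, content = line[:2], line[5:].strip()
        if code == "ID":
            # e.g. "ID   DEMO1_HUMAN   Reviewed;   255 AA."
            entry_name = content.split()[0]
        elif code == "DR" and content.startswith("GO;"):
            # e.g. "DR   GO; GO:0005737; C:cytoplasm; IDA:UniProtKB."
            fields = [f.strip() for f in content.rstrip(".").split(";")]
            go_id = fields[1]
            # Aspect is C (component), F (function) or P (process).
            aspect, _, term = fields[2].partition(":")
            evidence = fields[3] if len(fields) > 3 else ""
            yield entry_name, go_id, aspect, term, evidence
        elif code == "//":
            # "//" terminates an entry.
            entry_name = None


# Illustrative sample in flat-file layout (not a real UniProt entry).
SAMPLE = """\
ID   DEMO1_HUMAN             Reviewed;         255 AA.
AC   P00000;
DR   EMBL; X00000; CAA00000.1; -; mRNA.
DR   GO; GO:0005737; C:cytoplasm; IDA:UniProtKB.
DR   GO; GO:0005515; F:protein binding; IPI:UniProtKB.
//
"""

rows = list(iter_go_terms(io.StringIO(SAMPLE)))
for entry, go_id, aspect, term, evidence in rows:
    print(entry, go_id, aspect, term, evidence, sep="\t")
```

For a real file, replace the `io.StringIO(SAMPLE)` handle with `open("example.dat")`; because the function is a generator, it streams through large files one line at a time.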
