Issue with parsing publication data from PubMed with Entrez

I am trying to use Entrez to import publication data into a database. The search part works fine, but when I try to parse:

from Bio import Entrez

def create_publication(pmid):

    handle = Entrez.efetch("pubmed", id=pmid, retmode="xml")
    records = Entrez.parse(handle)
    item_data = next(records)
    handle.close()

... I get the following error:

File "/venv/lib/python2.7/site-packages/Bio/Entrez/Parser.py", line 296, in parse
    raise ValueError("The XML file does not represent a list. Please use Entrez.read instead of Entrez.parse")
ValueError: The XML file does not represent a list. Please use Entrez.read instead of Entrez.parse

This code used to work until a few days ago. Any ideas what might be going wrong here?

Also, following the example listed in the source code (http://biopython.org/DIST/docs/api/Bio.Entrez-pysrc.html) gives the same error:

from Bio import Entrez 
Entrez.email = "[email protected]"
handle = Entrez.efetch("pubmed", id="19304878,14630660", retmode="xml")    
records = Entrez.parse(handle) 
for record in records: 
    print(record['MedlineCitation']['Article']['ArticleTitle']) 
handle.close()

There is 1 answer

Answer by nbryans (best answer)

As documented in other comments and in the GitHub issue, the problem is caused by a deliberate change made by the NCBI Entrez Utilities developers. As Jhird describes in that issue, you can change your code to the following:

from Bio import Entrez
Entrez.email = "[email protected]"
handle = Entrez.efetch("pubmed", id="19304878,14630660", retmode="xml")

records = Entrez.read(handle)       # Difference here: read() instead of parse()
records = records['PubmedArticle']  # New line here: the articles are under this key

for record in records:
    print(record['MedlineCitation']['Article']['ArticleTitle'])
handle.close()
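
For the single-PMID function in the question, the same change might look roughly like the sketch below. The helper names extract_articles and article_titles are made up for illustration and are not part of Biopython. The sketch assumes that Entrez.read returns a dict-like object with a 'PubmedArticle' key (as in the snippet above), and falls back to treating the result as a plain list of articles in case an older layout is encountered.

```python
def extract_articles(parsed):
    """Return the article records from a parsed PubMed efetch result.

    Hypothetical helper: assumes the result is either a dict-like object
    holding the articles under 'PubmedArticle', or already a list of articles.
    """
    if isinstance(parsed, dict):
        # Layout described in the answer: articles nested under one key.
        return list(parsed["PubmedArticle"])
    # Fallback: the parsed result is itself the list of articles.
    return list(parsed)


def article_titles(parsed):
    """Collect the ArticleTitle of every article in a parsed result."""
    return [
        record["MedlineCitation"]["Article"]["ArticleTitle"]
        for record in extract_articles(parsed)
    ]


def create_publication(pmid):
    """Fetch one PubMed record and return its parsed data, or None if absent."""
    # Imported here so the helpers above work without Biopython installed.
    from Bio import Entrez

    # Entrez.email should be set before calling this, as in the answer above.
    handle = Entrez.efetch("pubmed", id=pmid, retmode="xml")
    try:
        parsed = Entrez.read(handle)  # read() instead of parse()
    finally:
        handle.close()  # close the handle even if parsing fails

    articles = extract_articles(parsed)
    return articles[0] if articles else None
```

Keeping the unwrapping step in one small function means the rest of the import code does not need to change if the response layout changes again.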