Searching Entrez based on paper title

Question

65 views Asked by Intrastellar Explorer At 02 November 2023 at 18:31

I am trying to search NCBI's Entrez based on a title. Here are my GET request's URL and parameters:

import requests

url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
# SEE: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8021862/
params = {
    "tool": "foo",
    "email": "[email protected]",
    "api_key": None,
    "retmode": "json",
    "db": "pubmed",
    "retmax": "1",
    "term": '"Interpreting Genetic Variation in Human and Cancer Genomes"[Title]',
}
response = requests.get(url, params=params, timeout=15.0)
response.raise_for_status()
result = response.json()["esearchresult"]

However, I am getting no results: result["count"] is 0. How can I search Entrez based on a paper's title?

When answering, feel free to use requests directly, or common Entrez wrappers like biopython's Bio.Entrez or easy-entrez. I am using Python 3.11.

**Intrastellar Explorer** · Accepted Answer · 2023-11-02T21:04:40+00:00

Okay, turns out nothing is ever easy.

TL;DR use proximity search with a distance of 0.

"term": '"Interpreting Genetic Variation in Human and Cancer Genomes"[Title:~0]'

Starting at the PubMed User Guide, the "Searching for a phrase" section talks about PubMed's phrase index. Let's start by seeing what the phrase index contains for this search.

Going to the PubMed Advanced Search Builder and hitting the "Show Index" button, we observe that the search query is not in the phrase index. Now back in the "Searching for a phrase" section, we see:

If you use quotes and the phrase is not found in the phrase index, the quotes are ignored and the terms are processed using automatic term mapping.

Okay, so it seems automatic term mapping (ATM) is failing us as well. Let's keep reading in the "Quoted phrase not found" section:

To search for a phrase that is not found in the phrase index, use a proximity search with a distance of 0 (...); this will search for the quoted terms appearing next to each other, in any order.

Now, trying that proximity search with 0 distance:

import requests

url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
# SEE: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8021862/
params = {
    "tool": "foo",
    "email": "[email protected]",
    "api_key": None,
    "retmode": "json",
    "db": "pubmed",
    "retmax": "1",
    "term": '"Interpreting Genetic Variation in Human and Cancer Genomes"[Title:~0]',
}
response = requests.get(url, params=params, timeout=15.0)
response.raise_for_status()
result = response.json()["esearchresult"]
print(result["count"])  # Prints: 1
print(result["idlist"][0])  # Prints: 33834021

It works! Case closed.
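
To reuse this for other titles, the term can be built by a small helper. This is only a sketch: build_title_term is a hypothetical name, and replacing embedded double quotes with spaces is my assumption (they would otherwise end the quoted phrase early), not something the PubMed User Guide prescribes.

```python
def build_title_term(title: str, distance: int = 0) -> str:
    """Build an ESearch term that runs a Title proximity search for `title`.

    Hypothetical helper: the name and the quote handling are assumptions.
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    # Assumption: drop embedded double quotes so they cannot close the
    # quoted phrase early, then collapse runs of whitespace.
    cleaned = " ".join(title.replace('"', " ").split())
    if not cleaned:
        raise ValueError("title must contain at least one term")
    return f'"{cleaned}"[Title:~{distance}]'


term = build_title_term("Interpreting Genetic Variation in Human and Cancer Genomes")
print(term)  # Prints: "Interpreting Genetic Variation in Human and Cancer Genomes"[Title:~0]
```

The returned string can be passed as the "term" entry of the params dict used in the request above.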

Notes:

1. Searching for a whole title in exact order (via double quotes) doesn't work, because PubMed doesn't index full titles in its phrase index.
2. A zero-distance proximity search has a downside: it doesn't enforce exact term ordering. However, it's a viable workaround for point 1.
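
If exact ordering matters, one option is to check the returned titles on the client side. The sketch below assumes you have already fetched the candidates' titles (for example with a follow-up ESummary request, not shown here) into a dict mapping PMID to title. normalize_title and filter_exact_title are hypothetical names, and the second candidate is made-up input for illustration only.

```python
import re


def normalize_title(title: str) -> str:
    """Lowercase the title and reduce punctuation/whitespace runs to single spaces."""
    return re.sub(r"[^\w]+", " ", title.casefold()).strip()


def filter_exact_title(wanted: str, candidates: dict[str, str]) -> list[str]:
    """Return the PMIDs whose title matches `wanted` with the same word order.

    `candidates` maps PMID -> title, as fetched separately (assumption).
    """
    target = normalize_title(wanted)
    return [
        pmid
        for pmid, title in candidates.items()
        if normalize_title(title) == target
    ]


# Hypothetical input: the first entry reuses the PMID and title from above,
# the second is an invented reordering that a zero-distance search could match.
candidates = {
    "33834021": "Interpreting Genetic Variation in Human and Cancer Genomes.",
    "0": "Genomes cancer and human in variation genetic interpreting",
}
matches = filter_exact_title(
    "Interpreting Genetic Variation in Human and Cancer Genomes", candidates
)
print(matches)  # Prints: ['33834021']
```

The normalization ignores case and punctuation, so a trailing period on the stored title does not cause a mismatch.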
