Download all PubMed abstracts


Does anyone know how I can easily download all of the PubMed article abstracts? I am working on a text mining project.

The closest tool I can find downloads a single abstract given a PMID, which would be too slow for my purpose.


There are 5 answers

0
Matt Yoon On

As of 2021, you can access the corpus through the simple API of Hugging Face Datasets.

https://huggingface.co/datasets/pubmed

0
user4724822 On

Searching for "0000/01/01"[PDAT] : "3000/12/31"[PDAT] should get you every article from the beginning of time.

The "Send to" function shown right above the search results may let you download everything.

Alternatively, you can write a script that uses the Entrez Programming Utilities (E-utilities) from the NCBI.

You can run a search query with ESearch, which returns the matching PMIDs, and then pass those PMIDs to EFetch to retrieve the records. It is explained in this book/manual: http://www.ncbi.nlm.nih.gov/books/NBK25501/

Chapter 3 contains some example scripts which should get you started: http://www.ncbi.nlm.nih.gov/books/NBK25498/#chapter3

You will get XML files containing the abstract and all other data.

25 million XML files...
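The ESearch-then-EFetch flow described above can be sketched roughly as follows. This is a minimal illustration, not NCBI's own sample code: the helper names (`esearch_url`, `efetch_url`, `parse_pmids`, `parse_abstracts`, `fetch`) are made up for this example, and the element paths assume the usual PubMed XML layout. NCBI documents limits on request rate and on how many records a single search can return, so check the current E-utilities guidelines before running anything at scale.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, retstart=0, retmax=100):
    """Build an ESearch URL that returns one page of PMIDs for `term`."""
    params = {"db": "pubmed", "term": term, "retstart": retstart, "retmax": retmax}
    return f"{EUTILS}/esearch.fcgi?{urllib.parse.urlencode(params)}"


def efetch_url(pmids):
    """Build an EFetch URL that returns full XML records for the given PMIDs."""
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"}
    return f"{EUTILS}/efetch.fcgi?{urllib.parse.urlencode(params)}"


def parse_pmids(esearch_xml):
    """Extract the PMIDs from an ESearch XML response."""
    root = ET.fromstring(esearch_xml)
    return [node.text for node in root.findall("./IdList/Id")]


def parse_abstracts(efetch_xml):
    """Map PMID -> abstract text for every article in an EFetch XML response."""
    root = ET.fromstring(efetch_xml)
    abstracts = {}
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext("./MedlineCitation/PMID")
        sections = article.findall("./MedlineCitation/Article/Abstract/AbstractText")
        # itertext() keeps text nested inside inline markup such as <i> or <sup>.
        parts = ["".join(node.itertext()).strip() for node in sections]
        abstracts[pmid] = " ".join(part for part in parts if part)
    return abstracts


def fetch(url):
    """Download a URL and return the raw response bytes (needs network access)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()
```

A single page could then be retrieved with something like `parse_abstracts(fetch(efetch_url(parse_pmids(fetch(esearch_url('"0000/01/01"[PDAT] : "3000/12/31"[PDAT]'))))))`, increasing `retstart` on each pass to walk through the result set.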

0
JDR On

I am aware this question has gone a bit stale, but they have a process for exactly this use case: large-scale text mining projects.

You can get the data via a free licensing agreement - more information here.

2
RMagauran On

You can get ALL the data from NLM directly via FTP.

https://www.nlm.nih.gov/databases/download/terms_and_conditions_pubmed.html

Download and work away without worrying about e-utils.
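Once the bulk files are on disk, pulling the abstracts out is mostly a streaming XML parse. The sketch below assumes the downloads are gzipped XML files whose root element holds many `PubmedArticle` records (the layout I would expect under https://ftp.ncbi.nlm.nih.gov/pubmed/baseline/, but verify against the files you actually receive). The function name `iter_abstracts` and the file name in the usage note are hypothetical.

```python
import gzip
import xml.etree.ElementTree as ET


def iter_abstracts(path):
    """Yield (pmid, abstract) pairs from one gzipped PubMed XML file.

    Articles without an abstract are skipped.
    """
    with gzip.open(path, "rb") as handle:
        # iterparse streams the file, so the whole document is never
        # held in memory as one parsed tree.
        for _event, elem in ET.iterparse(handle, events=("end",)):
            if elem.tag != "PubmedArticle":
                continue
            pmid = elem.findtext("./MedlineCitation/PMID")
            sections = elem.findall("./MedlineCitation/Article/Abstract/AbstractText")
            parts = ["".join(node.itertext()).strip() for node in sections]
            abstract = " ".join(part for part in parts if part)
            if abstract:
                yield pmid, abstract
            # Drop the children of the article we just processed.
            elem.clear()
```

For example, `for pmid, text in iter_abstracts("pubmed_sample.xml.gz"): ...` would walk one downloaded file; looping over every file in the download directory covers the full corpus.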

0
gawi On

I would use the RESTful API provided by Europe PMC. It returns 25 articles per query in JSON or XML format. Example queries for articles about malaria would look like:

You can use a different search query format, depending on what you want to retrieve.
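As an illustration of what such a query might look like, here is a rough sketch against the Europe PMC search endpoint. The parameter names (`query`, `format`, `resultType`, `pageSize`, `cursorMark`) and the response fields (`resultList`, `result`, `abstractText`, `nextCursorMark`) reflect my understanding of the Europe PMC REST service and should be checked against its current documentation; the function names are invented for this example.

```python
import json
import urllib.parse
import urllib.request

BASE = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def search_url(query, cursor_mark="*", page_size=25, fmt="json"):
    """Build a search URL for one page of results."""
    params = {
        "query": query,
        "format": fmt,
        "resultType": "core",  # assumed to be the result type that includes abstracts
        "pageSize": page_size,
        "cursorMark": cursor_mark,
    }
    return f"{BASE}?{urllib.parse.urlencode(params)}"


def extract_abstracts(payload):
    """Map record id -> abstract for results that carry an abstract."""
    results = payload.get("resultList", {}).get("result", [])
    return {r["id"]: r["abstractText"] for r in results if "abstractText" in r}


def iter_pages(query):
    """Yield one {id: abstract} dict per result page (needs network access)."""
    cursor = "*"
    while True:
        with urllib.request.urlopen(search_url(query, cursor)) as resp:
            payload = json.load(resp)
        yield extract_abstracts(payload)
        next_cursor = payload.get("nextCursorMark")
        # Stop when the service gives no new cursor to continue from.
        if not next_cursor or next_cursor == cursor:
            break
        cursor = next_cursor
```

Calling `search_url("malaria")` gives a URL you can paste into a browser to inspect the raw response, and `iter_pages("malaria")` follows the cursor from page to page.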