I need some help extracting affiliation information from PubMed search results in R. I have already successfully extracted affiliation information from the XML of a single PubMed ID, but now I have a search string made up of multiple terms, and I need to extract the affiliation information for every article it returns. The goal is to build a data frame with columns such as PMID, author, country, state, etc.
This is my code so far:
library(easyPubMed)
my_query <- "PubMed Search String"  # placeholder; the real search string is omitted
my_entrez_id <- get_pubmed_ids(my_query)
my_abstracts_txt <- fetch_pubmed_data(my_entrez_id, format = "abstract")
The PubMed search string is very long, which is why I haven't included it here. The main aim is to produce a data frame from this search string: a table that clearly shows the affiliation and other general information for each PubMed article.
Any help would be greatly appreciated!
Have you tried the pubmedR package? https://cran.rstudio.com/web/packages/pubmedR/index.html

You can use the built-in function my_pm_df <- pmApi2df(my_request), but this will not provide affiliations for all authors.

You can use a combination of pluck() and map() from purrr to extract what you need into a tibble.

All author data is contained in that nested list, in the Author$AffiliationInfo list (note it is a list because one author can have multiple affiliations).

=================================================
EDIT based on comments:
First construct your request URLs. Make sure you replace &email with your email address:

I like to wrap my API requests in safely to catch any errors. Then use map to loop through the my_query vector. Note that we Sys.sleep for 5 seconds after each request to comply with PubMed's rate limit. You can probably cut this down to a few seconds or even less; check the API documentation.

Next we parse the request with content() in read_xml(). Note that we are parsing the result:

This can probably be cleaned up some, but it works. Coerce the AuthorInfo to a list and use a combination of map(), pluck() and unnest(). Note that a given author might have more than one affiliation, but I am only plucking the first one.
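For reference, here is a minimal sketch of the URL-building and parsing steps. It is not the exact code from this answer: the helper names build_efetch_url() and extract_affiliations() are made up for illustration, the sample XML is invented (including the PMID), and instead of the map()/pluck()/unnest() chain it relies on xml2's xml_find_first(), which returns one result per author and NA where an author has no affiliation. Like the approach above, it keeps only the first affiliation per author.

```r
library(xml2)
library(tibble)

# Hypothetical helper: build one E-utilities efetch URL for a vector of PMIDs.
build_efetch_url <- function(pmids, email) {
  paste0(
    "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi",
    "?db=pubmed&retmode=xml",
    "&id=", paste(pmids, collapse = ","),
    "&email=", utils::URLencode(email, reserved = TRUE)
  )
}

# Hypothetical helper: one row per author, keeping only the first affiliation.
extract_affiliations <- function(doc) {
  articles <- xml_find_all(doc, ".//PubmedArticle")
  rows <- lapply(seq_along(articles), function(i) {
    article <- articles[[i]]
    authors <- xml_find_all(article, "./MedlineCitation/Article/AuthorList/Author")
    pmid <- xml_text(xml_find_first(article, "./MedlineCitation/PMID"))
    data.frame(
      pmid = rep(pmid, length(authors)),
      last_name = xml_text(xml_find_first(authors, "./LastName")),
      fore_name = xml_text(xml_find_first(authors, "./ForeName")),
      # xml_find_first() yields a missing node (NA text) when there is no affiliation
      affiliation = xml_text(xml_find_first(authors, "./AffiliationInfo/Affiliation")),
      stringsAsFactors = FALSE
    )
  })
  as_tibble(do.call(rbind, rows))
}

# Invented sample record, shaped like PubMed's efetch XML, so this runs offline.
sample_xml <- '<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>00000001</PMID>
      <Article>
        <AuthorList>
          <Author>
            <LastName>Doe</LastName><ForeName>Jane</ForeName>
            <AffiliationInfo><Affiliation>Dept A, Univ X, USA.</Affiliation></AffiliationInfo>
            <AffiliationInfo><Affiliation>Inst B, Canada.</Affiliation></AffiliationInfo>
          </Author>
          <Author>
            <LastName>Roe</LastName><ForeName>Sam</ForeName>
          </Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>'

example_url <- build_efetch_url(c("1", "2"), "me@example.org")
affil_df <- extract_affiliations(read_xml(sample_xml))
print(affil_df)
```

With real data you would pass each successfully fetched document to extract_affiliations() and bind the results together. Splitting the affiliation string into country, state and so on is a separate text-parsing problem, since PubMed stores each affiliation as one free-text field.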