Is there a faster way to find synonyms for a large list of taxa in R?


I have a list of about 96,000 species names for which I need to collect all synonyms. I tried the synonyms() function from the 'taxize' package, which returns the information I need, but my list is too long for it to run to completion. I have also looked into the 'taxizedb' package, which some users have suggested is faster, but I am not sure which of its functions would accomplish what I am trying to do.

Any suggestions would be greatly appreciated! Thanks!

Code so far:

library("taxize")
library("tidyverse")

#load in list of species (~96,000)
#vspli <- read.csv(file="AllBHLspecieslist.csv", header=TRUE) #my code
vspli <- c("Acer obtusatum", "Acer interius", "Acer opalus", "Acer saccharum", "Acer palmatum") #workable example
#Use taxize to search for synonyms
#currently this line of code crashes before completion when using the list of 96k species
synlist1 <- synonyms(vspli, db="itis", rows=1)

There are 1 answers

mfertakos On BEST ANSWER

In case anyone comes across this later: I found the package 'taxadb', which works from a local copy of the ITIS database and solved this problem much faster. Here is the code in case it proves useful:

library(taxadb)
library(dplyr) #provides %>%, select(), mutate(), left_join() and filter() used below

#create local itis database
td_create("itis",overwrite=FALSE)

#read in the full species list
allnames<-read.csv(file="AllBHLspecieslist.csv", header=TRUE)

#get IDs for each scientific name
syn1<-allnames %>%
  select(Scientific.Name) %>%
  mutate(ID=get_ids(Scientific.Name,"itis"))

#Deal with NAs (names that correspond to more than one ITIS code; ~10k names)

syn1_NA<-data.frame(name=syn1$Scientific.Name[is.na(syn1$ID)])

NA_IDS<-NULL
for(i in unique(syn1_NA$name)){
  #keep the accepted ID(s) of every ITIS record matching this name
  tmp<-as.data.frame(filter_name(i, 'itis')["acceptedNameUsageID"])
  if(nrow(tmp)==0) next #skip names with no ITIS record at all
  tmp$name<-i
  NA_IDS<-rbind(NA_IDS,tmp)
}

#join with original names
colnames(syn1)<-c("name","ID")
#a left join keeps every original name, whether or not it has a match in NA_IDS
IDS<-left_join(syn1,NA_IDS,by="name")

#extract just the unique, non-missing IDs
IDS<-unique(c(IDS$ID,IDS$acceptedNameUsageID))
IDS<-data.frame(ID=IDS[!is.na(IDS)]) #!is.na() drops every NA; -is.na() does not

#extract every species-level name in ITIS (accepted names and synonyms alike)
#set query
ITIS<-taxa_tbl("itis") %>%
  select(scientificName,taxonRank,acceptedNameUsageID,taxonomicStatus) %>%
  filter(taxonRank == "species")

#see query
ITIS %>% show_query()
#retrieve results
ITIS_names<-ITIS %>% collect()

#filter to only those that match ITIS codes for all my species
ITIS_names<-ITIS_names %>%
  filter(acceptedNameUsageID %in% IDS$ID)
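
The filtered table holds every ITIS name that shares an accepted ID with the input list, but it does not say which input name each row belongs to. Below is a minimal sketch of one way to attach them, using base R merge() on toy data. The objects name_ids and itis_names, the species names, and the IDs are all made up for illustration, and the sketch assumes each input name has already been reduced to a single ID column.

```r
# Toy stand-ins for the real tables; every name and ID here is invented.
name_ids <- data.frame(
  name = c("Genus alpha", "Genus beta", "Genus gamma"),
  ID   = c("ID:1", "ID:1", "ID:2"),
  stringsAsFactors = FALSE
)
itis_names <- data.frame(
  scientificName      = c("Genus alpha", "Genus beta", "Genus gamma", "Genus delta"),
  acceptedNameUsageID = c("ID:1", "ID:1", "ID:2", "ID:2"),
  taxonomicStatus     = c("accepted", "synonym", "accepted", "synonym"),
  stringsAsFactors = FALSE
)

# Pair each input name with every name that shares its accepted ID.
syn_long <- merge(name_ids, itis_names,
                  by.x = "ID", by.y = "acceptedNameUsageID")

# Drop the rows where a name is paired with itself.
syn_long <- syn_long[syn_long$name != syn_long$scientificName,
                     c("name", "scientificName", "taxonomicStatus")]
names(syn_long)[2] <- "related_name"

# Sort for readability and reset the row names.
syn_long <- syn_long[order(syn_long$name, syn_long$related_name), ]
rownames(syn_long) <- NULL
print(syn_long)
```

The result is in long format, one row per (input name, related name) pair, which is usually easier to filter or summarise than a list column. With the real tables the same join could be written with dplyr::left_join() instead.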