Get data from BIEN database using R for species names including characters like "-" and "x"


I'm trying to load data from the botanical database BIEN in R. I'm running into a problem: no data seem to be loaded for the three species "Abies borisii-regis", "Abies equi-trojani" and "Abies bornmuelleriana". I suspect that the species names cause the problem, since the code works for other species whose names don't include characters like "-". I use the following code:

library(BIEN)
occ <- BIEN_occurrence_species(species = c("Abies borisii-regis"), 
                           cultivated = FALSE,
                           natives.only = TRUE,
                           collection.info = FALSE,
                           all.taxonomy = TRUE)
ccdf <- data.frame(species = occ$scrubbed_species_binomial, # ccdf = clean coordinate data frame 
               decimalLongitude = occ$longitude,
               decimalLatitude = occ$latitude,
               dataset = "BIEN")

and receive this error message:

Error in data.frame(species = occ$scrubbed_species_binomial, decimalLongitude = occ$longitude,  : 
  arguments imply differing number of rows: 0, 1 

(The German original is:)

Fehler in data.frame(species = occ$scrubbed_species_binomial, decimalLongitude = occ$longitude,  : 
  Argumente implizieren unterschiedliche Anzahl Zeilen: 0, 1 

I already tried to make sure that I'm using the correct species names by checking them with TNRS (Taxonomic Name Resolution Service). Based on this, the correct names seem to be "Abies x borisii-regis" and "Abies nordmanniana subsp. equi-trojani", but I still receive the error message.

Does anyone know how to fix this so that I can get the data? Or does anyone know if I have to type in the characters "-" and "x" from the species names in a different way, so that R recognizes them correctly?


There is 1 answer

Chris On

BIEN does provide a way to check whether other researchers have uploaded data for your species of interest:

species_lst = BIEN::BIEN_list_all()
which(stringr::str_detect(species_lst$species, 'Abies +') == TRUE)
 [1]  11559  11597  29860  30220  33864  40501  40867  41220  42231  47302
[11]  55005  56268  59129  59968  66001  88025  91275  92130  98312 112850
[21] 117033 119440 127698 136841 144468 155411 159019 167615 168374 169346
[31] 169825 174149 202565 203818 204107 212636 213108 217599 220297 227342
[41] 235460 239224 266549 266587 278250 278452 288153 294894 302138 305755
[51] 308746 310182 316280 319215 320767 320770 321848 323917 328317

Is there data for your species?

"Abies borisii-regis" %in% species_lst$species
[1] TRUE
"Abies bornmuelleriana" %in% species_lst$species
[1] FALSE
"Abies equi-trojani" %in% species_lst$species
[1] FALSE

which seems to suggest that other researchers have yet to upload data for two of your species of interest, though you might decide to, and BIEN facilitates that process.

In this instance you appear to be the first researcher to potentially provide data for "Abies equi-trojani" and "Abies bornmuelleriana", while you can both find and share data for "Abies borisii-regis".
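
To check several candidate names in one go, something along these lines may help. It is only a minimal sketch: `known_species` is a tiny stand-in for `species_lst$species` with illustrative contents, and `candidates` and `absent` are names made up for this example. Note that `%in%` compares plain strings exactly, so a "-" in a name needs no escaping.

```r
# Stand-in for species_lst$species (assumption: in practice this would come
# from BIEN::BIEN_list_all()$species; the contents here are illustrative only)
known_species <- c("Abies alba", "Abies borisii-regis")

candidates <- c("Abies borisii-regis",
                "Abies bornmuelleriana",
                "Abies equi-trojani")

# Named logical vector: TRUE where the exact name occurs in the list
present <- setNames(candidates %in% known_species, candidates)
print(present)

# Names that would need a synonym or another data source
absent <- names(present)[!present]
print(absent)
```

The names left in `absent` are the ones worth re-checking under a synonym before concluding that no data exist.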

split_all = do.call(rbind, strsplit(species_lst$species, split= "(?<=[a-zA-Z])(?=[0-9])", perl = TRUE))

species_df = data.frame(id = seq_len(nrow(split_all)), species = split_all[, 1])  # row count taken from the data instead of hard-coded

which(species_df$species == "Abies borisii-regis")
[1] 174149

It seemed more convenient to split, build a data.frame, and then search. Moving on to the query itself:

num_rec_hits = BIEN_occurrence_records_per_species(species = c("Abies borisii-regis","Abies bornmuelleriana","Abies equi-trojani"))
Getting page 1 of records
num_rec_hits
  scrubbed_species_binomial count
1       Abies borisii-regis    45
45/lengths(species_lst)
     species 
0.0001348201

So there are some records, though very few. A plain request, i.e. BIEN_occurrence_species("Abies borisii-regis"), returns without error, but the result is empty. The arguments then form a somewhat bewildering 'truth table' to explore to see whether anything other than an empty result can be returned. Compare only.geovalid = TRUE with FALSE:

a_hit = BIEN_occurrence_species("Abies borisii-regis", collection.info = TRUE, only.geovalid = FALSE)
Getting page 1 of records
a_hit
  scrubbed_species_binomial latitude longitude date_collected datasource
1       Abies borisii-regis       NA        NA           <NA>       GBIF
2       Abies borisii-regis       NA        NA           <NA>       GBIF
3       Abies borisii-regis       NA        NA     1981-09-21       GBIF
4       Abies borisii-regis       NA        NA           <NA>       GBIF
  dataset dataowner custodial_institution_codes collection_code datasource_id
1       E         E                           E               E          4362
2       E         E                           E               E          4362
3    TAIF      TAIF                        TAIF           PLANT          4813
4       S         S                           S  VascularPlants          4748
  catalog_number recorded_by record_number date_collected identified_by
1     1228027409        <NA>          <NA>           <NA>          <NA>
2     1228032008        <NA>          <NA>           <NA>          <NA>
3     1821797004   R. Warren           217     1981-09-21          <NA>
4     1095965342        <NA>          <NA>           <NA>          <NA>
  date_identified identification_remarks is_geovalid
1            <NA>                   <NA>           0
2            <NA>                   <NA>           0
3            <NA>                   <NA>           0
4            <NA>                   <NA>           0

a_hit2 = BIEN_occurrence_species("Abies borisii-regis", collection.info = TRUE, only.geovalid = TRUE)
Getting page 1 of records
a_hit2
 [1] scrubbed_species_binomial   latitude                   
 [3] longitude                   date_collected             
 [5] datasource                  dataset                    
 [7] dataowner                   custodial_institution_codes
 [9] collection_code             datasource_id              
[11] catalog_number              recorded_by                
[13] record_number               date_collected             
[15] identified_by               date_identified            
[17] identification_remarks     
<0 rows> (or 0-length row.names)

By changing the TRUE/FALSE defaults we have managed to surface 4 records, and judging by the is_geovalid column, none of them is geovalid yet. The defaults are:

BIEN_occurrence_species(
       species,
       cultivated = FALSE,
       new.world = NULL,
       all.taxonomy = FALSE,
       native.status = FALSE,
       natives.only = TRUE,
       observation.type = FALSE,
       political.boundaries = FALSE,
       collection.info = FALSE,
       only.geovalid = TRUE,
       ...
     )
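
One way to walk through such a truth table systematically is to enumerate the combinations with `expand.grid` and record how many rows each one returns. The sketch below is an assumption-laden illustration, not a tested BIEN workflow: `count_rows` is a hypothetical stand-in that returns `NA` so the example runs offline, and only three of the arguments are varied.

```r
# Every TRUE/FALSE combination of three of the arguments listed above
combos <- expand.grid(only.geovalid = c(TRUE, FALSE),
                      natives.only  = c(TRUE, FALSE),
                      cultivated    = c(TRUE, FALSE))

# Hypothetical stand-in for the real query so the sketch runs without a
# connection. With BIEN installed, the body could instead be something like
# nrow(BIEN::BIEN_occurrence_species("Abies borisii-regis",
#        only.geovalid = only.geovalid, natives.only = natives.only,
#        cultivated = cultivated))
count_rows <- function(only.geovalid, natives.only, cultivated) {
  NA_integer_
}

# One call per row of the grid; the result is stored next to the settings
combos$n_rows <- mapply(count_rows,
                        combos$only.geovalid,
                        combos$natives.only,
                        combos$cultivated)
print(combos)
```

With the real query plugged in, the `n_rows` column would show at a glance which settings are responsible for an empty result.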

Going further, with natives.only = FALSE as well:

a_hit3 = BIEN_occurrence_species("Abies borisii-regis", collection.info = TRUE, only.geovalid = FALSE, natives.only = FALSE)
tail(a_hit3)
    scrubbed_species_binomial latitude longitude date_collected datasource
104       Abies borisii-regis 40.93083  21.78695     1938-09-01       GBIF
105       Abies borisii-regis 41.76737  23.39889     1979-09-09       GBIF
106       Abies borisii-regis 41.76722  23.39889     1979-09-09       GBIF
107       Abies borisii-regis 42.13000  23.13000     1926-08-01       GBIF
108       Abies borisii-regis 42.14678  23.36869           <NA>       GBIF
109       Abies borisii-regis 41.65000  24.63000     1930-12-24       GBIF
    dataset dataowner custodial_institution_codes collection_code datasource_id
104    MNHN      MNHN                        MNHN               P          4620
105    MNHN      MNHN                        MNHN               P          4620
106    MNHN      MNHN                        MNHN               P          4620
107       K         K                           K       Herbarium          4513
108     SEV       SEV                         SEV     Herbariosev          4764
109       K         K                           K       Herbarium          4513
    catalog_number                     recorded_by record_number date_collected
104      439257047                     Humbert, H.          <NA>     1938-09-01
105      437258780                    Debreczy, Z.          <NA>     1979-09-09
106      438918642                    Debreczy, Z.          <NA>     1979-09-09
107     1020042023               Turrill, Dr. W.B.          1569     1926-08-01
108     2235618931 J.M. Sánchez-Robles & A. Terrab          <NA>           <NA>
109     1020039052                      Tedd, H.G.           537     1930-12-24
    identified_by date_identified identification_remarks is_geovalid
104          <NA>            <NA>                   <NA>           0
105          <NA>            <NA>                   <NA>           1
106          <NA>            <NA>                   <NA>           1
107    Farjon, A.            <NA>                   <NA>           1
108          <NA>            <NA>                   <NA>           1
109    Farjon, A.            <NA>                   <NA>           1

Ignore what I said about only.geovalid: there do appear to be some records with longitude and latitude (is_geovalid = 1), but is setting these arguments to FALSE the only way to get to them? Truth tables are tough sledding.
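
As for the error in the question: `data.frame()` fails when some columns have zero rows and `dataset = "BIEN"` has length 1. A possible way around it, sketched here on a made-up stand-in for the BIEN result (the coordinates are invented, only the column names follow the output above), is to keep the rows that have coordinates and add the constant column afterwards, sized to the row count.

```r
# Stand-in for a BIEN result (made-up values, column names as in the output above)
occ <- data.frame(
  scrubbed_species_binomial = c("Abies borisii-regis", "Abies borisii-regis"),
  latitude  = c(NA, 41.5),
  longitude = c(NA, 23.5),
  stringsAsFactors = FALSE
)

# Keep only rows that have both coordinates
has_coords <- !is.na(occ$latitude) & !is.na(occ$longitude)
occ_xy <- occ[has_coords, , drop = FALSE]

# Build the cleaned table from the subset
ccdf <- data.frame(species          = occ_xy$scrubbed_species_binomial,
                   decimalLongitude = occ_xy$longitude,
                   decimalLatitude  = occ_xy$latitude,
                   stringsAsFactors = FALSE)

# Repeat the constant to the row count, which also works for zero rows
ccdf$dataset <- rep("BIEN", nrow(ccdf))
print(ccdf)
```

If the query returns nothing, `ccdf` simply ends up with zero rows instead of raising the "0, 1" error, which makes the empty case easy to test for with `nrow(ccdf) == 0`.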