R - most efficient way to get specific row from nested dataframe inside a dataframe

441 views Asked by At

I am working with API data from the FoodData Central API. The JSON returned includes a row for each matching food, plus a nested dataframe with specific nutritional information (foodNutrients) for this same food. Note that the foodNutrients dataframe is not always available, and the number of columns it returns varies.

I am trying to pull 2 specific columns (value, unitName) from 1 specific row (where nutrientId==1008) of the foodNutrients df, when available for a given row of the parent df.

I'm looking for the most efficient way to do what this ugly, ugly code is doing:

##############################################################################
# ugly hacky version
library(jsonlite)
library(dplyr)

# search FoodData Central API and get JSON
api_key = "DEMO_KEY"
search_keyword = "banana cream pie"
endpoint_url <- paste("https://api.nal.usda.gov/fdc/v1/foods/search?api_key=",api_key,"&query=",gsub(" ","%20",search_keyword),"&requireAllWords=true",sep="")
json <- fromJSON(endpoint_url)

# here is our dataframe which contains a list containing another dataframe (foodNutrients)
df1 <- json$foods %>%
  select(fdcId, description, brandOwner, ingredients, foodNutrients)

# here is some hacky code to show what I am trying to do
# yes, this is probably the stupidest way I could code this
# may God have mercy on my soul

# help me Obi Wan, you're my only hope

# hold output here
output <- data.frame() 

# yes, I am using a loop, which is dumb
for(i in 1:nrow(df1)) {
  
  # grab the nested dataframe and add our index
  nested_df <- df1[i,"foodNutrients"][[1]]

  # hack
  chk <- data.frame() 
  
  # if nested_df exists and has all needed columns...
  if(nrow(nested_df)>0 & 
     "value" %in% colnames(nested_df) & 
     "unitName" %in% colnames(nested_df) & 
     "nutrientId" %in% colnames(nested_df)) {chk <- nested_df %>% filter(nutrientId==1008)}
  
  # pull out energy if available
  if(nrow(chk)==1) {
    energy_value <- chk[1,"value"]
    energy_unit <- chk[1,"unitName"]    
  } else {
    energy_value <- 0
    energy_unit <- NA    
  }
  
  # add a row to a new dataframe with everything I want
  row <- data.frame(fdcId=df1[i,"fdcId"],
                    description=df1[i,"description"],
                    brandOwner=df1[i,"brandOwner"],
                    ingredients=df1[i,"ingredients"],
                    energy_value,
                    energy_unit)
  output <- rbind(output, row)  
}

# hello there!
print(head(output))

I have been playing around with various map() functions but have yet to find the right combination that does what I am trying to do here. How do I get this same result without a loop and pile of hacks?

1

There are 1 answers

0
Ben On

I'm not sure this is exactly what you're looking for, but thought...

Be mindful of your thoughts...they betray you...(says Obi Wan)

...so I tried map and unnest of the foodNutrients column:

library(tidyverse)

df1 %>%
  mutate(foodNutrients = map(foodNutrients, as_tibble)) %>%
  unnest(cols = foodNutrients) %>%
  filter(nutrientId == 1008) %>%
  select(value, unitName)

And, if you want to add zeros when no energy available, you can add:

right_join(df1, by = "fdcId") %>%
mutate(value = replace_na(value, 0)) %>%

before select.

Does this accomplish what you needed?

Output

# A tibble: 43 x 2
   value unitName
   <dbl> <chr>   
 1   292 KCAL    
 2   422 KCAL    
 3   300 KCAL    
 4   247 KCAL    
 5   231 KCAL    
 6   254 KCAL    
 7   269 KCAL    
 8   237 KCAL    
 9   269 KCAL    
10   251 KCAL    
# … with 33 more rows