Convert Tibble to Parameter List

I am trying to convert a tibble to a parameter list for a function call. I want to create a simple file-specification tibble for reading multiple fixed-width files with varying columns. That way I only need to specify which columns a file contains, using pull and select, and the file can then be loaded and parsed automatically. However, I am running into problems using the cols object to specify column formats.

For this example, let's assume I have a tibble of the format:

> (filespec <- tibble(ID = c("Title", "Date", "ATTR"), Length = c(23, 8, 6), Type = c("col_character()", "col_date()", "col_factor(levels=c(123456,654321)")))
# A tibble: 3 x 3
     ID Length                               Type
  <chr>  <dbl>                              <chr>
1 Title     23                    col_character()
2  Date      8                         col_date()
3  ATTR      6 col_factor(levels=c(123456,654321)

I want to end up with a cols object of the format:

> (cols(Title = col_character(), Date = col_date(), ATTR=col_factor(levels=c(123456,654321))))
cols(
  Title = col_character(),
  Date = col_date(format = ""),
  ATTR = col_factor(levels = c(123456, 654321), ordered = FALSE)
)

From other questions I have read, I know this can be done with do.call, but I cannot figure out how to convert the ID and Type columns to a cols object in an automated manner. Here is what I tried:

> do.call(cols, select(filespec,ID, Type))
Error in switch(x, `_` = , `-` = col_skip(), `?` = col_guess(), c = col_character(),  : 
  EXPR must be a length 1 vector

I assume the select call needs to be wrapped in another function that maps rows to parameters. How is this done?

There are 2 answers

Konrad Rudolph (best answer)

tl;dr: There are many things that make this more complex than it seems. But it’s feasible, and the resulting code (provided at the end) isn’t complicated, once the individual parts are understood.

As discussed in the comments, I fundamentally prefer Joran’s approach. In fact, whenever you find yourself storing code expressions in character strings, this should set off alarm bells: it’s an anti-pattern known as stringly typed code (a riff on, and quite the opposite of, strongly typed code). Unfortunately R is quite full of stringly typed code.

That said, your use-case (file-based configuration) is in itself a good idea. I would consider storing the information in a different format than R code fragments. But, well, it does work. So let’s explore why your code doesn’t work.

The first problem is this: you pass a tibble to do.call. Tibbles are lists of columns, so do.call allows this. However, internally your call is transformed to something equivalent to:

cols(
    ID = c("Title", "Date", "ATTR"),
    Type = c("col_character()", "col_date()", "col_factor(levels=c(123456,654321))")
)

— But this isn’t the code we want at all!

We need to fix two things here:

  1. We need to use the Type column as argument values, and the ID column as argument names. We can do this by creating a new list that has ID as names and Type as values: setNames(Type, ID).

  2. cols does not know what to do with character string arguments. It needs column specifications — objects of type Collector.

    Put differently, it’s a huge difference whether you write "col_date()" or col_date().
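A minimal sketch of both points, assuming readr is installed. The variable name args is only for illustration. It shows that the string and the function call produce different kinds of object, and that setNames yields the named list that do.call expects:

```r
library(readr)

# A quoted code fragment is just text ...
class("col_date()")

# ... whereas calling the function returns a readr collector object.
class(col_date())

# setNames() attaches the column IDs as names to the list of collectors.
args <- setNames(list(col_character(), col_date()), c("Title", "Date"))
names(args)

# Equivalent to cols(Title = col_character(), Date = col_date()).
do.call(cols, args)
```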

To fix this, we need to do something fairly complex: we need to parse the Type column as R code, and we need to evaluate the resulting parsed expressions. R provides two handy functions (parse and eval, respectively) to accomplish this. But don’t let the existence of two easy functions fool you: it’s an incredibly complex operation. R essentially needs to run a full parser and interpreter on your code fragments. And it gets even hairier if the code isn’t what you expect. For instance, the text might contain the code unlink('/', recursive = TRUE) instead of col_date(). R would then happily erase your hard drive.

This is just one of the reasons why parse/eval is complex and generally avoided. Other reasons include: what happens if there’s a parse error in the code (in fact, your code does contain a missing closing parenthesis …)?
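One way to reduce both risks is to inspect each fragment before evaluating it. The sketch below is an assumption, not part of the original approach: check_fragment and the allowed vector are hypothetical names. It catches syntax errors with tryCatch and rejects any fragment that mentions a name outside a small allow-list. Nothing is evaluated, so the unlink example is harmless here.

```r
# Hypothetical allow-list: the only names a fragment may mention.
allowed <- c("col_character", "col_date", "col_factor", "c")

# Hypothetical helper: classifies a code fragment without evaluating it.
check_fragment <- function(code) {
    expr <- tryCatch(parse(text = code), error = function(e) NULL)
    if (is.null(expr)) return("parse error")
    # all.names() lists every function and variable name in the expression.
    if (!all(all.names(expr) %in% allowed)) return("disallowed name")
    "ok"
}

check_fragment("col_factor(levels=c(123456,654321)")  # missing ")"
check_fragment("unlink('/', recursive = TRUE)")
check_fragment("col_date()")
```

This only narrows what can run; it is not a substitute for trusting the source of the configuration file.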

But here we go. Now that we have all the pieces together, we can join them relatively easily:

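library(dplyr)
library(readr)
filespec %>%
    # Parse each code fragment into an unevaluated R expression.
    mutate(Parsed = lapply(Type, function (x) parse(text = x, encoding = 'UTF-8'))) %>%
    # Evaluate each expression to obtain the actual column specification.
    mutate(ColSpec = lapply(Parsed, eval)) %>%
    # Name the specifications by ID, then pass them to cols as arguments.
    with(setNames(ColSpec, ID)) %>%
    do.call(cols, .)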

Execute this code piece by piece to see what it does and convince yourself that it’s working correctly.
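Returning to the earlier remark about storing the information in a format other than R code fragments: one possible shape, sketched here purely as an assumption, keeps short type names in the Type column and maps them to readr constructors through a lookup table. The names collectors, unknown and col_list, the Count column and the type labels are all hypothetical. No parse or eval is needed.

```r
library(readr)
library(tibble)

# Hypothetical lookup table: short type names mapped to readr constructors.
collectors <- list(
    character = col_character,
    date      = col_date,
    integer   = col_integer
)

# The spec now stores plain labels instead of code fragments.
filespec <- tibble(
    ID     = c("Title", "Date", "Count"),
    Length = c(23, 8, 6),
    Type   = c("character", "date", "integer")
)

# Fail early on any label that is not in the lookup table.
unknown <- setdiff(filespec$Type, names(collectors))
if (length(unknown) > 0) {
    stop("Unknown column type(s): ", paste(unknown, collapse = ", "))
}

# Call each constructor, name the results by ID, and build the cols object.
col_list <- setNames(
    lapply(filespec$Type, function(type) collectors[[type]]()),
    filespec$ID
)
spec <- do.call(cols, col_list)
spec
```

Types that need arguments, such as factor levels or a date format, would require an additional column or a richer file format; this sketch does not cover them.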

joran

I might approach this a little differently, and store the file specs in a simple list:

library(purrr)
library(readr)
filespec <- list(Title = list(Length = 23,
                              Type = col_character()),
                 Date = list(Length = 8,
                             Type = col_date()),
                 ATTR = list(Length = 6,
                             Type = col_factor(levels = c(123456, 654321))))

# Extract the Type element of each entry, keeping the names
a <- map(filespec, "Type")
> do.call(cols, a)

cols(
  Title = col_character(),
  Date = col_date(format = ""),
  ATTR = col_factor(levels = c(123456, 654321), ordered = FALSE, include_na = FALSE)
)

or,

> do.call(cols, a[c('Title','ATTR')])
cols(
  Title = col_character(),
  ATTR = col_factor(levels = c(123456, 654321), ordered = FALSE, include_na = FALSE)
)
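The Length entries in the same list could also drive the column positions when reading a fixed-width file. The following is a rough sketch under several assumptions: the sample records are made up, the date column uses an explicit format of "%Y%m%d" so that eight-character dates parse, and the factor levels are written as character strings. The names path, widths, spec and result are illustrative only.

```r
library(purrr)
library(readr)

filespec <- list(
    Title = list(Length = 23, Type = col_character()),
    Date  = list(Length = 8,  Type = col_date(format = "%Y%m%d")),
    ATTR  = list(Length = 6,  Type = col_factor(levels = c("123456", "654321")))
)

# Write two made-up fixed-width records to a temporary file.
path <- tempfile(fileext = ".txt")
writeLines(c(
    sprintf("%-23s%-8s%-6s", "First title", "20170101", "123456"),
    sprintf("%-23s%-8s%-6s", "Second title", "20170215", "654321")
), path)

# Column widths and column types both come from the same list.
widths <- unname(map_dbl(filespec, "Length"))
spec <- do.call(cols, map(filespec, "Type"))

result <- read_fwf(
    path,
    col_positions = fwf_widths(widths, col_names = names(filespec)),
    col_types = spec
)
result
```

Keeping widths and types in one structure means that selecting a subset of entries, as in the second call above, adjusts the types but not the positions; skipped columns would still need their widths accounted for.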