Create a datatable containing the Nth digit of each of a list of file names

I have a list of files containing output from a large model. I load these as a datatable using:

files <- list.files(path.expand("/XYZ/"), pattern = ".*\\.rds", full.names = TRUE)
dt <- as.data.table(files)

This data.table, "dt", has just one column, the file name, e.g. XZY_00_34234.rds

The 50th and 51st characters of each file name form a two-digit number. I want to create a data.table containing that number for each file.

I used:

index <- as.data.table(as.integer(substr(dt,50,51)))

This gives me the correct value for the first file. I think I should be able to use apply to run this against each row of the table.

I tried:

integers <- as.data.table(apply(dt,1,as.integer(substr(50,51))))

But get:

Error in substr(50, 51) : argument "stop" is missing, with no default

Any suggestions gratefully accepted!

There are 2 answers

Pierre L On BEST ANSWER

Try:

integers <- as.data.table(apply(dt, 1, function(x) as.integer(substr(x, 50, 51))))

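The apply family of functions takes another function as an argument and calls it on each element, row, or column of a vector or array. Your attempt failed because as.integer(substr(50, 51)) is not a function but a call that R evaluates immediately: substr receives 50 as x and 51 as start, so stop is missing. The function you pass can be one that is already defined, or you can write it inline as an anonymous function, as in the line above, which saves time and keystrokes.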

Without anonymous functions, you would first have to define a named function:

fiftieth_char <- function(x) {
  as.integer(substr(x, 50, 51))
}

That function could then be passed to apply:

apply(dt, 1, fiftieth_char)

The anonymous function lets us do those two steps in one.
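As a rough illustration of the points above, here is a small sketch. The file names are made up and short, so the digits sit at characters 5 and 6 instead of 50 and 51, a base data.frame called df stands in for dt so the snippet runs without data.table, and two_digits is a hypothetical helper name.

```r
# Hypothetical short names; the number sits at characters 5-6 here, not 50-51.
files <- c("ABC_25.rds", "DEF_39.rds")
df <- data.frame(files = files, stringsAsFactors = FALSE)  # stand-in for dt

# Passing a call instead of a function: R evaluates substr(5, 6) first,
# which fails because 'stop' is missing. Capture the message to inspect it.
err <- tryCatch(
  apply(df, 1, as.integer(substr(5, 6))),
  error = function(e) conditionMessage(e)
)
print(err)

# Named function defined first, then passed to apply()
two_digits <- function(x) as.integer(substr(x, 5, 6))
named <- apply(df, 1, two_digits)

# Same logic written inline as an anonymous function
inline <- apply(df, 1, function(x) as.integer(substr(x, 5, 6)))

# Both approaches return the same result
print(identical(named, inline))
```

The only difference between the two working calls is where the function is defined; what apply needs in its third argument is a function, not the result of calling one.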

akrun On

If you have just one column, you can extract it as a vector and call substr directly on it instead of looping with apply, because substr is vectorized. To extract a column from a data.table, use the ?Extract operators [[ or $.

as.data.table(as.integer(substr(dt[[1]], 50, 51)))

Or

as.data.table(as.integer(substr(dt$files, 50, 51)))

I noticed that you are creating 'dt' as a data.table from 'files'. The output of list.files() is a vector, so instead of creating the data.table first, you could substr the vector and wrap it with as.data.table.

as.data.table(as.integer(substr(files, 50, 51)))

As an example,

library(data.table)
files <- c('ABC_25', 'DEF_39')
dt <- as.data.table(files)
as.integer(substr(dt[[1]], 5, 6))
#[1] 25 39
as.integer(substr(files, 5, 6))
#[1] 25 39
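One caveat, offered as a sketch and not as part of the original question: with full.names = TRUE, list.files returns full paths, so character positions counted from the start of the string shift whenever the directory part changes length. Assuming the number always sits at the same position within the file name itself, counting after basename() may be more robust. The paths and the positions 5 and 6 below are made up for illustration.

```r
# Hypothetical full paths whose directory parts differ in length
files <- c("/XYZ/run_a/ABC_25.rds", "/XYZ/b/DEF_39.rds")

# Strip the directory, leaving only the file name, e.g. "ABC_25.rds"
file_names <- basename(files)

# Count positions within the file name only (5-6 for these made-up names)
index <- as.integer(substr(file_names, 5, 6))
print(index)
```

Since basename and substr are both vectorized, this still needs no apply loop.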