data.table + ecdf - undefined column

I am working with data.table. It is easy to select a column from a data.table object:

> head(data.table(mtcars)[,2])
   cyl
1:   6
2:   6
3:   4
4:   6
5:   8
6:   6

But trying to select a column using this syntax within an ecdf call yields an error:

> ecdf(data.table(mtcars)[,2])(data.table(mtcars)[,2])

Error in `[.data.frame`(x, i) : undefined columns selected

Can someone please explain why?

Pragmatically, one way around this is to do:

> ecdf(data.table(mtcars)[[2]])(data.table(mtcars)[[2]])
 [1] 0.56250 0.56250 0.34375 0.56250 1.00000 0.56250 1.00000 0.34375 0.34375 0.56250 0.56250 1.00000 1.00000 1.00000 1.00000 1.00000
[17] 1.00000 0.34375 0.34375 0.34375 0.34375 1.00000 1.00000 1.00000 1.00000 0.34375 0.34375 0.34375 1.00000 0.56250 1.00000 0.34375

but I would like to understand the behavior above.

There is 1 answer

akrun (best answer)

The reason is in the extraction. In the first case, the result is still a data.table, while in the second case it is a vector:

data.table(mtcars)[[2]]
#[1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

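data.table and data.frame syntax are slightly different. In data.frame, [ uses drop = TRUE by default, so using , and selecting only a single column drops the dimensions and returns a vector. In data.table, selecting a column by position with [, 2] still returns a one-column data.table, and ecdf expects a numeric vector, not a data.table.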
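
The difference can be checked directly. The following is a minimal sketch, assuming the data.table package is installed and a version in which a numeric j selects columns (as in the question's output); the variable names df, dt, df_col and dt_col are only for illustration.

```r
library(data.table)

df <- mtcars                 # plain data.frame
dt <- data.table(mtcars)     # same data as a data.table

# data.frame: a single column selected with [, j] is dropped to a vector
df_col <- df[, 2]

# data.table: the same call keeps the data.table structure
dt_col <- dt[, 2]

class(df_col)   # "numeric"
class(dt_col)   # "data.table" "data.frame"

# drop = FALSE makes the data.frame result keep its data.frame structure too
class(df[, 2, drop = FALSE])   # "data.frame"
```

Passing dt_col to ecdf reproduces the error from the question, while df_col works because it is already a plain numeric vector.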

This is also mentioned in the data.table FAQ:

For consistency so that when you use data.table in functions that accept varying inputs, you can rely on DT[...] returning a data.table. You don’t have to remember to include drop=FALSE like you do in data.frame. data.table was first released in 2006 and this difference to data.frame has been a feature since the very beginning.
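
In practice this means the column has to be extracted as a vector before it is handed to ecdf. A short sketch of three common ways to do that, assuming the data.table package is installed; the names v1, v2, v3 and Fn are illustrative only.

```r
library(data.table)

dt <- data.table(mtcars)

v1 <- dt[[2]]      # list-style extraction by position
v2 <- dt$cyl       # extraction by column name
v3 <- dt[, cyl]    # a single unquoted column name in j returns a vector

# All three give the same numeric vector
stopifnot(identical(v1, v2), identical(v2, v3))

# ecdf() accepts the vector and returns a function that can be evaluated
Fn <- ecdf(v1)
head(Fn(v1))
# [1] 0.56250 0.56250 0.34375 0.56250 1.00000 0.56250
```

The printed values match the first six entries of the output shown in the question for the [[2]] workaround.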