I recently asked a question about how to take the contents of a column in R and use them as column headers in a new data frame, with a Boolean 1 or 0 indicating whether each ID contains that value.
An example would be
Id  Event
A   Wc
B   Df
C   Df
A   Df
Needs to be converted to
    Wc  Df
A   1   1
B   0   1
C   0   1
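For reference, the conversion in this example can be sketched in base R as follows. This is only a minimal illustration on the toy data above; the names `df`, `tab`, and `wide` are assumptions made here and are not taken from the real dataset.

```r
# Toy data copied from the example; `df` is a name assumed for this sketch.
df <- data.frame(
  Id    = c("A", "B", "C", "A"),
  Event = c("Wc", "Df", "Df", "Df"),
  stringsAsFactors = FALSE
)

# Count each Id/Event pair; fixing the factor levels keeps the columns
# in order of first appearance (Wc, then Df).
tab <- table(df$Id, factor(df$Event, levels = unique(df$Event)))

# Turn any positive count into 1 so the result is a 0/1 indicator.
wide <- as.data.frame.matrix((tab > 0) + 0L)
print(wide)
```

Capping the counts at 1 matters when the same Id has the same Event more than once; without it the cell would hold the count rather than a 0/1 flag.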
I have since been playing around with it and it seems to work fine; however, recently I have been getting the following error:
Error in FUN(X[[1L]], ...) : invalid 'type' (character) of argument
# get the totals by counting factors for SMS Type and number of replies
cols <- c("SMS.Type", "Replied")
setDT(train)[, paste0(cols, ".count") :=
               lapply(.SD, function(x) length(unique(na.omit(x)))),
             .SDcols = cols,
             by = awb_no]
# Summarize a column and convert it to Boolean column headers
lst <- train$SMS.Type
lvl <- unique(unlist(lst))
train.agg.chkpt <- data.frame(ID_no = train$ID_no,
                              do.call(rbind, lapply(lst, function(x) table(factor(x, levels = lvl)))),
                              stringsAsFactors = FALSE)
train.agg.chkpt <- aggregate(train.agg.chkpt, by = list(ID_no = train.agg.chkpt$ID_no), FUN = "sum")
train.agg.chkpt <- train.agg.chkpt[c(-1)]
The column ID_no is just an ID number, and it is the ID around which the Booleans are grouped. It is stored as a character string (I assume this is what the error message is referencing).
Each ID should be unique. Below is the structure of the dataset:
str(train.agg.chkpt)
'data.frame': 823462 obs. of 12 variables:
$ ID_no : chr "AAAAAAA75465" "BBBBB175465" "CCCCCC75476" "DDDDD75476" ...
$ WC : int 1 0 0 1 0 0 0 1 0 1 ...
$ DF1 : int 0 1 1 0 0 0 0 0 0 0 ...
$ DF2 : int 0 0 0 0 1 1 1 0 1 0 ...
$ WCB14 : int 0 0 0 0 0 0 0 0 0 0 ...
$ WCA13 : int 0 0 0 0 0 0 0 0 0 0 ...
$ HN : int 0 0 0 0 0 0 0 0 0 0 ...
$ WCB13 : int 0 0 0 0 0 0 0 0 0 0 ...
$ WCA12 : int 0 0 0 0 0 0 0 0 0 0 ...
$ WCA14 : int 0 0 0 0 0 0 0 0 0 0 ...
$ WCB12 : int 0 0 0 0 0 0 0 0 0 0 ...
Below is the traceback()
lapply(X = split(e, grp), FUN = FUN, ...)
4: FUN(X[[1L]], ...)
3: lapply(x, function(e) {
ans <- lapply(X = split(e, grp), FUN = FUN, ...)
if (simplify && length(len <- unique(sapply(ans, length))) ==
1L) {
if (len == 1L) {
cl <- lapply(ans, oldClass)
cl1 <- cl[[1L]]
ans <- unlist(ans, recursive = FALSE)
if (!is.null(cl1) && all(sapply(cl, function(x) identical(x,
cl1))))
class(ans) <- cl1
}
else if (len > 1L)
ans <- matrix(unlist(ans, recursive = FALSE), nrow = nry,
ncol = len, byrow = TRUE, dimnames = {
if (!is.null(nms <- names(ans[[1L]])))
list(NULL, nms)
else NULL
})
}
ans
})
2: aggregate.data.frame(train.agg.chkpt, by = list(ID_no = train.agg.chkpt$ID_no),
FUN = "sum")
1: aggregate(train.agg.chkpt, by = list(ID_no = train.agg.chkpt$ID_no),
FUN = "sum")
Can anyone help me understand the error message?
Thank you for your time
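On the error message itself: it most likely comes from `aggregate` applying `sum` to every column of the data frame it is given, including the character column ID_no, and `sum` cannot add character values. The following is a minimal sketch of that reading, assuming a small made-up data frame `chk` that only mimics the structure of train.agg.chkpt.

```r
# Hypothetical miniature of train.agg.chkpt: one character ID column
# plus 0/1 indicator columns. The values are made up for illustration.
chk <- data.frame(
  ID_no = c("A", "B", "C", "A"),
  WC    = c(1L, 0L, 0L, 0L),
  DF1   = c(0L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# sum() on a character vector raises the error quoted in the question.
err <- tryCatch(sum(chk$ID_no), error = function(e) conditionMessage(e))
print(err)

# Passing only the indicator columns keeps ID_no out of sum();
# the grouping variable is still supplied through `by`.
agg <- aggregate(chk[-1], by = list(ID_no = chk$ID_no), FUN = sum)
print(agg)
```

With this form the grouping column comes back once, so there is no duplicated ID column to drop afterwards.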
Your desired output can be reached easily with a simple table implementation per Id. Here's a possible data.table (which you are already using) implementation.
Or alternatively (as suggested), you could use a simple dcast.
Or similarly
Or using tidyr (see Note below).
Or using reshape from base R (see Note below).
Note: spread and reshape won't work here if the same Id has the same Event more than once, because they don't have the fun.aggregate argument, so they won't know how to handle it.
Benchmarks
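A minimal sketch of how such a comparison could be timed, assuming base R only so that it runs without extra packages. It does not reproduce the data.table, dcast, tidyr, or reshape variants named above. The simulated data, its size, and the function names `by_table` and `by_xtabs` are assumptions made for illustration. No timings are quoted because they depend on the machine and the data.

```r
# Hypothetical toy data; the sizes and value sets are assumptions.
set.seed(1)
n <- 1e4
df <- data.frame(
  Id    = sample(sprintf("ID%04d", 1:2000), n, replace = TRUE),
  Event = sample(c("Wc", "Df", "Hn"), n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Approach 1: contingency table, with counts capped at 1 so that
# repeated Id/Event pairs still give a 0/1 flag.
by_table <- function(d) {
  tab <- table(d$Id, d$Event)
  matrix(as.integer(tab > 0), nrow = nrow(tab),
         dimnames = list(rownames(tab), colnames(tab)))
}

# Approach 2: drop duplicated Id/Event rows first, then cross-tabulate.
by_xtabs <- function(d) {
  tab <- xtabs(~ Id + Event, data = unique(d))
  matrix(as.integer(tab), nrow = nrow(tab),
         dimnames = list(rownames(tab), colnames(tab)))
}

# Check that both approaches agree before comparing their speed.
same <- identical(by_table(df), by_xtabs(df))
print(same)

# Elapsed time in seconds for a single run of each approach.
print(system.time(by_table(df))[["elapsed"]])
print(system.time(by_xtabs(df))[["elapsed"]])
```

A single `system.time` run is noisy; repeating each call several times and comparing medians would give a steadier picture.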