I have a data.table (r1) that contains duplicated values for site and time. Here I create this examplary data table already showing why these duplicates appear, i.e. that I initially have two data tables that I merge. Perhaps my problem can be solved already at the stage of merging dt1 and dt2..
dt1 <- data.table(site=c(1,1,2,2), site_type="type1", time=c(1,2,1,2), temp=c(10,12,13,NA), prec=c(10,101,1,1) ) dt2 <- data.table(site=c(3,3,2,2 ), site_type="type2", time=c(1,2,1,2), temp=c(10,12,100,140), prec=c(10,101,1000,NA), snow=c(1,1,1,1)) r1 <- rbindlist(list(dt1,dt2), fill=T)
Now, I would like to aggregate all duplicated rows (4 and 8, as well as 3 and 7) so that for the columns 'temp', 'prec', 'snow' the values will be taken from site_type='type1' unless it is NA.
The dirty solution that I found is to create a subset of r1 containing duplicated rows, and a subset with no duplicates.
duplicates <- r1[duplicated(r1,by=c("site","time")) | duplicated(r1,by=c("site","time"), fromLast=TRUE)] no_duplicates <- r1[!(duplicated(r1,by=c("site","time")) | duplicated(r1,by=c("site","time"), fromLast=TRUE))]
Then to aggregate the duplicates wiht lapply.
aggregated_duplicates <- duplicates[, lapply(.SD, function(x) ifelse(!is.na(x[site_type=="type1"]),x[site_type=="type1"],x[site_type=="type2"])), by=c("site","time")]
And then to merge the aggregated_duplicates and no_duplicates data tables.
r1_without_duplicates <- rbindlist(list(no_duplicates, aggregated_duplicates), use.names = T)
The approach seems to work, but I have the feeling that in data table this could be done in one line either while merging dt1 and dt2, or by a smart use of 'duplicated' or 'unique'.