How do I create a unique ID for each night-time period across consecutive dates?

853 views Asked by At

I have data collected continuously over consecutive days. Each observation already contains the variables datetime (a POSIXct object) and par (a numeric object). I want to create a new variable called ID which will be a unique number associated with an individual night-time period. I will define night-time as all observations where par = 0. An example of what I would like my data to look like is given below for 3 consecutive days.

Note: my actual data is taken at 15-minute intervals, but I truncated this example for easier viewing.

              datetime    par    ID
1  2015-04-23 00:00:00    0.0     1
2  2015-04-23 08:00:00    0.0     1
3  2015-04-23 12:00:00  817.7     0
4  2015-04-23 19:00:00    0.0     2
5  2015-04-24 00:00:00    0.0     2
6  2015-04-24 08:00:00    0.0     2
7  2015-04-24 12:00:00  269.9     0
8  2015-04-24 19:00:00    0.0     3
9  2015-04-25 00:00:00    0.0     3
10 2015-04-25 08:00:00    0.0     3
11 2015-04-25 12:00:00 1701.8     0
12 2015-04-25 19:00:00    0.0     4
13 2015-04-25 23:00:00    0.0     4

I would like non-night (par !=0) observations have ID = 0. The first night-time period occurs on 4/23/2015, so I want it's ID = 1. I then want to add 1 to each following night-time period ID.

Any ideas? Despite much effort, I have not been able to achieve the above result. Thank you in advance.

3

There are 3 answers

6
thelatemail On BEST ANSWER

With cumsum:

df$ID2 <- 0
sel <- df$par == 0
df$ID2[sel] <- cumsum(!sel)[sel] + 1

#      par ID ID2
#1     0.0  1   1
#2     0.0  1   1
#3   817.7  0   0
#4     0.0  2   2
#5     0.0  2   2
#6     0.0  2   2
#7   269.9  0   0
#8     0.0  3   3
#9     0.0  3   3
#10    0.0  3   3
#11 1701.8  0   0
#12    0.0  4   4
#13    0.0  4   4
0
SymbolixAU On

A couple of ways, both employing run-length-encoding

data.table

This uses the fact that time periods are grouped by date, so we can use run-length-encoding to group the values on the same date. Then anything with a par > 0 can be set to 0.

library(data.table)

setDT(df)
## explicitly ordering the data.table
df[order(datetime), ID := rleid(as.Date(datetime))][ par > 0, ID := 0]
df
#               datetime    par ID
# 1:  2015-04-23 00:00:00    0.0  1
# 2:  2015-04-23 08:00:00    0.0  1
# 3:  2015-04-23 12:00:00  817.7  0
# 4:  2015-04-23 19:00:00    0.0  2
# 5:  2015-04-24 00:00:00    0.0  2
# 6:  2015-04-24 08:00:00    0.0  2
# 7:  2015-04-24 12:00:00  269.9  0
# 8:  2015-04-24 19:00:00    0.0  3
# 9:  2015-04-25 00:00:00    0.0  3
# 10: 2015-04-25 08:00:00    0.0  3
# 11: 2015-04-25 12:00:00 1701.8  0
# 12: 2015-04-25 19:00:00    0.0  4
# 13: 2015-04-25 23:00:00    0.0  4

(This is my preferred solution as it can handle any dates, correctly orders the data, and is efficient because, you know, data.table...)


base

If the 'night time period' are always separated by a 'daytime' period, then you can also do it this way

## create an ID column set the 'daytime' periods to 0
df[ df$par != 0, "ID"] <- 0

## get the run-length-encoding of the ID column
r <- rle(is.na(df$ID))

## the length of the rle where the value is TRUE gives us teh number of NAs in each 'group', so we can replicate the rle value by that amount
rep(1:length(r$lengths[r$values]), r$length[r$values])
# [1] 1 1 2 2 2 3 3 3 4 4

## so you can fill the rest of hte column with these values

df[ df$par == 0, "ID"] <- rep(1:length(r$lengths[r$values]), r$length[r$values])
df
#               datetime    par ID
# 1  2015-04-23 00:00:00    0.0  1
# 2  2015-04-23 08:00:00    0.0  1
# 3  2015-04-23 12:00:00  817.7  0
# 4  2015-04-23 19:00:00    0.0  2
# 5  2015-04-24 00:00:00    0.0  2
# 6  2015-04-24 08:00:00    0.0  2
# 7  2015-04-24 12:00:00  269.9  0
# 8  2015-04-24 19:00:00    0.0  3
# 9  2015-04-25 00:00:00    0.0  3
# 10 2015-04-25 08:00:00    0.0  3
# 11 2015-04-25 12:00:00 1701.8  0
# 12 2015-04-25 19:00:00    0.0  4
# 13 2015-04-25 23:00:00    0.0  4
0
Barker On

I would start by converting your POSIXct dates into POSIXlt dates because these have attributes which make determining things like days easier.

df[["datetime"]] <- as.POSIXlt(df[["datetime"]])

We can then use the fields yday and hour to get a numeric value where anything past noon in a 24 hour period a has the same value. This will only work if all your times are in the same year. If not, you will have to use the year attribute to adjust the days.

indicator <- df[["datetime"]][["yday"]] + (df[["datetime"]][["hour"]] > 12)

Using factor levels, we can then get the order numbers starting at 1.

df[["ID"]] <- as.numeric(as.factor(indicator))

Then we can set all day periods to 0.

df[["ID"]][df[["par"]] != 0] <- 0

This can be done in two lines if so desired.

df[["ID"]] <- as.numeric(as.factor(df[["datetime"]][["yday"]] + 
                                (df[["datetime"]][["hour"]] > 12)))
df[["ID"]][df[["par"]] != 0] <- 0

This solution will work no matter what order your data are in.