Extract series of observations from dataframe for complete sets of data

317 views Asked by At

I have a data frame of values composed of 5 variables (class in brackets)

1) DateTime (as.POSIXct), 2) ID (character), 3) Sensor 1 (numeric), 4) Sensor 2 (numeric), 5) Sensor 3 (numeric)

This data was collected from 5 tagged fish. Each fish has one tag with 3 sensors on it, and each sensor has a unique ID (thus 5 fishtimes 3 ID's/tag = 15 unique ID's). The sensors record measurements related to each other and so record these measurements at the same time. The measurement data is sent out in same order every time (ID=A then B then C). This data is sent to listening receivers that can only receive one transmission at a time. To avoid having multiple tags send the data at the same time and potentially never receiving data, each sensor is sent out at a random interval (between 2-4 minutes) before a new set of measurements is collected and the cycle restarts. But with the random intervals, sometimes multiple tags try to send the data at the same time and so those measurements would not get recorded. Example data is provided for one fish below:

> head(dat,15)

                   DateTime ID  Sensor1 Sensor2  Sensor3
    446 2015-05-15 19:05:41  B       NA    10.2       NA
    464 2015-05-15 19:14:20  B       NA    10.2       NA
    475 2015-05-15 19:17:32  C       NA      NA 10.58824
    486 2015-05-15 19:19:52  A 1.999499      NA       NA
    499 2015-05-15 19:22:31  B       NA    10.2       NA
    515 2015-05-15 19:28:10  A 1.999499      NA       NA
    523 2015-05-15 19:30:56  B       NA    10.1       NA
    542 2015-05-15 19:37:22  A 1.999499      NA       NA
    559 2015-05-15 19:41:09  B       NA    10.2       NA
    574 2015-05-15 19:44:47  C       NA      NA 10.50980
    613 2015-05-15 19:50:23  B       NA    10.3       NA
    633 2015-05-15 19:53:07  C       NA      NA 10.50980
    650 2015-05-15 19:56:32  A 1.999499      NA       NA
    684 2015-05-15 20:02:49  C       NA      NA 10.50980
    702 2015-05-15 20:05:51  A 1.999499      NA       NA

My problem becomes trying to extract only the complete sets of the data, meaning cycles where ID's A, B, and C are all detected for a tag from the same cycle, so the data from the 3 sensors can be used together. If an ID was missed in a cycle then I do not want any of the measurements from that cycle. In the example above, there is only one cycle that I want to keep (rows beginning with numbers 542, 559, and 574.)

Once I have removed all of the incomplete cycles, I would like to combine each cycle into a single observation so I have a new dataframe where each row represents a cycle and all 3 sensor variables have values. It would also be useful to calculate the time between ID's A and C so that I can verify they are from the same cycle and not a situation where the same ID's could have been missed multiple times in a row but the order still works (the chances of this happening are very, very low).

So far I've been trying to use a for loop to extract the rows of dat where the correct order is seen and put these rows into a new data frame. I'm not sure how to get R to read my criteria as a conditional statement and how to have criteria from 3 different observations be met before performing what I want the loop to do. I would be happy to do it in a way other than using a loop if possible. Here's an example of my loop below (I know that I am not calling up a True or False value to test against the ==TRUE condition, I'm just not sure how to do it for each row):

#make blank dataframe    
output <- data.frame (DateTime=rep(as.POSIXct(NA, tz="UTC"), length(tag123o$Transmitter)),
                          ID=rep(as.character(NA), length(tag123o$Transmitter)),
                          Sensor1=rep(as.numeric(NA), length(tag123o$Transmitter)),
                          Sensor2=rep(as.numeric(NA), length(tag123o$Transmitter)),
                          Sensor3=rep(as.numeric(NA), length(tag123o$Transmitter)))

    for (i in 1:length(dat$ID)) {
      if (((dat[i,names(dat)=="ID"] == "A69-1105-123") &
        (dat[i+1,names(dat)=="ID"] == "A69-1105-124") &
          (dat[i+2,names(dat)=="ID"] == "A69-1105-125"))==TRUE) {
            output[i,] <- cbind(dat[i,], data.frame(Cycle=i)) 
            output[i+1,] <- cbind(dat[i+1,], data.frame (Cycle=i))
            output[i+2,] <- cbind(dat[i+2,], data.frame(Cycle=i))
          }
    }
1

There are 1 answers

0
josliber On BEST ANSWER

Your question boils down to searching for sequences of "ABC" within the sequences of the IDs:

(matches <- gregexpr("ABC", paste(dat$ID, collapse=""))[[1]])
# [1] 8
# ...

This indicates that the only match begins at row 8. You now know that the information for Sensor1 are at rows numbered matches, the information for Sensor2 are at rows numbered matches+1, and the information for Sensor3 are at rows numbered matches+2. This enables you to efficiently construct the desired data frame that combines the information for cycles:

data.frame(DateTime1 = dat$DateTime[matches],
           DateTime2 = dat$DateTime[matches+1],
           DateTime3 = dat$DateTime[matches+2],
           Sensor1 = dat$Sensor1[matches],
           Sensor2 = dat$Sensor2[matches+1],
           Sensor3 = dat$Sensor3[matches+2])
#             DateTime1           DateTime2           DateTime3  Sensor1 Sensor2 Sensor3
# 1 2015-05-15 19:37:22 2015-05-15 19:41:09 2015-05-15 19:44:47 1.999499    10.2 10.5098

You can now do any computations you want to further filter the information (e.g. removing cycles where the time difference between measurements is too large).