I have a data frame of values composed of 5 variables (class in brackets)
1) DateTime (as.POSIXct), 2) ID (character), 3) Sensor 1 (numeric), 4) Sensor 2 (numeric), 5) Sensor 3 (numeric)
This data was collected from 5 tagged fish. Each fish has one tag with 3 sensors on it, and each sensor has a unique ID (thus 5 fishtimes 3 ID's/tag = 15 unique ID's). The sensors record measurements related to each other and so record these measurements at the same time. The measurement data is sent out in same order every time (ID=A then B then C). This data is sent to listening receivers that can only receive one transmission at a time. To avoid having multiple tags send the data at the same time and potentially never receiving data, each sensor is sent out at a random interval (between 2-4 minutes) before a new set of measurements is collected and the cycle restarts. But with the random intervals, sometimes multiple tags try to send the data at the same time and so those measurements would not get recorded. Example data is provided for one fish below:
> head(dat,15)
DateTime ID Sensor1 Sensor2 Sensor3
446 2015-05-15 19:05:41 B NA 10.2 NA
464 2015-05-15 19:14:20 B NA 10.2 NA
475 2015-05-15 19:17:32 C NA NA 10.58824
486 2015-05-15 19:19:52 A 1.999499 NA NA
499 2015-05-15 19:22:31 B NA 10.2 NA
515 2015-05-15 19:28:10 A 1.999499 NA NA
523 2015-05-15 19:30:56 B NA 10.1 NA
542 2015-05-15 19:37:22 A 1.999499 NA NA
559 2015-05-15 19:41:09 B NA 10.2 NA
574 2015-05-15 19:44:47 C NA NA 10.50980
613 2015-05-15 19:50:23 B NA 10.3 NA
633 2015-05-15 19:53:07 C NA NA 10.50980
650 2015-05-15 19:56:32 A 1.999499 NA NA
684 2015-05-15 20:02:49 C NA NA 10.50980
702 2015-05-15 20:05:51 A 1.999499 NA NA
My problem becomes trying to extract only the complete sets of the data, meaning cycles where ID's A, B, and C are all detected for a tag from the same cycle, so the data from the 3 sensors can be used together. If an ID was missed in a cycle then I do not want any of the measurements from that cycle. In the example above, there is only one cycle that I want to keep (rows beginning with numbers 542, 559, and 574.)
Once I have removed all of the incomplete cycles, I would like to combine each cycle into a single observation so I have a new dataframe where each row represents a cycle and all 3 sensor variables have values. It would also be useful to calculate the time between ID's A and C so that I can verify they are from the same cycle and not a situation where the same ID's could have been missed multiple times in a row but the order still works (the chances of this happening are very, very low).
So far I've been trying to use a for loop to extract the rows of dat where the correct order is seen and put these rows into a new data frame. I'm not sure how to get R to read my criteria as a conditional statement and how to have criteria from 3 different observations be met before performing what I want the loop to do. I would be happy to do it in a way other than using a loop if possible. Here's an example of my loop below (I know that I am not calling up a True or False value to test against the ==TRUE condition, I'm just not sure how to do it for each row):
#make blank dataframe
output <- data.frame (DateTime=rep(as.POSIXct(NA, tz="UTC"), length(tag123o$Transmitter)),
ID=rep(as.character(NA), length(tag123o$Transmitter)),
Sensor1=rep(as.numeric(NA), length(tag123o$Transmitter)),
Sensor2=rep(as.numeric(NA), length(tag123o$Transmitter)),
Sensor3=rep(as.numeric(NA), length(tag123o$Transmitter)))
for (i in 1:length(dat$ID)) {
if (((dat[i,names(dat)=="ID"] == "A69-1105-123") &
(dat[i+1,names(dat)=="ID"] == "A69-1105-124") &
(dat[i+2,names(dat)=="ID"] == "A69-1105-125"))==TRUE) {
output[i,] <- cbind(dat[i,], data.frame(Cycle=i))
output[i+1,] <- cbind(dat[i+1,], data.frame (Cycle=i))
output[i+2,] <- cbind(dat[i+2,], data.frame(Cycle=i))
}
}
Your question boils down to searching for sequences of "ABC" within the sequences of the IDs:
This indicates that the only match begins at row 8. You now know that the information for Sensor1 are at rows numbered
matches
, the information for Sensor2 are at rows numberedmatches+1
, and the information for Sensor3 are at rows numberedmatches+2
. This enables you to efficiently construct the desired data frame that combines the information for cycles:You can now do any computations you want to further filter the information (e.g. removing cycles where the time difference between measurements is too large).