For a dataset like this:
MainID SubID DOB BMI
1234 1234_A Feb-19-2024 10.1
1235 1235_A Jan-11-2023 17.23
1235 1235_B Jan-11-2023 19.11
5136 5136_A May-17-2021 21.87
5136 5136_B May-17-2021 14.18
5136 5136_C May-17-2021 18.11
3357 3357-A Oct-06-2023 24.10
9124 9124-B July-01-2021 12.09
9124 9124-B July-01-2021 15.06
I am trying to randomly assign a value of 0 or 1, but only to rows where MainID and DOB are the same and SubID is different. I am expecting a dataset like this:
MainID SubID DOB BMI Col1
1234 1234_A Feb-19-2024 10.1 0
1235 1235_A Jan-11-2023 17.23 0
1235 1235_B Jan-11-2023 19.11 1
3357 3357-A Oct-06-2023 24.10 0
5136 5136_A May-17-2021 21.87 0
5136 5136_B May-17-2021 14.18 0
5136 5136_C May-17-2021 18.11 1
9124 9124-B July-01-2021 12.09 0
9124 9124-B July-01-2021 15.06 0
Here only the rows with MainID 1235 and 5136 are assigned 0 or 1, because their repeated rows have the same MainID and DOB but different SubID.
I tried options with ifelse and duplicated(df[c("MainID", "DOB")]) but this did not work. Any suggestion is much appreciated. Thanks in advance.
It's not clear if you want `0:1` sampled with replacement. I'm assuming always `0:1`, and with replacement only when the number of rows in a group is more than 2.

Up front, both the dplyr and base-R code share a nested call of the form `sample(c(0:1, sample(0:1, ...)))`.
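Written out as a standalone sketch, that nested call might look like the following. The helper name `assign01` and its argument `n` are hypothetical (inside a grouped dplyr call, `n` would be `n()`), and it assumes a group has at least two rows.

```r
# Hypothetical helper: returns n values drawn from {0, 1},
# guaranteed to contain at least one 0 and at least one 1.
# Assumption: n >= 2 (single-row groups are handled elsewhere).
assign01 <- function(n) {
  stopifnot(n >= 2)
  # Fill the remaining n - 2 slots; replacement is only required
  # once more than two extra values are needed (n > 4).
  extra <- sample(0:1, size = n - 2, replace = (n > 4))
  # Shuffle so the forced 0 and 1 are not always the first two values.
  sample(c(0:1, extra))
}

set.seed(42)  # only to make the illustration reproducible
assign01(2)
assign01(5)
```

For `n = 2` the inner `sample` returns an empty vector, so the result is just a random ordering of `0` and `1`.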
The reasoning behind that snippet:

- we need at least one `0` and one `1` in the bunch; if we did `sample(0:1, size=n(), replace=TRUE)`, it is feasible (though improbable) that we get all `0`s or all `1`s, and I'm inferring that we want/need at least one of each;
- so we start with `c(0:1, ..)`, forcing the presence of each number;
- we need the inner `sample` to produce a size of `n() - 2` (2 fewer than there are rows); if there are only four rows, then the outer `0:1` and the inner `0:1` will suffice, so we don't technically need replacement, ergo `replace=(n() > 4)`;
- now that we have a vector that starts with `0:1` and may have zero or more additional `0`s and `1`s based on the number of rows in the group, we need to randomly reorder it (because we don't want the first two to always be `0:1`), hence the outer `sample`.

dplyr
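One way the dplyr version could look, as a sketch rather than a definitive answer. It assumes the data frame is called `df`, that rows are grouped by `MainID` and `DOB`, and that any group without at least two distinct `SubID` values should simply get `0` (my reading of the expected output in the question).

```r
library(dplyr)

# Sample data copied from the question.
df <- read.table(header = TRUE, text = "
MainID SubID DOB BMI
1234 1234_A Feb-19-2024 10.1
1235 1235_A Jan-11-2023 17.23
1235 1235_B Jan-11-2023 19.11
5136 5136_A May-17-2021 21.87
5136 5136_B May-17-2021 14.18
5136 5136_C May-17-2021 18.11
3357 3357-A Oct-06-2023 24.10
9124 9124-B July-01-2021 12.09
9124 9124-B July-01-2021 15.06")

out <- df %>%
  group_by(MainID, DOB) %>%
  mutate(
    # Assumption: only groups with more than one distinct SubID are randomized;
    # every other group gets a constant 0 (recycled to the group size).
    Col1 = if (n_distinct(SubID) > 1) {
      sample(c(0:1, sample(0:1, size = n() - 2, replace = (n() > 4))))
    } else {
      0L
    }
  ) %>%
  ungroup()

out
```

Because the assignment is random, call `set.seed()` first if the result needs to be reproducible.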
base R
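A comparable base R sketch, using `ave()` over row indices so that each group's function can look up its own `SubID` values. The helper name `assign_group` is hypothetical, it reads the global `df`, and it makes the same assumption as before: groups without at least two distinct `SubID` values get `0`.

```r
# Sample data copied from the question.
df <- read.table(header = TRUE, text = "
MainID SubID DOB BMI
1234 1234_A Feb-19-2024 10.1
1235 1235_A Jan-11-2023 17.23
1235 1235_B Jan-11-2023 19.11
5136 5136_A May-17-2021 21.87
5136 5136_B May-17-2021 14.18
5136 5136_C May-17-2021 18.11
3357 3357-A Oct-06-2023 24.10
9124 9124-B July-01-2021 12.09
9124 9124-B July-01-2021 15.06")

# Hypothetical helper: idx holds the row numbers of one MainID/DOB group.
assign_group <- function(idx) {
  n <- length(idx)
  if (length(unique(df$SubID[idx])) > 1) {
    # At least one 0 and one 1, remaining slots filled at random, then shuffled.
    sample(c(0:1, sample(0:1, size = n - 2, replace = (n > 4))))
  } else {
    # Single-row groups, repeated SubIDs, and empty factor combinations.
    rep(0L, n)
  }
}

df$Col1 <- ave(seq_len(nrow(df)), df$MainID, df$DOB, FUN = assign_group)
df
```

Passing row indices rather than a data column to `ave()` is a design choice here: `ave()` hands its function only one vector per group, so the indices are what let the function reach `SubID`.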
Data
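The sample data from the question can be built as a data frame like this. The column types are an assumption (integer `MainID`, character `SubID` and `DOB`, numeric `BMI`); `DOB` is kept as text because the question mixes month formats such as `Feb` and `July`.

```r
# Sample data transcribed from the question; column types are assumed.
df <- data.frame(
  MainID = c(1234L, 1235L, 1235L, 5136L, 5136L, 5136L, 3357L, 9124L, 9124L),
  SubID  = c("1234_A", "1235_A", "1235_B", "5136_A", "5136_B", "5136_C",
             "3357-A", "9124-B", "9124-B"),
  DOB    = c("Feb-19-2024", "Jan-11-2023", "Jan-11-2023", "May-17-2021",
             "May-17-2021", "May-17-2021", "Oct-06-2023", "July-01-2021",
             "July-01-2021"),
  BMI    = c(10.1, 17.23, 19.11, 21.87, 14.18, 18.11, 24.10, 12.09, 15.06)
)
```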