Randomizing 1s and 0s by groups while specifying the proportion of 1s and 0s within groups

Question

Asked by Alfa on 20 October 2023 at 07:43

First, I want to create a column that randomizes 1s and 0s by group while keeping the same proportion of 1s and 0s as in another column.

Second, I want to repeat the above procedure many times (say 1000) and calculate the expected value.

Let me clarify with hypothetical data.

library(data.table)

district <- c(1,1,1,1,2,2,2,2,2,3,3,3,3,3,3,3)
village <- c(1,2,3,4,1,2,3,4,5,1,2,3,4,5,6,7)
status <- c(1,0,1,0,1,1,1,0,0,1,1,1,1,0,0,0)

datei <- data.table(district, village, status)

I want to create a column that randomizes 1s and 0s within each district while keeping the same proportion of 1s and 0s as in status; the ratios of 1s to 0s are 2:2, 3:2 and 4:3 in districts 1, 2 and 3 respectively.

Second, I also want to repeat this randomization many times (say 1000 times) and calculate the expected value for each row.

I know how to randomize 1s and 0s based on district.

datei[, random_status := sample(c(1,0), .N, replace=TRUE), keyby = district]

However, I do not know how to have the same proportion of 1s and 0s as in status and how to repeat and calculate the expected values for each row.

Many thanks.

Edit: Let me add what I expect when calculating the expected value for each row after, say, 1000 repetitions. Column exp_status is generated by randomizing many times while keeping the proportion of 1s to 0s within each district the same as in status.

district	village	status	exp_status
1	1	1	0.9
1	2	0	0.7
1	3	1	0.8
1	4	0	0.1
2	1	1	0.2
2	2	1	0.3
2	3	1	0.2
2	4	0	0.9
2	5	0	0.8
3	1	1	0.4
3	2	1	0.5
3	3	1	0.9
3	4	1	0.8
3	5	0	0.9
3	6	0	0.8
3	7	0	0.7

There are 2 answers

Maël On 20 October 2023 at 08:10

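The default behavior of sample is exactly what you are looking for: called without a size, it returns a random permutation of its input, so reshuffling status within each district keeps the counts of 1s and 0s unchanged: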

library(dplyr)
datei |> 
  mutate(random_status = sample(status), .by = district)

#or
library(data.table)
datei[, random_status := sample(status), district]

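As for the second question, I agree with @Paul Stafford Allen's comment: by the law of large numbers, the average over many reshuffles converges to the share of 1s in the row's district, so it is .5 for every row in district 1 (and 3/5 and 4/7 in districts 2 and 3).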
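The repeated randomization can be sketched as follows. This is a minimal illustration, not necessarily how the asker would want to structure it: the names n_rep, draws and district_share are made up for the example, the seed is arbitrary, and base R's ave() is swapped in for the grouped data.table call so that each reshuffle comes back in the original row order.

```r
library(data.table)

datei <- data.table(
  district = c(1,1,1,1,2,2,2,2,2,3,3,3,3,3,3,3),
  village  = c(1,2,3,4,1,2,3,4,5,1,2,3,4,5,6,7),
  status   = c(1,0,1,0,1,1,1,0,0,1,1,1,1,0,0,0)
)

set.seed(1)    # arbitrary seed, only for reproducibility
n_rep <- 1000  # number of reshuffles (hypothetical name)

# Each column of `draws` is one within-district reshuffle of status.
# ave() applies sample() per district and returns the result in the
# original row order, so rows line up across repetitions.
draws <- replicate(n_rep, ave(datei$status, datei$district, FUN = sample))

# Per-row average over all reshuffles
datei[, exp_status := rowMeans(draws)]

# Theoretical value for comparison: share of 1s in the row's district
datei[, district_share := mean(status), by = district]

print(datei)
```

With enough repetitions, exp_status should sit close to district_share for every row (2/4, 3/5 and 4/7 for districts 1, 2 and 3), which is why the simulation is not strictly needed to obtain the expected value.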

jay.sf On 20 October 2023 at 07:50 (accepted answer)

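Pass a table of the status counts as prob=. On a large scale this gives similar proportions, although each value is drawn independently, so the counts within a district are matched only on average, not exactly.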

set.seed(42)
datei[, random_status := sample(0:1, .N, replace=TRUE, prob=table(status)), keyby = district]

colMeans(datei[, 3:4])
      #  status random_status 
      # 0.56339       0.56277
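To see the difference between weighted independent draws and a reshuffle on the original 16 rows, a small comparison might look like this. It is a sketch under assumptions: the column names by_prob and by_shuffle and the summary table counts are invented for the example, and wrapping status in factor(..., levels = 0:1) is an addition that keeps both weights present even if a district contained only one status value.

```r
library(data.table)

datei <- data.table(
  district = c(1,1,1,1,2,2,2,2,2,3,3,3,3,3,3,3),
  status   = c(1,0,1,0,1,1,1,0,0,1,1,1,1,0,0,0)
)

set.seed(42)

# Independent draws weighted by the district's 0/1 counts:
# the proportion is matched only in expectation.
datei[, by_prob := sample(0:1, .N, replace = TRUE,
                          prob = table(factor(status, levels = 0:1))),
      by = district]

# Reshuffle of the observed values: counts are preserved exactly.
datei[, by_shuffle := sample(status), by = district]

# Number of 1s per district under each approach
counts <- datei[, .(ones_status  = sum(status),
                    ones_prob    = sum(by_prob),
                    ones_shuffle = sum(by_shuffle)),
                by = district]
print(counts)
```

ones_shuffle always equals ones_status, while ones_prob can differ from it in any single draw, especially in districts this small.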

Data:

(slightly blown up, to 1e5 rows)

datei <- structure(list(district = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 
3, 3, 3, 3, 3), village = c(1, 2, 3, 4, 1, 2, 3, 4, 5, 1, 2, 
3, 4, 5, 6, 7), status = c(1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 
1, 0, 0, 0)), row.names = c(NA, -16L), class = c("data.table", 
"data.frame"))

set.seed(42)
datei <- datei[sample.int(nrow(datei), 1e5, replace=TRUE), ]
