Generate a Bernoulli variable from a vector of probabilities [r]


I'm having trouble with what should be a fairly basic problem. I searched for threads describing the same issue but couldn't find any.

I'm trying to generate a Bernoulli variable (y) where each observation's value is drawn using the probability (z) I have already generated for it. The toy dataset below reproduces my problem:

x <- c("A", "B", "C", "D", "E", "F")
z <- c(0.11, 0.23, 0.25, 0.06, 0.1, 0.032)

df <- data.frame(x, z)

I want to add a binary variable y, drawn for each row with the probability given in that row's z.

I tried the following:

df <- df %>%
  mutate(y = rbinom(1,1,z))

But it assigns the same value to every observation instead of using each observation's own probability.

Does anyone know how to solve this?

Thanks!

Answer by Limey (best answer)

From the online documentation for rbinom:

rbinom(n, size, prob)
n: number of observations. If length(n) > 1, the length is taken to be the number required.
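A quick base-R check of what `n` controls, reusing the question's `z` vector. With `n = 1` only one value comes back (drawn with `z[1]`); with `n = length(z)` each draw uses its own entry of `z`; and, per the documentation quoted above, passing the vector itself as `n` also yields `length(z)` draws. The `set.seed(1)` call is an addition for reproducibility only.

```r
z <- c(0.11, 0.23, 0.25, 0.06, 0.1, 0.032)

set.seed(1)  # only so the example gives the same draws on every run

one_draw <- rbinom(1, 1, z)          # n = 1: a single value, uses z[1] only
per_row  <- rbinom(length(z), 1, z)  # n = 6: one draw per probability
from_vec <- rbinom(z, 1, z)          # length(n) > 1, so length(z) draws

length(one_draw)            # 1
length(per_row)             # 6
length(from_vec)            # 6
all(per_row %in% c(0, 1))   # TRUE: size = 1 gives 0/1 (Bernoulli) outcomes
```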

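Your call rbinom(1, 1, z) asks for a single observation, so rbinom returns one value (drawn with the first probability, z[1]) and mutate() recycles that one value to every row. Ask for one draw per row instead: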

library(dplyr)
# n = nrow(df) requests one draw per row; each draw uses that row's z
df <- df %>%
  mutate(y = rbinom(nrow(df), 1, z))
df
  x     z y
1 A 0.110 0
2 B 0.230 1
3 C 0.250 0
4 D 0.060 0
5 E 0.100 0
6 F 0.032 0
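The same fix works without dplyr. A minimal base-R sketch, assuming the question's `x` and `z`; `set.seed(42)` is added only so reruns give the same `y` (the table above came from an unseeded run, so your values will differ).

```r
x <- c("A", "B", "C", "D", "E", "F")
z <- c(0.11, 0.23, 0.25, 0.06, 0.1, 0.032)
df <- data.frame(x, z)

set.seed(42)  # optional: makes the random draws reproducible

# One Bernoulli(z[i]) draw per row: n = nrow(df), size = 1, prob = df$z
df$y <- rbinom(nrow(df), size = 1, prob = df$z)
df
```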

To check that each row's draws follow its own probability, repeat each row 500 times and compare the mean of y in each group with z:

df <- data.frame(x=rep(x, each=500), z=rep(z, each=500))
df <- df %>%
  mutate(y = rbinom(nrow(df), 1, z))
df %>% group_by(x) %>% summarise(y = mean(y), .groups = "drop")
# A tibble: 6 x 2
  x         y
  <fct> <dbl>
1 A     0.114
2 B     0.232
3 C     0.25 
4 D     0.06 
5 E     0.106
6 F     0.018
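Whether a gap like F's (0.018 observed vs. 0.032 expected) is just sampling noise can be judged with the standard error of a proportion from 500 draws, sqrt(z * (1 - z) / 500). A short base-R sketch applying it to the means printed above (the numbers are copied from that output; `n_se` is a name introduced here for illustration):

```r
# Probabilities from the question and per-group means printed above (500 draws each)
z     <- c(A = 0.11,  B = 0.23,  C = 0.25, D = 0.06, E = 0.1,   F = 0.032)
p_hat <- c(A = 0.114, B = 0.232, C = 0.25, D = 0.06, E = 0.106, F = 0.018)
n     <- 500

# Standard error of a sample proportion from n Bernoulli(z) draws
se <- sqrt(z * (1 - z) / n)

# How many standard errors each observed mean sits from its target probability
n_se <- (p_hat - z) / se
round(n_se, 2)
```

All six means fall within two standard errors of z; F is the furthest off, at roughly 1.8 standard errors below, which is consistent with chance for 500 draws.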