How to get the distribution of a sum of dependent Bernoulli variables

I have N Bernoulli variables X1, ..., XN, with Xi ~ B(1, pi), where pi is known for each Xi. Let Y = X1 + ... + XN. I need to get the distribution of Y.

If Xi and Xj are independent for i != j, I can use simulation:

1. Generate `X1`, ..., `XN` from their distributions, then compute the value of `Y`;
2. Repeat step 1 10000 times to obtain `Y1`, ..., `Y10000`, which gives an empirical distribution of `Y`.
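
As a minimal sketch of the two steps above in Ruby (the method name `simulate_sum_independent` and the values in `probs` are made up for illustration; the question gives no concrete pi):

```ruby
# Estimate the distribution of Y = X1 + ... + XN for INDEPENDENT
# Bernoulli variables. probs[i] is the success probability of X(i+1).
# Returns an array whose entry y is the relative frequency of Y == y.
def simulate_sum_independent(probs, trials, rng = Random.new)
  counts = Array.new(probs.size + 1, 0)
  trials.times do
    # Step 1: draw every Xi and add them up.
    # rng.rand is uniform on [0, 1), so `rng.rand < p` is true with probability p.
    y = probs.count { |p| rng.rand < p }
    counts[y] += 1
  end
  # Step 2: turn the counts from all trials into relative frequencies.
  counts.map { |c| c.to_f / trials }
end

probs = [0.1, 0.5, 0.9] # hypothetical example values
simulate_sum_independent(probs, 10_000, Random.new(42)).each_with_index do |freq, y|
  printf "P(Y = %d) ~ %.4f\n", y, freq
end
```

The fixed seed only makes runs repeatable; it is not needed for the method to work.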

But now Xi and Xj are dependent, so I also need to take the correlation into account. Assuming corr(Xi, Xj) = 0.2 for i != j, how can I build the correlation into the simulation? Or is there another way to get the distribution of Y?

Thanks for the help and advice.

There is 1 answer

pjs On

You can generate specific pairwise correlations (within limits) by deriving the conditional distribution of one variable given the other. The limits are that the p-values and correlations cannot be chosen completely arbitrarily: the implied conditional probabilities must lie between 0 and 1. Moreover, for general N the N-choose-2 pairwise correlations impose simultaneous constraints, and these can be infeasible for some choices of N, p-values, and correlations.
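
To make the pairwise limits concrete, here is a small sketch that computes the range of correlations two Bernoulli variables can attain. It follows from requiring all four joint probabilities to be non-negative; the method name `rho_bounds` is my own, and it assumes both probabilities are strictly between 0 and 1:

```ruby
# Returns [lowest, highest] correlation attainable by two Bernoulli
# variables with success probabilities p1 and p2.
# Assumes 0 < p1 < 1 and 0 < p2 < 1 (otherwise the correlation is undefined).
def rho_bounds(p1, p2)
  q1 = 1.0 - p1
  q2 = 1.0 - p2
  # Lower bound: P(X1 = 1, X2 = 1) >= max(0, p1 + p2 - 1)
  lo = [-Math.sqrt(p1 * p2 / (q1 * q2)), -Math.sqrt(q1 * q2 / (p1 * p2))].max
  # Upper bound: P(X1 = 1, X2 = 1) <= min(p1, p2)
  hi = [Math.sqrt(p1 * q2 / (q1 * p2)), Math.sqrt(q1 * p2 / (p1 * q2))].min
  [lo, hi]
end

lo, hi = rho_bounds(0.2, 0.8)
printf "rho must lie in [%.4f, %.4f]\n", lo, hi
```

For p1 = 0.2 and p2 = 0.8 the range is about [-1, 0.25], so rho = -0.5 is attainable while rho = 0.5 is not. For the question's setting, every pair (pi, pj) would have to admit rho = 0.2; that is a necessary condition only, since passing all pairwise checks does not guarantee a joint distribution of all N variables exists.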

The following Ruby implementation shows the calculations for obtaining specified p-values and correlations for a pair of X's:

```ruby
# Control the run with command-line args.
# If no args are provided, default to the test case of
# p1 = 0.2, p2 = 0.8, rho = -0.5, sample size = 10
p1 = (ARGV.shift || 0.2).to_f
p2 = (ARGV.shift || 0.8).to_f
rho = (ARGV.shift || -0.5).to_f
n = (ARGV.shift || 10).to_i

# Conditional probabilities P(X2 = 1 | X1 = x1), indexed by x1 = 0, 1.
# They follow from P(X1 = 1, X2 = 1) = p1 * p2 + cov, where
# cov = rho * sqrt(p1 * (1 - p1) * p2 * (1 - p2)).
p2_given = [p2 - rho * Math.sqrt(p1 * p2 * (1.0 - p2) / (1.0 - p1)),
            p2 + rho * Math.sqrt((1.0 - p1) * p2 * (1.0 - p2) / p1)]

printf "p2_given_1 = %8.5f, p2_given_0 = %8.5f\n", p2_given[1], p2_given[0]

# Only proceed to actually generate values if both conditional
# probabilities are between zero and one
if p2_given.all? { |p| p >= 0 && p <= 1 }
  n.times do
    # rand is uniform on [0, 1), so `rand < p` is true with probability p
    x1 = (rand < p1) ? 1 : 0
    x2 = (rand < p2_given[x1]) ? 1 : 0
    printf "%d,%d\n", x1, x2
  end
end
```
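
For a single pair, simulation is not strictly needed: the same quantities give the distribution of Y = X1 + X2 in closed form. The sketch below is an illustration under that two-variable assumption (the method name `pair_sum_distribution` and the small tolerance are my own choices), and it does not extend by itself to N > 2:

```ruby
# Exact distribution of Y = X1 + X2 for two Bernoulli variables with
# success probabilities p1, p2 and correlation rho.
# Returns [P(Y = 0), P(Y = 1), P(Y = 2)], or nil when the requested
# combination of p1, p2 and rho is infeasible.
def pair_sum_distribution(p1, p2, rho)
  cov = rho * Math.sqrt(p1 * (1.0 - p1) * p2 * (1.0 - p2))
  p11 = p1 * p2 + cov          # P(X1 = 1, X2 = 1)
  p10 = p1 - p11               # P(X1 = 1, X2 = 0)
  p01 = p2 - p11               # P(X1 = 0, X2 = 1)
  p00 = 1.0 - p11 - p10 - p01  # P(X1 = 0, X2 = 0)
  # A negative cell (beyond rounding error) means no such joint distribution.
  return nil if [p00, p01, p10, p11].any? { |c| c < -1e-12 }
  [p00, p01 + p10, p11]
end

dist = pair_sum_distribution(0.2, 0.8, -0.5)
dist.each_with_index { |pr, y| printf "P(Y = %d) = %.4f\n", y, pr }
```

With the default test case above (p1 = 0.2, p2 = 0.8, rho = -0.5) this gives probabilities of about 0.08, 0.84 and 0.08 for Y = 0, 1 and 2, which can be used to sanity-check the simulated pairs.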