There are extensive resources on the internet discussing the famous Birthday Paradox. It is clear to me how you calculate the probability of two people sharing a birthday i.e. P(same) = 1 - P(different)
. However if I ask myself something apparently more simple I stall: firstly, let's say I generate two random birthdays. Getting the same birthday is like tossing a coin. Either the two persons share a birthday (Heads) or they don't share a birthday (Tail). Run this 500 times and the end result (#Heads/500) will somehow be close to 0.5
Q1) But how do I think about this if I generate three random birthdays? How can I estimate the probability then? Obviously my coin analogy won't be applicable.
Q2) once I have figured out the above I will need to scale it up and generate 30 or 50 birthdays. Is there a recommended technique or algorithm to isolate identical birthdays from a large set? Should I put them into arrays and loop through them?
Here's what I think I need:
Q1)
r = 25 i.e. each trial run generates 25 birthdays
Trial 1 >
3 duplicates: 0
Trial 2 >
3 duplicates: 0
Trial 3 >
3 duplicates: 2
Trial 4 >
3 duplicates: 1
...
T100 >
3 duplicates: 2
estimated probability of 3 persons sharing a birthday in a room of 25 = (0+0+2+1+...+2)/100
Q2)
- Create an array for 2 duplicates, an array for 3 duplicates and one for more than 3 duplicates
- add each generated birthday one by one into the first array. But before doing so, loop through the array to see if it's in there already. If so, add it to the second array, but before doing so repeat the above process and so on
- It doesn't seem to be a very efficient algorithm though :) suggestions to improve the Big O here?
Create an integer array of length 365, initialized to 0. Then generate N (in your case 25) random numbers between 1-365 and increase that number in the array (ie. bdays[random_value]++). Since you are only interested in a collision happening, right after increasing the number in the array check if it is greater than 2 (If it is then there is a second collision, which means there are 3 people with the same birthday). Keep track of collisions and execute this as many times as you wish (1000).
In the end, the ratio of collisions/1000 will be your requested value.
and, no tossing coins analogy is wrong.