In a previous post I discussed generating a unique id number for an order in our database. Read the original post for the full story, but to sum up, I decided that playing around with rand(100_000_000) was good enough. Martin Ankerl posted a comment pointing out that I was vulnerable to the Birthday Problem. So after a quick trip to wikipedia, I found out that if you have 23 people in a room you might think you have a 23/365 chance of having 2 people with the same birthdays, but no, your chances are closer to 50%! This is because in a group of 23 people there are 253 pairs each of which have a 1/365ish chance to have the same birthday. The formula to calculate how many are needed to give you a 50% chance of having a birthday collision is:
.5 + sqrt(.25 + 2 * 365 * ln(2) ) = 22.9999
Which means the 50% chance of collision mark for 100,000,000 is:
.5 + sqrt(.25 + 2 * 100,000,000 * ln(2) ) = 11774.600235771
Holy crap! I’m screwed if I can only have 12,000 orders before I get a 50% chance of collision. But wait, I think I’m ok because every time I insert a key into the database I check to make sure it isn’t already in there. This means that when I’m at 12,000 orders the next key has only a 1 / (100,000,000 - 12,000) chance of being a collision. I think I’ve changed from the “Birthday problem” to the “Same birthday as you” problem, which is way more intuitive. If I’m wrong I rely on you, the blogoshpere, to tell me so.
If you haven’t already, go read the wikipedia article on the Birthday problem. It’s pretty cool.