Monday, October 26, 2009

Distinguishing the Hypergeometric, Binomial and Poisson Distributions

First, let me define each of them by when to use it.

Hypergeometric (HG) = when sampling is without replacement (like drawing cards), i.e. each draw depends on the earlier ones.

Binomial (BI) = when sampling is with replacement (like tossing coins), i.e. the trials are independent of each other.

Poisson (pronounced roughly "pwah-SON") (PS) = when events are counted over a continuous interval of time or space (like how many cars pass a line in a given time interval).



As we can see, HG and BI are closely related: the only difference is sampling with or without replacement. But why is PS related too? Before answering that, we should see exactly how HG and BI are related.

Let's take card drawing as an example. We have 52 cards. The probability of picking exactly 2 hearts in 5 draws is:

5C2 * (13/52) (12/51) (39/50) (38/49) (37/48) = 0.27

or, using Excel, =HYPGEOMDIST(2,5,13,52)

This is because every time you pick a card, there is one card fewer in the sample space. The factor 5C2 counts the positions among the 5 draws where the 2 hearts can appear.
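To check the arithmetic, here is a minimal Python sketch (Python 3.8+ for math.comb) that computes the same probability two ways: the sequential product above, and the counting form C(13,2)*C(39,3)/C(52,5). The function names are hypothetical helpers for illustration, not part of Excel or any library.

```python
from math import comb

def hg_sequential():
    # Hypothetical helper: 5C2 orderings times one specific order
    # (heart, heart, non-heart x3), with one card fewer after every draw.
    return comb(5, 2) * (13/52) * (12/51) * (39/50) * (38/49) * (37/48)

def hg_counting(x, n, T, N):
    # Hypothetical helper: choose x of the T hearts and n - x of the N - T
    # other cards, divided by all ways to choose n cards out of N.
    return comb(T, x) * comb(N - T, n - x) / comb(N, n)

p1 = hg_sequential()
p2 = hg_counting(2, 5, 13, 52)
print(round(p1, 4), round(p2, 4))  # → 0.2743 0.2743
```

Both forms agree, and hg_counting(13, 52, 13, 52) gives exactly 1.0, matching the "all 52 cards" case.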

How about picking all 52 cards? Of course, we will get 13 hearts, no more, no less.

HYPGEOMDIST(13,52,13,52) = 1

OK, now suppose we have an infinite number of cards, and the proportion of hearts is still one quarter. What is the probability of 2 hearts in 5 draws?

Is it HYPGEOMDIST(2,5,inf/4,inf)?

Since there are infinitely many cards, taking one out does not change the proportion of hearts, so "without replacement" no longer makes any difference. Thus, it becomes BI:

5C2 (1/4)(1/4)(3/4)(3/4)(3/4) = BINOMDIST(2,5,0.25,0) = 0.26, a little bit less. But it is NOT because the huge sample space reduced the chance: hearts are still exactly a quarter of the cards.
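The binomial product can be checked exactly with Python's fractions module. This is only a sketch of the arithmetic above; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

p = Fraction(1, 4)                    # proportion of hearts, unchanged by any draw
bi = comb(5, 2) * p**2 * (1 - p)**3   # 5C2 (1/4)^2 (3/4)^3
print(bi, float(bi))  # → 135/512 0.263671875
```

So BI gives about 0.264, against about 0.274 for HG.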

OK, I assume you got the idea: if the sample space goes to infinity, or is very large, HG becomes BI. My rule of thumb: if the sample space is 500 times larger than the number of picks, it is safe to use BI in place of HG.
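A rough way to see the convergence is to keep hearts at one quarter of the deck and grow the deck. The sketch below is an illustration only: the helper names and deck sizes are my own choices, and N = 2500 is simply 500 times the 5 picks, matching the rule of thumb above.

```python
from math import comb

def hg_pmf(x, n, T, N):
    # Hypothetical helper: without replacement, T hearts among N cards, n picks
    return comb(T, x) * comb(N - T, n - x) / comb(N, n)

def binom_pmf(x, n, p):
    # Hypothetical helper: with replacement, success probability p on every pick
    return comb(n, x) * p**x * (1 - p)**(n - x)

x, n = 2, 5
bi = binom_pmf(x, n, 0.25)
diffs = []
# Deck sizes with hearts always exactly a quarter of the deck
for N in (52, 520, 2500, 26000):
    hg = hg_pmf(x, n, N // 4, N)
    diffs.append(abs(hg - bi))
    print(N, round(hg, 5), round(bi, 5))
```

The gap between HG and BI shrinks steadily as N grows relative to n.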

Now we can relate BI to PS.

Let's take an example to show the difference.

Among 100 people, 30 are boys. The probability of getting 3 boys when picking 10 people follows BI, or more accurately HG.

If someone uses Poisson here, he is assuming there is some probability of getting 11 boys by picking only 10 people!

What we can see is that in the Poisson distribution, the number of successes (boys) is not bounded by the number of trials (people picked). In other words,

in Binomial, number of boys <= number of people picked.
in Poisson, number of boys can be any non-negative integer.
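Here is a minimal sketch of the boys example, assuming the Poisson mean is matched to the binomial mean n*T/N = 10*30/100 = 3. The helper names are hypothetical. HG and BI put zero probability on 11 boys, while Poisson does not.

```python
from math import comb, exp, factorial

def hg_pmf(x, n, T, N):
    # Hypothetical helper: x boys in n picks without replacement
    if x < 0 or x > n:
        return 0.0  # more boys than people picked is impossible
    return comb(T, x) * comb(N - T, n - x) / comb(N, n)

def binom_pmf(x, n, p):
    # Hypothetical helper: x boys in n independent picks
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, mu):
    # Hypothetical helper: count with mean mu and no upper bound
    return exp(-mu) * mu**x / factorial(x)

N, T, n = 100, 30, 10
mu = n * T / N   # mean number of boys, 3.0

for x in (3, 11):
    print(x, hg_pmf(x, n, T, N), binom_pmf(x, n, T / N), poisson_pmf(x, mu))
```

For x = 3 the three values are close; for x = 11 only the Poisson value is above zero.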

So, if the number of picks in a given interval goes to infinity (i.e. events are counted continuously) while the expected number of successes stays the same, BI goes to PS.
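This limit can be seen numerically. The sketch below, with hypothetical helper names, holds the mean n*p at 3 and lets n grow, so p = 3/n shrinks; the binomial probability of exactly 3 successes approaches the Poisson one.

```python
from math import comb, exp, factorial

def binom_pmf(x, n, p):
    # Hypothetical helper: x successes in n independent trials
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, mu):
    # Hypothetical helper: Poisson probability of x events with mean mu
    return exp(-mu) * mu**x / factorial(x)

mu, x = 3.0, 3        # keep the mean n*p fixed while n grows
target = poisson_pmf(x, mu)
gaps = []
for n in (10, 100, 1000, 10000):
    b = binom_pmf(x, n, mu / n)
    gaps.append(abs(b - target))
    print(n, round(b, 5), round(target, 5))
```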

One example is monitoring, where a signal can be detected at any moment.

Another is the number of leaks in 1 km of water pipe. First, the number of leaks is not bounded by the length of the pipe. Second, a leak can occur anywhere along it. Thus, we use PS.

But if we divide the water pipe into 100 equal intervals and want to know how many intervals are leaking, we should use BI or HG.
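Here is a sketch of the two pipe questions side by side. It assumes, purely for illustration, a made-up rate of 2 leaks per km and leaks that occur independently and uniformly along the pipe. Under that assumption, each 10 m interval leaks at least once with probability q = 1 - exp(-2/100), and the count of leaking intervals is BI(100, q), which can never exceed 100.

```python
from math import comb, exp, factorial

RATE_PER_KM = 2.0   # hypothetical leak rate, not taken from any real data
SEGMENTS = 100      # 1 km divided into 100 equal intervals of 10 m

def poisson_pmf(x, mu):
    # Hypothetical helper: Poisson probability of x events with mean mu
    return exp(-mu) * mu**x / factorial(x)

def binom_pmf(x, n, p):
    # Hypothetical helper: x successes in n independent trials
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Question 1: how many leaks in 1 km? The count is unbounded, so PS.
p_leaks = [poisson_pmf(k, RATE_PER_KM) for k in range(6)]

# Question 2: how many of the 100 intervals leak? Each interval either leaks
# (at least once) or not, so the count is BI and is bounded by 100.
q = 1 - exp(-RATE_PER_KM / SEGMENTS)
p_segments = [binom_pmf(k, SEGMENTS, q) for k in range(6)]

for k in range(6):
    print(k, round(p_leaks[k], 4), round(p_segments[k], 4))
```

The two answers differ for k >= 1 because two leaks in the same interval count as only one leaking interval; the probability of no leaks at all is the same exp(-2) either way.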

A short summary:

sample space = N (people, cards)
pick-out size = n
target size = T (boys, hearts)
count of interest = x

HYPGEOMDIST(x,n,T,N) = Probability of x hearts in n picks, where there are T hearts among N cards, without replacement.

BINOMDIST(x,n,T/N,0) = Probability of x hearts in n picks, where there are T hearts among N cards, with replacement. (When T/N is held constant and N goes to INF, HG goes to BI.)

POISSON(x,n*T/N,0) = Probability of x hearts (events), where n*T/N is the mean count. (When n*T/N is held constant while n and N go to INF, HG or BI goes to PS.)
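For readers without Excel, here is a minimal Python sketch of the three summary formulas using the same (x, n, T, N) notation. The function names are hypothetical; they are meant to mirror the Excel calls listed above, not to be a library API.

```python
from math import comb, exp, factorial

def hypgeom(x, n, T, N):
    """x hearts in n picks without replacement; T hearts among N cards.
    Intended to match Excel HYPGEOMDIST(x, n, T, N)."""
    if x < 0 or x > n:
        return 0.0
    return comb(T, x) * comb(N - T, n - x) / comb(N, n)

def binom(x, n, T, N):
    """Same question with replacement: success probability T/N on every pick.
    Intended to match Excel BINOMDIST(x, n, T/N, 0)."""
    if x < 0 or x > n:
        return 0.0
    p = T / N
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson(x, n, T, N):
    """Count with mean n*T/N and no upper bound.
    Intended to match Excel POISSON(x, n*T/N, 0)."""
    if x < 0:
        return 0.0
    mu = n * T / N
    return exp(-mu) * mu**x / factorial(x)

# 2 hearts in 5 draws from a 52-card deck under each model
for f in (hypgeom, binom, poisson):
    print(f.__name__, round(f(2, 5, 13, 52), 4))
```

As a usage check, hypgeom(13, 52, 13, 52) returns 1.0, the "pick all 52 cards" case from the beginning.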
