Tuesday, May 6, 2014

On the statistics of rolling a dice - part 1

Assume that you have a fair dice, numbered 1 to 6. The probability that you get the side with 6 on a given roll is 1/6. If you roll the dice six times, what is the probability that you get side 6 exactly two times?
  If we line up the six rolls and label each one with the side that comes up, there are 6C2 ways to choose the two rolls on which side 6 appears. The probability of any one such arrangement is (1/6)^2 * (5/6)^4. The total probability of getting exactly two 6s is therefore 6C2 * (1/6)^2 * (5/6)^4, which is about 0.2.

Note - nCr is the number of ways of drawing r items from a basket of n items = n!/(r! * (n-r)!)
6C2 = 6!/(4! 2!) = 15.
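As a quick numerical check, the same calculation can be written in a few lines of Python (math.comb needs Python 3.8 or later):

    from math import comb

    n, k = 6, 2            # six rolls, exactly two 6s
    p = 1 / 6              # probability of rolling a 6 on a fair dice
    prob = comb(n, k) * p**k * (1 - p)**(n - k)
    print(prob)            # roughly 0.2009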


What is the probability of getting at least one 6?

         This is the sum of the probabilities of getting 1, 2, 3, 4, 5 and 6 sixes, or equivalently one minus the probability of getting no 6s at all: 1 - (5/6)^6.

What is the probability of getting at least two 6s?

      This is the sum of the probabilities of getting 2, 3, 4, 5 and 6 sixes, or equivalently one minus (the probability of getting no 6s at all + the probability of getting exactly one 6).

P(at least one 6) = 1 - (5/6)^6, which is about 0.665
P(at least two 6s) = 1 - (5/6)^6 - 6C1 * (1/6) * (5/6)^5, which is about 0.263
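These two numbers can be verified with the same kind of Python snippet as before (just one possible way to write it):

    from math import comb

    def exactly(k, n=6, p=1/6):
        # binomial probability of exactly k sixes in n rolls
        return comb(n, k) * p**k * (1 - p)**(n - k)

    p_at_least_one = 1 - (5/6)**6                  # about 0.665
    p_at_least_two = 1 - exactly(0) - exactly(1)   # about 0.263
    print(p_at_least_one, p_at_least_two)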


Distribution of the dice rolling event


The distribution of the outcomes of the rolls is a multinomial distribution. That means each roll can take 6 different outcomes, each with a probability of 1/6. If c1, c2, c3, c4, c5, c6 are the numbers of times each side appears over n rolls, then the joint probability of observing those counts is given by the multinomial distribution:

P(c1, c2, c3, c4, c5, c6) = n! / (c1! c2! c3! c4! c5! c6!) * (1/6)^n,  where n = c1 + c2 + ... + c6
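A direct translation of this formula into Python might look like the following sketch (scipy.stats.multinomial would give the same numbers):

    from math import factorial

    def multinomial_pmf(counts, probs):
        # joint probability of seeing side i exactly counts[i] times
        n = sum(counts)
        coeff = factorial(n)
        for c in counts:
            coeff //= factorial(c)
        prob = 1.0
        for c, p in zip(counts, probs):
            prob *= p**c
        return coeff * prob

    # for example, ten rolls of a fair dice with counts (2, 1, 2, 2, 1, 2)
    print(multinomial_pmf((2, 1, 2, 2, 1, 2), [1/6] * 6))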



Expected value of the outcome and central limit theorem.

If we roll the dice 60 times, what is the expected number of 6s? It is 60 * (1/6) = 10.

Now this is only an expected value; there will be variation around it. For one set of 60 rolls we might get 8 sixes, and for another set we might get 12. If we repeat such sets of rolls a very large number of times and plot the number of 6s from each set, we get a bell curve centered around the expected value of 10. This is a consequence of the central limit theorem, which says that the sum (and therefore the arithmetic mean) of a large number of independent observations is approximately normally distributed.
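A small simulation makes this concrete (numpy is used here purely for convenience):

    import numpy as np

    rng = np.random.default_rng(0)
    n_sets, rolls_per_set = 100_000, 60

    # number of 6s in each set of 60 rolls of a fair dice
    counts = (rng.integers(1, 7, size=(n_sets, rolls_per_set)) == 6).sum(axis=1)

    print(counts.mean())   # close to the expected value of 10
    print(counts.std())    # close to sqrt(60 * 1/6 * 5/6), roughly 2.89
    # a histogram of counts is approximately bell-shaped around 10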

A non-fair dice

Let us say that the dice is not fair: the probability of getting side 6 is not 1/6 but some other number, which we do not know. How do we infer the probability of getting side 6 from a large number of observations of dice rolls?

One approach is to do the dice rolling as sets of rolling events and repeat this a large number of times. Observe the number of 6s in each set and calculate the mean over all the sets. The central limit theorem tells us that this mean settles around the expected number of 6s in a set. Since we know the number of rolls in each set, the expected value divided by the size of the set gives the probability of observing side 6 on a single roll, as in the sketch below.
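Here is that idea as a small simulation, with a made-up bias of 0.3 that the code is not supposed to know in advance:

    import numpy as np

    rng = np.random.default_rng(1)
    true_p = 0.3                       # hypothetical, unknown bias towards side 6
    n_sets, rolls_per_set = 10_000, 60

    # number of 6s in each set of rolls of the biased dice
    counts = rng.binomial(rolls_per_set, true_p, size=n_sets)

    estimated_p = counts.mean() / rolls_per_set
    print(estimated_p)                 # close to 0.3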

Maximum Likelihood Estimate 

We can also make a maximum likelihood estimate of the parameter of the distribution. The maximum likelihood estimate is the parameter value that maximizes the likelihood of observing the data.

If we assume that the dice is biased towards side 6, and if p is the probability of getting side 6, then (1-p)/5 is the probability of getting each of the other sides. If X is the observed sequence of n rolls and c is the number of times we observe side 6, the likelihood of the data is

P(X|p) = p^c * ((1-p)/5)^(n-c)

To maximize this likelihood, we take the logarithm and differentiate with respect to p:

log P(X|p) = c * log(p) + (n-c) * log((1-p)/5)
d/dp log P(X|p) = c/p - (n-c)/(1-p) = 0

Solving this gives p = c/n. That is, if we observe 10 sixes out of 60 rolls, the estimated probability of getting a 6 is 10/60 = 1/6.
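A quick numerical check of this result, nothing more than a grid search over p:

    import numpy as np

    n, c = 60, 10                        # 60 rolls, 10 of them showing side 6
    p_grid = np.linspace(0.001, 0.999, 999)

    # log-likelihood of the observations for each candidate value of p
    log_lik = c * np.log(p_grid) + (n - c) * np.log((1 - p_grid) / 5)

    print(p_grid[np.argmax(log_lik)])    # approximately c/n = 1/6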


What if there is a prior belief associated with p?

If there is a prior distribution on p, that is, a probability distribution over the possible values of p, then we need to factor it in when estimating p from the observed data. Instead of calculating the p which maximizes the probability of observing the given data, we calculate the maximum a posteriori estimate, or MAP estimate, of p.

The name MAP estimate comes from the objective of maximizing the posterior of the parameter p given the data.
By Bayes' theorem, P(p|X) = P(X|p) * P(p) / P(X).

MAP estimate for p: p_MAP = argmax over p of P(X|p) * P(p) = argmax over p of [log(P(X|p)) + log(P(p))]. The denominator P(X) does not depend on p, so it can be dropped from the maximization.

Note that an additional term coming from the prior distribution now appears, and it shifts the estimate of p away from the plain maximum likelihood estimate.
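Purely as an illustration, here is the same grid search with a made-up prior on p; a Beta(2, 10) density is assumed below only as an example, not because the post prescribes any particular prior:

    import numpy as np
    from scipy.stats import beta

    n, c = 60, 10
    p_grid = np.linspace(0.001, 0.999, 999)

    log_lik = c * np.log(p_grid) + (n - c) * np.log((1 - p_grid) / 5)
    log_prior = beta.logpdf(p_grid, 2, 10)    # hypothetical prior favoring small p

    p_mle = p_grid[np.argmax(log_lik)]
    p_map = p_grid[np.argmax(log_lik + log_prior)]
    print(p_mle, p_map)    # the prior pulls the estimate from about 0.167 down towards 0.157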

To be continued.....
                                            




