Appendix E

The Binomial Distribution

Permutations and Combinations: When we flip a coin twice, we observe a particular sequence or permutation; for example, a head followed by a tail. In contrast, a head and a tail, in any order, is called a combination. A head and a tail can exist as one of two possible permutations, xht = (xh, xt) or xth = (xt, xh). The probability of observing each of these permutations is the same because the order in which heads and tails occur does not affect their probabilities. Thus,

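\begin{align}
p(x_{ht} \mid \theta) &= p(x_h \mid \theta) \times p(x_t \mid \theta) \tag{E.1} \\
&= \theta \times (1 - \theta) \tag{E.2} \\
p(x_{th} \mid \theta) &= p(x_t \mid \theta) \times p(x_h \mid \theta) \tag{E.3} \\
&= (1 - \theta) \times \theta, \tag{E.4}
\end{align}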

and if we assume θ = 0.6 then the probability of the permutations p(xht|θ) = p(xth|θ) = 0.24. As each of these two permutations occurs with probability 0.24, it follows that the probability of observing a head and a tail in any order, or equivalently, the probability of a combination that contains a head and a tail, is

\begin{equation}
p(x_{ht} \mid \theta) + p(x_{th} \mid \theta) = 0.48. \tag{E.5}
\end{equation}

We can clean up the notation by defining x as a specific combination, such as x = {xt, xh}, where this is read as one head out of two coin flips in any order, and where we use curly brackets to represent a combination. Thus, we can write

\begin{align}
p(x \mid \theta) &= p(\{x_t, x_h\} \mid \theta) \tag{E.6} \\
&= 2 \times \theta \times (1 - \theta) \tag{E.7} \\
&= 0.48, \tag{E.8}
\end{align}

where θ(1 − θ) is the probability of any one permutation that contains exactly one head and one tail, and 2 is the number of such permutations.

We can reassure ourselves that Equation E.8 is true by working out the probability of observing the three possible combinations that can result from two coin flips, namely, {xt, xt}, {xt, xh}, and {xh, xh}, given a specific value for θ. The combination {xt, xt} includes only one permutation (xt, xt), so

\begin{equation}
p(\{x_t, x_t\} \mid \theta) = p(x_t \mid \theta) \times p(x_t \mid \theta) = 0.16. \tag{E.9}
\end{equation}

Similarly, for the combination {xh, xh}, we have

\begin{equation}
p(\{x_h, x_h\} \mid \theta) = p(x_h \mid \theta) \times p(x_h \mid \theta) = 0.36. \tag{E.10}
\end{equation}

So, given that the only three possible combinations from two coin flips contain zero heads {xt, xt}, one head {xh, xt}, or two heads {xh, xh}, with probabilities

\begin{align}
p(\{x_t, x_t\} \mid \theta) &= 0.16 \tag{E.11} \\
p(\{x_h, x_h\} \mid \theta) &= 0.36 \tag{E.12} \\
p(\{x_h, x_t\} \mid \theta) &= 0.48, \tag{E.13}
\end{align}

the sum of these probabilities is one

\begin{equation}
0.16 + 0.36 + 0.48 = 1. \tag{E.14}
\end{equation}
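
The two-flip example can also be checked by brute force. The following is a minimal sketch, assuming θ = 0.6 as in the text; the names `permutation_probability` and `combination_probability` are illustrative and do not appear in the original. It enumerates every permutation of two flips, multiplies the per-flip probabilities, and adds up the permutations that share the same number of heads.

```python
from itertools import product

theta = 0.6  # coin bias assumed in the text: p(head) = theta


def permutation_probability(perm, theta):
    """Probability of one ordered sequence of 'h'/'t' outcomes."""
    p = 1.0
    for outcome in perm:
        p *= theta if outcome == "h" else (1.0 - theta)
    return p


# Sum permutation probabilities by number of heads, i.e. by combination.
combination_probability = {}
for perm in product("ht", repeat=2):
    n_heads = perm.count("h")
    combination_probability[n_heads] = (
        combination_probability.get(n_heads, 0.0)
        + permutation_probability(perm, theta)
    )

for n_heads in sorted(combination_probability):
    print(n_heads, "heads:", round(combination_probability[n_heads], 2))
print("total:", round(sum(combination_probability.values()), 2))
```

Changing `repeat=2` to a larger number of flips gives the same grouping for longer sequences, although the number of permutations doubles with every extra flip.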

The general lesson to be drawn from this simple example is as follows. To find the probability of observing a combination that contains n heads and N − n tails, first find the probability of one such permutation, and multiply this by the number of permutations that contain n heads and N − n tails, where this number is a binomial coefficient (described next).

The Binomial Coefficient: The number of permutations that contain exactly x heads amongst N coin flips is

\begin{equation}
C_{N,x} = \frac{N!}{x!\,(N - x)!}, \tag{E.15}
\end{equation}

where CN,x is a binomial coefficient and is pronounced "N choose x". If a coin is flipped N = 10 times and the number of heads is x = 7 then the number of permutations that contain x = 7 heads and N − x = 3 tails is CN,x = 120. If a coin has a bias θ = 0.7 then the probability of obtaining any one of these permutations, such as x7 = (xh, xh, xh, xh, xh, xh, xh, xt, xt, xt), is

\begin{equation}
p(x_7 \mid \theta) = \theta^{7} (1 - \theta)^{3} = 2.223 \times 10^{-3}. \tag{E.16}
\end{equation}

Given that the probability of obtaining any other permutation with x = 7 heads and N − x = 3 tails is also 2.223 × 10^-3, and that there are CN,x such permutations, it follows that the probability of obtaining x heads and N − x tails in any order (i.e. a combination) is

\begin{align}
p(x \mid \theta) &= C_{N,x} \, p(x_7 \mid \theta) \tag{E.17} \\
&= 120 \times (2.223 \times 10^{-3}) \tag{E.18} \\
&= 0.267. \tag{E.19}
\end{align}
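
As a numerical check of this worked example, the sketch below counts the permutations with exactly x = 7 heads in N = 10 flips by listing which flips are heads, compares that count with Python's built-in `math.comb`, and then multiplies by the probability of a single permutation. The variable names are illustrative and are not taken from the text.

```python
from itertools import combinations
from math import comb

N, x, theta = 10, 7, 0.7  # values used in the worked example

# Each way of choosing which x of the N flips are heads is one permutation.
n_permutations = sum(1 for _ in combinations(range(N), x))
assert n_permutations == comb(N, x)  # agrees with N! / (x! (N - x)!)

# Probability of any single permutation with x heads and N - x tails.
p_one_permutation = theta**x * (1 - theta) ** (N - x)

# Probability of x heads in any order (a combination).
p_combination = n_permutations * p_one_permutation

print("number of permutations:", n_permutations)
print("p(one permutation):", p_one_permutation)
print("p(combination):", round(p_combination, 3))
```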

The Binomial Distribution: If we flip this coin N times then the probability of observing x heads given a coin bias θ is

\begin{equation}
p(x \mid \theta, N) = C_{N,x} \, \theta^{x} (1 - \theta)^{N - x}. \tag{E.20}
\end{equation}


Figure E.1: The binomial distribution. Given a coin with a bias of θ = 0.7, which is flipped N = 10 times, the probability p(x|θ) of obtaining different numbers x of heads defines a binomial distribution.

The binomial distribution for N = 10 and θ = 0.7 is shown in Figure E.1 (see Section 4.1, p74).
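
The probabilities that make up such a distribution can be tabulated directly from the binomial formula. The sketch below assumes N = 10 and θ = 0.7 as in the text; the function name `binomial_pmf` is an illustrative choice, not one used in the original.

```python
from math import comb


def binomial_pmf(x, N, theta):
    """Probability of x heads in N flips of a coin with bias theta."""
    return comb(N, x) * theta**x * (1 - theta) ** (N - x)


N, theta = 10, 0.7
pmf = [binomial_pmf(x, N, theta) for x in range(N + 1)]

# One row per possible number of heads.
for x, p in enumerate(pmf):
    print(f"x = {x:2d}  p = {p:.4f}")

print("sum of probabilities:", round(sum(pmf), 6))
```

The probabilities over x = 0, ..., N sum to one, and for these values the largest probability falls at x = 7.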

The binomial distribution is a cornerstone in the analysis of binary events. As the number N of trials (e.g. coin flips) increases, given some mild assumptions, the binomial distribution becomes increasingly like the Gaussian distribution (see Chapter 5 and Appendix F). Because the Gaussian distribution is mathematically convenient, this approximation allows the binomial distribution to be replaced with the Gaussian distribution in a wide variety of contexts.
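
The convergence towards a Gaussian can be illustrated numerically. The sketch below rests on an assumption not stated in this appendix: that the matching Gaussian has mean Nθ and variance Nθ(1 − θ), which are the standard mean and variance of a binomial distribution. The function names are illustrative. For each N it reports the largest absolute difference between the binomial probability and the Gaussian density evaluated at x = 0, ..., N.

```python
from math import comb, exp, pi, sqrt


def binomial_pmf(x, N, theta):
    """Probability of x heads in N flips of a coin with bias theta."""
    return comb(N, x) * theta**x * (1 - theta) ** (N - x)


def gaussian_pdf(x, mean, var):
    """Gaussian density with the given mean and variance."""
    return exp(-((x - mean) ** 2) / (2 * var)) / sqrt(2 * pi * var)


def max_abs_gap(N, theta):
    """Largest |binomial - Gaussian| over x = 0..N.

    Assumption: the Gaussian is matched to mean N*theta and
    variance N*theta*(1 - theta).
    """
    mean = N * theta
    var = N * theta * (1 - theta)
    return max(
        abs(binomial_pmf(x, N, theta) - gaussian_pdf(x, mean, var))
        for x in range(N + 1)
    )


theta = 0.7
gaps = {N: max_abs_gap(N, theta) for N in (10, 100, 1000)}
for N, gap in gaps.items():
    print(f"N = {N:4d}  max gap = {gap:.5f}")
```

The gap shrinks as N grows, which is one informal way to see why the Gaussian is often used in place of the binomial for large N.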