## Reference Priors

The question of what constitutes an unbiased or fair prior has several answers. Here, we provide a brief account of the answer given by Bernardo (1979), who called them reference priors.

Reference priors rely on the idea of mutual information. In essence, the mutual information between two variables is a measure of how tightly coupled they are, and can be considered to be a general measure of the statistical dependence between variables (unlike the correlation coefficient, it is not restricted to linear relationships). More formally, it is the average amount of Shannon information conveyed about one variable by the other variable. For our purposes, we note that the mutual information I(x, θ) between x and θ is also the average, over values of x, of the difference between the posterior p(θ|x) and the prior p(θ), where this difference is measured as the Kullback-Leibler divergence. A reference prior is defined as that particular prior which makes the mutual information between x and θ as large as possible, and (equivalently) maximises the average Kullback-Leibler divergence between the posterior and the prior.
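To make these definitions concrete, here is a small discrete sketch built on a hypothetical likelihood table `lik[i][j] = p(x_j | θ_i)`. It computes I(x, θ) directly from the joint distribution and again as the average Kullback-Leibler divergence between posterior and prior, so the two routes can be compared. It then searches for the prior that maximises I(x, θ) using the Blahut-Arimoto algorithm, an iterative method borrowed from information theory rather than one described in this section. This matches only the simplified definition given here (a single observation and finitely many values of θ); Bernardo's full construction takes a limit over many repeated observations, which the sketch ignores. All function names are illustrative.

```python
import math

def mutual_information(prior, lik):
    """I(x, theta) in nats, from the joint p(x, theta) = p(theta) p(x | theta).

    prior[i] = p(theta_i) and lik[i][j] = p(x_j | theta_i).
    """
    n_theta, n_x = len(prior), len(lik[0])
    px = [sum(prior[i] * lik[i][j] for i in range(n_theta)) for j in range(n_x)]
    total = 0.0
    for i in range(n_theta):
        for j in range(n_x):
            joint = prior[i] * lik[i][j]
            if joint > 0:  # zero-probability terms contribute nothing
                total += joint * math.log(joint / (prior[i] * px[j]))
    return total


def expected_kl_posterior_prior(prior, lik):
    """Average over x of KL(p(theta | x) || p(theta)), in nats."""
    n_theta, n_x = len(prior), len(lik[0])
    total = 0.0
    for j in range(n_x):
        px = sum(prior[i] * lik[i][j] for i in range(n_theta))
        if px == 0:
            continue
        for i in range(n_theta):
            post = prior[i] * lik[i][j] / px  # Bayes' rule
            if post > 0:
                total += px * post * math.log(post / prior[i])
    return total


def blahut_arimoto(lik, max_iter=1000, tol=1e-12):
    """Prior over theta that maximises I(x, theta) for a fixed likelihood table."""
    n_theta, n_x = len(lik), len(lik[0])
    r = [1.0 / n_theta] * n_theta  # start from a uniform prior
    for _ in range(max_iter):
        q = [sum(r[i] * lik[i][j] for i in range(n_theta)) for j in range(n_x)]
        # d[i] = KL(p(x | theta_i) || p(x)); values of theta with larger d gain weight
        d = [sum(lik[i][j] * math.log(lik[i][j] / q[j])
                 for j in range(n_x) if lik[i][j] > 0)
             for i in range(n_theta)]
        w = [r[i] * math.exp(d[i]) for i in range(n_theta)]
        new_r = [wi / sum(w) for wi in w]
        if max(abs(a - b) for a, b in zip(new_r, r)) < tol:
            return new_r
        r = new_r
    return r


# Hypothetical likelihood: theta_0 always gives x_0; theta_1 gives x_0 or x_1 equally often.
lik = [[1.0, 0.0],
       [0.5, 0.5]]
prior = [0.3, 0.7]
print(mutual_information(prior, lik), expected_kl_posterior_prior(prior, lik))
best = blahut_arimoto(lik)
print(best, mutual_information(best, lik))
```

For this particular likelihood the maximising prior is not uniform: it puts probability 0.4 on θ_1, and the largest attainable mutual information is ln 1.25 ≈ 0.223 nats. Which prior maximises I(x, θ) depends entirely on the likelihood, which is why reference priors are derived afresh for each kind of parameter.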

What has this to do with fair priors? A defining, and useful, feature of mutual information is that it is invariant under invertible transformations of the variables. For example, if a measurement device adds a constant amount k to each reading, so that we measure x as y = x + k, then the mean θ becomes ϕ = θ + k, where θ and ϕ are location parameters. Despite the addition of k to measured values, the mutual information between ϕ and y remains the same as the mutual information between θ and x; that is, I(y, ϕ) = I(x, θ). Thus, the fairness of a prior (defined in terms of transformation invariance) is guaranteed if we choose a common prior for θ and ϕ which ensures that I(y, ϕ) = I(x, θ). Indeed, it is possible to harness this equality to derive priors which have precisely the desired invariance. It can be shown that the only prior that satisfies this equality for a location parameter (such as the mean) is the uniform prior.

As a more concrete example, suppose we wish to estimate the length θ of a table, based on many noisy measurements of x inches each. If we accidentally included a blank part at the beginning of the ruler, which has length k inches, then each measurement would be y = x + k inches, and the mean would be ϕ = θ + k inches. Whichever prior p(θ) we use for the mean θ of x, the corresponding prior p(ϕ) for the mean ϕ of y = x + k should remain fair, regardless of the accidental offset k. As stated above, the only prior that guarantees this is the uniform prior.
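The table example can be checked numerically. The sketch below assumes Gaussian measurement noise with a known standard deviation of 0.5 inches, and invents five readings and an offset k = 2 inches purely for illustration; posterior means are computed on a fine grid. As the argument requires, the same prior function is used for θ and for ϕ. With a uniform prior, the posterior mean of ϕ should be exactly the posterior mean of θ shifted by k; with a Gaussian prior centred on 30 inches it should not, because that prior pulls both estimates towards the same fixed value.

```python
import math

def posterior_mean(data, grid, log_prior, noise_sd):
    """Posterior mean of a location parameter, computed on a discrete grid.

    Assumes each reading is Gaussian around the parameter with known standard
    deviation noise_sd; the finite grid truncates an improper uniform prior.
    """
    log_post = [log_prior(g) + sum(-0.5 * ((d - g) / noise_sd) ** 2 for d in data)
                for g in grid]
    top = max(log_post)  # subtract the maximum before exponentiating to avoid underflow
    w = [math.exp(lp - top) for lp in log_post]
    return sum(g * wi for g, wi in zip(grid, w)) / sum(w)


def log_flat_prior(v):
    return 0.0  # log of a constant (uniform) prior


def log_gaussian_prior(v, centre=30.0, sd=2.0):
    return -0.5 * ((v - centre) / sd) ** 2  # log of a Gaussian prior, up to a constant


x = [29.8, 30.4, 30.1, 29.7, 30.3]      # hypothetical readings in inches
k = 2.0                                 # hypothetical blank length at the start of the ruler
y = [xi + k for xi in x]                # the offset readings y = x + k
theta_grid = [20.0 + 0.01 * i for i in range(2501)]
phi_grid = [t + k for t in theta_grid]  # grid for phi, shifted along with the data

shifts = {}
for name, log_prior in [("uniform", log_flat_prior), ("Gaussian", log_gaussian_prior)]:
    m_theta = posterior_mean(x, theta_grid, log_prior, noise_sd=0.5)
    m_phi = posterior_mean(y, phi_grid, log_prior, noise_sd=0.5)  # same prior, now for phi
    shifts[name] = m_phi - m_theta
    print(f"{name}: E[phi | y] - E[theta | x] = {shifts[name]:.4f}, k = {k}")
```

In this setup the Gaussian prior shifts the estimate by roughly 1.975 inches rather than 2, so the offset leaks into the inference. This illustrates why a non-uniform prior is unfair under an offset, but a check on one example does not by itself prove that the uniform prior is the only one that works.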

As a further example, if a measurement device multiplies each reading by a constant c (with c > 0), then we would measure x as z = cx. This could occur because the tape measure used is made of a material that has stretched (so that each measurement is in error by a constant proportion), or it could be because we do not know whether lengths were measured in inches or feet. In either case, we would be ignorant of the scale of the measurements. If we define σ to be the standard deviation of x, then the parameter σ would be transformed to ψ = cσ, where σ and ψ are scale parameters. As in the previous example, if a prior is fair for σ then it should remain fair for ψ = cσ, so the prior for ψ implied by p(σ) should have the same form as p(σ) itself; allowing for the change of variables, this requires p(cσ) = p(σ)/c for any value of c. It can be shown that the only prior that satisfies the equality I(z, ψ) = I(x, σ) for a scale parameter is p(σ) = 1/σ, which is therefore the reference prior.
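The scale condition can be spot-checked numerically. The sketch below uses the change-of-variables rule p_ψ(ψ) = p_σ(ψ/c)/c, valid for c > 0, to compute the prior on ψ implied by a candidate prior on σ, and asks whether it is the same function as the original. The helper names and the test values of c and σ are arbitrary illustrative choices (c = 12 corresponds to lengths recorded in inches when feet were assumed). Checking a handful of points is not a proof of uniqueness; it simply shows that 1/σ passes where two other simple candidates fail.

```python
import math

def induced_density(prior, c):
    """Density of psi = c * sigma when sigma has density `prior` and c > 0.

    Change of variables: p_psi(psi) = p_sigma(psi / c) / c.
    """
    return lambda psi: prior(psi / c) / c


def keeps_its_form(prior, scales=(0.5, 2.0, 12.0), points=(0.1, 1.0, 3.0, 40.0)):
    """Spot-check whether the induced prior on psi equals the original function."""
    for c in scales:
        p_psi = induced_density(prior, c)
        if not all(math.isclose(p_psi(s), prior(s), rel_tol=1e-12) for s in points):
            return False
    return True


candidates = {
    "1/sigma": lambda s: 1.0 / s,
    "uniform": lambda s: 1.0,
    "1/sigma^2": lambda s: 1.0 / s ** 2,
}
results = {name: keeps_its_form(p) for name, p in candidates.items()}
print(results)
```

The uniform prior fails because rescaling spreads its mass by a factor c, and 1/σ² fails because it over-corrects for that spreading. Equivalently, 1/σ is uniform in log σ, so multiplying σ by c merely shifts log σ by log c, which turns the scale problem into the location problem of the previous example.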

Of course, if it is obvious what the correct prior is then, for the sake of consistency, this prior should also be a reference prior. For example, for the joint pdf p(x, θ), the marginal pdf p(θ) is, by definition, the correct prior (see Chapter 6). Crucially, it can be shown that this marginal pdf of p(x, θ) is indeed the pdf that makes the mutual information between x and θ as large as possible, and is therefore the reference prior for θ.