**Bayes’ rule** Given some observed data *x*, the posterior probability that the parameter Θ has the value *θ* is *p*(*θ*|*x*) = *p*(*x*|*θ*)*p*(*θ*)/*p*(*x*), where *p*(*x*|*θ*) is the likelihood, *p*(*θ*) is the prior probability of the value *θ*, and *p*(*x*) is the marginal probability of the value *x*.
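
As a minimal sketch of how this rule is applied, the following Python fragment computes a posterior over three candidate coin biases after one observed head. The candidate values, the uniform prior, and the function name `posterior_given_head` are illustrative assumptions, not taken from the text.

```python
# Hypothetical setup: three candidate coin biases with a uniform prior.
thetas = [0.2, 0.5, 0.8]
prior = [1 / 3, 1 / 3, 1 / 3]


def posterior_given_head(thetas, prior):
    """Apply Bayes' rule for a single observed head."""
    # The likelihood p(x_h | theta) of a head is the bias theta itself.
    likelihood = list(thetas)
    # Numerator of Bayes' rule: p(x | theta) * p(theta) for each theta.
    joint = [l * p for l, p in zip(likelihood, prior)]
    # Marginal probability p(x): sum of the joint over all theta values.
    marginal = sum(joint)
    return [j / marginal for j in joint]


post = posterior_given_head(thetas, prior)
for theta, p in zip(thetas, post):
    print(f"p(theta={theta} | head) = {p:.4f}")
```

Dividing by the marginal probability ensures that the posterior probabilities sum to one.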

**conditional probability** The probability that one random variable Θ has the value *θ* given that another random variable *X* has the value *x*; written as *p*(Θ = *θ*|*X* = *x*) or *p*(*θ*|*x*).

**forward probability** Reasoning forwards from the known value of a parameter to the probability of some event defines the forward probability of that event. For example, if a coin has a bias of *θ* then the forward probability $p(x_h|\theta)$ of observing a head $x_h$ is *θ*.

**independence** If two variables *X* and Θ are independent then the value *x* of *X* provides no information regarding the value *θ* of the other variable Θ, and vice versa.

**inverse probability** Reasoning backwards from an observed measurement $x_h$ (eg coin flip) involves finding the posterior or inverse probability $p(\theta|x_h)$ of an unobserved parameter *θ* (eg coin bias).

**joint probability** The probability that two or more quantities simultaneously adopt specified values. For example, the probability that a coin flip yields a head $x_h$ and that a (possibly different) coin has a bias *θ* is the joint probability $p(x_h, \theta)$.

**likelihood** The conditional probability *p*(*x*|*θ*) that the observed data *X* has the value *x* given a putative parameter value *θ* is the likelihood of *θ*, and is often written as *L*(*θ*|*x*). When considered over all values Θ of *θ*, *p*(*x*|Θ) defines a *likelihood function*.

**marginal distribution** The distribution that results from marginalisation of a multivariate (eg 2D) distribution. For example, the 2D distribution *p*(*X*, Θ) shown in Figure 3.4 has two marginal distributions, which, in this case, are the prior distribution *p*(Θ) and the distribution of marginal likelihoods *p*(*X*).
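
Marginalisation can be sketched in a few lines of Python. The joint table below is invented for illustration (it is not the distribution of Figure 3.4); rows index values of *X* and columns index values of Θ.

```python
# Hypothetical joint distribution p(X, Theta).
# Each row is one value of x, each column one value of theta.
joint = [
    [0.10, 0.20, 0.10],  # x = 0
    [0.05, 0.25, 0.30],  # x = 1
]

# Marginal p(X): sum each row over all values of theta.
p_x = [sum(row) for row in joint]

# Marginal p(Theta): sum each column over all values of x.
p_theta = [sum(col) for col in zip(*joint)]

print("p(X)     =", [round(p, 2) for p in p_x])
print("p(Theta) =", [round(p, 2) for p in p_theta])
```

Each marginal distribution sums to one because the joint table does.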

**maximum a posteriori (MAP)** Given some observed data *x*, the value $\theta_{MAP}$ of an unknown parameter Θ that makes the posterior probability *p*(*θ*|*x*) as large as possible.

**maximum likelihood estimate (MLE)** Given some observed data *x*, the value $\theta_{MLE}$ of an unknown parameter Θ that makes the likelihood function *p*(*x*|*θ*) as large as possible.
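
The difference between the maximum likelihood and maximum a posteriori estimates can be illustrated with a simple grid search over coin bias. This is only a sketch: the data (7 heads in 10 flips), the grid resolution, and the unnormalised prior *θ*(1 − *θ*) are all assumptions chosen for the example.

```python
heads, flips = 7, 10  # hypothetical observed data

# Candidate bias values on a regular grid from 0 to 1.
thetas = [i / 1000 for i in range(1001)]


def likelihood(theta):
    # Binomial likelihood of the data, omitting the constant binomial
    # coefficient, which does not affect the location of the maximum.
    return theta**heads * (1 - theta) ** (flips - heads)


def prior(theta):
    # Assumed unnormalised prior that favours fair coins.
    return theta * (1 - theta)


# MLE: the grid value that maximises the likelihood alone.
theta_mle = max(thetas, key=likelihood)

# MAP: the grid value that maximises likelihood times prior, which is
# proportional to the posterior.
theta_map = max(thetas, key=lambda t: likelihood(t) * prior(t))

print("MLE:", theta_mle)
print("MAP:", theta_map)
```

With this prior the MAP estimate is pulled from the MLE towards 0.5; with a uniform prior the two estimates would coincide.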

**noise** Usually considered to be the random jitter that is part of a measured quantity.

**non-informative prior** See reference prior, and Section 4.8.

**parameter** A variable (often a random variable) that is part of an equation which, in turn, acts as a model for observed data.

**posterior** The posterior probability *p*(*θ*|*x*) is the probability that a parameter Θ has the value *θ*, based on current evidence (data, *x*) and prior knowledge. When considered over all values of *θ*, it refers to the posterior probability distribution *p*(Θ|*x*).

**prior** The prior probability *p*(*θ*) is the probability that the random variable Θ adopts the value *θ*. When considered over all values Θ, it is the prior probability distribution *p*(Θ).

**probability** There are many definitions of probability. The two main ones are (using coin bias as an example): 1) Bayesian probability: an observer’s estimate of the probability that a coin will land heads up is based on all the information the observer has, including the proportion of times it was observed to land heads up in the past. 2) Frequentist probability: the probability that a coin will land heads up is given by the proportion of times it lands heads up, when measured over a large number of coin flips.

**probability density function (pdf)** The function *p*(Θ) of a continuous random variable Θ defines the probability density of each possible value of Θ. The probability that Θ lies within a small interval of width *dθ* around *θ* is the product *p*(*θ*) × *dθ*, so the probability that Θ = *θ* is often treated loosely as the probability density *p*(*θ*).
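
The relation between density and probability can be checked numerically. The sketch below assumes a simple density *p*(*θ*) = 2*θ* on the interval from 0 to 1, chosen only because its exact interval probabilities are easy to compute.

```python
def density(theta):
    # Assumed example pdf: p(theta) = 2 * theta for 0 <= theta <= 1.
    return 2 * theta


def cdf(theta):
    # Integral of the density from 0 to theta, which is theta squared.
    return theta**2


theta, d_theta = 0.5, 0.01

# Approximate probability of a small interval: density times width.
approx = density(theta) * d_theta

# Exact probability of the interval [theta, theta + d_theta].
exact = cdf(theta + d_theta) - cdf(theta)

print(f"p(theta) * d_theta = {approx:.6f}")
print(f"exact probability  = {exact:.6f}")
```

The two values agree more closely as `d_theta` shrinks, which is why the density alone is not a probability.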

**probability distribution** The distribution of probabilities of different values of a variable. The probability distribution of a continuous variable is a *probability density function*, and the probability distribution of a discrete variable is a *probability function*. When we refer to a case which includes either continuous or discrete variables, we use the term *probability distribution* in this text.

**probability function (pf)** A function *p*(Θ) of a discrete random variable Θ defines the probability of each possible value of Θ. The probability that Θ = *θ* is *p*(Θ = *θ*) or more succinctly *p*(*θ*). This is called a *probability mass function* (pmf) in some texts.

**product rule** The joint probability *p*(*x*, *θ*) is given by the product of the conditional probability *p*(*x*|*θ*) and the probability *p*(*θ*); that is, *p*(*x*, *θ*) = *p*(*x*|*θ*)*p*(*θ*). See Appendix C.

**random variable (RV)** Each value of a random variable can be considered as one possible outcome of an experiment that has a number of different possible outcomes, such as the throw of a die. The set of possible outcomes is the sample space of a random variable. A discrete random variable has a probability function (pf), which assigns a probability to each possible value. A continuous random variable has a probability density function (pdf), which assigns a probability density to each possible value. Upper case letters (eg *X*) refer to random variables, and (depending on context) to the set of all possible values of that variable. See Section 2.1, p29.

**real number** A number that can have any value corresponding to the length of a continuous line.

**regression** A technique used to fit a parametric curve (eg a straight line) to a set of data points.
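
As an illustration, a straight line can be fitted by ordinary least squares, which is one common fitting criterion (assumed here; the definition above does not specify one). The data points and the function name `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data points lying roughly on a line.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]

slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.2f} x + {intercept:.2f}")
```

Here the slope and intercept are the parameters of the model, and the data points are the observed data.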

**reference prior** A prior distribution that is ‘fair’. See Section 4.8, p91 and Appendix H, p157.

**standard deviation** The standard deviation of a variable is a measure of how ‘spread out’ its values are. If we have a sample of *n* values of a variable *x* then the standard deviation of our sample is

$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\overline{x})^{2}},$

where $\overline{x}$ is the mean of our sample. The sample’s variance is $\sigma^{2}$.
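
A direct translation of this formula into Python is sketched below. The function name `sample_sd` and the data values are illustrative assumptions.

```python
import math


def sample_sd(values):
    """Standard deviation using the 1/n form of the formula."""
    n = len(values)
    mean = sum(values) / n
    # Mean of the squared deviations from the mean is the variance.
    variance = sum((v - mean) ** 2 for v in values) / n
    return math.sqrt(variance)


values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
sd = sample_sd(values)
print("standard deviation:", sd)
print("variance:", sd**2)
```

Note that this uses a divisor of *n*, as in the formula above. Python's `statistics.pstdev` uses the same divisor, whereas `statistics.stdev` divides by *n* − 1.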

**sum rule** This states that the probability *p*(*x*) that *X* = *x* is the sum of joint probabilities *p*(*x*, Θ), where this sum is taken over all *N* possible values of Θ,

$p(x) = \sum_{i=1}^{N} p(x,\theta_{i}).$

Also known as the law of total probability. See Appendix C.

**variable** A variable is essentially a ‘container’, usually for one number. We use the lower case (eg *x*) to refer to a particular value of a variable.