# Prior distributions¶

One important aspect of Bayesian inference has not yet been discussed in
this tutorial: prior
distributions. In
Bayesian statistics, one has to provide probability (density) values for
every possible parameter value *before* taking into account the data at
hand. This prior distribution thus reflects all *prior* knowledge of the
system that is to be investigated. In the case that no prior knowledge
is available, a *non-informative* prior in the form of the so-called
Jeffreys prior allows
to minimize the effect of the prior on the results. The next two
sub-sections discuss how one can set custom prior distributions for the
parameters of the observation model and for hyper-parameters in a
hyper-study or change-point study.

```
In [1]:
```

```
%matplotlib inline
import matplotlib.pyplot as plt # plotting
import seaborn as sns # nicer plots
sns.set_style('whitegrid') # plot styling
import numpy as np
import bayesloop as bl
# prepare study for coal mining data
S = bl.Study()
S.loadExampleData()
```

```
+ Created new study.
+ Successfully imported example data.
```

## Parameter prior¶

*bayesloop* employs a forward-backward algorithm that is based on
Hidden Markov models.
This inference algorithm iteratively produces a parameter distribution
for each time step, but it has to start these iterations from a
specified probability distribution - the parameter prior. All built-in
observation models already have a predefined prior, stored in the
attribute `prior`

. Here, the prior distribution is stored as a Python
function that takes as many arguments as there are parameters in the
observation model. The prior distributions can be looked up directly
within `observationModels.py`

. For the `Poisson`

model discussed in
this tutorial, the default prior distribution is defined in a method
called `jeffreys`

as

```
def jeffreys(x):
return np.sqrt(1. / x)
```

corresponding to the non-informative Jeffreys prior, \(p(\lambda) \propto 1/\sqrt{\lambda}\). This type of prior can also be determined automatically for arbitrary user-defined observation models, see here.

### Prior functions and arrays¶

To change the predefined prior of a given observation model, one can add
the keyword argument `prior`

when defining an observation model. There
are different ways of defining a parameter prior in *bayesloop*: If
`prior=None`

is set, *bayesloop* will assign equal probability to all
parameter values, resulting in a uniform prior distribution within the
specified parameter boundaries. One can also directly supply a Numpy
array with prior probability (density) values. The shape of the array
must match the shape of the parameter grid! Another way to define a
custom prior is to provide a function that takes exactly as many
arguments as there are parameters in the defined observation model.
*bayesloop* will then evaluate the function for all parameter values and
assign the corresponding probability values.

**Note:** In all of the cases described above, *bayesloop* will
re-normalize the provided prior values, so they do not need to be passed
in a normalized form. Below, we describe the possibility of using
probability distributions from the SymPy stats module as prior
distributions, which are not re-normalized by *bayesloop*.

Next, we illustrate the difference between the Jeffreys prior and a
flat, uniform prior with a very simple inference example: We fit the
coal mining example data set using the `Poisson`

observation model and
further assume the rate parameter to be static:

```
In [2]:
```

```
# we assume a static rate parameter for simplicity
S.set(bl.tm.Static())
print 'Fit with built-in Jeffreys prior:'
S.set(bl.om.Poisson('accident_rate', bl.oint(0, 6, 1000)))
S.fit()
jeffreys_mean = S.getParameterMeanValues('accident_rate')[0]
print('-----\n')
print 'Fit with custom flat prior:'
S.set(bl.om.Poisson('accident_rate', bl.oint(0, 6, 1000),
prior=lambda x: 1.))
# alternatives: prior=None, prior=np.ones(1000)
S.fit()
flat_mean = S.getParameterMeanValues('accident_rate')[0]
```

```
+ Transition model: Static/constant parameter values. Hyper-Parameter(s): []
Fit with built-in Jeffreys prior:
+ Observation model: Poisson. Parameter(s): ['accident_rate']
+ Started new fit:
+ Formatted data.
+ Set prior (function): jeffreys. Values have been re-normalized.
+ Finished forward pass.
+ Log10-evidence: -88.00564
+ Finished backward pass.
+ Computed mean parameter values.
-----
Fit with custom flat prior:
+ Observation model: Poisson. Parameter(s): ['accident_rate']
+ Started new fit:
+ Formatted data.
+ Set prior (function): <lambda>. Values have been re-normalized.
+ Finished forward pass.
+ Log10-evidence: -87.98915
+ Finished backward pass.
+ Computed mean parameter values.
```

First note that the model evidence indeed slightly changes due to the
different choices of the parameter prior. Second, one may notice that
the posterior mean value of the flat-prior-fit does not exactly match
the arithmetic mean of the data. This small deviation shows that a
flat/uniform prior is not completely non-informative for a Poisson
model! The fit using the Jeffreys prior, however, succeeds in
reproducing the *frequentist* estimate, i.e. the arithmetic mean:

```
In [3]:
```

```
print('arithmetic mean = {}'.format(np.mean(S.rawData)))
print('flat-prior mean = {}'.format(flat_mean))
print('Jeffreys prior mean = {}'.format(jeffreys_mean))
```

```
arithmetic mean = 1.69090909091
flat-prior mean = 1.7
Jeffreys prior mean = 1.69090909091
```

### SymPy prior¶

The second option is based on the
SymPy module that introduces
symbolic mathematics to Python. Its sub-module
sympy.stats covers a
wide range of discrete and continuous random variables. The keyword
argument `prior`

also accepts a list of `sympy.stats`

random
variables, one for each parameter (if there is only one parameter, the
list can be omitted). The multiplicative joint probability density of
these random variables is then used as the prior distribution. The
following example defines an exponential prior for the `Poisson`

model, favoring small values of the rate parameter:

```
In [4]:
```

```
import sympy.stats
S.set(bl.om.Poisson('accident_rate', bl.oint(0, 6, 1000),
prior=sympy.stats.Exponential('expon', 1)))
S.fit()
```

```
+ Observation model: Poisson. Parameter(s): ['accident_rate']
+ Started new fit:
+ Formatted data.
+ Set prior (sympy): exp(-x)
+ Finished forward pass.
+ Log10-evidence: -87.94640
+ Finished backward pass.
+ Computed mean parameter values.
```

Note that one needs to assign a name to each `sympy.stats`

variable.
In this case, the output of *bayesloop* shows the mathematical formula
that defines the prior. This is possible because of the symbolic
representation of the prior by `SymPy`

.

**Note:** The support interval of a prior distribution defined via SymPy
can deviate from the parameter interval specified in *bayesloop*. In the
example above, we specified the parameter interval ]0, 6[, while the
exponential prior has the support ]0, \(\infty\)[. SymPy priors
are not re-normalized with respect to the specified parameter interval.
Be aware that the resulting model evidence value will only be correct if
no parameter values outside of the parameter boundaries gain significant
probability values. In most cases, one can simply check whether the
parameter distribution has sufficiently *fallen off* at the parameter
boundaries.

## Hyper-parameter priors¶

As shown before, hyper-studies and change-point
studies can be used to determine the full
distribution of hyper-parameters (the parameters of the transition
model). As for the time-varying parameters of the observation model, one
might have prior knowledge about the values of certain hyper-parameters
that can be included into the study to refine the resulting distribution
of these hyper-parameters. Hyper-parameter priors can be defined just as
regular priors, either by an arbitrary function or by a list of
`sympy.stats`

random variables.

In a first example, we return to the simple change-point model of the coal-mining data set and perform to fits of the change-point: first, we specify no hyper-prior for the time step of our change-point, assuming equal probability for each year in our data set. Second, we define a Normal distribution around the year 1920 with a (rather unrealistic) standard deviation of 5 years as the hyper-prior using a SymPy random variable. For both fits, we plot the change-point distribution to show the differences induced by the different priors:

```
In [5]:
```

```
print 'Fit with flat hyper-prior:'
S = bl.ChangepointStudy()
S.loadExampleData()
L = bl.om.Poisson('accident_rate', bl.oint(0, 6, 1000))
T = bl.tm.ChangePoint('tChange', 'all')
S.set(L, T)
S.fit()
plt.figure(figsize=(8,4))
S.plot('tChange', facecolor='g', alpha=0.7)
plt.xlim([1870, 1930])
plt.show()
print('-----\n')
print 'Fit with custom normal prior:'
T = bl.tm.ChangePoint('tChange', 'all', prior=sympy.stats.Normal('norm', 1920, 5))
S.set(T)
S.fit()
plt.figure(figsize=(8,4))
S.plot('tChange', facecolor='g', alpha=0.7)
plt.xlim([1870, 1930]);
```

```
Fit with flat hyper-prior:
+ Created new study.
--> Hyper-study
--> Change-point analysis
+ Successfully imported example data.
+ Observation model: Poisson. Parameter(s): ['accident_rate']
+ Transition model: Change-point. Hyper-Parameter(s): ['tChange']
+ Detected 1 change-point(s) in transition model: ['tChange']
+ Set hyper-prior(s): ['uniform']
+ Started new fit.
+ 109 analyses to run.
+ Computed average posterior sequence
+ Computed hyper-parameter distribution
+ Log10-evidence of average model: -75.71555
+ Computed local evidence of average model
+ Computed mean parameter values.
+ Finished fit.
```

```
-----
Fit with custom normal prior:
+ Transition model: Change-point. Hyper-Parameter(s): ['tChange']
+ Detected 1 change-point(s) in transition model: ['tChange']
+ Set hyper-prior(s): ['sqrt(2)*exp(-(x - 1920)**2/50)/(10*sqrt(pi))']
+ Started new fit.
+ 109 analyses to run.
+ Computed average posterior sequence
+ Computed hyper-parameter distribution
+ Log10-evidence of average model: -80.50692
+ Computed local evidence of average model
+ Computed mean parameter values.
+ Finished fit.
```

Since we used a quite narrow prior (containing a lot of information) in the second case, the resulting distribution is strongly shifted towards the prior. The following example revisits the two break-point-model from here and a linear decrease with a varying slope as a hyper-parameter. Here, we define a Gaussian prior for the slope hyper-parameter, which is centered around the value -0.2 with a standard deviation of 0.4, via a lambda-function. For simplification, we set the break-points to fixed years.

```
In [6]:
```

```
S = bl.HyperStudy()
S.loadExampleData()
L = bl.om.Poisson('accident_rate', bl.oint(0, 6, 1000))
T = bl.tm.SerialTransitionModel(bl.tm.Static(),
bl.tm.BreakPoint('t_1', 1880),
bl.tm.Deterministic(lambda t, slope=np.linspace(-2.0, 0.0, 30): t*slope,
target='accident_rate',
prior=lambda slope: np.exp(-0.5*((slope + 0.2)/(2*0.4))**2)/0.4),
bl.tm.BreakPoint('t_2', 1900),
bl.tm.Static()
)
S.set(L, T)
S.fit()
```

```
+ Created new study.
--> Hyper-study
+ Successfully imported example data.
+ Observation model: Poisson. Parameter(s): ['accident_rate']
+ Transition model: Serial transition model. Hyper-Parameter(s): ['slope', 't_1', 't_2']
+ Set hyper-prior(s): ['<lambda> (re-normalized)', 'uniform', 'uniform']
+ Started new fit.
+ 30 analyses to run.
+ Computed average posterior sequence
+ Computed hyper-parameter distribution
+ Log10-evidence of average model: -74.84129
+ Computed local evidence of average model
+ Computed mean parameter values.
+ Finished fit.
```

Finally, note that you can mix SymPy- and function-based hyper-priors for nested transition models.