gensbi.diagnostics.tarp

Implementation taken from Lemos et al., “Sampling-Based Accuracy Testing of Posterior Estimators for General Inference”, https://arxiv.org/abs/2302.03026

TARP (Tests of Accuracy with Random Points) is a global diagnostic that checks a trained posterior estimator against a set of true parameter values theta.

Functions

`_run_tarp`(posterior_samples, thetas, references[, ...])	Estimates coverage of samples given true values thetas with the TARP method.
`check_tarp`(ecp, alpha)	Check the obtained TARP credibility levels and expected coverage probabilities.
`get_tarp_references`(key, thetas)	Returns reference points for the TARP diagnostic, sampled from a uniform.
`plot_tarp`(ecp, alpha[, title])	Plot the expected coverage probability (ECP) against the credibility level (alpha).
`run_tarp`(thetas, posterior_samples[, seed, ...])	Estimates coverage of samples given true values thetas with the TARP method.

Module Contents

gensbi.diagnostics.tarp._run_tarp(posterior_samples, thetas, references, distance=l2, num_bins=30, z_score_theta=False)

Estimates coverage of samples given true values thetas with the TARP method.

Parameters:

posterior_samples (jax.Array)
thetas (jax.Array)
references (jax.Array)
distance (Callable)
num_bins (Optional[int])
z_score_theta (bool)

Return type:

Tuple[jax.Array, jax.Array]
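
The coverage estimate described in the reference paper can be sketched as follows. This is a minimal NumPy illustration, not the gensbi implementation: the array shapes, the helper names `l2_sketch` and `tarp_coverage_sketch`, and the fixed `[0, 1]` binning are assumptions, and the `z_score_theta` normalization is left out.

```python
import numpy as np


def l2_sketch(x, y):
    """Euclidean distance along the last axis (assumed meaning of the l2 default)."""
    return np.sqrt(np.sum((x - y) ** 2, axis=-1))


def tarp_coverage_sketch(posterior_samples, thetas, references, num_bins=30):
    """Hypothetical sketch of a TARP coverage estimate.

    Assumed shapes (not stated in this reference):
      posterior_samples: (num_samples, num_sims, dim)
      thetas, references: (num_sims, dim)
    """
    # Distances from each reference point to the posterior samples and to the true value.
    sample_dists = l2_sketch(references[None, :, :], posterior_samples)  # (num_samples, num_sims)
    theta_dists = l2_sketch(references, thetas)  # (num_sims,)

    # Per simulation: fraction of posterior samples closer to the reference than the truth.
    f = np.mean(sample_dists < theta_dists[None, :], axis=0)

    # ECP at each bin edge: cumulative fraction of simulations with f up to that edge.
    counts, alpha = np.histogram(f, bins=num_bins, range=(0.0, 1.0))
    ecp = np.concatenate([[0.0], np.cumsum(counts) / counts.sum()])
    return ecp, alpha


# Toy check with an exactly calibrated Gaussian posterior:
# theta ~ N(0, I), x | theta ~ N(theta, I)  =>  theta | x ~ N(x / 2, I / 2).
rng = np.random.default_rng(0)
num_sims, num_samples, dim = 500, 200, 2
thetas = rng.normal(size=(num_sims, dim))
x = thetas + rng.normal(size=(num_sims, dim))
posterior_samples = x / 2 + np.sqrt(0.5) * rng.normal(size=(num_samples, num_sims, dim))
references = rng.uniform(-3.0, 3.0, size=(num_sims, dim))

ecp, alpha = tarp_coverage_sketch(posterior_samples, thetas, references)
```

For a calibrated posterior such as this toy case, `ecp` should track `alpha` closely, up to sampling noise.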

gensbi.diagnostics.tarp.check_tarp(ecp, alpha)

Check the obtained TARP credibility levels and expected coverage probabilities.

This diagnostic helps to uncover underdispersed, well-covering, or overdispersed posteriors.

Let \(\mathrm{ecp}\) be the expected coverage probabilities computed with the TARP method (first output of run_tarp), and \(\alpha\) the credibility levels (second output of run_tarp).

The area to curve (ATC) is defined as:

\[\mathrm{ATC} = \sum_{i: \alpha_i > 0.5} \left( \mathrm{ecp}_i - \alpha_i \right)\]

where values close to zero indicate well-calibrated posteriors. Values larger than zero indicate overdispersed distributions (the estimated posterior is too wide), while values smaller than zero indicate underdispersed distributions (the estimated posterior is too narrow). The ATC can also indicate whether the posterior is biased (see Figure 2 of the reference paper).
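
The ATC formula above can be evaluated directly. The sketch below assumes `ecp` and `alpha` are one-dimensional arrays of equal length; the function name `area_to_curve_sketch` and the S-shaped curves are synthetic illustrations, not outputs of the library.

```python
import numpy as np


def area_to_curve_sketch(ecp, alpha):
    """Sum of (ecp_i - alpha_i) over entries with alpha_i > 0.5."""
    ecp = np.asarray(ecp, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    mask = alpha > 0.5
    return float(np.sum(ecp[mask] - alpha[mask]))


alpha = np.linspace(0.0, 1.0, 11)

# Synthetic ECP curves, for illustration only.
bump = 0.1 * np.sin(2.0 * np.pi * (alpha - 0.5))
ecp_calibrated = alpha.copy()   # lies on the diagonal
ecp_too_wide = alpha + bump     # above the diagonal for alpha > 0.5
ecp_too_narrow = alpha - bump   # below the diagonal for alpha > 0.5

atc_calibrated = area_to_curve_sketch(ecp_calibrated, alpha)
atc_too_wide = area_to_curve_sketch(ecp_too_wide, alpha)
atc_too_narrow = area_to_curve_sketch(ecp_too_narrow, alpha)
```

The sign of the result separates the three cases: zero on the diagonal, positive for the curve above it, negative for the curve below it.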

A two-sample Kolmogorov-Smirnov test is performed between \(\mathrm{ecp}\) and \(\alpha\) to test the null hypothesis that both distributions are identical (produced by one common CDF). The p-value should be close to 1 for well-calibrated posteriors. Commonly, the null hypothesis is rejected if the p-value is below 0.05.
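
To make the test concrete, the two-sample Kolmogorov-Smirnov statistic is the largest gap between the empirical CDFs of the two samples. The sketch below computes only that statistic with NumPy; the name `ks_statistic_sketch` is hypothetical, and the p-value is not derived here. In practice a routine such as `scipy.stats.ks_2samp` returns both the statistic and the p-value; whether this module uses that routine is not stated in this reference.

```python
import numpy as np


def ks_statistic_sketch(a, b):
    """Largest absolute difference between the empirical CDFs of a and b."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    pooled = np.concatenate([a, b])
    # Empirical CDF of each sample evaluated at every pooled point.
    cdf_a = np.searchsorted(a, pooled, side="right") / a.size
    cdf_b = np.searchsorted(b, pooled, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


alpha = np.linspace(0.0, 1.0, 31)

d_identical = ks_statistic_sketch(alpha, alpha)               # same values on both sides
d_distorted = ks_statistic_sketch(alpha**2, alpha)            # synthetic, miscalibrated-looking curve
d_disjoint = ks_statistic_sketch([0.0, 1.0, 2.0], [10.0, 11.0, 12.0])
```

A statistic near zero corresponds to a large p-value (no evidence against calibration), while a large statistic corresponds to a small one.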

Parameters:

ecp (jax.Array)
alpha (jax.Array)

Return type:

Tuple[float, float]

gensbi.diagnostics.tarp.get_tarp_references(key, thetas)

Returns reference points for the TARP diagnostic, sampled from a uniform.

Parameters:

thetas (jax.Array)

Return type:

jax.Array
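
This reference says only that the points are "sampled from a uniform". As a hedged illustration, the sketch below draws one reference point per row of `thetas`, assuming the uniform range spans the per-dimension minimum and maximum of `thetas`. That range, the function name `uniform_references_sketch`, and the use of a NumPy generator in place of `key` (presumably a JAX PRNG key) are assumptions, not documented behavior.

```python
import numpy as np


def uniform_references_sketch(rng, thetas):
    """Draw one reference point per row of thetas.

    Assumption: the uniform range is [min, max] of thetas in each dimension.
    """
    thetas = np.asarray(thetas, dtype=float)
    low = thetas.min(axis=0)   # (dim,)
    high = thetas.max(axis=0)  # (dim,)
    return rng.uniform(low, high, size=thetas.shape)


rng = np.random.default_rng(0)
thetas = rng.normal(size=(100, 3))
references = uniform_references_sketch(rng, thetas)
```

Drawing the references independently of the posterior samples matters for the diagnostic: the coverage check compares distances to these points, so they should not be tuned to the estimator under test.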

gensbi.diagnostics.tarp.plot_tarp(ecp, alpha, title=None)

Plot the expected coverage probability (ECP) against the credibility level (alpha).

Parameters:

ecp (array-like) – Array of expected coverage probabilities.
alpha (array-like) – Array of credibility levels.
title (str, optional) – Title for the plot. Default is None.

Returns:

fig (matplotlib.figure.Figure) – The figure object.
ax (matplotlib.axes.Axes) – The axes object.

Return type:

Tuple[matplotlib.figure.Figure, matplotlib.axes.Axes]

gensbi.diagnostics.tarp.run_tarp(thetas, posterior_samples, seed=1, references=None, distance=l2, num_bins=30, z_score_theta=True)

Estimates coverage of samples given true values thetas with the TARP method.

Parameters:

thetas (jax.Array)
posterior_samples (jax.Array)
seed (int)
references (Optional[jax.Array])
distance (Callable)
num_bins (Optional[int])
z_score_theta (bool)

Return type:

Tuple[jax.Array, jax.Array]