statistics
Routines for computing statistics related to MCMC time series.
per_chain_autocorr_fast(samples, cutoff=None)
Calculate autocorrelation curve per chain using FFT.
See Sokal, Alan D., "Monte Carlo Methods in Statistical Mechanics: Foundations and New Algorithms," pp. 13-16, September 1996. http://staff.ustc.edu.cn/~yjdeng/lecturenotes/cargese_1996.pdf
Parameters:
Name | Type | Description | Default |
---|---|---|---|
samples |
np.ndarray |
samples of shape (N, M) where N is num-samples-per-chain and M is num-chains. |
required |
cutoff |
int |
hard cut-off for the length of returned curve. |
None |
Returns:
Type | Description |
---|---|
np.ndarray |
autcorrelation curve per chain, of shape (C, M), where C = min(N, cutoff) |
Source code in vmcnet/mcmc/statistics.py
def per_chain_autocorr_fast(
samples: np.ndarray, cutoff: Optional[int] = None
) -> np.ndarray:
"""Calculate autocorrelation curve per chain using FFT.
See Sokal, Alan D., "Monte Carlo Methods in Statistical Mechanics: Foundations
and New Algorithms," pp. 13-16, September 1996.
http://staff.ustc.edu.cn/~yjdeng/lecturenotes/cargese_1996.pdf
Args:
samples (np.ndarray): samples of shape (N, M) where N is num-samples-per-chain
and M is num-chains.
cutoff (int): hard cut-off for the length of returned curve.
Returns:
np.ndarray: autcorrelation curve per chain, of shape (C, M), where
C = min(N, cutoff)
"""
N = len(samples)
if cutoff is None:
cutoff = N
else:
cutoff = min(cutoff, N)
centered_samples = samples - np.mean(samples, axis=0, keepdims=True)
# Calculate autocorrelation curve efficiently as the inverse Fourier transform
# of the power spectral density of the series.
fvi = np.fft.fft(centered_samples, n=(2 * N), axis=0)
G = np.real(np.fft.ifft(fvi * np.conjugate(fvi), axis=0))[:cutoff]
# Divide (i)th term by (n-i) to account for the number of available samples.
normalization_factors = N - np.arange(cutoff)
G /= np.expand_dims(normalization_factors, -1)
# Divide by C(0) to get the normalized autocorrelation curve.
G /= G[:1]
return G
multi_chain_autocorr_and_variance(samples, cutoff=None)
Calculate multi-chain autocorrelation curve with cutoff and multi-chain variance.
The variance estimate here is the population variance, sum_i (x_i - mu)^2 / N, and not the sample variance, sum_i (x_i - mu)^2 / (N - 1).
See Stan Reference Manual, Version 2.27, Section 16.3-16-4 https://mc-stan.org/docs/2_27/reference-manual
Parameters:
Name | Type | Description | Default |
---|---|---|---|
samples |
np.ndarray |
samples of shape (N, M) where N is num-samples-per-chain and M is num-chains. |
required |
cutoff |
int |
hard cut-off for the length of returned curve. |
None |
Returns:
Type | Description |
---|---|
(np.ndarray, np.float32) |
combined autcorrelation curve using data from all chains, of length C, where C = min(N, cutoff); overall variance estimate |
Source code in vmcnet/mcmc/statistics.py
def multi_chain_autocorr_and_variance(
samples: np.ndarray, cutoff: Optional[int] = None
) -> Tuple[np.ndarray, np.float32]:
"""Calculate multi-chain autocorrelation curve with cutoff and multi-chain variance.
The variance estimate here is the population variance, sum_i (x_i - mu)^2 / N,
and *not* the sample variance, sum_i (x_i - mu)^2 / (N - 1).
See Stan Reference Manual, Version 2.27, Section 16.3-16-4
https://mc-stan.org/docs/2_27/reference-manual
Args:
samples (np.ndarray): samples of shape (N, M) where N is num-samples-per-chain
and M is num-chains.
cutoff (int): hard cut-off for the length of returned curve.
Returns:
(np.ndarray, np.float32): combined autcorrelation curve using data from all
chains, of length C, where C = min(N, cutoff); overall variance estimate
"""
N = len(samples)
per_chain_autocorr = per_chain_autocorr_fast(samples, cutoff)
per_chain_var = np.var(samples, axis=0, ddof=1)
autocorrelation_term = np.mean(per_chain_var * per_chain_autocorr, axis=1)
between_chain_term = np.var(np.mean(samples, axis=0), ddof=1)
within_chain_term = np.mean(per_chain_var)
overall_var_estimate = within_chain_term * (N - 1) / N + between_chain_term
return (
1 - (within_chain_term - autocorrelation_term) / overall_var_estimate,
overall_var_estimate,
)
tau(autocorr_curve)
Integrated autocorrelation, with automatic truncation.
Uses Geyer's initial minimum sequence for estimating the integrated autocorrelation. This is a consistent overestimate of the asymptotic IAC.
See Geyer, Charles J. 2011. “Introduction to Markov Chain Monte Carlo.” In Handbook of Markov Chain Monte Carlo, edited by Steve Brooks, Andrew Gelman, Galin L. Jones, and Xiao-Li Meng, 3–48. Chapman; Hall/CRC.
Also see Charles J. Geyer. "Practical Markov Chain Monte Carlo." Statist. Sci. 7 (4) 473 - 483, November, 1992. https://doi.org/10.1214/ss/1177011137
Parameters:
Name | Type | Description | Default |
---|---|---|---|
autocorr_curve |
np.ndarray |
1D array containing the series of autocorrelation values over which to calculate the autocorrelation time. |
required |
Returns:
Type | Description |
---|---|
(np.ndarray) |
a single estimate of the autocorrelation time, packaged as an array. |
Source code in vmcnet/mcmc/statistics.py
def tau(autocorr_curve: np.ndarray) -> np.ndarray:
"""Integrated autocorrelation, with automatic truncation.
Uses Geyer's initial minimum sequence for estimating the integrated autocorrelation.
This is a consistent overestimate of the asymptotic IAC.
See Geyer, Charles J. 2011. “Introduction to Markov Chain Monte Carlo.” In Handbook
of Markov Chain Monte Carlo, edited by Steve Brooks, Andrew Gelman, Galin L. Jones,
and Xiao-Li Meng, 3–48. Chapman; Hall/CRC.
Also see Charles J. Geyer. "Practical Markov Chain Monte Carlo." Statist. Sci.
7 (4) 473 - 483, November, 1992. https://doi.org/10.1214/ss/1177011137
Args:
autocorr_curve (np.ndarray): 1D array containing the series of autocorrelation
values over which to calculate the autocorrelation time.
Returns:
(np.ndarray): a single estimate of the autocorrelation time, packaged
as an array.
"""
# Cut off the last sample if necessary to get an even number of samples.
nsamples = autocorr_curve.shape[0]
even_nsamples = int(2 * np.floor(nsamples / 2))
even_length_curve = autocorr_curve[:even_nsamples]
# Create new curve containing sums of adjacent pairs from the initial curve.
paired_curve = even_length_curve.reshape(
(even_nsamples // 2, 2) + tuple(autocorr_curve.shape[1:])
)
sums_of_pairs = np.sum(paired_curve, axis=1)
# The new curve is always positive in theory. In practice, the first negative term
# can be used as a sentinel to decide when to cut off the calculation.
negative_indices = np.nonzero(sums_of_pairs < 0)[0]
if len(negative_indices) > 0:
first_negative_idx = negative_indices[0]
else:
first_negative_idx = len(sums_of_pairs)
positive_sums_curve = sums_of_pairs[:first_negative_idx]
# Final estimate is based on the sum of the monotonically decreasing curve created
# by taking the running minimum of the positive_sums_curve.
monotonic_min_curve = np.minimum.accumulate(positive_sums_curve)
return -1.0 + 2.0 * np.sum(monotonic_min_curve, axis=0)
get_stats_summary(samples)
Get a summary of the stats (mean, var, std_err, iac) for a collection of samples.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
samples |
np.ndarray |
samples of shape (N, M) where N is num-samples-per-chain and M is num-chains. |
required |
Returns:
Type | Description |
---|---|
dictionary |
a summary of the statistics, with keys "average", "variance", "std_err", and "integrated_autocorrelation" |
Source code in vmcnet/mcmc/statistics.py
def get_stats_summary(samples: np.ndarray) -> Dict[str, np.float32]:
"""Get a summary of the stats (mean, var, std_err, iac) for a collection of samples.
Args:
samples (np.ndarray): samples of shape (N, M) where N is num-samples-per-chain
and M is num-chains.
Returns:
dictionary: a summary of the statistics, with keys "average", "variance",
"std_err", and "integrated_autocorrelation"
"""
# Nested mean may be more numerically stable than single mean
average = np.mean(np.mean(samples, axis=-1), axis=-1)
autocorr_curve, variance = multi_chain_autocorr_and_variance(samples)
iac = tau(autocorr_curve)
std_err = np.sqrt(iac * variance / np.size(samples))
eval_statistics = {
"average": average,
"variance": variance,
"std_err": std_err,
"integrated_autocorrelation": iac,
}
return eval_statistics