Initial public release

fjosw 2020-10-13 16:53:00 +02:00
commit d9b2077d2c
24 changed files with 6794 additions and 0 deletions

7
.gitignore vendored Normal file

@@ -0,0 +1,7 @@
__pycache__
*.pyc
.ipynb_*
examples/B1k2_pcac_plateau.p
examples/Untitled.*
core.*
*.swp

97
CHANGELOG.md Normal file

@@ -0,0 +1,97 @@
# Changelog
All notable changes to this project will be documented in this file.
## [1.0.0] - 2020-10-13
### Added
- Compatibility with the BDIO Native format outlined [here](https://ific.uv.es/~alramos/docs/ADerrors/tutorial/). Read and write functions added to `input.bdio`.
- New function `input.bdio.read_dSdm` which can read the bdio output of the program `dSdm` by Tomasz Korzec.
- Expected chisquare implemented for fits with x-errors.
- New implementation of the covariance of two observables which employs the
  arithmetic mean of the integrated autocorrelation times of the two
  observables. This new procedure has proven to be less biased in simulated
  data and is also much faster to compute, as the computation time is O(N)
  whereas the evaluation of the full correlation function is O(N log(N)).
- Added function `gen_correlated_data` to `misc` which generates a set of
observables with given covariance and autocorrelation.
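  A minimal usage sketch (illustrative; the signature is assumed from this release's `misc.py`):
  ```python
  import numpy as np
  import pyerrors as pe

  # Illustrative data: two correlated observables with a given covariance
  # matrix; tau and samples are assumed default parameters.
  cov = np.array([[0.01, 0.004], [0.004, 0.02]])
  obs_list = pe.misc.gen_correlated_data([0.0, 1.0], cov, 'ens1', tau=0.5, samples=1000)
  ```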
### Fixed
- The bias correction of hep-lat/0306017 eq. (49) is no longer applied to the
  exponential tail in the critical slowing down analysis, but only to the part
  which is directly estimated from rho. This can lead to slightly smaller
  errors when using the critical slowing down analysis. The values for the
  integrated autocorrelation time tauint now include this bias correction
  (previously the bias correction was applied after estimating tauint). The
  errors resulting from the automatic windowing procedure are unchanged.
## [0.8.1] - 2020-06-09
### Fixed
- Bug in `fits.standard_fit` fixed which occurred when attempting a fit with zero degrees of freedom.
## [0.8.0] - 2020-06-05
### Added
- `merge_obs` function added which allows one to merge Obs that describe different replica of the same observable and have been read in separately. Use with care, as there is no safeguard implemented that prevents you from merging unrelated Obs.
- `standard_fit` and `odr_fit` can now treat fits with several x-values via tuples.
- Fit functions have a new kwarg `dict_output` which allows to change the
output to a dictionary containing additional information.
- `S_dict` and `tau_exp_dict` added to Obs in which global values for individual ensembles can be stored.
- New function `read_pbp` added which reads dS/dm_q from pbp.dat files.
- New function `extract_t0` added which can extract the value of t0 from .ms.dat files of openQCD v1.2.
### Changed
- When creating an Obs object defined for multiple replica/ensembles, the given names are now sorted alphabetically before assigning the internal dictionaries. This makes sure that `my_Obs` has the same dictionaries as `my_Obs * 1` (`derived_observable` always sorted the names). WARNING: `Obs` created with previous versions of pyerrors may not be completely identical to new ones (The internal dictionaries may have different ordering). However, this should not affect the inner workings of the error analysis.
### Fixed
- Bug in `covariance` fixed which appeared when different ensemble contents were used.
## [0.7.0] - 2020-03-10
### Added
- New fit functions for fitting with and without x-errors added which use automatic differentiation and should be more reliable than the old ones.
- Fitting with Bayesian priors added.
- New functions for visualization of fits which can be activated via the kwargs `resplot` and `qqplot`.
- chisquare/expected_chisquare, which takes into account correlations in the data and non-linearities in the fit function, can now be activated with the kwarg `expected_chisquare`.
- Silent mode added to fit functions.
- Examples reworked.
- Changed default function to compute covariances.
- Output of `input.bdio.read_mesons` is now a dictionary instead of a list.
### Deprecated
- The function `fit_general` which is based on numerical differentiation will be removed in future versions as new fit functions based on automatic differentiation are now available.
## [0.6.1] - 2020-01-14
### Added
- Mesons bdio functionality improved and accelerated; progress report added.
- Added the possibility to manually supply a Jacobian to `derived_observable` via the kwarg `man_grad`. This feature was not implemented for the user, but for internal optimization of most basic arithmetic operations, which now do not require a call to the autograd package anymore. This results in a speed-up by a factor of 2 to 3, especially relevant for the multiplication of large matrices.
### Changed
- `input.py` and `bdio.py` moved into the submodule `input`. This should not affect the user API.
- `autograd.numpy` was replaced by pure numpy wherever possible. This should result in a slight speed-up.
### Fixed
- Fixed `bias_correction`, which broke as a result of the vectorized `derived_observable`.
- `linalg.eig` no longer raises an error when the eigenvalues are complex; the imaginary part is simply truncated.
## [0.6.0] - 2020-01-06
### Added
- Matrix pencil method for algebraic extraction of energy levels implemented according to [Y. Hua, T. K. Sarkar, IEEE Trans. Acoust. 38, 814-824 (1990)](https://ieeexplore.ieee.org/document/56027) in module `mpm.py`.
- Import API simplified. After `import pyerrors as pe`, some submodules can be accessed via `pe.fits` etc.
- `derived_observable` now supports functions which have single- or multi-dimensional numpy arrays as input and/or output (Works only with automatic differentiation).
- Matrix functions accelerated by using the new version of `derived_observable`.
- New matrix functions: Moore-Penrose Pseudoinverse, Singular Value Decomposition, eigenvalue determination of a general matrix (automatic differentiation included from autograd master).
- Obs can now be compared with < or >; a list of Obs can now be sorted.
- Numerical differentiation can now be controlled via the kwargs of numdifftools.step_generators.MaxStepGenerator.
- Tuned standard parameters for numerical derivative to `base_step=0.1` and `step_ratio=2.5`.
### Changed
- Matrix functions moved to new module `linalg.py`.
- Kolmogorov-Smirnov test moved to new module `misc.py`.
## [0.5.0] - 2019-12-19
### Added
- Numerical differentiation is now based on the package numdifftools which should be more reliable.
### Changed
- kwarg `h_num_grad` changed to `num_grad`, which takes boolean values (default False).
- Speed-up of the rfft calculation of the autocorrelation by reducing the zero padding.

21
LICENSE Normal file

@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2020 Fabian Joswig
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

56
README.md Normal file

@@ -0,0 +1,56 @@
# pyerrors
pyerrors is a Python package for error computation and propagation of Markov chain Monte Carlo data.
It is based on the gamma method [arXiv:hep-lat/0306017](https://arxiv.org/abs/hep-lat/0306017). Some of its features are:
* automatic differentiation as suggested in [arXiv:1809.01289](https://arxiv.org/abs/1809.01289) (partly based on the [autograd](https://github.com/HIPS/autograd) package)
* the treatment of slow modes in the simulation as suggested in [arXiv:1009.5228](https://arxiv.org/abs/1009.5228)
* multi ensemble analyses
* non-linear fits with y-errors and exact linear error propagation based on automatic differentiation as introduced in [arXiv:1809.01289](https://arxiv.org/abs/1809.01289)
* non-linear fits with x- and y-errors and exact linear error propagation based on automatic differentiation
* matrix valued operations and their error propagation based on automatic differentiation (Cholesky decomposition, calculation of eigenvalues and eigenvectors, singular value decomposition, ...)
* implementation of the matrix-pencil-method [IEEE Trans. Acoust. 38, 814-824 (1990)](https://ieeexplore.ieee.org/document/56027) for the extraction of energy levels, especially suited for noisy data and excited states
There exist similar implementations of gamma method error analysis suites in:
- [Fortran](https://gitlab.ift.uam-csic.es/alberto/aderrors)
- [Julia](https://gitlab.ift.uam-csic.es/alberto/aderrors.jl)
- [Python 3](https://github.com/mbruno46/pyobs)
## Installation
pyerrors requires Python versions >= 3.5.0.
Install the package for the local user:
```bash
pip install . --user
```
Run tests to verify the installation:
```bash
pytest .
```
## Usage
The basic objects of a pyerrors analysis are instances of the class `Obs`. They can be initialized with an array of Monte Carlo data (e.g. `samples1`) and a name for the given ensemble (e.g. `'ensemble1'`). The `gamma_method` can then be used to compute the statistical error, taking into account autocorrelations. The `print` method outputs a human-readable result.
```python
import numpy as np
import pyerrors as pe
observable1 = pe.Obs([samples1], ['ensemble1'])
observable1.gamma_method()
observable1.print()
```
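Since `samples1` above is only a placeholder, a self-contained variant might look as follows (a minimal sketch; the synthetic Gaussian samples merely stand in for real Monte Carlo measurements):
```python
import numpy as np
import pyerrors as pe

# Synthetic stand-in for Monte Carlo data (illustrative only)
samples1 = np.random.normal(1.0, 0.1, 1000)

observable1 = pe.Obs([samples1], ['ensemble1'])
observable1.gamma_method()
observable1.print()
```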
Often one is interested in secondary observables, which can be arbitrary functions of primary observables. `pyerrors` overloads most basic math operations and numpy functions such that the user can work with `Obs` objects as if they were floats (here `observable2` is assumed to be a second `Obs` defined analogously to `observable1`):
```python
observable3 = 12.0 / observable1 ** 2 - np.exp(-1.0 / observable2)
observable3.gamma_method()
observable3.print()
```
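Fits follow the same pattern. A minimal sketch of a non-linear fit with y-errors using `fits.standard_fit` (synthetic data, purely illustrative; the fit function has to use `autograd.numpy` for any non-trivial math):
```python
import numpy as np
import pyerrors as pe

def linear(a, x):
    return a[0] + a[1] * x

x = np.arange(1, 6)
# One pseudo observable per x-value, standing in for real data
y = [pe.pseudo_Obs(0.5 + 2.0 * xi, 0.05, 'ensemble1') for xi in x]
for o in y:
    o.gamma_method()  # the fit uses the dvalues as y-errors

fit_parameters = pe.fits.standard_fit(x, y, linear)
```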
More detailed examples can be found in the `/examples` folder:
* [01_basic_example](examples/01_basic_example.ipynb)
* [02_pcac_example](examples/02_pcac_example.ipynb)
* [03_fit_example](examples/03_fit_example.ipynb)
* [04_matrix_operations](examples/04_matrix_operations.ipynb)
## License
[MIT](https://choosealicense.com/licenses/mit/)

0
conftest.py Normal file

examples/01_basic_example.ipynb Normal file

File diff suppressed because one or more lines are too long

examples/02_pcac_example.ipynb Normal file

File diff suppressed because one or more lines are too long

examples/03_fit_example.ipynb Normal file

File diff suppressed because one or more lines are too long

475
examples/04_matrix_operations.ipynb Normal file

@@ -0,0 +1,475 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"sys.path.append('..')\n",
"import pyerrors as pe\n",
"import numpy as np\n",
"import scipy"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an example we look at a symmetric 2x2 matrix which positive semidefinte and has an error on all entries"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[Obs[4.10(20)] Obs[-1.00(10)]]\n",
" [Obs[-1.00(10)] Obs[1.000(10)]]]\n"
]
}
],
"source": [
"obs11 = pe.pseudo_Obs(4.1, 0.2, 'e1')\n",
"obs22 = pe.pseudo_Obs(1, 0.01, 'e1')\n",
"obs12 = pe.pseudo_Obs(-1, 0.1, 'e1')\n",
"matrix = np.asarray([[obs11, obs12], [obs12, obs22]])\n",
"print(matrix)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We require to use `np.asarray` here as it makes sure that we end up with a numpy array of `Obs`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The standard matrix product can be performed with @"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[Obs[17.81] Obs[-5.1]]\n",
" [Obs[-5.1] Obs[2.0]]]\n"
]
}
],
"source": [
"print(matrix @ matrix)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Multiplication with unit matrix leaves the matrix unchanged"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[Obs[4.1] Obs[-1.0]]\n",
" [Obs[-1.0] Obs[1.0]]]\n"
]
}
],
"source": [
"print(matrix @ np.identity(2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Mathematical functions work elementwise"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[Obs[30.161857460980094] Obs[-1.1752011936438014]]\n",
" [Obs[-1.1752011936438014] Obs[1.1752011936438014]]]\n"
]
}
],
"source": [
"print(np.sinh(matrix))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For a vector of `Obs`, we again use np.asarray to end up with the correct object"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[Obs[2.00(40)] Obs[1.00(10)]]\n"
]
}
],
"source": [
"vec1 = pe.pseudo_Obs(2, 0.4, 'e1')\n",
"vec2 = pe.pseudo_Obs(1, 0.1, 'e1')\n",
"vector = np.asarray([vec1, vec2])\n",
"for (i), entry in np.ndenumerate(vector):\n",
" entry.gamma_method()\n",
"print(vector)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The matrix times vector product can then be computed via"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[Obs[7.2(1.7)] Obs[-1.00(47)]]\n"
]
}
],
"source": [
"product = matrix @ vector\n",
"for (i), entry in np.ndenumerate(product):\n",
" entry.gamma_method()\n",
"print(product)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Matrix to scalar operations\n",
"If we want to apply a numpy matrix function with a scalar return value we can use `scalar_mat_op`. __Here we need to use the autograd wrapped version of numpy__ (imported as anp) to use automatic differentiation."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"det \t Obs[3.10(28)]\n",
"trace \t Obs[5.10(20)]\n",
"norm \t Obs[4.45(19)]\n"
]
}
],
"source": [
"import autograd.numpy as anp # Thinly-wrapped numpy\n",
"funcs = [anp.linalg.det, anp.trace, anp.linalg.norm]\n",
"\n",
"for i, func in enumerate(funcs):\n",
" res = pe.linalg.scalar_mat_op(func, matrix)\n",
" res.gamma_method()\n",
" print(func.__name__, '\\t', res)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For matrix operations which are not supported by autograd we can use numerical differentiation"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"cond \t Obs[6.23(59)]\n",
"expm_cond \t Obs[4.45(19)]\n"
]
}
],
"source": [
"funcs = [np.linalg.cond, scipy.linalg.expm_cond]\n",
"\n",
"for i, func in enumerate(funcs):\n",
" res = pe.linalg.scalar_mat_op(func, matrix, num_grad=True)\n",
" res.gamma_method()\n",
" print(func.__name__, ' \\t', res)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Matrix to matrix operations\n",
"For matrix operations with a matrix as return value we can use another wrapper `mat_mat_op`. Take as an example the cholesky decompostion. __Here we need to use the autograd wrapped version of numpy__ (imported as anp) to use automatic differentiation."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[Obs[2.025(49)] Obs[0.0]]\n",
" [Obs[-0.494(50)] Obs[0.870(29)]]]\n"
]
}
],
"source": [
"cholesky = pe.linalg.mat_mat_op(anp.linalg.cholesky, matrix)\n",
"for (i, j), entry in np.ndenumerate(cholesky):\n",
" entry.gamma_method()\n",
"print(cholesky)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now check if the decomposition was succesfull"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[Obs[-8.881784197001252e-16] Obs[0.0]]\n",
" [Obs[0.0] Obs[0.0]]]\n"
]
}
],
"source": [
"check = cholesky @ cholesky.T\n",
"print(check - matrix)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now further compute the inverse of the cholesky decomposed matrix and check that the product with its inverse gives the unit matrix with zero error."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[Obs[0.494(12)] Obs[0.0]]\n",
" [Obs[0.280(40)] Obs[1.150(39)]]]\n",
"Check:\n",
"[[Obs[1.0] Obs[0.0]]\n",
" [Obs[0.0] Obs[1.0]]]\n"
]
}
],
"source": [
"inv = pe.linalg.mat_mat_op(anp.linalg.inv, cholesky)\n",
"for (i, j), entry in np.ndenumerate(inv):\n",
" entry.gamma_method()\n",
"print(inv)\n",
"print('Check:')\n",
"check_inv = cholesky @ inv\n",
"print(check_inv)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Matrix to matrix operations which are not supported by autograd can also be computed with numeric differentiation"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"orth\n",
"[[Obs[-0.9592(75)] Obs[0.283(25)]]\n",
" [Obs[0.283(25)] Obs[0.9592(75)]]]\n",
"expm\n",
"[[Obs[75(15)] Obs[-21.4(4.2)]]\n",
" [Obs[-21.4(4.2)] Obs[8.3(1.4)]]]\n",
"logm\n",
"[[Obs[1.334(57)] Obs[-0.496(61)]]\n",
" [Obs[-0.496(61)] Obs[-0.203(50)]]]\n",
"sinhm\n",
"[[Obs[37.3(7.4)] Obs[-10.8(2.1)]]\n",
" [Obs[-10.8(2.1)] Obs[3.94(69)]]]\n",
"sqrtm\n",
"[[Obs[1.996(51)] Obs[-0.341(37)]]\n",
" [Obs[-0.341(37)] Obs[0.940(15)]]]\n"
]
}
],
"source": [
"funcs = [scipy.linalg.orth, scipy.linalg.expm, scipy.linalg.logm, scipy.linalg.sinhm, scipy.linalg.sqrtm]\n",
"\n",
"for i,func in enumerate(funcs):\n",
" res = pe.linalg.mat_mat_op(func, matrix, num_grad=True)\n",
" for (i, j), entry in np.ndenumerate(res):\n",
" entry.gamma_method()\n",
" print(func.__name__)\n",
" print(res)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Eigenvalues and eigenvectors\n",
"We can also compute eigenvalues and eigenvectors of symmetric matrices with a special wrapper `eigh`"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Eigenvalues:\n",
"[Obs[0.705(57)] Obs[4.39(19)]]\n",
"Eigenvectors:\n",
"[[Obs[-0.283(25)] Obs[-0.9592(75)]]\n",
" [Obs[-0.9592(75)] Obs[0.283(25)]]]\n"
]
}
],
"source": [
"e, v = pe.linalg.eigh(matrix)\n",
"for (i), entry in np.ndenumerate(e):\n",
" entry.gamma_method()\n",
"print('Eigenvalues:')\n",
"print(e)\n",
"for (i, j), entry in np.ndenumerate(v):\n",
" entry.gamma_method()\n",
"print('Eigenvectors:')\n",
"print(v)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can check that we got the correct result"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Check eigenvector 1\n",
"[Obs[-5.551115123125783e-17] Obs[0.0]]\n",
"Check eigenvector 2\n",
"[Obs[0.0] Obs[-2.220446049250313e-16]]\n"
]
}
],
"source": [
"for i in range(2):\n",
" print('Check eigenvector', i + 1)\n",
" print(matrix @ v[:, i] - v[:, i] * e[i])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.11"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

BIN
examples/data/B1k2_f_A.p Normal file

Binary file not shown.

BIN
examples/data/B1k2_f_P.p Normal file

Binary file not shown.

5
pyerrors/__init__.py Normal file

@@ -0,0 +1,5 @@
from .pyerrors import *
from . import fits
from . import linalg
from . import misc
from . import mpm
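# Note (illustrative): the wildcard import exposes the core objects such as
# Obs, derived_observable and covariance, while the submodules remain
# accessible after `import pyerrors as pe` as pe.fits, pe.linalg, pe.misc
# and pe.mpm.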

730
pyerrors/fits.py Normal file

@@ -0,0 +1,730 @@
#!/usr/bin/env python
# coding: utf-8
import numpy as np
import autograd.numpy as anp
import scipy.optimize
import scipy.stats
import matplotlib.pyplot as plt
from matplotlib import gridspec
from scipy.odr import ODR, Model, RealData
import iminuit
from autograd import jacobian
from autograd import elementwise_grad as egrad
from .pyerrors import Obs, derived_observable, covariance, pseudo_Obs
def standard_fit(x, y, func, silent=False, **kwargs):
"""Performs a non-linear fit to y = func(x) and returns a list of Obs corresponding to the fit parameters.
x has to be a list of floats.
y has to be a list of Obs, the dvalues of the Obs are used as yerror for the fit.
func has to be of the form
def func(a, x):
return a[0] + a[1] * x + a[2] * anp.sinh(x)
For multiple x values func can be of the form
def func(a, x):
(x1, x2) = x
return a[0] * x1 ** 2 + a[1] * x2
It is important that all numpy functions refer to autograd.numpy, otherwise the differentiation
will not work.
Keyword arguments
-----------------
dict_output -- If true, the output is a dictionary containing all relevant
data instead of just a list of the fit parameters.
silent -- If true all output to the console is omitted (default False).
initial_guess -- can provide an initial guess for the input parameters. Relevant for
non-linear fits with many parameters.
method -- can be used to choose an alternative method for the minimization of chisquare.
The possible methods are the ones which can be used for scipy.optimize.minimize and
migrad of iminuit. If no method is specified, Levenberg-Marquardt is used.
Reliable alternatives are migrad, Powell and Nelder-Mead.
resplot -- If true, a plot which displays fit, data and residuals is generated (default False).
qqplot -- If true, a quantile-quantile plot of the fit result is generated (default False).
expected_chisquare -- If true prints the expected chisquare which is
corrected by effects caused by correlated input data.
This can take a while as the full correlation matrix
has to be calculated (default False).
"""
result_dict = {}
result_dict['fit_function'] = func
x = np.asarray(x)
if x.shape[-1] != len(y):
raise Exception('x and y input have to have the same length')
if len(x.shape) > 2:
raise Exception('Unknown format for x values')
if not callable(func):
raise TypeError('func has to be a function.')
for i in range(25):
try:
func(np.arange(i), x.T[0])
except Exception:
pass
else:
break
n_parms = i
if not silent:
print('Fit with', n_parms, 'parameters')
y_f = [o.value for o in y]
dy_f = [o.dvalue for o in y]
if np.any(np.asarray(dy_f) <= 0.0):
raise Exception('No y errors available, run the gamma method first.')
if 'initial_guess' in kwargs:
x0 = kwargs.get('initial_guess')
if len(x0) != n_parms:
raise Exception('Initial guess does not have the correct length.')
else:
x0 = [0.1] * n_parms
def chisqfunc(p):
model = func(p, x)
chisq = anp.sum(((y_f - model) / dy_f) ** 2)
return chisq
if 'method' in kwargs:
result_dict['method'] = kwargs.get('method')
if not silent:
print('Method:', kwargs.get('method'))
if kwargs.get('method') == 'migrad':
fit_result = iminuit.minimize(chisqfunc, x0)
fit_result = iminuit.minimize(chisqfunc, fit_result.x)
else:
fit_result = scipy.optimize.minimize(chisqfunc, x0, method=kwargs.get('method'))
fit_result = scipy.optimize.minimize(chisqfunc, fit_result.x, method=kwargs.get('method'), tol=1e-12)
chisquare = fit_result.fun
else:
result_dict['method'] = 'Levenberg-Marquardt'
if not silent:
print('Method: Levenberg-Marquardt')
def chisqfunc_residuals(p):
model = func(p, x)
chisq = ((y_f - model) / dy_f)
return chisq
fit_result = scipy.optimize.least_squares(chisqfunc_residuals, x0, method='lm', ftol=1e-15, gtol=1e-15, xtol=1e-15)
chisquare = np.sum(fit_result.fun ** 2)
if not fit_result.success:
raise Exception('The minimization procedure did not converge.')
if x.shape[-1] - n_parms > 0:
result_dict['chisquare/d.o.f.'] = chisquare / (x.shape[-1] - n_parms)
else:
result_dict['chisquare/d.o.f.'] = float('nan')
if not silent:
print(fit_result.message)
print('chisquare/d.o.f.:', result_dict['chisquare/d.o.f.'])
if kwargs.get('expected_chisquare') is True:
W = np.diag(1 / np.asarray(dy_f))
cov = covariance_matrix(y)
A = W @ jacobian(func)(fit_result.x, x)
P_phi = A @ np.linalg.inv(A.T @ A) @ A.T
expected_chisquare = np.trace((np.identity(x.shape[-1]) - P_phi) @ W @ cov @ W)
result_dict['chisquare/expected_chisquare'] = chisquare / expected_chisquare
if not silent:
print('chisquare/expected_chisquare:',
result_dict['chisquare/expected_chisquare'])
hess_inv = np.linalg.pinv(jacobian(jacobian(chisqfunc))(fit_result.x))
def chisqfunc_compact(d):
model = func(d[:n_parms], x)
chisq = anp.sum(((d[n_parms:] - model) / dy_f) ** 2)
return chisq
jac_jac = jacobian(jacobian(chisqfunc_compact))(np.concatenate((fit_result.x, y_f)))
deriv = -hess_inv @ jac_jac[:n_parms, n_parms:]
result = []
for i in range(n_parms):
result.append(derived_observable(lambda x, **kwargs: x[0], [pseudo_Obs(fit_result.x[i], 0.0, y[0].names[0], y[0].shape[y[0].names[0]])] + list(y), man_grad=[0] + list(deriv[i])))
result_dict['fit_parameters'] = result
result_dict['chisquare'] = chisqfunc(fit_result.x)
result_dict['d.o.f.'] = x.shape[-1] - n_parms
if kwargs.get('resplot') is True:
residual_plot(x, y, func, result)
if kwargs.get('qqplot') is True:
qqplot(x, y, func, result)
return result_dict if kwargs.get('dict_output') else result
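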
def odr_fit(x, y, func, silent=False, **kwargs):
"""Performs a non-linear fit to y = func(x) and returns a list of Obs corresponding to the fit parameters.
x has to be a list of Obs, or a tuple of lists of Obs
y has to be a list of Obs
the dvalues of the Obs are used as x- and yerror for the fit.
func has to be of the form
def func(a, x):
y = a[0] + a[1] * x + a[2] * anp.sinh(x)
return y
For multiple x values func can be of the form
def func(a, x):
(x1, x2) = x
return a[0] * x1 ** 2 + a[1] * x2
It is important that all numpy functions refer to autograd.numpy, otherwise the differentiation
will not work.
Based on the orthogonal distance regression module of scipy
Keyword arguments
-----------------
dict_output -- If true, the output is a dictionary containing all relevant
data instead of just a list of the fit parameters.
silent -- If true all output to the console is omitted (default False).
initial_guess -- can provide an initial guess for the input parameters. Relevant for non-linear
fits with many parameters.
expected_chisquare -- If true prints the expected chisquare which is
corrected by effects caused by correlated input data.
This can take a while as the full correlation matrix
has to be calculated (default False).
"""
result_dict = {}
result_dict['fit_function'] = func
x = np.array(x)
x_shape = x.shape
if not callable(func):
raise TypeError('func has to be a function.')
for i in range(25):
try:
func(np.arange(i), x.T[0])
except Exception:
pass
else:
break
n_parms = i
if not silent:
print('Fit with', n_parms, 'parameters')
x_f = np.vectorize(lambda o: o.value)(x)
dx_f = np.vectorize(lambda o: o.dvalue)(x)
y_f = np.array([o.value for o in y])
dy_f = np.array([o.dvalue for o in y])
if np.any(np.asarray(dx_f) <= 0.0):
raise Exception('No x errors available, run the gamma method first.')
if np.any(np.asarray(dy_f) <= 0.0):
raise Exception('No y errors available, run the gamma method first.')
if 'initial_guess' in kwargs:
x0 = kwargs.get('initial_guess')
if len(x0) != n_parms:
raise Exception('Initial guess does not have the correct length.')
else:
x0 = [1] * n_parms
data = RealData(x_f, y_f, sx=dx_f, sy=dy_f)
model = Model(func)
odr = ODR(data, model, x0, partol=np.finfo(np.float).eps)
odr.set_job(fit_type=0, deriv=1)
output = odr.run()
result_dict['residual_variance'] = output.res_var
result_dict['method'] = 'ODR'
result_dict['xplus'] = output.xplus
if not silent:
print('Method: ODR')
print(*output.stopreason)
print('Residual variance:', result_dict['residual_variance'])
if output.info > 3:
raise Exception('The minimization procedure did not converge.')
m = x_f.size
def odr_chisquare(p):
model = func(p[:n_parms], p[n_parms:].reshape(x_shape))
chisq = anp.sum(((y_f - model) / dy_f) ** 2) + anp.sum(((x_f - p[n_parms:].reshape(x_shape)) / dx_f) ** 2)
return chisq
if kwargs.get('expected_chisquare') is True:
W = np.diag(1 / np.asarray(np.concatenate((dy_f.ravel(), dx_f.ravel()))))
if kwargs.get('covariance') is not None:
cov = kwargs.get('covariance')
else:
cov = covariance_matrix(np.concatenate((y, x.ravel())))
number_of_x_parameters = int(m / x_f.shape[-1])
old_jac = jacobian(func)(output.beta, output.xplus)
fused_row1 = np.concatenate((old_jac, np.concatenate((number_of_x_parameters * [np.zeros(old_jac.shape)]), axis=0)))
fused_row2 = np.concatenate((jacobian(lambda x, y : func(y, x))(output.xplus, output.beta).reshape(x_f.shape[-1], x_f.shape[-1] * number_of_x_parameters), np.identity(number_of_x_parameters * old_jac.shape[0])))
new_jac = np.concatenate((fused_row1, fused_row2), axis=1)
A = W @ new_jac
P_phi = A @ np.linalg.inv(A.T @ A) @ A.T
expected_chisquare = np.trace((np.identity(P_phi.shape[0]) - P_phi) @ W @ cov @ W)
if expected_chisquare <= 0.0:
print('Warning, negative expected_chisquare.')
expected_chisquare = np.abs(expected_chisquare)
result_dict['chisquare/expected_chisquare'] = odr_chisquare(np.concatenate((output.beta, output.xplus.ravel()))) / expected_chisquare
if not silent:
print('chisquare/expected_chisquare:',
result_dict['chisquare/expected_chisquare'])
hess_inv = np.linalg.pinv(jacobian(jacobian(odr_chisquare))(np.concatenate((output.beta, output.xplus.ravel()))))
def odr_chisquare_compact_x(d):
model = func(d[:n_parms], d[n_parms:n_parms + m].reshape(x_shape))
chisq = anp.sum(((y_f - model) / dy_f) ** 2) + anp.sum(((d[n_parms + m:].reshape(x_shape) - d[n_parms:n_parms + m].reshape(x_shape)) / dx_f) ** 2)
return chisq
jac_jac_x = jacobian(jacobian(odr_chisquare_compact_x))(np.concatenate((output.beta, output.xplus.ravel(), x_f.ravel())))
deriv_x = -hess_inv @ jac_jac_x[:n_parms + m, n_parms + m:]
def odr_chisquare_compact_y(d):
model = func(d[:n_parms], d[n_parms:n_parms + m].reshape(x_shape))
chisq = anp.sum(((d[n_parms + m:] - model) / dy_f) ** 2) + anp.sum(((x_f - d[n_parms:n_parms + m].reshape(x_shape)) / dx_f) ** 2)
return chisq
jac_jac_y = jacobian(jacobian(odr_chisquare_compact_y))(np.concatenate((output.beta, output.xplus.ravel(), y_f)))
deriv_y = -hess_inv @ jac_jac_y[:n_parms + m, n_parms + m:]
result = []
for i in range(n_parms):
result.append(derived_observable(lambda x, **kwargs: x[0], [pseudo_Obs(output.beta[i], 0.0, y[0].names[0], y[0].shape[y[0].names[0]])] + list(x.ravel()) + list(y), man_grad=[0] + list(deriv_x[i]) + list(deriv_y[i])))
result_dict['fit_parameters'] = result
result_dict['odr_chisquare'] = odr_chisquare(np.concatenate((output.beta, output.xplus.ravel())))
result_dict['d.o.f.'] = x.shape[-1] - n_parms
return result_dict if kwargs.get('dict_output') else result
def prior_fit(x, y, func, priors, silent=False, **kwargs):
"""Performs a non-linear fit to y = func(x) with given priors and returns a list of Obs corresponding to the fit parameters.
x has to be a list of floats.
y has to be a list of Obs, the dvalues of the Obs are used as yerror for the fit.
func has to be of the form
def func(a, x):
y = a[0] + a[1] * x + a[2] * anp.sinh(x)
return y
It is important that all numpy functions refer to autograd.numpy, otherwise the differentiation
will not work.
priors has to be a list with an entry for every parameter in the fit. The entries can either be
Obs (e.g. results from a previous fit) or strings containing a value and an error formatted like
0.548(23), 500(40) or 0.5(0.4)
It is important for the subsequent error estimation that the e_tag for the gamma method is large
enough.
Keyword arguments
-----------------
dict_output -- If true, the output is a dictionary containing all relevant
data instead of just a list of the fit parameters.
silent -- If true all output to the console is omitted (default False).
initial_guess -- can provide an initial guess for the input parameters.
If no guess is provided, the prior values are used.
resplot -- if true, a plot which displays fit, data and residuals is generated (default False)
qqplot -- if true, a quantile-quantile plot of the fit result is generated (default False)
tol -- Specify the tolerance of the migrad solver (default 1e-4)
"""
result_dict = {}
result_dict['fit_function'] = func
if Obs.e_tag_global < 4:
print('WARNING: e_tag_global is smaller than 4, this can cause problems when calculating errors from fits with priors')
x = np.asarray(x)
if not callable(func):
raise TypeError('func has to be a function.')
for i in range(100):
try:
func(np.arange(i), 0)
except Exception:
pass
else:
break
n_parms = i
if n_parms != len(priors):
raise Exception('Priors do not have the correct length.')
def extract_val_and_dval(string):
split_string = string.split('(')
if '.' in split_string[0] and '.' not in split_string[1][:-1]:
factor = 10 ** -len(split_string[0].partition('.')[2])
else:
factor = 1
return float(split_string[0]), float(split_string[1][:-1]) * factor
loc_priors = []
for i_n, i_prior in enumerate(priors):
if isinstance(i_prior, Obs):
loc_priors.append(i_prior)
else:
loc_val, loc_dval = extract_val_and_dval(i_prior)
loc_priors.append(pseudo_Obs(loc_val, loc_dval, 'p' + str(i_n)))
result_dict['priors'] = loc_priors
if not silent:
print('Fit with', n_parms, 'parameters')
y_f = [o.value for o in y]
dy_f = [o.dvalue for o in y]
if np.any(np.asarray(dy_f) <= 0.0):
raise Exception('No y errors available, run the gamma method first.')
p_f = [o.value for o in loc_priors]
dp_f = [o.dvalue for o in loc_priors]
if np.any(np.asarray(dp_f) <= 0.0):
raise Exception('No prior errors available, run the gamma method first.')
if 'initial_guess' in kwargs:
x0 = kwargs.get('initial_guess')
if len(x0) != n_parms:
raise Exception('Initial guess does not have the correct length.')
else:
x0 = p_f
def chisqfunc(p):
model = func(p, x)
chisq = anp.sum(((y_f - model) / dy_f) ** 2) + anp.sum(((p_f - p) / dp_f) ** 2)
return chisq
if not silent:
print('Method: migrad')
m = iminuit.Minuit.from_array_func(chisqfunc, x0, error=np.asarray(x0) * 0.01, errordef=1, print_level=0)
if 'tol' in kwargs:
m.tol = kwargs.get('tol')
else:
m.tol = 1e-4
m.migrad()
params = np.asarray(m.values.values())
result_dict['chisquare/d.o.f.'] = m.fval / len(x)
result_dict['method'] = 'migrad'
if not silent:
print('chisquare/d.o.f.:', result_dict['chisquare/d.o.f.'])
if not m.get_fmin().is_valid:
raise Exception('The minimization procedure did not converge.')
hess_inv = np.linalg.pinv(jacobian(jacobian(chisqfunc))(params))
def chisqfunc_compact(d):
model = func(d[:n_parms], x)
chisq = anp.sum(((d[n_parms: n_parms + len(x)] - model) / dy_f) ** 2) + anp.sum(((d[n_parms + len(x):] - d[:n_parms]) / dp_f) ** 2)
return chisq
jac_jac = jacobian(jacobian(chisqfunc_compact))(np.concatenate((params, y_f, p_f)))
deriv = -hess_inv @ jac_jac[:n_parms, n_parms:]
result = []
for i in range(n_parms):
result.append(derived_observable(lambda x, **kwargs: x[0], [pseudo_Obs(params[i], 0.0, y[0].names[0], y[0].shape[y[0].names[0]])] + list(y) + list(loc_priors), man_grad=[0] + list(deriv[i])))
result_dict['fit_parameters'] = result
result_dict['chisquare'] = chisqfunc(np.asarray(params))
if kwargs.get('resplot') is True:
residual_plot(x, y, func, result)
if kwargs.get('qqplot') is True:
qqplot(x, y, func, result)
return result_dict if kwargs.get('dict_output') else result
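# Usage sketch (illustrative, not part of the module): priors can be given
# as value(error) strings or as Obs from a previous fit, e.g.
#
#     fit_parameters = prior_fit(x, y, func, ['1.0(5)', '0.548(23)'])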
def fit_lin(x, y, **kwargs):
"""Performs a linear fit to y = n + m * x and returns two Obs n, m.
y has to be a list of Obs, the dvalues of the Obs are used as yerror for the fit.
x can either be a list of floats in which case no xerror is assumed, or
a list of Obs, where the dvalues of the Obs are used as xerror for the fit.
"""
def f(a, x):
y = a[0] + a[1] * x
return y
if all(isinstance(n, Obs) for n in x):
return odr_fit(x, y, f, **kwargs)
elif all(isinstance(n, float) or isinstance(n, int) for n in x) or isinstance(x, np.ndarray):
return standard_fit(x, y, f, **kwargs)
else:
raise Exception('Unsupported types for x')
def fit_exp(data, **kwargs):
"""Fit a single exponential to a discrete time series of Obs without errors.
Keyword arguments
-----------------
shift -- specifies the absolute timeslice value of the first entry of data (default 0.0)
only important if one is interested in the matrix element, for the mass this is irrelevant.
"""
if 'shift' in kwargs:
shift = kwargs.get("shift")
else:
shift = 0
length = len(data)
xsum = 0
xsum2 = 0
ysum = 0
xysum = 0
for i in range(shift, length + shift):
xsum += i
xsum2 += i ** 2
tmp_log = np.log(np.abs(data[i - shift]))
ysum += tmp_log
xysum += i * tmp_log
res0 = -(length * xysum - xsum * ysum) / (length * xsum2 - xsum * xsum) # mass
res1 = np.exp((xsum2 * ysum - xsum * xysum) / (length * xsum2 - xsum * xsum)) # matrix element
return [res0, res1]
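# Usage sketch (illustrative): for a time series following a single
# exponential A * exp(-m * t), fit_exp returns [mass, matrix_element], e.g.
#
#     mass, amplitude = fit_exp(correlator_data, shift=1)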
def qqplot(x, o_y, func, p):
""" Generates a quantile-quantile plot of the fit result which can be used to
check if the residuals of the fit are Gaussian distributed.
"""
residuals = []
for i_x, i_y in zip(x, o_y):
residuals.append((i_y - func(p, i_x)) / i_y.dvalue)
residuals = sorted(residuals)
my_y = [o.value for o in residuals]
probplot = scipy.stats.probplot(my_y)
my_x = probplot[0][0]
fig = plt.figure(figsize=(8, 8 / 1.618))
plt.errorbar(my_x, my_y, fmt='o')
fit_start = my_x[0]
fit_stop = my_x[-1]
samples = np.arange(fit_start, fit_stop, 0.01)
plt.plot(samples, samples, 'k--', zorder=11, label='Standard normal distribution')
plt.plot(samples, probplot[1][0] * samples + probplot[1][1], zorder=10, label='Least squares fit, r=' + str(np.around(probplot[1][2], 3)))
plt.xlabel('Theoretical quantiles')
plt.ylabel('Ordered Values')
plt.legend()
plt.show()
def residual_plot(x, y, func, fit_res):
""" Generates a plot which compares the fit to the data and displays the corresponding residuals"""
xstart = x[0] - 0.5
xstop = x[-1] + 0.5
x_samples = np.arange(xstart, xstop, 0.01)
plt.figure(figsize=(8, 8 / 1.618))
gs = gridspec.GridSpec(2, 1, height_ratios=[3, 1], wspace=0.0, hspace=0.0)
ax0 = plt.subplot(gs[0])
ax0.errorbar(x, [o.value for o in y], yerr=[o.dvalue for o in y], ls='none', fmt='o', capsize=3, markersize=5, label='Data')
ax0.plot(x_samples, func([o.value for o in fit_res], x_samples), label='Fit', zorder=10)
ax0.set_xticklabels([])
ax0.set_xlim([xstart, xstop])
ax0.set_xticklabels([])
ax0.legend()
residuals = (np.asarray([o.value for o in y]) - func([o.value for o in fit_res], x)) / np.asarray([o.dvalue for o in y])
ax1 = plt.subplot(gs[1])
ax1.plot(x, residuals, 'ko', ls='none', markersize=5)
ax1.tick_params(direction='out')
ax1.tick_params(axis="x", bottom=True, top=True, labelbottom=True)
ax1.axhline(y=0.0, ls='--', color='k')
ax1.fill_between(x_samples, -1.0, 1.0, alpha=0.1, facecolor='k')
ax1.set_xlim([xstart, xstop])
ax1.set_ylabel('Residuals')
plt.subplots_adjust(wspace=None, hspace=None)
plt.show()
def covariance_matrix(y):
"""Returns the covariance matrix of y."""
length = len(y)
cov = np.zeros((length, length))
for i, item in enumerate(y):
for j, jtem in enumerate(y[:i + 1]):
if i == j:
cov[i, j] = item.dvalue ** 2
else:
cov[i, j] = covariance(item, jtem)
return cov + cov.T - np.diag(np.diag(cov))
def error_band(x, func, beta):
"""Returns the error band for an array of sample values x, for given fit function func with optimized parameters beta."""
cov = covariance_matrix(beta)
if np.any(np.abs(cov - cov.T) > 1000 * np.finfo(np.float).eps):
print('Warning, Covariance matrix is not symmetric within floating point precision')
print('cov - cov.T:')
print(cov - cov.T)
deriv = []
for i, item in enumerate(x):
deriv.append(np.array(egrad(func)([o.value for o in beta], item)))
err = []
for i, item in enumerate(x):
err.append(np.sqrt(deriv[i] @ cov @ deriv[i]))
err = np.array(err)
return err
def fit_general(x, y, func, silent=False, **kwargs):
"""Performs a non-linear fit to y = func(x) and returns a list of Obs corresponding to the fit parameters.
WARNING: In the current version the fits are performed with numerical derivatives.
Plausibility of the results should be checked. To control the numerical differentiation
the kwargs of numdifftools.step_generators.MaxStepGenerator can be used.
func has to be of the form
def func(a, x):
y = a[0] + a[1] * x + a[2] * np.sinh(x)
return y
y has to be a list of Obs, the dvalues of the Obs are used as yerror for the fit.
x can either be a list of floats in which case no xerror is assumed, or
a list of Obs, where the dvalues of the Obs are used as xerror for the fit.
Keyword arguments
-----------------
silent -- If true all output to the console is omitted (default False).
initial_guess -- can provide an initial guess for the input parameters. Relevant for non-linear fits
with many parameters.
"""
if not silent:
print('WARNING: This function is deprecated and will be removed in future versions.')
print('New fit functions with exact error propagation are now available as alternative.')
if not callable(func):
raise TypeError('func has to be a function.')
for i in range(10):
try:
func(np.arange(i), 0)
except Exception:
pass
else:
break
n_parms = i
if not silent:
print('Fit with', n_parms, 'parameters')
global print_output, beta0
print_output = 1
if 'initial_guess' in kwargs:
beta0 = kwargs.get('initial_guess')
if len(beta0) != n_parms:
raise Exception('Initial guess does not have the correct length.')
else:
beta0 = np.arange(n_parms)
if len(x) != len(y):
raise Exception('x and y have to have the same length')
if all(isinstance(n, Obs) for n in x):
obs = x + y
x_constants = None
xerr = [o.dvalue for o in x]
yerr = [o.dvalue for o in y]
elif all(isinstance(n, float) or isinstance(n, int) for n in x) or isinstance(x, np.ndarray):
obs = y
x_constants = x
xerr = None
yerr = [o.dvalue for o in y]
else:
raise Exception('Unsupported types for x')
def do_the_fit(obs, **kwargs):
global print_output, beta0
func = kwargs.get('function')
yerr = kwargs.get('yerr')
length = len(yerr)
xerr = kwargs.get('xerr')
if length == len(obs):
assert 'x_constants' in kwargs
data = RealData(kwargs.get('x_constants'), obs, sy=yerr)
fit_type = 2
elif length == len(obs) // 2:
data = RealData(obs[:length], obs[length:], sx=xerr, sy=yerr)
fit_type = 0
else:
raise Exception('x and y do not fit together.')
model = Model(func)
odr = ODR(data, model, beta0, partol=np.finfo(np.float).eps)
odr.set_job(fit_type=fit_type, deriv=1)
output = odr.run()
if print_output and not silent:
print(*output.stopreason)
print('chisquare/d.o.f.:', output.res_var)
print_output = 0
beta0 = output.beta
return output.beta[kwargs.get('n')]
res = []
for n in range(n_parms):
res.append(derived_observable(do_the_fit, obs, function=func, xerr=xerr, yerr=yerr, x_constants=x_constants, num_grad=True, n=n, **kwargs))
return res

2
pyerrors/input/__init__.py Normal file

@@ -0,0 +1,2 @@
from .input import *
from . import bdio

628
pyerrors/input/bdio.py Normal file

@@ -0,0 +1,628 @@
#!/usr/bin/env python
# coding: utf-8
import ctypes
import hashlib
import autograd.numpy as np # Thinly-wrapped numpy
from ..pyerrors import Obs
def read_ADerrors(file_path, bdio_path='./libbdio.so', **kwargs):
""" Extract generic MCMC data from a bdio file
read_ADerrors requires bdio to be compiled into a shared library. This can be achieved by
adding the flag -fPIC to CC and changing the all target to
all: bdio.o $(LIBDIR)
gcc -shared -Wl,-soname,libbdio.so -o $(BUILDDIR)/libbdio.so $(BUILDDIR)/bdio.o
cp $(BUILDDIR)/libbdio.so $(LIBDIR)/
Parameters
----------
file_path -- path to the bdio file
bdio_path -- path to the shared bdio library libbdio.so (default ./libbdio.so)
"""
bdio = ctypes.cdll.LoadLibrary(bdio_path)
bdio_open = bdio.bdio_open
bdio_open.restype = ctypes.c_void_p
bdio_close = bdio.bdio_close
bdio_close.restype = ctypes.c_int
bdio_close.argtypes = [ctypes.c_void_p]
bdio_seek_record = bdio.bdio_seek_record
bdio_seek_record.restype = ctypes.c_int
bdio_seek_record.argtypes = [ctypes.c_void_p]
bdio_get_rlen = bdio.bdio_get_rlen
bdio_get_rlen.restype = ctypes.c_int
bdio_get_rlen.argtypes = [ctypes.c_void_p]
bdio_get_ruinfo = bdio.bdio_get_ruinfo
bdio_get_ruinfo.restype = ctypes.c_int
bdio_get_ruinfo.argtypes = [ctypes.c_void_p]
bdio_read = bdio.bdio_read
bdio_read.restype = ctypes.c_size_t
bdio_read.argtypes = [ctypes.c_char_p, ctypes.c_size_t, ctypes.c_void_p]
bdio_read_f64 = bdio.bdio_read_f64
bdio_read_f64.restype = ctypes.c_size_t
bdio_read_f64.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_void_p]
bdio_read_int32 = bdio.bdio_read_int32
bdio_read_int32.restype = ctypes.c_size_t
bdio_read_int32.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_void_p]
b_path = file_path.encode('utf-8')
read = 'r'
b_read = read.encode('utf-8')
fbdio = bdio_open(ctypes.c_char_p(b_path), ctypes.c_char_p(b_read), None)
return_list = []
print('Reading of bdio file started')
while True:
record = bdio_seek_record(fbdio)
ruinfo = bdio_get_ruinfo(fbdio)
if ruinfo == 7:
print('MD5sum found') # For now we just ignore these entries and do not perform any checks on them
continue
if ruinfo < 0:
# EOF reached
break
rlen = bdio_get_rlen(fbdio)
def read_c_double():
d_buf = ctypes.c_double
pd_buf = d_buf()
ppd_buf = ctypes.c_void_p(ctypes.addressof(pd_buf))
iread = bdio_read_f64(ppd_buf, ctypes.c_size_t(8), ctypes.c_void_p(fbdio))
return pd_buf.value
mean = read_c_double()
print('mean', mean)
def read_c_size_t():
d_buf = ctypes.c_size_t
pd_buf = d_buf()
ppd_buf = ctypes.c_void_p(ctypes.addressof(pd_buf))
iread = bdio_read_int32(ppd_buf, ctypes.c_size_t(4), ctypes.c_void_p(fbdio))
return pd_buf.value
neid = read_c_size_t()
print('neid', neid)
ndata = []
for index in range(neid):
ndata.append(read_c_size_t())
print('ndata', ndata)
nrep = []
for index in range(neid):
nrep.append(read_c_size_t())
print('nrep', nrep)
vrep = []
for index in range(neid):
vrep.append([])
for jndex in range(nrep[index]):
vrep[-1].append(read_c_size_t())
print('vrep', vrep)
ids = []
for index in range(neid):
ids.append(read_c_size_t())
print('ids', ids)
nt = []
for index in range(neid):
nt.append(read_c_size_t())
print('nt', nt)
zero = []
for index in range(neid):
zero.append(read_c_double())
print('zero', zero)
four = []
for index in range(neid):
four.append(read_c_double())
print('four', four)
d_buf = ctypes.c_double * np.sum(ndata)
pd_buf = d_buf()
ppd_buf = ctypes.c_void_p(ctypes.addressof(pd_buf))
iread = bdio_read_f64(ppd_buf, ctypes.c_size_t(8 * np.sum(ndata)), ctypes.c_void_p(fbdio))
delta = pd_buf[:]
samples = np.split(np.asarray(delta) + mean, np.cumsum([a for su in vrep for a in su])[:-1])
no_reps = [len(o) for o in vrep]
assert len(ids) == len(no_reps)
tmp_names = []
ens_length = max([len(str(o)) for o in ids])
for loc_id, reps in zip(ids, no_reps):
for index in range(reps):
missing_chars = ens_length - len(str(loc_id))
tmp_names.append(str(loc_id) + ' ' * missing_chars + 'r' + '{0:03d}'.format(index))
return_list.append(Obs(samples, tmp_names))
bdio_close(fbdio)
print()
print(len(return_list), 'observable(s) extracted.')
return return_list
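# Usage sketch (illustrative): extract all observables from a bdio file,
# assuming libbdio.so was compiled as described in the docstring above, e.g.
#
#     obs_list = read_ADerrors('measurements.bdio', bdio_path='./libbdio.so')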
def write_ADerrors(obs_list, file_path, bdio_path='./libbdio.so', **kwargs):
""" Write Obs to a bdio file according to ADerrors conventions
read_mesons requires bdio to be compiled into a shared library. This can be achieved by
adding the flag -fPIC to CC and changing the all target to
all: bdio.o $(LIBDIR)
gcc -shared -Wl,-soname,libbdio.so -o $(BUILDDIR)/libbdio.so $(BUILDDIR)/bdio.o
cp $(BUILDDIR)/libbdio.so $(LIBDIR)/
Parameters
----------
file_path -- path to the bdio file
bdio_path -- path to the shared bdio library libbdio.so (default ./libbdio.so)
"""
for obs in obs_list:
if not obs.e_names:
raise Exception('Run the gamma method first for all obs.')
bdio = ctypes.cdll.LoadLibrary(bdio_path)
bdio_open = bdio.bdio_open
bdio_open.restype = ctypes.c_void_p
bdio_close = bdio.bdio_close
bdio_close.restype = ctypes.c_int
bdio_close.argtypes = [ctypes.c_void_p]
bdio_start_record = bdio.bdio_start_record
bdio_start_record.restype = ctypes.c_int
bdio_start_record.argtypes = [ctypes.c_size_t, ctypes.c_size_t, ctypes.c_void_p]
bdio_flush_record = bdio.bdio_flush_record
bdio_flush_record.restype = ctypes.c_int
bdio_flush_record.argtypes = [ctypes.c_void_p]
bdio_write_f64 = bdio.bdio_write_f64
bdio_write_f64.restype = ctypes.c_size_t
bdio_write_f64.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_void_p]
bdio_write_int32 = bdio.bdio_write_int32
bdio_write_int32.restype = ctypes.c_size_t
bdio_write_int32.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_void_p]
b_path = file_path.encode('utf-8')
write = 'w'
b_write = write.encode('utf-8')
form = 'pyerrors ADerror export'
b_form = form.encode('utf-8')
fbdio = bdio_open(ctypes.c_char_p(b_path), ctypes.c_char_p(b_write), b_form)
for obs in obs_list:
mean = obs.value
neid = len(obs.e_names)
vrep = [[obs.shape[o] for o in sl] for sl in list(obs.e_content.values())]
vrep_write = [item for sublist in vrep for item in sublist]
ndata = [np.sum(o) for o in vrep]
nrep = [len(o) for o in vrep]
print('ndata', ndata)
print('nrep', nrep)
print('vrep', vrep)
keys = list(obs.e_content.keys())
ids = []
for key in keys:
try: # Try to convert key to integer
ids.append(int(key))
except ValueError: # If not possible, construct a hash instead
ids.append(int(hashlib.sha256(key.encode('utf-8')).hexdigest(), 16) % 10 ** 8)
print('ids', ids)
nt = []
for e, e_name in enumerate(obs.e_names):
r_length = []
for r_name in obs.e_content[e_name]:
r_length.append(len(obs.deltas[r_name]))
#e_N = np.sum(r_length)
nt.append(max(r_length) // 2)
print('nt', nt)
zero = neid * [0.0]
four = neid * [4.0]
print('zero', zero)
print('four', four)
delta = np.concatenate([item for sublist in [[obs.deltas[o] for o in sl] for sl in list(obs.e_content.values())] for item in sublist])
bdio_start_record(0x00, 8, fbdio)
def write_c_double(double):
pd_buf = ctypes.c_double(double)
ppd_buf = ctypes.c_void_p(ctypes.addressof(pd_buf))
iwrite = bdio_write_f64(ppd_buf, ctypes.c_size_t(8), ctypes.c_void_p(fbdio))
def write_c_size_t(int32):
pd_buf = ctypes.c_size_t(int32)
ppd_buf = ctypes.c_void_p(ctypes.addressof(pd_buf))
iwrite = bdio_write_int32(ppd_buf, ctypes.c_size_t(4), ctypes.c_void_p(fbdio))
write_c_double(obs.value)
write_c_size_t(neid)
for element in ndata:
write_c_size_t(element)
for element in nrep:
write_c_size_t(element)
for element in vrep_write:
write_c_size_t(element)
for element in ids:
write_c_size_t(element)
for element in nt:
write_c_size_t(element)
for element in zero:
write_c_double(element)
for element in four:
write_c_double(element)
for element in delta:
write_c_double(element)
bdio_close(fbdio)
return 0
def _get_kwd(string, key):
return (string.split(key, 1)[1]).split(" ", 1)[0]
def _get_corr_name(string, key):
return (string.split(key, 1)[1]).split(' NDIM=', 1)[0]
def read_mesons(file_path, bdio_path='./libbdio.so', **kwargs):
""" Extract mesons data from a bdio file and return it as a dictionary
The dictionary can be accessed with a tuple consisting of (type, source_position, kappa1, kappa2)
read_mesons requires bdio to be compiled into a shared library. This can be achieved by
adding the flag -fPIC to CC and changing the all target to
all: bdio.o $(LIBDIR)
gcc -shared -Wl,-soname,libbdio.so -o $(BUILDDIR)/libbdio.so $(BUILDDIR)/bdio.o
cp $(BUILDDIR)/libbdio.so $(LIBDIR)/
Parameters
----------
file_path -- path to the bdio file
bdio_path -- path to the shared bdio library libbdio.so (default ./libbdio.so)
stop -- stops reading at given configuration number (default None)
alternative_ensemble_name -- Manually overwrite ensemble name
"""
bdio = ctypes.cdll.LoadLibrary(bdio_path)
bdio_open = bdio.bdio_open
bdio_open.restype = ctypes.c_void_p
bdio_close = bdio.bdio_close
bdio_close.restype = ctypes.c_int
bdio_close.argtypes = [ctypes.c_void_p]
bdio_seek_record = bdio.bdio_seek_record
bdio_seek_record.restype = ctypes.c_int
bdio_seek_record.argtypes = [ctypes.c_void_p]
bdio_get_rlen = bdio.bdio_get_rlen
bdio_get_rlen.restype = ctypes.c_int
bdio_get_rlen.argtypes = [ctypes.c_void_p]
bdio_get_ruinfo = bdio.bdio_get_ruinfo
bdio_get_ruinfo.restype = ctypes.c_int
bdio_get_ruinfo.argtypes = [ctypes.c_void_p]
bdio_read = bdio.bdio_read
bdio_read.restype = ctypes.c_size_t
bdio_read.argtypes = [ctypes.c_char_p, ctypes.c_size_t, ctypes.c_void_p]
bdio_read_f64 = bdio.bdio_read_f64
bdio_read_f64.restype = ctypes.c_size_t
bdio_read_f64.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_void_p]
b_path = file_path.encode('utf-8')
read = 'r'
b_read = read.encode('utf-8')
form = 'Generic Correlator Format 1.0'
b_form = form.encode('utf-8')
ensemble_name = ''
volume = [] # lattice volume
boundary_conditions = []
corr_name = [] # Contains correlator names
corr_type = [] # Contains correlator data type (important for reading out numerical data)
corr_props = [] # Contains propagator types (Component of corr_kappa)
d0 = 0 # tvals
d1 = 0 # nnoise
prop_kappa = [] # Contains propagator kappas (Component of corr_kappa)
prop_source = [] # Contains propagator source positions
# Check noise type for multiple replica?
cnfg_no = -1
corr_no = -1
data = []
fbdio = bdio_open(ctypes.c_char_p(b_path), ctypes.c_char_p(b_read), ctypes.c_char_p(b_form))
print('Reading of bdio file started')
while True:
record = bdio_seek_record(fbdio)
ruinfo = bdio_get_ruinfo(fbdio)
if ruinfo < 0:
# EOF reached
break
rlen = bdio_get_rlen(fbdio)
if ruinfo == 5:
d_buf = ctypes.c_double * (2 + d0 * d1 * 2)
pd_buf = d_buf()
ppd_buf = ctypes.c_void_p(ctypes.addressof(pd_buf))
iread = bdio_read_f64(ppd_buf, ctypes.c_size_t(rlen), ctypes.c_void_p(fbdio))
if corr_type[corr_no] == 'complex':
tmp_mean = np.mean(np.asarray(np.split(np.asarray(pd_buf[2 + 2 * d1:-2 * d1:2]), d0 - 2)), axis=1)
else:
tmp_mean = np.mean(np.asarray(np.split(np.asarray(pd_buf[2 + d1:-d0 * d1 - d1]), d0 - 2)), axis=1)
data[corr_no].append(tmp_mean)
corr_no += 1
else:
alt_buf = ctypes.create_string_buffer(1024)
palt_buf = ctypes.c_char_p(ctypes.addressof(alt_buf))
iread = bdio_read(palt_buf, ctypes.c_size_t(rlen), ctypes.c_void_p(fbdio))
if rlen != iread:
print('Error')
for i, item in enumerate(alt_buf):
if item == b'\x00':
alt_buf[i] = b' '
tmp_string = (alt_buf[:].decode("utf-8")).rstrip()
if ruinfo == 0:
ensemble_name = _get_kwd(tmp_string, 'ENSEMBLE=')
volume.append(int(_get_kwd(tmp_string, 'L0=')))
volume.append(int(_get_kwd(tmp_string, 'L1=')))
volume.append(int(_get_kwd(tmp_string, 'L2=')))
volume.append(int(_get_kwd(tmp_string, 'L3=')))
boundary_conditions.append(_get_kwd(tmp_string, 'BC0='))
boundary_conditions.append(_get_kwd(tmp_string, 'BC1='))
boundary_conditions.append(_get_kwd(tmp_string, 'BC2='))
boundary_conditions.append(_get_kwd(tmp_string, 'BC3='))
if ruinfo == 1:
corr_name.append(_get_corr_name(tmp_string, 'CORR_NAME='))
corr_type.append(_get_kwd(tmp_string, 'DATATYPE='))
corr_props.append([_get_kwd(tmp_string, 'PROP0='), _get_kwd(tmp_string, 'PROP1=')])
if d0 == 0:
d0 = int(_get_kwd(tmp_string, 'D0='))
else:
if d0 != int(_get_kwd(tmp_string, 'D0=')):
print('Error: Varying number of time values')
if d1 == 0:
d1 = int(_get_kwd(tmp_string, 'D1='))
else:
if d1 != int(_get_kwd(tmp_string, 'D1=')):
print('Error: Varying number of random sources')
if ruinfo == 2:
prop_kappa.append(_get_kwd(tmp_string, 'KAPPA='))
prop_source.append(_get_kwd(tmp_string, 'x0='))
if ruinfo == 4:
if 'stop' in kwargs:
if cnfg_no >= kwargs.get('stop') - 1:
break
cnfg_no += 1
print('\r%s %i' % ('Reading configuration', cnfg_no + 1), end='\r')
if cnfg_no == 0:
no_corrs = len(corr_name)
data = []
for c in range(no_corrs):
data.append([])
corr_no = 0
bdio_close(fbdio)
print('\nEnsemble: ', ensemble_name)
if 'alternative_ensemble_name' in kwargs:
ensemble_name = kwargs.get('alternative_ensemble_name')
print('Ensemble name overwritten to', ensemble_name)
print('Lattice volume: ', volume)
print('Boundary conditions: ', boundary_conditions)
print('Number of time values: ', d0)
print('Number of random sources: ', d1)
print('Number of corrs: ', len(corr_name))
print('Number of configurations: ', cnfg_no + 1)
corr_kappa = [] # Contains kappa values for both propagators of given correlation function
corr_source = []
for item in corr_props:
corr_kappa.append([float(prop_kappa[int(item[0])]), float(prop_kappa[int(item[1])])])
if prop_source[int(item[0])] != prop_source[int(item[1])]:
raise Exception('Source positions do not match for correlator ' + str(item))
else:
corr_source.append(int(prop_source[int(item[0])]))
result = {}
for c in range(no_corrs):
tmp_corr = []
for t in range(d0 - 2):
tmp_corr.append(Obs([np.asarray(data[c])[:, t]], [ensemble_name]))
result[(corr_name[c], corr_source[c]) + tuple(sorted(corr_kappa[c]))] = tmp_corr
# Check that all data entries have the same number of configurations
if len(set([o[0].N for o in list(result.values())])) != 1:
raise Exception('Error: Not all correlators have the same number of configurations. bdio file is possibly corrupted.')
return result
def read_dSdm(file_path, bdio_path='./libbdio.so', **kwargs):
""" Extract dSdm data from a bdio file and return it as a dictionary
The dictionary can be accessed with a tuple consisting of (type, kappa)
read_dSdm requires bdio to be compiled into a shared library. This can be achieved by
adding the flag -fPIC to CC and changing the all target to
all: bdio.o $(LIBDIR)
gcc -shared -Wl,-soname,libbdio.so -o $(BUILDDIR)/libbdio.so $(BUILDDIR)/bdio.o
cp $(BUILDDIR)/libbdio.so $(LIBDIR)/
Parameters
----------
file_path -- path to the bdio file
bdio_path -- path to the shared bdio library libbdio.so (default ./libbdio.so)
stop -- stops reading at given configuration number (default None)
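Example (a minimal sketch; the file name and the dictionary key are hypothetical):
res = read_dSdm('./dSdm.bdio', bdio_path='./libbdio.so')
dSdm_obs = res[('PBP', '0.1362')]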
"""
bdio = ctypes.cdll.LoadLibrary(bdio_path)
bdio_open = bdio.bdio_open
bdio_open.restype = ctypes.c_void_p
bdio_close = bdio.bdio_close
bdio_close.restype = ctypes.c_int
bdio_close.argtypes = [ctypes.c_void_p]
bdio_seek_record = bdio.bdio_seek_record
bdio_seek_record.restype = ctypes.c_int
bdio_seek_record.argtypes = [ctypes.c_void_p]
bdio_get_rlen = bdio.bdio_get_rlen
bdio_get_rlen.restype = ctypes.c_int
bdio_get_rlen.argtypes = [ctypes.c_void_p]
bdio_get_ruinfo = bdio.bdio_get_ruinfo
bdio_get_ruinfo.restype = ctypes.c_int
bdio_get_ruinfo.argtypes = [ctypes.c_void_p]
bdio_read = bdio.bdio_read
bdio_read.restype = ctypes.c_size_t
bdio_read.argtypes = [ctypes.c_char_p, ctypes.c_size_t, ctypes.c_void_p]
bdio_read_f64 = bdio.bdio_read_f64
bdio_read_f64.restype = ctypes.c_size_t
bdio_read_f64.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_void_p]
b_path = file_path.encode('utf-8')
read = 'r'
b_read = read.encode('utf-8')
form = 'Generic Correlator Format 1.0'
b_form = form.encode('utf-8')
ensemble_name = ''
volume = [] # lattice volume
boundary_conditions = []
corr_name = [] # Contains correlator names
corr_type = [] # Contains correlator data type (important for reading out numerical data)
corr_props = [] # Contains propagator types (Component of corr_kappa)
d0 = 0 # number of random sources
d1 = 0 # unused for dSdm files
prop_kappa = [] # Contains propagator kappas (Component of corr_kappa)
# Check noise type for multiple replica?
cnfg_no = -1
corr_no = -1
data = []
fbdio = bdio_open(ctypes.c_char_p(b_path), ctypes.c_char_p(b_read), ctypes.c_char_p(b_form))
print('Reading of bdio file started')
while True:
record = bdio_seek_record(fbdio)
ruinfo = bdio_get_ruinfo(fbdio)
if ruinfo < 0:
# EOF reached
break
rlen = bdio_get_rlen(fbdio)
if ruinfo == 5:
d_buf = ctypes.c_double * (2 + d0)
pd_buf = d_buf()
ppd_buf = ctypes.c_void_p(ctypes.addressof(pd_buf))
iread = bdio_read_f64(ppd_buf, ctypes.c_size_t(rlen), ctypes.c_void_p(fbdio))
tmp_mean = np.mean(np.asarray(pd_buf[2:]))
data[corr_no].append(tmp_mean)
corr_no += 1
else:
alt_buf = ctypes.create_string_buffer(1024)
palt_buf = ctypes.c_char_p(ctypes.addressof(alt_buf))
iread = bdio_read(palt_buf, ctypes.c_size_t(rlen), ctypes.c_void_p(fbdio))
if rlen != iread:
print('Error: only', iread, 'of', rlen, 'bytes read in record')
for i, item in enumerate(alt_buf):
if item == b'\x00':
alt_buf[i] = b' '
tmp_string = (alt_buf[:].decode("utf-8")).rstrip()
if ruinfo == 0:
creator = _get_kwd(tmp_string, 'CREATOR=')
ensemble_name = _get_kwd(tmp_string, 'ENSEMBLE=')
volume.append(int(_get_kwd(tmp_string, 'L0=')))
volume.append(int(_get_kwd(tmp_string, 'L1=')))
volume.append(int(_get_kwd(tmp_string, 'L2=')))
volume.append(int(_get_kwd(tmp_string, 'L3=')))
boundary_conditions.append(_get_kwd(tmp_string, 'BC0='))
boundary_conditions.append(_get_kwd(tmp_string, 'BC1='))
boundary_conditions.append(_get_kwd(tmp_string, 'BC2='))
boundary_conditions.append(_get_kwd(tmp_string, 'BC3='))
if ruinfo == 1:
corr_name.append(_get_corr_name(tmp_string, 'CORR_NAME='))
corr_type.append(_get_kwd(tmp_string, 'DATATYPE='))
corr_props.append(_get_kwd(tmp_string, 'PROP0='))
if d0 == 0:
d0 = int(_get_kwd(tmp_string, 'D0='))
else:
if d0 != int(_get_kwd(tmp_string, 'D0=')):
print('Error: Varying number of time values')
if ruinfo == 2:
prop_kappa.append(_get_kwd(tmp_string, 'KAPPA='))
if ruinfo == 4:
if 'stop' in kwargs:
if cnfg_no >= kwargs.get('stop') - 1:
break
cnfg_no += 1
print('\r%s %i' % ('Reading configuration', cnfg_no + 1), end='\r')
if cnfg_no == 0:
no_corrs = len(corr_name)
data = []
for c in range(no_corrs):
data.append([])
corr_no = 0
bdio_close(fbdio)
print('\nCreator: ', creator)
print('Ensemble: ', ensemble_name)
print('Lattice volume: ', volume)
print('Boundary conditions: ', boundary_conditions)
print('Number of random sources: ', d0)
print('Number of corrs: ', len(corr_name))
print('Number of configurations: ', cnfg_no + 1)
corr_kappa = [] # Contains kappa values for both propagators of given correlation function
corr_source = []
for item in corr_props:
corr_kappa.append(float(prop_kappa[int(item)]))
result = {}
for c in range(no_corrs):
result[(corr_name[c], str(corr_kappa[c]))] = Obs([np.asarray(data[c])], [ensemble_name])
# Check that all data entries have the same number of configurations
if len(set([o.N for o in list(result.values())])) != 1:
raise Exception('Error: Not all correlators have the same number of configurations. bdio file is possibly corrupted.')
return result

660
pyerrors/input/input.py Normal file
View file

@@ -0,0 +1,660 @@
#!/usr/bin/env python
# coding: utf-8
import sys
import os
import fnmatch
import re
import struct
import autograd.numpy as np # Thinly-wrapped numpy
from ..pyerrors import Obs
from ..fits import fit_lin
def read_sfcf(path, prefix, name, **kwargs):
"""Read sfcf C format from given folder structure.
Keyword arguments
-----------------
im -- if True, read imaginary instead of real part of the correlation function.
single -- if True, read a boundary-to-boundary correlation function with a single value
b2b -- if True, read a time-dependent boundary-to-boundary correlation function
names -- Alternative labeling for replicas/ensembles. Has to have the appropriate length
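Example (a minimal sketch; path, prefix, name and replica labels are hypothetical):
f_A = read_sfcf('./data', 'B1k2_', 'f_A', names=['B1k2r1', 'B1k2r2'])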
"""
if kwargs.get('im'):
im = 1
part = 'imaginary'
else:
im = 0
part = 'real'
if kwargs.get('single'):
b2b = 1
single = 1
else:
b2b = 0
single = 0
if kwargs.get('b2b'):
b2b = 1
read = 0
T = 0
start = 0
ls = []
for (dirpath, dirnames, filenames) in os.walk(path):
ls.extend(dirnames)
break
if not ls:
print('Error, directory not found')
sys.exit()
# Exclude folders with different names
for exc in ls:
if not fnmatch.fnmatch(exc, prefix + '*'):
ls = list(set(ls) - set([exc]))
if len(ls) > 1:
ls.sort(key=lambda x: int(re.findall(r'\d+', x[len(prefix):])[0]))
replica = len(ls)
print('Read', part, 'part of', name, 'from', prefix, ',', replica, 'replica')
if 'names' in kwargs:
new_names = kwargs.get('names')
if len(new_names) != replica:
raise Exception('names does not have the required length', replica)
else:
new_names = ls
print(replica, 'replica')
for i, item in enumerate(ls):
print(item)
sub_ls = []
for (dirpath, dirnames, filenames) in os.walk(path+'/'+item):
sub_ls.extend(dirnames)
break
# Exclude folders which are not cfg folders
for exc in sub_ls:
if not fnmatch.fnmatch(exc, 'cfg*'):
sub_ls = list(set(sub_ls) - set([exc]))
sub_ls.sort(key=lambda x: int(x[3:]))
no_cfg = len(sub_ls)
print(no_cfg, 'configurations')
if i == 0:
with open(path + '/' + item + '/' + sub_ls[0] + '/' + name) as fp:
for k, line in enumerate(fp):
if read == 1 and not line.strip() and k > start + 1:
break
if read == 1 and k >= start:
T += 1
if '[correlator]' in line:
read = 1
start = k + 7 + b2b
T -= b2b
deltas = []
for j in range(T):
deltas.append([])
sublength = len(sub_ls)
for j in range(T):
deltas[j].append(np.zeros(sublength))
for cnfg, subitem in enumerate(sub_ls):
with open(path + '/' + item + '/' + subitem + '/'+name) as fp:
for k, line in enumerate(fp):
if start <= k < start + T:
floats = list(map(float, line.split()))
deltas[k-start][i][cnfg] = floats[1 + im - single]
result = []
for t in range(T):
result.append(Obs(deltas[t], new_names))
return result
def read_sfcf_c(path, prefix, name, **kwargs):
"""Read sfcf c format from given folder structure.
Keyword arguments
-----------------
im -- if True, read imaginary instead of real part of the correlation function.
single -- if True, read a boundary-to-boundary correlation function with a single value
b2b -- if True, read a time-dependent boundary-to-boundary correlation function
names -- Alternative labeling for replicas/ensembles. Has to have the appropriate length
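Example (a minimal sketch; path, prefix, name and quark content are hypothetical):
f_P = read_sfcf_c('./data', 'B1k2_r', 'f_P', quarks='lquark lquark')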
"""
if kwargs.get('im'):
im = 1
part = 'imaginary'
else:
im = 0
part = 'real'
if kwargs.get('single'):
b2b = 1
single = 1
else:
b2b = 0
single = 0
if kwargs.get('b2b'):
b2b = 1
read = 0
T = 0
start = 0
ls = []
for (dirpath, dirnames, filenames) in os.walk(path):
ls.extend(dirnames)
break
if not ls:
print('Error, directory not found')
sys.exit()
# Exclude folders with different names
for exc in ls:
if not fnmatch.fnmatch(exc, prefix+'*'):
ls = list(set(ls) - set([exc]))
if len(ls) > 1:
ls.sort(key=lambda x: int(re.findall(r'\d+', x[len(prefix):])[0])) # New version, to cope with ids, etc.
replica = len(ls)
if 'names' in kwargs:
new_names = kwargs.get('names')
if len(new_names) != replica:
raise Exception('names does not have the required length', replica)
else:
new_names = ls
print('Read', part, 'part of', name, 'from', prefix[:-1], ',', replica, 'replica')
for i, item in enumerate(ls):
sub_ls = []
for (dirpath, dirnames, filenames) in os.walk(path+'/'+item):
sub_ls.extend(filenames)
break
for exc in sub_ls:
if not fnmatch.fnmatch(exc, prefix+'*'):
sub_ls = list(set(sub_ls) - set([exc]))
sub_ls.sort(key=lambda x: int(re.findall(r'\d+', x)[-1]))
first_cfg = int(re.findall(r'\d+', sub_ls[0])[-1])
last_cfg = len(sub_ls) + first_cfg - 1
for cfg in range(1, len(sub_ls)):
if int(re.findall(r'\d+', sub_ls[cfg])[-1]) != first_cfg + cfg:
last_cfg = cfg + first_cfg - 1
break
no_cfg = last_cfg - first_cfg + 1
print(item, ':', no_cfg, 'evenly spaced configurations (', first_cfg, '-', last_cfg, ') ,', len(sub_ls) - no_cfg, 'configs omitted\n')
if i == 0:
read = 0
found = 0
with open(path+'/'+item+'/'+sub_ls[0]) as fp:
for k, line in enumerate(fp):
if 'quarks' in kwargs:
if found == 0 and read == 1:
if line.strip() == 'quarks ' + kwargs.get('quarks'):
found = 1
print('found', kwargs.get('quarks'))
else:
read = 0
if read == 1 and not line.strip():
break
if read == 1 and k >= start_read:
T += 1
if line.strip() == 'name '+name:
read = 1
start_read = k + 5 + b2b
print('T =', T, ', starting to read in line', start_read)
# TODO: what to do if start_read was not found
if 'quarks' in kwargs:
if found == 0:
raise Exception(kwargs.get('quarks') + ' not found')
deltas = []
for j in range(T):
deltas.append([])
sublength = no_cfg
for j in range(T):
deltas[j].append(np.zeros(sublength))
for cfg in range(no_cfg):
with open(path+'/'+item+'/'+sub_ls[cfg]) as fp:
for k, line in enumerate(fp):
if k == start_read - 5 - b2b:
if line.strip() != 'name ' + name:
raise Exception('Wrong format', sub_ls[cfg])
if start_read <= k < start_read + T:
floats = list(map(float, line.split()))
deltas[k-start_read][i][cfg] = floats[1 + im - single]
result = []
for t in range(T):
result.append(Obs(deltas[t], new_names))
return result
def read_qtop(path, prefix, **kwargs):
"""Read qtop format from given folder structure.
Keyword arguments
-----------------
target -- specifies the topological sector to be reweighted to (default 0)
full -- if True, read the charge instead of the reweighting factor.
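Example (a minimal sketch; path and prefix are hypothetical):
q_rw = read_qtop('./data', 'B1k2_', target=0)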
"""
if 'target' in kwargs:
target = kwargs.get('target')
else:
target = 0
if kwargs.get('full'):
full = 1
else:
full = 0
ls = []
for (dirpath, dirnames, filenames) in os.walk(path):
ls.extend(filenames)
break
if not ls:
print('Error, directory not found')
sys.exit()
# Exclude files with different names
for exc in ls:
if not fnmatch.fnmatch(exc, prefix+'*'):
ls = list(set(ls) - set([exc]))
if len(ls) > 1:
ls.sort(key=lambda x: int(re.findall(r'\d+', x[len(prefix):])[0])) # New version, to cope with ids, etc.
replica = len(ls)
print('Read Q_top from', prefix[:-1], ',', replica, 'replica')
deltas = []
for rep in range(replica):
tmp = []
with open(path+'/'+ls[rep]) as fp:
for k, line in enumerate(fp):
floats = list(map(float, line.split()))
if full == 1:
tmp.append(floats[1])
else:
if int(floats[1]) == target:
tmp.append(1.0)
else:
tmp.append(0.0)
deltas.append(np.array(tmp))
result = Obs(deltas, [(w.split('.'))[0] for w in ls])
return result
def read_rwms(path, prefix, **kwargs):
"""Read rwms format from given folder structure. Returns a list of length nrw
Keyword arguments
-----------------
new_format -- if True, the array of the associated numbers of Hasenbusch factors is extracted (v>=openQCD1.6)
r_start -- list which contains the first config to be read for each replicum
r_stop -- list which contains the last config to be read for each replicum
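Example (a minimal sketch; path, prefix and replica ranges are hypothetical):
rwf = read_rwms('./ms1_files', 'B1k2_', new_format=True, r_start=[1, 1], r_stop=[500, 500])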
"""
if kwargs.get('new_format'):
extract_nfct = 1
else:
extract_nfct = 0
ls = []
for (dirpath, dirnames, filenames) in os.walk(path):
ls.extend(filenames)
break
if not ls:
print('Error, directory not found')
sys.exit()
# Exclude files with different names
for exc in ls:
if not fnmatch.fnmatch(exc, prefix + '*.dat'):
ls = list(set(ls) - set([exc]))
if len(ls) > 1:
ls.sort(key=lambda x: int(re.findall(r'\d+', x[len(prefix):])[0]))
replica = len(ls)
if 'r_start' in kwargs:
r_start = kwargs.get('r_start')
if len(r_start) != replica:
raise Exception('r_start does not match number of replicas')
# Adjust Configuration numbering to python index
r_start = [o - 1 if o else None for o in r_start]
else:
r_start = [None] * replica
if 'r_stop' in kwargs:
r_stop = kwargs.get('r_stop')
if len(r_stop) != replica:
raise Exception('r_stop does not match number of replicas')
else:
r_stop = [None] * replica
print('Read reweighting factors from', prefix[:-1], ',', replica, 'replica', end='')
print_err = 0
if 'print_err' in kwargs:
print_err = 1
print()
deltas = []
for rep in range(replica):
tmp_array = []
with open(path + '/' + ls[rep], 'rb') as fp:
# header
t = fp.read(4) # number of reweighting factors
if rep == 0:
nrw = struct.unpack('i', t)[0]
for k in range(nrw):
deltas.append([])
else:
if nrw != struct.unpack('i', t)[0]:
print('Error: different number of reweighting factors for replicum', rep)
sys.exit()
for k in range(nrw):
tmp_array.append([])
# This block is necessary for openQCD1.6 ms1 files
nfct = []
if extract_nfct == 1:
for i in range(nrw):
t = fp.read(4)
nfct.append(struct.unpack('i', t)[0])
print('nfct: ', nfct) # Hasenbusch factor, 1 for rat reweighting
else:
for i in range(nrw):
nfct.append(1)
nsrc = []
for i in range(nrw):
t = fp.read(4)
nsrc.append(struct.unpack('i', t)[0])
# body
while True:
t = fp.read(4)
if len(t) < 4:
break
if print_err:
config_no = struct.unpack('i', t)
for i in range(nrw):
tmp_nfct = 1.0
for j in range(nfct[i]):
t = fp.read(8 * nsrc[i])
t = fp.read(8 * nsrc[i])
tmp_rw = struct.unpack('d' * nsrc[i], t)
tmp_nfct *= np.mean(np.exp(-np.asarray(tmp_rw)))
if print_err:
print(config_no, i, j, np.mean(np.exp(-np.asarray(tmp_rw))), np.std(np.exp(-np.asarray(tmp_rw))))
print('Sources:', np.exp(-np.asarray(tmp_rw)))
print('Partial factor:', tmp_nfct)
tmp_array[i].append(tmp_nfct)
for k in range(nrw):
deltas[k].append(tmp_array[k][r_start[rep]:r_stop[rep]])
print(',', nrw, 'reweighting factors with', nsrc, 'sources')
result = []
for t in range(nrw):
result.append(Obs(deltas[t], [(w.split('.'))[0] for w in ls]))
return result
def read_pbp(path, prefix, **kwargs):
"""Read pbp format from given folder structure. Returns a list of length nrw
Keyword arguments
-----------------
r_start -- list which contains the first config to be read for each replicum
r_stop -- list which contains the last config to be read for each replicum
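Example (a minimal sketch; path, prefix and replica ranges are hypothetical):
pbp = read_pbp('./pbp_files', 'B1k2_', r_start=[1, 1], r_stop=[500, 500])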
"""
extract_nfct = 1
ls = []
for (dirpath, dirnames, filenames) in os.walk(path):
ls.extend(filenames)
break
if not ls:
print('Error, directory not found')
sys.exit()
# Exclude files with different names
for exc in ls:
if not fnmatch.fnmatch(exc, prefix + '*.dat'):
ls = list(set(ls) - set([exc]))
if len(ls) > 1:
ls.sort(key=lambda x: int(re.findall(r'\d+', x[len(prefix):])[0]))
replica = len(ls)
if 'r_start' in kwargs:
r_start = kwargs.get('r_start')
if len(r_start) != replica:
raise Exception('r_start does not match number of replicas')
# Adjust Configuration numbering to python index
r_start = [o - 1 if o else None for o in r_start]
else:
r_start = [None] * replica
if 'r_stop' in kwargs:
r_stop = kwargs.get('r_stop')
if len(r_stop) != replica:
raise Exception('r_stop does not match number of replicas')
else:
r_stop = [None] * replica
print(r'Read <\bar{\psi}\psi> from', prefix[:-1], ',', replica, 'replica', end='')
print_err = 0
if 'print_err' in kwargs:
print_err = 1
print()
deltas = []
for rep in range(replica):
tmp_array = []
with open(path + '/' + ls[rep], 'rb') as fp:
# header
t = fp.read(4) # number of reweighting factors
if rep == 0:
nrw = struct.unpack('i', t)[0]
for k in range(nrw):
deltas.append([])
else:
if nrw != struct.unpack('i', t)[0]:
print('Error: different number of reweighting factors for replicum', rep)
sys.exit()
for k in range(nrw):
tmp_array.append([])
# This block is necessary for openQCD1.6 ms1 files
nfct = []
if extract_nfct == 1:
for i in range(nrw):
t = fp.read(4)
nfct.append(struct.unpack('i', t)[0])
print('nfct: ', nfct) # Hasenbusch factor, 1 for rat reweighting
else:
for i in range(nrw):
nfct.append(1)
nsrc = []
for i in range(nrw):
t = fp.read(4)
nsrc.append(struct.unpack('i', t)[0])
# body
while True:
t = fp.read(4)
if len(t) < 4:
break
if print_err:
config_no = struct.unpack('i', t)
for i in range(nrw):
tmp_nfct = 1.0
for j in range(nfct[i]):
t = fp.read(8 * nsrc[i])
t = fp.read(8 * nsrc[i])
tmp_rw = struct.unpack('d' * nsrc[i], t)
tmp_nfct *= np.mean(np.asarray(tmp_rw))
if print_err:
print(config_no, i, j, np.mean(np.asarray(tmp_rw)), np.std(np.asarray(tmp_rw)))
print('Sources:', np.asarray(tmp_rw))
print('Partial factor:', tmp_nfct)
tmp_array[i].append(tmp_nfct)
for k in range(nrw):
deltas[k].append(tmp_array[k][r_start[rep]:r_stop[rep]])
print(',', nrw, r'<\bar{\psi}\psi> with', nsrc, 'sources')
result = []
for t in range(nrw):
result.append(Obs(deltas[t], [(w.split('.'))[0] for w in ls]))
return result
def extract_t0(path, prefix, dtr_read, xmin, spatial_extent, fit_range=5, **kwargs):
"""Extract t0 from given .ms.dat files. Returns t0 as Obs.
It is assumed that all boundary effects have sufficiently decayed at x0=xmin.
The data around the zero crossing of t^2<E> - 0.3 is fitted with a linear function
from which the exact root is extracted.
Only works with openQCD v 1.2.
Parameters
----------
path -- Path to .ms.dat files
prefix -- Ensemble prefix
dtr_read -- Determines how many trajectories should be skipped when reading the ms.dat files.
Corresponds to dtr_cnfg / dtr_ms in the openQCD input file.
xmin -- First timeslice where the boundary effects have sufficiently decayed.
spatial_extent -- spatial extent of the lattice, required for normalization.
fit_range -- Number of data points left and right of the zero crossing to be included in the linear fit. (Default: 5)
Keyword arguments
-----------------
r_start -- list which contains the first config to be read for each replicum.
r_stop -- list which contains the last config to be read for each replicum.
plaquette -- If True, extract the plaquette estimate of t0 instead.
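Example (a minimal sketch; all argument values are hypothetical):
t0 = extract_t0('./ms_files', 'B1k2_', dtr_read=2, xmin=4, spatial_extent=32)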
"""
ls = []
for (dirpath, dirnames, filenames) in os.walk(path):
ls.extend(filenames)
break
if not ls:
print('Error, directory not found')
sys.exit()
# Exclude files with different names
for exc in ls:
if not fnmatch.fnmatch(exc, prefix + '*.ms.dat'):
ls = list(set(ls) - set([exc]))
if len(ls) > 1:
ls.sort(key=lambda x: int(re.findall(r'\d+', x[len(prefix):])[0]))
replica = len(ls)
if 'r_start' in kwargs:
r_start = kwargs.get('r_start')
if len(r_start) != replica:
raise Exception('r_start does not match number of replicas')
# Adjust Configuration numbering to python index
r_start = [o - 1 if o else None for o in r_start]
else:
r_start = [None] * replica
if 'r_stop' in kwargs:
r_stop = kwargs.get('r_stop')
if len(r_stop) != replica:
raise Exception('r_stop does not match number of replicas')
else:
r_stop = [None] * replica
print('Extract t0 from', prefix, ',', replica, 'replica')
Ysum = []
for rep in range(replica):
with open(path + '/' + ls[rep], 'rb') as fp:
# Read header
t = fp.read(12)
header = struct.unpack('iii', t)
if rep == 0:
dn = header[0]
nn = header[1]
tmax = header[2]
elif dn != header[0] or nn != header[1] or tmax != header[2]:
raise Exception('Replica parameters do not match.')
t = fp.read(8)
if rep == 0:
eps = struct.unpack('d', t)[0]
print('Step size:', eps, ', Maximal t value:', dn * nn * eps)
elif eps != struct.unpack('d', t)[0]:
raise Exception('Values for eps do not match among replica.')
Ysl = []
# Read body
while True:
t = fp.read(4)
if len(t) < 4:
break
nc = struct.unpack('i', t)[0]
t = fp.read(8 * tmax * (nn + 1))
if kwargs.get('plaquette'):
if nc % dtr_read == 0:
Ysl.append(struct.unpack('d' * tmax * (nn + 1), t))
t = fp.read(8 * tmax * (nn + 1))
if not kwargs.get('plaquette'):
if nc % dtr_read == 0:
Ysl.append(struct.unpack('d' * tmax * (nn + 1), t))
t = fp.read(8 * tmax * (nn + 1))
Ysum.append([])
for i, item in enumerate(Ysl):
Ysum[-1].append([np.mean(item[current + xmin:current + tmax - xmin]) for current in range(0, len(item), tmax)])
t2E_dict = {}
for n in range(nn + 1):
samples = []
for nrep, rep in enumerate(Ysum):
samples.append([])
for cnfg in rep:
samples[-1].append(cnfg[n])
samples[-1] = samples[-1][r_start[nrep]:r_stop[nrep]]
new_obs = Obs(samples, [(w.split('.'))[0] for w in ls])
t2E_dict[n * dn * eps] = (n * dn * eps) ** 2 * new_obs / (spatial_extent ** 3) - 0.3
zero_crossing = np.argmax(np.array([o.value for o in t2E_dict.values()]) > 0.0)
x = list(t2E_dict.keys())[zero_crossing - fit_range: zero_crossing + fit_range]
y = list(t2E_dict.values())[zero_crossing - fit_range: zero_crossing + fit_range]
[o.gamma_method() for o in y]
fit_result = fit_lin(x, y)
return -fit_result[0] / fit_result[1]

160
pyerrors/jackknifing.py Normal file
View file

@@ -0,0 +1,160 @@
#!/usr/bin/env python
# coding: utf-8
import pickle
import matplotlib.pyplot as plt
import numpy as np
def _jack_error(jack):
n = jack.size
mean = np.mean(jack)
error = 0
for i in range(n):
error += (jack[i] - mean) ** 2
return np.sqrt((n - 1) / n * error)
class Jack:
def __init__(self, value, jacks):
self.jacks = jacks
self.N = list(map(np.size, self.jacks))
self.max_binsize = len(self.N)
self.value = value  # list(map(np.mean, self.jacks))
self.dvalue = list(map(_jack_error, self.jacks))
def print(self, **kwargs):
"""Print basic properties of the Jack."""
if 'binsize' in kwargs:
b = kwargs.get('binsize') - 1
if b == -1:
b = 0
if not isinstance(b, int):
raise TypeError('binsize has to be integer')
if b + 1 > self.max_binsize:
raise Exception('Chosen binsize not calculated')
else:
b = 0
print('Result:\t %3.8e +/- %3.8e +/- %3.8e (%3.3f%%)' % (self.value, self.dvalue[b], self.dvalue[b] * np.sqrt(2 * b / self.N[0]), np.abs(self.dvalue[b] / self.value * 100)))
def plot_tauint(self):
plt.xlabel('binsize')
plt.ylabel('tauint')
length = self.max_binsize
x = np.arange(length) + 1
plt.errorbar(x[:], (self.dvalue[:] / self.dvalue[0]) ** 2 / 2, yerr=np.sqrt(((2 * (self.dvalue[:] / self.dvalue[0]) ** 2 * np.sqrt(2 * x[:] / self.N[0])) / 2) ** 2
+ ((2 * (self.dvalue[:] / self.dvalue[0]) ** 2 * np.sqrt(2 / self.N[0])) / 2) ** 2), linewidth=1, capsize=2)
plt.xlim(0.5, length + 0.5)
plt.title('Tauint')
plt.show()
def plot_history(self):
"""Plot the history of the unbinned data, reconstructed from the jackknife samples."""
n = self.N[0]
jack = self.jacks[0]
mean = np.mean(jack)
# Invert the jackknife construction to recover the original samples
y = n * mean - (n - 1) * jack
x = np.arange(n)
plt.errorbar(x, y, fmt='.', markersize=3)
plt.xlim(-0.5, n - 0.5)
plt.show()
def dump(self, name, **kwargs):
"""Dump the Jack to a pickle file 'name'.
Keyword arguments:
path -- specifies a custom path for the file (default '.')
"""
if 'path' in kwargs:
file_name = kwargs.get('path') + '/' + name + '.p'
else:
file_name = name + '.p'
with open(file_name, 'wb') as fb:
pickle.dump(self, fb)
def generate_jack(obs, **kwargs):
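"""Generate a Jack from an Obs via jackknife resampling.
The Monte Carlo histories of all replica are concatenated and jackknife
samples are constructed for all binsizes from 1 to max_binsize.
Keyword arguments
-----------------
max_binsize -- maximum binsize to be considered (default 1)
"""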
full_data = []
for r, name in enumerate(obs.names):
if r == 0:
full_data = obs.deltas[name] + obs.r_values[name]
else:
full_data = np.append(full_data, obs.deltas[name] + obs.r_values[name])
jacks = []
if 'max_binsize' in kwargs:
max_b = kwargs.get('max_binsize')
if not isinstance(max_b, int):
raise TypeError('max_binsize has to be integer')
else:
max_b = 1
for b in range(max_b):
# binning if necessary
if b > 0:
n = full_data.size // (b + 1)
binned_data = np.zeros(n)
for i in range(n):
for j in range(b + 1):
binned_data[i] += full_data[i * (b + 1) + j]
binned_data[i] /= (b + 1)
else:
binned_data = full_data
n = binned_data.size
# generate jacks from data
mean = np.mean(binned_data)
tmp_jacks = np.zeros(n)
for i in range(n):
tmp_jacks[i] = (n * mean - binned_data[i]) / (n - 1)
jacks.append(tmp_jacks)
# Value is not correctly reproduced here
return Jack(obs.value, jacks)
def derived_jack(func, data, **kwargs):
"""Construct a derived Jack according to func(data, **kwargs).
Parameters
----------
func -- arbitrary function of the form func(data, **kwargs) which is evaluated
on the mean values and on every jackknife sample.
data -- list of Jacks, e.g. [jack1, jack2, jack3].
Notes
-----
For simple mathematical operations it can be practical to use anonymous functions.
For the ratio of two jacks one can e.g. use
new_jack = derived_jack(lambda x : x[0] / x[1], [jack1, jack2])
"""
# Check shapes of data
if not all(x.N == data[0].N for x in data):
raise Exception('Error: Shape of data does not fit')
values = np.zeros(len(data))
for j, item in enumerate(data):
values[j] = item.value
new_value = func(values, **kwargs)
jacks = []
for b in range(data[0].max_binsize):
tmp_jacks = np.zeros(data[0].N[b])
for i in range(data[0].N[b]):
values = np.zeros(len(data))
for j, item in enumerate(data):
values[j] = item.jacks[b][i]
tmp_jacks[i] = func(values, **kwargs)
jacks.append(tmp_jacks)
return Jack(new_value, jacks)

347
pyerrors/linalg.py Normal file
View file

@@ -0,0 +1,347 @@
#!/usr/bin/env python
# coding: utf-8
import numpy as np
import autograd.numpy as anp # Thinly-wrapped numpy
from .pyerrors import derived_observable
### This code block is directly taken from the current master branch of autograd and remains
# only until the new version is released on PyPI
from functools import partial
from autograd.extend import defvjp
_dot = partial(anp.einsum, '...ij,...jk->...ik')
# batched diag
_diag = lambda a: anp.eye(a.shape[-1])*a
# batched diagonal, similar to matrix_diag in tensorflow
def _matrix_diag(a):
reps = anp.array(a.shape)
reps[:-1] = 1
reps[-1] = a.shape[-1]
newshape = list(a.shape) + [a.shape[-1]]
return _diag(anp.tile(a, reps).reshape(newshape))
# https://arxiv.org/pdf/1701.00392.pdf Eq(4.77)
# Note the formula from Sec3.1 in https://people.maths.ox.ac.uk/gilesm/files/NA-08-01.pdf is incomplete
def grad_eig(ans, x):
"""Gradient of a general square (complex valued) matrix"""
e, u = ans # eigenvalues as 1d array, eigenvectors in columns
n = e.shape[-1]
def vjp(g):
ge, gu = g
ge = _matrix_diag(ge)
f = 1/(e[..., anp.newaxis, :] - e[..., :, anp.newaxis] + 1.e-20)
f -= _diag(f)
ut = anp.swapaxes(u, -1, -2)
r1 = f * _dot(ut, gu)
r2 = -f * (_dot(_dot(ut, anp.conj(u)), anp.real(_dot(ut, gu)) * anp.eye(n)))
r = _dot(_dot(anp.linalg.inv(ut), ge + r1 + r2), ut)
if not anp.iscomplexobj(x):
r = anp.real(r)
# the derivative is still complex for real input (imaginary delta is allowed), real output
# but the derivative should be real in real input case when imaginary delta is forbidden
return r
return vjp
defvjp(anp.linalg.eig, grad_eig)
### End of the code block from autograd.master
def scalar_mat_op(op, obs, **kwargs):
"""Computes the matrix to scalar operation op to a given matrix of Obs."""
def _mat(x, **kwargs):
dim = int(np.sqrt(len(x)))
if np.sqrt(len(x)) != dim:
raise Exception('Input has to have dim**2 entries')
mat = []
for i in range(dim):
row = []
for j in range(dim):
row.append(x[j + dim * i])
mat.append(row)
return op(anp.array(mat))
if isinstance(obs, np.ndarray):
raveled_obs = (1 * (obs.ravel())).tolist()
elif isinstance(obs, list):
raveled_obs = obs
else:
raise TypeError('Improper type of input.')
return derived_observable(_mat, raveled_obs, **kwargs)
def mat_mat_op(op, obs, **kwargs):
"""Computes the matrix to matrix operation op to a given matrix of Obs."""
if kwargs.get('num_grad') is True:
return _num_diff_mat_mat_op(op, obs, **kwargs)
return derived_observable(lambda x, **kwargs: op(x), obs)
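# Usage sketch for mat_mat_op (matrix is assumed to be a square np.ndarray of Obs):
#     inv = mat_mat_op(anp.linalg.inv, matrix)
#     cholesky = mat_mat_op(anp.linalg.cholesky, matrix @ matrix.T)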
def eigh(obs, **kwargs):
"""Computes the eigenvalues and eigenvectors of a given hermitian matrix of Obs according to np.linalg.eigh."""
if kwargs.get('num_grad') is True:
return _num_diff_eigh(obs, **kwargs)
w = derived_observable(lambda x, **kwargs: anp.linalg.eigh(x)[0], obs)
v = derived_observable(lambda x, **kwargs: anp.linalg.eigh(x)[1], obs)
return w, v
def eig(obs, **kwargs):
"""Computes the eigenvalues of a given matrix of Obs according to np.linalg.eig."""
if kwargs.get('num_grad') is True:
return _num_diff_eig(obs, **kwargs)
# Note: Automatic differentiation of eig is implemented in the git of autograd
# but not yet released to PyPI (1.3)
w = derived_observable(lambda x, **kwargs: anp.real(anp.linalg.eig(x)[0]), obs)
return w
def pinv(obs, **kwargs):
"""Computes the Moore-Penrose pseudoinverse of a matrix of Obs."""
if kwargs.get('num_grad') is True:
return _num_diff_pinv(obs, **kwargs)
return derived_observable(lambda x, **kwargs: anp.linalg.pinv(x), obs)
def svd(obs, **kwargs):
"""Computes the singular value decomposition of a matrix of Obs."""
if kwargs.get('num_grad') is True:
return _num_diff_svd(obs, **kwargs)
u = derived_observable(lambda x, **kwargs: anp.linalg.svd(x, full_matrices=False)[0], obs)
s = derived_observable(lambda x, **kwargs: anp.linalg.svd(x, full_matrices=False)[1], obs)
vh = derived_observable(lambda x, **kwargs: anp.linalg.svd(x, full_matrices=False)[2], obs)
return (u, s, vh)
def slog_det(obs, **kwargs):
"""Computes the determinant of a matrix of Obs via np.linalg.slogdet."""
def _mat(x):
dim = int(np.sqrt(len(x)))
if np.sqrt(len(x)) != dim:
raise Exception('Input has to have dim**2 entries')
mat = []
for i in range(dim):
row = []
for j in range(dim):
row.append(x[j + dim * i])
mat.append(row)
(sign, logdet) = anp.linalg.slogdet(np.array(mat))
return sign * anp.exp(logdet)
if isinstance(obs, np.ndarray):
return derived_observable(_mat, (1 * (obs.ravel())).tolist(), **kwargs)
elif isinstance(obs, list):
return derived_observable(_mat, obs, **kwargs)
else:
raise TypeError('Improper type of input.')
# Variants for numerical differentiation
def _num_diff_mat_mat_op(op, obs, **kwargs):
"""Computes the matrix to matrix operation op to a given matrix of Obs elementwise
which is suitable for numerical differentiation."""
def _mat(x, **kwargs):
dim = int(np.sqrt(len(x)))
if np.sqrt(len(x)) != dim:
raise Exception('Input has to have dim**2 entries')
mat = []
for i in range(dim):
row = []
for j in range(dim):
row.append(x[j + dim * i])
mat.append(row)
return op(np.array(mat))[kwargs.get('i')][kwargs.get('j')]
if isinstance(obs, np.ndarray):
raveled_obs = (1 * (obs.ravel())).tolist()
elif isinstance(obs, list):
raveled_obs = obs
else:
raise TypeError('Improper type of input.')
dim = int(np.sqrt(len(raveled_obs)))
res_mat = []
for i in range(dim):
row = []
for j in range(dim):
row.append(derived_observable(_mat, raveled_obs, i=i, j=j, **kwargs))
res_mat.append(row)
return np.array(res_mat) @ np.identity(dim)
def _num_diff_eigh(obs, **kwargs):
"""Computes the eigenvalues and eigenvectors of a given hermitian matrix of Obs according to np.linalg.eigh
elementwise which is suitable for numerical differentiation."""
def _mat(x, **kwargs):
dim = int(np.sqrt(len(x)))
if np.sqrt(len(x)) != dim:
raise Exception('Input has to have dim**2 entries')
mat = []
for i in range(dim):
row = []
for j in range(dim):
row.append(x[j + dim * i])
mat.append(row)
n = kwargs.get('n')
res = np.linalg.eigh(np.array(mat))[n]
if n == 0:
return res[kwargs.get('i')]
else:
return res[kwargs.get('i')][kwargs.get('j')]
if isinstance(obs, np.ndarray):
raveled_obs = (1 * (obs.ravel())).tolist()
elif isinstance(obs, list):
raveled_obs = obs
else:
raise TypeError('Improper type of input.')
dim = int(np.sqrt(len(raveled_obs)))
res_vec = []
for i in range(dim):
res_vec.append(derived_observable(_mat, raveled_obs, n=0, i=i, **kwargs))
res_mat = []
for i in range(dim):
row = []
for j in range(dim):
row.append(derived_observable(_mat, raveled_obs, n=1, i=i, j=j, **kwargs))
res_mat.append(row)
return (np.array(res_vec) @ np.identity(dim), np.array(res_mat) @ np.identity(dim))
def _num_diff_eig(obs, **kwargs):
"""Computes the eigenvalues of a given matrix of Obs according to np.linalg.eig
elementwise, which is suitable for numerical differentiation."""
def _mat(x, **kwargs):
dim = int(np.sqrt(len(x)))
if np.sqrt(len(x)) != dim:
raise Exception('Input has to have dim**2 entries')
mat = []
for i in range(dim):
row = []
for j in range(dim):
row.append(x[j + dim * i])
mat.append(row)
n = kwargs.get('n')
res = np.linalg.eig(np.array(mat))[n]
if n == 0:
# Discard imaginary part of eigenvalue here
return np.real(res[kwargs.get('i')])
else:
return res[kwargs.get('i')][kwargs.get('j')]
if isinstance(obs, np.ndarray):
raveled_obs = (1 * (obs.ravel())).tolist()
elif isinstance(obs, list):
raveled_obs = obs
else:
raise TypeError('Improper type of input.')
dim = int(np.sqrt(len(raveled_obs)))
res_vec = []
for i in range(dim):
# Note: Automatic differentiation of eig is implemented in the git of autograd
# but not yet released to PyPI (1.3)
res_vec.append(derived_observable(_mat, raveled_obs, n=0, i=i, **kwargs))
return np.array(res_vec) @ np.identity(dim)
def _num_diff_pinv(obs, **kwargs):
"""Computes the Moore-Penrose pseudoinverse of a matrix of Obs elementwise which is suitable
for numerical differentiation."""
def _mat(x, **kwargs):
shape = kwargs.get('shape')
mat = []
for i in range(shape[0]):
row = []
for j in range(shape[1]):
row.append(x[j + shape[1] * i])
mat.append(row)
return np.linalg.pinv(np.array(mat))[kwargs.get('i')][kwargs.get('j')]
if isinstance(obs, np.ndarray):
shape = obs.shape
raveled_obs = (1 * (obs.ravel())).tolist()
else:
raise TypeError('Improper type of input.')
res_mat = []
for i in range(shape[1]):
row = []
for j in range(shape[0]):
row.append(derived_observable(_mat, raveled_obs, shape=shape, i=i, j=j, **kwargs))
res_mat.append(row)
return np.array(res_mat) @ np.identity(shape[0])
def _num_diff_svd(obs, **kwargs):
"""Computes the singular value decomposition of a matrix of Obs elementwise which
is suitable for numerical differentiation."""
def _mat(x, **kwargs):
shape = kwargs.get('shape')
mat = []
for i in range(shape[0]):
row = []
for j in range(shape[1]):
row.append(x[j + shape[1] * i])
mat.append(row)
res = np.linalg.svd(np.array(mat), full_matrices=False)
if kwargs.get('n') == 1:
return res[1][kwargs.get('i')]
else:
return res[kwargs.get('n')][kwargs.get('i')][kwargs.get('j')]
if isinstance(obs, np.ndarray):
shape = obs.shape
raveled_obs = (1 * (obs.ravel())).tolist()
else:
raise TypeError('Improper type of input.')
mid_index = min(shape[0], shape[1])
res_mat0 = []
for i in range(shape[0]):
row = []
for j in range(mid_index):
row.append(derived_observable(_mat, raveled_obs, shape=shape, n=0, i=i, j=j, **kwargs))
res_mat0.append(row)
res_mat1 = []
for i in range(mid_index):
res_mat1.append(derived_observable(_mat, raveled_obs, shape=shape, n=1, i=i, **kwargs))
res_mat2 = []
for i in range(mid_index):
row = []
for j in range(shape[1]):
row.append(derived_observable(_mat, raveled_obs, shape=shape, n=2, i=i, j=j, **kwargs))
res_mat2.append(row)
return (np.array(res_mat0) @ np.identity(mid_index), np.array(res_mat1) @ np.identity(mid_index), np.array(res_mat2) @ np.identity(shape[1]))

84
pyerrors/misc.py Normal file
View file

@@ -0,0 +1,84 @@
#!/usr/bin/env python
# coding: utf-8
import gc
import numpy as np
import scipy.stats
import matplotlib.pyplot as plt
from .pyerrors import Obs
def gen_correlated_data(means, cov, name, tau=0.5, samples=1000):
""" Generate observables with given covariance and autocorrelation times.
Arguments
-----------------
means -- list containing the mean value of each observable.
cov -- covariance matrix for the data to be generated.
name -- ensemble name for the data to be generated.
tau -- can either be a real number or a list with an entry for
every dataset.
samples -- number of samples to be generated for each observable.
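Example (a minimal sketch with a hypothetical 2x2 covariance matrix):
cov = np.array([[0.5, 0.1], [0.1, 0.25]])
obs_list = gen_correlated_data([1.0, 2.0], cov, 'ens1', tau=[4.0, 8.0], samples=5000)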
"""
assert len(means) == cov.shape[-1]
tau = np.asarray(tau)
if np.min(tau) < 0.5:
raise Exception('All integrated autocorrelations have to be >= 0.5.')
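# AR(1) coefficient which reproduces the requested integrated autocorrelation time tau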
a = (2 * tau - 1) / (2 * tau + 1)
rand = np.random.multivariate_normal(np.zeros_like(means), cov * samples, samples)
# Normalize samples such that sample variance matches input
norm = np.array([np.var(o, ddof=1) / samples for o in rand.T])
rand = rand @ np.diag(np.sqrt(np.diag(cov))) @ np.diag(1 / np.sqrt(norm))
data = [rand[0]]
for i in range(1, samples):
data.append(np.sqrt(1 - a ** 2) * rand[i] + a * data[-1])
corr_data = np.array(data) - np.mean(data, axis=0) + means
return [Obs([dat], [name]) for dat in corr_data.T]
def ks_test(obs=None):
"""Performs a KolmogorovSmirnov test for the Q-values of a list of Obs.
If no list is given all Obs in memory are used.
Disclaimer: The determination of the individual Q-values as well as this function have not been tested yet.
"""
if obs is None:
obs_list = []
for obj in gc.get_objects():
if isinstance(obj, Obs):
obs_list.append(obj)
else:
obs_list = obs
Qs = []
for obs_i in obs_list:
for ens in obs_i.e_names:
if obs_i.e_Q[ens] is not None:
Qs.append(obs_i.e_Q[ens])
bins = len(Qs)
x = np.arange(0, 1.001, 0.001)
plt.plot(x, x, 'k', zorder=1)
plt.xlim(0, 1)
plt.ylim(0, 1)
plt.xlabel('Q value')
plt.ylabel('Cumulative probability')
plt.title(str(bins) + ' Q values')
n = np.arange(1, bins + 1) / float(bins)
Xs = np.sort(Qs)
plt.step(Xs, n)
diffs = n - Xs
loc_max_diff = np.argmax(np.abs(diffs))
loc = Xs[loc_max_diff]
plt.annotate(s='', xy=(loc, loc), xytext=(loc, loc + diffs[loc_max_diff]), arrowprops=dict(arrowstyle='<->', shrinkA=0, shrinkB=0))
plt.show()
print(scipy.stats.kstest(Qs, 'uniform'))

112
pyerrors/mpm.py Normal file
View file

@@ -0,0 +1,112 @@
#!/usr/bin/env python
# coding: utf-8
import numpy as np
import scipy.linalg
from .pyerrors import Obs
from .linalg import svd, eig, pinv
def matrix_pencil_method(corrs, k=1, p=None, **kwargs):
""" Matrix pencil method to extract k energy levels from data
Implementation of the matrix pencil method based on
eq. (2.17) of Y. Hua, T. K. Sarkar, IEEE Trans. Acoust. 38, 814-824 (1990)
Parameters
----------
corrs -- can be a list of Obs for the analysis of a single correlator, or a list of lists
of Obs if several correlators are to be analyzed at once.
k -- Number of states to extract (default 1).
p -- matrix pencil parameter which filters noise. The optimal value is expected between
len(data)/3 and 2*len(data)/3. The computation is more expensive the closer p is
to len(data)/2 but could possibly suppress more noise (default len(data)//2).
"""
if isinstance(corrs[0], Obs):
data = [corrs]
else:
data = corrs
lengths = [len(d) for d in data]
if lengths.count(lengths[0]) != len(lengths):
raise Exception('All datasets have to have the same length.')
data_sets = len(data)
n_data = len(data[0])
if p is None:
p = max(n_data // 2, k)
if n_data <= p:
raise Exception('The pencil p has to be smaller than the number of data samples.')
if p < k or n_data - p < k:
raise Exception('Cannot extract', k, 'energy levels with p=', p, 'and N-p=', n_data - p)
# Construct the Hankel matrices
matrix = []
for n in range(data_sets):
matrix.append(scipy.linalg.hankel(data[n][:n_data-p], data[n][n_data-p-1:]))
matrix = np.array(matrix)
# Construct y1 and y2
y1 = np.concatenate(matrix[:, :, :p])
y2 = np.concatenate(matrix[:, :, 1:])
# Apply SVD to y2
u, s, vh = svd(y2, **kwargs)
# Construct z from y1 and SVD of y2, setting all singular values beyond the kth to zero
z = np.diag(1. / s[:k]) @ u[:, :k].T @ y1 @ vh.T[:, :k]
# Return the sorted logarithms of the real eigenvalues as Obs
energy_levels = np.log(np.abs(eig(z, **kwargs)))
return sorted(energy_levels, key=lambda x: abs(x.value))
def matrix_pencil_method_old(data, p, noise_level=None, verbose=1, **kwargs):
""" Older impleentation of the matrix pencil method with pencil p on given data to
extract energy levels.
Parameters
----------
data -- lists of Obs, where the nth entry is considered to be the correlation function
at x0=n+offset.
p -- matrix pencil parameter which corresponds to the number of energy levels to extract.
higher values for p can help decreasing noise.
noise_level -- If this argument is not None an additional prefiltering via singular
value decomposition is performed in which all singular values below 10^(-noise_level)
times the largest singular value are discarded. This increases the computation time.
verbose -- if larger than zero details about the noise filtering are printed to stdout
(default 1)
"""
n_data = len(data)
if n_data <= p:
raise Exception('The pencil p has to be smaller than the number of data samples.')
matrix = scipy.linalg.hankel(data[:n_data-p], data[n_data-p-1:]) @ np.identity(p + 1)
if noise_level is not None:
u, s, vh = svd(matrix)
s_values = np.vectorize(lambda x: x.value)(s)
if verbose > 0:
print('Singular values: ', s_values)
digit = np.argwhere(s_values / s_values[0] < 10.0**(-noise_level))
if digit.size == 0:
digit = len(s_values)
else:
digit = int(digit[0])
if verbose > 0:
print('Consider only', digit, 'out of', len(s), 'singular values')
new_matrix = u[:, :digit] * s[:digit] @ vh[:digit, :]
y1 = new_matrix[:, :-1]
y2 = new_matrix[:, 1:]
else:
y1 = matrix[:, :-1]
y2 = matrix[:, 1:]
# Moore-Penrose pseudoinverse
pinv_y1 = pinv(y1)
# Note: Automatic differentiation of eig is implemented in the git of autograd
# but not yet released to PyPI (1.3). The code is currently part of pyerrors
e = eig((pinv_y1 @ y2), **kwargs)
energy_levels = -np.log(np.abs(e))
return sorted(energy_levels, key=lambda x: abs(x.value))

1222
pyerrors/pyerrors.py Normal file

File diff suppressed because it is too large

4
pytest.ini Normal file
View file

@@ -0,0 +1,4 @@
[pytest]
filterwarnings =
ignore::RuntimeWarning:autograd.*:
ignore::RuntimeWarning:numdifftools.*:

13
setup.py Normal file
View file

@@ -0,0 +1,13 @@
#!/usr/bin/env python
from setuptools import setup, find_packages
setup(name='pyerrors',
version='1.0.0',
description='Error analysis for lattice QCD',
author='Fabian Joswig',
author_email='fabian.joswig@wwu.de',
packages=find_packages(),
python_requires='>=3.5.0',
install_requires=['numpy>=1.16', 'autograd>=1.2', 'numdifftools', 'matplotlib', 'scipy', 'iminuit']
)

339
tests/test_pyerrors.py Normal file
View file

@@ -0,0 +1,339 @@
import sys
sys.path.append('..')
import autograd.numpy as np
import os
import random
import math
import string
import copy
import scipy.optimize
from scipy.odr import ODR, Model, Data, RealData
import pyerrors as pe
import pytest
test_iterations = 100
def test_dump():
value = np.random.normal(5, 10)
dvalue = np.abs(np.random.normal(0, 1))
test_obs = pe.pseudo_Obs(value, dvalue, 't')
test_obs.dump('test_dump')
new_obs = pe.load_object('test_dump.p')
os.remove('test_dump.p')
assert np.all(test_obs.deltas['t'] == new_obs.deltas['t'])
def test_comparison():
value1 = np.random.normal(0, 100)
test_obs1 = pe.pseudo_Obs(value1, 0.1, 't')
value2 = np.random.normal(0, 100)
test_obs2 = pe.pseudo_Obs(value2, 0.1, 't')
assert (value1 > value2) == (test_obs1 > test_obs2)
assert (value1 < value2) == (test_obs1 < test_obs2)
def test_man_grad():
a = pe.pseudo_Obs(17,2.9,'e1')
b = pe.pseudo_Obs(4,0.8,'e1')
fs = [lambda x: x[0] + x[1], lambda x: x[1] + x[0], lambda x: x[0] - x[1], lambda x: x[1] - x[0],
lambda x: x[0] * x[1], lambda x: x[1] * x[0], lambda x: x[0] / x[1], lambda x: x[1] / x[0],
lambda x: np.exp(x[0]), lambda x: np.sin(x[0]), lambda x: np.cos(x[0]), lambda x: np.tan(x[0]),
lambda x: np.log(x[0]), lambda x: np.sqrt(x[0]),
lambda x: np.sinh(x[0]), lambda x: np.cosh(x[0]), lambda x: np.tanh(x[0])]
for i, f in enumerate(fs):
t1 = f([a,b])
t2 = pe.derived_observable(f, [a,b])
c = t2 - t1
assert c.value == 0.0, str(i)
assert np.all(np.abs(c.deltas['e1']) < 1e-14), str(i)
def test_overloading_vectorization():
a = np.array([5, 4, 8])
b = pe.pseudo_Obs(4,0.8,'e1')
assert [o.value for o in a * b] == [o.value for o in b * a]
assert [o.value for o in a + b] == [o.value for o in b + a]
assert [o.value for o in a - b] == [-1 * o.value for o in b - a]
assert [o.value for o in a / b] == [o.value for o in [p / b for p in a]]
assert [o.value for o in b / a] == [o.value for o in [b / p for p in a]]
@pytest.mark.parametrize("n", np.arange(test_iterations // 10))
def test_covariance_is_variance(n):
value = np.random.normal(5, 10)
dvalue = np.abs(np.random.normal(0, 1))
test_obs = pe.pseudo_Obs(value, dvalue, 't')
test_obs.gamma_method()
assert np.abs(test_obs.dvalue ** 2 - pe.covariance(test_obs, test_obs)) <= 10 * np.finfo(np.float64).eps
test_obs = test_obs + pe.pseudo_Obs(value, dvalue, 'q', 200)
test_obs.gamma_method(e_tag=0)
assert np.abs(test_obs.dvalue ** 2 - pe.covariance(test_obs, test_obs)) <= 10 * np.finfo(np.float64).eps
@pytest.mark.parametrize("n", np.arange(test_iterations // 10))
def test_fft(n):
value = np.random.normal(5, 100)
dvalue = np.abs(np.random.normal(0, 5))
test_obs1 = pe.pseudo_Obs(value, dvalue, 't', int(500 + 1000 * np.random.rand()))
test_obs2 = copy.deepcopy(test_obs1)
test_obs1.gamma_method()
test_obs2.gamma_method(fft=False)
assert max(np.abs(test_obs1.e_rho[''] - test_obs2.e_rho[''])) <= 10 * np.finfo(np.float64).eps
assert np.abs(test_obs1.dvalue - test_obs2.dvalue) <= 10 * max(test_obs1.dvalue, test_obs2.dvalue) * np.finfo(np.float64).eps
@pytest.mark.parametrize('n', np.arange(test_iterations // 10))
def test_standard_fit(n):
dim = 10 + int(30 * np.random.rand())
x = np.arange(dim)
y = 2 * np.exp(-0.06 * x) + np.random.normal(0.0, 0.15, dim)
yerr = 0.1 + 0.1 * np.random.rand(dim)
oy = []
for i, item in enumerate(x):
oy.append(pe.pseudo_Obs(y[i], yerr[i], str(i)))
def f(x, a, b):
return a * np.exp(-b * x)
popt, pcov = scipy.optimize.curve_fit(f, x, y, sigma=[o.dvalue for o in oy], absolute_sigma=True)
def func(a, x):
y = a[0] * np.exp(-a[1] * x)
return y
beta = pe.fits.standard_fit(x, oy, func)
pe.Obs.e_tag_global = 5
for i in range(2):
beta[i].gamma_method(e_tag=5, S=1.0)
assert math.isclose(beta[i].value, popt[i], abs_tol=1e-5)
assert math.isclose(pcov[i, i], beta[i].dvalue ** 2, abs_tol=1e-3)
assert math.isclose(pe.covariance(beta[0], beta[1]), pcov[0, 1], abs_tol=1e-3)
pe.Obs.e_tag_global = 0
chi2_pyerrors = np.sum(((f(x, *[o.value for o in beta]) - y) / yerr) ** 2) / (len(x) - 2)
chi2_scipy = np.sum(((f(x, *popt) - y) / yerr) ** 2) / (len(x) - 2)
assert math.isclose(chi2_pyerrors, chi2_scipy, abs_tol=1e-10)
@pytest.mark.parametrize('n', np.arange(test_iterations // 10))
def test_odr_fit(n):
dim = 10 + int(30 * np.random.rand())
x = np.arange(dim) + np.random.normal(0.0, 0.15, dim)
xerr = 0.1 + 0.1 * np.random.rand(dim)
y = 2 * np.exp(-0.06 * x) + np.random.normal(0.0, 0.15, dim)
yerr = 0.1 + 0.1 * np.random.rand(dim)
ox = []
for i, item in enumerate(x):
ox.append(pe.pseudo_Obs(x[i], xerr[i], str(i)))
oy = []
for i, item in enumerate(x):
oy.append(pe.pseudo_Obs(y[i], yerr[i], str(i)))
def f(x, a, b):
return a * np.exp(-b * x)
def func(a, x):
y = a[0] * np.exp(-a[1] * x)
return y
data = RealData([o.value for o in ox], [o.value for o in oy], sx=[o.dvalue for o in ox], sy=[o.dvalue for o in oy])
model = Model(func)
odr = ODR(data, model, [0, 0], partol=np.finfo(np.float64).eps)
odr.set_job(fit_type=0, deriv=1)
output = odr.run()
beta = pe.fits.odr_fit(ox, oy, func)
pe.Obs.e_tag_global = 5
for i in range(2):
beta[i].gamma_method(e_tag=5, S=1.0)
assert math.isclose(beta[i].value, output.beta[i], rel_tol=1e-5)
assert math.isclose(output.cov_beta[i,i], beta[i].dvalue**2, rel_tol=2.5e-1), str(output.cov_beta[i,i]) + ' ' + str(beta[i].dvalue**2)
assert math.isclose(pe.covariance(beta[0], beta[1]), output.cov_beta[0,1], rel_tol=2.5e-1)
pe.Obs.e_tag_global = 0
@pytest.mark.parametrize('n', np.arange(test_iterations // 10))
def test_odr_derivatives(n):
x = []
y = []
x_err = 0.01
y_err = 0.01
for n in np.arange(1, 9, 2):
loc_xvalue = n + np.random.normal(0.0, x_err)
x.append(pe.pseudo_Obs(loc_xvalue, x_err, str(n)))
y.append(pe.pseudo_Obs((lambda x: x ** 2 - 1)(loc_xvalue) +
np.random.normal(0.0, y_err), y_err, str(n)))
def func(a, x):
return a[0] + a[1] * x ** 2
fit1 = pe.fits.odr_fit(x, y, func)
tfit = pe.fits.fit_general(x, y, func, base_step=0.1, step_ratio=1.1, num_steps=20)
assert np.abs(np.max(np.array(list(fit1[1].deltas.values()))
- np.array(list(tfit[1].deltas.values())))) < 10e-8
@pytest.mark.parametrize('n', np.arange(test_iterations))
def test_covariance_symmetry(n):
value1 = np.random.normal(5, 10)
dvalue1 = np.abs(np.random.normal(0, 1))
test_obs1 = pe.pseudo_Obs(value1, dvalue1, 't')
test_obs1.gamma_method()
value2 = np.random.normal(5, 10)
dvalue2 = np.abs(np.random.normal(0, 1))
test_obs2 = pe.pseudo_Obs(value2, dvalue2, 't')
test_obs2.gamma_method()
cov_ab = pe.covariance(test_obs1, test_obs2)
cov_ba = pe.covariance(test_obs2, test_obs1)
assert np.abs(cov_ab - cov_ba) <= 10 * np.finfo(np.float64).eps
assert np.abs(cov_ab) < test_obs1.dvalue * test_obs2.dvalue * (1 + 10 * np.finfo(np.float64).eps)
@pytest.mark.parametrize('n', np.arange(test_iterations))
def test_gamma_method(n):
# Construct pseudo Obs with random shape
value = np.random.normal(5, 10)
dvalue = np.abs(np.random.normal(0, 1))
test_obs = pe.pseudo_Obs(value, dvalue, 't', int(1000 * (1 + np.random.rand())))
# Test if the error is processed correctly
test_obs.gamma_method(e_tag=1)
assert np.abs(test_obs.value - value) < 1e-12
assert abs(test_obs.dvalue - dvalue) < 1e-10 * dvalue
@pytest.mark.parametrize('n', np.arange(test_iterations))
def test_overloading(n):
# Construct pseudo Obs with random shape
obs_list = []
for i in range(5):
value = np.abs(np.random.normal(5, 2)) + 2.0
dvalue = np.abs(np.random.normal(0, 0.1)) + 1e-5
obs_list.append(pe.pseudo_Obs(value, dvalue, 't', 2000))
# Test if the error is processed correctly
def f(x):
return x[0] * x[1] + np.sin(x[2]) * np.exp(x[3] / x[1] / x[0]) - np.sqrt(2) / np.cosh(x[4] / x[0])
o_obs = f(obs_list)
d_obs = pe.derived_observable(f, obs_list)
assert np.max(np.abs((o_obs.deltas['t'] - d_obs.deltas['t']) / o_obs.deltas['t'])) < 1e-7, str(obs_list)
assert np.abs((o_obs.value - d_obs.value) / o_obs.value) < 1e-10
@pytest.mark.parametrize('n', np.arange(test_iterations))
def test_derived_observables(n):
# Construct pseudo Obs with random shape
test_obs = pe.pseudo_Obs(2, 0.1 * (1 + np.random.rand()), 't', int(1000 * (1 + np.random.rand())))
# Check if autograd and numgrad give the same result
d_Obs_ad = pe.derived_observable(lambda x, **kwargs: x[0] * x[1] * np.sin(x[0] * x[1]), [test_obs, test_obs])
d_Obs_ad.gamma_method()
d_Obs_fd = pe.derived_observable(lambda x, **kwargs: x[0] * x[1] * np.sin(x[0] * x[1]), [test_obs, test_obs], num_grad=True)
d_Obs_fd.gamma_method()
assert d_Obs_ad.value == d_Obs_fd.value
assert np.abs(4.0 * np.sin(4.0) - d_Obs_ad.value) < 1000 * np.finfo(np.float64).eps * np.abs(d_Obs_ad.value)
assert np.abs(d_Obs_ad.dvalue - d_Obs_fd.dvalue) < 1000 * np.finfo(np.float64).eps * d_Obs_ad.dvalue
i_am_one = pe.derived_observable(lambda x, **kwargs: x[0] / x[1], [d_Obs_ad, d_Obs_ad])
i_am_one.gamma_method(e_tag=1)
assert i_am_one.value == 1.0
assert i_am_one.dvalue < 2 * np.finfo(np.float64).eps
assert i_am_one.e_dvalue['t'] <= 2 * np.finfo(np.float64).eps
assert i_am_one.e_ddvalue['t'] <= 2 * np.finfo(np.float64).eps
@pytest.mark.parametrize('n', np.arange(test_iterations // 10))
def test_multi_ens_system(n):
names = []
for i in range(100 + int(np.random.rand() * 50)):
tmp_string = ''
for _ in range(int(2 + np.random.rand() * 4)):
tmp_string += random.choice(string.ascii_uppercase + string.ascii_lowercase + string.digits)
names.append(tmp_string)
names = list(set(names))
samples = [np.random.rand(5)] * len(names)
new_obs = pe.Obs(samples, names)
for e_tag_length in range(1, 6):
new_obs.gamma_method(e_tag=e_tag_length)
e_names = sorted(set([n[:e_tag_length] for n in names]))
assert e_names == new_obs.e_names
assert sorted(x for y in sorted(new_obs.e_content.values()) for x in y) == sorted(new_obs.names)
@pytest.mark.parametrize('n', np.arange(test_iterations))
def test_overloaded_functions(n):
funcs = [np.exp, np.log, np.sin, np.cos, np.tan, np.sinh, np.cosh, np.arcsinh, np.arccosh]
deriv = [np.exp, lambda x: 1 / x, np.cos, lambda x: -np.sin(x), lambda x: 1 / np.cos(x) ** 2, np.cosh, np.sinh, lambda x: 1 / np.sqrt(x ** 2 + 1), lambda x: 1 / np.sqrt(x ** 2 - 1)]
val = 3 + 0.5 * np.random.rand()
dval = 0.3 + 0.4 * np.random.rand()
test_obs = pe.pseudo_Obs(val, dval, 't', int(1000 * (1 + np.random.rand())))
for i, item in enumerate(funcs):
ad_obs = item(test_obs)
fd_obs = pe.derived_observable(lambda x, **kwargs: item(x[0]), [test_obs], num_grad=True)
ad_obs.gamma_method(S=0.01, e_tag=1)
assert np.max((ad_obs.deltas['t'] - fd_obs.deltas['t']) / ad_obs.deltas['t']) < 1e-8, item.__name__
assert np.abs((ad_obs.value - item(val)) / ad_obs.value) < 1e-10, item.__name__
assert np.abs(ad_obs.dvalue - dval * np.abs(deriv[i](val))) < 1e-6, item.__name__
@pytest.mark.parametrize('n', np.arange(test_iterations // 10))
def test_matrix_functions(n):
dim = 3 + int(4 * np.random.rand())
print(dim)
matrix = []
for i in range(dim):
row = []
for j in range(dim):
row.append(pe.pseudo_Obs(np.random.rand(), 0.2 + 0.1 * np.random.rand(), 'e1'))
matrix.append(row)
matrix = np.array(matrix) @ np.identity(dim)
# Check inverse of matrix
inv = pe.linalg.mat_mat_op(np.linalg.inv, matrix)
check_inv = matrix @ inv
for (i, j), entry in np.ndenumerate(check_inv):
entry.gamma_method()
if i == j:
assert math.isclose(entry.value, 1.0, abs_tol=1e-9), 'value ' + str(i) + ',' + str(j) + ' ' + str(entry.value)
else:
assert math.isclose(entry.value, 0.0, abs_tol=1e-9), 'value ' + str(i) + ',' + str(j) + ' ' + str(entry.value)
assert math.isclose(entry.dvalue, 0.0, abs_tol=1e-9), 'dvalue ' + str(i) + ',' + str(j) + ' ' + str(entry.dvalue)
# Check Cholesky decomposition
sym = np.dot(matrix, matrix.T)
cholesky = pe.linalg.mat_mat_op(np.linalg.cholesky, sym)
check = cholesky @ cholesky.T
for (i, j), entry in np.ndenumerate(check):
diff = entry - sym[i, j]
diff.gamma_method()
assert math.isclose(diff.value, 0.0, abs_tol=1e-9), 'value ' + str(i) + ',' + str(j)
assert math.isclose(diff.dvalue, 0.0, abs_tol=1e-9), 'dvalue ' + str(i) + ',' + str(j)
# Check eigh
e, v = pe.linalg.eigh(sym)
for i in range(dim):
tmp = sym @ v[:, i] - v[:, i] * e[i]
for j in range(dim):
tmp[j].gamma_method()
assert math.isclose(tmp[j].value, 0.0, abs_tol=1e-9), 'value ' + str(i) + ',' + str(j)
assert math.isclose(tmp[j].dvalue, 0.0, abs_tol=1e-9), 'dvalue ' + str(i) + ',' + str(j)