Documentation of the JSON format

This commit is contained in:
Simon Kuberski 2022-02-21 18:33:06 +01:00
parent dce7d4bec8
commit 04533c2c2f

View file

@ -187,7 +187,7 @@ obs3.details()
```
`Obs` objects defined on regular and irregular histories of the same ensemble can be computed with each other and the correct error propagation and estimation is automatically taken care of.
`Obs` objects defined on regular and irregular histories of the same ensemble can be combined with each other and the correct error propagation and estimation is automatically taken care of.
**Warning:** Irregular Monte Carlo chains can result in odd patterns in the autocorrelation functions.
Make sure to check the autocorrelation time with e.g. `pyerrors.obs.Obs.plot_rho` or `pyerrors.obs.Obs.plot_tauint`.
@ -339,7 +339,47 @@ For the full API see `pyerrors.linalg`.
# Export data
The preferred exported file format within `pyerrors` is json.gz. The exact specifications of this formats will be listed here soon.
The preferred exported file format within `pyerrors` is json.gz. Files written to this format are valid JSON files that have been compressed using gzip. The structure of the content is inspired by the dobs format of the ALPHA collaboration. The aim of the format is to facilitate the storage of data in a self-contained way such that, even years after the creation of the file, it is possible to extract all necessary information:
- What observables are stored? Possibly: How exactly are they defined.
- How does each single ensemble or external quantity contribute to the error of the observable?
- Who did write the file when and on which machine?
This can be achieved by storing all information in on single file. The export routines of `pyerrors` are written such that as much information is written automatically. The first entries of the file provide optional auxiliary information:
- `program` is a string that indicates which program was used to write the file.
- `version` is a string that specifies the version of the format.
- `who' is a string that specifies the user name of the creator of the file.
- `date` is a string and contains the creation date of the file.
- `host` is a string and contains the hostname on which the file was written.
- `description` contains information on the content of the file. This field is not filled automatically. The user is advised to provide as detailed information as possible in this field. Examples are: Input files of measurements or simulations, LaTeX formulae or references to publications to specify how the observables have been computed, details on the analysis strategy, ... This field may be any valid JSON type. Strings, arrays or objects (equivalent to dicts in python) are well suited to provide information.
The only necessary entry of the file is the field
-`obsdata`, an array that contains the actual data.
Each entry of the array belongs to a single structure of observables. Currently, these strucutres can be eiter of `Obs`, `list`, `numpy.ndarray`, `Corr`. All `Obs` inside a structure (with dimension > 0) have to be defined on the same set of configurations. Different structures, that are represented by entries of the array `obsdata`, are treated independently. Each entry of this array has the following required entries:
- `type` is a string that specifies the type of the structure. This allows to parse the content to the correct form after reading the file. It is always possible to interpret the content as list of Obs.
- `value` is an array that contains the mean values of the Obs inside the structure.
The following entries are optional:
- `layout` is a string that specifies the layout of multi-dimensional structures. Examples are "2, 2" for a 2x2 dimensional matrix or "64, 4, 4" for a Corr with T=64 and 4x4 matrices at each time slices. "1" denotes a single Obs.
- `tag` is any JSON type. It contains additional information concerning the structure. The `tag` of an `Obs` in `pyerrors` is written here.
- `reweighted` is a Bool that may be used to specify, whether the `Obs` in the structure have been reweighted.
- `data` is an array that contains the data from MC chains. We will define it below.
- `cdata` is an array that contains the data from external quantities with an error (`Covobs` in `pyerrors`). We will define it below.
The array `data` contains the data from MC chains. Each entry of the array corresponds to one ensemble and contains:
- `id`, a string giving the name of the ensemble
- `replica`, an array that contains an entry per replica of the ensemble.
Each entry of `replica` contains
`name`, a string that contains the name of the replica
`deltas`, an array that contains the actual data.
Each entry in `deltas` corresponds to one configuration of the replica and has $1+N$ many entries. The first entry is an integer that specifies the configuration number that, together with ensemble and replica name, may be used to uniquely identify the configuration on which the data has been obtained. The following N entries specify the deltas, i.e., the deviation of the observable from the mean value on this configuration, of each `Obs` inside the structure. Multi-dimensional structures are stored in a row-major format.
The array `cdata` contains information about the contribution of auxiliary observables, represented by `Covobs` in `pyerrors`, to the total error of the observables. Each entry of the array belongs to one auxiliary covariance matrix and contains:
- `id`, a string that identifies the covariance matrix
- `layout`, a string that defines the dimensions of the $M\times M$ covariance matrix (has to be "M, M").
- `cov`, an array that contains the $M\times M$ many entries of the covariance matrix, stored in row-major format.
- `grad`, an array that contains N entries, one for each `Obs` inside the structure. Each entry is an array, that contains the M gradients of the Nth observable with respect to the values that correspond to the diagonal entries of the covariance matrix.
## Jackknife samples
For comparison with other analysis workflows `pyerrors` can generate jackknife samples from an `Obs` object or import jackknife samples into an `Obs` object.