Merge pull request #78 from s-kuberski/feature/docu

Feature/docu
2026-05-19 19:32:32 +02:00 · 2022-02-21 17:50:18 +00:00 · 2022-02-21 17:50:18 +00:00 · 03eac295b2
commit 03eac295b2
parent 9ba180e3c4 8e5938ebe3
2 changed files with 45 additions and 3 deletions
--- a/pyerrors/init.py
+++ b/pyerrors/init.py
@ -187,7 +187,7 @@ obs3.details()

 ```

-`Obs` objects defined on regular and irregular histories of the same ensemble can be computed with each other and the correct error propagation and estimation is automatically taken care of.
+`Obs` objects defined on regular and irregular histories of the same ensemble can be combined with each other and the correct error propagation and estimation is automatically taken care of.

 **Warning:** Irregular Monte Carlo chains can result in odd patterns in the autocorrelation functions.
 Make sure to check the autocorrelation time with e.g. `pyerrors.obs.Obs.plot_rho` or `pyerrors.obs.Obs.plot_tauint`.
@ -339,7 +339,49 @@ For the full API see `pyerrors.linalg`.

 # Export data

-The preferred exported file format within `pyerrors` is json.gz. The exact specifications of this format will be listed here soon.
+The preferred exported file format within `pyerrors` is json.gz. Files written to this format are valid JSON files that have been compressed using gzip. The structure of the content is inspired by the dobs format of the ALPHA collaboration. The aim of the format is to facilitate the storage of data in a self-contained way such that, even years after the creation of the file, it is possible to extract all necessary information:
+- What observables are stored? Possibly: How exactly are they defined.
+- How does each single ensemble or external quantity contribute to the error of the observable?
+- Who did write the file when and on which machine?
+
+This can be achieved by storing all information in one single file. The export routines of `pyerrors` are written such that as much information as possible is written automatically. The first entries of the file provide optional auxiliary information:
+- `program` is a string that indicates which program was used to write the file.
+- `version` is a string that specifies the version of the format.
+- `who` is a string that specifies the user name of the creator of the file.
+- `date` is a string and contains the creation date of the file.
+- `host` is a string and contains the hostname of the machine where the file has been written.
+- `description` contains information on the content of the file. This field is not filled automatically in `pyerrors`. The user is advised to provide as detailed information as possible in this field. Examples are: Input files of measurements or simulations, LaTeX formulae or references to publications to specify how the observables have been computed, details on the analysis strategy, ... This field may be any valid JSON type. Strings, arrays or objects (equivalent to dicts in python) are well suited to provide information.
+
+The only necessary entry of the file is the field 
+-`obsdata`, an array that contains the actual data.
+
+Each entry of the array belongs to a single structure of observables. Currently, these strucutres can be eiter of `Obs`, `list`, `numpy.ndarray`, `Corr`. All `Obs` inside a structure (with dimension > 0) have to be defined on the same set of configurations. Different structures, that are represented by entries of the array `obsdata`, are treated independently. Each entry of the array `obsdata` has the following required entries:
+- `type` is a string that specifies the type of the structure. This allows to parse the content to the correct form after reading the file. It is always possible to interpret the content as list of Obs.
+- `value` is an array that contains the mean values of the Obs inside the structure.
+The following entries are optional:
+- `layout` is a string that specifies the layout of multi-dimensional structures. Examples are "2, 2" for a 2x2 dimensional matrix or  "64, 4, 4" for a Corr with $T=64$ and 4x4 matrices on each time slices. "1" denotes a single Obs. Multi-dimensional structures are stored in row-major format (see below).
+- `tag` is any JSON type. It contains additional information concerning the structure. The `tag` of an `Obs` in `pyerrors` is written here.
+- `reweighted` is a Bool that may be used to specify, whether the `Obs` in the structure have been reweighted.
+- `data` is an array that contains the data from MC chains. We will define it below.
+- `cdata` is an array that contains the data from external quantities with an error (`Covobs` in `pyerrors`). We will define it below.
+
+The array `data` contains the data from MC chains. Each entry of the array corresponds to one ensemble and contains:
+- `id`, a string that contains the name of the ensemble
+- `replica`, an array that contains an entry per replica of the ensemble. 
+
+Each entry of `replica` contains
+`name`, a string that contains the name of the replica
+`deltas`, an array that contains the actual data. 
+
+Each entry in `deltas` corresponds to one configuration of the replica and has $1+N$ many entries. The first entry is an integer that specifies the configuration number that, together with ensemble and replica name, may be used to uniquely identify the configuration on which the data has been obtained. The following N entries specify the deltas, i.e., the deviation of the observable from the mean value on this configuration, of each `Obs` inside the structure. Multi-dimensional structures are stored in a row-major format. For primary observables, such as correlation functions, $value + delta_i$ matches the primary data obtained on the configuration.
+
+The array `cdata` contains information about the contribution of auxiliary observables, represented by `Covobs` in `pyerrors`, to the total error of the observables. Each entry of the array belongs to one auxiliary covariance matrix and contains:
+- `id`, a string that identifies the covariance matrix
+- `layout`, a string that defines the dimensions of the $M\times M$ covariance matrix (has to be "M, M" or "1").
+- `cov`, an array that contains the $M\times M$ many entries of the covariance matrix, stored in row-major format.
+- `grad`, an array that contains N entries, one for each `Obs` inside the structure. Each entry itself is an array, that contains the M gradients of the Nth observable  with respect to the quantity that corresponds to the Mth diagonal entry of the covariance matrix.
+
+A JSON schema that may be used to verify the correctness of a file with respect to the format definition is stored in ./examples/json_schema.json. The schema is a self-descriptive format definition and contains an exemplary file.

 ## Jackknife samples
 For comparison with other analysis workflows `pyerrors` can generate jackknife samples from an `Obs` object or import jackknife samples into an `Obs` object.
--- a/pyerrors/input/json.py
+++ b/pyerrors/input/json.py
@ -207,7 +207,7 @@ def create_json_string(ol, description='', indent=1):

    d = {}
    d['program'] = 'pyerrors %s' % (pyerrorsversion.__version__)
-    d['version'] = '0.2'
+    d['version'] = '1.0'
    d['who'] = getpass.getuser()
    d['date'] = datetime.datetime.now().astimezone().strftime('%Y-%m-%d %H:%M:%S %z')
    d['host'] = socket.gethostname() + ', ' + platform.platform()