pyerrors/examples/07_data_management.ipynb
2022-09-29 17:11:24 +01:00


Data management

In [1]:
import numpy as np
import pandas as pd
import pyerrors as pe

For the data management example, we reuse the data from the correlator example.

In [2]:
correlator_data = pe.input.json.load_json("./data/correlator_test")
my_correlator = pe.Corr(correlator_data)
my_correlator.gamma_method()
Data has been written using pyerrors 2.0.0.
Format version 0.1
Written by fjosw on 2022-01-06 11:11:19 +0100 on host XPS139305, Linux-5.11.0-44-generic-x86_64-with-glibc2.29

Description:  Test data for the correlator example
In [3]:
import autograd.numpy as anp
def func_exp(a, x):
    return a[1] * anp.exp(-a[0] * x)

In this example we perform uncorrelated fits of a single exponential function to the correlator, varying the fit range. Each fit result can be conveniently stored in a pandas DataFrame together with the corresponding metadata.

In [4]:
rows = []
for t_start in range(12, 17):
    for t_stop in range(30, 32):
        fr = my_correlator.fit(func_exp, [t_start, t_stop], silent=True)
        fr.gamma_method()
        row = {"t_start": t_start,
               "t_stop": t_stop,
               "datapoints": t_stop - t_start + 1,
               "chisquare_by_dof": fr.chisquare_by_dof,
               "mass": fr[0]}
        rows.append(row)
my_df = pd.DataFrame(rows)
In [5]:
my_df
Out[5]:
t_start t_stop datapoints chisquare_by_dof mass
0 12 30 19 0.057872 0.2218(12)
1 12 31 20 0.063951 0.2221(11)
2 13 30 18 0.051577 0.2215(12)
3 13 31 19 0.060901 0.2219(11)
4 14 30 17 0.052349 0.2213(13)
5 14 31 18 0.063640 0.2218(13)
6 15 30 16 0.056088 0.2213(16)
7 15 31 17 0.067552 0.2218(17)
8 16 30 15 0.059969 0.2214(21)
9 16 31 16 0.070874 0.2220(20)
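Once the results are in a DataFrame, standard pandas operations apply to the metadata columns. A minimal sketch of selecting the fit with the smallest chisquare per degree of freedom, using a toy table in which plain floats (hypothetical values) stand in for the pyerrors Obs objects in the mass column:

```python
import pandas as pd

# Toy stand-in for the fit-result table above; plain floats replace the
# pyerrors Obs objects in the mass column (hypothetical values).
my_df = pd.DataFrame({
    "t_start": [12, 12, 13],
    "t_stop": [30, 31, 30],
    "chisquare_by_dof": [0.057872, 0.063951, 0.051577],
    "mass": [0.2218, 0.2221, 0.2215],
})

# Row with the smallest chisquare per degree of freedom:
best = my_df.loc[my_df["chisquare_by_dof"].idxmin()]
print(best["t_start"], best["t_stop"])  # → 13 30
```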

The content of this pandas DataFrame can be inserted into a relational database, making use of the JSON serialization of pyerrors objects. In this example we use an SQLite database.

In [6]:
pe.input.pandas.to_sql(my_df, "mass_table", "my_db.sqlite", if_exists='fail')
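The if_exists argument follows the pandas DataFrame.to_sql convention. A minimal sketch of its behavior, using pandas' own to_sql on an in-memory SQLite database with toy columns (plain floats, not the pyerrors JSON serialization):

```python
import sqlite3
import pandas as pd

# Hypothetical miniature of the fit-result table; plain floats stand in
# for the serialized pyerrors objects.
df = pd.DataFrame({"t_start": [12, 13], "mass": [0.2218, 0.2215]})

con = sqlite3.connect(":memory:")  # stands in for my_db.sqlite
df.to_sql("mass_table", con, index=False, if_exists="fail")

# A second write with if_exists='fail' refuses to overwrite the table:
try:
    df.to_sql("mass_table", con, index=False, if_exists="fail")
except ValueError as e:
    print("refused:", e)

# if_exists='append' would add the rows to the existing table instead,
# and if_exists='replace' would drop and recreate it.
```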

At a later stage of the analysis, the content of the database can be reconstructed into a DataFrame via SQL queries. In this example we extract t_start, t_stop and the fitted mass for all fits that start at t_start > 14.

In [7]:
new_df = pe.input.pandas.read_sql("SELECT t_start, t_stop, mass FROM mass_table WHERE t_start > 14",
                                  "my_db.sqlite",
                                  auto_gamma=True)
In [8]:
new_df
Out[8]:
t_start t_stop mass
0 15 30 0.2213(16)
1 15 31 0.2218(17)
2 16 30 0.2214(21)
3 16 31 0.2220(20)

The storage of intermediate analysis results in relational databases allows for a convenient and scalable way of splitting a detailed analysis into multiple independent steps.
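Since the database is a plain SQLite file, every step of such a pipeline can also query it with any SQL client, independently of pandas. A toy sketch with the standard-library sqlite3 module, where an in-memory database stands in for my_db.sqlite and a hypothetical JSON string stands in for the serialized mass column:

```python
import sqlite3

# In-memory database standing in for my_db.sqlite; the mass column holds
# a serialized JSON string (hypothetical payload, not the actual
# pyerrors format).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mass_table (t_start INTEGER, t_stop INTEGER, mass TEXT)")
con.executemany("INSERT INTO mass_table VALUES (?, ?, ?)",
                [(14, 30, '{"obs": "..."}'),
                 (15, 30, '{"obs": "..."}'),
                 (16, 31, '{"obs": "..."}')])

# The same filter as in the pandas example above, via plain SQL:
rows = con.execute("SELECT t_start, t_stop FROM mass_table WHERE t_start > 14").fetchall()
print(rows)  # → [(15, 30), (16, 31)]
```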
