mirror of
https://igit.ific.uv.es/alramos/latticegpu.jl.git
synced 2025-05-14 11:13:42 +02:00
Merge branch 'master' into 'master'
Master See merge request alramos/latticegpu.jl!5
This commit is contained in:
commit
94f410b51f
23 changed files with 2674 additions and 804 deletions
|
@ -1,7 +1,7 @@
|
||||||
|
|
||||||
# Dirac operator
|
# Dirac operator
|
||||||
|
|
||||||
The module `Dirac` has the necessary stuctures and function
|
The module `Dirac` has the necessary structures and functions
|
||||||
to simulate non-dynamical 4-dimensional Wilson fermions.
|
to simulate non-dynamical 4-dimensional Wilson fermions.
|
||||||
|
|
||||||
There are two main data structures in this module, the structure [`DiracParam`](@ref)
|
There are two main data structures in this module, the structure [`DiracParam`](@ref)
|
||||||
|
@ -18,7 +18,7 @@ DiracWorkspace
|
||||||
|
|
||||||
The workspace stores four fermion fields, namely `.sr`, `.sp`, `.sAp` and `.st`, used
|
The workspace stores four fermion fields, namely `.sr`, `.sp`, `.sAp` and `.st`, used
|
||||||
for different purposes. If the representation is either `SU2fund` of `SU3fund`, an extra
|
for different purposes. If the representation is either `SU2fund` of `SU3fund`, an extra
|
||||||
field with values in `U2alg`/`U3alg` is created to store the clover, used for the improvent.
|
field with values in `U2alg`/`U3alg` is created to store the clover, used for the improvement.
|
||||||
|
|
||||||
## Functions
|
## Functions
|
||||||
|
|
||||||
|
@ -38,7 +38,7 @@ where $$m_0$$ and $$\theta$$ are respectively the values `.m0` and `.th` of [`Di
|
||||||
Note that $$|\theta(\mu)|=1$$ is not built into the code, so it should be imposed explicitly.
|
Note that $$|\theta(\mu)|=1$$ is not built into the code, so it should be imposed explicitly.
|
||||||
|
|
||||||
Additionally, if |`dpar.csw`| > 1.0E-10, the clover term is assumed to be stored in `ymws.csw`, which
|
Additionally, if |`dpar.csw`| > 1.0E-10, the clover term is assumed to be stored in `ymws.csw`, which
|
||||||
can be done via the [`Csw!`](@ref) function. In this case we have the Sheikholeslami–Wohlert (SW) term
|
can be done via the [`Csw!`](@ref) function. In this case we have the Sheikholeslami-Wohlert (SW) term
|
||||||
in `Dw!`:
|
in `Dw!`:
|
||||||
|
|
||||||
```math
|
```math
|
||||||
|
@ -53,7 +53,7 @@ improvement term
|
||||||
```math
|
```math
|
||||||
\delta D_w^{SF} = (c_t -1) (\delta_{x_4,a} \psi(\vec{x}) + \delta_{x_4,T-a} \psi(\vec{x}))
|
\delta D_w^{SF} = (c_t -1) (\delta_{x_4,a} \psi(\vec{x}) + \delta_{x_4,T-a} \psi(\vec{x}))
|
||||||
```
|
```
|
||||||
is added. Since the time-slice $$t=T$$ is not stored, this accounts to modifying the second
|
is added. Since the time-slice $$t=T$$ is not stored, this accounts for modifying the second
|
||||||
and last time-slice.
|
and last time-slice.
|
||||||
|
|
||||||
Note that the Dirac operator for SF boundary conditions assumes that the value of the field
|
Note that the Dirac operator for SF boundary conditions assumes that the value of the field
|
||||||
|
@ -63,11 +63,6 @@ in the first time-slice is zero. To enforce this, we have the function
|
||||||
SF_bndfix!
|
SF_bndfix!
|
||||||
```
|
```
|
||||||
|
|
||||||
Note that this is not enforced in the Dirac operators, so if the field `so` does not satisfy SF
|
|
||||||
boundary conditions, it will not (in general) satisfy them after applying [`Dw!`](@ref)
|
|
||||||
or [`g5Dw!`](@ref). This function is called for the function [`DwdagDw!`](@ref), so in this case
|
|
||||||
`so` will always be a proper SF field after calling this function.
|
|
||||||
|
|
||||||
The function [`Csw!`](@ref) is used to store the clover in `dws.csw`. It is computed
|
The function [`Csw!`](@ref) is used to store the clover in `dws.csw`. It is computed
|
||||||
according to the expression
|
according to the expression
|
||||||
|
|
||||||
|
@ -97,8 +92,8 @@ F[b,4,r] \to F_{31}(b,r) ,\quad F[b,5,r] \to F_{32}(b,r) ,\quad F[b,6,r] \to F_{
|
||||||
```
|
```
|
||||||
where $$(b,r)$$ labels the lattice points as explained in the module `Space`
|
where $$(b,r)$$ labels the lattice points as explained in the module `Space`
|
||||||
|
|
||||||
The function [`pfrandomize!`](@ref), userfull for stochastic sources, is also present. It
|
The function [`pfrandomize!`](@ref), userful for stochastic sources, is also present. It
|
||||||
randomizes a fermion field either in all the space or in a specifit time-slice.
|
randomizes a fermion field, either in all the space or in a specific time-slice.
|
||||||
|
|
||||||
The generic interface of these functions reads
|
The generic interface of these functions reads
|
||||||
|
|
||||||
|
|
|
@ -1,16 +1,18 @@
|
||||||
|
|
||||||
# Lattice fields
|
# Lattice fields
|
||||||
|
|
||||||
The module `Fields` include simple routines to define a few typical
|
The module `Fields` includes simple routines to define a few typical
|
||||||
fields. Fields are simple `CuArray` types with special memory
|
fields. Fields are simple `CuArray` types with special memory
|
||||||
layout. A field always has an associated elemental type (i.e. for
|
layout. A field always has an associated elemental type (i.e. for
|
||||||
gauge fields `SU3`, for scalar fields `Float64`). We have:
|
gauge fields `SU3`, for scalar fields `Float64`). We have:
|
||||||
- scalar fields: One elemental type in each spacetime point.
|
- Scalar fields: One elemental type in each spacetime point.
|
||||||
- vector field: One elemental type at each spacetime point and
|
- Vector field: One elemental type at each spacetime point and
|
||||||
direction.
|
direction.
|
||||||
- `N` scalar fields: `N` elemental types at each spacetime point.
|
- `N` scalar fields: `N` elemental types at each spacetime point.
|
||||||
|
- Tensor fields: One elemental type at each spacetime point and
|
||||||
|
plane. They are to be thought of as symmetric tensors.
|
||||||
|
|
||||||
Fields can have **naturaL indexing**, where the memory layout follows
|
Fields can have **natural indexing**, where the memory layout follows
|
||||||
the point-in-block and block indices (see
|
the point-in-block and block indices (see
|
||||||
[`SpaceParm`](@ref)). Fields can also have **lexicographic indexing**,
|
[`SpaceParm`](@ref)). Fields can also have **lexicographic indexing**,
|
||||||
where points are labelled by a D-dimensional index (see [`scalar_field_point`](@ref)).
|
where points are labelled by a D-dimensional index (see [`scalar_field_point`](@ref)).
|
||||||
|
@ -21,6 +23,7 @@ where points are labelled by a D-dimensional index (see [`scalar_field_point`](@
|
||||||
```@docs
|
```@docs
|
||||||
scalar_field
|
scalar_field
|
||||||
vector_field
|
vector_field
|
||||||
|
tensor_field
|
||||||
nscalar_field
|
nscalar_field
|
||||||
scalar_field_point
|
scalar_field_point
|
||||||
```
|
```
|
||||||
|
|
|
@ -1,7 +1,7 @@
|
||||||
|
|
||||||
# Groups and Algebras
|
# Groups and Algebras
|
||||||
|
|
||||||
The module `Groups` contain generic data types to deal with group and
|
The module `Groups` contains generic data types to deal with group and
|
||||||
algebra elements. Group elements $$g\in SU(N)$$ are represented in
|
algebra elements. Group elements $$g\in SU(N)$$ are represented in
|
||||||
some compact notation. For the case $$N=2$$ we use two complex numbers
|
some compact notation. For the case $$N=2$$ we use two complex numbers
|
||||||
(Caley-Dickson representation, i.e. $$g=(z_1,z_2)$$ with
|
(Caley-Dickson representation, i.e. $$g=(z_1,z_2)$$ with
|
||||||
|
@ -79,7 +79,7 @@ elements. The objective is to get an idea on how group operations
|
||||||
We can generate some random group elements.
|
We can generate some random group elements.
|
||||||
```@repl exs
|
```@repl exs
|
||||||
# Generate random groups elements,
|
# Generate random groups elements,
|
||||||
# check they are actually from the grup
|
# check they are actually from the group
|
||||||
g = rand(SU2{Float64})
|
g = rand(SU2{Float64})
|
||||||
println("Are we in a group?: ", isgroup(g))
|
println("Are we in a group?: ", isgroup(g))
|
||||||
g = rand(SU3{Float64})
|
g = rand(SU3{Float64})
|
||||||
|
|
|
@ -29,13 +29,15 @@ is given by $$|$$``dws.sr``$$|^2$$.
|
||||||
|
|
||||||
## Propagators.jl
|
## Propagators.jl
|
||||||
|
|
||||||
In this file, we define a couple of useful functions to obtain certain
|
In this file, we define some useful functions to obtain certain
|
||||||
propagators.
|
propagators.
|
||||||
|
|
||||||
```@docs
|
```@docs
|
||||||
propagator!
|
propagator!
|
||||||
```
|
```
|
||||||
|
|
||||||
|
Note that the indexing in Julia starts at 1, so the first tiime slice is t=1.
|
||||||
|
|
||||||
Internally, this function solves the equation
|
Internally, this function solves the equation
|
||||||
|
|
||||||
```math
|
```math
|
||||||
|
|
|
@ -6,7 +6,7 @@ which is a NS-tuple with values in G.
|
||||||
The functions `norm`, `norm2`, `dot`, `*`, `/`, `/`, `+`, `-`, `imm` and `mimm`,
|
The functions `norm`, `norm2`, `dot`, `*`, `/`, `/`, `+`, `-`, `imm` and `mimm`,
|
||||||
if defined for G, are extended to Spinor{NS,G} for general NS.
|
if defined for G, are extended to Spinor{NS,G} for general NS.
|
||||||
|
|
||||||
For the 4d case where NS = 4 there are some specific functions to implement different
|
For the 4D case, where NS = 4, there are some specific functions to implement different
|
||||||
operations with the gamma matrices. The convention for these matrices is
|
operations with the gamma matrices. The convention for these matrices is
|
||||||
|
|
||||||
|
|
||||||
|
@ -79,7 +79,6 @@ using LatticeGPU # hide
|
||||||
```
|
```
|
||||||
```@repl exs
|
```@repl exs
|
||||||
spin = Spinor{4,Complex{Float64}}((1.0,im*0.5,2.3,0.0))
|
spin = Spinor{4,Complex{Float64}}((1.0,im*0.5,2.3,0.0))
|
||||||
println(spin)
|
|
||||||
println(dmul(Gamma{4},spin))
|
println(dmul(Gamma{4},spin))
|
||||||
println(pmul(Pgamma{2,-1},spin))
|
println(pmul(Pgamma{2,-1},spin))
|
||||||
|
|
||||||
|
|
|
@ -105,500 +105,6 @@ struct DiracWorkspace{T}
|
||||||
|
|
||||||
end
|
end
|
||||||
|
|
||||||
export DiracWorkspace, DiracParam
|
|
||||||
|
|
||||||
|
|
||||||
"""
|
|
||||||
function Csw!(dws, U, gp, lp::SpaceParm)
|
|
||||||
|
|
||||||
Computes the clover and stores it in dws.csw.
|
|
||||||
|
|
||||||
"""
|
|
||||||
function Csw!(dws, U, gp, lp::SpaceParm{4,6,B,D}) where {B,D}
|
|
||||||
|
|
||||||
@timeit "Csw computation" begin
|
|
||||||
|
|
||||||
for i in 1:Int(lp.npls)
|
|
||||||
CUDA.@sync begin
|
|
||||||
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_csw!(dws.csw, U, gp.Ubnd, i, lp)
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
||||||
|
|
||||||
return nothing
|
|
||||||
end
|
|
||||||
|
|
||||||
function krnl_csw!(csw::AbstractArray{T}, U, Ubnd, ipl, lp::SpaceParm{4,M,B,D}) where {T,M,B,D}
|
|
||||||
|
|
||||||
@inbounds begin
|
|
||||||
b = Int64(CUDA.threadIdx().x)
|
|
||||||
r = Int64(CUDA.blockIdx().x)
|
|
||||||
I = point_coord((b,r), lp)
|
|
||||||
it = I[4]
|
|
||||||
|
|
||||||
id1, id2 = lp.plidx[ipl]
|
|
||||||
SFBC = ((B == BC_SF_AFWB) || (B == BC_SF_ORBI) ) && (id1 == 4)
|
|
||||||
|
|
||||||
bu1, ru1 = up((b, r), id1, lp)
|
|
||||||
bu2, ru2 = up((b, r), id2, lp)
|
|
||||||
bd1, rd1 = dw((b, r), id1, lp)
|
|
||||||
bd2, rd2 = dw((b, r), id2, lp)
|
|
||||||
bdd, rdd = dw((bd1, rd1), id2, lp)
|
|
||||||
bud, rud = dw((bu1, ru1), id2, lp)
|
|
||||||
bdu, rdu = up((bd1, rd1), id2, lp)
|
|
||||||
|
|
||||||
if SFBC && (it == lp.iL[end])
|
|
||||||
gt1 = Ubnd[id2]
|
|
||||||
gt2 = Ubnd[id2]
|
|
||||||
else
|
|
||||||
gt1 = U[bu1,id2,ru1]
|
|
||||||
gt2 = U[bud,id2,rud]
|
|
||||||
end
|
|
||||||
|
|
||||||
M1 = U[b,id1,r]*gt1/(U[b,id2,r]*U[bu2,id1,ru2])
|
|
||||||
M2 = (U[bd2,id2,rd2]\(U[bd2,id1,rd2]*gt2))/U[b,id1,r]
|
|
||||||
M3 = (U[bdd,id2,rdd]*U[bd1,id1,rd1])\(U[bdd,id1,rdd]*U[bd2,id2,rd2])
|
|
||||||
M4 = (U[b,id2,r]/(U[bd1,id2,rd1]*U[bdu,id1,rdu]))*U[bd1,id1,rd1]
|
|
||||||
|
|
||||||
|
|
||||||
if !(SFBC && (it == 1))
|
|
||||||
csw[b,ipl,r] = 0.125*(antsym(M1)+antsym(M2)+antsym(M3)+antsym(M4))
|
|
||||||
end
|
|
||||||
|
|
||||||
end
|
|
||||||
|
|
||||||
return nothing
|
|
||||||
end
|
|
||||||
|
|
||||||
|
|
||||||
"""
|
|
||||||
function Dw!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::SpaceParm{4,6,B,D})
|
|
||||||
|
|
||||||
Computes the Dirac operator (with the Wilson term) `\`\``D_w``\`\` with gauge field U and parameters `dpar` of the field `si` and stores it in `so`.
|
|
||||||
If `dpar.csw` is different from zero, the clover term should be stored in `dws.csw` via the Csw! function and is automatically included in the operator.
|
|
||||||
|
|
||||||
"""
|
|
||||||
function Dw!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::SpaceParm{4,6,B,D}) where {B,D}
|
|
||||||
|
|
||||||
if abs(dpar.csw) > 1.0E-10
|
|
||||||
@timeit "Dw" begin
|
|
||||||
CUDA.@sync begin
|
|
||||||
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_Dwimpr!(so, U, si, dws.csw, dpar.m0, dpar.tm, dpar.th, dpar.csw, lp)
|
|
||||||
end
|
|
||||||
end
|
|
||||||
else
|
|
||||||
@timeit "Dw" begin
|
|
||||||
CUDA.@sync begin
|
|
||||||
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_Dw!(so, U, si, dpar.m0, dpar.tm, dpar.th, lp)
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
||||||
|
|
||||||
return nothing
|
|
||||||
end
|
|
||||||
|
|
||||||
function krnl_Dwimpr!(so, U, si, Fcsw, m0, tm, th, csw, lp::SpaceParm{4,6,B,D}) where {B,D}
|
|
||||||
|
|
||||||
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
|
|
||||||
|
|
||||||
bu1, ru1 = up((b,r), 1, lp)
|
|
||||||
bd1, rd1 = dw((b,r), 1, lp)
|
|
||||||
bu2, ru2 = up((b,r), 2, lp)
|
|
||||||
bd2, rd2 = dw((b,r), 2, lp)
|
|
||||||
bu3, ru3 = up((b,r), 3, lp)
|
|
||||||
bd3, rd3 = dw((b,r), 3, lp)
|
|
||||||
bu4, ru4 = up((b,r), 4, lp)
|
|
||||||
bd4, rd4 = dw((b,r), 4, lp)
|
|
||||||
|
|
||||||
@inbounds begin
|
|
||||||
|
|
||||||
so[b,r] = (4+m0)*si[b,r]+ im*tm*dmul(Gamma{5},si[b,r]) + 0.5*csw*im*( Fcsw[b,1,r]*dmul(Gamma{10},si[b,r]) + Fcsw[b,2,r]*dmul(Gamma{11},si[b,r]) + Fcsw[b,3,r]*dmul(Gamma{12},si[b,r])
|
|
||||||
+Fcsw[b,4,r]*dmul(Gamma{15},si[b,r]) + Fcsw[b,5,r]*dmul(Gamma{14},si[b,r]) + Fcsw[b,6,r]*dmul(Gamma{13},si[b,r]))
|
|
||||||
|
|
||||||
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
|
|
||||||
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
|
|
||||||
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
|
|
||||||
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
|
|
||||||
|
|
||||||
end
|
|
||||||
|
|
||||||
return nothing
|
|
||||||
end
|
|
||||||
|
|
||||||
function krnl_Dw!(so, U, si, m0, tm, th, lp::SpaceParm{4,6,B,D}) where {B,D}
|
|
||||||
|
|
||||||
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
|
|
||||||
|
|
||||||
bu1, ru1 = up((b,r), 1, lp)
|
|
||||||
bd1, rd1 = dw((b,r), 1, lp)
|
|
||||||
bu2, ru2 = up((b,r), 2, lp)
|
|
||||||
bd2, rd2 = dw((b,r), 2, lp)
|
|
||||||
bu3, ru3 = up((b,r), 3, lp)
|
|
||||||
bd3, rd3 = dw((b,r), 3, lp)
|
|
||||||
bu4, ru4 = up((b,r), 4, lp)
|
|
||||||
bd4, rd4 = dw((b,r), 4, lp)
|
|
||||||
|
|
||||||
@inbounds begin
|
|
||||||
|
|
||||||
so[b,r] = (4+m0)*si[b,r] + im*tm*dmul(Gamma{5},si[b,r])
|
|
||||||
|
|
||||||
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
|
|
||||||
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
|
|
||||||
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
|
|
||||||
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
|
|
||||||
|
|
||||||
end
|
|
||||||
|
|
||||||
return nothing
|
|
||||||
end
|
|
||||||
|
|
||||||
function Dw!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D}}) where {D}
|
|
||||||
|
|
||||||
if abs(dpar.csw) > 1.0E-10
|
|
||||||
@timeit "Dw" begin
|
|
||||||
CUDA.@sync begin
|
|
||||||
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_Dwimpr!(so, U, si, dws.csw, dpar.m0, dpar.tm, dpar.th, dpar.csw, dpar.ct, lp)
|
|
||||||
end
|
|
||||||
end
|
|
||||||
else
|
|
||||||
@timeit "Dw" begin
|
|
||||||
CUDA.@sync begin
|
|
||||||
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_Dw!(so, U, si, dpar.m0, dpar.tm, dpar.th, dpar.ct, lp)
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
||||||
|
|
||||||
return nothing
|
|
||||||
end
|
|
||||||
|
|
||||||
function krnl_Dwimpr!(so, U, si, Fcsw, m0, tm, th, csw, ct, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D}}) where {D}
|
|
||||||
|
|
||||||
# The field si is assumed to be zero at t = 0
|
|
||||||
|
|
||||||
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
|
|
||||||
|
|
||||||
if (point_time((b,r),lp) != 1)
|
|
||||||
|
|
||||||
bu1, ru1 = up((b,r), 1, lp)
|
|
||||||
bd1, rd1 = dw((b,r), 1, lp)
|
|
||||||
bu2, ru2 = up((b,r), 2, lp)
|
|
||||||
bd2, rd2 = dw((b,r), 2, lp)
|
|
||||||
bu3, ru3 = up((b,r), 3, lp)
|
|
||||||
bd3, rd3 = dw((b,r), 3, lp)
|
|
||||||
bu4, ru4 = up((b,r), 4, lp)
|
|
||||||
bd4, rd4 = dw((b,r), 4, lp)
|
|
||||||
|
|
||||||
@inbounds begin
|
|
||||||
|
|
||||||
so[b,r] = (4+m0)*si[b,r] + im*tm*dmul(Gamma{5},si[b,r]) + 0.5*csw*im*( Fcsw[b,1,r]*dmul(Gamma{10},si[b,r]) + Fcsw[b,2,r]*dmul(Gamma{11},si[b,r]) + Fcsw[b,3,r]*dmul(Gamma{12},si[b,r])
|
|
||||||
+Fcsw[b,4,r]*dmul(Gamma{15},si[b,r]) + Fcsw[b,5,r]*dmul(Gamma{14},si[b,r]) + Fcsw[b,6,r]*dmul(Gamma{13},si[b,r]))
|
|
||||||
|
|
||||||
|
|
||||||
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
|
|
||||||
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
|
|
||||||
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
|
|
||||||
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
|
|
||||||
|
|
||||||
if (point_time((b,r),lp) == 2) || (point_time((b,r),lp) == lp.iL[4])
|
|
||||||
so[b,r] += (ct-1.0)*si[b,r]
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
||||||
|
|
||||||
return nothing
|
|
||||||
end
|
|
||||||
|
|
||||||
function krnl_Dw!(so, U, si, m0, tm, th, ct, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D}}) where {D}
|
|
||||||
|
|
||||||
# The field si is assumed to be zero at t = 0
|
|
||||||
|
|
||||||
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
|
|
||||||
|
|
||||||
if (point_time((b,r),lp) != 1)
|
|
||||||
|
|
||||||
bu1, ru1 = up((b,r), 1, lp)
|
|
||||||
bd1, rd1 = dw((b,r), 1, lp)
|
|
||||||
bu2, ru2 = up((b,r), 2, lp)
|
|
||||||
bd2, rd2 = dw((b,r), 2, lp)
|
|
||||||
bu3, ru3 = up((b,r), 3, lp)
|
|
||||||
bd3, rd3 = dw((b,r), 3, lp)
|
|
||||||
bu4, ru4 = up((b,r), 4, lp)
|
|
||||||
bd4, rd4 = dw((b,r), 4, lp)
|
|
||||||
|
|
||||||
@inbounds begin
|
|
||||||
|
|
||||||
so[b,r] = (4+m0)*si[b,r] + im*tm*dmul(Gamma{5},si[b,r])
|
|
||||||
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
|
|
||||||
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
|
|
||||||
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
|
|
||||||
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
|
|
||||||
|
|
||||||
if (point_time((b,r),lp) == 2) || (point_time((b,r),lp) == lp.iL[4])
|
|
||||||
so[b,r] += (ct-1.0)*si[b,r]
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
||||||
|
|
||||||
return nothing
|
|
||||||
end
|
|
||||||
|
|
||||||
"""
|
|
||||||
function g5Dw!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::SpaceParm{4,6,B,D})
|
|
||||||
|
|
||||||
Computes \`\` \\gamma_5 \`\` times the Dirac operator (with the Wilson term) with gauge field U and parameters `dpar` of the field `si` and stores it in `so`.
|
|
||||||
If `dpar.csw` is different from zero, the clover term should be stored in `dws.csw` via the Csw! function and is automatically included in the operator.
|
|
||||||
"""
|
|
||||||
function g5Dw!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::SpaceParm{4,6,B,D}) where {B,D}
|
|
||||||
|
|
||||||
if abs(dpar.csw) > 1.0E-10
|
|
||||||
@timeit "g5Dw" begin
|
|
||||||
CUDA.@sync begin
|
|
||||||
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dwimpr!(so, U, si, dws.csw, dpar.m0, dpar.tm, dpar.th, dpar.csw, lp)
|
|
||||||
end
|
|
||||||
end
|
|
||||||
else
|
|
||||||
@timeit "g5Dw" begin
|
|
||||||
CUDA.@sync begin
|
|
||||||
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dw!(so, U, si, dpar.m0, dpar.tm, dpar.th, lp)
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
||||||
|
|
||||||
return nothing
|
|
||||||
end
|
|
||||||
|
|
||||||
function krnl_g5Dwimpr!(so, U, si, Fcsw, m0, tm, th, csw, lp::SpaceParm{4,6,B,D}) where {B,D}
|
|
||||||
|
|
||||||
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
|
|
||||||
|
|
||||||
bu1, ru1 = up((b,r), 1, lp)
|
|
||||||
bd1, rd1 = dw((b,r), 1, lp)
|
|
||||||
bu2, ru2 = up((b,r), 2, lp)
|
|
||||||
bd2, rd2 = dw((b,r), 2, lp)
|
|
||||||
bu3, ru3 = up((b,r), 3, lp)
|
|
||||||
bd3, rd3 = dw((b,r), 3, lp)
|
|
||||||
bu4, ru4 = up((b,r), 4, lp)
|
|
||||||
bd4, rd4 = dw((b,r), 4, lp)
|
|
||||||
|
|
||||||
@inbounds begin
|
|
||||||
|
|
||||||
so[b,r] = (4+m0)*si[b,r] + 0.5*csw*im*( Fcsw[b,1,r]*dmul(Gamma{10},si[b,r]) + Fcsw[b,2,r]*dmul(Gamma{11},si[b,r]) + Fcsw[b,3,r]*dmul(Gamma{12},si[b,r])
|
|
||||||
+Fcsw[b,4,r]*dmul(Gamma{15},si[b,r]) + Fcsw[b,5,r]*dmul(Gamma{14},si[b,r]) + Fcsw[b,6,r]*dmul(Gamma{13},si[b,r]))
|
|
||||||
|
|
||||||
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
|
|
||||||
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
|
|
||||||
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
|
|
||||||
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
|
|
||||||
|
|
||||||
so[b,r] = dmul(Gamma{5}, so[b,r])+ im*tm*si[b,r]
|
|
||||||
end
|
|
||||||
|
|
||||||
return nothing
|
|
||||||
end
|
|
||||||
|
|
||||||
function krnl_g5Dw!(so, U, si, m0, tm, th, lp::SpaceParm{4,6,B,D}) where {B,D}
|
|
||||||
|
|
||||||
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
|
|
||||||
|
|
||||||
bu1, ru1 = up((b,r), 1, lp)
|
|
||||||
bd1, rd1 = dw((b,r), 1, lp)
|
|
||||||
bu2, ru2 = up((b,r), 2, lp)
|
|
||||||
bd2, rd2 = dw((b,r), 2, lp)
|
|
||||||
bu3, ru3 = up((b,r), 3, lp)
|
|
||||||
bd3, rd3 = dw((b,r), 3, lp)
|
|
||||||
bu4, ru4 = up((b,r), 4, lp)
|
|
||||||
bd4, rd4 = dw((b,r), 4, lp)
|
|
||||||
|
|
||||||
@inbounds begin
|
|
||||||
|
|
||||||
so[b,r] = (4+m0)*si[b,r]
|
|
||||||
|
|
||||||
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
|
|
||||||
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
|
|
||||||
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
|
|
||||||
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
|
|
||||||
|
|
||||||
so[b,r] = dmul(Gamma{5}, so[b,r]) + im*tm*si[b,r]
|
|
||||||
end
|
|
||||||
|
|
||||||
return nothing
|
|
||||||
end
|
|
||||||
|
|
||||||
function g5Dw!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D}}) where {D}
|
|
||||||
|
|
||||||
if abs(dpar.csw) > 1.0E-10
|
|
||||||
@timeit "g5Dw" begin
|
|
||||||
CUDA.@sync begin
|
|
||||||
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dwimpr!(so, U, si, dws.csw, dpar.m0, dpar.tm, dpar.th, dpar.csw, dpar.ct, lp)
|
|
||||||
end
|
|
||||||
end
|
|
||||||
else
|
|
||||||
@timeit "g5Dw" begin
|
|
||||||
CUDA.@sync begin
|
|
||||||
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dw!(so, U, si, dpar.m0, dpar.tm, dpar.th, dpar.ct, lp)
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
||||||
|
|
||||||
return nothing
|
|
||||||
end
|
|
||||||
|
|
||||||
function krnl_g5Dwimpr!(so, U, si, Fcsw, m0, tm, th, csw, ct, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D}}) where {D}
|
|
||||||
|
|
||||||
# The field si is assumed to be zero at t = 0
|
|
||||||
|
|
||||||
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
|
|
||||||
|
|
||||||
if (point_time((b,r),lp) != 1)
|
|
||||||
|
|
||||||
bu1, ru1 = up((b,r), 1, lp)
|
|
||||||
bd1, rd1 = dw((b,r), 1, lp)
|
|
||||||
bu2, ru2 = up((b,r), 2, lp)
|
|
||||||
bd2, rd2 = dw((b,r), 2, lp)
|
|
||||||
bu3, ru3 = up((b,r), 3, lp)
|
|
||||||
bd3, rd3 = dw((b,r), 3, lp)
|
|
||||||
bu4, ru4 = up((b,r), 4, lp)
|
|
||||||
bd4, rd4 = dw((b,r), 4, lp)
|
|
||||||
|
|
||||||
@inbounds begin
|
|
||||||
|
|
||||||
so[b,r] = (4+m0)*si[b,r] + 0.5*csw*im*( Fcsw[b,1,r]*dmul(Gamma{10},si[b,r]) + Fcsw[b,2,r]*dmul(Gamma{11},si[b,r]) + Fcsw[b,3,r]*dmul(Gamma{12},si[b,r])
|
|
||||||
+Fcsw[b,4,r]*dmul(Gamma{15},si[b,r]) + Fcsw[b,5,r]*dmul(Gamma{14},si[b,r]) + Fcsw[b,6,r]*dmul(Gamma{13},si[b,r]))
|
|
||||||
|
|
||||||
|
|
||||||
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
|
|
||||||
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
|
|
||||||
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
|
|
||||||
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
|
|
||||||
|
|
||||||
if (point_time((b,r),lp) == 2) || (point_time((b,r),lp) == lp.iL[4])
|
|
||||||
so[b,r] += (ct-1.0)*si[b,r]
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
||||||
|
|
||||||
so[b,r] = dmul(Gamma{5}, so[b,r])+ im*tm*si[b,r]
|
|
||||||
|
|
||||||
return nothing
|
|
||||||
end
|
|
||||||
|
|
||||||
function krnl_g5Dw!(so, U, si, m0, tm, th, ct, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D}}) where {D}
|
|
||||||
|
|
||||||
# The field si is assumed to be zero at t = 0
|
|
||||||
|
|
||||||
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
|
|
||||||
|
|
||||||
if (point_time((b,r),lp) != 1)
|
|
||||||
|
|
||||||
bu1, ru1 = up((b,r), 1, lp)
|
|
||||||
bd1, rd1 = dw((b,r), 1, lp)
|
|
||||||
bu2, ru2 = up((b,r), 2, lp)
|
|
||||||
bd2, rd2 = dw((b,r), 2, lp)
|
|
||||||
bu3, ru3 = up((b,r), 3, lp)
|
|
||||||
bd3, rd3 = dw((b,r), 3, lp)
|
|
||||||
bu4, ru4 = up((b,r), 4, lp)
|
|
||||||
bd4, rd4 = dw((b,r), 4, lp)
|
|
||||||
|
|
||||||
@inbounds begin
|
|
||||||
|
|
||||||
so[b,r] = (4+m0)*si[b,r]
|
|
||||||
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
|
|
||||||
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
|
|
||||||
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
|
|
||||||
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
|
|
||||||
|
|
||||||
if (point_time((b,r),lp) == 2) || (point_time((b,r),lp) == lp.iL[4])
|
|
||||||
so[b,r] += (ct-1.0)*si[b,r]
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
||||||
|
|
||||||
so[b,r] = dmul(Gamma{5}, so[b,r]) + im*tm*si[b,r]
|
|
||||||
|
|
||||||
return nothing
|
|
||||||
end
|
|
||||||
|
|
||||||
"""
|
|
||||||
function DwdagDw!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::SpaceParm{4,6,B,D})
|
|
||||||
|
|
||||||
Applies the operator \`\` \\gamma_5 D_w \`\` twice to `si` and stores the result in `so`. This is equivalent to appling the operator \`\` D_w^\\dagger D_w \`\`
|
|
||||||
The Dirac operator is the same as in the functions `Dw!` and `g5Dw!`
|
|
||||||
"""
|
|
||||||
function DwdagDw!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D}}) where {D}
|
|
||||||
|
|
||||||
if abs(dpar.csw) > 1.0E-10
|
|
||||||
@timeit "DwdagDw" begin
|
|
||||||
|
|
||||||
@timeit "g5Dw" begin
|
|
||||||
CUDA.@sync begin
|
|
||||||
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dwimpr!(dws.st, U, si, dws.csw, dpar.m0, dpar.tm, dpar.th, dpar.csw, dpar.ct, lp)
|
|
||||||
end
|
|
||||||
end
|
|
||||||
SF_bndfix!(dws.st,lp)
|
|
||||||
@timeit "g5Dw" begin
|
|
||||||
CUDA.@sync begin
|
|
||||||
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dwimpr!(so, U, dws.st, dws.csw, dpar.m0, -dpar.tm, dpar.th, dpar.csw, dpar.ct, lp)
|
|
||||||
end
|
|
||||||
end
|
|
||||||
SF_bndfix!(so,lp)
|
|
||||||
end
|
|
||||||
else
|
|
||||||
@timeit "DwdagDw" begin
|
|
||||||
|
|
||||||
@timeit "g5Dw" begin
|
|
||||||
CUDA.@sync begin
|
|
||||||
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dw!(dws.st, U, si, dpar.m0, dpar.tm, dpar.th, dpar.ct, lp)
|
|
||||||
end
|
|
||||||
end
|
|
||||||
SF_bndfix!(dws.st,lp)
|
|
||||||
@timeit "g5Dw" begin
|
|
||||||
CUDA.@sync begin
|
|
||||||
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dw!(so, U, dws.st, dpar.m0, -dpar.tm, dpar.th, dpar.ct, lp)
|
|
||||||
end
|
|
||||||
end
|
|
||||||
SF_bndfix!(so,lp)
|
|
||||||
end
|
|
||||||
end
|
|
||||||
|
|
||||||
return nothing
|
|
||||||
end
|
|
||||||
|
|
||||||
function DwdagDw!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::SpaceParm{4,6,B,D}) where {B,D}
|
|
||||||
|
|
||||||
if abs(dpar.csw) > 1.0E-10
|
|
||||||
@timeit "DwdagDw" begin
|
|
||||||
|
|
||||||
@timeit "g5Dw" begin
|
|
||||||
CUDA.@sync begin
|
|
||||||
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dwimpr!(dws.st, U, si, dws.csw, dpar.m0, dpar.tm, dpar.th, dpar.csw, lp)
|
|
||||||
end
|
|
||||||
end
|
|
||||||
|
|
||||||
@timeit "g5Dw" begin
|
|
||||||
CUDA.@sync begin
|
|
||||||
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dwimpr!(so, U, dws.st, dws.csw, dpar.m0, -dpar.tm, dpar.th, dpar.csw, lp)
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
||||||
else
|
|
||||||
@timeit "DwdagDw" begin
|
|
||||||
|
|
||||||
@timeit "g5Dw" begin
|
|
||||||
CUDA.@sync begin
|
|
||||||
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dw!(dws.st, U, si, dpar.m0, dpar.tm, dpar.th, lp)
|
|
||||||
end
|
|
||||||
end
|
|
||||||
|
|
||||||
@timeit "g5Dw" begin
|
|
||||||
CUDA.@sync begin
|
|
||||||
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dw!(so, U, dws.st, dpar.m0, -dpar.tm, dpar.th, lp)
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
||||||
|
|
||||||
return nothing
|
|
||||||
end
|
|
||||||
|
|
||||||
"""
|
"""
|
||||||
function mtwmdpar(dpar::DiracParam)
|
function mtwmdpar(dpar::DiracParam)
|
||||||
|
@ -610,105 +116,19 @@ function mtwmdpar(dpar::DiracParam{P,R}) where {P,R}
|
||||||
end
|
end
|
||||||
|
|
||||||
|
|
||||||
"""
|
export DiracWorkspace, DiracParam, mtwmdpar
|
||||||
SF_bndfix!(sp, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D}})
|
|
||||||
|
|
||||||
Sets all the values of `sp` in the first time slice to zero.
|
include("Diracfields.jl")
|
||||||
"""
|
export SF_bndfix!, Csw!, pfrandomize!
|
||||||
function SF_bndfix!(sp, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D}}) where {D}
|
|
||||||
@timeit "SF boundary fix" begin
|
|
||||||
CUDA.@sync begin
|
|
||||||
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_sfbndfix!(sp, lp)
|
|
||||||
end
|
|
||||||
end
|
|
||||||
return nothing
|
|
||||||
end
|
|
||||||
|
|
||||||
function krnl_sfbndfix!(sp,lp::SpaceParm)
|
|
||||||
b=Int64(CUDA.threadIdx().x)
|
|
||||||
r=Int64(CUDA.blockIdx().x)
|
|
||||||
|
|
||||||
if (point_time((b,r),lp) == 1)
|
|
||||||
sp[b,r] = 0.0*sp[b,r]
|
|
||||||
end
|
|
||||||
return nothing
|
|
||||||
end
|
|
||||||
|
|
||||||
|
|
||||||
"""
|
|
||||||
function pfrandomize!(f::AbstractArray{Spinor{4, SU3fund / SU2fund {T}}}, lp::SpaceParm, t::Int64 = 0)
|
|
||||||
|
|
||||||
Randomizes the SU2fund / SU3fund fermion field. If the argument t is present, it only randomizes that time-slice.
|
|
||||||
"""
|
|
||||||
function pfrandomize!(f::AbstractArray{Spinor{4, SU3fund{T}}}, lp::SpaceParm, t::Int64 = 0) where {T}
|
|
||||||
|
|
||||||
@timeit "Randomize pseudofermion field" begin
|
|
||||||
p = ntuple(i->CUDA.randn(T, lp.bsz, 3, lp.rsz,2),4) # complex generation not suported for Julia 1.5.4
|
|
||||||
CUDA.@sync begin
|
|
||||||
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_assign_pf_su3!(f,p,lp,t)
|
|
||||||
end
|
|
||||||
end
|
|
||||||
|
|
||||||
return nothing
|
|
||||||
end
|
|
||||||
|
|
||||||
function krnl_assign_pf_su3!(f::AbstractArray, p , lp::SpaceParm, t::Int64)
|
|
||||||
|
|
||||||
@inbounds begin
|
|
||||||
b = Int64(CUDA.threadIdx().x)
|
|
||||||
r = Int64(CUDA.blockIdx().x)
|
|
||||||
|
|
||||||
if t == 0
|
|
||||||
f[b,r] = Spinor(map(x->SU3fund(x[b,1,r,1] + im* x[b,1,r,2],
|
|
||||||
x[b,2,r,1] + im* x[b,2,r,2],
|
|
||||||
x[b,3,r,1] + im* x[b,3,r,2]),p))
|
|
||||||
elseif point_time((b,r),lp) == t
|
|
||||||
f[b,r] = Spinor(map(x->SU3fund(x[b,1,r,1] + im* x[b,1,r,2],
|
|
||||||
x[b,2,r,1] + im* x[b,2,r,2],
|
|
||||||
x[b,3,r,1] + im* x[b,3,r,2]),p))
|
|
||||||
end
|
|
||||||
|
|
||||||
end
|
|
||||||
|
|
||||||
return nothing
|
|
||||||
end
|
|
||||||
|
|
||||||
function pfrandomize!(f::AbstractArray{Spinor{4, SU2fund{T}}},lp::SpaceParm, t::Int64=0) where {T}
|
|
||||||
|
|
||||||
@timeit "Randomize pseudofermion field" begin
|
|
||||||
p = ntuple(i->CUDA.randn(T, lp.bsz, 2, lp.rsz,2),4) # complex generation not suported for Julia 1.5.4
|
|
||||||
CUDA.@sync begin
|
|
||||||
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_assign_pf_su2!(f,p,lp,t)
|
|
||||||
end
|
|
||||||
end
|
|
||||||
|
|
||||||
return nothing
|
|
||||||
end
|
|
||||||
|
|
||||||
function krnl_assign_pf_su2!(f::AbstractArray, p , lp::SpaceParm, t::Int64)
|
|
||||||
|
|
||||||
@inbounds begin
|
|
||||||
b = Int64(CUDA.threadIdx().x)
|
|
||||||
r = Int64(CUDA.blockIdx().x)
|
|
||||||
|
|
||||||
if t == 0
|
|
||||||
f[b,r] = Spinor(map(x->SU2fund(x[b,1,r,1] + im* x[b,1,r,2],
|
|
||||||
x[b,2,r,1] + im* x[b,2,r,2]),p))
|
|
||||||
elseif point_time((b,r),lp) == t
|
|
||||||
f[b,r] = Spinor(map(x->SU2fund(x[b,1,r,1] + im* x[b,1,r,2],
|
|
||||||
x[b,2,r,1] + im* x[b,2,r,2]),p))
|
|
||||||
end
|
|
||||||
|
|
||||||
end
|
|
||||||
|
|
||||||
return nothing
|
|
||||||
end
|
|
||||||
|
|
||||||
export Dw!, g5Dw!, DwdagDw!, SF_bndfix!, Csw!, pfrandomize!, mtwmdpar
|
|
||||||
|
|
||||||
|
include("Diracoper.jl")
|
||||||
|
export Dw!, g5Dw!, DwdagDw!
|
||||||
|
|
||||||
include("DiracIO.jl")
|
include("DiracIO.jl")
|
||||||
export read_prop, save_prop, read_dpar
|
export read_prop, save_prop, read_dpar
|
||||||
|
|
||||||
|
include("Diracflow.jl")
|
||||||
|
export Nablanabla!, Dslash_sq!, flw, backflow
|
||||||
|
|
||||||
|
|
||||||
end
|
end
|
||||||
|
|
211
src/Dirac/Diracfields.jl
Normal file
211
src/Dirac/Diracfields.jl
Normal file
|
@ -0,0 +1,211 @@
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
"""
|
||||||
|
function Csw!(dws, U, gp, lp::SpaceParm)
|
||||||
|
|
||||||
|
Computes the clover and stores it in dws.csw.
|
||||||
|
|
||||||
|
"""
|
||||||
|
function Csw!(dws, U, gp, lp::SpaceParm{4,6,B,D}) where {B,D}
|
||||||
|
|
||||||
|
@timeit "Csw computation" begin
|
||||||
|
|
||||||
|
for i in 1:Int(lp.npls)
|
||||||
|
CUDA.@sync begin
|
||||||
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_csw!(dws.csw, U, gp.Ubnd, i, lp)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
function krnl_csw!(csw::AbstractArray{T}, U, Ubnd, ipl, lp::SpaceParm{4,M,B,D}) where {T,M,B,D}
|
||||||
|
|
||||||
|
@inbounds begin
|
||||||
|
b = Int64(CUDA.threadIdx().x)
|
||||||
|
r = Int64(CUDA.blockIdx().x)
|
||||||
|
I = point_coord((b,r), lp)
|
||||||
|
it = I[4]
|
||||||
|
|
||||||
|
id1, id2 = lp.plidx[ipl]
|
||||||
|
SFBC = ((B == BC_SF_AFWB) || (B == BC_SF_ORBI) ) && (id1 == 4)
|
||||||
|
OBC = (B == BC_OPEN) && ((it == 1) || (it == lp.iL[end]))
|
||||||
|
|
||||||
|
bu1, ru1 = up((b, r), id1, lp)
|
||||||
|
bu2, ru2 = up((b, r), id2, lp)
|
||||||
|
bd1, rd1 = dw((b, r), id1, lp)
|
||||||
|
bd2, rd2 = dw((b, r), id2, lp)
|
||||||
|
bdd, rdd = dw((bd1, rd1), id2, lp)
|
||||||
|
bud, rud = dw((bu1, ru1), id2, lp)
|
||||||
|
bdu, rdu = up((bd1, rd1), id2, lp)
|
||||||
|
|
||||||
|
if SFBC && (it == lp.iL[end])
|
||||||
|
gt1 = Ubnd[id2]
|
||||||
|
gt2 = Ubnd[id2]
|
||||||
|
else
|
||||||
|
gt1 = U[bu1,id2,ru1]
|
||||||
|
gt2 = U[bud,id2,rud]
|
||||||
|
end
|
||||||
|
|
||||||
|
M1 = U[b,id1,r]*gt1/(U[b,id2,r]*U[bu2,id1,ru2])
|
||||||
|
M2 = (U[bd2,id2,rd2]\(U[bd2,id1,rd2]*gt2))/U[b,id1,r]
|
||||||
|
M3 = (U[bdd,id2,rdd]*U[bd1,id1,rd1])\(U[bdd,id1,rdd]*U[bd2,id2,rd2])
|
||||||
|
M4 = (U[b,id2,r]/(U[bd1,id2,rd1]*U[bdu,id1,rdu]))*U[bd1,id1,rd1]
|
||||||
|
|
||||||
|
|
||||||
|
if !(SFBC && (it == 1)) && !OBC
|
||||||
|
csw[b,ipl,r] = 0.125*(antsym(M1)+antsym(M2)+antsym(M3)+antsym(M4))
|
||||||
|
end
|
||||||
|
|
||||||
|
end
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
"""
|
||||||
|
SF_bndfix!(sp, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D}})
|
||||||
|
|
||||||
|
Sets all the values of `sp` in the first time slice to zero.
|
||||||
|
"""
|
||||||
|
function SF_bndfix!(sp, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D}}) where {D}
|
||||||
|
@timeit "SF boundary fix" begin
|
||||||
|
CUDA.@sync begin
|
||||||
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_sfbndfix!(sp, lp)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
function krnl_sfbndfix!(sp,lp::SpaceParm)
|
||||||
|
b=Int64(CUDA.threadIdx().x)
|
||||||
|
r=Int64(CUDA.blockIdx().x)
|
||||||
|
|
||||||
|
if (point_time((b,r),lp) == 1)
|
||||||
|
sp[b,r] = 0.0*sp[b,r]
|
||||||
|
end
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
"""
|
||||||
|
SF_bndfix!(sp, lp::SpaceParm{4,6,BC_OPEN,D})
|
||||||
|
|
||||||
|
Sets all the values of `sp` in the first and last time slice to zero.
|
||||||
|
"""
|
||||||
|
function SF_bndfix!(sp, lp::SpaceParm{4,6,BC_OPEN,D}) where {D}
|
||||||
|
@timeit "SF boundary fix" begin
|
||||||
|
CUDA.@sync begin
|
||||||
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_opbndfix!(sp, lp)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
function krnl_opbndfix!(sp,lp::SpaceParm)
|
||||||
|
b=Int64(CUDA.threadIdx().x)
|
||||||
|
r=Int64(CUDA.blockIdx().x)
|
||||||
|
|
||||||
|
if ((point_time((b,r),lp) == 1) || (point_time((b,r),lp) == lp.iL[end]))
|
||||||
|
sp[b,r] = 0.0*sp[b,r]
|
||||||
|
end
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
|
||||||
|
"""
|
||||||
|
function pfrandomize!(f::AbstractArray{Spinor{4, SU3fund / SU2fund {T}}}, lp::SpaceParm, t::Int64 = 0)
|
||||||
|
|
||||||
|
Randomizes the SU2fund / SU3fund fermion field. If the argument t is present, it only randomizes that time-slice.
|
||||||
|
"""
|
||||||
|
function pfrandomize!(f::AbstractArray{Spinor{4, SU3fund{T}}}, lp::SpaceParm{4,6,BC_PERIODIC,D}, t::Int64 = 0) where {T,D}
|
||||||
|
|
||||||
|
@timeit "Randomize pseudofermion field" begin
|
||||||
|
p = ntuple(i->CUDA.randn(T, lp.bsz, 3, lp.rsz,2),4)./sqrt(2) # complex generation not suported for Julia 1.5.4
|
||||||
|
CUDA.@sync begin
|
||||||
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_assign_pf_su3!(f,p,lp,t)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
function pfrandomize!(f::AbstractArray{Spinor{4, SU3fund{T}}}, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D},SpaceParm{4,6,BC_OPEN,D}}, t::Int64 = 0) where {T,D}
|
||||||
|
|
||||||
|
@timeit "Randomize pseudofermion field" begin
|
||||||
|
p = ntuple(i->CUDA.randn(T, lp.bsz, 3, lp.rsz,2),4)./sqrt(2) # complex generation not suported for Julia 1.5.4
|
||||||
|
CUDA.@sync begin
|
||||||
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_assign_pf_su3!(f,p,lp,t)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
SF_bndfix!(f,lp)
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
function krnl_assign_pf_su3!(f::AbstractArray, p , lp::SpaceParm, t::Int64)
|
||||||
|
|
||||||
|
@inbounds begin
|
||||||
|
b = Int64(CUDA.threadIdx().x)
|
||||||
|
r = Int64(CUDA.blockIdx().x)
|
||||||
|
|
||||||
|
if t == 0
|
||||||
|
f[b,r] = Spinor(map(x->SU3fund(x[b,1,r,1] + im* x[b,1,r,2],
|
||||||
|
x[b,2,r,1] + im* x[b,2,r,2],
|
||||||
|
x[b,3,r,1] + im* x[b,3,r,2]),p))
|
||||||
|
elseif point_time((b,r),lp) == t
|
||||||
|
f[b,r] = Spinor(map(x->SU3fund(x[b,1,r,1] + im* x[b,1,r,2],
|
||||||
|
x[b,2,r,1] + im* x[b,2,r,2],
|
||||||
|
x[b,3,r,1] + im* x[b,3,r,2]),p))
|
||||||
|
end
|
||||||
|
|
||||||
|
end
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
function pfrandomize!(f::AbstractArray{Spinor{4, SU2fund{T}}}, lp::SpaceParm{4,6,BC_PERIODIC,D}, t::Int64 = 0) where {T,D}
|
||||||
|
|
||||||
|
@timeit "Randomize pseudofermion field" begin
|
||||||
|
p = ntuple(i->CUDA.randn(T, lp.bsz, 3, lp.rsz,2),4)./sqrt(2) # complex generation not suported for Julia 1.5.4
|
||||||
|
CUDA.@sync begin
|
||||||
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_assign_pf_su2!(f,p,lp,t)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
function pfrandomize!(f::AbstractArray{Spinor{4, SU2fund{T}}}, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D},SpaceParm{4,6,BC_OPEN,D}}, t::Int64 = 0) where {T,D}
|
||||||
|
|
||||||
|
@timeit "Randomize pseudofermion field" begin
|
||||||
|
p = ntuple(i->CUDA.randn(T, lp.bsz, 3, lp.rsz,2),4)./sqrt(2) # complex generation not suported for Julia 1.5.4
|
||||||
|
CUDA.@sync begin
|
||||||
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_assign_pf_su2!(f,p,lp,t)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
SF_bndfix!(f,lp)
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
function krnl_assign_pf_su2!(f::AbstractArray, p , lp::SpaceParm, t::Int64)
|
||||||
|
|
||||||
|
@inbounds begin
|
||||||
|
b = Int64(CUDA.threadIdx().x)
|
||||||
|
r = Int64(CUDA.blockIdx().x)
|
||||||
|
|
||||||
|
if t == 0
|
||||||
|
f[b,r] = Spinor(map(x->SU2fund(x[b,1,r,1] + im* x[b,1,r,2],
|
||||||
|
x[b,2,r,1] + im* x[b,2,r,2]),p))
|
||||||
|
elseif point_time((b,r),lp) == t
|
||||||
|
f[b,r] = Spinor(map(x->SU2fund(x[b,1,r,1] + im* x[b,1,r,2],
|
||||||
|
x[b,2,r,1] + im* x[b,2,r,2]),p))
|
||||||
|
end
|
||||||
|
|
||||||
|
end
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
456
src/Dirac/Diracflow.jl
Normal file
456
src/Dirac/Diracflow.jl
Normal file
|
@ -0,0 +1,456 @@
|
||||||
|
|
||||||
|
import ..YM.flw, ..YM.force_gauge, ..YM.flw_adapt
|
||||||
|
|
||||||
|
|
||||||
|
function flw(U, psi, int::FlowIntr{NI,T}, ns::Int64, eps, gp::GaugeParm, dpar::DiracParam, lp::SpaceParm, ymws::YMworkspace, dws::DiracWorkspace) where {NI,T}
|
||||||
|
@timeit "Integrating flow equations" begin
|
||||||
|
for i in 1:ns
|
||||||
|
force_gauge(ymws, U, int.c0, 1, gp, lp)
|
||||||
|
|
||||||
|
if int.add_zth
|
||||||
|
add_zth_term(ymws::YMworkspace, U, lp)
|
||||||
|
end
|
||||||
|
|
||||||
|
Nablanabla!(dws.sAp, U, psi, dpar, dws, lp)
|
||||||
|
psi .= psi + 2*int.r*eps*dws.sAp
|
||||||
|
|
||||||
|
ymws.mom .= ymws.frc1
|
||||||
|
U .= expm.(U, ymws.mom, 2*eps*int.r)
|
||||||
|
|
||||||
|
for k in 1:NI
|
||||||
|
force_gauge(ymws, U, int.c0, 1, gp, lp)
|
||||||
|
|
||||||
|
if int.add_zth
|
||||||
|
add_zth_term(ymws::YMworkspace, U, lp)
|
||||||
|
end
|
||||||
|
|
||||||
|
Nablanabla!(dws.sp, U, psi, dpar, dws, lp)
|
||||||
|
dws.sAp .= int.e0[k].*dws.sAp .+ int.e1[k].*dws.sp
|
||||||
|
psi .= psi + 2*eps*dws.sAp
|
||||||
|
|
||||||
|
ymws.mom .= int.e0[k].*ymws.mom .+ int.e1[k].*ymws.frc1
|
||||||
|
U .= expm.(U, ymws.mom, 2*eps)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
flw(U, psi, int::FlowIntr{NI,T}, ns::Int64, gp::GaugeParm, dpar::DiracParam, lp::SpaceParm, ymws::YMworkspace, dws::DiracWorkspace) where {NI,T} = flw(U, psi, int::FlowIntr{NI,T}, ns::Int64, int.eps, gp::GaugeParm, dpar::DiracParam, lp::SpaceParm, ymws::YMworkspace, dws::DiracWorkspace)
|
||||||
|
|
||||||
|
"""
|
||||||
|
function backflow(psi, U, Dt, nsave::Int64, gp::GaugeParm, dpar::DiracParam, lp::SpaceParm, ymws::YMworkspace, dws::DiracWorkspace)
|
||||||
|
|
||||||
|
Performs one step back in flow time for the fermion field, according to 1302.5246. The fermion field must me that of the time-slice Dt and is flowed back to the first time-slice
|
||||||
|
nsave is the total number of gauge fields saved in the process
|
||||||
|
|
||||||
|
"""
|
||||||
|
function backflow(psi, U, Dt, maxnsave::Int64, gp::GaugeParm, dpar::DiracParam, lp::SpaceParm, ymws::YMworkspace, dws::DiracWorkspace)
|
||||||
|
|
||||||
|
int = wfl_rk3(Float64,0.01,1.0) # Default integrator, it has to be order 3 rk but in can be zfl
|
||||||
|
|
||||||
|
@timeit "Backflow integration" begin
|
||||||
|
@timeit "GPU to CPU" U0 = Array(U)
|
||||||
|
|
||||||
|
nt,eps_all = flw_adapt(U, int, Dt, gp, lp, ymws)
|
||||||
|
|
||||||
|
nsave = min(maxnsave,nt)
|
||||||
|
|
||||||
|
nsave != 0 ? dsave = Int64(floor(nt/nsave)) : dsave = nt
|
||||||
|
Usave = Vector{typeof(U0)}(undef,nsave)
|
||||||
|
|
||||||
|
@timeit "CPU to GPU" copyto!(U,U0)
|
||||||
|
for i in 1:(dsave*nsave)
|
||||||
|
flw(U, int, 1, eps_all[i], gp, lp, ymws)
|
||||||
|
(i%dsave)==0 ? Usave[Int64(i/dsave)] = Array(U) : nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
for j in (nt%nsave):-1:1
|
||||||
|
@timeit "CPU to GPU" copyto!(U,Usave[end])
|
||||||
|
for k in 1:j-1
|
||||||
|
flw(U, int, 1, eps_all[nsave*dsave + k], gp, lp, ymws)
|
||||||
|
end
|
||||||
|
bflw_step!(psi, U, eps_all[nsave*dsave + j], int::FlowIntr, gp::GaugeParm, dpar::DiracParam, lp::SpaceParm, ymws::YMworkspace, dws::DiracWorkspace)
|
||||||
|
end
|
||||||
|
|
||||||
|
for i in (nsave-1):-1:1
|
||||||
|
for j in dsave:-1:1
|
||||||
|
@timeit "CPU to GPU" copyto!(U,Usave[i])
|
||||||
|
for k in 1:j-1
|
||||||
|
flw(U, int, 1, eps_all[i*dsave + k], gp, lp, ymws)
|
||||||
|
end
|
||||||
|
bflw_step!(psi, U, eps_all[i*dsave + j], int::FlowIntr, gp::GaugeParm, dpar::DiracParam, lp::SpaceParm, ymws::YMworkspace, dws::DiracWorkspace)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
@timeit "CPU to GPU" copyto!(U,U0)
|
||||||
|
|
||||||
|
for j in dsave:-1:1
|
||||||
|
@timeit "CPU to GPU" copyto!(U,U0)
|
||||||
|
for k in 1:j-1
|
||||||
|
flw(U, int, 1, eps_all[k], gp, lp, ymws)
|
||||||
|
end
|
||||||
|
bflw_step!(psi, U, eps_all[j], int::FlowIntr, gp::GaugeParm, dpar::DiracParam, lp::SpaceParm, ymws::YMworkspace, dws::DiracWorkspace)
|
||||||
|
end
|
||||||
|
|
||||||
|
@timeit "CPU to GPU" copyto!(U,U0)
|
||||||
|
end
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
"""
|
||||||
|
function bflw_step!(U, psi, eps, int::FlowIntr, gp::GaugeParm, dpar::DiracParam, lp::SpaceParm, ymws::YMworkspace, dws::DiracWorkspace)
|
||||||
|
|
||||||
|
Performs ONE backstep in psi, from t to t-\eps. U is supposed to be the one in t-\eps and is left unchanged. So far, int has to be rk4
|
||||||
|
"""
|
||||||
|
function bflw_step!(psi, U, eps, int::FlowIntr, gp::GaugeParm, dpar::DiracParam, lp::SpaceParm, ymws::YMworkspace, dws::DiracWorkspace)
|
||||||
|
|
||||||
|
@timeit "Backflow step" begin
|
||||||
|
|
||||||
|
V = copy(U)
|
||||||
|
V .= U
|
||||||
|
|
||||||
|
force_gauge(ymws, U, int.c0, 1, gp, lp)
|
||||||
|
|
||||||
|
if int.add_zth
|
||||||
|
add_zth_term(ymws::YMworkspace, U, lp)
|
||||||
|
end
|
||||||
|
|
||||||
|
ymws.mom .= ymws.frc1
|
||||||
|
U .= expm.(U, ymws.mom, 2*eps*int.r)
|
||||||
|
|
||||||
|
force_gauge(ymws, U, int.c0, 1, gp, lp)
|
||||||
|
|
||||||
|
if int.add_zth
|
||||||
|
add_zth_term(ymws::YMworkspace, U, lp)
|
||||||
|
end
|
||||||
|
|
||||||
|
ymws.mom .= int.e0[1].*ymws.mom .+ int.e1[1].*ymws.frc1
|
||||||
|
U .= expm.(U, ymws.mom, 2*eps)
|
||||||
|
|
||||||
|
Nablanabla!(dws.sp, U, 0.75*2*eps*psi, dpar, dws, lp)
|
||||||
|
|
||||||
|
U .= V
|
||||||
|
|
||||||
|
force_gauge(ymws, U, int.c0, 1, gp, lp)
|
||||||
|
|
||||||
|
if int.add_zth
|
||||||
|
add_zth_term(ymws::YMworkspace, U, lp)
|
||||||
|
end
|
||||||
|
|
||||||
|
U .= expm.(U, ymws.frc1, 2*eps*int.r)
|
||||||
|
|
||||||
|
Nablanabla!(dws.sAp, U, 2*eps*dws.sp, dpar, dws, lp)
|
||||||
|
dws.sAp .= psi + (8/9)*dws.sAp
|
||||||
|
|
||||||
|
U .= V
|
||||||
|
|
||||||
|
Nablanabla!(psi, U, 2*eps*(dws.sAp - (8/9)*dws.sp), dpar, dws, lp)
|
||||||
|
psi .= (1/4)*psi + dws.sp + dws.sAp
|
||||||
|
|
||||||
|
end
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
|
||||||
|
function flw_adapt(U, psi, int::FlowIntr{NI,T}, tend::T, epsini::T, gp::GaugeParm, dpar::DiracParam, lp::SpaceParm, ymws::YMworkspace, dws::DiracWorkspace) where {NI,T}
|
||||||
|
|
||||||
|
eps = epsini
|
||||||
|
dt = tend
|
||||||
|
nstp = 0
|
||||||
|
eps_all = Vector{T}(undef,0)
|
||||||
|
while true
|
||||||
|
ns = convert(Int64, floor(dt/eps))
|
||||||
|
if ns > 10
|
||||||
|
flw(U, psi, int, 9, eps, gp, dpar, lp, ymws, dws)
|
||||||
|
ymws.U1 .= U
|
||||||
|
flw(U, psi, int, 1, eps, gp, dpar, lp, ymws, dws)
|
||||||
|
flw(ymws.U1, int, 2, eps/2, gp, lp, ymws)
|
||||||
|
|
||||||
|
dt = dt - 10*eps
|
||||||
|
nstp = nstp + 10
|
||||||
|
push!(eps_all,ntuple(i->eps,10)...)
|
||||||
|
|
||||||
|
# adjust step size
|
||||||
|
ymws.U1 .= ymws.U1 ./ U
|
||||||
|
maxd = CUDA.mapreduce(dev_one, max, ymws.U1, init=zero(tend))
|
||||||
|
eps = min(int.max_eps, 2*eps, int.sft_fac*eps*(int.tol/maxd)^(one(tend)/3))
|
||||||
|
|
||||||
|
else
|
||||||
|
flw(U, psi, int, ns, eps, gp, dpar, lp, ymws, dws)
|
||||||
|
dt = dt - ns*eps
|
||||||
|
|
||||||
|
push!(eps_all,ntuple(i->eps,ns)...)
|
||||||
|
push!(eps_all,dt)
|
||||||
|
|
||||||
|
flw(U, psi, int, 1, dt, gp, dpar, lp, ymws, dws)
|
||||||
|
dt = zero(tend)
|
||||||
|
|
||||||
|
nstp = nstp + ns + 1
|
||||||
|
end
|
||||||
|
|
||||||
|
if dt == zero(tend)
|
||||||
|
break
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
return nstp, eps_all
|
||||||
|
end
|
||||||
|
flw_adapt(U, psi, int::FlowIntr{NI,T}, tend::T, gp::GaugeParm, dpar::DiracParam, lp::SpaceParm, ymws::YMworkspace, dws::DiracWorkspace) where {NI,T} = flw_adapt(U, psi, int, tend, int.eps_ini, gp, dpar, lp, ymws, dws)
|
||||||
|
|
||||||
|
|
||||||
|
"""
|
||||||
|
|
||||||
|
function Nablanabla!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::SpaceParm{4,6,B,D})
|
||||||
|
|
||||||
|
Computes /`/` \\nabla^* \\nabla /`/` `si` and stores it in `si`.
|
||||||
|
|
||||||
|
"""
|
||||||
|
function Nablanabla!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::SpaceParm{4,6,BC_PERIODIC,D}) where {D}
|
||||||
|
@timeit "Laplacian" begin
|
||||||
|
CUDA.@sync begin
|
||||||
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_Nablanabla(so, U, si, dpar.th, lp)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
function Nablanabla!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D},SpaceParm{4,6,BC_OPEN,D}}) where {D}
|
||||||
|
SF_bndfix!(si,lp)
|
||||||
|
@timeit "Laplacian" begin
|
||||||
|
CUDA.@sync begin
|
||||||
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_Nablanabla(so, U, si, dpar.th, lp)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
SF_bndfix!(so,lp)
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
|
||||||
|
function krnl_Nablanabla(so, U, si, th, lp::SpaceParm{4,6,BC_OPEN,D}) where {D}
|
||||||
|
|
||||||
|
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
|
||||||
|
|
||||||
|
@inbounds begin
|
||||||
|
|
||||||
|
if ((point_time((b,r),lp) != 1) && (point_time((b,r),lp) != lp.iL[end]))
|
||||||
|
|
||||||
|
so[b,r] = -4*si[b,r]
|
||||||
|
|
||||||
|
bu1, ru1 = up((b,r), 1, lp)
|
||||||
|
bd1, rd1 = dw((b,r), 1, lp)
|
||||||
|
bu2, ru2 = up((b,r), 2, lp)
|
||||||
|
bd2, rd2 = dw((b,r), 2, lp)
|
||||||
|
bu3, ru3 = up((b,r), 3, lp)
|
||||||
|
bd3, rd3 = dw((b,r), 3, lp)
|
||||||
|
bu4, ru4 = up((b,r), 4, lp)
|
||||||
|
bd4, rd4 = dw((b,r), 4, lp)
|
||||||
|
|
||||||
|
so[b,r] += 0.5*( th[1] * (U[b,1,r]*si[bu1,ru1]) +conj(th[1]) * (U[bd1,1,rd1]\si[bd1,rd1]) +
|
||||||
|
th[2] * (U[b,2,r]*si[bu2,ru2]) +conj(th[2]) * (U[bd2,2,rd2]\si[bd2,rd2]) +
|
||||||
|
th[3] * (U[b,3,r]*si[bu3,ru3]) +conj(th[3]) * (U[bd3,3,rd3]\si[bd3,rd3]) +
|
||||||
|
th[4] * (U[b,4,r]*si[bu4,ru4]) +conj(th[4]) * (U[bd4,4,rd4]\si[bd4,rd4]) )
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
function krnl_Nablanabla(so, U, si, th, lp::SpaceParm{4,6,BC_PERIODIC,D}) where {D}
|
||||||
|
|
||||||
|
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
|
||||||
|
|
||||||
|
@inbounds begin
|
||||||
|
|
||||||
|
so[b,r] = -4*si[b,r]
|
||||||
|
|
||||||
|
bu1, ru1 = up((b,r), 1, lp)
|
||||||
|
bd1, rd1 = dw((b,r), 1, lp)
|
||||||
|
bu2, ru2 = up((b,r), 2, lp)
|
||||||
|
bd2, rd2 = dw((b,r), 2, lp)
|
||||||
|
bu3, ru3 = up((b,r), 3, lp)
|
||||||
|
bd3, rd3 = dw((b,r), 3, lp)
|
||||||
|
bu4, ru4 = up((b,r), 4, lp)
|
||||||
|
bd4, rd4 = dw((b,r), 4, lp)
|
||||||
|
|
||||||
|
so[b,r] += 0.5*( th[1] * (U[b,1,r]*si[bu1,ru1]) +conj(th[1]) * (U[bd1,1,rd1]\si[bd1,rd1]) +
|
||||||
|
th[2] * (U[b,2,r]*si[bu2,ru2]) +conj(th[2]) * (U[bd2,2,rd2]\si[bd2,rd2]) +
|
||||||
|
th[3] * (U[b,3,r]*si[bu3,ru3]) +conj(th[3]) * (U[bd3,3,rd3]\si[bd3,rd3]) +
|
||||||
|
th[4] * (U[b,4,r]*si[bu4,ru4]) +conj(th[4]) * (U[bd4,4,rd4]\si[bd4,rd4]) )
|
||||||
|
end
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
function krnl_Nablanabla(so, U, si, th, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D}}) where {D}
|
||||||
|
|
||||||
|
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
|
||||||
|
|
||||||
|
@inbounds begin
|
||||||
|
|
||||||
|
if (point_time((b,r),lp) != 1)
|
||||||
|
|
||||||
|
so[b,r] = -4*si[b,r]
|
||||||
|
|
||||||
|
bu1, ru1 = up((b,r), 1, lp)
|
||||||
|
bd1, rd1 = dw((b,r), 1, lp)
|
||||||
|
bu2, ru2 = up((b,r), 2, lp)
|
||||||
|
bd2, rd2 = dw((b,r), 2, lp)
|
||||||
|
bu3, ru3 = up((b,r), 3, lp)
|
||||||
|
bd3, rd3 = dw((b,r), 3, lp)
|
||||||
|
bu4, ru4 = up((b,r), 4, lp)
|
||||||
|
bd4, rd4 = dw((b,r), 4, lp)
|
||||||
|
|
||||||
|
so[b,r] += 0.5*( th[1] * (U[b,1,r]*si[bu1,ru1]) +conj(th[1]) * (U[bd1,1,rd1]\si[bd1,rd1]) +
|
||||||
|
th[2] * (U[b,2,r]*si[bu2,ru2]) +conj(th[2]) * (U[bd2,2,rd2]\si[bd2,rd2]) +
|
||||||
|
th[3] * (U[b,3,r]*si[bu3,ru3]) +conj(th[3]) * (U[bd3,3,rd3]\si[bd3,rd3]) +
|
||||||
|
th[4] * (U[b,4,r]*si[bu4,ru4]) +conj(th[4]) * (U[bd4,4,rd4]\si[bd4,rd4]) )
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
export Nablanabla!, flw, backflow, flw_adapt, bflw_step!
|
||||||
|
|
||||||
|
|
||||||
|
"""
|
||||||
|
function Dslash_sq!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::SpaceParm{4,6,B,D})
|
||||||
|
|
||||||
|
Computes /`/` //slashed{D}^2 si /`/` ans stores it in `si`.
|
||||||
|
|
||||||
|
"""
|
||||||
|
function Dslash_sq!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::SpaceParm{4,6,B,D}) where {B,D}
|
||||||
|
|
||||||
|
@timeit "DwdagDw" begin
|
||||||
|
|
||||||
|
@timeit "g5Dslsh" begin
|
||||||
|
CUDA.@sync begin
|
||||||
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dslsh!(dws.st, U, si, dpar.th, lp)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
if abs(dpar.csw) > 1.0E-10
|
||||||
|
@timeit "Dw_improvement" begin
|
||||||
|
CUDA.@sync begin
|
||||||
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dslsh_impr!(dws.st, dws.csw, dpar.csw, si, lp)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
|
||||||
|
@timeit "g5Dslsh" begin
|
||||||
|
CUDA.@sync begin
|
||||||
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dslsh!(so, U, dws.st, dpar.th, lp)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
if abs(dpar.csw) > 1.0E-10
|
||||||
|
@timeit "Dw_improvement" begin
|
||||||
|
CUDA.@sync begin
|
||||||
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dslsh_impr!(so, dws.csw, dpar.csw, dws.st, lp)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
|
||||||
|
end
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
|
||||||
|
function krnl_g5Dslsh!(so, U, si, th, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D}}) where {D}
|
||||||
|
|
||||||
|
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
|
||||||
|
|
||||||
|
if (point_time((b,r),lp) != 1)
|
||||||
|
|
||||||
|
@inbounds begin
|
||||||
|
|
||||||
|
so[b,r] = 4*si[b,r]
|
||||||
|
|
||||||
|
bu1, ru1 = up((b,r), 1, lp)
|
||||||
|
bd1, rd1 = dw((b,r), 1, lp)
|
||||||
|
bu2, ru2 = up((b,r), 2, lp)
|
||||||
|
bd2, rd2 = dw((b,r), 2, lp)
|
||||||
|
bu3, ru3 = up((b,r), 3, lp)
|
||||||
|
bd3, rd3 = dw((b,r), 3, lp)
|
||||||
|
bu4, ru4 = up((b,r), 4, lp)
|
||||||
|
bd4, rd4 = dw((b,r), 4, lp)
|
||||||
|
|
||||||
|
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
|
||||||
|
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
|
||||||
|
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
|
||||||
|
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
|
||||||
|
|
||||||
|
so[b,r] = dmul(Gamma{5}, so[b,r])
|
||||||
|
end
|
||||||
|
end
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
|
||||||
|
function krnl_g5Dslsh!(so, U, si, th, lp::SpaceParm{4,6,B,D}) where {D,B}
|
||||||
|
|
||||||
|
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
|
||||||
|
|
||||||
|
@inbounds begin
|
||||||
|
|
||||||
|
so[b,r] = 4*si[b,r]
|
||||||
|
|
||||||
|
bu1, ru1 = up((b,r), 1, lp)
|
||||||
|
bd1, rd1 = dw((b,r), 1, lp)
|
||||||
|
bu2, ru2 = up((b,r), 2, lp)
|
||||||
|
bd2, rd2 = dw((b,r), 2, lp)
|
||||||
|
bu3, ru3 = up((b,r), 3, lp)
|
||||||
|
bd3, rd3 = dw((b,r), 3, lp)
|
||||||
|
bu4, ru4 = up((b,r), 4, lp)
|
||||||
|
bd4, rd4 = dw((b,r), 4, lp)
|
||||||
|
|
||||||
|
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
|
||||||
|
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
|
||||||
|
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
|
||||||
|
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
|
||||||
|
|
||||||
|
so[b,r] = dmul(Gamma{5}, so[b,r])
|
||||||
|
end
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
function krnl_g5Dslsh_impr!(so, Fcsw, csw, si, lp::SpaceParm{4,6,B,D}) where {B,D}
|
||||||
|
|
||||||
|
@inbounds begin
|
||||||
|
|
||||||
|
b = Int64(CUDA.threadIdx().x);
|
||||||
|
r = Int64(CUDA.blockIdx().x)
|
||||||
|
|
||||||
|
so[b,r] += 0.5*csw*im*dmul(Gamma{5},( Fcsw[b,1,r]*dmul(Gamma{10},si[b,r]) + Fcsw[b,2,r]*dmul(Gamma{11},si[b,r]) + Fcsw[b,3,r]*dmul(Gamma{12},si[b,r])
|
||||||
|
-Fcsw[b,4,r]*dmul(Gamma{15},si[b,r]) - Fcsw[b,5,r]*dmul(Gamma{14},si[b,r]) - Fcsw[b,6,r]*dmul(Gamma{13},si[b,r])))
|
||||||
|
end
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
function krnl_g5Dslsh_impr!(so, Fcsw, csw, si, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D}}) where {D}
|
||||||
|
|
||||||
|
@inbounds begin
|
||||||
|
|
||||||
|
b = Int64(CUDA.threadIdx().x);
|
||||||
|
r = Int64(CUDA.blockIdx().x)
|
||||||
|
|
||||||
|
if (point_time((b,r),lp) != 1)
|
||||||
|
|
||||||
|
so[b,r] += 0.5*csw*im*dmul(Gamma{5},( Fcsw[b,1,r]*dmul(Gamma{10},si[b,r]) + Fcsw[b,2,r]*dmul(Gamma{11},si[b,r]) + Fcsw[b,3,r]*dmul(Gamma{12},si[b,r])
|
||||||
|
-Fcsw[b,4,r]*dmul(Gamma{15},si[b,r]) - Fcsw[b,5,r]*dmul(Gamma{14},si[b,r]) - Fcsw[b,6,r]*dmul(Gamma{13},si[b,r])))
|
||||||
|
end
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
end
|
667
src/Dirac/Diracoper.jl
Normal file
667
src/Dirac/Diracoper.jl
Normal file
|
@ -0,0 +1,667 @@
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
## OPEN
|
||||||
|
|
||||||
|
"""
|
||||||
|
function Dw!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::SpaceParm{4,6,B,D})
|
||||||
|
|
||||||
|
Computes the Dirac operator (with the Wilson term) `\`\``D_w``\`\` with gauge field U and parameters `dpar` of the field `si` and stores it in `so`.
|
||||||
|
If `dpar.csw` is different from zero, the clover term should be stored in `dws.csw` via the Csw! function and is automatically included in the operator.
|
||||||
|
|
||||||
|
"""
|
||||||
|
function Dw!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::SpaceParm{4,6,BC_OPEN,D}) where {D}
|
||||||
|
|
||||||
|
SF_bndfix!(si,lp)
|
||||||
|
if abs(dpar.csw) > 1.0E-10
|
||||||
|
@timeit "Dw" begin
|
||||||
|
CUDA.@sync begin
|
||||||
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_Dwimpr!(so, U, si, dws.csw, dpar.m0, dpar.tm, dpar.th, dpar.csw, dpar.ct, lp)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
else
|
||||||
|
@timeit "Dw" begin
|
||||||
|
CUDA.@sync begin
|
||||||
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_Dw!(so, U, si, dpar.m0, dpar.tm, dpar.th, dpar.ct, lp)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
end
|
||||||
|
SF_bndfix!(so,lp)
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
function krnl_Dwimpr!(so, U, si, Fcsw, m0, tm, th, csw, ct, lp::SpaceParm{4,6,BC_OPEN,D}) where {D}
|
||||||
|
|
||||||
|
# The field si is assumed to be zero at t = 0,T
|
||||||
|
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
|
||||||
|
|
||||||
|
if ((point_time((b,r),lp) != 1) && (point_time((b,r),lp) != lp.iL[end]))
|
||||||
|
bu1, ru1 = up((b,r), 1, lp)
|
||||||
|
bd1, rd1 = dw((b,r), 1, lp)
|
||||||
|
bu2, ru2 = up((b,r), 2, lp)
|
||||||
|
bd2, rd2 = dw((b,r), 2, lp)
|
||||||
|
bu3, ru3 = up((b,r), 3, lp)
|
||||||
|
bd3, rd3 = dw((b,r), 3, lp)
|
||||||
|
bu4, ru4 = up((b,r), 4, lp)
|
||||||
|
bd4, rd4 = dw((b,r), 4, lp)
|
||||||
|
|
||||||
|
@inbounds begin
|
||||||
|
|
||||||
|
so[b,r] = (4+m0)*si[b,r] + im*tm*dmul(Gamma{5},si[b,r]) + 0.5*csw*im*( Fcsw[b,1,r]*dmul(Gamma{10},si[b,r]) + Fcsw[b,2,r]*dmul(Gamma{11},si[b,r]) + Fcsw[b,3,r]*dmul(Gamma{12},si[b,r])
|
||||||
|
+Fcsw[b,4,r]*dmul(Gamma{15},si[b,r]) + Fcsw[b,5,r]*dmul(Gamma{14},si[b,r]) + Fcsw[b,6,r]*dmul(Gamma{13},si[b,r]))
|
||||||
|
|
||||||
|
|
||||||
|
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
|
||||||
|
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
|
||||||
|
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
|
||||||
|
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
|
||||||
|
|
||||||
|
if (point_time((b,r),lp) == 2) || (point_time((b,r),lp) == (lp.iL[4]-1))
|
||||||
|
so[b,r] += (ct-1.0)*si[b,r]
|
||||||
|
end
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
function krnl_Dw!(so, U, si, m0, tm, th, ct, lp::SpaceParm{4,6,BC_OPEN,D}) where {D}
|
||||||
|
|
||||||
|
# The field si is assumed to be zero at t = 0,T
|
||||||
|
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
|
||||||
|
|
||||||
|
if ((point_time((b,r),lp) != 1) && (point_time((b,r),lp) != lp.iL[end]))
|
||||||
|
|
||||||
|
bu1, ru1 = up((b,r), 1, lp)
|
||||||
|
bd1, rd1 = dw((b,r), 1, lp)
|
||||||
|
bu2, ru2 = up((b,r), 2, lp)
|
||||||
|
bd2, rd2 = dw((b,r), 2, lp)
|
||||||
|
bu3, ru3 = up((b,r), 3, lp)
|
||||||
|
bd3, rd3 = dw((b,r), 3, lp)
|
||||||
|
bu4, ru4 = up((b,r), 4, lp)
|
||||||
|
bd4, rd4 = dw((b,r), 4, lp)
|
||||||
|
|
||||||
|
@inbounds begin
|
||||||
|
|
||||||
|
so[b,r] = (4+m0)*si[b,r] + im*tm*dmul(Gamma{5},si[b,r])
|
||||||
|
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
|
||||||
|
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
|
||||||
|
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
|
||||||
|
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
|
||||||
|
|
||||||
|
if (point_time((b,r),lp) == 2) || (point_time((b,r),lp) == (lp.iL[4]-1))
|
||||||
|
so[b,r] += (ct-1.0)*si[b,r]
|
||||||
|
end
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
|
||||||
|
"""
|
||||||
|
function g5Dw!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::SpaceParm{4,6,B,D})
|
||||||
|
|
||||||
|
Computes \`\` \\gamma_5 \`\` times the Dirac operator (with the Wilson term) with gauge field U and parameters `dpar` of the field `si` and stores it in `so`.
|
||||||
|
If `dpar.csw` is different from zero, the clover term should be stored in `dws.csw` via the Csw! function and is automatically included in the operator.
|
||||||
|
"""
|
||||||
|
function g5Dw!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::SpaceParm{4,6,BC_OPEN,D}) where {D}
|
||||||
|
|
||||||
|
SF_bndfix!(si,lp)
|
||||||
|
if abs(dpar.csw) > 1.0E-10
|
||||||
|
@timeit "g5Dw" begin
|
||||||
|
CUDA.@sync begin
|
||||||
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dwimpr!(so, U, si, dws.csw, dpar.m0, dpar.tm, dpar.th, dpar.csw, dpar.ct, lp)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
else
|
||||||
|
@timeit "g5Dw" begin
|
||||||
|
CUDA.@sync begin
|
||||||
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dw!(so, U, si, dpar.m0, dpar.tm, dpar.th, dpar.ct, lp)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
end
|
||||||
|
SF_bndfix!(so,lp)
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
function krnl_g5Dwimpr!(so, U, si, Fcsw, m0, tm, th, csw, ct, lp::SpaceParm{4,6,BC_OPEN,D}) where {D}
|
||||||
|
|
||||||
|
# The field si is assumed to be zero at t = 0,T
|
||||||
|
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
|
||||||
|
|
||||||
|
if ((point_time((b,r),lp) != 1) && (point_time((b,r),lp) != lp.iL[end]))
|
||||||
|
|
||||||
|
bu1, ru1 = up((b,r), 1, lp)
|
||||||
|
bd1, rd1 = dw((b,r), 1, lp)
|
||||||
|
bu2, ru2 = up((b,r), 2, lp)
|
||||||
|
bd2, rd2 = dw((b,r), 2, lp)
|
||||||
|
bu3, ru3 = up((b,r), 3, lp)
|
||||||
|
bd3, rd3 = dw((b,r), 3, lp)
|
||||||
|
bu4, ru4 = up((b,r), 4, lp)
|
||||||
|
bd4, rd4 = dw((b,r), 4, lp)
|
||||||
|
|
||||||
|
@inbounds begin
|
||||||
|
|
||||||
|
so[b,r] = (4+m0)*si[b,r] + 0.5*csw*im*( Fcsw[b,1,r]*dmul(Gamma{10},si[b,r]) + Fcsw[b,2,r]*dmul(Gamma{11},si[b,r]) + Fcsw[b,3,r]*dmul(Gamma{12},si[b,r])
|
||||||
|
+Fcsw[b,4,r]*dmul(Gamma{15},si[b,r]) + Fcsw[b,5,r]*dmul(Gamma{14},si[b,r]) + Fcsw[b,6,r]*dmul(Gamma{13},si[b,r]))
|
||||||
|
|
||||||
|
|
||||||
|
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
|
||||||
|
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
|
||||||
|
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
|
||||||
|
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
|
||||||
|
|
||||||
|
if (point_time((b,r),lp) == 2) || (point_time((b,r),lp) == (lp.iL[4]-1))
|
||||||
|
so[b,r] += (ct-1.0)*si[b,r]
|
||||||
|
end
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
so[b,r] = dmul(Gamma{5}, so[b,r])+ im*tm*si[b,r]
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
function krnl_g5Dw!(so, U, si, m0, tm, th, ct, lp::SpaceParm{4,6,BC_OPEN,D}) where {D}
|
||||||
|
|
||||||
|
# The field si is assumed to be zero at t = 0,T
|
||||||
|
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
|
||||||
|
|
||||||
|
if ((point_time((b,r),lp) != 1) && (point_time((b,r),lp) != lp.iL[end]))
|
||||||
|
|
||||||
|
bu1, ru1 = up((b,r), 1, lp)
|
||||||
|
bd1, rd1 = dw((b,r), 1, lp)
|
||||||
|
bu2, ru2 = up((b,r), 2, lp)
|
||||||
|
bd2, rd2 = dw((b,r), 2, lp)
|
||||||
|
bu3, ru3 = up((b,r), 3, lp)
|
||||||
|
bd3, rd3 = dw((b,r), 3, lp)
|
||||||
|
bu4, ru4 = up((b,r), 4, lp)
|
||||||
|
bd4, rd4 = dw((b,r), 4, lp)
|
||||||
|
|
||||||
|
@inbounds begin
|
||||||
|
|
||||||
|
so[b,r] = (4+m0)*si[b,r]
|
||||||
|
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
|
||||||
|
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
|
||||||
|
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
|
||||||
|
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
|
||||||
|
|
||||||
|
if (point_time((b,r),lp) == 2) || (point_time((b,r),lp) == (lp.iL[4]-1))
|
||||||
|
so[b,r] += (ct-1.0)*si[b,r]
|
||||||
|
end
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
so[b,r] = dmul(Gamma{5}, so[b,r]) + im*tm*si[b,r]
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
"""
|
||||||
|
function DwdagDw!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::SpaceParm{4,6,B,D})
|
||||||
|
|
||||||
|
Applies the operator \`\` \\gamma_5 D_w \`\` twice to `si` and stores the result in `so`. This is equivalent to appling the operator \`\` D_w^\\dagger D_w \`\`
|
||||||
|
The Dirac operator is the same as in the functions `Dw!` and `g5Dw!`
|
||||||
|
"""
|
||||||
|
function DwdagDw!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::SpaceParm{4,6,BC_OPEN,D}) where {D}
|
||||||
|
|
||||||
|
SF_bndfix!(si,lp)
|
||||||
|
if abs(dpar.csw) > 1.0E-10
|
||||||
|
@timeit "DwdagDw" begin
|
||||||
|
|
||||||
|
@timeit "g5Dw" begin
|
||||||
|
CUDA.@sync begin
|
||||||
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dwimpr!(dws.st, U, si, dws.csw, dpar.m0, dpar.tm, dpar.th, dpar.csw, dpar.ct, lp)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
SF_bndfix!(dws.st,lp)
|
||||||
|
@timeit "g5Dw" begin
|
||||||
|
CUDA.@sync begin
|
||||||
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dwimpr!(so, U, dws.st, dws.csw, dpar.m0, -dpar.tm, dpar.th, dpar.csw, dpar.ct, lp)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
SF_bndfix!(so,lp)
|
||||||
|
end
|
||||||
|
else
|
||||||
|
@timeit "DwdagDw" begin
|
||||||
|
|
||||||
|
@timeit "g5Dw" begin
|
||||||
|
CUDA.@sync begin
|
||||||
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dw!(dws.st, U, si, dpar.m0, dpar.tm, dpar.th, dpar.ct, lp)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
SF_bndfix!(dws.st,lp)
|
||||||
|
@timeit "g5Dw" begin
|
||||||
|
CUDA.@sync begin
|
||||||
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dw!(so, U, dws.st, dpar.m0, -dpar.tm, dpar.th, dpar.ct, lp)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
SF_bndfix!(so,lp)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
## PERDIODIC
|
||||||
|
|
||||||
|
function Dw!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::SpaceParm{4,6,BC_PERIODIC,D}) where {D}
|
||||||
|
|
||||||
|
if abs(dpar.csw) > 1.0E-10
|
||||||
|
@timeit "Dw" begin
|
||||||
|
CUDA.@sync begin
|
||||||
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_Dwimpr!(so, U, si, dws.csw, dpar.m0, dpar.tm, dpar.th, dpar.csw, lp)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
else
|
||||||
|
@timeit "Dw" begin
|
||||||
|
CUDA.@sync begin
|
||||||
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_Dw!(so, U, si, dpar.m0, dpar.tm, dpar.th, lp)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
function krnl_Dwimpr!(so, U, si, Fcsw, m0, tm, th, csw, lp::SpaceParm{4,6,BC_PERIODIC,D}) where {D}
|
||||||
|
|
||||||
|
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
|
||||||
|
|
||||||
|
bu1, ru1 = up((b,r), 1, lp)
|
||||||
|
bd1, rd1 = dw((b,r), 1, lp)
|
||||||
|
bu2, ru2 = up((b,r), 2, lp)
|
||||||
|
bd2, rd2 = dw((b,r), 2, lp)
|
||||||
|
bu3, ru3 = up((b,r), 3, lp)
|
||||||
|
bd3, rd3 = dw((b,r), 3, lp)
|
||||||
|
bu4, ru4 = up((b,r), 4, lp)
|
||||||
|
bd4, rd4 = dw((b,r), 4, lp)
|
||||||
|
|
||||||
|
@inbounds begin
|
||||||
|
|
||||||
|
so[b,r] = (4+m0)*si[b,r]+ im*tm*dmul(Gamma{5},si[b,r]) + 0.5*csw*im*( Fcsw[b,1,r]*dmul(Gamma{10},si[b,r]) + Fcsw[b,2,r]*dmul(Gamma{11},si[b,r]) + Fcsw[b,3,r]*dmul(Gamma{12},si[b,r])
|
||||||
|
+Fcsw[b,4,r]*dmul(Gamma{15},si[b,r]) + Fcsw[b,5,r]*dmul(Gamma{14},si[b,r]) + Fcsw[b,6,r]*dmul(Gamma{13},si[b,r]))
|
||||||
|
|
||||||
|
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
|
||||||
|
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
|
||||||
|
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
|
||||||
|
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
|
||||||
|
|
||||||
|
end
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
function krnl_Dw!(so, U, si, m0, tm, th, lp::SpaceParm{4,6,BC_PERIODIC,D}) where {D}
|
||||||
|
|
||||||
|
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
|
||||||
|
|
||||||
|
bu1, ru1 = up((b,r), 1, lp)
|
||||||
|
bd1, rd1 = dw((b,r), 1, lp)
|
||||||
|
bu2, ru2 = up((b,r), 2, lp)
|
||||||
|
bd2, rd2 = dw((b,r), 2, lp)
|
||||||
|
bu3, ru3 = up((b,r), 3, lp)
|
||||||
|
bd3, rd3 = dw((b,r), 3, lp)
|
||||||
|
bu4, ru4 = up((b,r), 4, lp)
|
||||||
|
bd4, rd4 = dw((b,r), 4, lp)
|
||||||
|
|
||||||
|
@inbounds begin
|
||||||
|
|
||||||
|
so[b,r] = (4+m0)*si[b,r] + im*tm*dmul(Gamma{5},si[b,r])
|
||||||
|
|
||||||
|
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
|
||||||
|
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
|
||||||
|
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
|
||||||
|
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
|
||||||
|
|
||||||
|
end
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
function g5Dw!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::SpaceParm{4,6,BC_PERIODIC,D}) where {D}
|
||||||
|
|
||||||
|
if abs(dpar.csw) > 1.0E-10
|
||||||
|
@timeit "g5Dw" begin
|
||||||
|
CUDA.@sync begin
|
||||||
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dwimpr!(so, U, si, dws.csw, dpar.m0, dpar.tm, dpar.th, dpar.csw, lp)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
else
|
||||||
|
@timeit "g5Dw" begin
|
||||||
|
CUDA.@sync begin
|
||||||
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dw!(so, U, si, dpar.m0, dpar.tm, dpar.th, lp)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
function krnl_g5Dwimpr!(so, U, si, Fcsw, m0, tm, th, csw, lp::SpaceParm{4,6,BC_PERIODIC,D}) where {D}
|
||||||
|
|
||||||
|
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
|
||||||
|
|
||||||
|
bu1, ru1 = up((b,r), 1, lp)
|
||||||
|
bd1, rd1 = dw((b,r), 1, lp)
|
||||||
|
bu2, ru2 = up((b,r), 2, lp)
|
||||||
|
bd2, rd2 = dw((b,r), 2, lp)
|
||||||
|
bu3, ru3 = up((b,r), 3, lp)
|
||||||
|
bd3, rd3 = dw((b,r), 3, lp)
|
||||||
|
bu4, ru4 = up((b,r), 4, lp)
|
||||||
|
bd4, rd4 = dw((b,r), 4, lp)
|
||||||
|
|
||||||
|
@inbounds begin
|
||||||
|
|
||||||
|
so[b,r] = (4+m0)*si[b,r] + 0.5*csw*im*( Fcsw[b,1,r]*dmul(Gamma{10},si[b,r]) + Fcsw[b,2,r]*dmul(Gamma{11},si[b,r]) + Fcsw[b,3,r]*dmul(Gamma{12},si[b,r])
|
||||||
|
+Fcsw[b,4,r]*dmul(Gamma{15},si[b,r]) + Fcsw[b,5,r]*dmul(Gamma{14},si[b,r]) + Fcsw[b,6,r]*dmul(Gamma{13},si[b,r]))
|
||||||
|
|
||||||
|
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
|
||||||
|
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
|
||||||
|
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
|
||||||
|
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
|
||||||
|
|
||||||
|
so[b,r] = dmul(Gamma{5}, so[b,r])+ im*tm*si[b,r]
|
||||||
|
end
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
function krnl_g5Dw!(so, U, si, m0, tm, th, lp::SpaceParm{4,6,BC_PERIODIC,D}) where {D}
|
||||||
|
|
||||||
|
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
|
||||||
|
|
||||||
|
bu1, ru1 = up((b,r), 1, lp)
|
||||||
|
bd1, rd1 = dw((b,r), 1, lp)
|
||||||
|
bu2, ru2 = up((b,r), 2, lp)
|
||||||
|
bd2, rd2 = dw((b,r), 2, lp)
|
||||||
|
bu3, ru3 = up((b,r), 3, lp)
|
||||||
|
bd3, rd3 = dw((b,r), 3, lp)
|
||||||
|
bu4, ru4 = up((b,r), 4, lp)
|
||||||
|
bd4, rd4 = dw((b,r), 4, lp)
|
||||||
|
|
||||||
|
@inbounds begin
|
||||||
|
|
||||||
|
so[b,r] = (4+m0)*si[b,r]
|
||||||
|
|
||||||
|
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
|
||||||
|
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
|
||||||
|
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
|
||||||
|
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
|
||||||
|
|
||||||
|
so[b,r] = dmul(Gamma{5}, so[b,r]) + im*tm*si[b,r]
|
||||||
|
end
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
function DwdagDw!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::SpaceParm{4,6,BC_PERIODIC,D}) where {D}
|
||||||
|
|
||||||
|
if abs(dpar.csw) > 1.0E-10
|
||||||
|
@timeit "DwdagDw" begin
|
||||||
|
|
||||||
|
@timeit "g5Dw" begin
|
||||||
|
CUDA.@sync begin
|
||||||
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dwimpr!(dws.st, U, si, dws.csw, dpar.m0, dpar.tm, dpar.th, dpar.csw, lp)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
@timeit "g5Dw" begin
|
||||||
|
CUDA.@sync begin
|
||||||
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dwimpr!(so, U, dws.st, dws.csw, dpar.m0, -dpar.tm, dpar.th, dpar.csw, lp)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
end
|
||||||
|
else
|
||||||
|
@timeit "DwdagDw" begin
|
||||||
|
|
||||||
|
@timeit "g5Dw" begin
|
||||||
|
CUDA.@sync begin
|
||||||
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dw!(dws.st, U, si, dpar.m0, dpar.tm, dpar.th, lp)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
@timeit "g5Dw" begin
|
||||||
|
CUDA.@sync begin
|
||||||
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dw!(so, U, dws.st, dpar.m0, -dpar.tm, dpar.th, lp)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
end end
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
## SF
|
||||||
|
|
||||||
|
function Dw!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D}}) where {D}
|
||||||
|
|
||||||
|
SF_bndfix!(si,lp)
|
||||||
|
if abs(dpar.csw) > 1.0E-10
|
||||||
|
@timeit "Dw" begin
|
||||||
|
CUDA.@sync begin
|
||||||
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_Dwimpr!(so, U, si, dws.csw, dpar.m0, dpar.tm, dpar.th, dpar.csw, dpar.ct, lp)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
else
|
||||||
|
@timeit "Dw" begin
|
||||||
|
CUDA.@sync begin
|
||||||
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_Dw!(so, U, si, dpar.m0, dpar.tm, dpar.th, dpar.ct, lp)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
end
|
||||||
|
SF_bndfix!(so,lp)
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
function krnl_Dwimpr!(so, U, si, Fcsw, m0, tm, th, csw, ct, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D}}) where {D}
|
||||||
|
|
||||||
|
# The field si is assumed to be zero at t = 0
|
||||||
|
|
||||||
|
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
|
||||||
|
|
||||||
|
if (point_time((b,r),lp) != 1)
|
||||||
|
|
||||||
|
bu1, ru1 = up((b,r), 1, lp)
|
||||||
|
bd1, rd1 = dw((b,r), 1, lp)
|
||||||
|
bu2, ru2 = up((b,r), 2, lp)
|
||||||
|
bd2, rd2 = dw((b,r), 2, lp)
|
||||||
|
bu3, ru3 = up((b,r), 3, lp)
|
||||||
|
bd3, rd3 = dw((b,r), 3, lp)
|
||||||
|
bu4, ru4 = up((b,r), 4, lp)
|
||||||
|
bd4, rd4 = dw((b,r), 4, lp)
|
||||||
|
|
||||||
|
@inbounds begin
|
||||||
|
|
||||||
|
so[b,r] = (4+m0)*si[b,r] + im*tm*dmul(Gamma{5},si[b,r]) + 0.5*csw*im*( Fcsw[b,1,r]*dmul(Gamma{10},si[b,r]) + Fcsw[b,2,r]*dmul(Gamma{11},si[b,r]) + Fcsw[b,3,r]*dmul(Gamma{12},si[b,r])
|
||||||
|
+Fcsw[b,4,r]*dmul(Gamma{15},si[b,r]) + Fcsw[b,5,r]*dmul(Gamma{14},si[b,r]) + Fcsw[b,6,r]*dmul(Gamma{13},si[b,r]))
|
||||||
|
|
||||||
|
|
||||||
|
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
|
||||||
|
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
|
||||||
|
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
|
||||||
|
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
|
||||||
|
|
||||||
|
if (point_time((b,r),lp) == 2) || (point_time((b,r),lp) == lp.iL[4])
|
||||||
|
so[b,r] += (ct-1.0)*si[b,r]
|
||||||
|
end
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
function krnl_Dw!(so, U, si, m0, tm, th, ct, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D}}) where {D}
|
||||||
|
|
||||||
|
# The field si is assumed to be zero at t = 0
|
||||||
|
|
||||||
|
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
|
||||||
|
|
||||||
|
if (point_time((b,r),lp) != 1)
|
||||||
|
|
||||||
|
bu1, ru1 = up((b,r), 1, lp)
|
||||||
|
bd1, rd1 = dw((b,r), 1, lp)
|
||||||
|
bu2, ru2 = up((b,r), 2, lp)
|
||||||
|
bd2, rd2 = dw((b,r), 2, lp)
|
||||||
|
bu3, ru3 = up((b,r), 3, lp)
|
||||||
|
bd3, rd3 = dw((b,r), 3, lp)
|
||||||
|
bu4, ru4 = up((b,r), 4, lp)
|
||||||
|
bd4, rd4 = dw((b,r), 4, lp)
|
||||||
|
|
||||||
|
@inbounds begin
|
||||||
|
|
||||||
|
so[b,r] = (4+m0)*si[b,r] + im*tm*dmul(Gamma{5},si[b,r])
|
||||||
|
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
|
||||||
|
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
|
||||||
|
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
|
||||||
|
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
|
||||||
|
|
||||||
|
if (point_time((b,r),lp) == 2) || (point_time((b,r),lp) == lp.iL[4])
|
||||||
|
so[b,r] += (ct-1.0)*si[b,r]
|
||||||
|
end
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
|
||||||
|
function g5Dw!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D}}) where {D}
|
||||||
|
|
||||||
|
SF_bndfix!(si,lp)
|
||||||
|
if abs(dpar.csw) > 1.0E-10
|
||||||
|
@timeit "g5Dw" begin
|
||||||
|
CUDA.@sync begin
|
||||||
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dwimpr!(so, U, si, dws.csw, dpar.m0, dpar.tm, dpar.th, dpar.csw, dpar.ct, lp)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
else
|
||||||
|
@timeit "g5Dw" begin
|
||||||
|
CUDA.@sync begin
|
||||||
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dw!(so, U, si, dpar.m0, dpar.tm, dpar.th, dpar.ct, lp)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
end
|
||||||
|
SF_bndfix!(so,lp)
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
function krnl_g5Dwimpr!(so, U, si, Fcsw, m0, tm, th, csw, ct, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D}}) where {D}
|
||||||
|
|
||||||
|
# The field si is assumed to be zero at t = 0
|
||||||
|
|
||||||
|
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
|
||||||
|
|
||||||
|
if (point_time((b,r),lp) != 1)
|
||||||
|
|
||||||
|
bu1, ru1 = up((b,r), 1, lp)
|
||||||
|
bd1, rd1 = dw((b,r), 1, lp)
|
||||||
|
bu2, ru2 = up((b,r), 2, lp)
|
||||||
|
bd2, rd2 = dw((b,r), 2, lp)
|
||||||
|
bu3, ru3 = up((b,r), 3, lp)
|
||||||
|
bd3, rd3 = dw((b,r), 3, lp)
|
||||||
|
bu4, ru4 = up((b,r), 4, lp)
|
||||||
|
bd4, rd4 = dw((b,r), 4, lp)
|
||||||
|
|
||||||
|
@inbounds begin
|
||||||
|
|
||||||
|
so[b,r] = (4+m0)*si[b,r] + 0.5*csw*im*( Fcsw[b,1,r]*dmul(Gamma{10},si[b,r]) + Fcsw[b,2,r]*dmul(Gamma{11},si[b,r]) + Fcsw[b,3,r]*dmul(Gamma{12},si[b,r])
|
||||||
|
+Fcsw[b,4,r]*dmul(Gamma{15},si[b,r]) + Fcsw[b,5,r]*dmul(Gamma{14},si[b,r]) + Fcsw[b,6,r]*dmul(Gamma{13},si[b,r]))
|
||||||
|
|
||||||
|
|
||||||
|
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
|
||||||
|
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
|
||||||
|
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
|
||||||
|
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
|
||||||
|
|
||||||
|
if (point_time((b,r),lp) == 2) || (point_time((b,r),lp) == lp.iL[4])
|
||||||
|
so[b,r] += (ct-1.0)*si[b,r]
|
||||||
|
end
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
so[b,r] = dmul(Gamma{5}, so[b,r])+ im*tm*si[b,r]
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
function krnl_g5Dw!(so, U, si, m0, tm, th, ct, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D}}) where {D}
|
||||||
|
|
||||||
|
# The field si is assumed to be zero at t = 0
|
||||||
|
|
||||||
|
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
|
||||||
|
|
||||||
|
if (point_time((b,r),lp) != 1)
|
||||||
|
|
||||||
|
bu1, ru1 = up((b,r), 1, lp)
|
||||||
|
bd1, rd1 = dw((b,r), 1, lp)
|
||||||
|
bu2, ru2 = up((b,r), 2, lp)
|
||||||
|
bd2, rd2 = dw((b,r), 2, lp)
|
||||||
|
bu3, ru3 = up((b,r), 3, lp)
|
||||||
|
bd3, rd3 = dw((b,r), 3, lp)
|
||||||
|
bu4, ru4 = up((b,r), 4, lp)
|
||||||
|
bd4, rd4 = dw((b,r), 4, lp)
|
||||||
|
|
||||||
|
@inbounds begin
|
||||||
|
|
||||||
|
so[b,r] = (4+m0)*si[b,r]
|
||||||
|
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
|
||||||
|
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
|
||||||
|
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
|
||||||
|
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
|
||||||
|
|
||||||
|
if (point_time((b,r),lp) == 2) || (point_time((b,r),lp) == lp.iL[4])
|
||||||
|
so[b,r] += (ct-1.0)*si[b,r]
|
||||||
|
end
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
so[b,r] = dmul(Gamma{5}, so[b,r]) + im*tm*si[b,r]
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
function DwdagDw!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D}}) where {D}
|
||||||
|
|
||||||
|
SF_bndfix!(si,lp)
|
||||||
|
if abs(dpar.csw) > 1.0E-10
|
||||||
|
@timeit "DwdagDw" begin
|
||||||
|
|
||||||
|
@timeit "g5Dw" begin
|
||||||
|
CUDA.@sync begin
|
||||||
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dwimpr!(dws.st, U, si, dws.csw, dpar.m0, dpar.tm, dpar.th, dpar.csw, dpar.ct, lp)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
SF_bndfix!(dws.st,lp)
|
||||||
|
@timeit "g5Dw" begin
|
||||||
|
CUDA.@sync begin
|
||||||
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dwimpr!(so, U, dws.st, dws.csw, dpar.m0, -dpar.tm, dpar.th, dpar.csw, dpar.ct, lp)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
SF_bndfix!(so,lp)
|
||||||
|
end
|
||||||
|
else
|
||||||
|
@timeit "DwdagDw" begin
|
||||||
|
|
||||||
|
@timeit "g5Dw" begin
|
||||||
|
CUDA.@sync begin
|
||||||
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dw!(dws.st, U, si, dpar.m0, dpar.tm, dpar.th, dpar.ct, lp)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
SF_bndfix!(dws.st,lp)
|
||||||
|
@timeit "g5Dw" begin
|
||||||
|
CUDA.@sync begin
|
||||||
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dw!(so, U, dws.st, dpar.m0, -dpar.tm, dpar.th, dpar.ct, lp)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
SF_bndfix!(so,lp)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
|
@ -31,7 +31,7 @@ scalar_field(::Type{T}, lp::SpaceParm) where {T} = CuArray{T, 2}(undef, lp.b
|
||||||
"""
|
"""
|
||||||
nscalar_field(::Type{T}, n::Integer, lp::SpaceParm)
|
nscalar_field(::Type{T}, n::Integer, lp::SpaceParm)
|
||||||
|
|
||||||
Returns `n` scalar fields of elemental type `T`
|
Returns `n` scalar fields of elemental type `T`.
|
||||||
"""
|
"""
|
||||||
nscalar_field(::Type{T}, n, lp::SpaceParm) where {T} = CuArray{T, 3}(undef, lp.bsz, n, lp.rsz)
|
nscalar_field(::Type{T}, n, lp::SpaceParm) where {T} = CuArray{T, 3}(undef, lp.bsz, n, lp.rsz)
|
||||||
|
|
||||||
|
@ -46,7 +46,7 @@ scalar_field_point(::Type{T}, lp::SpaceParm{N,M,D}) where {T,N,M,D} = CuArray{T,
|
||||||
"""
|
"""
|
||||||
tensor_field(::Type{T}, lp::SpaceParm)
|
tensor_field(::Type{T}, lp::SpaceParm)
|
||||||
|
|
||||||
Returns a tensor field of elemental type `T`.
|
Returns a (symmetric) tensor field of elemental type `T`.
|
||||||
"""
|
"""
|
||||||
tensor_field(::Type{T}, lp::SpaceParm) where {T} = CuArray{T, 3}(undef, lp.bsz, lp.npls, lp.rsz)
|
tensor_field(::Type{T}, lp::SpaceParm) where {T} = CuArray{T, 3}(undef, lp.bsz, lp.npls, lp.rsz)
|
||||||
|
|
||||||
|
|
|
@ -36,7 +36,7 @@ norm2(a::SU2{T}) where T <: AbstractFloat = abs2(a.t1) + abs2(a.t2)
|
||||||
"""
|
"""
|
||||||
tr(g::T) where T <: Group
|
tr(g::T) where T <: Group
|
||||||
|
|
||||||
Returns the trace of the groups element `g`.
|
Returns the trace of the group element `g`.
|
||||||
"""
|
"""
|
||||||
tr(g::SU2{T}) where T <: AbstractFloat = complex(2*real(g.t1), 0.0)
|
tr(g::SU2{T}) where T <: AbstractFloat = complex(2*real(g.t1), 0.0)
|
||||||
|
|
||||||
|
|
|
@ -60,6 +60,7 @@ using .Dirac
|
||||||
export DiracWorkspace, DiracParam
|
export DiracWorkspace, DiracParam
|
||||||
export Dw!, g5Dw!, DwdagDw!, SF_bndfix!, Csw!, pfrandomize!, mtwmdpar
|
export Dw!, g5Dw!, DwdagDw!, SF_bndfix!, Csw!, pfrandomize!, mtwmdpar
|
||||||
export read_prop, save_prop, read_dpar
|
export read_prop, save_prop, read_dpar
|
||||||
|
export Nablanabla!, flw, backflow
|
||||||
|
|
||||||
include("Solvers/Solvers.jl")
|
include("Solvers/Solvers.jl")
|
||||||
using .Solvers
|
using .Solvers
|
||||||
|
|
|
@ -17,6 +17,8 @@ function propagator!(pro, U, dpar::DiracParam{T}, dws::DiracWorkspace, lp::Space
|
||||||
return nothing
|
return nothing
|
||||||
end
|
end
|
||||||
|
|
||||||
|
@timeit "Propagator computation" begin
|
||||||
|
|
||||||
fill!(dws.sp,zero(eltype(scalar_field(Spinor{4,SU3fund{Float64}},lp))))
|
fill!(dws.sp,zero(eltype(scalar_field(Spinor{4,SU3fund{Float64}},lp))))
|
||||||
|
|
||||||
CUDA.@allowscalar dws.sp[point_index(CartesianIndex{lp.ndim}(y),lp)...] = Spinor{4,SU3fund{Float64}}(ntuple(i -> (i==s)*SU3fund{Float64}(ntuple(j -> (j==c)*1.0,3)...),4))
|
CUDA.@allowscalar dws.sp[point_index(CartesianIndex{lp.ndim}(y),lp)...] = Spinor{4,SU3fund{Float64}}(ntuple(i -> (i==s)*SU3fund{Float64}(ntuple(j -> (j==c)*1.0,3)...),4))
|
||||||
|
@ -28,6 +30,8 @@ function propagator!(pro, U, dpar::DiracParam{T}, dws::DiracWorkspace, lp::Space
|
||||||
g5Dw!(pro,U,dws.sp,mtwmdpar(dpar),dws,lp)
|
g5Dw!(pro,U,dws.sp,mtwmdpar(dpar),dws,lp)
|
||||||
|
|
||||||
niter = CG!(pro,U,DwdagDw!,dpar,lp,dws,maxiter,tol)
|
niter = CG!(pro,U,DwdagDw!,dpar,lp,dws,maxiter,tol)
|
||||||
|
end
|
||||||
|
|
||||||
return niter
|
return niter
|
||||||
end
|
end
|
||||||
|
|
||||||
|
@ -40,6 +44,9 @@ function propagator!(pro, U, dpar::DiracParam{T}, dws::DiracWorkspace, lp::Space
|
||||||
return nothing
|
return nothing
|
||||||
end
|
end
|
||||||
|
|
||||||
|
@timeit "Propagator computation" begin
|
||||||
|
fill!(dws.sp,zero(eltype(scalar_field(Spinor{4,SU3fund{Float64}},lp))))
|
||||||
|
|
||||||
pfrandomize!(dws.sp,lp,time)
|
pfrandomize!(dws.sp,lp,time)
|
||||||
|
|
||||||
CUDA.@sync begin
|
CUDA.@sync begin
|
||||||
|
@ -49,6 +56,8 @@ function propagator!(pro, U, dpar::DiracParam{T}, dws::DiracWorkspace, lp::Space
|
||||||
g5Dw!(pro,U,dws.sp,mtwmdpar(dpar),dws,lp)
|
g5Dw!(pro,U,dws.sp,mtwmdpar(dpar),dws,lp)
|
||||||
|
|
||||||
niter = CG!(pro,U,DwdagDw!,dpar,lp,dws,maxiter,tol)
|
niter = CG!(pro,U,DwdagDw!,dpar,lp,dws,maxiter,tol)
|
||||||
|
end
|
||||||
|
|
||||||
return niter
|
return niter
|
||||||
end
|
end
|
||||||
|
|
||||||
|
@ -81,6 +90,7 @@ function bndpropagator!(pro, U, dpar::DiracParam{T}, dws::DiracWorkspace, lp::Sp
|
||||||
return nothing
|
return nothing
|
||||||
end
|
end
|
||||||
|
|
||||||
|
@timeit "Propagator computation" begin
|
||||||
SF_bndfix!(pro,lp)
|
SF_bndfix!(pro,lp)
|
||||||
fill!(dws.sp,zero(eltype(scalar_field(Spinor{4,SU3fund{Float64}},lp))))
|
fill!(dws.sp,zero(eltype(scalar_field(Spinor{4,SU3fund{Float64}},lp))))
|
||||||
|
|
||||||
|
@ -95,6 +105,8 @@ function bndpropagator!(pro, U, dpar::DiracParam{T}, dws::DiracWorkspace, lp::Sp
|
||||||
g5Dw!(pro,U,dpar.ct*dws.sp,mtwmdpar(dpar),dws,lp)
|
g5Dw!(pro,U,dpar.ct*dws.sp,mtwmdpar(dpar),dws,lp)
|
||||||
|
|
||||||
niter = CG!(pro,U,DwdagDw!,dpar,lp,dws,maxiter,tol)
|
niter = CG!(pro,U,DwdagDw!,dpar,lp,dws,maxiter,tol)
|
||||||
|
end
|
||||||
|
|
||||||
return niter
|
return niter
|
||||||
end
|
end
|
||||||
|
|
||||||
|
@ -126,7 +138,7 @@ function Tbndpropagator!(pro, U, dpar::DiracParam{T}, dws::DiracWorkspace, lp::S
|
||||||
return nothing
|
return nothing
|
||||||
end
|
end
|
||||||
|
|
||||||
SF_bndfix!(pro,lp)
|
@timeit "Propagator computation" begin
|
||||||
fill!(dws.sp,zero(eltype(scalar_field(Spinor{4,SU3fund{Float64}},lp))))
|
fill!(dws.sp,zero(eltype(scalar_field(Spinor{4,SU3fund{Float64}},lp))))
|
||||||
|
|
||||||
CUDA.@sync begin
|
CUDA.@sync begin
|
||||||
|
@ -137,9 +149,11 @@ function Tbndpropagator!(pro, U, dpar::DiracParam{T}, dws::DiracWorkspace, lp::S
|
||||||
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnlg5!(dws.sp)
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnlg5!(dws.sp)
|
||||||
end
|
end
|
||||||
|
|
||||||
|
|
||||||
g5Dw!(pro,U,dpar.ct*dws.sp,mtwmdpar(dpar),dws,lp)
|
g5Dw!(pro,U,dpar.ct*dws.sp,mtwmdpar(dpar),dws,lp)
|
||||||
|
|
||||||
niter = CG!(pro,U,DwdagDw!,dpar,lp,dws,maxiter,tol)
|
niter = CG!(pro,U,DwdagDw!,dpar,lp,dws,maxiter,tol)
|
||||||
|
end
|
||||||
return niter
|
return niter
|
||||||
end
|
end
|
||||||
|
|
||||||
|
|
|
@ -26,19 +26,19 @@ This structure contains information about the lattice being simulated. The param
|
||||||
- `N`: The number of dimensions
|
- `N`: The number of dimensions
|
||||||
- `M`: The number of planes (i.e. \`\` N(N-1)/2 \`\`)
|
- `M`: The number of planes (i.e. \`\` N(N-1)/2 \`\`)
|
||||||
- `B`: The boundary conditions in Euclidean time. Acceptable values are
|
- `B`: The boundary conditions in Euclidean time. Acceptable values are
|
||||||
- `BC_PERIODIC`: Periodic boundary conditions
|
- `BC_PERIODIC`: Periodic boundary conditions.
|
||||||
- `BC_SF_AFWB`: Schrödinger Funtional Aoki-Frezzoptti-Weisz Choice B.
|
- `BC_SF_AFWB`: Schrödinger Functional Aoki-Frezzotti-Weisz Choice B.
|
||||||
- `BC_SF_ORBI`: Schrödinger Funtional orbifold constructions.
|
- `BC_SF_ORBI`: Schrödinger Functional orbifold constructions.
|
||||||
- `BC_OPEN`: Open boundary conditions.
|
- `BC_OPEN`: Open boundary conditions.
|
||||||
|
|
||||||
The structure conatins the following components:
|
The structure contains the following components:
|
||||||
- `iL`: Tuple containing the lattice length in each dimension.
|
- `iL`: Tuple containing the lattice length in each dimension.
|
||||||
- `plidx`: The directions of each plane
|
- `plidx`: The directions of each plane.
|
||||||
- `blk`: The block size in each each dimension
|
- `blk`: The block size in each each dimension.
|
||||||
- `rbk`: The number of blocks in each dimension
|
- `rbk`: The number of blocks in each dimension.
|
||||||
- `bsz`: The number of points in each block
|
- `bsz`: The number of points in each block.
|
||||||
- `rsz`: The number of blocks in the lattice
|
- `rsz`: The number of blocks in the lattice.
|
||||||
- `ntw`: The twist tensor in each plane
|
- `ntw`: The twist tensor in each plane.
|
||||||
"""
|
"""
|
||||||
struct SpaceParm{N,M,B,D}
|
struct SpaceParm{N,M,B,D}
|
||||||
ndim::Int64
|
ndim::Int64
|
||||||
|
|
|
@ -14,6 +14,7 @@ module Spinors
|
||||||
using ..Groups
|
using ..Groups
|
||||||
import ..Groups.imm, ..Groups.mimm, ..Groups.norm, ..Groups.norm2, ..Groups.dot
|
import ..Groups.imm, ..Groups.mimm, ..Groups.norm, ..Groups.norm2, ..Groups.dot
|
||||||
|
|
||||||
|
|
||||||
struct Spinor{NS,G}
|
struct Spinor{NS,G}
|
||||||
s::NTuple{NS,G}
|
s::NTuple{NS,G}
|
||||||
end
|
end
|
||||||
|
@ -291,25 +292,23 @@ end
|
||||||
"""
|
"""
|
||||||
dmul(Gamma{n}, a::Spinor)
|
dmul(Gamma{n}, a::Spinor)
|
||||||
|
|
||||||
Returns ``\\gamma_n a``
|
Returns ``\\gamma_n a``. Indexing for Dirac basis ``\\gamma_n``:
|
||||||
|
|
||||||
indexing for Dirac basis ``\\gamma_n``:
|
1 ``\\gamma_1``;
|
||||||
|
2 ``\\gamma_2``;
|
||||||
1 gamma1;
|
3 ``\\gamma_3``;
|
||||||
2 gamma2;
|
4 ``\\gamma_0``;
|
||||||
3 gamma3;
|
5 ``\\gamma_5``;
|
||||||
4 gamma0;
|
6 ``\\gamma_1 \\gamma_5``;
|
||||||
5 gamma5;
|
7 ``\\gamma_2 \\gamma_5``;
|
||||||
6 gamma1 gamma5;
|
8 ``\\gamma_3 \\gamma_5``;
|
||||||
7 gamma2 gamma5;
|
9 ``\\gamma_0 \\gamma_5``;
|
||||||
8 gamma3 gamma5;
|
10 ``\\sigma_{01}``;
|
||||||
9 gamma0 gamma5;
|
11 ``\\sigma_{02}``;
|
||||||
10 sigma01;
|
12 ``\\sigma_{03}``;
|
||||||
11 sigma02;
|
13 ``\\sigma_{21}``;
|
||||||
12 sigma03;
|
14 ``\\sigma_{32}``;
|
||||||
13 sigma21;
|
15 ``\\sigma_{31}``;
|
||||||
14 sigma32;
|
|
||||||
15 sigma31;
|
|
||||||
16 identity;
|
16 identity;
|
||||||
|
|
||||||
"""
|
"""
|
||||||
|
|
20
src/YM/YM.jl
20
src/YM/YM.jl
|
@ -23,15 +23,15 @@ import Base.show
|
||||||
"""
|
"""
|
||||||
struct GaugeParm{T,G,N}
|
struct GaugeParm{T,G,N}
|
||||||
|
|
||||||
Structure containning the parameters of a pure gauge simulation. These are:
|
Structure containing the parameters of a pure gauge simulation. These are:
|
||||||
- beta: Type `T`. The bare coupling of the simulation
|
- beta: Type `T`. The bare coupling of the simulation.
|
||||||
- c0: Type `T`. LatticeGPU supports the simulation of gauge actions made of 1x1 Wilson Loops and 2x1 Wilson loops. The parameter c0 defines the coefficient on the simulation of the 1x1 loops. Some common choices are:
|
- c0: Type `T`. LatticeGPU supports the simulation of gauge actions made of 1x1 Wilson Loops and 2x1 Wilson loops. The parameter c0 defines the coefficient on the simulation of the 1x1 loops. Some common choices are:
|
||||||
- c0=1: Wilson plaquette action
|
- c0=1: Wilson plaquette action.
|
||||||
- c0=5/3: Tree-level improved Lüscher-Weisz action.
|
- c0=5/3: Tree-level improved Lüscher-Weisz action.
|
||||||
- c0=3.648: Iwasaki gauge action
|
- c0=3.648: Iwasaki gauge action.
|
||||||
- cG: Tuple (`T`, `T`). Boundary improvement parameters.
|
- cG: Tuple (`T`, `T`). Boundary improvement parameters.
|
||||||
- ng: `Int64`. Rank of the gauge group.
|
- ng: `Int64`. Rank of the gauge group.
|
||||||
- Ubnd: Boundary field for SF boundary conditions
|
- Ubnd: Boundary field for SF boundary conditions.
|
||||||
"""
|
"""
|
||||||
struct GaugeParm{T,G,N}
|
struct GaugeParm{T,G,N}
|
||||||
beta::T
|
beta::T
|
||||||
|
@ -79,11 +79,11 @@ end
|
||||||
"""
|
"""
|
||||||
struct YMworkspace{T}
|
struct YMworkspace{T}
|
||||||
|
|
||||||
Structure containing memory workspace that is resused by different routines in order to avoid allocating/deallocating time.
|
Structure containing memory workspace that is reused by different routines in order to avoid allocating/deallocating time.
|
||||||
The parameter `T` represents the precision of the simulation (i.e. single/double). The structure contains the following components
|
The parameter `T` represents the precision of the simulation (i.e. single/double). The structure contains the following components
|
||||||
- GRP: Group being simulated
|
- GRP: Group being simulated.
|
||||||
- ALG: Corresponding Algebra
|
- ALG: Corresponding Algebra.
|
||||||
- PRC: Precision (i.e. `T`)
|
- PRC: Precision (i.e. `T`).
|
||||||
- frc1: Algebra field with natural indexing.
|
- frc1: Algebra field with natural indexing.
|
||||||
- frc2: Algebra field with natural indexing.
|
- frc2: Algebra field with natural indexing.
|
||||||
- mom: Algebra field with natural indexing.
|
- mom: Algebra field with natural indexing.
|
||||||
|
@ -141,7 +141,7 @@ end
|
||||||
"""
|
"""
|
||||||
function ztwist(gp::GaugeParm{T,G}, lp::SpaceParm{N,M,B,D}[, ipl])
|
function ztwist(gp::GaugeParm{T,G}, lp::SpaceParm{N,M,B,D}[, ipl])
|
||||||
|
|
||||||
Returns the twist factor. If a plane index is passed, returns the twist factor as a complex{T}. If this is not provided, returns a tuple, containing the factor of each plane.
|
Returns the twist factor. If a plane index is passed, returns the twist factor as a Complex{T}. If this is not provided, returns a tuple, containing the factor of each plane.
|
||||||
"""
|
"""
|
||||||
function ztwist(gp::GaugeParm{T,G}, lp::SpaceParm{N,M,B,D}) where {T,G,N,M,B,D}
|
function ztwist(gp::GaugeParm{T,G}, lp::SpaceParm{N,M,B,D}) where {T,G,N,M,B,D}
|
||||||
|
|
||||||
|
|
805
src/YM/YMact.jl
805
src/YM/YMact.jl
|
@ -9,7 +9,11 @@
|
||||||
### created: Mon Jul 12 18:31:19 2021
|
### created: Mon Jul 12 18:31:19 2021
|
||||||
###
|
###
|
||||||
|
|
||||||
function krnl_impr!(plx, U::AbstractArray{T}, c0, c1, Ubnd::NTuple{NB,T}, cG, ztw, lp::SpaceParm{N,M,B,D}) where {T,NB,N,M,B,D}
|
|
||||||
|
##
|
||||||
|
## OPEN
|
||||||
|
##
|
||||||
|
function krnl_impr!(plx, U::AbstractArray{T}, c0, c1, Ubnd::NTuple{NB,T}, cG, ztw, lp::SpaceParm{N,M,BC_OPEN,D}) where {T,NB,N,M,D}
|
||||||
|
|
||||||
b = Int64(CUDA.threadIdx().x)
|
b = Int64(CUDA.threadIdx().x)
|
||||||
r = Int64(CUDA.blockIdx().x)
|
r = Int64(CUDA.blockIdx().x)
|
||||||
|
@ -21,7 +25,404 @@ function krnl_impr!(plx, U::AbstractArray{T}, c0, c1, Ubnd::NTuple{NB,T}, cG, zt
|
||||||
@inbounds begin
|
@inbounds begin
|
||||||
for id1 in N:-1:1
|
for id1 in N:-1:1
|
||||||
bu1, ru1 = up((b, r), id1, lp)
|
bu1, ru1 = up((b, r), id1, lp)
|
||||||
SFBC = ((B == BC_SF_AFWB) || (B == BC_SF_ORBI) ) && (id1==N)
|
TOBC = (id1==N)
|
||||||
|
|
||||||
|
for id2 = 1:id1-1
|
||||||
|
bu2, ru2 = up((b, r), id2, lp)
|
||||||
|
ipl = ipl + 1
|
||||||
|
|
||||||
|
TWP = (I[id1]==1) && (I[id2]==1)
|
||||||
|
TWH2 = TWP || ( (I[id1]==lp.iL[id1]) && (I[id2]==1) )
|
||||||
|
TWH3 = TWP || ( (I[id1]==1) && (I[id2]==lp.iL[id2]) )
|
||||||
|
|
||||||
|
# H2 staple
|
||||||
|
(b1, r1) = up((b,r), id1, lp)
|
||||||
|
(b2, r2) = up((b1,r1), id1, lp)
|
||||||
|
gb = U[b2,id2,r2]
|
||||||
|
|
||||||
|
(b2, r2) = up((b1,r1), id2, lp)
|
||||||
|
h2 = (U[b1,id1,r1]*gb)/U[b2,id1,r2]
|
||||||
|
|
||||||
|
# H3 staple
|
||||||
|
(b1, r1) = up((b,r), id2, lp)
|
||||||
|
(b2, r2) = up((b1,r1), id2, lp)
|
||||||
|
|
||||||
|
(b3, r3) = up((b1,r1), id1, lp)
|
||||||
|
|
||||||
|
gc = U[b3,id2,r3]
|
||||||
|
|
||||||
|
h3 = (U[b1,id2,r1]*U[b2,id1,r2])/gc
|
||||||
|
# END staples
|
||||||
|
|
||||||
|
ga = U[bu1,id2,ru1]
|
||||||
|
|
||||||
|
g2 = U[b,id2,r]\U[b,id1,r]
|
||||||
|
|
||||||
|
if ( (it == lp.iL[end]) || (it == 1) ) && !TOBC
|
||||||
|
S += 0.5*cG*(c0*tr(g2*ga/U[bu2,id1,ru2]) + c1*tr(g2*ga/h3) + c1*tr(g2*h2/U[bu2,id1,ru2]))
|
||||||
|
elseif (it == lp.iL[end]-1) && TOBC
|
||||||
|
S += c0*tr(g2*ga/U[bu2,id1,ru2]) + c1*tr(g2*ga/h3)
|
||||||
|
elseif (it == lp.iL[end]) && TOBC
|
||||||
|
nothing
|
||||||
|
else
|
||||||
|
if TWP
|
||||||
|
S += (ztw[ipl]*c0)*tr(g2*ga/U[bu2,id1,ru2])
|
||||||
|
else
|
||||||
|
S += c0*tr(g2*ga/U[bu2,id1,ru2])
|
||||||
|
end
|
||||||
|
if TWH2
|
||||||
|
S += (ztw[ipl]*c1)*tr(g2*h2/U[bu2,id1,ru2])
|
||||||
|
else
|
||||||
|
S += c1*tr(g2*h2/U[bu2,id1,ru2])
|
||||||
|
end
|
||||||
|
if TWH3
|
||||||
|
S += (ztw[ipl]*c1)*tr(g2*ga/h3)
|
||||||
|
else
|
||||||
|
S += c1*tr(g2*ga/h3)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
plx[I] = S
|
||||||
|
end
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
function krnl_plaq!(plx, U::AbstractArray{T}, Ubnd, cG, ztw, lp::SpaceParm{N,M,BC_OPEN,D}) where {T,N,M,D}
|
||||||
|
|
||||||
|
@inbounds begin
|
||||||
|
b = Int64(CUDA.threadIdx().x)
|
||||||
|
r = Int64(CUDA.blockIdx().x)
|
||||||
|
I = point_coord((b,r), lp)
|
||||||
|
it = I[N]
|
||||||
|
|
||||||
|
S = zero(eltype(plx))
|
||||||
|
ipl = 0
|
||||||
|
for id1 in N:-1:1
|
||||||
|
bu1, ru1 = up((b, r), id1, lp)
|
||||||
|
TOBC = (id1==N)
|
||||||
|
|
||||||
|
for id2 = 1:id1-1
|
||||||
|
bu2, ru2 = up((b, r), id2, lp)
|
||||||
|
ipl = ipl + 1
|
||||||
|
TWP = (I[id1]==1) && (I[id2]==1)
|
||||||
|
|
||||||
|
gt1 = U[bu1,id2,ru1]
|
||||||
|
|
||||||
|
if ( (it == lp.iL[end]) || (it == 1)) && !TOBC
|
||||||
|
S += 0.5*cG*(tr(U[b,id1,r]*gt1 / (U[b,id2,r]*U[bu2,id1,ru2])))
|
||||||
|
elseif (it == lp.iL[end]) && TOBC
|
||||||
|
nothing
|
||||||
|
else
|
||||||
|
if TWP
|
||||||
|
S += ztw[ipl]*tr(U[b,id1,r]*gt1 / (U[b,id2,r]*U[bu2,id1,ru2]))
|
||||||
|
else
|
||||||
|
S += tr(U[b,id1,r]*gt1 / (U[b,id2,r]*U[bu2,id1,ru2]))
|
||||||
|
end
|
||||||
|
end
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
plx[I] = S
|
||||||
|
end
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
function krnl_force_wilson_pln!(frc1, frc2, U::AbstractArray{T}, Ubnd, cG, ztw, ipl, lp::SpaceParm{N,M,BC_OPEN,D}) where {T,N,M,D}
|
||||||
|
|
||||||
|
b = Int64(CUDA.threadIdx().x)
|
||||||
|
r = Int64(CUDA.blockIdx().x)
|
||||||
|
I = point_coord((b,r), lp)
|
||||||
|
it = I[N]
|
||||||
|
|
||||||
|
@inbounds begin
|
||||||
|
id1, id2 = lp.plidx[ipl]
|
||||||
|
bu1, ru1 = up((b, r), id1, lp)
|
||||||
|
bu2, ru2 = up((b, r), id2, lp)
|
||||||
|
TWP = (I[id1]==1)&&(I[id2]==1)
|
||||||
|
|
||||||
|
TOBC = (id1 == N)
|
||||||
|
|
||||||
|
gt1 = U[bu1,id2,ru1]
|
||||||
|
|
||||||
|
g1 = gt1/U[bu2,id1,ru2]
|
||||||
|
g2 = U[b,id2,r]\U[b,id1,r]
|
||||||
|
|
||||||
|
if !TOBC && ( (it == 1) || (it == lp.iL[end]) )
|
||||||
|
X = 0.5*cG*projalg(U[b,id1,r]*g1/U[b,id2,r])
|
||||||
|
|
||||||
|
frc1[b ,id1, r ] -= X
|
||||||
|
frc1[b ,id2, r ] += X
|
||||||
|
frc2[bu1,id2,ru1] -= 0.5*cG*projalg(g1*g2)
|
||||||
|
frc2[bu2,id1,ru2] += 0.5*cG*projalg(g2*g1)
|
||||||
|
elseif TOBC && (it == lp.iL[end])
|
||||||
|
nothing
|
||||||
|
else
|
||||||
|
if TWP
|
||||||
|
X = projalg(ztw,U[b,id1,r]*g1/U[b,id2,r])
|
||||||
|
frc2[bu1,id2,ru1] -= projalg(ztw,g1*g2)
|
||||||
|
frc2[bu2,id1,ru2] += projalg(ztw,g2*g1)
|
||||||
|
else
|
||||||
|
X = projalg(U[b,id1,r]*g1/U[b,id2,r])
|
||||||
|
frc2[bu1,id2,ru1] -= projalg(g1*g2)
|
||||||
|
frc2[bu2,id1,ru2] += projalg(g2*g1)
|
||||||
|
end
|
||||||
|
frc1[b ,id1, r ] -= X
|
||||||
|
frc1[b ,id2, r ] += X
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
function krnl_force_impr_pln!(frc1, frc2, U::AbstractArray{T}, c0, c1, Ubnd, cG, ztw, ipl, lp::SpaceParm{N,M,BC_OPEN,D}) where {T,N,M,D}
|
||||||
|
|
||||||
|
b = Int64(CUDA.threadIdx().x)
|
||||||
|
r = Int64(CUDA.blockIdx().x)
|
||||||
|
I = point_coord((b,r), lp)
|
||||||
|
it = I[N]
|
||||||
|
|
||||||
|
@inbounds begin
|
||||||
|
id1, id2 = lp.plidx[ipl]
|
||||||
|
bu1, ru1 = up((b, r), id1, lp)
|
||||||
|
bu2, ru2 = up((b, r), id2, lp)
|
||||||
|
|
||||||
|
TOBC = (id1 == N)
|
||||||
|
TWP = (I[id1]==1) && (I[id2]==1)
|
||||||
|
TWH1 = TWP || ( (I[id1]==1) && (I[id2]==2) )
|
||||||
|
TWH2 = TWP || ( (I[id1]==lp.iL[id1]) && (I[id2]==1) )
|
||||||
|
TWH3 = TWP || ( (I[id1]==1) && (I[id2]==lp.iL[id2]) )
|
||||||
|
TWH4 = TWP || ( (I[id1]==2) && (I[id2]==1) )
|
||||||
|
|
||||||
|
# H1 staple
|
||||||
|
(b1, r1) = dw((b,r), id2, lp)
|
||||||
|
(b2, r2) = up((b1,r1), id1, lp)
|
||||||
|
gc = U[b2,id2,r2]
|
||||||
|
h1 = (U[b1,id2,r1]\U[b1,id1,r1])*gc
|
||||||
|
|
||||||
|
# H2 staple
|
||||||
|
(b1, r1) = up((b,r), id1, lp)
|
||||||
|
(b2, r2) = up((b1,r1), id1, lp)
|
||||||
|
gb = U[b2,id2,r2]
|
||||||
|
|
||||||
|
(b2, r2) = up((b1,r1), id2, lp)
|
||||||
|
h2 = (U[b1,id1,r1]*gb)/U[b2,id1,r2]
|
||||||
|
|
||||||
|
# H3 staple
|
||||||
|
(b1, r1) = up((b,r), id2, lp)
|
||||||
|
(b2, r2) = up((b1,r1), id2, lp)
|
||||||
|
(b3, r3) = up((b1,r1), id1, lp)
|
||||||
|
gc = U[b3,id2,r3]
|
||||||
|
h3 = (U[b1,id2,r1]*U[b2,id1,r2])/gc
|
||||||
|
|
||||||
|
# H4 staple
|
||||||
|
(b1, r1) = dw((b,r), id1, lp)
|
||||||
|
(b2, r2) = up((b1,r1), id2, lp)
|
||||||
|
h4 = (U[b1,id1,r1]\U[b1,id2,r1])*U[b2,id1,r2]
|
||||||
|
# END staples
|
||||||
|
|
||||||
|
ga = U[bu1,id2,ru1]
|
||||||
|
|
||||||
|
g1 = ga/U[bu2,id1,ru2]
|
||||||
|
g2 = U[b,id2,r]\U[b,id1,r]
|
||||||
|
|
||||||
|
if !TOBC && ( (it == 1) || (it == lp.iL[end]) )
|
||||||
|
X = 0.5*cG*(c0*projalg(U[b,id1,r]*g1/U[b,id2,r]) + c1*projalg(U[b,id1,r]*h2/(U[b,id2,r]*U[bu2,id1,ru2])) + c1*projalg(U[b,id1,r]*ga/(U[b,id2,r]*h3)) )
|
||||||
|
|
||||||
|
frc1[b,id1,r] -= X + 0.5*cG*c1*projalg(U[b,id1,r]*g1/h4)
|
||||||
|
frc1[b,id2,r] += X + 0.5*cG*c1*projalg(h1*g1/U[b,id2,r])
|
||||||
|
frc2[bu1,id2,ru1] -= 0.5*cG*c0*projalg(g1*g2)
|
||||||
|
frc2[bu2,id1,ru2] += 0.5*cG*c0*projalg(g2*g1)
|
||||||
|
frc2[bu1,id2,ru1] -= 0.5*cG*c1*projalg((g1/U[b,id2,r])*h1)
|
||||||
|
frc2[bu2,id1,ru2] += 0.5*cG*c1*projalg((U[b,id2,r]\h1)*g1)
|
||||||
|
frc2[bu2,id1,ru2] += 0.5*cG*c1*projalg(g2*h2/U[bu2,id1,ru2])
|
||||||
|
frc2[bu1,id2,ru1] -= 0.5*cG*c1*projalg((ga/h3)*g2)
|
||||||
|
frc2[bu1,id2,ru1] -= 0.5*cG*c1*projalg((g1/h4)*U[b,id1,r])
|
||||||
|
frc2[bu2,id1,ru2] += 0.5*cG*c1*projalg(h4\U[b,id1,r]*g1)
|
||||||
|
elseif TOBC && (it == lp.iL[end])
|
||||||
|
nothing
|
||||||
|
elseif TOBC && (it == 1)
|
||||||
|
X = c0*projalg(U[b,id1,r]*g1/U[b,id2,r]) + c1*projalg(U[b,id1,r]*h2/(U[b,id2,r]*U[bu2,id1,ru2])) + c1*projalg(U[b,id1,r]*ga/(U[b,id2,r]*h3))
|
||||||
|
|
||||||
|
frc1[b,id1,r] -= X
|
||||||
|
frc1[b,id2,r] += X + c1*projalg(h1*g1/U[b,id2,r])
|
||||||
|
frc2[bu1,id2,ru1] -= c0*projalg(g1*g2)
|
||||||
|
frc2[bu2,id1,ru2] += c0*projalg(g2*g1)
|
||||||
|
frc2[bu1,id2,ru1] -= c1*projalg((g1/U[b,id2,r])*h1)
|
||||||
|
frc2[bu2,id1,ru2] += c1*projalg((U[b,id2,r]\h1)*g1)
|
||||||
|
frc2[bu2,id1,ru2] += c1*projalg(g2*h2/U[bu2,id1,ru2])
|
||||||
|
frc2[bu1,id2,ru1] -= c1*projalg((ga/h3)*g2)
|
||||||
|
elseif TOBC && (it == (lp.iL[end]-1) )
|
||||||
|
X = c0*projalg(U[b,id1,r]*g1/U[b,id2,r]) + c1*projalg(U[b,id1,r]*ga/(U[b,id2,r]*h3))
|
||||||
|
|
||||||
|
frc1[b,id1,r] -= X + c1*projalg(U[b,id1,r]*g1/h4)
|
||||||
|
frc1[b,id2,r] += X + c1*projalg(h1*g1/U[b,id2,r])
|
||||||
|
frc2[bu1,id2,ru1] -= c0*projalg(g1*g2)
|
||||||
|
frc2[bu2,id1,ru2] += c0*projalg(g2*g1)
|
||||||
|
frc2[bu1,id2,ru1] -= c1*projalg((g1/U[b,id2,r])*h1)
|
||||||
|
frc2[bu2,id1,ru2] += c1*projalg((U[b,id2,r]\h1)*g1)
|
||||||
|
frc2[bu1,id2,ru1] -= c1*projalg((ga/h3)*g2)
|
||||||
|
frc2[bu1,id2,ru1] -= c1*projalg((g1/h4)*U[b,id1,r])
|
||||||
|
frc2[bu2,id1,ru2] += c1*projalg(h4\U[b,id1,r]*g1)
|
||||||
|
else
|
||||||
|
if TWP
|
||||||
|
X = projalg(c0*ztw,U[b,id1,r]*g1/U[b,id2,r])
|
||||||
|
frc2[bu1,id2,ru1] -= projalg(c0*ztw,g1*g2)
|
||||||
|
frc2[bu2,id1,ru2] += projalg(c0*ztw,g2*g1)
|
||||||
|
else
|
||||||
|
X = c0*projalg(U[b,id1,r]*g1/U[b,id2,r])
|
||||||
|
frc2[bu1,id2,ru1] -= c0*projalg(g1*g2)
|
||||||
|
frc2[bu2,id1,ru2] += c0*projalg(g2*g1)
|
||||||
|
end
|
||||||
|
if TWH1
|
||||||
|
frc1[b,id2,r] += projalg(ztw*c1,h1*g1/U[b,id2,r])
|
||||||
|
frc2[bu1,id2,ru1] -= projalg(ztw*c1,(g1/U[b,id2,r])*h1)
|
||||||
|
frc2[bu2,id1,ru2] += projalg(ztw*c1,(U[b,id2,r]\h1)*g1)
|
||||||
|
else
|
||||||
|
frc1[b,id2,r] += c1*projalg(h1*g1/U[b,id2,r])
|
||||||
|
frc2[bu1,id2,ru1] -= c1*projalg((g1/U[b,id2,r])*h1)
|
||||||
|
frc2[bu2,id1,ru2] += c1*projalg((U[b,id2,r]\h1)*g1)
|
||||||
|
end
|
||||||
|
if TWH2
|
||||||
|
X += projalg(ztw*c1,U[b,id1,r]*h2/(U[b,id2,r]*U[bu2,id1,ru2]))
|
||||||
|
frc2[bu2,id1,ru2] += projalg(ztw*c1,g2*h2/U[bu2,id1,ru2])
|
||||||
|
else
|
||||||
|
X += c1*projalg(U[b,id1,r]*h2/(U[b,id2,r]*U[bu2,id1,ru2]))
|
||||||
|
frc2[bu2,id1,ru2] += c1*projalg(g2*h2/U[bu2,id1,ru2])
|
||||||
|
end
|
||||||
|
if TWH3
|
||||||
|
X += projalg(ztw*c1,U[b,id1,r]*ga/(U[b,id2,r]*h3))
|
||||||
|
frc2[bu1,id2,ru1] -= projalg(ztw*c1,(ga/h3)*g2)
|
||||||
|
else
|
||||||
|
X += c1*projalg(U[b,id1,r]*ga/(U[b,id2,r]*h3))
|
||||||
|
frc2[bu1,id2,ru1] -= c1*projalg((ga/h3)*g2)
|
||||||
|
end
|
||||||
|
if TWH4
|
||||||
|
frc1[b,id1,r] -= projalg(ztw*c1,U[b,id1,r]*g1/h4)
|
||||||
|
frc2[bu1,id2,ru1] -= projalg(ztw*c1,(g1/h4)*U[b,id1,r])
|
||||||
|
frc2[bu2,id1,ru2] += projalg(ztw*c1,h4\U[b,id1,r]*g1)
|
||||||
|
else
|
||||||
|
frc1[b,id1,r] -= c1*projalg(U[b,id1,r]*g1/h4)
|
||||||
|
frc2[bu1,id2,ru1] -= c1*projalg((g1/h4)*U[b,id1,r])
|
||||||
|
frc2[bu2,id1,ru2] += c1*projalg(h4\U[b,id1,r]*g1)
|
||||||
|
end
|
||||||
|
frc1[b,id1,r] -= X
|
||||||
|
frc1[b,id2,r] += X
|
||||||
|
|
||||||
|
end
|
||||||
|
|
||||||
|
end
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
|
||||||
|
##
|
||||||
|
## SF
|
||||||
|
##
|
||||||
|
function krnl_impr!(plx, U::AbstractArray{T}, c0, c1, Ubnd::NTuple{NB,T}, cG, ztw, lp::SpaceParm{N,M,BC_SF_ORBI,D}) where {T,NB,N,M,D}
|
||||||
|
|
||||||
|
b = Int64(CUDA.threadIdx().x)
|
||||||
|
r = Int64(CUDA.blockIdx().x)
|
||||||
|
I = point_coord((b,r), lp)
|
||||||
|
it = I[N]
|
||||||
|
|
||||||
|
ipl = 0
|
||||||
|
S = zero(eltype(plx))
|
||||||
|
@inbounds begin
|
||||||
|
for id1 in N:-1:1
|
||||||
|
bu1, ru1 = up((b, r), id1, lp)
|
||||||
|
SFBC = (id1==N)
|
||||||
|
|
||||||
|
for id2 = 1:id1-1
|
||||||
|
bu2, ru2 = up((b, r), id2, lp)
|
||||||
|
ipl = ipl + 1
|
||||||
|
|
||||||
|
TWP = (I[id1]==1) && (I[id2]==1)
|
||||||
|
TWH2 = TWP || ( (I[id1]==lp.iL[id1]) && (I[id2]==1) )
|
||||||
|
TWH3 = TWP || ( (I[id1]==1) && (I[id2]==lp.iL[id2]) )
|
||||||
|
|
||||||
|
# H2 staple
|
||||||
|
(b1, r1) = up((b,r), id1, lp)
|
||||||
|
(b2, r2) = up((b1,r1), id1, lp)
|
||||||
|
if SFBC && (it == lp.iL[end]-1)
|
||||||
|
gb = Ubnd[id2]
|
||||||
|
else
|
||||||
|
gb = U[b2,id2,r2]
|
||||||
|
end
|
||||||
|
|
||||||
|
(b2, r2) = up((b1,r1), id2, lp)
|
||||||
|
h2 = (U[b1,id1,r1]*gb)/U[b2,id1,r2]
|
||||||
|
|
||||||
|
# H3 staple
|
||||||
|
(b1, r1) = up((b,r), id2, lp)
|
||||||
|
(b2, r2) = up((b1,r1), id2, lp)
|
||||||
|
|
||||||
|
(b3, r3) = up((b1,r1), id1, lp)
|
||||||
|
if SFBC && (it == lp.iL[end])
|
||||||
|
gc = Ubnd[id2]
|
||||||
|
else
|
||||||
|
gc = U[b3,id2,r3]
|
||||||
|
end
|
||||||
|
h3 = (U[b1,id2,r1]*U[b2,id1,r2])/gc
|
||||||
|
# END staples
|
||||||
|
|
||||||
|
if SFBC && (it == lp.iL[end])
|
||||||
|
ga = Ubnd[id2]
|
||||||
|
else
|
||||||
|
ga = U[bu1,id2,ru1]
|
||||||
|
end
|
||||||
|
|
||||||
|
g2 = U[b,id2,r]\U[b,id1,r]
|
||||||
|
|
||||||
|
if (it == lp.iL[end]) && SFBC
|
||||||
|
S += cG*(c0*tr(g2*ga/U[bu2,id1,ru2])) + (c1)*tr(g2*ga/h3) + (c1/2)*tr((g2*ga/U[bu2,id1,ru2])*g2*ga/U[bu2,id1,ru2])
|
||||||
|
elseif (it == 1) && SFBC
|
||||||
|
S += cG*(c0*tr(g2*ga/U[bu2,id1,ru2])) + (c1)*tr(g2*ga/h3) + c1*tr(g2*h2/U[bu2,id1,ru2]) + (c1/2)*tr((g2*ga/U[bu2,id1,ru2])*g2*ga/U[bu2,id1,ru2])
|
||||||
|
else
|
||||||
|
if TWP
|
||||||
|
S += (ztw[ipl]*c0)*tr(g2*ga/U[bu2,id1,ru2])
|
||||||
|
else
|
||||||
|
S += c0*tr(g2*ga/U[bu2,id1,ru2])
|
||||||
|
end
|
||||||
|
if TWH2
|
||||||
|
S += (ztw[ipl]*c1)*tr(g2*h2/U[bu2,id1,ru2])
|
||||||
|
else
|
||||||
|
S += c1*tr(g2*h2/U[bu2,id1,ru2])
|
||||||
|
end
|
||||||
|
if TWH3
|
||||||
|
S += (ztw[ipl]*c1)*tr(g2*ga/h3)
|
||||||
|
else
|
||||||
|
S += c1*tr(g2*ga/h3)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
plx[I] = S
|
||||||
|
end
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
function krnl_impr!(plx, U::AbstractArray{T}, c0, c1, Ubnd::NTuple{NB,T}, cG, ztw, lp::SpaceParm{N,M,BC_SF_AFWB,D}) where {T,NB,N,M,D}
|
||||||
|
|
||||||
|
b = Int64(CUDA.threadIdx().x)
|
||||||
|
r = Int64(CUDA.blockIdx().x)
|
||||||
|
I = point_coord((b,r), lp)
|
||||||
|
it = I[N]
|
||||||
|
|
||||||
|
ipl = 0
|
||||||
|
S = zero(eltype(plx))
|
||||||
|
@inbounds begin
|
||||||
|
for id1 in N:-1:1
|
||||||
|
bu1, ru1 = up((b, r), id1, lp)
|
||||||
|
SFBC = (id1==N)
|
||||||
|
|
||||||
for id2 = 1:id1-1
|
for id2 = 1:id1-1
|
||||||
bu2, ru2 = up((b, r), id2, lp)
|
bu2, ru2 = up((b, r), id2, lp)
|
||||||
|
@ -95,7 +496,7 @@ function krnl_impr!(plx, U::AbstractArray{T}, c0, c1, Ubnd::NTuple{NB,T}, cG, zt
|
||||||
return nothing
|
return nothing
|
||||||
end
|
end
|
||||||
|
|
||||||
function krnl_plaq!(plx, U::AbstractArray{T}, Ubnd, cG, ztw, lp::SpaceParm{N,M,B,D}) where {T,N,M,B,D}
|
function krnl_plaq!(plx, U::AbstractArray{T}, Ubnd, cG, ztw, lp::Union{SpaceParm{N,M,BC_SF_ORBI,D},SpaceParm{N,M,BC_SF_AFWB,D}}) where {T,N,M,D}
|
||||||
|
|
||||||
|
|
||||||
@inbounds begin
|
@inbounds begin
|
||||||
|
@ -103,8 +504,7 @@ function krnl_plaq!(plx, U::AbstractArray{T}, Ubnd, cG, ztw, lp::SpaceParm{N,M,B
|
||||||
r = Int64(CUDA.blockIdx().x)
|
r = Int64(CUDA.blockIdx().x)
|
||||||
I = point_coord((b,r), lp)
|
I = point_coord((b,r), lp)
|
||||||
it = I[N]
|
it = I[N]
|
||||||
IBND = ( ( (B == BC_SF_AFWB) || (B == BC_SF_ORBI) ) &&
|
IBND = ( (it == 1) || (it == lp.iL[end]))
|
||||||
( (it == 1) || (it == lp.iL[end])) )
|
|
||||||
|
|
||||||
S = zero(eltype(plx))
|
S = zero(eltype(plx))
|
||||||
ipl = 0
|
ipl = 0
|
||||||
|
@ -141,7 +541,7 @@ function krnl_plaq!(plx, U::AbstractArray{T}, Ubnd, cG, ztw, lp::SpaceParm{N,M,B
|
||||||
return nothing
|
return nothing
|
||||||
end
|
end
|
||||||
|
|
||||||
function krnl_force_wilson_pln!(frc1, frc2, U::AbstractArray{T}, Ubnd, cG, ztw, ipl, lp::SpaceParm{N,M,B,D}) where {T,N,M,B,D}
|
function krnl_force_wilson_pln!(frc1, frc2, U::AbstractArray{T}, Ubnd, cG, ztw, ipl, lp::Union{SpaceParm{N,M,BC_SF_ORBI,D},SpaceParm{N,M,BC_SF_AFWB,D}}) where {T,N,M,D}
|
||||||
|
|
||||||
b = Int64(CUDA.threadIdx().x)
|
b = Int64(CUDA.threadIdx().x)
|
||||||
r = Int64(CUDA.blockIdx().x)
|
r = Int64(CUDA.blockIdx().x)
|
||||||
|
@ -154,7 +554,7 @@ function krnl_force_wilson_pln!(frc1, frc2, U::AbstractArray{T}, Ubnd, cG, ztw,
|
||||||
bu2, ru2 = up((b, r), id2, lp)
|
bu2, ru2 = up((b, r), id2, lp)
|
||||||
TWP = (I[id1]==1)&&(I[id2]==1)
|
TWP = (I[id1]==1)&&(I[id2]==1)
|
||||||
|
|
||||||
SFBC = ((B == BC_SF_AFWB) || (B == BC_SF_ORBI) ) && (id1 == N)
|
SFBC = (id1 == N)
|
||||||
|
|
||||||
if SFBC && (it == lp.iL[end])
|
if SFBC && (it == lp.iL[end])
|
||||||
gt1 = Ubnd[id2]
|
gt1 = Ubnd[id2]
|
||||||
|
@ -195,7 +595,7 @@ function krnl_force_wilson_pln!(frc1, frc2, U::AbstractArray{T}, Ubnd, cG, ztw,
|
||||||
return nothing
|
return nothing
|
||||||
end
|
end
|
||||||
|
|
||||||
function krnl_force_impr_pln!(frc1, frc2, U::AbstractArray{T}, c0, c1, Ubnd, cG, ztw, ipl, lp::SpaceParm{N,M,B,D}) where {T,N,M,B,D}
|
function krnl_force_impr_pln!(frc1, frc2, U::AbstractArray{T}, c0, c1, Ubnd, cG, ztw, ipl, lp::SpaceParm{N,M,BC_SF_ORBI,D}) where {T,N,M,D}
|
||||||
|
|
||||||
b = Int64(CUDA.threadIdx().x)
|
b = Int64(CUDA.threadIdx().x)
|
||||||
r = Int64(CUDA.blockIdx().x)
|
r = Int64(CUDA.blockIdx().x)
|
||||||
|
@ -207,7 +607,146 @@ function krnl_force_impr_pln!(frc1, frc2, U::AbstractArray{T}, c0, c1, Ubnd, cG,
|
||||||
bu1, ru1 = up((b, r), id1, lp)
|
bu1, ru1 = up((b, r), id1, lp)
|
||||||
bu2, ru2 = up((b, r), id2, lp)
|
bu2, ru2 = up((b, r), id2, lp)
|
||||||
|
|
||||||
SFBC = ((B == BC_SF_AFWB) || (B == BC_SF_ORBI) ) && (id1 == N)
|
SFBC = (id1 == N)
|
||||||
|
TWP = (I[id1]==1) && (I[id2]==1)
|
||||||
|
TWH1 = TWP || ( (I[id1]==1) && (I[id2]==2) )
|
||||||
|
TWH2 = TWP || ( (I[id1]==lp.iL[id1]) && (I[id2]==1) )
|
||||||
|
TWH3 = TWP || ( (I[id1]==1) && (I[id2]==lp.iL[id2]) )
|
||||||
|
TWH4 = TWP || ( (I[id1]==2) && (I[id2]==1) )
|
||||||
|
|
||||||
|
# H1 staple
|
||||||
|
(b1, r1) = dw((b,r), id2, lp)
|
||||||
|
(b2, r2) = up((b1,r1), id1, lp)
|
||||||
|
if SFBC && (it == lp.iL[end])
|
||||||
|
gc = Ubnd[id2]
|
||||||
|
else
|
||||||
|
gc = U[b2,id2,r2]
|
||||||
|
end
|
||||||
|
h1 = (U[b1,id2,r1]\U[b1,id1,r1])*gc
|
||||||
|
|
||||||
|
# H2 staple
|
||||||
|
(b1, r1) = up((b,r), id1, lp)
|
||||||
|
(b2, r2) = up((b1,r1), id1, lp)
|
||||||
|
if SFBC && (it == lp.iL[end]-1)
|
||||||
|
gb = Ubnd[id2]
|
||||||
|
else
|
||||||
|
gb = U[b2,id2,r2]
|
||||||
|
end
|
||||||
|
|
||||||
|
(b2, r2) = up((b1,r1), id2, lp)
|
||||||
|
h2 = (U[b1,id1,r1]*gb)/U[b2,id1,r2]
|
||||||
|
|
||||||
|
# H3 staple
|
||||||
|
(b1, r1) = up((b,r), id2, lp)
|
||||||
|
(b2, r2) = up((b1,r1), id2, lp)
|
||||||
|
(b3, r3) = up((b1,r1), id1, lp)
|
||||||
|
if SFBC && (it == lp.iL[end])
|
||||||
|
gc = Ubnd[id2]
|
||||||
|
else
|
||||||
|
gc = U[b3,id2,r3]
|
||||||
|
end
|
||||||
|
h3 = (U[b1,id2,r1]*U[b2,id1,r2])/gc
|
||||||
|
|
||||||
|
# H4 staple
|
||||||
|
(b1, r1) = dw((b,r), id1, lp)
|
||||||
|
(b2, r2) = up((b1,r1), id2, lp)
|
||||||
|
h4 = (U[b1,id1,r1]\U[b1,id2,r1])*U[b2,id1,r2]
|
||||||
|
# END staples
|
||||||
|
|
||||||
|
if SFBC && (it == lp.iL[end])
|
||||||
|
ga = Ubnd[id2]
|
||||||
|
else
|
||||||
|
ga = U[bu1,id2,ru1]
|
||||||
|
end
|
||||||
|
|
||||||
|
g1 = ga/U[bu2,id1,ru2]
|
||||||
|
g2 = U[b,id2,r]\U[b,id1,r]
|
||||||
|
|
||||||
|
if SFBC && (it == 1)
|
||||||
|
X = (cG*c0)*projalg(U[b,id1,r]*g1/U[b,id2,r]) + c1*projalg(U[b,id1,r]*h2/(U[b,id2,r]*U[bu2,id1,ru2])) +
|
||||||
|
(c1)*projalg(U[b,id1,r]*ga/(U[b,id2,r]*h3)) + c1*(projalg(U[b,id1,r]*g1*g2*g1/U[b,id2,r]))
|
||||||
|
|
||||||
|
frc1[b,id1,r] -= X
|
||||||
|
|
||||||
|
frc2[bu1,id2,ru1] -= (cG*c0)*projalg(g1*g2) + c1*projalg((ga/h3)*g2) +
|
||||||
|
c1*projalg((g1/U[b,id2,r])*h1) + c1*projalg(g1*g2*g1*g2)
|
||||||
|
|
||||||
|
frc2[bu2,id1,ru2] += (cG*c0)*projalg(g2*g1) + c1*projalg((U[b,id2,r]\h1)*g1) +
|
||||||
|
c1*projalg(g2*h2/U[bu2,id1,ru2]) + c1*projalg(g2*g1*g2*g1)
|
||||||
|
elseif SFBC && (it == lp.iL[end])
|
||||||
|
X = cG*c0*projalg(U[b,id1,r]*g1/U[b,id2,r]) +
|
||||||
|
c1 * (projalg(U[b,id1,r]*ga/(U[b,id2,r]*h3))) + c1*(projalg(U[b,id1,r]*g1*g2*g1/U[b,id2,r]))
|
||||||
|
|
||||||
|
frc1[b,id1,r] -= X + c1*projalg(U[b,id1,r]*g1/h4)
|
||||||
|
frc1[b,id2,r] += X + c1*projalg(h1*g1/U[b,id2,r])
|
||||||
|
|
||||||
|
frc2[bu2,id1,ru2] += (cG*c0)*projalg(g2*g1) + c1*projalg((U[b,id2,r]\h1)*g1) +
|
||||||
|
c1 * projalg(h4\U[b,id1,r]*g1) + c1* projalg(g2*g1*g2*g1)
|
||||||
|
else
|
||||||
|
if TWP
|
||||||
|
X = projalg(c0*ztw,U[b,id1,r]*g1/U[b,id2,r])
|
||||||
|
frc2[bu1,id2,ru1] -= projalg(c0*ztw,g1*g2)
|
||||||
|
frc2[bu2,id1,ru2] += projalg(c0*ztw,g2*g1)
|
||||||
|
else
|
||||||
|
X = c0*projalg(U[b,id1,r]*g1/U[b,id2,r])
|
||||||
|
frc2[bu1,id2,ru1] -= c0*projalg(g1*g2)
|
||||||
|
frc2[bu2,id1,ru2] += c0*projalg(g2*g1)
|
||||||
|
end
|
||||||
|
if TWH1
|
||||||
|
frc1[b,id2,r] += projalg(ztw*c1,h1*g1/U[b,id2,r])
|
||||||
|
frc2[bu1,id2,ru1] -= projalg(ztw*c1,(g1/U[b,id2,r])*h1)
|
||||||
|
frc2[bu2,id1,ru2] += projalg(ztw*c1,(U[b,id2,r]\h1)*g1)
|
||||||
|
else
|
||||||
|
frc1[b,id2,r] += c1*projalg(h1*g1/U[b,id2,r])
|
||||||
|
frc2[bu1,id2,ru1] -= c1*projalg((g1/U[b,id2,r])*h1)
|
||||||
|
frc2[bu2,id1,ru2] += c1*projalg((U[b,id2,r]\h1)*g1)
|
||||||
|
end
|
||||||
|
if TWH2
|
||||||
|
X += projalg(ztw*c1,U[b,id1,r]*h2/(U[b,id2,r]*U[bu2,id1,ru2]))
|
||||||
|
frc2[bu2,id1,ru2] += projalg(ztw*c1,g2*h2/U[bu2,id1,ru2])
|
||||||
|
else
|
||||||
|
X += c1*projalg(U[b,id1,r]*h2/(U[b,id2,r]*U[bu2,id1,ru2]))
|
||||||
|
frc2[bu2,id1,ru2] += c1*projalg(g2*h2/U[bu2,id1,ru2])
|
||||||
|
end
|
||||||
|
if TWH3
|
||||||
|
X += projalg(ztw*c1,U[b,id1,r]*ga/(U[b,id2,r]*h3))
|
||||||
|
frc2[bu1,id2,ru1] -= projalg(ztw*c1,(ga/h3)*g2)
|
||||||
|
else
|
||||||
|
X += c1*projalg(U[b,id1,r]*ga/(U[b,id2,r]*h3))
|
||||||
|
frc2[bu1,id2,ru1] -= c1*projalg((ga/h3)*g2)
|
||||||
|
end
|
||||||
|
if TWH4
|
||||||
|
frc1[b,id1,r] -= projalg(ztw*c1,U[b,id1,r]*g1/h4)
|
||||||
|
frc2[bu1,id2,ru1] -= projalg(ztw*c1,(g1/h4)*U[b,id1,r])
|
||||||
|
frc2[bu2,id1,ru2] += projalg(ztw*c1,h4\U[b,id1,r]*g1)
|
||||||
|
else
|
||||||
|
frc1[b,id1,r] -= c1*projalg(U[b,id1,r]*g1/h4)
|
||||||
|
frc2[bu1,id2,ru1] -= c1*projalg((g1/h4)*U[b,id1,r])
|
||||||
|
frc2[bu2,id1,ru2] += c1*projalg(h4\U[b,id1,r]*g1)
|
||||||
|
end
|
||||||
|
frc1[b,id1,r] -= X
|
||||||
|
frc1[b,id2,r] += X
|
||||||
|
|
||||||
|
end
|
||||||
|
|
||||||
|
end
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
function krnl_force_impr_pln!(frc1, frc2, U::AbstractArray{T}, c0, c1, Ubnd, cG, ztw, ipl, lp::SpaceParm{N,M,BC_SF_AFWB,D}) where {T,N,M,D}
|
||||||
|
|
||||||
|
b = Int64(CUDA.threadIdx().x)
|
||||||
|
r = Int64(CUDA.blockIdx().x)
|
||||||
|
I = point_coord((b,r), lp)
|
||||||
|
it = I[N]
|
||||||
|
|
||||||
|
@inbounds begin
|
||||||
|
id1, id2 = lp.plidx[ipl]
|
||||||
|
bu1, ru1 = up((b, r), id1, lp)
|
||||||
|
bu2, ru2 = up((b, r), id2, lp)
|
||||||
|
|
||||||
|
SFBC = (id1 == N)
|
||||||
TWP = (I[id1]==1) && (I[id2]==1)
|
TWP = (I[id1]==1) && (I[id2]==1)
|
||||||
TWH1 = TWP || ( (I[id1]==1) && (I[id2]==2) )
|
TWH1 = TWP || ( (I[id1]==1) && (I[id2]==2) )
|
||||||
TWH2 = TWP || ( (I[id1]==lp.iL[id1]) && (I[id2]==1) )
|
TWH2 = TWP || ( (I[id1]==lp.iL[id1]) && (I[id2]==1) )
|
||||||
|
@ -334,6 +873,253 @@ function krnl_force_impr_pln!(frc1, frc2, U::AbstractArray{T}, c0, c1, Ubnd, cG,
|
||||||
return nothing
|
return nothing
|
||||||
end
|
end
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
##
|
||||||
|
## PERIODIC
|
||||||
|
##
|
||||||
|
function krnl_impr!(plx, U::AbstractArray{T}, c0, c1, Ubnd::NTuple{NB,T}, cG, ztw, lp::SpaceParm{N,M,BC_PERIODIC,D}) where {T,NB,N,M,D}
|
||||||
|
|
||||||
|
b = Int64(CUDA.threadIdx().x)
|
||||||
|
r = Int64(CUDA.blockIdx().x)
|
||||||
|
I = point_coord((b,r), lp)
|
||||||
|
it = I[N]
|
||||||
|
|
||||||
|
ipl = 0
|
||||||
|
S = zero(eltype(plx))
|
||||||
|
@inbounds begin
|
||||||
|
for id1 in N:-1:1
|
||||||
|
bu1, ru1 = up((b, r), id1, lp)
|
||||||
|
|
||||||
|
for id2 = 1:id1-1
|
||||||
|
bu2, ru2 = up((b, r), id2, lp)
|
||||||
|
ipl = ipl + 1
|
||||||
|
|
||||||
|
TWP = (I[id1]==1) && (I[id2]==1)
|
||||||
|
TWH2 = TWP || ( (I[id1]==lp.iL[id1]) && (I[id2]==1) )
|
||||||
|
TWH3 = TWP || ( (I[id1]==1) && (I[id2]==lp.iL[id2]) )
|
||||||
|
|
||||||
|
# H2 staple
|
||||||
|
(b1, r1) = up((b,r), id1, lp)
|
||||||
|
(b2, r2) = up((b1,r1), id1, lp)
|
||||||
|
gb = U[b2,id2,r2]
|
||||||
|
|
||||||
|
(b2, r2) = up((b1,r1), id2, lp)
|
||||||
|
h2 = (U[b1,id1,r1]*gb)/U[b2,id1,r2]
|
||||||
|
|
||||||
|
# H3 staple
|
||||||
|
(b1, r1) = up((b,r), id2, lp)
|
||||||
|
(b2, r2) = up((b1,r1), id2, lp)
|
||||||
|
|
||||||
|
(b3, r3) = up((b1,r1), id1, lp)
|
||||||
|
|
||||||
|
gc = U[b3,id2,r3]
|
||||||
|
|
||||||
|
h3 = (U[b1,id2,r1]*U[b2,id1,r2])/gc
|
||||||
|
# END staples
|
||||||
|
|
||||||
|
ga = U[bu1,id2,ru1]
|
||||||
|
|
||||||
|
g2 = U[b,id2,r]\U[b,id1,r]
|
||||||
|
|
||||||
|
if TWP
|
||||||
|
S += (ztw[ipl]*c0)*tr(g2*ga/U[bu2,id1,ru2])
|
||||||
|
else
|
||||||
|
S += c0*tr(g2*ga/U[bu2,id1,ru2])
|
||||||
|
end
|
||||||
|
if TWH2
|
||||||
|
S += (ztw[ipl]*c1)*tr(g2*h2/U[bu2,id1,ru2])
|
||||||
|
else
|
||||||
|
S += c1*tr(g2*h2/U[bu2,id1,ru2])
|
||||||
|
end
|
||||||
|
if TWH3
|
||||||
|
S += (ztw[ipl]*c1)*tr(g2*ga/h3)
|
||||||
|
else
|
||||||
|
S += c1*tr(g2*ga/h3)
|
||||||
|
end
|
||||||
|
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
plx[I] = S
|
||||||
|
end
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
function krnl_plaq!(plx, U::AbstractArray{T}, Ubnd, cG, ztw, lp::SpaceParm{N,M,BC_PERIODIC,D}) where {T,N,M,D}
|
||||||
|
|
||||||
|
|
||||||
|
@inbounds begin
|
||||||
|
b = Int64(CUDA.threadIdx().x)
|
||||||
|
r = Int64(CUDA.blockIdx().x)
|
||||||
|
I = point_coord((b,r), lp)
|
||||||
|
it = I[N]
|
||||||
|
|
||||||
|
S = zero(eltype(plx))
|
||||||
|
ipl = 0
|
||||||
|
for id1 in N:-1:1
|
||||||
|
bu1, ru1 = up((b, r), id1, lp)
|
||||||
|
|
||||||
|
for id2 = 1:id1-1
|
||||||
|
bu2, ru2 = up((b, r), id2, lp)
|
||||||
|
ipl = ipl + 1
|
||||||
|
TWP = (I[id1]==1) && (I[id2]==1)
|
||||||
|
|
||||||
|
gt1 = U[bu1,id2,ru1]
|
||||||
|
|
||||||
|
if TWP
|
||||||
|
S += ztw[ipl]*tr(U[b,id1,r]*gt1 / (U[b,id2,r]*U[bu2,id1,ru2]))
|
||||||
|
else
|
||||||
|
S += tr(U[b,id1,r]*gt1 / (U[b,id2,r]*U[bu2,id1,ru2]))
|
||||||
|
end
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
plx[I] = S
|
||||||
|
end
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
function krnl_force_wilson_pln!(frc1, frc2, U::AbstractArray{T}, Ubnd, cG, ztw, ipl, lp::SpaceParm{N,M,BC_PERIODIC,D}) where {T,N,M,D}
|
||||||
|
|
||||||
|
b = Int64(CUDA.threadIdx().x)
|
||||||
|
r = Int64(CUDA.blockIdx().x)
|
||||||
|
I = point_coord((b,r), lp)
|
||||||
|
it = I[N]
|
||||||
|
|
||||||
|
@inbounds begin
|
||||||
|
id1, id2 = lp.plidx[ipl]
|
||||||
|
bu1, ru1 = up((b, r), id1, lp)
|
||||||
|
bu2, ru2 = up((b, r), id2, lp)
|
||||||
|
TWP = (I[id1]==1)&&(I[id2]==1)
|
||||||
|
|
||||||
|
gt1 = U[bu1,id2,ru1]
|
||||||
|
|
||||||
|
g1 = gt1/U[bu2,id1,ru2]
|
||||||
|
g2 = U[b,id2,r]\U[b,id1,r]
|
||||||
|
|
||||||
|
if TWP
|
||||||
|
X = projalg(ztw,U[b,id1,r]*g1/U[b,id2,r])
|
||||||
|
frc2[bu1,id2,ru1] -= projalg(ztw,g1*g2)
|
||||||
|
frc2[bu2,id1,ru2] += projalg(ztw,g2*g1)
|
||||||
|
else
|
||||||
|
X = projalg(U[b,id1,r]*g1/U[b,id2,r])
|
||||||
|
frc2[bu1,id2,ru1] -= projalg(g1*g2)
|
||||||
|
frc2[bu2,id1,ru2] += projalg(g2*g1)
|
||||||
|
end
|
||||||
|
frc1[b ,id1, r ] -= X
|
||||||
|
frc1[b ,id2, r ] += X
|
||||||
|
end
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
function krnl_force_impr_pln!(frc1, frc2, U::AbstractArray{T}, c0, c1, Ubnd, cG, ztw, ipl, lp::SpaceParm{N,M,BC_PERIODIC,D}) where {T,N,M,D}
|
||||||
|
|
||||||
|
b = Int64(CUDA.threadIdx().x)
|
||||||
|
r = Int64(CUDA.blockIdx().x)
|
||||||
|
I = point_coord((b,r), lp)
|
||||||
|
it = I[N]
|
||||||
|
|
||||||
|
@inbounds begin
|
||||||
|
id1, id2 = lp.plidx[ipl]
|
||||||
|
bu1, ru1 = up((b, r), id1, lp)
|
||||||
|
bu2, ru2 = up((b, r), id2, lp)
|
||||||
|
|
||||||
|
TWP = (I[id1]==1) && (I[id2]==1)
|
||||||
|
TWH1 = TWP || ( (I[id1]==1) && (I[id2]==2) )
|
||||||
|
TWH2 = TWP || ( (I[id1]==lp.iL[id1]) && (I[id2]==1) )
|
||||||
|
TWH3 = TWP || ( (I[id1]==1) && (I[id2]==lp.iL[id2]) )
|
||||||
|
TWH4 = TWP || ( (I[id1]==2) && (I[id2]==1) )
|
||||||
|
|
||||||
|
# H1 staple
|
||||||
|
(b1, r1) = dw((b,r), id2, lp)
|
||||||
|
(b2, r2) = up((b1,r1), id1, lp)
|
||||||
|
|
||||||
|
gc = U[b2,id2,r2]
|
||||||
|
|
||||||
|
h1 = (U[b1,id2,r1]\U[b1,id1,r1])*gc
|
||||||
|
|
||||||
|
# H2 staple
|
||||||
|
(b1, r1) = up((b,r), id1, lp)
|
||||||
|
(b2, r2) = up((b1,r1), id1, lp)
|
||||||
|
|
||||||
|
gb = U[b2,id2,r2]
|
||||||
|
|
||||||
|
(b2, r2) = up((b1,r1), id2, lp)
|
||||||
|
h2 = (U[b1,id1,r1]*gb)/U[b2,id1,r2]
|
||||||
|
|
||||||
|
# H3 staple
|
||||||
|
(b1, r1) = up((b,r), id2, lp)
|
||||||
|
(b2, r2) = up((b1,r1), id2, lp)
|
||||||
|
(b3, r3) = up((b1,r1), id1, lp)
|
||||||
|
|
||||||
|
gc = U[b3,id2,r3]
|
||||||
|
h3 = (U[b1,id2,r1]*U[b2,id1,r2])/gc
|
||||||
|
|
||||||
|
# H4 staple
|
||||||
|
(b1, r1) = dw((b,r), id1, lp)
|
||||||
|
(b2, r2) = up((b1,r1), id2, lp)
|
||||||
|
h4 = (U[b1,id1,r1]\U[b1,id2,r1])*U[b2,id1,r2]
|
||||||
|
# END staples
|
||||||
|
|
||||||
|
ga = U[bu1,id2,ru1]
|
||||||
|
|
||||||
|
g1 = ga/U[bu2,id1,ru2]
|
||||||
|
g2 = U[b,id2,r]\U[b,id1,r]
|
||||||
|
|
||||||
|
if TWP
|
||||||
|
X = projalg(c0*ztw,U[b,id1,r]*g1/U[b,id2,r])
|
||||||
|
frc2[bu1,id2,ru1] -= projalg(c0*ztw,g1*g2)
|
||||||
|
frc2[bu2,id1,ru2] += projalg(c0*ztw,g2*g1)
|
||||||
|
else
|
||||||
|
X = c0*projalg(U[b,id1,r]*g1/U[b,id2,r])
|
||||||
|
frc2[bu1,id2,ru1] -= c0*projalg(g1*g2)
|
||||||
|
frc2[bu2,id1,ru2] += c0*projalg(g2*g1)
|
||||||
|
end
|
||||||
|
if TWH1
|
||||||
|
frc1[b,id2,r] += projalg(ztw*c1,h1*g1/U[b,id2,r])
|
||||||
|
frc2[bu1,id2,ru1] -= projalg(ztw*c1,(g1/U[b,id2,r])*h1)
|
||||||
|
frc2[bu2,id1,ru2] += projalg(ztw*c1,(U[b,id2,r]\h1)*g1)
|
||||||
|
else
|
||||||
|
frc1[b,id2,r] += c1*projalg(h1*g1/U[b,id2,r])
|
||||||
|
frc2[bu1,id2,ru1] -= c1*projalg((g1/U[b,id2,r])*h1)
|
||||||
|
frc2[bu2,id1,ru2] += c1*projalg((U[b,id2,r]\h1)*g1)
|
||||||
|
end
|
||||||
|
if TWH2
|
||||||
|
X += projalg(ztw*c1,U[b,id1,r]*h2/(U[b,id2,r]*U[bu2,id1,ru2]))
|
||||||
|
frc2[bu2,id1,ru2] += projalg(ztw*c1,g2*h2/U[bu2,id1,ru2])
|
||||||
|
else
|
||||||
|
X += c1*projalg(U[b,id1,r]*h2/(U[b,id2,r]*U[bu2,id1,ru2]))
|
||||||
|
frc2[bu2,id1,ru2] += c1*projalg(g2*h2/U[bu2,id1,ru2])
|
||||||
|
end
|
||||||
|
if TWH3
|
||||||
|
X += projalg(ztw*c1,U[b,id1,r]*ga/(U[b,id2,r]*h3))
|
||||||
|
frc2[bu1,id2,ru1] -= projalg(ztw*c1,(ga/h3)*g2)
|
||||||
|
else
|
||||||
|
X += c1*projalg(U[b,id1,r]*ga/(U[b,id2,r]*h3))
|
||||||
|
frc2[bu1,id2,ru1] -= c1*projalg((ga/h3)*g2)
|
||||||
|
end
|
||||||
|
if TWH4
|
||||||
|
frc1[b,id1,r] -= projalg(ztw*c1,U[b,id1,r]*g1/h4)
|
||||||
|
frc2[bu1,id2,ru1] -= projalg(ztw*c1,(g1/h4)*U[b,id1,r])
|
||||||
|
frc2[bu2,id1,ru2] += projalg(ztw*c1,h4\U[b,id1,r]*g1)
|
||||||
|
else
|
||||||
|
frc1[b,id1,r] -= c1*projalg(U[b,id1,r]*g1/h4)
|
||||||
|
frc2[bu1,id2,ru1] -= c1*projalg((g1/h4)*U[b,id1,r])
|
||||||
|
frc2[bu2,id1,ru2] += c1*projalg(h4\U[b,id1,r]*g1)
|
||||||
|
end
|
||||||
|
frc1[b,id1,r] -= X
|
||||||
|
frc1[b,id2,r] += X
|
||||||
|
|
||||||
|
|
||||||
|
end
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
"""
|
"""
|
||||||
function force_gauge(ymws::YMworkspace, U, gp::GaugeParm, lp::SpaceParm)
|
function force_gauge(ymws::YMworkspace, U, gp::GaugeParm, lp::SpaceParm)
|
||||||
|
|
||||||
|
@ -388,4 +1174,3 @@ function force_pln!(frc1, ftmp, U, Ubnd, cG, ztw, lp::SpaceParm, c0=1)
|
||||||
|
|
||||||
return nothing
|
return nothing
|
||||||
end
|
end
|
||||||
|
|
||||||
|
|
|
@ -54,14 +54,28 @@ function krnl_assign_SU3!(frc::AbstractArray{T}, m, lp::SpaceParm{N,M,BC_PERIODI
|
||||||
return nothing
|
return nothing
|
||||||
end
|
end
|
||||||
|
|
||||||
function krnl_assign_SU3!(frc::AbstractArray{T}, m, lp::SpaceParm{N,M,B,D}) where {T,N,M,B,D}
|
function krnl_assign_SU3!(frc::AbstractArray{T}, m, lp::SpaceParm{N,M,BC_OPEN,D}) where {T,N,M,D}
|
||||||
|
|
||||||
|
@inbounds begin
|
||||||
|
b = Int64(CUDA.threadIdx().x)
|
||||||
|
r = Int64(CUDA.blockIdx().x)
|
||||||
|
for id in 1:lp.ndim
|
||||||
|
frc[b,id,r] = SU3alg(m[b,id,1,r], m[b,id,2,r], m[b,id,3,r],
|
||||||
|
m[b,id,4,r], m[b,id,5,r], m[b,id,6,r],
|
||||||
|
m[b,id,7,r], m[b,id,8,r])
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
function krnl_assign_SU3!(frc::AbstractArray{T}, m, lp::Union{SpaceParm{N,M,BC_SF_ORBI,D},SpaceParm{N,M,BC_SF_AFWB,D}}) where {T,N,M,D}
|
||||||
|
|
||||||
@inbounds begin
|
@inbounds begin
|
||||||
b = Int64(CUDA.threadIdx().x)
|
b = Int64(CUDA.threadIdx().x)
|
||||||
r = Int64(CUDA.blockIdx().x)
|
r = Int64(CUDA.blockIdx().x)
|
||||||
it = point_time((b,r), lp)
|
it = point_time((b,r), lp)
|
||||||
|
|
||||||
if ((B==BC_SF_AFWB)||(B==BC_SF_ORBI))
|
|
||||||
if it == 1
|
if it == 1
|
||||||
for id in 1:lp.ndim-1
|
for id in 1:lp.ndim-1
|
||||||
frc[b,id,r] = zero(T)
|
frc[b,id,r] = zero(T)
|
||||||
|
@ -77,7 +91,6 @@ function krnl_assign_SU3!(frc::AbstractArray{T}, m, lp::SpaceParm{N,M,B,D}) wher
|
||||||
end
|
end
|
||||||
end
|
end
|
||||||
end
|
end
|
||||||
end
|
|
||||||
|
|
||||||
return nothing
|
return nothing
|
||||||
end
|
end
|
||||||
|
|
|
@ -135,6 +135,7 @@ function krnl_add_zth!(frc, frc2::AbstractArray{TA}, U::AbstractArray{TG}, lp::S
|
||||||
it = point_time((b, r), lp)
|
it = point_time((b, r), lp)
|
||||||
|
|
||||||
SFBC = ((B == BC_SF_AFWB) || (B == BC_SF_ORBI) )
|
SFBC = ((B == BC_SF_AFWB) || (B == BC_SF_ORBI) )
|
||||||
|
OBC = (B == BC_OPEN)
|
||||||
|
|
||||||
@inbounds for id in 1:N
|
@inbounds for id in 1:N
|
||||||
bu, ru = up((b,r), id, lp)
|
bu, ru = up((b,r), id, lp)
|
||||||
|
@ -152,13 +153,21 @@ function krnl_add_zth!(frc, frc2::AbstractArray{TA}, U::AbstractArray{TG}, lp::S
|
||||||
frc2[b,id,r] = (5/6)*frc[b,id,r] + (1/6)*(projalg(Ud\Y*Ud) +
|
frc2[b,id,r] = (5/6)*frc[b,id,r] + (1/6)*(projalg(Ud\Y*Ud) +
|
||||||
projalg(U[b,id,r]*X/U[b,id,r]))
|
projalg(U[b,id,r]*X/U[b,id,r]))
|
||||||
end
|
end
|
||||||
|
end
|
||||||
|
if OBC
|
||||||
|
if (it > 1) && (it < lp.iL[end])
|
||||||
|
frc2[b,id,r] = (5/6)*frc[b,id,r] + (1/6)*(projalg(Ud\Y*Ud) +
|
||||||
|
projalg(U[b,id,r]*X/U[b,id,r]))
|
||||||
|
elseif ((it == lp.iL[end]) || (it == 1)) && (id < N)
|
||||||
|
frc2[b,id,r] = (5/6)*frc[b,id,r] + (1/6)*(projalg(Ud\Y*Ud) +
|
||||||
|
projalg(U[b,id,r]*X/U[b,id,r]))
|
||||||
|
end
|
||||||
else
|
else
|
||||||
frc2[b,id,r] = (5/6)*frc[b,id,r] + (1/6)*(projalg(Ud\Y*Ud) +
|
frc2[b,id,r] = (5/6)*frc[b,id,r] + (1/6)*(projalg(Ud\Y*Ud) +
|
||||||
projalg(U[b,id,r]*X/U[b,id,r]))
|
projalg(U[b,id,r]*X/U[b,id,r]))
|
||||||
end
|
end
|
||||||
end
|
end
|
||||||
end
|
end
|
||||||
|
|
||||||
return nothing
|
return nothing
|
||||||
end
|
end
|
||||||
|
|
||||||
|
@ -204,19 +213,21 @@ Integrates the flow equations with the integration scheme defined by `int` using
|
||||||
"""
|
"""
|
||||||
function flw_adapt(U, int::FlowIntr{NI,T}, tend::T, epsini::T, gp::GaugeParm, lp::SpaceParm, ymws::YMworkspace) where {NI,T}
|
function flw_adapt(U, int::FlowIntr{NI,T}, tend::T, epsini::T, gp::GaugeParm, lp::SpaceParm, ymws::YMworkspace) where {NI,T}
|
||||||
|
|
||||||
eps = int.eps_ini
|
eps = epsini
|
||||||
dt = tend
|
dt = tend
|
||||||
nstp = 0
|
nstp = 0
|
||||||
|
eps_all = Vector{T}(undef,0)
|
||||||
while true
|
while true
|
||||||
ns = convert(Int64, floor(dt/eps))
|
ns = convert(Int64, floor(dt/eps))
|
||||||
if ns > 10
|
if ns > 10
|
||||||
flw(U, int, 9, eps, gp, lp, ymws)
|
flw(U, int, 9, eps, gp, lp, ymws)
|
||||||
ymws.U1 .= U
|
ymws.U1 .= U
|
||||||
flw(U, int, 2, eps/2, gp, lp, ymws)
|
flw(U, int, 1, eps, gp, lp, ymws)
|
||||||
flw(ymws.U1, int, 1, eps, gp, lp, ymws)
|
flw(ymws.U1, int, 2, eps/2, gp, lp, ymws)
|
||||||
|
|
||||||
dt = dt - 10*eps
|
dt = dt - 10*eps
|
||||||
nstp = nstp + 10
|
nstp = nstp + 10
|
||||||
|
push!(eps_all,ntuple(i->eps,10)...)
|
||||||
|
|
||||||
# adjust step size
|
# adjust step size
|
||||||
ymws.U1 .= ymws.U1 ./ U
|
ymws.U1 .= ymws.U1 ./ U
|
||||||
|
@ -227,6 +238,9 @@ function flw_adapt(U, int::FlowIntr{NI,T}, tend::T, epsini::T, gp::GaugeParm, lp
|
||||||
flw(U, int, ns, eps, gp, lp, ymws)
|
flw(U, int, ns, eps, gp, lp, ymws)
|
||||||
dt = dt - ns*eps
|
dt = dt - ns*eps
|
||||||
|
|
||||||
|
push!(eps_all,ntuple(i->eps,ns)...)
|
||||||
|
push!(eps_all,dt)
|
||||||
|
|
||||||
flw(U, int, 1, dt, gp, lp, ymws)
|
flw(U, int, 1, dt, gp, lp, ymws)
|
||||||
dt = zero(tend)
|
dt = zero(tend)
|
||||||
|
|
||||||
|
@ -238,7 +252,7 @@ function flw_adapt(U, int::FlowIntr{NI,T}, tend::T, epsini::T, gp::GaugeParm, lp
|
||||||
end
|
end
|
||||||
end
|
end
|
||||||
|
|
||||||
return nstp, eps
|
return nstp, eps_all
|
||||||
end
|
end
|
||||||
flw_adapt(U, int::FlowIntr{NI,T}, tend::T, gp::GaugeParm, lp::SpaceParm, ymws::YMworkspace) where {NI,T} = flw_adapt(U, int, tend, int.eps_ini, gp, lp, ymws)
|
flw_adapt(U, int::FlowIntr{NI,T}, tend::T, gp::GaugeParm, lp::SpaceParm, ymws::YMworkspace) where {NI,T} = flw_adapt(U, int, tend, int.eps_ini, gp, lp, ymws)
|
||||||
|
|
||||||
|
@ -260,6 +274,7 @@ function Eoft_plaq(Eslc, U, gp::GaugeParm{T,G,NN}, lp::SpaceParm{N,M,B,D}, ymws:
|
||||||
|
|
||||||
ztw = ztwist(gp, lp)
|
ztw = ztwist(gp, lp)
|
||||||
SFBC = ((B == BC_SF_AFWB) || (B == BC_SF_ORBI) )
|
SFBC = ((B == BC_SF_AFWB) || (B == BC_SF_ORBI) )
|
||||||
|
OBC = (B == BC_OPEN)
|
||||||
|
|
||||||
tp = ntuple(i->i, N-1)
|
tp = ntuple(i->i, N-1)
|
||||||
V3 = prod(lp.iL[1:end-1])
|
V3 = prod(lp.iL[1:end-1])
|
||||||
|
@ -280,6 +295,10 @@ function Eoft_plaq(Eslc, U, gp::GaugeParm{T,G,NN}, lp::SpaceParm{N,M,B,D}, ymws:
|
||||||
if !SFBC
|
if !SFBC
|
||||||
Eslc[1,ipl] = Etmp[1] + Etmp[end]
|
Eslc[1,ipl] = Etmp[1] + Etmp[end]
|
||||||
end
|
end
|
||||||
|
if OBC ## Check normalization of timelike boundary plaquettes
|
||||||
|
Eslc[end,ipl] = Etmp[end-1]
|
||||||
|
Eslc[1,ipl] = Etmp[1]
|
||||||
|
end
|
||||||
else
|
else
|
||||||
for it in 1:lp.iL[end]
|
for it in 1:lp.iL[end]
|
||||||
Eslc[it,ipl] = 2*Etmp[it]
|
Eslc[it,ipl] = 2*Etmp[it]
|
||||||
|
@ -304,7 +323,7 @@ function krnl_plaq_pln!(plx, U::AbstractArray{T}, Ubnd, ztw, ipl, lp::SpaceParm{
|
||||||
I = point_coord((b,r), lp)
|
I = point_coord((b,r), lp)
|
||||||
|
|
||||||
id1, id2 = lp.plidx[ipl]
|
id1, id2 = lp.plidx[ipl]
|
||||||
SFBC = ((B == BC_SF_AFWB) || (B == BC_SF_ORBI)) && (id1 == lp.iL[end])
|
SFBC = ((B == BC_SF_AFWB) || (B == BC_SF_ORBI)) && (id1 == N)
|
||||||
TWP = ((I[id1]==1)&&(I[id2]==1))
|
TWP = ((I[id1]==1)&&(I[id2]==1))
|
||||||
|
|
||||||
bu1, ru1 = up((b, r), id1, lp)
|
bu1, ru1 = up((b, r), id1, lp)
|
||||||
|
@ -322,14 +341,13 @@ function krnl_plaq_pln!(plx, U::AbstractArray{T}, Ubnd, ztw, ipl, lp::SpaceParm{
|
||||||
plx[I] = tr(U[b,id1,r]*gt / (U[b,id2,r]*U[bu2,id1,ru2]))
|
plx[I] = tr(U[b,id1,r]*gt / (U[b,id2,r]*U[bu2,id1,ru2]))
|
||||||
end
|
end
|
||||||
end
|
end
|
||||||
|
|
||||||
return nothing
|
return nothing
|
||||||
end
|
end
|
||||||
|
|
||||||
"""
|
"""
|
||||||
Qtop([Qslc,] U, gp::GaugeParm, lp::SpaceParm, ymws::YMworkspace)
|
Qtop([Qslc,] U, gp::GaugeParm, lp::SpaceParm, ymws::YMworkspace)
|
||||||
|
|
||||||
Measure the topological charge `Q` of the configuration `U` using the clover definition of the field strength tensor. If the argument `Qslc` is present the contribution for each Euclidean time slice are returned. Only wors in 4D.
|
Measure the topological charge `Q` of the configuration `U` using the clover definition of the field strength tensor. If the argument `Qslc` is present the contributions for each Euclidean time slice are returned. Only works in 4D.
|
||||||
"""
|
"""
|
||||||
function Qtop(Qslc, U, gp::GaugeParm, lp::SpaceParm{4,M,B,D}, ymws::YMworkspace) where {M,B,D}
|
function Qtop(Qslc, U, gp::GaugeParm, lp::SpaceParm{4,M,B,D}, ymws::YMworkspace) where {M,B,D}
|
||||||
|
|
||||||
|
@ -345,21 +363,18 @@ function Qtop(Qslc, U, gp::GaugeParm, lp::SpaceParm{4,M,B,D}, ymws::YMworkspace)
|
||||||
CUDA.@sync begin
|
CUDA.@sync begin
|
||||||
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_add_qd!(ymws.rm, -, ymws.frc1, ymws.frc2, lp)
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_add_qd!(ymws.rm, -, ymws.frc1, ymws.frc2, lp)
|
||||||
end
|
end
|
||||||
|
|
||||||
CUDA.@sync begin
|
CUDA.@sync begin
|
||||||
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_field_tensor!(ymws.frc1, ymws.frc2, U, gp.Ubnd, 2,4, ztw[2], ztw[4], lp)
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_field_tensor!(ymws.frc1, ymws.frc2, U, gp.Ubnd, 2,4, ztw[2], ztw[4], lp)
|
||||||
end
|
end
|
||||||
CUDA.@sync begin
|
CUDA.@sync begin
|
||||||
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_add_qd!(ymws.rm, +, ymws.frc1, ymws.frc2, lp)
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_add_qd!(ymws.rm, +, ymws.frc1, ymws.frc2, lp)
|
||||||
end
|
end
|
||||||
|
|
||||||
CUDA.@sync begin
|
CUDA.@sync begin
|
||||||
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_field_tensor!(ymws.frc1, ymws.frc2, U, gp.Ubnd, 3,6, ztw[3], ztw[6], lp)
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_field_tensor!(ymws.frc1, ymws.frc2, U, gp.Ubnd, 3,6, ztw[3], ztw[6], lp)
|
||||||
end
|
end
|
||||||
CUDA.@sync begin
|
CUDA.@sync begin
|
||||||
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_add_qd!(ymws.rm, -, ymws.frc1, ymws.frc2, lp)
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_add_qd!(ymws.rm, -, ymws.frc1, ymws.frc2, lp)
|
||||||
end
|
end
|
||||||
|
|
||||||
Qslc .= reshape(Array(CUDA.reduce(+, ymws.rm; dims=tp)),lp.iL[end])./(32*pi^2)
|
Qslc .= reshape(Array(CUDA.reduce(+, ymws.rm; dims=tp)),lp.iL[end])./(32*pi^2)
|
||||||
end
|
end
|
||||||
|
|
||||||
|
@ -371,7 +386,7 @@ Qtop(U, gp::GaugeParm, lp::SpaceParm{4,M,D}, ymws::YMworkspace{T}) where {T,M,D}
|
||||||
"""
|
"""
|
||||||
function Eoft_clover([Eslc,] U, gp::GaugeParm, lp::SpaceParm, ymws::YMworkspace)
|
function Eoft_clover([Eslc,] U, gp::GaugeParm, lp::SpaceParm, ymws::YMworkspace)
|
||||||
|
|
||||||
Measure the action density `E(t)` using the clover discretization. If the argument `Eslc`
|
Measure the action density `E(t)` using the clover discretization. If the argument `Eslc` is given
|
||||||
the contribution for each Euclidean time slice and plane are returned.
|
the contribution for each Euclidean time slice and plane are returned.
|
||||||
"""
|
"""
|
||||||
function Eoft_clover(Eslc, U, gp::GaugeParm, lp::SpaceParm{4,M,B,D}, ymws::YMworkspace{T}) where {T,M,B,D}
|
function Eoft_clover(Eslc, U, gp::GaugeParm, lp::SpaceParm{4,M,B,D}, ymws::YMworkspace{T}) where {T,M,B,D}
|
||||||
|
@ -469,6 +484,7 @@ function krnl_field_tensor!(frc1::AbstractArray{TA}, frc2, U::AbstractArray{T},
|
||||||
#First plane
|
#First plane
|
||||||
id1, id2 = lp.plidx[ipl1]
|
id1, id2 = lp.plidx[ipl1]
|
||||||
SFBC = ((B == BC_SF_AFWB) || (B == BC_SF_ORBI) ) && (id1 == 4)
|
SFBC = ((B == BC_SF_AFWB) || (B == BC_SF_ORBI) ) && (id1 == 4)
|
||||||
|
OBC = ((B == BC_OPEN) && (id1 == 4))
|
||||||
TWP = ((I[id1]==1)&&(I[id2]==1))
|
TWP = ((I[id1]==1)&&(I[id2]==1))
|
||||||
|
|
||||||
bu1, ru1 = up((b, r), id1, lp)
|
bu1, ru1 = up((b, r), id1, lp)
|
||||||
|
@ -488,6 +504,11 @@ function krnl_field_tensor!(frc1::AbstractArray{TA}, frc2, U::AbstractArray{T},
|
||||||
frc1[bu1,2,ru1] = zero(TA)
|
frc1[bu1,2,ru1] = zero(TA)
|
||||||
frc1[bd,3,rd] = zero(TA)
|
frc1[bd,3,rd] = zero(TA)
|
||||||
frc1[bu2,4,ru2] = projalg(l2*l1)
|
frc1[bu2,4,ru2] = projalg(l2*l1)
|
||||||
|
elseif OBC && (it == lp.iL[end])
|
||||||
|
frc1[b,1,r] = projalg(U[b,id1,r]*l1/U[b,id2,r])
|
||||||
|
frc1[bu1,2,ru1] = zero(TA)
|
||||||
|
frc1[bd,3,rd] = zero(TA)
|
||||||
|
frc1[bu2,4,ru2] = projalg(l2*l1)
|
||||||
else
|
else
|
||||||
if TWP
|
if TWP
|
||||||
frc1[b,1,r] = projalg(ztw1, U[b,id1,r]*l1/U[b,id2,r])
|
frc1[b,1,r] = projalg(ztw1, U[b,id1,r]*l1/U[b,id2,r])
|
||||||
|
@ -505,6 +526,7 @@ function krnl_field_tensor!(frc1::AbstractArray{TA}, frc2, U::AbstractArray{T},
|
||||||
# Second plane
|
# Second plane
|
||||||
id1, id2 = lp.plidx[ipl2]
|
id1, id2 = lp.plidx[ipl2]
|
||||||
SFBC = ((B == BC_SF_AFWB) || (B == BC_SF_ORBI) ) && (id1 == 4)
|
SFBC = ((B == BC_SF_AFWB) || (B == BC_SF_ORBI) ) && (id1 == 4)
|
||||||
|
OBC = ((B == BC_OPEN) && (id1 == 4))
|
||||||
TWP = ((I[id1]==1)&&(I[id2]==1))
|
TWP = ((I[id1]==1)&&(I[id2]==1))
|
||||||
|
|
||||||
bu1, ru1 = up((b, r), id1, lp)
|
bu1, ru1 = up((b, r), id1, lp)
|
||||||
|
@ -524,6 +546,11 @@ function krnl_field_tensor!(frc1::AbstractArray{TA}, frc2, U::AbstractArray{T},
|
||||||
frc2[bu1,2,ru1] = zero(TA)
|
frc2[bu1,2,ru1] = zero(TA)
|
||||||
frc2[bd,3,rd] = zero(TA)
|
frc2[bd,3,rd] = zero(TA)
|
||||||
frc2[bu2,4,ru2] = projalg(l2*l1)
|
frc2[bu2,4,ru2] = projalg(l2*l1)
|
||||||
|
elseif OBC && (it == lp.iL[end])
|
||||||
|
frc1[b,1,r] = projalg(U[b,id1,r]*l1/U[b,id2,r])
|
||||||
|
frc1[bu1,2,ru1] = zero(TA)
|
||||||
|
frc1[bd,3,rd] = zero(TA)
|
||||||
|
frc1[bu2,4,ru2] = projalg(l2*l1)
|
||||||
else
|
else
|
||||||
if TWP
|
if TWP
|
||||||
frc2[b,1,r] = projalg(ztw2, U[b,id1,r]*l1/U[b,id2,r])
|
frc2[b,1,r] = projalg(ztw2, U[b,id1,r]*l1/U[b,id2,r])
|
||||||
|
@ -538,7 +565,5 @@ function krnl_field_tensor!(frc1::AbstractArray{TA}, frc2, U::AbstractArray{T},
|
||||||
end
|
end
|
||||||
end
|
end
|
||||||
end
|
end
|
||||||
|
|
||||||
return nothing
|
return nothing
|
||||||
end
|
end
|
||||||
|
|
||||||
|
|
|
@ -13,7 +13,7 @@
|
||||||
|
|
||||||
function gauge_action(U, lp::SpaceParm, gp::GaugeParm, ymws::YMworkspace)
|
function gauge_action(U, lp::SpaceParm, gp::GaugeParm, ymws::YMworkspace)
|
||||||
|
|
||||||
Returns the value of the gauge action for the configuration U. The parameters `\beta` and `c0` are taken from the `gp` structure.
|
Returns the value of the gauge action for the configuration U. The parameters ``\\beta`` and `c0` are taken from the `gp` structure.
|
||||||
"""
|
"""
|
||||||
function gauge_action(U, lp::SpaceParm, gp::GaugeParm, ymws::YMworkspace{T}) where T <: AbstractFloat
|
function gauge_action(U, lp::SpaceParm, gp::GaugeParm, ymws::YMworkspace{T}) where T <: AbstractFloat
|
||||||
|
|
||||||
|
@ -71,7 +71,7 @@ end
|
||||||
"""
|
"""
|
||||||
HMC!(U, int::IntrScheme, lp::SpaceParm, gp::GaugeParm, ymws::YMworkspace; noacc=false)
|
HMC!(U, int::IntrScheme, lp::SpaceParm, gp::GaugeParm, ymws::YMworkspace; noacc=false)
|
||||||
|
|
||||||
Performs a HMC step (molecular dynamics integration and accept/reject step). The configuration `U` is updated ans function returns the energy violation and if the configuration was accepted in a tuple.
|
Performs a HMC step (molecular dynamics integration and accept/reject step). The configuration `U` is updated and function returns the energy violation and if the configuration was accepted in a tuple.
|
||||||
"""
|
"""
|
||||||
function HMC!(U, int::IntrScheme, lp::SpaceParm, gp::GaugeParm, ymws::YMworkspace{T}; noacc=false) where T
|
function HMC!(U, int::IntrScheme, lp::SpaceParm, gp::GaugeParm, ymws::YMworkspace{T}; noacc=false) where T
|
||||||
|
|
||||||
|
|
42
test/dirac/test_backflow.jl
Normal file
42
test/dirac/test_backflow.jl
Normal file
|
@ -0,0 +1,42 @@
|
||||||
|
using CUDA
|
||||||
|
|
||||||
|
using Pkg
|
||||||
|
|
||||||
|
Pkg.activate("/home/fperez/Git/LGPU_fork_ferflow")
|
||||||
|
|
||||||
|
using LatticeGPU
|
||||||
|
|
||||||
|
lp = SpaceParm{4}((4,4,4,4),(2,2,2,2),0,(0,0,0,0,0,0));
|
||||||
|
|
||||||
|
pso = scalar_field(Spinor{4,SU3fund{Float64}},lp);
|
||||||
|
psi = scalar_field(Spinor{4,SU3fund{Float64}},lp);
|
||||||
|
psi2 = scalar_field(Spinor{4,SU3fund{Float64}},lp);
|
||||||
|
|
||||||
|
ymws = YMworkspace(SU3,Float64,lp);
|
||||||
|
dws = DiracWorkspace(SU3fund,Float64,lp);
|
||||||
|
|
||||||
|
int = wfl_rk3(Float64, 0.01, 1.0)
|
||||||
|
|
||||||
|
gp = GaugeParm{Float64}(SU3{Float64},6.0,1.0,(1.0,0.0),(0.0,0.0),lp.iL)
|
||||||
|
|
||||||
|
dpar = DiracParam{Float64}(SU3fund,1.3,0.9,(1.0,1.0,1.0,1.0),0.0)
|
||||||
|
|
||||||
|
randomize!(ymws.mom, lp, ymws)
|
||||||
|
U = exp.(ymws.mom);
|
||||||
|
|
||||||
|
pfrandomize!(psi,lp)
|
||||||
|
for L in 4:19
|
||||||
|
pso .= psi
|
||||||
|
V = Array(U)
|
||||||
|
a,b = flw_adapt(U, psi, int, L*int.eps, gp,dpar, lp, ymws,dws)
|
||||||
|
# for i in 1:a
|
||||||
|
# flw(U, psi, int, 1 ,b[i], gp, dpar, lp, ymws, dws)
|
||||||
|
# end
|
||||||
|
pfrandomize!(psi2,lp)
|
||||||
|
|
||||||
|
foo = sum(dot.(psi,psi2))# field_dot(psi,psi2,sumf,lp)
|
||||||
|
copyto!(U,V);
|
||||||
|
backflow(psi2,U,L*int.eps,7,gp,dpar,lp, ymws,dws)
|
||||||
|
println("Error:",(sum(dot.(pso,psi2))-foo)/foo)
|
||||||
|
psi .= pso
|
||||||
|
end
|
119
test/dirac/test_backflow_tl.jl
Normal file
119
test/dirac/test_backflow_tl.jl
Normal file
|
@ -0,0 +1,119 @@
|
||||||
|
using LatticeGPU, CUDA, TimerOutputs
|
||||||
|
|
||||||
|
#Test for the relation K(t,y;0,n)^+ Dw(n|m)^{-1} e^(ipm) = D(p)^{-1} exp(4t sin^2(p/2)) e^{ipn} with a given momenta (if p=0 its randomized), spin and color
|
||||||
|
#Kernel en 1207.2096
|
||||||
|
|
||||||
|
|
||||||
|
@timeit "Plw backflow test" begin
|
||||||
|
|
||||||
|
function Dwpw_test(;p=0,s=1,c=1)
|
||||||
|
lp = SpaceParm{4}((16,16,16,16), (4,4,4,4), 0, (0,0,0,0,0,0))
|
||||||
|
gp = GaugeParm{Float64}(SU3{Float64}, 6.0, 1.0)
|
||||||
|
dpar = DiracParam{Float64}(SU3fund,1.3,0.0,(1.0,1.0,1.0,1.0),0.0)
|
||||||
|
dws = DiracWorkspace(SU3fund,Float64,lp);
|
||||||
|
ymws = YMworkspace(SU3,Float64,lp);
|
||||||
|
|
||||||
|
p==0 ? p = Int.(round.(lp.iL.*rand(4),RoundUp)) : nothing
|
||||||
|
U = fill!(vector_field(SU3{Float64},lp),one(SU3{Float64}))
|
||||||
|
|
||||||
|
rm = 2* pi* p./(lp.iL)
|
||||||
|
rmom=(rm[1],rm[2],rm[3],rm[4])
|
||||||
|
|
||||||
|
int = wfl_rk3(Float64, 0.01, 1.0)
|
||||||
|
Nsteps = 15
|
||||||
|
|
||||||
|
@timeit "Generate plane wave" begin
|
||||||
|
|
||||||
|
pwave = fill!(scalar_field(Spinor{4,SU3fund{Float64}},lp),zero(eltype(scalar_field(Spinor{4,SU3fund{Float64}},lp))))
|
||||||
|
prop = scalar_field(Spinor{4,SU3fund{Float64}},lp)
|
||||||
|
prop_th = fill!(scalar_field(Spinor{4,SU3fund{Float64}},lp),zero(eltype(scalar_field(Spinor{4,SU3fund{Float64}},lp))))
|
||||||
|
|
||||||
|
|
||||||
|
#Generate plane wave
|
||||||
|
|
||||||
|
for x in 1:lp.iL[1] for y in 1:lp.iL[2] for z in 1:lp.iL[3] for t in 1:lp.iL[4]
|
||||||
|
CUDA.@allowscalar pwave[point_index(CartesianIndex{lp.ndim}((x,y,z,t)),lp)...] = exp(im * (x*rmom[1] + y*rmom[2] + z*rmom[3] + t*rmom[4]))*Spinor{4,SU3fund{Float64}}(ntuple(i -> (i==s)*SU3fund{Float64}(ntuple(j -> (j==c)*1.0,3)...),4))
|
||||||
|
end end end end
|
||||||
|
|
||||||
|
end
|
||||||
|
|
||||||
|
@timeit "Generate analitical solution" begin
|
||||||
|
|
||||||
|
#Th solution
|
||||||
|
|
||||||
|
if s == 1
|
||||||
|
vals = (dpar.m0 + 4.0 - sum(cos.(rmom)),0.0,im*sin(rmom[4])+sin(rmom[3]),im*sin(rmom[2])+sin(rmom[1]))
|
||||||
|
for x in 1:lp.iL[1] for y in 1:lp.iL[2] for z in 1:lp.iL[3] for t in 1:lp.iL[4]
|
||||||
|
CUDA.@allowscalar prop_th[point_index(CartesianIndex{lp.ndim}((x,y,z,t)),lp)...] = exp(im * (x*rmom[1] + y*rmom[2] + z*rmom[3] + t*rmom[4]))*
|
||||||
|
( Spinor{4,SU3fund{Float64}}(ntuple(i -> SU3fund{Float64}(ntuple(j -> (j==c)*vals[i],3)...),4)) )/(sum((sin.(rmom)) .^2) + (dpar.m0+ 4.0 - sum(cos.(rmom)))^2)
|
||||||
|
end end end end
|
||||||
|
elseif s == 2
|
||||||
|
vals = (0.0,dpar.m0 + 4.0 - sum(cos.(rmom)),sin(rmom[1]) - im *sin(rmom[2]),-sin(rmom[3])+im*sin(rmom[4]))
|
||||||
|
for x in 1:lp.iL[1] for y in 1:lp.iL[2] for z in 1:lp.iL[3] for t in 1:lp.iL[4]
|
||||||
|
CUDA.@allowscalar prop_th[point_index(CartesianIndex{lp.ndim}((x,y,z,t)),lp)...] = exp(im * (x*rmom[1] + y*rmom[2] + z*rmom[3] + t*rmom[4]))*
|
||||||
|
( Spinor{4,SU3fund{Float64}}(ntuple(i -> SU3fund{Float64}(ntuple(j -> (j==c)*vals[i],3)...),4)) )/(sum((sin.(rmom)) .^2) + (dpar.m0+ 4.0 - sum(cos.(rmom)))^2)
|
||||||
|
end end end end
|
||||||
|
elseif s == 3
|
||||||
|
vals = (-sin(rmom[3])+im*sin(rmom[4]),-sin(rmom[1])-im*sin(rmom[2]),dpar.m0 + 4.0 - sum(cos.(rmom)),0.0)
|
||||||
|
for x in 1:lp.iL[1] for y in 1:lp.iL[2] for z in 1:lp.iL[3] for t in 1:lp.iL[4]
|
||||||
|
CUDA.@allowscalar prop_th[point_index(CartesianIndex{lp.ndim}((x,y,z,t)),lp)...] = exp(im * (x*rmom[1] + y*rmom[2] + z*rmom[3] + t*rmom[4]))*
|
||||||
|
( Spinor{4,SU3fund{Float64}}(ntuple(i -> SU3fund{Float64}(ntuple(j -> (j==c)*vals[i],3)...),4)) )/(sum((sin.(rmom)) .^2) + (dpar.m0+ 4.0 - sum(cos.(rmom)))^2)
|
||||||
|
end end end end
|
||||||
|
else
|
||||||
|
vals = (-sin(rmom[1])+im*sin(rmom[2]),sin(rmom[3])+im*sin(rmom[4]),0.0,dpar.m0 + 4.0 - sum(cos.(rmom)))
|
||||||
|
for x in 1:lp.iL[1] for y in 1:lp.iL[2] for z in 1:lp.iL[3] for t in 1:lp.iL[4]
|
||||||
|
CUDA.@allowscalar prop_th[point_index(CartesianIndex{lp.ndim}((x,y,z,t)),lp)...] = exp(im * (x*rmom[1] + y*rmom[2] + z*rmom[3] + t*rmom[4]))*
|
||||||
|
( Spinor{4,SU3fund{Float64}}(ntuple(i -> SU3fund{Float64}(ntuple(j -> (j==c)*vals[i],3)...),4)) )/(sum((sin.(rmom)) .^2) + (dpar.m0+ 4.0 - sum(cos.(rmom)))^2)
|
||||||
|
end end end end
|
||||||
|
end
|
||||||
|
|
||||||
|
end
|
||||||
|
|
||||||
|
prop_th .= exp(-4*Nsteps*int.eps*sum(sin.(rmom./2).^2))*prop_th
|
||||||
|
|
||||||
|
|
||||||
|
#compute Sum{x} D^-1(x|y)P(y)
|
||||||
|
|
||||||
|
@timeit "Solving propagator and flowing" begin
|
||||||
|
|
||||||
|
function krnlg5!(src)
|
||||||
|
b=Int64(CUDA.threadIdx().x)
|
||||||
|
r=Int64(CUDA.blockIdx().x)
|
||||||
|
src[b,r] = dmul(Gamma{5},src[b,r])
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
CUDA.@sync begin
|
||||||
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnlg5!(pwave)
|
||||||
|
end
|
||||||
|
g5Dw!(prop,U,pwave,dpar,dws,lp)
|
||||||
|
CG!(prop,U,DwdagDw!,dpar,lp,dws,10000,1.0e-14)
|
||||||
|
|
||||||
|
for _ in 1:Nsteps
|
||||||
|
backflow(U,prop,1,int.eps,gp,dpar,lp, ymws,dws)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
|
||||||
|
dif = sum(norm2.(prop - prop_th))
|
||||||
|
|
||||||
|
return dif
|
||||||
|
end
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
begin
|
||||||
|
dif = 0.0
|
||||||
|
for i in 1:3 for j in 1:4
|
||||||
|
dif += Dwpw_test(c=i,s=j)
|
||||||
|
end end
|
||||||
|
|
||||||
|
if dif < 1.0e-5
|
||||||
|
print("Backflow_tl test passed with average error ", dif/12,"!\n")
|
||||||
|
else
|
||||||
|
error("Backflow_tl test failed with difference: ",dif,"\n")
|
||||||
|
end
|
||||||
|
|
||||||
|
|
||||||
|
end
|
||||||
|
end
|
119
test/dirac/test_flow_tl.jl
Normal file
119
test/dirac/test_flow_tl.jl
Normal file
|
@ -0,0 +1,119 @@
|
||||||
|
using LatticeGPU, CUDA, TimerOutputs
|
||||||
|
|
||||||
|
#Test for the relation K(t,y;0,n) Dw(n|m)^{-1} e^(ipm) = D(p)^{-1} exp(-4t sin^2(p/2)) e^{ipn} with a given momenta (if p=0 its randomized), spin and color
|
||||||
|
#Kernel en 1207.2096
|
||||||
|
|
||||||
|
|
||||||
|
@timeit "Plw flow test" begin
|
||||||
|
|
||||||
|
function Dwpw_test(;p=0,s=1,c=1)
|
||||||
|
lp = SpaceParm{4}((16,16,16,16), (4,4,4,4), 0, (0,0,0,0,0,0))
|
||||||
|
gp = GaugeParm{Float64}(SU3{Float64}, 6.0, 1.0)
|
||||||
|
dpar = DiracParam{Float64}(SU3fund,1.3,0.0,(1.0,1.0,1.0,1.0),0.0)
|
||||||
|
dws = DiracWorkspace(SU3fund,Float64,lp);
|
||||||
|
ymws = YMworkspace(SU3,Float64,lp);
|
||||||
|
|
||||||
|
p==0 ? p = Int.(round.(lp.iL.*rand(4),RoundUp)) : nothing
|
||||||
|
U = fill!(vector_field(SU3{Float64},lp),one(SU3{Float64}))
|
||||||
|
|
||||||
|
rm = 2* pi* p./(lp.iL)
|
||||||
|
rmom=(rm[1],rm[2],rm[3],rm[4])
|
||||||
|
|
||||||
|
int = wfl_rk3(Float64, 0.01, 1.0)
|
||||||
|
Nsteps = 15
|
||||||
|
|
||||||
|
@timeit "Generate plane wave" begin
|
||||||
|
|
||||||
|
pwave = fill!(scalar_field(Spinor{4,SU3fund{Float64}},lp),zero(eltype(scalar_field(Spinor{4,SU3fund{Float64}},lp))))
|
||||||
|
prop = scalar_field(Spinor{4,SU3fund{Float64}},lp)
|
||||||
|
prop_th = fill!(scalar_field(Spinor{4,SU3fund{Float64}},lp),zero(eltype(scalar_field(Spinor{4,SU3fund{Float64}},lp))))
|
||||||
|
|
||||||
|
|
||||||
|
#Generate plane wave
|
||||||
|
|
||||||
|
for x in 1:lp.iL[1] for y in 1:lp.iL[2] for z in 1:lp.iL[3] for t in 1:lp.iL[4]
|
||||||
|
CUDA.@allowscalar pwave[point_index(CartesianIndex{lp.ndim}((x,y,z,t)),lp)...] = exp(im * (x*rmom[1] + y*rmom[2] + z*rmom[3] + t*rmom[4]))*Spinor{4,SU3fund{Float64}}(ntuple(i -> (i==s)*SU3fund{Float64}(ntuple(j -> (j==c)*1.0,3)...),4))
|
||||||
|
end end end end
|
||||||
|
|
||||||
|
end
|
||||||
|
|
||||||
|
@timeit "Generate analitical solution" begin
|
||||||
|
|
||||||
|
#Th solution
|
||||||
|
|
||||||
|
if s == 1
|
||||||
|
vals = (dpar.m0 + 4.0 - sum(cos.(rmom)),0.0,im*sin(rmom[4])+sin(rmom[3]),im*sin(rmom[2])+sin(rmom[1]))
|
||||||
|
for x in 1:lp.iL[1] for y in 1:lp.iL[2] for z in 1:lp.iL[3] for t in 1:lp.iL[4]
|
||||||
|
CUDA.@allowscalar prop_th[point_index(CartesianIndex{lp.ndim}((x,y,z,t)),lp)...] = exp(im * (x*rmom[1] + y*rmom[2] + z*rmom[3] + t*rmom[4]))*
|
||||||
|
( Spinor{4,SU3fund{Float64}}(ntuple(i -> SU3fund{Float64}(ntuple(j -> (j==c)*vals[i],3)...),4)) )/(sum((sin.(rmom)) .^2) + (dpar.m0+ 4.0 - sum(cos.(rmom)))^2)
|
||||||
|
end end end end
|
||||||
|
elseif s == 2
|
||||||
|
vals = (0.0,dpar.m0 + 4.0 - sum(cos.(rmom)),sin(rmom[1]) - im *sin(rmom[2]),-sin(rmom[3])+im*sin(rmom[4]))
|
||||||
|
for x in 1:lp.iL[1] for y in 1:lp.iL[2] for z in 1:lp.iL[3] for t in 1:lp.iL[4]
|
||||||
|
CUDA.@allowscalar prop_th[point_index(CartesianIndex{lp.ndim}((x,y,z,t)),lp)...] = exp(im * (x*rmom[1] + y*rmom[2] + z*rmom[3] + t*rmom[4]))*
|
||||||
|
( Spinor{4,SU3fund{Float64}}(ntuple(i -> SU3fund{Float64}(ntuple(j -> (j==c)*vals[i],3)...),4)) )/(sum((sin.(rmom)) .^2) + (dpar.m0+ 4.0 - sum(cos.(rmom)))^2)
|
||||||
|
end end end end
|
||||||
|
elseif s == 3
|
||||||
|
vals = (-sin(rmom[3])+im*sin(rmom[4]),-sin(rmom[1])-im*sin(rmom[2]),dpar.m0 + 4.0 - sum(cos.(rmom)),0.0)
|
||||||
|
for x in 1:lp.iL[1] for y in 1:lp.iL[2] for z in 1:lp.iL[3] for t in 1:lp.iL[4]
|
||||||
|
CUDA.@allowscalar prop_th[point_index(CartesianIndex{lp.ndim}((x,y,z,t)),lp)...] = exp(im * (x*rmom[1] + y*rmom[2] + z*rmom[3] + t*rmom[4]))*
|
||||||
|
( Spinor{4,SU3fund{Float64}}(ntuple(i -> SU3fund{Float64}(ntuple(j -> (j==c)*vals[i],3)...),4)) )/(sum((sin.(rmom)) .^2) + (dpar.m0+ 4.0 - sum(cos.(rmom)))^2)
|
||||||
|
end end end end
|
||||||
|
else
|
||||||
|
vals = (-sin(rmom[1])+im*sin(rmom[2]),sin(rmom[3])+im*sin(rmom[4]),0.0,dpar.m0 + 4.0 - sum(cos.(rmom)))
|
||||||
|
for x in 1:lp.iL[1] for y in 1:lp.iL[2] for z in 1:lp.iL[3] for t in 1:lp.iL[4]
|
||||||
|
CUDA.@allowscalar prop_th[point_index(CartesianIndex{lp.ndim}((x,y,z,t)),lp)...] = exp(im * (x*rmom[1] + y*rmom[2] + z*rmom[3] + t*rmom[4]))*
|
||||||
|
( Spinor{4,SU3fund{Float64}}(ntuple(i -> SU3fund{Float64}(ntuple(j -> (j==c)*vals[i],3)...),4)) )/(sum((sin.(rmom)) .^2) + (dpar.m0+ 4.0 - sum(cos.(rmom)))^2)
|
||||||
|
end end end end
|
||||||
|
end
|
||||||
|
|
||||||
|
end
|
||||||
|
|
||||||
|
prop_th .= exp(-4*Nsteps*int.eps*sum(sin.(rmom./2).^2))*prop_th
|
||||||
|
|
||||||
|
|
||||||
|
#compute Sum{x} D^-1(x|y)P(y)
|
||||||
|
|
||||||
|
@timeit "Solving propagator and flowing" begin
|
||||||
|
|
||||||
|
function krnlg5!(src)
|
||||||
|
b=Int64(CUDA.threadIdx().x)
|
||||||
|
r=Int64(CUDA.blockIdx().x)
|
||||||
|
src[b,r] = dmul(Gamma{5},src[b,r])
|
||||||
|
return nothing
|
||||||
|
end
|
||||||
|
|
||||||
|
CUDA.@sync begin
|
||||||
|
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnlg5!(pwave)
|
||||||
|
end
|
||||||
|
g5Dw!(prop,U,pwave,dpar,dws,lp)
|
||||||
|
CG!(prop,U,DwdagDw!,dpar,lp,dws,10000,1.0e-14)
|
||||||
|
|
||||||
|
flw(U, prop, int, Nsteps ,int.eps, gp, dpar, lp, ymws, dws)
|
||||||
|
end
|
||||||
|
|
||||||
|
|
||||||
|
dif = sum(norm2.(prop - prop_th))
|
||||||
|
|
||||||
|
|
||||||
|
return dif
|
||||||
|
end
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
begin
|
||||||
|
dif = 0.0
|
||||||
|
for i in 1:3 for j in 1:4
|
||||||
|
dif += Dwpw_test(c=i,s=j)
|
||||||
|
end end
|
||||||
|
|
||||||
|
if dif < 1.0e-4
|
||||||
|
print("Flow_tl test passed with average error ", dif/12,"!\n")
|
||||||
|
else
|
||||||
|
error("Flow_tl test failed with difference: ",dif,"\n")
|
||||||
|
end
|
||||||
|
|
||||||
|
|
||||||
|
end
|
||||||
|
end
|
Loading…
Add table
Add a link
Reference in a new issue