QSGW Tutorial for magnetic bcc Fe

This tutorial describes how to make a QSGW self-energy, and draw energy bands for elemental Fe, a bcc metal.

Several tutorials build on this one, in particular the tutorial for the dynamical self-energy.

Preliminaries
Introduction
1. Self-consistent LDA calculation for Fe
2. QSGW calculation for Fe
Additional exercises
Footnotes and References

Preliminaries

Executables in the Questaal build are assumed to be in your path.

The band-plotting part uses the plbnds utility and to prepare figures depicting energy bands, and the fplot graphics utility to make a postscript file.

To view the postscript file, substitute your favorite viewer for apple-style open command assumed here.

This tutorial proceeds in a manner similar to the basic QSGW tutorial and you may want to familiarize yourself with that tutorial first. It also forms a starting point for other tutorials, for example the calculation of the dynamical self-energy, another for k-resolved DOS, and as well as combined of k-resolved and Mulliken-resolved DOS - going over these tutorials after or during this tutorial may help you better follow both.

LDA self-consistency (starting from init.fe)

blm --brief --mag --noforce --rsham~el=-.4,-1.2~npr=27 --autobas~ehmx=-.4~eloc=-4.5~noelo --mix~nit=20~conv=2e-6 --gw~emax=3~gcutb=2.7~gcutx=2.3~vkap=1e-4~mixb=1~wemax=0~nk=8 --nk=16 --gmax=9 fe
echo 'BASIS:' > basp.fe
echo ' Fe RSMH= 1.561 1.561 1.05 1.561 1.561 EH= -0.4 -0.4 -0.4 -0.4 -0.4 RSMH2= 1.561 1.561 1.05 EH2= -1.2 -1.2 -1.2 P= 4.737 4.538 3.892 4.148 5.102 PZ= 5.5 3.9437 4.4' >> basp.fe
echo '--tbeh --v8 -vrsham=2' > switches-for-lm
lmfa ctrl.fe `cat switches-for-lm`
lmf ctrl.fe -vnkabc=8 `cat switches-for-lm ` > out.8
rm -f mixm.fe
lmf ctrl.fe `cat switches-for-lm ` > out.16
cp rst.fe rst.lda

Make the energy bands with spin-orbit coupling and save in bnds.lda:

lmf ctrl.fe `cat switches-for-lm` -vso=1 --quit=band > out.so
lmf ctrl.fe `cat switches-for-lm` -vso=1 --band~fn=syml~scol@sites=1@l=2@spin1~scol2@sites=1@l=2@spin2
plbnds -range=-8,6 -fplot~scl=1,.7~sh -ef=0 -scl=13.6 -lt=1,bold=3,col=0,0,0,colw=1,0,0,colw2=0,1,0 -lbl bnds.fe
cp bnds.fe bnds.lda

QSGW self-consistency

touch moms.h5
rm -r -f meta moms.h5
lmfgwd ctrl.fe `cat switches-for-lm` --job=-1
lmfgwd ctrl.fe `cat switches-for-lm` --job=0 --lmqp~rdgwin --batch~np=4~pqmap@plot
rm -f mixm.fe
lmgw.sh --incount --mpirun-1 "env OMP_NUM_THREADS=16 MKL_NUM_THREADS=16 mpirun -n 1" --mpirun-n "env OMP_NUM_THREADS=4 MKL_NUM_THREADS=4 mpirun -n 4" --tol=2e-5 ctrl.fe > lmgw8.log

Make the energy bands

lmf ctrl.fe `cat switches-for-lm` -vso=1 --quit=band > out.so
lmf ctrl.fe `cat switches-for-lm` -vso=1 --band~fn=syml
plbnds -range=-8,6 -fplot~scl=1,.7~sh -ef=0 -scl=13.6 bnds.fe

Introduction

This tutorial carries out a QSGW calculation for Fe, a ferromagnet with a fairly large moment of 2.2μ_B.

Quasiparticle Self-Consistent GW (QSGW) is a special form of self-consistency within GW. Its advantages are briefly described in the overview, and also in the summary of the basic QSGW tutorial, and also below in the magnetic context.

Self-consistency is rather necessary in magnetic systems, because the magnetic moment is not reliably described by 1-shot GW, and is somewhat ill-defined. For the most part magnetic moments predicted by QSGW are significantly better than the LDA. Moments in itinerant magnets, however, tend to be overestimated because diagrams with spin fluctuations (absent in GW) tend to reduce the magnetic moment. This is true for both QSGW and LDA.

1. Self-consistent LDA calculation for Fe

QSGW requires a starting point. The LDA is a reasonably accurate, and convenient starting point, indeed it is good enough in its own right for many magnetic systems.

Following other LDA tutorials, the input file is built with the blm utility, which requires file init.fe.

Input file setup

Fe is a bcc metal with a lattice constant 5.408 a₀ and a magnetic moment of 2.2 μ_B. The LDA needs some breaking of the symmetry to stabilize a nonzero magnetic moment, so we supply a trial moment of 2.2μ_B.

Cut and paste the contents in the box below into init.fe.

# Init file for Fe
LATTICE
#   SPCGRP=Im-3m  A=5.408
    ALAT=5.408 PLAT= -.5 .5 .5  .5 -.5 .5  .5 .5 -.5
SPEC
    ATOM=Fe  MMOM=0,0,2.2  PZ= 5.5 3.9437 4.4  LMX=4 A=0.025 # Use experimental value for trial Fe moment
SITE
    ATOM=Fe  X=0 0 0

Note: Some of these tags are of minor importance, but omitting they will change the output slightly. They are included here to maintain compatibility with the Questaal test case, gw_fe45.

Supply either the space group (Im-3m) or the bcc lattice vectors. The former is commented out in the file above.

The input file setup and self-consistent cycle are very similar to the PbTe tutorial; review it first if you are not familiar with the structure of the input file, ctrl.fe, or how to set up an LDA calculation from structural information.

Run blm this way. Some of the switches are of minor importance, but they serve a useful purpose as explained below, or they are included to maintain compatibility with the Questaal test case, gw_fe45.

blm --brief --mag --noforce --rsham~el=-.4,-1.2~npr=27 --autobas~ehmx=-.4~eloc=-4.5~noelo --mix~nit=20~conv=2e-6 --gw~emax=3~gcutb=2.7~gcutx=2.3~vkap=1e-4~mixb=1~wemax=0~nk=8 --nk=16 --gmax=9 fe

Note that blm generates these files:

ctrl.fe the main input file
basp.fe contains parameters for the basis set
site.fe contains structural information. To include this information instead directly in the ctrl.fe file instead, add switch --wsite~nofile.

The switches to blm are documented here, but click on the line below for further details.

--brief : sets the verbosity and style in the ctrl file
--ctrl=ctrl : tells blm to write the input file to ctrl.fe, instead of the default (actrl.fe).
--mag : tells blm to set up a spin-polarized calculation.
--noforce : tells blm to suppress calculating forces
--rsham~el=-.4,-1.2~npr=27 :
causes lmf to use tight-binding form of the hamiltonian.
In this mode energies of the smooth Hankel envelope functions is the same for all orbitals and sites (el=-.4,-1.2).
~npr=27 tells blm look find a radius large enough to accommodate at least 28 orbitals
--autobas~ehmx=-.4~eloc=-4.5~noelo
sets parameters for the basis set.
ehmx=-.4 is not used or needed here, but it is translated into the basis set file basp.fe.
This makes basp.fe consistent with the tight-binding mode, if the latter turned off.
eloc=-4.5 tells blm to look capture any core state above −4.5 Ry as a local orbital.
This is deep for the LDA, but in GW the Fe 3p level near −4.4 Ry has non-negligible (~0.05 eV) effect on states near the Fermi level
noelo tells blm to use conventional local orbitals for core-like states, rather than extended ones.
As of this writing the tight-binding mode does not allow extended local orbitals.
If you do not include this tag, PZ for the Fe 3p would be written as 13.9437 in basp.fe, rather than 3.9437.
--mix~nit=20~conv=2e-6:
sets the maximum number of iterations in lmf self-consistency cycle.
This tag is not required; the default (10 iterations) is about how many are needed to make it self-consistent.
~conv=2e-6 is not essential but it is included to maintain compatibility with the Questaal test case, gw_fe45
--gw~emax=3~gcutb=2.7~gcutx=2.3~vkap=1e-4~wemax=0~nk=8
tells blm to prepare for a GW calculation. It does the following:
makes the basis set a bit larger
pushes the default smooth Hankel energies a little deeper to better interpolate the QSGW Σ⁰ between k points where it is calculated;
also sets boundary conditions for the linearization energy higher, to better described unoccupied states.
emax=3 : sets the energy cutoff (Ry) over which the QSGW self-energy Σ⁰ is assumed to be diagonal in the LDA basis.
emax too poses too large a burden on the self-energy interpolator; emax too small begins to affect low-lying states.
This is described in Section II G of Ref. 1.
gcutb=2.7 : sets the G-vector cutoff for the 1-particle basis envelope functions
gcutx=2.3 : sets the G-vector cutoff for the 2-particle basis envelope functions
Alternatively you can do the following and have blm automatically determine both with an internal algorithm:
getgcut
Hazard: The computation time is very sensitive to gcutx=2.3. See Additional Exercise 3.
wemax=0 : Assigns a value to HAM_SIGP_WEMAX. See Additional Exercise 1a.
nk=8 : the k mesh for GW. If you do not supply this mesh, it will use the 1-body (lmf) mesh. But typically
the self-energy varies much more smoothly with k than does the kinetic energy; 16 divisions needed in lmf is expensive and overkill.
--nk=16 : sets the k mesh for the 1-body code, lmf. Use --nk without arguments to let blm select a default for you.
Hazard: using blm’s default value for --nk can be a little dangerous, since a good choice is materials- property- specific.
We use a rather fine mesh here, because Fe has a high density-of-states near the Fermi level. It is shown below how the magnetic moment is affected by the k mesh.
--gmax=9 : sets the G cutoff for plane wave expansion of smooth Hankels.
It is not essential but it is included to maintain compatibility with the Questaal test case, gw_fe45
fe Files associated with this calculation will have extension .fe

Local orbitals

Inspect the file basp.ext . You should see the following

 Fe RSMH= 1.561 1.561 1.045 1.561 1.561 EH= -0.4 -0.4 -0.4 -0.4 -0.4 RSMH2= 1.561 1.561 1.045 EH2= -1.2 -1.2 -1.2 P= 4.738 4.54 3.892 4.148 5.102 PZ= 5.5 3.9437 4.4

PZ=5.5 3.9437 4.4 adds a deep 3p and high-lying 5s and 4d local orbitals to the basis.

It turns out the high-lying Fe 4d state affects the GW potential slightly, enough to affect states near the Fermi level by 0.05-0.1 eV. It matters little in the LDA, but the fidelity of QSGW is high enough that these shifts noticeably affect agreement with ARPES and quantum oscillation measurements (see Ref. 2). In contrast to the LDA, both occupied and unoccupied states contribute to the GW self-energy.

The same story applies to deep 3p core level. Its energy, around −4 Ry, is deep and core-like. In the LDA it can be treated as a core, and it can in GW too; but in GW this effect on states near the Fermi level is non-negligible (~0.05 eV). By its inclusion through PZ it will be promoted from the core to the valence via a (conventional) local orbital. See Additional Exercise 5a.

Modify basp.fe slightly:

echo 'BASIS:' > basp.fe
echo ' Fe RSMH= 1.561 1.561 1.05 1.561 1.561 EH= -0.4 -0.4 -0.4 -0.4 -0.4 RSMH2= 1.561 1.561 1.05 EH2= -1.2 -1.2 -1.2 P= 4.737 4.538 3.892 4.148 5.102 PZ= 5.5 3.9437 4.4' >> basp.fe

This change is of minor importance, but it enables the tutorial output to remain consistent with the Questaal test case, gw_fe45

Starting density

We need one more initialization step to make a self-consistent LDA potential. The LDA needs a trial density, and the most common way to get this is to assemble a Mattheis construction, which constructs a crystal density by overlapping free atomic densities. lmf requires the free-atomic densities to accomplish this. lmfa will make them and store them in file atm.fe.

lmfa ctrl.fe

Self-consistency in the LDA

Before making the density self-consistent, create a file switches-for-lm

echo '--tbeh --v8 -vrsham=2' > switches-for-lm

This step is not essential, but it turns on the tight-binding variant of the basis lmf uses..

Note: for tight-binding form to work a STR category must be present in the ctrl file. Assigning preprocessor variable rsham to 2 causes it to be included because of these lines

% ifdef rsham>1
str     rmaxa={rscut}
        env[mode=1 nel=2 el={el1} {el2}]
% endif

(assigning -vrsham=2 on the command overrides the value rsham=1 assigned in the ctrl file.)

Note: the range of the tight-binding hamiltonian is set by {rscut}. blm set this parameter according to the specification rsham~npr=27.

The density should scarcely depend on whether the tight-binding form is used or not (you can easily verify this by running lmf without these switches), but when we get the GW stage later, the tight-binding form makes the self-energy Σ⁰ shorter ranged, which helps with interpolation. In the step above the switches are packaged into file switches-for-lm. It seems a roundabout way to include these arguments on the command line, but the GW shell script used later on (lmgw.sh) knows about switches-for-lm, and will add them at appropriate places to various executables.

Next we are ready to make a self-consistent density. To check for convergence in k, we initially converge a self-consistent density with a relative coarse k mesh:

lmf ctrl.fe -vnkabc=8 `cat switches-for-lm ` > out.8

(Note -vnkabc=8 overrides the preprocessor variable nkabc=16 assigned in the ctrl file.) This step overlaps the atomic density taken from file atm.fe generated by lmfa, generates an output density, and iterates the until the input and output densities are the same (self-consistency).

Next run lmf with the pre-specified 16×16×16 mesh:

rm -f mixm.fe
lmf ctrl.fe -vnkabc=16 `cat switches-for-lm` > out.16

This is a very fine mesh, as explained below. (Note that -vnkabc=16 is redundant since that is the value assigned in ctrl.fe).

Note: the previous command ran lmf in serial mode. If you have multiple cores available, you can run it as follows (assuming a standard MPI launcher, selecting 6 cores)

mpirun -n 6 lmf ctrl.fe -vnkabc=16 `cat switches-for-lm ` > out.16

By comparing out.8 and out.16 we can get a sense of the k convergence. Do the following:
grep -A 3 ' BZINTS: Fermi energy:' out.8
At the final (self-consistent) iteration you should see

 BZINTS: Fermi energy:     -0.006155;  14.000000 electrons;  D(Ef):   16.040
         Sum occ. bands:  -24.4369777  incl. Bloechl correction:  -0.0029416
         Mag. moment:       2.233924

Do the same for out.16: grep -A 3 ' BZINTS: Fermi energy:' out.16

 BZINTS: Fermi energy:     -0.003997;  14.000000 electrons;  D(Ef):   14.372
         Sum occ. bands:  -24.4327023  incl. Bloechl correction:  -0.0012015
         Mag. moment:       2.206079

The finer k mesh causes the density of states at E_F to drop by 10%. Also note the change in total energy:
egrep '^[hc]' save.fe should yield something similar to

h nkabc=8 rsham=2 mmom=2.1296839 ehf=-2541.0400555 ehk=-2541.0208036
c nkabc=8 rsham=2 mmom=2.2339242 ehf=-2541.0377175 ehk=-2541.0377224
c nkabc=16 rsham=2 mmom=2.2060785 ehf=-2541.0386479 ehk=-2541.038654

The line beginning with ‘h’ shows the results derived from the Mattheis construction. At self-consistency (lines beginning with ‘c’), the Harris-Foulkes and Kohn-Sham total energies are nearly identical. (If they are not, something is wrong with the calculation!) Note that the self-consistent magnetic moment (2.20 μ_B), is very close to the experimental value. It drops by 0.03 in going from nkabc=8→nkabc=16. The total energy also drops, by about 1 mRy. It should be smaller than the ‘Bloechl correction’ from nkabc=8, which is an estimate of the third order contribution to the energy band sum (a component of the total energy) from the classical second-order tetrahedron integration scheme (see Ref. 3 and the page on Brillouin zone integration). A detailed discussion of the output as lmf iterates to self-consistency can be found in the PbTe tutorial.

LDA energy bands

In this section we compute the energy bands, which we will compare later with the QSGW result.

When you include command-line argument --band when running lmf, it does not go through the self-consistent cycle but operates in a special band-generating mode. In this mode you also need to supply the special lines in k space on which to draw the bands. This information we will stored a file syml.fe, whose structure is documented here.

The lmchk utility has a facility to build a template syml.fe completely automatically, provided you have the spglib library linked (the library must be installed and a path to it included in flags.mk). If you do have it installed, simply do

lmchk ctrl.fe --syml

This will generate syml.fe with a fairly extensive list of lines, probably more lines than you like.

If you do not have spglib installed, or if you want to modify the default --syml generates, you can run lmchk in semi-automatic mode. In this mode you add extra information to --syml where you specify the k points and the connecting vectors yourself.

The fully automatic mode prints to stdout a template instruction for the semiautomatic mode, for this purpose. In this example it reads

--syml~n=71~mq~lblq:G=0,0,0,H=1/2,-1/2,1/2,P=1/4,1/4,1/4,N=0,0,1/2~lbl=G-H-N-G-P-H|P-N

You can also look on the symmetry lines page which has templates for some crystal systems, including the bcc lattice that Fe takes. For eample the following line for bcc systems appears on that page:

lmchk ctrl.ext --syml~n=51~mq~lblq:G=0,0,0,H=1/2,-1/2,1/2,N=0,0,1/2,P=1/4,1/4,1/4~lbl=GHNGPHN

Either of the above instructions serve as a good starting point to construct a serviceable syml.fe. For this tutorial we will choose the following. Make syml.fe with

lmchk ctrl.fe --syml~n=51~mq~lblq:G=0,0,0,H=1/2,-1/2,1/2,N=0,0,1/2,P=1/4,1/4,1/4~lbl=GHNPG

Inspect syml.fe and confirm that it conforms to the standard format.

Typically when the bands are plotted the Fermi level E_F is assigned value 0 (see Ref. 4). The Fermi level generated in the last iteration is stored in wkp.fe and also in rst.fe. Normally lmf reads E_F from wkp.fe, but if this file doesn’t exist you can tell lmf to read it from rst.fe by adding --efrs to the command line.

Do one of the following to generate a bands file, bnds.fe.

lmf ctrl.fe `cat switches-for-lm` --band~fn=syml
lmf ctrl.fe `cat switches-for-lm` --efrs --band~fn=syml

Either way, E_F should be the same, -0.003997. Either of these instructions should generate an identical bnds.fe, containing bands along four symmetry lines (Γ-H, H-N, N-P, and P-Γ). For a brief description of the structure of this file, see here.

Hazard: E_F stored in wkp.fe or rst.fe corresponds to the potential that generated rst.fe (as an output density), not the potential rst.fe itself would generate. For a nearly self-consistent density the difference is small, but if you want to be sure, do the following before generating the bands

lmf ctrl.fe `cat switches-for-lm` --quit=band

E_F corresponding to the potential generated from rst.fe comes out to -0.004019, which is close to, but not identical to the last iteration in the self-consistency cycle. This last band pass calculation updates contents of wkp.fe to the exact value corresponding to the potential.

To make a postscript figure of the bands, run the first line for the majority spin and the second line for the minority spin

plbnds -range=-8,6 -fplot~scl=1,.7~sh -ef=0 -scl=13.6 -spin1 -lbl bnds.fe
plbnds -range=-8,6 -fplot~scl=1,.7~sh -ef=0 -scl=13.6 -spin2 -lbl bnds.fe

Either of these instructions generates a postscript file, fplot.ps. Use your favorite postscript viewer (e.g. on a mac, do open fplot.ps) for a visual rendition of the file.

Calculate bands including spin orbit coupling, which couples the spin (to be compared with QSGW bands below)

lmf ctrl.fe `cat switches-for-lm` -vso=1 --band~fn=syml
plbnds -range=-8,6 -fplot~scl=1,.7~sh -ef=0 -scl=13.6 bnds.fe
cp bnds.fe bnds.lda

It is possible (but a little tedious) to use the bnds file just generated and render both spins in the same figure. Instead we explain how to accomplish the result by including spin orbit coupling, which couples the spins. We assume SO coupling doesn’t modify the charge density significantly, which is a good approximation here. First, run lmf to find the Fermi level when SO coupling is included.

lmf ctrl.fe `cat switches-for-lm` -vso=1 --quit=band > out.so

You should see that the RMS change in the charge density is small, of order 10⁻⁵, confirming that SO coupling minimally perturbs the density (grep DQ out.so).
You can also confirm that the Fermi level hardly changes (grep -A 3 ' BZINTS: Fermi energy:' out.so). It should change slightly, to -0.003949.

Generate the bands again, with SO coupling turned on this time

lmf ctrl.fe `cat switches-for-lm` -vso=1 --band~fn=syml
plbnds -range=-8,6 -fplot~scl=1,.7~sh -ef=0 -scl=13.6 bnds.fe

Use your favorite postscript viewer (e.g. open fplot.ps) for a visual rendition of the file. You should see essentially the majority band and a nearly parallel minority band, shifted by about 2 eV.

With color weights you can decompose the eigenfunction with a Mulliken analysis and use the Mulliken population derived from a subset of orbitals as a weight assigned to a color. Thus you can use colors to distinguish different orbitals. How to assign color weights can be done in multiple ways: this page documents how to augment the --band switch for this purpose (take especial note of --band~scol).

Suppose we want to distinguish majority spin from minority spin. The following will do this, selecting out the d orbitals (the spin resides almost entirely on the Fe 3d orbitals)
```
lmf ctrl.fe `cat switches-for-lm` -vso=1 --band~fn=syml~scol@sites=1@l=2@spin1~scol2@sites=1@l=2@spin2
```
Inspect the first line of bnds.fe. It should look like this
```
86  -0.00395     2  lbl=GHNPG col1=5:9,30:34,39:43 col2=48:52,73:77,82:86
```
col1 (col2) selects out majority (minority) spin d orbitals. To see the entire table of orbitals and how they are ordered in the hamiltonian, do
```
lmf ctrl.fe `cat switches-for-lm` --pr50 --quit=ham
```
and look for the Orbital positions table. (This table is given for the first spin only; in the SO case the second spin follows the first, with the same ordering.)

Convert the bands into a postscript file

plbnds -range=-8,6 -fplot~scl=1,.7~sh -ef=0 -scl=13.6 -lt=1,bold=3,col=0,0,0,colw=1,0,0,colw2=0,1,0 -lbl bnds.fe

Render the figure (e.g. open fplot.ps) to see how the majority d (red), the minority d (green), and sp orbitals can be distinguished. Note where the bands cross, the bands of a particular colors pass smoothly through the crossing. That indicates these crossings are trivial, not topological.

In this instance we distinguish d states of t_2g symmetry from those of e_g symmetry.

lmf ctrl.fe `cat switches-for-lm` -vso=1 --band~fn=syml~scol@sites=1@l=2@m=-2,-1,1~scol2@sites=1@l=2@m=0,2
plbnds -range=-8,6 -fplot~scl=1,.7~sh -ef=0 -scl=13.6 -lt=1,bold=3,col=0,0,0,colw=1,0,0,colw2=0,1,0 -lbl bnds.fe

(To affirm that t_2g states correspond to m=-2,-1,1 and the e_g states correspond to m=0,2 see the table of spherical harmonics.)

The rigid-band model of exchange splitting works pretty well here: green bands and red bands are approximately parallel.

In this figure colors correspond to:

black : sp character
red : majority d character
green : minority d character

For future reference, preserve the restart file which contains the LDA density

cp rst.fe rst.lda

2. QSGW calculation for Fe

Setup: the GWinput file

QSGW is executed through a shell script lmgw.sh, and it requires a separate input file, GWinput.

Create a template GWinput with:

lmfgwd ctrl.fe `cat switches-for-lm` --job=-1

Usually you don’t need to tinker with GWinput. But sometimes you do, and its structure is documented here. The legacy GW code contains three sections. For now you can still enter data in those sections, but most input can be supplied in the ctrl file, and this is the preferred way. As of this writing, the product basis section still must be read through GWinput; some description is given below.

lmfgwd normally generates only the product basis category in GWinput. You can add a general category too (add --classicGWinput when invoking lmfgwd), as explained here. This is because in the modern implementation most tags in the general category can be read through the ctrl file. For the corresponding tags in the GW category of the ctrl to tags in GWinput, see the GWinput documentation.

The product basis category is generated. Here we break down the various parts of it

It begins with

  tolerance = minimum eigenvalue in PB overlap to reduce linear dependency
  3e-4 3e-4 1e-3

The first line is a comment; numbers in the next line correspond to $\ell{=}0,1,2$ as explained below. The last number is propagated to all $\ell$ in the product basis larger than the highest $\ell$ on the line (2 in this case).

In addition to the one-particle basis, the GW codes require a two-particle basis to represent two-particle objects such as the polarizability and the coulomb interaction $v(\mathbf{r},\mathbf{r}^\prime)$ . Just like the one-particle basis, the two-particle basis has an augmentation part and an interstitial part, as explained in detail in Section II A of Ref. 1. The augmentation or partial-wave portion of the two particle basis is called the product basis $B$ .

$B$ consists of products all possible pairs of partial waves $\{\phi_\ell, \dot\phi_\ell, \phi^z_\ell\}$ (the last applies to any local orbitals present). Each $B$ has a particular $\ell$ since the product of two $\phi_\ell(r)Y_{\ell{m}}$ can be written as a linear combination of $B_\ell(r) Y_{\ell{m}}$ . In its raw form, $B_\ell$ is unnecessarily large since many of the product functions contribute little to the relevant hilbert space (i.e. where $v$ is not small).

To reduce its rank the product basis maker will diagonalize the overlap matrix of all functions of a given $\ell$ , and discard any functions whose overlap eigenvalue is larger than the tolerance for that $\ell$ (3e-4 for $\ell{=}0,1$ and 1e-3 for $\ell{\geq}2$ in this case). This greatly reduces the size of the product basis and significantly improves the efficiency of the code.

  lcutmx(atom) = l-cutoff for the product basis
 4

The first line is a comment; numbers in the next line correspond to $\ell$ cutoff $\ell_\mathrm{max}$ . Any $\ell$ component of a partial wave product exceeding $2\ell_\mathrm{max}$ is discarded.
Note: this line must contain one number for every atom in the unit cell. Fe has only one atom, so there is only a single number.

The following rules of thumb work pretty well

For light (second and third row) atoms, $\ell{=}3$ is usually adequate
For transition metal and moderately sized large atoms , use $\ell{=}4$
For larger atoms, e.g. Sr, increasing $\ell$ to 5 can have a non-negligible effect
For 4f and 5f systems, use $\ell{=}6$ .

  atom   l  nnvv  nnc ! nnvv: num. radial functions (valence) for augmentation-waves. nnc = num. for core.
  0    2    3
  1    3    1
  2    3    0
  3    2    0
  4    2    0

This table tells the GW code how many radial partial waves are present for each $\ell$ and atom in the unit cell. It is a property of the basis set and there is no reason to meddle with this table.

  atom   l    n  occ  unocc  :Valence(1=yes, 0=no)
  0    1    1    1   ! 4S_p *
  0    2    0    0   ! 4S_d
  1    1    1    1   ! 4P_p
  1    2    0    0   ! 4P_d
  1    3    1    0   ! 3P_l
  2    1    1    1   ! 3D_p
  2    2    0    0   ! 3D_d
  2    3    0    1   ! 4D_l
  3    1    0    1   ! 4F_p
  3    2    0    0   ! 4F_d
  4    1    0    0   ! 5g_p
  4    2    0    0   ! 5g_d

This table tells the product basis maker which partial waves to include when constructing the product basis. Each partial wave is associated with a site (atom), $\ell$ , and radial index (n). Columns labeled occ and unocc tell the maker to include this particular partial wave is the first (occ) or second (unocc) function in the product basis. Thus $\phi_{4s}$ is included in the list of both first and second functions, while $\dot\phi_{4s}$ is included in neither, and $\phi_{4f}$ is included only in the second. As a rough guide, you can associate the first function with occupied states and the second with unoccupied ones.

  atom   l    n  occ unocc   ForX0 ForSxc :CoreState(1=yes, 0=no)
  0    1    0    0      0    0    ! 1S *
  0    2    0    0      0    0    ! 2S
  0    3    1    0      0    0    ! 3S
  1    1    0    0      0    0    ! 2P

This table is similar to the preceding one, but it corresponds to core levels. Here the core level $\phi_{3s}$ is included in the first list, but not the second. The last two columns under ForX0 and ForSxc tell the GW code to include this particular partial wave in the polarizability (ForX0) or self-energy (ForSxc). The modern implementation does not have this capacity yet, so you should keep these values at 0.

lmfgwd generates a reasonable set of default values. To see why $\phi_{3s}$ was included but no other, inspect this table of core-level energies the free-atom code lmfa generates:

 core:        ecore       node at      max at       c.t.p.   rho(r>rmt)
   1s    -513.13167         0.000       0.038       0.077     0.000000
   2s     -60.12275         0.079       0.223       0.349     0.000000
   2p     -51.29636         0.000       0.180       0.383     0.000000
   3s      -6.94739         0.328       0.701       1.023     0.000504

All the core levels are very deep, but $\phi_{3s}$ is shallow enough that it might affect the completeness of the basis, and lmfgwd included it (it is marginal whether it affects anything or not).

lmfgwd normally generates only the product basis category in GWinput. You can add a general category too (add --classicGWinput when invoking lmfgwd), as explained here. It is rarely needed in the modern implementation most tags in the general category can be read through the ctrl file. For the corresponding tags in the GW category of the ctrl to tags in GWinput, see the GWinput documentation.

Move to the dynamical self-energy tutorial

Create a template with:

$ lmfgwd --job=-1 --sigw --ib=1:9 ctrl.fe

Switch --sigw makes small changes to file.ext in preparation to make the dynamical self-energy. It isn’t necessary for the QSGW calculation of this tutorial, but it is needed for the follow-on tutorial that makes dynamical self-energy and interacting spectral functions.

Switch --ib=1:9 is used in 1-shot GW mode. It isn’t relevant for this tutorial, but sets list of QP states for which the dynamical self-energy is made in the dynamical self-energy tutorial.

We are particularly interested in Fermi liquid properties, involving states near the Fermi surface. The raw GWinput template will generate a reasonable self-energy, but the 3s and 3p core levels affect the Fermi surface enough that we need to treat them at a little better level of approximation than the template gives you.

Edit GWinput:

$ nano GWinput

In the core part of the product basis you should see these lines:

  atom   l    n  occ unocc   ForX0 ForSxc :CoreState(1=yes, 0=no)
  0    1    0    0      0    0    ! 1S *
  0    2    0    0      0    0    ! 2S
  0    3    0    0      0    0    ! 3S
  1    1    0    0      0    0    ! 2P
  1    2    1    0      0    0    ! 3P

With switches as given, no core level participates in the product basis, polarizability or self energy, except that the 3p is included in the product basis (occ=1). For accurate description of the Fermi surface the 3s also needs to be included; moreover, both 3s and 3p need to be included in the polarization (ForX0 and self-energy (ForSxc).

Hazard: core levels are calculated independently of the valence levels, and there is a slight residual nonorthogonality that can cause problems if the core levels are too shallow. This can be a serious issue and a safer approach is to include these levels in the valence through local orbitals, though in this case the levels are deep enough that the present treatment is adequate.

Change the 3s and 3p lines as follows:

  atom   l    n  occ unocc   ForX0 ForSxc :CoreState(1=yes, 0=no)
  ...
    1    0    3    1    0      1    1    ! 3S
  ...
    1    1    2    1    0      1    1    ! 3P

QSGW self-consistency

QSGW self-consistency is executed through a shell script lmgw.sh, This page provides a user’s guide to the lmgw.sh script, while this page explains the various executables lmgw.sh runs to perform one QSGW cycle.

Things to check before executing lmgw.sh
If you are executing a job in a fresh directory, these steps are not needed. But if you have already run a QSGW calculation in this directory, there are residual files that may affect the results. They are kept because they can be beneficial at times, but they can also lead to confusion if you are not aware of their effects.

meta directory
This keeps track of the last iteration executed, and whatever steps were run in that iteration. It is useful because if a job aborts before completion you can restart it where it left off. But if you want to start fresh, you should remove this directory.
sigm.fe
contains a QSGW self-energy Σ⁰. If you want to start from the LDA, remove this file.
mixm.fe and mixsigma
are mixing files from prior iterations (for the density and Σ⁰, respectively). They are used to accelerate convergence to self-consistency. Normally you would remove both files, but there are times when it makes sense to keep them.
*.h5
a number of hdf5 files are generated by GW. You should remove them, except possibly sig.h5. This file contains the same information as sigm.fe but is read by the GW self-energy maker (lmsig) for Anderson mixing of prior iterations to assist convergence to self-consistency. If sig.h5 is present, it should be consistent with sigm.fe (and not present if starting from the LDA). If you do start from some sigm.fe, make sig.h5 corresponding to it as follows: ln -s sigm.fe sigm; s2s5

Note: lmgw.sh keeps track of what it executed last (it runs a family of executables in a sequence, and repeats this sequence iterating to self-consistency). If you are not starting from scratch but don’t want to pick up where you left off, start with a fresh directory or clean the existing one with the following

touch moms.h5
rm -r -f meta *.h5

The Fe system is small enough you can run lmgw.sh in serial mode. For the serial mode, do the following:

rm -f mixm.fe
lmgw.sh --incount --tol=2e-5 ctrl.fe >lmgw8.log

Note 1: lmgw.sh picks up command-line arguments from switches-for-lm. --tol=2e-5 tells lmgw.sh to stop iterating when the RMS change in Σ⁰ relative to the previous iteration is less than 10⁻⁵.

Note 2: to see the command-line options lmgw.sh knows about, do the following:
lmgw.sh -h
or visit this page.

In serial mode, this calculation is manageable: on a Macbook Pro it takes ~1000 seconds/iteration to execute. Since the calculation requires about 11 iterations to run, the total execution time is several hours.

The calculation executes much faster if you can run a job with MPI and/or OpenMP, but the setup takes more work.

The QSGW implementation has multiple levels of parallelization. To execute, the cores have to be factored into MPI processes and multi-core threading (OpenMP and threaded libraries). Calling n_c the number of cores, n_p the number of MPI processes and n_t the number of threads, the product n_p×n_t ideally should be n_c.

Note: as a matter of execution efficiency, it is usually better to increase n_p and keep n_t relatively small. On the other hand, each MPI process requires its own allocation of memory (in contrast to the multithreading, which uses shared memory), so if n_p is too large, you may run out of memory.

Unfortunately there is no generic prescription for how to run lmgw.sh in parallel mode, as the specific instructions depend on hardware and compiler, and what resources are available. This page provides a general description of to use lmfgwd to facilitate setting up a calculation using lmgw.sh in a parallel environment.

The description below is written for generic single machine with 16 cores running on a single node. Thus we will choose n_p=4, which also makes n_t=4.

The code runs more efficiently if you make a pqmap file. The GW algorithm performs an internal integration over k, and must do so for every irreducible q point. The self-energy maker lmsig (mostly) treats each qk combination as an independent unit and it can parellelize over them (but see Ref. 5). How to optimally distribute the MPI processes over these combinations is non-trivial, and the purpose of pqmap is to tell lmsig how to distribute them over the MPI processes.

Constructing a good pqmap by hand is a painful process, and for that reason lmfgwd has built in a pqmap maker, which makes the construction relatively painless. This command

lmfgwd ctrl.fe `cat switches-for-lm` --job=0 --lmqp~rdgwin --batch~np=4~pqmap@plot

makes file pqmap-4 and also plot.pq4.
Note: −np is appended to the file name to distinguish pqmap for different choices of n_p.

pqmap is a simple ASCII file, whose structure is described here. It is a somewhat tiresome business to visualize how pqmap organizes things by direct inspection of the file, but a graphic rendition is easily accomplished with plot.pq4, a file which is generated concurrently with pqmap-4. Do the following

fplot -f plot.pq4

Use your favorite postscript viewer (e.g. open fplot.ps) for a visual rendition of pqmap.

Note: constructing the optimum pqmap is very difficult in all but the simplest cases, and indeed lmfgwd’s internal algorithm is quite complex. It generally makes a good pqmap but probably not precisely the optimum one. Nevertheless it does a reasonably good job. You can see how efficiently it “tiles the canvas” (there is usually some dead space) by looking for this line in the output:

 makepqmap: filled pqmap for 29 qp  nstep=1632 (ideal=1575)  nproc=4  nqk=6300  packing fraction 0.9651  avg no proc/q 1.1

It tells you that 6300 qk combinations must be calculated. With the map lmfgwd created the self-energy maker lmsig will execute 1632 loops (a little worse than the ideal 6300/4).

You must tell lmgw.sh how to distribute the available cores between MPI process and threads. The appropriate instruction depends on your particular implementation of MPI and whether you have, e.g. the multi-threaded MKL library, linked in with your code.

A typical MPI launch of an executable in serial mode would look like
env OMP_NUM_THREADS=16 MKL_NUM_THREADS=16 mpirun -n 1
while the corresponding launch for 4 MPI processes and 4 threads would be similar to
env OMP_NUM_THREADS=4 MKL_NUM_THREADS=4 mpirun -n 4 p Thus a 16-process MPI analog of lmgw.sh would look something like:

rm -f mixm.fe
lmgw.sh --incount --mpirun-1 "env OMP_NUM_THREADS=16 MKL_NUM_THREADS=16 mpirun -n 1" --mpirun-n "env OMP_NUM_THREADS=4 MKL_NUM_THREADS=4 mpirun -n 4" --tol=2e-5 ctrl.fe > lmgw8.log &

On a standard Intel X86 machine, this calculation executes in about 90 seconds/iteration.

Further description of how to use parallel mode

What follows is not important for this system because it executes quickly anyway, but it is presented here as a reference for more complex systems, where making fuller use of the parallel capabilities lmgw.sh has can matter.

lmgw.sh runs a heterogeneous set of executables, as described on this page. For optimum performance each one should have is own launcher. lmgw.sh in fact allows you to construct separate launchers for each MPI-parallelized executable it needs to run. Unfortunately this cannot be automated, as the launchers are architecture-specific. Nevertheless lmfgwd will make a skeleton batch script for you, assuming a certain architecture (which you specify; a few choices are available). Here we use the vanilla architecture with 16 cores:

lmfgwd ctrl.fe --job=0 --lmqp~rdgwin --batch~np=4~nodes=1~pqmap@plot~vanilla16~nplm=16~npgwd=16

Adding ~nplm=16~npgwd=16 isn’t necessary but since both lmf and lmfgwd work optimally by parallelizing over k, this assigns all cores to distribute over MPI. For this particular calculation, it runs very fast and so it matters little.

This command will generate file batch-4. At the end of the file you will see:

m1="env OMP_NUM_THREADS=16 MKL_NUM_THREADS=16 mpirun -n 1"
mn="env OMP_NUM_THREADS=4  MKL_NUM_THREADS=4  mpirun -n 4"
ml="env OMP_NUM_THREADS=1  MKL_NUM_THREADS=1  mpirun -n 16"
md="env OMP_NUM_THREADS=1  MKL_NUM_THREADS=1  mpirun -n 16"
mc="env OMP_NUM_THREADS=4  MKL_NUM_THREADS=4  mpirun -n 4"

rm -f mixm.fe
lmgw.sh --incount --mpirun-1 "$m1" --mpirun-n "$mn" --cmd-lm "$ml" --cmd-gd "$md" --cmd-vc "$mc" --split-w0 --tol=2e-5 ctrl.fe &> lmgwmp.log

The first thing to check for is whether lmgw.sh was able to converge to self-consistency.
Do grep more lmgw8.log (or grep more lmgwmp.log):

lmgw: iter 0 of 100 completed in 99s, 102s from start. RMS change in sigma = 5.59e-2  Tol = 2e-5  more=T
lmgw: iter 1 of 100 completed in 107s, 209s from start. RMS change in sigma = 7.24e-3  Tol = 2e-5  more=T
lmgw: iter 2 of 100 completed in 112s, 321s from start. RMS change in sigma = 2.8e-3  Tol = 2e-5  more=T
lmgw: iter 3 of 100 completed in 107s, 428s from start. RMS change in sigma = 1.69e-3  Tol = 2e-5  more=T
lmgw: iter 4 of 100 completed in 89s, 517s from start. RMS change in sigma = 8.55e-4  Tol = 2e-5  more=T
lmgw: iter 5 of 100 completed in 116s, 633s from start. RMS change in sigma = 2.51e-4  Tol = 2e-5  more=T
lmgw: iter 6 of 100 completed in 84s, 717s from start. RMS change in sigma = 1.07e-4  Tol = 2e-5  more=T
lmgw: iter 7 of 100 completed in 94s, 811s from start. RMS change in sigma = 4.31e-5  Tol = 2e-5  more=T
lmgw: iter 8 of 100 completed in 79s, 890s from start. RMS change in sigma = 3.02e-5  Tol = 2e-5  more=T
lmgw: iter 9 of 100 completed in 85s, 975s from start. RMS change in sigma = 2.49e-5  Tol = 2e-5  more=T
lmgw: iter 10 of 100 completed in 78s, 1053s from start. RMS change in sigma = 9.4e-6  Tol = 2e-5  more=F

It converged in 11 iterations. Fe is a transition metal, and it is more complex than a simple sp semiconductor, and takes more cycles to converge. Some information for each iteration is kept in subdirectories 0run, 1run, … .

Note 1: Iteration “0” is analogous to G^LDAW^LDA, which is sometimes referred to as simply “GW” or “G₀W₀”. It is not quite the standard G^LDAW^LDA, because the full matrix structure of Σ⁰ is kept, and the Z factor is taken to be unity. There is a justification for the latter, see the Appendix in Ref. 2. For a justification why QSGW accounts for the missing vertex Γ (GW is an approximation to the exact theory, GWΓ), see Appendix A in Ref. 1.

Note 2: lmgw.sh does not store $\Sigma^0$ itself, but $\bar{\Sigma}^0 = \Sigma^0{-}V_\mathrm{xc}^\mathrm{LDA}$ . $\bar{\Sigma}^0$ may be thought of as an external perturbation added to the LDA potential. Each QSGW iteration a “small loop” where the density is updated in the presence of a fixed $\bar{\Sigma}^0$ . lmf updates the density in the presences of this external potential; there is some change $\delta n$ that updates the Hartree potential, and also induces a change $(\delta V_\mathrm{xc}/\delta n)\delta n$ . This latter is an LDA approximation to the change in self-energy, $(\delta \Sigma^0/\delta n)\delta n$ . If this were a good approximation, very few iterations would be required to reach self-consistency. This is the case in sp semiconductors, but the approximation is less good for Fe. At self-consistency, $\delta n{=}0$ , and the change from the small loop has no effect.

The 0^th iteration is the closest analog to G^LDAW^LDA, and it is useful to see what happens in that iteration.

The first difficulty is that there is an ambiguity in the Fermi level. One choice (the usual one) is to use the Fermi level from the LDA (−-0.003997 Ry). A probably better choice is the one that conserves the total charge in the presence of Σ⁰. To see what it is, do grep Fermi 0run/llmf. The first line, which corresponds to the Fermi energy without the density being updated, is 0.128363 Ry. This is a huge change! In QSGW there is only one Fermi level, and there is no ambiguity.

Nevertheless it QSGW is a not a conserving approximation, or even does it properly conserve charge, because the proper density must be computed from the interacting G, not the quasiparticled G₀. However, by construction, G₀ is optimally close to G, so it does the best job of conserving charge among all possible G₀. This is discussed in some detail in Section IC of Ref. 1.

Second, if only the diagonal part of Σ⁰ is kept, eigenfunctions do not change and the charge density is frozen (except through possible changes in occupation numbers). Thus there is no meaningful way to determine the ground state density, neither the charge density (which affects the Hartree potential) nor the magnetization density.

In systems other than very simple ones, the density is in fact significantly renormalized relative to the LDA. This induces a corresponding change in the effective potential which can be quantified by the inverse of the susceptibility, $\chi^{-1}(x_1,x_2) = \delta V(x_1)/\delta n(x_2)$ . lmf’s “small loop” makes a self-consistent density in the presence of an “external” potential $\bar{\Sigma}^0$ , which effectively finds the change $\delta n$ caused by $\bar{\Sigma}^0$ , using an LDA approximation to $\chi^{-1}$ . This greatly improves convergence to self-consistency in the large loop.

TiSe₂ is an outstanding example that highlights the importance of self-consistency. In the vast majority of cases, G^LDAW^LDA, underestimates the correction to the LDA gap in semiconductors. The reverse occurs in TiSe₂, because of changes to the density. See Ref. 7.

To highlight the consequences of these difficulties in Fe, inspect how the Fermi level E_F, the density-of-states at E_F, and the magnetization evolve in the self-consistency cycle of the small loop. Do grep -A 3 ' BZINTS: Fermi energy:' 0run/llmf.

In the first iteration, which uses the LDA density, the following appears

 BZINTS: Fermi energy:      0.128363;  14.000000 electrons;  D(Ef):   14.970
         Sum occ. bands:  -22.0841962  incl. Bloechl correction:  -0.0009566
         Mag. moment:       2.209625

In the last iteration, it changes to

 BZINTS: Fermi energy:      0.026360;  14.000000 electrons;  D(Ef):   16.029
         Sum occ. bands:  -23.3316584  incl. Bloechl correction:  -0.0010080
         Mag. moment:       2.560232

The Fermi level moves closer to the LDA, but the magnetization is severely overestimated.

Consider also what the magnetization would be if the LDA Fermi level and charge density were used. Use the LDA rst file and Σ⁰ from the first iteration

cp rst.lda rst.fe
cp 0run/sigm.fe .

Run lmf telling it what Fermi level to use

lmf ctrl.fe --ef=-0.003984 `cat switches-for-lm` --quit=rho

You should see the following lines in stdout

 BZWTS : Fermi level externally given as -0.003984 ...
 BZINTS: Fermi energy:     -0.003984;   9.902702 electrons
         Sum occ. bands:  -22.3442920 incl. Bloechl correction:  -0.0006227
         Mag. moment:       1.535422

The electron count is very far from 14 electrons, and the magnetization is severely underestimated.

(Be sure to restore the QSGW potential after this test: do cp 10run/{rst.fe,sigm.fe} .).

File llmf contains the output from the one-body QSGW potential of the last iteration. Confirm that the density is also self-consistent : do grep DQ llmf.

Compare the QSGW Fermi level E_F, the density-of-states D(E_F), and the magnetization have changed relative to the LDA: (grep -A 3 ' BZINTS: Fermi energy:' llmf). You should see

 BZINTS: Fermi energy:      0.027198;  14.000000 electrons;  D(Ef):   24.025
         Sum occ. bands:  -23.5727656  incl. Bloechl correction:  -0.0010326
         Mag. moment:       2.347838

D(E_F) is substantially larger (24.0 Ry⁻¹ in QSGW vs 16.0 Ry⁻¹ in the LDA). This is very important because many-body effects are typically driven by instabilities at the Fermi surface. In this case, however, the large change is probably an artifact of the small electron pocket at the Γ point described in the next section. See Additional Exercise 1.

Quite generally, QSGW d bands are narrower than in the LDA, and are closer to experiment; see Refs. 6 and 8. The magnetic moment (2.35 μ_B) is larger than the experimental value (2.20 μ_B), though a portion of the overestimate can be traced to imperfect k convergence in Σ⁰ (see Additional Exercise 1).

In this respect LDA does better, quite well reproducing the experimental moment. However, this is fortuitous: neither QSGW nor LDA account for spin fluctuations, whose omission universally tends to overestimate the magnetic moment. On the other hand, LDA tends to underestimate local moments in large-moment systems because it underestimates the exchange-correlation magnetic field $B_\mathrm{xc}^\mathrm{LDA}$ ; see Fig. 3 of Ref. 6. In Fe and Ni the two errors nearly cancel in the LDA, while in QSGW only one error is present.

QSGW energy bands

Σ⁰ is an effective non interacting potential with one-particle levels, similar to Hartree Fock or the LDA. The band structure can be drawn in the same way as in the LDA. You can do, e.g.

lmf ctrl.fe `cat switches-for-lm` --efrs --band~fn=syml
plbnds -range=-8,6 -fplot~scl=1,.7~sh -ef=0 -scl=13.6 -spin1 bnds.fe

Use your favorite postscript viewer (e.g. open fplot.ps) to see a picture.

For a better comparison with the LDA, here we explain how to make QSGW energy bands with spin orbit coupling included, and also with color weights. The procedure is essentially identical to the LDA, except that we need to manage one complication. Adding SO coupling reduces the symmetry, and lmf and thus increases the number of irreducible k points. lmf will not be able to read Σ⁰. As a workaround, lmf can Bloch transform Σ⁰ to real space, which does not depend on symmetry.

lmf ctrl.fe `cat switches-for-lm` --wsig:rs; cp sigm2rs.fe sigmrs.fe

Confirm that the lmf can read from sigmrs.fe and that the density is self-consistent, and continues to be (nearly) self-consistent in the SO case.

lmf ctrl.fe `cat switches-for-lm` --rsig:rs --quit=rho
lmf ctrl.fe `cat switches-for-lm` --rsig:rs -vso=1 --quit=rho

Next make the QSGW energy bands and convert them into a postscript file

lmf ctrl.fe `cat switches-for-lm` --rsig:rs -vso=1 --band~fn=syml~scol@sites=1@l=2@spin1~scol2@sites=1@l=2@spin2
plbnds -range=-8,6 -fplot~scl=1,.7~sh -ef=0 -scl=13.6 -lt=1,bold=3,col=0,0,0,colw=1,0,0,colw2=0,1,0 -lbl bnds.fe

Use your favorite postscript viewer (e.g. open fplot.ps) to see a picture.

In this figure colors correspond to:

black : sp character
red : majority d character
green : minority d character

Note the bands at the Γ point, in particular the three minority t_2g lying very near E=0. They fall slightly above 0, though experimentally they should dip slightly below 0 and form a small electron pocket. This is an artifact of incomplete k convergence, as Additional Exercise 1 demonstrates.

Compare QSGW and LDA energy bands

When comparing QSGW to LDA, here we use the better converged results of Additional Exercise 1. If you do not wish to go through that exercise, you rename bnds.fe of the preceding section to bnds.gw.

For this section the LDA (bnds.lda) and QSGW (bnds.gw) energy bands should in your working directory, containing bands along four symmetry lines (Γ-H, H-N, N-P, and P-Γ). Note: this section assumes that both the QSGW and the LDA bands were generated with spin orbit coupling.

Here we focus on the d bands in the energy window between (−5,4) eV. We will use the plbnds utility to render both data sets into a single figure.

The following

plbnds -range=-5,4 -fplot~scl=1,.7~sh -ef=0 -scl=13.6 --nocol -dat=lda bnds.lda
plbnds -range=-5,4 -fplot~scl=1,.7~sh -ef=0 -scl=13.6 --nocol -dat=gw bnds.gw

makes bnd[1234].lda for the four panels of LDA bands, and bnd[1234].gw for the four panels of QSGW bands, and also a script plot.plbnds to make a postscript file using the fplot utility.

Next, modify plot.plbnds to include the LDA bands

mv plot.plbnds plot.plbnds~
awk 'BEGIN {print "%char0 lta=2,bold=4,col=1,0,0"} {if ($1 == "-colsy") {print;sub("gw","lda");sub("ltb","lta");print} else {print}}' plot.plbnds~ >> plot.plbnds
fplot -f plot.plbnds

You should see something similar to this figure

QSGW and LDA bands of Fe

The LDA d bands (dashed red lines) are wider than the QSGW (black). That LDA overestimates the width of narrow bands is a well known tendency. While QSGW does a stellar job for Fe (see Ref. 6), for systems with strong spin fluctuations QSGW improves on the LDA but it still overestimates bandwidths (see Ref. 8).

Detailed comparison of levels at the Gamma point

Examine the first lines of bnd1.lda and compare to bnd1.gw. These are the QP levels at the Γ point. They should be for LDA

            maj s   min s      ---- maj t2g ----        ---- maj eg ---         ---- min t2g ----      ---- min eg ---
 0.00000  -8.4656  -8.0766  -2.2912  -2.2640  -2.2327  -0.9611  -0.9516  -0.3682  -0.3505  ...  -0.3192   1.4624   1.4624

and for QSGW

            maj s   min s      ---- maj t2g ----        ---- maj eg ---         ---- min t2g ----      ---- min eg ---
 0.00000  -8.6189  -8.4244  -1.7645  -1.7400  -1.7073  -0.8913  -0.8818  -0.0971  -0.0781  ...  -0.0481   1.5431   1.5444

The three t_2g and two e_g levels would be exactly degenerate, but are split slightly owing to spin orbit coupling. Thus in the LDA the average t_2g majority level is (-2.2912-2.2640-2.2327)/3 = -2.26 eV.

Energy levels and splittings of d bands at Γ, in eV
Potential	t_2g (maj)	e_g (maj)	t_2g (min)	e_g (min)	e_g−t_2g (maj)	e_g−t_2g (min)	t_2g(min)-t_2g(maj)	e_g(min)-e_g(maj)
LDA	-2.26	-0.96	-0.35	+1.46	+1.30	+1.81	+1.91	+2.42
QSGW	-1.74	-0.89	-0.07	+1.54	+0.85	+1.61	+1.67	+2.43

The “crystal field” (e_g−t_2g) splitting is rather different, reflecting smaller hopping matrix elements in the QSGW case. Moreover, the exchange splitting (e_g(min)−e_g(maj)) or (t_2g(min)−t_2g(maj)) is more orbital-dependent in QSGW, reflecting the fact that unlike the LDA, QSGW has an orbital-dependent potential.

Run the QSGW script as follows:

$ rm mixm.fe
$ lmgwsc --sym --metal --tol=1e-5 --getsigp fe > out.lmgwsc

or faster

$ rm mixm.fe
$ lmgwsc --mpi=6,6 --sym --metal --tol=1e-5 --getsigp fe > out.lmgwsc

The QSGW cycle should complete in a couple of hours, after 9 iterations. When it is finished, do

$ grep more out.lmgwsc

You should see the following:

    lmgwsc : completed iteration 0 of 999  more=T Mon Oct 31 09:05:33 GMT 2016 elapsed wall time 15.4m (0.3h) phpdl1
    lmgwsc : iter 1 of 999  RMS change in sigma = 5.57E-03  Tolerance = 1e-5  more=T Mon Oct 31 09:22:09 GMT 2016 elapsed wall time 32.0m (0.5h) phpdl1
    lmgwsc : iter 2 of 999  RMS change in sigma = 2.32E-03  Tolerance = 1e-5  more=T Mon Oct 31 09:38:12 GMT 2016 elapsed wall time 48.0m (0.8h) phpdl1
    lmgwsc : iter 3 of 999  RMS change in sigma = 4.91E-04  Tolerance = 1e-5  more=T Mon Oct 31 09:54:15 GMT 2016 elapsed wall time 64.0m (1.1h) phpdl1
    lmgwsc : iter 4 of 999  RMS change in sigma = 9.56E-04  Tolerance = 1e-5  more=T Mon Oct 31 10:11:04 GMT 2016 elapsed wall time 80.9m (1.3h) phpdl1
    lmgwsc : iter 5 of 999  RMS change in sigma = 6.64E-04  Tolerance = 1e-5  more=T Mon Oct 31 10:28:16 GMT 2016 elapsed wall time 98.1m (1.6h) phpdl1
    lmgwsc : iter 6 of 999  RMS change in sigma = 8.96E-05  Tolerance = 1e-5  more=T Mon Oct 31 10:45:00 GMT 2016 elapsed wall time 114.8m (1.9h) phpdl1
    lmgwsc : iter 7 of 999  RMS change in sigma = 7.11E-05  Tolerance = 1e-5  more=T Mon Oct 31 11:01:18 GMT 2016 elapsed wall time 131.1m (2.2h) phpdl1
    lmgwsc : iter 8 of 999  RMS change in sigma = 9.82E-06  Tolerance = 1e-5  more=F Mon Oct 31 11:18:00 GMT 2016 elapsed wall time 147.8m (2.5h) phpdl1

The self-consistent cycle ends when the RMS change in Σ⁰ falls below the specified tolerance (–tol=1e-5).

Additional exercises

1. k convergence

As was noted earlier, the QSGW magnetic moment comes out about 7% larger than the measured value of 2.20 μ_B. A portion of this discrepancy originates from incomplete k convergence. To check this, redo the calculation with a 12-division QSGW mesh. You can start from the LDA, but you can also start from Σ⁰ already computed with 8-divisions.

To do this you must rewrite Σ⁰ with 12 divisions. This is possible because lmf can interpolate k points. The following steps accomplish this

echo '--tbeh --v8 -vrsham=2 -vnkgw=12' > switches-for-lm
lmf ctrl.fe `cat switches-for-lm` --wsig:newkp
cp sigm2.fe sigm.fe

You should also reset the directory for a fresh calculation, and update sig.h5 and pqmap-4 (lmsig uses both sigm.fe and sig.h5 for now, even though they contain the same information.)

lmgwclear ctrl.fe
ln -s sigm.fe sigm; s2s5; rm sigm
lmfgwd ctrl.fe --job=0 --lmqp~rdgwin --batch~np=4~nodes=1~pqmap@plot

Make a self-consistent Σ⁰ with 12 divisions:

lmgw.sh --incount --mpirun-1 "env OMP_NUM_THREADS=16 MKL_NUM_THREADS=16 mpirun -n 1" --mpirun-n "env OMP_NUM_THREADS=4 MKL_NUM_THREADS=4 mpirun -n 4" --tol=2e-5 ctrl.fe >> lmgw12.log

On a 16 core X86 machine the computation time increases from 80 sec/iteration to about 700 sec/iteration. You can drive the system to full self-consistency but it turns out nearly all of the change occurs in the first iteration.

Read the magnetic moment from the last line of save.fe. You should find it is near 2.26 μ_B, reduced from 2.35 μ_B obtained with 8 k divisions. That it comes out slightly larger than the experimental value is a good thing. Spin fluctuations universally tend to reduce the magnetic moment. This is a local-moment system so the reduction should be small, but some reduction should expected. See Ref. 6, where QSGW was augmented by DMFT for Ni. In Ni the local moment drops from about 0.76 μ_B to 0.60 μ_B.

It was also noted, that the QSGW DOS at E_F seemed to be anomalously large. Comparing to the 8 k-divisions result you can see that E_F is nearly unchanged by D(E_F) is similar to the LDA result. Do grep -A 3 ' BZINTS: Fermi energy:' llmf to find

 BZINTS: Fermi energy:      0.034081;  14.000000 electrons;  D(Ef):   15.084
         Sum occ. bands:  -23.4664444  incl. Bloechl correction:  -0.0010066
         Mag. moment:       2.258714

QSGW energy bands

Here we construct the QSGW energy bands with spin orbit coupling included, and also with color weights. We can use the same procedure as before.

lmf ctrl.fe `cat switches-for-lm` --wsig:rs; cp sigm2rs.fe sigmrs.fe
lmf ctrl.fe `cat switches-for-lm` --rsig:rs -vso=1 --quit=rho

Make the QSGW energy bands and convert them into a postscript file

lmf ctrl.fe `cat switches-for-lm` --rsig:rs -vso=1 --band~fn=syml
plbnds -range=-5,4 -fplot~scl=1,.7~sh -ef=0 -scl=13.6 -lt=1,bold=3,col=0,0,0,colw=1,0,0,colw2=0,1,0 -lbl bnds.fe

Use your favorite postscript viewer (e.g. open fplot.ps) to see a picture.

In this figure colors correspond to:

black : sp character
red : majority d character
green : minority d character

Note that the t_2g level at Γ now falls slightly below E_F, as it should.

Copy bnds.fe in preparation for the section comparing QSGW to LDA bands.

cp bnds.fe bnds.gw

1a. Band discontinuity

Before closing this Exercise, there is one final point to address. When viewed in high resolution (plbnds -rnge=-1,1 ...) a small discontinuity in the t_2g level between Γ and H becomes apparent. This is an artifact of the interpolation of the self-energy Σ⁰ between points where it is calculated.

You can resolve this by smoothing out the turn-off from low-energy to high energy behavior (this is described in Section II G of Ref. 1). This can be accomplished by setting HAM_SIGP_WEMAX, which for the input file we have, is assigned to preprocessor variable emaxw. Using a negative value for this variable interpolates the transition with a gaussian function.

lmf -vemaxw=-.5 ctrl.fe `cat switches-for-lm` --rsig:rs -vso=1 --quit=rho
lmf -vemaxw=-.5 ctrl.fe `cat switches-for-lm` --rsig:rs -vso=1 --band~fn=syml~scol@sites=1@l=2@spin1~scol2@sites=1@l=2@spin2
plbnds -range=-1,1 -fplot~scl=1,.7~sh -ef=0 -scl=13.6 -lt=1,bold=3,col=0,0,0,colw=1,0,0,colw2=0,1,0 -lbl bnds.fe

The bands now become smooth. There is a small, but minor change to the Fermi level and charge density.

2. Stabilizing Convergence to Self-Consistency with the GW mixing parameter

One way to stabilize relatively slow convergence of the self-energy Σ⁰ is to slow the mixing of input and output self-energies.

In weakly correlated systems this is of no concern, but when systems become correlated or highly heterogeneous it becomes necessary.

As noted on this page, the trial Σ⁰ for the next iteration is constructed by the mixture (1−beta)×Σ^0,in + beta×Σ^0,out .

You can inspect how much of (Σ^0,in,Σ^0,out) pairs are Anderson-mixed to make a new estimate for Σ⁰, by doing grep tj ?run/lsg. These tj indicate what portion of prior iterations is mixed in. Since the tj typically sum up to something on the order of 1/2, it suggests you reduce the mixing beta to 1/2.

The beta to mix Σ⁰ is read through tag GW_MIXBETA, which for the present input file ctrl.fe, is assigned to preprocessor variable gwmixb. (If GW_MIXBETA is not supplied at all, it defaults to unity.)

As an Exercise, add this variable assignment to switches-for-lm:

echo --tbeh --v8 -vrsham=2 -vgwmixb=.5 > switches-for-lm

and repeat the steps of Part 2. You should that you reach essentially the same final result, but by a different path.

Note: Before restarting lmgw.sh, be sure to reset your directory:

lmgwclear fe && cp rst.lda rst.fe && rm -rf meta && rm sig*

You will find that the initial convergence is slower because it has to build up Σ⁰ more slowly from nothing (the initial Σ⁰ is zero), but once Σ⁰ is built up, it converges more stably. You can start with one or two few iterations (add maxit=2 or something similar to the command line when running lmgw.sh) and after it finishes, resume with a smaller beta. This is a typical thing to do anyway, when you find Σ⁰ not converging well.

Note: While a calculation is proceeding you can monitor its convergence (grep more outfile-from-lmgw.sh). If it is not satisfactory, you can kill the job outright (which means you leave the job in an unfinished state, and when lmgw.sh is restarted, it will resume at the step where it left off), or have it stop cleanly at the end of the current iteration (echo 1 > maxit).

Note also that you can let the job continue and change GW_MIXBETA on the fly, by editing the ctrl file, or in this case modifying switches-for-lm.

3. Dependence on gcutb and gcutx

gcutb and gcutx are estimated from the shapes of the envelope functions. The autogenerated parameters (gcutb=3.5 and gcutx=2.8) are somewhat conservative. In this simple case being conservative costs little, but for larger systems the time is very sensitive to gcutx.

Try reducing these numbers by scaling the autogenerated value by 0.9 (getgcut=.9)

blm --brief --mag --noforce --rsham~el=-.4,-1.2 --autobas~ehmx=-.4~eloc=-4.5~noelo --mix~nit=20 --gw~emax=3~getgcut=.9~mixb=1~wemax=0~nk=8 --nk=16 fe
echo --tbeh --v8 -vrsham=2 -vgwmixb=.5 > switches-for-lm

and repeat the steps of Part 2. (switches-for-lm is modified to stabilize the mixing, as explained in Exercise 2.

You should find that the energy bands, magnetic moments, etc, are almost indistinguishable from before. Even getgcut=.8 yields very nearly identical results.

Note: Before restarting lmgw.sh, be sure to reset your directory:

lmgwclear fe && cp rst.lda rst.fe && rm -rf meta && rm sig*

4. Dependence on frequency mesh

The frequency mesh (spacing between frequency points ω_i on the real energy axis) can greatly modify both the memory consumption and execution speed. The frequency integration is described in detail in Section II F of Ref. 1.

For small ω_i, points are spaced linearly, by dw; the spacing gradually increasing with i depending on omg_c. See documentation for lines 29,30 in the GWinput documentation, general category. In the modern GW implementation, the general category is not normally used; instead dw and omg_c are first and second arguments to tag GW_DELRE in the ctrl file. (Note that in the ctrl file, these numbers have Ry units, while in GWinput they have Hartree units.)

Defaults for these numbers are pretty conservative. (Find them with lmfgwd --input | grep -i delre). The default value for dw, 0.02 Ry, can be increased with little reduction in the quality of the results.

In this exercise, use a text editor to add a token delre to the ctrl file, so the GW category has

gw                                    # Parameters for GW
      nkabc={nkgw} gcutb={gcutb} gcutx={gcutx}
      delre=.08 .10

and repeat the steps of Part 2. (Also modify switches-for-lm to stabilize the mixing, as explained in Exercise 2.)

You should find that the energy bands, magnetic moments, etc, are very similar.

Note: Before restarting lmgw.sh, be sure to reset your directory:

lmgwclear fe && cp rst.lda rst.fe && rm -rf meta && rm sig*

5. Dependence on basis set

The basis set is compact, and therefore not complete in the same way a plane wave basis set is. It is tailored to be sufficient complete over the energy window of primary importance (a Ry or so within the Fermi level), but it is important to monitor how strongly the results depend on choice of basis.

In this Exercise we push the envelope function energies (set to −0.4 and −1.2; Ry) down to −0.5 and −1.3; Ry, respectively. Do this with

blm --brief --mag --noforce --rsham~el=-.5,-1.3 --autobas~ehmx=-.5~eloc=-4.5~noelo --mix~nit=20 --gw~emax=3~getgcut~mixb=1~wemax=0~nk=8 --nk=16 fe

Repeat the steps of Part 2. (Also modify switches-for-lm to stabilize the mixing, as explained in Exercise 2.)

You should find that the energy bands, magnetic moments, etc, are very similar.

Note: Before restarting lmgw.sh, be sure to reset your directory:

lmgwclear fe && cp rst.lda rst.fe && rm -rf meta && rm sig*

5a. Fe 3p orbital

The Fe 3p is deep enough that it should have a small effect on the valence levels. Small is not negligible though, and to see what it is, Exercise 5a redoes the calculation of Part 2, putting the Fe 3p in the core. The following remakes the original ctrl and basp files, and then removes the Fe 3p from the basp file.

blm --brief --mag --noforce --rsham~el=-.4,-1.2 --autobas~ehmx=-.4~eloc=-4.5~noelo --mix~nit=20 --gw~emax=3~getgcut~mixb=1~wemax=0~nk=8 --nk=16 fe
sed -i.bak 's/PZ=.*/PZ=0 0 4.4/' basp.fe

You cannot use the same LDA density as before, as valence-core partitioning has changed. The rst file cannot be used and the atomic density must be remade. Clean the directory and remake the atomic density and the GWinput file

lmgwclear fe && rm -rf rst.fe meta && rm sig*
echo --tbeh --v8 -vrsham=2 -vgwmixb=.5 > switches-for-lm
lmfa ctrl.fe `cat switches-for-lm`
lmfgwd ctrl.fe `cat switches-for-lm` --job=-1

Repeat the steps of Part 2.

While the result is not a complete disaster (the energy bands are pretty similar), some things worsen. Notably the magnetic moment comes out a bit too large, about 2.47 μ_B (tail save.fe).

You may also want to consider the impact of the high-lying Fe 4d level.

6. Draw bands with four color weights

With a little work, you can use construct a bnds file with four color weights. lmf can only generate three color weights at one go, so you have to assemble the four weights in two passes. Try the following:

lmf ctrl.fe `cat switches-for-lm` --rsig:rs -vso=1 --band~fn=syml~scol@sites=1@l=2@m=-2,-1,1@spin1~scol2@sites=1@l=2@m=0,2@spin1~scol3@sites=1@l=2@m=-2,-1,1@spin2
cp bnds.fe bnds.a
lmf ctrl.fe `cat switches-for-lm` --rsig:rs -vso=1 --band~fn=syml~scol@sites=1@l=2@m=0,2@spin2
cp bnds.fe bnds.b
plbnds --col4:bnds.b,bnds.fe bnds.a

bnds.fe should now have four color weights.

The following draws a figure with the four weights, assigning red to t_2g (maj), yellow to e_g (maj), blue to t_2g (min), cyan to e_g (min).

plbnds -range=-8,6 -fplot~scl=1,.7~sh -ef=0 -scl=13.6 -lt=1,bold=4,col=0,0,0,colw=1,0,0,colw2=1,1,0,colw3=0,0,1,colw4=0,1,1 -lbl bnds.fe

Footnotes and References

¹ Questaal’s QSGW implementation is based on this paper:
Takao Kotani, M. van Schilfgaarde, S. V. Faleev, Quasiparticle self-consistent GW method: a basis for the independent-particle approximation, Phys. Rev. B76, 165106 (2007)

² A paper on G^LDAW^LDA and its limitations:
M. van Schilfgaarde, Takao Kotani, S. V. Faleev, Adequacy of Approximations in GW Theory, Phys. Rev. B 74, 245125 (2006).

³ Questaal uses an improved tetrahedron integration from Peter Bloechl, which knocks out most of the error in the classical method (the classical method has an error second order in the spacing between k points).

⁴ In a periodic system, the constant part of the potential is arbitrary. Choosing E_F=0 corresponds to a particular choice of the G=0 Fourier component.

⁵ This is not quite true: to make the polarizability and the screened coulomb interaction $W$ , all the k points belonging to a particular q must be collected with and MPI_GATHER instruction. The more processors the k’s are spread over, the slower this step. For that reason the “tiles” in pqmap should be as narrow and tall as feasible.

⁶ A paper on Fe and Ni showing where QSGW succeeds and what its limits are in magnetic transition metals. Lorenzo Sponza, Paolo Pisanti, Alena Vishina, Dimitar Pashov, Cedric Weber, Mark van Schilfgaarde, Swagata Acharya, Julien Vidal, and Gabriel Kotliar ``Self-energies in itinerant magnets: A focus on Fe and Ni,’’ Phys. Rev. B 95, 041112(R) (2016).

⁷ Swagata Acharya, Dimitar Pashov, Alexander N. Rudenko, Malte Rösner, Mark van Schilfgaarde, Mikhail I. Katsnelson Importance of charge self-consistency in first-principles description of strongly correlated systems, npj Comput. Mater 7, 208 (2021).

⁸ Jan M. Tomczak, M. van Schilfgaarde, G. Kotliar, Many-body effects in iron pnictides and chalcogenides – non-local vs dynamic origin of effective masses, Phys. Rev. Lett. 109, 237010 (2012).

Table of Contents

Preliminaries

Introduction

1. Self-consistent LDA calculation for Fe

Input file setup

Local orbitals

Starting density

Self-consistency in the LDA

LDA energy bands

2. QSGW calculation for Fe

Setup: the GWinput file

QSGW self-consistency

QSGW energy bands

Compare QSGW and LDA energy bands

Detailed comparison of levels at the Gamma point

Additional exercises

1. k convergence

QSGW energy bands

1a. Band discontinuity

2. Stabilizing Convergence to Self-Consistency with the GW mixing parameter

3. Dependence on gcutb and gcutx

4. Dependence on frequency mesh

5. Dependence on basis set

5a. Fe 3p orbital

6. Draw bands with four color weights

Footnotes and References