Self-consistency

Summary

Self-consistency is a concept closely linked with mean-field theories. In mean-field treatment of the Schrodinger equation, interacting particles give off a field, and interact with the (mean) field other particles generate. The field a particle senses controls its motion, which in turn affects the field it generates. Mean-field theories link the field a particle generates to the field it senses in a self-consistent manner. For the field to be time-independent and the effective field must be some average field. Thus in the mean-field approximation a particle moves in an average effective external field V^eff(r) which determines the particles’ motion, which in turn determines V^eff(r). Each feeds the other; the condition for which both are satisfied is the self-consistency condition.

Self-Consistency in Density-functional Theory
Self-Consistency in the Coherent Potential Approximation
Self-Consistency in nonequilibrium transport
Self-Consistency in the Quasiparticle Self-Consistent GW Approximation
Self-Consistency in tight-binding models
References

Self-Consistency in Density-functional Theory

Practical implementations of DFT, such as the LDA are instances of a mean-field theory. Formally the density n is determined by minimization of a total energy functional E[n]. In principle it could be done through iterating trial densities n_in until the point of minimum energy is found. In practice this is not possible because realistic schemes must calculate the kinetic energy contribution to E[n] via an effective non-interacting Schrodinger equation. An effective one-particle potential V^eff(r) is generated from the functional derivative V^eff(r)=δE[n]/δn(r). V^eff(r) defines a one-particle Hamiltonian which generates a new density n_out, which differs from n_in. Self-consistency is reached when an n_out=n_in. This density is also the one for which E[n] is minimum, or at least stationary, for certain formulations. The codes here implement the standard Hohenberg Kohn (HK) functional, for which the energy is a minimum, and also the Harris-Foulkes (HF) functional ¹, for which the energy is stationary at self-consistency, though typically it is a maximum point.

Approaching self-consistency can be a tricky business, especially in magnetic systems (or whenever there is a degree of freedom with a very low energy scale associated with it).

Charge mixing, general considerations

Questaal codes follow the usual procedure of mixing a linear combination of input density nⁱⁿ and output density n^out to make a trial n^* as an estimate for the self-consistent density n (see for example Chapter 9 in Richard Martin’s book². In a perfect mixing scheme, it would be possible to infer the self-consistent density n directly from nⁱⁿ and n^out. Instead we must make an estimate n^* from nⁱⁿ and n^out, use it for a new nⁱⁿ find a new n^out, iterating until n^out − nⁱⁿ is small.

Practical implementation of density mixing in Questaal is controlled by the ITER_MIX tag, as explained on this page. Questaal uses two independent techniques to accelerate convergence to the self-consistency condition n^out→nⁱⁿ. First, the quantities are mixed making use of model for the dielectric function. Second, multiple (nⁱⁿ,n^out) pairs (taken from prior iterations) can be used to accelerate convergence (Anderson and Broyden schemes). The contents of ITER_MIX control options all components of the mixing algorithm.

If the static dielectric response is known, n^* can approach the self-consistent density n to linear order in n^out−nⁱⁿ. It is not difficult to show that choosing $n^* = n^\mathrm{in} + \epsilon^{-1}(n^\mathrm{out}{-}n^\mathrm{in}) \quad\quad (1)$

yields $n^* = n$ to linear order in $(n^\mathrm{out}{-}n^\mathrm{in})$ . ε is a function of source and field point coordinates r and r′: ε = ε(r,r′) and in any case it is not given by the standard self-consistency procedure. Nevertheless it can be important and some estimate for ε is needed. An especially pertinent example can be found in the NbFe superlattice tutorial.

$\epsilon^{-1}$ can also be written in reciprocal space, as $\epsilon_{\mathbf{GG}'}^{-1}(\mathbf{q})$ . ${\mathbf{GG}'}$ are the Fourier representation of the periodic part of $\epsilon^{-1}$ , thus characterizing the shape of $\epsilon({\mathbf{r,r}'})$ with $({\mathbf{r,r}'})$ confined to a unit cell. $q$ characterizes the variation of $\epsilon^{-1}$ between unit cells. Since the density is a periodic function of the lattice, only the $q{=}0$ part enters into Eq. 1.

In any case none of the Questaal codes have a purely plane wave representation of the density, so they do something else.

Charge mixing in the full-potential code

lmf uses a three component density: a smooth density n₀ represented as plane waves (carried on a uniform mesh), defined everywhere in space and two local densities: the true density n₁ and a one-center expansion n₂ of the smooth density. (See Section 3.6 of Ref. 6 for a detailed description of augmentation and the density derived from it.)

If the density were expanded solely in plane waves n = Σ_G C_G n_G, a simple mixing scheme would be to mix each C_G using a model dielectric function. The Thomas Fermi approximation provides a reasonable, if rough estimate for ε, which reads in reciprocal space $\epsilon_\mathrm{TF}^{-1}(q) = \frac{q^2}{q^2 + k_{TF}^2} \quad\quad (2)$

Eq.(2) has one free parameter, the Thomas Fermi wave number k_TF. It can be estimated given the total number of electrons qval from the free electron gas formula.

k_F = (3π³/vol×qval)^1/3 = E_F^1/2

This is called the “Kerker mixing” algorithm. The beauty of Kerker mixing is that charges in small G components of the density get damped out, while the short-ranged, large G components do not. One can use the Lindhard function instead. The idea is similar, but the Lindhard function is exact for free electrons; it has the same advantages as Kerker.

The mixing algorithm must mix all three components and it is somewhat involved, though the plane wave part does indeed use the Lindhard function. See fp/mixrho.f for details.

Charge mixing in the ASA

The ASA uses a simplified mixing scheme since the logarithmic derivative parameters P and energy moments of charge Q for each class is sufficient to completely specify the charge density. The density is not explicitly mixed. Since it does not have plane waves, some alternative model must be employed. One has been implemented (which diagonalizes the Madelung matrix and substitutes the eigenvalues for a kind of screened version). It dampens the the small G fluctuations, but at the same time it conceals true physical changes in the moments, and it is not ideal.

The ASA codes have the ability to make a discretized form ε(q=0) explicitly in spirit of the energy moments of charge that serve as a discretized representation of the density. This matrix is a good approximation to ε(q=0) in this context, and its use greatly stabilizes the convergence in difficult cases. The ASA Nb/Fe superlattice tutorial provides a classic illustration of this feature.

In summary the ASA codes (lm, lmgf, lmpg) offer two options:

The q=0 discretized polarization is explicitly calculated
A rough ε is obtained from eigenvalues of the Madelung matrix

There is some overhead associated with the second option, but it is not too large and having it greatly facilitates convergence in large systems. This is particularly important in magnetic metals, where there are low-energy degrees of freedom associated with the magnetic parts that require that a healthy component of n^out for reasonable approach to self-consistency.

Mixing the ASA density

The mixing process reduces to estimating a vector X^* related to the density (e.g. P,Q in the ASA) where δX = X^out − Xⁱⁿ vanishes at Xⁱⁿ = X^*.

Mixing algorithms mix linear combinations of (Xⁱⁿ,X^out) pairs taken from the current iteration together with pairs from prior iterations. If there are no prior iterations, then

X^* = Xⁱⁿ + beta × (X^out − Xⁱⁿ) (3)

It is evident from Eq.(1) that beta should be connected with the dielectric function. However, beta is just a number. If beta=1, X^* = X^out; if beta→0, X^* scarcely changes from Xⁱⁿ. Thus in that case you move like an “amoeba” downhill towards the self-consistent solution. For small systems it is usually sufficient to take beta on the order of, but smaller than unity. For large systems charge sloshing becomes a problem so you have to do something different. This is because the potential change goes as δV ~ G⁻²×δn so small G components of δn determine the rate of mixing. The simplest (but inefficient) choice is to make beta small.

By some method (see above) an estimate ε for the dielectric function is constructed. Construct δn = ε⁻¹ (n^out−nⁱⁿ) and build δX from δn. Then estimate

X^* = Xⁱⁿ + beta × δX (4)

Now beta can be much larger again, of order unity.

The mixing algorithms use yet another device to accelerate the approach to self-consistency. Each iteration produces a new (Xⁱⁿ,X^out) pairs. There are acceleration techniques that given a family (Xⁱⁿ,X^out) pairs some combination of them can be taken that is better than taking any one pair. See ITER_MIX on this page.

It remains to explain how $\epsilon^{-1}$ is estimated in the ASA.

Option 1. Direct calculation of a discretized dielectric function

As seen in Eq. 1, knowledge of the static $\epsilon^{-1}$ greatly facilitates convergence. Since the density is a periodic function of the lattice, we need only the $q{=}0$ part as noted above.

In the ASA, ${\mathbf{G}}$ and ${\mathbf{G}'}$ are not given Instead charge is discretized inside unit spheres. We can accordingly define the ASA bare response function $P^{0}$ , as: $dq_{RL} = P^{0}_{RL,R'L'}dv_{R'L'}$

$dv_{R'L'}$ is the constant potential shift in the channel $L'$ of sphere $R'$ , and $dq_{RL}$ is the induced change in the 0th moment in channel L of sphere R, without taking into account interactions between electrons (noninteracting susceptibility). The static dielectric response function is: $\eta^{-1} = (1-M'P^{0})^{-1}$

where

$M'=M$ + (on-site term = something like the intra-atomic U)

and

$M =$ Madelung Matrix

lm and lmgf can both generate $P^{0}$ , and store it on the disk. If you have generated $P^{0}$ , you can have the mixing procedure construct $\eta^{-1}$ and then screen $(q^{\mathrm{out}}-q^{in})$ by $\eta^{-1}$ . More exactly, it replaces the raw $q^\mathrm{out}$ (generated by lm or lmgf) or lmpg) by ${q^\mathrm{out}}^*=q^\mathrm{in}+\eta^{-1}(q^\mathrm{out}-q^\mathrm{in})$

and then it proceeds with self-consistency in the usual way, using {q^\mathrm{out}}^* in place of the usual $q^\mathrm{out}$ . It is not difficult to show that if $\eta^{-1}$ is the exact inverse dielectric function, $q^\mathrm{out}*$ is the RPA estimate of the self-consistent density. See, for example, Richard Martin’s book, Electronic Structure. Indeed, $\eta$ is sometimes called the “density-density response function”.

To tell lm or lmgf to do this screening, set OPTIONS_SCR=2. The program will crash if it can’t read $P^{0}$ from the disk ( $P^{0}$ is stored in the file psta.ext). You can create $P^{0}$ either with the band program lm or the Green’s function code lmgf. Since the metals treatment of $P^{0}$ is very primitive in the lm code, it is better to use the lmgf to create $P^{0}$ for metals. lmgf can also make $P^{0}$ for the noncollinear case. To run lmgf, you need to add a couple of tokens to the ctrl file. For either program, set OPTIONS_SCR=1 to create $P^{0}$ .

The intra-atomic U is estimated from the second derivative of the sphere total energy wrt to charge. You can update it every $k^{th}$ iteration by adding 10*k to the token SCR=… thus SCR=52 will update the U each $5^{th}$ iteration in the sphere program.

Note: lmpg can also read a $P^{0}$ already generated, but it cannot create it.

If you don’t want to make $P^{0}$ , you can let the codes make a model estimate for you (Option 2 below). It saves you the trouble of making $P^{0}$ , but it is less good than when $P^{0}$ is calculated explicitly.

Option 2. Rough estimate of the dielectric function through modifying eigenvalues of the Madelung matrix

$M$ is a symmetric matrix and has real eigenvalues. It is a discretized form of the coulomb interaction which in reciprocal space is $v(\mathbf{q}) = 1/|\mathbf{q}|^2$ . In a periodic potential, the only allow $(\mathbf{q})$ are the reciprocal lattice vectors $(\mathbf{G})$ .

Eigenvalues correspond to excitation energies of normal modes of $M$ . Inspect the structure of the TF dielectric function, and note that $k_{TF}^2$ plays the role of an energy. Option 2 simply diagonalizes $M$ , uses the TF form Eq. 2 but substitutes the eigenvalues of $M$ for $k_{TF}^2$ , and rotates $M$ back to its atomic representation. This gives an estimate for the “screened” $M$ .

All of these functionalities can be enabled through tag OPTIONS_SCR:

OPTIONS_SCR       opt    i4       1,  1          default = 0
    Use scr to accelerate convergence:
do nothing
Make ASA static response function (see documentation)
Use response to screen output q and ves
Use model response to screen output q
Use response to screen output ves only

LDA+U

LDA+U theory, an extension to LDA, is also a mean-field theory. At sites where U is added there is a self-consistency condition on the site density matrix, which must be incorporated together with the standard charge self-consistency. As of this writing, the density matrix is mixed separately from the density, with its own mixing parameter

Self-Consistency in noncollinear magnetism

In noncollinear extensions to standard DFT, the density turns into a 2×2 matrix, for which at any point r the density may be decomposed into linear combinations of Pauli matrices. The decomposition defines the x,y,z components of the spin density. Using the connection between Pauli matrices and Cartesian coordinates, the spin density represented as spin amplitude and direction. This is how it is represented externally. Internally it is represented at times either as (amplitude, direction), or (when constructing the hamiltonian or charge density) as a 2×2 matrix. Noncollinear magnetism is implemented at present in the ASA only, and the spin orientation a site is assumed to be constant for the whole site. Self-consistency in this context means that the spin at each site is rotated so that the off-diagonal part of the 2×2 spin density matrix vanish.

Self-Consistency in the Coherent Potential Approximation

Alloys consist of different kinds of atoms sitting at a given lattice site, or in the magnetic version, an atom of a particular type (e.g. Fe) with different spin configurations. The CPA replaces a statistical ensemble of atoms with which is taken to be an average, or composite of the true atoms. CPA theory is a way to construct the “average” properties of a composite atom, in an optimal manner, by constructing an the average scattering properties of a site. The theory is naturally formulated in the language of Green’s functions. The CPA has been implemented in the ASA for both magnetic and chemical disorder, in program lmgf. Thus lmgf requires two kinds of self-consistency: an outer loop for charge and an inner loop for the CPA condition. See here for a description of Questaal’s implementation of CPA.

Self-Consistency in nonequilibrium transport

lmpg is a Green’s function code, similar to lmgf but designed for layered systems (periodic in two dimensions but not in the third). One of the best uses of this method is its ability to calculate transport in the Landauer-Buttiker framework. Outside the small bias regime, the potential must be determined self-consistently under nonequilibrium conditions. Questaal’s implementation is described in Ref [2].

Self-Consistency in the Quasiparticle Self-Consistent GW Approximation

Self-consistency has a different purpose in the quasiparticle self-consistent GW approximation (QSGW) [3]. QSGW is based on a kind of self-consistent perturbation theory, where the self-consistency is used to construct an optimal non-interacting hamiltonian by minimizing the difference between it and the many-body hamiltonian, within the GW approximation. Self-consistency enables QSGW to describe optical properties in a wide range of materials rather well, including cases where the local-density and LDA-based GW approximations fail qualitatively. Self-consistency dramatically improves agreement with experiment, and is sometimes essential. QSGW avoids some formal and practical problems encountered in fully self-consistent GW, which is more akin to the mean-field self-consistency described earlier. QSGW handles both itinerant and correlated electrons on an equal footing, without any ambiguity about how a localized state is defined, or how double-counting terms should be subtracted. Weakly correlated materials such as Na and sp semiconductors are described with uniformly high accuracy. Discrepancies with experiment are small and systematic, and can be explained in terms of the approximations made.

Questaal mixes the quasiparticle self-energy $\Sigma^0$ generated by the GW code much in the same way as it mixes charge densities. It uses a mixing parameter beta in Eq. 3 above where in this context X refers to $\Sigma^0$ . It mixes $\Sigma^{0,\mathrm{out}}$ with the input $\Sigma^{0,\mathrm{in}}$ (note that different beta’s are used for the charge and $\Sigma^{0}$ ). The mixing beta to mix $\Sigma^0$ is read through tag GW_MIXBETA.

In addition, the mixing scheme uses an Anderson mixing scheme [7] to mix ( $\Sigma^{0,\mathrm{in}},\Sigma^{0,\mathrm{out}}$ ) combinations from prior iterations when they become available (this information is kept in a file (mixsigma). The scheme is analogous to charge mixing; see the description of ITER_MIX on this page.

To see how much the mixing scheme mixed prior iterations of $\Sigma^0$ in a particular run, do grep tj lsg.

Self-Consistency in tight-binding models

Typically empirical tight-binding models, for which the form of matrix elements is postulated, ignore self-consistency. However, the effects of charge transfer can be important, and the quality tight-binding models can be vastly improved with minimal complexity added to the hamiltonian [4].

This package’s implementation of empirical tight-binding hamiltonians, tbe, has a facility for including electrostatic interactions in a self-consistent manner. Site charges are accumulated from eigenvectors, which is used to make a Madelung potential

Typically tight-binding models ignore self-consistency; however, tbe has a capability to include potential changes from charge transfer in a self-consistent manner. The site charges generate an electrostatic potential through a Madelung matrix, and must be determined self-consistently.

References

[1] M. Foulkes and R. Haydock, Tight-binding models and density-functional theory, Phys. Rev. B39, 12520 (1989). See also comment 3 on this page.

[2] S. V. Faleev, F. Leonard, D. A. Stewart, and M. van Schilfgaarde, Ab initio tight-binding LMTO method for nonequilibrium electron transport in nanosystems, Phys. Rev. B71, 195422 (2005).

[3] Takao Kotani, M. van Schilfgaarde, S. V. Faleev, Quasiparticle self-consistent GW method: a basis for the independent-particle approximation, Phys. Rev. B 76, 165106 (2007).

[4] M. W. Finnis, A. T. Paxton, M. Methfessel and M. van Schilfgaarde, Crystal Structures of Zirconia from First Principles and Self-Consistent Tight Binding, Phys. Rev. Lett. 81, 5149 (1998)

[5] R. M. Martin, Electronic Structure, Cambridge University Press (2004).

[6] Dimitar Pashov, Swagata Acharya, Walter R. L. Lambrecht, Jerome Jackson, Kirill D. Belashchenko, Athanasios Chantis, Francois Jamet, Mark van Schilfgaarde, Questaal: a package of electronic structure methods based on the linear muffin-tin orbital technique, Comp. Phys. Comm. 249, 107065 (2020).

[7] D. G. Anderson. Iterative procedures for nonlinear integral equations. J. Assoc. Comput. Mach., 12:547–560, 1965