In mathematics, Laplace's method, named after Pierre-Simon Laplace, is a technique used to approximate integrals of the form
$$\int_a^b e^{Mf(x)}\,dx,$$
where $f$ is a twice-differentiable function, $M$ is a large number, and the endpoints $a$ and $b$ may be infinite. The technique was originally presented in Laplace (1774).
In Bayesian statistics, Laplace's approximation can refer either to approximating the posterior normalizing constant with Laplace's method or to approximating the posterior distribution with a Gaussian centered at the maximum a posteriori estimate. Laplace approximations are used in the integrated nested Laplace approximation (INLA) method for fast approximate Bayesian inference.
Concept
Let the function $f(x)$ have a unique global maximum at $x_0$, and let $M > 0$ be a constant. Consider the following two functions:
$$g(x) = Mf(x), \qquad h(x) = e^{Mf(x)}.$$
Then $x_0$ is the global maximum point of $g$ and $h$ as well. Hence:
$$\frac{g(x_0)}{g(x)} = \frac{Mf(x_0)}{Mf(x)} = \frac{f(x_0)}{f(x)}, \qquad \frac{h(x_0)}{h(x)} = \frac{e^{Mf(x_0)}}{e^{Mf(x)}} = e^{M(f(x_0) - f(x))}.$$
As $M$ increases, the ratio for $h$ grows exponentially, while the ratio for $g$ does not change. Thus, significant contributions to the integral of this function come only from points $x$ in a neighborhood of $x_0$, where the integral can then be estimated.
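To see this concentration numerically, here is a minimal sketch (not from the original text; the choice $f(x) = -x^2$, the interval $[-3, 3]$, and the cutoff $|x - x_0| < 0.5$ are arbitrary assumptions):

```python
import numpy as np

# Density h(x) = exp(M*f(x)) for f(x) = -x^2, with global maximum at x0 = 0.
def f(x):
    return -x**2

x, dx = np.linspace(-3.0, 3.0, 60001, retstep=True)

for M in (1, 10, 100):
    h = np.exp(M * f(x))
    total = h.sum() * dx                  # Riemann-sum estimate over [-3, 3]
    near = h[np.abs(x) < 0.5].sum() * dx  # contribution from |x - x0| < 0.5
    print(f"M = {M:3d}: fraction of integral from |x| < 0.5 = {near / total:.6f}")
```

For $M = 100$ essentially all of the mass lies within the small neighborhood, which is what licenses replacing $f$ by its local quadratic model in the next section.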
General theory
To state and motivate the method, one must make several assumptions. It is assumed that $x_0$ is not an endpoint of the interval of integration and that the values $f(x)$ cannot be very close to $f(x_0)$ unless $x$ is close to $x_0$.
By Taylor's theorem, $f(x)$ can be expanded around $x_0$:
$$f(x) = f(x_0) + f'(x_0)(x - x_0) + \frac{1}{2} f''(x_0)(x - x_0)^2 + R,$$
where $R = O\left((x - x_0)^3\right)$ (see big O notation).
Since $f$ has a global maximum at $x_0$, and $x_0$ is not an endpoint, it is a stationary point, i.e. $f'(x_0) = 0$. Therefore, the second-order Taylor polynomial approximating $f(x)$ is
$$f(x) \approx f(x_0) + \frac{1}{2} f''(x_0)(x - x_0)^2.$$
Then just one more step is needed to obtain a Gaussian. Since $x_0$ is a global maximum of $f$ and not an endpoint, the second-derivative test gives $f''(x_0) \leq 0$, yielding the relation
$$f(x) \approx f(x_0) - \frac{1}{2} |f''(x_0)| (x - x_0)^2$$
for $x$ close to $x_0$. The integral can then be approximated with:
$$\int_a^b e^{Mf(x)}\,dx \approx e^{Mf(x_0)} \int_a^b e^{-\frac{1}{2} M |f''(x_0)| (x - x_0)^2}\,dx.$$
If $f''(x_0) < 0$, this latter integral becomes a Gaussian integral when the limits of integration are replaced by $-\infty$ and $+\infty$; for large $M$ this introduces only a small error, because the exponential decays very fast away from $x_0$. Computing this Gaussian integral, using $\int_{-\infty}^{\infty} e^{-ay^2}\,dy = \sqrt{\pi/a}$ with $a = \tfrac{1}{2} M |f''(x_0)|$, we obtain:
$$\int_a^b e^{Mf(x)}\,dx \approx \sqrt{\frac{2\pi}{M |f''(x_0)|}}\, e^{Mf(x_0)} \quad \text{as } M \to \infty.$$
A generalization of this method and an extension to arbitrary precision are provided by Fog (2008).
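As a quick sanity check (an illustrative sketch, not part of the original derivation; the test function $f(x) = \sin x$ on $[0, \pi]$ is an arbitrary choice, with interior maximum $x_0 = \pi/2$, $f(x_0) = 1$, and $f''(x_0) = -1$), one can compare the two sides numerically:

```python
import numpy as np
from scipy.integrate import quad

f_x0, fpp_x0 = 1.0, -1.0  # f(pi/2) = 1 and f''(pi/2) = -1 for f = sin

for M in (5, 50, 500):
    exact, _ = quad(lambda x: np.exp(M * np.sin(x)), 0.0, np.pi)
    laplace = np.sqrt(2 * np.pi / (M * abs(fpp_x0))) * np.exp(M * f_x0)
    print(f"M = {M:3d}: exact = {exact:.6e}, Laplace = {laplace:.6e}, "
          f"ratio = {laplace / exact:.6f}")
```

The ratio tends to 1 as $M \to \infty$, with the relative error decaying like $O(1/M)$.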
Formal statement and proof
Suppose $f(x)$ is a twice continuously differentiable function on $[a, b]$, and there exists a unique point $x_0 \in (a, b)$ such that:
$$f(x_0) = \max_{x \in [a,b]} f(x) \quad \text{and} \quad f''(x_0) < 0.$$
Then:
$$\lim_{n \to \infty} \frac{\int_a^b e^{nf(x)}\,dx}{e^{nf(x_0)} \sqrt{\frac{2\pi}{n\left(-f''(x_0)\right)}}} = 1.$$
The proof proceeds by bounding the relative error of the approximation: the contribution to the integral from outside a small neighborhood of $x_0$ is exponentially negligible, while inside the neighborhood $f$ can be squeezed between quadratics, reducing the problem to Gaussian integrals whose ratio to the claimed asymptotic tends to 1.
Other formulations
Laplace's approximation is sometimes written as
$$\int_a^b h(x)\, e^{Mg(x)}\,dx \approx \sqrt{\frac{2\pi}{M |g''(x_0)|}}\, h(x_0)\, e^{Mg(x_0)} \quad \text{as } M \to \infty,$$
where $h$ is positive.
Importantly, the accuracy of the approximation depends on the variable of integration, that is, on what stays in $g(x)$ and what goes into $h(x)$.[3]
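For instance, here is a minimal sketch under assumed choices (not from the original text): $g(x) = -\cosh x$ and $h(x) = 1 + x^2$ on $[-2, 2]$, so that $x_0 = 0$, $g(x_0) = -1$, $g''(x_0) = -1$, and $h(x_0) = 1$:

```python
import numpy as np
from scipy.integrate import quad

# Compare ∫ h(x) e^{M g(x)} dx with sqrt(2*pi/(M*|g''(x0)|)) * h(x0) * e^{M g(x0)}.
for M in (10, 40, 160):
    exact, _ = quad(lambda x: (1 + x**2) * np.exp(-M * np.cosh(x)),
                    -2.0, 2.0, epsabs=0.0, epsrel=1e-12)
    laplace = np.sqrt(2 * np.pi / M) * 1.0 * np.exp(-M)
    print(f"M = {M:3d}: ratio Laplace/exact = {laplace / exact:.6f}")
```

Absorbing part of the integrand into the exponent instead (changing what goes into $g$) would change the accuracy of the result, which is the point of the remark above.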
In the multivariate case, where $\mathbf{x}$ is a $d$-dimensional vector and $f(\mathbf{x})$ is a scalar function of $\mathbf{x}$, Laplace's approximation is usually written as:
$$\int h(\mathbf{x})\, e^{Mf(\mathbf{x})}\,d\mathbf{x} \approx \left(\frac{2\pi}{M}\right)^{d/2} \frac{h(\mathbf{x}_0)\, e^{Mf(\mathbf{x}_0)}}{\left|-H(f)(\mathbf{x}_0)\right|^{1/2}} \quad \text{as } M \to \infty,$$
where $H(f)(\mathbf{x}_0)$ is the Hessian matrix of $f$ evaluated at $\mathbf{x}_0$ and $|\cdot|$ denotes the matrix determinant. Analogously to the univariate case, the Hessian is required to be negative definite.[4]
Note that although $\mathbf{x}$ denotes a $d$-dimensional vector, the term $d\mathbf{x}$ denotes an infinitesimal volume element here, i.e. $d\mathbf{x} := dx_1\,dx_2 \cdots dx_d$.
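A two-dimensional sketch (illustrative assumptions, not from the original text: $f(x, y) = -(x^2 + xy + y^2)$ with maximizer $\mathbf{x}_0 = (0, 0)$ and Hessian $H$ with rows $(-2, -1)$ and $(-1, -2)$, so $|-H| = 3$, together with $h(x, y) = 1 + x^2$):

```python
import numpy as np
from scipy.integrate import dblquad

M = 20.0
H = np.array([[-2.0, -1.0], [-1.0, -2.0]])  # Hessian of f at (0, 0)
h0, f0 = 1.0, 0.0                           # h and f evaluated at the maximizer

exact, _ = dblquad(
    lambda y, x: (1 + x**2) * np.exp(-M * (x**2 + x * y + y**2)),
    -5.0, 5.0, lambda x: -5.0, lambda x: 5.0,
)
laplace = (2 * np.pi / M) ** (2 / 2) * h0 * np.exp(M * f0) \
          / np.sqrt(np.linalg.det(-H))
print(f"exact = {exact:.6f}, Laplace = {laplace:.6f}, ratio = {laplace / exact:.4f}")
```

With $d = 2$ the prefactor is simply $2\pi/M$; the remaining discrepancy (a few percent at $M = 20$) reflects the neglected factor $h(\mathbf{x}) = 1 + x^2$ varying over the width of the Gaussian peak.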
Steepest descent extension
Main article: Method of steepest descent
In extensions of Laplace's method, complex analysis, and in particular Cauchy's integral formula, is used to find a contour of steepest descent for an (asymptotically with large $M$) equivalent integral, expressed as a line integral. In particular, if no point $x_0$ where the derivative of $f$ vanishes exists on the real line, it may be necessary to deform the integration contour to an optimal one, where the above analysis is possible. Again, the main idea is to reduce, at least asymptotically, the calculation of the given integral to that of a simpler integral that can be explicitly evaluated. See the book by Erdelyi (1956) for a simple discussion (where the method is termed steepest descents).
The appropriate formulation for the complex z-plane is
$$\int_a^b e^{Mf(z)}\,dz \approx \sqrt{\frac{2\pi}{-Mf''(z_0)}}\, e^{Mf(z_0)} \quad \text{as } M \to \infty,$$
for a path passing through the saddle point at $z_0$. Note the explicit appearance of a minus sign to indicate the direction of the second derivative: one must not take the modulus. Also note that if the integrand is meromorphic, one may have to add residues corresponding to poles traversed while deforming the contour (see, for example, section 3 of Okounkov's paper "Symmetric functions and random partitions").
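As a concrete check (a sketch, not from the original text): the Airy-type choice $f(z) = i(z^3/3 + z)$ has saddle points where $f'(z) = i(z^2 + 1) = 0$; taking $z_0 = i$ gives $f(z_0) = -2/3$ and $f''(z_0) = -2$, and the integral over the real line is exactly $2\pi M^{-1/3} \operatorname{Ai}(M^{2/3})$:

```python
import numpy as np
from scipy.special import airy

for M in (5.0, 20.0, 80.0):
    # Exact value of ∫ e^{M i (x^3/3 + x)} dx via the Airy function Ai.
    exact = 2 * np.pi * M ** (-1 / 3) * airy(M ** (2 / 3))[0]
    # Saddle-point formula: sqrt(2*pi / (-M*f''(z0))) * exp(M*f(z0)).
    saddle = np.sqrt(2 * np.pi / (2 * M)) * np.exp(-2 * M / 3)
    print(f"M = {M:5.1f}: exact = {exact:.6e}, saddle = {saddle:.6e}, "
          f"ratio = {saddle / exact:.5f}")
```

Here the real line is deformed to pass horizontally through the complex saddle $z_0 = i$, and the ratio approaches 1 as $M$ grows.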
Further generalizations
An extension of the steepest descent method is the so-called nonlinear stationary phase/steepest descent method. Here, instead of integrals, one needs to evaluate asymptotically solutions of Riemann–Hilbert factorization problems.
Given a contour $C$ in the complex sphere, a function $f$ defined on that contour, and a special point, such as infinity, a holomorphic function $M$ is sought away from $C$, with prescribed jump across $C$ and a given normalization at infinity. If $f$, and hence $M$, are matrices rather than scalars, this is a problem that in general does not admit an explicit solution.
An asymptotic evaluation is then possible along the lines of the linear stationary phase/steepest descent method. The idea is to reduce asymptotically the solution of the given Riemann–Hilbert problem to that of a simpler, explicitly solvable, Riemann–Hilbert problem. Cauchy's theorem is used to justify deformations of the jump contour.
The nonlinear stationary phase was introduced by Deift and Zhou in 1993, based on earlier work of Its. A (properly speaking) nonlinear steepest descent method was introduced by Kamvissis, K. McLaughlin and P. Miller in 2003, based on previous work of Lax, Levermore, Deift, Venakides and Zhou. As in the linear case, "steepest descent contours" solve a min-max problem. In the nonlinear case they turn out to be "S-curves" (defined in a different context in the 1980s by Stahl, Gonchar and Rakhmanov).
The nonlinear stationary phase/steepest descent method has applications to the theory of soliton equations and integrable models, random matrices and combinatorics.
Median-point approximation generalization
In the generalization, evaluation of the integral is considered equivalent to finding the norm of the distribution with density
$$e^{Mf(x)}.$$
Denoting the cumulative distribution by $F(x)$, if there is a diffeomorphic Gaussian distribution with density
$$e^{-g - \frac{\gamma}{2} y^2},$$
the norm is given by
$$\sqrt{2\pi \gamma^{-1}}\, e^{-g},$$
and the corresponding diffeomorphism is
$$y(x) = \frac{1}{\sqrt{\gamma}}\, \Phi^{-1}\!\left(\frac{F(x)}{F(\infty)}\right),$$
where $\Phi$ denotes the standard normal cumulative distribution function.
In general, any distribution diffeomorphic to the Gaussian distribution has density
$$e^{-g - \frac{\gamma}{2} y^2(x)}\, y'(x),$$
and the median point is mapped to the median of the Gaussian distribution. Matching the logarithms of the density functions and their derivatives at the median point up to a given order yields a system of equations that determines the approximate values of $\gamma$ and $g$.
The approximation was introduced in 2019 by D. Makogon and C. Morais Smith, primarily in the context of partition function evaluation for a system of interacting fermions.[5]
Complex integrals
For complex integrals of the form:
$$\frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} g(s)\, e^{st}\,ds$$
with $t \gg 1$, we make the substitution $t = iu$ and the change of variable $s = c + ix$ to obtain the bilateral Laplace transform:
$$\frac{1}{2\pi} \int_{-\infty}^{\infty} g(c + ix)\, e^{-ux} e^{icu}\,dx.$$
We then split $g(c + ix)$ into its real and complex parts, after which we recover $u = t/i$. This is useful for inverse Laplace transforms, the Perron formula and complex integration.
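As an illustrative sketch (assumed example, not from the original text: $g(s) = 1/(s + 1)$, whose inverse Laplace transform is $e^{-t}$; the contour abscissa $c = 1$ and the truncation of the infinite line at $|x| \le 1000$ are arbitrary numerical choices):

```python
import numpy as np

# Bromwich integral (1/(2*pi*i)) ∫ g(s) e^{st} ds along Re(s) = c, evaluated
# with the substitution s = c + i*x and a truncated Riemann sum.
def inverse_laplace(g, t, c=1.0, cutoff=1000.0, n=2_000_001):
    x, dx = np.linspace(-cutoff, cutoff, n, retstep=True)
    s = c + 1j * x
    # ds = i*dx cancels the i in 1/(2*pi*i), leaving a real 1/(2*pi) factor.
    return (g(s) * np.exp(s * t)).real.sum() * dx / (2 * np.pi)

g = lambda s: 1.0 / (s + 1.0)
for t in (0.5, 1.0, 2.0):
    print(f"t = {t}: numeric = {inverse_laplace(g, t):.3f}, "
          f"exact = {np.exp(-t):.3f}")
```

Because the integrand decays only like $1/|x|$ along the contour, the truncated sum agrees with $e^{-t}$ to roughly three decimal places; a larger cutoff (or a contour deformation as in the preceding section) improves this.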
Example: Stirling's approximation
Laplace's method can be used to derive Stirling's approximation
$$N! \approx \sqrt{2\pi N} \left(\frac{N}{e}\right)^N$$
for a large integer $N$. From the definition of the Gamma function, we have
$$N! = \Gamma(N + 1) = \int_0^{\infty} e^{-x} x^N\,dx.$$
Now we change variables, letting $x = Nz$ so that $dx = N\,dz$. Plugging these values back in, we obtain
$$\begin{aligned} N! &= \int_0^{\infty} e^{-Nz} (Nz)^N N\,dz \\ &= N^{N+1} \int_0^{\infty} e^{-Nz} z^N\,dz \\ &= N^{N+1} \int_0^{\infty} e^{-Nz} e^{N \ln z}\,dz \\ &= N^{N+1} \int_0^{\infty} e^{N(\ln z - z)}\,dz. \end{aligned}$$
This integral has the form necessary for Laplace's method with
$$f(z) = \ln z - z,$$
which is twice-differentiable:
$$f'(z) = \frac{1}{z} - 1, \qquad f''(z) = -\frac{1}{z^2}.$$
The maximum of $f(z)$ lies at $z_0 = 1$, and the second derivative of $f(z)$ has the value $-1$ at this point. Therefore, we obtain
$$N! \approx N^{N+1} \sqrt{\frac{2\pi}{N}}\, e^{-N} = \sqrt{2\pi N}\, N^N e^{-N}.$$
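A quick numerical comparison (an illustrative sketch; the sample values of $N$ are arbitrary) shows how rapidly the approximation becomes accurate:

```python
import math

# Compare exact N! with Stirling's approximation sqrt(2*pi*N) * (N/e)**N.
for N in (5, 10, 50, 100):
    exact = math.factorial(N)
    stirling = math.sqrt(2 * math.pi * N) * (N / math.e) ** N
    print(f"N = {N:3d}: Stirling/exact = {stirling / exact:.6f}")
```

The ratio behaves like $1 - 1/(12N)$, consistent with the leading correction term of the full Stirling series.
See also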
- Method of stationary phase
- Method of steepest descent
- Large deviations theory
- Laplace principle (large deviations theory)
- Laplace's approximation
Notes
- Azevedo-Filho, A.; Shachter, R. (1994), "Laplace's Method Approximations for Probabilistic Inference in Belief Networks with Continuous Variables", in Mantaras, R.; Poole, D. (eds.), Uncertainty in Artificial Intelligence, San Francisco, CA: Morgan Kaufmann, CiteSeerX 10.1.1.91.2064.
- Deift, P.; Zhou, X. (1993), "A steepest descent method for oscillatory Riemann–Hilbert problems. Asymptotics for the MKdV equation", Ann. of Math., vol. 137, no. 2, pp. 295–368, arXiv:math/9201261, doi:10.2307/2946540, JSTOR 2946540.
- Erdelyi, A. (1956), Asymptotic Expansions, Dover.
- Fog, A. (2008), "Calculation Methods for Wallenius' Noncentral Hypergeometric Distribution", Communications in Statistics, Simulation and Computation, vol. 37, no. 2, pp. 258–273, doi:10.1080/03610910701790269, S2CID 9040568.
- Laplace, P. S. (1774), "Mémoires de Mathématique et de Physique, Tome Sixième" [Memoir on the probability of causes of events], Statistical Science, 1 (3): 366–367, JSTOR 2245476.
- Wang, Xiang-Sheng; Wong, Roderick (2007). "Discrete analogues of Laplace's approximation". Asymptot. Anal. 54 (3–4): 165–180.
This article incorporates material from saddle point approximation on PlanetMath, which is licensed under the Creative Commons Attribution/Share-Alike License.
References
1. Tierney, Luke; Kadane, Joseph B. (1986). "Accurate Approximations for Posterior Moments and Marginal Densities". J. Amer. Statist. Assoc. 81 (393): 82–86. doi:10.1080/01621459.1986.10478240.
2. Amaral Turkman, M. Antónia; Paulino, Carlos Daniel; Müller, Peter (2019). "Methods Based on Analytic Approximations". Computational Bayesian Statistics: An Introduction. Cambridge University Press. pp. 150–171. ISBN 978-1-108-70374-1.
3. Butler, Ronald W. (2007). Saddlepoint Approximations and Applications. Cambridge University Press. ISBN 978-0-521-87250-8.
4. MacKay, David J. C. (September 2003). Information Theory, Inference and Learning Algorithms. Cambridge: Cambridge University Press. ISBN 9780521642989.
5. Makogon, D.; Morais Smith, C. (2022). "Median-point approximation and its application for the study of fermionic systems". Physical Review B. 105 (17): 174505. Bibcode:2022PhRvB.105q4505M. doi:10.1103/PhysRevB.105.174505. hdl:1874/423769. S2CID 203591796.