Backward stochastic differential equations (BSDEs) were introduced by Étienne Pardoux and Shige Peng in 1990, who established the existence and uniqueness theory for their solutions, and have since become essential tools in stochastic control and financial mathematics. For instance, BSDEs are widely used in option pricing, risk measurement, and dynamic hedging.[2]
Deep learning is a machine learning approach based on multilayer neural networks. Its core ideas can be traced back to the neural computing models of the 1940s. In the 1980s, the backpropagation algorithm made the training of multilayer networks practical, and in 2006 the deep belief networks proposed by Geoffrey Hinton and others rekindled interest in the field. Since then, deep learning has made groundbreaking advances in image processing, speech recognition, natural language processing, and other areas.[3]
Traditional numerical methods for solving stochastic differential equations[4] include the Euler–Maruyama method, the Milstein method, the stochastic Runge–Kutta method, and methods based on different representations of iterated stochastic integrals.[5][6]
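As a point of reference for these schemes, here is a minimal sketch of the Euler–Maruyama method for a scalar SDE dX_t = mu(t, X_t) dt + sigma(t, X_t) dW_t; the drift and diffusion in the example are illustrative choices, not taken from the source.

```python
import numpy as np

def euler_maruyama(mu, sigma, x0, T, N, rng=np.random.default_rng(0)):
    """Simulate one path of dX_t = mu(t, X_t) dt + sigma(t, X_t) dW_t
    on [0, T] with N uniform time steps (Euler-Maruyama scheme)."""
    dt = T / N
    x = np.empty(N + 1)
    x[0] = x0
    t = 0.0
    for n in range(N):
        dw = rng.normal(0.0, np.sqrt(dt))  # Brownian increment over one step
        x[n + 1] = x[n] + mu(t, x[n]) * dt + sigma(t, x[n]) * dw
        t += dt
    return x

# Example: geometric Brownian motion dX_t = 0.05 X_t dt + 0.2 X_t dW_t
path = euler_maruyama(lambda t, x: 0.05 * x, lambda t, x: 0.2 * x,
                      x0=1.0, T=1.0, N=250)
```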
As financial problems have become more complex, however, traditional numerical methods for BSDEs (such as Monte Carlo and finite difference methods) have shown limitations, notably high computational complexity and the curse of dimensionality.[7]
The combination of deep learning with BSDEs, known as the deep BSDE method, was proposed by Han, Jentzen, and E in 2018 to address the high-dimensional challenges faced by traditional numerical methods. The approach exploits the nonlinear approximation capabilities of deep learning: the solution of the BSDE is represented as the output of a neural network, and the network is trained so that its output approximates the solution.[11]
Backward stochastic differential equations (BSDEs) are a powerful mathematical tool applied extensively in stochastic control, financial mathematics, and related fields. Unlike traditional stochastic differential equations (SDEs), which are solved forward in time, BSDEs are solved backward, starting from a terminal time and working back to the present. This characteristic makes them particularly suitable for problems involving terminal conditions and uncertainty.[12]
A backward stochastic differential equation (BSDE) can be formulated as:[13]

$$Y_t = \xi + \int_t^T f(s, Y_s, Z_s)\,ds - \int_t^T Z_s\,dW_s, \qquad t \in [0, T].$$

In this equation, $\xi$ is the terminal condition specified at time $T$, $f$ is the generator (or driver) function, $W_t$ is a standard Brownian motion, and $(Y_t, Z_t)$ is the pair of unknown processes.
The goal is to find adapted processes $Y_t$ and $Z_t$ that satisfy this equation. Traditional numerical methods struggle with BSDEs due to the curse of dimensionality, which makes computations in high-dimensional spaces extremely challenging.[14]
Source:[15]
We consider a general class of PDEs represented by

$$\frac{\partial u}{\partial t}(t,x) + \frac{1}{2}\operatorname{Tr}\!\left(\sigma\sigma^{T}(t,x)\,\operatorname{Hess}_x u(t,x)\right) + \nabla u(t,x)\cdot\mu(t,x) + f\!\left(t, x, u(t,x), \sigma^{T}(t,x)\nabla u(t,x)\right) = 0,$$

with terminal condition $u(T,x) = g(x)$.
Let $\{W_t\}_{t\ge 0}$ be a $d$-dimensional Brownian motion and let $\{X_t\}_{t\ge 0}$ be a $d$-dimensional stochastic process which satisfies

$$X_t = \xi + \int_0^t \mu(s, X_s)\,ds + \int_0^t \sigma(s, X_s)\,dW_s.$$
Then the solution of the PDE satisfies the following BSDE:

$$u(t, X_t) - u(0, X_0) = -\int_0^t f\!\left(s, X_s, u(s,X_s), \sigma^{T}(s,X_s)\nabla u(s,X_s)\right) ds + \int_0^t \left[\nabla u(s,X_s)\right]^{T} \sigma(s,X_s)\,dW_s.$$
Discretize the time interval $[0,T]$ into steps $0 = t_0 < t_1 < \cdots < t_N = T$:

$$X_{t_{n+1}} - X_{t_n} \approx \mu(t_n, X_{t_n})\,\Delta t_n + \sigma(t_n, X_{t_n})\,\Delta W_n,$$
$$u(t_{n+1}, X_{t_{n+1}}) - u(t_n, X_{t_n}) \approx -f\!\left(t_n, X_{t_n}, u(t_n, X_{t_n}), \sigma^{T}(t_n, X_{t_n})\nabla u(t_n, X_{t_n})\right)\Delta t_n + \left[\nabla u(t_n, X_{t_n})\right]^{T}\sigma(t_n, X_{t_n})\,\Delta W_n,$$

where $\Delta t_n = t_{n+1} - t_n$ and $\Delta W_n = W_{t_{n+1}} - W_{t_n}$.
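A minimal sketch of this discretization step, generating sample paths of $X$ together with the Brownian increments $\Delta W_n$ that later serve as training data; the drift, diffusion, and initial condition in the usage example are illustrative placeholders rather than values from the source.

```python
import numpy as np

def simulate_paths(mu, sigma, xi, T, N, M, rng=np.random.default_rng(0)):
    """Generate M Euler-discretized paths of the d-dimensional forward SDE
    X_{t_{n+1}} = X_{t_n} + mu(t_n, X_{t_n}) dt + sigma(t_n, X_{t_n}) dW_n."""
    d = len(xi)
    dt = T / N
    X = np.empty((M, N + 1, d))
    dW = rng.normal(0.0, np.sqrt(dt), size=(M, N, d))  # increments dW_n
    X[:, 0, :] = xi
    for n in range(N):
        t_n = n * dt
        for m in range(M):
            X[m, n + 1, :] = (X[m, n, :]
                              + mu(t_n, X[m, n, :]) * dt
                              + sigma(t_n, X[m, n, :]) @ dW[m, n, :])
    return X, dW

# Illustrative example: standard Brownian motion started at 0 in d = 5 dimensions
X, dW = simulate_paths(mu=lambda t, x: np.zeros_like(x),
                       sigma=lambda t, x: np.eye(len(x)),
                       xi=np.zeros(5), T=1.0, N=20, M=64)
```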
Use a multilayer feedforward neural network to approximate

$$\sigma^{T}(t_n, X_{t_n})\nabla u(t_n, X_{t_n}) \approx (\sigma^{T}\nabla u)(t_n, X_{t_n}; \theta_n)$$

for $n = 1, \ldots, N$, where $\theta_n$ are the parameters of the neural network approximating $x \mapsto \sigma^{T}(t,x)\nabla u(t,x)$ at $t = t_n$.
Stack all sub-networks in the approximation step to form a deep neural network. Train the network using paths $\{X_{t_n}\}_{0\le n\le N}$ and $\{W_{t_n}\}_{0\le n\le N}$ as input data, minimizing the loss function

$$\ell(\theta) = \mathbb{E}\,\Bigl| g(X_{t_N}) - \hat{u}\bigl(\{X_{t_n}\}_{0\le n\le N}, \{W_{t_n}\}_{0\le n\le N}; \theta\bigr)\Bigr|^2,$$

where $\hat{u}$ is the approximation of $u(t, X_t)$.
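The construction can be written down concretely. The following is a minimal PyTorch sketch, not the reference implementation of Han, Jentzen, and E: one small feedforward subnetwork per interior time step approximates $\sigma^{T}\nabla u$, the initial value $u(0, X_0)$ and the initial gradient are trainable parameters, and the training objective is the terminal squared error $\ell(\theta)$ above. The paths, generator $f$, and terminal function $g$ in the usage example are illustrative placeholders.

```python
import torch
import torch.nn as nn

class DeepBSDE(nn.Module):
    """Sketch of the deep BSDE network: one subnetwork per interior time step
    approximating sigma^T grad u, plus trainable u(0, X_0) and initial gradient."""
    def __init__(self, d, N, hidden=32):
        super().__init__()
        self.d, self.N = d, N
        self.u0 = nn.Parameter(torch.zeros(1))     # u(0, X_0)
        self.z0 = nn.Parameter(torch.zeros(1, d))  # sigma^T grad u at t_0
        self.subnets = nn.ModuleList([
            nn.Sequential(nn.Linear(d, hidden), nn.ReLU(),
                          nn.Linear(hidden, hidden), nn.ReLU(),
                          nn.Linear(hidden, d))
            for _ in range(N - 1)                  # one subnetwork per t_1 .. t_{N-1}
        ])

    def forward(self, X, dW, f, dt):
        # X: (M, N+1, d) simulated paths, dW: (M, N, d) Brownian increments,
        # f(t, x, y, z) must return a tensor of shape (M, 1)
        M = X.shape[0]
        y = self.u0.expand(M, 1)                   # u(t_0, X_0)
        z = self.z0.expand(M, self.d)
        for n in range(self.N):
            t_n = n * dt
            # forward iteration: u(t_{n+1}) = u(t_n) - f(...) dt + z . dW_n
            y = y - f(t_n, X[:, n, :], y, z) * dt \
                + (z * dW[:, n, :]).sum(dim=1, keepdim=True)
            if n < self.N - 1:
                z = self.subnets[n](X[:, n + 1, :])  # sigma^T grad u at t_{n+1}
        return y                                     # approximation of u(t_N, X_{t_N})

def loss_fn(model, X, dW, f, g, dt):
    """Terminal loss E |g(X_{t_N}) - u_hat|^2."""
    y_T = model(X, dW, f, dt)
    return ((g(X[:, -1, :]) - y_T) ** 2).mean()

# Illustrative usage with f = 0 and g(x) = ||x||^2 (values not from the source)
d, N, M, T = 5, 20, 64, 1.0
dt = T / N
X = torch.randn(M, N + 1, d)            # stand-in paths; in practice use simulated X
dW = torch.randn(M, N, d) * dt ** 0.5
model = DeepBSDE(d, N)
f = lambda t, x, y, z: torch.zeros_like(y)
g = lambda x: (x ** 2).sum(dim=1, keepdim=True)
print(loss_fn(model, X, dW, f, g, dt).item())
```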
Source:[16]
Deep learning encompasses a class of machine learning techniques that have transformed numerous fields by enabling the modeling and interpretation of intricate data structures. These methods are distinguished by their hierarchical architectures, comprising multiple layers of interconnected nodes, or neurons, which allow deep neural networks to learn abstract representations of data autonomously. This makes them particularly effective in tasks such as image recognition, natural language processing, and financial modeling. The core of the method lies in designing an appropriate neural network structure (such as a fully connected network or a recurrent neural network) and selecting an effective optimization algorithm.[17]
The choice of network architecture, the number of layers, and the number of neurons per layer are crucial hyperparameters that significantly affect the performance of the deep BSDE method. The method constructs neural networks to approximate the solutions for $Y$ and $Z$, and trains them with stochastic gradient descent or related optimization algorithms.[18]
The figure illustrates the network architecture of the deep BSDE method. Note that $\nabla u(t_n, X_{t_n})$ denotes the quantity approximated directly by the subnetworks, and $u(t_n, X_{t_n})$ denotes the quantity computed iteratively within the network. There are three types of connections in this network:[19]
i) $X_{t_n} \rightarrow h_1^n \rightarrow h_2^n \rightarrow \ldots \rightarrow h_H^n \rightarrow \nabla u(t_n, X_{t_n})$ is the multilayer feedforward neural network approximating the spatial gradients at time $t = t_n$. The weights $\theta_n$ of this subnetwork are the parameters being optimized.

ii) $\bigl(u(t_n, X_{t_n}), \nabla u(t_n, X_{t_n}), W_{t_{n+1}} - W_{t_n}\bigr) \rightarrow u(t_{n+1}, X_{t_{n+1}})$ is the forward iteration providing the final output of the network as an approximation of $u(t_N, X_{t_N})$, characterized by Eqs. 5 and 6. No parameters are optimized in this type of connection.

iii) $\bigl(X_{t_n}, W_{t_{n+1}} - W_{t_n}\bigr) \rightarrow X_{t_{n+1}}$ is the shortcut connecting blocks at different times, characterized by Eqs. 4 and 6. No parameters are optimized in this type of connection either.
This function implements the Adam[20] algorithm for minimizing the target function $\mathcal{G}(\theta)$.
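The listing itself is not reproduced here; the following is a minimal stand-alone sketch of the standard Adam update rule (Kingma and Ba, 2014), applied to a generic gradient function, with the default hyperparameters from that paper. The example objective is illustrative.

```python
import numpy as np

def adam(grad, theta0, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimize a target G(theta) given its gradient grad(theta) using Adam."""
    theta = np.asarray(theta0, dtype=float).copy()
    m = np.zeros_like(theta)   # first-moment (mean) estimate
    v = np.zeros_like(theta)   # second-moment (uncentered variance) estimate
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)   # bias correction
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta

# Example: minimize G(theta) = ||theta - 3||^2, whose gradient is 2 (theta - 3)
theta_star = adam(lambda th: 2 * (th - 3.0), theta0=np.zeros(4), steps=5000)
```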
This function implements the backpropagation algorithm for training a multi-layer feedforward neural network.
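Likewise, as a hedged stand-in for the missing listing, here is a compact sketch of backpropagation for a one-hidden-layer tanh network trained by plain gradient descent on a squared-error loss; the network size, learning rate, and fitting task are illustrative choices.

```python
import numpy as np

def train_mlp(X, Y, hidden=16, lr=0.1, epochs=2000, rng=np.random.default_rng(0)):
    """Train a one-hidden-layer tanh network on (X, Y) with backpropagation."""
    n_in, n_out = X.shape[1], Y.shape[1]
    W1 = rng.normal(0, 0.5, (n_in, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, n_out)); b2 = np.zeros(n_out)
    for _ in range(epochs):
        # forward pass
        h = np.tanh(X @ W1 + b1)
        y_hat = h @ W2 + b2
        # backward pass for the mean squared error loss
        grad_out = 2 * (y_hat - Y) / len(X)
        grad_W2 = h.T @ grad_out
        grad_b2 = grad_out.sum(axis=0)
        grad_h = grad_out @ W2.T * (1 - h ** 2)   # tanh'(a) = 1 - tanh(a)^2
        grad_W1 = X.T @ grad_h
        grad_b1 = grad_h.sum(axis=0)
        # gradient-descent parameter updates
        W1 -= lr * grad_W1; b1 -= lr * grad_b1
        W2 -= lr * grad_W2; b2 -= lr * grad_b2
    return W1, b1, W2, b2

# Example: fit y = sin(x) on a coarse grid
X = np.linspace(-3, 3, 50)[:, None]
W1, b1, W2, b2 = train_mlp(X, np.sin(X))
```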
Source:[21]
This function calculates the optimal investment portfolio using the specified parameters and stochastic processes.
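The concrete listing is not preserved in this text. As a hedged illustration of the kind of quantity such a function might return, the sketch below computes the classical Merton optimal portfolio weight pi* = (mu - r) / (gamma * sigma^2) for a single risky asset under constant relative risk aversion; this closed-form benchmark is a stand-in, not the deep BSDE computation itself, and the parameter values are illustrative.

```python
def merton_optimal_weight(mu, r, sigma, gamma):
    """Classical Merton solution: fraction of wealth held in the risky asset,
    for drift mu, risk-free rate r, volatility sigma, and risk aversion gamma."""
    return (mu - r) / (gamma * sigma ** 2)

# Illustrative parameters (not taken from the source)
pi_star = merton_optimal_weight(mu=0.08, r=0.02, sigma=0.2, gamma=3.0)
print(f"Optimal risky-asset weight: {pi_star:.2f}")  # 0.50
```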
The deep BSDE method is widely used in financial derivatives pricing, risk management, and asset allocation, and is particularly suitable for high-dimensional problems in these areas.
Sources:[28][29]
Sources:[30][31]
Han, J.; Jentzen, A.; E, W. (2018). "Solving high-dimensional partial differential equations using deep learning". Proceedings of the National Academy of Sciences. 115 (34): 8505–8510. doi:10.1073/pnas.1718942115. PMC 6112690. PMID 30082389.

Pardoux, E.; Peng, S. (1990). "Adapted solution of a backward stochastic differential equation". Systems & Control Letters. 14 (1): 55–61. doi:10.1016/0167-6911(90)90082-6.

LeCun, Yann; Bengio, Yoshua; Hinton, Geoffrey (2015). "Deep learning" (PDF). Nature. 521 (7553): 436–444. Bibcode:2015Natur.521..436L. doi:10.1038/nature14539. PMID 26017442. S2CID 3074096. https://hal.science/hal-04206682/file/Lecun2015.pdf

Kloeden, P. E.; Platen, E. (1992). Numerical Solution of Stochastic Differential Equations. Springer, Berlin, Heidelberg. doi:10.1007/978-3-662-12616-5.

Kuznetsov, D. F. (2023). "Strong approximation of iterated Itô and Stratonovich stochastic integrals: Method of generalized multiple Fourier series. Application to numerical integration of Itô SDEs and semilinear SPDEs". Differ. Uravn. Protsesy Upr., no. 1. doi:10.21638/11701/spbu35.2023.110.

Rybakov, K. A. (2023). "Spectral representations of iterated stochastic integrals and their application for modeling nonlinear stochastic dynamics". Mathematics. 11: 4047. doi:10.3390/math11194047.

"Real Options with Monte Carlo Simulation". Archived from the original on 2010-03-18. Retrieved 2010-09-24. https://web.archive.org/web/20100318060412/http://www.puc-rio.br/marco.ind/monte-carlo.html

"Monte Carlo Simulation". Palisade Corporation. 2010. Retrieved 2010-09-24. http://www.palisade.com/risk/monte_carlo_simulation.asp

Grossmann, Christian; Roos, Hans-G.; Stynes, Martin (2007). Numerical Treatment of Partial Differential Equations. Springer Science & Business Media. p. 23. ISBN 978-3-540-71584-9.

Ma, Jin; Yong, Jiongmin (2007). Forward-Backward Stochastic Differential Equations and their Applications. Lecture Notes in Mathematics. Vol. 1702. Springer Berlin, Heidelberg. doi:10.1007/978-3-540-48831-6. ISBN 978-3-540-65960-0.

Kingma, Diederik; Ba, Jimmy (2014). "Adam: A Method for Stochastic Optimization". arXiv:1412.6980 [cs.LG].

Beck, C.; E, W.; Jentzen, A. (2019). "Machine learning approximation algorithms for high-dimensional fully nonlinear partial differential equations and second-order backward stochastic differential equations". Journal of Nonlinear Science. 29 (4): 1563–1619. arXiv:1709.05963. doi:10.1007/s00332-018-9525-3.