Backward stochastic differential equations (BSDEs) were introduced by Étienne Pardoux and Shige Peng in 1990, who established the existence and uniqueness theory for their solutions, and have since become essential tools in stochastic control and financial mathematics. For instance, BSDEs are widely used in option pricing, risk measurement, and dynamic hedging.[2]
Deep learning is a machine learning approach based on multilayer neural networks. Its core ideas can be traced back to the neural computing models of the 1940s. In the 1980s, the backpropagation algorithm made the training of multilayer networks practical, and in 2006 the deep belief networks proposed by Geoffrey Hinton and others rekindled interest in the field. Since then, deep learning has made groundbreaking advances in image processing, speech recognition, natural language processing, and other areas.[3]
Traditional numerical methods for solving stochastic differential equations[4] include the Euler–Maruyama method, the Milstein method, the stochastic Runge–Kutta method, and methods based on different representations of iterated stochastic integrals.[5][6]
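As a point of reference for these schemes, here is a minimal sketch of the Euler–Maruyama method for a scalar SDE dX_t = mu(t, X_t) dt + sigma(t, X_t) dW_t; the drift and diffusion in the example are illustrative choices, not taken from the source.

```python
import numpy as np

def euler_maruyama(mu, sigma, x0, T, N, rng=np.random.default_rng(0)):
    """Simulate one path of dX_t = mu(t, X_t) dt + sigma(t, X_t) dW_t
    on [0, T] with N uniform time steps (Euler-Maruyama scheme)."""
    dt = T / N
    x = np.empty(N + 1)
    x[0] = x0
    t = 0.0
    for n in range(N):
        dw = rng.normal(0.0, np.sqrt(dt))  # Brownian increment over one step
        x[n + 1] = x[n] + mu(t, x[n]) * dt + sigma(t, x[n]) * dw
        t += dt
    return x

# Example: geometric Brownian motion dX_t = 0.05 X_t dt + 0.2 X_t dW_t
path = euler_maruyama(lambda t, x: 0.05 * x, lambda t, x: 0.2 * x,
                      x0=1.0, T=1.0, N=250)
```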
As financial problems have become more complex, however, traditional numerical methods for BSDEs (such as Monte Carlo and finite difference methods) have shown limitations, notably high computational complexity and the curse of dimensionality.[7]
The combination of deep learning with BSDEs, known as the deep BSDE method, was proposed by Han, Jentzen, and E in 2018 to address the high-dimensional challenges faced by traditional numerical methods. The approach exploits the nonlinear approximation capabilities of deep learning: the solution of the BSDE is represented as the output of a neural network, and the network is trained so that its output approximates the solution.[11]
Backward stochastic differential equations (BSDEs) are a powerful mathematical tool applied extensively in stochastic control, financial mathematics, and related fields. Unlike traditional stochastic differential equations (SDEs), which are solved forward in time, BSDEs are solved backward, starting from a terminal time and working back to the present. This characteristic makes them particularly suitable for problems involving terminal conditions and uncertainty.[12]
A backward stochastic differential equation (BSDE) can be formulated as:[13]

$$Y_t = \xi + \int_t^T f(s, Y_s, Z_s)\,ds - \int_t^T Z_s\,dW_s, \qquad t \in [0, T].$$

In this equation, $\xi$ is the terminal condition specified at time $T$, $f$ is the generator (or driver) function, $W_t$ is a standard Brownian motion, and $(Y_t, Z_t)$ is the pair of unknown processes.
The goal is to find adapted processes $Y_t$ and $Z_t$ that satisfy this equation. Traditional numerical methods struggle with BSDEs due to the curse of dimensionality, which makes computations in high-dimensional spaces extremely challenging.[14]
Source:[15]
We consider a general class of PDEs represented by

$$\frac{\partial u}{\partial t}(t,x) + \frac{1}{2}\operatorname{Tr}\!\left(\sigma\sigma^{T}(t,x)\,\operatorname{Hess}_x u(t,x)\right) + \nabla u(t,x)\cdot\mu(t,x) + f\!\left(t, x, u(t,x), \sigma^{T}(t,x)\nabla u(t,x)\right) = 0,$$

with terminal condition $u(T,x) = g(x)$.
Let $\{W_t\}_{t\ge 0}$ be a $d$-dimensional Brownian motion and let $\{X_t\}_{t\ge 0}$ be a $d$-dimensional stochastic process which satisfies

$$X_t = \xi + \int_0^t \mu(s, X_s)\,ds + \int_0^t \sigma(s, X_s)\,dW_s.$$
Then the solution of the PDE satisfies the following BSDE:

$$u(t, X_t) - u(0, X_0) = -\int_0^t f\!\left(s, X_s, u(s,X_s), \sigma^{T}(s,X_s)\nabla u(s,X_s)\right) ds + \int_0^t \left[\nabla u(s,X_s)\right]^{T} \sigma(s,X_s)\,dW_s.$$
Discretize the time interval $[0,T]$ into steps $0 = t_0 < t_1 < \cdots < t_N = T$:

$$X_{t_{n+1}} - X_{t_n} \approx \mu(t_n, X_{t_n})\,\Delta t_n + \sigma(t_n, X_{t_n})\,\Delta W_n,$$
$$u(t_{n+1}, X_{t_{n+1}}) - u(t_n, X_{t_n}) \approx -f\!\left(t_n, X_{t_n}, u(t_n, X_{t_n}), \sigma^{T}(t_n, X_{t_n})\nabla u(t_n, X_{t_n})\right)\Delta t_n + \left[\nabla u(t_n, X_{t_n})\right]^{T}\sigma(t_n, X_{t_n})\,\Delta W_n,$$

where $\Delta t_n = t_{n+1} - t_n$ and $\Delta W_n = W_{t_{n+1}} - W_{t_n}$.
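A minimal sketch of this discretization step, generating sample paths of $X$ together with the Brownian increments $\Delta W_n$ that later serve as training data; the drift, diffusion, and initial condition in the usage example are illustrative placeholders rather than values from the source.

```python
import numpy as np

def simulate_paths(mu, sigma, xi, T, N, M, rng=np.random.default_rng(0)):
    """Generate M Euler-discretized paths of the d-dimensional forward SDE
    X_{t_{n+1}} = X_{t_n} + mu(t_n, X_{t_n}) dt + sigma(t_n, X_{t_n}) dW_n."""
    d = len(xi)
    dt = T / N
    X = np.empty((M, N + 1, d))
    dW = rng.normal(0.0, np.sqrt(dt), size=(M, N, d))  # increments dW_n
    X[:, 0, :] = xi
    for n in range(N):
        t_n = n * dt
        for m in range(M):
            X[m, n + 1, :] = (X[m, n, :]
                              + mu(t_n, X[m, n, :]) * dt
                              + sigma(t_n, X[m, n, :]) @ dW[m, n, :])
    return X, dW

# Illustrative example: standard Brownian motion started at 0 in d = 5 dimensions
X, dW = simulate_paths(mu=lambda t, x: np.zeros_like(x),
                       sigma=lambda t, x: np.eye(len(x)),
                       xi=np.zeros(5), T=1.0, N=20, M=64)
```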
Use a multilayer feedforward neural network to approximate

$$\sigma^{T}(t_n, X_{t_n})\nabla u(t_n, X_{t_n}) \approx (\sigma^{T}\nabla u)(t_n, X_{t_n}; \theta_n)$$

for $n = 1, \ldots, N$, where $\theta_n$ are the parameters of the neural network approximating $x \mapsto \sigma^{T}(t,x)\nabla u(t,x)$ at $t = t_n$.
Stack all sub-networks in the approximation step to form a deep neural network. Train the network using paths $\{X_{t_n}\}_{0\le n\le N}$ and $\{W_{t_n}\}_{0\le n\le N}$ as input data, minimizing the loss function

$$\ell(\theta) = \mathbb{E}\,\Bigl| g(X_{t_N}) - \hat{u}\bigl(\{X_{t_n}\}_{0\le n\le N}, \{W_{t_n}\}_{0\le n\le N}; \theta\bigr)\Bigr|^2,$$

where $\hat{u}$ is the approximation of $u(t, X_t)$.
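The construction can be written down concretely. The following is a minimal PyTorch sketch, not the reference implementation of Han, Jentzen, and E: one small feedforward subnetwork per interior time step approximates $\sigma^{T}\nabla u$, the initial value $u(0, X_0)$ and the initial gradient are trainable parameters, and the training objective is the terminal squared error $\ell(\theta)$ above. The paths, generator $f$, and terminal function $g$ in the usage example are illustrative placeholders.

```python
import torch
import torch.nn as nn

class DeepBSDE(nn.Module):
    """Sketch of the deep BSDE network: one subnetwork per interior time step
    approximating sigma^T grad u, plus trainable u(0, X_0) and initial gradient."""
    def __init__(self, d, N, hidden=32):
        super().__init__()
        self.d, self.N = d, N
        self.u0 = nn.Parameter(torch.zeros(1))     # u(0, X_0)
        self.z0 = nn.Parameter(torch.zeros(1, d))  # sigma^T grad u at t_0
        self.subnets = nn.ModuleList([
            nn.Sequential(nn.Linear(d, hidden), nn.ReLU(),
                          nn.Linear(hidden, hidden), nn.ReLU(),
                          nn.Linear(hidden, d))
            for _ in range(N - 1)                  # one subnetwork per t_1 .. t_{N-1}
        ])

    def forward(self, X, dW, f, dt):
        # X: (M, N+1, d) simulated paths, dW: (M, N, d) Brownian increments,
        # f(t, x, y, z) must return a tensor of shape (M, 1)
        M = X.shape[0]
        y = self.u0.expand(M, 1)                   # u(t_0, X_0)
        z = self.z0.expand(M, self.d)
        for n in range(self.N):
            t_n = n * dt
            # forward iteration: u(t_{n+1}) = u(t_n) - f(...) dt + z . dW_n
            y = y - f(t_n, X[:, n, :], y, z) * dt \
                + (z * dW[:, n, :]).sum(dim=1, keepdim=True)
            if n < self.N - 1:
                z = self.subnets[n](X[:, n + 1, :])  # sigma^T grad u at t_{n+1}
        return y                                     # approximation of u(t_N, X_{t_N})

def loss_fn(model, X, dW, f, g, dt):
    """Terminal loss E |g(X_{t_N}) - u_hat|^2."""
    y_T = model(X, dW, f, dt)
    return ((g(X[:, -1, :]) - y_T) ** 2).mean()

# Illustrative usage with f = 0 and g(x) = ||x||^2 (values not from the source)
d, N, M, T = 5, 20, 64, 1.0
dt = T / N
X = torch.randn(M, N + 1, d)            # stand-in paths; in practice use simulated X
dW = torch.randn(M, N, d) * dt ** 0.5
model = DeepBSDE(d, N)
f = lambda t, x, y, z: torch.zeros_like(y)
g = lambda x: (x ** 2).sum(dim=1, keepdim=True)
print(loss_fn(model, X, dW, f, g, dt).item())
```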
Source:[16]
Deep learning encompasses a class of machine learning techniques that have transformed numerous fields by enabling the modeling and interpretation of intricate data structures. These methods are distinguished by their hierarchical architectures, comprising multiple layers of interconnected nodes, or neurons, which allow deep neural networks to learn abstract representations of data autonomously. This makes them particularly effective in tasks such as image recognition, natural language processing, and financial modeling. The core of the method lies in designing an appropriate neural network structure (such as a fully connected network or a recurrent neural network) and selecting an effective optimization algorithm.[17]
The choice of network architecture, the number of layers, and the number of neurons per layer are crucial hyperparameters that significantly affect the performance of the deep BSDE method. The method constructs neural networks to approximate the solutions for $Y$ and $Z$, and trains them with stochastic gradient descent or related optimization algorithms.[18]
The figure illustrates the network architecture of the deep BSDE method. Note that $\nabla u(t_n, X_{t_n})$ denotes the quantity approximated directly by the subnetworks, and $u(t_n, X_{t_n})$ denotes the quantity computed iteratively within the network. There are three types of connections in this network:[19]
i) $X_{t_n} \rightarrow h_1^n \rightarrow h_2^n \rightarrow \ldots \rightarrow h_H^n \rightarrow \nabla u(t_n, X_{t_n})$ is the multilayer feedforward neural network approximating the spatial gradients at time $t = t_n$. The weights $\theta_n$ of this subnetwork are the parameters being optimized.

ii) $\bigl(u(t_n, X_{t_n}), \nabla u(t_n, X_{t_n}), W_{t_{n+1}} - W_{t_n}\bigr) \rightarrow u(t_{n+1}, X_{t_{n+1}})$ is the forward iteration providing the final output of the network as an approximation of $u(t_N, X_{t_N})$, characterized by Eqs. 5 and 6. No parameters are optimized in this type of connection.

iii) $\bigl(X_{t_n}, W_{t_{n+1}} - W_{t_n}\bigr) \rightarrow X_{t_{n+1}}$ is the shortcut connecting blocks at different times, characterized by Eqs. 4 and 6. No parameters are optimized in this type of connection either.
This function implements the Adam[20] algorithm for minimizing the target function $\mathcal{G}(\theta)$.
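The listing itself is not reproduced here; the following is a minimal stand-alone sketch of the standard Adam update rule (Kingma and Ba, 2014), applied to a generic gradient function, with the default hyperparameters from that paper. The example objective is illustrative.

```python
import numpy as np

def adam(grad, theta0, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimize a target G(theta) given its gradient grad(theta) using Adam."""
    theta = np.asarray(theta0, dtype=float).copy()
    m = np.zeros_like(theta)   # first-moment (mean) estimate
    v = np.zeros_like(theta)   # second-moment (uncentered variance) estimate
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)   # bias correction
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta

# Example: minimize G(theta) = ||theta - 3||^2, whose gradient is 2 (theta - 3)
theta_star = adam(lambda th: 2 * (th - 3.0), theta0=np.zeros(4), steps=5000)
```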
This function implements the backpropagation algorithm for training a multi-layer feedforward neural network.
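Likewise, as a hedged stand-in for the missing listing, here is a compact sketch of backpropagation for a one-hidden-layer tanh network trained by plain gradient descent on a squared-error loss; the network size, learning rate, and fitting task are illustrative choices.

```python
import numpy as np

def train_mlp(X, Y, hidden=16, lr=0.1, epochs=2000, rng=np.random.default_rng(0)):
    """Train a one-hidden-layer tanh network on (X, Y) with backpropagation."""
    n_in, n_out = X.shape[1], Y.shape[1]
    W1 = rng.normal(0, 0.5, (n_in, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, n_out)); b2 = np.zeros(n_out)
    for _ in range(epochs):
        # forward pass
        h = np.tanh(X @ W1 + b1)
        y_hat = h @ W2 + b2
        # backward pass for the mean squared error loss
        grad_out = 2 * (y_hat - Y) / len(X)
        grad_W2 = h.T @ grad_out
        grad_b2 = grad_out.sum(axis=0)
        grad_h = grad_out @ W2.T * (1 - h ** 2)   # tanh'(a) = 1 - tanh(a)^2
        grad_W1 = X.T @ grad_h
        grad_b1 = grad_h.sum(axis=0)
        # gradient-descent parameter updates
        W1 -= lr * grad_W1; b1 -= lr * grad_b1
        W2 -= lr * grad_W2; b2 -= lr * grad_b2
    return W1, b1, W2, b2

# Example: fit y = sin(x) on a coarse grid
X = np.linspace(-3, 3, 50)[:, None]
W1, b1, W2, b2 = train_mlp(X, np.sin(X))
```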
Source:[21]
This function calculates the optimal investment portfolio using the specified parameters and stochastic processes.
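The concrete listing is not preserved in this text. As a hedged illustration of the kind of quantity such a function might return, the sketch below computes the classical Merton optimal portfolio weight pi* = (mu - r) / (gamma * sigma^2) for a single risky asset under constant relative risk aversion; this closed-form benchmark is a stand-in, not the deep BSDE computation itself, and the parameter values are illustrative.

```python
def merton_optimal_weight(mu, r, sigma, gamma):
    """Classical Merton solution: fraction of wealth held in the risky asset,
    for drift mu, risk-free rate r, volatility sigma, and risk aversion gamma."""
    return (mu - r) / (gamma * sigma ** 2)

# Illustrative parameters (not taken from the source)
pi_star = merton_optimal_weight(mu=0.08, r=0.02, sigma=0.2, gamma=3.0)
print(f"Optimal risky-asset weight: {pi_star:.2f}")  # 0.50
```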
The deep BSDE method is widely used in financial derivatives pricing, risk management, and asset allocation, and is particularly suitable for high-dimensional problems in these areas.
Sources:[28][29]
Sources:[30][31]
Han, J.; Jentzen, A.; E, W. (2018). "Solving high-dimensional partial differential equations using deep learning". Proceedings of the National Academy of Sciences. 115 (34): 8505–8510. doi:10.1073/pnas.1718942115. PMC 6112690. PMID 30082389.

Pardoux, E.; Peng, S. (1990). "Adapted solution of a backward stochastic differential equation". Systems & Control Letters. 14 (1): 55–61. doi:10.1016/0167-6911(90)90082-6.

LeCun, Yann; Bengio, Yoshua; Hinton, Geoffrey (2015). "Deep learning" (PDF). Nature. 521 (7553): 436–444. Bibcode:2015Natur.521..436L. doi:10.1038/nature14539. PMID 26017442. S2CID 3074096. https://hal.science/hal-04206682/file/Lecun2015.pdf

Kloeden, P. E.; Platen, E. (1992). Numerical Solution of Stochastic Differential Equations. Springer, Berlin, Heidelberg. doi:10.1007/978-3-662-12616-5.

Kuznetsov, D. F. (2023). "Strong approximation of iterated Itô and Stratonovich stochastic integrals: Method of generalized multiple Fourier series. Application to numerical integration of Itô SDEs and semilinear SPDEs". Differ. Uravn. Protsesy Upr., no. 1. doi:10.21638/11701/spbu35.2023.110.

Rybakov, K. A. (2023). "Spectral representations of iterated stochastic integrals and their application for modeling nonlinear stochastic dynamics". Mathematics. 11: 4047. doi:10.3390/math11194047.

"Real Options with Monte Carlo Simulation". Archived from the original on 2010-03-18. Retrieved 2010-09-24. https://web.archive.org/web/20100318060412/http://www.puc-rio.br/marco.ind/monte-carlo.html

"Monte Carlo Simulation". Palisade Corporation. 2010. Retrieved 2010-09-24. http://www.palisade.com/risk/monte_carlo_simulation.asp

Grossmann, Christian; Roos, Hans-G.; Stynes, Martin (2007). Numerical Treatment of Partial Differential Equations. Springer Science & Business Media. p. 23. ISBN 978-3-540-71584-9.

Ma, Jin; Yong, Jiongmin (2007). Forward-Backward Stochastic Differential Equations and their Applications. Lecture Notes in Mathematics. Vol. 1702. Springer Berlin, Heidelberg. doi:10.1007/978-3-540-48831-6. ISBN 978-3-540-65960-0.

Kingma, Diederik; Ba, Jimmy (2014). "Adam: A Method for Stochastic Optimization". arXiv:1412.6980 [cs.LG].

Beck, C.; E, W.; Jentzen, A. (2019). "Machine learning approximation algorithms for high-dimensional fully nonlinear partial differential equations and second-order backward stochastic differential equations". Journal of Nonlinear Science. 29 (4): 1563–1619. arXiv:1709.05963. doi:10.1007/s00332-018-9525-3.