Separation principle in stochastic control

The separation principle is one of the fundamental principles of <a href="/facts/Stochastic_control_theory/ETvJLl24">stochastic control theory</a>, which states that the problems of optimal control and state estimation can be decoupled under certain conditions. In its most basic formulation it deals with a linear stochastic system

$$
\begin{aligned}
dx &= A(t)x(t)\,dt + B_1(t)u(t)\,dt + B_2(t)\,dw\\
dy &= C(t)x(t)\,dt + D(t)\,dw
\end{aligned}
$$

with a state process $x$, an output process $y$ and a control $u$, where $w$ is a vector-valued <a href="/facts/Wiener_process/KPh7vYd7">Wiener process</a>, $x(0)$ is a zero-mean <a href="/facts/Multivariate_normal_distribution/2Xfegqz2">Gaussian</a> random vector independent of $w$, $y(0)=0$, and $A$, $B_1$, $B_2$, $C$, $D$ are matrix-valued functions which are generally taken to be continuous and of <a href="/facts/Bounded_variation/tIjQTpjh">bounded variation</a>. Moreover, $DD'$ is nonsingular on some interval $[0,T]$. The problem is to design an output feedback law $\pi:\,y\mapsto u$ which maps the observed process $y$ to the control input $u$ in a nonanticipatory manner so as to minimize the functional

$$
J(u)=\mathbb{E}\left\{\int_0^T x(t)'Q(t)x(t)\,dt+\int_0^T u(t)'R(t)u(t)\,dt+x(T)'Sx(T)\right\},
$$

where $\mathbb{E}$ denotes <a href="/facts/Expected_value/1XV0JKL8">expected value</a>, the prime ($'$) denotes <a href="/facts/Transpose_matrix/8wmsagGS">transpose</a>, and $Q$ and $R$ are continuous matrix functions of bounded variation, with $Q(t)$ positive semi-definite and $R(t)$ positive definite for all $t$. Under suitable conditions, which need to be properly stated, the optimal policy $\pi$ can be chosen in the form

$$
u(t)=K(t)\hat{x}(t),
$$

where $\hat{x}(t)$ is the linear least-squares estimate of the state vector $x(t)$ obtained from the <a href="/facts/Kalman_filter/wNK7rnbk">Kalman filter</a>

$$
d\hat{x}=A(t)\hat{x}(t)\,dt+B_1(t)u(t)\,dt+L(t)\bigl(dy-C(t)\hat{x}(t)\,dt\bigr),\quad \hat{x}(0)=0,
$$

where $K$ is the gain of the optimal <a href="/facts/Linear-quadratic_regulator/8YybEFpU">linear-quadratic regulator</a> obtained by taking $B_2=D=0$ and $x(0)$ deterministic, and where $L$ is the <a href="/facts/Kalman_filter/wNK7rnbk">Kalman gain</a>. There is also a non-Gaussian version of this problem (to be discussed below) in which the Wiener process $w$ is replaced by a more general square-integrable martingale with possible jumps. In this case, the Kalman filter needs to be replaced by a nonlinear filter providing an estimate of the (strict-sense) conditional mean

$$
\hat{x}(t)=\mathbb{E}\{x(t)\mid\mathcal{Y}_t\},
$$

where

$$
\mathcal{Y}_t:=\sigma\{y(\tau),\ \tau\in[0,t]\},\quad 0\le t\le T,
$$

is the filtration generated by the output process, that is, the increasing family of σ-fields representing the data as it is produced.
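
For the Gaussian case, the separated structure $u(t)=K(t)\hat{x}(t)$ can be sketched numerically. The example below is a minimal, hypothetical illustration under assumptions that are not part of the text: a scalar state, control and output ($a=A$, $b_1=B_1$, and so on); time-invariant coefficients; a two-dimensional $w=(w_1,w_2)$ with $B_2=(b_2,\ 0)$ and $D=(0,\ d)$, so that process and measurement noise are independent and $DD'=d^2>0$; explicit Euler integration of both Riccati equations; and an Euler-Maruyama simulation of the closed loop. In this scalar setting the regulator gain is $K=-b_1\Pi/r$ with $-\dot\Pi=2a\Pi+q-b_1^2\Pi^2/r$, $\Pi(T)=s$, and the Kalman gain is $L=cP/d^2$ with $\dot P=2aP+b_2^2-c^2P^2/d^2$, $P(0)=p_0$ (the variance of $x(0)$). The names `Model`, `lqr_gains`, `kalman_gains` and `mean_cost`, and all numerical values, are invented for this sketch.

```python
import math
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Model:
    """Hypothetical scalar instance of the system and cost.

    w = (w1, w2) is two-dimensional with B2 = (b2, 0) and D = (0, d), so the
    process and measurement noises are independent and DD' = d**2 > 0.
    """
    a: float = 1.0
    b1: float = 1.0
    b2: float = 1.0
    c: float = 1.0
    d: float = 1.0
    q: float = 1.0
    r: float = 1.0
    s: float = 1.0
    p0: float = 1.0  # variance of the zero-mean Gaussian x(0)


def lqr_gains(m: Model, T: float, n: int) -> List[float]:
    """K_k = -b1 Pi_k / r, where -dPi/dt = 2a Pi + q - b1^2 Pi^2 / r, Pi(T) = s.

    Deterministic problem (B2 = D = 0); explicit Euler, backward in time.
    """
    dt = T / n
    pi = [0.0] * (n + 1)
    pi[n] = m.s
    for k in range(n, 0, -1):
        p = pi[k]
        pi[k - 1] = p + dt * (2 * m.a * p + m.q - (m.b1 * p) ** 2 / m.r)
    return [-m.b1 * p / m.r for p in pi]


def kalman_gains(m: Model, T: float, n: int) -> List[float]:
    """L_k = c P_k / d^2, where dP/dt = 2a P + b2^2 - c^2 P^2 / d^2, P(0) = p0.

    Filtering problem only (no cost terms); explicit Euler, forward in time.
    """
    dt = T / n
    p = m.p0
    gains = [m.c * p / m.d ** 2]
    for _ in range(n):
        p += dt * (2 * m.a * p + m.b2 ** 2 - (m.c * p / m.d) ** 2)
        gains.append(m.c * p / m.d ** 2)
    return gains


def mean_cost(m: Model, K: List[float], L: List[float], T: float, n: int,
              paths: int = 200, seed: int = 0) -> float:
    """Monte Carlo estimate of J(u) for u_k = K_k * xhat_k (Euler-Maruyama)."""
    rng = random.Random(seed)
    dt = T / n
    sq = math.sqrt(dt)
    total = 0.0
    for _ in range(paths):
        x = rng.gauss(0.0, math.sqrt(m.p0))  # x(0) ~ N(0, p0)
        xhat = 0.0                           # xhat(0) = 0
        cost = 0.0
        for k in range(n):
            u = K[k] * xhat                  # u(t) = K(t) xhat(t)
            cost += (m.q * x * x + m.r * u * u) * dt
            dy = m.c * x * dt + m.d * rng.gauss(0.0, sq)
            # Filter update uses the old xhat and the innovation dy - c xhat dt.
            xhat += (m.a * xhat + m.b1 * u) * dt + L[k] * (dy - m.c * xhat * dt)
            x += (m.a * x + m.b1 * u) * dt + m.b2 * rng.gauss(0.0, sq)
        total += cost + m.s * x * x          # terminal term x(T)' S x(T)
    return total / paths


m = Model()
T, n = 5.0, 2000
K = lqr_gains(m, T, n)      # designed as if the state were measured exactly
L = kalman_gains(m, T, n)   # designed without reference to q, r or s
j_sep = mean_cost(m, K, L, T, n)
j_none = mean_cost(m, [0.0] * (n + 1), L, T, n)  # u = 0 for comparison
print(round(K[0], 4), round(L[-1], 4), j_sep < j_none)  # -2.4142 2.4142 True
```

`lqr_gains` does not use $b_2$, $c$, $d$ or $p_0$, and `kalman_gains` does not use $q$, $r$ or $s$: the two gains are designed separately and meet only in the closed loop. With these values the open-loop plant is unstable ($a=1>0$), so the separated controller gives a far smaller Monte Carlo cost than $u=0$; the comparison illustrates the structure but does not by itself verify optimality.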

In the early literature on the separation principle it was common to allow as admissible controls $u$ all processes that are adapted to the filtration $\{\mathcal{Y}_t,\ 0\le t\le T\}$. This is equivalent to allowing all non-anticipatory <a href="/facts/Borel_function/NgIHnb1b">Borel functions</a> as feedback laws, which raises the question of whether the equations of the feedback loop have a unique solution. Moreover, one needs to exclude the possibility that a nonlinear controller extracts more information from the data than is possible with a linear control law.
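
In a time-discretized setting, admissibility can be made concrete by writing a feedback law as a function of the observation record alone. The sketch below is a hypothetical illustration, not a construction taken from the literature: it assumes a scalar model with $A=B_1=C=1$, constant gains equal to the steady-state regulator and Kalman gains of that model with independent unit-intensity process and measurement noise and $Q=R=1$ (so $K=-(1+\sqrt{2})$ and $L=1+\sqrt{2}$), an Euler step of 0.01, and observation increments $dy_k$ supplied as a list. Each control $u_k$ is computed from $dy_0,\dots,dy_{k-1}$ only, a discrete analogue of adaptedness to $\mathcal{Y}_{t_k}$; the clipped law `saturated` is a nonlinear function of the data but is admissible in the same sense. All helper names are invented for this example.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical scalar model (A = B1 = C = 1) with constant gains set to the
# steady-state LQR and Kalman gains for unit noise intensities and Q = R = 1.
A, B1, C = 1.0, 1.0, 1.0
K_GAIN = -(1.0 + math.sqrt(2.0))
L_GAIN = 1.0 + math.sqrt(2.0)
DT = 0.01  # assumed Euler step

Feedback = Callable[[float], float]


def estimate(history: Sequence[float], feedback: Feedback) -> float:
    """Run a discretized Kalman-type filter over dy_0..dy_{k-1}.

    Past controls are regenerated from the same data with `feedback`, so the
    returned estimate is a function of the observation record alone.
    """
    xhat = 0.0  # xhat(0) = 0
    for dy in history:
        u = feedback(xhat)
        xhat += (A * xhat + B1 * u) * DT + L_GAIN * (dy - C * xhat * DT)
    return xhat


def controls(data: Sequence[float], feedback: Feedback) -> List[float]:
    """Return u_0..u_n; u_k is computed from the prefix data[:k] only."""
    return [feedback(estimate(data[:k], feedback)) for k in range(len(data) + 1)]


def linear(xhat: float) -> float:
    """Certainty-equivalent linear law u = K xhat."""
    return K_GAIN * xhat


def saturated(xhat: float) -> float:
    """Nonlinear law: K xhat clipped to [-1, 1]; still non-anticipatory."""
    return max(-1.0, min(1.0, K_GAIN * xhat))


# Two hypothetical observation records that agree on dy_0..dy_2 only.
common = [0.01, -0.02, 0.03]
record_a = common + [0.5, 0.5]
record_b = common + [-0.7, 0.1]
for law in (linear, saturated):
    ua, ub = controls(record_a, law), controls(record_b, law)
    print(law.__name__, ua[:4] == ub[:4], ua[4] == ub[4])
# prints "linear True False" and then "saturated True False"
```

Recomputing the estimate from the whole prefix makes each call cost $O(k)$, which is wasteful compared with a recursive update but keeps the dependence on past data visible in the function signature. The sketch does not address the continuous-time existence and uniqueness question, nor whether a nonlinear law can extract more information than a linear one.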
