Reproducing kernel Hilbert space

<h2 id="definition">Definition</h2>
Let 
 
 
 
 X
 
 
 {\displaystyle X}
 
 be an arbitrary <a href="/facts/Set_(mathematics)/BfucfHMq">set</a> and 
 
 
 
 H
 
 
 {\displaystyle H}
 
 a <a href="/facts/Hilbert_space/aDFMh4lV">Hilbert space</a> of <a href="/facts/Real-valued_function/jetqlJpK">real-valued functions</a> on 
 
 
 
 X
 
 
 {\displaystyle X}
 
, equipped with pointwise addition and pointwise scalar multiplication. The <a href="/facts/Cartesian_closed_category/mPXAQps2">evaluation</a> functional over the Hilbert space of functions 
 
 
 
 H
 
 
 {\displaystyle H}
 
 is a linear functional that evaluates each function at a point 
 
 
 
 x
 
 
 {\displaystyle x}
 
,

L
          
            x
          
        
        :
        f
        ↦
        f
        (
        x
        )
        
           
        
        ∀
        f
        ∈
        H
        .
      
    
    {\displaystyle L_{x}:f\mapsto f(x){\text{   }}\forall f\in H.}

We say that H is a reproducing kernel Hilbert space if, for all 
 
 
 
 x
 
 
 {\displaystyle x}
 
 in 
 
 
 
 X
 
 
 {\displaystyle X}
 
, 
 
 
 
 
 L
 
 x
 
 
 
 
 {\displaystyle L_{x}}
 
 is <a href="/facts/Continuous_function_(topology)/sKbl02pB">continuous</a> at every 
 
 
 
 f
 
 
 {\displaystyle f}
 
 in 
 
 
 
 H
 
 
 {\displaystyle H}
 
 or, equivalently, if 
 
 
 
 
 L
 
 x
 
 
 
 
 {\displaystyle L_{x}}
 
 is a <a href="/facts/Bounded_operator/LYmDTIqR">bounded operator</a> on 
 
 
 
 H
 
 
 {\displaystyle H}
 
, i.e. there exists some 
 
 
 
 
 M
 
 x
 
 
 >
 0
 
 
 {\displaystyle M_{x}>0}
 
 such that

<table><tbody><tr><td> | L x ( f ) | := | f ( x ) | ≤ M x ‖ f ‖ H ∀ f ∈ H . {\displaystyle |L_{x}(f)|:=|f(x)|\leq M_{x}\,\|f\|_{H}\qquad \forall f\in H.\,} </td> <td></td> <td>1</td></tr></tbody></table>
Although 
 
 
 
 
 M
 
 x
 
 
 <
 ∞
 
 
 {\displaystyle M_{x}<\infty }
 
 is assumed for all 
 
 
 
 x
 ∈
 X
 
 
 {\displaystyle x\in X}
 
, it might still be the case that 
 
 
 
 
 sup
 
 x
 
 
 
 M
 
 x
 
 
 =
 ∞
 
 
 {\textstyle \sup _{x}M_{x}=\infty }
 
.
While property (1) is the weakest condition that ensures both the existence of an inner product and the evaluation of every function in 
 
 
 
 H
 
 
 {\displaystyle H}
 
 at every point in the domain, it does not lend itself to easy application in practice. A more intuitive definition of the RKHS can be obtained by observing that this property guarantees that the evaluation functional can be represented by taking the inner product of 
 
 
 
 f
 
 
 {\displaystyle f}
 
 with a function 
 
 
 
 
 K
 
 x
 
 
 
 
 {\displaystyle K_{x}}
 
 in 
 
 
 
 H
 
 
 {\displaystyle H}
 
. This function is the so-called reproducing kernel for the Hilbert space 
 
 
 
 H
 
 
 {\displaystyle H}
 
 from which the RKHS takes its name. More formally, the <a href="/facts/Riesz_representation_theorem/9CjKDUII">Riesz representation theorem</a> implies that for all 
 
 
 
 x
 
 
 {\displaystyle x}
 
 in 
 
 
 
 X
 
 
 {\displaystyle X}
 
 there exists a unique element 
 
 
 
 
 K
 
 x
 
 
 
 
 {\displaystyle K_{x}}
 
 of 
 
 
 
 H
 
 
 {\displaystyle H}
 
 with the reproducing property,

<table><tbody><tr><td> f ( x ) = L x ( f ) = ⟨ f ,   K x ⟩ H ∀ f ∈ H . {\displaystyle f(x)=L_{x}(f)=\langle f,\ K_{x}\rangle _{H}\quad \forall f\in H.} </td> <td></td> <td>2</td></tr></tbody></table>
Since 
 
 
 
 
 K
 
 x
 
 
 
 
 {\displaystyle K_{x}}
 
 is itself a function defined on 
 
 
 
 X
 
 
 {\displaystyle X}
 
 with values in the field 
 
 
 
 
 R
 
 
 
 {\displaystyle \mathbb {R} }
 
 (or 
 
 
 
 
 C
 
 
 
 {\displaystyle \mathbb {C} }
 
 in the case of complex Hilbert spaces) and as 
 
 
 
 
 K
 
 x
 
 
 
 
 {\displaystyle K_{x}}
 
 is in 
 
 
 
 H
 
 
 {\displaystyle H}
 
 we have that

K
          
            x
          
        
        (
        y
        )
        =
        
          L
          
            y
          
        
        (
        
          K
          
            x
          
        
        )
        =
        ⟨
        
          K
          
            x
          
        
        ,
         
        
          K
          
            y
          
        
        
          ⟩
          
            H
          
        
        ,
      
    
    {\displaystyle K_{x}(y)=L_{y}(K_{x})=\langle K_{x},\ K_{y}\rangle _{H},}

where 
 
 
 
 
 K
 
 y
 
 
 ∈
 H
 
 
 {\displaystyle K_{y}\in H}
 
 is the element in 
 
 
 
 H
 
 
 {\displaystyle H}
 
 associated to 
 
 
 
 
 L
 
 y
 
 
 
 
 {\displaystyle L_{y}}
 
.
This allows us to define the reproducing kernel of 
 
 
 
 H
 
 
 {\displaystyle H}
 
 as a function 
 
 
 
 K
 :
 X
 ×
 X
 →
 
 R
 
 
 
 {\displaystyle K:X\times X\to \mathbb {R} }
 
 (or 
 
 
 
 
 C
 
 
 
 {\displaystyle \mathbb {C} }
 
 in the complex case) by

K
        (
        x
        ,
        y
        )
        =
        ⟨
        
          K
          
            x
          
        
        ,
         
        
          K
          
            y
          
        
        
          ⟩
          
            H
          
        
        .
      
    
    {\displaystyle K(x,y)=\langle K_{x},\ K_{y}\rangle _{H}.}

From this definition it is easy to see that 
 
 
 
 K
 :
 X
 ×
 X
 →
 
 R
 
 
 
 {\displaystyle K:X\times X\to \mathbb {R} }
 
 (or 
 
 
 
 
 C
 
 
 
 {\displaystyle \mathbb {C} }
 
 in the complex case) is both symmetric (resp. conjugate symmetric) and <a href="/facts/Positive_definite/5Jvw9A5o">positive definite</a>, i.e.

∑
          
            i
            ,
            j
            =
            1
          
          
            n
          
        
        
          c
          
            i
          
        
        
          c
          
            j
          
        
        K
        (
        
          x
          
            i
          
        
        ,
        
          x
          
            j
          
        
        )
        =
        
          ∑
          
            i
            =
            1
          
          
            n
          
        
        
          c
          
            i
          
        
        
          
            ⟨
            
              
                K
                
                  
                    x
                    
                      i
                    
                  
                
              
              ,
              
                ∑
                
                  j
                  =
                  1
                
                
                  n
                
              
              
                c
                
                  j
                
              
              
                K
                
                  
                    x
                    
                      j
                    
                  
                
              
            
            ⟩
          
          
            H
          
        
        =
        
          
            ⟨
            
              
                ∑
                
                  i
                  =
                  1
                
                
                  n
                
              
              
                c
                
                  i
                
              
              
                K
                
                  
                    x
                    
                      i
                    
                  
                
              
              ,
              
                ∑
                
                  j
                  =
                  1
                
                
                  n
                
              
              
                c
                
                  j
                
              
              
                K
                
                  
                    x
                    
                      j
                    
                  
                
              
            
            ⟩
          
          
            H
          
        
        =
        
          
            ‖
            
              
                ∑
                
                  i
                  =
                  1
                
                
                  n
                
              
              
                c
                
                  i
                
              
              
                K
                
                  
                    x
                    
                      i
                    
                  
                
              
            
            ‖
          
          
            H
          
          
            2
          
        
        ≥
        0
      
    
    {\displaystyle \sum _{i,j=1}^{n}c_{i}c_{j}K(x_{i},x_{j})=\sum _{i=1}^{n}c_{i}\left\langle K_{x_{i}},\sum _{j=1}^{n}c_{j}K_{x_{j}}\right\rangle _{H}=\left\langle \sum _{i=1}^{n}c_{i}K_{x_{i}},\sum _{j=1}^{n}c_{j}K_{x_{j}}\right\rangle _{H}=\left\|\sum _{i=1}^{n}c_{i}K_{x_{i}}\right\|_{H}^{2}\geq 0}

for every 
 
 
 
 n
 ∈
 
 N
 
 ,
 
 x
 
 1
 
 
 ,
 …
 ,
 
 x
 
 n
 
 
 ∈
 X
 ,
 
  and 
 
 
 c
 
 1
 
 
 ,
 …
 ,
 
 c
 
 n
 
 
 ∈
 
 R
 
 .
 
 
 {\displaystyle n\in \mathbb {N} ,x_{1},\dots ,x_{n}\in X,{\text{ and }}c_{1},\dots ,c_{n}\in \mathbb {R} .}
 
<a class="footnote-ref" id="fnref:6" href="#fn:6">6</a> The Moore–Aronszajn theorem (see below) is a sort of converse to this: if a function 
 
 
 
 K
 
 
 {\displaystyle K}
 
 satisfies these conditions then there is a Hilbert space of functions on 
 
 
 
 X
 
 
 {\displaystyle X}
 
 for which it is a reproducing kernel.

<h2 id="examples">Examples</h2>
The simplest example of a reproducing kernel Hilbert space is the space 
 
 
 
 
 L
 
 2
 
 
 (
 X
 ,
 μ
 )
 
 
 {\displaystyle L^{2}(X,\mu )}
 
 where 
 
 
 
 X
 
 
 {\displaystyle X}
 
 is a set and 
 
 
 
 μ
 
 
 {\displaystyle \mu }
 
 is the <a href="/facts/Counting_measure/TA4u39yF">counting measure</a> on 
 
 
 
 X
 
 
 {\displaystyle X}
 
. For 
 
 
 
 x
 ∈
 X
 
 
 {\displaystyle x\in X}
 
, the reproducing kernel 
 
 
 
 
 K
 
 x
 
 
 
 
 {\displaystyle K_{x}}
 
 is the <a href="/facts/Indicator_function/QEuh04NM">indicator function</a> of the one point set 
 
 
 
 {
 x
 }
 ⊂
 X
 
 
 {\displaystyle \{x\}\subset X}
 
.
Nontrivial reproducing kernel Hilbert spaces often involve <a href="/facts/Analytic_function/JhL3z8FP">analytic functions</a>, as we now illustrate by example. Consider the Hilbert space of <a href="/facts/Bandlimiting/LrYmxzlA">bandlimited</a> <a href="/facts/Continuous_function/sKbl02pB">continuous functions</a> 
 
 
 
 H
 
 
 {\displaystyle H}
 
. Fix some <a href="/facts/Cutoff_frequency/vc8veeXp">cutoff frequency</a> 
 
 
 
 0
 <
 a
 <
 ∞
 
 
 {\displaystyle 0<a<\infty }
 
 and define the Hilbert space

H
        =
        {
        f
        ∈
        
          L
          
            2
          
        
        (
        
          R
        
        )
        ∣
        supp
        ⁡
        (
        F
        )
        ⊂
        [
        −
        a
        ,
        a
        ]
        }
      
    
    {\displaystyle H=\{f\in L^{2}(\mathbb {R} )\mid \operatorname {supp} (F)\subset [-a,a]\}}

where 
 
 
 
 
 L
 
 2
 
 
 (
 
 R
 
 )
 
 
 {\displaystyle L^{2}(\mathbb {R} )}
 
 is the set of square integrable functions, and 
 
 
 
 F
 (
 ω
 )
 =
 
 ∫
 
 −
 ∞
 
 
 ∞
 
 
 f
 (
 t
 )
 
 e
 
 −
 i
 ω
 t
 
 
 
 d
 t
 
 
 {\textstyle F(\omega )=\int _{-\infty }^{\infty }f(t)e^{-i\omega t}\,dt}
 
 is the <a href="/facts/Fourier_transform/XXhKJtc8">Fourier transform</a> of 
 
 
 
 f
 
 
 {\displaystyle f}
 
. As the inner product, we use

⟨
        f
        ,
        g
        
          ⟩
          
            
              L
              
                2
              
            
          
        
        =
        
          ∫
          
            −
            ∞
          
          
            ∞
          
        
        f
        (
        x
        )
        ⋅
        
          
            
              g
              (
              x
              )
            
            ¯
          
        
        
        d
        x
        .
      
    
    {\displaystyle \langle f,g\rangle _{L^{2}}=\int _{-\infty }^{\infty }f(x)\cdot {\overline {g(x)}}\,dx.}

Since this is a closed subspace of 
 
 
 
 
 L
 
 2
 
 
 (
 
 R
 
 )
 
 
 {\displaystyle L^{2}(\mathbb {R} )}
 
, it is a Hilbert space. Moreover, the elements of 
 
 
 
 H
 
 
 {\displaystyle H}
 
 are smooth functions on 
 
 
 
 
 R
 
 
 
 {\displaystyle \mathbb {R} }
 
 that tend to zero at infinity, essentially by the <a href="/facts/Riemann-Lebesgue_lemma/Dd03bg0z">Riemann-Lebesgue lemma</a>. In fact, the elements of 
 
 
 
 H
 
 
 {\displaystyle H}
 
 are the restrictions to 
 
 
 
 
 R
 
 
 
 {\displaystyle \mathbb {R} }
 
 of entire <a href="/facts/Holomorphic_function/kVzthtEf">holomorphic functions</a>, by the <a href="/facts/Paley%25E2%2580%2593Wiener_theorem/02pKjt9F">Paley–Wiener theorem</a>.
From the <a href="/facts/Fourier_inversion_theorem/aM1aMkf0">Fourier inversion theorem</a>, we have

f
        (
        x
        )
        =
        
          
            1
            
              2
              π
            
          
        
        
          ∫
          
            −
            a
          
          
            a
          
        
        F
        (
        ω
        )
        
          e
          
            i
            x
            ω
          
        
        
        d
        ω
        .
      
    
    {\displaystyle f(x)={\frac {1}{2\pi }}\int _{-a}^{a}F(\omega )e^{ix\omega }\,d\omega .}

It then follows by the <a href="/facts/Cauchy%25E2%2580%2593Schwarz_inequality/poMzQUIY">Cauchy–Schwarz inequality</a> and <a href="/facts/Plancherel%2527s_theorem/IlpoAySd">Plancherel's theorem</a> that, for all 
 
 
 
 x
 
 
 {\displaystyle x}
 
,

|
        
        f
        (
        x
        )
        
          |
        
        ≤
        
          
            1
            
              2
              π
            
          
        
        
          
            2
            a
            
              ∫
              
                −
                a
              
              
                a
              
            
            
              |
            
            F
            (
            ω
            )
            
              
                |
              
              
                2
              
            
            
            d
            ω
          
        
        =
        
          
            
              2
              a
            
            
              2
              π
            
          
        
        
          
            
              ∫
              
                −
                ∞
              
              
                ∞
              
            
            
              |
            
            F
            (
            ω
            )
            
              
                |
              
              
                2
              
            
            
            d
            ω
          
        
        =
        
          
            
              a
              π
            
          
        
        ‖
        f
        
          ‖
          
            
              L
              
                2
              
            
          
        
        .
      
    
    {\displaystyle |f(x)|\leq {\frac {1}{2\pi }}{\sqrt {2a\int _{-a}^{a}|F(\omega )|^{2}\,d\omega }}={\frac {\sqrt {2a}}{2\pi }}{\sqrt {\int _{-\infty }^{\infty }|F(\omega )|^{2}\,d\omega }}={\sqrt {\frac {a}{\pi }}}\|f\|_{L^{2}}.}

This inequality shows that the evaluation functional is bounded, proving that 
 
 
 
 H
 
 
 {\displaystyle H}
 
 is indeed a RKHS.
The kernel function 
 
 
 
 
 K
 
 x
 
 
 
 
 {\displaystyle K_{x}}
 
 in this case is given by

K
          
            x
          
        
        (
        y
        )
        =
        
          
            a
            π
          
        
        sinc
        ⁡
        
          (
          
            
              
                a
                π
              
            
            (
            y
            −
            x
            )
          
          )
        
        =
        
          
            
              sin
              ⁡
              (
              a
              (
              y
              −
              x
              )
              )
            
            
              π
              (
              y
              −
              x
              )
            
          
        
        .
      
    
    {\displaystyle K_{x}(y)={\frac {a}{\pi }}\operatorname {sinc} \left({\frac {a}{\pi }}(y-x)\right)={\frac {\sin(a(y-x))}{\pi (y-x)}}.}

The Fourier transform of 
 
 
 
 
 K
 
 x
 
 
 (
 y
 )
 
 
 {\displaystyle K_{x}(y)}
 
 defined above is given by

∫
          
            −
            ∞
          
          
            ∞
          
        
        
          K
          
            x
          
        
        (
        y
        )
        
          e
          
            −
            i
            ω
            y
          
        
        
        d
        y
        =
        
          
            {
            
              
                
                  
                    e
                    
                      −
                      i
                      ω
                      x
                    
                  
                
                
                  
                    if 
                  
                  ω
                  ∈
                  [
                  −
                  a
                  ,
                  a
                  ]
                  ,
                
              
              
                
                  0
                
                
                  
                    
                      otherwise
                    
                  
                  ,
                
              
            
            
          
        
      
    
    {\displaystyle \int _{-\infty }^{\infty }K_{x}(y)e^{-i\omega y}\,dy={\begin{cases}e^{-i\omega x}&{\text{if }}\omega \in [-a,a],\\0&{\textrm {otherwise}},\end{cases}}}

which is a consequence of the <a href="/facts/Fourier_transform/XXhKJtc8">time-shifting property of the Fourier transform</a>. Consequently, using <a href="/facts/Plancherel%2527s_theorem/IlpoAySd">Plancherel's theorem</a>, we have

⟨
        f
        ,
        
          K
          
            x
          
        
        
          ⟩
          
            
              L
              
                2
              
            
          
        
        =
        
          ∫
          
            −
            ∞
          
          
            ∞
          
        
        f
        (
        y
        )
        ⋅
        
          
            
              
                K
                
                  x
                
              
              (
              y
              )
            
            ¯
          
        
        
        d
        y
        =
        
          
            1
            
              2
              π
            
          
        
        
          ∫
          
            −
            a
          
          
            a
          
        
        F
        (
        ω
        )
        ⋅
        
          e
          
            i
            ω
            x
          
        
        
        d
        ω
        =
        f
        (
        x
        )
        .
      
    
    {\displaystyle \langle f,K_{x}\rangle _{L^{2}}=\int _{-\infty }^{\infty }f(y)\cdot {\overline {K_{x}(y)}}\,dy={\frac {1}{2\pi }}\int _{-a}^{a}F(\omega )\cdot e^{i\omega x}\,d\omega =f(x).}

Thus we obtain the reproducing property of the kernel.

 
 
 
 
 K
 
 x
 
 
 
 
 {\displaystyle K_{x}}
 
 in this case is the "bandlimited version" of the <a href="/facts/Dirac_delta_function/V8PjRP8T">Dirac delta function</a>, and that 
 
 
 
 
 K
 
 x
 
 
 (
 y
 )
 
 
 {\displaystyle K_{x}(y)}
 
 converges to 
 
 
 
 δ
 (
 y
 −
 x
 )
 
 
 {\displaystyle \delta (y-x)}
 
 in the weak sense as the cutoff frequency 
 
 
 
 a
 
 
 {\displaystyle a}
 
 tends to infinity.

<h2 id="moorearonszajn-theorem">Moore–Aronszajn theorem</h2>
We have seen how a reproducing kernel Hilbert space defines a reproducing kernel function that is both symmetric and <a href="/facts/Positive_definite_kernel/Ws3zkxfl">positive definite</a>. The Moore–Aronszajn theorem goes in the other direction; it states that every symmetric, positive definite kernel defines a unique reproducing kernel Hilbert space. The theorem first appeared in Aronszajn's Theory of Reproducing Kernels, although he attributes it to <a href="/facts/E._H._Moore/1lJXH5jn">E. H. Moore</a>.

Theorem. Suppose K is a symmetric, <a href="/facts/Positive_definite_kernel/Ws3zkxfl">positive definite kernel</a> on a set X. Then there is a unique Hilbert space of functions on X for which K is a reproducing kernel.
Proof. For all x in X, define Kx = K(x, ⋅ ). Let H0 be the linear span of {Kx : x ∈ X}. Define an inner product on H0 by

⟨
            
              
                ∑
                
                  j
                  =
                  1
                
                
                  n
                
              
              
                b
                
                  j
                
              
              
                K
                
                  
                    y
                    
                      j
                    
                  
                
              
              ,
              
                ∑
                
                  i
                  =
                  1
                
                
                  m
                
              
              
                a
                
                  i
                
              
              
                K
                
                  
                    x
                    
                      i
                    
                  
                
              
            
            ⟩
          
          
            
              H
              
                0
              
            
          
        
        =
        
          ∑
          
            i
            =
            1
          
          
            m
          
        
        
          ∑
          
            j
            =
            1
          
          
            n
          
        
        
          
            a
            
              i
            
          
        
        
          b
          
            j
          
        
        K
        (
        
          y
          
            j
          
        
        ,
        
          x
          
            i
          
        
        )
        ,
      
    
    {\displaystyle \left\langle \sum _{j=1}^{n}b_{j}K_{y_{j}},\sum _{i=1}^{m}a_{i}K_{x_{i}}\right\rangle _{H_{0}}=\sum _{i=1}^{m}\sum _{j=1}^{n}{a_{i}}b_{j}K(y_{j},x_{i}),}

which implies 
 
 
 
 K
 (
 x
 ,
 y
 )
 =
 
 
 ⟨
 
 
 K
 
 x
 
 
 ,
 
 K
 
 y
 
 
 
 ⟩
 
 
 
 H
 
 0
 
 
 
 
 
 
 {\displaystyle K(x,y)=\left\langle K_{x},K_{y}\right\rangle _{H_{0}}}
 
.
The symmetry of this inner product follows from the symmetry of K and the non-degeneracy follows from the fact that K is positive definite.
Let H be the <a href="/facts/Completion_(metric_space)/lsckmU8G">completion</a> of H0 with respect to this inner product. Then H consists of functions of the form

f
        (
        x
        )
        =
        
          ∑
          
            i
            =
            1
          
          
            ∞
          
        
        
          a
          
            i
          
        
        
          K
          
            
              x
              
                i
              
            
          
        
        (
        x
        )
        
        
          where
        
        
        
          lim
          
            n
            →
            ∞
          
        
        
          sup
          
            p
            ≥
            0
          
        
        
          
            ‖
            
              
                ∑
                
                  i
                  =
                  n
                
                
                  n
                  +
                  p
                
              
              
                a
                
                  i
                
              
              
                K
                
                  
                    x
                    
                      i
                    
                  
                
              
            
            ‖
          
          
            
              H
              
                0
              
            
          
        
        =
        0.
      
    
    {\displaystyle f(x)=\sum _{i=1}^{\infty }a_{i}K_{x_{i}}(x)\quad {\text{where}}\quad \lim _{n\to \infty }\sup _{p\geq 0}\left\|\sum _{i=n}^{n+p}a_{i}K_{x_{i}}\right\|_{H_{0}}=0.}

Now we can check the reproducing property (2):

⟨
        f
        ,
        
          K
          
            x
          
        
        
          ⟩
          
            H
          
        
        =
        
          ∑
          
            i
            =
            1
          
          
            ∞
          
        
        
          a
          
            i
          
        
        
          
            ⟨
            
              
                K
                
                  
                    x
                    
                      i
                    
                  
                
              
              ,
              
                K
                
                  x
                
              
            
            ⟩
          
          
            
              H
              
                0
              
            
          
        
        =
        
          ∑
          
            i
            =
            1
          
          
            ∞
          
        
        
          a
          
            i
          
        
        K
        (
        
          x
          
            i
          
        
        ,
        x
        )
        =
        f
        (
        x
        )
        .
      
    
    {\displaystyle \langle f,K_{x}\rangle _{H}=\sum _{i=1}^{\infty }a_{i}\left\langle K_{x_{i}},K_{x}\right\rangle _{H_{0}}=\sum _{i=1}^{\infty }a_{i}K(x_{i},x)=f(x).}

To prove uniqueness, let G be another Hilbert space of functions for which K is a reproducing kernel. For every x and y in X, (2) implies that

⟨
        
          K
          
            x
          
        
        ,
        
          K
          
            y
          
        
        
          ⟩
          
            H
          
        
        =
        K
        (
        x
        ,
        y
        )
        =
        ⟨
        
          K
          
            x
          
        
        ,
        
          K
          
            y
          
        
        
          ⟩
          
            G
          
        
        .
      
    
    {\displaystyle \langle K_{x},K_{y}\rangle _{H}=K(x,y)=\langle K_{x},K_{y}\rangle _{G}.}

By linearity, 
 
 
 
 ⟨
 ⋅
 ,
 ⋅
 
 ⟩
 
 H
 
 
 =
 ⟨
 ⋅
 ,
 ⋅
 
 ⟩
 
 G
 
 
 
 
 {\displaystyle \langle \cdot ,\cdot \rangle _{H}=\langle \cdot ,\cdot \rangle _{G}}
 
 on the span of 
 
 
 
 {
 
 K
 
 x
 
 
 :
 x
 ∈
 X
 }
 
 
 {\displaystyle \{K_{x}:x\in X\}}
 
. Then 
 
 
 
 H
 ⊂
 G
 
 
 {\displaystyle H\subset G}
 
 because G is complete and contains H0 and hence contains its completion.
Now we need to prove that every element of G is in H. Let 
 
 
 
 f
 
 
 {\displaystyle f}
 
 be an element of G. Since H is a closed subspace of G, we can write 
 
 
 
 f
 =
 
 f
 
 H
 
 
 +
 
 f
 
 
 H
 
 ⊥
 
 
 
 
 
 
 {\displaystyle f=f_{H}+f_{H^{\bot }}}
 
 where 
 
 
 
 
 f
 
 H
 
 
 ∈
 H
 
 
 {\displaystyle f_{H}\in H}
 
 and 
 
 
 
 
 f
 
 
 H
 
 ⊥
 
 
 
 
 ∈
 
 H
 
 ⊥
 
 
 
 
 {\displaystyle f_{H^{\bot }}\in H^{\bot }}
 
. Now if 
 
 
 
 x
 ∈
 X
 
 
 {\displaystyle x\in X}
 
 then, since K is a reproducing kernel of G and H:

f
        (
        x
        )
        =
        ⟨
        
          K
          
            x
          
        
        ,
        f
        
          ⟩
          
            G
          
        
        =
        ⟨
        
          K
          
            x
          
        
        ,
        
          f
          
            H
          
        
        
          ⟩
          
            G
          
        
        +
        ⟨
        
          K
          
            x
          
        
        ,
        
          f
          
            
              H
              
                ⊥
              
            
          
        
        
          ⟩
          
            G
          
        
        =
        ⟨
        
          K
          
            x
          
        
        ,
        
          f
          
            H
          
        
        
          ⟩
          
            G
          
        
        =
        ⟨
        
          K
          
            x
          
        
        ,
        
          f
          
            H
          
        
        
          ⟩
          
            H
          
        
        =
        
          f
          
            H
          
        
        (
        x
        )
        ,
      
    
    {\displaystyle f(x)=\langle K_{x},f\rangle _{G}=\langle K_{x},f_{H}\rangle _{G}+\langle K_{x},f_{H^{\bot }}\rangle _{G}=\langle K_{x},f_{H}\rangle _{G}=\langle K_{x},f_{H}\rangle _{H}=f_{H}(x),}

where we have used the fact that 
 
 
 
 
 K
 
 x
 
 
 
 
 {\displaystyle K_{x}}
 
 belongs to H so that its inner product with 
 
 
 
 
 f
 
 
 H
 
 ⊥
 
 
 
 
 
 
 {\displaystyle f_{H^{\bot }}}
 
 in G is zero.
This shows that 
 
 
 
 f
 =
 
 f
 
 H
 
 
 
 
 {\displaystyle f=f_{H}}
 
 in G and concludes the proof.

<h2 id="integral-operators-and-mercers-theorem">Integral operators and Mercer's theorem</h2>
We may characterize a symmetric positive definite kernel 
 
 
 
 K
 
 
 {\displaystyle K}
 
 via the integral operator using <a href="/facts/Mercer%2527s_theorem/6CmBHnHc">Mercer's theorem</a> and obtain an additional view of the RKHS. Let 
 
 
 
 X
 
 
 {\displaystyle X}
 
 be a compact space equipped with a strictly positive finite <a href="/facts/Borel_measure/8cYR8Ysw">Borel measure</a> 
 
 
 
 μ
 
 
 {\displaystyle \mu }
 
 and 
 
 
 
 K
 :
 X
 ×
 X
 →
 
 R
 
 
 
 {\displaystyle K:X\times X\to \mathbb {R} }
 
 a continuous, symmetric, and positive definite function. Define the integral operator 
 
 
 
 
 T
 
 K
 
 
 :
 
 L
 
 2
 
 
 (
 X
 )
 →
 
 L
 
 2
 
 
 (
 X
 )
 
 
 {\displaystyle T_{K}:L_{2}(X)\to L_{2}(X)}
 
 as

[
        
          T
          
            K
          
        
        f
        ]
        (
        ⋅
        )
        =
        
          ∫
          
            X
          
        
        K
        (
        ⋅
        ,
        t
        )
        f
        (
        t
        )
        
        d
        μ
        (
        t
        )
      
    
    {\displaystyle [T_{K}f](\cdot )=\int _{X}K(\cdot ,t)f(t)\,d\mu (t)}

where 
 
 
 
 
 L
 
 2
 
 
 (
 X
 )
 
 
 {\displaystyle L_{2}(X)}
 
 is the space of square integrable functions with respect to 
 
 
 
 μ
 
 
 {\displaystyle \mu }
 
.
Mercer's theorem states that the spectral decomposition of the integral operator 
 
 
 
 
 T
 
 K
 
 
 
 
 {\displaystyle T_{K}}
 
 of 
 
 
 
 K
 
 
 {\displaystyle K}
 
 yields a series representation of 
 
 
 
 K
 
 
 {\displaystyle K}
 
 in terms of the eigenvalues and eigenfunctions of 
 
 
 
 
 T
 
 K
 
 
 
 
 {\displaystyle T_{K}}
 
. This then implies that 
 
 
 
 K
 
 
 {\displaystyle K}
 
 is a reproducing kernel so that the corresponding RKHS can be defined in terms of these eigenvalues and eigenfunctions. We provide the details below.
Under these assumptions 
 
 
 
 
 T
 
 K
 
 
 
 
 {\displaystyle T_{K}}
 
 is a compact, continuous, self-adjoint, and positive operator. The <a href="/facts/Spectral_theorem/Vqc04B21">spectral theorem</a> for self-adjoint operators implies that there is an at most countable decreasing sequence 
 
 
 
 (
 
 σ
 
 i
 
 
 
 )
 
 i
 
 
 ≥
 0
 
 
 {\displaystyle (\sigma _{i})_{i}\geq 0}
 
 such that 
 
 
 
 
 lim
 
 i
 →
 ∞
 
 
 
 σ
 
 i
 
 
 =
 0
 
 
 {\textstyle \lim _{i\to \infty }\sigma _{i}=0}
 
 and

T
 
 K
 
 
 
 φ
 
 i
 
 
 (
 x
 )
 =
 
 σ
 
 i
 
 
 
 φ
 
 i
 
 
 (
 x
 )
 
 
 {\displaystyle T_{K}\varphi _{i}(x)=\sigma _{i}\varphi _{i}(x)}
 
, where the 
 
 
 
 {
 
 φ
 
 i
 
 
 }
 
 
 {\displaystyle \{\varphi _{i}\}}
 
 form an orthonormal basis of 
 
 
 
 
 L
 
 2
 
 
 (
 X
 )
 
 
 {\displaystyle L_{2}(X)}
 
. By the positivity of 
 
 
 
 
 T
 
 K
 
 
 ,
 
 σ
 
 i
 
 
 >
 0
 
 
 {\displaystyle T_{K},\sigma _{i}>0}
 
 for all 
 
 
 
 i
 .
 
 
 {\displaystyle i.}
 
 One can also show that 
 
 
 
 
 T
 
 K
 
 
 
 
 {\displaystyle T_{K}}
 
 maps continuously into the space of continuous functions 
 
 
 
 C
 (
 X
 )
 
 
 {\displaystyle C(X)}
 
 and therefore we may choose continuous functions as the eigenvectors, that is, 
 
 
 
 
 φ
 
 i
 
 
 ∈
 C
 (
 X
 )
 
 
 {\displaystyle \varphi _{i}\in C(X)}
 
 for all 
 
 
 
 i
 .
 
 
 {\displaystyle i.}
 
 Then by Mercer's theorem 
 
 
 
 K
 
 
 {\displaystyle K}
 
 may be written in terms of the eigenvalues and continuous eigenfunctions as

K
        (
        x
        ,
        y
        )
        =
        
          ∑
          
            j
            =
            1
          
          
            ∞
          
        
        
          σ
          
            j
          
        
        
        
          φ
          
            j
          
        
        (
        x
        )
        
        
          φ
          
            j
          
        
        (
        y
        )
      
    
    {\displaystyle K(x,y)=\sum _{j=1}^{\infty }\sigma _{j}\,\varphi _{j}(x)\,\varphi _{j}(y)}

for all 
 
 
 
 x
 ,
 y
 ∈
 X
 
 
 {\displaystyle x,y\in X}
 
 such that

lim
          
            n
            →
            ∞
          
        
        
          sup
          
            u
            ,
            v
          
        
        
          |
          
            K
            (
            u
            ,
            v
            )
            −
            
              ∑
              
                j
                =
                1
              
              
                n
              
            
            
              σ
              
                j
              
            
            
            
              φ
              
                j
              
            
            (
            u
            )
            
            
              φ
              
                j
              
            
            (
            v
            )
          
          |
        
        =
        0.
      
    
    {\displaystyle \lim _{n\to \infty }\sup _{u,v}\left|K(u,v)-\sum _{j=1}^{n}\sigma _{j}\,\varphi _{j}(u)\,\varphi _{j}(v)\right|=0.}

This above series representation is referred to as a Mercer kernel or Mercer representation of 
 
 
 
 K
 
 
 {\displaystyle K}
 
.
Furthermore, it can be shown that the RKHS 
 
 
 
 H
 
 
 {\displaystyle H}
 
 of 
 
 
 
 K
 
 
 {\displaystyle K}
 
 is given by

H
 =
 
 {
 
 f
 ∈
 
 L
 
 2
 
 
 (
 X
 )
 
 
 
 |
 
 
 
 
 ∑
 
 i
 =
 1
 
 
 ∞
 
 
 
 
 
 
 ⟨
 
 f
 ,
 
 φ
 
 i
 
 
 
 ⟩
 
 
 
 L
 
 2
 
 
 
 
 2
 
 
 
 σ
 
 i
 
 
 
 
 <
 ∞
 
 }
 
 
 
 {\displaystyle H=\left\{f\in L_{2}(X)\,{\Bigg \vert }\,\sum _{i=1}^{\infty }{\frac {\left\langle f,\varphi _{i}\right\rangle _{L_{2}}^{2}}{\sigma _{i}}}<\infty \right\}}

where the inner product of 
 
 
 
 H
 
 
 {\displaystyle H}
 
 given by

⟨
            
              f
              ,
              g
            
            ⟩
          
          
            H
          
        
        =
        
          ∑
          
            i
            =
            1
          
          
            ∞
          
        
        
          
            
              
                
                  ⟨
                  
                    f
                    ,
                    
                      φ
                      
                        i
                      
                    
                  
                  ⟩
                
                
                  
                    L
                    
                      2
                    
                  
                
              
              
                
                  ⟨
                  
                    g
                    ,
                    
                      φ
                      
                        i
                      
                    
                  
                  ⟩
                
                
                  
                    L
                    
                      2
                    
                  
                
              
            
            
              σ
              
                i
              
            
          
        
        .
      
    
    {\displaystyle \left\langle f,g\right\rangle _{H}=\sum _{i=1}^{\infty }{\frac {\left\langle f,\varphi _{i}\right\rangle _{L_{2}}\left\langle g,\varphi _{i}\right\rangle _{L_{2}}}{\sigma _{i}}}.}

This representation of the RKHS has application in probability and statistics, for example to the <a href="/facts/Karhunen%25E2%2580%2593Lo%25C3%25A8ve_theorem/HX3cIv6g">Karhunen-Loève representation</a> for stochastic processes and <a href="/facts/Kernel_PCA/VLNWSISo">kernel PCA</a>.

<h2 id="feature-maps">Feature maps</h2>
A feature map is a map 
 
 
 
 φ
 :
 X
 →
 F
 
 
 {\displaystyle \varphi \colon X\rightarrow F}
 
, where 
 
 
 
 F
 
 
 {\displaystyle F}
 
 is a Hilbert space which we will call the feature space. The first sections presented the connection between bounded/continuous evaluation functions, positive definite functions, and integral operators and in this section we provide another representation of the RKHS in terms of feature maps.
Every feature map defines a kernel via

<table><tbody><tr><td> K ( x , y ) = ⟨ φ ( x ) , φ ( y ) ⟩ F . {\displaystyle K(x,y)=\langle \varphi (x),\varphi (y)\rangle _{F}.} </td> <td></td> <td>3</td></tr></tbody></table>
Clearly 
 
 
 
 K
 
 
 {\displaystyle K}
 
 is symmetric and positive definiteness follows from the properties of inner product in 
 
 
 
 F
 
 
 {\displaystyle F}
 
. Conversely, every positive definite function and corresponding reproducing kernel Hilbert space has infinitely many associated feature maps such that (3) holds.
For example, we can trivially take 
 
 
 
 F
 =
 H
 
 
 {\displaystyle F=H}
 
 and 
 
 
 
 φ
 (
 x
 )
 =
 
 K
 
 x
 
 
 
 
 {\displaystyle \varphi (x)=K_{x}}
 
 for all 
 
 
 
 x
 ∈
 X
 
 
 {\displaystyle x\in X}
 
. Then (3) is satisfied by the reproducing property. Another classical example of a feature map relates to the previous section regarding integral operators by taking 
 
 
 
 F
 =
 
 ℓ
 
 2
 
 
 
 
 {\displaystyle F=\ell ^{2}}
 
 and 
 
 
 
 φ
 (
 x
 )
 =
 (
 
 
 
 σ
 
 i
 
 
 
 
 
 φ
 
 i
 
 
 (
 x
 )
 
 )
 
 i
 
 
 
 
 {\displaystyle \varphi (x)=({\sqrt {\sigma _{i}}}\varphi _{i}(x))_{i}}
 
.
This connection between kernels and feature maps provides us with a new way to understand positive definite functions and hence reproducing kernels as inner products in 
 
 
 
 H
 
 
 {\displaystyle H}
 
. Moreover, every feature map can naturally define a RKHS by means of the definition of a positive definite function.
Lastly, feature maps allow us to construct function spaces that reveal another perspective on the RKHS. Consider the linear space

H
          
            φ
          
        
        =
        {
        f
        :
        X
        →
        
          R
        
        ∣
        ∃
        w
        ∈
        F
        ,
        f
        (
        x
        )
        =
        ⟨
        w
        ,
        φ
        (
        x
        )
        
          ⟩
          
            F
          
        
        ,
        ∀
        
           
        
        x
        ∈
        X
        }
        .
      
    
    {\displaystyle H_{\varphi }=\{f:X\to \mathbb {R} \mid \exists w\in F,f(x)=\langle w,\varphi (x)\rangle _{F},\forall {\text{  }}x\in X\}.}

We can define a norm on 
 
 
 
 
 H
 
 φ
 
 
 
 
 {\displaystyle H_{\varphi }}
 
 by

‖
        f
        
          ‖
          
            φ
          
        
        =
        inf
        {
        ‖
        w
        
          ‖
          
            F
          
        
        :
        w
        ∈
        F
        ,
        f
        (
        x
        )
        =
        ⟨
        w
        ,
        φ
        (
        x
        )
        
          ⟩
          
            F
          
        
        ,
        ∀
        
           
        
        x
        ∈
        X
        }
        .
      
    
    {\displaystyle \|f\|_{\varphi }=\inf\{\|w\|_{F}:w\in F,f(x)=\langle w,\varphi (x)\rangle _{F},\forall {\text{  }}x\in X\}.}

It can be shown that 
 
 
 
 
 H
 
 φ
 
 
 
 
 {\displaystyle H_{\varphi }}
 
 is a RKHS with kernel defined by 
 
 
 
 K
 (
 x
 ,
 y
 )
 =
 ⟨
 φ
 (
 x
 )
 ,
 φ
 (
 y
 )
 
 ⟩
 
 F
 
 
 
 
 {\displaystyle K(x,y)=\langle \varphi (x),\varphi (y)\rangle _{F}}
 
. This representation implies that the elements of the RKHS are inner products of elements in the feature space and can accordingly be seen as hyperplanes. This view of the RKHS is related to the <a href="/facts/Kernel_trick/fYuURIPk">kernel trick</a> in machine learning.<a class="footnote-ref" id="fnref:7" href="#fn:7">7</a>

<h2 id="properties">Properties</h2>
Useful properties of RKHSs:

<ul><li>Let 
 
 
 
 (
 
 X
 
 i
 
 
 
 )
 
 i
 =
 1
 
 
 p
 
 
 
 
 {\displaystyle (X_{i})_{i=1}^{p}}
 
 be a sequence of sets and 
 
 
 
 (
 
 K
 
 i
 
 
 
 )
 
 i
 =
 1
 
 
 p
 
 
 
 
 {\displaystyle (K_{i})_{i=1}^{p}}
 
 be a collection of corresponding positive definite functions on 
 
 
 
 (
 
 X
 
 i
 
 
 
 )
 
 i
 =
 1
 
 
 p
 
 
 .
 
 
 {\displaystyle (X_{i})_{i=1}^{p}.}
 
 It then follows that

K
        (
        (
        
          x
          
            1
          
        
        ,
        …
        ,
        
          x
          
            p
          
        
        )
        ,
        (
        
          y
          
            1
          
        
        ,
        …
        ,
        
          y
          
            p
          
        
        )
        )
        =
        
          K
          
            1
          
        
        (
        
          x
          
            1
          
        
        ,
        
          y
          
            1
          
        
        )
        ⋯
        
          K
          
            p
          
        
        (
        
          x
          
            p
          
        
        ,
        
          y
          
            p
          
        
        )
      
    
    {\displaystyle K((x_{1},\ldots ,x_{p}),(y_{1},\ldots ,y_{p}))=K_{1}(x_{1},y_{1})\cdots K_{p}(x_{p},y_{p})}

is a kernel on 
 
 
 
 X
 =
 
 X
 
 1
 
 
 ×
 ⋯
 ×
 
 X
 
 p
 
 
 .
 
 
 {\displaystyle X=X_{1}\times \dots \times X_{p}.}
 
</li>
<li>Let 
 
 
 
 
 X
 
 0
 
 
 ⊂
 X
 ,
 
 
 {\displaystyle X_{0}\subset X,}
 
 then the restriction of 
 
 
 
 K
 
 
 {\displaystyle K}
 
 to 
 
 
 
 
 X
 
 0
 
 
 ×
 
 X
 
 0
 
 
 
 
 {\displaystyle X_{0}\times X_{0}}
 
 is also a reproducing kernel.</li>
<li>Consider a normalized kernel 
 
 
 
 K
 
 
 {\displaystyle K}
 
 such that 
 
 
 
 K
 (
 x
 ,
 x
 )
 =
 1
 
 
 {\displaystyle K(x,x)=1}
 
 for all 
 
 
 
 x
 ∈
 X
 
 
 {\displaystyle x\in X}
 
. Define a pseudo-metric on X as

d
          
            K
          
        
        (
        x
        ,
        y
        )
        =
        ‖
        
          K
          
            x
          
        
        −
        
          K
          
            y
          
        
        
          ‖
          
            H
          
          
            2
          
        
        =
        2
        (
        1
        −
        K
        (
        x
        ,
        y
        )
        )
        
        ∀
        x
        ∈
        X
        .
      
    
    {\displaystyle d_{K}(x,y)=\|K_{x}-K_{y}\|_{H}^{2}=2(1-K(x,y))\qquad \forall x\in X.}

By the <a href="/facts/Cauchy%25E2%2580%2593Schwarz_inequality/poMzQUIY">Cauchy–Schwarz inequality</a>,

K
        (
        x
        ,
        y
        
          )
          
            2
          
        
        ≤
        K
        (
        x
        ,
        x
        )
        K
        (
        y
        ,
        y
        )
        =
        1
        
        ∀
        x
        ,
        y
        ∈
        X
        .
      
    
    {\displaystyle K(x,y)^{2}\leq K(x,x)K(y,y)=1\qquad \forall x,y\in X.}

This inequality allows us to view 
 
 
 
 K
 
 
 {\displaystyle K}
 
 as a <a href="/facts/Similarity_measure/FmmFco8c">measure of similarity</a> between inputs. If 
 
 
 
 x
 ,
 y
 ∈
 X
 
 
 {\displaystyle x,y\in X}
 
 are similar then 
 
 
 
 K
 (
 x
 ,
 y
 )
 
 
 {\displaystyle K(x,y)}
 
 will be closer to 1 while if 
 
 
 
 x
 ,
 y
 ∈
 X
 
 
 {\displaystyle x,y\in X}
 
 are dissimilar then 
 
 
 
 K
 (
 x
 ,
 y
 )
 
 
 {\displaystyle K(x,y)}
 
 will be closer to 0.</li></ul>
<ul><li>The closure of the span of 
 
 
 
 {
 
 K
 
 x
 
 
 ∣
 x
 ∈
 X
 }
 
 
 {\displaystyle \{K_{x}\mid x\in X\}}
 
 coincides with 
 
 
 
 H
 
 
 {\displaystyle H}
 
.<a class="footnote-ref" id="fnref:8" href="#fn:8">8</a></li></ul>
<h2 id="common-examples">Common examples</h2>
<h3>Bilinear kernels</h3>

K
        (
        x
        ,
        y
        )
        =
        ⟨
        x
        ,
        y
        ⟩
      
    
    {\displaystyle K(x,y)=\langle x,y\rangle }

The RKHS 
 
 
 
 H
 
 
 {\displaystyle H}
 
 corresponding to this kernel is the dual space, consisting of functions 
 
 
 
 f
 (
 x
 )
 =
 ⟨
 x
 ,
 β
 ⟩
 
 
 {\displaystyle f(x)=\langle x,\beta \rangle }
 
 satisfying 
 
 
 
 ‖
 f
 
 ‖
 
 H
 
 
 2
 
 
 =
 ‖
 β
 
 ‖
 
 2
 
 
 
 
 {\displaystyle \|f\|_{H}^{2}=\|\beta \|^{2}}
 
.

<h3>Polynomial kernels</h3>

K
        (
        x
        ,
        y
        )
        =
        (
        α
        ⟨
        x
        ,
        y
        ⟩
        +
        1
        
          )
          
            d
          
        
        ,
        
        α
        ∈
        
          R
        
        ,
        d
        ∈
        
          N
        
      
    
    {\displaystyle K(x,y)=(\alpha \langle x,y\rangle +1)^{d},\qquad \alpha \in \mathbb {R} ,d\in \mathbb {N} }

<h3><a href="/facts/Radial_basis_function_kernel/A88JLBJF">Radial basis function kernels</a></h3>
These are another common class of kernels which satisfy 
 
 
 
 K
 (
 x
 ,
 y
 )
 =
 K
 (
 ‖
 x
 −
 y
 ‖
 )
 
 
 {\displaystyle K(x,y)=K(\|x-y\|)}
 
. Some examples include:

<ul><li>Gaussian or squared exponential kernel:

K
 (
 x
 ,
 y
 )
 =
 
 e
 
 −
 
 
 
 ‖
 x
 −
 y
 
 ‖
 
 2
 
 
 
 
 2
 
 σ
 
 2
 
 
 
 
 
 
 
 ,
 
 σ
 >
 0
 
 
 {\displaystyle K(x,y)=e^{-{\frac {\|x-y\|^{2}}{2\sigma ^{2}}}},\qquad \sigma >0}
 
</li>
<li>Laplacian kernel:

K
        (
        x
        ,
        y
        )
        =
        
          e
          
            −
            
              
                
                  ‖
                  x
                  −
                  y
                  ‖
                
                σ
              
            
          
        
        ,
        
        σ
        >
        0
      
    
    {\displaystyle K(x,y)=e^{-{\frac {\|x-y\|}{\sigma }}},\qquad \sigma >0}

The squared norm of a function 
 
 
 
 f
 
 
 {\displaystyle f}
 
 in the RKHS 
 
 
 
 H
 
 
 {\displaystyle H}
 
 with this kernel is:<a class="footnote-ref" id="fnref:9" href="#fn:9">9</a><a class="footnote-ref" id="fnref:10" href="#fn:10">10</a>

‖
 f
 
 ‖
 
 H
 
 
 2
 
 
 =
 
 ∫
 
 
 R
 
 
 
 
 
 (
 
 
 
 
 1
 σ
 
 
 f
 (
 x
 
 )
 
 2
 
 
 +
 σ
 
 f
 ′
 
 (
 x
 
 )
 
 2
 
 
 
 
 )
 
 
 
 d
 
 x
 .
 
 
 {\displaystyle \|f\|_{H}^{2}=\int _{\mathbb {R} }{\Big (}{\frac {1}{\sigma }}f(x)^{2}+\sigma f'(x)^{2}{\Big )}\mathrm {d} x.}
 
</li></ul>
<h3><a href="/facts/Bergman_kernel/IeSkXbGR">Bergman kernels</a></h3>
We also provide examples of <a href="/facts/Bergman_kernel/IeSkXbGR">Bergman kernels</a>. Let X be finite and let H consist of all complex-valued functions on X. Then an element of H can be represented as an array of complex numbers. If the usual <a href="/facts/Inner_product/HoyjElEy">inner product</a> is used, then Kx is the function whose value is 1 at x and 0 everywhere else, and 
 
 
 
 K
 (
 x
 ,
 y
 )
 
 
 {\displaystyle K(x,y)}
 
 can be thought of as an identity matrix since

K
        (
        x
        ,
        y
        )
        =
        
          
            {
            
              
                
                  1
                
                
                  x
                  =
                  y
                
              
              
                
                  0
                
                
                  x
                  ≠
                  y
                
              
            
            
          
        
      
    
    {\displaystyle K(x,y)={\begin{cases}1&x=y\\0&x\neq y\end{cases}}}

In this case, H is isomorphic to 
 
 
 
 
 
 C
 
 
 n
 
 
 
 
 {\displaystyle \mathbb {C} ^{n}}
 
.
The case of 
 
 
 
 X
 =
 
 D
 
 
 
 {\displaystyle X=\mathbb {D} }
 
 (where 
 
 
 
 
 D
 
 
 
 {\displaystyle \mathbb {D} }
 
 denotes the <a href="/facts/Unit_disc/phrUu7yC">unit disc</a>) is more sophisticated. Here the <a href="/facts/Bergman_space/r5moDRvw">Bergman space</a> 
 
 
 
 
 A
 
 2
 
 
 (
 
 D
 
 )
 
 
 {\displaystyle A^{2}(\mathbb {D} )}
 
 is the space of <a href="/facts/Square-integrable_function/IASblcJJ">square-integrable</a> <a href="/facts/Holomorphic_function/kVzthtEf">holomorphic functions</a> on 
 
 
 
 
 D
 
 
 
 {\displaystyle \mathbb {D} }
 
. It can be shown that the reproducing kernel for 
 
 
 
 
 A
 
 2
 
 
 (
 
 D
 
 )
 
 
 {\displaystyle A^{2}(\mathbb {D} )}
 
 is

K
        (
        x
        ,
        y
        )
        =
        
          
            1
            π
          
        
        
          
            1
            
              (
              1
              −
              x
              
                
                  y
                  ¯
                
              
              
                )
                
                  2
                
              
            
          
        
        .
      
    
    {\displaystyle K(x,y)={\frac {1}{\pi }}{\frac {1}{(1-x{\overline {y}})^{2}}}.}

Lastly, the space of band limited functions in 
 
 
 
 
 L
 
 2
 
 
 (
 
 R
 
 )
 
 
 {\displaystyle L^{2}(\mathbb {R} )}
 
 with bandwidth 
 
 
 
 2
 a
 
 
 {\displaystyle 2a}
 
 is a RKHS with reproducing kernel

K
        (
        x
        ,
        y
        )
        =
        
          
            
              sin
              ⁡
              a
              (
              x
              −
              y
              )
            
            
              π
              (
              x
              −
              y
              )
            
          
        
        .
      
    
    {\displaystyle K(x,y)={\frac {\sin a(x-y)}{\pi (x-y)}}.}

<h2 id="extension-to-vector-valued-functions">Extension to vector-valued functions</h2>
In this section we extend the definition of the RKHS to spaces of vector-valued functions as this extension is particularly important in <a href="/facts/Multi-task_learning/av9lCQUF">multi-task learning</a> and <a href="/facts/Manifold_regularization/foBI9c7S">manifold regularization</a>. The main difference is that the reproducing kernel 
 
 
 
 Γ
 
 
 {\displaystyle \Gamma }
 
 is a symmetric function that is now a positive semi-definite matrix for every 
 
 
 
 x
 ,
 y
 
 
 {\displaystyle x,y}
 
 in 
 
 
 
 X
 
 
 {\displaystyle X}
 
. More formally, we define a vector-valued RKHS (vvRKHS) as a Hilbert space of functions 
 
 
 
 f
 :
 X
 →
 
 
 R
 
 
 T
 
 
 
 
 {\displaystyle f:X\to \mathbb {R} ^{T}}
 
 such that for all 
 
 
 
 c
 ∈
 
 
 R
 
 
 T
 
 
 
 
 {\displaystyle c\in \mathbb {R} ^{T}}
 
 and 
 
 
 
 x
 ∈
 X
 
 
 {\displaystyle x\in X}

Γ
          
            x
          
        
        c
        (
        y
        )
        =
        Γ
        (
        x
        ,
        y
        )
        c
        ∈
        H
        
           for 
        
        y
        ∈
        X
      
    
    {\displaystyle \Gamma _{x}c(y)=\Gamma (x,y)c\in H{\text{ for }}y\in X}

and

⟨
        f
        ,
        
          Γ
          
            x
          
        
        c
        
          ⟩
          
            H
          
        
        =
        f
        (
        x
        
          )
          
            ⊺
          
        
        c
        .
      
    
    {\displaystyle \langle f,\Gamma _{x}c\rangle _{H}=f(x)^{\intercal }c.}

This second property parallels the reproducing property for the scalar-valued case. This definition can also be connected to integral operators, bounded evaluation functions, and feature maps as we saw for the scalar-valued RKHS. We can equivalently define the vvRKHS as a vector-valued Hilbert space with a bounded evaluation functional and show that this implies the existence of a unique reproducing kernel by the Riesz Representation theorem. Mercer's theorem can also be extended to address the vector-valued setting and we can therefore obtain a feature map view of the vvRKHS. Lastly, it can also be shown that the closure of the span of 
 
 
 
 {
 
 Γ
 
 x
 
 
 c
 :
 x
 ∈
 X
 ,
 c
 ∈
 
 
 R
 
 
 T
 
 
 }
 
 
 {\displaystyle \{\Gamma _{x}c:x\in X,c\in \mathbb {R} ^{T}\}}
 
 coincides with 
 
 
 
 H
 
 
 {\displaystyle H}
 
, another property similar to the scalar-valued case.
We can gain intuition for the vvRKHS by taking a component-wise perspective on these spaces. In particular, we find that every vvRKHS is isometrically <a href="/facts/Isomorphic/pj8KgkxU">isomorphic</a> to a scalar-valued RKHS on a particular input space. Let 
 
 
 
 Λ
 =
 {
 1
 ,
 …
 ,
 T
 }
 
 
 {\displaystyle \Lambda =\{1,\dots ,T\}}
 
. Consider the space 
 
 
 
 X
 ×
 Λ
 
 
 {\displaystyle X\times \Lambda }
 
 and the corresponding reproducing kernel

<table><tbody><tr><td> γ : X × Λ × X × Λ → R . {\displaystyle \gamma :X\times \Lambda \times X\times \Lambda \to \mathbb {R} .} </td> <td></td> <td>4</td></tr></tbody></table>
As noted above, the RKHS associated to this reproducing kernel is given by the closure of the span of 
 
 
 
 {
 
 γ
 
 (
 x
 ,
 t
 )
 
 
 :
 x
 ∈
 X
 ,
 t
 ∈
 Λ
 }
 
 
 {\displaystyle \{\gamma _{(x,t)}:x\in X,t\in \Lambda \}}
 
 where

γ
 
 (
 x
 ,
 t
 )
 
 
 (
 y
 ,
 s
 )
 =
 γ
 (
 (
 x
 ,
 t
 )
 ,
 (
 y
 ,
 s
 )
 )
 
 
 {\displaystyle \gamma _{(x,t)}(y,s)=\gamma ((x,t),(y,s))}
 
 for every set of pairs 
 
 
 
 (
 x
 ,
 t
 )
 ,
 (
 y
 ,
 s
 )
 ∈
 X
 ×
 Λ
 
 
 {\displaystyle (x,t),(y,s)\in X\times \Lambda }
 
.
The connection to the scalar-valued RKHS can then be made by the fact that every matrix-valued kernel can be identified with a kernel of the form of (4) via

Γ
        (
        x
        ,
        y
        
          )
          
            (
            t
            ,
            s
            )
          
        
        =
        γ
        (
        (
        x
        ,
        t
        )
        ,
        (
        y
        ,
        s
        )
        )
        .
      
    
    {\displaystyle \Gamma (x,y)_{(t,s)}=\gamma ((x,t),(y,s)).}

Moreover, every kernel with the form of (4) defines a matrix-valued kernel with the above expression. Now letting the map 
 
 
 
 D
 :
 
 H
 
 Γ
 
 
 →
 
 H
 
 γ
 
 
 
 
 {\displaystyle D:H_{\Gamma }\to H_{\gamma }}
 
 be defined as

(
        D
        f
        )
        (
        x
        ,
        t
        )
        =
        ⟨
        f
        (
        x
        )
        ,
        
          e
          
            t
          
        
        
          ⟩
          
            
              
                R
              
              
                T
              
            
          
        
      
    
    {\displaystyle (Df)(x,t)=\langle f(x),e_{t}\rangle _{\mathbb {R} ^{T}}}

where 
 
 
 
 
 e
 
 t
 
 
 
 
 {\displaystyle e_{t}}
 
 is the 
 
 
 
 
 t
 
 th
 
 
 
 
 {\displaystyle t^{\text{th}}}
 
 component of the canonical basis for 
 
 
 
 
 
 R
 
 
 T
 
 
 
 
 {\displaystyle \mathbb {R} ^{T}}
 
, one can show that 
 
 
 
 D
 
 
 {\displaystyle D}
 
 is bijective and an isometry between 
 
 
 
 
 H
 
 Γ
 
 
 
 
 {\displaystyle H_{\Gamma }}
 
 and 
 
 
 
 
 H
 
 γ
 
 
 
 
 {\displaystyle H_{\gamma }}
 
.
While this view of the vvRKHS can be useful in multi-task learning, this isometry does not reduce the study of the vector-valued case to that of the scalar-valued case. In fact, this isometry procedure can make both the scalar-valued kernel and the input space too difficult to work with in practice as properties of the original kernels are often lost.<a class="footnote-ref" id="fnref:11" href="#fn:11">11</a><a class="footnote-ref" id="fnref:12" href="#fn:12">12</a><a class="footnote-ref" id="fnref:13" href="#fn:13">13</a>
An important class of matrix-valued reproducing kernels are separable kernels which can factorized as the product of a scalar valued kernel and a 
 
 
 
 T
 
 
 {\displaystyle T}
 
-dimensional symmetric positive semi-definite matrix. In light of our previous discussion these kernels are of the form

γ
        (
        (
        x
        ,
        t
        )
        ,
        (
        y
        ,
        s
        )
        )
        =
        K
        (
        x
        ,
        y
        )
        
          K
          
            T
          
        
        (
        t
        ,
        s
        )
      
    
    {\displaystyle \gamma ((x,t),(y,s))=K(x,y)K_{T}(t,s)}

for all 
 
 
 
 x
 ,
 y
 
 
 {\displaystyle x,y}
 
 in 
 
 
 
 X
 
 
 {\displaystyle X}
 
 and 
 
 
 
 t
 ,
 s
 
 
 {\displaystyle t,s}
 
 in 
 
 
 
 T
 
 
 {\displaystyle T}
 
. As the scalar-valued kernel encodes dependencies between the inputs, we can observe that the matrix-valued kernel encodes dependencies among both the inputs and the outputs.
We lastly remark that the above theory can be further extended to spaces of functions with values in function spaces but obtaining kernels for these spaces is a more difficult task.<a class="footnote-ref" id="fnref:14" href="#fn:14">14</a>

<h2 id="connection-between-rkhss-and-the-relu-function">Connection between RKHSs and the ReLU function</h2>
The <a href="/facts/Rectifier_(neural_networks)/pwzdU9jF">ReLU function</a> is commonly defined as 
 
 
 
 f
 (
 x
 )
 =
 max
 {
 0
 ,
 x
 }
 
 
 {\displaystyle f(x)=\max\{0,x\}}
 
 and is a mainstay in the architecture of neural networks where it is used as an activation function. One can construct a ReLU-like nonlinear function using the theory of reproducing kernel Hilbert spaces. Below, we derive this construction and show how it implies the representation power of neural networks with ReLU activations. 
We will work with the Hilbert space 
 
 
 
 
 
 H
 
 
 =
 
 L
 
 2
 
 
 1
 
 
 (
 0
 )
 [
 0
 ,
 ∞
 )
 
 
 {\displaystyle {\mathcal {H}}=L_{2}^{1}(0)[0,\infty )}
 
 of absolutely continuous functions with 
 
 
 
 f
 (
 0
 )
 =
 0
 
 
 {\displaystyle f(0)=0}
 
 and square integrable (i.e. 
 
 
 
 
 L
 
 2
 
 
 
 
 {\displaystyle L_{2}}
 
) derivative. It has the inner product

⟨
        f
        ,
        g
        
          ⟩
          
            
              H
            
          
        
        =
        
          ∫
          
            0
          
          
            ∞
          
        
        
          f
          ′
        
        (
        x
        )
        
          g
          ′
        
        (
        x
        )
        
        d
        x
        .
      
    
    {\displaystyle \langle f,g\rangle _{\mathcal {H}}=\int _{0}^{\infty }f'(x)g'(x)\,dx.}

To construct the reproducing kernel it suffices to consider a dense subspace, so let 
 
 
 
 f
 ∈
 
 C
 
 1
 
 
 [
 0
 ,
 ∞
 )
 
 
 {\displaystyle f\in C^{1}[0,\infty )}
 
 and 
 
 
 
 f
 (
 0
 )
 =
 0
 
 
 {\displaystyle f(0)=0}
 
. 
The Fundamental Theorem of Calculus then gives

f
        (
        y
        )
        =
        
          ∫
          
            0
          
          
            y
          
        
        
          f
          ′
        
        (
        x
        )
        
        d
        x
        =
        
          ∫
          
            0
          
          
            ∞
          
        
        G
        (
        x
        ,
        y
        )
        
          f
          ′
        
        (
        x
        )
        
        d
        x
        =
        ⟨
        
          K
          
            y
          
        
        ,
        f
        ⟩
      
    
    {\displaystyle f(y)=\int _{0}^{y}f'(x)\,dx=\int _{0}^{\infty }G(x,y)f'(x)\,dx=\langle K_{y},f\rangle }

where

G
 (
 x
 ,
 y
 )
 =
 
 
 {
 
 
 
 1
 ,
 
 
 x
 <
 y
 
 
 
 
 0
 ,
 
 
 
 otherwise
 
 
 
 
 
 
 
 
 
 {\displaystyle G(x,y)={\begin{cases}1,&x<y\\0,&{\text{otherwise}}\end{cases}}}

and 
 
 
 
 
 K
 
 y
 
 ′
 
 (
 x
 )
 =
 G
 (
 x
 ,
 y
 )
 ,
  
 
 K
 
 y
 
 
 (
 0
 )
 =
 0
 
 
 {\displaystyle K_{y}'(x)=G(x,y),\ K_{y}(0)=0}
 
 i.e.

K
 (
 x
 ,
 y
 )
 =
 
 K
 
 y
 
 
 (
 x
 )
 =
 
 ∫
 
 0
 
 
 x
 
 
 G
 (
 z
 ,
 y
 )
 
 d
 z
 =
 
 
 {
 
 
 
 x
 ,
 
 
 0
 ≤
 x
 <
 y
 
 
 
 
 y
 ,
 
 
 
 otherwise.
 
 
 
 
 
 
 
 =
 min
 (
 x
 ,
 y
 )
 
 
 {\displaystyle K(x,y)=K_{y}(x)=\int _{0}^{x}G(z,y)\,dz={\begin{cases}x,&0\leq x<y\\y,&{\text{otherwise.}}\end{cases}}=\min(x,y)}

This implies 
 
 
 
 
 K
 
 y
 
 
 =
 K
 (
 ⋅
 ,
 y
 )
 
 
 {\displaystyle K_{y}=K(\cdot ,y)}
 
 reproduces 
 
 
 
 f
 
 
 {\displaystyle f}
 
.
Moreover the minimum function on 
 
 
 
 X
 ×
 X
 =
 [
 0
 ,
 ∞
 )
 ×
 [
 0
 ,
 ∞
 )
 
 
 {\displaystyle X\times X=[0,\infty )\times [0,\infty )}
 
 has the following representations with the ReLu function:

min
        (
        x
        ,
        y
        )
        =
        x
        −
        ReLU
        ⁡
        (
        x
        −
        y
        )
        =
        y
        −
        ReLU
        ⁡
        (
        y
        −
        x
        )
        .
      
    
    {\displaystyle \min(x,y)=x-\operatorname {ReLU} (x-y)=y-\operatorname {ReLU} (y-x).}

Using this formulation, we can apply the <a href="/facts/Representer_theorem/6bXmlSzK">representer theorem</a> to the RKHS, letting one prove the optimality of using ReLU activations in neural network settings.

<h2 id="see-also">See also</h2>
<ul><li><a href="/facts/Positive_definite_kernel/Ws3zkxfl">Positive definite kernel</a></li>
<li><a href="/facts/Mercer%2527s_theorem/6CmBHnHc">Mercer's theorem</a></li>
<li><a href="/facts/Kernel_trick/fYuURIPk">Kernel trick</a></li>
<li><a href="/facts/Kernel_embedding_of_distributions/oK7gvSdn">Kernel embedding of distributions</a></li>
<li><a href="/facts/Representer_theorem/6bXmlSzK">Representer theorem</a></li></ul>
<h2 id="notes">Notes</h2>

<ul><li>Alvarez, Mauricio, Rosasco, Lorenzo and Lawrence, Neil, “Kernels for Vector-Valued Functions: a Review,” <a href="https://arxiv.org/abs/1106.6251">https://arxiv.org/abs/1106.6251</a>, June 2011.</li>
<li><a href="/facts/Nachman_Aronszajn/7jYonRlv">Aronszajn, Nachman</a> (1950). <a href="https://doi.org/10.1090%2FS0002-9947-1950-0051437-7">"Theory of Reproducing Kernels"</a>. <a href="/facts/Transactions_of_the_American_Mathematical_Society/JmKtyCwK">Transactions of the American Mathematical Society</a>. 68 (3): 337–404. <a href="/facts/Doi_(identifier)/muM9Etpq">doi</a>:<a href="https://doi.org/10.1090%2FS0002-9947-1950-0051437-7">10.1090/S0002-9947-1950-0051437-7</a>. <a href="/facts/JSTOR_(identifier)/YTeVmaJ7">JSTOR</a> <a href="https://www.jstor.org/stable/1990404">1990404</a>. <a href="/facts/MR_(identifier)/uP137L11">MR</a> <a href="https://mathscinet.ams.org/mathscinet-getitem?mr=0051437">0051437</a>.</li>
<li>Berlinet, Alain and Thomas, Christine. Reproducing kernel Hilbert spaces in Probability and Statistics, Kluwer Academic Publishers, 2004.</li>
<li>Cucker, Felipe; Smale, Steve (2002). <a href="https://doi.org/10.1090%2FS0273-0979-01-00923-5">"On the Mathematical Foundations of Learning"</a>. <a href="/facts/Bulletin_of_the_American_Mathematical_Society/8ER5KY5z">Bulletin of the American Mathematical Society</a>. 39 (1): 1–49. <a href="/facts/Doi_(identifier)/muM9Etpq">doi</a>:<a href="https://doi.org/10.1090%2FS0273-0979-01-00923-5">10.1090/S0273-0979-01-00923-5</a>. <a href="/facts/MR_(identifier)/uP137L11">MR</a> <a href="https://mathscinet.ams.org/mathscinet-getitem?mr=1864085">1864085</a>.</li>
<li>De Vito, Ernest, Umanita, Veronica, and Villa, Silvia. "An extension of Mercer theorem to vector-valued measurable kernels," <a href="/facts/ArXiv_(identifier)/H6EtgnBe">arXiv</a>:<a href="https://arxiv.org/abs/1110.4017">1110.4017</a>, June 2013.</li>
<li>Durrett, Greg. 9.520 Course Notes, Massachusetts Institute of Technology, <a href="https://www.mit.edu/~9.520/scribe-notes/class03_gdurett.pdf">https://www.mit.edu/~9.520/scribe-notes/class03_gdurett.pdf</a>, February 2010.</li>
<li>Kimeldorf, George; <a href="/facts/Grace_Wahba/7tkXHAI9">Wahba, Grace</a> (1971). <a href="http://www.stat.wisc.edu/~wahba/ftp1/oldie/kw71.pdf">"Some results on Tchebycheffian Spline Functions"</a> (PDF). Journal of Mathematical Analysis and Applications. 33 (1): 82–95. <a href="/facts/Doi_(identifier)/muM9Etpq">doi</a>:<a href="https://doi.org/10.1016%2F0022-247X%2871%2990184-3">10.1016/0022-247X(71)90184-3</a>. <a href="/facts/MR_(identifier)/uP137L11">MR</a> <a href="https://mathscinet.ams.org/mathscinet-getitem?mr=0290013">0290013</a>.</li>
<li>Okutmustur, Baver. “Reproducing Kernel Hilbert Spaces,” M.S. dissertation, Bilkent University, <a href="https://users.metu.edu.tr/baver/MS.Thesis.pdf">https://users.metu.edu.tr/baver/MS.Thesis.pdf</a>, August 2005.</li>
<li>Paulsen, Vern. “An introduction to the theory of reproducing kernel Hilbert spaces,” <a href="https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=440218056738e05b5ab43679f932a9f33fccee87">https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=440218056738e05b5ab43679f932a9f33fccee87</a>.</li>
<li>Steinwart, Ingo; Scovel, Clint (2012). "Mercer's theorem on general domains: On the interaction between measures, kernels, and RKHSs". Constr. Approx. 35 (3): 363–417. <a href="/facts/Doi_(identifier)/muM9Etpq">doi</a>:<a href="https://doi.org/10.1007%2Fs00365-012-9153-3">10.1007/s00365-012-9153-3</a>. <a href="/facts/MR_(identifier)/uP137L11">MR</a> <a href="https://mathscinet.ams.org/mathscinet-getitem?mr=2914365">2914365</a>. <a href="/facts/S2CID_(identifier)/ldJsHa2Y">S2CID</a> <a href="https://api.semanticscholar.org/CorpusID:253885172">253885172</a>.</li>
<li>Rosasco, Lorenzo and Poggio, Thomas. "A Regularization Tour of Machine Learning – MIT 9.520 Lecture Notes" Manuscript, Dec. 2014.</li>
<li><a href="/facts/Grace_Wahba/7tkXHAI9">Wahba, Grace</a>, Spline Models for Observational Data, <a href="http://www.siam.org/books/">SIAM</a>, 1990.</li>
<li>Zhang, Haizhang; Xu, Yuesheng; Zhang, Qinghui (2012). <a href="http://www.jmlr.org/papers/volume13/zhang12a/zhang12a.pdf">"Refinement of Operator-valued Reproducing Kernels"</a> (PDF). Journal of Machine Learning Research. 13: 91–136.</li></ul>

<h2 id="references">References</h2>

<ol>
<li id="fn:1">Alpay, D., and T. M. Mills. "A family of Hilbert spaces which are not reproducing kernel Hilbert spaces." J. Anal. Appl. 1.2 (2003): 107–111. <a href="#fnref:1" class="footnote-back-ref">↩</a></li>
<li id="fn:2">Z. Pasternak-Winiarski, "On weights which admit reproducing kernel of Bergman type", International Journal of Mathematics and Mathematical Sciences, vol. 15, Issue 1, 1992.
 <a href="#fnref:2" class="footnote-back-ref">↩</a></li>
<li id="fn:3">T. Ł. Żynda, "On weights which admit reproducing kernel of Szegő type", Journal of Contemporary Mathematical Analysis (Armenian Academy of Sciences), 55, 2020.
 <a href="#fnref:3" class="footnote-back-ref">↩</a></li>
<li id="fn:4">Okutmustur <a href="#fnref:4" class="footnote-back-ref">↩</a></li>
<li id="fn:5">Paulson <a href="#fnref:5" class="footnote-back-ref">↩</a></li>
<li id="fn:6">Durrett <a href="#fnref:6" class="footnote-back-ref">↩</a></li>
<li id="fn:7">Rosasco <a href="#fnref:7" class="footnote-back-ref">↩</a></li>
<li id="fn:8">Rosasco <a href="#fnref:8" class="footnote-back-ref">↩</a></li>
<li id="fn:9">Berlinet, Alain and Thomas, Christine. Reproducing kernel Hilbert spaces in Probability and Statistics, Kluwer Academic Publishers, 2004 <a href="https://books.google.com/books?id=bX3TBwAAQBAJ&dq=%22Reproducing+kernel+Hilbert+spaces+in+Probability+and+Statistics%22&pg=PP11" target="_blank">https://books.google.com/books?id=bX3TBwAAQBAJ&dq=%22Reproducing+kernel+Hilbert+spaces+in+Probability+and+Statistics%22&pg=PP11</a> <a href="#fnref:9" class="footnote-back-ref">↩</a></li>
<li id="fn:10">Thomas-Agnan C . Computing a family of reproducing kernels for statistical applications. Numerical Algorithms, 13, pp. 21-32 (1996) <a href="#fnref:10" class="footnote-back-ref">↩</a></li>
<li id="fn:11">De Vito <a href="#fnref:11" class="footnote-back-ref">↩</a></li>
<li id="fn:12">Zhang <a href="#fnref:12" class="footnote-back-ref">↩</a></li>
<li id="fn:13">Alvarez <a href="#fnref:13" class="footnote-back-ref">↩</a></li>
<li id="fn:14">Rosasco <a href="#fnref:14" class="footnote-back-ref">↩</a></li>
</ol>

Reproducing kernel Hilbert space open-in-new

Reproducing kernel Hilbert space