In statistics, additive smoothing, also called Laplace smoothing or Lidstone smoothing, is a technique used to smooth count data, eliminating issues caused by certain values having 0 occurrences. Given a set of observation counts $\mathbf{x} = \langle x_1, x_2, \ldots, x_d \rangle$ from a $d$-dimensional multinomial distribution with $N$ trials, a "smoothed" version of the counts gives the estimator

$$\hat{\theta}_i = \frac{x_i + \alpha}{N + \alpha d} \qquad (i = 1, \ldots, d),$$
where the smoothed count $\hat{x}_i = N\hat{\theta}_i$, and the "pseudocount" $\alpha > 0$ is a smoothing parameter, with $\alpha = 0$ corresponding to no smoothing (this parameter is explained in § Pseudocount below). Additive smoothing is a type of shrinkage estimator, as the resulting estimate lies between the empirical probability (relative frequency) $x_i / N$ and the uniform probability $1/d$. Common choices for $\alpha$ are 0 (no smoothing), 1/2 (the Jeffreys prior), and 1 (Laplace's rule of succession), but the parameter may also be set empirically based on the observed data.
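As a minimal sketch of the estimator above, the following Python function (the name `additive_smoothing` is chosen here for illustration) adds the pseudocount $\alpha$ to each observed count and renormalizes, so a category with zero observations still receives nonzero probability:

```python
def additive_smoothing(counts, alpha=1.0):
    """Return smoothed probability estimates (x_i + alpha) / (N + alpha * d)."""
    N = sum(counts)   # total number of trials
    d = len(counts)   # number of categories
    return [(x + alpha) / (N + alpha * d) for x in counts]

# With alpha = 1 (Laplace's rule of succession), the zero-count
# category gets probability 1/7 instead of 0:
probs = additive_smoothing([3, 0, 1], alpha=1.0)
```

Note the shrinkage behavior: each estimate sits between the relative frequency $x_i/N$ and the uniform value $1/d$, and larger $\alpha$ pulls the estimates further toward uniform.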
From a Bayesian point of view, this corresponds to the expected value of the posterior distribution, using a symmetric Dirichlet distribution with parameter α as a prior distribution. In the special case where the number of categories is 2, this is equivalent to using a beta distribution as the conjugate prior for the parameters of the binomial distribution.