If X is a random variable with a Bernoulli distribution, then:

Pr(X = 1) = p = 1 − Pr(X = 0) = 1 − q.
The probability mass function f of this distribution, over possible outcomes k, is

f(k; p) = p if k = 1, and f(k; p) = q = 1 − p if k = 0.
This can also be expressed as

f(k; p) = p^k (1 − p)^{1−k} for k ∈ {0, 1}
or as

f(k; p) = pk + (1 − p)(1 − k) for k ∈ {0, 1}.
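As a quick numerical check (a minimal sketch, not from the original article; the helper names are hypothetical), all three expressions agree on both outcomes:

```python
from math import isclose

# Three equivalent forms of the Bernoulli PMF (hypothetical helper names).
def pmf_piecewise(k, p):
    return p if k == 1 else 1 - p       # f(1) = p, f(0) = q = 1 - p

def pmf_power(k, p):
    return p ** k * (1 - p) ** (1 - k)  # p^k (1 - p)^(1 - k)

def pmf_linear(k, p):
    return p * k + (1 - p) * (1 - k)    # pk + (1 - p)(1 - k)

p = 0.3
for k in (0, 1):
    assert isclose(pmf_piecewise(k, p), pmf_power(k, p))
    assert isclose(pmf_piecewise(k, p), pmf_linear(k, p))
```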
The Bernoulli distribution is a special case of the binomial distribution with n = 1.[4]
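This special case can be checked with SciPy, assuming scipy is available; bernoulli and binom are the scipy.stats distribution objects:

```python
from math import isclose
from scipy.stats import bernoulli, binom

# Bernoulli(p) coincides with Binomial(n = 1, p) on both outcomes.
p = 0.3
for k in (0, 1):
    assert isclose(bernoulli.pmf(k, p), binom.pmf(k, 1, p))
```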
The kurtosis goes to infinity for high and low values of p, but for p = 1/2 the two-point distributions, including the Bernoulli distribution, have a lower excess kurtosis, namely −2, than any other probability distribution.
The Bernoulli distributions for 0 ≤ p ≤ 1 form an exponential family.
The maximum likelihood estimator of p based on a random sample is the sample mean.
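A minimal simulation sketch of this fact (the true p, sample size, and seed are arbitrary choices):

```python
import random

# The MLE of p for i.i.d. Bernoulli data is the sample mean.
random.seed(0)
p_true = 0.3
sample = [1 if random.random() < p_true else 0 for _ in range(100_000)]
p_hat = sum(sample) / len(sample)   # sample mean = MLE
print(p_hat)                        # close to 0.3
```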
The expected value of a Bernoulli random variable X is

E[X] = p.
This is because for a Bernoulli distributed random variable X with Pr(X = 1) = p and Pr(X = 0) = q we find

E[X] = Pr(X = 1)·1 + Pr(X = 0)·0 = p·1 + q·0 = p.
The variance of a Bernoulli distributed X is

Var[X] = pq = p(1 − p).
We first find

E[X²] = Pr(X = 1)·1² + Pr(X = 0)·0² = p·1² + q·0² = p = E[X].
From this follows

Var[X] = E[X²] − E[X]² = E[X] − E[X]² = p − p² = p(1 − p) = pq.
With this result it is easy to prove that, for any Bernoulli distribution, its variance will have a value inside [0, 1/4]: the product p(1 − p) is nonnegative and attains its maximum value 1/4 at p = 1/2.
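A small numeric sweep over p illustrates the bound (illustrative only):

```python
# Var[X] = p(1 - p) stays within [0, 1/4] and peaks at p = 0.5.
variances = [(i / 100) * (1 - i / 100) for i in range(101)]
assert all(0.0 <= v <= 0.25 for v in variances)
assert max(variances) == 0.25   # attained at p = 0.5
```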
The skewness is (q − p)/√(pq) = (1 − 2p)/√(pq). When we take the standardized Bernoulli distributed random variable (X − E[X])/√(Var[X]), we find that this random variable attains q/√(pq) with probability p and attains −p/√(pq) with probability q. Thus we get

γ₁ = p·(q/√(pq))³ + q·(−p/√(pq))³ = (pq³ − p³q)/(pq)^{3/2} = (q − p)/√(pq).
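This identity can be spot-checked numerically (a sketch; p = 0.3 is an arbitrary choice):

```python
from math import sqrt, isclose

# Skewness as the third moment of the standardized Bernoulli variable.
p = 0.3
q = 1 - p
sigma = sqrt(p * q)
skew = p * (q / sigma) ** 3 + q * (-p / sigma) ** 3
assert isclose(skew, (q - p) / sigma)
```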
The raw moments are all equal because 1^k = 1 and 0^k = 0:

E[X^k] = Pr(X = 1)·1^k + Pr(X = 0)·0^k = p·1 + q·0 = p = E[X].
The central moment of order k is given by

μ_k = q(−p)^k + p q^k.
The first six central moments are

μ₁ = 0,
μ₂ = pq,
μ₃ = pq(q − p),
μ₄ = pq(1 − 3pq),
μ₅ = pq(q − p)(1 − 2pq),
μ₆ = pq(1 − 5pq(1 − pq)).
The higher central moments can be expressed more compactly in terms of μ₂ and μ₃:

μ₄ = μ₂(1 − 3μ₂),
μ₅ = μ₃(1 − 2μ₂),
μ₆ = μ₂(1 − 5μ₂(1 − μ₂)).
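These identities can be verified numerically from the general formula for μ_k (a sketch with an arbitrary p):

```python
from math import isclose

p = 0.3
q = 1 - p

def mu(k):
    return q * (-p) ** k + p * q ** k   # central moment of order k

assert isclose(mu(4), mu(2) * (1 - 3 * mu(2)))
assert isclose(mu(5), mu(3) * (1 - 2 * mu(2)))
assert isclose(mu(6), mu(2) * (1 - 5 * mu(2) * (1 - mu(2))))
```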
The first six cumulants are

κ₁ = p,
κ₂ = μ₂,
κ₃ = μ₃,
κ₄ = μ₂(1 − 6μ₂),
κ₅ = μ₃(1 − 12μ₂),
κ₆ = μ₂(1 − 30μ₂(1 − 4μ₂)).
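A sketch cross-checking κ₄, κ₅, and κ₆ against the standard relations between cumulants and central moments (κ₄ = μ₄ − 3μ₂², κ₅ = μ₅ − 10μ₃μ₂, κ₆ = μ₆ − 15μ₄μ₂ − 10μ₃² + 30μ₂³); p is arbitrary:

```python
from math import isclose

p = 0.3
q = 1 - p
mu = lambda k: q * (-p) ** k + p * q ** k   # central moments
mu2, mu3 = mu(2), mu(3)

assert isclose(mu(4) - 3 * mu2 ** 2, mu2 * (1 - 6 * mu2))      # kappa_4
assert isclose(mu(5) - 10 * mu3 * mu2, mu3 * (1 - 12 * mu2))   # kappa_5
kappa6 = mu(6) - 15 * mu(4) * mu2 - 10 * mu3 ** 2 + 30 * mu2 ** 3
assert isclose(kappa6, mu2 * (1 - 30 * mu2 * (1 - 4 * mu2)))   # kappa_6
```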
Entropy is a measure of uncertainty or randomness in a probability distribution. For a Bernoulli random variable X with success probability p and failure probability q = 1 − p, the entropy H(X) is defined as:

H(X) = −p log p − q log q.
The entropy is maximized when p = 0.5, indicating the highest level of uncertainty when both outcomes are equally likely. The entropy is zero when p = 0 or p = 1, where one outcome is certain.
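A short sketch of the binary entropy curve's behavior (natural logarithm; illustrative only):

```python
from math import log

def entropy(p):
    if p in (0.0, 1.0):
        return 0.0                      # a certain outcome has zero entropy
    q = 1 - p
    return -p * log(p) - q * log(q)

values = [entropy(i / 100) for i in range(101)]
assert max(values) == entropy(0.5)      # maximum at p = 0.5 (log 2 nats)
assert values[0] == values[-1] == 0.0   # zero at p = 0 and p = 1
```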
Fisher information measures the amount of information that an observable random variable X carries about an unknown parameter p upon which the probability of X depends. For the Bernoulli distribution, the Fisher information with respect to the parameter p is given by:

I(p) = 1/(pq) = 1/(p(1 − p)).
Proof: The likelihood function for a Bernoulli random variable X is

L(p; X) = p^X (1 − p)^{1−X}.

This represents the probability of observing X given the parameter p. The log-likelihood is

ln L(p; X) = X ln p + (1 − X) ln(1 − p),

and its derivative with respect to p (the score function) is

∂/∂p ln L(p; X) = X/p − (1 − X)/(1 − p).

The Fisher information is the expected value of the squared score:

I(p) = E[(X/p − (1 − X)/(1 − p))²] = p·(1/p)² + q·(1/(1 − p))² = 1/p + 1/q = 1/(pq).
The Fisher information 1/(pq) is minimized when p = 0.5, where the variance pq of a single observation is largest, so a single observation is least informative about p when the outcomes are equally likely; it grows without bound as p approaches 0 or 1.
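The closed form can be cross-checked against the expected squared score from the proof above (a sketch with an arbitrary p):

```python
from math import isclose

# E[score^2] with score = X/p - (1 - X)/(1 - p):
# X = 1 gives 1/p (probability p); X = 0 gives -1/(1 - p) (probability q).
p = 0.3
q = 1 - p
expected_sq_score = p * (1 / p) ** 2 + q * (-1 / q) ** 2
assert isclose(expected_sq_score, 1 / (p * q))   # equals 1/(pq)
```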
1. Uspensky, James Victor (1937). Introduction to Mathematical Probability. New York: McGraw-Hill. p. 45. OCLC 996937.
2. Dekking, Frederik; Kraaikamp, Cornelis; Lopuhaä, Hendrik; Meester, Ludolf (9 October 2010). A Modern Introduction to Probability and Statistics (1st ed.). Springer London. pp. 43–48. ISBN 9781849969529.
3. Bertsekas, Dimitri P.; Tsitsiklis, John N. (2002). Introduction to Probability. Belmont, Mass.: Athena Scientific. ISBN 188652940X. OCLC 51441829.
4. McCullagh, Peter; Nelder, John (1989). Generalized Linear Models (2nd ed.). Boca Raton: Chapman and Hall/CRC. Section 4.2.2. ISBN 0-412-31760-5.
5. Orloff, Jeremy; Bloom, Jonathan. "Conjugate priors: Beta and normal" (PDF). math.mit.edu. Retrieved October 20, 2023. https://math.mit.edu/~dav/05.dir/class15-prep.pdf