A probability mass function is the probability distribution of a discrete random variable, giving each possible value and its associated probability. It is the function $p\colon \mathbb{R} \to [0,1]$ defined by

$$p_X(x) = P(X = x)$$

for $-\infty < x < \infty$,[4] where $P$ is a probability measure. $p_X(x)$ can also be written simply as $p(x)$.[5]
The probabilities associated with all (hypothetical) values must be non-negative and sum to 1:

$$\sum_x p_X(x) = 1 \quad\text{and}\quad p_X(x) \geq 0.$$
Thinking of probability as mass helps to avoid mistakes, since physical mass is conserved just as the total probability over all hypothetical outcomes $x$ is.
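As a brief illustration (not part of the original text), a pmf on a finite support can be represented as a mapping from outcomes to probabilities, and the two defining properties checked directly:

```python
from fractions import Fraction

# A pmf for a fair six-sided die, stored as a mapping from outcome to probability.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# The two defining properties: non-negativity and unit total mass.
assert all(p >= 0 for p in pmf.values())
assert sum(pmf.values()) == 1
```

Using exact rationals (`Fraction`) rather than floats makes the normalization check exact rather than approximate.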
A probability mass function of a discrete random variable $X$ can be seen as a special case of two more general measure-theoretic constructions: the distribution of $X$ and the probability density function of $X$ with respect to the counting measure. We make this more precise below.
Suppose that $(A, \mathcal{A}, P)$ is a probability space and that $(B, \mathcal{B})$ is a measurable space whose underlying σ-algebra is discrete, so in particular it contains the singleton sets of $B$. In this setting, a random variable $X\colon A \to B$ is discrete provided its image is countable. The pushforward measure $X_*(P)$, called the distribution of $X$ in this context, is a probability measure on $B$ whose restriction to singleton sets induces the probability mass function (as mentioned in the previous section) $f_X\colon B \to \mathbb{R}$, since $f_X(b) = P(X^{-1}(b)) = P(X = b)$ for each $b \in B$.
Now suppose that $(B, \mathcal{B}, \mu)$ is a measure space equipped with the counting measure $\mu$. The probability density function $f$ of $X$ with respect to the counting measure, if it exists, is the Radon–Nikodym derivative of the pushforward measure of $X$ (with respect to the counting measure), so $f = dX_*P / d\mu$, and $f$ is a function from $B$ to the non-negative reals. As a consequence, for any $b \in B$ we have

$$P(X = b) = P(X^{-1}(b)) = X_*(P)(\{b\}) = \int_{\{b\}} f \, d\mu = f(b),$$

demonstrating that $f$ is in fact a probability mass function.
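A minimal sketch of the pushforward construction on a finite probability space (the space, measure, and random variable below are illustrative choices, not from the original text):

```python
from fractions import Fraction
from collections import defaultdict

# A finite probability space: sample space A = {1, ..., 6} with the uniform
# measure P (a fair die), and a random variable X mapping each outcome to
# its parity (1 for odd, 0 for even).
P = {a: Fraction(1, 6) for a in range(1, 7)}
X = lambda a: a % 2

# Pushforward measure X_*(P): the mass it assigns to each b is P(X^{-1}({b})).
pushforward = defaultdict(Fraction)
for a, p in P.items():
    pushforward[X(a)] += p

# Restricted to singletons, this is exactly the pmf: f_X(b) = P(X = b).
```

Because the σ-algebra on the codomain is discrete, summing the mass of the preimage of each singleton is all that is needed to recover the full distribution.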
When there is a natural order among the potential outcomes $x$, it may be convenient to assign numerical values to them (or $n$-tuples in the case of a discrete multivariate random variable) and to consider also values not in the image of $X$. That is, $f_X$ may be defined for all real numbers, with $f_X(x) = 0$ for all $x \notin X(S)$, as shown in the figure.
The image of $X$ has a countable subset on which the total probability mass is one. Consequently, the probability mass function is zero for all but a countable number of values of $x$.
The discontinuity of probability mass functions is related to the fact that the cumulative distribution function of a discrete random variable is also discontinuous. If $X$ is a discrete random variable, then $P(X = x) = 1$ means that the event $(X = x)$ is certain (it occurs in 100% of trials); conversely, $P(X = x) = 0$ means that the event $(X = x)$ is impossible. This statement is not true for a continuous random variable $X$, for which $P(X = x) = 0$ for every possible $x$. Discretization is the process of converting a continuous random variable into a discrete one.
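A small sketch of discretization by binning (the bin count and the uniform source distribution are illustrative choices): samples of a continuous variable are mapped to a finite set of bins, and the resulting discrete variable has an empirical pmf estimated from frequencies.

```python
import random

random.seed(0)

# Discretize a continuous variable (uniform on [0, 1)) by binning its draws,
# then estimate the pmf of the resulting discrete variable from frequencies.
n, bins = 10_000, 4
samples = [random.random() for _ in range(n)]
counts = [0] * bins
for u in samples:
    counts[int(u * bins)] += 1  # bin index in {0, 1, 2, 3}

pmf = [c / n for c in counts]  # empirical pmf of the binned variable
```

Unlike the continuous source, the discretized variable assigns strictly positive probability to each of its finitely many values.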
Main articles: Bernoulli distribution, Binomial distribution, and Geometric distribution
Three major distributions are associated with discrete random variables: the Bernoulli distribution, the binomial distribution, and the geometric distribution.
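As a brief illustration (not part of the original text), the pmfs of these three distributions can be evaluated directly from their standard closed forms:

```python
from math import comb

def bernoulli_pmf(k, p):
    """P(X = k) for Bernoulli(p): p for success (k = 1), 1 - p for failure (k = 0)."""
    return p if k == 1 else 1 - p if k == 0 else 0.0

def binomial_pmf(k, n, p):
    """P(X = k): k successes in n independent Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k) if 0 <= k <= n else 0.0

def geometric_pmf(k, p):
    """P(X = k): first success occurs on trial k (k = 1, 2, ...)."""
    return (1 - p)**(k - 1) * p if k >= 1 else 0.0
```

Each function returns 0 outside the distribution's support, matching the convention of extending a pmf to values not in the image of the random variable.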
The following exponentially declining distribution is an example of a distribution with an infinite number of possible outcomes (all the positive integers):

$$\Pr(X = i) = \frac{1}{2^i} \qquad \text{for } i = 1, 2, 3, \dots$$

Despite the infinite number of possible outcomes, the total probability mass is $1/2 + 1/4 + 1/8 + \cdots = 1$, satisfying the unit total probability requirement for a probability distribution.
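The convergence of the total mass to 1 can be checked numerically by truncating the infinite sum:

```python
# Partial sums of Pr(X = i) = 1/2**i over i = 1, 2, 3, ... approach 1, so this
# pmf satisfies the unit-total-probability requirement even though its support
# (the positive integers) is infinite.
def pr(i):
    return 1 / 2**i

total = sum(pr(i) for i in range(1, 60))  # truncated at i = 59
```

The truncation error is $2^{-59}$, far below double-precision round-off, so the computed total is indistinguishable from 1.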
Main article: Joint probability distribution
Two or more discrete random variables have a joint probability mass function, which gives the probability of each possible combination of realizations for the random variables.
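A minimal sketch of a joint pmf for two independent fair dice (an illustrative example, not from the original text), including recovery of a marginal pmf by summing over the other variable:

```python
from itertools import product
from fractions import Fraction

# Joint pmf of two independent fair dice: each of the 36 ordered pairs (x, y)
# has probability 1/36.
joint = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}

# Marginal pmf of the first die, recovered by summing over the second.
marginal_x = {x: sum(p for (a, _), p in joint.items() if a == x)
              for x in range(1, 7)}
```

The joint pmf sums to 1 over all combinations, and each marginal is itself a valid pmf.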
"7.2 - Probability Mass Functions". STAT 414, PennState Eberly College of Science. https://online.stat.psu.edu/stat414/lesson/7/7.2
Stewart, William J. (2011). Probability, Markov Chains, Queues, and Simulation: The Mathematical Basis of Performance Modeling. Princeton University Press. p. 105. ISBN 978-1-4008-3281-1.
Dekking, Michel (2005). A Modern Introduction to Probability and Statistics: Understanding Why and How. London: Springer. ISBN 978-1-85233-896-1. OCLC 262680588.
Rao, Singiresu S. (1996). Engineering Optimization: Theory and Practice (3rd ed.). New York: Wiley. ISBN 0-471-55034-5. OCLC 62080932.