A probability mass function is the probability distribution of a discrete random variable, giving each possible value and its associated probability. It is the function $p\colon \mathbb{R} \to [0,1]$ defined by

$$p_X(x) = P(X = x)$$

for $-\infty < x < \infty$,[4] where $P$ is a probability measure. $p_X(x)$ can also be written simply as $p(x)$.[5]
The probabilities associated with all (hypothetical) values must be non-negative and sum to 1:

$$\sum_x p_X(x) = 1 \quad\text{and}\quad p_X(x) \geq 0.$$
Thinking of probability as mass helps to avoid mistakes, since physical mass is conserved just as the total probability over all hypothetical outcomes $x$ is.
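As a brief illustration (not part of the original text), a pmf on a finite support can be represented as a mapping from outcomes to probabilities, and the two defining properties checked directly:

```python
from fractions import Fraction

# A pmf for a fair six-sided die, stored as a mapping from outcome to probability.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# The two defining properties: non-negativity and unit total mass.
assert all(p >= 0 for p in pmf.values())
assert sum(pmf.values()) == 1
```

Using exact rationals (`Fraction`) rather than floats makes the normalization check exact rather than approximate.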
A probability mass function of a discrete random variable $X$ can be seen as a special case of two more general measure-theoretic constructions: the distribution of $X$ and the probability density function of $X$ with respect to the counting measure. We make this more precise below.
Suppose that $(A, \mathcal{A}, P)$ is a probability space and that $(B, \mathcal{B})$ is a measurable space whose underlying σ-algebra is discrete, so in particular it contains the singleton sets of $B$. In this setting, a random variable $X\colon A \to B$ is discrete provided its image is countable. The pushforward measure $X_*(P)$, called the distribution of $X$ in this context, is a probability measure on $B$ whose restriction to singleton sets induces the probability mass function (as mentioned in the previous section) $f_X\colon B \to \mathbb{R}$, since $f_X(b) = P(X^{-1}(b)) = P(X = b)$ for each $b \in B$.
Now suppose that $(B, \mathcal{B}, \mu)$ is a measure space equipped with the counting measure $\mu$. The probability density function $f$ of $X$ with respect to the counting measure, if it exists, is the Radon–Nikodym derivative of the pushforward measure of $X$ (with respect to the counting measure), so $f = dX_*P / d\mu$, and $f$ is a function from $B$ to the non-negative reals. As a consequence, for any $b \in B$ we have

$$P(X = b) = P(X^{-1}(b)) = X_*(P)(\{b\}) = \int_{\{b\}} f \, d\mu = f(b),$$

demonstrating that $f$ is in fact a probability mass function.
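A minimal sketch of the pushforward construction on a finite probability space (the space, measure, and random variable below are illustrative choices, not from the original text):

```python
from fractions import Fraction
from collections import defaultdict

# A finite probability space: sample space A = {1, ..., 6} with the uniform
# measure P (a fair die), and a random variable X mapping each outcome to
# its parity (1 for odd, 0 for even).
P = {a: Fraction(1, 6) for a in range(1, 7)}
X = lambda a: a % 2

# Pushforward measure X_*(P): the mass it assigns to each b is P(X^{-1}({b})).
pushforward = defaultdict(Fraction)
for a, p in P.items():
    pushforward[X(a)] += p

# Restricted to singletons, this is exactly the pmf: f_X(b) = P(X = b).
```

Because the σ-algebra on the codomain is discrete, summing the mass of the preimage of each singleton is all that is needed to recover the full distribution.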
When there is a natural order among the potential outcomes $x$, it may be convenient to assign numerical values to them (or $n$-tuples in the case of a discrete multivariate random variable) and to consider also values not in the image of $X$. That is, $f_X$ may be defined for all real numbers, with $f_X(x) = 0$ for all $x \notin X(S)$, as shown in the figure.
The image of $X$ has a countable subset on which the total probability mass is one. Consequently, the probability mass function is zero for all but a countable number of values of $x$.
The discontinuity of probability mass functions is related to the fact that the cumulative distribution function of a discrete random variable is also discontinuous. If $X$ is a discrete random variable, then $P(X = x) = 1$ means that the event $(X = x)$ is certain (it occurs in 100% of trials); conversely, $P(X = x) = 0$ means that the event $(X = x)$ is impossible. This statement is not true for a continuous random variable $X$, for which $P(X = x) = 0$ for every possible $x$. Discretization is the process of converting a continuous random variable into a discrete one.
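A small sketch of discretization by binning (the bin count and the uniform source distribution are illustrative choices): samples of a continuous variable are mapped to a finite set of bins, and the resulting discrete variable has an empirical pmf estimated from frequencies.

```python
import random

random.seed(0)

# Discretize a continuous variable (uniform on [0, 1)) by binning its draws,
# then estimate the pmf of the resulting discrete variable from frequencies.
n, bins = 10_000, 4
samples = [random.random() for _ in range(n)]
counts = [0] * bins
for u in samples:
    counts[int(u * bins)] += 1  # bin index in {0, 1, 2, 3}

pmf = [c / n for c in counts]  # empirical pmf of the binned variable
```

Unlike the continuous source, the discretized variable assigns strictly positive probability to each of its finitely many values.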
Main articles: Bernoulli distribution, Binomial distribution, and Geometric distribution
Three major distributions are associated with discrete random variables: the Bernoulli distribution, the binomial distribution, and the geometric distribution.
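As a brief illustration (not part of the original text), the pmfs of these three distributions can be evaluated directly from their standard closed forms:

```python
from math import comb

def bernoulli_pmf(k, p):
    """P(X = k) for Bernoulli(p): p for success (k = 1), 1 - p for failure (k = 0)."""
    return p if k == 1 else 1 - p if k == 0 else 0.0

def binomial_pmf(k, n, p):
    """P(X = k): k successes in n independent Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k) if 0 <= k <= n else 0.0

def geometric_pmf(k, p):
    """P(X = k): first success occurs on trial k (k = 1, 2, ...)."""
    return (1 - p)**(k - 1) * p if k >= 1 else 0.0
```

Each function returns 0 outside the distribution's support, matching the convention of extending a pmf to values not in the image of the random variable.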
The following exponentially declining distribution is an example of a distribution with an infinite number of possible outcomes (all the positive integers):

$$\Pr(X = i) = \frac{1}{2^i} \qquad \text{for } i = 1, 2, 3, \dots$$

Despite the infinite number of possible outcomes, the total probability mass is $1/2 + 1/4 + 1/8 + \cdots = 1$, satisfying the unit total probability requirement for a probability distribution.
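The convergence of the total mass to 1 can be checked numerically by truncating the infinite sum:

```python
# Partial sums of Pr(X = i) = 1/2**i over i = 1, 2, 3, ... approach 1, so this
# pmf satisfies the unit-total-probability requirement even though its support
# (the positive integers) is infinite.
def pr(i):
    return 1 / 2**i

total = sum(pr(i) for i in range(1, 60))  # truncated at i = 59
```

The truncation error is $2^{-59}$, far below double-precision round-off, so the computed total is indistinguishable from 1.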
Main article: Joint probability distribution
Two or more discrete random variables have a joint probability mass function, which gives the probability of each possible combination of realizations for the random variables.
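A minimal sketch of a joint pmf for two independent fair dice (an illustrative example, not from the original text), including recovery of a marginal pmf by summing over the other variable:

```python
from itertools import product
from fractions import Fraction

# Joint pmf of two independent fair dice: each of the 36 ordered pairs (x, y)
# has probability 1/36.
joint = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}

# Marginal pmf of the first die, recovered by summing over the second.
marginal_x = {x: sum(p for (a, _), p in joint.items() if a == x)
              for x in range(1, 7)}
```

The joint pmf sums to 1 over all combinations, and each marginal is itself a valid pmf.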
"7.2 - Probability Mass Functions". STAT 414, PennState Eberly College of Science. https://online.stat.psu.edu/stat414/lesson/7/7.2
Stewart, William J. (2011). Probability, Markov Chains, Queues, and Simulation: The Mathematical Basis of Performance Modeling. Princeton University Press. p. 105. ISBN 978-1-4008-3281-1.
Dekking, Michel (2005). A Modern Introduction to Probability and Statistics: Understanding Why and How. London: Springer. ISBN 978-1-85233-896-1. OCLC 262680588.
Rao, Singiresu S. (1996). Engineering Optimization: Theory and Practice (3rd ed.). New York: Wiley. ISBN 0-471-55034-5. OCLC 62080932.