Marginal distribution
Probability distribution of a subset (the "marginal variables") of a collection of random variables

In probability theory and statistics, the marginal distribution of a subset of a collection of random variables is the probability distribution of the variables contained in the subset, without reference to the values of the other variables. This contrasts with a conditional distribution, which gives the probabilities contingent upon the values of the other variables. The name derives from the practice of writing the row and column sums of a two-way table of joint probabilities in its margins; the process of computing a marginal distribution by summing (or integrating) over the other variables is called marginalizing them out. In data analysis, one often begins with a large collection of variables and then focuses on the marginal distribution of a selected subset, such as sums or other derived quantities, to gain targeted insight into specific aspects of the data.


Definition

Marginal probability mass function

Given a known joint distribution of two discrete random variables, say, X and Y, the marginal distribution of either variable – X for example – is the probability distribution of X when the values of Y are not taken into consideration. This can be calculated by summing the joint probability distribution over all values of Y. Naturally, the converse is also true: the marginal distribution can be obtained for Y by summing over the separate values of X.

p_X(x_i) = \sum_j p(x_i, y_j), \qquad p_Y(y_j) = \sum_i p(x_i, y_j)

The table below shows the joint and marginal distributions of a pair of discrete random variables X and Y that are dependent, and thus have nonzero mutual information I(X; Y). The joint distribution occupies the central 3×4 block; the marginal distributions appear along the right and bottom margins.

| Y \ X | x1 | x2 | x3 | x4 | pY(y) ↓ |
|---|---|---|---|---|---|
| y1 | 4/32 | 2/32 | 1/32 | 1/32 | 8/32 |
| y2 | 3/32 | 6/32 | 3/32 | 3/32 | 15/32 |
| y3 | 9/32 | 0 | 0 | 0 | 9/32 |
| pX(x) → | 16/32 | 8/32 | 4/32 | 4/32 | 32/32 |
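A minimal Python sketch of this computation (the array below simply transcribes the table above; NumPy is assumed to be available):

```python
import numpy as np

# Joint distribution from the table: rows are y1..y3, columns are x1..x4.
joint = np.array([
    [4, 2, 1, 1],
    [3, 6, 3, 3],
    [9, 0, 0, 0],
]) / 32

p_X = joint.sum(axis=0)  # marginalize out Y (sum each column)
p_Y = joint.sum(axis=1)  # marginalize out X (sum each row)
print(p_X)  # [0.5   0.25    0.125   0.125 ] = 16/32, 8/32, 4/32, 4/32
print(p_Y)  # [0.25  0.46875 0.28125]        =  8/32, 15/32, 9/32
```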

A marginal probability can always be written as an expected value:

p_X(x) = \int_y p_{X\mid Y}(x\mid y)\, p_Y(y)\, \mathrm{d}y = \operatorname{E}_Y[p_{X\mid Y}(x\mid Y)].

Intuitively, the marginal probability of X is computed by examining the conditional probability of X given a particular value of Y, and then averaging this conditional probability over the distribution of all values of Y.

This follows from the definition of expected value (after applying the law of the unconscious statistician):

\operatorname{E}_Y[f(Y)] = \int_y f(y)\, p_Y(y)\, \mathrm{d}y.

Therefore, marginalization provides the rule for transforming the probability distribution of a random variable Y into the distribution of a derived random variable X = g(Y):

p_X(x) = \int_y p_{X\mid Y}(x\mid y)\, p_Y(y)\, \mathrm{d}y = \int_y \delta\big(x - g(y)\big)\, p_Y(y)\, \mathrm{d}y.
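As a numerical sketch of this transformation rule (the choices Y ~ Uniform(0, 1) and g(y) = y² are illustrative assumptions, not from the article): marginalizing δ(x − g(y)) against p_Y gives p_X(x) = 1/(2√x), whose cumulative distribution function F_X(x) = √x can be checked by simulation:

```python
import numpy as np

# Assumed example: Y ~ Uniform(0, 1), X = g(Y) = Y^2.
# The transformation rule gives p_X(x) = 1 / (2 * sqrt(x)), so F_X(x) = sqrt(x).
rng = np.random.default_rng(0)
y = rng.uniform(0.0, 1.0, size=1_000_000)
x = y**2  # samples of X = g(Y)

for t in (0.04, 0.25, 0.64):
    print((x <= t).mean(), np.sqrt(t))  # empirical vs analytic F_X(t)
```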

Marginal probability density function

If the joint distribution of two continuous random variables X and Y is known, then the marginal probability density function of either variable can be obtained by integrating the joint probability density function, f, over the other variable. That is

f_X(x) = \int_c^d f(x, y)\, \mathrm{d}y, \qquad f_Y(y) = \int_a^b f(x, y)\, \mathrm{d}x,

where x ∈ [a, b] and y ∈ [c, d].
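A short numerical sketch of this integration (the joint density f(x, y) = x + y on the unit square is an assumed example, and SciPy's quad routine performs the one-dimensional integral):

```python
from scipy.integrate import quad

# Assumed joint density f(x, y) = x + y on [0, 1] x [0, 1].
# Integrating over y yields the marginal f_X(x) = x + 1/2.
def joint(y, x):
    return x + y

def f_X(x):
    value, _ = quad(joint, 0.0, 1.0, args=(x,))  # integrate over y in [c, d] = [0, 1]
    return value

for x in (0.0, 0.25, 0.5, 1.0):
    print(x, f_X(x))  # approximately x + 0.5
```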

Marginal cumulative distribution function

Finding the marginal cumulative distribution function from the joint cumulative distribution function is easy. Recall that:

  • For discrete random variables, F(x, y) = P(X ≤ x, Y ≤ y).
  • For continuous random variables, F(x, y) = \int_a^x \int_c^y f(x', y')\, \mathrm{d}y'\, \mathrm{d}x'.

If X and Y jointly take values on [a, b] × [c, d] then

F_X(x) = F(x, d) and F_Y(y) = F(b, y).

If d is ∞, then this becomes a limit: F_X(x) = \lim_{y\to\infty} F(x, y). Likewise for F_Y(y).
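A small sketch of the limiting case (the joint CDF of two independent Exponential(1) variables is an assumed example, chosen because its limit has a simple closed form):

```python
import math

# Assumed joint CDF of two independent Exponential(1) variables on [0, inf)^2:
# F(x, y) = (1 - exp(-x)) * (1 - exp(-y)).
def F(x, y):
    return (1.0 - math.exp(-x)) * (1.0 - math.exp(-y))

# Evaluating at a large y approximates the limit y -> inf,
# recovering the marginal CDF F_X(x) = 1 - exp(-x).
x = 1.5
print(F(x, 50.0))          # approx 0.77687
print(1.0 - math.exp(-x))  # exact marginal CDF
```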

Marginal distribution vs. conditional distribution

Definition

The marginal probability is the probability of a single event occurring, irrespective of the outcomes of other events. A conditional probability, on the other hand, is the probability that an event occurs given that another specific event has already occurred; its calculation therefore depends on the value of another variable.2

The conditional distribution of a variable given another variable is the joint distribution of both variables divided by the marginal distribution of the other variable.3 That is,

  • For discrete random variables, p_{Y|X}(y | x) = P(Y = y ∣ X = x) = P(X = x, Y = y) / P_X(x).
  • For continuous random variables, f_{Y|X}(y | x) = f_{X,Y}(x, y) / f_X(x).
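A brief Python sketch of the discrete case, reusing the joint table from the definition section (NumPy broadcasting divides each column by the corresponding marginal):

```python
import numpy as np

# Joint distribution from the earlier table: rows y1..y3, columns x1..x4.
joint = np.array([
    [4, 2, 1, 1],
    [3, 6, 3, 3],
    [9, 0, 0, 0],
]) / 32

p_X = joint.sum(axis=0)        # marginal distribution of X
cond_Y_given_X = joint / p_X   # column j holds p_{Y|X}(y_i | x_j)
print(cond_Y_given_X[:, 0])    # [0.25 0.1875 0.5625]; each column sums to 1
```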

Example

Suppose there is data from a classroom of 200 students on the amount of time studied (X) and the percentage of correct answers (Y).4 Assuming that X and Y are discrete random variables, the joint distribution of X and Y can be described by listing all possible values of p(xi, yj), as shown in the table below.

Two-way table for a classroom of 200 students, relating the amount of time studied to the percentage of correct answers

| % correct (Y) \ Time studied in minutes (X) | x1 (0–20) | x2 (21–40) | x3 (41–60) | x4 (>60) | pY(y) ↓ |
|---|---|---|---|---|---|
| y1 (0–20) | 2/200 | 0 | 0 | 8/200 | 10/200 |
| y2 (21–40) | 10/200 | 2/200 | 8/200 | 0 | 20/200 |
| y3 (41–59) | 2/200 | 4/200 | 32/200 | 32/200 | 70/200 |
| y4 (60–79) | 0 | 20/200 | 30/200 | 10/200 | 60/200 |
| y5 (80–100) | 0 | 4/200 | 16/200 | 20/200 | 40/200 |
| pX(x) → | 14/200 | 30/200 | 86/200 | 70/200 | 1 |

The marginal distribution can be used to determine how many students scored 20 or below:

p_Y(y_1) = P(Y = y_1) = \sum_{i=1}^{4} P(x_i, y_1) = \frac{2}{200} + \frac{8}{200} = \frac{10}{200},

meaning 10 students, or 5%.

The conditional distribution can be used to determine the probability that a student who studied 60 minutes or more obtains a score of 20 or below:

p_{Y|X}(y_1 | x_4) = P(Y = y_1 \mid X = x_4) = \frac{P(X = x_4, Y = y_1)}{P(X = x_4)} = \frac{8/200}{70/200} = \frac{8}{70} = \frac{4}{35} \approx 0.114,

meaning there is about an 11% probability of scoring 20 or below after having studied for at least 60 minutes.
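Both calculations can be reproduced directly from the table (a short sketch; the counts array transcribes the table entries before division by 200):

```python
import numpy as np

# Joint counts from the classroom table: rows y1..y5, columns x1..x4.
counts = np.array([
    [ 2,  0,  0,  8],
    [10,  2,  8,  0],
    [ 2,  4, 32, 32],
    [ 0, 20, 30, 10],
    [ 0,  4, 16, 20],
])
joint = counts / counts.sum()  # divide by 200 to get probabilities

p_y1 = joint[0].sum()                            # marginal P(Y = y1)
p_y1_given_x4 = joint[0, 3] / joint[:, 3].sum()  # conditional P(Y = y1 | X = x4)
print(p_y1)           # 0.05        (10/200)
print(p_y1_given_x4)  # 0.11428...  (4/35)
```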

Real-world example

Suppose that the probability that a pedestrian will be hit by a car, while crossing the road at a pedestrian crossing, without paying attention to the traffic light, is to be computed. Let H be a discrete random variable taking one value from {Hit, Not Hit}. Let L (for traffic light) be a discrete random variable taking one value from {Red, Yellow, Green}.

Realistically, H will be dependent on L. That is, P(H = Hit) will take different values depending on whether L is red, yellow or green (and likewise for P(H = Not Hit)). A person is, for example, far more likely to be hit by a car when trying to cross while the lights for perpendicular traffic are green than if they are red. In other words, for any given possible pair of values for H and L, one must consider the joint probability distribution of H and L to find the probability of that pair of events occurring together if the pedestrian ignores the state of the light.

However, in trying to calculate the marginal probability P(H = Hit), what is being sought is the probability that H = Hit in the situation in which the particular value of L is unknown and in which the pedestrian ignores the state of the light. In general, a pedestrian can be hit if the lights are red OR if the lights are yellow OR if the lights are green. So, the answer for the marginal probability can be found by summing P(H | L) for all possible values of L, with each value of L weighted by its probability of occurring.

Here is a table showing the conditional probabilities of being hit, depending on the state of the lights. (Note that the columns in this table must add up to 1 because the probability of being hit or not hit is 1 regardless of the state of the light.)

Conditional distribution: P(H ∣ L)

| H \ L | Red | Yellow | Green |
|---|---|---|---|
| Not Hit | 0.99 | 0.9 | 0.2 |
| Hit | 0.01 | 0.1 | 0.8 |

To find the joint probability distribution, more data is required. For example, suppose P(L = red) = 0.2, P(L = yellow) = 0.1, and P(L = green) = 0.7. Multiplying each column in the conditional distribution by the probability of that column occurring results in the joint probability distribution of H and L, given in the central 2×3 block of entries. (Note that the cells in this 2×3 block add up to 1).

Joint distribution: P(H, L)

| H \ L | Red | Yellow | Green | Marginal probability P(H) |
|---|---|---|---|---|
| Not Hit | 0.198 | 0.09 | 0.14 | 0.428 |
| Hit | 0.002 | 0.01 | 0.56 | 0.572 |
| Total | 0.2 | 0.1 | 0.7 | 1 |

The marginal probability P(H = Hit) = 0.572 is the sum along the H = Hit row of this joint distribution table, as this is the probability of being hit when the lights are red OR yellow OR green. Similarly, the marginal probability P(H = Not Hit) = 0.428 is the sum along the H = Not Hit row.
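The whole construction fits in a few lines of Python (a sketch; the arrays transcribe the two tables above, and broadcasting multiplies each column of P(H | L) by the corresponding P(L)):

```python
import numpy as np

# Conditional distribution P(H | L): rows are (Not Hit, Hit),
# columns are (Red, Yellow, Green).
cond_H_given_L = np.array([
    [0.99, 0.9, 0.2],
    [0.01, 0.1, 0.8],
])
p_L = np.array([0.2, 0.1, 0.7])  # P(L = Red), P(L = Yellow), P(L = Green)

joint = cond_H_given_L * p_L  # P(H, L) = P(H | L) * P(L), column by column
p_H = joint.sum(axis=1)       # marginalize out L
print(joint)  # [[0.198 0.09 0.14 ] [0.002 0.01 0.56 ]]
print(p_H)    # [0.428 0.572]
```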

Multivariate distributions

For multivariate distributions, formulae similar to those above apply with the symbols X and/or Y being interpreted as vectors. In particular, each summation or integration would be over all variables except those contained in X.5

That is, if X1, X2, …, Xn are discrete random variables, then the marginal probability mass function of Xi is

p_{X_i}(k) = \sum p(x_1, x_2, \dots, x_{i-1}, k, x_{i+1}, \dots, x_n),

where the sum runs over all values of the variables other than Xi; if X1, X2, …, Xn are continuous random variables, then the marginal probability density function of Xi is

f_{X_i}(x_i) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, x_2, \dots, x_n)\, \mathrm{d}x_1\, \mathrm{d}x_2 \cdots \mathrm{d}x_{i-1}\, \mathrm{d}x_{i+1} \cdots \mathrm{d}x_n.
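In code, marginalizing a multivariate discrete distribution amounts to summing an n-dimensional array over every axis except the one of interest (a sketch using an arbitrary, randomly generated joint pmf as the assumed input):

```python
import numpy as np

# A joint pmf over three discrete variables, stored as a 3-D array whose
# entry [i, j, k] is P(X1 = i, X2 = j, X3 = k).
rng = np.random.default_rng(0)
joint = rng.random((2, 3, 4))
joint /= joint.sum()  # normalize so the entries form a valid joint pmf

# Marginal pmf of X2: sum over every axis except axis 1.
p_X2 = joint.sum(axis=(0, 2))
print(p_X2, p_X2.sum())  # a length-3 pmf that sums to 1
```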

Bibliography

  • Everitt, B. S.; Skrondal, A. (2010). Cambridge Dictionary of Statistics. Cambridge University Press.
  • Dekking, F. M.; Kraaikamp, C.; Lopuhaä, H. P.; Meester, L. E. (2005). A Modern Introduction to Probability and Statistics. London: Springer. ISBN 9781852338961.

References

  1. Trumpler, Robert J. & Harold F. Weaver (1962). Statistical Astronomy. Dover Publications. pp. 32–33.

  2. "Marginal & Conditional Probability Distributions: Definition & Examples". Study.com. Retrieved 2019-11-16. https://study.com/academy/lesson/marginal-conditional-probability-distributions-definition-examples.html

  3. "Exam P [FSU Math]". www.math.fsu.edu. Retrieved 2019-11-16. https://www.math.fsu.edu/~paris/Pexam/

  4. "Marginal and conditional distributions". Khan Academy. Retrieved 2019-11-16. https://www.khanacademy.org/math/ap-statistics/analyzing-categorical-ap/distributions-two-way-tables/v/marginal-distribution-and-conditional-distribution

  5. Dekking, F. M.; Kraaikamp, C.; Lopuhaä, H. P.; Meester, L. E. (2005). A Modern Introduction to Probability and Statistics: Understanding Why and How. London: Springer. ISBN 9781852338961. OCLC 262680588.