The growth function, also called the shatter coefficient or the shattering number, measures the richness of a set family or class of functions. It arises especially in statistical learning theory, where it is used to study properties of statistical learning methods. The term 'growth function' was coined by Vapnik and Chervonenkis in their 1968 paper, in which they also proved many of its properties. It is a basic concept in machine learning.
Definitions
Set-family definition
Let $H$ be a set family (a set of sets) and $C$ a set. Their intersection is defined as the following set family:
$$H\cap C:=\{h\cap C\mid h\in H\}.$$
The intersection-size (also called the index) of $H$ with respect to $C$ is $|H\cap C|$. If a set $C_m$ has $m$ elements, then the index is at most $2^m$. If the index is exactly $2^m$, then the set $C$ is said to be shattered by $H$, because $H\cap C$ contains all the subsets of $C$, i.e.:
$$|H\cap C|=2^{|C|}.$$
The growth function measures the size of $H\cap C$ as a function of $|C|$. Formally:
$$\operatorname{Growth}(H,m):=\max_{C:|C|=m}|H\cap C|.$$
Hypothesis-class definition
Equivalently, let $H$ be a hypothesis class (a set of binary functions) and $C=\{x_1,\ldots,x_m\}$ a set with $m$ elements. The restriction of $H$ to $C$ is the set of binary functions on $C$ that can be derived from $H$:[4]: 45
$$H_{C}:=\{(h(x_{1}),\ldots ,h(x_{m}))\mid h\in H\}.$$
The growth function measures the size of $H_{C}$ as a function of $|C|$:[5]: 49
$$\operatorname{Growth}(H,m):=\max_{C:|C|=m}|H_{C}|.$$
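When both the family $H$ and the domain are finite, the growth function can be computed directly from either definition by brute force. The following Python sketch is purely illustrative (the helper name growth and the toy half-line family are assumptions of this sketch, not part of the source): it enumerates all $m$-element subsets $C$ and counts the distinct intersections $h\cap C$.

```python
from itertools import combinations

def growth(family, domain, m):
    """Brute-force Growth(H, m): the maximum number of distinct intersections
    h ∩ C over all subsets C of the (finite) domain with |C| = m."""
    best = 0
    for C in combinations(domain, m):
        C = frozenset(C)
        restrictions = {frozenset(h) & C for h in family}
        best = max(best, len(restrictions))
    return best

# Toy check on a 4-point domain: H = half-lines {x : x > x0} restricted to
# the domain, which should give Growth(H, m) = m + 1 (see Example 1 below).
domain = [1, 2, 3, 4]
H = [{x for x in domain if x > x0} for x0 in [0, 1, 2, 3, 4]]
print([growth(H, domain, m) for m in range(1, 5)])  # -> [2, 3, 4, 5]
```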
Examples
1. The domain is the real line $\mathbb{R}$. The set family $H$ contains all the half-lines (rays) from a given number to positive infinity, i.e., all sets of the form $\{x\in\mathbb{R}\mid x>x_{0}\}$ for some $x_{0}\in\mathbb{R}$. For any set $C$ of $m$ real numbers, the intersection $H\cap C$ contains $m+1$ sets: the empty set, the set containing the largest element of $C$, the set containing the two largest elements of $C$, and so on. Therefore, $\operatorname{Growth}(H,m)=m+1$.[6]: Ex.1 The same is true whether $H$ contains open half-lines, closed half-lines, or both.
2. The domain is the segment $[0,1]$. The set family $H$ contains all the open sets. For any finite set $C$ of $m$ real numbers, the intersection $H\cap C$ contains all possible subsets of $C$. There are $2^m$ such subsets, so $\operatorname{Growth}(H,m)=2^m$.[7]: Ex.2
3. The domain is the Euclidean space $\mathbb{R}^n$. The set family $H$ contains all the half-spaces of the form $\{x\mid x\cdot \phi \geq 1\}$, where $\phi$ ranges over the vectors in $\mathbb{R}^n$. Then $\operatorname{Growth}(H,m)=\operatorname{Comp}(n,m)$, where $\operatorname{Comp}(n,m)$ is the number of components in a partitioning of an $n$-dimensional space by $m$ hyperplanes.[8]: Ex.3
4. The domain is the real line $\mathbb{R}$. The set family $H$ contains all the real intervals, i.e., all sets of the form $\{x\in\mathbb{R}\mid x_{0}\leq x\leq x_{1}\}$ for some $x_{0},x_{1}\in\mathbb{R}$. For any set $C$ of $m$ real numbers, the intersection $H\cap C$ contains all runs of between $0$ and $m$ consecutive elements of $C$. The number of such runs is ${m+1 \choose 2}+1$, so $\operatorname{Growth}(H,m)={m+1 \choose 2}+1$.
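Example 4 can be checked numerically: enumerating the distinct subsets of $m$ points cut out by closed intervals reproduces ${m+1\choose 2}+1$. The sketch below is an illustrative assumption (the helper name interval_growth and the endpoint enumeration are not from the source); it relies on the fact that every nonempty intersection is realized by an interval whose endpoints are among the points themselves.

```python
from math import comb

def interval_growth(points):
    """Count the distinct subsets of `points` cut out by closed intervals
    [a, b]; for m points this should equal C(m+1, 2) + 1 (Example 4)."""
    pts = sorted(points)
    cuts = {frozenset()}                      # the empty intersection
    cuts |= {frozenset(x for x in pts if a <= x <= b)
             for a in pts for b in pts if a <= b}
    return len(cuts)

m = 5
print(interval_growth(range(m)), comb(m + 1, 2) + 1)   # both print 16
```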
Polynomial or exponential
The main property that makes the growth function interesting is that it can be either polynomial or exponential, with nothing in between.
The following is a property of the intersection-size:[9]: Lem.1
If, for some set $C_m$ of size $m$ and some number $n\leq m$, $|H\cap C_m|\geq \operatorname{Comp}(n,m)$, then there exists a subset $C_n\subseteq C_m$ of size $n$ such that $|H\cap C_n|=2^{n}$.
This implies the following property of the growth function.[10]: Th.1 For every family $H$ there are two cases:
- The exponential case: $\operatorname{Growth}(H,m)=2^{m}$ identically.
- The polynomial case: $\operatorname{Growth}(H,m)$ is majorized by $\operatorname{Comp}(n,m)\leq m^{n}+1$, where $n$ is the smallest integer for which $\operatorname{Growth}(H,n)<2^{n}$.
Other properties
Trivial upper bound
For any finite $H$:
$$\operatorname{Growth}(H,m)\leq |H|,$$
since for every $C$, the number of elements in $H\cap C$ is at most $|H|$. Therefore, the growth function is mainly interesting when $H$ is infinite.
Exponential upper bound
For any nonempty $H$:
$$\operatorname{Growth}(H,m)\leq 2^{m}.$$
That is, the growth function has an exponential upper bound.
We say that a set family $H$ shatters a set $C$ if their intersection contains all possible subsets of $C$, i.e. $H\cap C=2^{C}$. If $H$ shatters a set $C$ of size $m$, then $\operatorname{Growth}(H,m)=2^{m}$, which attains the upper bound.
Cartesian intersection
Define the Cartesian intersection of two set families as:
$$H_{1}\bigotimes H_{2}:=\{h_{1}\cap h_{2}\mid h_{1}\in H_{1},h_{2}\in H_{2}\}.$$
Then:[11]: 57
$$\operatorname{Growth}(H_{1}\bigotimes H_{2},m)\leq \operatorname{Growth}(H_{1},m)\cdot \operatorname{Growth}(H_{2},m).$$
Union
For every two set families:[12]: 58
$$\operatorname{Growth}(H_{1}\cup H_{2},m)\leq \operatorname{Growth}(H_{1},m)+\operatorname{Growth}(H_{2},m).$$
VC dimension
The VC dimension of $H$ is defined according to these two cases:
- In the polynomial case, $\operatorname{VCDim}(H)=n-1$, the largest integer $d$ for which $\operatorname{Growth}(H,d)=2^{d}$.
- In the exponential case, $\operatorname{VCDim}(H)=\infty$.
So $\operatorname{VCDim}(H)\geq d$ if and only if $\operatorname{Growth}(H,d)=2^{d}$.
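This equivalence gives a direct (if expensive) way to compute the VC dimension of a finite family on a finite domain from its growth values. The sketch below builds on the brute-force growth helper sketched earlier and is likewise only illustrative:

```python
def vc_dimension(family, domain):
    """Largest d such that Growth(H, d) = 2**d, i.e. some d-point subset of
    the domain is shattered; uses the growth() helper defined above."""
    d = 0
    for m in range(1, len(domain) + 1):
        if growth(family, domain, m) == 2 ** m:
            d = m
        else:
            break   # once Growth(H, m) < 2**m, the same holds for larger m
    return d

# The half-line family from the earlier sketch has VC dimension 1.
print(vc_dimension(H, domain))   # -> 1
```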
The growth function can be regarded as a refinement of the concept of VC dimension. The VC dimension only tells us whether $\operatorname{Growth}(H,d)$ is equal to or smaller than $2^{d}$, while the growth function tells us exactly how $\operatorname{Growth}(H,m)$ changes as a function of $m$.
Another connection between the growth function and the VC dimension is given by the Sauer–Shelah lemma:[13]: 49
If $\operatorname{VCDim}(H)=d$, then for all $m$:
$$\operatorname{Growth}(H,m)\leq \sum _{i=0}^{d}{m \choose i}.$$
In particular, for all $m>d+1$:
$$\operatorname{Growth}(H,m)\leq (em/d)^{d}=O(m^{d}),$$
so when the VC dimension is finite, the growth function grows polynomially with $m$. This upper bound is tight, i.e., for all $m>d$ there exists $H$ with VC dimension $d$ such that:[14]: 56
$$\operatorname{Growth}(H,m)=\sum _{i=0}^{d}{m \choose i}.$$
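The two bounds of the lemma are easy to tabulate numerically. The helper name sauer_bound below is an assumption of this illustrative sketch:

```python
from math import comb, e

def sauer_bound(d, m):
    """Sauer–Shelah bound on Growth(H, m) for VC dimension d:
    sum_{i=0}^{d} C(m, i)."""
    return sum(comb(m, i) for i in range(d + 1))

# For d = 3, the binomial sum stays well below the looser bound (em/d)^d.
d = 3
for m in [4, 8, 16, 32]:
    print(m, sauer_bound(d, m), round((e * m / d) ** d, 1))
```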
Entropy
While the growth function is related to the maximum intersection-size, the entropy is related to the average intersection size:[15]: 272–273
$$\operatorname{Entropy}(H,m)=E_{|C_{m}|=m}{\big [}\log _{2}(|H\cap C_{m}|){\big ]}.$$
The intersection-size has the following property. For every set family $H$:
$$|H\cap (C_{1}\cup C_{2})|\leq |H\cap C_{1}|\cdot |H\cap C_{2}|.$$
Hence:
$$\operatorname{Entropy}(H,m_{1}+m_{2})\leq \operatorname{Entropy}(H,m_{1})+\operatorname{Entropy}(H,m_{2}).$$
Moreover, the sequence $\operatorname{Entropy}(H,m)/m$ converges to a constant $c\in [0,1]$ as $m\to \infty$.
Furthermore, the random variable $\log _{2}(|H\cap C_{m}|)/m$ is concentrated near $c$.
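For a finite domain, the entropy can be estimated by Monte Carlo simulation, sampling $C_m$ with replacement according to the chosen measure. The following sketch (the helper entropy_estimate, the 6-point domain, and the weights are all assumptions for illustration) uses the half-line family, for which $\operatorname{Entropy}(H,m)/m$ shrinks toward $c=0$, consistent with its polynomial growth:

```python
import random
from math import log2

def entropy_estimate(family, domain, weights, m, trials=5000):
    """Monte-Carlo estimate of Entropy(H, m) = E[log2 |H ∩ C_m|], where C_m
    is a sample of m points drawn from `domain` with the given probability
    weights, independently and with replacement."""
    total = 0.0
    for _ in range(trials):
        C = frozenset(random.choices(domain, weights=weights, k=m))
        total += log2(len({frozenset(h) & C for h in family}))
    return total / trials

# Half-lines on a 6-point domain with a nonuniform measure.
domain = [1, 2, 3, 4, 5, 6]
H = [{x for x in domain if x > x0} for x0 in range(0, 7)]
weights = [0.4, 0.2, 0.1, 0.1, 0.1, 0.1]
for m in [1, 2, 4, 8]:
    print(m, round(entropy_estimate(H, domain, weights, m) / m, 3))
```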
Applications in probability theory
Let $\Omega$ be a set on which a probability measure $\Pr$ is defined. Let $H$ be a family of subsets of $\Omega$ (i.e., a family of events).
Suppose we choose a set $C_m$ that contains $m$ elements of $\Omega$, where each element is chosen at random according to the probability measure $\Pr$, independently of the others (i.e., with replacement). For each event $h\in H$, we compare the following two quantities:
- Its relative frequency in $C_m$, i.e., $|h\cap C_{m}|/m$;
- Its probability $\Pr[h]$.
We are interested in the difference $D(h,C_{m}):={\big |}|h\cap C_{m}|/m-\Pr[h]{\big |}$. This difference satisfies the following upper bound:
$$\Pr \left[\forall h\in H:D(h,C_{m})\leq {\sqrt {\frac{8(\ln \operatorname {Growth}(H,2m)+\ln(4/\delta ))}{m}}}\right]>1-\delta ,$$
which is equivalent to:[16]: Th.2
$$\Pr {\big [}\forall h\in H:D(h,C_{m})\leq \varepsilon {\big ]}>1-4\cdot \operatorname {Growth}(H,2m)\cdot \exp(-\varepsilon ^{2}m/8).$$
In words: the probability that, for all events in $H$, the relative frequency is near the probability is lower-bounded by an expression that depends on the growth function of $H$.
A corollary of this is that, if the growth function is polynomial in $m$ (i.e., there exists some $n$ such that $\operatorname{Growth}(H,m)\leq m^{n}+1$), then the above probability approaches 1 as $m\to \infty$. That is, the family $H$ enjoys uniform convergence in probability.
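To see how the bound behaves, the first inequality can be evaluated directly for a given growth function. The helper name deviation_bound below is an assumption of this illustrative sketch; it uses the half-line family of Example 1, for which $\operatorname{Growth}(H,2m)=2m+1$:

```python
from math import log, sqrt

def deviation_bound(growth_2m, m, delta):
    """Right-hand side of the uniform-convergence bound: with probability at
    least 1 - delta, every event's relative frequency is within this value of
    its probability, given Growth(H, 2m)."""
    return sqrt(8 * (log(growth_2m) + log(4 / delta)) / m)

# Half-line events: Growth(H, 2m) = 2m + 1, so the bound on D(h, C_m)
# shrinks roughly like sqrt(log(m) / m).
for m in [100, 1000, 10000, 100000]:
    print(m, round(deviation_bound(2 * m + 1, m, delta=0.05), 4))
```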
References
[1, 6–10, 15, 16] Vapnik, V. N.; Chervonenkis, A. Ya. (1971). "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities". Theory of Probability & Its Applications. 16 (2): 264. doi:10.1137/1116025. This is an English translation, by B. Seckler, of the Russian paper: "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities". Dokl. Akad. Nauk. 181 (4): 781. 1968. The translation was reproduced as: Vapnik, V. N.; Chervonenkis, A. Ya. (2015). "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities". Measures of Complexity. p. 11. doi:10.1007/978-3-319-21852-6_3. ISBN 978-3-319-21851-9.
[2, 11, 12, 14] Mohri, Mehryar; Rostamizadeh, Afshin; Talwalkar, Ameet (2012). Foundations of Machine Learning. Cambridge, MA: MIT Press. ISBN 9780262018258. Especially Section 3.2.
[3–5, 13] Shalev-Shwartz, Shai; Ben-David, Shai (2014). Understanding Machine Learning – from Theory to Algorithms. Cambridge University Press. ISBN 9781107057135.