Formally, let $Y_i$, $i = 1, 2, \ldots, n$, be an independent sample from $n$ of $N \geq n$ distinct strata with an overall mean $\mu$. Suppose further that $\pi_i$ is the inclusion probability that a randomly sampled individual in a superpopulation belongs to the $i$th stratum. The Horvitz–Thompson estimator of the total is given by:[3, p. 51]

$$\hat{Y}_{HT} = \sum_{i=1}^{n} \pi_i^{-1} Y_i,$$

and the Horvitz–Thompson estimate of the mean is given by:

$$\hat{\mu}_{HT} = N^{-1} \hat{Y}_{HT} = N^{-1} \sum_{i=1}^{n} \pi_i^{-1} Y_i.$$
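As a minimal sketch of the two estimators described above, the following Python code weights each sampled value by the inverse of its inclusion probability. The function names `ht_total` and `ht_mean` and the sample values are hypothetical, chosen only for illustration, and the inclusion probabilities are assumed to be known and strictly positive.

```python
def ht_total(y, pi):
    """Horvitz-Thompson estimate of the total: sum of y_i / pi_i over the sample."""
    if len(y) != len(pi):
        raise ValueError("y and pi must have the same length")
    if any(not 0 < p <= 1 for p in pi):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    return sum(yi / p for yi, p in zip(y, pi))


def ht_mean(y, pi, N):
    """Horvitz-Thompson estimate of the mean: estimated total divided by N."""
    return ht_total(y, pi) / N


# Hypothetical sample of n = 3 values drawn from N = 8 strata.
y = [10.0, 20.0, 30.0]
pi = [0.5, 0.25, 0.5]
N = 8

print(ht_total(y, pi))    # 10/0.5 + 20/0.25 + 30/0.5 = 160.0
print(ht_mean(y, pi, N))  # 160.0 / 8 = 20.0
```

A value sampled with a small inclusion probability (here 0.25) stands in for more unsampled units, so it receives a proportionally larger weight.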
In a Bayesian probabilistic framework, $\pi_i$ is considered the proportion of individuals in a target population belonging to the $i$th stratum. Hence, $Y_i / \pi_i$ could be thought of as an estimate of the complete sample of persons within the $i$th stratum. The Horvitz–Thompson estimator can also be expressed as the limit of a weighted bootstrap resampling estimate of the mean. It can also be viewed as a special case of multiple imputation approaches.[4]
For post-stratified study designs, estimation of $\pi$ and $\mu$ is done in distinct steps. In such cases, computing the variance of $\hat{\mu}_{HT}$ is not straightforward. Resampling techniques such as the bootstrap or the jackknife can be applied to obtain consistent estimates of the variance of the Horvitz–Thompson estimator.[5] The "survey" package for R conducts analyses for post-stratified data using the Horvitz–Thompson estimator.[6]
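To illustrate the resampling idea in its simplest form, the sketch below applies a naive with-replacement bootstrap to the Horvitz–Thompson mean. It is an assumption-laden illustration and not the procedure used by the "survey" package: the inclusion probabilities are treated as known and are not re-estimated in each replicate (as a post-stratified design would require), no finite-population correction is applied, and the function names and data are hypothetical.

```python
import random


def ht_mean(y, pi, N):
    """Horvitz-Thompson estimate of the mean for sampled values y."""
    return sum(yi / p for yi, p in zip(y, pi)) / N


def bootstrap_variance(y, pi, N, replicates=2000, seed=0):
    """Naive bootstrap variance of the HT mean.

    Resamples the (y_i, pi_i) pairs with replacement, recomputes the
    estimate for each replicate, and returns the sample variance of the
    replicate estimates.
    """
    rng = random.Random(seed)
    n = len(y)
    estimates = []
    for _ in range(replicates):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(ht_mean([y[i] for i in idx], [pi[i] for i in idx], N))
    centre = sum(estimates) / replicates
    return sum((e - centre) ** 2 for e in estimates) / (replicates - 1)


# Hypothetical sample of n = 4 values drawn from N = 10 strata.
y = [10.0, 20.0, 30.0, 25.0]
pi = [0.5, 0.25, 0.5, 0.4]
print(bootstrap_variance(y, pi, N=10))
```

Fixing the seed makes the result reproducible. A design-aware bootstrap would also mimic the sampling scheme and repeat the estimation of $\pi$ inside each replicate.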
For this proof it will be useful to represent the sample as a random subset $S \subseteq \{1, \ldots, N\}$ of size $n$. We can then define indicator random variables $I_j = \mathbf{1}[j \in S]$, one for each $j$ in $\{1, \ldots, N\}$, recording whether unit $j$ is present in the sample. Note that for any unit $i$, the expectation of its indicator is, by definition, the inclusion probability: $\pi_i = \mathbb{E}(I_i) = \Pr(i \in S)$.[7]
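Taking the expectation of the estimator, treating the values $Y_j$ as fixed and using linearity of expectation, we can prove that it is unbiased as follows:

$$\mathbb{E}\left(\hat{\mu}_{HT}\right) = \mathbb{E}\left(\frac{1}{N} \sum_{j=1}^{N} I_j \frac{Y_j}{\pi_j}\right) = \frac{1}{N} \sum_{j=1}^{N} \mathbb{E}(I_j) \frac{Y_j}{\pi_j} = \frac{1}{N} \sum_{j=1}^{N} \pi_j \frac{Y_j}{\pi_j} = \frac{1}{N} \sum_{j=1}^{N} Y_j = \mu.$$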
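The unbiasedness argument can be checked exactly on a small example by enumerating every possible sample. The sketch below is illustrative only: the population values and the unequal-probability design (each subset's probability proportional to the sum of its members' one-based indices) are hypothetical assumptions. It derives each inclusion probability as the total probability of the subsets containing that unit, then compares the expected estimate with the population mean using exact rational arithmetic.

```python
from fractions import Fraction
from itertools import combinations


def ht_mean(sample, y, pi):
    """HT estimate of the mean from the unit indices in `sample`."""
    return sum(Fraction(y[j]) / pi[j] for j in sample) / len(y)


# Hypothetical population of N = 4 values; samples have fixed size n = 2.
y = [3, 7, 1, 10]
N, n = len(y), 2

# Hypothetical design: P(S) is proportional to the sum of (j + 1) over j in S.
subsets = list(combinations(range(N), n))
weights = [sum(j + 1 for j in S) for S in subsets]
design = {S: Fraction(w, sum(weights)) for S, w in zip(subsets, weights)}

# Inclusion probability pi_j = Pr(j in S), summed over subsets containing j.
pi = [sum(p for S, p in design.items() if j in S) for j in range(N)]

# Expected value of the estimator over all possible samples.
expected = sum(p * ht_mean(S, y, pi) for S, p in design.items())
mu = Fraction(sum(y), N)

print(expected == mu)  # True: the estimator is unbiased under this design
```

Because each sample has fixed size $n$, the inclusion probabilities sum to $n$, which gives a quick consistency check on the design.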
The Hansen–Hurwitz (1943) strategy is known to be inferior to the Horvitz–Thompson (1952) strategy associated with a number of inclusion probability proportional to size (IPPS) sampling procedures.[8]
1. Horvitz, D. G.; Thompson, D. J. (1952). "A generalization of sampling without replacement from a finite universe". Journal of the American Statistical Association. 47: 663–685. JSTOR 2280784.
2. Cochran, William G. (1977). Sampling Techniques (3rd ed.). Wiley. ISBN 0-471-16240-X.
3. Särndal, Carl-Erik; Swensson, Bengt; Wretman, Jan Håkan (1992). Model Assisted Survey Sampling. ISBN 9780387975283.
4. Little, Roderick J. A.; Rubin, Donald B. (2002). Statistical Analysis With Missing Data (2nd ed.). Wiley. ISBN 0-471-18386-5.
5. Quatember, A. (2014). "The Finite Population Bootstrap - from the Maximum Likelihood to the Horvitz-Thompson Approach". Austrian Journal of Statistics. 43 (2): 93–102. doi:10.17713/ajs.v43i2.10. https://doi.org/10.17713%2Fajs.v43i2.10
6. "CRAN - Package survey". 19 July 2021. https://cran.r-project.org/web/packages/survey/
7. Technically, the indexing scheme in the proof is different from the indexing in the description of the estimator. In the proof, $Y_j$ is the $j$th value in a global ordering out of $N$ strata. In the description, $Y_i$ is the $i$th value in the sample, out of $n$. To unify these two, we could explicitly define a function mapping sample indices to global indices.
8. Prabhu-Ajgaonkar, S. G. (1987). "Comparison of the Horvitz–Thompson Strategy with the Hansen–Hurwitz Strategy". Survey Methodology: 221. (pdf) https://www150.statcan.gc.ca/n1/en/pub/12-001-x/1987002/article/14609-eng.pdf?st=mgQEBG-Z