Main article: Bias of an estimator
Statistical bias is a feature of a statistical technique or of its results whereby the expected value of the results differs from the true underlying quantitative parameter being estimated. The bias of an estimator of a parameter should not be confused with its degree of precision, as the degree of precision is a measure of the sampling error. The bias is defined as follows: let T be a statistic used to estimate a parameter θ, and let E(T) denote the expected value of T. Then,

bias(T, θ) = E(T) − θ

is called the bias of the statistic T (with respect to θ). If bias(T, θ) = 0, then T is said to be an unbiased estimator of θ; otherwise, it is said to be a biased estimator of θ.
The bias of a statistic T is always relative to the parameter θ it is used to estimate, but the parameter θ is often omitted when it is clear from the context what is being estimated.
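The definition above can be checked numerically. The following sketch (an illustration, not part of the article's sources; the population, sample size, and trial count are chosen arbitrarily) approximates bias(T, θ) = E(T) − θ by averaging two variance estimators over many simulated samples:

```python
import random

# Illustrative sketch: approximate the bias of two variance estimators
# for a known population (uniform on [0, 1), true variance = 1/12).
random.seed(0)
true_var = 1 / 12
n, trials = 5, 200_000

sum_biased = 0.0    # divides the sum of squares by n     -> biased
sum_unbiased = 0.0  # divides the sum of squares by n - 1 -> unbiased
for _ in range(trials):
    xs = [random.random() for _ in range(n)]
    m = sum(xs) / n
    ss = sum((x - m) ** 2 for x in xs)
    sum_biased += ss / n
    sum_unbiased += ss / (n - 1)

# bias(T, theta) = E(T) - theta, with E(T) approximated by the trial average
bias_biased = sum_biased / trials - true_var      # close to -true_var / n
bias_unbiased = sum_unbiased / trials - true_var  # close to 0
```

The estimator that divides by n has expected value ((n − 1)/n)σ², so its bias is −σ²/n; dividing by n − 1 removes the bias.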
Statistical bias can arise at every stage of data analysis. Sources of bias are listed below by stage.
Selection bias involves some individuals being more likely to be selected for study than others, biasing the sample. This can also be termed the selection effect, sampling bias, or Berksonian bias.3
Type I and Type II errors in statistical hypothesis testing lead to wrong conclusions.12 A Type I error occurs when the null hypothesis is true but is rejected. For instance, suppose the null hypothesis is that a driver whose average speed is between 75 and 85 km/h is not speeding, while an average speed outside that range counts as speeding. If a driver whose average speed lies within that range, say 80 km/h, nevertheless receives a ticket, the decision maker has committed a Type I error: the average speed satisfies the null hypothesis, but the hypothesis is rejected. Conversely, a Type II error occurs when the null hypothesis is false but is accepted.
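By construction, the probability of a Type I error equals the significance level of the test. This can be seen in a short simulation (an illustrative sketch with arbitrarily chosen sample size and trial count, not drawn from the article's sources), which repeatedly applies a two-sided z-test to data for which the null hypothesis is true:

```python
import random
import statistics

# Hypothetical sketch: empirical Type I error rate of a two-sided z-test.
# Under H0 the data are N(0, 1); we reject when |z| > 1.96 (alpha = 0.05).
random.seed(1)
n, trials, crit = 30, 20_000, 1.96

rejections = 0
for _ in range(trials):
    xs = [random.gauss(0, 1) for _ in range(n)]  # H0 is true here
    z = statistics.fmean(xs) * n ** 0.5          # known sigma = 1
    if abs(z) > crit:
        rejections += 1                          # each rejection is a Type I error

type1_rate = rejections / trials  # close to alpha = 0.05
```

Since every simulated sample satisfies the null hypothesis, every rejection is a Type I error, and the observed rejection rate converges to α.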
Bias in hypothesis testing occurs when the power (the complement of the Type II error rate) at some alternative is lower than the supremum of the Type I error rate (which is usually the significance level, α). Equivalently, a test is said to be unbiased if its rejection rate at every alternative is at least as large as its rejection rate at every point in the null hypothesis set.13
The bias of an estimator is the difference between the estimator's expected value and the true value of the parameter being estimated. Although an unbiased estimator is theoretically preferable to a biased one, in practice biased estimators with small biases are frequently used. A biased estimator may be more useful for several reasons. First, an unbiased estimator may not exist without further assumptions. Second, an unbiased estimator is sometimes hard to compute. Third, a biased estimator may have lower mean squared error.
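The third point can be made concrete with a small simulation (an illustrative sketch; the shrinkage estimator, parameter value, and trial count are chosen for the example and are not from the article's sources). For a Bernoulli probability p estimated from n trials, the shrunk estimator (k + 1)/(n + 2) is biased, yet for moderate p its mean squared error is below that of the unbiased estimator k/n:

```python
import random

# Hypothetical sketch: a biased estimator can have lower mean squared error
# than the unbiased one.  Estimating a Bernoulli probability p from n trials:
#   unbiased:  T1 = k / n
#   biased:    T2 = (k + 1) / (n + 2)   (shrinks the estimate toward 1/2)
random.seed(2)
p, n, trials = 0.3, 10, 100_000

se1 = se2 = 0.0
for _ in range(trials):
    k = sum(random.random() < p for _ in range(n))  # number of successes
    se1 += (k / n - p) ** 2
    se2 += ((k + 1) / (n + 2) - p) ** 2

mse_unbiased = se1 / trials  # analytically p(1 - p)/n = 0.021
mse_biased = se2 / trials    # smaller, despite the nonzero bias
```

The shrunk estimator trades a small bias for a larger reduction in variance, which lowers the overall mean squared error.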
Reporting bias involves a skew in the availability of data, such that observations of a certain kind are more likely to be reported.
Depending on the type of bias present, researchers and analysts can take different steps to reduce bias in a data set. All of the types of bias mentioned above have corresponding measures that can be taken to reduce or eliminate their impacts.
Bias should be accounted for at every step of the data collection process, beginning with clearly defined research parameters and consideration of the team that will conduct the research.17 Observer bias may be reduced by implementing a blind or double-blind technique. Avoiding p-hacking is essential to accurate data collection. One way to check for bias after the fact is to rerun analyses with different independent variables and observe whether a given phenomenon still occurs in the dependent variables.18 Careful use of language in reporting can reduce misleading phrases, such as describing a result as "approaching" statistical significance when it did not actually achieve it.19
Cole, Nancy S. (October 1981). "Bias in testing". American Psychologist. 36 (10): 1067–1077. doi:10.1037/0003-066X.36.10.1067. ISSN 1935-990X. http://doi.apa.org/getdoi.cfm?doi=10.1037/0003-066X.36.10.1067
Popovic, Aleksandar; Huecker, Martin R. (June 23, 2023). "Study Bias". StatPearls. PMID 34662027. https://www.ncbi.nlm.nih.gov/books/NBK574513/
Rothman, Kenneth J.; Greenland, Sander; Lash, Timothy L. (2008). Modern Epidemiology. Lippincott Williams & Wilkins. pp. 134–137.
Mulherin, Stephanie A.; Miller, William C. (2002-10-01). "Spectrum bias or spectrum effect? Subgroup variation in diagnostic test evaluation". Annals of Internal Medicine. 137 (7): 598–602. doi:10.7326/0003-4819-137-7-200210010-00011. ISSN 1539-3704. PMID 12353947. S2CID 35752032. https://pubmed.ncbi.nlm.nih.gov/12353947/
Bostrom, Nick (2013-05-31). Anthropic Bias: Observation Selection Effects in Science and Philosophy. New York: Routledge. doi:10.4324/9780203953464. ISBN 978-0-203-95346-4.
Ćirković, Milan M.; Sandberg, Anders; Bostrom, Nick (2010). "Anthropic Shadow: Observation Selection Effects and Human Extinction Risks". Risk Analysis. 30 (10): 1495–1506. Bibcode:2010RiskA..30.1495C. doi:10.1111/j.1539-6924.2010.01460.x. ISSN 1539-6924. PMID 20626690. S2CID 6485564. https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1539-6924.2010.01460.x
Tripepi, Giovanni; Jager, Kitty J.; Dekker, Friedo W.; Zoccali, Carmine (2010). "Selection Bias and Information Bias in Clinical Research". Nephron Clinical Practice. 115 (2): c94–c99. doi:10.1159/000312871. ISSN 1660-2110. PMID 20407272. S2CID 18856450. https://www.karger.com/Article/FullText/312871
"Volunteer bias". Catalog of Bias. 2017-11-17. Retrieved 2021-12-18. https://catalogofbias.org/biases/volunteer-bias/
Evans, Alex (2020). "Why Do Women Volunteer More Than Men?". Retrieved 2021-12-22. https://teamkinetic.co.uk/blog/2019/07/10/women-volunteer-more-than-men/
Krimsky, Sheldon (2013-07-01). "Do Financial Conflicts of Interest Bias Research?: An Inquiry into the "Funding Effect" Hypothesis". Science, Technology, & Human Values. 38 (4): 566–587. doi:10.1177/0162243912456271. ISSN 0162-2439. S2CID 42598982. https://doi.org/10.1177/0162243912456271
Higgins, Julian P. T.; Green, Sally (March 2011). "8. Introduction to sources of bias in clinical trials". In Higgins, Julian P. T.; et al. (eds.). Cochrane Handbook for Systematic Reviews of Interventions (version 5.1). The Cochrane Collaboration.
Neyman, Jerzy; Pearson, Egon S. (1936). "Contributions to the theory of testing statistical hypotheses". Statistical Research Memoirs. 1: 1–37.
Casella, George; Berger, Roger L. (2002). Statistical Inference (2nd ed.). p. 387.
Romano, Joseph P.; Siegel, A. F. (1986-06-01). Counterexamples in Probability and Statistics. CRC Press. pp. 194–196. ISBN 978-0-412-98901-8.
Hardy, Michael (2003). "An Illuminating Counterexample". The American Mathematical Monthly. 110 (3): 234–238. doi:10.2307/3647938. ISSN 0002-9890. JSTOR 3647938. https://www.jstor.org/stable/3647938
National Council on Measurement in Education (NCME). "NCME Assessment Glossary". Archived from the original on 2017-07-22.
"5 Types of Statistical Biases to Avoid in Your Analyses". Business Insights Blog. 2017-06-13. Retrieved 2023-08-16. https://online.hbs.edu/blog/post/types-of-statistical-bias