The most common threats to statistical conclusion validity are:
Low statistical power. Power is the probability of correctly rejecting the null hypothesis when it is false (the complement of the type II error rate). Experiments with low power have a higher probability of incorrectly failing to reject the null hypothesis, that is, of committing a type II error and concluding that there is no detectable effect when one exists (i.e., there is real covariation between cause and effect). Low power occurs when the sample size of the study is too small given other factors (small effect sizes, large group variability, unreliable measures, etc.).
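As a concrete illustration, here is a minimal sketch of an approximate power calculation for a two-sided, two-sample z-test with equal group sizes; the function names and the d = 0.5 example are illustrative assumptions, not part of the original text:

```python
import math
from statistics import NormalDist

def two_sample_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is Cohen's d (mean difference / common SD); under the
    alternative the test statistic is centered at d * sqrt(n / 2).
    The tiny contribution of the opposite rejection tail is ignored.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    noncentrality = effect_size * math.sqrt(n_per_group / 2)
    return 1 - z.cdf(z_crit - noncentrality)

# A medium effect (d = 0.5) is badly underpowered at n = 20 per group
# but well powered at n = 100 per group.
print(round(two_sample_power(0.5, 20), 2))   # ~0.35
print(round(two_sample_power(0.5, 100), 2))  # ~0.94
```

The same function makes the other determinants of power visible: shrinking `effect_size` or `n_per_group` pushes the result toward the type II error region.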
Violated assumptions of statistical tests. Most statistical tests (particularly inferential statistics) make assumptions about the data that must hold for the analysis to be a suitable test of the hypothesis. Violating these assumptions can lead to incorrect inferences about the cause–effect relationship; the robustness of a test indicates how insensitive it is to such violations. Depending on the assumption violated, tests may become more or less likely to commit type I or type II errors.
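A rough simulation (an illustrative sketch, not from the original text) shows the effect: the realized type I error rate of a one-sample t-test stays near the nominal 5% when its normality assumption holds, but drifts upward when the data are strongly skewed and the sample is small:

```python
import math
import random

def t_test_rejects(sample, mu0, t_crit):
    """Two-sided one-sample t-test: reject H0 (mean == mu0) if |t| > t_crit."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    t = (mean - mu0) / math.sqrt(var / n)
    return abs(t) > t_crit

def empirical_type_i_rate(draw, mu0, n=10, reps=4000, t_crit=2.262, seed=1):
    """Fraction of samples simulated under a true H0 that get rejected.

    t_crit = 2.262 is the two-sided 5% critical value for df = 9.
    """
    rng = random.Random(seed)
    rejections = sum(t_test_rejects([draw(rng) for _ in range(n)], mu0, t_crit)
                     for _ in range(reps))
    return rejections / reps

# Normal data: realized rate stays close to the nominal alpha of 0.05.
rate_normal = empirical_type_i_rate(lambda r: r.gauss(0, 1), mu0=0.0)
# Skewed (exponential, mean 1) data: the rate is inflated at n = 10.
rate_skewed = empirical_type_i_rate(lambda r: r.expovariate(1.0), mu0=1.0)
print(rate_normal, rate_skewed)
```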
"Fishing" and inflated error rates. Each hypothesis test carries a fixed risk of a type I error (the alpha rate). If a researcher searches, or "dredges", through the data, testing many different hypotheses in pursuit of a significant effect, the effective type I error rate is inflated: the more tests run on the same data, the higher the chance of observing at least one type I error and drawing an incorrect inference that a relationship exists.
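For independent tests, the inflation follows directly from the complement rule; a short sketch (the function name is illustrative):

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one type I error across n independent tests,
    each run at significance level alpha: 1 - (1 - alpha)^n."""
    return 1 - (1 - alpha) ** n_tests

# One test at alpha = 0.05 keeps the nominal rate, but dredging through
# twenty tests pushes the chance of a false positive toward two-thirds.
print(round(familywise_error_rate(0.05, 1), 3))   # 0.05
print(round(familywise_error_rate(0.05, 20), 3))  # 0.642
```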
Unreliability of measures. If the dependent and/or independent variables are not measured reliably (i.e., with large amounts of measurement error), incorrect conclusions can be drawn.
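Spearman's classical attenuation formula quantifies this: the observed correlation shrinks by the square root of the product of the two measures' reliabilities. A minimal sketch (the function name and example values are illustrative):

```python
import math

def attenuated_correlation(true_r, reliability_x, reliability_y):
    """Spearman's correction for attenuation, run forward: the expected
    observed correlation given the true correlation and each measure's
    reliability coefficient (proportion of true-score variance)."""
    return true_r * math.sqrt(reliability_x * reliability_y)

# A true correlation of 0.60 measured with reliabilities 0.70 and 0.80
# is observed, on average, as only about 0.45.
print(round(attenuated_correlation(0.60, 0.70, 0.80), 2))  # 0.45
```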
Restriction of range. Restriction of range, such as floor and ceiling effects or selection effects, reduces the power of the experiment and increases the chance of a type II error.[5] This is because correlations are attenuated (weakened) by reduced variability (see, for example, the formula for the Pearson product-moment correlation coefficient, which uses the score variances in its estimation).
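A small simulation (an illustrative sketch, not from the original text) makes the attenuation visible: selecting only cases from the upper half of the predictor's range shrinks its variance and, with it, the observed correlation:

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

rng = random.Random(42)
# Pairs with a population correlation of about 0.6:
# y = 0.6 * x + noise, with x ~ N(0, 1) and noise ~ N(0, 0.8).
xs = [rng.gauss(0, 1) for _ in range(5000)]
ys = [0.6 * x + rng.gauss(0, 0.8) for x in xs]
r_full = pearson_r(xs, ys)

# Selection effect: keep only cases with above-average x,
# which restricts the range (variance) of the predictor.
kept = [(x, y) for x, y in zip(xs, ys) if x > 0]
r_restricted = pearson_r([x for x, _ in kept], [y for _, y in kept])
print(round(r_full, 2), round(r_restricted, 2))
```

With the restricted sample, the same underlying relationship yields a noticeably smaller correlation, which is exactly the loss of power described above.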
Heterogeneity of the units under study. Greater heterogeneity among the individuals participating in the study can also affect the interpretation of results by increasing the variance of the outcomes or obscuring true relationships (see also sampling error). In particular, it can mask interactions between characteristics of the units and the cause–effect relationship.
Threats to internal validity. Any factor that threatens the internal validity of a research study can also bias the results and undermine the validity of the statistical conclusions reached. Such threats include unreliable treatment implementation (lack of standardization) and failure to control for extraneous variables.
1. Cozby, Paul C. (2009). Methods in behavioral research (10th ed.). Boston: McGraw-Hill Higher Education.
2. Cohen, R. J.; Swerdlik, M. E. (2004). Psychological testing and assessment (6th ed.). Sydney: McGraw-Hill.
3. Cook, T. D.; Campbell, D. T.; Day, A. (1979). Quasi-experimentation: Design & analysis issues for field settings. Houghton Mifflin. https://archive.org/details/quasiexperimenta00cook
4. Shadish, W.; Cook, T. D.; Campbell, D. T. (2006). Experimental and quasi-experimental designs for generalized causal inference. Houghton Mifflin.
5. Sackett, P. R.; Lievens, F.; Berry, C. M.; Landers, R. N. (2007). "A Cautionary Note on the Effects of Range Restriction on Predictor Intercorrelations". Journal of Applied Psychology. 92 (2): 538–544. doi:10.1037/0021-9010.92.2.538. PMID 17371098. https://www.researchgate.net/publication/6436643