Main article: Frequentist statistics
See also: Randomization
A theory of statistical inference was developed by Charles S. Peirce in "Illustrations of the Logic of Science" (1877–1878)[3] and "A Theory of Probable Inference" (1883),[4] two publications that emphasized the importance of randomization-based inference in statistics.[5]
Main article: Random assignment
See also: Repeated measures design
Charles S. Peirce randomly assigned volunteers to a blinded, repeated-measures design to evaluate their ability to discriminate weights.[6][7][8][9] Peirce's experiment inspired other researchers in psychology and education, who developed a research tradition of randomized experiments in laboratories and specialized textbooks in the 1800s.[10][11][12][13]
Main article: Response surface methodology
See also: Optimal design
Charles S. Peirce also contributed the first English-language publication on an optimal design for regression models in 1876.[14] A pioneering optimal design for polynomial regression was suggested by Gergonne in 1815. In 1918, Kirstine Smith published optimal designs for polynomials of degree six (and less).[15][16]
Main article: Sequential analysis
See also: Multi-armed bandit problem, Gittins index, and Optimal design
The use of a sequence of experiments, where the design of each may depend on the results of previous experiments, including the possible decision to stop experimenting, is within the scope of sequential analysis, a field that was pioneered[17] by Abraham Wald in the context of sequential tests of statistical hypotheses.[18] Herman Chernoff wrote an overview of optimal sequential designs,[19] while adaptive designs have been surveyed by S. Zacks.[20] One specific type of sequential design is the "two-armed bandit", generalized to the multi-armed bandit, on which early work was done by Herbert Robbins in 1952.[21]
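The flavour of such sequential designs can be illustrated with a two-armed bandit. The sketch below is illustrative only: it uses a simple ε-greedy allocation rule, a standard textbook strategy rather than Robbins's original proposal, and made-up success probabilities. The point is that the choice of arm for each trial depends on the outcomes of earlier trials:

```python
import random

random.seed(0)

TRUE_SUCCESS = {"A": 0.45, "B": 0.60}  # unknown to the experimenter
EPSILON = 0.1                          # fraction of trials spent exploring
counts = {"A": 0, "B": 0}              # trials allocated to each arm so far
wins = {"A": 0, "B": 0}                # successes observed on each arm so far

def choose_arm():
    # Explore at random occasionally (and until both arms have been tried);
    # otherwise exploit the arm with the best observed success rate.
    if random.random() < EPSILON or 0 in counts.values():
        return random.choice(list(counts))
    return max(counts, key=lambda arm: wins[arm] / counts[arm])

for _ in range(10_000):
    arm = choose_arm()  # the design of each trial depends on earlier results
    counts[arm] += 1
    wins[arm] += random.random() < TRUE_SUCCESS[arm]

print(counts)  # most trials end up allocated to the better arm, "B"
```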
A methodology for designing experiments was proposed by Ronald Fisher in his innovative 1926 paper The Arrangement of Field Experiments and his 1935 book The Design of Experiments. Much of his pioneering work dealt with agricultural applications of statistical methods. As a mundane example, he described how to test the lady tasting tea hypothesis: that a certain lady could distinguish by flavour alone whether the milk or the tea was placed first in the cup. These methods have been broadly adapted in biological, psychological, and agricultural research.[22]
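Under the null hypothesis that the lady is merely guessing, the probability of a perfect identification can be computed by simple counting. The sketch below works through the usual eight-cup version of the experiment; the calculation is standard combinatorics, and the code is illustrative rather than drawn from the cited sources:

```python
from math import comb

# Fisher's lady tasting tea: 8 cups, 4 with milk poured first and 4 with
# tea poured first, presented in random order; the lady must identify the
# 4 milk-first cups. Under the null hypothesis of pure guessing, every
# choice of 4 cups out of 8 is equally likely.
n_ways = comb(8, 4)  # 70 equally likely selections

p_all_correct = 1 / n_ways
print(f"P(4/4 correct by chance) = 1/{n_ways} = {p_all_correct:.4f}")

# P(at least 3 correct): 3 of the 4 milk-first cups and 1 of the 4
# tea-first cups, plus the single all-correct selection.
p_at_least_3 = (comb(4, 3) * comb(4, 1) + 1) / n_ways
print(f"P(>=3/4 correct by chance) = {p_at_least_3:.4f}")  # about 0.243
```

With eight cups, only a perfect score (probability 1/70 ≈ 0.014) is significant at the conventional 5% level, which is why the design asks for all four milk-first cups to be identified correctly.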
This example of a designed experiment is attributed to Harold Hotelling, building on examples from Frank Yates.[26][27][28] The experiment designed in this example involves combinatorial designs.[29]
Weights of eight objects are measured using a pan balance and set of standard weights. Each weighing measures the weight difference between objects in the left pan and any objects in the right pan by adding calibrated weights to the lighter pan until the balance is in equilibrium. Each measurement has a random error. The average error is zero; the standard deviation of the probability distribution of the errors is the same number σ on different weighings; and errors on different weighings are independent. Denote the true weights by θ₁, ..., θ₈.
We consider two different experiments:

1. Weigh each object in one pan, with the other pan empty, and let Xᵢ be the measured weight of the ith object, for i = 1, ..., 8.
2. Perform eight combined weighings according to a schedule that distributes the objects between the two pans, and let Yᵢ be the measured difference for the ith weighing. Under the usual schedule for this example, each weight is estimated as a signed sum of all eight measured differences divided by 8, with the signs read off the schedule; for instance, θ₁ is estimated by (Y₁ + Y₂ + Y₃ + Y₄ − Y₅ − Y₆ − Y₇ − Y₈)/8.
The question of design of experiments is: which experiment is better?
The variance of the estimate X₁ of θ₁ is σ² if we use the first experiment, but if we use the second experiment, the variance of the estimate given above is σ²/8. Thus the second experiment gives us 8 times as much precision for the estimate of a single item, and estimates all items simultaneously, with the same precision. What the second experiment achieves with eight weighings would require 64 weighings if the items are weighed separately. Note, moreover, that although each estimate in the second experiment combines all eight measurements, the orthogonality of the schedule makes the errors of the eight estimates uncorrelated with one another.
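The variance claim can be checked numerically. The following sketch is a simulation under stated assumptions: the schedule is taken to be the order-8 Hadamard design, and the "true" weights are made-up numbers:

```python
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(0)
sigma = 1.0
theta = np.arange(1.0, 9.0)  # hypothetical true weights θ1, ..., θ8

# Schedule: row i gives weighing i; +1 puts an object in the left pan,
# -1 puts it in the right pan.
H = hadamard(8)

n_rep = 100_000
eps = rng.normal(0.0, sigma, size=(n_rep, 8))  # independent weighing errors
Y = eps + H @ theta                            # measured differences
theta_hat = Y @ H / 8                          # per replication: (1/8)·Hᵀy

print(np.round(theta_hat.mean(axis=0), 2))  # ≈ θ: the estimates are unbiased
cov = np.cov(theta_hat, rowvar=False)
print(np.round(cov, 3))  # ≈ (σ²/8)·I: variances ≈ 0.125, off-diagonals ≈ 0
```

Weighing each object separately instead gives each estimate variance σ² = 1, eight times larger, matching the argument above.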
Many problems of the design of experiments involve combinatorial designs, as in this example and others.[30]
See also: Metascience
False positive conclusions, often resulting from the pressure to publish or the author's own confirmation bias, are an inherent hazard in many fields.[31]
Use of double-blind designs can prevent biases that would otherwise lead to false positives in the data-collection phase. When a double-blind design is used, participants are randomly assigned to experimental groups, but the researcher is unaware of which participants belong to which group. Therefore, the researcher cannot influence the participants' response to the intervention.[32]
Experimental designs with undisclosed degrees of freedom are a problem,[33] in that they can lead to conscious or unconscious "p-hacking": trying multiple analyses until a desired result is obtained. It typically involves the manipulation – perhaps unconscious – of the process of statistical analysis and the degrees of freedom until they return a figure below the p < .05 level of statistical significance.[34][35]
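The inflation of false positives from this kind of undisclosed flexibility is easy to demonstrate by simulation. In the sketch below (illustrative only, not drawn from the cited studies), an analyst tests five outcome measures on data with no real effect and reports only the smallest p-value:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def p_hacked_study(n=20, n_outcomes=5):
    """Simulate a two-group study with NO true effect, where the analyst
    tries several outcome measures and keeps the smallest p-value."""
    group_a = rng.normal(size=(n, n_outcomes))
    group_b = rng.normal(size=(n, n_outcomes))  # same distribution as group_a
    pvals = [stats.ttest_ind(group_a[:, j], group_b[:, j]).pvalue
             for j in range(n_outcomes)]
    return min(pvals)

n_studies = 2_000
false_positives = sum(p_hacked_study() < 0.05 for _ in range(n_studies))
# With one pre-specified outcome the rate would be about 5%; cherry-picking
# the best of five independent outcomes pushes it to roughly 1 - 0.95^5 ≈ 23%.
print(f"false-positive rate: {false_positives / n_studies:.1%}")
```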
P-hacking can be prevented by preregistering studies, in which researchers must submit their data-analysis plan to the journal in which they wish to publish before they even start data collection, so that no data-driven manipulation of the analysis is possible.[36][37]
Another way to prevent this is to extend the double-blind design to the data-analysis phase, making the study triple-blind: the data are sent to a data analyst unconnected with the research, who scrambles the data so that there is no way of knowing which participants belong to which group before any are potentially removed as outliers.[38]
Clear and complete documentation of the experimental methodology is also important in order to support replication of results.[39]
An experimental design or randomized clinical trial requires careful consideration of several factors before actually doing the experiment.[40] An experimental design is the laying out of a detailed experimental plan in advance of doing the experiment. Some of the following topics have already been discussed in the principles of experimental design section:
The independent variable of a study often has many levels or different groups. In a true experiment, researchers can have an experimental group, in which the intervention testing the hypothesis is implemented, and a control group, which has all the same elements as the experimental group without the interventional element. Thus, when everything else except for one intervention is held constant, researchers can conclude with some confidence that this one element is what caused the observed change. In some instances, having a control group is not ethical. This is sometimes solved by using two different experimental groups. In some cases, independent variables cannot be manipulated, for example when testing the difference between two groups who have a different disease, or testing the difference between genders (variables that would clearly be hard or unethical to assign participants to). In these cases, a quasi-experimental design may be used.
In a true experimental design, the independent (predictor) variable is manipulated by the researcher: every participant is chosen randomly from the population, and each participant chosen is assigned randomly to a condition of the independent variable. Only when this is done is it possible to conclude with high probability that differences in the outcome variables were caused by the different conditions. Therefore, researchers should choose the experimental design over other design types whenever possible. However, the nature of the independent variable does not always allow for manipulation. In those cases, researchers must take care not to claim causal attribution when their design does not allow for it. For example, in observational designs, participants are not assigned randomly to conditions, so if differences in outcome variables are found between conditions, it is likely that something other than the conditions themselves caused the differences in outcomes – that is, a third variable. The same goes for studies with a correlational design.
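As a concrete illustration of the assignment step, the sketch below randomly assigns a pool of participants to conditions (the participant identifiers and condition labels are hypothetical):

```python
import random

random.seed(42)

def randomly_assign(participants, conditions=("treatment", "control")):
    """Shuffle the participant pool, then deal participants out to the
    conditions round-robin, so each participant is equally likely to land
    in any group and group sizes stay balanced."""
    shuffled = list(participants)
    random.shuffle(shuffled)
    return {cond: shuffled[i::len(conditions)]
            for i, cond in enumerate(conditions)}

participants = [f"P{i:02d}" for i in range(1, 21)]  # 20 hypothetical IDs
groups = randomly_assign(participants)
print({cond: members for cond, members in groups.items()})
```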
It is best that a process be in reasonable statistical control prior to conducting designed experiments. When this is not possible, proper blocking, replication, and randomization allow for the careful conduct of designed experiments.[41] To control for nuisance variables, researchers institute control checks as additional measures. Investigators should ensure that uncontrolled influences (e.g., source credibility perception) do not skew the findings of the study. A manipulation check is one example of a control check. Manipulation checks allow investigators to isolate the chief variables to strengthen support that these variables are operating as planned.
One of the most important requirements of experimental research designs is the necessity of eliminating the effects of spurious, intervening, and antecedent variables. In the most basic model, cause (X) leads to effect (Y). But there could be a third variable (Z) that influences (Y), and X might not be the true cause at all. Z is said to be a spurious variable and must be controlled for. The same is true for intervening variables (a variable in between the supposed cause (X) and the effect (Y)), and antecedent variables (a variable prior to the supposed cause (X) that is the true cause). When a third variable is involved and has not been controlled for, the relation is said to be a zero order relationship. In most practical applications of experimental research designs there are several causes (X1, X2, X3). In most designs, only one of these causes is manipulated at a time.
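The danger posed by an uncontrolled third variable can be made concrete with a small simulation (illustrative only; the numbers are invented). Here Z drives both X and Y, so X and Y are strongly correlated even though X has no effect on Y, and adjusting for Z removes the apparent relation:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

z = rng.normal(size=n)                 # the spurious variable Z
x = z + rng.normal(scale=0.5, size=n)  # X is driven by Z
y = z + rng.normal(scale=0.5, size=n)  # Y is also driven by Z; X plays no role

print(f"corr(X, Y)     = {np.corrcoef(x, y)[0, 1]:.2f}")  # ≈ 0.8: looks real

# Control for Z by residualizing X and Y on Z (simple no-intercept regression).
x_res = x - z * np.dot(x, z) / np.dot(z, z)
y_res = y - z * np.dot(y, z) / np.dot(z, z)
print(f"corr(X, Y | Z) = {np.corrcoef(x_res, y_res)[0, 1]:.2f}")  # ≈ 0
```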
Some efficient designs for estimating several main effects were found independently and in near succession by Raj Chandra Bose and K. Kishen in 1940 at the Indian Statistical Institute, but remained little known until the Plackett–Burman designs were published in Biometrika in 1946. About the same time, C. R. Rao introduced the concept of orthogonal arrays as experimental designs. This concept played a central role in the development of Taguchi methods by Genichi Taguchi, which took place during his visit to the Indian Statistical Institute in the early 1950s. His methods were successfully applied and adopted by Japanese and Indian industries and subsequently were also embraced by US industry, albeit with some reservations.
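The orthogonality idea can be illustrated concretely. The sketch below (an illustration, assuming NumPy and SciPy; it uses the classical Hadamard-matrix construction, the same family as the size-8 Plackett–Burman design) builds a two-level orthogonal main-effects plan for up to seven factors in eight runs:

```python
import numpy as np
from scipy.linalg import hadamard

H = hadamard(8)    # order-8 Hadamard matrix; first column is all ones
design = H[:, 1:]  # drop the constant column: 8 runs x 7 factors

print(design)             # each entry ±1 = low/high level of that factor
print(design.T @ design)  # = 8·I: factor columns are mutually orthogonal,
                          # so all seven main effects can be estimated
                          # independently from only eight runs
```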
In 1950, Gertrude Mary Cox and William Gemmell Cochran published the book Experimental Designs, which became the major reference work on the design of experiments for statisticians for years afterwards.
Developments of the theory of linear models have encompassed and surpassed the cases that concerned early writers. Today, the theory rests on advanced topics in linear algebra, algebra and combinatorics.
As with other branches of statistics, experimental design is pursued using both frequentist and Bayesian approaches: In evaluating statistical procedures like experimental designs, frequentist statistics studies the sampling distribution while Bayesian statistics updates a probability distribution on the parameter space.
Some important contributors to the field of experimental designs are C. S. Peirce, R. A. Fisher, F. Yates, R. C. Bose, A. C. Atkinson, R. A. Bailey, D. R. Cox, G. E. P. Box, W. G. Cochran, W. T. Federer, V. V. Fedorov, A. S. Hedayat, J. Kiefer, O. Kempthorne, J. A. Nelder, Andrej Pázman, Friedrich Pukelsheim, D. Raghavarao, C. R. Rao, S. S. Shrikhande, J. N. Srivastava, William J. Studden, G. Taguchi and H. P. Wynn.[42]
The textbooks of D. Montgomery, R. Myers, and G. Box/W. Hunter/J.S. Hunter have reached generations of students and practitioners.[43][44][45][46][47] Furthermore, there is ongoing discussion of experimental design in the context of model building for either static or dynamic models, a setting also known as system identification.[48][49]
Laws and ethical considerations preclude some carefully designed experiments with human subjects. Legal constraints are dependent on jurisdiction. Constraints may involve institutional review boards, informed consent and confidentiality, affecting both clinical (medical) trials and behavioral and social science experiments.[50] In the field of toxicology, for example, experimentation is performed on laboratory animals with the goal of defining safe exposure limits for humans.[51] Balancing these constraints are views from the medical field.[52] Regarding the randomization of patients, "... if no one knows which therapy is better, there is no ethical imperative to use one therapy or another." (p 380) Regarding experimental design, "... it is clearly not ethical to place subjects at risk to collect data in a poorly designed study when this situation can be easily avoided ...". (p 393)
"What Is Design of Experiments (DOE)?". asq.org. American Society for Quality. Retrieved 20 February 2025. https://asq.org/quality-resources/design-of-experiments?srsltid=AfmBOoqGNe13QlU1WGcx1ABznp_0sVoAdwVX3jHd_Hq_a9iaqVTQ9p1u ↩
"The Sequential Nature of Classical Design of Experiments | Prism". prismtc.co.uk. Retrieved 10 March 2023. https://prismtc.co.uk/resources/blogs-and-articles/the-sequential-nature-of-classical-design-of-experiments ↩
Peirce, Charles Sanders (1877–1878). "Illustrations of the Logic of Science". Reprinted: Open Court, 10 June 2014. ISBN 0812698495.
Peirce, Charles Sanders (1883). "A Theory of Probable Inference". In C. S. Peirce (Ed.), Studies in Logic by Members of the Johns Hopkins University (pp. 126–181). Little, Brown and Co.
Stigler, Stephen M. (1978). "Mathematical statistics in the early States". Annals of Statistics. 6 (2): 239–265 [248]. doi:10.1214/aos/1176344123. JSTOR 2958876. MR 0483118. "Indeed, Peirce's work contains one of the earliest explicit endorsements of mathematical randomization as a basis for inference of which I am aware (Peirce, 1957, pages 216–219)."
Peirce, Charles Sanders; Jastrow, Joseph (1885). "On Small Differences in Sensation". Memoirs of the National Academy of Sciences. 3: 73–83.
Hacking, Ian (September 1988). "Telepathy: Origins of Randomization in Experimental Design". Isis. 79 (3): 427–451. doi:10.1086/354775. JSTOR 234674. MR 1013489. S2CID 52201011.
Stigler, Stephen M. (November 1992). "A Historical View of Statistical Concepts in Psychology and Educational Research". American Journal of Education. 101 (1): 60–70. doi:10.1086/444032. JSTOR 1085417. S2CID 143685203.
Dehue, Trudy (December 1997). "Deception, Efficiency, and Random Groups: Psychology and the Gradual Origination of the Random Group Design". Isis. 88 (4): 653–673. doi:10.1086/383850. PMID 9519574. S2CID 23526321. https://www.rug.nl/research/portal/en/publications/deception-efficiency-and-random-groups(459e54f0-1e56-4390-876a-46a33e80621d).html
Peirce, C. S. (1876). "Note on the Theory of the Economy of Research". Coast Survey Report: 197–201 (actually published 1879; NOAA PDF eprint archived 2 March 2017 at the Wayback Machine). Reprinted in Collected Papers 7, paragraphs 139–157; in Writings 4, pp. 72–78; and in Peirce, C. S. (July–August 1967). "Note on the Theory of the Economy of Research". Operations Research. 15 (4): 643–648. doi:10.1287/opre.15.4.643. JSTOR 168276.
Guttorp, P.; Lindgren, G. (2009). "Karl Pearson and the Scandinavian school of statistics". International Statistical Review. 77: 64. CiteSeerX 10.1.1.368.8328. doi:10.1111/j.1751-5823.2009.00069.x. S2CID 121294724.
Smith, Kirstine (1918). "On the standard deviations of adjusted and interpolated values of an observed polynomial function and its constants and the guidance they give towards a proper choice of the distribution of observations". Biometrika. 12 (1–2): 1–85. doi:10.1093/biomet/12.1-2.1.
Johnson, N. L. (1961). "Sequential analysis: a survey". Journal of the Royal Statistical Society, Series A. 124 (3): 372–411 (pages 375–376).
Wald, A. (1945). "Sequential Tests of Statistical Hypotheses". Annals of Mathematical Statistics. 16 (2): 117–186.
Chernoff, Herman (1972). Sequential Analysis and Optimal Design. SIAM Monograph.
Zacks, S. (1996). "Adaptive Designs for Parametric Models". In Ghosh, S.; Rao, C. R. (Eds.), Design and Analysis of Experiments. Handbook of Statistics, Volume 13. North-Holland. ISBN 0-444-82061-2. (pages 151–180)
Robbins, H. (1952). "Some Aspects of the Sequential Design of Experiments". Bulletin of the American Mathematical Society. 58 (5): 527–535. doi:10.1090/S0002-9904-1952-09620-8.
Miller, Geoffrey (2000). The Mating Mind: How Sexual Choice Shaped the Evolution of Human Nature. London: Heineman. ISBN 0-434-00741-2 (also Doubleday, ISBN 0-385-49516-1). "To biologists, he was an architect of the 'modern synthesis' that used mathematical models to integrate Mendelian genetics with Darwin's selection theories. To psychologists, Fisher was the inventor of various statistical tests that are still supposed to be used whenever possible in psychology journals. To farmers, Fisher was the founder of experimental agricultural research, saving millions from starvation through rational crop breeding programs." p. 54.
Creswell, J. W. (2008). Educational Research: Planning, Conducting, and Evaluating Quantitative and Qualitative Research (3rd ed.). Upper Saddle River, NJ: Prentice Hall. p. 300. ISBN 0-13-613550-1.
Dr. Hani (2009). "Replication study". Archived from the original on 2 June 2012. Retrieved 27 October 2011. https://web.archive.org/web/20120602061136/http://www.experiment-resources.com/replication-study.html
Burman, Leonard E.; Reed, Robert W.; Alm, James (2010). "A call for replication studies". Public Finance Review. 38 (6): 787–793. doi:10.1177/1091142110385210. S2CID 27838472. Retrieved 27 October 2011. http://pfr.sagepub.com
Hotelling, Harold (1944). "Some Improvements in Weighing and Other Experimental Techniques". Annals of Mathematical Statistics. 15 (3): 297–306. doi:10.1214/aoms/1177731236. https://projecteuclid.org/euclid.aoms/1177731236
Giri, Narayan C.; Das, M. N. (1979). Design and Analysis of Experiments. New York, NY: Wiley. pp. 350–359. ISBN 9780852269145.
Sifri, Jack (8 December 2014). "How to Use Design of Experiments to Create Robust Designs With High Yield". youtube.com. Retrieved 11 February 2015. https://www.youtube.com/watch?v=hfdZabCVwzc
Forstmeier, Wolfgang; Wagenmakers, Eric-Jan; Parker, Timothy H. (23 November 2016). "Detecting and avoiding likely false-positive findings – a practical guide". Biological Reviews. 92 (4): 1941–1968. doi:10.1111/brv.12315. hdl:11245.1/31f84a5b-4439-4a4c-a690-6e98354199f5. ISSN 1464-7931. PMID 27879038. S2CID 26793416.
David, Sharoon; Khandhar, Paras B. (17 July 2023). "Double-Blind Study". StatPearls Publishing. PMID 31536248. https://www.ncbi.nlm.nih.gov/books/NBK546641/
Simmons, Joseph; Nelson, Leif; Simonsohn, Uri (November 2011). "False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant". Psychological Science. 22 (11): 1359–1366. doi:10.1177/0956797611417632. ISSN 0956-7976. PMID 22006061.
"Science, Trust And Psychology in Crisis". KPLU. 2 June 2014. Archived from the original on 14 July 2014. Retrieved 12 June 2014. https://web.archive.org/web/20140714151939/http://www.kplu.org/post/science-trust-and-psychology-crisis
"Why Statistically Significant Studies Can Be Insignificant". Pacific Standard. 4 June 2014. Retrieved 12 June 2014. https://psmag.com/environment/statistically-significant-studies-arent-necessarily-significant-82832
Nosek, Brian A.; Ebersole, Charles R.; DeHaven, Alexander C.; Mellor, David T. (13 March 2018). "The preregistration revolution". Proceedings of the National Academy of Sciences. 115 (11): 2600–2606. Bibcode:2018PNAS..115.2600N. doi:10.1073/pnas.1708274114. ISSN 0027-8424. PMC 5856500. PMID 29531091.
"Pre-Registering Studies – What Is It, How Do You Do It, and Why?". www.acf.hhs.gov. Retrieved 29 August 2023. https://www.acf.hhs.gov/opre/blog/2022/08/pre-registering-studies-what-it-how-do-you-do-it-and-why
Chambers, Chris (10 June 2014). "Physics envy: Do 'hard' sciences hold the solution to the replication crisis in psychology?". theguardian.com. Retrieved 12 June 2014. https://www.theguardian.com/science/head-quarters/2014/jun/10/physics-envy-do-hard-sciences-hold-the-solution-to-the-replication-crisis-in-psychology
Adèr, H. J.; Mellenbergh, G. J.; Hand, D. J. (2008). Advising on Research Methods: A Consultant's Companion.
Bisgaard, S. (2008). "Must a Process be in Statistical Control before Conducting Designed Experiments?". Quality Engineering. ASQ. 20 (2): 143–176.
Giri, Narayan C.; Das, M. N. (1979). Design and Analysis of Experiments. New York, NY: Wiley. pp. 53, 159, 264. ISBN 9780852269145.
Montgomery, Douglas (2013). Design and Analysis of Experiments (8th ed.). Hoboken, NJ: John Wiley & Sons. ISBN 9781118146927.
Walpole, Ronald E.; Myers, Raymond H.; Myers, Sharon L.; Ye, Keying (2007). Probability & Statistics for Engineers & Scientists (8th ed.). Upper Saddle River, NJ: Pearson Prentice Hall. ISBN 978-0131877115.
Myers, Raymond H.; Montgomery, Douglas C.; Vining, G. Geoffrey; Robinson, Timothy J. (2010). Generalized Linear Models: With Applications in Engineering and the Sciences (2nd ed.). Hoboken, NJ: Wiley. ISBN 978-0470454633.
Box, George E. P.; Hunter, William G.; Hunter, J. Stuart (1978). Statistics for Experimenters: An Introduction to Design, Data Analysis, and Model Building. New York: Wiley. ISBN 978-0-471-09315-2.
Box, George E. P.; Hunter, William G.; Hunter, J. Stuart (2005). Statistics for Experimenters: Design, Innovation, and Discovery (2nd ed.). Hoboken, NJ: Wiley. ISBN 978-0471718130.
Spall, J. C. (2010). "Factorial Design for Efficient Experimentation: Generating Informative Data for System Identification". IEEE Control Systems Magazine. 30 (5): 38–53. doi:10.1109/MCS.2010.937677. S2CID 45813198.
Pronzato, L. (2008). "Optimal experimental design and some related control problems". Automatica. 44 (2): 303–325. arXiv:0802.4381. doi:10.1016/j.automatica.2007.05.016. S2CID 1268930.
Moore, David S.; Notz, William I. (2006). Statistics: Concepts and Controversies (6th ed.). New York: W. H. Freeman. Chapter 7: Data ethics. ISBN 9780716786368.
Ottoboni, M. Alice (1991). The Dose Makes the Poison: A Plain-Language Guide to Toxicology (2nd ed.). New York, NY: Van Nostrand Reinhold. ISBN 978-0442006600.
Glantz, Stanton A. (1992). Primer of Biostatistics (3rd ed.). ISBN 978-0-07-023511-3.