Revision of principle on statistical significance

Given the accumulation of evidence, Principle 13.29 is being revised. We propose restating it to read: “Do not use measures of statistical significance to assess the validity of a forecasting method or model.” We invite comments on the revision of this principle before it is posted; please email them to kesten.green@unisa.edu.au with the subject line “Comment on revision to Principle 13.29 on statistical significance”.

Here is the proposed restatement of the principle in full:

13.29 Do not use measures of statistical significance to assess the validity of a forecasting method or model.

Description: Even when correctly applied, significance tests are dangerous. Statistical significance tests provide measures of how likely it is that relationships apparent in a sample of data are the result of chance variations that arose in selecting the sample. The measurements are affected by the size of the sample and the choice of null hypothesis. With large samples, even very small differences from the null hypothesis are found to be “statistically significant.” Choosing a different null hypothesis can change the conclusion. Statistical significance tests do not provide useful information on material significance or importance. The tests are blind to common problems, including sampling bias, non-response bias, and misspecification of relationships. The proper approach to analyzing and communicating findings from empirical studies is to calculate and report effect sizes and confidence intervals that take account of all sources of error in measuring the effect, and to conduct replications, extensions, and meta-analyses.

Purpose: To avoid the selection of invalid models or methods, and the rejection of valid ones.

Conditions: There are no empirically demonstrated conditions on this principle. Statistical significance tests should not be used unless it can be shown that they provide a net benefit in the situation under consideration.
Strength of evidence: Strong logical support and non-experimental evidence. There are many examples showing how significance testing has harmed decision-making. Despite repeated appeals for evidence that statistical significance tests can improve decisions, none has been forthcoming. Tests of statistical significance run contrary to the proper purpose of statistics, which is to help users make sense of data. Experimental studies are needed to identify the conditions, if any, under which tests of statistical significance can improve decision-making.

Source of evidence:

Armstrong, J. S. (2007). Significance tests harm progress in forecasting. International Journal of Forecasting, 23, 321-336, with commentary and a reply.

Hauer, E. (2004). The harm done by tests of statistical significance. Accident Analysis and Prevention, 36, 495-500.

Hubbard, R., & Armstrong, J. S. (2006). Why we don't really know what ‘statistical significance’ means: A major educational failure. Journal of Marketing Education, 28, 114-120.

Hunter, J. E., & Schmidt, F. L. (1996). Cumulative research knowledge and social policy formulation: The critical role of meta-analysis. Psychology, Public Policy, and Law, 2, 324-347.

Ziliak, S. T., & McCloskey, D. N. (2008). The cult of statistical significance: How the standard error costs us jobs, justice, and lives. Ann Arbor, MI: University of Michigan Press.
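The Description's points that significance depends on sample size and on the choice of null hypothesis, while effect sizes and confidence intervals carry the useful information, can be illustrated with a small calculation. The sketch below is a hypothetical illustration, not part of the principle or its sources: it assumes a two-sample z-test with a known, common standard deviation, and the function name z_test_summary and all numbers (a difference of 0.02 units with a standard deviation of 1.0) are invented for the example.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution: mean 0, sd 1


def z_test_summary(diff, sd, n_per_group, null_diff=0.0, level=0.95):
    """Hypothetical helper: two-sided z-test for a difference in means between
    two equal-sized groups, ASSUMING a known, common standard deviation `sd`.

    Returns (p_value, standardized_effect_size, confidence_interval).
    """
    se = sd * (2.0 / n_per_group) ** 0.5            # standard error of the difference
    z = (diff - null_diff) / se                      # test statistic against the chosen null
    p_value = 2.0 * STD_NORMAL.cdf(-abs(z))          # two-sided p-value
    z_crit = STD_NORMAL.inv_cdf(0.5 + level / 2.0)   # about 1.96 for a 95% interval
    ci = (diff - z_crit * se, diff + z_crit * se)    # interval for the difference itself
    effect_size = diff / sd                          # standardized difference; ignores n
    return p_value, effect_size, ci


if __name__ == "__main__":
    # Same hypothetical difference (0.02 units, sd 1.0) at growing sample sizes.
    for n in (100, 10_000, 1_000_000):
        p, d, (lo, hi) = z_test_summary(diff=0.02, sd=1.0, n_per_group=n)
        print(f"n={n:>9,}  p={p:.4f}  d={d:.2f}  95% CI=({lo:.4f}, {hi:.4f})")

    # Same data at n = 10,000 per group: the verdict depends on the null chosen.
    for null_diff in (0.0, -0.01):
        p, _, _ = z_test_summary(diff=0.02, sd=1.0, n_per_group=10_000,
                                 null_diff=null_diff)
        print(f"null={null_diff:+.2f}  p={p:.4f}")
```

In this sketch the p-value falls below the conventional 0.05 threshold only at the largest sample size, although the standardized difference stays at 0.02 throughout; the confidence interval, by contrast, reports both the size of the difference and how precisely it is estimated. At 10,000 per group, moving the null from 0 to -0.01 turns a "non-significant" result into a "significant" one without any change in the data.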