Revision of principle on statistical significance

Given the accumulation of evidence, Principle 13.29 is being revised. We propose re-stating it to read: “Do not use measures of statistical significance to assess a forecasting method or model.” We invite comments (mailto:kesten.green@unisa.edu.au?subject=Comment%20on%20revision%20to%20Principle%2013.29%20on%20statistical%20significance) on the revision of this principle prior to posting it. (We have already made changes in response to feedback, on 18 March 2008 and again on 21 March.) Here is the proposed restatement of the principle in full:

13.29 Do not use measures of statistical significance to assess a forecasting method or model.

Description: Even when correctly applied, significance tests are dangerous. Statistical significance tests calculate the probability, assuming the analyst’s null hypothesis is true, that relationships apparent in a sample of data are the result of chance variations that arose in selecting the sample. The probability that is calculated is affected by the size of the sample and the choice of null hypothesis. With large samples, even small differences from what would be expected in the data if the null hypothesis were true will be “statistically significant.” Choosing a different null hypothesis can change the conclusion. Statistical significance tests do not provide useful information on material significance or importance. Moreover, the tests are blind to common problems such as non-response error, response error, and misspecification of relationships. The proper approach to analyzing and communicating findings from empirical studies is to (1) calculate and report effect sizes; (2) estimate the range within which the actual effect size is likely to lie by taking account of prior knowledge and all potential sources of error in measuring the effect; and (3) conduct replications, extensions, and meta-analyses.

Purpose: To avoid the selection of invalid models or methods, and the rejection of valid ones.

Conditions: There are no empirically demonstrated conditions on this principle. Statistical significance tests should not be used unless it can be shown that the measures provide a net benefit in the situation under consideration.

Strength of evidence: Strong logical support and non-experimental evidence. There are many examples showing how significance testing has harmed decision-making. Despite repeated appeals for evidence that statistical significance tests can improve decisions, none has been forthcoming. Tests of statistical significance run contrary to the proper purpose of statistics, which is to help users make sense of data. Experimental studies are needed to identify the conditions, if any, under which tests of statistical significance can improve decision-making.

Source of evidence:

Armstrong, J. S. (2007). The harm done by tests of statistical significance (http://tinyurl.com/Hauer2004Harm). Accident Analysis and Prevention, 36, 495-500.

Hubbard, R., & Armstrong, J. S. (2006). Why we don't really know what ‘statistical significance’ means: A major educational failure. Journal of Marketing Education, 28, 114-120.

Hunter, J. E., & Schmidt, F. L. (1996). Cumulative research knowledge and social policy formulation: The critical role of meta-analysis. Psychology, Public Policy, and Law, 2, 324-347.

Ziliak, S. T., & McCloskey, D. N. (2008). The cult of statistical significance: How the standard error costs us jobs, justice, and lives. Ann Arbor, MI: University of Michigan Press.
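Illustration: The point in the Description about sample size can be made concrete with a small calculation. The sketch below is not part of the principle; it assumes a simple two-sided z-test for a difference between two group means with a known common standard deviation and equal group sizes. The function name two_sample_z and the numbers used (a difference of 0.02 with a standard deviation of 1.0) are hypothetical, chosen only for illustration. It holds the effect size fixed and varies only the sample size, so any change in the p-value comes from the sample size alone. The interval it reports reflects sampling error only; the principle asks for a range that also takes account of prior knowledge and other sources of error, which this sketch does not attempt.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% critical value of the standard normal


def two_sample_z(diff, sd, n):
    """Two-sided z-test for a difference in means.

    Assumes (for illustration only) two groups of equal size n and a
    known common standard deviation sd. Returns the standardized effect
    size, the p-value, and a 95% interval for the raw difference that
    reflects sampling error only.
    """
    se = sd * math.sqrt(2.0 / n)            # standard error of the difference
    z = diff / se                           # test statistic under H0: diff = 0
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    half_width = Z_95 * se
    return {
        "effect_size": diff / sd,           # does not depend on n
        "p_value": p,
        "ci_low": diff - half_width,
        "ci_high": diff + half_width,
    }


if __name__ == "__main__":
    # Hypothetical numbers: the same small difference at four sample sizes.
    for n in (100, 10_000, 100_000, 1_000_000):
        r = two_sample_z(diff=0.02, sd=1.0, n=n)
        print(
            f"n={n:>9,}  effect size={r['effect_size']:.3f}  "
            f"p={r['p_value']:.3g}  "
            f"95% interval=({r['ci_low']:.4f}, {r['ci_high']:.4f})"
        )
```

Under these assumptions the effect size is identical on every row, while the p-value moves from well above 0.05 at the smallest sample to far below it at the largest. The effect size and interval say how large the difference is and how precisely it has been measured, which is the information the principle recommends reporting.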