Dual P-Values, Evidential Tension and Balanced Tests

In the classical approach to statistical hypothesis testing the roles of the null hypothesis H0 and the alternative H1 are highly asymmetric. Power, calculated from the distribution of the test statistic under H1, is treated as a theoretical construct that can be used to guide the choice of an appropriate test statistic or sample size, but power calculations do not explicitly enter the testing process in practice. In a significance test a decision to accept or reject H0 is driven solely by an examination of the strength of evidence against H0, summarized in the P-value calculated from the distribution of the test statistic under H0. A small P-value is taken to represent strong evidence against H0, but it need not indicate strong evidence in favour of H1. More recently, Moerkerke et al. (2006) have suggested that the special status of H0 is often unwarranted or inappropriate, and argue that evidence against H1 can be equally meaningful. They propose a balanced treatment of both H0 and H1 in which the classical P-value is supplemented by the P-value derived under H1. The alternative P-value is the dual of the null P-value and summarizes the evidence against a target alternative. Here we review how the dual P-values are used to assess the evidential tension between H0 and H1, and use decision theoretic arguments to explore a balanced hypothesis testing technique that exploits this evidential tension. The operational characteristics of balanced hypothesis tests are outlined and their relationship to conventional notions of optimal tests is laid bare. The use of balanced hypothesis tests as a conceptual tool is illustrated via model selection in linear regression, and their practical implementation is demonstrated by application to the detection of cancer-specific protein markers in mass spectroscopy.
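
To make the idea of dual P-values concrete, the following minimal sketch computes both quantities in a deliberately simple setting that is our own assumption rather than an example taken from the text: a one-sided test on a normal mean with known standard deviation, with point hypotheses H0: mu = mu0 and a target alternative H1: mu = mu1 > mu0. The function names and parameters (`dual_p_values`, `xbar`, `mu0`, `mu1`, `sigma`, `n`) are hypothetical. The null P-value is taken as the upper-tail probability of the sample mean under H0, and the alternative P-value as the lower-tail probability under H1.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, computed from the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def normal_sf(z):
    """Standard normal upper-tail probability P(Z >= z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def dual_p_values(xbar, mu0, mu1, sigma, n):
    """Return (p0, p1) for an observed sample mean xbar.

    Assumed setting (illustrative only): X_1..X_n i.i.d. N(mu, sigma^2)
    with sigma known, H0: mu = mu0, target alternative H1: mu = mu1 > mu0.

    p0 = P(Xbar >= xbar | H0): small values are evidence against H0.
    p1 = P(Xbar <= xbar | H1): small values are evidence against H1.
    """
    if mu1 <= mu0:
        raise ValueError("this sketch assumes mu1 > mu0")
    se = sigma / math.sqrt(n)  # standard error of the sample mean
    p0 = normal_sf((xbar - mu0) / se)
    p1 = normal_cdf((xbar - mu1) / se)
    return p0, p1


if __name__ == "__main__":
    # Sample means between and beyond the two hypothesised values.
    for xbar in (0.0, 0.5, 1.0, 1.5):
        p0, p1 = dual_p_values(xbar, mu0=0.0, mu1=1.0, sigma=1.0, n=4)
        print(f"xbar={xbar:.1f}  p0={p0:.4f}  p1={p1:.4f}")
```

In this sketch, a sample mean lying exactly midway between mu0 and mu1 gives equal P-values, so the data discriminate between the hypotheses no better in one direction than the other; as the sample mean moves towards mu1, p0 shrinks while p1 grows. Comparing the two values in this way is one informal reading of the evidential tension between H0 and H1.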