Multiple comparisons and ANOVA
Patrick Breheny
April 19
STA580: Biostatistics I

Multiple comparisons
So far in this class, I’ve painted a picture of research in which investigators set out with one specific hypothesis in mind, collect a random sample, then perform a hypothesis test.
Real life is a lot messier.
Investigators often test dozens of hypotheses, and don’t always decide on those hypotheses before they have looked at their data.
Hypothesis tests and p-values are much harder to interpret when multiple comparisons have been made.

Environmental health emergency ...
As an example, suppose we see five cases of a certain type of cancer in the same neighborhood.
Suppose also that the probability of seeing a single case in a neighborhood this size is 1 in 10.
If the cases arose independently (our null hypothesis), then the probability of seeing five cases in the neighborhood in a single year is
\[ \left(\frac{1}{10}\right)^5 = .00001 \]
This looks like pretty convincing evidence that chance alone is an unlikely explanation for the outbreak, and that we should look for a common cause.
This type of scenario occurs all the time, and suspicion is usually cast on a local industry and its waste disposal practices, which may be contaminating the air, ground, or water.

... or coincidence?
But there are a lot of neighborhoods and a lot of types of cancer.
Suppose we were to carry out such a hypothesis test for 100,000 different neighborhoods and 100 different types of cancer.
Then we would expect (100,000)(100)(.00001) = 100 of these tests to have p-values below .00001 just by random chance.
As a result, further investigations by epidemiologists and other public health officials rarely succeed in finding a common cause.
The lesson: if you keep testing null hypotheses, sooner or later you’ll find significant differences, regardless of whether any real difference exists.

Breast cancer study
If an investigator begins with a clear set of hypotheses in mind, however, and these hypotheses are independent, then there are methods for carrying out tests while adjusting for multiple comparisons.
For example, consider a study done at the National Institutes of Health to find genes associated with breast cancer.
They looked at 3,226 genes, carrying out a two-sample t-test for each gene to see if the expression level of the gene differed between women with breast cancer and healthy controls (i.e., they got 3,226 p-values).

Probability of a single mistake
If we accepted p < .05 as convincing evidence, what is the probability that we would make at least one mistake?
\[ P(\text{At least one error}) = 1 - P(\text{All correct}) \approx 1 - .95^{3226} \approx 1 \]
If we want to keep our overall probability of making a type I error at 5%, we need to require p to be much lower.

The Bonferroni correction
Instead of testing each individual hypothesis at $\alpha = .05$, we would have to compare our p-values to a new, lower value $\alpha^*$, where
\[ \alpha^* = \frac{\alpha}{h} \]
and $h$ is the number of hypothesis tests that we are conducting (this approach is called the Bonferroni correction).
For the breast cancer study, $\alpha^* = .000015$.
Note that it is still possible to find significant evidence of a gene-cancer association, but much more evidence is needed to overcome the multiple testing.

False discovery rate
Another way to adjust for multiple hypothesis tests is the false discovery rate.
Instead of trying to control the overall probability of a type I error, the false discovery rate controls the proportion of significant findings that are type I errors.
If a cutoff of $\alpha$ for the individual hypothesis tests results in $s$ significant findings, then the false discovery rate is
\[ \text{FDR} = \frac{h\alpha}{s} \]

False discovery rate applied to the breast cancer study problem
So for example, in the breast cancer study, p < .01 for 207 of the hypothesis tests.
We would have expected 3226(.01) = 32.26 significant findings by chance alone.
Thus, the false discovery rate for this p-value cutoff is
\[ \text{FDR} = \frac{32.26}{207} = 15.6\% \]
We can expect roughly 15.6% of these 207 genes to be spurious results, linked to breast cancer only by chance variability.
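
The arithmetic for the breast cancer study can be checked with a short Python sketch. The function names are illustrative and not part of the course materials. The first calculation assumes, as the approximation on the slide does, that the tests are independent and that every null hypothesis is true.

```python
def familywise_error_rate(alpha, h):
    """P(at least one type I error) over h independent tests, all nulls true."""
    return 1 - (1 - alpha) ** h


def bonferroni_threshold(alpha, h):
    """Per-test cutoff alpha* = alpha / h."""
    return alpha / h


def false_discovery_rate(alpha, h, s):
    """Estimated FDR = h * alpha / s, where s tests had p < alpha."""
    return h * alpha / s


h = 3226  # number of genes tested in the breast cancer study

# Probability of at least one mistake when each test uses p < .05
print(f"P(at least one error): {familywise_error_rate(0.05, h):.4f}")

# Bonferroni-corrected cutoff that keeps the overall error rate at 5%
print(f"Bonferroni cutoff:     {bonferroni_threshold(0.05, h):.7f}")

# FDR when 207 genes have p < .01
print(f"FDR at p < .01:        {false_discovery_rate(0.01, h, 207):.1%}")
```

The three printed values correspond to the slide figures: a probability of essentially 1, a cutoff of about .000015, and an FDR of about 15.6%.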

Breast cancer study: Visual idea of FDR
[Figure: histogram; x-axis: p (0.0 to 1.0); y-axis: Frequency (0 to 300)]
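
The expected count of 32.26 chance findings in the FDR calculation rests on a standard fact: when the null hypothesis is true and the test statistic is continuous, the p-value is uniformly distributed between 0 and 1. The simulation below is a minimal sketch under that assumption. It uses no data from the actual study, and the function name, seeds, and number of repetitions are arbitrary choices for illustration.

```python
import random


def count_null_findings(h, alpha, seed=0):
    """Count p-values below alpha among h simulated tests where every null is true."""
    rng = random.Random(seed)
    # Under a true null hypothesis, a (continuous) p-value is Uniform(0, 1).
    return sum(rng.random() < alpha for _ in range(h))


h, alpha = 3226, 0.01

# Repeat the hypothetical "all nulls true" study 200 times with different seeds.
counts = [count_null_findings(h, alpha, seed=s) for s in range(200)]
average = sum(counts) / len(counts)

print(f"expected by theory:                 {h * alpha:.2f}")
print(f"average over 200 simulated studies: {average:.2f}")
```

The simulated average should land close to 32.26, far below the 207 findings observed in the study. That gap is why most of the 207 are plausibly real associations.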