Multiple comparisons and ANOVA PDF

31 Pages·2012·0.25 MB·English
Preview Multiple comparisons and ANOVA

Multiple comparisons and ANOVA
Patrick Breheny, April 19
STA 580: Biostatistics I
Sections: Multiple comparisons (Introduction; The Bonferroni correction; The false discovery rate), Modeling and ANOVA

Multiple comparisons

So far in this class, I’ve painted a picture of research in which investigators set out with one specific hypothesis in mind, collect a random sample, then perform a hypothesis test.
Real life is a lot messier.
Investigators often test dozens of hypotheses, and don’t always decide on those hypotheses before they have looked at their data.
Hypothesis tests and p-values are much harder to interpret when multiple comparisons have been made.

Environmental health emergency ...

As an example, suppose we see five cases of a certain type of cancer in the same neighborhood.
Suppose also that the probability of seeing a single case in a neighborhood this size is 1 in 10.
If the cases arose independently (our null hypothesis), then the probability of seeing five cases in the neighborhood in a single year is $\left(\frac{1}{10}\right)^5 = .00001$.
This looks like pretty convincing evidence that chance alone is an unlikely explanation for the outbreak, and that we should look for a common cause.
This type of scenario occurs all the time, and suspicion is usually cast on a local industry and its waste disposal practices, which may be contaminating the air, ground, or water.

... or coincidence?
But there are a lot of neighborhoods and a lot of types of cancer.
Suppose we were to carry out such a hypothesis test for 100,000 different neighborhoods and 100 different types of cancer.
Then we would expect (100,000)(100)(.00001) = 100 of these tests to have p-values as small as .00001 just by random chance.
As a result, further investigations by epidemiologists and other public health officials rarely succeed in finding a common cause.
The lesson: if you keep testing null hypotheses, sooner or later you’ll find significant differences, whether or not any real difference exists.

Breast cancer study

If an investigator begins with a clear set of hypotheses in mind, however, and these hypotheses are independent, then there are methods for carrying out tests while adjusting for multiple comparisons.
For example, consider a study done at the National Institutes of Health to find genes associated with breast cancer.
They looked at 3,226 genes, carrying out a two-sample t-test for each gene to see if the expression level of the gene differed between women with breast cancer and healthy controls (i.e., they got 3,226 p-values).

Probability of a single mistake

If we accepted p < .05 as convincing evidence, what is the probability that we would make at least one mistake?
$P(\text{At least one error}) = 1 - P(\text{All correct}) \approx 1 - .95^{3226} \approx 1$
If we want to keep our overall probability of making a type I error at 5%, we need to require p to be much lower.

The Bonferroni correction

Instead of testing each individual hypothesis at $\alpha = .05$, we would have to compare our p-values to a new, lower value $\alpha^*$, where
$\alpha^* = \frac{\alpha}{h}$
and h is the number of hypothesis tests that we are conducting (this approach is called the Bonferroni correction).
For the breast cancer study, $\alpha^* = .05/3226 \approx .000015$.
Note that it is still possible to find significant evidence of a gene-cancer association, but much more evidence is needed to overcome the multiple testing.

False discovery rate

Another way to adjust for multiple hypothesis tests is the false discovery rate.
Instead of trying to control the overall probability of a type I error, the false discovery rate controls the proportion of significant findings that are type I errors.
If a cutoff of $\alpha$ for the individual hypothesis tests results in s significant findings, then the false discovery rate is:
$\text{FDR} = \frac{h\alpha}{s}$

False discovery rate applied to the breast cancer study

So for example, in the breast cancer study, p < .01 for 207 of the hypothesis tests.
We would have expected 3226(.01) = 32.26 significant findings by chance alone.
Thus, the false discovery rate for this p-value cutoff is
$\text{FDR} = \frac{32.26}{207} = 15.6\%$
We can expect roughly 15.6% of these 207 genes to be spurious results, linked to breast cancer only by chance variability.
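The arithmetic on the preceding slides can be reproduced in a few lines of Python. This is only a sketch: the variable names are made up for illustration, the numbers (3,226 tests, 207 findings at p < .01) come from the slides, and the first calculation assumes that every null hypothesis is true and that the tests are independent.

```python
# Numbers taken from the breast cancer study in the slides.
h = 3226        # number of hypothesis tests (one per gene)
alpha = 0.05    # per-test significance level

# Probability of at least one type I error, assuming all h null
# hypotheses are true and the tests are independent.
p_at_least_one = 1 - (1 - alpha) ** h

# Bonferroni-corrected threshold: alpha* = alpha / h.
alpha_star = alpha / h

# False discovery rate at cutoff .01, where s = 207 tests were significant.
cutoff = 0.01
s = 207
expected_by_chance = h * cutoff      # h * alpha in the FDR formula
fdr = expected_by_chance / s         # FDR = h * alpha / s

print(f"P(at least one error) = {p_at_least_one:.6f}")
print(f"Bonferroni alpha*     = {alpha_star:.7f}")
print(f"Expected by chance    = {expected_by_chance:.2f}")
print(f"FDR                   = {fdr:.1%}")
```

Changing `cutoff` and `s` together (for instance, to the count of genes below some stricter cutoff, if that count were known) would show how the false discovery rate moves with the threshold.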

Breast cancer study: Visual idea of FDR

[Figure: frequency of the p-values from the breast cancer study. Horizontal axis: p, from 0.0 to 1.0. Vertical axis: Frequency, from 0 to 300.]
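The figure of 32.26 chance findings used in the false discovery rate calculation can also be checked by simulation. The sketch below is not from the slides: it relies on the standard fact that a p-value from a continuous test statistic is uniformly distributed on [0, 1] when the null hypothesis is true, and the function name `count_null_hits` and the seeds are invented for illustration.

```python
import random


def count_null_hits(h: int, cutoff: float, seed: int) -> int:
    """Count how many of h simulated null p-values fall below cutoff.

    Each null p-value is drawn uniformly on [0, 1), which is the
    distribution of a p-value when the null hypothesis is true.
    """
    rng = random.Random(seed)
    return sum(1 for _ in range(h) if rng.random() < cutoff)


h = 3226       # number of tests in the breast cancer study
cutoff = 0.01  # p-value cutoff used in the slides

# Simulate 20 studies in which no gene is truly associated with cancer.
counts = [count_null_hits(h, cutoff, seed) for seed in range(20)]
average = sum(counts) / len(counts)

print("hits per simulated study:", counts)
print(f"average: {average:.1f} (theory: {h * cutoff:.2f})")
```

Each simulated study produces a few dozen "significant" genes even though none is real, which is the baseline against which the 207 observed findings are compared.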

Description:
Hypothesis tests and p-values are much harder to interpret when multiple comparisons have been made. Patrick Breheny. STA 580: Biostatistics I.
