Current Genomics, 2013, 14, 250-255

Gene-based Genomewide Association Analysis: A Comparison Study

Guolian Kang¹,*, Bo Jiang² and Yuehua Cui³

¹Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN 38105; ²Department of Biostatistics, The University of Alabama at Birmingham, Birmingham, AL 35294; ³Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824 USA

*Address correspondence to this author at the Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN 38105, USA; Tel: +1-901-595-2666; Fax: +1-901-595-8843; E-mail: [email protected]

1875-5488/13 $58.00+.00 ©2013 Bentham Science Publishers

Abstract: The study of gene-based genetic associations has gained conceptual popularity recently. Biologic insight into the etiology of a complex disease can be gained by focusing on genes as testing units. Several gene-based methods (e.g., the minimum p-value (or maximum test statistic) method or the entropy-based method) have been developed and have more power than a single nucleotide polymorphism (SNP)-based analysis. The objective of this study is to compare the performance of the entropy-based method with the minimum p-value and single SNP-based analysis and to explore their strengths and weaknesses. Simulation studies show that: 1) all three methods can reasonably control the false-positive rate; 2) the minimum p-value method outperforms the entropy-based and the single SNP-based method when only one disease-related SNP occurs within the gene; 3) the entropy-based method outperforms the other methods when there are two or more disease-related SNPs in the gene; and 4) the entropy-based method is computationally more efficient than the minimum p-value method. Application to a real data set shows that more significant genes were identified by the entropy-based method than by the other two methods.

Received on: February 06, 2013 - Revised on: May 01, 2013 - Accepted on: May 07, 2013

Keywords: Gene-centric, Genome-wide association study, Monte Carlo, Entropy, Minimum p-value method.

1. INTRODUCTION

Single nucleotide polymorphism (SNP)-based genome-wide association studies (GWAS) have been a popular and successful method to identify disease-related SNPs. However, this approach has much lower power when the number of SNPs increases and SNPs are correlated, especially when their effect sizes are small and only their cumulative effect is associated with a disease. Gene- or region-based analysis may have higher power to identify the causal variants that affect the complex disease, because it takes into consideration the correlations among SNPs within a single gene.

The simplest method for gene-based analysis is the SNP-based method, in which each genotyped SNP is tested for association, and multiple testing corrections based on the Bonferroni procedure are applied to control the type-I error rate. The most widely used single SNP-based association test is the Cochran-Armitage trend test (CATT), which has high power under additive and multiplicative disease models but much lower power under the recessive disease model [1-4]. The genotypic test based on a 2×3 contingency table is robust to different disease models [5]. Other innovative methods include the entropy-based method, which is generally as good as or even more powerful than the genotypic test [5, 6]. The SNP-based method for gene-based analysis has low power when the causal variants are highly correlated with one or more genotyped SNPs and when the causal SNPs are not genotyped. The power of the SNP-based method can be improved by combining the information from neighboring SNPs within a single gene.

Several methods have been developed to analyze multiple SNPs within the same gene simultaneously. These methods include Fisher's method for combining p-values by a logarithm function of p-values and the minP (minimum p-value) or maxT (maximum test statistic) method, in which the significance level can be determined by the observed p-value. However, the empirical p-value must be calculated by using permutation, because the limiting distributions of Fisher's statistic and the minP (maxT) statistic are unknown under the null hypothesis that the gene is not associated with the disease.

Another way to combine multiple SNPs is to use multivariate tests. Chapman and Whittaker proposed a multivariate score test statistic that is equivalent to the score test for the logistic regression model [7]. Another test statistic based on an empirical Bayesian model for the parameters was similar to the above multivariate score test statistic [8]. Wang and Elston proposed a test statistic using a weighted Fourier transform of the genotypes to reduce the test degrees of freedom [9]. Chapman and Whittaker compared the above five methods by simulation studies, and they found that the minP (maxT) and Goeman's method perform well over a range of scenarios [7].

For the minP (maxT) method, a Monte Carlo (MC) method can be used to evaluate the empirical p-values by approximating the joint distribution of the test statistics with an MC-sampling approach. This is computationally more feasible than a permutation method [10]. An entropy-based test statistic was recently proposed to test gene-disease association based on the joint genotypes on multiple SNPs within a gene, and a cluster-based analysis method was used to reduce the degrees of freedom of the test statistic [11].

In this study, we compare three methods, namely the single SNP-based method, the maxT method with MC sampling to estimate the empirical p-value, and an entropy-based method, by simulation studies and real data analysis. We start with a detailed description of each method, followed by simulations and real data analysis.

2. METHODS

2.1. MaxT (or minP) Method with Monte Carlo Sampling

Much of what follows in the section below is adapted from Lin [10]. Consider one gene with $m$ genic SNPs, each with two alleles. Let $Y_i$ be the phenotypic value of the $i$-th individual; let $X_{ji}$ = 0, 1, or 2 be the genotype of the $i$-th individual at locus $j$; and let $\bar{Y} = \sum_{i=1}^{n} Y_i / n$ and $\bar{X}_j = \sum_{i=1}^{n} X_{ji} / n$, where $1 \le i \le n$, $1 \le j \le m$, and $n$ is the sample size. The test statistic for the $j$-th locus within this gene is defined as

$$T_j = U_j^T V_j^{-1} U_j, \quad j = 1, 2, \ldots, m,$$

where $U_j = \sum_{i=1}^{n} U_{ji}$, $U_{ji} = (Y_i - \bar{Y}) X_{ji}$, and $V_j = \sum_{i=1}^{n} U_{ji} U_{ji}^T$. This test statistic follows a $\chi^2$ distribution with $r_j$ degrees of freedom, where $r_j$ is the dimension of $U_j$.

The test statistics $(T_1, T_2, \ldots, T_m)$ may be correlated due to linkage disequilibrium among SNPs within one gene. Evaluating the p-values by using the actual joint distribution of $(T_1, T_2, \ldots, T_m)$ can be computationally intensive. Lin [10] proposed an MC method that approximates the actual joint distribution in order to evaluate the empirical p-values by MC sampling. The MC method defines

$$\tilde{T}_j = \tilde{U}_j^T V_j^{-1} \tilde{U}_j, \quad \text{where } \tilde{U}_j = \sum_{i=1}^{n} U_{ji} G_i,$$

and $G_1, G_2, \ldots, G_n$ are independent standard normal random variables that are independent of the data. The method then uses the joint distribution of the $\tilde{T}_j$'s to approximate the joint distribution of the $T_j$'s, obtaining realizations from the distribution of the $\tilde{T}_j$'s by repeatedly generating the normal random samples $G_1, G_2, \ldots, G_n$. Let $(t_1, t_2, \ldots, t_m)$ be the observed values of the test statistics $(T_1, T_2, \ldots, T_m)$, let $t_{max} = \max\{t_1, t_2, \ldots, t_m\}$, and let $\tilde{T}_{max} = \max\{\tilde{T}_1, \tilde{T}_2, \ldots, \tilde{T}_m\}$. If $\Pr(\tilde{T}_{max} \ge t_{max}) < \alpha$, where $\alpha$ is the preset significance level, then the null hypothesis that this gene is not associated with the disease is rejected.

2.2. Entropy-based Test Statistic and Genotype Grouping via Penalized Entropy

For one gene with $m$ genic SNPs, there is a total of $3^m$ joint genotypes. However, the real number of joint genotypes is much less. Denote the number of observed joint genotypes for one gene by $s$ ($s < 3^m$). Let $p_i^A$ and $p_i^U$ ($1 \le i \le s$) be the frequencies of the $i$-th joint genotype in cases and controls, respectively. Then the entropy-based test statistic for testing the association between this gene and a disease is as follows [11]:

$$T_{gene} = (S^A - S^U)\, W^{-1}\, (S^A - S^U)^T, \qquad (1)$$

where

$$S^{A/U} = \left[ -p_1^{A/U} \log(p_1^{A/U}), \ldots, -p_m^{A/U} \log(p_m^{A/U}) \right],$$

$$W = D^A \Sigma^A D^A / n^A + D^U \Sigma^U D^U / n^U,$$

$n^{A/U}$ is the number of cases and controls, and

$$D^{A/U} = \begin{pmatrix} -1 - \log(p_1^{A/U}) & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & -1 - \log(p_m^{A/U}) \end{pmatrix},$$

$$\Sigma^{A/U} = \begin{pmatrix} p_1^{A/U}(1 - p_1^{A/U}) & \cdots & -p_1^{A/U} p_m^{A/U} \\ \vdots & \ddots & \vdots \\ -p_m^{A/U} p_1^{A/U} & \cdots & p_m^{A/U}(1 - p_m^{A/U}) \end{pmatrix}.$$

Under the null hypothesis that there is no association between this gene and a disease, $T_{gene}$ follows a central $\chi^2$ distribution with $m-1$ degrees of freedom.

When the number of genic SNPs is high, the degrees of freedom increase, so the power will decrease. To increase the power, the rare joint genotypes can be grouped into common ones by using the penalized entropy measure (PEM) [11]:

$$I = -\left( \sum_{j=1}^{k} p_j \log_2 p_j \right) - 2 \log_2 k / m_k,$$

where $m_k$ is the number of $k$-th joint genotypes. The joint genotype set with the maximum value of $I$ is taken as the common joint genotype set. To find it, we first sort all joint genotypes in descending order according to their frequencies. Then we calculate the PEM each time one joint genotype is added to the present joint genotype set. If the PEM begins to decrease when the $k$-th joint genotype is added to the current set, the common joint genotype set will include the former $k-1$ joint genotypes.

Once the grouping threshold is determined, we can proceed to calculate the similarities between one rare joint genotype with frequency less than the threshold and all common genotypes and then group it with the common one that is the most similar.

3. SIMULATION STUDIES

We evaluated the performance of the three methods described above by using simulation studies. We simulated case-control samples by two methods: one using a linkage-disequilibrium (LD)-based method similar to methods in [10, 11], and the other using an MS program developed by Hudson [12] that is similar to programs developed by Tzeng [13]. Although we will not discuss the LD-based simulation method here (see [11]), we describe below the detailed process to generate samples by the MS program.

3.1. MS Program

We used the MS program developed by Hudson [12] to simulate haplotypes for each individual to form individual genotype data. The main parameters under the coalescent model for generating haplotypes were set as follows: the effective diploid population size $n_e$ is $1 \times 10^4$; the scaled recombination rate for the whole region of interest, $4 n_e \gamma$/bp, is $4 \times 10^{-3}$, where the parameter $\gamma$ is the probability of crossover per generation between the ends of the haplotype locus being simulated; the scaled mutation rate for the simulated haplotype region, $4 n_e \mu$/bp, is set to be $5.6 \times 10^{-4}$; and the length of sequence within the region of simulated haplotypes, nsites, is 10 kb. Similar parameter settings can be found in other studies [10, 12, 13]. We set the number of SNP sequences in the simulated sample to 100 for each gene and ran the MS program to generate the haplotype sample on the basis of the parameter settings. Then we randomly selected a segment of 10 adjacent SNPs as a haplotype. Two haplotypes are randomly drawn from the simulated sample containing 100 10-SNP haplotypes and are paired to form an individual genotype.

3.2. Phenotype Simulation

In reality, we do not know the true functional mechanism for a given gene, so it is difficult to simulate the true functional variants and the true functional mechanism within a gene [13]. Here, we considered three scenarios to mimic the situation of a complex disease in which there are one, two, or three disease-related SNPs within a given gene. For cases with two or three disease-related SNPs, complex interactions occur among the SNPs. Here we briefly illustrate how the disease phenotypes are simulated.

Scenario 1. Let $f_0$, $f_1$, $f_2$ be the three penetrances of the three genotypes. Denote $\lambda_1 = f_1/f_0$ and $\lambda_2 = f_2/f_1$ as the genotype-relative risks (GRRs). Let $p$ be the disease allele frequency, and denote the disease prevalence as $k$. Then the three penetrances can be calculated for an additive, dominant, or recessive disease model (Table 1). We omit a multiplicative model, because the results of that model are similar to those from the additive model. Once $f$ is determined, the case/control status is simulated according to a Bernoulli distribution, with the probability of success $f$ conditional on the observed genotype data.

Table 1. Single-SNP Disease Model

| Disease Model | $f_0$ᵃ | $f_1$ | $f_2$ |
|---|---|---|---|
| Additive | $\text{prev}/(1 - 2p + 2p\lambda)$ᵇ | $\lambda f_0$ | $(2\lambda - 1) f_0$ |
| Dominant | $\text{prev}/((1-p)^2 + \lambda p(2-p))$ | $\lambda f_0$ | $\lambda f_0$ |
| Recessive | $\text{prev}/(1 + p^2\lambda - p^2)$ | $f_0$ | $\lambda f_0$ |

ᵃ The $f_0$, $f_1$, $f_2$ are the three penetrances of the genotypes; prev is the disease prevalence $k$.
ᵇ In additive and dominant models, $\lambda = \lambda_1$, and in a recessive model, $\lambda = \lambda_2$.

For a disease model with two or three interacting disease-related SNPs within a single gene (Scenarios 2 and 3), we follow the cases given in [14].

Scenario 2. For the two-locus-interaction disease model, we denote the two-locus genotypes as $(G_A, G_B) \in \{0, 1, 2\}^2$, which represent the number of risk alleles at each of the disease-related SNPs A and B. The two-locus-interaction disease model is as follows:

Model 1: $\text{Odds}(G_A, G_B) = \alpha (1+\theta)^{G_A + G_B}$

Model 2: $\text{Odds}(G_A, G_B) = \alpha (1+\theta)^{G_A I(G_A > 0) + G_B I(G_B > 0)}$

Model 3: $\text{Odds}(G_A, G_B) = \alpha (1+\theta)^{I(G_A > 0 \,\wedge\, G_B > 0)}$

where $\alpha$ is the baseline effect, and $\theta$ is the genotypic effect.

Scenario 3. For the three-locus-interaction disease model, we denote the three-locus genotypes as $(G_A, G_B, G_C) \in \{0, 1, 2\}^3$, which represent the number of risk alleles at each of the disease-related SNPs A, B, and C. The three-locus-interaction disease model is as follows:

Model 1: $\text{Odds}(G_A, G_B, G_C) = \alpha (1+\theta)^{G_A + G_B + G_C}$

Model 2: $\text{Odds}(G_A, G_B, G_C) = \alpha (1+\theta)^{G_A I(G_A > 0) + G_B I(G_B > 0) + G_C I(G_C > 0)}$

Model 3: $\text{Odds}(G_A, G_B, G_C) = \alpha (1+\theta)^{I(G_A > 0 \,\wedge\, G_B > 0 \,\wedge\, G_C > 0)}$

where $\alpha$ and $\theta$ are the same as in Scenario 2. Once the disease-related SNPs are determined, the case-control status can then be simulated according to a multinomial distribution conditional on the observed genotype data.

We simulated data sets with 400 cases and 400 controls or 800 cases and 800 controls. For the evaluation of the type-I error rate, we simulated data sets using both the LD-based and MS methods, but for power we used only the MS method because it can better mimic the biological data. For each data set, we applied the three methods described above. The type-I error rate was estimated based on 1000 replicates, and the power was estimated based on 100 replicates at a significance level of 0.05. For the maxT method, the empirical p-value was obtained based on 10,000 normal samples.

4. REAL DATA ANALYSIS

To compare the three methods, we applied them to a large-scale candidate-gene study. The data set contains 225 cases and 585 controls on 190 candidate genes in a genetic association study of preeclampsia [15]. We removed SNPs with minor allele frequencies less than 0.05 and focused on the remaining 819 SNPs. We also removed 27 genes carrying only one SNP. Similar to [11], we used a nominal level of 0.005 for the gene-based methods and 0.005 divided by the number of SNPs within each gene for the SNP-based method. Table 2 lists the p-values of significant genes and SNPs for the three methods. The genes and SNPs that showed significant effects are formatted in bold. The entropy-based method identified seven significant genes among the 190 genes evaluated. The single SNP-based method identified three significant genes, and the maxT method identified one significant gene. Thus, the gene-based entropy method identified the largest number of significant genes.

Table 2. Analysis of the Preeclampsia Data Set Using the SNP-Based, Gene-based Entropy, and MaxT Methods

| Gene (No. of SNPs) | maxTᵃ | Entropyᵇ | SNPᶜ | SNP-based Method |
|---|---|---|---|---|
| APOB (9) | 0.0379 | **0.0015**ᵈ | rs5456814 | 0.0165 |
| F13B (4) | 0.0282 | **0.0029** | rs28787657 | **0.0010** |
| F2 (7) | 0.5812 | **0.0020** | rs28886771 | 0.0021 |
| FGF4 (3) | **0.0047** | **0.0039** | rs634043464 | 0.0067 |
| IGF2R (14) | 0.7919 | **0.0005** | rs41410456 | 0.0330 |
| MMP10 (8) | 0.1150 | **0.0006** | rs634850223 | 0.0280 |
| PDGFC (2) | 0.0527 | **0.0036** | rs634820282 | 0.032 |
| IGF1R (7) | 0.1312 | 0.1902 | rs40893937 | **0.0006** |
| NOS2A (10) | 0.3695 | 0.0547 | rs9678181 | **0.0001** |

ᵃ Data were obtained using the maximum test statistic method.
ᵇ Data were obtained using the entropy-based method.
ᶜ Only SNPs with the smallest p-values within the corresponding genes are listed.
ᵈ Bold formatting of data indicates significant p-values.

5. SIMULATION RESULTS

Table 3 presents the empirical type-I error rates of the single-SNP, maxT, and entropy-based methods based on the MS program and the LD-based method. From Table 3, we see that the maxT and entropy-based methods control the type-I error rate quite well. The latter also controls it as the sample size increases. However, the single-SNP method has a much lower type-I error rate, which means that this method may have lower power. We also simulated 10 SNPs with $r^2$ = 0.9, 0.5, and 0 within one gene by using the LD-based method and found that all three methods control the type-I error rate well.

Table 4 presents the estimated power of the SNP-based, maxT, and entropy-based methods for one disease-related SNP within a single gene. The maxT method appeared to be the most powerful among the three methods. The entropy-based method had lower power than the maxT method, because when one disease-related SNP occurs within a gene, the cluster number in the entropy-based method will be large, so that the degrees of freedom of the test statistic in equation (1) are high. This will affect the power of the entropy-based method.

Tables 5 and 6 present the estimated power of the three methods for situations in which two or three disease-related SNPs occur within a single gene. The entropy-based method appeared to be the most powerful method, and the single SNP-based method was the least powerful. This makes sense because when there are two or three interacting disease-related SNPs within one gene, the cluster number of the observed joint genotypes will be small. Thus, the degrees of freedom of the test statistic in equation (1) will be small, which will improve the power of the entropy-based method.

6. DISCUSSION

We have compared three gene-based association approaches by conducting simulation studies and one real data set analysis. Simulation results show that 1) all three methods effectively control the type-I error rate; 2) the single SNP-based method is very conservative; 3) when there is one disease-related SNP within a gene, the maxT method is the most powerful; 4) when there are two or three disease-related SNPs within a gene, the entropy-based method is the most powerful. Real data analysis shows that the entropy-based method identifies more significant genes than do the other two methods. In addition, we have compared the computing time used by the three methods and found that the entropy-based method is computationally more efficient than the maxT method.

Given the unknown number of causal SNPs as well as the complex structure among/between causal and non-causal SNPs within the gene, and the complex underlying disease gene actions, the relative performance of different approaches for gene-based association tests strongly depends on the scenario at hand. Considering genes as testing units, sometimes we have to move forward to pursue gene-based interactions to get better biological insights into the etiology of complex diseases [16]. As new approaches are increasingly developed, we believe that no single approach is universally superior to others [4]. We suggest that users explore as many different approaches as possible and choose the best one based on their biological experience.

Rare variants may play an important role in explaining the missing heritability of complex disease in post-GWAS research. The correlations between rare and common SNPs and among rare variants are generally weak [17], and the number of causal rare SNPs, each with moderate or large effect sizes, may be large [18]. Novel statistical or computational methodologies for analyzing rare variants focusing on genes are urgently needed with the availability of large-scale exome or whole-genome sequencing data [19]. The relative performance of these approaches for gene-based association tests is worthy of further investigation.

Table 3. The Estimated Type I Error Rate Under the Null Hypothesis of No Association by Using the MS Program and LD-based Programs

| SSᵃ | MS maxTᵇ | MS Entropyᶜ | MS SNPᵈ | r²=0.9 maxT | r²=0.9 Entropy | r²=0.9 SNP | r²=0.5 maxT | r²=0.5 Entropy | r²=0.5 SNP | r²=0.0 maxT | r²=0.0 Entropy | r²=0.0 SNP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 400 | 0.05 | 0.06 | 0.03 | 0.05 | 0.06 | 0.027 | 0.06 | 0.06 | 0.06 | 0.04 | 0.04 | 0.04 |
| 800 | 0.05 | 0.05 | 0.02 | 0.04 | 0.06 | 0.019 | 0.05 | 0.06 | 0.04 | 0.06 | 0.05 | 0.05 |

Columns labeled MS are from the MS program; columns labeled with an r² value are from the LD-based programs.
ᵃ SS, sample size.
ᵇ Data were obtained using the maximum test statistic method.
ᶜ Data were obtained using the entropy-based method.
ᵈ Data were obtained using the single-SNP-based method.

Table 4. The Estimated Power of Gene-based Association Tests, Assuming One Disease-related SNP Occurs Within the Gene, Under Different Sample Sizes and Different Disease Models

| Disease Model | GRRᵃ | maxTᵇ (N=400) | Entropyᶜ (N=400) | SNPᵈ (N=400) | maxT (N=800) | Entropy (N=800) | SNP (N=800) |
|---|---|---|---|---|---|---|---|
| Additive | 1.4 | 1 | 0.56 | 0.60 | 0.95 | 0.92 | 0.94 |
| Additive | 1.6 | 1 | 0.91 | 0.955 | 1 | 1 | 1 |
| Additive | 1.8 | 1 | 0.975 | 0.990 | 1 | 1 | 1 |
| Dominant | 1.4 | 0.47 | 0.39 | 0.36 | 0.65 | 0.62 | 0.74 |
| Dominant | 1.6 | 0.75 | 0.65 | 0.73 | 0.94 | 0.90 | 0.95 |
| Dominant | 1.8 | 0.88 | 0.89 | 0.90 | 0.99 | 0.99 | 0.99 |
| Recessive | 1.4 | 0.22 | 0.26 | 0.20 | 0.29 | 0.29 | 0.37 |
| Recessive | 1.6 | 0.32 | 0.34 | 0.34 | 0.64 | 0.74 | 0.77 |
| Recessive | 1.8 | 0.54 | 0.63 | 0.59 | 0.86 | 0.92 | 0.98 |

ᵃ GRR, genotype relative risks.
ᵇ Data were obtained using the maximum test statistic method.
ᶜ Data were obtained using the entropy-based method.
ᵈ Data were obtained using the single-SNP-based method.

Table 5. The Estimated Power of Gene-based Association Tests, Assuming that Two Disease-related SNPs Occur Within a Gene, Under Different Sample Sizes and Different Disease Models

| Disease Model | (BL,GE)ᵃ | maxTᵇ (N=400) | Entropyᶜ (N=400) | SNPᵈ (N=400) | maxT (N=800) | Entropy (N=800) | SNP (N=800) |
|---|---|---|---|---|---|---|---|
| Model 1 | (1,0.5) | 0.31 | 0.42 | 0.19 | 0.61 | 0.76 | 0.37 |
| Model 1 | (1,0.7) | 0.54 | 0.71 | 0.35 | 0.87 | 0.93 | 0.72 |
| Model 1 | (1,0.9) | 0.78 | 0.89 | 0.61 | 0.99 | 1 | 0.96 |
| Model 2 | (1,0.5) | 0.20 | 0.29 | 0.19 | 0.52 | 0.54 | 0.49 |
| Model 2 | (1,0.7) | 0.34 | 0.45 | 0.38 | 0.66 | 0.77 | 0.79 |
| Model 2 | (1,0.9) | 0.52 | 0.65 | 0.59 | 0.90 | 0.96 | 0.97 |
| Model 3 | (1,0.5) | 0.17 | 0.25 | 0.10 | 0.51 | 0.49 | 0.54 |
| Model 3 | (1,0.7) | 0.43 | 0.56 | 0.43 | 0.66 | 0.77 | 0.76 |
| Model 3 | (1,0.9) | 0.41 | 0.59 | 0.50 | 0.84 | 0.91 | 0.92 |

ᵃ BL, the baseline effect; GE, the genotypic effect.
ᵇ Data were obtained using the maximum test statistic method.
ᶜ Data were obtained using the entropy-based method.
ᵈ Data were obtained using the single-SNP-based method.

Table 6. The Estimated Power of Gene-based Association Tests, Assuming Three Disease-related SNPs Occur Within a Gene, Under Different Sample Sizes and Different Disease Models

| Disease Model | (BL,GE)ᵃ | maxTᵇ (N=400) | Entropyᶜ (N=400) | SNPᵈ (N=400) | maxT (N=800) | Entropy (N=800) | SNP (N=800) |
|---|---|---|---|---|---|---|---|
| Model 1 | (1,0.5) | 0.54 | 0.56 | 0.42 | 0.92 | 0.88 | 0.81 |
| Model 1 | (1,0.7) | 0.87 | 0.77 | 0.63 | 1 | 1 | 1 |
| Model 1 | (1,0.9) | 0.95 | 0.94 | 0.87 | 1 | 1 | 1 |
| Model 2 | (1,0.5) | 0.56 | 0.50 | 0.33 | 0.94 | 0.91 | 0.81 |
| Model 2 | (1,0.7) | 0.87 | 0.76 | 0.73 | 1 | 0.99 | 0.99 |
| Model 2 | (1,0.9) | 0.96 | 0.96 | 0.91 | 1 | 1 | 1 |
| Model 3 | (1,0.5) | 0.06 | 0.05 | 0 | 0.01 | 0.08 | 0.03 |
| Model 3 | (1,0.7) | 0.08 | 0.13 | 0.05 | 0.06 | 0.16 | 0.02 |
| Model 3 | (1,0.9) | 0.04 | 0.19 | 0.03 | 0.05 | 0.20 | 0.05 |

ᵃ BL, the baseline effect; GE, the genotypic effect.
ᵇ Data were obtained using the maximum test statistic method.
ᶜ Data were obtained using the entropy-based method.
ᵈ Data were obtained using the single-SNP-based method.

CONFLICT OF INTEREST

The authors confirm that this article content has no conflicts of interest.

ACKNOWLEDGEMENTS

We gratefully acknowledge the funding of the American Lebanese Syrian Associated Charities (ALSAC) and NSF grant DMS-1209112, and thank Dr. Angela J. McArthur for editing the manuscript.

REFERENCES

[1] Cochran, W.G. Some methods for strengthening the common χ² tests. Biometrics 1954; 10, 417-451.
[2] Armitage, P. Tests for linear trends in proportions and frequencies. Biometrics 1955; 11, 375-386.
[3] Sasieni, P.D. From genotypes to genes: doubling the sample size. Biometrics 1997; 53, 1253-1261.
[4] Cantor, R.M.; Lange, K.; Sinsheimer, J.S. Prioritizing GWAS Results: A Review of Statistical Methods and Recommendations for Their Application. Am. J. Hum. Genet. 2010, 86(1), 6-22.
[5] Ruiz-Marín, M.; Matilla-García, M.; Cordoba, J.A.G.; Susillo-González, J.L.; Romo-Astorga, A.; González-Pérez, A.; Ruiz, A.; Gayán, J. An entropy test for single-locus genetic association analysis. BMC Genetics 2010, 11, 19.
[6] Kang, G.; Zuo, Y. Entropy-based joint analysis for two-stage genomewide association studies. Journal of Human Genetics 2007, 52, 747-756.
[7] Chapman, J.; Whittaker, J. Analysis of multiple SNPs in a candidate gene or region. Genet. Epidemiol. 2008, 32, 560-566.
[8] Goeman, J.J.; van de Geer, S.; van Houwelingen, H.C. Testing against a high dimensional alternative. J. Royal. Stat. Soc. B 2005, 68, 477-493.
[9] Wang, T.; Elston, R.C. Improved power by use of a weighted score test for linkage disequilibrium mapping. Am. J. Hum. Genet. 2007, 80, 353-360.
[10] Lin, D.Y. An efficient Monte Carlo approach to assessing statistical significance in genomic studies. Bioinformatics 2005, 21, 781-787.
[11] Cui, Y.H.; Kang, G.L.; Sun, K.L.; Romero, R.; Qian, M.P.; Fu, W.J. Gene-centric genomewide association study via entropy. Genetics 2008, 179, 637-650.
[12] Hudson, R. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 2002, 18, 337-338.
[13] Tzeng, J.Y. Evolutionary-based grouping of haplotypes in association analysis. Genet. Epidemiol. 2005, 28, 220-231.
[14] Marchini, J.; Donnelly, P.; Cardon, L.R. Genome-wide strategies for detecting multiple loci that influence complex diseases. Nat. Genet. 2005, 37, 413-417.
[15] Goddard, K.A.; Tromp, G.; Romero, R.; Olson, J.M.; Lu, Q.; Xu, Z.; Parimi, N.; Nien, J.K.; Gomez, R.; Behnke, E.; Solari, M.; Espinoza, J.; Santolaya, J.; Chaiworapongsa, T.; Lenk, G.M.; Volkenant, K.; Anant, M.K.; Salisbury, B.A.; Carr, J.; Lee, M.S.; Vovis, G.F.; Kuivaniemi, H. Candidate-gene association study of mothers with pre-eclampsia, and their infants, analyzing 775 SNPs in 190 genes. Hum Hered. 2007, 63, 1-16.
[16] Li, S.Y.; Cui, Y.H. Gene-centric gene-gene interaction: a model-based kernel machine method. Annals of Applied Statistics 2012, 6, 1134-1161.
[17] Dickson, S.P.; Wang, K.; Krantz, I.; Hakonarson, H.; Goldstein, D.B. Rare Variants Create Synthetic Genome-Wide Associations. PLOS Genetics 2010, 8, e1000294.
[18] Wu, M.C.; Lee, S.; Cai, T.; Li, Y.; Boehnke, M.; Lin, X. Rare-variant association testing for sequencing data with the sequence kernel association test. Am J Hum Genet 2011, 89, 82-93.
[19] Moore, J.H.; Asselberges, F.M.; Williams, S.W. Bioinformatics challenges for genome-wide association studies. Bioinformatics 2010, 26 (4), 445-455.
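
As a supplementary illustration of the Monte Carlo maxT test of Section 2.1, the following is a minimal sketch rather than the authors' implementation. It assumes a single additive genotype score per SNP (so each $U_j$ and $V_j$ is a scalar), and it centers the genotypes by $\bar{X}_j$ before forming $U_{ji}$, which is an assumption of this sketch. The function name `maxt_mc_pvalue` and its parameters are hypothetical.

```python
import numpy as np


def maxt_mc_pvalue(y, X, n_draws=10_000, seed=0):
    """Return (t_max, empirical p-value) for the maxT test of one gene.

    y : (n,) phenotype values (e.g. 1 = case, 0 = control)
    X : (n, m) genotype matrix coded 0, 1, 2
    Assumes every SNP is polymorphic so that each V_j > 0.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)

    # Per-subject score contributions U_ji.
    # Centering X by its column mean is an assumption of this sketch.
    U = (y - y.mean())[:, None] * (X - X.mean(axis=0))  # shape (n, m)
    V = (U ** 2).sum(axis=0)                            # V_j, one value per SNP
    t_obs = U.sum(axis=0) ** 2 / V                      # observed T_j
    t_max = float(t_obs.max())

    # Each row of G is one draw of (G_1, ..., G_n); G @ U gives U-tilde_j.
    G = rng.standard_normal((n_draws, U.shape[0]))
    t_tilde = (G @ U) ** 2 / V                          # shape (n_draws, m)

    # Fraction of draws whose maximum statistic reaches the observed maximum.
    p_value = float((t_tilde.max(axis=1) >= t_max).mean())
    return t_max, p_value
```

Calling `maxt_mc_pvalue(y, X)` with the default `n_draws=10_000` mirrors the 10,000 normal samples mentioned in Section 3; the gene would be declared associated when the returned p-value falls below the chosen significance level. The draws are generated in one matrix for brevity, so for large samples it may be preferable to process them in batches.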

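The penetrance formulas of Table 1 (Scenario 1) can also be checked numerically. The sketch below is illustrative only: it assumes Hardy-Weinberg genotype proportions, which is what makes the three penetrances average back to the prevalence, and the names `penetrances`, `implied_prevalence`, and `simulate_status` are hypothetical helpers, not part of the study's software.

```python
import numpy as np


def penetrances(model, prev, p, grr):
    """Return (f0, f1, f2) from Table 1 for prevalence `prev`,
    disease allele frequency `p`, and genotype relative risk `grr`."""
    if model == "additive":
        f0 = prev / (1 - 2 * p + 2 * p * grr)
        return f0, grr * f0, (2 * grr - 1) * f0
    if model == "dominant":
        f0 = prev / ((1 - p) ** 2 + grr * p * (2 - p))
        return f0, grr * f0, grr * f0
    if model == "recessive":
        f0 = prev / (1 + p ** 2 * grr - p ** 2)
        return f0, f0, grr * f0
    raise ValueError("model must be 'additive', 'dominant', or 'recessive'")


def implied_prevalence(f, p):
    """Average the penetrances over Hardy-Weinberg genotype frequencies
    (an assumption of this sketch)."""
    f0, f1, f2 = f
    return (1 - p) ** 2 * f0 + 2 * p * (1 - p) * f1 + p ** 2 * f2


def simulate_status(genotypes, f, seed=0):
    """Draw case (1) / control (0) status as Bernoulli(f_g) per individual."""
    rng = np.random.default_rng(seed)
    g = np.asarray(genotypes, dtype=int)
    return (rng.random(g.shape[0]) < np.asarray(f)[g]).astype(int)
```

For example, `implied_prevalence(penetrances("additive", 0.05, 0.3, 1.6), 0.3)` should recover the input prevalence of 0.05 up to rounding, which gives a quick consistency check on each row of the table.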