Authors
Summary
Current meta-analysis methods for GWAS assume that the samples included in individual studies are independent. However, association studies increasingly sample from the same populations or cohorts, or use identical control datasets. Individual participant data are often inaccessible, which makes the degree of relatedness or overlap difficult to calculate. This sample overlap can be very high, especially if the source population is small. Lin and Sullivan provide a theoretical framework for case-control and quantitative trait meta-analysis that accounts for a known number of overlapping samples. Province and Borecki use tetrachoric correlation to estimate sample relatedness or overlap from summary statistics. However, this p-value-based method does not account for differences in the direction of genetic effects, nor does it produce a summary of these effects. We adapt this estimator and integrate it with Lin and Sullivan's inverse-variance-based method to provide a meta-analysis of both effect sizes and p-values. Using simulations based on GWAS genotypes from 10,000 individuals from the UKHLS cohort, we show that this method keeps the average type-I error rate below 7% at a 5% significance threshold, even with very large sample overlaps, whereas in an uncorrected meta-analysis the type-I error rate increases linearly by 0.12% for every 1% of sample overlap. Both the p-value correction and the overlap estimation were robust to sample size variation and to MAF filtering of the input dataset. We demonstrate that tetrachoric correlation can estimate sample overlap with 95% accuracy. We implement our method in a software package that scales to genome-wide sequencing data and can control for unknown sample relatedness or overlap in meta-analyses of up to 15 studies.
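The two ingredients named in the summary can be illustrated with a minimal sketch. It is not the authors' software, and every function name in it is hypothetical. It assumes (1) that the between-study correlation is estimated from the concordance of z-score signs across variants, which is the special case of tetrachoric correlation with both thresholds at zero and may differ from the estimator actually used, and (2) that the correlated inverse-variance combination takes the usual generalised least squares form, with the covariance between two studies set to their correlation times the product of their standard errors.

```python
import math
import numpy as np


def sign_concordance_correlation(z_a, z_b):
    """Tetrachoric correlation for z-scores dichotomised at zero.

    Assumption: thresholds at zero, so the closed form
    rho = sin(pi * (c - 1/2)) applies, where c is the fraction of
    variants whose z-scores have the same sign in both studies.
    """
    z_a = np.asarray(z_a, dtype=float)
    z_b = np.asarray(z_b, dtype=float)
    concordant = np.mean((z_a > 0) == (z_b > 0))
    return math.sin(math.pi * (concordant - 0.5))


def correlated_inverse_variance(betas, ses, corr):
    """Combine per-study effects for one variant, allowing correlated studies.

    betas, ses: effect estimates and standard errors, one per study.
    corr: study-by-study correlation matrix (identity if no overlap).
    Returns (pooled beta, pooled standard error, two-sided p-value).
    """
    betas = np.asarray(betas, dtype=float)
    ses = np.asarray(ses, dtype=float)
    corr = np.asarray(corr, dtype=float)
    # Covariance of the study estimates: corr_ij * se_i * se_j.
    cov = corr * np.outer(ses, ses)
    # Generalised least squares weights: cov^{-1} * 1.
    weights = np.linalg.solve(cov, np.ones_like(betas))
    var = 1.0 / weights.sum()
    beta = var * (weights @ betas)
    se = math.sqrt(var)
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return beta, se, p_value


if __name__ == "__main__":
    # Two hypothetical studies with equal standard errors.
    independent = correlated_inverse_variance([0.1, 0.3], [0.1, 0.1], np.eye(2))
    overlapping = correlated_inverse_variance(
        [0.1, 0.3], [0.1, 0.1], [[1.0, 0.5], [0.5, 1.0]]
    )
    print(independent)
    print(overlapping)
```

In this sketch a positive correlation between two studies leaves the pooled effect unchanged when their standard errors are equal but widens the pooled standard error, which is the mechanism by which an overlap correction of this kind would counter type-I error inflation.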
Volume and page numbers
Volume: 39, p. 529-599
Subjects
Notes
Albert Sloman Library Periodicals *restricted to Univ. Essex registered users*