Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes.
Journal: 2008/May - Nature Genetics
ISSN: 1546-1718
Abstract:
Genome-wide association (GWA) studies have identified multiple loci at which common variants modestly but reproducibly influence risk of type 2 diabetes (T2D). Established associations to common and rare variants explain only a small proportion of the heritability of T2D. As previously published analyses had limited power to identify variants with modest effects, we carried out meta-analysis of three T2D GWA scans comprising 10,128 individuals of European descent and approximately 2.2 million SNPs (directly genotyped and imputed), followed by replication testing in an independent sample with an effective sample size of up to 53,975. We detected at least six previously unknown loci with robust evidence for association, including the JAZF1 (P = 5.0 x 10(-14)), CDC123-CAMK1D (P = 1.2 x 10(-10)), TSPAN8-LGR5 (P = 1.1 x 10(-9)), THADA (P = 1.1 x 10(-9)), ADAMTS9 (P = 1.2 x 10(-8)) and NOTCH2 (P = 4.1 x 10(-8)) gene regions. Our results illustrate the value of large discovery and follow-up samples for gaining further insights into the inherited basis of T2D.
Nat Genet 40(5): 638-645

Methods

Stage 1 samples, genome-wide genotyping and quality control (expanded in Supplementary Methods)

UK

The WTCCC stage 1 sample consists of 1,924 T2D cases and 2,938 population controls from the UK (refs. 3,4). These samples were genotyped on the Affymetrix GeneChip Human Mapping 500k Array Set. The call frequency of included samples was >0.97. A total of 393,143 autosomal SNPs passed quality control (QC) criteria: Hardy-Weinberg equilibrium (HWE) p>10 in T2D cases and controls, call frequency >0.95, minor allele frequency (MAF) >0.01 and good clustering, as defined in refs. 3,4.

DGI

The DGI stage 1 Swedish and Finnish sample consists of 1,464 T2D cases and 1,467 normoglycemic controls. Of these, 2,097 are population-based T2D cases and controls matched for BMI, gender and geographic origin, and 834 are T2D cases and controls in 326 sibships discordant for T2D (ref. 1). These samples were genotyped on the Affymetrix GeneChip Human Mapping 500k Array Set, and all included samples had a genotype call rate >0.95. A total of 378,860 autosomal SNPs passed QC criteria (call frequency >0.95, HWE p>10 in controls and MAF >0.01 in both the population-based and familial components; ref. 1).

FUSION

The FUSION stage 1 sample consists of 1,161 Finnish T2D cases and 1,174 Finnish normal glucose tolerant controls (ref. 2). In addition, 122 FUSION offspring with genotyped parents were included for quality control and quantitative trait analysis. Samples were genotyped with the Illumina HumanHap300 BeadChip (version 1.1). All included samples had a call frequency >0.975. A total of 306,222 autosomal SNPs passed QC (ref. 2) and had an HWE p≥10 in the total sample, ≤3 combined duplicate or non-Mendelian inheritance errors (across 79 duplicate samples and 122 parent-offspring sets), call frequency ≥0.90 and MAF >0.01.
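The per-SNP filters listed above (call frequency, MAF, HWE) can be sketched as a simple predicate. This is a minimal illustration, not any study's pipeline: the `SnpCounts` container and the function names are hypothetical, the HWE check is an asymptotic 1-df chi-square test rather than whichever test each study used, and the HWE threshold is a required argument because its exponent is not recoverable from this text.

```python
import math
from dataclasses import dataclass


@dataclass
class SnpCounts:
    """Hypothetical container: genotype counts for one biallelic SNP."""
    n_aa: int       # homozygous for allele A
    n_ab: int       # heterozygous
    n_bb: int       # homozygous for allele B
    n_missing: int  # samples without a genotype call


def hwe_chisq_p(c: SnpCounts) -> float:
    """Asymptotic 1-df chi-square test of Hardy-Weinberg equilibrium."""
    n = c.n_aa + c.n_ab + c.n_bb
    p = (2 * c.n_aa + c.n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (c.n_aa, c.n_ab, c.n_bb)
    chisq = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chisq / 2.0))


def passes_qc(c: SnpCounts, *, hwe_p_threshold: float,
              min_call_rate: float = 0.95, min_maf: float = 0.01) -> bool:
    """Call-rate, MAF and HWE filters (defaults follow the UK criteria above)."""
    n_called = c.n_aa + c.n_ab + c.n_bb
    if n_called == 0:
        return False
    call_rate = n_called / (n_called + c.n_missing)
    freq_a = (2 * c.n_aa + c.n_ab) / (2 * n_called)
    maf = min(freq_a, 1.0 - freq_a)
    return (call_rate > min_call_rate and maf > min_maf
            and hwe_chisq_p(c) > hwe_p_threshold)
```

For example, `passes_qc(SnpCounts(250, 500, 250, 10), hwe_p_threshold=1e-6)` accepts a SNP in perfect equilibrium with 99% call rate; the threshold value there is purely illustrative.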

Analysis of stage 1 genotype data

In combining data across the three studies, we did not attempt, given differences in study design and implementation, to harmonize every aspect of individual study analysis and QC. For the UK, DGI and FUSION studies respectively, 393,143, 378,860 and 306,222 SNPs were analyzed under an additive model. The genomic control values for these directly genotyped SNPs were 1.08 (UK), 1.06 (DGI) and 1.03 (FUSION) (Supplementary Methods).
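The genomic control values quoted here are inflation factors (λ). A commonly used median-based estimator divides the median observed 1-df chi-square statistic by the median of the null 1-df chi-square distribution (about 0.455). The sketch below assumes that estimator; each study's exact procedure is given in the Supplementary Methods, and the function names are hypothetical.

```python
import statistics
from statistics import NormalDist

# Median of a 1-df chi-square distribution: (Phi^-1(0.75))^2, about 0.4549
CHISQ1_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_control_lambda(chisq_stats):
    """Median-based genomic control inflation factor for 1-df test statistics."""
    return statistics.median(chisq_stats) / CHISQ1_MEDIAN


def lambda_from_z(z_scores):
    """Same estimator applied to z statistics (1-df chi-square = z squared)."""
    return genomic_control_lambda([z * z for z in z_scores])
```

A value of 1.0 indicates no inflation relative to the null; values such as 1.08 indicate modest genome-wide inflation of the test statistics.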

Stage 1 imputation and T2D analysis

For each stage 1 sample set, we imputed genotypes for autosomal SNPs that were present in HapMap Phase II but were not present in the genome-wide chip or did not pass direct genotyping QC. In each sample, genotypes were imputed using the genotype data from the GWA chips and phased HapMap II genotype data from the 60 CEU HapMap founders. We retained SNPs that had an estimated MAF>0.01 in the control samples. Imputed SNPs were then tested for T2D association. The genomic control values for these imputed SNPs were 1.08 (UK), 1.07 (DGI) and 1.04 (FUSION) (Supplementary Methods).
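The MAF filter on imputed SNPs can be applied to expected allele dosages computed from genotype posterior probabilities. A minimal sketch, assuming the imputation output gives, per sample, the probabilities of carrying 0, 1 or 2 copies of the coded allele; the input layout and function names are hypothetical.

```python
def expected_dosage(probs):
    """probs = (P(0 copies), P(1 copy), P(2 copies)) of the coded allele."""
    return probs[1] + 2.0 * probs[2]


def estimated_maf(control_probs):
    """MAF estimated from expected dosages across control samples."""
    dosages = [expected_dosage(p) for p in control_probs]
    freq = sum(dosages) / (2.0 * len(dosages))
    return min(freq, 1.0 - freq)


def retain_imputed_snps(control_probs_by_snp, min_maf=0.01):
    """Keep SNP IDs whose estimated control MAF exceeds min_maf."""
    return [snp for snp, probs in control_probs_by_snp.items()
            if estimated_maf(probs) > min_maf]
```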

Stage 1 meta-analysis (expanded in Supplementary Methods)

We used meta-analysis to combine the T2D association results for the stage 1 WTCCC, DGI and FUSION samples. The combined stage 1 data comprise 10,128 samples: 4,549 T2D cases and 5,579 controls. We used association results from directly genotyped SNPs where available, and imputed genotype association results at all other positions. In total, 2,168,847 genotyped and imputed autosomal SNPs passed QC and had MAF >0.01 in each of the three samples (44,750 were genotyped in all three samples, 308,628 in two samples and 245,158 in one sample; 1,570,311 were imputed in all samples). All association results were expressed relative to the forward strand of the reference genome based on dbSNP125. In our initial analysis, which was used to select signals for stage 2 genotyping, we combined for each SNP the ORs for a given reference allele using a fixed-effects model, weighting each study's estimate by its precision as reflected in its confidence interval. We investigated evidence for heterogeneity of ORs using two commonly used statistics: Cochran's Q statistic and I² (ref. 25).
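A minimal sketch of the OR-based fixed-effects combination, assuming standard inverse-variance weighting on the log-OR scale with each study's standard error recovered from its 95% confidence interval. Heterogeneity is summarized by Cochran's Q and I² = (Q - df)/Q, truncated at zero. Function names are hypothetical, and the p value for Q (a chi-square tail with k - 1 df) is omitted to keep the example dependency-free.

```python
import math
from statistics import NormalDist

_N = NormalDist()
Z95 = _N.inv_cdf(0.975)  # about 1.96


def se_from_ci(lower, upper, z=Z95):
    """Standard error of log(OR) recovered from a two-sided 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


def fixed_effects_meta(ors, ses):
    """Inverse-variance fixed-effects meta-analysis of per-study odds ratios.

    ses are standard errors of log(OR). Returns
    (pooled OR, SE of pooled log OR, two-sided p, Cochran's Q, I^2).
    """
    betas = [math.log(o) for o in ors]
    weights = [1.0 / (s * s) for s in ses]
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    p = 2.0 * _N.cdf(-abs(beta / se))
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return math.exp(beta), se, p, q, i2
```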

We repeated the meta-analysis combining evidence for association based solely on the p value. Specifically, for each study we converted the two-sided p value to a z-statistic which was signed to reflect the direction of the association given the reference allele. Each z-score was then weighted; the squared weights were chosen to sum to 1 and each sample-specific weight was proportional to the square root of the effective number of individuals in the sample. Weighted z-statistics were summed across studies and the summary z-score converted to a two-sided p value.
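The p value-based combination can be sketched as follows. The weighting (square root of the effective sample size, rescaled so that the squared weights sum to 1) follows the text; the definition of effective sample size used here, 4/(1/N_cases + 1/N_controls), is an assumption because the paragraph does not state it, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

_N = NormalDist()


def effective_n(n_cases, n_controls):
    """Assumed definition of effective sample size for a case-control study."""
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


def signed_z(p_two_sided, direction):
    """Two-sided p value -> z score signed by effect direction (+1 or -1,
    where +1 means the reference allele is associated with higher risk)."""
    z = -_N.inv_cdf(p_two_sided / 2.0)  # avoids rounding 1 - p/2 to 1.0
    return z if direction > 0 else -z


def weighted_z_meta(p_values, directions, n_effs):
    """Combine signed z scores with weights proportional to sqrt(N_eff)."""
    raw = [math.sqrt(n) for n in n_effs]
    norm = math.sqrt(sum(r * r for r in raw))
    weights = [r / norm for r in raw]  # squared weights sum to 1
    z = sum(w * signed_z(p, d) for w, p, d in zip(weights, p_values, directions))
    return z, 2.0 * _N.cdf(-abs(z))
```

Because the squared weights sum to 1, the combined z statistic is standard normal under the null, so it can be converted back to a two-sided p value directly.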

SNP prioritization for stage 2 genotyping

We prioritized 69 SNPs for replication in stage 2 based on the results from the three-study stage 1 meta-analysis, using a set of criteria we developed as part of a heuristic approach to the prioritization of loci for follow-up (Supplementary Methods). Briefly, we considered SNPs with a meta-analysis p value <10 and a meta-analysis heterogeneity p value >10. These selections were largely made using the initial OR-based version of the meta-analysis. We allowed some exceptions to the above follow-up criteria.

Five SNPs were selected for replication genotyping as exceptions to these criteria, on the basis of their strong association with T2D in the DGI GWA study (2 SNPs), association with both T2D and insulinogenic index in the DGI study (1 SNP), or overlap with FUSION or WTCCC results (p<0.05 in DGI and in one or both of the other studies; 2 SNPs). For known T2D loci (TCF7L2, CDKAL1, IGF2BP2, KCNJ11, HHEX/IDE, SLC30A8, the CDKN2A/2B region, WFS1, TCF2 and FTO), we excluded from follow-up all SNPs within the surrounding region, with region boundaries defined by the furthest neighboring SNPs whose p values remained ~0.01 (n=1,981). In the PPARG region, we identified a SNP, rs17036101, with a p value two orders of magnitude lower than that of the established Pro12Ala susceptibility variant, rs1801282, and took this signal forward to replication. In total, 69 SNPs were taken forward to stage 2 genotyping.
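The selection rules above can be sketched as a filter over stage 1 meta-analysis results. This is a simplified illustration: the significance and heterogeneity thresholds are parameters because their exponents are not recoverable from this text, the five manually chosen exceptions are not modeled, and the region rule (walk outward from a known locus's lead SNP while neighboring SNPs keep p ≤ 0.01) is one plausible reading of the boundary definition, not the authors' code. All names and the example thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetaResult:
    """Hypothetical per-SNP stage 1 meta-analysis summary."""
    snp: str
    chrom: str
    pos: int
    p: float      # association p value
    het_p: float  # heterogeneity p value


def known_locus_region(results, lead_snp, boundary_p=0.01):
    """Extend outward from the lead SNP while neighbouring SNPs keep p <= boundary_p."""
    lead = next(r for r in results if r.snp == lead_snp)
    snps = sorted((r for r in results if r.chrom == lead.chrom), key=lambda r: r.pos)
    lo = hi = snps.index(lead)
    while lo > 0 and snps[lo - 1].p <= boundary_p:
        lo -= 1
    while hi < len(snps) - 1 and snps[hi + 1].p <= boundary_p:
        hi += 1
    return lead.chrom, snps[lo].pos, snps[hi].pos


def prioritize(results, p_threshold, het_p_threshold, excluded_regions):
    """SNP IDs passing both thresholds and lying outside excluded regions, best first."""
    def excluded(r):
        return any(r.chrom == c and start <= r.pos <= end
                   for c, start, end in excluded_regions)
    return [r.snp for r in sorted(results, key=lambda r: r.p)
            if r.p < p_threshold and r.het_p > het_p_threshold and not excluded(r)]
```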

Stage 2 samples, genotyping and analysis

UK

We genotyped the prioritized SNPs in cases and controls from three UK replication sets (RS1, RS2 and RS3, described in ref. 4; Supplementary Table 1; Supplementary Methods). Genotyping in RS1, RS2 and RS3 was performed by Kbiosciences (Herts., UK). All assays were validated before use with the standard Kbiosciences 96-well validation plate and up to 296 samples from the WTCCC study (see Comparison of genotypes from imputation and direct genotyping; Supplementary Methods). Concordance between the Affymetrix and KASPar/TaqMan genotypes (based on up to 296 replicate stage 1 samples) averaged 97.5%. All genotyped SNPs had call rates >94% in the replication sets, and no SNP had an HWE p value <0.001 in cases or controls. We tested for association with T2D using the Cochran-Armitage test for trend, and combined results from the three replication sets in a Cochran-Mantel-Haenszel meta-analysis framework.
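The Cochran-Armitage trend test named above can be sketched for a single replication set as follows, using additive scores (0, 1, 2) for the number of copies of the tested allele and the large-sample normal approximation. The function name and input layout are hypothetical.

```python
import math
from statistics import NormalDist


def cochran_armitage_trend(case_counts, control_counts, scores=(0, 1, 2)):
    """Cochran-Armitage test for trend in a 2 x 3 genotype table.

    case_counts / control_counts: genotype counts ordered to match `scores`
    (0, 1 or 2 copies of the tested allele). Returns (z, two-sided p);
    z > 0 means the tested allele is more frequent in cases.
    """
    r_tot, s_tot = sum(case_counts), sum(control_counts)
    n_tot = r_tot + s_tot
    col = [r + s for r, s in zip(case_counts, control_counts)]
    # Trend statistic: T = sum_i t_i * (r_i * S - s_i * R)
    t_stat = sum(t * (r * s_tot - s * r_tot)
                 for t, r, s in zip(scores, case_counts, control_counts))
    # Var(T) = (R * S / N) * (N * sum t_i^2 n_i - (sum t_i n_i)^2)
    sum_t2n = sum(t * t * n for t, n in zip(scores, col))
    sum_tn = sum(t * n for t, n in zip(scores, col))
    var_t = (r_tot * s_tot / n_tot) * (n_tot * sum_t2n - sum_tn ** 2)
    z = t_stat / math.sqrt(var_t)
    return z, 2.0 * NormalDist().cdf(-abs(z))
```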

DGI

We genotyped the prioritized SNPs in three stage 2 case-control samples (ref. 1; Supplementary Table 1; Supplementary Methods). The prioritized SNPs were genotyped in all DGI stage 1 and 2 samples using the iPLEX Sequenom MassARRAY platform (http://www.sequenom.com/Assets/pdfs/appnotes/8876-006.pdf). Sixty-three SNPs passing QC (>94% call rate, MAF >0.01 and HWE p value >0.001) were used for association testing. We tested for T2D association in each DGI stage 2 case-control set using a chi-squared test under an additive genetic model, and combined results from the three DGI stage 2 samples using Cochran-Mantel-Haenszel meta-analysis.
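Combining per-set results with the Cochran-Mantel-Haenszel approach can be sketched on allele-count 2 × 2 tables, one per case-control set. This minimal version, without continuity correction, returns the CMH chi-square, its 1-df p value and the Mantel-Haenszel common odds ratio. The table layout and function name are hypothetical, and this is not necessarily the exact implementation used by the studies.

```python
import math


def cmh_test(tables):
    """Cochran-Mantel-Haenszel test across strata of 2 x 2 allele tables.

    Each table is ((a, b), (c, d)): a/b = risk/other alleles in cases,
    c/d = risk/other alleles in controls. Returns (chi2, p, MH odds ratio).
    """
    diff = var = or_num = or_den = 0.0
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        diff += a - (a + b) * (a + c) / n  # observed minus expected
        var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
        or_num += a * d / n
        or_den += b * c / n
    chi2 = diff * diff / var
    p = math.erfc(math.sqrt(chi2 / 2.0))  # 1-df chi-square upper tail
    return chi2, p, or_num / or_den
```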

FUSION

We genotyped the prioritized SNPs in a Finnish case-control sample (Supplementary Table 1; Supplementary Methods) using the Sequenom Homogeneous Mass EXTEND or iPLEX Gold SBE assays, carried out at the National Human Genome Research Institute (NHGRI). Fifty-nine SNPs had a genotype call frequency >94% and an HWE p value >0.001. The genotype consistency rate among 56 duplicate samples was 100%, and the average call frequency of successfully genotyped SNPs was 97.3%. SNPs were analyzed by logistic regression under an additive model for the genetic effect, with adjustment for sex, 5-year age category and birth province.

Comparison of genotypes from imputation and direct genotyping

A proportion of the prioritized imputed signals were genotyped directly in the stage 1 samples of the three studies, and the corresponding concordance rates were calculated (Supplementary Methods; Supplementary Table 4). All results presented in the main manuscript text are based on directly typed stage 1 data.
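A concordance rate of this kind can be sketched by converting imputed genotype probabilities to best-guess calls and comparing them with direct genotypes. The 0.9 calling threshold and all names are assumptions for illustration; the study-specific procedures are described in the Supplementary Methods.

```python
def best_guess(probs, min_prob=0.9):
    """Most probable genotype (0, 1 or 2 copies), or None if below min_prob."""
    g = max(range(3), key=lambda k: probs[k])
    return g if probs[g] >= min_prob else None


def concordance(imputed_probs, direct_calls, min_prob=0.9):
    """Fraction of samples whose best-guess imputed genotype matches the direct call.

    Samples with an uncertain imputed genotype or a missing direct call (None)
    are excluded. Returns None if no sample can be compared.
    """
    pairs = [(best_guess(p, min_prob), d) for p, d in zip(imputed_probs, direct_calls)]
    compared = [(i, d) for i, d in pairs if i is not None and d is not None]
    if not compared:
        return None
    return sum(i == d for i, d in compared) / len(compared)
```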

Combined meta-analysis for stages 1 and 2

We combined stage 1 and stage 2 data using both the OR-based and the weighted z score-based meta-analysis approaches described above (Stage 1 meta-analysis). We also assessed our results using random-effects meta-analysis to better account for any heterogeneity between studies (Supplementary Table 6). Locus-specific and combined sibling relative risks were estimated as described previously (refs. 26,27), from sample size-weighted estimates of effect size and risk-allele frequency derived from the stage 2 replication samples only, under the assumption of allelic and locus independence.
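One standard way to compute a locus-specific sibling relative risk (λs) from a risk-allele frequency and per-allele odds ratio uses Risch's variance-component formula, λs = 1 + (V_A/2 + V_D/4)/K², with locus-specific values multiplied together under locus independence. The sketch below assumes a multiplicative within-locus model (genotype relative risks 1, r, r²) and treats the OR as a relative risk; it illustrates that approach and is not necessarily the exact computation in refs. 26,27. Function names are hypothetical.

```python
import math


def locus_lambda_s(p, r):
    """Sibling relative risk attributable to one locus (Risch variance components).

    p: risk-allele frequency; r: per-allele relative risk (OR used as an
    approximation), with genotype relative risks 1, r, r^2 (multiplicative model).
    """
    q = 1.0 - p
    f0, f1, f2 = 1.0, r, r * r                     # penetrances relative to baseline
    k = q * q * f0 + 2 * p * q * f1 + p * p * f2   # population mean penetrance
    alpha = p * (f2 - f1) + q * (f1 - f0)          # average effect of allele substitution
    v_a = 2 * p * q * alpha ** 2                   # additive variance
    v_d = (p * q * (f0 - 2 * f1 + f2)) ** 2        # dominance variance
    return 1.0 + (v_a / 2 + v_d / 4) / k ** 2


def combined_lambda_s(loci):
    """Multiply locus-specific values, assuming independent loci.

    loci: iterable of (risk-allele frequency, per-allele OR) pairs.
    """
    return math.prod(locus_lambda_s(p, r) for p, r in loci)
```

With illustrative inputs p = 0.5 and r = 2, `locus_lambda_s` gives about 1.114; loci with ORs closer to 1 give values much closer to 1.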

Stage 3 sample, genotyping and association analysis

Eleven SNPs (rs2641348, rs10490072, rs7578597, rs17036101, rs4607103, rs9472138, rs864745, rs12779790, rs1153188, rs10923931, and rs7961581) were followed up in the stage 3 samples, from the deCODE, KORA, Danish, HUNT, NHS, GEM Consortium (CCC, EPIC, ADDITION/Ely, Norfolk) and METSIM studies (Supplementary Table 1; Supplementary Methods).

Combined meta-analysis for stages 1, 2, and 3

We combined stage 1, 2 and 3 data using both meta-analysis approaches described above (Stage 1 meta-analysis): a fixed-effects model to combine ORs, and weighted combination of p value-based z-statistics across all sample sets. We also assessed our results using random-effects meta-analysis (Supplementary Table 6). We observed some evidence for heterogeneity across studies (the I² statistic ranged from 0 to 57.8%, depending on the SNP), with rs7578597 and rs10923931 showing the largest fold differences in association p value between the fixed- and random-effects analyses. Differences in strength of association across studies (leading to evidence for heterogeneity) could reflect genuine biological differences in association that depend on each study's subject ascertainment scheme.
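The text does not name the random-effects estimator; a common choice is the DerSimonian-Laird method, sketched below on the log-OR scale as an assumption. It estimates the between-study variance τ² from Cochran's Q and adds it to each study's sampling variance before re-weighting. This widens the pooled confidence interval when studies disagree, which is why association p values can differ between fixed- and random-effects analyses. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def dersimonian_laird(betas, ses):
    """Random-effects meta-analysis of log ORs (DerSimonian-Laird tau^2 estimator).

    Returns (pooled log OR, its SE, tau^2, two-sided p).
    """
    w = [1.0 / (s * s) for s in ses]
    sum_w = sum(w)
    beta_fixed = sum(wi * b for wi, b in zip(w, betas)) / sum_w
    q = sum(wi * (b - beta_fixed) ** 2 for wi, b in zip(w, betas))
    df = len(betas) - 1
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Re-weight with total (within-study + between-study) variance
    w_re = [1.0 / (s * s + tau2) for s in ses]
    sum_w_re = sum(w_re)
    beta_re = sum(wi * b for wi, b in zip(w_re, betas)) / sum_w_re
    se_re = math.sqrt(1.0 / sum_w_re)
    p = 2.0 * NormalDist().cdf(-abs(beta_re / se_re))
    return beta_re, se_re, tau2, p
```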

Genomic control (expanded in Supplementary Methods)

We have adopted two strategies in reporting the findings from this study. In the first, we performed GC-correction of data from DGI, FUSION and WTCCC prior to stage 1 meta-analysis. We corrected each individual study for the GC inflation observed (directly genotyped and imputed data separately), and combined results across studies. We present the genome-wide distribution of association statistics in Supplementary Figure 1. We note that, after study-specific genomic control adjustment, the estimated inflation factor for the stage-1 meta-analysis test statistic was 1.04.

In the second, we combined GC-uncorrected data from DGI, FUSION and WTCCC for stage 1 meta-analysis and did not correct the meta-analysis test statistics for the overall GC (to guard against over-conservativeness in the estimate of strength of association for interesting signals). We also present the genome-wide distribution of these statistics in Supplementary Figure 1.

For the combination of data across stages 1, 2 and 3, we also adopted these two strategies (of using GC-corrected and GC-uncorrected stage 1 data). In the first, we performed individual GC-correction of DGI, FUSION and WTCCC stage 1 data prior to meta-analysis with stage 2 and stage 3 data (an approach which may be over-conservative where, as here, none of the T2D-associated SNPs has particular hallmarks of stratification) (Supplementary Note). In the second, we combined only uncorrected data (except for the deCODE data, where we have applied GC correction given a more marked genomic control inflation [GC ~1.3] in that sample). We present the resulting data from both approaches (of using GC-corrected and GC-uncorrected stage 1 data for stage 1-3 meta-analysis) in Supplementary Table 6 and a comparison of results (showing very small differences) in the Supplementary Note. All data presented elsewhere in the manuscript reflect the GC-corrected analysis strategy outcome.
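Study-specific genomic control correction before meta-analysis can be sketched as dividing each 1-df chi-square statistic by that study's λ, which is equivalent to inflating the standard error of the log OR by √λ. Leaving λ < 1 uncorrected is a common convention assumed here, not something stated in the text; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def gc_correct_se(beta, se, gc_lambda):
    """Return (beta, GC-inflated SE); the chi-square (beta / se)^2 is divided by lambda."""
    factor = math.sqrt(max(gc_lambda, 1.0))  # assumption: never deflate when lambda < 1
    return beta, se * factor


def gc_corrected_p(beta, se, gc_lambda):
    """Two-sided p value after genomic control correction."""
    _, se_c = gc_correct_se(beta, se, gc_lambda)
    return 2.0 * NormalDist().cdf(-abs(beta / se_c))


def gc_correct_studies(studies):
    """studies: list of (beta, se, lambda) tuples; returns GC-corrected (beta, se)
    pairs ready for fixed-effects or z-score meta-analysis."""
    return [gc_correct_se(b, s, lam) for b, s, lam in studies]
```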

Conditional analysis of T2D signals

For each SNP in Table 2, we assessed the additive SNP association in the stage 1 and 2 samples before and after including body mass index (BMI) in the logistic regression model. For each genotyped and imputed SNP surrounding a specific T2D signal, we assessed the additive SNP association in the stage 1 samples before and after including the Table 2 SNP from the same region in the model. Each sample was analyzed with the covariate adjustments used in its stage 1 and stage 2 analyses, and data were combined across studies as described above: ORs and CIs were calculated using a fixed-effects model, and p values using the weighted z-score method. For the UK stage 1 samples, BMI information was unavailable for ~1,500 of the population-based controls, so the conditional BMI analyses used all T2D cases and only those controls with BMI data.
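The conditional analyses amount to refitting an additive logistic model with an extra covariate (BMI, or the Table 2 SNP from the same region) and comparing the SNP's log OR before and after adjustment. A dependency-free sketch using Newton-Raphson, assuming no separation in the data and a small number of covariates; function names are hypothetical and this is not the studies' software.

```python
import math


def _solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def _information_and_score(x, y, beta):
    """Fisher information X'WX and score X'(y - mu) for logistic regression."""
    k = len(beta)
    mu = [1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row)))) for row in x]
    score = [sum(row[j] * (yi - mi) for row, yi, mi in zip(x, y, mu)) for j in range(k)]
    info = [[sum(mi * (1.0 - mi) * row[i] * row[j] for row, mi in zip(x, mu))
             for j in range(k)] for i in range(k)]
    return info, score


def logistic_fit(x, y, max_iter=50, tol=1e-10):
    """Newton-Raphson logistic regression; each row of x includes an intercept column."""
    beta = [0.0] * len(x[0])
    for _ in range(max_iter):
        info, score = _information_and_score(x, y, beta)
        step = _solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    info, _ = _information_and_score(x, y, beta)
    k = len(beta)
    # SE_j = sqrt([info^-1]_jj), obtained by solving info * e = unit vector j
    ses = [math.sqrt(_solve(info, [1.0 if i == j else 0.0 for i in range(k)])[j])
           for j in range(k)]
    return beta, ses


def snp_log_or(genotypes, status, covariates=None):
    """Additive-model log OR (and SE) for allele count, optionally covariate-adjusted."""
    rows = [[1.0, float(g)] + (list(covariates[i]) if covariates else [])
            for i, g in enumerate(genotypes)]
    beta, ses = logistic_fit(rows, status)
    return beta[1], ses[1]
```

Comparing `snp_log_or(g, t2d)` with `snp_log_or(g, t2d, [[bmi] for bmi in bmis])`, for hypothetical lists `g`, `t2d` and `bmis`, shows how much of a SNP's association is accounted for by BMI.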

Quantitative trait analyses

Quantitative trait analyses were carried out in the UK, DGI and FUSION samples for the 11 SNPs taken forward to stage 3. We tested BMI, quantitative glycemic traits (fasting and 2-hour glucose and insulin, HOMA-IR), lipid traits (total, HDL and LDL cholesterol, and serum triglycerides) and blood pressure (systolic and diastolic), where available, for association using an additive genetic model (Supplementary Methods).
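An additive-model test for a quantitative trait can be sketched as an ordinary least-squares regression of the trait on allele count (0, 1, 2). This minimal version has no covariates and uses a normal approximation for the p value, which is reasonable only for large samples. The function name is hypothetical, and any trait transformations or covariate adjustments used by the studies are not modeled.

```python
import math
import statistics
from statistics import NormalDist


def additive_qt_test(genotypes, trait):
    """OLS regression of a quantitative trait on allele count (0, 1, 2).

    Returns (per-allele effect, SE, two-sided p from a normal approximation).
    """
    n = len(genotypes)
    g_mean = statistics.fmean(genotypes)
    t_mean = statistics.fmean(trait)
    sxx = sum((g - g_mean) ** 2 for g in genotypes)
    sxy = sum((g - g_mean) * (t - t_mean) for g, t in zip(genotypes, trait))
    beta = sxy / sxx
    intercept = t_mean - beta * g_mean
    rss = sum((t - intercept - beta * g) ** 2 for g, t in zip(genotypes, trait))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta, se, 2.0 * NormalDist().cdf(-abs(beta / se))
```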

Stage 1 samples, genome-wide genotyping and quality control (expanded in Supplementary Methods)

UK

The WTCCC stage 1 sample consists of 1,924 T2D cases and 2,938 population controls from the UK34. These samples were genotyped on the Affymetrix GeneChip Human Mapping 500k Array Set. The call frequency of included samples was >0.97. 393,143 autosomal SNPs passed quality control (QC) criteria: Hardy-Weinberg equilibrium [HWE] p>10 in T2D cases and controls, call frequency >0.95, minor allele frequency (MAF)>0.01, and good clustering, as defined in34.

DGI

The DGI stage 1 Swedish and Finnish sample consists of 1,464 T2D cases and 1,467 normoglycemic controls. Of these, 2,097 are population-based T2D cases and controls matched for BMI, gender, and geographic origin, and 834 are T2D cases and controls in 326 sibships discordant for T2D1. These samples were genotyped on the Affymetrix GeneChip Human Mapping 500k Array Set, and all included samples had a genotype call rate >0.95. 378,860 autosomal SNPs passed QC criteria (call frequency >0.95, HWE p>10 in controls and MAF>0.01 in both population and familial components)1.

FUSION

The FUSION stage 1 sample consists of 1,161 Finnish T2D cases and 1,174 Finnish normal glucose tolerant controls2. In addition, 122 FUSION offspring with genotyped parents were included for quality control purposes and quantitative trait analysis. Samples were genotyped with the Illumina HumanHap300 BeadChip (version 1.1). All samples included had a call frequency >0.975. 306,222 autosomal SNPs passed QC2 and had a HWE p≥10 in the total sample, ≤3 combined duplicate or non-Mendelian inheritance errors (out of 79 duplicate samples and 122 parent-offspring sets), call frequency ≥0.90, and MAF>0.01.

UK

The WTCCC stage 1 sample consists of 1,924 T2D cases and 2,938 population controls from the UK34. These samples were genotyped on the Affymetrix GeneChip Human Mapping 500k Array Set. The call frequency of included samples was >0.97. 393,143 autosomal SNPs passed quality control (QC) criteria: Hardy-Weinberg equilibrium [HWE] p>10 in T2D cases and controls, call frequency >0.95, minor allele frequency (MAF)>0.01, and good clustering, as defined in34.

DGI

The DGI stage 1 Swedish and Finnish sample consists of 1,464 T2D cases and 1,467 normoglycemic controls. Of these, 2,097 are population-based T2D cases and controls matched for BMI, gender, and geographic origin, and 834 are T2D cases and controls in 326 sibships discordant for T2D1. These samples were genotyped on the Affymetrix GeneChip Human Mapping 500k Array Set, and all included samples had a genotype call rate >0.95. 378,860 autosomal SNPs passed QC criteria (call frequency >0.95, HWE p>10 in controls and MAF>0.01 in both population and familial components)1.

FUSION

The FUSION stage 1 sample consists of 1,161 Finnish T2D cases and 1,174 Finnish normal glucose tolerant controls2. In addition, 122 FUSION offspring with genotyped parents were included for quality control purposes and quantitative trait analysis. Samples were genotyped with the Illumina HumanHap300 BeadChip (version 1.1). All samples included had a call frequency >0.975. 306,222 autosomal SNPs passed QC2 and had a HWE p≥10 in the total sample, ≤3 combined duplicate or non-Mendelian inheritance errors (out of 79 duplicate samples and 122 parent-offspring sets), call frequency ≥0.90, and MAF>0.01.

Analysis of stage 1 genotype data

In combining data across the three studies, we did not attempt, given differences in study design and implementation, to harmonize every aspect of individual study analysis and QC. For the UK, DGI and FUSION studies respectively, 393,143, 378,860 and 306,222 SNPs were analyzed under an additive model. The genomic control values for these directly genotyped SNPs were 1.08 (UK), 1.06 (DGI) and 1.03 (FUSION) (Supplementary Methods).

Stage 1 imputation and T2D analysis

For each stage 1 sample set, we imputed genotypes for autosomal SNPs that were present in HapMap Phase II but were not present in the genome-wide chip or did not pass direct genotyping QC. In each sample, genotypes were imputed using the genotype data from the GWA chips and phased HapMap II genotype data from the 60 CEU HapMap founders. We retained SNPs that had an estimated MAF>0.01 in the control samples. Imputed SNPs were then tested for T2D association. The genomic control values for these imputed SNPs were 1.08 (UK), 1.07 (DGI) and 1.04 (FUSION) (Supplementary Methods).

Stage 1 meta-analysis (expanded in Supplementary Methods)

We used meta-analysis to combine the T2D association results for the stage 1 WTCCC, DGI and FUSION samples. The combined stage 1 data are comprised of 10,128 samples: 4,549 T2D cases and 5,579 controls. We used association results from directly genotyped SNPs, where available, and imputed genotype association results at all other positions. 2,168,847 genotyped and imputed autosomal SNPs passed QC and had MAF>0.01 in each of the three samples (44,750 were genotyped in all three samples, 308,628 were genotyped in two samples, 245,158 were genotyped in one sample, 1,570,311 were imputed in all samples). All association results were expressed relative to the forward strand of the reference genome based on dbSNP125. For our initial analysis, which was used to select signals for stage 2 genotyping, for each SNP we combined the ORs for a given reference allele weighted by the confidence intervals using a fixed effects model. We investigated evidence for heterogeneity of ORs using two commonly used statistics: Cochrans's Q statistic and I (25).

We repeated the meta-analysis combining evidence for association based solely on the p value. Specifically, for each study we converted the two-sided p value to a z-statistic which was signed to reflect the direction of the association given the reference allele. Each z-score was then weighted; the squared weights were chosen to sum to 1 and each sample-specific weight was proportional to the square root of the effective number of individuals in the sample. Weighted z-statistics were summed across studies and the summary z-score converted to a two-sided p value.

SNP prioritisation for stage 2 genotyping

We prioritized 69 SNPs for replication in stage 2 based on the results from the three-study stage 1 meta-analysis, using a set of criteria we developed as part of a heuristic approach to the prioritization of loci for follow-up (Supplementary Methods). Briefly, we considered SNPs with a meta-analysis p value <10 and a meta-analysis heterogeneity p value >10. These selections were largely made using the initial OR-based version of the meta-analysis. We allowed some exceptions to the above follow-up criteria.

Five SNPs were selected for replication genotyping on the basis of their strong association with T2D in the DGI GWA study (2 SNPs), association with T2D and with insulinogenic index in the DGI study (1 SNP), and overlap with FUSION or WTCCC (p<0.05 in DGI and one or both studies; 2 SNPs). For known T2D loci (TCF7L2, CDKAL1, IGF2BP2, KCNJ11, HHEX/IDE, SLC30A8, CDKN2A/2B region, WFS1, TCF2, and FTO) we excluded from follow-up all SNPs that resided within the surrounding region, with region boundaries defined by the furthest neighboring SNPs with p values remaining ~0.01 (n=1,981). For the PPARG region, we identified a SNP, rs17036101, with a p value two orders of magnitude lower than the established Pro12Ala susceptibility variant, rs1801282, and took this signal forward to replication. A total of 69 SNPs were taken forward to stage 2 genotyping.

Stage 2 samples, genotyping and analysis

UK

We genotyped the prioritized SNPs in cases and controls from three UK replication sets (RS1, RS2 and RS3, described in4; Supplementary Table 1; Supplementary Methods). Genotyping of prioritized SNPs in RS1, RS2 and RS3 was performed by Kbiosciences (Herts., UK). All assays were validated prior to use, using a standard 96-well validation plate used by Kbiosciences and up to 296 samples from the WTCCC study (see Comparison of genotypes from imputation and direct genotyping; Supplementary Methods). Concordance rates between the Affymetrix and KASPar/TaqMan genotypes (based on up to 296 replicate stage 1 samples) were 97.5% on average. All genotyped SNPs had genotype call frequency rates >94% in the replication sets and no SNPs had HWE p value<0.001 in cases or controls. We tested for association with T2D using the Cochran-Armitage test for trend. Results from the 3 replication sets were combined in a Cochran-Mantel-Haenszel meta-analysis framework.

DGI

We genotyped the prioritized SNPs in three stage 2 case-control samples1 (Supplementary Table 1; Supplementary Methods). The prioritized SNPs were genotyped in all DGI stage 1 and 2 samples using the iPLEX Sequenom MassARRAY platform (http://www.sequenom.com/Assets/pdfs/appnotes/8876-006.pdf). 63 SNPs passing QC (>94% call rate, MAF>0.01 and HWE p value >0.001) were used for association testing. We tested for T2D association in each DGI stage 2 case-control set using a chi-squared analysis (assuming an additive genetic model). Results from the three DGI stage 2 samples were combined using Cochran-Mantel-Haenszel meta-analysis.

FUSION

We genotyped the prioritized SNPs in a Finnish case-control sample (Supplementary Table 1; Supplementary Methods) using the Sequenom Homogeneous Mass EXTEND or iPLEX Gold SBE assays, carried out at the National Human Genome Research Institute (NHGRI). 59 SNPs had genotype call frequency >94% and HWE p value >0.001. The genotype consistency rate among 56 duplicate samples was 100% and the average call frequency of successfully genotyped SNPs was 97.3%. SNPs were analyzed using logistic regression with adjustment for sex, 5-year age category and birth province and an additive model for the genetic effect.

UK

We genotyped the prioritized SNPs in cases and controls from three UK replication sets (RS1, RS2 and RS3, described in4; Supplementary Table 1; Supplementary Methods). Genotyping of prioritized SNPs in RS1, RS2 and RS3 was performed by Kbiosciences (Herts., UK). All assays were validated prior to use, using a standard 96-well validation plate used by Kbiosciences and up to 296 samples from the WTCCC study (see Comparison of genotypes from imputation and direct genotyping; Supplementary Methods). Concordance rates between the Affymetrix and KASPar/TaqMan genotypes (based on up to 296 replicate stage 1 samples) were 97.5% on average. All genotyped SNPs had genotype call frequency rates >94% in the replication sets and no SNPs had HWE p value<0.001 in cases or controls. We tested for association with T2D using the Cochran-Armitage test for trend. Results from the 3 replication sets were combined in a Cochran-Mantel-Haenszel meta-analysis framework.

DGI

We genotyped the prioritized SNPs in three stage 2 case-control samples1 (Supplementary Table 1; Supplementary Methods). The prioritized SNPs were genotyped in all DGI stage 1 and 2 samples using the iPLEX Sequenom MassARRAY platform (http://www.sequenom.com/Assets/pdfs/appnotes/8876-006.pdf). 63 SNPs passing QC (>94% call rate, MAF>0.01 and HWE p value >0.001) were used for association testing. We tested for T2D association in each DGI stage 2 case-control set using a chi-squared analysis (assuming an additive genetic model). Results from the three DGI stage 2 samples were combined using Cochran-Mantel-Haenszel meta-analysis.

FUSION

We genotyped the prioritized SNPs in a Finnish case-control sample (Supplementary Table 1; Supplementary Methods) using the Sequenom Homogeneous Mass EXTEND or iPLEX Gold SBE assays, carried out at the National Human Genome Research Institute (NHGRI). 59 SNPs had genotype call frequency >94% and HWE p value >0.001. The genotype consistency rate among 56 duplicate samples was 100% and the average call frequency of successfully genotyped SNPs was 97.3%. SNPs were analyzed using logistic regression with adjustment for sex, 5-year age category and birth province and an additive model for the genetic effect.

Comparison of genotypes from imputation and direct genotyping

A proportion of the prioritized imputed signals was genotyped in the stage 1 samples of the three studies and respective concordance rates were calculated (Supplementary Methods; Supplementary Table 4). All results presented in the main manuscript text are based on directly-typed stage 1 data.

Combined meta-analysis for stages 1 and 2

We combined stage 1 and stage 2 data using both the OR-based and the weighted z score-based meta-analysis approaches described above (Stage 1 meta-analysis). We also assessed our results using random effects meta-analysis to better account for any heterogeneity between the studies (Supplementary Table 6). Locus-specific and combined sibling relative risk estimates were calculated using sample size-weighted estimates of the effect size and risk-allele frequency derived from stage 2 replication samples only, and under the assumption of allelic and locus independence, as described by2627.

Stage 3 sample, genotyping and association analysis

Eleven SNPs (rs2641348, rs10490072, rs7578597, rs17036101, rs4607103, rs9472138, rs864745, rs12779790, rs1153188, rs10923931, and rs7961581) were followed up in the stage 3 samples, from the deCODE, KORA, Danish, HUNT, NHS, GEM Consortium (CCC, EPIC, ADDITION/Ely, Norfolk) and METSIM studies (Supplementary Table 1; Supplementary Methods).

Combined meta-analysis for stages 1, 2, and 3

We combined stage 1, 2 and 3 data using both meta-analysis approaches (fixed-effects model to combine ORs and weighted p value-based z-statistic combination across all sample sets) described above (Stage 1 meta-analysis). We also assessed our results using random effects meta-analysis (Supplementary Table 6). We observed some evidence for heterogeneity across studies (the I statistic ranged from 0 to 57.8% depending on SNP), with rs7578597 and rs10923931 displaying the largest fold differences in association p value between the fixed- and random-effects model analyses. Differences in strength of association across studies (leading to evidence for heterogeneity) could reflect interesting biological associations that vary from study to study depending on subject ascertainment scheme.

Genomic control (expanded in Supplementary Methods)

We have adopted two strategies in reporting the findings from this study. In the first, we performed GC-correction of data from DGI, FUSION and WTCCC prior to stage 1 meta-analysis. We corrected each individual study for the GC inflation observed (directly genotyped and imputed data separately), and combined results across studies. We present the genome-wide distribution of association statistics in Supplementary Figure 1. We note that, after study-specific genomic control adjustment, the estimated inflation factor for the stage-1 meta-analysis test statistic was 1.04.

In the second, we combined GC-uncorrected data from DGI, FUSION and WTCCC for stage 1 meta-analysis and did not correct the meta-analysis test statistics for the overall GC (to guard against over-conservativeness in the estimate of strength of association for interesting signals). We also present the genome-wide distribution of these statistics in Supplementary Figure 1.

For the combination of data across stages 1, 2 and 3, we also adopted these two strategies (of using GC-corrected and GC-uncorrected stage 1 data). In the first, we performed individual GC-correction of DGI, FUSION and WTCCC stage 1 data prior to meta-analysis with stage 2 and stage 3 data (an approach which may be over-conservative where, as here, none of the T2D-associated SNPs has particular hallmarks of stratification) (Supplementary Note). In the second, we combined only uncorrected data (except for the deCODE data, where we have applied GC correction given a more marked genomic control inflation [GC ~1.3] in that sample). We present the resulting data from both approaches (of using GC-corrected and GC-uncorrected stage 1 data for stage 1-3 meta-analysis) in Supplementary Table 6 and a comparison of results (showing very small differences) in the Supplementary Note. All data presented elsewhere in the manuscript reflect the GC-corrected analysis strategy outcome.

Conditional analysis of T2D signals

For each SNP in Table 2, we assessed the additive SNP association in the stage 1 and 2 samples before and after including body mass index in the logistic regression model. For each genotyped and imputed SNP surrounding a specific T2D signal we assessed the additive SNP association in the stage 1 sample before and after including the Table 2 SNP from the same region in the model. We analyzed the data and adjusted for covariates for the stage 1 and stage 2 analysis of each sample. Data were combined across studies as described above. The ORs and CIs were calculated using a fixed-effects model and p values were calculated using the weighted z-score method. For the UK stage 1 samples, we did not have BMI information available for ~1,500 of the population-based controls. We therefore carried out the conditional BMI analyses by using all T2D cases and only those controls for whom BMI data were available.

Quantitative trait analyses

Quantitative trait analyses were carried out in the UK, DGI and FUSION samples for the 11 SNPs taken forward to stage 3. We tested BMI, quantitative glycemic traits (fasting and 2-hour glucose and insulin, HOMA-IR), lipid traits (total, HDL and LDL cholesterol, and serum triglycerides) and blood pressure (systolic and diastolic), where available, for association using an additive genetic model (Supplementary Methods).
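A minimal sketch of an additive-model test for a quantitative trait, assuming ordinary least squares on allele dosage (0, 1, 2) with no other covariates; the study's actual covariates and trait transformations are described in the Supplementary Methods, and the function name is hypothetical.

```python
import math


def additive_qt_test(dosage: list[float], trait: list[float]) -> tuple[float, float, float]:
    """OLS fit of trait = a + b * dosage; returns (b, SE(b), t = b / SE(b)).

    b is the per-allele change in the trait under an additive genetic model.
    """
    n = len(dosage)
    mean_g = sum(dosage) / n
    mean_y = sum(trait) / n
    sxx = sum((g - mean_g) ** 2 for g in dosage)
    sxy = sum((g - mean_g) * (t - mean_y) for g, t in zip(dosage, trait))
    b = sxy / sxx
    a = mean_y - b * mean_g
    # Residual variance on n - 2 degrees of freedom (intercept and slope estimated)
    rss = sum((t - a - b * g) ** 2 for g, t in zip(dosage, trait))
    se = math.sqrt(rss / (n - 2) / sxx)
    return b, se, b / se
```

In large samples the t statistic can be referred to a standard normal distribution; adding covariates such as age and sex would require a multiple regression rather than this single-predictor form.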

Supplementary Material

Supplementary Methods, Figures/Tables (1.4M, pdf)

Supplementary Table 5 (62K, xls)

Acknowledgements

UK: Collection of the UK type 2 diabetes cases was supported by Diabetes UK, BDA Research and the UK Medical Research Council (Biomedical Collections Strategic Grant G0000649). The UK Type 2 Diabetes Genetics Consortium collection was supported by the Wellcome Trust (Biomedical Collections Grant GR072960). The GWA genotyping was supported by the Wellcome Trust (076113) and replication genotyping by the European Commission (EURODIA LSHG-CT-2004-518153), MRC (Project Grant G016121), Wellcome Trust, Peninsula Medical School, and Diabetes UK. EZ is a Wellcome Trust Research Career Development Fellow. We acknowledge the contribution of Dr Michael Sampson, and our team of research nurses. We acknowledge the efforts of Jane Collier, Phil Robinson, Steven Asquith and others at Kbiosciences (http://www.kbioscience.co.uk/) for their rapid and accurate large-scale genotyping.

DGI: We thank the study participants who made this research possible. We thank colleagues in the Broad Genetic Analysis and Biological Samples Platforms for their expertise and contributions to genotyping, data and sample management, and analysis. The initial GWAS genotyping was supported by Novartis (to DA); support for additional analysis and genotyping in this report was provided by funding from the Broad Institute of Harvard and MIT, by The Richard and Susan Smith Family Foundation / American Diabetes Association Pinnacle Program Project Award (to DA), and by a Freedom to Discovery award of the Foundation of Bristol Myers Squibb (to DA). PIWdB, MJD and DA acknowledge support from NIH/NHLBI grant (U01 HG004171). DA was a Burroughs Wellcome Fund Clinical Scholar in Translational Research, and is a Distinguished Clinical Scholar of the Doris Duke Charitable Foundation. LG, TT, BI, MRT and the Botnia Study are principally supported by the Sigrid Juselius Foundation, the Finnish Diabetes Research Foundation, The Folkhalsan Research Foundation and Clinical Research Institute HUCH Ltd; work in Malmö, Sweden was also funded by a Linné grant from the Swedish Research Council (349-2006-237). We thank the Botnia and Skara research teams for clinical contributions, and colleagues at MGH, Harvard, Broad, Novartis and Lund for helpful discussions throughout the course of this work.

FUSION: We thank the Finnish citizens who generously participated in this study, and Ryan Welch for bioinformatics support. Support for this research was provided by NIH grants DK062370 (M.B.), DK072193 (K.L.M.), HL084729 (G.R.A.), HG002651 (G.R.A.), and U54 DA021519; National Human Genome Research Institute intramural project number 1 Z01 HG000024 (F.S.C.); and a post-doctoral fellowship award from the American Diabetes Association (C.J.W.). Genome-wide genotyping was performed by the Johns Hopkins University Genetic Resources Core Facility (GRCF) SNP Center at the Center for Inherited Disease Research (CIDR) with support from CIDR NIH Contract Number N01-HG-65403 and the GRCF SNP Center.

deCODE: We thank the Icelandic study participants whose contribution made this work possible. We also thank the nurses at Noatun (deCODE's sample recruitment center) and personnel at the deCODE core facilities for their hard work and enthusiasm.

KORA study: We thank Christian Gieger and Guido Fischer for expert data handling. The MONICA/KORA Augsburg studies were financed by the GSF-National Research Center for Environment and Health, Neuherberg, Germany and supported by grants from the German Federal Ministry of Education and Research (BMBF). Part of this work was financed by the German National Genome Research Network (NGFN). Our research was also supported within the Munich Center of Health Sciences (MC Health) as part of LMUinnovativ. We thank all members of field staffs who were involved in the planning and conduct of the MONICA/KORA Augsburg studies.

Danish study: This work was supported by the European Union (EUGENE2, grant no. LSHM-CT-2004-512013), Lundbeck Foundation centre of Applied Medical Genomics in Personalized Disease Prediction, Prevention and Care and The Danish Medical Research Council.

HUNT: The Nord-Trøndelag Health Study (The HUNT Study) is a collaboration between The HUNT Research Centre, Faculty of Medicine, Norwegian University of Science and Technology (NTNU), The National Institute of Public Health, The National Screening Service of Norway and The Nord-Trøndelag County Council.

NHS: The Nurses' Health Study is funded by National Cancer Institute grant CA87969. L.Q. is supported by an American Heart Association Scientist Development Grant. F.B.H. is supported by NIH grants DK58845 and U01 HG004399.

GEM Consortium: We thank all study participants. The work on the Cambridgeshire case-control, Ely, ADDITION and EPIC-Norfolk studies was funded by support from the Wellcome Trust and MRC. The Norfolk Diabetes study is funded by the MRC with support from NHS Research & Development and the Wellcome Trust. We are grateful to Dr Simon Griffin, MRC Epidemiology Unit, for assistance with the ADDITION study and Dr Mike Sampson and Dr Elizabeth Young for help with the Norfolk Diabetes Study. We thank Suzannah Bumpstead, William E Bottomley and Amy Chaney for rapid and accurate genotyping and Jilur Ghori for assay design and informatics support. We are grateful to Panos Deloukas for overall genotyping support. F.P. and I.B. are funded by the Wellcome Trust.

METSIM: The METSIM study has received grant support from the Academy of Finland (no. 124243).

for the Diabetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium
Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
Broad Institute of Harvard and Massachusetts Institute of Technology (MIT), Cambridge, MA 02142, USA.
Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA 02114, USA.
Department of Medicine, Massachusetts General Hospital, Boston, MA 02114, USA.
Department of Molecular Biology, Massachusetts General Hospital, Boston, MA 02114, USA.
Department of Medicine, Harvard Medical School, Boston, MA 02115, USA.
Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
Dept of Statistics, University of Oxford, Oxford UK
Division of Genetics, Brigham and Women's Hospital, Harvard-Partners Center for Genetics and Genomics, Boston, MA 02115, USA.
Department of Clinical Sciences, Diabetes and Endocrinology Research Unit, University Hospital Malmö, Lund University, Malmö, Sweden
Steno Diabetes Center, Copenhagen, Denmark
R & D Centre, Skaraborg Institute, Skövde, Sweden
Department of Physiology and Biophysics, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA.
Genome Technology Branch, National Human Genome Research Institute, Bethesda, MD 20892, USA
Faculty of Health Science, University of Aarhus, Aarhus, Denmark
Diabetes and Metabolism Disease Area, Novartis Institutes for BioMedical Research, 100 Technology Square, Cambridge, MA 02139, USA
Diabetes Research Group, Division of Medicine and Therapeutics, Ninewells Hospital and Medical School, Dundee, UK.
Genetics of Complex Traits, Institute of Biomedical and Clinical Science, Peninsula Medical School, Magdalen Road, Exeter, UK
Diabetes Genetics, Institute of Biomedical and Clinical Science, Peninsula Medical School, Barrack Road, Exeter, UK.
GSF - National Research Center for Environment and Health, Institute of Epidemiology, Neuherberg, Germany
Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, UK
Institute for Clinical Diabetes Research, German Diabetes Center, Leibniz Institute at Heinrich Heine University, Düsseldorf, Germany
Centre for Diabetes and Metabolic Medicine, Barts and The London, Royal London Hospital, Whitechapel, London, UK.
Malmska Municipal Health Center and Hospital, Jakobstad, Finland
Folkhälsan Research Center, Helsinki, Finland.
Research Centre for Prevention and Health, Glostrup University Hospital, Glostrup, Denmark
deCODE genetics, Sturlugata 8, 101 Reykjavik, Iceland.
Department of Medicine, University of Kuopio and Kuopio University Hospital, 70210, Kuopio, Finland.
MRC Epidemiology Unit, Institute of Metabolic Science, Addenbrooke's Hospital, Cambridge CB2 0QQ, UK.
Department of General Practice, University of Aarhus, Aarhus, Denmark.
Department of Genetics, University of North Carolina, Chapel Hill, NC 27599, USA.
HUNT Research Centre, Department of Public Health and General Practice, Faculty of Medicine, Norwegian University of Science and Technology (NTNU), 7650 Verdal, Norway.
Population Pharmacogenetics Group, Biomedical Research Centre, Ninewells Hospital and Medical School, Dundee, UK.
Metabolic Disease Group, Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.
Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology (NTNU), N-7600 Levanger, Norway.
Depts. of Nutrition and Epidemiology, Harvard School of Public Health, Boston, MA 02115, USA.
Channing Laboratory, Dept. of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA.
The MRC Centre for Causal Analyses in Translational Epidemiology, Bristol University, Canynge Hall, Whiteladies Rd, Bristol, UK.
Department of Medicine, Helsinki University Hospital, University of Helsinki, Helsinki, Finland.
Diabetes Unit, Department of Epidemiology and Health Promotion, National Public Health Institute, 00300 Helsinki, Finland.
Department of Public Health, University of Helsinki, 00014 Helsinki, Finland.
South Ostrobothnia Central Hospital, 60220 Seinäjoki, Finland.
Diabetes Research Group, School of Clinical Medical Sciences, Newcastle University, Framlington Place, Newcastle upon Tyne, UK.
Dept. of Preventative Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA.
Diabetes Unit, Massachusetts General Hospital, Boston, MA 02114, USA.
Correspondence to: Mark I McCarthy, Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Churchill Hospital, Old Road, Headington, Oxford, OX3 7LJ, UK; Tel: +44 1865 857298; Fax: +44 1865 857299; mark.mccarthy@drl.ox.ac.uk;
Michael Boehnke, Department of Biostatistics, School of Public Health, University of Michigan, 1420 Washington Heights, Ann Arbor, Michigan 48109-2029, USA; Tel: 734 936 1001; Fax: 734 615 8322; boehnke@umich.edu;
David Altshuler, Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, 7 Cambridge Center, Cambridge, MA 02142, USA; Tel: 617 726 5936; Fax: 617 643 3293; altshuler@molbio.mgh.harvard.edu.
These authors contributed equally.
Membership of the Wellcome Trust Case Control Consortium is provided in Supplementary Material.
Author contributions section

Writing team and Project Management:

Laura J Scott, Eleftheria Zeggini, Richa Saxena, Benjamin F Voight, David Altshuler, Michael Boehnke, Mark I McCarthy

Study Design:

Richa Saxena, Benjamin F Voight, Eleftheria Zeggini, Laura J Scott, Thomas E Hughes, Frank B Hu, Jeff J Roix, Hong Chen, Kari Stefansson, Oluf Pedersen, Thomas Illig, Kristian Hveem, Markku Laakso, Andrew T Hattersley, Inês Barroso, Nicholas J Wareham, Francis S Collins, Leif Groop, David Altshuler, Mark I McCarthy, Michael Boehnke

Analysis:

UK: Katherine S Elliott, Rachel M Freathy, Hana Lango, Cecilia M Lindgren, John RB Perry, Inga Prokopenko, Nigel W Rayner, Nicholas J Timpson, Michael N Weedon, Jonathan L Marchini, Eleftheria Zeggini

FUSION: Peter S Chines, Charles Ding, William L Duren, Tianle Hu, Anne U Jackson, Yun Li, Amanda F Marvelle, Li Qin, Heather M Stringham, Cristen J Willer, Gonçalo R Abecasis, Laura J Scott

DGI: Richa Saxena, Benjamin F Voight, Paul IW de Bakker, Finny G Kuruvilla, Peter Almgren, Mark J Daly

deCODE: Unnur Thorsteinsdottir, Augustine Kong

Danish: Niels Grarup, Gitte Andersen, Torben Hansen, Oluf Pedersen

HUNT: Kristian Midthjell

NHS: Lu Qi

GEM Consortium: Claudia Langenberg

METSIM: Markku Laakso

Clinical samples and genotyping:

UK: Alex SF Doney, Timothy M Frayling, Christopher J Groves, Graham A Hitman, Katharine R Owen, Colin NA Palmer, Beverley Shields, Mark Walker, Andrew D Morris, Andrew T Hattersley, Mark I McCarthy

FUSION: Lori L Bonnycastle, Parimal Deodhar, Michael R Erdos, Kari Kubalanza, Mario A Morken, Narisu Narisu, Matthew Rees, Amy M Swift, Richard N Bergman, Karen L Mohlke, Jaakko Tuomilehto, Richard M Watanabe

DGI: Kristin Ardlie, Kristina Bengtsson Boström, Noël P Burtt, Lauren Gianniny, Candace Guiducci, Bo Isomaa, Valeriya Lyssenko, Peter Nilsson, Marketa Sjögren, Tiinamaija Tuomi, Leif Groop

deCODE: Valgerdur Steinthorsdottir, Gudmar Thorleifsson, Kari Stefansson

KORA: Harald Grallert, Christian Herder, Christa Meisinger, Thomas Illig

Danish: Gitte Andersen, Niels Grarup, Torben Hansen, Torben Jørgensen, Torsten Lauritzen, Anelli Sandbæk, Knut Borch-Johnsen, Oluf Pedersen

HUNT: Kristian Midthjell, Elin Pettersen, Carl Platou, Kristian Hveem

NHS: Frank B Hu

GEM Consortium: Felicity Payne, Inês Barroso, Nicholas J Wareham

METSIM: Johanna Kuusisto

Abstract

Genome-wide association (GWA) studies have identified multiple new genomic loci at which common variants modestly but reproducibly influence risk of type 2 diabetes (T2D)111. Established associations to common and rare variants explain only a small proportion of the heritability of T2D. As previously published analyses had limited power to discover loci at which common alleles have modest effects, we performed meta-analysis of three T2D GWA scans encompassing 10,128 individuals of European descent and ~2.2 million SNPs (directly genotyped and imputed). Replication testing was performed in an independent sample with an effective sample size of up to 53,975. At least six new loci with robust evidence for association were detected, including the JAZF1 (p=5.0×10⁻¹⁴), CDC123/CAMK1D (p=1.2×10⁻¹⁰), TSPAN8/LGR5 (p=1.1×10⁻⁹), THADA (p=1.1×10⁻⁹), ADAMTS9 (p=1.2×10⁻⁸), and NOTCH2 (p=4.1×10⁻⁸) gene regions. The large number of loci with relatively small effects indicates the value of large discovery and follow-up samples in identifying additional clues about the inherited basis of T2D.


Genome-wide association studies are unbiased by previous hypotheses concerning candidate genes and pathways, but are challenged by the modest effect sizes of individual common susceptibility variants and the need for stringent statistical thresholds. For example, the largest allelic odds ratio of any established common variant for T2D is ~1.35 (TCF7L2), with the nine other validated associations to common variants (excluding FTO, which has its primary effect through obesity) having allelic odds ratios between 1.1 and 1.2 (refs 1-6,11,12). To augment power to detect additional loci of similar and/or smaller effect, we increased sample size by combining three previously published GWA studies (Diabetes Genetics Initiative [DGI], Finland-United States Investigation of NIDDM Genetics [FUSION], and Wellcome Trust Case Control Consortium [WTCCC])14, and extended SNP coverage by imputing untyped SNPs based on patterns of haplotype variation from the HapMap dataset13 (Table 1).

Table 1

Overview of study design.

Study | n cases# | n controls# | Effective sample size# | n directly genotyped SNPs* | n imputed SNPs*

Stage 1

DGI | 1,464 | 1,467 | 2,521 | 378,860 | 1,853,222
WTCCC | 1,924 | 2,938 | 4,706 | 393,143 | 1,915,393
FUSION | 1,161 | 1,174 | 2,335 | 306,222 | 2,110,199

Stage 2

DGI stage 2 | 5,065 | 5,785 | 9,874 | 63 | -
FUSION stage 2 | 1,215 | 1,258 | 2,473 | 59 | -
UK stage 2 | 3,757 | 5,346 | 9,114 | 66 | -

Stage 3

deCODE | 1,520 (1,422) | 25,235 (3,455) | 4,280 (3,130) | 11 | -
KORA | 1,241 | 1,458 | 2,684 | 6 | -
Danish | 4,089 | 5,043 | 8,690 | 11 | -
HUNT | 1,004 | 1,503 | 2,412 | 11 | -
NHS | 1,506 | 2,014 | 3,468 | 10 | -
CCC | 547 | 533 | 1,070 | 11 | -
EPIC | 388 | 774 | 1,036 | 10 | -
ADDITION/Ely | 892 | 1,610 | 2,288 | 11 | -
Norfolk | 2,311 | 2,400 | 4,450 | 11 | -
METSIM | 659 | 2,639 | 2,136 | 11 | -
* Autosomal SNPs passing quality control, as defined for directly genotyped and imputed SNPs in each study (QC criteria: SNPTEST information measure ≥0.5; r̂² ≥0.3; MAF >0.01). For the stage 1 meta-analysis, we combined results for 2,168,847 directly genotyped and imputed SNPs passing QC in all three studies (Methods; Supplementary Methods).
# Sample sizes presented here are the maximum available for each study. For the deCODE stage 3 study, we used genotype data from the Icelandic GWA scan for rs2641348, rs7578597 and rs9472138, and a perfect proxy (rs2793831, based on HapMap) for rs10923931. The remaining SNPs had not been directly typed as part of this scan and were therefore genotyped separately, in a subset of the GWA scan samples (numbers indicated in parentheses) (Supplementary Methods).

We started with a set of genotyped autosomal SNPs that passed quality control (QC) filters in each study: in WTCCC, 393,143 SNPs from the Affymetrix 500k chip (minor allele frequency [MAF]>0.01; 1,924 cases and 2,938 population-based controls from the Wellcome Trust Case Control Consortium34); in DGI, 378,860 SNPs from the Affymetrix 500k chip (MAF>0.01; Swedish and Finnish sample of 1,464 T2D cases and 1,467 normoglycaemic controls, including 326 discordant sibships1); and in FUSION, 306,222 SNPs from the Illumina 317k chip (MAF>0.01; 1,161 T2D cases and 1,174 normal glucose tolerant controls from Finland2) (Supplementary Table 1). There were 44,750 SNPs (MAF>0.01) directly genotyped in all three studies across the two platforms. We used data from the GWA studies and phased chromosomes from the HapMap CEU sample to impute autosomal SNPs with MAF>0.01 (14 and Y.L., C.J.W., J.D., P.S., G.R.A. Markov model for rapid haplotyping and genotype imputation in genome wide studies. Submitted, 2007; http://www.sph.umich.edu/csg/abecasis/MaCH/download/). We based our further analyses on 2,168,847 SNPs that met imputation and genotyping QC criteria across all studies (Methods; Supplementary Methods).

Using these directly measured and imputed genotypes, we tested for association of each SNP with T2D in each study separately, corrected each study for residual population stratification, cryptic relatedness or technical artifacts using genomic control, and then combined these results in a genome-wide meta-analysis across a total of 10,128 samples (4,549 cases, 5,579 controls) (Methods; Supplementary Methods). We calculated that this sample size provides reasonable power to detect additional variants with properties similar to those previously identified by less formal data combination efforts124 (Supplementary Table 2). Unless otherwise indicated, results presented are derived from individually genomic control-adjusted stage 1 results. Meta-analysis ORs and confidence intervals were obtained from a fixed-effects model, and p values from a weighted z-statistic-based meta-analysis (Methods; Supplementary Methods). As expected, the most significant result was obtained for rs7903146 in TCF7L2. We also observed evidence for association (p<10) at eight of the ten established T2D loci (as well as at the FTO obesity locus)12 (Supplementary Table 3). This is unsurprising, as these same data supported discovery of many of these loci. Since our goal was to identify new loci, we excluded 1,981 SNPs in the immediate vicinity of these T2D susceptibility loci from further analysis (with the exception of a signal near PPARG, which was followed up), and examined the remainder of the autosomal genome (Methods; Supplementary Methods). Even after excluding known loci, we saw a strong enrichment of highly associated variants: 426 with p values <10⁻⁴, compared with 217 expected under the null.
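The two combination rules named above can be sketched in a few lines of Python. This is a minimal illustration, not the study's software: `fixed_effects` pools log ORs with inverse-variance weights, and `weighted_z` combines signed per-study z-scores with weights equal to the square root of each study's effective sample size, a weighting scheme assumed here (the exact weights are given in the Supplementary Methods). `se_from_ci` recovers a standard error from a published 95% CI; all three names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def se_from_ci(lower: float, upper: float, level: float = 0.95) -> float:
    """Recover the SE of log(OR) from a CI that is symmetric on the log scale."""
    z = _STD_NORMAL.inv_cdf(0.5 + level / 2)
    return (math.log(upper) - math.log(lower)) / (2 * z)


def fixed_effects(log_ors: list[float], ses: list[float]) -> tuple[float, float]:
    """Inverse-variance weighted (fixed-effects) pooled log OR and its SE."""
    weights = [1.0 / s ** 2 for s in ses]
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))


def weighted_z(z_scores: list[float], n_eff: list[float]) -> tuple[float, float]:
    """Weighted z-score meta-analysis with weights sqrt(effective sample size).

    z-scores must be signed relative to the same (risk) allele in every study.
    Returns (combined z, two-sided p).
    """
    w = [math.sqrt(n) for n in n_eff]
    z = sum(wi * zi for wi, zi in zip(w, z_scores)) / math.sqrt(sum(wi * wi for wi in w))
    return z, 2 * _STD_NORMAL.cdf(-abs(z))


if __name__ == "__main__":
    # Stage 1, 2 and 3 summary ORs (95% CI) for rs864745 (JAZF1), from Table 2
    stages = [(1.14, 1.07, 1.20), (1.08, 1.04, 1.12), (1.10, 1.06, 1.15)]
    b, se = fixed_effects([math.log(o) for o, _, _ in stages],
                          [se_from_ci(lo, hi) for _, lo, hi in stages])
    print(round(math.exp(b), 2), round(math.exp(b - 1.96 * se), 2),
          round(math.exp(b + 1.96 * se), 2))
```

Pooling the three stage-level estimates for rs864745 this way gives roughly 1.10 (1.07-1.13), in line with the combined estimate in Table 2; combining already-pooled, rounded stage summaries is only an approximation to pooling the individual studies.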

Before proceeding to follow-up, we explored the individual studies and combined data for potential errors and biases. We found a genomic control λ value of 1.04 for the combined results (based on 10,128 samples), which, given the relationship between λ and sample size15, suggests little residual confounding (Supplementary Figure 1; Supplementary Note). We also used genome-wide genotype data to estimate the principal components (PC) of the identity-by-state relationships in each stage 1 sample. For the SNPs presented in Table 2, adjustment for principal components in stage 1 T2D association analysis did not diminish the association in the WTCCC (2 PCs), FUSION (10 PCs), or DGI (10 PCs) sample (Supplementary Note). Additionally, we found no evidence for association between UK population ancestry informative markers3 and disease status in the UK replication sets (Supplementary Note). To ensure that the observed stage 1 associations taken forward to follow-up were not due to imputation errors, we directly genotyped originally imputed variants in the stage 1 samples (Methods; Supplementary Methods). We found strong agreement between the genotype-based and imputed p values (in 38 of 43 cases where a direct genotype-based result was obtained, the p value was within one order of magnitude of that from imputation, and in the remaining 5 cases p values were less than 2 orders of magnitude different) (Supplementary Table 4).
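The dependence of λ on sample size can be made concrete with one widely used rescaling, λ1000, which expresses the inflation as if the study had 1,000 cases and 1,000 controls: λ1000 = 1 + (λ − 1) × (1/n_cases + 1/n_controls) / (1/1000 + 1/1000). This is offered as an assumption about the form of the cited relationship, and treating the three-study meta-analysis as a single case-control sample is a further simplification; the function name is hypothetical.

```python
def lambda_1000(lam: float, n_cases: int, n_controls: int) -> float:
    """Rescale an observed genomic control lambda to an equivalent study of
    1,000 cases and 1,000 controls (assumed rescaling formula)."""
    return 1.0 + (lam - 1.0) * (1.0 / n_cases + 1.0 / n_controls) / (1.0 / 1000 + 1.0 / 1000)


if __name__ == "__main__":
    # Combined stage 1 figures from the text: lambda = 1.04, 4,549 cases, 5,579 controls
    print(f"{lambda_1000(1.04, 4549, 5579):.4f}")
```

Under this assumption, λ = 1.04 in 10,128 samples corresponds to a λ1000 of about 1.008, illustrating why a value of 1.04 in a sample of this size suggests little residual confounding.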

Table 2

Eleven T2D-associated SNPs taken forward to stages 2 and 3.

Stage 1: DGI, FUSION, WTCCC. Stage 2: DGI, FUSION, UKT2D. Stage 3: deCODE, KORA, Steno, HUNT, NHS, CCC, EPIC, ADDITION/Ely, Norfolk, METSIM.

SNP | Chr | Position NCBI35 (bp) | Non-risk allele# | Risk allele# | Risk allele frequency# | Nearest gene(s) | Stage 1 OR (95% CI) | Stage 1 P | Stage 2 OR (95% CI) | Stage 2 P | Stage 3 OR (95% CI) | Stage 3 P | All data neff | All data OR (95% CI) | All data P | All data Phet | n samples for 80% power##
rs864745 | 7 | 27,953,796 | C | T | 0.501 | JAZF1 | 1.14 (1.07-1.20) | 1.5E-04 | 1.08 (1.04-1.12) | 8.1E-05 | 1.10 (1.06-1.15) | 1.3E-07 | 59,617 | 1.10 (1.07-1.13) | 5.0E-14 | 0.70 | 10,610
rs12779790 | 10 | 12,368,016 | A | G | 0.183 | CDC123/CAMK1D | 1.15 (1.06-1.24) | 4.2E-04 | 1.11 (1.06-1.16) | 5.4E-05 | 1.09 (1.04-1.14) | 1.5E-04 | 62,366 | 1.11 (1.07-1.14) | 1.2E-10 | 0.67 | 9,334
rs7961581 | 12 | 69,949,369 | T | C | 0.269 | TSPAN8/LGR5 | 1.18 (1.10-1.26) | 1.8E-05 | 1.06 (1.02-1.11) | 9.8E-03 | 1.09 (1.04-1.13) | 4.3E-05 | 62,301 | 1.09 (1.06-1.12) | 1.1E-09 | 0.20 | 23,206
rs7578597 | 2 | 43,644,474 | C | T | 0.902 | THADA | 1.25 (1.12-1.40) | 1.8E-04 | 1.15 (1.07-1.22) | 1.6E-03 | 1.12 (1.05-1.20) | 9.2E-05 | 60,832 | 1.15 (1.10-1.20) | 1.1E-09 | 0.008 | 9,624
rs4607103 | 3 | 64,686,944 | T | C | 0.761 | ADAMTS9 | 1.13 (1.06-1.22) | 5.4E-04 | 1.10 (1.05-1.15) | 1.0E-04 | 1.06 (1.01-1.11) | 3.5E-03 | 62,387 | 1.09 (1.06-1.12) | 1.2E-08 | 0.17 | 9,748
rs10923931* | 1 | 120,230,001 | G | T | 0.106 | NOTCH2 | 1.30 (1.17-1.43) | 1.1E-04 | 1.09 (1.03-1.16) | 2.9E-03 | 1.11 (1.05-1.18) | 1.9E-03 | 58,667 | 1.13 (1.08-1.17) | 4.1E-08 | 0.004 | 21,568
rs1153188 | 12 | 53,385,263 | T | A | 0.733 | DCD | 1.15 (1.08-1.23) | 3.2E-05 | 1.07 (1.03-1.12) | 3.1E-03 | 1.06 (1.02-1.10) | 8.8E-03 | 62,301 | 1.08 (1.05-1.11) | 1.8E-07 | 0.79 | 17,808
rs17036101** | 3 | 12,252,845 | A | G | 0.927 | SYN2/PPARG | 1.33 (1.18-1.50) | 1.0E-05 | 1.13 (1.04-1.22) | 4.5E-03 | 1.11 (1.02-1.20) | 1.2E-02 | 59,682 | 1.15 (1.10-1.21) | 2.0E-07 | 0.19 | 16,370
rs2641348* | 1 | 120,149,926 | A | G | 0.107 | ADAM30 | 1.14 (1.05-1.25) | 1.4E-03 | 1.10 (1.03-1.17) | 1.2E-03 | 1.09 (1.03-1.16) | 7.8E-03 | 60,048 | 1.10 (1.06-1.15) | 4.0E-07 | 0.08 | 17,428
rs9472138 | 6 | 43,919,740 | C | T | 0.282 | VEGFA | 1.13 (1.06-1.21) | 5.4E-05 | 1.07 (1.02-1.12) | 1.5E-03 | 1.03 (1.00-1.07) | 9.5E-02 | 63,537 | 1.06 (1.04-1.09) | 4.0E-06 | 0.43 | 16,696
rs10490072 | 2 | 60,581,582 | C | T | 0.724 | BCL11A | 1.17 (1.10-1.26) | 3.4E-05 | 1.08 (1.03-1.13) | 1.4E-03 | 1.00 (0.97-1.04) | 6.5E-01 | 59,682 | 1.05 (1.03-1.08) | 1.0E-04 | 0.0035 | 13,502

Maximum available effective sample size: stage 1, 9,562; stage 2, 21,461; stage 3, 32,514.

Table 2 presents results from the analysis of directly genotyped data only, except for FUSION stage 1 results for rs7961581 (Supplementary Methods). Combined estimates of odds ratio (OR) were calculated using a fixed-effects, inverse-variance meta-analysis; DGI discordant sibling pairs were not included in OR estimates. P values were combined using a weighted z score-based meta-analysis including DGI sibships; p values for the three stage 1 studies were individually corrected by genomic control before meta-analysis.

# Ancestral allele is denoted in bold, based on Entrez SNP and derived by comparison against chimpanzee sequence. The risk allele frequencies presented are sample size-weighted risk allele frequencies across the stage 2 studies.
* SNPs rs10923931 and rs2641348 appear to represent the same signal (r²=0.92 in HapMap CEU)13. Results for rs2934381 and rs2793831 (perfect proxies for rs10923931) are presented for UK (stages 1 and 2) and deCODE (stage 3), respectively.
** The signal at SNP rs17036101 is indistinguishable from rs1801282, the established P12A variant in PPARG.
## Sample size (sum of case and control samples) required for 80% power (to achieve nominal replication at α=0.05), calculated from the stage 2 OR estimate and the sample size-weighted risk allele frequency across the stage 2 studies, assuming equal numbers of cases and controls (Supplementary Methods).
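The power footnote does not spell out the formula (it is in the Supplementary Methods). One standard approach that matches the tabulated sample sizes closely for the rows checked (for example, about 10,610 for rs864745) is a two-sided allelic test with a normal approximation to log(OR), treating the stage 2 risk allele frequency as the control frequency. That choice, and the function names, are assumptions of this sketch rather than the authors' documented method.

```python
import math
from statistics import NormalDist


def case_raf(control_raf: float, odds_ratio: float) -> float:
    """Risk allele frequency in cases implied by the control frequency and allelic OR."""
    odds = odds_ratio * control_raf / (1.0 - control_raf)
    return odds / (1.0 + odds)


def n_for_power(odds_ratio: float, raf: float, alpha: float = 0.05, power: float = 0.80) -> float:
    """Total N (equal cases and controls) for a two-sided allelic test at `alpha`
    to reach `power`, using a normal approximation to log(OR).

    Assumes `raf` is the control risk allele frequency.
    """
    p0 = raf
    p1 = case_raf(p0, odds_ratio)
    # With N/2 cases and N/2 controls there are N alleles per group, so
    # Var(log OR) = v / N, with v the Woolf variance terms for the allele table.
    v = 1 / p1 + 1 / (1 - p1) + 1 / p0 + 1 / (1 - p0)
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return z ** 2 * v / math.log(odds_ratio) ** 2


if __name__ == "__main__":
    # Stage 2 OR and risk allele frequency for rs864745 (JAZF1), from Table 2
    print(round(n_for_power(1.08, 0.501)))
```

Because the required N scales with 1/log(OR)², the small stage 2 ORs (1.06 to 1.15) drive the required sizes into the range of roughly 9,000 to 23,000 samples seen in the table.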

We selected SNPs for replication based principally on the statistical evidence for association in stage 1, excluding SNPs with evidence for heterogeneity of ORs (p<10) across studies (Methods; Supplementary Methods). Sixty-nine SNPs were taken forward to an initial round of replication (stage 2) in up to 22,426 additional samples of European descent (Table 1; Supplementary Table 1). The distribution of association p values in stage 2 was highly inconsistent with a null distribution. Of the 69 signals selected for follow-up, 65 were successfully genotyped in stage 2 and represented loci that were independent of each other and of previously established susceptibility loci. Nine of these had a p value ≤0.01 with association in the same direction as the original signal, far in excess of the 0.33 expected under the null (p=1.4×10, binomial test; Supplementary Methods), and two SNPs had p<10⁻⁴, compared with an expectation of 0.0033 (p=5.2×10⁻⁶) (Supplementary Methods; Supplementary Table 5).

We identified eleven SNPs (ten separate signals, nine of which represent novel loci) with p<0.005 in stage 2, for which the combined stage 1 and stage 2 data (based on direct genotyping of stage 1 samples, where previously imputed) generated p<10. These eleven SNPs were further genotyped in up to 57,366 additional samples (14,157 cases and 43,209 controls) of European descent in stage 3 (Table 1; Supplementary Table 1; Methods; Supplementary Methods). The distribution of p values for these 11 SNPs was again inconsistent with a null distribution: all nine new and independent SNPs had effects in the same direction as in the stage 1 + 2 meta-analysis (p=0.002), and seven had p<0.05 in the direction of the original association (p=2.1×10⁻¹⁰) (Table 2).
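The enrichment arguments in the two paragraphs above are binomial tail calculations. A minimal sketch, assuming independent SNPs and that, under the null, a SNP passes a two-sided threshold in the originally observed direction with probability equal to half the threshold (which reproduces the expected counts of 0.33 and 0.0033 for 65 SNPs); `binom_tail` is a hypothetical helper name.

```python
from math import comb


def binom_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Stage 2: 65 genotyped, independent SNPs. A two-sided threshold t is passed
# in the original direction with probability t / 2 under the null (assumed).
n_stage2 = 65
print("expected at p<=0.01, same direction:", n_stage2 * 0.01 / 2)
print("expected at p<1e-4, same direction:", n_stage2 * 1e-4 / 2)
print("P(>=2 of 65 at p<1e-4):", binom_tail(2, n_stage2, 1e-4 / 2))

# Stage 3: nine new and independent SNPs
print("P(all 9 in the same direction):", binom_tail(9, 9, 0.5))
print("P(>=7 of 9 at p<0.05, same direction):", binom_tail(7, 9, 0.05 / 2))
```

Under these assumptions the tail probabilities come out at about 5.2×10⁻⁶ (at least two of 65 SNPs at p<10⁻⁴), 0.002 (all nine SNPs in the same direction) and 2.1×10⁻¹⁰ (at least seven of nine at p<0.05).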

Based on the combined stage 1-3 analyses, six signals reach compelling levels of evidence (p=5.0×10⁻⁸ or better) for T2D association (Table 2). As in all LD-mapping approaches, characterization of the causal variants responsible, their effect sizes, and the genes through which they act will require extensive resequencing and fine-mapping. However, on current evidence, the most associated variants in each of these signals map to intron 1 of JAZF1, between CDC123 and CAMK1D, between TSPAN8 and LGR5, in exon 24 of THADA, near ADAMTS9, and in intron 5 of NOTCH2.

The strongest statistical evidence for a novel association signal was with rs864745 in intron 1 of the JAZF1 gene (Figure 1), one of a cluster of associated SNPs with strong evidence for association in the stage 1 meta-analysis and across each replication sample set (Table 2; Supplementary Table 6). The overall estimate of effect was an OR[95%CI] of 1.10[1.07-1.13] (p=5.0×10⁻¹⁴ under an additive model), based on 68,042 individuals. The JAZF1 (juxtaposed with another zinc finger gene 1) gene encodes a transcriptional repressor of NR2C2 (nuclear receptor subfamily 2, group C, member 2)16. Mice deficient in Nr2c2 exhibit growth retardation, low IGF1 serum levels, and perinatal and early postnatal hypoglycaemia17. While this paper was in review, a SNP in JAZF1 was identified as associated with prostate cancer18; this is particularly interesting given the recent finding that SNPs in TCF2 are also associated both with T2D and prostate cancer1920.

Figure 1

Regional plots of six confirmed associations. For each of the (A) JAZF1, (B) CDC123/CAMK1D, (C) TSPAN8/LGR5, (D) THADA, (E) ADAMTS9 and (F) NOTCH2/ADAM30 regions, genotyped and imputed SNPs passing QC across all three stage 1 studies are plotted with their meta-analysis p values (as −log10 values) as a function of genomic position (NCBI Build 35). In each panel, the SNP taken forward to stages 2 and 3 is represented by a blue diamond (meta-analysis p value across stages 1-3), and its initial p value in stage 1 data is denoted by a red diamond. Estimated recombination rates (taken from HapMap)13 are plotted to reflect the local LD structure around the associated SNPs and their correlated proxies (shown on a white-to-red scale from r²=0 to r²=1, based on pairwise r² values from HapMap CEU)13. Gene annotations were taken from the University of California-Santa Cruz genome browser.

The second strongest new statistical signal (rs12779790, combined OR[95%CI] of 1.11[1.07-1.14], p=1.2×10⁻¹⁰) lies in an intergenic region ~90 kb from the CDC123 (cell division cycle 123 homolog [S. cerevisiae]) and ~63.5 kb from the CAMK1D (calcium/calmodulin-dependent protein kinase ID) genes (Figure 1; Table 2; Supplementary Table 6). CDC123 is regulated by nutrient availability in S. cerevisiae and plays a role in cell cycle regulation21. Evidence from previous GWA studies implicating variants in CDKAL1 and near CDKN2A/B in T2D predisposition suggests that cell cycle dysregulation may be a common pathogenetic mechanism in T2D124.

The third strongest statistical signal resides upstream of the TSPAN8 (tetraspanin 8) gene (rs7961581; combined OR[95%CI]: 1.09[1.06-1.12], p=1.1×10⁻⁹) (Figure 1; Table 2; Supplementary Table 6). Tetraspanin 8 is a cell-surface glycoprotein expressed in carcinomas of the colon, liver and pancreas.

The fourth strongest novel association signal (rs7578597, a non-synonymous SNP [T1187A]; combined OR[95%CI] of 1.15[1.10-1.20], p=1.1×10⁻⁹) resides in exon 24 of the widely expressed THADA (thyroid adenoma associated) gene (Figure 1; Table 2; Supplementary Table 6). Disruption of THADA by chromosomal rearrangements (including fusion with intronic sequence from PPARG) is observed in thyroid adenomas22. The function of THADA has not been well characterized, but there is some evidence to suggest it may be involved in the death receptor pathway and apoptosis23.

Rs4607103 (combined OR[95%CI]: 1.09[1.06-1.12], p=1.2×10⁻⁸), representing a cluster of associated SNPs, resides ~38 kb upstream of the ADAMTS9 (ADAM metallopeptidase with thrombospondin type 1 motif, 9) gene, and is the fifth strongest new signal (Figure 1; Table 2; Supplementary Table 6). ADAMTS9 is a secreted metalloprotease that cleaves the proteoglycans versican and aggrecan, and is expressed widely, including in skeletal muscle and pancreas.

The sixth strongest signal, marked by rs10923931, resides within intron 5 of the NOTCH2 (Notch homolog 2 [Drosophila]) gene (combined OR[95%CI]: 1.13[1.08-1.17], p=4.1×10^-8) (Figure 1; Table 2; Supplementary Table 6). Rs2641348 is a non-synonymous SNP (L359P) within the neighboring ADAM30 (ADAM metallopeptidase domain 30) gene. It represents the same signal (r²=0.92 based on HapMap CEU data) and was also followed up, but its overall signal was slightly weaker (combined OR[95%CI]: 1.10[1.06-1.15], p=4.0×10; Table 2). NOTCH2 is a type 1 transmembrane receptor. In mice, the Notch-2 receptor is expressed during pancreatic organogenesis in embryonic ductal cells of the branching pancreatic buds, which are the likely source of endocrine and exocrine stem cells24.
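LD values quoted from HapMap CEU measure how well one SNP tags another. The most common pairwise measure, r², follows directly from haplotype and allele frequencies. Below is a minimal sketch of that standard definition. It assumes the frequencies are already known; in practice they are estimated from phased HapMap data. The frequencies used here are hypothetical, not HapMap values, and `ld_r2` is a hypothetical name.

```python
def ld_r2(p_ab: float, p_a: float, p_b: float) -> float:
    """Pairwise LD r^2 between two biallelic SNPs.

    p_ab: frequency of haplotypes carrying allele A at SNP 1 and allele B at SNP 2
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    """
    d = p_ab - p_a * p_b  # LD coefficient D: excess of A-B haplotypes over independence
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Hypothetical frequencies: two SNPs with 10% minor allele frequency whose
# minor alleles co-occur on most haplotypes.
print(round(ld_r2(p_ab=0.095, p_a=0.10, p_b=0.10), 2))  # 0.89
```

An r² near 1, as for the NOTCH2 and ADAM30 SNPs, means the two markers carry nearly the same association information. This is why the two SNPs cannot be treated as independent signals.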

The association evidence for the remaining four variants taken into stage 3 does not meet our prespecified threshold of p≤5.0×10^-8. However, their individual significance values and overall distribution suggest that several of these also represent genuine association signals. In all, three of these additional SNPs showed p values <10 across the combined data (Table 2), and two showed p<0.05 in stage 3 in the same direction as in stages 1 and 2. Variants near DCD (dermcidin) showed evidence for association (rs1153188; overall p=1.8×10) (Supplementary Figure 2). A signal in VEGFA had previously been noted in the UK GWA scan4, but it shows inconsistent evidence of replication, and further studies will be required to establish its status. We also found association at rs17036101, ~44 kb downstream of SYN2 (synapsin II) and 115.3 kb upstream of the established T2D susceptibility variant rs1801282 (Pro12Ala) in the PPARG gene (r²=0.54 in HapMap CEU) (Supplementary Figure 3). Conditional analyses in the combined stage 1 and 2 samples could not distinguish between the effects of these two SNPs (Supplementary Note; Supplementary Table 7).

None of the 11 SNPs (Table 2) were convincingly associated with body mass index (BMI) (Supplementary Table 8) or other T2D-related traits (with p<10) (Supplementary Table 9). The largest fold-change in T2D association p values before and after adjusting for BMI was for rs17036101 (p=8.1×10 before adjustment and p=7.5×10 after adjustment for BMI; Supplementary Table 10). Conditioning on the associated SNP that was taken forward to stages 2 and 3 in each region revealed no additional independent association signals (p<10) in stage 1 data (Supplementary Figure 4; Supplementary Note).

We combined three GWA scans involving 10,128 samples, enhanced through imputation, and undertook large-scale replication in up to 79,792 additional samples. In doing so, we have identified six additional loci in the human genome that appear to harbor common variants influencing susceptibility to T2D. These findings are consistent with a model in which most loci detectable through the GWA approach have modest effects (ORs between 1.1 and 1.2), at least with current arrays and indirect, LD-based mapping. Under such a model, our study would be expected to recover only a subset of the loci with similar characteristics, because we followed up only 69 signals out of over 2 million meta-analysed SNPs. The loci recovered would be those that managed to reach our stage 1 selection criteria. Further efforts to expand GWA meta-analyses and to increase the number of SNPs taken forward to large-scale replication should confirm additional loci, as should targeted analysis of copy number variation. The modest ORs reported here do not, however, capture the full effects at these loci. Effect sizes estimated from the first common SNP marker detected in a region are certainly underestimates. The effect of the actual common causal mutation(s) will typically be larger, because current estimates are obtained indirectly, through the marker's LD with the causal variant. Moreover, the same genes likely carry rare mutations of larger effect. Three genes with common variants that influence risk of T2D (KCNJ11, WFS1 and TCF2) were first discovered through rare Mendelian mutations. Regardless of effect size, these loci provide important clues to the processes involved in maintaining normal glucose homeostasis and in the pathogenesis of T2D.
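A textbook power calculation makes concrete why a study of this size recovers only a subset of loci with ORs of 1.1 to 1.2. The sketch below is not the authors' power analysis. It makes these assumptions:

- a normal approximation to the per-allele (allelic) test;
- a hypothetical even case/control split of the 10,128 stage 1 samples;
- a hypothetical risk-allele frequency of 0.3;
- a significance threshold of α=5×10^-8, chosen here as the conventional genome-wide level.

The function name `allelic_test_power` is also hypothetical.

```python
import math
from statistics import NormalDist


def allelic_test_power(odds_ratio, maf, n_cases, n_controls, alpha=5e-8):
    """Approximate power of a two-sided per-allele (allelic) association test.

    Assumptions: a multiplicative allelic model, `maf` is the risk-allele
    frequency in controls, and the variance of the log allelic odds ratio is
    Woolf's formula evaluated at the expected allele counts (2 per person).
    """
    norm = NormalDist()
    p0 = maf                                  # risk-allele frequency in controls
    odds1 = odds_ratio * p0 / (1 - p0)        # risk-allele odds in cases
    p1 = odds1 / (1 + odds1)                  # risk-allele frequency in cases
    var_log_or = (1 / (2 * n_cases * p1) + 1 / (2 * n_cases * (1 - p1))
                  + 1 / (2 * n_controls * p0) + 1 / (2 * n_controls * (1 - p0)))
    expected_z = math.log(odds_ratio) / math.sqrt(var_log_or)
    z_crit = norm.inv_cdf(1 - alpha / 2)      # two-sided threshold
    # Probability of exceeding the threshold in the risk direction; the chance
    # of crossing it in the opposite direction is negligible here.
    return norm.cdf(expected_z - z_crit)


# Hypothetical design: 10,128 samples split evenly, risk-allele frequency 0.3.
for odds_ratio in (1.10, 1.15, 1.20):
    power = allelic_test_power(odds_ratio, maf=0.3, n_cases=5064, n_controls=5064)
    print(f"OR {odds_ratio:.2f}: power {power:.2f}")
```

Under these assumptions, power at OR 1.1 is only around 1%, while at OR 1.2 it is about 70%. A single discovery scan of this size would therefore reach genome-wide significance for only some loci with effects in this range.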

References

  • 1. Diabetes Genetics Initiative. Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science. 2007;316:1331–1336.
  • 2. Scott LJ, et al. A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science. 2007;316:1341–1345.
  • 3. Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007;447:661–678.
  • 4. Zeggini E, et al. Replication of genome-wide association signals in UK samples reveals risk loci for type 2 diabetes. Science. 2007;316:1336–1341.
  • 5. Steinthorsdottir V, et al. A variant in CDKAL1 influences insulin response and risk of type 2 diabetes. Nat. Genet. 2007;39:770–775.
  • 6. Sladek R, et al. A genome-wide association study identifies novel risk loci for type 2 diabetes. Nature. 2007;445:881–885.
  • 7. Florez JC, et al. A 100K genome-wide association scan for diabetes and related traits in the Framingham Heart Study: replication and integration with other genome-wide datasets. Diabetes. 2007;56:3063–3074.
  • 8. Rampersaud E, et al. Identification of novel candidate genes for type 2 diabetes from a genome-wide association scan in the Old Order Amish: evidence for replication from diabetes-related quantitative traits and from independent populations. Diabetes. 2007;56:3053–3062.
  • 9. Hanson RL, et al. A search for variants associated with young-onset type 2 diabetes in American Indians in a 100K genotyping array. Diabetes. 2007;56:3045–3052.
  • 10. Hayes MG, et al. Identification of type 2 diabetes genes in Mexican Americans through genome-wide association studies. Diabetes. 2007;56:3033–3044.
  • 11. Salonen J, et al. Type 2 diabetes whole-genome association study in four populations: the DiaGen consortium. Am. J. Hum. Genet. 2007;81:338–345.
  • 12. McCarthy MI, Zeggini E. Genome-wide association scans for Type 2 diabetes: new insights into biology and therapy. Trends Pharmacol. Sci. 2007;28:598–601.
  • 13. International HapMap Consortium. A second generation human haplotype map of over 3.1 million SNPs. Nature. 2007;449:851–861.
  • 14. Marchini J, Howie B, Myers S, McVean G, Donnelly P. A new multipoint method for genome-wide association studies by imputation of genotypes. Nat. Genet. 2007;39:906–913.
  • 15. Freedman ML, et al. Assessing the impact of population stratification on genetic association studies. Nat. Genet. 2004;36:388–393.
  • 16. Nakajima T, Fujino S, Nakanishi G, Kim YS, Jetten AM. TIP27: a novel repressor of the nuclear orphan receptor TAK1/TR4. Nucleic Acids Res. 2004;32:4194–4204.
  • 17. Collins LL, et al. Growth retardation and abnormal maternal behavior in mice lacking testicular orphan nuclear receptor 4. Proc. Natl. Acad. Sci. USA. 2004;101:15058–15063.
  • 18. Thomas G, et al. Multiple loci identified in a genome-wide association study of prostate cancer. Nat. Genet. Published online 10 February 2008; doi:10.1038/ng.91.
  • 19. Gudmundsson J, et al. Two variants on chromosome 17 confer prostate cancer risk, and the one in TCF2 protects against type 2 diabetes. Nat. Genet. 2007;39:977–983.
  • 20. Winckler W, et al. Evaluation of common variants in the six known maturity-onset diabetes of the young (MODY) genes for association with type 2 diabetes. Diabetes. 2007;56:685–693.
  • 21. Bieganowski P, Shilinski K, Tsichlis PN, Brenner C. Cdc123 and checkpoint forkhead associated with RING proteins control the cell cycle by controlling eIF2gamma abundance. J. Biol. Chem. 2004;279:44656–44666.
  • 22. Drieschner N, et al. Evidence for a 3p25 breakpoint hot spot region in thyroid tumors of follicular origin. Thyroid. 2006;16:1091–1096.
  • 23. Drieschner N, et al. A domain of the thyroid adenoma associated gene (THADA) conserved in vertebrates becomes destroyed by chromosomal rearrangements observed in thyroid adenomas. Gene. 2007;403:110–117.
  • 24. Lammert E, Brown J, Melton DA. Notch gene expression during pancreatic organogenesis. Mech. Dev. 2000;94:199–203.
  • 25. Higgins JP, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. BMJ. 2003;327:557–560.
  • 26. Risch N, Merikangas K. The future of genetic studies of complex human diseases. Science. 1996;273:1516–1517.
  • 27. Lin S, Chakravarti A, Cutler DJ. Exhaustive allelic transmission disequilibrium tests as a new approach to genome-wide association studies. Nat. Genet. 2004;36:1181–1188.