Multiple type 2 diabetes susceptibility genes following genome-wide association scan in UK samples
Abstract
The molecular mechanisms involved in the development of type 2 diabetes are poorly understood. Starting from genome-wide genotype data for 1,924 diabetic cases and 2,938 population controls generated by the Wellcome Trust Case Control Consortium, we set out to detect replicated diabetes association signals through analysis of 3,757 additional cases and 5,346 controls, and by integration of our findings with equivalent data from other international consortia. We detected diabetes susceptibility loci in and around the genes CDKAL1, CDKN2A/CDKN2B and IGF2BP2 and confirmed the recently described associations at HHEX/IDE and SLC30A8. Our findings provide insights into the genetic architecture of type 2 diabetes, emphasizing the contribution of multiple variants of modest effect. The regions identified underscore the importance of pathways influencing pancreatic beta cell development and function in the etiology of type 2 diabetes.
The pathophysiological basis of type 2 diabetes (T2D) remains unclear despite its growing global importance (1). Candidate gene and positional cloning efforts have suggested many putative susceptibility variants, but unequivocal replications are so far limited to variants in just three genes: PPARG, KCNJ11 and TCF7L2 (2-4).
Improved understanding of the correlation between genetic variants (linkage disequilibrium [LD]), allied to advances in genotyping technology, has enabled systematic searches for disease-associated common variants on a genome-wide scale. The Wellcome Trust Case Control Consortium (WTCCC) recently completed such a genome-wide association (GWA) scan in 1,924 T2D cases and 2,938 population controls from the UK, using the Affymetrix GeneChip Human Mapping 500k Array Set (5). The strongest association signals genome-wide were observed for single nucleotide polymorphisms (SNPs) in TCF7L2 (e.g. rs7901695: OR 1.37 [95% CI 1.25-1.49], p=6.7×10−13). The other known T2D susceptibility variants were detected with effect sizes consistent with previous reports (2, 3).
Here, we describe how integration of data from the WTCCC scan and our own replication studies with similar information generated by the Diabetes Genetics Initiative [DGI] (6) and the Finland-United States Investigation of NIDDM Genetics [FUSION] (7) has identified several additional susceptibility variants for T2D.
In the WTCCC study, analysis of 490,032 autosomal SNPs in 16,179 samples yielded 459,448 SNPs that passed initial quality control (5). We considered only the 393,453 autosomal SNPs with minor allele frequency (MAF) exceeding 1% in both cases and controls and no extreme departure from Hardy Weinberg equilibrium (HWE) (p<10 in cases or controls) (8). This T2D-specific dataset shows no evidence of substantial confounding from population substructure and genotyping biases (8).
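The per-SNP filter described above (minor allele frequency and Hardy-Weinberg checks applied separately to cases and controls) can be sketched as follows. This is a minimal illustration rather than the consortium's pipeline: the function names are hypothetical, a 1-df chi-square goodness-of-fit test stands in for whatever HWE test was actually used, and both thresholds are passed in as arguments instead of being fixed in the code.

```python
import math


def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Minor allele frequency from the counts of the three genotypes."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    freq_a = (2 * n_aa + n_ab) / total_alleles
    return min(freq_a, 1.0 - freq_a)


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test against Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip cells with zero expectation (monomorphic SNPs).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with one degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(case_counts, control_counts, maf_min, hwe_p_min):
    """Keep a SNP only if cases AND controls pass both filters.

    maf_min and hwe_p_min are caller-supplied thresholds (assumptions here).
    """
    for counts in (case_counts, control_counts):
        if minor_allele_frequency(*counts) <= maf_min:
            return False
        if hwe_chi2_pvalue(*counts) < hwe_p_min:
            return False
    return True
```

For rare alleles an exact HWE test is generally preferable to the chi-square approximation used in this sketch.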
To distinguish true associations from those reflecting fluctuations under the null or residual errors arising from aberrant allele calling, we first submitted putative signals from the WTCCC study to additional quality control including cluster plot visualization and validation genotyping on a second platform (8). Next, we attempted replication of selected signals in up to 3,757 additional cases and 5,346 controls (replication sets RS1-RS3). RS1 comprised 2,022 cases and 2,037 controls from the UK Type 2 Diabetes Genetics Consortium collection (UKT2DGC: all from Tayside, Scotland). RS2 included 632 additional T2D cases and 1,750 population controls from the Exeter Family Study of Child Health (EFSOCH). A subset of SNPs were typed in RS3, comprising a further 1,103 cases and 1,559 controls from the UKT2DGC (Table S1).
The first wave of validated SNPs sent for replication was selected from the 30 SNPs, in 9 distinct chromosomal regions (excluding TCF7L2), which had, in the WTCCC scan alone, attained the most extreme (p<10) significance values on Cochran-Armitage tests of association. Genotyping of 21 representative SNPs generated evidence of replication (p<0.05) for three of these 9 regions (Tables 1, S2).
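For readers unfamiliar with the Cochran-Armitage test used to rank SNPs, the sketch below computes the standard 1-df trend statistic from genotype counts in cases and controls. It assumes additive scores of 0, 1 and 2 for the three genotypes; the function name and the example counts are illustrative and are not taken from the study data.

```python
import math


def cochran_armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Return (chi2, p_value) for the 1-df Cochran-Armitage trend test.

    cases, controls: genotype counts ordered by the number of copies
    (0, 1, 2) of the tested allele. weights: genotype scores (additive
    by default, which is an assumption of this sketch).
    """
    totals = [r + s for r, s in zip(cases, controls)]
    n = sum(totals)
    n_cases = sum(cases)
    n_controls = n - n_cases
    sum_wr = sum(w * r for w, r in zip(weights, cases))
    sum_wn = sum(w * t for w, t in zip(weights, totals))
    sum_w2n = sum(w * w * t for w, t in zip(weights, totals))
    numerator = n * (n * sum_wr - n_cases * sum_wn) ** 2
    denominator = n_cases * n_controls * (n * sum_w2n - sum_wn ** 2)
    chi2 = numerator / denominator
    # Upper tail of a chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: the tested allele is enriched in cases.
chi2, p = cochran_armitage_trend(cases=(10, 20, 30), controls=(30, 20, 10))
```

Because the statistic tests for a linear trend in case proportion across genotypes, it does not assume Hardy-Weinberg equilibrium in the combined sample.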
Table 1
Sample sizes. WTCCC: 1,924 cases, 2,938 controls. Replication meta-analysis: 3,757 cases, 5,346 controls. All UK sample meta-analysis: 5,681 cases, 8,284 controls. DGI: 6,529 cases, 7,252 controls. FUSION: 2,376 cases, 2,432 controls. All combined: 14,586 cases, 17,968 controls.
rs | chr | position | A1 | A2 | Region | WTCCC OR (95% CI) | WTCCC Padd | Replication OR (95% CI) | Replication Padd | All UK OR (95% CI) | All UK Padd | DGI OR (95% CI) | DGI Padd | FUSION OR (95% CI) | FUSION Padd | All combined OR (95% CI) | All combined Padd |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
rs8050136 | 16 | 52373776 | A | C | FTO | 1.27 (1.16-1.37) | 2.0×10−8 | 1.22 (1.12-1.32) | 5.4×10−7 | 1.23 (1.18-1.32) | 7.3×10−14 | 1.03 (0.91-1.17) | 0.25 | 1.11 (1.02-1.20) | 0.017 | 1.17 (1.12-1.22) | 1.3×10−12 |
rs10946398 | 6 | 20769013 | A | C | CDKAL1 | 1.20 (1.10-1.31) | 2.5×10−5 | 1.14 (1.07-1.22) | 8.4×10−5 | 1.16 (1.10-1.22) | 1.3×10−8 | 1.08 (1.03-1.14) | 2.4×10−3 | 1.12 (1.03-1.22) | 9.5×10−3 | 1.12 (1.08-1.16) | 4.1×10−11 |
rs5015480 | 10 | 94455539 | C | T | HHEX | 1.22 (1.12-1.33) | 5.4×10−6 | - | - | | | | | | | | |
rs1111875 | 10 | 94452862 | C | T | HHEX | - | - | 1.08 (1.01-1.15) | 0.020 | 1.13 (1.07-1.19) | 4.6×10−6 | 1.14 (1.06-1.22) | 1.7×10−4 | 1.10 (1.01-1.19) | 0.025 | 1.13 (1.08-1.17) | 5.7×10−10 |
rs10811661 | 9 | 22124094 | C | T | CDKN2B | 1.22 (1.09-1.37) | 7.6×10−4 | 1.18 (1.08-1.28) | 1.7×10−4 | 1.19 (1.11-1.28) | 4.9×10−7 | 1.20 (1.12-1.28) | 5.4×10−8 | 1.20 (1.07-1.36) | 2.2×10−3 | 1.20 (1.14-1.25) | 7.8×10−15 |
rs564398 | 9 | 22019547 | C | T | CDKN2B | 1.16 (1.07-1.27) | 3.2×10−4 | 1.12 (1.05-1.19) | 8.6×10−4 | 1.13 (1.08-1.19) | 1.3×10−6 | 1.05 (0.94-1.17) | 0.5 | 1.13 (1.01-1.27) | 0.039 | 1.12 (1.07-1.17) | 1.2×10−7 |
rs4402960 | 3 | 186994389 | G | T | IGF2BP2 | 1.15 (1.05-1.25) | 1.7×10−3 | 1.09 (1.01-1.16) | 0.018 | 1.11 (1.05-1.16) | 1.6×10−4 | 1.17 (1.11-1.23) | 1.7×10−9 | 1.18 (1.08-1.28) | 2.4×10−4 | 1.14 (1.11-1.18) | 8.6×10−16 |
rs13266634 | 8 | 118253964 | C | T | SLC30A8 | 1.12 (1.02-1.23) | 0.020 | 1.12 (1.04-1.19) | 1.2×10−3 | 1.12 (1.05-1.18) | 7.0×10−5 | 1.07 (1.00-1.16) | 0.047 | 1.18 (1.09-1.29) | 7.0×10−5 | 1.12 (1.07-1.16) | 5.3×10−8 |
rs7901695 | 10 | 114744078 | C | T | TCF7L2 | 1.37 (1.25-1.49) | 6.7×10−13 | - | - | - | - | 1.38 (1.31-1.46) | 2.3×10−31 | 1.34 (1.21-1.49) | 1.4×10−8 | 1.37 (1.31-1.43) | 1.0×10−48 |
rs5215 | 11 | 17365206 | C | T | KCNJ11 | 1.15 (1.05-1.25) | 1.3×10−3 | - | - | - | - | 1.15 (1.09-1.21) | 1.0×10−7 | 1.11 (1.02-1.20) | 0.014 | 1.14 (1.10-1.19) | 5.0×10−11 |
rs1801282 | 3 | 12368125 | C | G | PPARG | 1.23 (1.09-1.41) | 1.3×10−3 | - | - | - | - | 1.09 (1.01-1.16) | 0.019 | 1.20 (1.07-1.33) | 1.4×10−3 | 1.14 (1.08-1.20) | 1.7×10−6 |
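The combined columns of Table 1 pool per-study odds ratios. The exact procedure is described in the supporting material rather than here, so the following is only a sketch of one common approach, fixed-effect inverse-variance pooling on the log odds ratio scale, with each standard error recovered from the published 95% confidence interval. The function names are hypothetical, and the inputs are the rounded all-UK, DGI and FUSION estimates for rs10946398 (CDKAL1) taken from Table 1.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal


def log_or_and_se(odds_ratio, ci_low, ci_high):
    """Log odds ratio and its standard error, recovered from a 95% CI."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    return beta, se


def fixed_effect_meta(estimates):
    """Inverse-variance pooling of (OR, ci_low, ci_high) tuples.

    Returns (pooled_or, ci_low, ci_high, two_sided_p).
    """
    weight_sum = 0.0
    weighted_beta = 0.0
    for odds_ratio, ci_low, ci_high in estimates:
        beta, se = log_or_and_se(odds_ratio, ci_low, ci_high)
        weight = 1.0 / se ** 2
        weight_sum += weight
        weighted_beta += weight * beta
    pooled_beta = weighted_beta / weight_sum
    pooled_se = math.sqrt(1.0 / weight_sum)
    z = pooled_beta / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return (
        math.exp(pooled_beta),
        math.exp(pooled_beta - Z_95 * pooled_se),
        math.exp(pooled_beta + Z_95 * pooled_se),
        p_value,
    )


# rs10946398 (CDKAL1): all UK, DGI and FUSION rows of Table 1.
cdkal1 = [(1.16, 1.10, 1.22), (1.08, 1.03, 1.14), (1.12, 1.03, 1.22)]
pooled_or, pooled_low, pooled_high, pooled_p = fixed_effect_meta(cdkal1)
```

With these rounded inputs the pooled estimate comes out close to the 1.12 (1.08-1.16) reported in the final columns of Table 1. Small discrepancies are expected because the published intervals are rounded to two decimal places.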
The SNP rs8050136 (mapping to the FTO [fat mass and obesity associated] gene region on chromosome 16) was among a cluster of SNPs generating the strongest evidence for association outside TCF7L2 in the original scan (risk allele OR 1.27 [1.16-1.37], p=2.0×10−8) (Figure S1). This SNP showed strong replication (OR 1.22 [1.12-1.32], p=5.4×10−7). As we recently reported (9), this effect on T2D risk is mediated through a primary effect on adiposity, and adjustment for BMI abolishes the T2D association.
Replication was also obtained for SNPs within the CDKAL1 locus on chromosome 6, including rs9465871 and rs10946398. Although rs9465871 generated the stronger signal in the WTCCC scan, replication at this SNP was modest (p=0.023). The replication signal at rs10946398 was more striking (OR 1.14 [1.07-1.22], p=8.4×10−5) (Tables 1, S2). Consistent evidence of association is provided by the DGI (p=4.1×10 at rs7754840) and FUSION groups (p=9.5×10−3 at rs471253) (Tables 1, S3) (6, 7), both SNPs being strong (r²>0.99) proxies for rs10946398. Across all studies, combined evidence for association at CDKAL1 is compelling (p~4.1×10−11). All associated SNPs map to a large (90kb) intron within CDKAL1 (Figure 1). Flanking recombination hotspots define a 200kb interval likely to contain the etiological variant(s). CDKAL1 (cyclin-dependent kinase 5 [CDK5] regulatory subunit associated protein 1-like 1) encodes a 579-residue, 65kD protein of unknown function. We have detected expression of CDKAL1 mRNA in human pancreatic islet and skeletal muscle (Figure S2). CDKAL1 shares considerable protein domain and amino acid homology with CDK5 regulatory subunit associated protein 1 (CDK5RAP1), a known inhibitor of CDK5 activation. CDK5 has been implicated in the regulation of pancreatic beta cell function, through formation of p35/CDK5 complexes that downregulate insulin expression (11, 12).
The third replicated association maps to the HHEX (homeobox, hematopoietically expressed) gene region on chromosome 10. This gene both showed strong association in the WTCCC scan (rs5015480: risk allele OR 1.22 [1.12-1.33], p=5.4×10−6) and is a powerful biological candidate (13, 14). We could not optimize a replication assay for rs5015480, but observed evidence for replication at a perfect proxy, rs1111875 (risk allele OR 1.08 [1.01-1.15], p=0.02) (Tables 1, S2, S3). Both DGI and FUSION studies showed modest but consistent association signals, generating strong combined evidence (p~5.7×10−10) for a role in T2D susceptibility (Tables 1, S3). A fourth genome-wide association scan, in French subjects, recently reported independent evidence for a T2D signal in this region (10). The signal resides within an extended (295kb) region of LD containing not only HHEX (highly expressed in fetal and adult pancreas [Figure S2]) but also the genes encoding kinesin-interacting factor (KIF11) and insulin degrading enzyme (IDE) (Figure S3). IDE represents a second strong biological candidate given postulated effects on both insulin signalling and islet function, and data from rodent models (15-17).
Of the remaining regions selected in the first wave, none showed any evidence of replication in UK samples (Table S2), and for none was there strong support from the DGI and FUSION scans.
The relatively strict thresholds imposed for SNP selection in the first wave (i.e. point-wise p<10) help to limit false discovery, but many genuine susceptibility variants will fail to reach them. We therefore initiated a second wave of replication based around SNPs for which the WTCCC scan generated more modest evidence for association (Cochran-Armitage p ~10 to 10). We prioritized the 5,367 SNPs in this range using additional criteria: (a) evidence of association in DGI and FUSION (6, 7); (b) presence of multiple, independent (r²<0.4) associations within the same locus; and (c) biological candidacy (8, 18).
Analysis of the 56 SNPs, representing 49 putative signals, selected for this “second wave” of replication (Table S4) yielded two further regions implicated in T2D susceptibility. A cluster of SNPs on chromosome 9 (represented by rs10811661) generated a promising signal in all three scans. Replication was observed in UK samples (rs10811661: OR 1.18 [1.08-1.28], p=1.7×10−4), as well as DGI (p=2.2×10) and FUSION follow-up studies (rs2383208, p=9.7×10). A second signal from the WTCCC scan located ~100kb 5′ (rs564398, OR 1.16 [1.07-1.27], p=3.2×10−4) was weakly supported in the FUSION, but not the DGI, scan and replicated in the UK RS samples (OR 1.12 [1.05-1.19], p=8.6×10−4) (Tables 1, S3).
These two association signals are separated by a recombination hotspot (D′ between rs10811661 and rs564398 is 0.057, r²<0.001) (Figure 2). Across all studies, the combined evidence for association is stronger for the 3′ (p~7.8×10−15) than the 5′ (p~1.2×10−7) peak (Table 1). The 3′ signal maps to sequence with no characterized genes, while the recombination interval enclosing the 5′ signal includes the full coding sequences of CDKN2B and CDKN2A (encoding p15 and p16 respectively). CDKN2A is a known tumour suppressor and its product, p16, inhibits CDK4 (cyclin-dependent kinase 4), a powerful regulator of pancreatic beta cell replication (19-21). Overexpression of Cdkn2a leads to decreased islet proliferation in ageing mice (22). Cdkn2b overexpression is also causally related to islet hypoplasia and diabetes in murine models (23). Both CDKN2B and CDKN2A display high levels of expression in pancreatic islets and pituitary (Figure S2).
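The independence of the two chromosome 9 signals is expressed through pairwise LD measures. As a reminder of how D′ and r² are defined, the sketch below computes both from haplotype and allele frequencies. The function name and the example frequencies are illustrative assumptions and are not estimates from the study samples.

```python
def pairwise_ld(p_ab, p_a, p_b):
    """Return (D, D_prime, r_squared) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at the first locus
          and allele B at the second.
    p_a, p_b: frequencies of allele A and allele B.
    """
    d = p_ab - p_a * p_b
    # D' rescales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r^2 is the squared correlation between alleles at the two loci.
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r_squared


# Hypothetical frequencies: the rarer allele B always travels with A,
# so D' is maximal while r^2 remains low.
d, d_prime, r_squared = pairwise_ld(p_ab=0.1, p_a=0.5, p_b=0.1)
```

When both measures are close to zero, as reported for rs10811661 and rs564398, genotypes at one SNP carry essentially no information about the other, which is why the two signals can be treated as statistically independent.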
A fifth replicated association lies within the IGF2BP2 gene on chromosome 3. We observed some evidence of association for SNPs in this region in the WTCCC scan (5) (e.g. rs4402960: OR 1.15 [1.05-1.25], p=1.7×10−3). Consistent associations in the DGI and FUSION scans (6, 7), together with the biological candidacy of the gene (a known regulator of insulin-like growth factor 2 [IGF2] translation), prompted replication. We obtained only modest evidence for replication at rs4402960 (OR 1.09 [1.01-1.16], p=0.018) (Tables 1, S4), but combined evidence across all studies (p~8.6×10−16) establishes this as a genuine T2D signal (Tables 1, S3). The associated SNPs map to a 57kb region spanning the promoter and first two exons of IGF2BP2 (Figure S4).
Most of the remaining 50 “second wave” SNPs can be discounted as susceptibility variants based on their failure to replicate (Table S4), though some merit further consideration. One such example is rs9369425, located 57kb downstream of the VEGFA (vascular endothelial growth factor A) gene on chromosome 6 (Figure S5). Evidence for association in the WTCCC scan (OR 1.16 [1.06-1.27], p=8.6×10) is supported by nominal replication in UK samples (1.08 [1.01-1.15], p=0.03) and by DGI scan results (1.17 [1.04-1.32], p=4.4×10). While no signal is apparent in the FUSION study, this does not allow us to reject this association. For 80% power to detect an OR of 1.11 (α=0.05), over 3,000 case-control pairs are needed.
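The sample-size statement above can be explored with a simple normal-approximation power calculation for an allelic case-control comparison. This is a sketch under stated assumptions, not the calculation used by the authors: the function names are hypothetical, equal numbers of cases and controls are assumed, and the control allele frequency is a free parameter because the relevant frequency is not given in the text.

```python
import math
from statistics import NormalDist


def case_allele_freq(control_freq, odds_ratio):
    """Allele frequency in cases implied by a per-allele odds ratio."""
    odds = odds_ratio * control_freq / (1.0 - control_freq)
    return odds / (1.0 + odds)


def pairs_needed(odds_ratio, control_freq, alpha=0.05, power=0.80):
    """Approximate number of case-control pairs for a two-sided allelic test."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    # Variance of the log odds ratio contributed by one pair;
    # each individual carries two alleles.
    var_per_pair = 1.0 / (2.0 * p1 * (1.0 - p1)) + 1.0 / (2.0 * p0 * (1.0 - p0))
    n = (z_alpha + z_beta) ** 2 * var_per_pair / math.log(odds_ratio) ** 2
    return math.ceil(n)


# Hypothetical control allele frequency of 0.3 (an assumption).
n_pairs = pairs_needed(odds_ratio=1.11, control_freq=0.3)
```

Under the assumed allele frequency of 0.3 the sketch returns somewhat more than 3,000 pairs, in line with the figure quoted above. The requirement grows quickly as the allele becomes rarer or the odds ratio approaches 1.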
In the French genome-wide scan (10), variants in both the HHEX and SLC30A8 genes were implicated in T2D susceptibility. As the associated SNPs in SLC30A8 are poorly captured on the Affymetrix chip (r²<0.01), the WTCCC scan was not informative for this locus. However, we genotyped rs13266634 independently and obtained replication of the finding, both in all UK data (risk allele OR 1.12 [1.05-1.18], p=7.0×10−5) and across all three studies (p~5.3×10−8; Tables 1, S4).
The present analysis has contributed to identification of several confirmed T2D susceptibility loci. One of these (FTO) exerts its primary effect on T2D risk through an impact on adiposity (9); none of the other signals was attenuated by adjustment for BMI or waist circumference (Tables S5-S7). One of the remaining four loci (HHEX/IDE) represents a strong replication of findings recently reported (10). The other three loci (near CDKAL1, IGF2BP2 and CDKN2A), all showing extensive replication across the three studies, represent novel T2D susceptibility loci.
Across the four T2D scans completed (5, 6, 7, 10), TCF7L2 clearly emerges as the largest association signal. On current evidence, all other confirmed loci display more modest effect sizes (between 1.10 and 1.25 per allele). Extensive resequencing and fine-mapping will be required to define the full spectrum of etiological variation at each locus and these may yet identify variants with greater impact. Our findings offer clear lessons for the design of future studies. Robust identification of variants with such effect sizes is only feasible with large-scale sample sets (13,965 individuals were typed in the present study). Further, the exchange of data between groups (providing data on up to 32,554 samples) was key to the rapid and unequivocal identification of the signals we report.
As a result of the four GWA studies reported to date (5, 6, 7, 10), the number of genuine, replicated T2D susceptibility signals has climbed from 3 to 9 (adding HHEX/IDE, SLC30A8, CDKAL1, CDKN2A, IGF2BP2 and FTO). However, these loci explain only a small proportion of the observed familiality (the sibling relative risk, λs, attributable to all loci in the UK samples is only ~1.07). We expect additional loci to be revealed by further rounds of replication initiated by more systematic meta-analysis of these and other scans. Our study provides an important validation of the genome-wide indirect association mapping approach and a demonstration of the value of aggressive data sharing efforts. It also generates insights into T2D pathogenesis emphasizing the likely importance of pathways involved in pancreatic beta cell development, regeneration and function. In-depth physiological and functional studies are now needed to establish the precise mechanisms involved.
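The sibling relative risk attributable to a set of loci can be approximated from allele frequencies and per-allele effects. The sketch below uses the standard single-locus expression for a multiplicative model and multiplies contributions across loci. It is an illustration only: it assumes the odds ratio approximates the per-allele relative risk, assumes loci act independently and multiplicatively, and uses hypothetical allele frequencies, since none are given in the text. The function names are likewise hypothetical.

```python
def locus_lambda_s(risk_allele_freq, allelic_rr):
    """Sibling relative risk for one locus under a multiplicative model.

    Genotype relative risks are 1, g and g**2, where g = allelic_rr.
    """
    p = risk_allele_freq
    q = 1.0 - p
    w = p * q * (allelic_rr - 1.0) ** 2 / (p * allelic_rr + q) ** 2
    return (1.0 + w / 2.0) ** 2


def combined_lambda_s(loci):
    """Product of per-locus values for (risk_allele_freq, allelic_rr) pairs."""
    total = 1.0
    for risk_allele_freq, allelic_rr in loci:
        total *= locus_lambda_s(risk_allele_freq, allelic_rr)
    return total


# Effect sizes from the combined column of Table 1 (TCF7L2, CDKAL1);
# the allele frequencies are hypothetical placeholders.
example = combined_lambda_s([(0.3, 1.37), (0.3, 1.12)])
```

Because each common variant of modest effect contributes only slightly more than 1 to the product, even several confirmed loci together account for a small share of familial aggregation, as noted above.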
Footnotes
Supporting Online Material
Materials and Methods
Replication set-only combined effect size estimation
Haplotype-based analysis results
Overlap between association and linkage signals
Quantitative trait analysis results
Departures from additivity
Data access details
Membership of WTCCC
Detailed acknowledgements
References
Figs. S1, S2, S3, S4, S5, S6, S7, S8
Tables S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
REFERENCES
- 1. Stumvoll M, Goldstein BJ, van Haeften TW. Lancet. 2005;365:1333–1346.
- 2. Altshuler D, et al. Nat. Genet. 2000;26:76–80.
- 3. Gloyn AL, et al. Diabetes. 2003;52:568–572.
- 4. Grant SF, et al. Nat. Genet. 2006;38:320–323.
- 5. Donnelly P, the WTCCC, personal communication. Data from the Wellcome Trust Case Control Consortium scan.
- 6. Diabetes Genetics Initiative. Science. (this issue)
- 7. Scott LJ, et al. Science. (this issue)
- 8. Materials and methods are available as supporting material on Science Online.
- 9. Frayling TM, et al. Science. Published online April 12 2007.
- 10. Sladek R, et al. Nature. 2007;445:881–885.
- 11. Ubeda M, Rukstalis JM, Habener JF. J. Biol. Chem. 2006;281:28858–28864.
- 12. Wei FY, et al. Nat. Med. 2005;11:1104–1108.
- 13. Foley AC, Mercola M. Genes Dev. 2005;19:387–396.
- 14. Bort R, Martinez-Barbera JP, Beddington RS, Zaret KS. Development. 2004;131:797–806.
- 15. Fakhrai-Rad H, et al. Hum. Mol. Genet. 2000;9:2149–2158.
- 16. Seta KA, Roth RA. Biochem. Biophys. Res. Commun. 1997;231:167–171.
- 17. Farris W, et al. Am. J. Pathol. 2004;164:1425–1434.
- 18. GeneSniffer ( ) was used to prioritize genes for biological candidacy by integrating information from diverse online databases (including Entrez Gene, OMIM, PubMed and MGI).
- 19. Rane SG, et al. Nat. Genet. 1999;22:44–52.
- 20. Mettus RV, Rane SG. Oncogene. 2003;22:8413–8421.
- 21. Marzo N, et al. Diabetologia. 2004;47:686–694.
- 22. Krishnamurthy J, et al. Nature. 2006;443:453–457.
- 23. Moritani M, et al. Mol. Cell Endocrinol. 2005;229:175–184.
- 24. Blanchette M, et al. Genome Res. 2004;14:708–715.
- 25. Pettersson F, Jonsson O, Cardon LR. Bioinformatics. 2004;20:3241–3243.
- 26. Myers S, Bottolo L, Freeman C, McVean G, Donnelly P. Science. 2005;310:321.