Replication of genome-wide association signals in UK samples reveals risk loci for type 2 diabetes.
Journal: 2007/June - Science
ISSN: 1095-9203
Abstract:
The molecular mechanisms involved in the development of type 2 diabetes are poorly understood. Starting from genome-wide genotype data for 1924 diabetic cases and 2938 population controls generated by the Wellcome Trust Case Control Consortium, we set out to detect replicated diabetes association signals through analysis of 3757 additional cases and 5346 controls and by integration of our findings with equivalent data from other international consortia. We detected diabetes susceptibility loci in and around the genes CDKAL1, CDKN2A/CDKN2B, and IGF2BP2 and confirmed the recently described associations at HHEX/IDE and SLC30A8. Our findings provide insight into the genetic architecture of type 2 diabetes, emphasizing the contribution of multiple variants of modest effect. The regions identified underscore the importance of pathways influencing pancreatic beta cell development and function in the etiology of type 2 diabetes.
Science 316(5829): 1336-1341

Multiple type 2 diabetes susceptibility genes following genome-wide association scan in UK samples

+18 authors
Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Churchill Hospital, Oxford, OX3 7LJ, UK
Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford, OX3 7BN, UK
Genetics of Complex Traits, Institute of Biomedical and Clinical Science, Peninsula Medical School, Magdalen Road, Exeter EX1 2LU, UK
Diabetes Genetics group, Institute of Biomedical and Clinical Science, Peninsula Medical School, Barrack Road, Exeter EX2 5DW, UK
The MRC Centre for Causal Analyses in Translational Epidemiology, Bristol University, Canynge Hall, Whiteladies Rd, Bristol, BS2 8PR, UK
The Molecular Genetics Laboratory, Royal Devon and Exeter NHS Foundation Trust, Old Pathology Building, Barrack Road, Exeter, EX2 5DW, UK
Department of Statistics, University of Oxford, 1 South Parks Road, Oxford, OX1 3TG, UK
Diabetes Research Group, School of Clinical Medical Sciences, Newcastle University, Framlington Place, Newcastle upon Tyne NE2 4HH, UK
Centre for Diabetes and Metabolic Medicine, Barts and The London, Royal London Hospital, Whitechapel, London, E1 1BB, UK
Diabetes Research Group, Division of Medicine and Therapeutics, Ninewells Hospital and Medical School, Dundee, DD1 9SY, UK
Membership of the WTCCC is listed in supporting online text
Corresponding author: Prof Mark I McCarthy, Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Old Road, Headington, Oxford OX3 7LJ, UK. Tel: (44) 1865 857298. Fax: (44) 1865 857299. Email: mark.mccarthy@drl.ox.ac.uk
EZ, MNW, CML, TMF contributed equally to the work described. MIMcC and ATH contributed equally.


The pathophysiological basis of type 2 diabetes (T2D) remains unclear despite its growing global importance (1). Candidate gene and positional cloning efforts have suggested many putative susceptibility variants, but unequivocal replications are so far limited to variants in just three genes: PPARG, KCNJ11 and TCF7L2 (2-4).

Improved understanding of the correlation between genetic variants (linkage disequilibrium [LD]), allied to advances in genotyping technology, has enabled systematic searches for disease-associated common variants on a genome-wide scale. The Wellcome Trust Case Control Consortium (WTCCC) recently completed such a genome-wide association (GWA) scan in 1,924 T2D cases and 2,938 population controls from the UK, using the Affymetrix GeneChip Human Mapping 500k Array Set (5). The strongest association signals genome-wide were observed for single nucleotide polymorphisms (SNPs) in TCF7L2 (e.g. rs7901695: OR 1.37 [95% CI 1.25-1.49], p=6.7×10⁻¹³). The other known T2D susceptibility variants were detected with effect sizes consistent with previous reports (2, 3).

Here, we describe how integration of data from the WTCCC scan and our own replication studies with similar information generated by the Diabetes Genetics Initiative [DGI] (6) and the Finland-United States Investigation of NIDDM Genetics [FUSION] (7) has identified several additional susceptibility variants for T2D.

In the WTCCC study, analysis of 490,032 autosomal SNPs in 16,179 samples yielded 459,448 SNPs that passed initial quality control (5). We considered only the 393,453 autosomal SNPs with minor allele frequency (MAF) exceeding 1% in both cases and controls and no extreme departure from Hardy-Weinberg equilibrium (HWE) (p<10 in cases or controls) (8). This T2D-specific dataset shows no evidence of substantial confounding from population substructure and genotyping biases (8).
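The per-SNP filter described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names and genotype counts are hypothetical, the HWE check uses a 1 df chi-square goodness-of-fit test as a stand-in for whatever test the consortium actually applied, and the HWE cut-off is left as a caller-supplied argument rather than fixed in the code.

```python
import math


def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Minor allele frequency from genotype counts (AA, AB, BB)."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    freq_a = (2 * n_aa + n_ab) / total_alleles
    return min(freq_a, 1 - freq_a)


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P value of a 1 df chi-square test for departure from HWE."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: no departure can be tested
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n_aa, n_ab, n_bb), expected))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def passes_qc(case_counts, control_counts, hwe_p_threshold, maf_threshold=0.01):
    """Keep a SNP only if MAF and HWE criteria hold in BOTH groups."""
    for counts in (case_counts, control_counts):
        if minor_allele_frequency(*counts) <= maf_threshold:
            return False
        if hwe_chi2_pvalue(*counts) < hwe_p_threshold:
            return False
    return True
```

Applying the filter separately to cases and controls mirrors the wording of the criterion (a SNP is dropped if either group fails), which is why `passes_qc` loops over both sets of counts.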

To distinguish true associations from those reflecting fluctuations under the null or residual errors arising from aberrant allele calling, we first submitted putative signals from the WTCCC study to additional quality control including cluster plot visualization and validation genotyping on a second platform (8). Next, we attempted replication of selected signals in up to 3,757 additional cases and 5,346 controls (replication sets RS1-RS3). RS1 comprised 2,022 cases and 2,037 controls from the UK Type 2 Diabetes Genetics Consortium collection (UKT2DGC: all from Tayside, Scotland). RS2 included 632 additional T2D cases and 1,750 population controls from the Exeter Family Study of Child Health (EFSOCH). A subset of SNPs were typed in RS3, comprising a further 1,103 cases and 1,559 controls from the UKT2DGC (Table S1).

The first wave of validated SNPs sent for replication was selected from the 30 SNPs, in 9 distinct chromosomal regions (excluding TCF7L2), which had, in the WTCCC scan alone, attained the most extreme (p<10) significance values on Cochran-Armitage tests of association. Genotyping of 21 representative SNPs generated evidence of replication (p<0.05) for three of these 9 regions (Tables 1, S2).
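For readers unfamiliar with the Cochran-Armitage test used to rank SNPs, a minimal sketch follows. It assumes additive genotype scores of 0, 1 and 2 and the usual 1 df chi-square approximation; the function name and the genotype counts in the usage line are hypothetical and are not taken from the WTCCC data.

```python
import math


def cochran_armitage_trend(case_counts, control_counts, scores=(0, 1, 2)):
    """Return (chi2, p) for the 1 df trend test on a 2x3 genotype table.

    case_counts and control_counts hold the numbers of individuals carrying
    0, 1 and 2 copies of the tested allele.
    """
    col_totals = [r + s for r, s in zip(case_counts, control_counts)]
    n_cases = sum(case_counts)
    n_controls = sum(control_counts)
    n_total = n_cases + n_controls

    sum_xr = sum(x * r for x, r in zip(scores, case_counts))
    sum_xn = sum(x * n for x, n in zip(scores, col_totals))
    sum_x2n = sum(x * x * n for x, n in zip(scores, col_totals))

    numerator = n_total * (n_total * sum_xr - n_cases * sum_xn) ** 2
    denominator = n_cases * n_controls * (n_total * sum_x2n - sum_xn ** 2)
    chi2 = numerator / denominator
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: the tested allele is enriched in cases.
chi2, p_value = cochran_armitage_trend((80, 200, 120), (120, 200, 80))
```

Because the test has a single degree of freedom, it is most sensitive to per-allele (additive) effects, which is the model under which the ORs in Table 1 are reported.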

Table 1

Confirmed T2D susceptibility variants

Representative SNPs are shown for each signal with ORs and 95% CIs reported (for the Cochran-Armitage 1df test) with respect to the risk allele (denoted in bold, with the ancestral allele underlined where known). SNPs selected for inclusion are those with the strongest evidence for association in the UK datasets (except in the case of TCF7L2, where, to maximize consistency across the datasets, rs7901695 is presented). In the case of HHEX, the UK meta-analysis combines data from rs5015480 and rs1111875 (r²=1 in HapMap CEU). Since DGI and FUSION had not typed the identical SNPs in all samples, results shown for those studies feature the SNP generating the strongest association: in all cases these were SNPs in strong LD (minimum r² 0.95, except TCF7L2) and with consistent direction of effect with the SNP reported in the UK data (see Table S3 for details). The use of different SNPs may result in slightly different estimates of p values and OR between the three studies. Combined estimates of the ORs were calculated by weighting the logORs of each study by the inverse of their variance.

Sample sizes: WTCCC, 1924 cases and 2938 controls; Replication meta-analysis, 3757 cases and 5346 controls; All UK sample meta-analysis, 5681 cases and 8284 controls; DGI, 6529 cases and 7252 controls; FUSION, 2376 cases and 2432 controls; All Combined, 14586 cases and 17968 controls.

| rs | chr | position | A1 | A2 | Region | WTCCC OR (95% CI) | WTCCC Padd | Replication OR (95% CI) | Replication Padd | All UK OR (95% CI) | All UK Padd | DGI OR (95% CI) | DGI Padd | FUSION OR (95% CI) | FUSION Padd | All Combined OR (95% CI) | All Combined Padd |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rs8050136 | 16 | 52373776 | A | C | FTO | 1.27 (1.16-1.37) | 2.0×10⁻⁸ | 1.22 (1.12-1.32) | 5.4×10⁻⁷ | 1.23 (1.18-1.32) | 7.3×10⁻¹⁴ | 1.03 (0.91-1.17) | 0.25 | 1.11 (1.02-1.20) | 0.017 | 1.17 (1.12-1.22) | 1.3×10⁻¹² |
| rs10946398 | 6 | 20769013 | A | C | CDKAL1 | 1.20 (1.10-1.31) | 2.5×10⁻⁵ | 1.14 (1.07-1.22) | 8.4×10⁻⁵ | 1.16 (1.10-1.22) | 1.3×10⁻⁸ | 1.08 (1.03-1.14) | 2.4×10⁻³ | 1.12 (1.03-1.22) | 9.5×10⁻³ | 1.12 (1.08-1.16) | 4.1×10⁻¹¹ |
| rs5015480 | 10 | 94455539 | C | T | HHEX | 1.22 (1.12-1.33) | 5.4×10⁻⁶ | - | - | | | | | | | | |
| rs1111875 | 10 | 94452862 | C | T | HHEX | - | - | 1.08 (1.01-1.15) | 0.020 | 1.13 (1.07-1.19) | 4.6×10⁻⁶ | 1.14 (1.06-1.22) | 1.7×10⁻⁴ | 1.10 (1.01-1.19) | 0.025 | 1.13 (1.08-1.17) | 5.7×10⁻¹⁰ |
| rs10811661 | 9 | 22124094 | C | T | CDKN2B | 1.22 (1.09-1.37) | 7.6×10⁻⁴ | 1.18 (1.08-1.28) | 1.7×10⁻⁴ | 1.19 (1.11-1.28) | 4.9×10⁻⁷ | 1.20 (1.12-1.28) | 5.4×10⁻⁸ | 1.20 (1.07-1.36) | 2.2×10⁻³ | 1.20 (1.14-1.25) | 7.8×10⁻¹⁵ |
| rs564398 | 9 | 22019547 | C | T | CDKN2B | 1.16 (1.07-1.27) | 3.2×10⁻⁴ | 1.12 (1.05-1.19) | 8.6×10⁻⁴ | 1.13 (1.08-1.19) | 1.3×10⁻⁶ | 1.05 (0.94-1.17) | 0.5 | 1.13 (1.01-1.27) | 0.039 | 1.12 (1.07-1.17) | 1.2×10⁻⁷ |
| rs4402960 | 3 | 186994389 | G | T | IGF2BP2 | 1.15 (1.05-1.25) | 1.7×10⁻³ | 1.09 (1.01-1.16) | 0.018 | 1.11 (1.05-1.16) | 1.6×10⁻⁴ | 1.17 (1.11-1.23) | 1.7×10⁻⁹ | 1.18 (1.08-1.28) | 2.4×10⁻⁴ | 1.14 (1.11-1.18) | 8.6×10⁻¹⁶ |
| rs13266634 | 8 | 118253964 | C | T | SLC30A8 | 1.12 (1.02-1.23) | 0.020 | 1.12 (1.04-1.19) | 1.2×10⁻³ | 1.12 (1.05-1.18) | 7.0×10⁻⁵ | 1.07 (1.00-1.16) | 0.047 | 1.18 (1.09-1.29) | 7.0×10⁻⁵ | 1.12 (1.07-1.16) | 5.3×10⁻⁸ |
| rs7901695 | 10 | 114744078 | C | T | TCF7L2 | 1.37 (1.25-1.49) | 6.7×10⁻¹³ | - | - | - | - | 1.38 (1.31-1.46) | 2.3×10⁻³¹ | 1.34 (1.21-1.49) | 1.4×10⁻⁸ | 1.37 (1.31-1.43) | 1.0×10⁻⁴⁸ |
| rs5215 | 11 | 17365206 | C | T | KCNJ11 | 1.15 (1.05-1.25) | 1.3×10⁻³ | - | - | - | - | 1.15 (1.09-1.21) | 1.0×10⁻⁷ | 1.11 (1.02-1.20) | 0.014 | 1.14 (1.10-1.19) | 5.0×10⁻¹¹ |
| rs1801282 | 3 | 12368125 | C | G | PPARG | 1.23 (1.09-1.41) | 1.3×10⁻³ | - | - | - | - | 1.09 (1.01-1.16) | 0.019 | 1.20 (1.07-1.33) | 1.4×10⁻³ | 1.14 (1.08-1.20) | 1.7×10⁻⁶ |
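The inverse-variance weighting of log ORs mentioned in the Table 1 legend can be illustrated with a short sketch. It assumes a fixed-effect model and recovers each study's standard error from its published 95% CI, which is an approximation because the published values are rounded; the function names are illustrative, and the input triples are the rs10811661 (CDKN2B) entries for the All UK, DGI and FUSION columns of Table 1.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def log_or_and_se(odds_ratio, ci_low, ci_high):
    """Log OR and its standard error, back-calculated from a 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z95)
    return math.log(odds_ratio), se


def inverse_variance_meta(studies):
    """Fixed-effect pooled OR, 95% CI and two-sided p value.

    studies is an iterable of (OR, CI lower, CI upper) triples.
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for odds_ratio, ci_low, ci_high in studies:
        beta, se = log_or_and_se(odds_ratio, ci_low, ci_high)
        weight = 1.0 / se ** 2  # inverse of the variance of the log OR
        total_weight += weight
        weighted_sum += weight * beta
    pooled_beta = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return (
        math.exp(pooled_beta),
        math.exp(pooled_beta - Z95 * pooled_se),
        math.exp(pooled_beta + Z95 * pooled_se),
        p_value,
    )


# rs10811661: All UK, DGI and FUSION rows from Table 1.
rs10811661 = [(1.19, 1.11, 1.28), (1.20, 1.12, 1.28), (1.20, 1.07, 1.36)]
pooled_or, ci_low, ci_high, p_value = inverse_variance_meta(rs10811661)
```

With these rounded inputs the pooled estimate should fall close to the "All Combined" entry for rs10811661 (1.20 [1.14-1.25]); small discrepancies are expected for other SNPs because the studies did not always type the identical marker.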

The SNP rs8050136 (mapping to the FTO [fat mass associated with obesity] gene region on chromosome 16) was among a cluster of SNPs generating the strongest evidence for association outside TCF7L2 in the original scan (risk allele OR 1.27 [1.16-1.37], p=2.0×10⁻⁸) (Figure S1). This SNP showed strong replication (OR 1.22 [1.12-1.32], p=5.4×10⁻⁷). As we recently reported (9), this effect on T2D risk is mediated through a primary effect on adiposity, and adjustment for BMI abolishes the T2D association. Replication was also obtained for SNPs within the CDKAL1 locus on chromosome 6, including rs9465871 and rs10946398. Although rs9465871 generated the stronger signal in the WTCCC scan, replication at this SNP was modest (p=0.023). The replication signal at rs10946398 was more striking (OR 1.14 [1.07-1.22], p=8.4×10⁻⁵) (Tables 1, S2). Consistent evidence of association is provided by the DGI (p=4.1×10 at rs7754840) and FUSION groups (p=9.5×10⁻³ at rs471253) (Tables 1, S3) (6, 7), both SNPs being strong (r²>0.99) proxies for rs10946398. Across all studies, combined evidence for association at CDKAL1 is compelling (p~4.1×10⁻¹¹). All associated SNPs map to a large (90kb) intron within CDKAL1 (Figure 1). Flanking recombination hotspots define a 200kb interval likely to contain the etiological variant(s). CDKAL1 (cyclin-dependent kinase 5 [CDK5] regulatory subunit associated protein 1-like 1) encodes a 579-residue, 65kD protein of unknown function. We have detected expression of CDKAL1 mRNA in human pancreatic islet and skeletal muscle (Figure S2). CDKAL1 shares considerable protein domain and amino acid homology with CDK5 regulatory subunit associated protein 1 (CDK5RAP1), a known inhibitor of CDK5 activation. CDK5 has been implicated in the regulation of pancreatic beta cell function, through formation of p35/CDK5 complexes that downregulate insulin expression (11, 12).

Figure 1
Overview of CDKAL1 signal region

A Plot of −log(p values) for T2D (Cochran-Armitage test for trend) against chromosome position in Mb. Blue diamonds represent primary scan results and pink triangles denote meta-analysis results across all UK samples.

B Genomic location of genes showing intron and exon structure (NCBI Build 35). Pink triangles show position of replication SNPs relative to gene structure.

C MULTIZ (24) vertebrate alignment of 17 species showing evolutionary conservation.

D GoldSurfer2 (25) plot of linkage disequilibrium (r²) for SNPs genotyped in WTCCC scan (passing T2D-specific quality control) in WTCCC T2D cases.

E Recombination rate given as cM/Mb. Red boxes represent recombination hotspots (26).

F GoldSurfer2 plot of linkage disequilibrium (r²) for all HapMap SNPs across the region (HapMap CEU data).

The third replicated association maps to the HHEX (homeobox, hematopoietically expressed) gene region on chromosome 10. This gene both showed strong association in the WTCCC scan (rs5015480: risk allele OR 1.22 [1.12-1.33], p=5.4×10⁻⁶) and is a powerful biological candidate (13, 14). We could not optimize a replication assay for rs5015480, but observed evidence for replication at a perfect proxy, rs1111875 (risk allele OR 1.08 [1.01-1.15], p=0.02) (Tables 1, S2, S3). Both DGI and FUSION studies showed modest but consistent association signals, generating strong combined evidence (p~5.7×10⁻¹⁰) for a role in T2D susceptibility (Tables 1, S3). A fourth genome-wide association scan, in French subjects, recently reported independent evidence for a T2D signal in this region (10). The signal resides within an extended (295kb) region of LD containing not only HHEX (highly expressed in fetal and adult pancreas [Figure S2]) but also the genes encoding kinesin-interacting factor (KIF11) and insulin degrading enzyme (IDE) (Figure S3). IDE represents a second strong biological candidate given postulated effects on both insulin signalling and islet function, and data from rodent models (15-17).

Of the remaining regions selected in the first wave, none showed any evidence of replication in UK samples (Table S2), and for none was there strong support from the DGI and FUSION scans.

The relatively strict thresholds imposed for SNP selection in the first wave (i.e. point-wise p<10) help to limit false discovery, but many genuine susceptibility variants will fail to reach them. We initiated a second wave of replication based around SNPs for which the WTCCC scan generated more modest evidence for association (Cochran-Armitage p ~10 to 10). We prioritized the 5,367 SNPs in this range, using additional criteria: (a) evidence of association in DGI and FUSION (6, 7); (b) presence of multiple, independent (r²<0.4) associations within the same locus; and (c) biological candidacy (8, 18).

Analysis of the 56 SNPs, representing 49 putative signals, selected for this “second wave” of replication (Table S4) yielded two further regions implicated in T2D susceptibility. A cluster of SNPs on chromosome 9 (represented by rs10811661) generated a promising signal in all three scans. Replication was observed in UK samples (rs10811661: OR 1.18 [1.08-1.28], p=1.7×10⁻⁴), as well as DGI (p=2.2×10) and FUSION follow-up studies (rs2383208, p=9.7×10). A second signal from the WTCCC scan located ~100kb 5′ (rs564398, OR 1.16 [1.07-1.27], p=3.2×10⁻⁴) was weakly supported in the FUSION, but not the DGI, scan (Tables 1, S3) and replicated in the UK RS samples (OR 1.12 [1.05-1.19], p=8.6×10⁻⁴) (Tables 1, S3).

These two association signals are separated by a recombination hotspot (D′ between rs10811661 and rs564398 is 0.057, r²<0.001) (Figure 2). Across all studies, the combined evidence for association is stronger for the 3′ (p~7.8×10⁻¹⁵) than the 5′ (p~1.2×10⁻⁷) peak (Table 1). The 3′ signal maps to sequence with no characterized genes, while the recombination interval enclosing the 5′ signal includes the full coding sequences of CDKN2B and CDKN2A (encoding p15 and p16 respectively). CDKN2A is a known tumour suppressor and its product, p16, inhibits CDK4 (cyclin-dependent kinase 4), a powerful regulator of pancreatic beta cell replication (19-21). Overexpression of Cdkn2a leads to decreased islet proliferation in ageing mice (22). Cdkn2b overexpression is also causally related to islet hypoplasia and diabetes in murine models (23). Both CDKN2B and CDKN2A display high levels of expression in pancreatic islets and pituitary (Figure S2).
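The two LD measures quoted for the chromosome 9 signals, D′ and the squared correlation between alleles, can be computed from haplotype and allele frequencies as in the sketch below. The function name and the input frequencies are hypothetical (the haplotype frequencies for rs10811661 and rs564398 are not given here), and both SNPs are assumed to be polymorphic.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D', r_squared) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2
    p_a, p_b: frequencies of allele A and allele B (each strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r_squared


# Hypothetical frequencies: alleles are statistically independent.
independent = ld_stats(0.25, 0.5, 0.5)
# Hypothetical frequencies: the two alleles always travel together.
perfect = ld_stats(0.5, 0.5, 0.5)
# Hypothetical frequencies: no historical recombination but unequal allele frequencies.
partial = ld_stats(0.3, 0.3, 0.6)
```

Both measures being near zero, as reported for this SNP pair, indicates that the two signals behave as statistically independent associations rather than proxies for one another.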

Figure 2
Overview of chr9 signal region

Panel layout as per Figure 1

A fifth replicated association lies within the IGF2BP2 gene on chromosome 3. We observed some evidence of association for SNPs in this region in the WTCCC scan (5) (e.g. rs4402960: OR 1.15 [1.05-1.25], p=1.7×10⁻³). Consistent associations in the DGI and FUSION scans (6, 7), and the biological candidacy of the gene (a known regulator of insulin-like growth factor 2 [IGF2] translation), prompted replication. We obtained only modest evidence for replication at rs4402960 (OR 1.09 [1.01-1.16], p=0.018) (Tables 1, S4), but combined evidence across all studies (p~8.6×10⁻¹⁶) establishes this as a genuine T2D signal (Tables 1, S3). The associated SNPs map to a 57kb region spanning the promoter and first 2 exons of IGF2BP2 (Figure S4).

Most of the remaining 50 “second wave” SNPs can be discounted as susceptibility variants based on their failure to replicate (Table S4), though some merit further consideration. One such example is rs9369425, located 57kb downstream of the VEGFA (vascular endothelial growth factor A) gene on chromosome 6 (Figure S5). Evidence for association in the WTCCC scan (OR 1.16 [1.06-1.27], p=8.6×10) is supported by nominal replication in UK samples (1.08 [1.01-1.15], p=0.03) and by DGI scan results (1.17 [1.04-1.32], p=4.4×10). While no signal is apparent in the FUSION study, this does not allow us to reject this association. For 80% power to detect an OR of 1.11 (α=0.05), over 3,000 case-control pairs are needed.
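The order of magnitude of the sample size quoted above can be checked with a standard two-proportion approximation applied to allele frequencies. This is a rough sketch, not the calculation used in the study: the control allele frequency of 0.3 is a purely hypothetical input (no frequency is given in the text), the test is treated as a simple comparison of allele frequencies between equal numbers of cases and controls, and the function names are illustrative.

```python
import math
from statistics import NormalDist


def case_allele_frequency(control_freq, odds_ratio):
    """Risk-allele frequency in cases implied by a per-allele odds ratio."""
    odds = odds_ratio * control_freq / (1 - control_freq)
    return odds / (1 + odds)


def pairs_needed(control_freq, odds_ratio, alpha=0.05, power=0.80):
    """Approximate number of case-control pairs for a two-sided allelic test."""
    p0 = control_freq
    p1 = case_allele_frequency(p0, odds_ratio)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Number of alleles (chromosomes) required in each group.
    alleles_per_group = (
        (z_alpha + z_beta) ** 2 * (p0 * (1 - p0) + p1 * (1 - p1)) / (p1 - p0) ** 2
    )
    # Each individual contributes two alleles.
    return math.ceil(alleles_per_group / 2)


# Hypothetical control allele frequency of 0.3, OR of 1.11 as in the text.
n_pairs = pairs_needed(0.3, 1.11)
```

Under these assumptions the requirement comes out at a little over 3,000 pairs, in line with the figure quoted above; the exact number depends strongly on the allele frequency assumed.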

In the French genome-wide scan (10), variants in both the HHEX and SLC30A8 genes were implicated in T2D susceptibility. As the associated SNPs in SLC30A8 are poorly captured on the Affymetrix chip (r²<0.01), the WTCCC scan was not informative for this locus. However, we genotyped rs13266634 independently and obtained replication of the finding (risk allele OR 1.12 [1.05-1.18], p=7.0×10⁻⁵ in all UK data) and across all three studies (p~5.3×10⁻⁸, Tables 1, S4).

The present analysis has contributed to identification of several confirmed T2D susceptibility loci. One of these (FTO) exerts its primary effect on T2D risk through an impact on adiposity (9); none of the other signals was attenuated by adjustment for BMI or waist circumference (Tables S5-S7). One of the remaining four loci (HHEX/IDE) represents a strong replication of findings recently reported (10). The other three loci (near CDKAL1, IGF2BP2 and CDKN2A), all showing extensive replication across the three studies, represent novel T2D susceptibility loci.

Across the four T2D scans completed (5, 6, 7, 10), TCF7L2 clearly emerges as the largest association signal. On current evidence, all other confirmed loci display more modest effect sizes (between 1.10 and 1.25 per allele). Extensive resequencing and fine-mapping will be required to define the full spectrum of etiological variation at each locus and these may yet identify variants with greater impact. Our findings offer clear lessons for the design of future studies. Robust identification of variants with such effect sizes is only feasible with large-scale sample sets (13,965 individuals were typed in the present study). Further, the exchange of data between groups (providing data on up to 32,554 samples) was key to the rapid and unequivocal identification of the signals we report.

As a result of the four GWA studies reported to date (5, 6, 7, 10), the number of genuine, replicated T2D susceptibility signals has climbed from 3 to 9 (adding HHEX/IDE, SLC30A8, CDKAL1, CDKN2A, IGF2BP2 and FTO). However, these loci explain only a small proportion of the observed familiality (the sibling relative risk, λs, attributable to all loci in the UK samples is only ~1.07). We expect additional loci to be revealed by further rounds of replication initiated by more systematic meta-analysis of these and other scans. Our study provides an important validation of the genome-wide indirect association mapping approach and a demonstration of the value of aggressive data sharing efforts. It also generates insights into T2D pathogenesis emphasizing the likely importance of pathways involved in pancreatic beta cell development, regeneration and function. In-depth physiological and functional studies are now needed to establish the precise mechanisms involved.

Footnotes

Supporting Online Material

www.sciencemag.org

Materials and Methods

Replication set-only combined effect size estimation

Haplotype-based analysis results

Overlap between association and linkage signals

Quantitative trait analysis results

Departures from additivity

Data access details

Membership of WTCCC

Detailed acknowledgements

References

Figs. S1, S2, S3, S4, S5, S6, S7, S8

Tables S1, S2, S3, S4, S5, S6, S7, S8, S9, S10

References

  • 1. Stumvoll M, Goldstein BJ, van Haeften TW. Lancet. 2005;365:1333–1346.
  • 2. Altshuler D, et al. Nat. Genet. 2000;26:76–80.
  • 3. Gloyn AL, et al. Diabetes. 2003;52:568–572.
  • 4. Grant SF, et al. Nat. Genet. 2006;38:320–323.
  • 5. Donnelly P, the WTCCC, personal communication. Data from the Wellcome Trust Case Control Consortium scan.
  • 6. Diabetes Genetics Initiative. Science. (this issue).
  • 7. Scott LJ, et al. Science. (this issue).
  • 8. Materials and methods are available as supporting material on Science Online.
  • 9. Frayling TM, et al. Science. Published online April 12 2007.
  • 10. Sladek R, et al. Nature. 2007;445:881–885.
  • 11. Ubeda M, Rukstalis JM, Habener JF. J. Biol. Chem. 2006;281:28858–28864.
  • 12. Wei FY, et al. Nat. Med. 2005;11:1104–1108.
  • 13. Foley AC, Mercola M. Genes Dev. 2005;19:387–396.
  • 14. Bort R, Martinez-Barbera JP, Beddington RS, Zaret KS. Development. 2004;131:797–806.
  • 15. Fakhrai-Rad H, et al. Hum. Mol. Genet. 2000;9:2149–2158.
  • 16. Seta KA, Roth RA. Biochem. Biophys. Res. Commun. 1997;231:167–171.
  • 17. Farris W, et al. Am. J. Pathol. 2004;164:1425–1434.
  • 18. GeneSniffer ( ) was used to prioritize genes for biological candidacy by integrating information from diverse online databases (including Entrez Gene, OMIM, PubMed and MGI).
  • 19. Rane SG, et al. Nat. Genet. 1999;22:44–52.
  • 20. Mettus RV, Rane SG. Oncogene. 2003;22:8413–8421.
  • 21. Marzo N, et al. Diabetologia. 2004;47:686–694.
  • 22. Krishnamurthy J, et al. Nature. 2006;443:453–457.
  • 23. Moritani M, et al. Mol. Cell Endocrinol. 2005;229:175–184.
  • 24. Blanchette M, et al. Genome Res. 2004;14:708–715.
  • 25. Pettersson F, Jonsson O, Cardon LR. Bioinformatics. 2004;20:3241–3243.
  • 26. Myers S, Bottolo L, Freeman C, McVean G, Donnelly P. Science. 2005;310:321.