Host-microbe interactions have shaped the genetic architecture of inflammatory bowel disease
Crohn’s disease (CD) and ulcerative colitis (UC), the two common forms of inflammatory bowel disease (IBD), affect over 2.5 million people of European ancestry with rising prevalence in other populations1. Genome-wide association studies (GWAS) and subsequent meta-analyses of CD and UC2,3 as separate phenotypes implicated previously unsuspected mechanisms, such as autophagy4, in pathogenesis and showed that some IBD loci are shared with other inflammatory diseases5. Here we expand knowledge of relevant pathways by undertaking a meta-analysis of CD and UC genome-wide association scans, with validation of significant findings in more than 75,000 cases and controls. We identify 71 new associations, for a total of 163 IBD loci that meet genome-wide significance thresholds. Most loci contribute to both phenotypes, and both directional and balancing selection effects are evident. Many IBD loci are also implicated in other immune-mediated disorders, most notably with ankylosing spondylitis and psoriasis. We also observe striking overlap between susceptibility loci for IBD and mycobacterial infection. Gene co-expression network analysis emphasizes this relationship, with pathways shared between host responses to mycobacteria and those predisposing to IBD.
We conducted an imputation-based association analysis using autosomal genotype level data from 15 GWAS of CD and/or UC (Supplementary Table 1, Supplementary Figure 1). We imputed 1.23 million SNPs from the HapMap3 reference set (Supplementary Methods), resulting in a high quality dataset with reduced genome-wide inflation (Supplementary Figures 2, 3) compared with previous meta-analyses of subsets of these data2,3. The imputed GWAS data identified 25,075 SNPs that had association p < 0.01 in at least one of the CD, UC or all IBD analyses. A meta-analysis of GWAS data with Immunochip6 validation genotypes from an independent, newly-genotyped set of 14,763 CD cases, 10,920 UC cases, and 15,977 controls was performed (Supplementary Table 1, Supplementary Figure 1). Principal components analysis resolved geographic stratification, as well as Jewish and non-Jewish ancestry (Supplementary Figure 4), and significantly reduced inflation to a level consistent with residual polygenic risk, rather than other confounding effects (from λGC = 2.00 to λGC = 1.23 when analyzing all IBD samples, Supplementary Methods, Supplementary Figure 5).
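As a rough illustration of the genomic inflation factor quoted above, λGC can be computed as the median of the observed 1-degree-of-freedom χ² association statistics divided by the median expected under the null (about 0.455). The sketch below is not the pipeline used in this study; the helper name `genomic_inflation` and the toy p-values are assumptions made for illustration.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom,
# obtained from the normal quantile: (Phi^-1(0.75))^2.
NULL_MEDIAN_CHI2 = NormalDist().inv_cdf(0.75) ** 2

def genomic_inflation(p_values):
    """Return lambda_GC for a list of two-sided association p-values.

    Hypothetical helper: each p-value (strictly between 0 and 1) is
    converted back to a 1-df chi-square statistic via the squared
    normal quantile.
    """
    nd = NormalDist()
    chi2 = [nd.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]
    return median(chi2) / NULL_MEDIAN_CHI2

# Toy input: evenly spaced p-values mimic a null (uninflated) scan,
# so lambda_GC should be close to 1.
null_like = [(i + 0.5) / 1000 for i in range(1000)]
lam = genomic_inflation(null_like)
```

Values well above 1 indicate that test statistics are larger than expected under the null, which can reflect stratification, polygenic signal, or both.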
Our meta-analysis of the GWAS and Immunochip data identified 193 statistically independent signals of association at genome-wide significance (p < 5×10−8) in at least one of the three analyses (CD, UC, IBD). Since some of these signals (Supplementary Figure 6) probably represent associations to the same underlying functional unit, we merged these signals (Supplementary Methods) into 163 regions, of which 71 are reported here for the first time (Table 1, Supplementary Table 2). Figure 1A shows the relative contributions of each locus to the total variance explained in UC and CD. We have increased the total disease variance explained (variance being subject to fewer assumptions than heritability7) from 8.2% to 13.6% in CD and from 4.1% to 7.5% in UC (Supplementary Methods). Consistent with previous studies, our IBD risk loci seem to act independently, with no significant evidence of deviation from an additive combination of log odds ratios.
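The additive model on the log-odds scale mentioned above can be sketched as follows: each copy of a risk allele multiplies the odds by that locus's per-allele odds ratio, so log odds ratios simply sum across loci. The odds ratios and genotype below are invented for illustration and are not estimates from this study; `combined_log_odds` is a hypothetical helper name.

```python
import math

def combined_log_odds(odds_ratios, risk_allele_counts):
    """Sum per-allele log odds ratios weighted by risk-allele count (0, 1 or 2)."""
    if len(odds_ratios) != len(risk_allele_counts):
        raise ValueError("one allele count is needed per locus")
    return sum(n * math.log(oratio)
               for oratio, n in zip(odds_ratios, risk_allele_counts))

# Hypothetical per-allele odds ratios at three independent loci.
example_ors = [1.2, 1.1, 1.5]
# Hypothetical genotype: number of risk alleles carried at each locus.
example_genotype = [2, 0, 1]

log_odds = combined_log_odds(example_ors, example_genotype)
# Odds relative to an individual carrying no risk alleles at these loci.
relative_odds = math.exp(log_odds)
```

A deviation from additivity would show up as an interaction term: the joint effect of two loci differing from the sum of their individual log odds ratios.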
Our combined genome-wide analysis of CD and UC enables a more comprehensive analysis of disease specificity than was previously possible. A model selection analysis (Supplementary Methods 1d) showed that 110/163 loci are associated with both disease phenotypes; 50 of these have an indistinguishable effect size in UC and CD, while 60 show evidence of heterogeneous effects (Table 1). Of the remaining loci, 30 are classified as CD-specific and 23 as UC-specific. However, 43 of these 53 show the same direction of effect in the non-associated disease (Figure 1B, overall p=2.8×10−6). Risk alleles at two CD loci, PTPN22 and NOD2, show significant (p < 0.005) protective effects in UC, exceptions that may reflect biological differences between the two diseases. This degree of sharing of genetic risk suggests that nearly all the biological mechanisms involved in one disease play some role in the other.
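One way to read the directional-concordance result above is as a sign test: if the direction of effect in the non-associated disease were random, each of the 53 disease-specific loci would match with probability 0.5. The sketch below assumes a one-sided exact binomial test; the test actually used is described in the Supplementary Methods and may differ.

```python
from math import comb

def sign_test_one_sided(successes, trials):
    """P(X >= successes) for X ~ Binomial(trials, 0.5)."""
    tail = sum(comb(trials, k) for k in range(successes, trials + 1))
    return tail / 2 ** trials

# 43 of the 53 disease-specific loci share the direction of effect.
p_same_direction = sign_test_one_sided(43, 53)
```

Under this assumption the tail probability is about 2.8 × 10−6, in line with the value quoted above.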
The large number of IBD associations, far more than reported for any other complex disease, increases the power of network-based analyses to prioritize genes within loci. We investigated the IBD loci using functional annotation and empirical gene network tools (Supplementary Table 2). Compared with previous analyses, which identified candidate genes in 35% of loci2,3, our updated GRAIL8 connectivity network identifies candidates in 53% of loci, including increased statistical significance for 58 of the 73 candidates from previous analyses. The new candidates come not only from genes within newly identified loci, but also from additional genes within previously established loci (Figure 1C). Only 29 IBD-associated SNPs are in strong linkage disequilibrium (r2 > 0.8) with a missense variant in the 1000 Genomes Project data, which reinforces previous evidence that a large fraction of risk for complex disease is driven by non-coding variation. In contrast, 64 IBD-associated SNPs are in linkage disequilibrium with variants known to regulate gene expression (Supplementary Table 2). Overall, we highlighted a total of 300 candidate genes in 125 loci, of which 39 contained a single gene supported by two or more methods.
Seventy percent (113/163) of the IBD loci are shared with other complex diseases or traits, including 66 among the 154 loci previously associated with other immune-mediated diseases9, which is 8.6 times the number that would be expected by chance (Figure 2A, p < 10−16, Supplementary Figure 7). Such enrichment cannot be attributed to the immune-mediated focus of the Immunochip, (Supplementary Methods 4a(i), Supplementary Figure 8), since the analysis is based on our combined GWAS-Immunochip data. Comparing overlaps with specific diseases is confounded by the variable power in studies of different diseases. For instance, while type 1 diabetes (T1D) shares the largest number of loci (20/39, 10-fold enrichment) with IBD, this is partially driven by the large number of known T1D associations. Indeed, seven other immune-mediated diseases show stronger enrichment of overlap, with the largest being ankylosing spondylitis (8/11, 13-fold) and psoriasis (14/17, 14-fold).
IBD loci are also markedly enriched (4.9-fold, p < 10−4) in genes involved in primary immunodeficiencies (PIDs, Figure 2A), which are characterized by a dysfunctional immune system resulting in severe infections10. Genes implicated in this overlap correlate with reduced levels of circulating T-cells (ADA, CD40, TAP1/2, NBS1, BLM, DNMT3B), or of specific subsets such as Th17 (STAT3), memory (SP110), or regulatory T-cells (STAT5B). The subset of PID genes leading to Mendelian susceptibility to mycobacterial disease (MSMD)10–12 is enriched still further; six of the eight known autosomal genes linked to MSMD are located within IBD loci (IL12B, IFNGR2, STAT1, IRF8, TYK2 and STAT3, 46-fold enrichment, p = 1.3 × 10−6), and a seventh, IFNGR1, narrowly missed genome-wide significance (p = 6 × 10−8). Overlap with IBD is also seen in complex mycobacterial disease; we find IBD associations in 7/8 loci identified by leprosy GWAS13, including 6 cases where the same SNP is implicated. Furthermore, genetic defects in STAT314–15 and CARD916, also within IBD loci, lead to PIDs involving skin infections with staphylococcus and candidiasis, respectively. The comparative effects of IBD and infectious disease susceptibility risk alleles on gene function and expression are summarized in Supplementary Table 3, and include both opposite (e.g., NOD2 and STAT3, Supplementary Figure 9) and similar (e.g., IFNGR2) directional effects.
To extend our understanding of the fundamental biology of IBD pathogenesis we conducted searches across the IBD locus list: (i) for enrichment of specific GeneOntology (GO) terms and canonical pathways, (ii) for evidence of selective pressure acting on specific variants and pathways, and (iii) for enrichment of differentially expressed genes across immune cell types. We tested the 300 prioritized genes (see above) for enrichment in GO terms (Supplementary Methods) and identified 286 GO terms and 56 pathways demonstrating significant enrichment in genes contained within IBD loci (Supplementary Table 4, Supplementary Figures 10, 11). Excluding high-level GO categories such as “immune system processes” (p = 3.5 × 10−26), the most significantly enriched term is regulation of cytokine production (p = 2.7 × 10−24), specifically IFN-γ, IL-12, TNF-α, and IL-10 signalling. Lymphocyte activation was the next most significant (p = 1.8 × 10−23), with activation of T-, B-, and NK-cells being the strongest contributors to this signal. Strong enrichment was also seen for response to molecules of bacterial origin (p = 2.4 × 10−20), and for KEGG’s JAK-STAT signalling pathway (p = 4.8 × 10−15). We note that no enriched terms or pathways showed evidence of CD- or UC-specificity.
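Term-enrichment tests of this kind are commonly framed as a hypergeometric (one-sided Fisher) test: given a background of genes, some of which carry an annotation, how surprising is the number of annotated genes in the prioritized set? The sketch below illustrates that general technique only; the enrichment procedure actually used here is described in the Supplementary Methods, and the function name and all counts are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(population, annotated, sample, overlap):
    """P(X >= overlap) when `sample` genes are drawn without replacement
    from `population` genes, of which `annotated` carry the term."""
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(population, sample)

# Hypothetical counts, not taken from this study: 20,000 background genes,
# 400 annotated with a term, 300 prioritized genes, 30 of them annotated.
p_value = hypergeom_enrichment_p(population=20000, annotated=400,
                                 sample=300, overlap=30)
```

With these made-up counts about 6 annotated genes would be expected by chance, so an overlap of 30 yields a very small p-value. In practice a correction for the number of terms tested is also needed.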
As infectious organisms are known to be among the strongest agents of natural selection, we investigated whether the IBD-associated variants are subject to selective pressures (Supplementary Methods, Supplementary Table 5). Directional selection would imply that the balance between these forces shifted in one direction over the course of human history, whereas balancing selection would suggest an allele frequency-dependent scenario typified by host-microbe co-evolution, as can be observed with parasites. Two SNPs show Bonferroni-significant selection: the most significant signal, in NOD2, is under balancing selection (p = 5.2 × 10−5), and the second most significant, in the receptor TNFRSF18, showed directional selection (p = 8.9 × 10−5). The next most significant variants were in the ligand of that receptor, TNFSF18 (directional, p = 5.2 × 10−4), and IL23R (balancing, p = 1.5 × 10−3). As a group, the IBD variants show significant enrichment in selection (Figure 2B) of both types (p = 5.5 × 10−6). We discovered an enrichment of balancing selection (Figure 2B) in genes annotated with the GO term “regulation of interleukin-17 production” (p = 1.4 × 10−4). The important role of IL17 in both bacterial defense and autoimmunity suggests a key role for balancing selection in maintaining the genetic relationship between inflammation and infection, and this is reinforced by a nominal enrichment of balancing selection in loci annotated with the broader GO term “defense response to bacterium” (p = 0.007).
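The Bonferroni criterion referred to above divides the significance level by the number of tests. The number of variants tested for selection is not stated in this section, so the sketch below assumes 163 tests (one per locus) and α = 0.05 purely for illustration; with that assumption the threshold falls between the second and third p-values quoted above.

```python
def bonferroni_significant(p_values, n_tests, alpha=0.05):
    """Return the names whose p-value is below alpha / n_tests."""
    threshold = alpha / n_tests
    return [name for name, p in p_values.items() if p < threshold]

# Selection p-values quoted in the text above.
selection_p = {
    "NOD2": 5.2e-5,      # balancing
    "TNFRSF18": 8.9e-5,  # directional
    "TNFSF18": 5.2e-4,   # directional
    "IL23R": 1.5e-3,     # balancing
}

# ASSUMPTION: 163 tests (one per locus). The actual number of tests is
# given in the Supplementary Methods and may differ.
hits = bonferroni_significant(selection_p, n_tests=163)
```

Under this assumption only NOD2 and TNFRSF18 pass, matching the two Bonferroni-significant signals reported.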
We tested for enrichment of cell-type expression specificity of genes in IBD loci in 223 distinct sets of sorted, mouse-derived immune cells from the Immunological Genome Consortium17. Dendritic cells showed the strongest enrichment, followed by weaker signals that support the GO analysis, including CD4+ T, NK and NKT cells (Figure 2C). Notably, several of these cell types express genes near our IBD associations much more specifically when stimulated; our strongest signal, a lung-derived dendritic cell, had p(stimulated) < 1 × 10−6 compared with p(unstimulated) = 0.0015, consistent with an important role for cell activation.
To further our goal of identifying likely causal genes within our susceptibility loci and to elucidate networks underlying IBD pathogenesis, we screened the associated genes against 211 co-expression modules identified from weighted gene co-expression network analyses18, conducted with large gene expression datasets from multiple tissues19–21. The most significantly enriched module comprised 523 genes from omental adipose tissue collected from morbidly obese patients19, which was found to be 2.9-fold enriched for genes in the IBD-associated loci (p = 1.1 × 10−13, Supplementary Table 6, Supplementary Figure 12). We constructed a probabilistic causal gene network using an integrative Bayesian network reconstruction algorithm22–24 which combines expression and genotype data to infer the direction of causality between genes with correlated expression. The intersection of this network and the genes in the IBD-enriched module defined a sub-network of genes enriched in bone marrow-derived macrophages (p < 10−16) and is suggestive of dynamic interactions relevant to IBD pathogenesis. In particular, this sub-network featured close proximity amongst genes connected to host interaction with bacteria, notably NOD2, IL10, and CARD9.
A NOD2-focused inspection of the sub-network prioritizes multiple additional candidate genes within IBD-associated regions. For example, a cluster near NOD2 (Figure 2D) contains multiple IBD genes implicated in M.tb response, including SLC11A1, VDR and LGALS9. Furthermore, both SLC11A1 (also known as NRAMP1) and VDR have been associated with M.tb infection by candidate gene studies25–26, and LGALS9 modulates mycobacteriosis27. Of interest, HCK (located in our new locus on chromosome 20 at 30.75Mb) is predicted to upregulate expression of both NOD2 and IL10, an anti-inflammatory cytokine associated with Mendelian28 and non-Mendelian IBD29. HCK has been linked to alternative, anti-inflammatory activation of monocytes (M2 macrophages)30; while not identified in our aforementioned analyses, these data implicate HCK as the causal gene in this new IBD locus.
We report one of the largest genetic experiments involving a complex disease undertaken to date. This has increased the number of confirmed IBD susceptibility loci to 163, most of which are associated with both CD and UC, and is substantially more than reported for any other complex disease. Even this large number of loci explains only a minority of the variance in disease risk, which suggests that other factors such as rarer genetic variation not captured by GWAS or environmental exposures make substantial contributions to pathogenesis. Most of the evidence relating to possible causal genes points to an essential role for host defence against infection in IBD. In this regard the current results focus ever closer attention on the interaction between the host mucosal immune system and microbes both at the epithelial cell surface and within the gut lumen. In particular, they raise the question, in the context of this burden of IBD susceptibility genes, as to what triggers components of the commensal microbiota to switch from a symbiotic to a pathogenic relationship with the host. Collectively, our findings have begun to shed light on these questions and provide a rich source of clues to the pathogenic mechanisms underlying this archetypal complex disease.
We conducted a meta-analysis of GWAS datasets after imputation to the HapMap3 reference set, and aimed to replicate in the Immunochip data any SNPs with p < 0.01. We compared likelihoods of different disease models to assess whether each locus was associated with CD, UC or both. We used databases of eQTL SNPs and coding SNPs in linkage disequilibrium with our hit SNPs, as well as the network tools GRAIL and DAPPLE, and a co-expression network analysis to prioritize candidate genes in our loci. Gene Ontology, ImmGen mouse immune cell expression resource, the TreeMix selection software, and a Bayesian causal network analysis were used to functionally annotate these genes.
|Chr||Position (hg19, Mb)||SNP||Key Genes (+N additional in locus)|
|7||2.78||rs798502||CARD11, GNA12, (5)|
|1||161.47||rs1801274†||FCGR2A/B, FCGR3A, (13)|
|2||102.86*||rs917997†||IL18RAP, IL1R1, (7)|
|3||48.96**||rs3197999||MST1, PFKB4, (63)|
|5||96.24||rs1363907||ERAP2, ERAP1, (3)|
|5||131.19*||rs2188962†||IBD5 locus, (18)|
|22||30.43||rs2412970||LIF, OSM, (9)|
*= additional genome-wide significant associated SNP in the region.
**= two or more additional genome-wide significant SNPs in the region.
‡ = These regions have overlapping but distinct UC and CD signals.
† = heterogeneity of odds ratios.
§ = CD risk allele is significantly protective in UC.
||= gene for which functional studies of associated alleles have been reported. Newly discovered loci. Bolded rs numbers indicate SNPs with p-values less than 10−13. Listed are genes implicated by one or more candidate genes approaches. Bolded genes have been implicated by two or more candidate gene approaches. For each locus, the top two candidate genes are listed. A complete listing of gene prioritization is provided in Supplementary Table 2.
Author Contributions Conceived and designed study, managed study and funding: JHC, JCB, RKW, RHD, DPM, MDA, VA, AF, MP, SV. Manuscript preparation: JHC, JCB, LJ, SR, RKW, RHD, DPM, MJD, MP, CGM. Performed or supervised statistical and computational analyses: JHC, JCB, LJ, SR, RKW, KYH, CAA, JE, KN, SLS, SR, ZW, CA, AC, GB, MH, XH, BZ, CKZ, HZ, JDR, EES, MJD. Study subject recruitment and assembled phenotypic data: RKW, RHD, DPM, JCL, LPS, YS, PG, JPA, TA, LA, ANA, VA, JMA, LB, PAB, AB, SB, CB, SC, MDA, DDJ, KLD, MD, CE, LRF, DF, MG, RG, JG, AH, CH, THK, LK, SK, AL, DL, EL, ICL, CWL, ARM, CM, GM, JM, WN, OP, CYP, UP, NJP, MR, JIR, RKR, JDS, MS, JS, SS, LAS, JS, SRT, MT, HWV, MDV, CW, DCW, JW, RJX, SZ, MSS, VA, HH, SRB, JDR, GRS, CGM, AF, MP, SV, JHC. Established DNA collections, genotyping and data management: RKW, RHD, DPM, LPS, YS, MM, IC, ET, TB, DE, KF, TH, KDT, CGM, AF, MP, JHC. All authors read and approved the final manuscript before submission.
Data have been deposited in NCBI’s database of Genotypes and Phenotypes (dbGaP) through study accession numbers phs000130.v1.p1 and phs000345.v1.p1. Summary statistics for imputed GWAS are available at http://www.broadinstitute.org/mpg/ricopili/. Summary statistics for the meta-analysis markers are available at http://www.ibdgenetics.org/. The 523 causal gene network cytoscape file is available on request. Reprints and permissions information is available at www.nature.com/reprints. The authors declare no competing financial interests. Readers are welcome to comment on the online version of this article at www.nature.com/nature.
We thank all the subjects who contributed samples and the physicians and nursing staff who helped with recruitment globally. UK case collections were supported by the National Association for Colitis and Crohn’s disease, Wellcome Trust grant 098051 (LJ, CAA, JCB), Medical Research Council UK, the Catherine McEwan Foundation, an NHS Research Scotland career fellowship (RKR), Peninsular College of Medicine and Dentistry, Exeter, the National Institute for Health Research, through the Comprehensive Local Research Network and through Biomedical Research Centre awards to Guy’s & St. Thomas’ National Health Service Trust, King’s College London, Addenbrooke’s Hospital, University of Cambridge School of Clinical Medicine and to the University of Manchester and Central Manchester Foundation Trust. The British 1958 Birth Cohort DNA collection was funded by Medical Research Council grant G0000934 and Wellcome Trust grant 068545/Z/02, and the UK National Blood Service controls by the Wellcome Trust. The Wellcome Trust Case Control Consortium projects were supported by Wellcome Trust grants 083948/Z/07/Z, 085475/B/08/Z and 085475/Z/08/Z. North American collections and data processing were supported by funds to the NIDDK IBD Genetics Consortium which is funded by the following grants: DK062431 (SRB), DK062422 (JHC), DK062420 (RHD), DK062432 (JDR), DK062423 (MSS), DK062413 (DPM), DK076984 (MJD), DK084554 (MJD and DPM) and DK062429 (JHC). Additional funds were provided by funding to JHC (DK062429-S1 and Crohn’s & Colitis Foundation of America, Senior Investigator Award (5-2229)), and RHD (CA141743). KYH is supported by the NIH MSTP TG T32GM07205 training award. Cedars-Sinai is supported by USPHS grant PO1DK046763 and the Cedars-Sinai F. Widjaja Inflammatory Bowel and Immunobiology Research Institute Research Funds, National Center for Research Resources (NCRR) grant M01-RR00425, UCLA/Cedars-Sinai/Harbor/Drew Clinical and Translational Science Institute (CTSI) Grant [UL1 TR000124-01], the Southern California Diabetes and Endocrinology Research Grant (DERC) [DK063491], The Helmsley Foundation (DPM) and the Crohn’s and Colitis Foundation of America (DPM). RJX and ANA are funded by DK83756, AI062773, DK043351 and the Helmsley Foundation. The Netherlands Organization for Scientific Research supported RKW with a clinical fellowship grant (90.700.281) and CW (VICI grant 918.66.620). CW is also supported by the Celiac Disease Consortium (BSIK03009). This study was also supported by the German Ministry of Education and Research through the National Genome Research Network, the Popgen biobank, through the Deutsche Forschungsgemeinschaft (DFG) cluster of excellence ‘Inflammation at Interfaces’ and DFG grant no. FR 2821/2-1. S Brand was supported by (DFG BR 1912/6-1) and the Else-Kröner-Fresenius-Stiftung (Else Kröner-Exzellenzstipendium 2010_EKES.32). Italian case collections were supported by the Italian Group for IBD and the Italian Society for Paediatric Gastroenterology, Hepatology and Nutrition and funded by the Italian Ministry of Health GR-2008-1144485. Activities in Sweden were supported by the Swedish Society of Medicine, Ihre Foundation, Örebro University Hospital Research Foundation, Karolinska Institutet, the Swedish National Program for IBD Genetics, the Swedish Organization for IBD, and the Swedish Medical Research Council. DF and SV are senior clinical investigators for the Funds for Scientific Research (FWO/FNRS) Belgium. We acknowledge a grant from Viborg Regional Hospital, Denmark. VA was supported by SHS Aabenraa, Denmark.
We acknowledge funding provided by the Royal Brisbane and Women’s Hospital Foundation, National Health and Medical Research Council, Australia and by the European Community (5th PCRDT). We gratefully acknowledge the following groups who provided biological samples or data for this study: the Inflammatory Bowel in South Eastern Norway (IBSEN) study group, the Norwegian Bone Marrow Donor Registry (NMBDR), the Avon Longitudinal Study of Parents and Children, the Human Biological Data Interchange and Diabetes UK, and Banco Nacional de ADN, Salamanca. This research also utilizes resources provided by the Type 1 Diabetes Genetics Consortium, a collaborative clinical study sponsored by the NIDDK, NIAID, NHGRI, NICHD, and JDRF and supported by U01 DK062418. The KORA study was initiated and financed by the Helmholtz Zentrum München – German Research Center for Environmental Health, which is funded by the German Federal Ministry of Education and Research (BMBF) and by the State of Bavaria. KORA research was supported within the Munich Center of Health Sciences (MC Health), Ludwig-Maximilians-Universität, as part of LMUinnovativ.
- 1. Increasing incidence and prevalence of the inflammatory bowel diseases with time, based on systematic review. Gastroenterology 142, 46–54 (2012).
- 2. Meta-analysis identifies 29 additional ulcerative colitis risk loci, increasing the number of confirmed associations to 47. Nat Genet 43, 246–252 (2011).
- 3. Genome-wide meta-analysis increases to 71 the number of confirmed Crohn’s disease susceptibility loci. Nat Genet 42, 1118–1125 (2010).
- 4. Genetics and pathogenesis of inflammatory bowel disease. Nature 474, 307–317 (2011).
- 5. Genomics and the multifactorial nature of human autoimmune disease. N Engl J Med 365, 1612–1623 (2011).
- 6. Promise and pitfalls of the Immunochip. Arthritis Res Ther 13, 101 (2011).
- 7. The mystery of missing heritability: Genetic interactions create phantom heritability. Proc Natl Acad Sci USA 109, 1193–1198 (2012).
- 8. Identifying relationships among genomic disease regions: predicting genes at pathogenic SNP associations and rare deletions. PLoS Genet 5, e1000534 (2009).
- 9. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci USA 106, 9362–9367 (2009).
- 10. International Union of Immunological Societies Expert Committee on Primary I et al. Primary immunodeficiencies: 2009 update. J Allergy Clin Immunol 124, 1161–1178 (2009).
- 11. Genetic lessons learned from X-linked Mendelian susceptibility to mycobacterial diseases. Ann NY Acad Sci 1246, 92–101 (2011).
- 12. Genetically determined susceptibility to mycobacterial infection. J Clin Pathol 61, 1006–1012 (2008).
- 13. Identification of two new loci at IL23R and RAB32 that influence susceptibility to leprosy. Nat Genet 43, 1247–1251 (2011).
- 14. STAT3 mutations in the hyper-IgE syndrome. N Engl J Med 357, 1608–1619 (2007).
- 15. Dominant-negative mutations in the DNA-binding domain of STAT3 cause hyper-IgE syndrome. Nature 448, 1058–1062 (2007).
- 16. A homozygous CARD9 mutation in a family with susceptibility to fungal infections. N Engl J Med 361, 1727–1735 (2009).
- 17. Integrating autoimmune risk loci with gene-expression data identifies specific pathogenic immune cell subsets. Am J Hum Genet 89, 496–506 (2011).
- 18. A general framework for weighted gene co-expression network analysis. Stat Appl Genet Mol Biol 4, Article 17 (2005).
- 19. A survey of the genetics of stomach, liver, and adipose gene expression from a morbidly obese cohort. Genome Res 21, 1008–1016 (2011).
- 20. Genetics of gene expression and its effect on disease. Nature 452, 423–428 (2008).
- 21. Mapping the genetic architecture of gene expression in human liver. PLoS Biol 6, e107 (2008).
- 22. Variations in DNA elucidate molecular networks that cause disease. Nature 452, 429–435 (2008).
- 23. Liver and adipose expression associated SNPs are enriched for association to type 2 diabetes. PLoS Genet 6, e1000932 (2010).
- 24. Increasing the power to detect causal associations by combining genotypic and expression data in segregating populations. PLoS Comput Biol 3, e69 (2007).
- 25. Meta-analysis of vitamin D receptor polymorphisms and pulmonary tuberculosis risk. Int J Tuberc Lung Dis 9, 1174–1177 (2005).
- 26. SLC11A1 (NRAMP1) polymorphisms and tuberculosis susceptibility: updated systematic review and meta-analysis. PLoS One 6, e15831 (2011).
- 27. Genome-wide analysis of the host intracellular network that regulates survival of Mycobacterium tuberculosis. Cell 140, 731–743 (2010).
- 28. Infant colitis--it’s in the genes. Lancet 376, 1272 (2010).
- 29. Sequence variants in IL10, ARPC2 and multiple other loci contribute to ulcerative colitis susceptibility. Nat Genet 40, 1319–1323 (2008).
- 30. Hck is a key regulator of gene expression in alternatively activated human monocytes. J Biol Chem 286, 36709–36723 (2011).