Genome-wide association defines more than 30 distinct susceptibility loci for Crohn's disease.
Journal: 2008/September - Nature Genetics
ISSN: 1546-1718
Abstract:
Several risk factors for Crohn's disease have been identified in recent genome-wide association studies. To advance gene discovery further, we combined data from three studies on Crohn's disease (a total of 3,230 cases and 4,829 controls) and carried out replication in 3,664 independent cases with a mixture of population-based and family-based controls. The results strongly confirm 11 previously reported loci and provide genome-wide significant evidence for 21 additional loci, including the regions containing STAT3, JAK2, ICOSLG, CDKAL1 and ITLN1. The expanded molecular understanding of the basis of this disease offers promise for informed therapeutic development.
Nat Genet 40(8): 955-962

Results

Meta-analysis of three genome-wide association scans

The combined GWAS study samples (Table 1) consisted of 3,230 cases and 4,829 controls, all of European descent. While the individual scans did identify new risk factors, they were only well-powered to discover common alleles with odds-ratios (ORs) above 1.3 (in the case of the WTCCC) or 1.5 (the smaller two scans, Figure 1). By contrast, the combined sample has 74% power at an OR of 1.2, allowing evaluation of the role of alleles with smaller effect sizes for the first time. As two different genotyping technologies were used in the constituent scans, we utilized recently developed imputation [10,11] methods to assess association across all three studies at 635,547 SNPs contained on one or both platforms. A quantile-quantile (Q-Q) plot of the primary meta-statistic (single SNP Z-scores, Figure 2) shows a striking excess of significant associations, well beyond what would be attributable to the modest overall distributional inflation (genomic control λ < 1.16). Despite the large sample size, the overall inflation is modest because (1) each group had separately tested for evidence of population stratification, and the meta-analysis used a test that combined the results from each study (rather than mixing the raw data and compromising the case-control matching of each study), and (2) imputation was done on all samples ignoring case status and thus would not introduce artifactual differences between cases and controls [12].
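As an illustration of combining per-study results rather than pooling raw genotypes, the sketch below uses a sample-size-weighted (Stouffer) Z-score combination. This is one common scheme and is an assumption here; the exact weighting used for the meta-statistic is described in the Methods, not in this section. The function names and the example Z-scores and sample sizes are hypothetical.

```python
import math


def combine_z(z_scores, sample_sizes):
    """Sample-size-weighted (Stouffer) combination of per-study Z-scores."""
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator


def two_sided_p(z):
    """Two-sided p value for a standard normal Z-score."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical per-study Z-scores and total sample sizes (not taken from the paper).
z_meta = combine_z([2.5, 3.0, 2.0], [1500, 4700, 1900])
p_meta = two_sided_p(z_meta)
print(f"combined Z = {z_meta:.2f}, two-sided p = {p_meta:.2e}")
```

Because each study contributes only its own Z-score, the case-control matching within each study is left intact, which is the property the text relies on.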

Figure 1. Power to detect a genetic effect of various sizes (odds ratio 1.2, 1.3, 1.5) versus study sample size. Power is reported here as the probability (given a multiplicative model and risk allele frequency of 20%) of p < 5×10^-5 in a scan – the value used to define regions for attempting replication in a larger sample set. Vertical dotted lines show the sample sizes for the three constituent scans and the meta-analysis. Relatively large effects are likely to be detected by any of these scans, whereas only the combined analysis is well powered to detect more modest effects.
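A rough version of this power calculation can be sketched with a normal approximation to a two-sample allelic test. The assumptions are mine, not the authors': the 20% risk allele frequency is taken as the control frequency, the odds ratio acts per allele (multiplicative model), and the scan threshold is the follow-up threshold p < 5×10^-5. The sketch is not expected to reproduce the 74% figure exactly, since the authors' precise model inputs are not given in this section.

```python
from statistics import NormalDist


def case_allele_freq(control_freq, odds_ratio):
    """Risk allele frequency in cases implied by a per-allele odds ratio."""
    return odds_ratio * control_freq / (1 + control_freq * (odds_ratio - 1))


def allelic_test_power(n_cases, n_controls, control_freq, odds_ratio, alpha):
    """Approximate power of a two-sided allelic test (normal approximation)."""
    nd = NormalDist()
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    # Each individual contributes two alleles.
    se = (p1 * (1 - p1) / (2 * n_cases) + p0 * (1 - p0) / (2 * n_controls)) ** 0.5
    noncentrality = (p1 - p0) / se
    z_crit = nd.inv_cdf(1 - alpha / 2)
    # The contribution of the opposite tail is negligible and ignored.
    return nd.cdf(noncentrality - z_crit)


# Combined sample from the text: 3,230 cases and 4,829 controls.
for odds_ratio in (1.2, 1.3, 1.5):
    power = allelic_test_power(3230, 4829, 0.20, odds_ratio, 5e-5)
    print(f"OR {odds_ratio}: approximate power {power:.2f}")
```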

Figure 2. A quantile-quantile plot of observed -log10 p values versus the expectation under the null. Black points represent the complete meta-analysis, with a substantial departure from the null at the tail (values > 8 are represented along the top of the plot as triangles). Dark blue points show the distribution after removing 11 previously published loci, demonstrating a still notable excess. Light blue points show the distribution after removing all 40 loci which replicate at least nominally. In all the cases the overall distribution is marginally inflated (λGC < 1.16).
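The genomic control factor quoted in this caption is conventionally computed as the median observed 1-df chi-square statistic divided by its expected median under the null (about 0.455). The sketch below assumes that standard definition; the helper names and the synthetic Z-scores are hypothetical and stand in for real scan statistics.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control_lambda(z_scores):
    """Median of the squared Z-scores divided by the null median."""
    chi2 = [z * z for z in z_scores]
    return statistics.median(chi2) / CHI2_1DF_MEDIAN


def null_z_scores(n):
    """Evenly spaced standard normal quantiles, standing in for null data."""
    nd = statistics.NormalDist()
    return [nd.inv_cdf((i + 0.5) / n) for i in range(n)]


null_z = null_z_scores(10001)
lam_null = genomic_control_lambda(null_z)

# Scaling every Z-score by sqrt(1.16) scales every chi-square by 1.16.
inflated_z = [z * 1.16 ** 0.5 for z in null_z]
lam_inflated = genomic_control_lambda(inflated_z)
print(f"lambda (null) = {lam_null:.3f}, lambda (inflated) = {lam_inflated:.3f}")
```

Because the median is insensitive to a small number of extreme statistics, a handful of true associations in the tail moves λ very little, which is why a modest λ can coexist with a striking excess of significant results.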

We focus our attention in this study specifically on the 526 SNPs from 74 distinct genomic loci which were associated with p < 5×10^-5 – more than 7 times the number of SNPs expected by chance even after correction for the modest overall inflation detected. This threshold for follow-up is not meant to imply that there are no genuine associations among SNPs with less significant association in the meta-analysis, but rather reflects a practical desire to prioritize as many true positives as possible for immediate replication. Eleven associations previously replicated and established at genome-wide significance levels (Methods, Table 2), including both "historical" associations at NOD2 [13,14] and 5q31 (IBD5) [15] as well as recent replicated findings from individual GWA scans such as IL23R, ATG16L1, IRGM, TNFSF15 and PTPN2 [2,6,16], were among the 74 regions represented in this tail of the distribution of association statistics. Even after removing all SNPs in LD with these eleven loci, however, there continued to be a substantial excess of associated alleles beyond that which would be expected by chance (Figure 2).

Table 2

Convincingly (Bonferroni p < 0.05) replicated CD risk loci

SNP | Chr | Critical region (Mb) | Scan p | Replication p | Combined p | Num. genes | Gene of interest | RAF | Risk allele | OR (case-control) | OR (TDT)

(a) Previously published loci
rs11465804 | 1p31 | 67.4* | 1.01×10^-35 | 3.1×10^-29 | 6.66×10^-63 | NA | IL23R | 0.933 | T | 2.50 | 2.77
rs3828309 | 2q37 | 230.9* | 1.13×10^-20 | 7.67×10^-14 | 2.36×10^-32 | NA | ATG16L1 | 0.533 | G | 1.28 | 1.30
rs3197999 | 3p21 | 48.73 - 49.87 | 2.16×10^-7 | 5.64×10^-7 | 1.15×10^-12 | 35 | MST1 [50] | 0.271 | A | 1.20 | 1.20
rs4613763 | 5p13 | 40.32 - 40.48 | 4.52×10^-22 | 2.79×10^-8 | 6.82×10^-27 | 0 | PTGER4** | 0.125 | C | 1.32 | 1.28
rs2188962 | 5q31 | 131.44 - 131.90 | 4.58×10^-9 | 3.52×10^-11 | 2.32×10^-18 | 7 | - | 0.425 | T | 1.25 | 1.26
rs11747270 | 5q33 | 150.15 - 150.32 | 6.36×10^-11 | 2.57×10^-7 | 3.40×10^-16 | 3 | IRGM | 0.090 | G | 1.33 | 1.31
rs4263839 | 9q32 | 114.61 - 114.78 | 3.92×10^-7 | 6.58×10^-5 | 2.60×10^-10 | 2 | TNFSF15 | 0.677 | G | 1.22 | 1.07
rs10995271 | 10q21 | 64.05 - 64.12 | 1.90×10^-11 | 1.61×10^-10 | 4.46×10^-20 | 1 | ZNF365 | 0.387 | C | 1.25 | 1.53
rs11190140 | 10q24 | 101.26 - 101.32 | 1.71×10^-10 | 1.69×10^-7 | 3.06×10^-16 | 1 | NKX2-3 | 0.478 | T | 1.20 | 1.28
rs2066847 | 16q12 | 49.3* | NA | 1.49×10^-24 | 2.98×10^-24 | NA | NOD2 | 0.018 | C | 3.99 | 2.57
rs2542151 | 18p11 | 12.73 - 12.88 | 1.19×10^-11 | 2.41×10^-7 | 5.10×10^-17 | 1 | PTPN2 | 0.152 | G | 1.35 | 1.14

(b) Novel loci
rs2476601 | 1p13 | 113.79 - 114.17 | 1.81×10^-5 | 0.000101 | 1.46×10^-8 | 7 | PTPN22 | 0.899 | G | 1.31 | 1.17
rs2274910 | 1q23 | 157.65 - 157.72 | 3.50×10^-7 | 0.000481 | 1.46×10^-9 | 2 | ITLN1 | 0.682 | C | 1.14 | 1.62
rs9286879 | 1q24 | 169.54 - 169.67 | 4.02×10^-7 | 0.000321 | 1.53×10^-9 | 0 | - | 0.243 | G | 1.19 | 1.08
rs11584383 | 1q32 | 197.60 - 197.77 | 6.82×10^-7 | 2.34×10^-6 | 1.43×10^-11 | 3 | - | 0.697 | T | 1.18 | 1.20
rs10045431 | 5q33 | 158.69 - 158.76 | 8.80×10^-9 | 3.66×10^-6 | 3.86×10^-13 | 1 | IL12B | 0.708 | C | 1.11 | 1.36
rs6908425 | 6p22 | 20.63 - 20.84 | 2.52×10^-7 | 0.000278 | 8.96×10^-10 | 1 | CDKAL1 | 0.780 | C | 1.21 | 1.09
rs7746082 | 6q21 | 106.52 - 106.62 | 3.70×10^-6 | 7.7×10^-6 | 2.44×10^-10 | 0 | - | 0.289 | C | 1.17 | 1.19
rs2301436 | 6q27 | 167.32 - 167.52 | 3.30×10^-7 | 3.26×10^-7 | 1.04×10^-12 | 3 | CCR6 | 0.463 | T | 1.21 | 1.16
rs1456893 | 7p12 | 50.03 - 50.11 | 4.92×10^-5 | 1.1×10^-5 | 4.60×10^-9 | 0 | - | 0.678 | A | 1.20 | 1.14
rs1551398 | 8q24 | 126.60 - 126.62 | 4.90×10^-6 | 0.000109 | 4.50×10^-9 | 0 | - | 0.619 | A | 1.08 | 1.25
rs10758669 | 9p24 | 4.94 - 5.26 | 6.80×10^-7 | 0.00043 | 3.46×10^-9 | 3 | JAK2 | 0.348 | C | 1.12 | 1.21
rs17582416 | 10p11 | 35.30 - 35.60 | 8.48×10^-6 | 2.53×10^-5 | 1.79×10^-9 | 3 | - | 0.345 | G | 1.16 | 1.26
rs7927894 | 11q13 | 75.80 - 76.02 | 1.43×10^-7 | 0.000732 | 1.32×10^-9 | 1 | C11orf30 | 0.386 | T | 1.16 | 1.07
rs11175593 | 12q12 | 38.61 - 39.31 | 1.33×10^-7 | 0.000165 | 3.08×10^-10 | 3 | LRRK2, MUC19 | 0.017 | T | 1.54 | 1.44
rs3764147 | 13q14 | 43.13 - 43.54 | 1.61×10^-7 | 1.33×10^-7 | 2.08×10^-13 | 3 | - | 0.221 | G | 1.25 | 1.19
rs2872507 | 17q21 | 34.63 - 35.34 | 2.12×10^-6 | 0.000292 | 5.00×10^-9 | 17 | ORMDL3 | 0.473 | A | 1.12 | 1.24
rs744166 | 17q21 | 37.74 - 37.95 | 5.94×10^-6 | 9.15×10^-8 | 6.82×10^-12 | 4 | STAT3 | 0.565 | A | 1.18 | 1.25
rs1736135 | 21q21 | 15.73 - 15.76 | 2.06×10^-5 | 4.58×10^-5 | 7.40×10^-9 | 0 | - | 0.565 | T | 1.18 | 1.10
rs762421 | 21q22 | 44.43 - 44.48 | 1.08×10^-5 | 1.59×10^-5 | 1.41×10^-9 | 1 | ICOSLG | 0.389 | G | 1.13 | 1.21

RAF is risk allele frequency in control samples (see Supplementary Table 5 for details). Critical region is in NCBI B35 coordinates, with definition as described in Methods. Risk alleles are defined relative to the + strand of the reference.

* Regions where causal variants have been convincingly mapped, rendering the LD window uninformative.
** PTGER4 is outside the critical region, but was implicated via eQTL analysis.

Replication of 21 new loci

As these 74 regions included the 11 already reported as independently replicated and meeting genome-wide significance thresholds, this replication experiment effectively explored 63 putative associations in novel regions with 11 positive controls (Supplementary Table 1). To identify the true risk factors from these 63 regions, we undertook a replication study involving a total of 2,325 additional Crohn's disease cases and 1,809 controls alongside an independent family-based dataset of 1,339 parent-parent-affected offspring trios.

Results (significance levels and odds ratios) for strongly replicating loci, including all positive controls, are presented in Table 2. The distribution of Z-scores from the 63 putative regions shows a dramatic departure from the null distribution (Figure 3) with 19 novel regions showing significant replication (p < 0.0008 – a value of 0.05/63 representing a conservative threshold expected to be exceeded only once by chance in 20 such replication experiments). SNPs on chromosome 19p13 (replication p = 0.00347, combined p = 2.12×10^-9) and in the MHC (replication p = 0.006, combined p = 5.2×10^-9; suspected but not previously conclusively established in Crohn's disease) did not reach this conservative threshold, but so convincingly satisfy proposed thresholds for genome-wide significance (p < 5×10^-8, Methods) that we propose these as the 20th and 21st additional Crohn's disease associated loci defined here. A further 8 of the 42 remaining loci showed nominal replication (Table 3).
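The replication threshold is simple arithmetic and can be checked directly. The sketch below computes 0.05/63 and applies it to four replication p values copied from Tables 2 and 3; the dictionary and variable names are illustrative only.

```python
# Bonferroni threshold for 63 putative novel regions.
n_regions = 63
threshold = 0.05 / n_regions

# Replication p values as listed in Tables 2 and 3.
replication_p = {
    "rs2274910 (1q23)": 0.000481,
    "rs7927894 (11q13)": 0.000732,
    "rs4807569 (19p13)": 0.00347,
    "rs3763313 (MHC)": 0.00602,
}

# True where the replication p value clears the Bonferroni threshold.
passes = {snp: p < threshold for snp, p in replication_p.items()}

print(f"threshold = {threshold:.6f}")
for snp, ok in passes.items():
    print(f"{snp}: {'passes' if ok else 'does not pass'}")
```

The 19p13 and MHC signals fall short of this replication threshold, which is why the text argues for them on the basis of their combined p values instead.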

Figure 3. Distribution of observed Z scores from the 63 novel regions explored, along with the expected distribution under the null (a standard normal with mean 0 and variance 1). Even setting aside the 21 regions reaching genome-wide significance, the distribution is highly skewed – 4 more results exceed a Z of 2 (1 would be expected by chance under the null) whilst none showed a Z of less than -2 (same expectation under the null) suggesting that even more of the regions investigated here are likely to constitute true positive associations when additional data become available.
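The "1 would be expected by chance" figure follows from the tail area of a standard normal. The sketch below assumes the relevant denominator is the 42 regions left after setting aside the 21 genome-wide significant ones; that reading is an inference from the caption rather than something it states outright.

```python
from statistics import NormalDist

# Regions remaining after setting aside the 21 genome-wide significant loci.
n_remaining = 63 - 21

# Probability that a standard normal exceeds 2 (by symmetry, also P(Z < -2)).
upper_tail = 1 - NormalDist().cdf(2.0)

expected_per_tail = n_remaining * upper_tail
print(f"P(Z > 2) = {upper_tail:.4f}")
print(f"expected regions beyond Z = 2 in each tail: {expected_per_tail:.2f}")
```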

Table 3

Nominally (p < 0.05) replicated CD risk loci

SNP | Chr | Critical region (Mb) | Scan p | Replication p | Combined p | Num. genes | Gene of interest | RAF | Risk allele | OR (case-control) | OR (TDT)
rs4807569 | 19p13 | 1.05 - 1.15 | 1.16×10^-8 | 0.00347 | 2.12×10^-9 | 2 | - | 0.217 | C | 1.02 | 1.26
rs780094 | 2p23 | 27.30 - 27.77 | 3.82×10^-6 | 0.00381 | 3.14×10^-7 | 22 | GCKR | 0.397 | T | 1.08 | 1.13
rs3763313 | 6p21 | 32.44 - 32.79* | 1.45×10^-8 | 0.00602 | 5.20×10^-9 | 7 | BTNL2, DRA, DRB, DQA | 0.188 | C | 1.19 | 1.01
rs13003464 | 2p16 | 61.09 - 61.14 | 3.44×10^-5 | 0.00565 | 4.60×10^-6 | 1 | CCDC139 | 0.376 | G | 1.16 | 1.08
rs991804 | 17q12 | 29.57 - 29.70 | 4.02×10^-6 | 0.0135 | 1.07×10^-6 | 4 | CCL2, CCL7 | 0.726 | C | 1.1 | 1.08
rs12529198 | 6p25 | 5.04 - 5.11 | 7.08×10^-7 | 0.0192 | 6.96×10^-7 | 1 | LYRM4 | 0.062 | G | 1.12 | 1.19
rs17309827 | 6p25 | 3.36 - 3.42 | 2.08×10^-6 | 0.0391 | 2.74×10^-6 | 1 | SLC22A23 | 0.639 | T | 1.1 | 1.02
rs7758080 | 6q25 | 149.54 - 149.65 | 7.28×10^-6 | 0.044 | 8.78×10^-6 | 0 | - | 0.274 | G | 1.12 | 0.99
rs8098673 | 18q11 | 17.74 - 17.93 | 3.18×10^-5 | 0.0443 | 2.88×10^-5 | 0 | - | 0.329 | C | 1.05 | 1.09
rs917997 | 2q11 | 102.31 - 102.64 | 2.16×10^-5 | 0.0493 | 2.22×10^-5 | 5 | IL18RAP | 0.222 | T | 1.05 | 1.11

RAF is risk allele frequency in control samples (see Supplementary Table 5 for details). Critical region is in NCBI B35 coordinates, with definition as described in Methods. Risk alleles are defined relative to the + strand of the reference.

* SNPs with p < 0.0001 were observed throughout the MHC from 30.2 – 32.9 Mb but only this largest signal from the region was followed up. More detailed study of the MHC will be required to identify and localize potentially independent signals from this region.

It is possible that extreme population substructure in the replication sample could give rise to such a striking excess of hits. While unlikely, this was directly evaluated by the large family-based component of the replication study. Odds ratio estimates from the TDT analysis of the North American, French and Belgian families alone are consistent with those from the UK and Belgian case/control samples (Tables 2 and 3), with all 21 newly defined loci showing odds ratios in the same direction of association with the original scan in the family-based component (and nearly half showing greater OR than in the case-control arm). Importantly, none of the significantly or nominally replicating loci show significant evidence for heterogeneity (across studies or between family-based and population-based arms) when corrected for the number of tests performed. This independent family-based evidence (Supplementary Table 6) confirms these alleles constitute true Crohn's disease loci.
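For readers unfamiliar with the family-based arm, the basic transmission disequilibrium test (TDT) compares how often heterozygous parents transmit the risk allele to an affected child versus how often they do not. The sketch below shows the textbook statistic and the matching odds ratio estimate; the counts are hypothetical and the authors' actual TDT implementation is not described in this section.

```python
import math


def tdt(transmitted, untransmitted):
    """Basic TDT from transmissions by heterozygous parents.

    Returns the odds ratio estimate, the 1-df chi-square statistic,
    and its p value.
    """
    b, c = transmitted, untransmitted
    odds_ratio = b / c
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value


# Hypothetical counts: risk allele transmitted 120 times, not transmitted 100 times.
odds_ratio, chi2, p_value = tdt(120, 100)
print(f"OR = {odds_ratio:.2f}, chi2 = {chi2:.2f}, p = {p_value:.3f}")
```

Because transmitted and untransmitted alleles come from the same parents, the comparison is matched for ancestry, which is why this arm guards against population substructure.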

For this newly expanded set of 32 unequivocally associated loci, we assessed whether there was evidence of significant pairwise interactions which could add further to the overall variance in liability explained by this set of loci. We performed a case-only analysis of the 3,664 cases in the replication study and observed no interactions that withstood a correction for the number of tests performed (Supplementary Table 2).
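A case-only interaction analysis rests on the idea that, for unlinked loci, genotypes should be uncorrelated among cases unless the loci deviate from multiplicative effects. The sketch below counts the pairwise tests among 32 loci and shows one simple form of the test, a correlation of risk-allele counts with N·r² referred to a 1-df chi-square. Both the choice of statistic and the Bonferroni correction over all pairs are assumptions; the genotype vectors are hypothetical.

```python
import math

n_loci = 32
n_pairs = math.comb(n_loci, 2)      # number of distinct locus pairs
pair_threshold = 0.05 / n_pairs     # Bonferroni threshold over all pairs


def case_only_test(geno_a, geno_b):
    """Correlation of risk-allele counts (0/1/2) at two loci among cases.

    Returns the Pearson correlation r and the p value of N * r^2
    against a chi-square distribution with 1 degree of freedom.
    """
    n = len(geno_a)
    mean_a = sum(geno_a) / n
    mean_b = sum(geno_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(geno_a, geno_b))
    var_a = sum((a - mean_a) ** 2 for a in geno_a)
    var_b = sum((b - mean_b) ** 2 for b in geno_b)
    r = cov / math.sqrt(var_a * var_b)
    chi2 = n * r * r
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return r, p_value


# Hypothetical genotypes for nine cases, fully crossed so that r is zero.
locus_a = [0, 1, 2, 0, 1, 2, 0, 1, 2]
locus_b = [0, 0, 0, 1, 1, 1, 2, 2, 2]
r, p_value = case_only_test(locus_a, locus_b)
print(f"{n_pairs} pairs, threshold {pair_threshold:.2e}, r = {r:.2f}, p = {p_value:.2f}")
```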

Deciphering the genetic architecture of CD

The contributions of the 32 loci to disease risk were computed using a standard liability threshold model and are displayed as a histogram of individual variances (Figure 4). The observations from this variance analysis that many loci were detected for which the current study had low power, and that only a minority of the variance in risk is explained by these 32 loci, suggest that many additional loci are yet to be identified. This is reinforced by the additional 8 nominal replications (Table 3) where only 2 or 3 would be expected by chance, and by the continued excess of small p values when these 40 total regions are removed (Figure 2).
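The "2 or 3 would be expected by chance" statement can be checked with a binomial calculation. The sketch below assumes the 42 remaining regions behave as independent tests, each with a 5% chance of nominal replication under the null; whether the nominal test was one- or two-sided is not stated here, so the tail probability is indicative only.

```python
from math import comb

n_tested = 42     # regions remaining after the 21 newly defined loci
alpha = 0.05      # nominal replication threshold

expected_by_chance = n_tested * alpha


def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


p_at_least_8 = binom_tail(8, n_tested, alpha)
print(f"expected by chance: {expected_by_chance:.1f}")
print(f"P(at least 8 nominal replications under the null) = {p_at_least_8:.1e}")
```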

Figure 4. Histogram of percent variance explained by each of the 32 established CD risk loci. The distribution resembles the long postulated exponential distribution of effect sizes. Dashed line shows the joint power for our meta-analysis to detect (p < 5×10^-5), and for our replication sample to replicate (at Bonferroni corrected p values), a variant of 20% frequency explaining a given fraction of variance. Note how quickly this curve moves from nearly zero power to detect tiny effects (less than one tenth of one percent) to nearly full power to detect larger effects (presuming they are well covered by the current generation of GWAS chips). Complete power near the origin would likely reveal a more complete exponential distribution, with many very small effects. The variance estimates are likely to increase somewhat once the causal variant or variants are identified in each locus. Indeed, NOD2 and IL23R are distant outliers, each explaining 1-2% of total variance, partially because multiple causal variants have already been discovered at these loci [6,13].

While recognizing that fine-mapping is required to identify specific causal variants, we performed a series of analyses to gain some general insight into the CD associations. We first queried HapMap to discover any instances where a non-synonymous SNP (nsSNP) was correlated (r > 0.5) to the most associated variant discovered in this study. Accepting that HapMap is not a complete catalogue of nsSNPs, but including four loci where fine-mapping has identified coding variants, just 9 of the 32 genome-wide significant associations were correlated with a known nsSNP (Supplementary Table 3). To explore whether any of the associations reflect a cis-acting regulatory effect on a nearby gene, we evaluated genotype-expression correlation using the panel of 400 lymphoblastoid cell lines described by Dixon et al. [17]. From all genes within 250 kb of the LD-based intervals defined in Tables 2 and 3, five correlations between expression of a nearby gene and a CD-associated variant were identified (LOD > 2) (Supplementary Table 4). This was far in excess of chance (p∼0.001) (Supplementary Figure 1) and suggests that regulatory variation also contributes to the genetic architecture identified.

Discussion

Genome-wide association studies provide a systematic assessment of the contribution of common variation to disease pathogenesis. A limiting factor is often the size of the case-control dataset, and hence the power to detect any but the most strongly associated loci. Meta-analysis of existing data provides an obvious potential solution. As Figure 1 demonstrates, our expectation was that the additional power of the combined dataset would result in the identification of a substantially larger number of readily replicating associations than were derived from any of the smaller, constituent datasets. However, the paradigm of exploring common genetic variation with similar effects across studies (in this case all of European descent) needs testing before its results can be accepted as valid.

On the validity of the method our results are substantially reassuring. All 11 previously confirmed CD susceptibility loci were strongly replicated both in the meta-analysis and follow-up experiment. These include the two widely replicated findings from studies published in 2001 [13,15] as well as all of the compelling findings from individual GWAS (Table 2a). Significantly, we have also identified and replicated 21 new CD susceptibility loci. Using a conservative threshold for significance (only 1 such region would be expected by chance in 20 such experiments), the loci with clear evidence for association in the replication panel include a very high proportion of those showing strongest signals in the meta-analysis (Supplementary Table 1) – 9 of 9 previously unreported regions with p < 5×10^-7 in the combined scan were replicated convincingly - emphasizing the validity of the meta-analysis results. Further emphasizing the robustness of these results, all 21 of these loci exceed a conservative genome-wide level of significance (p < 5×10^-8) by a significant margin (all but two have p < 5×10) - and equivalent strength of association was observed in the family-based subset of our replication sample.

In keeping with other regions recently identified as associated with CD, the 21 new loci do not conform to any obvious pattern in terms of gene content. Thus, as shown in Table 2, some loci (defined by HapMap recombination hotspots flanking the set of correlated, associated variants) contain just a single gene, some contain many genes and others none. Clearly the first category provides the most immediate clues regarding pathogenic mechanisms. These genes are discussed briefly in Box 1, together with a number of genes which constitute striking candidates from regions with only a handful of transcripts. Included among these are compelling functional candidates such as STAT3, JAK2 and IL12B while others, such as CDKAL1 and PTPN22, highlight potentially intriguing contrasts between genetic susceptibility to Crohn's disease and some other complex disorders (Box 1). It is noteworthy – and consistent with previous findings from CD and other complex diseases – that we did not find any strong evidence of deviation from the model of multiplicative (random) effects when we tested for gene-gene interactions among the 32 confirmed associations. This is in spite of the fact that some of these genes seem to affect the same or overlapping pathways.

BOX 1. Noteworthy genes within loci newly implicated in Crohn's pathogenesis

  • Chemokine receptor 6 (CCR6): encoding a member of the G protein-coupled chemokine receptor family, this homing receptor is expressed by immature dendritic cells and memory T cells and is important for B-cell differentiation and tissue specific migration of dendritic and T cells during epithelial inflammatory and immunological responses 25. The ligand of this receptor is macrophage inflammatory protein 3 alpha (MIP-3 alpha); both genes are expressed in granulomas of pulmonary sarcoid 26. Recent studies have also demonstrated that CCR6, IL23R and RORγT are selectively expressed by IL-17 producing cells and IFNγ producing TH17/TH1 cells in CD27.

  • Interleukin IL12B: encodes the p40 subunit which is a constituent of both heterodimeric interleukins IL-12 and IL-2328. Association with CD was previously reported5 but not confirmed, and it is also known to be associated with psoriasis7. The key role of the IL12/IL23 pathway in chronic intestinal inflammation is supported by the association between IL23R and CD3 and strong functional evidence from mouse models of colitis 2932.

  • Signal transducer and activator of transcription 3 (STAT3) and Janus kinase 2 (JAK2): the JAK-STAT pathway is a focal point in signal transmission downstream of cytokine and growth factor signals from cell surface receptors to the nucleus to modify transcription of various genes, notably in hematopoietic cells. The present findings are particularly significant, given the role of both genes in IL23R signaling 33, and the central role of STAT3 in Th17 differentiation 34. However, JAK2 and STAT3 are also downstream of several other cytokines implicated in CD pathogenesis in addition to interleukin 23, highlighting the pathophysiologic complexity of these new associations. Further complexity is highlighted by the distinctly different roles of STAT3 in innate versus adaptive immunity in murine colitis models: activation of STAT3 in innate immune cells enhances mucosal barrier function whereas STAT3 activation in T-cells exacerbates colitis.

  • Leucine-rich repeat kinase 2 (LRRK2). This gene encodes a multi-domain protein expressed mainly in the cytoplasm of neurons, myeloid cells and monocytes, and mutations in LRRK2 have been strongly associated with Parkinson's disease35. A recent study reported the induction of autophagy by mutant LRRK2,41 which is of interest given the strong associations between CD and the autophagy genes ATG16L1 and IRGM.25 The same locus also contains the gene MUC19, which encodes a large protein with multiple serine/threonine-rich repeats characteristic of the mucin gene family. The mucin proteins are core components of the mucus layer which protects the intestinal epithelia from injury, and mucin-deficiency potentiates intestinal inflammation in mouse models of colitis36.

  • CDKAL1: the protein encoded by this gene is poorly characterized, but CDKAL1 is noteworthy for being recently confirmed as a type 2 diabetes susceptibility gene 243739. In this study, we find that SNPs from the same intron of CDKAL1 that shows association with T2D are associated with CD, but the associated alleles for the two diseases are not correlated with each other.

  • Inducible T-cell co-stimulator ligand (ICOSLG): this co-stimulatory molecule is expressed on intestinal (and other) epithelial cells and may play a role in their antigen presentation to and regulation of mucosal T lymphocytes40. Upon maturation, plasmacytoid dendritic cells express ICOSLG and drive the generation of IL-10 producing T regulatory cells41.

  • Protein tyrosine phosphatase, non-receptor types 2 and 22 (PTPN2 and PTPN22). Both of these genes are associated with other autoimmune and inflammatory diseases and the effect described here for PTPN2 is similar to that previously described for type 1 diabetes (T1D) 42. However, the association of PTPN22 with CD, although mapping to the same coding variant (R620W) that is a risk factor for T1D and rheumatoid arthritis,4344 is in the opposite direction, with the T1D and RA risk allele, 620W, offering protection from CD.

  • Intelectin 1 (ITLN1) is known to be expressed in human small bowel and colon, and encodes a 120-kDa homotrimeric lectin recognizing galactofuranosyl residues found in cell walls of various microorganisms but not in mammals45. Human intelectin-1 is structurally identical to the lactoferrin receptor (LFR), expressed within the enterocyte brush border, and appears critical in membrane stabilization, preventing loss of digestive enzymes, and protecting the glycolipid microdomains from pathogens46. In addition, intelectin expression is reported in Paneth cells in both mouse and pig small intestine, further pointing to a role in innate immunity.

For loci containing multiple genes or no genes the picture is less well defined. The identified paucity of correlation between associated SNPs and coding variation suggests that these loci may, in particular, benefit from eQTL (expression quantitative trait locus) analysis. This seeks correlation between genotype and expression patterns – bearing in mind that such functional relationships need not respect the specific boundaries of LD around the association. One of our groups previously reported an eQTL effect incriminating PTGER4 at the 5p13 locus9. A striking outcome from our present analysis was at the established IBD5 locus 15, where CD-associated SNPs were associated with decreased SLC22A5 mRNA expression levels. While a SNP had previously been proposed as regulating SLC22A5 transcriptional activity18, these data suggest for the first time that the most disease-associated variants in the IBD5 region, including a coding variant in neighboring SLC22A4, are the same variants most associated with SLC22A5 expression. Equally striking, the most significant Crohn's disease associated eQTL reported here affects ORMDL3 (LOD = 20) on chromosome 17 and SNPs in precisely the same region were recently shown to be strongly associated with childhood asthma.19 This suggests that the same polymorphisms might underlie susceptibility to both CD and asthma, possibly by perturbing ORMDL3 expression.

The new loci that we have identified are of modest effect size, which is unsurprising given all loci with larger impact on disease risk were – as might be expected – discovered in the original scans. The small sizes of these effects explain the lack of overlap between linkage results in CD and these newly discovered loci (Supplementary Figure 2), with the possible exceptions of combined effects of multiple high ranking associations on chromosomes 5q and 6p. Indeed, the linkage evidence that led to the discovery of the IBD5 locus was very likely boosted by the nearby effects at IL12B and IRGM. As expected, the only gene conclusively discovered via linkage (NOD2) is one of two loci which stand well out from the remainder of the distribution of effect sizes (Figure 4). The other outlier, IL23R, illustrates an interesting characteristic of linkage – because (unlike NOD2) the most penetrant risk allele has very high frequency (93%), it is nearly invisible to linkage analysis despite the high OR; highly protective rare alleles are simply not present in multiplex affected families and thus do not influence allele sharing substantially.

Using a liability-threshold model, we estimate that the 32 loci identified to date explain about 10% of the overall variance in disease risk, which may be as much as a fifth of the genetic risk, given previous estimates of CD heritability of approximately 50%.20 This observation is consistent with the fact that these loci collectively contribute only a factor of two to sibling relative risk (λs), and even this figure is dominated by the substantial contribution of NOD2 variants. However, it should be emphasized that the full impact of the new loci cannot be determined until causal variants have been identified by directed sequencing and fine-mapping experiments. Until then the proportion of the variance in Crohn's disease risk explained must be measured from the confirmed SNPs, where association is due to LD with causal variants. Since multiple causal variants might exist at each locus (ranging in frequency from rare to common) our estimates of variance explained provide only a lower bound for the true contribution of each locus.

In conjunction with results from a very similar gene discovery effort in type 2 diabetes21, common lessons are beginning to emerge with respect to the genetic architecture of complex traits. In each example, a substantial increase in sample size achieved through meta-analysis has led to dramatic success in gene discovery. In all cases, this progress has revealed an underlying architecture consistent with many individually modest effects which conventional genetic linkage analysis, and even the largest individual genome-wide association studies, are not well powered to detect. Common variants explaining more than 1% of the genetic variance are rare, whereas well-powered studies have found dozens of variants contributing 0.1% of overall variance in liability. Perhaps surprisingly, neither we nor others have yet documented a substantial role for epistasis among these loci, and a number of associated loci are conclusively mapped to regions with no currently annotated protein coding genes. Despite the considerable concordant success, only a distinct minority of the overall heritability has been explained by these documented associations.

Since our study is well-powered to identify loci that explain > 0.2% of the overall variance, but the sum of such loci explains a relatively small fraction of the total, it seems likely that many loci with even more modest effect sizes remain undiscovered. Of particular note is the continued excess of associations outside of the regions studied here, as well as the nominal replication of an additional 8 loci, notably greater than expected by chance. Overall, the distribution of Z scores in the replication experiment is clearly skewed towards replication – only 11 of the 63 Z-scores in this replication experiment generate Z<0. If only the 21 strongly confirmed loci were genuinely associated, half of the 42 remaining should end up with Z<0. Indeed, observing 8 of the 42 remaining tests with Z>1.5 is itself a highly significant observation (p < 0.0001). Although modest in terms of effect size, identification of such loci is likely to still provide important insights into pathogenic mechanisms, as biological importance need not be proportional to the statistical evidence for genetic association. Closer inspection of regions showing nominal association in the replication experiment reveals that a number of transcripts in these loci are of considerable interest, including CCL2/CCL722, IL18RAP23 and GCKR24.
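The sign argument in the preceding paragraph can be sketched as a one-sided binomial test. This is only an illustration, under the assumption that all 11 negative Z-scores fall among the 42 loci that were not strongly confirmed and that each of those would be negative with probability 0.5 if none were truly associated. The function name is hypothetical, and the sketch does not attempt to reproduce the p-value quoted above for Z>1.5.

```python
from math import comb

def binom_lower_tail(k, n, p=0.5):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Counts taken from the text: 63 replication tests, 21 strongly confirmed.
n_remaining = 63 - 21
# Assumption: every negative Z-score belongs to one of the remaining loci.
n_negative = 11

# Probability of seeing 11 or fewer negative Z-scores out of 42 by chance.
p_sign = binom_lower_tail(n_negative, n_remaining)
print(f"one-sided sign-test p-value: {p_sign:.4g}")
```

A small value indicates that the remaining tests are skewed towards replication more than chance alone would produce.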

It is important to note that the generation of GWAS arrays used in the scans here did not offer complete genome coverage of common variation (additional loci may reside in poorly covered intervals) and did not address either rare SNPs or copy number variation effectively. Thus in spite of the wealth of new susceptibility genes and loci identified by the current study, it seems implausible that there are not more to be found – albeit very large datasets are likely to be required to achieve robust statistical support for them. With respect to the present findings, there is much work to be done in resequencing and fine mapping to identify causal variants. While we do not yet have a complete understanding of the genetic architecture of Crohn's disease, dramatic progress has now been made towards this goal - and with it the prospect of directed functional exploration of the pathways identified, insight into how risk alleles interact with environmental modifiers, and the hope of new avenues for treatment.

Methods

Crohn's disease patients, controls, and GWAS

The meta-analysis was based on data from the 3 genome-wide scans of the NIDDK4, WTCCC5 and Belgian/French9 studies. Details of the numbers of cases and controls genotyped in the respective scans and of the genotyping platforms used are shown in Table 1, as are case/control and family cohorts genotyped in the replication study of the meta-analysis. Details of the ascertainment and characterization of these cohorts, as well as quality control procedures applied to the GWA datasets, were provided in the original scan and replication publications 34569. Recruitment of study subjects was approved by local and national institutional review boards, and informed consent was obtained from all participants.

Imputation

Briefly, these methods rely on observed haplotype patterns in a set of reference data (the HapMap) and the actual genotype data from each project to make predictions (along with a measure of statistical certainty) at un-genotyped SNPs. We used the program MACH 10 with the NIDDK and Belgian/French data, and IMPUTE 11 with the WTCCC data. Comparisons between the two algorithms yielded very similar results (data not shown). We imputed the superset of polymorphic markers which passed QC in the original scans459. This set comprised SNPs on either the Affymetrix 500K only (n = 350,507), Illumina HumanHap300 version 1 only (n = 238,935), or both panels (n = 46,105) such that all association tests performed were at least partially based on observed genotype data.
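As a rough illustration of how an imputed genotype can be carried forward together with its uncertainty, the sketch below converts posterior genotype probabilities into an expected allele count (dosage) and a posterior variance. The function names and the example probabilities are hypothetical. MACH and IMPUTE each define their own output formats and quality metrics, which are not reproduced here.

```python
def expected_dosage(p_aa, p_ab, p_bb):
    """Expected count of allele B given posterior genotype probabilities."""
    total = p_aa + p_ab + p_bb
    if abs(total - 1.0) > 1e-6:
        raise ValueError("genotype probabilities must sum to 1")
    return p_ab + 2.0 * p_bb

def dosage_variance(p_aa, p_ab, p_bb):
    """Posterior variance of the allele count; 0 means a certain call."""
    mean = expected_dosage(p_aa, p_ab, p_bb)
    second_moment = p_ab + 4.0 * p_bb  # E[X^2] with X in {0, 1, 2}
    return second_moment - mean ** 2

# Hypothetical imputed genotype: most likely heterozygous, some uncertainty.
probs = (0.10, 0.80, 0.10)
dosage = expected_dosage(*probs)
uncertainty = dosage_variance(*probs)
print(dosage, uncertainty)
```

A directly genotyped SNP corresponds to a probability of 1 on a single genotype, giving zero posterior variance.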

Test for association, effect size estimation and interactions

Using the genotype probabilities (rather than best-guess genotypes) and empirical variances for imputed markers in the case and control tallies, we summarized the standard 1 d.f. allele-based test of association as a Z-score within each scan and combined scores across studies to produce a single meta-statistic for each SNP across all three datasets. Odds ratios were estimated separately in TDT samples and each case/control replication collection, and then combined and tested for heterogeneity47. Interaction tests were performed using the case-only epistasis test implemented in PLINK48.
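The combination of per-scan Z-scores into one meta-statistic can be sketched as follows. The text does not state how the scans were weighted, so this sketch assumes a common choice: weights proportional to the square root of each scan's effective sample size. The per-scan Z-scores are hypothetical; the sample sizes are those of Table 1.

```python
import math

def effective_n(n_cases, n_controls):
    """Effective sample size of a case-control scan (assumed weighting basis)."""
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)

def combine_z(z_scores, weights):
    """Weighted sum of Z-scores, rescaled to unit variance under the null."""
    numerator = sum(w * z for z, w in zip(z_scores, weights))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator

def two_sided_p(z):
    """Two-sided p-value of a standard normal Z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# (cases, controls) per scan, from Table 1.
scans = {"NIDDK": (946, 977), "BEL/FR": (536, 914), "UK": (1748, 2938)}
weights = [math.sqrt(effective_n(*scans[name])) for name in scans]

z_per_scan = [2.0, 1.5, 3.0]  # hypothetical Z-scores for a single SNP
z_meta = combine_z(z_per_scan, weights)
p_meta = two_sided_p(z_meta)
print(f"meta Z = {z_meta:.2f}, p = {p_meta:.2g}")
```

Because each scan is summarized before combination, cases are only ever compared with controls from the same study, which preserves the case-control matching of each scan.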

Critical regions

Given that most associations contain many correlated SNPs showing signal, we demarcated independent loci by first defining the set of HapMap SNPs with r > 0.5 to the most significantly associated SNP. We then bounded the “critical region” by the flanking HapMap recombination hotspots which contained this set. These windows very likely contain the causal polymorphisms explaining the associations.
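The two-step definition above can be sketched as below: collect the SNPs correlated with the lead SNP above the stated threshold, then extend outwards to the nearest flanking hotspot on each side. Positions, correlation values and the function name are hypothetical, and hotspots lying inside the correlated set are simply ignored in this simplified version.

```python
def critical_region(lead_pos, snps, hotspots, r_threshold=0.5):
    """Return (start, end) hotspot positions flanking the correlated SNP set.

    snps: iterable of (position, correlation_with_lead_snp)
    hotspots: iterable of recombination hotspot positions
    """
    correlated = [pos for pos, r in snps if r > r_threshold]
    correlated.append(lead_pos)  # the lead SNP always belongs to the set
    left_edge, right_edge = min(correlated), max(correlated)

    left = [h for h in hotspots if h <= left_edge]
    right = [h for h in hotspots if h >= right_edge]
    start = max(left) if left else None  # None: no flanking hotspot known
    end = min(right) if right else None
    return start, end

# Hypothetical example, positions in base pairs.
snps = [(900, 0.8), (1200, 0.6), (2000, 0.1)]
hotspots = [500, 1100, 1500, 2500]
region = critical_region(1000, snps, hotspots)
print(region)
```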

Replication

We defined loci to have been previously confirmed if an earlier study had both detected and replicated the association in independent samples and the association achieved p < 5 × 10 (recently proposed as an appropriate genome-wide significance level for GWAS49). For replication genotyping, we selected the most significantly associated SNP from each region along with a second, correlated SNP with p<0.0001 or a second assay on the opposite strand in order to have a technical backup should the first fail genotyping (Supplementary Table 1). Replication genotyping for the putatively associated loci was performed using primer extension chemistry and mass spectrometric analysis (iPLEX, Sequenom) using Sequenom Genetics Services (N. American panel) and Genome Research Limited, Wellcome Trust Sanger Institute (UK panel), and using a custom-made Golden Gate assay on a Beadstation500 (Illumina), following the manufacturer's recommendations (Belgian/French panel). The more completely genotyped SNP of the two from each region was chosen to represent that regional association in analysis (if both were completely typed, the SNP that was more strongly associated in the scan was used). Samples with >10% missing data (n = 267 for Belgian/French data, 111 for the UK data and 8 for the N. American data; these samples are not included in the tallies for Table 1), as well as SNPs with >10% missing data or Hardy-Weinberg p value < 0.001 were excluded from this analysis.
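The SNP-level exclusion criteria in the paragraph above (more than 10% missing data, or a Hardy-Weinberg p value below 0.001) can be sketched as a simple filter. The text does not say which Hardy-Weinberg test was used, so the 1 d.f. chi-square test here is an assumption, as are the function names and the coding of genotypes as 0, 1, 2 with None for a missing call.

```python
import math

def missing_rate(calls):
    """Fraction of missing genotype calls (None = missing)."""
    return sum(c is None for c in calls) / len(calls)

def hwe_p_value(n_aa, n_ab, n_bb):
    """1 d.f. chi-square test of Hardy-Weinberg proportions (assumed test)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))

def snp_passes_qc(calls, max_missing=0.10, min_hwe_p=0.001):
    """Apply the missingness and Hardy-Weinberg thresholds to one SNP."""
    called = [c for c in calls if c is not None]
    if missing_rate(calls) > max_missing or not called:
        return False
    counts = [called.count(g) for g in (0, 1, 2)]
    return hwe_p_value(*counts) >= min_hwe_p

# Hypothetical SNP: genotypes in exact Hardy-Weinberg proportions, none missing.
print(snp_passes_qc([0] * 25 + [1] * 50 + [2] * 25))
```

The same `missing_rate` helper could be applied per sample rather than per SNP to mirror the sample-level exclusion.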

Regional Annotation: eQTL analysis

Effects of SNPs in Tables 2 & 3 on expression levels of neighbouring genes were studied using transcriptome data from the ∼400 lymphoblastoid cell lines described by Dixon et al.17. SNPs that were not genotyped on this panel (n=14) were replaced with a proxy with r > 0.95 when possible (n=12). LOD scores > 2 for genes (probe average) located within 250 kb of the corresponding LD windows were retrieved from http://www.sph.umich.edu/csg/liang/asthma/. To evaluate the significance of the findings with the CD associated SNPs, we compared the observed (i) number of genes yielding LOD scores > 2, and (ii) sum of these LOD scores, with the corresponding frequency distributions for 1,000 randomly selected sets of 31 SNPs, matched for allele frequency (± 0.02) and gene context. Window sizes determined for associated SNPs were used for the matched simulated SNPs.
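The comparison against matched random SNP sets amounts to an empirical p-value, sketched below. All numbers are hypothetical placeholders, the selection of matched SNP sets is not shown, and the +1 correction in `empirical_p` is an assumption rather than something stated in the text.

```python
def count_and_sum_lod(lods, threshold=2.0):
    """Number of genes with LOD above the threshold, and the sum of those LODs."""
    hits = [x for x in lods if x > threshold]
    return len(hits), sum(hits)

def empirical_p(observed, null_values):
    """One-sided empirical p-value: share of null sets at least as extreme."""
    at_least_as_extreme = sum(v >= observed for v in null_values)
    return (at_least_as_extreme + 1) / (len(null_values) + 1)

# Hypothetical LOD scores for genes near the associated SNPs.
observed_count, observed_sum = count_and_sum_lod([1.0, 2.5, 20.0])

# Hypothetical statistics from matched random SNP sets (1,000 in the text).
null_counts = [0, 1, 1, 0, 2, 1, 0, 0, 1]
null_sums = [0.0, 2.1, 2.4, 0.0, 5.0, 3.3, 0.0, 0.0, 2.2]

p_count = empirical_p(observed_count, null_counts)
p_sum = empirical_p(observed_sum, null_sums)
print(p_count, p_sum)
```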

URL

Meta-analysis test statistics and allele frequencies for all SNPs are available at: http://www.broad.mit.edu/~jcbarret/ibd-meta/

Acknowledgments

We acknowledge use of DNA from the 1958 British Birth Cohort collection (R. Jones, S. Ring, W. McArdle and M. Pembrey), funded by the Medical Research Council (grant G0000934) and The Wellcome Trust (grant 068545/Z/02) and the UK Blood Services Collection of Common Controls (W. Ouwehand) funded by the Wellcome Trust. We also acknowledge the National Association for Colitis and Crohn's disease and the Wellcome Trust for supporting the case DNA collections, and support from UCB Pharma (unrestricted educational grant) and the NIHR Cambridge Biomedical Research Centre. The National Institute of Diabetes and Digestive and Kidney Disease (NIDDK) IBD Genetics Consortium is funded by the following grants: DK62431 (S.R.B.), DK62422 (J.H.C.), DK62420 (R.H.D.), DK62432 and DK064869 (J.D.R.), DK62423 (M.S.S.), DK62413 (K.D.T.), NIH-AI06277 (R.J.X.) and DK62429 (J.H.C.). Additional support was provided by the Burroughs Wellcome Foundation (J.H.C.), the Crohn's and Colitis Foundation of America (S.R.B., J.H.C.). We thank Peter Gregersen and Annette Lee (Feinstein Medical Research Institute) for their efforts and the use of control samples. This work was supported by grants from (i) the DGTRE from the Walloon Region (n°315422 and CIBLES), (ii) from the Communauté Française de Belgique (Biomod ARC), and (iii) the Belgian Science Policy organisation (SSTC Genefunc and Biomagnet PAI). Edouard Louis, Sarah Hansoul, Denis Franchimont and Severine Vermeire are fellows of the Belgian FNRS and NFWO. Cynthia Sandor is a fellow of the FRIA.
We are grateful to all the clinicians, consultants and nursing staff who recruited patients, including: Jean-Marc Maisin*, Vinciane Muls*, Jean Van Cauter*, Marc Van Gossum*, Philippe Closset*, Pierre Hayard* and Jean Michel Ghilain*; Paul Mainguet°, Faddy Mokaddem°, Fernand Fontaine°, Jacques Deflandre°, and Hubert Demolin°; Jean-Frédéric Colombel, Marc Lemann, Sven Almer, Curt Tysk, Yigael Finkel, Miquel Gassul, Colm O'Morain, Vibeke Binder and Jean-Pierre Cézard (*Erasme-BBIH-IBD; ° Ulg Collaborators; INSERM collaborators). Sincere thanks to L. Liang for his assistance in accessing the eQTL database, and to Françoise Merlin for expert technical assistance. Finally, we thank all subjects who contributed samples.

Bioinformatics and Statistical Genetics, Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford OX3 7BN, UK
Unit of Animal Genomics, GIGA-R & Faculty of Veterinary Medicine, University of Liège, Belgium
University of Chicago, Department of Medicine, 5801 South Ellis, Chicago, Illinois 60637, USA
Yale University, Departments of Medicine and Genetics, Division of Gastroenterology, Inflammatory Bowel Disease (IBD) Center, 300 Cedar Street, New Haven, Connecticut 06519, USA
University of Pittsburgh, Graduate School of Public Health, Department of Human Genetics, 130 Desoto Street, Pittsburgh, Pennsylvania 15261, USA
University of Pittsburgh, School of Medicine, Department of Medicine, Division of Gastroenterology, Hepatology and Nutrition, University of Pittsburgh Medical Center (UPMC) Presbyterian, 200 Lothrop Street, Pittsburgh, Pennsylvania 15213, USA
Université de Montréal and the Montreal Heart Institute, Research Center, 5000 rue Belanger, Montreal, Quebec H1T 1C8, Canada
The Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, Massachusetts 02142, USA
Johns Hopkins University, Department of Medicine, Harvey M. and Lyn P. Meyerhoff Inflammatory Bowel Disease Center, 1503 East Jefferson Street, Baltimore, Maryland 21231, USA
Johns Hopkins University, Bloomberg School of Public Health, Department of Epidemiology, 615 E. Wolfe Street, Baltimore, Maryland 21205, USA
Mount Sinai Hospital IBD Centre, University of Toronto, 441-600 University Avenue, Toronto, Ontario M5G 1X5, Canada
Medical Genetics Institute and Inflammatory Bowel Disease (IBD) Center, Cedars-Sinai Medical Center, 8700 W. Beverly Blvd., Los Angeles, California 90048, USA
Department of Medicine, Royal Victoria Hospital, McGill University, Montreal, Quebec, H3A 1A1, Canada
The Hospital for Sick Children, University of Toronto, 555 University Avenue, Toronto, Ontario M5G 1X8, Canada
University of Chicago, Department of Health Studies, 5841 S. Maryland Avenue, Chicago, Illinois 60637, USA
Gastrointestinal Unit and Center for Computational and Integrative Biology, Massachusetts General Hospital, Harvard Medical School, 185 Cambridge Street, Boston, Massachusetts 02114, USA
Centre National de Génotypage, Evry, France
Unit of Hepatology and Gastroenterology, Department of Clinical Sciences, GIGA-R, Faculty of Medicine and CHU de Liège, University of Liège, Belgium
Department of Gastroenterology, Clinique universitaire St Luc, UCL, Brussels, Belgium
Department of Hepatology and Gastroenterology, Ghent University Hospital, Belgium
Department of Gastroenterology, University Hospital Leuven, Belgium
Department of Gastroenterology, Erasmus Hospital, Free University of Brussels, Belgium
INSERM; Université Paris Diderot; Assistance Publique Hôpitaux de Paris; Hopital Robert Debré, Paris, France
Gastrointestinal Unit, Division of Medical Sciences, School of Molecular and Clinical Medicine, University of Edinburgh, Western General Hospital, Edinburgh EH4 2XU, UK
Peninsula Medical School, Barrack Road, Exeter, EX2 5DW
Department of Medical and Molecular Genetics, King's College London School of Medicine, 8th Floor Guy's Tower, Guy's Hospital, London, SE1 9RT, UK
Department of Statistics, University of Oxford, 1 South Parks Road, Oxford OX1 3TG, UK
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
IBD research group, Addenbrooke's Hospital, University of Cambridge, Cambridge CB2 2QQ, UK
Department of Gastroenterology & Hepatology, University of Newcastle upon Tyne, Royal Victoria Infirmary, Newcastle upon Tyne NE1 4LP, UK
Gastroenterology Unit, Radcliffe Infirmary, University of Oxford, Oxford, OX2 6HE, UK
Center for Human Genetic Research, Massachusetts General Hospital, Harvard Medical School, 185 Cambridge Street, Boston, Massachusetts 02114, USA

Abstract

Several new risk factors for Crohn's disease have been identified in recent genome-wide association studies. To advance gene discovery further we have combined the data from three studies (a total of 3,230 cases and 4,829 controls) and performed replication in 3,664 independent cases with a mixture of population-based and family-based controls. The results strongly confirm 11 previously reported loci and provide genome-wide significant evidence for 21 new loci, including the regions containing STAT3, JAK2, ICOSLG, CDKAL1, and ITLN1. The expanded molecular understanding of the basis of disease offers promise for informed therapeutic development.

The first genome-wide association studies (GWAS) have identified many common variants associated with complex diseases, and have rapidly expanded our knowledge of the genetic architecture of these traits. Progress in Crohn's disease (CD), a common idiopathic inflammatory bowel disease (IBD) with high heritability (λs ∼ 20-35), has been especially striking, with recent GWAS publications increasing the number of confirmed associated loci from two to more than ten 1. The results have identified new pathogenic mechanisms of IBD and promise to advance fundamentally our understanding of CD biology. These recent discoveries highlight, for instance, the key importance of autophagy and innate immunity25 as determinants of the dysregulated host-bacterial interactions implicated in disease pathogenesis. Furthermore, genetic associations have been shown to be shared between CD and other auto-inflammatory conditions – for example, IL23R variants 6 are also associated with psoriasis7 and ankylosing spondylitis8, and PTPN2 variants with type 1 diabetes35. As in other complex diseases, restricted sample sizes have resulted in early CD studies focusing on only the strongest effects, which turn out to explain only a fraction of the heritability of disease.

We recently published three separate GWA scans for CD in European-derived populations (refs. 4, 5, 9), the details of which are shown in Table 1. Motivated by the need for larger datasets to improve power to detect loci of modest effect, we carried out a genome-wide meta-analysis of our three CD scans. These analyses, together with a replication study in an equivalently sized, independent panel, have enabled us to identify at genome-wide levels of significance 21 novel Crohn's disease susceptibility genes and loci. This brings the total number of independent loci conclusively associated with Crohn's disease to more than 30 and provides unprecedented insight into both CD pathogenesis and the general genetic architecture of a multifactorial disease.

Table 1

Samples used (post QC) in this study
| | NIDDK | BEL/FR | UK IBDGC | Total |
|---|---|---|---|---|
| Scan cases | 946 | 536 | 1,748 | 3,230 |
| Scan controls | 977 | 914 | 2,938 | 4,829 |
| Replication cases | 0 | 1,082 | 1,243 | 2,325 |
| Replication controls | 0 | 787 | 1,022 | 1,809 |
| Replication trios | 720 | 619 | 0 | 1,339 |
| Nationality | USA/Canadian | Belgian/French | British | |
| Scan platform | Illumina HumanHap300 | Illumina HumanHap300 | Affymetrix GeneChip 500K | |
| Replication platform | Sequenom | Illumina GoldenGate | Sequenom | |
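
To make concrete how results from the three scans in Table 1 can be combined without pooling raw genotypes, the following is a minimal sketch of a sample-size-weighted (Stouffer) Z-score meta-analysis. It is an illustration under stated assumptions, not necessarily the exact test used in this study: the weighting by effective sample size, the function names, and the per-study Z-scores in the example are all hypothetical; only the case and control counts come from Table 1.

```python
import math

# Post-QC scan sample sizes from Table 1: (cases, controls) per study.
SCAN_SIZES = {
    "NIDDK": (946, 977),
    "BEL/FR": (536, 914),
    "UK IBDGC": (1748, 2938),
}


def effective_n(cases, controls):
    """Effective sample size of a case-control study (assumed weighting)."""
    return 4.0 / (1.0 / cases + 1.0 / controls)


def combine_z(z_by_study, sizes=SCAN_SIZES):
    """Combine per-study Z-scores, weighting each by sqrt(effective N)."""
    weights = {s: math.sqrt(effective_n(*sizes[s])) for s in z_by_study}
    numerator = sum(weights[s] * z for s, z in z_by_study.items())
    denominator = math.sqrt(sum(w * w for w in weights.values()))
    return numerator / denominator


def two_sided_p(z):
    """Two-sided P value for a standard normal Z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical per-study Z-scores for a single SNP (illustrative values only).
example_z = {"NIDDK": 2.1, "BEL/FR": 1.8, "UK IBDGC": 3.0}
z_meta = combine_z(example_z)
print(f"meta Z = {z_meta:.2f}, two-sided P = {two_sided_p(z_meta):.2e}")
```

Because each study contributes only a summary statistic, the case-control matching within each scan is preserved, and consistent modest signals across studies reinforce one another in the combined Z-score.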

Footnotes

Article on Nature Genetics website: http://www.nature.com/ng/journal/vaop/ncurrent/abs/ng.175.html

References

  • 1. Mathew CG. New links to the pathogenesis of Crohn disease provided by genome-wide association scans. Nat Rev Genet. 2008;9:9–14.
  • 2. Hampe J, et al. A genome-wide association scan of nonsynonymous SNPs identifies a susceptibility variant for Crohn disease in ATG16L1. Nat Genet. 2007;39:207–11.
  • 3. Parkes M, et al. Sequence variants in the autophagy gene IRGM and multiple other replicating loci contribute to Crohn's disease susceptibility. Nat Genet. 2007;39:830–2.
  • 4. Rioux JD, et al. Genome-wide association study identifies new susceptibility loci for Crohn disease and implicates autophagy in disease pathogenesis. Nat Genet. 2007;39:596–604.
  • 5. WTCCC. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007;447:661–78.
  • 6. Duerr RH, et al. A genome-wide association study identifies IL23R as an inflammatory bowel disease gene. Science. 2006;314:1461–3.
  • 7. Cargill M, et al. A large-scale genetic association study confirms IL12B and leads to the identification of IL23R as psoriasis-risk genes. Am J Hum Genet. 2007;80:273–90.
  • 8. Burton PR, et al. Association scan of 14,500 nonsynonymous SNPs in four diseases identifies autoimmunity variants. Nat Genet. 2007;39:1329–37.
  • 9. Libioulle C, et al. A novel susceptibility locus for Crohn's disease identified by whole genome association maps to a gene desert on chromosome 5p13.1 and modulates the level of expression of the prostaglandin receptor EP4. PLoS Genetics. 2007.
  • 10. Li Y, Abecasis GR. Mach 1.0: Rapid Haplotype Reconstruction and Missing Genotype Inference. Am J Hum Genet. 2006;S79:2290.
  • 11. Marchini J, Howie B, Myers S, McVean G, Donnelly P. A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet. 2007;39:906–13.
  • 12. Clayton DG, et al. Population structure, differential bias and genomic control in a large-scale, case-control association study. Nat Genet. 2005;37:1243–6.
  • 13. Hugot JP, et al. Association of NOD2 leucine-rich repeat variants with susceptibility to Crohn's disease. Nature. 2001;411:599–603.
  • 14. Ogura Y, et al. A frameshift mutation in NOD2 associated with susceptibility to Crohn's disease. Nature. 2001;411:603–6.
  • 15. Rioux JD, et al. Genetic variation in the 5q31 cytokine gene cluster confers susceptibility to Crohn disease. Nat Genet. 2001;29:223–8.
  • 16. Yamazaki K, et al. Single nucleotide polymorphisms in TNFSF15 confer susceptibility to Crohn's disease. Hum Mol Genet. 2005;14:3499–506.
  • 17. Dixon AL, et al. A genome-wide association study of global gene expression. Nat Genet. 2007;39:1202–7.
  • 18. Peltekova VD, et al. Functional variants of OCTN cation transporter genes are associated with Crohn disease. Nat Genet. 2004;36:471–5.
  • 19. Moffatt MF, et al. Genetic variants regulating ORMDL3 expression contribute to the risk of childhood asthma. Nature. 2007;448:470–3.
  • 20. Tysk C, Lindberg E, Jarnerot G, Floderus-Myrhed B. Ulcerative colitis and Crohn's disease in an unselected population of monozygotic and dizygotic twins. A study of heritability and the influence of smoking. Gut. 1988;29:990–6.
  • 21. Zeggini E, et al. Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes. Nat Genet. 2008;40:638–45.
  • 22. Wedemeyer J, et al. Enhanced production of monocyte chemotactic protein 3 in inflammatory bowel disease mucosa. Gut. 1999;44:629–35.
  • 23. Dinarello CA. Interleukin-18 and the pathogenesis of inflammatory diseases. Semin Nephrol. 2007;27:98–114.
  • 24. Saxena R, et al. Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science. 2007;316:1331–6.
  • 25. Salazar-Gonzalez RM, et al. CCR6-mediated dendritic cell activation of pathogen-specific T cells in Peyer's patches. Immunity. 2006;24:623–32.
  • 26. Facco M, et al. Expression and role of CCR6/CCL20 chemokine axis in pulmonary sarcoidosis. J Leukoc Biol. 2007;82:946–55.
  • 27. Annunziato F, et al. Phenotypic and functional features of human Th17 cells. J Exp Med. 2007;204:1849–61.
  • 28. Oppmann B, et al. Novel p19 protein engages IL-12p40 to form a cytokine, IL-23, with biological activities similar as well as distinct from IL-12. Immunity. 2000;13:715–25.
  • 29. Hue S, et al. Interleukin-23 drives innate and T cell-mediated intestinal inflammation. J Exp Med. 2006;203:2473–83.
  • 30. Kullberg MC, et al. IL-23 plays a key role in Helicobacter hepaticus-induced T cell-dependent colitis. J Exp Med. 2006;203:2485–94.
  • 31. Uhlig HH, et al. Differential activity of IL-12 and IL-23 in mucosal and systemic innate immune pathology. Immunity. 2006;25:309–18.
  • 32. Yen D, et al. IL-23 is essential for T cell-mediated colitis and promotes inflammation via IL-17 and IL-6. J Clin Invest. 2006;116:1310–6.
  • 33. Parham C, et al. A receptor for the heterodimeric cytokine IL-23 is composed of IL-12Rbeta1 and a novel cytokine receptor subunit, IL-23R. J Immunol. 2002;168:5699–708.
  • 34. Mathur AN, et al. Stat3 and Stat4 direct development of IL-17-secreting Th cells. J Immunol. 2007;178:4901–7.
  • 35. Plowey ED, Cherra SJ 3rd, Liu YJ, Chu CT. Role of autophagy in G2019S-LRRK2-associated neurite shortening in differentiated SH-SY5Y cells. J Neurochem. 2008.
  • 36. Van der Sluis M, et al. Muc2-deficient mice spontaneously develop colitis, indicating that MUC2 is critical for colonic protection. Gastroenterology. 2006;131:117–29.
  • 37. Steinthorsdottir V, et al. A variant in CDKAL1 influences insulin response and risk of type 2 diabetes. Nat Genet. 2007;39:770–5.
  • 38. Scott LJ, et al. A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science. 2007;316:1341–5.
  • 39. Zeggini E, et al. Replication of genome-wide association signals in UK samples reveals risk loci for type 2 diabetes. Science. 2007;316:1336–41.
  • 40. Nakazawa A, et al. The expression and function of costimulatory molecules B7H and B7-H1 on colonic epithelial cells. Gastroenterology. 2004;126:1347–57.
  • 41. Ito T, et al. Plasmacytoid dendritic cells prime IL-10-producing T regulatory cells by inducible costimulator ligand. J Exp Med. 2007;204:105–15.
  • 42. Bottini N, et al. A functional variant of lymphoid tyrosine phosphatase is associated with type I diabetes. Nat Genet. 2004;36:337–8.
  • 43. Criswell LA, et al. Analysis of families in the multiple autoimmune disease genetics consortium (MADGC) collection: the PTPN22 620W allele associates with multiple autoimmune phenotypes. Am J Hum Genet. 2005;76:561–71.
  • 44. Rieck M, et al. Genetic variation in PTPN22 corresponds to altered function of T and B lymphocytes. J Immunol. 2007;179:4704–10.
  • 45. Tsuji S, et al. Human intelectin is a novel soluble lectin that recognizes galactofuranose in carbohydrate chains of bacterial cell wall. J Biol Chem. 2001;276:23456–63.
  • 46. Wrackmeyer U, Hansen GH, Seya T, Danielsen EM. Intelectin: a novel lipid raft-associated protein in the enterocyte brush border. Biochemistry. 2006;45:9188–97.
  • 47. Kazeem GR, Farrall M. Integrating case-control and TDT studies. Ann Hum Genet. 2005;69:329–35.
  • 48. Purcell S, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559–75.
  • 49. Pe'er I, Yelensky R, Altshuler D, Daly MJ. Estimation of the multiple testing burden for genomewide association studies of nearly all common variants. Genet Epidemiol. 2008.
  • 50. Goyette P, et al. Gene-centric association mapping of chromosome 3p implicates MST1 in IBD pathogenesis. Mucosal Immunology. 2008;1:131–38.