Association scan of 14,500 nonsynonymous SNPs in four diseases identifies autoimmunity variants.
Journal: 2008/January - Nature Genetics
ISSN: 1546-1718
Abstract:
We have genotyped 14,436 nonsynonymous SNPs (nsSNPs) and 897 major histocompatibility complex (MHC) tag SNPs from 1,000 independent cases of ankylosing spondylitis (AS), autoimmune thyroid disease (AITD), multiple sclerosis (MS) and breast cancer (BC). Comparing these data against a common control dataset derived from 1,500 randomly selected healthy British individuals, we report initial association and independent replication in a North American sample of two new loci related to ankylosing spondylitis, ARTS1 and IL23R, and confirmation of the previously reported association of AITD with TSHR and FCRL3. These findings, enabled in part by increased statistical power resulting from the expansion of the control reference group to include individuals from the other disease groups, highlight notable new possibilities for autoimmune regulation and suggest that IL23R may be a common susceptibility factor for the major 'seronegative' diseases.
Nat Genet 39(11): 1329-1337

Association scan of 14,500 nsSNPs in four common diseases identifies variants involved in autoimmunity

+222 authors

RESULTS

Initial genotyping was performed with a custom-made Infinium array (Illumina) and involved 14,436 nsSNPs (assays were synthesized for 16,078 nsSNPs). At the time of study inception, this comprised the complete set of experimentally validated nsSNPs with MAF > 1% in Caucasian samples. In addition, because three of the diseases were of autoimmune aetiology, we also typed a dense set of 897 SNPs throughout the major histocompatibility complex (MHC), which together with 348 nsSNPs in this region provided comprehensive tag SNP coverage (r² ≥ 0.8 with all SNPs in ref. 16). Finally, 103 SNPs were typed in pigmentation genes specifically designed to differentiate between population groups. Similar to previous studies, our data revealed that detailed assessment of initial data is critically important to the process of association inference, as biases in genotype calling lead to clear inflation of false positive rates12,17. This inflation is exaggerated in nsSNP data because they tend to have lower allele frequencies than otherwise anonymous genomic SNPs, and genotype calling is often most difficult for rare alleles. If only cursory filtering had been applied in the present case, a large number of striking false positives would have emerged (Supplementary Figures 1-4). Table 1 displays the total number of SNPs and individuals remaining after genotype and sample quality control procedures (see Methods).

Table 1

Number of individuals and SNPs tested in each cohort

| Cohort | AS | AITD | BC | MS | 58C |
|---|---|---|---|---|---|
| Males | 610 | 138 | 0 | 271 | 732 |
| Females | 312 | 762 | 1004 | 704 | 734 |
| Number of SNPs genotyped | 15,436 | 15,436 | 15,436 | 15,436 | 15,436 |
| SNPs with Low GC score | 783 | 816 | 771 | 802 | 796 |
| SNPs with Low Genotyping | 133 | 206 | 124 | 218 | 186 |
| Monomorphic SNPs | 1,842 | 1,829 | 1,854 | 1,810 | 1,687 |
| SNPs with HW p < 10* | 129 | 74 | 104 | 97 | 132 |
| Differences in missing rate p < 10^-4 | 51 | 101 | 172 | 309 | n/a |
| “Manual” Exclusions | 33 | 33 | 33 | 33 | 33 |
| Total Number of SNPs tested | 12,701 | 12,572 | 12,577 | 12,374 | |

* Only SNPs with HW p < 10 in the 58C control group were excluded from analyses

Association with the MHC

The strongest associations observed in the study were between SNPs in the MHC region and the three auto-immune diseases studied, AS, AITD and MS, with p-values of <10 for each disease (Figure 1). No association of the MHC was seen with BC (p > 10 across the region). For each of the autoimmune diseases, the maximum signal is centered around the known HLA-associated genes (ie, HLA-B in AS, HLA-DRB1 in MS and the MHC Class I-II for AITD), but in all cases it extends far beyond the specific associated haplotype(s). For example, in AS, association was observed at p < 10 across ~ 1.5 Mb. Given the well-known large effect of B27 with AS (odds ratio 100-200 in most populations), the extent of this association signal reflects the fact that with such large effects, even very distant SNPs in modest LD will reveal indirect evidence for association. Strong signals like these may also cloud the evidence for additional HLA loci18. Disentangling similar patterns of association within the MHC has proven extremely challenging in the past and will be addressed in future studies of these data. Here we focus specifically on the nsSNP results.

Figure 1

Minus log10 p values for the Armitage test of trend for MHC association with Ankylosing Spondylitis (top panel), Auto-Immune Thyroid Disease (middle panel), and Multiple Sclerosis (bottom panel). Note in particular how evidence for association extends along very long regions of the MHC, reflecting statistical power to detect association even when linkage disequilibrium amongst SNPs is relatively low and/or the possibility of multiple disease-predisposing loci.

Association with nsSNPs

A major advantage of the WTCCC design is the availability of multiple disease cohorts which are similar in terms of ancestry and which have been typed on the same genetic markers12,17. Assuming that each disease has at least some genetic loci that differ between diseases, it should be possible to increase power to detect association by combining the other three case groups with the 58C controls19. For each disease we therefore conducted two primary analyses: (1) testing nsSNP associations for each disease against the 1958 Birth Cohort controls (58C); and (2) testing the same associations for each disease against an expanded reference group comprising the combined cases from the other three disease groups plus individuals from the 58C. A similar set of analyses was conducted for each of the autoimmune disorders against a reference group comprising 58C+BC, but the results were very similar to those for the fully expanded groups so here we describe the larger sample (see Supplementary Table 1). In addition, since it is possible that different autoimmune diseases share similar genetic etiologies, we also compared a combined AS, AITD and MS group against the combined set of BC patients and 58C controls. All of our analyses are reported without regard to specific treatment of population structure, as the degree of structure in our final genotype data is not severe (Genomic Control20 λ = 1.07 to 1.13 in the 58C-only datasets; λ = 1.03 to 1.06 in the expanded reference group comparisons, see Table 2), consistent with our recent findings from 17,000 UK individuals involving the same controls12.

Table 2

Estimates of λ for Single and Combined Cohorts

| Comparison | λ |
|---|---|
| Single Cohort | |
| AS cases vs 1958 | 1.07 |
| AITD cases vs 1958 | 1.12 |
| BC cases vs 1958 | 1.13 |
| MS cases vs 1958 | 1.12 |
| Mixed Cohorts | |
| AS cases vs All Others | 1.03 |
| AITD cases vs All Others | 1.05 |
| BC cases vs All Others | 1.04 |
| MS cases vs All Others | 1.06 |
| IMMUNE cases vs BC and 58C | 1.04 |
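
As an illustration of how an inflation factor such as those in Table 2 can be estimated, the following minimal sketch computes a genomic control λ as the ratio of the median observed 1-df test statistic to the median of the χ² distribution with one degree of freedom. The function names and the input values are hypothetical and are not taken from the study; the median-based estimator is one common choice and is assumed here rather than confirmed as the exact procedure used.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control_lambda(chi2_stats):
    """Median observed 1-df statistic divided by its expected median under the null."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def gc_corrected(chi2_stats):
    """Divide each statistic by lambda; lambda is floored at 1 so nothing is inflated."""
    lam = max(genomic_control_lambda(chi2_stats), 1.0)
    return [x / lam for x in chi2_stats]


# Hypothetical trend-test statistics, for illustration only.
example_stats = [0.02, 0.20, 0.50, 1.10, 3.80]
example_lambda = genomic_control_lambda(example_stats)
example_corrected = gc_corrected(example_stats)
```

A λ close to 1, as in Table 2, indicates that the bulk of the test statistics follow the null distribution, so dividing by λ changes the results only slightly.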

nsSNP association results (i.e. excluding the MHC region) for each of the four disease groups against the 58C controls are shown in Figure 2 and Table 3. Two SNPs on chromosome 5 reached a high level of statistical significance for AS (rs27044: p = 1.0 × 10^-6; rs30187: p = 3.0 × 10^-6). This level of significance exceeds the 10-10 thresholds advocated for gene-based scans21, as well as the oft-used Bonferroni correction at p < 0.05 (see refs 12,21 for a discussion of genome-wide association significance). Both of these markers reside in the gene for ARTS1 (ERAAP, ERAP1), a type II integral transmembrane aminopeptidase with diverse immunological functions. Four additional SNPs display significance at p < 10^-4, with an increasing number of possible associations at more modest significance levels. Several of the more strongly associated SNPs, or others in the same genes, have previously been associated with the particular disease, and for others there exists functional evidence for their involvement in the particular condition. These include SNPs in the genes FCRL3 and FCRL5 in the case of AITD, IL23R in the case of AS, MEL-18 in the case of BC, and IL7R for MS. The complete list of single-marker association results is provided in Supplementary Table 1.
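
The Bonferroni comparison can be checked with a short calculation. The sketch below assumes that the number of tests equals the 12,701 SNPs tested in the AS cohort (Table 1) and uses the two ARTS1 p values reported in Table 3; the function and variable names are illustrative only.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Assumption: one test per SNP remaining after quality control in the AS cohort.
n_snps_as = 12701
threshold = bonferroni_threshold(0.05, n_snps_as)

# p values for the two ARTS1 SNPs as listed in Table 3.
top_hits = {"rs27044": 1.0e-6, "rs30187": 3.0e-6}
passing = {snp: p < threshold for snp, p in top_hits.items()}
```

Under these assumptions the threshold is roughly 3.9 × 10^-6, and both SNPs fall below it.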

Figure 2

Minus log10 p values for the Armitage test of trend for genome-wide association scans of ankylosing spondylitis, auto-immune thyroid disease, breast cancer and multiple sclerosis. The spacing between SNPs on the plot is uniform and does not reflect distances between the SNPs. The vertical dashed lines reflect chromosomal boundaries. The horizontal dashed lines display the cutoff of p = 10. Note that SNPs within the MHC are not included in this diagram.

Table 3

nsSNPs outside the MHC which meet a point-wise significance level of p < 10^-3 for the Cochran-Armitage test for trend

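| Disease | SNP | Chromosome | Position (bp) | MAF | OR | χ² | p value | Gene |
|---|---|---|---|---|---|---|---|---|
| AS | rs696698 | 1 | 74777462 | 0.04 | 1.84 | 11.13 | 8.5 × 10^-4 | C1orf173 |
| | rs10494217 | 1 | 119181230 | 0.17 | 0.77 | 11.62 | 6.5 × 10^-4 | TBX15 |
| | rs2294851 | 1 | 206966279 | 0.13 | 0.73 | 13.55 | 2.3 × 10^-4 | HHAT |
| | rs8192556 | 2 | 182368504 | 0.01 | 0.45 | 12.24 | 4.7 × 10^-4 | NEUROD1 |
| | rs16876657 | 5 | 78645930 | 0.02 | 3.10 | 13.05 | 3.0 × 10^-4 | JMY |
| | rs27044 | 5 | 96144608 | 0.34 | 1.40 | 23.90 | 1.0 × 10^-6 | ARTS-1 |
| | rs17482078 | 5 | 96144622 | 0.17 | 0.76 | 13.55 | 2.3 × 10^-4 | ARTS-1 |
| | rs10050860 | 5 | 96147966 | 0.18 | 0.75 | 14.87 | 1.1 × 10^-4 | ARTS-1 |
| | rs30187 | 5 | 96150086 | 0.40 | 1.33 | 21.82 | 3.0 × 10^-6 | ARTS-1 |
| | rs2287987 | 5 | 96155291 | 0.18 | 0.75 | 14.31 | 1.6 × 10^-4 | ARTS-1 |
| | rs2303138 | 5 | 96376466 | 0.10 | 1.58 | 19.41 | 1.1 × 10^-5 | LNPEP |
| | rs11750814 | 5 | 137528564 | 0.16 | 0.77 | 10.99 | 9.1 × 10^-4 | BRD8 |
| | rs11959820 | 5 | 149192703 | 0.02 | 0.49 | 12.41 | 4.3 × 10^-4 | PPARGC1B |
| | rs907609 | 11 | 1813846 | 0.13 | 0.76 | 10.91 | 9.5 × 10^-4 | SYT8 |
| | rs3740691 | 11 | 47144987 | 0.29 | 0.80 | 11.86 | 5.7 × 10^-4 | ZNF289 |
| | rs11062385 | 12 | 297836 | 0.24 | 0.79 | 11.82 | 5.9 × 10^-4 | JARID1A |
| | rs7302230 | 12 | 7179699 | 0.08 | 1.57 | 14.97 | 1.1 × 10^-4 | CLSTN3 |
| AITD | rs10916769 | 1 | 20408244 | 0.17 | 0.76 | 12.10 | 5.0 × 10^-4 | FLJ32784 |
| | rs6427384 | 1 | 154321955 | 0.18 | 1.43 | 18.97 | 1.3 × 10^-5 | FCRL5 |
| | rs2012199 | 1 | 154322098 | 0.17 | 1.35 | 13.18 | 2.8 × 10^-4 | FCRL5 |
| | rs6679793 | 1 | 154327170 | 0.22 | 1.33 | 14.69 | 1.3 × 10^-4 | FCRL5 |
| | rs7522061 | 1 | 154481463 | 0.47 | 1.25 | 13.78 | 2.1 × 10^-4 | FCRL3 |
| | rs1047911 | 2 | 74611433 | 0.15 | 1.34 | 11.24 | 8.0 × 10^-4 | MRPL53 |
| | rs7578199 | 2 | 241912838 | 0.26 | 1.26 | 11.53 | 6.9 × 10^-4 | HDLBP |
| | rs3748140 | 8 | 9036429 | 0.00 | 0.28 | 11.44 | 7.2 × 10^-4 | PPP1R3B |
| | rs1048101 | 8 | 26683945 | 0.42 | 0.82 | 10.98 | 9.2 × 10^-4 | ADRA1A |
| | rs7975069 | 12 | 132389146 | 0.30 | 0.80 | 12.06 | 5.2 × 10^-4 | ZNF268 |
| | rs2271233 | 17 | 6644845 | 0.07 | 0.94 | 11.32 | 7.7 × 10^-4 | TEKT1 |
| | rs2856966 | 18 | 897710 | 0.19 | 0.76 | 14.00 | 1.8 × 10^-4 | ADCYAP1 |
| | rs7250822 | 19 | 2206311 | 0.04 | 1.97 | 13.83 | 2.0 × 10^-4 | AMH |
| | rs2230018 | 23 | 44685331 | 0.14 | 1.41 | 11.55 | 6.8 × 10^-4 | UTX |
| BC | rs4255378 | 1 | 151919300 | 0.48 | 1.25 | 14.70 | 1.3 × 10^-4 | MUC1 |
| | rs2107732 | 7 | 44851218 | 0.10 | 1.40 | 10.96 | 9.3 × 10^-4 | CCM2 |
| | rs4986790 | 9 | 117554856 | 0.07 | 1.54 | 11.46 | 7.1 × 10^-4 | TLR4 |
| | rs2285374 | 11 | 118457383 | 0.38 | 0.82 | 12.25 | 4.7 × 10^-4 | VPS11 |
| | rs7313899 | 12 | 54231386 | 0.03 | 2.10 | 13.02 | 3.1 × 10^-4 | OR6C4 |
| | rs2879097 | 17 | 34143085 | 0.20 | 0.78 | 11.73 | 6.1 × 10^-4 | MEL18 |
| | rs2822558 | 21 | 14593715 | 0.13 | 0.73 | 13.87 | 2.0 × 10^-4 | ABCC13 |
| | rs2230018 | 23 | 44685331 | 0.14 | 1.40 | 12.14 | 4.9 × 10^-4 | UTX |
| MS | rs17009792 | 2 | 74400978 | 0.02 | 0.44 | 14.41 | 1.5 × 10^-4 | SLC4A5 |
| | rs1132200 | 3 | 120633526 | 0.15 | 0.73 | 15.22 | 9.6 × 10^-5 | FLJ10902 |
| | rs6897932 | 5 | 35910332 | 0.23 | 0.80 | 11.04 | 8.9 × 10^-4 | IL7R |
| | rs6470147 | 8 | 124517985 | 0.36 | 1.23 | 10.92 | 9.5 × 10^-4 | FLJ10204 |
| | rs3818511 | 10 | 134309378 | 0.24 | 1.28 | 12.84 | 3.4 × 10^-4 | INPP5A |
| | rs11574422 | 11 | 67970565 | 0.02 | 2.82 | 14.64 | 1.3 × 10^-4 | LRP5 |
| | rs388706 | 19 | 49110533 | 0.48 | 1.22 | 11.19 | 8.2 × 10^-4 | ZNF45 |
| | rs1800437 | 19 | 50873232 | 0.17 | 0.74 | 16.11 | 6.0 × 10^-5 | GIPR |
| | rs2281868 | 23 | 69451484 | 0.50 | 1.26 | 11.38 | 7.4 × 10^-4 | SAP102 |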
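
For readers unfamiliar with the statistic reported in Table 3, the following is a minimal sketch of a Cochran-Armitage test for trend on a 2 × 3 table of genotype counts, using additive weights (0, 1, 2), together with the conversion of a 1-df χ² value to a p value. The genotype counts are hypothetical, the function names are illustrative, and the code is not the pipeline used in the study; tables with a single populated genotype column are not handled.

```python
import math


def armitage_trend_chi2(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend statistic (1 df) from genotype counts.

    cases and controls hold counts of the three genotypes ordered by
    the number of copies of the tested allele.
    """
    R, S = sum(cases), sum(controls)
    N = R + S
    cols = [r + s for r, s in zip(cases, controls)]
    T = sum(w * (r * S - s * R) for w, r, s in zip(weights, cases, controls))
    term1 = sum(w * w * n * (N - n) for w, n in zip(weights, cols))
    term2 = sum(
        weights[i] * weights[j] * cols[i] * cols[j]
        for i in range(len(cols))
        for j in range(i + 1, len(cols))
    )
    var_T = (R * S / N) * (term1 - 2 * term2)
    return T * T / var_T


def chi2_1df_pvalue(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical genotype counts, for illustration only.
chi2_example = armitage_trend_chi2(cases=(400, 450, 150), controls=(700, 650, 150))
p_example = chi2_1df_pvalue(chi2_example)

# The chi-square value listed for rs27044 in Table 3 maps to p of about 1.0e-6.
p_rs27044 = chi2_1df_pvalue(23.90)
```

Applying `chi2_1df_pvalue` to the χ² values in Table 3 gives p values consistent with those in the adjacent column, which is a convenient check on the reconstructed table.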

The results for analyses involving the expanded reference group are presented in Supplementary Figure 5 and Supplementary Table 1. Many of the SNPs that showed moderate to strong evidence for association in the initial analysis revealed substantially increased significance with the larger reference group. Notably, these included the SNPs rs27044 (p = 4.0 × 10) and rs30187 (p = 2.1 × 10) in ARTS1, as well as several other variants in this gene. A second SNP, rs7302230 in the Calsyntenin-3 gene on chromosome 12, showed substantially stronger evidence for association in the expanded reference group analysis (p = 5.3 × 10) relative to the 58C-only results (p = 1.1 × 10). The expanded group analyses also showed elevated significance for several SNPs which did not appear exceptional in the original (non-combined) analyses, including SNPs in several candidate genes such as sialoadhesin22 and complement receptor 1 for AS, PIK3R2 for MS, and C8B, IL17R and TYK2 in the combined autoimmune disease analysis. SNP rs3783941 in the thyroid stimulating hormone receptor (TSHR) gene emerged among the most significant in the expanded reference group analyses of AITD (p = 2.1 × 10). Several polymorphisms in the TSHR have previously been associated with Graves’ disease23,24. This known association did not reach even the modest significance level of 10 in the original analyses, but adding 3,000 further reference samples delineated it from the background noise and further supports the original independent report.

ARTS1 association confirmed in an independent cohort

In order to validate the most exceptional findings from the initial study, we genotyped the ARTS1, CLSTN3 and LNPEP SNPs in 471 independent AS cases and 625 new controls (all North American Caucasian). Table 4 shows the results of the genes examined for AS. The data strongly suggest that the ARTS1 association is genuine. All ARTS1 nsSNPs reveal independent replication in the same direction of effect, with replication significance levels ranging from 4.7 × 10^-4 to 5.1 × 10^-5. When combined with the original samples, the results reveal striking evidence for association with AS (p = 1.2 × 10^-8 to 3.4 × 10^-10). The population attributable risk25 contributed by the most strongly associated marker in the North American dataset (rs2287987) is 26%.

Table 4

Ankylosing Spondylitis Replication Results

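| Gene | SNP | UK Case MAF | UK Control MAF | UK OR | UK p val | US Case MAF | US Control MAF | US OR | US p val | All Case MAF | All Control MAF | All OR | All p val |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ARTS1 | rs27044 | 0.34 | 0.27 | 1.40 | 1.0 × 10^-6 | - | - | - | - | - | - | - | - |
| ARTS1 | rs17482078 | 0.17 | 0.22 | 0.76 | 2.3 × 10^-4 | 0.15 | 0.21 | 0.65 | 5.1 × 10^-5 | 0.16 | 0.22 | 0.70 | 1.2 × 10^-8 |
| ARTS1 | rs10050860 | 0.18 | 0.23 | 0.75 | 1.2 × 10^-4 | 0.15 | 0.22 | 0.66 | 8.8 × 10^-5 | 0.17 | 0.22 | 0.71 | 7.6 × 10^-9 |
| ARTS1 | rs30187 | 0.40 | 0.33 | 1.33 | 3.0 × 10^-6 | 0.41 | 0.35 | 1.30 | 0.00047 | 0.41 | 0.34 | 1.40 | 3.4 × 10^-10 |
| ARTS1 | rs2287987 | 0.18 | 0.22 | 0.75 | 1.6 × 10^-4 | 0.15 | 0.21 | 0.66 | 8.4 × 10^-5 | 0.17 | 0.22 | 0.71 | 1.0 × 10^-8 |
| LNPEP | rs2303138 | 0.10 | 0.07 | 1.58 | 1.1 × 10^-5 | 0.11 | 0.09 | 1.40 | 0.018 | 0.11 | 0.07 | 1.48 | 1.1 × 10^-6 |
| CLSTN3 | rs7302230 | 0.08 | 0.05 | 1.57 | 1.1 × 10^-4 | 0.06 | 0.05 | 1.10 | 0.56 | 0.07 | 0.05 | 1.30 | 0.0039 |
| IL23R | rs11209026 | 0.04 | 0.06 | 0.63 | 0.0017 | 0.038 | 0.06 | 0.63 | 0.014 | 0.04 | 0.06 | 0.63 | 4.0 × 10^-6 |
| IL23R | rs1004819 | 0.35 | 0.30 | 1.20 | 0.0013 | 0.35 | 0.30 | 1.30 | 0.0045 | 0.35 | 0.30 | 1.20 | 1.1 × 10^-5 |
| IL23R | rs10489629 | 0.43 | 0.45 | 0.90 | 0.062 | 0.39 | 0.47 | 0.72 | 4.2 × 10^-5 | 0.41 | 0.46 | 0.83 | 0.00011 |
| IL23R | rs11465804 | 0.04 | 0.06 | 0.67 | 0.0019 | 0.049 | 0.06 | 0.68 | 0.04 | 0.04 | 0.06 | 0.68 | 0.0002 |
| IL23R | rs1343151 | 0.30 | 0.34 | 0.85 | 0.0077 | 0.29 | 0.36 | 0.71 | 6.7 × 10^-5 | 0.30 | 0.34 | 0.80 | 1.0 × 10^-5 |
| IL23R | rs10889677 | 0.36 | 0.31 | 1.20 | 0.00066 | 0.37 | 0.29 | 1.40 | 4.7 × 10^-5 | 0.36 | 0.31 | 1.30 | 1.3 × 10^-6 |
| IL23R | rs11209032 | 0.38 | 0.32 | 1.30 | 2.0 × 10^-6 | 0.38 | 0.32 | 1.30 | 0.0013 | 0.38 | 0.32 | 1.30 | 7.5 × 10^-9 |
| IL23R | rs1495965 | 0.49 | 0.44 | 1.20 | 0.0021 | 0.50 | 0.43 | 1.40 | 0.00019 | 0.49 | 0.44 | 1.20 | 3.1 × 10^-6 |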
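
The odds ratios in Table 4 can be related to the allele frequencies beside them. The sketch below computes an allelic odds ratio from case and control minor allele frequencies; it assumes the reported ORs are allelic and uses the rounded MAFs from the table, so it reproduces the published values only approximately. The function name is illustrative.

```python
def allelic_odds_ratio(case_maf, control_maf):
    """Odds of carrying the minor allele in cases divided by the odds in controls."""
    case_odds = case_maf / (1.0 - case_maf)
    control_odds = control_maf / (1.0 - control_maf)
    return case_odds / control_odds


# Rounded UK frequencies for rs27044 from Table 4 (reported OR 1.40).
or_rs27044 = allelic_odds_ratio(0.34, 0.27)

# Rounded UK frequencies for rs11209026 from Table 4 (reported OR 0.63).
or_rs11209026 = allelic_odds_ratio(0.04, 0.06)
```

Small discrepancies from the tabulated ORs are expected because the MAFs are given to only two decimal places.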

Association was also confirmed with marker rs2303138 in the LNPEP gene, which lies 127kb 3′ of ARTS1. This marker is in strong LD with ARTS1 markers (D’ = 1, rs27044 - rs2303138). We tested the interdependence of the ARTS1 and LNPEP associations using conditional logistic regression. The remaining association at LNPEP is weak after controlling for ARTS1 (p = 0.01), whereas the association at ARTS1 remains strong after controlling for LNPEP (p = 2.7 × 10), suggesting that the LNPEP association may only be secondary to LD with a true association at ARTS1.

No association was seen with CLSTN3 in the confirmation set. The US controls exhibited the same allele frequency as the UK controls (5%), but the allele frequency in the US cases was lower than that of the UK cases (6% vs 8%), suggesting no association in the US samples and substantially reducing the significance of the combined data. Calsyntenin-3 is a post-synaptic neuronal membrane protein, and is an unlikely candidate gene for involvement in inflammatory arthritis. The failure to replicate this association suggests either that our replication sample size is insufficient to detect the modest effect or that it was a false positive in the initial scan.

IL23R variants confer risk of AS

The IL23R variant rs11209026, whilst not striking in the initial nsSNP scan (p = 1.7 × 10^-3), is of particular interest as it was recently associated with both Crohn’s Disease26,27 and psoriasis28, conditions which commonly co-occur with AS. To better define this association, 7 additional SNPs in IL23R were genotyped in the same 1000 British AS cases and 1500 58C controls as well as the North American Caucasian replication samples (Table 4). In the WTCCC dataset, strong association was seen in 7 of 8 genotyped SNPs (p ≤ 0.008, including the original nsSNP rs11209026), with the strongest association seen at rs11209032 (p = 2.0 × 10^-6). In the replication dataset, association was observed with all genotyped SNPs (p ≤ 0.04), with peak association observed with marker rs10489629 (p = 4.2 × 10^-5). In the combined dataset, the strongest association observed was with SNP rs11209032 (odds ratio 1.3, 95% CI 1.2 - 1.4, p = 7.5 × 10^-9). The attributable risk for this marker in the replication cohort is 9%. Conditional logistic regression analyses did not reveal a single primary disease-associated marker, with residual association remaining after having controlled for association at the remaining SNPs. Considering only AS cases who self-reported as not having inflammatory bowel disease (n = 1066), the association remained strong and was still strongest at rs11209032 (p = 6.9 × 10), indicating that there is a primary association with AS and that the observed association was not due to coexistent clinical inflammatory bowel disease.

In contrast to the pleiotropic effects of IL23R, the ARTS1 association evidence appears confined to AS. We genotyped the five AS-associated SNPs in 755 British Crohn’s disease and 1011 ulcerative colitis cases, and 633 healthy controls. No association was seen with either UC or CD (Armitage trend p > 0.4 for all markers).

FCRL3 confirmed in AITD pathogenesis

In addition to the AS replications, we attempted to confirm and extend the FCRL3 association in AITD. The SNP rs7522061 in the FCRL3 gene was recently reported to be associated with AITD29 and two other autoimmune diseases, rheumatoid arthritis and systemic lupus erythematosus30. Our initial association evidence (p = 2.1 × 10^-4) likely reflects the signal of the originally detected polymorphism, since the level of linkage disequilibrium (LD) is high across this gene. In fact, the entire 1q21-q23 region (which includes another gene, FCRL5, flagged in our scan) has also been implicated in several autoimmune diseases including psoriasis and multiple sclerosis31,32.

On the basis of the original findings on 1q21-q23, the original cohort was increased from 1,000 to 2,500 Graves’ disease (GD) cases and we used 2,500 different 1958 cohort controls. Eight SNPs that tagged the FCRL3 and FCRL5 gene regions were selected and typed in all 5,000 samples using an alternative genotyping platform. SNP rs3761959, which tags rs7522061 and rs7528684 (previously associated with RA and GD), was associated with GD in this extended cohort (Table 5), thereby confirming the original result. In total, three of the seven FCRL3 SNPs showed some evidence for association (p < 0.05), with SNP rs11264798 being the most strongly associated of the tag SNPs (p = 4.0 × 10^-3). SNP rs6667109 in FCRL5, which tagged SNPs rs6427384, rs2012199 and rs6679793, all found to be weakly associated in the original study, showed little evidence of association in this extended cohort.

Table 5

Auto-immune Thyroid Disease Replication Results

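| Gene | SNP | Replication Case MAF | Replication Control MAF | Replication OR | Replication p val | Combined Case MAF | Combined Control MAF | Combined OR | Combined p val |
|---|---|---|---|---|---|---|---|---|---|
| FCRL3 | rs3761959* | 0.48 | 0.45 | 0.90 | 0.029 | 0.49 | 0.45 | 0.87 | 9.4 × 10^-3 |
| FCRL3 | rs11264794 | 0.42 | 0.46 | 1.10 | 0.079 | 0.42 | 0.46 | 1.12 | 0.013 |
| FCRL3 | rs11264793 | 0.27 | 0.24 | 0.87 | 0.020 | 0.26 | 0.24 | 0.90 | 0.044 |
| FCRL3 | rs11264798 | 0.44 | 0.49 | 1.21 | 4.0 × 10^-3 | 0.44 | 0.49 | 1.22 | 1.6 × 10^-5 |
| FCRL3 | rs10489678 | 0.19 | 0.20 | 1.07 | 0.30 | 0.20 | 0.20 | 1.04 | 0.43 |
| FCRL3 | rs6691569 | 0.28 | 0.28 | 1.03 | 0.60 | 0.29 | 0.29 | 1.00 | 0.93 |
| FCRL3 | rs2282284 | 0.062 | 0.058 | 0.92 | 0.013 | 0.062 | 0.058 | 0.93 | 0.47 |
| FCRL5 | rs6667109 | 0.17 | 0.15 | 0.91 | 0.18 | 0.18 | 0.15 | 0.85 | 7.7 × 10^-2 |

* This SNP tags the SNP rs7522061, which was flagged as associated with AITD in the WTCCC screen (p = 2.1 × 10^-4)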

Association with the MHC

The strongest associations observed in the study were between SNPs in the MHC region and the three auto-immune diseases studied, AS, AITD and MS, with p-values of <10 for each disease (Figure 1). No association of the MHC was seen with BC (p > 10 across the region). For each of the autoimmune diseases, the maximum signal is centered around the known HLA-associated genes (ie, HLA-B in AS, HLA-DRB1 in MS and the MHC Class I-II for AITD), but in all cases it extends far beyond the specific associated haplotype(s). For example, in AS, association was observed at p < 10 across ~ 1.5 Mb. Given the well-known large effect of B27 with AS (odds ratio 100-200 in most populations), the extent of this association signal reflects the fact that with such large effects, even very distant SNPs in modest LD will reveal indirect evidence for association. Strong signals like these may also cloud the evidence for additional HLA loci18. Disentangling similar patterns of association within the MHC has proven extremely challenging in the past and will be addressed in future studies of these data. Here we focus specifically on the nsSNP results.

An external file that holds a picture, illustration, etc.
Object name is ukmss-4500-f0001.jpg

Minus log10 p values for the Armitage test of trend for MHC association with Ankylosing Spondylitis (top panel), Auto-Immune Thyroid Disease (middle panel), and Multiple Sclerosis (bottom panel). Note in particular how evidence for association extends along very long regions of the MHC, reflecting statistical power to detect association even when linkage disequilibrium amongst SNPs is relatively low and/or the possibility of multiple disease-predisposing loci.

Association with nsSNPs

A major advantage of the WTCCC design is the availability of multiple disease cohorts which are similar in terms of ancestry and which have been typed on the same genetic markers1217. Assuming that each disease has at least some genetic loci that differ between diseases, it should be possible to increase power to detect association by combining the other three case groups with the 58C controls19. For each disease we therefore conducted two primary analyses: (1) testing nsSNP associations for each disease against the 1958 Birth Cohort controls (58C); and (2) testing the same associations for each disease against an expanded reference group comprising the combined cases from the other three disease groups plus individuals from the 58C. A similar set of analyses was conducted for each of the autoimmune disorders against a reference group comprising 58C+BC, but the results were very similar to those for the fully expanded groups so here we describe the larger sample (see Supplementary Table 1). In addition, since it is possible that different autoimmune diseases share similar genetic etiologies, we also compared a combined AS, AITD and MS group against the combined set of BC patients and 58C controls. All of our analyses are reported without regard to specific treatment of population structure, as the degree of structure in our final genotype data is not severe (Genomic Control20 λ = 1.07 to 1.13 in the 58C-only datasets; λ = 1.03 to 1.06 in the expanded reference group comparisons, see Table 2), consistent with our recent findings from 17,000 UK individuals involving the same controls12.

Table 2

Estimates of λ for Single and Combined Cohorts

λ
Single Cohort
AS cases vs 19581.07
AITD cases vs 19581.12
BC cases vs 19581.13
MS cases vs 19581.12
Mixed Cohorts
AS cases vs All Others1.03
AITD cases vs All Others1.05
BC cases vs All Others1.04
MS cases vs All Others1.06
IMMUNE cases vs BC and 58C1.04

nsSNP association results (i.e. excluding the MHC region) for each of the four disease groups against the 58C controls are shown in Figure 2 and Table 3. Two SNPs on chromosome 5 reached a high-level of statistical significance for AS (rs27044: p = 1.0 × 10; rs30187: p = 3.0 × 10). This level of significance exceeds the 10-10 thresholds advocated for gene-based scans21, as well as the oft-used Bonferroni correction at p< 0.05 (see refs1221 for a discussion of genome-wide association significance). Both of these markers reside in the gene for ARTS1 (ERAAP, ERAP1), a type II integral transmembrane aminopeptidase with diverse immunological functions. Four additional SNPs display significance at p< 10, with an increasing number of possible associations at more modest significance levels. Several of the more strongly associated SNPs or others in the same genes have previously been associated with the particular disease, and for others there exists functional evidence for their involvement in the particular condition. These include SNPs in the genes FCRL3 and FCRL5 in the case of AITD, IL23R in the case of AS, MEL-18 in the case of BC, and IL7R for MS. The complete list of single-marker association results is provided in Supplementary Table 1.

An external file that holds a picture, illustration, etc.
Object name is ukmss-4500-f0002.jpg

Minus log10 p values for the Armitage test of trend for genome-wide association scans of ankylosing spondylitis, auto-immune thyroid disease, breast cancer and multiple sclerosis. The spacing between SNPs on the plot is uniform and does not reflect distances between the SNPs. The vertical dashed lines reflect chromosomal boundaries. The horizontal dashed lines display the cutoff of p = 10. Note that SNPs within the MHC are not included in this diagram.

Table 3

nsSNPs outside the MHC which meet a point-wise significance level of p < 10 for the Cochran-Armitage test for trend

DiseaseSNPChromosomePosition (bp)MAFORχ p valueGene
AS
rs696698174777462.041.8411.138.5 × 10-4C1orf173
rs104942171119181230.170.7711.626.5 × 10-4TBX15
rs22948511206966279.130.7313.552.3 × 10-4HHAT
rs81925562182368504.010.4512.244.7 × 10-4NEUROD1
rs16876657578645930.023.1013.053.0 × 10-4JMY
rs27044596144608.341.4023.901.0 × 10-6ARTS-1
rs17482078596144622.170.7613.552.3 × 10-4ARTS-1
rs10050860596147966.180.7514.871.1 × 10-4ARTS-1
rs30187596150086.401.3321.823.0 × 10-6ARTS-1
rs2287987596155291.180.7514.311.6 × 10-4ARTS-1
rs2303138596376466.101.5819.411.1 × 10-5LNPEP
rs117508145137528564.160.7710.999.1 × 10-4BRD8
rs119598205149192703.020.4912.414.3 × 10-4PPARGC1B
rs907609111813846.130.7610.919.5 × 10-4SYT8
rs37406911147144987.290.8011.865.7 × 10-4ZNF289
rs1106238512297836.240.7911.825.9 × 10-4JARID1A
rs7302230127179699.081.5714.971.1 × 10-4CLSTN3
AITD
rs10916769120408244.170.7612.105.0 × 10-4FLJ32784
rs64273841154321955.181.4318.971.3 × 10-5FCRL5
rs20121991154322098.171.3513.182.8 × 10-4FCRL5
rs66797931154327170.221.3314.691.3 × 10-4FCRL5
rs75220611154481463.471.2513.782.1 × 10-4FCRL3
rs1047911274611433.151.3411.248.0 × 10-4MRPL53
rs75781992241912838.261.2611.536.9 × 10-4HDLBP
rs374814089036429.000.2811.447.2 × 10-4PPP1R3B
rs1048101826683945.420.8210.989.2 × 10-4ADRA1A
rs797506912132389146.300.8012.065.2 × 10-4ZNF268
rs2271233176644845.070.9411.327.7 × 10-4TEKT1
rs285696618897710.190.7614.001.8 × 10-4ADCYAP1
rs7250822192206311.041.9713.832.0 × 10-4AMH
rs22300182344685331.141.4111.556.8 × 10-4UTX
BC
rs42553781151919300.481.2514.701.3 × 10-4MUC1
rs2107732744851218.101.4010.969.3 × 10-4CCM2
rs49867909117554856.071.5411.467.1 × 10-4TLR4
rs228537411118457383.380.8212.254.7 × 10-4VPS11
rs73138991254231386.032.1013.023.1 × 10-4OR6C4
rs28790971734143085.200.7811.736.1 × 10-4MEL18
rs28225582114593715.130.7313.872.0 × 10-4ABCC13
rs22300182344685331.141.4012.144.9 × 10-4UTX
MS
rs17009792274400978.020.4414.411.5 × 10-4SLC4A5
rs11322003120633526.150.7315.229.6 × 10-5FLJ10902
rs6897932535910332.230.8011.048.9 × 10-4IL7R
rs64701478124517985.361.2310.929.5 × 10-4FLJ10204
rs381851110134309378.241.2812.843.4 × 10-4INPP5A
rs115744221167970565.022.8214.641.3 × 10-4LRP5
rs3887061949110533.481.2211.198.2 × 10-4ZNF45
rs18004371950873232.170.7416.116.0 × 10-5GIPR
rs22818682369451484.501.2611.387.4 × 10-4SAP102

The results for analyses involving the expanded reference group are presented in Supplementary Figure 5 and Supplementary Table 1. Many of the SNPs that showed moderate to strong evidence for association in the initial analysis revealed substantially increased significance with the larger reference group. Notably, these included the SNPs rs27044 (p = 4.0 × 10) and rs30187 (p = 2.1 × 10) in ARTS1, as well as several other variants in this gene. A second SNP, rs7302230 in the Calsyntenin-3 gene on chromosome 12, showed substantially stronger evidence for association in the expanded reference group analysis (p = 5.3 × 10) relative to the 58C-only results (p = 1.1 × 10). Results of the expanded group also showed elevated results for several SNPs which did not appear exceptional in the original (non-combined) analyses, including SNPs in several candidate genes such as sialoadhesin22 and complement receptor 1 for AS, PIK3R2 for MS, and C8B​, IL17R and TYK2 in the combined autoimmune disease analysis. SNP rs3783941 in the thyroid stimulating hormone receptor (TSHR) gene emerged as amongst the most significant in the expanded reference group analyses of AITD (p = 2.1 × 10). Several polymorphisms in the TSHR have previously been associated with Graves’ disease2324. This known association did not reach even the modest significance level of 10 in the original analyses, but adding an additional 3000 further reference samples delineated it from the background noise and further supports the original independent report.

ARTS1 association confirmed in an independent cohort

In order to validate the most exceptional findings from the initial study, we genotyped the ARTS1, CLSTN3 and LNPEP SNPs in 471 independent AS cases and 625 new controls (all North American Caucasian). Table 4 shows the results of the genes examined for AS. The data strongly suggest that the ARTS1 association is genuine. All ARTS1 nsSNPs reveal independent replication in the same direction of effect, with replication significance levels ranging from 4.7 × 10 to 5.1 × 10. When combined with the original samples, the results reveal striking evidence for association with AS (p = 1.2 × 10 to 3.4 × 10). The population attributable risk25 contributed by the most strongly associated marker in the North American dataset (rs2287987) is 26%.

Table 4

Ankylosing Spondylitis Replication Results

GeneSNPUK CasesUS CasesAll Cases
Case MAFControl MAFORp valCase MAFControl MAFORp valCase MAFControl MAFORp val

ARTS1rs270440.340.271.401.0×10-6--------
ARTS1rs174820780.170.220.762.3×10-40.150.210.655.1×10-50.160.220.701.2×10-8
ARTS1rs100508600.180.230.751.2×10-40.150.220.668.8×10-50.170.220.717.6×10-9
ARTS1rs301870.400.331.333.0×10-60.410.351.300.000470.410.341.403.4×10-10
ARTS1rs22879870.180.220.751.6×10-40.150.210.668.4×10-50.170.220.711.0×10-8
LNPEPrs23031380.100.071.581.1×10-50.110.091.400.0180.110.071.481.1×10-6
CLSTN3rs73022300.080.051.571.1×10-40.060.051.100.560.070.051.300.0039
IL23Rrs112090260.040.060.630.00170.0380.060.630.0140.040.060.634.0×10-6
IL23Rrs10048190.350.301.200.00130.350.301.300.00450.350.301.201.1×10-5
IL23Rrs104896290.430.450.900.0620.390.470.724.2×10-50.410.460.830.00011
IL23Rrs114658040.040.060.670.00190.0490.060.680.040.040.060.680.0002
IL23Rrs13431510.300.340.850.00770.290.360.716.7×10-50.300.340.801.0×10-5
IL23Rrs108896770.360.311.200.000660.370.291.404.7×10-50.360.311.301.3×10-6
IL23Rrs112090320.380.321.302.0×10-60.380.321.300.00130.380.321.307.5×10-9
IL23Rrs14959650.490.441.200.00210.500.431.400.000190.490.441.203.1×10-6

Association was also confirmed with marker rs2303138 in the LNPEP gene, which lies 127kb 3′ of ARTS1. This marker is in strong LD with ARTS1 markers (D’ = 1, rs27044 - rs2303138). We tested the interdependence of the ARTS1 and LNPEP associations using conditional logistic regression. The remaining association at LNPEP is weak after controlling for ARTS1 (p = 0.01), whereas the association at ARTS1 remains strong after controlling for LNPEP (p = 2.7 × 10), suggesting that the LNPEP association may only be secondary to LD with a true association at ARTS1.

No association was seen with CLSTN3 in the confirmation set. The US controls exhibited the same allele frequency as the UK controls (5%), but the allele frequency in the US cases was lower than that of the UK cases (6% vs 8%), suggesting no association in the US samples and substantially reducing the significance of the combined data. Calsyntenin-3 is a post-synaptic neuronal membrane protein, and is an unlikely candidate gene for involvement in inflammatory arthritis. The failure to replicate this association suggests either that our replication sample is too small to detect a modest effect or that the association was a false positive in the initial scan.

IL23R variants confer risk of AS

The IL23R variant rs11209026, whilst not striking in the initial nsSNP scan (p = 1.7 × 10⁻³), is of particular interest as it was recently associated with both Crohn’s disease26,27 and psoriasis28, conditions which commonly co-occur with AS. To better define this association, 7 additional SNPs in IL23R were genotyped in the same 1000 British AS cases and 1500 58C controls as well as in the North American Caucasian replication samples (Table 4). In the WTCCC dataset, strong association was seen in 7 of 8 genotyped SNPs (p ≤ 0.008, including the original nsSNP rs11209026), with the strongest association seen at rs11209032 (p = 2.0 × 10⁻⁶). In the replication dataset, association was observed with all genotyped SNPs (p ≤ 0.04), with peak association observed with marker rs10489629 (p = 4.2 × 10⁻⁵). In the combined dataset, the strongest association observed was with SNP rs11209032 (odds ratio 1.3, 95% CI 1.2 - 1.4, p = 7.5 × 10⁻⁹). The attributable risk for this marker in the replication cohort is 9%. Conditional logistic regression analyses did not reveal a single primary disease-associated marker: residual association remained after controlling for association at the remaining SNPs. Considering only AS cases who self-reported as not having inflammatory bowel disease (n = 1066), the association remained strong and was still strongest at rs11209032 (p = 6.9 × 10), indicating that there is a primary association with AS and that the observed association was not due to coexistent clinical inflammatory bowel disease.

In contrast to the pleiotropic effects of IL23R, the ARTS1 association evidence appears confined to AS. We genotyped the five AS-associated SNPs in 755 British Crohn’s disease and 1011 ulcerative colitis cases, and 633 healthy controls. No association was seen with either UC or CD (Armitage trend p > 0.4 for all markers).

FCRL3 confirmed in AITD pathogenesis

In addition to the AS replications, we attempted to confirm and extend the FCRL3 association in AITD. The SNP rs7522061 in the FCRL3 gene was recently reported to be associated with AITD29 and two other autoimmune diseases, rheumatoid arthritis and systemic lupus erythematosus30. Our initial association evidence (p = 2.1 × 10) likely reflects the signal of the originally detected polymorphism since the level of linkage disequilibrium (LD) is high across this gene. In fact, the entire 1q21-q23 region (which includes another gene, FCRL5, flagged in our scan) has also been implicated in several autoimmune diseases including psoriasis and multiple sclerosis31,32.

On the basis of the original findings on 1q21-q23, the original cohort was increased from 1,000 to 2,500 Graves’ disease (GD) cases and we used 2,500 different 1958 cohort controls. Eight SNPs that tagged the FCRL3 and FCRL5 gene regions were selected and typed in all 5,000 samples using an alternative genotyping platform. SNP rs3761959, which tags rs7522061 and rs7528684 (previously associated with RA and GD), was associated with GD in this extended cohort (Table 5), thereby confirming the original result. In total, three of the seven FCRL3 SNPs showed some evidence for association (p < 0.05), with SNP rs11264798 being the most associated of the tag SNPs (p = 4.0 × 10⁻³). SNP rs6667109 in FCRL5, which tagged SNPs rs6427384, rs2012199 and rs6679793, all found to be weakly associated in the original study, showed little evidence of association in this extended cohort.

Table 5

Auto-immune Thyroid Disease Replication Results

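| Gene | SNP | Replication Case MAF | Replication Control MAF | Replication OR | Replication p val | Combined Case MAF | Combined Control MAF | Combined OR | Combined p val |
|---|---|---|---|---|---|---|---|---|---|
| FCRL3 | rs3761959* | 0.48 | 0.45 | 0.90 | 0.029 | 0.49 | 0.45 | 0.87 | 9.4 × 10⁻³ |
| FCRL3 | rs11264794 | 0.42 | 0.46 | 1.10 | 0.079 | 0.42 | 0.46 | 1.12 | 0.013 |
| FCRL3 | rs11264793 | 0.27 | 0.24 | 0.87 | 0.020 | 0.26 | 0.24 | 0.90 | 0.044 |
| FCRL3 | rs11264798 | 0.44 | 0.49 | 1.21 | 4.0 × 10⁻³ | 0.44 | 0.49 | 1.22 | 1.6 × 10⁻⁵ |
| FCRL3 | rs10489678 | 0.19 | 0.20 | 1.07 | 0.30 | 0.20 | 0.20 | 1.04 | 0.43 |
| FCRL3 | rs6691569 | 0.28 | 0.28 | 1.03 | 0.60 | 0.29 | 0.29 | 1.00 | 0.93 |
| FCRL3 | rs2282284 | 0.062 | 0.058 | 0.92 | 0.013 | 0.062 | 0.058 | 0.93 | 0.47 |
| FCRL5 | rs6667109 | 0.17 | 0.15 | 0.91 | 0.18 | 0.18 | 0.15 | 0.85 | 7.7 × 10⁻² |

* This SNP tags the SNP rs7522061, which was flagged as associated with AITD in the WTCCC screen (p = 2.1 × 10)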

DISCUSSION

Our scan of nsSNPs has identified and validated two new genes for AS (ARTS1 and IL23R), confirmed and extended markers in the TSHR and FCRL3 genes which have previously been associated with AITD, and provided a dense set of association data for AITD, AS and MS across the MHC region. The challenge now is to design functional studies that will reveal how variation in these genes translates into physiological processes that influence disease risk.

From a functional perspective, the ARTS1 and IL23R genes represent excellent biological candidates. ARTS1 has two known functions, either of which may explain its association with AS. Within the endoplasmic reticulum, ARTS1 is involved in trimming peptides to the optimal length for MHC Class I presentation33,34. AS is primarily an HLA Class I mediated autoimmune disease35, with >90% of cases carrying the HLA-B27 allele. How B27 increases risk of AS is unknown, but if the mechanism of association of ARTS1 with the disease is through effects on peptide presentation, this would inform research into the mechanism explaining the association of B27 with AS. The second known function of ARTS1 is that it cleaves cell surface receptors for the pro-inflammatory cytokines IL-1 (IL-1R2)36, IL-6 (IL-6Rα)37 and TNF (TNFR1)38, thereby downregulating their signaling. Genetic variants that alter the functioning of ARTS1 could therefore have pro-inflammatory effects through this mechanism.

As well as in AS, polymorphisms in IL23R have recently been documented in Crohn’s disease26,27 and psoriasis28, suggesting that this gene is a common susceptibility factor for the major ‘seronegative’ diseases, at least partially explaining their co-occurrence. IL-23R is a key factor in the regulation of a newly defined effector T-cell subset, TH17 cells. TH17 cells were originally identified as a distinct subset of T-cells expressing high levels of the pro-inflammatory cytokine IL-17 in response to stimulation, in addition to IL-1, IL-6, TNFα, IL-22 and IL-25 (IL-17E). IL-23 has been shown to be important in mouse models of experimental autoimmune encephalomyelitis39, collagen-induced arthritis40 and inflammatory bowel disease41, but has not been studied in AS, either in humans or in animal models of disease. These studies show that blocking IL-23 reduces inflammation in these models, suggesting that the IL23R variants associated with disease are pro-inflammatory. Successful treatment of Crohn’s disease has been reported with anti-IL-12p40 antibodies, which block both IL-12 and IL-23, as these cytokines share the IL-12p40 chain42. No functional studies of IL23R variants have been reported to date, and it is unclear to what extent findings in studies targeting IL-23 can be generalised to mechanisms by which IL23R variation affects disease susceptibility. Our genetic findings provide a major novel insight into the aetiopathogenesis of AS, and suggest that treatments targeting IL-23 may prove effective in this condition, but clearly much more needs to be understood about the mechanism underlying the observed association.

Despite the successful identification of the ARTS1 and IL23R genes, it is likely either that additional real associations are present in our data but with modest effect sizes, or that our focus on non-synonymous coding changes led us to miss real genes. For example, we found no evidence for association at even a nominal p < 0.05 in or within 2 Mb of the recently reported and validated breast cancer gene FGFR2 (refs 3,43) (2 nsSNPs in FGFR2 were included in our panel, rs1078816 and rs755793; the former yielded p = 0.12 and the latter was monomorphic in these samples), nor in or near any of the other suggestive breast cancer genes reported in refs 3,43. Lack of statistical power in 1000 cases and 1500 controls is a likely contributor to this lack of replication, but some of these loci, notably FGFR2, appear to be intronic and thus would likely have been missed even with larger samples.

The issue of statistical power is emphasized in studies of non-synonymous coding changes, which have a greater number of rare variants than other genetic variants and thus will require even larger sample sizes unless the effect sizes are larger. Other analytical approaches, such as assessing evidence for association between clusters of rare variants rather than individual loci may prove highly informative in this regard44, but most of the nsSNPs available in this study exist either by themselves in each gene or with 1-2 others, which precludes these assessments (Supplementary Figure 6). In our analyses, ARTS1 was the only locus showing exceptional statistical significance in the scan of 1000 cases and 1500 controls, thus emphasizing the need for greater statistical power. We increased power by expanding the controls, or ‘reference set’, to include some or all of the other disease samples. In doing so, ARTS1 showed strong association evidence, the IL23R SNPs increased to a level that began to delineate them from background noise, and the AITD/TSHR confirmation emerged. This demonstration of increased statistical power by combining multiple datasets is timely given the international impetus to make genotype data available to the scientific community. Future investigations will be needed to assess the power vs confounding effects and the statistical corrections needed to combine more heterogeneous samples from broader sampling regions.

These results also highlight the question of how much information may be missed by focusing on coding SNPs rather than searching more broadly over the genome at large. This question is relevant because the trade-off between SNP panel selection and the sample size to genotype is a salient factor in every genome-wide study design. In the HapMap data45, a substantial portion of the common non-synonymous variation in our nsSNP set is captured by available genome-wide panels (about 65% of common (MAF > 5%) nsSNPs in the Illumina Human NS-12 Beadchip are tagged with r² > 0.8 using the Affymetrix 500K chip, rising to 90% for the Illumina HumanHap300, which includes almost all of the nsSNPs from the NS-12 Beadchip). The four primary associated variants flagged in our study (i.e., in ARTS1, IL23R, TSHR and FCRL3) would have been detected using any of the genome-wide panels, since either the markers themselves or a SNP in high LD with them (r² ≥ 0.78) are present on the genome-wide chips. This LD relationship also emphasizes the fact that observing an association with a nsSNP does not necessarily imply that the nsSNP is causal, as it may be indirectly associated with other genetic variants in or outside the gene. Given this high degree of overlap, the continuously increasing coverage of many available genotyping products and concomitant pressures to decrease assay costs, these data suggest that future gene-centric scans will be efficiently subsumed by the more encompassing and less hypothesis-driven genome-wide SNP panels.

METHODS

Subjects

Individuals included in the study were self-identified as white Europeans and came from mainland UK (England, Scotland and Wales, but not Northern Ireland). The 1500 control samples were from the British 1958 Birth Cohort (58C, also known as the National Child Development Study), which included all births in England, Wales and Scotland during one week in 1958. Recruitment details and diagnostic criteria for each of the four case groups and the 58C are further described in the Supplementary Methods online.

Sample QA/QC

Genome-wide Identity by State (IBS) sharing was calculated for each pair of individuals in the combined sample of cohorts in order to identify first and second degree relatives that might contaminate the study. One subject from any pair of individuals who shared < 400 genotypes IBS = 0 and/or > 80% alleles IBS was removed from all subsequent analyses (i.e. the individual with the most missing genotypes). In order to identify individuals who might have ancestries other than Western European, we merged each of our cohorts with the 60 CEU founder, 60 YRI founder, and 90 JPT and CHB individuals from the International HapMap Project45. We calculated genome-wide identity by state distances for each pair of individuals (i.e. one minus average IBS sharing) on those markers shared between HapMap and our non-synonymous panel, and then used the multidimensional scaling option in R to generate a two dimensional plot based upon individuals’ scores on the first two principal coordinates from this analysis (Supplementary Figure 2). Any WTCCC sample that was not present in the main cluster with the CEU individuals was excluded from subsequent analyses. Finally, any individual with >10% of genotypes missing was removed from the analysis. The number of individuals remaining after these quality control measures is displayed in Table 1.
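
The pairwise IBS quantities described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes genotypes coded as minor-allele counts (0, 1, 2, with None for a missing call), and the function and variable names are hypothetical. The thresholds quoted in the text (< 400 genotypes with IBS = 0, > 80% alleles IBS) only make sense on the full genome-wide marker set, so they are not applied to the toy data here.

```python
from typing import Optional, Sequence, Tuple

Genotype = Optional[int]  # 0, 1 or 2 copies of the minor allele; None = missing


def ibs_summary(a: Sequence[Genotype], b: Sequence[Genotype]) -> Tuple[int, float]:
    """Return (markers with IBS = 0, proportion of alleles shared IBS)."""
    ibs0 = 0       # markers where the pair shares no allele (0 vs 2)
    shared = 0     # total alleles shared identical by state
    compared = 0   # markers genotyped in both individuals
    for ga, gb in zip(a, b):
        if ga is None or gb is None:
            continue
        diff = abs(ga - gb)
        shared += 2 - diff
        compared += 1
        if diff == 2:
            ibs0 += 1
    if compared == 0:
        raise ValueError("no markers genotyped in both individuals")
    return ibs0, shared / (2 * compared)


# Toy example with six markers (one missing in the first individual).
person_a = [0, 1, 2, 2, None, 1]
person_b = [0, 2, 0, 2, 1, 1]
ibs0_count, ibs_proportion = ibs_summary(person_a, person_b)
ibs_distance = 1.0 - ibs_proportion  # "one minus average IBS sharing"
```

A matrix of such distances over all pairs is what a multidimensional scaling routine would take as input to produce the two-dimensional ancestry plot.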

Genotyping

We genotyped a total of 14,436 nsSNPs across the genome on all case and control samples. Because three of the diseases were of autoimmune etiology, we also typed an additional 897 SNPs within the MHC region, as well as 103 SNPs in pigmentation genes specifically designed to differentiate between population groups. SNP genotyping was performed with the Infinium I assay (Illumina) which is based on Allele Specific Primer Extension (ASPE) and the use of a single fluorochrome. The assay requires ~250 ng of genomic DNA which is first subjected to a round of isothermal amplification generating a “high complexity” representation of the genome with most loci represented at usable amounts. There are two allele specific probes (50mers) per SNP each on a different bead type; each bead type is present on the array 30 times on average (minimum 5), allowing for multiple independent measurements. We processed six samples per array. Clustering was performed with the GenCall software version 6.2.0.4 which assigns a quality score to each locus and an individual genotype confidence score (GC score) which is based on the distance of a genotype from the centre of the nearest cluster. First, we removed samples with more than 50% of loci having a score below 0.7 and then all loci with a quality score below 0.2. Post clustering we applied two additional filtering criteria: (i) omit individual genotypes with a GC score < 0.15 and (ii) remove any SNP which had more than 20% of its samples with GC scores below 0.15. The above criteria were designed so as to optimize genotype accuracy whilst minimizing uncalled genotypes.
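
The two post-clustering filters, (i) and (ii), can be sketched as below. The data layout (a dictionary mapping each SNP to a list of per-sample call and GC score pairs) and all names are assumptions made for illustration; they are not the GenCall output format.

```python
from typing import Dict, List, Optional, Tuple

Call = Tuple[Optional[str], float]  # (genotype call, GC score) for one sample


def filter_calls(
    calls_by_snp: Dict[str, List[Call]],
    min_gc: float = 0.15,
    max_low_fraction: float = 0.20,
) -> Dict[str, List[Optional[str]]]:
    """Apply filters (i) and (ii) from the text to per-SNP genotype calls."""
    kept: Dict[str, List[Optional[str]]] = {}
    for snp, calls in calls_by_snp.items():
        low = sum(1 for _, gc in calls if gc < min_gc)
        # (ii) drop the SNP if more than 20% of its samples have a low GC score
        if calls and low / len(calls) > max_low_fraction:
            continue
        # (i) blank out individual genotypes whose GC score is below the cut-off
        kept[snp] = [g if gc >= min_gc else None for g, gc in calls]
    return kept


toy = {
    "snp_a": [("AA", 0.9), ("AB", 0.8), ("AB", 0.10), ("BB", 0.7), ("AA", 0.6)],
    "snp_b": [("AA", 0.9), ("AB", 0.05), ("AB", 0.10), ("BB", 0.7), ("AA", 0.6)],
}
filtered = filter_calls(toy)
```

In the toy data, snp_a keeps four calls and loses one low-confidence genotype, while snp_b is removed because two of its five samples fall below the GC cut-off.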

Statistical Analysis

Markers that were monomorphic in both case and control samples, SNPs with > 10% missing genotypes, and SNPs with differences in the amount of missing data between cases and controls (p < 10 as assessed by χ² test) were excluded from all analyses involving that case group only. In addition, any marker which failed an exact test of Hardy-Weinberg equilibrium in controls (p < 10) was excluded from all analyses46.

Cochran-Armitage tests for trend47 were conducted using the PLINK program48. For the present analyses, we used the significance thresholds of p < 10 – 10, as suggested for gene-based scans with stronger prior probabilities than scans of anonymous markers21. In the present context, the lower thresholds are similar to Bonferroni significance levels (Bonferroni-corrected p = 0.05 corresponds to nominal p = 3 × 10⁻⁶). The conditional logistic regression analyses involving the LNPEP and ARTS1 SNPs were performed using Purcell’s WHAP program49.
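
For readers unfamiliar with the test, the following is a minimal sketch of a Cochran-Armitage trend test with additive weights (0, 1, 2), together with the Bonferroni arithmetic mentioned above. It is not the PLINK implementation; the function names and the toy genotype counts are hypothetical, and using 14,436 as the number of tests is an assumption (the number of SNPs passing QC differs by disease).

```python
import math
from typing import Sequence, Tuple


def trend_test(cases: Sequence[int], controls: Sequence[int]) -> Tuple[float, float]:
    """Cochran-Armitage trend test; returns (chi-square with 1 df, p value).

    cases/controls hold genotype counts for 0, 1 and 2 copies of the test allele.
    """
    weights = (0, 1, 2)
    totals = [r + s for r, s in zip(cases, controls)]
    n_cases, n_controls = sum(cases), sum(controls)
    n = n_cases + n_controls
    sum_wr = sum(w * r for w, r in zip(weights, cases))
    sum_wn = sum(w * t for w, t in zip(weights, totals))
    sum_w2n = sum(w * w * t for w, t in zip(weights, totals))
    numerator = n * (n * sum_wr - n_cases * sum_wn) ** 2
    denominator = n_cases * n_controls * (n * sum_w2n - sum_wn ** 2)
    if denominator == 0:
        raise ValueError("trend test undefined (e.g. monomorphic marker)")
    chi2 = numerator / denominator
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Nominal p value needed for a family-wise error rate of alpha."""
    return alpha / n_tests


chi2_toy, p_toy = trend_test(cases=(30, 50, 20), controls=(50, 40, 10))
nominal = bonferroni_threshold(0.05, 14436)  # roughly 3.5e-6
```

With about 14,500 tests the Bonferroni threshold is of the order of 10⁻⁶, which is consistent with the correspondence stated in the text.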

We manually rechecked the genotype calls of every nsSNP with an asymptotic significance level of p < 10 by inspecting raw signal intensity values and their corresponding automated genotype calls. Interestingly, this flagged an additional 33 markers with clear problems in genotype calling, which were subsequently excluded from all analyses (see Supplementary Figure 4 for an example). These results indicate that this genotyping platform generally yields highly accurate genotypes, but errors do occur and they can be distributed non-randomly between cases and controls despite stringent QC procedures. It is imperative to check the clustering of the most significant SNPs to ensure that evidence for association is not a result of genotyping error.

Whilst great lengths were taken to ensure our samples were as homogeneous as possible in terms of genetic ancestry, even subtle population substructure can substantially influence tests of association in large genome-wide analyses involving thousands of individuals50. We therefore calculated the genomic-control inflation factor, λ (ref. 20), for each case-control sample as well as in the analyses where we combined the other case groups with the control individuals (Table 2). In general, values for λ were small (~1.1), indicating a small degree of substructure in UK samples which induces only a slight inflation of the test statistic under the null hypothesis, consistent with the results from our companion paper12. We therefore present uncorrected results in all analyses reported.
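
One common estimator of the genomic-control inflation factor divides the median of the observed 1-df χ² statistics by the median of the χ² distribution with one degree of freedom (about 0.455). The sketch below assumes that median-based estimator; the text does not state which variant was used, and the function name and toy statistics are hypothetical.

```python
import statistics
from typing import Sequence

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(chi2_stats: Sequence[float]) -> float:
    """Median-based estimate of the genomic-control inflation factor lambda."""
    if not chi2_stats:
        raise ValueError("need at least one test statistic")
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


# Toy statistics; a real scan would supply one value per SNP.
toy_stats = [0.1, 0.3, 0.5, 0.9, 2.0]
lam = genomic_inflation(toy_stats)
print(round(lam, 2))
```

A value near 1 indicates little inflation; a genomic-control correction would divide each test statistic by λ, a step the authors chose not to apply.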

Acknowledgements

We would like to thank all the patients and controls who participated in this study.

AITD: We wish to thank the collection coordinators, Jackie Carr-Smith and all contributors to the AITD national DNA collection of index cases and family members from centres including Birmingham, Bournemouth, Cambridge, Cardiff, Exeter, Leeds, Newcastle and Sheffield. Principal leads for the AITD UK national collection are: Simon HS Pearce (Newcastle), Bijay Vaidya (Exeter), John H Lazarus (Cardiff), Amit Allahabadia (Sheffield), Mary Armitage (Bournemouth), Peter J Grant (Leeds), VK Chatterjee (Cambridge).

AS: We wish to thank the Arthritis Research Campaign (UK). MAB is funded by the National Health and Medical Research Council (Australia). TASC is funded by the National Institute of Arthritis and Musculoskeletal and Skin Diseases grants 1PO1-052915-01, RO1-AR046208 and RO1-AR048465, as well as by University of Texas at Houston CTSA grant UL1RR024148, Cedars-Sinai GCRC grant MO1-RR00425, The Rosalind Russell Center for Arthritis Research at The University of California San Francisco, and the Intramural Research Program, National Institute of Arthritis and Musculoskeletal and Skin Diseases, NIH. We thank Rui Jin for technical assistance and Laura Diekman, Lori Guthrie, Felice Lin and Stephanie Morgan for their study coordination.

BC: The Breast Cancer samples were clinically and molecularly curated with the assistance of Anthony Renwick, Anita Hall, Anna Elliot, Hiran Jayatilake, Tasnim Chagtai, Rita Barfoot, Patrick Kelly and Katarina Spanova. Our research is supported by US Army Medical Research and Material Command grant #W81XWH-05-1-0204, The Institute of Cancer Research and Cancer Research UK.

MS: Our work has been supported by the Wellcome Trust (Grant Ref 057097), the Medical Research Council (UK) (Grant Ref G0000648), the Multiple Sclerosis Society of Great Britain and Northern Ireland (Grant Ref 730/02) and the National Institutes of Health (USA) (Grant Ref 049477). AG is a Postdoctoral Fellow of the Research Foundation - Flanders (FWO - Vlaanderen).

Luana Galver and Paulina Ng at Illumina and Jonathan Morrison at the Sanger Institute contributed to the design of the nsSNP array. We thank the DNA team of the JDRF/WT DIL, and Thomas Dibling, Cliff Hind and Douglas Simpkin at the Sanger Institute for carrying out the genotyping.

We also wish to thank Sheila Bingham and the WTCCC Inflammatory Bowel Disease group for genotyping the ARTS1 markers in their replication samples.

Address correspondence to: Lon R. Cardon or David M. Evans
Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, Seattle, Washington, 98109, USA. Tel: +1 206 667 6542. Fax: +1 206 667 4023. Email: lcardon@fhcrc.org
University of Oxford, Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford OX3 7BN. Tel: +44 (0)1865 287 587. Fax: +44 (0)1865 287 697. Email: davide@well.ox.ac.uk
List of participants and affiliations appear at the end of this manuscript

Abstract

We have genotyped 14,436 nsSNPs and 897 MHC tagSNPs in 1000 independent cases of Ankylosing Spondylitis (AS), Autoimmune Thyroid Disease (AITD), Multiple Sclerosis and Breast Cancer. Comparing each of these diseases against a common control set of 1500 unselected healthy British individuals, we report initial association and independent replication of two new loci for AS, ARTS1 and IL23R, and confirmation of the previously reported AITD association with TSHR and FCRL3. These findings, enabled in part by expanding the control reference group with individuals from the other disease groups to increase statistical power, highlight important new possibilities for autoimmune regulation and suggest that IL23R may be a common susceptibility factor for the major ‘seronegative’ diseases.

Introduction

Genome-wide association scans are currently revealing a number of new genetic variants for common diseases (e.g. refs 1–11). We have recently completed the largest and most comprehensive scan conducted to date, involving genome-wide association studies of 2000 individuals from each of seven common disease cohorts and 3000 common control individuals using a dense panel of >500k markers12. In parallel with this scan, we conducted a study of 5,500 independent individuals with a genome-wide set of non-synonymous coding variants, an approach which has recently yielded new findings for Type 1 diabetes and Crohn’s disease and which has been proposed as an efficient complementary approach to whole genome scans13–15. Here we report several new replicated associations in our scan of nsSNPs in 1500 shared controls and 1000 individuals from each of 4 different diseases: Ankylosing Spondylitis (AS), Autoimmune Thyroid Disease/Graves’ Disease (AITD), Breast Cancer (BC) and Multiple Sclerosis (MS).

Footnotes

Membership of the BRAGGS and Breast Cancer Susceptibility Collaboration (UK) is listed in the Supplementary Information.

References

  • 1. Rioux JD, et al Genome-wide association study identifies new susceptibility loci for Crohn disease and implicates autophagy in disease pathogenesis. Nat Genet. 2007;39:596–604.[Google Scholar]
  • 2. Sladek R, et al A genome-wide association study identifies novel risk loci for type 2 diabetes. Nature. 2007;445:881–5.[PubMed][Google Scholar]
  • 3. Easton DF, et al Genome-wide association study identifies novel breast cancer susceptibility loci. Nature. 2007;447:1087–93.[Google Scholar]
  • 4. Libioulle C, et al Novel Crohn disease locus identified by genome-wide association maps to a gene desert on 5p13.1 and modulates expression of PTGER4. PLoS Genet. 2007;3:e58.[Google Scholar]
  • 5. Zanke BW, et al Genome-wide association scan identifies a colorectal cancer susceptibility locus on chromosome 8q24. Nat Genet. 2007;39:989–994.[PubMed][Google Scholar]
  • 6. Haiman CA, et al Multiple regions within 8q24 independently affect risk for prostate cancer. Nat Genet. 2007;39:638–44.[Google Scholar]
  • 7. Gudmundsson J, et al Two variants on chromosome 17 confer prostate cancer risk, and the one in TCF2 protects against type 2 diabetes. Nat Genet. 2007;39:977–83.[PubMed][Google Scholar]
  • 8. Moffatt MF, et al Genetic variants regulating ORMDL3 expression contribute to the risk of childhood asthma. Nature. 2007;448:470–3.[PubMed][Google Scholar]
  • 9. Zeggini E, et al Replication of genome-wide association signals in UK samples reveals risk loci for type 2 diabetes. Science. 2007;316:1336–41.[Google Scholar]
  • 10. Scott LJ, et al A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science. 2007;316:1341–5.[Google Scholar]
  • 11. Saxena R, et al Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science. 2007;316:1331–6.[PubMed][Google Scholar]
  • 12. WTCCC Genome-wide association studies of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007;447:661–683.
  • 13. Hampe J, et al A genome-wide association scan of nonsynonymous SNPs identifies a susceptibility variant for Crohn disease in ATG16L1. Nat Genet. 2007;39:207–211.[PubMed][Google Scholar]
  • 14. Jorgenson E, Witte JSCoverage and power in genomewide association studies. Am J Hum Genet. 2006;78:884–8.[Google Scholar]
  • 15. Smyth DJ, et al A genome-wide association study of nonsynonymous SNPs identifies a type 1 diabetes locus in the interferon-induced helicase (IFIH1) region. Nat Genet. 2006;38:617–9.[PubMed][Google Scholar]
  • 16. Miretti MM, et al. A high-resolution linkage-disequilibrium map of the human major histocompatibility complex and first generation of tag single-nucleotide polymorphisms. Am J Hum Genet. 2005;76:634–46.
  • 17. Clayton DG, et al. Population structure, differential bias and genomic control in a large-scale, case-control association study. Nat Genet. 2005;37:1243–6.
  • 18. Sims AM, et al. Non-B27 MHC associations of ankylosing spondylitis. Genes Immun. 2007;8:115–23.
  • 19. McGinnis R, Shifman S, Darvasi A. Power and efficiency of the TDT and case-control design for association scans. Behav Genet. 2002;32:135–44.
  • 20. Devlin B, Roeder K. Genomic control for association studies. Biometrics. 1999;55:997–1004.
  • 21. Thomas DC, Clayton DG. Betting odds and genetic associations. J Natl Cancer Inst. 2004;96:421–3.
  • 22. Jiang HR, et al. Sialoadhesin promotes the inflammatory response in experimental autoimmune uveoretinitis. J Immunol. 2006;177:2258–64.
  • 23. Dechairo BM, et al. Association of the TSHR gene with Graves’ disease: the first disease specific locus. Eur J Hum Genet. 2005;13:1223–30.
  • 24. Hiratani H, et al. Multiple SNPs in intron 7 of thyrotropin receptor are associated with Graves’ disease. J Clin Endocrinol Metab. 2005;90:2898–903.
  • 25. Miettinen O. Proportion of disease caused or prevented by a given exposure, trait or intervention. Am J Epidemiol. 1974;99:325–332.
  • 26. Duerr RH, et al. A Genome-Wide Association Study Identifies IL23R as an Inflammatory Bowel Disease Gene. Science. 2006
  • 27. Tremelling M, et al. IL23R variation determines susceptibility but not disease phenotype in inflammatory bowel disease. Gastroenterology. 2007;132:1657–64.
  • 28. Cargill M, et al. A large-scale genetic association study confirms IL12B and leads to the identification of IL23R as psoriasis-risk genes. Am J Hum Genet. 2007;80:273–90.
  • 29. Simmonds MJ, et al. Contribution of single nucleotide polymorphisms within FCRL3 and MAP3K7IP2 to the pathogenesis of Graves’ disease. J Clin Endocrinol Metab. 2006;91:1056–61.
  • 30. Kochi Y, et al. A functional variant in FCRL3, encoding Fc receptor-like 3, is associated with rheumatoid arthritis and several autoimmunities. Nat Genet. 2005;37:478–85.
  • 31. Capon F, et al. Fine mapping of the PSORS4 psoriasis susceptibility region on chromosome 1q21. J Invest Dermatol. 2001;116:728–30.
  • 32. Dai KZ, et al. The T cell regulator gene SH2D2A contributes to the genetic susceptibility of multiple sclerosis. Genes Immun. 2001;2:263–8.
  • 33. Chang SC, Momburg F, Bhutani N, Goldberg AL. The ER aminopeptidase, ERAP1, trims precursors to lengths of MHC class I peptides by a “molecular ruler” mechanism. Proc Natl Acad Sci U S A. 2005;102:17107–12.
  • 34. Saveanu L, et al. Concerted peptide trimming by human ERAP1 and ERAP2 aminopeptidase complexes in the endoplasmic reticulum. Nat Immunol. 2005;6:689–97.
  • 35. Brown MA, et al. HLA class I associations of ankylosing spondylitis in the white population in the United Kingdom. Ann Rheum Dis. 1996;55:268–70.
  • 36. Cui X, Rouhani FN, Hawari F, Levine SJ. Shedding of the type II IL-1 decoy receptor requires a multifunctional aminopeptidase, aminopeptidase regulator of TNF receptor type 1 shedding. J Immunol. 2003;171:6814–9.
  • 37. Cui X, Rouhani FN, Hawari F, Levine SJ. An aminopeptidase, ARTS-1, is required for interleukin-6 receptor shedding. J Biol Chem. 2003;278:28677–85.
  • 38. Cui X, et al. Identification of ARTS-1 as a novel TNFR1-binding protein that promotes TNFR1 ectodomain shedding. J Clin Invest. 2002;110:515–26.
  • 39. Cua DJ, et al. Interleukin-23 rather than interleukin-12 is the critical cytokine for autoimmune inflammation of the brain. Nature. 2003;421:744–8.
  • 40. Murphy CA, et al. Divergent pro- and antiinflammatory roles for IL-23 and IL-12 in joint autoimmune inflammation. J Exp Med. 2003;198:1951–7.
  • 41. Hue S, et al. Interleukin-23 drives innate and T cell-mediated intestinal inflammation. J Exp Med. 2006;203:2473–83.
  • 42. Mannon PJ, et al. Anti-interleukin-12 antibody for active Crohn’s disease. N Engl J Med. 2004;351:2069–79.
  • 43. Hunter DJ, et al. A genome-wide association study identifies alleles in FGFR2 associated with risk of sporadic postmenopausal breast cancer. Nat Genet. 2007;39:870–4.
  • 44. Cohen JC, et al. Multiple rare alleles contribute to low plasma levels of HDL cholesterol. Science. 2004;305:869–72.
  • 45. A haplotype map of the human genome. Nature. 2005;437:1299–320.
  • 46. Wigginton JE, Cutler DJ, Abecasis GR. A note on exact tests of Hardy-Weinberg equilibrium. Am J Hum Genet. 2005;76:887–93.
  • 47. Armitage P. Test for linear trend in proportions and frequencies. Biometrics. 1955;11:375–386.
  • 48. Purcell S, et al. PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses. Am J Hum Genet. 2007;81:559–75.
  • 49. Purcell S, Daly MJ, Sham PC. WHAP: haplotype-based association analysis. Bioinformatics. 2007;23:255–6.
  • 50. Marchini J, Cardon LR, Phillips MS, Donnelly P. The effects of human population structure on large genetic association studies. Nat Genet. 2004