Cell 159(3): 676-690

Integrated Genomic Characterization of Papillary Thyroid Carcinoma

Introduction

The incidence of thyroid cancer has increased 3-fold over the past 30 years (Chen et al., 2009) and the prevalence of different histologies and genetic profiles has changed over time (Jung et al., 2014). All thyroid cancers, except medullary carcinoma, are derived from follicular cells that comprise the simple unicellular epithelium of normal thyroid. Eighty percent of all thyroid cancers are papillary thyroid carcinomas (PTCs), named for their papillary histological architecture. In addition, PTCs encompass several subtypes, including the follicular variant (FV), characterized by a predominantly follicular growth pattern. PTCs are usually curable with 5-year survival of over 95% (Hay et al., 2002); however, occasionally they dedifferentiate into more aggressive and lethal thyroid cancers. Current treatment involves surgery, thyroid hormone and radioactive iodine (RAI) therapy (which exploits thyroid follicular cells' avidity for iodine).

Previous genetic studies report a high frequency (70%) of activating somatic alterations of genes encoding effectors in the mitogen-activated protein kinase (MAPK) signaling pathway, including point mutations of BRAF and the RAS genes (Cohen et al., 2003; Kimura et al., 2003; Lemoine et al., 1988; Suarez et al., 1988), as well as fusions involving the RET (Grieco et al., 1990) and NTRK1 tyrosine kinases (Pierotti et al., 1995). These mutations are almost always mutually exclusive (Soares et al., 2003), suggesting similar or redundant downstream effects. The various MAPK pathway alterations are strongly associated with distinct clinicopathological characteristics (Adeniran et al., 2006), and gene expression (Giordano et al., 2005) and DNA methylation profiles (Ellis et al., 2014). Mutations in members of the phosphoinositide 3-kinase (PI3K) pathway, such as PTEN, PIK3CA and AKT1, have also been reported at low frequencies (Xing, 2013).

We present The Cancer Genome Atlas (TCGA) project results from a comprehensive multiplatform analysis of 496 PTCs, the largest cohort studied to date. Clinically aggressive thyroid cancers (poorly and undifferentiated carcinomas) were excluded to maximally develop the compendium of tumor-initiating alterations. While excluding histological types of aggressive tumors limited some aspects of the study, the homogeneous PTC cohort allowed robust correlative analyses of multidimensional molecular data. The relatively quiet PTC genome allowed us to assess the signaling and differentiation consequences of the common drivers. Furthermore, the cohort allowed us to define integrated molecular subtypes that correspond to histology, signaling, differentiation state and risk assessment. We put forth that our results will lead to improved clinicopathologic classification and management of patients.

Results

Samples, Clinical Data and Analytical Approach

Tumor samples and matched germline DNA from blood or normal thyroid from 496 patients included 324 (69.4%) classical-type (CT), 99 (21.2%) follicular-variant (FV), 35 (7.5%) tall cell variant (TCV), 9 (2.0%) uncommon PTC variants and 29 without histological annotation, primarily from non-irradiated patients (Table S1A). We estimated risk of tumor recurrence based on the 2009 American Thyroid Association guidelines (American Thyroid Association Guidelines Taskforce on Thyroid et al., 2009), and assessed mortality risk using MACIS scores (Hay et al., 1993) (see Supplement). We generated comprehensive and high-quality molecular data at TCGA genome sequencing and characterization centers with one proteomic and six genomic platforms (Table S1B), and analyzed the data at multiple genomic data analysis centers (Table S1C). Although 496 primary tumors were studied, the number of informative cases varied across platforms mostly for technical reasons and 390 tumors were analyzed on all major platforms (SNP arrays, exomes, RNA-seq, miRNA-seq and DNA methylation).

Our analysis strategy consisted of three parts, each based on integrating several molecular datasets. First, we identified somatic mutations that included single nucleotide variants, small insertions and deletions, gene fusions and copy number alterations in order to characterize the genomic landscape of PTC and identify driver events in cases without any previously known driver (i.e., the so-called ‘dark matter’ of the PTC genome). Next, we focused on the consequences of the driver mutations. We developed a gene expression signature of samples with these mutations and characterized tumors based on this signature. We then determined the differential signaling consequences of BRAF^andRAS mutations, the most common pathogenic mutations in the cohort, using protein and mRNA expression data. Finally, we used the molecular data to derive molecular classifications of PTC and integrated these classifications with data related to genotype, signaling, differentiation and risk.

Somatic Single Nucleotide Variants and Insertions and Deletions

Whole exome DNA sequencing of 402 (of 496) tumor/normal pairs (average depth-of-coverage; 97.0× for tumors, 94.9× for normals) showed a low somatic mutation density (0.41 non-synonymous mutations per Mb, on average) (Figures 1A and S1A; Table S3) relative to other cancers (Lawrence et al., 2014; Lawrence et al., 2013). Mutation density correlated with age (Pearson correlation p=5.2×10^,Figure S1B), risk of recurrence (Kruskal-Wallis test p=3.4×10^{), and MACIS score (Pearson correlation p=4×10}^,Figure S1C). To compensate for the confounding effect of age-at-diagnosis, we regressed the effect of age from mutation density and found that the association of risk with age-corrected mutation density remained (Kruskal-Wallis test p=9.7×10^{; MACIS scores did not, p=0.19,}Figure S1C). This association was maintained for the CT cohort (p=0.0044), but not for other variants (Figure S1D). The correlation between mutation density and age suggests age should be used as a continuous variable in risk stratification (Bischoff et al., 2013)), instead of a threshold of 45 years used in many staging systems.

An external file that holds a picture, illustration, etc.
Object name is nihms633017f1.jpg

Figure 1

Landscape of Genomic Alterations in 402 Papillary Thyroid Carcinomas

(A) Mutation density (mutations/Mb) across the cohort. (B) Tumor purity, patient age, gender, history of radiation exposure, risk of recurrence, MACIS score, histological type, and BRS score. (C) Number and frequency of recurrent mutations in genes (left) ranked by MutSig significance (right), gene-sample matrix of mutations (middle) with TERT promoter mutations (bottom). (D) Number and frequency of fusion events (left), gene-sample matrix of fusions across the cohort (middle). (E) Number and frequency of SCNAs (left), chromosome-sample matrix of SCNAs across the cohort (middle) with focal deletions in BRAF and PTEN (bottom), GISTIC2 significance (right). (F) Driving variant types across the cohort. Samples were sorted by driving variant type with dark matter on the left. See also Figures S1, S2, S3, S4 and Tables S1, S2, S3, S4A, and S5A,B,E.

Mutation densities were not associated with other variables, like genotype or radiation exposure (Mann-Whitney p=0.579 and p=0.173, respectively; Figure S1E). TCVs had the highest mutation density (Figure S1F), consistent with their known aggressive behavior. Five BRAF^V600E-mutant tumors with aggressive histologic features had higher mutation densities (Figures S1E,G).

Ten tumors with the highest mutation densities (> 1/Mb) were enriched for mutations associated with the APOBEC process (Roberts et al., 2013) (Mann-Whitney p=4×10^{), similar to bladder cancer (}Cancer Genome Atlas Research, 2014).

The relatively large number of patients in this study and the low background mutation density provides the statistical power to detect significantly mutated genes (SMGs) in as low as 3% of cases (>90% power for 90% of genes) (Lawrence et al., 2014). MutSig (Lawrence et al., 2013) detected seven SMGs (q<0.1) (Figure 1C and Table S2) with four of the seven in <3% of patients. SMGs included the MAPK-related genes, BRAF, NRAS, HRAS and KRAS, which were virtually mutually exclusive (Figure 1C, Fisher's exact test p=1.1×10^{, MEMo (}Ciriello et al., 2012) corrected p<0.01, Table S4A) in 300/402 (74.6%) patients. The 248 (61.7%) BRAF mutations were mostly V600E substitution (Figures 1C and S2A and Table S3C). Somatic single nucleotide variants (SSNVs) were identified in 52 patients (12.9%) within codons 12 and 61 of RAS genes (Figures 1C and S2B and Table S3D). We observed strong associations between BRAF and RAS mutation status and histology, with BRAF^{characterizing CT and TCV and}RAS mutations characterizing FV (Figures 1B,C and Table S3E). These observations confirm the critical role of MAPK pathway alterations in PTC and illustrate that having more than one mutation confers no clonal advantage.

MutSig also identified EIF1AX (eukaryotic translation initiation factor 1A, X-linked) as significantly mutated (q=5.3×10^;Figure 1C and Table S3B). EIF1AX encodes a protein that mediates transfer of Met-tRNAf to 40S ribosomal subunits to form the 40S preinitiation complex for protein translation. Six (1.5%) mutations (Figure S2C) were identified in tumors lacking other known driver mutations (Fisher's exact test of EIF1AX vs. RAS/BRAF, p=0.013) with one exception: one case with 3 driver mutations in KRAS, BRAF and EIF1AX; here the KRAS mutation was clonal (cancer cell fraction estimated at 100% with 95% CI [56-100%]), while EIF1AX and BRAF were subclonal at 76% [38-100%] and 53% [22-94%] cancer cell fractions, respectively (see Supplement). The near-mutual exclusivity of EIF1AX alterations with MAPK pathway mutations (Figure 1C), together with recurrent mutations in other tumors (Forbes et al., 2011; Martin et al., 2013), suggests that EIF1AX is a novel cancer gene in PTC.

The two remaining SMGs (PPM1D, CHEK2, Figures S2D,E) encode interacting proteins related to DNA repair (Oliva-Trastoy et al., 2007) and occurred concomitant with MAPK-pathway driver mutations (Figure 1C). Germline PPM1D mutations are associated with breast and ovarian cancer predisposition and may impair p53 function (Kleiblova et al., 2013). Although not statistically significant, there were 8 additional DNA repair-related mutations in 26 (6.5%) tumors, all mutually exclusive (Figure S3A, MEMo corrected p=0.24). Tumors carrying these mutations had a significantly higher median mutation density (Table S5A, Mann Whitney p=0.022), although not after age-adjustment (p=0.111), and were enriched with high-risk patients (Fisher's exact test p=0.018). These observations, together with our finding of a mutated network of FANCA-associated genes (see Integrated Analysis of Somatic Alterations), suggest that acquisition of a defect in DNA repair represents a mechanism for development of aggressive PTC.

We also observed alterations in other cancer genes, pathways and functional groups (chromatin remodeling, PI3K, WNT and tumor suppressor genes). Although not statistically significant, they are known to play a role in thyroid cancer pathogenesis and progression (Xing, 2013). We identified 93 mutations within 57 epigenetic regulatory genes in 80/402 (20.0%) tumors, 9 of which possessed more than one mutation (Figure S3D). Mutations in MLL (1.7%), ARID1B (1.0%) and MLL3 (1%) were most frequent (Figures 1C and S3D). In the PI3K and PPARγ pathways, we observed 20 nearly mutually exclusive mutations in PTEN, AKT1/2, PAX8/PPARG (Figure S3E), representing 4.5% (18/402) of cases. Mutations of five WNT pathway-related genes were found in 6/402 (1.5%) tumors (Figure S3B), a lower frequency than reported for aggressive tumor types (Xing, 2013). Mutations of tumor suppressor genes (TP53, RB1, NF1/2, MEN1, PTEN) were identified in 15/402 (3.7%) tumors (Figure S3C). In addition, two genes that may be tumor suppressors were near significance: (i) ZFHX3, a zinc finger homeobox transcription factor (Minamiya et al., 2012), 7/402 (1.7%) tumors (q=0.79); and (ii) BDP1, which may regulate AKT signaling (Woiwode et al., 2008), in 5/402 (1.2%) tumors (q=0.58). Finally, mutations in thyroid-related genes were infrequent: 11/402 (2.7%) mutations in thyroglobulin, 2/402 (0.5%) mutations in thyroid-stimulating hormone receptor (TSHR). No mutations in thyroid hormone receptor genes (TRHA and TRHB) were found.

We identified TERT promoter mutations in 36 (9.4%) of 384 informative tumors, with 27 (7.0%) C228T, 1 (0.3%) C228A and 8 (2.1%) C250T substitutions. These mutations were present in PTCs of all histological types. TERT promoter mutations had modest association with mutation drivers (Fisher's exact test p=0.029) and arm-level somatic copy number alterations (p=0.023), but not BRAF mutations or gene fusions. They showed strong associations with older age, MACIS scores (Kruskal-Wallis test p=2.6×10^{, and p=1.3×10}^,Figure 2B, respectively), and high risk of recurrence (Fisher's exact test p=7×10^,Figure 2A), within the entire cohort, and these associations remained within the BRAF^{tumors. Finally,}TERT promoter mutations occurred in less-differentiated PTCs (lower TDS values, see Signaling and Differentiation) (Kruskal-Wallis test p=4.2×10^,Figure2C). These associations are consistent with published results (Melo et al., 2014; Xing et al., 2014) and suggest that molecular diagnostic assessment of TERT promoter mutations may be used to identify high risk patients.

An external file that holds a picture, illustration, etc.
Object name is nihms633017f2.jpg

Figure 2

TERT Promoter Mutations and Clonality Assessment of Driver Mutations

(A-C) Association of TERT promoter mutations with (A) risk of recurrence, (B) MACIS score, and (C) thyroid differentiation score (TDS). See also Table S2.

(D) Mutation cancer cell fraction distribution. The majority of all mutations, including driver mutations BRAF, NRAS, HRAS, KRAS, and EIF1AX, have a calculated cancer cell fraction close to 1.0, indicating their presence all tumor cells.

A recent report suggested that BRAF^{mutations in PTC are often present only in a small subset of the cancer cells (}Guerra et al., 2012), information relevant to therapeutic application of BRAF-inhibitors. To address this issue, we used the ABSOLUTE package (Carter et al., 2012) to estimate the cancer cell fraction (CCF) of mutations in BRAF, NRAS, HRAS, KRAS and EIF1AX (see Supplement). In our study, all of these driver mutations were present in the majority of tumor cells (Figure 2D), i.e. they are largely clonal. Our results instead confirm other studies showing homogeneous cancer cell immunohistochemical staining with BRAF^{-specific antibodies (}Ghossein et al., 2013).

We performed targeted sequencing validation experiments on a subset of 318 tumors at 333 mutated sites. Two hundred sixty-four of 265 (99.6%) of mutations in driver genes were confirmed. We also carried out targeted validation on a random set of 54 mutations, of which 49 (91%) were confirmed, for an overall validation rate of 96%.

Gene Fusions

Chromosomal rearrangements and translocations contribute to PTC pathogenesis (Xing, 2013). We identified both known and novel fusions, including new partners of previously described fusions, in 74 (15.3%) of 484 informative cases, based on multiple platforms (Figure 3). Fusions were mutually exclusive with each other and with BRAF, RAS and EIF1AX mutations (Fisher's exact test p=4.9×10^;Figures 1C,D and Figure S3F). Fusion-positive tumors were associated with younger age of diagnosis (Wilcoxon ranksum test p=0.005), but not with risk of recurrence (Fisher's exact test p=0.55) or age-corrected mutation density (ranksum test p=0.341, Table S5A).

An external file that holds a picture, illustration, etc.
Object name is nihms633017f3.jpg

Figure 3

Candidate “Driver” Gene Fusions in Papillary Thyroid Carcinoma

(A) RNA expression fusion plots for representative novel candidate genes involving RET, BRAF, ALK, NTRK3 and LTK fusions. Each gene in the fusion plot is drawn 5′ to 3′, exon specific relative expression data is represented with low (blue) and high expression (red), and the kinase domain is mapped with a green box. The pairs of numbers across the links indicate the number of split reads and paired-end supporting reads from RNA-seq. (B) Circos plots (http://circos.ca) of RET, BRAF and NTRK3 fusions. Red links represent recurrent fusions, black non-recurrent. See also Figures S3F and Table S5B.

RET fusions were most frequent (33/484, 6.8%) (Table S5B and Figure S3F); however, less frequent than previously reported in sporadic or radiation-associated PTC (Ricarte-Filho et al., 2013). We identified four novel unique RET fusions that preserved the kinase domain (Figure 3A and Table S5B).

Fusions involving BRAF have been identified in PTC (Ciampi et al., 2005) and other cancers. In melanoma these events define a molecular subclass with distinct response to MEK inhibition (Hutchinson et al., 2013). In our cohort, we identified 13/484 (2.7%) BRAF fusions with diverse gene partners (Figure 3A and Table S5B); three tumors exhibited the SND1/BRAF gene fusion seen in a gastric cancer cell line (Lee et al., 2012). Some fusions supported BRAF signaling with expression and conservation of its kinase domain (MKRN1/BRAF), while others suggested an alternative activating mechanism. Six of nine BRAF fusions were validated by independent PCR experiments, while PCR evidence for the other three was inconclusive. These diverse fusions, together with the BRAF point mutations and indels described above, illustrate the various possible mechanisms of activating BRAF and highlight its oncogenic importance in PTC.

We identified PAX8/PPARG fusions in 4/484 (0.8%) tumors. Originally found in follicular carcinomas (Kroll et al., 2000), PAX8/PPARG translocations have been reported with low frequency in PTC, especially FV. ETV6/NTRK3 and RBPMS/NTRK3 fusions were uncovered in 6/484 (1.2%) tumors. These fusions are more prevalent in radiation-induced thyroid cancers but have lower prevalence in sporadic PTC (Ricarte-Filho et al., 2013). THADA fusions were identified in 6/484 (1.2%) tumors. Fusions involving ALK presented in 4/484 (0.8%) tumors, including EML4/ALK (Figure 3A), which is observed in lung adenocarcinomas and rare thyroid cancers (Kelly et al., 2014), suggesting opportunities for targeted inhibition of ALK. In addition, two cases had FGFR2 fusions and two cases had non-recurrent fusions of MET and LTK (Table S4B).

Somatic Copy Number Alterations

Somatic copy number alterations (SCNAs) were identified in 135 (27.2%) of 495 informative tumors. These 135 cases were significantly enriched in cases with no driver mutation or fusion (Fisher's exact test p=4.4×10^;Figures 1C,D,E), suggesting that SCNAs may also drive PTC. Arm-level alterations occurred significantly more frequently in FV than in CT subtypes (Figure S4A) (FDR<0.1, p<0.008), providing evidence for a close relationship between FV and follicular neoplasms (in which SCNAs are more common (Wreesmann et al., 2004a)).

Unsupervised clustering of chromosomal arm-level alterations defined 4 distinct classes (Figure S4B). The largest class (72.9%) lacked significant gains or losses (SCNA-quiet), reflecting the highly differentiated nature of PTC; this group was not enriched for any particular genotype or histologic type. A second class (9.9%) was characterized by an isolated loss of 22q (SCNA-22q-del), a region that includes NF2 and CHEK2 and reported to be lost with significant frequency in PTC (Kjellman et al., 2001). Seventy tumors had 22q loss and 5 tumors possessed CHEK2 mutations, with 4 cases containing both mutations (p=0.0035). The SCNA-22q-del cohort contained few TCV and was enriched for FV (p<0.05) (Figure S4C). This result suggests that loss of CHEK2 and/or the NF2 tumor suppressor may be important in PTC, particularly in the FV subtype. A third class (14.8%) was characterized by a few SCNA events and gain of 1q (SCNA-low-1q-amp), was enriched for TCV (p<0.0001) and BRAF mutations (p<0.05) (Figure S4C), and was associated with significantly higher MACIS scores (p <0.0001) (Figure S4D), risk profiles (Figure S4E), and tumor stage (Figure S4F), consistent with reports of 1q gains in aggressive PTC (Wreesmann et al., 2004b). The final and smallest class (2.4%) was defined by a higher frequency of focal gains and losses (SCNA-high).

We found few significantly recurring focal alterations using GISTIC2 (Mermel et al., 2011). In 5/13 tumors with BRAF fusions, we detected by SNP-array the resulting focal alterations at 7q34. Five tumors with 10q23.31 deletions lacked PTEN expression (Figure S4G). We also found single case amplifications containing oncogenic driver genes (e.g. FGFR3). Together, these results, in the context of SSNVs and fusions, suggest that SCNAs may represent both tumor-initiating and progression-related events in PTC.

Integrated Analyses of Somatic Alterations

Next we sought to identify additional genes that may harbor driver mutations (point mutations, fusions or copy-number changes) that did not meet significance, by searching within protein-protein interaction networks for subnetworks enriched with frequently mutated genes. To this end, we applied the HotNet2 algorithm (Leiserson et al., submitted) and identified 17 significantly mutated subnetworks (p<0.004, Table S5C and Figure S5A). As expected, the largest subnetwork (16 genes) included four known members of the MAPK signaling pathway (BRAF and three RAS genes) and 12 additional genes (Figure S5B). Some of these additional genes (e.g. RAP1GAP) displayed mutations that were mutually exclusive with BRAF (SSNVs, indels and fusions) and RAS, and may represent additional MAPK drivers. Other mutated genes in this subnetwork (e.g. PIK3CA) overlapped with BRAF^{and may alter the biology of these cancers by activating PI3K signaling, a hypothesis that requires validation.}

Other HotNet2 subnetworks significantly overlapped with known pathways (Table S5C and Figures S5C-E). Identification of the ECM-receptor interactions pathway is consistent with the role of ECM microenvironment in PTC (Nucera et al., 2011). The finding of a FANCA-associated protein complex subnetwork provides additional evidence for DNA repair playing a role in PTC.

We used network-based stratification (NBS) (Hofree et al., 2013) to discover three somatic mutation-based PTC subtypes (NBS1-3, Figures S6A-C). As expected, a strong association with histologic subtype was found, with a significant association between NBS1 and FV histology (Figure S6B(b), Fisher's exact test p<2×10^{). The subtypes were also significantly associated with lymph node status, extrathyroidal extension, stage and risk of recurrence (}Figure S6B(a,c-e), Fisher's exact test p=4.4×10^{, 3.1×10}^{, 3.8×10}^{, 8.2×10}^{, respectively), as well as other molecular characteristics (}Figure S6C). In order to identify somatic events that characterize each subnetwork, we applied the HotNet algorithm to each NBS subtype (Figure S6D and Table S5D). Subtype NBS1 was associated with perturbations in RAS, PTEN, PPARG and TSHR. Subtype NBS2 was associated with alterations in RET and related genes such as NTRK3. Subtype NBS3 appears to be predominantly BRAF^{-associated. These results confirm three broad classes of PTC determined by the common drivers.}

We also looked for the presence of viral pathogens in PTC using two independent methods to assess the RNA-seq data: PathSeq (Kostic et al., 2011) and BioBloom Tools (Chu et al., 2014). We identified two tumors with hepatitis B virus (HBV) and one tumor with human papillomavirus 45 (HPV45) at relative frequencies exceeding 0.1 viral reads per million human reads (RPM) for PathSeq and 0.2 RPM for BBT (see Supplement and Tables S4G-I), indicating that viral pathogens are unlikely significant contributors to PTC pathogenesis.

‘Dark Matter’ Summary

Starting with 402 cases with informative exome DNA sequence data, we examined tumors that lacked apparent driver mutations, so-called ‘dark matter’ samples, for the presence of novel potential driver alterations. SSNVs involving drivers accounted for 299 (73.6%) cases. Mutually exclusive fusions increased the number of cases with drivers to 358 (89.0%). Three mutually exclusive focal deletions (2 PTEN and 1 BRAF) brought the number of cases to 361 (89.8%). Mutually exclusive arm-level SCNAs were present in 27 additional cases, which were mostly FV. Although we cannot pinpoint the driving genes, if we assume that some of these SCNAs indeed act as drivers, the total number of cases with apparent drivers increased to 96.5%, leaving 14 (3.5%) as ‘dark matter’ cases. By reviewing events in these cases, we observed additional potential drivers such as APC, ATM, NF1, and SPOP, mutations of chromatin remodeling genes (e.g. MLL) and potential gene fusions (Table S5E). Including these events and considering arm-level SCNAs as drivers, we have identified putative cancer drivers in 397/402 PTCs (98.8%).

Signaling and Differentiation

PTC is a MAPK-driven cancer that has two mutually exclusive drivers with distinct signaling consequences: BRAF^{and mutated}RAS. Tumors driven by BRAF^{do not respond to the negative feedback from ERK to RAF (since it signals as a monomer), resulting in high MAPK-signaling (}Pratilas et al., 2009). Conversely, tumors driven by RAS and RTK fusions signal via RAF dimers that respond to ERK feedback, resulting in lower MAPK-signaling. This differential signaling results in profound phenotypic differences. For example, expression of genes responsible for iodine uptake and metabolism are greatly reduced in BRAF^{tumors, in contrast to the “RAF-dimer” tumors in which expression of these genes is largely preserved (}Durante et al., 2007). These observations, together with the relatively low number of other genomic alterations, allow for a clear view of the signaling and transcriptional outputs of these two primary drivers. The distinct profile of expression of genes involved in thyroid hormone biosynthesis that we observed between BRAF^andRAS-driven tumors is recapitulated closely in mouse PTC models induced by knock-in mutations of Braf^{or Hras}⁽Charles et al., 2011; Franco et al., 2011), suggesting that these arise as a consequence of the constitutive activation of these drivers.

To explore these relationships across our cohort, we developed a BRAF^-RAS score (BRS) to quantify the extent to which the gene expression profile of a given tumor resembles either the BRAF^- or RAS-mutant profiles. Using 391 samples with both exome and RNA sequencing data, we compared BRAF^{-mutated and}RAS-mutated tumors to derive a 71-gene signature. Correlations with this signature were used to derive a continuous measure (-1 to +1) with BRAF^{-like (BVL) PTCs being negative and}RAS-like (RL) PTCs positive (see Supplement). As expected, this signature showed strong separation of the BRAF^{- and}RAS-mutant tumors (Figures S7A,B).

We then used the BRS as a reference continuous scale from most BVL to most RL to interrogate the signaling consequences of the other, less common, mutations (Figures 4A,B). All BRAF mutations other than BRAF^{exhibited RL behavior, including one}BRAF^{, a splice-site mutation and three indels. This is consistent with previous observations that}BRAF^{occurs in FV tumors that are mostly RL-PTCs (}Park et al., 2013). All of the BRAF fusions were BVL. Four of the six EIF1AX mutations were RL, one neutral and one weakly BVL. All of the PAX8/PPARG fusions were RL, consistent with their prevalence in follicular-patterned tumors. Nearly all of the RET fusions were weakly BVL and the NTRK1/3 and ALK fusions were largely neutral.

An external file that holds a picture, illustration, etc.
Object name is nihms633017f4.jpg

Figure 4

The BRAF^-RAS Score

(A) Thyroid samples (n=391) were ranked by BRAF^-RAS score (BRS), with BRAF^{-like and}RAS-like samples having negative (-1 to 0) and positive scores (0 to 1), respectively. The BRAF^-RAS score is strongly associated with: (B) driver mutation status; (C) thyroid differentiation score (TDS); (D) single data-type clusters and (E) histology and follicular fraction. The RAS-like samples (normalized score > 0, in red on the top bar) consistently emerged as a distinct subgroup characterized by a higher TDS. See also Figures S6 and S7A, B and Tables S2 and S4B.

Next, we focused on thyroid differentiation, which plays a central role in thyroid cancer. We summarized the expression levels of 16 thyroid metabolism and function genes (Table S5F), which were highly correlated across our cohort, and produced a single metric, designated the Thyroid Differentiation Score (TDS). The TDS and BRS measures were highly correlated across all tumors (Spearman = 0.78, P=3.1×10^{), despite being derived from different gene sets. This correlation was mainly driven by RL-PTCs having relatively high TDS values. The}BRAF^{PTC cohort, considered a homogeneous group in numerous studies, showed a wide range of TDS values (}Figure 5), and maintained the TDS and BRS correlation, albeit to a lesser degree (Spearman = 0.38, p=3.0×10^,Figures 4,,55 and S7C-E). To gain insights regarding the observed TDS variation, we identified other genes whose expression levels correlated with TDS across all tumors and within the BRAF^{cohort. We discovered that the TDS was associated with global expression changes, significantly correlated and anti-correlated with thousands of genes in both cohorts (}Figures S7F,G). We obtained similar observations using previously reported DNA microarray data (Giordano et al., 2005) (Figure S7H). Next, to test whether differences in the TDS within the BRAF^{-mutant cohort were associated with subtle architectural changes, we histologically graded the tumors (}Figure S7I) and showed that TDS was indeed correlated with grade (Kruskal-Wallis test p=4×10^,Figure S7J). TDS also correlated with risk (p=2×10^{) and MACIS (Spearman correlation p=1.3×10}^{) but only weakly with tumor purity (p=1.5×10}^{) (}Figure S7J), indicating that the observed differences in TDS and global expression levels were not strongly influenced by variations in levels of tumor stroma or lymphocyte infiltration. These results support the validity of the TDS and BRS and illustrate that, while independently derived, these measures reflect similar biological properties that are profoundly reflected by gene expression in PTC.

An external file that holds a picture, illustration, etc.
Object name is nihms633017f5.jpg

Figure 5

Role of Thyroid Differentiation in Papillary Thyroid Carcinomas

Thyroid Differentiation Score (TDS) across the cohort with tumors sorted by driver mutation and TDS. Below TDS are the BRAF^-RAS score (BRS), ERK signature, histological type, MACIS score, risk of recurrence, driver mutations, gene expression data for nine thyroid genes used to derive the TDS (TG, TPO, SLC26A4 (pendrin), SLC5A5 (Na/I symporter), SLC5A8 (apical iodide transporter), DIO1, DIO2, DUOX1, DUOX2), four selected mRNAs correlated to TDS, and three selected miRs correlated to TDS. Featured mRNA (except for 16 thyroid genes) and miRNA genes were selected based on Spearman correlation to TDS in the BRAF^{cohort (*) and the full cohort (**) (see}Supplement). See also Figures S7C-J and Table S5F.

Although the TDS was correlated with many genes, among the most correlated were several genes with cancer relevance (TFF3, KIT, PVRL4 and FHL1, q=8.02×10^{, 5.62×10}^{, 4.08×10}^{and 7.52×10}^{, respectively,}Figures 5 and S7F-G). Among miRs with cancer relevance, miR-21, miR-146b and miR-204 were highly correlated with the TDS (Figure 5 and Figures S7F-G), with miR-21 being the most negatively correlated miR in both the entire and BRAF^{cohorts. miR-21 and miR-146b are oncogenic miRs in several tumor types (}Di Leva et al., 2014) and miR-204 is down-regulated in several tumors types and may be a tumor suppressor (Imam et al., 2012). We subsequently identified variable expression of these miRs in distinct miR clusters with different TDS and BRS values (see Molecular Classification).

Our data demonstrate significant gene expression variation across the BRAF^{cohort, which may account for the range of differentiation observed and may explain the uncertainty regarding the prognostic and predictive power of}BRAF^{mutation (}Xing et al., 2013). Dedifferentiation likely plays a role in dampening responses to RAI therapy and is consistent with observations that RAI-refractory metastases are enriched for BRAF^{mutants (}Sabra et al., 2013). Although other factors are likely involved, our results support the view that potent constitutive activation of the MAPK transcriptional output by oncogenic BRAF^{downregulates the expression of genes involved in iodine metabolism (}Chakravarty et al., 2011; Franco et al., 2011). Of note, the loss of differentiation within the BRAF^{cohort is likely smaller than that observed in histologically aggressive thyroid cancers.}

An integrated view of our TDS analysis (Figure 5) summarizes how RL-PTCs result in highly differentiated tumors enriched for follicular histology with distinct gene expression and DNA methylation patterns. Conversely, BVL-PTCs result in predominantly less differentiated tumors enriched for classical and tall cell histology, with distinct gene expression and DNA methylation patterns. Further assessment of BRS and TDS might be useful in the setting of a clinical trial and even pathology practice using immunohistochemistry.

To better understand the downstream signaling effects of the main driver events (BRAF^{, RAS), we examined mRNA expression, protein and phosphoprotein levels of various signaling pathways. We used a 52-gene signature derived by inhibiting MEK in a}BRAF^{melanoma cell line (}Pratilas et al., 2009) to assess the ERK (and MAPK) activation level. BRS was highly correlated to ERK activation level (i.e. ERK score, see Supplement); BVL-PTCs showed over-activation of the pathway (Figure S8A), and increased expression of DUSP genes (Figure 6). Note that, although the two scores are highly correlated, they were derived independently and have no genes in common. In wild-type cells, ERK induces a negative feedback on effectors upstream in the pathway, resulting in impairment of RAF dimerization. Because BRAF^{signals as a monomer, it is insensitive to feedback leading to ERK over-activation (}Poulikakos et al., 2010).

An external file that holds a picture, illustration, etc.
Object name is nihms633017f6.jpg

Figure 6

Downstream Signaling of BVL and RL PTCs

(A) MAPK and PI3K pathways are differentially activated in the BVL and RL PTCs.

(B) BRAF^{-mutated cases show robust activation of MAPK signaling resulting in higher output of the ERK transcriptional program, represented in particular by DUSP (DUSP4, 5 and 6) mRNAs. This may be due to insensitivity of BRAF}^{to ERK inhibitory feedback. By contrast,}RAS-like tumors activated both MAPK and PI3K/AKT signaling, as shown by higher pAKT levels in these tumors. The mechanism by which RAS-like tumors activated MAPK signaling was distinct from that of BRAF^{tumors, as they had higher CRAF phosphorylation, consistent with engagement of RAF dimers. Paradoxically, RL-PTCs had higher phosphorylation of the ERK substrate p90RSK, which was associated with mTOR activation, likely through phosphorylation and consequent inhibition of TSC2. RL-PTCs also showed activation of an anti-apoptotic program, characterized by S112-BAD phosphorylation (a target of P90RSK) and BCL2 over-expression. See also}Figures S8 and Tables S4B,F.

RET fusions consistently had BVL phenotype, high MAPK activity and were associated with low pS338-CRAF and pS299-ARAF (Figure S8B), consistent with preferential signaling via BRAF homodimers (Mitsutake et al., 2006). RL-PTCs had concurrent activation of PI3K/AKT and MAPK signaling, the latter mostly through c-RAF phosphorylation (Figure 6). Despite having lower MAPK activity than BVL-PTCs, RL-PTCs showed significantly higher phosphorylation of p90RSK, a direct ERK substrate. Activation of p90RSK was associated with robust inhibition of TSC2, a distinguishing hallmark of the two functional classes, and likely to induce mTOR. Elevated p90RSK in the RL-PTCs was also associated with phosphorylation of its substrate BAD, and with concurrent BCL2 over-expression, leading to anti-apoptotic signaling (Figure 6 and Table S4B). Finally, we used TieDIE (Paull et al., 2013) to assess differential pathway activation between BVL-PTC and RL-PTC (see Supplement). This approach identified the small GTPase RHEB, a known regulator of mTOR activity (Groenewoud and Zwartkruis, 2013), as a contributing factor to the differences observed between BVL-PTC and RL-PTC (Figure S8C, Table S4F and Data File S1). These findings confirm many of the known signaling changes induced by BRAF^andRAS mutations in PTC, provide a framework for MAPK downstream activity in tumors with other driver alterations, and importantly shed new light on the role of p90RSK as a crucial crossroad for MAPK, mTOR and BCL2 signaling in RAS-driven tumors.

Molecular Classification

The comprehensive and multiplatform molecular data and large sample size in this study provide an opportunity to derive and refine the classification of PTC into molecular subtypes and associate them with clinically relevant parameters. To this end, we leveraged the BRS and TDS measures to inform the relationships between tumor cluster, histology, genotype, signaling and differentiation.

Applying unsupervised clustering methods to four genomic datasets yielded a different number of subtypes for each dataset: five for mRNA expression, six for miR expression, four for DNA methylation and four for protein expression (Figures S9A-D). All clustering results were consistent with two meta-clusters that separated the BRAF^{-driven tumors (BVL-PTCs) from ones with}RAS mutations (RL-PTCs), recapitulating the BRS-partitioning and association with histological subtypes (Figures 4 and S9A). We used Stratome× (Streit et al., 2014) to visually highlight these relationships (see THCA publication page); in particular, the significant distinction between BVL-PTCs and RL-PTCs that is evident in each of the molecular datasets (Table S5G). We also applied SuperCluster (see Supplement) to the four genomic datasets, which supported the overarching separation of BVL-PTCs and RL-PTCs (Figure S9E).

Next, we focused on the internal structure reported by different data types for these two meta-clusters. The RL-PTC group was associated with a single cluster in all datasets except DNA methylation (described below). Overall, this group was characterized by FV histology, relatively low risk of recurrence, distinct mRNA expression profiles with lower expression of immune response genes, higher expression of miR-182-5p and miR-183-5p, low levels of fibronectin, VHL and CHK2 proteins and high expression of claudin-7, TIGAR and BRCA2 proteins (Figures 7, S9B and S9D). RL tumors were highly differentiated and associated with younger patients. The DNA methylation data partitioned these tumors into two clusters: the larger, termed Meth-follicular showed few methylation changes compared to normal thyroid, while the other, termed the Meth-CpG Island cluster, was characterized by hypermethylation of a large number of CpG sites in islands and shores (Figure S9C). The significance of this distinction is unclear, although the Meth-CpG Island cluster tended to have high tumor purity with less lymphocyte infiltration and stromal cells (Figure S9C(d)).

An external file that holds a picture, illustration, etc.
Object name is nihms633017f7.jpg

Figure 7

Unsupervised clusters for miRNA-seq data

Heatmap showing discriminatory miRs (5p or 3p mature strands) with the largest 6% of metagene matrix scores (see Supplement), as well as miR-204-5p, 221-3p and 222-3p, which were highlighted in correlations to BRS and TDS scores (see Figure S10D). The scalebar shows log2 normalized (reads-per-million, RPM), median-centered miR abundance. miR names in red are discussed in the text. Gray vertical lines in the clinical information tracks mark samples without clinical data, and in the mutation tracks gray lines identify samples without sequence data. See also Figures S9, S10 and Tables S4C,D,E, 5G, 6.

The different datasets partitioned the BVL-PTC group into different numbers of subtypes (from 2 based on DNA methylation data to 5 based on miR data), which did not overlap with each other such as to form a fully consistent lower level partitioning of BVL-PTCs (Figures 4 and S9). Regardless of which dataset was used, the clusters were significantly different based on parameters like proportions of driver mutations and gene fusions, mutational densities, histological and risk profiles, age, BRS and TDS values (Table S5G). The most striking relationship between subtypes was that mRNA-cluster 5 was nearly fully embedded within miR-cluster 6 (86/106 mRNA-cluster 5 tumors were part of the 144 tumors in miR-cluster 6; Fisher's exact test p=1.3×10^{). This mRNA cluster, which we termed}tall cell-like, contained most of the TCV tumors (74%), had the highest frequency of BRAF^V600E mutations (78%), and the lowest BRS and TDS values (i.e. the strongest BVL phenotype and least differentiated). Since the tall cell-like cluster was associated with more advanced stage (see THCA publication page) and higher risk (Table S5G), this may be clinically relevant. We again used Stratome× to highlight the relationships between mRNA and miR clusters with recurrence risk and histology (see THCA publication page). The tall cell-like cluster was also identified by SuperCluster (Figure S9E).

Integrated miR Analysis

Given the increased role of miRs in determining cancer phenotype, we further examined the association between miR expression, molecular subtypes and clinical parameters. In addition to miR-182 and miR-183 in the RL-PTC group, several other cancer-relevant miRs were relatively abundant in other miR clusters (Figures 7 and S9B). These included oncomiRs (miR-21 and miR-146b) and tumor suppressor miRs (let-7 family, miR-204, and miR-375). OncomiRs miR-221 and miR-222 were reported to play a role in PTC aggressiveness (Mardente et al., 2012) and, in our data, were associated with less differentiated tumors (Figures 7 and S10D(b)).

We focused integrative analysis on miR-21, miR-146b, and miR-204 because they were epigenetically regulated, correlated with BRS and TDS, and/or differentially expressed between PTCs and normal thyroid, as well as between clusters derived from miRNA-seq data (Figures S9B and S10A,B,F). miR-21 expression correlated with highly variable DNA methylation (Figure S10C), defined the tall cell-like mRNA and DNA methylation Meth-classical-1 clusters, and was highly correlated with low BRS and TDS values (Figures 7 and S10D). miR-21 is a regulator of several cancer-related genes (Di Leva et al., 2014). In our data, using Regulome Explorer's pairwise associations, its expression was anti-correlated with expression of cancer-promoting genes and regulators of apoptosis (e.g. PDCD4) (Figure S10E). PDCD4 is a miR-21 target gene reported to function as a tumor suppressor in diverse tumors (Zhu et al., 2008). These observations raise the possibility that increased miR-21 expression via epigenetic dysregulation may contribute to the clinically aggressive nature of this BVL-PTC sub-cluster and may partly explain the aggressive nature of TCV.

miR-146b expression exhibited similar patterns of differential expression and correlations to DNA methylation, BRS, and TDS (Figures 7 and S10C-D), and likely influenced expression of, for example, IRAK1, KIT and TRAF6 (Figure S10E). These results are consistent with observations that miR-146b is associated with risk of recurrence and promotes cell migration and invasion (Chou et al., 2013).

miR-204 expression, while less influenced by DNA methylation, was preferentially lost in miR clusters 5 and 6, the two BVL-PTC sub-clusters with the lowest BRS and TDS values (Figures 7 and S9B). This is consistent with data from other tumors, i.e. miR-204 functions as a tumor suppressor and high levels suppress cell migration, invasion and EMT (Qiu et al., 2013). These results suggest that loss of miR-204 may also contribute to aggressive PTCs with BRAF^{mutations. Collectively, our results are consistent with prior studies and suggest that miRs may regulate fundamental aspects of the PTC phenotype,}i.e. signaling, differentiation, invasion and metastasis, by fine-tuning gene expression.

Samples, Clinical Data and Analytical Approach

Somatic Single Nucleotide Variants and Insertions and Deletions

Figure 1

Landscape of Genomic Alterations in 402 Papillary Thyroid Carcinomas

Figure 2

TERT Promoter Mutations and Clonality Assessment of Driver Mutations

(A-C) Association of TERT promoter mutations with (A) risk of recurrence, (B) MACIS score, and (C) thyroid differentiation score (TDS). See also Table S2.

Gene Fusions

Figure 3

Candidate “Driver” Gene Fusions in Papillary Thyroid Carcinoma

Somatic Copy Number Alterations

Integrated Analyses of Somatic Alterations

‘Dark Matter’ Summary

Signaling and Differentiation

Figure 4

The BRAF^-RAS Score

Figure 5

Role of Thyroid Differentiation in Papillary Thyroid Carcinomas

Figure 6

Downstream Signaling of BVL and RL PTCs

(A) MAPK and PI3K pathways are differentially activated in the BVL and RL PTCs.

Molecular Classification

Figure 7

Unsupervised clusters for miRNA-seq data

Integrated miR Analysis

Discussion

This study illustrates the dominant role and mutually exclusive nature of driving somatic genetic alterations, be they SSNVs, indels, or fusions, in the MAPK and PI3K pathways in PTC. The relative low overall density of somatic mutations may be the biological basis for the indolent clinical behavior of PTC. We discovered new driver mutations in PTC, either entirely novel in this cancer (EIF1AX) or novel alterations of known drivers (RET, BRAF and ALK fusions). As a result of these discoveries, the ‘dark matter’ of the PTC genome has been reduced substantially from ∼25% to less than 4%, which should have profound consequences for preoperative cancer diagnosis in thyroid nodules. Molecular testing of mutation hotspots, rearrangements and gene expression using fine-needle aspiration specimens has become an effective diagnostic tool to more precisely select patients for thyroid surgery (Alexander et al., 2012; Nikiforov et al., 2011), thereby reducing the number of thyroidectomies done for benign nodules and tumors (Nikiforov et al., 2013), and determining the extent of initial thyroid surgery (i.e. lobectomy vs. total thyroidectomy) (Yip et al., 2014). Through these advances, molecular diagnostics has improved the care of patients with thyroid nodules and cancer. Our expansion of the PTC somatic genetic landscape has the potential to even further enhance the care of these patients. This study also offers conclusive evidence that mutated BRAF and other driving mutations are clonal events present in the majority of cells within tumors and identified novel fusion partners of oncogenes (e.g. RET, BRAF, ALK), expanding the biological basis for targeted therapy.

Beyond the driver mutations, we discovered individual genes (CHEK2, ATM, TERT) and sets of functionally related genes (chromatin-remodeling) with alterations or expression patterns (miR-21 and miR-146b) that define clinically-relevant subclasses and may contribute to loss of differentiation and tumor progression. Specifically, increased expression of miR-21 was associated with a known aggressive form of PTC (tall cell variant) and may be a critical event in its pathogenesis. Similarly, TERT promoter mutations identified a subset of aggressive, less differentiated PTCs, consistent with recent reports (Melo et al., 2014; Xing et al., 2014). Our study also indicates that BRAF^{PTC represents a diverse group of tumors, consisting of at least four molecular subtypes, with variable degrees of thyroid differentiation. Collectively, our results suggest that}BRAF^{PTC should not be considered a homogeneous group in clinical studies and that future studies should include molecular components designed to capture the breadth of genetic diversity among PTCs.}

We demonstrate striking signaling differences in RAS- and BRAF^{-driven PTCs. In particular, BVL-PTCs signal preferentially through MAPK while RL-PTCs signal through both MAPK and PI3K. The relative simplicity of the PTC genome, with dominant mutually exclusive driving events, together with the large cohort and comprehensive data analyzed in this study enabled us to clearly dissect these signaling differences.}

Our overarching conclusion is that RL-PTCs and BVL-PTCs are fundamentally different in their genomic, epigenomic and proteomic profiles. This is consistent with their known histological differences and the published literature. However, the breadth and depth of our integrative findings have wide implications for basic pathobiology, tumor classification schemes, traditional and targeted therapies. This view is supported by recent data suggesting differential response to a MEK inhibitor related to thyroid cancer genotype (Ho et al., 2013). We feel, based on the strength of our multidimensional genomic findings, that a pathologic reclassification of follicular-patterned thyroid lesions is justified. There was a time when follicular-patterned PTCs (i.e. RL-PTCs) were classified as follicular carcinomas. Perhaps the time has come to revise the classification of thyroid cancer to reunite the FV of PTC with follicular carcinomas. Moreover, a refined classification scheme that more accurately reflects the genotypic and phenotypic differences between and within RL and BVL PTCs would lead to more precise surgical and medical therapy, especially as thyroid cancer therapy enters the realm of precision medicine.

Methods summary

Tumor and normal thyroid samples were obtained from patients with approval from local Institutional Review Boards. DNA, RNA and protein were purified and distributed throughout the TCGA network. In total, 496 primary tumors and 8 metastatic tumors with associated clinicopathologic data were assayed on at least one molecular profiling platform. Platforms included exome and whole genome DNA sequencing, RNA sequencing, miRNA sequencing, SNP arrays, DNA methylation arrays, and reverse phase protein arrays. Integrated multi-platform analyses were performed. The data and analysis results can be explored through the Broad Institute GDAC portal (http://dx.doi.org/10.7908/C17P8WZG) and FireBrowse portal (http://firebrowse.org/?cohort=THCA), Memorial Sloan Kettering Cancer Center cBioPortal (http://www.cbioportal.org/public-portal/study.do?cancer_study_id=thca_tcga), TieDIE (http://sysbiowiki.soe.ucsc.edu/tiedie), MBatch batch effects assessor (http://bioinformatics.mdanderson.org/tcgambatch/), Regulome Explorer (http://explorer.cancerregulome.org/) and Next-Generation Clustered Heat Maps (http://bioinformatics.mdanderson.org/TCGA/NGCHMPortal/). See also the Supplement and the THCA publication page (https://tcga-data.nci.nih.gov/docs/publications/thca_2014/).

Methods summary

Supplementary Material

Acknowledgments

We are grateful to all the patients and families who contributed to this study, to Chris Gunter for editing and Margi Sheth and Jiashan (Julia) Zhang for project management. Thomas Giordano thanks his colleagues who covered his clinical duties. Supported by the following grants from the United States National Institutes of Health: 5U24CA143799, 5U24CA143835, 5U24CA143840, 5U24CA143843, 5U24CA143845, 5U24CA143848, 5U24CA143858, 5U24CA143866, 5U24CA143867, 5U24CA143882, 5U24CA143883, 5U24CA144025, U54HG003067, U54HG003079, and U54HG003273, P30CA16672.

Correspondence to: Thomas J. Giordano (T.J.G), ude.hcimu@onadroig or, Gad Getz (G.G) gro.etutitsnidaorb@ztegdag

Publisher's Disclaimer

Summary

Papillary thyroid carcinoma (PTC) is the most common type of thyroid cancer. Here, we describe the genomic landscape of 496 PTCs. We observed a low frequency of somatic alterations (relative to other carcinomas) and extended the set of known PTC driver alterations to include EIF1AX, PPM1D and CHEK2 and diverse gene fusions. These discoveries reduced the fraction of PTC cases with unknown oncogenic driver from 25% to 3.5%. Combined analyses of genomic variants, gene expression, and methylation demonstrated that different driver groups lead to different pathologies with distinct signaling and differentiation characteristics. Similarly, we identified distinct molecular subgroups of BRAF-mutant tumors and multidimensional analyses highlighted a potential involvement of oncomiRs in less-differentiated subgroups. Our results propose a reclassification of thyroid cancers into molecular subtypes that better reflect their underlying signaling and differentiation properties, which has the potential to improve their pathological classification and better inform the management of the disease.

Summary

Footnotes

Competing financial interests: The authors declare no competing financial interests.

Data access: The primary and processed data used to generate the analyses presented here can be downloaded by registered users from The Cancer Genome Atlas at https://tcga-data.nci.nih.gov/tcga/tcgaDownload.jsp. The primary sequence files are deposited in CGHub (https://cghub.ucsc.edu/) and all other molecular, clinical and pathological data are deposited at the Data Coordinating Center (DCC) for public access (http://cancergenome.nih.gov/, https://cghub.ucsc.edu/ and https://tcga-data.nci.nih.gov/docs/publications/thca_2014/) and the TCGA Data Portal (https://tcga-data.nci.nih.gov/tcga/).

Consortia: The members of The Cancer Genome Atlas Research Network for this project are Nishant Agrawal, Rehan Akbani, B. Arman Aksoy, Adrian Ally, Harindra Arachchi, Sylvia L. Asa, J. Todd Auman, Miruna Balasundaram, Saianand Balu, Stephen B. Baylin, Madhusmita Behera, Brady Bernard, Rameen Beroukhim, Justin A. Bishop, Aaron D. Black, Tom Bodenheimer, Lori Boice, Moiz S. Bootwalla, Jay Bowen, Reanne Bowlby, Christopher A. Bristow, Robin Brookens, Denise Brooks, Robert Bryant, Elizabeth Buda, Yaron S.N. Butterfield, Tobias Carling, Rebecca Carlsen, Scott L. Carter, Sally E. Carty, Timothy A. Chan, Amy Y. Chen, Andrew D. Cherniack, Dorothy Cheung, Lynda Chin, Juok Cho, Andy Chu, Eric Chuah, Kristian Cibulskis, Giovanni Ciriello, Amanda Clarke, Gary L. Clayman, Leslie Cope, John Copland, Kyle Covington, Ludmila Danilova, Tanja Davidsen, John A. Demchok, Daniel DiCara, Noreen Dhalla, Rajiv Dhir, Sheliann S. Dookran, Gideon Dresdner, Jonathan Eldridge, Greg Eley, Adel K. El-Naggar, Stephanie Eng, James A. Fagin, Timothy Fennell, Robert L. Ferris, Sheila Fisher, Scott Frazer, Jessica Frick, Stacey B. Gabriel, Ian Ganly, Jianjiong Gao, Levi A. Garraway, Julie M. Gastier-Foster, Gad Getz, Nils Gehlenborg, Ronald Ghossein, Richard A. Gibbs, Thomas J. Giordano, Karen Gomez-Hernandez, Jonna Grimsby, Benjamin Gross, Ranabir Guin, Angela Hadjipanayis, Hollie A. Harper, D. Neil Hayes, David I. Heiman, James G. Herman, Katherine A. Hoadley, Matan Hofree, Robert A. Holt, Alan P. Hoyle, Franklin W. Huang, Mei Huang, Carolyn M. Hutter, Trey Ideker, Lisa Iype, Anders Jacobsen, Stuart R. Jefferys, Corbin D. Jones, Steven J.M. Jones, Katayoon Kasaian, Electron Kebebew, Fadlo R. Khuri, Jaegil Kim, Roger Kramer, Richard Kreisberg, Raju Kucherlapati, David J. Kwiatkowski, Marc Ladanyi, Phillip H. Lai, Peter W. Laird, Eric Lander, Michael S. Lawrence, Darlene Lee, Eunjung Lee, Semin Lee, William Lee, Kristen M. Leraas, Tara M. Lichtenberg, Lee Lichtenstein, Pei Lin, Shiyun Ling, Jinze Liu, Wenbin Liu, Yingchun Liu, Virginia A. LiVolsi, Yiling Lu, Yussanne Ma, Harshad S. Mahadeshwar, Marco A. Marra, Michael Mayo, David G. McFadden, Shaowu Meng, Matthew Meyerson, Piotr A. Mieczkowski, Michael Miller, Gordon Mills, Richard A. Moore, Lisle E. Mose, Andrew J. Mungall, Bradley A. Murray, Yuri E. Nikiforov, Michael S. Noble, Akinyemi I. Ojesina, Taofeek K. Owonikoko, Bradley A. Ozenberger, Angeliki Pantazi, Michael Parfenov, Peter J. Park, Joel S. Parker, Evan O. Paull, Chandra Sekhar Pedamallu, Charles M. Perou, Jan F. Prins, Alexei Protopopov, Suresh S. Ramalingam, Nilsa C. Ramirez, Ricardo Ramirez, Benjamin J. Raphael, W. Kimryn Rathmell, Xiaojia Ren, Sheila M. Reynolds, Esther Rheinbay, Matthew D. Ringel, Michael Rivera, Jeffrey Roach, A. Gordon Robertson, Mara W. Rosenberg, Matthew Rosenthall, Sara Sadeghi, Gordon Saksena, Chris Sander, Netty Santoso, Jacqueline E. Schein, Nikolaus Schultz, Steven E. Schumacher, Raja R. Seethala, Jonathan Seidman, Yasin Senbabaoglu, Sahil Seth, Samantha Sharpe, Kenna R. Mills Shaw, John P. Shen, Ronglai Shen, Steven Sherman, Margi Sheth, Yan Shi, Ilya Shmulevich, Gabriel L. Sica, Janae V. Simons, Payal Sipahimalani, Robert C. Smallridge, Heidi J. Sofia, Matthew G. Soloway, Xingzhi Song, Carrie Sougnez, Chip Stewart, Petar Stojanov, Joshua M. Stuart, Barbara Tabak, Angela Tam, Donghui Tan, Jiabin Tang, Roy Tarnuzzer, Barry S. Taylor, Nina Thiessen, Leigh Thorne, Vésteinn Thorsson, R. Michael Tuttle, Christopher B. Umbricht, David J. Van Den Berg, Fabio Vandin, Umadevi Veluvolu, Roel G.W. Verhaak, Michelle Vinco, Doug Voet, Vonn Walter, Zhining Wang, Scot Waring, Paul M. Weinberger, John N. Weinstein, Daniel J. Weisenberger, David Wheeler, Matthew D. Wilkerson, Jocelyn Wilson, Michelle Williams, Daniel A. Winer, Lisa Wise, Junyuan Wu, Liu Xi, Andrew W. Xu, Liming Yang, Lixing Yang, Travis I. Zack, Martha A. Zeiger, Dong Zeng, Jean Claude Zenklusen, Ni Zhao, Hailei Zhang, Jianhua Zhang, Jiashan (Julia) Zhang, Wei Zhang, Erik Zmuda, and Lihua Zou.

Author Contributions: Project leaders: G.G and T.J.G. Analysis coordinator: C.S. Data coordinator J.C. Manuscript coordinator T.J.G. Project coordinators: M.S. and J.Z. Clinical expertise and data analysis: S.L.A., J.B., J.A.F., I.G., T.J.G., E.K., L.I., K.L., T.M.L., D.G.M., A.P., M.D.R., R.C.S., C.B.U., Y.E.N. and M.A.Z. Supplemental pathology review: S.L.A., T.J.G., R.G., J.A.B. and Y.E.N. DNA sequence and copy number analysis: A.D.C., J.C., G.G., K.K., J.K., D.J.K., L.L., B.A.M., E.R., M.R., G.S., C.S. and C.S. TERT promoter sequencing and analysis: C.S., C.S., J.G., F.W.H., L.A.G., S.D.D., Y.E.N., A.H., R.K., and G.G. Gene fusions: A.H., K.A.H., R.K., C.S., and T.J.G. Viral sequence detection: A.I.O., M.M., C.S.P., S.S. and Y.M. DNA methylation analysis: L.C. and L.D. mRNA analysis: K.A.H., C.S., V.W., N.Z. and A.H. miRNA analysis: L.I. and A.G.R. Pathway/Network/Integrated analysis: G.C., J.A.F., N.G., M.H., L.I., E.O.P., B.J.R., M.D.R., A.G.R., J.M.S., N.S., Y.S. and F.V. Dark matter analysis: J.C., C.S. and T.J.G. Thyroid differentiation analysis: J.K., L.D., L.I., A.G.R., C.S., G.G. and T.J.G. RPPA analysis: R.A., W.L., Y.L. and G.B.M. Supercluster: R.A. Batch effect analysis: R.A., S.L. and J.N.W. Manuscript review: M.M. and R.K.

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Footnotes