Comprehensive molecular portraits of human breast tumours.
Journal: 2012/November - Nature
ISSN: 1476-4687
We analysed primary breast cancers by genomic DNA copy number arrays, DNA methylation, exome sequencing, messenger RNA arrays, microRNA sequencing and reverse-phase protein arrays. Our ability to integrate information across platforms provided key insights into previously defined gene expression subtypes and demonstrated the existence of four main breast cancer classes when combining data from five platforms, each of which shows significant molecular heterogeneity. Somatic mutations in only three genes (TP53, PIK3CA and GATA3) occurred at >10% incidence across all breast cancers; however, there were numerous subtype-associated and novel gene mutations including the enrichment of specific mutations in GATA3, PIK3CA and MAP3K1 with the luminal A subtype. We identified two novel protein-expression-defined subgroups, possibly produced by stromal/microenvironmental elements, and integrated analyses identified specific signalling pathways dominant in each molecular subtype including a HER2/phosphorylated HER2/EGFR/phosphorylated EGFR signature within the HER2-enriched expression subtype. Comparison of basal-like breast tumours with high-grade serous ovarian tumours showed many molecular commonalities, indicating a related aetiology and similar therapeutic opportunities. The biological finding of the four main breast cancer subtypes caused by different subsets of genetic and epigenetic abnormalities raises the hypothesis that much of the clinically observable plasticity and heterogeneity occurs within, and not across, these major biological subtypes of breast cancer.
Nature. Oct/3/2012; 490(7418): 61-70
Published online Sep/22/2012



Breast cancer is one of the most common cancers with greater than 1,300,000 cases and 450,000 deaths each year world-wide. Clinically, this heterogeneous disease is categorized into three basic therapeutic groups. The Estrogen Receptor (ER) positive group are the most numerous and diverse, with several genomic tests to assist in predicting outcomes for ER+ patients receiving endocrine therapy1,2. The HER2/ERBB2 amplified group3 is a great clinical success because of effective therapeutic targeting of HER2/ERBB2, which has led to intense efforts to characterize other DNA copy number aberrations4,5. Triple Negative Breast Cancers (TNBC), also known as Basal-like breast cancers6, are a group with only chemotherapy options, and an increased incidence in patients with germline BRCA1 mutations7,8 or of African ancestry9.

Most molecular studies of breast cancer have focused on just one or two high information content platforms, most frequently mRNA expression profiling or DNA copy number analysis, and more recently massively parallel sequencing10–12. Supervised clustering of mRNA expression data has reproducibly established that breast cancers encompass several distinct disease entities, often referred to as the intrinsic subtypes of breast cancer13,14. The recent development of additional high information content assays focused on abnormalities in DNA methylation, microRNA expression and protein expression provides further opportunities to more completely characterize the molecular architecture of breast cancer. In this study, a diverse set of breast tumors was assayed using six different technology platforms. Individual platform and integrated pathway analyses identified many subtype-specific mutations and copy number changes that identify therapeutically tractable genomic aberrations and other events driving tumor biology.

Samples and clinical data

Tumor and germline DNA samples were obtained from 825 patients. Different subsets of patients were assayed on each platform: 466 tumors from 463 patients had data available on five platforms including Agilent mRNA expression microarrays (n=547), Illumina Infinium DNA methylation chips (n=802), Affymetrix 6.0 SNP arrays (n=773), microRNA sequencing (n=697), and whole exome sequencing (n=507); in addition, 348 of the 466 samples also had Reverse Phase Protein Array (RPPA) data (n=403). Due to the short median overall follow up (17 months) and the small number of overall survival events (93/818), survival analyses will be presented in a later publication. Demographic and clinical characteristics are presented in Supplemental Table 1.

Significantly mutated genes in breast cancer

Overall, 510 tumors from 507 patients were subjected to whole exome sequencing, identifying 30,626 somatic mutations comprising 28,319 point mutations, 4 dinucleotide mutations, and 2,302 indels (ranging from 1 to 53 nucleotides). The point mutations included 6,486 silent, 19,045 missense, 1,437 nonsense, 26 read-through, 506 splice-site mutations, and 819 mutations in RNA genes. Comparison to COSMIC and OMIM databases identified 619 mutations across 177 previously reported cancer genes. Of 19,045 missense mutations, 9,484 were predicted to have a high probability of being deleterious by Condel15. The MuSiC package (Dees et al., submitted), which determines the significance of the observed mutation rate of each gene based on the background mutation rate, identified 35 Significantly Mutated Genes (SMGs) (excluding LOC or ENSG genes) by at least two tests (Convolution and Likelihood Ratio tests) with FDR < 5% (Supplemental Table 2).
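The point-mutation tallies above can be checked with simple arithmetic. The following minimal sketch uses only the counts reported in this section (the variable names are illustrative and not part of the original analysis); it confirms that the six point-mutation classes sum to the stated 28,319 and computes the share of missense mutations that Condel predicted to be deleterious.

```python
# Point-mutation classes as reported in the text above.
point_mutation_classes = {
    "silent": 6486,
    "missense": 19045,
    "nonsense": 1437,
    "read_through": 26,
    "splice_site": 506,
    "rna_gene": 819,
}

# The six classes should add up to the reported number of point mutations.
total_point_mutations = sum(point_mutation_classes.values())

# Share of missense mutations predicted deleterious by Condel (9,484 of 19,045).
deleterious_fraction = 9484 / point_mutation_classes["missense"]

print(f"point mutations: {total_point_mutations}")
print(f"missense predicted deleterious: {deleterious_fraction:.1%}")
```

In other words, roughly half of the missense calls were flagged as likely deleterious.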

In addition to identifying nearly all genes previously implicated in breast cancer (PIK3CA, PTEN, AKT1, TP53, GATA3, CDH1, RB1, MLL3, MAP3K1 and CDKN1B), a number of novel SMGs were identified including TBX3, RUNX1, CBFB, AFF2, PIK3R1, PTPN22, PTPRD, NF1, SF3B1, and CCND3. TBX3, which is mutated in Ulnar-Mammary Syndrome and involved in mammary gland development16, harbored 13 mutations (8 frame-shift indels, 1 in-frame deletion, 1 nonsense, and 3 missense), suggesting a loss of function. Additionally, 2 mutations were found in TBX4 and 1 mutation in TBX5, which are genes involved in Holt-Oram Syndrome17. Two other transcription factors, CTCF and FOXA1, were at or near significance harboring 13 and 8 mutations, respectively. RUNX1 and CBFB, both rearranged in Acute Myeloid Leukemia and interfering with hematopoietic differentiation, harbored 19 and 9 mutations, respectively. PIK3R1 contained 14 mutations, most of which clustered in the PIK3CA interaction domain similar to previously identified mutations in glioma18 and endometrial cancer19. We also observed a statistically significant exclusion pattern among PIK3R1, PIK3CA, PTEN, and AKT1 mutations (P = 0.025). Mutation of splicing factor SF3B1, previously described in Myelodysplastic Syndromes20 and Chronic Lymphocytic Leukemia21, was significant with 15 non-silent mutations, of which 4 were a recurrent K700E substitution. Two protein tyrosine phosphatases (PTPN22 and PTPRD) were also significantly mutated; frequent deletion/mutation of PTPRD is observed in lung adenocarcinoma22.
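The text does not state which test produced the exclusion P value for PIK3R1, PIK3CA, PTEN and AKT1. As an illustration only, one common way to assess such a pattern is a permutation test: shuffle each gene's mutation calls across samples and ask how often the shuffled data show as few co-mutated samples as the observed data. The sketch below assumes this approach; the toy cohort, function names and permutation count are hypothetical and are not taken from the study.

```python
import random

def count_co_mutated(mutations):
    """Number of samples carrying a mutation in more than one gene."""
    n_samples = len(next(iter(mutations.values())))
    return sum(
        1 for i in range(n_samples)
        if sum(calls[i] for calls in mutations.values()) > 1
    )

def exclusivity_p_value(mutations, n_permutations=2000, seed=0):
    """One-sided permutation p-value: how often shuffled data show as few
    (or fewer) co-mutated samples as the observed data."""
    rng = random.Random(seed)
    observed = count_co_mutated(mutations)
    as_extreme = 0
    for _ in range(n_permutations):
        shuffled = {}
        for gene, calls in mutations.items():
            permuted = list(calls)
            rng.shuffle(permuted)  # keeps each gene's mutation count fixed
            shuffled[gene] = permuted
        if count_co_mutated(shuffled) <= observed:
            as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (as_extreme + 1) / (n_permutations + 1)

# Hypothetical toy cohort of 20 samples (1 = mutated, 0 = wild type).
toy_mutations = {
    "PIK3CA": [1] * 8 + [0] * 12,
    "PTEN":   [0] * 8 + [1] * 3 + [0] * 9,
    "PIK3R1": [0] * 11 + [1] * 3 + [0] * 6,
    "AKT1":   [0] * 14 + [1] * 2 + [0] * 4,
}

observed_overlap, p_value = exclusivity_p_value(toy_mutations)
print(observed_overlap, p_value)
```

Shuffling within each gene preserves per-gene mutation frequencies, so a small p-value reflects the arrangement of mutations across samples and not how often each gene is mutated.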

Association between mutations and mRNA-expression subtypes

We analyzed the somatic mutation spectrum within the context of the four mRNA-expression subtypes, excluding the Normal-like group due to small numbers (n=8) (Figure 1). Several SMGs showed mRNA-subtype (Supplemental Figures 1–3) and clinical-subtype specific patterns of mutation (Supplemental Table 2). SMGs were considerably more diverse and recurrent within Luminal A and Luminal B tumors than within Basal-like and HER2E subtypes; however, the overall mutation rate was lowest in the Luminal A subtype and highest in the Basal-like and HER2E subtypes. The Luminal A subtype harbored the most SMGs, with the most frequent being PIK3CA (45%), followed by MAP3K1, GATA3, TP53, CDH1, and MAP2K4. 12% of Luminal A tumors contained likely inactivating mutations in MAP3K1 and MAP2K4, which represent two contiguous steps in the p38/JNK1 stress kinase pathway23. Luminal B cancers exhibited a diversity of SMGs with TP53 and PIK3CA (29% each) being the most frequent. The luminal tumor subtypes markedly contrasted with Basal-like cancers, where TP53 mutations occurred in 80% of cases and the majority of the Luminal SMG repertoire, except PIK3CA (9%), was absent or near absent. The HER2-Enriched subtype (HER2E), which has frequent HER2/ERBB2 amplification (80%), had a hybrid pattern with a high frequency of TP53 (72%) and PIK3CA (39%) mutations and a much lower frequency of other SMGs including PIK3R1 (4%).

Intrinsic mRNA-subtypes differed not only by mutation frequencies, but also by mutation type. Most notably, TP53 mutations in Basal-like tumors were mostly nonsense and frame-shift, while missense mutations predominated in Luminal A and B tumors (Supplemental Figure 1). 58 somatic GATA3 mutations, some of which were previously described24, were detected including a hot spot 2bp deletion within intron 4 only in the Luminal A subtype (13/13 mutants) (Supplemental Figure 2). In contrast, 7/9 frame-shift mutations in exon 5 (DNA binding domain) occurred in Luminal B cancers. PIK3CA mutation frequency and spectrum also varied by mRNA-subtype (Supplemental Figure 3); the recurrent PIK3CA E545K mutation was present almost exclusively within Luminal A (25/27) tumors. CDH1 mutations were common (30/36) within the lobular histologic subtype and corresponded with lower CDH1 mRNA (Supplemental Figure 4) and protein expression. Finally, we identified 4/8 somatic variants in HER2/ERBB2 within lobular cancers, 3 of which were within the tyrosine kinase domain.

We performed analyses on a selected set of genes25 using the normal tissue DNA data and detected a number of germline predisposing variants. These analyses identified 47/507 patients with deleterious germline variants, representing nine different genes (ATM, BRCA1, BRCA2, BRIP1, CHEK2, NBN, PTEN, RAD51C, and TP53; Supplemental Table 3), supporting the hypothesis that ~10% of sporadic breast cancers may have a strong germline contribution. These data confirmed the association between the presence of germline BRCA1 mutations and Basal-like breast cancers7,8.
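To see how the reported 47 of 507 patients relates to the "~10%" figure, the short sketch below computes the observed fraction together with a Wilson score interval. The interval is an illustrative addition and was not reported in the study; the function name and the 95% level (z = 1.96) are assumptions.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 is ~95%)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    )
    return center - half_width, center + half_width

carriers, patients = 47, 507  # counts reported in the text
fraction = carriers / patients
low, high = wilson_interval(carriers, patients)
print(f"{fraction:.1%} (approximate 95% interval {low:.1%} to {high:.1%})")
```

The observed fraction is a little over 9%, and under these assumptions the interval includes 10%.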

Gene expression analyses (mRNA and microRNA)

Several approaches were used to look for structure in the mRNA expression data. We performed an unsupervised hierarchical clustering analysis of 525 tumors and 22 tumor-adjacent normal tissues using the top 3,662 variably expressed genes (Supplemental Figure 5); SigClust analysis identified 12 classes (5 classes with >9 samples/class). We performed a semi-supervised hierarchical cluster analysis using a previously published “intrinsic gene list”14, which identified 13 classes (9 classes with >9 samples/class) (Supplemental Figure 6). We also classified each sample using the 50-gene PAM50 model14 (Supplemental Figure 5). High concordance was observed between all three; therefore, we used the PAM50-defined subtype predictor as a common classification metric. There were only 8 Normal-like and 8 Claudin-low tumors26; thus, no analyses focused on these two subtypes were performed.

MicroRNA expression levels were assayed via Illumina sequencing, using 1222 miRBase27 v16 mature and star strands as the reference database of microRNA transcripts/genes. Seven subtypes were identified by consensus NMF clustering using an abundance matrix containing the 25% most variable microRNAs (306 transcripts/genes or MIMATs). These subtypes correlated with mRNA-subtypes, ER, PR and HER2 clinical status (Supplemental Figure 7). Of note, microRNA groups 4 and 5 showed high overlap with the Basal-like mRNA subtype and contained many TP53 mutations. The remaining microRNA groups (1–3, 6 and 7) were composed of a mixture of Luminal A, Luminal B and HER2-positive tumors with little correlation with the PAM50-defined subtypes. With the exception of TP53, which showed a strong positive correlation, and PIK3CA and GATA3, which showed negative associations with groups 4 and 5 respectively, there was little correlation between mutation status and microRNA subtype.
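The feature-selection step described above (keeping the 25% most variable microRNAs before clustering) can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the ranking by population variance, the rounding rule and the toy abundance values are all assumptions. The reported numbers are at least consistent with rounding up, since 25% of 1,222 is 305.5.

```python
import math
import statistics

def most_variable_features(abundance, fraction=0.25):
    """Return the names of the top `fraction` of features ranked by variance.

    `abundance` maps a feature name to its per-sample values.
    """
    n_keep = math.ceil(fraction * len(abundance))
    ranked = sorted(
        abundance,
        key=lambda name: statistics.pvariance(abundance[name]),
        reverse=True,
    )
    return ranked[:n_keep]

# 25% of the 1,222 reference transcripts, rounded up (assumed rounding rule).
n_selected = math.ceil(0.25 * 1222)

# Hypothetical toy abundance matrix (4 microRNAs x 4 samples).
toy_abundance = {
    "miR-a": [1.0, 1.1, 0.9, 1.0],
    "miR-b": [0.0, 5.0, 10.0, 2.0],
    "miR-c": [3.0, 3.2, 2.9, 3.1],
    "miR-d": [2.0, 2.0, 2.0, 2.0],
}

print(n_selected, most_variable_features(toy_abundance))
```

In practice, abundances are usually normalized and log-transformed before variance ranking; that step is omitted here for brevity.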

DNA methylation

Illumina Infinium DNA methylation arrays were used to assay 802 breast tumors. Data from HumanMethylation27 (HM27) and HumanMethylation450 (HM450) arrays were combined and filtered to yield a common set of 574 probes used in an unsupervised clustering analysis, which identified five distinct DNA methylation groups (Supplemental Figure 8). Group 3 showed a hyper-methylated phenotype and was significantly enriched for Luminal B mRNA-subtype and under-represented for PIK3CA and MAP3K1/MAP2K4 mutations. Group 5 showed the lowest levels of DNA methylation, overlapped with the Basal-like mRNA-subtype, and showed a high frequency of TP53 mutations. HER2-positive clinical status, or the HER2E mRNA subtype, had only a modest association with the methylation subtypes.

A supervised analysis of the DNA methylation and mRNA expression data was performed to compare the DNA methylation Group 3 (N=49) versus all tumors in groups 1, 2, and 4 (excluding Group 5 that consisted predominantly of Basal-like tumors). This analysis identified 4,283 genes differentially methylated (3,735 higher in Group 3 tumors) and 1,899 genes differentially expressed (1,232 downregulated); 490 genes were both methylated and showed lower expression in Group 3 tumors (Supplemental Table 4). A DAVID functional annotation analysis identified ‘Extracellular region part’ and ‘Wnt signaling pathway’ to be associated with this 490 gene-set; the Group 3 hyper-methylated samples showed fewer PIK3CA and MAP3K1 mutations, and lower expression of Wnt-pathway genes.

DNA copy number

773 breast tumors were assayed using Affymetrix 6.0 SNP arrays. Segmentation analysis and GISTIC were used to identify focal amplifications/deletions and arm-level gains and losses (Supplemental Table 5). These analyses confirmed all previously reported copy number variations and highlighted a number of SMGs including focal amplification of regions containing PIK3CA, HER1/EGFR, FOXA1, and HER2/ERBB2, as well as focal deletions of regions containing MLL3, PTEN, RB1 and MAP2K4 (Supplemental Figure 9); in all cases, multiple genes were included within each altered region. Importantly, many of these copy number changes correlated with mRNA-subtype including characteristic loss of 5q and gain of 10p in Basal-like cancers5,28 and gain of 1q and/or 16q loss in Luminal tumors4. NMF clustering of GISTIC segments identified five copy number clusters/groups that correlated with mRNA-subtypes, ER, PR and HER2 clinical status, and TP53 mutation status (Supplemental Figure 10). In addition, this aCGH subtype classification was highly correlated with the aCGH subtypes recently defined by Curtis et al.29 (Supplemental Figure 11).

Reverse Phase Protein Arrays (RPPA)

Quantified expression of 171 cancer-related proteins and phospho-proteins by RPPA was performed on 403 breast tumors30. Unsupervised hierarchical clustering analyses identified seven subtypes; one class contained too few cases for further analysis (Supplemental Figure 12). These protein subtypes were highly concordant with the mRNA-subtypes, particularly with Basal-like and HER2E mRNA subtypes. Closer examination of the HER2-containing RPPA-defined subgroup showed coordinated overexpression of HER2 and HER1 with a strong concordance with phosphorylated HER2 (pY1248) and HER1 (pY992), likely from heterodimerization and cross-phosphorylation. While there is a potential for modest cross reactivity of antibodies against these related total and phosphoproteins, the concordance of phosphorylation of HER2 and HER1 was confirmed using multiple independent antibodies.

In RPPA-defined Luminal tumors, there was high protein expression of ER, PR, AR, BCL2, GATA3 and INPP4B, defining mostly Luminal A cancers and a second more heterogeneous protein subgroup composed of both Luminal A and Luminal B cancers. Two potentially novel protein-defined subgroups were identified: Reactive I consisted primarily of a subset of Luminal A tumors, whereas Reactive II consisted of a mixture of mRNA-subtypes. These groups are termed ‘Reactive’ because many of the characteristic proteins are likely produced by the microenvironment and/or cancer-activated fibroblasts including Fibronectin, Caveolin 1 and Collagen VI. These two RPPA groups did not have a marked difference in the % tumor cell content versus each other, or the other protein subtypes, as assessed by SNP array analysis or pathological examination. In addition, supervised analyses of Reactive I vs. II groups using microRNA expression, DNA methylation, mutation, or DNA copy number data identified no significant differences between these groups, while similar supervised analyses using protein and mRNA expression identified many differences.

Multi-platform subtype discovery

To reveal higher order structure in breast tumors based on multiple data types, significant clusters/subtypes from each of five platforms were analyzed using a multi-platform data matrix subjected to unsupervised Consensus Clustering (Figure 2). This “Cluster of Clusters” (C of C) approach illustrated that Basal-like cancers had the most distinct multi-platform signature since all the different platforms for the Basal-like groups clustered together. To a great extent, the four major C of C subdivisions correlated well with the previously-published mRNA-subtypes (driven in part by the fact that the four intrinsic subtypes were one of the inputs). Therefore, we also performed C of C analysis with no mRNA data present (Supplemental Figure 13) or with the 12 unsupervised mRNA subtypes (Supplemental Figure 14), and in each case 4–6 groups were identified. Recent work by Curtis et al. identified 10 copy number based subgroups in a 997 breast cancer set29. We evaluated this classification in a C of C analysis instead of our 5-class copy number subtypes, with either the PAM50 (Supplemental Figure 15) or 12 unsupervised mRNA subtypes (Supplemental Figure 16); each of these C of C classifications was highly correlated with PAM50 mRNA-subtypes and with the other C of C analyses (Figure 2). The transcriptional profiling and RPPA platforms demonstrated a high correlation with the consensus structure, suggesting that the information content from copy number aberrations, microRNAs, and methylation is captured at the level of gene expression and protein function.
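The input to a "Cluster of Clusters" analysis can be pictured as a binary matrix in which each column is one subtype from one platform and each row is a tumor. The sketch below builds such a matrix from per-platform subtype calls and computes a simple pairwise agreement score. It is an illustration under stated assumptions: the data layout, function names and toy calls are hypothetical, and the Consensus Clustering step itself is not implemented here.

```python
def build_indicator_matrix(platform_labels):
    """One-hot encode per-platform subtype calls into a samples x features matrix.

    `platform_labels` maps platform name -> {sample id -> subtype label}.
    Only samples with a call on every platform are kept.
    """
    samples = sorted(
        set.intersection(*(set(calls) for calls in platform_labels.values()))
    )
    columns = sorted(
        (platform, label)
        for platform, calls in platform_labels.items()
        for label in set(calls.values())
    )
    matrix = {
        sample: [int(platform_labels[p][sample] == label) for p, label in columns]
        for sample in samples
    }
    return columns, matrix

def platform_agreement(row_a, row_b, n_platforms):
    """Fraction of platforms on which two samples share a subtype call."""
    shared = sum(a * b for a, b in zip(row_a, row_b))
    return shared / n_platforms

# Hypothetical toy calls for three samples on three platforms.
toy_calls = {
    "mRNA": {"S1": "Basal-like", "S2": "Basal-like", "S3": "Luminal A"},
    "methylation": {"S1": "Group 5", "S2": "Group 5", "S3": "Group 3"},
    "microRNA": {"S1": "Group 4", "S2": "Group 5", "S3": "Group 1"},
}

columns, matrix = build_indicator_matrix(toy_calls)
print(platform_agreement(matrix["S1"], matrix["S2"], len(toy_calls)))
print(platform_agreement(matrix["S1"], matrix["S3"], len(toy_calls)))
```

Because every sample has exactly one call per platform, each row contains as many ones as there are platforms, which gives all platforms equal weight regardless of how many subtypes each defines.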

Luminal/ER-positive summary analysis

Luminal/ER-positive breast cancers are the most heterogeneous in terms of gene expression (Supplemental Figure 5), mutation spectrum (Figure 1), copy number changes (Supplemental Figure 9) and patient outcomes1,14. One of the most dominant features is high mRNA and protein expression of the luminal expression signature (Supplemental Figure 5), which contains ESR1, GATA3, FOXA1, XBP1 and cMYB; the Luminal/ER-positive cluster also contained the largest number of SMGs. Most notably, GATA3 and FOXA1 were mutated in a mutually exclusive fashion, while ESR1 and XBP1 were typically highly expressed but infrequently mutated. Mutations in RUNX1 and its dimerization partner CBFB may also play a role in aberrant ER-signaling in Luminals as RUNX1 functions as an ER “DNA tethering factor”31. PARADIGM32 analysis comparing Luminal vs. Basal-like cancers further emphasized the presence of a hyperactivated FOXA1/ER complex as a critical network hub differentiating these two tumor subtypes (Supplemental Figure 17).

A confirmatory finding here was the high mutation frequency of PIK3CA in Luminal/ER-positive breast cancers33,34. Through multiple technology platforms, we examined possible relationships between PIK3CA mutation, PTEN loss, INPP4B loss and multiple gene and protein expression signatures of pathway activity. RPPA data demonstrated that pAKT, pS6 and p4EBP1, typical markers of PI3K pathway activation, were not elevated in PIK3CA-mutated Luminal A cancers; instead, they were highly expressed in Basal-like and HER2E mRNA-subtypes (the latter having frequent PIK3CA mutations) and correlated strongly with INPP4B and PTEN loss, and to a degree with PIK3CA amplification. Similarly, protein35 and three mRNA signatures36–38 of PI3K pathway activation were enriched in Basal-like over Luminal A cancers (Figure 3A). This apparent disconnect between the presence of PIK3CA mutations and biomarkers of pathway activation has been previously noted35.

Another striking Luminal/ER-positive subtype finding was the frequent mutation of MAP3K1 and MAP2K4, which represent two contiguous steps within the p38/JNK1 pathway23,39. These mutations are predicted to be inactivating, with MAP2K4 also a target of focal DNA loss in Luminal tumors (Supplemental Figure 9). To explore the possible interplay between PIK3CA and MAP3K1/MAP2K4 signaling, MEMo analysis40 was performed to identify mutually exclusive alterations targeting frequently altered genes likely to belong to the same pathway (Figure 4). Across all breast cancers, MEMo identified a set of modules that highlight the differential activation events within the Receptor Tyrosine Kinase (RTK)/PI3K pathway (Figure 4A); mutations of PIK3CA were very common in Luminal/ER-positive cancers while PTEN loss was more common in Basal-like tumors. Almost all MAP3K1/MAP2K4 mutations were in Luminal tumors, yet MAP3K1 and MAP2K4 appeared almost mutually exclusive relative to one another.

The TP53 pathway was differentially inactivated in Luminal/ER-positive breast cancers, with a low TP53 mutation frequency in Luminal A (12%) and a higher frequency in Luminal B (29%) (Figure 1). In addition to TP53 itself, a number of other pathway-inactivating events occurred including ATM loss and MDM2 amplification (Figure 3B and 4B), both of which occurred more frequently within Luminal B cancers. Gene expression analysis demonstrated that individual markers of functional TP53 (GADD45A and CDKN1A), and TP53 activity41,42 signatures, were highest in Luminal A (Figure 3B). These data suggest that the TP53 pathway remains largely intact in Luminal A but is often inactivated in the more aggressive Luminal B43. Other PARADIGM-based pathway differences driving Luminal B versus Luminal A included hyperactivation of transcriptional activity associated with cMYC and FOXM1/proliferation.

The critical retinoblastoma/RB1 pathway also showed mRNA-subtype specific alterations (Figure 3C). RB1 itself, by mRNA and protein expression, was detectable in most Luminal cancers with highest levels within Luminal A. A common oncogenic event was Cyclin D1 amplification and high expression, which preferentially occurred within Luminal tumors, and more specifically within Luminal B. In contrast, the presumed tumor suppressor p18/CDKN2C was at its lowest levels in Luminal A, consistent with observations from mouse models44. Lastly, RB1 activity signatures were also high in Luminal cancers45–47. Luminal A tumors, which have the best prognosis, are the most likely to retain activity of the major tumor suppressors RB1 and TP53.

These genomic characterizations also provided clues for druggable targets. We compiled a drug target table in which we defined a target as a gene/protein for which there is an approved or investigational drug in human clinical trials targeting the molecule or canonical pathway (Supplemental Table 6). In Luminal/ER-positive cancers, the high frequency of PIK3CA mutations suggests that inhibitors of this activated kinase or its signaling pathway may be beneficial. Other potential SMG drug candidates include AKT1 inhibitors (11/12 AKT1 variants were Luminal) and PARP inhibitors for BRCA1/2 mutations. Though still unapproved as biomarkers, many potential copy number-based drug targets were identified including amplifications of FGFRs and IGFR1, as well as Cyclin D1/CDK4/CDK6. A summary of the general findings in Luminal tumors and the other subtypes is presented in Table 1.

HER2-based classifications and summary analysis

DNA amplification of HER2/ERBB2 was readily evident in this study (Supplemental Figure 9), together with overexpression of multiple HER2-amplicon associated genes that, in part, define the HER2-Enriched (HER2E) mRNA-subtype (Supplemental Figure 5). However, not all clinically HER2+ tumors are of the HER2E mRNA-subtype, and not all tumors in the HER2E mRNA-subtype are clinically HER2+. Integrated analysis of the RPPA and mRNA data clearly identified a HER2+ group (Supplemental Figure 12). When the HER2-positive protein and HER2E mRNA-subtypes overlapped, a strong signal of HER1, p-HER1, HER2, and p-HER2 was observed. However, only ~50% of clinically HER2-positive tumors fall into this HER2E-mRNA-subtype/HER2-protein group; the rest of the clinically HER2-positive tumors were observed predominantly in the Luminal mRNA subtypes.

These data suggest that there exist at least two types of clinically defined HER2-positive tumors. To identify differences between these groups, a supervised gene expression analysis comparing 36 HER2E-mRNA-subtype/HER2-positive versus 31 Luminal-mRNA-subtype/HER2-positive tumors was performed and identified 302 differentially expressed genes (q-value = 0%) (Supplemental Figure 18, Supplemental Table 7). These genes largely track with ER status but also indicated that HER2E-mRNA-subtype/HER2-positive tumors showed significantly higher expression of a number of RTKs including FGFR4, HER1/EGFR, HER2 itself, as well as genes within the HER2-amplicon (i.e. GRB7). Conversely, the Luminal-mRNA-subtype/HER2-positive tumors showed higher expression of the Luminal cluster of genes including GATA3, BCL2, and ESR1. Further support for two types of clinically-defined HER2+ disease was evident in the somatic mutation data supervised by either mRNA-subtype or ER status; TP53 mutations were significantly enriched in HER2E or ER-negative tumors while GATA3 mutations were only observed in Luminal subtypes or ER-positive tumors.

Analysis of the RPPA data according to mRNA-subtype identified 36 differentially expressed proteins (q-value <5%) (Supplemental Figure 18G, Supplemental Table 8). The HER1/p-HER1/HER2/p-HER2 signal was again observed and present within the HER2E-mRNA-subtype/HER2-positive tumors, as was high p-SRC and p-S6; conversely, many protein markers of Luminal cancers again distinguished the Luminal-mRNA-subtype/HER2-positive tumors. Given the importance of clinical HER2 status, a more focused analysis was performed based upon the RPPA-defined protein expression of HER2 (Supplemental Figure 19); the results strongly recapitulated findings from the RPPA and mRNA-subtypes including a high correlation between HER2 clinical status, HER2 protein by RPPA, p-HER2, HER1, and p-HER1. These multiple signatures, namely HER2E-mRNA-subtype, HER2-amplicon genes by mRNA expression, and RPPA HER1/p-HER1/HER2/p-HER2 signature, ultimately identify at least two groups/subtypes within clinically HER2+ tumors (Table 1). These signatures represent breast cancer biomarker(s) that could potentially predict response to anti-HER2 targeted therapies.
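The q-value thresholds quoted in these supervised comparisons control the false discovery rate across many simultaneous tests. The text here does not specify the procedure used, so the sketch below substitutes the widely used Benjamini-Hochberg adjustment purely as an illustration; the function name and the toy p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        adjusted[index] = running_min
    return adjusted

# Hypothetical per-protein p-values from a two-group comparison.
toy_p_values = [0.01, 0.04, 0.03, 0.20]
toy_q_values = benjamini_hochberg(toy_p_values)

# Call a feature differentially expressed at a 5% threshold.
significant = [q < 0.05 for q in toy_q_values]
print(toy_q_values, significant)
```

Note that a raw p-value of 0.04 or 0.03 no longer passes a 5% threshold after adjustment in this toy example, which is why multiple-testing correction matters when hundreds of genes or proteins are compared at once.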

Many therapeutic advances have been made for clinically HER2-positive disease. This study has identified additional somatic mutations that represent potential therapeutic targets within this group, including a high frequency of PIK3CA mutations (39%), a lower frequency of PTEN and PIK3R1 mutations (Supplemental Table 6), and genomic losses of PTEN and INPP4B. Other possible druggable mutations included variants within HER-family members including two somatic mutations in HER2, two within HER1/EGFR, and five within HER3. Pertuzumab, in combination with trastuzumab, targets the HER2-HER3 heterodimer48; however, these data suggest that targeting HER1 with HER2 could also be beneficial. Lastly, the HER2E-mRNA-subtype typically showed high aneuploidy, the highest somatic mutation rate (Table 1), and DNA amplification of other potential therapeutic targets including FGFRs, HER1/EGFR, CDK4 and Cyclin D1.

Basal-like summary analysis

The Basal-like subtype was discovered more than a decade ago by first-generation cDNA microarrays13. These tumors are often referred to as Triple-Negative Breast Cancers (TNBC) because most Basal-like tumors are typically negative for ER, PR and HER2. However, ~75% of TNBC are Basal-like with the other 25% composed of all other mRNA-subtypes6. In this data set, there was a high degree of overlap between these two distinctions with 76 TNBC, 81 Basal-like, and 65 that were both TNBC and Basal-like. Given the known heterogeneity of TNBC, and that the Basal-like subtype proved to be distinct on every platform, we chose to use the Basal-like distinction for comparative analyses.
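The degree of overlap between the TNBC and Basal-like designations follows directly from the three counts given above. The sketch below works through the arithmetic; the variable names are illustrative, and the Jaccard index is an added summary statistic that was not reported in the study.

```python
n_tnbc, n_basal, n_both = 76, 81, 65  # counts reported in the text

# Tumors that are TNBC, Basal-like, or both (inclusion-exclusion).
n_either = n_tnbc + n_basal - n_both

tnbc_that_are_basal = n_both / n_tnbc    # share of TNBC called Basal-like
basal_that_are_tnbc = n_both / n_basal   # share of Basal-like that are TNBC
jaccard = n_both / n_either              # overlap relative to the union

print(n_either)
print(f"{tnbc_that_are_basal:.1%} of TNBC are Basal-like")
print(f"{basal_that_are_tnbc:.1%} of Basal-like are TNBC")
print(f"Jaccard index: {jaccard:.2f}")
```

In this cohort the share of TNBC that is Basal-like comes out somewhat higher than the ~75% figure cited from earlier work.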

Basal-like tumors showed a high frequency of TP53 mutations (80%)9, which when combined with inferred TP53 pathway activity suggests that loss of TP53 function occurs within most, if not all, Basal-like cancers (Figure 3B). In addition to loss of TP53, a MEMo analysis reconfirmed that loss of RB1 and BRCA1 are Basal-like features (Figure 4C)46,49. PIK3CA was the next most commonly mutated gene (~9%); however, inferred PI3K pathway activity, whether from gene36–38, protein35, or high PI3K/AKT pathway activities, was highest in Basal-like cancers (Figure 3A). Alternative means of activating the PI3K-pathway in Basal-like cancers likely include loss of PTEN and INPP4B and/or amplification of PIK3CA. A recent paper from Shah et al.12 performed exome sequencing of 102 TNBC. Five of the top six most frequent TNBC mutations in Shah et al. were also observed at a similar frequency in our TNBC subset (Myo3A not present here); of those five, three passed our test as a significantly mutated gene in TNBCs (Supplemental Table 2).

Expression features of Basal-like tumors include a characteristic signature containing Keratins 5, 6 and 17 and high expression of genes associated with cell proliferation (Supplemental Figure 5). A PARADIGM32 analysis of Basal-like versus Luminal tumors emphasized the importance of hyperactivated FOXM1 as a transcriptional driver of this enhanced proliferation signature (Supplemental Figure 17). PARADIGM also identified hyperactivated cMYC and HIF1α/ARNT network hubs as key regulatory features of Basal-like cancers. Even though chromosome 8q24 is amplified across all subtypes (Supplemental Figure 9), high cMYC activation appears to be a Basal-like characteristic50.

Given the striking contrasts between Basal-like and Luminal/HER2E subtypes, we performed a MEMo analysis on Basal-like tumors alone. The top-scoring module included ATM mutations, BRCA1 and BRCA2 inactivation, RB1 loss and Cyclin E1 amplification (Figure 4C). Interestingly, these same modules were identified previously for Serous Ovarian cancers40. Furthermore, the Basal-like (and TNBC) mutation spectrum was reminiscent of the spectrum seen in Serous Ovarian cancers51 with only one gene (i.e. TP53) at >10% mutation frequency. To explore possible similarities between Serous Ovarian and the breast Basal-like cancers, we performed a number of analyses comparing Ovarian vs. breast Luminal, Ovarian vs. breast Basal-like, and breast Basal-like vs. breast Luminal (Figure 5). Comparing copy number landscapes, we observed several common features between Ovarian and Basal-like tumors including widespread genomic instability and common gains of 1q, 3q, 8q and 12p, and loss of 4q, 5q and 8p (Supplemental Figure 20A). Using a more global copy number comparison, we examined the overall fraction of the genome altered and the overall copy number correlation of Ovarian cancers versus each breast cancer mRNA-subtype (Supplemental Figure 20A, B); in both cases, Basal-like tumors were the most similar to the Serous Ovarian carcinomas.

We systematically looked for other common features between Serous Ovarian and Basal-like when each was compared to Luminal. We identified: 1) BRCA1 inactivation, 2) RB1 loss and Cyclin E1 amplification, 3) high expression of AKT3, 4) cMYC amplification and high expression, and 5) a high frequency of TP53 mutations (Figure 5A). An additional supervised analysis of a large, external multi-tumor type transcriptomic dataset (GSE2109) was performed in which each TCGA breast tumor expression profile was compared via a correlation analysis to that of each tumor in the multi-tumor set. Basal-like breast cancers clearly showed high mRNA expression correlations with Serous Ovarian cancers, as well as with Lung Squamous carcinomas (Figure 5B). We also performed a PARADIGM analysis that calculates whether a gene or pathway feature is both differentially activated in Basal-like versus Luminal cancers and has higher overall activity across the TCGA Ovarian samples; this identified comparably high pathway activity of the HIF1α/ARNT, cMYC, and FOXM1 regulatory hubs in both Ovarian and Basal-like cancers (Supplemental Figure 20C). The common findings of TP53, RB1 and BRCA1 loss, together with cMYC amplification, strongly suggest that these are shared driving events for Basal-like and Serous Ovarian carcinogenesis. Common therapeutic approaches should therefore be considered, a view supported by the activity of platinum analogs and taxanes in both breast Basal-like and Serous Ovarian cancers.
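The cross-dataset correlation analysis described above can be sketched as follows. This is a minimal illustration, assuming that each tumor is represented by an expression vector over the same ordered gene list and that Pearson correlation is the measure used; the function names and the per-tumor-type averaging step are assumptions made for the example, not the paper's exact procedure.

```python
import math
from collections import defaultdict


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


def mean_correlation_by_type(query_profile, reference_profiles):
    """Average the correlation of one query tumor against each reference tumor type.

    reference_profiles: list of (tumor_type, profile) pairs from the external set.
    """
    by_type = defaultdict(list)
    for tumor_type, profile in reference_profiles:
        by_type[tumor_type].append(pearson(query_profile, profile))
    return {t: sum(values) / len(values) for t, values in by_type.items()}


# Toy data: one query profile and two hypothetical reference tumor types.
toy_query = [1.0, 2.0, 3.0, 4.0]
toy_reference = [("type_a", [2.0, 4.0, 6.0, 8.0]), ("type_b", [4.0, 3.0, 2.0, 1.0])]
toy_result = mean_correlation_by_type(toy_query, toy_reference)
```

In practice the gene lists of the two datasets would first need to be matched and the expression values placed on a comparable scale, steps this sketch leaves out.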

Given that most Basal-like cancers are TNBC, finding new drug targets for this group is critical. Unfortunately, the somatic mutation repertoire for Basal-like breast cancers has not provided a common target aside from BRCA1 and BRCA2. Here we note that ~20% of Basal-like tumors had a germline (n=12) and/or somatic (n=8) BRCA1 or BRCA2 variant, which suggests that 1 in 5 Basal-like patients might benefit from PARP inhibitors and/or platinum compounds52,53. The copy number landscape of Basal-like cancers showed multiple amplifications and deletions, some of which may provide therapeutic targets (Supplemental Table 6). Potential targets include losses of PTEN and INPP4B, both of which have been shown to sensitize cell lines to PI3K pathway inhibitors54,55. Interestingly, many of the components of the PI3K and RAS-RAF-MEK pathways were amplified (but not typically mutated) in Basal-like cancers, including PIK3CA (49%), KRAS (32%), BRAF (30%), and HER1/EGFR (23%). Other RTKs that are plausible drug targets and amplified in some Basal-like cancers include FGFR1, FGFR2, IGFR1, c-KIT, MET and PDGFRA. Lastly, the PARADIGM identification of high HIF1α/ARNT pathway activity suggests that these malignancies might be susceptible to angiogenesis inhibitors and/or bioreductive drugs that become activated under hypoxic conditions.


The integrated molecular analyses of breast carcinomas that we report significantly extend our knowledge base to produce a comprehensive catalog of likely genomic drivers of the most common breast cancer subtypes (Table 1). Our novel observation that diverse genetic and epigenetic alterations converge phenotypically into four main breast cancer classes is consistent not only with convergent evolution of gene circuits as seen across multiple organisms, but also with models of breast cancer clonal expansion and in vivo cell selection proposed to explain the phenotypic heterogeneity observed within defined breast cancer subtypes.

Methods summary

Specimens were obtained from patients with appropriate consent from institutional review boards. Using a co-isolation protocol, DNA and RNA were purified. In total, 800 patients were assayed on at least one platform. Different numbers of patients were used for each platform, using the largest number of patients available at the time of data freeze; 466 samples (463 patients) were in common across 5/6 platforms (excluding RPPA) and 348 patients were in common on 6/6 platforms. Technology platforms used include: 1) gene expression DNA microarrays51, 2) DNA methylation arrays, 3) microRNA sequencing, 4) Affymetrix SNP arrays, 5) exome sequencing, and 6) Reverse Phase Protein Arrays. Each platform, except for exome sequencing, was used in a de novo subtype discovery analysis (Supplemental Methods), and the resulting subtypes were included in a single analysis to define an overall subtype architecture. Additional integrated cross-platform computational analyses were performed, including PARADIGM32 and MEMo40.
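The counts of samples "in common" across platforms amount to a set intersection over per-platform sample lists. The sketch below illustrates that bookkeeping; the platform keys and sample identifiers are hypothetical placeholders and do not correspond to the actual TCGA sample lists.

```python
def samples_in_common(platform_samples, platforms):
    """Return the set of sample IDs assayed on every one of the given platforms.

    platform_samples: dict mapping platform name -> iterable of sample IDs.
    platforms: the platform names that must all contain a sample.
    """
    sets = [set(platform_samples[name]) for name in platforms]
    return set.intersection(*sets) if sets else set()


# Hypothetical toy sample lists for three platforms.
toy_platforms = {
    "mrna": ["S1", "S2", "S3", "S4"],
    "methylation": ["S2", "S3", "S4"],
    "rppa": ["S3", "S4", "S5"],
}
without_rppa = samples_in_common(toy_platforms, ["mrna", "methylation"])
all_three = samples_in_common(toy_platforms, ["mrna", "methylation", "rppa"])
```

Excluding a platform with fewer assayed samples enlarges the overlap, which mirrors why the overlap excluding RPPA (466 samples) is larger than the overlap on all six platforms (348 patients).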

All of the primary sequence files are deposited in CGHub; all other data, including the mutation annotation file, are deposited at the Data Coordinating Center (DCC). Sample lists, data matrices and supporting data are also publicly available, and the data can be explored via the ISB Regulome Explorer and the cBio Cancer Genomics Portal. Data descriptions can be found in Supplementary Methods.


Figure 1

Significantly Mutated Genes (SMG) and correlations with genomic and clinical features

Tumor samples are grouped by mRNA-subtype: Luminal A (n=225), Luminal B (n=126), HER2E (n=57), and Basal-like (n=93). Left: Non-silent somatic mutation patterns and frequencies for SMGs. Middle: Clinical features: black, positive or T2-4; white, negative or T1; grey, NA or equivocal. Right: SMGs with frequent copy number amplifications (red) or deletions (blue). Far Right: Non-silent mutation rate per tumor (mutations per megabase, adjusted for coverage). Average mutation rate for each expression subtype is indicated. Hypermutated: mutation rates > 3 SD above the mean (> 4.688, indicated by grey line).
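The hypermutation criterion in this caption (a rate more than 3 SD above the mean) can be written as a small sketch. It assumes the sample standard deviation and rates on their original scale; whether the authors used the sample or population SD, or a transformed scale, is not stated here, so both choices are assumptions, as is the function name.

```python
import statistics


def hypermutation_flags(rates, n_sd=3.0):
    """Flag tumors whose mutation rate exceeds mean + n_sd * SD.

    rates: non-silent mutation rates (mutations per megabase), one per tumor.
    Returns (cutoff, flags), where flags[i] is True for a hypermutated tumor.
    """
    mean_rate = statistics.mean(rates)
    sd_rate = statistics.stdev(rates)  # sample SD: an assumption of this sketch
    cutoff = mean_rate + n_sd * sd_rate
    return cutoff, [rate > cutoff for rate in rates]


# Toy cohort: nineteen tumors at 1 mutation/Mb and one clear outlier.
toy_rates = [1.0] * 19 + [20.0]
toy_cutoff, toy_flags = hypermutation_flags(toy_rates)
```

Because an extreme outlier inflates both the mean and the SD, a rule of this form is conservative in small cohorts.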

Figure 2

Coordinated analysis of breast cancer subtypes defined from five different genomic/proteomic platforms

a) Consensus clustering analysis of the subtypes identifies four major groups (samples, n=348). The blue and white heatmap displays sample consensus. b) Heatmap display of the subtypes defined independently by microRNAs, DNA methylation, copy number, PAM50 mRNA expression, and RPPA expression. Red bar indicates membership of a cluster type. c) Associations with molecular and clinical features. P-values were calculated using a Chi-square test.
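The Chi-square test of association referred to in panel c can be illustrated by computing the test statistic for a contingency table of cluster membership against a categorical feature. The sketch below uses only toy counts and stops at the statistic and degrees of freedom; converting these to a P-value requires the chi-square distribution, for which a statistics library (for example `scipy.stats.chi2_contingency`) would normally be used.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    table: list of rows of observed counts (e.g. clusters x feature categories).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Toy 2 x 2 table: two clusters against a binary clinical feature.
toy_statistic, toy_dof = chi_square_statistic([[10, 20], [20, 10]])
```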

Figure 3

Integrated analysis of the PI3K, TP53, and RB1 pathways

Breast cancer subtypes differ by genetic and genomic targeting events, with corresponding effects on pathway activity. For a) PI3K, b) TP53 and c) RB1 pathways, key genes were selected using prior biological knowledge. Multiple mRNA expression signatures for a given pathway were defined (details in Supplemental Methods; PI3K:Saal, PTEN loss in human breast tumors; PI3K:CMap, PI3K/mTOR inhibitor treatment in vitro; PI3K:Majumder, Akt over-expression in mouse model; p53:IARC, expert-curated p53 targets; p53:GSK, TP53 mutant versus wild-type cell lines; p53:KANNAN, p53 over-expression in vitro; p53:TROESTER, TP53 knockdown in vitro; Rb:CHICAS, RB1 mouse knockout versus wild-type; Rb:LARA, RB1 knockdown in vitro; Rb:HERSCHKOWITZ, RB1 LOH in human breast tumors) and applied to the gene expression data, in order to score each tumor for relative signature activity (yellow: more active). The PI3K panel includes a protein-based (RPPA) proteomic signature. Tumors were ordered first by mRNA-subtype, though specific ordering differs between the panels. P-values were calculated by a Pearson’s correlation or a Chi-squared test.
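Scoring each tumor for relative signature activity can be sketched in a generic way: standardize each signature gene across tumors and average the standardized values per tumor. This is one common approach shown only for illustration; the actual scoring procedures are described in the Supplemental Methods and may differ, and the function name and data layout here are assumptions.

```python
import statistics


def signature_scores(expression, signature_genes):
    """Score each tumor as the mean z-score of the signature genes.

    expression: dict mapping gene -> list of expression values across tumors
                (same tumor order for every gene).
    signature_genes: genes expected to be higher when the pathway is active.
    """
    genes = [gene for gene in signature_genes if gene in expression]
    if not genes:
        raise ValueError("none of the signature genes are present in the data")
    n_tumors = len(expression[genes[0]])
    totals = [0.0] * n_tumors
    for gene in genes:
        values = expression[gene]
        mean_value = statistics.mean(values)
        sd_value = statistics.pstdev(values)
        for i, value in enumerate(values):
            # A gene with no variation carries no information and contributes 0.
            totals[i] += 0.0 if sd_value == 0 else (value - mean_value) / sd_value
    return [total / len(genes) for total in totals]


# Toy data: two informative genes and one flat gene across three tumors.
toy_expression = {"G1": [1.0, 2.0, 3.0], "G2": [2.0, 4.0, 6.0], "G3": [5.0, 5.0, 5.0]}
toy_scores = signature_scores(toy_expression, ["G1", "G2"])
```

Scores of this kind are relative to the cohort being analyzed, which is consistent with the caption's description of "relative signature activity".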

Figure 4

Mutual Exclusivity Modules in Cancer (MEMo) analysis

Mutual exclusivity modules are represented by their gene components and connected to reflect their activity in distinct pathways. For each gene, the frequency of alteration in Basal-like (right box) and non-Basal (left box) is reported. Next to each module is a fingerprint indicating what specific alteration is observed for each gene (row) in each sample (column). a) MEMo identified several overlapping modules that recapitulate the RTK/PI3K and p38/JNK1 signaling pathways and whose core was the top-scoring module. b) MEMo identified alterations to TP53 signaling as occurring within a statistically significant mutually exclusive trend. c) A Basal-like only MEMo analysis identified one module that included ATM mutations, defects at BRCA1 and BRCA2, and deregulation of the RB1-pathway. A gene expression heatmap is below the fingerprint to show expression levels.
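The idea of mutual exclusivity can be illustrated with a simplified sketch. It counts how many samples carry an alteration in at least one gene of a module (coverage) and in two or more genes (overlap), then compares the observed coverage with random placements that keep each gene's alteration frequency fixed. This is a deliberately reduced stand-in for illustration, not the MEMo algorithm, which also uses pathway and network information and a different permutation scheme; all names below are hypothetical.

```python
import random


def coverage_and_overlap(alterations):
    """Return (samples altered in >= 1 gene, samples altered in >= 2 genes).

    alterations: dict mapping gene -> set of altered sample IDs.
    """
    counts = {}
    for samples in alterations.values():
        for sample in samples:
            counts[sample] = counts.get(sample, 0) + 1
    covered = len(counts)
    overlapping = sum(1 for count in counts.values() if count >= 2)
    return covered, overlapping


def exclusivity_p_value(alterations, all_samples, n_perm=1000, seed=0):
    """Permutation P-value: how often random placement covers as many samples.

    Each gene keeps its number of altered samples; only their identity is shuffled.
    """
    rng = random.Random(seed)
    observed_coverage, _ = coverage_and_overlap(alterations)
    sample_list = sorted(all_samples)
    hits = 0
    for _ in range(n_perm):
        permuted = {
            gene: set(rng.sample(sample_list, len(samples)))
            for gene, samples in alterations.items()
        }
        coverage, _ = coverage_and_overlap(permuted)
        if coverage >= observed_coverage:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Toy module: three genes altered in non-overlapping samples out of ten.
toy_module = {"GENE_A": {"s1", "s2"}, "GENE_B": {"s3", "s4"}, "GENE_C": {"s5"}}
toy_samples = {"s%d" % i for i in range(1, 11)}
toy_counts = coverage_and_overlap(toy_module)
toy_p = exclusivity_p_value(toy_module, toy_samples)
```

A module whose alterations rarely co-occur covers more samples than expected by chance, which is the pattern the fingerprints next to each module are meant to display.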

Figure 5

Comparison of Breast and Serous Ovarian carcinomas

a) Significantly enriched genomic alterations identified by comparing Basal-like or Serous Ovarian tumors to Luminal cancers. b) Inter-sample correlations (yellow: positive) between gene transcription profiles of breast tumors (columns; TCGA data, arranged by subtype) and profiles of cancers from various tissues of origin (rows; external “TGEN expO” dataset, GSE2109) including Ovarian cancers.

Table 1
Summary of disease subtypes findings highlighting some of the dominant genomic, clinical, and proteomic features.
| Feature | Luminal A | Luminal B | Basal-like | HER2E |
| --- | --- | --- | --- | --- |
| % ER+/HER2− | 87% | 82% | 10% | 20% |
| % HER2+ | 7% | 15% | 2% | 68% |
| % TNBC | 2% | 1% | 80% | 9% |
| p53 Pathway | TP53 mut (12%); Gain of MDM2 (14%) | TP53 mut (32%); Gain of MDM2 (31%) | TP53 mut (84%); Gain of MDM2 (14%) | TP53 mut (75%); Gain of MDM2 (30%) |
| PIK3CA/PTEN Pathway | PIK3CA mut (49%); PTEN mut/loss (13%); INPP4B loss (9%) | PIK3CA mut (32%); PTEN mut/loss (24%); INPP4B loss (16%) | PIK3CA mut (7%); PTEN mut/loss (35%); INPP4B loss (30%) | PIK3CA mut (42%); PTEN mut/loss (19%); INPP4B loss (30%) |
| RB1 Pathway | Cyclin D1 amp (29%); CDK4 gain (14%); Low expression of CDKN2C; High expression of RB1 | Cyclin D1 amp (58%); CDK4 gain (25%) | RB1 mut/loss (20%); Cyclin E1 amp (9%); High expression of CDKN2A; Low expression of RB1 | Cyclin D1 amp (38%); CDK4 gain (24%) |
| mRNA Expression | High ER cluster; Low proliferation | Lower ER cluster; High proliferation | Basal signature; High proliferation | HER2 amplicon signature; High proliferation |
| Copy Number | Most diploid; Many with quiet genomes; 1q, 8q, 8p11 gain; 8p, 16q loss; 11q13.3 amp (24%) | Most aneuploid; Many with focal amps; 1q, 8q, 8p11 gain; 8p, 16q loss; 11q13.3 amp (51%); 8p11.23 amp (28%) | Most aneuploid; High genomic instability; 1q, 10p gain; 8p, 5q loss; MYC focal gain (40%) | Most aneuploid; High genomic instability; 1q, 8q gain; 8p loss; 17q12 focal ERBB2 amp (71%) |
| DNA Mutations | PIK3CA (49%); TP53 (12%); GATA3 (14%); MAP3K1 (14%) | TP53 (32%); PIK3CA (32%); MAP3K1 (5%) | TP53 (84%); PIK3CA (7%) | TP53 (75%); PIK3CA (42%); PIK3R1 (8%) |
| DNA Methylation | – | Hyper-methylated phenotype for subset | Hypo-methylated | – |
| Protein Expression | High Estrogen-signaling; High cMYB; RPPA reactive subtypes | Less Estrogen-signaling; High FOXM1 and cMYC; RPPA reactive subtypes | High expression of DNA repair proteins, PTEN and INPP4B loss signature (p-AKT) | High protein and phos-protein expression of HER1 and HER2 |

Percentages are based on 466 tumor overlap list

Genome Sequencing Centers

Washington University in St. Louis –Daniel C. Koboldt(1), Robert S. Fulton(1), Michael D. McLellan(1), Heather Schmidt(1), Joelle Kalicki-Veizer(1), Joshua F. McMichael(1), Lucinda L. Fulton(1), David J. Dooling(1), Li Ding(1,2), Elaine R. Mardis(1,2,3), Richard K. Wilson(1,2,3)

Genome Characterization Centers

BC Cancer Agency – Adrian Ally(4), Miruna Balasundaram(4), Yaron S.N. Butterfield(4), Rebecca Carlsen(4), Candace Carter(4), Andy Chu(4), Eric Chuah(4), Hye-Jung E. Chun(4), Robin J.N. Coope(4), Noreen Dhalla(4), Ranabir Guin(4), Carrie Hirst(4), Martin Hirst(4), Robert A. Holt(4), Darlene Lee(4), Haiyan I. Li(4), Michael Mayo(4), Richard A. Moore(4), Andrew J. Mungall(4), Erin Pleasance(4), A. Gordon Robertson(4), Jacqueline E. Schein(4), Arash Shafiei(4), Payal Sipahimalani(4), Jared R. Slobodan(4), Dominik Stoll(4), Angela Tam(4), Nina Thiessen(4), Richard J. Varhol(4), Natasja Wye(4), Thomas Zeng(4), Yongjun Zhao(4), Inanc Birol(4), Steven J.M. Jones(4), Marco A. Marra(4), Broad Institute – Andrew D. Cherniack(5), Gordon Saksena(5), Robert C. Onofrio(5), Nam H. Pho(5), Scott L. Carter(5), Steven E. Schumacher(5,6), Barbara Tabak(5,6), Bryan Hernandez(5), Jeff Gentry(5), Huy Nguyen(5), Andrew Crenshaw(5), Kristin Ardlie(5), Rameen Beroukhim(5,7,8), Wendy Winckler(5), Gad Getz(5), Stacey B. Gabriel(5), Matthew Meyerson(5,9,10), Brigham and Women’s Hospital and Harvard Medical School – Lynda Chin(9,11), Peter J. Park(12), and Raju Kucherlapati(13), University of North Carolina, Chapel Hill – Katherine A. Hoadley(14,15), J. Todd Auman(16,17), Cheng Fan(15), Yidi J. Turman(15), Yan Shi(15), Ling Li(15), Michael D. Topal(15,18), Xiaping He(14,15), Hann-Hsiang Chao(14,15), Aleix Prat(14,15), Grace O. Silva(14,15), Michael D. Iglesia(14,15), Wei Zhao(14,15), Jerry Usary(15), Jonathan S. Berg(14,15), Michael Adams(14), Jessica Brooker(18), Junyuan Wu(15), Anisha Gulabani(15), Tom Bodenheimer(15), Alan P. Hoyle(15), Janae V. Simons(15), Matthew G. Soloway(15), Lisle E. Mose(15), Stuart R. Jefferys(15), Saianand Balu(15), Joel S. Parker(15), D. Neil Hayes(15,19), Charles M. Perou(14,15,18), University of Southern California / Johns Hopkins – Simeen Malik(20), Swapna Mahurkar(20), Hui Shen(20), Daniel J. Weisenberger(20), Timothy Triche, Jr.(20), Phillip H. Lai(20), Moiz S. Bootwalla(20), Dennis T. Maglinte(20), Benjamin P. Berman(20), David J. Van Den Berg(20), Stephen B. Baylin(21), Peter W. Laird(20)

Genome Data Analysis

Baylor College of Medicine – Chad J. Creighton(22,23), Lawrence A. Donehower(22,23,24,25), Broad Institute – Gad Getz(26), Michael Noble(26), Doug Voet(26), Gordon Saksena(26), Nils Gehlenborg(12,26), Daniel DiCara(26), Juinhua Zhang(27), Hailei Zhang(26), Chang-Jiun Wu(28), Spring Yingchun Liu(26), Michael S. Lawrence(26), Lihua Zou(26), Andrey Sivachenko(26), Pei Lin(26), Petar Stojanov(26), Rui Jing(26), Juok Cho(26), Raktim Sinha(26), Richard W. Park(26), Marc-Danie Nazaire(26), Jim Robinson(26), Helga Thorvaldsdottir(26), Jill Mesirov(26), Peter J. Park(12,29,30), Lynda Chin(26,27), Institute for Systems Biology – Sheila Reynolds(31), Richard B. Kreisberg(31), Brady Bernard(31), Ryan Bressler(31), Timo Erkkila(32), Jake Lin(31), Vesteinn Thorsson(31), Wei Zhang(33), Ilya Shmulevich(31), Memorial Sloan-Kettering Cancer Center – Giovanni Ciriello(34), Nils Weinhold(34), Nikolaus Schultz(34), Jianjiong Gao(34), Ethan Cerami(34), Benjamin Gross(34), Anders Jacobsen(34), Rileen Sinha(34), B. Arman Aksoy(34), Yevgeniy Antipin(34), Boris Reva(34), Ronglai Shen(35), Barry S. Taylor(34), Marc Ladanyi(36), Chris Sander(34), Oregon Health and Science University – Pavana Anur(37), Paul T. Spellman(37), The University of Texas MD Anderson Cancer Center – Yiling Lu(38,39), Wenbin Liu(40), Roel R.G. Verhaak(40), Gordon B. Mills(38,39), Rehan Akbani(40), Nianxiang Zhang(40), Bradley M. Broom(40), Tod D. Casasent(40), Chris Wakefield(40), Anna K. Unruh(40), Keith Baggerly(40), Kevin Coombes(40), John N. Weinstein(40), University of California, Santa Cruz / Buck Institute – David Haussler(41,42), Christopher C. Benz(43), Joshua M. Stuart(41), Stephen C. Benz(41), Jingchun Zhu(41), Christopher C. Szeto(41), Gary K. Scott(43), Christina Yau(43), Evan O. Paull(41), Daniel Carlin(41), Christopher Wong(41), Artem Sokolov(41), Janita Thusberg(43), Sean Mooney(43), Sam Ng(41), Theodore C. Goldstein(41), Kyle Ellrott(41), Mia Grifford(41), Christopher Wilks(41), Singer Ma(41), Brian Craft(41), NCI: Chunhua Yan(44), Ying Hu(44), Daoud Meerzaman(44)

Biospecimen Core Resource

Nationwide Children’s Hospital Biospecimen Core Resource – Julie M. Gastier-Foster(45,46,47), Jay Bowen(47), Nilsa C. Ramirez(45,47), Aaron D. Black(47), Robert E. Pyatt(45,47), Peter White(46,47), Erik J. Zmuda(47), Jessica Frick(47), Tara M. Lichtenberg(47), Robin Brookens(47), Myra M. George(47), Mark A. Gerken(47), Hollie A. Harper(47), Kristen M. Leraas(47), Lisa J. Wise(47), Teresa R. Tabler(47), Cynthia McAllister(47), Thomas Barr(47), Melissa Hart-Kothari(47)

Tissue Source Sites

ABS-IUPUI - Katie Tarvin(48), Charles Saller(49), George Sandusky(50), Colleen Mitchell(50), Christiana: Mary V. Iacocca(51), Jennifer Brown(51), Brenda Rabeno(51), Christine Czerwinski(51), Nicholas Petrelli(51), Cureline – Oleg Dolzhansky(52), Mikhail Abramov(53), Olga Voronina(54), Olga Potapova(54), Duke University Medical Center: Jeffrey R. Marks(55), The Greater Poland Cancer Centre: Wiktoria M. Suchorska(56), Dawid Murawa(56), Witold Kycler(56), Matthew Ibbs(56), Konstanty Korski(56), Arkadiusz Spychała(56), Paweł Murawa(56), Jacek J. Brzeziński(56), Hanna Perz(56), Radosław Łaźniak(56), Marek Teresiak(56), Honorata Tatka(56), Ewa Leporowska(56), Marta Bogusz-Czerniewicz(56,57), Julian Malicki(56,57), Andrzej Mackiewicz(56,57), Maciej Wiznerowicz(56,57), ILSBio: Xuan Van Le(58), Bernard Kohl(58), Nguyen Viet Tien(59), Richard Thorp(60), Nguyen Van Bang(61), Howard Sussman(62), Bui Duc Phu(61), Richard Hajek(63), Nguyen Phi Hung(64), Tran Viet The Phuong(65), Huynh Quyet Thang(66), Khurram Zaki Khan(60), International Genomics Consortium: Robert Penny(67), David Mallery(67), Erin Curley(67), Candace Shelton(67), Peggy Yena(67), Mayo Clinic: James N. Ingle(68), Fergus J. Couch(68), Wilma L. Lingle(68), MSKCC: Tari A. King(69), MD Anderson Cancer Center: Ana Maria Gonzalez-Angulo(38,70), Gordon B. Mills(70), Mary D. Dyer(70), Shuying Liu(70), Xiaolong Meng(70), Modesto Patangan(70), University of California San Francisco – Frederic Waldman(71,72), Hubert Stöppler(73), University of North Carolina: W. Kimryn Rathmell(15), Leigh Thorne(15,74), Mei Huang(15,74), Lori Boice(15,74), Ashley Hill(15), Roswell Park Cancer Institute: Carl Morrison(75), Carmelo Gaudioso(75), Wiam Bshara(75), University of Miami - Kelly Daily(76), Sophie C. Egea(76), Marc D. Pegram(76), Carmen Gomez-Fernandez(76), University of Pittsburgh: Rajiv Dhir(77), Rohit Bhargava(78), Adam Brufsky(78), Walter Reed National Military Medical Center: Craig D. Shriver(79), Jeffrey A. Hooke(79), Jamie Leigh Campbell(79), Richard J. Mural(80), Hai Hu(80), Stella Somiari(80), Caroline Larson(80), Brenda Deyarmin(80), Leonid Kvecher(80), Albert J. Kovatich(81)

Disease Working Group

Matthew J. Ellis(3,82,83), Tari A. King(69), Hai Hu(80), Fergus J. Couch(68), Richard J. Mural(80), Thomas Stricker(84), Kevin White(84), Olufunmilayo Olopade(85), James N. Ingle(68), Chunqing Luo(80), Yaqin Chen(80), Jeffrey R. Marks(55), Frederic Waldman(71,72), Maciej Wiznerowicz(56,57), Ron Bose(3,82,83), Li-Wei Chang(86), Andrew H. Beck(10), Ana Maria Gonzalez-Angulo(38,70)

Data Coordination Center

Todd Pihl(87), Mark Jensen(87), Robert Sfeir(87), Ari Kahn(87), Anna Chu(87), Prachi Kothiyal(87), Zhining Wang(87), Eric Snyder(87), Joan Pontius(87), Brenda Ayala(87), Mark Backus(87), Jessica Walton(87), Julien Baboud(87), Dominique Berton(87), Matthew Nicholls(87), Deepak Srinivasan(87), Rohini Raman(87), Stanley Girshik(87), Peter Kigonya(87), Shelley Alonso(87), Rashmi Sanbhadti(87), Sean Barletta(87), David Pot(87)

Project Team

National Cancer Institute: Margi Sheth(88), John A. Demchok(88), Kenna R. Mills Shaw(88), Liming Yang(88), Greg Eley(89), Martin L. Ferguson(90), Roy W. Tarnuzzer(88), Jiashan Zhang(88), Laura A. L. Dillon(88), Kenneth Buetow(44), Peter Fielding(88), National Human Genome Research Institute: Bradley A. Ozenberger(91), Mark S. Guyer(91), Heidi J. Sofia(91), Jacqueline D. Palchik(91)


1The Genome Institute, Washington University, St Louis, Missouri 63108, USA

2Department of Genetics, Washington University, St Louis, Missouri 63110, USA

3Siteman Cancer Center, Washington University, St Louis, Missouri 63110, USA

4Canada’s Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, BC V5Z, Canada

5The Broad Institute of MIT and Harvard, Cambridge, MA 02142

6Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, 02115

7Department of Medicine, Harvard Medical School, Boston, MA 02215

8Departments of Cancer Biology and Medical Oncology, and the Center for Cancer Genome Discovery, Dana-Farber Cancer Institute, Boston, MA, 02115

9Department of Medical Oncology and the Center for Cancer Genome Discovery, Dana-Farber Cancer Institute, Boston, MA, 02115

10Department of Pathology, Harvard Medical School, Boston, MA 02215

11Belfer Institute for Applied Cancer Science, Dana-Farber Cancer Institute, Boston, MA 02115 USA

12The Center for Biomedical Informatics, Harvard Medical School, Boston, MA 02115 USA

13Department of Genetics, Harvard Medical School and Division of Genetics, Brigham and Women’s Hospital, Boston, MA 02115 USA

14Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA

15Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA

16Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA

17Institute for Pharmacogenetics and Individualized Therapy, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA

18Department of Pathology and Laboratory Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA

19Department of Internal Medicine, Division of Medical Oncology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA

20USC Epigenome Center, University of Southern California, Los Angeles, CA 90033 USA

21Cancer Biology Division, The Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins University, Baltimore, MD 21231 USA

22Dan L Duncan Cancer Center, Baylor College of Medicine, Houston, TX 77030

23Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030

24Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX 77030

25Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, TX 77030

26The Eli and Edythe L. Broad Institute of Massachusetts Institute Of Technology and Harvard University, Cambridge, MA 02142 USA

27Institute for Applied Cancer Science, Department of Genomic Medicine, University of Texas MD Anderson Cancer Center, Houston, TX 77054

28Department of Genomic Medicine, University of Texas MD Anderson Cancer Center, Houston, TX 77054

29Division of Genetics, Brigham and Women’s Hospital, Boston, MA 02115 USA

30Informatics Program, Children’s Hospital, Boston, MA 02115 USA

31Institute for Systems Biology, Seattle, WA 98109 USA

32Tampere University of Technology, Tampere, Finland

33Cancer Genomics Core Laboratory, M.D. Anderson Cancer Center, Houston, TX 77030 USA

34Computational Biology Center, Memorial Sloan-Kettering Cancer Center, New York, NY 10065 USA

35Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, NY 10065 USA

36Human Oncology and Pathogenesis Program, Memorial Sloan-Kettering Cancer Center, New York, NY 10065 USA

37Oregon Health and Science University, 3181 SW Sam Jackson Park Rd, Portland OR 97239

38Department of Systems Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030 USA

39Kleberg Center for Molecular Markers, The University of Texas MD Anderson Cancer Center, Houston, TX 77030 USA

40Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030 USA

41Department of Biomolecular Engineering and Center for Biomolecular Science and Engineering, University of California Santa Cruz, Santa Cruz, CA 95064 USA

42Howard Hughes Medical Institute, University of California Santa Cruz, Santa Cruz, CA 95064 USA

43Buck Institute for Research on Aging, Novato, CA 94945 USA

44Center for Bioinformatics and Information Technology, National Cancer Institute, Rockville, MD

45The Ohio State University College of Medicine, Department of Pathology, Columbus, OH 43205

46The Ohio State University College of Medicine, Department of Pediatrics, Columbus, OH 43205

47The Research Institute at Nationwide Children’s Hospital, Columbus, OH 43205

48ABS Inc. Indianapolis, IN 46204 USA

49ABS Inc. Wilmington DE 19801 USA

50Indiana University School of Medicine, Indianapolis, Indiana 46202 USA

51Helen F. Graham Cancer Center, Christiana Care, Newark, Delaware 19713 USA

52Moscow City Clinical Oncology Dispensary #1 and the Central IHC Laboratory of the Moscow Health Department, Moscow, Russia

53Russian Cancer Research Center, Moscow, Russia

54Cureline, Inc., South San Francisco, CA, USA

55Department of Surgery, Duke University Medical Center, Durham, NC 27710 USA

56The Greater Poland Cancer Centre, Poznań, 61-866, Poland

57Poznan University of Medical Sciences, Poznań, 61-701, Poland

58ILSBio, LLC, Chestertown, MD 21620, USA

59Ministry of Health, Hanoi, Vietnam

60ILSBio LLC, Karachi, Pakistan

61Hue Central Hospital, Hue City, Vietnam

62Stanford University Medical Center, Stanford, CA 94305, USA

63Center for Minority Health Research, University of Texas, M.D. Anderson Cancer Center, Houston, TX 07703

64National Cancer Institute, Hanoi, Vietnam

65Ho Chi Minh City Cancer Center, Vietnam

66Can Tho Cancer Center, Can Tho, Vietnam

67International Genomics Consortium, Phoenix, AZ 85004 USA

68Mayo Clinic, Rochester, MN 55905

69Department of Surgery, Breast Service, Memorial Sloan-Kettering Cancer Center, New York, NY 10065

70Department of Breast Medical Oncology, The University of Texas, MD Anderson Cancer Center, Houston, TX 77030 USA

71University of California at San Francisco; San Francisco, CA 94143

72Cancer Diagnostics; Nichols Institute, Quest Diagnostics; San Juan Capistrano, CA 92675

73Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA 94115

74UNC Tissue Procurement Facility, Department of Pathology, UNC Lineberger Cancer Center, Chapel Hill, NC 27599, USA

75Department of Pathology, Roswell Park Cancer Institute, Buffalo, NY 14263 USA

76Department of Pathology, University of Miami Miller School of Medicine, Sylvester Comprehensive Cancer Center, Miami, FL 33136, USA.

77University of Pittsburgh, Pittsburgh, PA, 15213 USA

78Magee-Womens Hospital of University of Pittsburgh Medical Center, Pittsburgh PA 15213 USA

79Walter Reed National Military Medical Center, Bethesda, MD 20899-5600

80Windber Research Institute, Windber, PA 15963

81MDR Global, LLC, Windber, PA 15963

82Breast Cancer Program, Washington University, St. Louis, MO, USA

83Department of Internal Medicine, Division of Oncology, Washington University, St. Louis, MO, USA

84Institute for Genomics and Systems Biology, University of Chicago, Chicago IL

85Center for Clinical Cancer Genetics, The University of Chicago, Chicago, IL 60637, USA

86Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO 63110

87SRA International, 4300 Fair Lakes Court, Fairfax, VA 22033

88The Cancer Genome Atlas Program Office, Center for Cancer Genomics, National Cancer Institute, Bethesda, MD

89TCGA Consultant, Scimentis, LLC, Atlanta, GA

90MLF Consulting, Arlington, MA 02474 USA

91National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892


We thank Margi Sheth and Susan Lucas for administrative coordination of TCGA activities, and Chris Gunter for critical reading of the manuscript. This work was supported by the following grants from the USA National Institutes of Health: U24CA143883, U24CA143858, U24CA143840, U24CA143799, U24CA143835, U24CA143845, U24CA143882, U24CA143867, U24CA143866, U24CA143848, U24CA144025, U54HG003079, P50CA116201 and P50CA58223. Additional support was provided by the Susan G. Komen for the Cure, the US Department of Defense through the Henry M. Jackson Foundation for the Advancement of Military Medicine, and the Breast Cancer Research Foundation. The views expressed in this paper are those of the authors and do not reflect the official policy of the Department of Defense, or U.S. Government.


  • 1. PaikSA multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancerN Engl J Med351281728262004[PubMed][Google Scholar]
  • 2. van ‘t VeerLJGene expression profiling predicts clinical outcome of breast cancerNature4155305362002[PubMed][Google Scholar]
  • 3. SlamonDJHuman breast cancer: correlation of relapse and survival with amplification of the HER-2/neu oncogeneScience2351771821987[PubMed][Google Scholar]
  • 4. ChinKGenomic and transcriptional aberrations linked to breast cancer pathophysiologiesCancer Cell10529541S1535-6108(06)00315-1 [pii]2006[PubMed][Google Scholar]
  • 5. BergamaschiADistinct patterns of DNA copy number alteration are associated with different clinicopathological features and gene-expression subtypes of breast cancerGenes Chromosomes Cancer45103310402006[PubMed][Google Scholar]
  • 6. PerouCMMolecular stratification of triple-negative breast cancersOncologist16Suppl 161702011[PubMed][Google Scholar]
  • 7. SorlieTRepeated observation of breast tumor subtypes in independent gene expression data setsProc Natl Acad Sci U S A100841884232003[PubMed][Google Scholar]
  • 8. FoulkesWDGermline BRCA1 mutations and a basal epithelial phenotype in breast cancerJ Natl Cancer Inst95148214852003[PubMed][Google Scholar]
  • 9. CareyLARace, breast cancer subtypes, and survival in the Carolina Breast Cancer StudyJama295249225022006[PubMed][Google Scholar]
  • 10. DingLGenome remodelling in a basal-like breast cancer metastasis and xenograftNature4649991005nature08989 [pii]2010[PubMed][Google Scholar]
  • 11. ShahSPMutational evolution in a lobular breast tumour profiled at single nucleotide resolutionNature4618098132009[PubMed][Google Scholar]
  • 12. ShahSPThe clonal and mutational evolution spectrum of primary triple-negative breast cancersNature2012[PubMed][Google Scholar]
  • 13. PerouCMMolecular portraits of human breast tumoursNature4067477522000[PubMed][Google Scholar]
  • 14. ParkerJSSupervised Risk Predictor of Breast Cancer Based on Intrinsic SubtypesJ Clin OncolJCO.2008.18.1370 [pii]2009[PubMed][Google Scholar]
  • 15. Gonzalez-PerezALopez-BigasNImproving the assessment of the outcome of nonsynonymous SNVs with a consensus deleteriousness score, CondelAm J Hum Genet884404492011[PubMed][Google Scholar]
  • 16. BamshadMMutations in human TBX3 alter limb, apocrine and genital development in ulnar-mammary syndromeNat Genet163113151997[PubMed][Google Scholar]
  • 17. LiQYHolt-Oram syndrome is caused by mutations in TBX5, a member of the Brachyury (T) gene familyNat Genet1521291997[PubMed][Google Scholar]
  • 18. The Cancer Genome Atlas Research Network et alComprehensive genomic characterization defines human glioblastoma genes and core pathwaysNature2008[Google Scholar]
  • 19. CheungLWHigh Frequency of PIK3R1 and PIK3R2 Mutations in Endometrial Cancer Elucidates a Novel Mechanism for Regulation of PTEN Protein StabilityCancer discovery11701852011[PubMed][Google Scholar]
  • 20. MalcovatiLClinical significance of SF3B1 mutations in myelodysplastic syndromes and myelodysplastic/myeloproliferative neoplasmsBlood118623962462011[PubMed][Google Scholar]
  • 21. WangLSF3B1and other novel cancer genes in chronic lymphocytic leukemiaN Engl J Med365249725062011[PubMed][Google Scholar]
  • 22. DingLSomatic mutations affect key pathways in lung adenocarcinomaNature455106910752008[PubMed][Google Scholar]
  • 23. JohnsonGLLapadatRMitogen-activated protein kinase pathways mediated by ERK, JNK, and p38 protein kinasesScience298191119122002[PubMed][Google Scholar]
  • 24. UsaryJMutation of GATA3 in human breast tumorsOncogene23766976782004[PubMed][Google Scholar]
  • 25. WalshTDetection of inherited mutations for breast and ovarian cancer using genomic capture and massively parallel sequencingProc Natl Acad Sci U S A10712629126332010[PubMed][Google Scholar]
  • 26. PratAPhenotypic and molecular characterization of the claudin-low intrinsic subtype of breast cancerBreast Cancer Res12R68bcr2635 [pii]2010[PubMed][Google Scholar]
  • 27. KozomaraAGriffiths-JonesSmiRBase: integrating microRNA annotation and deep-sequencing dataNucleic Acids Res39D1521572011[PubMed][Google Scholar]
  • 28. Weigman VJ. Basal-like Breast cancer DNA copy number losses identify genes involved in genomic instability, response to therapy, and patient survival. Breast Cancer Res Treat (2011).
  • 29. Curtis C. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature (2012).
  • 30. Hennessy BT. A Technical Assessment of the Utility of Reverse Phase Protein Arrays for the Study of the Functional Proteome in Non-microdissected Human Breast Cancers. Clinical Proteomics 6, 129-151 (2010).
  • 31. Daub H. Kinase-selective enrichment enables quantitative phosphoproteomics of the kinome across the cell cycle. Mol Cell 31, 438-448 (2008).
  • 32. Vaske CJ. Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM. Bioinformatics 26, i237-245 (2010).
  • 33. Campbell IG. Mutation of the PIK3CA gene in ovarian and breast cancer. Cancer Res 64, 7678-7681 (2004).
  • 34. Bachman KE. The PIK3CA Gene is Mutated with High Frequency in Human Breast Cancers. Cancer Biol Ther 3, 772-775 (2004).
  • 35. Stemke-Hale K. An integrative genomic and proteomic analysis of PIK3CA, PTEN, and AKT mutations in breast cancer. Cancer Res 68, 6084-6091 (2008).
  • 36. Creighton CJ. Proteomic and transcriptomic profiling reveals a link between the PI3K pathway and lower estrogen-receptor (ER) levels and activity in ER+ breast cancer. Breast Cancer Res 12, R40 (2010).
  • 37. Majumder PK. mTOR inhibition reverses Akt-dependent prostate intraepithelial neoplasia through regulation of apoptotic and HIF-1-dependent pathways. Nat Med 10, 594-601 (2004).
  • 38. Saal LH. Recurrent gross mutations of the PTEN tumor suppressor gene in breast cancers with deficient DSB repair. Nat Genet 40, 102-107 (2008).
  • 39. Wagner EF, Nebreda AR. Signal integration by JNK and p38 MAPK pathways in cancer development. Nat Rev Cancer 9, 537-549 (2009).
  • 40. Ciriello G, Cerami E, Sander C, Schultz N. Mutual exclusivity analysis identifies oncogenic network modules. Genome Res 22, 398-406 (2012).
  • 41. Kannan K. DNA microarrays identification of primary and secondary target genes regulated by p53. Oncogene 20, 2225-2234 (2001).
  • 42. Troester MA. Gene expression patterns associated with p53 status in breast cancer. BMC Cancer 6, 276 (2006).
  • 43. Deisenroth C, Thorner AR, Enomoto T, Perou CM, Zhang Y. Mitochondrial Hep27 is a c-Myb target gene that inhibits Mdm2 and stabilizes p53. Mol Cell Biol 30, 3981-3993, MCB.01284-09 [pii] (2010).
  • 44. Pei XH. CDK inhibitor p18(INK4c) is a downstream target of GATA3 and restrains mammary luminal progenitor cell proliferation and tumorigenesis. Cancer Cell 15, 389-401, S1535-6108(09)00079-8 [pii] (2009).
  • 45. Chicas A. Dissecting the unique role of the retinoblastoma tumor suppressor during cellular senescence. Cancer Cell 17, 376-387 (2010).
  • 46. Herschkowitz JI, He X, Fan C, Perou CM. The functional loss of the retinoblastoma tumour suppressor is a common event in basal-like and luminal B breast carcinomas. Breast Cancer Res 10, R75, bcr2142 [pii] (2008).
  • 47. Lara MF. Gene profiling approaches help to define the specific functions of retinoblastoma family in epidermis. Mol Carcinog 47, 209-221 (2008).
  • 48. Baselga J. Pertuzumab plus trastuzumab plus docetaxel for metastatic breast cancer. N Engl J Med 366, 109-119 (2012).
  • 49. Jiang Z. Rb deletion in mouse mammary progenitors induces luminal-B or basal-like/EMT tumor subtypes depending on p53 status. J Clin Invest, 41490 [pii] (2010).
  • 50. Chandriani S. A core MYC gene expression signature is prominent in basal-like breast cancer but only partially overlaps the core serum response. PLoS One 4, e6693 (2009).
  • 51. The Cancer Genome Atlas Research Network, Perou CM. Integrated genomic analyses of ovarian carcinoma. Nature 474, 609-615 (2011).
  • 52. Audeh MW. Oral poly(ADP-ribose) polymerase inhibitor olaparib in patients with BRCA1 or BRCA2 mutations and recurrent ovarian cancer: a proof-of-concept trial. Lancet 376, 245-251 (2010).
  • 53. Fong PC. Inhibition of poly(ADP-ribose) polymerase in tumors from BRCA mutation carriers. N Engl J Med 361, 123-134, NEJMoa0900212 [pii] (2009).
  • 54. Fedele CG. Inositol polyphosphate 4-phosphatase II regulates PI3K/Akt signaling and is lost in human basal-like breast cancers. Proc Natl Acad Sci U S A 107, 22231-22236 (2010).
  • 55. Gewinner C. Evidence that inositol polyphosphate 4-phosphatase type II is a tumor suppressor that inhibits PI3K signaling. Cancer Cell 16, 115-125 (2009).