Dysregulation of EMT Drives the Progression to Clinically Aggressive Sarcomatoid Bladder Cancer.
Journal: 2019/May - Cell Reports
ISSN: 2211-1247
Abstract:
Sarcomatoid urothelial bladder cancer (SARC) displays a high propensity for distant metastasis and is associated with short survival. We report a comprehensive genomic analysis of 28 cases of SARC and 84 cases of conventional urothelial carcinoma (UC), with the TCGA cohort of 408 muscle-invasive bladder cancers serving as the reference. SARCs show a distinct mutational landscape, with enrichment of TP53, RB1, and PIK3CA mutations. They are related to the basal molecular subtype of conventional UCs and could be divided into epithelial-basal and more clinically aggressive mesenchymal subsets on the basis of TP63 and its target gene expression levels. Other analyses reveal that SARCs are driven by downregulation of homotypic adherence genes and dysregulation of the EMT network, and nearly half exhibit a heavily infiltrated immune phenotype. Our observations have important implications for prognostication and the development of more effective therapies for this highly lethal variant of bladder cancer.
Relations:
Content
Citations
(4)
Diseases
(1)
Genes
(3)
Processes
(2)
Anatomy
(1)
Affiliates
(3)
Similar articles
Articles by the same authors
Discussion board
Cell Rep 27(6): 1781-1793e4.

Dysregulation of EMT Drives the Progression to Clinically Aggressive Sarcomatoid Bladder Cancer

+16 authors

INTRODUCTION

Bladder cancer is the ninth most common cancer worldwide, affecting 430,000 people and resulting in 165,000 deaths annually (Ferlay et al., 2013). In the United States, it is the fourth most common cancer in men, with an estimated incidence of 81,000 cases in 2018 (Siegel et al., 2018). More than 90% of bladder cancers are urothelial carcinomas (UCs), which originate from precursor lesions in the epithelial layer of the bladder called the urothelium (Grignon et al., 2016). They progress along two distinct tracks, referred to as papillary and non-papillary, that represent clinically and molecularly different forms of the disease (Czerniak et al., 2016). Non-invasive papillary tumors have a high tendency for recurrence, which necessitates lifetime surveillance that is both intrusive and costly to the patient. Non-papillary carcinomas are clinically aggressive, exhibiting a high propensity for invasive growth, and a large proportion of them are lethal because of metastatic spread (Kamat et al., 2016). We showed that papillary tumors are almost exclusively of a luminal molecular subtype that recapitulates the expression pattern of markers characteristic of normal, intermediate, and terminal urothelial differentiation (Choi et al., 2014b). In contrast, non-papillary UCs are of a basal molecular subtype and exhibit an expression pattern of genes characteristic of the normal basal urothelial layer. Molecular subtyping has shown that invasive UCs can be almost equally divided into luminal and basal subtypes that have distinct clinical behaviors and responses to frontline chemotherapy (Cancer Genome Atlas Research Network, 2014; Choi et al., 2014b; Damrauer et al., 2014; Robertson et al., 2017). In addition to conventional UCs, many microscopically distinct bladder cancer variants have been described, and in general they are thought to develop via progression of conventional disease (Amin, 2009; Grignon et al., 2016). The most frequent of these variants are sarcomatoid, small cell, and micropapillary, all of which are clinically more aggressive than conventional UCs and require uniquely tailored therapeutic management, which is often unavailable (Amin, 2009; Grignon et al., 2016; Kamat et al., 2016).

In this report, we focus on one of these more common variants, referred to as sarcomatoid carcinoma (SARC) (Amin, 2009; Grignon et al., 2016). SARC represents, in various published series, 5%–15% of bladder cancer and frequently coexists with conventional UC (Lopez-Beltran et al., 1998; Sanfrancesco et al., 2016). Clinically, it has a predilection for early metastatic spread to distant organs, and it is associated with shorter survival compared with conventional UC (Amin, 2009; Lopez-Beltran et al., 1998; Sanfrancesco et al., 2016). Here, we report on the genome-wide characterization of bladder SARC, including its miRNA, gene expression, and whole-exome mutational profiles, which identified molecular features associated with its aggressive nature that may be relevant for the early detection and treatment of this highly lethal variant of bladder cancer.

RESULTS

Fifty-two paraffin-embedded SARC tissue samples and 84 invasive conventional bladder UC samples from the MD Anderson Cancer Center cohort were analyzed retrospectively. A cohort of 408 muscle-invasive bladder cancers in The Cancer Genome Atlas (TCGA) was used as a reference. The samples were characterized by clinical and pathological data as well as by several genomic platforms. Sufficient high-quality DNA was available for 13 SARC cases and 5 paired SARC and conventional UC cases for whole-exome sequencing. Gene expression profiling was performed on 28 of the cases using Illumina’s DASL platform, and the data were merged with those obtained from a cohort of 84 conventional UCs. Panel quantitative RT-PCR was used to analyze the miRNA expression levels of all 52 SARC samples and 80 conventional UC samples.

Mutational Signature

The mutational profile of conventional UC was characterized by significant levels of recurrent somatic mutations in 30 genes (Figure 1A). The 10 most frequently mutated genes in UC were TP53 (47%), ARID1A (25%), KDM6A (22%), PIK3CA (22%), RB1 (17%), EP300 (15%), FGFR3 (14%), STAG2 (14%), ELF3 (12%), and CREBBP (11%). The overall mutational landscapes of luminal and basal bladder UC were similar, but several mutated genes were distinctively enriched in specific molecular subtypes. Mutated FGFR3, ELF3, CDKN1A, and TSC1 genes were enriched in luminal tumors, whereas mutated TP53, RB1, and PIK3CA genes were enriched in basal tumors (Figure 1A). SARCs showed high overall mutational rates (median mutational frequency 259, interquartile range 174), and their significantly mutated genes were similar to those observed in conventional UC (Figure 1B; Table S1). However, the top three genes—TP53 (72%), PIK3CA (39%), and RB1 (39%)—were mutated at significantly higher frequencies in SARCs than they were in conventional UCs (p < 0.01). This suggests that SARC evolved from precursor conventional UC carrying these mutations and that mutations in these genes may drive the progression process. Several of the genes that are frequently mutated in conventional UC, including ARID1A, KDM6A, EP300, ELF3, and CREBBP, were not mutated in SARC, and these genes are involved in chromatin remodeling (Dutto et al., 2018; Ler et al., 2017; Skulte et al., 2014; Tang et al., 2013; Wang et al., 2017; Zheng et al., 2018). In general, as a group, chromatin-remodeling genes were not mutated in SARC. Instead, SARC carried frequent mutations of MYO1F (33%), CNGB1 (22%), FXR1 (22%), RBM5 (22%), SEMA3D (22%), TTBK2 (22%), and ZNF90 (22%), which are involved in cellular motility, RNA binding, transmembrane ion channeling, kinase activity, and signaling (Ahonen et al., 2013; Bechara et al., 2013; Bouskila et al., 2011; Hamm et al., 2016; Kim et al., 2006; Qian et al., 2015; Somma et al., 1991). The functional significance of the mutations of these genes for sarcomatoid progression remains unclear, but they are attractive candidates for future mechanistic studies. Interestingly, FGFR3 mutations, which were present in 14% of conventional UCs, were not present in SARC. Nearly all mutations present in the SARCs were also present in the paired precursor conventional UCs of the same patients, indicating that the SARC and the presumed precursor lesions were clonally related.

An external file that holds a picture, illustration, etc.
Object name is nihms-1528840-f0002.jpg
Mutational Landscape of SARC

(A) Mutational landscape among the molecular subtypes of 408 muscle-invasive bladder cancers from the TCGA cohort showing the frequency of mutations in individual tumors and somatic mutations for significantly mutated genes. The frequencies of mutations of individual genes in the luminal and basal subtypes are shown on the left. Asterisks denote the genes with statistically significant mutation frequencies in the luminal and basal subtypes. Bars on the right show the numbers of specific substitutions for individual genes.

(B) Mutational landscape of 13 cases of SARC and 5 paired samples of precursor conventional UC showing the frequency of mutations in individual genes and somatic mutations for significantly mutated genes. The frequencies of mutations of individual genes are shown on the left. Bars on the right show the numbers of specific substitutions for individual genes.

(C) Composite bar graphs showing the distribution of all nucleotide substitutions in three sets of samples corresponding to the TCGA cohort, paired precursor conventional UC, and SARC.

(D) Proportion of single-nucleotide variants (SNVs) in specific nucleotide motifs for each category of substitution in three sets of samples as shown in (C).

(E) False discovery rate (FDR) for specific nucleotide motifs in three sets of samples as shown in (C).

(F) Average weight scores of mutagenesis patterns in three sets of samples as shown in (C).

(G) Weight scores of mutagenesis patterns in individual tumor samples of the TCGA cohort.

(H) Weight scores of mutagenesis patterns in SARCs and paired precursor conventional UCs.

(I) Statistical significance of mutagenesis patterns (p value) in SARC compared with conventional UC. For (E) and (I), p value was calculated using Wilcoxon rank-sum and Kruskal-Wallis tests, respectively.

Mechanisms of Mutagenesis

To further characterize the mutational process associated with progression from conventional UC to SARC, we examined six single-base substitutions (C > A, C > G, C > T, T > A, T > C, and T > G) in all cancer samples (Alexandrov et al., 2013; Nik-Zainal et al., 2012). The results revealed that SARCs were enriched with C > T mutations compared with conventional UCs, and this increase was already apparent in the precursor conventional UCs that were associated with SARCs (Figures 1C and and1D).1D). Analyses of Sanger mutational signatures (Faltas et al., 2016) revealed the presence of six dominant signatures in the conventional UCs in the TCGA cohort: signatures 1, 2, 3 (BRCA1/2 mutagenesis), 13 (APOBEC), 19, and 30 (Figures 1F and and1G).1G). Clustering separated the conventional tumors into two subsets (a and b) that were characterized by different levels of signature 13 (APOBEC) prevalence. In contrast, SARCs and paired precursor conventional UCs were characterized by the uniform dominance of signature 1, which was present in all SARC and precursor conventional UC samples (Figure 1H). In addition, clustering segregated SARCs into two subsets, and they were also characterized by different levels of APOBEC activity. Mutagenesis signatures 1 and 19 were significantly enriched in SARCs compared with conventional UC (Figure 1I). Overall, these data suggest that SARCs evolve from a distinct subset of conventional UCs.

Gene Expression Profile

mRNA expression profiling revealed that more than 6,000 genes were differentially expressed between SARC and UC. We performed multiple unsupervised clustering analyses using all the differentially expressed genes in SARC and performed similar analyses using the top 100 and top 10 upregulated and downregulated genes. All these analyses separated SARC and UC into two distinct clusters (Figures S1A and S1B). One cluster contained conventional UC almost exclusively, whereas the other cluster contained most of the SARCs. Among the top upregulated genes in the SARC cluster were FAM101B (or RFLNB), UHRF1, and PHC2, all of which are members of the chromatinremodeling superfamily (Benavente et al., 2014; Gay et al., 2011; Isono et al., 2005) (Figure S1B). The top downregulated genes included the differentiation-associated transcription factor, ELF3, and genes involved in terminal urothelial differentiation, such as uroplakins and cell adherence genes, as well as LAD1, a component of anchoring filaments in the basement membrane (Hogg et al., 1999). The median survival for the SARC patients (11 months) was significantly shorter than that of the patients with conventional UC (24 months) (p = 0.0326) (Figure S1C).

Intrinsic Molecular Subtypes

Several molecular studies, including our own, have divided bladder cancer into two molecular subtypes that preferentially express basal or luminal genes (Cancer Genome Atlas Research Network, 2014; Choi et al., 2014a; Damrauer et al., 2014; Robertson et al., 2017). To investigate whether the intrinsic molecular types of conventional bladder UC applied to SARC, we analyzed the luminal and basal gene expression signatures in SARC. We used a previously developed classifier that includes markers of luminal and basal types (Choi et al., 2014a). The set of conventional UC was separated into two major groups (Figure 2A). The first group, comprising 53 of the 84 samples (63%), was characterized by high mRNA expression levels of luminal markers such as KRT20, GATA3, uroplakins, ERBB2, ERBB3, PPARG, FOXA1, and XBP1 and was referred to as the luminal subtype. The remaining 31 conventional UC samples (37%) were characterized by high expression levels of basal markers such as CD44, CDH3, KRT5, KRT6, and KRT14 and were referred to as the basal subtype.

An external file that holds a picture, illustration, etc.
Object name is nihms-1528840-f0003.jpg
Luminal and Basal Molecular Subtypes in Conventional UC and SARC

(A) The expression of luminal and basal markers in molecular subtypes of conventional UC (n = 84) and SARC (n = 28).

(B) Kaplan-Meier analysis of molecular subtypes of conventional UC and SARC and log-rank testing. A p value < 0.05 was considered statistically significant.

(C) The immunohistochemical expression of signature luminal and basal markers in representative luminal and basal cases of conventional UC and in representative basal and double-negative SARC. Scale bars indicate 40 μm.

Unlike the conventional UCs, all 28 SARCs had low mRNA expression levels of the luminal genes, which suggested that they developed from a basal subtype of precursor conventional UC (Figure 2A). In addition, nearly half of the SARCs (12 [43%]) were characterized by the retained expression of basal keratins, CD44, and P-cadherin (CDH3), whereas the remaining SARCs (16 [57%]) lacked expression of canonical basal and luminal markers, and we therefore referred to them as “double-negative.” This subset of SARCs shares the “mesenchymal” molecular phenotype with the claudin-low and TCGA cluster IV subtypes identified previously (Cancer Genome Atlas Research Network, 2014; Damrauer et al., 2014; Kardos et al., 2016). Survival analysis revealed that the double-negative mesenchymal SARCs were the most aggressive of the molecular subtypes (Figure 2B). Furthermore, the mean survival duration of patients with double-negative SARC (10 months) was shorter than that of patients with basal SARC (18 months); however, this difference was not significant statistically, probably because of the limited number of cases. We verified the expression patterns of signature luminal markers (GATA3) and basal markers (P63, KRT5/6, KRT14) by immunohistochemistry using tissue microarrays containing the same cases (Figure 2C). The epithelial-basal SARCs were focally positive for basal markers such as p63, KRT5/6, and KRT14 and were negative for the signature luminal marker GATA3. In contrast, the purely mesenchymal double-negative SARCs were immunohistochemically negative for all luminal and basal markers, consistent with the RNA expression data.

Canonical and Upstream Regulator Pathways

In order to identify candidate mechanisms underlying the progression to the SARC phenotype, we used the Ingenuity Pathway Analysis (IPA) upstream pathways and gene set enrichment analysis (GSEA) functions to identify signaling pathways associated with the gene expression patterns observed (Figures 3A3D). SARCs were characterized by downregulation of G1/S checkpoint genes and upregulation of INK4 family of cyclindependent kinase inhibitors (CDKN2A-D), consistent with our observation that RB1 mutations were more common in SARCs. Similarly, SARCs exhibited decreased p53 pathway activity, which was consistent with their high mutational rate of the p53 gene. SARCs were also characterized by loss of epithelial adherens genes and downregulation of p63 pathway activity complemented with activation of TGFb and RhoA, all of which are characteristics of epithelial-mesenchymal transition (EMT) permissive state.

An external file that holds a picture, illustration, etc.
Object name is nihms-1528840-f0004.jpg
Enrichment of Canonical Pathways and Transcriptional Regulators in SARC Compared with Conventional UC

(A) The top ten canonical pathways dysregulated in SARC, as revealed by IPA and GSEA.

(B) The top ten upstream regulators altered in SARC, as revealed by IPA and GSEA.

(C) Expression patterns of G1/S checkpoint pathway genes in molecular subtypes of conventional UC and SARC.

(D) Expression pattern of epithelial adherens genes in molecular subtypes of conventional UC and SARC.

The dotted lines indicate the significance level. For (A) and (B), p values were calculated using Fisher’s exact test.

Immune Infiltrate

Immune checkpoint blockade is clinically active in about 15% of patients with advanced bladder cancer, in which response is associated with high tumor mutational burden (TMB), the “genomically unstable” luminal subtypes, and infiltration with activated cytotoxic T lymphocytes (Mariathasan et al., 2018; Sharma et al., 2016; Sjödahl et al., 2012). Given the relatively high mutational frequencies observed in SARCs, we characterized the patterns of immune-related gene expression in them (Figure 4). In conventional UCs, 11% (9 of 84) showed enrichment of an immune gene expression signature and clustered in the luminal molecular subtype. In contrast, 39% of the SARCs (9 of 28) demonstrated overexpression of immune signature genes and preferentially clustered in the mesenchymal double-negative subtype. The enrichment of immune gene signature in SARC compared with conventional UC was confirmed by GSEA (Figure 5A). We additionally validated these findings by calculating immune score and showing its increase predominantly in the double-negative purely mesenchymal SARC. (Figure 5B) More in-depth analysis of immune infiltrate was performed using the CIBERSORT algorithm, which provided quantitative assessment of 22 immune cell types (Figure 5C) (Newman et al., 2015). It confirmed the enrichment of immune infiltrate in SARC compared with conventional UC (Figure 5D). Most important, it identified the significant increase of CD8+ cytotoxic T lymphocytes in double-negative purely mesenchymal and the most aggressive variant of SARC (Figure 5E). We also evaluated the expression signature of immune checkpoint ligands and their receptors (Sharma and Allison, 2015), including CD70 and CD27, CD80 and CTLA4, TNFSF9 and TNFRSF18, ADA and ADORA2A, and PDCD1G2 and CD274 (Figure 5F). In general, the overexpression of these genes was present in the same group of SARCs characterized by enrichment for the overexpression of immune signature genes. Of potential therapeutic significance, SARC exhibited the enhanced expression of programmed cell death ligand PD-L1 (CD274), and mRNA overexpression of PD-L1 was observed in more than 50% of the SARCs (15 of 28) (Figure 5G). This was confirmed by immunohistochemistry, which showed strong overexpression of the PD-L1 protein (Figure 5H), suggesting that immune checkpoint therapy may be an attractive therapeutic option for a subset of SARC patients.

An external file that holds a picture, illustration, etc.
Object name is nihms-1528840-f0005.jpg
Expression Pattern of Immune Cell Infiltrate in Molecular Subtypes of Conventional UC and SARC

Top to bottom: B cell, T cell, CD8, MacTH1, and dendritic cell expression clusters. Boxed areas identify samples with enrichment of immune cell infiltrate.

An external file that holds a picture, illustration, etc.
Object name is nihms-1528840-f0006.jpg
Expression Patterns of Immune Checkpoint Genes in Conventional UC and SARC

(A) GSEA of immune cell infiltrate of SARC compared with conventional UC. A p value < 0.05 was considered statistically significant.

(B) Boxplot of immune scores calculated using the gene expression profile shown in Figure 4 for molecular subtypes of conventional UC and SARC.

(C) Heatmap of CIBERSORT scores for 22 immune cell types in molecular subtypes of conventional UC and SAR. A p value < 0.05 was considered statistically significant.

(D) Proportion of cases with significant CIBERSORT scores in conventional UC and SARC.

(E) Boxplot of CIBERSORT scores for CD8+ T cells in molecular subtypes of conventional UC and SARC.

(F) Expression of immune checkpoint genes in conventional UC and SARC in relation to their molecular subtypes. The black boxes indicate the cases with enrichment for immune profile shown in Figure 4.

(G) Boxplot of mRNA PD-L1 expression levels in conventional UC and SARC.

(H) Examples of immunohistochemical expression of PD-L1 in conventional UC and SARC. Scale bars indicate 40 μm.

For (A), p value was calculated using the standard GSEA method, for (B) and (E) using the Kruskal-Wallis test, for (D) using Fisher’s exact test, and for (G) using the Wilcoxon rank-sum test.

MicroRNA Expression Profile

Similar to the gene expression profile, microRNA (miRNA) expression levels were widely dysregulated in SARC compared with conventional UC. Nearly 200 individual miRNA species were either overexpressed or downregulated in SARC (Figure S2A). In unsupervised clustering, a small subset of differentially expressed miRNAs could bimodally separate SARC from conventional UC (Figure S2B). Although the biological significance of the miRNAs that were upregulated in SARCs is not clear, the entire miR-200 family was downregulated in the double-negative subset of SARCs (Figure S2B), and it is likely that it plays an important role in their progression. Members of the miR-200 family play well-established roles in the maintenance of epithelial phenotype by suppressing the transcriptional regulators of EMT of the SNAIL, TWIST, and ZEB families (Peinado et al., 2007).

Dysregulation of the EMT Network

Microscopically, SARCs comprise mixed lineages of purely mesenchymal cells and cells with at least partial retention of an epithelial phenotype, reflecting various degrees of EMT (Sung et al., 2013). We previously showed that TP63 controls the expression of high-molecular weight keratins (KRT5, KRT6, and KRT14) and suppresses EMT (Choi et al., 2014b; Tran et al., 2013). The central role of p63 in the maintenance of epithelial phenotype and EMT was confirmed in several variants of solid tumors (Soares and Zhou, 2018; Stacy et al., 2017; Tanaka et al., 2018). We therefore performed additional analyses to further characterize the role of EMT in SARC. Among the EMT transcriptional regulators of the SNAIL, TWIST, and ZEB families, SNAIL2 was significantly overexpressed in SARC compared with conventional UC. Correspondingly, E-cadherin (CDH1) and other epithelial markers, including claudin-1 (CLDN1) and tight junction protein 1 (TJP1), were downregulated in SARC (Figure 6A; Figures S3 and S4A). In solid tumors, the EMT permissive state may be controlled by p53 and RB1 (Jiang et al., 2011), consistent with the mutational and gene expression characteristics of SARCs. Dysregulation of several additional pathways may have a synergistic effect on the EMT permissive state, including the upregulation of TGFB1 and RhoA, which were among the top upregulated pathways in SARC by IPA and GSEA. In addition, p63 and its downstream target genes were downregulated in SARC (Figures 3A, ,3B,3B, and and6A).6A). Importantly, SARC had widespread downregulation of miRNAs involved in EMT, including miR-100, the p63 targets miR-203 and miR-205, and all members of the miR-200 family (Tucci et al., 2012) (Figure 6A; Figure S4B). Because SARCs can be divided into two subgroups (i.e., SARCs that retain epithelial marker expression, reflecting partial EMT, and SARCs that are purely mesenchymal, reflecting complete EMT), we quantified the EMT levels across multiple tumor samples by calculating EMT scores on the basis of a 76-gene signature identified by Byers et al. (2013) (Figure 6B). This provided a quantitative assessment of the epithelial versus mesenchymal phenotype; positive EMT scores corresponded to the epithelial phenotype, whereas negative EMT scores reflected the mesenchymal phenotype. In general, the conventional UCs were characterized by positive EMT scores corresponding to their epithelial phenotype, whereas the basal and double-negative mesenchymal SARCs had intermediate and low EMT scores, reflecting their partial and complete EMT states. (Figure 6C). Essentially identical results were obtained by GSEA using the 175 EMT gene signature developed by Yu et al. (2013). The enrichments of SARCs compared with conventional UCs, as well as SARCs of the basal (partial EMT) subtypes compared with double-negative (complete EMT), were significantly different (Figures S4C and S4D). Immunohistochemistry revealed that tumors within the epithelial SARC subset showed focal retention of p63 and E-cadherin, both of which are involved in the maintenance of epithelial differentiation, whereas double-negative, purely mesenchymal SARCs were negative for the epithelial markers, confirming these subtypes’ partial and complete EMT states, respectively (Figure 6D). Finally, the loss of E-cadherin expression was confirmed in selected SARC samples by western blotting (Figure 6E).

An external file that holds a picture, illustration, etc.
Object name is nihms-1528840-f0007.jpg
Dysregulation of the EMT Regulatory Network in Molecular Subtypes of Conventional UC and SARC

(A) Expression patterns of representative genes in the EMT regulatory network.

(B) EMT scores in molecular subtypes of conventional UC and SARC.

(C) Boxplot of EMT scores in molecular subtypes of conventional UC and SARC. A p value < 0.05 was considered statistically significant.

(D) Examples of immunohistochemical expression of p63 and E-cadherin in conventional UC and SARC.

(E) Western blot documenting loss of E-cadherin expression in SARC.

Mutational Signature

The mutational profile of conventional UC was characterized by significant levels of recurrent somatic mutations in 30 genes (Figure 1A). The 10 most frequently mutated genes in UC were TP53 (47%), ARID1A (25%), KDM6A (22%), PIK3CA (22%), RB1 (17%), EP300 (15%), FGFR3 (14%), STAG2 (14%), ELF3 (12%), and CREBBP (11%). The overall mutational landscapes of luminal and basal bladder UC were similar, but several mutated genes were distinctively enriched in specific molecular subtypes. Mutated FGFR3, ELF3, CDKN1A, and TSC1 genes were enriched in luminal tumors, whereas mutated TP53, RB1, and PIK3CA genes were enriched in basal tumors (Figure 1A). SARCs showed high overall mutational rates (median mutational frequency 259, interquartile range 174), and their significantly mutated genes were similar to those observed in conventional UC (Figure 1B; Table S1). However, the top three genes—TP53 (72%), PIK3CA (39%), and RB1 (39%)—were mutated at significantly higher frequencies in SARCs than they were in conventional UCs (p < 0.01). This suggests that SARC evolved from precursor conventional UC carrying these mutations and that mutations in these genes may drive the progression process. Several of the genes that are frequently mutated in conventional UC, including ARID1A, KDM6A, EP300, ELF3, and CREBBP, were not mutated in SARC, and these genes are involved in chromatin remodeling (Dutto et al., 2018; Ler et al., 2017; Skulte et al., 2014; Tang et al., 2013; Wang et al., 2017; Zheng et al., 2018). In general, as a group, chromatin-remodeling genes were not mutated in SARC. Instead, SARC carried frequent mutations of MYO1F (33%), CNGB1 (22%), FXR1 (22%), RBM5 (22%), SEMA3D (22%), TTBK2 (22%), and ZNF90 (22%), which are involved in cellular motility, RNA binding, transmembrane ion channeling, kinase activity, and signaling (Ahonen et al., 2013; Bechara et al., 2013; Bouskila et al., 2011; Hamm et al., 2016; Kim et al., 2006; Qian et al., 2015; Somma et al., 1991). The functional significance of the mutations of these genes for sarcomatoid progression remains unclear, but they are attractive candidates for future mechanistic studies. Interestingly, FGFR3 mutations, which were present in 14% of conventional UCs, were not present in SARC. Nearly all mutations present in the SARCs were also present in the paired precursor conventional UCs of the same patients, indicating that the SARC and the presumed precursor lesions were clonally related.

An external file that holds a picture, illustration, etc.
Object name is nihms-1528840-f0002.jpg
Mutational Landscape of SARC

(A) Mutational landscape among the molecular subtypes of 408 muscle-invasive bladder cancers from the TCGA cohort showing the frequency of mutations in individual tumors and somatic mutations for significantly mutated genes. The frequencies of mutations of individual genes in the luminal and basal subtypes are shown on the left. Asterisks denote the genes with statistically significant mutation frequencies in the luminal and basal subtypes. Bars on the right show the numbers of specific substitutions for individual genes.

(B) Mutational landscape of 13 cases of SARC and 5 paired samples of precursor conventional UC showing the frequency of mutations in individual genes and somatic mutations for significantly mutated genes. The frequencies of mutations of individual genes are shown on the left. Bars on the right show the numbers of specific substitutions for individual genes.

(C) Composite bar graphs showing the distribution of all nucleotide substitutions in three sets of samples corresponding to the TCGA cohort, paired precursor conventional UC, and SARC.

(D) Proportion of single-nucleotide variants (SNVs) in specific nucleotide motifs for each category of substitution in three sets of samples as shown in (C).

(E) False discovery rate (FDR) for specific nucleotide motifs in three sets of samples as shown in (C).

(F) Average weight scores of mutagenesis patterns in three sets of samples as shown in (C).

(G) Weight scores of mutagenesis patterns in individual tumor samples of the TCGA cohort.

(H) Weight scores of mutagenesis patterns in SARCs and paired precursor conventional UCs.

(I) Statistical significance of mutagenesis patterns (p value) in SARC compared with conventional UC. For (E) and (I), p value was calculated using Wilcoxon rank-sum and Kruskal-Wallis tests, respectively.

Mechanisms of Mutagenesis

To further characterize the mutational process associated with progression from conventional UC to SARC, we examined six single-base substitutions (C > A, C > G, C > T, T > A, T > C, and T > G) in all cancer samples (Alexandrov et al., 2013; Nik-Zainal et al., 2012). The results revealed that SARCs were enriched with C > T mutations compared with conventional UCs, and this increase was already apparent in the precursor conventional UCs that were associated with SARCs (Figures 1C and and1D).1D). Analyses of Sanger mutational signatures (Faltas et al., 2016) revealed the presence of six dominant signatures in the conventional UCs in the TCGA cohort: signatures 1, 2, 3 (BRCA1/2 mutagenesis), 13 (APOBEC), 19, and 30 (Figures 1F and and1G).1G). Clustering separated the conventional tumors into two subsets (a and b) that were characterized by different levels of signature 13 (APOBEC) prevalence. In contrast, SARCs and paired precursor conventional UCs were characterized by the uniform dominance of signature 1, which was present in all SARC and precursor conventional UC samples (Figure 1H). In addition, clustering segregated SARCs into two subsets, and they were also characterized by different levels of APOBEC activity. Mutagenesis signatures 1 and 19 were significantly enriched in SARCs compared with conventional UC (Figure 1I). Overall, these data suggest that SARCs evolve from a distinct subset of conventional UCs.

Gene Expression Profile

mRNA expression profiling revealed that more than 6,000 genes were differentially expressed between SARC and UC. We performed multiple unsupervised clustering analyses using all the differentially expressed genes in SARC and performed similar analyses using the top 100 and top 10 upregulated and downregulated genes. All these analyses separated SARC and UC into two distinct clusters (Figures S1A and S1B). One cluster contained conventional UC almost exclusively, whereas the other cluster contained most of the SARCs. Among the top upregulated genes in the SARC cluster were FAM101B (or RFLNB), UHRF1, and PHC2, all of which are members of the chromatinremodeling superfamily (Benavente et al., 2014; Gay et al., 2011; Isono et al., 2005) (Figure S1B). The top downregulated genes included the differentiation-associated transcription factor, ELF3, and genes involved in terminal urothelial differentiation, such as uroplakins and cell adherence genes, as well as LAD1, a component of anchoring filaments in the basement membrane (Hogg et al., 1999). The median survival for the SARC patients (11 months) was significantly shorter than that of the patients with conventional UC (24 months) (p = 0.0326) (Figure S1C).

Intrinsic Molecular Subtypes

Several molecular studies, including our own, have divided bladder cancer into two molecular subtypes that preferentially express basal or luminal genes (Cancer Genome Atlas Research Network, 2014; Choi et al., 2014a; Damrauer et al., 2014; Robertson et al., 2017). To investigate whether the intrinsic molecular types of conventional bladder UC applied to SARC, we analyzed the luminal and basal gene expression signatures in SARC. We used a previously developed classifier that includes markers of luminal and basal types (Choi et al., 2014a). The set of conventional UC was separated into two major groups (Figure 2A). The first group, comprising 53 of the 84 samples (63%), was characterized by high mRNA expression levels of luminal markers such as KRT20, GATA3, uroplakins, ERBB2, ERBB3, PPARG, FOXA1, and XBP1 and was referred to as the luminal subtype. The remaining 31 conventional UC samples (37%) were characterized by high expression levels of basal markers such as CD44, CDH3, KRT5, KRT6, and KRT14 and were referred to as the basal subtype.

An external file that holds a picture, illustration, etc.
Object name is nihms-1528840-f0003.jpg
Luminal and Basal Molecular Subtypes in Conventional UC and SARC

(A) The expression of luminal and basal markers in molecular subtypes of conventional UC (n = 84) and SARC (n = 28).

(B) Kaplan-Meier analysis of molecular subtypes of conventional UC and SARC and log-rank testing. A p value < 0.05 was considered statistically significant.

(C) The immunohistochemical expression of signature luminal and basal markers in representative luminal and basal cases of conventional UC and in representative basal and double-negative SARC. Scale bars indicate 40 μm.

Unlike the conventional UCs, all 28 SARCs had low mRNA expression levels of the luminal genes, which suggested that they developed from a basal subtype of precursor conventional UC (Figure 2A). In addition, nearly half of the SARCs (12 [43%]) were characterized by the retained expression of basal keratins, CD44, and P-cadherin (CDH3), whereas the remaining SARCs (16 [57%]) lacked expression of canonical basal and luminal markers, and we therefore referred to them as “double-negative.” This subset of SARCs shares the “mesenchymal” molecular phenotype with the claudin-low and TCGA cluster IV subtypes identified previously (Cancer Genome Atlas Research Network, 2014; Damrauer et al., 2014; Kardos et al., 2016). Survival analysis revealed that the double-negative mesenchymal SARCs were the most aggressive of the molecular subtypes (Figure 2B). Furthermore, the mean survival duration of patients with double-negative SARC (10 months) was shorter than that of patients with basal SARC (18 months); however, this difference was not significant statistically, probably because of the limited number of cases. We verified the expression patterns of signature luminal markers (GATA3) and basal markers (P63, KRT5/6, KRT14) by immunohistochemistry using tissue microarrays containing the same cases (Figure 2C). The epithelial-basal SARCs were focally positive for basal markers such as p63, KRT5/6, and KRT14 and were negative for the signature luminal marker GATA3. In contrast, the purely mesenchymal double-negative SARCs were immunohistochemically negative for all luminal and basal markers, consistent with the RNA expression data.

Canonical and Upstream Regulator Pathways

In order to identify candidate mechanisms underlying the progression to the SARC phenotype, we used the Ingenuity Pathway Analysis (IPA) upstream pathways and gene set enrichment analysis (GSEA) functions to identify signaling pathways associated with the gene expression patterns observed (Figures 3A3D). SARCs were characterized by downregulation of G1/S checkpoint genes and upregulation of INK4 family of cyclindependent kinase inhibitors (CDKN2A-D), consistent with our observation that RB1 mutations were more common in SARCs. Similarly, SARCs exhibited decreased p53 pathway activity, which was consistent with their high mutational rate of the p53 gene. SARCs were also characterized by loss of epithelial adherens genes and downregulation of p63 pathway activity complemented with activation of TGFb and RhoA, all of which are characteristics of epithelial-mesenchymal transition (EMT) permissive state.

An external file that holds a picture, illustration, etc.
Object name is nihms-1528840-f0004.jpg
Enrichment of Canonical Pathways and Transcriptional Regulators in SARC Compared with Conventional UC

(A) The top ten canonical pathways dysregulated in SARC, as revealed by IPA and GSEA.

(B) The top ten upstream regulators altered in SARC, as revealed by IPA and GSEA.

(C) Expression patterns of G1/S checkpoint pathway genes in molecular subtypes of conventional UC and SARC.

(D) Expression pattern of epithelial adherens genes in molecular subtypes of conventional UC and SARC.

The dotted lines indicate the significance level. For (A) and (B), p values were calculated using Fisher’s exact test.

Immune Infiltrate

Immune checkpoint blockade is clinically active in about 15% of patients with advanced bladder cancer, in which response is associated with high tumor mutational burden (TMB), the “genomically unstable” luminal subtypes, and infiltration with activated cytotoxic T lymphocytes (Mariathasan et al., 2018; Sharma et al., 2016; Sjödahl et al., 2012). Given the relatively high mutational frequencies observed in SARCs, we characterized the patterns of immune-related gene expression in them (Figure 4). In conventional UCs, 11% (9 of 84) showed enrichment of an immune gene expression signature and clustered in the luminal molecular subtype. In contrast, 39% of the SARCs (9 of 28) demonstrated overexpression of immune signature genes and preferentially clustered in the mesenchymal double-negative subtype. The enrichment of immune gene signature in SARC compared with conventional UC was confirmed by GSEA (Figure 5A). We additionally validated these findings by calculating immune score and showing its increase predominantly in the double-negative purely mesenchymal SARC. (Figure 5B) More in-depth analysis of immune infiltrate was performed using the CIBERSORT algorithm, which provided quantitative assessment of 22 immune cell types (Figure 5C) (Newman et al., 2015). It confirmed the enrichment of immune infiltrate in SARC compared with conventional UC (Figure 5D). Most important, it identified the significant increase of CD8+ cytotoxic T lymphocytes in double-negative purely mesenchymal and the most aggressive variant of SARC (Figure 5E). We also evaluated the expression signature of immune checkpoint ligands and their receptors (Sharma and Allison, 2015), including CD70 and CD27, CD80 and CTLA4, TNFSF9 and TNFRSF18, ADA and ADORA2A, and PDCD1G2 and CD274 (Figure 5F). In general, the overexpression of these genes was present in the same group of SARCs characterized by enrichment for the overexpression of immune signature genes. Of potential therapeutic significance, SARC exhibited the enhanced expression of programmed cell death ligand PD-L1 (CD274), and mRNA overexpression of PD-L1 was observed in more than 50% of the SARCs (15 of 28) (Figure 5G). This was confirmed by immunohistochemistry, which showed strong overexpression of the PD-L1 protein (Figure 5H), suggesting that immune checkpoint therapy may be an attractive therapeutic option for a subset of SARC patients.

An external file that holds a picture, illustration, etc.
Object name is nihms-1528840-f0005.jpg
Expression Pattern of Immune Cell Infiltrate in Molecular Subtypes of Conventional UC and SARC

Top to bottom: B cell, T cell, CD8, MacTH1, and dendritic cell expression clusters. Boxed areas identify samples with enrichment of immune cell infiltrate.

An external file that holds a picture, illustration, etc.
Object name is nihms-1528840-f0006.jpg
Expression Patterns of Immune Checkpoint Genes in Conventional UC and SARC

(A) GSEA of immune cell infiltrate of SARC compared with conventional UC. A p value < 0.05 was considered statistically significant.

(B) Boxplot of immune scores calculated using the gene expression profile shown in Figure 4 for molecular subtypes of conventional UC and SARC.

(C) Heatmap of CIBERSORT scores for 22 immune cell types in molecular subtypes of conventional UC and SAR. A p value < 0.05 was considered statistically significant.

(D) Proportion of cases with significant CIBERSORT scores in conventional UC and SARC.

(E) Boxplot of CIBERSORT scores for CD8+ T cells in molecular subtypes of conventional UC and SARC.

(F) Expression of immune checkpoint genes in conventional UC and SARC in relation to their molecular subtypes. The black boxes indicate the cases with enrichment for immune profile shown in Figure 4.

(G) Boxplot of mRNA PD-L1 expression levels in conventional UC and SARC.

(H) Examples of immunohistochemical expression of PD-L1 in conventional UC and SARC. Scale bars indicate 40 μm.

For (A), p value was calculated using the standard GSEA method, for (B) and (E) using the Kruskal-Wallis test, for (D) using Fisher’s exact test, and for (G) using the Wilcoxon rank-sum test.

MicroRNA Expression Profile

Similar to the gene expression profile, microRNA (miRNA) expression levels were widely dysregulated in SARC compared with conventional UC. Nearly 200 individual miRNA species were either overexpressed or downregulated in SARC (Figure S2A). In unsupervised clustering, a small subset of differentially expressed miRNAs could bimodally separate SARC from conventional UC (Figure S2B). Although the biological significance of the miRNAs that were upregulated in SARCs is not clear, the entire miR-200 family was downregulated in the double-negative subset of SARCs (Figure S2B), and it is likely that it plays an important role in their progression. Members of the miR-200 family play well-established roles in the maintenance of epithelial phenotype by suppressing the transcriptional regulators of EMT of the SNAIL, TWIST, and ZEB families (Peinado et al., 2007).

Dysregulation of the EMT Network

Microscopically, SARCs comprise mixed lineages of purely mesenchymal cells and cells with at least partial retention of an epithelial phenotype, reflecting various degrees of EMT (Sung et al., 2013). We previously showed that TP63 controls the expression of high-molecular weight keratins (KRT5, KRT6, and KRT14) and suppresses EMT (Choi et al., 2014b; Tran et al., 2013). The central role of p63 in the maintenance of epithelial phenotype and EMT was confirmed in several variants of solid tumors (Soares and Zhou, 2018; Stacy et al., 2017; Tanaka et al., 2018). We therefore performed additional analyses to further characterize the role of EMT in SARC. Among the EMT transcriptional regulators of the SNAIL, TWIST, and ZEB families, SNAIL2 was significantly overexpressed in SARC compared with conventional UC. Correspondingly, E-cadherin (CDH1) and other epithelial markers, including claudin-1 (CLDN1) and tight junction protein 1 (TJP1), were downregulated in SARC (Figure 6A; Figures S3 and S4A). In solid tumors, the EMT permissive state may be controlled by p53 and RB1 (Jiang et al., 2011), consistent with the mutational and gene expression characteristics of SARCs. Dysregulation of several additional pathways may have a synergistic effect on the EMT permissive state, including the upregulation of TGFB1 and RhoA, which were among the top upregulated pathways in SARC by IPA and GSEA. In addition, p63 and its downstream target genes were downregulated in SARC (Figures 3A, ,3B,3B, and and6A).6A). Importantly, SARC had widespread downregulation of miRNAs involved in EMT, including miR-100, the p63 targets miR-203 and miR-205, and all members of the miR-200 family (Tucci et al., 2012) (Figure 6A; Figure S4B). Because SARCs can be divided into two subgroups (i.e., SARCs that retain epithelial marker expression, reflecting partial EMT, and SARCs that are purely mesenchymal, reflecting complete EMT), we quantified the EMT levels across multiple tumor samples by calculating EMT scores on the basis of a 76-gene signature identified by Byers et al. (2013) (Figure 6B). This provided a quantitative assessment of the epithelial versus mesenchymal phenotype; positive EMT scores corresponded to the epithelial phenotype, whereas negative EMT scores reflected the mesenchymal phenotype. In general, the conventional UCs were characterized by positive EMT scores corresponding to their epithelial phenotype, whereas the basal and double-negative mesenchymal SARCs had intermediate and low EMT scores, reflecting their partial and complete EMT states. (Figure 6C). Essentially identical results were obtained by GSEA using the 175 EMT gene signature developed by Yu et al. (2013). The enrichments of SARCs compared with conventional UCs, as well as SARCs of the basal (partial EMT) subtypes compared with double-negative (complete EMT), were significantly different (Figures S4C and S4D). Immunohistochemistry revealed that tumors within the epithelial SARC subset showed focal retention of p63 and E-cadherin, both of which are involved in the maintenance of epithelial differentiation, whereas double-negative, purely mesenchymal SARCs were negative for the epithelial markers, confirming these subtypes’ partial and complete EMT states, respectively (Figure 6D). Finally, the loss of E-cadherin expression was confirmed in selected SARC samples by western blotting (Figure 6E).

An external file that holds a picture, illustration, etc.
Object name is nihms-1528840-f0007.jpg
Dysregulation of the EMT Regulatory Network in Molecular Subtypes of Conventional UC and SARC

(A) Expression patterns of representative genes in the EMT regulatory network.

(B) EMT scores in molecular subtypes of conventional UC and SARC.

(C) Boxplot of EMT scores in molecular subtypes of conventional UC and SARC. A p value < 0.05 was considered statistically significant.

(D) Examples of immunohistochemical expression of p63 and E-cadherin in conventional UC and SARC.

(E) Western blot documenting loss of E-cadherin expression in SARC.

DISCUSSION

Bladder cancer is a major source of morbidity and mortality in the United States and worldwide. In the United States, there were approximately 17,000 deaths related to bladder cancer in 2018 (Siegel et al., 2018). Progression to SARC is associated with a high propensity for early metastasis and dismal 5-year survival rates (Amin, 2009; Lopez-Beltran et al., 1998; Sanfrancesco et al., 2016). In the following, we highlight essential findings concerning molecular characterizations of SARC and suggest how these findings may contribute to our understanding of its aggressive behavior as well as how they open therapeutic possibilities. SARCs exhibit age and male-predominant gender distribution, similar to conventional UC of the bladder (Cancer Genome Atlas Research Network, 2014; Robertson et al., 2017). The TCGA genomic profiling studies, in general, did not reveal major differences in molecular profiles of conventional UC of the bladder between men and women. Although gender-related molecular differences in patients affected by SARC cannot be completely excluded, they were not investigated, as the cohort was considered too small to effectively address this issue.

SARCs have a high overall mutation rate, similar to those of conventional UC, melanoma, and non-small-cell lung cancers (Hoadley et al., 2018). Our findings show that in SARC, these high mutation rates are associated with mutation signature 1 and that SARCs can be separated into two subgroups (i.e., one with and one without mutation signatures of an endogenous mutagenic enzyme, APOBEC cytidine deaminase). In addition, SARCs are enriched for TP53, RB1, and PIK3CA mutations compared with conventional UCs and exhibit gene expression profiles that are consistent with combined loss of TP53 and RB1 pathway activities. In contrast to conventional UCs, and consistent with their basal origins, none of the SARCs profiled here contained activating FGFR3 mutations, which are enriched in luminal UCs. The absence of mutations in chromatin-remodeling genes is surprising, as SARCs in other organs have been reported to have frequent mutations of these genes, including ARID1A, the frequently mutated chromatin-modifying gene in conventional UC (Bi et al., 2016; Jones et al., 2014). On the other hand, the expression analyses have shown that SARCs exhibit upregulation of several chromatin-remodeling genes, including RFLNB, UHRF1, and PHC2, suggesting that chromatin-remodeling gene activity might function to promote EMT, possibly as a cytoprotective mechanism that compensates for their high mutational load (Benavente et al., 2014; Gay et al., 2011; Isono et al., 2005).

SARCs exhibit a widespread change of their expression profile affecting approximately 30% of the protein-coding genome. However, much of this activity converged on pathways that control EMT, a process in which epithelial cells lose their adhesive features and develop migratory infiltrative properties of the mesenchymal cells (Brabletz et al., 2018; Thiery et al., 2009; Zeisberg and Neilson, 2009). The main features of EMT include suppression of E-cadherin and other homotypic adherence and cell polarity genes that is mediated by a group of transcriptional repressors (the SNAIL, TWIST, and ZEB families) complemented by a complex multi-layered regulatory network (Zeisberg and Neilson, 2009). In physiology, EMT is responsible for multiorgan development during the embryogenesis and maintenance or regeneration of these organs in adulthood (Thiery et al., 2009). In cancer, including those originating in the bladder, EMT may be a major contributor to the aggressive behavior responsible for invasive growth and metastasis (McConkey et al., 2009). Our quantitative assessment of EMT showed that basal and double-negative SARCs had intermediate and low EMT scores, respectively, which reflected their partial and complete EMT states, and the purely mesenchymal SARCs were the most aggressive variant of the disease. At the core of this circuitry are p53 and RB, which negatively regulate EMT in solid tumors and appear to be coordinately downregulated in a large percentage of SARCs. Activation of TGFB1 and RhoA can be viewed as a synergistic mechanism complementing the loss of p53 and RB. Another central mechanism that appears to be associated with the double-negative SARCs involves downregulation of p63, which positively regulates basal biomarker (i.e., CDH3, CD44, KRT5, KRT6, and KRT14) expression and negatively regulates EMT via miR-205 (Choi et al., 2014b; Tran et al., 2013). In fact, p63 and its target gene expression levels, many of which represent basal biomarkers, segregate SARCs into epithelial-basal and double-negative purely mesenchymal subtypes. Downregulation of the miR-200 family and other EMT regulatory miRNA species probably reinforces the mesenchymal phenotype of these tumors. Importantly, combined TP53 and RB1 pathways inactivation and upregulation of EMT appear to be characteristics of the clinically aggressive bladder cancer subset with neuroendocrine phenotype recently identified in the TCGA cohort (Robertson et al., 2017). Precisely how these EMT processes are initiated and what are the molecular mechanisms that distinguish sarcomatoid from neuroendocrine progression require further investigation.

We also found that nearly half of SARCs exhibit an expression signature associated with immune cell infiltration and significant increase of CD8+ cytotoxic lymphocytes particularly evident in double-negative mesenchymal, the most aggressive variant of SARC. These features combined with overexpression of immune checkpoint receptors and their ligands, including PD-L1, suggest opportunities for immune checkpoint therapy for a subset of patients with SARC, which are typically resistant to conventional cisplatin-based chemotherapies (Kamat et al., 2016; Lopez-Beltran et al., 1998; Sanfrancesco et al., 2016).

On the basis of our findings, we conclude that SARCs are driven by profound dysregulation of the EMT network and that a large proportion of SARCs have an immune infiltration phenotype with upregulation of PD-L1. These features present new avenues of therapeutic potential in patients with this highly lethal variant of bladder cancer.

STAR★METHODS

CONTACT FOR REAGENT AND RESOURCE SHARING

Further information and request for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Bogdan Czerniak (gro.nosrednadm@ainrezcb).

EXPERIMENTAL MODEL AND SUBJECT DETAILS

Clinical information and tissue samples

The use of human tissue samples for this study was approved by the Institutional Review Board of the UT MD Anderson Cancer Center. We identified 5,639 cases of bladder cancer, 147 of which were SARC, in a 5-year window from 2008–2013 in MD Anderson Cancer Center patient records. Most of the SARC cases were outside consultations for which paraffin blocks of tumor tissue were not available. For 52 SARC cases, formalin-fixed, paraffin-embedded (FFPE) tissue was available for additional studies. Paraffin blocks from 84 stage- and grade-matched cases of conventional UC were assembled for comparison and clinical data, including patient demographic characteristics, treatments, and outcomes, were retrieved from the patients’ medical records. UCs were classified according to the histologic tumor grading system of the World Health Organization (Grignon et al., 2016). Microscopically SARCs represented high-grade spindle cell sarcoma in 40 cases and undifferentiated pleomorphic sarcoma in 12 cases. Levels of invasion were defined according to the TNM staging system (Bochner et al., 2017). All conventional UCs and SARCs were invasive T2 and above high-grade tumors. The SARC and UC cohorts had similar age and gender distributions with a male predominance. The mean age of the SARC cohort (37 men and 15 women) was 70 years (range, 41–89 years). The mean age of the conventional UC cohort (65 men and 19 women) was 69 years (range, 33–91 years). The median follow-up times for the SARC and UC cohorts were 9.5 and 23 months, respectively. Together, the two cohorts had at least 90 patients whose deaths were cancer-related. For DNA/RNA extraction and tissue microarray construction histologic slides were reviewed to identify well-preserved tumorrich areas with minimal amounts of stroma that were marked on the corresponding paraffin blocks. Four parallel tissue samples were taken from these areas using a 1.0-mm biopsy punch (Miltex, York, PA). In those tumors, which contained conventional UC precursor lesions and SARC areas, the two components were sampled separately. Two of the tissue cylinders were used for DNA and RNA extractions for genomic profiling. The other two cylinders were used for the construction of a tissue microarray and for immunohistochemical validation analyses of selected proteins.

METHOD DETAILS

DNA and RNA extraction

Genomic DNA and total RNA were extracted from FFPE tissue samples for DNA sequencing and microarray experiments using the MasterPure Complete DNA and RNA Purification Kit (Epicenter Biotechnologies, Madison, WI) according to the manufacturer’s instructions as previously described (Guo et al., 2016). In brief, FFPE tissue cylinders were minced, deparaffinized, and digested with 300 ml Proteinase K digestion buffer with 10 ml Proteinase K (50 mg/ml; Roche Diagnostics, Mannheim, Germany) at 55 C overnight. DNA and RNA concentrations and quality were determined by an ND-1000 spectrophotometer (NanoDrop Technologies Inc., Wilmington, DE) and Quant-iT PicoGreen Kit (Life Technologies, Carlsbad, CA). Sufficient amounts of total RNA for gene expression analysis were extracted from 28 SARC and 84 conventional UC samples. In addition, sufficient amounts of genomic DNA were extracted from 13 SARC samples, including 5 samples that also contained coexistent precursor conventional UC. DNA extracted from the peripheral blood lymphocytes or normal tissue of the resection specimen from the same patient was used as a control.

QUANTIFICATION AND STATISTICAL ANALYSIS

Whole-exome sequencing and processing pipeline

Genomic DNA from 13 cases of SARC and five cases of paired conventional UC were used for whole-exome sequencing, which was performed on the HiSeq 2000 platform (Illumina, San Diego, CA) at MD Anderson Cancer Center’s Genomics Core. The TCGA data on 408 muscle-invasive conventional UCs of the bladder were used as a reference set for mutational analyses. BWA-MEM (version 0.7.12) was used to align reads to the hg19 reference genome. Samtools (version 1.4) and Picard (version 2.5.0) were used to sort and convert between formats and remove duplicate reads (Etherington et al., 2015; Li et al., 2009). The Genome Analysis Toolkit (version 3.4–46) was used to generate realigned and recalibrated BAM files (McKenna et al., 2010; Van der Auwera et al., 2013). Somatic variants relative to the normal reference sample were detected by MuTect2 (Cai et al., 2016; Callari et al., 2017). Oncotator (version 1.8.0.0) was used to produce gene-based and function-based annotations of the single nucleotide variants (SNVs) and insertions/deletions (Ramos et al., 2015). Similar analyses were performed for the genome-wide expression data from the TCGA cohort (n = 408), and tumors were assigned to specific molecular subtypes by applying the sets of luminal, basal, and p53 markers as described previously (Choi et al., 2014a). Mutational data were downloaded from the TCGA portal (https://tcga-data.nci.nih.gov/tcga/). MutSigCV (version 1.4; https://www.broadinstitute.org/cancer/cga/mutsig) was used to identify genes that were mutated more often than expected by chance given the background mutation processes (Lawrence et al., 2013). The significant gene list was obtained using a false discovery rate (FDR) cutoff of 0.05. The statistical significance of associations between the mutations and the molecular subtypes was assessed by the Fisher exact test.

Mutagenesis signatures

We used 432 SNVs identified in at least one sample and segregated them into six types of mutations corresponding to the following base pair substitutions: C > A, C > G, C > T, T > A, T > C, and T > G. The Fisher exact test was used to determine the distribution of these mutations in the three groups of samples corresponding to conventional UC in the TCGA cohort and paired UCs and SARCs in the sarcomatoid cohort. The genomic context of SNVs, referred to as fingerprints and which included the two flanking bases on the 5ʹand 3ʹ sides to each position for a total of 96 possible mutational fingerprints, was assembled. Wilcoxon rank sum tests were used to test against the hypothesis of no difference in the frequency of any fingerprint between any two groups of mucosal samples. The Benjamini and Hochberg method was applied to control the FDR. For each sample, we used its mutational fingerprints (V) and the quadratic programming method to compute a weight score (H) for each of 30 canonical mutational signatures (W) available from the Sanger Institute (http://cancer.sanger.ac.uk/cosmic/signatures). We applied the 96-by-30 matrix of canonical signatures (W) and, given the 96-by-1 mutational profile of a sample (V), we computed the 30-y-1 vector (H) for each of the canonical signatures’ relative contributions to the sample profile by solving the following optimization formula:

minH(WH  V)T(WH  V) such that hi0 and ihi=1

The optimization problems were solved using “quadprog” (version 1.5–5). The Kruskal-Wallis test was used to test against the null hypothesis of no difference in weight scores among conventional UC and paired UC and SARC.

mRNA expression and data processing

RNAs (0.25–1.0 μg) from SARCs (n = 28) and conventional UCs (n = 84) were assessed using Illumina HumanHT-12 DASL Expression BeadChips as per the manufacturer’s instructions, and the Illumina BeadStudio v3.1.3 (Gene Expression Module V3.3.8) was used for transformation and normalization of the data. Comparisons were carried out using Welch’s t tests and Benjamini-Hochberg–controlled FDR–adjusted p values (< 0.05) and fold changes. Unsupervised hierarchical clustering of log ratios was performed with Cluster 3.0, and the results were visualized with Treeview software (Stanford University, Palo Alto, CA). Pearson correlation, mean centering, and average linkage were applied in all clustering applications. Genes within 0.5 standard deviations of the log-transformed ratios were discarded. To select specific and robust gene sets associated with SARC, we used the combination analysis with the Welch t test and fold-change; genes having p values < 0.05 and showing fold-change > 2.0 were selected. IPA software (Ingenuity Systems, Redwood City, CA) was used to determine dysregulated canonical pathways and predicted upstream regulators by calculating z-scores and −log10 p values (Jiménez-Marín et al., 2009; Krämer et al., 2014). GSEA was used to evaluate the enrichment probability of the top canonical pathways and upstream regulators identified by IPA (Subramanian et al., 2005). Both SARC and UC samples were classified into luminal, and basal intrinsic molecular subtypes using an algorithm described previously (Choi et al., 2014b).

Immune gene expression signatures for SARC and conventional UC were established using unsupervised hierarchical clustering. Gene dendrogram nodes corresponding to genes characteristically expressed in specific immune cell types were identified and validated through DAVID functional annotation clustering and Ingenuity Systems Analysis (www.ingenuity.com). Immune gene signatures were used as reported previously (De Simone et al., 2016; Iglesia et al., 2014; Torri et al., 2010). The immune expression signature was quantitatively assessed by calculating the immune scores for the expression profile of 128 genes shown in Figure 4. Specifically, the immune score for the ith sample was defined as mi(1/n)j=1nmjwhere mi is the median expression level across the ith sample’s immune expression profile and (1/n)j=1nmjis the grand mean of the medians across all n samples. Additional analysis of immune infiltrate was performed by the CIBERSORT algorithm (https://cibersort.stanford.edu/runcibersort.php). The expression profile of 547 genes using normalized mRNA levels with absolute mode and default parameters was used to assess the presence of 22 immune cell types in conventional UC and SARC. (Newman et al., 2015) An empirical p value was calculated using 500 permutations to test against the null hypothesis that no cell type is enriched in each sample. Then a Fisher Exact test was used to test against the null hypothesis of no association between sample types and their statistical significance.

To quantitatively assess the level of EMT, we calculated the EMT score based on a 76-gene expression signature reported by Byers et al. (2013). For each tumor sample, the score was calculated as a weighted sum of 76 gene expression levels: i=176wiGij, where wi is the correlation coefficient between the ith gene expression in the signature and that of E-cadherin, and Gij is the ith gene’s normalized expression in the jth tumor sample. We centered the scores by subtracting the mean across all tumor samples so that the grand mean of the score was zero.

miRNA analysis

miRNA analysis was performed on 52 SARC samples and 80 conventional UC samples. For miRNA cDNA synthesis, 400 ng of total RNA was reverse-transcribed using a miRNA reverse transcription kit (Applied Biosystems; catalog no. 4366596) in combination with the stem-loop Megaplex primer pool (Applied Biosystems). For each cDNA sample, 381 small RNAs were profiled using TaqMan Human MicroRNA A Cards (Applied Biosystems; catalog no. 4398965). Fold-change for each microRNA was determined using the DCt method and examined using Welch’s t test. An adjusted p value with FDR < 0.05 was considered significant.

Validation studies

The expression levels of selected genes were validated on parallel tissue microarrays comprising FFPE samples of 84 UCs and 28 SARCs. The microarrays were designed and prepared as described previously and profiled by genomic platforms. In brief, the tissue microarrays (two 1-mm cores per case) were constructed with a manual tissue arrayer (Beecher Instruments, Silver Spring, MD). Tissue sections from the tissue microarrays were stained with hematoxylin and eosin to confirm the presence of tumor tissue. Immunohistochemical staining was performed with mouse monoclonal antibodies against human GATA3 (HG3–31 clone, 1:100 dilution; Santa Cruz Biotechnology Inc., Santa Cruz, CA), cytokeratin 5/6 (clone D5/16B4, 1:50 dilution; Dako, Carpinteria, CA), cytokeratin 14 (LL002 clone, 1:50 dilution; BioGenex, Fremont, CA), PD-L1 (clone 22C3 pharmDx without dilution, IHC assay; Dako), E-cadherin (HECD-1 clone, 1:100 dilution; Thermo Fisher Scientific), and P63 (4A4 clone, 1:200 dilution; BioCare Medical, Pacheco, CA). Immunostaining was performed using the Bond-Max Autostainer (Leica Biosystems, Buffalo Grove, IL). The staining intensity was scored by two pathologists (CCG and BAC) as negative and mildly, moderately, or strongly positive. In addition, the loss of expression of E-cadherin (4A2 clone, Cell Signaling Technology) in SARC was confirmed on selected frozen tumor samples by western blotting. In brief, whole cell extracts of tumor tissue were analyzed by immunoblotting using an anti-CDH1 antibody (4A, 1:1000 dilution; Cell Signaling Technology, Danvers, MA).

General statistical analyses

Survival analyses were performed by Kaplan–Meier analysis and log-rank testing. Welch’s t test was used for two-sample comparison, whereas the Kruskal-Wallis test was used for multiple group comparison. For genome-wide mRNA and miRNA differential expression analysis, the Benjamini and Hochberg (BH) method was applied to control the false discovery rate (FDR). An adjusted p value with FDR < 0.05 was considered statistically significant.

DATA AND SOFTWARE AVAILABILITY

Data were deposited on SRA (SRA: PRJNA525451) and GEO (GEO: {"type":"entrez-geo","attrs":{"text":"GSE128192","term_id":"128192"}}GSE128192 and GEO: {"type":"entrez-geo","attrs":{"text":"GSE128277","term_id":"128277"}}GSE128277).

CONTACT FOR REAGENT AND RESOURCE SHARING

Further information and request for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Bogdan Czerniak (gro.nosrednadm@ainrezcb).

EXPERIMENTAL MODEL AND SUBJECT DETAILS

Clinical information and tissue samples

The use of human tissue samples for this study was approved by the Institutional Review Board of the UT MD Anderson Cancer Center. We identified 5,639 cases of bladder cancer, 147 of which were SARC, in a 5-year window from 2008–2013 in MD Anderson Cancer Center patient records. Most of the SARC cases were outside consultations for which paraffin blocks of tumor tissue were not available. For 52 SARC cases, formalin-fixed, paraffin-embedded (FFPE) tissue was available for additional studies. Paraffin blocks from 84 stage- and grade-matched cases of conventional UC were assembled for comparison and clinical data, including patient demographic characteristics, treatments, and outcomes, were retrieved from the patients’ medical records. UCs were classified according to the histologic tumor grading system of the World Health Organization (Grignon et al., 2016). Microscopically SARCs represented high-grade spindle cell sarcoma in 40 cases and undifferentiated pleomorphic sarcoma in 12 cases. Levels of invasion were defined according to the TNM staging system (Bochner et al., 2017). All conventional UCs and SARCs were invasive T2 and above high-grade tumors. The SARC and UC cohorts had similar age and gender distributions with a male predominance. The mean age of the SARC cohort (37 men and 15 women) was 70 years (range, 41–89 years). The mean age of the conventional UC cohort (65 men and 19 women) was 69 years (range, 33–91 years). The median follow-up times for the SARC and UC cohorts were 9.5 and 23 months, respectively. Together, the two cohorts had at least 90 patients whose deaths were cancer-related. For DNA/RNA extraction and tissue microarray construction histologic slides were reviewed to identify well-preserved tumorrich areas with minimal amounts of stroma that were marked on the corresponding paraffin blocks. Four parallel tissue samples were taken from these areas using a 1.0-mm biopsy punch (Miltex, York, PA). In those tumors, which contained conventional UC precursor lesions and SARC areas, the two components were sampled separately. Two of the tissue cylinders were used for DNA and RNA extractions for genomic profiling. The other two cylinders were used for the construction of a tissue microarray and for immunohistochemical validation analyses of selected proteins.

Clinical information and tissue samples

The use of human tissue samples for this study was approved by the Institutional Review Board of the UT MD Anderson Cancer Center. We identified 5,639 cases of bladder cancer, 147 of which were SARC, in a 5-year window from 2008–2013 in MD Anderson Cancer Center patient records. Most of the SARC cases were outside consultations for which paraffin blocks of tumor tissue were not available. For 52 SARC cases, formalin-fixed, paraffin-embedded (FFPE) tissue was available for additional studies. Paraffin blocks from 84 stage- and grade-matched cases of conventional UC were assembled for comparison and clinical data, including patient demographic characteristics, treatments, and outcomes, were retrieved from the patients’ medical records. UCs were classified according to the histologic tumor grading system of the World Health Organization (Grignon et al., 2016). Microscopically SARCs represented high-grade spindle cell sarcoma in 40 cases and undifferentiated pleomorphic sarcoma in 12 cases. Levels of invasion were defined according to the TNM staging system (Bochner et al., 2017). All conventional UCs and SARCs were invasive T2 and above high-grade tumors. The SARC and UC cohorts had similar age and gender distributions with a male predominance. The mean age of the SARC cohort (37 men and 15 women) was 70 years (range, 41–89 years). The mean age of the conventional UC cohort (65 men and 19 women) was 69 years (range, 33–91 years). The median follow-up times for the SARC and UC cohorts were 9.5 and 23 months, respectively. Together, the two cohorts had at least 90 patients whose deaths were cancer-related. For DNA/RNA extraction and tissue microarray construction histologic slides were reviewed to identify well-preserved tumorrich areas with minimal amounts of stroma that were marked on the corresponding paraffin blocks. Four parallel tissue samples were taken from these areas using a 1.0-mm biopsy punch (Miltex, York, PA). In those tumors, which contained conventional UC precursor lesions and SARC areas, the two components were sampled separately. Two of the tissue cylinders were used for DNA and RNA extractions for genomic profiling. The other two cylinders were used for the construction of a tissue microarray and for immunohistochemical validation analyses of selected proteins.

METHOD DETAILS

DNA and RNA extraction

Genomic DNA and total RNA were extracted from FFPE tissue samples for DNA sequencing and microarray experiments using the MasterPure Complete DNA and RNA Purification Kit (Epicenter Biotechnologies, Madison, WI) according to the manufacturer’s instructions as previously described (Guo et al., 2016). In brief, FFPE tissue cylinders were minced, deparaffinized, and digested with 300 ml Proteinase K digestion buffer with 10 ml Proteinase K (50 mg/ml; Roche Diagnostics, Mannheim, Germany) at 55 C overnight. DNA and RNA concentrations and quality were determined by an ND-1000 spectrophotometer (NanoDrop Technologies Inc., Wilmington, DE) and Quant-iT PicoGreen Kit (Life Technologies, Carlsbad, CA). Sufficient amounts of total RNA for gene expression analysis were extracted from 28 SARC and 84 conventional UC samples. In addition, sufficient amounts of genomic DNA were extracted from 13 SARC samples, including 5 samples that also contained coexistent precursor conventional UC. DNA extracted from the peripheral blood lymphocytes or normal tissue of the resection specimen from the same patient was used as a control.

DNA and RNA extraction

Genomic DNA and total RNA were extracted from FFPE tissue samples for DNA sequencing and microarray experiments using the MasterPure Complete DNA and RNA Purification Kit (Epicenter Biotechnologies, Madison, WI) according to the manufacturer’s instructions as previously described (Guo et al., 2016). In brief, FFPE tissue cylinders were minced, deparaffinized, and digested with 300 ml Proteinase K digestion buffer with 10 ml Proteinase K (50 mg/ml; Roche Diagnostics, Mannheim, Germany) at 55 C overnight. DNA and RNA concentrations and quality were determined by an ND-1000 spectrophotometer (NanoDrop Technologies Inc., Wilmington, DE) and Quant-iT PicoGreen Kit (Life Technologies, Carlsbad, CA). Sufficient amounts of total RNA for gene expression analysis were extracted from 28 SARC and 84 conventional UC samples. In addition, sufficient amounts of genomic DNA were extracted from 13 SARC samples, including 5 samples that also contained coexistent precursor conventional UC. DNA extracted from the peripheral blood lymphocytes or normal tissue of the resection specimen from the same patient was used as a control.

QUANTIFICATION AND STATISTICAL ANALYSIS

Whole-exome sequencing and processing pipeline

Genomic DNA from 13 cases of SARC and five cases of paired conventional UC were used for whole-exome sequencing, which was performed on the HiSeq 2000 platform (Illumina, San Diego, CA) at MD Anderson Cancer Center’s Genomics Core. The TCGA data on 408 muscle-invasive conventional UCs of the bladder were used as a reference set for mutational analyses. BWA-MEM (version 0.7.12) was used to align reads to the hg19 reference genome. Samtools (version 1.4) and Picard (version 2.5.0) were used to sort and convert between formats and remove duplicate reads (Etherington et al., 2015; Li et al., 2009). The Genome Analysis Toolkit (version 3.4–46) was used to generate realigned and recalibrated BAM files (McKenna et al., 2010; Van der Auwera et al., 2013). Somatic variants relative to the normal reference sample were detected by MuTect2 (Cai et al., 2016; Callari et al., 2017). Oncotator (version 1.8.0.0) was used to produce gene-based and function-based annotations of the single nucleotide variants (SNVs) and insertions/deletions (Ramos et al., 2015). Similar analyses were performed for the genome-wide expression data from the TCGA cohort (n = 408), and tumors were assigned to specific molecular subtypes by applying the sets of luminal, basal, and p53 markers as described previously (Choi et al., 2014a). Mutational data were downloaded from the TCGA portal (https://tcga-data.nci.nih.gov/tcga/). MutSigCV (version 1.4; https://www.broadinstitute.org/cancer/cga/mutsig) was used to identify genes that were mutated more often than expected by chance given the background mutation processes (Lawrence et al., 2013). The significant gene list was obtained using a false discovery rate (FDR) cutoff of 0.05. The statistical significance of associations between the mutations and the molecular subtypes was assessed by the Fisher exact test.

Mutagenesis signatures

We used 432 SNVs identified in at least one sample and segregated them into six types of mutations corresponding to the following base pair substitutions: C > A, C > G, C > T, T > A, T > C, and T > G. The Fisher exact test was used to determine the distribution of these mutations in the three groups of samples corresponding to conventional UC in the TCGA cohort and paired UCs and SARCs in the sarcomatoid cohort. The genomic context of SNVs, referred to as fingerprints and which included the two flanking bases on the 5ʹand 3ʹ sides to each position for a total of 96 possible mutational fingerprints, was assembled. Wilcoxon rank sum tests were used to test against the hypothesis of no difference in the frequency of any fingerprint between any two groups of mucosal samples. The Benjamini and Hochberg method was applied to control the FDR. For each sample, we used its mutational fingerprints (V) and the quadratic programming method to compute a weight score (H) for each of 30 canonical mutational signatures (W) available from the Sanger Institute (http://cancer.sanger.ac.uk/cosmic/signatures). We applied the 96-by-30 matrix of canonical signatures (W) and, given the 96-by-1 mutational profile of a sample (V), we computed the 30-y-1 vector (H) for each of the canonical signatures’ relative contributions to the sample profile by solving the following optimization formula:

minH(WH  V)T(WH  V) such that hi0 and ihi=1

The optimization problems were solved using “quadprog” (version 1.5–5). The Kruskal-Wallis test was used to test against the null hypothesis of no difference in weight scores among conventional UC and paired UC and SARC.

mRNA expression and data processing

RNAs (0.25–1.0 μg) from SARCs (n = 28) and conventional UCs (n = 84) were assessed using Illumina HumanHT-12 DASL Expression BeadChips as per the manufacturer’s instructions, and the Illumina BeadStudio v3.1.3 (Gene Expression Module V3.3.8) was used for transformation and normalization of the data. Comparisons were carried out using Welch’s t tests and Benjamini-Hochberg–controlled FDR–adjusted p values (< 0.05) and fold changes. Unsupervised hierarchical clustering of log ratios was performed with Cluster 3.0, and the results were visualized with Treeview software (Stanford University, Palo Alto, CA). Pearson correlation, mean centering, and average linkage were applied in all clustering applications. Genes within 0.5 standard deviations of the log-transformed ratios were discarded. To select specific and robust gene sets associated with SARC, we used the combination analysis with the Welch t test and fold-change; genes having p values < 0.05 and showing fold-change > 2.0 were selected. IPA software (Ingenuity Systems, Redwood City, CA) was used to determine dysregulated canonical pathways and predicted upstream regulators by calculating z-scores and −log10 p values (Jiménez-Marín et al., 2009; Krämer et al., 2014). GSEA was used to evaluate the enrichment probability of the top canonical pathways and upstream regulators identified by IPA (Subramanian et al., 2005). Both SARC and UC samples were classified into luminal, and basal intrinsic molecular subtypes using an algorithm described previously (Choi et al., 2014b).

Immune gene expression signatures for SARC and conventional UC were established using unsupervised hierarchical clustering. Gene dendrogram nodes corresponding to genes characteristically expressed in specific immune cell types were identified and validated through DAVID functional annotation clustering and Ingenuity Systems Analysis (www.ingenuity.com). Immune gene signatures were used as reported previously (De Simone et al., 2016; Iglesia et al., 2014; Torri et al., 2010). The immune expression signature was quantitatively assessed by calculating the immune scores for the expression profile of 128 genes shown in Figure 4. Specifically, the immune score for the ith sample was defined as mi(1/n)j=1nmjwhere mi is the median expression level across the ith sample’s immune expression profile and (1/n)j=1nmjis the grand mean of the medians across all n samples. Additional analysis of immune infiltrate was performed by the CIBERSORT algorithm (https://cibersort.stanford.edu/runcibersort.php). The expression profile of 547 genes using normalized mRNA levels with absolute mode and default parameters was used to assess the presence of 22 immune cell types in conventional UC and SARC. (Newman et al., 2015) An empirical p value was calculated using 500 permutations to test against the null hypothesis that no cell type is enriched in each sample. Then a Fisher Exact test was used to test against the null hypothesis of no association between sample types and their statistical significance.

To quantitatively assess the level of EMT, we calculated the EMT score based on a 76-gene expression signature reported by Byers et al. (2013). For each tumor sample, the score was calculated as a weighted sum of 76 gene expression levels: i=176wiGij, where wi is the correlation coefficient between the ith gene expression in the signature and that of E-cadherin, and Gij is the ith gene’s normalized expression in the jth tumor sample. We centered the scores by subtracting the mean across all tumor samples so that the grand mean of the score was zero.

miRNA analysis

miRNA analysis was performed on 52 SARC samples and 80 conventional UC samples. For miRNA cDNA synthesis, 400 ng of total RNA was reverse-transcribed using a miRNA reverse transcription kit (Applied Biosystems; catalog no. 4366596) in combination with the stem-loop Megaplex primer pool (Applied Biosystems). For each cDNA sample, 381 small RNAs were profiled using TaqMan Human MicroRNA A Cards (Applied Biosystems; catalog no. 4398965). Fold-change for each microRNA was determined using the DCt method and examined using Welch’s t test. An adjusted p value with FDR < 0.05 was considered significant.

Validation studies

The expression levels of selected genes were validated on parallel tissue microarrays comprising FFPE samples of 84 UCs and 28 SARCs. The microarrays were designed and prepared as described previously and profiled by genomic platforms. In brief, the tissue microarrays (two 1-mm cores per case) were constructed with a manual tissue arrayer (Beecher Instruments, Silver Spring, MD). Tissue sections from the tissue microarrays were stained with hematoxylin and eosin to confirm the presence of tumor tissue. Immunohistochemical staining was performed with mouse monoclonal antibodies against human GATA3 (HG3–31 clone, 1:100 dilution; Santa Cruz Biotechnology Inc., Santa Cruz, CA), cytokeratin 5/6 (clone D5/16B4, 1:50 dilution; Dako, Carpinteria, CA), cytokeratin 14 (LL002 clone, 1:50 dilution; BioGenex, Fremont, CA), PD-L1 (clone 22C3 pharmDx without dilution, IHC assay; Dako), E-cadherin (HECD-1 clone, 1:100 dilution; Thermo Fisher Scientific), and P63 (4A4 clone, 1:200 dilution; BioCare Medical, Pacheco, CA). Immunostaining was performed using the Bond-Max Autostainer (Leica Biosystems, Buffalo Grove, IL). The staining intensity was scored by two pathologists (CCG and BAC) as negative and mildly, moderately, or strongly positive. In addition, the loss of expression of E-cadherin (4A2 clone, Cell Signaling Technology) in SARC was confirmed on selected frozen tumor samples by western blotting. In brief, whole cell extracts of tumor tissue were analyzed by immunoblotting using an anti-CDH1 antibody (4A, 1:1000 dilution; Cell Signaling Technology, Danvers, MA).

General statistical analyses

Survival analyses were performed by Kaplan–Meier analysis and log-rank testing. Welch’s t test was used for two-sample comparison, whereas the Kruskal-Wallis test was used for multiple group comparison. For genome-wide mRNA and miRNA differential expression analysis, the Benjamini and Hochberg (BH) method was applied to control the false discovery rate (FDR). An adjusted p value with FDR < 0.05 was considered statistically significant.

Whole-exome sequencing and processing pipeline

Genomic DNA from 13 cases of SARC and five cases of paired conventional UC were used for whole-exome sequencing, which was performed on the HiSeq 2000 platform (Illumina, San Diego, CA) at MD Anderson Cancer Center’s Genomics Core. The TCGA data on 408 muscle-invasive conventional UCs of the bladder were used as a reference set for mutational analyses. BWA-MEM (version 0.7.12) was used to align reads to the hg19 reference genome. Samtools (version 1.4) and Picard (version 2.5.0) were used to sort and convert between formats and remove duplicate reads (Etherington et al., 2015; Li et al., 2009). The Genome Analysis Toolkit (version 3.4–46) was used to generate realigned and recalibrated BAM files (McKenna et al., 2010; Van der Auwera et al., 2013). Somatic variants relative to the normal reference sample were detected by MuTect2 (Cai et al., 2016; Callari et al., 2017). Oncotator (version 1.8.0.0) was used to produce gene-based and function-based annotations of the single nucleotide variants (SNVs) and insertions/deletions (Ramos et al., 2015). Similar analyses were performed for the genome-wide expression data from the TCGA cohort (n = 408), and tumors were assigned to specific molecular subtypes by applying the sets of luminal, basal, and p53 markers as described previously (Choi et al., 2014a). Mutational data were downloaded from the TCGA portal (https://tcga-data.nci.nih.gov/tcga/). MutSigCV (version 1.4; https://www.broadinstitute.org/cancer/cga/mutsig) was used to identify genes that were mutated more often than expected by chance given the background mutation processes (Lawrence et al., 2013). The significant gene list was obtained using a false discovery rate (FDR) cutoff of 0.05. The statistical significance of associations between the mutations and the molecular subtypes was assessed by the Fisher exact test.

Mutagenesis signatures

We used 432 SNVs identified in at least one sample and segregated them into six types of mutations corresponding to the following base pair substitutions: C > A, C > G, C > T, T > A, T > C, and T > G. The Fisher exact test was used to determine the distribution of these mutations in the three groups of samples corresponding to conventional UC in the TCGA cohort and paired UCs and SARCs in the sarcomatoid cohort. The genomic context of SNVs, referred to as fingerprints and which included the two flanking bases on the 5ʹand 3ʹ sides to each position for a total of 96 possible mutational fingerprints, was assembled. Wilcoxon rank sum tests were used to test against the hypothesis of no difference in the frequency of any fingerprint between any two groups of mucosal samples. The Benjamini and Hochberg method was applied to control the FDR. For each sample, we used its mutational fingerprints (V) and the quadratic programming method to compute a weight score (H) for each of 30 canonical mutational signatures (W) available from the Sanger Institute (http://cancer.sanger.ac.uk/cosmic/signatures). We applied the 96-by-30 matrix of canonical signatures (W) and, given the 96-by-1 mutational profile of a sample (V), we computed the 30-y-1 vector (H) for each of the canonical signatures’ relative contributions to the sample profile by solving the following optimization formula:

minH(WH  V)T(WH  V) such that hi0 and ihi=1

The optimization problems were solved using “quadprog” (version 1.5–5). The Kruskal-Wallis test was used to test against the null hypothesis of no difference in weight scores among conventional UC and paired UC and SARC.

mRNA expression and data processing

RNAs (0.25–1.0 μg) from SARCs (n = 28) and conventional UCs (n = 84) were assessed using Illumina HumanHT-12 DASL Expression BeadChips as per the manufacturer’s instructions, and the Illumina BeadStudio v3.1.3 (Gene Expression Module V3.3.8) was used for transformation and normalization of the data. Comparisons were carried out using Welch’s t tests and Benjamini-Hochberg–controlled FDR–adjusted p values (< 0.05) and fold changes. Unsupervised hierarchical clustering of log ratios was performed with Cluster 3.0, and the results were visualized with Treeview software (Stanford University, Palo Alto, CA). Pearson correlation, mean centering, and average linkage were applied in all clustering applications. Genes within 0.5 standard deviations of the log-transformed ratios were discarded. To select specific and robust gene sets associated with SARC, we used the combination analysis with the Welch t test and fold-change; genes having p values < 0.05 and showing fold-change > 2.0 were selected. IPA software (Ingenuity Systems, Redwood City, CA) was used to determine dysregulated canonical pathways and predicted upstream regulators by calculating z-scores and −log10 p values (Jiménez-Marín et al., 2009; Krämer et al., 2014). GSEA was used to evaluate the enrichment probability of the top canonical pathways and upstream regulators identified by IPA (Subramanian et al., 2005). Both SARC and UC samples were classified into luminal, and basal intrinsic molecular subtypes using an algorithm described previously (Choi et al., 2014b).

Immune gene expression signatures for SARC and conventional UC were established using unsupervised hierarchical clustering. Gene dendrogram nodes corresponding to genes characteristically expressed in specific immune cell types were identified and validated through DAVID functional annotation clustering and Ingenuity Systems Analysis (www.ingenuity.com). Immune gene signatures were used as reported previously (De Simone et al., 2016; Iglesia et al., 2014; Torri et al., 2010). The immune expression signature was quantitatively assessed by calculating the immune scores for the expression profile of 128 genes shown in Figure 4. Specifically, the immune score for the ith sample was defined as mi(1/n)j=1nmjwhere mi is the median expression level across the ith sample’s immune expression profile and (1/n)j=1nmjis the grand mean of the medians across all n samples. Additional analysis of immune infiltrate was performed by the CIBERSORT algorithm (https://cibersort.stanford.edu/runcibersort.php). The expression profile of 547 genes using normalized mRNA levels with absolute mode and default parameters was used to assess the presence of 22 immune cell types in conventional UC and SARC. (Newman et al., 2015) An empirical p value was calculated using 500 permutations to test against the null hypothesis that no cell type is enriched in each sample. Then a Fisher Exact test was used to test against the null hypothesis of no association between sample types and their statistical significance.

To quantitatively assess the level of EMT, we calculated the EMT score based on a 76-gene expression signature reported by Byers et al. (2013). For each tumor sample, the score was calculated as a weighted sum of 76 gene expression levels: i=176wiGij, where wi is the correlation coefficient between the ith gene expression in the signature and that of E-cadherin, and Gij is the ith gene’s normalized expression in the jth tumor sample. We centered the scores by subtracting the mean across all tumor samples so that the grand mean of the score was zero.

miRNA analysis

miRNA analysis was performed on 52 SARC samples and 80 conventional UC samples. For miRNA cDNA synthesis, 400 ng of total RNA was reverse-transcribed using a miRNA reverse transcription kit (Applied Biosystems; catalog no. 4366596) in combination with the stem-loop Megaplex primer pool (Applied Biosystems). For each cDNA sample, 381 small RNAs were profiled using TaqMan Human MicroRNA A Cards (Applied Biosystems; catalog no. 4398965). Fold-change for each microRNA was determined using the DCt method and examined using Welch’s t test. An adjusted p value with FDR < 0.05 was considered significant.

Validation studies

The expression levels of selected genes were validated on parallel tissue microarrays comprising FFPE samples of 84 UCs and 28 SARCs. The microarrays were designed and prepared as described previously and profiled by genomic platforms. In brief, the tissue microarrays (two 1-mm cores per case) were constructed with a manual tissue arrayer (Beecher Instruments, Silver Spring, MD). Tissue sections from the tissue microarrays were stained with hematoxylin and eosin to confirm the presence of tumor tissue. Immunohistochemical staining was performed with mouse monoclonal antibodies against human GATA3 (HG3–31 clone, 1:100 dilution; Santa Cruz Biotechnology Inc., Santa Cruz, CA), cytokeratin 5/6 (clone D5/16B4, 1:50 dilution; Dako, Carpinteria, CA), cytokeratin 14 (LL002 clone, 1:50 dilution; BioGenex, Fremont, CA), PD-L1 (clone 22C3 pharmDx without dilution, IHC assay; Dako), E-cadherin (HECD-1 clone, 1:100 dilution; Thermo Fisher Scientific), and P63 (4A4 clone, 1:200 dilution; BioCare Medical, Pacheco, CA). Immunostaining was performed using the Bond-Max Autostainer (Leica Biosystems, Buffalo Grove, IL). The staining intensity was scored by two pathologists (CCG and BAC) as negative and mildly, moderately, or strongly positive. In addition, the loss of expression of E-cadherin (4A2 clone, Cell Signaling Technology) in SARC was confirmed on selected frozen tumor samples by western blotting. In brief, whole cell extracts of tumor tissue were analyzed by immunoblotting using an anti-CDH1 antibody (4A, 1:1000 dilution; Cell Signaling Technology, Danvers, MA).

General statistical analyses

Survival analyses were performed by Kaplan–Meier analysis and log-rank testing. Welch’s t test was used for two-sample comparison, whereas the Kruskal-Wallis test was used for multiple group comparison. For genome-wide mRNA and miRNA differential expression analysis, the Benjamini and Hochberg (BH) method was applied to control the false discovery rate (FDR). An adjusted p value with FDR < 0.05 was considered statistically significant.

DATA AND SOFTWARE AVAILABILITY

Data were deposited on SRA (SRA: PRJNA525451) and GEO (GEO: {"type":"entrez-geo","attrs":{"text":"GSE128192","term_id":"128192"}}GSE128192 and GEO: {"type":"entrez-geo","attrs":{"text":"GSE128277","term_id":"128277"}}GSE128277).

Supplementary Material

1

2

3

1

Click here to view.(1.0M, pdf)

2

Click here to view.(15K, xlsx)

3

Click here to view.(9.0M, pdf)

ACKNOWLEDGMENTS

This study was supported by NCI, United States, Genitourinary Bladder SPORE Grant P50CA 91846 (Project 1 and Core C to B.C.).

Department of Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
Department of Environmental Health, University of Cincinnati, Cincinnati, OH, USA
Department of Bioinformatics &amp; Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
Department of Urology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
Department of Genitourinary Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
Johns Hopkins Greenberg Bladder Cancer Institute, Johns Hopkins University, Baltimore, MD, USA
These authors contributed equally
Lead Contact

AUTHOR CONTRIBUTIONS

B.C. conceived and supervised the whole project. T.M., C.C.G., J.B., Y.W., S.Z., J.G., S.L., D.C., and D.M. performed experiments and generated data. L.Z., H.Y., Z.W., and P.W. performed data analysis. J.N.W. and K.B. supervised data analysis. C.C.G., B.C., M.Z., H.H., and V.D. reviewed and analyzed the pathology data. H.B.G., A.K., J.J.D., J.E.F., J.G., and C.D. reviewed and analyzed the clinical data. B.C. wrote the manuscript with input from the other authors.
Correspondence: gro.nosrednadm@ainrezcb
Highlights

Footnotes

SUPPLEMENTAL INFORMATION

Supplemental Information can be found online at https://doi.org/10.1016/j.celrep.2019.04.048.

DECLARATION OF INTERESTS

The authors declare no competing interests.

Footnotes

REFERENCES

REFERENCES
Collaboration tool especially designed for Life Science professionals.Drag-and-drop any entity to your messages.