TANRIC: An Interactive Open Platform to Explore the Function of lncRNAs in Cancer.
Journal: 2016/January - Cancer Research
ISSN: 1538-7445
Abstract:
Long noncoding RNAs (lncRNA) have emerged as essential players in cancer biology. Using recent large-scale RNA-seq datasets, especially those from The Cancer Genome Atlas (TCGA), we have developed "The Atlas of Noncoding RNAs in Cancer" (TANRIC; http://bioinformatics.mdanderson.org/main/TANRIC:Overview), a user-friendly, open-access web resource for interactive exploration of lncRNAs in cancer. It characterizes the expression profiles of lncRNAs in large patient cohorts of 20 cancer types, including TCGA and independent datasets (>8,000 samples overall). TANRIC enables researchers to rapidly and intuitively analyze lncRNAs of interest (annotated lncRNAs or any user-defined ones) in the context of clinical and other molecular data, both within and across tumor types. Using TANRIC, we have identified a large number of lncRNAs with potential biomedical significance, many of which show strong correlations with established therapeutic targets and biomarkers across tumor types or with drug sensitivity across cell lines. TANRIC represents a valuable tool for investigating the function and clinical relevance of lncRNAs in cancer, greatly facilitating lncRNA-related biologic discoveries and clinical applications.
Cancer Res 75(18): 3728-3737

Introduction

The human genome encodes ~20,000 protein-coding genes and also a large number of transcriptionally active noncoding RNAs (~14,000 according to the ENCODE annotation (1)). Among noncoding RNAs, long noncoding RNAs (lncRNAs), typically longer than 200 nucleotides, have increasingly been recognized as playing essential roles in tumor biology and represent a new focus in cancer research (2–4). Emerging evidence indicates that lncRNAs contribute to tumor initiation and progression through diverse mechanisms, ranging from epigenetic regulation of key cancer genes (5,6) and enhancer-associated activity (7) to post-transcriptional processing of mRNAs (8,9). Central tasks in cancer research are therefore to identify the lncRNA components involved in carcinogenesis and to elucidate their functions in specific tumor contexts. That inquiry is expected to lay the foundation for the development of novel biomarkers and therapeutic agents.

Recent RNA-seq data from large cancer patient cohorts provide an unprecedented opportunity to pursue that inquiry systematically. In particular, The Cancer Genome Atlas (TCGA) is a unique resource because it generates multidimensional data at the DNA, RNA, and protein levels for a broad range of human tumor types (10). However, several computational challenges prevent biomedical researchers from making full use of these data and prioritizing lncRNAs for further functional investigation. First, the number of lncRNAs expressed in human cancers is large; for example, a recent pan-cancer analysis revealed ~8,000 tumor-specific or lineage-specific lncRNAs (11). To prioritize lncRNAs with potential clinical relevance and elucidate their mechanisms, it is very informative to correlate lncRNA expression with clinical variables (e.g., patient survival) or with the molecular characteristics of driver genes or therapeutic targets (e.g., PTEN loss or HER2 status) across large patient cohorts. Because of the high dimensionality and complexity of the data, however, such analyses are often daunting and time-consuming. Second, the annotation of lncRNAs in the human genome remains rough, incomplete, and rapidly evolving, so researchers need to be able to query the expression profiles of user-defined lncRNAs (based on genomic coordinates). This function is not available in current lncRNA-related bioinformatics resources because it requires computing expression directly from large volumes of raw RNA-seq alignment files. Third, given lncRNA candidates of interest, it is critical to examine their profiles across a variety of cancer cell lines so that researchers can choose appropriate model systems for experimental studies. Unfortunately, efficient bioinformatics tools with these functions are still missing, which is a major barrier for the cancer research community to a systems-level understanding of the functions and underlying mechanisms of lncRNAs.

To fill this gap, we have developed The Atlas of Noncoding RNAs in Cancer (TANRIC), a user-friendly, open resource for interactive exploration of lncRNAs in the context of TCGA clinical and genomic data. Using TANRIC, we have shown that a large number of lncRNA species are differentially expressed among known tumor subtypes or correlate with clinical variables; that many lncRNAs show strong correlations with established therapeutic targets and biomarkers across tumor types or with drug sensitivity across cell lines; and that tumor subtypes defined by lncRNA-expression profiles show extensive concordance with established tumor subtypes and have potential prognostic value.

Materials and Methods

Data resource

We downloaded RNA-seq BAM files of 6,309 patient samples (including 6,083 primary tumor samples and 226 metastasis samples) across 20 TCGA cancer types, together with 564 associated non-tumor tissue samples where available (10), from the UCSC Cancer Genomics Hub (CGHub, https://cghub.ucsc.edu/). Included were bladder urothelial carcinoma (BLCA), brain lower grade glioma (LGG), breast invasive carcinoma (BRCA), cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC), colon adenocarcinoma (COAD), cutaneous melanoma (SKCM), glioblastoma multiforme (GBM), head and neck squamous cell carcinoma (HNSC), kidney chromophobe (KICH), kidney renal clear cell carcinoma (KIRC), kidney renal papillary cell carcinoma (KIRP), liver hepatocellular carcinoma (LIHC), lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), ovarian serous cystadenocarcinoma (OV), prostate adenocarcinoma (PRAD), rectum adenocarcinoma (READ), stomach adenocarcinoma (STAD), thyroid carcinoma (THCA), and uterine corpus endometrioid carcinoma (UCEC). We also downloaded 739 BAM files of Cancer Cell Line Encyclopedia (CCLE) cell lines (12) from CGHub. In addition, we obtained the RNA-seq files of 531 samples from three additional independent studies of lung adenocarcinoma (13), clear-cell renal cell carcinoma (14), and glioblastoma (15). In total, the current TANRIC release includes RNA-seq data from 8,143 samples (1,142 billion reads).

Efficient algorithm for expression quantitation of user-defined lncRNAs

To calculate the expression of a user-defined lncRNA, TANRIC accepts the genomic coordinates of one or more segments as input (e.g., for a lncRNA with 3 exons, the input could be “chr7:27135713-27136007;27138458-27138985;27139398-27139585”). The total exon length of a queried lncRNA must be shorter than 50 kb. To minimize the computation time for quantifying user-defined lncRNA expression, we preprocessed all raw BAM files in five steps: (i) extraction of sequencing depth data from the raw BAM files using SAMtools; (ii) division of the genome-wide depth data into short segments (~3,000,000 bp); (iii) merging of the data into a single file, which minimizes file input/output time when handling hundreds of samples per cancer type; (iv) compression of the merged depth files using a block compression algorithm; and (v) generation of corresponding index files for rapid location and retrieval of queried data. We quantified lncRNA expression as reads per kilobase per million mapped reads (RPKM) (16) and reported the expression profile in a dynamic table. With this preprocessing, the time required to calculate lncRNA expression was reduced by >100-fold compared with computing it directly with SAMtools. Running on a single thread, TANRIC can currently generate the expression profile of any user-defined lncRNA within a minute, which is fast enough to support rapid analysis of specified lncRNAs through a web interface.
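The query format and the per-sample quantitation described above can be sketched in Python. This is a minimal sketch, not TANRIC's implementation: it assumes 1-based inclusive coordinates, uses a plain dict as a stand-in for the indexed, block-compressed depth files, and approximates read counts as summed per-base depth divided by read length. The names `parse_query`, `segments_touched`, and `rpkm_from_depth` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

MAX_TOTAL_EXON_BP = 50_000   # total exon length must be shorter than 50 kb
SEGMENT_BP = 3_000_000       # approximate depth-segment size from the text


@dataclass(frozen=True)
class Exon:
    chrom: str
    start: int  # 1-based, inclusive (assumed convention)
    end: int    # 1-based, inclusive

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def parse_query(query: str) -> list[Exon]:
    """Parse 'chr7:START-END;START-END;...' (one chromosome, several exons)."""
    chrom, ranges = query.strip().split(":", 1)
    exons = []
    for part in ranges.split(";"):
        start, end = (int(x) for x in part.split("-"))
        if start > end:
            raise ValueError(f"start > end in segment {part!r}")
        exons.append(Exon(chrom, start, end))
    total = sum(e.length for e in exons)
    if total >= MAX_TOTAL_EXON_BP:
        raise ValueError(f"total exon length {total} bp is not below 50 kb")
    return exons


def segments_touched(exon: Exon) -> list[int]:
    """0-based indices of the ~3 Mb depth segments that cover an exon."""
    first = (exon.start - 1) // SEGMENT_BP
    last = (exon.end - 1) // SEGMENT_BP
    return list(range(first, last + 1))


def rpkm_from_depth(exons: list[Exon], depth: dict[int, int],
                    read_length: int, total_mapped_reads: int) -> float:
    """Approximate RPKM for one sample from per-base depth (hypothetical input).

    depth maps 1-based positions to coverage; missing positions count as 0.
    Reads are approximated as summed depth / read length (an assumption).
    """
    summed = sum(depth.get(pos, 0)
                 for e in exons for pos in range(e.start, e.end + 1))
    reads = summed / read_length
    exon_kb = sum(e.length for e in exons) / 1_000
    return reads / exon_kb / (total_mapped_reads / 1_000_000)
```

For the example query in the text, the three exons are 295, 528, and 188 bp long (1,011 bp in total), all within the same ~3 Mb segment, so only one indexed block per sample would need to be read. Restricting reads to the touched segments is what makes an index-based layout faster than rescanning each BAM file.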

Implementation of the TANRIC data portal

The expression data for annotated lncRNAs and the precalculated correlations with clinical and genomic data are stored in CouchDB. Correlation, differential expression, and survival analyses were performed in R. The web interface was implemented in JavaScript; tables were rendered with DataTables, embedded plots were based on HighCharts, and heat maps were generated using the Next-Generation Clustered Heat Map tool (Broom, Weinstein, et al., in preparation).

Expression quantitation of annotated lncRNAs

To perform a comprehensive survey of human lncRNAs, we obtained the genomic coordinates of 13,870 human lncRNAs from the GENCODE Resource (version 19) (1). We then removed lncRNA exons that overlapped with any known coding gene according to the GENCODE (1) and RefGene annotations, leaving 12,727 lncRNAs for analysis. We quantified lncRNA expression levels from the BAM files as RPKM and defined lncRNAs with detectable expression as those with an average RPKM ≥ 0.3 across all samples in each cancer type, following the literature (17).
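The two filters in this paragraph, removal of exons overlapping coding genes and the mean-RPKM expression cutoff, can be sketched as follows. This is an illustrative sketch under stated assumptions: intervals are `(chrom, start, end)` tuples with 1-based inclusive coordinates, a lncRNA left with no exons is dropped entirely, and the function names are hypothetical.

```python
from statistics import mean

MEAN_RPKM_CUTOFF = 0.3  # detectable-expression threshold from the text (ref. 17)


def overlaps(a, b):
    """True if two (chrom, start, end) intervals overlap (1-based, inclusive)."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def drop_coding_overlaps(lncrnas, coding_genes):
    """Remove lncRNA exons that overlap any coding gene interval.

    lncrnas maps a lncRNA name to its exon intervals; coding_genes is a list
    of gene intervals. A lncRNA left with no exons is dropped (an assumption).
    The linear scan is for clarity; an interval tree or a sorted sweep would
    be the usual choice at genome scale.
    """
    kept = {}
    for name, exons in lncrnas.items():
        remaining = [x for x in exons
                     if not any(overlaps(x, g) for g in coding_genes)]
        if remaining:
            kept[name] = remaining
    return kept


def expressed(rpkm_by_lncrna, cutoff=MEAN_RPKM_CUTOFF):
    """Names of lncRNAs whose mean RPKM across one cancer type's samples is >= cutoff."""
    return {name for name, values in rpkm_by_lncrna.items()
            if mean(values) >= cutoff}
```

Applied to the full GENCODE v19 set, a filter of this kind is what reduces 13,870 lncRNAs to the 12,727 analyzed; the expression cutoff is then applied separately within each cancer type.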

Analysis of expressed lncRNAs for biomedical significance

We obtained the clinical information associated with the tumor samples, including overall survival time, tumor stage, and tumor grade, from the Synapse TCGA Pan-Cancer data portal (https://www.synapse.org/; Synapse ID syn300013). We also obtained known tumor subtype information from the TCGA marker papers, where available. To identify lncRNAs differentially expressed between tumor and matched normal samples, we used the paired Student's t-test. To identify lncRNAs differentially expressed among established tumor subtypes or tumor stages, we used analysis of variance (ANOVA). Groups with fewer than 5 samples were excluded from these analyses.
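The test statistics behind these comparisons can be sketched in plain Python. This is an illustrative sketch, not the authors' R code: it computes only the paired t statistic and the one-way ANOVA F statistic, applying the five-sample group minimum from the text. In practice, `scipy.stats.ttest_rel` and `scipy.stats.f_oneway` return the same statistics together with P values.

```python
from math import sqrt
from statistics import mean, stdev

MIN_GROUP_SIZE = 5  # groups with fewer samples are excluded, per the text


def paired_t(tumor, normal):
    """Paired t statistic and degrees of freedom for matched tumor/normal values."""
    if len(tumor) != len(normal):
        raise ValueError("tumor and normal values must be matched pairs")
    diffs = [a - b for a, b in zip(tumor, normal)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


def anova_f(groups):
    """One-way ANOVA F statistic after dropping groups with < MIN_GROUP_SIZE samples.

    groups maps a subtype or stage label to lncRNA expression values.
    Returns (F, (df_between, df_within), labels of the groups kept).
    """
    kept = {k: v for k, v in groups.items() if len(v) >= MIN_GROUP_SIZE}
    if len(kept) < 2:
        raise ValueError("need at least two groups with enough samples")
    values = [x for v in kept.values() for x in v]
    grand = mean(values)
    k, n = len(kept), len(values)
    ss_between = sum(len(v) * (mean(v) - grand) ** 2 for v in kept.values())
    ss_within = sum((x - mean(v)) ** 2 for v in kept.values() for x in v)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f, (k - 1, n - k), sorted(kept)
```

The pairing matters: a paired test works on within-patient differences, so inter-patient variability in baseline lncRNA levels does not inflate the variance term.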

Analysis of lncRNA expression related to potential clinical applications

We obtained a list of 121 actionable target genes from Van Allen et al. (18) and added two genes that are well-established targets in immunotherapy. We downloaded TCGA molecular profiling data for these target genes, including somatic mutation, mRNA expression, miRNA expression, and somatic copy number alteration (SCNA) data, from the Synapse TCGA Pan-Cancer data portal. For each gene of interest, Student's t-tests were used to assess differences in lncRNA expression between mutated and wild-type samples, and Spearman rank correlations were used to assess relationships between lncRNA expression and SCNA or mRNA expression, with a cutoff of 0.6 on the absolute correlation coefficient. Multiple comparisons were corrected using the Benjamini-Hochberg method with an FDR cutoff of 0.05, and a fold change ≥ 2 between at least two groups was also required. To assess the effects of lncRNA expression on drug sensitivity, we downloaded drug screening data from CCLE (http://www.broadinstitute.org/ccle/home) and calculated the correlations between the expression levels of ~1,290 expressed lncRNAs and the IC50 values of 24 drugs across ~330 cell lines. Spearman rank correlations were used to detect significant correlations, with a cutoff of 0.3 on the absolute correlation coefficient.
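The correlation screen and multiple-testing correction described above can be sketched as follows. This sketch makes no claim about TANRIC's R code: Spearman's coefficient is computed as the Pearson correlation of average ranks, and the Benjamini-Hochberg step-up adjustment is implemented directly. Raw P values are taken as given (for example, from `scipy.stats.spearmanr`); the `screen` helper and its default cutoffs (|Rs| > 0.6, FDR < 0.05) are illustrative.

```python
from math import sqrt


def rank(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def benjamini_hochberg(pvalues):
    """BH-adjusted P values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for from_top, i in enumerate(order):
        k = m - from_top  # ascending 1-based rank of pvalues[i]
        running_min = min(running_min, pvalues[i] * m / k)
        adjusted[i] = running_min
    return adjusted


def screen(results, rho_cutoff=0.6, fdr_cutoff=0.05):
    """Names from (name, rho, p) tuples with |rho| > cutoff and BH q < fdr_cutoff."""
    q = benjamini_hochberg([p for _, _, p in results])
    return [name for (name, rho, _), qv in zip(results, q)
            if abs(rho) > rho_cutoff and qv < fdr_cutoff]
```

Because both criteria must hold, a pair with a tiny P value but a weak coefficient is still rejected; this guards against trivially "significant" correlations that arise from large sample sizes.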

Analysis of tumor subtypes revealed by lncRNA expression

To classify tumor subtypes based on lncRNA expression, we selected, for each cancer type, the 500 lncRNAs with the most variable expression and used ConsensusClusterPlus (19) to classify the tumor samples into clusters (subtypes). We then used the chi-squared test to assess concordance between lncRNA-expression subtypes and known subtypes, and the log-rank test to examine whether lncRNA-expression subtypes significantly correlated with overall patient survival. To understand the molecular mechanisms associated with lncRNA subtypes, we downloaded reverse-phase protein array (RPPA) data from TCPA (20). Pathway analysis was conducted as previously described (21). Briefly, the members of each pathway were predefined based on a literature search. For each component, RPPA data were median-centered and normalized by the standard deviation across all samples to obtain relative protein levels. The pathway score was then calculated as the sum of the relative protein levels of all positive regulatory components minus the corresponding sum for the negative regulatory components of that pathway. Values from antibodies targeting different phosphorylated forms of the same protein were averaged when their Pearson correlation coefficient was >0.85. We used Student's t-test or ANOVA to assess differences in pathway scores among groups, with Benjamini-Hochberg correction and an FDR cutoff of 0.05.
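The feature selection and pathway scoring steps can be sketched as below. Two choices here are assumptions rather than details given in the text: variability is measured as sample variance, and normalization uses the population standard deviation. The function names are hypothetical; the clustering step itself (ConsensusClusterPlus, an R/Bioconductor package) is not reproduced.

```python
from statistics import median, pstdev, variance


def most_variable(expr, n=500):
    """Names of the n lncRNAs with the largest sample variance (assumed measure)."""
    return sorted(expr, key=lambda name: variance(expr[name]), reverse=True)[:n]


def relative_levels(values):
    """Median-center one RPPA component across samples, then scale by its SD."""
    center = median(values)
    sd = pstdev(values)  # population SD; the text does not say which SD is used
    return [(v - center) / sd for v in values]


def pathway_scores(rppa, positive, negative):
    """Per-sample pathway score: summed relative levels of positive regulators
    minus the summed relative levels of negative regulators.

    rppa maps a protein/antibody name to its values across the same samples.
    """
    rel = {name: relative_levels(values) for name, values in rppa.items()}
    n_samples = len(next(iter(rel.values())))
    return [sum(rel[p][i] for p in positive) - sum(rel[p][i] for p in negative)
            for i in range(n_samples)]
```

Normalizing each component before summing keeps a single highly abundant protein from dominating the score, so each pathway member contributes on a comparable scale.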


Results

A user-friendly, interactive, open-access platform for exploring the function of lncRNAs in cancer

To provide a comprehensive lncRNA resource for the cancer research community, we have collected large-scale RNA-seq datasets from TCGA and other independent studies and have made the processed lncRNA expression data, together with multiple analysis and visualization modules, available through TANRIC (http://bioinformatics.mdanderson.org/main/TANRIC:Overview) (Fig. 1). The current data release, which covers 8,143 samples, has three parts (Table 1). (i) Part one consists of TCGA tissue sample sets: 6,309 tumor samples from 20 cancer types and 564 normal samples from 11 tissues. Other TCGA cancer sets will be added in the coming months. (ii) Part two consists of independent tumor tissue sample sets: one glioblastoma multiforme set (274 samples) (15), one kidney renal clear cell carcinoma set (97 samples) (14), and one lung adenocarcinoma set (83 samples) (13). Other independent sample sets will be added when available. (iii) Part three consists of tumor cell lines: 739 cell line samples from CCLE (12). To our knowledge, this is the largest publicly available collection of lncRNA data with parallel multidimensional cancer genomic data.

Figure 1. Summary of TANRIC architecture

Table 1. Summary of the data resources of the current TANRIC release

| Data source | Cancer type | Normal samples | Tumor samples | Sequencing strategy | Read length (bp) | Expressed lncRNAs* |
|---|---|---|---|---|---|---|
| TCGA | Bladder urothelial carcinoma (BLCA) | 19 | 252 | Paired-end | 48 | 1958 |
| TCGA | Brain lower grade glioma (LGG) | 0 | 486 | Paired-end | 48 | 2301 |
| TCGA | Breast invasive carcinoma (BRCA) | 105 | 837 | Paired-end | 50 | 1960 |
| TCGA | Cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC) | 3 | 196 | Paired-end | 48 | 1846 |
| TCGA | Colon adenocarcinoma (COAD) | 0 | 157 | Single-end | 76 | 714 |
| TCGA | Skin cutaneous melanoma (SKCM) | 0 | 226 | Paired-end | 48 | 1755 |
| TCGA | Glioblastoma multiforme (GBM) | 0 | 154 | Paired-end | 76 | 2369 |
| TCGA | Head and neck squamous cell carcinoma (HNSC) | 42 | 426 | Paired-end | 48 | 1357 |
| TCGA | Kidney chromophobe (KICH) | 25 | 66 | Paired-end | 48 | 1971 |
| TCGA | Kidney renal clear cell carcinoma (KIRC) | 67 | 448 | Paired-end | 50 | 2111 |
| TCGA | Kidney renal papillary cell carcinoma (KIRP) | 30 | 198 | Paired-end | 48 | 2118 |
| TCGA | Liver hepatocellular carcinoma (LIHC) | 50 | 200 | Paired-end | 48 | 1446 |
| TCGA | Lung adenocarcinoma (LUAD) | 58 | 488 | Paired-end | 48 | 2031 |
| TCGA | Lung squamous cell carcinoma (LUSC) | 17 | 220 | Paired-end | 50 | 1883 |
| TCGA | Ovarian serous cystadenocarcinoma (OV) | 0 | 412 | Paired-end | 75 | 1866 |
| TCGA | Prostate adenocarcinoma (PRAD) | 52 | 374 | Paired-end | 48 | 2010 |
| TCGA | Rectal adenocarcinoma (READ) | 0 | 71 | Single-end | 76 | 716 |
| TCGA | Stomach adenocarcinoma (STAD) | 33 | 285 | Paired-end | 75 | 1328 |
| TCGA | Thyroid carcinoma (THCA) | 59 | 497 | Paired-end | 48 | 1900 |
| TCGA | Uterine corpus endometrioid carcinoma (UCEC) | 4 | 316 | Single-end | 76 | 855 |
| CCLE | Tumor cell lines | 0 | 739 | Paired-end | 101 | 2137 |
| Independent | Chinese_GBM | 0 | 274 | Paired-end | 101 | 2419 |
| Independent | Japanese_KIRC | 0 | 97 | Paired-end | 100 | 2308 |
| Independent | Korean_LUAD | 77 | 83 | Paired-end | 101 | 2569 |

*Expressed lncRNAs are defined as those with an average RPKM ≥ 0.3 across all samples in each cancer type.

TANRIC integrates lncRNA expression data with clinical and genomic data (Fig. 1) and provides a user-friendly interface consisting of six modules: “Summary,” “Visualization,” “Download,” “My lncRNA,” “Analyze all lncRNAs,” and “lncRNAs in cell lines” (Fig. 2, i). The “Summary” module gives an overview of the RNA-seq datasets in TANRIC with a detailed description of each set (e.g., source, read length, sequencing platform, and sequencing strategy) (Fig. 2, ii). The “Visualization” module offers an innovative way to examine global patterns of lncRNA expression in a specific sample set through “next-generation clustered heat maps” (Fig. 2, iii). The interactive heat maps allow users to zoom, navigate, and drill down on clustering patterns (subtypes) of samples or lncRNAs, and they link to relevant biological information sources. The “Download” module allows users to obtain the expression data of ~13,000 annotated lncRNAs for their own analyses (Fig. 2, iv; Materials and Methods).

Figure 2. Overview of TANRIC data portal

(i) The panel of six modules; (ii) the “Summary” module; (iii) the “next-generation clustered heat map” view in the “Visualization” module; (iv) the “Download” module; (v) the three analysis modules provide raw expression data on a lncRNA of interest; (vi) the analysis modules provide clinical data analysis of lncRNAs (including differential analysis among tumor subtypes, stages and grades) and analysis of correlation with patient survival; and (vii) the analysis modules provide genomic data analysis of lncRNAs, including differential analysis between mutated and wild-type samples for a protein-coding gene of interest and analysis of correlations with SCNA, miRNA, mRNA and protein expression.

TANRIC provides three analysis modules that enable users to examine the function and underlying mechanisms of lncRNAs in a flexible, interactive way. The “My lncRNA” module provides detailed information about one lncRNA of interest in a user-specified patient sample set. With this module, users can obtain the expression data for any annotated lncRNA (Fig. 2, v) and examine whether the lncRNA is differentially expressed between tumor and normal samples or among tumor subgroups (visualized as box plots, Fig. 2, vi), or whether it correlates with patient survival time (based on P values from the univariate Cox proportional hazards model and the log-rank test, visualized as a Kaplan-Meier plot, Fig. 2, vi). The module also enables users to examine correlations of the lncRNA with various molecular data for protein-coding and/or miRNA genes, including SCNAs, mRNA expression, miRNA expression, and protein expression (visualized as scatter plots, Fig. 2, vii), as well as somatic mutations (visualized as box plots, Fig. 2, vii). For example, elevated BCAR4 expression has been shown to correlate significantly with shorter survival of breast cancer patients (22), and HOTAIR, a well-studied lncRNA, is known to be co-expressed with HOXC genes (23); both findings can easily be confirmed in TCGA cohorts through this module. Because lncRNA annotation is still rough and incomplete, the module also accepts queries for any user-defined lncRNA or isoform (based on genomic coordinates) and returns the corresponding analysis results. The “Analyze all lncRNAs” module allows users to analyze ~13,000 annotated lncRNAs in a user-specified patient sample set. With this module, users can readily identify the lncRNAs most differentially expressed among tumor subtypes or those most strongly correlated with patient survival (Fig. 2, vi). Given a known coding or miRNA gene of interest, the module also identifies the lncRNAs with the strongest associations across various types of molecular data (Fig. 2, vi). The results are presented in a table in which users can search by lncRNA name, rank the correlations, and examine the details visually. The “lncRNAs in cell lines” module provides analyses similar to those in “My lncRNA,” but for cell line sample sets, which can help users identify appropriate cell line models for functional experiments. Through the TANRIC portal, users can conveniently perform extensive analyses of lncRNAs, both within and across tumor types, and obtain publication-quality figures.
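The log-rank comparison offered by these modules can be illustrated for two groups of patients, for example samples with high versus low expression of a lncRNA (this split is hypothetical; the text does not say how TANRIC dichotomizes expression). This is a textbook two-group log-rank test, not TANRIC's own code. The P value uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so P = erfc(sqrt(chi2 / 2)).

```python
from math import erfc, sqrt


def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test: returns (chi-square statistic, P value), 1 df.

    events are 1 for an observed death and 0 for a censored observation.
    """
    data = ([(t, e, 0) for t, e in zip(times_a, events_a)]
            + [(t, e, 1) for t, e in zip(times_b, events_b)])
    observed_minus_expected = 0.0  # accumulated for group A
    variance = 0.0
    for t in sorted({t for t, e, _ in data if e}):
        at_risk = [(tt, e, g) for tt, e, g in data if tt >= t]
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == 0)
        d = sum(1 for tt, e, _ in at_risk if tt == t and e)
        d_a = sum(1 for tt, e, g in at_risk if tt == t and e and g == 0)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = observed_minus_expected ** 2 / variance
    return chi2, erfc(sqrt(chi2 / 2))
```

At each event time, the test compares the deaths observed in one group with the number expected if both groups shared the same hazard, which is why censored patients still contribute through the at-risk counts.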

A large number of lncRNAs with potential biomedical significance across cancer types

Using the data and analysis modules available in TANRIC, we performed a comprehensive survey to assess the potential biomedical significance of lncRNAs. First, for 12 TCGA cancer types with available non-tumor samples, we found large numbers of lncRNAs differentially expressed between tumor and matched normal samples (paired t-test, false discovery rate [FDR] < 0.05, fold change ≥ 2; Fig. 3a). As an independent validation, 81% of the differentially expressed lncRNAs identified in the TCGA lung adenocarcinoma set were confirmed in a Korean sample set (13). Second, for 11 TCGA cancer types with established biological or molecular subtypes (Supplementary Table 1), we found considerable numbers of lncRNAs differentially expressed among the known tumor subtypes (t-test or analysis of variance [ANOVA], FDR < 0.05, fold change ≥ 2 between at least two groups; Fig. 3b); these lncRNAs may play a role in defining tumor heterogeneity within a cancer type. Third, for 8 TCGA cancer types with sufficient samples across disease stages (stages I–IV), we identified lncRNAs whose expression patterns correlated with disease stage. Some showed a monotonic change across stages (e.g., 71 in kidney renal clear cell carcinoma [KIRC] and 41 in kidney renal papillary cell carcinoma [KIRP]; ANOVA, FDR < 0.05, fold change ≥ 2 between at least two groups; Fig. 3c), and these lncRNAs may be involved in tumor progression. Together, these three analyses reveal an abundance of lncRNAs with potential biomedical relevance, many of which are significant in more than one cancer type (Fig. 3d).
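The stage-trend classification can be sketched as follows. The text does not define “monotonic change” or how fold change is computed, so this sketch assumes strictly increasing or decreasing mean RPKM from stage I to stage IV, fold change as the ratio of the highest to the lowest stage mean, and a small pseudocount (a hypothetical choice) to avoid division by zero. The ANOVA and FDR filter would be applied separately.

```python
from statistics import mean

STAGES = ("I", "II", "III", "IV")


def stage_pattern(rpkm_by_stage, min_fold=2.0, pseudocount=0.01):
    """Classify the stage trend of one lncRNA from per-stage RPKM values.

    Returns "no change" if the highest/lowest stage-mean ratio is below
    min_fold; otherwise "increasing", "decreasing", or "non-monotonic".
    """
    means = [mean(rpkm_by_stage[s]) for s in STAGES if s in rpkm_by_stage]
    if len(means) < 2:
        raise ValueError("need at least two stages")
    fold = (max(means) + pseudocount) / (min(means) + pseudocount)
    if fold < min_fold:
        return "no change"
    steps = [b - a for a, b in zip(means, means[1:])]
    if all(step > 0 for step in steps):
        return "increasing"
    if all(step < 0 for step in steps):
        return "decreasing"
    return "non-monotonic"
```

Requiring a fold change of at least 2 between the extreme stages is equivalent to requiring it between at least two groups, since any qualifying pair implies the extremes also qualify.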

Figure 3. A large number of lncRNAs with potential biomedical significance in various cancer types

(a) The total bars represent the numbers of expressed lncRNAs; the red parts represent the numbers of lncRNAs differentially expressed between tumor and matched normal samples across tumor types. (b) The total bars represent the numbers of expressed lncRNAs; the blue parts represent the numbers of lncRNAs differentially expressed among known tumor subtypes. (c) The total bars represent the numbers of expressed lncRNAs; the green parts represent the numbers of lncRNAs differentially expressed among clinical stages, of which the light green parts represent those with a consistent increase or decrease across stages. (d) Pie chart showing the numbers of lncRNAs with biomedical significance across tumor types.

To examine the potential impact of lncRNAs on clinical practice, we focused on 123 clinically actionable genes (18). According to their clinical utility, we classified the genes into four groups: (i) therapeutic targets of FDA-approved cancer drugs; (ii) therapeutic targets of drugs in late-stage clinical trials; (iii) therapeutic targets of drugs in early-stage clinical trials; and (iv) other established diagnostic and prognostic biomarkers (Supplementary Table 2). We then examined associations between the expressed lncRNAs and the actionable genes and found considerable numbers of lncRNAs strongly associated with one or more targets in terms of (i) differential expression between samples with wild-type and mutated genes (t-test, FDR < 0.05, fold change ≥ 2); (ii) correlation with SCNAs (Spearman rank correlation |Rs| > 0.6); and (iii) correlation with mRNA expression (Spearman rank correlation |Rs| > 0.6) (Fig. 4a). Among the strongly correlated lncRNA-target pairs, many were consistently identified in multiple TCGA cancer types (Fig. 4b). These results highlight the potential of lncRNAs as regulators of key therapeutic targets with relevance to clinical practice.

An external file that holds a picture, illustration, etc.
Object name is nihms708898f4.jpg
Associations of lncRNAs with clinically actionable genes or drug sensitivity

(a) Numbers of lncRNAs for which the expressed levels are associated with an SCNA, mRNA expression, or somatic mutation of clinically actionable genes in each cancer type. (b) Numbers of lncRNA-gene pairs across multiple cancer types. The color bars represent the frequencies according to the clinical utility of actionable genes. (c) A Manhattan plot showing the correlations of lncRNA expression and drug IC50 across CCLE cell lines. Each dot represents one lncRNA-drug correlation and correlations for different drugs are shown in different colors.

To explore the potential effects of lncRNAs on drug sensitivity, we identified the expressed lncRNAs in the CCLE cell lines (12) and examined their correlations with the sensitivity data (IC50) of 24 drugs available. Interestingly, we found 202 lncRNA-drug pairs with significant correlations (Spearman rank correlation |Rs| > 0.3 and FDR < 0.01, Fig. 4c). These results suggest a critical role of some lncRNAs in affecting the response of cancer therapies.

Biological and clinical relevance of tumor subtypes revealed by lncRNA expression

Finally, we examined the clinical relevance of tumor subtypes revealed by TCGA lncRNA expression profiles. Based on the top 500 lncRNAs with the most variable expression, we defined sample subtypes (sample clusters) by ConsensusClusterPlus (19) (Materials and Methods). For each of the TCGA cancer types we studied, lncRNA-expression subtypes show extensive, strong concordance with established subtypes (chi-squared test, P < 0.05, FDR < 0.05, Fig. 5a). For example, lncRNA subtype 1 in breast cancer (BRCA) almost exclusively corresponds to the basal subtype; lncRNA subtype 5 in head and neck squamous cell carcinoma (HNSC) primarily corresponds to HPV-negative tumors; and lncRNA subtype 1 in endometrial cancer (UCEC) mainly represents the high-copy number molecular subtype (24). We next assessed the prognostic value of lncRNA-expression subtypes. For BRCA, HNSC, KIRC and brain lower grade glioma (LGG), the lncRNA-expression subtypes show distinct patient survival profiles (log-rank test, P < 0.05, Fig. 5b). As an independent validation, the three lncRNA-expression subtypes in another independent KIRC cohort (14) also show a significant correlation with the overall patient survival times (Supplementary Fig. 1). Furthermore, given clinical variables (i.e., disease stage and tumor grade), the lncRNA subtypes confer additional prognostic power in BRCA and KIRC (multivariate Cox proportional hazards model, P < 0.05).

An external file that holds a picture, illustration, etc.
Object name is nihms708898f5.jpg
lncRNA expression reveals clinically and biologically relevant tumor subtypes

(a) lncRNA-expression subtypes show extensive, strong concordance with established tumor subtypes. (b) lncRNA-expression subtypes appear to be correlated with overall patient survival times in BRCA, HNSC, KIRC and LGG. (c) Key signaling pathways are differentially expressed among tumor subtypes defined by lncRNA expression. The colors in the heatmap represent the statistical significance (FDR) of the associations between lncRNA-expression tumor subtypes and the protein-expression pathway scores.

To explore molecular mechanisms associated with the tumor subtypes defined by lncRNA expression, we examined whether some biological pathways showed some differential expression among the tumor subtypes based on pathway scores calculated from TCGA protein expression data (21). We found that the tumor subtypes defined by lncRNA expression (Fig. 5a) are associated with activation or inhibition of some pathways (Fig. 5c). These results suggest that lncRNA expression represents one meaningful dimension; therefore, integrating lncRNA expression with other molecular data may help characterize the molecular basis of human cancer more fully.

A user-friendly, interactive, open-access platform for exploring the function of lncRNAs in cancer

To provide a comprehensive lncRNA resource to the cancer research community, we have collected large-scale RNA-seq datasets from TCGA and other independent studies. We have made the processed lncRNA expression data, plus multiple analysis and visualization modules, available through TANRIC (http://bioinformatics.mdanderson.org/main/TANRIC:Overview) (Fig. 1). The current data release, which covers 8,143 samples, has three parts (Table 1). (i) Part one consists of TCGA tissue sample sets: 6,309 tumor samples from 20 cancer types and 564 normal samples from 11 tissues. Other TCGA cancer sets will be added in the coming months. (ii) Part two consists of independent tissue sample sets: one glioblastoma multiforme set (274 samples) (15), one kidney renal clear cell carcinoma set (97 samples) (14) and one lung adenocarcinoma set (83 tumor and 77 normal samples) (13). Other independent sample sets will be added when available. (iii) Part three consists of tumor cell lines: 739 cell line samples from CCLE (12). To our knowledge, this represents the largest publicly available collection of lncRNA data with parallel multidimensional cancer genomic data.
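The three parts add up to the 8,143-sample total only when the 77 normal samples of the Korean lung adenocarcinoma set (Table 1) are counted. The short tally below reproduces that arithmetic from the Table 1 counts. It is an illustrative consistency check, not part of TANRIC itself.

```python
# Sample counts copied from Table 1 (current TANRIC release).
tcga_tumor = 6309    # 20 TCGA cancer types
tcga_normal = 564    # 11 normal tissues
independent = {
    "Chinese_GBM": 274,        # tumor samples only
    "Japanese_KIRC": 97,       # tumor samples only
    "Korean_LUAD": 83 + 77,    # 83 tumor + 77 normal samples
}
ccle_cell_lines = 739

total = tcga_tumor + tcga_normal + sum(independent.values()) + ccle_cell_lines
print(f"Total samples: {total:,}")  # Total samples: 8,143
```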

Figure 1. Summary of TANRIC architecture

Table 1

Summary of the data resources of the current TANRIC release

| Data source | Cancer type | # Normal samples | # Tumor samples | Sequencing strategy | Read length (bp) | # Expressed lncRNAs* |
|---|---|---|---|---|---|---|
| TCGA | Bladder urothelial carcinoma (BLCA) | 19 | 252 | Paired-end | 48 | 1958 |
| TCGA | Brain lower grade glioma (LGG) | 0 | 486 | Paired-end | 48 | 2301 |
| TCGA | Breast invasive carcinoma (BRCA) | 105 | 837 | Paired-end | 50 | 1960 |
| TCGA | Cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC) | 3 | 196 | Paired-end | 48 | 1846 |
| TCGA | Colon adenocarcinoma (COAD) | 0 | 157 | Single-end | 76 | 714 |
| TCGA | Skin cutaneous melanoma (SKCM) | 0 | 226 | Paired-end | 48 | 1755 |
| TCGA | Glioblastoma multiforme (GBM) | 0 | 154 | Paired-end | 76 | 2369 |
| TCGA | Head and neck squamous cell carcinoma (HNSC) | 42 | 426 | Paired-end | 48 | 1357 |
| TCGA | Kidney chromophobe (KICH) | 25 | 66 | Paired-end | 48 | 1971 |
| TCGA | Kidney renal clear cell carcinoma (KIRC) | 67 | 448 | Paired-end | 50 | 2111 |
| TCGA | Kidney renal papillary cell carcinoma (KIRP) | 30 | 198 | Paired-end | 48 | 2118 |
| TCGA | Liver hepatocellular carcinoma (LIHC) | 50 | 200 | Paired-end | 48 | 1446 |
| TCGA | Lung adenocarcinoma (LUAD) | 58 | 488 | Paired-end | 48 | 2031 |
| TCGA | Lung squamous cell carcinoma (LUSC) | 17 | 220 | Paired-end | 50 | 1883 |
| TCGA | Ovarian serous cystadenocarcinoma (OV) | 0 | 412 | Paired-end | 75 | 1866 |
| TCGA | Prostate adenocarcinoma (PRAD) | 52 | 374 | Paired-end | 48 | 2010 |
| TCGA | Rectal adenocarcinoma (READ) | 0 | 71 | Single-end | 76 | 716 |
| TCGA | Stomach adenocarcinoma (STAD) | 33 | 285 | Paired-end | 75 | 1328 |
| TCGA | Thyroid carcinoma (THCA) | 59 | 497 | Paired-end | 48 | 1900 |
| TCGA | Uterine corpus endometrioid carcinoma (UCEC) | 4 | 316 | Single-end | 76 | 855 |
| CCLE | Tumor cell lines | 0 | 739 | Paired-end | 101 | 2137 |
| Independent | Chinese_GBM | 0 | 274 | Paired-end | 101 | 2419 |
| Independent | Japanese_KIRC | 0 | 97 | Paired-end | 100 | 2308 |
| Independent | Korean_LUAD | 77 | 83 | Paired-end | 101 | 2569 |

*Expressed lncRNAs are defined as those with an average RPKM ≥ 0.3 across all samples in each cancer type.
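The "expressed lncRNA" filter in the Table 1 footnote can be sketched as follows. RPKM is computed per sample (ref. 16), averaged across all samples of a cancer type, and compared with 0.3. The dictionary layout, the use of summed exon length as transcript length, and the toy numbers are illustrative assumptions. TANRIC's actual quantification pipeline is described in Materials and Methods.

```python
from statistics import mean


def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads (ref. 16)."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)


def expressed_lncrnas(counts, lengths, library_sizes, threshold=0.3):
    """Return IDs of lncRNAs whose average RPKM across all samples is >= threshold.

    counts:        lncRNA ID -> read counts, one per sample
    lengths:       lncRNA ID -> transcript length in bp (assumed: summed exon length)
    library_sizes: total mapped reads, one per sample (same order as counts)
    """
    kept = []
    for lnc_id, per_sample in counts.items():
        values = [rpkm(c, lengths[lnc_id], n) for c, n in zip(per_sample, library_sizes)]
        if mean(values) >= threshold:
            kept.append(lnc_id)
    return kept


# Hypothetical toy data: two lncRNAs measured in two samples.
counts = {"lncA": [30, 50], "lncB": [1, 0]}
lengths = {"lncA": 2000, "lncB": 1500}
library_sizes = [20_000_000, 25_000_000]
print(expressed_lncrnas(counts, lengths, library_sizes))  # ['lncA']
```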

TANRIC integrates lncRNA expression data with clinical and genomic data (Fig. 1). It provides a user-friendly interface consisting of six modules: "Summary," "Visualization," "Download," "My lncRNA," "Analyze all lncRNAs" and "lncRNAs in cell lines" (Fig. 2, i). The "Summary" module gives an overview of the RNA-seq datasets in TANRIC, with a detailed description of each set (e.g., source, read length, sequencing platform and sequencing strategy) (Fig. 2, ii). The "Visualization" module offers an innovative way to examine global patterns of lncRNA expression in a specific sample set through "next-generation clustered heat maps" (Fig. 2, iii). The interactive heat maps allow users to zoom, navigate and drill down on clustering patterns (subtypes) of samples or lncRNAs, and they link to relevant biological information sources. The "Download" module allows users to obtain the expression data of ~13,000 annotated lncRNAs for their own analyses (Fig. 2, iv; Materials and Methods).

Figure 2. Overview of TANRIC data portal

(i) The panel of six modules; (ii) the “Summary” module; (iii) the “next-generation clustered heat map” view in the “Visualization” module; (iv) the “Download” module; (v) the three analysis modules provide raw expression data on a lncRNA of interest; (vi) the analysis modules provide clinical data analysis of lncRNAs (including differential analysis among tumor subtypes, stages and grades) and analysis of correlation with patient survival; and (vii) the analysis modules provide genomic data analysis of lncRNAs, including differential analysis between mutated and wild-type samples for a protein-coding gene of interest and analysis of correlations with SCNA, miRNA, mRNA and protein expression.

TANRIC provides three analysis modules that enable users to examine the function and underlying mechanisms of lncRNAs in a flexible, interactive way. The "My lncRNA" module provides detailed information about one lncRNA of interest in a user-specified patient sample set. With this module, users can obtain the expression data for any annotated lncRNA (Fig. 2, v). They can examine whether the lncRNA is differentially expressed between tumor and normal samples or among tumor subgroups (visualized as box plots, Fig. 2, vi). They can also examine whether it correlates with patient survival time, based on P-values from the univariate Cox proportional hazards model and the log-rank test and visualized as a Kaplan-Meier plot (Fig. 2, vi). The module also lets users examine correlations of the lncRNA with various molecular data for protein-coding and/or miRNA genes. These data types include SCNAs, mRNA expression, miRNA expression and protein expression (visualized as scatter plots, Fig. 2, vii), as well as somatic mutations (visualized as box plots, Fig. 2, vii). For example, elevated BCAR4 expression has been shown to correlate significantly with shorter survival time of breast cancer patients (22). HOTAIR, a well-studied lncRNA, is known to be co-expressed with HOXC genes (23). Through this module, these findings can be readily confirmed in the TCGA cohorts. Because lncRNA annotation is still rough and incomplete, the module also accepts queries for any user-defined lncRNA or isoform (specified by genomic coordinates) and returns the same analysis results. The "Analyze all lncRNAs" module allows users to analyze the ~13,000 ENCODE-annotated lncRNAs in a user-specified patient sample set. With this module, users can easily identify the lncRNAs most differentially expressed among tumor subtypes, or those most strongly correlated with patient survival times (Fig. 2, vi).
Given a known protein-coding or miRNA gene of interest, this module helps identify the lncRNAs with the strongest associations across various types of molecular data (Fig. 2, vi). The results are presented in a table in which users can search by lncRNA name, rank the correlations and examine the details visually. The "lncRNAs in cell lines" module provides analyses similar to those in "My lncRNA," but for cell line sample sets; it can help users identify appropriate cell line models for functional experiments. Through the TANRIC portal, users can perform extensive analyses of lncRNAs, both within and across tumor types, and conveniently obtain publication-quality figures.
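The survival view described above pairs a Kaplan-Meier plot with a log-rank P-value. The sketch below makes two assumptions: patients are dichotomized at the median lncRNA expression (the text does not say how TANRIC splits patients), and two groups are compared. It computes the Kaplan-Meier steps and a two-group log-rank statistic, with the P-value taken from the chi-square distribution with 1 degree of freedom. The univariate Cox model that TANRIC also reports is not reproduced, and all data are hypothetical.

```python
import math
from statistics import median


def split_by_median(expression):
    """Indices of 'high' (> median) and 'low' (<= median) samples."""
    cut = median(expression)
    high = [i for i, x in enumerate(expression) if x > cut]
    low = [i for i, x in enumerate(expression) if x <= cut]
    return high, low


def kaplan_meier(times, events):
    """Kaplan-Meier steps [(time, survival)]; events: 1 = death, 0 = censored."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        tied = [e for tt, e in data[i:] if tt == t]  # all samples sharing time t
        deaths = sum(tied)
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= len(tied)
        i += len(tied)
    return curve


def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, P-value with 1 df)."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    o_minus_e = var = 0.0
    for t in sorted({t for t, e, _ in pooled if e}):
        n = sum(1 for tt, _, _ in pooled if tt >= t)                # at risk, both groups
        n_a = sum(1 for tt, _, g in pooled if tt >= t and g == 0)   # at risk, group a
        d = sum(1 for tt, e, _ in pooled if tt == t and e)          # deaths, both groups
        d_a = sum(1 for tt, e, g in pooled if tt == t and e and g == 0)
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if var == 0:
        return 0.0, 1.0
    stat = o_minus_e ** 2 / var
    return stat, math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square, 1 df


# Hypothetical cohort: survival months, death flags and lncRNA expression.
months = [5, 8, 12, 20, 2, 3, 4, 9]
death = [1, 1, 1, 0, 1, 1, 1, 1]
expr = [0.5, 1.0, 0.8, 0.2, 4.0, 3.5, 5.0, 2.9]
high, low = split_by_median(expr)
stat, p = logrank([months[i] for i in low], [death[i] for i in low],
                  [months[i] for i in high], [death[i] for i in high])
print(kaplan_meier([months[i] for i in high], [death[i] for i in high]))
print(round(stat, 2), round(p, 3))
```

A median split keeps both groups the same size, which is a common default. Other cut points, such as quartiles or an optimized threshold, would change the P-value.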

A large number of lncRNAs with potential biomedical significance across cancer types

Using the data and analysis modules available in TANRIC, we performed a comprehensive survey to assess the potential biomedical significance of lncRNAs. First, for 12 TCGA cancer types with available non-tumor samples, we found large numbers of lncRNAs with significant differential expression between tumor and matched normal samples (paired t-test, false discovery rate [FDR] < 0.05, fold change ≥ 2, Fig. 3a). As an independent validation, 81% of the differentially expressed lncRNAs identified in the TCGA lung adenocarcinoma set were confirmed in a Korean sample set (13). Second, for 11 TCGA cancer types with established biological or molecular subtypes (Supplementary Table 1), we found considerable numbers of lncRNAs differentially expressed among the known tumor subtypes (t-test or analysis of variance [ANOVA], FDR < 0.05, fold change ≥ 2 between at least two groups, Fig. 3b). These lncRNAs may contribute to tumor heterogeneity within a cancer type. Third, for 8 TCGA cancer types with sufficient samples across disease stages (tumor stages I–IV), we identified lncRNAs whose expression correlated with disease stage. Some of these showed a monotonic change across stages: 71 in kidney renal clear cell carcinoma [KIRC] and 41 in kidney renal papillary cell carcinoma [KIRP] (ANOVA, FDR < 0.05, fold change ≥ 2 between at least two groups, Fig. 3c). These lncRNAs may be involved in tumor progression. Together, the three analyses reveal an abundance of lncRNAs with potential biomedical relevance, many of which are significant in more than one cancer type (Fig. 3d).
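The filtering step behind Fig. 3a (FDR < 0.05 and fold change ≥ 2) can be sketched as below. It assumes that per-lncRNA P-values from the paired t-test have already been computed elsewhere, for example with `scipy.stats.ttest_rel`. It also assumes that FDR means Benjamini-Hochberg adjustment and that a two-fold change in either direction qualifies. Neither assumption is stated in this section, and the lncRNA IDs and values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values (FDR q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):              # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min             # enforce monotone q-values
    return adjusted


def differential_lncrnas(results, fdr=0.05, min_fold=2.0):
    """Keep lncRNAs with q < fdr and a >= min_fold change in either direction.

    results: lncRNA ID -> (paired t-test P-value, tumor/normal fold change)
    """
    ids = list(results)
    qvalues = benjamini_hochberg([results[k][0] for k in ids])
    kept = []
    for lnc_id, q in zip(ids, qvalues):
        fold = results[lnc_id][1]
        if q < fdr and max(fold, 1 / fold) >= min_fold:
            kept.append(lnc_id)
    return kept


# Hypothetical per-lncRNA results for one cancer type.
results = {"lncA": (0.001, 3.0), "lncB": (0.002, 1.2),
           "lncC": (0.003, 0.4), "lncD": (0.5, 4.0)}
print(differential_lncrnas(results))  # ['lncA', 'lncC']
```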

Figure 3. A large number of lncRNAs with potential biomedical significance in various cancer types

(a) The total bars represent the numbers of expressed lncRNAs; the red parts represent the numbers of lncRNAs differentially expressed between tumor and matched normal samples across tumor types. (b) The total bars represent the numbers of expressed lncRNAs; the blue parts represent the numbers of lncRNAs differentially expressed among known tumor subtypes. (c) The total bars represent the numbers of expressed lncRNAs; the green parts represent the numbers of lncRNAs differentially expressed among clinical stages, and the light green parts represent those with a consistent increase or decrease across stages. (d) Pie chart showing the numbers of lncRNAs with biomedical significance across tumor types.
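The "consistent increase or decrease across stages" pattern in panel c can be illustrated by classifying how mean expression changes from stage I to stage IV. The sketch assumes that "consistent" means strictly monotonic stage means. It also assumes that the ANOVA and fold-change screening described in the text has already selected the lncRNA. The input values are hypothetical.

```python
from statistics import mean


def stage_trend(expr_by_stage):
    """Classify the change in mean expression across ordered disease stages.

    expr_by_stage: expression values grouped by stage, ordered I, II, III, IV.
    Returns "increasing", "decreasing" or "non-monotonic".
    """
    means = [mean(values) for values in expr_by_stage]
    steps = list(zip(means, means[1:]))  # consecutive (earlier, later) pairs
    if all(later > earlier for earlier, later in steps):
        return "increasing"
    if all(later < earlier for earlier, later in steps):
        return "decreasing"
    return "non-monotonic"


# Hypothetical expression values for one lncRNA, grouped by stage I-IV.
print(stage_trend([[1, 2], [3, 3], [4, 6], [8]]))   # increasing
print(stage_trend([[9, 7], [6], [3], [1, 1]]))      # decreasing
print(stage_trend([[5], [4], [4.5], [1]]))          # non-monotonic
```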

To examine the potential impact of lncRNAs on clinical practice, we focused on 123 clinically actionable genes (18). According to their clinical utility, we classified the genes into four groups: (i) therapeutic targets of FDA-approved cancer drugs; (ii) therapeutic targets of drugs in late-stage clinical trials; (iii) therapeutic targets of drugs in early-stage clinical trials; and (iv) other established diagnostic and prognostic biomarkers (Supplementary Table 2). We then examined associations between the expressed lncRNAs and the actionable genes. We found considerable numbers of lncRNAs strongly associated with one or more targets in terms of (i) differential expression between samples with wild-type and mutated genes (t-test, FDR < 0.05, fold change ≥ 2); (ii) correlation with SCNAs (Spearman rank correlation |Rs| > 0.6); and (iii) correlation with mRNA expression (Spearman rank correlation |Rs| > 0.6) (Fig. 4a). Among the strongly associated lncRNA-target pairs, many were consistently identified in multiple TCGA cancer types (Fig. 4b). These results suggest that lncRNAs may act as regulators of key therapeutic targets, with potential relevance for clinical practice.
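The correlation criteria above rely on Spearman's rank correlation. This is the Pearson correlation computed on ranks, with tied values sharing the average of their positions. The sketch below implements that definition directly and flags genes whose profile, such as mRNA expression or SCNA values across the same samples, has |Rs| > 0.6 with a given lncRNA. The function names, the gene labels and the data are hypothetical. `scipy.stats.spearmanr` would be the usual library alternative.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                                   # extend the run of ties
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's Rs: Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return float("nan")                          # undefined for a constant profile
    return cov / math.sqrt(sxx * syy)


def strong_partners(lnc_profile, gene_profiles, cutoff=0.6):
    """Genes whose profile has |Rs| > cutoff with the lncRNA profile."""
    hits = {}
    for gene, profile in gene_profiles.items():
        rs = spearman(lnc_profile, profile)
        if abs(rs) > cutoff:                         # NaN compares False and drops out
            hits[gene] = rs
    return hits


# Hypothetical profiles across five samples; gene names are placeholders.
lnc = [1.0, 2.0, 3.0, 4.0, 5.0]
genes = {"gene_a": [2, 4, 6, 8, 10], "gene_b": [3, 1, 4, 1, 5]}
print(strong_partners(lnc, genes))   # {'gene_a': 1.0}
```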

Figure 4. Associations of lncRNAs with clinically actionable genes or drug sensitivity

(a) Numbers of lncRNAs whose expression levels are associated with an SCNA, mRNA expression or somatic mutation of clinically actionable genes in each cancer type. (b) Numbers of lncRNA-gene pairs identified across multiple cancer types. The color bars represent the frequencies according to the clinical utility of the actionable genes. (c) A Manhattan plot showing the correlations between lncRNA expression and drug IC50 across CCLE cell lines. Each dot represents one lncRNA-drug correlation; correlations for different drugs are shown in different colors.

To explore the potential effects of lncRNAs on drug sensitivity, we identified the expressed lncRNAs in the CCLE cell lines (12). We then examined their correlations with the available sensitivity data (IC50) for 24 drugs. We found 202 lncRNA-drug pairs with significant correlations (Spearman rank correlation |Rs| > 0.3 and FDR < 0.01, Fig. 4c). These results suggest that some lncRNAs may influence the response to cancer therapies.

Biological and clinical relevance of tumor subtypes revealed by lncRNA expression

Finally, we examined the clinical relevance of tumor subtypes revealed by TCGA lncRNA expression profiles. Using the 500 lncRNAs with the most variable expression, we defined sample subtypes (sample clusters) with ConsensusClusterPlus (19) (Materials and Methods). For each of the TCGA cancer types we studied, the lncRNA-expression subtypes show extensive, strong concordance with established subtypes (chi-squared test, P < 0.05, FDR < 0.05, Fig. 5a). For example, lncRNA subtype 1 in breast cancer (BRCA) corresponds almost exclusively to the basal subtype. Likewise, lncRNA subtype 5 in head and neck squamous cell carcinoma (HNSC) corresponds primarily to HPV-negative tumors, and lncRNA subtype 1 in endometrial cancer (UCEC) mainly represents the copy-number-high molecular subtype (24). We next assessed the prognostic value of the lncRNA-expression subtypes. For BRCA, HNSC, KIRC and brain lower grade glioma (LGG), the lncRNA-expression subtypes show distinct patient survival profiles (log-rank test, P < 0.05, Fig. 5b). As a validation, the three lncRNA-expression subtypes in an independent KIRC cohort (14) also correlate significantly with overall patient survival time (Supplementary Fig. 1). Furthermore, after adjustment for clinical variables (disease stage and tumor grade), the lncRNA subtypes provide additional prognostic power in BRCA and KIRC (multivariate Cox proportional hazards model, P < 0.05).
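Two steps of this analysis can be sketched outside the clustering itself. The first selects the most variable lncRNAs as clustering input. The second cross-tabulates lncRNA-expression subtypes against established subtypes for the chi-squared concordance test. The sketch assumes variance (on log-scale expression) as the variability measure, which this section does not specify. The clustering step, performed by the R/Bioconductor package ConsensusClusterPlus, is not reimplemented here. Only the Pearson chi-square statistic and its degrees of freedom are computed; a P-value would then come from a chi-square distribution with those degrees of freedom, for example via `scipy.stats.chi2.sf`. All data are hypothetical.

```python
from collections import Counter
from statistics import pvariance


def top_variable(matrix, n=500):
    """IDs of the n lncRNAs with the largest variance across samples.

    matrix: lncRNA ID -> expression values across samples (assumed log-scale).
    """
    return sorted(matrix, key=lambda k: pvariance(matrix[k]), reverse=True)[:n]


def chi_square(labels_a, labels_b):
    """Pearson chi-square statistic and degrees of freedom for a cross-tabulation."""
    observed = Counter(zip(labels_a, labels_b))      # missing cells count as 0
    rows, cols, n = Counter(labels_a), Counter(labels_b), len(labels_a)
    stat = 0.0
    for r in rows:
        for c in cols:
            expected = rows[r] * cols[c] / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat, (len(rows) - 1) * (len(cols) - 1)


# Hypothetical data.
matrix = {"lncA": [1, 1, 1], "lncB": [0, 5, 10], "lncC": [1, 2, 3]}
print(top_variable(matrix, n=2))                 # ['lncB', 'lncC']
lnc_subtype = [1, 1, 2, 2]
known_subtype = ["basal", "basal", "luminal", "luminal"]
print(chi_square(lnc_subtype, known_subtype))    # (4.0, 1)
```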

Figure 5. lncRNA expression reveals clinically and biologically relevant tumor subtypes

(a) lncRNA-expression subtypes show extensive, strong concordance with established tumor subtypes. (b) lncRNA-expression subtypes appear to be correlated with overall patient survival times in BRCA, HNSC, KIRC and LGG. (c) Key signaling pathways are differentially expressed among tumor subtypes defined by lncRNA expression. The colors in the heatmap represent the statistical significance (FDR) of the associations between lncRNA-expression tumor subtypes and the protein-expression pathway scores.

To explore molecular mechanisms associated with the tumor subtypes defined by lncRNA expression, we examined whether biological pathways were differentially activated among these subtypes. For this we used pathway scores calculated from TCGA protein expression data (21). We found that the lncRNA-expression subtypes (Fig. 5a) are associated with activation or inhibition of specific pathways (Fig. 5c). These results suggest that lncRNA expression captures a biologically meaningful dimension of tumor biology. Integrating lncRNA expression with other molecular data may therefore help characterize the molecular basis of human cancer more fully.
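As a simplified stand-in for the pathway-score procedure of ref. 21, which is not reproduced here, the sketch below first standardizes each protein across samples. It then scores each sample as the sum of z-scores of a pathway's positive members minus those of its negative members, and averages the scores within each lncRNA-expression subtype. The scoring rule, the protein names and the pathway memberships are illustrative assumptions. The statistical test behind the FDR values in Fig. 5c is not included.

```python
from statistics import mean, pstdev


def standardize(levels):
    """Per-protein z-scores across samples; constant proteins map to 0."""
    z = {}
    for protein, values in levels.items():
        m, s = mean(values), pstdev(values)
        z[protein] = [(v - m) / s if s else 0.0 for v in values]
    return z


def pathway_scores(z, positive, negative):
    """Per-sample score: sum of positive-member z-scores minus negative-member z-scores."""
    n_samples = len(next(iter(z.values())))
    return [sum(z[p][i] for p in positive) - sum(z[q][i] for q in negative)
            for i in range(n_samples)]


def mean_score_by_subtype(scores, subtypes):
    """Average pathway score within each lncRNA-expression subtype."""
    groups = {}
    for score, subtype in zip(scores, subtypes):
        groups.setdefault(subtype, []).append(score)
    return {subtype: mean(vals) for subtype, vals in groups.items()}


# Hypothetical protein levels (placeholder names) for six samples.
levels = {
    "activator_1": [1.0, 1.2, 0.9, 2.8, 3.1, 2.9],
    "activator_2": [0.5, 0.4, 0.6, 1.9, 2.2, 2.0],
    "inhibitor_1": [2.0, 2.1, 1.9, 0.8, 0.7, 0.9],
}
scores = pathway_scores(standardize(levels), ["activator_1", "activator_2"], ["inhibitor_1"])
by_subtype = mean_score_by_subtype(scores, [1, 1, 1, 2, 2, 2])
print(by_subtype)
```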

Discussion

We have developed TANRIC, a user-friendly, interactive, open-access web resource for exploring the functions and mechanisms of lncRNAs in cancer. Compared with other available lncRNA-focused bioinformatics resources (11,25–34), TANRIC has several unique features (Table 2). (i) It provides extensive, intuitive and interactive analyses of lncRNAs of interest in relation to other TCGA genomic, proteomic, epigenomic and clinical data types, both within and across tumor types. (ii) It enables users to query the expression profiles of user-defined lncRNAs quickly. (iii) It includes RNA-seq data from well-characterized cell lines and other large, non-TCGA patient cohorts, allowing users to validate a pattern of interest or identify model cell lines for experimental characterization. With its efficient analytic modules, TANRIC substantially lowers the barriers between cancer researchers and complex cancer transcriptomic data (>60 TB and 1,142 billion reads in the current release). Going forward, we will continue to incorporate newly available large-scale cancer RNA-seq data into TANRIC.

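Table 2

Comparison of TANRIC with other available lncRNA-focused bioinformatics resources

Columns are grouped as Data resource (total sample size, cancer cell lines, non-TCGA large patient cohorts), Allow user input? (user-defined lncRNAs, user-defined sample set), Clinical data analysis (survival, grade, stage, subtype) and Molecular data analysis (genomic data, proteomic data, lncRNA-expression subtype).

| Name* | Total sample size | Cancer cell lines | Non-TCGA large patient cohorts | User-defined lncRNAs | User-defined sample set | Survival | Grade | Stage | Subtype | Genomic data | Proteomic data | lncRNA-expression subtype |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TANRIC | 8143 | 739 | | | | | | | | mutation, SCNA, mRNA, miRNA | | |
| MiTranscriptome | 6938 | ~200 | | | | | | | | | | |
| lncRNAtor | 4995 | NA | | | | | | | | mRNA | | |
| lncRNABase | ~6000 | NA | | | | | | | | | | |
| ChIPBase | NA | ~80 | | | | | | | | | | |
| LNCipedia | NA | NA | | | | | | | | | | |
| lncRNAdb | NA | NA | | | | | | | | | | |
| NONCODE | NA | NA | | | | | | | | | | |
| lncRNome | NA | NA | | | | | | | | | | |
| NRED | NA | NA | | | | | | | | | | |
| DIANA-LncBase | NA | NA | | | | | | | | | | |
| lncRNADisease | NA | NA | | | | | | | | | | |

*The resources with a primary focus on cancer lncRNAs are shaded.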

We have further demonstrated the utility of TANRIC through a comprehensive pan-cancer analysis of expressed lncRNAs. Consistent with previous studies (4,11,35,36), our analysis revealed a large number of tumor-associated lncRNAs. More importantly, we report that some lncRNAs show strong correlations with established therapeutic targets across tumor types or with drug sensitivity across cell lines. Although the correlations do not necessarily indicate direct cause–effect relationships, they highlight the potential of lncRNAs as a novel class of biomarkers or therapeutic targets. TANRIC thus represents a starting point for exploration of particular lncRNA species and for generation of testable hypotheses for further experimental investigation.

Supplementary Material

Supplementary data (PDF, 334K)

Acknowledgements

We gratefully acknowledge contributions from the TCGA Research Network and its TCGA Pan-Cancer Analysis Working Group. We thank the MD Anderson high performance computing core facility for computing resources and LeeAnn Chastain for editorial assistance.

Financial Support: This study was supported by the National Institutes of Health (CA143883 to J.N.W., CA175486 to H.L., and the MD Anderson Cancer Center Support Grant P30 CA016672 to J.N.W. and H.L.); the R. Lee Clark Fellow Award from The Jeanne F. Shelby Scholarship Fund to H.L.; a grant from the Cancer Prevention and Research Institute of Texas (RP140462 to H.L.); and the Mary K. Chapman Foundation and the Lorraine Dell Program in Bioinformatics for Personalization of Cancer Medicine to J.N.W.

Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
Department of Oncology, Nanjing Medical University, Nanjing, Jiangsu, China
Department of Systems Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
The University of Texas Graduate School of Biomedical Sciences at Houston, TX, USA
Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, TX, USA
Address correspondence to Han Liang, Ph.D., Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA. Tel: +1 713-745-9815; Fax: +1 713-563-4242; E-mail: hliang1@mdanderson.org
These authors contributed equally to this work

Abstract

Long noncoding RNAs (lncRNAs) have emerged as essential players in cancer biology. Using recent large-scale RNA-seq datasets, especially those from The Cancer Genome Atlas (TCGA), we have developed “The Atlas of Noncoding RNAs in Cancer” (TANRIC, http://bioinformatics.mdanderson.org/main/TANRIC:Overview), a user-friendly, open-access web resource for interactive exploration of lncRNAs in cancer. It characterizes the expression profiles of lncRNAs in large patient cohorts of 20 cancer types, including TCGA and independent data sets (>8,000 samples overall). TANRIC enables researchers to rapidly and intuitively analyze lncRNAs of interest (annotated lncRNAs or any user-defined ones) in the context of clinical and other molecular data, both within and across tumor types. Using TANRIC, we have identified a large number of lncRNAs with potential biomedical significance, many of which show strong correlations with established therapeutic targets and biomarkers across tumor types or with drug sensitivity across cell lines. TANRIC represents a valuable tool for investigating the function and clinical relevance of lncRNAs in cancer, greatly facilitating lncRNA-related biological discoveries and clinical applications.

Keywords: lncRNA expression, TCGA, cancer genomics, bioinformatics resource, prognostic marker
Footnotes

Conflict of Interest: The authors declare no conflict of interest related to this work.

References

  • 1. Djebali S, Davis CA, Merkel A, Dobin A, Lassmann T, Mortazavi A, et al. Landscape of transcription in human cells. Nature. 2012;489:101–108.
  • 2. Wapinski O, Chang HY. Long noncoding RNAs and human disease. Trends Cell Biol. 2011;21:354–361.
  • 3. Prensner JR, Chinnaiyan AM. The emergence of lncRNAs in cancer biology. Cancer Discov. 2011;1:391–407.
  • 4. Du Z, Fei T, Verhaak RG, Su Z, Zhang Y, Brown M, et al. Integrative genomic analyses reveal clinically relevant long noncoding RNAs in human cancer. Nat Struct Mol Biol. 2013;20:908–913.
  • 5. Yap KL, Li S, Munoz-Cabello AM, Raguz S, Zeng L, Mujtaba S, et al. Molecular interplay of the noncoding RNA ANRIL and methylated histone H3 lysine 27 by polycomb CBX7 in transcriptional silencing of INK4a. Mol Cell. 2010;38:662–674.
  • 6. Rinn JL, Kertesz M, Wang JK, Squazzo SL, Xu X, Brugmann SA, et al. Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell. 2007;129:1311–1323.
  • 7. Wang D, Garcia-Bassets I, Benner C, Li W, Su X, Zhou Y, et al. Reprogramming transcription by distinct classes of enhancers functionally defined by eRNA. Nature. 2011;474:390–394.
  • 8. Bernard D, Prasanth KV, Tripathi V, Colasse S, Nakamura T, Xuan Z, et al. A long nuclear-retained non-coding RNA regulates synaptogenesis by modulating gene expression. EMBO J. 2010;29:3082–3093.
  • 9. Tripathi V, Ellis JD, Shen Z, Song DY, Pan Q, Watt AT, et al. The nuclear-retained noncoding RNA MALAT1 regulates alternative splicing by modulating SR splicing factor phosphorylation. Mol Cell. 2010;39:925–938.
  • 10. Cancer Genome Atlas Research Network, Weinstein JN, Collisson EA, Mills GB, Shaw KR, Ozenberger BA, et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet. 2013;45:1113–1120.
  • 11. Iyer MK, Niknafs YS, Malik R, Singhal U, Sahu A, Hosono Y, et al. The landscape of long noncoding RNAs in the human transcriptome. Nat Genet. 2015.
  • 12. Barretina J, Caponigro G, Stransky N, Venkatesan K, Margolin AA, Kim S, et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature. 2012;483:603–607.
  • 13. Seo JS, Ju YS, Lee WC, Shin JY, Lee JK, Bleazard T, et al. The transcriptional landscape and mutational profile of lung adenocarcinoma. Genome Res. 2012;22:2109–2119.
  • 14. Sato Y, Yoshizato T, Shiraishi Y, Maekawa S, Okuno Y, Kamura T, et al. Integrated molecular analysis of clear-cell renal cell carcinoma. Nat Genet. 2013;45:860–867.
  • 15. Bao ZS, Chen HM, Yang MY, Zhang CB, Yu K, Ye WL, et al. RNA-seq of 272 gliomas revealed a novel, recurrent PTPRZ1-MET fusion transcript in secondary glioblastomas. Genome Res. 2014.
  • 16. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008;5:621–628.
  • 17. Han L, Yuan Y, Zheng S, Yang Y, Li J, Edgerton ME, et al. The Pan-Cancer analysis of pseudogene expression reveals biologically and clinically relevant tumour subtypes. Nat Commun. 2014;5:3963.
  • 18. Van Allen EM, Wagle N, Stojanov P, Perrin DL, Cibulskis K, Marlow S, et al. Whole-exome sequencing and clinical interpretation of formalin-fixed, paraffin-embedded tumor samples to guide precision cancer medicine. Nat Med. 2014;20:682–688.
  • 19. Wilkerson MD, Hayes DN. ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking. Bioinformatics. 2010;26:1572–1573.
  • 20. Li J, Lu Y, Akbani R, Ju Z, Roebuck PL, Liu W, et al. TCPA: a resource for cancer functional proteomics data. Nat Methods. 2013;10:1046–1047.
  • 21. Akbani R, Ng PK, Werner HM, Shahmoradgoli M, Zhang F, Ju Z, et al. A pan-cancer proteomic perspective on The Cancer Genome Atlas. Nat Commun. 2014;5:3887.
  • 22. Xing Z, Lin A, Li C, Liang K, Wang S, Liu Y, et al. lncRNA directs cooperative epigenetic regulation downstream of chemokine signals. Cell. 2014;159:1110–1125.
  • 23. Gupta RA, Shah N, Wang KC, Kim J, Horlings HM, Wong DJ, et al. Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature. 2010;464:1071–1076.
  • 24. Cancer Genome Atlas Research Network. Integrated genomic characterization of endometrial carcinoma. Nature. 2013;497:67–73.
  • 25. Bhartiya D, Pal K, Ghosh S, Kapoor S, Jalali S, Panwar B, et al. lncRNome: a comprehensive knowledgebase of human long noncoding RNAs. Database (Oxford). 2013;2013:bat034.
  • 26. Chen G, Wang Z, Wang D, Qiu C, Liu M, Chen X, et al. LncRNADisease: a database for long-non-coding RNA-associated diseases. Nucleic Acids Res. 2013;41:D983–D986.
  • 27. Dinger ME, Pang KC, Mercer TR, Crowe ML, Grimmond SM, Mattick JS. NRED: a database of long noncoding RNA expression. Nucleic Acids Res. 2009;37:D122–D126.
  • 28. Li JH, Liu S, Zhou H, Qu LH, Yang JH. starBase v2.0: decoding miRNA-ceRNA, miRNA-ncRNA and protein-RNA interaction networks from large-scale CLIP-Seq data. Nucleic Acids Res. 2014;42:D92–D97.
  • 29. Paraskevopoulou MD, Georgakilas G, Kostoulas N, Reczko M, Maragkakis M, Dalamagas TM, et al. DIANA-LncBase: experimentally verified and computationally predicted microRNA targets on long non-coding RNAs. Nucleic Acids Res. 2013;41:D239–D245.
  • 30. Park C, Yu N, Choi I, Kim W, Lee S. lncRNAtor: a comprehensive resource for functional investigation of long non-coding RNAs. Bioinformatics. 2014;30:2480–2485.
  • 31. Quek XC, Thomson DW, Maag JL, Bartonicek N, Signal B, Clark MB, et al. lncRNAdb v2.0: expanding the reference database for functional long noncoding RNAs. Nucleic Acids Res. 2015;43:D168–D173.
  • 32. Volders PJ, Verheggen K, Menschaert G, Vandepoele K, Martens L, Vandesompele J, et al. An update on LNCipedia: a database for annotated human lncRNA sequences. Nucleic Acids Res. 2015;43:D174–D180.
  • 33. Xie C, Yuan J, Li H, Li M, Zhao G, Bu D, et al. NONCODEv4: exploring the world of long non-coding RNA genes. Nucleic Acids Res. 2014;42:D98–D103.
  • 34. Yang JH, Li JH, Jiang S, Zhou H, Qu LH. ChIPBase: a database for decoding the transcriptional regulation of long non-coding RNA and microRNA genes from ChIP-Seq data. Nucleic Acids Res. 2013;41:D177–D187.
  • 35. Prensner JR, Iyer MK, Balbin OA, Dhanasekaran SM, Cao Q, Brenner JC, et al. Transcriptome sequencing across a prostate cancer cohort identifies PCAT-1, an unannotated lincRNA implicated in disease progression. Nat Biotechnol. 2011;29:742–749.
  • 36. White NM, Cabanski CR, Silva-Fisher JM, Dang HX, Govindan R, Maher CA. Transcriptome sequencing reveals altered long intergenic non-coding RNAs in lung cancer. Genome Biol. 2014;15:429.