Lectin chromatography/mass spectrometry discovery workflow identifies putative biomarkers of aggressive breast cancers.
Journal: 2012/September - Journal of Proteome Research
ISSN: 1535-3907
Abstract:
We used a lectin chromatography/MS-based approach to screen conditioned medium from a panel of luminal (less aggressive) and triple negative (more aggressive) breast cancer cell lines (n=5/subtype). The samples were fractionated using the lectins Aleuria aurantia (AAL) and Sambucus nigra agglutinin (SNA), which recognize fucose and sialic acid, respectively. The bound fractions were enzymatically N-deglycosylated and analyzed by LC-MS/MS. In total, we identified 533 glycoproteins, ∼90% of which were components of the cell surface or extracellular matrix. We observed 1011 glycosites, 100 of which were solely detected in ≥3 triple negative lines. Statistical analyses suggested that a number of these glycosites were triple negative-specific and thus potential biomarkers for this tumor subtype. An analysis of RNAseq data revealed that approximately half of the mRNAs encoding the protein scaffolds that carried potential biomarker glycosites were up-regulated in triple negative vs luminal cell lines, and that a number of genes encoding fucosyl- or sialyltransferases were differentially expressed between the two subtypes, suggesting that alterations in glycosylation may also drive candidate identification. Notably, the glycoproteins from which these putative biomarker candidates were derived are involved in cancer-related processes. Thus, they may represent novel therapeutic targets for this aggressive tumor subtype.
J Proteome Res 11(4): 2508-2520

A lectin chromatography/mass spectrometry discovery workflow identifies putative biomarkers of aggressive breast cancers


Introduction

The intense interest in biomarker discovery reflects the clinical need for tests with high sensitivity and specificity for diagnosing diseases, predicting their courses, and monitoring responses to therapy and disease recurrence. Technological breakthroughs in separation strategies and mass spectrometry (MS) have enabled rapid identification and quantification of large numbers of proteins in biological samples 1. Nonetheless, the complexity of these samples requires extensive fractionation to access low abundance proteins, such as those released from nascent tumors. An alternative, technically less challenging approach is to design capture strategies that exploit disease biology for the purpose of biomarker identification 2. For many reasons, glycosylation is an attractive target. First, the biology allows for the rational design of discovery efforts. For example, changes in the glycosylation machinery can be identified from microarray data and translated into structural terms, providing a compelling rationale for designing lectin-based strategies to enrich glycopeptides carrying disease-related carbohydrate motifs. Second, one protein can carry many copies of an altered glycan, which may also be added to other scaffolds. Thus, there is an important amplification effect, which could enable the detection of many fewer abnormal cells than would otherwise be possible. Finally, glycosylation acts to shield the peptide backbone from proteolytic degradation 3. Thus, in theory, glycan-based biomarkers are likely to be more stable in a variety of disease settings than unmodified proteins, which are often more labile.

Glycosylation is altered in a number of pathologies, but its relationship to cancer is particularly well-defined at phenotypic and, to a lesser degree, functional levels. For example, many of the most widely used clinical tests detect glycoproteins and carbohydrate structures. These include carcinoembryonic antigen (CEA), commonly used as a marker of colorectal cancer; CA-125, frequently employed to diagnose ovarian cancer; CA 19-9, the most commonly used biomarker for diagnosing pancreatic cancer; CA 15-3, used to monitor the metastasis of breast cancer 4; and prostate-specific antigen (PSA) 5–8. In addition, glycan-specific antibodies and lectins are used for the cytological and histological evaluation of glycosylation for the purpose of guiding diagnoses and enabling more accurate prognoses, e.g., anti-Lewis (Le) antibodies for bladder cancer, and the lectins Helix pomatia agglutinin (HPA) and Ulex europaeus I agglutinin (UEA 1) for breast cancer 1. This is because increases in fucosylation and sialylation of N-linked structures and truncation of O-linked oligosaccharides occur in many tumor types. The expression of Le antigens, such as sialyl Le, can also be indicative of disease progression, as these structures play important roles in promoting metastasis by virtue of their well-known ability to mediate cell trafficking and extravasation 9, 10.

Breast cancer is now recognized to be a collection of distinct neoplastic diseases with different molecular and clinical attributes. Breast tumors can be stratified into five intrinsic subtypes and a “normal-like” group according to features such as mRNA expression 11. Interestingly, these molecularly-defined cohorts, which include luminal, basal-like, and claudin-low, are also predictive of clinical outcomes such as disease severity and treatment response 12–14. Specifically, luminal tumors tend to be less aggressive with better survival rates, while basal-like and claudin-low lesions have generally worse prognoses 15. Additionally, the expression of a therapeutic target such as the estrogen receptor (ER) or human epidermal growth factor receptor 2 (HER2/ErbB2) determines tumor susceptibility to drugs that interact with these molecules 16, 17. Triple negative breast cancers (TNBC) express neither ER nor the progesterone receptor (PR) and moderate levels of HER2. This clinically important, heterogeneous category includes most basal-like and claudin-low tumors 18, 19. TNBCs have poor survival rates and lack specific therapeutic targets, limiting treatment options and making early detection a priority.

We hypothesized that biomarkers specific for these tumors could be identified by a comparative analysis of the repertoire of secreted or shed glycoproteins in a panel of breast cancer cell lines that have been extensively characterized at genomic and transcriptional levels 20–22. Based on gene expression, the lines can be clustered into subsets that mirror the molecular characteristics of primary breast tumors. Thus, these panels are useful tools for studying subtype-specific behavior, such as drug responses and alternative splicing 20, 23. Here, we used a subset of cells from this collection for biomarker discovery. Specifically, we analyzed conditioned medium (CM) from 5 luminal and 5 triple negative cell lines. The samples were distributed to three laboratories: University of California San Francisco (UCSF), the Buck Institute for Research on Aging, and Purdue University. Each group analyzed the samples using our recently published method for lectin affinity chromatographic enrichment and LC-MS/MS analyses 24. Overall, we identified 533 glycoproteins, including 1011 N-linked glycosylation sites (glycosites). Of these, 100 were solely detected in ≥3 triple negative lines. Interestingly, many in the latter category were from glycoproteins that are upregulated in the claudin-low subtype 21, involved in cancer progression (e.g., epithelial to mesenchymal transition) and/or metastasis 25.

Materials and Methods

Cell culture and production of conditioned media

All cells were cultured as described in Neve et al. 21. To generate the CM, we cultured 10 breast cancer cell lines (Table 1) that were derived from 5 luminal (SKBR3, SUM52 PE, MDAMB175, UACC 812, and MDAMB361) and 5 triple negative tumors (MDA468, BT549, HS578T, MDAMB231, and HCC38). CM was prepared and trypsin digested at Site M. The lines were grown to 75–80% confluence in the appropriate culture medium 21. Then they were washed with fresh medium without fetal calf serum (FCS) or phenol red and incubated for 10 min at 37 °C. This process was repeated twice before the cells were incubated in fresh medium (without FCS and phenol red) for 18–20 h. At the end of the culture period, the cells retained their original morphologies with no evidence of apoptosis. The CM was harvested and centrifuged at 2000 × g for 10 min. The supernatant was concentrated using Millipore centrifugal filter units (MWCO 3K) and dialyzed against phosphate buffered saline (PBS).

Table 1

Luminal and triple-negative breast cancer cell lines.

Cell line (a) | Tumor subtype | ER (b) | PR (c) | HER2 (d) | Diagnosis (e)
SKBR3         | Luminal       |        |        | Yes      | Adenocarcinoma
SUM52 PE      | Luminal       | +      |        | No       | Carcinoma
MDAMB175      | Luminal       | +      |        | No       | IDC
UACC 812      | Luminal       | +      |        | Yes      | IDC
MDAMB361      | Luminal       | +      |        | Yes      | Adenocarcinoma
MDA468        | Basal A       |        |        | No       | Adenocarcinoma
BT549         | Claudin-low   |        |        | No       | IDC, papillary
HS578T        | Claudin-low   |        |        | No       | IDC
MDAMB231      | Claudin-low   |        |        | No       | Adenocarcinoma
HCC38         | Claudin-low   |        |        | No       | Ductal carcinoma
(a) This table is populated with information from Neve et al., 2006.
(b, c) Estrogen (ER) or progesterone receptor (PR) expression.
(d) Human epidermal growth factor receptor 2 (HER2) overexpression.
(e) Invasive ductal carcinoma (IDC).

Lectin blotting and staining

Biotinylated and fluoresceinated lectins were purchased from Vector Laboratories. Blotting: Cell lysates were separated by SDS-PAGE (4–12% gels) and transferred to nitrocellulose membranes. Unless otherwise indicated, the following buffer was used for all steps, including blocking, washing, and reagent dilution/incubation: 0.25 M Tris-Cl, pH 8.0, 0.5 M NaCl, 0.5% NP-40. Blots were incubated in buffer for 1 h to block non-specific binding, then exposed to a solution of ~5 μg/mL of biotinylated lectin for 2 h. Blots were washed 3 × 5 min with copious amounts of buffer. Then, membranes were reacted with ABC reagent (Vector Laboratories) for 1 h and washed again as before. Finally, bound lectin was detected using 3,3-diaminobenzidine (DAB, Vector Laboratories) prepared in water according to the manufacturer’s instructions. Staining: cell surface labeling of non-permeabilized cells was performed as described 26, except that fluoresceinated lectins, rather than antibodies, were used.

Trypsin digestion

First, protein concentrations of the CM samples were determined by amino acid analysis. Then, CM samples were digested and desalted using a published method that incorporates denaturation with 6 M urea 27. As previously described 24, samples were spiked with 25 and 50 pmol of trypsin-digested control glycopeptides from commercial yeast invertase and human lactoferrin (Sigma, St. Louis, MO), respectively. Peptides were stored at −80 °C prior to analyses.

Preparation of lectin columns

The columns were prepared at Site M from a single batch of lectin-conjugated beads and distributed to all the laboratories. Briefly, Sambucus nigra agglutinin (SNA) and Aleuria aurantia lectin (AAL) were purchased from Vector Laboratories (Burlingame, CA). Lectins (20 mg) were suspended at 5–10 mg/mL in PBS and conjugated to 330 mg of POROS-AL beads (Applied Biosystems, Foster City, CA) as previously described 24. Unconjugated protein was removed by washing the beads (5 × 5 mL of 1 M sodium chloride) before they were packed into 3 individual 4.6 × 50 mm PEEK HPLC columns. Routine storage was in PBS with 0.02% sodium azide at 4 °C for up to 6 months. Columns were reused for up to 75 affinity separations without degradation of the performance characteristics as assessed by glycopeptide enrichment and total number of glycopeptides recovered from digested human plasma.

Lectin chromatography—Instrumentation

The HPLC systems employed were standardized in terms of injection volume, transfer line lengths, dead volume minimization, and common UV elution profiles. Site M used a Paradigm MG4 HPLC system equipped with a CTC PAL robot configured as an autosampler and fraction collector (Michrom Bioresources). At Site X, a Waters system including 1525 Binary HPLC equipped with a 717 plus Autosampler and a Fraction Collector III was employed. Site S used a Shimadzu 20AD HPLC system equipped with a SIL-20AC autosampler; fractions were collected manually. Mobile phases: Buffer A was 25 mM Tris buffer, pH 7.4, 50 mM sodium chloride, 10 mM calcium chloride, and 10 mM magnesium chloride; Buffer B was 0.5 M acetic acid. Affinity separation: Routinely, ~100 μg of digested protein was diluted into Buffer A, applied to the lectin column, and separated using the following 3 step gradient: 1) Sample load: Buffer A for 9.0 min at 80 μL/min; 2) Sample elution: Buffer B for 4.8 min at 500 μL/min; and 3) Re-equilibration: Buffer A for 6.0 min at 3000 μL/min. The bound fraction, collected from 9.0 to 14.25 min, was desalted using Oasis HLB cartridges as described above. Eluted samples were neutralized by the addition of 0.5 M ammonium bicarbonate and concentrated to <100 μL by vacuum centrifugation. Further details are described in the accompanying SOP (Supplementary Document 1).
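The three-step affinity program above can be written down as a small table and checked arithmetically. The following is a minimal Python sketch, not part of the published SOP; the names `GRADIENT` and `summarize` are hypothetical, and only the durations and flow rates are taken from the text.

```python
# (step name, buffer, duration in min, flow rate in uL/min), values from the text
GRADIENT = [
    ("Sample load", "A", 9.0, 80.0),
    ("Sample elution", "B", 4.8, 500.0),
    ("Re-equilibration", "A", 6.0, 3000.0),
]

def summarize(gradient):
    """Return (name, start_min, end_min, volume_uL) for each step."""
    rows, t = [], 0.0
    for name, _buffer, minutes, flow in gradient:
        # Volume delivered in a step is simply duration x flow rate.
        rows.append((name, t, t + minutes, minutes * flow))
        t += minutes
    return rows

for name, start, end, volume in summarize(GRADIENT):
    print(f"{name}: {start:.2f}-{end:.2f} min, {volume:.0f} uL")
```

Under these values the load step delivers about 720 μL, the elution step about 2400 μL, and the re-equilibration step about 18,000 μL. The elution step nominally runs from 9.0 to 13.8 min, so the stated collection window (9.0 to 14.25 min) extends slightly beyond it.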

PNGase F digestion

N-linked glycopeptides in the bound fractions were deglycosylated by treatment with PNGase F (Glycerol-free, New England Biolabs; Ipswich, MA) as previously described 24. Following deglycosylation, samples were desalted and concentrated using C18 ZipTips® (Millipore; Billerica, MA) or MicroSpin Columns, 5–200 μL (The Nest Group, Inc.; Southborough, MA).

ESI-QqTOF mass spectrometric analyses (Sites M and X)

The peptides were separated using an Eksigent nano-LC 2D HPLC system (Eksigent, Dublin, CA), which was directly connected to a quadrupole time-of-flight (QqTOF) QSTAR Elite mass spectrometer (AB Sciex, Foster City, CA). We injected 33% (vol/vol) of the bound material per run. Briefly, peptides were applied to a guard column (C18 Acclaim PepMap100, 300 μm I.D. × 5 mm, 5 μm particle size, 100 Å pore size; Dionex, Sunnyvale, CA) and washed with the aqueous loading solvent (2% solvent B in A, flow rate: 20 μL/min) for 10 min prior to separation on a C18 Acclaim PepMap100 column (75 μm I.D. × 15 cm, 3 μm particle size, 100 Å pore size; Dionex, Sunnyvale, CA). Bound material was eluted at a flow rate of 300 nL/min using the following gradients: 2–40% solvent B in A (from 0–60 min), 40–90% solvent B in A (from 60–75 min), and at 90% solvent B in A (from 75–85 min), with a total runtime of 120 min (including column equilibration). Solvent A consisted of 0.1% formic acid in 98% H2O/2% acetonitrile and solvent B was 0.1% formic acid in 98% acetonitrile/2% H2O. Spectra were calibrated using MS/MS fragment-ions of a Glu-Fibrinogen B peptide standard. Advanced information dependent acquisition was employed for MS/MS data collection using QSTAR Elite (Analyst QS 2.0) specific features, including “Smart Collision” (fragment intensity multiplier set to 2.0) and “Smart Exit” (maximum accumulation time of 2.5 sec) to obtain MS/MS spectra for the six most abundant precursor ions following each survey scan. To increase overall sampling efficiencies, two replicate nano-HPLC-MS/MS analyses were performed per sample.

ESI-LTQ-Orbitrap XL mass spectrometric analyses (Site S)

The peptide mixtures were separated as described above using an Agilent nanoflow 1100 HPLC system (Agilent, Santa Clara, CA) connected to a hybrid linear ion trap Orbitrap mass spectrometer (LTQ Orbitrap XL, Thermo Fisher Scientific). The electrospray ionization emitter tip (Pico-tip emitter, F360-75-15-N-5-C10.5) was purchased from New Objective (Woburn, MA). The mass spectrometer, which was calibrated with a solution of caffeine, MRFA and Ultramark 1621 according to the manufacturer’s instructions, was operated in the data-dependent mode. Full MS scans from m/z 350 to 1600 with a full width at half maximum resolution of 30,000 were acquired as profile data, followed by MS/MS scans of the six most abundant ions in the linear trap. Singly charged ions were excluded. A dynamic mass exclusion time was applied for 120s with a repeat count of 1 and a repeat duration time of 30s. In all scan modes, one micro scan was applied.

Database searches

Mass spectrometric data from all laboratories were analyzed at Site M using two bioinformatics database search engines with integrated peak picking, ProteinPilot (AB Sciex) version 4.0.8085 (revision 148085) using the Paragon Algorithm 4.0.0.0, 148083 28, and Mascot version 2.2.04 using Mascot Daemon version 2.2.2 (both Matrix Science). For the latter, the following (default) data import filter options were used: precursor charge state +2 to +4, reject spectrum if < 7 peaks or if precursor is < 400 or >10000 m/z, remove peaks with intensity < 0.001% of the highest peak; centroid all MS/MS data, percentage height 50, and merge distance 0.1 atomic mass units. Peak lists for the Orbitrap LC-MS/MS data sets were generated using Mascot Distiller 2.3.2.0 (Matrix Science) with the supplied processing parameter file Orbitrap_low_res_MS2_4.opt. The Orbitrap peak lists were saved in MGF format with Distiller preferences set to save MS/MS peaks as MH+ for input into Mascot and ProteinPilot search engines. 
All data were searched against a merged database of 20,293 protein sequences comprising all 20,286 reviewed (formerly SwissProt) human entries from UniProt release 2010_09 plus 7 other proteins: PNGase F (Q9XBM8|Q9XBM8_FLAME, P21163|PNGF_ELIMR) and yeast invertase (P10594|INV1_YEAST, P00724|INV2_YEAST, P10595|INV3_YEAST, P10596|INV4_YEAST, P10597|INV5_YEAST). ProteinPilot searches were performed as previously described 24. A ProteinPilot peptide confidence cut-off value of 98.8 was chosen, corresponding to a local FDR of 5%. For Mascot searches, the following parameters were used: trypsin enzyme specificity, carbamidomethyl (Cys) as a fixed modification, and the following variable modifications: deamidation of asparagine and glutamine residues, oxidation of methionines, acetylation at the protein N-terminus, and cyclization of N-terminal glutamines. Up to two missed tryptic cleavages were allowed. For QSTAR Elite data a mass tolerance of 100 ppm and 0.4 Da was set for the precursor and product ions, respectively, whereas values of 10 ppm and 0.8 Da were applied to Orbitrap data. Peptide-spectral matches with expectation values <0.026 were accepted. FDR analysis was performed using the Mascot automatic decoy search. In all cases, the peptide false-positive identification rate was <3%.
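To make the precursor tolerances concrete, the following minimal Python sketch converts a ppm tolerance into an absolute window and tests whether a measured value falls inside it. The function names and the example m/z values are hypothetical and are not taken from the search engines; only the 100 ppm and 10 ppm settings come from the text.

```python
# Precursor tolerances stated in the text, in ppm, keyed by instrument.
PRECURSOR_TOL_PPM = {"QSTAR Elite": 100.0, "LTQ Orbitrap XL": 10.0}

def ppm_to_da(mz, tol_ppm):
    """Absolute half-width of a ppm tolerance window at a given m/z."""
    return mz * tol_ppm / 1e6

def within_ppm(observed_mz, theoretical_mz, tol_ppm):
    """True if the relative mass error is within +/- tol_ppm."""
    error_ppm = (observed_mz - theoretical_mz) / theoretical_mz * 1e6
    return abs(error_ppm) <= tol_ppm

# Hypothetical precursor at m/z 1000.00 measured as 1000.05 (a 50 ppm error).
for instrument, tol in PRECURSOR_TOL_PPM.items():
    print(instrument, ppm_to_da(1000.0, tol), within_ppm(1000.05, 1000.0, tol))
```

At m/z 1000 the two settings correspond to windows of roughly ±0.1 and ±0.01, so the hypothetical 50 ppm error would pass the QSTAR Elite setting but not the Orbitrap setting.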

Glycopeptide assignment

Deglycosylated peptides were identified as previously described 24, on the basis of several criteria including the motif NxS/T, x ≠ proline, in which Asn was converted to Asp (reported by the search engine as Asn deamidation), and the presence of at least one fragment ion encompassing the glycosite. To ensure inclusion of glycosites containing Lys and/or Arg in the X position (e.g., NKT), which were likely to have been cleaved by trypsin, the amino-acid residue following the carboxy-terminal cleavage site was also considered. Peptides containing the motif NGS or NGT were excluded due to the fact that asparagine residues in that sequence are prone to chemical deamidation during overnight trypsin digestion 29. For all deglycosylated peptides the corresponding MS/MS spectra were manually examined using an adaptation of previously published criteria to ensure correct assignment 24, 30.
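The sequence-level part of these criteria can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `find_sequons`, its parameters, and the example peptides are hypothetical. It covers the N-x-S/T motif (x ≠ P), the NGS/NGT exclusion, the residue following the C-terminal cleavage site, and an optional filter on search-engine-reported deamidation. It does not model the fragment-ion requirement or the manual review of spectra.

```python
import re

# N followed by any residue except proline, then S or T.
# The lookahead lets overlapping sequons be found independently.
SEQUON = re.compile(r"N(?=[^P][ST])")

def find_sequons(peptide, next_residue="", deamidated_positions=None):
    """Return 0-based positions of Asn residues in `peptide` that lie in an
    N-x-S/T sequon (x != P), skipping NGS/NGT.

    next_residue: protein residue after the peptide's C-terminal cleavage
        site, so that motifs such as NKT split by trypsin are still seen.
    deamidated_positions: if given, keep only Asn positions reported as
        deamidated (Asn -> Asp) by the search engine.
    """
    extended = peptide + next_residue
    hits = []
    for match in SEQUON.finditer(extended):
        pos = match.start()
        if pos >= len(peptide):
            continue  # the Asn itself must belong to the peptide
        if extended[pos + 1] == "G":
            continue  # NGS/NGT excluded: prone to chemical deamidation
        if deamidated_positions is not None and pos not in deamidated_positions:
            continue
        hits.append(pos)
    return hits
```

For example, `find_sequons("LLNK", next_residue="T")` reports the Asn at position 2, whereas the same peptide without the following residue yields no sequon.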

Generation of a list of candidate triple negative-specific glycosites

The selection criteria for triple negative-specific glycosites were subjected to a resampling, non-parametric statistical test in which no knowledge about the data’s distribution is necessary, e.g., the “bootstrap” technique 31. The basic premise of this approach is to consider the null hypothesis that there is statistically no difference between the luminal and triple negative data sets, i.e., that the two are random selections from the same population. To determine the expected FDRs, we applied 20,000 random permutations to criteria of the form:

Criterion n-m: A glycosite satisfies criterion n-m if it is identified in ≤ n Luminal cell lines and in ≥ m TN cell lines.

The results are shown in Supplementary Table 3.
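One way to read the permutation test for criterion n-m is sketched below in Python. This is an assumption about the procedure, not the authors' code: subtype labels are shuffled across the ten cell lines, the number of glycosites passing the criterion is counted for each shuffle, and the expected FDR is taken as the mean null count divided by the observed count. The function names and the `detections` data structure (glycosite identifier mapped to the set of cell lines in which it was detected) are hypothetical.

```python
import random

def count_passing(detections, tn_lines, luminal_lines, n, m):
    """Count glycosites seen in <= n luminal lines and >= m TN lines."""
    tn, lum = set(tn_lines), set(luminal_lines)
    return sum(
        1 for lines in detections.values()
        if len(lines & lum) <= n and len(lines & tn) >= m
    )

def permutation_fdr(detections, tn_lines, luminal_lines, n, m,
                    n_perm=20000, seed=0):
    """Estimate the expected FDR for criterion n-m by shuffling subtype labels.

    Returns (observed_count, estimated_fdr). The FDR estimate is the mean
    number of glycosites passing under shuffled labels divided by the number
    passing under the true labels (an assumed definition).
    """
    rng = random.Random(seed)
    all_lines = list(tn_lines) + list(luminal_lines)
    observed = count_passing(detections, tn_lines, luminal_lines, n, m)
    if observed == 0:
        return observed, float("nan")
    null_total = 0
    for _ in range(n_perm):
        shuffled = all_lines[:]
        rng.shuffle(shuffled)
        fake_tn = shuffled[:len(tn_lines)]
        fake_lum = shuffled[len(tn_lines):]
        null_total += count_passing(detections, fake_tn, fake_lum, n, m)
    expected_false = null_total / n_perm
    return observed, min(1.0, expected_false / observed)
```

With this structure, the selection rule used in the study (no luminal detections, at least three triple negative detections) would correspond to calling `permutation_fdr(..., n=0, m=3)`.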

Spectral Viewer: Skyline Spectral Library

An interactive Skyline spectral library file that contains all MS/MS spectra of deglycosylated peptides identified in this study has been submitted as Supplemental Material. Skyline is an open source program 32 available for free download at http://proteome.gs.washington.edu/software/skyline.

Exon expression array and RNAseq experiments

Whole transcriptome shotgun sequencing (RNAseq) was completed on nine of the ten breast cancer cell lines (BT549, HCC38, HS578T, MDAMB231, MDAMB175VII, MDAMB361, SKBR3, SUM52PE and UACC812). Expression analysis was performed with the ALEXA-seq software package as previously described 33. On a per sample basis, an average of 58.7 million (76bp paired-end) reads passed quality control, and 37.6 million mapped to the transcriptome, which resulted in coverage of 40x across all known genes. Log2 transformed estimates of gene-level expression were extracted for fucosyl- and sialyltransferase genes, and triple negative candidate biomarker targets that emerged from the N-glycosite workflow. Corresponding values indicating whether expression of a transcript was detected above background were also extracted. A 2-sided Student’s t-test was used to compare log2 transformed gene expression levels between the five luminal and the four triple negative cell lines. This comparison generated raw p-values, which were then adjusted for multiple comparisons using the Benjamini-Hochberg method for controlling FDRs 34. The adjustment was achieved with the p.adjust(pvals, "fdr") function in R version 2.12.1 (2010-12-16). Adjusted FDR p-values lower than 10% (0.1) were considered significant.
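For readers who want to see what the multiple-testing adjustment does, the following Python sketch implements the Benjamini-Hochberg step-up procedure, which is the method that R's `p.adjust` applies for `"fdr"`. It is not the authors' code; the function name and the example p-values are hypothetical, and the t-test that produces the raw p-values is not shown.

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvals)
    # Walk from the largest p-value to the smallest, keeping a running
    # minimum so that adjusted values are monotone in the raw p-values.
    order = sorted(range(n), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for k, i in enumerate(order):
        rank = n - k  # 1-based rank of this p-value in ascending order
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [q < 0.1 for q in adjusted]  # 10% threshold used in the text
```

The 0.1 cut-off in the last line corresponds to the significance threshold stated above.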

Cell culture and production of conditioned media

All cells were cultured as described in Neve et al.21. To generate the CM, we cultured 10 breast cancer cell lines (Table 1) that were derived from 5 luminal (SKBR3, SUM52 PE, MDAMB175, UACC 812, and MDAMA361) and 5 triple negative tumors (MDA468, BT549, HS578T, MDAMB231, and HCC38). CM was prepared and trypsin digested at Site M. The lines were grown to 75–80% confluence in the appropriate culture medium 21. Then they were washed with fresh medium without fetal calf serum (FCS) or phenol red and incubated for 10 min at 37 °C. This process was repeated twice before the cells were incubated in fresh medium (without FCS and phenol red) for 18–20 h. At the end of the culture period, the cells retained their original morphologies with no evidence of apoptosis. The CM was harvested and centrifuged at 2000 × g for 10 min. The supernatant was concentrated using Millipore centrifugal filter units (MWCO 3K) and dialyzed against phosphate buffered saline (PBS).

Table 1

Luminal and triple-negative breast cancer cell lines.

Cell lineaTumor subtypeERbPRcHER2dDiagnosise
SKBR3LuminalYesAdenocarcinoma
SUM52 PELuminal+NoCarcinoma
MDAMB175Luminal+NoIDC
UACC 812Luminal+YesIDC
MDAMA361Luminal+YesAdenocarcinoma
MDA468Basal ANoAdenocarcinoma
BT549Claudin-lowNoIDC, papillary
HS5787Claudin-lowNoIDC
MDAMB 231Claudin-lowNoAdenocarcinoma
HCC38Claudin-lowNoDuctal carcinoma
This table is populated with information from Neve et al., 2006.
Estrogen (ER) or progesterone receptor (PR) expression.
Human epidermal growth factor receptor 2 (HER2) overexpression.
Invasive ductal carcinoma (IDC).

Lectin blotting and staining

Biotinylated and fluoresceinated lectins were purchased from Vector Laboratories. Blotting: Cell lysates were separated by SDS-PAGE (4–12% gels) and transferred to nitrocellulose membranes. Unless otherwise indicated, the following buffer was used for all steps, including blocking, washing, and reagent dilution/incubation: 0.25 M Tris-Cl, pH 8.0, 0.5 M NaCl, 0.5% NP-40. Blots were incubated in buffer for 1 h to block non-specific binding, then exposed to a solution of ~5 μg/mL of biotinylated lectin for 2 h. Blots were washed 3 × 5 min with copious amounts of buffer. Then, membranes were reacted with ABC reagent (Vector Laboratories) for 1 h and washed again as before. Finally, bound lectin was detected using 3,3-diaminobenzidine (DAB, Vector Laboratories) prepared in water according to the manufacturer’s instructions. Staining: cell surface labeling of non-permeabilized cells was performed as described 26, except that fluoresceinated lectins, rather than antibodies, were used.

Trypsin digestion

First, protein concentrations of the CM samples were determined by amino acid analysis. Then, CM samples were digested and desalted using a published method that incorporates denaturation with 6 M urea 27. As previously described 24, samples were spiked with 25 and 50 pmol of trypsin-digested control glycopeptides from commercial yeast invertase and human lactoferrin (Sigma, St. Louis, MO), respectively. Peptides were stored at −80 °C prior to analyses.

Preparation of lectin columns

The columns were prepared at Site M from a single batch of lectin-conjugated beads and distributed to all the laboratories. Briefly, Sambucus nigra agglutinin (SNA) and Aleuria aurantia lectin (AAL) were purchased from Vector Laboratories (Burlingame, CA). Lectins (20 mg) were suspended at 5–10 mg/mL in PBS and conjugated to 330 mg of POROS-AL beads (Applied Biosystems, Foster City, CA) as previously described 24. Unconjugated protein was removed by washing the beads (5 × 5 mL of 1 M sodium chloride) before they were packed into 3 individual 4.6 × 50 mm PEEK HPLC columns. Routine storage was in PBS with 0.02% sodium azide at 4 °C for up to 6 months. Columns were reused for up to 75 affinity separations without degradation of the performance characteristics as assessed by glycopeptide enrichment and total number of glycopeptides recovered from digested human plasma.

Lectin chromatography—Instrumentation

The HPLC systems employed were standardized in terms of injection volume, transfer line lengths, dead volume minimization, and common UV elution profiles. Site M used a Paradigm MG4 HPLC system equipped with a CTC PAL robot configured as an autosampler and fraction collector (Michrom Bioresources). At Site X, a Waters system including 1525 Binary HPLC equipped with a 717 plus Autosampler and a Fraction Collector III was employed. Site S used a Shimadzu 20AD HPLC system equipped with a SIL-20AC autosampler; fractions were collected manually. Mobile phases: Buffer A was 25 mM Tris buffer, pH 7.4, 50 mM sodium chloride, 10 mM calcium chloride, and 10 mM magnesium chloride; Buffer B was 0.5 M acetic acid. Affinity separation: Routinely, ~100 μg of digested protein was diluted into Buffer A, applied to the lectin column, and separated using the following 3 step gradient: 1) Sample load: Buffer A for 9.0 min at 80 μL/min; 2) Sample elution: Buffer B for 4.8 min at 500 μL/min; and 3) Re-equilibration: Buffer A for 6.0 min at 3000 μL/min. The bound fraction, collected from 9.0 to 14.25 min, was desalted using Oasis HLB cartridges as described above. Eluted samples were neutralized by the addition of 0.5 M ammonium bicarbonate and concentrated to <100 μL by vacuum centrifugation. Further details are described in the accompanying SOP (Supplementary Document 1).

PNGase F digestion

N-linked glycopeptides in the bound fractions were deglycosylated by treatment with PNGase F (Glycerol-free, New England Biolabs; Ipswich, MA) as previously described 24. Following deglycosylation, samples were desalted and concentrated using C18 ZipTips® (Millipore; Billerica, MA) or MicroSpin Columns, 5–200 μL (The Nest Group, Inc.; Southborough, MA).

ESI-QqTOF mass spectrometric analyses (Sites M and X)

The peptides were separated using an Eksigent nano-LC 2D HPLC system (Eksigent, Dublin, CA), which was directly connected to a quadrupole time-of-flight (QqTOF) QSTAR Elite mass spectrometer (AB Sciex, Foster City, CA). We injected 33% (vol/vol) of the bound material per run. Briefly, peptides were applied to a guard column (C18 Acclaim PepMap100, 300 μm I.D. × 5 mm, 5 μm particle size, 100 Å pore size; Dionex, Sunnyvale, CA) and washed with the aqueous loading solvent (2% solvent B in A, flow rate: 20 μL/min) for 10 min prior to separation on a C18 Acclaim PepMap100 column (75 μm I.D. × 15 cm, 3 μm particle size, 100 Å pore size; Dionex, Sunnyvale, CA). Bound material was eluted at a flow rate of 300 nL/min using the following gradients: 2–40% solvent B in A (from 0–60 min), 40–90% solvent B in A (from 60–75 min), and at 90% solvent B in A (from 75–85 min), with a total runtime of 120 min (including column equilibration). Solvent A consisted of 0.1% formic acid in 98% H2O/2% acetonitrile and solvent B was 0.1% formic acid in 98% acetonitrile/2% H2O. Spectra were calibrated using MS/MS fragment-ions of a Glu-Fibrinogen B peptide standard. Advanced information dependent acquisition was employed for MS/MS data collection using QSTAR Elite (Analyst QS 2.0) specific features, including “Smart Collision” (fragment intensity multiplier set to 2.0) and “Smart Exit” (maximum accumulation time of 2.5 sec) to obtain MS/MS spectra for the six most abundant precursor ions following each survey scan. To increase overall sampling efficiencies, two replicate nano-HPLC-MS/MS analyses were performed per sample.

ESI-LTQ-Orbitrap XL mass spectrometric analyses (Site S)

The peptide mixtures were separated as described above using an Agilent nanoflow 1100 HPLC system (Agilent, Santa Clara, CA) connected to a hybrid linear ion trap Orbitrap mass spectrometer (LTQ Orbitrap XL, Thermo Fisher Scientific). The electrospray ionization emitter tip (Pico-tip emitter, F360-75-15-N-5-C10.5) was purchased from New Objective (Woburn, MA). The mass spectrometer, which was calibrated with a solution of caffeine, MRFA and Ultramark 1621 according to the manufacturer’s instructions, was operated in the data-dependent mode. Full MS scans from m/z 350 to 1600 with a full width at half maximum resolution of 30,000 were acquired as profile data, followed by MS/MS scans of the six most abundant ions in the linear trap. Singly charged ions were excluded. A dynamic mass exclusion time was applied for 120s with a repeat count of 1 and a repeat duration time of 30s. In all scan modes, one micro scan was applied.

Database searches

Mass spectrometric data from all laboratories were analyzed at Site M using two bioinformatics database search engines with integrated peak picking, ProteinPilot (AB Sciex) version 4.0.8085 (revision 148085) using the Paragon Algorithm 4.0.0.0, 148083 28, and Mascot version 2.2.04 using Mascot Daemon version 2.2.2 (both Matrix Science). For the latter, the following (default) data import filter options were used: precursor charge state +2 to +4, reject spectrum if < 7 peaks or if precursor is < 400 or >10000 m/z, remove peaks with intensity < 0.001% of the highest peak; centroid all MS/MS data, percentage height 50, and merge distance 0.1 atomic mass units. Peak lists for the Orbitrap LC-MS/MS data sets were generated using Mascot Distiller 2.3.2.0 (Matrix Science) with the supplied processing parameter file Orbitrap_low_res_MS2_4.opt. The Orbitrap peak lists were saved in MGF format with Distiller preferences set to save MS/MS peaks as MH+ for input into Mascot and ProteinPilot search engines. 
All data were searched against a merged database of 20,293 protein sequences comprising all 20,286 reviewed (formerly SwissProt) human entries from UniProt release 2010_09 plus 7 other proteins: PNGase F (Q9XBM8|Q9XBM8_FLAME, P21163|PNGF_ELIMR) and yeast invertase (P10594|INV1_YEAST, P00724|INV2_YEAST, P10595|INV3_YEAST, P10596|INV4_YEAST, P10597|INV5_YEAST). ProteinPilot searches were performed as previously described 24. A ProteinPilot peptide confidence cut-off value of 98.8 was chosen, corresponding to a local FDR of 5%. For Mascot searches, the following parameters were used: trypsin enzyme specificity with up to two missed tryptic cleavages, carbamidomethyl (Cys) as a fixed modification, and the following variable modifications: deamidation of asparagine and glutamine residues, oxidation of methionine, acetylation at the protein N-terminus, and cyclization of N-terminal glutamine. For QSTAR Elite data, mass tolerances of 100 ppm and 0.4 Da were set for the precursor and product ions, respectively, whereas values of 10 ppm and 0.8 Da were applied to Orbitrap data. Peptide-spectral matches with expectation values <0.026 were accepted. FDR analysis was performed using the Mascot automatic decoy search. In all cases, the peptide false-positive identification rate was <3%.
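
The precursor tolerances quoted above are relative (ppm), whereas the product-ion tolerances are absolute (Da). As a rough illustration of the difference, the following sketch converts a ppm tolerance into an absolute window at a given m/z. The function names and the example m/z of 800 are illustrative assumptions and are not taken from the study or from Mascot.

```python
def ppm_to_da(mz: float, ppm: float) -> float:
    """Convert a relative tolerance in ppm into an absolute tolerance at this m/z."""
    return mz * ppm / 1e6


def within_ppm(observed_mz: float, theoretical_mz: float, ppm: float) -> bool:
    """Return True if the observed m/z lies within +/- ppm of the theoretical m/z."""
    return abs(observed_mz - theoretical_mz) <= ppm_to_da(theoretical_mz, ppm)


# Hypothetical precursor at m/z 800: window half-widths for the two instrument types.
qstar_window = ppm_to_da(800.0, 100.0)    # 100 ppm setting used for QSTAR Elite data
orbitrap_window = ppm_to_da(800.0, 10.0)  # 10 ppm setting used for Orbitrap data
print(qstar_window, orbitrap_window)
```

At this example m/z, the 100 ppm setting corresponds to a window ten times wider than the 10 ppm setting, which is why a tighter relative tolerance is practical on the higher-resolution instrument.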

Glycopeptide assignment

Deglycosylated peptides were identified as previously described 24, on the basis of several criteria including the motif NxS/T, x ≠ proline, in which Asn was converted to Asp (reported by the search engine as Asn deamidation), and the presence of at least one fragment ion encompassing the glycosite. To ensure inclusion of glycosites containing Lys and/or Arg in the x position (e.g., NKT), which were likely to have been cleaved by trypsin, the amino-acid residue following the carboxy-terminal cleavage site was also considered. Peptides containing the motif NGS or NGT were excluded because asparagine residues in that sequence are prone to chemical deamidation during overnight trypsin digestion 29. For all deglycosylated peptides, the corresponding MS/MS spectra were manually examined using an adaptation of previously published criteria to ensure correct assignment 24, 30.
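
The sequence-motif part of these criteria can be sketched in a few lines. The example below is a minimal illustration, not the authors' pipeline: the function names and the toy protein sequence are hypothetical, and the fragment-ion and manual-inspection criteria are not modeled. Checking the motif against the full protein sequence, rather than the peptide alone, covers the case in which trypsin cleaves after a Lys or Arg in the x position.

```python
import re

# N-x-S/T with x != P. The lookahead keeps the match one residue wide,
# so adjacent or overlapping sequons are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(protein: str, exclude_ng: bool = True) -> list[int]:
    """Return 1-based positions of Asn residues in N-x-S/T sequons (x != Pro)."""
    positions = []
    for match in SEQUON.finditer(protein):
        i = match.start()
        if exclude_ng and protein[i + 1] == "G":
            continue  # NGS/NGT: prone to chemical deamidation, so excluded
        positions.append(i + 1)
    return positions


def is_candidate_glycosite(protein: str, peptide_start: int, deamidated_index: int) -> bool:
    """Check a deamidated Asn against the sequon rule using protein context.

    peptide_start is the 1-based position of the peptide's first residue in the
    protein; deamidated_index is the 0-based index of the Asn within the peptide.
    """
    site = peptide_start + deamidated_index
    return site in find_sequons(protein)


# Hypothetical sequence: NKT at N3, NGS at N8, NPS at N13, NAT at N18.
toy_protein = "MANKTAANGSAANPSAANAT"
print(find_sequons(toy_protein))
# The tryptic peptide "MANK" ends at K4, but the following residue (T5) completes NKT.
print(is_candidate_glycosite(toy_protein, peptide_start=1, deamidated_index=2))
```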

Generation of a list of candidate triple negative-specific glycosites

The selection criteria for triple negative-specific glycosites were evaluated with a non-parametric resampling test, which requires no knowledge of the data’s distribution, such as the “bootstrap” technique 31. The basic premise of this approach is to consider the null hypothesis that there is no statistical difference between the luminal and triple negative data sets, i.e., that the two are random selections from the same population. To determine the expected FDRs, we applied 20,000 random permutations to criteria of the form:

Criterion n-m: A glycosite satisfies criterion n-m if it is identified in ≤ n Luminal cell lines and in ≥ m TN cell lines.

The results are shown in Supplementary Table 3.
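
A permutation estimate of this kind can be sketched as follows. The text does not spell out the exact procedure, so this example assumes that subtype labels are shuffled across cell lines, that the number of glycosites satisfying criterion n-m is counted for each shuffle, and that the FDR is estimated as the mean permuted count divided by the observed count. All function names, cell-line labels, and detection data here are hypothetical.

```python
import random


def count_hits(detections, luminal, tn, n, m):
    """Count glycosites seen in <= n luminal lines and >= m TN lines."""
    hits = 0
    for lines in detections.values():
        if len(lines & luminal) <= n and len(lines & tn) >= m:
            hits += 1
    return hits


def permutation_fdr(detections, luminal, tn, n, m, n_perm=20000, seed=0):
    """Estimate the FDR of criterion n-m by shuffling subtype labels."""
    rng = random.Random(seed)
    observed = count_hits(detections, set(luminal), set(tn), n, m)
    all_lines = sorted(luminal) + sorted(tn)
    k = len(luminal)
    total = 0
    for _ in range(n_perm):
        rng.shuffle(all_lines)
        # First k shuffled lines play the role of "luminal", the rest "TN".
        total += count_hits(detections, set(all_lines[:k]), set(all_lines[k:]), n, m)
    expected = total / n_perm  # mean number of hits under the null hypothesis
    fdr = expected / observed if observed else float("nan")
    return observed, expected, fdr


# Hypothetical toy data: glycosite -> set of cell lines in which it was detected.
luminal_lines = {"L1", "L2", "L3", "L4", "L5"}
tn_lines = {"T1", "T2", "T3", "T4", "T5"}
toy_detections = {
    "siteA": {"T1", "T2", "T3"},
    "siteB": {"L1", "T1"},
    "siteC": {"T1", "T2", "T3", "T4", "L2"},
}
print(permutation_fdr(toy_detections, luminal_lines, tn_lines, n=0, m=3, n_perm=2000))
```

With real data, running this for each (n, m) pair would give a table of expected chance counts from which a criterion with an acceptable FDR can be chosen.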

Spectral Viewer: Skyline Spectral Library

An interactive Skyline spectral library file that contains all MS/MS spectra of deglycosylated peptides identified in this study has been submitted as Supplemental Material. Skyline is an open source program 32 available for free download at http://proteome.gs.washington.edu/software/skyline.

Exon expression array and RNAseq experiments

Whole transcriptome shotgun sequencing (RNAseq) was completed on nine of the ten breast cancer cell lines (BT549, HCC38, HS578T, MDAMB231, MDAMB175VII, MDAMB361, SKBR3, SUM52PE and UACC812). Expression analysis was performed with the ALEXA-seq software package as previously described 33. On a per sample basis, an average of 58.7 million (76 bp paired-end) reads passed quality control, and 37.6 million mapped to the transcriptome, which resulted in coverage of 40x across all known genes. Log2 transformed estimates of gene-level expression were extracted for fucosyl- and sialyltransferase genes, and for the triple negative candidate biomarker targets that emerged from the N-glycosite workflow. Corresponding values indicating whether expression of a transcript was detected above background were also extracted. A 2-sided Student’s t-test was used to compare log2 transformed gene expression levels between the five luminal and the four triple negative cell lines. This comparison generated raw p-values, which were then adjusted for multiple comparisons using the Benjamini-Hochberg method for controlling FDRs 34. The adjustment was achieved with the p.adjust(pvals, "fdr") function in R version 2.12.1 (2010-12-16). Adjusted p-values below 0.1 (10% FDR) were considered significant.
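
For readers unfamiliar with the adjustment, the Benjamini-Hochberg step-up procedure can be written out in a few lines. The study used R's p.adjust; the sketch below is a plain Python rendering of the same procedure, with a hypothetical function name and made-up p-values.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvals)
    # Visit p-values from largest to smallest, tracking the running minimum
    # of p * n / rank so that adjusted values are monotone in the raw p-values.
    order = sorted(range(n), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for rank_from_top, i in enumerate(order):
        rank = n - rank_from_top  # 1-based rank in ascending order
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
adj = bh_adjust(raw)
significant = [q < 0.1 for q in adj]  # 10% FDR threshold, as in the text
print(adj, significant)
```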

Results and Discussion

Workflow

These experiments used a lectin chromatography/MS-based approach that we recently optimized and published to identify candidate cancer biomarkers 24. Initially, we probed nitrocellulose transfers of electrophoretically-separated cell lysates of breast cancer lines established from triple negative and luminal tumor subtypes with a panel of nine lectins (SNA, AAL, Vicia villosa, Phaseolus vulgaris leukoagglutinating and erythroagglutinating, Galanthus nivalis, Euonymus europaeus, Lycopersicon esculentum, and Arachis hypogaea) that recognized either internal saccharide motifs or terminal sugars. The results showed that SNA (Fig. 1a) and AAL (data not shown), which bind motifs with sialic acid and fucose, respectively, reacted with a wide array of glycoproteins. Additionally, some glycoforms were enriched in lines that were derived from tumors of the same subtype. Staining of intact non-permeabilized cells with fluorescein-conjugated SNA revealed strong surface labeling (Fig. 1b). Together, these results suggested that the breast cancer cell lines produced a large repertoire of glycoproteins that reacted with SNA or AAL, including cell-surface molecules poised to be shed or released.

Figure 1

Breast cancer cell lines have a complex repertoire of SNA-reactive glycoproteins and exhibit cell surface staining with this lectin

(A) Lysates from a panel of 8 breast cancer cell lines, which included triple negative (1–6) and luminal (7, 8) subtypes, were electrophoretically separated, transferred to nitrocellulose, and probed with SNA. Lanes: 1, MDAMB468; 2, HCC38; 3, HCC1500; 4, HS578T; 5, MDAMB157; 6, MDAMB231; 7, T47D; 8, UACC812. (B) Non-permeabilized HS578T cells were stained with fluorescein-conjugated SNA and imaged by fluorescence microscopy (magnification 60x).

Next, we used this workflow to compare CM samples from 5 luminal and 5 triple negative breast cancer cell lines to identify subtype-specific glycosites. The cells, listed in Table 1, are members of a well-annotated collection that have been used to define the gene expression profiles, drug sensitivities, and protein splicing patterns of the tumor types from which they were derived 20, 21, 23. In contrast to many other lectin-based approaches, the affinity capture step was performed at the glycopeptide level rather than the protein level, which decreased non-specific binding due to hydrophobic interactions, a phenomenon that we previously observed between lectins and intact proteins. Thus, the samples were trypsin-digested prior to HPLC separation on lectin-conjugated POROS. Then, the bound fraction was treated with peptide N-glycosidase F (PNGase F) to remove N-linked glycans prior to LC-MS/MS analyses. The results were analyzed using two search engines, ProteinPilot and Mascot, to identify peptides and their corresponding proteins 28. N-glycosites were identified as described in the Methods 29. Finally, each MS/MS spectrum was manually inspected for the presence of at least one fragment ion that encompassed an N-glycosylation site. Thus, this method identified the glycosite that carries an oligosaccharide with a lectin-binding motif and the corresponding protein. These rigorous criteria were key to making this method highly reproducible 24.

We know from our participation in the Clinical Proteomic Technologies for Cancer (CPTAC) network that analysis of the same sample at multiple sites on different platforms is one way to maximize identifications and test the robustness of a workflow 35, 36. The experimental strategy we used, which exploited this observation, is depicted in Fig. 2. CM samples were trypsin-digested and aliquoted at a single site (Fig. 2A). Lectin enrichment and LC-MS/MS analyses were carried out according to a Standard Operating Procedure (SOP, Supplemental Document 1) at each of three locations—University of California San Francisco, Buck Institute for Research on Aging, and Purdue University (Fig. 2B). Prior to initiating the study, each group evaluated the lectin capture step using a National Institute of Standards and Technology (NIST) human pooled plasma sample, which we have extensively characterized with respect to the SNA and AAL chromatographic profiles and the glycosite composition of the bound fractions 24. MS analyses yielded glycosite identifications and percent enrichment values (total glycopeptides/total peptides) within the expected range 24.

Figure 2

The experimental workflow

(A) CM samples from breast cancer cell lines established from 5 luminal and 5 triple negative tumors were prepared in one laboratory, then distributed to 3 sites. (B) Each group separated the 10 CM samples, in duplicate, by AAL or SNA chromatography, which generated 40 fractions. The samples were deglycosylated using PNGase F and analyzed in duplicate by LC-MS/MS, yielding a total of 80 MS/MS data sets per site. (C) Files were transferred to a central location for bioinformatic analyses.

Two groups, M and X, acquired data using a QSTAR Elite QqTOF (AB Sciex), while the third, S, used an LTQ-Orbitrap (Thermo Fisher Scientific). The datasets were submitted to Site M, where all the searches and bioinformatic analyses were completed (Fig. 2C). As the work progressed, two changes to the protocol were implemented. First, due to technical problems encountered during the initial analysis, a second preparation of CM samples was analyzed at two of the three locations (M and S). Second, sites M and S replaced ZipTips® with spin-cartridges for the desalting step that followed PNGase F digestion. This change was made in response to the fact that, in initial experiments, Site S routinely identified significantly more glycosites using this desalting method. All peptides and glycopeptides observed in these experiments are presented as supplemental data (Supplementary Table 1).

Identification of >500 cell-surface or secreted glycoproteins

We tabulated the MS identifications according to the CM samples in which they were detected. Summaries of the data, including the number of glycoproteins, glycopeptides and N-glycosites observed in each CM sample, and the percent glycopeptide enrichment, are shown in Figs. 3 and 4, and in Supplementary Table 2. Overall, the three groups identified a total of 1011 distinct N-glycosites from 533 glycoproteins. Of these, 945 and 641 were observed following AAL and SNA chromatography, respectively. Interestingly, the same workflow applied to pooled healthy human plasma resulted in many fewer identifications. Approximately half the species captured from CM bound to both lectins; the remainder preferentially interacted with either AAL or SNA (Fig. 3A). A similar phenomenon was observed when the N-glycosites were grouped according to tumor subtype (Fig. 3B and C). Thus, it was clear that employing multiple lectins in our workflow resulted in a greater number of identifications. Furthermore, the data showed that the luminal and triple negative samples contained substantially different lectin-reactive species.
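
The size of the overlap between the two lectin captures follows from the totals quoted above by inclusion-exclusion. The short check below uses only those three counts and assumes that every one of the 1011 glycosites was captured by at least one of the two lectins; the function and key names are illustrative.

```python
def lectin_overlap(total: int, aal: int, sna: int) -> dict:
    """Split glycosite counts into AAL-only, SNA-only, and shared sets."""
    both = aal + sna - total  # inclusion-exclusion: |A and S| = |A| + |S| - |A or S|
    return {
        "both": both,
        "aal_only": aal - both,
        "sna_only": sna - both,
        "fraction_both": both / total,
    }


summary = lectin_overlap(total=1011, aal=945, sna=641)
print(summary)
```

Under that assumption, 575 glycosites (about 57%) were captured by both lectins, 370 by AAL only, and 66 by SNA only.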

Figure 3

Diagrammatic summary of the glycosite (glycoprotein) enrichment data according to lectin type (AAL vs. SNA) and CM samples (luminal vs. triple negative) showed distinct and overlapping specificities

(A) The intersecting circles depict the total number of N-glycosites (glycoproteins) captured by each lectin. (B and C) Venn diagrams illustrating the chromatographic separation of luminal (LUM) and triple negative (TN) CM samples.

Figure 4

Lectin capture resulted in significant glycopeptide enrichment

The percent enrichment for the separations performed using AAL (left) or SNA (right) at Sites M (top), X (middle), and S (bottom). The dark line indicates the median; the box depicts the first and third quartiles; the whiskers show the minimum and maximum values observed. Sites M and X acquired data using QSTAR Elite instruments, while Site S used an Orbitrap mass spectrometer.

An overall comparison of the data obtained for luminal and triple negative samples across the three sites showed relatively high levels of enrichment in both cases (Fig. 4). Importantly, very few intracellular proteins were identified, additional evidence that the cells were not undergoing apoptosis. Approximately 90% of the glycoproteins observed reside either at the cell surface (59%) or in the extracellular matrix (29%), suggesting that our strategy of using CM as a source of secreted and/or shed glycoproteins was successful (Fig. 5). Since we wanted to identify candidate cancer biomarkers, we were interested to find that a number of the identified species have functions that are relevant to tumor biology. For example, we observed proteinases, including cathepsins and ADAM family members; adhesion molecules, including cadherins and integrins; extracellular matrix components, including decorin and SPARC; and cytokines, including leukemia inhibitory factor and vascular endothelial growth factor C. Furthermore, some of the glycoproteins had been previously identified as putative breast cancer biomarkers, including CD44, galectin-3 binding protein, insulin-like growth factor binding protein 3, and tissue inhibitor of metalloproteinase 1 37–39. We also identified clinically useful markers, such as HER2/ErbB2, and the CA-125 antigen, MUC16, which is commonly used to screen for ovarian cancer, but can also be upregulated in breast tumors 40, 41.

Figure 5

Nearly 90% of identified glycoproteins resided in the plasma membrane or extracellular compartments

A portion (241/560) of the identified glycoproteins were annotated in the cellular component of Gene Ontology. Of these, the great majority were cell surface or secreted molecules.

Identification of putative glycosite biomarkers of triple negative breast cancers

Next, we used statistical analyses to generate a list of putative triple negative-specific glycosites. Specifically, we used resampling methods that tested 20,000 random permutations of the data. This process generated a table (Supplementary Table 3) with the number of “triple negative-specific” glycosites expected at random for any given set of selection criteria (e.g., observed in “≥1 triple negative and 0 luminal” or “≥4 triple negative and 1 luminal”). This analysis allowed us to select parameters that maximized the identification of putative triple negative-specific glycosites while controlling the FDR. In this context, we required that a glycosite be identified at least once in CM samples from ≥3 triple negative cell lines and not observed in luminal CMs. Using these criteria, the computed FDR for both lectin capture steps was ~15%. This yielded 49 candidates that bound to SNA and 76 that bound to AAL (Fig. 6). From these, we removed glycosites from highly polymorphic HLA class I histocompatibility antigens, which are variably expressed in the population. The final list of 100 glycosites, from 83 glycoproteins, that were putative triple negative-specific candidates is shown in Table 2.

Figure 6

Putative triple negative-specific glycosites (glycoproteins) enriched by AAL or SNA

The criteria applied were detection in ≥3 triple negative and 0 luminal cell line CMs.

Table 2

Putative triple negative-specific glycosites captured by AAL and SNA.

Gene nameEntry nameAccession
Numbera
GlycoproteinGlycosite(s)Observed
TN AALb
Observed
TN SNAb
FunctioncKnown
Fucosylation or
Sialylationd
NT5E5NTD_HUMAN{"type":"entrez-protein","attrs":{"text":"P21589","term_id":"112825","term_text":"P21589"}}P215895′-nucleotidaseN33332Nucleotidase
APPA4_HUMAN{"type":"entrez-protein","attrs":{"text":"P05067","term_id":"112927","term_text":"P05067"}}P05067Amyloid beta A4 proteinN57131Adhesion moleculeYes (58, 59)
ANPEPAMPN_HUMAN{"type":"entrez-protein","attrs":{"text":"P15144","term_id":"143811362","term_text":"P15144"}}P15144Aminopeptidase NN26503Protease
ANTXR1ANTR1_HUMAN{"type":"entrez-protein","attrs":{"text":"Q96EC6","term_id":"74731549","term_text":"Q96EC6"}}Q96EC6Anthrax toxin receptor 1N18442Adhesion molecule
B3GNT2B3GN2_HUMANB3GNT2UDP-GlcNAc:betaGal beta-1,3-N-acetylglucosaminyltransferase 2N17331Glycosyltransferase
BMP1BMP1_HUMAN{"type":"entrez-protein","attrs":{"text":"P13497","term_id":"13124688","term_text":"P13497"}}P13497Bone morphogenetic protein 1N9141ProteaseYes (60)
BMP1BMP1_HUMAN{"type":"entrez-protein","attrs":{"text":"P13497","term_id":"13124688","term_text":"P13497"}}P13497Bone morphogenetic protein 1N14231ProteaseYes (60)
BTDBTD_HUMAN{"type":"entrez-protein","attrs":{"text":"P43251","term_id":"226693503","term_text":"P43251"}}P43251BiotinidaseN15043MetabolismYes (61)
CDH2CADH2_HUMAN{"type":"entrez-protein","attrs":{"text":"P19022","term_id":"116241277","term_text":"P19022"}}P19022Cadherin-2N32530Adhesion moleculeYes (62)
CDH2CADH2_HUMAN{"type":"entrez-protein","attrs":{"text":"P19022","term_id":"116241277","term_text":"P19022"}}P19022Cadherin-2N40230Adhesion moleculeYes (62)
CDH2CADH2_HUMAN{"type":"entrez-protein","attrs":{"text":"P19022","term_id":"116241277","term_text":"P19022"}}P19022Cadherin-2N69233Adhesion moleculeYes (62)
CTSBCATB_HUMAN{"type":"entrez-protein","attrs":{"text":"P07858","term_id":"68067549","term_text":"P07858"}}P07858Cathepsin BN3854ProteaseYes (63)
CTSL1CATL1_HUMAN{"type":"entrez-protein","attrs":{"text":"P07711","term_id":"115741","term_text":"P07711"}}P07711Cathepsin L1N22143Protease
CTSL2CATL2_HUMAN{"type":"entrez-protein","attrs":{"text":"O60911","term_id":"12644075","term_text":"O60911"}}O60911Cathepsin L2N22132Protease
CCDC80CCD80_HUMAN{"type":"entrez-protein","attrs":{"text":"Q8R2G6","term_id":"143955296","term_text":"Q8R2G6"}}Q8R2G6Coiled-coil domain-containing protein 80N66731Adhesion molecule
CCDC80CCD80_HUMAN{"type":"entrez-protein","attrs":{"text":"Q8R2G6","term_id":"143955296","term_text":"Q8R2G6"}}Q8R2G6Coiled-coil domain-containing protein 80N66843Adhesion molecule
CD109CD109_HUMAN{"type":"entrez-protein","attrs":{"text":"Q6YHK3","term_id":"117949389","term_text":"Q6YHK3"}}Q6YHK3CD109 antigeneN6831TGF-beta pathway
CD109CD109_HUMAN{"type":"entrez-protein","attrs":{"text":"Q6YHK3","term_id":"117949389","term_text":"Q6YHK3"}}Q6YHK3CD109 antigeneN39740TGF-beta pathway
CD44CD44_HUMAN{"type":"entrez-protein","attrs":{"text":"P16070","term_id":"308153615","term_text":"P16070"}}P16070CD44 antigeneN2535Adhesion moleculeYes (64)
CGB1CGB1_HUMAN{"type":"entrez-protein","attrs":{"text":"A6NKQ9","term_id":"193806756","term_text":"A6NKQ9"}}A6NKQ9Choriogonadotropin subunit beta variant 1N6330HormoneYes (65)
CLIC1CLIC1_HUMAN{"type":"entrez-protein","attrs":{"text":"O00299","term_id":"12643390","term_text":"O00299"}}O00299Chloride intracellular channel protein 1N4242Ion channel
CLUCLUS_HUMAN{"type":"entrez-protein","attrs":{"text":"P10909","term_id":"116533","term_text":"P10909"}}P10909ClusterineN35443ReceptorYes (66)
COL1A1CO1A1_HUMAN{"type":"entrez-protein","attrs":{"text":"P02452","term_id":"296439504","term_text":"P02452"}}P02452Collagen alpha-1 (I) chainN136523ECM
COL5A1CO5A1_HUMAN{"type":"entrez-protein","attrs":{"text":"P20908","term_id":"85687376","term_text":"P20908"}}P20908Collagen alpha-1 (V) chainN17634ECM
COL6A1CO6A1_HUMAN{"type":"entrez-protein","attrs":{"text":"P12109","term_id":"125987811","term_text":"P12109"}}P12109Collagen alpha-1 (VI) chainN80433ECM
COL6A2CO6A2_HUMAN{"type":"entrez-protein","attrs":{"text":"P12110","term_id":"125987812","term_text":"P12110"}}P12110Collagen alpha-2 (VI) chainN14023ECM
COL6A2CO6A2_HUMAN{"type":"entrez-protein","attrs":{"text":"P12110","term_id":"125987812","term_text":"P12110"}}P12110Collagen alpha-2 (VI) chainN78533ECM
COL12A1COCA1_HUMAN{"type":"entrez-protein","attrs":{"text":"Q99715","term_id":"146345397","term_text":"Q99715"}}Q99715Collagen alpha-1 (XII) chainN267944ECM
COL18A1COIA1_HUMAN{"type":"entrez-protein","attrs":{"text":"P39060","term_id":"215274264","term_text":"P39060"}}P39060Collagen alpha-1 (XVIII) chainN92630ECM
CPVLCPVL_HUMAN{"type":"entrez-protein","attrs":{"text":"Q9H3G5","term_id":"67476930","term_text":"Q9H3G5"}}Q9H3G5Probable serine carboxypeptidase CPVLN34631Protease
CRIM1CRIM1_HUMAN{"type":"entrez-protein","attrs":{"text":"Q9NZV1","term_id":"67460590","term_text":"Q9NZV1"}}Q9NZV1Cysteine-rich motor neuron 1 proteinN7133Receptor
CRTAPCRTAP_HUMAN{"type":"entrez-protein","attrs":{"text":"O75718","term_id":"17372894","term_text":"O75718"}}O75718Cartilage-associated proteinN8732ECM
DCBLD1DCBD1_HUMAN{"type":"entrez-protein","attrs":{"text":"Q8N8Z6","term_id":"50400555","term_text":"Q8N8Z6"}}Q8N8Z6Discoidin, CUB and LCCL domain-containing protein 1N12442Unknown
DKK3DKK3_HUMAN{"type":"entrez-protein","attrs":{"text":"Q9UBP4","term_id":"311033372","term_text":"Q9UBP4"}}Q9UBP4Dickkopf-related protein 3N9631Wnt signaling pathway
ECE1ECE1_HUMAN{"type":"entrez-protein","attrs":{"text":"P42892","term_id":"1706563","term_text":"P42892"}}P42892Endothelin-converting enzyme 1N16630Protease
ECM1ECM1_HUMAN{"type":"entrez-protein","attrs":{"text":"Q16610","term_id":"48429255","term_text":"Q16610"}}Q16610Extracellular matrix protein 1eN44433Angiogenesis
EXT1EXT1_HUMAN{"type":"entrez-protein","attrs":{"text":"Q16394","term_id":"20141422","term_text":"Q16394"}}Q16394Exostosin-1N33042GAG synthesis
EXT2EXT2_HUMAN{"type":"entrez-protein","attrs":{"text":"Q93063","term_id":"3023739","term_text":"Q93063"}}Q93063Exostosin-2N28830GAG synthesis
FAT1FAT1_HUMAN{"type":"entrez-protein","attrs":{"text":"Q14517","term_id":"334302792","term_text":"Q14517"}}Q14517Protocadherin Fat 1N232830Adhesion moleculeYes (67)
FBN1FBN1_HUMAN{"type":"entrez-protein","attrs":{"text":"P35555","term_id":"1613836596","term_text":"P35555"}}P35555Fibrillin-1N158143TGF-beta pathway
FBN1FBN1_HUMAN{"type":"entrez-protein","attrs":{"text":"P35555","term_id":"1613836596","term_text":"P35555"}}P35555Fibrillin-1N448 and N276744TGF-beta pathway
FN1FINC_HUMAN{"type":"entrez-protein","attrs":{"text":"P02751","term_id":"1767132020","term_text":"P02751"}}P02751FibronectinN43033ECMYes (68)
FSTL1FSTL1_HUMAN{"type":"entrez-protein","attrs":{"text":"Q12841","term_id":"2498390","term_text":"Q12841"}}Q12841Follistatin-related protein 1N17545Cell growth DifferentiationYes (69)
FSTL1FSTL1_HUMAN{"type":"entrez-protein","attrs":{"text":"Q12841","term_id":"2498390","term_text":"Q12841"}}Q12841Follistatin-related protein 1N18035Cell growth, DifferentiationYes (69)
FSTL3FSTL3_HUMAN{"type":"entrez-protein","attrs":{"text":"O95633","term_id":"23821565","term_text":"O95633"}}O95633Follistatin-related protein 3N21543TGF-beta pathway
FSTFST_HUMAN{"type":"entrez-protein","attrs":{"text":"P19883","term_id":"23831079","term_text":"P19883"}}P19883FollistatinN28832Hormonal regulationYes (70)
SERPINE2GDN_HUMAN{"type":"entrez-protein","attrs":{"text":"P07093","term_id":"121110","term_text":"P07093"}}P07093Glia-derived nexinN11843Protease inhibitor
GCNT2GNT2A_HUMAN{"type":"entrez-protein","attrs":{"text":"Q8N0V5","term_id":"74714686","term_text":"Q8N0V5"}}Q8N0V5N-acetyllactosaminide beta-1,6-N-acetylglucosaminyl-transferase, isoform AN4132Glycosyltransferase
GPC1GPC1_HUMAN{"type":"entrez-protein","attrs":{"text":"P35052","term_id":"292495012","term_text":"P35052"}}P35052Glypican-1N7954GAG
GRNGRN_HUMAN{"type":"entrez-protein","attrs":{"text":"P28799","term_id":"77416865","term_text":"P28799"}}P28799GranulinsN23654Cytokine
HSPA13HSP13_HUMAN{"type":"entrez-protein","attrs":{"text":"P48723","term_id":"1351125","term_text":"P48723"}}P48723Heat shock 70 kDa protein 13N18443ATPase
IGFBP3IBP3_HUMAN{"type":"entrez-protein","attrs":{"text":"P17936","term_id":"146327827","term_text":"P17936"}}P17936Insulin-like growth factor-binding protein 3N19954Cell growth, Differentiation
ICAM5ICAM5_HUMAN{"type":"entrez-protein","attrs":{"text":"Q9UMF0","term_id":"296439327","term_text":"Q9UMF0"}}Q9UMF0Intercellular adhesion molecule 5N64632Adhesion moleculeYes (71)
ITGA3ITA3_HUMAN{"type":"entrez-protein","attrs":{"text":"P26006","term_id":"347595830","term_text":"P26006"}}P26006Integrin alpha-3N26530Adhesion moleculeYes (72)
ITGA5ITA5_HUMAN{"type":"entrez-protein","attrs":{"text":"P08648","term_id":"23831237","term_text":"P08648"}}P08648Integrin alpha-5N86831Adhesion molecule
ITGB1ITB1_HUMAN{"type":"entrez-protein","attrs":{"text":"P05556","term_id":"218563324","term_text":"P05556"}}P05556Integrin beta-1N52043Adhesion moleculeYes (72)
ITGB1ITB1_HUMAN{"type":"entrez-protein","attrs":{"text":"P05556","term_id":"218563324","term_text":"P05556"}}P05556Integrin beta-1N66943Adhesion moleculeYes (72)
JAG1JAG1_HUMAN{"type":"entrez-protein","attrs":{"text":"P78504","term_id":"20455033","term_text":"P78504"}}P78504Protein jagged-1N21740Cell growth, Differentiation
LAMC1LAMC1_HUMAN{"type":"entrez-protein","attrs":{"text":"P11047","term_id":"224471885","term_text":"P11047"}}P11047Laminin subunit gamma-1N120543ECM
LAMC1LAMC1_HUMAN{"type":"entrez-protein","attrs":{"text":"P11047","term_id":"224471885","term_text":"P11047"}}P11047Laminin subunit gamma-1N139553ECM
LIFLIF_HUMAN{"type":"entrez-protein","attrs":{"text":"P09056","term_id":"126280","term_text":"P09056"}}P09056Leukemia inhibitory factorN8530Cell growth, Differentiation
LOXL2LOXL2_HUMAN{"type":"entrez-protein","attrs":{"text":"Q9Y4K0","term_id":"13878585","term_text":"Q9Y4K0"}}Q9Y4K0Lysyl oxidase homolog 2N28843ECM cross-linking
LOXLYOX_HUMAN{"type":"entrez-protein","attrs":{"text":"P28300","term_id":"417269","term_text":"P28300"}}P28300Protein-lysine 6-oxidaseN8142ECM cross-linking
LOXLYOX_HUMAN{"type":"entrez-protein","attrs":{"text":"P28300","term_id":"417269","term_text":"P28300"}}P28300Protein-lysine 6-oxidaseN14441ECM cross-linking
METMET_HUMAN{"type":"entrez-protein","attrs":{"text":"P08581","term_id":"251757497","term_text":"P08581"}}P08581Hepatocyte growth factor receptorN10630Cell growth, Differentiation
MFGE8MFGM_HUMAN{"type":"entrez-protein","attrs":{"text":"Q08431","term_id":"1476413346","term_text":"Q08431"}}Q08431LactadherinN32534Tissue homeostasisYes (73)
MICAMICA_HUMAN{"type":"entrez-protein","attrs":{"text":"Q29983","term_id":"74740024","term_text":"Q29983"}}Q29983MHC class I polypeptide-related sequence AN7943Immune regulator
MRC2MRC2_HUMAN{"type":"entrez-protein","attrs":{"text":"Q9UBG0","term_id":"317373394","term_text":"Q9UBG0"}}Q9UBG0C-type mannose receptor 2N49741ECM remodeling
OLFML3OLFL3_HUMAN{"type":"entrez-protein","attrs":{"text":"Q9NRN5","term_id":"37999795","term_text":"Q9NRN5"}}Q9NRN5Olfactomedin-like protein 3N17743Development
LEPRE1P3H1_HUMAN{"type":"entrez-protein","attrs":{"text":"Q32P28","term_id":"109892809","term_text":"Q32P28"}}Q32P28Prolyl 3-hydroxylase 1N54044GAG
SERPINF1PEDF_HUMAN{"type":"entrez-protein","attrs":{"text":"P36955","term_id":"313104314","term_text":"P36955"}}P36955Pigment epithelium-derived factoreN28514Cell growth, Differentiation
PLOD2PLOD2_HUMAN{"type":"entrez-protein","attrs":{"text":"O00469","term_id":"62906878","term_text":"O00469"}}O00469Procollagen-lysine, 2-oxoglutarate 5-dioxygenase 2N6341ECM cross-linking
PLOD3PLOD3_HUMAN{"type":"entrez-protein","attrs":{"text":"O60568","term_id":"6093731","term_text":"O60568"}}O60568Procollagen-lysine, 2-oxoglutarate 5-dioxygenase 3N54830ECM cross-linking
PLTPPLTP_HUMAN{"type":"entrez-protein","attrs":{"text":"P55058","term_id":"1709662","term_text":"P55058"}}P55058Phospholipid transfer proteinN14333Lipid metabolism
PLTPPLTP_HUMAN{"type":"entrez-protein","attrs":{"text":"P55058","term_id":"1709662","term_text":"P55058"}}P55058Phospholipid transfer proteinN39833Lipid metabolism
POSTNPOSTN_HUMAN{"type":"entrez-protein","attrs":{"text":"Q15063","term_id":"93138709","term_text":"Q15063"}}Q15063PeriostinN59943Adhesion molecule
PPGBPPGB_HUMAN{"type":"entrez-protein","attrs":{"text":"P10619","term_id":"20178316","term_text":"P10619"}}P10619Lysosomal protective proteinN33314Glycan degradation
PRNPPRIO_HUMAN{"type":"entrez-protein","attrs":{"text":"P04156","term_id":"130912","term_text":"P04156"}}P04156Major prion proteinN18154UnknownYes (74)
PTK7PTK7_HUMAN{"type":"entrez-protein","attrs":{"text":"Q13308","term_id":"116242736","term_text":"Q13308"}}Q13308Tyrosine-protein kinase-like 7N40551Adhesion molecule
| PTK7 | PTK7_HUMAN | Q13308 | Tyrosine-protein kinase-like 7 | N567 | 5 | 3 | Adhesion molecule | |
| PVR | PVR_HUMAN | P15151 | Poliovirus receptor | N120 | 4 | 3 | Immune regulator | |
| SEZ6L2 | SE6L2_HUMAN | Q6UXD5 | Seizure 6-like protein 2 | N247 | 3 | 2 | | |
| SEZ6L2 | SE6L2_HUMAN | Q6UXD5 | Seizure 6-like protein 2 | N373 | 3 | 1 | Unknown | |
| SPARC | SPRC_HUMAN | P09486 | SPARC (Osteonectin) | N116 | 5 | 3 | Cell growth, Differentiation | Yes (75) |
| SUSD5 | SUSD5_HUMAN | O60279 | Sushi domain-containing protein 5 | N354 | 4 | 4 | Unknown | |
| ABI3BP | TARSH_HUMAN | Q7Z7G0 | Target of Nesh-SH3 | N44 | 3 | 2 | Cell migration | |
| TFPI | TFPI1_HUMAN | P10646 | Tissue factor pathway inhibitor | N145 | 5 | 4 | Complement cascade | Yes (76) |
| TGFB1 | TGFB1_HUMAN | P01137 | Transforming growth factor beta-1 | N82 | 4 | 1 | TGF-beta pathway | Yes (77) |
| TGFB2 | TGFB2_HUMAN | P61812 | Transforming growth factor beta-2 | N241 | 5 | 4 | TGF-beta pathway | |
| THBS3 | TSP3_HUMAN | P49746 | Thrombospondin-3 | N407 | 3 | 2 | Adhesion molecule | |
| TWSG1 | TWSG1_HUMAN | Q9GZX9 | Twisted gastrulation protein homolog 1 | N52 | 3 | 2 | Cell growth, Differentiation | |
| TXNDC15 | TXD15_HUMAN | Q96J42 | Thioredoxin domain-containing protein 15 | N293 | 3 | 2 | Unknown | |
| AXL | UFO_HUMAN | P30530 | Tyrosine-protein kinase receptor UFO | N43 | 4 | 4 | Receptor | |
| AXL | UFO_HUMAN | P30530 | Tyrosine-protein kinase receptor UFO | N157 | 4 | 2 | Receptor | |
| AXL | UFO_HUMAN | P30530 | Tyrosine-protein kinase receptor UFO | N198 | 4 | 2 | Receptor | |
| AXL | UFO_HUMAN | P30530 | Tyrosine-protein kinase receptor UFO | N339 | 4 | 3 | Receptor | |
| PLAUR | UPAR_HUMAN | Q03405 | Urokinase plasminogen activator surface receptor | N222 | 2 | 3 | ECM remodeling | Yes (78) |
| PLAU | UROK_HUMAN | P00749 | Urokinase-type plasminogen activator | N322 | 4 | 4 | ECM remodeling | |
| Not available | YK047_HUMAN | Q68D85 | Putative Ig-like domain-containing protein | N242 | 3 | 0 | Unknown | |
[a] Uniprot accession number.
[b] Number of triple-negative (TN) cell lines in which a glycosite was observed with this lectin.
[c] Uniprot database annotation.
[d] References in parentheses.
[e] Denotes glycoproteins observed in healthy plasma following AAL or SNA enrichment (24).

Next, we asked whether the glycosites we identified could have been predicted from transcriptome analyses. To answer this question, we used existing exon expression array profiles for all of the cell lines and RNAseq data for 9 of the 10. Since the two platforms identified similar sets of differentially expressed genes, we performed statistical analyses using values from the RNAseq experiments, which are better able to differentiate signal from noise (Supplementary Table 4). These analyses showed that 46 of the 83 mRNAs (55%) encoding the protein scaffolds that carried biomarker glycosites were upregulated ≥2-fold in triple negative vs. luminal cells. This suggested that the differential detection of these glycosites in triple negative CM samples may have been attributable to differences in relative protein abundances. In contrast, nearly half of the triple negative-specific candidates could not have been predicted from the mRNA expression data, as there was no difference in mRNA abundances between the luminal and triple negative subsets. The identification of these glycosites may have been driven by alterations in the protein glycosylation machinery of triple negative cell lines. To address this possibility, we looked for differences in mRNA levels of the transferases that add fucose (recognized by AAL) and sialic acid (recognized by SNA). The results are shown in Supplementary Table 5. Two fucosyltransferases and 8 sialyltransferases were differentially expressed, either up- or downregulated, in triple negative vs. luminal cell lines. Given that we observed both gains and losses in the expression of these enzymes, it is difficult to predict, in structural terms, the net consequences of these changes. However, our glycosite data are empirical evidence of subtype-specific glycosylation patterns in breast cancer.
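The ≥2-fold comparison described above can be illustrated with a minimal Python sketch. It is not the statistical procedure used in the study, which is not detailed in this section; the function name, the use of group means, the pseudocount, and the toy gene names and values are all assumptions made for illustration.

```python
from statistics import mean

def upregulated_genes(expr, tn_lines, lum_lines, min_fold=2.0, pseudocount=1.0):
    """Return genes whose mean expression is >= min_fold higher in TN than in luminal lines.

    expr maps gene -> {cell line -> expression value}. The pseudocount
    (an assumption) avoids division by zero for unexpressed genes.
    """
    hits = []
    for gene, values in expr.items():
        tn = mean(values[c] for c in tn_lines) + pseudocount
        lum = mean(values[c] for c in lum_lines) + pseudocount
        if tn / lum >= min_fold:
            hits.append(gene)
    return sorted(hits)

# Hypothetical toy data: two genes measured in two TN and two luminal lines.
toy_expr = {
    "GENE_A": {"T1": 30, "T2": 50, "L1": 9, "L2": 11},
    "GENE_B": {"T1": 10, "T2": 12, "L1": 10, "L2": 10},
}
toy_hits = upregulated_genes(toy_expr, ["T1", "T2"], ["L1", "L2"])
```

A scaffold that fails this test while still carrying a subtype-specific glycosite is the kind of candidate that the text attributes to altered glycosylation rather than altered mRNA abundance.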

Disease relevance of biomarker scaffolds

Initial inspection of the 100 triple negative-specific candidates showed that many targets were derived from glycoproteins that are involved in cancer-relevant processes. To explore this correlation more fully, we performed pathway analyses using two bioinformatics resources: Kyoto Encyclopedia of Genes and Genomes (KEGG) and Ingenuity Pathway Analysis (IPA). However, the programs recognized only small portions of the dataset, together matching 38% of the total proteins (Supplementary Tables 6 and 7), and most of the results were driven by only a few molecules, e.g., integrins. As an alternative, literature searches enabled assignment of biological functions to 90% of the putative triple negative-specific glycoproteins. Three prominent, interrelated themes emerged: 38% of the targets were up- or downstream components of the TGFβ pathway; 21% were involved in ECM remodeling; and at least 18% were proteinases or proteolytic targets. Minor recurring associations included the epithelial to mesenchymal transition (EMT, 9%) and bone morphogenetic protein signaling (6%).

TGFβ signaling governs important aspects of ECM remodeling and proteinase activities. Through the synthesis, cross-linking, and degradation of a variety of protein and carbohydrate matrix components, the composition and tensile strength of the ECM are modulated, both of which dramatically influence the behavior of surrounding cells 42, 43. With respect to cancer, these activities are strongly associated with increased migration and invasion. TGFβ is also considered to be a central mediator of EMT, through both canonical (i.e., Smad-dependent) and non-canonical (e.g., PI3K and MAPK) pathways 44. Cells undergoing EMT lose apical-basal polarity and stabilizing adhesive epithelial interactions in exchange for the acquisition of a more migratory mesenchymal phenotype. These changes can lead to cell invasion and metastasis, functions that have been linked to TGFβ activity 45, 46. Thus, as a group, the putative triple negative-specific targets we identified were derived from proteins with striking functional similarities and disease relevance 47. It is possible that these biomarker candidates may also suggest subtype-specific clinical targets, which currently do not exist for triple negative breast cancer 18, 19.

Clinical relevance of putative biomarker targets

The heterogeneous nature of breast cancer is widely accepted 13. Tumor subtyping is commonly based on immunohistochemical analyses of tissue sections cut from biopsies to profile expression of a marker panel—ER, PR, HER2, cytokeratin 5/6 and epidermal growth factor receptor. Increasingly, clinicians are using this information to determine prognoses and optimize treatment 48. For example, the risk prediction tool Adjuvant!Online (www.adjuvantonline.com) can be used to identify the patients who will benefit most from postoperative treatment(s). Although immunohistology-based diagnoses are changing the clinical oncology landscape and improving patient outcomes, there remains much room for advancement. Currently, subtype diagnoses require identification of a lesion, and an invasive procedure to obtain a biopsy. Therefore, the need for circulating biomarkers that serve as sentinels of breast cancer and enable subtyping remains great.

In this context, our biomarker discovery method used cancer cell line CM, i.e., the secretome, as the starting material to identify candidate glycoproteins that carried putative subtype-specific N-glycosites. For the enrichment step, we used lectin capture at the glycopeptide, rather than glycoprotein level. This approach gives more information, in terms of glycan composition and location along the peptide backbone, than other commonly used related methods (e.g., lectin chromatography at the glycoprotein level, and hydrazide- or boronic acid-mediated chemical capture of glycoproteins/glycopeptides) 24. Accordingly, we interrogated a largely unexplored biomarker discovery space. This theory is substantiated by the fact that only four of the targets that we identified were among the 150 most abundant plasma proteins as described by Hortin et al. 49. Furthermore, only 52 of the targets were among the recently published high-confidence human plasma proteome that included estimated protein concentrations 50. Of those found in this dataset, 73% were predicted to be <50 ng/mL, while 40% were likely to be <10 ng/mL, reasonably low background levels against which to observe circulating disease-derived signals. As additional support for this concept, only six of the putative triple negative-specific N-glycosites from five glycoproteins were found in a previous study in which we used the same workflows and AAL or SNA chromatography to analyze a sample of NIST pooled human plasma from 100 healthy individuals 24. These included glycosites from CD109, CD44, clusterin, extracellular matrix protein 1, and pigment epithelium-derived factor.

In summary, the workflow that we developed could serve as a blueprint for biomarker discovery. In this paradigm, an initial candidate list is developed using an easily obtained renewable material, such as cell line CM, rather than valuable, and often difficult to obtain, clinical samples such as plasma or serum. As studies that employ targeted enrichment strategies are considerably more sensitive than shotgun proteomics methods, the ability to generate a candidate biomarker list from a biologically-relevant source significantly improves the chances of success during the subsequent verification stage 51. This method may be especially useful for diseases, such as ovarian cancer, for which the cell type of origin is uncertain and, consequently, it is difficult to choose control samples 52, 53. A limitation of the method is that O-linked and intact N-linked glycopeptides are not analyzed due to the absence of universal enzymes to remove carbohydrates and the lack of sufficiently powerful software for rapid identifications, respectively. However, we do not view this as a liability. This workflow was designed as a high-throughput platform to generate biomarker candidates for subsequent verification by MRM. In general, due to heterogeneity, endogenous glycopeptides make poor MRM targets. By contrast, our method yielded a list of putative biomarker targets for direct follow up in clinical samples, and is easily accessible to any laboratory performing proteomics. Indeed, several groups have recently employed similar methods to identify candidate biomarkers of various cancers including prostate, colon, thyroid and breast 54–57. Interestingly, a few of the biomarkers that we identified were also observed in the latter study, suggesting that this general approach is reproducible and robust 54. Finally, this workflow is well suited to the development of a multiplexed clinical assay, analogous to a reverse protein array approach, with antibody capture as the first step and lectin binding as the second.

Workflow

These experiments utilized a lectin chromatography, MS-based approach that we recently optimized and published to identify candidate cancer biomarkers 24. Initially, we probed nitrocellulose transfers of electrophoretically-separated cell lysates of breast cancer lines established from triple negative and luminal tumor subtypes with a panel of nine lectins (SNA, AAL, Vicia villosa, Phaseolus vulgaris leukoagglutinating and erythroagglutinating, Galanthus nivalis, Euonymus europaeus, Lycopersicon esculentum, and Arachis hypogaea) that recognized either internal saccharide motifs or terminal sugars. The results showed that SNA (Fig. 1a) and AAL (data not shown), which bind motifs with sialic acid and fucose, respectively, reacted with a wide array of glycoproteins. Additionally, some glycoforms were enriched in lines that were derived from the tumors of the same subtype. Staining of intact non-permeabilized cells with fluorescein-conjugated SNA revealed strong surface labeling (Fig. 1b). Together, these results suggested that the breast cancer cell lines produced a large repertoire of glycoproteins that reacted with SNA or AAL, including cell-surface molecules poised to be shed or released.

Figure 1

Breast cancer cell lines have a complex repertoire of SNA-reactive glycoproteins and exhibit cell surface staining with this lectin

(A) Lysates from a panel of 8 breast cancer cell lines, which included triple negative (1–6) and luminal (7, 8) subtypes, were electrophoretically separated, transferred to nitrocellulose, and probed with SNA. Lane 1. MDAMB468, 2. HCC38, 3. HCC1500, 4. HS578T, 5. MDAMB157, 6. MDAMB231, 7. T47D, 8. UACC812. (B) Non-permeabilized HS578T cells were stained with fluorescein-conjugated SNA and imaged by fluorescence microscopy (magnification 60x).

Next, we used this workflow to compare CM samples from 5 luminal and 5 triple negative breast cancer cell lines to identify subtype-specific glycosites. The cells, listed in Table 1, are members of a well-annotated collection that have been used to define the gene expression profiles, drug sensitivities, and protein splicing patterns of the tumor types from which they were derived 20, 21, 23. In contrast to many other lectin-based approaches, the affinity capture step was performed at the glycopeptide, rather than the protein, level. This decreased non-specific binding due to hydrophobic interactions, a phenomenon that we previously observed between lectins and intact proteins. Thus, the samples were trypsin-digested prior to HPLC separation on lectin-conjugated POROS. Then, the bound fraction was treated with peptide N-glycosidase F (PNGase F) to remove N-linked glycans prior to LC-MS/MS analyses. The results were analyzed using two search engines, ProteinPilot and Mascot, to identify peptides and their corresponding proteins 28. N-glycosites were identified as described in the methods 29. Finally, each MS/MS spectrum was manually inspected for the presence of at least one fragment ion that encompassed an N-glycosylation site. Thus, this method identified both the glycosite that carries an oligosaccharide with a lectin-binding motif and the corresponding protein. These rigorous criteria were key to making this method highly reproducible 24.
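As a rough illustration of how formerly glycosylated sites can be located after PNGase F treatment, the Python sketch below finds asparagine residues within the canonical N-X-S/T sequon (X is any residue except proline) and keeps only those reported as deamidated, since PNGase F converts the glycosylated Asn to Asp (a mass shift of about +0.984 Da). The exact acceptance criteria used in the study are given in its methods and are not reproduced here; the function names and example peptides are hypothetical.

```python
import re

# N-X-S/T sequon with X != P. The lookahead lets overlapping sequons all be found.
SEQUON = re.compile(r"N(?=[^P][ST])")

def sequon_positions(peptide):
    """Return 1-based positions of Asn residues that sit in an N-X-S/T (X != P) motif."""
    return [m.start() + 1 for m in SEQUON.finditer(peptide)]

def candidate_glycosites(peptide, deamidated_positions):
    """Keep deamidated Asn positions (1-based, from the search engine) that fall in a sequon."""
    sequons = set(sequon_positions(peptide))
    return sorted(p for p in deamidated_positions if p in sequons)

# Hypothetical peptide: Asn3 is in a sequon (N-Q-T); Asn6 is followed by Pro and is rejected.
example_sites = candidate_glycosites("AANQTNPSR", [3, 6])
```

Restricting deamidated residues to sequons is a common way to reduce false positives from spontaneous deamidation, which is one reason manual inspection of fragment ions, as described above, adds confidence.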

We know from our participation in the Clinical Proteomic Technologies for Cancer (CPTAC) network that analysis of the same sample at multiple sites on different platforms is one way to maximize identifications and test the robustness of a workflow 35, 36. The experimental strategy we used, which exploited this observation, is depicted in Fig. 2. CM samples were trypsin-digested and aliquoted at a single site (Fig. 2A). Lectin enrichment and LC-MS/MS analyses were carried out according to a Standard Operating Procedure (SOP, Supplemental Document 1) at each of three locations—University of California San Francisco, Buck Institute for Research on Aging, and Purdue University (Fig. 2B). Prior to initiating the study, each group evaluated the lectin capture step using a National Institute of Standards and Technology (NIST) human pooled plasma sample, which we have extensively characterized with respect to the SNA and AAL chromatographic profiles and the glycosite composition of the bound fractions 24. MS analyses yielded glycosite identifications and percent enrichment values (total glycopeptides/total peptides) within the expected range 24.

Figure 2

The experimental workflow

(A) CM samples from breast cancer cell lines established from 5 luminal and 5 triple negative tumors were prepared in one laboratory, then distributed to 3 sites. (B) Each group separated the 10 CM samples, in duplicate, by AAL or SNA chromatography, which generated 40 fractions. The samples were deglycosylated using PNGase F and analyzed in duplicate by LC-MS/MS, yielding a total of 80 MS/MS data sets per site. (C) Files were transferred to a central location for bioinformatic analyses.

Two groups, M and X, acquired data using a QSTAR Elite QqTOF (AB Sciex), while the third, S, used an LTQ-Orbitrap (Thermo Fisher Scientific). The datasets were submitted to Site M, where all the searches and bioinformatic analyses were completed (Fig. 2C). As the work progressed, two changes to the protocol were implemented. First, due to technical problems encountered during the initial analysis, a second preparation of CM samples was analyzed at two of the three locations (M and S). Second, sites M and S replaced ZipTips® with spin-cartridges for the desalting step that followed PNGase F digestion. This change was made in response to the fact that, in initial experiments, Site S routinely identified significantly more glycosites using this desalting method. All peptides and glycopeptides observed in these experiments are presented as supplemental data (Supplementary Table 1).

Identification of >500 cell-surface or secreted glycoproteins

We tabulated the MS identifications according to the CM samples in which they were detected. Summaries of the data, including the number of glycoproteins, glycopeptides and N-glycosites observed in each CM sample, and the percent glycopeptide enrichment, are shown in Figs. 3 and 4, and in Supplementary Table 2. Overall, the three groups identified a total of 1011 distinct N-glycosites from 533 glycoproteins. Of these, 945 and 641 were observed following AAL and SNA chromatography, respectively. Interestingly, the same workflow applied to pooled healthy human plasma resulted in many fewer identifications. Approximately half the species captured from CM bound to both lectins; the remainder preferentially interacted with either AAL or SNA (Fig. 3A). A similar phenomenon was observed when the N-glycosites were grouped according to tumor subtype (Fig. 3B and C). Thus, it was clear that employing multiple lectins in our workflow resulted in a greater number of identifications. Furthermore, the data showed that the luminal and triple negative samples contained substantially different lectin-reactive species.
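The overlap between the two lectins follows from the totals quoted above by inclusion-exclusion. The short Python sketch below works through that arithmetic; it assumes only that every one of the 1011 glycosites was captured by at least one lectin, and the function and key names are illustrative.

```python
def venn_counts(n_a, n_b, n_union):
    """Split two overlapping sets into shared and exclusive counts by inclusion-exclusion."""
    both = n_a + n_b - n_union
    return {"both": both, "a_only": n_a - both, "b_only": n_b - both}

# a = AAL (945 glycosites), b = SNA (641 glycosites), union = 1011 distinct glycosites
lectin_overlap = venn_counts(945, 641, 1011)
# {'both': 575, 'a_only': 370, 'b_only': 66}
```

On these numbers, 575 of 1011 glycosites (about 57%) were captured by both lectins, in line with the statement that approximately half of the species bound to both.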

Figure 3

Diagrammatic summary of the glycosite (glycoprotein) enrichment data according to lectin type (AAL vs. SNA) and CM samples (luminal vs. triple negative) showed distinct and overlapping specificities

(A) The intersecting circles depict the total number of N-glycosites (glycoproteins) captured by each lectin. (B and C) Venn diagrams illustrating the chromatographic separation of luminal (LUM) and triple negative (TN) CM samples.

Figure 4

Lectin capture resulted in significant glycopeptide enrichment

The percent enrichment for the separations performed using AAL (left) or SNA (right) at Sites M (top), X (middle), and S (bottom). The dark line indicates the median; the box depicts the first and third quartiles; the whiskers show the minimum and maximum values observed. Sites M and X acquired data using QSTAR Elite instruments, while Site S used an Orbitrap mass spectrometer.

An overall comparison of the data obtained for luminal and triple negative samples across the three sites showed relatively high levels of enrichment in both cases (Fig. 4). Importantly, very few intracellular proteins were identified, additional evidence that the cells were not undergoing apoptosis. Approximately 90% of the glycoproteins observed reside either at the cell surface (59%) or in the extracellular matrix (29%), suggesting that our strategy of using CM as a source of secreted and/or shed glycoproteins was successful (Fig. 5). Since we wanted to identify candidate cancer biomarkers, we were interested to find that a number of the identified species have functions that are relevant to tumor biology. For example, we observed proteinases, including cathepsins and ADAM family members; adhesion molecules, including cadherins and integrins; extracellular matrix components, including decorin and SPARC; and cytokines, including leukemia inhibitory factor and vascular endothelial growth factor C. Furthermore, some of the glycoproteins had been previously identified as putative breast cancer biomarkers, including CD44, galectin-3 binding protein, insulin-like growth factor binding protein 3, and tissue inhibitor of metalloproteinase 1 37–39. We also identified clinically useful markers, such as HER2/ErbB2, and the CA-125 antigen, MUC16, which is commonly used to screen for ovarian cancer, but can also be upregulated in breast tumors 40, 41.

Figure 5

Nearly 90% of identified glycoproteins resided in the plasma membrane or extracellular compartments

A portion (241/560) of the identified glycoproteins were annotated in the cellular component of Gene Ontology. Of these, the great majority were cell surface or secreted molecules.

Identification of putative glycosite biomarkers of triple negative breast cancers

Next, we generated a list of putative triple negative-specific glycosites using resampling methods that tested 20,000 random permutations of the data. This process generated a table (Supplementary Table 3) with the number of “triple negative-specific” glycosites expected at random for any given set of selection criteria (e.g., observed in “≥1 triple negative and 0 luminal” or “≥4 triple negative and 1 luminal”). This analysis allowed us to select parameters that maximized the identification of putative triple negative-specific glycosites while controlling the false discovery rate (FDR). In this context, we required that a glycosite be identified at least once in CM samples from ≥3 triple negative cell lines and not observed in luminal CMs. Using these criteria, the computed FDR for both lectin capture steps was ~15%. This yielded 49 candidates that bound to SNA and 76 that bound to AAL (Fig. 6). Of these, we removed glycosites from highly polymorphic HLA class I histocompatibility antigens, which are variably expressed in the population. The final list of 100 glycosites, from 83 glycoproteins, that were putative triple negative-specific candidates is shown in Table 2.
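One way to picture the resampling step is as a label permutation: the subtype labels of the cell lines are shuffled many times, and the number of glycosites that meet the selection criteria by chance is averaged and compared with the number observed. The Python sketch below follows that idea. It is an assumption about the general scheme, not the authors' implementation; the function names, data layout, and toy cell line names are hypothetical.

```python
import random

def count_specific(detected, labels, min_tn=3, max_lum=0):
    """Count glycosites seen in >= min_tn TN lines and <= max_lum luminal lines.

    detected maps glycosite -> set of cell lines in which it was observed;
    labels maps cell line -> "TN" or "LUM".
    """
    n = 0
    for lines in detected.values():
        tn = sum(1 for c in lines if labels[c] == "TN")
        lum = sum(1 for c in lines if labels[c] == "LUM")
        if tn >= min_tn and lum <= max_lum:
            n += 1
    return n

def permutation_fdr(detected, labels, n_perm=20000, seed=0, **criteria):
    """Estimate FDR as (mean count under shuffled labels) / (observed count)."""
    rng = random.Random(seed)
    observed = count_specific(detected, labels, **criteria)
    cell_lines = sorted(labels)
    subtypes = [labels[c] for c in cell_lines]
    total = 0
    for _ in range(n_perm):
        rng.shuffle(subtypes)  # reassign subtype labels at random
        total += count_specific(detected, dict(zip(cell_lines, subtypes)), **criteria)
    expected = total / n_perm
    fdr = expected / observed if observed else float("nan")
    return observed, expected, fdr
```

Running the function over a grid of `min_tn` and `max_lum` values would give a table of expected chance counts per criterion, similar in spirit to the supplementary table described above.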

Figure 6

Putative triple negative-specific glycosites (glycoproteins) enriched by AAL or SNA

The criteria applied were detection in ≥3 triple negative and 0 luminal cell line CMs.

Table 2

Putative triple negative-specific glycosites captured by AAL and SNA.

| Gene name | Entry name | Accession number [a] | Glycoprotein | Glycosite(s) | Observed TN AAL [b] | Observed TN SNA [b] | Function [c] | Known fucosylation or sialylation [d] |
|---|---|---|---|---|---|---|---|---|
| NT5E | 5NTD_HUMAN | P21589 | 5′-nucleotidase | N333 | 3 | 2 | Nucleotidase | |
| APP | A4_HUMAN | P05067 | Amyloid beta A4 protein | N571 | 3 | 1 | Adhesion molecule | Yes (58, 59) |
| ANPEP | AMPN_HUMAN | P15144 | Aminopeptidase N | N265 | 0 | 3 | Protease | |
| ANTXR1 | ANTR1_HUMAN | Q96EC6 | Anthrax toxin receptor 1 | N184 | 4 | 2 | Adhesion molecule | |
| B3GNT2 | B3GN2_HUMAN | B3GNT2 | UDP-GlcNAc:betaGal beta-1,3-N-acetylglucosaminyltransferase 2 | N173 | 3 | 1 | Glycosyltransferase | |
| BMP1 | BMP1_HUMAN | P13497 | Bone morphogenetic protein 1 | N91 | 4 | 1 | Protease | Yes (60) |
| BMP1 | BMP1_HUMAN | P13497 | Bone morphogenetic protein 1 | N142 | 3 | 1 | Protease | Yes (60) |
| BTD | BTD_HUMAN | P43251 | Biotinidase | N150 | 4 | 3 | Metabolism | Yes (61) |
| CDH2 | CADH2_HUMAN | P19022 | Cadherin-2 | N325 | 3 | 0 | Adhesion molecule | Yes (62) |
| CDH2 | CADH2_HUMAN | P19022 | Cadherin-2 | N402 | 3 | 0 | Adhesion molecule | Yes (62) |
| CDH2 | CADH2_HUMAN | P19022 | Cadherin-2 | N692 | 3 | 3 | Adhesion molecule | Yes (62) |
| CTSB | CATB_HUMAN | P07858 | Cathepsin B | N38 | 5 | 4 | Protease | Yes (63) |
| CTSL1 | CATL1_HUMAN | P07711 | Cathepsin L1 | N221 | 4 | 3 | Protease | |
| CTSL2 | CATL2_HUMAN | O60911 | Cathepsin L2 | N221 | 3 | 2 | Protease | |
| CCDC80 | CCD80_HUMAN | Q8R2G6 | Coiled-coil domain-containing protein 80 | N667 | 3 | 1 | Adhesion molecule | |
| CCDC80 | CCD80_HUMAN | Q8R2G6 | Coiled-coil domain-containing protein 80 | N668 | 4 | 3 | Adhesion molecule | |
| CD109 | CD109_HUMAN | Q6YHK3 | CD109 antigen [e] | N68 | 3 | 1 | TGF-beta pathway | |
| CD109 | CD109_HUMAN | Q6YHK3 | CD109 antigen [e] | N397 | 4 | 0 | TGF-beta pathway | |
| CD44 | CD44_HUMAN | P16070 | CD44 antigen [e] | N25 | 3 | 5 | Adhesion molecule | Yes (64) |
| CGB1 | CGB1_HUMAN | A6NKQ9 | Choriogonadotropin subunit beta variant 1 | N63 | 3 | 0 | Hormone | Yes (65) |
| CLIC1 | CLIC1_HUMAN | O00299 | Chloride intracellular channel protein 1 | N42 | 4 | 2 | Ion channel | |
| CLU | CLUS_HUMAN | P10909 | Clusterin [e] | N354 | 4 | 3 | Receptor | Yes (66) |
| COL1A1 | CO1A1_HUMAN | P02452 | Collagen alpha-1 (I) chain | N1365 | 2 | 3 | ECM | |
| COL5A1 | CO5A1_HUMAN | P20908 | Collagen alpha-1 (V) chain | N176 | 3 | 4 | ECM | |
| COL6A1 | CO6A1_HUMAN | P12109 | Collagen alpha-1 (VI) chain | N804 | 3 | 3 | ECM | |
| COL6A2 | CO6A2_HUMAN | P12110 | Collagen alpha-2 (VI) chain | N140 | 2 | 3 | ECM | |
| COL6A2 | CO6A2_HUMAN | P12110 | Collagen alpha-2 (VI) chain | N785 | 3 | 3 | ECM | |
| COL12A1 | COCA1_HUMAN | Q99715 | Collagen alpha-1 (XII) chain | N2679 | 4 | 4 | ECM | |
| COL18A1 | COIA1_HUMAN | P39060 | Collagen alpha-1 (XVIII) chain | N926 | 3 | 0 | ECM | |
| CPVL | CPVL_HUMAN | Q9H3G5 | Probable serine carboxypeptidase CPVL | N346 | 3 | 1 | Protease | |
| CRIM1 | CRIM1_HUMAN | Q9NZV1 | Cysteine-rich motor neuron 1 protein | N71 | 3 | 3 | Receptor | |
| CRTAP | CRTAP_HUMAN | O75718 | Cartilage-associated protein | N87 | 3 | 2 | ECM | |
| DCBLD1 | DCBD1_HUMAN | Q8N8Z6 | Discoidin, CUB and LCCL domain-containing protein 1 | N124 | 4 | 2 | Unknown | |
| DKK3 | DKK3_HUMAN | Q9UBP4 | Dickkopf-related protein 3 | N96 | 3 | 1 | Wnt signaling pathway | |
| ECE1 | ECE1_HUMAN | P42892 | Endothelin-converting enzyme 1 | N166 | 3 | 0 | Protease | |
| ECM1 | ECM1_HUMAN | Q16610 | Extracellular matrix protein 1 [e] | N444 | 3 | 3 | Angiogenesis | |
| EXT1 | EXT1_HUMAN | Q16394 | Exostosin-1 | N330 | 4 | 2 | GAG synthesis | |
| EXT2 | EXT2_HUMAN | Q93063 | Exostosin-2 | N288 | 3 | 0 | GAG synthesis | |
| FAT1 | FAT1_HUMAN | Q14517 | Protocadherin Fat 1 | N2328 | 3 | 0 | Adhesion molecule | Yes (67) |
| FBN1 | FBN1_HUMAN | P35555 | Fibrillin-1 | N1581 | 4 | 3 | TGF-beta pathway | |
| FBN1 | FBN1_HUMAN | P35555 | Fibrillin-1 | N448 and N2767 | 4 | 4 | TGF-beta pathway | |
| FN1 | FINC_HUMAN | P02751 | Fibronectin | N430 | 3 | 3 | ECM | Yes (68) |
| FSTL1 | FSTL1_HUMAN | Q12841 | Follistatin-related protein 1 | N175 | 4 | 5 | Cell growth, Differentiation | Yes (69) |
| FSTL1 | FSTL1_HUMAN | Q12841 | Follistatin-related protein 1 | N180 | 3 | 5 | Cell growth, Differentiation | Yes (69) |
| FSTL3 | FSTL3_HUMAN | O95633 | Follistatin-related protein 3 | N215 | 4 | 3 | TGF-beta pathway | |
| FST | FST_HUMAN | P19883 | Follistatin | N288 | 3 | 2 | Hormonal regulation | Yes (70) |
| SERPINE2 | GDN_HUMAN | P07093 | Glia-derived nexin | N118 | 4 | 3 | Protease inhibitor | |
| GCNT2 | GNT2A_HUMAN | Q8N0V5 | N-acetyllactosaminide beta-1,6-N-acetylglucosaminyl-transferase, isoform A | N41 | 3 | 2 | Glycosyltransferase | |
| GPC1 | GPC1_HUMAN | P35052 | Glypican-1 | N79 | 5 | 4 | GAG | |
| GRN | GRN_HUMAN | P28799 | Granulins | N236 | 5 | 4 | Cytokine | |
| HSPA13 | HSP13_HUMAN | P48723 | Heat shock 70 kDa protein 13 | N184 | 4 | 3 | ATPase | |
| IGFBP3 | IBP3_HUMAN | P17936 | Insulin-like growth factor-binding protein 3 | N199 | 5 | 4 | Cell growth, Differentiation | |
| ICAM5 | ICAM5_HUMAN | Q9UMF0 | Intercellular adhesion molecule 5 | N646 | 3 | 2 | Adhesion molecule | Yes (71) |
| ITGA3 | ITA3_HUMAN | P26006 | Integrin alpha-3 | N265 | 3 | 0 | Adhesion molecule | Yes (72) |
| ITGA5 | ITA5_HUMAN | P08648 | Integrin alpha-5 | N868 | 3 | 1 | Adhesion molecule | |
| ITGB1 | ITB1_HUMAN | P05556 | Integrin beta-1 | N520 | 4 | 3 | Adhesion molecule | Yes (72) |
| ITGB1 | ITB1_HUMAN | P05556 | Integrin beta-1 | N669 | 4 | 3 | Adhesion molecule | Yes (72) |
| JAG1 | JAG1_HUMAN | P78504 | Protein jagged-1 | N217 | 4 | 0 | Cell growth, Differentiation | |
| LAMC1 | LAMC1_HUMAN | P11047 | Laminin subunit gamma-1 | N1205 | 4 | 3 | ECM | |
| LAMC1 | LAMC1_HUMAN | P11047 | Laminin subunit gamma-1 | N1395 | 5 | 3 | ECM | |
| LIF | LIF_HUMAN | P09056 | Leukemia inhibitory factor | N85 | 3 | 0 | Cell growth, Differentiation | |
| LOXL2 | LOXL2_HUMAN | Q9Y4K0 | Lysyl oxidase homolog 2 | N288 | 4 | 3 | ECM cross-linking | |
| LOX | LYOX_HUMAN | P28300 | Protein-lysine 6-oxidase | N81 | 4 | 2 | ECM cross-linking | |
| LOX | LYOX_HUMAN | P28300 | Protein-lysine 6-oxidase | N144 | 4 | 1 | ECM cross-linking | |
| MET | MET_HUMAN | P08581 | Hepatocyte growth factor receptor | N106 | 3 | 0 | Cell growth, Differentiation | |
| MFGE8 | MFGM_HUMAN | Q08431 | Lactadherin | N325 | 3 | 4 | Tissue homeostasis | Yes (73) |
| MICA | MICA_HUMAN | Q29983 | MHC class I polypeptide-related sequence A | N79 | 4 | 3 | Immune regulator | |
| MRC2 | MRC2_HUMAN | Q9UBG0 | C-type mannose receptor 2 | N497 | 4 | 1 | ECM remodeling | |
| OLFML3 | OLFL3_HUMAN | Q9NRN5 | Olfactomedin-like protein 3 | N177 | 4 | 3 | Development | |
| LEPRE1 | P3H1_HUMAN | Q32P28 | Prolyl 3-hydroxylase 1 | N540 | 4 | 4 | GAG | |
| SERPINF1 | PEDF_HUMAN | P36955 | Pigment epithelium-derived factor [e] | N285 | 1 | 4 | Cell growth, Differentiation | |
| PLOD2 | PLOD2_HUMAN | O00469 | Procollagen-lysine, 2-oxoglutarate 5-dioxygenase 2 | N63 | 4 | 1 | ECM cross-linking | |
| PLOD3 | PLOD3_HUMAN | O60568 | Procollagen-lysine, 2-oxoglutarate 5-dioxygenase 3 | N548 | 3 | 0 | ECM cross-linking | |
| PLTP | PLTP_HUMAN | P55058 | Phospholipid transfer protein | N143 | 3 | 3 | Lipid metabolism | |
PLTPPLTP_HUMAN{"type":"entrez-protein","attrs":{"text":"P55058","term_id":"1709662","term_text":"P55058"}}P55058Phospholipid transfer proteinN39833Lipid metabolism
POSTNPOSTN_HUMAN{"type":"entrez-protein","attrs":{"text":"Q15063","term_id":"93138709","term_text":"Q15063"}}Q15063PeriostinN59943Adhesion molecule
PPGBPPGB_HUMAN{"type":"entrez-protein","attrs":{"text":"P10619","term_id":"20178316","term_text":"P10619"}}P10619Lysosomal protective proteinN33314Glycan degradation
PRNPPRIO_HUMAN{"type":"entrez-protein","attrs":{"text":"P04156","term_id":"130912","term_text":"P04156"}}P04156Major prion proteinN18154UnknownYes (74)
PTK7PTK7_HUMAN{"type":"entrez-protein","attrs":{"text":"Q13308","term_id":"116242736","term_text":"Q13308"}}Q13308Tyrosine-protein kinase-like 7N40551Adhesion molecule
PTK7PTK7_HUMAN{"type":"entrez-protein","attrs":{"text":"Q13308","term_id":"116242736","term_text":"Q13308"}}Q13308Tyrosine-protein kinase-like 7N56753Adhesion molecule
PVRPVR_HUMAN{"type":"entrez-protein","attrs":{"text":"P15151","term_id":"1346922","term_text":"P15151"}}P15151Poliovirus receptorN12043Immune regulator
SEZ6L2SE6L2_HUMAN{"type":"entrez-protein","attrs":{"text":"Q6UXD5","term_id":"334302856","term_text":"Q6UXD5"}}Q6UXD5Seizure 6-like protein 2N24732
SEZ6L2SE6L2_HUMAN{"type":"entrez-protein","attrs":{"text":"Q6UXD5","term_id":"334302856","term_text":"Q6UXD5"}}Q6UXD5Seizure 6-like protein 2N37331Unknown
SPARCSPRC_HUMAN{"type":"entrez-protein","attrs":{"text":"P09486","term_id":"129283","term_text":"P09486"}}P09486SPARC (Osteonectin)N11653Cell growth, DifferentiationYes (75)
SUSD5SUSD5_HUMAN{"type":"entrez-protein","attrs":{"text":"O60279","term_id":"182676443","term_text":"O60279"}}O60279Sushi domain-containing protein 5N35444Unknown
ABI3BPTARSH_HUMAN{"type":"entrez-protein","attrs":{"text":"Q7Z7G0","term_id":"50401533","term_text":"Q7Z7G0"}}Q7Z7G0Target of Nesh-SH3N4432Cell migration
TFPITFPI1_HUMAN{"type":"entrez-protein","attrs":{"text":"P10646","term_id":"125932","term_text":"P10646"}}P10646Tissue factor pathway inhibitorN14554Complement cascadeYes (76)
TGFB1TGFB1_HUMAN{"type":"entrez-protein","attrs":{"text":"P01137","term_id":"135674","term_text":"P01137"}}P01137Transforming growth factor beta-1N8241TGF-beta pathwayYes (77)
TGFB2TGFB2_HUMAN{"type":"entrez-protein","attrs":{"text":"P61812","term_id":"48429157","term_text":"P61812"}}P61812Transforming growth factor beta-2N24154TGF-beta pathway
THBS3TSP3_HUMAN{"type":"entrez-protein","attrs":{"text":"P49746","term_id":"1717814","term_text":"P49746"}}P49746Thrombospondin-3N40732Adhesion molecule
TWSG1TWSG1_HUMAN{"type":"entrez-protein","attrs":{"text":"Q9GZX9","term_id":"74733506","term_text":"Q9GZX9"}}Q9GZX9Twisted gastrulation protein homolog 1N5232Cell growth, Differentiation
TXNDC15TXD15_HUMAN{"type":"entrez-protein","attrs":{"text":"Q96J42","term_id":"74732127","term_text":"Q96J42"}}Q96J42Thioredoxin domain-containing protein 15N29332Unknown
AXLUFO_HUMAN{"type":"entrez-protein","attrs":{"text":"P30530","term_id":"1375383940","term_text":"P30530"}}P30530Tyrosine-protein kinase receptor UFON4344Receptor
AXLUFO_HUMAN{"type":"entrez-protein","attrs":{"text":"P30530","term_id":"1375383940","term_text":"P30530"}}P30530Tyrosine-protein kinase receptor UFON15742Receptor
AXLUFO_HUMAN{"type":"entrez-protein","attrs":{"text":"P30530","term_id":"1375383940","term_text":"P30530"}}P30530Tyrosine-protein kinase receptor UFON19842Receptor
AXLUFO_HUMAN{"type":"entrez-protein","attrs":{"text":"P30530","term_id":"1375383940","term_text":"P30530"}}P30530Tyrosine-protein kinase receptor UFON33943Receptor
PLAURUPAR_HUMAN{"type":"entrez-protein","attrs":{"text":"Q03405","term_id":"465003","term_text":"Q03405"}}Q03405Urokinase plasminogen activator surface receptorN22223ECM remodelingYes (78)
PLAUUROK_HUMAN{"type":"entrez-protein","attrs":{"text":"P00749","term_id":"254763341","term_text":"P00749"}}P00749Urokinase-type plasminogen activatorN32244ECM remodeling
Not availableYK047_HUMAN{"type":"entrez-protein","attrs":{"text":"Q68D85","term_id":"74708829","term_text":"Q68D85"}}Q68D85Putative Ig-like domain-containing proteinN24230Unknown
Uniprot accession number.
Number of triple-negative (TN) cell lines in which a glycosite was observed with this lectin.
Uniprot database annotation.
References in parentheses.
Denotes glycoproteins observed in healthy plasma following AAL or SNA enrichment (24).

Next, we asked whether the glycosites we identified could have been predicted from transcriptome analyses. To answer this question, we used existing exon expression array profiles for all of the cell lines and RNAseq data for 9 of the 10. Since the two platforms identified similar sets of differentially expressed genes, we performed statistical analyses using values from the RNAseq experiments, which are better able to differentiate signal from noise (Supplementary Table 4). These analyses showed that 46 of the 83 mRNAs encoding the protein scaffolds that carried biomarker glycosites were upregulated ≥ 2-fold in triple negative vs. luminal cells. This suggested that the differential detection of these glycosites in triple negative CM samples may have been attributable to differences in relative protein abundances. In contrast, more than half of the triple negative-specific candidates could not have been predicted from the mRNA expression data, as there was no difference in mRNA abundances between the luminal and triple negative subsets. The identification of these glycosites may have been driven by alterations in the protein glycosylation machinery of triple negative cell lines. To address this possibility, we looked for differences in mRNA levels of the transferases that add fucose (recognized by AAL), and sialic acid (recognized by SNA). The results are shown in Supplementary Table 5. Two fucosyltransferases and 8 sialyltransferases were differentially expressed, either up or downregulated, in triple negative vs. luminal cell lines. Given that we observed both gains and losses of enzymatic activity, it is difficult to predict, in structural terms, the net consequences of these changes. However, our glycosite data are empirical evidence of subtype-specific glycosylation patterns in breast cancer.
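
The ≥ 2-fold criterion described above can be illustrated with a minimal sketch. It assumes that a simple ratio of subtype means stands in for the statistical analysis actually performed (Supplementary Table 4), which is not reproduced here. The gene labels, the expression values, and the helper names `fold_change` and `is_upregulated` are hypothetical and are not taken from the study.

```python
from statistics import mean

FOLD_THRESHOLD = 2.0  # the >= 2-fold cutoff quoted in the text


def fold_change(tn_values, luminal_values):
    """Ratio of mean expression in triple negative vs. luminal cell lines."""
    return mean(tn_values) / mean(luminal_values)


def is_upregulated(tn_values, luminal_values, threshold=FOLD_THRESHOLD):
    """True when the triple negative mean is at least `threshold` times the luminal mean."""
    return fold_change(tn_values, luminal_values) >= threshold


# Hypothetical expression values (arbitrary units); NOT data from this study.
expression = {
    "GENE_A": {"tn": [40.0, 55.0, 62.0, 48.0], "luminal": [10.0, 12.0, 9.0, 11.0, 8.0]},
    "GENE_B": {"tn": [20.0, 22.0, 19.0, 21.0], "luminal": [18.0, 23.0, 20.0, 19.0, 20.0]},
}

# Genes whose glycosite detection could plausibly be predicted from mRNA abundance alone.
predicted_by_mrna = [
    gene for gene, v in expression.items()
    if is_upregulated(v["tn"], v["luminal"])
]
print(predicted_by_mrna)

# Fraction of scaffold mRNAs reported as upregulated in the text (46 of 83).
fraction_upregulated = 46 / 83
print(f"{fraction_upregulated:.0%}")
```

In this sketch, a gene such as the hypothetical GENE_B, with similar mean expression in both subtypes, would fall into the group whose glycosites could not have been predicted from transcript levels.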

Disease relevance of biomarker scaffolds

Initial inspection of the 100 triple negative-specific candidates showed that many targets were derived from glycoproteins that are involved in cancer-relevant processes. To more fully explore this correlation, we performed pathway analyses using two bioinformatics resources: Kyoto Encyclopedia of Genes and Genomes (KEGG) and Ingenuity Pathway Analysis (IPA). However, the programs recognized only small portions of the dataset, together matching 38% of the total proteins (Supplementary Tables 6 and 7), and most of the results were driven by only a few molecules, e.g., integrins. As an alternative, literature searches enabled assignment of biological functions to 90% of the putative triple negative-specific glycoproteins. Three prominent, interrelated themes emerged: 38% of the targets were up- or downstream components of the TGFβ pathway; 21% were involved in ECM remodeling; and at least 18% were proteinases or proteolytic targets. Minor recurring associations included the epithelial to mesenchymal transition (EMT, 9%) and bone morphogenetic protein signaling (6%).

TGFβ signaling governs important aspects of ECM remodeling and proteinase activities. Through the synthesis, cross-linking, and degradation of a variety of protein and carbohydrate matrix components, the composition and tensile strength of the ECM are modulated, both of which dramatically influence the behavior of surrounding cells 42, 43. With respect to cancer, these activities are strongly associated with increased migration and invasion. TGFβ is also considered to be a central mediator of EMT, through both canonical (i.e., Smad-dependent) and non-canonical (e.g., PI3K and MAPK) pathways 44. Cells undergoing EMT lose apical-basal polarity and stabilizing adhesive epithelial interactions in exchange for the acquisition of a more migratory mesenchymal phenotype. These changes can lead to cell invasion and metastasis, functions that have been linked to TGFβ activity 45, 46. Thus, as a group, the putative triple negative-specific targets we identified were derived from proteins with striking functional similarities and disease relevance 47. These biomarker candidates may also suggest subtype-specific clinical targets, which currently do not exist for triple negative breast cancer 18, 19.

Clinical relevance of putative biomarker targets

The heterogeneous nature of breast cancer is widely accepted 13. Tumor subtyping is commonly based on immunohistochemical analyses of tissue sections cut from biopsies to profile expression of a marker panel—ER, PR, HER2, cytokeratin 5/6 and epidermal growth factor receptor. Increasingly, clinicians are using this information to determine prognoses and optimize treatment 48. For example, the risk prediction tool Adjuvant!Online (www.adjuvantonline.com) can be used to identify the patients who will benefit most from postoperative treatment(s). Although immunohistology-based diagnoses are changing the clinical oncology landscape and improving patient outcomes, there remains much room for advancement. Currently, subtype diagnoses require identification of a lesion, and an invasive procedure to obtain a biopsy. Therefore, the need for circulating biomarkers that serve as sentinels of breast cancer and enable subtyping remains great.

In this context, our biomarker discovery method used cancer cell line CM, i.e., the secretome, as the starting material to identify candidate glycoproteins that carried putative subtype-specific N-glycosites. For the enrichment step, we used lectin capture at the glycopeptide, rather than glycoprotein level. This approach gives more information, in terms of glycan composition and location along the peptide backbone, than other commonly used related methods (e.g., lectin chromatography at the glycoprotein level, and hydrazide- or boronic acid-mediated chemical capture of glycoproteins/glycopeptides) 24. Accordingly, we interrogated a largely unexplored biomarker discovery space. This theory is substantiated by the fact that only four of the targets that we identified were among the 150 most abundant plasma proteins as described by Hortin et al. 49. Furthermore, only 52 of the targets were among the recently published high-confidence human plasma proteome that included estimated protein concentrations 50. Of those found in this dataset, 73% were predicted to be <50 ng/mL, while 40% were likely to be <10 ng/mL, reasonably low background levels against which to observe circulating disease-derived signals. As additional support for this concept, only six of the putative triple negative-specific N-glycosites from five glycoproteins were found in a previous study in which we used the same workflows and AAL or SNA chromatography to analyze a sample of NIST pooled human plasma from 100 healthy individuals 24. These included glycosites from CD109, CD44, clusterin, extracellular matrix protein 1, and pigment epithelium-derived factor.

In summary, the workflow that we developed could serve as a blueprint for biomarker discovery. In this paradigm, an initial candidate list is developed using an easily obtained renewable material, such as cell line CM, rather than valuable, and often difficult to obtain, clinical samples such as plasma or serum. As studies that employ targeted enrichment strategies are considerably more sensitive than shotgun proteomics methods, the ability to generate a candidate biomarker list from a biologically relevant source significantly improves the chances of success during the subsequent verification stage 51. This method may be especially useful for diseases, such as ovarian cancer, for which the cell type of origin is uncertain and, consequently, it is difficult to choose control samples 52, 53. A limitation of the method is that O-linked and intact N-linked glycopeptides are not analyzed due to the absence of universal enzymes to remove carbohydrates and the lack of sufficiently powerful software for rapid identifications, respectively. However, we do not view this as a liability. This workflow was designed as a high-throughput platform to generate biomarker candidates for subsequent verification by MRM. In general, due to heterogeneity, endogenous glycopeptides make poor MRM targets. By contrast, our method yielded a list of putative biomarker targets for direct follow-up in clinical samples, and is easily accessible to any laboratory performing proteomics. Indeed, several groups have recently employed similar methods to identify candidate biomarkers of various cancers including prostate, colon, thyroid and breast 54–57. Interestingly, a few of the biomarkers that we identified were also observed in the latter study, suggesting that this general approach is reproducible and robust 54. Finally, this workflow is well suited to the development of a multiplexed clinical assay, analogous to a reverse protein array approach, with antibody capture as the first step and lectin binding as the second.

Acknowledgments

We thank Ms. Tiffany Sham for excellent assistance formatting tables. This work was supported by an NCRR shared instrumentation grant S10 RR024615 (BWG) and by grants from the National Cancer Institute, U24 CA126477 (SJF) and a U24 Subcontract (BWG) that are part of the NCI Clinical Proteomic Technologies for Cancer initiative (http://proteomics.cancer.gov). Additional support was provided by the Director, Office of Science, Office of Biological & Environmental Research, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231, by the National Institutes of Health, National Cancer Institute grants P50 CA 58207, the U54 CA 112970, the U24 CA 126477 and the NIH NHGRI U24 CA 126551 for JWG. A portion of the mass spectrometric analyses was performed in the UCSF Sandler-Moore Mass Spectrometry Core Facility, which acknowledges support from the Sandler Family Foundation, the Gordon and Betty Moore Foundation, and NIH/NCI Cancer Center Support Grant P30 CA082103. OLG is supported by the Canadian Institutes of Health Research and the Stand Up To Cancer-American Association for Cancer Research Dream Team Translational Cancer Research Grant SU2C-AACR-DT0409.

Department of Obstetrics, Gynecology and Reproductive Sciences, 513 Parnassus Ave., Box 0665, University of California San Francisco, San Francisco, CA 94143
Buck Institute for Research on Aging, 8001 Redwood Blvd., Novato, CA 94945
Department of Chemistry and Bindley Bioscience Center, 201 S. University St. HANS B054, Purdue University, West Lafayette, IN 47907
Bio-Nano Chemistry, Wonkwang University, 344-2 Shinyong-dong, Iksan, Jonbuk 570-749, Korea
Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720
Department of Biomedical Engineering, Oregon Health and Science University, Portland, OR 97238
Department of Pharmaceutical Chemistry, Box 0446, University of California, San Francisco, CA 94143
To whom correspondence should be addressed: Susan Fisher, phone: (415) 476-5297, fax: (415) 476-5623, sfisher@cgl.ucsf.edu; Bradford W. Gibson, phone: (415) 209-2032, fax: (415) 209-2231, bgibson@buckinstitute.org

References

  • 1. Drake PM, Cho W, Li B, Prakobphol A, Johansen E, Anderson NL, Regnier FE, Gibson BW, Fisher SJ. Sweetening the Pot: Adding Glycosylation to the Biomarker Discovery Equation. Clin Chem. 2010;56:223–236.
  • 2. Hart GW, Copeland RJ. Glycomics hits the big time. Cell. 2010;143(5):672–6.
  • 3. Clowers BH, Dodds ED, Seipert RR, Lebrilla CB. Site determination of protein glycosylation based on digestion with immobilized nonspecific proteases and Fourier transform ion cyclotron resonance mass spectrometry. J Proteome Res. 2007;6(10):4032–40.
  • 4. Duffy MJ, Evoy D, McDermott EW. CA 15–3: uses and limitation as a biomarker for breast cancer. Clin Chim Acta. 2010;411(23–24):1869–74.
  • 5. Orntoft TF, Vestergaard EM. Clinical aspects of altered glycosylation of glycoproteins in cancer. Electrophoresis. 1999;20(2):362–71.
  • 6. Hammarstrom S. The carcinoembryonic antigen (CEA) family: structures, suggested functions and expression in normal and malignant tissues. Semin Cancer Biol. 1999;9(2):67–81.
  • 7. Meany DL, Zhang Z, Sokoll LJ, Zhang H, Chan DW. Glycoproteomics for prostate cancer detection: changes in serum PSA glycosylation patterns. J Proteome Res. 2009;8(2):613–9.
  • 8. Moss EL, Hollingworth J, Reynolds TM. The role of CA125 in clinical practice. J Clin Pathol. 2005;58(3):308–12.
  • 9. Witz IP. The selectin-selectin ligand axis in tumor progression. Cancer Metastasis Rev. 2008;27(1):19–30.
  • 10. Rosen SD. Ligands for L-selectin: homing, inflammation, and beyond. Annu Rev Immunol. 2004;22:129–56.
  • 11. Perou CM, Borresen-Dale AL. Systems Biology and Genomics of Breast Cancer. Cold Spring Harb Perspect Biol. 2010.
  • 12. O’Brien KM, Cole SR, Tse CK, Perou CM, Carey LA, Foulkes WD, Dressler LG, Geradts J, Millikan RC. Intrinsic breast tumor subtypes, race, and long-term survival in the Carolina Breast Cancer Study. Clin Cancer Res. 2010;16(24):6100–10.
  • 13. Espinosa E, Vara JA, Navarro IS, Gamez-Pozo A, Pinto A, Zamora P, Redondo A, Feliu J. Gene profiling in breast cancer: Time to move forward. Cancer Treat Rev. 2011.
  • 14. Prat A, Parker JS, Karginova O, Fan C, Livasy C, Herschkowitz JI, He X, Perou CM. Phenotypic and molecular characterization of the claudin-low intrinsic subtype of breast cancer. Breast Cancer Res. 2010;12(5):R68.
  • 15. Toft DJ, Cryns VL. Minireview: Basal-like breast cancer: from molecular profiles to targeted therapies. Mol Endocrinol. 2011;25(2):199–211.
  • 16. Abramson V, Arteaga CL. New strategies in HER2-overexpressing breast cancer: Many combinations of targeted drugs available. Clin Cancer Res. 2011.
  • 17. McDermott U, Settleman J. Personalized cancer therapy with selective kinase inhibitors: an emerging paradigm in medical oncology. J Clin Oncol. 2009;27(33):5650–9.
  • 18. Yagata H, Kajiura Y, Yamauchi H. Current strategy for triple-negative breast cancer: appropriate combination of surgery, radiation, and chemotherapy. Breast Cancer. 2011.
  • 19. Pal SK, Childs BH, Pegram M. Triple negative breast cancer: unmet medical needs. Breast Cancer Res Treat. 2011;125(3):627–36.
  • 20. Lapuk A, Marr H, Jakkula L, Pedro H, Bhattacharya S, Purdom E, Hu Z, Simpson K, Pachter L, Durinck S, Wang N, Parvin B, Fontenay G, Speed T, Garbe J, Stampfer M, Bayandorian H, Dorton S, Clark TA, Schweitzer A, Wyrobek A, Feiler H, Spellman P, Conboy J, Gray JW. Exon-level microarray analyses identify alternative splicing programs in breast cancer. Mol Cancer Res. 2010;8(7):961–74.
  • 21. Neve RM, Chin K, Fridlyand J, Yeh J, Baehner FL, Fevr T, Clark L, Bayani N, Coppe JP, Tong F, Speed T, Spellman PT, DeVries S, Lapuk A, Wang NJ, Kuo WL, Stilwell JL, Pinkel D, Albertson DG, Waldman FM, McCormick F, Dickson RB, Johnson MD, Lippman M, Ethier S, Gazdar A, Gray JW. A collection of breast cancer cell lines for the study of functionally distinct cancer subtypes. Cancer Cell. 2006;10(6):515–27.
  • 22. Korkola J, Gray JW. Breast cancer genomes--form and function. Curr Opin Genet Dev. 2010;20(1):4–14.
  • 23. Kuo WL, Das D, Ziyad S, Bhattacharya S, Gibb WJ, Heiser LM, Sadanandam A, Fontenay GV, Hu Z, Wang NJ, Bayani N, Feiler HS, Neve RM, Wyrobek AJ, Spellman PT, Marton LJ, Gray JW. A systems analysis of the chemosensitivity of breast cancer cells to the polyamine analogue PG-11047. BMC Med. 2009;7:77.
  • 24. Drake PM, Schilling B, Niles RK, Braten M, Johansen E, Liu H, Lerch M, Sorensen DJ, Li B, Allen S, Hall SC, Witkowska HE, Regnier FE, Gibson BW, Fisher SJ. A lectin affinity workflow targeting glycosite-specific, cancer-related carbohydrate structures in trypsin-digested human plasma. Anal Biochem. 2011;408(1):71–85.
  • 25. Yingling JM, Blanchard KL, Sawyer JS. Development of TGF-beta signalling inhibitors for cancer therapy. Nat Rev Drug Discov. 2004;3(12):1011–22.
  • 26. Janatpour MJ, McMaster MT, Genbacev O, Zhou Y, Dong J, Cross JC, Israel MA, Fisher SJ. Id-2 regulates critical aspects of human cytotrophoblast differentiation, invasion and migration. Development. 2000;127(3):549–58.
  • 27. Keshishian H, Addona T, Burgess M, Kuhn E, Carr SA. Quantitative, multiplexed assays for low abundance proteins in plasma by targeted mass spectrometry and stable isotope dilution. Mol Cell Proteomics. 2007;6(12):2212–29.
  • 28. Shilov IV, Seymour SL, Patel AA, Loboda A, Tang WH, Keating SP, Hunter CL, Nuwaysir LM, Schaeffer DA. The Paragon Algorithm, a next generation search engine that uses sequence temperature values and feature probabilities to identify peptides from tandem mass spectra. Mol Cell Proteomics. 2007;6(9):1638–55.
  • 29. Krokhin OV, Antonovici M, Ens W, Wilkins JA, Standing KG. Deamidation of -Asn-Gly- sequences during sample preparation for proteomics: Consequences for MALDI and HPLC-MALDI analysis. Anal Chem. 2006;78(18):6645–50.
  • 30. Link AJ, Eng J, Schieltz DM, Carmack E, Mize GJ, Morris DR, Garvik BM, Yates JR 3rd. Direct analysis of protein complexes using mass spectrometry. Nat Biotechnol. 1999;17(7):676–82.
  • 31. Edgington ES. Randomization tests. 3rd ed. Marcel-Dekker; New York: 1995.
  • 32. MacLean B, Tomazela DM, Shulman N, Chambers M, Finney GL, Frewen B, Kern R, Tabb DL, Liebler DC, MacCoss MJ. Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics. 2010;26(7):966–8.
  • 33. Griffith M, Griffith OL, Mwenifumbo J, Goya R, Morrissy AS, Morin RD, Corbett R, Tang MJ, Hou YC, Pugh TJ, Robertson G, Chittaranjan S, Ally A, Asano JK, Chan SY, Li HI, McDonald H, Teague K, Zhao Y, Zeng T, Delaney A, Hirst M, Morin GB, Jones SJ, Tai IT, Marra MA. Alternative expression analysis by RNA sequencing. Nat Methods. 2010;7(10):843–7.
  • 34. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B (Methodological). 1995;57(1):289–300.
  • 35. Addona TA, Abbatiello SE, Schilling B, Skates SJ, Mani DR, Bunk DM, Spiegelman CH, Zimmerman LJ, Ham AJ, Keshishian H, Hall SC, Allen S, Blackman RK, Borchers CH, Buck C, Cardasis HL, Cusack MP, Dodder NG, Gibson BW, Held JM, Hiltke T, Jackson A, Johansen EB, Kinsinger CR, Li J, Mesri M, Neubert TA, Niles RK, Pulsipher TC, Ransohoff D, Rodriguez H, Rudnick PA, Smith D, Tabb DL, Tegeler TJ, Variyath AM, Vega-Montoto LJ, Wahlander A, Waldemarson S, Wang M, Whiteaker JR, Zhao L, Anderson NL, Fisher SJ, Liebler DC, Paulovich AG, Regnier FE, Tempst P, Carr SA. Multi-site assessment of the precision and reproducibility of multiple reaction monitoring-based measurements of proteins in plasma. Nat Biotechnol. 2009;27(7):633–41.
  • 36. Tabb DL, Vega-Montoto L, Rudnick PA, Variyath AM, Ham AJ, Bunk DM, Kilpatrick LE, Billheimer DD, Blackman RK, Cardasis HL, Carr SA, Clauser KR, Jaffe JD, Kowalski KA, Neubert TA, Regnier FE, Schilling B, Tegeler TJ, Wang M, Wang P, Whiteaker JR, Zimmerman LJ, Fisher SJ, Gibson BW, Kinsinger CR, Mesri M, Rodriguez H, Stein SE, Tempst P, Paulovich AG, Liebler DC, Spiegelman C. Repeatability and reproducibility in proteomic identifications by liquid chromatography-tandem mass spectrometry. J Proteome Res. 2010;9(2):761–76.
  • 37. Wu ZS, Wu Q, Yang JH, Wang HQ, Ding XD, Yang F, Xu XC. Prognostic significance of MMP-9 and TIMP-1 serum and tissue expression in breast cancer. Int J Cancer. 2008;122(9):2050–6.
  • 38. Wang Y, Ao X, Vuong H, Konanur M, Miller FR, Goodison S, Lubman DM. Membrane glycoproteins associated with breast tumor cell progression identified by a lectin affinity approach. J Proteome Res. 2008;7(10):4313–25.
  • 39. Baricevic I, Masnikosa R, Lagundzin D, Golubovic V, Nedic O. Alterations of insulin-like growth factor binding protein 3 (IGFBP-3) glycosylation in patients with breast tumours. Clin Biochem. 2010;43(9):725–31.
  • 40. Bast RC Jr, Xu FJ, Yu YH, Barnhill S, Zhang Z, Mills GB. CA 125: the past and the future. Int J Biol Markers. 1998;13(4):179–87.
  • 41. Yin BW, Lloyd KO. Molecular cloning of the CA125 ovarian cancer antigen: identification as a new mucin, MUC16. J Biol Chem. 2001;276(29):27371–5.
  • 42. Yu H, Mouw JK, Weaver VM. Forcing form and function: biomechanical regulation of tumor evolution. Trends Cell Biol. 2011;21(1):47–56.
  • 43. Rowe RG, Weiss SJ. Navigating ECM barriers at the invasive front: the cancer cell-stroma interface. Annu Rev Cell Dev Biol. 2009;25:567–95.
  • 44. Viloria-Petit AM, Wrana JL. The TGFbeta-Par6 polarity pathway: linking the Par complex to EMT and breast cancer progression. Cell Cycle. 2010;9(4):623–4.
  • 45. Barcellos-Hoff MH, Akhurst RJ. Transforming growth factor-beta in breast cancer: too much, too late. Breast Cancer Res. 2009;11(1):202.
  • 46. Bergers G, Javaherian K, Lo KM, Folkman J, Hanahan D. Effects of angiogenesis inhibitors on multistage carcinogenesis in mice. Science. 1999;284(5415):808–12.
  • 47. Hanahan D, Weinberg RA. Hallmarks of cancer: the next generation. Cell. 2011;144(5):646–74.
  • 48. Blows FM, Driver KE, Schmidt MK, Broeks A, van Leeuwen FE, Wesseling J, Cheang MC, Gelmon K, Nielsen TO, Blomqvist C, Heikkila P, Heikkinen T, Nevanlinna H, Akslen LA, Begin LR, Foulkes WD, Couch FJ, Wang X, Cafourek V, Olson JE, Baglietto L, Giles GG, Severi G, McLean CA, Southey MC, Rakha E, Green AR, Ellis IO, Sherman ME, Lissowska J, Anderson WF, Cox A, Cross SS, Reed MW, Provenzano E, Dawson SJ, Dunning AM, Humphreys M, Easton DF, Garcia-Closas M, Caldas C, Pharoah PD, Huntsman D. Subtyping of breast cancer by immunohistochemistry to investigate a relationship between subtype and short and long term survival: a collaborative analysis of data for 10,159 cases from 12 studies. PLoS Med. 2010;7(5):e1000279.
  • 49. Hortin GL, Sviridov D, Anderson NL. High-abundance polypeptides of the human plasma proteome comprising the top 4 logs of polypeptide abundance. Clin Chem. 2008;54(10):1608–16.
  • 50. Farrah T, Deutsch EW, Omenn GS, Campbell DS, Sun Z, Bletz JA, Mallick P, Katz JE, Malmstrom J, Ossola R, Watts JD, Lin B, Zhang H, Moritz RL, Aebersold RH. A high-confidence human plasma proteome reference set with estimated concentrations in PeptideAtlas. Mol Cell Proteomics. 2011.
  • 51. Rifai N, Gillette MA, Carr SA. Protein biomarker discovery and validation: the long and uncertain path to clinical utility. Nat Biotechnol. 2006;24(8):971–83.
  • 52. Lengyel E. Ovarian cancer development and metastasis. Am J Pathol. 2010;177(3):1053–64.
  • 53. Vang R, Shih Ie M, Kurman RJ. Ovarian low-grade and high-grade serous carcinoma: pathogenesis, clinicopathologic and molecular biologic features, and diagnostic problems. Adv Anat Pathol. 2009;16(5):267–82.
  • 54. Ahn Y, Kang UB, Kim J, Lee C. Mining of serum glycoproteins by an indirect approach using cell line secretome. Mol Cells. 2010;29(2):123–30.
  • 55. Arcinas A, Yen TY, Kebebew E, Macher BA. Cell surface and secreted protein profiles of human thyroid cancer cell lines reveal distinct glycoprotein patterns. J Proteome Res. 2009;8(8):3958–68.
  • 56. Rangiah K, Tippornwong M, Sangar V, Austin D, Tetreault MP, Rustgi AK, Blair IA, Yu KH. Differential secreted proteome approach in murine model for candidate biomarker discovery in colon cancer. J Proteome Res. 2009;8(11):5153–64.
  • 57. Sardana G, Jung K, Stephan C, Diamandis EP. Proteomic analysis of conditioned media from the PC3, LNCaP, and 22Rv1 prostate cancer cell lines: discovery and validation of candidate prostate cancer biomarkers. J Proteome Res. 2008;7(8):3329–38.
  • 58. Akasaka-Manya K, Manya H, Sakurai Y, Wojczyk BS, Spitalnik SL, Endo T. Increased bisecting and core-fucosylated N-glycans on mutant human amyloid precursor proteins. Glycoconj J. 2008;25(8):775–86.
  • 59. Nakagawa K, Kitazume S, Oka R, Maruyama K, Saido TC, Sato Y, Endo T, Hashimoto Y. Sialylation enhances the secretion of neurotoxic amyloid-beta peptides. J Neurochem. 2006;96(4):924–33.
  • 60. Garrigue-Antar L, Hartigan N, Kadler KE. Post-translational modification of bone morphogenetic protein-1 is required for secretion and stability of the protein. J Biol Chem. 2002;277(45):43327–34.
  • 61. Wolf B. Biotinidase Deficiency: New Directions and Practical Concerns. Curr Treat Options Neurol. 2003;5(4):321–328.
  • 62. Ciolczyk-Wierzbicka D, Amoresano A, Casbarra A, Hoja-Lukowicz D, Litynska A, Laidler P. The structure of the oligosaccharides of N-cadherin from human melanoma cell lines. Glycoconj J. 2004;20(7–8):483–92.
  • 63. Takahashi T, Schmidt PG, Tang J. Novel carbohydrate structures of cathepsin B from porcine spleen. J Biol Chem. 1984;259(10):6059–62.
  • 64. Dimitroff CJ, Lee JY, Rafii S, Fuhlbrigge RC, Sackstein RCD44 is a major E-selectin ligand on human hematopoietic progenitor cells. J Cell Biol. 2001;153(6):1277–86.[Google Scholar]
  • 65. Elliott MM, Kardana A, Lustbader JW, Cole LACarbohydrate and peptide structure of the alpha- and beta-subunits of human chorionic gonadotropin from normal and aberrant pregnancy and choriocarcinoma. Endocrine. 1997;7(1):15–32.[PubMed][Google Scholar]
  • 66. Kapron JT, Hilliard GM, Lakins JN, Tenniswood MP, West KA, Carr SA, Crabb JWIdentification and characterization of glycosylation sites in human serum clusterin. Protein Sci. 1997;6(10):2120–33.[Google Scholar]
  • 67. Goldberg M, Peshkovsky C, Shifteh A, Al-Awqati Qmu-Protocadherin, a novel developmentally regulated protocadherin with mucin-like domains. J Biol Chem. 2000;275(32):24622–9.[PubMed][Google Scholar]
  • 68. Hirnle L, Katnik-Prastowska IAmniotic fibronectin fragmentation and expression of its domains, sialyl and fucosyl glycotopes associated with pregnancy complicated by intrauterine infection. Clin Chem Lab Med. 2007;45(2):208–14.[PubMed][Google Scholar]
  • 69. Miyamae T, Marinov AD, Sowders D, Wilson DC, Devlin J, Boudreau R, Robbins P, Hirsch RFollistatin-like protein-1 is a novel proinflammatory molecule. J Immunol. 2006;177(7):4758–62.[PubMed][Google Scholar]
  • 70. Hyuga M, Itoh S, Kawasaki N, Ohta M, Ishii A, Hyuga S, Hayakawa TAnalysis of site-specific glycosylation in recombinant human follistatin expressed in Chinese hamster ovary cells. Biologicals. 2004;32(2):70–7.[PubMed][Google Scholar]
  • 71. Ohgomori T, Funatsu O, Nakaya S, Morita A, Ikekita MStructural study of the N-glycans of intercellular adhesion molecule-5 (telencephalin) Biochim Biophys Acta. 2009;1790(12):1611–23.[PubMed][Google Scholar]
  • 72. Pochec E, Litynska A, Amoresano A, Casbarra AGlycosylation profile of integrin alpha 3 beta 1 changes with melanoma progression. Biochim Biophys Acta. 2003;1643(1–3):113–23.[PubMed][Google Scholar]
  • 73. Newburg DS, Peterson JA, Ruiz-Palacios GM, Matson DO, Morrow AL, Shults J, Guerrero ML, Chaturvedi P, Newburg SO, Scallan CD, Taylor MR, Ceriani RL, Pickering LKRole of human-milk lactadherin in protection against symptomatic rotavirus infection. Lancet. 1998;351(9110):1160–4.[PubMed][Google Scholar]
  • 74. Stimson E, Hope J, Chong A, Burlingame ALSite-specific characterization of the N-linked glycans of murine prion protein by high-performance liquid chromatography/electrospray mass spectrometry and exoglycosidase digestions. Biochemistry. 1999;38(15):4885–95.[PubMed][Google Scholar]
  • 75. Sato S, Rahemtulla F, Prince CW, Tomana M, Butler WTAcidic glycoproteins from bovine compact bone. Connect Tissue Res. 1985;14(1):51–64.[PubMed][Google Scholar]
  • 76. Nakahara Y, Miyata T, Hamuro T, Funatsu A, Miyagi M, Tsunasawa S, Kato HAmino acid sequence and carbohydrate structure of a recombinant human tissue factor pathway inhibitor expressed in Chinese hamster ovary cells: one N-and two O-linked carbohydrate chains are located between Kunitz domains 2 and 3 and one N-linked carbohydrate chain is in Kunitz domain 2. Biochemistry. 1996;35(20):6450–9.[PubMed][Google Scholar]
  • 77. Brunner AM, Gentry LE, Cooper JA, Purchio AFRecombinant type 1 transforming growth factor beta precursor produced in Chinese hamster ovary cells is glycosylated and phosphorylated. Mol Cell Biol. 1988;8(5):2229–32.[Google Scholar]
  • 78. Ploug M, Rahbek-Nielsen H, Nielsen PF, Roepstorff P, Dano KGlycosylation profile of a recombinant urokinase-type plasminogen activator receptor expressed in Chinese hamster ovary cells. J Biol Chem. 1998;273(22):13933–43.[PubMed][Google Scholar]