Publications about SAMe

Publication

Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

Journal: Nucleic Acids Research

October/1/1997

Abstract

The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.

Authors

S F Altschul; T L Madden; A A Schäffer; J Zhang; Z Zhang; W Miller; D J Lipman

Publication

Bias in meta-analysis detected by a simple, graphical test.

Journal: BMJ (Clinical research ed.)

October/22/1997

Abstract

OBJECTIVE

Funnel plots (plots of effect estimates against sample size) may be useful to detect bias in meta-analyses that were later contradicted by large trials. We examined whether a simple test of asymmetry of funnel plots predicts discordance of results when meta-analyses are compared to large trials, and we assessed the prevalence of bias in published meta-analyses.

METHODS

Medline search to identify pairs consisting of a meta-analysis and a single large trial (concordance of results was assumed if effects were in the same direction and the meta-analytic estimate was within 30% of the trial); analysis of funnel plots from 37 meta-analyses identified from a hand search of four leading general medicine journals 1993-6 and 38 meta-analyses from the second 1996 issue of the Cochrane Database of Systematic Reviews.

MAIN OUTCOME MEASURE

Degree of funnel plot asymmetry as measured by the intercept from regression of standard normal deviates against precision.

RESULTS

In the eight pairs of meta-analysis and large trial that were identified (five from cardiovascular medicine, one from diabetic medicine, one from geriatric medicine, one from perinatal medicine) there were four concordant and four discordant pairs. In all cases discordance was due to meta-analyses showing larger effects. Funnel plot asymmetry was present in three out of four discordant pairs but in none of concordant pairs. In 14 (38%) journal meta-analyses and 5 (13%) Cochrane reviews, funnel plot asymmetry indicated that there was bias.

CONCLUSIONS

A simple analysis of funnel plots provides a useful test for the likely presence of bias in meta-analyses, but as the capacity to detect bias will be limited when meta-analyses are based on a limited number of small trials the results from such analyses should be treated with considerable caution.
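The asymmetry measure described in the abstract (the intercept from a regression of standard normal deviates against precision) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `egger_intercept` and the input layout (one effect estimate and one standard error per trial) are assumptions, and the significance test on the intercept is left out.

```python
def egger_intercept(effects, std_errors):
    """Regress the standard normal deviate (effect / SE) on precision (1 / SE).

    Returns (intercept, slope). An intercept far from zero suggests funnel
    plot asymmetry; the slope estimates the underlying effect size.
    Names and input layout are illustrative assumptions.
    """
    if len(effects) != len(std_errors) or len(effects) < 3:
        raise ValueError("need at least three trials with matching inputs")

    deviates = [e / s for e, s in zip(effects, std_errors)]
    precisions = [1.0 / s for s in std_errors]

    n = len(deviates)
    mean_x = sum(precisions) / n
    mean_y = sum(deviates) / n

    # Ordinary least squares for a single predictor.
    sxx = sum((x - mean_x) ** 2 for x in precisions)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(precisions, deviates))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


if __name__ == "__main__":
    ses = [0.1, 0.2, 0.4, 0.5]
    # Hypothetical data: small trials (large SE) report inflated effects.
    biased_effects = [0.5 + 2.0 * se for se in ses]
    print(egger_intercept(biased_effects, ses))
```

With identical effects in every trial the intercept is zero; when smaller trials report systematically larger effects, as in the hypothetical data above, the intercept moves away from zero.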

Authors

M Egger; G Davey Smith; M Schneider; C Minder

Publication

Induction of pluripotent stem cells from adult human fibroblasts by defined factors.

Journal: Cell

January/29/2008

Abstract

Successful reprogramming of differentiated human somatic cells into a pluripotent state would allow creation of patient- and disease-specific stem cells. We previously reported generation of induced pluripotent stem (iPS) cells, capable of germline transmission, from mouse somatic cells by transduction of four defined transcription factors. Here, we demonstrate the generation of iPS cells from adult human dermal fibroblasts with the same four factors: Oct3/4, Sox2, Klf4, and c-Myc. Human iPS cells were similar to human embryonic stem (ES) cells in morphology, proliferation, surface antigens, gene expression, epigenetic status of pluripotent cell-specific genes, and telomerase activity. Furthermore, these cells could differentiate into cell types of the three germ layers in vitro and in teratomas. These findings demonstrate that iPS cells can be generated from adult human fibroblasts.

Authors

Kazutoshi Takahashi; Koji Tanabe; Mari Ohnuki; Megumi Narita; Tomoko Ichisaka; Kiichiro Tomoda; Shinya Yamanaka

Publication

A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood.

Journal: Systematic Biology

December/23/2003

Abstract

The increase in the number of large data sets and the complexity of current probabilistic sequence evolution models necessitates fast and reliable phylogeny reconstruction methods. We describe a new approach, based on the maximum-likelihood principle, which clearly satisfies these requirements. The core of this method is a simple hill-climbing algorithm that adjusts tree topology and branch lengths simultaneously. This algorithm starts from an initial tree built by a fast distance-based method and modifies this tree to improve its likelihood at each iteration. Due to this simultaneous adjustment of the topology and branch lengths, only a few iterations are sufficient to reach an optimum. We used extensive and realistic computer simulations to show that the topological accuracy of this new method is at least as high as that of the existing maximum-likelihood programs and much higher than the performance of distance-based and parsimony approaches. The reduction of computing time is dramatic in comparison with other maximum-likelihood packages, while the likelihood maximization ability tends to be higher. For example, only 12 min were required on a standard personal computer to analyze a data set consisting of 500 rbcL sequences with 1,428 base pairs from plant plastids, thus reaching a speed of the same order as some popular distance-based and parsimony algorithms. This new method is implemented in the PHYML program, which is freely available on our web page: http://www.lirmm.fr/w3ifa/MAAS/.

Authors

Stéphane Guindon; Olivier Gascuel

Publication

STAR: ultrafast universal RNA-seq aligner.

Journal: Bioinformatics

July/28/2013

Abstract

BACKGROUND

Accurate alignment of high-throughput RNA-seq data is a challenging and yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. Currently available RNA-seq aligners suffer from high mapping error rates, low mapping speed, read length limitation and mapping biases.

RESULTS

To align our large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset, we developed the Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by a seed clustering and stitching procedure. STAR outperforms other aligners by a factor of >50 in mapping speed, aligning to the human genome 550 million 2 × 76 bp paired-end reads per hour on a modest 12-core server, while at the same time improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. Using Roche 454 sequencing of reverse transcription polymerase chain reaction amplicons, we experimentally validated 1960 novel intergenic splice junctions with an 80-90% success rate, corroborating the high precision of the STAR mapping strategy.

METHODS

STAR is implemented as a standalone C++ code. STAR is free open source software distributed under GPLv3 license and can be downloaded from http://code.google.com/p/rna-star/.

Authors

Alexander Dobin; Carrie A Davis; Felix Schlesinger; Jorg Drenkow; Chris Zaleski; Sonali Jha+3 authors

Publication

A system of shuttle vectors and yeast host strains designed for efficient manipulation of DNA in Saccharomyces cerevisiae.

Journal: Genetics

July/16/1989

Abstract

A series of yeast shuttle vectors and host strains has been created to allow more efficient manipulation of DNA in Saccharomyces cerevisiae. Transplacement vectors were constructed and used to derive yeast strains containing nonreverting his3, trp1, leu2 and ura3 mutations. A set of YCp and YIp vectors (pRS series) was then made based on the backbone of the multipurpose plasmid pBLUESCRIPT. These pRS vectors are all uniform in structure and differ only in the yeast selectable marker gene used (HIS3, TRP1, LEU2 and URA3). They possess all of the attributes of pBLUESCRIPT and several yeast-specific features as well. Using a pRS vector, one can perform most standard DNA manipulations in the same plasmid that is introduced into yeast.

Authors

R S Sikorski; P Hieter

Publication

Molecular portraits of human breast tumours.

Journal: Nature

September/13/2000

Abstract

Human breast tumours are diverse in their natural history and in their responsiveness to treatments. Variation in transcriptional programs accounts for much of the biological diversity of human cells and tumours. In each cell, signal transduction and regulatory systems transduce information from the cell's identity to its environmental status, thereby controlling the level of expression of every gene in the genome. Here we have characterized variation in gene expression patterns in a set of 65 surgical specimens of human breast tumours from 42 different individuals, using complementary DNA microarrays representing 8,102 human genes. These patterns provided a distinctive molecular portrait of each tumour. Twenty of the tumours were sampled twice, before and after a 16-week course of doxorubicin chemotherapy, and two tumours were paired with a lymph node metastasis from the same patient. Gene expression patterns in two tumour samples from the same individual were almost always more similar to each other than either was to any other sample. Sets of co-expressed genes were identified for which variation in messenger RNA levels could be related to specific features of physiological variation. The tumours could be classified into subtypes distinguished by pervasive differences in their gene expression patterns.

Authors

C M Perou; T Sørlie; M B Eisen; M van de Rijn; S S Jeffrey; C A Rees+12 authors

Publication

Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain.

Journal: NeuroImage

March/19/2002

Abstract

An anatomical parcellation of the spatially normalized single-subject high-resolution T1 volume provided by the Montreal Neurological Institute (MNI) (D. L. Collins et al., 1998, Trans. Med. Imag. 17, 463-468) was performed. The MNI single-subject main sulci were first delineated and further used as landmarks for the 3D definition of 45 anatomical volumes of interest (AVOI) in each hemisphere. This procedure was performed using a dedicated software which allowed a 3D following of the sulci course on the edited brain. Regions of interest were then drawn manually with the same software every 2 mm on the axial slices of the high-resolution MNI single subject. The 90 AVOI were reconstructed and assigned a label. Using this parcellation method, three procedures to perform the automated anatomical labeling of functional studies are proposed: (1) labeling of an extremum defined by a set of coordinates, (2) percentage of voxels belonging to each of the AVOI intersected by a sphere centered by a set of coordinates, and (3) percentage of voxels belonging to each of the AVOI intersected by an activated cluster. An interface with the Statistical Parametric Mapping package (SPM, J. Ashburner and K. J. Friston, 1999, Hum. Brain Mapp. 7, 254-266) is provided as a freeware to researchers of the neuroimaging community. We believe that this tool is an improvement for the macroscopical labeling of activated area compared to labeling assessed using the Talairach atlas brain in which deformations are well known. However, this tool does not alleviate the need for more sophisticated labeling strategies based on anatomical or cytoarchitectonic probabilistic maps.

Authors

N Tzourio-Mazoyer; B Landeau; D Papathanassiou; F Crivello; O Etard; N Delcroix+2 authors

Publication

Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach.

Journal: Biometrics

February/2/1989

Abstract

Methods of evaluating and comparing the performance of diagnostic tests are of increasing importance as new tests are developed and marketed. When a test is based on an observed variable that lies on a continuous or graded scale, an assessment of the overall value of the test can be made through the use of a receiver operating characteristic (ROC) curve. The curve is constructed by varying the cutpoint used to determine which values of the observed variable will be considered abnormal and then plotting the resulting sensitivities against the corresponding false positive rates. When two or more empirical curves are constructed based on tests performed on the same individuals, statistical analysis on differences between curves must take into account the correlated nature of the data. This paper presents a nonparametric approach to the analysis of areas under correlated ROC curves, by using the theory on generalized U-statistics to generate an estimated covariance matrix.
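The curve construction described above (vary the cutpoint, then plot sensitivity against the false positive rate) can be sketched as follows. This is only an illustration of the empirical curve and its trapezoidal area; it does not implement the paper's covariance estimate for correlated curves. The function names and the convention that higher scores mean "more abnormal" are assumptions.

```python
def roc_points(scores, labels):
    """Return (false positive rate, sensitivity) pairs for every cutpoint.

    `labels` holds 1 for diseased and 0 for non-diseased subjects. A subject
    is called abnormal when its score is at or above the cutpoint (assumed
    convention).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one diseased and one non-diseased subject")

    points = [(0.0, 0.0)]  # cutpoint above every score: nothing is abnormal
    for cut in sorted(set(scores), reverse=True):
        true_pos = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        false_pos = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 0)
        points.append((false_pos / n_neg, true_pos / n_pos))
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as ordered (x, y) pairs."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    # Hypothetical test results for six subjects.
    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
    labels = [1, 1, 0, 1, 0, 0]
    curve = roc_points(scores, labels)
    print(curve)
    print(trapezoid_area(curve))
```

Computing two such areas on the same subjects gives correlated estimates, which is the situation the paper's nonparametric covariance matrix is meant to handle.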

Authors

E R DeLong; D M DeLong; D L Clarke-Pearson

Publication

Improved tools for biological sequence comparison.

Journal: Proceedings of the National Academy of Sciences of the United States of America

May/19/1988

Abstract

We have developed three computer programs for comparisons of protein and DNA sequences. They can be used to search sequence data bases, evaluate similarity scores, and identify periodic structures based on local sequence similarity. The FASTA program is a more sensitive derivative of the FASTP program, which can be used to search protein or DNA sequence data bases and can compare a protein sequence to a DNA sequence data base by translating the DNA data base as it is searched. FASTA includes an additional step in the calculation of the initial pairwise similarity score that allows multiple regions of similarity to be joined to increase the score of related sequences. The RDF2 program can be used to evaluate the significance of similarity scores using a shuffling method that preserves local sequence composition. The LFASTA program can display all the regions of local similarity between two sequences with scores greater than a threshold, using the same scoring parameters and a similar alignment algorithm; these local similarities can be displayed as a "graphic matrix" plot or as individual alignments. In addition, these programs have been generalized to allow comparison of DNA or protein sequences based on a variety of alternative scoring matrices.

Authors

W R Pearson; D J Lipman

Publication

MicroRNA expression profiles classify human cancers.

Journal: Nature

June/20/2005

Abstract

Recent work has revealed the existence of a class of small non-coding RNA species, known as microRNAs (miRNAs), which have critical functions across various biological processes. Here we use a new, bead-based flow cytometric miRNA expression profiling method to present a systematic expression analysis of 217 mammalian miRNAs from 334 samples, including multiple human cancers. The miRNA profiles are surprisingly informative, reflecting the developmental lineage and differentiation state of the tumours. We observe a general downregulation of miRNAs in tumours compared with normal tissues. Furthermore, we were able to successfully classify poorly differentiated tumours using miRNA expression profiles, whereas messenger RNA profiles were highly inaccurate when applied to the same samples. These findings highlight the potential of miRNA profiling in cancer diagnosis.

Authors

Jun Lu; Gad Getz; Eric A Miska; Ezequiel Alvarez-Saavedra; Justin Lamb; David Peck+8 authors

Publication

International physical activity questionnaire: 12-country reliability and validity.

Journal: Medicine and Science in Sports and Exercise

December/10/2003

Abstract

BACKGROUND

Physical inactivity is a global concern, but diverse physical activity measures in use prevent international comparisons. The International Physical Activity Questionnaire (IPAQ) was developed as an instrument for cross-national monitoring of physical activity and inactivity.

METHODS

Between 1997 and 1998, an International Consensus Group developed four long and four short forms of the IPAQ instruments (administered by telephone interview or self-administration, with two alternate reference periods, either the "last 7 d" or a "usual week" of recalled physical activity). During 2000, 14 centers from 12 countries collected reliability and/or validity data on at least two of the eight IPAQ instruments. Test-retest repeatability was assessed within the same week. Concurrent (inter-method) validity was assessed at the same administration, and criterion IPAQ validity was assessed against the CSA (now MTI) accelerometer. Spearman's correlation coefficients are reported, based on the total reported physical activity.

RESULTS

Overall, the IPAQ questionnaires produced repeatable data (Spearman's rho clustered around 0.8), with comparable data from short and long forms. Criterion validity had a median rho of about 0.30, which was comparable to most other self-report validation studies. The "usual week" and "last 7 d" reference periods performed similarly, and the reliability of telephone administration was similar to the self-administered mode.

CONCLUSIONS

The IPAQ instruments have acceptable measurement properties, at least as good as other established self-reports. Considering the diverse samples in this study, IPAQ has reasonable measurement properties for monitoring population levels of physical activity among 18- to 65-yr-old adults in diverse settings. The short IPAQ form "last 7 d recall" is recommended for national monitoring and the long form for research requiring more detailed assessment.

Authors

Cora L Craig; Alison L Marshall; Michael Sjöström; Adrian E Bauman; Michael L Booth; Barbara E Ainsworth+5 authors

Publication

Clinical Characteristics of 138 Hospitalized Patients With 2019 Novel Coronavirus-Infected Pneumonia in Wuhan, China.

Journal: JAMA - Journal of the American Medical Association

February/7/2020

Abstract

IMPORTANCE

In December 2019, novel coronavirus (2019-nCoV)-infected pneumonia (NCIP) occurred in Wuhan, China. The number of cases has increased rapidly but information on the clinical characteristics of affected patients is limited.

OBJECTIVE

To describe the epidemiological and clinical characteristics of NCIP.

DESIGN, SETTING, AND PARTICIPANTS

Retrospective, single-center case series of the 138 consecutive hospitalized patients with confirmed NCIP at Zhongnan Hospital of Wuhan University in Wuhan, China, from January 1 to January 28, 2020; final date of follow-up was February 3, 2020.

EXPOSURES

Documented NCIP.

MAIN OUTCOMES AND MEASURES

Epidemiological, demographic, clinical, laboratory, radiological, and treatment data were collected and analyzed. Outcomes of critically ill patients and noncritically ill patients were compared. Presumed hospital-related transmission was suspected if a cluster of health professionals or hospitalized patients in the same wards became infected and a possible source of infection could be tracked.

RESULTS

Of 138 hospitalized patients with NCIP, the median age was 56 years (interquartile range, 42-68; range, 22-92 years) and 75 (54.3%) were men. Hospital-associated transmission was suspected as the presumed mechanism of infection for affected health professionals (40 [29%]) and hospitalized patients (17 [12.3%]). Common symptoms included fever (136 [98.6%]), fatigue (96 [69.6%]), and dry cough (82 [59.4%]). Lymphopenia (lymphocyte count, 0.8 × 10^9/L [interquartile range {IQR}, 0.6-1.1]) occurred in 97 patients (70.3%), prolonged prothrombin time (13.0 seconds [IQR, 12.3-13.7]) in 80 patients (58%), and elevated lactate dehydrogenase (261 U/L [IQR, 182-403]) in 55 patients (39.9%). Chest computed tomographic scans showed bilateral patchy shadows or ground glass opacity in the lungs of all patients. Most patients received antiviral therapy (oseltamivir, 124 [89.9%]), and many received antibacterial therapy (moxifloxacin, 89 [64.4%]; ceftriaxone, 34 [24.6%]; azithromycin, 25 [18.1%]) and glucocorticoid therapy (62 [44.9%]). Thirty-six patients (26.1%) were transferred to the intensive care unit (ICU) because of complications, including acute respiratory distress syndrome (22 [61.1%]), arrhythmia (16 [44.4%]), and shock (11 [30.6%]). The median time from first symptom to dyspnea was 5.0 days, to hospital admission was 7.0 days, and to ARDS was 8.0 days. Patients treated in the ICU (n = 36), compared with patients not treated in the ICU (n = 102), were older (median age, 66 years vs 51 years), were more likely to have underlying comorbidities (26 [72.2%] vs 38 [37.3%]), and were more likely to have dyspnea (23 [63.9%] vs 20 [19.6%]), and anorexia (24 [66.7%] vs 31 [30.4%]). Of the 36 cases in the ICU, 4 (11.1%) received high-flow oxygen therapy, 15 (41.7%) received noninvasive ventilation, and 17 (47.2%) received invasive ventilation (4 were switched to extracorporeal membrane oxygenation). As of February 3, 47 patients (34.1%) were discharged and 6 died (overall mortality, 4.3%), but the remaining patients are still hospitalized. Among those discharged alive (n = 47), the median hospital stay was 10 days (IQR, 7.0-14.0).

CONCLUSIONS AND RELEVANCE

In this single-center case series of 138 hospitalized patients with confirmed NCIP in Wuhan, China, presumed hospital-related transmission of 2019-nCoV was suspected in 41% of patients, 26% of patients received ICU care, and mortality was 4.3%.

Authors

Dawei Wang; Bo Hu; Chang Hu; Fangfang Zhu; Xing Liu; Jing Zhang+8 authors

Publication

RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome.

Journal: BMC Bioinformatics

October/31/2011

Abstract

BACKGROUND

RNA-Seq is revolutionizing the way transcript abundances are measured. A key challenge in transcript quantification from RNA-Seq data is the handling of reads that map to multiple genes or isoforms. This issue is particularly important for quantification with de novo transcriptome assemblies in the absence of sequenced genomes, as it is difficult to determine which transcripts are isoforms of the same gene. A second significant issue is the design of RNA-Seq experiments, in terms of the number of reads, read length, and whether reads come from one or both ends of cDNA fragments.

RESULTS

We present RSEM, a user-friendly software package for quantifying gene and isoform abundances from single-end or paired-end RNA-Seq data. RSEM outputs abundance estimates, 95% credibility intervals, and visualization files and can also simulate RNA-Seq data. In contrast to other existing tools, the software does not require a reference genome. Thus, in combination with a de novo transcriptome assembler, RSEM enables accurate transcript quantification for species without sequenced genomes. On simulated and real data sets, RSEM has superior or comparable performance to quantification methods that rely on a reference genome. Taking advantage of RSEM's ability to effectively use ambiguously-mapping reads, we show that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads. On the other hand, estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene.

CONCLUSIONS

RSEM is an accurate and user-friendly software tool for quantifying transcript abundances from RNA-Seq data. As it does not rely on the existence of a reference genome, it is particularly useful for quantification with de novo transcriptome assemblies. In addition, RSEM has enabled valuable guidance for cost-efficient design of quantification experiments with RNA-Seq, which is currently relatively expensive.

Authors

Bo Li; Colin N Dewey

Publication

A pneumonia outbreak associated with a new coronavirus of probable bat origin.

Journal: Nature

February/4/2020

Abstract

Since the SARS outbreak 18 years ago, a large number of severe acute respiratory syndrome-related coronaviruses (SARSr-CoV) have been discovered in their natural reservoir host, bats^1-4. Previous studies indicated that some of those bat SARSr-CoVs have the potential to infect humans^5-7. Here we report the identification and characterization of a novel coronavirus (2019-nCoV) which caused an epidemic of acute respiratory syndrome in humans in Wuhan, China. The epidemic, which started from 12 December 2019, has caused 2,050 laboratory-confirmed infections with 56 fatal cases by 26 January 2020. Full-length genome sequences were obtained from five patients at the early stage of the outbreak. They are almost identical to each other and share 79.5% sequence identity to SARS-CoV. Furthermore, it was found that 2019-nCoV is 96% identical at the whole-genome level to a bat coronavirus. The pairwise protein sequence analysis of seven conserved non-structural proteins shows that this virus belongs to the species of SARSr-CoV. The 2019-nCoV virus was then isolated from the bronchoalveolar lavage fluid of a critically ill patient, which can be neutralized by sera from several patients. Importantly, we have confirmed that this novel CoV uses the same cell entry receptor, ACE2, as SARS-CoV.

Authors

Peng Zhou; Xing-Lou Yang; Xian-Guang Wang; Ben Hu; Lei Zhang; Wei Zhang+23 authors

Publication

The 1982 revised criteria for the classification of systemic lupus erythematosus.

Journal: Arthritis and rheumatism

December/20/1982

Abstract

The 1971 preliminary criteria for the classification of systemic lupus erythematosus (SLE) were revised and updated to incorporate new immunologic knowledge and improve disease classification. The 1982 revised criteria include fluorescence antinuclear antibody and antibody to native DNA and Sm antigen. Some criteria involving the same organ systems were aggregated into single criteria. Raynaud's phenomenon and alopecia were not included in the 1982 revised criteria because of low sensitivity and specificity. The new criteria were 96% sensitive and 96% specific when tested with SLE and control patient data gathered from 18 participating clinics. When compared with the 1971 criteria, the 1982 revised criteria showed gains in sensitivity and specificity.

Authors

E M Tan; A S Cohen; J F Fries; A T Masi; D J McShane; N F Rothfield+3 authors

Publication

The meaning and use of the area under a receiver operating characteristic (ROC) curve.

Journal: Radiology

May/20/1982

Abstract

A representation and interpretation of the area under a receiver operating characteristic (ROC) curve obtained by the "rating" method, or by mathematical predictions based on patient characteristics, is presented. It is shown that in such a setting the area represents the probability that a randomly chosen diseased subject is (correctly) rated or ranked with greater suspicion than a randomly chosen non-diseased subject. Moreover, this probability of a correct ranking is the same quantity that is estimated by the already well-studied nonparametric Wilcoxon statistic. These two relationships are exploited to (a) provide rapid closed-form expressions for the approximate magnitude of the sampling variability, i.e., standard error that one uses to accompany the area under a smoothed ROC curve, (b) guide in determining the size of the sample required to provide a sufficiently reliable estimate of this area, and (c) determine how large sample sizes should be to ensure that one can statistically detect differences in the accuracy of diagnostic techniques.
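The ranking interpretation in the abstract can be checked directly: count, over all diseased/non-diseased pairs, how often the diseased subject receives the higher rating (ties count one half). The sketch below does that, and adds the closed-form standard error commonly attributed to this paper. The standard error formula is supplied from memory rather than from the abstract, so treat it, along with the function names, as an assumption to verify against the full text.

```python
import math


def auc_by_ranking(diseased, non_diseased):
    """Probability that a random diseased subject outranks a random
    non-diseased one. Equals the Wilcoxon/Mann-Whitney statistic divided by
    the number of pairs; ties count as half a correct ranking.
    """
    if not diseased or not non_diseased:
        raise ValueError("both groups must be non-empty")
    correct = 0.0
    for d in diseased:
        for h in non_diseased:
            if d > h:
                correct += 1.0
            elif d == h:
                correct += 0.5
    return correct / (len(diseased) * len(non_diseased))


def approx_standard_error(auc, n_diseased, n_non_diseased):
    """Approximate standard error of an AUC estimate.

    Assumed form (not stated in the abstract):
    Q1 = A / (2 - A), Q2 = 2A^2 / (1 + A),
    SE^2 = [A(1-A) + (nD-1)(Q1 - A^2) + (nN-1)(Q2 - A^2)] / (nD * nN).
    """
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc * auc / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_diseased - 1) * (q1 - auc * auc)
        + (n_non_diseased - 1) * (q2 - auc * auc)
    ) / (n_diseased * n_non_diseased)
    return math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical ratings on a 1-5 suspicion scale.
    area = auc_by_ranking([3, 4, 5], [1, 2, 3])
    print(area, approx_standard_error(area, 3, 3))
```

The pairwise count is quadratic in the sample size, which is fine for illustration; a rank-based computation would scale better on large samples.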

Authors

J A Hanley; B J McNeil

Publication

Gene expression profiling predicts clinical outcome of breast cancer.

Journal: Nature

March/11/2002

Abstract

Breast cancer patients with the same stage of disease can have markedly different treatment responses and overall outcome. The strongest predictors for metastases (for example, lymph node status and histological grade) fail to classify accurately breast tumours according to their clinical behaviour. Chemotherapy or hormonal therapy reduces the risk of distant metastases by approximately one-third; however, 70-80% of patients receiving this treatment would have survived without it. None of the signatures of breast cancer gene expression reported to date allow for patient-tailored therapy strategies. Here we used DNA microarray analysis on primary breast tumours of 117 young patients, and applied supervised classification to identify a gene expression signature strongly predictive of a short interval to distant metastases ('poor prognosis' signature) in patients without tumour cells in local lymph nodes at diagnosis (lymph node negative). In addition, we established a signature that identifies tumours of BRCA1 carriers. The poor prognosis signature consists of genes regulating cell cycle, invasion, metastasis and angiogenesis. This gene expression profile will outperform all currently used clinical parameters in predicting disease outcome. Our findings provide a strategy to select patients who would benefit from adjuvant therapy.

Authors

Laura J van 't Veer; Hongyue Dai; Marc J van de Vijver; Yudong D He; Augustinus A M Hart; Mao Mao+10 authors

Publication

Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis.

Journal: Molecular Biology and Evolution

June/14/2000

Abstract

The use of some multiple-sequence alignments in phylogenetic analysis, particularly those that are not very well conserved, requires the elimination of poorly aligned positions and divergent regions, since they may not be homologous or may have been saturated by multiple substitutions. A computerized method that eliminates such positions and at the same time tries to minimize the loss of informative sites is presented here. The method is based on the selection of blocks of positions that fulfill a simple set of requirements with respect to the number of contiguous conserved positions, lack of gaps, and high conservation of flanking positions, making the final alignment more suitable for phylogenetic analysis. To illustrate the efficiency of this method, alignments of 10 mitochondrial proteins from several completely sequenced mitochondrial genomes belonging to diverse eukaryotes were used as examples. The percentages of removed positions were higher in the most divergent alignments. After removing divergent segments, the amino acid composition of the different sequences was more uniform, and pairwise distances became much smaller. Phylogenetic trees show that topologies can be different after selecting conserved blocks, particularly when there are several poorly resolved nodes. Strong support was found for the grouping of animals and fungi but not for the position of more basal eukaryotes. The use of a computerized method such as the one presented here reduces to a certain extent the necessity of manually editing multiple alignments, makes the automation of phylogenetic analysis of large data sets feasible, and facilitates the reproduction of the final alignment by other researchers.

Authors

J Castresana

Publication

Mass spectrometric sequencing of proteins from silver-stained polyacrylamide gels.

Journal: Analytical Chemistry

September/12/1996

Abstract

Proteins from silver-stained gels can be digested enzymatically and the resulting peptides analyzed and sequenced by mass spectrometry. Standard proteins yield the same peptide maps when extracted from Coomassie- and silver-stained gels, as judged by electrospray and MALDI mass spectrometry. The low nanogram range can be reached by the protocols described here, and the method is robust. A silver-stained one-dimensional gel of a fraction from yeast proteins was analyzed by nano-electrospray tandem mass spectrometry. In the sequencing, more than 1000 amino acids were covered, resulting in no evidence of chemical modifications due to the silver staining procedure. Silver staining allows a substantial shortening of sample preparation time and may, therefore, be preferable to Coomassie staining. This work removes a major obstacle to the low-level sequence analysis of proteins separated on polyacrylamide gels.

Authors

A Shevchenko; M Wilm; O Vorm; M Mann

Publication

Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences.

Journal: Bioinformatics

September/7/2006

Abstract

BACKGROUND

In 2001 and 2002, we published two papers (Bioinformatics, 17, 282-283; Bioinformatics, 18, 77-82) describing an ultrafast protein sequence clustering program called cd-hit. This program can efficiently cluster a huge protein database with millions of sequences. However, the applications of the underlying algorithm are not limited to protein sequence clustering. Here we present several new programs using the same algorithm, including cd-hit-2d, cd-hit-est and cd-hit-est-2d. Cd-hit-2d compares two protein datasets and reports similar matches between them; cd-hit-est clusters a DNA/RNA sequence database; and cd-hit-est-2d compares two nucleotide datasets. All these programs can handle huge datasets with millions of sequences and can be hundreds of times faster than methods based on the popular sequence comparison and database search tools, such as BLAST.

Authors

Weizhong Li; Adam Godzik

Publication

Intratumor heterogeneity and branched evolution revealed by multiregion sequencing.

Journal: New England Journal of Medicine

March/15/2012

Abstract

BACKGROUND

Intratumor heterogeneity may foster tumor evolution and adaptation and hinder personalized-medicine strategies that depend on results from single tumor-biopsy samples.

METHODS

To examine intratumor heterogeneity, we performed exome sequencing, chromosome aberration analysis, and ploidy profiling on multiple spatially separated samples obtained from primary renal carcinomas and associated metastatic sites. We characterized the consequences of intratumor heterogeneity using immunohistochemical analysis, mutation functional analysis, and profiling of messenger RNA expression.

RESULTS

Phylogenetic reconstruction revealed branched evolutionary tumor growth, with 63 to 69% of all somatic mutations not detectable across every tumor region. Intratumor heterogeneity was observed for a mutation within an autoinhibitory domain of the mammalian target of rapamycin (mTOR) kinase, correlating with S6 and 4EBP phosphorylation in vivo and constitutive activation of mTOR kinase activity in vitro. Mutational intratumor heterogeneity was seen for multiple tumor-suppressor genes converging on loss of function; SETD2, PTEN, and KDM5C underwent multiple distinct and spatially separated inactivating mutations within a single tumor, suggesting convergent phenotypic evolution. Gene-expression signatures of good and poor prognosis were detected in different regions of the same tumor. Allelic composition and ploidy profiling analysis revealed extensive intratumor heterogeneity, with 26 of 30 tumor samples from four tumors harboring divergent allelic-imbalance profiles and with ploidy heterogeneity in two of four tumors.

CONCLUSIONS

Intratumor heterogeneity can lead to underestimation of the tumor genomics landscape portrayed from single tumor-biopsy samples and may present major challenges to personalized-medicine and biomarker development. Intratumor heterogeneity, associated with heterogeneous protein function, may foster tumor adaptation and therapeutic failure through Darwinian selection. (Funded by the Medical Research Council and others.).

Authors

Marco Gerlinger; Andrew J Rowan; Stuart Horswell; James Larkin; David Endesfelder; +25 authors

Publication

Confidence limits on phylogenies: an approach using the bootstrap.

Journal: Evolution; international journal of organic evolution

May/30/2017

Abstract

The recently-developed statistical method known as the "bootstrap" can be used to place confidence intervals on phylogenies. It involves resampling points from one's own data, with replacement, to create a series of bootstrap samples of the same size as the original data. Each of these is analyzed, and the variation among the resulting estimates taken to indicate the size of the error involved in making estimates from the original data. In the case of phylogenies, it is argued that the proper method of resampling is to keep all of the original species while sampling characters with replacement, under the assumption that the characters have been independently drawn by the systematist and have evolved independently. Majority-rule consensus trees can be used to construct a phylogeny showing all of the inferred monophyletic groups that occurred in a majority of the bootstrap samples. If a group shows up 95% of the time or more, the evidence for it is taken to be statistically significant. Existing computer programs can be used to analyze different bootstrap samples by using weights on the characters, the weight of a character being how many times it was drawn in bootstrap sampling. When all characters are perfectly compatible, as envisioned by Hennig, bootstrap sampling becomes unnecessary; the bootstrap method would show significant evidence for a group if it is defined by three or more characters.
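
The resampling scheme in this abstract can be sketched in a few lines. This is a rough illustration only: the function names are hypothetical, tree inference itself is omitted, and each replicate tree is assumed to be summarized as a set of monophyletic groups (each group a frozenset of species names).

```python
import random
from collections import Counter


def bootstrap_weights(n_characters, rng):
    """Sample characters with replacement and return per-character weights.

    The weight of a character is how many times it was drawn, so the
    weights always sum to n_characters (a sample of the original size).
    """
    draws = [rng.randrange(n_characters) for _ in range(n_characters)]
    counts = Counter(draws)
    return [counts.get(i, 0) for i in range(n_characters)]


def group_support(replicate_groups, group):
    """Fraction of bootstrap replicates whose tree contains the group."""
    hits = sum(1 for groups in replicate_groups if group in groups)
    return hits / len(replicate_groups)
```

In use, each weight vector would be passed to an existing tree-building program that accepts character weights, and a group whose support reaches 0.95 or more would be treated as statistically significant, following the criterion stated in the abstract.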

Authors

Joseph Felsenstein

Publication

Global strategy for the diagnosis, management, and prevention of chronic obstructive pulmonary disease: GOLD executive summary.

Journal: American Journal of Respiratory and Critical Care Medicine

October/17/2007

Abstract

Chronic obstructive pulmonary disease (COPD) remains a major public health problem. It is the fourth leading cause of chronic morbidity and mortality in the United States, and is projected to rank fifth in 2020 in burden of disease worldwide, according to a study published by the World Bank/World Health Organization. Yet, COPD remains relatively unknown or ignored by the public as well as public health and government officials. In 1998, in an effort to bring more attention to COPD, its management, and its prevention, a committed group of scientists encouraged the U.S. National Heart, Lung, and Blood Institute and the World Health Organization to form the Global Initiative for Chronic Obstructive Lung Disease (GOLD). Among the important objectives of GOLD are to increase awareness of COPD and to help the millions of people who suffer from this disease and die prematurely of it or its complications. The first step in the GOLD program was to prepare a consensus report, Global Strategy for the Diagnosis, Management, and Prevention of COPD, published in 2001. The present, newly revised document follows the same format as the original consensus report, but has been updated to reflect the many publications on COPD that have appeared. GOLD national leaders, a network of international experts, have initiated investigations of the causes and prevalence of COPD in their countries, and developed innovative approaches for the dissemination and implementation of COPD management guidelines. We appreciate the enormous amount of work the GOLD national leaders have done on behalf of their patients with COPD. Despite the achievements in the 5 years since the GOLD report was originally published, considerable additional work is ahead of us if we are to control this major public health problem. The GOLD initiative will continue to bring COPD to the attention of governments, public health officials, health care workers, and the general public, but a concerted effort by all involved in health care will be necessary.

Authors

Klaus F Rabe; Suzanne Hurd; Antonio Anzueto; Peter J Barnes; Sonia A Buist; Peter Calverley; +6 authors