Citations
Publication
Journal: Systematic Biology
October/11/2010
Abstract
PhyML is a phylogeny software based on the maximum-likelihood principle. Early PhyML versions used a fast algorithm performing nearest neighbor interchanges to improve a reasonable starting tree topology. Since the original publication (Guindon S., Gascuel O. 2003. A simple, fast and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52:696-704), PhyML has been widely used (>2500 citations in ISI Web of Science) because of its simplicity and a fair compromise between accuracy and speed. In the meantime, research around PhyML has continued, and this article describes the new algorithms and methods implemented in the program. First, we introduce a new algorithm to search the tree space with user-defined intensity using subtree pruning and regrafting topological moves. The parsimony criterion is used here to filter out the least promising topology modifications with respect to the likelihood function. The analysis of a large collection of real nucleotide and amino acid data sets of various sizes demonstrates the good performance of this method. Second, we describe a new test to assess the support of the data for internal branches of a phylogeny. This approach extends the recently proposed approximate likelihood-ratio test and relies on a nonparametric, Shimodaira-Hasegawa-like procedure. A detailed analysis of real alignments sheds light on the links between this new approach and the more classical nonparametric bootstrap method. Overall, our tests show that the last version (3.0) of PhyML is fast, accurate, stable, and ready to use. A Web server and binary files are available from http://www.atgc-montpellier.fr/phyml/.
Publication
Journal: Bioinformatics
November/22/2005
Abstract
SUMMARY
We present here Blast2GO (B2G), a research tool designed with the main purpose of enabling Gene Ontology (GO) based data mining on sequence data for which no GO annotation is yet available. B2G combines in one application GO annotation based on similarity searches with statistical analysis and highlighted visualization on directed acyclic graphs. This tool offers a suitable platform for functional genomics research in non-model species. B2G is an intuitive and interactive desktop application that allows monitoring and comprehension of the whole annotation and analysis process.
AVAILABILITY
Blast2GO is freely available via Java Web Start at http://www.blast2go.de.
SUPPLEMENTARY MATERIAL
http://www.blast2go.de -> Evaluation.
Publication
Journal: NeuroImage
March/19/2002
Abstract
An anatomical parcellation of the spatially normalized single-subject high-resolution T1 volume provided by the Montreal Neurological Institute (MNI) (D. L. Collins et al., 1998, Trans. Med. Imag. 17, 463-468) was performed. The MNI single-subject main sulci were first delineated and further used as landmarks for the 3D definition of 45 anatomical volumes of interest (AVOI) in each hemisphere. This procedure was performed using a dedicated software which allowed a 3D following of the sulci course on the edited brain. Regions of interest were then drawn manually with the same software every 2 mm on the axial slices of the high-resolution MNI single subject. The 90 AVOI were reconstructed and assigned a label. Using this parcellation method, three procedures to perform the automated anatomical labeling of functional studies are proposed: (1) labeling of an extremum defined by a set of coordinates, (2) percentage of voxels belonging to each of the AVOI intersected by a sphere centered by a set of coordinates, and (3) percentage of voxels belonging to each of the AVOI intersected by an activated cluster. An interface with the Statistical Parametric Mapping package (SPM, J. Ashburner and K. J. Friston, 1999, Hum. Brain Mapp. 7, 254-266) is provided as a freeware to researchers of the neuroimaging community. We believe that this tool is an improvement for the macroscopical labeling of activated area compared to labeling assessed using the Talairach atlas brain in which deformations are well known. However, this tool does not alleviate the need for more sophisticated labeling strategies based on anatomical or cytoarchitectonic probabilistic maps.
Publication
Journal: Applied and Environmental Microbiology
October/18/2007
Abstract
The Ribosomal Database Project (RDP) Classifier, a naïve Bayesian classifier, can rapidly and accurately classify bacterial 16S rRNA sequences into the new higher-order taxonomy proposed in Bergey's Taxonomic Outline of the Prokaryotes (2nd ed., release 5.0, Springer-Verlag, New York, NY, 2004). It provides taxonomic assignments from domain to genus, with confidence estimates for each assignment. The majority of classifications (98%) were of high estimated confidence (≥95%) and high accuracy (98%). In addition to being tested with the corpus of 5,014 type strain sequences from Bergey's outline, the RDP Classifier was tested with a corpus of 23,095 rRNA sequences as assigned by the NCBI into their alternative higher-order taxonomy. The results from leave-one-out testing on both corpora show that the overall accuracies at all levels of confidence for near-full-length and 400-base segments were 89% or above down to the genus level, and the majority of the classification errors appear to be due to anomalies in the current taxonomies. For shorter rRNA segments, such as those that might be generated by pyrosequencing, the error rate varied greatly over the length of the 16S rRNA gene, with segments around the V2 and V4 variable regions giving the lowest error rates. The RDP Classifier is suitable both for the analysis of single rRNA sequences and for the analysis of libraries of thousands of sequences. Another related tool, RDP Library Compare, was developed to facilitate microbial-community comparison based on 16S rRNA gene sequence libraries. It combines the RDP Classifier with a statistical test to flag taxa differentially represented between samples. The RDP Classifier and RDP Library Compare are available online at http://rdp.cme.msu.edu/.
Publication
Journal: Proceedings of the National Academy of Sciences of the United States of America
October/2/2003
Abstract
With the increase in genomewide experiments and the sequencing of multiple genomes, the analysis of large data sets has become commonplace in biology. It is often the case that thousands of features in a genomewide data set are tested against some null hypothesis, where a number of features are expected to be significant. Here we propose an approach to measuring statistical significance in these genomewide studies based on the concept of the false discovery rate. This approach offers a sensible balance between the number of true and false positives that is automatically calibrated and easily interpreted. In doing so, a measure of statistical significance called the q value is associated with each tested feature. The q value is similar to the well known p value, except it is a measure of significance in terms of the false discovery rate rather than the false positive rate. Our approach avoids a flood of false positive results, while offering a more liberal criterion than what has been used in genome scans for linkage.
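The relationship between p values and q values described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `q_values` and the parameter `pi0` (the assumed proportion of truly null features) are introduced here for illustration only. With `pi0 = 1.0`, the result is the most conservative case and coincides with Benjamini-Hochberg adjusted p values; a data-driven estimate of `pi0` is outside the scope of this sketch.

```python
def q_values(p_values, pi0=1.0):
    """Return one q value per p value (illustrative sketch).

    pi0 is an assumed proportion of true null features; 1.0 is the
    conservative choice and gives Benjamini-Hochberg adjusted p values.
    """
    m = len(p_values)
    # Indices of the features, ordered from smallest to largest p value.
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, so each q value is the minimum
    # estimated false discovery rate over all thresholds at or above it.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        estimated_fdr = pi0 * p_values[i] * m / rank
        running_min = min(running_min, estimated_fdr)
        q[i] = running_min
    return q
```

Calling `q_values([0.01, 0.04, 0.03, 0.5])` returns one q value per feature in the input order; features whose q value falls below a chosen false discovery rate would be called significant.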
Publication
Journal: CA - A Cancer Journal for Clinicians
July/19/2016
Abstract
With increasing incidence and mortality, cancer is the leading cause of death in China and is a major public health problem. Because of China's massive population (1.37 billion), previous national incidence and mortality estimates have been limited to small samples of the population using data from the 1990s or based on a specific year. With high-quality data from an additional number of population-based registries now available through the National Central Cancer Registry of China, the authors analyzed data from 72 local, population-based cancer registries (2009-2011), representing 6.5% of the population, to estimate the number of new cases and cancer deaths for 2015. Data from 22 registries were used for trend analyses (2000-2011). The results indicated that an estimated 4,292,000 new cancer cases and 2,814,000 cancer deaths would occur in China in 2015, with lung cancer being the most common incident cancer and the leading cause of cancer death. Stomach, esophageal, and liver cancers were also commonly diagnosed and were identified as leading causes of cancer death. Residents of rural areas had significantly higher age-standardized (Segi population) incidence and mortality rates for all cancers combined than urban residents (213.6 per 100,000 vs 191.5 per 100,000 for incidence; 149.0 per 100,000 vs 109.5 per 100,000 for mortality, respectively). For all cancers combined, the incidence rates were stable during 2000 through 2011 for males (+0.2% per year; P = .1), whereas they increased significantly (+2.2% per year; P < .05) among females. In contrast, the mortality rates since 2006 have decreased significantly for both males (-1.4% per year; P < .05) and females (-1.1% per year; P < .05). Many of the estimated cancer cases and deaths can be prevented through reducing the prevalence of risk factors, while increasing the effectiveness of clinical care delivery, particularly for those living in rural areas and in disadvantaged populations.
Publication
Journal: New England Journal of Medicine
January/24/2020
Abstract
In December 2019, a cluster of patients with pneumonia of unknown cause was linked to a seafood wholesale market in Wuhan, China. A previously unknown betacoronavirus was discovered through the use of unbiased sequencing in samples from patients with pneumonia. Human airway epithelial cells were used to isolate a novel coronavirus, named 2019-nCoV, which formed another clade within the subgenus sarbecovirus, Orthocoronavirinae subfamily. Different from both MERS-CoV and SARS-CoV, 2019-nCoV is the seventh member of the family of coronaviruses that infect humans. Enhanced surveillance and further investigation are ongoing. (Funded by the National Key Research and Development Program of China and the National Major Project for Control and Prevention of Infectious Disease in China.).
Publication
Journal: Nature
October/14/2004
Abstract
MicroRNAs (miRNAs) are small RNAs that regulate the expression of complementary messenger RNAs. Hundreds of miRNA genes have been found in diverse animals, and many of these are phylogenetically conserved. With miRNA roles identified in developmental timing, cell death, cell proliferation, haematopoiesis and patterning of the nervous system, evidence is mounting that animal miRNAs are more numerous, and their regulatory impact more pervasive, than was previously suspected.
Publication
Journal: Journal of Personality and Social Psychology
August/31/1988
Abstract
In recent studies of the structure of affect, positive and negative affect have consistently emerged as two dominant and relatively independent dimensions. A number of mood scales have been created to measure these factors; however, many existing measures are inadequate, showing low reliability or poor convergent or discriminant validity. To fill the need for reliable and valid Positive Affect and Negative Affect scales that are also brief and easy to administer, we developed two 10-item mood scales that comprise the Positive and Negative Affect Schedule (PANAS). The scales are shown to be highly internally consistent, largely uncorrelated, and stable at appropriate levels over a 2-month time period. Normative data and factorial and external evidence of convergent and discriminant validity for the scales are also presented.
Publication
Journal: Science
November/24/1998
Abstract
Human blastocyst-derived, pluripotent cell lines are described that have normal karyotypes, express high levels of telomerase activity, and express cell surface markers that characterize primate embryonic stem cells but do not characterize other early lineages. After undifferentiated proliferation in vitro for 4 to 5 months, these cells still maintained the developmental potential to form trophoblast and derivatives of all three embryonic germ layers, including gut epithelium (endoderm); cartilage, bone, smooth muscle, and striated muscle (mesoderm); and neural epithelium, embryonic ganglia, and stratified squamous epithelium (ectoderm). These cell lines should be useful in human developmental biology, drug discovery, and transplantation medicine.
Publication
Journal: Nature
December/13/2010
Abstract
The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10^-8 per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.
Publication
Journal: Nature Protocols
June/20/2012
Abstract
Recent advances in high-throughput cDNA sequencing (RNA-seq) can reveal new genes and splice variants and quantify expression genome-wide in a single assay. The volume and complexity of data from RNA-seq experiments necessitate scalable, fast and mathematically principled analysis software. TopHat and Cufflinks are free, open-source software tools for gene discovery and comprehensive expression analysis of high-throughput mRNA sequencing (RNA-seq) data. Together, they allow biologists to identify new genes and new splice variants of known ones, as well as compare gene and transcript expression under two or more conditions. This protocol describes in detail how to use TopHat and Cufflinks to perform such analyses. It also covers several accessory tools and utilities that aid in managing data, including CummeRbund, a tool for visualizing RNA-seq analysis results. Although the procedure assumes basic informatics skills, these tools assume little to no background with RNA-seq analysis and are meant for novices and experts alike. The protocol begins with raw sequencing reads and produces a transcriptome assembly, lists of differentially expressed and regulated genes and transcripts, and publication-quality visualizations of analysis results. The protocol's execution time depends on the volume of transcriptome sequencing data and available computing resources but takes less than 1 d of computer time for typical experiments and ∼1 h of hands-on time.
Publication
Journal: Proceedings of the National Academy of Sciences of the United States of America
May/21/2003
Abstract
Breast cancer is the most common malignancy in United States women, accounting for >40,000 deaths each year. These breast tumors are comprised of phenotypically diverse populations of breast cancer cells. Using a model in which human breast cancer cells were grown in immunocompromised mice, we found that only a minority of breast cancer cells had the ability to form new tumors. We were able to distinguish the tumorigenic (tumor initiating) from the nontumorigenic cancer cells based on cell surface marker expression. We prospectively identified and isolated the tumorigenic cells as CD44(+)CD24(-/low)Lineage(-) in eight of nine patients. As few as 100 cells with this phenotype were able to form tumors in mice, whereas tens of thousands of cells with alternate phenotypes failed to form tumors. The tumorigenic subpopulation could be serially passaged: each time cells within this population generated new tumors containing additional CD44(+)CD24(-/low)Lineage(-) tumorigenic cells as well as the phenotypically diverse mixed populations of nontumorigenic cells present in the initial tumor. The ability to prospectively identify tumorigenic cancer cells will facilitate the elucidation of pathways that regulate their growth and survival. Furthermore, because these cells drive tumor development, strategies designed to target this population may lead to more effective therapies.
Publication
Journal: Biometrics
February/2/1989
Abstract
Methods of evaluating and comparing the performance of diagnostic tests are of increasing importance as new tests are developed and marketed. When a test is based on an observed variable that lies on a continuous or graded scale, an assessment of the overall value of the test can be made through the use of a receiver operating characteristic (ROC) curve. The curve is constructed by varying the cutpoint used to determine which values of the observed variable will be considered abnormal and then plotting the resulting sensitivities against the corresponding false positive rates. When two or more empirical curves are constructed based on tests performed on the same individuals, statistical analysis on differences between curves must take into account the correlated nature of the data. This paper presents a nonparametric approach to the analysis of areas under correlated ROC curves, by using the theory on generalized U-statistics to generate an estimated covariance matrix.
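The curve construction described above, and the link between the area under the curve and a U-statistic, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, labels are coded 1 for abnormal and 0 for normal, and higher scores are taken to indicate abnormality. It computes only the empirical curve and area; the covariance estimation for correlated curves that the paper presents is not implemented here.

```python
def roc_points(scores, labels):
    """Empirical ROC curve as (false positive rate, sensitivity) pairs.

    Each distinct score is used once as the cutpoint; a value is called
    abnormal when it is at or above the cutpoint.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for cut in sorted(set(scores), reverse=True):
        true_pos = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        false_pos = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 0)
        points.append((false_pos / n_neg, true_pos / n_pos))
    return points


def auc_u_statistic(scores, labels):
    """Area under the empirical ROC curve as a Mann-Whitney U-statistic.

    This is the fraction of (abnormal, normal) pairs in which the abnormal
    case has the higher score, with ties counted as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    total = 0.0
    for a in pos:
        for b in neg:
            if a > b:
                total += 1.0
            elif a == b:
                total += 0.5
    return total / (len(pos) * len(neg))
```

The pairwise form makes the U-statistic structure explicit at the cost of quadratic running time, which is acceptable for a sketch but not for large samples.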
Publication
Journal: Cell
January/9/1994
Abstract
lin-4 is essential for the normal temporal control of diverse postembryonic developmental events in C. elegans. lin-4 acts by negatively regulating the level of LIN-14 protein, creating a temporal decrease in LIN-14 protein starting in the first larval stage (L1). We have cloned the C. elegans lin-4 locus by chromosomal walking and transformation rescue. We used the C. elegans clone to isolate the gene from three other Caenorhabditis species; all four Caenorhabditis clones functionally rescue the lin-4 null allele of C. elegans. Comparison of the lin-4 genomic sequence from these four species and site-directed mutagenesis of potential open reading frames indicated that lin-4 does not encode a protein. Two small lin-4 transcripts of approximately 22 and 61 nt were identified in C. elegans and found to contain sequences complementary to a repeated sequence element in the 3' untranslated region (UTR) of lin-14 mRNA, suggesting that lin-4 regulates lin-14 translation via an antisense RNA-RNA interaction.
Publication
Journal: Archives of General Psychiatry
June/14/2005
Abstract
BACKGROUND
Little is known about lifetime prevalence or age of onset of DSM-IV disorders.
OBJECTIVE
To estimate lifetime prevalence and age-of-onset distributions of DSM-IV disorders in the recently completed National Comorbidity Survey Replication.
DESIGN AND SETTING
Nationally representative face-to-face household survey conducted between February 2001 and April 2003 using the fully structured World Health Organization World Mental Health Survey version of the Composite International Diagnostic Interview.
PARTICIPANTS
Nine thousand two hundred eighty-two English-speaking respondents aged 18 years and older.
MAIN OUTCOME MEASURES
Lifetime DSM-IV anxiety, mood, impulse-control, and substance use disorders.
RESULTS
Lifetime prevalence estimates are as follows: anxiety disorders, 28.8%; mood disorders, 20.8%; impulse-control disorders, 24.8%; substance use disorders, 14.6%; any disorder, 46.4%. Median age of onset is much earlier for anxiety (11 years) and impulse-control (11 years) disorders than for substance use (20 years) and mood (30 years) disorders. Half of all lifetime cases start by age 14 years and three fourths by age 24 years. Later onsets are mostly of comorbid conditions, with estimated lifetime risk of any disorder at age 75 years (50.8%) only slightly higher than observed lifetime prevalence (46.4%). Lifetime prevalence estimates are higher in recent cohorts than in earlier cohorts and have fairly stable intercohort differences across the life course that vary in substantively plausible ways among sociodemographic subgroups.
CONCLUSIONS
About half of Americans will meet the criteria for a DSM-IV disorder sometime in their life, with first onset usually in childhood or adolescence. Interventions aimed at prevention or early treatment need to focus on youth.
Publication
Journal: Bioinformatics
June/8/2015
Abstract
BACKGROUND
A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed.
RESULTS
We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data, such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes.
AVAILABILITY
HTSeq is released as open-source software under the GNU General Public Licence and is available from http://www-huber.embl.de/HTSeq or from the Python Package Index at https://pypi.python.org/pypi/HTSeq.
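The idea of counting the overlap of reads with genes can be illustrated in plain Python. The sketch below does not use the HTSeq API and is not how htseq-count is implemented; the function name `count_reads`, the half-open `(chromosome, start, end)` interval representation, and the rule that a read overlapping more than one gene is set aside as ambiguous are all assumptions made for illustration.

```python
def count_reads(genes, reads):
    """Count reads per gene by interval overlap (illustrative sketch).

    genes: dict mapping gene name -> (chromosome, start, end), half-open.
    reads: iterable of (chromosome, start, end) aligned read intervals.
    A read is counted for a gene only if it overlaps exactly one gene.
    """
    counts = {name: 0 for name in genes}
    counts["__no_feature"] = 0   # reads overlapping no gene
    counts["__ambiguous"] = 0    # reads overlapping several genes
    for chrom, start, end in reads:
        overlapping = [
            name
            for name, (g_chrom, g_start, g_end) in genes.items()
            if g_chrom == chrom and start < g_end and g_start < end
        ]
        if not overlapping:
            counts["__no_feature"] += 1
        elif len(overlapping) > 1:
            counts["__ambiguous"] += 1
        else:
            counts[overlapping[0]] += 1
    return counts
```

Scanning every gene for every read is quadratic; a real tool would rely on a structure that supports querying by genomic coordinates, which is the kind of data structure the abstract says HTSeq provides.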
Publication
Journal: Science
January/7/2008
Abstract
Somatic cell nuclear transfer allows trans-acting factors present in the mammalian oocyte to reprogram somatic cell nuclei to an undifferentiated state. We show that four factors (OCT4, SOX2, NANOG, and LIN28) are sufficient to reprogram human somatic cells to pluripotent stem cells that exhibit the essential characteristics of embryonic stem (ES) cells. These induced pluripotent human stem cells have normal karyotypes, express telomerase activity, express cell surface markers and genes that characterize human ES cells, and maintain the developmental potential to differentiate into advanced derivatives of all three primary germ layers. Such induced pluripotent human cell lines should be useful in the production of new disease models and in drug development, as well as for applications in transplantation medicine, once technical limitations (for example, mutation through viral integration) are eliminated.
Publication
Journal: Nature Reviews Genetics
January/29/2009
Abstract
RNA-Seq is a recently developed approach to transcriptome profiling that uses deep-sequencing technologies. Studies using this method have already altered our view of the extent and complexity of eukaryotic transcriptomes. RNA-Seq also provides a far more precise measurement of levels of transcripts and their isoforms than other methods. This article describes the RNA-Seq approach, the challenges associated with its application, and the advances made so far in characterizing several eukaryote transcriptomes.
Publication
Journal: Bioinformatics
August/26/2003
Abstract
BACKGROUND
When running experiments that involve multiple high density oligonucleotide arrays, it is important to remove sources of variation between arrays of non-biological origin. Normalization is a process for reducing this variation. It is common to see non-linear relations between arrays and the standard normalization provided by Affymetrix does not perform well in these situations.
RESULTS
We present three methods of performing normalization at the probe intensity level. These methods are called complete data methods because they make use of data from all arrays in an experiment to form the normalizing relation. These algorithms are compared to two methods that make use of a baseline array: a one number scaling based algorithm and a method that uses a non-linear normalizing relation by comparing the variability and bias of an expression measure. Two publicly available datasets are used to carry out the comparisons. The simplest and quickest complete data method is found to perform favorably.
AVAILABILITY
Software implementing all three of the complete data normalization methods is available as part of the R package Affy, which is a part of the Bioconductor project http://www.bioconductor.org.
SUPPLEMENTARY INFORMATION
Additional figures may be found at http://www.stat.berkeley.edu/~bolstad/normalize/index.html
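A "complete data" normalization uses every array to form the normalizing relation. The abstract does not name its three methods, so the sketch below assumes quantile normalization as one plausible example of such a method; the function name `quantile_normalize` and the tie handling (ties broken by position) are illustrative choices, not taken from the abstract or from the Affy package.

```python
def quantile_normalize(arrays):
    """Give every array the same empirical distribution (sketch).

    arrays: list of equal-length lists of probe intensities, one per array.
    Returns new lists in which the value of rank r in each array is
    replaced by the mean, across all arrays, of the values of rank r.
    """
    n_probes = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the r-th smallest value over arrays.
    reference = [
        sum(s[r] for s in sorted_arrays) / len(arrays)
        for r in range(n_probes)
    ]
    normalized = []
    for a in arrays:
        # Probe indices ordered by intensity within this array.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = reference[rank]
        normalized.append(out)
    return normalized
```

After normalization all arrays contain the same set of values, so differences between arrays reflect only the ranking of probes within each array.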
Publication
Journal: Molecular Ecology
August/17/2005
Abstract
The identification of genetically homogeneous groups of individuals is a long standing issue in population genetics. A recent Bayesian algorithm implemented in the software STRUCTURE allows the identification of such groups. However, the ability of this algorithm to detect the true number of clusters (K) in a sample of individuals when patterns of dispersal among populations are not homogeneous has not been tested. The goal of this study is to carry out such tests, using various dispersal scenarios from data generated with an individual-based model. We found that in most cases the estimated 'log probability of data' does not provide a correct estimation of the number of clusters, K. However, using an ad hoc statistic DeltaK based on the rate of change in the log probability of data between successive K values, we found that STRUCTURE accurately detects the uppermost hierarchical level of structure for the scenarios we tested. As might be expected, the results are sensitive to the type of genetic marker used (AFLP vs. microsatellite), the number of loci scored, the number of populations sampled, and the number of individuals typed in each sample.
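A statistic based on the rate of change in the log probability of data between successive K values can be sketched as below. The exact formula is an assumption here, since the abstract does not state it: the sketch divides the absolute second-order difference of the mean log probability by the standard deviation of the replicate runs at that K. The function name `delta_k` and the use of the sample standard deviation are illustrative choices.

```python
from statistics import mean, stdev


def delta_k(log_probs):
    """DeltaK-style statistic for each interior K (illustrative sketch).

    log_probs: dict mapping K -> list of 'log probability of data' values
    from replicate runs at that K (at least two runs per K).
    """
    means = {k: mean(values) for k, values in log_probs.items()}
    result = {}
    for k in sorted(log_probs):
        # The second-order difference needs both neighbouring K values.
        if k - 1 not in means or k + 1 not in means:
            continue
        spread = stdev(log_probs[k])
        if spread == 0:
            continue  # statistic undefined when replicates are identical
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / spread
    return result
```

The K with the largest value, for example `max(d, key=d.get)` for `d = delta_k(runs)`, would be taken as the estimate; the smallest and largest K tested cannot be scored because they lack a neighbour on one side.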
Publication
Journal: Science
August/11/2002
Abstract
It has been more than 10 years since it was first proposed that the neurodegeneration in Alzheimer's disease (AD) may be caused by deposition of amyloid beta-peptide (Abeta) in plaques in brain tissue. According to the amyloid hypothesis, accumulation of Abeta in the brain is the primary influence driving AD pathogenesis. The rest of the disease process, including formation of neurofibrillary tangles containing tau protein, is proposed to result from an imbalance between Abeta production and Abeta clearance.
Publication
Journal: Genome Research
April/25/2002
Abstract
Analyzing vertebrate genomes requires rapid mRNA/DNA and cross-species protein alignments. A new tool, BLAT, is more accurate and 500 times faster than popular existing tools for mRNA/DNA alignments and 50 times faster for protein alignments at sensitivity settings typically used when comparing vertebrate sequences. BLAT's speed stems from an index of all nonoverlapping K-mers in the genome. This index fits inside the RAM of inexpensive computers, and need only be computed once for each genome assembly. BLAT has several major stages. It uses the index to find regions in the genome likely to be homologous to the query sequence. It performs an alignment between homologous regions. It stitches together these aligned regions (often exons) into larger alignments (typically genes). Finally, BLAT revisits small internal exons possibly missed at the first stage and adjusts large gap boundaries that have canonical splice sites where feasible. This paper describes how BLAT was optimized. Effects on speed and sensitivity are explored for various K-mer sizes, mismatch schemes, and number of required index matches. BLAT is compared with other alignment programs on various test sets and then used in several genome-wide applications. http://genome.ucsc.edu hosts a web-based BLAT server for the human genome.
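The first stage described above, an index of nonoverlapping K-mers that is looked up with words from the query, can be sketched in a few lines. This is a toy illustration rather than BLAT's implementation: the function names are hypothetical, only exact matches are considered, and the later alignment, stitching, and exon-refinement stages are omitted.

```python
from collections import defaultdict


def build_index(genome, k):
    """Map each nonoverlapping k-mer of the genome to its start positions."""
    index = defaultdict(list)
    # Step by k so that indexed k-mers do not overlap one another.
    for pos in range(0, len(genome) - k + 1, k):
        index[genome[pos:pos + k]].append(pos)
    return index


def find_hits(index, query, k):
    """Look up every overlapping k-mer of the query in the index.

    Returns (query position, genome position) pairs. Hits that share the
    same difference genome_pos - query_pos lie on one diagonal and point
    to a candidate homologous region.
    """
    hits = []
    for qpos in range(len(query) - k + 1):
        for gpos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, gpos))
    return hits
```

Indexing the genome with nonoverlapping words keeps the index roughly k times smaller than an index of all overlapping words, while scanning the query with overlapping words still allows a match at any offset.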
Publication
Journal: Nature
March/8/1998
Abstract
Experimental introduction of RNA into cells can be used in certain biological systems to interfere with the function of an endogenous gene. Such effects have been proposed to result from a simple antisense mechanism that depends on hybridization between the injected RNA and endogenous messenger RNA transcripts. RNA interference has been used in the nematode Caenorhabditis elegans to manipulate gene expression. Here we investigate the requirements for structure and delivery of the interfering RNA. To our surprise, we found that double-stranded RNA was substantially more effective at producing interference than was either strand individually. After injection into adult animals, purified single strands had at most a modest effect, whereas double-stranded mixtures caused potent and specific interference. The effects of this interference were evident in both the injected animals and their progeny. Only a few molecules of injected double-stranded RNA were required per affected cell, arguing against stoichiometric interference with endogenous mRNA and suggesting that there could be a catalytic or amplification component in the interference process.