Publication
Journal: Nature Methods
July/16/2008
Abstract
We have mapped and quantified mouse transcriptomes by deeply sequencing them and recording how frequently each gene is represented in the sequence sample (RNA-Seq). This provides a digital measure of the presence and prevalence of transcripts from known and previously unknown genes. We report reference measurements composed of 41-52 million mapped 25-base-pair reads for poly(A)-selected RNA from adult mouse brain, liver and skeletal muscle tissues. We used RNA standards to quantify transcript prevalence and to test the linear range of transcript detection, which spanned five orders of magnitude. Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3' untranscribed regions, as well as new candidate microRNA precursors. RNA splice events, which are not readily measured by standard gene expression microarray or serial analysis of gene expression methods, were detected directly by mapping splice-crossing sequence reads. We observed 1.45 × 10^5 distinct splices, and alternative splices were prominent, with 3,500 different genes expressing one or more alternate internal splices.
Publication
Journal: Genetics
September/16/1974
Abstract
Methods are described for the isolation, complementation and mapping of mutants of Caenorhabditis elegans, a small free-living nematode worm. About 300 EMS-induced mutants affecting behavior and morphology have been characterized and about one hundred genes have been defined. Mutations in 77 of these alter the movement of the animal. Estimates of the induced mutation frequency of both the visible mutants and X chromosome lethals suggest that, just as in Drosophila, the genetic units in C. elegans are large.
Publication
Journal: Genome Biology
June/1/2011
Abstract
High-throughput sequencing assays such as RNA-Seq, ChIP-Seq or barcode counting provide quantitative readouts in the form of count data. To infer differential signal in such data correctly and with good statistical power, estimation of data variability throughout the dynamic range and a suitable error model are required. We propose a method based on the negative binomial distribution, with variance and mean linked by local regression and present an implementation, DESeq, as an R/Bioconductor package.
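As a rough illustration of why a negative binomial error model suits count data, the sketch below computes the variance implied by a mean and a dispersion value. It is not the DESeq implementation: DESeq links variance and mean by local regression, whereas this sketch swaps in a single quadratic dispersion parameter estimated by the method of moments. The function names `nb_variance` and `moment_dispersion` are hypothetical.

```python
from statistics import mean, variance


def nb_variance(mu: float, dispersion: float) -> float:
    """Variance of a negative binomial with mean mu: mu + dispersion * mu^2.

    Assumption: one quadratic dispersion parameter, not DESeq's local regression.
    """
    return mu + dispersion * mu ** 2


def moment_dispersion(counts: list[float]) -> float:
    """Method-of-moments dispersion estimate from replicate counts.

    Solves sample_variance = m + dispersion * m^2 for dispersion,
    clamped at zero (a value of zero corresponds to Poisson noise).
    """
    m = mean(counts)
    if m == 0:
        return 0.0
    v = variance(counts)  # unbiased sample variance
    return max((v - m) / m ** 2, 0.0)


if __name__ == "__main__":
    replicates = [10, 20, 30]  # made-up counts for one gene
    alpha = moment_dispersion(replicates)
    print(alpha, nb_variance(mean(replicates), alpha))
```

When the dispersion is zero the variance equals the mean, which is the Poisson case; replicate counts that vary more than that give a positive dispersion.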
Publication
Journal: Cell
February/9/2005
Abstract
We predict regulatory targets of vertebrate microRNAs (miRNAs) by identifying mRNAs with conserved complementarity to the seed (nucleotides 2-7) of the miRNA. An overrepresentation of conserved adenosines flanking the seed complementary sites in mRNAs indicates that primary sequence determinants can supplement base pairing to specify miRNA target recognition. In a four-genome analysis of 3' UTRs, approximately 13,000 regulatory relationships were detected above the estimate of false-positive predictions, thereby implicating as miRNA targets more than 5300 human genes, which represented 30% of our gene set. Targeting was also detected in open reading frames. In sum, well over one third of human genes appear to be conserved miRNA targets.
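The seed-matching step described in this abstract can be sketched as a plain string search. This is a minimal illustration only: it looks for perfect complementarity to nucleotides 2-7 of the miRNA and does not check conservation across genomes or flanking adenosines, both of which the abstract says matter. The function names are hypothetical.

```python
def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string (A, C, G, U only)."""
    complement = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(rna))


def seed_sites(mirna: str, utr: str) -> list[int]:
    """Return 0-based start positions in `utr` that pair perfectly with the seed.

    The seed is nucleotides 2-7 of the miRNA (1-based), i.e. mirna[1:7].
    Both sequences are assumed to be written 5' to 3'.
    """
    site = reverse_complement(mirna[1:7])
    width = len(site)
    return [i for i in range(len(utr) - width + 1) if utr[i:i + width] == site]


if __name__ == "__main__":
    let7a = "UGAGGUAGUAGGUUGUAUAGUU"  # seed (positions 2-7) is GAGGUA
    toy_utr = "AAAUACCUCAAA"          # made-up 3' UTR fragment
    print(seed_sites(let7a, toy_utr))  # [3]
```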
Publication
Journal: Nature Biotechnology
August/29/2010
Abstract
High-throughput mRNA sequencing (RNA-Seq) promises simultaneous transcript discovery and abundance estimation. However, this would require algorithms that are not restricted by prior gene annotations and that account for alternative transcription and splicing. Here we introduce such algorithms in an open-source software program called Cufflinks. To test Cufflinks, we sequenced and analyzed >430 million paired 75-bp RNA-Seq reads from a mouse myoblast cell line over a differentiation time series. We detected 13,692 known transcripts and 3,724 previously unannotated ones, 62% of which are supported by independent expression data or by homologous genes in other species. Over the time series, 330 genes showed complete switches in the dominant transcription start site (TSS) or splice isoform, and we observed more subtle shifts in 1,304 other genes. These results suggest that Cufflinks can illuminate the substantial regulatory flexibility and complexity in even this well-studied model of muscle development and that it can improve transcriptome-based genome annotation.
Publication
Journal: Biostatistics
October/22/2003
Abstract
In this paper we report exploratory analyses of high-density oligonucleotide array data from the Affymetrix GeneChip system with the objective of improving upon currently used measures of gene expression. Our analyses make use of three data sets: a small experimental study consisting of five MGU74A mouse GeneChip arrays, part of the data from an extensive spike-in study conducted by Gene Logic and Wyeth's Genetics Institute involving 95 HG-U95A human GeneChip arrays; and part of a dilution study conducted by Gene Logic involving 75 HG-U95A GeneChip arrays. We display some familiar features of the perfect match and mismatch probe (PM and MM) values of these data, and examine the variance-mean relationship with probe-level data from probes believed to be defective, and so delivering noise only. We explain why we need to normalize the arrays to one another using probe level intensities. We then examine the behavior of the PM and MM using spike-in data and assess three commonly used summary measures: Affymetrix's (i) average difference (AvDiff) and (ii) MAS 5.0 signal, and (iii) the Li and Wong multiplicative model-based expression index (MBEI). The exploratory data analyses of the probe level data motivate a new summary measure that is a robust multi-array average (RMA) of background-adjusted, normalized, and log-transformed PM values. We evaluate the four expression summary measures using the dilution study data, assessing their behavior in terms of bias, variance and (for MBEI and RMA) model fit. Finally, we evaluate the algorithms in terms of their ability to detect known levels of differential expression using the spike-in data. We conclude that there is no obvious downside to using RMA and attaching a standard error (SE) to this quantity using a linear model which removes probe-specific affinities.
Publication
Journal: Nature
October/16/2012
Abstract
The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.
Publication
Journal: Nucleic Acids Research
April/13/1997
Abstract
We describe a program, tRNAscan-SE, which identifies 99-100% of transfer RNA genes in DNA sequence while giving less than one false positive per 15 gigabases. Two previously described tRNA detection programs are used as fast, first-pass prefilters to identify candidate tRNAs, which are then analyzed by a highly selective tRNA covariance model. This work represents a practical application of RNA covariance models, which are general, probabilistic secondary structure profiles based on stochastic context-free grammars. tRNAscan-SE searches at approximately 30 000 bp/s. Additional extensions to tRNAscan-SE detect unusual tRNA homologues such as selenocysteine tRNAs, tRNA-derived repetitive elements and tRNA pseudogenes.
Publication
Journal: Journal of Molecular Evolution
April/12/1981
Abstract
Some simple formulae were obtained which enable us to estimate evolutionary distances in terms of the number of nucleotide substitutions (and, also, the evolutionary rates when the divergence times are known). In comparing a pair of nucleotide sequences, we distinguish two types of differences; if homologous sites are occupied by different nucleotide bases but both are purines or both pyrimidines, the difference is called type I (or "transition" type), while, if one of the two is a purine and the other is a pyrimidine, the difference is called type II (or "transversion" type). Letting P and Q be respectively the fractions of nucleotide sites showing type I and type II differences between two sequences compared, then the evolutionary distance per site is $K = -\frac{1}{2} \ln\left[(1 - 2P - Q)\sqrt{1 - 2Q}\right]$. The evolutionary rate per year is then given by $k = K/(2T)$, where T is the time since the divergence of the two sequences. If only the third codon positions are compared, the synonymous component of the evolutionary base substitutions per site is estimated by $K'_S = -\frac{1}{2} \ln(1 - 2P - Q)$. Also, formulae for standard errors were obtained. Some examples were worked out using reported globin sequences to show that synonymous substitutions occur at much higher rates than amino acid-altering substitutions in evolution.
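The distance and rate formulae in this abstract translate directly into code. The sketch below takes P and Q as already-measured fractions; the function names and the example values are hypothetical, and the standard-error formulae mentioned in the abstract are not reproduced.

```python
import math


def evolutionary_distance(p: float, q: float) -> float:
    """Distance per site: K = -(1/2) ln[(1 - 2P - Q) * sqrt(1 - 2Q)].

    p: fraction of sites with type I (transition) differences.
    q: fraction of sites with type II (transversion) differences.
    """
    a = 1.0 - 2.0 * p - q
    b = 1.0 - 2.0 * q
    if a <= 0.0 or b <= 0.0:
        # The logarithm is undefined for sequences this divergent.
        raise ValueError("P and Q are too large for the formula to apply")
    return -0.5 * math.log(a * math.sqrt(b))


def evolutionary_rate(k_distance: float, divergence_time: float) -> float:
    """Rate per site per year: k = K / (2T)."""
    return k_distance / (2.0 * divergence_time)


if __name__ == "__main__":
    K = evolutionary_distance(0.10, 0.05)  # made-up fractions
    print(K, evolutionary_rate(K, 80e6))   # made-up divergence time in years
```

Note that K exceeds the raw fraction of differing sites, P + Q, because the formula corrects for multiple substitutions at the same site.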
Publication
Journal: Nucleic Acids Research
April/20/1983
Abstract
We have developed a procedure for preparing extracts from nuclei of human tissue culture cells that directs accurate transcription initiation in vitro from class II promoters. Conditions of extraction and assay have been optimized for maximum activity using the major late promoter of adenovirus 2. The extract also directs accurate transcription initiation from other adenovirus promoters and cellular promoters. The extract also directs accurate transcription initiation from class III promoters (tRNA and Ad 2 VA).
Publication
Journal: Genome Research
June/11/2008
Abstract
We have developed a new set of algorithms, collectively called "Velvet," to manipulate de Bruijn graphs for genomic sequence assembly. A de Bruijn graph is a compact representation based on short words (k-mers) that is ideal for high coverage, very short read (25-50 bp) data sets. Applying Velvet to very short reads and paired-ends information only, one can produce contigs of significant length, up to 50-kb N50 length in simulations of prokaryotic data and 3-kb N50 on simulated mammalian BACs. When applied to real Solexa data sets without read pairs, Velvet generated contigs of approximately 8 kb in a prokaryote and 2 kb in a mammalian BAC, in close agreement with our simulated results without read-pair information. Velvet represents a new approach to assembly that can leverage very short reads in combination with read pairs to produce useful assemblies.
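To make the de Bruijn graph idea concrete, here is a textbook construction from short reads: every k-mer contributes an edge from its (k-1)-base prefix to its (k-1)-base suffix. This is only a sketch of the general representation, not Velvet's own data structures or its error-removal and read-pair steps; the function name `de_bruijn_graph` is hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads: list[str], k: int) -> dict[str, set[str]]:
    """Build a de Bruijn graph from reads.

    Nodes are (k-1)-mers; each k-mer in a read adds an edge
    from its prefix (first k-1 bases) to its suffix (last k-1 bases).
    Repeated k-mers collapse onto the same edge.
    """
    graph: dict[str, set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    toy_reads = ["ACGT", "CGTA"]  # made-up overlapping reads
    for node, successors in de_bruijn_graph(toy_reads, 3).items():
        print(node, "->", sorted(successors))
```

Because overlapping reads share k-mers, they collapse onto the same edges, which is why the representation stays compact at high coverage.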
Publication
Journal: Nucleic Acids Research
January/20/2002
Abstract
The Gene Expression Omnibus (GEO) project was initiated in response to the growing demand for a public repository for high-throughput gene expression data. GEO provides a flexible and open design that facilitates submission, storage and retrieval of heterogeneous data sets from high-throughput gene expression and genomic hybridization experiments. GEO is not intended to replace in house gene expression databases that benefit from coherent data sets, and which are constructed to facilitate a particular analytic method, but rather complement these by acting as a tertiary, central data distribution hub. The three central data entities of GEO are platforms, samples and series, and were designed with gene expression and genomic hybridization experiments in mind. A platform is, essentially, a list of probes that define what set of molecules may be detected. A sample describes the set of molecules that are being probed and references a single platform used to generate its molecular abundance data. A series organizes samples into the meaningful data sets which make up an experiment. The GEO repository is publicly accessible through the World Wide Web at http://www.ncbi.nlm.nih.gov/geo.
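The relationship between the three data entities can be sketched as plain data classes. All field names and identifiers below are hypothetical and chosen for illustration; they are not GEO's actual schema or submission format.

```python
from dataclasses import dataclass, field


@dataclass
class Platform:
    """A list of probes defining which molecules may be detected."""
    identifier: str
    probes: list[str]


@dataclass
class Sample:
    """Abundance measurements for one sample, tied to a single platform."""
    identifier: str
    platform: Platform
    abundance: dict[str, float]

    def __post_init__(self) -> None:
        # A sample may only report values for probes its platform defines.
        unknown = set(self.abundance) - set(self.platform.probes)
        if unknown:
            raise ValueError(f"probes not on platform: {sorted(unknown)}")


@dataclass
class Series:
    """A group of samples that together make up one experiment."""
    identifier: str
    samples: list[Sample] = field(default_factory=list)


if __name__ == "__main__":
    platform = Platform("platform-1", ["probe_a", "probe_b"])
    control = Sample("sample-1", platform, {"probe_a": 1.5, "probe_b": 0.2})
    treated = Sample("sample-2", platform, {"probe_a": 3.1, "probe_b": 0.3})
    experiment = Series("series-1", [control, treated])
    print(len(experiment.samples))
```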
Publication
Journal: Nature
January/22/2003
Abstract
Recent data have expanded the concept that inflammation is a critical component of tumour progression. Many cancers arise from sites of infection, chronic irritation and inflammation. It is now becoming clear that the tumour microenvironment, which is largely orchestrated by inflammatory cells, is an indispensable participant in the neoplastic process, fostering proliferation, survival and migration. In addition, tumour cells have co-opted some of the signalling molecules of the innate immune system, such as selectins, chemokines and their receptors for invasion, migration and metastasis. These insights are fostering new anti-inflammatory therapeutic approaches to cancer development.
Publication
Journal: Journal of Molecular Biology
January/11/1994
Abstract
We describe a comparative protein modelling method designed to find the most probable structure for a sequence given its alignment with related structures. The three-dimensional (3D) model is obtained by optimally satisfying spatial restraints derived from the alignment and expressed as probability density functions (pdfs) for the features restrained. For example, the probabilities for main-chain conformations of a modelled residue may be restrained by its residue type, main-chain conformation of an equivalent residue in a related protein, and the local similarity between the two sequences. Several such pdfs are obtained from the correlations between structural features in 17 families of homologous proteins which have been aligned on the basis of their 3D structures. The pdfs restrain C alpha-C alpha distances, main-chain N-O distances, main-chain and side-chain dihedral angles. A smoothing procedure is used in the derivation of these relationships to minimize the problem of a sparse database. The 3D model of a protein is obtained by optimization of the molecular pdf such that the model violates the input restraints as little as possible. The molecular pdf is derived as a combination of pdfs restraining individual spatial features of the whole molecule. The optimization procedure is a variable target function method that applies the conjugate gradients algorithm to positions of all non-hydrogen atoms. The method is automated and is illustrated by the modelling of trypsin from two other serine proteinases.
Publication
Journal: Science
June/3/2009
Abstract
In contrast to normal differentiated cells, which rely primarily on mitochondrial oxidative phosphorylation to generate the energy needed for cellular processes, most cancer cells instead rely on aerobic glycolysis, a phenomenon termed "the Warburg effect." Aerobic glycolysis is an inefficient way to generate adenosine 5'-triphosphate (ATP), however, and the advantage it confers to cancer cells has been unclear. Here we propose that the metabolism of cancer cells, and indeed all proliferating cells, is adapted to facilitate the uptake and incorporation of nutrients into the biomass (e.g., nucleotides, amino acids, and lipids) needed to produce a new cell. Supporting this idea are recent studies showing that (i) several signaling pathways implicated in cell proliferation also regulate metabolic pathways that incorporate nutrients into biomass; and that (ii) certain cancer-associated mutations enable cancer cells to acquire and metabolize nutrients in a manner conducive to proliferation rather than efficient ATP production. A better understanding of the mechanistic links between cellular metabolism and growth control may ultimately lead to better treatments for human cancer.
Publication
Journal: Journal of Molecular Biology
February/14/2001
Abstract
We describe and validate a new membrane protein topology prediction method, TMHMM, based on a hidden Markov model. We present a detailed analysis of TMHMM's performance, and show that it correctly predicts 97-98 % of the transmembrane helices. Additionally, TMHMM can discriminate between soluble and membrane proteins with both specificity and sensitivity better than 99 %, although the accuracy drops when signal peptides are present. This high degree of accuracy allowed us to predict reliably integral membrane proteins in a large collection of genomes. Based on these predictions, we estimate that 20-30 % of all genes in most genomes encode membrane proteins, which is in agreement with previous estimates. We further discovered that proteins with N(in)-C(in) topologies are strongly preferred in all examined organisms, except Caenorhabditis elegans, where the large number of 7TM receptors increases the counts for N(out)-C(in) topologies. We discuss the possible relevance of this finding for our understanding of membrane protein assembly mechanisms. A TMHMM prediction service is available at http://www.cbs.dtu.dk/services/TMHMM/.
Publication
Journal: Proceedings of the National Academy of Sciences of the United States of America
October/2/2003
Abstract
With the increase in genomewide experiments and the sequencing of multiple genomes, the analysis of large data sets has become commonplace in biology. It is often the case that thousands of features in a genomewide data set are tested against some null hypothesis, where a number of features are expected to be significant. Here we propose an approach to measuring statistical significance in these genomewide studies based on the concept of the false discovery rate. This approach offers a sensible balance between the number of true and false positives that is automatically calibrated and easily interpreted. In doing so, a measure of statistical significance called the q value is associated with each tested feature. The q value is similar to the well known p value, except it is a measure of significance in terms of the false discovery rate rather than the false positive rate. Our approach avoids a flood of false positive results, while offering a more liberal criterion than what has been used in genome scans for linkage.
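One way to see how a q value attaches to each tested feature is the step-up calculation below. It is a simplified sketch: the proportion of true null features, here the hypothetical parameter `pi0`, is fixed by the caller rather than estimated from the data, and with the default `pi0 = 1.0` the result reduces to the Benjamini-Hochberg adjusted p value.

```python
def q_values(pvalues: list[float], pi0: float = 1.0) -> list[float]:
    """Return a q value for each p value, in the input order.

    For the feature ranked r-th smallest among m p values, the candidate
    value is pi0 * m * p / r; taking a running minimum from the largest
    p value downward makes q non-decreasing in p.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    result = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[index] / rank)
        result[index] = running_min
    return result


if __name__ == "__main__":
    toy_pvalues = [0.01, 0.04, 0.03, 0.5]  # made-up p values
    print(q_values(toy_pvalues))
```

Calling every feature with q below 0.05 significant then means that about 5% of the features called significant are expected to be false positives.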
Publication
Journal: Nature
October/14/2004
Abstract
MicroRNAs (miRNAs) are small RNAs that regulate the expression of complementary messenger RNAs. Hundreds of miRNA genes have been found in diverse animals, and many of these are phylogenetically conserved. With miRNA roles identified in developmental timing, cell death, cell proliferation, haematopoiesis and patterning of the nervous system, evidence is mounting that animal miRNAs are more numerous, and their regulatory impact more pervasive, than was previously suspected.
Publication
Journal: Molecular Biology and Evolution
October/28/2007
Abstract
PAML, currently in version 4, is a package of programs for phylogenetic analyses of DNA and protein sequences using maximum likelihood (ML). The programs may be used to compare and test phylogenetic trees, but their main strengths lie in the rich repertoire of evolutionary models implemented, which can be used to estimate parameters in models of sequence evolution and to test interesting biological hypotheses. Uses of the programs include estimation of synonymous and nonsynonymous rates (dN and dS) between two protein-coding DNA sequences, inference of positive Darwinian selection through phylogenetic comparison of protein-coding genes, reconstruction of ancestral genes and proteins for molecular restoration studies of extinct life forms, combined analysis of heterogeneous data sets from multiple gene loci, and estimation of species divergence times incorporating uncertainties in fossil calibrations. This note discusses some of the major applications of the package, which includes example data sets to demonstrate their use. The package is written in ANSI C, and runs under Windows, Mac OSX, and UNIX systems. It is available at http://abacus.gene.ucl.ac.uk/software/paml.html.
Publication
Journal: Science
November/24/1998
Abstract
Human blastocyst-derived, pluripotent cell lines are described that have normal karyotypes, express high levels of telomerase activity, and express cell surface markers that characterize primate embryonic stem cells but do not characterize other early lineages. After undifferentiated proliferation in vitro for 4 to 5 months, these cells still maintained the developmental potential to form trophoblast and derivatives of all three embryonic germ layers, including gut epithelium (endoderm); cartilage, bone, smooth muscle, and striated muscle (mesoderm); and neural epithelium, embryonic ganglia, and stratified squamous epithelium (ectoderm). These cell lines should be useful in human developmental biology, drug discovery, and transplantation medicine.
Publication
Journal: Cell
March/29/2007
Abstract
The surface of nucleosomes is studded with a multiplicity of modifications. At least eight different classes have been characterized to date and many different sites have been identified for each class. Operationally, modifications function either by disrupting chromatin contacts or by affecting the recruitment of nonhistone proteins to chromatin. Their presence on histones can dictate the higher-order chromatin structure in which DNA is packaged and can orchestrate the ordered recruitment of enzyme complexes to manipulate DNA. In this way, histone modifications have the potential to influence many fundamental biological processes, some of which may be epigenetically inherited.
Publication
Journal: Proceedings of the National Academy of Sciences of the United States of America
May/21/2003
Abstract
Breast cancer is the most common malignancy in United States women, accounting for >40,000 deaths each year. These breast tumors are comprised of phenotypically diverse populations of breast cancer cells. Using a model in which human breast cancer cells were grown in immunocompromised mice, we found that only a minority of breast cancer cells had the ability to form new tumors. We were able to distinguish the tumorigenic (tumor initiating) from the nontumorigenic cancer cells based on cell surface marker expression. We prospectively identified and isolated the tumorigenic cells as CD44(+)CD24(-/low)Lineage(-) in eight of nine patients. As few as 100 cells with this phenotype were able to form tumors in mice, whereas tens of thousands of cells with alternate phenotypes failed to form tumors. The tumorigenic subpopulation could be serially passaged: each time cells within this population generated new tumors containing additional CD44(+)CD24(-/low)Lineage(-) tumorigenic cells as well as the phenotypically diverse mixed populations of nontumorigenic cells present in the initial tumor. The ability to prospectively identify tumorigenic cancer cells will facilitate the elucidation of pathways that regulate their growth and survival. Furthermore, because these cells drive tumor development, strategies designed to target this population may lead to more effective therapies.
Publication
Journal: Cell
January/9/1994
Abstract
lin-4 is essential for the normal temporal control of diverse postembryonic developmental events in C. elegans. lin-4 acts by negatively regulating the level of LIN-14 protein, creating a temporal decrease in LIN-14 protein starting in the first larval stage (L1). We have cloned the C. elegans lin-4 locus by chromosomal walking and transformation rescue. We used the C. elegans clone to isolate the gene from three other Caenorhabditis species; all four Caenorhabditis clones functionally rescue the lin-4 null allele of C. elegans. Comparison of the lin-4 genomic sequence from these four species and site-directed mutagenesis of potential open reading frames indicated that lin-4 does not encode a protein. Two small lin-4 transcripts of approximately 22 and 61 nt were identified in C. elegans and found to contain sequences complementary to a repeated sequence element in the 3' untranslated region (UTR) of lin-14 mRNA, suggesting that lin-4 regulates lin-14 translation via an antisense RNA-RNA interaction.
Publication
Journal: Analytical Biochemistry
August/28/1979