Publication
Journal: Acta Crystallographica Section A: Foundations of Crystallography
September/2/2008
Abstract
An account is given of the development of the SHELX system of computer programs from SHELX-76 to the present day. In addition to identifying useful innovations that have come into general use through their implementation in SHELX, a critical analysis is presented of the less-successful features, missed opportunities and desirable improvements for future releases of the software. An attempt is made to understand how a program originally designed for photographic intensity data, punched cards and computers over 10000 times slower than an average modern personal computer has managed to survive for so long. SHELXL is the most widely used program for small-molecule refinement and SHELXS and SHELXD are often employed for structure solution despite the availability of objectively superior programs. SHELXL also finds a niche for the refinement of macromolecules against high-resolution or twinned data; SHELXPRO acts as an interface for macromolecular applications. SHELXC, SHELXD and SHELXE are proving useful for the experimental phasing of macromolecules, especially because they are fast and robust and so are often employed in pipelines for high-throughput phasing. This paper could serve as a general literature citation when one or more of the open-source SHELX programs (and the Bruker AXS version SHELXTL) are employed in the course of a crystal-structure determination.
Publication
Journal: Nucleic Acids Research
January/2/1995
Abstract
The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved for the alignment of divergent protein sequences. Firstly, individual weights are assigned to each sequence in a partial alignment in order to down-weight near-duplicate sequences and up-weight the most divergent ones. Secondly, amino acid substitution matrices are varied at different alignment stages according to the divergence of the sequences to be aligned. Thirdly, residue-specific gap penalties and locally reduced gap penalties in hydrophilic regions encourage new gaps in potential loop regions rather than regular secondary structure. Fourthly, positions in early alignments where gaps have been opened receive locally reduced gap penalties to encourage the opening up of new gaps at these positions. These modifications are incorporated into a new program, CLUSTAL W, which is freely available.
Publication
Journal: Cell
February/19/2004
Abstract
MicroRNAs (miRNAs) are endogenous approximately 22 nt RNAs that can play important regulatory roles in animals and plants by targeting mRNAs for cleavage or translational repression. Although they escaped notice until relatively recently, miRNAs comprise one of the more abundant classes of gene regulatory molecules in multicellular organisms and likely influence the output of many protein-coding genes.
Publication
Journal: Nature Protocols
March/3/2009
Abstract
The DAVID bioinformatics resource consists of an integrated biological knowledgebase and analytic tools aimed at systematically extracting biological meaning from large gene/protein lists. This protocol explains how to use DAVID, a high-throughput and integrated data-mining environment, to analyze gene lists derived from high-throughput genomic experiments. The procedure first requires uploading a gene list containing any number of common gene identifiers followed by analysis using one or more text and pathway-mining tools such as gene functional classification, functional annotation chart or clustering and functional annotation table. By following this protocol, investigators are able to gain an in-depth understanding of the biological themes in lists of genes that are enriched in genome-scale studies.
Publication
Journal: BMJ
October/22/1997
Abstract
OBJECTIVE
Funnel plots (plots of effect estimates against sample size) may be useful to detect bias in meta-analyses that were later contradicted by large trials. We examined whether a simple test of asymmetry of funnel plots predicts discordance of results when meta-analyses are compared to large trials, and we assessed the prevalence of bias in published meta-analyses.
METHODS
Medline search to identify pairs consisting of a meta-analysis and a single large trial (concordance of results was assumed if effects were in the same direction and the meta-analytic estimate was within 30% of the trial); analysis of funnel plots from 37 meta-analyses identified from a hand search of four leading general medicine journals 1993-6 and 38 meta-analyses from the second 1996 issue of the Cochrane Database of Systematic Reviews.
MAIN OUTCOME MEASURE
Degree of funnel plot asymmetry as measured by the intercept from regression of standard normal deviates against precision.
RESULTS
In the eight pairs of meta-analysis and large trial that were identified (five from cardiovascular medicine, one from diabetic medicine, one from geriatric medicine, one from perinatal medicine) there were four concordant and four discordant pairs. In all cases discordance was due to meta-analyses showing larger effects. Funnel plot asymmetry was present in three out of four discordant pairs but in none of the concordant pairs. In 14 (38%) journal meta-analyses and 5 (13%) Cochrane reviews, funnel plot asymmetry indicated that there was bias.
CONCLUSIONS
A simple analysis of funnel plots provides a useful test for the likely presence of bias in meta-analyses, but as the capacity to detect bias will be limited when meta-analyses are based on a limited number of small trials the results from such analyses should be treated with considerable caution.
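The asymmetry test described above regresses each trial's standard normal deviate (effect estimate divided by its standard error) against its precision (the reciprocal of the standard error); an intercept far from zero suggests funnel-plot asymmetry. A minimal sketch in Python, with hypothetical inputs (the function name is illustrative and the paper's weighted-regression details are omitted):

```python
def egger_test(effects, ses):
    """Sketch of a funnel-plot asymmetry test: ordinary least squares
    of standard normal deviates (effect / SE) on precision (1 / SE).
    An intercept far from zero suggests small-study asymmetry."""
    snd = [e / s for e, s in zip(effects, ses)]   # standard normal deviates
    prec = [1.0 / s for s in ses]                 # precisions
    n = len(snd)
    mx = sum(prec) / n
    my = sum(snd) / n
    sxx = sum((x - mx) ** 2 for x in prec)
    sxy = sum((x - mx) * (y - my) for x, y in zip(prec, snd))
    slope = sxy / sxx                  # approximates the pooled effect
    intercept = my - slope * mx        # asymmetry indicator
    return intercept, slope
```

In a symmetric funnel the intercept is near zero and the slope approximates the pooled effect.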
Publication
Journal: The Lancet
March/12/1986
Abstract
In clinical measurement comparison of a new measurement technique with an established one is often needed to see whether they agree sufficiently for the new to replace the old. Such investigations are often analysed inappropriately, notably by using correlation coefficients. The use of correlation is misleading. An alternative approach, based on graphical techniques and simple calculations, is described, together with the relation between this analysis and the assessment of repeatability.
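The graphical approach described above is usually summarized by the mean difference between the two methods (the bias) and the 95% limits of agreement. A minimal sketch, assuming paired measurements from the two methods (the function name and data are hypothetical):

```python
from statistics import mean, stdev

def limits_of_agreement(method_a, method_b):
    """Sketch of a Bland-Altman-style agreement summary: the bias
    (mean difference) and the 95% limits of agreement,
    bias +/- 1.96 * SD of the paired differences."""
    diffs = [x - y for x, y in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)               # sample SD of the differences
    return bias - 1.96 * sd, bias, bias + 1.96 * sd
```

If the two methods agree, most differences fall between the lower and upper limits, and the bias is close to zero.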
Publication
Journal: Journal of Molecular Graphics
December/3/1996
Abstract
VMD is a molecular graphics program designed for the display and analysis of molecular assemblies, in particular biopolymers such as proteins and nucleic acids. VMD can simultaneously display any number of structures using a wide variety of rendering styles and coloring methods. Molecules are displayed as one or more "representations," in which each representation embodies a particular rendering method and coloring scheme for a selected subset of atoms. The atoms displayed in each representation are chosen using an extensive atom selection syntax, which includes Boolean operators and regular expressions. VMD provides a complete graphical user interface for program control, as well as a text interface using the Tcl embeddable parser to allow for complex scripts with variable substitution, control loops, and function calls. Full session logging is supported, which produces a VMD command script for later playback. High-resolution raster images of displayed molecules may be produced by generating input scripts for use by a number of photorealistic image-rendering applications. VMD has also been expressly designed with the ability to animate molecular dynamics (MD) simulation trajectories, imported either from files or from a direct connection to a running MD simulation. VMD is the visualization component of MDScope, a set of tools for interactive problem solving in structural biology, which also includes the parallel MD program NAMD, and the MDCOMM software used to connect the visualization and simulation programs. VMD is written in C++, using an object-oriented design; the program, including source code and extensive documentation, is freely available via anonymous ftp and through the World Wide Web.
Publication
Journal: Acta Crystallographica Section D: Biological Crystallography
February/13/2005
Abstract
The CCP4 (Collaborative Computational Project, number 4) program suite is a collection of programs and associated data and subroutine libraries which can be used for macromolecular structure determination by X-ray crystallography. The suite is designed to be flexible, allowing users a number of methods of achieving their aims and so there may be more than one program to cover each function. The programs are written mainly in standard Fortran77. They are from a wide variety of sources but are connected by standard data file formats. The package has been ported to all the major platforms under both Unix and VMS. The suite is distributed by anonymous ftp from Daresbury Laboratory and is widely used throughout the world.
Publication
Journal: Medical Care
July/1/1992
Abstract
A 36-item short-form (SF-36) was constructed to survey health status in the Medical Outcomes Study. The SF-36 was designed for use in clinical practice and research, health policy evaluations, and general population surveys. The SF-36 includes one multi-item scale that assesses eight health concepts: 1) limitations in physical activities because of health problems; 2) limitations in social activities because of physical or emotional problems; 3) limitations in usual role activities because of physical health problems; 4) bodily pain; 5) general mental health (psychological distress and well-being); 6) limitations in usual role activities because of emotional problems; 7) vitality (energy and fatigue); and 8) general health perceptions. The survey was constructed for self-administration by persons 14 years of age and older, and for administration by a trained interviewer in person or by telephone. The history of the development of the SF-36, the origin of specific items, and the logic underlying their selection are summarized. The content and features of the SF-36 are compared with the 20-item Medical Outcomes Study short-form.
Publication
Journal: Journal of Applied Crystallography
September/21/2017
Abstract
Phaser is a program for phasing macromolecular crystal structures by both molecular replacement and experimental phasing methods. The novel phasing algorithms implemented in Phaser have been developed using maximum likelihood and multivariate statistics. For molecular replacement, the new algorithms have proved to be significantly better than traditional methods in discriminating correct solutions from noise, and for single-wavelength anomalous dispersion experimental phasing, the new algorithms, which account for correlations between F(+) and F(-), give better phases (lower mean phase error with respect to the phases given by the refined structure) than those that use mean F and anomalous differences DeltaF. One of the design concepts of Phaser was that it be capable of a high degree of automation. To this end, Phaser (written in C++) can be called directly from Python, although it can also be called using traditional CCP4 keyword-style input. Phaser is a platform for future development of improved phasing methods and their release, including source code, to the crystallographic community.
Publication
Journal: Bioinformatics
March/1/2010
Abstract
SUMMARY
It is expected that emerging digital gene expression (DGE) technologies will overtake microarray technologies in the near future for many functional genomics applications. One of the fundamental data analysis tasks, especially for gene expression studies, involves determining whether there is evidence that counts for a transcript or exon are significantly different across experimental conditions. edgeR is a Bioconductor software package for examining differential expression of replicated count data. An overdispersed Poisson model is used to account for both biological and technical variability. Empirical Bayes methods are used to moderate the degree of overdispersion across transcripts, improving the reliability of inference. The methodology can be used even with the most minimal levels of replication, provided at least one phenotype or experimental condition is replicated. The software may have other applications beyond sequencing data, such as proteome peptide count data.
AVAILABILITY
The package is freely available under the LGPL licence from the Bioconductor web site (http://bioconductor.org).
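The overdispersed Poisson model mentioned above implies a quadratic mean-variance relationship for counts, var = mu + phi * mu^2, where phi is the dispersion. A minimal sketch of that relationship (a simplification; the function name is hypothetical and edgeR's empirical Bayes moderation of phi is not reproduced here):

```python
def overdispersed_variance(mu, phi):
    """Sketch of the negative-binomial-style mean-variance relation
    used by overdispersed count models: var = mu + phi * mu^2.
    With phi = 0 this reduces to the Poisson variance (var = mu)."""
    return mu + phi * mu * mu
```

Technical (Poisson) variability dominates at low counts, while biological variability (the phi * mu^2 term) dominates at high counts.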
Publication
Journal: The Lancet
January/27/2020
Abstract
A recent cluster of pneumonia cases in Wuhan, China, was caused by a novel betacoronavirus, the 2019 novel coronavirus (2019-nCoV). We report the epidemiological, clinical, laboratory, and radiological characteristics and treatment and clinical outcomes of these patients. All patients with suspected 2019-nCoV were admitted to a designated hospital in Wuhan. We prospectively collected and analysed data on patients with laboratory-confirmed 2019-nCoV infection by real-time RT-PCR and next-generation sequencing. Data were obtained with standardised data collection forms shared by the International Severe Acute Respiratory and Emerging Infection Consortium from electronic medical records. Researchers also directly communicated with patients or their families to ascertain epidemiological and symptom data. Outcomes were also compared between patients who had been admitted to the intensive care unit (ICU) and those who had not. By Jan 2, 2020, 41 admitted hospital patients had been identified as having laboratory-confirmed 2019-nCoV infection. Most of the infected patients were men (30 [73%] of 41); less than half had underlying diseases (13 [32%]), including diabetes (eight [20%]), hypertension (six [15%]), and cardiovascular disease (six [15%]). Median age was 49·0 years (IQR 41·0-58·0). 27 (66%) of 41 patients had been exposed to Huanan seafood market. One family cluster was found. Common symptoms at onset of illness were fever (40 [98%] of 41 patients), cough (31 [76%]), and myalgia or fatigue (18 [44%]); less common symptoms were sputum production (11 [28%] of 39), headache (three [8%] of 38), haemoptysis (two [5%] of 39), and diarrhoea (one [3%] of 38). Dyspnoea developed in 22 (55%) of 40 patients (median time from illness onset to dyspnoea 8·0 days [IQR 5·0-13·0]). 26 (63%) of 41 patients had lymphopenia. All 41 patients had pneumonia with abnormal findings on chest CT. Complications included acute respiratory distress syndrome (12 [29%]), RNAaemia (six [15%]), acute cardiac injury (five [12%]) and secondary infection (four [10%]). 13 (32%) patients were admitted to an ICU and six (15%) died. Compared with non-ICU patients, ICU patients had higher plasma levels of IL2, IL7, IL10, GSCF, IP10, MCP1, MIP1A, and TNFα. The 2019-nCoV infection caused clusters of severe respiratory illness similar to severe acute respiratory syndrome coronavirus and was associated with ICU admission and high mortality. Major gaps in our knowledge of the origin, epidemiology, duration of human transmission, and clinical spectrum of disease need fulfilment by future studies. Funding: Ministry of Science and Technology, Chinese Academy of Medical Sciences, National Natural Science Foundation of China, and Beijing Municipal Science and Technology Commission.
Publication
Journal: Statistics in Medicine
August/6/2002
Abstract
The extent of heterogeneity in a meta-analysis partly determines the difficulty in drawing overall conclusions. This extent may be measured by estimating a between-study variance, but interpretation is then specific to a particular treatment effect metric. A test for the existence of heterogeneity exists, but depends on the number of studies in the meta-analysis. We develop measures of the impact of heterogeneity on a meta-analysis, from mathematical criteria, that are independent of the number of studies and the treatment effect metric. We derive and propose three suitable statistics: H is the square root of the χ² heterogeneity statistic divided by its degrees of freedom; R is the ratio of the standard error of the underlying mean from a random effects meta-analysis to the standard error of a fixed effect meta-analytic estimate; and I² is a transformation of H that describes the proportion of total variation in study estimates that is due to heterogeneity. We discuss interpretation, interval estimates and other properties of these measures and examine them in five example data sets showing different amounts of heterogeneity. We conclude that H and I², which can usually be calculated for published meta-analyses, are particularly useful summaries of the impact of heterogeneity. One or both should be presented in published meta-analyses in preference to the test for heterogeneity.
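The statistics defined above can be computed directly from Cochran's Q (the chi-squared heterogeneity statistic) and the number of studies: H = sqrt(Q/df) and I² = (Q - df)/Q × 100, truncated at zero when Q < df. A minimal sketch (the function name is hypothetical):

```python
import math

def heterogeneity_measures(q, k):
    """Sketch of H and I^2 from a heterogeneity statistic.

    q -- Cochran's Q (chi-squared heterogeneity statistic)
    k -- number of studies; degrees of freedom = k - 1
    """
    df = k - 1
    h = math.sqrt(q / df)
    # I^2: percentage of total variation due to heterogeneity,
    # truncated at zero when Q is below its degrees of freedom
    i2 = max(0.0, (q - df) / q) * 100.0
    return h, i2
```

For example, Q = 20 across 11 studies gives H = sqrt(2) and I² = 50%, i.e. half the variation in study estimates is attributable to heterogeneity.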
Publication
Journal: Genome Biology
September/19/2002
Abstract
BACKGROUND
Gene-expression analysis is increasingly important in biological research, with real-time reverse transcription PCR (RT-PCR) becoming the method of choice for high-throughput and accurate expression profiling of selected genes. Given the increased sensitivity, reproducibility and large dynamic range of this methodology, the requirements for a proper internal control gene for normalization have become increasingly stringent. Although housekeeping gene expression has been reported to vary considerably, no systematic survey has properly determined the errors related to the common practice of using only one control gene, nor presented an adequate way of working around this problem.
RESULTS
We outline a robust and innovative strategy to identify the most stably expressed control genes in a given set of tissues, and to determine the minimum number of genes required to calculate a reliable normalization factor. We have evaluated ten housekeeping genes from different abundance and functional classes in various human tissues, and demonstrated that the conventional use of a single gene for normalization leads to relatively large errors in a significant proportion of samples tested. The geometric mean of multiple carefully selected housekeeping genes was validated as an accurate normalization factor by analyzing publicly available microarray data.
CONCLUSIONS
The normalization strategy presented here is a prerequisite for accurate RT-PCR expression profiling, which, among other things, opens up the possibility of studying the biological relevance of small expression differences.
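The normalization factor described above is the geometric mean of the expression values of the selected reference genes. A minimal sketch (the function name and input values are hypothetical; the gene-stability ranking used to select the genes is not reproduced here):

```python
import math

def normalization_factor(reference_values):
    """Sketch of a geometric-mean normalization factor over the
    relative expression values of selected housekeeping genes
    in one sample. The geometric mean damps the influence of any
    single outlying control gene."""
    logs = [math.log(v) for v in reference_values]
    return math.exp(sum(logs) / len(logs))
```

For instance, control-gene values of 2.0 and 8.0 give a factor of 4.0 rather than the arithmetic mean of 5.0.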
Publication
Journal: Proceedings of the National Academy of Sciences of the United States of America
July/12/2000
Abstract
We have developed a simple and highly efficient method to disrupt chromosomal genes in Escherichia coli in which PCR primers provide the homology to the targeted gene(s). In this procedure, recombination requires the phage lambda Red recombinase, which is synthesized under the control of an inducible promoter on an easily curable, low copy number plasmid. To demonstrate the utility of this approach, we generated PCR products by using primers with 36- to 50-nt extensions that are homologous to regions adjacent to the gene to be inactivated and template plasmids carrying antibiotic resistance genes that are flanked by FRT (FLP recognition target) sites. By using the respective PCR products, we made 13 different disruptions of chromosomal genes. Mutants of the arcB, cyaA, lacZYA, ompR-envZ, phnR, pstB, pstCA, pstS, pstSCAB-phoU, recA, and torSTRCAD genes or operons were isolated as antibiotic-resistant colonies after the introduction into bacteria carrying a Red expression plasmid of synthetic (PCR-generated) DNA. The resistance genes were then eliminated by using a helper plasmid encoding the FLP recombinase which is also easily curable. This procedure should be widely useful, especially in genome analysis of E. coli and other bacteria because the procedure can be done in wild-type cells.
Publication
Journal: Journal of Biomolecular NMR
January/24/1996
Abstract
The NMRPipe system is a UNIX software environment of processing, graphics, and analysis tools designed to meet current routine and research-oriented multidimensional processing requirements, and to anticipate and accommodate future demands and developments. The system is based on UNIX pipes, which allow programs running simultaneously to exchange streams of data under user control. In an NMRPipe processing scheme, a stream of spectral data flows through a pipeline of processing programs, each of which performs one component of the overall scheme, such as Fourier transformation or linear prediction. Complete multidimensional processing schemes are constructed as simple UNIX shell scripts. The processing modules themselves maintain and exploit accurate records of data sizes, detection modes, and calibration information in all dimensions, so that schemes can be constructed without the need to explicitly define or anticipate data sizes or storage details of real and imaginary channels during processing. The asynchronous pipeline scheme provides other substantial advantages, including high flexibility, favorable processing speeds, choice of both all-in-memory and disk-bound processing, easy adaptation to different data formats, simpler software development and maintenance, and the ability to distribute processing tasks on multi-CPU computers and computer networks.
Publication
Journal: Nature Methods
July/16/2008
Abstract
We have mapped and quantified mouse transcriptomes by deeply sequencing them and recording how frequently each gene is represented in the sequence sample (RNA-Seq). This provides a digital measure of the presence and prevalence of transcripts from known and previously unknown genes. We report reference measurements composed of 41-52 million mapped 25-base-pair reads for poly(A)-selected RNA from adult mouse brain, liver and skeletal muscle tissues. We used RNA standards to quantify transcript prevalence and to test the linear range of transcript detection, which spanned five orders of magnitude. Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3' untranslated regions, as well as new candidate microRNA precursors. RNA splice events, which are not readily measured by standard gene expression microarray or serial analysis of gene expression methods, were detected directly by mapping splice-crossing sequence reads. We observed 1.45 x 10(5) distinct splices, and alternative splices were prominent, with 3,500 different genes expressing one or more alternate internal splices.
Publication
Journal: Genetics
September/16/1974
Abstract
Methods are described for the isolation, complementation and mapping of mutants of Caenorhabditis elegans, a small free-living nematode worm. About 300 EMS-induced mutants affecting behavior and morphology have been characterized and about one hundred genes have been defined. Mutations in 77 of these alter the movement of the animal. Estimates of the induced mutation frequency of both the visible mutants and X chromosome lethals suggests that, just as in Drosophila, the genetic units in C. elegans are large.
Publication
Journal: Journal of the National Cancer Institute
February/27/2000
Abstract
Anticancer cytotoxic agents go through a process by which their antitumor activity (on the basis of the amount of tumor shrinkage they could generate) has been investigated. In the late 1970s, the International Union Against Cancer and the World Health Organization introduced specific criteria for the codification of tumor response evaluation. In 1994, several organizations involved in clinical research combined forces to tackle the review of these criteria on the basis of the experience and knowledge acquired since then. After several years of intensive discussions, a new set of guidelines is ready that will supersede the former criteria. In parallel to this initiative, one of the participating groups developed a model by which response rates could be derived from unidimensional measurement of tumor lesions instead of the usual bidimensional approach. This new concept has been largely validated by the Response Evaluation Criteria in Solid Tumors Group and integrated into the present guidelines. This special article also provides some philosophic background to clarify the various purposes of response evaluation. It proposes a model by which a combined assessment of all existing lesions, characterized by target lesions (to be measured) and nontarget lesions, is used to extrapolate an overall response to treatment. Methods of assessing tumor lesions are better codified, briefly within the guidelines and in more detail in Appendix I. All other aspects of response evaluation have been discussed, reviewed, and amended whenever appropriate.
Publication
Journal: Cell
February/9/2005
Abstract
We predict regulatory targets of vertebrate microRNAs (miRNAs) by identifying mRNAs with conserved complementarity to the seed (nucleotides 2-7) of the miRNA. An overrepresentation of conserved adenosines flanking the seed complementary sites in mRNAs indicates that primary sequence determinants can supplement base pairing to specify miRNA target recognition. In a four-genome analysis of 3' UTRs, approximately 13,000 regulatory relationships were detected above the estimate of false-positive predictions, thereby implicating as miRNA targets more than 5300 human genes, which represented 30% of our gene set. Targeting was also detected in open reading frames. In sum, well over one third of human genes appear to be conserved miRNA targets.
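The seed matching described above looks for 3' UTR sites complementary to nucleotides 2-7 of the miRNA. A minimal sketch of 6-mer site finding (the sequences and function name are hypothetical; the conserved flanking adenosine and cross-species conservation filters are not modeled):

```python
def seed_match_sites(mirna, utr):
    """Sketch of miRNA seed-site finding: return positions in a
    3' UTR (5'->3') that match the reverse complement of the
    miRNA seed, nucleotides 2-7."""
    comp = {"A": "U", "U": "A", "G": "C", "C": "G"}
    seed = mirna[1:7]                              # nucleotides 2-7
    # The mRNA site pairs antiparallel, so take the reverse complement
    site = "".join(comp[b] for b in reversed(seed))
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]
```

Each reported position is a candidate target site; in the paper's analysis such sites only count when conserved across genomes.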
Publication
Journal: Nature Biotechnology
August/29/2010
Abstract
High-throughput mRNA sequencing (RNA-Seq) promises simultaneous transcript discovery and abundance estimation. However, this would require algorithms that are not restricted by prior gene annotations and that account for alternative transcription and splicing. Here we introduce such algorithms in an open-source software program called Cufflinks. To test Cufflinks, we sequenced and analyzed >430 million paired 75-bp RNA-Seq reads from a mouse myoblast cell line over a differentiation time series. We detected 13,692 known transcripts and 3,724 previously unannotated ones, 62% of which are supported by independent expression data or by homologous genes in other species. Over the time series, 330 genes showed complete switches in the dominant transcription start site (TSS) or splice isoform, and we observed more subtle shifts in 1,304 other genes. These results suggest that Cufflinks can illuminate the substantial regulatory flexibility and complexity in even this well-studied model of muscle development and that it can improve transcriptome-based genome annotation.
Publication
Journal: Bioinformatics
June/17/2009
Abstract
MOTIVATION
A new protocol for sequencing the messenger RNA in a cell, known as RNA-Seq, generates millions of short sequence fragments in a single run. These fragments, or 'reads', can be used to measure levels of gene expression and to identify novel splice variants of genes. However, current software for aligning RNA-Seq data to a genome relies on known splice junctions and cannot identify novel ones. TopHat is an efficient read-mapping algorithm designed to align reads from an RNA-Seq experiment to a reference genome without relying on known splice sites.
RESULTS
We mapped the RNA-Seq reads from a recent mammalian RNA-Seq experiment and recovered more than 72% of the splice junctions reported by the annotation-based software from that study, along with nearly 20,000 previously unreported junctions. The TopHat pipeline is much faster than previous systems, mapping nearly 2.2 million reads per CPU hour, which is sufficient to process an entire RNA-Seq experiment in less than a day on a standard desktop computer. We describe several challenges unique to ab initio splice site discovery from RNA-Seq reads that will require further algorithm development.
AVAILABILITY
TopHat is free, open-source software available from http://tophat.cbcb.umd.edu.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
Publication
Journal: Biostatistics
October/22/2003
Abstract
In this paper we report exploratory analyses of high-density oligonucleotide array data from the Affymetrix GeneChip system with the objective of improving upon currently used measures of gene expression. Our analyses make use of three data sets: a small experimental study consisting of five MGU74A mouse GeneChip arrays, part of the data from an extensive spike-in study conducted by Gene Logic and Wyeth's Genetics Institute involving 95 HG-U95A human GeneChip arrays; and part of a dilution study conducted by Gene Logic involving 75 HG-U95A GeneChip arrays. We display some familiar features of the perfect match and mismatch probe (PM and MM) values of these data, and examine the variance-mean relationship with probe-level data from probes believed to be defective, and so delivering noise only. We explain why we need to normalize the arrays to one another using probe level intensities. We then examine the behavior of the PM and MM using spike-in data and assess three commonly used summary measures: Affymetrix's (i) average difference (AvDiff) and (ii) MAS 5.0 signal, and (iii) the Li and Wong multiplicative model-based expression index (MBEI). The exploratory data analyses of the probe level data motivate a new summary measure that is a robust multi-array average (RMA) of background-adjusted, normalized, and log-transformed PM values. We evaluate the four expression summary measures using the dilution study data, assessing their behavior in terms of bias, variance and (for MBEI and RMA) model fit. Finally, we evaluate the algorithms in terms of their ability to detect known levels of differential expression using the spike-in data. We conclude that there is no obvious downside to using RMA and attaching a standard error (SE) to this quantity using a linear model which removes probe-specific affinities.
Publication
Journal: Nature
October/16/2012
Abstract
The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.