Publication
Journal: Nucleic Acids Research
April/20/1983
Abstract
We have developed a procedure for preparing extracts from nuclei of human tissue culture cells that directs accurate transcription initiation in vitro from class II promoters. Conditions of extraction and assay have been optimized for maximum activity using the major late promoter of adenovirus 2. The extract also directs accurate transcription initiation from other adenovirus promoters and cellular promoters, as well as from class III promoters (tRNA and Ad 2 VA).
Publication
Journal: Genome Research
June/11/2008
Abstract
We have developed a new set of algorithms, collectively called "Velvet," to manipulate de Bruijn graphs for genomic sequence assembly. A de Bruijn graph is a compact representation based on short words (k-mers) that is ideal for high coverage, very short read (25-50 bp) data sets. Applying Velvet to very short reads and paired-ends information only, one can produce contigs of significant length, up to 50-kb N50 length in simulations of prokaryotic data and 3-kb N50 on simulated mammalian BACs. When applied to real Solexa data sets without read pairs, Velvet generated contigs of approximately 8 kb in a prokaryote and 2 kb in a mammalian BAC, in close agreement with our simulated results without read-pair information. Velvet represents a new approach to assembly that can leverage very short reads in combination with read pairs to produce useful assemblies.
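To illustrate the data structure named above, here is a minimal Python sketch of a generic de Bruijn graph: every k-mer seen in a read becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and a contig is read off by walking the graph while the path is unambiguous. This is a textbook construction, not Velvet's implementation. It assumes error-free toy reads and a hand-picked k, and it leaves out the reverse-complement handling, error removal, and graph simplification that a real assembler performs. The function names are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads: list[str], k: int) -> dict[str, list[str]]:
    """Map each (k-1)-mer to the (k-1)-mers that follow it.

    Every k-mer in every read adds one edge prefix -> suffix, so a repeated
    k-mer adds a repeated edge (edge multiplicity reflects coverage).
    """
    graph: dict[str, list[str]] = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i : i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def extend_contig(graph: dict[str, list[str]], start: str) -> str:
    """Simplified walk: extend from `start` while a node has exactly one
    distinct successor, stopping at branches, dead ends, or revisited nodes."""
    contig, node, seen = start, start, {start}
    while True:
        successors = set(graph.get(node, []))
        if len(successors) != 1:
            break
        (nxt,) = successors
        if nxt in seen:
            break
        contig += nxt[-1]  # each step adds one new base
        seen.add(nxt)
        node = nxt
    return contig


graph = build_de_bruijn(["ATGGCG", "GGCGTG", "CGTGCA"], k=4)
print(extend_contig(graph, "ATG"))  # ATGGCGTGCA
```

Because overlapping reads share k-mers, the three 6 bp reads collapse into one unbranched path that spells the 10 bp sequence they were drawn from. The duplicated edges (for example GGC -> GCG) record the coverage that the abstract identifies as the regime where this representation works well.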
Publication
Journal: Bioinformatics
February/15/2011
Abstract
BACKGROUND
Biological sequence data is accumulating rapidly, motivating the development of improved high-throughput methods for sequence classification.
RESULTS
UBLAST and USEARCH are new algorithms enabling sensitive local and global search of large sequence databases at exceptionally high speeds. They are often orders of magnitude faster than BLAST in practical applications, though sensitivity to distant protein relationships is lower. UCLUST is a new clustering method that exploits USEARCH to assign sequences to clusters. UCLUST offers several advantages over the widely used program CD-HIT, including higher speed, lower memory use, improved sensitivity, clustering at lower identities and classification of much larger datasets.
AVAILABILITY
Binaries are available at no charge for non-commercial use at http://www.drive5.com/usearch.
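The abstract says only that UCLUST uses USEARCH to assign sequences to clusters. The following minimal Python sketch shows the general idea of greedy, centroid-based clustering at a fixed identity threshold: sequences are processed in input order, each one joins the first existing cluster whose centroid it matches at or above the threshold, and otherwise it founds a new cluster. This is not UCLUST's algorithm. The `identity()` helper is a hypothetical stand-in that compares pre-aligned, equal-length sequences position by position, whereas USEARCH performs its own fast search and alignment.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions; assumes equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("toy identity() needs equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs: list[str], threshold: float) -> list[list[str]]:
    """Greedy centroid clustering; clusters[i][0] is the centroid of cluster i."""
    clusters: list[list[str]] = []
    for seq in seqs:
        for cluster in clusters:
            # Compare only against the centroid, not against every member.
            if identity(seq, cluster[0]) >= threshold:
                cluster.append(seq)
                break
        else:  # no centroid was close enough: seq starts a new cluster
            clusters.append([seq])
    return clusters


print(greedy_cluster(["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCCCCC", "AAAAAAAATT"], 0.9))
```

Comparing each sequence only against centroids keeps the number of comparisons low, which is in line with the speed and memory goals described above. The trade-off is that results depend on input order: in the example, `AAAAAAAATT` is 90% identical to the member `AAAAAAAAAT` but only 80% identical to the centroid, so it starts its own cluster.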
Publication
Journal: BMC Genomics
March/31/2008
Abstract
BACKGROUND
The number of available prokaryotic genome sequences is growing steadily, and faster than our ability to annotate them accurately.
METHODS
We describe a fully automated service for annotating bacterial and archaeal genomes. The service identifies protein-encoding, rRNA and tRNA genes, assigns functions to the genes, predicts which subsystems are represented in the genome, uses this information to reconstruct the metabolic network and makes the output easily downloadable for the user. In addition, the annotated genome can be browsed in an environment that supports comparative analysis with the annotated genomes maintained in the SEED environment. The service normally makes the annotated genome available within 12-24 hours of submission, but ultimately the quality of such a service will be judged in terms of accuracy, consistency, and completeness of the produced annotations. We summarize our attempts to address these issues and discuss plans for incrementally enhancing the service.
CONCLUSIONS
By providing accurate, rapid annotation freely to the community we have created an important community resource. The service has now been utilized by over 120 external users annotating over 350 distinct genomes.
Publication
Journal: Bioinformatics
December/10/2006
Abstract
RAxML-VI-HPC (randomized axelerated maximum likelihood for high performance computing) is a sequential and parallel program for inference of large phylogenies with maximum likelihood (ML). Low-level technical optimizations, a modification of the search algorithm, and the use of the GTR+CAT approximation as a replacement for GTR+Gamma yield a program that is between 2.7 and 52 times faster than the previous version of RAxML. A large-scale performance comparison with GARLI, PHYML, IQPNNI and MrBayes on real data containing 1000 to 6722 taxa shows that, on datasets of up to 2500 taxa, RAxML requires at least 5.6 times less main memory than the best competing program (GARLI) and yields better trees in similar run times. On datasets of 4000 taxa or more it also runs 2-3 times faster than GARLI. RAxML has been parallelized with MPI to conduct parallel multiple bootstraps and inferences on distinct starting trees. The program has been used to compute ML trees on two of the largest alignments to date, containing 25,057 taxa (1463 bp) and 2182 taxa (51,089 bp), respectively.
AVAILABILITY
icwww.epfl.ch/~stamatak
Publication
Journal: Journal of Computational Biology
August/26/2012
Abstract
The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V-SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online (http://bioinf.spbau.ru/spades). It is distributed as open source software.
Publication
Journal: Nucleic Acids Research
January/20/2002
Abstract
The Gene Expression Omnibus (GEO) project was initiated in response to the growing demand for a public repository for high-throughput gene expression data. GEO provides a flexible and open design that facilitates submission, storage and retrieval of heterogeneous data sets from high-throughput gene expression and genomic hybridization experiments. GEO is not intended to replace in-house gene expression databases that benefit from coherent data sets and are constructed to facilitate a particular analytic method, but rather to complement them by acting as a tertiary, central data distribution hub. The three central data entities of GEO are platforms, samples and series, and were designed with gene expression and genomic hybridization experiments in mind. A platform is, essentially, a list of probes that define what set of molecules may be detected. A sample describes the set of molecules that are being probed and references a single platform used to generate its molecular abundance data. A series organizes samples into the meaningful data sets which make up an experiment. The GEO repository is publicly accessible through the World Wide Web at http://www.ncbi.nlm.nih.gov/geo.
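The three entities described above map naturally onto a small object model. The Python sketch below shows one way to encode their relationships: a platform lists probes, a sample references exactly one platform and reports abundances only for that platform's probes, and a series groups samples. The class and field names are hypothetical and do not reproduce GEO's actual submission formats. The GPL/GSM/GSE prefixes follow GEO's accession style but are used here purely as illustrative identifiers.

```python
from dataclasses import dataclass, field


@dataclass
class Platform:
    accession: str
    probes: list[str]  # the set of molecules this platform can detect


@dataclass
class Sample:
    accession: str
    platform: Platform  # each sample references a single platform
    abundances: dict[str, float]  # probe ID -> molecular abundance value

    def __post_init__(self) -> None:
        # A sample may only report values for probes defined on its platform.
        unknown = set(self.abundances) - set(self.platform.probes)
        if unknown:
            raise ValueError(
                f"probes not on platform {self.platform.accession}: {sorted(unknown)}"
            )


@dataclass
class Series:
    accession: str
    samples: list[Sample] = field(default_factory=list)  # samples forming one experiment

    def platforms(self) -> set[str]:
        """Accessions of all platforms used by the samples in this series."""
        return {s.platform.accession for s in self.samples}


array = Platform("GPL1", ["probe_a", "probe_b"])
series = Series("GSE1", [Sample("GSM1", array, {"probe_a": 2.0}),
                         Sample("GSM2", array, {"probe_a": 1.5, "probe_b": 0.3})])
print(series.platforms())  # {'GPL1'}
```

Storing a reference to the platform in each sample, rather than copying its probe list, mirrors the abstract's statement that a sample "references a single platform". It also lets the consistency check run when a sample is created.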
Publication
Journal: Nucleic Acids Research
July/6/2015
Abstract
limma is an R/Bioconductor software package that provides an integrated solution for analysing data from gene expression experiments. It contains rich features for handling complex experimental designs and for information borrowing to overcome the problem of small sample sizes. Over the past decade, limma has been a popular choice for gene discovery through differential expression analyses of microarray and high-throughput PCR data. The package contains particularly strong facilities for reading, normalizing and exploring such data. Recently, the capabilities of limma have been significantly expanded in two important directions. First, the package can now perform both differential expression and differential splicing analyses of RNA sequencing (RNA-seq) data. All the downstream analysis tools previously restricted to microarray data are now available for RNA-seq as well. These capabilities allow users to analyse both RNA-seq and microarray data with very similar pipelines. Second, the package is now able to go past the traditional gene-wise expression analyses in a variety of ways, analysing expression profiles in terms of co-regulated sets of genes or in terms of higher-order expression signatures. This provides enhanced possibilities for biological interpretation of gene expression differences. This article reviews the philosophy and design of the limma package, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.
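The "information borrowing" mentioned above is usually described as an empirical Bayes moderated t-statistic. Each gene's residual variance s² (with d degrees of freedom) is shrunk toward a prior variance s0² (with d0 prior degrees of freedom) shared across genes, and the shrunken variance replaces s² in the t-statistic. The Python sketch below computes that statistic for a single gene. It is not limma's R interface: d0 and s0² are passed in as assumed values, whereas limma estimates them from all genes, and `v` is the unscaled variance factor of the coefficient (for example 1/n1 + 1/n2 for a two-group comparison).

```python
import math


def moderated_t(beta: float, s2: float, d: int, v: float,
                d0: float, s0_sq: float) -> tuple[float, float]:
    """Return (moderated t, degrees of freedom) for one gene.

    beta  : estimated effect (e.g. log-fold change)
    s2, d : gene-wise residual variance and its degrees of freedom
    v     : unscaled variance factor for beta
    d0, s0_sq : prior degrees of freedom and prior variance (assumed known here)
    """
    # Posterior variance: a df-weighted average of the prior and the gene's own variance.
    s2_post = (d0 * s0_sq + d * s2) / (d0 + d)
    return beta / math.sqrt(s2_post * v), d0 + d


t_mod, df = moderated_t(beta=1.0, s2=0.01, d=4, v=0.5, d0=4, s0_sq=0.09)
t_ord, _ = moderated_t(beta=1.0, s2=0.01, d=4, v=0.5, d0=0, s0_sq=0.09)
print(round(t_mod, 2), df, round(t_ord, 2))  # 6.32 8 14.14
```

With few replicates, a gene can show a very small variance by chance. Shrinkage pulls it toward the typical value (0.01 rises to 0.05 here), which tempers the inflated ordinary t. The extra prior degrees of freedom then compensate by making the reference t distribution less conservative. Setting d0 = 0 recovers the ordinary t-statistic.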
Publication
Journal: Acta crystallographica. Section D, Biological crystallography
June/23/2010
Abstract
MolProbity is a structure-validation web service that provides broad-spectrum solidly based evaluation of model quality at both the global and local levels for both proteins and nucleic acids. It relies heavily on the power and sensitivity provided by optimized hydrogen placement and all-atom contact analysis, complemented by updated versions of covalent-geometry and torsion-angle criteria. Some of the local corrections can be performed automatically in MolProbity and all of the diagnostics are presented in chart and graphical forms that help guide manual rebuilding. X-ray crystallography provides a wealth of biologically important molecular data in the form of atomic three-dimensional structures of proteins, nucleic acids and increasingly large complexes in multiple forms and states. Advances in automation, in everything from crystallization to data collection to phasing to model building to refinement, have made solving a structure using crystallography easier than ever. However, despite these improvements, local errors that can affect biological interpretation are widespread at low resolution and even high-resolution structures nearly all contain at least a few local errors such as Ramachandran outliers, flipped branched protein side chains and incorrect sugar puckers. It is critical both for the crystallographer and for the end user that there are easy and reliable methods to diagnose and correct these sorts of errors in structures. MolProbity is the authors' contribution to helping solve this problem and this article reviews its general capabilities, reports on recent enhancements and usage, and presents evidence that the resulting improvements are now beneficially affecting the global database.
Publication
Journal: Spatial vision
August/19/1997
Abstract
The Psychophysics Toolbox is a software package that supports visual psychophysics. Its routines provide an interface between a high-level interpreted language (MATLAB on the Macintosh) and the video display hardware. A set of example programs is included with the Toolbox distribution.
Publication
Journal: Journal of Molecular Biology
August/10/1983
Abstract
Factors that affect the probability of genetic transformation of Escherichia coli by plasmids have been evaluated. A set of conditions is described under which about one in every 400 plasmid molecules produces a transformed cell. These conditions include cell growth in medium containing elevated levels of Mg2+, and incubation of the cells at 0 degrees C in a solution of Mn2+, Ca2+, Rb+ or K+, dimethyl sulfoxide, dithiothreitol, and hexamine cobalt (III). Transformation efficiency declines linearly with increasing plasmid size. Relaxed and supercoiled plasmids transform with similar probabilities. Non-transforming DNAs compete in proportion to their mass. No significant variation is observed between competing DNAs of different source, complexity, length or form. Competition with both transforming and non-transforming plasmids indicates that each cell is capable of taking up many DNA molecules, and that the establishment of a transformation event is neither helped nor hindered significantly by the presence of multiple plasmids.
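To translate "one in every 400 plasmid molecules" into the more familiar unit of transformants per microgram, you first need the number of molecules in a given mass of DNA. The Python sketch below does that arithmetic. It assumes an average of about 650 g/mol per base pair of double-stranded DNA and uses a hypothetical 4,400 bp plasmid. It also applies the 1-in-400 ratio as a constant, although the abstract reports that efficiency declines with plasmid size, so the result is only an order-of-magnitude illustration.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
G_PER_MOL_PER_BP = 650.0   # approximate mass of one base pair of dsDNA (assumption)


def plasmid_molecules(mass_ug: float, length_bp: int) -> float:
    """Number of plasmid molecules in `mass_ug` micrograms of DNA."""
    grams = mass_ug * 1e-6
    moles = grams / (length_bp * G_PER_MOL_PER_BP)
    return moles * AVOGADRO


def expected_transformants(mass_ug: float, length_bp: int,
                           molecules_per_transformant: float = 400) -> float:
    """Transformants expected if a fixed fraction of molecules transforms a cell."""
    return plasmid_molecules(mass_ug, length_bp) / molecules_per_transformant


print(f"{plasmid_molecules(1.0, 4400):.3g} molecules/ug")            # 2.11e+11
print(f"{expected_transformants(1.0, 4400):.3g} transformants/ug")   # 5.26e+08
```

Because the number of molecules per microgram falls as plasmid length grows, larger plasmids would yield fewer transformants per microgram even at a constant per-molecule probability. The size-dependent decline reported above would add to that effect.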
Publication
Journal: CA - A Cancer Journal for Clinicians
May/24/2016
Abstract
Each year, the American Cancer Society estimates the numbers of new cancer cases and deaths that will occur in the United States in the current year and compiles the most recent data on cancer incidence, mortality, and survival. Incidence data were collected by the National Cancer Institute (Surveillance, Epidemiology, and End Results [SEER] Program), the Centers for Disease Control and Prevention (National Program of Cancer Registries), and the North American Association of Central Cancer Registries. Mortality data were collected by the National Center for Health Statistics. In 2016, 1,685,210 new cancer cases and 595,690 cancer deaths are projected to occur in the United States. Overall cancer incidence trends (13 oldest SEER registries) are stable in women, but declining by 3.1% per year in men (from 2009-2012), much of which is because of recent rapid declines in prostate cancer diagnoses. The cancer death rate has dropped by 23% since 1991, translating to more than 1.7 million deaths averted through 2012. Despite this progress, death rates are increasing for cancers of the liver, pancreas, and uterine corpus, and cancer is now the leading cause of death in 21 states, primarily due to exceptionally large reductions in death from heart disease. Among children and adolescents (aged birth-19 years), brain cancer has surpassed leukemia as the leading cause of cancer death because of the dramatic therapeutic advances against leukemia. Accelerating progress against cancer requires both increased national investment in cancer research and the application of existing cancer control knowledge across all segments of the population.
Publication
Journal: Acta crystallographica. Section A, Foundations of crystallography
June/6/1991
Abstract
Map interpretation remains a critical step in solving the structure of a macromolecule. Errors introduced at this early stage may persist throughout crystallographic refinement and result in an incorrect structure. The normally quoted crystallographic residual is often a poor description for the quality of the model. Strategies and tools are described that help to alleviate this problem. These simplify the model-building process, quantify the goodness of fit of the model on a per-residue basis and locate possible errors in peptide and side-chain conformations.
Publication
Journal: New England Journal of Medicine
February/19/2002
Abstract
BACKGROUND
Type 2 diabetes affects approximately 8 percent of adults in the United States. Some risk factors (elevated plasma glucose concentrations in the fasting state and after an oral glucose load, overweight, and a sedentary lifestyle) are potentially reversible. We hypothesized that modifying these factors with a lifestyle-intervention program or the administration of metformin would prevent or delay the development of diabetes.
METHODS
We randomly assigned 3234 nondiabetic persons with elevated fasting and post-load plasma glucose concentrations to placebo, metformin (850 mg twice daily), or a lifestyle-modification program with the goals of at least a 7 percent weight loss and at least 150 minutes of physical activity per week. The mean age of the participants was 51 years, and the mean body-mass index (the weight in kilograms divided by the square of the height in meters) was 34.0; 68 percent were women, and 45 percent were members of minority groups.
RESULTS
The average follow-up was 2.8 years. The incidence of diabetes was 11.0, 7.8, and 4.8 cases per 100 person-years in the placebo, metformin, and lifestyle groups, respectively. The lifestyle intervention reduced the incidence by 58 percent (95 percent confidence interval, 48 to 66 percent) and metformin by 31 percent (95 percent confidence interval, 17 to 43 percent), as compared with placebo; the lifestyle intervention was significantly more effective than metformin. To prevent one case of diabetes during a period of three years, 6.9 persons would have to participate in the lifestyle-intervention program, and 13.9 would have to receive metformin.
CONCLUSIONS
Lifestyle changes and treatment with metformin both reduced the incidence of diabetes in persons at high risk. The lifestyle intervention was more effective than metformin.
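The "persons who would have to participate" figures above are numbers needed to treat (NNT). NNT is the reciprocal of the absolute risk reduction (ARR) over the same period, so the reported values can be inverted to recover the implied 3-year ARR. The Python sketch below does that and also computes crude relative reductions directly from the incidence rates. The crude values (about 56% and 29%) are close to, but not identical with, the reported 58% and 31%. The abstract does not say how those were estimated, so the sketch should not be read as reproducing the trial's analysis.

```python
def implied_absolute_risk_reduction(nnt: float) -> float:
    """NNT = 1 / ARR over a fixed period, so ARR = 1 / NNT."""
    return 1.0 / nnt


def crude_rate_reduction(rate_control: float, rate_treated: float) -> float:
    """1 - (treated incidence rate / control incidence rate)."""
    return 1.0 - rate_treated / rate_control


for arm, nnt, rate in [("lifestyle", 6.9, 4.8), ("metformin", 13.9, 7.8)]:
    arr = implied_absolute_risk_reduction(nnt)
    crude = crude_rate_reduction(11.0, rate)  # placebo: 11.0 cases per 100 person-years
    print(f"{arm}: implied 3-year ARR {arr:.1%}, crude rate reduction {crude:.1%}")
# lifestyle: implied 3-year ARR 14.5%, crude rate reduction 56.4%
# metformin: implied 3-year ARR 7.2%, crude rate reduction 29.1%
```

Reading the NNTs this way makes the comparison concrete: over three years, the lifestyle program prevented roughly twice as many cases per participant as metformin.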
Publication
Journal: Nature
January/22/2003
Abstract
Recent data have expanded the concept that inflammation is a critical component of tumour progression. Many cancers arise from sites of infection, chronic irritation and inflammation. It is now becoming clear that the tumour microenvironment, which is largely orchestrated by inflammatory cells, is an indispensable participant in the neoplastic process, fostering proliferation, survival and migration. In addition, tumour cells have co-opted some of the signalling molecules of the innate immune system, such as selectins, chemokines and their receptors for invasion, migration and metastasis. These insights are fostering new anti-inflammatory therapeutic approaches to cancer development.
Publication
Journal: Genome Research
July/26/2004
Abstract
WebLogo generates sequence logos, graphical representations of the patterns within a multiple sequence alignment. Sequence logos provide a richer and more precise description of sequence similarity than consensus sequences and can rapidly reveal significant features of the alignment otherwise difficult to perceive. Each logo consists of stacks of letters, one stack for each position in the sequence. The overall height of each stack indicates the sequence conservation at that position (measured in bits), whereas the height of symbols within the stack reflects the relative frequency of the corresponding amino or nucleic acid at that position. WebLogo has been enhanced recently with additional features and options, to provide a convenient and highly configurable sequence logo generator. A command line interface and the complete, open WebLogo source code are available for local installation and customization.
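The stack heights described above can be computed from an alignment column. Total height is the column's information content in bits: log2 of the alphabet size minus the observed Shannon entropy. Each letter then gets a share of that height equal to its relative frequency. The Python sketch below implements this usual definition for a single column. It is not WebLogo's code: it omits any small-sample correction and gap handling, and it assumes a nucleotide alphabet by default (pass 20 for proteins).

```python
import math
from collections import Counter


def column_heights(column: str, alphabet_size: int = 4) -> dict[str, float]:
    """Letter heights in bits for one alignment column (no small-sample correction)."""
    counts = Counter(column)
    n = len(column)
    freqs = {sym: c / n for sym, c in counts.items()}
    entropy = -sum(f * math.log2(f) for f in freqs.values())   # observed uncertainty
    conservation = math.log2(alphabet_size) - entropy           # total stack height
    return {sym: f * conservation for sym, f in freqs.items()}  # split by frequency


for col in ["AAAA", "AAAC", "ACGT"]:
    heights = column_heights(col)
    print(col, {s: round(h, 3) for s, h in heights.items()})
# AAAA {'A': 2.0}
# AAAC {'A': 0.892, 'C': 0.297}
# ACGT {'A': 0.0, 'C': 0.0, 'G': 0.0, 'T': 0.0}
```

A fully conserved DNA column reaches the 2-bit maximum, and a column with all four bases equally frequent has zero height. This is the extra information a logo conveys over a consensus sequence, which would print a single letter for both "AAAA" and "AAAC".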
Publication
Journal: Journal of Molecular Biology
January/11/1994
Abstract
We describe a comparative protein modelling method designed to find the most probable structure for a sequence given its alignment with related structures. The three-dimensional (3D) model is obtained by optimally satisfying spatial restraints derived from the alignment and expressed as probability density functions (pdfs) for the features restrained. For example, the probabilities for main-chain conformations of a modelled residue may be restrained by its residue type, main-chain conformation of an equivalent residue in a related protein, and the local similarity between the two sequences. Several such pdfs are obtained from the correlations between structural features in 17 families of homologous proteins which have been aligned on the basis of their 3D structures. The pdfs restrain Cα-Cα distances, main-chain N-O distances, main-chain and side-chain dihedral angles. A smoothing procedure is used in the derivation of these relationships to minimize the problem of a sparse database. The 3D model of a protein is obtained by optimization of the molecular pdf such that the model violates the input restraints as little as possible. The molecular pdf is derived as a combination of pdfs restraining individual spatial features of the whole molecule. The optimization procedure is a variable target function method that applies the conjugate gradients algorithm to positions of all non-hydrogen atoms. The method is automated and is illustrated by the modelling of trypsin from two other serine proteinases.
Publication
Journal: Technical Report Series - World Health Organization, Geneva
June/6/2001
Abstract
Overweight and obesity represent a rapidly growing threat to the health of populations in an increasing number of countries. Indeed they are now so common that they are replacing more traditional problems such as undernutrition and infectious diseases as the most significant causes of ill-health. Obesity comorbidities include coronary heart disease, hypertension and stroke, certain types of cancer, non-insulin-dependent diabetes mellitus, gallbladder disease, dyslipidaemia, osteoarthritis and gout, and pulmonary diseases, including sleep apnoea. In addition, the obese suffer from social bias, prejudice and discrimination, on the part not only of the general public but also of health professionals, and this may make them reluctant to seek medical assistance. WHO therefore convened a Consultation on obesity to review current epidemiological information, contributing factors and associated consequences, and this report presents its conclusions and recommendations. In particular, the Consultation considered the system for classifying overweight and obesity based on the body mass index, and concluded that a coherent system is now available and should be adopted internationally. The Consultation also concluded that the fundamental causes of the obesity epidemic are sedentary lifestyles and high-fat energy-dense diets, both resulting from the profound changes taking place in society and the behavioural patterns of communities as a consequence of increased urbanization and industrialization and the disappearance of traditional lifestyles. A reduction in fat intake to around 20-25% of energy is necessary to minimize energy imbalance and weight gain in sedentary individuals. While there is strong evidence that certain genes have an influence on body mass and body fat, most do not qualify as necessary genes, i.e. genes that cause obesity whenever two copies of the defective allele are present; it is likely to be many years before the results of genetic research can be applied to the problem. Methods for the treatment of obesity are described, including dietary management, physical activity and exercise, and antiobesity drugs, with gastrointestinal surgery being reserved for extreme cases.
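The classification system mentioned above is based on the body mass index. Elsewhere in this listing, BMI is defined as weight in kilograms divided by the square of height in metres. The Python sketch below computes BMI and assigns a weight class. This abstract does not list the class boundaries, so the sketch assumes the widely used WHO adult cut-offs (18.5, 25, 30, 35 and 40 kg/m²). Those thresholds apply to adults only, and the function names are hypothetical.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by the square of height in metres."""
    return weight_kg / height_m ** 2


# Assumed WHO adult cut-offs as (inclusive lower bound, label), checked from highest down.
WHO_ADULT_CLASSES = [
    (40.0, "obese class III"),
    (35.0, "obese class II"),
    (30.0, "obese class I"),
    (25.0, "pre-obese"),
    (18.5, "normal range"),
]


def classify(bmi_value: float) -> str:
    """Return the weight class for an adult BMI value."""
    for lower, label in WHO_ADULT_CLASSES:
        if bmi_value >= lower:
            return label
    return "underweight"


for weight, height in [(70, 1.75), (100, 1.70), (50, 1.80)]:
    value = bmi(weight, height)
    print(f"{weight} kg, {height} m -> BMI {value:.1f}: {classify(value)}")
# 70 kg, 1.75 m -> BMI 22.9: normal range
# 100 kg, 1.7 m -> BMI 34.6: obese class I
# 50 kg, 1.8 m -> BMI 15.4: underweight
```

Checking thresholds from the highest down means each lower bound is tested only once, and values below 18.5 fall through to "underweight". Under these cut-offs, "overweight" corresponds to any BMI of 25 or more, which includes all of the obese classes.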
Publication
Journal: Nature Genetics
September/10/2006
Abstract
Population stratification (allele frequency differences between cases and controls due to systematic ancestry differences) can cause spurious associations in disease studies. We describe a method that enables explicit detection and correction of population stratification on a genome-wide scale. Our method uses principal components analysis to explicitly model ancestry differences between cases and controls. The resulting correction is specific to a candidate marker's variation in frequency across ancestral populations, minimizing spurious associations while maximizing power to detect true associations. Our simple, efficient approach can easily be applied to disease studies with hundreds of thousands of markers.
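The correction described above can be sketched in a few steps. First, find the top axes of genetic variation across individuals. Next, remove each marker's and the phenotype's component along those axes. Finally, test association on what remains. The Python/NumPy sketch below follows that outline under simplifying assumptions. It only centres genotypes (real implementations also rescale each marker), and the statistic (N - K - 1) × r² is an assumed trend-test form, not taken from the abstract. Treat it as an illustration, not the authors' software.

```python
import numpy as np


def top_axes(genotypes: np.ndarray, k: int) -> np.ndarray:
    """Top-k axes of variation over individuals (rows) of a 0/1/2 genotype matrix."""
    centered = genotypes - genotypes.mean(axis=0)
    u, _, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k]  # individuals x k, orthonormal columns


def adjust(values: np.ndarray, axes: np.ndarray) -> np.ndarray:
    """Centre each column and subtract its projection onto the ancestry axes."""
    centered = values - values.mean(axis=0)
    return centered - axes @ (axes.T @ centered)


def corrected_chi2(genotypes: np.ndarray, phenotype: np.ndarray, k: int) -> np.ndarray:
    """Per-marker (N - K - 1) * r^2 between adjusted genotypes and adjusted phenotype."""
    g_raw = genotypes.astype(float)
    axes = top_axes(g_raw, k)
    g = adjust(g_raw, axes)
    y = adjust(phenotype.astype(float).reshape(-1, 1), axes)[:, 0]
    r = (g.T @ y) / (np.linalg.norm(g, axis=0) * np.linalg.norm(y))
    return (genotypes.shape[0] - k - 1) * r ** 2


# Simulated example: two populations with different allele frequencies,
# and cases over-represented in population B. The last marker is strongly
# differentiated between populations but has no effect within either one.
rng = np.random.default_rng(0)
pa = np.append(rng.uniform(0.1, 0.9, 200), 0.1)
pb = np.append(rng.uniform(0.1, 0.9, 200), 0.9)
G = np.vstack([rng.binomial(2, pa, size=(100, 201)), rng.binomial(2, pb, size=(100, 201))])
pop = np.r_[np.zeros(100), np.ones(100)]
y = rng.binomial(1, np.where(pop == 1, 0.8, 0.2))

print("uncorrected:", corrected_chi2(G, y, k=0)[-1])
print("corrected:  ", corrected_chi2(G, y, k=1)[-1])
```

In the simulation, the differentiated marker looks strongly associated before correction only because it tracks ancestry. After the phenotype and the marker are both adjusted along the first axis, its statistic is expected to fall to roughly the chi-square(1) null range. This is the spurious-association scenario the abstract describes.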
Publication
Journal: The Lancet
September/30/1998
Abstract
BACKGROUND
Improved blood-glucose control decreases the progression of diabetic microvascular disease, but the effect on macrovascular complications is unknown. There is concern that sulphonylureas may increase cardiovascular mortality in patients with type 2 diabetes and that high insulin concentrations may enhance atheroma formation. We compared the effects of intensive blood-glucose control with either sulphonylurea or insulin and conventional treatment on the risk of microvascular and macrovascular complications in patients with type 2 diabetes in a randomised controlled trial.
METHODS
3867 newly diagnosed patients with type 2 diabetes, median age 54 years (IQR 48-60 years), who after 3 months' diet treatment had a mean of two fasting plasma glucose (FPG) concentrations of 6.1-15.0 mmol/L were randomly assigned intensive policy with a sulphonylurea (chlorpropamide, glibenclamide, or glipizide) or with insulin, or conventional policy with diet. The aim in the intensive group was FPG less than 6 mmol/L. In the conventional group, the aim was the best achievable FPG with diet alone; drugs were added only if there were hyperglycaemic symptoms or FPG greater than 15 mmol/L. Three aggregate endpoints were used to assess differences between conventional and intensive treatment: any diabetes-related endpoint (sudden death, death from hyperglycaemia or hypoglycaemia, fatal or non-fatal myocardial infarction, angina, heart failure, stroke, renal failure, amputation [of at least one digit], vitreous haemorrhage, retinopathy requiring photocoagulation, blindness in one eye, or cataract extraction); diabetes-related death (death from myocardial infarction, stroke, peripheral vascular disease, renal disease, hyperglycaemia or hypoglycaemia, and sudden death); all-cause mortality. Single clinical endpoints and surrogate subclinical endpoints were also assessed. All analyses were by intention to treat and frequency of hypoglycaemia was also analysed by actual therapy.
RESULTS
Over 10 years, haemoglobin A1c (HbA1c) was 7.0% (6.2-8.2) in the intensive group compared with 7.9% (6.9-8.8) in the conventional group, an 11% reduction. There was no difference in HbA1c among agents in the intensive group. Compared with the conventional group, the risk in the intensive group was 12% lower (95% CI 1-21, p=0.029) for any diabetes-related endpoint; 10% lower (-11 to 27, p=0.34) for any diabetes-related death; and 6% lower (-10 to 20, p=0.44) for all-cause mortality. Most of the risk reduction in the any diabetes-related aggregate endpoint was due to a 25% risk reduction (7-40, p=0.0099) in microvascular endpoints, including the need for retinal photocoagulation. There was no difference for any of the three aggregate endpoints between the three intensive agents (chlorpropamide, glibenclamide, or insulin). Patients in the intensive group had more hypoglycaemic episodes than those in the conventional group on both types of analysis (both p<0.0001). The rates of major hypoglycaemic episodes per year were 0.7% with conventional treatment, 1.0% with chlorpropamide, 1.4% with glibenclamide, and 1.8% with insulin. Weight gain was significantly higher in the intensive group (mean 2.9 kg) than in the conventional group (p<0.001), and patients assigned insulin had a greater gain in weight (4.0 kg) than those assigned chlorpropamide (2.6 kg) or glibenclamide (1.7 kg).
CONCLUSIONS
Intensive blood-glucose control by either sulphonylureas or insulin substantially decreases the risk of microvascular complications, but not macrovascular disease, in patients with type 2 diabetes. (ABSTRACT TRUNCATED)
Publication
Journal: Science
June/3/2009
Abstract
In contrast to normal differentiated cells, which rely primarily on mitochondrial oxidative phosphorylation to generate the energy needed for cellular processes, most cancer cells instead rely on aerobic glycolysis, a phenomenon termed "the Warburg effect." Aerobic glycolysis is an inefficient way to generate adenosine 5'-triphosphate (ATP), however, and the advantage it confers to cancer cells has been unclear. Here we propose that the metabolism of cancer cells, and indeed all proliferating cells, is adapted to facilitate the uptake and incorporation of nutrients into the biomass (e.g., nucleotides, amino acids, and lipids) needed to produce a new cell. Supporting this idea are recent studies showing that (i) several signaling pathways implicated in cell proliferation also regulate metabolic pathways that incorporate nutrients into biomass; and that (ii) certain cancer-associated mutations enable cancer cells to acquire and metabolize nutrients in a manner conducive to proliferation rather than efficient ATP production. A better understanding of the mechanistic links between cellular metabolism and growth control may ultimately lead to better treatments for human cancer.
Publication
Journal: Nature
June/28/2007
Abstract
There is increasing evidence that genome-wide association (GWA) studies represent a powerful approach to the identification of genes involved in common human diseases. We describe a joint GWA study (using the Affymetrix GeneChip 500K Mapping Array Set) undertaken in the British population, which has examined approximately 2,000 individuals for each of 7 major diseases and a shared set of approximately 3,000 controls. Case-control comparisons identified 24 independent association signals at P < 5 × 10^-7: 1 in bipolar disorder, 1 in coronary artery disease, 9 in Crohn's disease, 3 in rheumatoid arthritis, 7 in type 1 diabetes and 3 in type 2 diabetes. On the basis of prior findings and replication studies thus far completed, almost all of these signals reflect genuine susceptibility effects. We observed association at many previously identified loci, and found compelling evidence that some loci confer risk for more than one of the diseases studied. Across all diseases, we identified a large number of further signals (including 58 loci with single-point P values between 10^-5 and 5 × 10^-7) likely to yield additional susceptibility loci. The importance of appropriately large samples was confirmed by the modest effect sizes observed at most loci identified. This study thus represents a thorough validation of the GWA approach. It has also demonstrated that careful use of a shared control group represents a safe and effective approach to GWA analyses of multiple disease phenotypes; has generated a genome-wide genotype database for future studies of common diseases in the British population; and shown that, provided individuals with non-European ancestry are excluded, the extent of population stratification in the British population is generally modest. Our findings offer new avenues for exploring the pathophysiology of these important disorders.
We anticipate that our data, results and software, which will be widely available to other investigators, will provide a powerful resource for human genetics research.
Publication
Journal: Genome Biology
July/21/2003
Abstract
BACKGROUND
Functional annotation of differentially expressed genes is a necessary and critical step in the analysis of microarray data. The distributed nature of biological knowledge frequently requires researchers to navigate through numerous web-accessible databases gathering information one gene at a time. A more judicious approach is to provide query-based access to an integrated database that disseminates biologically rich information across large datasets and displays graphic summaries of functional information.
RESULTS
Database for Annotation, Visualization, and Integrated Discovery (DAVID; http://www.david.niaid.nih.gov) addresses this need via four web-based analysis modules: 1) Annotation Tool - rapidly appends descriptive data from several public databases to lists of genes; 2) GoCharts - assigns genes to Gene Ontology functional categories based on user selected classifications and term specificity level; 3) KeggCharts - assigns genes to KEGG metabolic processes and enables users to view genes in the context of biochemical pathway maps; and 4) DomainCharts - groups genes according to PFAM conserved protein domains.
CONCLUSIONS
Analysis results and graphical displays remain dynamically linked to primary data and external data repositories, thereby furnishing in-depth as well as broad-based data coverage. The functionality provided by DAVID accelerates the analysis of genome-scale datasets by facilitating the transition from data collection to biological meaning.
Publication
Journal: Journal of Molecular Biology
February/14/2001
Abstract
We describe and validate a new membrane protein topology prediction method, TMHMM, based on a hidden Markov model. We present a detailed analysis of TMHMM's performance, and show that it correctly predicts 97-98% of the transmembrane helices. Additionally, TMHMM can discriminate between soluble and membrane proteins with both specificity and sensitivity better than 99%, although the accuracy drops when signal peptides are present. This high degree of accuracy allowed us to reliably predict integral membrane proteins in a large collection of genomes. Based on these predictions, we estimate that 20-30% of all genes in most genomes encode membrane proteins, which is in agreement with previous estimates. We further discovered that proteins with N(in)-C(in) topologies are strongly preferred in all examined organisms, except Caenorhabditis elegans, where the large number of 7TM receptors increases the counts for N(out)-C(in) topologies. We discuss the possible relevance of this finding for our understanding of membrane protein assembly mechanisms. A TMHMM prediction service is available at http://www.cbs.dtu.dk/services/TMHMM/.
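To show how a hidden Markov model turns a protein sequence into a topology prediction, here is a deliberately tiny Python sketch. It uses two states (loop and transmembrane helix), emits only "hydrophobic" or "polar" residue classes, and decodes the most probable state path with the Viterbi algorithm. It is not TMHMM's model: TMHMM has a much richer state architecture, emits all 20 amino acids, and distinguishes inside from outside loops. The hydrophobic residue set and all probabilities below are hypothetical values chosen for illustration.

```python
import math

HYDROPHOBIC = set("AILMFVWC")  # simplified residue classification (assumption)
STATES = ("loop", "helix")
START = {"loop": 0.9, "helix": 0.1}
TRANS = {"loop": {"loop": 0.9, "helix": 0.1},
         "helix": {"loop": 0.1, "helix": 0.9}}
EMIT = {"loop": {"h": 0.3, "p": 0.7},    # P(residue class | state), hypothetical
        "helix": {"h": 0.9, "p": 0.1}}


def viterbi(sequence: str) -> str:
    """Most probable path as one character per residue: 'o' = loop, 'M' = helix."""
    if not sequence:
        return ""
    obs = ["h" if aa in HYDROPHOBIC else "p" for aa in sequence.upper()]
    # Log-probability of the best path ending in each state after the first residue.
    score = {s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}
    back: list[dict[str, str]] = []  # best predecessor of each state, per position
    for o in obs[1:]:
        prev, score, pointers = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda r: prev[r] + math.log(TRANS[r][s]))
            score[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][o])
            pointers[s] = best
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join("M" if s == "helix" else "o" for s in reversed(path))


protein = "KDEKDEKDEK" + "L" * 20 + "KDEKDEKDEK"
print(viterbi(protein))  # oooooooooo + 20 x M + oooooooooo
```

The low probability of switching states keeps the model from calling a helix at every isolated hydrophobic residue. A sufficiently long hydrophobic stretch, however, gains enough emission probability to pay for entering and leaving the helix state. Adding separate inside and outside loop states to this kind of model is what lets a full predictor report N(in)/N(out) topologies.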