Citations
Publication
Journal: Journal of Molecular Biology
October/31/2007
Abstract
We discuss basic physical-chemical principles underlying the formation of stable macromolecular complexes, which in many cases are likely to be the biological units performing a certain physiological function. We also consider available theoretical approaches to the calculation of macromolecular affinity and entropy of complexation. The latter is shown to play an important role and to have a major effect on complex size and symmetry. We develop a new method, based on chemical thermodynamics, for automatic detection of macromolecular assemblies in the Protein Data Bank (PDB) entries that are the results of X-ray diffraction experiments. We find that biological units can be recovered at an 80-90% success rate, which makes X-ray crystallography an important source of experimental data on macromolecular complexes and protein-protein interactions. The method is implemented as a public WWW service.
Publication
Journal: BMC Bioinformatics
October/31/2011
Abstract
BACKGROUND
RNA-Seq is revolutionizing the way transcript abundances are measured. A key challenge in transcript quantification from RNA-Seq data is the handling of reads that map to multiple genes or isoforms. This issue is particularly important for quantification with de novo transcriptome assemblies in the absence of sequenced genomes, as it is difficult to determine which transcripts are isoforms of the same gene. A second significant issue is the design of RNA-Seq experiments, in terms of the number of reads, read length, and whether reads come from one or both ends of cDNA fragments.
RESULTS
We present RSEM, a user-friendly software package for quantifying gene and isoform abundances from single-end or paired-end RNA-Seq data. RSEM outputs abundance estimates, 95% credibility intervals, and visualization files and can also simulate RNA-Seq data. In contrast to other existing tools, the software does not require a reference genome. Thus, in combination with a de novo transcriptome assembler, RSEM enables accurate transcript quantification for species without sequenced genomes. On simulated and real data sets, RSEM has superior or comparable performance to quantification methods that rely on a reference genome. Taking advantage of RSEM's ability to effectively use ambiguously-mapping reads, we show that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads. On the other hand, estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene.
CONCLUSIONS
RSEM is an accurate and user-friendly software tool for quantifying transcript abundances from RNA-Seq data. As it does not rely on the existence of a reference genome, it is particularly useful for quantification with de novo transcriptome assemblies. In addition, RSEM has enabled valuable guidance for cost-efficient design of quantification experiments with RNA-Seq, which is currently relatively expensive.
Publication
Journal: Journals of Gerontology - Series A Biological Sciences and Medical Sciences
March/28/2001
Abstract
BACKGROUND
Frailty is considered highly prevalent in old age and to confer high risk for falls, disability, hospitalization, and mortality. Frailty has been considered synonymous with disability, comorbidity, and other characteristics, but it is recognized that it may have a biologic basis and be a distinct clinical syndrome. A standardized definition has not yet been established.
METHODS
To develop and operationalize a phenotype of frailty in older adults and assess concurrent and predictive validity, the study used data from the Cardiovascular Health Study. Participants were 5,317 men and women 65 years and older (4,735 from an original cohort recruited in 1989-90 and 582 from an African American cohort recruited in 1992-93). Both cohorts received almost identical baseline evaluations and 7 and 4 years of follow-up, respectively, with annual examinations and surveillance for outcomes including incident disease, hospitalization, falls, disability, and mortality.
RESULTS
Frailty was defined as a clinical syndrome in which three or more of the following criteria were present: unintentional weight loss (10 lbs in past year), self-reported exhaustion, weakness (grip strength), slow walking speed, and low physical activity. The overall prevalence of frailty in this community-dwelling population was 6.9%; it increased with age and was greater in women than men. Four-year incidence was 7.2%. Frailty was associated with being African American, having lower education and income, poorer health, and having higher rates of comorbid chronic diseases and disability. There was overlap, but not concordance, in the cooccurrence of frailty, comorbidity, and disability. This frailty phenotype was independently predictive (over 3 years) of incident falls, worsening mobility or ADL disability, hospitalization, and death, with hazard ratios ranging from 1.82 to 4.46, unadjusted, and 1.29-2.24, adjusted for a number of health, disease, and social characteristics predictive of 5-year mortality. Intermediate frailty status, as indicated by the presence of one or two criteria, showed intermediate risk of these outcomes as well as increased risk of becoming frail over 3-4 years of follow-up (odds ratios for incident frailty = 4.51 unadjusted and 2.63 adjusted for covariates, compared to those with no frailty criteria at baseline).
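The counting rule in these results can be sketched in a few lines. This is only an illustration of the rule as stated in the abstract (three or more criteria means frail, one or two means intermediate); the class and function names are hypothetical, and the measurement cut-offs behind each criterion are not given here, so each criterion is taken as an already-assessed yes/no value.

```python
from dataclasses import dataclass, astuple


@dataclass
class FrailtyCriteria:
    """Five yes/no criteria named in the abstract (field names are illustrative)."""
    unintentional_weight_loss: bool  # 10 lbs in past year
    self_reported_exhaustion: bool
    weakness: bool                   # assessed by grip strength
    slow_walking_speed: bool
    low_physical_activity: bool


def classify_frailty(criteria: FrailtyCriteria) -> str:
    """Apply the counting rule: >=3 criteria frail, 1-2 intermediate, 0 not frail."""
    count = sum(astuple(criteria))
    if count >= 3:
        return "frail"
    if count >= 1:
        return "intermediate"
    return "not frail"


if __name__ == "__main__":
    example = FrailtyCriteria(True, False, True, False, False)
    print(classify_frailty(example))
```

Keeping the threshold logic separate from the measurement of each criterion means the same rule can be applied regardless of how a study operationalizes grip strength or walking speed.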
CONCLUSIONS
This study provides a potential standardized definition for frailty in community-dwelling older adults and offers concurrent and predictive validity for the definition. It also finds that there is an intermediate stage identifying those at high risk of frailty. Finally, it provides evidence that frailty is not synonymous with either comorbidity or disability, but comorbidity is an etiologic risk factor for, and disability is an outcome of, frailty. This provides a potential basis for clinical assessment for those who are frail or at risk, and for future research to develop interventions for frailty based on a standardized ascertainment of frailty.
Publication
Journal: Nature
February/4/2020
Abstract
Since the SARS outbreak 18 years ago, a large number of severe acute respiratory syndrome-related coronaviruses (SARSr-CoV) have been discovered in their natural reservoir host, bats [1-4]. Previous studies indicated that some of those bat SARSr-CoVs have the potential to infect humans [5-7]. Here we report the identification and characterization of a novel coronavirus (2019-nCoV) which caused an epidemic of acute respiratory syndrome in humans in Wuhan, China. The epidemic, which started from 12 December 2019, has caused 2,050 laboratory-confirmed infections with 56 fatal cases by 26 January 2020. Full-length genome sequences were obtained from five patients at the early stage of the outbreak. They are almost identical to each other and share 79.5% sequence identity to SARS-CoV. Furthermore, it was found that 2019-nCoV is 96% identical at the whole-genome level to a bat coronavirus. The pairwise protein sequence analysis of seven conserved non-structural proteins shows that this virus belongs to the species of SARSr-CoV. The 2019-nCoV virus was then isolated from the bronchoalveolar lavage fluid of a critically ill patient, and it can be neutralized by sera from several patients. Importantly, we have confirmed that this novel CoV uses the same cell entry receptor, ACE2, as SARS-CoV.
Publication
Journal: Nature
December/6/2001
Abstract
Stem cell biology has come of age. Unequivocal proof that stem cells exist in the haematopoietic system has given way to the prospective isolation of several tissue-specific stem and progenitor cells, the initial delineation of their properties and expressed genetic programmes, and the beginnings of their utility in regenerative medicine. Perhaps the most important and useful property of stem cells is that of self-renewal. Through this property, striking parallels can be found between stem cells and cancer cells: tumours may often originate from the transformation of normal stem cells, similar signalling pathways may regulate self-renewal in stem cells and cancer cells, and cancer cells may include 'cancer stem cells' - rare cells with indefinite potential for self-renewal that drive tumorigenesis.
Publication
Journal: Nucleic Acids Research
October/12/2010
Abstract
High-throughput sequencing platforms are generating massive amounts of genetic variation data for diverse genomes, but it remains a challenge to pinpoint a small subset of functionally important variants. To fill these unmet needs, we developed the ANNOVAR tool to annotate single nucleotide variants (SNVs) and insertions/deletions, such as examining their functional consequence on genes, inferring cytogenetic bands, reporting functional importance scores, finding variants in conserved regions, or identifying variants reported in the 1000 Genomes Project and dbSNP. ANNOVAR can utilize annotation databases from the UCSC Genome Browser or any annotation data set conforming to Generic Feature Format version 3 (GFF3). We also illustrate a 'variants reduction' protocol on 4.7 million SNVs and indels from a human genome, including two causal mutations for Miller syndrome, a rare recessive disease. Through a stepwise procedure, we excluded variants that are unlikely to be causal, and identified 20 candidate genes including the causal gene. Using a desktop computer, ANNOVAR requires ∼4 min to perform gene-based annotation and ∼15 min to perform variants reduction on 4.7 million variants, making it practical to handle hundreds of human genomes in a day. ANNOVAR is freely available at http://www.openbioinformatics.org/annovar/.
Publication
Journal: BMC Bioinformatics
March/17/2009
Abstract
BACKGROUND
Correlation networks are increasingly being used in bioinformatics applications. For example, weighted gene co-expression network analysis is a systems biology method for describing the correlation patterns among genes across microarray samples. Weighted correlation network analysis (WGCNA) can be used for finding clusters (modules) of highly correlated genes, for summarizing such clusters using the module eigengene or an intramodular hub gene, for relating modules to one another and to external sample traits (using eigengene network methodology), and for calculating module membership measures. Correlation networks facilitate network based gene screening methods that can be used to identify candidate biomarkers or therapeutic targets. These methods have been successfully applied in various biological contexts, e.g. cancer, mouse genetics, yeast genetics, and analysis of brain imaging data. While parts of the correlation network methodology have been described in separate publications, there is a need to provide a user-friendly, comprehensive, and consistent software implementation and an accompanying tutorial.
RESULTS
The WGCNA R software package is a comprehensive collection of R functions for performing various aspects of weighted correlation network analysis. The package includes functions for network construction, module detection, gene selection, calculations of topological properties, data simulation, visualization, and interfacing with external software. Along with the R package we also present R software tutorials. While the methods development was motivated by gene expression data, the underlying data mining approach can be applied to a variety of different settings.
CONCLUSIONS
The WGCNA package provides R functions for weighted correlation network analysis, e.g. co-expression network analysis of gene expression data. The R package along with its source code and additional material are freely available at http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/Rpackages/WGCNA.
Publication
Journal: Nature
June/16/1998
Abstract
Networks of coupled dynamical systems have been used to model biological oscillators, Josephson junction arrays, excitable media, neural networks, spatial games, genetic control networks and many other self-organizing systems. Ordinarily, the connection topology is assumed to be either completely regular or completely random. But many biological, technological and social networks lie somewhere between these two extremes. Here we explore simple models of networks that can be tuned through this middle ground: regular networks 'rewired' to introduce increasing amounts of disorder. We find that these systems can be highly clustered, like regular lattices, yet have small characteristic path lengths, like random graphs. We call them 'small-world' networks, by analogy with the small-world phenomenon (popularly known as six degrees of separation). The neural network of the worm Caenorhabditis elegans, the power grid of the western United States, and the collaboration graph of film actors are shown to be small-world networks. Models of dynamical systems with small-world coupling display enhanced signal-propagation speed, computational power, and synchronizability. In particular, infectious diseases spread more easily in small-world networks than in regular lattices.
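The rewiring idea can be tried out with a small standard-library sketch. It assumes a ring lattice as the regular starting network and a per-edge rewiring probability p; the abstract only says that regular networks are 'rewired', so the ring layout, the function names, and the parameter values below are illustrative assumptions rather than the authors' exact procedure.

```python
import random
from collections import deque


def ring_lattice(n, k):
    """n nodes on a ring, each linked to its k nearest neighbours (k even)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(1, k // 2 + 1):
            adj[i].add((i + j) % n)
            adj[(i + j) % n].add(i)
    return adj


def rewire(adj, k, p, rng):
    """Return a copy in which each lattice edge is redirected with probability p."""
    n = len(adj)
    adj = {i: set(nb) for i, nb in adj.items()}
    for j in range(1, k // 2 + 1):
        for i in range(n):
            old = (i + j) % n
            if old in adj[i] and rng.random() < p:
                # New endpoint must avoid self-loops and duplicate edges.
                candidates = [v for v in range(n) if v != i and v not in adj[i]]
                if candidates:
                    new = rng.choice(candidates)
                    adj[i].discard(old)
                    adj[old].discard(i)
                    adj[i].add(new)
                    adj[new].add(i)
    return adj


def clustering(adj):
    """Mean fraction of a node's neighbour pairs that are themselves linked."""
    total = 0.0
    for nb in adj.values():
        d = len(nb)
        if d < 2:
            continue
        nbl = list(nb)
        links = sum(1 for a in range(d) for b in range(a + 1, d)
                    if nbl[b] in adj[nbl[a]])
        total += 2 * links / (d * (d - 1))
    return total / len(adj)


def mean_path_length(adj):
    """Average shortest-path length over all reachable ordered pairs (BFS)."""
    total = pairs = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


if __name__ == "__main__":
    lattice = ring_lattice(200, 4)
    rng = random.Random(0)
    for p in (0.0, 0.01, 0.1, 1.0):
        g = rewire(lattice, 4, p, rng)
        print(p, round(clustering(g), 3), round(mean_path_length(g), 2))
```

Printing clustering and path length side by side for several values of p is a quick way to look for the middle regime described in the abstract, where clustering stays high while path lengths are already short.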
Publication
Journal: Bioinformatics
January/30/2013
Abstract
The two main functions of bioinformatics are the organization and analysis of biological data using computational resources. Geneious Basic has been designed to be an easy-to-use and flexible desktop software application framework for the organization and analysis of biological data, with a focus on molecular sequences and related data types. It integrates numerous industry-standard discovery analysis tools, with interactive visualizations to generate publication-ready images. One key contribution to researchers in the life sciences is the Geneious public application programming interface (API) that affords the ability to leverage the existing framework of the Geneious Basic software platform for virtually unlimited extension and customization. The result is an increase in the speed and quality of development of computation tools for the life sciences, due to the functionality and graphical user interface available to the developer through the public API. Geneious Basic represents an ideal platform for the bioinformatics community to leverage existing components and to integrate their own specific requirements for the discovery, analysis and visualization of biological data.
METHODS
Binaries and public API freely available for download at http://www.geneious.com/basic, implemented in Java and supported on Linux, Apple OSX and MS Windows. The software is also available from the Bio-Linux package repository at http://nebc.nerc.ac.uk/news/geneiousonbl.
Publication
Journal: Radiology
May/20/1982
Abstract
A representation and interpretation of the area under a receiver operating characteristic (ROC) curve obtained by the "rating" method, or by mathematical predictions based on patient characteristics, is presented. It is shown that in such a setting the area represents the probability that a randomly chosen diseased subject is (correctly) rated or ranked with greater suspicion than a randomly chosen non-diseased subject. Moreover, this probability of a correct ranking is the same quantity that is estimated by the already well-studied nonparametric Wilcoxon statistic. These two relationships are exploited to (a) provide rapid closed-form expressions for the approximate magnitude of the sampling variability, i.e., standard error that one uses to accompany the area under a smoothed ROC curve, (b) guide in determining the size of the sample required to provide a sufficiently reliable estimate of this area, and (c) determine how large sample sizes should be to ensure that one can statistically detect differences in the accuracy of diagnostic techniques.
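The equivalence described here can be checked numerically. The sketch below computes the area two ways: as the fraction of (diseased, non-diseased) pairs that are ranked correctly, counting ties as half, and from the Wilcoxon rank sum with mid-ranks. The function names and the rating data are made up for illustration.

```python
def auc_pairwise(diseased, healthy):
    """Probability that a random diseased subject is rated above a random
    non-diseased subject, with ties counted as one half."""
    wins = 0.0
    for d in diseased:
        for h in healthy:
            if d > h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(diseased) * len(healthy))


def midranks(values):
    """Map each distinct value to the average of the ranks it occupies."""
    order = sorted(values)
    ranks = {}
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and order[j] == order[i]:
            j += 1
        ranks[order[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    return ranks


def auc_rank_sum(diseased, healthy):
    """Same area obtained from the Wilcoxon rank-sum statistic."""
    ranks = midranks(list(diseased) + list(healthy))
    m, n = len(diseased), len(healthy)
    w = sum(ranks[d] for d in diseased)   # rank sum of the diseased group
    u = w - m * (m + 1) / 2               # Mann-Whitney U
    return u / (m * n)


if __name__ == "__main__":
    diseased = [3, 4, 5, 5]   # hypothetical suspicion ratings
    healthy = [1, 2, 3, 4]
    print(auc_pairwise(diseased, healthy), auc_rank_sum(diseased, healthy))
```

The pairwise version is quadratic in the number of subjects, while the rank-sum version only needs a sort, which is one practical reason to prefer the latter on larger samples.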
Publication
Journal: Proceedings of the National Academy of Sciences of the United States of America
May/29/1984
Abstract
Unique DNA sequences can be determined directly from mouse genomic DNA. A denaturing gel separates by size mixtures of unlabeled DNA fragments from complete restriction and partial chemical cleavages of the entire genome. These lanes of DNA are transferred and UV-crosslinked to nylon membranes. Hybridization with a short 32P-labeled single-stranded probe produces the image of a DNA sequence "ladder" extending from the 3' or 5' end of one restriction site in the genome. Numerous different sequences can be obtained from a single membrane by reprobing. Each band in these sequences represents 3 fg of DNA complementary to the probe. Sequence data from mouse immunoglobulin heavy chain genes from several cell types are presented. The genomic sequencing procedures are applicable to the analysis of genetic polymorphisms, DNA methylation at deoxycytidines, and nucleic acid-protein interactions at single nucleotide resolution.
Publication
Journal: Hypertension
January/4/2004
Abstract
The National High Blood Pressure Education Program presents the complete Seventh Report of the Joint National Committee on Prevention, Detection, Evaluation, and Treatment of High Blood Pressure. Like its predecessors, the purpose is to provide an evidence-based approach to the prevention and management of hypertension. The key messages of this report are these: in those older than age 50, systolic blood pressure (BP) of greater than 140 mm Hg is a more important cardiovascular disease (CVD) risk factor than diastolic BP; beginning at 115/75 mm Hg, CVD risk doubles for each increment of 20/10 mm Hg; those who are normotensive at 55 years of age will have a 90% lifetime risk of developing hypertension; prehypertensive individuals (systolic BP 120-139 mm Hg or diastolic BP 80-89 mm Hg) require health-promoting lifestyle modifications to prevent the progressive rise in blood pressure and CVD; for uncomplicated hypertension, thiazide diuretic should be used in drug treatment for most, either alone or combined with drugs from other classes; this report delineates specific high-risk conditions that are compelling indications for the use of other antihypertensive drug classes (angiotensin-converting enzyme inhibitors, angiotensin-receptor blockers, beta-blockers, calcium channel blockers); two or more antihypertensive medications will be required to achieve goal BP (<140/90 mm Hg, or <130/80 mm Hg) for patients with diabetes and chronic kidney disease; for patients whose BP is more than 20 mm Hg above the systolic BP goal or more than 10 mm Hg above the diastolic BP goal, initiation of therapy using two agents, one of which usually will be a thiazide diuretic, should be considered; regardless of therapy or care, hypertension will be controlled only if patients are motivated to stay on their treatment plan. Positive experiences, trust in the clinician, and empathy improve patient motivation and satisfaction. 
This report serves as a guide, and the committee continues to recognize that the responsible physician's judgment remains paramount.
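Two of the numeric statements in the key messages lend themselves to a short worked sketch: the doubling of CVD risk per 20/10 mm Hg above 115/75, and the prehypertensive range. Several choices below are assumptions that the abstract does not spell out: risk is interpolated continuously between increments, the larger of the systolic and diastolic increments is used, and a reading counts as prehypertensive only if neither value exceeds 139/89. Function names are hypothetical, and nothing here is clinical guidance.

```python
def relative_cvd_risk(systolic, diastolic):
    """Risk relative to 115/75 mm Hg, doubling per 20 (systolic) or 10 (diastolic).

    Assumptions: continuous interpolation between increments, and the larger of
    the two increments drives the estimate. Readings at or below 115/75 give 1.0.
    """
    steps = max((systolic - 115) / 20, (diastolic - 75) / 10, 0.0)
    return 2.0 ** steps


def in_prehypertensive_range(systolic, diastolic):
    """Systolic 120-139 or diastolic 80-89 mm Hg, with neither value above
    that range (the upper bound is an assumption of this sketch)."""
    within_upper = systolic <= 139 and diastolic <= 89
    return within_upper and (systolic >= 120 or diastolic >= 80)


if __name__ == "__main__":
    for reading in [(115, 75), (135, 85), (155, 95)]:
        print(reading, relative_cvd_risk(*reading), in_prehypertensive_range(*reading))
```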
Publication
Journal: Cell
October/30/2006
Abstract
Microenvironments appear important in stem cell lineage specification but can be difficult to adequately characterize or control with soft tissues. Naive mesenchymal stem cells (MSCs) are shown here to specify lineage and commit to phenotypes with extreme sensitivity to tissue-level elasticity. Soft matrices that mimic brain are neurogenic, stiffer matrices that mimic muscle are myogenic, and comparatively rigid matrices that mimic collagenous bone prove osteogenic. During the initial week in culture, reprogramming of these lineages is possible with addition of soluble induction factors, but after several weeks in culture, the cells commit to the lineage specified by matrix elasticity, consistent with the elasticity-insensitive commitment of differentiated cell types. Inhibition of nonmuscle myosin II blocks all elasticity-directed lineage specification without strongly perturbing many other aspects of cell function and shape. The results have significant implications for understanding physical effects of the in vivo microenvironment and also for therapeutic uses of stem cells.
Publication
Journal: Nature
October/19/2009
Abstract
Genome-wide association studies have identified hundreds of genetic variants associated with complex human diseases and traits, and have provided valuable insights into their genetic architecture. Most variants identified so far confer relatively small increments in risk, and explain only a small proportion of familial clustering, leading many to question how the remaining, 'missing' heritability can be explained. Here we examine potential sources of missing heritability and propose research strategies, including and extending beyond current genome-wide association approaches, to illuminate the genetics of complex diseases and enhance its potential to enable effective disease prevention or treatment.
Publication
Journal: Proceedings of the National Academy of Sciences of the United States of America
May/19/1977
Abstract
DNA can be sequenced by a chemical procedure that breaks a terminally labeled DNA molecule partially at each repetition of a base. The lengths of the labeled fragments then identify the positions of that base. We describe reactions that cleave DNA preferentially at guanines, at adenines, at cytosines and thymines equally, and at cytosines alone. When the products of these four reactions are resolved by size, by electrophoresis on a polyacrylamide gel, the DNA sequence can be read from the pattern of radioactive bands. The technique will permit sequencing of at least 100 bases from the point of labeling.
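The read-out logic can be illustrated with a toy simulation. The sketch assumes idealized lanes in which each band is recorded simply as the position of the cleaved base counted from the labeled end; it ignores the chemistry, partial cleavage, and gel resolution, and the lane and function names are hypothetical.

```python
def cleave(sequence):
    """Simulate the four reactions: return, per lane, the set of base positions
    (counted from the labeled end, starting at 1) at which cleavage occurs."""
    lanes = {"G": set(), "A": set(), "C+T": set(), "C": set()}
    for pos, base in enumerate(sequence, start=1):
        if base == "G":
            lanes["G"].add(pos)
        elif base == "A":
            lanes["A"].add(pos)
        elif base == "C":
            lanes["C+T"].add(pos)  # cytosines appear in both pyrimidine lanes
            lanes["C"].add(pos)
        elif base == "T":
            lanes["C+T"].add(pos)  # thymines appear only in the C+T lane
        else:
            raise ValueError(f"unexpected base {base!r}")
    return lanes


def read_ladder(lanes, length):
    """Read the sequence position by position from the band pattern."""
    bases = []
    for pos in range(1, length + 1):
        if pos in lanes["G"]:
            bases.append("G")
        elif pos in lanes["A"]:
            bases.append("A")
        elif pos in lanes["C"]:
            bases.append("C")      # band in both the C and C+T lanes
        elif pos in lanes["C+T"]:
            bases.append("T")      # band in the C+T lane only
        else:
            bases.append("N")      # no band: position cannot be called
    return "".join(bases)


if __name__ == "__main__":
    original = "GATTACA"
    print(read_ladder(cleave(original), len(original)))
```

The point of the C-only lane is visible in `read_ladder`: without it, a band in the C+T lane could not be assigned to cytosine or thymine.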
Publication
Journal: BMC Bioinformatics
April/8/2010
Abstract
BACKGROUND
Sequence similarity searching is a very important bioinformatics task. While Basic Local Alignment Search Tool (BLAST) outperforms exact methods through its use of heuristics, the speed of the current BLAST software is suboptimal for very long queries or database sequences. There are also some shortcomings in the user-interface of the current command-line applications.
RESULTS
We describe features and improvements of rewritten BLAST software and introduce new command-line applications. Long query sequences are broken into chunks for processing, in some cases leading to dramatically shorter run times. For long database sequences, it is possible to retrieve only the relevant parts of the sequence, reducing CPU time and memory usage for searches of short queries against databases of contigs or chromosomes. The program can now retrieve masking information for database sequences from the BLAST databases. A new modular software library can now access subject sequence data from arbitrary data sources. We introduce several new features, including strategy files that allow a user to save and reuse their favorite set of options. The strategy files can be uploaded to and downloaded from the NCBI BLAST web site.
CONCLUSIONS
The new BLAST command-line applications, compared to the current BLAST tools, demonstrate substantial speed improvements for long queries as well as chromosome length database sequences. We have also improved the user interface of the command-line applications.
Publication
Journal: Nephron
February/12/1976
Abstract
A formula has been developed to predict creatinine clearance (Ccr) from serum creatinine (Scr) in adult males: (see article) (15% less in females). Derivation included the relationship found between age and 24-hour creatinine excretion/kg in 249 patients aged 18-92. Values for Ccr were predicted by this formula and four other methods and the results compared with the means of two 24-hour Ccr's measured in 236 patients. The above formula gave a correlation coefficient between predicted and mean measured Ccr's of 0.83; on average, the difference between predicted and mean measured values was no greater than that between paired clearances. Factors for age and body weight must be included for reasonable prediction.
Publication
Journal: Nature Reviews Cancer
January/2/2007
Abstract
MicroRNA (miRNA) alterations are involved in the initiation and progression of human cancer. The causes of the widespread differential expression of miRNA genes in malignant compared with normal cells can be explained by the location of these genes in cancer-associated genomic regions, by epigenetic mechanisms and by alterations in the miRNA processing machinery. MiRNA-expression profiling of human tumours has identified signatures associated with diagnosis, staging, progression, prognosis and response to treatment. In addition, profiling has been exploited to identify miRNA genes that might represent downstream targets of activated oncogenic pathways, or that target protein-coding genes involved in cancer.
Publication
Journal: Circulation
November/12/2009
Abstract
A cluster of risk factors for cardiovascular disease and type 2 diabetes mellitus, which occur together more often than by chance alone, have become known as the metabolic syndrome. The risk factors include raised blood pressure, dyslipidemia (raised triglycerides and lowered high-density lipoprotein cholesterol), raised fasting glucose, and central obesity. Various diagnostic criteria have been proposed by different organizations over the past decade. Most recently, these have come from the International Diabetes Federation and the American Heart Association/National Heart, Lung, and Blood Institute. The main difference concerns the measure for central obesity, with this being an obligatory component in the International Diabetes Federation definition, lower than in the American Heart Association/National Heart, Lung, and Blood Institute criteria, and ethnic specific. The present article represents the outcome of a meeting between several major organizations in an attempt to unify criteria. It was agreed that there should not be an obligatory component, but that waist measurement would continue to be a useful preliminary screening tool. Three abnormal findings out of 5 would qualify a person for the metabolic syndrome. A single set of cut points would be used for all components except waist circumference, for which further work is required. In the interim, national or regional cut points for waist circumference can be used.
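The agreed rule, three abnormal findings out of five with no obligatory component, can be written down directly. In this sketch the five components are taken to be the listed risk factors with dyslipidemia split into its two parts, which is a reading of the abstract rather than something it states explicitly. Cut points are not given in the abstract, so each finding is supplied as an already-determined yes/no value, and the identifiers are hypothetical.

```python
COMPONENTS = (
    "raised_blood_pressure",
    "raised_triglycerides",
    "lowered_hdl_cholesterol",
    "raised_fasting_glucose",
    "central_obesity",
)


def has_metabolic_syndrome(findings):
    """Return True when at least 3 of the 5 components are abnormal.

    `findings` maps component names to booleans; missing components count
    as not abnormal. No single component is obligatory.
    """
    unknown = set(findings) - set(COMPONENTS)
    if unknown:
        raise ValueError(f"unknown components: {sorted(unknown)}")
    abnormal = sum(bool(findings.get(name, False)) for name in COMPONENTS)
    return abnormal >= 3


if __name__ == "__main__":
    patient = {
        "raised_blood_pressure": True,
        "raised_triglycerides": True,
        "raised_fasting_glucose": True,
    }
    print(has_metabolic_syndrome(patient))
```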
Publication
Journal: Gene
March/16/1983
Abstract
A series of plasmid vectors containing the multiple cloning site (MCS7) of M13mp7 has been constructed. In one of these vectors a kanamycin-resistance marker has been inserted into the center of the symmetrical MCS7 to yield a restriction-site-mobilizing element (RSM). The drug-resistance marker can be cleaved out of this vector with any of the restriction enzymes that recognize a site of the flanking sequences of the RSM to generate an RSM with either various sticky ends or blunt ends. These fragments can be used for insertion mutagenesis of any target molecule with compatible restriction sites. Insertion mutants are selected by their resistance to kanamycin. When the drug-resistance marker is removed with PstI, a small in-frame insertion can be generated. In addition, two new MCSs having single restriction sites have been formed by altering the symmetrical structure of MCS7. The resulting plasmids pUC8 and pUC9 allow one to clone doubly digested restriction fragments separately with both orientations in respect to the lac promoter. The terminal sequences of any DNA cloned in these plasmids can be characterized using the universal M13 primers.
Publication
Journal: Analytical Biochemistry
March/15/1988
Abstract
A discontinuous sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) system for the separation of proteins in the range from 1 to 100 kDa is described. Tricine, used as the trailing ion, allows a resolution of small proteins at lower acrylamide concentrations than in glycine-SDS-PAGE systems. A superior resolution of proteins, especially in the range between 5 and 20 kDa, is achieved without the necessity to use urea. Proteins above 30 kDa are already destacked within the sample gel. Thus a smooth passage of these proteins from sample to separating gel is warranted and overloading effects are reduced. This is of special importance when large amounts of protein are to be loaded onto preparative gels. The omission of glycine and urea prevents disturbances which might occur in the course of subsequent amino acid sequencing.
Publication
Journal: Genome Research
March/2/2009
Abstract
MicroRNAs (miRNAs) are small endogenous RNAs that pair to sites in mRNAs to direct post-transcriptional repression. Many sites that match the miRNA seed (nucleotides 2-7), particularly those in 3' untranslated regions (3'UTRs), are preferentially conserved. Here, we overhauled our tool for finding preferential conservation of sequence motifs and applied it to the analysis of human 3'UTRs, increasing by nearly threefold the detected number of preferentially conserved miRNA target sites. The new tool more efficiently incorporates new genomes and more completely controls for background conservation by accounting for mutational biases, dinucleotide conservation rates, and the conservation rates of individual UTRs. The improved background model enabled preferential conservation of a new site type, the "offset 6mer," to be detected. In total, >45,000 miRNA target sites within human 3'UTRs are conserved above background levels, and >60% of human protein-coding genes have been under selective pressure to maintain pairing to miRNAs. Mammalian-specific miRNAs have far fewer conserved targets than do the more broadly conserved miRNAs, even when considering only more recently emerged targets. Although pairing to the 3' end of miRNAs can compensate for seed mismatches, this class of sites constitutes less than 2% of all preferentially conserved sites detected. The new tool enables statistically powerful analysis of individual miRNA target sites, with the probability of preferentially conserved targeting (P(CT)) correlating with experimental measurements of repression. Our expanded set of target predictions (including conserved 3'-compensatory sites) is available at the TargetScan website, which displays the P(CT) for each site and each predicted target.
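What a seed match means can be shown with a minimal sketch: take nucleotides 2-7 of the miRNA, form the reverse complement, and look for that 6-nucleotide string in a 3'UTR. This covers only the Watson-Crick matching step; it performs none of the conservation analysis or P(CT) scoring described in the abstract. The function names and the short UTR fragment are invented for illustration, and the miRNA string is used as an example input.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna):
    """Return the mRNA site (5'->3') complementary to miRNA nucleotides 2-7."""
    seed = mirna[1:7]  # 0-based slice covering nucleotides 2 through 7
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def find_seed_matches(utr, mirna):
    """Return 0-based start positions of perfect seed-match sites in the UTR."""
    site = seed_site(mirna)
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]


if __name__ == "__main__":
    mirna = "UGAGGUAGUAGGUUGUAUAGUU"  # example miRNA sequence (RNA alphabet)
    utr = "AAACUACCUCAGGG"            # made-up 3'UTR fragment
    print(seed_site(mirna), find_seed_matches(utr, mirna))
```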
Publication
Journal: Science
March/14/2001
Abstract
A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies, a whole-genome assembly and a regional chromosome assembly, were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional approximately 12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history.
Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge.
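A few of the figures quoted in this abstract can be related to one another with simple arithmetic. The sketch below derives a mean read length, an approximate fold coverage, and the number of differences implied by one per 1250 bp. Dividing by the 2.91-billion bp consensus length is an assumption of this sketch; it gives roughly 5.1-fold, close to but not exactly the stated 5.11-fold, presumably because the inputs are rounded or a slightly different genome size was used. Function and constant names are illustrative.

```python
TOTAL_SEQUENCED_BP = 14.8e9   # total sequence generated (from the abstract)
READ_COUNT = 27_271_853       # high-quality reads (from the abstract)
CONSENSUS_BP = 2.91e9         # consensus length, used here as the genome size


def mean_read_length(total_bp, reads):
    """Average number of base pairs per read."""
    return total_bp / reads


def fold_coverage(total_bp, genome_bp):
    """Average number of times each genome position is sequenced."""
    return total_bp / genome_bp


def expected_differences(genome_bp, bp_per_difference=1250):
    """Differences between two haploid genomes at one per `bp_per_difference`."""
    return genome_bp / bp_per_difference


if __name__ == "__main__":
    print(round(mean_read_length(TOTAL_SEQUENCED_BP, READ_COUNT), 1))
    print(round(fold_coverage(TOTAL_SEQUENCED_BP, CONSENSUS_BP), 2))
    print(round(expected_differences(CONSENSUS_BP)))
```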
Publication
Journal: Science
February/3/1987
Abstract
The HER-2/neu oncogene is a member of the erbB-like oncogene family, and is related to, but distinct from, the epidermal growth factor receptor. This gene has been shown to be amplified in human breast cancer cell lines. In the current study, alterations of the gene in 189 primary human breast cancers were investigated. HER-2/neu was found to be amplified from 2- to greater than 20-fold in 30% of the tumors. Correlation of gene amplification with several disease parameters was evaluated. Amplification of the HER-2/neu gene was a significant predictor of both overall survival and time to relapse in patients with breast cancer. It retained its significance even when adjustments were made for other known prognostic factors. Moreover, HER-2/neu amplification had greater prognostic value than most currently used prognostic factors, including hormonal-receptor status, in lymph node-positive disease. These data indicate that this gene may play a role in the biologic behavior and/or pathogenesis of human breast cancer.