Amino Acid Sequence
Citations
Publication
Journal: Journal of Molecular Biology
December/4/1990
Abstract
A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score. Recent mathematical results on the stochastic properties of MSP scores allow an analysis of the performance of this method as well as the statistical significance of alignments it generates. The basic algorithm is simple and robust; it can be implemented in a number of ways and applied in a variety of contexts including straightforward DNA and protein sequence database searches, motif searches, gene identification searches, and in the analysis of multiple regions of similarity in long DNA sequences. In addition to its flexibility and tractability to mathematical analysis, BLAST is an order of magnitude faster than existing sequence comparison tools of comparable sensitivity.
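BLAST approximates the MSP score rather than computing it exhaustively. For reference, the quantity being approximated can be computed exactly by brute force, as in the minimal Python sketch below. The +5/-4 match/mismatch scores are an assumption for illustration (protein searches would normally use a substitution matrix), the function name is hypothetical, and this is not BLAST's word-hit heuristic.

```python
def msp_score(a: str, b: str, match: int = 5, mismatch: int = -4) -> int:
    """Exact maximal segment pair (MSP) score of two sequences.

    An ungapped alignment of equal-length segments always lies on a single
    diagonal (a fixed offset j - i), so a maximum-subarray (Kadane) pass along
    every diagonal finds the best segment pair in O(len(a) * len(b)) time.
    """
    best = 0
    for offset in range(-(len(a) - 1), len(b)):  # offset = j - i
        i = max(0, -offset)
        j = i + offset
        running = 0  # best score of a segment pair ending at (i, j)
        while i < len(a) and j < len(b):
            running = max(0, running + (match if a[i] == b[j] else mismatch))
            best = max(best, running)
            i += 1
            j += 1
    return best


print(msp_score("ACGTTACG", "ACGTAACG"))  # 31: one mismatch inside an 8-residue segment pair
```

The cost grows with the product of the sequence lengths for every database entry, which is the expense BLAST's approximation is designed to avoid.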
Publication
Journal: Nucleic Acids Research
October/1/1997
Abstract
The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.
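PSI-BLAST's central step is turning aligned hits into a position-specific score matrix. The sketch below shows the general idea with simple log-odds scores. The uniform background frequencies, the fixed pseudocount of 1 and the function names are assumptions; PSI-BLAST's actual construction (sequence weighting, data-dependent pseudocounts) differs.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0, background=None):
    """Return one {residue: log-odds score in bits} dict per alignment column.

    Assumes equal-length aligned sequences; gaps ('-') and non-standard
    symbols are ignored when counting.
    """
    if background is None:  # assumption: uniform background frequencies
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    pssm = []
    for col in range(len(alignment[0])):
        counts = {aa: 0 for aa in AMINO_ACIDS}
        for seq in alignment:
            if seq[col] in counts:
                counts[seq[col]] += 1
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background[aa])
            for aa in AMINO_ACIDS
        })
    return pssm


def score_segment(pssm, segment):
    """Ungapped score of a query segment against the profile."""
    return sum(column[res] for column, res in zip(pssm, segment))


profile = build_pssm(["ACD", "ACD", "ACE"])
print(score_segment(profile, "ACD") > score_segment(profile, "ACE"))  # True
```

Searching with such a matrix replaces the single substitution-matrix lookup per residue pair with a lookup that depends on the query position.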
Publication
Journal: Nucleic Acids Research
January/2/1995
Abstract
The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved for the alignment of divergent protein sequences. Firstly, individual weights are assigned to each sequence in a partial alignment in order to down-weight near-duplicate sequences and up-weight the most divergent ones. Secondly, amino acid substitution matrices are varied at different alignment stages according to the divergence of the sequences to be aligned. Thirdly, residue-specific gap penalties and locally reduced gap penalties in hydrophilic regions encourage new gaps in potential loop regions rather than regular secondary structure. Fourthly, positions in early alignments where gaps have been opened receive locally reduced gap penalties to encourage the opening up of new gaps at these positions. These modifications are incorporated into a new program, CLUSTAL W, which is freely available.
Publication
Journal: Nucleic Acids Research
July/5/2004
Abstract
We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using kmer counting, progressive alignment using a new profile function we call the log-expectation score, and refinement using tree-dependent restricted partitioning. The speed and accuracy of MUSCLE are compared with T-Coffee, MAFFT and CLUSTALW on four test sets of reference alignments: BAliBASE, SABmark, SMART and a new benchmark, PREFAB. MUSCLE achieves the highest, or joint highest, rank in accuracy on each of these sets. Without refinement, MUSCLE achieves average accuracy statistically indistinguishable from T-Coffee and MAFFT, and is the fastest of the tested methods for large numbers of sequences, aligning 5000 sequences of average length 350 in 7 min on a current desktop computer. The MUSCLE program, source code and PREFAB test data are freely available at http://www.drive5.com/muscle.
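MUSCLE's first stage estimates pairwise distances from k-mer counts without aligning the sequences. A minimal sketch of that idea follows. It uses raw residues and k = 3, whereas MUSCLE's own k-mer measure may differ in alphabet and normalization, so the names and parameters here are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def kmer_counts(seq, k):
    """Count every overlapping substring of length k."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a, b, k=3):
    """1 minus the fraction of k-mers shared by a and b (alignment-free)."""
    denom = min(len(a), len(b)) - k + 1
    if denom <= 0:
        raise ValueError("sequences must be at least k residues long")
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    shared = sum(min(ca[w], cb[w]) for w in ca.keys() & cb.keys())
    return 1.0 - shared / denom


def distance_matrix(seqs, k=3):
    """Pairwise k-mer distances, e.g. as input to guide-tree construction."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = kmer_distance(seqs[i], seqs[j], k)
    return d


print(kmer_distance("ACDEFG", "ACDEFH"))  # 0.25: 3 of 4 3-mers shared
```

Because each pair costs only a pass over both sequences, the full matrix stays cheap even for thousands of sequences, which is what makes this stage fast.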
Publication
Journal: Journal of Computational Chemistry
September/15/2004
Abstract
The design, implementation, and capabilities of an extensible visualization system, UCSF Chimera, are discussed. Chimera is segmented into a core that provides basic services and visualization, and extensions that provide most higher level functionality. This architecture ensures that the extension mechanism satisfies the demands of outside developers who wish to incorporate new features. Two unusual extensions are presented: Multiscale, which adds the ability to visualize large-scale molecular assemblies such as viral coats, and Collaboratory, which allows researchers to share a Chimera session interactively despite being at separate locales. Other extensions include Multalign Viewer, for showing multiple sequence alignments and associated structures; ViewDock, for screening docked ligand orientations; Movie, for replaying molecular dynamics trajectories; and Volume Viewer, for display and analysis of volumetric data. A discussion of the usage of Chimera in real-world situations is given, along with anticipated future directions. Chimera includes full user documentation, is free to academic and nonprofit users, and is available for Microsoft Windows, Linux, Apple Mac OS X, SGI IRIX, and HP Tru64 Unix from http://www.cgl.ucsf.edu/chimera/.
Publication
Journal: Bioinformatics
December/20/2007
Abstract
SUMMARY
The Clustal W and Clustal X multiple sequence alignment programs have been completely rewritten in C++. This will facilitate the further development of the alignment algorithms in the future and has allowed proper porting of the programs to the latest versions of Linux, Macintosh and Windows operating systems.
AVAILABILITY
The programs can be run on-line from the EBI web server: http://www.ebi.ac.uk/tools/clustalw2. The source code and executables for Windows, Linux and Macintosh computers are available from the EBI ftp site ftp://ftp.ebi.ac.uk/pub/software/clustalw2/
Publication
Journal: Nucleic Acids Research
February/23/1998
Abstract
CLUSTAL X is a new windows interface for the widely-used progressive multiple sequence alignment program CLUSTAL W. The new system is easy to use, providing an integrated system for performing multiple sequence and profile alignments and analysing the results. CLUSTAL X displays the sequence alignment in a window on the screen. A versatile sequence colouring scheme allows the user to highlight conserved features in the alignment. Pull-down menus provide all the options required for traditional multiple sequence and profile alignment. New features include: the ability to cut-and-paste sequences to change the order of the alignment, selection of a subset of the sequences to be realigned, and selection of a sub-range of the alignment to be realigned and inserted back into the original alignment. Alignment quality analysis can be performed and low-scoring segments or exceptional residues can be highlighted. Quality analysis and realignment of selected residue ranges provide the user with a powerful tool to improve and refine difficult alignments and to trap errors in input sequences. CLUSTAL X has been compiled on SUN Solaris, IRIX5.3 on Silicon Graphics, Digital UNIX on DECstations, Microsoft Windows (32 bit) for PCs, Linux ELF for x86 PCs, and Macintosh PowerMac.
Publication
Journal: Journal of Molecular Biology
October/20/1982
Publication
Journal: Molecular Biology and Evolution
September/2/2013
Abstract
We report a major update of the MAFFT multiple sequence alignment program. This version has several new features, including options for adding unaligned sequences into an existing alignment, adjustment of direction in nucleotide alignment, constrained alignment and parallel processing, which were implemented after the previous major update. This report shows actual examples to explain how these features work, alone and in combination. Some examples incorrectly aligned by MAFFT are also shown to clarify its limitations. We discuss how to avoid misalignments, and our ongoing efforts to overcome such limitations.
Publication
Journal: Cell
February/9/2005
Abstract
We predict regulatory targets of vertebrate microRNAs (miRNAs) by identifying mRNAs with conserved complementarity to the seed (nucleotides 2-7) of the miRNA. An overrepresentation of conserved adenosines flanking the seed complementary sites in mRNAs indicates that primary sequence determinants can supplement base pairing to specify miRNA target recognition. In a four-genome analysis of 3' UTRs, approximately 13,000 regulatory relationships were detected above the estimate of false-positive predictions, thereby implicating as miRNA targets more than 5300 human genes, which represented 30% of our gene set. Targeting was also detected in open reading frames. In sum, well over one third of human genes appear to be conserved miRNA targets.
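Seed matching itself is simple string processing. The sketch below scans a single 3' UTR (RNA alphabet) for 6-nt sites complementary to miRNA nucleotides 2-7 and flags an adenosine immediately 3' of each site. Unlike the study, it does no cross-genome conservation filtering; the function names and example sequences are illustrative assumptions.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna, utr):
    """Find 3' UTR matches to miRNA nucleotides 2-7 (the seed).

    Returns (start index, has_flanking_A) pairs; the flanking position is the
    UTR nucleotide just 3' of the 6-nt match, opposite miRNA nucleotide 1.
    """
    site = reverse_complement(mirna[1:7])  # nucleotides 2-7, 1-based
    hits = []
    start = utr.find(site)
    while start != -1:
        end = start + len(site)
        hits.append((start, end < len(utr) and utr[end] == "A"))
        start = utr.find(site, start + 1)
    return hits


mirna = "UGAGGUAGUAGGUUGUAUAGUU"  # example miRNA sequence, chosen for illustration
print(seed_sites(mirna, "AAUACCUCAGGGUACCUCG"))  # [(2, True), (12, False)]
```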
Publication
Journal: Genome Research
July/26/2004
Abstract
WebLogo generates sequence logos, graphical representations of the patterns within a multiple sequence alignment. Sequence logos provide a richer and more precise description of sequence similarity than consensus sequences and can rapidly reveal significant features of the alignment otherwise difficult to perceive. Each logo consists of stacks of letters, one stack for each position in the sequence. The overall height of each stack indicates the sequence conservation at that position (measured in bits), whereas the height of symbols within the stack reflects the relative frequency of the corresponding amino or nucleic acid at that position. WebLogo has been enhanced recently with additional features and options, to provide a convenient and highly configurable sequence logo generator. A command line interface and the complete, open WebLogo source code are available for local installation and customization.
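The stack-height rule described above can be written directly. This minimal sketch assumes DNA by default (alphabet size 4, so at most 2 bits) and omits any small-sample correction; the function name is hypothetical and this is not WebLogo's code.

```python
import math
from collections import Counter


def column_logo(column, alphabet_size=4):
    """Stack height (bits) and per-symbol heights for one alignment column.

    Conservation = log2(alphabet_size) - Shannon entropy of the column; each
    symbol's height is its frequency times that conservation. No small-sample
    correction is applied.
    """
    n = len(column)
    freqs = {sym: count / n for sym, count in Counter(column).items()}
    entropy = -sum(p * math.log2(p) for p in freqs.values())
    conservation = math.log2(alphabet_size) - entropy
    return conservation, {sym: p * conservation for sym, p in freqs.items()}


print(column_logo("AACC"))  # (1.0, {'A': 0.5, 'C': 0.5})
```

For a protein column, pass `alignment_size=20`-style input via `alphabet_size=20`, giving a maximum stack height of log2(20), about 4.3 bits.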
Publication
Journal: Journal of Molecular Biology
January/11/1994
Abstract
We describe a comparative protein modelling method designed to find the most probable structure for a sequence given its alignment with related structures. The three-dimensional (3D) model is obtained by optimally satisfying spatial restraints derived from the alignment and expressed as probability density functions (pdfs) for the features restrained. For example, the probabilities for main-chain conformations of a modelled residue may be restrained by its residue type, main-chain conformation of an equivalent residue in a related protein, and the local similarity between the two sequences. Several such pdfs are obtained from the correlations between structural features in 17 families of homologous proteins which have been aligned on the basis of their 3D structures. The pdfs restrain Cα-Cα distances, main-chain N-O distances, and main-chain and side-chain dihedral angles. A smoothing procedure is used in the derivation of these relationships to minimize the problem of a sparse database. The 3D model of a protein is obtained by optimization of the molecular pdf such that the model violates the input restraints as little as possible. The molecular pdf is derived as a combination of pdfs restraining individual spatial features of the whole molecule. The optimization procedure is a variable target function method that applies the conjugate gradients algorithm to positions of all non-hydrogen atoms. The method is automated and is illustrated by the modelling of trypsin from two other serine proteinases.
Publication
Journal: Biopolymers
April/23/1984
Publication
Journal: Science
August/11/2002
Abstract
It has been more than 10 years since it was first proposed that the neurodegeneration in Alzheimer's disease (AD) may be caused by deposition of amyloid β-peptide (Aβ) in plaques in brain tissue. According to the amyloid hypothesis, accumulation of Aβ in the brain is the primary influence driving AD pathogenesis. The rest of the disease process, including formation of neurofibrillary tangles containing tau protein, is proposed to result from an imbalance between Aβ production and Aβ clearance.
Publication
Journal: Proceedings of the National Academy of Sciences of the United States of America
May/19/1988
Abstract
We have developed three computer programs for comparisons of protein and DNA sequences. They can be used to search sequence data bases, evaluate similarity scores, and identify periodic structures based on local sequence similarity. The FASTA program is a more sensitive derivative of the FASTP program, which can be used to search protein or DNA sequence data bases and can compare a protein sequence to a DNA sequence data base by translating the DNA data base as it is searched. FASTA includes an additional step in the calculation of the initial pairwise similarity score that allows multiple regions of similarity to be joined to increase the score of related sequences. The RDF2 program can be used to evaluate the significance of similarity scores using a shuffling method that preserves local sequence composition. The LFASTA program can display all the regions of local similarity between two sequences with scores greater than a threshold, using the same scoring parameters and a similar alignment algorithm; these local similarities can be displayed as a "graphic matrix" plot or as individual alignments. In addition, these programs have been generalized to allow comparison of DNA or protein sequences based on a variety of alternative scoring matrices.
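The idea behind RDF2, judging a similarity score against scores for shuffled sequences, can be sketched as follows. Assumptions: shuffling within fixed windows of 10 residues as a simple way to preserve local composition, a best-diagonal ungapped score with +5/-4 scoring standing in for the FASTA score, and a z-score summary. RDF2's exact procedure is not reproduced here, and all names are hypothetical.

```python
import random
import statistics


def local_shuffle(seq, window, rng):
    """Shuffle residues within consecutive windows, keeping local composition."""
    out = []
    for start in range(0, len(seq), window):
        block = list(seq[start:start + window])
        rng.shuffle(block)
        out.extend(block)
    return "".join(out)


def ungapped_score(a, b, match=5, mismatch=-4):
    """Best ungapped local score over all diagonals (a simple stand-in score)."""
    best = 0
    for offset in range(-(len(a) - 1), len(b)):
        i = max(0, -offset)
        j = i + offset
        running = 0
        while i < len(a) and j < len(b):
            running = max(0, running + (match if a[i] == b[j] else mismatch))
            best = max(best, running)
            i, j = i + 1, j + 1
    return best


def shuffle_z_score(query, target, n_shuffles=100, window=10, seed=0):
    """Observed score expressed in standard deviations above shuffled scores."""
    rng = random.Random(seed)
    observed = ungapped_score(query, target)
    shuffled = [ungapped_score(query, local_shuffle(target, window, rng))
                for _ in range(n_shuffles)]
    sd = statistics.stdev(shuffled)
    return (observed - statistics.mean(shuffled)) / sd if sd > 0 else float("inf")


seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
print(shuffle_z_score(seq, seq))  # a large positive z for a sequence against itself
```

Shuffling within windows rather than across the whole sequence keeps regional biases (for example, low-complexity stretches) in the background distribution, so they do not inflate the apparent significance.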
Publication
Journal: Electrophoresis
April/30/1998
Abstract
Comparative protein modeling is increasingly gaining interest since it is of great assistance during the rational design of mutagenesis experiments. The availability of this method, and the resulting models, has however been restricted by the availability of expensive computer hardware and software. To overcome these limitations, we have developed an environment for comparative protein modeling that consists of SWISS-MODEL, a server for automated comparative protein modeling and of the SWISS-PdbViewer, a sequence to structure workbench. The Swiss-PdbViewer not only acts as a client for SWISS-MODEL, but also provides a large selection of structure analysis and display tools. In addition, we provide the SWISS-MODEL Repository, a database containing more than 3500 automatically generated protein models. By making such tools freely available to the scientific community, we hope to increase the use of protein structures and models in the process of experiment design.
Publication
Journal: New England Journal of Medicine
May/24/2004
Abstract
BACKGROUND
Most patients with non-small-cell lung cancer have no response to the tyrosine kinase inhibitor gefitinib, which targets the epidermal growth factor receptor (EGFR). However, about 10 percent of patients have a rapid and often dramatic clinical response. The molecular mechanisms underlying sensitivity to gefitinib are unknown.
METHODS
We searched for mutations in the EGFR gene in primary tumors from patients with non-small-cell lung cancer who had a response to gefitinib, those who did not have a response, and those who had not been exposed to gefitinib. The functional consequences of identified mutations were evaluated after the mutant proteins were expressed in cultured cells.
RESULTS
Somatic mutations were identified in the tyrosine kinase domain of the EGFR gene in eight of nine patients with gefitinib-responsive lung cancer, as compared with none of the seven patients with no response (P<0.001). Mutations were either small, in-frame deletions or amino acid substitutions clustered around the ATP-binding pocket of the tyrosine kinase domain. Similar mutations were detected in tumors from 2 of 25 patients with primary non-small-cell lung cancer who had not been exposed to gefitinib (8 percent). All mutations were heterozygous, and identical mutations were observed in multiple patients, suggesting an additive specific gain of function. In vitro, EGFR mutants demonstrated enhanced tyrosine kinase activity in response to epidermal growth factor and increased sensitivity to inhibition by gefitinib.
CONCLUSIONS
A subgroup of patients with non-small-cell lung cancer have specific mutations in the EGFR gene, which correlate with clinical responsiveness to the tyrosine kinase inhibitor gefitinib. These mutations lead to increased growth factor signaling and confer susceptibility to the inhibitor. Screening for such mutations in lung cancers may identify patients who will have a response to gefitinib.
Publication
Journal: Science
September/5/2001
Abstract
Chromatin, the physiological template of all eukaryotic genetic information, is subject to a diverse array of posttranslational modifications that largely impinge on histone amino termini, thereby regulating access to the underlying DNA. Distinct histone amino-terminal modifications can generate synergistic or antagonistic interaction affinities for chromatin-associated proteins, which in turn dictate dynamic transitions between transcriptionally active or transcriptionally silent chromatin states. The combinatorial nature of histone amino-terminal modifications thus reveals a "histone code" that considerably extends the information potential of the genetic code. We propose that this epigenetic marking system represents a fundamental regulatory mechanism that has an impact on most, if not all, chromatin-templated processes, with far-reaching consequences for cell fate decisions and both normal and pathological development.
Publication
Journal: Molecular Systems Biology
December/21/2011
Abstract
Multiple sequence alignments are fundamental to many sequence analysis methods. Most alignments are computed using the progressive alignment heuristic. These methods are starting to become a bottleneck in some analysis pipelines when faced with data sets of the size of many thousands of sequences. Some methods allow computation of larger data sets while sacrificing quality, and others produce high-quality alignments, but scale badly with the number of sequences. In this paper, we describe a new program called Clustal Omega, which can align virtually any number of protein sequences quickly and that delivers accurate alignments. The accuracy of the package on smaller test cases is similar to that of the high-quality aligners. On larger data sets, Clustal Omega outperforms other packages in terms of execution time and quality. Clustal Omega also has powerful features for adding sequences to and exploiting information in existing alignments, making use of the vast amount of precomputed information in public databases like Pfam.
Publication
Journal: Nature
July/18/2002
Abstract
Cancers arise owing to the accumulation of mutations in critical genes that alter normal programmes of cell proliferation, differentiation and death. As the first stage of a systematic genome-wide screen for these genes, we have prioritized for analysis signalling pathways in which at least one gene is mutated in human cancer. The RAS-RAF-MEK-ERK MAP kinase pathway mediates cellular responses to growth signals. RAS is mutated to an oncogenic form in about 15% of human cancer. The three RAF genes code for cytoplasmic serine/threonine kinases that are regulated by binding RAS. Here we report BRAF somatic missense mutations in 66% of malignant melanomas and at lower frequency in a wide range of human cancers. All mutations are within the kinase domain, with a single substitution (V599E) accounting for 80%. Mutated BRAF proteins have elevated kinase activity and are transforming in NIH3T3 cells. Furthermore, RAS function is not required for the growth of cancer cell lines with the V599E mutation. As BRAF is a serine/threonine kinase that is commonly activated by somatic point mutation in human cancer, it may provide new therapeutic opportunities in malignant melanoma.
Publication
Journal: Science
June/30/2004
Abstract
Receptor tyrosine kinase genes were sequenced in non-small cell lung cancer (NSCLC) and matched normal tissue. Somatic mutations of the epidermal growth factor receptor gene EGFR were found in 15 of 58 unselected tumors from Japan and 1 of 61 from the United States. Treatment with the EGFR kinase inhibitor gefitinib (Iressa) causes tumor regression in some patients with NSCLC, more frequently in Japan. EGFR mutations were found in additional lung cancer samples from U.S. patients who responded to gefitinib therapy and in a lung adenocarcinoma cell line that was hypersensitive to growth inhibition by gefitinib, but not in gefitinib-insensitive tumors or cell lines. These results suggest that EGFR mutations may predict sensitivity to gefitinib.
Publication
Journal: Gene
August/24/1989
Abstract
Overlap extension represents a new approach to genetic engineering. Complementary oligodeoxyribonucleotide (oligo) primers and the polymerase chain reaction are used to generate two DNA fragments having overlapping ends. These fragments are combined in a subsequent 'fusion' reaction in which the overlapping ends anneal, allowing the 3' overlap of each strand to serve as a primer for the 3' extension of the complementary strand. The resulting fusion product is amplified further by PCR. Specific alterations in the nucleotide (nt) sequence can be introduced by incorporating nucleotide changes into the overlapping oligo primers. Using this technique of site-directed mutagenesis, three variants of a mouse major histocompatibility complex class-I gene have been generated, cloned and analyzed. Screening of mutant clones revealed at least a 98% efficiency of mutagenesis. All clones sequenced contained the desired mutations, and a low frequency of random substitution estimated to occur at approx. 1 in 4000 nt was detected. This method represents a significant improvement over standard methods of site-directed mutagenesis because it is much faster, simpler and approaches 100% efficiency in the generation of mutant product.
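At the sequence level, the fusion step joins two PCR products whose ends share identical sequence; a mutation carried by the overlapping primers is present in both fragments' shared ends and therefore in the fused product. A minimal sketch, assuming fragments written 5' to 3' on the top strand and a hypothetical 15-nt default minimum overlap; it models only the product sequence, not annealing thermodynamics or polymerase errors, and the names are illustrative.

```python
def overlap_length(left, right, min_overlap=15):
    """Longest suffix of `left` that equals a prefix of `right` (0 if too short)."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return size
    return 0


def fuse(left, right, min_overlap=15):
    """Sequence of the product formed when the overlapping ends anneal and extend."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        raise ValueError("fragments do not share a long enough overlap")
    return left + right[size:]


left = "ATGGCTAGCTTACGGATCCAAGT"   # hypothetical fragment AB
right = "TTACGGATCCAAGTGGCCTAA"    # hypothetical fragment CD, sharing a 14-nt end
print(fuse(left, right, min_overlap=10))  # ATGGCTAGCTTACGGATCCAAGTGGCCTAA
```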
Publication
Journal: Nature
September/29/1997
Abstract
The X-ray crystal structure of the nucleosome core particle of chromatin shows in atomic detail how the histone protein octamer is assembled and how 146 base pairs of DNA are organized into a superhelix around it. Both histone/histone and histone/DNA interactions depend on the histone fold domains and additional, well ordered structure elements extending from this motif. Histone amino-terminal tails pass over and between the gyres of the DNA superhelix to contact neighbouring particles. The lack of uniformity between multiple histone/DNA-binding sites causes the DNA to deviate from ideal superhelix geometry.
Publication
Journal: Journal of Molecular Biology
August/2/2004
Abstract
We describe improvements of the currently most popular method for prediction of classically secreted proteins, SignalP. SignalP consists of two different predictors based on neural network and hidden Markov model algorithms, where both components have been updated. Motivated by the idea that the cleavage site position and the amino acid composition of the signal peptide are correlated, new features have been included as input to the neural network. This addition, combined with a thorough error-correction of a new data set, has improved the performance of the predictor significantly over SignalP version 2. In version 3, correctness of the cleavage site predictions has increased notably for all three organism groups: eukaryotes, Gram-negative bacteria and Gram-positive bacteria. The accuracy of cleavage site prediction has increased in the range 6-17% over the previous version, whereas the signal peptide discrimination improvement is mainly due to the elimination of false-positive predictions, as well as the introduction of a new discrimination score for the neural network. The new method has been benchmarked against other available methods. Predictions can be made at the publicly available web server