CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.
Journal: 1995/January - Nucleic Acids Research
ISSN: 0305-1048
PUBMED: 7984417
Abstract:
The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved for the alignment of divergent protein sequences. Firstly, individual weights are assigned to each sequence in a partial alignment in order to down-weight near-duplicate sequences and up-weight the most divergent ones. Secondly, amino acid substitution matrices are varied at different alignment stages according to the divergence of the sequences to be aligned. Thirdly, residue-specific gap penalties and locally reduced gap penalties in hydrophilic regions encourage new gaps in potential loop regions rather than regular secondary structure. Fourthly, positions in early alignments where gaps have been opened receive locally reduced gap penalties to encourage the opening up of new gaps at these positions. These modifications are incorporated into a new program, CLUSTAL W which is freely available.
Relations:
Content
Citations
(15K+)
References
(32)
Chemicals
(2)
Processes
(3)
Affiliates
(1)
Similar articles
Articles by the same authors
Discussion board
Nucleic Acids Res 22(22): 4673-4680

CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.

Abstract

The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved for the alignment of divergent protein sequences. Firstly, individual weights are assigned to each sequence in a partial alignment in order to down-weight near-duplicate sequences and up-weight the most divergent ones. Secondly, amino acid substitution matrices are varied at different alignment stages according to the divergence of the sequences to be aligned. Thirdly, residue-specific gap penalties and locally reduced gap penalties in hydrophilic regions encourage new gaps in potential loop regions rather than regular secondary structure. Fourthly, positions in early alignments where gaps have been opened receive locally reduced gap penalties to encourage the opening up of new gaps at these positions. These modifications are incorporated into a new program, CLUSTAL W which is freely available.

Full text

Full text is available as a scanned copy of the original print version. Get a printable copy (PDF file) of the complete article (2.5M), or click on a page image below to browse page by page. Links to PubMed are also available for Selected References.

Images in this article

Click on the image to see a larger version.

Selected References

These references are in PubMed. This may not be the complete list of references from this article.
  • Lüthy R, Xenarios I, Bucher P. Improving the sensitivity of the sequence profile method. Protein Sci. 1994 Jan;3(1):139–146.[PMC free article] [PubMed] [Google Scholar]
  • Higgins DG, Sharp PM. CLUSTAL: a package for performing multiple sequence alignment on a microcomputer. Gene. 1988 Dec 15;73(1):237–244. [PubMed] [Google Scholar]
  • Higgins DG, Sharp PM. Fast and sensitive multiple sequence alignments on a microcomputer. Comput Appl Biosci. 1989 Apr;5(2):151–153. [PubMed] [Google Scholar]
  • Higgins DG, Bleasby AJ, Fuchs R. CLUSTAL V: improved software for multiple sequence alignment. Comput Appl Biosci. 1992 Apr;8(2):189–191. [PubMed] [Google Scholar]
  • Saitou N, Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol. 1987 Jul;4(4):406–425. [PubMed] [Google Scholar]
  • Bashford D, Chothia C, Lesk AM. Determinants of a protein fold. Unique features of the globin amino acid sequences. J Mol Biol. 1987 Jul 5;196(1):199–216. [PubMed] [Google Scholar]
  • Musacchio A, Gibson T, Lehto VP, Saraste M. SH3--an abundant protein domain in search of a function. FEBS Lett. 1992 Jul 27;307(1):55–61. [PubMed] [Google Scholar]
  • Musacchio A, Noble M, Pauptit R, Wierenga R, Saraste M. Crystal structure of a Src-homology 3 (SH3) domain. Nature. 1992 Oct 29;359(6398):851–855. [PubMed] [Google Scholar]
  • Bashford D, Chothia C, Lesk AM. Determinants of a protein fold. Unique features of the globin amino acid sequences. J Mol Biol. 1987 Jul 5;196(1):199–216. [PubMed] [Google Scholar]
  • Myers EW, Miller W. Optimal alignments in linear space. Comput Appl Biosci. 1988 Mar;4(1):11–17. [PubMed] [Google Scholar]
  • Smith TF, Waterman MS, Fitch WM. Comparative biosequence metrics. J Mol Evol. 1981;18(1):38–46. [PubMed] [Google Scholar]
  • Pearson WR, Lipman DJ. Improved tools for biological sequence comparison. Proc Natl Acad Sci U S A. 1988 Apr;85(8):2444–2448.[PMC free article] [PubMed] [Google Scholar]
  • Devereux J, Haeberli P, Smithies O. A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res. 1984 Jan 11;12(1 Pt 1):387–395.[PMC free article] [PubMed] [Google Scholar]
  • Kimura M. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J Mol Evol. 1980 Dec;16(2):111–120. [PubMed] [Google Scholar]
  • Smith RF, Smith TF. Pattern-induced multi-sequence alignment (PIMA) algorithm employing secondary structure-dependent gap penalties for use in comparative protein modelling. Protein Eng. 1992 Jan;5(1):35–41. [PubMed] [Google Scholar]
  • Krogh A, Brown M, Mian IS, Sjölander K, Haussler D. Hidden Markov models in computational biology. Applications to protein modeling. J Mol Biol. 1994 Feb 4;235(5):1501–1531. [PubMed] [Google Scholar]
  • Jones DT, Taylor WR, Thornton JM. A mutation data matrix for transmembrane proteins. FEBS Lett. 1994 Feb 21;339(3):269–275. [PubMed] [Google Scholar]
  • Bairoch A, Boeckmann B. The SWISS-PROT protein sequence data bank. Nucleic Acids Res. 1992 May 11;20 (Suppl):2019–2022.[PMC free article] [PubMed] [Google Scholar]
  • Noble ME, Musacchio A, Saraste M, Courtneidge SA, Wierenga RK. Crystal structure of the SH3 domain in human Fyn; comparison of the three-dimensional structures of SH3 domains in tyrosine kinases and spectrin. EMBO J. 1993 Jul;12(7):2617–2624.[PMC free article] [PubMed] [Google Scholar]
  • Kabsch W, Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 1983 Dec;22(12):2577–2637. [PubMed] [Google Scholar]
  • Feng DF, Doolittle RF. Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J Mol Evol. 1987;25(4):351–360. [PubMed] [Google Scholar]
  • Needleman SB, Wunsch CD. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol. 1970 Mar;48(3):443–453. [PubMed] [Google Scholar]
  • Henikoff S, Henikoff JG. Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci U S A. 1992 Nov 15;89(22):10915–10919.[PMC free article] [PubMed] [Google Scholar]
  • Lipman DJ, Altschul SF, Kececioglu JD. A tool for multiple sequence alignment. Proc Natl Acad Sci U S A. 1989 Jun;86(12):4412–4415.[PMC free article] [PubMed] [Google Scholar]
  • Barton GJ, Sternberg MJ. A strategy for the rapid multiple alignment of protein sequences. Confidence levels from tertiary structure comparisons. J Mol Biol. 1987 Nov 20;198(2):327–337. [PubMed] [Google Scholar]
  • Gotoh O. Optimal alignment between groups of sequences and its application to multiple sequence alignment. Comput Appl Biosci. 1993 Jun;9(3):361–370. [PubMed] [Google Scholar]
  • Altschul SF. Gap costs for multiple sequence alignment. J Theor Biol. 1989 Jun 8;138(3):297–309. [PubMed] [Google Scholar]
  • Lukashin AV, Engelbrecht J, Brunak S. Multiple alignment using simulated annealing: branch point definition in human mRNA splicing. Nucleic Acids Res. 1992 May 25;20(10):2511–2516.[PMC free article] [PubMed] [Google Scholar]
  • Lawrence CE, Altschul SF, Boguski MS, Liu JS, Neuwald AF, Wootton JC. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science. 1993 Oct 8;262(5131):208–214. [PubMed] [Google Scholar]
  • Sainsard-Chanet A, Begel O, Belcour L. DNA deletion of mitochondrial introns is correlated with the process of senescence in Podospora anserina. J Mol Biol. 1993 Nov 5;234(1):1–7. [PubMed] [Google Scholar]
  • Pascarella S, Argos P. Analysis of insertions/deletions in protein structures. J Mol Biol. 1992 Mar 20;224(2):461–471. [PubMed] [Google Scholar]
  • Vingron M, Sibbald PR. Weighting in sequence space: a comparison of methods in terms of generalized sequences. Proc Natl Acad Sci U S A. 1993 Oct 1;90(19):8777–8781.[PMC free article] [PubMed] [Google Scholar]
  • Thompson JD, Higgins DG, Gibson TJ. Improved sensitivity of profile searches through the use of sequence weights and gap excision. Comput Appl Biosci. 1994 Feb;10(1):19–29. [PubMed] [Google Scholar]
European Molecular Biology Laboratory, Heidelberg, Germany.
European Molecular Biology Laboratory, Heidelberg, Germany.
Abstract
The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved for the alignment of divergent protein sequences. Firstly, individual weights are assigned to each sequence in a partial alignment in order to down-weight near-duplicate sequences and up-weight the most divergent ones. Secondly, amino acid substitution matrices are varied at different alignment stages according to the divergence of the sequences to be aligned. Thirdly, residue-specific gap penalties and locally reduced gap penalties in hydrophilic regions encourage new gaps in potential loop regions rather than regular secondary structure. Fourthly, positions in early alignments where gaps have been opened receive locally reduced gap penalties to encourage the opening up of new gaps at these positions. These modifications are incorporated into a new program, CLUSTAL W which is freely available.
Collaboration tool especially designed for Life Science professionals.Drag-and-drop any entity to your messages.