Abstract:

Methods for alignment of protein sequences typically measure similarity by using a substitution matrix with scores for all possible exchanges of one amino acid with another. The most widely used matrices are based on the Dayhoff model of evolutionary rates. Using a different approach, we have derived substitution matrices from about 2000 blocks of aligned sequence segments characterizing more than 500 groups of related proteins. This led to marked improvements in alignments and in searches using queries from each of the groups.
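To make the idea of deriving scores from blocks concrete, the following is a minimal sketch, assuming a log-odds formulation in which the observed frequency of each amino acid pair within block columns is compared with the frequency expected from the overall residue composition. The function names `pair_counts` and `log_odds` and the toy block are hypothetical illustrations, not taken from the article, and the sketch omits refinements such as weighting or clustering of closely related sequences.

```python
import math
from collections import Counter
from itertools import combinations


def pair_counts(blocks):
    """Count unordered residue pairs in every column of every block.

    Each block is a list of equal-length, ungapped sequence segments.
    """
    counts = Counter()
    for block in blocks:
        for column in zip(*block):
            for a, b in combinations(column, 2):
                counts[tuple(sorted((a, b)))] += 1
    return counts


def log_odds(counts):
    """Turn pair counts into log-odds scores in half-bit units (unrounded)."""
    total = sum(counts.values())
    # Observed frequency of each unordered pair.
    q = {pair: n / total for pair, n in counts.items()}
    # Residue frequency implied by the pairs: identical pairs count fully,
    # mixed pairs contribute half to each of their two residues.
    p = Counter()
    for (a, b), f in q.items():
        if a == b:
            p[a] += f
        else:
            p[a] += f / 2
            p[b] += f / 2
    scores = {}
    for (a, b), f in q.items():
        # Expected frequency if residues paired at random.
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = 2 * math.log2(f / expected)
    return scores


# Toy block (invented data): three aligned segments, two columns.
toy_blocks = [["AS", "AS", "AT"]]
scores = log_odds(pair_counts(toy_blocks))
```

Pairs that never occur in the input are simply absent from `scores`; a real matrix would need a rule for such pairs and would typically round the values to integers.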

Amino acid substitution matrices from protein blocks.

S Henikoff

J Henikoff

Howard Hughes Medical Institute, Fred Hutchinson Cancer Research Center, Seattle, WA 98104.
