Extracting protein alignment models from the sequence database
Abstract
Biologists often gain structural and functional insights into a protein sequence by constructing a multiple alignment model of its family. Here, a program called Probe fully automates this process of model construction, starting from a single sequence. Central to this program is a powerful new method for locating and aligning only those conserved patterns, often subtle ones, that are essential to the family as a whole. When applied to randomly chosen proteins, Probe found on average about four times as many relationships as a pairwise search and yielded many new discoveries. These include an obscure subfamily of globins in the roundworm Caenorhabditis elegans; two new superfamilies of metallohydrolases; a lipoyl/biotin swinging arm domain in bacterial membrane fusion proteins; and a DH domain in the yeast Bud3 and Fus2 proteins. By identifying distant relationships and merging families into superfamilies in this way, this analysis further confirms the notion that proteins evolved from relatively few ancient sequences. Moreover, the method automatically generates models of these ancient conserved regions for rapid and sensitive screening of sequences.
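The abstract does not spell out Probe's alignment procedure; that is described in the full text. As an illustrative sketch only, the Python below shows one standard way to locate and align an ungapped, subtly conserved motif across related sequences: the Gibbs site sampler of Lawrence et al. (1993). Each model is represented as a log-odds position-specific scoring matrix (PSSM), which can also be used to screen new sequences. The function names, the uniform background, the pseudocount of 1, and the one-site-per-sequence assumption are all hypothetical simplifications, not Probe's actual implementation.

```python
import math
import random

# Hypothetical alphabet: the 20 standard amino acids, upper case only.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(sites, pseudocount=1.0, alphabet=AMINO_ACIDS):
    """Return a log-odds PSSM (one dict per column) from equal-length sites.

    Assumes a uniform background frequency; real tools typically use
    observed residue frequencies instead."""
    background = 1.0 / len(alphabet)
    pssm = []
    for col in range(len(sites[0])):
        counts = {aa: pseudocount for aa in alphabet}
        for site in sites:
            counts[site[col]] += 1
        total = sum(counts.values())
        pssm.append({aa: math.log(counts[aa] / total / background)
                     for aa in alphabet})
    return pssm


def score_window(pssm, seq, start):
    """Sum of column log-odds scores for seq[start:start + len(pssm)]."""
    return sum(col[seq[start + i]] for i, col in enumerate(pssm))


def best_hit(pssm, seq):
    """Screen one sequence: return (start, score) of its best window."""
    starts = range(len(seq) - len(pssm) + 1)
    return max(((s, score_window(pssm, seq, s)) for s in starts),
               key=lambda hit: hit[1])


def gibbs_site_sampler(seqs, width, iterations=200, seed=0):
    """Gibbs site sampler after Lawrence et al. (1993), heavily simplified.

    Assumes exactly one ungapped motif site per sequence; omits phase
    shifts, restarts and any convergence test."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        for i, seq in enumerate(seqs):
            # Build the profile from every other sequence's current site.
            others = [s[p:p + width]
                      for j, (s, p) in enumerate(zip(seqs, starts)) if j != i]
            pssm = build_pssm(others)
            # Sample a new start in proportion to exp(log-odds score);
            # subtracting the maximum keeps math.exp from overflowing.
            scores = [score_window(pssm, seq, p)
                      for p in range(len(seq) - width + 1)]
            top = max(scores)
            weights = [math.exp(s - top) for s in scores]
            starts[i] = rng.choices(range(len(scores)), weights=weights)[0]
    return starts
```

For example, `best_hit(build_pssm(["ACD", "ACD", "ACE"]), "GGACDGG")` picks the window starting at index 2. Because the sampler is stochastic, a practical version would run several seeds and keep the highest-scoring alignment; that refinement is omitted here for brevity.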