Why highly expressed proteins evolve slowly.
Journal: 2006/March - Proceedings of the National Academy of Sciences of the United States of America
ISSN: 0027-8424
Abstract:
Much recent work has explored molecular and population-genetic constraints on the rate of protein sequence evolution. The best predictor of evolutionary rate is expression level, for reasons that have remained unexplained. Here, we hypothesize that selection to reduce the burden of protein misfolding will favor protein sequences with increased robustness to translational missense errors. Pressure for translational robustness increases with expression level and constrains sequence evolution. Using several sequenced yeast genomes, global expression and protein abundance data, and sets of paralogs traceable to an ancient whole-genome duplication in yeast, we rule out several confounding effects and show that expression level explains roughly half the variation in Saccharomyces cerevisiae protein evolutionary rates. We examine causes for expression's dominant role and find that genome-wide tests favor the translational robustness explanation over existing hypotheses that invoke constraints on function or translational efficiency. Our results suggest that proteins evolve at rates largely unrelated to their functions and can explain why highly expressed proteins evolve slowly across the tree of life.
Proc Natl Acad Sci U S A 102(40): 14338-14343

Program in Computation and Neural Systems and Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA 91125-4100; and Keck Graduate Institute, Claremont, CA 91711
To whom correspondence should be addressed. E-mail: drummond@alumni.princeton.edu.
Edited by Francisco J. Ayala, University of California, Irvine, CA, and approved August 11, 2005
Received 2005 May 16

Keywords: evolutionary rate, protein misfolding, yeast, translation errors, gene duplication

A central problem in molecular evolution is why proteins evolve at different rates. Protein evolutionary rates, quantified by the number of nonsynonymous nucleotide changes per site (dN) in the encoding genes, are routinely used to build phylogenetic trees, detect selection, find orthologous proteins among related species (1), and evaluate the functional importance of genes (2), yet we possess only hints of the biophysical cause of rate differences. Thirty years ago, Zuckerkandl (3) proposed that a protein's sequence will evolve at a rate primarily determined by the proportion of its sites involved in specific functions (or “functional density”). Although this proposal has gained wide acceptance (2), measurement of functional density remains problematic because residues may contribute to protein function in unpredictable ways, and arduous sequence-wide saturation mutagenesis and mutant characterization studies are required to ascertain these effects.

Instead, many recent studies have focused on other, more readily obtained, measures that may approximate functional density. For example, protein-protein interactions presumably constrain interfacial residues, and some reports indicate that highly interactive proteins evolve slowly (4). The intuition that a protein's overall functional importance should amplify the fitness costs of mutations at sites that make subtle functional contributions has been captured in analyses of how a gene's functional category (5, 6), its essentiality for organism survival (6-8), or the fitness effect of its deletion (or “dispensability”) (9, 10) correlate with evolutionary rate. In all cases, the effects under consideration explain only a small fraction (≈5% or less) of the observed variation in evolutionary rate as quantified by their squared correlation coefficients, r².
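
For reference, the squared correlation coefficient used as the yardstick here can be written out as below. This is the standard textbook definition, not a formula taken from the cited studies; x stands for any one predictor (for example, a dispensability score), y for evolutionary rate, and ŷ for the least-squares straight-line fit of y on x.

```latex
r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}},
\qquad
r^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
```

Read this way, a predictor with r² ≈ 0.05 leaves about 95% of the variance in evolutionary rate unaccounted for by a linear fit.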

Surprisingly, from bacteria to mammals, the best indicator of a protein's relative evolutionary rate is the expression level of the encoding gene, measured in mRNA transcripts per cell (5, 6, 11-14). Highly expressed proteins evolve slowly, accounting for as much as 34% of rate variation in yeast (5). Moreover, after expression level is controlled for, the remaining influence of protein-protein interactions and dispensability decreases or, in some datasets, vanishes completely (15-17). Expression level's disproportionate influence remains unexplained (5, 6, 16-20).

Significant questions have persisted about whether expression level truly determines evolutionary rate, because highly expressed proteins may possess unique structural or functional features that constrain their sequences. Paralogous gene pairs resulting from a whole-genome duplication (WGD) event, such as in the lineage of Saccharomyces cerevisiae (21), minimize such differences: homology ensures a similar structure, and the majority of yeast paralogs show little, if any, difference in function (22). Analyses of evolutionary rates among paralogs have, to date, confirmed only a small independent role for expression level. Among a set of 185 yeast paralog pairs, evolutionary rate and expression level in mRNA molecules per cell correlated (r² = 0.341), but the correlation of rate and expression differences between members of a paralogous pair was much smaller (r² = 0.046), and no significant tendency for the higher-expressed paralog to evolve slower was found (5). A recent study that proved the WGD in yeast (21) analyzed patterns of paralog evolutionary rates and concluded that they supported a widely cited model of evolution by gene duplication (23) in which one duplicate gene retains the ancestral function and evolves slowly, whereas the other duplicate gene evolves rapidly and acquires a new function. Such behavior would obscure the influence of other variables such as expression level on paralog evolutionary rates.
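
The three paralog comparisons described above (a pooled correlation across all genes, a correlation of within-pair differences, and a test of whether the higher-expressed paralog tends to evolve slower) can be sketched as follows. This is a minimal illustration, not the procedure used in the cited studies: the function names, the log transformation, the exact sign test, and the input layout are all assumptions made for the example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def sign_test_p(k, n):
    """Two-sided exact binomial p-value for k successes in n trials, p = 0.5."""
    tail = sum(math.comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def paralog_summary(pairs):
    """Summarize hypothetical paralog data.

    pairs: list of ((expr_a, dn_a), (expr_b, dn_b)) with positive values,
    where expr is an expression level and dn an evolutionary rate.
    Log-transforming both variables is an assumption of this sketch.
    """
    # Pooled analysis: every gene contributes one (expression, rate) point.
    log_expr = [math.log(e) for pair in pairs for e, _ in pair]
    log_dn = [math.log(d) for pair in pairs for _, d in pair]
    r2_pooled = pearson_r(log_expr, log_dn) ** 2

    # Paired analysis: differences within each pair. The a/b orientation
    # of a pair is arbitrary here and is left as given.
    d_expr = [math.log(a[0]) - math.log(b[0]) for a, b in pairs]
    d_dn = [math.log(a[1]) - math.log(b[1]) for a, b in pairs]
    r2_paired = pearson_r(d_expr, d_dn) ** 2

    # Sign test: count pairs in which the higher-expressed paralog has the
    # lower rate, skipping ties in either variable.
    informative = [(de, dd) for de, dd in zip(d_expr, d_dn) if de and dd]
    slower = sum(1 for de, dd in informative if (de > 0) != (dd > 0))
    n = len(informative)
    return r2_pooled, r2_paired, slower, n, sign_test_p(slower, n)
```

Calling `paralog_summary` on a list of pairs returns the pooled r², the paired r², the number of informative pairs in which the higher-expressed paralog has the lower rate, the number of informative pairs, and the sign-test p-value. Keeping the pooled and paired statistics side by side mirrors the contrast drawn in the text: a sizable pooled correlation can coexist with a weak within-pair one.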

Recently, several resources have become available that allow a more thorough analysis of these issues: a set of 900 S. cerevisiae paralogs derived from gene synteny and traceable to the WGD event (21), a global measurement of yeast protein abundances (24), and several additional yeast genome sequences (21, 25). Here, using this new information, we examine the strength, independence, and physical basis of expression-based constraints on protein sequence evolution. We carry out a systematic analysis designed to answer several questions. How strongly does expression constrain yeast protein evolution after controlling for structure and function? What role does functional differentiation play, compared with gene expression, in predicting the relative evolutionary rates of duplicate genes? And, what do these correlations reveal about underlying causes of evolutionary rate differences? We introduce a previously unexplored hypothesis to explain why highly expressed proteins evolve slowly and test this explanation against other causal hypotheses by using genome-wide data. Finally, we explore whether the selective pressure that we propose increases functional density and examine the biological costs underlying it.

Acknowledgments

This work was supported by National Institutes of Health National Research Service Award 5 T32 MH19138 (to D.A.D.) and a Howard Hughes Medical Institute Predoctoral Fellowship (to J.D.B.).

Notes

Author contributions: D.A.D. designed and performed research; D.A.D., J.D.B., C.A., C.O.W., and F.H.A. contributed new reagents/analytic tools; D.A.D. analyzed data; and D.A.D., J.D.B., C.A., C.O.W., and F.H.A. wrote the paper.

This paper was submitted directly (Track II) to the PNAS office.

Abbreviations: dN, number of nonsynonymous substitutions per site; dS, number of synonymous substitutions per site; CAI, codon adaptation index; WGD, whole-genome duplication.

References

  • 1. Wall, D. P., Fraser, H. B. & Hirsh, A. E. (2003) Bioinformatics 19, 1710-1711.
  • 2. Graur, D. & Li, W.-H. (2000) Fundamentals of Molecular Evolution (Sinauer, Sunderland, MA).
  • 3. Zuckerkandl, E. (1976) J. Mol. Evol. 7, 167-183.
  • 4. Fraser, H. B., Hirsh, A. E., Steinmetz, L. M., Scharfe, C. & Feldman, M. W. (2002) Science 296, 750-752.
  • 5. Pál, C., Papp, B. & Hurst, L. D. (2001) Genetics 158, 927-931.
  • 6. Rocha, E. P. & Danchin, A. (2004) Mol. Biol. Evol. 21, 108-116.
  • 7. Hurst, L. D. & Smith, N. G. (1999) Curr. Biol. 9, 747-750.
  • 8. Jordan, I. K., Rogozin, I. B., Wolf, Y. I. & Koonin, E. V. (2002) Genome Res. 12, 962-968.
  • 9. Hirsh, A. E. & Fraser, H. B. (2001) Nature 411, 1046-1049.
  • 10. Wall, D. P., Hirsh, A. E., Fraser, H. B., Kumm, J., Giaever, G., Eisen, M. B. & Feldman, M. W. (2005) Proc. Natl. Acad. Sci. USA 102, 5483-5488.
  • 11. Herbeck, J. T., Wall, D. P. & Wernegreen, J. J. (2003) Microbiology 149, 2585-2596.
  • 12. Sharp, P. M. (1991) J. Mol. Evol. 33, 23-33.
  • 13. Duret, L. & Mouchiroud, D. (2000) Mol. Biol. Evol. 17, 68-74.
  • 14. Subramanian, S. & Kumar, S. (2004) Genetics 168, 373-381.
  • 15. Bloom, J. D. & Adami, C. (2003) BMC Evol. Biol. 3, 21.
  • 16. Pál, C., Papp, B. & Hurst, L. D. (2003) Nature 421, 496-497.
  • 17. Hirsh, A. E. & Fraser, H. B. (2003) Nature 421, 497-498.
  • 18. Akashi, H. (2001) Curr. Opin. Genet. Dev. 11, 660-666.
  • 19. Akashi, H. (2003) Genetics 164, 1291-1303.
  • 20. Marais, G., Domazet-Loso, T., Tautz, D. & Charlesworth, B. (2004) J. Mol. Evol. 59, 771-779.
  • 21. Kellis, M., Birren, B. W. & Lander, E. S. (2004) Nature 428, 617-624.
  • 22. Seoighe, C. & Wolfe, K. H. (1999) Curr. Opin. Microbiol. 2, 548-554.
  • 23. Ohno, S. (1970) Evolution by Gene Duplication (Allen & Unwin, London).
  • 24. Ghaemmaghami, S., Huh, W. K., Bower, K., Howson, R. W., Belle, A., Dephoure, N., O'Shea, E. K. & Weissman, J. S. (2003) Nature 425, 737-741.
  • 25. Kellis, M., Patterson, N., Endrizzi, M., Birren, B. & Lander, E. S. (2003) Nature 423, 241-254.
  • 26. Holstege, F. C., Jennings, E. G., Wyrick, J. J., Lee, T. I., Hengartner, C. J., Green, M. R., Golub, T. R., Lander, E. S. & Young, R. A. (1998) Cell 95, 717-728.
  • 27. Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D. J. (1997) Nucleic Acids Res. 25, 3389-3402.
  • 28. Cho, R. J., Campbell, M. J., Winzeler, E. A., Steinmetz, L., Conway, A., Wodicka, L., Wolfsberg, T. G., Gabrielian, A. E., Landsman, D., Lockhart, D. J. & Davis, R. W. (1998) Mol. Cell 2, 65-73.
  • 29. Coghlan, A. & Wolfe, K. H. (2000) Yeast 16, 1131-1145.
  • 30. Sharp, P. M. & Li, W. H. (1987) Nucleic Acids Res. 15, 1281-1295.
  • 31. Thompson, J. D., Higgins, D. G. & Gibson, T. J. (1994) Nucleic Acids Res. 22, 4673-4680.
  • 32. Yang, Z. H. (1997) Comput. Appl. Biosci. 13, 555-556.
  • 33. Ihaka, R. & Gentleman, R. (1996) J. Comput. Graph. Stat. 5, 299-314.
  • 34. Akashi, H. (1994) Genetics 136, 927-935.
  • 35. Parker, J. (1989) Microbiol. Rev. 53, 273-298.
  • 36. Goldberg, A. L. (2003) Nature 426, 895-899.
  • 37. Bloom, J. D., Silberg, J. J., Wilke, C. O., Drummond, D. A., Adami, C. & Arnold, F. H. (2005) Proc. Natl. Acad. Sci. USA 102, 606-611.
  • 38. Ellis, R. J. & Pinheiro, T. J. (2002) Nature 416, 483-484.
  • 39. Dong, H., Nilsson, L. & Kurland, C. G. (1995) J. Bacteriol. 177, 1497-1504.
  • 40. Greenbaum, D., Colangelo, C., Williams, K. & Gerstein, M. (2003) Genome Biol. 4, 117.
  • 41. Benjamini, Y. & Hochberg, Y. (1995) J. R. Stat. Soc. B 57, 289-300.
  • 42. Gu, Z., Steinmetz, L. M., Gu, X., Scharfe, C., Davis, R. W. & Li, W. H. (2003) Nature 421, 63-66.
  • 43. Pál, C., Papp, B. & Hurst, L. D. (2001) Mol. Biol. Evol. 18, 2323-2326.
  • 44. Drummond, D. A., Raval, A. & Wilke, C. O. (2005) arXiv: q-bio.PE/0506011.
  • 45. Guo, H. H., Choe, J. & Loeb, L. A. (2004) Proc. Natl. Acad. Sci. USA 101, 9205-9210.
  • 46. Bucciantini, M., Giannoni, E., Chiti, F., Baroni, F., Formigli, L., Zurdo, J., Taddei, N., Ramponi, G., Dobson, C. M. & Stefani, M. (2002) Nature 416, 507-511.
  • 47. Precup, J. & Parker, J. (1987) J. Biol. Chem. 262, 11351-11355.
  • 48. Spreitzer, R. J. (1993) Annu. Rev. Plant Physiol. Plant Mol. Biol. 44, 411-434.
  • 49. Rokas, A., Williams, B. L., King, N. & Carroll, S. B. (2003) Nature 425, 798-804.
  • 50. Kurtzman, C. P. & Robnett, C. J. (2003) FEMS Yeast Res. 3, 417-432.