Patterns of nucleotide substitution, insertion and deletion in the human genome inferred from pseudogenes.
Journal: 2004/February - Nucleic Acids Research
ISSN: 1362-4962
PUBMED: 12954770
Abstract:
Nucleotide substitution, insertion and deletion (indel) events are the major driving forces that have shaped genomes. Using the recently identified human ribosomal protein (RP) pseudogene sequences, we have thoroughly studied DNA mutation patterns in the human genome. We analyzed a total of 1726 processed RP pseudogene sequences, comprising more than 700 000 bases. To be sure to differentiate the sequence changes occurring in the functional genes during evolution from those occurring in pseudogenes after they were fixed in the genome, we used only pseudogene sequences originating from parts of RP genes that are identical in human and mouse. Overall, we found that nucleotide transitions are more common than transversions, by roughly a factor of two. Moreover, the substitution rates amongst the 12 possible nucleotide pairs are not homogeneous as they are affected by the type of immediately neighboring nucleotides and the overall local G+C content. Finally, our dataset is large enough that it has many indels, thus allowing for the first time statistically robust analysis of these events. Overall, we found that deletions are about three times more common than insertions (3740 versus 1291). The frequencies of both these events follow characteristic power-law behavior associated with the size of the indel. However, unexpectedly, the frequency of 3 bp deletions (in contrast to 3 bp insertions) violates this trend, being considerably higher than that of 2 bp deletions. The possible biological implications of such a 3 bp bias are discussed.
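The quantities named in the abstract (the transition/transversion ratio, counts over the 12 directed substitution pairs, and the power-law exponent of the indel size distribution) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the toy sequences and the log-log least-squares fit are assumptions chosen for illustration, and only the indel totals (3740 and 1291) come from the abstract.

```python
import math
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Return 'transition' (purine<->purine or pyrimidine<->pyrimidine) or 'transversion'."""
    if ref == alt:
        raise ValueError("ref and alt are identical; not a substitution")
    pair = {ref, alt}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def count_substitutions(ancestral, derived):
    """Count directed substitutions (ref, alt) between two aligned sequences.

    Hypothetical helper: columns containing gaps or ambiguous bases are skipped.
    """
    if len(ancestral) != len(derived):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter()
    for ref, alt in zip(ancestral, derived):
        if ref in "ACGT" and alt in "ACGT" and ref != alt:
            counts[(ref, alt)] += 1
    return counts


def ts_tv_ratio(counts):
    """Ratio of transition to transversion counts."""
    ts = sum(n for (ref, alt), n in counts.items()
             if classify_substitution(ref, alt) == "transition")
    tv = sum(counts.values()) - ts
    return ts / tv if tv else float("inf")


def power_law_exponent(sizes, freqs):
    """Estimate b in freq ~ size**(-b) by least squares on log-log values.

    Assumption: a simple log-log regression, not necessarily the fit used in the paper.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope


# Toy aligned sequences (made up for illustration, not data from the study).
toy_counts = count_substitutions("ACGTACGT", "GCGTATGA")
toy_ratio = ts_tv_ratio(toy_counts)

# Deletion-to-insertion ratio from the totals reported in the abstract.
deletion_insertion_ratio = 3740 / 1291
```

Only 4 of the 12 directed substitution pairs are transitions, so an observed transition:transversion ratio near two corresponds to each transition pair occurring, on average, about four times as often as each transversion pair.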
Nucleic Acids Res 31(18): 5338-5348

Department of Molecular Biophysics and Biochemistry, Yale University, 266 Whitney Avenue, New Haven, CT 06520-8114, USA
To whom correspondence should be addressed. Tel: +1 203 432 6105; Fax: +1 360 838 7861; Email: mark.gerstein@yale.edu
Received 2003 Jun 9; Revised 2003 Jul 14; Accepted 2003 Jul 31.

ACKNOWLEDGEMENTS

Z.Z. thanks Paul Harrison for scintillating discussions and Duncan Milburn, Yin Liu and Nat Echols for computational assistance. We also thank the two anonymous reviewers for helpful suggestions. M.G. acknowledges an NIH CEGS grant (P50HG02357-01) and the Keck Foundation for financial support.
