Repeat polymorphisms within gene regions: phenotypic and evolutionary implications.
Journal: 2000/August - American Journal of Human Genetics
ISSN: 0002-9297
Abstract:
We have developed an algorithm that predicted 11,265 potentially polymorphic tandem repeats within transcribed sequences. We estimate that 22% (2,207/9,717) of the annotated clusters within UniGene contain at least one potentially polymorphic locus. Our predictions were tested by allelotyping a panel of approximately 30 individuals for 5% of these regions, confirming polymorphism for more than half the loci tested. Our study indicates that tandem-repeat polymorphisms in genes are more common than is generally believed. Approximately 8% of these loci are within coding sequences and, if polymorphic, would result in frameshifts. Our catalogue of putative polymorphic repeats within transcribed sequences comprises a large set of potentially phenotypic or disease-causing loci. In addition, from the anomalous character of the repetitive sequences within unannotated clusters, we also conclude that the UniGene cluster count substantially overestimates the number of genes in the human genome. We hypothesize that polymorphisms in repeated sequences occur with some baseline distribution, on the basis of repeat homogeneity, size, and sequence composition, and that deviations from that distribution are indicative of the nature of selection pressure at that locus. We find evidence of selective maintenance of the ability of some genes to respond very rapidly, perhaps even on intragenerational timescales, to fluctuating selective pressures.
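The abstract does not describe the prediction algorithm itself, so the following is only a minimal sketch of the general idea, not the authors' method. It scans a transcript for perfect tandem repeats with a regular-expression backreference and flags repeats whose unit length is not a multiple of three, since a copy-number change at such a locus inside coding sequence would shift the reading frame. The function name `find_tandem_repeats`, the thresholds `max_unit` and `min_copies`, and the example sequence are all assumptions chosen for illustration, not values from the study.

```python
import re


def find_tandem_repeats(seq, max_unit=6, min_copies=4):
    """Find perfect tandem repeats in a DNA sequence (illustrative only).

    Returns a sorted list of (start, unit, copies, frameshift_risk) tuples.
    frameshift_risk is True when the unit length is not a multiple of 3,
    i.e. gaining or losing one copy inside coding sequence would change
    the reading frame. Thresholds are hypothetical defaults.
    """
    seq = seq.upper()
    hits = []
    for unit_len in range(1, max_unit + 1):
        # A unit of unit_len bases followed by at least (min_copies - 1)
        # further exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip units that are themselves built from a shorter motif
            # (e.g. "CACA" is reported once, as "CA").
            if any(
                unit == unit[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            copies = len(m.group(0)) // unit_len
            hits.append((m.start(), unit, copies, unit_len % 3 != 0))
    return sorted(hits)


# Hypothetical transcript fragment: a (CAG)6 tract and a (CA)5 tract.
example = "ATGACT" + "CAG" * 6 + "TTC" + "CA" * 5 + "GGT"
print(find_tandem_repeats(example))
# → [(6, 'CAG', 6, False), (27, 'CA', 5, True)]
```

In this example the trinucleotide tract would change protein length without disturbing the frame, whereas the dinucleotide tract is the kind of locus that would produce a frameshift if it lay within coding sequence. A realistic predictor would also need to tolerate imperfect repeats and to weigh repeat homogeneity, size, and sequence composition, which this sketch does not attempt.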
Am J Hum Genet 67(2): 345-356

Programs in Genetics and Development and Molecular Biophysics, Southwestern Graduate School of Biomedical Sciences, Hamon Center for Therapeutic Oncology Research, and Departments of Biochemistry, Internal Medicine, Molecular Biology, and Pharmacology, McDermott Center for Human Growth and Development, Center for Biomedical Inventions, and the Ryburn Cardiac Center, The University of Texas Southwestern Medical Center, Dallas
Address for correspondence and reprints: Dr. Harold R. (Skip) Garner, 5323 Harry Hines Boulevard, Dallas, TX 75390-8591. E-mail: garner@utsw.swmed.edu
Received 2000 Apr 10; Accepted 2000 Jun 2.

Acknowledgments

This research was funded by Special Projects Open Research Environment grant P50CA70907, the Patrick O'Brien Montgomery Distinguished Chair, and the D.W. Reynolds Cardiovascular Clinical Research Center. We would like to thank Hewlett-Packard for the loan of an Exemplar supercomputer.


Electronic-Database Information

Accession numbers (human databases): Y00285, D86407, M60052, AF047437, D83492, Y11525, AF032886, M60315, X82209, U60325, U49020, AF017789, M60052, AF060231, AF013956, D86550, AF042838, T62484, T63962, R42196, X78261, T70173, R12160, T47177, X55313, L08835, M64347, D14838, M55047, X70811, U36798, U36336, K02402, X04412, U75285, X78520, U68723, AF065482, M36089, L04489, AB015132, X15949, AF022654, U38276, S83513, U29589, X06374, AB002454, D16532, U92436, U38810, AL021155, X60188, U43292, M75866, M73980, U94333, U21858, D55655, U34962, U47741, U02031, U23752, AF002715, AF010403, S62539, AB005216, AB011792, X05299, M55514, L06147, X05299, AF053944, U68063, Y00764, X53416, AF008192, U13616, AF051946, U88153, T87413, R33865, T62835, T80553, T70304, T60175, L14837, Y00285, U17327, NM_004691, X02812, M14764, AB010710, M12783, U67784, U52152, Y00815, M74525, AF075292, U58334, X02812, L08488, and X17360.