Human non-synonymous SNPs: server and survey

European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany; Max-Delbrueck Center for Molecular Medicine, Robert-Roessle-Strasse 10, 13122 Berlin, Germany; and Engelhardt Institute of Molecular Biology, Vavilova 32, 119991 Moscow, Russia

To whom correspondence should be addressed. Present address: Genetics Division, Department of Medicine, Brigham & Women’s Hospital and Harvard Medical School, Boston, MA 02115, USA. Tel: +1 617 7325856; Fax: +1 617 7325123; Email: ssunyaev@rics.bwh.harvard.edu

Received 2002 Mar 19; Revised 2002 Jul 8; Accepted 2002 Jul 8.

Abstract

Human single nucleotide polymorphisms (SNPs) are the most frequent type of DNA variation in human populations. One of the main goals of SNP research is to understand the genetics of human phenotypic variation and, especially, the genetic basis of complex human diseases. Non-synonymous coding SNPs (nsSNPs) form a group of SNPs that, together with SNPs in regulatory regions, are believed to have the greatest impact on phenotype. Here we present a World Wide Web server that predicts the effect of an nsSNP on protein structure and function. Applying the prediction method to the publicly available SNP database HGVbase yielded a dataset of nsSNPs with predicted functionality. We used this dataset to compare how various structural and functional characteristics of amino acid substitutions contribute to the phenotypic effects of nsSNPs. We also studied how selective pressure depends on the structural and functional properties of proteins. In our dataset, the selective pressure against deleterious SNPs depends on the molecular function of the protein but is insensitive to several other protein features we considered. The strongest selective pressure was detected for proteins involved in transcription regulation.


Each row corresponds to one rule, which may consist of several parts connected by a logical AND. For a given substitution, all rules are tried one by one, yielding a prediction of its functional effect: benign, possibly damaging or probably damaging. If no evidence of a damaging effect is found, the substitution is classified as benign.

BINDING, ACT_SITE, SITE, MOD_RES, LIPID, METAL, SE_CYS (SwissProt feature table terms).

DISULFID, THIOLEST, THIOETH (SwissProt feature table terms).
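The rule evaluation described in the legend can be sketched in Python. This is a minimal illustration under stated assumptions, not the server's implementation: the `Substitution` fields (`features`, `score_diff`, `buried`), the helper `score_above`, and every condition and threshold in `RULES` are hypothetical placeholders rather than the table's actual rows. Only the three outcome classes, the AND-combination of rule parts, the benign default and the SwissProt feature terms come from the text. Because the legend says all rules are tried, the sketch evaluates every rule and, as a further assumption, keeps the most severe outcome when several rules match.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set

# SwissProt feature table terms listed in the table footnotes.
SITE_TERMS = {"BINDING", "ACT_SITE", "SITE", "MOD_RES", "LIPID", "METAL", "SE_CYS"}
BOND_TERMS = {"DISULFID", "THIOLEST", "THIOETH"}

# Outcome classes ordered from least to most severe.
SEVERITY = ["benign", "possibly damaging", "probably damaging"]


@dataclass
class Substitution:
    """Hypothetical annotations of one amino acid substitution."""
    features: Set[str] = field(default_factory=set)  # feature terms covering the residue
    score_diff: Optional[float] = None  # hypothetical conservation score difference
    buried: bool = False  # hypothetical structural flag


Condition = Callable[[Substitution], bool]


@dataclass
class Rule:
    parts: List[Condition]  # every part must hold (logical AND)
    outcome: str

    def matches(self, sub: Substitution) -> bool:
        return all(part(sub) for part in self.parts)


def score_above(threshold: float) -> Condition:
    """Hypothetical helper: true if a score is present and exceeds the threshold."""
    return lambda s: s.score_diff is not None and s.score_diff > threshold


def predict(sub: Substitution, rules: List[Rule]) -> str:
    """Try every rule and keep the most severe outcome among those that match.

    If no rule matches, there is no evidence of damage, so return "benign".
    """
    result = "benign"
    for rule in rules:
        if rule.matches(sub) and SEVERITY.index(rule.outcome) > SEVERITY.index(result):
            result = rule.outcome
    return result


# Illustrative rules only: conditions and thresholds are placeholders.
RULES = [
    Rule([lambda s: bool(s.features & SITE_TERMS)], "probably damaging"),
    Rule([lambda s: bool(s.features & BOND_TERMS)], "probably damaging"),
    Rule([score_above(2.0)], "probably damaging"),
    Rule([score_above(1.5)], "possibly damaging"),
    Rule([lambda s: s.buried, score_above(1.0)], "possibly damaging"),  # two-part AND rule
]

if __name__ == "__main__":
    print(predict(Substitution(features={"ACT_SITE"}), RULES))  # probably damaging
    print(predict(Substitution(score_diff=1.2, buried=True), RULES))  # possibly damaging
    print(predict(Substitution(score_diff=0.4), RULES))  # benign
```

Representing each rule as a list of predicates mirrors the legend's "several parts connected by logical AND": a rule fires only when `all` of its parts hold. A substitution with `score_diff=2.5` matches both the 2.0 and the 1.5 placeholder rules, and the severity ordering resolves this to "probably damaging".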

ACKNOWLEDGEMENTS

The authors thank Evgenia Kriventseva for her help in working with the GO database. S.S. thanks Alexey Kondrashov for useful discussions.

