The EMBL nucleotide sequence database
Abstract
The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/) is maintained at the European Bioinformatics Institute (EBI) in an international collaboration with the DNA Data Bank of Japan (DDBJ) and GenBank at the NCBI (USA). Data is exchanged amongst the collaborating databases on a daily basis. The major contributors to the EMBL database are individual authors and genome project groups. Webin is the preferred web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via FTP, email and World Wide Web interfaces. EBI’s Sequence Retrieval System (SRS), a network browser for databanks in molecular biology, integrates and links the main nucleotide and protein databases plus many specialized databases. For sequence similarity searching, a variety of tools (e.g. Blitz, FASTA, BLAST) is available, allowing external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT.
