Human Chromosome 7: DNA Sequence and Biology
Abstract
DNA sequence and annotation of the entire human chromosome 7, encompassing nearly 158 million nucleotides of DNA and 1917 gene structures, are presented. To generate a higher-order description, additional structural features such as imprinted genes, fragile sites, and segmental duplications were integrated at the level of the DNA sequence with medical genetic data, including 440 chromosome rearrangement breakpoints associated with disease. This approach enabled the discovery of candidate genes for developmental diseases, including autism.
With the advent of the Human Genome Project (HGP), a wealth of resources including genetic (1), physical (2, 3), gene (4), and draft DNA sequence maps (5, 6) have facilitated the discovery of more than 360 disease-associated genes and loci on chromosome 7 (table S1).
Here we present a comprehensive assembly of 157,953,789 nucleotides (nt) of DNA covering human chromosome 7. About 85% of the content was derived from a subset of unpublished Celera whole-genome scaffolds for chromosome 7 (7) based on updates of previous work (5). Another 15% was from new or updated clone-based sequences from the International Human Genome Sequencing Consortium (notably the Washington University Genome Sequencing Center) and other sources (supporting online text) (tables S2 and S3). The assembly (named CRA_TCAGchr7.v1) is available at a public Web site (www.chr7.org/) and in GenBank (7). To maximize the utility of the sequence for discovery, we incorporated biological and medically relevant features from all available databases, the literature, and our data (7). Wherever possible, computer-based annotations of the sequence were examined manually and validated experimentally. Moreover, we included patient analysis as an aspect of the sequence annotation to increase knowledge of the function and regulation of genes. The Generic Model Organism Database (8) and its Genome Browser function were implemented to display all mapping, sequencing, structural, and clinical data to provide a mechanism and dynamic platform for human chromosome 7 annotation.
The assembled sequences were positioned to cytogenetic bands on chromosome 7 by fluorescence in situ hybridization (FISH) with 1440 genomic clones (7). The FISH resource also assisted in confirming order and copy number in chromosomal regions containing low-copy or complex repeats (9, 10). For the 770 bacterial genomic clones displayed in the Genome Browser, FISH experiments were reproduced more than once in at least two laboratories to allow accurate cytogenetic boundaries to be established. The sequence assembly reached both telomere ends and encompassed the apparent junction sequences between the euchromatic arms and the D7Z2 and D7Z1 centromeric satellites on 7p and 7q, respectively. Because the centromere is polymorphic (ranging in size from 1500 to 3800 kb at D7Z1 and 100 to 500 kb at D7Z2) (11, 12), 2,700,000 nt were substituted to represent an average-sized chromosome 7.
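As a rough consistency check on the centromere placeholder, the short Python sketch below sums the reported size ranges of the two satellite arrays and tests whether 2,700,000 nt falls inside the combined range. It assumes, and the text does not state this explicitly, that the placeholder stands in for both arrays together; the names `combined_range_kb` and `PLACEHOLDER_NT` are illustrative only.

```python
# Reported polymorphic size ranges in kilobases (from the text, refs 11 and 12).
D7Z1_KB = (1500, 3800)
D7Z2_KB = (100, 500)

# Placeholder length substituted for the centromere (from the text).
PLACEHOLDER_NT = 2_700_000


def combined_range_kb(*ranges):
    """Sum the lower and upper bounds of several (low, high) ranges."""
    low = sum(lo for lo, _ in ranges)
    high = sum(hi for _, hi in ranges)
    return low, high


# ASSUMPTION: the placeholder represents D7Z1 and D7Z2 combined.
low_kb, high_kb = combined_range_kb(D7Z1_KB, D7Z2_KB)
placeholder_kb = PLACEHOLDER_NT / 1000
within_range = low_kb <= placeholder_kb <= high_kb

print(f"combined range: {low_kb}-{high_kb} kb")
print(f"placeholder: {placeholder_kb:.0f} kb, within range: {within_range}")
```

Under that assumption the combined arrays span 1600 to 4300 kb, and the 2700 kb placeholder sits inside the range.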
We tested all available genomic data against our assembly, including the latest National Center for Biotechnology Information (NCBI) chromosome 7 sequence database (Build 31) (supporting online text). Using the PatternHunter program (13) to compare CRA_TCAGchr7.v1 and Build 31, we found (i) a total of 1,186,913 nt of unmatched sequence between the assemblies, (ii) 132 other sites (encompassing 508,332 nt) where different sequences were found at the same relative chromosomal positions (termed sequence variations), and (iii) 10 equivalent DNA segments placed in an inverted orientation between the two assemblies (Fig. 1; figs. S1 and S2, table S4). The differences detected could be due to rearrangements arising during cloning, assembly mistakes, or polymorphism between the source chromosomal DNAs (no correlation was observed between inverted regions and known genomic polymorphism or discrepancies in genetic maps).
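To put these differences in proportion and to illustrate how such categories can be tallied, the following Python sketch classifies a list of pairwise alignment blocks into unmatched sequence and inverted segments, then expresses the reported totals as a percentage of the 157,953,789-nt assembly. The `Block` record, the `summarize` function, and the toy coordinates are hypothetical; they do not reproduce PatternHunter's output format or the authors' actual pipeline.

```python
from dataclasses import dataclass

ASSEMBLY_LENGTH_NT = 157_953_789  # CRA_TCAGchr7.v1 length, from the text


@dataclass(frozen=True)
class Block:
    """Hypothetical alignment block between assembly A and assembly B."""
    a_start: int  # start on assembly A (0-based, inclusive)
    a_end: int    # end on assembly A (exclusive)
    strand: str   # "+" if collinear, "-" if inverted relative to A


def summarize(blocks, a_length):
    """Return (unmatched_nt_on_A, inverted_blocks).

    Overlapping blocks are counted once when measuring coverage of A.
    """
    ordered = sorted(blocks, key=lambda b: b.a_start)
    covered = 0
    cursor = 0  # right edge of the region already counted
    for b in ordered:
        start = max(b.a_start, cursor)
        if b.a_end > start:
            covered += b.a_end - start
            cursor = b.a_end
    inverted = [b for b in ordered if b.strand == "-"]
    return a_length - covered, inverted


def percent_of_assembly(nt, total=ASSEMBLY_LENGTH_NT):
    """Express a nucleotide count as a percentage of the assembly length."""
    return 100.0 * nt / total


# Toy example with made-up coordinates.
toy_blocks = [Block(0, 100, "+"), Block(150, 250, "-"), Block(240, 300, "+")]
unmatched, inverted = summarize(toy_blocks, a_length=320)
print(unmatched, len(inverted))

# Totals reported in the text, as a share of the assembly.
print(f"unmatched: {percent_of_assembly(1_186_913):.2f}%")
print(f"sequence variations: {percent_of_assembly(508_332):.2f}%")
```

On the reported figures, unmatched sequence amounts to roughly 0.75% of the assembly and the 132 sequence-variation sites to roughly 0.32%.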
Footnotes
www.sciencemag.org/cgi/content/full/1083423/DC1
Materials and Methods
SOM Text
References
References and Notes
- 1. Dib C, et al. Nature. 1996;380:152.
- 2. Kunz J, et al. Genomics. 1994;22:439.
- 3. Bouffard GG, et al. Genome Res. 1997;7:59.
- 4. Schuler GD, et al. Science. 1996;274:540.
- 5. Venter JC, et al. Science. 2001;291:1304.
- 6. International Human Genome Sequencing Consortium. Nature. 2001;409:860.
- 7. Materials and methods are available as on Science Online. The sequence assembly is at and in the Third Party Annotation Section of the DDBJ/EMBL/GenBank databases under the accession number TPA: BL000001. The scaffolds are in DDBJ/EMBL/GenBank under the project accession number AACC00000000. The version described in this paper is the first version, AACC01000000. Individual accession numbers of the scaffolds are AACC01000001, AACC01000002, AACC01000003, AACC01000004, AACC01000005, AACC01000006, AACC01000007, AACC01000008, AACC01000009, AACC01000010, AACC01000011, AACC01000012, AACC01000013, AACC01000014, AACC01000015, AACC01000016, AACC01000017, AACC01000018, AACC01000019, AACC01000020, AACC01000021, AACC01000022, AACC01000023, AACC01000024, AACC01000025, and AACC01000026. The annotation data and analyses based on the CRA_TCAGchr7.v1 assembly described in this paper are shown (and are archived) as the March 2003 database freeze (see ). Additional annotations or updates to the sequence assembly will be available as subsequent freezes. The Washington University Genome Sequencing Center has also produced an assembly and analysis of human chromosome 7 (L. Hillier et al., Nature, in press).
- 8. Stein LD, et al. Genome Res. 2002;10:1599.
- 9. Osborne LR, et al. Genomics. 1997;45:402.
- 10. Osborne LR, et al. Nature Genet. 2001;29:321.
- 11. Wevrick R, Willard HF. Nucleic Acids Res. 1991;19:2295.
- 12. de la Puente A, et al. Cytogenet Cell Genet. 1998;83:176.
- 13. Ma B, Tromp J, Li M. Bioinformatics. 2002;18:440.
- 14. Mural RJ, et al. Science. 2002;296:1661.
- 15. Pevzner P, Tesler G. Genome Res. 2003;13:37.
- 16. Mouse Genome Sequencing Consortium. Nature. 2002;420:520.
- 17. The Fantom Consortium. Nature. 2002;420:523.
- 18. Heilig R, et al. Nature. 2003;421:601.
- 19. Dunham I, et al. Nature. 1999;402:489.
- 20. Deloukas P, et al. Nature. 2001;414:865.
- 21. Hattori M, et al. Nature. 2000;405:311.
- 22. Boocock GR, et al. Nature Genet. 2003;33:97.
- 23. Cheung J, et al. Genome Biology. 2003;4:R25.
- 24. Bailey JA, et al. Science. 2002;297:1003.
- 25. Merlo GR, et al. Genesis. 2002;33:97.
- 26. Robledo RF, Rajan L, Li X, Lufkin T. Genes Dev. 2002;16:1089.
- 27. International Molecular Genetic Study of Autism Consortium. Human Mol Genet. 1998;3:571.
- 28. Lai CS, et al. Am J Hum Genet. 2000;67:357.
- 29. Lai CS, Fisher SE, Hurst JA, Vargha-Khadem F, Monaco AP. Nature. 2001;413:519.
- 30. Scherer SW, et al. Hum Mol Genet. 1994;3:1345.
- 31. Kobayashi K, et al. Nature Genet. 1999;2:159.