Science 300(5620): 767-772

PMC: PMC2882961

PMID: 12690205

Human Chromosome 7: DNA Sequence and Biology

Layla Parker-Katiraee

Jennifer Skaug, and 81 other authors

^{Department of Genetics and Genomic Biology, The Hospital for Sick Children, Toronto, Ontario, Canada, M5G 1X8}

^{Division of Clinical and Metabolic Genetics, The Hospital for Sick Children, Toronto, Ontario, Canada, M5G 1X8}

^{The Child Development Centre, The Hospital for Sick Children, Toronto, Ontario, Canada, M5G 1X8}

^{Division of Neurology, Department of Paediatrics, The Hospital for Sick Children, Toronto, Ontario, Canada, M5G 1X8}

^{Department of Molecular and Medical Genetics, University of Toronto, Toronto, Ontario, Canada, M5S 1A8}

^{Department of Medicine, University of Toronto, Toronto, Ontario, Canada, M5S 1A8}

^{Discipline of Genetics, Faculty of Medicine, Memorial University of Newfoundland, St. John’s, Newfoundland, Canada, A1B 3V6}

^{Celera Genomics, Rockville, MD 20850, USA}

^{Hamilton Health Sciences Centre and McMaster University, Hamilton, Ontario, Canada, L8N 3Z5}

^{Department of Psychiatry and Behavioural Neurosciences, McMaster University, Hamilton, Ontario, Canada, L8N 3Z5}

^{Department of Pediatrics, University of Toronto, Toronto, Ontario, Canada, M5G 1X8}

^{Division of Human Genetics and Molecular Biology, The Children’s Hospital of Philadelphia, Philadelphia, PA 19104–4301, USA}

^{University of Phoenix Genetics Program, Phoenix, AZ 85016, USA}

^{Department of Medical Genetics, University of Wisconsin-Madison, Madison, WI 53706, USA}

^{Section of Cancer Genetics, Institute of Cancer Research, Sutton, Surrey, SM2 5NG, UK}

^{Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada, V6H 3N1}

^{Wayne State University School of Medicine, Detroit, MI 48202, USA}

^{European Institute of Oncology, Department of Experimental Oncology, 20141 Milan, Italy}

^{Firc Institute for Molecular Oncology, Cancer Genetics Unit, 20134 Milan, Italy}

^{Universita di Roma Tor Vergata, Dipartimento di Biopatologia e Diagnostica per Immagini, 00133 Rome, Italy}

^{Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA}

^{Department of Genetics, Yale University School of Medicine, New Haven, CT 06520–8005, USA}

^{Department of Obstetrics, Gynecology and Reproductive Biology, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA}

^{Department of Pathology, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA}

^{Department of Neurology, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA}

^{Harvard Partners Center for Genetics and Genomics, Harvard Medical School, Boston, MA 02115, USA}

^{Molecular Neurogenetics Laboratory, Massachusetts General Hospital, Harvard Medical School, Charlestown, MA 02129, USA}

^{Department of Pediatrics, The Children’s Hospital, Harvard Medical School, Boston, MA 02115, USA}

^{Department of Genetics, North York General Hospital, Toronto, Ontario, Canada, M2K 1E1}

^{Division of Medical Genetics, A. I. duPont Hospital for Children, Wilmington, DE 19899, USA}

^{Prenatal Diagnosis Program and Department of Laboratory Medicine and Pathobiology, University Health Network, The University of Toronto, Toronto, Ontario, Canada, M5G 1X5}

^{Medizinisches Zentrum für Humangenetik der Universität Marburg, D35037 Marburg, Germany}

^{Department of Biosciences, Karolinska Institute, at Novum and Clinical Research Centre, Huddinge University Hospital, S-141 57 Stockholm, Sweden}

^{Program in Genes and Disease, Centre for Genomic Regulation, 08003 Barcelona, Catalonia, Spain}

^{Department of Biology, University of Victoria, Victoria, British Columbia, Canada, V8W 3N5}

^{MRC Molecular Haematology Unit, Institute of Molecular Medicine, John Radcliffe Hospital, Oxford OX3 9DS, UK}

^{Department of Fetal and Maternal Medicine, Institute of Reproductive and Developmental Biology, Imperial College, Faculty of Medicine, Hammersmith Campus, London W12 0NN, UK}

^{Division of Endocrinology, Department of Medicine, University Health Network, University of Toronto, Toronto, Ontario, Canada, M5G 2C4}

^{Department of Genetics, The Life Sciences Institute, The Hebrew University, Jerusalem, 91904 Israel}

^{Institute of Medical Biology and Human Genetics, Karl-Franzens University of Graz, A-8010 Graz, Austria}

^{Department of Haematology, Royal Bournemouth Hospital, Bournemouth, BH7 7DW UK}

^{Department of Internal Medicine III, University Hospital of Ulm, Ulm, Germany, 89081}

^{Centre for Addiction and Mental Health, Clarke Institute and Department of Psychiatry, University of Toronto, Toronto, Ontario, Canada, M5T 1R8}

^{To whom correspondence should be addressed.} E-mail: steve@genet.sickkids.on.ca

^{Present address: The University of Hong Kong, Pokfulam Road, Hong Kong.}

Abstract

DNA sequence and annotation of the entire human chromosome 7, encompassing nearly 158 million nucleotides of DNA and 1917 gene structures, are presented. To generate a higher-order description, additional structural features such as imprinted genes, fragile sites, and segmental duplications were integrated at the level of the DNA sequence with medical genetic data, including 440 chromosome rearrangement breakpoints associated with disease. This approach enabled the discovery of candidate genes for developmental diseases, including autism.

With the advent of the Human Genome Project (HGP), a wealth of resources including genetic (1), physical (2, 3), gene (4), and draft DNA sequence maps (5, 6) have facilitated the discovery of more than 360 disease-associated genes and loci on chromosome 7 (table S1).

Here we present a comprehensive assembly of 157,953,789 nucleotides (nt) of DNA covering human chromosome 7. About 85% of the content was derived from a subset of unpublished Celera whole-genome scaffolds for chromosome 7 (7) based on updates of previous work (5). Another 15% was from new or updated clone-based sequences from the International Human Genome Sequencing Consortium (notably the Washington University Genome Sequencing Center) and other sources (supporting online text) (tables S2 and S3). The assembly (named CRA_TCAGchr7.v1) is available at a public Web site (www.chr7.org/) and in GenBank (7). To maximize the utility of the sequence for discovery, we incorporated biological and medically relevant features from all available databases, the literature, and our own data (7). Wherever possible, computer-based annotations of the sequence were examined manually and validated experimentally. Moreover, we included patient analysis as an aspect of the sequence annotation to increase knowledge of the function and regulation of genes. The Generic Model Organism Database (8) and its Genome Browser function were implemented to display all mapping, sequencing, structural, and clinical data, providing a dynamic platform for human chromosome 7 annotation.

The assembled sequences were positioned to cytogenetic bands on chromosome 7 by fluorescence in situ hybridization (FISH) with 1440 genomic clones (7). The FISH resource also assisted in confirming order and copy number in chromosomal regions containing low-copy or complex repeats (9, 10). For the 770 bacterial genomic clones displayed in the Genome Browser, FISH experiments were reproduced more than once in at least two laboratories to allow accurate cytogenetic boundaries to be established. The sequence assembly reached both telomere ends and encompassed the apparent junction sequences between the euchromatic arms, and the D7Z2 and D7Z1 centromeric satellites on 7p and 7q, respectively. Because the centromere is polymorphic (ranging in size from 1500 to 3800 kb at D7Z1 and 100 to 500 kb at D7Z2) (11, 12), 2,700,000 nucleotides (nt) were substituted to represent an average-sized chromosome 7.

We tested all available genomic data against our assembly, including the latest National Center for Biotechnology Information (NCBI) chromosome 7 sequence database (Build 31) (supporting online text). Using the PatternHunter program (13) to compare CRA_TCAGchr7.v1 and Build 31, we found (i) a total of 1,186,913 nt of unmatched sequence between the assemblies, (ii) 132 other sites (encompassing 508,332 nt) where different sequences were found at the same relative chromosomal positions (termed sequence variations), and (iii) 10 equivalent DNA segments placed in an inverted orientation between the two assemblies (Fig. 1; figs. S1 and S2, table S4). The differences detected could be due to rearrangements arising during cloning, assembly mistakes, or polymorphism between the source chromosomal DNAs (no correlation was observed between inverted regions and known genomic polymorphism or discrepancies in genetic maps).
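The three categories of difference described above can be illustrated with a minimal sketch. It assumes that a pairwise comparison has already produced a list of aligned blocks; it is not the PatternHunter procedure, and the `Block` record, the function names, and the toy coordinates are hypothetical. Only the totals passed to `percent_of_assembly` (1,186,913 nt, 508,332 nt, and the 157,953,789-nt assembly length) come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """One segment aligned between assembly A and assembly B.

    Hypothetical record; coordinates are 0-based and half-open.
    """
    a_start: int
    a_end: int
    b_start: int
    b_end: int
    strand: str  # '+' same orientation, '-' inverted in assembly B


def unmatched_intervals(blocks, a_length):
    """Return the intervals of assembly A not covered by any aligned block."""
    gaps, cursor = [], 0
    for blk in sorted(blocks, key=lambda b: b.a_start):
        if blk.a_start > cursor:
            gaps.append((cursor, blk.a_start))
        cursor = max(cursor, blk.a_end)
    if cursor < a_length:
        gaps.append((cursor, a_length))
    return gaps


def inverted_blocks(blocks):
    """Return the aligned blocks whose orientation differs between assemblies."""
    return [blk for blk in blocks if blk.strand == "-"]


def percent_of_assembly(nt, assembly_nt=157_953_789):
    """Express a nucleotide count as a percentage of the assembly length."""
    return 100.0 * nt / assembly_nt


# Toy coordinates, invented purely for illustration.
TOY_BLOCKS = [
    Block(0, 1000, 0, 1000, "+"),
    Block(1200, 2000, 1000, 1800, "-"),
    Block(2000, 3000, 1800, 2800, "+"),
]

if __name__ == "__main__":
    print("unmatched in A:", unmatched_intervals(TOY_BLOCKS, a_length=3100))
    print("inverted blocks:", len(inverted_blocks(TOY_BLOCKS)))
    print(f"unmatched sequence: {percent_of_assembly(1_186_913):.2f}% of assembly")
    print(f"sequence variations: {percent_of_assembly(508_332):.2f}% of assembly")
```

Expressed this way, the reported unmatched sequence and sequence variations each amount to less than 1% of the assembly length.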

Fig. 1

DNA sequence comparison of CRA_TCAGchr7.v1 against NCBI Build 31. Black circles represent the sites of physical gaps. The sites and extent of unmatched sequences present in one assembly but not the other are shown in red, sequence variations in blue, and inversions in green. Genes present in CRA_TCAGchr7.v1, but absent in Build 31, are shown (see table S4; complete dataset is at www.chr7.org/).

Footnotes

Supporting Online Material

www.sciencemag.org/cgi/content/full/1083423/DC1

Materials and Methods

SOM Text

Figs. S1 and S2

Tables S1 to S9

References

References and Notes

1. Dib C, et al. Nature. 1996;380:152.
2. Kunz J, et al. Genomics. 1994;22:439.
3. Bouffard GG, et al. Genome Res. 1997;7:59.
4. Schuler GD, et al. Science. 1996;274:540.
5. Venter JC, et al. Science. 2001;291:1304.
6. International Human Genome Sequencing Consortium. Nature. 2001;409:860.
7. Materials and methods are available as supporting material on Science Online. The sequence assembly is at and in the Third Party Annotation Section of the DDBJ/EMBL/GenBank databases under the accession number TPA: BL000001. The scaffolds are in DDBJ/EMBL/GenBank under the project accession number AACC00000000. The version described in this paper is the first version, AACC01000000. Individual accession numbers of the scaffolds are AACC01000001, AACC01000002, AACC01000003, AACC01000004, AACC01000005, AACC01000006, AACC01000007, AACC01000008, AACC01000009, AACC01000010, AACC01000011, AACC01000012, AACC01000013, AACC01000014, AACC01000015, AACC01000016, AACC01000017, AACC01000018, AACC01000019, AACC01000020, AACC01000021, AACC01000022, AACC01000023, AACC01000024, AACC01000025, and AACC01000026. The annotation data and analyses based on the CRA_TCAGchr7.v1 assembly described in this paper are shown (and are archived) as the March 2003 database freeze (see ). Additional annotations or updates to the sequence assembly will be available as subsequent freezes. The Washington University Genome Sequencing Center has also produced an assembly and analysis of human chromosome 7 (L. Hillier et al., Nature, in press).
8. Stein LD, et al. Genome Res. 2002;12:1599.
9. Osborne LR, et al. Genomics. 1997;45:402.
10. Osborne LR, et al. Nature Genet. 2001;29:321.
11. Wevrick R, Willard HF. Nucleic Acids Res. 1991;19:2295.
12. de la Puente A, et al. Cytogenet Cell Genet. 1998;83:176.
13. Ma B, Tromp J, Li M. Bioinformatics. 2002;18:440.
14. Mural RJ, et al. Science. 2002;296:1661.
15. Pevzner P, Tesler G. Genome Res. 2003;13:37.
16. Mouse Genome Sequencing Consortium. Nature. 2002;420:520.
17. The FANTOM Consortium. Nature. 2002;420:523.
18. Heilig R, et al. Nature. 2003;421:601.
19. Dunham I, et al. Nature. 1999;402:489.
20. Deloukas P, et al. Nature. 2001;414:865.
21. Hattori M, et al. Nature. 2000;405:311.
22. Boocock GR, et al. Nature Genet. 2003;33:97.
23. Cheung J, et al. Genome Biology. 2003;4:R25.
24. Bailey JA, et al. Science. 2002;297:1003.
25. Merlo GR, et al. Genesis. 2002;33:97.
26. Robledo RF, Rajan L, Li X, Lufkin T. Genes Dev. 2002;16:1089.
27. International Molecular Genetic Study of Autism Consortium. Hum Mol Genet. 1998;7:571.
28. Lai CS, et al. Am J Hum Genet. 2000;67:357.
29. Lai CS, Fisher SE, Hurst JA, Vargha-Khadem F, Monaco AP. Nature. 2001;413:519.
30. Scherer SW, et al. Hum Mol Genet. 1994;3:1345.
31. Kobayashi K, et al. Nature Genet. 1999;2:159.