The African coelacanth genome provides insights into tetrapod evolution.
Journal: 2013/May - Nature
ISSN: 1476-4687
Abstract:
The discovery of a living coelacanth specimen in 1938 was remarkable, as this lineage of lobe-finned fish was thought to have become extinct 70 million years ago. The modern coelacanth looks remarkably similar to many of its ancient relatives, and its evolutionary proximity to our own fish ancestors provides a glimpse of the fish that first walked on land. Here we report the genome sequence of the African coelacanth, Latimeria chalumnae. Through a phylogenomic analysis, we conclude that the lungfish, and not the coelacanth, is the closest living relative of tetrapods. Coelacanth protein-coding genes are significantly more slowly evolving than those of tetrapods, unlike other genomic features. Analyses of changes in genes and regulatory elements during the vertebrate adaptation to land highlight genes involved in immunity, nitrogen excretion and the development of fins, tail, ear, eye, brain and olfaction. Functional assays of enhancers involved in the fin-to-limb transition and in the emergence of extra-embryonic tissues show the importance of the coelacanth genome as a blueprint for understanding tetrapod evolution.
Nature. Apr/17/2013; 496(7445): 311-316

Analysis of the African coelacanth genome sheds light on tetrapod evolution

+82 authors

Abstract

It was a zoological sensation when a living specimen of the coelacanth was first discovered in 1938, as this lineage of lobe-finned fish was thought to have gone extinct 70 million years ago. The modern coelacanth looks remarkably similar to many of its ancient relatives, and its evolutionary proximity to our own fish ancestors provides a glimpse of the fish that first walked on land. Here we report the genome sequence of the African coelacanth, Latimeria chalumnae. Through a phylogenomic analysis, we conclude that the lungfish, and not the coelacanth, is the closest living relative of tetrapods. Coelacanth protein-coding genes are significantly more slowly evolving than those of tetrapods, unlike other genomic features. Analyses of changes in genes and regulatory elements during the vertebrate adaptation to land highlight genes involved in immunity, nitrogen excretion and the development of fins, tail, ear, eye, brain, and olfaction. Functional assays of enhancers involved in the fin-to-limb transition and in the emergence of extra-embryonic tissues demonstrate the importance of the coelacanth genome as a blueprint for understanding tetrapod evolution.

Introduction

It was 1938 when Ms. Marjorie Courtenay-Latimer, the curator of a small natural history museum in East London, South Africa, discovered a large, peculiar-looking fish among the myriad specimens delivered to her by a local fish trawler. Latimeria chalumnae, named after its discoverer [1], was over one meter long, bluish in coloration, and had conspicuously fleshy fins that resembled the limbs of terrestrial vertebrates. This discovery turned out to be a biological sensation and is considered one of the greatest zoological finds of the 20th century. Latimeria is the only living member of an ancient group of lobe-finned fishes previously known only from fossils and believed to have been extinct since the Late Cretaceous period, about 70 million years ago (MYA) [1]. It took almost 15 years before a second specimen of this elusive species was discovered in the Comoros Islands in the Indian Ocean, and only 309 individuals known to science have been found in the past 75 years (Rik Nulens, personal communication) [2]. The discovery in 1997 of a second coelacanth species in Indonesia, L. menadoensis, was equally surprising, as it had been assumed that living coelacanths were confined to small populations off the East African coast [3,4]. Fascination with these fish is partly due to their prehistoric appearance – remarkably, their morphology is similar to that of fossils that date back at least 300 million years, leading to the supposition that this lineage is especially slow-evolving among vertebrates [1,5]. Latimeria has also been of particular interest to evolutionary biologists due to its hotly debated relationship to our last fish ancestor – the fish that first crawled up on land [6]. In the past 15 years, targeted sequencing efforts have yielded the sequences of the coelacanth mitochondrial genomes [7], HOX clusters [8], and a few gene families [9,10], but coelacanth research has still lacked large-scale sequencing data.

Here we describe the sequencing and comparative analysis of the genome of L. chalumnae, the African coelacanth.

Genome assembly and annotation

The African coelacanth genome was sequenced and assembled (LatCha1.0) using DNA from a Comoros Islands Latimeria chalumnae specimen (Supplementary Figure 1). It was sequenced using Illumina sequencing technology and assembled with ALLPATHS-LG [11]. The L. chalumnae genome has previously been reported to have a karyotype of 48 chromosomes [12]. The draft assembly is 2.86 Gb in size and is composed of 2.18 Gb of sequence plus gaps between contigs. The coelacanth genome assembly has a contig N50 size of 12.7 kb and a scaffold N50 size of 924 kb, and quality metrics comparable to other Illumina genomes (see Supplementary Note 1, Supplementary Tables 1, 2).
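
For readers unfamiliar with the N50 statistic quoted above, the following is a minimal Python sketch of how it is commonly computed from a list of contig or scaffold lengths. The function name `n50` and the toy lengths are illustrative assumptions; they are not taken from the LatCha1.0 assembly or from ALLPATHS-LG.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths in bp (hypothetical, for illustration only).
toy_contigs = [100, 80, 60, 40, 20]
print(n50(toy_contigs))
```

Under this definition, a contig N50 of 12.7 kb means that at least half of the bases in contigs lie in contigs of 12.7 kb or longer.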

The genome assembly was annotated separately by both the Ensembl gene annotation pipeline (Ensembl release 66, February 2012) and by MAKER [13]. The Ensembl gene annotation pipeline created gene models using Uniprot protein alignments, limited coelacanth cDNA data, RNA-seq data generated from L. chalumnae muscle (18 Gb of paired-end reads were assembled by Trinity [14], Supplementary Figure 2) as well as orthology with other vertebrates. This pipeline produced 19,033 protein-coding genes containing 21,817 transcripts. The MAKER pipeline used the L. chalumnae Ensembl gene set, Uniprot protein alignments, and L. chalumnae (muscle) and L. menadoensis (liver and testis) [15] RNA-seq to create gene models, yielding 29,237 protein-coding gene annotations. In addition, 2,894 short non-coding RNAs, 1,214 lncRNAs and more than 24,000 conserved RNA secondary structures were identified (Supplementary Note 2, Supplementary Tables 3–4, Supplementary Dataset 13, Supplementary Figure 3). A total of 336 genes were inferred to have undergone specific duplications in the coelacanth lineage (Supplementary Note 3, Supplementary Tables 5–6, Supplementary Dataset 4).

Closest living fish relative of tetrapods

The question of which living fish is the closest relative to ‘the fish that first crawled up on land’ has long captured our imagination: among scientists the odds have been placed on either the lungfish or the coelacanth [16]. Analyses of small to moderate amounts of sequence data for this important phylogenetic question (ranging from 1 to 43 genes) have tended to favor the lungfishes as the extant sister group to the land vertebrates [17]; however, the alternative hypothesis that lungfish and coelacanth are equally closely related to the tetrapods could not be rejected with previous data sets [18].

To seek a comprehensive answer, we generated RNA-seq data from three samples (brain, gonad/kidney, gut/liver) from the West African lungfish, Protopterus annectens, and compared it to gene sets from 21 strategically chosen jawed vertebrate species. To perform a reliable analysis, we selected 251 genes where 1–1 orthology was clear and used CAT-GTR, a complex site-heterogeneous model of sequence evolution known to reduce tree reconstruction artefacts [19] (see Methods). The resulting phylogeny, based on 100,583 concatenated amino acid positions (Figure 1, PP=1.0 for the lungfish-tetrapod node), is fully resolved except for the relative positions of armadillo and elephant. It corroborates known vertebrate phylogenetic relationships and strongly supports the conclusion that tetrapods are more closely related to lungfish than to the coelacanth (Supplementary Note 4, Supplementary Figure 4).
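
The idea of restricting the analysis to genes with clear 1–1 orthology can be pictured with a short sketch. This is not the pipeline used in the study (see Methods); the data layout, the function name `single_copy_orthogroups` and the toy gene identifiers are all hypothetical.

```python
def single_copy_orthogroups(orthogroups, species):
    """Keep only orthogroups with exactly one gene in every listed species.

    orthogroups: dict mapping group id -> {species name: [gene ids]}
    species:     list of species that must each contribute one gene
    """
    kept = {}
    for group_id, members in orthogroups.items():
        if all(len(members.get(s, [])) == 1 for s in species):
            kept[group_id] = {s: members[s][0] for s in species}
    return kept


# Hypothetical toy input: OG2 has a duplicate, OG3 lacks lungfish.
toy_groups = {
    "OG1": {"coelacanth": ["c1"], "lungfish": ["l1"], "human": ["h1"]},
    "OG2": {"coelacanth": ["c2", "c3"], "lungfish": ["l2"], "human": ["h2"]},
    "OG3": {"coelacanth": ["c4"], "human": ["h3"]},
}
print(single_copy_orthogroups(toy_groups, ["coelacanth", "lungfish", "human"]))
```

Groups with lineage-specific duplicates or missing species are dropped, which trades data volume for a cleaner phylogenetic signal.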

How slowly evolving is the coelacanth?

The morphological resemblance of the modern coelacanth to its fossil ancestors has resulted in it being nicknamed ‘the living fossil’ [1]. This invites the question: Is the genome of the coelacanth as slowly evolving as its outward appearance suggests? Earlier work found that a few gene families, such as Hox and protocadherins, showed comparatively slower protein-coding evolution in coelacanth than in other vertebrate lineages [8,10]. To address this question, we examined several types of genomic changes in the coelacanth compared to other vertebrates.

Protein-coding gene evolution was examined using the concatenated 251-protein phylogenomics dataset (Figure 1). Pair-wise distances between taxa were calculated from the branch lengths of the tree using the Two-Cluster test proposed by Takezaki et al. [20] to test for equality of average substitution rates. Then, for each of the following species and species clusters (coelacanth, lungfish, chicken and mammals), we ascertained their respective mean distance to an outgroup consisting of three cartilaginous fishes (elephant shark, little skate and spotted catshark). Finally, we tested whether there was any significant difference in distance to the outgroup of cartilaginous fish for every pair of species and species clusters, using a Z-statistic. When these distances to the outgroup of cartilaginous fish were compared, we found that the coelacanth proteins tested were significantly more slowly evolving (0.890 substitutions/site) than the lungfish (1.05 substitutions/site), chicken (1.09 substitutions/site) and mammalian (1.21 substitutions/site) orthologues (Supplementary Dataset 5), in all cases with p-values < 10^-6. Additionally, as can be seen in Figure 1, the substitution rate in coelacanth is approximately half that in tetrapods since the two lineages diverged. A Tajima relative rate test [21] confirmed the coelacanth’s significantly slower rate of protein evolution (Supplementary Dataset 6).
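
To make the logic of a relative rate test concrete, the following is a simplified Python sketch of Tajima's test on three aligned sequences: two ingroup sequences and an outgroup. It counts sites where only one ingroup sequence differs from the other two and compares the two counts with a chi-square statistic (1 degree of freedom). The function name, the gap handling and the toy sequences are assumptions made for illustration; the study's own results are in Supplementary Dataset 6.

```python
import math


def tajima_relative_rate(seq_a, seq_b, outgroup, missing="-?"):
    """Simplified Tajima relative rate test.

    m_a counts sites where seq_a alone differs (seq_b == outgroup);
    m_b counts sites where seq_b alone differs (seq_a == outgroup).
    Returns (m_a, m_b, chi2, p_value) with chi2 on 1 degree of freedom.
    """
    if not (len(seq_a) == len(seq_b) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    m_a = m_b = 0
    for a, b, o in zip(seq_a, seq_b, outgroup):
        if a in missing or b in missing or o in missing:
            continue  # skip columns with gaps or unknown residues
        if a != b and b == o:
            m_a += 1
        elif a != b and a == o:
            m_b += 1
    informative = m_a + m_b
    if informative == 0:
        return m_a, m_b, 0.0, 1.0
    chi2 = (m_a - m_b) ** 2 / informative
    # Survival function of chi-square with 1 d.f.: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return m_a, m_b, chi2, p_value


# Hypothetical toy alignment: the "fast" lineage carries all the changes.
slow = "MKTAYIAKQR"
fast = "MKTSYLSKHR"
out = "MKTAYIAKQR"
print(tajima_relative_rate(slow, fast, out))
```

Equal rates in the two ingroup lineages predict m_a ≈ m_b; a large imbalance, as in the toy example, indicates that one lineage has accumulated substitutions faster than the other.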

Secondly, we examined the abundance of transposable elements (TEs) in the coelacanth genome. Theoretically, TEs might contribute most significantly to the evolution of a species by generating templates for exaptation to form novel regulatory elements and exons, and by acting as substrates for genomic rearrangement [22]. We found that the coelacanth genome contains a wide variety of TE superfamilies and has a relatively high TE content (25%); this number is likely an underestimate due to the draft nature of the assembly (Supplementary Note 5, Supplementary Tables 7–10). Analysis of RNA-seq data and of the divergence of individual TE copies from consensus sequences shows that 14 coelacanth TE superfamilies are currently active (Supplementary Note 6, Supplementary Table 10, Supplementary Figure 5). We conclude that the current coelacanth genome shows both an abundance and activity of TEs similar to many other genomes. This contrasts with the slow protein evolution observed.

Analyses of chromosomal breakpoints in the coelacanth genome and tetrapod genomes reveal extensive conservation of synteny and indicate that large-scale rearrangements have occurred at a generally low rate in the coelacanth lineage. Analyses of these rearrangement classes detected several previously published fission events that are known to have occurred in tetrapod lineages and at least 31 interchromosomal rearrangements that occurred in the coelacanth lineage or the early tetrapod lineage (0.063 fusions/million years), compared to 20 events (0.054 fusions/million years) in the salamander lineage and 21 events (0.057 fusions/million years) in the Xenopus lineage [23] (Supplementary Note 7, Supplementary Figure 6). Overall, these analyses indicate that karyotypic evolution in the coelacanth lineage has occurred at a relatively slow rate, similar to that of non-mammalian tetrapods [24].

In a separate analysis, we also examined the evolutionary divergence between the two species of coelacanth, L. chalumnae and L. menadoensis, found in African and Indonesian waters respectively. Previous analysis of mitochondrial DNA showed a sequence identity of 96%, but estimated divergence times range widely from 6 to 40 million years [25,26]. When we compared the liver and testis transcriptomes of L. menadoensis [27] to the L. chalumnae genome, we found an identity of 99.73% (Supplementary Note 8, Supplementary Figure 7), whereas alignments between 20 sequenced L. menadoensis BACs and the L. chalumnae genome showed an identity of 98.7% (Supplementary Table 11, Supplementary Figure 8). Both the genic and genomic divergence rates are similar to those seen between the human and chimpanzee genomes (99.5% and 98.8% respectively, divergence time 6–8 million years ago) [28]. However, the rates of molecular evolution in Latimeria are likely affected by multiple factors, including the slower substitution rate seen in coelacanth, which suggests a slightly larger divergence time for the two coelacanth species.
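
As a reminder of what a percent-identity figure measures, here is a minimal sketch that scores two pre-aligned sequences column by column. The convention used here (columns with a gap in either sequence are ignored) is an assumption; alignment tools differ in how they treat gaps, and the function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of ungapped alignment columns with identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are not scored
        compared += 1
        if a.upper() == b.upper():
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy example: one mismatch in ten aligned bases.
print(percent_identity("ACGTACGTAC", "ACGTACGTAT"))
```

Because gap treatment changes the denominator, identity values from transcript-to-genome and BAC-to-genome alignments are best compared only when computed under the same convention.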

Vertebrate adaptation to land

As the sequenced genome closest to our most recent aquatic ancestor, the coelacanth provides a unique opportunity to identify genomic changes that were associated with the successful adaptation of vertebrates to an important new environment – land.

Over the 400 MY interval that vertebrates have lived on land, genes that are unnecessary for existence in their new environment would have been eliminated. To understand this aspect of the water-to-land transition, we surveyed the Latimeria genome annotations to identify genes that were present in the last common ancestor of all bony fish (including coelacanth) but that are missing from tetrapod genomes. More than 50 such genes, including components of the Fgf signaling, TGF-beta/Bmp signaling, and Wnt signaling pathways, as well as many transcription factor genes, were inferred to be lost based on the coelacanth data (Supplementary Dataset 7, Supplementary Figure 9). Previous studies of genes lost in this transition could only compare teleost fish to tetrapods, meaning that differences in gene content could have been due to loss in the tetrapod or in the lobe-finned fish lineages. We were able to confirm that four genes previously shown to be absent in tetrapods (Actinodin genes [29], Fgf24 [30], Asip2 [31]) were indeed present and intact in Latimeria, supporting their loss in the tetrapod lineage.

We functionally annotated the >50 genes lost in tetrapods using zebrafish data (gene expression, knock-downs and knock-outs). Many genes were classified in important developmental categories (Supplementary Dataset 7): fin development (13 genes), otolith and ear development (8 genes), kidney development (7 genes), trunk/somite/tail development (11 genes), eye development (13 genes), and brain development (23 genes). This implies that critical characters in the morphological transition from water to land (fin-to-limb transition, remodelling of the ear, etc.) are reflected in the loss of specific genes along the phylogenetic branch leading to tetrapods. However, homeobox genes, which are responsible for the development of an organism’s basic body plan, show only slight differences between Latimeria, ray-finned fish and tetrapods; it would appear that the protein-coding portion of this gene family, along with several others (Supplementary Note 9, Supplementary Tables 12–16, Supplementary Figure 10), has remained largely conserved during the vertebrate land transition (Supplementary Figure 11).

As vertebrates transitioned to a new land environment, changes occurred not only in gene content, but also in the regulation of existing genes. Conserved non-coding elements (CNEs) are strong candidates for gene regulatory elements and can act as promoters, enhancers, repressors and insulators [32,33], and have been implicated as major facilitators of evolutionary change [34]. To identify CNEs that originated in the most recent common ancestor of tetrapods, we predicted CNEs that evolved in various bony vertebrate (i.e., ray-finned fish, coelacanth and tetrapod) lineages and assigned them to their likely branch points of origin. To detect CNEs, conserved sequences in the human genome were identified using MULTIZ alignments of bony vertebrate genomes, and then known protein-coding sequences, UTRs and known RNA genes were excluded. Our analysis identified 44,200 ancestral tetrapod CNEs that originated after the divergence of the coelacanth lineage. They represent 6% of the 739,597 CNEs that are under constraint in the bony vertebrate lineage. We compared the ancestral tetrapod CNEs to mouse embryo ChIP-seq data obtained using antibodies against p300, a transcriptional co-activator. This resulted in a 7-fold enrichment in the p300 binding sites for our candidate CNEs and confirmed that these CNEs are indeed enriched for gene regulatory elements.
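
The exclusion step described above (conserved sequence minus annotated coding sequence, UTRs and RNA genes) amounts to an interval subtraction. The sketch below shows one way to do this for half-open intervals on a single chromosome; it is an illustration under assumed inputs, not the procedure applied to the MULTIZ alignments, and the function name and coordinates are hypothetical. The last lines simply re-derive the 6% figure from the two counts given in the text.

```python
def subtract_intervals(conserved, excluded):
    """Remove excluded regions from conserved regions.

    Both arguments are lists of half-open (start, end) intervals on the
    same chromosome. Returns the remaining conserved pieces, sorted.
    """
    excluded = sorted(excluded)
    remaining = []
    for start, end in sorted(conserved):
        cursor = start
        for ex_start, ex_end in excluded:
            if ex_end <= cursor:
                continue  # excluded region lies entirely to the left
            if ex_start >= end:
                break  # excluded region lies entirely to the right
            if ex_start > cursor:
                remaining.append((cursor, ex_start))
            cursor = max(cursor, ex_end)
            if cursor >= end:
                break
        if cursor < end:
            remaining.append((cursor, end))
    return remaining


# Hypothetical coordinates: two conserved blocks, two annotated features.
conserved_blocks = [(100, 200), (300, 400)]
annotated = [(150, 180), (390, 450)]
print(subtract_intervals(conserved_blocks, annotated))

# Fraction of bony-vertebrate CNEs assigned to the tetrapod ancestor,
# using the counts stated in the text.
tetrapod_fraction = 44_200 / 739_597
print(f"{100 * tetrapod_fraction:.2f}%")
```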

Each tetrapod CNE was assigned to the gene whose transcription start site was closest, and GO category enrichment was calculated for those genes. The most enriched categories were involved with smell perception (sensory perception of smell, detection of chemical stimulus, olfactory receptor activity etc.). This is consistent with the notable expansion of olfactory receptor family genes in tetrapods compared with teleosts, and may reflect the necessity of a more tightly regulated, larger and more diverse repertoire of olfactory receptors for detecting airborne odorants as part of the terrestrial lifestyle. Other significant categories include morphogenesis (radial pattern formation, hind limb morphogenesis, kidney morphogenesis) and cell differentiation (endothelial cell fate commitment, epithelial cell fate commitment), which is consistent with the body plan changes required for land transition, as well as immunoglobulin VDJ recombination, which reflects the presumed response differences required to address the novel pathogens that vertebrates would encounter on land (Supplementary Note 10, Supplementary Tables 17–24).
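
Assigning each CNE to the gene with the closest transcription start site (TSS) can be sketched with a binary search over sorted TSS coordinates. The example assumes a single chromosome, uses the CNE midpoint as its position and breaks ties in favour of the upstream TSS; these choices, the function name and the toy genes are assumptions for illustration only.

```python
import bisect


def nearest_tss_gene(cne_midpoint, tss_positions, tss_genes):
    """Return the gene whose TSS is closest to a CNE midpoint.

    tss_positions must be sorted ascending and parallel to tss_genes.
    Ties go to the TSS with the smaller coordinate.
    """
    if not tss_positions:
        raise ValueError("no TSS positions given")
    i = bisect.bisect_left(tss_positions, cne_midpoint)
    candidates = []
    if i > 0:
        candidates.append(i - 1)
    if i < len(tss_positions):
        candidates.append(i)
    best = min(candidates, key=lambda j: abs(tss_positions[j] - cne_midpoint))
    return tss_genes[best]


# Hypothetical TSS table on one chromosome.
positions = [1000, 5000, 9000]
genes = ["geneA", "geneB", "geneC"]
for midpoint in (200, 2500, 7500):
    print(midpoint, nearest_tss_gene(midpoint, positions, genes))
```

Nearest-TSS assignment is a simple heuristic: enhancers can act over long distances and skip intervening genes, so the resulting gene sets are best read as candidates for enrichment analysis.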

A major innovation of tetrapods is the evolution of limbs characterised by digits. The limb skeleton consists of a stylopod (humerus or femur), a zeugopod (radius/ulna and tibia/fibula), and an autopod (wrist/ankle and digits). There are two major hypotheses about the origins of the autopod – either it was a novel feature of tetrapods, or it has antecedents in the fins of fish [35] (Supplementary Note 11, Supplementary Figure 12). We examine here the Hox regulation of limb development in ray-finned fish, coelacanth, and tetrapods to address these hypotheses.

In mouse, late-phase digit enhancers lie in a gene desert proximal to the HOX-D cluster [36]. Here we provide an alignment of the HoxD centromeric gene desert of coelacanth with tetrapods and ray-finned fishes (Figure 2a). Among the six cis-regulatory sequences previously identified in this gene desert [36], three sequences show sequence conservation restricted to tetrapods (Supplementary Figure 13). However, one regulatory sequence (Island 1) is shared between tetrapods and coelacanth, but not with ray-finned fish (Figure 2b, Supplementary Figure 14). When tested in a transient transgenic assay in mouse, the coelacanth sequence of Island 1 was able to drive reporter expression in a limb-specific pattern (Figure 2c), making it likely that Island 1 was a lobe-fin developmental enhancer in the fish ancestor of tetrapods that was then co-opted into the autopod enhancer of modern tetrapods. In this case, the autopod developmental regulation was derived from an ancestral lobe-finned fish regulatory element.

Changes in the urea cycle provide an illuminating example of the adaptations associated with transition to land. Excretion of nitrogen is a major physiological challenge for terrestrial vertebrates. In aquatic environments, the primary nitrogenous waste product is ammonia, which is readily diluted by surrounding water before it reaches toxic levels, but on land, less toxic substances such as urea or uric acid must be produced instead (Supplementary Figure 15). The widespread and almost exclusive occurrence of urea excretion in amphibians, some turtles and mammals has led to the hypothesis that the use of urea as the main nitrogenous waste product was a key innovation in the vertebrate transition from water to land [37].

With the availability of gene sequences from coelacanth and lungfish, it became possible to test this hypothesis. We used a branch-site model in the HYPHY package [38], which estimates dN/dS (ω) values among different branches and among different sites (codons) across a multiple species sequence alignment. For the rate-limiting enzyme of the hepatic urea cycle, carbamoyl phosphate synthase I (CPS1), only one branch of the tree shows a strong signature of selection (p = 0.02), namely the branch leading to tetrapods and the branch leading to amniotes (Figure 3); no other enzymes in this cycle showed a signature of selection. Conversely, mitochondrial arginase (ARG2), which produces extrahepatic urea as a byproduct of arginine metabolism but which is not involved in the production of urea for nitrogenous waste disposal, did not show any evidence of selection in vertebrates (Supplementary Figure 16). This leads us to conclude that adaptive evolution occurred in the hepatic urea cycle during the vertebrate land transition. In addition, it is interesting to note that of the five amino acids of CPS1 that changed between coelacanth and tetrapods, three are in important domains (ATP-A site, ATP-B site, subunit interaction domain) and a fourth is known to cause a malfunctioning enzyme in human patients if mutated [39].

The adaptation to a terrestrial lifestyle necessitated major changes in the physiological milieu of the developing embryo and fetus, resulting in the evolution and specialization of extraembryonic membranes of the amniote mammals [40]. The placenta, in particular, is a complex structure that is critical for providing gas and nutrient exchange between mother and fetus and is also a major site of hematopoiesis [41].

We have identified a region of the coelacanth HOX-A cluster that may have been involved in the evolution of extraembryonic structures in tetrapods, including the eutherian placenta. Global alignment of the coelacanth Hoxa14-a13 region with the homologous regions of the horn shark, chicken, human and mouse yielded a CNE just upstream of the coelacanth Hoxa14 gene (Supplementary Figure 17a, arrow). This conserved stretch is not found in teleost fishes but is highly conserved among horn shark, chicken, human and mouse despite the fact that the latter three have no Hoxa14 orthologues, and that the horn shark Hoxa14 gene has become a pseudogene. This CNE, HA14E1, corresponds to the proximal promoter-enhancer region of the Hoxa14 gene in Latimeria. HA14E1 is >99% identical between mouse, human and all other sequenced mammals, and would thus be considered an ultraconserved element [42]. The high level of conservation suggests that this element, which already possessed promoter activity, may have been co-opted for other functions despite the loss of the Hoxa14 gene in amniotes (Supplementary Figure 17b, c). Expression of human HA14E1 in a mouse transient transgenic assay did not give notable expression in the embryo proper at day 11.5 [43], which was unexpected since its location would predict that it would regulate axial structures caudally [44]. A similar experiment in chick embryos using the chicken HA14E1 also showed no activity in the AP-axis. However, stunning expression was observed in the extraembryonic area vasculosa of the chick embryo (Figure 4a). Examination of a Latimeria BAC Hoxa14-reporter transgene in mouse embryos showed that the Hoxa14 gene is specifically expressed in a subset of cells in an extraembryonic region at E8.5 (Figure 4b).

These findings suggest that the HA14E1 region may have been evolutionarily recruited to coordinate regulation of posterior HoxA genes (Hoxa13, Hoxa11 and Hoxa10), which are known to be expressed in the mouse allantois and are critical for early formation of the mammalian placenta [45]. Although Latimeria does not possess a placenta, it is a livebearer and has very large, vascularised eggs; the relationship of Hoxa14, the HA14E1 enhancer, and blood island formation in the coelacanth remains unknown.

Coelacanth lacks IgM

Immunoglobulin M (IgM), a class of antibodies, has been reported in all vertebrate species thus far characterised and is considered to be indispensable for adaptive immunity [46]. Interestingly, IgM genes cannot be found in coelacanth despite an exhaustive search of the coelacanth sequence data, and even though all other major components of the immune system are present (Supplementary Note 12, Supplementary Figure 18). Instead, we found two IgW genes (Supplementary Figures 19–21), immunoglobulin genes only found in lungfish and cartilaginous fish and which are believed to have originated in the ancestor of jawed vertebrates [47] and to have been subsequently lost in teleosts and tetrapods. IgM was similarly absent from the Latimeria RNA-seq data, although both IgW genes were found as transcripts. To further characterise the apparent absence of IgM, we exhaustively screened large genomic L. menadoensis libraries using numerous strategies and probes and also performed PCR with degenerate primers that should universally amplify IgM sequences. The lack of IgM in Latimeria raises questions as to how coelacanth B cells respond to microbial pathogens and whether the IgW molecules can serve a compensatory function, even though there is no indication that the coelacanth IgW was derived from vertebrate IgM genes.

Discussion

Ever since its discovery, the coelacanth has been referred to as a ‘living fossil’ due to its morphological similarities to its fossil ancestors [1]. However, questions have remained as to whether it truly is slowly evolving, as morphological stasis does not necessarily imply genomic stasis. In this study, we determined that L. chalumnae’s protein-coding genes show a decreased substitution rate compared to those of other sequenced vertebrates, even though its genome as a whole does not show evidence of low genome plasticity. The reason for this lower substitution rate is still unknown, although a static habitat and a lack of predation over evolutionary timescales could be contributing factors to a lower need for adaptation. A closer examination of gene families that show either unusually high or low levels of directional selection indicative of adaptation in the coelacanth could tell us a great deal about which selective pressures, or lack thereof, shaped this evolutionary relict (Supplementary Note 13, Supplementary Figure 22).

The vertebrate land transition is one of the most important steps in our evolutionary history. We conclude that the closest living fish to the tetrapod ancestor is the lungfish, not the coelacanth. However, the coelacanth is critical for our understanding of this transition, as the lungfish have intractable genome sizes (estimated at 50–100 Gb) [48]. We have already learned a great deal about our adaptation to land through coelacanth whole genome analysis, and we have shown the promise of focused analysis of specific gene families involved in this process. Still, further study of these changes between tetrapods and the coelacanth will undoubtedly yield important insights as to how a complex organism like a vertebrate can so drastically change its way of life.

Methods: Appear in the online supplement.

Supplementary Material

Figure 1

A phylogenetic tree of a broad selection of jawed vertebrates shows that lungfish, not coelacanth, is the closest relative of tetrapods

Multiple sequence alignments of 251 genes present as 1-to-1 orthologs in 22 vertebrates and with a full sequence coverage for both lungfish and coelacanth were used to generate a concatenated matrix of 100,583 unambiguously aligned amino acid positions. The Bayesian tree was inferred using PhyloBayes under the CAT+GTR+Γ4 model with confidence estimates derived from 100 jackknife tests (1.0 posterior probability) [49]. The tree was rooted on cartilaginous fish. It shows both that lungfish is more closely related to tetrapods than coelacanth and that the protein sequence of coelacanth is slowly evolving.

Figure 2

Alignment of the HOX-D locus and upstream gene desert identifies conserved limb enhancers

(a) Organization of the mouse HOX-D locus and centromeric gene desert, flanked by the ATF2 and MTX2 genes. Limb regulatory sequences (I1, I2, I3, I4, CsB and CsC) are noted. Using the mouse locus as a reference (NCBI37/mm9 assembly), corresponding sequences from human, chicken, frog, coelacanth, pufferfish, medaka, stickleback, zebrafish and elephant shark were aligned. Alignment shows regions of homology between tetrapod, coelacanth and ray-finned fishes. (b) Alignment of vertebrate cis-regulatory elements I1, I2, I3, I4, CsB and CsC. (c) Expression patterns of coelacanth Island 1 in a transgenic mouse. Limb buds are indicated by arrowheads in the first two panels. The third panel shows a close-up of a limb bud.

Figure 3

Phylogeny of CPS1 coding sequences used to determine positive selection within the urea cycle

Branch lengths are scaled to the expected number of substitutions/nucleotide and branch color indicates the strength of selection (dN/dS or ω) with red corresponding to positive or diversifying selection (ω > 5), blue to purifying selection (ω = 0), and yellow to neutral evolution (ω = 1). Thick branches indicate statistical support for evolution under episodic diversifying selection. The proportion of each color represents the fraction of the sequence undergoing the corresponding class of selection.

Figure 4

Transgenic analysis implicates involvement of Hox CNE HA14E1 in extraembryonic activities in the chick and mouse

(A) Chicken HA14E1 drives reporter expression in blood islands in chick embryos. A construct containing chicken HA14E1 upstream of a minimal (TK) promoter driving eGFP was electroporated in HH4 stage chick embryos together with a nuclear mCherry construct. GFP expression was analyzed at stage ~HH11. The green aggregations and punctate staining are observed in the blood islands and developing vasculature. (B) Expression of Latimeria Hoxa14 reporter transgene in the developing placental labyrinth of a mouse embryo. A field of cells from the labyrinth region of an E8.5 embryo from a BAC transgenic line containing coelacanth Hoxa14-Hoxa9 [50] in which the Hoxa14 gene had been supplanted with the gene for red fluorescent protein (RFP). Immunohistochemistry was used to detect RFP (brown staining in a small number of cells).

Footnotes

Supplementary Information is linked to the online version of the paper at www.nature.com/nature.

Author contributions: JA, CTA, AM and KLT planned and oversaw the project. RD and CTA provided blood and tissues for sequencing. CTA and ML prepared the DNA for sequencing. IM, SG, DP, FJR, TS and DJ assembled the genome. NRS prepared RNA from L. chalumnae. LF and JL made the L. chalumnae RNA-seq library. AC, MB, MAB, MF, FB, GS, AMF, AP, MG, GDM, JT-M and EO sequenced and analyzed the L. menadoensis RNA-seq library. BA, SMJS, SW, MC and MY annotated the genome. WH and CPP performed the lncRNAs annotation and analysis. PFS, SH, AN, HT, and SJP annotated ncRNAs. MG, GDM, AP, MR and CTA compared L. chalumnae and L. menadoensis sequence. HB, DB and HP performed the phylogenomic analysis. TMa and AM performed the gene relative rate analysis. AC, JG, SP, BP, PvH and UH performed the analysis, annotation and statistical enrichment of L. chalumnae specific gene duplications. NF and AM analyzed the homeobox gene repertoires. DC, SF, OS, J-NV, MS and AM analysed transposable elements. JJS analysed large scale rearrangements in vertebrate genomes. IB, JP, NF and SK analysed genes lost in tetrapods. TMi analyzed actinodin and pectoral fin musculature. CO and MS analysed selection in urea cycle genes. AL and BV performed the conserved non-coding element analysis. IS, NR, VR, NS and CT performed the analysis of autopodial CNEs. KS, TS-S and CTA examined the evolution of a placenta-related CNE. NRS, GWL, MGM, TO and CTA performed the IgM analysis. JA, CTA, AM and KLT wrote the paper with input from other authors. AG, DT and LW constructed fosmid libraries for the L. chalumnae genome assembly.

Author Information

The L. chalumnae genome assembly has been deposited in GenBank under the accession number AFYH00000000. The L. chalumnae transcriptome has been deposited under the accession number SRX117503 and the P. annectens transcriptomes have been deposited under the accession numbers SRX152529, SRX152530, and SRX152531. The P. annectens mitochondrial DNA sequence was deposited under the accession number JX568887. All animal experiments were approved by the MIT Committee for Animal Care.

The authors declare no competing financial interests.

Acknowledgments

Acquisition and storage of Latimeria chalumnae samples were supported by grants from the African Coelacanth Ecosystem Programme of the South African National Department of Science and Technology. Generation of the Latimeria chalumnae and Protopterus annectens sequence by the Broad Institute of MIT and Harvard was supported by grants from the National Human Genome Research Institute (NHGRI). KLT is the recipient of a EURYI award from the ESF. We would also like to thank the Genomics Sequencing Platform of the Broad Institute for sequencing the L. chalumnae genome and the L. chalumnae and P. annectens transcriptomes; Said Ahamada, Robin Stobbs and the Association pour le Protection de Gombesa (APG) for their help in obtaining coelacanth samples; Yu Zhao for the use of data from Rana chensinensis; and Leslie Gaffney, Catherine Hamilton and John Westlund for assistance with figure preparation.

References

  • 1. Smith JLB. A Living Fish of Mesozoic Type. Nature. 1939;143:455-456. doi:10.1038/143455a0
  • 2. Nulens R, Scott L, Herbin M. An Updated Inventory of All Known Specimens of the Coelacanth, Latimeria Spp. 2010.
  • 3. Erdmann M, Caldwell R, Kasim Moosa M. Indonesian 'king of the sea' discovered. Nature. 1998;395:335.
  • 4. Smith JL. Old Fourlegs: The story of the coelacanth. Longmans, Green; 1956.
  • 5. Zhu M, et al. Earliest known coelacanth skull extends the range of anatomically modern coelacanths to the Early Devonian. Nat Commun. 2012;3:772. doi:10.1038/ncomms1764
  • 6. Zimmer C. At the Water's Edge: Fish with Fingers, Whales with Legs, and How Life Came Ashore but Then Went Back to Sea. Free Press; 1999.
  • 7. Zardoya R, Meyer A. The complete DNA sequence of the mitochondrial genome of a "living fossil," the coelacanth (Latimeria chalumnae). Genetics. 1997;146:995-1010.
  • 8. Amemiya CT, et al. Complete HOX cluster characterization of the coelacanth provides further evidence for slow evolution of its genome. Proc Natl Acad Sci U S A. 2010;107:3622-3627. doi:10.1073/pnas.0914312107
  • 9. Larsson TA, Larson ET, Larhammar D. Cloning and sequence analysis of the neuropeptide Y receptors Y5 and Y6 in the coelacanth Latimeria chalumnae. Gen Comp Endocrinol. 2007;150:337-342. doi:10.1016/j.ygcen.2006.09.002
  • 10. Noonan JP, et al. Coelacanth genome sequence reveals the evolutionary history of vertebrate genes. Genome Res. 2004;14:2397-2405. doi:10.1101/gr.2972804
  • 11. Gnerre S, et al. High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc Natl Acad Sci U S A. 2011;108:1513-1518. doi:10.1073/pnas.1017351108
  • 12. Bogart JP, Balon EK, Bruton MN. The chromosomes of the living coelacanth and their remarkable similarity to those of one of the most ancient frogs. J Hered. 1994;85:322-325.
  • 13. Cantarel BL, et al. MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 2008;18:188-196. doi:10.1101/gr.6743907
  • 14. Grabherr MG, et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 2011;29:644-652. doi:10.1038/nbt.1883
  • 15. Pallavicini A, et al. Analysis of the transcriptome of the Indonesian coelacanth Latimeria menadoensis. 2012; submitted.
  • 16. Schultze HP, Trueb L. Origins of the Higher Groups of Tetrapods: Controversy and Consensus. Comstock Pub. Associates; 1991.
  • 17. Meyer A, Dolven SI. Molecules, fossils, and the origin of tetrapods. J Mol Evol. 1992;35:102-113.
  • 18. Brinkmann H, Venkatesh B, Brenner S, Meyer A. Nuclear protein-coding genes support lungfish and not the coelacanth as the closest living relatives of land vertebrates. Proc Natl Acad Sci U S A. 2004;101:4900-4905. doi:10.1073/pnas.0400609101
  • 19. Lartillot N, Philippe H. A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Mol Biol Evol. 2004;21:1095-1109. doi:10.1093/molbev/msh112
  • 20. Takezaki N, Rzhetsky A, Nei M. Phylogenetic test of the molecular clock and linearized trees. Mol Biol Evol. 1995;12:823-833.
  • 21. Tajima F. Simple methods for testing the molecular evolutionary clock hypothesis. Genetics. 1993;135:599-607.
  • 22. Bejerano G, et al. A distal enhancer and an ultraconserved exon are derived from a novel retroposon. Nature. 2006;441:87-90. doi:10.1038/nature04696
  • 23. Voss SR, et al. Origin of amphibian and avian chromosomes by fission, fusion, and retention of ancestral chromosomes. Genome Res. 2011;21:1306-1312. doi:10.1101/gr.116491.110
  • 24. Smith JJ, Voss SR. Gene order data from a model amphibian (Ambystoma): new perspectives on vertebrate genome structure and evolution. BMC Genomics. 2006;7:219. doi:10.1186/1471-2164-7-219
  • 25. Inoue JG, Miya M, Venkatesh B, Nishida M. The mitochondrial genome of Indonesian coelacanth Latimeria menadoensis (Sarcopterygii: Coelacanthiformes) and divergence time estimation between the two coelacanths. Gene. 2005;349:227-235. doi:10.1016/j.gene.2005.01.008
  • 26. Holder MT, Erdmann MV, Wilcox TP, Caldwell RL, Hillis DM. Two living species of coelacanths? Proc Natl Acad Sci U S A. 1999;96:12616-12620.
  • 27. Canapa A, et al. Composition and Phylogenetic Analysis of Vitellogenin Coding Sequences in the Indonesian Coelacanth Latimeria menadoensis. J Exp Zool B Mol Dev Evol. 2012;318:404-416. doi:10.1002/jez.b.22455
  • 28. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature. 2005;437:69-87. doi:10.1038/nature04072
  • 29. Zhang J, et al. Loss of fish actinotrichia proteins and the fin-to-limb transition. Nature. 2010;466:234-237. doi:10.1038/nature09137
  • 30. Jovelin R, et al. Evolution of developmental regulation in the vertebrate FgfD subfamily. J Exp Zool B Mol Dev Evol. 2010;314:33-56. doi:10.1002/jez.b.21307
  • 31. Braasch I, Postlethwait JH. The teleost agouti-related protein 2 gene is an ohnolog gone missing from the tetrapod genome. Proc Natl Acad Sci U S A. 2011;108:E47-E48. doi:10.1073/pnas.1101594108
  • 32. Navratilova P, et al. Systematic human/zebrafish comparative identification of cis-regulatory activity around vertebrate developmental transcription factor genes. Dev Biol. 2009;327:526-540. doi:10.1016/j.ydbio.2008.10.044
  • 33. Xie X, et al. Systematic discovery of regulatory motifs in conserved regions of the human genome, including thousands of CTCF insulator sites. Proc Natl Acad Sci U S A. 2007;104:7145-7150. doi:10.1073/pnas.0701811104
  • 34. Jones FC, et al. The genomic basis of adaptive evolution in threespine sticklebacks. Nature. 2012;484:55-61. doi:10.1038/nature10944
  • 35. Shubin N, Tabin C, Carroll S. Deep homology and the origins of evolutionary novelty. Nature. 2009;457:818-823. doi:10.1038/nature07891
  • 36. Montavon T, et al. A regulatory archipelago controls Hox genes transcription in digits. Cell. 2011;147:1132-1145. doi:10.1016/j.cell.2011.10.023
  • 37. Wright PA. Nitrogen excretion: three end products, many physiological roles. J Exp Biol. 1995;198:273-281.
  • 38. Kosakovsky Pond SL, et al. A random effects branch-site model for detecting episodic diversifying selection. Mol Biol Evol. 2011;28:3033-3043. doi:10.1093/molbev/msr125
  • 39. Haberle J, et al. Molecular defects in human carbamoy phosphate synthetase I: mutational spectrum, diagnostic and protein structure considerations. Hum Mutat. 2011;32:579-589. doi:10.1002/humu.21406
  • 40. Carroll RL. Vertebrate Paleontology and Evolution. W.H. Freeman and Company; 1988.
  • 41. Gekas C, et al. Hematopoietic stem cell development in the placenta. Int J Dev Biol. 2010;54:1089-1098. doi:10.1387/ijdb.103070cg
  • 42. Bejerano G, et al. Ultraconserved elements in the human genome. Science. 2004;304:1321-1325. doi:10.1126/science.1098119
  • 43. Vista Enhancer Browser. <http://enhancer.lbl.gov/cgi-bin/imagedb3.pl?form=presentation&show=1&experiment_id=501&organism_id=1>
  • 44. Wellik DM. Hox patterning of the vertebrate axial skeleton. Dev Dyn. 2007;236:2454-2463. doi:10.1002/dvdy.21286
  • 45. Scotti M, Kmita M. Recruitment of 5' Hoxa genes in the allantois is essential for proper extra-embryonic function in placental mammals. Development. 2012;139:731-739. doi:10.1242/dev.075408
  • 46. Bengten E, et al. Immunoglobulin isotypes: structure, function, and genetics. Curr Top Microbiol Immunol. 2000;248:189-219.
  • 47. Ota T, Rast JP, Litman GW, Amemiya CT. Lineage-restricted retention of a primitive immunoglobulin heavy chain isotype within the Dipnoi reveals an evolutionary paradox. Proc Natl Acad Sci U S A. 2003;100:2501-2506. doi:10.1073/pnas.0538029100
  • 48. Gregory TR. The Evolution of the Genome. Elsevier Academic Press, Inc.; 2004.
  • 49. Stamatakis A, Ludwig T, Meier H. RAxML-III: a fast program for maximum likelihood-based inference of large phylogenetic trees. Bioinformatics. 2005;21:456-463. doi:10.1093/bioinformatics/bti191
  • 50. Smith JJ, Sumiyama K, Amemiya CT. A living fossil in the genome of a living fossil: Harbinger transposons in the coelacanth genome. Mol Biol Evol. 2012;29:985-993. doi:10.1093/molbev/msr267