MitoFish and MitoAnnotator: a mitochondrial genome database of fish with an accurate and automatic annotation pipeline.
Journal: 2014/May - Molecular Biology and Evolution
ISSN: 1537-1719
Abstract:
MitoFish is a database of fish mitochondrial genomes (mitogenomes) that includes powerful and precise de novo annotations for mitogenome sequences. Fish occupy an important position in the evolution of vertebrates and the ecology of the hydrosphere, and mitogenomic sequence data have served as a rich source of information for resolving fish phylogenies and identifying new fish species. The importance of a mitogenomic database continues to grow at a rapid pace as massive amounts of mitogenomic data are generated with the advent of new sequencing technologies. A severe bottleneck seems likely to occur with regard to mitogenome annotation because of the overwhelming pace of data accumulation and the intrinsic difficulties in annotating sequences with degenerating transfer RNA structures, divergent start/stop codons of the coding elements, and the overlapping of adjacent elements. To ease this data backlog, we developed an annotation pipeline named MitoAnnotator. MitoAnnotator automatically annotates a fish mitogenome with a high degree of accuracy in approximately 5 min; thus, it is readily applicable to data sets of dozens of sequences. MitoFish also contains re-annotations of previously sequenced fish mitogenomes, enabling researchers to refer to them when they find annotations that are likely to be erroneous or while conducting comparative mitogenomic analyses. For users who need more information on the taxonomy, habitats, phenotypes, or life cycles of fish, MitoFish provides links to related databases. MitoFish and MitoAnnotator are freely available at http://mitofish.aori.u-tokyo.ac.jp/ (last accessed August 28, 2013); all of the data can be batch downloaded, and the annotation pipeline can be used via a web interface.
Molecular Biology and Evolution. Oct/31/2013; 30(11): 2531-2540
Published online Aug/15/2013


Introduction

Genetic information provides a foundation for the protection and management of biological diversity and enables researchers to decipher the evolutionary histories of diverse biological species. For metazoans, one of the most useful sources of information is mitochondrial DNA (Gissi et al. 2008). As a notable example, the barcode sequence of the cytochrome c oxidase subunit I (COX1) gene in mitogenomes is a particularly useful tool for species identification as it has been exhaustively collected in the Barcode of Life project (Ratnasingham and Hebert 2007). The availability of wide-ranging primers, the species-level diversity, and the compactness of the COX1 barcode sequences (∼650 bp) represent considerable advantages for the effective identification of species. However, it is widely known that information obtained from a single gene is often insufficient for resolving branches of phylogenetic trees (Miya and Nishida 2000; Arnason et al. 2002; Pacheco et al. 2011).

The MitoFish database collects complete mitogenomic data of fish, i.e., vertebrates excluding tetrapods. Fish occupy an important position in the evolution of vertebrates and the ecology of the hydrosphere, which covers approximately 70% of the Earth’s surface. Whole mitogenomic sequencing was first introduced into the phylogenetic study of fish at the end of the 20th century (Miya and Nishida 1999). Since then, the advantages of mitogenomic sequencing in evolutionary research have been demonstrated in many studies (e.g., Miya et al. 2003; Ramsden et al. 2003). As a publicly accessible, specialized sequence database of fish mitogenomes, MitoFish has received an average of more than 30,000 unique accesses annually since its launch in 2004. In recent years, it has become much easier to sequence whole mitogenomes at a reasonable cost and in unprecedented volumes due to the emergence of high-throughput DNA sequencing technologies. Today, a benchtop-type sequencer is capable of sequencing dozens of mitogenomes in a single run; thus, the volume of mitogenomic information is expected to grow rapidly in the near future.

Although metazoan mitogenomes have diverse structures (Boore 1999), vertebrate mitogenomes are typically circular, approximately 16 kb in length, and encode 13 protein-coding genes, 22 transfer RNA (tRNA) genes, and 2 ribosomal RNA (rRNA) genes. These genes are variously oriented between the two strands of the mitogenomes; typically, one strand contains the NADH dehydrogenase subunit 6 (ND6) gene and 8 tRNA genes, and the other strand contains the remaining genes. The gene order is highly conserved, with changes in position typically only observed for tRNAs. Although there are exceptions, repeated genes and gene regions are rarely observed. The annotation of these mitogenomes is intrinsically difficult because, for example, the structures of mitochondrial tRNAs sometimes degenerate, protein-coding genes adopt divergent start/stop codons, and genetic elements overlap. The divergent start codons include ATG, GTG, TTG, ATA, ATT, CTG, TTA, ATC, and ACG, and the stop codons include TAA, TAG, AGA, AGG, TA-, AG-, and T--, where “-” marks the bases missing from immature stop codons, which require the post-transcriptional addition of A bases (Satoh 2006).

New Approaches

Here, we provide a novel report of the MitoFish database, including its recent major updates. The updates include a pipeline named MitoAnnotator, which automatically annotates fish mitogenomes rapidly and accurately. The annotation process likely represents a severe bottleneck for future mitogenomic efforts due to the production of an overwhelming amount of data. In addition, RefSeq (Pruitt et al. 2012), the most comprehensive database for mitogenomes, is known to contain many incorrect mitogenomic annotations (Bernt et al. 2013), which can lead to inaccurate research results. These errors result not only from human errors committed during manual annotation steps (e.g., the annotations of the strands of mitochondrial genes are sometimes reversed) but also from the aforementioned intrinsic difficulties of annotating mitogenomes. MitoAnnotator was developed to overcome these difficulties and can also be used for the re-annotation of previously sequenced fish mitogenomes. MitoFish also contains re-annotations of already sequenced fish mitogenomes that researchers can use as standardized references when they encounter annotations that are likely to be erroneous in public databases or when they conduct large-scale comparative mitogenomic studies. With the added functionality of MitoAnnotator, MitoFish serves as a regularly updated mitogenomic database equipped with a re-annotation function (table 1).

Table 1.
Comparison Table of Mitogenomic Databases.
Database | Taxonomic Coverage | Sequence Data Type | Availability of Re-annotation Pipeline | Update Frequency/Last Update (a) | Reference
GOBASE | Eukaryotes | Complete + partial | | June 2010 | O'Brien et al. (2009)
MamMiBase | Mammals | Protein coding genes only | | June 2010 | Vasconcelos et al. (2005)
METAMiGA | Metazoans | Complete | | Daily | Feijao et al. (2006)
MitoZoa | Metazoans, excluding placozoans | Complete + nearly complete | Semiautomatic | December 2011 | D'Onorio de Meo et al. (2012)
MitoFish | Fish (vertebrates, excluding tetrapods) | Complete + partial | Fully automatic | Monthly |

(a) The last update dates were checked on 25 June, 2013.

MitoFish and MitoAnnotator are freely available at http://mitofish.aori.u-tokyo.ac.jp/ (last accessed August 28, 2013); all of the data can be batch downloaded, and the MitoAnnotator pipeline can be used via a web interface.

Results and Discussion

Database Content—Overview

The principal content of MitoFish is fish mitogenomic sequence data. The database now contains more than 1,000 complete fish mitogenomic sequences and is regularly updated by incorporating RefSeq updates every month. In addition, MitoFish provides precise mitogenomic annotations, which can be readily adopted in a wide range of studies. In addition to mitogenomes, MitoFish contains partial mitogenomic sequence data, which are updated monthly by incorporating GenBank updates (Benson et al. 2013). MitoFish includes a total of 17,000 source fish species, which is more than half of the number of currently valid fish species in the world (Nelson 2006; Froese and Pauly 2013).

The sequence data are associated with taxonomic information (i.e., orders, families, genera, and species). In addition, information on the sample voucher and registration institution is provided wherever available. For users who need more information on taxonomy, fish habitats, phenotypes, or life cycles, MitoFish provides links to related databases such as FishBase, NCBI Taxonomy, Integrated Taxonomic Information System, and the Catalog of Fishes.

Typical Users and User Interface

MitoFish is accessed via a web browser. Figure 1 shows a screenshot of the home page. The vertical menu bar on the right side allows users to move to the four main functions of MitoFish: species/taxonomy search, sequence similarity search, batch data download, and fish mitogenome annotation. The first two searches can also be performed directly from the home page.

Fig. 1.
MitoFish home page. A vertical menu bar on the right-hand side allows users to access the main functions of MitoFish. The fish species/taxonomy search and sequence similarity searches can also be performed directly from the home page.

We assume four primary types of users that correspond to these four functions. The first type includes users who are interested in particular species or groups of fish and will search for that subset of mitogenomic data. Such users can easily access pages regarding the species/taxa of interest via the species/taxonomy search function. On the mitogenome page of each species (fig. 2), a picture of the fish and a visual representation of its annotated circular mitogenome aid visual recognition, and users are able to download the mitogenomic sequence of the species along with annotation data. In addition, taxonomic information and links to external databases are summarized on the same page to aid further analysis. Users can apply the downloaded mitogenomes to their analysis directly or can, for example, construct polymerase chain reaction (PCR) primers using the complete mitogenomic data to sequence their own samples for further molecular evolutionary analysis.

Fig. 2.
Mitogenome page of individual species. The mitogenome page of each species includes a picture of the fish and a visual representation of the annotated circular mitogenome to aid visual recognition. Users can download mitogenomic sequences and the associated annotation data from the links. Information on sample vouchers and registration institutions is also provided. To facilitate further analysis, taxonomic information and links to external databases are comprehensively summarized.

The second user type includes researchers who possess fish mitochondrial DNA sequence data and want to identify the species or infer the evolutionary background. For such users, MitoFish provides a sequence similarity search function. When a user inputs a nucleotide sequence, MitoFish runs a BlastN search (Camacho et al. 2009) against the mitogenomic or mitogenomic + partial mitochondrial sequence database and outputs sequences similar to the sequence provided by the user. Links to each mitogenomic data page are included in the search result page to allow users to easily download sequences similar to the input sequence for further analysis.
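
The similarity search described above can be approximated locally with the BLAST+ command-line tools. The following is a minimal sketch, not the MitoFish server code: the function names and the database name passed to `search` are hypothetical, and it assumes a `blastn` executable and a pre-built nucleotide database are available. It requests the standard 12-column tabular output (`-outfmt 6`) and sorts the hits by e-value.

```python
import subprocess
from typing import List, NamedTuple


class BlastHit(NamedTuple):
    subject: str
    identity: float
    evalue: float
    bitscore: float


def blastn_command(query_fasta: str, db: str) -> List[str]:
    """Build a blastn call that emits the standard 12-column tabular format."""
    return ["blastn", "-query", query_fasta, "-db", db, "-outfmt", "6"]


def parse_tabular(text: str) -> List[BlastHit]:
    """Parse outfmt 6 lines: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore."""
    hits = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        hits.append(BlastHit(fields[1], float(fields[2]),
                             float(fields[10]), float(fields[11])))
    # Most significant hits (lowest e-value) first.
    return sorted(hits, key=lambda h: h.evalue)


def search(query_fasta: str, db: str) -> List[BlastHit]:
    """Run blastn (assumed to be installed) and return parsed hits."""
    result = subprocess.run(blastn_command(query_fasta, db),
                            capture_output=True, text=True, check=True)
    return parse_tabular(result.stdout)
```

Separating command construction and parsing from the actual `subprocess` call keeps the parsing logic testable without a BLAST installation.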

The third user type includes those who are interested in comparative mitogenomics. MitoFish provides precise and standardized annotations that are batch downloadable. For example, users can convert the annotations into concatenated gene sequences or synteny structure data to conduct large-scale analyses of mitogenomic evolution.
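
As an illustration of the conversion mentioned above, the sketch below concatenates gene sequences from one annotated mitogenome. The annotation layout (a mapping from gene name to 1-based inclusive start, end, and strand) and the function names are assumptions made for this example; they do not describe the actual MitoFish download format.

```python
from typing import Dict, List, Tuple

Feature = Tuple[int, int, str]  # (start, end, strand), 1-based inclusive

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract_gene(seq: str, feature: Feature) -> str:
    """Return the gene sequence in its coding orientation."""
    start, end, strand = feature
    sub = seq[start - 1:end]
    if strand == "-":
        # Genes on the opposite strand are reverse-complemented.
        sub = sub.translate(_COMPLEMENT)[::-1]
    return sub


def concatenate_genes(seq: str, features: Dict[str, Feature],
                      gene_order: List[str]) -> str:
    """Concatenate the requested genes in a fixed order, skipping absent ones."""
    return "".join(extract_gene(seq, features[g])
                   for g in gene_order if g in features)
```

Using the same `gene_order` for every species yields concatenated sequences whose gene blocks appear in a consistent order across species, although each block must still be aligned before phylogenetic analysis.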

Finally, the fourth user type refers to users who have sequenced fish mitogenomes and wish to annotate their sequences as easily as possible. MitoAnnotator includes functions intended for this user type. These functions are described in the following sections.

MitoAnnotator: Mitochondrial Genome Annotation Pipeline

MitoAnnotator is a pipeline for automatically annotating fish mitogenomic sequences with a high degree of accuracy. The high-quality, in-house manual annotation of 250 fish mitogenomes (Satoh 2006) was incorporated into the development of the pipeline and enhanced the performance of MitoAnnotator. When sequences are provided by the user in the conventional FASTA format, MitoAnnotator provides a full mitogenomic annotation without any user input in approximately 5 min. This rapid response is highly desirable in an annotation pipeline in the current era of high-throughput sequencing.

As described above, a vertebrate mitogenome typically contains 13 protein-coding genes; 22 tRNA genes (two tRNAs each for serine and leucine and one tRNA for each of the other 18 amino acids); 2 rRNA genes; and 1 control region or “d-loop”, which is a non-coding region for replication and transcription control (Boore 1999). MitoAnnotator automatically finds these 38 elements and outputs their coordinates and strands as described below (see fig. 3 for an overview).

Fig. 3.

Overview of the MitoAnnotator pipeline. Please refer to the main text and figure 4 for the details of each procedure.

A vertebrate mitogenome is a circular molecule that can be represented arbitrarily in linear representations (e.g., in FASTA format). In addition to the two complementary sequences of a given circular DNA molecule, the start position can be chosen arbitrarily. We followed a convention that places a tRNAPhe gene at the first position, as tRNAPhe genes are typically located immediately after the control region in vertebrate mitogenomic sequences (Boore 1999). Accordingly, MitoAnnotator first locates the tRNAPhe gene within the input sequence and adjusts the coordinates to place the tRNAPhe gene in the first position. The tRNAPhe gene is detected using MiTFi (Juhling et al. 2012) with an e-value threshold of 1e−5. If the tRNAPhe gene is not found (e.g., when a partial mitogenomic sequence is provided), the coordinate adjustment is not conducted, and the original sequence is directly fed into the subsequent steps.
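
The coordinate adjustment can be pictured as a rotation of the linear representation. The sketch below is a simplified illustration under stated assumptions, not the MitoAnnotator implementation: it assumes the tRNAPhe location (0-based start, exclusive end, and strand) has already been obtained from a tRNA search, that the gene does not itself span the linear boundary, and the function names are hypothetical.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def rotate_to_trna_phe(seq: str, phe_start: Optional[int],
                       phe_end: Optional[int], strand: str = "+") -> str:
    """Rotate a circular sequence so that tRNA-Phe starts at the first base.

    If tRNA-Phe was not found (e.g., a partial sequence), the input is
    returned unchanged, mirroring the behavior described in the text.
    """
    if phe_start is None or phe_end is None:
        return seq
    if strand == "-":
        # Switch to the complementary strand and convert the coordinates.
        seq = reverse_complement(seq)
        phe_start = len(seq) - phe_end
    return seq[phe_start:] + seq[:phe_start]
```

For example, `rotate_to_trna_phe("CCCCGTTTAA", 4, 8)` moves the four-base stand-in gene `GTTT` to the front of the sequence.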

MiTFi is a tool for accurately locating tRNA genes within mitogenomic sequences. Other than MiTFi, tRNAscan-SE (Lowe and Eddy 1997) is the most commonly used tool for locating tRNA genes in prokaryotic and eukaryotic genomic DNA sequences. However, tRNAs encoded in mitogenomes sometimes have exceptional structures and cannot be discovered using general methods. For example, mitochondrial tRNAs can have incomplete cloverleaf structures lacking otherwise highly conserved loops (Anderson et al. 1981) or arms (Arcari and Brownlee 1980; de Bruijn et al. 1980). Consequently, tRNAscan-SE finds mitochondrial tRNA genes with high specificity but with low sensitivity (Juhling et al. 2012). To identify tRNA genes in mitogenomic sequences with greater sensitivity, a second tool, named ARWEN (Laslett and Canback 2008), employs a heuristic algorithm that first searches for hairpin structures to avoid overlooking degenerate structures. However, in compensation, the heuristics of ARWEN result in a substantial false discovery rate. MiTFi addresses this need through covariance models (a special case of stochastic context-free grammars designed for modeling RNA consensus sequence and structure) developed from known vertebrate mitochondrial tRNAs and uses Infernal (Nawrocki et al. 2009) as its search engine.

MitoAnnotator then determines the precise coordinates of the 22 tRNA genes, 2 rRNA genes, 13 protein-coding genes, and the control region. First, MitoAnnotator searches for their approximate positions. Here, MiTFi is applied again, but this time to locate the remaining 21 tRNAs. Because exceptional mitogenomes containing multiple tRNAs have been reported in vertebrates, we allowed MiTFi to accept multiple tRNAs with e-values below 1e−5. Next, BlastX and BlastN (Camacho et al. 2009) are applied to identify protein-coding genes and rRNA genes, respectively, against a fish mitochondrial gene database created from 250 fish mitogenomes (Satoh 2006). To our knowledge, the duplication of protein-coding genes and rRNA genes has not been reported in fish mitogenomes, whereas some avian families are reported to have tandem duplications of their mitogenomic regions that include protein-coding genes (Sammler et al. 2011). Thus, we chose to permit only hits with the lowest e-values for these two gene categories.
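
The two hit-selection policies described above (all tRNA hits below the e-value threshold, but only the single best hit for each protein-coding or rRNA gene) can be sketched as follows. The `Hit` record and function names are hypothetical and stand in for parsed MiTFi and BLAST output; this is an illustration of the selection logic only.

```python
from dataclasses import dataclass
from typing import Dict, List

TRNA_EVALUE_CUTOFF = 1e-5  # threshold stated in the text


@dataclass(frozen=True)
class Hit:
    gene: str
    start: int
    end: int
    evalue: float


def keep_trna_hits(hits: List[Hit]) -> List[Hit]:
    """Keep every tRNA hit below the cutoff, so duplicated tRNAs survive."""
    return [h for h in hits if h.evalue < TRNA_EVALUE_CUTOFF]


def keep_best_hit_per_gene(hits: List[Hit]) -> Dict[str, Hit]:
    """Keep only the lowest e-value hit for each protein-coding or rRNA gene."""
    best: Dict[str, Hit] = {}
    for h in hits:
        if h.gene not in best or h.evalue < best[h.gene].evalue:
            best[h.gene] = h
    return best
```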

Next, to obtain their precise coordinates, the following steps are conducted. For tRNAs, the 3′ termini of some tRNA annotations in RefSeq lack a base after the acceptor stems (and before the CCA tails). MitoAnnotator consistently includes the +1 base to standardize the annotations. If multiple tRNA genes are identified, each of them is included in the output annotations, and special notes are added for the tRNA genes with the lowest e-values. In general, duplicated tRNA genes become redundant and quickly degenerate during evolution because of their weak functional constraints. In mitochondrial genomes, however, duplicated tRNA genes may be maintained as punctuation markers that keep their flanking elements intact through splicing (Mabuchi et al. 2004). Thus, the annotation of degenerated tRNA genes is very important because they may retain biological functions and play an important role in deciphering the evolution of mitogenomic structures.

Regarding rRNAs, we adopted the tRNA punctuation model as a principal criterion (Ojala et al. 1980). According to this model, the processing of flanking tRNAs directly produces both the 5′ and 3′ ends of rRNAs (and mRNAs) or results in the location of rRNA genes and tRNA genes on mitogenomes without gaps between them. However, our analysis of the large fish mitogenome dataset suggested that this model would not be applicable to some rRNAs. Therefore, in the event that the tRNA punctuation model resulted in exceptionally long 12S rRNA genes (more than 1,000 bp) or 16S rRNA genes (more than 1,850 bp), we employed the BlastN search results directly to annotate the rRNA genes. In such cases, it is intrinsically difficult to determine rRNA gene structures from genomic sequences alone; transcribed rRNAs must be directly sequenced.
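
The rRNA rule above reduces to a length check on the interval between the flanking tRNAs. The sketch below illustrates that decision using the two thresholds given in the text; the function name, argument layout, and 1-based inclusive coordinates are assumptions made for this example.

```python
from typing import Tuple

# Maximum lengths (bp) beyond which the punctuation model is not trusted.
MAX_RRNA_LENGTH = {"12S": 1000, "16S": 1850}


def annotate_rrna(name: str, prev_trna_end: int, next_trna_start: int,
                  blast_start: int, blast_end: int) -> Tuple[int, int, str]:
    """Return (start, end, evidence) for an rRNA gene.

    Under the tRNA punctuation model the rRNA fills the whole interval
    between its flanking tRNAs. If that interval is exceptionally long,
    the BlastN hit coordinates are used instead.
    """
    start = prev_trna_end + 1
    end = next_trna_start - 1
    if end - start + 1 > MAX_RRNA_LENGTH[name]:
        return blast_start, blast_end, "blastn"
    return start, end, "punctuation"
```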

Protein-coding genes require more complex rules (fig. 4). First, MitoAnnotator continually extends the end positions of the BlastX-hit regions by three bases until a vertebrate mitochondrial stop codon is found (i.e., TAA, TAG, AGA, or AGG [Osawa et al. 1989]). When the search does not find a stop codon before the succeeding element, we allowed the ATP synthase Fo subunit 8 (ATP8), COX1, NADH dehydrogenase subunit 4L (ND4L), subunit 5 (ND5), and ND6 genes to overlap with the succeeding element. For the remaining eight protein-coding genes, MitoAnnotator allows the genes to overlap with the succeeding element only if their coding strands are different. If the coding strands are the same or if the above procedures also cannot identify a stop codon in the succeeding element, then MitoAnnotator sets the −1 position of the start position of the succeeding element as the end position.

Next, the start positions are determined (fig. 4B). Because mitochondrial genomes are typically very efficient and have little space between the coding elements (Ojala et al. 1980), MitoAnnotator chooses the mitochondrial start codon (i.e., ATG or GTG [Desjardins and Morais 1991]) that is farthest from the stop codon and that does not overlap with the preceding element. In this case, the following exceptions were introduced by again referring to our in-house dataset. First, for the NADH dehydrogenase subunit 1 (ND1), subunit 2 (ND2), subunit 3 (ND3), and ND5 genes, one-base overlaps with the preceding element are allowed. Second, for the ATP synthase Fo subunit 6 (ATP6) and ND4 genes, overlaps of up to 20 bases are allowed. Third, if the above criteria fail to identify a start codon, a search for special start codons is permitted for the following four genes: CTG for ATP6, TTG for ND1, ATA and ATT for ND3, and TTG for cytochrome c oxidase subunit II (COX2). Fourth, if the start codons found are contiguous (e.g., ATGGTGATG), the last ATG codon is accepted as the start site. In the case that the contiguous start codons do not contain ATG, the last codon is chosen. Fifth, if no start codons are identified using the above criteria, MitoAnnotator searches for the farthest TTG, ATA, ATT, CTG, TTA, ATC, or ACG within +30 bp from the end position of the preceding element. Sixth, if this process also fails to identify a start codon, MitoAnnotator sets the first in-frame position after the preceding element as the start position (i.e., +1, +2, or +3 from the end position of the preceding element, depending on the reading frame). Seventh, if the identified genes are unusually long (over 220 bp for ATP8, 750 bp for COX2, 800 bp for COX3, 1,000 bp for ND1, 355 bp for ND3, and 535 bp for ND6), the furthest start codon within the coding frame (within 170–220 bp for ATP8, 650–750 bp for COX2, 700–800 bp for COX3, 900–1,000 bp for ND1, 305–355 bp for ND3, and 435–535 bp for ND6) is chosen as the start site. Eighth, if the seventh rule cannot find a start codon, the last start codon that makes the length of the gene closest to the threshold lengths is chosen.
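
The two base rules for protein-coding genes (extend the end position codon by codon to a stop codon, then pick the farthest in-frame ATG or GTG that does not overlap the preceding element) can be sketched as below. This is a deliberately reduced illustration: it covers forward-strand genes only, uses 0-based coordinates with exclusive ends, and omits all gene-specific overlap allowances and exceptions listed in the text. The function names are hypothetical.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "AGA", "AGG"}
START_CODONS = {"ATG", "GTG"}


def extend_to_stop(seq: str, hit_end: int, next_start: int) -> Optional[int]:
    """Extend an in-frame end (exclusive) three bases at a time until the
    final codon is a stop codon. Returns None if no stop codon is found
    before the succeeding element, which begins at index next_start."""
    end = hit_end
    while end <= next_start:
        if end >= 3 and seq[end - 3:end] in STOP_CODONS:
            return end
        end += 3
    return None


def farthest_start(seq: str, prev_end: int, stop_end: int) -> Optional[int]:
    """Return the index of the in-frame start codon farthest from the stop
    codon that does not overlap the preceding element (exclusive end
    prev_end), or None if there is none."""
    frame = stop_end % 3
    pos = prev_end
    while pos % 3 != frame:
        pos += 1
    # Scan upstream-to-downstream; the first match is farthest from the stop.
    for i in range(pos, stop_end - 3, 3):
        if seq[i:i + 3] in START_CODONS:
            return i
    return None
```

A `None` result from either function marks the point at which the fallback rules described in the text would take over.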

Fig. 4.

Workflows to determine the coordinates of protein-coding genes. Workflows to determine the end position (A) and the start position (B) of protein-coding genes are presented.

Finally, the control region is annotated if the above procedures provide any interval region longer than 600 bp.
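
The control-region step amounts to scanning for unannotated intervals longer than 600 bp. The sketch below illustrates this under simplifying assumptions: coordinates are 1-based and inclusive, the sequence has already been adjusted so that annotation starts near position 1, and a gap at the start of the sequence is not merged with a gap at its end. The function name is hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive


def find_long_intervals(elements: List[Interval], genome_length: int,
                        min_length: int = 600) -> List[Interval]:
    """Return unannotated intervals longer than min_length bp."""
    intervals = []
    covered_to = 0  # last annotated position seen so far
    for start, end in sorted(elements):
        if start - covered_to - 1 > min_length:
            intervals.append((covered_to + 1, start - 1))
        # Track the running maximum so overlapping elements are handled.
        covered_to = max(covered_to, end)
    if genome_length - covered_to > min_length:
        intervals.append((covered_to + 1, genome_length))
    return intervals
```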

Performance of MitoAnnotator

The performance of the MitoAnnotator pipeline was thoroughly examined using additional mitogenomic data from 42 newly sequenced fish (Miya et al. 2013; Sado T and Miya M, unpublished data; the list of species names and accession numbers is provided in the Material and Methods section) that were not included in the 250-fish mitogenomic dataset. The annotations were conducted very efficiently, taking approximately 5 min per mitogenome, and two expert curators (T.P.S. and T.S.) examined the results manually. T.P.S. had sequenced more than 200 fish mitogenomes in 14 years and T.S. had sequenced more than 400 in 8 years. MitoAnnotator correctly annotated all of the 42 mitogenomes, not only identifying the existence of the 38 mitogenomic elements but also precisely locating their start and stop positions at the single-nucleotide level.

Some fish mitogenomes have unusual structures (Inoue et al. 2003b). To evaluate whether MitoAnnotator can also correctly annotate exceptional cases, we downloaded and evaluated 10 mitogenomes known to have exceptional structures. Their species names, RefSeq accession numbers, and the characteristic parts of their mitogenomic structures are as follows: Aspasma minima (NC_008130, T-CR-I-Q-F-P-12S); Aulostomus chinensis (NC_010269, T-P-CR-ND1-M-ND2-W-Q-F-12S); Ceratias uranoscopus (NC_013882, 16S-ND1-Q-CR-L-I-M-M-ND2); Chauliodus sloani (NC_003159, W-A-N-Y-C-C-C-C-C-COI); Conger myriaster (NC_002761, ND5-CYB-T-CR-ND6-E-P); Sigmops gracilis (NC_002574, ND6-CYB-E-P-T-CR); Coelorinchus kishinouyei (NC_003169, ND6-CYB-T-P-E-CR); Cryptopsaras couesii (NC_013880, 16S-NC-I-ND1-Q-L-M-ND2); Diaphus splendidus (NC_003164, ND1-I-M-Q-ND2); and Eurypharynx pelecanoides (NC_005299, highly rearranged structure [Inoue et al. 2003b]). The experts confirmed that the annotations were correct. For example, a tandem repeat of five tRNACys genes on the C. sloani mitogenome was correctly annotated.

Some tools already exist to facilitate the annotation of metazoan mitogenomes. DOGMA (Wyman et al. 2004) is a pioneering tool in this field that helps researchers annotate mitochondrial and chloroplast genomes in a semiautomatic manner. Because DOGMA uses rather simple approaches to identify coding and noncoding genes, it requires users to manually check the results. MITOS (Bernt et al. 2013) is an automated pipeline for the de novo annotation of metazoan mitogenomes and comes closest to what MitoAnnotator achieves. First, the biggest difference between MITOS and MitoAnnotator lies in their running times: MITOS requires more than 1 h to annotate one mitogenome, whereas MitoAnnotator requires approximately 5 min. Second, annotations by MITOS have been found to have many inconsistencies with annotations performed by experts, most frequently for the stop codons of protein-coding genes. We suppose that this is most likely because MITOS too strictly requires the lengths of protein-coding genes to be multiples of three. For example, in the annotation of the exceptional C. sloani mitogenome, MITOS employed CTT as a stop codon for the cytochrome b (CytB) gene, which is unlikely. Other types of inconsistencies observed using MITOS are summarized in table 2. Third, only MitoAnnotator offers a coordinate adjustment function, which is an important feature for the end user. In MITOS, if a mitogenomic genetic element overlaps the boundary between the head and tail of a linear representation of a mitogenomic sequence, it is not annotated. In contrast, users can create a standardized annotation using the coordinate adjustment function of MitoAnnotator immediately after obtaining an assembled mitogenomic sequence. Last but not least, we envision that the framework of MitoAnnotator can also be applied to other groups of vertebrates once sufficient amounts of high-quality manual annotation data are obtained and the pipeline is appropriately modified.

Table 2.
Numbers of Genomes Whose Automatic Annotations Were Inconsistent with Annotations Performed by Experts for 42 Mitogenomes.
Category of Inconsistent Annotations (a) | MitoAnnotator | MITOS
Annotation of additional genes | 0 | 3 (b)
Different start positions of protein-coding genes | 0 | 42
Different stop positions of protein-coding genes | 0 | 42
Different start positions of tRNA genes | 0 | 0
Different stop positions of tRNA genes | 0 | 3

(a) We excluded start/stop positions of rRNA genes from this comparison table because the annotation of rRNA genes is intrinsically difficult as described in the text.

(b) Each of the three additional genes predicted by MITOS was a second protein-coding gene copy located in the d-loop of each mitogenome. These genes were very short (the 105-bp ATP8 gene of Nesiarchus nasutus, the 324-bp ND6 gene of Kali indica, and the 438-bp ND2 gene of Diplospinus multistriatus) and are likely to be misannotations.

In conclusion, MitoAnnotator is a fully automatic pipeline that efficiently annotates fish mitogenomes with high accuracy. Annotation results obtained from MitoAnnotator can be directly fed into public sequence repository services, thereby greatly reducing the efforts of researchers in annotating newly sequenced mitogenomes. In combination with MitoFish, we believe that MitoAnnotator will accelerate studies on fish evolution as data collection continues to become easier and less expensive.

Material and Methods

MitoFish Server

The server runs on a Linux operating system, and an Apache HTTP Server provides the web services. A MySQL database system stores information on each fish species. Perl and Ruby scripts process all of these data and the requests from users. All of these resources have been extensively used and are well supported. We have taken care to make MitoFish easily accessible via search engines; thus, search queries such as fish mitogenome, fish mitochondrial genome, and fish mitochondria database on google.com return MitoFish as the top hit as of June 2013.

Database Update

RefSeq and GenBank entries are downloaded every month to update MitoFish. For RefSeq, mitogenomic entries are batch downloaded from the FTP URL ftp://ftp.ncbi.nih.gov/refseq/release/mitochondrion/ (last accessed August 28, 2013). All GenBank entries are downloaded, and those having the feature organelle = mitochondrion are selected. For both databases, sequence entries whose NCBI taxonomy classification entries are under Myxiniformes, Petromyzontiformes, Chondrichthyes, Actinopterygii, Coelacanthiformes, or Dipnoi are selected and incorporated into the mitogenomic and BLAST databases.
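
The taxonomic selection step can be illustrated with a simple lineage filter. In the sketch below, the six taxon names come from the text, whereas the record layout (a mapping from an entry identifier to its list of lineage names) and the function names are assumptions made for this example.

```python
from typing import Dict, List

# Taxa whose members are treated as fish (vertebrates excluding tetrapods).
FISH_TAXA = {
    "Myxiniformes", "Petromyzontiformes", "Chondrichthyes",
    "Actinopterygii", "Coelacanthiformes", "Dipnoi",
}


def is_fish(lineage: List[str]) -> bool:
    """True if any rank in the taxonomic lineage is one of the fish taxa."""
    return any(taxon in FISH_TAXA for taxon in lineage)


def select_fish_entries(entries: Dict[str, List[str]]) -> List[str]:
    """Return the identifiers of entries classified under a fish taxon."""
    return [entry_id for entry_id, lineage in entries.items()
            if is_fish(lineage)]
```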

Mitochondrial Genomes

Newly sequenced mitogenomes from 42 diverse fish species were used in evaluating the performance of the MitoAnnotator pipeline. The 42 species are as follows (the International Nucleotide Sequence Database Collaboration accession numbers are provided): Acanthocybium solandri (AP012945), Anchoviella sp. (AP012524), Aphanopus carbo (AP012944), Ariomma indica (AP012513), Ariomma lurida (AP012512), Assurger anzac (AP012508), Benthodesmus tenuis (AP012522), Dionda episcopa (AP012077), Diplospinus multistriatus (AP012523), Epinnula magistralis (AP012943), Eumegistus illustris (AP012497), Euthynnus affinis (AP012946), Evoxymetopon poeyi (AP012509), Gempylus serpens (AP012502), Gymnosarda unicolor (AP012510), Hemitremia flammea (AP012078), Icichthys lockingtoni (AP012511), Kali indica (AP012500), Luciocyprinus striolatus (AP012525), Luxilus chrysocephalus (AP012079), Macrhybopsis gelida (AP012080), Margariscus margarita (AP012081), Microphysogobio yaluensis (AP012073), Nesiarchus nasutus (AP012503), Nocomis biguttatus (AP012082), Notropis atherinoides (AP012083), Notropis baileyi (AP012084), Opsopoeodus emiliae (AP012085), Pampus punctatissimus (AP012516), Peprilus burti (AP012947), Promethichthys prometheus (AP012504), Pteraclis aesticola (AP012499), Rastrelliger kanagurta (AP012948), Ruvettus pretiosus (AP012506), Sarda orientalis (AP012949), Scombrolabrax heterolepis (AP012517), Sphyraena japonica (AP012501), Tanakia tanago (AP012526), Taractes asper (AP012498), Tetragonurus atlanticus (AP012515), Tetragonurus cuvieri (AP012514), and Thyrsitoides marleyi (AP012505). The extracted mitogenomes were amplified via the long PCR technique (Miya and Nishida 1999; Inoue et al. 2003a) and sequenced with the Sanger sequencing technique.

Acknowledgments

The authors thank Keiichi Matsuura for helpful discussion; Jun G. Inoue, Satoko Koide, and Tomoyuki Yamada for technical support; and the editor and four anonymous reviewers for their valuable comments. This work was supported by the Japan Society for the Promotion of Science (Grant Numbers 13556028, 19207007, 23370041, 23710231, 178087, 248046, and 258048) and the Japan Science and Technology Agency (CREST).

References

  • 1. Anderson S, Bankier AT, Barrell BG, (14 co-authors). Sequence and organization of the human mitochondrial genome. Nature. 1981;290:457-465.
  • 2. Arcari P, Brownlee GG. The nucleotide sequence of a small (3S) seryl-tRNA (anticodon GCU) from beef heart mitochondria. Nucleic Acids Res. 1980;8:5207-5212.
  • 3. Arnason U, Adegoke JA, Bodin K, Born EW, Esa YB, Gullberg A, Nilsson M, Short RV, Xu X, Janke A. Mammalian mitogenomic relationships and the root of the eutherian tree. Proc Natl Acad Sci U S A. 2002;99:8151-8156.
  • 4. Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. GenBank. Nucleic Acids Res. 2013;41:D36-D42.
  • 5. Bernt M, Donath A, Juhling F, Externbrink F, Florentz C, Fritzsch G, Putz J, Middendorf M, Stadler PF. MITOS: improved de novo metazoan mitochondrial genome annotation. Mol Phylogenet Evol. Forthcoming 2013;69:313-319.
  • 6. Boore JL. Animal mitochondrial genomes. Nucleic Acids Res. 1999;27:1767-1780.
  • 7. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10:421.
  • 8. D'Onorio de Meo P, D'Antonio M, Griggio F, Lupi R, Borsani M, Pavesi G, Castrignano T, Pesole G, Gissi C. MitoZoa 2.0: a database resource and search tools for comparative and evolutionary analyses of mitochondrial genomes in Metazoa. Nucleic Acids Res. 2012;40:D1168-D1172.
  • 9. de Bruijn MH, Schreier PH, Eperon IC, Barrell BG, Chen EY, Armstrong PW, Wong JF, Roe BA. A mammalian mitochondrial serine transfer RNA lacking the “dihydrouridine” loop and stem. Nucleic Acids Res. 1980;8:5213-5222.
  • 10. Desjardins P, Morais R. Nucleotide sequence and evolution of coding and noncoding regions of a quail mitochondrial genome. J Mol Evol. 1991;32:153-161.
  • 11. Feijao PC, Neiva LS, de Azeredo-Espin AM, Lessinger AC. AMiGA: the arthropodan mitochondrial genomes accessible database. Bioinformatics. 2006;22:902-903.
  • 12. Froese R, Pauly D. FishBase. World Wide Web electronic publication. 2013 [cited 2013 August 28]. www.fishbase.org, version (04/2013).
  • 13. Gissi C, Iannelli F, Pesole G. Evolution of the mitochondrial genome of Metazoa as exemplified by comparison of congeneric species. Heredity. 2008;101:301-320.
  • 14. Inoue JG, Miya M, Tsukamoto K, Nishida M. Basal actinopterygian relationships: a mitogenomic perspective on the phylogeny of the “ancient fish”. Mol Phylogenet Evol. 2003a;26:110-120.
  • 15. Inoue JG, Miya M, Tsukamoto K, Nishida M. Evolution of the deep-sea gulper eel mitochondrial genomes: large-scale gene rearrangements originated within the eels. Mol Biol Evol. 2003b;20:1917-1924.
  • 16. Juhling F, Putz J, Bernt M, Donath A, Middendorf M, Florentz C, Stadler PF. Improved systematic tRNA gene annotation allows new insights into the evolution of mitochondrial tRNA structures and into the mechanisms of mitochondrial genome rearrangements. Nucleic Acids Res. 2012;40:2833-2845.
  • 17. Laslett D, Canback B. ARWEN: a program to detect tRNA genes in metazoan mitochondrial nucleotide sequences. Bioinformatics. 2008;24:172-175.
  • 18. Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25:955-964.
  • 19. Mabuchi K, Miya M, Satoh TP, Westneat MW, Nishida M. Gene rearrangements and evolution of tRNA pseudogenes in the mitochondrial genome of the parrotfish (Teleostei: Perciformes: Scaridae). J Mol Evol. 2004;59:287-297.
  • 20. Miya M, Nishida M. Organization of the mitochondrial genome of a deep-sea fish, Gonostoma gracile (Teleostei: Stomiiformes): first example of transfer RNA gene rearrangements in bony fishes. Marine Biotechnol. 1999;1:416-426.
  • 21. Miya M, Nishida M. Use of mitogenomic information in teleostean molecular phylogenetics: a tree-based exploration under the maximum-parsimony optimality criterion. Mol Phylogenet Evol. 2000;17:437-455.
  • 22. Miya M, Takeshima H, Endo H, (12 co-authors). Major patterns of higher teleostean phylogenies: a new perspective based on 100 complete mitochondrial DNA sequences. Mol Phylogenet Evol. 2003;26:121-138.
  • 23. Miya M, Friedman M, Satoh TP, (14 co-authors). Evolutionary origin of the Scombridae (tunas and mackerels): members of a Paleogene adaptive radiation with 14 other pelagic fish families. PLOS ONE. Forthcoming 2013.
  • 24. Nawrocki EP, Kolbe DL, Eddy SR. Infernal 1.0: inference of RNA alignments. Bioinformatics. 2009;25:1335-1337.
  • 25. Nelson JS. Fishes of the world. 2006. Hoboken (NJ): John Wiley.
  • 26. O'Brien EA, Zhang Y, Wang E, Marie V, Badejoko W, Lang BF, Burger G. GOBASE: an organelle genome database. Nucleic Acids Res. 2009;37:D946-D950.
  • 27. Ojala D, Merkel C, Gelfand R, Attardi G. The tRNA genes punctuate the reading of genetic information in human mitochondrial DNA. Cell. 1980;22:393-403.
  • 28. Osawa S, Ohama T, Jukes TH, Watanabe K. Evolution of the mitochondrial genetic code. I. Origin of AGR serine and stop codons in metazoan mitochondria. J Mol Evol. 1989;29:202-207.
  • 29. Pacheco MA, Battistuzzi FU, Lentino M, Aguilar RF, Kumar S, Escalante AA. Evolution of modern birds revealed by mitogenomics: timing the radiation and origin of major orders. Mol Biol Evol. 2011;28:1927-1942.
  • 30. Pruitt KD, Tatusova T, Brown GR, Maglott DR. NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res. 2012;40:D130-D135.
  • 31. Ramsden SD, Brinkmann H, Hawryshyn CW, Taylor JS. Mitogenomics and the sister of Salmonidae. Trends Ecol Evol. 2003;18:607-610.
  • 32. Ratnasingham S, Hebert PD. bold: The Barcode of Life Data System (http://www.barcodinglife.org). Mol Ecol Notes. 2007;7:355-364.
  • 33. Sammler S, Bleidorn C, Tiedemann R. Full mitochondrial genome sequences of two endemic Philippine hornbill species (Aves: Bucerotidae) provide evidence for pervasive mitochondrial DNA recombination. BMC Genomics. 2011;12:35.
  • 34. Satoh TP. Comparative study on the structural features of fish mitochondrial genomes [doctoral thesis]. 2006. Tokyo (Japan): The University of Tokyo.
  • 35. Vasconcelos AT, Guimaraes AC, Castelletti CH, (23 co-authors). MamMiBase: a mitochondrial genome database for mammalian phylogenetic studies. Bioinformatics. 2005;21:2566-2567.
  • 36. Wyman SK, Jansen RK, Boore JL. Automatic annotation of organellar genomes with DOGMA. Bioinformatics. 2004;20:3252-3255.