Comparative genomics of ten solanaceous plastomes.
Journal: 2014/December - Advances in Bioinformatics
ISSN: 1687-8027
Abstract:
Availability of complete plastid genomes of ten solanaceous species, Atropa belladonna, Capsicum annuum, Datura stramonium, Nicotiana sylvestris, Nicotiana tabacum, Nicotiana tomentosiformis, Nicotiana undulata, Solanum bulbocastanum, Solanum lycopersicum, and Solanum tuberosum, provided us with an opportunity to conduct their in silico comparative analysis in depth. The complete chloroplast genomes and the LSC and SSC regions of the three Solanum species are smaller than those of any other species studied (exceptions: the SSC regions of A. belladonna and C. annuum). The AT content of coding regions was found to be lower than that of noncoding regions. A duplicate copy of the trnH gene in C. annuum and two alternative tRNA genes for proline in D. stramonium were observed for the first time in this analysis. Further, homology search revealed the presence of the rps19 pseudogene and infA gene in A. belladonna and D. stramonium, a region identical to the rps19 pseudogene in C. annuum, and orthologues of the sprA gene in another six species. Among the eighteen intron-containing genes, 3 genes have two introns and 15 genes have one intron. The longest insertion was found in the accD gene of C. annuum. Phylogenetic analysis using concatenated protein-coding sequences gave two clades, one for Nicotiana species and another for Solanum, Capsicum, Atropa, and Datura.
Advances in Bioinformatics. Dec/31/2013; 2014
Published online Nov/16/2014

1. Introduction

Chloroplasts are essential organelles of plant cells that possess the enzymatic machinery for photosynthesis, which provides essential energy to plants. Besides photosynthesis, chloroplasts are also involved in biosynthesis of fatty acids, amino acids, pigments, and vitamins [1, 2]. Despite enormous divergence in whole plant form and habitat, chloroplast structure and function have remained remarkably conserved, which might be due to intense evolutionary selection pressures associated with the functional requirements of photosynthesis [3–7]. The chloroplast genome is a reduced genome derived from a cyanobacterial ancestor that was captured early in the evolution of the eukaryotic cell [8, 9]. Among the three genomes of the plant cell, the plastome is the most gene dense, with more than 100 genes in a genome of only 120 to 210 kb [10]. In the last two decades, the nucleotide sequences of a large number of plastid genomes have been published, leading to a better understanding of their organization and evolution [2, 11, 12]. Currently, about 470 eukaryotic chloroplast genomes have been sequenced completely (http://www.ncbi.nlm.nih.gov/genomes/GenomesHome.cgi?taxid=2759&hopt=html), with the best representation from flowering plants.

Most land plant chloroplast genomes are composed of a single circular chromosome with a quadripartite structure which includes two copies of an inverted repeat (IR) region that separates the large and small single copy regions (LSC and SSC). Genes of chloroplast genomes of higher plants can be divided into three broad categories [13, 14]. In the first, there are genetic system genes encoding for rRNAs, tRNAs, ribosomal proteins, and RNA polymerase subunits. The second category is comprised of genes for photosynthesis which encode subunits of the two photosystems, the cytochrome b6f complex and the ATP synthase. Open reading frames (orfs) of unknown function constitute the third category. Besides, there are some other genes coding for different kinds of proteins including infA, matK, clpP, cemA, accD, and ccsA. Although overall chloroplast genome organization is highly conserved among taxa, structural rearrangements due to inversions have been reported in different taxa like Campanulaceae [15], Cyatheaceae [16], Fabaceae [17], Funariaceae [18], Geraniaceae [19], Onagraceae [20], and Poaceae [21, 22]. Besides structural rearrangements, sequence polymorphisms have also been reported in some cereals [23, 24] and Oenothera species [20]. These studies revealed that highly divergent sequences were concentrated in specific regions called “hotspots.” Such sequence polymorphisms have been used to derive phylogenetic relationships among species.

Solanaceae is an important family of dicots comprising more than 3000 species placed within about 90 genera. It is an ethnobotanically important family that is extensively utilized by humans and has recently become a model for comparative and evolutionary genomics research. Few efforts have been made to study the variations in chloroplast genomes of the Solanaceae family using in silico tools. Most of these attempts have concentrated on comparing a newly sequenced chloroplast genome with the available complete chloroplast genomes from some members of this family [25–29]. The availability of complete nucleotide sequences of plastid genomes of ten solanaceous species, Atropa belladonna (NC_004561.1; [30]), Capsicum annuum (NC_018552.1; [29]), Datura stramonium (NC_018117.1; Li et al. (unpublished)), Nicotiana sylvestris (NC_007500.1; [28]), Nicotiana tabacum (NC_001879.2; [31]), Nicotiana tomentosiformis (NC_007602.1; [28]), Nicotiana undulata (NC_016068.1; [32]), Solanum bulbocastanum (NC_007943.1; [26]), Solanum lycopersicum (NC_007898.2; [27]), and Solanum tuberosum (NC_008096.2; [33]), provided us with an opportunity to conduct their in silico comparative analysis in depth. Hence, the present study is an attempt to compare the genome organization, structure, and coding capacity of chloroplast genomes of ten solanaceous species. The study focuses on length mutations, intron-containing genes, grouping of genes in different identity classes based on pairwise comparison of individual genes, and InDel analysis of divergent genes.

2. Materials and Methods

2.1. Sequence Analysis

Whole chloroplast genome sequences as well as individual gene and protein sequences of the ten solanaceous species were obtained from the “Organelle Genome Resources” section of NCBI in GenBank as well as in FASTA format. Sequence regions corresponding to various genomic features including genes, exons, introns, and CDS were extracted from the GenBank files using the Extractfeat, Extractseq, and Featcopy programs from the Jemboss package. AT percentage for different genomic regions was calculated using the Wordcount and Union programs from the Jemboss package. Pairwise comparison of gene sequences was done using the NCBI BLAST program, and multiple sequence alignment of nucleotide as well as protein sequences was done using ClustalW. Alignments of protein sequences for some of the genes were manually edited to correspond to InDels observed in alignments of their nucleotide sequences.
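As a rough illustration of the AT percentage calculation described above, the following Python sketch counts bases directly. It is not the Wordcount/Union workflow used in the study: the function name `at_percent` and the two toy fragments are hypothetical, and ignoring ambiguous bases is an assumption.

```python
def at_percent(seq: str) -> float:
    """Return the percentage of A and T among unambiguous bases (A, C, G, T).

    Assumption: ambiguous symbols such as N are ignored rather than counted.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    total = at + gc
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return 100.0 * at / total


# Hypothetical toy fragments, not taken from any of the ten plastomes.
coding_fragment = "ATGGCCGCTAAAGGCTGA"
noncoding_fragment = "TTTATAATTCAATATTAAG"

print(f"coding:    {at_percent(coding_fragment):.2f}% AT")
print(f"noncoding: {at_percent(noncoding_fragment):.2f}% AT")
```

Applied to the extracted coding and noncoding regions of a plastome, the same function would give values comparable to the AT content rows of Table 1, provided the regions are extracted in the same way.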

2.2. Phylogenetic Analysis of Concatenated Protein-Coding Genes

Seventy-five protein-coding genes of the plastomes of ten solanaceous species and two outgroup species (Daucus carota and Coffea arabica) were selected for phylogenetic analysis from the total of 79 classified protein-coding genes; accD, rpl20, ycf1, and ycf15 were excluded. Ycf15 was excluded because it is absent from the plastomes of both outgroup species, while the other three were excluded because of their high levels of variation. Multiple sequence alignment of each gene was obtained using ClustalW (https://www.ebi.ac.uk/Tools/msa/clustalw2/). These alignments were then concatenated using standalone BIOEDIT version 7.25 (http://www.mbio.ncsu.edu/bioedit/bioedit.html), and a maximum likelihood phylogenetic tree with 500 bootstrap iterations was constructed using PhyML v3.0 (http://www.atgc-montpellier.fr/phyml/). A graphical view of the tree was generated using Archaeopteryx 0.988 SR (https://sites.google.com/site/cmzmasek/home/software/archaeopteryx).
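The concatenation step can be sketched in a few lines of Python. This is only an illustration of the idea, not the BIOEDIT procedure used in the study: the function `concatenate_alignments`, the taxon labels, and the toy alignments are hypothetical. Genes missing from any taxon are skipped, which mirrors the reason given above for leaving out ycf15.

```python
from typing import Dict, List, Tuple


def concatenate_alignments(
    gene_alignments: Dict[str, Dict[str, str]], taxa: List[str]
) -> Tuple[List[str], Dict[str, str]]:
    """Join per-gene alignments end to end for every taxon.

    Genes absent from at least one taxon are skipped. Returns the list of
    genes actually used and the concatenated sequence per taxon.
    """
    used: List[str] = []
    parts: Dict[str, List[str]] = {t: [] for t in taxa}
    for gene in sorted(gene_alignments):
        aln = gene_alignments[gene]
        if any(t not in aln for t in taxa):
            continue  # gene not present in every taxon
        if len({len(aln[t]) for t in taxa}) != 1:
            raise ValueError(f"{gene}: aligned sequences differ in length")
        used.append(gene)
        for t in taxa:
            parts[t].append(aln[t])
    return used, {t: "".join(p) for t, p in parts.items()}


# Hypothetical toy alignments (gaps written as '-').
toy = {
    "geneA": {"sp1": "ATG-CA", "sp2": "ATGGCA", "out": "ATGACA"},
    "geneB": {"sp1": "TTA", "sp2": "TTG", "out": "CTA"},
    "geneC": {"sp1": "GGG", "sp2": "GGA"},  # absent from the outgroup
}
used_genes, supermatrix = concatenate_alignments(toy, ["sp1", "sp2", "out"])
print(used_genes)
for taxon, seq in supermatrix.items():
    print(taxon, seq)
```

The resulting per-taxon strings are what a tree-building program would receive as a single alignment.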

3. Results and Discussion

3.1. Comparison of Properties of Chloroplast Genomes

Table 1 compares the plastid genomes of the ten solanaceous species with respect to genome size (complete plastid genome and LSC, SSC, and IR regions); percentage of coding regions, introns, and intergenic regions; and AT content of the overall plastid genomes as well as of coding and noncoding regions. The total plastid genome size ranged from 155296 bp (S. tuberosum) to 156781 bp (C. annuum). The large size of the C. annuum plastome can be attributed to its large LSC region as compared to other species. In contrast, the SSC region of C. annuum was the smallest among the species compared. The largest IR region was found in A. belladonna. Among the four Nicotiana species studied, N. sylvestris and N. tabacum were closer to each other in the size of the complete genome (a difference of only 2 bp) and of the LSC, SSC, and IR regions than to the plastome of any other species studied. However, the percentage of coding regions was slightly higher in N. sylvestris (61.49%) than in N. tabacum (61.12%). The complete chloroplast genome and the LSC and SSC regions of the three Solanum species are smaller than those of any other species studied, except that the SSC regions of A. belladonna (18008 bp) and C. annuum (17849 bp) are smaller still. However, the IR region of the Solanum species is larger than that of the Nicotiana species. The coding region percentage was higher in Nicotiana species than in all other species, with a maximum for N. undulata (63.12%) and a minimum for S. tuberosum (58.45%). Introns accounted for a maximum of 12.82% of the plastome in S. bulbocastanum and a minimum of 11.62% in D. stramonium. The maximum percentage of intergenic regions (29.19%) was observed in D. stramonium and the minimum (24.19%) in N. undulata. The AT content of noncoding regions was higher than that of coding regions in all ten species studied. Similarly, protein-coding regions showed a higher AT content than RNA-coding genes, which can be explained by the requirement of more GC base pairs for proper folding of highly structured ribosomal RNAs and tRNAs [1327]. Comparison of AT content in LSC, SSC, and IR regions reveals that AT content was the highest in SSC regions and the lowest in IR regions. Some earlier studies have also shown a similar distribution of AT content, with the lowest AT content in the IR region and the highest in the SSC region [2, 27, 34, 35]. The low AT content of IR regions reflects the low AT content of the four ribosomal RNA genes in this region.
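Because the plastome is quadripartite, the sizes of the LSC, IRB, SSC, and IRA regions should add up to the genome size. The Python sketch below runs that check on the values printed in Table 1. The dictionary name `TABLE1_SIZES` and the function `region_sum_gaps` are hypothetical. With the values as printed, nine species sum exactly, while the S. bulbocastanum regions sum to 155342 bp against a listed genome size of 155371 bp; the sketch only reports the gap and does not indicate which figure is in error.

```python
# (genome, LSC, IRB, SSC, IRA) sizes in bp, as printed in Table 1.
TABLE1_SIZES = {
    "ABE": (156687, 86869, 25905, 18008, 25905),
    "CAN": (156781, 87366, 25783, 17849, 25783),
    "DST": (155871, 86297, 25563, 18448, 25563),
    "NSY": (155941, 86684, 25342, 18573, 25342),
    "NTA": (155943, 86686, 25343, 18571, 25343),
    "NTO": (155745, 86392, 25429, 18495, 25429),
    "NUN": (155863, 86633, 25331, 18568, 25331),
    "SBU": (155371, 85785, 25588, 18381, 25588),
    "SLY": (155461, 85882, 25608, 18363, 25608),
    "STU": (155296, 85737, 25593, 18373, 25593),
}


def region_sum_gaps(table):
    """Return {species: genome size minus sum of the four regions} where nonzero."""
    gaps = {}
    for species, (genome, lsc, irb, ssc, ira) in table.items():
        diff = genome - (lsc + irb + ssc + ira)
        if diff != 0:
            gaps[species] = diff
    return gaps


print(region_sum_gaps(TABLE1_SIZES))
```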

3.2. Gene Content of Solanaceous Chloroplast Genomes

The genes present in different regions of the plastid genomes are highly conserved except for several open reading frames [6, 26, 36]. There are typically 111 genes, 5 hypothetical chloroplast reading frames (ycfs), and a few open reading frames (orfs). Some of our unique findings are discussed below.

  • trnP-GGG, which codes for a proline tRNA, was annotated only in D. stramonium, whereas the alternative gene trnP-UGG was annotated in all species, including D. stramonium (NC_018117.1; Li et al. (unpublished)). We searched all the species for a sequence similar to trnP-GGG using BLAST, but none was found in any other species. One region was annotated as a trnH gene only in C. annuum; in all other species, and also in C. annuum, this region was annotated as part of the ycf2 gene. These observations indicate the presence of a duplicate copy of the trnH gene sequence in C. annuum and two alternative tRNA genes coding for proline in D. stramonium. However, no other evidence was found in databases for this particular region coding for trnH.

  • The rps19 pseudogene was reported in three species, namely, N. tomentosiformis, S. bulbocastanum, and S. tuberosum. All other species were searched for a similar pseudogene using BLAST pairwise alignment, which confirmed its presence in three more species, namely, A. belladonna, C. annuum, and D. stramonium. The presence of the pseudogene may be attributed to the expansion of IRB into the LSC region. In the remaining three species, namely, N. sylvestris, N. tabacum, and N. undulata, the rps19 pseudogene was absent.

  • infA, a pseudogene in all species except A. belladonna, D. stramonium, and S. lycopersicum, is a protein-coding gene in S. bulbocastanum. Homology search with the infA sequence from S. bulbocastanum against the plastomes of A. belladonna and D. stramonium revealed an identical sequence in both species.

  • sprA gene has been annotated for N. sylvestris, N. tomentosiformis, S. lycopersicum, and S. tuberosum. Its identical orthologous gene sequences were found in A. belladonna, C. annuum, D. stramonium, N. tabacum, N. undulata, and S. bulbocastanum using BLAST search.

3.3. Split Genes

A total of eighteen split genes have been reported. The sizes of exons and introns for these genes in all the solanaceous species studied are summarized in Table 2. The rps12 gene is divided such that its 5′ end exon is located in the LSC region whereas the second and third exons are located in the IR region. Maturation of the RNA transcript requires a trans-splicing mechanism between exon 1 and exon 2 [34, 37]. Among the eighteen intron-containing genes, ycf3, clpP, and rps12 contain two introns whereas the other 15 genes contain only one intron. According to Kim and Lee [34], the trnL-UAA intron is a self-splicing group I intron whereas all other introns belong to group II. Generally, exon sizes were conserved and variability was observed in the introns; however, ndhB was found to be highly conserved for both exons and introns.
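The contrast between conserved and variable split genes can be read off Table 2 by taking, for each exon or intron, the difference between the longest and shortest length across the ten species. The Python sketch below does this for two genes copied from Table 2; the names `TABLE2_SUBSET` and `length_spread` are hypothetical.

```python
# Lengths in bp for ABE, CAN, DST, NSY, NTA, NTO, NUN, SBU, SLY, STU,
# copied from Table 2 for two of the eighteen split genes.
TABLE2_SUBSET = {
    ("ndhB", "Exon I"): [777] * 10,
    ("ndhB", "Intron I"): [679] * 10,
    ("ndhB", "Exon II"): [756] * 10,
    ("trnA-UGC", "Exon I"): [38] * 10,
    ("trnA-UGC", "Intron I"): [681, 811, 811, 709, 709, 709, 709, 811, 811, 811],
    ("trnA-UGC", "Exon II"): [35] * 10,
}


def length_spread(lengths):
    """Difference between the longest and shortest length across species."""
    return max(lengths) - min(lengths)


for (gene, part), lengths in TABLE2_SUBSET.items():
    print(f"{gene:9s} {part:9s} spread = {length_spread(lengths)} bp")
```

A spread of zero for every part of ndhB matches the statement that this gene is conserved in both exons and intron, whereas the trnA-UGC exons are constant and its intron varies.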

3.4. Pairwise Comparison of Plastid Genes of Solanaceae and InDel Analyses

Pairwise comparison of nucleotide sequences of individual gene sequences (45 combinations) for 116 genes was also performed to classify genes based on percent identity. Supplementary Table 1 (Supplementary Material available online at http://dx.doi.org/10.1155/2014/424873) shows grouping of genes in different clusters based on percent identity in pairwise comparison. Genes which showed 100% identity in comparison were considered as highly conserved and the genes showing less than 95% identity at least once in the comparison were considered as highly divergent. These highly divergent genes were further explored at nucleotide as well as at protein level to probe the variations in detail. A total of 11 highly divergent genes were found whereas the number of highly conserved genes varied from 26 (for species pair: N. tomentosiformis and S. lycopersicum) to 107 (for species pair: N. sylvestris and N. tabacum). Most of the tRNA genes were found to be highly conserved. Genes accD, cemA, clpP, ndhA, rpl32, rpl36, rps16, sprA, trnA-UGC, trnL-UAA, and ycf1 were found to be highly diverged.
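The classification rule can be sketched as follows. Ten species give 45 unordered pairs; a gene counts as highly conserved for a species pair when the two sequences are 100% identical, and as highly divergent when identity falls below 95% for at least one pair. The study used NCBI BLAST for the comparisons; the identity measure below (matches over aligned columns, with a gap against a base counted as a mismatch) is a simplifying assumption, and all function names and toy sequences are hypothetical.

```python
from itertools import combinations

SPECIES = ["ABE", "CAN", "DST", "NSY", "NTA", "NTO", "NUN", "SBU", "SLY", "STU"]
PAIRS = list(combinations(SPECIES, 2))  # all unordered species pairs


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two aligned sequences of equal length.

    Columns where both sequences have a gap are ignored; a gap opposite a
    base counts as a mismatch (an assumption, not BLAST's definition).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    cols = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not cols:
        raise ValueError("no comparable columns")
    return 100.0 * sum(x == y for x, y in cols) / len(cols)


def is_highly_divergent(aligned: dict, threshold: float = 95.0) -> bool:
    """True if any species pair falls below the identity threshold."""
    return any(
        percent_identity(aligned[s], aligned[t]) < threshold
        for s, t in combinations(sorted(aligned), 2)
    )


def conserved_gene_count(genes: dict, sp_a: str, sp_b: str) -> int:
    """Number of genes that are 100% identical between two species."""
    return sum(
        percent_identity(aln[sp_a], aln[sp_b]) == 100.0 for aln in genes.values()
    )


# Hypothetical toy alignments for three placeholder species.
GENES = {
    "geneA": {"sp1": "ATGCATGCAT", "sp2": "ATGCATGCAT", "sp3": "ATGCATGCAT"},
    "geneB": {
        "sp1": "ATGCATGCATATGCATGCAT",
        "sp2": "ATGCATGCATATGCATGCAT",
        "sp3": "ATGCATGCATATGCA-----",
    },
}
print(len(PAIRS), "species pairs")
for name, aln in GENES.items():
    print(name, "highly divergent:", is_highly_divergent(aln))
print("conserved genes sp1/sp2:", conserved_gene_count(GENES, "sp1", "sp2"))
```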

Tables 3 and 4 describe the summary of InDels observed in nucleotide and amino acid sequences, respectively. Partial multiple sequence alignment of 9 genes and 5 proteins is shown in Figures 1 and 2, respectively. The longest insertion of 141 bp was observed in accD gene sequence of C. annuum. Since genes clpP, ndhA, rps16, and trnL-UAA contained introns, it was important to examine whether these InDels were present in exon or intron region. It was found that all the InDels reported in ndhA and trnL-UAA were present in introns whereas, in case of clpP, InDel 24 was located in exon of the gene. Similarly, the first and last InDels of gene rps16 were present in exons of the gene. Keeping in view the observations in number and length of InDels in nucleotide and protein sequences of genes under consideration, the variation for individual genes is discussed below.

(1) accD. A total of four InDels were observed in accD gene as depicted in Figure 1. Insertion of 24 bp was present interestingly in all Nicotiana species and D. stramonium followed by insertion of 9 bp in all Solanum species indicating stronger sequence conservation at genus level. These insertions were also reported by Chung et al. [25]. A 141 bp insertion was observed specifically in C. annuum which has also been reported by Jo et al. [29] and confirmed by RT-PCR. Similarly a species specific deletion of 6 bp was found in D. stramonium. All these InDels were also reflected in the corresponding protein sequences as shown in Figure 2. The accD gene has been reported to be one of the most variable plastid genes and is probably under diversifying selection [26].

(2) clpP. In clpP gene InDels were found both in intron and in exon regions. Two major consequences were observed in the InDels in the exon regions. An insertion of 6 bp in S. bulbocastanum and S. tuberosum and 30 bp in S. lycopersicum at 3′ end (exon 3) of the clpP gene resulted in shifting of stop codon by 6, 6, and 30 bp downstream in respective species compared to other species of Solanaceae family, increasing the length of the coding sequence and the protein product (Figures 1 and 2). An interesting feature was observed as InDel 1 in protein sequence corresponding to insertion of a repeat of “I” amino acids in D. stramonium making exon 3 region longer by 6 bp. This region however corresponds to the end of intron 2 in clpP gene in all other species. Since D. stramonium chloroplast genome has been sequenced recently, this observation needs to be experimentally validated.

(3) ndhA. All the InDels found in ndhA were present in introns while the protein-coding regions (exons) were highly conserved. This indicates high diversifying selection on intronic region of this gene. Out of the total 14 InDels most of the InDels were observed with respect to C. annuum (InDels 1, 2, 5, 7, and 10). InDel 10 was observed to be shared by C. annuum and S. lycopersicum in full and by C. annuum and A. belladonna in part.

(4) rpl32. In rpl32 insertion of 1 bp in D. stramonium and 3 bp in C. annuum was found in the 3′ region of gene while a deletion of 4 bp was observed in D. stramonium. The insertion of 3 bp in C. annuum only altered the length of the protein by making it longer by 1 amino acid. However, the small insertion of 1 bp in D. stramonium proved to be a frameshift mutation resulting in three changes in the amino acid sequence near the C-terminus. Moreover, deletion of 4 bp at the 3′ end resulted in premature termination of protein synthesis. The frameshift mutation and the 3′ end deletion finally reduced the gene product length by 1 amino acid. As the C-terminal of amino acid chain is well conserved in all the other species, the effect of above mentioned variations needs to be validated experimentally.

(5) rps16. In rps16 also InDels were observed in introns as well as exons. Five of the major insertions in the intron regions were species specific. Insertion of 38 bp (InDel 1), 9 bp (InDel 2), 5 bp (InDel 7), 4 bp (InDel 8), and 6 bp (InDel 9) was observed in A. belladonna, S. lycopersicum, D. stramonium, and C. annuum. A deletion of 5 bp was observed in all the three Solanum species and C. annuum. A deletion of 9 bp was observed in all Nicotiana species resulting in an amino acid change (P to S) and shortening of protein by three amino acids in the C-terminal region. Similar deletion has also been observed by Kahlau et al. [27] and was suggested to be functionally neutral.

(6) sprA. sprA gene has been reported as stable noncoding RNA of unknown function. This gene has been suggested to influence 16S rRNA maturation [38, 39]. In many species this gene seems to be present as remnant and shows large variations in its 5′ region. The largest deletion of 109 bp was observed in C. annuum. The rest of this gene appears to be more conserved with a deletion towards the 3′ end in all Nicotiana species and A. belladonna. The manner in which this gene functions and the consequences of the above mentioned variations are yet to be investigated experimentally.

(7) trnA-UGC. In this particular gene a long deletion of 102 bp was observed in all Nicotiana species. Interestingly, this deletion was further extended to 130 bp in both directions in A. belladonna. These deletions were found in the intron region and so are unlikely to have any negative impact on gene product function.

(8) trnL-UAA. The trend of variation in trnL-UAA was similar to that in ndhA as all InDels were observed in introns. The longest species specific deletion (InDel 3) was observed in C. annuum whereas short insertion of four nucleotides, a repeat of “T,” was observed specifically in D. stramonium. Another insertion of 6 bp was observed in two Nicotiana species, that is, N. sylvestris and N. tabacum.

(9) ycf1. Many InDels were found in the 3′ region of ycf1, the fastest evolving gene. Most of the InDels were species specific. The largest number of InDels (InDels 2, 3, 5, 7, 9, 16, 17, 19, 23, 24, 25, and 30) was observed in C. annuum, followed by D. stramonium (InDels 4, 6, 20, 26, and 31), N. tomentosiformis (InDels 8, 17, and 18), and S. lycopersicum (InDels 16, 22, and 28). Two genus-specific InDels (InDels 14 and 21) were observed in all four Nicotiana species. However, InDel 19 was also present in D. stramonium. Another genus-specific InDel (InDel 29) was observed in Solanum species. Most of the InDels altered the length of the gene product, with a maximum length of 1906 amino acids (aa) in C. annuum and a minimum of 1873 aa in D. stramonium. Among the Solanum species, the protein length (1887 aa) was conserved between S. bulbocastanum and S. tuberosum, whereas the S. lycopersicum protein was 1891 aa, 4 aa longer than in the other two species of the genus. Among the four Nicotiana species, the ycf1 gene product length was nearly conserved among N. sylvestris, N. tabacum, and N. undulata, with protein lengths of 1901 aa, 1902 aa, and 1901 aa, respectively. However, N. tomentosiformis was the most variable member of the genus Nicotiana, with a protein length of 1892 aa.
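The link between exonic InDels and protein length described for rps16 and clpP can be checked with simple arithmetic on the exon lengths in Table 2: an in-frame InDel of n bp changes the coding sequence by n/3 codons. The Python sketch below is illustrative only. The function `codon_count` and the dictionary names are hypothetical, and working with codon differences avoids any assumption about whether the tabulated exon lengths include the stop codon.

```python
def codon_count(exon_lengths):
    """Number of codons in a coding sequence assembled from its exons."""
    cds_length = sum(exon_lengths)
    if cds_length % 3 != 0:
        raise ValueError("coding length is not a multiple of three")
    return cds_length // 3


# Exon lengths (bp) copied from Table 2 for selected species.
RPS16_EXONS = {"ABE": (40, 227), "NTA": (40, 218)}
CLPP_EXONS = {
    "NTA": (71, 292, 228),
    "SBU": (71, 292, 234),
    "SLY": (71, 292, 258),
}

# 9 bp deletion in the Nicotiana rps16 gene -> 3 codons fewer.
rps16_loss = codon_count(RPS16_EXONS["ABE"]) - codon_count(RPS16_EXONS["NTA"])
# 6 bp and 30 bp insertions at the 3' end of clpP -> 2 and 10 codons more.
clpp_gain_sbu = codon_count(CLPP_EXONS["SBU"]) - codon_count(CLPP_EXONS["NTA"])
clpp_gain_sly = codon_count(CLPP_EXONS["SLY"]) - codon_count(CLPP_EXONS["NTA"])

print("rps16, Nicotiana vs A. belladonna:", -rps16_loss, "codons")
print("clpP, S. bulbocastanum vs N. tabacum:", clpp_gain_sbu, "codons")
print("clpP, S. lycopersicum vs N. tabacum:", clpp_gain_sly, "codons")
```

The three-codon difference for rps16 agrees with the shortening of the protein by three amino acids reported for the Nicotiana species.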

3.5. Phylogenetic Analysis of Solanaceous Plastomes

Evolutionary relationships between diverse plant species have been analyzed using several plastome markers including matK and rbcL (genes) or trnH-psbA and trnL-trnF (intergenic regions) due to sequence conservation among plant taxa blended with suitable variation [40, 41]. However, determination of phylogeny based on single gene sequences may be inaccurate [42]. Availability of complete chloroplast sequences for many species has made it possible to use many individual genes or concatenated gene sequences to deduce phylogenetic relationships among taxa [42–44].

Efforts have been made to carry out phylogenetic analysis of solanaceous species using complete plastome sequences by Moore et al. [44] and Jansen et al. [45]. Evolutionary positions of Capsicum and Datura in Solanaceae have been determined using a single or a few plastid genes [46, 47]. Recently, concatenated protein-coding gene sequences from completely sequenced plastomes were used to obtain reasonable phylogenetic relationships for solanaceous species [29]. In the present investigation we used a similar approach to analyze the phylogenetic relationships of ten solanaceous species with completely sequenced plastomes. Individual multiple sequence alignments were concatenated for maximum likelihood phylogenetic tree generation. As depicted in Figure 3, taxa were divided into two clades with 100% bootstrap support (500 replicates). The first clade consisted of the four Nicotiana species while the species of Solanum, Capsicum, Atropa, and Datura were included in the second clade. These results are in line with previous phylogenetic analyses using concatenated protein-coding gene sequences as well as phylogenetic analyses using plastid ndhF and trnL-F sequences [29, 47]. However, an analysis of 13 orfs of solanaceous plastomes showed a different arrangement in which Atropa was separated from Solanum and grouped together with Nicotiana [25].

4. Conclusions

The analyses of complete plastid genomes of ten solanaceous species revealed overall similarity in terms of gene content and organization. The sizes of LSC, SSC, and IR regions were found to be somewhat conserved among species, but a significant variation was found between genera. Most of the coding regions were well conserved. However, many of the features in a few genes were observed to be typical of a particular genus or even species and can be used as molecular markers to investigate genetic diversity and evolution. These typical variations can be further utilized to develop more sophisticated DNA barcoding based techniques. The ten solanaceous species are divided into two clades on the basis of phylogenetic analysis using a concatenated alignment of gene sequences from coding regions of plastomes. The first clade consisted of four Nicotiana species and the second clade consisted of species of Solanum, Capsicum, Atropa, and Datura.

Supplementary Material

Supplementary Table 1 shows the grouping of genes in different clusters based on percent identity in pairwise comparison. Ten clusters were made depending upon the percentage identity observed between the genes, ranging from 80% (minimum identity observed for a given gene between any two species) to 100%. Genes which showed 100% identity in comparison were considered as highly conserved and the genes showing less than 95% identity at least once in the comparison were considered highly divergent. These highly divergent genes were further explored at nucleotide as well as at protein level to probe the variations in detail. A total of 11 highly divergent genes were found whereas the number of highly conserved genes varied from 26 (for species pair: N. tomentosiformis and S. lycopersicum) to 107 (for N. sylvestris and N. tabacum). Most of the tRNA genes were found to be highly conserved. Genes accD, cemA, clpP, ndhA, rpl32, rpl36, rps16, sprA, trnA-UGC, trnL-UAA, and ycf1 were found to be highly diverged.

Figure 1
Partial multiple sequence alignment of accD, clpP, ndhA, rpl32, rps16, sprA, tRNA-Ala (UGC), tRNA-Leu(UAA), and ycf1 gene sequences of ten solanaceous species showing location of InDels indicated by hyphens.
Figure 2
Partial multiple sequence alignment of amino acid sequences of genes, namely, accD, clpP, ndhA, rpl32, rps16, sprA, tRNA-Ala(UGC), tRNA-Leu(UAA), and ycf1, of ten solanaceous species showing location of InDels indicated by hyphens.
Figure 3
Maximum likelihood phylogenetic tree derived using concatenated nucleotide sequences of 75 protein-coding genes of ten solanaceous species and two outgroup species.
Table 1
Properties of the solanaceous chloroplast genomes.
Property                  ABE            CAN            DST            NSY            NTA            NTO            NUN            SBU            SLY            STU
Genome size (bp)          156687         156781         155871         155941         155943         155745         155863         155371         155461         155296
LSC (bp)                  86869          87366          86297          86684          86686          86392          86633          85785          85882          85737
LSC coordinates*          1–86869        1–87366        1–86297        1–86684        1–86686        1–86392        1–86633        1–85785        1–85882        1–85737
IRB (bp)                  25905          25783          25563          25342          25343          25429          25331          25588          25608          25593
IRB coordinates*          86870–112774   87367–113149   86298–111860   86685–112026   86687–112029   86393–111821   86634–111964   85786–111373   85883–111490   85738–111330
SSC (bp)                  18008          17849          18448          18573          18571          18495          18568          18381          18363          18373
SSC coordinates*          112775–130782  113150–130998  111861–130308  112027–130599  112030–130600  111822–130316  111965–130532  111374–129754  111491–129853  111331–129703
IRA (bp)                  25905          25783          25563          25342          25343          25429          25331          25588          25608          25593
IRA coordinates*          130783–156687  130999–156781  130309–155871  130600–155941  130601–155943  130317–155745  130533–155863  129755–155342  129854–155461  129704–155296
Coding regions (%)        58.89          58.50          59.19          61.49          61.12          61.58          63.12          58.52          58.91          58.45
Introns (%)               12.51          12.71          11.62          12.70          12.70          12.68          12.69          12.82          12.47          12.49
Intergenic regions (%)    28.60          28.79          29.19          25.81          26.18          25.73          24.19          28.66          28.62          29.06

AT content (%)
Overall                   62.44          62.27          62.12          62.15          62.15          62.21          62.12          62.12          62.14          62.12
Coding regions            59.86          59.68          59.65          59.85          59.79          59.79          59.70          59.61          59.65          59.59
Noncoding regions         66.13          65.93          65.72          65.84          65.87          66.09          66.27          65.66          65.71          65.68
tRNAs                     47.70          47.38          47.08          47.06          47.05          47.10          47.08          47.12          47.01          47.06
rRNAs                     44.64          44.73          44.63          44.64          44.64          44.64          44.64          44.66          44.66          44.65
Protein-coding genes      62.01          61.83          61.79          61.91          61.86          61.84          61.68          61.76          61.80          61.74
LSC                       64.37          64.25          64.04          64.05          64.05          64.12          64.01          63.99          64.01          63.99
SSC                       68.35          67.99          67.72          67.94          67.93          68.03          67.87          67.87          67.97          67.91
IR                        57.14          56.94          56.87          56.78          56.78          56.84          56.78          56.93          56.91          56.90

ABE: Atropa belladonna, CAN: Capsicum annuum, DST: Datura stramonium, NSY: Nicotiana sylvestris, NTA: Nicotiana tabacum, NTO: Nicotiana tomentosiformis, NUN: Nicotiana undulata, SBU: Solanum bulbocastanum, SLY: Solanum lycopersicum, STU: Solanum tuberosum, LSC: large single copy region, SSC: small single copy region, and IR: inverted repeat region.

*Start and end position of nucleotide in the genome.

Table 2
The lengths of introns and exons for the split genes of ten solanaceous species.
Gene (region)   Exon/intron ABE   CAN   DST   NSY   NTA   NTO   NUN   SBU   SLY   STU
trnK-UUU (LSC)  Exon I      37    37    37    37    37    37    37    37    37    37
                Intron I    2519  2500  2506  2526  2526  2526  2521  2501  2514  2512
                Exon II     36    35    35    35    35    35    35    35    35    35

rps16 (LSC)     Exon I      40    40    40    40    40    40    40    40    40    40
                Intron I    822   865   866   860   860   860   859   855   864   855
                Exon II     227   227   227   218   218   218   218   227   227   227

trnG-UCC (LSC)  Exon I      23    23    23    23    23    23    23    23    23    23
                Intron I    692   692   694   692   692   690   691   701   695   692
                Exon II     48    48    48    48    48    48    48    37    48    48

atpF (LSC)      Exon I      145   145   145   145   145   145   145   144   144   145
                Intron I    715   693   700   695   695   692   692   693   686   693
                Exon II     410   410   410   410   410   410   410   411   411   410

rpoC1 (LSC)     Exon I      432   453   453   453   453   432   453   453   453   453
                Intron I    737   742   737   737   737   709   733   737   737   737
                Exon II     1614  1614  1614  1614  1614  1614  1623  1614  1614  1614

ycf3 (LSC)      Exon I      124   124   124   124   124   124   124   124   124   124
                Intron I    739   742   740   739   738   731   735   730   729   727
                Exon II     230   230   230   230   230   230   230   230   230   230
                Intron II   763   744   753   783   783   779   781   750   750   750
                Exon III    153   153   159   153   153   153   153   153   153   153

trnL-UAA (LSC)  Exon I      35    35    35    35    35    35    35    37    35    35
                Intron I    497   426   501   503   503   497   498   502   497   497
                Exon II     50    50    50    50    50    50    50    50    50    50

trnV-UAC (LSC)  Exon I      38    38    38    38    38    38    38    38    38    38
                Intron I    572   575   569   571   571   572   573   569   571   571
                Exon II     35    35    37    35    35    35    35    37    35    35

rps12*          Exon I      114   114   114   114   114   114   114   114   114   114
                Intron I    -     -     -     -     -     -     -     -     -     -
                Exon II     232   232   232   232   232   232   232   232   232   232
                Intron II   535   536   536   536   536   536   536   536   536   536
                Exon III    26    26    26    26    26    26    26    26    26    26

clpP (LSC)      Exon I      71    71    71    71    71    71    71    71    71    71
                Intron I    799   811   792   807   807   789   789   789   798   789
                Exon II     292   292   292   292   292   292   292   292   292   292
                Intron II   622   626   624   637   637   634   631   625   617   620
                Exon III    228   228   234   228   228   228   228   234   258   234

petB (LSC)      Exon I      6     6     6     6     6     6     6     6     6     6
                Intron I    759   755   746   753   753   753   753   747   747   747
                Exon II     642   642   642   642   642   642   642   642   642   642

petD (LSC)      Exon I      8     8     9     8     8     8     8     6     8     8
                Intron I    742   742   748   742   742   742   742   739   738   739
                Exon II     475   475   474   475   475   475   475   477   475   475

rpl16 (LSC)     Exon I      9     9     9     9     9     9     9     9     9     9
                Intron I    1019  1026  1025  1020  1020  1021  1020  1014  1018  1014
                Exon II     396   396   396   396   396   396   396   396   396   396

rpl2 (IR)       Exon I      391   391   393   391   391   391   391   390   391   391
                Intron I    664   665   669   666   666   666   666   666   666   666
                Exon II     434   434   429   434   434   434   434   435   434   434

ndhB (IR)       Exon I      777   777   777   777   777   777   777   777   777   777
                Intron I    679   679   679   679   679   679   679   679   679   679
                Exon II     756   756   756   756   756   756   756   756   756   756

trnI-GAU (IR)   Exon I      37    37    42    37    37    37    37    42    37    37
                Intron I    717   722   717   707   707   716   716   717   722   722
                Exon II     34    35    35    35    35    35    35    35    35    35

trnA-UGC (IR)   Exon I      38    38    38    38    38    38    38    38    38    38
                Intron I    681   811   811   709   709   709   709   811   811   811
                Exon II     35    35    35    35    35    35    35    35    35    35

ndhA (SSC)      Exon I      553   553   552   553   553   553   553   552   553   553
                Intron I    1150  1157  1154  1148  1148  1149  1148  1158  1133  1158
                Exon II     539   539   537   539   539   539   539   540   539   539

*rps12 is a divided gene: the 3′-rps12 portion is located in the IR region, while the 5′-rps12 portion is located in the LSC region.

ABE: Atropa belladonna, CAN: Capsicum annuum, DST: Datura stramonium, NSY: Nicotiana sylvestris, NTA: Nicotiana tabacum, NTO: Nicotiana tomentosiformis, NUN: Nicotiana undulata, SBU: Solanum bulbocastanum, SLY: Solanum lycopersicum, and STU: Solanum tuberosum.

Table 3
InDels in nucleotide sequences of 9 genes of ten solanaceous plastid genomes.

| S. number | Gene | Location (a, b, c) | Total number of InDels | InDel length (bp) |
|---|---|---|---|---|
| 1 | accD | a | 4 | 24, 9, 141, 6 |
| 2 | clpP | a | 24 | 8(I), 14(I), 13(I), 7(I), 1(I), 2–3(I), 7(I), 1–7(I), 3(I), 2(I), 3(I), 1–7(I), 1–3(I), 1(I), 1(I), 1(I), 1–5(I), 4–7(I), 1(I), 9(I), 1–2(I), 3(I), 5(I), 24–30 |
| 3 | ndhA | b | 14 | 9(I), 5–6(I), 3(I), 1(I), 9(I), 3(I), 4(I), 1–4(I), 1–2(I), 1–23(I), 1–2(I), 2(I), 1(I), 3(I) |
| 4 | rpl32 | b | 2 | 2–3, 4 |
| 5 | rps16 | a | 11 | 1–38, 9(I), 1(I), 1(I), 5(I), 1–2(I), 5(I), 4(I), 6(I), 1(I), 9 |
| 6 | sprA | b | 2 | 109, 7 |
| 7 | trnA-UGC | c | 1 | 102–130 |
| 8 | trnL-UAA | a | 4 | 1, 6, 71, 4 |
| 9 | ycf1 | b | 31 | 3, 18, 18, 21, 6, 6, 48, 9, 6, 6, 42, 3, 6, 30, 3, 15, 12–39, 18, 6, 9–36, 6, 6, 6, 9, 9, 12, 6, 6, 6, 57, 6 |

a, b, c: location in different regions (a: LSC, b: SSC, c: IR). (I): InDels present in introns.

Table 4
InDels in amino acid sequences of 5 proteins of ten solanaceous plastid genomes.

| S. number | Protein | Total number of InDels | InDel length (amino acids) |
|---|---|---|---|
| 1 | accD | 4 | 8, 3, 47, 2 |
| 2 | clpP | 2 | 2, 10 |
| 3 | rpl32 | 1 | 1–2 |
| 4 | rps16 | 1 | 3 |
| 5 | ycf1 | 29 | 1, 6, 6, 7, 2, 2, 7, 3, 2, 2, 14, 1–10, 1, 5, 4–13, 6, 2, 3–12, 2, 2, 2, 3, 3, 4, 2, 2, 2, 19, 2 |
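
Tables 3 and 4 can be cross-checked against each other: an in-frame InDel of n nucleotides in a coding region corresponds to n/3 amino acids. The short Python sketch below illustrates this with the accD values from Table 3. The helper name `nt_to_aa` is hypothetical and is not part of the original analysis; it only assumes that coding-region InDels are multiples of three.

```python
def nt_to_aa(indel_lengths_bp):
    """Convert in-frame nucleotide InDel lengths (bp) to amino acid counts.

    Hypothetical helper for illustration; raises ValueError for a length
    that is not a multiple of 3, since such an InDel is not in-frame.
    """
    aa_lengths = []
    for n in indel_lengths_bp:
        if n % 3 != 0:
            raise ValueError(f"{n} bp is not a multiple of 3 (not in-frame)")
        aa_lengths.append(n // 3)
    return aa_lengths


# accD InDel lengths in bp, as listed in Table 3
accd_bp = [24, 9, 141, 6]
accd_aa = nt_to_aa(accd_bp)
print(accd_aa)  # [8, 3, 47, 2], matching the accD row of Table 4
```

The same check does not apply to the intron InDels marked (I) in Table 3, because they do not change the protein sequence.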

Acknowledgment

The authors are thankful to the University Grants Commission, New Delhi, for providing the MANF fellowship to Harpreet Kaur.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.
