The Brassica oleracea genome reveals the asymmetrical evolution of polyploid genomes.
Journal: 2015/October - Nature Communications
ISSN: 2041-1723
Abstract:
Polyploidization has provided much genetic variation for plant adaptive evolution, but the mechanisms by which the molecular evolution of polyploid genomes establishes genetic architecture underlying species differentiation are unclear. Brassica is an ideal model to increase knowledge of polyploid evolution. Here we describe a draft genome sequence of Brassica oleracea, comparing it with that of its sister species B. rapa to reveal numerous chromosome rearrangements and asymmetrical gene loss in duplicated genomic blocks, asymmetrical amplification of transposable elements, differential gene co-retention for specific pathways and variation in gene expression, including alternative splicing, among a large number of paralogous and orthologous genes. Genes related to the production of anticancer phytochemicals and morphological variations illustrate consequences of genome duplication and gene divergence, imparting biochemical and morphological variation to B. oleracea. This study provides insights into Brassica genome evolution and will underpin research into the many important crops in this genus.
Nature Communications 5
Published online May/22/2014

+77 authors

Brassica oleracea is a plant species comprising economically important vegetable crops. Here, the authors report the draft genome sequence of B. oleracea and, through a comparative analysis with the closely related B. rapa, reveal insights into Brassica evolution and divergence of interspecific genomes and intraspecific subgenomes.

Brassica oleracea comprises many important vegetable crops including cauliflower, broccoli, cabbages, Brussels sprouts, kohlrabi and kales. The species demonstrates extreme morphological diversity and crop forms, with various members grown for their leaves, flowers and stems. About 76 million tons of Brassica vegetables were produced in 2010, with a value of 14.85 billion dollars (http://faostat.fao.org/). Most B. oleracea crops are high in protein1 and carotenoids2, and contain diverse glucosinolates (GSLs) that function as unique phytochemicals for plant defence against fungal and bacterial pathogens3 and on consumption have been shown to have potent anticancer properties4,5,6.

B. oleracea is a member of the family Brassicaceae (~338 genera and 3,709 species)7 and one of three diploid Brassica species in the classical triangle of U8 that also includes diploids B. rapa (AA) and B. nigra (BB) and allotetraploids B. juncea (AABB), B. napus (AACC) and B. carinata (BBCC). These allotetraploid species are important oilseed crops, accounting for 12% of world edible oil production (http://faostat.fao.org/). As the origin and relationship between these species are clear, the timing and nature of the evolutionary events associated with Brassica divergence and speciation can be revealed by interspecific genome comparison. Each of the Brassica genomes retains evidence of recursive whole-genome duplication (WGD) events9,10 (Supplementary Fig. 1) and has undergone a Brassiceae-lineage-specific whole-genome triplication (WGT)11,12 since its divergence from the Arabidopsis lineage. These events were followed by diploidization that involved substantial genome reshuffling and gene losses11,12,13,14,15. Because of this, Brassica species are a model for the study of polyploid genome evolution (Supplementary Fig. 2), mechanisms of duplicated gene loss, neo- and sub-functionalization, and associated impact on morphological diversity and species differentiation.

We report a draft genome sequence of B. oleracea and its comprehensive genomic comparison with the genome of sister species B. rapa, which diverged from a common ancestor ~4 MYA. These data provide insights into the dynamics of Brassica genome evolution and divergence, and serve as important resources for Brassica vegetable and oilseed crop breeding. Furthermore, this genome will support studies of the large range of morphological variation found within B. oleracea, which includes sexually compatible crops such as cabbages, cauliflower and broccoli that are important for their economic, nutritional and potent anticancer value.

Results

B. oleracea genome assembly and annotation

Complementing the sequencing of the smaller B. rapa genome11, a draft genome assembly of B. oleracea var. capitata line 02–12 was produced by interleaving Illumina, Roche 454 and Sanger sequence data. This assembly represents 85% of the estimated 630 Mb genome, and includes >98% of the gene space (Supplementary Methods, Supplementary Tables 1–3, 7 and 8 and Supplementary Fig. 3). The assembly was anchored to a new genetic map16 to produce nine pseudo-chromosomes that account for 72% of the assembly, and validated by comparison with a B. oleracea physical map17, a high-density B. napus genetic map18 and complete BAC sequences (Supplementary Figs 4–9 and Supplementary Tables 4 and 5). For comparative analyses, identical genome annotation pipelines were used for annotation of protein-coding genes and transposable elements (TEs) for B. oleracea and B. rapa.

A total of 45,758 protein-coding genes were predicted, with a mean transcript length of 1,761 bp, a mean coding length of 1,037 bp, and a mean of 4.55 exons per gene (Table 1, Supplementary Methods, Supplementary Table 6 and Supplementary Fig. 10), similar to A. thaliana19 and B. rapa11. Publicly available ESTs, together with RNA sequencing (RNA-seq) data generated in this study, support 94% of predicted gene models (Supplementary Tables 7 and 8), and 91.6% of predicted genes have a match in at least one public protein database (Supplementary Tables 9 and 10, and Supplementary Fig. 11). Of the 45,758 predicted genes, 13,032 produce alternative splicing (AS) variants with intron retention and exon skipping (Supplementary Table 11). Genome annotation also predicted 3,756 non-coding RNAs (miRNA, tRNA, rRNA and snRNA) (Supplementary Table 12).

A combination of structure-based analyses and homology-based comparisons resulted in the identification of 13,382 TEs with clearly identified terminal boundaries, including 5,107 retrotransposons and 8,275 DNA transposons (Supplementary Methods, Supplementary Fig. 12 and Supplementary Table 13). These elements together with numerous truncated elements or TE remnants make up 38.80% of the assembled portion of the B. oleracea genome, whereas TEs account for only 21.47% of the B. rapa genome assembly. Copia (11.64%) and gypsy (7.84%) retroelements are the major constituents of the repetitive fraction, and are unevenly distributed across each chromosome, with retrotransposons predominantly found in pericentromeric or heterochromatic regions (Supplementary Fig. 13) in B. oleracea. Tentative physical positions of some of the centromeres were determined based on homology and phylogenetic analysis of the centromere-specific 76 bp tandem repeats CentBo-1 and CentBo-2 and copia-type retrotransposon (CentCRBo) (Supplementary Table 14 and Supplementary Figs 14–17). The distribution of 45S and 5S rDNA sequences was also visualized by fluorescent in situ hybridization (Supplementary Figs 18 and 19), leading to a predicted karyotype ideogram for B. oleracea (Supplementary Fig. 20). An extra-centromeric locus with colocalized centromeric satellite repeat CentBo-1 and the centromeric retrotransposon CRBo-1 was observed on the long arm of chromosome 6 (Supplementary Figs 18–20). A comprehensive database for the genome information is accessible at http://www.ocri-genomics.org/bolbase/index.html.

Conserved syntenic blocks and genome rearrangement after WGT

The relatively complete triplicated regions in B. oleracea and B. rapa were constructed and they relate to the 24 ancestral crucifer blocks (A–X) in A. thaliana20. Further, the triplicated blocks resulting from WGT in the two Brassica species were partitioned into three subgenomes: LF (Least-fractionated), MF1 (Medium-fractionated) and MF2 (Most-fractionated)11 (Fig. 1a, Supplementary Methods, Supplementary Tables 15 and 16, and Supplementary Figs 21–26). These syntenic blocks occupy the majority of the genome assemblies of A. thaliana (19,628 genes, 72.24% of 27,169 genes), B. oleracea (26,485 genes, 57.88%) and B. rapa (26,698 genes, 64.84%), and provide a foundation for comparative analyses of chromosomal rearrangement, gene loss and divergence of retained paralogues after WGT. Massive gene loss occurred in an asymmetrical and reciprocal fashion in the three subgenomes of each species and was largely completed before the B. oleracea–B. rapa divergence (Fig. 1c, Supplementary Tables 17–19 and Supplementary Figs 25–27). The timing of this evolutionary process was supported by the estimated timing of WGT ~15.9 million years ago (MYA), and species divergence ~4.6 MYA, based on synonymous substitution (Ks) rates of genes located in the blocks (Fig. 1b and Supplementary Table 20). Gene loss occurred mainly through small deletions that may be caused by illegitimate recombination21,22 (Supplementary Fig. 27), consistent with observations in other plant genomes.
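Dating events from Ks values conventionally uses the molecular-clock relation T = Ks / (2r), where r is the synonymous substitution rate per site per year. The sketch below illustrates that relation only. The default rate of 1.5e-8 and the Ks values in the usage lines are assumptions chosen for illustration; the rate and Ks distributions actually used in this study are in Fig. 1b and Supplementary Table 20 and are not reproduced here.

```python
from statistics import median
from typing import Iterable


def divergence_time_mya(ks: float, rate_per_site_per_year: float = 1.5e-8) -> float:
    """Convert a Ks value to a divergence time in millions of years.

    Uses T = Ks / (2 * r). The default rate is an illustrative assumption,
    not a value taken from the paper.
    """
    if ks < 0 or rate_per_site_per_year <= 0:
        raise ValueError("ks must be non-negative and the rate must be positive")
    return ks / (2.0 * rate_per_site_per_year) / 1e6


def block_divergence_time_mya(ks_values: Iterable[float],
                              rate_per_site_per_year: float = 1.5e-8) -> float:
    """Date a syntenic block from the median Ks of its gene pairs."""
    return divergence_time_mya(median(ks_values), rate_per_site_per_year)


# Hypothetical Ks values for gene pairs in one syntenic block.
example_ks = [0.28, 0.30, 0.33]
example_time = block_divergence_time_mya(example_ks)
print(f"median Ks = {median(example_ks):.2f}, estimated time = {example_time:.1f} MYA")
```

Using the median rather than the mean keeps a few saturated or misaligned gene pairs from dominating the estimate for a block.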

Abundant genome rearrangement following WGT and subsequent Brassica species divergence resulted in complex mosaics of triplicated ancestral genomic blocks in the A and C genomes (Fig. 1a and Supplementary Fig. 28). At least 19 major, and numerous fine-scale, chromosome rearrangements occurred, which differentiate the two Brassica species (Supplementary Fig. 29). This is in agreement with previous comparative studies based on chromosome painting12,23 and genetic mapping24,25. The extensive chromosome reshuffling in Brassica is in contrast to that observed in other taxa, such as the highly syntenic tomato–potato and pear–apple genomes, each with longer divergence times and less genome rearrangement26,27. This difference may be a consequence of mesopolyploidy in Brassica.

Greater TE accumulation in B. oleracea than B. rapa

Both retro- (22.13%) and DNA (16.67%) TEs appear to be more amplified in B. oleracea relative to B. rapa (9.43 and 12.04%) (Fig. 2a and Supplementary Table 13). We constructed 1,362 gap-free contig-contig syntenic regions by clustering orthologous B. rapa–B. oleracea genes using MCscan (Supplementary Figs 29 and 30). The B. oleracea TE length (34.03% of the 259.6M) is 3.4 times greater than that of the syntenic B. rapa regions (16.73% of the 155.0M) (Fig. 2c, Supplementary Tables 21 and 22, and Supplementary Fig. 31). Phylogenetic analysis revealed that B. oleracea has both more LTR retrotransposon (LTR-RT) families, and more members in most families than B. rapa (Fig. 2d and Supplementary Figs 12, 32 and 33). Furthermore, two new lineages of LTR-RTs, Brassica Copia Retrotransposon and Brassica Gypsy Retrotransposon, were defined in both Brassica species (Supplementary Fig. 33). Analysis of LTR insertion time revealed that ~98% of B. oleracea intact LTR-RTs amplified continuously over the ~4 million years (MY) since the B. oleracea–B. rapa split, whereas ~68% of B. rapa intact LTR-RTs amplified rapidly within the last 1 MY, predominantly in the recent 0.2 MY (Fig. 2b and Supplementary Fig. 34). Hence, LTR-RTs expanded more in the intergenic space of euchromatic regions in B. oleracea than B. rapa. This agrees with previous observations based on comparison of BAC sequences between the A and C genomes28. As a consequence of continuous TE amplification over the last 4 MY, the genome size of B. oleracea is ~30% larger than that of B. rapa although the two genomes share the same ploidy and are largely collinear.
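The 3.4-fold figure for TE length in the syntenic regions follows directly from the percentages and region sizes quoted above. A short check of the arithmetic, assuming that "M" denotes megabases (the variable and function names are introduced here for illustration):

```python
def te_length_mb(region_size_mb: float, te_percent: float) -> float:
    """Absolute TE length in a region, given its size and TE percentage."""
    return region_size_mb * te_percent / 100.0


# Values quoted in the text for the 1,362 gap-free syntenic regions.
bol_te_mb = te_length_mb(259.6, 34.03)  # B. oleracea
bra_te_mb = te_length_mb(155.0, 16.73)  # B. rapa
te_ratio = bol_te_mb / bra_te_mb

print(f"B. oleracea TE length: {bol_te_mb:.1f} Mb")
print(f"B. rapa TE length:     {bra_te_mb:.1f} Mb")
print(f"ratio:                 {te_ratio:.1f}x")
```

The ratio compares absolute TE lengths, so it reflects both the higher TE percentage and the larger total size of the B. oleracea syntenic regions.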

Species-specific genes and tandemly duplicated genes

While the genomes of B. oleracea and B. rapa are highly similar in terms of total gene clusters/sequences and the gene number in each cluster, there are also a large number of species-specific genes in the two species. A total of 66.5% (34,237 genes) of B. oleracea genes and 74.9% (34,324) of B. rapa genes were clustered into OrthoMCL groups (Supplementary Table 23 and Supplementary Fig. 35). We identified 9,832 B. oleracea-specific and 5,735 B. rapa-specific genes, of which 77% were supported by gene expression and/or a clear Arabidopsis homologue (Supplementary Table 24). More than 90% of these specific genes were validated for their absence in the counterpart genomes by reciprocal mapping of raw clean reads (Supplementary Tables 25 and 26). Most Brassica-specific genes are randomly distributed along the chromosomes (Supplementary Figs 36 and 37). More than 80% of the species-specific genes were surrounded by non-specific genes (Supplementary Fig. 38), suggesting that deletion of individual genes may be the major mechanism underlying gene loss and the difference in gene numbers between B. oleracea and B. rapa.

Tandem duplication produces clusters of duplicated genes and contributes to the expansion of gene families29. We identified 1,825, 2,111 and 1,554 gene clusters containing 4,365, 5,181 and 4,170 tandemly duplicated genes in B. oleracea, B. rapa and A. thaliana, respectively (Fig. 3a, Supplementary Tables 27 and 28 and Supplementary Fig. 39). The wide range of sequence divergence of tandem gene pairs in each species suggests that tandem gene duplication occurred continuously throughout the evolutionary history of these species, rather than in discrete bursts (Supplementary Figs 40 and 41). Their continuous and asymmetrical occurrence after species divergence resulted in 522, 697 and 815 species-specific tandem clusters in the three genomes. The frequency of tandem duplication is independent of the total gene content, suggesting that genome triplication has not inhibited its occurrence. Tandemly duplicated genes are preferentially enriched for gene ontology (GO) categories related to defence response and pathways related to secondary metabolism such as indole alkaloid biosynthesis and tropane, piperidine and pyridine alkaloid biosynthesis (Fig. 3b, Supplementary Tables 29–32 and Supplementary Fig. 42). Over 44.0 and 51.9% of the NBS-encoding resistance genes are tandemly duplicated in B. oleracea and B. rapa, respectively (Supplementary Table 33).
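The criteria used to call tandem clusters are not given in this section. A common approach, sketched here purely as an assumption, groups homologous genes that lie within a small number of intervening genes of each other on the same chromosome. The gene identifiers, the `max_intervening` threshold and the function name are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Set


def tandem_clusters(gene_order: List[str],
                    homologous_pairs: Set[FrozenSet[str]],
                    max_intervening: int = 1) -> List[List[str]]:
    """Group homologous neighbouring genes on one chromosome into clusters.

    gene_order lists gene IDs in chromosomal order. Two genes are linked when
    they are homologous and separated by at most max_intervening other genes.
    Linked genes are merged with a small union-find structure.
    """
    parent: Dict[str, str] = {gene: gene for gene in gene_order}

    def find(gene: str) -> str:
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path compression
            gene = parent[gene]
        return gene

    for i, gene in enumerate(gene_order):
        window_end = min(i + max_intervening + 2, len(gene_order))
        for j in range(i + 1, window_end):
            other = gene_order[j]
            if frozenset((gene, other)) in homologous_pairs:
                parent[find(other)] = find(gene)

    groups: Dict[str, List[str]] = {}
    for gene in gene_order:
        groups.setdefault(find(gene), []).append(gene)
    # Only groups with at least two members count as tandem clusters.
    return [members for members in groups.values() if len(members) > 1]


# Hypothetical chromosome with six genes and three homologous pairs.
order = ["g1", "g2", "g3", "g4", "g5", "g6"]
pairs = {frozenset(("g1", "g2")), frozenset(("g2", "g4")), frozenset(("g5", "g1"))}
clusters = tandem_clusters(order, pairs, max_intervening=1)
print(clusters)
```

In this toy input, g5 is homologous to g1 but too far away to join the cluster, which is the distinction between a tandem duplicate and a dispersed one.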

Biased loss and retention of genes after WGT/WGD

Following polyploidization, reversion of gene numbers towards diploid levels through gene loss has been widely observed in plants30. However, in Brassica this only appears to be true for collinear genes in the conserved syntenic regions, with a loss of ~60% of the predicted post-triplication gene set, nearly restoring the pre-triplication gene number. This is reflected in an overall retention rate of 1.2-fold of A. thaliana orthologous genes in corresponding syntenic regions (Fig. 1c and Supplementary Table 18). In contrast, for genes that have no collinear gene in A. thaliana and either Brassica species (hereafter called non-collinear genes), the gene retention rate is 2.5-fold the A. thaliana gene number in B. oleracea and 1.9-fold in B. rapa, both significantly higher than the expected rates (P value < 2.2e-16; Supplementary Table 34). For these retained genes, the numbers of the genes that are common in the two Brassica species are 11,746 in B. oleracea and 10,411 in B. rapa. Most of these genes are supported by expression and/or the presence of an Arabidopsis homologue (Supplementary Table 35). More than 61% of these genes have homologues present as collinear genes and 16% also are homologous to other non-collinear genes, indicating gene movement from triplicated syntenic regions and being similar to observations in A. thaliana, where half of the genes are nonsyntenic within rosids31. This suggests that the breakdown of the triplicated syntenic relationship has not only prevented gene loss and a move towards pre-triplication gene numbers but has also maintained a higher gene density, and thus maintained WGT-derived genes for species evolution.

The presence of a large number of the retained paralogous genes in the syntenic regions led us to examine whether genes in some functional categories have preferentially been over-retained, as observed in other plants29. The results indicate that WGT-produced paralogous genes are over-retained in GO categories associated with regulation of metabolic and biosynthetic processes, RNA metabolism and transcription factors (Supplementary Table 36 and Supplementary Figs 43–45), and the two Brassica species exhibit similar patterns of gene category retention. From a study of KEGG pathways, we also found that WGT-produced Brassica paralogous genes contribute 40–60% of total genes for 90% of KEGG pathways (Fig. 3c and Supplementary Fig. 43), and are functionally enriched in primary or core metabolic processes such as oxidative phosphorylation, carbon fixation, photosynthesis, circadian rhythm32 and lipid metabolism (Supplementary Tables 36 and 37 and Supplementary Figs 43–45). Notably, the pathways associated with energy metabolism have been enhanced in both Brassica species. For instance, in the oxidative phosphorylation pathway, there are 161 genes in A. thaliana, but 241 in B. oleracea and 208 in B. rapa. The majority (143/241 and 142/208) of these Brassica genes are multiple paralogues residing in the triplicated syntenic regions, and more than half of these paralogues have been retained as three copies, significantly higher than observed for other genes in the triplication regions (Fig. 3d and Supplementary Fig. 43).
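The oxidative phosphorylation counts above can be restated as fold changes relative to A. thaliana and as the share of each Brassica gene set that lies in triplicated syntenic regions. The snippet below only re-derives those proportions from the quoted counts; the variable and function names are introduced here for illustration.

```python
from typing import Dict, Tuple

ARABIDOPSIS_OXPHOS_GENES = 161  # A. thaliana count quoted in the text

# Counts quoted in the text for the two Brassica species.
brassica_counts: Dict[str, Dict[str, int]] = {
    "B. oleracea": {"total": 241, "syntenic_paralogues": 143},
    "B. rapa": {"total": 208, "syntenic_paralogues": 142},
}


def summarize(total: int, syntenic: int,
              reference: int = ARABIDOPSIS_OXPHOS_GENES) -> Tuple[float, float]:
    """Return (fold change vs A. thaliana, percent of genes in syntenic regions)."""
    fold_change = total / reference
    syntenic_share = 100.0 * syntenic / total
    return round(fold_change, 2), round(syntenic_share, 1)


summary = {
    species: summarize(counts["total"], counts["syntenic_paralogues"])
    for species, counts in brassica_counts.items()
}
for species, (fold, share) in summary.items():
    print(f"{species}: {fold}x A. thaliana, {share}% in triplicated syntenic regions")
```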

Phylogenetic analyses show that WGT led to an expansion of genes involved in auxin functioning (AUX, IAA, GH3, PIN, SAUR, TAA, TIR, TPL and YUCCA), morphology specification (TCP), and flowering time control (FLC, CO, VRN1, LFY, AP1 and GI) (Supplementary Table 38 and Supplementary Figs 46–61), and that most Arabidopsis genes in these families have two or three orthologs in Brassica species. These WGT-produced duplicated genes may provide important sources of evolutionary innovation33 and contribute to the extreme morphological diversity in Brassica species.

Divergence of duplicated genes in the Brassica genomes

The largest genetic foundation for plant genome evolution and new species formation is the differentiation of retained paralogous and orthologous genes. Around 38% (4,302/11,493) of all paralogous gene pairs in B. oleracea and ~36% (4,089/11,448) in B. rapa have different predicted exon numbers (Supplementary Data 1, Supplementary Tables 39 and 40 and Supplementary Fig. 62). There are 6,571 orthologous gene pairs with different exon numbers, accounting for 27.6% of total gene pairs (23,823). Some paralogous or orthologous pairs have high Ks values and low sequence similarity (Supplementary Fig. 63), indicating sequence differentiation. Of these paralogous genes, some offer appreciable opportunity for non-reciprocal DNA exchanges (gene conversion). About 8% of the 4,296 homologous quartets in B. rapa and B. oleracea have been affected by gene conversion (Fig. 4a, Supplementary Table 41 and Supplementary Fig. 64) and about one-sixth (53) of converted genes were inferred to have experienced independent conversion events in both Brassica species, a parallelism sometimes observed in other plants11,34. Around 40–44% of conversion events involved paralogues in the less-fractionated subgenomes LF in both species, substantially higher than the other two subgenomes (Supplementary Table 41). This finding suggests that gene conversion is related to homologous gene density, which determines the likelihood of illegitimate recombination.

Analysis of RNA-seq data generated from callus, root, leaf, stem, flower and silique of B. oleracea and B. rapa suggests that >40% of WGT paralogous gene pairs are differentially expressed in these species (Fig. 4b and Supplementary Fig. 65), suggesting potential subfunctionalization of these genes. In both species, a general trend of expression differentiation was alpha-WGD paralogous genes (~46%) > WGT paralogous genes (~42%) > tandemly duplicated genes (~35%) (Fig. 4b, Supplementary Fig. 66 and Supplementary Tables 42 and 43). Different tissues harbour approximately the same number of differentially expressed duplicates, but this number was slightly higher in flower tissue. The expression level of genes in the LF subgenome was significantly higher than corresponding syntenic genes in the more fractionated subgenomes (MF1 and MF2) while no expression dominance relationship was observed between the subgenomes MF1 and MF2 (Fig. 4c, Supplementary Table 44 and Supplementary Fig. 67). Duplicated transcription factor gene pairs showed less differentiated expression (~38%) than the expected ratio at the genome-wide level (Fig. 4d and Supplementary Table 45), while paralogues with GO categories related to membrane, catalytic activity and defence response exhibited a higher ratio of differentiated expression (Fig. 4e and Supplementary Table 46). Of B. oleracea–B. rapa orthologous gene pairs (23,823 in total), ~42% were differentially expressed across all tissues (Supplementary Tables 42 and 43).

Furthermore, many paralogues generate different transcripts, resulting in expression differentiation. Analysis of AS variants of paralogous gene pairs that have identical numbers of exons demonstrated that these variants (either different variants or differential expression of the same variants) cause >20% and >44% of such paralogous genes to be differentially expressed in B. oleracea and B. rapa, respectively (Fig. 4f and Supplementary Table 47). For orthologous gene pairs of B. oleracea and B. rapa, 35.5% (8,467) of gene pairs showed differential expression due to AS variation. When only counting intron retention and exon skipping, 9.3% (2,215) of gene pairs differ. Divergence in AS variants of gene pairs presents an important layer of gene regulation, as reported35,36,37,38, and thus provides a genetic basis for species evolution and new species formation.

Unique GSL metabolism pathways

GSLs and hydrolysis products have been of long-standing interest due to their role in plant defence and anticancer properties. Compared with B. rapa and B. napus, B. oleracea has the greatest GSL profile diversity, with wide qualitative and quantitative variation39,40. We identified 101 and 105 GSL biosynthesis genes in B. rapa and B. oleracea, respectively, and 22 GSL catabolism genes in each species (Fig. 5a, Supplementary Table 48 and Supplementary Data 2). In the GSL biosynthesis and catabolism pathways, tandem genes (41.4%, 40.7% and 33.9% in A. thaliana, B. oleracea and B. rapa, respectively) were present in a much higher proportion than the genome-wide average (Supplementary Table 32). The observed variation of GSL profiles is mainly attributed to the duplication of two genes, methylthioalkylmalate (MAM) synthase and 2-oxoglutarate-dependent dioxygenase (AOP).

In Arabidopsis, the MAM family contains three tandemly duplicated and functionally diverse members (MAM1, MAM2 and MAM3), and functional analysis demonstrated that MAM2 (absent in ecotype Columbia) and MAM1 catalyse the condensation reaction of the first and the first two elongation cycles for the synthesis of dominant 3 and 4 carbon (C) side-chain aliphatic GSLs, respectively40,41, while MAM3 is assumed to contribute to the production of all GSL chain lengths42. In B. rapa and B. oleracea, MAM1/MAM2 genes experienced independent tandem duplication to produce 6 and 5 orthologs respectively (Fig. 5b,c). The main GSLs in B. oleracea are 4C and 3C GSLs (progoitrin, gluconapin, glucoraphanin and sinigrin)43, while those in B. rapa are 4C and 5C GSLs (gluconapin and glucobrassicanapin)39 (Fig. 5a). Based on the results of expression and phylogenetic analyses, we found a pair of genes, Bol017070 and Bra013007, which are the only orthologous genes showing high expression in B. oleracea but silenced in B. rapa (Fig. 5a). This expression difference most likely leads to greater accumulation of the 3C GSL anticancer precursor sinigrin in B. oleracea. Meanwhile, the expression level of MAM3 in B. rapa is much higher than in B. oleracea, explaining the accumulation of 5C GSL glucobrassicanapin in B. rapa. Other genes affecting specific anticancer GSL products are AOPs. Previous research has reported four gene loci involved in the side-chain modifications of aliphatic GSLs in Arabidopsis. Two tandemly duplicated genes, AOP2 and AOP3, catalyse the formation of alkenyl and hydroxyalkyl GSLs, respectively. When both AOPs are non-functional, the plant accumulates the precursor methylsulfinyl alkyl GSL. We identified three AOP2 genes in B. oleracea (Fig. 5d), but two are non-functional due to the presence of premature stop codons. In contrast, all three AOP2 copies are functional in B. rapa44. No AOP3 homologue has been identified in Brassica. This analysis supports GSL content surveys and explains why glucoraphanin is abundant in B. oleracea, but not in B. rapa.

Discussion

The Brassica genomes experienced WGT11,12,25 followed by massive gene loss and frequent reshuffling of triplicated genomic blocks. Analysis of retained or lost genes following triplication identified over-retention of genes for metabolic pathways such as oxidative phosphorylation, carbon fixation, photosynthesis and circadian rhythm32, which may contribute to polyploid vigour45. Fewer lost genes were observed in the less-fractionated subgenome, possibly due to expression dominance as reported in maize46.

Gene expression analysis revealed extensive divergence and AS variants between duplicate genes. This subfunctionalization or neofunctionalization of duplicated genes provides genetic novelty and a basis for species evolution and new species formation. For example, TF genes that are considered to be conserved still have more than 38% of paralogous pairs showing differential expression across tissues although this percentage is lower than the average from all duplicated genes. Gene expression variation may contribute to an increased complexity of regulatory networks after polyploidization.

The multi-layered asymmetrical evolution of the Brassica genomes revealed in this study suggests mechanisms of polyploid genome evolution underlying speciation. Asymmetrical gene loss between the Brassica subgenomes, the asymmetrical amplification of TEs and tandem duplications, preferential enrichment of genes for certain pathways or functional categories, and divergence in DNA sequence and expression, including alternative splicing among a large number of paralogous and orthologous genes, together shape a route for genome evolution after polyploidization. A molecular model of polyploid genome evolution through these asymmetrical mechanisms is summarized in Supplementary Fig. 2. Additional information on accessible large datasets and resources is provided in Supplementary Table 49.

In summary, the B. oleracea genomic sequence, its features in comparison with its relatives, and the genome evolution mechanisms revealed, provide a fundamental resource for the genetic improvement of important traits, including components of GSLs for anticancer pharmaceuticals. The genome sequence has also laid a foundation for investigation of the tremendous range of morphological variation in B. oleracea as well as supporting genome analysis of the important allotetraploid crop B. napus (canola or rapeseed).

Methods

Sample preparation and genome sequencing

A B. oleracea sp. capitata homozygous line 02–12 with elite agronomic characters and widely used as a parent in hybrid breeding was used for the reference genome sequencing (Supplementary Methods). The seedlings of plants were collected and genomic DNA was extracted from leaves with a standard CTAB extraction method. Illumina Genome Analyser whole-genome shotgun sequencing combined with GS FLX Titanium sequencing technology was used to achieve a B. oleracea draft genome. We constructed a total of 35 paired-end sequencing libraries with insertion sizes of 180 base pairs (bp), 200 bp, 350 bp, 500 bp, 650 bp, 800 bp, 2 kb, 5 kb, 10 kb and 20 kb following a standard protocol provided by Illumina (Supplementary Methods). Sequencing was performed using Illumina Genome Analyser II according to the manufacturer’s standard protocol.

Genome assembly and validation

We applied a series of checking and filtering measures to the reads following the Illumina-Pipeline, and low-quality reads, adaptor sequences and duplicates were removed (Supplementary Methods). The reads remaining after these filtering and correction steps were used for assembly, including contig construction, scaffold construction and gap filling, using SOAPdenovo1.04 (http://soap.genomics.org.cn/) (Supplementary Methods). Finally, we used 20-kb-span paired-end data generated from the 454 platform and 105-kb-span BAC-end data downloaded from NCBI (http://www.ncbi.nlm.nih.gov/nucgss?term=BOT01) to extend scaffold length (Supplementary Methods). The B. oleracea genome size was estimated using the distribution curve of 17-mer frequency (Supplementary Methods).
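Genome size estimation from a k-mer frequency distribution is usually based on the relation: genome size ≈ total number of k-mers / depth at the main peak. The sketch below shows that generic calculation on a toy histogram. It is not the procedure from the Supplementary Methods; the `min_depth` cutoff for discarding likely error k-mers and the histogram values are assumptions for illustration.

```python
from typing import Dict


def estimate_genome_size(histogram: Dict[int, int], min_depth: int = 3) -> float:
    """Estimate genome size from a k-mer depth histogram.

    histogram maps a depth to the number of distinct k-mers seen at that depth.
    Depths below min_depth are treated as sequencing errors and ignored.
    """
    usable = {depth: count for depth, count in histogram.items() if depth >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(usable, key=usable.get)  # depth with the most distinct k-mers
    total_kmers = sum(depth * count for depth, count in usable.items())
    return total_kmers / peak_depth


# Toy histogram: an error tail at depths 1 and 2, and a main peak at depth 10.
toy_histogram = {1: 1000, 2: 200, 9: 50, 10: 100, 11: 50}
toy_size = estimate_genome_size(toy_histogram)
print(f"estimated size: {toy_size:.0f} k-mer positions")
```

Real data from a genome with this level of repeat content would also show secondary peaks at multiples of the main depth, which this simple version does not model separately.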

To anchor the assembled scaffolds onto pseudo-chromosomes, we developed a genetic map using a double haploid population with 165 lines derived from an F1 cross between two homozygous lines 02–12 (sequenced) and 0188 (re-sequenced). The genetic map contains 1,227 simple sequence repeat markers and single nucleotide polymorphism markers in nine linkage groups, which span a total of 1,180.2 cM with an average of 0.96 cM between the adjacent loci16. To position these markers to the scaffolds, marker primers were compared with the scaffold sequences using e-PCR (parameters -n2 -g1 -d 400-800), with the best-scoring match chosen in case of multiple matches.
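The reported marker density can be checked from the map length and marker count. The quoted 0.96 cM matches total length divided by the number of markers; dividing by the number of intervals within linkage groups instead (markers minus linkage groups) gives a slightly larger value. Which convention the original map paper used is not stated here, so both are shown, with names introduced for illustration.

```python
MAP_LENGTH_CM = 1180.2   # total genetic map length quoted in the text
MARKER_COUNT = 1227      # SSR and SNP markers quoted in the text
LINKAGE_GROUPS = 9       # nine linkage groups

# Convention 1: map length per marker.
cm_per_marker = MAP_LENGTH_CM / MARKER_COUNT

# Convention 2: map length per interval between adjacent markers,
# where each linkage group with n markers contributes n - 1 intervals.
interval_count = MARKER_COUNT - LINKAGE_GROUPS
cm_per_interval = MAP_LENGTH_CM / interval_count

print(f"per marker:   {cm_per_marker:.2f} cM")
print(f"per interval: {cm_per_interval:.2f} cM")
```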

We validated the B. oleracea genome assembly by comparing it with the published physical map constructed using 73,728 BAC clones (http://lulu.pgml.uga.edu/fpc/WebAGCoL/brassica/WebFPC/)17 and a genetic map from B. napus18 (Supplementary Methods). Eleven Sanger-sequenced B. oleracea BAC sequences were used to assess the assembled genome using MUMmer-3.22 (http://mummer.sourceforge.net/) (Supplementary Methods).

Gene prediction and annotation

Gene prediction was performed on the genome sequence after pre-masking for TEs (Supplementary Methods). Gene prediction was processed with the following steps: (i) De novo gene prediction used AUGUSTUS47 and GlimmerHMM48 with parameters trained from A. thaliana genes. (ii) For homologue prediction, we mapped the protein sequences from A. thaliana, O. sativa, C. papaya, V. vinifera and P. trichocarpa to the B. oleracea genome using tblastn with an E-value cutoff of 10^-5, and used GeneWise (Version 2.2.0)49 for gene annotation. (iii) For EST-aided annotation, the Brassica ESTs from NCBI were aligned to the B. oleracea genome using BLAT (identity ≥0.95, coverage ≥0.90) and further assembled using PASA50. Finally, all the predictions were combined using GLEAN51 to produce the consensus gene sets.
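The EST filtering thresholds in step (iii) can be expressed as a simple predicate over alignment records. The sketch below assumes identity is matches divided by aligned bases and coverage is aligned bases divided by EST length; the text does not define either term, and the record type, field names and example values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EstAlignment:
    """Minimal alignment record (hypothetical fields, not a BLAT file parser)."""
    est_id: str
    matches: int
    mismatches: int
    est_length: int

    @property
    def identity(self) -> float:
        aligned = self.matches + self.mismatches
        return self.matches / aligned if aligned else 0.0

    @property
    def coverage(self) -> float:
        aligned = self.matches + self.mismatches
        return aligned / self.est_length if self.est_length else 0.0


def passes_filter(alignment: EstAlignment,
                  min_identity: float = 0.95,
                  min_coverage: float = 0.90) -> bool:
    """Apply the identity and coverage thresholds quoted in the text."""
    return alignment.identity >= min_identity and alignment.coverage >= min_coverage


alignments: List[EstAlignment] = [
    EstAlignment("est1", matches=480, mismatches=10, est_length=500),  # passes both
    EstAlignment("est2", matches=400, mismatches=40, est_length=500),  # low identity
    EstAlignment("est3", matches=300, mismatches=2, est_length=500),   # low coverage
]
kept_ids = [a.est_id for a in alignments if passes_filter(a)]
print(kept_ids)
```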

Functional annotation of B. oleracea genes was based on comparison with the SwissProt, TrEMBL, Interproscan and KEGG protein databases. The tRNA genes were identified by tRNAscan-SE using default parameters (ref. 52). rRNA sequences were then compared with the genome using blastn. Other non-coding RNAs, including miRNA and snRNA, were identified using INFERNAL (ref. 53) by comparison with the Rfam database.

TE annotation

LTR-RTs were initially identified using the LTR_STRUC (ref. 54) programme, and then manually annotated and checked based on structural characteristics and sequence homology. Refined intact elements were then used to identify other intact elements and solo LTRs (ref. 55). All the LTR-RTs with clear boundaries and insertion sites were classified into superfamilies (Copia-like, Gypsy-like and Unclassified retroelements) and families relying on the internal protein sequence, 5′ and 3′ LTRs, primer-binding site and polypurine tracts. Non-LTR-RTs (long interspersed nuclear element, LINE, and short interspersed nuclear element, SINE) and DNA transposons (Tc1-Mariner, hAT, Mutator, Pong, PIF-Harbinger, CACTA and miniature inverted repeat TE) were identified using conserved protein domains of reverse transcriptase or transposase as queries to search against the assembled genome using tblastn. Further upstream and downstream sequences of the candidate matches were compared with each other to define their boundaries and structure (ref. 56). Helitron elements were identified by the HelSearch 1.0 programme (ref. 57) and manually inspected. All the TE categories were identified according to the criteria described previously (ref. 58). Typical elements of each category were selected and mixed together as a database for RepeatMasker (ref. 59) analysis. Around 20× coverage of shotgun reads randomly sampled from the two Brassica genomes was masked by the same TE data set to confirm the different accumulation of TEs between the two genomes.

Syntenic block construction of B. oleracea and its relatives

We used the same strategy as described in the B. rapa genome paper (ref. 11) to construct syntenic blocks between species (Supplementary Methods). The all-against-all blastp comparison (E-value ≤ 1e-5) provided the gene pairs for syntenic clustering determined by MCScan (MATCH_SCORE: 50, MATCH_SIZE: 5, GAP_SCORE: -3, E_VALUE: 1E-05). As applied in B. rapa (ref. 11), we assigned and partitioned multiple B. oleracea or B. rapa chromosomal segments that matched the same A. thaliana segment (‘A to X’ numbering system in A. thaliana (ref. 22)) into three subgenomes: LF, MF1 and MF2.

OrthoMCL clustering

To identify and estimate the number of potential orthologous gene families between B. oleracea, B. rapa, A. thaliana, C. papaya, P. trichocarpa, V. vinifera, S. bicolor and O. sativa, and also between B. oleracea and B. rapa, we applied the OrthoMCL pipeline (ref. 60) using standard settings (blastp E value <1e-5 and inflation factor =1.5) to compute the all-against-all similarities.

Phylogenetic analysis of gene families

We performed comparative analysis of trait-related gene families. Genes from grape, papaya and Arabidopsis were downloaded from the GenoScope database (http://www.genoscope.cns.fr/externe/GenomeBrowser/Vitis/), the Hawaii Papaya Genome Project (http://asgpb.mhpcc.hawaii.edu/papaya/), and the Arabidopsis Information Resource (http://www.arabidopsis.org/). Previously reported Arabidopsis and Brassica gene sequences were downloaded from TAIR (http://www.arabidopsis.org/) and BRAD (http://brassicadb.org/brad/). The protein sequences of the genes were used to determine homologues in grape, papaya, Arabidopsis, B. oleracea and B. rapa by performing blast comparisons with an E-value cutoff of 1e-10. The Clustal (ref. 61) programs were used for multiple sequence alignment. The alignment of the small family of GI genes was analysed in MEGA5 (ref. 62) by neighbour-joining with default parameters and subjected to careful manual checks to remove highly divergent sequences from further analysis. For other genes, often found in families of tens of genes, the phylogenetic analyses were performed with PhyML (ref. 63), which can accommodate quite divergent sequences by implementing a maximum likelihood approach with initial analysis based on the neighbour-joining method. During these analyses, we constructed trees using both CDS and protein sequence, and the protein-derived tree was used to show the phylogeny if not much incongruity was found. Bootstrapping was performed using 100 repetitive samplings for each gene family. All the inferred trees were displayed using MEGA5 (ref. 62). The multiple sequence alignments of these families are provided as Supplementary Data 3.

Differential expression of duplicated genes across tissues

RNA-seq reads were mapped to their respective locations on the reference genome using Tophat (ref. 64). Uniquely aligned read counts were calculated for each gene for each tissue sample. We performed the exact conditional test of two Poisson rates on read counts of duplicated genes to test the differential expression of duplicated genes, according to the method applied in soybean (refs 65, 66). For each duplicated gene pair (for example, genes A and B), read counts and gene length were denoted as Ea and La for gene A, and Eb and Lb for gene B, respectively. The read counts of the genes A and B were assumed to follow Poisson distributions with rates λa = Ra × La and λb = Rb × Lb. Under the null hypothesis of equal expression of the genes A and B, that is, Ra = Rb, the conditional distribution of Ea given Ea + Eb = k follows a binomial distribution with success probability P = λa/(λa + λb) = La/(La + Lb). The P values were computed and further adjusted to maintain the false discovery rate at 0.05 across gene pairs using the Benjamini–Hochberg method (ref. 67).
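The test described above reduces to an exact binomial test followed by a multiple-testing correction, and can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the two-sided P value is taken as the summed probability of all outcomes no more likely than the observed one (the text does not specify sidedness), and the function names and example counts are hypothetical.

```python
from math import exp, lgamma, log


def binom_pmf(i, n, p):
    """Binomial probability of i successes in n trials, computed in log space."""
    log_coef = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
    return exp(log_coef + i * log(p) + (n - i) * log(1 - p))


def pair_p_value(ea, la, eb, lb):
    """Exact conditional test for one gene pair.

    ea, eb: read counts of genes A and B; la, lb: gene lengths (both > 0).
    Under the null Ra = Rb, Ea | Ea + Eb = k ~ Binomial(k, La / (La + Lb)).
    """
    k = ea + eb
    if k == 0:
        return 1.0  # no reads, no evidence either way
    p = la / (la + lb)
    pmf = [binom_pmf(i, k, p) for i in range(k + 1)]
    cutoff = pmf[ea] * (1 + 1e-7)  # tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= cutoff))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical (Ea, La, Eb, Lb) tuples for three duplicated gene pairs.
pairs = [(20, 1000, 0, 1000), (10, 1000, 10, 1000), (30, 2000, 10, 1000)]
raw = [pair_p_value(*pair) for pair in pairs]
adj = benjamini_hochberg(raw)
differential = [q < 0.05 for q in adj]
```

For production use, a library routine such as `scipy.stats.binomtest` would be a reasonable substitute for the hand-written binomial test.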

Statistical analysis

The average number of all retained orthologues in the three subgenomes was used to estimate the expected retained gene number in each block, and was used together with the observed retained gene number for the gene retention disparity statistics using the χ2 test. In the GO, IPR (Interproscan) or KEGG enrichment analyses of WGT or tandem genes, the χ2 test (N>5) or Fisher’s exact test (N≤5) was used to detect significant differences between the proportion of (WGT or tandem) genes observed in each child GO, IPR or KEGG category and the expected overall proportion of (WGT or tandem) genes in the whole genome. Correlation of the gene numbers of WGT-derived paralogous genes with tandem genes in 938 GO terms was tested by Pearson correlation coefficients (Supplementary Figure 68). The Benjamini–Hochberg false discovery rate procedure was used to adjust the P values (ref. 67).
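One reading of the retention disparity statistic is a goodness-of-fit χ2 test per block, with the mean of the three subgenome counts as the expected value for each subgenome. The sketch below assumes that reading; the function name and counts are hypothetical. With three categories and a fixed total there are 2 degrees of freedom, for which the χ2 survival function has the closed form exp(-x/2).

```python
from math import exp


def retention_disparity(observed):
    """Chi-square test of equal gene retention across LF, MF1 and MF2.

    observed: retained gene counts for the three subgenomic copies of a block.
    Returns (chi2, p_value) with 2 degrees of freedom.
    """
    if len(observed) != 3:
        raise ValueError("expected exactly three subgenome counts")
    expected = sum(observed) / 3  # average retained number across subgenomes
    if expected == 0:
        return 0.0, 1.0
    chi2 = sum((o - expected) ** 2 / expected for o in observed)
    p_value = exp(-chi2 / 2)  # exact survival function for 2 d.f.
    return chi2, p_value


# Hypothetical block with more genes retained in the first subgenome.
chi2, p = retention_disparity((120, 90, 90))
print(f"chi2 = {chi2:.2f}, P = {p:.4f}")
```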

Author contributions

I.B., B.C., D.E., Q.H., W.H., G.J.K., S.L., Y.L., J. Ma, A.H.P., J.C.P., I.A.P.P., Jun W., Xiaowu W., Xiyin W. and T.-J.Y. are principal investigators (alphabetic order). B.C., W.H., A.H.P., Jun W. and Xiaowu W. are equally contributing senior authors. S.L., J.W., W.H., X.X. and Z.Y. planned and managed the project. S.L., C.T., A.H.P., D.E., X.Y. and M.Z. wrote this manuscript and I.B., J. Ma, G.J.K., J.C.P., B.C., T.-J.Y., I.A.P.P., Xiyin W., Xiaowu W., K.L., Y.L., J.B. and A.G.S. made revisions, edits or comments. J.W. (leader), W.H. (co-leader), Jun W., L.Y. and Z.Y. performed DNA sequencing. L.Y. (leader), W.H. (co-leader), S.H., J.W., S.L. and J.Y. conducted genomic sequence assembly. S.H. (leader), Xiyin W. (co-leader), J. Min, I.B., W.H., J.B., D.E., P.R., S.L., J.S., Y.L. and W.W. conducted scaffold anchoring to linkage maps and assembly validation. X.Y. (leader), J.Y. (co-leader), S.L., Q.Z., S.H. and J. Min performed annotation. C.T. (leader), Wanshun L., W.H., Y.L., C.L., W.W., J. Wu, S.L., C.D. and M.Z. performed transcriptome sequencing. S.L. conceived analysis of comparison and evolution. S.L. (leader), C.T., X.Y., Zhangyan W., C.L., S.H., J. Ma, J.Y., M.Z., Zhuo W., Q.Z., S.P., I.A.P.P., A.G.S., L.Y., I.B., G.J.K., J.C.P., Xiaowu W., B.C., F.C., Yin H., Wenbin L. and X. Liang performed analysis of comparative genomics and evolution. J. Ma (leader), M.Z., Q.Z., C.T., S.L., B.C., S.H., H.B., C.L. and Jiana L. conducted TE analysis. Xiyin W. (leader), J.Y., T.-J.Y., Zhangyan W., L.W., J. Li, T.-H.L., Jinpeng W., H.J., X.T., X.L., M.G. and L.J. conducted gene family analysis. K.L. (leader), J.Y., S.L., C.T., H.L., H.G., S.P., D.Z., Z.F., Q.H., Xnfa W., C.Q., D.D., Z.H., Y.H., J.H., D.M., J.L., Z. Li, J.Z., L.X., Y. Zhou, Z.L. and Y. Zhang conducted trait-related gene analysis. A.H.P. (leader), Xiyin W., D.J., Y.W. and T.-H.L. conducted gene conversion analysis. T.-J.Y. (leader), M.Z., P.S., B.-S.P., J. Ma, N.E.W., R.Q., X.L., J. Lee and H.H.K. conducted centromere analysis. C.T. (leader), S.L., X.Y., S.H., C.L., Zhangyan W., Q.Z., J.Y., J.T. and J.B. conducted tandemly duplicated gene analysis. Zhangyan W. and J.Y. performed data submission.

Additional information

Accession codes: Genome sequence data for B. oleracea have been deposited in the DDBJ/EMBL/GenBank nucleotide core database under the accession code AOIX00000000. Transcriptome sequence data for B. rapa and B. oleracea have been deposited in the DDBJ/EMBL/GenBank Sequence Read Archive (SRA) under the accession codes GSE43245 and GSE42891, respectively.

How to cite this article: Liu, S. et al. The Brassica oleracea genome reveals the asymmetrical evolution of polyploid genomes. Nat. Commun. 5:3930 doi: 10.1038/ncomms4930 (2014).

Supplementary Material

Supplementary Figures, Tables, Methods and References

Supplementary Figures 1-68, Supplementary Tables 1-49, Supplementary Methods and Supplementary References

Supplementary Data 1

The 23,823 Brassica oleracea-B. rapa orthologous gene pairs and those with different exon numbers

Supplementary Data 2

The genes for biosynthesis and breakdown of glucosinolates (GSL) in B. rapa and B. oleracea.

Supplementary Data 3

The multiple sequence alignment of gene families corresponding to Figure 5 and Supplementary Figures 46-61.

Figure 1

Genomic structure and gene retention rates in syntenic regions of B. oleracea and B. rapa.

(a) Segmental colinearity of the genomes of B. oleracea, B. rapa and A. thaliana. Syntenic blocks are defined and labelled from A to X (coloured) as previously reported in A. thaliana (ref. 20). (b) Time estimate of WGD and the subsequent divergence of the two Brassica species. (c) Pattern of retention/loss of orthologous genes on each set of three subgenomic (LF, MF1 and MF2) blocks of B. oleracea and B. rapa corresponding to A. thaliana A to X blocks. The x axis denotes the physical position of each A. thaliana gene locus. The y axis denotes the proportion of orthologous genes retained in the B. oleracea and B. rapa subgenomic blocks around each A. thaliana gene, where 500 genes flanking each side of a certain gene locus were analysed, giving a total window size of 1,001 genes.
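The windowed retention proportion plotted in panel (c) can be sketched as below. This is an illustration only: the input is assumed to be a list of flags, ordered along an A. thaliana chromosome, marking whether each gene has a retained orthologue in a given subgenome; windows are assumed to be truncated at chromosome ends (the legend does not say how ends were handled); and the function name is hypothetical.

```python
def retention_profile(retained, flank=500):
    """Proportion of retained orthologues in a window around each gene.

    retained: sequence of 0/1 (or bool) flags in chromosomal gene order.
    flank: genes taken on each side, so a full window spans 2 * flank + 1 genes.
    """
    n = len(retained)
    # Prefix sums make each window count an O(1) lookup.
    prefix = [0]
    for flag in retained:
        prefix.append(prefix[-1] + (1 if flag else 0))
    profile = []
    for i in range(n):
        lo = max(0, i - flank)        # window is truncated at the ends
        hi = min(n, i + flank + 1)
        profile.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return profile


# Tiny example with a flank of 1 gene (window of 3) instead of 500.
print(retention_profile([1, 0, 1, 1], flank=1))
```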

Figure 2

TE comparison analyses in B. oleracea and B. rapa.

(a) TE copy number and total length in each assembly and B. oleracea–B. rapa syntenic blocks. (b) The number of intact LTR-RTs (Copia-like and Gypsy-like) that arose at different times (million years ago, MYA) in the syntenic regions of B. oleracea and B. rapa. (c) The comparison of TE distribution and composition in B. oleracea–B. rapa syntenic blocks along B. oleracea chromosomes. We divided the B. oleracea–B. rapa syntenic regions into non-overlapping 200 kb windows to compare TE contents. For each window, the ratio log10(B. oleracea/B. rapa) was calculated for total syntenic block length (blue line), LTR length (purple line), gene length (yellow point), exon length (red point) and intron length (green point). If B. oleracea > B. rapa in absolute length of TE composition in a compared window, the dot or line is above the line y=0. The corresponding B. rapa chromosome segments along B. oleracea C08 are indicated by coloured bars. All other B. oleracea chromosomes are shown in Supplementary Fig. 31. (d) Phylogeny of the Copia-like elements as an example of LTR-RTs of the syntenic regions in B. rapa and B. oleracea. The neighbor-joining (NJ) trees were generated based on the conserved RT domain nucleotide sequences using the Kimura two-parameter method (ref. 68) in MEGA4 (ref. 69).

Figure 3

The duplicated genes derived from tandem duplication and whole-genome duplications in Brassica genomes.

(a) A Venn diagram showing shared and specific tandem duplication events in A. thaliana, B. rapa and B. oleracea. (b,c) Distribution of tandem genes and WGT/WGD-derived paralogues in the KEGG pathway maps in B. oleracea (bol), B. rapa (bra) and A. thaliana (ath). For each KEGG pathway map, the proportion of the number of duplicated genes or paralogues to the total genes was calculated (x axis) and the number of maps whose tandem gene proportion fell in a range is shown on the y axis. (d) Oxidative phosphorylation pathway enriched by WGT-derived paralogous genes in the Brassica genomes. The gene copy numbers for each KO enzyme in B. oleracea, B. rapa and A. thaliana are shown (dash-connected) under the KO enzyme number.

Figure 4

Divergence of Brassica paralogous and orthologous genes in B. oleracea and B. rapa.

(a) Genome-wide gene conversion in B. oleracea. The conversion in B. rapa is shown in Supplementary Fig. 64. (b) The ratio of differentially expressed duplicated gene pairs derived from different duplications: alpha whole-genome duplication (α-WGD), Brassiceae-lineage WGT, tandem duplication (TD). Bol, B. oleracea; Bra, B. rapa. C: callus; R: root; St: stem; L: leaf; F: flower; Si: silique. The differentially expressed duplicated gene pairs were defined as fold change >2 and false discovery rate (FDR) <0.05, or as gene pairs where expression was detected for only one gene within the pair (FDR <0.05). (c) Box and whisker plots for differentiated expression for three subgenomes (LF, MF1 and MF2) in flower tissue of B. oleracea and B. rapa. For the other tissues, see Supplementary Fig. 67. (d) The duplicated gene pairs belonging to transcription factors (TFs) and their related GO terms contain a significantly lower ratio of differentially expressed duplicated gene pairs than the average at the genome-wide level in leaf (values given) and other tissues (values not presented) (Supplementary Table 45). (e) The GO terms (left) in which the duplicated gene pairs contain a significantly higher ratio of differentially expressed duplicated gene pairs than the average ratio at the genome-wide level in leaf and other tissues (Supplementary Table 46). Values from one tissue are presented and the other tissues are indicated with abbreviated letters to the right if expression in these tissues is significantly higher. (f) Expression variation caused by divergence (either different variants or differential expression of the same variants) of alternative splicing (AS) variants in WGT paralogous gene pairs with identical numbers of exons and in Bol–Bra orthologous gene pairs. IRES denotes types of intron retention and exon skipping.

Figure 5

Whole-genome-wide comparison of genes involved in glucosinolate metabolism pathways in B. oleracea and its relatives.

(a) Aliphatic and indolic GSL biosynthesis and catabolism pathways in A. thaliana, B. oleracea and B. rapa. The copy numbers of GSL biosynthetic genes in A. thaliana, B. rapa and B. oleracea are listed in square brackets, respectively. Potential anticancer substances/precursors are highlighted in blue bold. Two important amino acid chain elongation and side-chain modification loci, MAMs and AOP2, are highlighted in red bold, with the number in the green bracket representing the number of non-functional genes. (b,c) The neighbour-joining (NJ) trees of MAM and AOP genes were generated based on the aligned coding sequences and 100 bootstrap repeats. The silenced genes are indicated by red hollow circles; expressed functional genes are represented by red solid discs and green rectangles. In A. thaliana ecotype Columbia there are just MAM1 and MAM3. (d) Three B. oleracea AOP2 loci, among which are one functional AOP2 and two mutated AOP2. 1MOI3M: 1-methoxyindol-3-ylmethyl GSL; 1OHI3M: 1-hydroxyindol-3-ylmethyl GSL; 3MSOP: 3-methylsulfinylpropyl GSL; 3MTP: 3-methylthiopropyl GSL; 3PREY: 2-Propenyl GSL; 4BTEY: 3-butenyl GSL; 4MOI3M: 4-methoxyindol-3-ylmethyl GSL; 4OHB: 4-hydroxybutyl GSL; 4OHI3M: 4-hydroxyindol-3-ylmethyl GSL; 4MSOB: 4-methylsulfinylbutyl GSL; 4MTB: 4-methylthiobutyl GSL; AITC: allyl isothiocyanate; I3C: indole-3-carbinol; I3M: indolyl-3-methyl GSL; DIM: 3,3′-diindolylmethane; MAM: methylthioalkylmalate; AOP: 2-oxoglutarate-dependent dioxygenase.

Table 1

Summary of genome assembly and annotation of B. oleracea.

B. oleracea genome assembly

| | N90 | N50 | Longest | Total size |
|---|---|---|---|---|
| Contig size (bp) | 3,527 | 26,828 | 199,461 | 502,114,421 |
| Contig number | 22,669 | 5,425 | | Total number (>2 kb): 27,351 |
| Scaffold size (bp) | 258,906 | 1,457,055 | 8,788,225 | 539,907,250 |
| Scaffold number | 388 | 224 | Anchored to chr. 72% | Total number (>2 kb): 1,809 |

B. oleracea genome annotation

| B. oleracea | Size (bp), in the assembly | Copy number†, in the assembly | % assembly‡ | % in WG short reads* |
|---|---|---|---|---|
| Retrotransposon | 105,755,173 | 108,948 | 22.13 | 23.60 |
| DNA transposon | 79,675,583 | 170,500 | 16.67 | 12.71 |
| Total | 185,430,756 | 279,448 | 38.80 | 36.31 |

| | Gene models | Gene space covered§ | Annotated | Supported by ESTs |
|---|---|---|---|---|
| Protein-coding genes | 45,758 | 98% | 91.6% | 99.0% |

| Average transcript length | Average coding length | No. of average exons | No. of alternative splicing variants |
|---|---|---|---|
| 1,762 bp | 1,037 bp | 4.6 | 30,932 |

| Non-coding RNA | miRNA | tRNA | rRNA | snRNA |
|---|---|---|---|---|
| Copy number | 336 | 1,425 | 553 | 1,442 |
| Average length (bp) | 119 | 75 | 166 | 110 |

*WG, whole genome; 20× coverage reads were randomly sampled from all the genomic short read libraries.

†The copy number of TEs was from the RepeatMasker results.

‡The ungapped regions were used to detect the percentage of TEs in the assembly. TE sizes are from the ungapped regions of B. oleracea (477,847,347 bp).

§Estimated by public Brassica ESTs and RNA-seq data.
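For readers unfamiliar with the N50 and N90 statistics in Table 1, the sketch below shows the usual definition: sort sequences from longest to shortest and report the length (and the running count) at which the cumulative length first reaches the given fraction of the total. It assumes this standard definition; the function name and toy lengths are hypothetical.

```python
def nx_stat(lengths, fraction):
    """Return (length, count) at which sorted sequences cover `fraction` of the total.

    fraction=0.5 gives N50 and the number of sequences needed to reach it;
    fraction=0.9 gives N90.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)
    target = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, count
    return ordered[-1], len(ordered)  # reached only through rounding


# Toy contig lengths, for illustration only.
contigs = [10, 8, 6, 4, 2]
print("N50:", nx_stat(contigs, 0.5))
print("N90:", nx_stat(contigs, 0.9))
```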

Acknowledgments

This work was supported by the National Basic Research Program of China (2011CB109300, 2012CB113906, 2012CB723007 and 2006CB101600), the National Natural Science Foundation of China (3067134, 30671119 and 31301039), the National High Technology Research and Development Program (2013AA102602, 2012AA100105 and 2012AA100104), the China Agriculture Research System (CARS-13 and CARS-25-A), the Core Research Budget of the Non-profit Governmental Research Institution (1610172010005), the Special Fund for Agro-scientific Research in the Public Interest (201103016), China–Australia collaboration project (2010DFA31730), UK Biotechnology and Biological Sciences Research Council (BB/E017363/1), the Australian Research Council (LP0882095, LP0883462, DP0985953 and LP110100200), the Next-Generation BioGreen 21 Program (PJ008944 and PJ008202), and the US National Science Foundation (IOS 0638418, DBI 0849896, MCB 1021718).

References

  • 1. U.S. Department of Agriculture, Agricultural Research Service. USDA National Nutrient Database for Standard Reference, Release 26-Vegetables and Vegetable Products (2013).
  • 2. Kopsell, D. A. & Kopsell, D. E. Accumulation and bioavailability of dietary carotenoids in vegetable crops. Trends Plant Sci. 11, 499–507 (2006).
  • 3. Halkier, B. A. & Gershenzon, J. Biology and biochemistry of glucosinolates. Annu. Rev. Plant Biol. 57, 303–333 (2006).
  • 4. Khwaja, F. S., Wynne, S., Posey, I. & Djakiew, D. 3,3'-diindolylmethane induction of p75NTR-dependent cell death via the p38 mitogen-activated protein kinase pathway in prostate cancer cells. Cancer Prev. Res. (Phila) 2, 566–571 (2009).
  • 5. Li, Y. et al. Sulforaphane, a dietary component of broccoli/broccoli sprouts, inhibits breast cancer stem cells. Clin. Cancer Res. 16, 2580–2590 (2010).
  • 6. Higdon, J. V., Delage, B., Williams, D. E. & Dashwood, R. H. Cruciferous vegetables and human cancer risk: epidemiologic evidence and mechanistic basis. Pharmacol Res. 55, 224–236 (2007).
  • 7. Warwick, S. I., Francis, A. & Al-Shehbaz, I. A. Brassicaceae: species checklist and database on CD-Rom. Pl. Syst. Evol. 259, 249–258 (2006).
  • 8. Nagaharu, U. Genome analysis in Brassica with special reference to the experimental formation of B. napus and peculiar mode of fertilization. Jap. J. Bot. 7, 389–452 (1935).
  • 9. Bowers, J. E., Chapman, B. A., Rong, J. & Paterson, A. H. Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature 422, 433–438 (2003).
  • 10. Jiao, Y. et al. Ancestral polyploidy in seed plants and angiosperms. Nature 473, 97–100 (2011).
  • 11. Wang, X. et al. The genome of the mesopolyploid crop species Brassica rapa. Nat. Genet. 43, 1035–1039 (2011).
  • 12. Lysak, M. A., Koch, M. A., Pecinka, A. & Schubert, I. Chromosome triplication found across the tribe Brassiceae. Genome Res. 15, 516–525 (2005).
  • 13. Cheng, F. et al. Deciphering the diploid ancestral genome of the Mesohexaploid Brassica rapa. Plant Cell 25, 1541–1554 (2013).
  • 14. Town, C. D. et al. Comparative genomics of Brassica oleracea and Arabidopsis thaliana reveal gene loss, fragmentation, and dispersal after polyploidy. Plant Cell 18, 1348–1359 (2006).
  • 15. Mun, J. H. et al. Genome-wide comparative analysis of the Brassica rapa gene space reveals genome shrinkage and differential loss of duplicated genes after whole genome triplication. Genome Biol. 10, R111 (2009).
  • 16. Wang, W. et al. Construction and analysis of a high-density genetic linkage map in cabbage (Brassica oleracea L. var. capitata). BMC Genomics 13, 523 (2012).
  • 17. Wang, X. et al. A physical map of Brassica oleracea shows complexity of chromosomal changes following recursive paleopolyploidizations. BMC Genomics 12, 470 (2011).
  • 18. Bancroft, I. et al. Dissecting the genome of the polyploid crop oilseed rape by transcriptome sequencing. Nat. Biotechnol. 29, 762–766 (2011).
  • 19. Arabidopsis Genome Initiative. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796–815 (2000).
  • 20. Schranz, M. E., Lysak, M. A. & Mitchell-Olds, T. The ABC’s of comparative genomics in the Brassicaceae: building blocks of crucifer genomes. Trends Plant Sci. 11, 535–542 (2006).
  • 21. Woodhouse, M. R. et al. Following tetraploidy in maize, a short deletion mechanism removed genes preferentially from one of the two homologs. PLoS Biol. 8, e1000409 (2010).
  • 22. Devos, K. M., Brown, J. K. & Bennetzen, J. L. Genome size reduction through illegitimate recombination counteracts genome expansion in Arabidopsis. Genome Res. 12, 1075–1079 (2002).
  • 23. Lysak, M. A., Cheung, K., Kitschke, M. & Bures, P. Ancestral chromosomal blocks are triplicated in Brassiceae species with varying chromosome number and genome size. Plant Physiol. 145, 402–410 (2007).
  • 24. Panjabi, P. et al. Comparative mapping of Brassica juncea and Arabidopsis thaliana using Intron Polymorphism (IP) markers: homoeologous relationships, diversification and evolution of the A, B and C Brassica genomes. BMC Genomics 9, 113 (2008).
  • 25. Parkin, I. A. et al. Segmental structure of the Brassica napus genome based on comparative analysis with Arabidopsis thaliana. Genetics 171, 765–781 (2005).
  • 26. Wu, J. et al. The genome of the pear (Pyrus bretschneideri Rehd.). Genome Res. 23, 396–408 (2012).
  • 27. The Tomato Genome Consortium. The tomato genome sequence provides insights into fleshy fruit evolution. Nature 485, 635–641 (2012).
  • 28. Cheung, F. et al. Comparative analysis between homoeologous genome segments of Brassica napus and its progenitor species reveals extensive sequence-level divergence. Plant Cell 21, 1912–1928 (2009).
  • 29. Freeling, M. Bias in plant gene content following different sorts of duplication: tandem, whole-genome, segmental, or by transposition. Annu. Rev. Plant Biol. 60, 433–453 (2009).
  • 30. Sankoff, D., Zheng, C. & Zhu, Q. The collapse of gene complement following whole genome duplication. BMC Genomics 11, 313 (2010).
  • 31. Woodhouse, M. R., Tang, H. & Freeling, M. Different gene families in Arabidopsis thaliana transposed in different epochs and at different frequencies throughout the rosids. Plant Cell 23, 4241–4253 (2011).
  • 32. Lou, P. et al. Preferential retention of circadian clock genes during diploidization following whole genome triplication in Brassica rapa. Plant Cell 24, 2415–2426 (2012).
  • 33. Doyle, J. J. et al. Evolutionary genetics of genome merger and doubling in plants. Annu. Rev. Genet. 42, 443–461 (2008).
  • 34. Wang, X., Tang, H. & Paterson, A. H. Seventy million years of concerted evolution of a homoeologous chromosome pair, in parallel, in major Poaceae lineages. Plant Cell 23, 27–37 (2011).
  • 35. Syed, N. H., Kalyna, M., Marquez, Y., Barta, A. & Brown, J. W. Alternative splicing in plants--coming of age. Trends Plant Sci. 17, 616–623 (2012).
  • 36. Gabut, M. et al. An alternative splicing switch regulates embryonic stem cell pluripotency and reprogramming. Cell 147, 132–146 (2011).
  • 37. Zhang, P. G., Huang, S. Z., Pin, A. L. & Adams, K. L. Extensive divergence in alternative splicing patterns after gene and genome duplication during the evolutionary history of Arabidopsis. Mol. Biol. Evol. 27, 1686–1697 (2010).
  • 38. Filichkin, S. A. et al. Genome-wide mapping of alternative splicing in Arabidopsis thaliana. Genome Res. 20, 45–58 (2010).
  • 39. Yang, B. & Quiros, C. F. Survey of glucosinolate variation in leaves of Brassica rapa crops. Genet. Res. Crop Evol. 57, 1079–1089 (2010).
  • 40. Benderoth, M., Pfalz, M. & Kroymann, J. Methylthioalkylmalate synthases: genetics, ecology and evolution. Phytochem. Rev. 8, 255–268 (2009).
  • 41. Benderoth, M. et al. Positive selection driving diversification in plant secondary metabolism. Proc. Natl. Acad. Sci. USA 103, 9118–9123 (2006).
  • 42. Textor, S., de Kraker, J. W., Hause, B., Gershenzon, J. & Tokuhisa, J. G. MAM3 catalyses the formation of all aliphatic glucosinolate chain lengths in Arabidopsis. Plant Physiol. 144, 60–71 (2007).
  • 43. Volden, J. et al. Processing (blanching, boiling, steaming) effects on the content of glucosinolates and antioxidant related parameters in cauliflower (Brassica oleracea L. ssp. botrytis). LWT Food Sci. Technol. 42, 63–73 (2009).
  • 44. Wang, H. et al. Glucosinolate biosynthetic genes in Brassica rapa. Gene 487, 135–142 (2011).
  • 45. Chen, Z. J. Molecular mechanisms of polyploidy and hybrid vigour. Trends Plant Sci. 15, 57–71 (2010).
  • 46. Schnable, J. C., Springer, N. M. & Freeling, M. Differentiation of the maize subgenomes by genome dominance and both ancient and ongoing gene loss. Proc. Natl. Acad. Sci. USA 108, 4069–4074 (2011).
  • 47. Stanke, M., Steinkamp, R., Waack, S. & Morgenstern, B. AUGUSTUS: a web server for gene finding in eukaryotes. Nucleic Acids Res. 32, W309–W312 (2004).
  • 48. Majoros, W. H., Pertea, M. & Salzberg, S. L. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics 20, 2878–2879 (2004).
  • 49. Birney, E., Clamp, M. & Durbin, R. GeneWise and Genomewise. Genome Res. 14, 988–995 (2004).
  • 50. Xu, Y., Wang, X., Yang, J., Vaynberg, J. & Qin, J. PASA—a program for automated protein NMR backbone signal assignment by pattern-filtering approach. J. Biomol. NMR 34, 41–56 (2006).
  • 51. Elsik, C. G. et al. Creating a honey bee consensus gene set. Genome Biol. 8, R13 (2007).
  • 52. Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997).
  • 53. Nawrocki, E. P., Kolbe, D. L. & Eddy, S. R. Infernal 1.0: inference of RNA alignments. Bioinformatics 25, 1335–1337 (2009).
  • 54. McCarthy, E. M. & McDonald, J. F. LTR_STRUC: a novel search and identification program for LTR retrotransposons. Bioinformatics 19, 362–367 (2003).
  • 55. Ma, J., Devos, K. M. & Bennetzen, J. L. Analyses of LTR-retrotransposon structures reveal recent and rapid genomic DNA loss in rice. Genome Res. 14, 860–869 (2004).
  • 56. Holligan, D., Zhang, X., Jiang, N., Pritham, E. J. & Wessler, S. R. The transposable element landscape of the model legume Lotus japonicus. Genetics 174, 2215–2228 (2006).
  • 57. Yang, L. & Bennetzen, J. L. Structure-based discovery and description of plant and animal Helitrons. Proc. Natl Acad. Sci. USA 106, 12832–12837 (2009).
  • 58. Wicker, T. et al. A unified classification system for eukaryotic transposable elements. Nat. Rev. Genet. 8, 973–982 (2007).
  • 59. Smit, A., Hubley, R. & Green, P. RepeatMasker. http://www.repeatmasker.org.
  • 60. Li, L., Stoeckert, C. J. Jr & Roos, D. S. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13, 2178–2189 (2003).
  • 61. Larkin, M. A. et al. Clustal W and Clustal X version 2.0. Bioinformatics 23, 2947–2948 (2007).
  • 62. Tamura, K. et al. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol. Biol. Evol. 28, 2731–2739 (2011).
  • 63. Guindon, S., Delsuc, F., Dufayard, J. F. & Gascuel, O. Estimating maximum likelihood phylogenies with PhyML. Methods Mol. Biol. 537, 113–137 (2009).
  • 64. Trapnell, C. et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Protoc. 7, 562–578 (2012).
  • 65. Roulin, A. et al. The fate of duplicated genes in a polyploid plant genome. Plant J. 73, 143–153 (2012).
  • 66. Gu, K., Ng, H. K., Tang, M. L. & Schucany, W. R. Testing the ratio of two poisson rates. Biom. J. 50, 283–298 (2008).
  • 67. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. 57, 289–300 (1995).
  • 68. Kimura, M. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16, 111–120 (1980).
  • 69. Tamura, K., Dudley, J., Nei, M. & Kumar, S. MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol. Biol. Evol. 24, 1596–1599 (2007).