Sequence of the Sugar Pine Megagenome.
Journal: 2017/May - Genetics
ISSN: 1943-2631
Abstract:
Until very recently, complete characterization of the megagenomes of conifers has remained elusive. The diploid genome of sugar pine (Pinus lambertiana Dougl.) is highly repetitive and spans 31 billion bp. It is the largest genome sequenced and assembled to date, and the first from the subgenus Strobus, or white pines, a group that is notable for having the largest genomes among the pines. The genome represents a unique opportunity to investigate genome "obesity" in conifers and white pines. Comparative analysis of P. lambertiana and P. taeda L. reveals new insights on the conservation, age, and diversity of the highly abundant transposable elements, the primary factor determining genome size. As for most North American white pines, the principal pathogen of P. lambertiana is white pine blister rust (Cronartium ribicola J.C. Fischer ex Raben.). Identification of candidate genes for resistance to this pathogen is of great ecological importance. The genome sequence afforded us the opportunity to make substantial progress on locating the major dominant gene for simple resistance hypersensitive response, Cr1. We describe new markers and gene annotation that are both tightly linked to Cr1 in a mapping population, and associated with Cr1 in unrelated sugar pine individuals sampled throughout the species' range, creating a solid foundation for future mapping. This genomic variation and annotated candidate genes characterized in our study of the Cr1 region are resources for future marker-assisted breeding efforts as well as for investigations of fundamental mechanisms of invasive disease and evolutionary response.
Genetics 204(4): 1613-1626

+14 authors
Department of Evolution and Ecology, University of California at Davis, California 95616
Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut
Institute for Physical Sciences and Technology (IPST), University of Maryland, College Park, Maryland
Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, Maryland
Children’s Hospital Oakland Research Institute, California
Department of Plant Sciences, University of California at Davis, California
United States Department of Agriculture Forest Service, Pacific Southwest Research Station, Placerville, California
Department of Ecosystem Science and Management, Texas A&M University, College Station, Texas
School of Forest Resources and Conservation, University of Florida, Gainesville, Florida
Department of Biology, Virginia Commonwealth University, Richmond, Virginia
Departments of Biomedical Engineering, Computer Science, and Biostatistics, Johns Hopkins University, Baltimore, Maryland
Departments of Mathematics and Physics, University of Maryland, College Park, Maryland
Corresponding authors: Department of Evolution and Ecology, University of California at Davis, One Shields Ave, Davis, CA 95616. E-mail: kastevens@ucdavis.edu; and chlangley@ucdavis.edu
These authors contributed equally to this work.
Received 2016 Jun 29; Accepted 2016 Oct 25.

Keywords: conifer genome, transposable elements, white pine blister rust
The gymnosperm genus Pinus is diverse and ubiquitous in temperate zones (Critchfield and Little 1966; Farjon and Filer 2013). Pines are often the keystone trees of terrestrial ecosystems (Richardson and Rundel 1998; Keane et al. 2012, and citations therein). Typical of conifers, pines have megagenomes that vary greatly in size among species, yet their karyotype is highly conserved. Pinus is divided into two large, ancient monophyletic subgenera, Strobus and Pinus, “white pines” and “yellow pines,” respectively (Critchfield and Little 1966; Gernandt et al. 2005). The first Pinus genome sequence (22 Gbp) was recently reported for Pinus taeda L. (Zimin et al. 2014), a yellow pine commonly known as loblolly pine. The genomes of white pines are larger and more variable in size (Tomback 1982). Fossils allied with Strobus are known from the early Tertiary and late Cretaceous (Millar 1998), consistent with molecular phylogenetic dating of the crown group Strobus at 45–85 MYA (Willyard et al. 2007; DeGiorgio et al. 2014). Populations of a number of the majestic white pines of North America, and their associated ecosystems, have been devastated over the last century by white pine blister rust (WPBR; Kinloch 1992), caused by a highly pathogenic and invasive fungus, Cronartium ribicola J.C. Fischer ex Raben. While major gene resistance to this disease has been discovered in several species, and loci have been placed on the genetic maps of Pinus lambertiana Dougl. (Harkins et al. 1998; Jermstad et al. 2011) and P. monticola Dougl. ex D.Don (Liu et al. 2006), the discovery of the underlying genes, and of markers serviceable for genetic improvement in reforestation, may be greatly accelerated by the genome sequence itself.

P. lambertiana, commonly known as sugar pine, is a white pine native to western North America that is distributed from northern Oregon to Baja California at a wide span of altitudes. It is currently the tallest pine species, with heights reaching 76 m. The female cones of sugar pine are also gigantic, often longer than 600 mm (Kinloch and Scheuner 1990; Van Pelt 2001; American Forests 2015). P. lambertiana trees may live >500 years, and the onset of the species’ sexual reproduction is delayed compared to other pines, possibly due to the height and girth needed to support these massive strobili. Paralleling these oversized dimensions, the genome of P. lambertiana was estimated from cytometry to be 31 Gbp (see below), nearly 50% larger than that of P. taeda and ten times the size of the human genome. While P. lambertiana was historically a significant timber source, heavy harvesting and the arrival of the devastating white pine blister rust in its range have changed the management focus. Since this species plays important ecological roles in the maintenance of biodiversity, carbon sequestration, soil stabilization, and watershed protection (Maloney 2012), considerable effort and resources have been deployed both by the US Forest Service and the private sector to structure the genetics of reforestation to fit the ecological factors, especially WPBR (reviewed in Waring and Goodrich 2012). In particular, the screening by progeny testing of diverse seed sources for individual trees carrying the major gene for WPBR resistance, Cr1 (Kinloch 1992), has been ongoing for more than a decade. These extra costs of collecting seeds from candidate trees throughout the species range, of progeny testing for WPBR resistance (requiring several years), and of deploying resistant seedlings are significant components of forest management.
Genotyping by markers with strong associations to WPBR resistance has the potential to greatly reduce both the effort and time required by the ongoing approach, and could open new strategies. Here, we demonstrate that the sequencing, assembly, and annotation of the genome sequence of P. lambertiana greatly accelerates the discovery of such genetic tools.
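
As a quick check on the size comparison given above, the following sketch recomputes the ratios from the rounded figures quoted in this article (31 Gbp for P. lambertiana, 22 Gbp for P. taeda). The human genome size of roughly 3.1 Gbp is an assumption supplied here for illustration and does not come from the article; the function names are likewise illustrative.

```python
# Rounded genome sizes in Gbp.
SUGAR_PINE_GBP = 31.0  # P. lambertiana, as quoted in this article
LOBLOLLY_GBP = 22.0    # P. taeda, as quoted in this article
HUMAN_GBP = 3.1        # ASSUMPTION: approximate human genome size, not from the article


def percent_larger(a_gbp: float, b_gbp: float) -> float:
    """Return how much larger genome a is than genome b, in percent."""
    return (a_gbp - b_gbp) / b_gbp * 100.0


def fold_difference(a_gbp: float, b_gbp: float) -> float:
    """Return the size of genome a as a multiple of genome b."""
    return a_gbp / b_gbp


if __name__ == "__main__":
    pct = percent_larger(SUGAR_PINE_GBP, LOBLOLLY_GBP)
    fold = fold_difference(SUGAR_PINE_GBP, HUMAN_GBP)
    print(f"Sugar pine vs. loblolly pine: {pct:.1f}% larger")
    print(f"Sugar pine vs. human (assumed size): {fold:.1f}-fold")
```

With these rounded inputs, the sugar pine genome comes out about 41% larger than the loblolly pine genome and about ten times the assumed human value.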

Conifer evolution and genome size

All members of the genus Pinus have 12 chromosomes (Saylor 1960) and are considered to be karyotypically stable throughout their evolutionary history (Sax 1960; Saylor 1964). With the exception of a potential event preceding the radiation (Li et al. 2015), whole genome polyploidy is thought to be absent among the ≥100 species. However, the amount of nuclear DNA that comprises a single copy of a pine genome can vary widely between species. Flow cytometric estimates for the genus Pinus in the C-values database (Bennett and Leitch 2012) range from a low of 20 Gbp for P. muricata D. Don, to a high of 35 Gbp for P. ayacahuite Ehrenb. ex Schltdl. (Figure 1B). The correlates and causes of this variation in genome size, including in Pinus, are an open topic of speculation and investigation (Williams et al. 2002; Grotkopp et al. 2004; Ahuja and Neale 2005; Morse et al. 2009).


Figure 1. (A) The phylogeny of major genera within the Pinaceae along with genome size estimates. P. lambertiana falls in the Strobus subgenus. Inference was conducted using Bayesian analysis as implemented in BEAST ver. 2.2.0 (Bouckaert et al. 2014). Gray bars represent the 95% highest posterior density range for the age of the node. Data used for inference were 28 independent nuclear gene regions (see Eckert et al. 2013a,b), sequenced and assembled for representative taxa selected within each taxonomic group [Pinus subg. Pinus: P. taeda; Pinus subg. Strobus: P. lambertiana; Picea: P. abies; Pseudotsuga: P. menziesii (Mirb.) Franco; Larix: L. decidua Mill.; Abies: A. alba Mill.]. Details are presented in the Supplementary Methods in File S1. (B) Illustration of the genome size trends of major genera within Pinaceae. Genome sizes are from the C-values database (Bennett and Leitch 2012). Diamonds mark the estimates of genomes with a reference sequence. Point estimates in each category are shown as short horizontal lines. Species from other genera within the Pinaceae are shown in gray.

The two subgenera of Pinus diverged ∼45–85 MYA (Figure 1A) (see also Willyard et al. 2007). Members of Strobus have an average genome size 5.2 Gbp larger than the subgenus Pinus (Figure 1B) (Grotkopp et al. 2004). The majority of sequenced conifer megagenomes are composed of interspersed repetitive sequences, with estimates ranging from 69% for Picea abies (L.) H. Karst. (Nystedt et al. 2013) to 80% for P. taeda (Wegrzyn et al. 2014). The evolutionary dynamics of transposable elements (TEs) have long been suspected to shape genomic change, including overall genome size, in numerous species (Orgel and Crick 1980; Hawkins 2006; Piegu et al. 2006; Tenaillon et al. 2011), including conifers (Nystedt et al. 2013). In contrast to angiosperms, where genome duplication events and LTR retrotransposon bursts are frequent, and account for most of the genome size expansions, a continual accretion of repeats may provide a better explanation of genome size variation within the genus Pinus (Morse et al. 2009). The genome sequence of P. lambertiana presents a new opportunity to address elements of the hypothesis that TE dynamics are behind these significant changes in genome size.

White pine blister rust

WPBR, caused by the non-native heteroecious fungus Cronartium ribicola, affects North American pines of the Strobus subgenus. An invasive species, C. ribicola has devastated populations of five-needle pines, including P. strobus L. (eastern white pine), P. monticola (western white pine), P. lambertiana (sugar pine), P. flexilis James (limber pine), P. albicaulis Engelm. (whitebark pine), and foxtail pine, along with closely related bristlecone pines (subgenus Pinus subsection Balfourianae), since its introduction from Asia or Europe a century ago. Damage from C. ribicola is known to reduce reproduction and survival of the majority of white pine species (Kinloch 1970; Waring and Goodrich 2012). Exacerbated by recent outbreaks of the mountain pine beetle, decreasing pine populations have affected wildlife, biodiversity, watershed, and timber potential. Rare individuals among the white pine species exhibit innate and heritable resistance that forms the basis for various selective reforestation efforts (Kinloch 2003). A major “gene” of resistance (MGR) to WPBR was mapped in P. lambertiana over 40 years ago (Kinloch 1970). An apparently biallelic locus, Cr1R/Cr1r, has been mapped in several P. lambertiana families (Devey et al. 1995; Harkins et al. 1998; Jermstad et al. 2011). In this work, we leverage these markers and the assembled P. lambertiana genome to identify large genomic scaffolds tightly linked to Cr1 and SNPs in strong association with Cr1R. We discuss possible Cr1 candidates among the annotated genes.

Sequencing and assembly

The sequencing and assembly approach used here for P. lambertiana is an adaptation of the approach successfully used for P. taeda (Neale et al. 2014; Zimin et al. 2014). We have found that the haploid DNA obtainable from a single megagametophyte from the target genotype is sufficient to form the basis of a high quality whole genome shotgun assembly. For additional contiguity, haploid megagametophyte coverage is supplemented with longer linking mate pair libraries using DNA isolated from abundantly available diploid needle tissue of the maternal parent. For additional contiguity of the gene space, we performed transcriptome-based scaffolding using deep coverage RNA-Seq data. The nearly 50% larger size of the P. lambertiana genome required changes to the previous software methods to make assembly tractable. The resulting draft genome sequence described here has an N50 scaffold size of 246.6 kbp and a total estimated genome size of 31 Gbp, making it the largest genome sequenced and assembled to date.

N50 statistics were calculated using an estimated genome size of 31 Gbp. Paired-end sequencing depth represents the raw output prior to error correction. Physical coverage estimated by MaSuRCA (including the inferred DNA fragment) is reported here for all libraries by chemistry (see Supplementary Methods in File S1).

Erroneous k-mers refer to k-mers that were identified as likely to contain errors, and these were removed from the calculation.

Each pool contained 48 fosmids.
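
The table notes above state that the N50 statistics were calculated against an estimated genome size of 31 Gbp rather than against the total assembled length. A minimal sketch of that kind of calculation is shown below. It is not the authors' code: the function name `n50` and the toy scaffold lengths are hypothetical, and the way the published statistics were actually produced is not described in this section.

```python
from typing import Iterable, Optional


def n50(lengths: Iterable[int], genome_size: Optional[int] = None) -> int:
    """Return the N50 of a set of contig or scaffold lengths.

    If genome_size is given, the threshold is half of that estimate
    (a variant often called NG50); otherwise it is half of the summed
    lengths. Returns 0 if the sequences never reach the threshold.
    """
    ordered = sorted(lengths, reverse=True)
    total = genome_size if genome_size is not None else sum(ordered)
    threshold = total / 2

    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            # Shortest sequence needed to cover half of the target size.
            return length
    return 0


if __name__ == "__main__":
    # Hypothetical toy scaffold lengths (arbitrary units), not real data.
    toy_scaffolds = [80, 70, 50, 40, 30, 20]
    print(n50(toy_scaffolds))                   # threshold from assembled length
    print(n50(toy_scaffolds, genome_size=400))  # threshold from an estimated genome size
```

Using an estimated genome size as the denominator makes the statistic comparable between assemblies of the same genome, since it does not shrink when an assembly is incomplete. If the assembled sequences sum to less than half of the estimate, no N50 exists and this sketch returns 0.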

Acknowledgments

We thank Carson Holt and Mark Yandell for their modifications to their MAKER-P pipeline to support conifer genomes. Funding for this project was provided through a United States Department of Agriculture/National Institute of Food and Agriculture (USDA/NIFA) (2011-67009-30030) award to D.B.N. at University of California, Davis.

Note added in proof: See Gonzalez-Ibeas et al. 2016 (pp. 3787–3802) in G3: Genes, Genomes, Genetics for a related work.

Footnotes

Communicating editor: S. C. Gonzalez-Martinez

Supplemental material is available online at www.genetics.org/lookup/suppl/doi:10.1534/genetics.116.193227/-/DC1.

Literature Cited
