The time-resolved transcriptome of <em>C. elegans</em>
Abstract
We generated detailed RNA-seq data for the nematode Caenorhabditis elegans with high temporal resolution in the embryo as well as representative samples from post-embryonic stages across the life cycle. The data reveal that early and late embryogenesis are accompanied by large numbers of genes changing expression, whereas fewer genes change in mid-embryogenesis. This lull in genes changing expression correlates with a period during which histone mRNAs produce almost 40% of the RNA-seq reads. We find evidence for many more splice junctions than are annotated in WormBase, with many of these suggesting alternative splice forms, often with differential usage over the life cycle. We annotated internal promoter usage in operons using SL1 and SL2 data. We also uncovered correlated transcriptional programs that span >80 kb. These data provide detailed annotation of the C. elegans transcriptome.
RNA transcripts represent a direct readout of the information stored in a genome. Their differential abundance in turn reflects the regulatory networks operative in the organism. Accurate and comprehensive characterization of RNA transcript levels is central to an understanding of how an organism's genome dictates its traits and behavior. In the nematode Caenorhabditis elegans, multiple studies have assayed the RNA content at different stages and in different tissues. Microarray studies, including a detailed embryonic time course using small numbers of hand-picked embryos, gave a picture of overall gene expression across early development (Kim et al. 2001; Baugh et al. 2003; Levin et al. 2012). SAGE tags provided a deeper analysis of transcripts present at various stages and in certain tissues or cell types (Shin et al. 2008; McGhee et al. 2009). More recently, CEL-seq has been used on individual embryos to produce a detailed embryonic time series (Hashimshony et al. 2015).
Each of the aforementioned studies has provided useful insight into the RNA transcripts present during the life cycle, but none covers the entire life cycle and each has its own shortcomings. Microarray studies have a limited dynamic range, often assay only annotated genes, fail to distinguish between close paralogs, and usually ignore different isoforms. Studies using small numbers of embryos require multiple rounds of amplification, possibly introducing significant distortion into the expression measurements. SAGE tags attempt only to assay 3′ ends of polyadenylated [poly(A)] transcripts, ignoring splicing; internal priming at A-rich sites can create false positive tags. In addition, the short length of early SAGE tags led to ambiguity in genome alignment. CEL-seq on individual embryos can assay very precise time points, but again the method only seeks to count 3′ ends of poly(A) mRNAs. In addition, the limited efficiency of CEL-seq in copying RNA into DNA and the subsequent amplification lead to irregular representation of lower abundance transcripts. The lack of a single comprehensive data set across the full life cycle complicates comparison of gene expression levels at different stages.
To provide a comprehensive, high-quality, uniformly collected expression data set for C. elegans, we performed RNA-seq on bulk samples from synchronized animals across the full life cycle, including embryonic samples beginning at four cells and sampled at 30-min intervals. Obtaining these embryo data required the development of a novel method to synchronize bulk populations of embryos and the implementation of a Bayesian approach to refine the estimates of gene expression within individual developmental series and to combine multiple series. The resultant new embryo data, combined with expression data from larval stages, dauers, males, and aged adults collected as part of the modENCODE project (Hillier et al. 2009; Gerstein et al. 2010, 2014), reveal the pattern of expression for protein coding genes, as well as the patterns of noncoding transcripts, splice junctions, and spliced leader sequences across the full life cycle.
Acknowledgments
We thank Pnina Strasbourger for assistance in making RNA-seq libraries; Calvin Mok and Adam Warner for helpful discussions; and John Murray and Don Moerman for comments on the manuscript. This work was supported by National Institutes of Health (NIH) grants U01HG004263 and R01GM072675 to R.H.W. and by the William H. Gates Chair of Biomedical Sciences.
Footnotes
[Supplemental material is available for this article.]
Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.202663.115.
Freely available online through the Genome Research Open Access option.




