A highly conserved program of neuronal microexons is misregulated in autistic brains.
Journal: 2015/February - Cell
ISSN: 1097-4172
Abstract:
Alternative splicing (AS) generates vast transcriptomic and proteomic complexity. However, which of the myriad of detected AS events provide important biological functions is not well understood. Here, we define the largest program of functionally coordinated, neural-regulated AS described to date in mammals. Relative to all other types of AS within this program, 3-15 nucleotide "microexons" display the most striking evolutionary conservation and switch-like regulation. These microexons modulate the function of interaction domains of proteins involved in neurogenesis. Most neural microexons are regulated by the neuronal-specific splicing factor nSR100/SRRM4, through its binding to adjacent intronic enhancer motifs. Neural microexons are frequently misregulated in the brains of individuals with autism spectrum disorder, and this misregulation is associated with reduced levels of nSR100. The results thus reveal a highly conserved program of dynamic microexon regulation associated with the remodeling of protein-interaction networks during neurogenesis, the misregulation of which is linked to autism.
Cell 159(7): 1511-1523

Introduction

Alternative splicing (AS) – the process by which different pairs of splice sites are selected in precursor mRNA to generate multiple mRNA and protein products – is responsible for greatly expanding the functional and regulatory capacity of metazoan genomes (Braunschweig et al., 2013; Chen and Manley, 2009; Kalsotra and Cooper, 2011). For example, transcripts from over 95% of human multi-exon genes undergo AS, and most of the resulting mRNA splice variants are variably expressed between different cell and tissue types (Pan et al., 2008; Wang et al., 2008). However, the functions of the vast majority of AS events detected to date are not known, and new landscapes of AS regulation remain to be discovered and characterized (Braunschweig et al., 2014; Eom et al., 2013). Moreover, since the misregulation of AS frequently causes or contributes to human disease, there is a pressing need to systematically define the functions of splice variants in disease contexts.

AS generates transcriptomic complexity through differential selection of cassette alternative exons, alternative 5′ and 3′ splice sites, mutually exclusive exons, and alternative intron retention. These events are regulated by the interplay of cis-acting motifs and trans-acting factors that control the assembly of spliceosomes (Chen and Manley, 2009; Wahl et al., 2009). The assembly of spliceosomes at 5′ and 3′ splice sites is typically regulated by RNA binding proteins (RBPs) that recognize proximal cis-elements, referred to as exonic/intronic splicing enhancers and silencers (Chen and Manley, 2009). An important advance that is facilitating a more general understanding of the role of individual AS events is the observation that many cell/tissue type- and developmentally-regulated AS events are coordinately controlled by individual RBPs, and that these events are significantly enriched in genes that operate in common biological processes and pathways (Calarco et al., 2011; Irimia and Blencowe, 2012; Licatalosi and Darnell, 2010).

AS can have dramatic consequences on protein function, and/or affect the expression, localization and stability of spliced mRNAs (Irimia and Blencowe, 2012). While cell and tissue differentially-regulated AS events are significantly under-represented in functionally defined, folded domains in proteins, they are enriched in regions of protein disorder that typically are surface accessible and embed short linear interaction motifs (Buljan et al., 2012; Ellis et al., 2012; Romero et al., 2006). AS events located in these regions are predicted to participate in interactions with proteins and other ligands (Buljan et al., 2012; Weatheritt et al., 2012). Indeed, among a set of analyzed neural-specific exons enriched in disordered regions, approximately one third promoted or disrupted interactions with partner proteins (Ellis et al., 2012). These observations suggested that a widespread role for regulated exons is to specify cell and tissue type-specific protein interaction networks.

Human disease mutations often disrupt cis-elements that control splicing and result in aberrant AS patterns (Cartegni et al., 2002). Other disease changes affect the activity or expression of RBPs, causing entire programs of AS to be misregulated. For example, amyotrophic lateral sclerosis-causing mutations in the RBPs TLS/FUS and TDP43 affect AS and other aspects of post-transcriptional regulation (Polymenidou et al., 2012), and changes in the expression of the RBP RBFOX1 have been linked to misregulation of AS in the brains of individuals with autism spectrum disorder (ASD) (Voineagu et al., 2011). It is also widely established that misregulation of AS plays important roles in altering the growth and invasiveness of various cancers (David and Manley, 2010). As is the case with assessing the normal functions of AS, it is generally not known which disease-misregulated AS events cause or contribute to disease phenotypes.

Central to addressing the above questions is the importance of comprehensively defining AS programs associated with normal and disease biology. Gene prediction algorithms, high-throughput RNA sequencing (RNA-Seq) analysis methods, and RNA-Seq datasets generally lack the sensitivity and/or depth required to detect specific types of AS. In particular, microexons (Beachy et al., 1985; Coleman et al., 1987), defined here as 3-27 nucleotide (nt)-long exons, have been largely missed by genome annotations and transcriptome profiling studies (Volfovsky et al., 2003; Wu et al., 2013; Wu and Watanabe, 2005). This is especially true for microexons shorter than 15 nts. Furthermore, where alignment tools have been developed to capture microexons (Wu et al., 2013), they have not been applied to the analysis of different cell and tissue types, or disease states.

In this study, we developed a new RNA-Seq pipeline for the systematic discovery and analysis of all classes of AS, including microexons. By applying this pipeline to deep RNA-Seq datasets from more than 50 diverse cell and tissue types, as well as developmental stages, from human and mouse, we define a large program of neural-regulated AS. Strikingly, neural-included microexons represent the most highly conserved and dynamically-regulated component of this program, and the corresponding genes are highly enriched in neuronal functions. These microexons are enriched on the surfaces of protein interaction domains and are under strong selection pressure to preserve reading frame. We also observe that microexons are frequently misregulated in the brains of autistic individuals, and that this misregulation is linked to the reduced expression of the neural-specific Ser/Arg-related splicing factor of 100kDa, nSR100/SRRM4. Collectively, our results reveal that alternative microexons represent the most highly conserved component of developmental AS regulation identified to date, and that they function in domain surface “microsurgery” to control interaction networks associated with neurogenesis. Microexons thus represent a new landscape for investigating the molecular consequences of AS (mis)regulation in nervous system development and ASD.

Results

Global features of neural-regulated AS

An RNA-Seq analysis pipeline was developed to detect and quantify all AS event classes involving all hypothetically possible splice junctions formed by the usage of annotated and unannotated splice sites, including those that demarcate microexons. By applying this pipeline to more than 50 diverse cell and tissue types each from human and mouse (Table S1), we identified ∼2,500 neural-regulated AS events in each species (Figure 1A and Table S2; Extended Experimental Procedures).

Figure 1. An extensive program of neural-regulated AS

A) Distribution by type of human AS events with increased/decreased neural inclusion of the alternative sequence. Alt3/5, alternative splice site acceptor/donor selection; IR, intron retention; Microexons, 3-27 nt exons; Single/Multi AltEx, single/multiple cassette exons. B) Predicted impact of non-neural and neural-regulated AS events on proteomes. Neural-regulated events are more often predicted to generate isoforms preserving open reading frame (ORF) when the alternative sequence is included and excluded (“ORF-preserving isoforms”, black), than to disrupt ORFs (i.e. the exon leads to a frame shift and/or introduces a premature termination codon) specifically in neural samples (“ORF disruption in brain”, dark grey) or in non-neural samples (“ORF preservation in brain”, light grey). See Extended Experimental Procedures for details. C) Enrichment map for GO and KEGG categories in genes with neural-regulated AS that are predicted to generate alternative protein isoforms (top), and representative GO terms and their associated enrichment p-value for each subnetwork (bottom). The node size is proportional to the number of genes associated with the GO category, and the width of the edges to the number of genes shared between GO categories.

Nearly half of the neural-regulated AS events, including alternative retained introns, are predicted to generate protein isoforms both when the alternative sequence is included and skipped. In contrast, only ∼20% of AS events not subject to neural regulation (hereafter ‘non-neural’ events) have the potential to generate alternative protein isoforms (Figure 1B; p=2.7×10, proportion test). Gene Ontology (GO) analysis shows that genes with neural-regulated AS events predicted to generate alternative protein isoforms form highly interconnected networks based on functions associated with neuronal biology, signaling pathways, structural components of the cytoskeleton and the plasma membrane (Figure 1C). Consistent with previous results (Fagnani et al., 2007; Pan et al., 2004), there is little overlap (8.5%) between genes with neural-regulated AS and mRNA expression, although these subsets of genes are highly enriched in overlapping GO terms (40% in common; Figure S1). These data reveal the largest program of neural-regulated AS events defined to date, and that this program is associated with a broader range of functional processes and pathways linked to nervous system biology than previously detected (Boutz et al., 2007; Fagnani et al., 2007; Ule et al., 2005).

Highly conserved microexons are frequently neural specific

Further analysis of the neural-regulated AS program revealed a striking inverse relationship between the length of an alternative exon and its propensity to be specifically included in neural tissues. Increased neural-specific inclusion was detected for the majority of microexons (length ≤ 27 nt, Figure 2A); 60.7% of alternative microexons show increased neural ‘percent spliced in’ (PSI) (ΔPSI>15) versus 9.5% of longer (average ∼135 nt) alternative exons (p=1.9×10, proportion test). This trend extends to microexons as short as 3 nt. RT-PCR validation experiments confirmed the RNA-Seq-detected regulatory profiles and inclusion levels of all (10/10) microexons analyzed across ten diverse tissues (R = 0.92, n=107; Figure S2A). To further investigate the cell and tissue type specificity of microexon regulation, we used RNA-Seq data (Sofueva et al., 2013; Zhang et al., 2014; Zhang et al., 2013) to compare their inclusion levels in major glial cell types (astrocytes, microglia and oligodendrocytes), isolated neurons, and in muscle cells and tissues. While up to ∼20% of the detected neural-regulated microexons showed increased PSIs in one or more glial cell types, and/or in muscle, compared to other non-neural tissues, the vast majority (>90%) of neural-regulated microexons display highest PSIs in neurons compared to all other cell and tissue types analyzed (Figure S2B-D, and Supplemental Information). These results indicate that tissue-regulated microexons are predominately neuronal-specific.
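
The PSI and ΔPSI quantities used above can be illustrated with a minimal Python sketch. It assumes PSI is computed as the share of reads supporting inclusion among all reads covering the event, and that neural ΔPSI is the difference between the mean PSI of neural and non-neural samples. The function names and read counts are hypothetical, and the sketch does not reproduce the junction-level normalization of the actual pipeline (see Extended Experimental Procedures).

```python
from statistics import mean


def psi(inclusion_reads: float, exclusion_reads: float) -> float:
    """Percent spliced in (0-100): share of reads that support exon inclusion."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        raise ValueError("no reads cover this splicing event")
    return 100.0 * inclusion_reads / total


def neural_delta_psi(neural_psis, non_neural_psis) -> float:
    """Mean PSI across neural samples minus mean PSI across non-neural samples."""
    return mean(neural_psis) - mean(non_neural_psis)


def has_increased_neural_inclusion(neural_psis, non_neural_psis,
                                   threshold: float = 15.0) -> bool:
    """True when the neural ΔPSI exceeds the threshold (15 in the text above)."""
    return neural_delta_psi(neural_psis, non_neural_psis) > threshold


# Hypothetical example: one microexon measured in two neural and two other tissues.
neural = [psi(80, 20), psi(90, 10)]       # PSI of 80 and 90
non_neural = [psi(10, 90), psi(20, 80)]   # PSI of 10 and 20
increased = has_increased_neural_inclusion(neural, non_neural)
```

Keeping the threshold as a parameter makes it straightforward to apply the other cutoffs mentioned in the text and figure legends (for example 10 or 25) to the same data.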

Figure 2. A landscape of highly conserved neural microexons

A) Difference in exon inclusion level (ΔPSI) between the average PSIs for neural samples and non-neural samples (Y-axis) for bins of increasing exon lengths (X-axis). Microexons are defined as exons with lengths of 3-27 nt. Restricting the analysis to alternative exons with a PSI range across samples of >50 showed a similar pattern (data not shown). B) Number of exons by length whose inclusion level is higher (blue), lower (red) or not different (grey) in neural compared to non-neural samples. Short exons tend to be multiples of 3 nts and have higher inclusion in neural samples. C) Percent of neural-regulated microexons (of lengths of 3-15 and 16-27 nt) and longer exons that are predicted to generate alternative ORF-preserving isoforms (black), disrupt the ORF in/outside neural tissues (dark/light grey), or overlap non-coding sequences (white). D) Higher evolutionary conservation of alternative microexons compared to longer alternative exons at the genomic, transcriptomic (i.e. whether the exon is alternatively spliced in both species), and neural-regulatory level. Y-axis shows the percent of conservation at each specific level between human and mouse. p-values correspond to two-sided proportion tests. E) Percent of alternative microexons and longer exons that are detected as neural-regulated (average absolute ΔPSI>25) in each vertebrate species. F) Alternative 3-15 and 16-27 nt microexons show higher average phastCons scores at their intronic boundaries than longer alternative and constitutive exons. See also Figure S2.

Relative to longer alternative exons, microexons, in particular those that are 3-15-nt long and neural-specifically included, are strongly enriched in multiple features indicative of functionally important AS. They are highly enriched for lengths that are multiples of three nts (Figure 2B), and a significantly larger fraction are predicted to generate alternative protein isoforms upon inclusion and exclusion, compared with longer exons (Figure 2C; p<10, proportion test). They are also significantly more often conserved at the levels of genomic sequence, detection in alternatively spliced transcripts, and neural-differential regulation (Figure 2D; Figure S2E, neural-regulated exons; p<0.001 for all pairwise comparisons, proportion tests. Similar results were obtained when comparing neural-regulated microexons and longer exons that have matching distributions of neural versus non-neural ΔPSI values (data not shown)). Of 308 neural-regulated microexons in human, 225 (73.5%) are neural-differentially spliced in mouse, compared to only 527 of 1390 (37.9%) longer neural-regulated exons. Remarkably, while microexons represent only ∼1% of all AS events, they comprise approximately one third of all neural-regulated AS events conserved between human and mouse that are predicted to generate alternative protein isoforms (Figure S2F). Moreover, of ∼150 analyzed mammalian, neural-regulated, 3-15-nt microexons, at least 55 are deeply conserved in vertebrate species spanning 400-450 million years of evolution, from zebrafish and/or shark to human (Table S3). This is in marked contrast to the generally low degree of evolutionary conservation of other types of AS across vertebrate species (Barbosa-Morais et al., 2012; Braunschweig et al., 2014; Merkin et al., 2012). 
Furthermore, comparable numbers of alternative microexons were detected in all analyzed vertebrate species, the majority of which are also strongly neural-specifically included (Figure 2E; Supplemental Information for details). Consistent with their striking regulatory conservation, sequences overlapping microexons, including both the upstream and downstream flanking intronic regions, are more highly conserved than sequences surrounding longer alternative exons (Figure 2F, S2G), including longer exons with a similar distribution of neural versus non-neural ΔPSI values (Figure S2H,I; data not shown).
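
The comparison of conservation rates reported above (225 of 308 microexons versus 527 of 1390 longer exons) can be checked with a simple two-sided test of two proportions. The Python sketch below uses a pooled z-test with a normal approximation, which is an assumption: the text only says "proportion test", and the exact variant used by the authors (for example, with or without continuity correction) is not stated here.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int):
    """Two-sided pooled z-test for the difference between two proportions.

    Returns (z, p_value). This is a normal approximation, not necessarily
    the exact test used in the paper.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    # Two-sided tail probability of a standard normal variable.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Counts taken from the text: neural-regulated exons in human that are also
# neural-differentially spliced in mouse.
z, p_value = two_proportion_z_test(225, 308, 527, 1390)
```

With these counts the difference is far beyond the p<0.001 level quoted for the pairwise comparisons.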

Dynamic regulation of microexons during neuronal differentiation

To further investigate the functional significance of neural-regulated microexons, we used RNA-Seq data to analyze their regulation across six time points of differentiation of mouse embryonic stem (ES) cells into cortical glutamatergic neurons (Figure 3). Remarkably, of 219 neural-regulated microexons with sufficient read coverage across time points, 151 (69%) displayed a PSI switch ≥ 50 between ES cells and mature neurons, and 65 (30%) a switch of ≥ 90 (Figure 3). Unsupervised hierarchical clustering of PSI changes between consecutive time points (transitions T1 to T5) revealed several temporally-distinct regulatory patterns (Figure 3A). Most microexons show sharp PSI switches at late (T3 to T5) transitions during differentiation. These stages correspond to maturing post-mitotic neurons when pan-neuronal markers are already expressed, and are subsequent to the expression of most neurogenic transcription factors (Figure S3A). This pattern of late activation (Figure S3B) suggests enrichment for important functions for microexons in terminal neurogenesis (Figure 1C). Despite the small number of genes representing clusters of kinetically-distinct sets of regulated microexons, each cluster revealed significant enrichment of specific GO terms including “regulation of GTPase activity” (Cluster I), “glutamate receptor binding” and “actin cytoskeleton organization” (Cluster V) (Table S4). These observations indicate that the dynamic switch-like regulation of microexons is intimately associated with the maturation of neurons.
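
As an illustration of the quantities analyzed here, the following Python sketch derives the five transition values (T1 to T5) from PSIs at six time points and the overall PSI switch between the first and last time point. The PSI values are hypothetical and the function names are not taken from the paper's pipeline.

```python
def transition_delta_psis(psis):
    """ΔPSI for each pair of consecutive time points (later minus earlier)."""
    return [later - earlier for earlier, later in zip(psis, psis[1:])]


def overall_switch(psis):
    """PSI change between the first (ES cells) and last (mature neurons) time point."""
    return psis[-1] - psis[0]


# Hypothetical microexon that is activated late in differentiation.
psis = [2, 3, 5, 40, 85, 95]                # six time points
transitions = transition_delta_psis(psis)   # five transitions, T1 to T5
is_strong_switch = overall_switch(psis) >= 90
```

In this made-up profile the largest changes fall in the T3 and T4 transitions, the kind of late, switch-like pattern described above.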

Figure 3. Switch-like regulation of microexons during neuronal differentiation

A) Heatmap of PSI changes (ΔPSIs) between time points during differentiation of ESCs to glutamatergic neurons in vitro (Hubbard et al., 2013). Yellow/pink indicate increased/decreased PSI at a given transition (T1 to T5). Unsupervised clustering detects eight clusters of exons based on their dynamic PSI regulation (clusters I-VIII, legend). Right, top: scheme of the neuronal differentiation assay time points of sample collection, and analyzed transitions. Right, bottom: PSIs for each microexon (grey lines) in five selected clusters; red lines show the median for the cluster at each time point. B) Representative RT-PCR assays monitoring AS patterns of microexons during neuronal differentiation in Ap1s2 (9 nt), Mef2d (21 nt), Apbb1 (6 nt), Ap1b1 (21 nt), Enah (12 nt) and Shank2 (9 and 21 nt). See also Figure S3.

The neural-specific splicing factor nSR100/SRRM4 regulates most neural microexons

Among several analyzed splicing regulators (Extended Experimental Procedures), knockdown and overexpression of nSR100 had the strongest effect on microexon regulation, with more than half of the profiled microexons displaying a pronounced change in inclusion level compared to controls (Figure 4A and S4A-S4H). Moreover, an analysis of RNA-Seq data from different neural cell types (Zhang et al., 2014) revealed that nSR100 has the strongest neuronal-specific expression relative to the other splicing regulators (Figure S4I, and data not shown), which is also consistent with its immunohistochemical detection in neurons but not glia (Calarco et al., 2009). Recently, we have shown that nSR100 promotes the inclusion of a subset of (longer) neural exons via binding to intronic UGC motifs proximal to suboptimal 3′ splice sites (Raj et al., 2014). Consistent with these results, and supporting a direct role for nSR100 in microexon regulation, RNA sequence tags cross-linked to nSR100 in vivo are also highly enriched in intronic sequences containing UGC motifs, located adjacent to the 3′ splice sites of nSR100-regulated microexons (Figure 4B, C; p<0.0001 for all comparisons, Wilcoxon Rank Sum test). Relative to longer exons, we additionally observe that neural-regulated microexons are associated with weak 3′ splice sites and strong 5′ splice sites (Figure S4J). nSR100 thus has a direct and extensive role in the regulation of the neural microexon program.
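
The position of UGC motifs relative to the 3′ splice site can be illustrated with a short Python sketch. It assumes the input is the intronic sequence immediately upstream of the exon, written 5′ to 3′ and ending at the 3′ splice site, and it reports the distance to the UGC closest to that splice site within a 200-nt window. Treating the closest motif as the "first" one is an interpretation, and the function name and example sequences are hypothetical.

```python
def nearest_ugc_distance(upstream_intron: str, window: int = 200):
    """Distance (nt) between the 3' splice site and the closest upstream UGC.

    The sequence must end at the 3' splice site. DNA input is accepted and
    converted to RNA. Returns None when the window contains no UGC.
    """
    rna = upstream_intron.upper().replace("T", "U")
    rna = rna[-window:]                 # keep only the last `window` nucleotides
    index = rna.rfind("UGC")            # rightmost motif is closest to the exon
    if index == -1:
        return None
    return len(rna) - (index + 3)       # nucleotides between motif end and exon


# Hypothetical intron ends: a motif 2 nt upstream, and a window without a motif.
close_motif = nearest_ugc_distance("CCUUUCUGCAG")
no_motif = nearest_ugc_distance("AAAAAAAAAA")
```

Collecting this distance for each exon class would give the kind of per-class distributions that a cumulative plot compares.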

Figure 4. nSR100 is a positive, direct regulator of most microexons

A) Percent of neural-regulated exons within each length class that is affected by nSR100 expression in human 293T kidney cells (absolute ΔPSI > 15 [orange] or absolute ΔPSI > 25 [red]). p-values correspond to two-sided proportion tests of affected vs. non-affected events. B) Average normalized density of nSR100 cross-linked sites in 200 nt windows encompassing neural-regulated exons of different length classes. FPB, Fragments Per Billion. C) Cumulative distribution plots indicating the position of the first UGC motif within 200 nts upstream of neural-regulated microexons and longer exons, as well as non-neural and constitutive exons. p<0.0001 for all comparisons against microexons, Wilcoxon Rank Sum test. See also Figure S4.

Distinct protein regulatory properties of microexons

Neural-regulated microexons, in particular those that are 3-15-nt long, possess multiple properties that distinguish them from longer neural-regulated exons (Figures 5 and S5). A significantly smaller fraction overlap predicted disordered amino acid residues (Figures 5A and S5A-D; p < 1.3×10; 3-way Fisher Exact tests), whereas a significantly higher fraction overlap modular protein domains (Figures 5B and S5E; ∼2-fold increase, p=1.0×10, proportion test). In contrast, microexon residues overlapping protein domains are significantly more often surface-accessible and enriched in charged residues (Figures 5C,D, and S5F-I; p<10 for all comparisons, proportion test) than are residues overlapping longer neural or non-neural exons. Moreover, when not overlapping protein domains, microexons are significantly more often located immediately adjacent (i.e. within 5 amino acids) to folded protein domains (Figures 5E and S5J,K). These results suggest that a common function of microexons may be to modulate the activity of overlapping or adjacent protein domains. Supporting this view, among 49 available and de novo-modeled tertiary protein structures containing microexons, the corresponding residues are largely surface accessible and unlikely to significantly affect the folding of the overlapping or adjacent protein domains (Figure S6A; Extended Experimental Procedures).

Figure 5. Microexons possess distinct protein-coding features

For each analysis, values are shown for neural-regulated, 3-15 nt microexons and longer (>27 nt) exons, as well as non-neural AS exons (see Figure S5 for other types of exons). A) Percent of exons with a high average (>0.67), mid-range (0.33 to 0.67) and low disorder rate (<0.33). B) Fraction of amino acids (AA) that overlap a PFAM protein domain. C) Percent of AA within PFAM domains predicted to be on the protein surface. D) Percent of AA types based on their properties; p-values correspond to the comparison of charged (acid and basic) versus uncharged (polar and apolar) AAs. E) Percent of exons that are adjacent to a domain (within 0-5 (black) or 6-10 AAs (grey)); p-values correspond to the comparison of exons within 0-5 AAs. F) Percent of residues overlapping PFAM domains involved in linear motif or lipid binding. G) Percent of residues overlapping binding motifs predicted by ANCHOR. H) Percent of exons with proteins identified as belonging to one or more protein complexes (data from Havugimana et al., 2012). All p-values correspond to proportion tests except for A (3-way Fisher test) and C (Wilcoxon Rank Sum test). See also Figure S5.

Microexons modulate the function of interaction domains

Neural-regulated microexons are significantly enriched in domains that function in peptide and lipid-binding interactions (Figures 5F and S5L; p = 1.7×10, proportion test). Overall, genes with microexons are highly enriched in modular domains involved in cellular signaling, such as SH3 and PH domains (Figure S5M). Conversely, unlike longer neural exons (Buljan et al., 2012; Ellis et al., 2012), they are depleted of linear binding motifs (Figure 5G and S5N, p<0.005, proportion tests for all comparisons). Moreover, proteins containing microexons are significantly more often central in protein-protein interaction networks and detected in protein complexes compared to proteins with other types of alternative exons (Figures 5H and S5O,P, p≤0.004 for all comparisons, Wilcoxon Rank Sum test). Taken together with the data in Figure 1, these results suggest that microexons may often regulate interaction domains to facilitate the remodeling of protein interaction networks associated with signaling and other aspects of neuronal maturation and function.

To test this hypothesis, we employed luminescence-based mammalian interactome mapping (LUMIER; (Barrios-Rodiles et al., 2005; Ellis et al., 2012)) and co-immunoprecipitation-western blot assays to investigate whether the insertion of a highly conserved, neural-regulated 6-nt microexon in the nuclear adaptor Apbb1 affects its known interactions with the histone acetyltransferase Kat5/Tip60, and amyloid precursor protein App (Figure 6A-D). Previous genetic and functional studies have revealed multiple functions for the Apbb1-Kat5 complex (Cao and Sudhoff, 2001; Stante et al., 2009), and that the loss of Kat5 activity is associated with developmental defects that impact learning and memory (Pirooznia et al., 2012; Wang et al., 2004; Wang et al., 2009) (see Discussion). Apbb1 contains two phosphotyrosine binding domains, PTB1 and PTB2, which bind Kat5 and App, respectively (Cao and Sudhoff, 2001). Exemplifying the distinct protein features of neural microexons described above (Figure 5), the Apbb1 microexon adds two charged residues (Arg and Glu) to the PTB1 domain near its predicted interaction surface (Figures 6A,B; Extended Experimental Procedures). LUMIER and co-immunoprecipitation-western analysis reveals that inclusion of the microexon significantly enhances the interaction with Kat5, whereas there is little to no effect on the interaction with App (Figures 6C,D and Figure S6B,C). Substitution of both microexon residues with alanine also enhanced the Kat5 interaction, although to a lesser extent than the presence of Arg and Glu (Figure 6C). This suggests that the primary function of this microexon is to extend the interface with which Apbb1 binds its partner proteins.

Figure 6. Microexons regulate protein-protein interactions

A) Structural alignment of APBB1-PTB1 (pink) and APBB1-PTB2 (cyan) domains. Residues located at the protein-binding interface of APBB1-PTB2 are shown in blue. Inset shows the microexon residues in APBB1-PTB1 (E462-R463). B) Upon superimposition of APBB1-PTB1 (pink) and APBB1-PTB2 (cyan) domains, the microexon (magenta) is located close to the APBB1-PTB2 binding partner (APP protein fragment, blue), suggesting the microexon in PTB1 may affect protein binding. C) Quantification of LUMIER-normalized luciferase intensity ratio (NLIR) values for RL-tagged Apbb1, with or without the microexon, or with a mutated version consisting of two Alanine substitutions (ALA-mic.), co-immunoprecipitated with 3Flag-tagged Kat5. D, E) 293T cells were transfected with HA-tagged Apbb1 (D) or AP1S2 (E) constructs, with or without the respective microexon, together with 3Flag-tagged Kat5 (D) or AP1B1 (E), as indicated. Immunoprecipitation was performed with anti-Flag (D) or anti-HA (E) antibody, and the immunoprecipitates were blotted with anti-HA or anti-Flag antibody, as indicated. Results shown in (E) were confirmed in a biological replicate experiment (Figure S6D). p-values in C and D correspond to t-tests for four and three replicates, respectively; error bars indicate standard error. Asterisk in panel E indicates a band corresponding to the light chain of the HA antibody.

We also examined the function of a 9-nt microexon in the AP1S2 subunit of the adaptor-related protein complex 1 (AP1). The AP1 complex functions in the intracellular transport of cargo proteins between the trans-Golgi apparatus and endosomes by linking clathrin to the cargo proteins during vesicle membrane formation (Kirchhausen, 2000), and is important for the somatodendritic transport of proteins required for neuronal polarity (Farias et al., 2012). Interestingly, mutations in AP1S2 have been previously implicated in phenotypic features associated with ASD and X-linked mental retardation (Borck et al., 2008; Tarpey et al., 2006). Co-immunoprecipitation-western analyses reveal that the microexon in AP1S2 strongly promotes its interaction with another AP1 subunit, AP1B1 (Figure 6E and S6D). This observation thus provides additional evidence supporting an important role for microexons in the control of protein interactions that function in neurons.

Microexons are misregulated in individuals with Autism Spectrum Disorder

Global features of neural-regulated AS

An RNA-Seq analysis pipeline was developed to detect and quantify all AS event classes involving all hypothetically possible splice junctions formed by the usage of annotated and unannotated splice sites, including those that demarcate microexons. By applying this pipeline to more than 50 diverse cell and tissue types each from human and mouse (Table S1), we identified ∼2,500 neural-regulated AS events in each species (Figure 1A and Table S2; Extended Experimental Procedures).

Figure 1. An extensive program of neural-regulated AS

A) Distribution by type of human AS events with increased/decreased neural inclusion of the alternative sequence. Alt3/5, alternative splice site acceptor/donor selection; IR, intron retention; Microexons, 3-27 nt exons; Single/Multi AltEx, single/multiple cassette exons. B) Predicted impact of non-neural and neural-regulated AS events on proteomes. Neural-regulated events are more often predicted to generate isoforms preserving open reading frame (ORF) when the alternative sequence is included and excluded (“ORF-preserving isoforms”, black), than to disrupt ORFs (i.e. the exon leads to a frame shift and/or introduces a premature termination codon) specifically in neural samples (“ORF disruption in brain”, dark grey) or in non-neural samples (“ORF preservation in brain”, light grey). See Extended Experimental Procedures for details. C) Enrichment map for GO and KEGG categories in genes with neural-regulated AS that are predicted to generate alternative protein isoforms (top), and representative GO terms and their associated enrichment p-value for each subnetwork (bottom). The node size is proportional to the number of genes associated with the GO category, and the width of the edges to the number of genes shared between GO categories.

Nearly half of the neural-regulated AS events, including alternative retained introns, are predicted to generate protein isoforms both when the alternative sequence is included and skipped. In contrast, only ∼20% of AS events not subject to neural regulation (hereafter ‘non-neural’ events) have the potential to generate alternative protein isoforms (Figure 1B; p=2.7×10, proportion test). Gene Ontology (GO) analysis shows that genes with neural-regulated AS events predicted to generate alternative protein isoforms form highly interconnected networks based on functions associated with neuronal biology, signaling pathways, structural components of the cytoskeleton and the plasma membrane (Figure 1C). Consistent with previous results (Fagnani et al., 2007; Pan et al., 2004), there is little overlap (8.5%) between genes with neural-regulated AS and mRNA expression, although these subsets of genes are highly enriched in overlapping GO terms (40% in common; Figure S1). These data reveal the largest program of neural-regulated AS events defined to date, and that this program is associated with a broader range of functional processes and pathways linked to nervous system biology than previously detected (Boutz et al., 2007; Fagnani et al., 2007; Ule et al., 2005).

Highly conserved microexons are frequently neural specific

Further analysis of the neural-regulated AS program revealed a striking inverse relationship between the length of an alternative exon and its propensity to be specifically included in neural tissues. Increased neural-specific inclusion was detected for the majority of microexons (length ≤ 27 nt, Figure 2A); 60.7% of alternative microexons show increased neural ‘percent spliced in’ (PSI) (ΔPSI>15) versus 9.5% of longer (average ∼135 nt) alternative exons (p=1.9×10, proportion test). This trend extends to microexons as short as 3 nt. RT-PCR validation experiments confirmed the RNA-Seq-detected regulatory profiles and inclusion levels of all (10/10) microexons analyzed across ten diverse tissues (R = 0.92, n=107; Figure S2A). To further investigate the cell and tissue type specificity of microexon regulation, we used RNA-Seq data (Sofueva et al., 2013; Zhang et al., 2014; Zhang et al., 2013) to compare their inclusion levels in major glial cell types (astrocytes, microglia and oligodendrocytes), isolated neurons, and in muscle cells and tissues. While up to ∼20% of the detected neural-regulated microexons showed increased PSIs in one or more glial cell types, and/or in muscle, compared to other non-neural tissues, the vast majority (>90%) of neural-regulated microexons display highest PSIs in neurons compared to all other cell and tissue types analyzed (Figure S2B-D, and Supplemental Information). These results indicate that tissue-regulated microexons are predominately neuronal-specific.
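The neural versus non-neural comparison described above can be illustrated with a minimal sketch. It assumes PSI values on a 0-100 scale and reuses the thresholds quoted in the text (microexons of 3-27 nt, ΔPSI>15). The function names and the example PSI values are hypothetical and are not taken from the authors' pipeline.

```python
from statistics import mean


def delta_psi(neural_psis, non_neural_psis):
    """Mean PSI in neural samples minus mean PSI in non-neural samples."""
    return mean(neural_psis) - mean(non_neural_psis)


def classify_exon(length_nt, neural_psis, non_neural_psis, threshold=15.0):
    """Label an exon by size class and by direction of neural regulation.

    The 3-27 nt microexon definition and the default threshold follow the
    text; everything else is an illustrative assumption.
    """
    if 3 <= length_nt <= 27:
        size_class = "microexon"
    elif length_nt > 27:
        size_class = "longer exon"
    else:
        size_class = "unclassified"

    d = delta_psi(neural_psis, non_neural_psis)
    if d > threshold:
        regulation = "increased neural inclusion"
    elif d < -threshold:
        regulation = "decreased neural inclusion"
    else:
        regulation = "not neural-regulated"
    return size_class, regulation, d


# Made-up PSI values for a hypothetical 9-nt exon.
result = classify_exon(9, [85, 90, 95], [5, 10, 15])
```

Averaging PSIs within each tissue group before taking the difference mirrors the "difference between the average PSIs for neural samples and non-neural samples" used for Figure 2A.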

Figure 2. A landscape of highly conserved neural microexons

A) Difference in exon inclusion level (ΔPSI) between the average PSIs for neural samples and non-neural samples (Y-axis) for bins of increasing exon lengths (X-axis). Microexons are defined as exons with lengths of 3-27 nt. Restricting the analysis to alternative exons with a PSI range across samples of >50 showed a similar pattern (data not shown). B) Number of exons by length whose inclusion level is higher (blue), lower (red) or not different (grey) in neural compared to non-neural samples. Short exons tend to have lengths that are multiples of 3 nt and to have higher inclusion in neural samples. C) Percent of neural-regulated microexons (of lengths of 3-15 and 16-27 nt) and longer exons that are predicted to generate alternative ORF-preserving isoforms (black), disrupt the ORF in/outside neural tissues (dark/light grey), or overlap non-coding sequences (white). D) Higher evolutionary conservation of alternative microexons compared to longer alternative exons at the genomic, transcriptomic (i.e. whether the exon is alternatively spliced in both species), and neural-regulatory level. Y-axis shows the percent of conservation at each specific level between human and mouse. p-values correspond to two-sided proportion tests. E) Percent of alternative microexons and longer exons that are detected as neural-regulated (average absolute ΔPSI>25) in each vertebrate species. F) Alternative 3-15 and 16-27 nt microexons show higher average phastCons scores at their intronic boundaries than longer alternative and constitutive exons. See also Figure S2.

Relative to longer alternative exons, microexons, in particular those that are 3-15-nt long and neural-specifically included, are strongly enriched in multiple features indicative of functionally important AS. They are highly enriched for lengths that are multiples of three nt (Figure 2B), and a significantly larger fraction are predicted to generate alternative protein isoforms upon inclusion and exclusion, compared with longer exons (Figure 2C; p<10, proportion test). They are also significantly more often conserved at the levels of genomic sequence, detection in alternatively spliced transcripts, and neural-differential regulation (Figure 2D; Figure S2E, neural-regulated exons; p<0.001 for all pairwise comparisons, proportion tests). Similar results were obtained when comparing neural-regulated microexons and longer exons that have matching distributions of neural versus non-neural ΔPSI values (data not shown). Of 308 neural-regulated microexons in human, 225 (73.5%) are neural-differentially spliced in mouse, compared to only 527 of 1390 (37.9%) longer neural-regulated exons. Remarkably, while microexons represent only ∼1% of all AS events, they comprise approximately one third of all neural-regulated AS events conserved between human and mouse that are predicted to generate alternative protein isoforms (Figure S2F). Moreover, of ∼150 analyzed mammalian, neural-regulated, 3-15-nt microexons, at least 55 are deeply conserved in vertebrate species spanning 400-450 million years of evolution, from zebrafish and/or shark to human (Table S3). This is in marked contrast to the generally low degree of evolutionary conservation of other types of AS across vertebrate species (Barbosa-Morais et al., 2012; Braunschweig et al., 2014; Merkin et al., 2012). Furthermore, comparable numbers of alternative microexons were detected in all analyzed vertebrate species, the majority of which are also strongly neural-specifically included (Figure 2E; Supplemental Information for details). Consistent with their striking regulatory conservation, sequences overlapping microexons, including both the upstream and downstream flanking intronic regions, are more highly conserved than sequences surrounding longer alternative exons (Figure 2F, S2G), including longer exons with a similar distribution of neural versus non-neural ΔPSI values (Figure S2H,I; data not shown).

Dynamic regulation of microexons during neuronal differentiation

To further investigate the functional significance of neural-regulated microexons, we used RNA-Seq data to analyze their regulation across six time points of differentiation of mouse embryonic stem (ES) cells into cortical glutamatergic neurons (Figure 3). Remarkably, of 219 neural-regulated microexons with sufficient read coverage across time points, 151 (69%) displayed a PSI switch ≥ 50 between ES cells and mature neurons, and 65 (30%) a switch of ≥ 90 (Figure 3). Unsupervised hierarchical clustering of PSI changes between consecutive time points (transitions T1 to T5) revealed several temporally-distinct regulatory patterns (Figure 3A). Most microexons show sharp PSI switches at late (T3 to T5) transitions during differentiation. These stages correspond to maturing post-mitotic neurons when pan-neuronal markers are already expressed, and are subsequent to the expression of most neurogenic transcription factors (Figure S3A). This pattern of late activation (Figure S3B) suggests enrichment for important functions for microexons in terminal neurogenesis (Figure 1C). Despite the small number of genes representing clusters of kinetically-distinct sets of regulated microexons, each cluster revealed significant enrichment of specific GO terms including “regulation of GTPase activity” (Cluster I), “glutamate receptor binding” and “actin cytoskeleton organization” (Cluster V) (Table S4). These observations indicate that the dynamic switch-like regulation of microexons is intimately associated with the maturation of neurons.

Figure 3. Switch-like regulation of microexons during neuronal differentiation

A) Heatmap of PSI changes (ΔPSIs) between time points during differentiation of ESCs to glutamatergic neurons in vitro (Hubbard et al., 2013). Yellow/pink indicate increased/decreased PSI at a given transition (T1 to T5). Unsupervised clustering detects eight clusters of exons based on their dynamic PSI regulation (clusters I-VIII, legend). Right, top: scheme of the neuronal differentiation assay showing the time points of sample collection and the analyzed transitions. Right, bottom: PSIs for each microexon (grey lines) in five selected clusters; red lines show the median for the cluster at each time point. B) Representative RT-PCR assays monitoring AS patterns of microexons during neuronal differentiation in Ap1s2 (9 nt), Mef2d (21 nt), Apbb1 (6 nt), Ap1b1 (21 nt), Enah (12 nt) and Shank2 (9 and 21 nt). See also Figure S3.

The neural-specific splicing factor nSR100/SRRM4 regulates most neural microexons

Among several analyzed splicing regulators (Extended Experimental Procedures), knockdown and overexpression of nSR100 had the strongest effect on microexon regulation, with more than half of the profiled microexons displaying a pronounced change in inclusion level compared to controls (Figure 4A and S4A-S4H). Moreover, an analysis of RNA-Seq data from different neural cell types (Zhang et al., 2014) revealed that nSR100 has the strongest neuronal-specific expression relative to the other splicing regulators (Figure S4I, and data not shown), which is also consistent with its immunohistochemical detection in neurons but not glia (Calarco et al., 2009). Recently, we have shown that nSR100 promotes the inclusion of a subset of (longer) neural exons via binding to intronic UGC motifs proximal to suboptimal 3′ splice sites (Raj et al., 2014). Consistent with these results, and supporting a direct role for nSR100 in microexon regulation, RNA sequence tags cross-linked to nSR100 in vivo are also highly enriched in intronic sequences containing UGC motifs, located adjacent to the 3′ splice sites of nSR100-regulated microexons (Figure 4B, C; p<0.0001 for all comparisons, Wilcoxon Rank Sum test). Relative to longer exons, we additionally observe that neural-regulated microexons are associated with weak 3′ splice sites and strong 5′ splice sites (Figure S4J). nSR100 thus has a direct and extensive role in the regulation of the neural microexon program.
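The UGC-motif analysis summarized above (position of the nearest UGC motif within 200 nt upstream of an exon) can be sketched as follows. This is an illustrative helper only: it assumes "first" means the occurrence closest to the 3′ splice site, and the function name, argument layout and example sequences are hypothetical.

```python
def first_ugc_distance(upstream_intron, window=200, motif="UGC"):
    """Distance in nt between the exon start and the nearest upstream motif.

    upstream_intron: intronic sequence written 5'->3', whose last base lies
    immediately upstream of the exon. DNA input (T) is converted to RNA (U).
    Returns the number of nucleotides separating the end of the motif from
    the exon, or None if no motif occurs within the window.
    """
    seq = upstream_intron.upper().replace("T", "U")[-window:]
    idx = seq.rfind(motif)  # rightmost hit = closest to the 3' splice site
    if idx == -1:
        return None
    return len(seq) - (idx + len(motif))


# Made-up sequences: a UGC six bases upstream of the exon, and one with no UGC.
near = first_ugc_distance("AAAUGCUUUCAG")
absent = first_ugc_distance("AAAAUUUUCAG")
```

Collecting these distances for each exon class would give the per-class distributions that a cumulative plot such as Figure 4C compares.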

Figure 4. nSR100 is a positive, direct regulator of most microexons

A) Percent of neural-regulated exons within each length class that is affected by nSR100 expression in human 293T kidney cells (absolute ΔPSI > 15 [orange] or absolute ΔPSI > 25 [red]). p-values correspond to two-sided proportion tests of affected vs. non-affected events. B) Average normalized density of nSR100 cross-linked sites in 200 nt windows encompassing neural-regulated exons of different length classes. FPB, Fragments Per Billion. C) Cumulative distribution plots indicating the position of the first UGC motif within 200 nt upstream of neural-regulated microexons and longer exons, as well as non-neural and constitutive exons. p<0.0001 for all comparisons against microexons, Wilcoxon Rank Sum test. See also Figure S4.

Distinct protein regulatory properties of microexons

Neural-regulated microexons, in particular those that are 3-15-nt long, possess multiple properties that distinguish them from longer neural-regulated exons (Figures 5 and S5). A significantly smaller fraction overlap predicted disordered amino acid residues (Figures 5A and S5A-D; p < 1.3×10; 3-way Fisher Exact tests), whereas a significantly higher fraction overlap modular protein domains (Figures 5B and S5E; ∼2-fold increase, p=1.0×10, proportion test). In contrast, microexon residues overlapping protein domains are significantly more often surface-accessible and enriched in charged residues (Figures 5C,D, and S5F-I; p<10 for all comparisons, proportion test) than are residues overlapping longer neural or non-neural exons. Moreover, when not overlapping protein domains, microexons are significantly more often located immediately adjacent (i.e. within 5 amino acids) to folded protein domains (Figures 5E and S5J,K). These results suggest that a common function of microexons may be to modulate the activity of overlapping or adjacent protein domains. Supporting this view, among 49 available and de novo-modeled tertiary protein structures containing microexons, the corresponding residues are largely surface accessible and unlikely to significantly affect the folding of the overlapping or adjacent protein domains (Figure S6A; Extended Experimental Procedures).

Figure 5. Microexons possess distinct protein-coding features

For each analysis, values are shown for neural-regulated, 3-15 nt microexons and longer (>27 nt) exons, as well as non-neural AS exons (see Figure S5 for other types of exons). A) Percent of exons with a high average (>0.67), mid-range (0.33 to 0.67) and low disorder rate (<0.33). B) Fraction of amino acids (AA) that overlap a PFAM protein domain. C) Percent of AA within PFAM domains predicted to be on the protein surface. D) Percent of AA types based on their properties; p-values correspond to the comparison of charged (acid and basic) versus uncharged (polar and apolar) AAs. E) Percent of exons that are adjacent to a domain (within 0-5 (black) or 6-10 AAs (grey)); p-values correspond to the comparison of exons within 0-5 AAs. F) Percent of residues overlapping PFAM domains involved in linear motif or lipid binding. G) Percent of residues overlapping binding motifs predicted by ANCHOR. H) Percent of exons with proteins identified as belonging to one or more protein complexes (data from Havugimana et al., 2012). All p-values correspond to proportion tests except for A (3-way Fisher test) and C (Wilcoxon Rank Sum test). See also Figure S5.

Microexons modulate the function of interaction domains

Neural-regulated microexons are significantly enriched in domains that function in peptide and lipid-binding interactions (Figures 5F and S5L; p = 1.7×10, proportion test). Overall, genes with microexons are highly enriched in modular domains involved in cellular signaling, such as SH3 and PH domains (Figure S5M). Conversely, unlike longer neural exons (Buljan et al., 2012; Ellis et al., 2012), they are depleted of linear binding motifs (Figure 5G and S5N, p<0.005, proportion tests for all comparisons). Moreover, proteins containing microexons are significantly more often central in protein-protein interaction networks and detected in protein complexes compared to proteins with other types of alternative exons (Figures 5H and S5O,P, p≤0.004 for all comparisons, Wilcoxon Rank Sum test). Taken together with the data in Figure 1, these results suggest that microexons may often regulate interaction domains to facilitate the remodeling of protein interaction networks associated with signaling and other aspects of neuronal maturation and function.

To test this hypothesis, we employed luminescence-based mammalian interactome mapping (LUMIER; (Barrios-Rodiles et al., 2005; Ellis et al., 2012)) and co-immunoprecipitation-western blot assays to investigate whether the insertion of a highly conserved, neural-regulated 6-nt microexon in the nuclear adaptor Apbb1 affects its known interactions with the histone acetyltransferase Kat5/Tip60, and amyloid precursor protein App (Figure 6A-D). Previous genetic and functional studies have revealed multiple functions for the Apbb1-Kat5 complex (Cao and Sudhoff, 2001; Stante et al., 2009), and that the loss of Kat5 activity is associated with developmental defects that impact learning and memory (Pirooznia et al., 2012; Wang et al., 2004; Wang et al., 2009) (see Discussion). Apbb1 contains two phosphotyrosine binding domains, PTB1 and PTB2, which bind Kat5 and App, respectively (Cao and Sudhoff, 2001). Exemplifying the distinct protein features of neural microexons described above (Figure 5), the Apbb1 microexon adds two charged residues (Arg and Glu) to the PTB1 domain near its predicted interaction surface (Figures 6A,B; Extended Experimental Procedures). LUMIER and co-immunoprecipitation-western analyses reveal that inclusion of the microexon significantly enhances the interaction with Kat5, whereas there is little to no effect on the interaction with App (Figures 6C,D and Figure S6B,C). Substitution of both microexon residues with alanine also enhanced the Kat5 interaction, although to a lesser extent than the presence of Arg and Glu (Figure 6C). This suggests that the primary function of this microexon is to extend the interface with which Apbb1 binds its partner proteins.

Figure 6. Microexons regulate protein-protein interactions

A) Structural alignment of APBB1-PTB1 (pink) and APBB1-PTB2 (cyan) domains. Residues located at the protein-binding interface of APBB1-PTB2 are shown in blue. Inset shows the microexon residues in APBB1-PTB1 (E462-R463). B) Upon superimposition of APBB1-PTB1 (pink) and APBB1-PTB2 (cyan) domains, the microexon (magenta) is located close to the APBB1-PTB2 binding partner (APP protein fragment, blue), suggesting the microexon in PTB1 may affect protein binding. C) Quantification of LUMIER-normalized luciferase intensity ratio (NLIR) values for RL-tagged Apbb1, with or without the microexon, or with a mutated version consisting of two alanine substitutions (ALA-mic.), co-immunoprecipitated with 3Flag-tagged Kat5. D, E) 293T cells were transfected with HA-tagged Apbb1 (D) or AP1S2 (E) constructs, with or without the respective microexon, together with 3Flag-tagged Kat5 (D) or AP1B1 (E), as indicated. Immunoprecipitation was performed with anti-Flag (D) or anti-HA (E) antibody, and the immunoprecipitates were blotted with anti-HA or anti-Flag antibody, as indicated. Results shown in (E) were confirmed in a biological replicate experiment (Figure S6D). p-values in C and D correspond to t-tests for four and three replicates, respectively; error bars indicate standard error. Asterisk in panel E indicates a band corresponding to the light chain of the HA antibody.

We also examined the function of a 9-nt microexon in the AP1S2 subunit of the adaptor-related protein complex 1 (AP1). The AP1 complex functions in the intracellular transport of cargo proteins between the trans-Golgi apparatus and endosomes by linking clathrin to the cargo proteins during vesicle membrane formation (Kirchhausen, 2000), and is important for the somatodendritic transport of proteins required for neuronal polarity (Farias et al., 2012). Interestingly, mutations in AP1S2 have been previously implicated in phenotypic features associated with ASD and X-linked mental retardation (Borck et al., 2008; Tarpey et al., 2006). Co-immunoprecipitation-western analyses reveal that the microexon in AP1S2 strongly promotes its interaction with another AP1 subunit, AP1B1 (Figure 6E and S6D). This observation thus provides additional evidence supporting an important role for microexons in the control of protein interactions that function in neurons.

Microexons are misregulated in individuals with Autism Spectrum Disorder

The properties of microexons described above suggest that their misregulation could be associated with neurological disorders. To investigate this possibility, we analyzed RNA-Seq data from the superior temporal gyrus (Brodmann areas ba41/42/22) from post-mortem samples from individuals with ASD and control subjects, matched for age, gender and other variables (Experimental Procedures). These samples were stratified based on the strength of an ASD-associated gene expression signature (Voineagu et al., 2011), and subsets of 12 ASD samples with the strongest ASD-associated differential gene expression signatures and 12 controls were selected for further analysis. Remarkably, within these samples, 126 of 504 (30%) detected alternative microexons display a mean ΔPSI > 10 between ASD and control subjects (Figure 7A), of which 113 (90%) also display neural-differential regulation. By contrast, only 825 of 15,405 (5.4%) longer (i.e. >27 nt) exons show such misregulation (Figure 7A), of which 285 (35%) correspond to neural-regulated exons. Significant enrichment for misregulation among microexons compared to longer exons was also observed when restricting the analysis to neural-regulated exons, including subsets of neural-regulated microexons and longer exons with similar distributions of neural versus non-neural ΔPSI values (Figure S7A; p<2×10-, proportion test; data not shown). Similar results were observed when analyzing data from a different brain region (Brodmann area ba9) from the same individuals (data not shown). RT-PCR experiments on a representative subset of profiled tissues confirmed increased misregulation of microexons in autistic versus control brain samples (Figure S7B). Analysis of the proportions of microexons displaying coincident misregulation revealed that the vast majority (81.3%) have a ΔPSI>10 in at least half of the stratified ASD brain samples (Figure S7C). However, only 26.9% (32/119) of the genes containing misregulated microexons overlapped with the 2,519 genes with significant ASD-associated misregulation at the level of gene expression. This reveals that largely distinct subsets of genes are misregulated at the levels of expression and microexon splicing in the analyzed ASD subjects. In contrast, a comparison of autistic subjects that possessed a weaker ASD-related differential gene expression signature did not reveal significant misregulation of microexons, or of longer exons (data not shown). These data reveal frequent misregulation of microexon splicing in the brain cortices of some individuals with ASD.
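The comparison of misregulation frequencies between microexons and longer exons can be illustrated with a pooled two-proportion z-test applied to the counts quoted above (126 of 504 microexons versus 825 of 15,405 longer exons). This is a sketch under assumptions: the text says only "proportion test", so the exact procedure used by the authors (for example, a chi-square-based test with continuity correction) may differ, and the function name is hypothetical. Note also that 126/504 evaluates to 25%, so the sketch works from the printed counts rather than the quoted percentage.

```python
import math


def two_proportion_z_test(k1, n1, k2, n2):
    """Pooled two-sided z-test for the difference between two proportions."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return p1, p2, z, p_value


# Counts as printed in the text: misregulated / detected exons per class.
p_micro, p_long, z, p_value = two_proportion_z_test(126, 504, 825, 15405)
```

With these counts the difference between the two classes lies many standard errors from zero, consistent with the strong enrichment reported for microexons.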

Figure 7. Microexons are often misregulated in ASD

A) Percent of alternative exons of each length class that are misregulated in ASD (absolute ΔPSI>10 between PSI-averaged ASD and control groups) in ba41/42/22 brain regions. Dark shading, lower inclusion in ASD; light shading, higher inclusion in ASD; p-values correspond to proportion tests. B) Expression of nSR100 across the 12 control and 12 ASD individuals. Adjusted Fragments Per Kilobase Of Exon Per Million Fragments Mapped (FPKMs) were calculated using a regression analysis that accounts for variation derived from differences in RNA integrity, brain sample batch, sequencing depth, and 5' - 3' bias in measurements of gene-level FPKM values. C) Percent of exons within each length class misregulated in autistic compared to control brains (average absolute ΔPSI>10) for nSR100-regulated (ΔPSI>25 in the nSR100-overexpressing compared to control 293T cells) and non-nSR100-regulated (absolute ΔPSI<5) exons. D) Distribution of correlation coefficients between PSIs and nSR100 expression values across stratified ASD and control samples for microexons that are (n=59) or are not (n=69) regulated by nSR100. Only microexons with sufficient read coverage to derive accurate PSI quantifications in at least 9 ASD and 9 control ba41/42/22 samples were included. The p-value corresponds to a Wilcoxon Rank Sum test. E) GO categories significantly enriched in genes with microexons that are misregulated in ASD. F) A protein-protein interaction network involving genes with ASD misregulated microexons (ΔPSI > 10) in ba41/42/22 brain regions. Genes with major effect mutations, and smaller effect risk genes, are indicated in red and shaded ovals, respectively. Genes grouped by functional category are indicated. See also Figure S7.

Consistent with a widespread and important role for nSR100 in the regulation of microexons (Figure 4), nSR100 mRNA expression is, on average, significantly downregulated in the brains of the analyzed ASD versus control subjects, and to an even greater extent in brain samples with the strongest ASD-associated signature compared to the controls (∼10%, p=0.014, FDR<0.1, Figure 7B and data not shown). These differences were confirmed by qRT-PCR assays for a representative subset of individuals (p<2.8×10 for all normalizations; two-sided t-test; Figure S7D). Moreover, relative to other exons, nSR100-dependent microexons are significantly more often misregulated in brain tissues from ASD compared to control subjects (Figure 7C; p<0.01 for all comparisons, proportion test). Notably, we also observe significantly higher correlations between microexon inclusion and nSR100 mRNA expression levels across the stratified ASD samples and controls for microexons regulated by nSR100 than for microexons that are not regulated by this factor (Figure 7D; p=1.4×10, Wilcoxon Rank Sum test).
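The per-microexon correlation between inclusion level and nSR100 expression described above can be sketched as below. The text does not state which correlation coefficient was used, so Pearson's r is an assumption here; the function name and the example values are hypothetical and serve only to show the calculation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Made-up values: nSR100 expression (adjusted FPKM) and the PSI of one
# microexon measured in the same six samples.
nsr100_fpkm = [4.1, 3.8, 3.5, 3.0, 2.7, 2.4]
microexon_psi = [82.0, 80.0, 71.0, 65.0, 58.0, 50.0]
r = pearson_r(nsr100_fpkm, microexon_psi)
```

Computing one coefficient per microexon and then comparing the two resulting sets (nSR100-regulated versus not regulated) with a rank-based test corresponds to the comparison summarized in Figure 7D.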

A GO analysis of genes with ASD-associated misregulation of microexons reveals significant enrichment of terms related to axonogenesis and synapse biology (Figure 7E), processes that have been previously implicated in autism (Gilman et al., 2011; Parikshak et al., 2013; Voineagu et al., 2011). Many of the corresponding genes act in common pathways and/or physically interact through protein-protein interactions (Figure 7F). Moreover, misregulated microexons are also significantly enriched in genes that have been genetically linked to ASD (p < 0.0005, Fisher exact test), including many relatively well-established examples such as DNTA, ANK2, ROBO1, SHANK2 and AP1S2. Other genes with misregulated microexons have been linked to learning or intellectual disability (e.g. APBB1, TRAPPC9, RAB3GAP1). In this regard, it is interesting to note that the microexons we have analyzed in APBB1 and AP1S2 are significantly misregulated in the brain samples from ASD subjects (p<0.05, Wilcoxon Rank Sum test; Figure S7E). Taken together with data in Figures 5 and 6, the results suggest that the misregulation of microexons, as well as of longer alternative exons (Corominas et al., 2014; Voineagu et al., 2011), may impact protein interaction networks that are required for normal neuronal development and synaptic function. Disruption of microexon-regulated protein interaction networks is therefore a potentially important mechanism underlying ASD and likely other neurodevelopmental disorders.

Discussion

In this study, we show that alternative microexons display the highest degrees of genomic sequence conservation, tissue-specific regulatory conservation, and frame-preservation potential, relative to all other classes of AS detected to date in vertebrate species. Unlike longer neural-regulated exons, neural microexons are significantly enriched in surface-accessible, charged amino acids that overlap or lie in close proximity to protein domains, including those that bind linear motifs. Together with their remarkably dynamic regulation, these observations suggest that microexons contribute important and complementary roles to longer neural exons in the remodeling of protein interaction networks that operate during neuronal maturation.

Most microexons display high inclusion at late stages of neuronal differentiation in genes (e.g. Src (Black, 1991), Bin1, Agrn, Dock9, Shank2, Robo1) associated with axonogenesis and the formation and function of synapses. Supporting such functions, an alternative microexon overlapping the SH3A domain of Intersectin 1 (Itsn1) has been reported to promote an interaction with Dynamin 1, and was proposed to modulate roles of Itsn1 in endocytosis, cell signaling and/or actin-cytoskeleton dynamics (Dergai et al., 2010). A neural-specific microexon in Protrudin/Zfyve27 was recently shown to increase its interaction with the vesicle-associated membrane protein-associated protein (VAP), and to promote neurite outgrowth (Ohnishi et al., 2014). Similarly, in the present study, we show that a 6 nt neural microexon in Apbb1/Fe65 promotes an interaction with Kat5/Tip60. Apbb1 is an adapter protein that functions in neurite outgrowth (Cheung et al., 2014; Ikin et al., 2007) and synaptic plasticity (Sabo et al., 2003), processes that have been linked to neurological disorders including ASD (Hussman et al., 2011). Consistent with these findings, we have previously shown that nSR100 promotes neurite outgrowth (Calarco et al., 2009). In the present study we further demonstrate that it controls the switch-like regulation of most neural microexons, and that its reduced expression is linked to the altered splicing of microexons in the brains of subjects with ASD.

Many of the conserved, neural-regulated microexons identified in this study are misregulated in ASD individuals, including the microexon in AP1S2 that strongly promotes an interaction with the AP1B1 subunit of the AP1 intracellular transport complex. Intriguingly, several other genes containing microexons are genetically linked to ASD, intellectual disability and/or functions in memory and learning (see Results). Another link to ASD is the observation that nSR100 is strongly co-expressed in the developing human brain in a gene network module, M2, which is enriched for rare de novo ASD-associated mutations (Parikshak et al., 2013). Furthermore, additional genes containing microexons may have as yet undiscovered roles in ASD and/or other neuropsychiatric disorders. For example, the microexon in APBB1 is also significantly misregulated in brain tissues from ASD subjects (Figure S7B,E). It is possible that the misregulation of microexons, at least in part through altered expression of nSR100, perturbs protein interaction networks required for proper neuronal maturation and function, thus contributing to ASD as well as other neurodevelopmental disorders. Consistent with this view, recent reports have begun to link individual microexons with neurodevelopmental disorders, including ASD (Zhu et al., 2014), schizophrenia (Ovadia and Shifman, 2011) and epilepsy (Rusconi et al., 2014). The discovery and characterization of widespread, neural-regulated microexons in the present study thus enables a systematic investigation of new and highly conserved mechanisms controlling protein interaction networks associated with vertebrate nervous system development and neurological disorders.

Experimental Procedures

RNA-Seq data and genomes

Unless stated otherwise, RNA-Seq data were generated from poly(A) RNA (Table S1). Analyses used the following genome releases: Homo sapiens, hg19; Mus musculus, mm9; Gallus gallus, galGal3; Xenopus tropicalis, xenTro3; Danio rerio, danRer7; Callorhinchus milii, v1.0.

Alternative splicing analysis pipeline

A multi-module analysis pipeline was developed that uses RNA-Seq, EST and cDNA data, as well as gene annotations and evolutionary conservation, to assemble libraries of exon-exon junctions (EEJs) for subsequent read alignment to detect and quantify AS events in RNA-Seq data. For cassette exons, three complementary modules were developed for assembling EEJs: (i) a “transcript-based module”, employing Cufflinks (Trapnell et al., 2010) and alignments of ESTs and cDNAs with genomic sequence (Khare et al., 2012); (ii) a “splice site-based module”, which joins all hypothetically possible EEJ combinations from annotated and de novo splice sites (Han et al., 2013); and (iii) a “microexon module”, which includes de novo searching for pairs of donor and acceptor splice sites in intronic sequence. Alt3 or Alt5 events were quantified based on the fraction of reads supporting the usage of each alternative splice site. Intron retention was analyzed as recently described (Braunschweig et al., 2014). See Extended Experimental Procedures for additional details.
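As an illustration only, the two ideas above (a de novo search for acceptor/donor pairs in intronic sequence, and read-fraction quantification of alternative splice sites) can be sketched in a few lines of Python. This is not the pipeline used in the study: the function names are hypothetical, only canonical AG acceptor and GT donor dinucleotides are considered, and no splice-site scoring, conservation filter, or read alignment is modeled. The 3-27 nt length window follows the microexon definition used in Table S2.

```python
from typing import Dict, List, Tuple


def find_microexon_candidates(
    intron: str, min_len: int = 3, max_len: int = 27
) -> List[Tuple[int, int, str]]:
    """Return (start, end, sequence) for candidate microexons in an intron.

    Assumption (not from the study): a candidate is any stretch of
    min_len..max_len nt immediately preceded by an 'AG' acceptor
    dinucleotide and immediately followed by a 'GT' donor dinucleotide.
    Coordinates are 0-based and end-exclusive.
    """
    seq = intron.upper()
    candidates = []
    for start in range(2, len(seq)):
        if seq[start - 2:start] != "AG":
            continue  # no acceptor dinucleotide directly upstream
        for length in range(min_len, max_len + 1):
            end = start + length
            if end + 2 > len(seq):
                break  # no room left for a donor dinucleotide
            if seq[end:end + 2] == "GT":
                candidates.append((start, end, seq[start:end]))
    return candidates


def splice_site_usage(read_counts: Dict[str, int]) -> Dict[str, float]:
    """Fraction of reads supporting each alternative splice site."""
    total = sum(read_counts.values())
    if total == 0:
        # No supporting reads: usage is undefined for every site.
        return {site: float("nan") for site in read_counts}
    return {site: n / total for site, n in read_counts.items()}


if __name__ == "__main__":
    # Toy intron with a single 6-nt candidate flanked by AG and GT.
    print(find_microexon_candidates("CCCAGAAACCCGTCCC"))
    # Toy read counts for two competing acceptor sites (made-up numbers).
    print(splice_site_usage({"proximal": 30, "distal": 10}))
```

In practice, candidates found this way would still need to be supported by reads mapping to the corresponding exon-exon junctions before any inclusion level is reported.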

LUMIER assay

HEK-293T cells were transiently transfected using Polyfect (Qiagen) with Renilla Luciferase (RL)-tagged Apbb1, with or without inclusion of the microexon, or with a version consisting of two alanine substitutions, together with 3Flag-tagged Kat5. Subsequent steps were performed essentially as described previously (Ellis et al., 2012).

Immunoprecipitation and immunoblotting

HEK-293T cells were transiently transfected using Lipofectamine 2000 (Life Technologies). Cells were lysed in 0.5% TNTE. After pre-clearing with protein G-Sepharose, lysates were incubated with anti-Flag M2 antibody (Sigma) or anti-Hemagglutinin (HA)-antibody (Roche) bound to Protein-G Dynabeads (Life Technologies) for 2 hours at 4°C. Immunoprecipitates were washed 5 times with 0.1% TNTE, subjected to SDS-PAGE, transferred onto nitrocellulose and immunoblotted with the anti-Hemagglutinin (HA)-antibody (Roche) or anti-Flag M2 antibody (Sigma). Detection was achieved using horseradish peroxidase-conjugated rabbit anti-rat (Sigma) or sheep anti-mouse secondary antibodies (GE Healthcare) and chemiluminescence. ImageJ was used for quantification of band intensities.

Analysis of microexon regulation

Available RNA-Seq data from splicing factor-deficient or -overexpressing systems were used to identify misregulated exons and microexons (see Extended Experimental Procedures). To investigate regulation by nSR100, we used PAR-iCLIP data and motif enrichment analyses, as recently described (Raj et al., 2014).

Comparison of ASD and control brain samples

We analyzed 24 autistic individuals and 24 controls matched by age and gender. Samples from superior temporal gyrus (Brodmann areas ba41/42/22) were dissected retaining grey matter from all cortical layers, and RNA was isolated using the miRNeasy kit (Qiagen). Ribosomal RNA was depleted from 2 µg total RNA with the Ribo-Zero Gold kit (Epicentre), and then size-selected with AMPure XP beads (Beckman Coulter). An average of 64 million 50 bp paired-end reads were generated for each sample (Table S1). The 12 samples with the strongest ASD-associated differential gene expression signature and 12 control samples with a signal that is closest to the median of all controls were selected for downstream analyses (see Extended Experimental Procedures for details). Sample selection was independent of any information on splicing changes.
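The selection step described above can be sketched as follows. This is a minimal illustration under a strong assumption that is not stated in the text: that each sample's ASD-associated gene expression signature has already been summarized as a single score, with higher values indicating a stronger signature. The function name, the score dictionaries, and the toy values are hypothetical; the actual scoring procedure is described in the Extended Experimental Procedures.

```python
from statistics import median
from typing import Dict, List, Tuple


def select_samples(
    asd_scores: Dict[str, float],
    control_scores: Dict[str, float],
    n: int = 12,
) -> Tuple[List[str], List[str]]:
    """Pick the n strongest ASD samples and the n most typical controls.

    Assumption: scores are scalar summaries of the ASD-associated
    expression signature (higher = stronger signature).
    """
    # ASD samples ranked by decreasing signature strength.
    strongest_asd = sorted(asd_scores, key=asd_scores.get, reverse=True)[:n]

    # Controls ranked by distance to the median control score.
    ctrl_median = median(control_scores.values())
    typical_controls = sorted(
        control_scores, key=lambda s: abs(control_scores[s] - ctrl_median)
    )[:n]
    return strongest_asd, typical_controls


if __name__ == "__main__":
    # Made-up scores for a toy example with n=2.
    asd = {"A1": 0.9, "A2": 0.2, "A3": 0.5}
    ctrl = {"C1": 0.0, "C2": 0.1, "C3": 0.4}
    print(select_samples(asd, ctrl, n=2))
```

Because the ranking uses only expression-derived scores, a selection of this kind remains independent of the splicing measurements analyzed afterwards.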

Supplementary Material

Figure S1 – Relationship between neural regulation at the alternative splicing (AS) and gene expression (GE) levels. Related to Figure 1. A) Overlap between differentially regulated genes at each level of regulation (GE and AS). Only 8.5% of the genes undergoing neural-regulated AS also display neural regulation at the GE level. B) Overlap of significantly enriched GO terms (Benjamini corrected p-value < 0.01) for genes that are significantly differentially upregulated at the mRNA steady-state levels in neural samples (“GE upregulated”) and genes that harbor AS events that are differentially regulated in neural vs. non-neural samples and are predicted to generate alternative ORF-preserving isoforms (“Alt. protein isoforms”). Over 40% of the GO categories enriched among the genes with neural-regulated AS are shared with those of genes upregulated at the GE level in neural tissues. p-values correspond to hypergeometric tests.

Table S2 – Neural-regulated AS events in human and mouse. Related to Figure 1. Full coordinate: chromosome: C1 donor, AS exon, C2 acceptor. Strand is “+” if C1 donor coordinate is smaller than C2 acceptor coordinate, and “-” otherwise. Alt3/Alt5, alternative splice site acceptor/donor selection; IR, intron retention; AltEx, cassette alternative exons (including microexons when length ≤ 27 nt).

Table S3 – Highly conserved neural-regulated microexons across vertebrate species. Related to Figure 2. For each species in which the orthologous microexon can be identified, the table shows the average PSI in neural ('av_NEURAL') and non-neural ('av_REST') samples, as well as the sequence of the microexon (A), and of the constitutive upstream (C1) and downstream (C2) exons.

Table S4 – Gene Ontology analyses for clusters of microexons based on their PSI dynamics during neuronal differentiation. Related to Figure 3.

Table S5 – Features of human alternative microexons. Related to Figure 5. In “Basic Protein Information”, EEJ=# in start and stop positions represents insertion point of exon that could not be mapped to an Ensembl protein (i.e. position of the exon-exon junction between C1 and C2 exons). In “Protein features (various)”, the annotation with “NA-START-END” represents the annotation of C1 and C2 exons combined (i.e. with the A exon removed). Empty cell indicates that the A exon could be mapped, but no specific annotation for the feature was present. In “Disorder Predictions”, predictions are provided for Disopred2 and IUPred. In “Protein Domain Information”, the UniProt annotation starts with the assigned position within the canonical UniProt protein (either length of A exon or where exon is inserted) and afterwards is comma separated with assignments from UniProt Annotation. In addition to UniProt and Pfam annotations, de novo Pfam and PROSITE predictions are provided for C1, A and C2 exons. Columns “Dist_5” and “Dist_3” indicate the distance to the closest domain upstream and downstream, respectively (OV, overlapping a domain). References: Pfam (Finn et al., 2014); PhosphoSite (Hornbeck et al., 2004); Transmembrane predictions, TMHMM (Krogh et al., 2001); ELM (Dinkel et al., 2014); SignalP (Petersen et al., 2011); COIL (Lupas et al., 1991); Disorder, IUPred (Dosztányi et al., 2005), Disopred2 (Ward et al., 2004); CAST (Promponas et al., 2000); ANCHOR (Dosztányi et al., 2009); UniProt (UniProt Consortium, 2014).

Table S6 – Inclusion levels for human microexons across 52 cell and tissue types. Related to Figure 1. For each sample, two columns are provided: percent inclusion (PSI) and coverage (in order of increasing read coverage, N<VLOW<LOW<OK<SOK; VLOW corresponds to the “minimum coverage” criteria specified in Extended Experimental Procedures).

Figure S2 – Impact on protein and evolutionary conservation of neural-regulated exons. Related to Figure 2. A) Representative RT-PCR assays monitoring AS patterns of microexons in Vav2, Rapgef6, Itsn1, Rims2, Abi1, Ptprd, Nbea, Zmynd8, Ppfia2 and Dnm2 (non-neural) in mouse neural (hippocampus, cerebellum and spinal cord), muscle-related (heart and skeletal muscle) and other (stomach, liver, spleen, kidney and testis) tissues. Molecular weight markers are indicated. B) For each sample, proportion of neural-regulated microexons that show inclusion levels similar to neural (blue) or non-neural (red) samples (see Extended Experimental Procedures for details). C) PSI distributions for neural-regulated microexons with increased neural inclusion for different classes of cell and tissue types. For clarity, outliers are not shown. D) Heatmap of PSI changes (ΔPSIs) between time points during differentiation of C2C12 myoblasts to myotubes in vitro (Trapnell et al., 2010). Yellow/pink indicate increased/decreased PSI at a given transition (T1 to T3). Unsupervised clustering detects a cluster of 17 microexons with increased PSI during differentiation, particularly at T1. Right inset: PSIs for each microexon (grey lines) in the highlighted cluster; red line shows the median PSI at each time point. E) Higher evolutionary conservation of human neural 3-15-nt (dark blue) and 16-27-nt (lighter blue) microexons compared to longer neural exons (light blue) at the genomic, transcriptomic and neural regulatory level. Y-axis shows the percent of conservation between human and mouse. p-values correspond to proportion tests. F) Contribution of each type of AS to events with conserved neural regulation between human and mouse, according to their predicted impact on proteomes. Microexons comprise approximately one third of all conserved neural-regulated events predicted to generate alternative protein isoforms. G) Distributions of average phastCons scores for exonic sequences of alternative microexons and long exons, as well as constitutive exons. H) Distributions of average phastCons scores for exonic sequences of neural-regulated microexons and long exons, as well as non-neural alternative exons and constitutive exons. p-values for G and H correspond to Wilcoxon Rank Sum tests. I) Average phastCons scores for neighboring intronic sequences of neural-regulated microexons and longer exons, as well as non-neural alternative exons and constitutive exons. Only exons conserved at the genomic level between human and mouse were used for this analysis.

Figure S3 – Switch-like regulation of microexons during neuronal differentiation. Related to Figure 3. A) Heatmap showing relative gene expression levels for key ESC and neural markers, including proneural genes (Neurog2 to Pax6) and post-mitotic neuronal markers (Elavl3/HuC and Rbfox3/NeuN). B) Distribution of relative ΔPSI (ΔPSI divided by the PSI range across the six time points) for neural microexons and longer exons at each transition.

Figure S4 – Regulation of neural-regulated exons and microexons by splicing factors. Related to Figure 4. Percent of neural-regulated exons within each length class that is affected at 15 < |ΔPSI| < 25 (orange) and |ΔPSI| > 25 (red) by (A) RBFOX1 knock down in human neural precursor cells; (B) MBNL1 and MBNL2 double knock down in human HeLa cells; (C) ESRP1 knock down in human PNT2 cells; (D) nSR100/Srrm4 knock down in mouse N2A cells; (E) Ptbp1 knock down in mouse N2A cells; (F) Ptbp2 knock out in mouse cortex (P1 stage); (G) Ptbp2 knock out in mouse embryonic brain (18.5 days post conception); and (H) Rbfox1 knock out. p-values correspond to two-sided proportion tests of regulated vs non-regulated events. I) Expression of nSR100 in different isolated brain cell types (Zhang et al., 2014). J) Box plots comparing the 3′ and 5′ splice site strengths of neural 3-15 nt (dark blue) and 16-27 nt (light blue) microexons, longer (>27 nt, cyan) exons, non-neural alternative exons (grey), and constitutive exons (black).

Figure S5 – Protein features of different exon classes. Related to Figure 5. For each analysis, values are shown for neural 3-15 nt (dark blue) and 16-27 nt (light blue) microexons and longer (>27 nt, cyan) exons, as well as non-neural AS exons (grey) and constitutive exons (black). A, B) Percent of exons with high (average disorder rate >0.67), mid (between 0.33 and 0.67) and low (<0.33) disorder calculated using Disopred2 (A) or IUPred (B); p-values correspond to 3-way Fisher tests. C) Average disorder rate calculated using Disopred2 for each group of exons, as well as their neighboring upstream (C1, left) and downstream (C2, right) exons. D) Distribution of disorder rate across exon groups, calculated by IUPred. E) Percent of residues that overlap a PFAM protein domain. p-values correspond to proportion tests. F) Percent of AA within PFAM domains predicted to be on the protein surface using NetSurfP; p-values correspond to Wilcoxon Rank Sum tests. G) Accessible surface area score, based on the subset of exons with available crystal structures in PDB; p-values correspond to Wilcoxon Rank Sum tests. H) Percent of AA groups based on their properties; p-values correspond to proportion tests for the comparison of charged (acid and basic) versus uncharged (polar and apolar) AAs. I) Significantly enriched (Glu, Lys, Arg) or depleted (Pro, Thr) AAs in microexons compared to other exon types. Asterisks correspond to different levels of statistical significance (*, p<0.05; **, p<0.01; ***, p<0.001) in a proportion test. J) Percent of exons that fall near PFAM protein domains, without overlap. Black, within 0-5 AAs; grey, within 6-10 AAs. p-values correspond to proportion tests for exons within 0-5 AAs of a domain. K) Cumulative distance of exons that do not overlap domains with the nearest protein domain. Exons in proteins with no predicted PFAM domain are excluded. L) Percent of residues overlapping PFAM domains involved in linear motif or lipid binding (Extended Experimental Procedures); p-values correspond to proportion tests. M) PFAM protein domains enriched in genes containing microexons. Color scheme: red, protein binding GO (GO:0005515); dark pink, specific protein binding GO terms; orange, other binding GO terms; black, no associated GO terms. Y-axis corresponds to p-values from DAVID. N) Percent of residues overlapping ANCHOR binding motifs; p-values correspond to proportion tests. O) Degree (number of interactors in PPI networks) of proteins containing different types of exons. Degree values obtained from Ellis et al. (2012). p-values correspond to Wilcoxon Rank Sum tests. P) Percent of exons in which the containing proteins have been identified as part of protein complexes (data from Havugimana et al., 2012); p-values correspond to proportion tests.

Figure S6 – Location of microexons in protein structures. Related to Figure 6. A) Selection of available protein structures from PDB and SWISS-MODEL, and modeled structures using Phyre2 containing neural-regulated microexons (in red). The number of residues of each microexon is indicated in parentheses. B, D) 293T cells were transfected with HA-tagged Apbb1 (B) or AP1S2 (D) constructs, with or without the microexon, together with 3Flag-tagged App (B) or AP1B1 (D), as indicated. Immunoprecipitation was performed with anti-Flag antibody or anti-HA antibody, as indicated. C) Quantification of LUMIER-normalized luciferase intensity ratio (NLIR) values for RL-tagged Apbb1, with or without the microexon, co-immunoprecipitated with 3Flag-tagged App. p-values in B and C correspond to t-tests for three replicates, respectively; error bars indicate standard error.

Figure S7 – Microexons are often misregulated in ASD. Related to Figure 7. A) Percent of neural-regulated exons by length groups that are misregulated in ASD (|ΔPSI| > 10 between averaged ASD and control groups) in ba41/42/22 brain region. p-values correspond to proportion tests. B) Representative RT-PCRs for microexons misregulated in ba41/42/22 and ba9 regions from ASD versus control individuals (see Table S1). Bottom: boxplot of isoform quantifications from RT-PCR assays for 10 microexons in control (n=70 data points) and ASD (n=80 data points) individuals. p-value from Wilcoxon Rank Sum test. C) Heatmap and unsupervised clustering of z-scores of PSIs for microexons misregulated in ASD individuals with sufficient read coverage in at least 9 ASD and 9 control ba41/42/22 samples (n=64), and of nSR100 expression values. Conditions: ASD (red), control (green). Asterisks indicate individual samples used in RT-PCR and qRT-PCR analyses (panels B and D). D) qRT-PCR quantifications of nSR100 expression in four ASD and three control ba41/42/22 samples (see panel C) normalized for three different housekeeping genes. p-values correspond to two-sided t-tests. E) PSI distributions of the 6-nt and 9-nt microexons in APBB1 and AP1S2, respectively, in control (green) and ASD (red) individuals; p-values from Wilcoxon Rank Sum test.

Table S1 – RNA-Seq samples used in this study. Related to Figures 1–3 and 7.

1

Figure S1 – Relationship between neural regulation at the alternative splicing (AS) and gene expression (GE) levels. Related toFigure 1. A) Overlap between differentially regulated genes at each level of regulation (GE and AS). Only 8.5% of the genes undergoing neural-regulated AS also display neural regulation at the GE level. B) Overlap of significantly enriched GO terms (Benjamini corrected p-value < 0.01) for genes that are significantly differentially upregulated at the mRNA steady-state levels in neural samples (“GE upregulated”) and genes that harbor AS events that are differentially regulated in neural vs. non-neural samples and are predicted to generate alternative ORF-preserving isoforms (“Alt. protein isoforms”). Over 40% of the GO categories enriched among the genes with neural-regulated AS are shared with those of genes upregulated at the GE level in neural tissues. p-values correspond to hypergeometric tests.

Click here to view.(287K, pdf)

10

Table S2 – Neural-regulated AS events in human and mouse. Related toFigure 1. Full coordinate: chromosome: C1 donor, AS exon, C2 acceptor. Strand is “+” if C1 donor coordinate is smaller than C2 acceptor coordinate, and “-” otherwise. Alt3/Alt5, alternative splice site acceptor/donor selection; IR, intron retention; AltEx, cassette alternative exons (including microexons when length ≤ 27 nt).

Click here to view.(413K, xlsx)

11

Table S3 – Highly conserved neural-regulated microexons across vertebrate species. Related toFigure 2. For each species in which the orthologous microexon can be identified, it shows the average PSI in neural ('av_NEURAL') and non-neural ('av_REST') samples, as well as the sequence of the microexon (A), and constitutive upstream (C1) and downstream (C2) exons.

Click here to view.(129K, xlsx)

12

Table S4 – Gene Ontology analyses for clusters of microexons based on their PSI dynamics during neuronal differentiation. Related to Figure 3.

Click here to view.(50K, xlsx)

13

Table S5 – Features of human alternative microexons. Related toFigure 5. In “Basic Protein Information”, EEJ=# in start and stop positions represents insertion point of exon that could not be mapped to an Ensembl protein (i.e. position of the exon-exon junction between C1 and C2 exons). In “Protein features (various)”, the annotation with “NA-START-END” represents the annotation of C1 and C2 exons combined (i.e. with the A exon removed). Empty cell indicates that the A exon could be mapped, but no specific annotation for the feature was present. In “Disorder Predictions”, predictions are provided for Disopred2 and IUPred. In “Protein Domain Information”, the UniProt annotation starts with the assigned position within the canonical UniProt protein (either length of A exon or where exon is inserted) and afterwards is comma separated with assignments from UniProt Annotation. In addition to UniProt and Pfam annotations, de novo Pfam and PROSITE predictions are provided for C1, A and C2 exons. Columns “Dist_5” and “Dist_3” indicate the distance to the closest domain upstream and downstream, respectively (OV, overlapping a domain). References: Pfam (Finn et al., 2014); PhosphoSite (Hornbeck et al., 2004); Transmembrane predictions, TMHMM (Krogh et al., 2001); ELM (Dinkel et al., 2014); SignalP (Petersen et al., 2011); COIL (Lupas et al., 1991); Disorder, IUPred (Dosztányi et al., 2005), Disopred2 (Ward et al., 2004); CAST (Promponas et al., 2000); ANCHOR (Dosztányi et al., 2009); UniProt (Consortium., 2014).

Click here to view.(480K, xlsx)

14

Table S6 – Inclusion levels for human microexons across 52 cell and tissues types. Related toFigure 1. For each sample, two columns are provided: percent inclusion (PSI) and coverage (in order of increasing read coverage, N<VLOW<LOW<OK<SOK; VLOW corresponds to the “minimum coverage” criteria specified in Extended Experimental Procedures).

Click here to view.(492K, xlsx)

2

Figure S2 – Impact on protein and evolutionary conservation of neural-regulated exons. Related toFigure 2. A) Representative RT-PCR assays monitoring AS patterns of microexons in Vav2, Rapgef6, Itsn1, Rims2, Abi1, Ptprd, Nbea, Zmynd8, Ppfia2 and Dnm2 (non-neural) in mouse neural (hippocampus, cerebellum and spinal cord), muscle-related (heart and skeletal muscle) and other (stomach, liver, spleen, kidney and testis) tissues. Molecular weight markers are indicated. B) For each sample, proportion of neural-regulated microexons that show inclusion levels similar to neural (blue) or non-neural (red) samples (see Extended Experimental Procedures for details). C) PSI distributions for neural-regulated microexons with increased neural inclusion for different classes of cell and tissue types. For clarity, outliers are not shown. D) Heatmap of PSI changes (ΔPSIs) between time points during differentiation of C2C12 myoblasts to myotubes in vitro (Trapnell et al., 2010). Yellow/pink indicate increased/decreased PSI at a given transition (T1 to T3). Unsupervised clustering detects a cluster of 17 microexons with increased PSI during differentiation, particularly at T1. Right inset: PSIs for each microexons (grey lines) in the highlighted cluster; red line shows the median PSI at each time point. E) Higher evolutionary conservation of human neural 3-15-nt (dark blue) and 16-27-nt (lighter blue) microexons compared to longer neural exons (light blue) at the genomic, transcriptomic and neural regulatory level. Y-axis shows the percent of conservation between human and mouse. p-values correspond to proportion tests. F) Contribution of each type of AS to events with conserved neural regulation between human and mouse, according to their predicted impact on proteomes. Microexons comprise approximately one third of all conserved neural-regulated events predicted to generate alternative protein isoforms. 
G) Distributions of average phastCons scores for exonic sequences of alternative microexons and long exons, as well as constitutive exons. H) Distributions of average phastCons scores for exonic sequences of neural-regulated microexons and long exons, as well as non-neural alternative exons and constitutive exons. p-values for G and H correspond to Wilcoxon Rank Sum tests. I) Average phastCons scores for neighboring intronic sequences of neural-regulated microexons and longer exons, as well as non-neural alternative exons and constitutive exons. Only exons conserved at the genomic level between human and mouse were used for this analysis.

Click here to view.(2.1M, pdf)

3

Figure S3 - Switch-like regulation of microexons during neuronal differentiation. Related toFigure 3. A) Heatmap showing relative gene expression levels for key ESC and neural markers, including proneural genes (Neurog2 to Pax6) and post-mitotic neuronal markers (Elavl3/HuC and Rbfox3/NeuN). B) Distribution of relative ΔPSI (ΔPSI divided by the PSI range across the six time points) for neural microexons and longer exons at each transition.

Click here to view.(220K, pdf)

4

Figure S4 - Regulation of neural-regulated exons and microexons by splicing factors. Related toFigure 4. Percent of neural-regulated exons within each length class that is affected at 15 < |ΔPSI| < 25 (orange) and |ΔPSI| > 25 (red) by (A) RBFOX1 knock down in human neural precursor cells; (B) MBNL1 and MBNL2 double knock down in human HeLa cells; (C) ESRP1 knock down in human PNT2 cells; (D) nSR100/Srrm4 knock down in mouse N2A cells; (E) Ptbp1 knock down in mouse N2A cells; (F) Ptbp2 knock out in mouse cortex (P1 stage); (G) Ptbp2 knock out in mouse embryonic brain (18.5 days post conception); and (H) Rbfox1 knock out. p-values correspond to two-sided proportion tests of regulated vs non-regulated events. I) Expression of nSR100 in different isolated brain cell types (Zhang et al., 2014). J) Box plots comparing the 3′ and 5′ splice site strengths of neural 3-15 nt (dark blue) and 16-27 nt (light blue) microexons, longer (>27 nt, cyan) exons, non-neural alternative exons (grey), and constitutive exons (black).


Figure S5 - Protein features of different exon classes. Related to Figure 5. For each analysis, values are shown for neural 3-15 nt (dark blue) and 16-27 nt (light blue) microexons and longer (>27 nt, cyan) exons, as well as non-neural AS exons (grey) and constitutive exons (black). A, B) Percent of exons with high (average disorder rate >0.67), mid (between 0.33 and 0.67) and low (<0.33) disorder calculated using Disopred2 (A) or IUPred (B); p-values correspond to 3-way Fisher tests. C) Average disorder rate calculated using Disopred2 for each group of exons, as well as their neighboring upstream (C1, left) and downstream (C2, right) exons. D) Distribution of disorder rate across exon groups, calculated by IUPred. E) Percent of residues that overlap a PFAM protein domain. p-values correspond to proportion tests. F) Percent of AA within PFAM domains predicted to be on the protein surface using NetSurfP; p-values correspond to Wilcoxon Rank Sum tests. G) Accessible surface area score, based on the subset of exons with available crystal structures in PDB; p-values correspond to Wilcoxon Rank Sum tests. H) Percent of AA groups based on their properties; p-values correspond to proportion tests for the comparison of charged (acid and basic) versus uncharged (polar and apolar) AAs. I) Significantly enriched (Glu, Lys, Arg) or depleted (Pro, Thr) AAs in microexons compared to other exon types. Asterisks correspond to different levels of statistical significance (*, p<0.05; **, p<0.01; ***, p<0.001) in a proportion test. J) Percent of exons that fall near PFAM protein domains, without overlap. Black, within 0-5 AAs; grey, within 6-10 AAs. p-values correspond to proportion tests for exons within 0-5 AAs of a domain. K) Cumulative distance of exons that do not overlap domains with the nearest protein domain. Exons in proteins with no predicted PFAM domain are excluded. L) Percent of residues overlapping PFAM domains involved in linear motif or lipid binding (Extended Experimental Procedures); p-values correspond to proportion tests. M) PFAM protein domains enriched in genes containing microexons. Color scheme: red, protein binding GO (GO:0005515); dark pink, specific protein binding GO terms; orange, other binding GO terms; black, no associated GO terms. Y-axis corresponds to p-values from DAVID. N) Percent of residues overlapping ANCHOR binding motifs; p-values correspond to proportion tests. O) Degree (number of interactors in PPI networks) of proteins containing different types of exons. Degree values obtained from (Ellis et al., 2012). p-values correspond to Wilcoxon Rank Sum tests. P) Percent of exons in which the containing proteins have been identified as part of protein complexes (data from (Havugimana et al., 2012)); p-values correspond to proportion tests.
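
The disorder classes used in panels A and B can be sketched as a simple binning step. This is an illustrative sketch only: the function names and input values are hypothetical, the per-exon average disorder rates are assumed to have been computed already (for example from Disopred2 or IUPred output), and treating values of exactly 0.33 or 0.67 as "mid" is an assumption, since the legend does not specify boundary handling.

```python
from collections import Counter


def disorder_class(rate):
    """Assign an exon to a disorder class from its average disorder rate."""
    if rate > 0.67:
        return "high"
    if rate < 0.33:
        return "low"
    # Assumption: boundary values 0.33 and 0.67 are counted as mid
    return "mid"


def class_percentages(rates):
    """Percent of exons in each disorder class for one group of exons."""
    if not rates:
        raise ValueError("need at least one exon")
    counts = Counter(disorder_class(r) for r in rates)
    return {c: 100.0 * counts[c] / len(rates) for c in ("high", "mid", "low")}


# Hypothetical average disorder rates for four exons
percentages = class_percentages([0.9, 0.8, 0.5, 0.1])
```

Computing these percentages separately for each exon group gives the kind of table that the 3-way Fisher tests in the legend compare.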


Figure S6 – Location of microexons in protein structures. Related to Figure 6. A) Selection of available protein structures from PDB and SWISS-MODEL, and modeled structures using Phyre2 containing neural-regulated microexons (in red). The number of residues of each microexon is indicated in parentheses. B, D) 293T cells were transfected with HA-tagged Apbb1 (B) or AP1S2 (D) constructs, with or without the microexon, together with 3Flag-tagged App (B) or AP1B1 (D), as indicated. Immunoprecipitation was performed with anti-Flag antibody or anti-HA antibody, as indicated. C) Quantification of LUMIER-normalized luciferase intensity ratio (NLIR) values for RL-tagged Apbb1, with or without the microexon, co-immunoprecipitated with 3Flag-tagged App. p-values in B and C correspond to t-tests for three replicates, respectively; error bars indicate standard error.


Figure S7 – Microexons are often misregulated in ASD. Related to Figure 7. A) Percent of neural-regulated exons by length groups that are misregulated in ASD (|ΔPSI| > 10 between averaged ASD and control groups) in the ba41/42/22 brain region. p-values correspond to proportion tests. B) Representative RT-PCRs for microexons misregulated in ba41/42/22 and ba9 regions from ASD versus control individuals (see Table S1). Bottom: boxplot of isoform quantifications from RT-PCR assays for 10 microexons in control (n=70 data points) and ASD (n=80 data points) individuals. p-value from Wilcoxon Rank Sum test. C) Heatmap and unsupervised clustering of z-scores of PSIs for microexons misregulated in ASD individuals with sufficient read coverage in at least 9 ASD and 9 control ba41/42/22 samples (n=64), and of nSR100 expression values. Conditions: ASD (red), control (green). Asterisks indicate individual samples used in RT-PCR and qRT-PCR analyses (panels B and D). D) qRT-PCR quantifications of nSR100 expression in four ASD and three control ba41/42/22 samples (see panel C) normalized for three different housekeeping genes. p-values correspond to two-sided t-tests. E) PSI distributions of the 6-nt and 9-nt microexons in APBB1 and AP1S2, respectively, in control (green) and ASD (red) individuals; p-values from Wilcoxon Rank Sum test.
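
The misregulation criterion in panel A (|ΔPSI| > 10 between averaged ASD and control groups) can be written as a short check. This is a minimal sketch under stated assumptions: the function name and PSI values are hypothetical, and the read-coverage filtering applied in the study is not modeled here.

```python
from statistics import mean


def is_misregulated(asd_psi, control_psi, threshold=10.0):
    """Apply the |dPSI| > threshold rule to group-averaged PSI values.

    asd_psi, control_psi: PSI values (0-100) for one exon, one per sample.
    Returns (call, delta), where delta = mean(ASD) - mean(control).
    """
    delta = mean(asd_psi) - mean(control_psi)
    return abs(delta) > threshold, delta


# Hypothetical PSI values for one microexon in three ASD and three control samples
call, delta = is_misregulated([40, 50, 45], [70, 75, 80])
```

In this hypothetical case the averaged PSI drops from 75 in controls to 45 in ASD samples, so the exon passes the threshold with a negative ΔPSI, that is, reduced inclusion.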


Table S1 – RNA-Seq samples used in this study. Related to Figures 1-3 and 7.


Acknowledgments

The authors thank the Eunice Kennedy Shriver NICHD Brain and Tissue Bank for Developmental Disorders, the Autism Tissue Program, and the Harvard Brain Tissue Resource Center for providing brain samples. Dax Torti and Danica Leung of the Donnelly Sequencing Centre are gratefully acknowledged for sequencing samples. The authors also thank Xinchen Wang for initial contributions to the RNA-Seq analysis pipeline, Ulrich Braunschweig for assistance with CLIP-Seq analyses, Benjamin Lang for advice on surface accessibility measurements, Nuno Barbosa-Morais for guidance on statistical testing, and Serge Gueroussov and Jonathan Roth for helpful discussions and comments on the manuscript. MI holds an LTF from the Human Frontier Science Program Organization. RJW holds a Canadian Institutes of Health Research (CIHR) Postdoctoral Fellowship. NNP holds an NIMH NRSA fellowship. MB is supported by a fellowship from the Department of Cell and Systems Biology, University of Toronto. MQV holds a Banting and Best CIHR Scholarship. TGP is supported by fellowships from EMBO and OSCI. This research was supported by grants from the CIHR (BJB, JLW, FPR, SPC), Ontario Research Fund (JLW, BJB and others), Alzheimer's Research Foundation (BJB), University of Toronto McLaughlin Centre (BJB), NIH/NHGRI (FPR, P50 HG004233), NIMH (DHG, 5R37MH060233 and 5R01MH094714), and the Simons Foundation (DHG, SFARI 206744). BJB holds the Banbury Chair of Medical Research at the University of Toronto.

Donnelly Centre, University of Toronto, 160 College St., Toronto, Ontario M5S 3E1, Canada
EMBL/CRG Research Unit in Systems Biology, Centre for Genomic Regulation (CRG), 88 Dr. Aiguader, Barcelona, 08003, Spain
MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, CB2 0QH, UK
Department of Neurology, Center for Autism Research and Treatment, Semel Institute, David Geffen School of Medicine, University of California Los Angeles, 695 Charles E. Young Dr. South, Los Angeles, CA 90095, USA
Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
Centre for Systems Biology, Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, 982 - 600 University Ave., Toronto, Ontario M5G 1X5, Canada
Department of Molecular Genetics, University of Toronto, 1 King's College Circle, Toronto, Ontario M5S 1A8, Canada
Department of Computer Science, University of Toronto, 10 King's College Road, Toronto, Ontario M5S 3G4, Canada
Correspondence to: Benjamin J. Blencowe, PhD and Manuel Irimia, PhD, Donnelly Centre, University of Toronto, Toronto, ON, Canada, Office 416-978-3016; Fax 416-946-5545, b.blencowe@utoronto.ca, mirimia@gmail.com

Summary

Alternative splicing (AS) generates vast transcriptomic and proteomic complexity. However, which of the myriad of detected AS events provide important biological functions is not well understood. Here, we define the largest program of functionally coordinated, neural-regulated AS described to date in mammals. Relative to all other types of AS within this program, 3-15 nucleotide ‘microexons’ display the most striking evolutionary conservation and switch-like regulation. These microexons modulate the function of interaction domains of proteins involved in neurogenesis. Most neural microexons are regulated by the neuronal-specific splicing factor nSR100/SRRM4, through its binding to adjacent intronic enhancer motifs. Neural microexons are frequently misregulated in the brains of individuals with autism spectrum disorder, and this misregulation is associated with reduced levels of nSR100. The results thus reveal a highly conserved program of dynamic microexon regulation associated with the remodeling of protein interaction networks during neurogenesis, the misregulation of which is linked to autism.

Footnotes

Author Contributions: M.I. developed the RNA-Seq analysis pipeline and performed analyses in Figures 1-5 and 7. R.J.W., J.E. and N.N.P. contributed equally to this study, performing analyses of microexon protein sequence features (Figure 5), protein-interaction experiments (Figure 6) and analyses of autism patient RNA-Seq data (Figure 7), respectively. T.G.-P. and M.Q.-V. performed RT-PCR assays. M.B. and J.T. analyzed and modeled protein structural data. B.R. generated RNA-Seq datasets. D.O’H. assisted with cloning and protein interaction assays. M.B.-R. optimized LUMIER assays. M.J.E.S., S.P.C., F.P.R., J.L.W. and D.H.G. supervised experiments and analyses. M.I. and B.J.B. designed the study and wrote the paper, with input from the other authors.

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
