How do co-transcriptional events affect splicing?


I want to know how, and how often, co-transcriptional splicing takes place. Can transcription regulators influence splicing during cotranscription?


Co-transcriptional splicing (CTS) is very widespread. Studies done on different cell types report different frequencies of CTS: most report a frequency of ~0.8, except for mouse liver, which was reported to have a frequency of 0.45. This is the article that summarizes these different studies.

Can transcription regulators influence splicing during cotranscription?

Yes… including factors such as nucleosome positions… this post may be helpful.


In the past 10 years, much attention has been focused on transcription preinitiation complex formation as a target for regulating gene expression, and other targets such as transcription termination complex assembly have been less intensively investigated. We established the existence of poly(A) site choice and fusion splicing of two adjacent genes, galactose-1-phosphate uridylyltransferase (GALT) and interleukin-11 receptor α-chain (IL-11Rα), in normal human cells. This 16-kilobase (kb) transcription unit contains two promoters (the first one is constitutive, and the second one, 8 kb downstream, is highly regulated) and two cleavage/polyadenylation signals separated by 12 kb. The promoter from the GALT gene yields two mRNAs, a 1.4-kb mRNA encoding GALT and a 3-kb fusion mRNA when the first poly(A) site is spliced out and the second poly(A) site is used. The 3-kb mRNA codes for a fusion protein of unknown function, containing part of the GALT protein and the entire IL-11Rα protein. The GALT promoter/IL-11Rα poly(A) transcript results from leaky termination and alternative splicing. This feature of RNA polymerase (pol) II transcription, which contrasts with efficient RNA pol I and pol III termination, may be involved, together with chromosome rearrangements, in the generation of fusion proteins with multiple domains and would have major evolutionary implications in terms of natural processes to generate novel proteins with common motifs. Our results, together with the accumulation of genomic information, will stimulate new considerations and experiments in gene expression studies.


Detection of aberrant splicing events in RNA-seq data using FRASER

Aberrant splicing is a major cause of rare diseases. However, its prediction from genome sequence alone remains in most cases inconclusive. Recently, RNA sequencing has proven to be an effective complementary avenue to detect aberrant splicing. Here, we develop FRASER, an algorithm to detect aberrant splicing from RNA sequencing data. Unlike existing methods, FRASER captures not only alternative splicing but also intron retention events. This typically doubles the number of detected aberrant events and identified a pathogenic intron retention in MCOLN1 causing mucolipidosis. FRASER automatically controls for latent confounders, which are widespread and affect sensitivity substantially. Moreover, FRASER is based on a count distribution and multiple testing correction, thus reducing the number of calls by two orders of magnitude over commonly applied z score cutoffs, with a minor loss of sensitivity. Applying FRASER to rare disease diagnostics is demonstrated by reprioritizing a pathogenic aberrant exon truncation in TAZ from a published dataset. FRASER is easy to use and freely available.
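The contrast drawn above between a count distribution and a z score cutoff can be illustrated with a small sketch. This is not FRASER's code or API. It is a minimal standard-library Python example that assumes the split-read count at a junction follows a beta-binomial distribution with known shape parameters `a` and `b`. The function names and the two-sided construction (twice the smaller tail, capped at 1) are assumptions made for illustration only.

```python
import math


def betabinom_logpmf(k, n, a, b):
    """Log-probability of k supporting reads out of n under beta-binomial(a, b)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    log_beta_num = math.lgamma(k + a) + math.lgamma(n - k + b) - math.lgamma(n + a + b)
    log_beta_den = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return log_choose + log_beta_num - log_beta_den


def betabinom_two_sided_p(k, n, a, b):
    """Two-sided p-value: twice the smaller tail probability, capped at 1.

    The tail construction is an illustrative assumption, not taken from the text.
    """
    pmf = [math.exp(betabinom_logpmf(i, n, a, b)) for i in range(n + 1)]
    lower = sum(pmf[: k + 1])  # P(X <= k)
    upper = sum(pmf[k:])       # P(X >= k)
    return min(1.0, 2.0 * min(lower, upper))


# Hypothetical junction: about 90% of reads are expected to support it (a=9, b=1).
p_typical = betabinom_two_sided_p(90, 100, 9.0, 1.0)  # count near expectation
p_extreme = betabinom_two_sided_p(2, 100, 9.0, 1.0)   # almost no supporting reads
```

Because the p-value depends on the total coverage `n`, a sample with few reads is penalized less for the same observed ratio than a deeply covered one, which a fixed z score cutoff on ratios does not capture. In a real analysis the p-values would then go through multiple testing correction.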

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. The FRASER aberrant splicing detection workflow. The workflow starts with RNA-seq aligned reads…

Fig. 2. FRASER corrects for covariations in alternative acceptor usage.

Fig. 3. Splicing outlier detection based on the beta-binomial distribution.

Fig. 4. Benchmark using artificially injected outliers for alternative acceptor usage.


Prediction and Quantification of Splice Events from RNA-Seq Data

Analysis of splice variants from short read RNA-seq data remains a challenging problem. Here we present a novel method for the genome-guided prediction and quantification of splice events from RNA-seq data, which enables the analysis of unannotated and complex splice events. Splice junctions and exons are predicted from reads mapped to a reference genome and are assembled into a genome-wide splice graph. Splice events are identified recursively from the graph and are quantified locally based on reads extending across the start or end of each splice variant. We assess prediction accuracy based on simulated and real RNA-seq data, and illustrate how different read aligners (GSNAP, HISAT2, STAR, TopHat2) affect prediction results. We validate our approach for quantification based on simulated data, and compare local estimates of relative splice variant usage with those from other methods (MISO, Cufflinks) based on simulated and real RNA-seq data. In a proof-of-concept study of splice variants in 16 normal human tissues (Illumina Body Map 2.0) we identify 249 internal exons that belong to known genes but are not related to annotated exons. Using independent RNA samples from 14 matched normal human tissues, we validate 9/9 of these exons by RT-PCR and 216/249 by paired-end RNA-seq (2 x 250 bp). These results indicate that de novo prediction of splice variants remains beneficial even in well-studied systems. An implementation of our method is freely available as an R/Bioconductor package [Formula: see text].

Conflict of interest statement

Competing Interests: The authors of this manuscript have read the journal’s policy and have the following competing interests: All authors are or have been employees of Genentech Inc. and some hold shares in Roche. RG is an employee of 23AndMe Inc. This does not alter the authors’ adherence to PLOS ONE policies on sharing data and materials.


Roles of alternative splicing in modulating transcriptional regulation

Background: The ability of a transcription factor to regulate its targets is modulated by a variety of genetic and epigenetic mechanisms. Alternative splicing can modulate gene function by adding or removing certain protein domains, and therefore affect the activity of the protein. Reverse engineering of gene regulatory networks using gene expression profiles has proven valuable in dissecting the logical relationships among multiple proteins during transcriptional regulation. However, it is unclear whether alternative splicing of certain proteins affects the activity of other transcription factors.

Results: In order to investigate the roles of alternative splicing during transcriptional regulation, we constructed a statistical model to infer whether the alternative splicing events of modulator proteins can affect the ability of key transcription factors to regulate the expression levels of their transcriptional targets. We tested our strategy in KIRC (Kidney Renal Clear Cell Carcinoma) using RNA-seq data downloaded from TCGA (The Cancer Genome Atlas). We identified 828 modulation relationships between the splicing levels of modulator proteins and the activity levels of transcription factors. For instance, we found that the activity levels of GR (glucocorticoid receptor) protein, a key transcription factor in kidney, can be influenced by the splicing status of multiple proteins, including TP53, MDM2 (mouse double minute 2 homolog), RBM14 (RNA-binding protein 14) and SLK (STE20 like kinase). The influenced GR targets are enriched in key cancer-related pathways, including the p53 signaling pathway, TR/RXR activation, CAR/RXR activation, the G1/S checkpoint regulation pathway, and the G2/M DNA damage checkpoint regulation pathway.

Conclusions: Our analysis suggests, for the first time, that exon inclusion levels of certain regulatory proteins can affect the activities of many transcription factors. Such analysis can potentially unravel a novel mechanism of how splicing variation influences the cellular function and provide important insights for how dysregulation of splicing outcome can lead to various diseases.

Keywords: Alternative splicing; GR; Kidney cancer; Linear regression; MDM2; TP53; Transcriptional regulation.


Impact of spliceosome mutations on RNA splicing in myelodysplasia: dysregulated genes/pathways and clinical associations

SF3B1, SRSF2, and U2AF1 are the most frequently mutated splicing factor genes in the myelodysplastic syndromes (MDS). We have performed a comprehensive and systematic analysis to determine the effect of these commonly mutated splicing factors on pre-mRNA splicing in the bone marrow stem/progenitor cells and in the erythroid and myeloid precursors in splicing factor mutant MDS. Using RNA-seq, we determined the aberrantly spliced genes and dysregulated pathways in CD34+ cells of 84 patients with MDS. Splicing factor mutations result in different alterations in splicing and largely affect different genes, but these converge in common dysregulated pathways and cellular processes, focused on RNA splicing, protein synthesis, and mitochondrial dysfunction, suggesting common mechanisms of action in MDS. Many of these dysregulated pathways and cellular processes can be linked to the known disease pathophysiology associated with splicing factor mutations in MDS, whereas several others have not been previously associated with MDS, such as sirtuin signaling. We identified aberrantly spliced events associated with clinical variables, and isoforms that independently predict survival in MDS and implicate dysregulation of focal adhesion and extracellular exosomes as drivers of poor survival. Aberrantly spliced genes and dysregulated pathways were identified in the MDS-affected lineages in splicing factor mutant MDS. Functional studies demonstrated that knockdown of the mitosis regulators SEPT2 and AKAP8, aberrantly spliced target genes of SF3B1 and SRSF2 mutations, respectively, led to impaired erythroid cell growth and differentiation. This study illuminates the effect of the common spliceosome mutations on the MDS phenotype and provides novel insights into disease pathophysiology.

© 2018 by The American Society of Hematology.

Conflict of interest statement

Conflict-of-interest disclosure: The authors declare no competing financial interests.


Results and Discussion

To investigate the effects of hybridization and genome doubling on transcriptome reprogramming during allopolyploidization, we reanalyzed the previously published RNA-Seq data sets of three interspecific crossing combinations in wheat and brassica lineages (Hao et al. 2017; Zhang et al. 2018) (fig. 1A, supplementary table 1 and supplementary figs. S1–S3, Supplementary Material online). For wheat combinations, two tetraploids of Triticum turgidum (AABB) were crossed with diploid Aegilops tauschii (DD) to produce triploid hybrids (ABD) whose genomes were then doubled to generate allohexaploid wheat (AABBDD) (fig. 1A) (Hao et al. 2017). Similarly, diploids Brassica rapa (ArAr) and Brassica oleracea (CoCo) were crossed to produce a hybrid (ArCo) which was used in the generation of allotetraploid brassica (ArArCoCo) (fig. 1A) (Zhang et al. 2018). To examine the respective effects of hybridization and genome doubling on gene expression, significantly differentially expressed genes (DEGs) were identified by comparing hybrids with parents (Hybrid-vs-Parents) and allopolyploids with hybrids (Allopolyploid-vs-Hybrid), by applying the criteria “expression fold change ≥2 and false discovery rate < 0.05” (supplementary tables 2–4, Supplementary Material online). We found that both hybridization and genome doubling can induce dramatic expression changes in thousands of genes (fig. 1B, supplementary fig. S4, Supplementary Material online). The combined effect of hybridization and genome doubling was further examined by comparing allopolyploids with parents (Allopolyploid-vs-Parents). Notably, for most comparisons (subgenomes A and B in wheat and subgenome Ar in brassica), far fewer DEGs were caused by the combined effect than those caused by hybridization or genome doubling alone (fig. 1B, supplementary fig. S4, Supplementary Material online). For example, for subgenome A in wheat combination 1, up to 4,759 and 4,735 DEGs were induced by hybridization and genome doubling, respectively, whereas only 1,849 DEGs were caused by their combined effect (fig. 1B). For subgenome D in wheat and subgenome Co in brassica, the number of DEGs caused by the combined effect was still far fewer than the sum of DEGs caused by these two events alone (fig. 1B, supplementary fig. S4, Supplementary Material online).

Fig. 1. Comparison of gene expression and splicing efficiency changes induced by hybridization and genome doubling during allopolyploidization. (A) Schematic representation of the synthesis of allopolyploid wheat and brassica. These hybrids and allopolyploids were produced in previous studies, which provided the data sets used in this study (Hao et al. 2017; Zhang et al. 2018). For wheat combinations, two tetraploids, Triticum turgidum (AABB) ssp. durum cv. Langdon (LDN) and ssp. turgidum accession AS2255, were crossed with diploid Aegilops tauschii ssp. tauschii accession AS60 (DD) to produce the two triploid hybrids (ABD). The resulting triploids were used to generate the allohexaploid wheat (AABBDD) through genome doubling. For the brassica combination, two diploids Brassica rapa (ArAr) and Brassica oleracea (CoCo) were crossed to produce the diploid hybrid (ArCo), which was then used to generate the allotetraploid brassica (ArArCoCo) through genome doubling. (B) The number of DEGs caused by hybridization, genome doubling, and allopolyploidization in wheat combination 1. The numbers of DEGs caused by hybridization (Hybrid-vs-Parents, blue bars), genome doubling (Allopolyploid-vs-Hybrid, blue bars), and allopolyploidization (Allopolyploid-vs-Parents, red bars) are shown for each subgenome. (C) The number of DSIs caused by hybridization, genome doubling, and allopolyploidization in wheat combination 1. The numbers of DSIs caused by hybridization (Hybrid-vs-Parents, blue bars), genome doubling (Allopolyploid-vs-Hybrid, blue bars) and allopolyploidization (Allopolyploid-vs-Parents, red bars) are shown for each subgenome. (D) Significantly negative correlations between gene expression fold changes caused by hybridization and genome doubling in wheat combination 1. All DEGs identified from Hybrid-vs-Parents, Allopolyploid-vs-Hybrid, and Allopolyploid-vs-Parents were analyzed and plotted. The PCC of each comparison is shown (P values < 2.2e-16 for all comparisons). (E) Significantly negative correlations between splicing efficiency changes caused by hybridization and genome doubling in wheat combination 1. All DSIs identified from Hybrid-vs-Parents, Allopolyploid-vs-Hybrid, and Allopolyploid-vs-Parents were analyzed and plotted. The PCC value of each comparison is shown (P values < 2.2e-16 for all the comparisons).

To examine genome-wide splicing changes during allopolyploidization, the splicing efficiency of each intron was calculated (supplementary material and supplementary fig. S5, Supplementary Material online). Significantly differentially spliced introns (DSIs) were identified by applying the criteria “splicing efficiency change ≥ 20% and false discovery rate < 0.05” (supplementary tables 5–7, Supplementary Material online) (Brooks et al. 2011). We found that both hybridization and genome doubling events can induce genome-wide splicing efficiency changes (fig. 1C, supplementary fig. S6, Supplementary Material online). Intriguingly, for most comparisons (subgenomes A and B in wheat and subgenome Ar in brassica), there were fewer DSIs caused by the combined effect of hybridization and genome doubling compared with those caused by either event alone (fig. 1C, supplementary fig. S6, Supplementary Material online). For example, for subgenome A in wheat combination 1, 803 and 604 DSIs were caused by hybridization and genome doubling, respectively, whereas only 423 DSIs were identified due to their combined effect (fig. 1C). For subgenome D in wheat and subgenome Co in brassica, the DSIs caused by the combined effect were also far fewer than the total DSIs caused by these two events alone (fig. 1C, supplementary fig. S6, Supplementary Material online). Collectively, these results indicated that the combined effect of hybridization and genome doubling on gene expression and splicing is far less than the simple sum of their individual effects, which suggests an interplay between hybridization and genome doubling during allopolyploidization.
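The DSI criteria quoted above (splicing efficiency change ≥ 20% and false discovery rate < 0.05) amount to a two-condition filter. The sketch below is one possible reading, not the authors' pipeline. It assumes that the 20% threshold is an absolute difference in splicing efficiency on a 0 to 1 scale, that per-intron p-values are already available, and that the false discovery rate is controlled with the Benjamini-Hochberg procedure. The function names and the example values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def call_dsis(introns, min_delta=0.20, max_fdr=0.05):
    """Select introns passing both thresholds.

    introns: list of (intron_id, efficiency_a, efficiency_b, pvalue) tuples,
    with efficiencies on a 0-1 scale (an assumption of this sketch).
    """
    fdr = benjamini_hochberg([p for _, _, _, p in introns])
    return [
        intron_id
        for (intron_id, eff_a, eff_b, _), q in zip(introns, fdr)
        if abs(eff_b - eff_a) >= min_delta and q < max_fdr
    ]


# Made-up values for illustration only.
example = [
    ("intron_1", 0.90, 0.60, 0.001),  # large change, significant
    ("intron_2", 0.80, 0.75, 0.001),  # significant but change too small
    ("intron_3", 0.50, 0.85, 0.20),   # large change, not significant
    ("intron_4", 0.30, 0.70, 0.004),  # large change, significant
]
selected = call_dsis(example)
```

The same filter shape applies to the DEG criteria stated earlier, with the effect-size condition replaced by a fold change of at least 2.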

The relationship between hybridization and genome doubling was further examined by comparing gene expression and splicing changes induced by these two events (fig. 1D and E, supplementary figs. S7–S10, Supplementary Material online). Significantly negative correlations were observed between the expression fold changes induced by these two events for all the subgenomes in all three cross combinations (fig. 1D, supplementary figs. S7 and S8, Supplementary Material online). For example, the Pearson correlation coefficient (PCC) is −0.52 to −0.77 for different subgenomes of wheat combination 1 (fig. 1D). Furthermore, significantly negative correlations were also observed between the splicing efficiency changes caused by hybridization and genome doubling, as seen in wheat combination 1, which had a PCC of −0.65 to −0.72 (fig. 1E, supplementary figs. S9 and S10, Supplementary Material online). Taken together, these results indicate that genome doubling can cause global opposite effects compared with hybridization at both gene expression and splicing levels.
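To make the correlation analysis concrete, the following sketch computes a Pearson correlation coefficient between two sets of changes for the same genes. It is a generic standard-library implementation, not the authors' code. The variable names and the log2 fold change values are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical log2 fold changes for five genes (not data from the study).
hybridization_lfc = [1.5, -2.0, 1.2, -1.1, 2.3]   # Hybrid-vs-Parents
doubling_lfc = [-1.2, 1.6, -0.4, 1.3, -1.9]       # Allopolyploid-vs-Hybrid
pcc = pearson(hybridization_lfc, doubling_lfc)
```

A negative coefficient here means that genes pushed up by hybridization tend to be pushed down by genome doubling and vice versa, which is the pattern the paragraph describes.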

To determine how much hybridization-induced transcriptome reprogramming in hybrids can be recovered in allopolyploids after genome doubling, DEGs/DSIs identified in the hybrids were further classified into four groups (Supplementary Material online): DEGs/DSIs in hybrids being recovered to parental levels in allopolyploids (Group 1); DEGs/DSIs in hybrids being retained in allopolyploids (Group 2); DEGs/DSIs in hybrids being reinforced by genome doubling (Group 3); and others (Group 4) (fig. 2A–D). Most (89–96%) of the DEGs and DSIs can be classified into Groups 1–3 (fig. 2A–D). Notably, for A and B subgenomes in both wheat combinations, most hybridization-induced gene expression and splicing efficiency changes in hybrids can be recovered to parental levels after genome doubling (Group 1, 61–76% for DEGs and 54–75% for DSIs), whereas fewer were retained in allopolyploids (Group 2, 15–29% for DEGs and 20–35% for DSIs) (fig. 2A–D). For subgenome D in wheat, relatively more DEGs/DSIs in hybrids were retained in allopolyploids, but still, ∼32% and ∼39% of the DEGs and DSIs in hybrids were recovered to parental levels by genome doubling, respectively (fig. 2A–D). Likewise, in the brassica combination, more hybridization-induced DEGs (45–62%) and DSIs (49–63%) were recovered to parental levels compared with those which retained their hybridization-induced changes (31–42% for DEGs, 27–41% for DSIs) in the allotetraploid after genome doubling (fig. 2A–D). We also found that the DEGs and DSIs being recovered after genome doubling were distributed among all the chromosomes with no apparent preference, which indicated that this “recovery effect” is common among all the chromosomes (supplementary figs. S11 and S12, Supplementary Material online). In addition, most of the DEGs (51–70%) or DSIs (60–69%) caused by genome doubling were also found to be due to reversion to parental levels (supplementary fig. S13, Supplementary Material online). In conclusion, hybridization-induced transcriptome reprogramming in hybrids can be globally recovered in allopolyploids after genome doubling.

Fig. 2. Classification of DEGs and DSIs identified in hybrids when compared with their parents. (A) Classification of DEGs identified in the hybrids compared with their parents. These DEGs were classified into four groups: altered expression in hybrids was recovered to parental levels by genome doubling (Group 1); altered expression in hybrids remained in allopolyploids after genome doubling (Group 2); hybridization-induced expression changes were reinforced by genome doubling (Group 3, genome doubling caused changes in the same direction as that caused by hybridization); and others (Group 4). (B) Classification of DSIs identified in hybrids compared with their parents. All DSIs were classified into four groups: altered splicing efficiency in hybrids recovered to parental levels by genome doubling (Group 1); altered splicing efficiency in hybrids remained in allopolyploids after genome doubling (Group 2); hybridization-induced splicing changes were reinforced by genome doubling (Group 3, genome doubling caused change in the same direction as that caused by hybridization); and others (Group 4). (C) The expression profiles of DEGs in Groups 1–3 from the classification analysis in (A). DEGs of the same group from all subgenomes were plotted together and each gene is represented by a single line. The numbers of DEGs in each subgenome are indicated. The upregulated and downregulated genes in hybrids are represented by red and blue lines, respectively. (D) The splicing efficiency profiles of DSIs in Groups 1–3 from the classification analysis in (B). DSIs of the same group from all subgenomes were plotted together and each intron is represented by a single line. The numbers of DSIs in each subgenome are indicated. The introns with increased splicing efficiency and decreased splicing efficiency in hybrids are represented by red and blue lines, respectively.

Having observed the recovery of hybridization-induced gene expression changes by genome doubling, we next attempted to determine the relative contribution of hybridization and genome doubling to the final transcriptome reprogramming in allopolyploids. The DEGs and DSIs identified from comparisons between allopolyploids and their parents were further grouped into three clusters according to the relative contribution of hybridization and genome doubling (Supplementary Material online): DEGs/DSIs mainly caused by hybridization (Cluster 1); DEGs/DSIs mainly caused by genome doubling (Cluster 2); and DEGs/DSIs significantly contributed to by both events (Cluster 3) (fig. 3A and B, supplementary figs. S14 and S15, Supplementary Material online). About 34–55% of DEGs and 31–51% of DSIs identified in allopolyploids were found to be mainly caused by hybridization, and comparable amounts of DEGs (21–45%) and DSIs (23–43%) were found to be mainly caused by genome doubling (fig. 3A and B, supplementary figs. S14 and S15, Supplementary Material online). Relatively fewer DEGs and DSIs were contributed to by both events with the same trend (fig. 3A and B, supplementary figs. S14 and S15, Supplementary Material online). These results suggested that both hybridization and genome doubling substantially and comparably contributed to transcriptome reprogramming in allopolyploids. In addition, the relative contribution of hybridization and genome doubling varied among different subgenomes and species. For example, in wheat combination 1, more DEGs were contributed to by hybridization in subgenomes A and B, whereas more DEGs were contributed to by genome doubling in subgenome D (fig. 3A).

Fig. 3. Classification of DEGs and DSIs identified in allopolyploids when compared with their parents. (A) Classification of DEGs identified in allopolyploids compared with their parents in wheat combination 1. All DEGs can be classified into three clusters: expression changes mainly caused by hybridization (Cluster 1); expression changes mainly caused by genome doubling (Cluster 2); and expression changes significantly contributed to by both hybridization and genome doubling with the same trend (Cluster 3). Allopolyploidization: expression fold changes between allopolyploids and parents. Hybridization: expression fold changes between hybrids and parents. Genome doubling: expression fold changes between allopolyploids and hybrids. The expression fold changes are shown in heatmaps. Upregulated and downregulated genes are colored in red and blue, respectively. The color intensity reflects the magnitude of fold change. (B) Classification of DSIs identified in allopolyploids compared with their parents in wheat combination 1. All DSIs can be classified into three clusters: splicing efficiency changes mainly caused by hybridization (Cluster 1); splicing efficiency changes mainly caused by genome doubling (Cluster 2); and splicing efficiency changes significantly contributed to by both hybridization and genome doubling with the same trend (Cluster 3). The splicing efficiency changes are shown in heatmaps. Introns with increased and decreased splicing efficiency are colored in red and blue, respectively. The color intensity reflects the magnitude of splicing efficiency change.

It has long been considered that heterosis in interspecific hybrids can be permanently fixed through genome doubling to form allopolyploids (Comai 2005; Chen 2010, 2013). The fixation of heterosis can confer advantages to allopolyploids in adaptive evolution (Comai 2005; Chen 2010). However, little is known about how much heterosis in hybrids can be fixed in allopolyploids, or exactly what role genome doubling plays in the fixation of heterosis. We found that most of the transcriptome reprogramming which occurred in hybrids cannot be fixed in allopolyploids after genome doubling (fig. 2A–D). As transcriptome reprogramming is an important contributor to heterosis (Chen 2010, 2013), our results suggest that most of the heterosis resulting from transcriptome reprogramming in interspecific hybrids cannot be fixed in allopolyploids due to the “recovery effect” of genome doubling. Heterosis also arises from other factors, such as combination of different alleles (Chen 2013). Theoretically, heterotic phenotypes caused by other factors will not be affected by this “recovery effect.” It is also possible that a small number of DEGs were due to homoeologous exchange, although it rarely occurred (Zhang et al. 2020). However, the distribution of DEGs/DSIs among all the chromosomes indicated that the impact of possible homoeologous exchange is very slight even if it occurred (supplementary figs. S11 and S12, Supplementary Material online).

Hybridization is both a common mechanism in plant speciation and one of the most important applications of genetics in crop breeding (Abbott et al. 2013; Huang et al. 2016). Recent hybridization events were found in nearly half of the world’s crops and wild species during their evolution (Abbott et al. 2013; Alix et al. 2017). Recent studies demonstrated that hybridization can induce dramatic transcriptome reprogramming which can lead to phenotypic variations and serve as an important source of heterosis in hybrids (Chen 2013; Yoo et al. 2013). Transcriptome reprogramming in hybrids has typically been considered to be caused by the merger of two genomes or subsequent “genome shock” (Chen 2013). Our results suggested that a large proportion of hybridization-induced transcriptome reprogramming in interspecific hybrids (Group 1 in fig. 2A and B) was not attributed to genome merger, as it recovered to parental levels in allopolyploids possessing merged genomes. This proportion of transcriptome reprogramming in hybrids was probably due to other factors, such as a reduction of homologous chromosome sets in hybrids, as both parents and allopolyploids have two copies of homologous chromosome sets but hybrids only have one. Thus, our findings provide new insights into the mechanism underlying heterosis and hybrid speciation. In addition, several previous studies have demonstrated that hybridization-induced DNA methylation alterations can also be recovered in allopolyploids after genome doubling (Beaulieu et al. 2009; Hegarty et al. 2011; Xu et al. 2012; Qi et al. 2018). Together with our findings at gene transcriptional and splicing levels, this “recovery effect” of genome doubling implies a novel gene expression regulatory mechanism which is worth further investigation.


How do transcriptional R loops induce genome instability?

Transcriptional R-loop formation in eukaryotes is highly correlated with DNA recombination and/or impairment of genome stability, indicating an inherent impact of R looping on the integrity of the genome. Although relatively little is known about the molecular mechanism(s) by which transcriptional R loops influence genome stability, several possible scenarios can be envisioned.

It has long been known that ssDNA is more vulnerable to mutations than dsDNA (e.g., Lindahl and Nyberg 1974). Thus, in one model, extensive R-loop formation may make certain transcribed regions of the genome more susceptible to DNA-damaging agents simply by increasing the frequency of single-stranded regions (Fig. 4A). This idea is consistent with early observations that transcription synergistically increases the mutagenic effects of DNA-damaging agents in bacteria (e.g., Brock 1971; Herman and Dworkin 1971). More recently, Garcia-Rubio et al. (2003) provided evidence that transcription can also synergistically increase 4-nitroquinoline-N-oxide- or methyl methanesulfonate-induced recombination in yeast. However, it is unclear whether effects such as these are in fact due to R-loop formation. It will be intriguing in the future to determine, for example, whether mutagenesis is enhanced in topA mutants or decreased by RNase H overexpression.

Potential models of transcriptional R-loop-mediated DNA damage. (A) R loops make the genome more susceptible to DNA-damaging agents by increasing the frequency of single-stranded regions. (B) AID-mediated DNA cytosine deamination generates uracils on the nontemplate DNA strand within R-loop regions. These are processed, for example, by the base excision repair components UNG and APE, which generate a DNA nick in the R-loop region. (C) G-quartets form on the nontemplate DNA strand, and may both contribute to R-loop formation and generate susceptible sites for specific nucleases. (D) The collision between the DNA replication apparatus and transcription elongation complex leads to the formation of DNA lesions. RNAP II is indicated.

In a second scenario, one or more protein factors that specifically recognize R-loop structures might be involved in initiating the generation of mutagenic/recombinogenic DNA lesions. One candidate is AID, which was first identified as a factor specifically expressed in germinal center B cells upon activation (Muramatsu et al. 1999). Later studies have demonstrated that AID is indeed a key player necessary for CSR (Muramatsu et al. 2000; Okazaki et al. 2002; Longerich et al. 2006). Two models, the RNA-editing model and the DNA-deamination model, have been suggested to explain AID action in CSR. Because of the high amino acid sequence homology between AID and apolipoprotein B mRNA-editing catalytic polypeptide 1 (APOBEC-1), AID was initially proposed to be an RNA-editing enzyme that modifies an mRNA encoding an unknown endonuclease, which then specifically targets the S regions (Muramatsu et al. 1999, 2000). While this model has not been ruled out, later studies have provided increasing evidence supporting a DNA-based function for AID in CSR. That is, AID directly deaminates cytidine to generate uracil on the DNA, and the resulting U:G mismatch is then processed by the base excision repair components, such as uracil DNA glycosylase (UNG) and apurinic/apyrimidinic endonuclease (APE), to generate a DNA nick in the switch region (Fig. 4B; Di Noia and Neuberger 2002; Petersen-Mahrt et al. 2002; Rada et al. 2002). The nicks are then further converted into DSBs by mismatch repair proteins to initiate CSR (Stavnezer and Schrader 2006). In this view, it is especially noteworthy that AID prefers to deaminate cytidine in ssDNA (Chaudhuri et al. 2003; Dickerson et al. 2003), and transcription enhances AID-mediated cytidine deamination on the nontemplate strand (Ramiro et al. 2003; Sohail et al. 2003).
These studies support a model by which the ssDNA regions resulting from transcriptional R looping form targets for AID, and this confers upon AID the specificity to introduce DNA breaks in the S region, which initiate the process of CSR.

An interesting question is whether AID or a related enzyme might also be responsible for initiating the DNA DSBs observed following ASF/SF2 depletion. In keeping with its role in CSR and other immunoglobulin gene alterations, AID is specifically expressed in germinal center B cells. While DT40 cells are B cell derived and express AID, HMW DNA fragments have also been observed in HeLa cells upon small interfering RNA (siRNA)-mediated depletion of ASF/SF2 (Li and Manley 2005a). One possibility is that other APOBEC family members might function analogously to AID in other cell types. Indeed, a number of additional members of this family, expressed in a variety of tissues, have been described, and show DNA deamination activity but with differential context dependence (Beale et al. 2004).

In addition to ssDNA regions, other structural features of R loops may also contribute to making these structures targets for certain nucleases. Each R loop contains two duplex–single strand junctions. Several nucleases are capable of recognizing such loop–duplex junctions. For example, XPF-ERCC1 and XPG, originally identified as nucleotide excision repair nucleases, are able to cleave bubble structures at the 5′ and 3′ side of the loop–duplex junction, respectively. Intriguingly, it has been reported that recombinant XPF-ERCC1 and XPG can cleave transcribed S regions (Tian and Alt 2000).

It is also interesting to note that transcriptional R looping occurs preferentially when the nontemplate strand is G rich (e.g., Phoenix et al. 1997; Huertas and Aguilera 2003; Yu et al. 2003; Li and Manley 2005a). Recent studies have indeed shown that guanine is indispensable for the formation of stable RNA:DNA hybrids in the immunoglobulin S region during in vitro transcription (Duquette et al. 2004; Mizuta et al. 2005). Substitution of GTP with analogs, such as 7-deaza-GTP or ITP, impairs the formation of higher-order RNA:DNA complexes, presumably R loops, in the S region. This is consistent with the fact that the exceptional stability of rG/dC base pairs (Sugimoto et al. 1995) facilitates R-loop formation by favoring RNA:DNA over DNA:DNA hybrids in regions where the nontemplate strand is G rich, such as the S regions (Gritzmacher 1989). On the other hand, single-stranded G-rich regions are also prone to form stable parallel four-stranded DNA structures, referred to as G-quartets or G4 DNAs (for review, see Gilbert and Feigon 1999). This could help stabilize the single-stranded nature of the nontemplate strand and thereby facilitate RNA:DNA hybrid formation during transcription. Formation of G-quartet structures during in vitro transcription of S regions has in fact been observed by electron microscopy, and they have also been detected in S region-containing plasmids propagated in E. coli (Duquette et al. 2004). Notably, several nucleases have been identified, from yeast and mammalian cells, that cleave in the single-stranded region 5′ of stacked G-quartets with striking specificity (Liu et al. 1993; Liu and Gilbert 1994; Sun et al. 2001).
These observations raise the possibility that higher-order DNA structures, such as G-quartets, may both help drive the formation of transcriptional R loops and also generate susceptible sites for specific nucleases, although further work is required to determine whether such structures indeed play significant roles in vivo.
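As a rough illustration of how G-rich, potentially G-quartet-forming stretches of a nontemplate strand might be flagged computationally, the Python sketch below scans a DNA sequence with a commonly used heuristic pattern: four runs of at least three guanines separated by loops of one to seven nucleotides. The pattern, the function names, and the example sequence are assumptions introduced here for illustration. They do not come from the studies cited above, and a sequence match says nothing about whether a structure actually forms in vivo.

```python
import re

# Heuristic motif (an assumption, not from the source text):
# four G-runs of >= 3 guanines separated by loops of 1-7 nucleotides.
G4_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)


def g_fraction(seq: str) -> float:
    """Fraction of guanines in the sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return seq.count("G") / len(seq) if seq else 0.0


def find_g4_motifs(seq: str):
    """Return (start, end, motif) for each non-overlapping heuristic match."""
    return [(m.start(), m.end(), m.group())
            for m in G4_PATTERN.finditer(seq.upper())]


if __name__ == "__main__":
    # Hypothetical nontemplate-strand fragment, made up for this example.
    example = "TTGGGAGGGTGGGAGGGTT"
    print(g_fraction(example))
    print(find_g4_motifs(example))
```

Only the strand passed in is scanned, so the nontemplate strand should be supplied in the 5′ to 3′ direction.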

An additional possible mechanism for R-loop-mediated DNA damage involves DNA replication. Specifically, transcription elongation complexes halted or slowed by R-loop structures might block the progress of replication forks and lead to DNA recombination and/or DNA DSBs in the newly replicated DNA (Fig. 4D; Aguilera 2002). Given that cotranscriptional R loops have been known to be responsible for the transcription elongation defects in yeast THO mutants (Huertas and Aguilera 2003), it is plausible that resulting stalled elongation complexes could interfere with the replication machinery. In fact, transcription-dependent replication fork collisions have been shown to induce recombination in bacteria and yeast (French 1992; Krasilnikova et al. 1998; Takeuchi et al. 2003). Recently, using plasmid-borne direct-repeat constructs, Prado and Aguilera (2005) provided evidence that a clash between transcription and replication facilitates TAR. A significant increase in TAR was observed when RNAP II transcription encountered a head-on oncoming replication fork, and this was concomitant with the appearance of a replication fork pause. These results support the idea that TAR can be a consequence of replication fork collapse caused by RNAP II-mediated transcription, at least in yeast. Evidence for this phenomenon in vertebrate systems, however, is currently lacking. In addition, ASF/SF2 depletion does not appear to cause an overall decrease in RNAP II transcription rate (Wang et al. 1996). Thus, the idea that transcriptional R-loop formation interferes with replication fork progression and thereby induces DNA damage and genome instability is an intriguing possibility, but one that requires further work.


A genome landscape of SRSF3-regulated splicing events and gene expression in human osteosarcoma U2OS cells

Alternative RNA splicing is an essential process to yield proteomic diversity in eukaryotic cells, and aberrant splicing is often associated with numerous human diseases and cancers. We recently described serine/arginine-rich splicing factor 3 (SRSF3 or SRp20) as a proto-oncogene. However, the SRSF3-regulated splicing events responsible for its oncogenic activities remain largely unknown. By global profiling of the SRSF3-regulated splicing events in human osteosarcoma U2OS cells, we found that SRSF3 regulates the expression of 60 genes including ERRFI1, ANXA1 and TGFB2, and 182 splicing events in 164 genes, including EP300, PUS3, CLINT1, PKP4, KIF23, CHK1, SMC2, CKLF, MAP4, MBNL1, MELK, DDX5, PABPC1, MAP4K4, Sp1 and SRSF1, which are primarily associated with cell proliferation or cell cycle. Two SRSF3-binding motifs, CCAGC(G)C and A(G)CAGCA, are enriched in the alternative exons. An SRSF3-binding site in the EP300 exon 14 is essential for exon 14 inclusion. We found that the expression of SRSF1 and that of SRSF3 are mutually dependent and that the two are coexpressed in normal and tumor tissues/cells. SRSF3 also significantly regulates the expression of at least 20 miRNAs, including a subset of oncogenic or tumor suppressive miRNAs. These data indicate that SRSF3 affects a global change of gene expression to maintain cell homeostasis.

© The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

Figures

SRSF3-targeted splicing events and changes of gene expression in U2OS cells. (A…

Functional classification and MEME motif analysis of the identified SRSF3-responsive targets. (A…

Validation of the SRSF3-responsive events identified by B/E ratio analysis. U2OS and HeLa…

Validation of other SRSF3-responsive RNA splicing events identified by ANOVA analysis. Following transfection…

Validation for protein expression changes following SRSF3 knockdown in U2OS and HeLa cells.…

SRSF3 promotes inclusion of the EP300 exon 14 through an exonic SRSF3-binding site.…

SRSF3 and SRSF1 are mutually regulated in cells. (A) SRSF3 knockdown…

SRSF3 and SRSF1 are mutually regulated in cell lines and coexpressed in normal…


How does temperature affect splicing events? Isoform switching of splicing factors regulates splicing of LATE ELONGATED HYPOCOTYL (LHY)

One of the ways in which plants can respond to temperature is via alternative splicing (AS). Previous work showed that temperature changes affected the splicing of several circadian clock gene transcripts. Here, we investigated the role of RNA-binding splicing factors (SFs) in temperature-sensitive AS of the clock gene LATE ELONGATED HYPOCOTYL (LHY). We characterized, in wild-type plants, temperature-associated isoform switching and expression patterns for SF transcripts from a high-resolution temperature and time series RNA-seq experiment. In addition, we employed quantitative RT-PCR of SF mutant plants to explore the role of the SFs in cooling-associated AS of LHY. We show that the splicing and expression of several SFs respond sufficiently rapidly and sensitively to temperature changes to contribute to the splicing of the 5′UTR of LHY. Moreover, the choice of splice site in LHY was altered in some SF mutants. The splicing of the 5′UTR region of LHY has characteristics of a molecular thermostat, where the ratio of transcript isoforms is sensitive to temperature changes as modest as 2 °C and is scalable over a wide dynamic range of temperature. Our work provides novel insight into SF-mediated coupling of the perception of temperature to post-transcriptional regulation of the clock.

Keywords: 5′UTR; Arabidopsis; alternative splicing; circadian; isoform switch; signalling; splicing factors; temperature; thermostat.

© 2018 The Authors. Plant, Cell & Environment Published by John Wiley & Sons Ltd.

Figures

Temperature-dependent AS of LHY transcripts. (a) Gene organization of LHY (At1g01060) with the…

Temperature associated AS and isoform switching of PTB1, PTB2, U2AF65A,…

Splicing of LHY and splicing factors are sensitive to small extents and durations…

Effects of splicing factor mutations on LHY 5′UTR AS. Levels of (a) LHY…

White light intensity consolidates temperature associated AS for PTB1 but not for LHY…


Predicting how splicing errors impact disease risk

No one knows how many times in a day, or even an hour, the trillions of cells in our body need to make proteins. But we do know that it's going on all the time, on a massive scale. We also know that every time this happens, an editing process takes place in the cell nucleus. Called RNA splicing, it makes sure that the RNA "instructions" sent to cellular protein factories correspond precisely with the blueprint encoded in our genes.

Researchers led by Adrian Krainer, a Cold Spring Harbor Laboratory (CSHL) Professor, and Assistant Professor Justin Kinney, are teasing out the rules that guide how cells process these RNA messages, enabling better predictions about the impact of specific genetic mutations that affect this process. This in turn will help assess how certain mutations affect a person's risk for disease.

Splicing removes interrupting segments called introns from the raw, unedited RNA copy of a gene, leaving only the exons, or protein-coding regions. There are over 200,000 introns in the human genome, and if they are spliced out imprecisely, cells will generate faulty proteins. The results can be life-threatening: about 14% of the single-letter mutations that have been linked to human diseases are thought to occur within the DNA sequences that flag intron positions in the genome.

The cell's splicing machinery seeks "splice sites" to correctly remove introns from a raw RNA message. Splice sites throughout the genome are similar but not identical, and small changes don't always impair splicing efficiency. For the splice site at the beginning of an intron -- known as its 5' ["five-prime"] splice site -- Krainer says, "we know that at the first and second [DNA-letter] position, mutations have a very strong impact. Mutations elsewhere in the intron can have dramatic effects or no effect, or something in between."

That's made it hard to predict how mutations at splice sites within disease-linked genes will impact patients. For example, mutations in the genes BRCA1 or BRCA2 can increase a woman's risk of breast and ovarian cancer, but not every mutation is harmful.

In experiments led by first author Mandy Wong, a Krainer lab postdoc, the team created 5' splice sites with every possible combination of DNA letters, then measured how well the associated introns were removed from a larger piece of RNA. For their experiments, they used introns from three disease-associated genes -- BRCA2 and two genes in which mutations cause neurodegenerative diseases, IKBKAP and SMN1.

In one intron of each of the three genes, the team tested over 32,000 5' splice sites. They found that specific DNA sequences corresponded with similar splicing efficiency or inefficiency in different introns. This is a step toward making general predictions. But they also found that other features of each gene -- the larger context -- tended to modify the impact in each specific case. In other words: how a mutation within a given 5' splice site will affect splicing is somewhat predictable, but is also influenced by context beyond the splice site itself.
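The figure of "over 32,000" splice sites can be reproduced with a short enumeration. The Python sketch below assumes a 9-nucleotide 5' splice-site window (three exonic positions followed by six intronic ones) in which the first intronic letter is fixed as G and the second is either T or C, with all other positions free. This layout is an assumption made here to match the quoted number. The article itself only says that every possible combination was tested.

```python
from itertools import product

BASES = "ACGT"


def enumerate_5ss():
    """List all 9-nt candidate 5' splice sites under the assumed layout.

    Assumed layout (not stated in the article): NNN | G [C/T] NNNN,
    where | marks the exon-intron boundary.
    """
    sites = []
    for exon in product(BASES, repeat=3):          # last 3 exonic letters
        for second in "CT":                        # 2nd intronic letter
            for intron in product(BASES, repeat=4):  # intronic letters 3-6
                sites.append("".join(exon) + "G" + second + "".join(intron))
    return sites


if __name__ == "__main__":
    sites = enumerate_5ss()
    # 4**3 * 2 * 4**4 combinations under the assumed layout
    print(len(sites))
```

Under these assumptions the count is 4^3 × 2 × 4^4 = 32,768, consistent with the number quoted above.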

Krainer says this knowledge will help better predict the impact of splice-site mutations -- but a deeper investigation is needed.

