1.
Wiley Interdiscip Rev RNA ; 15(2): e1838, 2024.
Article in English | MEDLINE | ID: mdl-38509732

ABSTRACT

Disruptions in spatiotemporal gene expression can result in atypical brain function. Specifically, autism spectrum disorder (ASD) is characterized by abnormalities in pre-mRNA splicing. Abnormal splicing patterns have been identified in the brains of individuals with ASD, and mutations in splicing factors have been found to contribute to neurodevelopmental delays associated with ASD. Here we review studies that shed light on the importance of splicing observed in ASD and that explored the intricate relationship between splicing factors and ASD, revealing how disruptions in pre-mRNA splicing may underlie ASD pathogenesis. We provide an overview of the research regarding all splicing factors associated with ASD and place a special emphasis on five specific splicing factors (HNRNPH2, NOVA2, WBP4, SRRM2, and RBFOX1) known to impact the splicing of ASD-related genes. In the discussion of the molecular mechanisms influenced by these splicing factors, we lay the groundwork for a deeper understanding of ASD's complex etiology. Finally, we discuss the potential benefit of unraveling the connection between splicing and ASD for the development of more precise diagnostic tools and targeted therapeutic interventions. This article is categorized under: RNA in Disease and Development > RNA in Disease; RNA Evolution and Genomics > RNA and Ribonucleoprotein Evolution; RNA Evolution and Genomics > Computational Analyses of RNA; RNA-Based Catalysis > RNA Catalysis in Splicing and Translation.


Subject(s)
Autism Spectrum Disorder , Autistic Disorder , Humans , Autism Spectrum Disorder/genetics , Autism Spectrum Disorder/metabolism , Autistic Disorder/genetics , RNA Precursors/genetics , RNA Precursors/metabolism , RNA Splicing/genetics , RNA Splicing Factors/metabolism , Neuro-Oncological Ventral Antigen
2.
Nat Microbiol ; 9(3): 595-613, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38347104

ABSTRACT

Microbial breakdown of organic matter is one of the most important processes on Earth, yet the controls of decomposition are poorly understood. Here we track 36 terrestrial human cadavers in three locations and show that a phylogenetically distinct, interdomain microbial network assembles during decomposition despite selection effects of location, climate and season. We generated a metagenome-assembled genome library from cadaver-associated soils and integrated it with metabolomics data to identify links between taxonomy and function. This universal network of microbial decomposers is characterized by cross-feeding to metabolize labile decomposition products. The key bacterial and fungal decomposers are rare across non-decomposition environments and appear unique to the breakdown of terrestrial decaying flesh, including humans, swine, mice and cattle, with insects as likely important vectors for dispersal. The observed lockstep of microbial interactions further underlies a robust microbial forensic tool with the potential to aid predictions of the time since death.


Subject(s)
Microbial Consortia , Soil Microbiology , Mice , Humans , Animals , Swine , Cattle , Cadaver , Metagenome , Bacteria
3.
Cell Rep Med ; 4(12): 101313, 2023 12 19.
Article in English | MEDLINE | ID: mdl-38118424

ABSTRACT

Identification of the gene expression state of a cancer patient from routine pathology imaging and characterization of its phenotypic effects have significant clinical and therapeutic implications. However, prediction of expression of individual genes from whole slide images (WSIs) is challenging due to co-dependent or correlated expression of multiple genes. Here, we use a purely data-driven approach to first identify groups of genes with co-dependent expression and then predict their status from WSIs using a bespoke graph neural network. These gene groups allow us to capture the gene expression state of a patient with a small number of binary variables that are biologically meaningful and carry histopathological insights for clinical and therapeutic use cases. Prediction of gene expression state based on these gene groups allows associating histological phenotypes (cellular composition, mitotic counts, grading, etc.) with underlying gene expression patterns and opens avenues for gaining biological insights from routine pathology imaging directly.


Subject(s)
Breast Neoplasms , Gene Expression Profiling , Humans , Female , Transcriptome/genetics , Neural Networks, Computer , Phenotype , Breast Neoplasms/genetics
4.
Front Bioinform ; 3: 1198218, 2023.
Article in English | MEDLINE | ID: mdl-37915563

ABSTRACT

Motivation: The prediction of a protein 3D structure is essential for understanding protein function, drug discovery, and disease mechanisms; with the advent of methods like AlphaFold that are capable of producing very high-quality decoys, ensuring the quality of those decoys can provide further confidence in the accuracy of their predictions. Results: In this work, we describe Qϵ, a graph convolutional network (GCN) that utilizes a minimal set of atom and residue features as inputs to predict the global distance test total score (GDTTS) and local distance difference test (lDDT) score of a decoy. To improve the model's performance, we introduce a novel loss function based on the ϵ-insensitive loss function used for SVM regression. This loss function is specifically designed for evaluating the characteristics of the quality assessment problem and provides predictions with improved accuracy over standard loss functions used for this task. Despite using only a minimal set of features, it matches the performance of recent state-of-the-art methods like DeepUMQA. Availability: The code for Qϵ is available at https://github.com/soumyadip1997/qepsilon.
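
The abstract above says its loss is based on the ϵ-insensitive loss used for SVM regression. As a point of reference only, the following is a minimal sketch of the standard ϵ-insensitive loss, not the modified loss the authors introduce (its exact form is not given in the abstract). The function name, the example scores, and the value of `epsilon` are illustrative assumptions.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Mean standard epsilon-insensitive loss.

    Errors no larger than epsilon cost nothing; larger errors are
    penalized linearly by the amount they exceed epsilon.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")
    losses = [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)


# Hypothetical quality scores (e.g. on a 0-1 scale) for three decoys.
scores_true = [0.80, 0.55, 0.30]
scores_pred = [0.85, 0.40, 0.30]
loss = epsilon_insensitive_loss(scores_true, scores_pred, epsilon=0.1)
print(round(loss, 4))
```

Only the second prediction falls outside the tolerance band here, so it alone contributes to the loss; this tolerance for small deviations is the property that distinguishes the loss from a plain absolute error.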

5.
Genome Biol ; 24(1): 53, 2023 03 22.
Article in English | MEDLINE | ID: mdl-36949544

ABSTRACT

BACKGROUND: Alternative splicing is a widespread regulatory phenomenon that enables a single gene to produce multiple transcripts. Among the different types of alternative splicing, intron retention is one of the least explored despite its high prevalence in both plants and animals. The recent discovery that the majority of splicing is co-transcriptional has led to the finding that chromatin state affects alternative splicing. Therefore, it is plausible that transcription factors can regulate splicing outcomes. RESULTS: We provide evidence for the hypothesis that transcription factors are involved in the regulation of intron retention by studying regions of open chromatin in retained and excised introns. Using deep learning models designed to distinguish between regions of open chromatin in retained introns and non-retained introns, we identified motifs enriched in IR events with significant hits to known human transcription factors. Our model predicts that the majority of transcription factors that affect intron retention come from the zinc finger family. We demonstrate the validity of these predictions using ChIP-seq data for multiple zinc finger transcription factors and find strong over-representation for their peaks in intron retention events. CONCLUSIONS: This work opens up opportunities for further studies that elucidate the mechanisms by which transcription factors affect intron retention and other forms of splicing. AVAILABILITY: Source code available at https://github.com/fahadahaf/chromir.


Subject(s)
Alternative Splicing , Transcription Factors , Animals , Humans , Introns , Transcription Factors/genetics , RNA Splicing , Chromatin/genetics
6.
Bioinformatics ; 38(Suppl_2): ii75-ii81, 2022 09 16.
Article in English | MEDLINE | ID: mdl-36124806

ABSTRACT

MOTIVATION: Machine-learning-based prediction of compound-protein interactions (CPIs) is important for drug design, screening and repurposing. Despite numerous recent publications with increasing methodological sophistication claiming consistent improvements in predictive accuracy, we have observed a number of fundamental issues in experiment design that produce overoptimistic estimates of model performance. RESULTS: We systematically analyze the impact of several factors affecting generalization performance of CPI predictors that are overlooked in existing work: (i) similarity between training and test examples in cross-validation; (ii) synthesizing negative examples in the absence of experimentally verified negative examples; and (iii) alignment of evaluation protocol and performance metrics with real-world use of CPI predictors in screening large compound libraries. Using both state-of-the-art approaches by other researchers as well as a simple kernel-based baseline, we have found that effective assessment of generalization performance of CPI predictors requires careful control over similarity between training and test examples. We show that, under stringent performance assessment protocols, a simple kernel-based approach can exceed the predictive performance of existing state-of-the-art methods. We also show that random pairing for generating synthetic negative examples for training and performance evaluation results in models with better generalization in comparison to more sophisticated strategies used in existing studies. Our analyses indicate that using proposed experiment design strategies can offer significant improvements for CPI prediction leading to effective target compound screening for drug repurposing and discovery of putative chemical ligands of SARS-CoV-2-Spike and Human-ACE2 proteins. AVAILABILITY AND IMPLEMENTATION: Code and supplementary material available at https://github.com/adibayaseen/HKRCPI. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Angiotensin-Converting Enzyme 2 , Machine Learning , Humans , Ligands , SARS-CoV-2
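
The abstract of record 6 reports that random pairing works well for generating synthetic negative examples. The following is a minimal sketch of that general idea, assuming negatives are drawn uniformly from compound-protein pairs that are not among the known positives; it is not taken from the authors' repository, and the function name and the example pairs are hypothetical.

```python
import random


def sample_random_negatives(positives, n_negatives, seed=0):
    """Draw distinct (compound, protein) pairs that are not known positives."""
    rng = random.Random(seed)
    positive_set = set(positives)
    compounds = sorted({c for c, _ in positive_set})
    proteins = sorted({p for _, p in positive_set})

    # Every positive is one of the possible pairings, so this is the number
    # of candidate negatives that exist at all.
    max_negatives = len(compounds) * len(proteins) - len(positive_set)
    if n_negatives > max_negatives:
        raise ValueError(f"only {max_negatives} negative pairs are possible")

    negatives = set()
    while len(negatives) < n_negatives:
        pair = (rng.choice(compounds), rng.choice(proteins))
        if pair not in positive_set:
            negatives.add(pair)
    return sorted(negatives)


# Hypothetical toy data: three known interacting pairs.
known_positives = [
    ("compound_A", "protein_X"),
    ("compound_B", "protein_Y"),
    ("compound_C", "protein_Z"),
]
negatives = sample_random_negatives(known_positives, n_negatives=4)
print(negatives)
```

A randomly paired negative is unverified rather than experimentally confirmed, so a small fraction of such pairs may in fact interact.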
7.
BMC Bioinformatics ; 23(1): 142, 2022 Apr 20.
Article in English | MEDLINE | ID: mdl-35443610

ABSTRACT

BACKGROUND: Despite recent progress in basecalling of Oxford Nanopore DNA sequencing data, its wide adoption is still being hampered by its relatively low accuracy compared to short read technologies. Furthermore, very little of the recent research was focused on basecalling of RNA data, which has different characteristics than its DNA counterpart. RESULTS: We fill this gap by benchmarking a fully convolutional deep learning basecalling architecture with improved performance compared to Oxford Nanopore's RNA basecallers. AVAILABILITY: The source code for our basecaller is available at: https://github.com/biodlab/RODAN.


Subject(s)
Nanopore Sequencing , Nanopores , DNA , High-Throughput Nucleotide Sequencing , RNA , Sequence Analysis, DNA , Sequence Analysis, RNA
8.
Front Bioinform ; 2: 1083292, 2022.
Article in English | MEDLINE | ID: mdl-36591335

ABSTRACT

As practitioners of machine learning in the area of bioinformatics, we know that the quality of the results crucially depends on the quality of our labeled data. While there is a tendency to focus on the quality of positive examples, the negative examples are equally important. In this opinion paper we revisit the problem of choosing negative examples for the task of predicting protein-protein interactions, either among proteins of a given species or for host-pathogen interactions, and describe important issues that are prevalent in the current literature. The challenge in creating datasets for this task is the noisy nature of the experimentally derived interactions and the lack of information on non-interacting proteins. A standard approach is to choose random pairs of non-interacting proteins as negative examples. Since the interactomes of all species are only partially known, this leads to a very small percentage of false negatives. This is especially true for host-pathogen interactions. To address this perceived issue, some researchers have chosen to select negative examples as pairs of proteins whose sequence similarity to the positive examples is sufficiently low. This clearly reduces the chance for false negatives, but also makes the problem much easier than it really is, leading to over-optimistic accuracy estimates. We demonstrate the effect of this form of bias using a selection of recent protein interaction prediction methods of varying complexity, and urge researchers to pay attention to the details of generating their datasets for potential biases like this.

9.
Front Microbiol ; 12: 681150, 2021.
Article in English | MEDLINE | ID: mdl-34054788

ABSTRACT

Histone proteins compact and organize DNA resulting in a dynamic chromatin architecture impacting DNA accessibility and ultimately gene expression. Eukaryotic chromatin landscapes are structured through histone protein variants, epigenetic marks, the activities of chromatin-remodeling complexes, and post-translational modification of histone proteins. In most Archaea, histone-based chromatin structure is dominated by the helical polymerization of histone proteins wrapping DNA into a repetitive and closely gyred configuration. The formation of the archaeal-histone chromatin-superhelix is a regulatory force of adaptive gene expression and is likely critical for regulation of gene expression in all histone-encoding Archaea. Single amino acid substitutions in archaeal histones that block formation of tightly packed chromatin structures have profound effects on cellular fitness, but the underlying gene expression changes resultant from an altered chromatin landscape have not been resolved. Using the model organism Thermococcus kodakarensis, we genetically alter the chromatin landscape and quantify the resultant changes in gene expression, including unanticipated and significant impacts on provirus transcription. Global transcriptome changes resultant from varying chromatin landscapes reveal the regulatory importance of higher-order histone-based chromatin architectures in regulating archaeal gene expression.

10.
Nucleic Acids Res ; 49(13): e77, 2021 07 21.
Article in English | MEDLINE | ID: mdl-33950192

ABSTRACT

Deep learning has demonstrated its predictive power in modeling complex biological phenomena such as gene expression. The value of these models hinges not only on their accuracy, but also on the ability to extract biologically relevant information from the trained models. While there has been much recent work on developing feature attribution methods that discover the most important features for a given sequence, inferring cooperativity between regulatory elements, which is the hallmark of phenomena such as gene expression, remains an open problem. We present SATORI, a Self-ATtentiOn based model to detect Regulatory element Interactions. Our approach combines convolutional layers with a self-attention mechanism that helps us capture a global view of the landscape of interactions between regulatory elements in a sequence. A comprehensive evaluation demonstrates the ability of SATORI to identify numerous statistically significant TF-TF interactions, many of which have been previously reported. Our method is able to detect higher numbers of experimentally verified TF-TF interactions than existing methods, and has the advantage of not requiring a computationally expensive post-processing step. Finally, SATORI can be used for detection of any type of feature interaction in models that use a similar attention mechanism, and is not limited to the detection of TF-TF interactions.


Subject(s)
Deep Learning , Genomics/methods , Regulatory Elements, Transcriptional , Transcription Factors/metabolism , Arabidopsis/genetics , Cell Line , Chromatin Immunoprecipitation Sequencing , Humans , Nucleotide Motifs , Promoter Regions, Genetic
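
Record 10 describes combining convolutional layers with a self-attention mechanism to capture interactions between regulatory elements. The following is a minimal sketch of scaled dot-product self-attention over a short list of per-position feature vectors, written in plain Python. It is a simplification under stated assumptions: there are no learned query, key, or value projections and no multiple heads, the feature vectors are hypothetical stand-ins for convolutional outputs, and it is not SATORI's implementation.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Scaled dot-product self-attention with queries = keys = values.

    Returns the attention weight matrix (one row per position) and the
    attended output vectors.
    """
    dim = len(features[0])
    scores = [
        [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim) for key in features]
        for query in features
    ]
    weights = [softmax(row) for row in scores]
    outputs = [
        [sum(w * value[j] for w, value in zip(weight_row, features)) for j in range(dim)]
        for weight_row in weights
    ]
    return weights, outputs


# Hypothetical features for three sequence positions; the first two are
# similar, the third is different.
position_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
weights, outputs = self_attention(position_features)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of the weight matrix shows how strongly one position attends to every other position, which is the kind of quantity an attention-based model can expose for reading off candidate interactions.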
11.
Biochem Soc Trans ; 48(6): 2399-2414, 2020 12 18.
Article in English | MEDLINE | ID: mdl-33196096

ABSTRACT

Next-generation sequencing (NGS) technologies - Illumina RNA-seq, Pacific Biosciences isoform sequencing (PacBio Iso-seq), and Oxford Nanopore direct RNA sequencing (DRS) - have revealed the complexity of plant transcriptomes and their regulation at the co-/post-transcriptional level. Global analysis of mature mRNAs, transcripts from nuclear run-on assays, and nascent chromatin-bound mRNAs using short as well as full-length and single-molecule DRS reads have uncovered potential roles of different forms of RNA polymerase II during the transcription process, and the extent of co-transcriptional pre-mRNA splicing and polyadenylation. These tools have also allowed mapping of transcriptome-wide start sites in cap-containing RNAs, poly(A) site choice, poly(A) tail length, and RNA base modifications. The emerging theme from recent studies is that reprogramming of gene expression in response to developmental cues and stresses at the co-/post-transcriptional level likely plays a crucial role in eliciting appropriate responses for optimal growth and plant survival under adverse conditions. Although the mechanisms by which developmental cues and different stresses regulate co-/post-transcriptional splicing are largely unknown, a few recent studies indicate that the external cues target spliceosomal and splicing regulatory proteins to modulate alternative splicing. In this review, we provide an overview of recent discoveries on the dynamics and complexities of plant transcriptomes, mechanistic insights into splicing regulation, and discuss critical gaps in co-/post-transcriptional research that need to be addressed using diverse genomic and biochemical approaches.


Subject(s)
Plant Proteins/metabolism , Transcriptome , Alternative Splicing , Arabidopsis/genetics , Base Sequence , Chromatin/chemistry , Chromatin/metabolism , Gene Expression Profiling , Genes, Plant , Green Fluorescent Proteins/metabolism , High-Throughput Nucleotide Sequencing , Protein Isoforms , RNA Processing, Post-Transcriptional , RNA Splicing , RNA-Seq , Sequence Analysis, RNA
12.
Genes (Basel) ; 11(8)2020 08 03.
Article in English | MEDLINE | ID: mdl-32756364

ABSTRACT

Breast cancer is the second leading cause of death in women above 60 years in the US. Screening mammography is recommended for women above 50 years; however, 22% of breast cancer cases are diagnosed in women below this age. We set out to develop a test based on the detection of cell-free RNA from saliva. To this end, we sequenced RNA from a pool of ten women. The 1254 transcripts identified were enriched for genes with an annotation of alternative pre-mRNA splicing. Pre-mRNA splicing is a tightly regulated process and its misregulation in cancer cells promotes the formation of cancer-driving isoforms. For these reasons, we chose to focus on splicing factors as biomarkers for the early detection of breast cancer. We found that the level of the splicing factors is unique to each woman and consistent in the same woman at different time points. Next, we extracted RNA from 36 healthy subjects and 31 breast cancer patients. Recording the mRNA level of seven splicing factors in these samples demonstrated that the combination of all these factors is different in the two groups (p value = 0.005). Our results demonstrate a differential abundance of splicing factor mRNA in the saliva of breast cancer patients.


Subject(s)
Biomarkers, Tumor/genetics , Breast Neoplasms/diagnosis , RNA Splicing Factors/genetics , RNA, Messenger/genetics , Saliva/metabolism , Adult , Aged , Biomarkers, Tumor/metabolism , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Female , Humans , Middle Aged , RNA Splicing Factors/metabolism , RNA, Messenger/metabolism
13.
Sci Rep ; 10(1): 6047, 2020 04 08.
Article in English | MEDLINE | ID: mdl-32269234

ABSTRACT

Efforts to develop effective and safe drugs for treatment of tuberculosis require preclinical evaluation in animal models. Alongside efficacy testing of novel therapies, effects on pulmonary pathology and disease progression are monitored by using histopathology images from these infected animals. To compare the severity of disease across treatment cohorts, pathologists have historically assigned a semi-quantitative histopathology score that may be subjective in terms of their training, experience, and personal bias. Manual histopathology therefore has limitations regarding reproducibility between studies and pathologists, potentially masking successful treatments. This report describes a pathologist-assistive software tool that reduces these user limitations, while providing a rapid, quantitative scoring system for digital histopathology image analysis. The software, called 'Lesion Image Recognition and Analysis' (LIRA), employs convolutional neural networks to classify seven different pathology features, including three different lesion types from pulmonary tissues of the C3HeB/FeJ tuberculosis mouse model. LIRA was developed to improve the efficiency of histopathology analysis for mouse tuberculosis infection models; the approach also has broader applications to other disease models and tissues. The full source code and documentation are available from https://Github.com/TB-imaging/LIRA.


Subject(s)
Image Processing, Computer-Assisted/methods , Lung/diagnostic imaging , Mycobacterium tuberculosis/physiology , Tuberculosis, Pulmonary/diagnostic imaging , Algorithms , Animals , Disease Models, Animal , Humans , Lung/pathology , Mice , Mice, Inbred C3H , Neural Networks, Computer , Software , Tuberculosis, Pulmonary/pathology
14.
Int J Mol Sci ; 21(3)2020 Jan 24.
Article in English | MEDLINE | ID: mdl-31991584

ABSTRACT

Drought is a major limiting factor of crop yields. In response to drought, plants reprogram their gene expression, which ultimately regulates a multitude of biochemical and physiological processes. The timing of this reprogramming and the nature of the drought-regulated genes in different genotypes are thought to confer differential tolerance to drought stress. Sorghum is a highly drought-tolerant crop and has been increasingly used as a model cereal to identify genes that confer tolerance. Also, there is considerable natural variation in resistance to drought in different sorghum genotypes. Here, we evaluated drought resistance in four genotypes to polyethylene glycol (PEG)-induced drought stress at the seedling stage and performed transcriptome analysis in seedlings of sorghum genotypes that are either drought-resistant or drought-sensitive to identify drought-regulated changes in gene expression that are unique to drought-resistant genotypes of sorghum. Our analysis revealed that about 180 genes are differentially regulated in response to drought stress only in drought-resistant genotypes and most of these (over 70%) are up-regulated in response to drought. Among these, about 70 genes are novel with no known function and the remaining are transcription factors, signaling and stress-related proteins implicated in drought tolerance in other crops. This study revealed a set of drought-regulated genes, including many genes encoding uncharacterized proteins that are associated with drought tolerance at the seedling stage.


Subject(s)
Gene Expression Profiling , Gene Expression Regulation, Plant/drug effects , Genotype , Polyethylene Glycols/pharmacology , Sorghum/metabolism , Transcription, Genetic/drug effects , Transcriptome/drug effects , Dehydration/genetics , Dehydration/metabolism , Sorghum/genetics
15.
Bioinformatics ; 35(14): i269-i277, 2019 07 15.
Article in English | MEDLINE | ID: mdl-31510640

ABSTRACT

MOTIVATION: Deep learning architectures have recently demonstrated their power in predicting DNA- and RNA-binding specificity. Existing methods fall into three classes: Some are based on convolutional neural networks (CNNs), others use recurrent neural networks (RNNs) and others rely on hybrid architectures combining CNNs and RNNs. However, based on existing studies the relative merit of the various architectures remains unclear. RESULTS: In this study we present a systematic exploration of deep learning architectures for predicting DNA- and RNA-binding specificity. For this purpose, we present deepRAM, an end-to-end deep learning tool that provides an implementation of a wide selection of architectures; its fully automatic model selection procedure allows us to perform a fair and unbiased comparison of deep learning architectures. We find that deeper more complex architectures provide a clear advantage with sufficient training data, and that hybrid CNN/RNN architectures outperform other methods in terms of accuracy. Our work provides guidelines that can assist the practitioner in choosing an appropriate network architecture, and provides insight on the difference between the models learned by convolutional and recurrent networks. In particular, we find that although recurrent networks improve model accuracy, this comes at the expense of a loss in the interpretability of the features learned by the model. AVAILABILITY AND IMPLEMENTATION: The source code for deepRAM is available at https://github.com/MedChaabane/deepRAM. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Deep Learning , Neural Networks, Computer , Base Sequence , DNA , RNA , Sensitivity and Specificity
16.
Plant Dis ; 103(11): 2893-2902, 2019 Nov.
Article in English | MEDLINE | ID: mdl-31436473

ABSTRACT

Uniqprimer, a software pipeline developed in Python, was deployed as a user-friendly internet tool in Rice Galaxy for comparative genome analyses to design primer sets for PCR assays capable of detecting target bacterial taxa. The pipeline was trialed with Dickeya dianthicola, a destructive broad-host-range bacterial pathogen found in most potato-growing regions. Dickeya is a highly variable genus, and some primers available to detect this genus and species exhibit common diagnostic failures. Upon uploading a selection of target and nontarget genomes, six primer sets were rapidly identified with Uniqprimer, of which two were specific and sensitive when tested with D. dianthicola. The remaining four amplified a minority of the nontarget strains tested. The two promising candidate primer sets were trialed with DNA isolated from 116 field samples from across the United States that were previously submitted for testing. D. dianthicola was detected in 41 samples, demonstrating the applicability of our detection primers and suggesting widespread occurrence of D. dianthicola in North America.


Subject(s)
Agriculture , Bacteriological Techniques , DNA Primers , Enterobacteriaceae , Solanum tuberosum , Agriculture/methods , Bacteriological Techniques/methods , DNA Primers/genetics , Enterobacteriaceae/genetics , North America , Plant Diseases/microbiology , Solanum tuberosum/microbiology
17.
BMC Bioinformatics ; 19(1): 425, 2018 Nov 15.
Article in English | MEDLINE | ID: mdl-30442086

ABSTRACT

BACKGROUND: Determining protein-protein interactions and their binding affinity is important in understanding cellular biological processes, discovery and design of novel therapeutics, protein engineering, and mutagenesis studies. Due to the time and effort required in wet lab experiments, computational prediction of binding affinity from sequence or structure is an important area of research. Structure-based methods, though more accurate than sequence-based techniques, are limited in their applicability due to limited availability of protein structure data. RESULTS: In this study, we propose a novel machine learning method for predicting binding affinity that uses protein 3D structure as privileged information at training time while expecting only protein sequence information during testing. Using the method, which is based on the framework of learning using privileged information (LUPI), we have achieved improved performance over corresponding sequence-based binding affinity prediction methods that do not have access to privileged information during training. Our experiments show that with the proposed framework, which uses structure only during training, it is possible to achieve classification performance comparable to that which is obtained using structure-based features. Evaluation on an independent test set shows improved performance over the PPA-Pred2 method as well. CONCLUSIONS: The proposed method outperforms several baseline learners and a state-of-the-art binding affinity predictor not only in cross-validation, but also on an additional validation dataset, demonstrating the utility of the LUPI framework for problems that would benefit from classification using structure-based features. The implementation of LUPI developed for this work is expected to be useful in other areas of bioinformatics as well.


Subject(s)
Algorithms , Computational Biology/methods , Machine Learning , Proteins/metabolism , Amino Acid Sequence , Ligands , Protein Binding , Proteins/chemistry , ROC Curve , Reproducibility of Results , Support Vector Machine
18.
Front Plant Sci ; 9: 5, 2018.
Article in English | MEDLINE | ID: mdl-29483921

ABSTRACT

Abiotic stresses affect plant physiology, development, and growth, and alter pre-mRNA splicing. Western poplar is a model woody tree and a potential bioenergy feedstock. To investigate the extent of stress-regulated alternative splicing (AS), we conducted an in-depth survey of leaf, root, and stem xylem transcriptomes under drought, salt, or temperature stress. Analysis of approximately one billion genome-aligned RNA-Seq reads from tissue- or stress-specific libraries revealed over fifteen million novel splice junctions. Transcript models supported by both RNA-Seq and single molecule isoform sequencing (Iso-Seq) data revealed a broad array of novel stress- and/or tissue-specific isoforms. Analysis of Iso-Seq data also resulted in the discovery of 15,087 novel transcribed regions of which 164 show AS. Our findings demonstrate that abiotic stresses profoundly perturb transcript isoform profiles and trigger widespread intron retention (IR) events. Stress treatments often increased or decreased retention of specific introns - a phenomenon described here as differential intron retention (DIR). Many differentially retained introns were regulated in a stress- and/or tissue-specific manner. A subset of transcripts harboring super stress-responsive DIR events showed persisting fluctuations in the degree of IR across all treatments and tissue types. To investigate coordinated dynamics of intron-containing transcripts in the study, we quantified absolute copy number of isoforms of two conserved transcription factors (TFs) using Droplet Digital PCR. This case study suggests that stress treatments can be associated with coordinated switches in relative ratios between fully spliced and intron-retaining isoforms and may play a role in adjusting the transcriptome to abiotic stresses.

19.
BMC Genomics ; 19(1): 21, 2018 01 05.
Article in English | MEDLINE | ID: mdl-29304739

ABSTRACT

BACKGROUND: Intron retention (IR) is the most prevalent form of alternative splicing in plants. IR, like other forms of alternative splicing, has an important role in increasing gene product diversity and regulating transcript functionality. Splicing is known to occur co-transcriptionally and is influenced by the speed of transcription which, in turn, is affected by chromatin structure. It follows that chromatin structure may have an important role in the regulation of splicing, and there is preliminary evidence in metazoans to suggest that this is indeed the case; however, nothing is known about the role of chromatin structure in regulating IR in plants. DNase I-seq is a useful experimental tool for genome-wide interrogation of chromatin accessibility, providing information on regions of chromatin with very high likelihood of cleavage by the enzyme DNase I, known as DNase I Hypersensitive Sites (DHSs). While it is well-established that promoter regions are highly accessible and are over-represented with DHSs, not much is known about DHSs in the bodies of genes, and their relationship to splicing in general, and IR in particular. RESULTS: In this study we use publicly available DNase I-seq data in Arabidopsis and rice to investigate the relationship between IR and chromatin structure. We find that IR events are highly enriched in DHSs in both species. This implies that chromatin is more open in retained introns, which is consistent with a kinetic model of the process whereby higher speeds of transcription in those regions give less time for the spliceosomal machinery to recognize and splice out those introns co-transcriptionally. The more open chromatin in IR can also be the result of regulation mediated by DNA-binding proteins. To test this, we performed an exhaustive search for footprints left by DNA-binding proteins that are associated with IR. We identified several hundred short sequence elements that exhibit footprints in their DNase I-seq coverage, the telltale sign for binding events of a regulatory protein, protecting its binding site from cleavage by DNase I. A highly significant fraction of those sequence elements are conserved between Arabidopsis and rice, a strong indication of their functional importance. CONCLUSIONS: In this study we have established an association between IR and chromatin accessibility, and presented a mechanistic hypothesis that explains the observed association from the perspective of the co-transcriptional nature of splicing. Furthermore, we identified conserved sequence elements for DNA-binding proteins that affect splicing.


Subject(s)
Arabidopsis/genetics , Chromatin/chemistry , Introns , Oryza/genetics , Alternative Splicing , Chromatin/metabolism , DNA-Binding Proteins/metabolism , Deoxyribonuclease I , Protein Footprinting
20.
PLoS Comput Biol ; 13(4): e1005465, 2017 04.
Article in English | MEDLINE | ID: mdl-28394888

ABSTRACT

Many prion-forming proteins contain glutamine/asparagine (Q/N) rich domains, and there are conflicting opinions as to the role of primary sequence in their conversion to the prion form: is this phenomenon driven primarily by amino acid composition, or, as a recent computational analysis suggested, dependent on the presence of short sequence elements with high amyloid-forming potential? The argument for the importance of short sequence elements hinged on the relatively high accuracy obtained using a method that utilizes a collection of length-six sequence elements with known amyloid-forming potential. We weigh in on this question and demonstrate that when those sequence elements are permuted, even higher accuracy is obtained; we also propose a novel multiple-instance machine learning method that uses sequence composition alone, and achieves better accuracy than all existing prion prediction approaches. While we expect there to be elements of primary sequence that affect the process, our experiments suggest that sequence composition alone is sufficient for predicting protein sequences that are likely to form prions. A web-server for the proposed method is available at http://faculty.pieas.edu.pk/fayyaz/prank.html, and the code for reproducing our experiments is available at http://doi.org/10.5281/zenodo.167136.


Subject(s)
Amino Acid Sequence , Asparagine/chemistry , Computational Biology/methods , Glutamine/chemistry , Machine Learning , Prions/chemistry , Amyloid/chemistry , Humans , Prions/metabolism , Yeasts
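
Record 20 argues that sequence composition alone can be enough to predict prion-forming proteins. The following is a minimal sketch of what a composition feature vector looks like, assuming the 20 standard amino acids and a simple per-residue frequency; it illustrates the kind of order-independent input the abstract refers to, not the authors' multiple-instance method. The function name and the example sequence are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in a sequence."""
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


# Hypothetical Q/N-rich fragment used only for illustration.
fragment = "QNQNQQNNYGAS"
composition = amino_acid_composition(fragment)
qn_fraction = composition["Q"] + composition["N"]
print(round(qn_fraction, 3))

# Composition ignores residue order, so any rearrangement of the same
# residues yields an identical feature vector.
reversed_composition = amino_acid_composition(fragment[::-1])
```

Because the vector discards residue order entirely, a classifier trained on it can only exploit which residues are present and in what proportion.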