1.
Nucleic Acids Res ; 46(16): 8105-8113, 2018 09 19.
Article in English | MEDLINE | ID: mdl-29986088

ABSTRACT

The current deluge of newly identified RNA transcripts presents a singular opportunity for improved assessment of coding potential, a cornerstone of genome annotation, and for machine-driven discovery of biological knowledge. While traditional, feature-based methods for RNA classification are limited by current scientific knowledge, deep learning methods can independently discover complex biological rules in the data de novo. We trained a gated recurrent neural network (RNN) on human messenger RNA (mRNA) and long noncoding RNA (lncRNA) sequences. Our model, mRNA RNN (mRNN), surpasses state-of-the-art methods at predicting protein-coding potential despite being trained with less data and with no prior concept of what features define mRNAs. To understand what mRNN learned, we probed the network and uncovered several context-sensitive codons highly predictive of coding potential. Our results suggest that gated RNNs can learn complex and long-range patterns in full-length human transcripts, making them ideal for performing a wide range of difficult classification tasks and, most importantly, for harvesting new biological insights from the rising flood of sequencing data.


Subject(s)
Computational Biology/methods , Neural Networks, Computer , Open Reading Frames/genetics , RNA, Long Noncoding/genetics , RNA, Messenger/genetics , Base Sequence , Humans , Machine Learning , Protein Biosynthesis , Reproducibility of Results , Sequence Analysis, RNA/methods
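To illustrate the kind of model the abstract above describes, here is a minimal sketch of a gated recurrent unit (GRU) reading a one-hot encoded RNA sequence and emitting a score between 0 and 1. It is not the authors' mRNN: the weights are random and untrained, biases are omitted, and every name (`one_hot`, `init_params`, `gru_score`), the hidden size, and the four-letter alphabet are assumptions made for this example.

```python
import math
import random

ALPHABET = "ACGU"  # assumption: plain four-letter RNA alphabet, no ambiguity codes


def one_hot(seq):
    """Encode an RNA string as a list of 4-element 0/1 vectors (T is read as U)."""
    vectors = []
    for base in seq.upper().replace("T", "U"):
        vec = [0.0] * len(ALPHABET)
        vec[ALPHABET.index(base)] = 1.0  # raises ValueError on other letters
        vectors.append(vec)
    return vectors


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def init_params(hidden_size, seed=0):
    """Random, untrained weights for one GRU layer and a logistic output unit."""
    rng = random.Random(seed)
    n_cols = len(ALPHABET) + hidden_size

    def matrix():
        return [[rng.gauss(0, 0.1) for _ in range(n_cols)]
                for _ in range(hidden_size)]

    return {
        "Wz": matrix(),  # update gate
        "Wr": matrix(),  # reset gate
        "Wh": matrix(),  # candidate state
        "w_out": [rng.gauss(0, 0.1) for _ in range(hidden_size)],
    }


def gru_score(seq, params):
    """Run the GRU over the whole sequence and squash the final state to (0, 1)."""
    h = [0.0] * len(params["w_out"])
    for x in one_hot(seq):
        z = [sigmoid(a) for a in matvec(params["Wz"], x + h)]
        r = [sigmoid(a) for a in matvec(params["Wr"], x + h)]
        gated = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a) for a in matvec(params["Wh"], x + gated)]
        # The update gate z decides how much of the old state is overwritten.
        h = [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]
    return sigmoid(sum(w * hi for w, hi in zip(params["w_out"], h)))


params = init_params(hidden_size=8)
score = gru_score("AUGGCCUAA", params)  # meaningless until the weights are trained
```

Because the hidden state is carried from the first nucleotide to the last, a trained network of this shape can in principle relate distant positions in a transcript, which is the property the abstract attributes to gated RNNs.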
2.
Nucleic Acids Res ; 46(11): 5381-5394, 2018 06 20.
Article in English | MEDLINE | ID: mdl-29746666

ABSTRACT

While RNA secondary structure prediction from sequence data has made remarkable progress, there is a need for improved strategies for annotating the features of RNA secondary structures. Here, we present bpRNA, a novel annotation tool capable of parsing RNA structures, including complex pseudoknot-containing RNAs, to yield an objective, precise, compact, unambiguous, easily interpretable description of all loops, stems, and pseudoknots, along with the positions, sequence, and flanking base pairs of each such structural feature. We also introduce several new informative representations of RNA structure types to improve structure visualization and interpretation. We have further used bpRNA to generate a web-accessible meta-database, 'bpRNA-1m', of over 100 000 single-molecule, known secondary structures; this is both more fully and accurately annotated and over 20 times larger than existing databases. We use a subset of the database with highly similar (≥90% identical) sequences filtered out to report on statistical trends in sequence, flanking base pairs, and length. The bpRNA method and the bpRNA-1m database will be valuable resources both for specific analysis of individual RNA molecules and for large-scale analyses, such as those useful for updating RNA energy parameters for computational thermodynamic predictions, improving machine learning models for structure prediction, and benchmarking structure-prediction algorithms.


Subject(s)
Computational Biology/methods , Inverted Repeat Sequences/genetics , Nucleic Acid Conformation , RNA/metabolism , Algorithms , Bacteria/genetics , Base Pairing/genetics , Sequence Analysis, RNA , Software , Thermodynamics
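As a rough illustration of what parsing a secondary structure into stems and loops involves, the sketch below reads a structure in dot-bracket notation and reports base pairs, stems, and hairpin loops. It is not the bpRNA algorithm: it assumes a pseudoknot-free structure written with only `(`, `)` and `.`, and the function names are hypothetical.

```python
def base_pairs(dot_bracket):
    """Return sorted (i, j) base pairs, using 1-based positions."""
    stack, pairs = [], []
    for pos, symbol in enumerate(dot_bracket, start=1):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif symbol != ".":
            raise ValueError(f"unsupported symbol {symbol!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def stems(pairs):
    """Group sorted pairs into stems: runs where (i, j) is followed by (i+1, j-1)."""
    grouped = []
    for i, j in pairs:
        if grouped and grouped[-1][-1] == (i - 1, j + 1):
            grouped[-1].append((i, j))
        else:
            grouped.append([(i, j)])
    return grouped


def hairpin_loops(dot_bracket, pairs):
    """Return the pairs that close a hairpin: everything between them is unpaired."""
    return [(i, j) for i, j in pairs
            if j - i > 1 and set(dot_bracket[i:j - 1]) == {"."}]


structure = "((..)).((..))"
pairs = base_pairs(structure)          # [(1, 6), (2, 5), (8, 13), (9, 12)]
found_stems = stems(pairs)             # two stems of two pairs each
found_hairpins = hairpin_loops(structure, pairs)  # [(2, 5), (9, 12)]
```

A single stack is enough here only because nested brackets never cross; pseudoknots, which the abstract says bpRNA handles, need additional bracket types and a more careful analysis.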
3.
Pac Symp Biocomput ; 22: 219-229, 2017.
Article in English | MEDLINE | ID: mdl-27896977

ABSTRACT

Cancer detection from gene expression data continues to pose a challenge due to the high dimensionality and complexity of these data. After decades of research there is still uncertainty in the clinical diagnosis of cancer and the identification of tumor-specific markers. Here we present a deep learning approach to cancer detection, and to the identification of genes critical for the diagnosis of breast cancer. First, we used a Stacked Denoising Autoencoder (SDAE) to deeply extract functional features from high-dimensional gene expression profiles. Next, we evaluated the performance of the extracted representation through supervised classification models to verify the usefulness of the new features in cancer detection. Lastly, we identified a set of highly interactive genes by analyzing the SDAE connectivity matrices. Our results and analysis illustrate that these highly interactive genes could be useful biomarkers for the detection of breast cancer and that they deserve further study.


Subject(s)
Breast Neoplasms/diagnosis , Breast Neoplasms/genetics , Algorithms , Biomarkers, Tumor/genetics , Breast Neoplasms/classification , Computational Biology , Databases, Genetic/statistics & numerical data , Female , Gene Expression Profiling/statistics & numerical data , Gene Ontology , Humans , Principal Component Analysis , Supervised Machine Learning
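The feature-extraction step in the abstract above can be pictured with a single denoising autoencoder layer, sketched below under several assumptions: tied encoder and decoder weights, masking noise, squared-error loss, plain stochastic gradient descent, and a toy four-value "expression profile". None of these choices, nor the class and function names, come from the paper; a stacked version would train further layers on the hidden codes of the previous one.

```python
import math
import random


def corrupt(x, drop_prob, rng):
    """Masking noise: set each input value to zero with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else v for v in x]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


class DenoisingAutoencoder:
    """One layer with tied weights: h = sigmoid(W x + b), y = W^T h + c."""

    def __init__(self, n_in, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.W = [[self.rng.gauss(0, 0.1) for _ in range(n_in)]
                  for _ in range(n_hidden)]
        self.b = [0.0] * n_hidden
        self.c = [0.0] * n_in

    def encode(self, x):
        return [sigmoid(sum(w * v for w, v in zip(row, x)) + bj)
                for row, bj in zip(self.W, self.b)]

    def decode(self, h):
        return [sum(self.W[j][i] * h[j] for j in range(len(h))) + self.c[i]
                for i in range(len(self.c))]

    def train_step(self, x, drop_prob=0.3, lr=0.05):
        """One SGD step on 0.5 * ||decode(encode(corrupt(x))) - x||^2."""
        x_noisy = corrupt(x, drop_prob, self.rng)
        h = self.encode(x_noisy)
        y = self.decode(h)
        dy = [yi - xi for yi, xi in zip(y, x)]  # error against the clean input
        dh = [sum(w * d for w, d in zip(row, dy)) for row in self.W]
        da = [dhj * hj * (1 - hj) for dhj, hj in zip(dh, h)]
        for j, row in enumerate(self.W):
            for i in range(len(row)):
                # Tied weights receive gradient from both decoder and encoder.
                row[i] -= lr * (h[j] * dy[i] + da[j] * x_noisy[i])
            self.b[j] -= lr * da[j]
        for i in range(len(self.c)):
            self.c[i] -= lr * dy[i]
        return 0.5 * sum(d * d for d in dy)


# Toy data standing in for expression profiles (hypothetical values).
profiles = [[1.0, 0.9, 0.0, 0.1], [0.9, 1.0, 0.1, 0.0],
            [0.0, 0.1, 1.0, 0.9], [0.1, 0.0, 0.9, 1.0]]
dae = DenoisingAutoencoder(n_in=4, n_hidden=2)
for epoch in range(200):
    for profile in profiles:
        dae.train_step(profile)
codes = [dae.encode(p) for p in profiles]  # low-dimensional features per profile
```

The hidden codes play the role of the extracted representation that the abstract passes to supervised classifiers, and the rows of `W` are the kind of connectivity weights one could inspect to see which inputs drive each hidden unit.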