Results 1 - 5 of 5
1.
Front Phys ; 8, 2020 Jun.
Article in English | MEDLINE | ID: mdl-33274189

ABSTRACT

Epitranscriptomics is an exciting area that studies different types of modifications in transcripts, and the prediction of such modification sites from the transcript sequence is of significant interest. However, the scarcity of positive sites for most modifications poses critical challenges for training robust algorithms. To circumvent this problem, we propose MR-GAN, a generative adversarial network (GAN) based model, which is trained in an unsupervised fashion on entire pre-mRNA sequences to learn a low-dimensional embedding of transcriptomic sequences. MR-GAN was then applied to extract embeddings of the sequences in a training dataset we created for eight epitranscriptome modifications, including m6A, m1A, m1G, m2G, m5C, m5U, 2'-O-Me, Pseudouridine (Ψ) and Dihydrouridine (D), for which positive samples are very limited. Prediction models were trained on the embeddings extracted by MR-GAN. We compared the prediction performance with that of one-hot encoding of the training sequences and with SRAMP, a state-of-the-art m6A site prediction algorithm, and demonstrated that the learned embeddings outperform one-hot encoding by a significant margin, with up to 15% improvement. Using MR-GAN, we also investigated the sequence motifs for each modification type and uncovered known motifs as well as new motifs that could not be found from the sequences directly. The results demonstrated that transcriptome features extracted using unsupervised learning could lead to high precision for predicting multiple types of epitranscriptome modifications, even when the data size is small and extremely imbalanced.
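
The one-hot baseline mentioned in this abstract can be illustrated with a minimal sketch. The alphabet order, the mapping of T to U, and the all-zero row for unknown bases are assumptions made here for illustration; the abstract does not specify how the authors encoded their sequences.

```python
from typing import List

ALPHABET = "ACGU"  # assumed RNA alphabet and column order (not from the abstract)


def one_hot_encode(seq: str) -> List[List[int]]:
    """Return one 4-element indicator row per nucleotide.

    T is treated as U, and any base outside ALPHABET (e.g. N)
    becomes an all-zero row. Both choices are assumptions.
    """
    rows = []
    for base in seq.upper().replace("T", "U"):
        rows.append([1 if base == letter else 0 for letter in ALPHABET])
    return rows


# Hypothetical 5-nt window; each row has a single 1 marking the base.
encoded = one_hot_encode("GGACU")
```

Each position is represented independently of its neighbours, which is one reason a learned embedding may capture more sequence context than this encoding.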

2.
Annu Int Conf IEEE Eng Med Biol Soc ; 2018: 2394-2397, 2018 Jul.
Article in English | MEDLINE | ID: mdl-30440889

ABSTRACT

2'-O-methylation (2'-O-me) of the ribose moiety is a significant and ubiquitous post-transcriptional RNA modification that is vital for the metabolism and functions of RNA. Although the recent development of a new technology (Nm-seq) has enabled biologists to find the precise location of 2'-O-me in RNA sequences, there is still a lack of computational tools that can also provide high-resolution prediction of this RNA modification. In this paper, we propose a deep learning based method that takes advantage of an embedding method to learn a complex feature representation of pre-mRNA sequences and employs a Convolutional Neural Network (CNN) to fine-tune the features required for accurate prediction of this modification. Specifically, we adopted dna2vec, a biological sequence embedding method originally inspired by the word2vec model of text analysis, to yield embedded representations of sequences that may or may not contain 2'-O-me sites before feeding those features into the CNN for classification. Our model was trained using data collected from an Nm-seq experiment. The proposed method achieved AUC and auPRC scores of 90%, outperforming existing state-of-the-art algorithms by a significant margin in both balanced and unbalanced class testing scenarios.
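
Embedding methods modelled on word2vec need a sequence to be split into "words" first. The sketch below shows one common way to do that, with overlapping k-mers mapped to integer ids that could index an embedding table. The function names, the stride of 1, and k = 3 are illustrative assumptions, not details taken from the abstract or from the dna2vec implementation.

```python
from typing import Dict, List


def kmer_tokens(seq: str, k: int = 3) -> List[str]:
    """Split a sequence into overlapping k-mers with stride 1."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocab(token_lists: List[List[str]]) -> Dict[str, int]:
    """Assign each distinct k-mer an integer id in order of first appearance."""
    vocab: Dict[str, int] = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


# Hypothetical toy sequence; the ids would index rows of an embedding matrix.
tokens = kmer_tokens("ACGTAC", k=3)
vocab = build_vocab([tokens])
ids = [vocab[t] for t in tokens]
```

The resulting id sequence is what an embedding layer followed by a convolutional classifier would typically consume.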


Subject(s)
Algorithms , Methylation , Neural Networks, Computer , RNA/chemistry
3.
Bioinformatics ; 34(20): 3446-3453, 2018 10 15.
Article in English | MEDLINE | ID: mdl-29757349

ABSTRACT

Motivation: Transcription factors (TFs) bind to the promoter region of a gene to control gene expression. Identifying precise TF binding sites (TFBSs) is essential for understanding the detailed mechanisms of TF-mediated gene regulation. However, there is a shortage of computational approaches that can deliver single base pair resolution prediction of TFBSs. Results: In this paper, we propose DeepSNR, a Deep Learning algorithm for predicting TF binding location at Single Nucleotide Resolution de novo from DNA sequence. DeepSNR adopts a novel deconvolutional network (deconvNet) model and is inspired by the similarity to image segmentation by deconvNet. The proposed deconvNet architecture is constructed on top of 'DeepBind' and we trained the entire model using TF-specific data from ChIP-exonuclease (ChIP-exo) experiments. DeepSNR has been shown to outperform motif search-based methods on several evaluation metrics. We have also demonstrated the usefulness of DeepSNR in the regulatory analysis of TFBSs as well as in improving TFBS prediction specificity using ChIP-seq data. Availability and implementation: DeepSNR is available open source in the GitHub repository (https://github.com/sirajulsalekin/DeepSNR). Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Transcription Factors/metabolism , Algorithms , Base Pairing , Binding Sites , Humans , Protein Binding , Software , Transcription Factors/chemistry
4.
BMC Bioinformatics ; 18(1): 313, 2017 Jun 23.
Article in English | MEDLINE | ID: mdl-28645323

ABSTRACT

BACKGROUND: Identifying disease-correlated features early, before a large number of molecules are impacted by disease progression with significant abundance change, is very advantageous to biologists for developing early disease diagnosis biomarkers. Disease-correlated features have a relatively low level of abundance change at early stages. Finding them in high-throughput data using existing bioinformatic tools is a challenging task, since the technology suffers from limited dynamic range and significant noise. Most existing biomarker discovery algorithms can only detect molecules with high abundance changes, frequently missing early disease diagnostic markers. RESULTS: We present a new statistic called early response index (ERI) to prioritize disease-correlated molecules as potential early biomarkers. Instead of classification accuracy, ERI measures the average classification accuracy improvement attainable by a feature when it is combined with other features for classification. ERI is more sensitive to abundance changes than other ranking statistics. We have shown that ERI significantly outperforms SAM and Localfdr in detecting early responding molecules in a proteomics study of a mouse model of multiple sclerosis. Importantly, ERI was able to detect many disease-relevant proteins before those algorithms detected them at a later time point. CONCLUSIONS: The ERI method is more sensitive for significant feature detection during the early stage of disease development. It potentially has a higher specificity for biomarker discovery, and can be used to identify the critical time frame for disease intervention.
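
The idea of scoring a feature by the accuracy it adds to other features can be sketched as follows. This is only one possible reading of the sentence in the abstract and is not the published ERI formula: the pairing scheme, the nearest-centroid classifier, the resubstitution accuracy, and the names `centroid_accuracy` and `mean_accuracy_gain` are all assumptions made for illustration.

```python
from typing import List, Sequence


def centroid_accuracy(samples: Sequence[Sequence[float]],
                      labels: Sequence[int],
                      feats: List[int]) -> float:
    """Resubstitution accuracy of a nearest-centroid classifier on feats."""
    centroids = {}
    for cls in sorted(set(labels)):
        rows = [s for s, y in zip(samples, labels) if y == cls]
        centroids[cls] = [sum(r[f] for r in rows) / len(rows) for f in feats]
    correct = 0
    for s, y in zip(samples, labels):
        # Squared Euclidean distance to each class centroid; ties go to
        # the smallest class label because of the sorted insertion order.
        dist = {c: sum((s[f] - m) ** 2 for f, m in zip(feats, mu))
                for c, mu in centroids.items()}
        correct += min(dist, key=dist.get) == y
    return correct / len(samples)


def mean_accuracy_gain(samples: Sequence[Sequence[float]],
                       labels: Sequence[int],
                       i: int) -> float:
    """Average of acc({i, j}) - acc({j}) over every other feature j (assumed scheme)."""
    others = [j for j in range(len(samples[0])) if j != i]
    gains = [centroid_accuracy(samples, labels, [i, j])
             - centroid_accuracy(samples, labels, [j]) for j in others]
    return sum(gains) / len(gains)


# Hypothetical toy data: feature 0 separates the classes, features 1-2 are constant.
samples = [[0.0, 1.0, 5.0], [0.2, 1.0, 5.0], [1.0, 1.0, 5.0], [1.2, 1.0, 5.0]]
labels = [0, 0, 1, 1]
gain_informative = mean_accuracy_gain(samples, labels, 0)
gain_constant = mean_accuracy_gain(samples, labels, 1)
```

In this toy setting the informative feature receives a positive average gain while a constant feature receives none, which is the ranking behaviour such a statistic is meant to capture.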


Subject(s)
Biomarkers/metabolism , Multiple Sclerosis/diagnosis , Proteomics/methods , Algorithms , Animals , Central Nervous System/metabolism , Early Diagnosis , Mice , Multiple Sclerosis/metabolism , Multiple Sclerosis/pathology , Proteome/metabolism , Time Factors
5.
Mol Inform ; 36(4), 2017 04.
Article in English | MEDLINE | ID: mdl-28000384

ABSTRACT

In the past decades, a few synergistic feature selection algorithms have been published, including Cooperative Index (CI) and K-Top Scoring Pair (k-TSP). These algorithms consider the synergistic behavior of features when they are included in a feature panel. Although promising results have been shown for these algorithms, there is a lack of a comprehensive and fair comparison with other feature selection algorithms across a large number of microarray datasets in terms of classification accuracy and computational complexity. There is a need to evaluate their performance and to reduce the complexity of such algorithms. We compared the performance of synergistic feature selection algorithms with 11 other commonly used algorithms on 22 binary-class microarray gene expression datasets. The evaluation confirms that synergistic algorithms such as CI and k-TSP gradually increase classification performance as more features are used in the classifiers. Also, in order to cut down computational cost, we proposed a new feature selection ranking score called Positive Synergy Index (PSI). Testing results show that features selected using PSI, as well as those selected by synergistic feature selection algorithms, provide better performance than all other methods, while PSI has a computational complexity significantly lower than that of other synergistic algorithms.
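
As an illustration of what a pairwise, synergy-aware score looks like, the sketch below computes the classic top scoring pair statistic, |P(x_i < x_j | class 0) - P(x_i < x_j | class 1)|, for every feature pair and returns the best one. It covers only the single-pair case that k-TSP builds on, and does not implement CI or PSI, whose definitions are not given in the abstract. The data layout, function names, and toy values are assumptions.

```python
from itertools import combinations
from typing import Sequence, Tuple


def tsp_score(x_i: Sequence[float], x_j: Sequence[float],
              labels: Sequence[int]) -> float:
    """|P(x_i < x_j | class 0) - P(x_i < x_j | class 1)| estimated from samples.

    Assumes binary labels 0/1 with at least one sample in each class.
    """
    freq = []
    for cls in (0, 1):
        idx = [n for n, y in enumerate(labels) if y == cls]
        freq.append(sum(x_i[n] < x_j[n] for n in idx) / len(idx))
    return abs(freq[0] - freq[1])


def top_scoring_pair(features: Sequence[Sequence[float]],
                     labels: Sequence[int]) -> Tuple[float, int, int]:
    """Return (score, i, j) for the best pair; features[f][n] is feature f in sample n."""
    return max((tsp_score(features[i], features[j], labels), i, j)
               for i, j in combinations(range(len(features)), 2))


# Hypothetical toy data: features 0 and 1 swap their ordering between classes.
features = [[1, 2, 5, 6], [3, 4, 1, 2], [2, 2, 2, 2]]
labels = [0, 0, 1, 1]
best = top_scoring_pair(features, labels)
```

Scanning all pairs costs on the order of the square of the number of features, which gives a sense of why a cheaper ranking score is attractive for microarray-scale data.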


Subject(s)
Algorithms , Microarray Analysis , Humans , Neoplasms/metabolism , Neoplasms/pathology , Support Vector Machine