Search | VHL Regional Portal

DeepGSR: an optimized deep-learning structure for the recognition of genomic signals and regions.

Kalkatawi, Manal; Magana-Mora, Arturo; Jankovic, Boris; Bajic, Vladimir B.

Bioinformatics ; 35(7): 1125-1132, 2019 04 01.

Article in English | MEDLINE | ID: mdl-30184052

ABSTRACT

MOTIVATION: Recognition of different genomic signals and regions (GSRs) in DNA is crucial for understanding genome organization, gene regulation, and gene function, which in turn enables better genome and gene annotation. Although many methods have been developed to recognize GSRs, their purely computational identification remains challenging. Moreover, different GSRs usually require specialized sets of features for developing robust recognition models. Recently, deep-learning (DL) methods have been shown to generate more accurate prediction models than 'shallow' methods without the need to develop specialized features for the problems in question. Here, we explore the potential use of DL for the recognition of GSRs. RESULTS: We developed DeepGSR, an optimized DL architecture for the prediction of different types of GSRs. The performance of the DeepGSR structure is evaluated on the recognition of polyadenylation signals (PAS) and translation initiation sites (TIS) in four organisms: human, mouse, bovine and fruit fly. The results show that DeepGSR outperformed the state-of-the-art methods, reducing the classification error rate of PAS and TIS prediction in the human genome by up to 29% and 86%, respectively. Moreover, the cross-organism and genome-wide analyses we performed confirmed the robustness of DeepGSR and provided new insights into the conservation of the examined GSRs across species. AVAILABILITY AND IMPLEMENTATION: DeepGSR is implemented in Python using the Keras API; it is available as open-source software at https://doi.org/10.5281/zenodo.1117159. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
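DL models of this kind need each DNA window turned into a numeric matrix before training. The sketch below shows one-hot encoding, a common representation for such models; it is an assumption for illustration and not necessarily the encoding DeepGSR uses. The function name `one_hot_encode` is hypothetical.

```python
from typing import List

# Column order of the one-hot matrix (assumed convention: A, C, G, T).
NUCLEOTIDES = "ACGT"

def one_hot_encode(seq: str) -> List[List[int]]:
    """Return a len(seq) x 4 matrix; unknown bases such as N become all-zero rows."""
    matrix = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        idx = NUCLEOTIDES.find(base)  # -1 for anything outside A/C/G/T
        if idx >= 0:
            row[idx] = 1
        matrix.append(row)
    return matrix

if __name__ == "__main__":
    # A canonical poly(A) hexamer followed by an ambiguous base.
    for row in one_hot_encode("AATAAAN"):
        print(row)
```

A matrix like this could be converted to an array and stacked into a batch for a Keras model; mapping N to zeros (rather than 0.25 per base) is just one possible design choice.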

Subject(s)

Deep Learning , Genome , Genomics , Software , Animals , Cattle , Drosophila/genetics , Genome/genetics , Genome-Wide Association Study , Genomics/methods , Humans , Mice , Sequence Analysis, DNA , Software/standards

Omni-PolyA: a method and tool for accurate recognition of Poly(A) signals in human genomic DNA.

Magana-Mora, Arturo; Kalkatawi, Manal; Bajic, Vladimir B.

BMC Genomics ; 18(1): 620, 2017 Aug 15.

Article in English | MEDLINE | ID: mdl-28810905

ABSTRACT

BACKGROUND: Polyadenylation is a critical stage of RNA processing during the formation of mature mRNA, and is present in most known eukaryotic protein-coding transcripts and many long non-coding RNAs. The correct identification of poly(A) signals (PAS) not only helps to elucidate the 3'-end genomic boundaries of a transcribed DNA region and gene regulatory mechanisms but also gives insight into the multiple transcript isoforms resulting from alternative PAS. Although progress has been made in the in-silico prediction of genomic signals, the recognition of PAS in genomic DNA sequences remains a challenge. RESULTS: In this study, we analyzed human genomic DNA sequences for the 12 most common PAS variants. Our analysis identified a set of features that helps in the recognition of true PAS and may be involved in the regulation of the polyadenylation process. The proposed features, in combination with a recognition model, resulted in a novel method and tool, Omni-PolyA. Omni-PolyA combines several machine learning techniques, such as different classifiers arranged in a tree-like decision structure and genetic algorithms, to derive a robust classification model. We compared the results obtained by state-of-the-art methods, deep neural networks, and Omni-PolyA. The results show that Omni-PolyA significantly reduced the average classification error rate in the prediction of the 12 considered PAS variants by 35.37% relative to the state-of-the-art results. CONCLUSIONS: The results of our study demonstrate that Omni-PolyA is currently the most accurate model for the prediction of PAS in humans and can serve as a useful complement to other PAS recognition methods. Omni-PolyA is publicly available as an online tool accessible at www.cbrc.kaust.edu.sa/omnipolya/ .
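Figures such as the 35.37% above (and the 29% and 86% reported for DeepGSR) read most naturally as relative reductions of an error rate. Assuming that interpretation, the arithmetic is 100 × (e_old − e_new) / e_old. The sketch below uses made-up error rates; they are not values from either paper, and how the per-variant results were averaged is not stated here.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Relative reduction of an error rate, in percent: 100 * (e_old - e_new) / e_old."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error

if __name__ == "__main__":
    # Hypothetical example: a baseline error of 20% lowered to 13%.
    print(relative_error_reduction(0.20, 0.13))  # approximately 35.0
```

Note that a relative reduction of 35% is not the same as a drop of 35 percentage points; in the example the absolute drop is only 7 points.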

Subject(s)

Genome, Human/genetics , Poly A/metabolism , Data Mining , Genomics , Humans , Polyadenylation/genetics

BEACON: automated tool for Bacterial GEnome Annotation ComparisON.

Kalkatawi, Manal; Alam, Intikhab; Bajic, Vladimir B.

BMC Genomics ; 16: 616, 2015 Aug 18.

Article in English | MEDLINE | ID: mdl-26283419

ABSTRACT

BACKGROUND: Genome annotation is one way of summarizing the existing knowledge about the genomic characteristics of an organism. Interest in computer-based structural and functional genome annotation has increased over the last several decades, and many methods for this purpose have been developed for eukaryotes and prokaryotes. Our study focuses on the comparison of functional annotations of prokaryotic genomes. To the best of our knowledge, there is no fully automated system for detailed comparison of functional genome annotations generated by different annotation methods (AMs). RESULTS: The existence of many AMs and the development of new ones introduce the need to (a) compare different annotations of a single genome and (b) generate an annotation by combining individual ones. To address these needs, we developed an Automated Tool for Bacterial GEnome Annotation ComparisON (BEACON) that benefits both AM developers and annotation analysers. BEACON provides a detailed comparison of gene function annotations of prokaryotic genomes obtained by different AMs and generates extended annotations by combining individual ones. To illustrate BEACON's utility, we provide a comparative analysis of multiple annotations generated for four genomes and show on these examples that the extended annotation can increase the number of genes annotated with putative functions by up to 27%, while reducing the number of genes without any function assignment. CONCLUSIONS: We developed BEACON, a fast tool for automated and systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under the GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/ .
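The idea of an "extended annotation" can be illustrated with a deliberately simplified sketch: each gene receives every informative function that any method assigned, so genes left unannotated by one AM can be covered by another. This is not BEACON's algorithm. Matching here is by exact string, the set of uninformative placeholder descriptions is an assumption, and the gene IDs and functions are invented.

```python
from typing import Dict, Set

# Descriptions treated as "no function assigned" (assumed list, for illustration only).
UNINFORMATIVE = {"", "hypothetical protein", "unknown function"}

def is_informative(function: str) -> bool:
    """True if the description carries a putative function."""
    return function.strip().lower() not in UNINFORMATIVE

def annotated_genes(method_annotation: Dict[str, str]) -> Set[str]:
    """Genes to which one annotation method assigns an informative function."""
    return {gene for gene, function in method_annotation.items() if is_informative(function)}

def extend_annotation(annotations: Dict[str, Dict[str, str]]) -> Dict[str, Set[str]]:
    """Per gene, the union of informative functions across all methods (exact string match)."""
    extended: Dict[str, Set[str]] = {}
    for method_annotation in annotations.values():
        for gene, function in method_annotation.items():
            functions = extended.setdefault(gene, set())
            if is_informative(function):
                functions.add(function.strip())
    return extended

if __name__ == "__main__":
    am1 = {"g1": "DNA gyrase subunit A", "g2": "hypothetical protein", "g3": ""}
    am2 = {"g1": "DNA gyrase subunit A", "g2": "ABC transporter permease", "g3": "hypothetical protein"}
    extended = extend_annotation({"AM1": am1, "AM2": am2})
    with_function = {gene for gene, functions in extended.items() if functions}
    print(len(annotated_genes(am1)), len(with_function))  # prints: 1 2
```

A real comparison tool would also need to decide when two differently worded descriptions denote the same function, which exact matching cannot do.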

Subject(s)

Genome, Bacterial , Molecular Sequence Annotation/methods , Computational Biology/methods , Databases, Genetic , Software

INDIGO - INtegrated data warehouse of microbial genomes with examples from the red sea extremophiles.

Alam, Intikhab; Antunes, André; Kamau, Allan Anthony; Ba Alawi, Wail; Kalkatawi, Manal; Stingl, Ulrich; Bajic, Vladimir B.

PLoS One ; 8(12): e82210, 2013.

Article in English | MEDLINE | ID: mdl-24324765

ABSTRACT

BACKGROUND: Next-generation sequencing technologies have substantially increased the throughput of microbial genome sequencing. To functionally annotate newly sequenced microbial genomes, a variety of experimental and computational methods are used. Integrating information from different sources is a powerful approach to enhance such annotation. Functional analysis of microbial genomes, necessary for downstream experiments, crucially depends on this annotation, but it is hampered by the current lack of suitable information integration and exploration systems for microbial genomes. RESULTS: We developed a data warehouse system (INDIGO) that enables the integration of annotations for exploration and analysis of newly sequenced microbial genomes. INDIGO makes it possible to construct complex queries and combine annotations from multiple sources, from the genomic sequence level to the protein domain, gene ontology and pathway levels. This data warehouse is intended to be populated with information from genomes of pure cultures and uncultured single cells of Red Sea Bacteria and Archaea. Currently, INDIGO contains information from Salinisphaera shabanensis, Haloplasma contractile, and Halorhabdus tiamatea, extremophiles isolated from deep-sea anoxic brine lakes of the Red Sea. We provide examples of using the system to gain new insights into specific aspects of the unique lifestyle and adaptations of these organisms to extreme environments. CONCLUSIONS: We developed a data warehouse system, INDIGO, which enables comprehensive integration of information from various resources to be used for annotation, exploration and analysis of microbial genomes. It will be regularly updated and extended with new genomes, and it is intended to serve as a resource dedicated to Red Sea microbes. In addition, through INDIGO we provide our Automatic Annotation of Microbial Genomes (AAMG) pipeline. The INDIGO web server is freely available at http://www.cbrc.kaust.edu.sa/indigo.
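A query that combines annotation levels, such as "genes carrying a given protein domain that are also assigned to a given pathway", is essentially a join across tables. The sketch below uses Python's built-in `sqlite3` with an invented three-table schema and placeholder rows; it does not reflect INDIGO's actual schema or contents.

```python
import sqlite3
from typing import List, Tuple

def build_demo_warehouse() -> sqlite3.Connection:
    """In-memory toy warehouse; tables, columns and rows are all hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE gene (gene_id TEXT PRIMARY KEY, organism TEXT NOT NULL);
        CREATE TABLE domain (gene_id TEXT REFERENCES gene(gene_id), domain_name TEXT);
        CREATE TABLE pathway (gene_id TEXT REFERENCES gene(gene_id), pathway_name TEXT);
    """)
    conn.executemany("INSERT INTO gene VALUES (?, ?)",
                     [("g1", "orgA"), ("g2", "orgA"), ("g3", "orgB")])
    conn.executemany("INSERT INTO domain VALUES (?, ?)",
                     [("g1", "domain_X"), ("g2", "domain_Y"), ("g3", "domain_X")])
    conn.executemany("INSERT INTO pathway VALUES (?, ?)",
                     [("g1", "pathway_P"), ("g2", "pathway_P"), ("g3", "pathway_Q")])
    return conn

def genes_with_domain_in_pathway(conn: sqlite3.Connection, domain_name: str,
                                 pathway_name: str) -> List[Tuple[str, str]]:
    """Join the domain and pathway levels through the shared gene identifier."""
    return conn.execute("""
        SELECT DISTINCT g.gene_id, g.organism
        FROM gene AS g
        JOIN domain AS d ON d.gene_id = g.gene_id
        JOIN pathway AS p ON p.gene_id = g.gene_id
        WHERE d.domain_name = ? AND p.pathway_name = ?
        ORDER BY g.gene_id
    """, (domain_name, pathway_name)).fetchall()

if __name__ == "__main__":
    conn = build_demo_warehouse()
    print(genes_with_domain_in_pathway(conn, "domain_X", "pathway_P"))  # [('g1', 'orgA')]
    conn.close()
```

Keeping each annotation source in its own table and linking by gene identifier is what lets new sources be added without changing existing queries.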

Subject(s)

Archaea/genetics , Bacteria/genetics , Databases, Genetic , Genome, Microbial/genetics , Benzoates/metabolism , Biodegradation, Environmental , Genome, Bacterial , Indian Ocean , Molecular Sequence Annotation , Search Engine , Software , User-Computer Interface

Dragon PolyA Spotter: predictor of poly(A) motifs within human genomic DNA sequences.

Kalkatawi, Manal; Rangkuti, Farania; Schramm, Michael; Jankovic, Boris R; Kamau, Allan; Chowdhary, Rajesh; Archer, John A C; Bajic, Vladimir B.

Bioinformatics ; 29(11): 1484, 2013 Jun 01.

Article in English | MEDLINE | ID: mdl-23616439

Subject(s)

Algorithms , Neural Networks, Computer , Poly A/analysis , Humans

Dragon PolyA Spotter: predictor of poly(A) motifs within human genomic DNA sequences.

Kalkatawi, Manal; Rangkuti, Farania; Schramm, Michael; Jankovic, Boris R; Kamau, Allan; Chowdhary, Rajesh; Archer, John A C; Bajic, Vladimir B.

Bioinformatics ; 28(1): 127-9, 2012 Jan 01.

Article in English | MEDLINE | ID: mdl-22088842

ABSTRACT

MOTIVATION: Recognition of poly(A) signals in mRNA is relatively straightforward due to the presence of an easily recognizable polyadenylic acid tail. However, identifying the poly(A) motifs in the primary genomic DNA sequence that correspond to poly(A) signals in mRNA is a far more challenging problem. Recognition of poly(A) signals is important for better gene annotation and for understanding gene regulation mechanisms. In this work, we present a poly(A) motif prediction method based on properties of the human genomic DNA sequence surrounding a poly(A) motif. These properties include thermodynamic, physico-chemical and statistical characteristics. For prediction, we developed Artificial Neural Network and Random Forest models, trained to recognize the 12 most common poly(A) motifs in human DNA. Our predictors are available as a free web-based tool accessible at http://cbrc.kaust.edu.sa/dps. Compared with other reported predictors, our models achieve higher sensitivity and specificity and provide a consistent level of accuracy across all 12 poly(A) motif variants. CONTACT: vladimir.bajic@kaust.edu.sa SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Algorithms , Neural Networks, Computer , Poly A/analysis , Genome, Human , Humans , Internet , Poly A/genetics , Sensitivity and Specificity , Software
