Search | VHL Regional Portal

Integration of Meta-Multi-Omics Data Using Probabilistic Graphs and External Knowledge.

Can, Handan; Chanumolu, Sree K; Nielsen, Barbara D; Alvarez, Sophie; Naldrett, Michael J; Ünlü, Gülhan; Otu, Hasan H.

Cells ; 12(15), 2023 Aug 04.

Article in English | MEDLINE | ID: mdl-37566077

ABSTRACT

Multi-omics has the promise to provide a detailed molecular picture of biological systems. Although obtaining multi-omics data is relatively easy, methods that analyze such data have been lagging. In this paper, we present an algorithm that uses probabilistic graph representations and external knowledge to perform optimal structure learning and deduce a multifarious interaction network for multi-omics data from a bacterial community. Kefir grain, a microbial community that ferments milk and creates kefir, represents a self-renewing, stable, natural microbial community. Kefir has been shown to have a wide range of health benefits. We obtained a controlled bacterial community using the two most abundant and well-studied species in kefir grains: Lentilactobacillus kefiri and Lactobacillus kefiranofaciens. We applied growth temperatures of 30 °C and 37 °C and obtained transcriptomic, metabolomic, and proteomic data for the same 20 samples (10 samples per temperature). We obtained a multi-omics interaction network, which generated insights that would not have been possible with single-omics analysis. We identified interactions among transcripts, proteins, and metabolites, suggesting active toxin/antitoxin systems. We also observed multifarious interactions that involved the shikimate pathway. These observations helped explain bacterial adaptation to different stress conditions, co-aggregation, and increased activation of L. kefiranofaciens at 37 °C.

Subject(s)

Cultured Milk Products , Cultured Milk Products/microbiology , Multiomics , Proteomics , Bacteria/genetics

Identifying large-scale interaction atlases using probabilistic graphs and external knowledge.

Chanumolu, Sree K; Otu, Hasan H.

J Clin Transl Sci ; 6(1): e27, 2022.

Article in English | MEDLINE | ID: mdl-35321220

ABSTRACT

Introduction: Reconstruction of gene interaction networks from experimental data provides a deep understanding of the underlying biological mechanisms. The noisy nature of the data and the large size of the network make this a very challenging task. Complex approaches handle the stochastic nature of the data but can only do this for small networks; simpler, linear models generate large networks but with less reliability. Methods: We propose a divide-and-conquer approach using probabilistic graph representations and external knowledge. We cluster the experimental data and learn an interaction network for each cluster, which are merged using the interaction network for the representative genes selected for each cluster. Results: We generated an interaction atlas for 337 human pathways yielding a network of 11,454 genes with 17,777 edges. Simulated gene expression data from this atlas formed the basis for reconstruction. Based on the area under the curve of the precision-recall curve, the proposed approach outperformed the baseline (random classifier) by ~15-fold and conventional methods by ~5-17-fold. The performance of the proposed workflow is significantly linked to the accuracy of the clustering step that tries to identify the modularity of the underlying biological mechanisms. Conclusions: We provide an interaction atlas generation workflow optimizing the algorithm/parameter selection. The proposed approach integrates external knowledge in the reconstruction of the interactome using probabilistic graphs. Network characterization and understanding long-range effects in interaction atlases provide means for comparative analysis with implications in biomarker discovery and therapeutic approaches. The proposed workflow is freely available at http://otulab.unl.edu/atlas.
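
The evaluation described above compares the area under the precision-recall curve against a random-classifier baseline. The sketch below is only an illustration of that kind of comparison, not the authors' code: it uses average precision, one common estimator of that area, and takes the baseline to be the fraction of candidate edges that are true edges. The function names and the toy scores are hypothetical.

```python
def average_precision(scores, labels):
    """Average precision: mean of the precision at each rank where a true edge appears.

    scores: predicted confidence per candidate edge (higher = more likely).
    labels: 1 if the candidate edge is in the reference network, else 0.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, is_edge) in enumerate(ranked, start=1):
        if is_edge:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def random_baseline(labels):
    """Expected precision of a random ranking: the fraction of true edges."""
    return sum(labels) / len(labels)


# Toy example (hypothetical values): four candidate edges, two of them real.
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
fold_over_baseline = average_precision(scores, labels) / random_baseline(labels)
```

In a sparse network the baseline is small, so fold improvements over it are a more informative summary than the raw area alone.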

KEGG2Net: Deducing gene interaction networks and acyclic graphs from KEGG pathways.

Chanumolu, Sree K; Albahrani, Mustafa; Can, Handan; Otu, Hasan H.

EMBnet J ; 26, 2021.

Article in English | MEDLINE | ID: mdl-33880340

ABSTRACT

The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database provides a manual curation of biological pathways that involve genes (or gene products), metabolites, chemical compounds, maps, and other entries. However, most applications and datasets involved in omics are gene- or protein-centric, requiring pathway representations that include direct and indirect interactions only between genes. Furthermore, special methodologies, such as Bayesian networks, require acyclic representations of graphs. We developed KEGG2Net, a web resource that generates a network involving only the genes represented on a KEGG pathway with all of the direct and indirect gene-gene interactions deduced from the pathway. KEGG2Net offers four different methods to remove cycles from the resulting gene interaction network, converting them into directed acyclic graphs (DAGs). We generated synthetic gene expression data using the gene interaction networks deduced from the KEGG pathways and performed a comparative analysis of different cycle removal methods by testing the fitness of their DAGs to the data and by the number of edges they eliminate. Our results indicate that an ensemble method for cycle removal performs as the best approach to convert the gene interaction networks into DAGs. Resulting gene interaction networks and DAGs are represented in multiple user-friendly formats that can be used in other applications, and as images for quick and easy visualisation. The KEGG2Net web portal converts KEGG maps for any organism into gene-gene interaction networks and corresponding DAGs representing all of the direct and indirect interactions among the genes.
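
The abstract does not name the four cycle-removal methods, so the sketch below should be read as a generic illustration of the idea rather than any of KEGG2Net's methods: a depth-first search that drops every edge pointing back to a node still on the current search path. The function name and the gene labels are hypothetical.

```python
def remove_back_edges(graph):
    """Return (dag, removed): the graph without DFS back edges, and the dropped edges.

    graph: dict mapping each node to a list of its direct successors.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    color = {n: WHITE for n in nodes}
    dag = {n: [] for n in nodes}
    removed = []

    def visit(u):
        color[u] = GRAY
        for v in graph.get(u, []):
            if color[v] == GRAY:
                # v is an ancestor of u on the current path: this edge closes a cycle.
                removed.append((u, v))
            else:
                dag[u].append(v)
                if color[v] == WHITE:
                    visit(v)
        color[u] = BLACK

    # Sorted start order keeps the result deterministic.
    for n in sorted(nodes):
        if color[n] == WHITE:
            visit(n)
    return dag, removed


# Toy network (hypothetical genes): A -> B -> C -> A forms a cycle, C -> D does not.
dag, removed = remove_back_edges({"A": ["B"], "B": ["C"], "C": ["A", "D"]})
```

Which edges are dropped depends on the traversal order, which is one reason to compare several removal strategies by the number of edges they eliminate, as the abstract describes. The recursive form is adequate for pathway-sized graphs; a very deep graph would need an iterative traversal.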

Comparative analysis of single-cell transcriptomics in human and Zebrafish oocytes.

Can, Handan; Chanumolu, Sree K; Gonzalez-Muñoz, Elena; Prukudom, Sukumal; Otu, Hasan H; Cibelli, Jose B.

BMC Genomics ; 21(1): 471, 2020 Jul 08.

Article in English | MEDLINE | ID: mdl-32640983

ABSTRACT

BACKGROUND: Zebrafish is a popular model organism, which is widely used in developmental biology research. Despite its general use, the direct comparison of the zebrafish and human oocyte transcriptomes has not been well studied. It is important to determine whether the similarity observed between the two organisms at the gene sequence level is also observed at the expression level in key cell types such as the oocyte. RESULTS: We performed single-cell RNA-seq of the zebrafish oocyte and compared it with two studies that have performed single-cell RNA-seq of the human oocyte. We carried out a comparative analysis of genes expressed in the oocyte and genes highly expressed in the oocyte across the three studies. Overall, we found high consistency between the human studies and high concordance in expression for the orthologous genes in the two organisms. According to the Ensembl database, about 60% of the human protein coding genes are orthologous to the zebrafish genes. Our results showed that a higher percentage of the genes that are highly expressed in both organisms show orthology compared to the lower expressed genes. Systems biology analysis of the genes highly expressed in the three studies showed significant overlap of the enriched pathways and GO terms. Moreover, orthologous genes that are commonly overexpressed in both organisms were involved in biological mechanisms that are functionally essential to the oocyte. CONCLUSIONS: Orthologous genes are concurrently highly expressed in the oocytes of the two organisms and these genes belong to similar functional categories. Our results provide evidence that zebrafish could serve as a valid model organism to study the oocyte with direct implications for humans.

Subject(s)

Oocytes/metabolism , Transcriptome , Zebrafish/genetics , Animals , Humans , RNA-Seq , Single-Cell Analysis , Zebrafish/metabolism

FQStat: a parallel architecture for very high-speed assessment of sequencing quality metrics.

Chanumolu, Sree K; Albahrani, Mustafa; Otu, Hasan H.

BMC Bioinformatics ; 20(1): 424, 2019 Aug 15.

Article in English | MEDLINE | ID: mdl-31416440

ABSTRACT

BACKGROUND: High throughput DNA/RNA sequencing has revolutionized biological and clinical research. Sequencing is widely used, and generates very large amounts of data, mainly due to reduced cost and advanced technologies. Quickly assessing the quality of giga-to-tera base levels of sequencing data has become a routine but important task. Identification and elimination of low-quality sequence data is crucial for reliability of downstream analysis results. There is a need for a high-speed tool that uses optimized parallel programming for batch processing and simply gauges the quality of sequencing data from multiple datasets independent of any other processing steps. RESULTS: FQStat is a stand-alone, platform-independent software tool that assesses the quality of FASTQ files using parallel programming. Based on the machine architecture and input data, FQStat automatically determines the number of cores and the amount of memory to be allocated per file for optimum performance. Our results indicate that in a core-limited case, core assignment overhead exceeds the benefit of additional cores. In a core-unlimited case, there is a saturation point reached in performance by increasingly assigning additional cores per file. We also show that memory allocation per file has a lower priority in performance when compared to the allocation of cores. FQStat's output is summarized in HTML web page, tab-delimited text file, and high-resolution image formats. FQStat calculates and plots read count, read length, quality score, and high-quality base statistics. FQStat identifies and marks low-quality sequencing data to suggest removal from downstream analysis. We applied FQStat on real sequencing data to optimize performance and to demonstrate its capabilities. We also compared FQStat's performance to similar quality control (QC) tools that utilize parallel programming and attained improvements in run time. 
CONCLUSIONS: FQStat is a user-friendly tool with a graphical interface that employs a parallel programming architecture and automatically optimizes its performance to generate quality control statistics for sequencing data. Unlike existing tools, these statistics are calculated for multiple datasets and separately at the "lane," "sample," and "experiment" level to identify subsets of the samples with low quality, thereby preventing the loss of complete samples when reliable data can still be obtained.
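
The statistics listed above (read count, read length, quality score, and high-quality bases) can be illustrated with a short single-threaded sketch. This is not FQStat's implementation: it assumes unwrapped four-line FASTQ records, Phred+33 quality encoding, and a high-quality cutoff of Q30, none of which is stated in the abstract, and the function name is hypothetical.

```python
def fastq_stats(lines, hq_threshold=30, phred_offset=33):
    """Summarize FASTQ records given as an iterable of text lines.

    Assumes each record is exactly four lines (header, sequence, '+', qualities)
    and that qualities are Phred scores encoded as ASCII with the given offset.
    """
    read_count = 0
    total_bases = 0
    quality_sum = 0
    hq_bases = 0
    for i, line in enumerate(lines):
        if i % 4 != 3:  # only the fourth line of each record holds qualities
            continue
        quals = [ord(ch) - phred_offset for ch in line.rstrip("\n")]
        read_count += 1
        total_bases += len(quals)
        quality_sum += sum(quals)
        hq_bases += sum(1 for q in quals if q >= hq_threshold)
    return {
        "read_count": read_count,
        "mean_length": total_bases / read_count if read_count else 0.0,
        "mean_quality": quality_sum / total_bases if total_bases else 0.0,
        "hq_fraction": hq_bases / total_bases if total_bases else 0.0,
    }


# Toy input (hypothetical reads): 'I' encodes Q40 and '!' encodes Q0 under Phred+33.
stats = fastq_stats(["@r1", "ACGT", "+", "IIII", "@r2", "AC", "+", "!!"])
```

Because each file is summarized independently, a function of this shape could be mapped over files by a process pool; how cores and memory are divided per file is the tuning question the abstract reports on.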

Subject(s)

High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Software , Databases, Nucleic Acid , Humans , Quality Control , Sequence Analysis, DNA , Sequence Analysis, RNA , Time Factors

EpiCombFlu: exploring known influenza epitopes and their combination to design a universal influenza vaccine.

Jaiswal, Varun; Chanumolu, Sree K; Sharma, Pankaj; Chauhan, Rajinder S; Rout, Chittaranjan.

Bioinformatics ; 29(15): 1904-7, 2013 Aug 01.

Article in English | MEDLINE | ID: mdl-23716197

ABSTRACT

MOTIVATION: Influenza is responsible for half a million deaths annually, and vaccination is the best preventive measure against this pervasive health problem. Influenza vaccines developed from surveillance data of each season are strain-specific, and therefore, are unable to provide protection against pandemic strains arising from antigenic shift and drift. Seasonal epidemics and occasional pandemics of influenza have created a need for a universal influenza vaccine (UIV). Researchers have shown that a combination of conserved epitopes has the potential to be used as a UIV. RESULT: In the present work, available data on strains, proteins, epitopes and their associated information were used to develop a Web resource, 'EpiCombFlu', which can explore different influenza epitopes and their combinations for conservation among different strains, population coverage and immune response for vaccine design. A forward selection algorithm was implemented in EpiCombFlu to select the optimum combination of epitopes that may be expressed and evaluated as a potential UIV. AVAILABILITY: The Web resource is freely available at http://117.211.115.67/influenza/home.html. CONTACT: chittaranjan.rout@juit.ac.in SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
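
Forward selection, as mentioned above, builds a combination one item at a time, each time adding the candidate that improves the objective most. The sketch below assumes the objective is simply the number of strains covered; the abstract does not specify how EpiCombFlu weighs conservation, population coverage, and immune response, so this is an illustration of the general technique only. The function name, epitope labels, and strain sets are hypothetical.

```python
def forward_select(epitope_strains, max_epitopes):
    """Greedy forward selection of epitopes by added strain coverage.

    epitope_strains: dict mapping an epitope label to the set of strains it covers.
    Returns (selected, covered): chosen epitopes in order and the strains they cover.
    """
    selected = []
    covered = set()
    candidates = dict(epitope_strains)
    while candidates and len(selected) < max_epitopes:
        # Iterating over sorted labels makes ties resolve alphabetically.
        best = max(sorted(candidates), key=lambda e: len(candidates[e] - covered))
        gain = candidates[best] - covered
        if not gain:
            break  # no remaining epitope adds coverage
        selected.append(best)
        covered |= gain
        del candidates[best]
    return selected, covered


# Toy data (hypothetical): three epitopes and the strains each one is conserved in.
epitopes = {
    "E1": {"H1N1", "H3N2"},
    "E2": {"H3N2", "H5N1"},
    "E3": {"H5N1"},
}
selected, covered = forward_select(epitopes, max_epitopes=2)
```

Greedy selection does not guarantee the best possible combination, but it avoids enumerating every subset of epitopes, which grows exponentially with the number of candidates.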

Subject(s)

Epitopes/chemistry , Influenza Vaccines/immunology , Software , Algorithms , Epitopes/immunology , Humans , Internet , Orthomyxoviridae/classification , Orthomyxoviridae/immunology , Sequence Analysis, Protein , Viral Proteins/chemistry , Viral Proteins/immunology
