Search | VHL Regional Portal

1.

Target repositioning using multi-layer networks and machine learning: The case of prostate cancer.

Picard, Milan; Scott-Boyer, Marie-Pier; Bodein, Antoine; Leclercq, Mickaël; Prunier, Julien; Périn, Olivier; Droit, Arnaud.

Comput Struct Biotechnol J ; 24: 464-475, 2024 Dec.

Article in English | MEDLINE | ID: mdl-38983753

ABSTRACT

The discovery of novel therapeutic targets, defined as proteins which drugs can interact with to induce therapeutic benefits, typically represents the first and most important step of drug discovery. One solution for target discovery is target repositioning, a strategy which relies on the repurposing of known targets for new diseases, leading to new treatments, fewer side effects and potential drug synergies. Biological networks have emerged as powerful tools for integrating heterogeneous data and facilitating the prediction of biological or therapeutic properties. Consequently, they are widely employed to predict new therapeutic targets by characterizing potential candidates, often based on their interactions within a Protein-Protein Interaction (PPI) network, and their proximity to genes associated with the disease. However, over-reliance on PPI networks and the assumption that potential targets are necessarily near known genes can introduce biases that may limit the effectiveness of these methods. This study addresses these limitations in two ways. First, it exploits a multi-layer network which incorporates additional information such as gene regulation, metabolite interactions, metabolic pathways, and several disease signatures such as Differentially Expressed Genes, mutated genes, Copy Number Alteration, and structural variants. Second, it extracts relevant features from the network using several approaches, including proximity to disease-associated genes but also unbiased approaches such as propagation-based methods, topological metrics, and module detection algorithms. Using prostate cancer as a case study, the best features were identified and utilized to train machine learning algorithms to predict 5 novel promising therapeutic targets for prostate cancer: IGF2R, C5AR, RAB7, SETD2 and NPBWR1.

2.

A strategy to detect metabolic changes induced by exposure to chemicals from large sets of condition-specific metabolic models computed with enumeration techniques.

Fresnais, Louison; Perin, Olivier; Riu, Anne; Grall, Romain; Ott, Alban; Fromenty, Bernard; Gallardo, Jean-Clément; Stingl, Maximilian; Frainay, Clément; Jourdan, Fabien; Poupin, Nathalie.

BMC Bioinformatics ; 25(1): 234, 2024 Jul 11.

Article in English | MEDLINE | ID: mdl-38992584

ABSTRACT

BACKGROUND: The growing abundance of in vitro omics data, coupled with the necessity to reduce animal testing in the safety assessment of chemical compounds and even eliminate it in the evaluation of cosmetics, highlights the need for adequate computational methodologies. Data from omics technologies allow the exploration of a wide range of biological processes, therefore providing a better understanding of mechanisms of action (MoA) related to chemical exposure in biological systems. However, the analysis of these large datasets remains difficult due to the complexity of modulations spanning multiple biological processes. RESULTS: To address this, we propose a strategy to reduce information overload by computing, based on transcriptomics data, a comprehensive metabolic sub-network reflecting the metabolic impact of a chemical. The proposed strategy integrates transcriptomic data into a genome-scale metabolic network through enumeration of condition-specific metabolic models, hence translating transcriptomics data into reaction activity probabilities. Based on these results, a graph algorithm is applied to retrieve user-readable sub-networks reflecting the possible metabolic MoA (mMoA) of chemicals. This strategy has been implemented as a three-step workflow. The first step consists of building cell condition-specific models reflecting the metabolic impact of each exposure condition while taking into account the diversity of possible optimal solutions with a partial enumeration algorithm. In a second step, we address the challenge of analyzing thousands of enumerated condition-specific networks by computing differentially activated reactions (DARs) between the two sets of enumerated possible condition-specific models. Finally, in the third step, DARs are grouped into clusters of functionally interconnected metabolic reactions, representing possible mMoA, using the distance-based clustering and subnetwork extraction method.
The first part of the workflow was exemplified on eight molecules selected for their known human hepatotoxic outcomes associated with specific MoAs well described in the literature and for which we retrieved primary human hepatocytes transcriptomic data in Open TG-GATEs. Then, we further applied this strategy to more precisely model and visualize associated mMoA for two of these eight molecules (amiodarone and valproic acid). The approach proved to go beyond gene-based analysis by identifying mMoA when few genes are significantly differentially expressed (2 differentially expressed genes (DEGs) for amiodarone), bringing additional information from the network topology, or when a very large number of genes were differentially expressed (5709 DEGs for valproic acid). In both cases, the results of our strategy agreed well with evidence from the literature regarding known MoA. Beyond these confirmations, the workflow highlighted potential other unexplored mMoA. CONCLUSION: The proposed strategy allows toxicology experts to decipher which part of cellular metabolism is expected to be affected by exposure to a given chemical. The originality of the approach resides in the combination of different metabolic modelling approaches (constraint-based and graph modelling). The application to two model molecules shows the strong potential of the approach for interpretation and visual mining of complex omics in vitro data. The presented strategy is freely available as a Python module (https://pypi.org/project/manamodeller/) and Jupyter notebooks (https://github.com/LouisonF/MANA).

Subjects

Algorithms; Humans; Metabolic Networks and Pathways/drug effects; Models, Biological; Computational Biology/methods; Transcriptome/genetics; Transcriptome/drug effects; Gene Expression Profiling/methods

3.

Use of Elasticsearch-based business intelligence tools for integration and visualization of biological data.

Scott-Boyer, Marie-Pier; Dufour, Pascal; Belleau, François; Ongaro-Carcy, Regis; Plessis, Clément; Périn, Olivier; Droit, Arnaud.

Brief Bioinform ; 24(6), 2023 Sep 22.

Article in English | MEDLINE | ID: mdl-37798252

ABSTRACT

The emergence of massive datasets exploring the multiple levels of molecular biology has made their analysis and knowledge transfer more complex. Flexible tools to manage big biological datasets could be of great help for standardizing the usage of developed data visualizations and integration methods. Business intelligence (BI) tools have been used in many fields as exploratory tools. They have numerous connectors to link many data repositories with a unified graphic interface, offering an overview of data and facilitating interpretation for decision makers. BI tools could be a flexible and user-friendly way of handling molecular biological data with interactive visualizations. However, it is rather uncommon to see such tools used for the exploration of massive and complex datasets in biological fields. We believe that two main obstacles could be the reason. Firstly, we posit that the ways of importing data into BI tools are not compatible with biological databases. Secondly, BI tools may not be adapted to certain particularities of complex biological data, namely, the size, the variability of datasets and the availability of specialized visualizations. This paper highlights the use of five BI tools (Elastic Kibana, Siren Investigate, Microsoft Power BI, Salesforce Tableau and Apache Superset) with which the massive data management repository engine called Elasticsearch is compatible. Four case studies will be discussed in which these BI tools were applied to biological datasets with different characteristics. We conclude that the performance of the tools depends on the complexity of the biological questions and the size of the datasets.

Subjects

Datasets as Topic; Software; Data Visualization

4.

Overview of methods for characterization and visualization of a protein-protein interaction network in a multi-omics integration context.

Robin, Vivian; Bodein, Antoine; Scott-Boyer, Marie-Pier; Leclercq, Mickaël; Périn, Olivier; Droit, Arnaud.

Front Mol Biosci ; 9: 962799, 2022.

Article in English | MEDLINE | ID: mdl-36158572

ABSTRACT

At the heart of the cellular machinery through the regulation of cellular functions, protein-protein interactions (PPIs) have a significant role. PPIs can be analyzed with network approaches. Construction of a PPI network requires prediction of the interactions. All PPIs form a network. Different biases such as lack of data, recurrence of information, and false interactions make the network unstable. Integrated strategies allow solving these different challenges. These approaches have shown encouraging results for the understanding of molecular mechanisms, drug action mechanisms, and identification of target genes. In order to give more importance to an interaction, it is evaluated by different confidence scores. These scores allow the filtration of the network and thus facilitate the representation of the network, essential steps to the identification and understanding of molecular mechanisms. In this review, we will discuss the main computational methods for predicting PPI, including ones confirming an interaction as well as the integration of PPIs into a network, and we will discuss visualization of these complex data.

5.

Machine Learning and Deep Learning Applications in Metagenomic Taxonomy and Functional Annotation.

Mathieu, Alban; Leclercq, Mickael; Sanabria, Melissa; Perin, Olivier; Droit, Arnaud.

Front Microbiol ; 13: 811495, 2022.

Article in English | MEDLINE | ID: mdl-35359727

ABSTRACT

Shotgun sequencing of environmental DNA (i.e., metagenomics) has revolutionized the field of environmental microbiology, allowing the characterization of all microorganisms in a sequencing experiment. To identify the microbes in terms of taxonomy and biological activity, the sequenced reads must necessarily be aligned on known microbial genomes/genes. However, current alignment methods are limited in terms of speed and can produce a significant number of false positives when detecting bacterial species or false negatives in specific cases (virus, plasmids, and gene detection). Moreover, recent advances in metagenomics have enabled the reconstruction of new genomes using de novo binning strategies, but these genomes, not yet fully characterized, are not used in classic approaches, whereas machine and deep learning methods can use them as models. In this article, we attempted to review the different methods and their efficiency to improve the annotation of metagenomic sequences. Deep learning models have reached the performance of the widely used k-mer alignment-based tools, with better accuracy in certain cases; however, they still must demonstrate their robustness across the variety of environmental samples and across the rapid expansion of accessible genomes in databases.

6.

Advances in Microbiome-Derived Solutions and Methodologies Are Founding a New Era in Skin Health and Care.

Gueniche, Audrey; Perin, Olivier; Bouslimani, Amina; Landemaine, Leslie; Misra, Namita; Cupferman, Sylvie; Aguilar, Luc; Clavaud, Cécile; Chopra, Tarun; Khodr, Ahmad.

Pathogens ; 11(2), 2022 Jan 20.

Article in English | MEDLINE | ID: mdl-35215065

ABSTRACT

The microbiome, as a community of microorganisms and their structural elements, genomes, metabolites/signal molecules, has been shown to play an important role in human health, with significant beneficial applications for gut health. The skin microbiome has emerged as a new field with high potential to develop disruptive solutions to manage skin health and disease. Despite an incomplete toolbox for skin microbiome analyses, much progress has been made towards functional dissection of microbiomes and host-microbiome interactions. A standardized and robust investigation of the skin microbiome is necessary to provide accurate microbial information and set the base for a successful translation of innovations in the dermo-cosmetic field. This review provides an overview of how the landscape of skin microbiome research has evolved from method development (multi-omics/data-based analytical approaches) to the discovery and development of novel microbiome-derived ingredients. Moreover, it provides a summary of the latest findings on interactions between the microbiomes (gut and skin) and skin health/disease. Solutions derived from these two paths, used to develop novel microbiome-based ingredients or solutions acting on skin homeostasis, are proposed. The most promising skin and gut-derived microbiome interventional strategies are presented, along with regulatory, safety, industrial, and technical challenges related to a successful translation of these microbiome-based concepts/technologies in the dermo-cosmetic industry.

7.

timeOmics: an R package for longitudinal multi-omics data integration.

Bodein, Antoine; Scott-Boyer, Marie-Pier; Perin, Olivier; Lê Cao, Kim-Anh; Droit, Arnaud.

Bioinformatics ; 38(2): 577-579, 2022 Jan 03.

Article in English | MEDLINE | ID: mdl-34554215

ABSTRACT

MOTIVATION: Multi-omics data integration enables the global analysis of biological systems and discovery of new biological insights. Multi-omics experimental designs have been further extended with a longitudinal dimension to study dynamic relationships between molecules. However, methods that integrate longitudinal multi-omics data are still in their infancy. RESULTS: We introduce the R package timeOmics, a generic analytical framework for the integration of longitudinal multi-omics data. The framework includes pre-processing, modeling and clustering to identify molecular features strongly associated with time. We illustrate this framework in a case study to detect seasonal patterns of mRNA, metabolites, gut taxa and clinical variables in patients with diabetes mellitus from the integrative Human Microbiome Project. AVAILABILITY AND IMPLEMENTATION: timeOmics is available on Bioconductor and github.com/abodein/timeOmics. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Genomics; Multiomics; Humans; Genomics/methods; Cluster Analysis

8.

Interpretation of network-based integration from multi-omics longitudinal data.

Bodein, Antoine; Scott-Boyer, Marie-Pier; Perin, Olivier; Lê Cao, Kim-Anh; Droit, Arnaud.

Nucleic Acids Res ; 50(5): e27, 2022 Mar 21.

Article in English | MEDLINE | ID: mdl-34883510

ABSTRACT

Multi-omics integration is key to fully understanding complex biological processes in a holistic manner. Furthermore, multi-omics combined with new longitudinal experimental designs can reveal dynamic relationships between omics layers and identify key players or interactions in system development or complex phenotypes. However, integration methods have to address various experimental designs and do not guarantee interpretable biological results. The new challenge of multi-omics integration is to solve interpretation and unlock the hidden knowledge within the multi-omics data. In this paper, we go beyond integration and propose a generic approach to face the interpretation problem. From multi-omics longitudinal data, this approach builds and explores hybrid multi-omics networks composed of both inferred and known relationships within and between omics layers. With smart node labelling and propagation analysis, this approach predicts regulation mechanisms and multi-omics functional modules. We applied the method to 3 case studies with various multi-omics designs and identified new multi-layer interactions involved in key biological functions that could not be revealed with single omics analysis. Moreover, we highlighted interplay in the kinetics that could help identify novel biological mechanisms. This method is available as an R package netOmics to readily suit any application.

Subjects

Genomics; Systems Biology/methods; Genomics/methods; Phenotype

9.

Integration strategies of multi-omics data for machine learning analysis.

Picard, Milan; Scott-Boyer, Marie-Pier; Bodein, Antoine; Périn, Olivier; Droit, Arnaud.

Comput Struct Biotechnol J ; 19: 3735-3746, 2021.

Article in English | MEDLINE | ID: mdl-34285775

ABSTRACT

Increased availability of high-throughput technologies has generated an ever-growing volume of omics data that seek to portray many different but complementary biological layers including genomics, epigenomics, transcriptomics, proteomics, and metabolomics. New insights from these data have been obtained by machine learning algorithms that have produced diagnostic and classification biomarkers. Most biomarkers obtained to date, however, include only one omic measurement at a time and thus do not take full advantage of recent multi-omics experiments that now capture the entire complexity of biological systems. Multi-omics data integration strategies are needed to combine the complementary knowledge brought by each omics layer. We have summarized the most recent data integration methods/frameworks into five different integration strategies: early, mixed, intermediate, late and hierarchical. In this mini-review, we focus on challenges and existing multi-omics integration strategies by paying special attention to machine learning applications.

10.

HCK and ABAA: A Newly Designed Pipeline to Improve Fungi Metabarcoding Analysis.

Mlaga, Kodjovi D; Mathieu, Alban; Beauparlant, Charles Joly; Ott, Alban; Khodr, Ahmad; Perin, Olivier; Droit, Arnaud.

Front Microbiol ; 12: 640693, 2021.

Article in English | MEDLINE | ID: mdl-34025601

ABSTRACT

INTRODUCTION: The fungi ITS sequence length dissimilarity, non-specific amplicons, including chimaera formed during Polymerase Chain Reaction (PCR), added to sequencing errors, create bias during similarity clustering and abundance estimation in the downstream analysis. To overcome these challenges, we present a novel approach, Hierarchical Clustering with Kraken (HCK), to classify ITS1 amplicons and the Abundance-Base Alternative Approach (ABAA) pipeline to detect and filter non-specific amplicons in fungi metabarcoding sequencing datasets. MATERIALS AND METHODS: We compared the performances of both pipelines against QIIME, KRAKEN, and DADA2 using publicly available fungi ITS mock community datasets and using BLASTn as a reference. We calculated precision, recall, and F-score from the true-positive, false-positive, and false-negative estimates. Alpha diversity (Chao1 and Shannon metrics) was also used to evaluate the diversity estimation of our method. RESULTS: The analysis shows that ABAA reduced the number of false positives with all metabarcoding methods tested, and HCK increased precision and recall. HCK, coupled with ABAA, improves the F-score and brings alpha diversity metric values close to those obtained with BLASTn when compared to QIIME, KRAKEN, and DADA2. CONCLUSION: The developed HCK-ABAA approach allows better identification of the fungi community structures while avoiding use of a reference database for non-specific amplicons filtration. It results in a more robust and stable methodology over time. The software can be downloaded from the following link: https://bitbucket.org/GottySG36/hck/src/master/.
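The precision, recall, and F-score mentioned in the abstract above follow standard definitions and can be sketched as below. This is a minimal illustration only, not code from the HCK/ABAA software: the function name and the counts are hypothetical, and the F-score is assumed to be the balanced F1 (harmonic mean of precision and recall).

```python
def precision_recall_fscore(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from true-positive, false-positive
    and false-negative counts. Ratios with an empty denominator are set to 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    fscore = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, fscore


# Hypothetical counts for one classifier on one mock community
# (illustrative numbers, not taken from the paper).
p, r, f = precision_recall_fscore(tp=18, fp=2, fn=6)
print(f"precision={p:.3f} recall={r:.3f} F-score={f:.3f}")
```

Under this definition, filtering out false positives raises precision directly, while recall changes only if true positives are lost in the filtering.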

11.

GWENA: gene co-expression networks analysis and extended modules characterization in a single Bioconductor package.

Lemoine, Gwenaëlle G; Scott-Boyer, Marie-Pier; Ambroise, Bathilde; Périn, Olivier; Droit, Arnaud.

BMC Bioinformatics ; 22(1): 267, 2021 May 25.

Article in English | MEDLINE | ID: mdl-34034647

ABSTRACT

BACKGROUND: Network-based analysis of gene expression through co-expression networks can be used to investigate modular relationships occurring between genes performing different biological functions. An extended description of each of the network modules is therefore a critical step to understand the underlying processes contributing to a disease or a phenotype. Biological integration, topology study and conditions comparison (e.g. wild vs mutant) are the main methods to do so, but to date no tool combines them all into a single pipeline. RESULTS: Here we present GWENA, a new R package that integrates gene co-expression network construction and whole characterization of the detected modules through gene set enrichment, phenotypic association, hub gene detection, topological metric computation, and differential co-expression. To demonstrate its performance, we applied GWENA to two skeletal muscle datasets from young and old patients of the GTEx study. Remarkably, we prioritized a gene whose involvement in muscle development and growth was unknown. Moreover, new insights on the variations in patterns of co-expression were identified. The known phenomenon of connectivity loss associated with aging was found to be coupled to a global reorganization of the relationships leading to expression of known aging-related functions. CONCLUSION: GWENA is an R package available through Bioconductor (https://bioconductor.org/packages/release/bioc/html/GWENA.html) that has been developed to perform extended analysis of gene co-expression networks. Thanks to biological and topological information as well as differential co-expression, the package helps to dissect the role of gene relationships in disease conditions or targeted phenotypes. GWENA goes beyond existing packages that perform co-expression analysis by including new tools to fully characterize modules, such as differential co-expression, additional enrichment databases, and network visualization.

Subjects

Gene Regulatory Networks; Software; Gene Expression; Gene Expression Profiling; Humans

12.

KibioR & Kibio: a new architecture for next-generation data querying and sharing in big biology.

Ongaro-Carcy, Régis; Scott-Boyer, Marie-Pier; Dessemond, Adrien; Belleau, François; Leclercq, Mickael; Périn, Olivier; Droit, Arnaud.

Bioinformatics ; 37(17): 2706-2713, 2021 Sep 09.

Article in English | MEDLINE | ID: mdl-33751043

ABSTRACT

MOTIVATION: The growing production of massive heterogeneous biological data offers opportunities for new discoveries. However, performing multi-omics data analysis is challenging, and researchers are forced to handle the ever-increasing complexity of both data management and evolution of our biological understanding. Substantial efforts have been made to unify biological datasets into integrated systems. Unfortunately, they are not easily scalable, deployable and searchable, locally or globally. RESULTS: This publication presents two tools with a simple structure that can help any data provider, organization or researcher requiring a reliable data search and analysis base. The first tool is Kibio, a scalable and adaptable data storage based on the Elasticsearch search engine. The second tool is KibioR, an R package to pull, push and search Kibio datasets or any accessible Elasticsearch-based databases. These tools apply a uniform data exchange model and minimize the burden of data management by organizing data into a decentralized, versatile, searchable and shareable structure. Several case studies are presented using multiple databases, from drug characterization to miRNAs and pathways identification, emphasizing the ease of use and versatility of the Kibio/KibioR framework. AVAILABILITY AND IMPLEMENTATION: Both KibioR and Elasticsearch are open source. The KibioR package source is available at https://github.com/regisoc/kibior and the library on CRAN at https://cran.r-project.org/package=kibior. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

13.

Large-Scale Automatic Feature Selection for Biomarker Discovery in High-Dimensional OMICs Data.

Leclercq, Mickael; Vittrant, Benjamin; Martin-Magniette, Marie Laure; Scott Boyer, Marie Pier; Perin, Olivier; Bergeron, Alain; Fradet, Yves; Droit, Arnaud.

Front Genet ; 10: 452, 2019.

Article in English | MEDLINE | ID: mdl-31156708

ABSTRACT

The identification of biomarker signatures in omics molecular profiling is usually performed to predict outcomes in a precision medicine context, such as patient disease susceptibility, diagnosis, prognosis, and treatment response. To identify these signatures, we have developed a biomarker discovery tool, called BioDiscML. From a collection of samples and their associated characteristics, i.e., the biomarkers (e.g., gene expression, protein levels, clinico-pathological data), BioDiscML exploits various feature selection procedures to produce signatures associated with machine learning models that efficiently predict a specified outcome. For this purpose, BioDiscML uses a large variety of machine learning algorithms to select the best combination of biomarkers for predicting categorical or continuous outcomes from highly unbalanced datasets. The software has been implemented to automate all machine learning steps, including data pre-processing, feature selection, model selection, and performance evaluation. BioDiscML is delivered as a stand-alone program and is available for download at https://github.com/mickaelleclercq/BioDiscML.
