Search | BVS Regional Portal (test)

1.

Protocol to explain support vector machine predictions via exact Shapley value computation.

Mastropietro, Andrea; Bajorath, Jürgen.

STAR Protoc ; 5(2): 103010, 2024 Jun 21.

Article in English | MEDLINE | ID: mdl-38607924

ABSTRACT

Shapley values from cooperative game theory are adapted for explaining machine learning predictions. For large feature sets used in machine learning, Shapley values are approximated. We present a protocol for two techniques for explaining support vector machine predictions with exact Shapley value computation. We detail the application of these algorithms and provide ready-to-use Python scripts and custom code. The final output of the protocol includes quantitative feature analysis and mapping of important features for visualization. For complete details on the use and execution of this protocol, please refer to Feldmann and Bajorath (ref. 1) and Mastropietro et al. (ref. 2).

Subjects

Algorithms; Support Vector Machine; Humans; Game Theory; Software; Machine Learning

2.

Calculation of exact Shapley values for explaining support vector machine models using the radial basis function kernel.

Mastropietro, Andrea; Feldmann, Christian; Bajorath, Jürgen.

Sci Rep ; 13(1): 19561, 2023 Nov 10.

Article in English | MEDLINE | ID: mdl-37949930

ABSTRACT

Machine learning (ML) algorithms are extensively used in pharmaceutical research. Most ML models have a black-box character, which prevents the interpretation of predictions. However, rationalizing model decisions is of critical importance if predictions are to aid in experimental design. Accordingly, in interdisciplinary research, there is growing interest in explaining ML models. Methods devised for this purpose are a part of the explainable artificial intelligence (XAI) spectrum of approaches. In XAI, the Shapley value concept originating from cooperative game theory has become popular for identifying features determining predictions. The Shapley value concept has been adapted as a model-agnostic approach for explaining predictions. Since the computational time required for Shapley value calculations scales exponentially with the number of features used, local approximations such as Shapley additive explanations (SHAP) are usually required in ML. The support vector machine (SVM) algorithm is one of the most popular ML methods in pharmaceutical research and beyond. SVM models are often explained using SHAP. However, there is only limited correlation between SHAP and exact Shapley values, as previously demonstrated for SVM calculations using the Tanimoto kernel, which limits SVM model explanation. Since the Tanimoto kernel is a special kernel function mostly applied for assessing chemical similarity, we have developed the Shapley value-expressed radial basis function (SVERAD), a computationally efficient approach for the calculation of exact Shapley values for SVM models based upon radial basis function kernels that are widely applied in different areas. SVERAD is shown to produce meaningful explanations of SVM predictions.
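The exponential scaling mentioned in this abstract can be seen in a brute-force computation. The sketch below is not SVERAD and is not taken from the authors' code. It enumerates every coalition of a hypothetical three-feature toy game, which is exactly the cost that efficient exact methods are designed to avoid. The names `exact_shapley` and `toy_value`, and the toy payoffs, are assumptions made for illustration only.

```python
from itertools import combinations
from math import factorial


def exact_shapley(value, n_features):
    """Exact Shapley values by enumerating all coalitions.

    `value` maps a frozenset of feature indices to a number.
    The loop visits 2^(n-1) coalitions per feature, so the cost
    grows exponentially with `n_features`.
    """
    players = list(range(n_features))
    phi = [0.0] * n_features
    for i in players:
        others = [p for p in players if p != i]
        for size in range(len(others) + 1):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = (
                factorial(size)
                * factorial(n_features - size - 1)
                / factorial(n_features)
            )
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_value(coalition):
    """Hypothetical game: additive payoffs plus a bonus when 0 and 1 co-occur."""
    payoffs = {0: 1.0, 1: 2.0, 2: 3.0}
    total = sum(payoffs[j] for j in coalition)
    if 0 in coalition and 1 in coalition:
        total += 4.0
    return total


phi = exact_shapley(toy_value, 3)
print(phi)
```

In this toy game the interaction bonus is split equally between features 0 and 1, and the values sum to the payoff of the full coalition minus that of the empty one, which is the efficiency property of Shapley values.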

3.

XGDAG: explainable gene-disease associations via graph neural networks.

Mastropietro, Andrea; De Carlo, Gianluca; Anagnostopoulos, Aris.

Bioinformatics ; 39(8), 2023 08 01.

Article in English | MEDLINE | ID: mdl-37531293

ABSTRACT

MOTIVATION: Disease gene prioritization consists of identifying genes that are likely to be involved in the mechanisms of a given disease and providing a ranking of such genes. Recently, the research community has used computational methods to uncover unknown gene-disease associations; these methods range from combinatorial to machine learning-based approaches. In particular, in recent years, approaches based on deep learning have provided superior results compared to more traditional ones. Yet, the problem with these approaches is their inherent black-box structure, which prevents interpretability. RESULTS: We propose a new methodology for disease gene discovery, which leverages graph-structured data using graph neural networks (GNNs) along with an explainability phase for determining the ranking of candidate genes and understanding the model's output. Our approach is based on a positive-unlabeled learning strategy, which outperforms existing gene discovery methods by exploiting GNNs in a non-black-box fashion. Our methodology is effective even in scenarios where a large number of associated genes need to be retrieved, in which gene prioritization methods often tend to lose their reliability. AVAILABILITY AND IMPLEMENTATION: The source code of XGDAG is available on GitHub at: https://github.com/GiDeCarlo/XGDAG. The data underlying this article are available at: https://www.disgenet.org/, https://thebiogrid.org/, https://doi.org/10.1371/journal.pcbi.1004120.s003, and https://doi.org/10.1371/journal.pcbi.1004120.s004.

Subjects

Genetic Techniques; Machine Learning; Reproducibility of Results; Neural Networks, Computer; Software

4.

NIAPU: network-informed adaptive positive-unlabeled learning for disease gene identification.

Stolfi, Paola; Mastropietro, Andrea; Pasculli, Giuseppe; Tieri, Paolo; Vergni, Davide.

Bioinformatics ; 39(2), 2023 02 03.

Article in English | MEDLINE | ID: mdl-36727493

ABSTRACT

MOTIVATION: Gene-disease associations are fundamental for understanding disease etiology and developing effective interventions and treatments. Identifying genes not yet associated with a disease due to a lack of studies is a challenging task in which prioritization based on prior knowledge is an important element. The computational search for new candidate disease genes may be eased by positive-unlabeled learning, the machine learning (ML) setting in which only a subset of instances are labeled as positive while the rest of the dataset is unlabeled. In this work, we propose a set of effective network-based features to be used in a novel Markov diffusion-based multi-class labeling strategy for putative disease gene discovery. RESULTS: The performance of the new labeling algorithm and the effectiveness of the proposed features have been tested on 10 different disease datasets using three ML algorithms. The new features have been compared against classical topological and functional/ontological features and a set of network- and biology-derived features already used in gene discovery tasks. The predictive power of the integrated methodology in searching for new disease genes has been found to be competitive against state-of-the-art algorithms. AVAILABILITY AND IMPLEMENTATION: The source code of NIAPU can be accessed at https://github.com/AndMastro/NIAPU. The source data used in this study are available online on the respective websites. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Algorithms; Software; Machine Learning; Diffusion

5.

EdgeSHAPer: Bond-centric Shapley value-based explanation method for graph neural networks.

Mastropietro, Andrea; Pasculli, Giuseppe; Feldmann, Christian; Rodríguez-Pérez, Raquel; Bajorath, Jürgen.

iScience ; 25(10): 105043, 2022 Oct 21.

Article in English | MEDLINE | ID: mdl-36134335

ABSTRACT

Graph neural networks (GNNs) recursively propagate signals along the edges of an input graph, integrate node feature information with graph structure, and learn object representations. Like other deep neural network models, GNNs have a notorious black-box character. For GNNs, only a few approaches are available to rationalize model decisions. We introduce EdgeSHAPer, a generally applicable method for explaining GNN-based models. The approach is devised to assess edge importance for predictions. To this end, EdgeSHAPer makes use of the Shapley value concept from game theory. As a proof of concept, EdgeSHAPer is applied to compound activity prediction, a central task in drug discovery. EdgeSHAPer's edge centricity is relevant for molecular graphs where edges represent chemical bonds. Combined with feature mapping, EdgeSHAPer produces intuitive explanations for compound activity predictions. Compared to a popular node-centric and another edge-centric GNN explanation method, EdgeSHAPer reveals higher resolution in differentiating features determining predictions and identifies minimal pertinent positive feature sets.

6.

Network Proximity-Based Drug Repurposing Strategy for Early and Late Stages of Primary Biliary Cholangitis.

Shahini, Endrit; Pasculli, Giuseppe; Mastropietro, Andrea; Stolfi, Paola; Tieri, Paolo; Vergni, Davide; Cozzolongo, Raffaele; Pesce, Francesco; Giannelli, Gianluigi.

Biomedicines ; 10(7), 2022 07 13.

Article in English | MEDLINE | ID: mdl-35884999

ABSTRACT

Primary biliary cholangitis (PBC) is a chronic, cholestatic, immune-mediated, and progressive liver disorder. Treatment to prevent the disease from advancing into later and irreversible stages is still an unmet clinical need. Accordingly, we set up a drug repurposing framework to find potential therapeutic agents targeting relevant pathways derived from an expanded pool of genes involved in different stages of PBC. Starting with updated human protein-protein interaction data and genes specifically involved in the early and late stages of PBC, a network medicine approach was used to provide a PBC "proximity" or "involvement" gene ranking using network diffusion algorithms and machine learning models. The top genes in the proximity ranking, when combined with the original PBC-related genes, resulted in a final dataset of the genes most involved in PBC disease. Finally, a drug repurposing strategy was implemented by mining and utilizing dedicated drug-gene interaction and druggable genome information knowledge bases (e.g., the DrugBank repository). We identified several potential drug candidates interacting with PBC pathways after performing an over-representation analysis on our initial 1121-seed gene list and the resulting disease-associated (algorithm-obtained) genes. The mechanism and potential therapeutic applications of such drugs were then thoroughly discussed, with a particular emphasis on different stages of PBC disease. We found that interleukin/EGFR/TNF-alpha inhibitors, branched-chain amino acids, geldanamycin, tauroursodeoxycholic acid, genistein, antioestrogens, curcumin, antineovascularisation agents, enzyme/protease inhibitors, and antirheumatic agents are promising drugs targeting distinct stages of PBC. We developed robust and transparent selection mechanisms for prioritizing already approved medicinal products or investigational products for repurposing based on recognized unmet medical needs in PBC, as well as solid preliminary data to achieve this goal.

7.

Protocol to explain graph neural network predictions using an edge-centric Shapley value-based approach.

Mastropietro, Andrea; Pasculli, Giuseppe; Bajorath, Jürgen.

STAR Protoc ; 3(4): 101887, 2022 12 16.

Article in English | MEDLINE | ID: mdl-36595907

ABSTRACT

Here we present EdgeSHAPer, a workflow for explaining graph neural networks by approximating Shapley values using Monte Carlo sampling. In this protocol, we describe steps to execute Python scripts for a chemical dataset from the original publication; however, this approach is also applicable to any user-provided dataset. We also detail steps encompassing neural network training, an explanation phase, and analysis via feature mapping. For complete details on the use and execution of this protocol, please refer to Mastropietro et al. (2022, ref. 1).

Subjects

Neural Networks, Computer; Workflow
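The Monte Carlo approximation of Shapley values named in entry 7 can be illustrated with generic permutation sampling. The sketch below is not the EdgeSHAPer implementation. It treats the "players" as abstract items (they could stand for graph edges) and uses a hypothetical toy value function in place of a trained model. The names `monte_carlo_shapley` and `toy_value`, the sample count, and the seed are all assumptions made for illustration.

```python
import random


def monte_carlo_shapley(value, n_players, n_samples=2000, seed=0):
    """Approximate Shapley values by averaging marginal contributions
    over randomly sampled player orderings (permutation sampling)."""
    rng = random.Random(seed)
    phi = [0.0] * n_players
    order = list(range(n_players))
    for _ in range(n_samples):
        rng.shuffle(order)
        coalition = set()
        previous = value(frozenset(coalition))
        for player in order:
            # Add players one at a time and record each marginal gain
            coalition.add(player)
            current = value(frozenset(coalition))
            phi[player] += current - previous
            previous = current
    return [total / n_samples for total in phi]


def toy_value(coalition):
    """Hypothetical stand-in for a model output on a subset of players."""
    payoffs = {0: 1.0, 1: 2.0, 2: 3.0}
    total = sum(payoffs[j] for j in coalition)
    if 0 in coalition and 1 in coalition:
        total += 4.0
    return total


estimates = monte_carlo_shapley(toy_value, 3)
print(estimates)
```

Each sampled ordering costs one model evaluation per player, so the total cost is linear in the number of samples instead of exponential in the number of players. The estimates still sum to the full-coalition payoff minus the empty-coalition payoff, because the marginal gains within each ordering telescope.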
