Search | BVS Regional Portal (test)

MELLODDY: Cross-pharma Federated Learning at Unprecedented Scale Unlocks Benefits in QSAR without Compromising Proprietary Information.

Heyndrickx, Wouter; Mervin, Lewis; Morawietz, Tobias; Sturm, Noé; Friedrich, Lukas; Zalewski, Adam; Pentina, Anastasia; Humbeck, Lina; Oldenhof, Martijn; Niwayama, Ritsuya; Schmidtke, Peter; Fechner, Nikolas; Simm, Jaak; Arany, Adam; Drizard, Nicolas; Jabal, Rama; Afanasyeva, Arina; Loeb, Regis; Verma, Shlok; Harnqvist, Simon; Holmes, Matthew; Pejo, Balazs; Telenczuk, Maria; Holway, Nicholas; Dieckmann, Arne; Rieke, Nicola; Zumsande, Friederike; Clevert, Djork-Arné; Krug, Michael; Luscombe, Christopher; Green, Darren; Ertl, Peter; Antal, Peter; Marcus, David; Do Huu, Nicolas; Fuji, Hideyoshi; Pickett, Stephen; Acs, Gergely; Boniface, Eric; Beck, Bernd; Sun, Yax; Gohier, Arnaud; Rippmann, Friedrich; Engkvist, Ola; Göller, Andreas H; Moreau, Yves; Galtier, Mathieu N; Schuffenhauer, Ansgar; Ceulemans, Hugo.

J Chem Inf Model ; 64(7): 2331-2344, 2024 Apr 08.

Article in English | MEDLINE | ID: mdl-37642660

ABSTRACT

Federated multipartner machine learning has been touted as an appealing and efficient method to increase the effective training data volume and thereby the predictivity of models, particularly when the generation of training data is resource-intensive. In the landmark MELLODDY project, indeed, each of ten pharmaceutical companies realized aggregated improvements on its own classification or regression models through federated learning. To this end, they leveraged a novel implementation extending multitask learning across partners, on a platform audited for privacy and security. The experiments involved an unprecedented cross-pharma data set of 2.6+ billion confidential experimental activity data points, documenting 21+ million physical small molecules and 40+ thousand assays in on-target and secondary pharmacodynamics and pharmacokinetics. Appropriate complementary metrics were developed to evaluate the predictive performance in the federated setting. In addition to predictive performance increases in labeled space, the results point toward an extended applicability domain in federated learning. Increases in collective training data volume, including by means of auxiliary data resulting from single concentration high-throughput and imaging assays, continued to boost predictive performance, albeit with a saturating return. Markedly higher improvements were observed for the pharmacokinetics and safety panel assay-based task subsets.
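To illustrate the general idea of extending multitask learning across partners, the following minimal Python sketch trains a model whose shared parameter is updated from averaged partner gradients, while each partner's data and task-specific parameter stay local. Everything in it is an assumption made for illustration: the abstract does not describe the MELLODDY architecture, aggregation rule, or privacy mechanisms, and the names `Partner`, `local_gradients`, and `federated_round`, the linear model, and the toy data are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Partner:
    """One participant: private data plus a private task-specific intercept."""
    xs: List[float]
    ys: List[float]
    head: float = 0.0  # private parameter, never sent to the server


def local_gradients(partner: Partner, shared_w: float) -> Tuple[float, float]:
    """Mean-squared-error gradients for y = shared_w * x + head, computed locally."""
    n = len(partner.xs)
    residuals = [shared_w * x + partner.head - y
                 for x, y in zip(partner.xs, partner.ys)]
    grad_w = 2.0 * sum(r * x for r, x in zip(residuals, partner.xs)) / n
    grad_head = 2.0 * sum(residuals) / n
    return grad_w, grad_head


def federated_round(partners: List[Partner], shared_w: float, lr: float) -> float:
    """One round: partners share only the gradient of the shared weight."""
    shared_grads = []
    for partner in partners:
        grad_w, grad_head = local_gradients(partner, shared_w)
        partner.head -= lr * grad_head   # private update, stays local
        shared_grads.append(grad_w)      # only this value leaves the partner
    return shared_w - lr * sum(shared_grads) / len(shared_grads)


# Toy data (made up): both partners share slope 2.0 but have different intercepts.
PARTNERS = [
    Partner(xs=[0.0, 1.0, 2.0], ys=[1.0, 3.0, 5.0]),  # y = 2x + 1
    Partner(xs=[1.0, 2.0, 3.0], ys=[1.0, 3.0, 5.0]),  # y = 2x - 1
]

SHARED_W = 0.0
for _ in range(2000):
    SHARED_W = federated_round(PARTNERS, SHARED_W, lr=0.05)
```

In this toy setting the shared slope is estimated from both partners' data, although neither partner sees the other's data points or intercept. A real deployment would add secure aggregation and auditing, which this sketch omits.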

Subjects

Benchmarking, Quantitative Structure-Activity Relationship, Biological Assay, Machine Learning

Molecular Assays Simulator to Unravel Predictors Hacking in Goal-Directed Molecular Generations.

Gendreau, Philippe; Turk, Joseph-André; Drizard, Nicolas; Ribeiro da Silva, Vinicius Barros; Descamps, Clarisse; Gaston-Mathé, Yann.

J Chem Inf Model ; 63(13): 3983-3998, 2023 Jul 10.

Article in English | MEDLINE | ID: mdl-37347961

ABSTRACT

Generative models are being increasingly used in drug discovery, very often coupled with absorption, distribution, metabolism, and excretion (ADME) bioassays or quantitative structure-activity relationship (QSAR) models to optimize a given set of properties. The molecules proposed by these algorithms are often revealed to be false positives; that is, they are predicted to be active and turn out to be inactive after synthesis and testing, mostly due to overoptimization of the predicted scores, which leads to an actual decrease or stagnation of the real scores. This behavior is also known as the "hacking" of the predictive models by the generative model during the optimization step. This issue is reminiscent of adversarial examples in machine learning and it can be seen as enunciated by Goodhart's law: "when a measure becomes a target, it ceases to be a good measure." This issue is even more apparent in a multiparameter optimization (MPO) case, where the models need to extrapolate outside the training set distribution because there are no known molecules satisfying all the objectives simultaneously in the initial training set. Experimental evaluation of this problem is a hard and expensive task since it requires synthesis and testing of the generated molecules. Thus, efforts have been made to develop in silico "oracles" (real-valued functions used as proxies for molecular properties) to help with the evaluation of these generative-model-based pipelines. However, these oracles have had a limited value so far because they are often too easy to model in comparison with biological assays and are usually limited to mono-objective cases. In this work, we introduce a simulator of multitarget assays using a smartly initialized neural network (NN) that returns continuous values for any input molecule. We use this oracle to replicate a real-world prospective lead optimization (LO) scenario.
First, we trained predictive models on an initial small sample of molecules aimed at predicting their oracle values. Afterward, we generated new optimized molecules using the open-source GuacaMol package coupled with the previously built predictive models. Finally, we selected compounds matching the candidate drug target profile (CDTP) according to the predicted values and evaluated them by computing the true oracle values. We observed that even when the predictive models had excellent estimated performance metrics, the final selection still contained multiple false positives according to the NN-based oracle. Then, we evaluated the optimization behavior in mono- and bi-objective scenarios using either a logistic regression or a random forest predictive model. We also propose and evaluate several methods to help mitigate the hacking issue.
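The "hacking" effect described in this abstract can be reproduced in a deliberately tiny setting. The sketch below is not the authors' simulator or pipeline: it swaps the NN-based oracle for a hypothetical one-dimensional quadratic `oracle`, the predictive model for a least-squares line (`fit_line`), and the GuacaMol optimizer for a greedy `hill_climb`. All names and numbers are assumptions chosen only to show a predicted score rising while the true score falls once the optimizer leaves the training region.

```python
from typing import Callable, List, Tuple


def oracle(x: float) -> float:
    """Hypothetical 'true' score: peaks at x = 1 and falls off on both sides."""
    return 1.0 - (x - 1.0) ** 2


def fit_line(points: List[Tuple[float, float]]) -> Callable[[float], float]:
    """Ordinary least-squares line through (x, y); stands in for a QSAR model."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def hill_climb(score: Callable[[float], float], start: float,
               step: float, upper: float, n_steps: int) -> float:
    """Greedy optimizer: keep moving right while the predicted score improves."""
    x = start
    for _ in range(n_steps):
        candidate = min(x + step, upper)
        if score(candidate) > score(x):
            x = candidate
    return x


# Small training set drawn only from the region x <= 0.5 (made-up values).
TRAIN_X = [0.0, 0.25, 0.5]
MODEL = fit_line([(x, oracle(x)) for x in TRAIN_X])

START = max(TRAIN_X)
OPTIMIZED = hill_climb(MODEL, start=START, step=0.5, upper=4.0, n_steps=10)

print(f"predicted: {MODEL(START):.2f} -> {MODEL(OPTIMIZED):.2f}")
print(f"true:      {oracle(START):.2f} -> {oracle(OPTIMIZED):.2f}")
```

Because the line is fitted only where the oracle is still increasing, it extrapolates that trend indefinitely, so the optimizer ends up far past the true optimum. This is the one-dimensional analogue of selecting false positives with a model that looked accurate on its training distribution.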

Subjects

Algorithms, Goals, Prospective Studies, Neural Networks (Computer), Biological Assay

Exploring isofunctional molecules: Design of a benchmark and evaluation of prediction performance.

Pinel, Philippe; Guichaoua, Gwenn; Najm, Matthieu; Labouille, Stéphanie; Drizard, Nicolas; Gaston-Mathé, Yann; Hoffmann, Brice; Stoven, Véronique.

Mol Inform ; 42(4): e2200216, 2023 Apr.

Article in English | MEDLINE | ID: mdl-36633361

ABSTRACT

Identification of novel chemotypes with biological activity similar to a known active molecule is an important challenge in drug discovery called 'scaffold hopping'. Small-, medium-, and large-step scaffold hopping efforts may lead to increasing degrees of chemical structure novelty with respect to the parent compound. In the present paper, we focus on the problem of large-step scaffold hopping. We assembled a high-quality, well-characterized dataset of scaffold hopping examples comprising pairs of active molecules and including a variety of protein targets. This dataset was used to build a benchmark corresponding to the setting of real-life applications: one active molecule is known, and the second active is searched among a set of decoys chosen so as to avoid statistical bias. This allowed us to evaluate the performance of computational methods for solving large-step scaffold hopping problems. In particular, we assessed how difficult these problems are, especially for classical 2D and 3D ligand-based methods. We also showed that a machine-learning chemogenomic algorithm outperforms classical methods and we provided some useful hints for future improvements.
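The benchmark setting in this abstract (one known active, a second active hidden among decoys) lends itself to a rank-based evaluation. The following sketch shows one plausible way to score such a benchmark. The abstract does not state which metric the authors used, so the top-k success rate, the pessimistic tie handling, the function names, and the similarity scores below are all assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def rank_of_active(active_score: float, decoy_scores: Sequence[float]) -> int:
    """1-based rank of the hidden active; ties count against it (pessimistic)."""
    return 1 + sum(1 for s in decoy_scores if s >= active_score)


def top_k_success_rate(cases: List[Tuple[float, List[float]]], k: int) -> float:
    """Fraction of cases in which the hidden active is ranked within the top k."""
    hits = sum(1 for active, decoys in cases
               if rank_of_active(active, decoys) <= k)
    return hits / len(cases)


# Each case: (score of the hidden active, scores of its decoys).
# Scores would come from any method that compares candidates to the known
# active; the values here are made up.
CASES = [
    (0.82, [0.40, 0.55, 0.31]),  # active ranked 1st
    (0.47, [0.60, 0.52, 0.20]),  # active ranked 3rd
    (0.35, [0.35, 0.10, 0.05]),  # tie with one decoy, so ranked 2nd
    (0.15, [0.70, 0.66, 0.58]),  # active ranked last (4th)
]

TOP1 = top_k_success_rate(CASES, k=1)
TOP3 = top_k_success_rate(CASES, k=3)
```

Any scoring method, whether a 2D fingerprint similarity, a 3D shape overlay, or a learned model, can be plugged in upstream, which makes it possible to compare them on the same cases.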

Subjects

Benchmarking, Drug Discovery, Drug Discovery/methods, Ligands, Algorithms, Machine Learning

On the Frustration to Predict Binding Affinities from Protein-Ligand Structures with Deep Neural Networks.

Volkov, Mikhail; Turk, Joseph-André; Drizard, Nicolas; Martin, Nicolas; Hoffmann, Brice; Gaston-Mathé, Yann; Rognan, Didier.

J Med Chem ; 65(11): 7946-7958, 2022 Jun 09.

Article in English | MEDLINE | ID: mdl-35608179

ABSTRACT

Accurate prediction of binding affinities from protein-ligand atomic coordinates remains a major challenge in early stages of drug discovery. Using modular message passing graph neural networks describing both the ligand and the protein in their free and bound states, we unambiguously evidence that an explicit description of protein-ligand noncovalent interactions does not provide any advantage with respect to ligand or protein descriptors. Simple models, inferring binding affinities of test samples from that of the closest ligands or proteins in the training set, already exhibit good performances, suggesting that memorization largely dominates true learning in the deep neural networks. The current study suggests considering only noncovalent interactions while omitting their protein and ligand atomic environments. Removing all hidden biases probably requires much denser protein-ligand training matrices and a coordinated effort of the drug design community to solve the necessary protein-ligand structures.
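The "simple models" mentioned in this abstract infer the affinity of a test sample from its closest training neighbor. A minimal sketch of such a baseline on the ligand side is given below. It assumes that ligands are represented as sets of fingerprint on-bits and compared by Tanimoto similarity. The abstract does not specify the descriptors or similarity measure used, so these choices, the function names, and the toy data are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def nearest_neighbor_affinity(query: Fingerprint,
                              training: List[Tuple[Fingerprint, float]]) -> float:
    """Predict the query's affinity as that of its most similar training ligand."""
    _, affinity = max(training, key=lambda item: tanimoto(query, item[0]))
    return affinity


# Toy training set: (fingerprint on-bits, affinity). Values are made up.
TRAINING = [
    (frozenset({1, 2, 3, 4}), 7.5),
    (frozenset({10, 11, 12}), 5.0),
    (frozenset({1, 2, 20, 21}), 6.2),
]

QUERY = frozenset({1, 2, 3, 5})
PREDICTED = nearest_neighbor_affinity(QUERY, TRAINING)
```

If a lookup of this kind performs about as well as a deep network on a benchmark, that is a sign that the benchmark rewards proximity to training examples rather than learned interaction physics, which is the concern the abstract raises.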

Subjects

Neural Networks (Computer), Proteins, Drug Discovery, Ligands, Protein Binding, Proteins/metabolism
