Results 1 - 20 of 25
1.
J Chem Inf Model ; 64(12): 4687-4699, 2024 Jun 24.
Article in English | MEDLINE | ID: mdl-38822782

ABSTRACT

The design of compounds during hit-to-lead often seeks to explore a vector from a core scaffold to form additional interactions with the target protein. A rational approach to this is to probe the region of a protein accessed by a vector with a systematic placement of pharmacophore features in 3D, particularly when bound structures are not available. Herein, we present bbSelect, an open-source tool built to map the placements of pharmacophore features in 3D Euclidean space from a library of R-groups, employing partitioning to drive a diverse and systematic selection to a user-defined size. An evaluation of bbSelect against established methods exemplified the superiority of bbSelect in its ability to perform diverse selections, achieving high levels of pharmacophore feature placement coverage with selection sizes of a fraction of the total set and without the introduction of excess complexity. bbSelect also reports visualizations and rationale to enable users to understand and interrogate results. This provides a tool for the drug discovery community to guide their hit-to-lead activities.


Subjects
Drug Discovery , Software , Drug Discovery/methods , Models, Molecular , Drug Design , Proteins/chemistry , Pharmacophore
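The partition-driven selection described in the bbSelect abstract can be illustrated with a minimal Python sketch. It assumes each R-group is summarized as a list of (feature_type, x, y, z) pharmacophore placements, that 3D space is partitioned into cubic cells of a hypothetical `cell_size` (in Å), and that selection greedily maximizes the number of occupied (feature type, cell) pairs. The names `cells_for` and `greedy_select` are hypothetical; this is not bbSelect's code, API, or exact partitioning scheme.

```python
from math import floor

def cells_for(placements, cell_size):
    """Map (feature_type, x, y, z) placements to occupied (feature_type, i, j, k) grid cells."""
    return {
        (ftype, floor(x / cell_size), floor(y / cell_size), floor(z / cell_size))
        for ftype, x, y, z in placements
    }

def greedy_select(library, n, cell_size=1.5):
    """Pick up to n R-groups, each time taking the one that occupies the most new cells."""
    cell_map = {name: cells_for(p, cell_size) for name, p in library.items()}
    covered, chosen = set(), []
    while len(chosen) < n:
        remaining = [name for name in cell_map if name not in chosen]
        if not remaining:
            break
        best = max(remaining, key=lambda name: len(cell_map[name] - covered))
        if not cell_map[best] - covered:
            break  # no remaining R-group adds new coverage
        chosen.append(best)
        covered |= cell_map[best]
    return chosen, covered

# Hypothetical three-member R-group library (coordinates in Å).
library = {
    "A": [("donor", 0.1, 0.1, 0.1), ("acceptor", 2.0, 0.0, 0.0)],
    "B": [("donor", 0.2, 0.2, 0.2)],
    "C": [("aromatic", 3.2, 3.2, 0.0)],
}
chosen, covered = greedy_select(library, n=3)
```

Here R-group "B" is never picked because its only donor placement falls in a cell already covered by "A", which is the sense in which partitioning steers the selection away from redundant placements.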
2.
J Chem Inf Model ; 63(4): 1099-1113, 2023 Feb 27.
Article in English | MEDLINE | ID: mdl-36758178

ABSTRACT

Accurate methods to predict solubility from molecular structure are highly sought after in the chemical sciences. To assess the state of the art, the American Chemical Society organized a "Second Solubility Challenge" in 2019, in which competitors were invited to submit blinded predictions of the solubilities of 132 drug-like molecules. In the first part of this article, we describe the development of two models that were submitted to the Blind Challenge in 2019 but which have not previously been reported. These models were based on computationally inexpensive molecular descriptors and traditional machine learning algorithms and were trained on a relatively small data set of 300 molecules. In the second part of the article, to test the hypothesis that predictions would improve with more advanced algorithms and higher volumes of training data, we compare these original predictions with those made after the deadline using deep learning models trained on larger solubility data sets consisting of 2999 and 5697 molecules. The results show that there are several algorithms that are able to obtain near state-of-the-art performance on the solubility challenge data sets, with the best model, a graph convolutional neural network, resulting in an RMSE of 0.86 log units. Critical analysis of the models reveals systematic differences between the performance of models using certain feature sets and training data sets. The results suggest that careful selection of high quality training data from relevant regions of chemical space is critical for prediction accuracy but that other methodological issues remain problematic for machine learning solubility models, such as the difficulty in modeling complex chemical spaces from sparse training data sets.


Subjects
Deep Learning , Solubility , Neural Networks, Computer , Machine Learning , Algorithms
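The headline figure in this abstract (an RMSE of 0.86 log units) is a root-mean-square error. A minimal sketch of that metric follows; the inputs are assumed to be paired predicted and measured log solubilities, and the function name `rmse` is illustrative rather than taken from the paper.

```python
from math import sqrt

def rmse(predicted, observed):
    """Root-mean-square error, in the same units as the inputs (here log solubility)."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return sqrt(sum(squared) / len(squared))
```

For example, `rmse([1.0, -1.0], [0.0, 0.0])` is 1.0: every prediction is off by exactly one log unit. Because errors are squared before averaging, a few large misses raise the RMSE more than many small ones.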
3.
J Chem Inf Model ; 62(6): 1458-1470, 2022 Mar 28.
Article in English | MEDLINE | ID: mdl-35258972

ABSTRACT

Accurate and rapid prediction of the binding affinity of a compound to a target is one of the ultimate goals of computer-aided drug design. Alchemical approaches to free energy estimation follow the path from an initial state of the system to a final state through alchemical changes of the energy function during a molecular dynamics simulation. Herein, we explore the accuracy and efficiency of two such techniques: relative free energy perturbation (FEP) and multisite lambda dynamics (MSλD). These are applied to a series of inhibitors of the bromodomain-containing protein 4 (BRD4). We demonstrate a procedure for obtaining accurate relative binding free energies using MSλD when the net charge of the ligand changes. This gave close agreement with experiment, with an average difference of 0.4 ± 0.4 kcal mol⁻¹. In a benchmarking study of the relative FEP calculations, we found that using 20 lambda windows with 0.5 ns of equilibration and 1 ns of data collection for each window gave the optimal compromise between accuracy and speed. Overall, relative FEP and MSλD predicted binding free energies with comparable accuracy, an average difference from experiment of 0.6 kcal mol⁻¹ for each method. However, MSλD makes predictions for a larger molecular space over a much shorter time scale than relative FEP, requiring 18 times less simulation time for the entire molecule space.


Subjects
Nuclear Proteins , Transcription Factors , Entropy , Ligands , Molecular Dynamics Simulation , Protein Binding , Thermodynamics
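Relative FEP over a series of lambda windows is often explained with the Zwanzig exponential-averaging estimator, ΔG = -kT ln⟨exp(-ΔU/kT)⟩, applied to each adjacent pair of windows and summed. The abstract does not say which estimator the authors used (MBAR or BAR are common alternatives), so the sketch below is a generic illustration only. It assumes per-window samples of ΔU = U(λ_{i+1}) - U(λ_i) in kcal/mol are already available and that T = 298.15 K; the function names are hypothetical.

```python
from math import exp, log

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def zwanzig_window(delta_u, temperature=298.15):
    """Free energy change for one lambda step from samples of U(lambda_{i+1}) - U(lambda_i)."""
    kt = K_B * temperature
    # Shift by the minimum sample so the exponentials stay numerically stable;
    # -kT ln<exp(-dU/kT)> = shift - kT ln<exp(-(dU - shift)/kT)>.
    shift = min(delta_u)
    mean_boltz = sum(exp(-(du - shift) / kt) for du in delta_u) / len(delta_u)
    return shift - kt * log(mean_boltz)

def total_free_energy(windows, temperature=298.15):
    """Sum the per-window estimates across all lambda windows (e.g. 20 of them)."""
    return sum(zwanzig_window(w, temperature) for w in windows)
```

The exponential average is dominated by low-ΔU samples, so a window whose samples are spread out yields a smaller estimate than their arithmetic mean; poor overlap between neighbouring windows is one reason the number of windows matters.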
4.
Org Biomol Chem ; 19(25): 5632-5641, 2021 Jun 30.
Article in English | MEDLINE | ID: mdl-34105560

ABSTRACT

The bromodomain-containing protein 4 (BRD4), a member of the bromodomain and extra-terminal domain (BET) family, plays a key role in several diseases, especially cancers. With increased interest in BRD4 as a therapeutic target, many X-ray crystal structures of the protein in complex with small-molecule inhibitors have become publicly available over the past decade. In this study, we use this structural information to investigate the conformations of the first bromodomain (BD1) of BRD4. Structural alignment of 297 BRD4-BD1 complexes shows a high level of similarity between the structures of BRD4-BD1, regardless of the bound ligand. We employ WONKA, a tool for detailed analyses of protein binding sites, to compare the active sites of over 100 of these crystal structures. The positions of key binding site residues show a high level of conformational similarity, with the exception of Trp81. A focused analysis of the highly conserved water network in the binding site of BRD4-BD1 is performed to identify the positions of these water molecules across the crystal structures. The importance of the water network is illustrated using molecular docking and absolute free energy perturbation simulations: 82% of the ligand poses were better predicted when water molecules were included as part of the receptor. Our analysis provides guidance for the design of new BRD4-BD1 inhibitors and for selecting the best BRD4-BD1 structure to use in structure-based drug design, an important approach for faster and more cost-efficient lead discovery.


Subjects
Cell Cycle Proteins , Transcription Factors
5.
J Med Chem ; 63(20): 11964-11971, 2020 Oct 22.
Article in English | MEDLINE | ID: mdl-32955254

ABSTRACT

Machine learning approaches promise to accelerate and improve success rates in medicinal chemistry programs by more effectively leveraging available data to guide molecular design. A key step of an automated computational design algorithm is molecule generation, where the machine is required to design high-quality, drug-like molecules within the appropriate chemical space. Many algorithms have been proposed for molecular generation; however, a challenge is how to assess the validity of the resulting molecules. Here, we report three Turing-inspired tests designed to evaluate the performance of molecular generators. Profound differences were observed between the performance of molecule generators in these tests, highlighting the importance of selecting the appropriate design algorithm for specific circumstances. One molecule generator, based on matched molecular pairs, performed excellently against all tests and thus provides a valuable component for machine-driven medicinal chemistry design workflows.


Subjects
Algorithms , Machine Learning , Chemistry, Pharmaceutical , Drug Design , Humans , Molecular Structure
6.
J Chem Inf Model ; 60(12): 5699-5713, 2020 Dec 28.
Article in English | MEDLINE | ID: mdl-32659085

ABSTRACT

Deep learning approaches have become popular in recent years in the field of de novo molecular design. While a variety of different methods are available, it is still a challenge to assess and compare their performance. A particularly promising approach for automated drug design is to use recurrent neural networks (RNNs) as SMILES generators and train them with the learning procedure called "transfer learning". This involves first training the initial model on a large generic data set of molecules to learn the general syntax of SMILES, followed by fine-tuning on a smaller set of molecules, coming from, e.g., a lead optimization program. To create a well-performing transfer learning application which can be automated, it is important to understand how the size of the second data set affects the training process. In addition, extensive postfiltering using similarity metrics of the molecules generated after transfer learning should be avoided, as it can introduce new biases toward the selection of drug candidates. Here, we present results from the application of a gated recurrent unit cell (GRU)-RNN to transfer learning on data sets of varying sizes and complexity. Analysis of the results has allowed us to provide some general guidelines for transfer learning. In particular, we show that data set sizes containing at least 190 molecules are needed for effective GRU-RNN-based molecular generation using transfer learning. The methods presented here should be applicable generally to the benchmarking of other deep learning methodologies for molecule generation.


Subjects
Drug Design , Neural Networks, Computer , Machine Learning
7.
J Chem Inf Model ; 59(3): 1136-1146, 2019 Mar 25.
Article in English | MEDLINE | ID: mdl-30525594

ABSTRACT

A key component of automated molecular design is the generation of compound ideas for subsequent filtering and assessment. Recently, deep learning approaches have been explored as alternatives to traditional de novo molecular design techniques. Deep learning algorithms rely on learning from large pools of molecules represented as molecular graphs (generally encoded as SMILES), and several approaches can be used to tailor the generated molecules to defined regions of chemical space. Cheminformatics has developed alternative higher-level representations that capture the key properties of a set of molecules, and it would be of interest to understand whether such representations can be used to constrain the output of molecule generation algorithms. In this work, we explore the use of one such representation, the Reduced Graph, as a definition of target chemical space for a deep learning molecule generator. The Reduced Graph replaces functional groups with superatoms representing the pharmacophoric features. Assigning these superatoms to specific nonorganic element types allows the Reduced Graph to be represented as a valid SMILES string. The mapping from standard SMILES to Reduced Graph SMILES is well-defined; however, the inverse is not, and this presents a particular challenge. Here we present the results of a novel seq-to-seq approach to molecule generation, in which the one-to-many mapping from Reduced Graph to SMILES is learned on a large training set. This training needs to be performed only once. In a subsequent step, the model can be used to generate arbitrary numbers of compounds that have the same Reduced Graph as any input molecule. Through analysis of data sets in ChEMBL, we show that the approach generates valid molecules and can extrapolate to Reduced Graphs unseen in the training set. The method offers an alternative deep learning approach to molecule generation that does not rely on transfer learning, latent space generation, or adversarial networks and is applicable to scaffold hopping and other cheminformatics applications in drug discovery.


Subjects
Deep Learning , Pharmaceutical Preparations/chemistry , Cheminformatics , Databases, Pharmaceutical , Drug Design , Models, Molecular , Molecular Structure
8.
SLAS Discov ; 23(6): 532-545, 2018 Jul.
Article in English | MEDLINE | ID: mdl-29699447

ABSTRACT

High-throughput screening (HTS) hits include compounds with undesirable properties. Many filters have been described to identify such hits. Notably, pan-assay interference compounds (PAINS) has been adopted by the community as the standard term to refer to such filters, and very useful guidelines have been adopted by the American Chemical Society (ACS) and subsequently triggered a healthy scientific debate about the pitfalls of draconian use of filters. Using an inhibitory frequency index, we have analyzed in detail the promiscuity profile of the whole GlaxoSmithKline (GSK) HTS collection comprising more than 2 million unique compounds that have been tested in hundreds of screening assays. We provide a comprehensive analysis of many previously published filters and newly described classes of nuisance structures that may serve as a useful source of empirical information to guide the design or growth of HTS collections and hit triaging strategies.


Subjects
Drug Discovery/methods , High-Throughput Screening Assays/methods , Small Molecule Libraries/chemistry , Biological Assay/methods
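The promiscuity analysis in this abstract rests on an inhibitory frequency index. A minimal sketch follows, assuming the index is the percentage of assays in which a compound was flagged active out of those in which it was tested. That definition, the activity cutoff behind `n_active`, and the `threshold` used for flagging are assumptions here; consult the paper for the exact criteria.

```python
def inhibitory_frequency_index(n_active, n_tested):
    """Percentage of screening assays in which a compound was active (assumed definition)."""
    if n_tested <= 0:
        raise ValueError("compound must have been tested in at least one assay")
    if not 0 <= n_active <= n_tested:
        raise ValueError("n_active must lie between 0 and n_tested")
    return 100.0 * n_active / n_tested

def flag_promiscuous(records, threshold):
    """Return compound IDs whose index meets a user-chosen threshold.

    records maps a compound ID to (n_active, n_tested); threshold is hypothetical."""
    return sorted(
        cid for cid, (n_active, n_tested) in records.items()
        if inhibitory_frequency_index(n_active, n_tested) >= threshold
    )
```

Normalizing by `n_tested` matters in a collection where compounds have been run in very different numbers of assays: a raw hit count would penalize compounds simply for having been screened more often.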
9.
J Med Chem ; 59(18): 8189-206, 2016 Sep 22.
Article in English | MEDLINE | ID: mdl-27124799

ABSTRACT

Fragment-based drug discovery (FBDD) is well suited for discovering both drug leads and chemical probes of protein function; it can cover broad swaths of chemical space and allows the use of creative chemistry. FBDD is widely implemented for lead discovery in industry but is sometimes used less systematically in academia. Design principles and implementation approaches for fragment libraries are continually evolving, and the lack of up-to-date guidance may prevent more effective application of FBDD in academia. This Perspective explores many of the theoretical, practical, and strategic considerations that occur within FBDD programs, including the optimal size, complexity, physicochemical profile, and shape profile of fragments in FBDD libraries, as well as compound storage, evaluation, and screening technologies. This compilation of industry experience in FBDD will hopefully be useful for those pursuing FBDD in academia.


Subjects
Drug Design , Small Molecule Libraries/chemistry , Small Molecule Libraries/pharmacology , Animals , Checkpoint Kinase 2/antagonists & inhibitors , HIV Integrase Inhibitors/chemistry , HIV Integrase Inhibitors/pharmacology , HSP90 Heat-Shock Proteins/antagonists & inhibitors , Humans , Matrix Metalloproteinase 12/metabolism , Matrix Metalloproteinase Inhibitors/chemistry , Matrix Metalloproteinase Inhibitors/pharmacology , Mitogen-Activated Protein Kinase 14/antagonists & inhibitors , Protein Kinase Inhibitors/chemistry , Protein Kinase Inhibitors/pharmacology , Trypsin Inhibitors/chemistry , Trypsin Inhibitors/pharmacology
10.
J Med Chem ; 59(6): 2452-67, 2016 Mar 24.
Article in English | MEDLINE | ID: mdl-26938474

ABSTRACT

Inhibitors of mitochondrial branched chain aminotransferase (BCATm), identified using fragment screening, are described. This was carried out using a combination of STD-NMR, thermal melt (Tm), and biochemical assays to identify compounds that bound to BCATm, which were subsequently progressed to X-ray crystallography, where a number of exemplars showed significant diversity in their binding modes. The hits identified were supplemented by searching and screening of additional analogues, which enabled the gathering of further X-ray data where the original hits had not produced liganded structures. The fragment hits were optimized using structure-based design, with some transfer of information between series, which enabled the identification of ligand efficient lead molecules with micromolar levels of inhibition, cellular activity, and good solubility.


Subjects
Mitochondria/enzymology , Transaminases/antagonists & inhibitors , Adipocytes/drug effects , Adipocytes/enzymology , Crystallography, X-Ray , High-Throughput Screening Assays , Humans , Magnetic Resonance Spectroscopy , Models, Molecular , Peptide Fragments/chemistry , Peptide Fragments/pharmacology , Protein Binding , Structure-Activity Relationship
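"Ligand efficient" in this abstract refers to binding free energy per heavy atom. The abstract does not give a formula, so the sketch below uses the widely used definition LE = -ΔG / N_heavy with -ΔG ≈ RT ln(10) · pIC50, which treats IC50 as a proxy for Kd (an assumption). At 298.15 K the prefactor RT ln(10) is about 1.364 kcal/mol.

```python
from math import log

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)

def ligand_efficiency(pic50, heavy_atoms, temperature=298.15):
    """Binding free energy per heavy atom in kcal/mol per heavy atom.

    Uses -dG = RT * ln(10) * pIC50, i.e. treats IC50 as a stand-in for Kd."""
    if heavy_atoms <= 0:
        raise ValueError("heavy atom count must be positive")
    return R_KCAL * temperature * log(10) * pic50 / heavy_atoms
```

For instance, a hypothetical micromolar inhibitor (pIC50 = 6) with 20 heavy atoms has LE of roughly 0.41; growing it to 30 heavy atoms would need a pIC50 near 9 to keep the same efficiency.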
11.
Nat Rev Drug Discov ; 14(7): 475-86, 2015 Jul.
Article in English | MEDLINE | ID: mdl-26091267

ABSTRACT

The pharmaceutical industry remains under huge pressure to address the high attrition rates in drug development. Attempts to reduce the number of efficacy- and safety-related failures by analysing possible links to the physicochemical properties of small-molecule drug candidates have been inconclusive because of the limited size of data sets from individual companies. Here, we describe the compilation and analysis of combined data on the attrition of drug candidates from AstraZeneca, Eli Lilly and Company, GlaxoSmithKline and Pfizer. The analysis reaffirms that control of physicochemical properties during compound optimization is beneficial in identifying compounds of candidate drug quality and indicates for the first time a link between the physicochemical properties of compounds and clinical failure due to safety issues. The results also suggest that further control of physicochemical properties is unlikely to have a significant effect on attrition rates and that additional work is required to address safety-related failures. Further cross-company collaborations will be crucial to future progress in this area.


Subjects
Drug Delivery Systems/methods , Drug Discovery/methods , Drug Industry/methods , Drugs, Investigational , Animals , Drug Delivery Systems/statistics & numerical data , Drug Delivery Systems/trends , Drug Discovery/statistics & numerical data , Drug Discovery/trends , Drug Evaluation, Preclinical/methods , Drug Evaluation, Preclinical/statistics & numerical data , Drug Evaluation, Preclinical/trends , Drug Industry/statistics & numerical data , Drug Industry/trends , Drugs, Investigational/administration & dosage , Humans , Statistics as Topic/methods , Statistics as Topic/trends
12.
J Med Chem ; 58(18): 7140-63, 2015 Sep 24.
Article in English | MEDLINE | ID: mdl-26090771

ABSTRACT

The hybridization of hits, identified by complementary fragment and high throughput screens, enabled the discovery of the first series of potent inhibitors of mitochondrial branched-chain aminotransferase (BCATm) based on a 2-benzylamino-pyrazolo[1,5-a]pyrimidinone-3-carbonitrile template. Structure-guided growth enabled rapid optimization of potency with maintenance of ligand efficiency, while the focus on physicochemical properties delivered compounds with excellent pharmacokinetic exposure that enabled a proof of concept experiment in mice. Oral administration of 2-((4-chloro-2,6-difluorobenzyl)amino)-7-oxo-5-propyl-4,7-dihydropyrazolo[1,5-a]pyrimidine-3-carbonitrile 61 significantly raised the circulating levels of the branched-chain amino acids leucine, isoleucine, and valine in this acute study.


Subjects
Mitochondrial Proteins/antagonists & inhibitors , Pyrazoles/chemistry , Pyrimidinones/chemistry , Transaminases/antagonists & inhibitors , Adipocytes/drug effects , Adipocytes/enzymology , Animals , Crystallography, X-Ray , Humans , Isoleucine/blood , Leucine/blood , Mice, Inbred BALB C , Mice, Inbred C57BL , Models, Molecular , Pyrazoles/chemical synthesis , Pyrazoles/pharmacology , Pyrimidinones/chemical synthesis , Pyrimidinones/pharmacology , Structure-Activity Relationship , Transaminases/chemistry , Valine/blood
13.
J Comput Aided Mol Des ; 27(4): 321-36, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23615761

ABSTRACT

We describe the QSAR Workbench, a system for building and analyzing QSAR models. The system is built around the Pipeline Pilot workflow tool and provides access to a variety of model-building algorithms for both continuous and categorical data. Traditionally, models are built one at a time, and fully exploring the model space of algorithms and descriptor subsets is time-consuming. The QSAR Workbench provides a framework that allows multiple models to be built over a number of modeling algorithms, descriptor combinations, and data splits (training and test sets). Methods to analyze and compare models are provided, enabling the user to select the most appropriate model. The Workbench provides a consistent set of routines for data preparation and chemistry normalization that are also applied for predictions. The Workbench provides a large degree of automation, with the ability to publish preconfigured model-building workflows for a variety of problem domains, whilst providing experienced users full access to the underlying parameterization if required. Methods are provided to allow selected models to be published as web services, thus providing integration with the chemistry desktop. We describe the design and implementation of the QSAR Workbench and demonstrate its utility through application to two public domain datasets.


Subjects
Drug Design , Models, Biological , Quantitative Structure-Activity Relationship , Algorithms , Databases, Pharmaceutical , Humans , Workflow
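Building "multiple models over a number of modeling algorithms, descriptor combinations and data splits" amounts to enumerating a Cartesian product of choices. The sketch below shows that enumeration in plain Python; it does not use Pipeline Pilot, and the algorithm, descriptor-set, and split labels (and the `model_grid` name) are hypothetical placeholders.

```python
from itertools import combinations, product

def model_grid(algorithms, descriptor_sets, splits, max_subset=None):
    """Yield one model-building job per (algorithm, descriptor subset, data split)."""
    if max_subset is None:
        max_subset = len(descriptor_sets)
    # Every non-empty combination of descriptor sets, up to max_subset members.
    subsets = [combo for size in range(1, max_subset + 1)
               for combo in combinations(descriptor_sets, size)]
    for algorithm, subset, split in product(algorithms, subsets, splits):
        yield {"algorithm": algorithm, "descriptors": subset, "split": split}

jobs = list(model_grid(["rf", "svm"], ["2d", "fp"], ["random", "cluster"]))
```

Two algorithms, three descriptor subsets, and two splits already give 12 jobs, which is why capping `max_subset` (or sampling the grid) becomes necessary as the number of descriptor sets grows.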
14.
ACS Med Chem Lett ; 2(1): 28-33, 2011 Jan 13.
Article in English | MEDLINE | ID: mdl-24900251

ABSTRACT

Traditional lead optimization projects involve long synthesis and testing cycles, favoring extensive structure-activity relationship (SAR) analysis and molecular design steps, in an attempt to limit the number of cycles that a project must run to optimize a development candidate. Microfluidic-based chemistry and biology platforms, with cycle times of minutes rather than weeks, lend themselves to unattended autonomous operation. The bottleneck in the lead optimization process is therefore shifted from synthesis or test to SAR analysis and design. As such, the way is open to an algorithm-directed process, without the need for detailed user data analysis. Here, we present results of two synthesis and screening experiments, undertaken using traditional methodology, to validate a genetic algorithm optimization process for future application to a microfluidic system. The algorithm has several novel features that are important for the intended application. For example, it is robust to missing data and can suggest compounds for retest to ensure reliability of optimization. The algorithm is first validated on a retrospective analysis of an in-house library embedded in a larger virtual array of presumed inactive compounds. In a second, prospective experiment with MMP-12 as the target protein, 140 compounds are submitted for synthesis over 10 cycles of optimization. Comparison is made to the results from the full combinatorial library that was synthesized manually and tested independently. The results show that compounds selected by the algorithm are heavily biased toward the more active regions of the library, while the algorithm is robust to both missing data (compounds where synthesis failed) and inactive compounds. This publication places the full combinatorial library and biological data into the public domain with the intention of advancing research into algorithm-directed lead optimization methods.
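The algorithm-directed loop described above can be illustrated with a toy genetic algorithm over a two-component combinatorial library, where each compound is a pair (reagent A index, reagent B index). This is a hypothetical sketch, not the published algorithm: it models only one of the features the abstract mentions (robustness to missing data, by ranking failed syntheses last) and omits retest suggestions. The `assay` callback, population size, and mutation rate are assumptions.

```python
import random

def run_ga(assay, n_a, n_b, pop_size=8, generations=10, seed=0):
    """Toy genetic algorithm over an n_a x n_b two-component library.

    assay(a, b) returns an activity (higher is better) or None when the
    compound could not be made or measured (missing data)."""
    rng = random.Random(seed)
    results = {}  # every compound "made and tested" so far; each is tested once

    def fitness(member):
        if member not in results:
            results[member] = assay(*member)
        score = results[member]
        return float("-inf") if score is None else score  # missing data ranks last

    population = [(rng.randrange(n_a), rng.randrange(n_b)) for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            p1, p2 = rng.sample(parents, 2)
            child = (p1[0], p2[1])  # crossover: reagent A from one parent, B from the other
            if rng.random() < 0.3:  # mutation: swap in a random reagent
                if rng.random() < 0.5:
                    child = (rng.randrange(n_a), child[1])
                else:
                    child = (child[0], rng.randrange(n_b))
            children.append(child)
        population = parents + children
    scored = {m: s for m, s in results.items() if s is not None}
    best = max(scored, key=scored.get) if scored else None
    return best, results
```

Caching results means each compound is "synthesized" at most once per run, and a failed synthesis simply drops out of selection instead of stopping the optimization.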

15.
Drug Discov Today ; 16(3-4): 164-71, 2011 Feb.
Article in English | MEDLINE | ID: mdl-21129497

ABSTRACT

The impact of carboaromatic, heteroaromatic, carboaliphatic and heteroaliphatic ring counts and fused aromatic ring count on several developability measures (solubility, lipophilicity, protein binding, P450 inhibition and hERG binding) is the topic for this review article. Recent results indicate that increasing ring counts have detrimental effects on developability in the order carboaromatics≫heteroaromatics>carboaliphatics>heteroaliphatics, with heteroaliphatics exerting a beneficial effect in many cases. Increasing aromatic ring count exerts effects on several developability parameters that are lipophilicity- and size-independent, and fused aromatic systems have a beneficial effect relative to their nonfused counterparts. Increasing aromatic ring count has a detrimental effect on human bioavailability parameters, and heteroaromatic ring count (but not other ring counts) has increased over time in marketed oral drugs.


Subjects
Drug Design , Heterocyclic Compounds/chemistry , Hydrocarbons, Aromatic/chemistry , Pharmaceutical Preparations/chemistry , Administration, Oral , Heterocyclic Compounds/chemical synthesis , Humans , Hydrocarbons, Aromatic/chemical synthesis , Marketing/statistics & numerical data , Pharmaceutical Preparations/administration & dosage , Pharmaceutical Preparations/chemical synthesis , Pharmacokinetics , Solubility , Structure-Activity Relationship
16.
J Chem Inf Model ; 50(10): 1872-86, 2010 Oct 25.
Article in English | MEDLINE | ID: mdl-20873842

ABSTRACT

Previous studies of the analysis of molecular matched pairs (MMPs) have often assumed that the effect of a substructural transformation on a molecular property is independent of the context (i.e., the local structural environment in which that transformation occurs). Experiments with large sets of hERG, solubility, and lipophilicity data demonstrate that the inclusion of contextual information can enhance the predictive power of MMP analyses, with significant trends (both positive and negative) being identified that are not apparent when using conventional, context-independent approaches.


Subjects
Drug Design , Ether-A-Go-Go Potassium Channels/antagonists & inhibitors , Ether-A-Go-Go Potassium Channels/metabolism , Algorithms , Databases, Factual , Ether-A-Go-Go Potassium Channels/chemistry , Humans , Ligands , Lipids/chemistry , Molecular Structure , Solubility
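The contrast drawn in this abstract, between context-independent and context-dependent matched-pair analysis, can be shown by grouping the same property changes two ways. In the sketch below each pair is a hypothetical (transformation, context, delta) record; the context labels stand in for whatever description of the local structural environment is used and are not the paper's encoding.

```python
from collections import defaultdict
from statistics import mean

def summarize(pairs, use_context):
    """Mean property change per transformation, optionally split by local context."""
    groups = defaultdict(list)
    for transformation, context, delta in pairs:
        key = (transformation, context) if use_context else transformation
        groups[key].append(delta)
    return {key: mean(deltas) for key, deltas in groups.items()}

# Hypothetical pairs: the same H->F change measured in two environments.
pairs = [
    ("H>>F", "aryl", 0.4), ("H>>F", "aryl", 0.6),
    ("H>>F", "alkyl", -0.4), ("H>>F", "alkyl", -0.6),
]
pooled = summarize(pairs, use_context=False)
by_context = summarize(pairs, use_context=True)
```

Pooled, the transformation appears to do nothing (mean change 0.0); split by context, it raises the property by about 0.5 in one environment and lowers it by about 0.5 in the other, which is the kind of trend the abstract says a context-independent analysis hides.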
17.
J Chem Inf Model ; 49(2): 195-208, 2009 Feb.
Article in English | MEDLINE | ID: mdl-19434823

ABSTRACT

Neighborhood behavior describes the extent to which small structural changes defined by a molecular descriptor are likely to lead to small property changes. This study evaluates two methods for the quantification of neighborhood behavior: the optimal diagonal method of Patterson et al. and the optimality criterion method of Horvath and Jeandenans. The methods are evaluated using twelve different types of fingerprint (both 2D and 3D) with screening data derived from several lead optimization projects at GlaxoSmithKline. The principal focus of the work is the design of chemical arrays during lead optimization, and the study hence considers not only biological activity but also important drug properties such as metabolic stability, permeability, and lipophilicity. Evidence is provided to suggest that the optimality criterion method may provide a better quantitative description of neighborhood behavior than the optimal diagonal method.


Subjects
Drug Design , Permeability
18.
J Chem Inf Model ; 48(8): 1543-57, 2008 Aug.
Article in English | MEDLINE | ID: mdl-18630899

ABSTRACT

A new machine learning method is presented for extracting interpretable structure-activity relationships from screening data. The method is based on an evolutionary algorithm and reduced graphs and aims to evolve a reduced graph query (subgraph) that is present within the active compounds and absent from the inactives. The reduced graph representation enables heterogeneous compounds, such as those found in high-throughput screening data, to be captured in a single representation with the resulting query encoding structure-activity information in a form that is readily interpretable by a chemist. The application of the method is illustrated using data sets extracted from the well-known MDDR data set and GSK in-house screening data. Queries are evolved that are consistent with the known SARs, and they are also shown to be robust when applied to independent sets that were not used in training.


Subjects
Combinatorial Chemistry Techniques/methods , Algorithms , Chromosomes/genetics , Humans , Phenotype , Receptor, Serotonin, 5-HT1A/metabolism , Serotonin 5-HT1 Receptor Agonists , Structure-Activity Relationship
19.
J Chem Inf Model ; 48(8): 1558-70, 2008 Aug.
Article in English | MEDLINE | ID: mdl-18637673

ABSTRACT

A multiobjective evolutionary algorithm (MOEA) is described for evolving multiple structure-activity relationships (SARs). The SARs are encoded in easy-to-interpret reduced graph queries which describe features that are preferentially present in active compounds compared to inactives. The MOEA addresses a limitation associated with many machine learning methods, namely the inherent tradeoff between recall and precision, which is usually handled by combining the two objectives into a single measure with a consequent loss of control. By simultaneously optimizing recall and precision, the MOEA generates a family of SARs that lie on the precision-recall (PR) curve. The user is then able to select a query with an appropriate balance in the two objectives: for example, a low recall-high precision query may be preferred when establishing the SAR, whereas a high recall-low precision query may be more appropriate in a virtual screening context. Each query on the PR curve aims to capture the structure-activity information in a single representation, and each can be considered an alternative (equally valid) solution. We then investigate combining individual queries into teams with the aim of capturing multiple SARs that may exist in a data set, as is commonly seen in high-throughput screening data sets. Team formation is carried out iteratively as a postprocessing step following the evolution of the individual queries. The inclusion of uniqueness as a third objective within the MOEA provides an effective way of ensuring the queries are complementary in the active compounds they describe. Substantial improvements in both recall and precision are seen for some data sets. Furthermore, the resulting queries provide more detailed structure-activity information than is present in a single query.


Subjects
Models, Biological , Algorithms , Humans , Molecular Structure , Receptors, Serotonin, 5-HT1/metabolism , Serotonin 5-HT1 Receptor Agonists , Structure-Activity Relationship
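Optimizing precision and recall simultaneously means keeping the queries that no other query beats on both objectives at once (the Pareto front). The sketch below scores queries from the sets of compound IDs they match and extracts that nondominated set; query names and compound IDs are hypothetical, and the evolutionary search that would generate the queries is not shown.

```python
def precision_recall(matched, actives):
    """Precision and recall of a query, given the IDs it matches and the active IDs."""
    tp = len(matched & actives)
    precision = tp / len(matched) if matched else 0.0
    recall = tp / len(actives) if actives else 0.0
    return precision, recall

def pareto_front(scores):
    """Names of queries not dominated in (precision, recall).

    scores maps a query name to its (precision, recall) pair."""
    front = []
    for name, (p, r) in scores.items():
        dominated = any(
            p2 >= p and r2 >= r and (p2 > p or r2 > r)
            for other, (p2, r2) in scores.items() if other != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)

actives = {1, 2, 3, 4}
scores = {
    "q1": precision_recall({1, 2}, actives),                    # (1.0, 0.5)
    "q2": precision_recall({1, 2, 3, 4, 5, 6, 7, 8}, actives),  # (0.5, 1.0)
    "q3": precision_recall({1, 5}, actives),                    # (0.5, 0.25)
}
```

Here `q1` (high precision) and `q2` (high recall) both survive as equally valid trade-offs, while `q3` is dominated by `q1`, matching the abstract's point that users pick a point on the front to suit SAR analysis or virtual screening.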
20.
J Chem Inf Model ; 47(1): 219-27, 2007.
Article in English | MEDLINE | ID: mdl-17238267

ABSTRACT

We present a comparative assessment of several state-of-the-art machine learning tools for mining drug data, including support vector machines (SVMs) and the ensemble decision tree methods boosting, bagging, and random forest, using eight data sets and two sets of descriptors. We demonstrate, by rigorous multiple comparison statistical tests, that these techniques can provide consistent improvements in predictive performance over single decision trees. However, within these methods, there is no clearly best-performing algorithm. This motivates a more in-depth investigation into the properties of random forests. We identify a set of parameters for the random forest that provide optimal performance across all the studied data sets. Additionally, the tree ensemble structure of the forest may provide an interpretable model, a considerable advantage over SVMs. We test this possibility and compare it with standard decision tree models.


Subjects
Models, Statistical , Quantitative Structure-Activity Relationship , Algorithms , Artificial Intelligence , Classification