1.
J Comput Aided Mol Des ; 38(1): 13, 2024 Mar 17.
Article in English | MEDLINE | ID: mdl-38493240

ABSTRACT

The growing size of make-on-demand chemical libraries is posing new challenges to cheminformatics. These ultra-large chemical libraries have become too large for exhaustive enumeration. Using a combinatorial approach instead, the resource requirement scales approximately with the number of synthons instead of the number of molecules. This gives access to billions or trillions of compounds as so-called chemical spaces with moderate hardware and in a reasonable time frame. While extremely performant ligand-based 2D methods exist in this context, 3D methods still largely rely on exhaustive enumeration and therefore fail to apply. Here, we present SpaceGrow: a novel shape-based 3D approach for ligand-based virtual screening of billions of compounds within hours on a single CPU. Compared to a conventional superposition tool, SpaceGrow shows comparable pose reproduction capacity based on RMSD and superior ranking performance while being orders of magnitude faster. Result assessment of two differently sized subsets of the eXplore space reveals a higher probability of finding superior results in larger spaces, highlighting the potential of searching in ultra-large spaces. Furthermore, the application of SpaceGrow in a drug discovery workflow was investigated in four examples involving G protein-coupled receptors (GPCRs) with the aim of identifying compounds with similar binding capabilities and molecular novelty.


Subjects
Drug Discovery; Small Molecule Libraries; Ligands; Small Molecule Libraries/chemistry; Drug Discovery/methods
2.
J Chem Inf Model ; 64(1): 219-237, 2024 Jan 08.
Article in English | MEDLINE | ID: mdl-38108627

ABSTRACT

Molecular docking is a standard technique in structure-based drug design (SBDD). It aims to predict the 3D structure of a small molecule in the binding site of a receptor (often a protein). Despite being a common technique, it often necessitates multiple tools and involves manual steps. Here, we present the JAMDA preprocessing and docking workflow that is easy to use and allows fully automated docking. We evaluate the JAMDA docking workflow on binding sites extracted from the complete PDB and derive key factors determining JAMDA's docking performance. With that, we try to remove most of the bias due to manual intervention and provide a realistic estimate of the redocking performance of our JAMDA preprocessing and docking workflow for any PDB structure. On this large PDBScan22 data set, our JAMDA workflow finds a pose with an RMSD of at most 2 Å to the crystal ligand on the top rank for 30.1% of the structures. When applying objective structure quality filters to the PDBScan22 data set, the success rate increases to 61.8%. Given the prepared structures from the JAMDA preprocessing pipeline, both JAMDA and the widely used AutoDock Vina perform comparably on this filtered data set (the PDBScan22-HQ data set).


Subjects
Drug Design; Molecular Docking Simulation; Binding Sites; Ligands; Protein Binding
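
The top-rank success criterion quoted in the abstract above (a pose with an RMSD of at most 2 Å to the crystal ligand) can be illustrated with a minimal Python sketch. It assumes that the predicted and crystal coordinates are given in the same atom order and the same reference frame, and that no symmetry correction is needed. The function names `rmsd` and `success_rate` are hypothetical and are not part of JAMDA.

```python
import math


def rmsd(pose, reference):
    """RMSD in Å between two equally ordered coordinate lists (no superposition)."""
    if len(pose) != len(reference) or not pose:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
        for (px, py, pz), (rx, ry, rz) in zip(pose, reference)
    )
    return math.sqrt(squared / len(pose))


def success_rate(top_ranked_rmsds, threshold=2.0):
    """Fraction of structures whose top-ranked pose lies within `threshold` Å."""
    hits = sum(1 for value in top_ranked_rmsds if value <= threshold)
    return hits / len(top_ranked_rmsds)
```

In redocking, the RMSD is usually computed in place, without superimposing the pose onto the crystal ligand, which is why the sketch omits an alignment step.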
3.
Acta Crystallogr D Struct Biol ; 79(Pt 9): 837-856, 2023 Sep 01.
Article in English | MEDLINE | ID: mdl-37561404

ABSTRACT

Due to the structural complexity of proteins, their corresponding crystal arrangements generally contain a significant amount of solvent-occupied space. These areas allow a certain degree of intracrystalline protein flexibility and mobility of solutes. Therefore, knowledge of the geometry of solvent-filled channels and cavities is essential whenever the dynamics inside a crystal are of interest. Especially in soaking experiments for structure-based drug design, ligands must be able to traverse the crystal solvent channels and reach the corresponding binding pockets. Unsuccessful screenings are sometimes attributed to the geometry of the crystal packing, but the underlying causes are often difficult to understand. This work presents LifeSoaks, a novel tool for analyzing and visualizing solvent channels in protein crystals. LifeSoaks uses a Voronoi diagram-based periodic channel representation which can be efficiently computed. The size and location of channel bottlenecks, which might hinder molecular diffusion, can be directly derived from this representation. This work presents the calculated bottleneck radii for all crystal structures in the PDB and the analysis of a new, hand-curated data set of structures obtained by soaking experiments. The results indicate that the consideration of bottleneck radii and the visual inspection of channels are beneficial for planning soaking experiments.


Subjects
Proteins; Solvents; Proteins/chemistry
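
The idea of a channel bottleneck described in the abstract above can be sketched as a widest-path search on a graph. The sketch assumes a small, non-periodic toy graph whose edges carry the radius of the largest sphere that can pass along them. It does not reproduce the Voronoi-based periodic representation used by LifeSoaks, and the function name `bottleneck_radius` is hypothetical.

```python
import heapq


def bottleneck_radius(edges, start, goal):
    """Largest r such that start and goal are connected using only edges with radius >= r.

    `edges` is an iterable of (node_a, node_b, radius) tuples; returns 0.0 if unreachable.
    """
    graph = {}
    for u, v, radius in edges:
        graph.setdefault(u, []).append((v, radius))
        graph.setdefault(v, []).append((u, radius))

    best = {start: float("inf")}
    heap = [(-float("inf"), start)]  # max-heap on path width via negated values
    while heap:
        neg_width, node = heapq.heappop(heap)
        width = -neg_width
        if node == goal:
            return width
        if width < best.get(node, 0.0):
            continue  # stale heap entry
        for neighbor, radius in graph.get(node, []):
            new_width = min(width, radius)
            if new_width > best.get(neighbor, 0.0):
                best[neighbor] = new_width
                heapq.heappush(heap, (-new_width, neighbor))
    return 0.0
```

A ligand whose effective radius exceeds the returned value could not pass between the two points in this simplified model.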
4.
J Comput Aided Mol Des ; 36(9): 639-651, 2022 09.
Article in English | MEDLINE | ID: mdl-35989379

ABSTRACT

Fragment-based drug design is an established routine approach in both experimental and computational spheres. Growing fragment hits into viable ligands has increasingly shifted into the spotlight. FastGrow is an application based on a shape search algorithm that addresses this challenge at high speeds of a few milliseconds per fragment. It further features a pharmacophoric interaction description, ensemble flexibility, as well as geometry optimization to become a fully fledged structure-based modeling tool. All features were evaluated in detail on a previously reported collection of fragment growing scenarios extracted from crystallographic data. FastGrow was also shown to perform competitively versus established docking software. A case study on the DYRK1A kinase, using recently reported new chemotypes, illustrates FastGrow's features in practice and its ability to identify active fragments. FastGrow is freely available to the public as a web server at https://fastgrow.plus/ and is part of the SeeSAR 3D software package.


Subjects
Drug Design; Software; Algorithms; Ligands
5.
Nucleic Acids Res ; 50(W1): W611-W615, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35489057

ABSTRACT

With the ever-increasing number of publicly available experimentally determined and predicted protein and nucleic acid structures, the demand for easy-to-use tools to investigate these structural models is higher than ever before. The ProteinsPlus web server (https://proteins.plus) comprises a growing collection of molecular modeling tools focusing on protein-ligand interactions. It enables quick access to structural investigations ranging from structure analytics and search methods to molecular docking. It is by now well-established in the community and constantly extended. The server gives easy access not only to experts but also to students and occasional users from the field of life sciences. Here, we describe its recently added features and tools, among them a novel method for on-the-fly molecular docking and a search method for single-residue substitutions in local regions of a protein structure throughout the whole Protein Data Bank. Finally, we provide a glimpse into new avenues for the annotation of AlphaFold structures which are directly accessible via a RESTful service on the ProteinsPlus web server.


Subjects
Proteins; Software; Molecular Docking Simulation; Proteins/chemistry; Models, Molecular; Internet
6.
J Comput Chem ; 42(15): 1095-1100, 2021 06 05.
Article in English | MEDLINE | ID: mdl-33904606

ABSTRACT

Numerical optimization is a common technique in various areas of computational chemistry, molecular modeling and drug design. It is a key element of 3D techniques, for example, the optimization of protein-ligand poses and small-molecule conformers. Here, often the BFGS algorithm or variants thereof are used. However, the BFGS algorithm tends to make unreasonably large changes to the optimized system under certain circumstances. This behavior has been known for a long time and different solutions have been suggested. Recently, we have analyzed the optimization behavior of our novel JAMDA scoring function in detail and proposed the limited step length (LSL)-BFGS algorithm as a new solution to the problem of excessively large steps during optimization. The LSL-BFGS algorithm allows the step sizes to be controlled during optimization. Its unique feature is the inclusion of arbitrary domain knowledge into the selection of the step sizes. Here, we introduce the open-source LSLOpt C++ library that implements this LSL-BFGS algorithm and demonstrate its usage.
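
The general idea of limiting the step length, as motivated in the abstract above, can be sketched in Python. This is only a generic norm clamp applied to a proposed update vector. It is an assumption for illustration and does not reproduce the LSL-BFGS algorithm or the LSLOpt API, whose details are not given here. The function name `limit_step` is hypothetical.

```python
import math


def limit_step(step, max_length):
    """Rescale `step` so that its Euclidean norm does not exceed `max_length`."""
    norm = math.sqrt(sum(component * component for component in step))
    if norm == 0.0 or norm <= max_length:
        return list(step)  # already short enough, keep direction and length
    scale = max_length / norm
    return [component * scale for component in step]
```

An optimizer could apply such a clamp to each proposed update before evaluating the objective, so that a pose cannot move farther than a chosen distance in a single iteration.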

7.
J Chem Inf Model ; 60(12): 6502-6522, 2020 12 28.
Article in English | MEDLINE | ID: mdl-33258376

ABSTRACT

Scoring and numerical optimization of protein-ligand poses is an integral part of docking tools. Although many scoring functions exist, many of them are not continuously differentiable and they are rarely explicitly analyzed with respect to their numerical optimization behavior. Here, we present a consistent scheme for pose scoring and gradient-based pose optimization. It consists of a novel variant of the BFGS algorithm enabling step-length control, named LSL-BFGS (limited step length BFGS), and the empirical JAMDA scoring function designed for pose prediction and good numerical optimizability. The JAMDA scoring function shows a high pose prediction performance in the CASF-2016 docking power benchmark, top-ranking a pose with an RMSD of ≤2 Å in about 89% of the cases. The combination of JAMDA scoring with the LSL-BFGS algorithm shows a significantly higher optimization locality (i.e., no excessive movement of poses) than with the classical BFGS algorithm while retaining the characteristically low number of scoring function evaluations. The JAMDA scoring and optimization scheme is freely available for noncommercial use and academic research.


Subjects
Algorithms; Proteins; Benchmarking; Ligands; Molecular Docking Simulation; Protein Binding; Proteins/metabolism
8.
Nucleic Acids Res ; 48(W1): W48-W53, 2020 07 02.
Article in English | MEDLINE | ID: mdl-32297936

ABSTRACT

Given the increasing number of publicly available protein structures, searching, enriching and investigating these data remains a challenging task. The ProteinsPlus web service (https://proteins.plus) offers a broad range of tools addressing these challenges. The web interface to the tool collection focusing on protein-ligand interactions has been geared towards easy and intuitive access to a large variety of functionality for life scientists. Since our last publication, the ProteinsPlus web service has been extended by additional services and has undergone substantial infrastructural improvements. A keyword search functionality was added on the start page of ProteinsPlus, enabling users to work on structures without knowing their PDB code. The tool collection has been augmented by three tools: StructureProfiler validates ligands and active sites using selection criteria of well-established protein-ligand benchmark data sets, WarPP places water molecules in the ligand binding sites of a protein, and METALizer calculates, predicts and scores coordination geometries of metal ions based on surrounding complex atoms. Additionally, all tools provided by ProteinsPlus are available through a REST service, enabling the automated integration in structure processing and modeling pipelines.


Subjects
Proteins/chemistry; Software; Binding Sites; Ligands; Metals/chemistry; Models, Molecular; Proteins/metabolism; Water/chemistry
9.
FEBS Open Bio ; 10(4): 580-592, 2020 04.
Article in English | MEDLINE | ID: mdl-32031736

ABSTRACT

Type VII collagen is an extracellular matrix protein, which is important for skin stability; however, detailed information at the molecular level is scarce. The second vWFA (von Willebrand factor type A) domain of type VII collagen mediates important interactions, and immunization of mice induces skin blistering in certain strains. To understand vWFA2 function and the pathophysiological mechanisms leading to skin blistering, we structurally characterized this domain by X-ray crystallography and NMR spectroscopy. Cell adhesion assays identified two new interactions: one with β1 integrin via its RGD motif and one with laminin-332. The latter interaction was confirmed by surface plasmon resonance with a KD of about 1 mM. These data show that vWFA2 has additional functions in the extracellular matrix besides interacting with type I collagen.


Subjects
Collagen Type VII/chemistry; Collagen Type VII/metabolism; Protein Domains; von Willebrand Factor/chemistry; von Willebrand Factor/metabolism; Amino Acid Motifs; Amino Acid Sequence; Animals; Autoantibodies/immunology; Binding Sites; Blister/immunology; Blister/metabolism; Cell Adhesion; Collagen Type I/metabolism; Epidermolysis Bullosa Acquisita/immunology; Epidermolysis Bullosa Acquisita/metabolism; Extracellular Matrix/metabolism; HaCaT Cells; Humans; Integrin beta1/chemistry; Integrin beta1/metabolism; Laminin/metabolism; Mice; Protein Binding; Protein Domains/immunology; Skin/metabolism; von Willebrand Factor/immunology
10.
J Chem Inf Model ; 59(3): 947-961, 2019 03 25.
Article in English | MEDLINE | ID: mdl-30835112

ABSTRACT

Reports of successful applications of machine learning (ML) methods in structure-based virtual screening (SBVS) are increasing. ML methods such as convolutional neural networks show promising results and often outperform traditional methods such as empirical scoring functions in retrospective validation. However, trained ML models are often treated as black boxes and are not straightforwardly interpretable. In most cases, it is unknown which features in the data are decisive and whether a model's predictions are right for the right reason. Hence, we re-evaluated three widely used benchmark data sets in the context of ML methods and came to the conclusion that not every benchmark data set is suitable. Moreover, we demonstrate on two examples from current literature that bias is learned implicitly and unnoticed from standard benchmarks. On the basis of these results, we conclude that there is a need for eligible validation experiments and benchmark data sets suited to ML for more bias-controlled validation in ML-based SBVS. Therefore, we provide guidelines for setting up validation experiments and give a perspective on how new data sets could be generated.


Subjects
Bias; Machine Learning; Benchmarking/methods; Databases, Factual; Ligands; Molecular Docking Simulation/methods; Molecular Structure; Retrospective Studies; Structure-Activity Relationship
11.
J Chem Inf Model ; 59(2): 731-742, 2019 02 25.
Article in English | MEDLINE | ID: mdl-30747530

ABSTRACT

Computer-aided drug design methods such as docking, pharmacophore searching, 3D database searching, and the creation of 3D-QSAR models need conformational ensembles to handle the flexibility of small molecules. Here, we present Conformator, an accurate and effective knowledge-based algorithm for generating conformer ensembles. With 99.9% of all test molecules processed, Conformator stands out by its robustness with respect to input formats, molecular geometries, and the handling of macrocycles. With an extended set of rules for sampling torsion angles, a novel algorithm for macrocycle conformer generation, and a new clustering algorithm for the assembly of conformer ensembles, Conformator reaches a median minimum root-mean-square deviation (measured between protein-bound ligand conformations and ensembles of a maximum of 250 conformers) of 0.47 Å with no significant difference to the highest-ranked commercial algorithm OMEGA and significantly higher accuracy than seven free algorithms, including the RDKit DG algorithm. Conformator is freely available for noncommercial use and academic research.


Subjects
Drug Design; Molecular Conformation; Algorithms; Cluster Analysis; Macrocyclic Compounds/chemistry; Models, Molecular; Quantitative Structure-Activity Relationship; Time Factors
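
The accuracy measure quoted in the abstract above, the median minimum RMSD, can be written out as a short Python sketch. It assumes that the RMSD of every generated conformer to the protein-bound conformation has already been computed for each ligand. The function name `median_minimum_rmsd` and the input layout are hypothetical.

```python
from statistics import median


def median_minimum_rmsd(rmsds_per_ligand):
    """Median over ligands of the best (lowest) RMSD found in each conformer ensemble.

    `rmsds_per_ligand` holds one non-empty list of RMSD values (in Å) per ligand.
    """
    best_per_ligand = [min(values) for values in rmsds_per_ligand]
    return median(best_per_ligand)
```

Taking the minimum per ligand asks whether the ensemble contains at least one conformer close to the bound state, and the median summarizes this over the whole benchmark set.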
12.
Bioinformatics ; 35(5): 874-876, 2019 03 01.
Article in English | MEDLINE | ID: mdl-30124779

ABSTRACT

MOTIVATION: Three-dimensional protein structures are important starting points for elucidating protein function and applications like drug design. Computational methods in this area rely on high quality validation datasets which are usually manually assembled. Due to the increase in published structures as well as the increasing demand for specially tailored validation datasets, automatic procedures should be adopted. RESULTS: StructureProfiler is a new tool for automatic, objective and customizable profiling of X-ray protein structures based on the most frequently applied selection criteria currently in use to assemble benchmark datasets. As examples, four dataset configurations (Astex, Iridium, Platinum, combined), all results of the combined tests and the list of all PDB Ids passing the combined criteria set are attached in the Supplementary Material. AVAILABILITY AND IMPLEMENTATION: StructureProfiler is available as part of the ProteinsPlus web service http://proteins.plus and as standalone tool in the NAOMI ChemBio Suite. Dataset updates together with the tool can be found on http://www.zbh.uni-hamburg.de/structureprofiler. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Software; Computational Biology; Drug Design; Proteins
13.
Eur J Med Chem ; 163: 747-762, 2019 Feb 01.
Article in English | MEDLINE | ID: mdl-30576905

ABSTRACT

For decades, de novo design of small molecules has been used intensively, and fragment-based drug discovery (FBDD) approaches are still gaining in popularity. Recent publications considering synthetically feasible de novo drug design underline the ongoing need for new methods. Algorithms and tools are under continuous development, yet a combination of intuitive usage, acceptable runtime, and a thoroughly evaluated workflow on large-scale data sets is still a rarity. Here, we present an intuitive approach for constrained synthetically feasible fragment growing. Starting from a fragment within its crystallized structure, building blocks are attached via covalent bond formation to build up larger ligands. Iteratively, conformations are generated inside the binding site and scored to find the most suitable one. To cope with the combinatorial explosion of large flexible building blocks, a novel dynamic adaptation algorithm is introduced. The technique achieves low runtimes while keeping high accuracies. The developed workflow is evaluated on a large-scale data set of 264 co-crystallized fragments with their corresponding elaborated ligands. Using our approach for fragment-based ligand growing, we were able to generate putative ligands within an RMSD of less than 2 Å compared to their crystallized structures. Additionally, we were able to show the benefit of a monolithic, tethered-docking-like methodology compared to state-of-the-art docking. We incorporated our method, NAOMInext, in a clearly arranged graphical user interface that assists the user by defining valuable constraints to improve and accelerate the sampling workflow. In combination with predefined synthetic reaction rules, NAOMInext efficiently suggests ideas for the next generation of novel lead compounds.


Subjects
Algorithms; Drug Discovery; Workflow; Crystallography; Ligands; Molecular Conformation; Molecular Docking Simulation/methods; Structure-Activity Relationship
14.
J Chem Inf Model ; 58(8): 1625-1637, 2018 08 27.
Article in English | MEDLINE | ID: mdl-30036062

ABSTRACT

Water molecules are of great importance for the correct representation of ligand binding interactions. In recent years, water molecules and their integration into drug design strategies have received increasing attention. Nowadays a variety of tools are available to place and score water molecules. However, the most frequently applied software solutions require substantial computational resources. In addition, none of the existing methods has been rigorously evaluated on the basis of a large number of diverse protein complexes. Therefore, we present a novel method for placing water molecules, called WarPP, based on interaction geometries previously derived from protein crystal structures. Using a large, previously compiled, high-quality validation set of almost 1500 protein-ligand complexes containing almost 20 000 crystallographically observed water molecules in their active sites, we validated our placement strategy. We correctly placed 80% of the water molecules within 1.0 Å of a crystallographically observed one.


Subjects
Proteins/chemistry; Water/chemistry; Binding Sites; Databases, Protein; Ligands; Models, Molecular; Protein Conformation; Thermodynamics
15.
J Chem Inf Model ; 57(11): 2719-2728, 2017 11 27.
Article in English | MEDLINE | ID: mdl-28967749

ABSTRACT

We assess and compare the performance of eight commercial conformer ensemble generators (ConfGen, ConfGenX, cxcalc, iCon, MOE LowModeMD, MOE Stochastic, MOE Conformation Import, and OMEGA) and one leading free algorithm, the distance geometry algorithm implemented in RDKit. The comparative study is based on a new version of the Platinum Diverse Dataset, a high-quality benchmarking dataset of 2859 protein-bound ligand conformations extracted from the PDB. Differences in the performance of commercial algorithms are much smaller than those observed for free algorithms in our previous study (J. Chem. Inf. Model. 2017, 57, 529-539). For commercial algorithms, the median minimum root-mean-square deviations measured between protein-bound ligand conformations and ensembles of a maximum of 250 conformers are between 0.46 and 0.61 Å. Commercial conformer ensemble generators are characterized by their high robustness, with at least 99% of all input molecules successfully processed and few or even no substantial geometrical errors detectable in their output conformations. The RDKit distance geometry algorithm (with minimization enabled) appears to be a good free alternative since its performance is comparable to that of the midranked commercial algorithms. Based on a statistical analysis, we elaborate on which algorithms to use and how to parametrize them for best performance in different application scenarios.


Subjects
Models, Molecular; Molecular Conformation; Benchmarking; Drug Discovery
16.
J Biotechnol ; 261: 207-214, 2017 Nov 10.
Article in English | MEDLINE | ID: mdl-28610996

ABSTRACT

Nowadays, computational approaches are an integral part of life science research. Problems related to interpretation of experimental results, data analysis, or visualization tasks highly benefit from the achievements of the digital era. Simulation methods facilitate predictions of physicochemical properties and can assist in understanding macromolecular phenomena. Here, we will give an overview of the methods developed in our group that aim at supporting researchers from all life science areas. Based on state-of-the-art approaches from structural bioinformatics and cheminformatics, we provide software covering a wide range of research questions. Our all-in-one web service platform ProteinsPlus (http://proteins.plus) offers solutions for pocket and druggability prediction, hydrogen placement, structure quality assessment, ensemble generation, protein-protein interaction classification, and 2D-interaction visualization. Additionally, we provide a software package that contains tools targeting cheminformatics problems like file format conversion, molecule data set processing, SMARTS editing, fragment space enumeration, and ligand-based virtual screening. Furthermore, it also includes structural bioinformatics solutions for inverse screening, binding site alignment, and searching interaction patterns across structure libraries. The software package is available at http://software.zbh.uni-hamburg.de.


Subjects
Computational Biology; Internet; Software; Databases, Protein
17.
Nucleic Acids Res ; 45(W1): W337-W343, 2017 07 03.
Article in English | MEDLINE | ID: mdl-28472372

ABSTRACT

With currently more than 126 000 publicly available structures and an increasing growth rate, the Protein Data Bank constitutes a rich data source for structure-driven research in fields like drug discovery, crop science and biotechnology in general. Typical workflows in these areas involve manifold computational tools for the analysis and prediction of molecular functions. Here, we present the ProteinsPlus web server that offers a unified easy-to-use interface to a broad range of tools for the early phase of structure-based molecular modeling. This includes solutions for commonly required pre-processing tasks like structure quality assessment (EDIA), hydrogen placement (Protoss) and the search for alternative conformations (SIENA). Beyond that, it also addresses frequent problems such as the generation of 2D-interaction diagrams (PoseView), protein-protein interface classification (HyPPI) as well as automatic pocket detection and druggability assessment (DoGSiteScorer). The unified ProteinsPlus interface covering all featured approaches provides various facilities for intuitive input and result visualization, case-specific parameterization and download options for further processing. Moreover, its generalized workflow allows the user a quick familiarization with the different tools. ProteinsPlus also stores the calculated results temporarily for future requests and thus facilitates convenient result communication and re-access. The server is freely available at http://proteins.plus.


Subjects
Protein Conformation; Software; Binding Sites; Hydrogen/chemistry; Internet; Ligands; Models, Molecular; Protein Interaction Mapping; Proteins/chemistry
18.
J Chem Inf Model ; 57(2): 122-126, 2017 02 27.
Article in English | MEDLINE | ID: mdl-28151651

ABSTRACT

Many cheminformatics applications like aromaticity detection, SMARTS matching, or the calculation of atomic coordinates require a chemically meaningful perception of the molecular ring topology. The unique ring families (URFs) were recently introduced as a unique, polynomial, and chemically meaningful description of the ring topology. Here we present the first open-source implementation of the URF concept for ring perception. The C library RingDecomposerLib is easy to use, portable, well-documented, and thoroughly tested. Aside from the URFs, other related ring topology descriptions like the relevant cycles (RCs), relevant cycle prototypes (RCPs), and a smallest set of smallest rings (SSSR) can be calculated. We demonstrate the runtime efficiency of the RingDecomposerLib with computing time benchmarks for the complete PubChem Compound Database and thereby show the applicability in large-scale and interactive applications.


Subjects
Informatics/methods; Databases, Pharmaceutical
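
Of the ring topology descriptions named in the abstract above, the size of a smallest set of smallest rings (SSSR) is the simplest to compute: it equals the cyclomatic number of the molecular graph, that is, bonds minus atoms plus connected components. The Python sketch below illustrates only this count. It does not compute URFs, RCs or RCPs and is unrelated to the RingDecomposerLib API. The function name `sssr_size` is hypothetical.

```python
def sssr_size(num_atoms, bonds):
    """Number of rings in an SSSR: bonds - atoms + connected components.

    `bonds` is a list of (i, j) atom-index pairs with 0 <= i, j < num_atoms.
    """
    parent = list(range(num_atoms))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    components = num_atoms
    for a, b in bonds:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            components -= 1
    return len(bonds) - num_atoms + components
```

For benzene (6 atoms, 6 bonds) this gives 1 and for naphthalene (10 atoms, 11 bonds) it gives 2. Which rings belong to the set is the harder question that ring perception libraries address.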
19.
J Chem Inf Model ; 57(3): 529-539, 2017 03 27.
Article in English | MEDLINE | ID: mdl-28206754

ABSTRACT

We developed a cheminformatics pipeline for the fully automated selection and extraction of high-quality protein-bound ligand conformations from X-ray structural data. The pipeline evaluates the validity and accuracy of the 3D structures of small molecules according to multiple criteria, including their fit to the electron density and their physicochemical and structural properties. Using this approach, we compiled two high-quality datasets from the Protein Data Bank (PDB): a comprehensive dataset and a diversified subset of 4626 and 2912 structures, respectively. The datasets were applied to benchmarking seven freely available conformer ensemble generators: Balloon (two different algorithms), the RDKit standard conformer ensemble generator, the Experimental-Torsion basic Knowledge Distance Geometry (ETKDG) algorithm, Confab, Frog2 and Multiconf-DOCK. Substantial differences in the performance of the individual algorithms were observed, with RDKit and ETKDG generally achieving a favorable balance of accuracy, ensemble size and runtime. The Platinum datasets are available for download from http://www.zbh.uni-hamburg.de/platinum_dataset.


Subjects
Drug Design; Informatics/methods; Benchmarking; Ligands; Models, Molecular; Molecular Conformation; Platinum/chemistry; Platinum/metabolism; Proteins/metabolism; Time Factors
20.
J Chem Inf Model ; 56(1): 12-20, 2016 Jan 25.
Article in English | MEDLINE | ID: mdl-26740007

ABSTRACT

A common task in the hit-to-lead process is classifying sets of compounds into multiple, usually structural classes, which build the groundwork for subsequent SAR studies. Machine learning techniques can be used to automate this process by learning classification models from training compounds of each class. Gathering class information for compounds can be cost-intensive as the required data needs to be provided by human experts or experiments. This paper studies whether active machine learning can be used to reduce the required number of training compounds. Active learning is a machine learning method which processes class label data in an iterative fashion. It has gained much attention in a broad range of application areas. In this paper, an active learning method for multiclass compound classification is proposed. This method selects informative training compounds so as to optimally support the learning progress. The combination with human feedback leads to a semiautomated interactive multiclass classification procedure. This method was investigated empirically on 15 compound classification tasks containing 86-2870 compounds in 3-38 classes. The empirical results show that active learning can solve these classification tasks using 10-80% of the data which would be necessary for standard learning techniques.


Subjects
Drug Discovery/methods; Supervised Machine Learning; Animals; Feasibility Studies; Feedback; Humans
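
The selection of informative training compounds mentioned in the abstract above can be illustrated with margin-based uncertainty sampling, a common generic active learning strategy. The abstract does not state which selection criterion the paper uses, so this choice is an assumption. The function name `select_most_uncertain` is hypothetical, and the class probabilities are assumed to come from any multiclass model with at least two classes.

```python
def select_most_uncertain(class_probabilities):
    """Index of the compound with the smallest margin between its two most likely classes.

    `class_probabilities` holds one list of predicted class probabilities per compound.
    """

    def margin(probabilities):
        ordered = sorted(probabilities, reverse=True)
        return ordered[0] - ordered[1]

    return min(
        range(len(class_probabilities)),
        key=lambda index: margin(class_probabilities[index]),
    )
```

In an interactive loop, the selected compound would be shown to a human expert for labeling, added to the training set, and the model retrained before the next selection.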