Search | BVS Regional Portal (test)

Exploring activity landscapes with extended similarity: is Tanimoto enough?

Dunn, Timothy B; López-López, Edgar; Kim, Taewon David; Medina-Franco, José L; Miranda-Quintana, Ramón Alain.

Mol Inform ; 42(7): e2300056, 2023 07.

Article in English | MEDLINE | ID: mdl-37202375

ABSTRACT

Understanding structure-activity landscapes is essential in drug discovery. Similarly, it has been shown that the presence of activity cliffs in compound data sets can substantially affect not only design progress but also the predictive ability of machine learning models. With the continued expansion of the chemical space and the currently available large and ultra-large libraries, it is imperative to implement efficient tools to analyze the activity landscape of compound data sets rapidly. The goal of this study is to show the applicability of the n-ary indices to quantify, rapidly and efficiently, the structure-activity landscapes of large compound data sets using different types of structural representation. We also discuss how a recently introduced medoid algorithm provides the foundation for finding optimum correlations between similarity measures and structure-activity rankings. The applicability of the n-ary indices and the medoid algorithm is shown by analyzing the activity landscape of 10 compound data sets with pharmaceutical relevance using three fingerprints of different designs, 16 extended similarity indices, and 11 coincidence thresholds.

Subjects

Algorithms, Drug Discovery, Structure-Activity Relationship, Machine Learning

Extended continuous similarity indices: theory and application for QSAR descriptor selection.

Rácz, Anita; Dunn, Timothy B; Bajusz, Dávid; Kim, Taewon D; Miranda-Quintana, Ramón Alain; Héberger, Károly.

J Comput Aided Mol Des ; 36(3): 157-173, 2022 03.

Article in English | MEDLINE | ID: mdl-35288838

ABSTRACT

Extended (or n-ary) similarity indices have been recently proposed to extend the comparative analysis of binary strings. Going beyond the traditional notion of pairwise comparisons, these novel indices allow comparing any number of objects at the same time. This results in a remarkable efficiency gain with respect to other approaches, since now we can compare N molecules in O(N) instead of the common quadratic O(N^2) timescale. This favorable scaling has motivated the application of these indices to diversity selection, clustering, phylogenetic analysis, chemical space visualization, and post-processing of molecular dynamics simulations. However, the current formulation of the n-ary indices is limited to vectors with binary or categorical inputs. Here, we present the further generalization of this formalism so it can be applied to numerical data, i.e. to vectors with continuous components. We discuss several ways to achieve this extension and present their analytical properties. As a practical example, we apply this formalism to the problem of feature selection in QSAR and prove that the extended continuous similarity indices provide a convenient way to discern between several sets of descriptors.
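To illustrate where the O(N) scaling mentioned above can come from, the following is a minimal sketch of a Tanimoto-type n-ary comparison on binary fingerprints. It is not taken from the abstract: the function name `extended_tanimoto`, the `threshold` parameter (standing in for a coincidence threshold), and the unweighted counting rule are assumptions made for illustration, and the published indices may use weighting schemes that are not reproduced here. The idea shown is that all N fingerprints are summarized by their column sums in a single pass, so no pairwise comparisons are needed.

```python
from typing import Sequence


def extended_tanimoto(fingerprints: Sequence[Sequence[int]], threshold: int = 0) -> float:
    """Simplified, unweighted n-ary Tanimoto-type index (illustrative assumption).

    Each bit position is classified from its column sum c over all N fingerprints:
      * "1-similar" if 2*c - N > threshold (mostly ones),
      * "0-similar" if N - 2*c > threshold (mostly zeros),
      * "dissimilar" otherwise.
    The index is (#1-similar) / (#1-similar + #dissimilar).
    """
    n = len(fingerprints)
    if n < 2:
        raise ValueError("need at least two fingerprints")
    n_bits = len(fingerprints[0])
    if any(len(fp) != n_bits for fp in fingerprints):
        raise ValueError("all fingerprints must have the same length")

    # One pass over the data: cost grows linearly with the number of molecules.
    column_sums = [sum(column) for column in zip(*fingerprints)]

    one_similar = 0
    dissimilar = 0
    for c in column_sums:
        delta = 2 * c - n
        if delta > threshold:
            one_similar += 1
        elif abs(delta) <= threshold:
            dissimilar += 1
        # Remaining columns are 0-similar and are ignored by a Tanimoto-type index.

    denominator = one_similar + dissimilar
    # Assumption: return 0.0 when no column carries any "on" information.
    return one_similar / denominator if denominator else 0.0


if __name__ == "__main__":
    library = [
        [1, 1, 0, 0],
        [1, 1, 0, 1],
        [1, 0, 0, 0],
        [1, 1, 0, 1],
    ]
    print(extended_tanimoto(library, threshold=0))
    print(extended_tanimoto(library, threshold=2))
```

With two fingerprints and a threshold of 0, this sketch reduces to the familiar pairwise Tanimoto coefficient (shared on-bits divided by bits set in either fingerprint), which is a convenient sanity check for the counting rule.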

Subjects

Drug Design, Quantitative Structure-Activity Relationship, Phylogeny

Diversity and Chemical Library Networks of Large Data Sets.

Dunn, Timothy B; Seabra, Gustavo M; Kim, Taewon David; Juárez-Mercado, K Eurídice; Li, Chenglong; Medina-Franco, José L; Miranda-Quintana, Ramón Alain.

J Chem Inf Model ; 62(9): 2186-2201, 2022 05 09.

Article in English | MEDLINE | ID: mdl-34723537

ABSTRACT

The quantification of chemical diversity has many applications in drug discovery, organic chemistry, food, and natural product chemistry, to name a few. As the size of the chemical space is expanding rapidly, it is imperative to develop efficient methods to quantify the diversity of large and ultralarge chemical libraries and visualize their mutual relationships in chemical space. Herein, we show an application of our recently introduced extended similarity indices to measure the fingerprint-based diversity of 19 chemical libraries typically used in drug discovery and natural products research with over 18 million compounds. Based on this concept, we introduce the Chemical Library Networks (CLNs) as a general and efficient framework to represent visually the chemical space of large chemical libraries providing a global perspective of the relation between the libraries. For the 19 compound libraries explored in this work, it was found that the (extended) Tanimoto index offers the best description of extended similarity in combination with RDKit fingerprints. CLNs are general and can be explored with any structure representation and similarity coefficient for large chemical libraries.

Subjects

Biological Products, Small Molecule Libraries, Biological Products/chemistry, Drug Discovery/methods, Small Molecule Libraries/chemistry
