Automatic Identification of Analogue Series from Large Compound Data Sets: Methods and Applications.

Naveja, José J; Vogt, Martin

Affiliation

Naveja JJ; Instituto de Química, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico.
Vogt M; Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich Wilhelms-Universität, Friedrich-Hirzebruch-Allee 5-6, 53115 Bonn, Germany.

Molecules; 26(17), 2021 Aug 31.

Article in English | MEDLINE | ID: mdl-34500724

ABSTRACT

Analogue series play a key role in drug discovery. They arise naturally in lead optimization efforts where analogues are explored based on one or a few core structures. However, it is much harder to accurately identify and extract pairs or series of analogue molecules in large compound databases with no predefined core structures. This methodological review outlines the most common and recent methodological developments to automatically identify analogue series in large libraries. Initial approaches focused on using predefined rules to extract scaffold structures, such as the popular Bemis-Murcko scaffold. Later on, the matched molecular pair concept led to efficient algorithms to identify similar compounds sharing a common core structure by exploring many putative scaffolds for each compound. Further developments of these ideas yielded, on the one hand, approaches for hierarchical scaffold decomposition and, on the other hand, algorithms for the extraction of analogue series based on single-site modifications (so-called matched molecular series) by exploring potential scaffold structures based on systematic molecule fragmentation. Eventually, further development of these approaches resulted in methods for extracting analogue series defined by a single core structure with several substitution sites that allow convenient representations, such as R-group tables. These methods enable the efficient analysis of large data sets with hundreds of thousands or even millions of compounds and have spawned many related methodological developments.
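The core-based grouping idea behind matched molecular pairs and matched molecular series can be illustrated with a minimal sketch. It is not the implementation of any method covered by the review. It assumes each compound has already been fragmented into (core, substituent) pairs by some cheminformatics toolkit; the fragment strings below are hand-written placeholders with `[*]` marking the attachment point, and the names `index_series`, `matched_pairs`, and `FRAGMENTATIONS` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def index_series(fragmentations, min_size=2):
    """Group compounds by shared core (single substitution site).

    fragmentations: iterable of (compound, core, substituent) triples,
    assumed to be precomputed; a compound may appear with several cores.
    Returns {core: {compound: substituent}} for cores shared by at least
    min_size compounds.
    """
    by_core = defaultdict(dict)
    for compound, core, substituent in fragmentations:
        by_core[core][compound] = substituent
    return {
        core: dict(members)
        for core, members in by_core.items()
        if len(members) >= min_size
    }


def matched_pairs(series):
    """List every pair of compounds that share a core."""
    pairs = []
    for core, members in series.items():
        for a, b in combinations(sorted(members), 2):
            pairs.append((a, b, core))
    return pairs


# Hand-written toy fragmentations (assumption: not computed from structures).
FRAGMENTATIONS = [
    ("toluene", "[*]c1ccccc1", "[*]C"),
    ("phenol", "[*]c1ccccc1", "[*]O"),
    ("aniline", "[*]c1ccccc1", "[*]N"),
    ("ethanol", "[*]CC", "[*]O"),
    ("ethylamine", "[*]CC", "[*]N"),
    ("3-hydroxypyridine", "[*]c1cccnc1", "[*]O"),  # core occurs only once
]

series = index_series(FRAGMENTATIONS)
pairs = matched_pairs(series)

for core, members in series.items():
    print(core, sorted(members))
```

Indexing by core avoids comparing every compound with every other one: compounds that land under the same dictionary key form a candidate series, and cores seen only once are discarded. In practice the fragmentation step would enumerate many putative cores per compound, which this sketch leaves out.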

Keywords

analogue series; cheminformatics; compound-core relationships; core structure; matched molecular pairs; matched molecular series; medicinal chemistry; molecular scaffold; structure-activity relationships

Full text: 1 Collections: 01-international Database: MEDLINE Study type: Diagnostic_studies Language: En Journal: Molecules Journal subject: BIOLOGY Publication year: 2021 Document type: Article Country of affiliation: Mexico Country of publication: Switzerland
