Search | VHL Regional Portal

Compositional Deep Probabilistic Models of DNA-Encoded Libraries.

Chen, Benson; Sultan, Mohammad M; Karaletsos, Theofanis.

J Chem Inf Model ; 64(4): 1123-1133, 2024 Feb 26.

Article in English | MEDLINE | ID: mdl-38335055

ABSTRACT

DNA-encoded library (DEL) technology has proven to be a powerful tool that uses combinatorially constructed small molecules to facilitate highly efficient screening experiments. These selection experiments, involving multiple stages of washing, elution, and identification of potent binders via unique DNA barcodes, often generate complex data. This complexity can mask the underlying signals, necessitating the application of computational tools, such as machine learning, to uncover valuable insights. We introduce a compositional deep probabilistic model of DEL data, DEL-Compose, which decomposes molecular representations into their monosynthon, disynthon, and trisynthon building blocks and capitalizes on the inherent hierarchical structure of these molecules by modeling latent reactions between embedded synthons. Additionally, we investigate methods to improve the observation models for DEL count data, such as integrating covariate factors to more effectively account for data noise. Across two popular public benchmark data sets (CA-IX and HRP), our model demonstrates strong performance compared to count baselines, enriches the correct pharmacophores, and offers valuable insights via its intrinsic interpretable structure, thereby providing a robust tool for the analysis of DEL data.
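As a rough illustration of the decomposition this abstract describes, the following Python sketch enumerates the mono-, di-, and trisynthon groupings of a single three-cycle compound. The function name `synthon_subsets` and the synthon labels are hypothetical, chosen only for this example; the sketch shows the combinatorial structure alone and is not the DEL-Compose model.

```python
from itertools import combinations


def synthon_subsets(synthons):
    """Group a compound's synthons into mono-, di-, ... k-synthon subsets.

    `synthons` is an ordered list of building-block labels, one per
    synthesis cycle. Each subset keeps the cycle position so that the same
    label used at different cycles stays distinguishable.
    """
    indexed = list(enumerate(synthons))
    return {
        size: list(combinations(indexed, size))
        for size in range(1, len(indexed) + 1)
    }


# Hypothetical three-cycle compound (labels are made up for illustration).
subsets = synthon_subsets(["A1", "B7", "C3"])
for size, groups in subsets.items():
    print(size, groups)
```

A three-cycle compound yields three monosynthons, three disynthons, and one trisynthon, which is the hierarchy a compositional model can attach separate representations to.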

Subjects

DNA; Small Molecule Libraries; Small Molecule Libraries/chemistry; DNA/chemistry; Models, Statistical; Gene Library

An allelic-series rare-variant association test for candidate-gene discovery.

McCaw, Zachary R; O'Dushlaine, Colm; Somineni, Hari; Bereket, Michael; Klein, Christoph; Karaletsos, Theofanis; Casale, Francesco Paolo; Koller, Daphne; Soare, Thomas W.

Am J Hum Genet ; 110(8): 1330-1342, 2023 Aug 03.

Article in English | MEDLINE | ID: mdl-37494930

ABSTRACT

Allelic series are of candidate therapeutic interest because of the existence of a dose-response relationship between the functionality of a gene and the degree or severity of a phenotype. We define an allelic series as a collection of variants in which increasingly deleterious mutations lead to increasingly large phenotypic effects, and we have developed a gene-based rare-variant association test specifically targeted to identifying genes containing allelic series. Building on the well-known burden test and sequence kernel association test (SKAT), we specify a variety of association models covering different genetic architectures and integrate these into a Coding-Variant Allelic-Series Test (COAST). Through extensive simulations, we confirm that COAST maintains the type I error and improves the power when the pattern of coding-variant effect sizes increases monotonically with mutational severity. We applied COAST to identify allelic-series genes for four circulating-lipid traits and five cell-count traits among 145,735 subjects with available whole-exome sequencing data from the UK Biobank. Compared with optimal SKAT (SKAT-O), COAST identified 29% more Bonferroni-significant associations with circulating-lipid traits, on average, and 82% more with cell-count traits. All of the gene-trait associations identified by COAST have corroborating evidence either from rare-variant associations in the full cohort (Genebass, n = 400,000) or from common-variant associations in the GWAS Catalog. In addition to detecting many gene-trait associations present in Genebass by using only a fraction (36.9%) of the sample, COAST detects associations, such as that between ANGPTL4 and triglycerides, that are absent from Genebass but that have clear common-variant support.
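The burden test that this abstract builds on can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not COAST: the three severity categories, the weights 1, 2, 3, and the function names are illustrative choices made here, and the full method combines several association models rather than a single weighted regression.

```python
# Hypothetical severity weights: more deleterious categories count more.
SEVERITY_WEIGHTS = {
    "benign_missense": 1.0,
    "deleterious_missense": 2.0,
    "protein_truncating": 3.0,
}


def weighted_burden(genotypes, annotations, weights=SEVERITY_WEIGHTS):
    """Collapse one subject's rare-variant genotypes into a single score.

    `genotypes` holds minor-allele counts (0, 1, or 2) per variant and
    `annotations` holds the matching severity category per variant.
    """
    return sum(weights[a] * g for g, a in zip(genotypes, annotations))


def ols_slope(x, y):
    """Least-squares slope of phenotype y regressed on burden score x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


annotations = ["benign_missense", "deleterious_missense", "protein_truncating"]
print(weighted_burden([1, 0, 2], annotations))
```

Under an allelic series, the phenotype shifts further as the burden score rises, so a nonzero slope from `ols_slope` is the signal a severity-weighted burden model looks for.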

Subjects

Genetic Variation; Lipids; Computer Simulation; Genetic Association Studies; Phenotype; Genome-Wide Association Study

DEL-Dock: Molecular Docking-Enabled Modeling of DNA-Encoded Libraries.

Shmilovich, Kirill; Chen, Benson; Karaletsos, Theofanis; Sultan, Mohammad M.

J Chem Inf Model ; 63(9): 2719-2727, 2023 May 08.

Article in English | MEDLINE | ID: mdl-37079427

ABSTRACT

DNA-encoded library (DEL) technology has enabled significant advances in hit identification by allowing efficient testing of combinatorially generated molecular libraries. DEL screens measure protein binding affinity through sequencing reads of molecules tagged with unique DNA barcodes that survive a series of selection experiments. Computational models have been deployed to learn the latent binding affinities that are correlated to the sequenced count data; however, this correlation is often obfuscated by various sources of noise introduced in its complicated data-generation process. In order to denoise DEL count data and screen for molecules with good binding affinity, computational models require the correct assumptions in their modeling structure to capture the correct signals underlying the data. Recent advances in DEL models have focused on probabilistic formulations of count data, but existing approaches have thus far been limited to only utilizing 2-D molecule-level representations. We introduce a new paradigm, DEL-Dock, that combines ligand-based descriptors with 3-D spatial information from docked protein-ligand complexes. 3-D spatial information allows our model to learn over the actual binding modality rather than using only structure-based information of the ligand. We show that our model is capable of effectively denoising DEL count data to predict molecule enrichment scores that are better correlated with experimental binding affinity measurements compared to prior works. Moreover, by learning over a collection of docked poses we demonstrate that our model, trained only on DEL data, implicitly learns to perform good docking pose selection without requiring external supervision from expensive-to-source protein crystal structures.
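For context on the count data this abstract refers to, the sketch below computes a simple depth-normalized enrichment ratio between a target selection and a no-protein control. It is a generic count baseline assumed here for illustration, not the DEL-Dock model; the function name, the pseudocount of 1, and the example numbers are all hypothetical.

```python
def enrichment_ratio(target_count, control_count,
                     target_total, control_total, pseudocount=1.0):
    """Ratio of a molecule's read rate in the target vs. control selection.

    Counts are divided by each experiment's total reads so that different
    sequencing depths are comparable; the pseudocount avoids division by
    zero for molecules that were never observed in the control.
    """
    target_rate = (target_count + pseudocount) / target_total
    control_rate = (control_count + pseudocount) / control_total
    return target_rate / control_rate


# Same sequencing depth: the molecule is read about twice as often on target.
print(enrichment_ratio(9, 4, target_total=100, control_total=100))
```

Ratios like this are noisy at low counts, which is the motivation the abstract gives for probabilistic models that denoise the counts instead of using them directly.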

Subjects

DNA; Proteins; Molecular Docking Simulation; Ligands; Models, Molecular; Proteins/chemistry; DNA/chemistry; Protein Binding

RiboDiff: detecting changes of mRNA translation efficiency from ribosome footprints.

Zhong, Yi; Karaletsos, Theofanis; Drewe, Philipp; Sreedharan, Vipin T; Kuo, David; Singh, Kamini; Wendel, Hans-Guido; Rätsch, Gunnar.

Bioinformatics ; 33(1): 139-141, 2017 Jan 01.

Article in English | MEDLINE | ID: mdl-27634950

ABSTRACT

MOTIVATION: Deep sequencing based ribosome footprint profiling can provide novel insights into the regulatory mechanisms of protein translation. However, the observed ribosome profile is fundamentally confounded by transcriptional activity. In order to decipher principles of translation regulation, tools that can reliably detect changes in translation efficiency in case-control studies are needed. RESULTS: We present a statistical framework and an analysis tool, RiboDiff, to detect genes with changes in translation efficiency across experimental treatments. RiboDiff uses generalized linear models to estimate the over-dispersion of RNA-Seq and ribosome profiling measurements separately, and performs a statistical test for differential translation efficiency using both mRNA abundance and ribosome occupancy. AVAILABILITY AND IMPLEMENTATION: RiboDiff webpage: http://bioweb.me/ribodiff Source code, including scripts for preprocessing the FASTQ data, is available at http://github.com/ratschlab/ribodiff CONTACTS: zhongy@cbio.mskcc.org or raetsch@inf.ethz.ch SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
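The quantity being tested here, translation efficiency, is ribosome occupancy relative to mRNA abundance. The Python sketch below computes its log2 change between two conditions for one gene. It is a point estimate only and does not reproduce RiboDiff's generalized linear models or dispersion estimation; the function names and the pseudocount are assumptions, and the counts are assumed to be already normalized for library size.

```python
import math


def translation_efficiency(ribo_count, rna_count, pseudocount=0.5):
    """Ribosome footprint count relative to mRNA count for one gene."""
    return (ribo_count + pseudocount) / (rna_count + pseudocount)


def log2_te_change(ribo_ctrl, rna_ctrl, ribo_treat, rna_treat,
                   pseudocount=0.5):
    """log2 ratio of translation efficiency, treatment over control."""
    te_ctrl = translation_efficiency(ribo_ctrl, rna_ctrl, pseudocount)
    te_treat = translation_efficiency(ribo_treat, rna_treat, pseudocount)
    return math.log2(te_treat / te_ctrl)


# Footprints double while mRNA is unchanged: efficiency rises.
print(log2_te_change(100, 100, 200, 100))
# Footprints and mRNA both double: a transcriptional change only.
print(log2_te_change(100, 100, 200, 200))
```

The second call illustrates the confounding the abstract mentions: a rise in footprints that merely tracks a rise in mRNA gives a change near zero and should not be called differential translation.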

Subjects

Protein Biosynthesis; RNA, Messenger/metabolism; Ribosomes/metabolism; Sequence Analysis, RNA/methods; Software; Gene Expression Regulation; High-Throughput Nucleotide Sequencing/methods

ShapePheno: unsupervised extraction of shape phenotypes from biological image collections.

Karaletsos, Theofanis; Stegle, Oliver; Dreyer, Christine; Winn, John; Borgwardt, Karsten M.

Bioinformatics ; 28(7): 1001-8, 2012 Apr 01.

Article in English | MEDLINE | ID: mdl-22333244

ABSTRACT

MOTIVATION: Accurate large-scale phenotyping has recently gained considerable importance in biology. For example, in genome-wide association studies technological advances have rendered genotyping cheap, leaving phenotype acquisition as the major bottleneck. Automatic image analysis is one major strategy to phenotype individuals in large numbers. Current approaches for visual phenotyping focus predominantly on summarizing statistics and geometric measures, such as height and width of an individual, or color histograms and patterns. However, more subtle, but biologically informative phenotypes, such as the local deformation of the shape of an individual with respect to the population mean cannot be automatically extracted and quantified by current techniques. RESULTS: We propose a probabilistic machine learning model that allows for the extraction of deformation phenotypes from biological images, making them available as quantitative traits for downstream analysis. Our approach jointly models a collection of images using a learned common template that is mapped onto each image through a deformable smooth transformation. In a case study, we analyze the shape deformations of 388 guppy fish (Poecilia reticulata). We find that the flexible shape phenotypes our model extracts are complementary to basic geometric measures. Moreover, these quantitative traits assort the observations into distinct groups and can be mapped to polymorphic genetic loci of the sample set. AVAILABILITY: Code is available under: http://bioweb.me/GEBI CONTACT: theofanis.karaletsos@tuebingen.mpg.de; oliver.stegle@tuebingen.mpg.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Artificial Intelligence; Image Processing, Computer-Assisted/methods; Pattern Recognition, Automated/methods; Phenotype; Animals; Cluster Analysis; Computational Biology/methods; Male; Markov Chains; Models, Statistical; Poecilia
