Search | BVS Regional Portal

Chainsaw: protein domain segmentation with fully convolutional neural networks.

Wells, Jude; Hawkins-Hooker, Alex; Bordin, Nicola; Sillitoe, Ian; Paige, Brooks; Orengo, Christine.

Bioinformatics ; 40(5), 2024 May 02.

Article in English | MEDLINE | ID: mdl-38718225

ABSTRACT

MOTIVATION: Protein domains are fundamental units of protein structure and play a pivotal role in understanding folding, function, evolution, and design. The advent of accurate structure prediction techniques has resulted in an influx of new structural data, making the partitioning of these structures into domains essential for inferring evolutionary relationships and functional classification. RESULTS: This article presents Chainsaw, a supervised learning approach to domain parsing that achieves accuracy that surpasses current state-of-the-art methods. Chainsaw uses a fully convolutional neural network which is trained to predict the probability that each pair of residues is in the same domain. Domain predictions are then derived from these pairwise predictions using an algorithm that searches for the most likely assignment of residues to domains given the set of pairwise co-membership probabilities. Chainsaw matches CATH domain annotations in 78% of protein domains versus 72% for the next closest method. When predicting on AlphaFold models, expert human evaluators were twice as likely to prefer Chainsaw's predictions versus the next best method. AVAILABILITY AND IMPLEMENTATION: github.com/JudeWells/Chainsaw.

Subjects

Algorithms, Neural Networks, Computer, Protein Domains, Proteins, Proteins/chemistry, Databases, Protein, Computational Biology/methods, Software, Humans
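The Chainsaw abstract describes deriving domains from a matrix of pairwise co-membership probabilities by searching for the most likely assignment of residues to domains. The Python sketch below illustrates that idea under stated assumptions: it scores a labelling by treating every residue pair as independent, and it only compares a single domain against every contiguous two-domain split. The function names `assignment_log_likelihood` and `best_two_domain_split` are hypothetical, and the exhaustive search is a toy stand-in, not the search algorithm actually used in Chainsaw.

```python
import numpy as np


def assignment_log_likelihood(probs, labels, eps=1e-9):
    """Log-likelihood of a residue-to-domain labelling, assuming independent pairs.

    probs[i, j] is the predicted probability that residues i and j share a domain;
    labels[i] is the domain index assigned to residue i.
    """
    probs = np.clip(probs, eps, 1 - eps)  # avoid log(0)
    same = labels[:, None] == labels[None, :]
    # Same-domain pairs contribute log p; cross-domain pairs contribute log(1 - p).
    pair_ll = np.where(same, np.log(probs), np.log(1 - probs))
    # Count each unordered pair once and skip the diagonal.
    iu = np.triu_indices(len(labels), k=1)
    return float(pair_ll[iu].sum())


def best_two_domain_split(probs):
    """Compare one domain against every contiguous two-domain split (toy search)."""
    n = probs.shape[0]
    best_labels = np.zeros(n, dtype=int)
    best_ll = assignment_log_likelihood(probs, best_labels)
    for cut in range(1, n):
        labels = np.zeros(n, dtype=int)
        labels[cut:] = 1
        ll = assignment_log_likelihood(probs, labels)
        if ll > best_ll:
            best_ll, best_labels = ll, labels
    return best_labels, best_ll


# Toy 6-residue matrix: residues 0-2 and 3-5 form two blocks.
probs = np.full((6, 6), 0.1)
probs[:3, :3] = 0.9
probs[3:, 3:] = 0.9
labels, ll = best_two_domain_split(probs)
print(labels)  # [0 0 0 1 1 1]
```

A real parser would also need to handle more than two domains, discontinuous domains, and unassigned residues; the point here is only how pairwise probabilities can score a candidate assignment.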

Getting personal with epigenetics: towards individual-specific epigenomic imputation with machine learning.

Hawkins-Hooker, Alex; Visonà, Giovanni; Narendra, Tanmayee; Rojas-Carulla, Mateo; Schölkopf, Bernhard; Schweikert, Gabriele.

Nat Commun ; 14(1): 4750, 2023 08 07.

Article in English | MEDLINE | ID: mdl-37550323

ABSTRACT

Epigenetic modifications are dynamic mechanisms involved in the regulation of gene expression. Unlike the DNA sequence, epigenetic patterns vary not only between individuals, but also between different cell types within an individual. Environmental factors, somatic mutations and ageing contribute to epigenetic changes that may constitute early hallmarks or causal factors of disease. Epigenetic modifications are reversible and thus promising therapeutic targets for precision medicine. However, mapping efforts to determine an individual's cell-type-specific epigenome are constrained by experimental costs and tissue accessibility. To address these challenges, we developed eDICE, an attention-based deep learning model that is trained to impute missing epigenomic tracks by conditioning on observed tracks. Using a recently published set of epigenomes from four individual donors, we show that transfer learning across individuals allows eDICE to successfully predict individual-specific epigenetic variation even in tissues that are unmapped in a given donor. These results highlight the potential of machine learning-based imputation methods to advance personalized epigenomics.

Subjects

Epigenesis, Genetic, Epigenomics, Humans, Epigenomics/methods, Machine Learning, Epigenome, Precision Medicine/methods, DNA Methylation/genetics

The ENCODE Imputation Challenge: a critical assessment of methods for cross-cell type imputation of epigenomic profiles.

Schreiber, Jacob; Boix, Carles; Wook Lee, Jin; Li, Hongyang; Guan, Yuanfang; Chang, Chun-Chieh; Chang, Jen-Chien; Hawkins-Hooker, Alex; Schölkopf, Bernhard; Schweikert, Gabriele; Carulla, Mateo Rojas; Canakoglu, Arif; Guzzo, Francesco; Nanni, Luca; Masseroli, Marco; Carman, Mark James; Pinoli, Pietro; Hong, Chenyang; Yip, Kevin Y; Spence, Jeffrey P; Batra, Sanjit Singh; Song, Yun S; Mahony, Shaun; Zhang, Zheng; Tan, Wuwei; Shen, Yang; Sun, Yuanfei; Shi, Minyi; Adrian, Jessika; Sandstrom, Richard; Farrell, Nina; Halow, Jessica; Lee, Kristen; Jiang, Lixia; Yang, Xinqiong; Epstein, Charles; Strattan, J Seth; Bernstein, Bradley; Snyder, Michael; Kellis, Manolis; Stafford, William; Kundaje, Anshul.

Genome Biol ; 24(1): 79, 2023 04 18.

Article in English | MEDLINE | ID: mdl-37072822

ABSTRACT

A promising alternative to comprehensively performing genomics experiments is to, instead, perform a subset of experiments and use computational methods to impute the remainder. However, identifying the best imputation methods and what measures meaningfully evaluate performance are open questions. We address these questions by comprehensively analyzing 23 methods from the ENCODE Imputation Challenge. We find that imputation evaluations are challenging and confounded by distributional shifts from differences in data collection and processing over time, the amount of available data, and redundancy among performance measures. Our analyses suggest simple steps for overcoming these issues and promising directions for more robust research.

Subjects

Algorithms, Epigenomics, Genomics/methods

Generating functional protein variants with variational autoencoders.

Hawkins-Hooker, Alex; Depardieu, Florence; Baur, Sebastien; Couairon, Guillaume; Chen, Arthur; Bikard, David.

PLoS Comput Biol ; 17(2): e1008736, 2021 02.

Article in English | MEDLINE | ID: mdl-33635868

ABSTRACT

The vast expansion of protein sequence databases provides an opportunity for new protein design approaches which seek to learn the sequence-function relationship directly from natural sequence variation. Deep generative models trained on protein sequence data have been shown to learn biologically meaningful representations helpful for a variety of downstream tasks, but their potential for direct use in the design of novel proteins remains largely unexplored. Here we show that variational autoencoders trained on a dataset of almost 70000 luciferase-like oxidoreductases can be used to generate novel, functional variants of the luxA bacterial luciferase. We propose separate VAE models to work with aligned sequence input (MSA VAE) and raw sequence input (AR-VAE), and offer evidence that while both are able to reproduce patterns of amino acid usage characteristic of the family, the MSA VAE is better able to capture long-distance dependencies reflecting the influence of 3D structure. To confirm the practical utility of the models, we used them to generate variants of luxA whose luminescence activity was validated experimentally. We further showed that conditional variants of both models could be used to increase the solubility of luxA without disrupting function. Altogether 6/12 of the variants generated using the unconditional AR-VAE and 9/11 generated using the unconditional MSA VAE retained measurable luminescence, together with all 23 of the less distant variants generated by conditional versions of the models; the most distant functional variant contained 35 differences relative to the nearest training set sequence. These results demonstrate the feasibility of using deep generative models to explore the space of possible protein sequences and generate useful variants, providing a method complementary to rational design and directed evolution approaches.

Subjects

Computational Biology/methods, Computer Simulation, Neural Networks, Computer, Proteins/chemistry, Proteins/physiology, Algorithms, Escherichia coli/genetics, Machine Learning, Oxidoreductases/chemistry, Photorhabdus, Recombinant Proteins/chemistry, Reproducibility of Results, Solubility
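The VAE abstract reports that the most distant functional variant contained 35 differences relative to the nearest training-set sequence. One simple way to compute such a distance, assuming the variant and the training sequences are already aligned to a common length, is the minimum Hamming distance sketched below. The helper names are hypothetical, and the paper's exact distance definition (for example, its handling of alignment gaps) may differ; here a gap character simply counts as a mismatch.

```python
from typing import Iterable


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def distance_to_nearest(variant: str, training_set: Iterable[str]) -> int:
    """Smallest Hamming distance from a variant to any training sequence."""
    return min(hamming(variant, seq) for seq in training_set)


# Toy aligned sequences (not from the paper).
training = ["MKVLA", "MKILA"]
print(distance_to_nearest("MRILG", training))  # 2
```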

Integration of multiple epigenomic marks improves prediction of variant impact in saturation mutagenesis reporter assay.

Shigaki, Dustin; Adato, Orit; Adhikari, Aashish N; Dong, Shengcheng; Hawkins-Hooker, Alex; Inoue, Fumitaka; Juven-Gershon, Tamar; Kenlay, Henry; Martin, Beth; Patra, Ayoti; Penzar, Dmitry D; Schubach, Max; Xiong, Chenling; Yan, Zhongxia; Boyle, Alan P; Kreimer, Anat; Kulakovskiy, Ivan V; Reid, John; Unger, Ron; Yosef, Nir; Shendure, Jay; Ahituv, Nadav; Kircher, Martin; Beer, Michael A.

Hum Mutat ; 40(9): 1280-1291, 2019 09.

Article in English | MEDLINE | ID: mdl-31106481

ABSTRACT

The integrative analysis of high-throughput reporter assays, machine learning, and profiles of epigenomic chromatin state in a broad array of cells and tissues has the potential to significantly improve our understanding of noncoding regulatory element function and its contribution to human disease. Here, we report results from the CAGI 5 regulation saturation challenge where participants were asked to predict the impact of nucleotide substitution at every base pair within five disease-associated human enhancers and nine disease-associated promoters. A library of mutations covering all bases was generated by saturation mutagenesis and altered activity was assessed in a massively parallel reporter assay (MPRA) in relevant cell lines. Reporter expression was measured relative to plasmid DNA to determine the impact of variants. The challenge was to predict the functional effects of variants on reporter expression. Comparative analysis of the full range of submitted prediction results identifies the most successful models of transcription factor binding sites, machine learning algorithms, and ways to choose among or incorporate diverse datatypes and cell-types for training computational models. These results have the potential to improve the design of future studies on more diverse sets of regulatory elements and aid the interpretation of disease-associated genetic variation.

Subjects

DNA/chemistry, Epigenomics/methods, Point Mutation, Binding Sites, Cell Line, Chromatin/genetics, DNA/metabolism, Enhancer Elements, Genetic, Genetic Predisposition to Disease, Humans, Machine Learning, Promoter Regions, Genetic, Transcription Factors/metabolism
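The CAGI 5 abstract notes that reporter expression was measured relative to plasmid DNA to determine the impact of variants. A common way to express this, assumed here rather than taken from the paper, is a log2 ratio of library-size-normalized RNA to DNA counts, with a variant's effect given by its difference from the reference sequence. The pseudocount, the normalization scheme, and the function names are illustrative assumptions.

```python
import math


def log2_expression(rna_count, dna_count, rna_total, dna_total, pseudocount=1.0):
    """log2 of RNA over plasmid DNA, each scaled by its library size (assumed scheme)."""
    rna = (rna_count + pseudocount) / rna_total
    dna = (dna_count + pseudocount) / dna_total
    return math.log2(rna / dna)


def variant_effect(variant, reference, rna_total, dna_total):
    """log2 fold change of a variant relative to the reference sequence.

    variant and reference are (rna_count, dna_count) pairs.
    """
    return (log2_expression(*variant, rna_total, dna_total)
            - log2_expression(*reference, rna_total, dna_total))


# Toy counts: the variant halves RNA output while its DNA representation is unchanged.
effect = variant_effect((49, 99), (99, 99), rna_total=1e6, dna_total=1e6)
print(round(effect, 6))  # -1.0
```

Normalizing by plasmid DNA corrects for unequal representation of variants in the library, so a negative effect indicates reduced regulatory activity rather than fewer copies of the construct.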
