Search | VHL Regional Portal (test)

Predicting novel substrates for enzymes with minimal experimental effort with active learning.

Pertusi, Dante A; Moura, Matthew E; Jeffryes, James G; Prabhu, Siddhant; Walters Biggs, Bradley; Tyo, Keith E J.

Metab Eng ; 44: 171-181, 2017 Nov.

Article in English | MEDLINE | ID: mdl-29030274

ABSTRACT

Enzymatic substrate promiscuity is more ubiquitous than previously thought, with significant consequences for understanding metabolism and its application to biocatalysis. This realization has given rise to the need for efficient characterization of enzyme promiscuity. Enzyme promiscuity is currently characterized with a limited number of human-selected compounds that may not be representative of the enzyme's versatility. While testing large numbers of compounds may be impractical, computational approaches can exploit existing data to determine the most informative substrates to test next, thereby more thoroughly exploring an enzyme's versatility. To demonstrate this, we used existing studies and tested compounds for four different enzymes, developed support vector machine (SVM) models using these datasets, and selected additional compounds for experiments using an active learning approach. SVMs trained on a chemically diverse set of compounds were discovered to achieve maximum accuracies of ~80% using ~33% fewer compounds than datasets based on all compounds tested in existing studies. Active learning-selected compounds for testing resolved apparent conflicts in the existing training data, while adding diversity to the dataset. The application of these algorithms to wide arrays of metabolic enzymes would result in a library of SVMs that can predict high-probability promiscuous enzymatic reactions and could prove a valuable resource for the design of novel metabolic pathways.
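The abstract does not say which selection criterion the active learning step used. As a minimal sketch, assuming uncertainty sampling (a common choice with SVMs), the next compound to test is the one whose decision value lies closest to the classifier's boundary. The linear decision function, the weights, and the compound names below are hypothetical stand-ins for a trained SVM and real substrate descriptors.

```python
from typing import Dict, Sequence


def decision_value(weights: Sequence[float], bias: float,
                   features: Sequence[float]) -> float:
    """Signed score of a linear classifier, f(x) = w . x + b.

    Stand-in for the decision function of a trained SVM (assumption).
    """
    return sum(w * x for w, x in zip(weights, features)) + bias


def most_informative(weights: Sequence[float], bias: float,
                     candidates: Dict[str, Sequence[float]]) -> str:
    """Uncertainty sampling: return the untested compound whose score
    is closest to zero, i.e. closest to the decision boundary."""
    return min(
        candidates,
        key=lambda name: abs(decision_value(weights, bias, candidates[name])),
    )


# Hypothetical model parameters and untested compounds (illustrative only).
weights = [1.0, -2.0]
bias = 0.5
candidates = {
    "compound_A": [3.0, 0.0],  # far on the positive side
    "compound_B": [0.5, 0.5],  # on the boundary
    "compound_C": [0.0, 2.0],  # far on the negative side
}

next_to_test = most_informative(weights, bias, candidates)
print(next_to_test)
```

In an iterative loop, the selected compound would be assayed, added to the training set, and the model retrained before the next selection.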

Subjects

Escherichia coli, Metabolome, Biological Models, Support Vector Machine, Escherichia coli/genetics, Escherichia coli/metabolism

Prospective Assessment of Virtual Screening Heuristics Derived Using a Novel Fusion Score.

Pertusi, Dante A; O'Donnell, Gregory; Homsher, Michelle F; Solly, Kelli; Patel, Amita; Stahler, Shannon L; Riley, Daniel; Finley, Michael F; Finger, Eleftheria N; Adam, Gregory C; Meng, Juncai; Bell, David J; Zuck, Paul D; Hudak, Edward M; Weber, Michael J; Nothstein, Jennifer E; Locco, Louis; Quinn, Carissa; Amoss, Adam; Squadroni, Brian; Hartnett, Michelle; Heo, Mee Ra; White, Tara; May, S Alex; Boots, Evelyn; Roberts, Kenneth; Cocchiarella, Patrick; Wolicki, Alex; Kreamer, Anthony; Kutchukian, Peter S; Wassermann, Anne Mai; Uebele, Victor N; Glick, Meir; Rusinko, Andrew; Culberson, J Christopher.

SLAS Discov ; 22(8): 995-1006, 2017 Sep.

Article in English | MEDLINE | ID: mdl-28426940

ABSTRACT

High-throughput screening (HTS) is a widespread method in early drug discovery for identifying promising chemical matter that modulates a target or phenotype of interest. Because HTS campaigns involve screening millions of compounds, it is often desirable to initiate screening with a subset of the full collection. Subsequently, virtual screening methods prioritize likely active compounds in the remaining collection in an iterative process. With this approach, orthogonal virtual screening methods are often applied, necessitating the prioritization of hits from different approaches. Here, we introduce a novel method of fusing these prioritizations and benchmark it prospectively on 17 screening campaigns using virtual screening methods in three descriptor spaces. We found that the fusion approach retrieves 15% to 65% more active chemical series than any single machine-learning method and that appropriately weighting contributions of similarity and machine-learning scoring techniques can increase enrichment by 1% to 19%. We also use fusion scoring to evaluate the tradeoff between screening more chemical matter initially in lieu of replicate samples to prevent false-positives and find that the former option leads to the retrieval of more active chemical series. These results represent guidelines that can increase the rate of identification of promising active compounds in future iterative screens.
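The abstract does not define the fusion score itself. To illustrate the general idea of merging prioritizations from orthogonal methods, the sketch below uses weighted reciprocal-rank fusion, a generic technique swapped in here and not the authors' method. The method names, weights, compound identifiers, and the smoothing constant `k` are all assumptions.

```python
from typing import Dict, List


def fuse_rankings(rankings: Dict[str, List[str]],
                  weights: Dict[str, float],
                  k: int = 60) -> List[str]:
    """Weighted reciprocal-rank fusion.

    Each method contributes weight / (k + rank) to every compound it
    ranks; compounds are returned by descending fused score, with ties
    broken alphabetically so the output is deterministic.
    """
    scores: Dict[str, float] = {}
    for method, ranked in rankings.items():
        w = weights.get(method, 1.0)
        for position, compound in enumerate(ranked, start=1):
            scores[compound] = scores.get(compound, 0.0) + w / (k + position)
    return sorted(scores, key=lambda c: (-scores[c], c))


# Hypothetical prioritizations from two virtual screening methods.
rankings = {
    "similarity": ["c1", "c2", "c3"],
    "ml_model": ["c2", "c3", "c1"],
}
weights = {"similarity": 1.0, "ml_model": 1.0}

fused = fuse_rankings(rankings, weights)
print(fused)
```

Raising the weight of one method shifts the fused order toward that method's ranking, which is one simple way to explore the weighting question the abstract raises.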

Assuntos

Preclinical Drug Evaluation, Heuristics, User-Computer Interface, Machine Learning

CellSort: a support vector machine tool for optimizing fluorescence-activated cell sorting and reducing experimental effort.

Yu, Jessica S; Pertusi, Dante A; Adeniran, Adebola V; Tyo, Keith E J.

Bioinformatics ; 33(6): 909-916, 2017 Mar 15.

Article in English | MEDLINE | ID: mdl-27998936

ABSTRACT

Motivation: High throughput screening by fluorescence activated cell sorting (FACS) is a common task in protein engineering and directed evolution. It can also be a rate-limiting step if high false positive or negative rates necessitate multiple rounds of enrichment. Current FACS software requires the user to define sorting gates by intuition and is practically limited to two dimensions. In cases when multiple rounds of enrichment are required, the software cannot forecast the enrichment effort required. Results: We have developed CellSort, a support vector machine (SVM) algorithm that identifies optimal sorting gates based on machine learning using positive and negative control populations. CellSort can take advantage of more than two dimensions to enhance the ability to distinguish between populations. We also present a Bayesian approach to predict the number of sorting rounds required to enrich a population from a given library size. This Bayesian approach allowed us to determine strategies for biasing the sorting gates in order to reduce the required number of enrichment rounds. This algorithm should be generally useful for improving sorting outcomes and reducing effort when using FACS. Availability and Implementation: Source code available at http://tyolab.northwestern.edu/tools/. Contact: k-tyo@northwestern.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
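The abstract does not give the details of the Bayesian model. As a rough sketch, assuming each sorting round retains true positives with probability `tpr` and false positives with probability `fpr`, Bayes' rule gives the positive fraction after each round, and the number of rounds needed to reach a target purity follows by iteration. The function name and all numbers below are hypothetical and are not taken from CellSort.

```python
from typing import Optional


def rounds_to_enrich(prior: float, tpr: float, fpr: float,
                     target: float, max_rounds: int = 50) -> Optional[int]:
    """Number of sorting rounds until the positive fraction >= target.

    After one round, Bayes' rule gives
        p' = p * tpr / (p * tpr + (1 - p) * fpr)
    where p is the positive fraction before the round.
    Returns None if the target is not reached within max_rounds.
    """
    fraction = prior
    for n in range(1, max_rounds + 1):
        kept_positive = fraction * tpr
        kept_negative = (1.0 - fraction) * fpr
        if kept_positive + kept_negative == 0.0:
            return None  # the gate retains nothing
        fraction = kept_positive / (kept_positive + kept_negative)
        if fraction >= target:
            return n
    return None


# Hypothetical library: 1 in 1000 cells is a true positive.
permissive_gate = rounds_to_enrich(prior=0.001, tpr=0.9, fpr=0.01, target=0.95)
strict_gate = rounds_to_enrich(prior=0.001, tpr=0.5, fpr=0.001, target=0.95)
print(permissive_gate, strict_gate)
```

In this toy setting, a stricter gate that sacrifices some true positives for a much lower false positive rate reaches the target purity in fewer rounds, which is the kind of tradeoff a biased gate is meant to exploit.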

Subjects

Cell Separation/methods, Flow Cytometry/methods, Software, Support Vector Machine, Algorithms, Bayes Theorem, Yeasts

Efficient searching and annotation of metabolic networks using chemical similarity.

Pertusi, Dante A; Stine, Andrew E; Broadbelt, Linda J; Tyo, Keith E J.

Bioinformatics ; 31(7): 1016-24, 2015 Apr 01.

Article in English | MEDLINE | ID: mdl-25417203

ABSTRACT

MOTIVATION: The urgent need for efficient and sustainable biological production of fuels and high-value chemicals has elicited a wave of in silico techniques for identifying promising novel pathways to these compounds in large putative metabolic networks. To date, these approaches have primarily used general graph search algorithms, which are prohibitively slow as putative metabolic networks may exceed 1 million compounds. To alleviate this limitation, we report two methods--SimIndex (SI) and SimZyme--which use chemical similarity of 2D chemical fingerprints to efficiently navigate large metabolic networks and propose enzymatic connections between the constituent nodes. We also report a Byers-Waterman type pathway search algorithm for further paring down pertinent networks. RESULTS: Benchmarking tests run with SI show it can reduce the number of nodes visited in searching a putative network by 100-fold with a computational time improvement of up to 10^5-fold. Subsequent Byers-Waterman search application further reduces the number of nodes searched by up to 100-fold, while SimZyme demonstrates ~90% accuracy in matching query substrates with enzymes. Using these modules, we have designed and annotated an alternative to the methylerythritol phosphate pathway to produce isopentenyl pyrophosphate with more favorable thermodynamics than the native pathway. These algorithms will have a significant impact on our ability to use large metabolic networks that lack annotation of promiscuous reactions. AVAILABILITY AND IMPLEMENTATION: Python files will be available for download at http://tyolab.northwestern.edu/tools/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
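Chemical similarity between 2D fingerprints is commonly measured with the Tanimoto coefficient; the abstract does not state which measure SimIndex or SimZyme uses, so the choice here is an assumption. The sketch below represents each fingerprint as the set of its "on" bit positions and ranks a small library against a query. The compound names and bit sets are invented for illustration, and this is not the SimIndex implementation.

```python
from typing import Dict, List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| of two fingerprints,
    each given as the set of its 'on' bit positions."""
    union = a | b
    if not union:
        return 0.0  # both fingerprints empty: define similarity as 0
    return len(a & b) / len(union)


def most_similar(query: Set[int], library: Dict[str, Set[int]],
                 top_n: int = 1) -> List[str]:
    """Names of the top_n library compounds most similar to the query,
    ties broken alphabetically."""
    ranked = sorted(library, key=lambda name: (-tanimoto(query, library[name]), name))
    return ranked[:top_n]


# Hypothetical fingerprints (bit positions are illustrative only).
query = {1, 2, 3, 4}
library = {
    "x": {1, 2, 3, 5},  # shares 3 of 5 distinct bits
    "y": {1, 2},        # shares 2 of 4 distinct bits
    "z": {7, 8},        # shares none
}

hits = most_similar(query, library, top_n=2)
print(hits)
```

Restricting a network search to nodes above a similarity threshold is one plausible way such a score could prune the nodes visited, though the abstract does not describe the pruning rule.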

Subjects

Algorithms, Computational Biology/methods, Hemiterpenes/metabolism, Metabolic Networks and Pathways, Metabolomics/methods, Organophosphorus Compounds/metabolism, Pharmaceutical Preparations/chemistry, Software, Chemical Databases, Molecular Sequence Annotation
