Results 1 - 14 of 14
1.
Mol Inform; 42(3): e2200232, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36529710

ABSTRACT

Maximum common substructures (MCS) have received considerable attention in the chemoinformatics community. They are typically used as a similarity measure between molecules, showing high predictive performance when used in classification tasks, while being easily explainable substructures. In the present work, we applied the Pairwise Maximum Common Subgraph Feature Generation (PMCSFG) algorithm to automatically detect toxicophores (structural alerts) and to compute fingerprints based on MCS. We present a comparison between our MCS-based fingerprints and 12 well-known chemical fingerprints when used as features in machine learning models. We provide an experimental evaluation and discuss the usefulness of the different methods on mutagenicity data. The features generated by the MCS method achieve state-of-the-art performance when predicting mutagenicity, while being more interpretable than traditional chemical fingerprints.


Subjects
Algorithms; Mutagens; Mutagens/chemistry; Mutagenesis; Machine Learning
2.
Proc Natl Acad Sci U S A; 116(36): 18142-18147, 2019 Sep 03.
Article in English | MEDLINE | ID: mdl-31420515

ABSTRACT

One of the most challenging tasks in modern science is the development of systems biology models: Existing models are often very complex but generally have low predictive performance. The construction of high-fidelity models will require hundreds/thousands of cycles of model improvement, yet few current systems biology research studies complete even a single cycle. We combined multiple software tools with integrated laboratory robotics to execute three cycles of model improvement of the prototypical eukaryotic cellular transformation, the yeast (Saccharomyces cerevisiae) diauxic shift. In the first cycle, a model outperforming the best previous diauxic shift model was developed using bioinformatic and systems biology tools. In the second cycle, the model was further improved using automatically planned experiments. In the third cycle, hypothesis-led experiments improved the model to a greater extent than achieved using high-throughput experiments. All of the experiments were formalized and communicated to a cloud laboratory automation system (Eve) for automatic execution, and the results stored on the semantic web for reuse. The final model adds a substantial amount of knowledge about the yeast diauxic shift: 92 genes (+45%), and 1,048 interactions (+147%). This knowledge is also relevant to understanding cancer, the immune system, and aging. We conclude that systems biology software tools can be combined and integrated with laboratory robots in closed-loop cycles.


Subjects
Computational Biology; Gene Expression Regulation, Fungal; Robotics; Saccharomyces cerevisiae; Software; Systems Biology; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae/metabolism
3.
PLoS Comput Biol; 14(4): e1006097, 2018 Apr.
Article in English | MEDLINE | ID: mdl-29684010

ABSTRACT

Transposable elements (TEs) are repetitive nucleotide sequences that make up a large portion of eukaryotic genomes. They can move and duplicate within a genome, increasing genome size and contributing to genetic diversity within and across species. Accurate identification and classification of the TEs present in a genome is an important step towards understanding their effects on genes and their role in genome evolution. We introduce TE-Learner, a framework based on machine learning that automatically identifies TEs in a given genome and assigns a classification to them. We present an implementation of our framework for LTR retrotransposons, a particular type of TE characterized by having long terminal repeats (LTRs) at their boundaries. We evaluate the predictive performance of our framework on the well-annotated genomes of Drosophila melanogaster and Arabidopsis thaliana and we compare our results for three LTR retrotransposon superfamilies with the results of three widely used methods for TE identification or classification: RepeatMasker, Censor and LtrDigest. In contrast to these methods, TE-Learner is the first to incorporate machine learning techniques, outperforming these methods in terms of predictive performance, while being able to learn models and make predictions efficiently. Moreover, we show that our method was able to identify TEs that none of the above methods could find, and we investigated TE-Learner's predictions that did not correspond to an official annotation. It turns out that many of these predictions are in fact strongly homologous to a known TE.


Subjects
Machine Learning; Retroelements; Terminal Repeat Sequences; Animals; Arabidopsis/genetics; Arabidopsis Proteins/genetics; Computational Biology; Conserved Sequence; DNA, Plant/genetics; Decision Trees; Drosophila Proteins/genetics; Drosophila melanogaster/genetics; Evolution, Molecular; Genome, Insect; Genome, Plant; Software
4.
PLoS One; 13(4): e0195997, 2018.
Article in English | MEDLINE | ID: mdl-29698494

ABSTRACT

MOTIVATION: Graphlets are small network patterns that can be counted in order to characterise the structure of a network (topology). As part of a topology optimisation process, one could use graphlet counts to iteratively modify a network and keep track of the graphlet counts, in order to achieve certain topological properties. Up until now, however, graphlets were not suited as a metric for performing topology optimisation; when millions of minor changes are made to the network structure it becomes computationally intractable to recalculate all the graphlet counts for each of the edge modifications. RESULTS: IncGraph is a method for calculating the differences in graphlet counts with respect to the network in its previous state, which is much more efficient than calculating the graphlet occurrences from scratch at every edge modification made. In comparison to static counting approaches, our findings show IncGraph reduces the execution time by several orders of magnitude. The usefulness of this approach was demonstrated by developing a graphlet-based metric to optimise gene regulatory networks. IncGraph is able to quickly quantify the topological impact of small changes to a network, which opens novel research opportunities to study changes in topologies in evolving or online networks, or develop graphlet-based criteria for topology optimisation. AVAILABILITY: IncGraph is freely available as an open-source R package on CRAN (incgraph). The development version is also available on GitHub (rcannood/incgraph).


Subjects
Software; Algorithms; Gene Regulatory Networks; Models, Biological
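
The incremental idea in the IncGraph abstract above can be illustrated with the simplest graphlet, the triangle. The following is a minimal sketch, not IncGraph's actual algorithm or API: the class `TriangleCounter` and the helper `count_triangles_from_scratch` are hypothetical names introduced here. It relies only on the fact that adding or removing an edge u-v changes the triangle count by the number of common neighbours of u and v, so the update never needs a full recount.

```python
from itertools import combinations


class TriangleCounter:
    """Hypothetical helper: keeps the triangle count of an undirected
    simple graph up to date as single edges are added or removed."""

    def __init__(self):
        self.adj = {}        # node -> set of neighbours
        self.triangles = 0   # running triangle count

    def add_edge(self, u, v):
        """Add edge u-v and return the change in the triangle count."""
        nu = self.adj.setdefault(u, set())
        nv = self.adj.setdefault(v, set())
        if u == v or v in nu:
            return 0  # ignore self-loops and duplicate edges
        delta = len(nu & nv)  # each common neighbour closes one new triangle
        nu.add(v)
        nv.add(u)
        self.triangles += delta
        return delta

    def remove_edge(self, u, v):
        """Remove edge u-v and return the (non-positive) change in the count."""
        nu = self.adj.get(u, set())
        nv = self.adj.get(v, set())
        if v not in nu:
            return 0  # edge is not present
        nu.discard(v)
        nv.discard(u)
        delta = -len(nu & nv)  # each common neighbour loses one triangle
        self.triangles += delta
        return delta


def count_triangles_from_scratch(adj):
    """Static baseline: test every triple of nodes."""
    return sum(
        1
        for a, b, c in combinations(list(adj), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )
```

Each update costs one set intersection, whereas the static baseline inspects every node triple; the same contrast, generalised to larger graphlets, is what the abstract describes.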
5.
Mol Inform; 36(10), 2017 Oct.
Article in English | MEDLINE | ID: mdl-28590546

ABSTRACT

This article introduces a new type of structural fragment called a geometrical pattern. Geometrical patterns are defined as molecular graphs that include a labelling of atoms together with constraints on interatomic distances. The discovery of geometrical patterns in a chemical dataset relies on the induction of multiple decision trees combined in random forests. Each computational step corresponds to a refinement of a preceding set of constraints, extending a previous geometrical pattern. This paper focuses on the mutagenicity of chemicals via the definition of structural alerts in relation to these geometrical patterns. An experimental assessment of the main geometrical patterns follows, showing how they can efficiently give rise to the definition of a chemical feature related to a chemical function or a chemical property. Geometrical patterns provide a valuable and innovative approach that brings new information for discovering and assessing structural characteristics in relation to a particular biological phenotype.


Subjects
Mutagenesis/physiology; Carcinogens/chemistry; Mutagenesis/genetics; Mutagenicity Tests; Mutagens/chemistry; Structure-Activity Relationship
6.
Expert Rev Proteomics; 13(5): 495-511, 2016 May.
Article in English | MEDLINE | ID: mdl-27031651

ABSTRACT

With the current expanded technical capabilities to perform mass spectrometry-based biomedical proteomics experiments, an improved focus on the design of experiments is crucial. As it is clear that ignoring the importance of a good design leads to an unprecedented rate of false discoveries which would poison our results, more and more tools are being developed to help researchers design proteomic experiments. In this review, we apply statistical thinking to go through the entire proteomics workflow for biomarker discovery and validation and relate the considerations that should be made at the level of hypothesis building, technology selection, experimental design and the optimization of the experimental parameters.


Subjects
Mass Spectrometry/methods; Proteomics/methods; Research Design; Humans; Proteomics/statistics & numerical data; Proteomics/trends
7.
J R Soc Interface; 12(104): 20141289, 2015 Mar 06.
Article in English | MEDLINE | ID: mdl-25652463

ABSTRACT

There is an urgent need to make drug discovery cheaper and faster. This will enable the development of treatments for diseases currently neglected for economic reasons, such as tropical and orphan diseases, and generally increase the supply of new drugs. Here, we report the Robot Scientist 'Eve' designed to make drug discovery more economical. A Robot Scientist is a laboratory automation system that uses artificial intelligence (AI) techniques to discover scientific knowledge through cycles of experimentation. Eve integrates and automates library-screening, hit-confirmation, and lead generation through cycles of quantitative structure activity relationship learning and testing. Using econometric modelling we demonstrate that the use of AI to select compounds economically outperforms standard drug screening. For further efficiency Eve uses a standardized form of assay to compute Boolean functions of compound properties. These assays can be quickly and cheaply engineered using synthetic biology, enabling more targets to be assayed for a given budget. Eve has repositioned several drugs against specific targets in parasites that cause tropical diseases. One validated discovery is that the anti-cancer compound TNP-470 is a potent inhibitor of dihydrofolate reductase from the malaria-causing parasite Plasmodium vivax.


Subjects
Drug Design; Drug Repositioning; Rare Diseases/drug therapy; Technology, Pharmaceutical/trends; Algorithms; Antineoplastic Agents/therapeutic use; Automation; Drug Evaluation, Preclinical; Humans; Malaria, Vivax/drug therapy; Models, Statistical; Plasmodium vivax/drug effects; Quantitative Structure-Activity Relationship; Regression Analysis; Reproducibility of Results; Software; Tropical Medicine
8.
Biol Direct; 10: 1, 2015 Jan 07.
Article in English | MEDLINE | ID: mdl-25564011

ABSTRACT

BACKGROUND: A key challenge in the field of HIV-1 protein evolution is the identification of coevolving amino acids at the molecular level. In the past decades, many sequence-based methods have been designed to detect position-specific coevolution within and between different proteins. However, an ensemble coevolution system that integrates different methods to improve the detection of HIV-1 protein coevolution has not been developed. RESULTS: We integrated 27 sequence-based prediction methods published between 2004 and 2013 into an ensemble coevolution system. This system allowed combinations of different sequence-based methods for coevolution predictions. Using HIV-1 protein structures and experimental data, we evaluated the performance of individual and combined sequence-based methods in the prediction of HIV-1 intra- and inter-protein coevolution. We showed that sequence-based methods clustered according to their methodology, and a combination of four methods outperformed any of the 27 individual methods. This four-method combination estimated that HIV-1 intra-protein coevolving positions were mainly located in functional domains and were in physical contact with each other in the protein tertiary structures. In the analysis of HIV-1 inter-protein coevolving positions between Gag and protease, protease drug resistance positions near the active site mostly coevolved with Gag cleavage positions (V128, S373-T375, A431, F448-P453) and Gag C-terminal positions (S489-Q500) under selective pressure of protease inhibitors. CONCLUSIONS: This study presents a new ensemble coevolution system which detects position-specific coevolution using combinations of 27 different sequence-based methods. Our findings highlight key coevolving residues within HIV-1 structural proteins and between Gag and protease, shedding light on HIV-1 intra- and inter-protein coevolution.


Subjects
Computational Biology/methods; Evolution, Molecular; HIV Protease/genetics; HIV-1/genetics; gag Gene Products, Human Immunodeficiency Virus/genetics; Area Under Curve; Databases, Protein; Gene Products, gag/chemistry; Humans; Models, Molecular; Models, Statistical; Protein Binding; Protein Structure, Tertiary; Reproducibility of Results; Viral Proteins/chemistry
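
To make "sequence-based coevolution method" in the abstract above concrete, here is a minimal sketch of one widely used score of that kind: the mutual information between two columns of a multiple sequence alignment. This is an illustrative assumption; the abstract does not list the 27 methods, so it is not claimed that this score is among them. The function name `mutual_information` is hypothetical.

```python
import math
from collections import Counter


def mutual_information(col_x, col_y):
    """Mutual information (in bits) between two equally long alignment
    columns, each given as a string with one residue per sequence."""
    if len(col_x) != len(col_y) or not col_x:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_x)
    count_x = Counter(col_x)               # residue counts at position X
    count_y = Counter(col_y)               # residue counts at position Y
    count_xy = Counter(zip(col_x, col_y))  # joint counts of residue pairs
    # sum over pairs of p(x, y) * log2( p(x, y) / (p(x) * p(y)) )
    return sum(
        (c / n) * math.log2(c * n / (count_x[a] * count_y[b]))
        for (a, b), c in count_xy.items()
    )
```

Two positions that always change together score high, while positions that vary independently score near zero. An ensemble in the spirit of the abstract would combine several such scores, but how the authors combine them is not stated here.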
9.
Proteomics; 14(4-5): 353-66, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24323524

ABSTRACT

Machine learning is a subdiscipline within artificial intelligence that focuses on algorithms that allow computers to learn to solve a (complex) problem from existing data. This ability can be used to generate a solution to a particularly intractable problem, given that enough data are available to train and subsequently evaluate an algorithm on. Since MS-based proteomics has no shortage of complex problems, and since publicly available data are becoming available in ever growing amounts, machine learning is fast becoming a very popular tool in the field. We therefore present here an overview of the different applications of machine learning in proteomics that together cover nearly the entire wet- and dry-lab workflow, and that address key bottlenecks in experiment planning and design, as well as in data processing and analysis.


Subjects
Artificial Intelligence; Computational Biology; Proteomics/methods; Reference Standards; Research Design
10.
Bioinformatics; 29(15): 1913-4, 2013 Aug 01.
Article in English | MEDLINE | ID: mdl-23709496

ABSTRACT

SUMMARY: We present PIUS, a tool that identifies peptides from tandem mass spectrometry data by analyzing the six-frame translation of a complete genome. It differs from earlier studies that have performed such a genomic search in two ways: (i) it considers a larger search space and (ii) it is designed for natural peptide identification rather than proteomics. Unlike other peptidomics tools designed for genome-wide searches, PIUS does not limit the analysis to a set of sequences that match a list of de novo reconstructions. AVAILABILITY: Source code, executables and a detailed technical report are freely available at http://dtai.cs.kuleuven.be/ml/systems/pius. CONTACT: eduardo.costa@cs.kuleuven.be SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Peptides/chemistry; Software; Tandem Mass Spectrometry; Algorithms; Animals; Cell Line; Databases, Protein; Genome; Genomics; Mice; Peptides/analysis; Proteomics/methods; Sequence Analysis, Protein
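
The six-frame translation mentioned in the PIUS abstract above can be sketched as follows. This is a generic illustration using the standard genetic code, not PIUS source code; the function names and the frame labels `+1` to `-3` are conventions chosen here, and any codon containing a character other than A, C, G or T is rendered as `X` by assumption.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order for each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; '*' marks a stop, 'X' an unknown codon."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Translate both strands at offsets 0, 1 and 2 (frames +1..+3, -1..-3)."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

For example, `six_frame_translation("ATGGCCTAA")["+1"]` is `"MA*"`. A peptide search would then split each frame at stop codons and match candidate peptides against the spectra; that step is outside this sketch.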
11.
J Proteome Res; 12(5): 2253-9, 2013 May 03.
Article in English | MEDLINE | ID: mdl-23517142

ABSTRACT

Trypsin is the workhorse protease in mass spectrometry-based proteomics experiments and is used to digest proteins into more readily analyzable peptides. To identify these peptides after mass spectrometric analysis, the actual digestion has to be mimicked as faithfully as possible in silico. In this paper we introduce CP-DT (Cleavage Prediction with Decision Trees), an algorithm based on a decision tree ensemble that was learned on publicly available peptide identification data from the PRIDE repository. We demonstrate that CP-DT is able to accurately predict tryptic cleavage: tests on three independent data sets show that CP-DT significantly outperforms the Keil rules that are currently used to predict tryptic cleavage. Moreover, the trees generated by CP-DT can make predictions efficiently and are interpretable by domain experts.


Subjects
Models, Biological; Trypsin/chemistry; Algorithms; Amino Acid Sequence; Animals; Artificial Intelligence; Data Interpretation, Statistical; Decision Trees; Humans; Proteolysis; Proteomics
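
For context on what "mimicking the digestion in silico" involves, the sketch below implements the commonly used simplified trypsin rule: cleave after lysine (K) or arginine (R) unless the next residue is proline (P). This is neither CP-DT nor the full set of Keil rules, which include further exceptions; it only shows the kind of rule-based baseline that a learned model would replace. The function names and the `missed_cleavages` parameter are introduced here for illustration.

```python
def cleavage_sites(protein):
    """Indices i where the bond between protein[i-1] and protein[i] is cut
    under the simplified rule: after K or R, but not before P."""
    return [
        i
        for i in range(1, len(protein))
        if protein[i - 1] in "KR" and protein[i] != "P"
    ]


def tryptic_digest(protein, missed_cleavages=0):
    """Return peptides in sequence order, allowing up to `missed_cleavages`
    consecutive cleavage sites to be skipped."""
    if not protein:
        return []
    bounds = [0] + cleavage_sites(protein) + [len(protein)]
    peptides = []
    for start in range(len(bounds) - 1):
        # furthest fragment boundary reachable with the allowed skips
        last = min(start + 1 + missed_cleavages, len(bounds) - 1)
        for end in range(start + 1, last + 1):
            peptides.append(protein[bounds[start]:bounds[end]])
    return peptides
```

With `tryptic_digest("AKRPGKLR")` the R-P bond is left intact, giving `["AK", "RPGK", "LR"]`. A probabilistic predictor such as the one the abstract describes would instead assign each K/R site a cleavage probability based on its sequence context.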
12.
BMC Med Inform Decis Mak; 11: 64, 2011 Oct 25.
Article in English | MEDLINE | ID: mdl-22027016

ABSTRACT

BACKGROUND: The intensive care unit (ICU) length of stay (LOS) of patients undergoing cardiac surgery may vary considerably, and is often difficult to predict within the first hours after admission. The early clinical evolution of a cardiac surgery patient might be predictive of their LOS. The purpose of the present study was to develop a predictive model for ICU discharge after non-emergency cardiac surgery, by analyzing the first 4 hours of data in the computerized medical record of these patients with Gaussian processes (GP), a machine learning technique. METHODS: Non-interventional study. Predictive modeling, separate development (n = 461) and validation (n = 499) cohort. GP models were developed to predict the probability of ICU discharge the day after surgery (classification task), and to predict the day of ICU discharge as a discrete variable (regression task). GP predictions were compared with predictions by EuroSCORE, nurses and physicians. The classification task was evaluated using aROC for discrimination, and Brier Score, Brier Score Scaled, and Hosmer-Lemeshow test for calibration. The regression task was evaluated by comparing median actual and predicted discharge, loss penalty function (LPF) ((actual-predicted)/actual) and calculating root mean squared relative errors (RMSRE). RESULTS: Median (P25-P75) ICU length of stay was 3 (2-5) days. For classification, the GP model showed an aROC of 0.758 which was significantly higher than the predictions by nurses, but not better than EuroSCORE and physicians. The GP had the best calibration, with a Brier Score of 0.179 and Hosmer-Lemeshow p-value of 0.382. For regression, GP had the highest proportion of patients with a correctly predicted day of discharge (40%), which was significantly better than the EuroSCORE (p < 0.001) and nurses (p = 0.044) but equivalent to physicians. GP had the lowest RMSRE (0.408) of all predictive models. CONCLUSIONS: A GP model that uses PDMS data of the first 4 hours after admission in the ICU of scheduled adult cardiac surgery patients was able to predict discharge from the ICU as a classification as well as a regression task. The GP model demonstrated a significantly better discriminative power than the EuroSCORE and the ICU nurses, and at least as good as predictions made by ICU physicians. The GP model was the only well calibrated model.


Subjects
Intensive Care Units/organization & administration; Models, Theoretical; Patient Discharge; Surgical Procedures, Operative; Adult; Artificial Intelligence; Cardiac Surgical Procedures; Electronic Health Records; Humans; Length of Stay; Normal Distribution
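
The calibration and regression metrics named in the abstract above are short enough to write out. The sketch below follows the standard definition of the Brier score and the relative error exactly as the abstract states it, (actual-predicted)/actual; reading RMSRE as the root of the mean squared relative error is an assumption based on its name, and the function names are chosen here for illustration.

```python
import math


def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted probabilities and the
    observed 0/1 outcomes; lower is better, 0 is perfect."""
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(outcomes)


def relative_errors(actual, predicted):
    """Per-patient (actual - predicted) / actual, as given in the abstract.
    Assumes every actual value is non-zero (e.g. a discharge day >= 1)."""
    return [(a - p) / a for a, p in zip(actual, predicted)]


def rmsre(actual, predicted):
    """Root mean squared relative error over all patients."""
    errors = relative_errors(actual, predicted)
    return math.sqrt(sum(e * e for e in errors) / len(errors))
```

A classifier that always predicts 0.5 has a Brier score of 0.25 regardless of the outcomes, which is a useful reference point when reading the reported value of 0.179.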
13.
Stud Health Technol Inform; 150: 590-4, 2009.
Article in English | MEDLINE | ID: mdl-19745380

ABSTRACT

This work studies the impact of using dynamic information as features in a machine learning algorithm for the prediction task of classifying critically ill patients into two classes according to the time they need to reach a stable state after coronary bypass surgery: less or more than nine hours. On the basis of five physiological variables, different dynamic features were extracted. These sets of features subsequently served as inputs for a Gaussian process, and the prediction results were compared with the case where only admission data was used for the classification. The dynamic features, especially the cepstral coefficients (aROC: 0.749, Brier score: 0.206), resulted in higher performance when compared to static admission data (aROC: 0.547, Brier score: 0.247). In all cases, the Gaussian process classifier outperformed logistic regression.


Subjects
Information Storage and Retrieval; Statistics as Topic/methods; Aged; Belgium; Female; Humans; Intensive Care Units; Male; Middle Aged; Normal Distribution; Ventilator Weaning/statistics & numerical data
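
The aROC values quoted in the abstract above (0.749 versus 0.547) are areas under the ROC curve. A minimal sketch of that quantity, using its pairwise interpretation (the probability that a randomly chosen positive case is scored above a randomly chosen negative one, with ties counted as one half), is given below. The function name `auroc` and the 0/1 label encoding are assumptions made for this example; the quadratic loop is chosen for clarity, not speed.

```python
def auroc(scores, labels):
    """Area under the ROC curve computed over all positive/negative pairs.
    `labels` holds 1 for the positive class and 0 for the negative class."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    # a correctly ordered pair counts 1, a tie counts 0.5, otherwise 0
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))
```

A value of 0.5 corresponds to random ranking, so the reported 0.547 for static admission data is close to chance, while 0.749 indicates that roughly three out of four positive/negative pairs are ordered correctly.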
14.
Best Pract Res Clin Anaesthesiol; 23(1): 127-43, 2009 Mar.
Article in English | MEDLINE | ID: mdl-19449621

ABSTRACT

Computerization in healthcare in general, and in the operating room (OR) and intensive care unit (ICU) in particular, is on the rise. This leads to large patient databases, with specific properties. Machine learning techniques are able to examine and to extract knowledge from large databases in an automatic way. Although the number of potential applications for these techniques in medicine is large, few medical doctors are familiar with their methodology, advantages and pitfalls. A general overview of machine learning techniques, with a more detailed discussion of some of these algorithms, is presented in this review.


Subjects
Artificial Intelligence; Databases, Factual; Information Storage and Retrieval/methods; Medical Records Systems, Computerized; Algorithms; Computational Biology/methods; Decision Support Systems, Clinical; Humans; Intensive Care Units; Neural Networks, Computer; Operating Rooms/methods; Software