Results 1 - 12 of 12
1.
Polymers (Basel); 15(5), 2023 Mar 06.
Article in English | MEDLINE | ID: mdl-36904566

ABSTRACT

Artificial intelligence (AI) is an emerging technology that is revolutionizing the discovery of new materials. One key application of AI is virtual screening of chemical libraries, which enables the accelerated discovery of materials with desired properties. In this study, we developed computational models to predict the dispersancy efficiency of oil and lubricant additives, a critical property in their design that can be estimated through a quantity named blotter spot. We propose a comprehensive approach that combines machine learning techniques with visual analytics strategies in an interactive tool that supports domain experts' decision-making. We evaluated the proposed models quantitatively and illustrated their benefits through a case study. Specifically, we analyzed a series of virtual polyisobutylene succinimide (PIBSI) molecules derived from a known reference substrate. Our best-performing probabilistic model was Bayesian Additive Regression Trees (BART), which achieved a mean absolute error of 5.50±0.34 and a root mean square error of 7.56±0.47, as estimated through 5-fold cross-validation. To facilitate future research, we have made the dataset, including the potential dispersants used for modeling, publicly available. Our approach can help accelerate the discovery of new oil and lubricant additives, and our interactive tool can aid domain experts in making informed decisions based on blotter spot and other key properties.
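
The error figures quoted in the abstract above (mean absolute error and root mean square error, each reported as a mean ± spread over 5-fold cross-validation) can be illustrated with a minimal sketch. It assumes the ± denotes the standard deviation across folds, which the abstract does not state. The mean-value baseline and the toy targets are hypothetical placeholders, not the BART model or the published blotter spot dataset.

```python
import math
import random
import statistics


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_errors(y, fit_predict, k=5, seed=0):
    """Return ((MAE mean, MAE std), (RMSE mean, RMSE std)) over k folds.

    fit_predict(train_idx, test_idx) must return one prediction per test index.
    """
    maes, rmses = [], []
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        preds = fit_predict(train_idx, test_idx)
        errors = [preds[j] - y[i] for j, i in enumerate(test_idx)]
        maes.append(sum(abs(e) for e in errors) / len(errors))
        rmses.append(math.sqrt(sum(e * e for e in errors) / len(errors)))
    return ((statistics.mean(maes), statistics.stdev(maes)),
            (statistics.mean(rmses), statistics.stdev(rmses)))


# Toy targets for illustration only, NOT the published data.
y_toy = [float(v) for v in range(20)]


def mean_baseline(train_idx, test_idx):
    # Placeholder model (an assumption): predict the training mean everywhere.
    mu = statistics.mean(y_toy[i] for i in train_idx)
    return [mu] * len(test_idx)


(mae_mean, mae_std), (rmse_mean, rmse_std) = cross_validated_errors(y_toy, mean_baseline)
print(f"MAE  {mae_mean:.2f} ± {mae_std:.2f}")
print(f"RMSE {rmse_mean:.2f} ± {rmse_std:.2f}")
```

Any regression model can be substituted for `mean_baseline` as long as it is fitted on the training indices only, so that each fold's error is measured on data the model has not seen.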

2.
J Chem Inf Model; 62(24): 6342-6351, 2022 Dec 26.
Article in English | MEDLINE | ID: mdl-36066065

ABSTRACT

The Ames mutagenicity test constitutes the most frequently used assay to estimate the mutagenic potential of drug candidates. While this test employs experimental results using various strains of Salmonella typhimurium, the vast majority of the published in silico models for predicting mutagenicity do not take into account the test results of the individual experiments conducted for each strain. Instead, such QSAR models are generally trained employing overall labels (i.e., mutagenic and nonmutagenic). Recently, neural-based models combined with multitask learning strategies have yielded interesting results in different domains, given their capabilities to model multitarget functions. In this scenario, we propose a novel neural-based QSAR model to predict mutagenicity that leverages experimental results from different strains involved in the Ames test by means of a multitask learning approach. To the best of our knowledge, the modeling strategy hereby proposed has not been applied to model Ames mutagenicity previously. The results yielded by our model surpass those obtained by single-task modeling strategies, such as models that predict the overall Ames label or ensemble models built from individual strains. For reproducibility and accessibility purposes, all source code and datasets used in our experiments are publicly available.


Subjects
Mutagens, Neural Networks, Computer, Mutagens/toxicity, Reproducibility of Results, Mutagenesis, Computer Simulation, Mutagenicity Tests/methods
3.
Brief Bioinform; 23(1), 2022 Jan 17.
Article in English | MEDLINE | ID: mdl-34498670

ABSTRACT

With the consolidation of deep learning in drug discovery, several novel algorithms for learning molecular representations have been proposed. Despite the interest of the community in developing new methods for learning molecular embeddings and their theoretical benefits, comparing molecular embeddings with each other and with traditional representations is not straightforward, which in turn hinders the process of choosing a suitable representation for Quantitative Structure-Activity Relationship (QSAR) modeling. A reason behind this issue is the difficulty of conducting a fair and thorough comparison of the different existing embedding approaches, which requires numerous experiments on various datasets and training scenarios. To close this gap, we reviewed the literature on methods for molecular embeddings and reproduced three unsupervised and two supervised molecular embedding techniques recently proposed in the literature. We compared these five methods concerning their performance in QSAR scenarios using different classification and regression datasets. We also compared these representations to traditional molecular representations, namely molecular descriptors and fingerprints. As opposed to the expected outcome, our experimental setup consisting of over 25,000 trained models and statistical tests revealed that the predictive performance using molecular embeddings did not significantly surpass that of traditional representations. Although supervised embeddings yielded competitive results compared with those using traditional molecular representations, unsupervised embeddings tended to perform worse than traditional representations. Our results highlight the need for conducting a careful comparison and analysis of the different embedding techniques prior to using them in drug design tasks and motivate a discussion about the potential of molecular embeddings in computer-aided drug design.


Subjects
Algorithms, Quantitative Structure-Activity Relationship
4.
JAMIA Open; 4(4): ooab104, 2021 Oct.
Article in English | MEDLINE | ID: mdl-34927002

ABSTRACT

The COVID-19 pandemic resulted in an unprecedented production of scientific literature spanning several fields. To facilitate navigation of the scientific literature related to various aspects of the pandemic, we developed an exploratory search system. The system is based on automatically identified technical terms, document citations, and their visualization, accelerating identification of relevant documents. It offers a multi-view interactive search and navigation interface, bringing together unsupervised approaches of term extraction and citation analysis. We conducted a user evaluation with domain experts, including epidemiologists, biochemists, medicinal chemists, and medical students. In general, most users were satisfied with the relevance and speed of the search results. More interestingly, participants mostly agreed on the capacity of the system to enable exploration and discovery of the search space using the graph visualization and filters. The system is updated on a weekly basis and is publicly available at http://www.nactem.ac.uk/cord/.

5.
IEEE Trans Vis Comput Graph; 27(2): 891-901, 2021 Feb.
Article in English | MEDLINE | ID: mdl-33048734

ABSTRACT

In the modern drug discovery process, medicinal chemists deal with the complexity of analysis of large ensembles of candidate molecules. Computational tools, such as dimensionality reduction (DR) and classification, are commonly used to efficiently process the multidimensional space of features. These underlying calculations often hinder interpretability of results and prevent experts from assessing the impact of individual molecular features on the resulting representations. To provide a solution for scrutinizing such complex data, we introduce ChemVA, an interactive application for the visual exploration of large molecular ensembles and their features. Our tool consists of multiple coordinated views: Hexagonal view, Detail view, 3D view, Table view, and a newly proposed Difference view designed for the comparison of DR projections. These views display DR projections combined with biological activity, selected molecular features, and confidence scores for each of these projections. This conjunction of views allows the user to drill down through the dataset and to efficiently select candidate compounds. Our approach was evaluated on two case studies of finding structurally similar ligands with similar binding affinity to a target protein, as well as on an external qualitative evaluation. The results suggest that our system allows effective visual inspection and comparison of different high-dimensional molecular representations. Furthermore, ChemVA assists in the identification of candidate compounds while providing information on the certainty behind different molecular representations.


Subjects
Computer Graphics, Proteins
6.
Bioinformatics; 35(10): 1799-1801, 2019 May 15.
Article in English | MEDLINE | ID: mdl-30329013

ABSTRACT

SUMMARY: Although the publication rate of the biomedical literature has been growing steadily over the last decades, the accessibility of pertinent research publications for biologists and medical practitioners remains a challenge. This article describes Thalia, a semantic search engine that can recognize eight different types of concepts occurring in biomedical abstracts. Thalia is available via a web-based interface or a RESTful API. A key aspect of our search engine is that it is updated from PubMed on a daily basis. We describe here the main building blocks of our tool as well as an evaluation of the retrieval capabilities of Thalia in the context of a precision medicine dataset. AVAILABILITY AND IMPLEMENTATION: Thalia is available at http://nactem.ac.uk/Thalia_BI/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Search Engine, Internet, PubMed, Semantics
7.
Bioinformatics; 34(8): 1389-1397, 2018 Apr 15.
Article in English | MEDLINE | ID: mdl-29228271

ABSTRACT

Motivation: Pathway models are valuable resources that help us understand the various mechanisms underpinning complex biological processes. Their curation is typically carried out through manual inspection of published scientific literature to find information relevant to a model, which is a laborious and knowledge-intensive task. Furthermore, models curated manually cannot be easily updated and maintained with new evidence extracted from the literature without automated support. Results: We have developed LitPathExplorer, a visual text analytics tool that integrates advanced text mining, semi-supervised learning and interactive visualization, to facilitate the exploration and analysis of pathway models using statements (i.e. events) extracted automatically from the literature and organized according to levels of confidence. LitPathExplorer supports pathway modellers and curators alike by: (i) extracting events from the literature that corroborate existing models with evidence; (ii) discovering new events which can update models; and (iii) providing a confidence value for each event that is automatically computed based on linguistic features and article metadata. Our evaluation of event extraction showed a precision of 89% and a recall of 71%. Evaluation of our confidence measure, when used for ranking sampled events, showed an average precision ranging between 61 and 73%, which can be improved to 95% when the user is involved in the semi-supervised learning process. Qualitative evaluation using pair analytics based on the feedback of three domain experts confirmed the utility of our tool within the context of pathway model exploration. Availability and implementation: LitPathExplorer is available at http://nactem.ac.uk/LitPathExplorer_BI/. Contact: sophia.ananiadou@manchester.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Computer Graphics, Data Mining/methods, Supervised Machine Learning, Publications
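
The LitPathExplorer abstract above reports precision and recall for event extraction and average precision for ranked events. The following is a minimal sketch of how such metrics are commonly computed. The event identifiers and the ranked relevance list are hypothetical, and the average-precision variant shown (normalised by the number of relevant items found in the list) is one common definition, assumed here rather than taken from the paper.

```python
def precision_recall(predicted, gold):
    """Precision and recall of a predicted set against a gold-standard set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def average_precision(ranked_relevance):
    """Average precision of a ranked list of booleans, best-ranked first.

    Precision is taken at each rank holding a relevant item, then averaged
    over the relevant items present in the list.
    """
    hits, total = 0, 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Hypothetical event identifiers, for illustration only.
extracted = {"e1", "e2", "e3", "e4"}
gold_standard = {"e1", "e2", "e5"}
precision, recall = precision_recall(extracted, gold_standard)

# Hypothetical ranking by confidence: True marks a correct event.
ap = average_precision([True, False, True, True])
print(precision, recall, ap)
```

Ranking by a confidence value is useful precisely because average precision rewards placing correct events near the top of the list, which is what a curator inspects first.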
8.
PLoS One; 12(4): e0175277, 2017.
Article in English | MEDLINE | ID: mdl-28414821

ABSTRACT

The increasing growth of literature in biodiversity presents challenges to users who need to discover pertinent information in an efficient and timely manner. In response, text mining techniques offer solutions by facilitating the automated discovery of knowledge from large textual data. An important step in text mining is the recognition of concepts via their linguistic realisation, i.e., terms. However, a given concept may be referred to in text using various synonyms or term variants, making search systems likely to overlook documents that mention less well-known variants but are nevertheless relevant to a query term. Domain-specific terminological resources, which include term variants, synonyms and related terms, are thus important in supporting semantic search over large textual archives. This article describes the use of text mining methods for the automatic construction of a large-scale biodiversity term inventory. The inventory consists of names of species, amongst which naming variations are prevalent. We apply a number of distributional semantic techniques on all of the titles in the Biodiversity Heritage Library, to compute semantic similarity between species names and support the automated construction of the resource. With the construction of our biodiversity term inventory, we demonstrate that distributional semantic models are able to identify semantically similar names that are not yet recorded in existing taxonomies. Such methods can thus be used to update existing taxonomies semi-automatically by deriving semantically related taxonomic names from a text corpus and allowing expert curators to validate them. We also evaluate our inventory as a means to improve search by facilitating automatic query expansion. Specifically, we developed a visual search interface that suggests semantically related species names, which are available in our inventory but not always in other repositories, to incorporate into the search query. An assessment of the interface by domain experts reveals that our query expansion based on related names is useful for increasing the number of relevant documents retrieved. Its exploitation can benefit both users and developers of search engines and text mining applications.


Subjects
Biodiversity, Data Mining/methods, Algorithms, Libraries, Search Engine, Semantics, Terminology as Topic
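
The biodiversity abstract above relies on distributional semantics: names that occur in similar contexts are treated as semantically similar. A minimal sketch of that idea, assuming simple co-occurrence counts and cosine similarity, is given here. The abstract does not specify which distributional models were used, and the titles and the single-token names `taxon_a`, `taxon_b` and `taxon_c` are hypothetical placeholders rather than entries from the Biodiversity Heritage Library.

```python
import math
from collections import Counter


def context_vector(name, titles, window=2):
    """Count the words found within `window` tokens of each mention of `name`."""
    counts = Counter()
    for title in titles:
        tokens = title.lower().split()
        for i, token in enumerate(tokens):
            if token == name:
                lo, hi = max(0, i - window), i + window + 1
                counts.update(t for t in tokens[lo:hi] if t != name)
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical titles and placeholder names, for illustration only.
titles = [
    "notes on taxon_a in alpine lakes",
    "taxon_b in alpine lakes revisited",
    "catalogue of taxon_c from desert soils",
]
vec_a = context_vector("taxon_a", titles)
vec_b = context_vector("taxon_b", titles)
vec_c = context_vector("taxon_c", titles)

print(cosine(vec_a, vec_b))  # shared contexts ("in", "alpine")
print(cosine(vec_a, vec_c))  # no shared contexts
```

For query expansion, the names with the highest similarity to the query term would be offered as suggestions, with an expert deciding which to accept.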
9.
Anal Chem; 88(15): 7476-80, 2016 Aug 02.
Article in English | MEDLINE | ID: mdl-27351615

ABSTRACT

Liquid chromatography coupled to electrospray tandem mass spectrometry (LC-ESI-MS/MS) is widely used in proteomic and metabolomic workflows. Considerable analytical improvements have been observed when the components of LC systems are scaled down. Currently, nano-ESI is typically done at capillary LC flow rates ranging from 200 to 300 nL/min. At these flow rates, troubleshooting and leak detection in LC systems have become increasingly challenging. In this paper, we present a novel proof-of-concept approach to measure flow rates at the tip of electrospray emitters when the ionization voltage is turned off. This was achieved by estimating the changes in the droplet volume over time using digital image analysis. The results are comparable with those of traditional methods of measuring flow rates, with the potential advantages of being fully automatable and nondisruptive.
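
The measurement principle in the abstract above, flow rate as the change in droplet volume over time, can be sketched as a least-squares slope of volume against time. This sketch assumes a spherical droplet whose radius has already been extracted from each image. The abstract does not describe the droplet geometry or the image analysis itself, and the measurements used here are synthetic.

```python
import math


def sphere_volume_nl(radius_um):
    """Volume in nanolitres of a sphere with radius in micrometres.

    Uses 1 nL = 1e6 cubic micrometres. The spherical shape is an assumption.
    """
    return (4.0 / 3.0) * math.pi * radius_um ** 3 / 1e6


def flow_rate(times_min, volumes_nl):
    """Least-squares slope of volume against time, in nL/min."""
    n = len(times_min)
    t_mean = sum(times_min) / n
    v_mean = sum(volumes_nl) / n
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times_min, volumes_nl))
    sxx = sum((t - t_mean) ** 2 for t in times_min)
    return sxy / sxx


# Synthetic droplet volumes growing at exactly 250 nL/min, NOT measured data.
times = [0.0, 0.5, 1.0, 1.5]
volumes = [10.0 + 250.0 * t for t in times]
rate = flow_rate(times, volumes)
print(f"{rate:.1f} nL/min")
```

Fitting a slope over several frames, rather than differencing two frames, reduces the influence of noise in any single volume estimate.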

10.
J Cheminform; 7: 39, 2015.
Article in English | MEDLINE | ID: mdl-26300983

ABSTRACT

BACKGROUND: The design of QSAR/QSPR models is a challenging problem, where the selection of the most relevant descriptors constitutes a key step of the process. Several feature selection methods that address this step concentrate on statistical associations among descriptors and target properties, whereas chemical knowledge is left out of the analysis. For this reason, the interpretability and generality of the QSAR/QSPR models obtained by these feature selection methods are drastically affected. Therefore, an approach for integrating domain experts' knowledge in the selection process is needed to increase confidence in the final set of descriptors. RESULTS: In this paper, we propose a software tool, named Visual and Interactive DEscriptor ANalysis (VIDEAN), that combines statistical methods with interactive visualizations for choosing a set of descriptors for predicting a target property. Domain expertise can be added to the feature selection process by means of an interactive visual exploration of data, aided by statistical tools and metrics based on information theory. Coordinated visual representations are presented for capturing different relationships and interactions among descriptors, target properties and candidate subsets of descriptors. The capabilities of the proposed software were assessed through different scenarios. These scenarios reveal how an expert can use this tool to choose one subset of descriptors from a group of candidate subsets, or to modify existing descriptor subsets and even incorporate new descriptors according to his or her own knowledge of the target property. CONCLUSIONS: The reported experiences showed the suitability of our software for selecting sets of descriptors with low cardinality, high interpretability, low redundancy and high statistical performance in a visual exploratory way. Therefore, it is possible to conclude that the resulting tool allows the integration of a chemist's expertise in the descriptor selection process with a low cognitive effort, in contrast with the alternative of using an ad hoc manual analysis of the selected descriptors. Graphical abstract: VIDEAN allows the visual analysis of candidate subsets of descriptors for QSAR/QSPR. In the two panels on the top, users can interactively explore numerical correlations as well as co-occurrences in the candidate subsets through two interactive graphs.

11.
Molecules; 17(12): 14937-53, 2012 Dec 17.
Article in English | MEDLINE | ID: mdl-23247367

ABSTRACT

Volatile organic compounds (VOCs) are contained in a variety of chemicals that can be found in household products and may have undesirable effects on health. Therefore, it is important to model blood-to-liver partition coefficients (log P(liver)) for VOCs in a fast and inexpensive way. In this paper, we present two new quantitative structure-property relationship (QSPR) models for the prediction of log P(liver), and we also propose a hybrid approach for the selection of the descriptors. This hybrid methodology combines a machine learning method with a manual selection based on expert knowledge, which yields a set of descriptors that is interpretable in physicochemical terms. Our regression models were trained using decision trees and neural networks and validated using an external test set. Results show high prediction accuracy compared to previous log P(liver) models, and the descriptor selection approach provides a means to obtain a small set of descriptors that agrees with the theoretical understanding of the target property.


Subjects
Gases, Models, Theoretical, Quantitative Structure-Activity Relationship, Volatile Organic Compounds, Animals, Artificial Intelligence, Gases/chemistry, Gases/toxicity, Humans, Liver/drug effects, Rats, Volatile Organic Compounds/chemistry, Volatile Organic Compounds/toxicity
12.
Mol Inform; 30(9): 779-89, 2011 Sep.
Article in English | MEDLINE | ID: mdl-27467410

ABSTRACT

This work describes a methodology for assisting virtual screening of drugs during the early stages of the drug development process. This methodology is proposed to improve the reliability of in silico property prediction and is structured in two steps. Firstly, a transformation is sought for mapping a high-dimensional space defined by potentially redundant or irrelevant molecular descriptors into a low-dimensional application-related space. For this task we evaluate three different target-driven subspace mapping methods, out of which we highlight the recent Correlative Matrix Mapping (CMM) as the most stable. Secondly, we apply an applicability domain model on the low-dimensional space for assessing the confidence of compound classification. Using a probabilistic framework, the applicability domain approach identifies compounds that are poorly represented in the training set (extrapolation problems) and regions in the space where the uncertainty about the correct class is higher than normal (interpolation problems). This two-step approach represents an important contribution to the development of confident prediction tools in the chemoinformatics area, where the field is in need of both interpretable models and methods that estimate the confidence of predictions.
