Results 1 - 19 of 19
1.
J Chem Inf Model ; 62(7): 1618-1632, 2022 04 11.
Article in English | MEDLINE | ID: mdl-35315648

ABSTRACT

In the construction of QSAR models for predicting molecular activity, feature selection is a common task aimed at improving both the results and the understanding of the problem. Feature selection eliminates irrelevant and redundant features, reduces the effect of dimensionality problems, and improves the generalization and interpretability of the models. Many feature selection applications, such as those based on ensembles of feature selectors, need to combine different selection processes. In this work, we evaluate a new feature selection approach for the prediction of molecular activity that builds an undirected graph to combine base feature selectors. The experimental results demonstrate the efficiency of the graph-based method, compared with the standard voting method, in terms of classification performance, feature reduction, and redundancy. The graph-based method can be extended to other feature selection algorithms and applied to other cheminformatics problems.


Subjects
Algorithms, Research Design
2.
IEEE Trans Cybern ; 52(5): 2942-2954, 2022 May.
Article in English | MEDLINE | ID: mdl-33027013

ABSTRACT

Feature selection is one of the most frequent tasks in data mining applications. Because it removes useless and redundant features, improving classification performance and yielding knowledge about a given problem, feature selection is a common first step in data mining. In many feature selection applications, we need to combine the results of different feature selection processes. The two most common scenarios are ensembles of feature selectors and the scaling up of feature selection methods using a data division approach. The standard procedure is to count the number of times each feature has been selected as votes for that feature and then evaluate different selection thresholds with a given criterion to obtain the final subset of selected features. However, this method is suboptimal because the relationships among the features are not considered in the voting process. Two redundant features may be selected a similar number of times because different sets of instances are used each time, so a voting scheme would tend to select both of them. In this article, we present a new approach: instead of using only the number of times a feature has been selected, it considers how many times features have been selected together by a feature selection algorithm. The proposal is based on constructing an undirected graph whose vertices are the features and whose edges count the number of times each pair of features has been selected together. This graph is used to select the best subset of features, avoiding the redundancy introduced by the voting scheme. The proposal improves on the results of the standard voting scheme for both ensembles of feature selectors and data division methods for scaling up feature selection.


Subjects
Algorithms, Data Mining, Research Design
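To make the contrast between the two combination schemes concrete, the sketch below counts votes and builds the undirected co-selection graph (vertices are features, edge weights count how often two features were selected together) from the outputs of several base selectors. The abstract does not describe how the final subset is extracted from the graph, so `graph_select` uses a hypothetical greedy rule of our own: features are visited in decreasing vote order and kept only if they were co-selected at least `min_together` times with every feature already kept. All function names are illustrative.

```python
from collections import Counter
from itertools import combinations


def vote_counts(selections, n_features):
    """Number of base selectors that chose each feature."""
    votes = [0] * n_features
    for subset in selections:
        for f in subset:
            votes[f] += 1
    return votes


def coselection_graph(selections):
    """Undirected weighted graph: edge (a, b) with a < b -> times selected together."""
    edges = Counter()
    for subset in selections:
        for a, b in combinations(sorted(set(subset)), 2):
            edges[(a, b)] += 1
    return edges


def voting_select(selections, n_features, threshold):
    """Standard scheme: keep every feature with at least `threshold` votes."""
    votes = vote_counts(selections, n_features)
    return [f for f, v in enumerate(votes) if v >= threshold]


def graph_select(selections, n_features, min_together):
    """Hypothetical greedy rule: grow a set of features that were often co-selected."""
    votes = vote_counts(selections, n_features)
    edges = coselection_graph(selections)
    chosen = []
    for f in sorted(range(n_features), key=lambda i: -votes[i]):
        if votes[f] == 0:
            break
        # Counter returns 0 for pairs that were never selected together.
        if all(edges[(min(f, g), max(f, g))] >= min_together for g in chosen):
            chosen.append(f)
    return chosen


# Features 0 and 1 are redundant: each run picks one of them, never both.
runs = [{0, 2}, {1, 2}, {0, 2}, {1, 2}]
print(voting_select(runs, 3, threshold=2))    # [0, 1, 2]
print(graph_select(runs, 3, min_together=2))  # [2, 0]
```

Voting keeps both redundant features because each collects the same number of votes; the co-selection graph exposes that features 0 and 1 never appear in the same run.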
3.
J Chem Inf Model ; 61(1): 76-94, 2021 01 25.
Article in English | MEDLINE | ID: mdl-33350301

ABSTRACT

During drug development, toxicity tests and adverse effect studies are commonly carried out; they are essential to guarantee patient safety and the success of the research. Using in silico quantitative structure-activity relationship (QSAR) approaches for this task involves processing huge amounts of data that, in many cases, have an imbalanced distribution of active and inactive samples. This is usually termed the class-imbalance problem, and it can have a significant negative effect on the performance of the learned models. In particular, the performance of feature selection (FS) for QSAR models is usually degraded by the class-imbalanced nature of the datasets involved. This paper proposes an FS method designed to deal with class-imbalance problems. The method is based on FS ensembles constructed by boosting, using two well-known FS methods: fast clustering-based FS and the fast correlation-based filter. The experimental results demonstrate the efficiency of the proposal in terms of classification performance compared with standard methods. The proposal can be extended to other FS methods and applied to other problems in cheminformatics.


Subjects
Algorithms, Quantitative Structure-Activity Relationship, Computer Simulation, Humans, Research Design
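The fast correlation-based filter (FCBF), one of the two base methods named above, ranks features by their symmetrical uncertainty (SU) with the class and discards a feature when a stronger, already-kept feature is at least as correlated with it as the class is. The sketch below implements that filter for discrete features only; the boosting ensemble built around it in the paper is not reproduced, and the function names and toy data are assumptions.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def fcbf(columns: List[Sequence[Hashable]], labels: Sequence[Hashable],
         delta: float = 0.0) -> List[int]:
    """Indices of the features kept by the fast correlation-based filter."""
    su_class = [symmetrical_uncertainty(col, labels) for col in columns]
    # Relevant features only, strongest correlation with the class first.
    candidates = sorted((i for i, su in enumerate(su_class) if su > delta),
                        key=lambda i: -su_class[i])
    selected = []
    while candidates:
        p = candidates.pop(0)  # current predominant feature
        selected.append(p)
        # Drop features that p explains at least as well as the class does.
        candidates = [q for q in candidates
                      if symmetrical_uncertainty(columns[p], columns[q]) < su_class[q]]
    return selected


labels = [0, 0, 1, 1]
informative = [0, 0, 1, 1]
duplicate = [0, 0, 1, 1]  # redundant copy of `informative`
noise = [0, 1, 0, 1]      # independent of the class
print(fcbf([informative, duplicate, noise], labels))  # [0]
```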
4.
J Cheminform ; 12(1): 61, 2020 Oct 09.
Article in English | MEDLINE | ID: mdl-33372638

ABSTRACT

The maximum common property similarity (MCPhd) method is presented as a new descriptor-based approach to determining the similarity between two chemical compounds or molecular graphs. The method uses the concept of maximum common property, which derives from the concept of maximum common substructure, and is based on the electrotopological state index for atoms. A new algorithm that quantifies the similarity of chemical structures on the basis of this maximum common property concept is also developed. To verify the validity of the approach, the similarity of a sample of compounds with antimalarial activity is calculated and compared with the results of four other similarity methods: the small molecule subgraph detector (SMSD), a molecular fingerprint-based method (OBabel_FP2), ISIDA descriptors, and shape-feature similarity (SHAFTS). The results of the MCPhd method differ significantly from those of the compared methods, improving the quantification of similarity. A major advantage of the proposed method is that it helps relate the analogy or proximity between the physicochemical properties of the compared molecular fragments or subgraphs to the biological response or activity. In this new approach, more than one property can potentially be used. The method can be considered a hybrid procedure because it combines the descriptor and fragment approaches.

5.
J Comput Aided Mol Des ; 34(3): 305-325, 2020 03.
Article in English | MEDLINE | ID: mdl-31893338

ABSTRACT

In the construction of activity prediction models, feature ranking methods are a useful mechanism for ordering features by their significance for developing predictive models. This paper studies the influence of feature rankers on the construction of molecular activity prediction models; for this purpose, a comparative study of fourteen ranking methods for feature selection was conducted. The activity prediction models were built using four well-known classifiers and a wide collection of datasets. The ranking algorithms were compared in terms of the performance of these classifiers, measured with different metrics, and the consistency of the ranked features.


Subjects
Molecular Models, Software, Algorithms, Humans
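The abstract does not say which consistency measure was used. One common choice, shown here purely as an assumption, is the Jaccard overlap between the top-k features of two rankings, averaged over every pair of rankers; the rankings below are hypothetical.

```python
from itertools import combinations


def top_k_jaccard(ranking_a, ranking_b, k):
    """Jaccard index of the first k features of two rankings (best first)."""
    a, b = set(ranking_a[:k]), set(ranking_b[:k])
    return len(a & b) / len(a | b)


def mean_pairwise_consistency(rankings, k):
    """Average top-k Jaccard overlap over all pairs of rankings."""
    pairs = list(combinations(rankings, 2))
    return sum(top_k_jaccard(a, b, k) for a, b in pairs) / len(pairs)


# Three hypothetical rankers ordering four features (indices, best first).
rankings = [[0, 1, 2, 3], [1, 0, 3, 2], [0, 2, 1, 3]]
print(round(mean_pairwise_consistency(rankings, k=2), 3))  # 0.556
```

A value of 1.0 means every ranker agrees on the top-k set regardless of order within it; sensitivity to order would need a rank correlation instead.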
6.
J Chem Inf Model ; 59(10): 4120-4130, 2019 10 28.
Article in English | MEDLINE | ID: mdl-31514503

ABSTRACT

Predicting adverse drug reactions is highly challenging in the discovery of new medicines. When predicting the adverse reactions of chemical compounds, information about different targets is often available. Although each adverse drug reaction could be predicted separately, multilabel approaches have proven useful in many research areas because they exploit the relationships among the targets. However, when the prediction problem is approached from a multilabel point of view, the lack of information for some labels must be dealt with. This missing-label problem is a relevant issue in cheminformatics. This paper aims to predict the adverse drug reactions of commercial drugs using a multilabel approach that also takes into account the possible presence of missing labels. We propose the use of multilabel methods to predict a large set of 27 different adverse reaction targets. We also propose the use of multilabel methods specifically designed for the missing-label problem, to test their ability to solve this difficult problem. The results show the validity of the proposed approach, demonstrating superior performance of the multilabel method over the single-label approach for adverse drug reaction prediction.


Subjects
Computational Biology/methods, Algorithms, Computer Simulation, Drug Discovery, Drug-Related Side Effects and Adverse Reactions, Molecular Models, Quantitative Structure-Activity Relationship
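One simple way to keep missing labels from being treated as negatives, sketched here as an assumption rather than as any of the methods evaluated in the paper, is to store them explicitly (`None` below) and to skip them both when choosing the training rows for a label and when scoring predictions.

```python
from typing import List, Optional

# Label matrix: rows are compounds, columns are adverse-reaction targets.
# 1 = reaction observed, 0 = not observed, None = label missing.
LabelMatrix = List[List[Optional[int]]]


def observed_rows(Y: LabelMatrix, j: int) -> List[int]:
    """Indices of compounds whose label j is known (usable to train label j)."""
    return [i for i, row in enumerate(Y) if row[j] is not None]


def masked_hamming_accuracy(Y_true: LabelMatrix, Y_pred: List[List[int]]) -> float:
    """Fraction of correctly predicted labels, counting only observed entries."""
    correct = total = 0
    for true_row, pred_row in zip(Y_true, Y_pred):
        for t, p in zip(true_row, pred_row):
            if t is None:
                continue  # missing label: counted as neither right nor wrong
            total += 1
            correct += int(t == p)
    return correct / total if total else float("nan")


Y_true = [[1, None, 0], [0, 1, None]]
Y_pred = [[1, 1, 1], [0, 0, 0]]
print(observed_rows(Y_true, 1))                 # [1]
print(masked_hamming_accuracy(Y_true, Y_pred))  # 0.5
```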
7.
Neural Netw ; 118: 175-191, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31299623

ABSTRACT

Prototype selection is one of the most common preprocessing tasks in data mining applications. The vast amounts of data handled in practical problems make the removal of noisy, redundant, or useless instances a convenient first step for any real-world application. Many algorithms have been proposed for prototype selection. For difficult problems, however, a single method is unlikely to achieve the desired performance. As in classification, ensembles of prototype selectors have been proposed to overcome the limitations of single algorithms. In ensembles of prototype selectors, the usual combination method is a voting scheme coupled with an acceptance threshold. However, this method is suboptimal because the relationships among the prototypes are not taken into account. In this paper, we propose a different approach that considers not only the number of times each prototype has been selected but also the subsets of prototypes that are selected. With this additional information, we develop GEEBIES, a new way of combining the results of ensembles of prototype selectors. On a large set of problems, we show that our proposal outperforms the standard boosting approach. A way of scaling up our method to large datasets is also proposed and experimentally tested.


Subjects
Algorithms, Factual Databases, Proof of Concept Study, Data Mining/standards, Factual Databases/standards
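As an example of the kind of base prototype selector whose outputs such an ensemble combines, here is Wilson's edited nearest neighbor rule (ENN), which removes instances misclassified by the majority of their k nearest neighbors. ENN is our illustrative choice, since the abstract does not list the base selectors, and the GEEBIES combination step itself is not reproduced.

```python
import math
from collections import Counter


def wilson_enn(X, y, k=3):
    """Indices of instances kept by Wilson editing (Euclidean distance)."""
    keep = []
    for i, xi in enumerate(X):
        # Distances to every other instance, nearest first.
        dists = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)
        neighbour_labels = [y[j] for _, j in dists[:k]]
        majority, _ = Counter(neighbour_labels).most_common(1)[0]
        if majority == y[i]:
            keep.append(i)  # consistent with its neighbourhood
    return keep


# Two clusters plus one mislabeled point (index 6) inside cluster 0.
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (0.5, 0.5)]
y = [0, 0, 0, 1, 1, 1, 1]
print(wilson_enn(X, y, k=3))  # [0, 1, 2, 3, 4, 5]
```

In an ensemble, a selector like this would be run on different data samples or with different parameters, and each run's kept set would become one input to the combination step.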
8.
J Mol Graph Model ; 90: 235-242, 2019 07.
Article in English | MEDLINE | ID: mdl-31103916

ABSTRACT

In this work, we propose the application of a new strategy, the NWFE ensemble (nonparametric weighted feature extraction ensemble) method. Supervised subspace projections based on NWFE are incorporated into the construction of classifier ensembles to facilitate the correct classification of misclassified instances without harming the overall performance of the ensemble. The performance of the NWFE ensemble is investigated on a c-Jun N-terminal kinase-3 inhibitor benchmark dataset using different chemical compound representation models. The results show that, compared with the standard method, the applied method improves prediction performance with two classifiers, based on decision trees and support vector machines.


Subjects
Mitogen-Activated Protein Kinase 10/antagonists & inhibitors, Protein Kinase Inhibitors/chemistry, Protein Kinase Inhibitors/pharmacology, Algorithms, Decision Trees, Humans, Support Vector Machine
9.
J Comput Aided Mol Des ; 32(11): 1273-1294, 2018 11.
Article in English | MEDLINE | ID: mdl-30367310

ABSTRACT

Feature selection is commonly used as a preprocessing step in machine learning to improve learning performance, lower computational complexity, and facilitate model interpretation. This paper proposes applying boosting feature selection to improve the classification performance of standard feature selection algorithms, evaluated on the prediction of P-gp inhibitors and substrates. Two well-known classification algorithms, decision trees and support vector machines, were used to classify the chemical compounds. The experimental results showed better performance for boosting feature selection than for the standard feature selection algorithms, while the capability for feature reduction was maintained.


Subjects
ATP Binding Cassette Transporter Subfamily B/antagonists & inhibitors, ATP Binding Cassette Transporter Subfamily B/chemistry, Ligands, Machine Learning, Algorithms, Decision Trees, Molecular Structure, Protein Binding, Quantitative Structure-Activity Relationship, Support Vector Machine
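Boosting-based feature selection typically runs the base selector on successively reweighted training sets, so that later rounds concentrate on compounds that earlier rounds misclassified. The sketch below shows only that reweighting step, using the classic AdaBoost rule; how the paper's procedure uses the weights (resampling or weighted selection) and how it merges the selected subsets are not stated in the abstract and are not reproduced here.

```python
import math
from typing import List, Tuple


def adaboost_reweight(weights: List[float],
                      misclassified: List[bool]) -> Tuple[List[float], float]:
    """One AdaBoost round: return normalized new weights and the round's alpha.

    An alpha of 0.0 signals that boosting should stop (error of 0 or >= 0.5).
    """
    total = sum(weights)
    err = sum(w for w, m in zip(weights, misclassified) if m) / total
    if err == 0.0 or err >= 0.5:
        return [w / total for w in weights], 0.0
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Misclassified instances are up-weighted, correctly classified ones down-weighted.
    updated = [w * math.exp(alpha if m else -alpha) for w, m in zip(weights, misclassified)]
    z = sum(updated)
    return [w / z for w in updated], alpha


weights, alpha = adaboost_reweight([0.25] * 4, [True, False, False, False])
print([round(w, 3) for w in weights])  # [0.5, 0.167, 0.167, 0.167]
```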
10.
IEEE Trans Neural Netw Learn Syst ; 28(2): 470-475, 2017 02.
Article in English | MEDLINE | ID: mdl-26731778

ABSTRACT

The k-nearest neighbor (k-NN) classifier is one of the most widely used classification methods because of several attractive properties, including good generalization and easy implementation. Although simple, it is usually able to match and even outperform more sophisticated and complex methods. One of the problems with this approach is choosing an appropriate value of k. Although a good value can be obtained by cross-validation, the same value is unlikely to be optimal for the whole space spanned by the training set. Different regions of the feature space would require different values of k because of the different distributions of prototypes: a query instance in the center of a class is in a very different situation from a query instance near the boundary between two classes. In this brief, we present a simple yet powerful approach to setting a local value of k. We associate a potentially different k with every prototype and obtain the best value of k by optimizing a criterion that combines the local and global effects of the different k values in the neighborhood of the prototype. The proposed method has a fast training stage and the same complexity as the standard k-NN approach at the testing stage. The experiments show that this simple approach can significantly outperform the standard k-NN rule on a large set of both standard and class-imbalanced problems.
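A simplified sketch of the per-prototype k idea follows. Each training prototype gets the k (from a small candidate set) that maximizes leave-one-out accuracy over itself and its m nearest prototypes, and a query is classified with the k of its nearest prototype. This local-only criterion is our simplification, since the published criterion also weighs global effects, and the names `fit_local_k`, `k_values`, and `m` are hypothetical.

```python
import math
from collections import Counter


def knn_vote(X, y, query, k, exclude=None):
    """Majority label among the k training points closest to `query`."""
    order = sorted((math.dist(query, x), j) for j, x in enumerate(X) if j != exclude)
    labels = [y[j] for _, j in order[:k]]
    return Counter(labels).most_common(1)[0][0]


def fit_local_k(X, y, k_values=(1, 3, 5), m=5):
    """Pick, for every prototype, the k with the best leave-one-out accuracy nearby."""
    local_k = []
    for i, xi in enumerate(X):
        near = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)
        hood = [i] + [j for _, j in near[:m]]

        def loo_hits(k):
            return sum(knn_vote(X, y, X[h], k, exclude=h) == y[h] for h in hood)

        # max() keeps the first best candidate, i.e. the smallest k for ascending k_values.
        local_k.append(max(k_values, key=loo_hits))
    return local_k


def predict_local_k(X, y, local_k, query):
    """Classify `query` with the k attached to its nearest prototype."""
    nearest = min(range(len(X)), key=lambda j: math.dist(query, X[j]))
    return knn_vote(X, y, query, local_k[nearest])


X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
ks = fit_local_k(X, y)
print(predict_local_k(X, y, ks, (0.4, 0.6)))  # 0
```

Prediction needs one nearest-prototype search plus one ordinary k-NN search, so its asymptotic cost matches standard k-NN, in line with the abstract's claim for the testing stage.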

11.
Sensors (Basel) ; 16(11), 2016 Nov 23.
Article in English | MEDLINE | ID: mdl-27886087

ABSTRACT

The current social impact of new technologies has produced major changes in all areas of society, giving rise to the concept of a smart city supported by electronic infrastructure, telecommunications, and information technology. This paper reviews Bluetooth Low Energy (BLE), Near Field Communication (NFC), and Visible Light Communication (VLC), and their use and influence in different areas of smart city development. It also reviews Big Data solutions for managing information and extracting knowledge in an environment where things are connected by an Internet of Things (IoT) network. Finally, we show how these technologies can be combined to benefit the development of the smart city.

12.
J Comput Aided Mol Des ; 27(2): 185-201, 2013 Feb.
Article in English | MEDLINE | ID: mdl-23386139

ABSTRACT

In this paper, we propose a new method for generating 2D-QSAR models to predict the activity values of chemicals. Maximum common substructures extracted from the data set are used to classify molecules in a tree, where each node represents a molecule or a structure common to a group of molecules, and each arc represents the non-isomorphic substructure between two nodes. All paths between pairs of leaf nodes are used to build the system of equations that serves as the representational space for the QSAR model. The proposed model, which combines non-isomorphic structures, uses molecular descriptors to calculate path lengths, and classifies the data set by maximum common substructures, considerably improves QSAR model generation compared with the classical model based only on a set of molecular descriptors. Optimization algorithms based on genetic algorithms and differential evolution have also been used to improve and refine the resulting equations.


Subjects
Algorithms, Antibacterial Agents/chemistry, Factual Databases, Molecular Evolution, Theoretical Models, Mycobacterium tuberculosis/drug effects, Quantitative Structure-Activity Relationship, Antibacterial Agents/pharmacology, Carboxylic Acids/chemistry, Isoniazid/chemistry, Microbial Sensitivity Tests, Molecular Structure
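The abstract mentions refining the QSAR equations with genetic algorithms and differential evolution (DE). The sketch below is a standard DE/rand/1/bin optimizer applied, as an assumption, to fitting the coefficients of a linear equation by least squares; the descriptor values and activities are made-up numbers, and the paper's path-based equation system is not reproduced.

```python
import random
from typing import Callable, List, Sequence, Tuple


def differential_evolution(loss: Callable[[Sequence[float]], float], dim: int,
                           bounds: Tuple[float, float] = (-10.0, 10.0),
                           pop_size: int = 20, generations: int = 200,
                           F: float = 0.8, CR: float = 0.9,
                           seed: int = 0) -> Tuple[List[float], float]:
    """Minimize `loss` over a box with the DE/rand/1/bin scheme."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    scores = [loss(p) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct individuals, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated coordinate
            trial = []
            for d in range(dim):
                if rng.random() < CR or d == j_rand:
                    v = pop[a][d] + F * (pop[b][d] - pop[c][d])
                    trial.append(min(max(v, lo), hi))  # clip to the bounds
                else:
                    trial.append(pop[i][d])
            trial_score = loss(trial)
            if trial_score <= scores[i]:  # greedy one-to-one replacement
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


# Hypothetical descriptor values and activities generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]


def squared_error(w: Sequence[float]) -> float:
    return sum((w[0] * x + w[1] - t) ** 2 for x, t in zip(xs, ys))


coeffs, err = differential_evolution(squared_error, dim=2)
print([round(c, 3) for c in coeffs])  # expected to be close to [2.0, 1.0]
```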
13.
J Chem Inf Model ; 45(5): 1178-94, 2005.
Article in English | MEDLINE | ID: mdl-16180895

ABSTRACT

In this paper, we propose a new method, based on measurements of structural similarity, for clustering chemical databases. The proposed method allows dynamic adjustment of the size and number of the cells or clusters into which the database is classified. Classification is carried out using structural similarity measurements obtained from the matching of molecular graphs, and the process is open to different similarity indexes and different matching measurements. It consists of projecting the similarity measures obtained among the elements of the database onto a new similarity space. Because the dimension and characteristics of the projection space can be readjusted dynamically to suit the problem under study, and because the method is simple and computationally efficient, it is appropriate for medium and large databases. The clustering method improves the performance of screening processes in chemical databases, facilitating the retrieval of chemical compounds that share all or some of the substructures of a given pattern. The work used a database of 498 natural compounds with wide molecular diversity, extracted from the free SPECS and BIOSPECS B.V. database.
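The method is described as open to different similarity indexes. As a minimal illustration of threshold-based clustering with one such index, the sketch below uses the Tanimoto coefficient on sets of substructure keys and a simple leader algorithm. Both are swapped-in, widely used techniques rather than the paper's graph-matching similarity or its similarity-space projection, and the key sets are hypothetical.

```python
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) coefficient between two sets of substructure keys."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def leader_clustering(fingerprints: List[Set[int]], threshold: float) -> List[List[int]]:
    """Assign each molecule to the first cluster whose leader is similar enough."""
    leaders: List[int] = []
    clusters: List[List[int]] = []
    for i, fp in enumerate(fingerprints):
        for c, leader in enumerate(leaders):
            if tanimoto(fp, fingerprints[leader]) >= threshold:
                clusters[c].append(i)
                break
        else:
            # No leader is close enough: molecule i starts a new cluster.
            leaders.append(i)
            clusters.append([i])
    return clusters


fps = [{1, 2, 3}, {1, 2, 3, 4}, {7, 8}, {7, 8, 9}]
print(leader_clustering(fps, threshold=0.6))  # [[0, 1], [2, 3]]
```

The threshold plays the role of a size and number control: raising it tends to produce more, smaller clusters.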

14.
J Chem Inf Model ; 45(2): 231-8, 2005.
Article in English | MEDLINE | ID: mdl-15807483

ABSTRACT

In this paper, we present an algorithm for generating molecular graphs with a given value of the Wiener index. The large number of graphs for a given Wiener index value is reduced by applying a set of heuristics that take into account the structural characteristics of the molecules. Parameters such as the interval of Wiener index values, the diversity and occurrence of atoms and bonds, the size and number of cycles, and the presence of structural patterns guide the heuristics, so that molecular graphs are generated at a considerably lower computational cost. The modular design of the algorithm allows it to serve as a pattern for developing other algorithms based on different topological invariants, with uses in areas such as combinatorial databases and screening of chemical databases.
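The Wiener index that the generator targets is the sum of shortest-path distances d(u, v) over all unordered pairs of vertices of the (hydrogen-depleted) molecular graph. The sketch below computes it by breadth-first search; it is the quantity a generator would need to check for each candidate graph, not the paper's heuristic generation procedure, and it assumes a connected graph given as an adjacency list.

```python
from collections import deque
from typing import Dict, List


def wiener_index(adj: Dict[int, List[int]]) -> int:
    """Sum of shortest-path distances over all unordered vertex pairs (connected graph)."""
    total = 0
    for source in adj:
        # Breadth-first search gives shortest distances in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total // 2  # every pair was counted once from each end


butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
print(wiener_index(butane), wiener_index(isobutane))  # 10 9
```

The two C4 skeletons show why the index is useful for discriminating isomers: branching shortens paths and lowers W.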

15.
J Chem Inf Comput Sci ; 44(4): 1383-93, 2004.
Article in English | MEDLINE | ID: mdl-15272846

ABSTRACT

The large size of chemical databases and the high computational cost of atom-by-atom comparison of molecular structures when computing the similarity between two chemical compounds call for new clustering models that reduce the time needed to retrieve, from a database, the set of molecules that fall within a range of similarity to a given pattern molecule. In this paper, we use information about the cycles in molecular structures as an approach to classifying chemical databases. The proposed clustering method represents the topological structure of each molecule stored in the database by its corresponding cycle graph. This method behaves more appropriately than other methods described in the literature that also use information on the cyclicity of molecules.

16.
J Chem Inf Comput Sci ; 44(2): 447-61, 2004.
Article in English | MEDLINE | ID: mdl-15032524

ABSTRACT

In this paper, a new representation model based on the cycles in the topological structure of molecules is proposed. Once all cycles of a molecule have been extracted, its topological structure can be represented by a weighted, colored, undirected graph called a "cycle graph", in which the nodes represent the cycles of the molecule and the edges represent the atoms shared by those cycles. The paper also shows how the cycle graph can be used to extract topological descriptors that provide appropriate measures of the complexity, cyclicity, and symmetry of cyclic systems.
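Given the cycles of a molecule as sets of atom indices, the cycle graph can be sketched as follows: one node per cycle, and an edge between two cycles weighted by the number of atoms they share. Node colors (cycle types) are omitted, the cycles are assumed to have been extracted already, and the function name is hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


def cycle_graph(cycles: List[FrozenSet[int]]) -> Dict[Tuple[int, int], int]:
    """Edges (i, j) between cycles that share atoms, weighted by the number shared."""
    edges = {}
    for i, j in combinations(range(len(cycles)), 2):
        shared = len(cycles[i] & cycles[j])
        if shared:
            edges[(i, j)] = shared
    return edges


# Naphthalene, using only its two fused six-membered rings (shared atoms 4 and 5).
naphthalene = [frozenset({0, 1, 2, 3, 4, 5}), frozenset({4, 5, 6, 7, 8, 9})]
print(cycle_graph(naphthalene))  # {(0, 1): 2}
```

Descriptors of the cyclic system, such as the number of fused ring pairs or the total shared-atom weight, can then be read off this much smaller graph instead of the full molecular graph.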

17.
J Chem Inf Comput Sci ; 44(1): 30-41, 2004.
Article in English | MEDLINE | ID: mdl-14741008

ABSTRACT

In this paper, we propose a new algorithm for subgraph isomorphism based on representing molecular structures as colored graphs and representing these graphs as vectors in n-dimensional spaces. The process, which obtains all maximum common substructures, is based on solving a constraint satisfaction problem defined as the common m-dimensional space (m ≤ n) in which the vectors representing the matched graphs can be defined.
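Subgraph matching between colored graphs can be posed as a constraint satisfaction problem: each pattern atom is a variable whose domain is the set of target atoms, with constraints that colors match, that no target atom is used twice, and that pattern bonds map onto target bonds. The backtracking sketch below solves that generic formulation (non-induced matching); it does not use the paper's vector-space representation, and the graph encoding is an assumption.

```python
from typing import Dict, Iterator, List, Set, Tuple

# A colored graph: atom colors (element symbols) and an adjacency list.
Graph = Tuple[List[str], Dict[int, Set[int]]]


def subgraph_matches(pattern: Graph, target: Graph) -> Iterator[Dict[int, int]]:
    """Yield every mapping of pattern atoms onto target atoms preserving colors and bonds."""
    p_col, p_adj = pattern
    t_col, t_adj = target

    def consistent(pv: int, tv: int, mapping: Dict[int, int]) -> bool:
        if p_col[pv] != t_col[tv] or tv in mapping.values():
            return False
        # Every bond to an already-mapped pattern atom must exist in the target.
        return all(tu in t_adj[tv] for pu, tu in mapping.items() if pu in p_adj[pv])

    def extend(mapping: Dict[int, int]) -> Iterator[Dict[int, int]]:
        if len(mapping) == len(p_col):
            yield dict(mapping)
            return
        pv = len(mapping)  # pattern atoms are assigned in index order
        for tv in range(len(t_col)):
            if consistent(pv, tv, mapping):
                mapping[pv] = tv
                yield from extend(mapping)
                del mapping[pv]  # backtrack

    yield from extend({})


carbon_oxygen = (["C", "O"], {0: {1}, 1: {0}})
ethanol = (["C", "C", "O"], {0: {1}, 1: {0, 2}, 2: {1}})
print(list(subgraph_matches(carbon_oxygen, ethanol)))  # [{0: 1, 1: 2}]
```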

18.
J Chem Inf Comput Sci ; 43(5): 1378-89, 2003.
Article in English | MEDLINE | ID: mdl-14502470

ABSTRACT

Over the past decade, computer-assisted learning in chemistry has given rise to a large number of systems that approach this objective from different viewpoints: static courses aimed at specific concepts, tutorial systems, 2D and 3D virtual environments, and so on. Correctly structuring and representing the knowledge to be taught, building a suitable student interface, and adapting the learning process to the student's knowledge are but a few of the challenges in developing efficient computer-assisted learning systems. The present study addresses the use of computerized dialogue systems as a viable alternative for simulating teacher-student interaction and proposes an ontology for characterizing such interaction, using the object-oriented paradigm to model both the knowledge to be taught and the student's current level. The proposed solution represents the knowledge to be taught as a network of multiconnected knowledge frames (chunks), where each chunk may be specialized into more specific frames (prerequisites and subobjectives), contain associated explanations of varying complexity and with a range of explanatory models, and be associated with one or more questions the student might ask; to this end, a constantly evolving knowledge model is maintained throughout the explanatory process. Based on the proposed model, an explanatory system was developed and managed in Java; it could be used in any type of teaching system based on student-dominated dialogues. Here, the system has been applied to teaching chemistry laboratory practice through its integration into the Virtual Chemistry Laboratory (VCL), a system designed by the authors to simulate chemistry techniques in a virtual 3D world.
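The chunk network can be pictured as a graph of knowledge frames linked to their prerequisites. The Python sketch below (the original system was written in Java) shows one hypothetical way to derive an explanation order: visit the prerequisites of the requested chunk depth-first and skip whatever the student model already marks as known. The class, its fields, and the example chunks are illustrative, not the authors' design.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Chunk:
    """A knowledge frame: one concept, its explanation, and the chunks it depends on."""
    name: str
    explanation: str
    prerequisites: List["Chunk"] = field(default_factory=list)


def explanation_plan(target: Chunk, known: Set[str]) -> List[str]:
    """Chunks to explain, prerequisites first, skipping those the student already knows."""
    plan: List[str] = []
    visited: Set[str] = set()

    def visit(chunk: Chunk) -> None:
        if chunk.name in visited or chunk.name in known:
            return
        visited.add(chunk.name)
        for pre in chunk.prerequisites:
            visit(pre)
        plan.append(chunk.name)  # post-order: after all of its prerequisites

    visit(target)
    return plan


mass = Chunk("mass", "Amount of matter in a sample.")
volume = Chunk("volume", "Space occupied by a sample.")
density = Chunk("density", "Mass per unit volume.", [mass, volume])
print(explanation_plan(density, known={"mass"}))  # ['volume', 'density']
```

Updating `known` as the dialogue proceeds is a crude stand-in for the constantly evolving student knowledge model described above.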

19.
J Chem Inf Comput Sci ; 42(6): 1415-24, 2002.
Article in English | MEDLINE | ID: mdl-12444739

ABSTRACT

The structural characteristics of a molecule, namely size, bond type and number, cycles, shape, and functional groups, will largely determine its physicochemical properties and biological activity. Extraction of data such as the complete structural information of a molecule's ring system is complex (NP-complete) and has traditionally involved high computational cost. The present study proposes a new operator for the extraction of cycles from a graph. Based on an initial cycle set, the operator employs a reduced number of operations in an iterative process of error-free cycle extraction, hence greatly reducing computational cost. Algorithm efficiency has been enhanced by designing new data structures suited to cycle storage, useful not only for interactive solutions but also for applications managing large volumes of information such as descriptor calculation, QSPR/QSAR, matching, clustering, screening, and filtering. Validation was performed by applying the algorithm to a test suite of chemical compounds of varying complexity.
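A common way to obtain an initial cycle set is a fundamental cycle basis: build a spanning tree and let each non-tree edge close one cycle. Further cycles can then be produced by the ring sum (symmetric difference of edge sets) of basis cycles, which yields a new cycle or a union of edge-disjoint cycles. The sketch below shows both steps on naphthalene; it is a generic textbook construction, not the operator proposed in the paper, and it assumes a connected graph.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Tuple

Edge = Tuple[int, int]


def norm(u: int, v: int) -> Edge:
    """Store undirected edges with the smaller atom index first."""
    return (u, v) if u < v else (v, u)


def fundamental_cycles(adj: Dict[int, List[int]]) -> List[FrozenSet[Edge]]:
    """Edge sets of the fundamental cycles of a BFS spanning tree."""
    root = next(iter(adj))
    parent = {root: None}
    depth = {root: 0}
    tree = set()
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v], depth[v] = u, depth[u] + 1
                tree.add(norm(u, v))
                queue.append(v)
    non_tree = {norm(u, v) for u in adj for v in adj[u]} - tree
    cycles = []
    for u, v in sorted(non_tree):
        # Climb from both endpoints toward their common ancestor, collecting tree edges.
        edges = {(u, v)}
        a, b = u, v
        while a != b:
            if depth[a] < depth[b]:
                a, b = b, a
            edges.add(norm(a, parent[a]))
            a = parent[a]
        cycles.append(frozenset(edges))
    return cycles


def ring_sum(c1: FrozenSet[Edge], c2: FrozenSet[Edge]) -> FrozenSet[Edge]:
    """Symmetric difference of two cycles' edge sets."""
    return c1 ^ c2


# Naphthalene skeleton: rings 0-1-2-3-4-5 and 4-6-7-8-9-5 fused at bond 4-5.
adj = {0: [1, 5], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5, 6],
       5: [4, 0, 9], 6: [4, 7], 7: [6, 8], 8: [7, 9], 9: [8, 5]}
basis = fundamental_cycles(adj)
print(sorted(len(c) for c in basis), len(ring_sum(basis[0], basis[1])))  # [6, 6] 10
```

The ring sum of the two six-membered rings drops the shared bond and recovers the ten-membered perimeter, the third cycle of naphthalene.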
