1.
J Chem Inf Model ; 64(4): 1107-1111, 2024 Feb 26.
Article in English | MEDLINE | ID: mdl-38346241

ABSTRACT

There has been a growing recognition of the need for diversity and inclusion in scientific fields. This trend is reflected in the Journal of Chemical Information and Modeling (JCIM), where there has been a gradual increase in the number of papers that embrace this diversity. In this viewpoint, we analyze the evolution of the profile of papers published in JCIM from 1996 to 2022 addressing three diversity criteria, namely interdisciplinarity, geographic and gender distributions, and their impact on citation patterns. We used natural language processing tools for the classification of main areas and gender, as well as metadata, to analyze a total of 7384 articles published in the categories of research articles, reviews, and brief reports. Our analyses reveal that the relative number of articles and citation patterns are similar across the main areas within the scope of JCIM, and international collaboration and publications encompassing two to three research areas attract more citations. The percentage of female authors has increased from 1996 (less than 20%) to 2022 (more than 32%), indicating a positive trend toward gender diversity in almost all geographic regions, although the percentage of publications by single female authors remains lower than 20%. Most JCIM citations come from Europe and the Americas, with a tendency for JCIM papers to cite articles from the same continent. Furthermore, there is a correlation between the gender of the authors, as JCIM manuscripts authored by females are more likely to be cited by other JCIM manuscripts authored by females.


Subject(s)
Models, Chemical , Natural Language Processing , Female , Humans
2.
J Chem Inf Model ; 62(22): 5503-5512, 2022 Nov 28.
Article in English | MEDLINE | ID: mdl-36302503

ABSTRACT

Nanoclusters are remarkably promising for the capture and activation of small molecules for fuel production or as precursors for other chemicals of high commercial value. Since this process occurs under a wide variety of experimental conditions, an improved atomistic understanding of the stability and phase transitions of these systems will be key to the development of successful technological applications. In this work, we proposed a theoretical framework to explore the potential energy surface and configuration space of nanoclusters to map the most important morphologies presented by those systems and the phase transitions between them. A fully automated process was developed, which combines global optimization techniques, classical molecular dynamics, and unsupervised machine learning algorithms. To showcase the capabilities of the approach, we explored the example of copper nanoclusters (Cu_n), where n = 13, 38, 55, 75, 98, 102, and 147. We not only reported a graphical potential energy surface for each size, but also explored the topology of the configuration space via structural and thermodynamic analyses. The effect of size on the potential energy surface and the critical temperature for solid-liquid phase transitions were also reported, highlighting the impact of magic numbers on those quantities.

3.
J Chem Inf Model ; 62(19): 4702-4712, 2022 Oct 10.
Article in English | MEDLINE | ID: mdl-36122418

ABSTRACT

Ionic liquids have attracted the attention of researchers as possible electrolytes for electrochemical energy storage devices. However, their properties, such as the electrochemical stability window (ESW), ionic conductivity, and diffusivity, are influenced both by the chemical structures of cations and anions and by their combinations. Most studies in the literature focus on the understanding of common ionic liquids, and little effort has been made to find ways to improve our atomistic understanding of those systems. The goal of this paper is to explore the structural characteristics of cations and anions that form ionic liquids that can expand the HOMO/LUMO gap, a property directly linked to the ESW of the electrolyte. For that, we design a framework for randomly generating new ions by combining their fragments. Within this framework, we generate about 10^4 cations and 10^4 anions and fully optimize their structures using density functional theory. Our calculations show that aromatic cations are less stable ionic liquids than aliphatic ones, an expected result if chemical rationale is used. More importantly, we can improve the gap by adding electron-donating and electron-withdrawing functional groups to the cations and anions, respectively. The increase can be about 2 V, depending on the case. This improvement is reflected in a wider ESW.

4.
J Chem Inf Model ; 62(17): 3948-3960, 2022 09 12.
Article in English | MEDLINE | ID: mdl-36044610

ABSTRACT

Machine learning as a tool for chemical space exploration broadens horizons to work with known and unknown molecules. At its core lies molecular representation, an essential key to improve learning about structure-property relationships. Recently, contrastive frameworks have been showing impressive results for representation learning in diverse domains. Therefore, this paper proposes a contrastive framework that embraces multimodal molecular data. Specifically, our approach jointly trains a graph encoder and an encoder for the simplified molecular-input line-entry system (SMILES) string to perform the contrastive learning objective. Since SMILES is the basis of our method, i.e., we built the molecular graph from the SMILES, we call our framework SMILES Contrastive Learning (SMICLR). When stacking a nonlinear regressor on SMICLR's pretrained encoder and fine-tuning the entire model, we reduced the prediction error by, on average, 44% and 25% for the energetic and electronic properties of the QM9 data set, respectively, over the supervised baseline. We further improved our framework's performance when applying data augmentations in each molecular-input representation. Moreover, SMICLR demonstrated competitive representation learning results in an unsupervised setting.


Subject(s)
Machine Learning
5.
J Chem Inf Model ; 62(4): 817-828, 2022 02 28.
Article in English | MEDLINE | ID: mdl-35174705

ABSTRACT

Some of the most common applications of machine learning (ML) algorithms dealing with small molecules usually fall within two distinct domains, namely, the prediction of molecular properties and the design of novel molecules with some desirable property. Here we unite these applications under a single molecular representation and ML algorithm by modifying the grammar variational autoencoder (GVAE) model with the incorporation of property information into its training procedure, thus creating a supervised GVAE (SGVAE). Results indicate that the biased latent space generated by this approach can successfully be used to predict the molecular properties of the input molecules, produce novel and unique molecules with some desired property, and also estimate the properties of randomly sampled molecules. We illustrate these possibilities by sampling novel molecules from the latent space with specific values of the lowest unoccupied molecular orbital (LUMO) energy after training the model using the QM9 data set. Furthermore, the trained model is also used to predict the properties of a hold-out set, and the resulting mean absolute error (MAE) shows values close to chemical accuracy for the dipole moment and atomization energies, even outperforming ML models designed exclusively to predict molecular properties using SMILES as the molecular representation. Therefore, these results show that the proposed approach is a viable way to provide generative ML models with molecular property information in a way that the generation of novel molecules is likely to achieve better results, with the benefit that these new molecules can also have their molecular properties accurately predicted.


Subject(s)
Algorithms , Machine Learning , Physical Phenomena
6.
J Chem Inf Model ; 61(9): 4210-4223, 2021 09 27.
Article in English | MEDLINE | ID: mdl-34387994

ABSTRACT

Most machine learning applications in quantum-chemistry (QC) data sets rely on a single statistical error parameter such as the mean square error (MSE) to evaluate their performance. However, this approach has limitations or can even yield incorrect interpretations. Here, we report a systematic investigation of the two components of the MSE, i.e., the bias and variance, using the QM9 data set. To this end, we experiment with three descriptors, namely (i) symmetry functions (SF, with two-body and three-body functions), (ii) many-body tensor representation (MBTR, with two- and three-body terms), and (iii) smooth overlap of atomic positions (SOAP), to evaluate the prediction process's performance using different numbers of molecules in training samples and the effect of bias and variance on the final MSE. Overall, low sample sizes are related to higher MSE. Moreover, the bias component strongly influences the larger MSEs. Furthermore, there is little agreement among molecules with higher errors (outliers) across different descriptors. However, there is a high prevalence among the outliers intersection set and the convex hull volume of geometric coordinates (VCH). According to the obtained results with the distribution of MSE (and its components bias and variance) and the appearance of outliers, it is suggested to use ensembles of models with a low bias to minimize the MSE, more specifically when using a small number of molecules in the training set.


Subject(s)
Algorithms , Machine Learning , Bias
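The decomposition of the MSE into bias and variance described in the abstract above can be illustrated with a minimal sketch. It assumes noise-free reference values and a hypothetical input layout (several predictions of the same target, each from a model trained on a different training sample); the function name `bias_variance` is illustrative and is not taken from the paper.

```python
from statistics import mean


def bias_variance(predictions, target):
    """Decompose the MSE of repeated predictions of one target value.

    predictions: predictions of the same quantity from models trained on
    different training samples (hypothetical layout, not from the paper).
    Returns (mse, bias_squared, variance), where mse = bias_squared + variance.
    """
    avg = mean(predictions)
    # Squared bias: how far the average prediction is from the reference value.
    bias_squared = (avg - target) ** 2
    # Variance: spread of the individual predictions around their average.
    variance = mean((p - avg) ** 2 for p in predictions)
    # MSE computed directly, to compare against the sum of the two components.
    mse = mean((p - target) ** 2 for p in predictions)
    return mse, bias_squared, variance
```

Under these assumptions, a large squared-bias term with a small variance term points to a systematic error shared by all models, which averaging over an ensemble does not remove.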
7.
Soc Netw Anal Min ; 11(1): 47, 2021.
Article in English | MEDLINE | ID: mdl-34025818

ABSTRACT

Fact-checking verifies a multitude of claims and remains a promising solution to fight fake news. The spread of rumors, hoaxes, and conspiracy theories online is evident in times of crisis, when fake news ramps up across platforms, increasing fear and confusion among the population, as seen in the COVID-19 pandemic. This article explores fact-checking initiatives in Latin America, using an original Markov-based computational method to cluster topics on tweets and identify their diffusion between different datasets. Drawing on a mixture of quantitative and qualitative methods, including time-series analysis, network analysis and in-depth close reading, our article proposes an in-depth tracing of COVID-related false information across the region, assessing whether there is a pattern of behavior across the countries. We rely on the open Twitter application programming interface connection to gather data from public accounts of the six major fact-checking agencies in Latin America, namely Argentina (Chequeado), Brazil (Agência Lupa), Chile (Mala Espina Check), Colombia (Colombia Check from Consejo de Redacción), Mexico (El Sabueso from Animal Político) and Venezuela (Efecto Cocuyo). In total, these profiles account for 102,379 tweets that were collected between January and July 2020. Our study offers insights into the dynamics of online information dissemination beyond the national level and demonstrates how politics intertwine with the health crisis in this period. Our method is capable of clustering topics in a period of overabundance of information, as we fight not only a pandemic but also an infodemic, highlighting opportunities to understand and slow the spread of false information.

8.
J Chem Inf Model ; 61(5): 2294-2301, 2021 05 24.
Article in English | MEDLINE | ID: mdl-33939914

ABSTRACT

Our atomistic understanding of the physical-chemical parameters that drive the changes in the relative stability of clusters induced by adsorbed molecules is far from satisfactory. In this work, we employed density functional theory calculations to address this problem using CO adsorption on 13-atom transition-metal clusters, TM13, namely, nCO/TM13, where TM = Ru, Rh, Pd, and Ag, and n = 1-6. Unexpectedly, changes in the relative stability take place for all systems at a lower coverage, namely, at n = 3 (Ru13), 4 (Rh13, Ag13), and 2 (Pd13). To address the effects that lead to changes in the stability, we proposed an energy decomposition scheme for the binding energy of the nCO/TM13 systems, which shows that the change in relative stability is dominated by the interaction energy and cluster distortion energy upon adsorption, where the interaction energy is higher for high-energy unprotected clusters. Furthermore, we characterized all adsorption parameters, which helps to complement our atomistic understanding.


Subject(s)
Quantum Theory , Adsorption
9.
PLoS One ; 16(3): e0248126, 2021.
Article in English | MEDLINE | ID: mdl-33690694

ABSTRACT

Topological analysis and community detection in mobility complex networks have an essential role in many contexts, from economics to the environmental agenda. However, in many cases, the dynamic component of mobility data is not considered directly. In this paper, we study how topological indexes and community structure change over a business day. For the analyses, we use a mobility database with a high temporal resolution. Our case study is the city of São José dos Campos (Brazil), which is divided into 55 traffic zones. More than 20 thousand people were asked about their travels the day before the survey (Origin-Destination Survey). We generated a set of graphs, where each vertex represents a traffic zone, and the edges are weighted by the number of trips between them, restricted to a time window. We calculated topological properties, such as degree, clustering coefficient and diameter, and the network's community structure. The results show spatially concise community structures related to geographical factors such as highways and the persistence of some communities for different timestamps. These analyses may support the definition and adjustment of public policies to improve urban mobility. For instance, the community structure of the network might be useful for defining inter-zone public transportation.


Subject(s)
Population Dynamics/statistics & numerical data , Transportation/statistics & numerical data , Urban Population/trends , Algorithms , Brazil , Cities , Cluster Analysis , Data Management , Humans , Models, Theoretical , Population Dynamics/trends
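The graph construction described in the abstract above (traffic zones as vertices, edges weighted by the number of trips within a time window) can be sketched as follows. The record layout `(origin_zone, destination_zone, departure_hour)`, the undirected edges, and the exclusion of trips within a single zone are assumptions made for illustration; the abstract does not specify them.

```python
from collections import Counter


def trips_to_graph(trips, start, end):
    """Build a weighted graph of traffic zones for one time window.

    trips: iterable of (origin_zone, destination_zone, departure_hour)
    tuples (hypothetical record layout).
    Returns {(zone_a, zone_b): number_of_trips} for start <= hour < end.
    """
    weights = Counter()
    for origin, destination, hour in trips:
        # Keep only trips that depart inside the window and cross zones.
        if start <= hour < end and origin != destination:
            # Assumption: edges are undirected, so (1, 2) and (2, 1) merge.
            edge = tuple(sorted((origin, destination)))
            weights[edge] += 1
    return dict(weights)
```

Calling the function once per window (for example 7 to 9, then 9 to 11) yields the sequence of graphs on which topological indexes and communities could then be compared.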
10.
J Chem Inf Model ; 61(3): 1125-1135, 2021 03 22.
Article in English | MEDLINE | ID: mdl-33685128

ABSTRACT

The amount of quantum chemistry (QC) data is increasing year by year due to the continuous increase of computational power and development of new algorithms. However, in most cases, our atom-level knowledge of molecular systems has been obtained by manual data analyses based on selected descriptors. In this work, we introduce a data mining framework to accelerate the extraction of insights from QC datasets, which starts with a featurization process that converts atomic features into molecular properties (AtoMF). Then, it employs correlation coefficients (Pearson, Spearman, and Kendall) to investigate the relationship between the AtoMF features and a target property. We applied our framework to investigate three nanocluster systems, namely, Pt_nTM_{55-n}, Ce_nZr_{15-n}O_30, and (CH_n + mH)/TM_13. We found several interesting and consistent insights using Spearman and Kendall correlation coefficients, indicating that they are suitable for our approach; however, our results indicate that the Pearson coefficient is very sensitive to outliers and should not be used. Moreover, we highlight problems that can occur during this analysis and discuss how to handle them. Finally, we make available a new Python package that implements the proposed QC data mining framework, which can be used as is or modified to include new features.


Subject(s)
Algorithms , Data Mining , Research Design
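The sensitivity of the Pearson coefficient to outliers noted in the abstract above can be seen in a small sketch comparing it with the Spearman rank coefficient. This is not the Python package mentioned in the abstract; the functions `pearson`, `ranks`, and `spearman` are illustrative, and the ranking assumes no tied values for brevity.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """Rank values from 1 to n (assumes no ties, for brevity)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = float(rank)
    return result


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# A monotonic relationship with one extreme value in the target.
feature = [1.0, 2.0, 3.0, 4.0, 5.0]
target = [1.0, 2.0, 3.0, 4.0, 100.0]
r_pearson = pearson(feature, target)    # pulled well below 1 by the outlier
r_spearman = spearman(feature, target)  # stays at 1: the ordering is intact
```

Because Spearman only uses the ordering of the values, a single extreme data point leaves it unchanged here, while the Pearson value drops noticeably.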
11.
J Phys Chem A ; 124(47): 9854-9866, 2020 Nov 25.
Article in English | MEDLINE | ID: mdl-33174750

ABSTRACT

Machine learning (ML) models can potentially accelerate the discovery of tailored materials by learning a function that maps chemical compounds into their respective target properties. In this realm, a crucial step is encoding the molecular systems into the ML model, in which the molecular representation plays a crucial role. Most of the representations are based on the use of atomic coordinates (structure); however, this can increase the computational cost of ML training and predictions. Herein, we investigate the impact of choosing coordinate-free descriptors based on the Simplified Molecular Input Line Entry System (SMILES) representation, which can substantially reduce the computational cost of ML predictions. Therefore, we evaluate a feed-forward neural network (FNN) model's prediction performance over five feature selection methods and nine ground-state properties (including energetic, electronic, and thermodynamic properties) from a public data set composed of ∼130k organic molecules. Our best results reached a mean absolute error, close to chemical accuracy, of ∼0.05 eV for the atomization energies (internal energy at 0 K, internal energy at 298.15 K, enthalpy at 298.15 K, and free energy at 298.15 K). Moreover, for the atomization energies, the results obtained an out-of-sample error nine times lower than that of the same FNN model trained with the Coulomb matrix, a traditional coordinate-based descriptor. Furthermore, our results showed how limited the model's accuracy is when employing such a low-computational-cost representation, which carries less information about the molecular structure than most state-of-the-art methods.

12.
Sci Rep ; 10(1): 13303, 2020 08 06.
Article in English | MEDLINE | ID: mdl-32764598

ABSTRACT

All cellular processes can be ultimately understood in terms of respective fundamental biochemical interactions between molecules, which can be modeled as networks. Very often, these molecules are shared by more than one process, therefore interconnecting them. Despite this effect, cellular processes are usually described by separate networks with heterogeneous levels of detail, such as metabolic, protein-protein interaction, and transcription regulation networks. Aiming at obtaining a unified representation of cellular processes, we describe in this work an integrative framework that draws concepts from rule-based modeling. In order to probe the capabilities of the framework, we used an organism-specific database and genomic information to model the whole-cell biochemical network of the Mycoplasma genitalium organism. This modeling accounted for 15 cellular processes and resulted in a single component network, indicating that all processes are somehow interconnected. The topological analysis of the network showed structural consistency with biological networks in the literature. In order to validate the network, we estimated gene essentiality by simulating gene deletions and compared the results with experimental data available in the literature. We could classify 212 genes as essential, 95% of which are consistent with experimental results. Although we adopted a relatively simple organism as a case study, we suggest that the presented framework has the potential for paving the way to more integrated studies of whole organisms leading to a systemic analysis of cells on a broader scale. The modeling of other organisms using this framework could provide useful large-scale models for different fields of research such as bioengineering, network biology, and synthetic biology, and also provide novel tools for medical and industrial applications.


Subject(s)
Models, Biological , Mycoplasma genitalium/cytology , Mycoplasma genitalium/metabolism , Chromosomes, Bacterial/metabolism , Genes, Bacterial/genetics
13.
Nat Commun ; 11(1): 4036, 2020 08 12.
Article in English | MEDLINE | ID: mdl-32788573

ABSTRACT

The number of spatiotemporal data sets has increased rapidly in the last years, which demands robust and fast methods to extract information from this kind of data. Here, we propose a network-based model, called Chronnet, for spatiotemporal data analysis. The network construction process consists of dividing a geometric space into grid cells represented by nodes connected chronologically. Strong links in the network represent consecutive recurrent events between cells. The chronnet construction process is fast, making the model suitable to process large data sets. Using artificial and real data sets, we show how chronnets can capture data properties beyond simple statistics, like frequent patterns, spatial changes, outliers, and spatiotemporal clusters. Therefore, we conclude that chronnets represent a robust tool for the analysis of spatiotemporal data sets.
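
The construction outlined in the abstract above (grid cells as nodes, connected chronologically, with link strength counting recurrent consecutive events) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the square grid, the event layout `(x, y, t)`, the directed edges, and the rule of linking every cell active at one time stamp to every cell active at the next observed time stamp are all choices made here, and `build_chronnet` is a hypothetical name.

```python
from collections import Counter


def build_chronnet(events, cell_size):
    """Build a chronological network from spatiotemporal events.

    events: iterable of (x, y, t) tuples (hypothetical layout).
    Returns {(cell_a, cell_b): weight}, where the weight counts how often an
    event in cell_a was followed, at the next time stamp, by one in cell_b.
    """
    # Map each time stamp to the set of grid cells that had an event.
    cells_by_time = {}
    for x, y, t in events:
        cell = (int(x // cell_size), int(y // cell_size))
        cells_by_time.setdefault(t, set()).add(cell)

    # Link cells active at consecutive time stamps (assumed directed).
    weights = Counter()
    times = sorted(cells_by_time)
    for t_prev, t_next in zip(times, times[1:]):
        for a in cells_by_time[t_prev]:
            for b in cells_by_time[t_next]:
                weights[(a, b)] += 1
    return dict(weights)
```

In this sketch the events are read once and the time stamps are sorted once, which is in keeping with the abstract's remark that the construction is fast enough for large data sets.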

14.
J Chem Inf Model ; 60(2): 537-545, 2020 02 24.
Article in English | MEDLINE | ID: mdl-31917570

ABSTRACT

In this work, we report an ab initio investigation based on density functional theory calculations with van der Waals D3 corrections to investigate the adsorption properties and activation of CO2 on transition-metal (TM) 13-atom clusters (TM = Ru, Rh, Pd, Ag), which is a key step for the development of subnano catalysts for the conversion of CO2 to high-value products. From our analyses, which include calculations of several properties and the Spearman correlation analysis, we found that CO2 adopts two distinct structures on the selected TM13 clusters, namely, a bent CO2 configuration in which the OCO angle is about 125 to 150° (chemisorption), which is the lowest energy CO2/TM13 configuration for TM = Ru, Rh, Pd. As in the gas phase, the linear CO2 structure yields the lowest energy for CO2/Ag13 and several higher energy configurations for TM = Ru, Rh, Pd. The bent CO2 (activated) is driven by a chemisorption CO2-TM13 interaction due to the charge transfer from the TM13 clusters toward CO2, while a weak physisorption interaction is obtained for the linear CO2 on the TM13 clusters. Thus, the CO2 activation occurs only in the first case and it is driven by charge transfer from the TM13 clusters to the CO2 molecule (i.e., CO2^δ−), which is confirmed by our Bader charge analysis and vibrational frequencies.


Subject(s)
Carbon Dioxide/chemistry , Metals, Heavy/chemistry , Quantum Theory , Adsorption , Models, Molecular , Molecular Conformation , Vibration
15.
Sci Rep ; 6: 25570, 2016 05 09.
Article in English | MEDLINE | ID: mdl-27158092

ABSTRACT

A prominent feature of complex networks is the appearance of communities, also known as modular structures. Specifically, communities are groups of nodes that are densely connected among each other but connect sparsely with others. However, detecting communities in networks is so far a major challenge, in particular, when networks evolve in time. Here, we propose a change in the community detection approach. It relies on defining an intrinsic dynamic for the nodes of the network as interacting particles (based on diffusive equations of motion and on the topological properties of the network) that results in a fast convergence of the particle system into clustered patterns. The resulting patterns correspond to the communities of the network. Since our detection of communities is constructed from a dynamical process, it is able to analyse time-varying networks straightforwardly. Moreover, for static networks, our numerical experiments show that our approach achieves results similar to those of the methodologies currently recognized as the most efficient ones. Also, since our approach defines an N-body problem, it allows for efficient numerical implementations using parallel computations that increase its speed performance.

16.
Neural Netw ; 24(1): 54-64, 2011 Jan.
Article in English | MEDLINE | ID: mdl-20884173

ABSTRACT

Attention is a critical mechanism for visual scene analysis. By means of attention, it is possible to break down the analysis of a complex scene to the analysis of its parts through a selection process. Empirical studies demonstrate that attentional selection is conducted on visual objects as a whole. We present a neurocomputational model of object-based selection in the framework of oscillatory correlation. By segmenting an input scene and integrating the segments with their conspicuity obtained from a saliency map, the model selects salient objects rather than salient locations. The proposed system is composed of three modules: a saliency map providing saliency values of image locations, image segmentation for breaking the input scene into a set of objects, and object selection which allows one of the objects of the scene to be selected at a time. This object selection system has been applied to real gray-level and color images and the simulation results show the effectiveness of the system.


Subject(s)
Attention/physiology , Brain Mapping , Brain/physiology , Computer Simulation , Neural Networks, Computer , Pattern Recognition, Visual/physiology , Diagnostic Imaging , Humans , Nonlinear Dynamics , Photic Stimulation
17.
Chem Biol Drug Des ; 75(6): 632-40, 2010 Jun.
Article in English | MEDLINE | ID: mdl-20565477

ABSTRACT

Cannabinoid compounds have been widely employed because of their medicinal and psychotropic properties. These compounds are isolated from Cannabis sativa (or marijuana) and are used in several medical treatments, such as glaucoma, nausea associated with chemotherapy, pain and many other situations. More recently, their use as an appetite stimulant has been indicated in patients with cachexia or AIDS. In this work, the influence of several molecular descriptors on the psychoactivity of 50 cannabinoid compounds is analyzed, aiming to obtain a model able to predict the psychoactivity of new cannabinoids. For this purpose, initially, the selection of descriptors was carried out using the Fisher's weight, the correlation matrix among the calculated variables and principal component analysis. From these analyses, the following descriptors have been considered more relevant: E(LUMO) (energy of the lowest unoccupied molecular orbital), Log P (logarithm of the partition coefficient), VC4 (volume of the substituent at the C4 position) and LP1 (Lovasz-Pelikan index, a molecular branching index). Next, two neural network models were used to construct a more adequate model for classifying new cannabinoid compounds. The first model employed was the multi-layer perceptron, trained with the back-propagation algorithm, and the second model used was the Kohonen network. The results obtained from both networks were compared and showed that both techniques presented a high percentage of correctness in discriminating psychoactive from psychoinactive compounds. However, the Kohonen network was superior to the multi-layer perceptron.


Subject(s)
Antipsychotic Agents/chemistry , Cannabinoids/chemistry , Neural Networks, Computer , Antipsychotic Agents/pharmacology , Cannabinoids/pharmacology , Cannabis/chemistry , Models, Chemical , Models, Molecular , Quantitative Structure-Activity Relationship , Quantum Theory
18.
Neural Netw ; 22(5-6): 728-37, 2009.
Article in English | MEDLINE | ID: mdl-19595565

ABSTRACT

Object selection refers to the mechanism of extracting objects of interest while ignoring other objects and background in a given visual scene. It is a fundamental issue for many computer vision and image analysis techniques and it is still a challenging task for artificial visual systems. Chaotic phase synchronization takes place in cases involving almost identical dynamical systems and it means that the phase difference between the systems is kept bounded over time, while their amplitudes remain chaotic and may be uncorrelated. Instead of complete synchronization, phase synchronization is believed to be a mechanism for neural integration in the brain. In this paper, an object selection model is proposed. Oscillators in the network representing the salient object in a given scene are phase synchronized, while no phase synchronization occurs for background objects. In this way, the salient object can be extracted. In this model, a shift mechanism is also introduced to change attention from one object to another. Computer simulations show that the model produces some results similar to those observed in natural vision systems.


Subject(s)
Attention , Neural Networks, Computer , Periodicity , Visual Perception , Algorithms , Computer Simulation , Humans , Photic Stimulation , Time Factors
19.
Chaos ; 18(3): 033107, 2008 Sep.
Article in English | MEDLINE | ID: mdl-19045445

ABSTRACT

In many real situations, randomness is considered to be uncertainty or even confusion which impedes human beings from making a correct decision. Here we study the combined role of randomness and determinism in particle dynamics for complex network community detection. In the proposed model, particles walk in the network and compete with each other in such a way that each of them tries to possess as many nodes as possible. Moreover, we introduce a rule to adjust the level of randomness of particle walking in the network, and we have found that a portion of randomness can largely improve the community detection rate. Computer simulations show that the model has good community detection performance and at the same time presents low computational complexity.


Subject(s)
Algorithms , Decision Making , Decision Support Techniques , Models, Statistical , Nonlinear Dynamics , Computer Simulation