Search | VHL Regional Portal

Deep Learning to Generate in Silico Chemical Property Libraries and Candidate Molecules for Small Molecule Identification in Complex Samples.

Colby, Sean M; Nuñez, Jamie R; Hodas, Nathan O; Corley, Courtney D; Renslow, Ryan R.

Anal Chem ; 92(2): 1720-1729, 2020 01 21.

Article in English | MEDLINE | ID: mdl-31661259

ABSTRACT

Comprehensive and unambiguous identification of small molecules in complex samples will revolutionize our understanding of the role of metabolites in biological systems. Existing and emerging technologies have enabled measurement of chemical properties of molecules in complex mixtures and, in concert, are sensitive enough to resolve even stereoisomers. Despite these experimental advances, small molecule identification is inhibited by (i) chemical reference libraries (e.g., mass spectra, collision cross section, and other measurable property libraries) representing <1% of known molecules, limiting the number of possible identifications, and (ii) the lack of a method to generate candidate matches directly from experimental features (i.e., without a library). To this end, we developed a variational autoencoder (VAE) to learn a continuous numerical, or latent, representation of molecular structure to expand reference libraries for small molecule identification. We extended the VAE to include a chemical property decoder, trained as a multitask network, in order to shape the latent representation such that it assembles according to desired chemical properties. The approach is unique in its application to metabolomics and small molecule identification, with its focus on properties that can be obtained from experimental measurements (m/z, CCS) paired with its training paradigm, which involved a cascade of transfer learning iterations. First, molecular representation is learned from a large data set of structures with m/z labels. Next, in silico property values are used to continue training, as experimental property data is limited. Finally, the network is further refined by being trained with the experimental data. This allows the network to learn as much as possible at each stage, enabling success with progressively smaller data sets without overfitting. 
Once trained, the network can be used to predict chemical properties directly from structure, as well as generate candidate structures with desired chemical properties. Our approach is orders of magnitude faster than first-principles simulation for CCS property prediction. Additionally, the ability to generate novel molecules along manifolds, defined by chemical property analogues, positions DarkChem as highly useful in a number of application areas, including metabolomics and small molecule identification, drug discovery and design, chemical forensics, and beyond.
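The abstract describes a VAE whose latent space is shaped by an additional property decoder trained as a multitask network. A minimal sketch of how such a combined objective could be assembled is given below. It is not the authors' implementation: the function names, the weights `kl_weight` and `prop_weight`, and the convention of marking missing property labels (for example, a molecule with m/z but no measured CCS) with `None` are all assumptions made for illustration.

```python
import math

def kl_to_standard_normal(mu, logvar):
    """KL divergence between a diagonal Gaussian N(mu, exp(logvar)) and N(0, I)."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))

def masked_mse(true_vals, pred_vals):
    """Mean squared error over the properties that actually have a label.

    Missing labels are represented by None (an assumption of this sketch) and
    contribute nothing to the loss.
    """
    pairs = [(t, p) for t, p in zip(true_vals, pred_vals) if t is not None]
    if not pairs:
        return 0.0
    return sum((t - p) ** 2 for t, p in pairs) / len(pairs)

def multitask_loss(recon_error, mu, logvar, prop_true, prop_pred,
                   kl_weight=1.0, prop_weight=1.0):
    """Weighted sum of reconstruction, KL, and property-prediction terms.

    recon_error is the structure-reconstruction loss computed elsewhere;
    the two weights are hypothetical tuning knobs, not values from the paper.
    """
    return (recon_error
            + kl_weight * kl_to_standard_normal(mu, logvar)
            + prop_weight * masked_mse(prop_true, prop_pred))

# Example: one molecule with a labeled first property and a missing second one.
loss = multitask_loss(0.5, [0.0, 0.0], [0.0, 0.0], [1.0, None], [3.0, 5.0])
```

Under this sketch, the three training stages named in the abstract would differ only in which data set supplies `prop_true` (m/z labels, then in silico values, then experimental values), with the weights carried over from one stage to the next.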

Subject(s)

Computer Simulation , Deep Learning , Small Molecule Libraries/analysis , Metabolomics , Molecular Structure , Small Molecule Libraries/metabolism

Doing the Impossible: Why Neural Networks Can Be Trained at All.

Hodas, Nathan O; Stinis, Panos.

Front Psychol ; 9: 1185, 2018.

Article in English | MEDLINE | ID: mdl-30050485

ABSTRACT

As deep neural networks grow in size, from thousands to millions to billions of weights, the performance of those networks becomes limited by our ability to accurately train them. A common naive question arises: if we have a system with billions of degrees of freedom, don't we also need billions of samples to train it? Of course, the success of deep learning indicates that reliable models can be learned with reasonable amounts of data. Similar questions arise in protein folding, spin glasses and biological neural networks. With effectively infinite potential folding/spin/wiring configurations, how does the system find the precise arrangement that leads to useful and robust results? Simple sampling of the possible configurations until an optimal one is reached is not a viable option even if one waited for the age of the universe. On the contrary, there appears to be a mechanism in the above phenomena that forces them to achieve configurations that live on a low-dimensional manifold, avoiding the curse of dimensionality. In the current work we use the concept of mutual information between successive layers of a deep neural network to elucidate this mechanism and suggest possible ways of exploiting it to accelerate training. We show that adding structure to the neural network leads to higher mutual information between layers. High mutual information between layers implies that the effective number of free parameters is exponentially smaller than the raw number of tunable weights, providing insight into why neural networks with far more weights than training points can be reliably trained.
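The central quantity in this abstract is the mutual information between successive layers. As a rough illustration of what that quantity measures, the sketch below computes a plug-in estimate of I(X;Y) in bits from paired discrete samples, such as activations of two layers after binning. The binning step and the function name `mutual_information` are assumptions of this example, and the estimator is a textbook one, not necessarily the one used in the paper.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of each (x, y) pair
    px = Counter(xs)               # marginal counts of x
    py = Counter(ys)               # marginal counts of y
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

# A layer that copies its input shares all of its information with it;
# a layer whose output is unrelated to its input shares none.
copied = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
unrelated = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

In the first case the estimate equals the entropy of the input (one bit for a balanced binary variable), and in the second it is zero, which brackets the range the abstract refers to when it speaks of "high" mutual information between layers.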

Deep learning for computational chemistry.

Goh, Garrett B; Hodas, Nathan O; Vishnu, Abhinav.

J Comput Chem ; 38(16): 1291-1307, 2017 06 15.

Article in English | MEDLINE | ID: mdl-28272810

ABSTRACT

The rise and fall of artificial neural networks is well documented in the scientific literature of both computer science and computational chemistry. Yet almost two decades later, we are now seeing a resurgence of interest in deep learning, a machine learning algorithm based on multilayer neural networks. Within the last few years, we have seen the transformative impact of deep learning in many domains, particularly in speech recognition and computer vision, to the extent that the majority of expert practitioners in those fields are now regularly eschewing prior established models in favor of deep learning models. In this review, we provide an introductory overview of the theory of deep neural networks and the unique properties that distinguish them from traditional machine learning algorithms used in cheminformatics. By providing an overview of the variety of emerging applications of deep neural networks, we highlight their ubiquity and broad applicability to a wide range of challenges in the field, including quantitative structure-activity relationships, virtual screening, protein structure prediction, quantum chemistry, materials design, and property prediction. In reviewing the performance of deep neural networks, we observed that they consistently outperformed state-of-the-art non-neural-network models across disparate research topics, and deep neural network-based models often exceeded the "glass ceiling" expectations of their respective tasks. Given the maturity of GPU-accelerated computing for training deep neural networks and the exponential growth of chemical data on which to train these networks, we anticipate that deep learning algorithms will be a valuable tool for computational chemistry. © 2017 Wiley Periodicals, Inc.

Determination of the source of SHG verniers in zebrafish skeletal muscle.

Dempsey, William P; Hodas, Nathan O; Ponti, Aaron; Pantazis, Periklis.

Sci Rep ; 5: 18119, 2015 Dec 11.

Article in English | MEDLINE | ID: mdl-26657568

ABSTRACT

Second-harmonic generation (SHG) microscopy is an emerging microscopic technique for medically relevant imaging because certain endogenous proteins, such as muscle myosin lattices within muscle cells, are sufficiently spatially ordered to generate detectable SHG without the use of any fluorescent dye. Given that SHG signal is sensitive to the structural state of muscle sarcomeres, SHG functional imaging can give insight into the integrity of muscle cells in vivo. Here, we report a thorough theoretical and experimental characterization of myosin-derived SHG intensity profiles within intact zebrafish skeletal muscle. We determined that "SHG vernier" patterns, regions of bifurcated SHG intensity, are illusory when sarcomeres are staggered with respect to one another. These optical artifacts arise due to the phase coherence of SHG signal generation and the Gouy phase shift of the laser at the focus. In contrast, two-photon excited fluorescence images obtained from fluorescently labeled sarcomeric components do not contain such illusory structures, regardless of the orientation of adjacent myofibers. Based on our results, we assert that complex optical artifacts such as SHG verniers should be taken into account when applying functional SHG imaging as a diagnostic readout for pathological muscle conditions.

Subject(s)

Microscopy/methods , Muscle, Skeletal/metabolism , Myofibrils/metabolism , Myosins/metabolism , Sarcomeres/metabolism , Zebrafish/metabolism , Animals , Artifacts , Diagnostic Imaging/methods , Microscopy, Confocal , Microscopy, Fluorescence, Multiphoton , Muscle Cells/metabolism , Muscle, Skeletal/anatomy & histology , Photons , Reproducibility of Results , Zebrafish/anatomy & histology , Zebrafish/embryology

The simple rules of social contagion.

Hodas, Nathan O; Lerman, Kristina.

Sci Rep ; 4: 4343, 2014 Mar 11.

Article in English | MEDLINE | ID: mdl-24614301

ABSTRACT

It is commonly believed that information spreads between individuals like a pathogen, with each exposure by an informed friend potentially resulting in a naive individual becoming infected. However, empirical studies of social media suggest that individual response to repeated exposure to information is far more complex. As a proxy for intervention experiments, we compare user responses to multiple exposures on two different social media sites, Twitter and Digg. We show that the position of exposing messages on the user-interface strongly affects social contagion. Accounting for this visibility significantly simplifies the dynamics of social contagion. The likelihood an individual will spread information increases monotonically with exposure, while explicit feedback about how many friends have previously spread it increases the likelihood of a response. We provide a framework for unifying information visibility, divided attention, and explicit social feedback to predict the temporal dynamics of user behavior.

Subject(s)

Information Dissemination , Social Behavior , Social Media , Friends/psychology , Humans , Likelihood Functions

Microscopic structure and dynamics of air/water interface by computer simulations--comparison with sum-frequency generation experiments.

Wang, Yanting; Hodas, Nathan O; Jung, Yousung; Marcus, R A.

Phys Chem Chem Phys ; 13(12): 5388-93, 2011 Mar 28.

Article in English | MEDLINE | ID: mdl-21347495

ABSTRACT

The air/water interface was simulated and the mode amplitudes and their ratios of the effective nonlinear sum-frequency generation (SFG) susceptibilities (A_eff's) were calculated for the ssp, ppp, and sps polarization combinations and compared with experiments. By designating "surface-sensitive" free OH bonds on the water surface, many aspects of the SFG measurements were calculated and compared with those inferred from experiment. We calculate an average tilt angle close to the SFG observed value of 35°, an average surface density of free OH bonds close to the experimental value of about 2.8 × 10¹⁸ m⁻², computed ratios of A_eff's that are very similar to those from the SFG experiment, and their absolute values that are in reasonable agreement with experiment. A one-parameter model was used to calculate these properties. The method utilizes results available from independent IR and Raman experiments to obtain some of the needed quantities, rather than calculating them ab initio. The present results provide microscopic information on water structure useful to applications such as in our recent theory of on-water heterogeneous catalysis.

Asymmetry in RNA pseudoknots: observation and theory.

Aalberts, Daniel P; Hodas, Nathan O.

Nucleic Acids Res ; 33(7): 2210-4, 2005.

Article in English | MEDLINE | ID: mdl-15831794

ABSTRACT

RNA can fold into a topological structure called a pseudoknot, composed of non-nested double-stranded stems connected by single-stranded loops. Our examination of the PseudoBase database of pseudoknotted RNA structures reveals asymmetries in the stem and loop lengths and provocative composition differences between the loops. By taking into account differences between major and minor grooves of the RNA double helix, we explain much of the asymmetry with a simple polymer physics model and statistical mechanical theory, with only one adjustable parameter.

Subject(s)

Models, Molecular , RNA/chemistry , Biopolymers/chemistry , Models, Statistical , Nucleic Acid Conformation

Efficient computation of optimal oligo-RNA binding.

Hodas, Nathan O; Aalberts, Daniel P.

Nucleic Acids Res ; 32(22): 6636-42, 2004.

Article in English | MEDLINE | ID: mdl-15608295

ABSTRACT

We present an algorithm that calculates the optimal binding conformation and free energy of two RNA molecules, one or both oligomeric. This algorithm has applications to modeling DNA microarrays, RNA splice-site recognition and other antisense problems. Although other recent algorithms perform the same calculation in time proportional to the sum of the lengths cubed, O((N1 + N2)^3), our oligomer binding algorithm, called bindigo, scales as the product of the sequence lengths, O(N1*N2). The algorithm performs well in practice with the aid of a heuristic for large asymmetric loops. To demonstrate its speed and utility, we use bindigo to investigate the binding proclivities of U1 snRNA to mRNA donor splice sites.
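To give a sense of scale for the two complexity classes quoted in the abstract, the sketch below compares the leading-order operation counts for a short oligomer binding to a long target. The sequence lengths (20 and 2000 nucleotides) are hypothetical, and constant factors are ignored, so the ratio indicates only how the gap grows with length, not measured running times of bindigo or any other program.

```python
def cubic_cost(n1: int, n2: int) -> int:
    """Leading-order step count for an O((N1 + N2)^3) algorithm."""
    return (n1 + n2) ** 3

def product_cost(n1: int, n2: int) -> int:
    """Leading-order step count for an O(N1 * N2) algorithm."""
    return n1 * n2

def scaling_ratio(n1: int, n2: int) -> float:
    """How many times more steps the cubic scaling implies, ignoring constants."""
    return cubic_cost(n1, n2) / product_cost(n1, n2)

# Hypothetical example: a 20-nt oligomer against a 2000-nt target.
oligo_len, target_len = 20, 2000
print(f"cubic:   {cubic_cost(oligo_len, target_len):,} steps")
print(f"product: {product_cost(oligo_len, target_len):,} steps")
print(f"ratio:   {scaling_ratio(oligo_len, target_len):,.1f}x")
```

Because the product form depends on the shorter sequence only linearly, the advantage is largest in exactly the asymmetric case the abstract targets, where one molecule is an oligomer.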

Subject(s)

Algorithms , Computational Biology/methods , Oligoribonucleotides/chemistry , Sequence Analysis, RNA/methods , Base Pairing , Base Sequence , Binding Sites , Nucleic Acid Conformation , Oligoribonucleotides/metabolism , RNA Splice Sites , RNA, Messenger/chemistry , RNA, Messenger/metabolism , RNA, Small Nuclear/chemistry , RNA, Small Nuclear/metabolism , Sequence Alignment
