Results 1 - 20 of 39
1.
J Chem Inf Model ; 64(11): 4392-4409, 2024 Jun 10.
Article in English | MEDLINE | ID: mdl-38815246

ABSTRACT

By accelerating time-consuming processes with high efficiency, computing has become an essential part of many modern chemical pipelines. Machine learning is a class of computing methods that can discover patterns within chemical data and utilize this knowledge for a wide variety of downstream tasks, such as property prediction or substance generation. The complex and diverse chemical space requires complex machine learning architectures with great learning power. Recently, learning models based on transformer architectures have revolutionized multiple domains of machine learning, including natural language processing and computer vision. Naturally, there have been ongoing endeavors in adapting these techniques to the chemical domain, resulting in a surge of publications within a short period. The diversity of chemical structures, use cases, and learning models necessitates a comprehensive summarization of existing works. In this paper, we review recent innovations in adapting transformers to solve learning problems in chemistry. Because chemical data is diverse and complex, we structure our discussion based on chemical representations. Specifically, we highlight the strengths and weaknesses of each representation, the current progress of adapting transformer architectures, and future directions.


Subject(s)
Cheminformatics , Machine Learning , Cheminformatics/methods
2.
Sci Total Environ ; 929: 172661, 2024 Jun 15.
Article in English | MEDLINE | ID: mdl-38649059

ABSTRACT

The study's objective was to evaluate the status of degraded land converted into productive agricultural models by improving the physicochemical properties of the soil, soil organic matter (SOM), soil organic carbon (SOC) fractions (active and passive), and microbial biomass carbon (MBC), while also generating carbon (C) credit for additional farmer income. Six models were analyzed, namely: (1) Arjun forest-based agroecosystems (AFBAE); (2) Lemon grass-based agroecosystems (LGBAE); (3) Legume-cereal-moong-based agroecosystems (LCMBAE); (4) Bael-black mustard-based agroecosystems (BMBAE); (5) Guava-wheat-based agroecosystems (GWBAE), and (6) Custard apple-lentil-based agroecosystems (CALBAE). These models were replicated three times in a randomized block design (RBD). Soil samples were collected from the study area at two depths (0-0.30 and 0.30-0.60 m). At a 0-0.30 m depth, the highest bulk density (ρb) of 1.50 Mg m⁻³ was observed in LCMBAE, while the lowest ρb of 1.43 Mg m⁻³ was recorded in BMBAE. The SOC and SOC stock values ranged over 4.2-7.7 g kg⁻¹ and 19.0-33.4 Mg ha⁻¹, respectively. The AFBAE had the highest MBC, 163.1 % higher than that of LCMBAE. At a 0-0.30 m depth, the recalcitrant index (RI) and lability index (LI) ranged from 0.35 to 0.46 and from 1.97 to 2.11, respectively. Additionally, the AFBAE exhibited the highest total biomass accumulation (39.23 Mg ha⁻¹), carbon dioxide (CO2) biosequestration (287.9 Mg ha⁻¹), and the total social cost of CO2 at US$ 277 ha⁻¹. Furthermore, in the AFBAE, there was a 198.1 % increase in total C credit (US$ 161 ha⁻¹) compared to LCMBAE (US$ 54 ha⁻¹). However, at 0.30-0.60 m depths, GWBAE and CALBAE were statistically equivalent (p ≤ 0.05) in total C stocks. Principal component analysis (PCA) reveals that component-1 accounts for 77.4 % of the variability, while component-2 contributes 18.6 %.
This article aimed to convert the degraded land into a sustainable agricultural module by increasing SOC and CO2 biosequestration and producing more C-credit, or climate currency, on underutilized land.
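
The percentage figures in this abstract follow from simple arithmetic. The sketch below recomputes the reported 198.1 % increase in total C credit from the two dollar values given in the abstract, and illustrates a commonly used fixed-depth formula for SOC stock. The function names, the formula, and the pairing of the lowest SOC value with the LCMBAE bulk density are assumptions made for illustration; the abstract does not state how the authors computed SOC stock.

```python
def percent_increase(new, baseline):
    """Relative increase of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


def soc_stock(soc_g_per_kg, bulk_density_mg_per_m3, depth_m):
    """SOC stock in Mg per hectare for a single soil layer.

    Assumed fixed-depth formula (not taken from the abstract):
    (g C / kg soil) * (Mg soil / m3) * (m) is numerically kg C per m2,
    and the factor 10 converts kg per m2 to Mg per hectare.
    """
    return soc_g_per_kg * bulk_density_mg_per_m3 * depth_m * 10.0


# Total C credit: US$ 161 per ha (AFBAE) against US$ 54 per ha (LCMBAE).
print(round(percent_increase(161, 54), 1))  # 198.1

# Hypothetical pairing: lowest SOC (4.2 g/kg) with the LCMBAE bulk density
# (1.50 Mg/m3) over the 0-0.30 m layer. The result is about 18.9 Mg/ha,
# close to the lower end of the reported SOC stock range.
print(soc_stock(4.2, 1.50, 0.30))
```

Under these assumptions the dollar figures reproduce the reported percentage exactly, while the SOC stock value is only an order-of-magnitude consistency check.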

3.
ACS Chem Neurosci ; 14(24): 4395-4408, 2023 Dec 20.
Article in English | MEDLINE | ID: mdl-38050862

ABSTRACT

Abnormal cytosolic aggregation of TAR DNA-binding protein of 43 kDa (TDP-43) is observed in multiple diseases, including amyotrophic lateral sclerosis (ALS), frontotemporal lobar degeneration, and Alzheimer's disease. Previous studies have shown that TDP-43(307-319), located at the C-terminus of TDP-43, can form higher-order oligomers and fibrils. Of particular interest are the hexamers that adopt a cylindrin structure that has been strongly correlated to neurotoxicity. In this study, we use the joint pharmacophore space (JPS) model to identify and generate potential TDP-43 inhibitors. Five JPS-designed molecules are evaluated using both experimental and computational methods: ion mobility mass spectrometry, thioflavin T fluorescence assay, circular dichroism spectroscopy, atomic force microscopy, and molecular dynamics simulations. We found that all five molecules can prevent the amyloid fibril formation of TDP-43(307-319), but their efficacy varies significantly. Furthermore, among the five molecules, [AC0101] is the most efficient in preventing the formation of higher-order oligomers and dissociating preformed higher-order oligomers. Molecular dynamics simulations show that [AC0101] is both the most flexible molecule and the one that forms the most hydrogen bonds with the TDP-43(307-319) monomer. The JPS-designed molecules can insert themselves between the β-strands in the hexameric cylindrin structure of TDP-43(307-319) and can open its structure. Possible mechanisms for JPS-designed molecules to inhibit and dissociate TDP-43(307-319) oligomers on an atomistic scale are proposed.


Subject(s)
Alzheimer Disease , Amyotrophic Lateral Sclerosis , Frontotemporal Dementia , Frontotemporal Lobar Degeneration , Humans , Amyotrophic Lateral Sclerosis/drug therapy , Amyotrophic Lateral Sclerosis/metabolism , DNA-Binding Proteins/metabolism
4.
ACS Chem Neurosci ; 14(15): 2717-2726, 2023 08 02.
Article in English | MEDLINE | ID: mdl-37442126

ABSTRACT

Alzheimer's disease (AD) is one of the world's most pressing health crises. AD is an incurable disease affecting more than 6.5 million Americans, predominantly the elderly, and in its later stages, leads to memory loss, dementia, and death. Amyloid β (Aβ) protein aggregates have been one of the pathological hallmarks of AD since its initial characterization. The early stages of Aβ accumulation and aggregation involve the formation of oligomers, which are considered neurotoxic and play a key role in further aggregation into fibrils that eventually appear in the brain as amyloid plaques. We have recently shown by combining ion mobility mass spectrometry (IM-MS) and atomic force microscopy (AFM) that Aβ42 rapidly forms dodecamers (12-mers) as the terminal oligomeric state, and these dodecamers seed the early formation of Aβ42 protofibrils. The link between soluble oligomers and fibril formation is one of the essential aspects for understanding the root cause of the disease state and is critical to developing therapeutic interventions. Utilizing a joint pharmacophore space (JPS) method, potential drugs have been designed specifically for amyloid-related diseases. These small molecules were generated based on crucial chemical features necessary for target selectivity. In this paper, we utilize our combined IM-MS and AFM methods to investigate the impact of three second-generation JPS small-molecule inhibitors, AC0201, AC0202, and AC0203, on dodecamer as well as fibril formation in Aβ42. Our results indicate that AC0201 works well as an inhibitor and remodeler of both dodecamers and fibril formation, AC0203 behaves less efficiently, and AC0202 is ineffective.


Subject(s)
Alzheimer Disease , Amyloidosis , Humans , Aged , Amyloid beta-Peptides/metabolism , Alzheimer Disease/metabolism , Brain/metabolism , Amyloid/metabolism , Peptide Fragments/metabolism
5.
Ann N Y Acad Sci ; 1514(1): 70-81, 2022 08.
Article in English | MEDLINE | ID: mdl-35581156

ABSTRACT

Machine learning (ML) and artificial intelligence (AI) have had a profound impact on our lives. Domains like health and learning are naturally helped by human-AI interactions and decision making. In these areas, as ML algorithms prove their value in making important decisions, humans add their distinctive expertise and judgment on social and interpersonal issues that need to be considered in tandem with algorithmic inputs of information. Some questions naturally arise. What rules and regulations should be invoked on the employment of AI, and what protocols should be in place to evaluate available AI resources? What are the forms of effective communication and coordination with AI that best promote effective human-AI teamwork? In this review, we highlight factors that we believe are especially important in assembling and managing human-AI decision making in a group setting.


Subject(s)
Artificial Intelligence , Machine Learning , Algorithms , Decision Making , Humans
6.
Biochemistry ; 59(4): 499-508, 2020 02 04.
Article in English | MEDLINE | ID: mdl-31846303

ABSTRACT

TDP-43 aggregates are a salient feature of amyotrophic lateral sclerosis (ALS), frontotemporal dementia (FTD), and a variety of other neurodegenerative diseases, including Alzheimer's disease (AD). With an anticipated growth in the most susceptible demographic, projections predict neurodegenerative diseases will potentially affect 15 million people in the United States by 2050. Currently, there are no cures for ALS, FTD, or AD. Previous studies of the amyloidogenic core of TDP-43 have demonstrated that oligomers greater than a trimer are associated with toxicity. Utilizing a joint pharmacophore space (JPS) method, potential drugs have been designed specifically for amyloid-related diseases. These molecules were generated on the basis of key chemical features necessary for blood-brain barrier permeability, low adverse side effects, and target selectivity. Combining ion-mobility mass spectrometry and atomic force microscopy with the JPS computational method allows us to more efficiently evaluate a potential drug's efficacy in disrupting the development of putative toxic species. Our results demonstrate the dissociation of higher-order oligomers in the presence of these novel JPS-generated inhibitors into smaller oligomer species. Additionally, drugs approved by the Food and Drug Administration for the treatment of ALS were also evaluated and demonstrated to maintain higher-order oligomeric assemblies. Possible mechanisms for the observed action of the JPS molecules are discussed.


Subject(s)
DNA-Binding Proteins/chemistry , DNA-Binding Proteins/metabolism , TDP-43 Proteinopathies/metabolism , Alzheimer Disease/metabolism , Alzheimer Disease/pathology , Amyotrophic Lateral Sclerosis/metabolism , Amyotrophic Lateral Sclerosis/pathology , Blood-Brain Barrier/metabolism , Computational Biology/methods , Drug Design , Frontotemporal Dementia/metabolism , Frontotemporal Dementia/pathology , Humans , Ion Mobility Spectrometry/methods , Microscopy, Atomic Force/methods , Mutation
7.
Nat Commun ; 10(1): 2648, 2019 06 14.
Article in English | MEDLINE | ID: mdl-31201322

ABSTRACT

Polarization affects many forms of social organization. A key issue focuses on which affective relationships are prone to change and how their change relates to performance. In this study, we analyze a financial institution that employed 66 day traders over a two-year period, focusing on links between changes in affective relations and trading performance. Traders' affective relations were inferred from their instant messages (IMs; >2 million messages) and trading performance was measured from profit and loss statements (>1 million trades). Here, we find that triads of relationships, the building blocks of larger social structures, have a propensity towards affective balance, but one unbalanced configuration resists change. Further, balance is positively related to performance. Traders with balanced networks have the "hot hand", showing streaks of high performance. Research implications focus on how changes in polarization relate to performance and how polarized states can depolarize.
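
The notion of a balanced triad used in this abstract can be made concrete with the classical structural balance rule: a triad of signed relationships is balanced when the product of its three signs is positive. The sketch below is a minimal illustration under that assumption; the trader names, the sign encoding (+1 for positive affect, -1 for negative affect), and the function names are hypothetical, and the abstract does not specify how the authors inferred signs from messages.

```python
from itertools import combinations


def is_balanced(s_ab, s_bc, s_ac):
    """Classical balance rule: the product of the three signs is positive."""
    return s_ab * s_bc * s_ac > 0


def balanced_fraction(signs):
    """Fraction of fully connected triads that are balanced.

    `signs` maps frozenset({u, v}) to +1 or -1 for each signed tie.
    Triads with a missing tie are skipped.
    """
    nodes = sorted({n for pair in signs for n in pair})
    total = 0
    balanced = 0
    for a, b, c in combinations(nodes, 3):
        edges = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(e in signs for e in edges):
            total += 1
            if is_balanced(*(signs[e] for e in edges)):
                balanced += 1
    return balanced / total if total else 0.0


# Hypothetical affective ties among four traders.
ties = {
    ("A", "B"): +1, ("B", "C"): +1, ("A", "C"): +1,
    ("A", "D"): -1, ("B", "D"): -1, ("C", "D"): +1,
}
signs = {frozenset(pair): sign for pair, sign in ties.items()}

# ABC (+,+,+) and ABD (+,-,-) are balanced; ACD and BCD (+,+,-) are not.
print(balanced_fraction(signs))  # 0.5
```

Tracking this fraction over time is one simple way to express a "propensity towards balance"; the study's own measure may differ.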


Subject(s)
Commerce , Decision Making/physiology , Models, Psychological , Risk-Taking , Social Networking , Humans , Markov Chains , Text Messaging/statistics & numerical data
8.
J Am Soc Mass Spectrom ; 30(1): 85-93, 2019 Jan.
Article in English | MEDLINE | ID: mdl-29713966

ABSTRACT

Alzheimer's disease (AD) is rapidly reaching epidemic status among a burgeoning aging population. Much evidence suggests the toxicity of this amyloid disease is most influenced by the formation of soluble oligomeric forms of amyloid β-protein, particularly the 42-residue alloform (Aβ42). Developing potential therapeutics in a directed, streamlined approach to treating this disease is necessary. Here we utilize the joint pharmacophore space (JPS) model to design a new molecule [AC0107] incorporating structural characteristics of known Aβ inhibitors, blood-brain barrier permeability, and limited toxicity. To test the molecule's efficacy experimentally, we employed ion mobility mass spectrometry (IM-MS) and found that [AC0107] inhibits the formation of the toxic Aβ42 dodecamer at both high (1:10) and equimolar concentrations of inhibitor. Atomic force microscopy (AFM) experiments reveal that [AC0107] prevents further aggregation of Aβ42, destabilizes preformed fibrils, and reverses Aβ42 aggregation. This trend continues for long-term interaction times of 2 days until only small aggregates remain with virtually no fibrils or higher order oligomers surviving. Pairing JPS with IM-MS and AFM presents a powerful and effective first step for AD drug development.


Subject(s)
Amyloid beta-Peptides/antagonists & inhibitors , Amyloid beta-Peptides/metabolism , Drug Design , Ion Mobility Spectrometry/methods , Models, Molecular , Nitriles/pharmacology , Peptide Fragments/antagonists & inhibitors , Peptide Fragments/metabolism , Pyrrolidines/pharmacology , Alzheimer Disease/drug therapy , Blood-Brain Barrier/drug effects , Drug Evaluation, Preclinical/methods , Humans , Machine Learning , Microscopy, Atomic Force
9.
PLoS One ; 13(10): e0204547, 2018.
Article in English | MEDLINE | ID: mdl-30304044

ABSTRACT

Today, many complex tasks are assigned to teams, rather than individuals. One reason for teaming up is expansion of the skill coverage of each individual to the joint team skill set. However, numerous empirical studies of human groups suggest that the performance of equally skilled teams can widely differ. Two natural questions arise: What are the factors defining team performance? And how can we best predict the performance of a given team on a specific task? While the team members' task-related capabilities constrain the potential for the team's success, the key to understanding team performance is in the analysis of the team process, encompassing the behaviors of the team members during task completion. In this study, we extend the existing body of research on team process and prediction models of team performance. Specifically, we analyze the dynamics of historical team performance over a series of tasks as well as the fine-grained patterns of collaboration between team members, and formally connect these dynamics to the team performance in the predictive models. Our major qualitative finding is that higher performing teams have well-connected collaboration networks (as indicated by the topological and spectral properties of the latter) that are more robust to perturbations, and where network processes spread more efficiently. Our major quantitative finding is that our predictive models deliver accurate team performance predictions (with a prediction error of 15-25%) on a variety of simple tasks, outperforming baseline models that do not capture the micro-level dynamics of team member behaviors. We also show how to use our models in an application, for optimal online planning of workload distribution in an organization. Our findings emphasize the importance of studying the dynamics of team collaboration as the major driver of high performance in teams.


Subject(s)
Cooperative Behavior , Group Processes , Models, Psychological , Humans , Mental Processes , Regression Analysis
10.
Neuroimage ; 172: 390-403, 2018 05 15.
Article in English | MEDLINE | ID: mdl-29410205

ABSTRACT

We present a method to discover differences between populations with respect to the spatial coherence of their oriented white matter microstructure in arbitrarily shaped white matter regions. This method is applied to diffusion MRI scans of a subset of the Human Connectome Project dataset: 57 pairs of monozygotic and 52 pairs of dizygotic twins. After controlling for morphological similarity between twins, we identify 3.7% of all white matter as being associated with genetic similarity (35.1 k voxels, p < 10⁻⁴, false discovery rate 1.5%), 75% of which spatially clusters into twenty-two contiguous white matter regions. Furthermore, we show that the orientation similarity within these regions generalizes to a subset of 47 pairs of non-twin siblings, and show that these siblings are on average as similar as dizygotic twins. The regions are located in deep white matter including the superior longitudinal fasciculus, the optic radiations, the middle cerebellar peduncle, the corticospinal tract, and within the anterior temporal lobe, as well as the cerebellum, brain stem, and amygdalae. These results extend previous work using undirected fractional anisotropy for measuring putative heritable influences in white matter. Our multidirectional extension better accounts for crossing fiber connections within voxels. This bottom-up approach has at its basis a novel measurement of coherence within neighboring voxel dyads between subjects, and avoids some of the fundamental ambiguities encountered with tractographic approaches to white matter analysis that estimate global connectivity.


Subject(s)
Brain/anatomy & histology , Connectome/methods , Image Processing, Computer-Assisted/methods , White Matter/anatomy & histology , Diffusion Tensor Imaging/methods , Humans , Twins, Dizygotic , Twins, Monozygotic
11.
PLoS One ; 12(10): e0184344, 2017.
Article in English | MEDLINE | ID: mdl-29016686

ABSTRACT

Modeling the brain as a functional network can reveal the relationship between distributed neurophysiological processes and functional interactions between brain structures. Existing literature on functional brain networks focuses mainly on a battery of network properties in "resting state" employing, for example, modularity, clustering, or path length among regions. In contrast, we seek to uncover functionally connected subnetworks that predict or correlate with cohort differences and are conserved within the subjects within a cohort. We focus on differences in both the rate of learning as well as overall performance in a sensorimotor task across subjects and develop a principled approach for the discovery of discriminative subgraphs of functional connectivity based on imaging acquired during practice. We discover two statistically significant subgraph regions: one involving multiple regions in the visual cortex and another involving the parietal operculum and planum temporale. High functional coherence in the former characterizes sessions in which subjects take longer to perform the task, while high coherence in the latter is associated with high learning rate (performance improvement across trials). Our proposed methodology is general, in that it can be applied to other cognitive tasks, to study learning or to differentiate between healthy patients and patients with neurological disorders, by revealing the salient interactions among brain regions associated with the observed global state. The discovery of such significant discriminative subgraphs promises a better data-driven understanding of the dynamic brain processes associated with high-level cognitive functions.


Subject(s)
Learning/physiology , Magnetic Resonance Imaging/methods , Parietal Lobe/physiology , Visual Cortex/physiology , Adult , Biomarkers , Brain Mapping , Female , Humans , Image Processing, Computer-Assisted , Male , Psychomotor Performance/physiology
13.
Bioorg Med Chem Lett ; 25(13): 2713-9, 2015 Jul 01.
Article in English | MEDLINE | ID: mdl-25998502

ABSTRACT

Joint pharmacophore space (JPS), ensemble docking and sequential JPS-ensemble docking were used to select three panels of compounds (10 per panel) for evaluation as LRRK2 inhibitors. These computational methods identified four LRRK2 inhibitors with IC50 values < 12 µM. The sequential JPS-ensemble docking predicted the majority of active hits. One of the inhibitors (Z-8205) identified using this method was also found to inhibit the G2019S mutant of LRRK2 25-fold better than wild-type enzyme. This bias for the G2019S mutant is proposed to arise from an interaction with S2019 in this form of the enzyme. In addition, Z-8205 was found to only inhibit one other kinase when profiled against a panel of 97 kinases at 10 µM.


Subject(s)
Protein Kinase Inhibitors/chemistry , Protein Kinase Inhibitors/pharmacology , Protein Serine-Threonine Kinases/antagonists & inhibitors , Amino Acid Substitution , Binding Sites , Computer Simulation , Drug Discovery , High-Throughput Screening Assays , Humans , Leucine-Rich Repeat Serine-Threonine Protein Kinase-2 , Models, Molecular , Mutant Proteins/antagonists & inhibitors , Mutant Proteins/chemistry , Mutant Proteins/genetics , Parkinson Disease/enzymology , Parkinson Disease/genetics , Protein Serine-Threonine Kinases/chemistry , Protein Serine-Threonine Kinases/genetics , Structural Homology, Protein , Structure-Activity Relationship
14.
Adv Mater ; 26(33): 5839-45, 2014 Sep 03.
Article in English | MEDLINE | ID: mdl-25043854

ABSTRACT

Discriminative base motifs within DNA templates for fluorescent silver clusters are identified using methods that combine large experimental data sets with machine learning tools for pattern recognition. Combining the discovery of certain multibase motifs important for determining fluorescence brightness with a generative algorithm, the probability of selecting DNA templates that stabilize fluorescent silver clusters is increased by a factor of >3.


Subject(s)
Artificial Intelligence , DNA/chemistry , Fluorescence , Pattern Recognition, Automated/methods , Silver Compounds/chemistry , Base Sequence , Spectrometry, Fluorescence
15.
PLoS One ; 8(9): e71293, 2013.
Article in English | MEDLINE | ID: mdl-24086251

ABSTRACT

Spatial networks, in which nodes and edges are embedded in space, play a vital role in the study of complex systems. For example, many social networks attach geo-location information to each user, allowing the study of not only topological interactions between users, but spatial interactions as well. The defining property of spatial networks is that edge distances are associated with a cost, which may subtly influence the topology of the network. However, the cost function over distance is rarely known, thus developing a model of connections in spatial networks is a difficult task. In this paper, we introduce a novel model for capturing the interaction between spatial effects and network structure. Our approach represents a unique combination of ideas from latent variable statistical models and spatial network modeling. In contrast to previous work, we view the ability to form long/short-distance connections to be dependent on the individual nodes involved. For example, a node's specific surroundings (e.g. network structure and node density) may make it more likely to form a long distance link than other nodes with the same degree. To capture this information, we attach a latent variable to each node which represents a node's spatial reach. These variables are inferred from the network structure using a Markov Chain Monte Carlo algorithm. We experimentally evaluate our proposed model on 4 different types of real-world spatial networks (e.g. transportation, biological, infrastructure, and social). We apply our model to the task of link prediction and achieve up to a 35% improvement over previous approaches in terms of the area under the ROC curve. Additionally, we show that our model is particularly helpful for predicting links between nodes with low degrees. In these cases, we see much larger improvements over previous models.


Subject(s)
Models, Theoretical , Algorithms , Markov Chains , Monte Carlo Method
16.
Bioinformatics ; 29(7): 940-6, 2013 Apr 01.
Article in English | MEDLINE | ID: mdl-23396124

ABSTRACT

MOTIVATION: Microscopy advances have enabled the acquisition of large-scale biological images that capture whole tissues in situ. This in turn has fostered the study of spatial relationships between cells and various biological structures, which has proved enormously beneficial toward understanding organ and organism function. However, the unique nature of biological images and tissues precludes the application of many existing spatial mining and quantification methods necessary to make inferences about the data. Especially difficult is attempting to quantify the spatial correlation between heterogeneous structures and point objects, which often occurs in many biological tissues. RESULTS: We develop a method to quantify the spatial correlation between a continuous structure and point data in large (17 500 × 17 500 pixel) biological images. We use this method to study the spatial relationship between the vasculature and a type of cell in the retina called astrocytes. We use a geodesic feature space based on vascular structures and embed astrocytes into the space by spatial sampling. We then propose a quantification method in this feature space that enables us to empirically demonstrate that the spatial distribution of astrocytes is often correlated with vascular structure. Additionally, these patterns are conserved in the retina after injury. These results prove the long-assumed patterns of astrocyte spatial distribution and provide a novel methodology for conducting other spatial studies of similar tissue and structures. AVAILABILITY: The Matlab code for the method described in this article can be found at http://www.cs.ucsb.edu/~dbl/software.php. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Image Processing, Computer-Assisted/methods , Retina/cytology , Animals , Astrocytes/cytology , Mice , Retinal Vessels/cytology
17.
J Chem Inf Model ; 51(5): 1106-21, 2011 May 23.
Article in English | MEDLINE | ID: mdl-21488651

ABSTRACT

We propose a novel method for pharmacophore analysis by examining the Joint Pharmacophore Space of chemical compounds, targets, and chemical/biological properties. The proposed approach is a notable deviation from existing techniques that analyze compounds on a target-by-target basis, aimed at extracting and optimizing a specific pharmacophore. The underlying geometry of the pharmacophores is responsible for binding between compounds and targets as well as properties of compounds such as Blood Brain Barrier permeability. The identification of this joint space enables us to cluster and classify similar pharmacophores based on geometric arrangements, analyze the diversity of this space, ascribe positive/negative properties to the subspaces, and query and mine a database of compounds for presence or absence of activity. Extensive experiments are carried out to validate the presence of subspaces that uniquely identify geometric configurations conforming to certain biological activities. The discriminative potential of these subspaces is also verified by employing them as a molecular descriptor. Empirical results show promising performance in terms of classification quality highlighting the utility of mining the joint pharmacophore space.


Subject(s)
Algorithms , Antineoplastic Agents/chemistry , Drug Discovery , Antineoplastic Agents/pharmacology , Binding Sites , Blood-Brain Barrier , Capillary Permeability , Cell Line, Tumor , Data Mining , Databases, Chemical , Humans , Hydrogen Bonding , Hydrophobic and Hydrophilic Interactions , Ligands , Molecular Conformation , Molecular Targeted Therapy
18.
BMC Bioinformatics ; 12: 7, 2011 Jan 06.
Article in English | MEDLINE | ID: mdl-21211042

ABSTRACT

BACKGROUND: The combination of genotypic and genome-wide expression data arising from segregating populations offers an unprecedented opportunity to model and dissect complex phenotypes. The immense potential offered by these data derives from the fact that genotypic variation is the sole source of perturbation and can therefore be used to reconcile changes in gene expression programs with the parental genotypes. To date, several methodologies have been developed for modeling eQTL data. These methods generally leverage genotypic data to resolve causal relationships among gene pairs implicated as associates in the expression data. In particular, leading studies have augmented Bayesian networks with genotypic data, providing a powerful framework for learning and modeling causal relationships. While these initial efforts have provided promising results, one major drawback associated with these methods is that they are generally limited to resolving causal orderings for transcripts most proximal to the genomic loci. In this manuscript, we present a probabilistic method capable of learning the causal relationships between transcripts at all levels in the network. We use the information provided by our method as a prior for Bayesian network structure learning, resulting in enhanced performance for gene network reconstruction. RESULTS: Using established protocols to synthesize eQTL networks and corresponding data, we show that our method achieves improved performance over existing leading methods. For the goal of gene network reconstruction, our method achieves improvements in recall ranging from 20% to 90% across a broad range of precision levels and for datasets of varying sample sizes. Additionally, we show that the learned networks can be utilized for expression quantitative trait loci mapping, resulting in upwards of 10-fold increases in recall over traditional univariate mapping. 
CONCLUSIONS: Using the information from our method as a prior for Bayesian network structure learning yields large improvements in accuracy for the tasks of gene network reconstruction and expression quantitative trait loci mapping. In particular, our method is effective for establishing causal relationships between transcripts located both proximally and distally from genomic loci.


Subject(s)
Bayes Theorem , Gene Regulatory Networks , Models, Genetic , Quantitative Trait Loci , Computer Simulation , Genetic Variation , Genotype , Stochastic Processes
19.
Mol Inform ; 30(9): 809-15, 2011 Sep.
Article in English | MEDLINE | ID: mdl-27467413

ABSTRACT

Identifying the overrepresented substructures from a set of molecules with similar activity is a common task in chemical informatics. Existing substructure miners are deterministic, requiring the activity of all mined molecules to be known with high confidence. In contrast, we introduce pGraphSig, a probabilistic structure miner, which effectively mines structures from noisy data, where many molecules are labeled with their probability of being active. We benchmark pGraphSig on data from several small-molecule high throughput screens, finding that it can more effectively identify overrepresented structures than a deterministic structure miner.

20.
Article in English | MEDLINE | ID: mdl-20431141

ABSTRACT

The recent advent of high-throughput methods has generated large amounts of gene interaction data. This has allowed the construction of genome-wide networks. A significant number of genes in such networks remain uncharacterized and predicting the molecular function of these genes remains a major challenge. A number of existing techniques assume that genes with similar functions are topologically close in the network. Our hypothesis is that genes with similar functions exhibit similar annotation patterns in their neighborhood, regardless of the distance between them in the interaction network. We thus predict molecular functions of uncharacterized genes by comparing their functional neighborhoods to genes of known function. We propose a two-phase approach. First, we extract functional neighborhood features of a gene using Random Walks with Restarts. We then employ a k-nearest-neighbor (KNN) classifier to predict the function of uncharacterized genes based on the computed neighborhood features. We perform leave-one-out validation experiments on two S. cerevisiae interaction networks and show significant improvements over previous techniques. Our technique provides a natural control of the trade-off between accuracy and coverage of prediction. We further propose and evaluate prediction in sparse genomes by exploiting features from well-annotated genomes.
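
The two-phase approach described in this abstract can be sketched on a toy graph. The code below is an illustrative approximation, not the authors' implementation: the feature definition (random-walk probability mass landing on each annotation label), the squared Euclidean distance, the gene names, and the labels "kinase" and "transporter" are all assumptions. It assumes every node has at least one neighbor.

```python
from collections import Counter


def rwr(adj, seed, restart=0.3, iters=100):
    """Random walk with restart from `seed`; returns visiting probabilities."""
    p = {n: 1.0 if n == seed else 0.0 for n in adj}
    for _ in range(iters):
        nxt = {n: (restart if n == seed else 0.0) for n in adj}
        for u, neighbors in adj.items():
            share = (1.0 - restart) * p[u] / len(neighbors)
            for v in neighbors:
                nxt[v] += share
        p = nxt
    return p


def profile(adj, annotations, gene, labels, restart=0.3):
    """Neighborhood feature vector: walk mass reaching each label.

    The gene's own annotation is excluded so that known genes are
    described only by their surroundings.
    """
    p = rwr(adj, gene, restart)
    return [
        sum(p[n] for n, lab in annotations.items() if lab == label and n != gene)
        for label in labels
    ]


def knn_predict(adj, annotations, gene, labels, k=3):
    """Majority label among the k annotated genes with the closest profiles."""
    target = profile(adj, annotations, gene, labels)
    scored = []
    for other, lab in annotations.items():
        if other == gene:
            continue
        feat = profile(adj, annotations, other, labels)
        dist = sum((a - b) ** 2 for a, b in zip(target, feat))
        scored.append((dist, other, lab))
    scored.sort()
    votes = Counter(lab for _, _, lab in scored[:k])
    return votes.most_common(1)[0][0]


# Two disconnected hub-and-leaf modules; b0 is the uncharacterized gene.
adj = {
    "a0": ["a1", "a2"], "a1": ["a0"], "a2": ["a0"],
    "b0": ["b1", "b2"], "b1": ["b0"], "b2": ["b0"],
}
annotations = {"a0": "kinase", "a1": "transporter", "a2": "transporter",
               "b1": "transporter", "b2": "transporter"}

# k=1 because the toy graph holds only one gene with a matching neighborhood.
print(knn_predict(adj, annotations, "b0", ["kinase", "transporter"], k=1))  # kinase
```

In this toy case b0 receives the label of a0 even though the two genes are not connected at all, which mirrors the abstract's hypothesis that similar neighborhood annotation patterns matter more than network distance.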


Subject(s)
Databases, Genetic , Gene Regulatory Networks , Genes , Genomics/methods , Pattern Recognition, Automated/methods , Animals , Models, Statistical , ROC Curve , Saccharomyces cerevisiae/genetics