Results 1 - 20 of 41
1.
J Chem Inf Model ; 64(14): 5439-5450, 2024 Jul 22.
Article in English | MEDLINE | ID: mdl-38953560

ABSTRACT

Message passing neural networks (MPNNs) on molecular graphs generate continuous and differentiable encodings of small molecules with state-of-the-art performance on protein-ligand complex scoring tasks. Here, we describe the proximity graph network (PGN) package, an open-source toolkit that constructs ligand-receptor graphs based on atom proximity and allows users to rapidly apply and evaluate MPNN architectures for a broad range of tasks. We demonstrate the utility of PGN by introducing benchmarks for affinity and docking score prediction tasks. Graph networks generalize better than fingerprint-based models and perform strongly for the docking score prediction task. Overall, MPNNs with proximity graph data structures augment the prediction of ligand-receptor complex properties when ligand-receptor data are available.


Subject(s)
Neural Networks, Computer , Proteins , Ligands , Proteins/chemistry , Proteins/metabolism , Molecular Docking Simulation , Protein Binding
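A minimal sketch of the proximity-graph idea from the PGN abstract above. It assumes that atoms arrive as `(label, (x, y, z))` tuples and that an edge joins any two atoms within a fixed distance cutoff. The function name `proximity_edges` and the 4.0 Å default are hypothetical choices for illustration. They are not the PGN package's API or parameters.

```python
import math
from itertools import combinations


def proximity_edges(atoms, cutoff=4.0):
    """Return index pairs (i, j), i < j, of atoms whose centres lie within `cutoff`.

    `atoms` is a list of (label, (x, y, z)) tuples. The label (e.g. "ligand" or
    "receptor") is carried along but not used to filter edges here.
    The 4.0 default (angstroms) is an assumed value, not taken from PGN.
    """
    edges = []
    for (i, (_, a)), (j, (_, b)) in combinations(enumerate(atoms), 2):
        if math.dist(a, b) <= cutoff:  # Euclidean distance (Python 3.8+)
            edges.append((i, j))
    return edges


atoms = [
    ("ligand", (0.0, 0.0, 0.0)),
    ("ligand", (1.5, 0.0, 0.0)),
    ("receptor", (0.0, 3.0, 0.0)),
    ("receptor", (10.0, 0.0, 0.0)),
]
print(proximity_edges(atoms))  # [(0, 1), (0, 2), (1, 2)]
```

An MPNN would pass messages over a graph like this one. A full pipeline would also attach atom and edge features, which this sketch omits.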
2.
bioRxiv ; 2023 Nov 22.
Article in English | MEDLINE | ID: mdl-38045231

ABSTRACT

The investigation of chromatin organization in single cells holds great promise for identifying causal relationships between genome structure and function. However, analysis of single-molecule data is hampered by extreme yet inherent heterogeneity, making it challenging to determine the contributions of individual chromatin fibers to bulk trends. To address this challenge, we propose ChromaFactor, a novel computational approach based on non-negative matrix factorization that deconvolves single-molecule chromatin organization datasets into their most salient primary components. ChromaFactor provides the ability to identify trends accounting for the maximum variance in the dataset while simultaneously describing the contribution of individual molecules to each component. Applying our approach to two single-molecule imaging datasets across different genomic scales, we find that these primary components demonstrate significant correlation with key functional phenotypes, including active transcription, enhancer-promoter distance, and genomic compartment. ChromaFactor offers a robust tool for understanding the complex interplay between chromatin structure and function on individual DNA molecules, pinpointing which subpopulations drive functional changes and fostering new insights into cellular heterogeneity and its implications for bulk genomic phenomena.
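The deconvolution described above rests on non-negative matrix factorization (NMF). Below is a generic NMF sketch in plain Python that uses the standard Lee-Seung multiplicative updates for squared error. It is not ChromaFactor's implementation. Treating the rows of `V` as individual molecules and the columns as measured features is an assumption made for illustration. Under that reading, each row of `W` describes one molecule's contribution to each component.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf(V, k, iters=500, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (n x m) as W (n x k) times H (k x m).

    Lee-Seung multiplicative updates keep W and H non-negative and are
    designed not to increase the squared reconstruction error.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(*rows)] for rows in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(*rows)] for rows in zip(W, num, den)]
    return W, H


def relative_error(V, W, H):
    """Frobenius norm of (V - WH) divided by the Frobenius norm of V."""
    R = matmul(W, H)
    diff = sum((v - r) ** 2 for vr, rr in zip(V, R) for v, r in zip(vr, rr))
    norm = sum(v ** 2 for row in V for v in row)
    return (diff / norm) ** 0.5


# Toy data: 4 "molecules" x 3 features, built from 2 non-negative components.
V = [[1, 2, 0], [0, 1, 3], [1, 3, 3], [2, 5, 3]]
W, H = nmf(V, k=2, iters=2000)
print(round(relative_error(V, W, H), 3))
```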

3.
Acta Neuropathol Commun ; 11(1): 202, 2023 12 18.
Article in English | MEDLINE | ID: mdl-38110981

ABSTRACT

Machine learning (ML) has increasingly been used to assist and expand current practices in neuropathology. However, generating large imaging datasets with quality labels is challenging in fields that demand high levels of expertise. Further complicating matters is the frequent disagreement between experts in neuropathology-related tasks, both at the case level and at a more granular level. Neurofibrillary tangles (NFTs) are a hallmark pathological feature of Alzheimer disease and are associated with disease progression, which warrants further investigation and granular quantification at a scale not currently accessible in routine human assessment. In this work, we first provide a baseline of annotator/rater agreement for two tasks: Braak NFT staging between experts, and NFT detection using both experts and novices in neuropathology. We use a whole-slide-image (WSI) cohort of neuropathology cases from Emory University Hospital immunohistochemically stained for Tau. We develop a workflow for gathering annotations of early-stage NFTs (Pre-NFTs) and mature intracellular NFTs (iNFTs), and we show that ML models can be trained to learn annotator nuances for the task of NFT detection in WSIs. We utilize a model-assisted-labeling approach and demonstrate that ML models can help label large datasets efficiently. We also show that these models can extract case-level features that predict Braak NFT stages comparably to expert human raters, and do so at scale. This study provides a generalizable workflow for pathology and related fields, as well as a technique for accomplishing a high-level neuropathology task with limited human annotations.


Subject(s)
Alzheimer Disease , Neurodegenerative Diseases , Humans , Neurofibrillary Tangles/pathology , Neurodegenerative Diseases/pathology , tau Proteins/metabolism , Workflow , Brain/pathology , Alzheimer Disease/pathology , Machine Learning
4.
Cell Genom ; 3(10): 100410, 2023 Oct 11.
Article in English | MEDLINE | ID: mdl-37868032

ABSTRACT

Natural and experimental genetic variants can modify DNA loops and insulating boundaries to tune transcription, but it is unknown how sequence perturbations affect chromatin organization genome wide. We developed a deep-learning strategy to quantify the effect of any insertion, deletion, or substitution on chromatin contacts and systematically scored millions of synthetic variants. While most genetic manipulations have little impact, regions with CTCF motifs and active transcription are highly sensitive, as expected. Our unbiased screen and subsequent targeted experiments also point to noncoding RNA genes and several families of repetitive elements as CTCF-motif-free DNA sequences with particularly large effects on nearby chromatin interactions, sometimes exceeding the effects of CTCF sites and explaining interactions that lack CTCF. We anticipate that our disruption tracks may be of broad interest and utility as a measure of 3D genome sensitivity, and our computational strategies may serve as a template for biological inquiry with deep learning.

5.
bioRxiv ; 2023 Aug 28.
Article in English | MEDLINE | ID: mdl-37693536

ABSTRACT

Chemical probes interrogate disease mechanisms at the molecular level by linking genetic changes to observable traits. However, comprehensive chemical screens in diverse biological models are impractical. To address this challenge, we developed ChemProbe, a model that predicts cellular sensitivity to hundreds of molecular probes and drugs by learning to combine transcriptomes and chemical structures. Using ChemProbe, we inferred the chemical sensitivity of cancer cell lines and tumor samples and analyzed how the model makes predictions. We retrospectively evaluated drug response predictions for precision breast cancer treatment and prospectively validated chemical sensitivity predictions in new cellular models, including a genetically modified cell line. Our model interpretation analysis identified transcriptome features reflecting compound targets and protein network modules, and it highlighted genes that drive ferroptosis. ChemProbe is an interpretable in silico screening tool that allows researchers to measure cellular response to diverse compounds, facilitating research into molecular mechanisms of chemical sensitivity.

6.
Commun Biol ; 6(1): 668, 2023 06 24.
Article in English | MEDLINE | ID: mdl-37355729

ABSTRACT

Precise, scalable, and quantitative evaluation of whole slide images is crucial in neuropathology. We release a deep learning model for rapid object detection and precise information on the identification, locality, and counts of cored plaques and cerebral amyloid angiopathy (CAA). We trained this object detector using a repurposed image-tile dataset without any human-drawn bounding boxes. We evaluated the detector on a new manually-annotated dataset of whole slide images (WSIs) from three institutions, four staining procedures, and four human experts. The detector matched the cohort of neuropathology experts, achieving 0.64 (model) vs. 0.64 (cohort) average precision (AP) for cored plaques and 0.75 vs. 0.51 AP for CAAs at a 0.5 IOU threshold. It provided count and locality predictions that approximately correlated with gold-standard human CERAD-like WSI scoring (p = 0.07 ± 0.10). The openly-available model can quickly score WSIs in minutes without a GPU on a standard workstation.


Subject(s)
Amyloidogenic Proteins , Plaque, Amyloid , Humans , Records , Staining and Labeling , Virion
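The AP figures above are computed at a 0.5 IoU (intersection over union) threshold. The matching step can be sketched with two assumptions. First, boxes are axis-aligned `(x1, y1, x2, y2)` tuples. Second, the common greedy rule applies: predictions are taken highest score first, and each ground-truth object can be claimed by at most one of them. The function names are illustrative. This is not the released model's evaluation code.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_detections(preds, truths, threshold=0.5):
    """Greedily mark each (score, box) prediction as a true or false positive.

    Predictions are visited in descending score order. Each one claims the
    unmatched ground-truth box with the highest IoU at or above `threshold`.
    Returns one boolean per prediction in that order, the input to an AP curve.
    """
    matched, flags = set(), []
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best_j, best_iou = None, threshold
        for j, truth in enumerate(truths):
            if j not in matched and iou(box, truth) >= best_iou:
                best_j, best_iou = j, iou(box, truth)
        if best_j is not None:
            matched.add(best_j)
        flags.append(best_j is not None)
    return flags


truths = [(0, 0, 10, 10)]
preds = [(0.9, (1, 1, 11, 11)), (0.8, (0, 0, 10, 10)), (0.5, (20, 20, 30, 30))]
print(match_detections(preds, truths))  # [True, False, False]
```

The second prediction overlaps the object perfectly but still counts as a false positive. The higher-scoring prediction (IoU of about 0.68) already claimed that object, so this one is a duplicate detection.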
7.
bioRxiv ; 2023 Jan 17.
Article in English | MEDLINE | ID: mdl-36711704

ABSTRACT

Precise, scalable, and quantitative evaluation of whole slide images is crucial in neuropathology. We release a deep learning model for rapid object detection and precise information on the identification, locality, and counts of cored plaques and cerebral amyloid angiopathies (CAAs). We trained this object detector using a repurposed image-tile dataset without any human-drawn bounding boxes. We evaluated the detector on a new manually-annotated dataset of whole slide images (WSIs) from three institutions, four staining procedures, and four human experts. The detector matched the cohort of neuropathology experts, achieving 0.64 (model) vs. 0.64 (cohort) average precision (AP) for cored plaques and 0.75 vs. 0.51 AP for CAAs at a 0.5 IOU threshold. It provided count and locality predictions that correlated with gold-standard CERAD-like WSI scoring (p = 0.07 ± 0.10). The openly-available model can quickly score WSIs in minutes without a GPU on a standard workstation.

8.
Nat Mach Intell ; 4(6): 583-595, 2022 Jun.
Article in English | MEDLINE | ID: mdl-36276634

ABSTRACT

In microscopy-based drug screens, fluorescent markers carry critical information on how compounds affect different biological processes. However, practical considerations, such as the labor and preparation formats needed to produce different image channels, hinder the use of certain fluorescent markers. Consequently, completed screens may lack biologically informative but experimentally impractical markers. Here, we present a deep learning method for overcoming these limitations. We accurately generated predicted fluorescent signals from other related markers and validated this new machine learning (ML) method on two biologically distinct datasets. We used the ML method to improve the selection of biologically active compounds for Alzheimer's disease (AD) from a completed high-content high-throughput screen (HCS) that contained only the original markers. The ML method identified novel compounds that effectively blocked tau aggregation and that had been missed by traditional screening approaches unguided by ML. The method improved the triaging efficiency of compound rankings over conventional rankings by raw image channels. We reproduced this ML pipeline on a biologically independent cancer-based dataset, demonstrating its generalizability. The approach is disease-agnostic and applicable across diverse fluorescence microscopy datasets.

9.
J Chem Inf Model ; 62(18): 4300-4318, 2022 09 26.
Article in English | MEDLINE | ID: mdl-36102784

ABSTRACT

Machine learning-based drug discovery success depends on molecular representation. Yet traditional molecular fingerprints omit both the protein and pointers back to structural information that would enable better model interpretability. Therefore, we propose LUNA, a Python 3 toolkit that calculates and encodes protein-ligand interactions into new hashed fingerprints inspired by the Extended Connectivity FingerPrint (ECFP): EIFP (Extended Interaction FingerPrint), FIFP (Functional Interaction FingerPrint), and HIFP (Hybrid Interaction FingerPrint). LUNA also provides visual strategies to make the fingerprints interpretable. We performed three major experiments exploring the fingerprints' use. First, we trained machine learning models to reproduce DOCK3.7 scores using 1 million docked dopamine D4 complexes. We found that EIFP-4,096 (R² = 0.61) outperformed related molecular and interaction fingerprints. Second, we used LUNA to support interpretable machine learning models. Finally, we demonstrate that interaction fingerprints can accurately identify similarities across molecular complexes that other fingerprints overlook. Hence, we envision LUNA and its interface fingerprints as promising methods for machine learning-based virtual screening campaigns. LUNA is freely available at https://github.com/keiserlab/LUNA.


Subject(s)
Dopamine , Proteins , Drug Discovery/methods , Ligands , Machine Learning , Proteins/chemistry
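The folding step behind ECFP-style hashed fingerprints, such as the 4,096-bit EIFP above, can be sketched in two parts. Each interaction identifier is hashed to a bit position, and complexes are then compared by Tanimoto similarity. The identifier strings, the use of SHA-1 as the hash, and the function names are assumptions for illustration. LUNA's actual feature identifiers and hashing scheme are not reproduced here.

```python
import hashlib


def fold_features(features, n_bits=4096):
    """Hash string feature identifiers into the set of 'on' bit indices of an
    n_bits-long fingerprint. Distinct features may collide on the same bit."""
    bits = set()
    for feature in features:
        digest = hashlib.sha1(feature.encode("utf-8")).digest()
        bits.add(int.from_bytes(digest[:4], "little") % n_bits)
    return bits


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical interaction identifiers for two docked complexes.
complex_a = ["hbond:ASP:donor", "pi-stacking:PHE:aromatic", "hydrophobic:LEU:alkyl"]
complex_b = ["hbond:ASP:donor", "hydrophobic:LEU:alkyl", "cation-pi:LYS:aromatic"]
fp_a, fp_b = fold_features(complex_a), fold_features(complex_b)
print(round(tanimoto(fp_a, fp_b), 3))
```

Barring a hash collision, the two complexes share two of four distinct bits. Folding trades exactness for a fixed-length vector, which is what lets standard ML models consume the fingerprint.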
10.
Acta Neuropathol Commun ; 10(1): 66, 2022 04 28.
Article in English | MEDLINE | ID: mdl-35484610

ABSTRACT

Pathologists can label pathologies differently, making it challenging to yield consistent assessments in the absence of one ground truth. To address this problem, we present a deep learning (DL) approach that draws on a cohort of experts, weighs each contribution, and is robust to noisy labels. We collected 100,495 annotations on 20,099 candidate amyloid beta neuropathologies (cerebral amyloid angiopathy (CAA), and cored and diffuse plaques) from three institutions, independently annotated by five experts. DL methods trained on a consensus-of-two strategy yielded 12.6-26% improvements by area under the precision recall curve (AUPRC) when compared to those that learned individualized annotations. This strategy surpassed individual-expert models, even when unfairly assessed on benchmarks favoring them. Moreover, ensembling over individual models was robust to hidden random annotators. In blind prospective tests of 52,555 subsequent expert-annotated images, the models labeled pathologies like their human counterparts (consensus model AUPRC = 0.74 cored; 0.69 CAA). This study demonstrates a means to combine multiple ground truths into a common-ground DL model that yields consistent diagnoses informed by multiple and potentially variable expert opinions.


Subject(s)
Cerebral Amyloid Angiopathy , Deep Learning , Amyloid beta-Peptides , Cerebral Amyloid Angiopathy/diagnosis , Humans , Neuropathology , Prospective Studies
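One reading of the consensus-of-two strategy above is this: an image is labeled positive for a pathology class when at least two of the annotating experts marked it. The sketch below assumes exactly that rule, with each expert's positives stored as a set. The class names come from the abstract. The data layout and the function name `consensus_labels` are hypothetical.

```python
from collections import Counter

CLASSES = ("cored", "diffuse", "CAA")


def consensus_labels(expert_labels, min_votes=2, classes=CLASSES):
    """Combine one image's annotations from several experts.

    `expert_labels` holds one set of positive class names per expert. A class
    is labeled 1 when at least `min_votes` experts marked it, otherwise 0.
    """
    votes = Counter(c for labels in expert_labels for c in set(labels))
    return {c: int(votes[c] >= min_votes) for c in classes}


five_experts = [{"cored"}, {"cored", "CAA"}, set(), {"diffuse"}, {"cored"}]
print(consensus_labels(five_experts))  # {'cored': 1, 'diffuse': 0, 'CAA': 0}
```

Setting `min_votes=1` recovers a permissive "any expert" label. Raising it trades recall for agreement.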
11.
NPJ Digit Med ; 4(1): 10, 2021 Jan 21.
Article in English | MEDLINE | ID: mdl-33479460

ABSTRACT

Artificial intelligence models match or exceed dermatologists in melanoma image classification. Less is known about their robustness against real-world variations, and clinicians may incorrectly assume that a model with an acceptable area under the receiver operating characteristic curve or related performance metric is ready for clinical use. Here, we systematically assessed the performance of dermatologist-level convolutional neural networks (CNNs) on real-world non-curated images by applying computational "stress tests". Our goal was to create a proxy environment in which to comprehensively test the generalizability of off-the-shelf CNNs developed without training or evaluation protocols specific to individual clinics. We found inconsistent predictions on images captured repeatedly in the same setting or subjected to simple transformations (e.g., rotation). Such transformations resulted in false positive or negative predictions for 6.5-22% of skin lesions across test datasets. Our findings indicate that models meeting conventionally reported metrics need further validation with computational stress tests to assess clinic readiness.
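A computational stress test of the kind described above can be sketched as a flip rate: how often a classifier's thresholded call changes under a simple transformation such as a 90° rotation. Here images are nested lists, and `model` is any callable that returns a score. The toy model in the usage example is deliberately not rotation-invariant. It stands in for a real CNN, which this sketch does not reproduce.

```python
def rotate90(image):
    """Rotate a 2-D list-of-lists image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def flip_rate(model, images, transforms, threshold=0.5):
    """Fraction of images whose thresholded prediction changes under any transform."""
    flipped = 0
    for img in images:
        base = model(img) >= threshold
        if any((model(t(img)) >= threshold) != base for t in transforms):
            flipped += 1
    return flipped / len(images)


# Toy "model": scores an image by its top-left pixel, so rotation can change its call.
def toy_model(img):
    return img[0][0]


images = [[[1, 1], [0, 0]], [[1, 1], [1, 1]]]
print(flip_rate(toy_model, images, [rotate90]))  # 0.5
```

The first image's call flips after rotation, while the uniform second image is unaffected. This mirrors the inconsistency the study reports for off-the-shelf CNNs on simple transformations.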

12.
J Chem Inf Model ; 60(12): 5957-5970, 2020 12 28.
Article in English | MEDLINE | ID: mdl-33245237

ABSTRACT

Multitask deep neural networks learn to predict ligand-target binding by example, yet public pharmacological data sets are sparse, imbalanced, and approximate. We constructed two hold-out benchmarks to approximate temporal and drug-screening test scenarios, whose characteristics differ from a random split of conventional training data sets. We developed a pharmacological data set augmentation procedure, Stochastic Negative Addition (SNA), which randomly assigns untested molecule-target pairs as transient negative examples during training. Under the SNA procedure, drug-screening benchmark performance increases from R² = 0.1926 ± 0.0186 to 0.4269 ± 0.0272 (a 122% increase). This gain was accompanied by a modest decrease in temporal benchmark performance (13%). The gains in drug-screening performance from SNA were consistent across classification and regression tasks and outperformed y-randomized controls. Our results highlight where data and feature uncertainty may be problematic and how incorporating uncertainty into training improves predictions of drug-target relationships.


Subject(s)
Machine Learning , Neural Networks, Computer
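The SNA idea above can be sketched as follows: at each epoch, draw a fresh random subset of untested molecule-target pairs and add them as negatives for that epoch only. The data layout, the `negative_value` placeholder, and the per-epoch sample size are assumptions for illustration. They are not the published procedure's settings. A regression setup, for instance, would need a chosen low-activity value rather than 0.0. The last lines check the 122% figure against the reported R² values.

```python
import random


def sna_epoch(known, untested, n_negatives, rng, negative_value=0.0):
    """Build one epoch's training examples under a Stochastic Negative Addition scheme.

    `known` is a list of ((molecule, target), value) measured examples and
    `untested` a list of (molecule, target) pairs with no measurement. A new
    random subset of untested pairs is labeled `negative_value` for this epoch
    only, so no single untested pair is permanently treated as inactive.
    """
    sampled = rng.sample(untested, min(n_negatives, len(untested)))
    return list(known) + [(pair, negative_value) for pair in sampled]


rng = random.Random(0)
known = [(("mol1", "targetA"), 1.0), (("mol2", "targetB"), 1.0)]
untested = [("mol1", "targetB"), ("mol2", "targetA"), ("mol3", "targetA")]
for epoch in range(3):
    batch = sna_epoch(known, untested, n_negatives=2, rng=rng)
    # ... train the model on `batch` here ...

# Relative gain on the drug-screening benchmark reported above.
before, after = 0.1926, 0.4269
print(f"{(after - before) / before:.0%}")  # 122%
```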
13.
J Med Chem ; 63(16): 8705-8722, 2020 08 27.
Article in English | MEDLINE | ID: mdl-32366098

ABSTRACT

The accurate modeling and prediction of small molecule properties and bioactivities depend on the critical choice of molecular representation. Decades of informatics-driven research have relied on expert-designed molecular descriptors to establish quantitative structure-activity and structure-property relationships for drug discovery. Now, advances in deep learning make it possible to efficiently and compactly learn molecular representations directly from data. In this review, we discuss how active research in molecular deep learning can address limitations of current descriptors and fingerprints while creating new opportunities in cheminformatics and virtual screening. We provide a concise overview of the role of representations in cheminformatics and of key concepts in deep learning, and we argue that learning representations provides a way forward to improve the predictive modeling of small molecule bioactivities and properties.


Subject(s)
Chemistry, Pharmaceutical/methods , Deep Learning , Organic Chemicals/chemistry , Cheminformatics , Models, Molecular , Molecular Structure , Quantitative Structure-Activity Relationship
14.
Acta Neuropathol Commun ; 8(1): 59, 2020 04 28.
Article in English | MEDLINE | ID: mdl-32345363

ABSTRACT

Semi-quantitative scoring schemes like the Consortium to Establish a Registry for Alzheimer's Disease (CERAD) are the most commonly used method in Alzheimer's disease (AD) neuropathology practice. Computational approaches based on machine learning have recently generated quantitative scores for whole slide images (WSIs) that are highly correlated with human-derived semi-quantitative scores, such as those of CERAD, for Alzheimer's disease pathology. However, the robustness of such models has yet to be tested in different cohorts. To validate previously published machine learning algorithms using convolutional neural networks (CNNs) and to determine whether pathological heterogeneity may alter algorithm-derived measures, we evaluated 40 cases from the Goizueta Emory Alzheimer's Disease Center brain bank displaying an array of pathological diagnoses (including AD with and without Lewy body disease (LBD) and/or TDP-43-positive inclusions) and levels of Aβ pathologies. Furthermore, to provide deeper phenotyping, we compared amyloid burden in gray matter vs. whole tissue, and quantitative CNN scores for both correlated significantly with CERAD-like scores. Quantitative scores also showed clear stratification between AD pathologies with or without additional diagnoses (including LBD and TDP-43 inclusions) and cases with no significant neurodegeneration (control cases), as well as by NIA Reagan scoring criteria. Specifically, the concomitant diagnosis group of AD + TDP-43 showed significantly greater CNN scores for cored plaques than the AD group. Finally, we report that whole-tissue computational scores correlate better with CERAD-like categories than computational scores from the field of view with the densest pathology, which is the standard of practice in neuropathological assessment per CERAD guidelines.
Together, these findings validate and extend CNN models, show that they are robust to cohort variations, and provide additional proof of concept for future studies to incorporate machine learning algorithms into neuropathological practice.


Subject(s)
Alzheimer Disease/diagnosis , Machine Learning , Neural Networks, Computer , Neurodegenerative Diseases/diagnosis , Alzheimer Disease/pathology , Amyloid beta-Peptides , Humans , Image Interpretation, Computer-Assisted , Lewy Body Disease/diagnosis , Lewy Body Disease/pathology , Neurodegenerative Diseases/pathology , TDP-43 Proteinopathies/diagnosis , TDP-43 Proteinopathies/pathology
15.
J Invest Dermatol ; 140(8): 1504-1512, 2020 08.
Article in English | MEDLINE | ID: mdl-32229141

ABSTRACT

Artificial intelligence is becoming increasingly important in dermatology, with studies reporting accuracy matching or exceeding dermatologists for the diagnosis of skin lesions from clinical and dermoscopic images. However, real-world clinical validation is currently lacking. We review dermatological applications of deep learning, the leading artificial intelligence technology for image analysis, and discuss its current capabilities, potential failure modes, and challenges surrounding performance assessment and interpretability. We address the following three primary applications: (i) teledermatology, including triage for referral to dermatologists; (ii) augmenting clinical assessment during face-to-face visits; and (iii) dermatopathology. We discuss equity and ethical issues related to future clinical adoption and recommend specific standardization of metrics for reporting model performance.


Subject(s)
Deep Learning/ethics , Dermatology/methods , Image Processing, Computer-Assisted/methods , Skin Diseases/diagnosis , Skin/diagnostic imaging , Dermatology/ethics , Humans , Image Processing, Computer-Assisted/ethics , Referral and Consultation , Skin/pathology , Skin Diseases/pathology , Telemedicine/ethics , Telemedicine/methods , Triage/ethics , Triage/methods
16.
Nat Commun ; 10(1): 4078, 2019 09 09.
Article in English | MEDLINE | ID: mdl-31501447

ABSTRACT

Anesthetics are generally associated with sedation, but some anesthetics can also increase brain and motor activity, a phenomenon known as paradoxical excitation. Previous studies have identified GABAA receptors as the primary targets of most anesthetic drugs, but how these compounds produce paradoxical excitation is poorly understood. To identify and understand such compounds, we applied a behavior-based drug profiling approach. Here, we show that a subset of central nervous system depressants cause paradoxical excitation in zebrafish. Using this behavior as a readout, we screened thousands of compounds and identified dozens of hits that caused paradoxical excitation. Many hit compounds modulated human GABAA receptors, while others appeared to modulate different neuronal targets, including the human serotonin-6 receptor. Ligands at these receptors generally decreased neuronal activity, but paradoxically increased activity in the caudal hindbrain. Together, these studies identify ligands, targets, and neurons affecting sedation and paradoxical excitation in vivo in zebrafish.


Subject(s)
Behavior, Animal , Conscious Sedation , Receptors, GABA-A/metabolism , Receptors, Serotonin/metabolism , Zebrafish/metabolism , Animals , Ligands , Neural Inhibition , Neurons/physiology , Serotonin Antagonists/chemistry , Zebrafish Proteins/metabolism
17.
Nat Commun ; 10(1): 2173, 2019 05 15.
Article in English | MEDLINE | ID: mdl-31092819

ABSTRACT

Neuropathologists assess vast brain areas to identify diverse and subtly differentiated morphologies. Standard semi-quantitative scoring approaches, however, are coarse-grained and lack precise neuroanatomic localization. We report a proof-of-concept deep learning pipeline that identifies specific neuropathologies (amyloid plaques and cerebral amyloid angiopathy) in immunohistochemically stained archival slides. Using automated segmentation of stained objects and a cloud-based interface, we annotate > 70,000 plaque candidates from 43 whole slide images (WSIs) to train and evaluate convolutional neural networks. Networks achieve strong plaque classification on a 10-WSI hold-out set (0.993 and 0.743 areas under the receiver operating characteristic and precision recall curve, respectively). Prediction confidence maps visualize morphology distributions at high resolution. Resulting network-derived amyloid beta (Aβ)-burden scores correlate well with established semi-quantitative scores on a 30-WSI blinded hold-out. Finally, saliency mapping demonstrates that networks learn patterns agreeing with accepted pathologic features. This scalable means to augment a neuropathologist's ability suggests a route to neuropathologic deep phenotyping.


Subject(s)
Alzheimer Disease/pathology , Brain/pathology , Deep Learning , Image Processing, Computer-Assisted/methods , Aged , Aged, 80 and over , Cohort Studies , Datasets as Topic , Female , Humans , Male , ROC Curve
18.
Science ; 362(6416)2018 11 16.
Article in English | MEDLINE | ID: mdl-30442776

ABSTRACT

Ahneman et al. (Reports, 13 April 2018) applied machine learning models to predict C-N cross-coupling reaction yields. The models use atomic, electronic, and vibrational descriptors as input features. However, the experimental design is insufficient to distinguish models trained on chemical features from those trained solely on random-valued features in retrospective and prospective test scenarios, thus failing classical controls in machine learning.
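The control this comment calls for can be sketched in three steps. First, replace each distinct input's descriptors with random values of the same shape, so identical inputs keep identical random vectors. Then fit the same model on those random features. Finally, compare held-out error against the real-feature model. If random features score about as well as chemical ones, the performance cannot be credited to the chemistry. The 1-nearest-neighbour regressor, the mean absolute error, and the toy data are simple stand-ins chosen for illustration. They are not the models or data used by Ahneman et al.

```python
import random


def random_features_like(X, seed=0):
    """Return a matrix shaped like X in which each distinct row of X is mapped
    to its own vector of uniform random values (identical rows stay identical)."""
    rng = random.Random(seed)
    lookup, out = {}, []
    for row in X:
        key = tuple(row)
        if key not in lookup:
            lookup[key] = [rng.random() for _ in row]
        out.append(list(lookup[key]))
    return out


def nn_predict(X_train, y_train, X_test):
    """1-nearest-neighbour regression using squared Euclidean distance."""
    preds = []
    for x in X_test:
        dists = [sum((a - b) ** 2 for a, b in zip(x, xt)) for xt in X_train]
        preds.append(y_train[dists.index(min(dists))])
    return preds


def mae(y_true, y_pred):
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


# Toy "descriptors": test inputs reuse inputs already seen in training.
X_train = [[0.1, 1.0], [0.9, 1.0], [0.1, 2.0], [0.9, 2.0]]
y_train = [10, 20, 30, 40]
X_test, y_test = [[0.1, 1.0], [0.9, 2.0]], [11, 39]

real_mae = mae(y_test, nn_predict(X_train, y_train, X_test))
R = random_features_like(X_train + X_test)  # same random vector for repeated inputs
R_train, R_test = R[: len(X_train)], R[len(X_train):]
random_mae = mae(y_test, nn_predict(R_train, y_train, R_test))
print(real_mae, random_mae)  # 1.0 1.0
```

Here the random features do exactly as well as the "chemical" ones. When test inputs share components with training data, the random vectors still act as unique identifiers. That is the confound such a control is meant to expose.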

19.
ACS Chem Biol ; 13(10): 2819-2821, 2018 10 19.
Article in English | MEDLINE | ID: mdl-30336670

ABSTRACT

New machine learning methods to analyze raw chemical and biological data are now widely accessible as open-source toolkits. This positions researchers to leverage powerful, predictive models in their own domains. We caution, however, that the application of machine learning to experimental research merits careful consideration. Machine learning algorithms readily exploit confounding variables and experimental artifacts instead of relevant patterns, leading to overoptimistic performance and poor model generalization. In parallel to the strong control experiments that remain a cornerstone of experimental research, we advance the concept of adversarial controls for scientific machine learning: the design of exacting and purposeful experiments to ensure that predictive performance arises from meaningful models.


Subject(s)
Machine Learning/standards , Models, Theoretical , Logic
20.
Cell ; 174(3): 505-520, 2018 07 26.
Article in English | MEDLINE | ID: mdl-30053424

ABSTRACT

Although gene discovery in neuropsychiatric disorders, including autism spectrum disorder, intellectual disability, epilepsy, schizophrenia, and Tourette disorder, has accelerated, resulting in a large number of molecular clues, it has proven difficult to generate specific hypotheses without the corresponding datasets at the protein complex and functional pathway level. Here, we describe one path forward: an initiative aimed at mapping the physical and genetic interaction networks of these conditions and then using these maps to connect the genomic data to neurobiology and, ultimately, the clinic. These efforts will include a team of geneticists, structural biologists, neurobiologists, systems biologists, and clinicians, leveraging a wide array of experimental approaches and creating a collaborative infrastructure necessary for long-term investigation. This initiative will ultimately intersect with parallel studies that focus on other diseases, as there is a significant overlap with genes implicated in cancer, infectious disease, and congenital heart defects.


Subject(s)
Chromosome Mapping/methods , Neurodevelopmental Disorders/genetics , Systems Biology/methods , Gene Regulatory Networks/genetics , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study/methods , Genomics/methods , Humans , Neurobiology/methods , Neuropsychiatry