Search | VHL Regional Portal

1.

Node-degree aware edge sampling mitigates inflated classification performance in biomedical random walk-based graph representation learning.

Cappelletti, Luca; Rekerle, Lauren; Fontana, Tommaso; Hansen, Peter; Casiraghi, Elena; Ravanmehr, Vida; Mungall, Christopher J; Yang, Jeremy J; Spranger, Leonard; Karlebach, Guy; Caufield, J Harry; Carmody, Leigh; Coleman, Ben; Oprea, Tudor I; Reese, Justin; Valentini, Giorgio; Robinson, Peter N.

Bioinform Adv ; 4(1): vbae036, 2024.

Article in English | MEDLINE | ID: mdl-38577542

ABSTRACT

Motivation: Graph representation learning is a family of related approaches that learn low-dimensional vector representations of nodes and other graph elements called embeddings. Embeddings approximate characteristics of the graph and can be used for a variety of machine-learning tasks such as novel edge prediction. For many biomedical applications, partial knowledge exists about positive edges that represent relationships between pairs of entities, but little to no knowledge is available about negative edges that represent the explicit lack of a relationship between two nodes. For this reason, classification procedures are forced to assume that the vast majority of unlabeled edges are negative. Existing approaches to sampling negative edges for training and evaluating classifiers do so by uniformly sampling pairs of nodes. Results: We show here that this sampling strategy typically leads to sets of positive and negative examples with imbalanced node degree distributions. Using representative heterogeneous biomedical knowledge graph and random walk-based graph machine learning, we show that this strategy substantially impacts classification performance. If users of graph machine-learning models apply the models to prioritize examples that are drawn from approximately the same distribution as the positive examples are, then performance of models as estimated in the validation phase may be artificially inflated. We present a degree-aware node sampling approach that mitigates this effect and is simple to implement. Availability and implementation: Our code and data are publicly available at https://github.com/monarch-initiative/negativeExampleSelection.
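The degree-aware idea in the abstract above can be illustrated with a short Python sketch. This is not the authors' implementation (their code is in the repository linked above). The function name `sample_negative_edges`, the undirected-graph treatment, and the rejection loop are assumptions made here for illustration. The sketch draws both endpoints of each candidate negative edge from the endpoint list of the positive edges, so a node is chosen with probability proportional to its degree rather than uniformly.

```python
import random


def sample_negative_edges(positive_edges, n_samples, seed=0, max_tries=100_000):
    """Sample non-edges whose endpoints follow the degree distribution
    of the positive edges (hypothetical helper, undirected graph assumed)."""
    rng = random.Random(seed)
    known = {frozenset(edge) for edge in positive_edges}
    # Each node appears once per incident edge, so a uniform draw from this
    # list selects a node with probability proportional to its degree.
    endpoints = [node for edge in positive_edges for node in edge]
    negatives = set()
    tries = 0
    while len(negatives) < n_samples and tries < max_tries:
        tries += 1
        u, v = rng.choice(endpoints), rng.choice(endpoints)
        pair = frozenset((u, v))
        # Reject self-loops, known positives, and duplicates.
        if u == v or pair in known or pair in negatives:
            continue
        negatives.add(pair)
    return [tuple(sorted(pair)) for pair in negatives]


# Toy usage: node "a" has the highest degree, so it is drawn most often.
toy_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("e", "f")]
toy_negatives = sample_negative_edges(toy_edges, n_samples=4, seed=1)
```

Uniform sampling would instead draw `u` and `v` from the set of distinct nodes, which tends to favor low-degree pairs in graphs with skewed degree distributions.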

2.

Computational drug repositioning identifies niclosamide and tribromsalan as inhibitors of Mycobacterium tuberculosis and Mycobacterium abscessus.

Yang, Jeremy J; Goff, Aaron; Wild, David J; Ding, Ying; Annis, Ayano; Kerber, Randy; Foote, Brian; Passi, Anurag; Duerksen, Joel L; London, Shelley; Puhl, Ana C; Lane, Thomas R; Braunstein, Miriam; Waddell, Simon J; Ekins, Sean.

Tuberculosis (Edinb) ; 146: 102500, 2024 May.

Article in English | MEDLINE | ID: mdl-38432118

ABSTRACT

Tuberculosis (TB) is still a major global health challenge, killing over 1.5 million people each year, and hence, there is a need to identify and develop novel treatments for Mycobacterium tuberculosis (M. tuberculosis). The prevalence of infections caused by nontuberculous mycobacteria (NTM) is also increasing and has overtaken TB cases in the United States and much of the developed world. Mycobacterium abscessus (M. abscessus) is one of the most frequently encountered NTM and is difficult to treat. We describe the use of drug-disease association using a semantic knowledge graph approach combined with machine learning models that has enabled the identification of several molecules for testing anti-mycobacterial activity. We established that niclosamide (M. tuberculosis IC90 2.95 µM; M. abscessus IC90 59.1 µM) and tribromsalan (M. tuberculosis IC90 76.92 µM; M. abscessus IC90 147.4 µM) inhibit M. tuberculosis and M. abscessus in vitro. To investigate the mode of action, we determined the transcriptional response of M. tuberculosis and M. abscessus to both compounds in axenic log phase, demonstrating a broad effect on gene expression that differed from known M. tuberculosis inhibitors. Both compounds elicited transcriptional responses indicative of respiratory pathway stress and the dysregulation of fatty acid metabolism.

Subject(s)

Mycobacterium Infections, Nontuberculous , Mycobacterium abscessus , Mycobacterium tuberculosis , Salicylanilides , Tuberculosis , Humans , Mycobacterium tuberculosis/genetics , Mycobacterium Infections, Nontuberculous/microbiology , Niclosamide/pharmacology , Drug Repositioning , Nontuberculous Mycobacteria/genetics , Tuberculosis/drug therapy , Tuberculosis/microbiology

3.

User Centered Rare Disease Clinical Trial Knowledge Graph (RCTKG).

Yang, Jeremy Parker; Leadman, Devon; Ballew, Richard M; Sid, Eric; Xu, Yanji; Mathé, Ewy A; Zhu, Qian.

Stud Health Technol Inform ; 310: 94-98, 2024 Jan 25.

Article in English | MEDLINE | ID: mdl-38269772

ABSTRACT

Drug development in rare diseases is challenging due to the limited availability of subjects with the diseases and recruiting from a small patient population. The high cost and low success rate of clinical trials motivate deliberate analysis of existing clinical trials to understand status of clinical development of orphan drugs and discover new insight for new trial. In this project, we aim to develop a user centered Rare disease based Clinical Trial Knowledge Graph (RCTKG) to integrate publicly available clinical trial data with rare diseases from the Genetic and Rare Disease (GARD) program in a semantic and standardized form for public use. To better serve and represent the interests of rare disease users, user stories were defined for three types of users, patients, healthcare providers and informaticians, to guide the RCTKG design in supporting the GARD program at NCATS/NIH and the broad clinical/research community in rare diseases.

Subject(s)

Pattern Recognition, Automated , Rare Diseases , Humans , Rare Diseases/drug therapy , Rare Diseases/genetics , Health Personnel , Knowledge

4.

Overview of the Knowledge Management Center for Illuminating the Druggable Genome.

Oprea, Tudor I; Bologa, Cristian; Holmes, Jayme; Mathias, Stephen; Metzger, Vincent T; Waller, Anna; Yang, Jeremy J; Leach, Andrew R; Jensen, Lars Juhl; Kelleher, Keith J; Sheils, Timothy K; Mathé, Ewy; Avram, Sorin; Edwards, Jeremy S.

Drug Discov Today ; 29(3): 103882, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38218214

ABSTRACT

The Knowledge Management Center (KMC) for the Illuminating the Druggable Genome (IDG) project aims to aggregate, update, and articulate protein-centric data knowledge for the entire human proteome, with emphasis on the understudied proteins from the three IDG protein families. KMC collates and analyzes data from over 70 resources to compile the Target Central Resource Database (TCRD), which is the web-based informatics platform (Pharos). These data include experimental, computational, and text-mined information on protein structures, compound interactions, and disease and phenotype associations. Based on this knowledge, proteins are classified into different Target Development Levels (TDLs) for identification of understudied targets. Additional work by the KMC focuses on enriching target knowledge and producing DrugCentral and other data visualization tools for expanding investigation of understudied targets.

Subject(s)

Genome , Knowledge Management , Humans , Proteome , Databases, Factual , Informatics

5.

Concurrent Pediatric Lingual and Submental Dermoid Cysts: Case Report and Literature Review.

Gleichmann, Natasha; Creighton, Elizabeth; Zhu, Austin; Willard, Nicholas; Yang, Jeremy; Herrmann, Brian W.

Cureus ; 15(7): e42429, 2023 Jul.

Article in English | MEDLINE | ID: mdl-37637563

ABSTRACT

This pediatric case report describes the novel finding of concurrent submental and lingual dermoid cysts, which to our knowledge, has not been previously reported in the literature. The etiology of cysts involving the tongue, floor of the mouth, and submental neck is varied, representing congenital, inflammatory, and neoplastic sources. Dermoid cysts involving these regions are uncommon and are most frequently reported in the submental, sublingual, and lingual spaces. Presenting symptoms vary with cyst size and position relative to the mylohyoid muscle. MRI is the preferred modality to differentiate dermoid cysts from other etiologies. While interventional techniques have been utilized to treat dermoid cysts in other head and neck locations, surgical excision remains the preferred treatment for those involving oral and floor-of-mouth structures.

6.

Toxicology knowledge graph for structural birth defects.

Evangelista, John Erol; Clarke, Daniel J B; Xie, Zhuorui; Marino, Giacomo B; Utti, Vivian; Jenkins, Sherry L; Ahooyi, Taha Mohseni; Bologa, Cristian G; Yang, Jeremy J; Binder, Jessica L; Kumar, Praveen; Lambert, Christophe G; Grethe, Jeffrey S; Wenger, Eric; Taylor, Deanne; Oprea, Tudor I; de Bono, Bernard; Ma'ayan, Avi.

Commun Med (Lond) ; 3(1): 98, 2023 Jul 17.

Article in English | MEDLINE | ID: mdl-37460679

ABSTRACT

BACKGROUND: Birth defects are functional and structural abnormalities that impact about 1 in 33 births in the United States. They have been attributed to genetic and other factors such as drugs, cosmetics, food, and environmental pollutants during pregnancy, but for most birth defects there are no known causes. METHODS: To further characterize associations between small molecule compounds and their potential to induce specific birth abnormalities, we gathered knowledge from multiple sources to construct a reproductive toxicity Knowledge Graph (ReproTox-KG) with a focus on associations between birth defects, drugs, and genes. Specifically, we gathered data from drug/birth-defect associations from co-mentions in published abstracts, gene/birth-defect associations from genetic studies, drug- and preclinical-compound-induced gene expression changes in cell lines, known drug targets, genetic burden scores for human genes, and placental crossing scores for small molecules. RESULTS: Using ReproTox-KG and semi-supervised learning (SSL), we scored >30,000 preclinical small molecules for their potential to cross the placenta and induce birth defects, and identified >500 birth-defect/gene/drug cliques that can be used to explain molecular mechanisms for drug-induced birth defects. The ReproTox-KG can be accessed via a web-based user interface available at https://maayanlab.cloud/reprotox-kg . This site enables users to explore the associations between birth defects, approved and preclinical drugs, and all human genes. CONCLUSIONS: ReproTox-KG provides a resource for exploring knowledge about the molecular mechanisms of birth defects with the potential of predicting the likelihood of genes and preclinical small molecules to induce birth defects.

While birth defects are common, for most birth defects there are no known causes. During pregnancy, developing babies are exposed to drugs, cosmetics, food, and environmental pollutants that may cause birth defects. However, exactly how these environmental factors are involved in producing birth defects is difficult to discern. Also, birth defects can be a consequence of the genes inherited from the parents. We combined general data about human genes and drugs with specific data previously implicating genes and drugs in inducing birth defects to create a knowledge graph representation that connects genes, drugs, and birth defects. This knowledge graph can be used to explore new links that may explain why birth defects occur, particularly those that result from a combination of inherited and environmental influences.

7.

Pharos 2023: an integrated resource for the understudied human proteome.

Kelleher, Keith J; Sheils, Timothy K; Mathias, Stephen L; Yang, Jeremy J; Metzger, Vincent T; Siramshetty, Vishal B; Nguyen, Dac-Trung; Jensen, Lars Juhl; Vidovic, Dusica; Schürer, Stephan C; Holmes, Jayme; Sharma, Karlie R; Pillai, Ajay; Bologa, Cristian G; Edwards, Jeremy S; Mathé, Ewy A; Oprea, Tudor I.

Nucleic Acids Res ; 51(D1): D1405-D1416, 2023 01 06.

Article in English | MEDLINE | ID: mdl-36624666

ABSTRACT

The Illuminating the Druggable Genome (IDG) project aims to improve our understanding of understudied proteins and our ability to study them in the context of disease biology by perturbing them with small molecules, biologics, or other therapeutic modalities. Two main products from the IDG effort are the Target Central Resource Database (TCRD) (http://juniper.health.unm.edu/tcrd/), which curates and aggregates information, and Pharos (https://pharos.nih.gov/), a web interface for users to extract and visualize data from TCRD. Since the 2021 release, TCRD/Pharos has focused on developing visualization and analysis tools that help reveal higher-level patterns in the underlying data. The current iterations of TCRD and Pharos enable users to perform enrichment calculations based on subsets of targets, diseases, or ligands and to create interactive heat maps and UpSet charts of many types of annotations. Using several examples, we show how to address disease biology and drug discovery questions through enrichment calculations and UpSet charts.

Subject(s)

Databases, Factual , Molecular Targeted Therapy , Proteome , Humans , Biological Products , Drug Discovery , Internet , Proteome/drug effects

8.

DrugCentral 2023 extends human clinical data and integrates veterinary drugs.

Avram, Sorin; Wilson, Thomas B; Curpan, Ramona; Halip, Liliana; Borota, Ana; Bora, Alina; Bologa, Cristian G; Holmes, Jayme; Knockel, Jeffrey; Yang, Jeremy J; Oprea, Tudor I.

Nucleic Acids Res ; 51(D1): D1276-D1287, 2023 01 06.

Article in English | MEDLINE | ID: mdl-36484092

ABSTRACT

DrugCentral monitors new drug approvals and standardizes drug information. The current update contains 285 drugs (131 for human use). New additions include: (i) the integration of veterinary drugs (154 for animal use only), (ii) the addition of 66 documented off-label uses and (iii) the identification of adverse drug events from pharmacovigilance data for pediatric and geriatric patients. Additional enhancements include chemical substructure searching using SMILES and 'Target Cards' based on UniProt accession codes. Statistics of interest include the following: (i) 60% of the covered drugs are on-market drugs with expired patent and exclusivity coverage, 17% are off-market, and 23% are on-market drugs with active patents and exclusivity coverage; (ii) 59% of the drugs are oral, 33% are parenteral and 18% topical, at the level of the active ingredients; (iii) only 3% of all drugs are for animal use only; however, 61% of the veterinary drugs are also approved for human use; (iv) dogs, cats and horses are by far the most represented target species for veterinary drugs; (v) the physicochemical property profile of animal drugs is very similar to that of human drugs. Use cases include azaperone, the only sedative approved for swine, and ruxolitinib, a Janus kinase inhibitor.

Subject(s)

Drug Approval , Drug-Related Side Effects and Adverse Reactions , Veterinary Drugs , Animals , Humans , Drug-Related Side Effects and Adverse Reactions/veterinary , Veterinary Drugs/administration & dosage , Veterinary Drugs/adverse effects , Off-Label Use/veterinary

9.

Machine learning prediction and tau-based screening identifies potential Alzheimer's disease genes relevant to immunity.

Binder, Jessica; Ursu, Oleg; Bologa, Cristian; Jiang, Shanya; Maphis, Nicole; Dadras, Somayeh; Chisholm, Devon; Weick, Jason; Myers, Orrin; Kumar, Praveen; Yang, Jeremy J; Bhaskar, Kiran; Oprea, Tudor I.

Commun Biol ; 5(1): 125, 2022 02 11.

Article in English | MEDLINE | ID: mdl-35149761

ABSTRACT

With increased research funding for Alzheimer's disease (AD) and related disorders across the globe, large amounts of data are being generated. Several studies employed machine learning methods to understand the ever-growing omics data to enhance early diagnosis, map complex disease networks, or uncover potential drug targets. We describe results based on a Target Central Resource Database protein knowledge graph and evidence paths transformed into vectors by metapath matching. We extracted features between specific genes and diseases, then trained and optimized our model using XGBoost, termed MPxgb(AD). To determine our MPxgb(AD) prediction performance, we examined the top twenty predicted genes through an experimental screening pipeline. Our analysis identified potential AD risk genes: FRRS1, CTRAM, SCGB3A1, FAM92B/CIBAR2, and TMEFF2. FRRS1 and FAM92B are considered dark genes, while CTRAM, SCGB3A1, and TMEFF2 are connected to TREM2-TYROBP, IL-1β-TNFα, and MTOR-APP AD-risk nodes, suggesting relevance to the pathogenesis of AD.

Subject(s)

Alzheimer Disease , Alzheimer Disease/diagnosis , Alzheimer Disease/genetics , Alzheimer Disease/metabolism , Early Diagnosis , Humans , Machine Learning , Membrane Proteins/metabolism , Neoplasm Proteins

10.

Knowledge graph analytics platform with LINCS and IDG for Parkinson's disease target illumination.

Yang, Jeremy J; Gessner, Christopher R; Duerksen, Joel L; Biber, Daniel; Binder, Jessica L; Ozturk, Murat; Foote, Brian; McEntire, Robin; Stirling, Kyle; Ding, Ying; Wild, David J.

BMC Bioinformatics ; 23(1): 37, 2022 Jan 12.

Article in English | MEDLINE | ID: mdl-35021991

ABSTRACT

BACKGROUND: LINCS, "Library of Integrated Network-based Cellular Signatures", and IDG, "Illuminating the Druggable Genome", are both NIH projects and consortia that have generated rich datasets for the study of the molecular basis of human health and disease. LINCS L1000 expression signatures provide unbiased systems/omics experimental evidence. IDG provides compiled and curated knowledge for illumination and prioritization of novel drug target hypotheses. Together, these resources can support a powerful new approach to identifying novel drug targets for complex diseases, such as Parkinson's disease (PD), which continues to inflict severe harm on human health, and resist traditional research approaches. RESULTS: Integrating LINCS and IDG, we built the Knowledge Graph Analytics Platform (KGAP) to support an important use case: identification and prioritization of drug target hypotheses for associated diseases. The KGAP approach includes strong semantics interpretable by domain scientists and a robust, high performance implementation of a graph database and related analytical methods. Illustrating the value of our approach, we investigated results from queries relevant to PD. Approved PD drug indications from IDG's resource DrugCentral were used as starting points for evidence paths exploring chemogenomic space via LINCS expression signatures for associated genes, evaluated as target hypotheses by integration with IDG. The KG-analytic scoring function was validated against a gold standard dataset of genes associated with PD as elucidated, published mechanism-of-action drug targets, also from DrugCentral. IDG's resource TIN-X was used to rank and filter KGAP results for novel PD targets, and one, SYNGR3 (Synaptogyrin-3), was manually investigated further as a case study and plausible new drug target for PD. 
CONCLUSIONS: The synergy of LINCS and IDG, via KG methods, empowers graph analytics methods for the investigation of the molecular basis of complex diseases, and specifically for identification and prioritization of novel drug targets. The KGAP approach enables downstream applications via integration with resources similarly aligned with modern KG methodology. The generality of the approach indicates that KGAP is applicable to many disease areas, in addition to PD, the focus of this paper.

Subject(s)

Parkinson Disease , Gene Library , Genome , Humans , Lighting , Parkinson Disease/drug therapy , Parkinson Disease/genetics , Pattern Recognition, Automated

11.

Getting Started with the IDG KMC Datasets and Tools.

Kropiwnicki, Eryk; Binder, Jessica L; Yang, Jeremy J; Holmes, Jayme; Lachmann, Alexander; Clarke, Daniel J B; Sheils, Timothy; Kelleher, Keith J; Metzger, Vincent T; Bologa, Cristian G; Oprea, Tudor I; Ma'ayan, Avi.

Curr Protoc ; 2(1): e355, 2022 Jan.

Article in English | MEDLINE | ID: mdl-35085427

ABSTRACT

The Illuminating the Druggable Genome (IDG) consortium is a National Institutes of Health (NIH) Common Fund program designed to enhance our knowledge of under-studied proteins, more specifically, proteins unannotated within the three most commonly drug-targeted protein families: G-protein coupled receptors, ion channels, and protein kinases. Since 2014, the IDG Knowledge Management Center (IDG-KMC) has generated several open-access datasets and resources that jointly serve as a highly translational machine-learning-ready knowledgebase focused on human protein-coding genes and their products. The goal of the IDG-KMC is to develop comprehensive integrated knowledge for the druggable genome to illuminate the uncharacterized or poorly annotated portion of the druggable genome. The tools derived from the IDG-KMC provide either user-friendly visualizations or ways to impute the knowledge about potential targets using machine learning strategies. In the following protocols, we describe how to use each web-based tool to accelerate illumination in under-studied proteins. © 2022 The Authors. Current Protocols published by Wiley Periodicals LLC. 
Basic Protocol 1: Interacting with the Pharos user interface
Basic Protocol 2: Accessing the data in Harmonizome
Basic Protocol 3: The ARCHS4 resource
Basic Protocol 4: Making predictions about gene function with PrismExp
Basic Protocol 5: Using Geneshot to illuminate knowledge about under-studied targets
Basic Protocol 6: Exploring under-studied targets with TIN-X
Basic Protocol 7: Interacting with the DrugCentral user interface
Basic Protocol 8: Estimating Anti-SARS-CoV-2 activities with DrugCentral REDIAL-2020
Basic Protocol 9: Drug Set Enrichment Analysis using Drugmonizome
Basic Protocol 10: The Drugmonizome-ML Appyter
Basic Protocol 11: The Harmonizome-ML Appyter
Basic Protocol 12: GWAS target illumination with TIGA
Basic Protocol 13: Prioritizing kinases for lists of proteins and phosphoproteins with KEA3
Basic Protocol 14: Converting PubMed searches to drug sets with the DrugShot Appyter.

Subject(s)

Databases, Genetic , Genome , COVID-19 , Humans , Machine Learning , Proteins , SARS-CoV-2

12.

TIGA: target illumination GWAS analytics.

Yang, Jeremy J; Grissa, Dhouha; Lambert, Christophe G; Bologa, Cristian G; Mathias, Stephen L; Waller, Anna; Wild, David J; Jensen, Lars Juhl; Oprea, Tudor I.

Bioinformatics ; 37(21): 3865-3873, 2021 11 05.

Article in English | MEDLINE | ID: mdl-34086846

ABSTRACT

MOTIVATION: Genome-wide association studies can reveal important genotype-phenotype associations; however, data quality and interpretability issues must be addressed. For drug discovery scientists seeking to prioritize targets based on the available evidence, these issues go beyond the single study. RESULTS: Here, we describe rational ranking, filtering and interpretation of inferred gene-trait associations and data aggregation across studies by leveraging existing curation and harmonization efforts. Each gene-trait association is evaluated for confidence, with scores derived solely from aggregated statistics, linking a protein-coding gene and phenotype. We propose a method for assessing confidence in gene-trait associations from evidence aggregated across studies, including a bibliometric assessment of scientific consensus based on the iCite relative citation ratio, and meanRank scores, to aggregate multivariate evidence. This method, intended for drug target hypothesis generation, scoring and ranking, has been implemented as an analytical pipeline, available as open source, with public datasets of results, and a web application designed for usability by drug discovery scientists. AVAILABILITY AND IMPLEMENTATION: Web application, datasets and source code via https://unmtid-shinyapps.net/tiga/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
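The idea of aggregating multivariate evidence by mean rank, mentioned in the abstract above, can be sketched in a few lines of Python. This is an illustrative sketch only, not the TIGA pipeline. The variable names (`n_snp`, `neg_log_p`, `rcr`), the choice that larger values are better, and the average-rank tie handling are all assumptions. Each evidence variable is ranked separately across gene-trait associations, and the ranks are then averaged, so a lower mean rank indicates stronger combined evidence.

```python
def ranks_desc(values):
    """Rank values so the largest gets rank 1; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mean_rank(rows, variables):
    """Average the per-variable ranks of each row (lower is better)."""
    per_variable = [ranks_desc([row[v] for row in rows]) for v in variables]
    return [
        sum(r[i] for r in per_variable) / len(variables)
        for i in range(len(rows))
    ]


# Hypothetical gene-trait associations with three evidence variables.
associations = [
    {"gene": "G1", "n_snp": 5, "neg_log_p": 9.0, "rcr": 2.0},
    {"gene": "G2", "n_snp": 2, "neg_log_p": 12.0, "rcr": 1.0},
    {"gene": "G3", "n_snp": 1, "neg_log_p": 8.0, "rcr": 0.5},
]
scores = mean_rank(associations, ["n_snp", "neg_log_p", "rcr"])
```

Rank-based aggregation of this kind avoids having to put differently scaled variables on a common numeric scale before combining them.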

Subject(s)

Genome-Wide Association Study , Lighting , Genotype , Polymorphism, Single Nucleotide , Phenotype

13.

Analyzing knowledge entities about COVID-19 using entitymetrics.

Yu, Qi; Wang, Qi; Zhang, Yafei; Chen, Chongyan; Ryu, Hyeyoung; Park, Namu; Baek, Jae-Eun; Li, Keyuan; Wu, Yifei; Li, Daifeng; Xu, Jian; Liu, Meijun; Yang, Jeremy J; Zhang, Chenwei; Lu, Chao; Zhang, Peng; Li, Xin; Chen, Baitong; Ebeid, Islam Akef; Fensel, Julia; Min, Chao; Zhai, Yujia; Song, Min; Ding, Ying; Bu, Yi.

Scientometrics ; 126(5): 4491-4509, 2021.

Article in English | MEDLINE | ID: mdl-33746309

ABSTRACT

COVID-19 cases have surpassed the 109+ million mark, with deaths tallying up to 2.4 million. Tens of thousands of papers regarding COVID-19 have been published along with countless bibliometric analyses done on COVID-19 literature. Despite this, none of the analyses have focused on domain entities occurring in scientific publications. However, analysis of these bio-entities and the relations among them, a strategy called entity metrics, could offer more insights into knowledge usage and diffusion in specific cases. Thus, this paper presents an entitymetric analysis on COVID-19 literature. We construct an entity-entity co-occurrence network and employ network indicators to analyze the extracted entities. We find that ACE-2 and C-reactive protein are two very important genes and that lopinavir and ritonavir are two very important chemicals, regardless of the results from either ranking.
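An entity-entity co-occurrence network of the kind described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' pipeline. The toy documents, the function names, and the use of weighted degree as the network indicator are assumptions. Two entities are linked whenever they appear in the same document, and the edge weight counts how many documents they share.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(docs):
    """Count, for every unordered entity pair, the documents containing both."""
    edges = Counter()
    for entities in docs:
        # Deduplicate within a document and sort so each pair has one key.
        for a, b in combinations(sorted(set(entities)), 2):
            edges[(a, b)] += 1
    return edges


def weighted_degree(edges):
    """Sum the co-occurrence weights of all edges touching each entity."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree


# Hypothetical entity lists extracted from three documents.
docs = [
    ["ACE2", "lopinavir", "ritonavir"],
    ["lopinavir", "ritonavir"],
    ["ACE2", "CRP"],
]
edges = cooccurrence_edges(docs)
degree = weighted_degree(edges)
```

Other indicators (for example betweenness or PageRank) could be computed on the same edge list; weighted degree is used here only because it needs no external library.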

14.

TCRD and Pharos 2021: mining the human proteome for disease biology.

Sheils, Timothy K; Mathias, Stephen L; Kelleher, Keith J; Siramshetty, Vishal B; Nguyen, Dac-Trung; Bologa, Cristian G; Jensen, Lars Juhl; Vidovic, Dusica; Koleti, Amar; Schürer, Stephan C; Waller, Anna; Yang, Jeremy J; Holmes, Jayme; Bocci, Giovanni; Southall, Noel; Dharkar, Poorva; Mathé, Ewy; Simeonov, Anton; Oprea, Tudor I.

Nucleic Acids Res ; 49(D1): D1334-D1346, 2021 01 08.

Article in English | MEDLINE | ID: mdl-33156327

ABSTRACT

In 2014, the National Institutes of Health (NIH) initiated the Illuminating the Druggable Genome (IDG) program to identify and improve our understanding of poorly characterized proteins that can potentially be modulated using small molecules or biologics. Two resources produced from these efforts are: The Target Central Resource Database (TCRD) (http://juniper.health.unm.edu/tcrd/) and Pharos (https://pharos.nih.gov/), a web interface to browse the TCRD. The ultimate goal of these resources is to highlight and facilitate research into currently understudied proteins, by aggregating a multitude of data sources, and ranking targets based on the amount of data available, and presenting data in machine learning ready format. Since the 2017 release, both TCRD and Pharos have produced two major releases, which have incorporated or expanded an additional 25 data sources. Recently incorporated data types include human and viral-human protein-protein interactions, protein-disease and protein-phenotype associations, and drug-induced gene signatures, among others. These aggregated data have enabled us to generate new visualizations and content sections in Pharos, in order to empower users to find new areas of study in the druggable genome.

Subject(s)

Databases, Factual , Genome, Human , Neurodegenerative Diseases/genetics , Proteomics/methods , Software , Virus Diseases/genetics , Animals , Anticonvulsants/chemistry , Anticonvulsants/therapeutic use , Antiviral Agents/chemistry , Antiviral Agents/therapeutic use , Biological Products/chemistry , Biological Products/therapeutic use , Data Mining/statistics & numerical data , Host-Pathogen Interactions/drug effects , Host-Pathogen Interactions/genetics , Humans , Internet , Machine Learning/statistics & numerical data , Mice , Mice, Knockout , Molecular Targeted Therapy/methods , Neurodegenerative Diseases/classification , Neurodegenerative Diseases/drug therapy , Neurodegenerative Diseases/virology , Protein Interaction Mapping , Proteome/agonists , Proteome/antagonists & inhibitors , Proteome/genetics , Proteome/metabolism , Small Molecule Libraries/chemistry , Small Molecule Libraries/therapeutic use , Virus Diseases/classification , Virus Diseases/drug therapy , Virus Diseases/virology

15.

DrugCentral 2021 supports drug discovery and repositioning.

Avram, Sorin; Bologa, Cristian G; Holmes, Jayme; Bocci, Giovanni; Wilson, Thomas B; Nguyen, Dac-Trung; Curpan, Ramona; Halip, Liliana; Bora, Alina; Yang, Jeremy J; Knockel, Jeffrey; Sirimulla, Suman; Ursu, Oleg; Oprea, Tudor I.

Nucleic Acids Res ; 49(D1): D1160-D1169, 2021 01 08.

Article in English | MEDLINE | ID: mdl-33151287

ABSTRACT

DrugCentral is a public resource (http://drugcentral.org) that serves the scientific community by providing up-to-date drug information, as described in previous papers. The current release includes 109 newly approved (October 2018 through March 2020) active pharmaceutical ingredients in the US, Europe, Japan and other countries; and two molecular entities (e.g. mefuparib) of interest for COVID19. New additions include a set of pharmacokinetic properties for ~1000 drugs, and a sex-based separation of side effects, processed from FAERS (FDA Adverse Event Reporting System); as well as a drug repositioning prioritization scheme based on the market availability and intellectual property rights for FDA approved drugs. In the context of the COVID19 pandemic, we also incorporated REDIAL-2020, a machine learning platform that estimates anti-SARS-CoV-2 activities, as well as the 'drugs in news' feature, which offers a brief enumeration of the most interesting drugs at the present moment. The full database dump and data files are available for download from the DrugCentral web portal.

Subject(s)

Antiviral Agents/therapeutic use , COVID-19 Drug Treatment , Databases, Pharmaceutical/statistics & numerical data , Drug Approval/statistics & numerical data , Drug Discovery/statistics & numerical data , Drug Repositioning/statistics & numerical data , SARS-CoV-2/drug effects , Antiviral Agents/adverse effects , Antiviral Agents/pharmacokinetics , COVID-19/epidemiology , COVID-19/virology , Drug Approval/methods , Drug Discovery/methods , Drug Repositioning/methods , Epidemics , Europe , Humans , Information Storage and Retrieval/methods , Internet , Japan , SARS-CoV-2/physiology , United States

16.

REDIAL-2020: A suite of machine learning models to estimate Anti-SARS-CoV-2 activities.

Govinda, K C; Bocci, Giovanni; Verma, Srijan; Hassan, Mahmudulla; Holmes, Jayme; Yang, Jeremy J; Sirimulla, Suman; Oprea, Tudor I.

ChemRxiv ; 2020 Sep 16.

Article in English | MEDLINE | ID: mdl-33200119

ABSTRACT

Strategies for drug discovery and repositioning are an urgent need with respect to COVID-19. We developed "REDIAL-2020", a suite of machine learning models for estimating small molecule activity from molecular structure, for a range of SARS-CoV-2 related assays. Each classifier is based on three distinct types of descriptors (fingerprint, physicochemical, and pharmacophore) for parallel model development. These models were trained using high throughput screening data from the NCATS COVID19 portal (https://opendata.ncats.nih.gov/covid19/index.html), with multiple categorical machine learning algorithms. The "best models" are combined in an ensemble consensus predictor that outperforms single models where external validation is available. This suite of machine learning models is available through the DrugCentral web portal (http://drugcentral.org/Redial). Acceptable input formats are: drug name, PubChem CID, or SMILES; the output is an estimate of anti-SARS-CoV-2 activities. The web application reports estimated activity across three areas (viral entry, viral replication, and live virus infectivity) spanning six independent models, followed by a similarity search that displays the most similar molecules to the query among experimentally determined data. The ML models have 60% to 74% external predictivity, based on three separate datasets. Complementing the NCATS COVID19 portal, REDIAL-2020 can serve as a rapid online tool for identifying active molecules for COVID-19 treatment. The source code and specific models are available through Github (https://github.com/sirimullalab/redial-2020), or via Docker Hub (https://hub.docker.com/r/sirimullalab/redial-2020) for users preferring a containerized version.
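The abstract above says the best models are combined in an ensemble consensus predictor but does not state the combination rule. The Python sketch below assumes a simple majority vote over binary active/inactive calls from three descriptor-specific models; the function name and dictionary keys are hypothetical, and the actual REDIAL-2020 rule may differ (see the linked source code).

```python
def consensus_vote(predictions):
    """Return 1 (active) when more than half of the models predict active.

    `predictions` maps a model name to a binary call (1 = active, 0 = inactive).
    Majority voting is an assumption, not the documented REDIAL-2020 rule.
    """
    votes = list(predictions.values())
    return int(sum(votes) * 2 > len(votes))


# Hypothetical calls from the three descriptor types named in the abstract.
calls = {"fingerprint": 1, "physicochemical": 0, "pharmacophore": 1}
label = consensus_vote(calls)
```

With an odd number of models a strict majority always exists, which is one practical reason to combine three descriptor types this way.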

17.

Interdependence and the cost of uncoordinated responses to COVID-19.

Holtz, David; Zhao, Michael; Benzell, Seth G; Cao, Cathy Y; Rahimian, Mohammad Amin; Yang, Jeremy; Allen, Jennifer; Collis, Avinash; Moehring, Alex; Sowrirajan, Tara; Ghosh, Dipayan; Zhang, Yunhao; Dhillon, Paramveer S; Nicolaides, Christos; Eckles, Dean; Aral, Sinan.

Proc Natl Acad Sci U S A ; 117(33): 19837-19843, 2020 08 18.

Article in English | MEDLINE | ID: mdl-32732433

ABSTRACT

Social distancing is the core policy response to coronavirus disease 2019 (COVID-19). But, as federal, state and local governments begin opening businesses and relaxing shelter-in-place orders worldwide, we lack quantitative evidence on how policies in one region affect mobility and social distancing in other regions and the consequences of uncoordinated regional policies adopted in the presence of such spillovers. To investigate this concern, we combined daily, county-level data on shelter-in-place policies with movement data from over 27 million mobile devices, social network connections among over 220 million Facebook users, daily temperature and precipitation data from 62,000 weather stations, and county-level census data on population demographics to estimate the geographic and social network spillovers created by regional policies across the United States. Our analysis shows that the contact patterns of people in a given region are significantly influenced by the policies and behaviors of people in other, sometimes distant, regions. When just one-third of a state's social and geographic peer states adopt shelter-in-place policies, it creates a reduction in mobility equal to the state's own policy decisions. These spillovers are mediated by peer travel and distancing behaviors in those states. A simple analytical model calibrated with our empirical estimates demonstrated that the "loss from anarchy" in uncoordinated state policies is increasing in the number of noncooperating states and the size of social and geographic spillovers. These results suggest a substantial cost of uncoordinated government responses to COVID-19 when people, ideas, and media move across borders.

Subject(s)

COVID-19/prevention & control , Coronavirus Infections/prevention & control , Cost-Benefit Analysis , Efficiency, Organizational , Logistic Models , Pandemics/prevention & control , Pneumonia, Viral/prevention & control , Quarantine/organization & administration , COVID-19/economics , Coronavirus Infections/economics , Demography/statistics & numerical data , Humans , Pandemics/economics , Physical Distancing , Pneumonia, Viral/economics , Quarantine/economics , Quarantine/methods , Social Media/statistics & numerical data , Transportation/statistics & numerical data , United States

18.

How to Illuminate the Druggable Genome Using Pharos.

Sheils, Timothy; Mathias, Stephen L; Siramshetty, Vishal B; Bocci, Giovanni; Bologa, Cristian G; Yang, Jeremy J; Waller, Anna; Southall, Noel; Nguyen, Dac-Trung; Oprea, Tudor I.

Curr Protoc Bioinformatics ; 69(1): e92, 2020 03.

Article in English | MEDLINE | ID: mdl-31898878

ABSTRACT

Pharos is an integrated web-based informatics platform for the analysis of data aggregated by the Illuminating the Druggable Genome (IDG) Knowledge Management Center, an NIH Common Fund initiative. The current version of Pharos (as of October 2019) spans 20,244 proteins in the human proteome, 19,880 disease and phenotype associations, and 226,829 ChEMBL compounds. This resource not only collates and analyzes data from over 60 high-quality resources to generate these types, but also uses text indexing to find less apparent connections between targets, and has recently begun to collaborate with institutions that generate data and resources. Proteins are ranked according to a knowledge-based classification system, which can help researchers to identify less studied "dark" targets that could be potentially further illuminated. This is an important process for both drug discovery and target validation, as more knowledge can accelerate target identification, and previously understudied proteins can serve as novel targets in drug discovery. Two basic protocols illustrate the levels of detail available for targets and several methods of finding targets of interest. An Alternate Protocol illustrates the difference in available knowledge between less and more studied targets. © 2020 by John Wiley & Sons, Inc. Basic Protocol 1: Search for a target and view details Alternate Protocol: Search for dark target and view details Basic Protocol 2: Filter a target list to get refined results.

Subject(s)

Drug Discovery , Genome , Software , Breast Neoplasms/genetics , Drug Delivery Systems , Female , Genome-Wide Association Study , Humans , Ligands , Receptors, G-Protein-Coupled/metabolism

19.

Monocrystalline Silicon Carbide Disk Resonators on Phononic Crystals with Ultra-Low Dissipation Bulk Acoustic Wave Modes.

Hamelin, Benoit; Yang, Jeremy; Daruwalla, Anosh; Wen, Haoran; Ayazi, Farrokh.

Sci Rep ; 9(1): 18698, 2019 Dec 10.

Article in English | MEDLINE | ID: mdl-31822789

ABSTRACT

Micromechanical resonators with ultra-low energy dissipation are essential for a wide range of applications, such as navigation in GPS-denied environments. Routinely implemented in silicon (Si), their energy dissipation often reaches the quantum limits of Si, which can be surpassed by using materials with lower intrinsic loss. This paper explores dissipation limits in 4H monocrystalline silicon carbide-on-insulator (4H-SiCOI) mechanical resonators fabricated at wafer-level, and reports on ultra-high quality-factors (Q) in gyroscopic-mode disk resonators. The SiC disk resonators are anchored upon an acoustically-engineered Si substrate containing a phononic crystal which suppresses anchor loss and promises Q_ANCHOR near 1 Billion by design. Operating deep in the adiabatic regime, the bulk acoustic wave (BAW) modes of solid SiC disks are mostly free of bulk thermoelastic damping. Capacitively-transduced SiC BAW disk resonators consistently display gyroscopic m = 3 modes with Q-factors above 2 Million (M) at 6.29 MHz, limited by surface TED due to microscale roughness along the disk sidewalls. The surface TED limit is revealed by optical measurements on a SiC disk, with nanoscale smooth sidewalls, exhibiting Q = 18 M at 5.3 MHz, corresponding to f · Q = 9 · 10^13 Hz, a 5-fold improvement over the Akhiezer limit of Si. Our results pave the path for integrated SiC resonators and resonant gyroscopes with Q-factors beyond the reach of Si.
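The f · Q figure quoted in this abstract can be checked with a short calculation. The sketch below multiplies the reported frequencies and quality factors; the helper name `fq_product` is hypothetical, and the inputs are taken directly from the abstract (18 M at 5.3 MHz for the optically measured disk, and the lower bound of 2 M at 6.29 MHz for the capacitively transduced devices).

```python
def fq_product(frequency_hz: float, q_factor: float) -> float:
    """Return the f*Q figure of merit in Hz."""
    return frequency_hz * q_factor

# Optically measured disk: Q = 18 million at 5.3 MHz.
optical = fq_product(5.3e6, 18e6)

# Capacitively transduced disk: Q above 2 million at 6.29 MHz (lower bound).
capacitive = fq_product(6.29e6, 2e6)

print(f"optical:    {optical:.2e} Hz")     # 9.54e+13 Hz
print(f"capacitive: {capacitive:.2e} Hz")  # 1.26e+13 Hz
```

The optical value of about 9.5 · 10^13 Hz matches the abstract's rounded figure of 9 · 10^13 Hz.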

20.

edge2vec: Representation learning using edge semantics for biomedical knowledge discovery.

Gao, Zheng; Fu, Gang; Ouyang, Chunping; Tsutsui, Satoshi; Liu, Xiaozhong; Yang, Jeremy; Gessner, Christopher; Foote, Brian; Wild, David; Ding, Ying; Yu, Qi.

BMC Bioinformatics ; 20(1): 306, 2019 Jun 10.

Article in English | MEDLINE | ID: mdl-31238875

ABSTRACT

BACKGROUND: Representation learning provides new and powerful graph analytical approaches and tools for the highly valued data science challenge of mining knowledge graphs. Since previous graph analytical methods have mostly focused on homogeneous graphs, an important current challenge is extending this methodology for richly heterogeneous graphs and knowledge domains. The biomedical sciences are such a domain, reflecting the complexity of biology, with entities such as genes, proteins, drugs, diseases, and phenotypes, and relationships such as gene co-expression, biochemical regulation, and biomolecular inhibition or activation. Therefore, the semantics of edges and nodes are critical for representation learning and knowledge discovery in real world biomedical problems. RESULTS: In this paper, we propose the edge2vec model, which represents graphs considering edge semantics. An edge-type transition matrix is trained by an Expectation-Maximization approach, and a stochastic gradient descent model is employed to learn node embedding on a heterogeneous graph via the trained transition matrix. edge2vec is validated on three biomedical domain tasks: biomedical entity classification, compound-gene bioactivity prediction, and biomedical information retrieval. Results show that by considering edge-types into node embedding learning in heterogeneous graphs, edge2vec significantly outperforms state-of-the-art models on all three tasks. CONCLUSIONS: We propose this method for its added value relative to existing graph analytical methodology, and in the real world context of biomedical knowledge discovery applicability.

Subject(s)

Informatics/methods , Knowledge , Learning , Algorithms , Biomedical Research , Humans , Neural Networks, Computer , Semantics
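The edge2vec abstract describes learning node embeddings "via the trained transition matrix" over edge types. One way to picture that idea is a random walk whose next step is weighted by how the previous edge type transitions to each candidate edge type. The sketch below is a minimal illustration under that reading, not the authors' code: the toy graph, the fixed matrix `TRANSITION` (which the paper trains with Expectation-Maximization rather than hard-coding), and the function `biased_walk` are all hypothetical.

```python
import random

# Toy heterogeneous graph: node -> list of (neighbor, edge_type).
# Node names and edge-type ids are made up for illustration.
GRAPH = {
    "geneA": [("drugX", 0), ("geneB", 1)],
    "geneB": [("geneA", 1), ("diseaseD", 2)],
    "drugX": [("geneA", 0), ("diseaseD", 2)],
    "diseaseD": [("geneB", 2), ("drugX", 2)],
}

# TRANSITION[i][j]: assumed weight for following an edge of type i
# with an edge of type j. Fixed here; trained in the actual method.
TRANSITION = [
    [0.2, 0.5, 0.3],
    [0.4, 0.2, 0.4],
    [0.3, 0.3, 0.4],
]

def biased_walk(graph, transition, start, length, rng):
    """Walk `length` nodes, weighting each step by edge-type transitions."""
    walk = [start]
    prev_type = None
    while len(walk) < length:
        neighbors = graph[walk[-1]]
        if not neighbors:
            break  # dead end: stop early
        if prev_type is None:
            # First step has no previous edge type, so sample uniformly.
            weights = [1.0] * len(neighbors)
        else:
            weights = [transition[prev_type][t] for _, t in neighbors]
        next_node, prev_type = rng.choices(neighbors, weights=weights, k=1)[0]
        walk.append(next_node)
    return walk

walk = biased_walk(GRAPH, TRANSITION, "geneA", 6, random.Random(0))
print(walk)
```

Walks generated this way would then be fed to a skip-gram style model to produce embeddings; that step is omitted here.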

