Results 1 - 20 of 39
1.
bioRxiv ; 2024 Jul 10.
Article in English | MEDLINE | ID: mdl-39026801

ABSTRACT

Defining the subset of cellular factors governing SARS-CoV-2 replication can provide critical insights into viral pathogenesis and identify targets for host-directed antiviral therapies. While a number of genetic screens have previously reported SARS-CoV-2 host dependency factors, these approaches relied on utilizing pooled genome-scale CRISPR libraries, which are biased towards the discovery of host proteins impacting early stages of viral replication. To identify host factors involved throughout the SARS-CoV-2 infectious cycle, we conducted an arrayed genome-scale siRNA screen. Resulting data were integrated with published datasets to reveal pathways supported by orthogonal datasets, including transcriptional regulation, epigenetic modifications, and MAPK signalling. The identified proviral host factors were mapped into the SARS-CoV-2 infectious cycle, including 27 proteins that were determined to impact assembly and release. Additionally, a subset of proteins were tested across other coronaviruses revealing 17 potential pan-coronavirus targets. Further studies illuminated a role for the heparan sulfate proteoglycan perlecan in SARS-CoV-2 viral entry, and found that inhibition of the non-canonical NF-kB pathway through targeting of BIRC2 restricts SARS-CoV-2 replication both in vitro and in vivo. These studies provide critical insight into the landscape of virus-host interactions driving SARS-CoV-2 replication as well as valuable targets for host-directed antivirals.

2.
bioRxiv ; 2024 May 24.
Article in English | MEDLINE | ID: mdl-38826258

ABSTRACT

This article describes the Cell Maps for Artificial Intelligence (CM4AI) project and its goals, methods, standards, current datasets, software tools, status, and future directions. CM4AI is the Functional Genomics Data Generation Project in the U.S. National Institutes of Health's (NIH) Bridge2AI program. Its overarching mission is to produce ethical, AI-ready datasets of cell architecture, inferred from multimodal data collected for human cell lines, to enable transformative biomedical AI research.

3.
Nucleic Acids Res ; 52(W1): W481-W488, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38783119

ABSTRACT

In recent decades, the development of new drugs has become increasingly expensive and inefficient, and the molecular mechanisms of most pharmaceuticals remain poorly understood. In response, computational systems and network medicine tools have emerged to identify potential drug repurposing candidates. However, these tools often require complex installation and lack intuitive visual network mining capabilities. To tackle these challenges, we introduce Drugst.One, a platform that assists specialized computational medicine tools in becoming user-friendly, web-based utilities for drug repurposing. With just three lines of code, Drugst.One turns any systems biology software into an interactive web tool for modeling and analyzing complex protein-drug-disease networks. Demonstrating its broad adaptability, Drugst.One has been successfully integrated with 21 computational systems medicine tools. Available at https://drugst.one, Drugst.One has significant potential for streamlining the drug discovery process, allowing researchers to focus on essential aspects of pharmaceutical treatment research.


Subject(s)
Drug Repositioning , Software , Drug Repositioning/methods , Humans , Internet , Drug Discovery/methods , Systems Biology/methods , Computational Biology/methods
4.
bioRxiv ; 2024 Apr 29.
Article in English | MEDLINE | ID: mdl-38746239

ABSTRACT

Advancements in genomic and proteomic technologies have powered the use of gene and protein networks ("interactomes") for understanding genotype-phenotype translation. However, the proliferation of interactomes complicates the selection of networks for specific applications. Here, we present a comprehensive evaluation of 46 current human interactomes, encompassing protein-protein interactions as well as gene regulatory, signaling, colocalization, and genetic interaction networks. Our analysis shows that large composite networks such as HumanNet, STRING, and FunCoup are most effective for identifying disease genes, while smaller networks such as DIP and SIGNOR demonstrate strong interaction prediction performance. These findings provide a benchmark for interactomes across diverse network biology applications and clarify factors that influence network performance. Furthermore, our evaluation pipeline paves the way for continued assessment of emerging and updated interaction networks in the future.

5.
bioRxiv ; 2024 Jan 30.
Article in English | MEDLINE | ID: mdl-38313267

ABSTRACT

Motivation: Molecular Regulatory Pathways (MRPs) are crucial for understanding biological functions. Knowledge Graphs (KGs) have become vital in organizing and analyzing MRPs, providing structured representations of complex biological interactions. Current tools for mining KGs from biomedical literature are inadequate in capturing complex, hierarchical relationships and contextual information about MRPs. Large Language Models (LLMs) like GPT-4 offer a promising solution, with advanced capabilities to decipher the intricate nuances of language. However, their potential for end-to-end KG construction, particularly for MRPs, remains largely unexplored. Results: We present reguloGPT, a novel GPT-4 based in-context learning prompt, designed for end-to-end joint named entity recognition, N-ary relationship extraction, and context prediction from a sentence that describes regulatory interactions with MRPs. Our reguloGPT approach introduces a context-aware relational graph that effectively embodies the hierarchical structure of MRPs and resolves semantic inconsistencies by embedding context directly within relational edges. We created a benchmark dataset including 400 annotated PubMed titles on N6-methyladenosine (m6A) regulations. Rigorous evaluation of reguloGPT on the benchmark dataset demonstrated marked improvement over existing algorithms. We further developed a novel G-Eval scheme, leveraging GPT-4 for annotation-free performance evaluation and demonstrated its agreement with traditional annotation-based evaluations. Utilizing reguloGPT predictions on m6A-related titles, we constructed the m6A-KG and demonstrated its utility in elucidating m6A's regulatory mechanisms in cancer phenotypes across various cancers. These results underscore reguloGPT's transformative potential for extracting biological knowledge from the literature.
Availability and implementation: The source code of reguloGPT, the m6A title and benchmark datasets, and m6A-KG are available at: https://github.com/Huang-AI4Medicine-Lab/reguloGPT.

6.
bioRxiv ; 2024 Feb 08.
Article in English | MEDLINE | ID: mdl-38076945

ABSTRACT

Translating high-confidence (hc) autism spectrum disorder (ASD) genes into viable treatment targets remains elusive. We constructed a foundational protein-protein interaction (PPI) network in HEK293T cells involving 100 hcASD risk genes, revealing over 1,800 PPIs (87% novel). Interactors, expressed in the human brain and enriched for ASD but not schizophrenia genetic risk, converged on protein complexes involved in neurogenesis, tubulin biology, transcriptional regulation, and chromatin modification. A PPI map of 54 patient-derived missense variants identified differential physical interactions, and we leveraged AlphaFold-Multimer predictions to prioritize direct PPIs and specific variants for interrogation in Xenopus tropicalis and human forebrain organoids. A mutation in the transcription factor FOXP1 led to reconfiguration of DNA binding sites and altered development of deep cortical layer neurons in forebrain organoids. This work offers new insights into molecular mechanisms underlying ASD and describes a powerful platform to develop and test therapeutic strategies for many genetically-defined conditions.

7.
ArXiv ; 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-37731657

ABSTRACT

Gene set analysis is a mainstay of functional genomics, but it relies on curated databases of gene functions that are incomplete. Here we evaluate five Large Language Models (LLMs) for their ability to discover the common biological functions represented by a gene set, substantiated by supporting rationale, citations and a confidence assessment. Benchmarking against canonical gene sets from the Gene Ontology, GPT-4 confidently recovered the curated name or a more general concept (73% of cases), while benchmarking against random gene sets correctly yielded zero confidence. Gemini-Pro and Mixtral-Instruct showed ability in naming but were falsely confident for random sets, whereas Llama2-70b had poor performance overall. In gene sets derived from 'omics data, GPT-4 identified novel functions not reported by classical functional enrichment (32% of cases), which independent review indicated were largely verifiable and not hallucinations. The ability to rapidly synthesize common gene functions positions LLMs as valuable 'omics assistants.

8.
Res Sq ; 2023 Sep 18.
Article in English | MEDLINE | ID: mdl-37790547

ABSTRACT

Gene set analysis is a mainstay of functional genomics, but it relies on manually curated databases of gene functions that are incomplete and unaware of biological context. Here we evaluate the ability of OpenAI's GPT-4, a Large Language Model (LLM), to develop hypotheses about common gene functions from its embedded biomedical knowledge. We created a GPT-4 pipeline to label gene sets with names that summarize their consensus functions, substantiated by analysis text and citations. Benchmarking against named gene sets in the Gene Ontology, GPT-4 generated very similar names in 50% of cases, while in most remaining cases it recovered the name of a more general concept. In gene sets discovered in 'omics data, GPT-4 names were more informative than gene set enrichment, with supporting statements and citations that largely verified in human review. The ability to rapidly synthesize common gene functions positions LLMs as valuable functional genomics assistants.

9.
ArXiv ; 2023 Jul 04.
Article in English | MEDLINE | ID: mdl-37332567

ABSTRACT

In recent decades, the development of new drugs has become increasingly expensive and inefficient, and the molecular mechanisms of most pharmaceuticals remain poorly understood. In response, computational systems and network medicine tools have emerged to identify potential drug repurposing candidates. However, these tools often require complex installation and lack intuitive visual network mining capabilities. To tackle these challenges, we introduce Drugst.One, a platform that assists specialized computational medicine tools in becoming user-friendly, web-based utilities for drug repurposing. With just three lines of code, Drugst.One turns any systems biology software into an interactive web tool for modeling and analyzing complex protein-drug-disease networks. Demonstrating its broad adaptability, Drugst.One has been successfully integrated with 21 computational systems medicine tools. Available at https://drugst.one, Drugst.One has significant potential for streamlining the drug discovery process, allowing researchers to focus on essential aspects of pharmaceutical treatment research.

10.
Cell Syst ; 14(6): 447-463.e8, 2023 06 21.
Article in English | MEDLINE | ID: mdl-37220749

ABSTRACT

The DNA damage response (DDR) ensures error-free DNA replication and transcription and is disrupted in numerous diseases. An ongoing challenge is to determine the proteins orchestrating DDR and their organization into complexes, including constitutive interactions and those responding to genomic insult. Here, we use multi-conditional network analysis to systematically map DDR assemblies at multiple scales. Affinity purifications of 21 DDR proteins, with/without genotoxin exposure, are combined with multi-omics data to reveal a hierarchical organization of 605 proteins into 109 assemblies. The map captures canonical repair mechanisms and proposes new DDR-associated proteins extending to stress, transport, and chromatin functions. We find that protein assemblies closely align with genetic dependencies in processing specific genotoxins and that proteins in multiple assemblies typically act in multiple genotoxin responses. Follow-up by DDR functional readouts newly implicates 12 assembly members in double-strand-break repair. The DNA damage response assemblies map is available for interactive visualization and query (ccmi.org/ddram/).


Subject(s)
Chromatin , DNA Repair , DNA Repair/genetics , Chromatin/genetics , DNA Damage/genetics
11.
Front Bioinform ; 3: 1125949, 2023.
Article in English | MEDLINE | ID: mdl-37035036

ABSTRACT

Cytoscape is an open-source bioinformatics environment for the analysis, integration, visualization, and query of biological networks. In this perspective piece, we describe our project to bring the Cytoscape desktop application to the web while explaining our strategy in ways relevant to others in the bioinformatics community. We examine opportunities and challenges in developing bioinformatics software that spans both the desktop and web, and we describe our ongoing efforts to build a Cytoscape web application, highlighting the principles that guide our development.

12.
Bioinformatics ; 39(3)2023 03 01.
Article in English | MEDLINE | ID: mdl-36882166

ABSTRACT

MOTIVATION: The investigation of sets of genes using biological pathways is a common task for researchers and is supported by a wide variety of software tools. This type of analysis generates hypotheses about the biological processes that are active or modulated in a specific experimental context. RESULTS: The Network Data Exchange Integrated Query (NDEx IQuery) is a new tool for network and pathway-based gene set interpretation that complements or extends existing resources. It combines novel sources of pathways, integration with Cytoscape, and the ability to store and share analysis results. The NDEx IQuery web application performs multiple gene set analyses based on diverse pathways and networks stored in NDEx. These include curated pathways from WikiPathways and SIGNOR, published pathway figures from the last 27 years, machine-assembled networks using the INDRA system, and the new NCI-PID v2.0, an updated version of the popular NCI Pathway Interaction Database. NDEx IQuery's integration with MSigDB and cBioPortal now provides pathway analysis in the context of these two resources. AVAILABILITY AND IMPLEMENTATION: NDEx IQuery is available at https://www.ndexbio.org/iquery and is implemented in Javascript and Java.


Subject(s)
Computational Biology , Software , Computational Biology/methods , Protein Interaction Maps , Publications , Databases, Factual , Internet
13.
Nat Protoc ; 18(6): 1745-1759, 2023 Jun.
Article in English | MEDLINE | ID: mdl-36653526

ABSTRACT

A longstanding goal of biomedicine is to understand how alterations in molecular and cellular networks give rise to the spectrum of human diseases. For diseases with shared etiology, understanding the common causes allows for improved diagnosis of each disease, development of new therapies and more comprehensive identification of disease genes. Accordingly, this protocol describes how to evaluate the extent to which two diseases, each characterized by a set of mapped genes, are colocalized in a reference gene interaction network. This procedure uses network propagation to measure the network 'distance' between gene sets. For colocalized diseases, the network can be further analyzed to extract common gene communities at progressive granularities. In particular, we show how to: (1) obtain input gene sets and a reference gene interaction network; (2) identify common subnetworks of genes that encompass or are in close proximity to all gene sets; (3) use multiscale community detection to identify systems and pathways represented by each common subnetwork to generate a network colocalized systems map; (4) validate identified genes and systems using a mouse variant database; and (5) visualize and further investigate select genes, interactions and systems for relevance to phenotype(s) of interest. We demonstrate the utility of this approach by identifying shared biological mechanisms underlying autism and congenital heart disease. However, this protocol is general and can be applied to any gene sets attributed to diseases or other phenotypes with suspected joint association. A typical NetColoc run takes less than an hour. Software and documentation are available at https://github.com/ucsd-ccbb/NetColoc.
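
The network propagation step mentioned in this abstract can be illustrated with a minimal sketch. It assumes a random walk with restart on a small undirected toy graph and scores colocalization as the per-node product of the two heat vectors. The function name `propagate`, the restart weight `alpha`, the toy gene names, and the product score are all illustrative assumptions; this is not the NetColoc implementation, whose scoring and statistics may differ.

```python
def propagate(adjacency, seeds, alpha=0.5, tol=1e-9, max_iter=1000):
    """Random walk with restart (illustrative, not the NetColoc code).

    adjacency: dict mapping each node to the set of its neighbours
               (undirected, so every edge appears in both directions).
    seeds:     set of seed nodes (e.g. genes mapped to one disease).
    alpha:     fraction of heat that keeps diffusing at each step;
               the remaining (1 - alpha) is reset to the seeds.
    Returns a dict mapping each node to its steady-state heat.
    """
    nodes = sorted(adjacency)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    heat = dict(restart)
    for _ in range(max_iter):
        new_heat = {}
        for n in nodes:
            # Each neighbour m shares its heat equally among its own edges.
            inflow = sum(heat[m] / len(adjacency[m]) for m in adjacency[n])
            new_heat[n] = alpha * inflow + (1 - alpha) * restart[n]
        delta = max(abs(new_heat[n] - heat[n]) for n in nodes)
        heat = new_heat
        if delta < tol:
            break
    return heat


# Hypothetical toy network: a five-gene path A - B - C - D - E.
network = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}

heat_a = propagate(network, seeds={"A"})  # gene set for "disease 1"
heat_b = propagate(network, seeds={"E"})  # gene set for "disease 2"

# Assumed colocalization score: nodes that are warm under both
# propagations get a high product.
colocalization = {n: heat_a[n] * heat_b[n] for n in network}

for gene in sorted(network):
    print(gene, round(heat_a[gene], 4), round(heat_b[gene], 4),
          round(colocalization[gene], 6))
```

In a real analysis the raw heat would normally be compared against a null model (for example, heat from randomly drawn seed sets) before any threshold is applied; that step is omitted here for brevity.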


Subject(s)
Gene Regulatory Networks , Software , Humans , Databases, Factual , Computational Biology/methods
14.
Viruses ; 14(3)2022 03 15.
Article in English | MEDLINE | ID: mdl-35337019

ABSTRACT

The novel coronavirus SARS-CoV-2 is responsible for the ongoing COVID-19 pandemic and has caused a major health and economic burden worldwide. Understanding how SARS-CoV-2 viral proteins behave in host cells can reveal underlying mechanisms of pathogenesis and assist in development of antiviral therapies. Here, the cellular impact of expressing SARS-CoV-2 viral proteins was studied by global proteomic analysis, and proximity biotinylation (BioID) was used to map the SARS-CoV-2 virus-host interactome in human lung cancer-derived cells. Functional enrichment analyses revealed previously reported and unreported cellular pathways that are associated with SARS-CoV-2 proteins. We have established a website to host the proteomic data to allow for public access and continued analysis of host-viral protein associations and whole-cell proteomes of cells expressing the viral-BioID fusion proteins. Furthermore, we identified 66 high-confidence interactions by comparing this study with previous reports, providing a strong foundation for future follow-up studies. Finally, we cross-referenced candidate interactors with the CLUE drug library to identify potential therapeutics for drug-repurposing efforts. Collectively, these studies provide a valuable resource to uncover novel SARS-CoV-2 biology and inform development of antivirals.


Subject(s)
COVID-19 , SARS-CoV-2 , Biotinylation , Humans , Pandemics , Proteomics
15.
Nat Biotechnol ; 40(4): 566-575, 2022 04.
Article in English | MEDLINE | ID: mdl-34992246

ABSTRACT

Phylogeny estimation (the reconstruction of evolutionary trees) has recently been applied to CRISPR-based cell lineage tracing, allowing the developmental history of an individual tissue or organism to be inferred from a large number of mutated sequences in somatic cells. However, current computational methods are not able to construct phylogenetic trees from extremely large numbers of input sequences. Here, we present a deep distributed computing framework to comprehensively trace accurate large lineages (FRACTAL) that substantially enhances the scalability of current lineage estimation software tools. FRACTAL first reconstructs only an upstream lineage of the input sequences and recursively iterates the same procedure for its downstream lineages using independent computing nodes. We demonstrate the utility of FRACTAL by reconstructing lineages from >235 million simulated sequences and from >16 million cells from a simulated experiment with a CRISPR system that accumulates mutations during cell proliferation. We also successfully applied FRACTAL to evolutionary tree reconstructions and to an experiment using error-prone PCR (EP-PCR) for large-scale sequence diversification.


Subject(s)
Algorithms , Software , Cell Lineage/genetics , Mutation , Phylogeny
16.
Curr Protoc ; 1(9): e258, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34570431

ABSTRACT

NDEx, the Network Data Exchange (https://www.ndexbio.org), is a web-based resource where users can find, store, share and publish network models of any type and size. NDEx is integrated with Cytoscape, the widely used desktop application for network analysis and visualization. NDEx and Cytoscape are the pillars of the Cytoscape Ecosystem, a diverse environment of resources, tools, applications and services for network biology workflows. In this article, we introduce researchers to NDEx and highlight how it can simplify common tasks in network biology workflows as well as streamline publication and access to published network models. Finally, we show how NDEx can be used programmatically via Python with the 'ndex2' client library, and point readers to additional examples for other popular programming languages such as JavaScript and R. © 2021 The Authors. Current Protocols published by Wiley Periodicals LLC. Basic Protocol 1: Getting started with NDEx Basic Protocol 2: Using NDEx and Cytoscape in a publication-oriented workflow Basic Protocol 3: Manipulating networks in NDEx via Python.


Subject(s)
Computational Biology , Software , Ecosystem , Humans , Workflow
17.
bioRxiv ; 2021 Sep 21.
Article in English | MEDLINE | ID: mdl-34580671

ABSTRACT

The novel coronavirus SARS-CoV-2 is responsible for the ongoing COVID-19 pandemic and has caused a major health and economic burden worldwide. Understanding how SARS-CoV-2 viral proteins behave in host cells can reveal underlying mechanisms of pathogenesis and assist in development of antiviral therapies. Here we use BioID to map the SARS-CoV-2 virus-host interactome using human lung cancer derived A549 cells expressing individual SARS-CoV-2 viral proteins. Functional enrichment analyses revealed previously reported and unreported cellular pathways that are in association with SARS-CoV-2 proteins. We have also established a website to host the proteomic data to allow for public access and continued analysis of host-viral protein associations and whole-cell proteomes of cells expressing the viral-BioID fusion proteins. Collectively, these studies provide a valuable resource to potentially uncover novel SARS-CoV-2 biology and inform development of antivirals.

18.
Science ; 374(6563): eabf3067, 2021 Oct.
Article in English | MEDLINE | ID: mdl-34591613

ABSTRACT

A major goal of cancer research is to understand how mutations distributed across diverse genes affect common cellular systems, including multiprotein complexes and assemblies. Two challenges, how to comprehensively map such systems and how to identify which are under mutational selection, have hindered this understanding. Accordingly, we created a comprehensive map of cancer protein systems integrating both new and published multi-omic interaction data at multiple scales of analysis. We then developed a unified statistical model that pinpoints 395 specific systems under mutational selection across 13 cancer types. This map, called NeST (Nested Systems in Tumors), incorporates canonical processes and notable discoveries, including a PIK3CA-actomyosin complex that inhibits phosphatidylinositol 3-kinase signaling and recurrent mutations in collagen complexes that promote tumor proliferation. These systems can be used as clinical biomarkers and implicate a total of 548 genes in cancer evolution and progression. This work shows how disparate tumor mutations converge on protein assemblies at different scales.


Subject(s)
Neoplasm Proteins/genetics , Neoplasm Proteins/metabolism , Neoplasms/genetics , Neoplasms/metabolism , Protein Interaction Maps/genetics , Genes, Neoplasm , Humans , Mutation , Protein Interaction Mapping/methods
19.
Nat Microbiol ; 6(10): 1319-1333, 2021 10.
Article in English | MEDLINE | ID: mdl-34556855

ABSTRACT

The fate of influenza A virus (IAV) infection in the host cell depends on the balance between cellular defence mechanisms and viral evasion strategies. To illuminate the landscape of IAV cellular restriction, we generated and integrated global genetic loss-of-function screens with transcriptomics and proteomics data. Our multi-omics analysis revealed a subset of both IFN-dependent and independent cellular defence mechanisms that inhibit IAV replication. Amongst these, the autophagy regulator TBC1 domain family member 5 (TBC1D5), which binds Rab7 to enable fusion of autophagosomes and lysosomes, was found to control IAV replication in vitro and in vivo and to promote lysosomal targeting of IAV M2 protein. Notably, IAV M2 was observed to abrogate TBC1D5-Rab7 binding through a physical interaction with TBC1D5 via its cytoplasmic tail. Our results provide evidence for the molecular mechanism utilised by IAV M2 protein to escape lysosomal degradation and traffic to the cell membrane, where it supports IAV budding and growth.


Subject(s)
Autophagy , Immune Evasion , Influenza A virus/physiology , Antiviral Agents/metabolism , GTPase-Activating Proteins/genetics , GTPase-Activating Proteins/metabolism , Host-Pathogen Interactions , Humans , Influenza A virus/pathogenicity , Lysosomes/metabolism , Protein Binding , Viral Matrix Proteins/metabolism , Virus Replication , rab GTP-Binding Proteins/metabolism , rab7 GTP-Binding Proteins
20.
Mol Cell ; 81(12): 2656-2668.e8, 2021 06 17.
Article in English | MEDLINE | ID: mdl-33930332

ABSTRACT

A deficient interferon (IFN) response to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection has been implicated as a determinant of severe coronavirus disease 2019 (COVID-19). To identify the molecular effectors that govern IFN control of SARS-CoV-2 infection, we conducted a large-scale gain-of-function analysis that evaluated the impact of human IFN-stimulated genes (ISGs) on viral replication. A limited subset of ISGs were found to control viral infection, including endosomal factors inhibiting viral entry, RNA binding proteins suppressing viral RNA synthesis, and a highly enriched cluster of endoplasmic reticulum (ER)/Golgi-resident ISGs inhibiting viral assembly/egress. These included broad-acting antiviral ISGs and eight ISGs that specifically inhibited SARS-CoV-2 and SARS-CoV-1 replication. Among the broad-acting ISGs was BST2/tetherin, which impeded viral release and is antagonized by SARS-CoV-2 Orf7a protein. Overall, these data illuminate a set of ISGs that underlie innate immune control of SARS-CoV-2/SARS-CoV-1 infection, which will facilitate the understanding of host determinants that impact disease severity and offer potential therapeutic strategies for COVID-19.


Subject(s)
Antigens, CD/genetics , Host-Pathogen Interactions/genetics , Interferon Regulatory Factors/genetics , Interferon Type I/genetics , SARS-CoV-2/genetics , Viral Proteins/genetics , Animals , Antigens, CD/chemistry , Antigens, CD/immunology , Binding Sites , Cell Line, Tumor , Chlorocebus aethiops , Endoplasmic Reticulum/genetics , Endoplasmic Reticulum/immunology , Endoplasmic Reticulum/virology , GPI-Linked Proteins/chemistry , GPI-Linked Proteins/genetics , GPI-Linked Proteins/immunology , Gene Expression Regulation , Golgi Apparatus/genetics , Golgi Apparatus/immunology , Golgi Apparatus/virology , HEK293 Cells , Host-Pathogen Interactions/immunology , Humans , Immunity, Innate , Interferon Regulatory Factors/classification , Interferon Regulatory Factors/immunology , Interferon Type I/immunology , Molecular Docking Simulation , Protein Binding , Protein Conformation, alpha-Helical , Protein Conformation, beta-Strand , Protein Interaction Domains and Motifs , SARS-CoV-2/immunology , Signal Transduction , Vero Cells , Viral Proteins/chemistry , Viral Proteins/immunology , Virus Internalization , Virus Release/genetics , Virus Release/immunology , Virus Replication/genetics , Virus Replication/immunology