Results 1 - 20 of 46
1.
Cell Rep Methods ; 3(9): 100580, 2023 09 25.
Article in English | MEDLINE | ID: mdl-37703883

ABSTRACT

Human biology is rooted in highly specialized cell types programmed by a common genome, 98% of which is outside of genes. Genetic variation in the enormous noncoding space is linked to the majority of disease risk. To address the problem of linking these variants to expression changes in primary human cells, we introduce ExPectoSC, an atlas of modular deep-learning-based models for predicting cell-type-specific gene expression directly from sequence. We provide models for 105 primary human cell types covering 7 organ systems, demonstrate their accuracy, and then apply them to prioritize relevant cell types for complex human diseases. The resulting atlas of sequence-based gene expression and variant effects is publicly available in a user-friendly interface and readily extensible to any primary cell type. We demonstrate the accuracy of our approach through systematic evaluations and apply the models to prioritize ClinVar clinical variants of uncertain significance, verifying our top predictions experimentally.


Subject(s)
Ascomycota , Humans , Gene Expression/genetics
2.
Sci Adv ; 9(21): eadg5702, 2023 05 26.
Article in English | MEDLINE | ID: mdl-37235661

ABSTRACT

Genome-wide phenotypic screens in the budding yeast Saccharomyces cerevisiae, enabled by its knockout collection, have produced the largest, richest, and most systematic phenotypic description of any organism. However, integrative analyses of this rich data source have been virtually impossible because of the lack of a central data repository and consistent metadata annotations. Here, we describe the aggregation, harmonization, and analysis of ~14,500 yeast knockout screens, which we call Yeast Phenome. Using this unique dataset, we characterized two unknown genes (YHR045W and YGL117W) and showed that tryptophan starvation is a by-product of many chemical treatments. Furthermore, we uncovered an exponential relationship between phenotypic similarity and intergenic distance, which suggests that gene positions in both yeast and human genomes are optimized for function.


Subject(s)
Saccharomyces cerevisiae , Humans , Saccharomyces cerevisiae/genetics
3.
Database (Oxford) ; 2022, 2022 10 05.
Article in English | MEDLINE | ID: mdl-36197453

ABSTRACT

The coronavirus disease 2019 (COVID-19) pandemic has compelled biomedical researchers to communicate data in real time to establish more effective medical treatments and public health policies. Nontraditional sources such as preprint publications, i.e. articles not yet validated by peer review, have become crucial hubs for the dissemination of scientific results. Natural language processing (NLP) systems have been recently developed to extract and organize COVID-19 data in reasoning systems. Given this scenario, the BioCreative COVID-19 text mining tool interactive demonstration track was created to assess the landscape of the available tools and to gauge user interest, thereby providing a two-way communication channel between NLP system developers and potential end users. The goal was to inform system designers about the performance and usability of their products and to suggest new additional features. Considering the exploratory nature of this track, the call for participation solicited teams to apply for the track, based on their system's ability to perform COVID-19-related tasks and interest in receiving user feedback. We also recruited volunteer users to test systems. Seven teams registered systems for the track, and >30 individuals volunteered as test users; these volunteer users covered a broad range of specialties, including bench scientists, bioinformaticians and biocurators. The users, who had the option to participate anonymously, were provided with written and video documentation to familiarize themselves with the NLP tools and completed a survey to record their evaluation. Additional feedback was also provided by NLP system developers. The track was well received as shown by the overall positive feedback from the participating teams and the users. Database URL: https://biocreative.bioinformatics.udel.edu/tasks/biocreative-vii/track-4/.


Subject(s)
COVID-19 , COVID-19/epidemiology , Data Mining/methods , Databases, Factual , Documentation , Humans , Natural Language Processing
4.
Protein Sci ; 30(1): 187-200, 2021 01.
Article in English | MEDLINE | ID: mdl-33070389

ABSTRACT

The BioGRID (Biological General Repository for Interaction Datasets, thebiogrid.org) is an open-access database resource that houses manually curated protein and genetic interactions from multiple species including yeast, worm, fly, mouse, and human. The ~1.93 million curated interactions in BioGRID can be used to build complex networks to facilitate biomedical discoveries, particularly as related to human health and disease. All BioGRID content is curated from primary experimental evidence in the biomedical literature, and includes both focused low-throughput studies and large high-throughput datasets. BioGRID also captures protein post-translational modifications and protein or gene interactions with bioactive small molecules including many known drugs. A built-in network visualization tool combines all annotations and allows users to generate network graphs of protein, genetic and chemical interactions. In addition to general curation across species, BioGRID undertakes themed curation projects in specific aspects of cellular regulation, for example the ubiquitin-proteasome system, as well as specific disease areas, such as for the SARS-CoV-2 virus that causes COVID-19 severe acute respiratory syndrome. A recent extension of BioGRID, named the Open Repository of CRISPR Screens (ORCS, orcs.thebiogrid.org), captures single mutant phenotypes and genetic interactions from published high throughput genome-wide CRISPR/Cas9-based genetic screens. BioGRID-ORCS contains datasets for over 1,042 CRISPR screens carried out to date in human, mouse and fly cell lines. The biomedical research community can freely access all BioGRID data through the web interface, standardized file downloads, or via model organism databases and partner meta-databases.


Subject(s)
COVID-19/genetics , Databases, Factual , Protein Interaction Mapping , Proteins/genetics , Animals , COVID-19/virology , Humans , Mice , SARS-CoV-2/genetics , SARS-CoV-2/pathogenicity , User-Computer Interface
5.
Neuron ; 107(5): 821-835.e12, 2020 09 09.
Article in English | MEDLINE | ID: mdl-32603655

ABSTRACT

A major obstacle to treating Alzheimer's disease (AD) is our lack of understanding of the molecular mechanisms underlying selective neuronal vulnerability, a key characteristic of the disease. Here, we present a framework integrating high-quality neuron-type-specific molecular profiles across the lifetime of the healthy mouse, which we generated using bacTRAP, with postmortem human functional genomics and quantitative genetics data. We demonstrate human-mouse conservation of cellular taxonomy at the molecular level for neurons vulnerable and resistant in AD, identify specific genes and pathways associated with AD neuropathology, and pinpoint a specific functional gene module underlying selective vulnerability, enriched in processes associated with axonal remodeling, and affected by amyloid accumulation and aging. We have made all cell-type-specific profiles and functional networks available at http://alz.princeton.edu. Overall, our study provides a molecular framework for understanding the complex interplay between Aβ, aging, and neurodegeneration within the most vulnerable neurons in AD.


Subject(s)
Alzheimer Disease/pathology , Gene Expression Profiling/methods , Machine Learning , Neurons/pathology , Transcriptome , Aging/genetics , Aging/pathology , Alzheimer Disease/genetics , Animals , Gene Regulatory Networks/physiology , Humans , Mice
6.
Plant Physiol ; 179(4): 1893-1907, 2019 04.
Article in English | MEDLINE | ID: mdl-30679268

ABSTRACT

Determining the complete Arabidopsis (Arabidopsis thaliana) protein-protein interaction network is essential for understanding the functional organization of the proteome. Numerous small-scale studies and a couple of large-scale ones have elucidated a fraction of the estimated 300,000 binary protein-protein interactions in Arabidopsis. In this study, we provide evidence that a docking algorithm has the ability to identify real interactions using both experimentally determined and predicted protein structures. We ranked 0.91 million interactions generated by all possible pairwise combinations of 1,346 predicted structure models from an Arabidopsis predicted "structure-ome" and found a significant enrichment of real interactions for the top-ranking predicted interactions, as shown by cosubcellular enrichment analysis and yeast two-hybrid validation. Our success rate for computationally predicted, structure-based interactions was 63% of the success rate for published interactions naively tested using the yeast two-hybrid system and 2.7 times better than for randomly picked pairs of proteins. This study provides another perspective in interactome exploration and biological network reconstruction using protein structural information. We have made these interactions freely accessible through an improved Arabidopsis Interactions Viewer and have created community tools for accessing these and ∼2.8 million other protein-protein and protein-DNA interactions for hypothesis generation by researchers worldwide. The Arabidopsis Interactions Viewer is freely available at http://bar.utoronto.ca/interactions2/.


Subject(s)
Arabidopsis Proteins/chemistry , Arabidopsis/metabolism , Protein Interaction Maps , Software , Algorithms , Arabidopsis/genetics , Arabidopsis Proteins/metabolism , Models, Molecular , Molecular Docking Simulation , Proteome , Two-Hybrid System Techniques
7.
Cell Syst ; 8(2): 152-162.e6, 2019 02 27.
Article in English | MEDLINE | ID: mdl-30685436

ABSTRACT

A key challenge for the diagnosis and treatment of complex human diseases is identifying their molecular basis. Here, we developed a unified computational framework, URSAHD (Unveiling RNA Sample Annotation for Human Diseases), that leverages machine learning and the hierarchy of anatomical relationships present among diseases to integrate thousands of clinical gene expression profiles and identify molecular characteristics specific to each of the hundreds of complex diseases. URSAHD can distinguish between closely related diseases more accurately than literature-validated genes or traditional differential-expression-based computational approaches and is applicable to any disease, including rare and understudied ones. We demonstrate the utility of URSAHD in classifying related nervous system cancers and experimentally verifying novel neuroblastoma-associated genes identified by URSAHD. We highlight the applications for potential targeted drug-repurposing and for quantitatively assessing the molecular response to clinical therapies. URSAHD is freely available for public use, including the use of underlying models, at ursahd.princeton.edu.


Subject(s)
Gene Expression Profiling/methods , Genomics/methods , Machine Learning/standards , Transcriptome/genetics , Humans
8.
Nucleic Acids Res ; 47(D1): D529-D541, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30476227

ABSTRACT

The Biological General Repository for Interaction Datasets (BioGRID: https://thebiogrid.org) is an open access database dedicated to the curation and archival storage of protein, genetic and chemical interactions for all major model organism species and humans. As of September 2018 (build 3.4.164), BioGRID contains records for 1 598 688 biological interactions manually annotated from 55 809 publications for 71 species, as classified by an updated set of controlled vocabularies for experimental detection methods. BioGRID also houses records for >700 000 post-translational modification sites. BioGRID now captures chemical interaction data, including chemical-protein interactions for human drug targets drawn from the DrugBank database and manually curated bioactive compounds reported in the literature. A new dedicated aspect of BioGRID annotates genome-wide CRISPR/Cas9-based screens that report gene-phenotype and gene-gene relationships. An extension of the BioGRID resource called the Open Repository for CRISPR Screens (ORCS) database (https://orcs.thebiogrid.org) currently contains over 500 genome-wide screens carried out in human or mouse cell lines. All data in BioGRID is made freely available without restriction, is directly downloadable in standard formats and can be readily incorporated into existing applications via our web service platforms. BioGRID data are also freely distributed through partner model organism databases and meta-databases.


Subject(s)
Databases, Factual , Animals , CRISPR-Cas Systems , Data Curation , Drug Discovery , Genes , Humans , Mice , Protein Interaction Mapping
9.
Article in English | MEDLINE | ID: mdl-28077563

ABSTRACT

A great deal of information on the molecular genetics and biochemistry of model organisms has been reported in the scientific literature. However, this data is typically described in free text form and is not readily amenable to computational analyses. To this end, the BioGRID database systematically curates the biomedical literature for genetic and protein interaction data. This data is provided in a standardized computationally tractable format and includes structured annotation of experimental evidence. BioGRID curation necessarily involves substantial human effort by expert curators who must read each publication to extract the relevant information. Computational text-mining methods offer the potential to augment and accelerate manual curation. To facilitate the development of practical text-mining strategies, a new challenge was organized in BioCreative V for the BioC task, the collaborative Biocurator Assistant Task. This was a non-competitive, cooperative task in which the participants worked together to build BioC-compatible modules into an integrated pipeline to assist BioGRID curators. As an integral part of this task, a test collection of full text articles was developed that contained both biological entity annotations (gene/protein and organism/species) and molecular interaction annotations (protein-protein and genetic interactions (PPIs and GIs)). This collection, which we call the BioC-BioGRID corpus, was annotated by four BioGRID curators over three rounds of annotation and contains 120 full text articles curated in a dataset representing two major model organisms, namely budding yeast and human. The BioC-BioGRID corpus contains annotations for 6409 mentions of genes and their Entrez Gene IDs, 186 mentions of organism names and their NCBI Taxonomy IDs, 1867 mentions of PPIs and 701 annotations of PPI experimental evidence statements, 856 mentions of GIs and 399 annotations of GI evidence statements. The purpose, characteristics and possible future uses of the BioC-BioGRID corpus are detailed in this report. Database URL: http://bioc.sourceforge.net/BioC-BioGRID.html.


Subject(s)
Data Curation/methods , Data Mining/methods , Databases, Genetic , Proteins/genetics , Proteins/metabolism
10.
Nucleic Acids Res ; 45(D1): D369-D379, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27980099

ABSTRACT

The Biological General Repository for Interaction Datasets (BioGRID: https://thebiogrid.org) is an open access database dedicated to the annotation and archival of protein, genetic and chemical interactions for all major model organism species and humans. As of September 2016 (build 3.4.140), the BioGRID contains 1 072 173 genetic and protein interactions, and 38 559 post-translational modifications, as manually annotated from 48 114 publications. This dataset represents interaction records for 66 model organisms and represents a 30% increase compared to the previous 2015 BioGRID update. BioGRID curates the biomedical literature for major model organism species, including humans, with a recent emphasis on central biological processes and specific human diseases. To facilitate network-based approaches to drug discovery, BioGRID now incorporates 27 501 chemical-protein interactions for human drug targets, as drawn from the DrugBank database. A new dynamic interaction network viewer allows the easy navigation and filtering of all genetic and protein interaction data, as well as for bioactive compounds and their established targets. BioGRID data are directly downloadable without restriction in a variety of standardized formats and are freely distributed through partner model organism databases and meta-databases.


Subject(s)
Computational Biology , Databases, Genetic , Proteins , Animals , Computational Biology/methods , Data Curation , Data Mining , Humans , Protein Interaction Mapping , Protein Interaction Maps , Protein Processing, Post-Translational , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , Software
11.
Article in English | MEDLINE | ID: mdl-27589962

ABSTRACT

BioC is a simple XML format for text, annotations and relations, and was developed to achieve interoperability for biomedical text processing. Following the success of BioC in BioCreative IV, the BioCreative V BioC track addressed a collaborative task to build an assistant system for BioGRID curation. In this paper, we describe the framework of the collaborative BioC task and discuss our findings based on the user survey. This track consisted of eight subtasks including gene/protein/organism named entity recognition, protein-protein/genetic interaction passage identification and annotation visualization. Using BioC as their data-sharing and communication medium, nine teams, world-wide, participated and contributed either new methods or improvements of existing tools to address different subtasks of the BioC track. Results from different teams were shared in BioC and made available to other teams as they addressed different subtasks of the track. In the end, all submitted runs were merged using a machine learning classifier to produce an optimized output. The biocurator assistant system was evaluated by four BioGRID curators in terms of practical usability. The curators' feedback was overall positive and highlighted the user-friendly design and the convenient gene/protein curation tool based on text mining. Database URL: http://www.biocreative.org/tasks/biocreative-v/track-1-bioc/.


Subject(s)
Data Curation/methods , Data Mining/methods , Electronic Data Processing/methods , Information Dissemination/methods
12.
Genome Res ; 26(5): 670-80, 2016 05.
Article in English | MEDLINE | ID: mdl-26975778

ABSTRACT

We can now routinely identify coding variants within individual human genomes. A pressing challenge is to determine which variants disrupt the function of disease-associated genes. Both experimental and computational methods exist to predict pathogenicity of human genetic variation. However, a systematic performance comparison between them has been lacking. Therefore, we developed and exploited a panel of 26 yeast-based functional complementation assays to measure the impact of 179 variants (101 disease- and 78 non-disease-associated variants) from 22 human disease genes. Using the resulting reference standard, we show that experimental functional assays in a 1-billion-year diverged model organism can identify pathogenic alleles with significantly higher precision and specificity than current computational methods.


Subject(s)
Genetic Complementation Test/methods , Genetic Diseases, Inborn , Saccharomyces cerevisiae , Transcription, Genetic , Genetic Diseases, Inborn/genetics , Genetic Diseases, Inborn/metabolism , Humans , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism
13.
Cold Spring Harb Protoc ; 2016(1): pdb.prot088880, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26729909

ABSTRACT

The BioGRID database is an extensive repository of curated genetic and protein interactions for the budding yeast Saccharomyces cerevisiae, the fission yeast Schizosaccharomyces pombe, and the yeast Candida albicans SC5314, as well as for several other model organisms and humans. This protocol describes how to use the BioGRID website to query genetic or protein interactions for any gene of interest, how to visualize the associated interactions using an embedded interactive network viewer, and how to download data files for either selected interactions or the entire BioGRID interaction data set.


Subject(s)
Databases, Genetic , Fungal Proteins/genetics , Fungal Proteins/metabolism , Gene Regulatory Networks , Animals , Internet , Protein Interaction Mapping , Yeasts/metabolism
14.
Cold Spring Harb Protoc ; 2016(1): pdb.top080754, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26729913

ABSTRACT

The Biological General Repository for Interaction Datasets (BioGRID) is a freely available public database that provides the biological and biomedical research communities with curated protein and genetic interaction data. Structured experimental evidence codes, an intuitive search interface, and visualization tools enable the discovery of individual gene, protein, or biological network function. BioGRID houses interaction data for the major model organism species (including yeast, nematode, fly, zebrafish, mouse, and human), with particular emphasis on the budding yeast Saccharomyces cerevisiae and the fission yeast Schizosaccharomyces pombe as pioneer eukaryotic models for network biology. BioGRID has achieved comprehensive curation coverage of the entire literature for these two major yeast models, which is actively maintained through monthly curation updates. As of September 2015, BioGRID houses approximately 335,400 biological interactions for budding yeast and approximately 67,800 interactions for fission yeast. BioGRID also supports an integrated posttranslational modification (PTM) viewer that incorporates more than 20,100 yeast phosphorylation sites curated through its sister database, the PhosphoGRID.


Subject(s)
Databases, Genetic/statistics & numerical data , Gene Regulatory Networks , Protein Interaction Mapping , Animals , Humans , Saccharomyces cerevisiae , Saccharomyces cerevisiae Proteins , Yeasts/genetics , Yeasts/metabolism
15.
Mol Biol Cell ; 26(14): 2575-8, 2015 Jul 15.
Article in English | MEDLINE | ID: mdl-26174066

ABSTRACT

"Big Data" has surpassed "systems biology" and "omics" as the hottest buzzword in the biological sciences, but is there any substance behind the hype? Certainly, we have learned about various aspects of cell and molecular biology from the many individual high-throughput data sets that have been published in the past 15-20 years. These data, although useful as individual data sets, can provide much more knowledge when interrogated with Big Data approaches, such as applying integrative methods that leverage the heterogeneous data compendia in their entirety. Here we discuss the benefits and challenges of such Big Data approaches in biology and how cell and molecular biologists can best take advantage of them.


Subject(s)
Molecular Biology , Systems Biology/methods
16.
Nat Genet ; 47(6): 569-76, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25915600

ABSTRACT

Tissue and cell-type identity lie at the core of human physiology and disease. Understanding the genetic underpinnings of complex tissues and individual cell lineages is crucial for developing improved diagnostics and therapeutics. We present genome-wide functional interaction networks for 144 human tissues and cell types developed using a data-driven Bayesian methodology that integrates thousands of diverse experiments spanning tissue and disease states. Tissue-specific networks predict lineage-specific responses to perturbation, identify the changing functional roles of genes across tissues and illuminate relationships among diseases. We introduce NetWAS, which combines genes with nominally significant genome-wide association study (GWAS) P values and tissue-specific networks to identify disease-gene associations more accurately than GWAS alone. Our webserver, GIANT, provides an interface to human tissue networks through multi-gene queries, network visualization, analysis tools including NetWAS and downloadable networks. GIANT enables systematic exploration of the landscape of interacting genes that shape specialized cellular functions across more than a hundred human tissues and cell types.


Subject(s)
Gene Regulatory Networks , Protein Interaction Maps , Alzheimer Disease/genetics , Alzheimer Disease/metabolism , Bayes Theorem , Cells, Cultured , Gene Expression Regulation , Gene Ontology , Genome-Wide Association Study , Humans , Hypertension/genetics , Hypertension/metabolism , Models, Biological , Myocytes, Smooth Muscle/physiology , Organ Specificity , Parkinson Disease/genetics , Parkinson Disease/metabolism
17.
Nucleic Acids Res ; 43(Database issue): D470-8, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25428363

ABSTRACT

The Biological General Repository for Interaction Datasets (BioGRID: http://thebiogrid.org) is an open access database that houses genetic and protein interactions curated from the primary biomedical literature for all major model organism species and humans. As of September 2014, the BioGRID contains 749,912 interactions as drawn from 43,149 publications that represent 30 model organisms. This interaction count represents a 50% increase compared to our previous 2013 BioGRID update. BioGRID data are freely distributed through partner model organism databases and meta-databases and are directly downloadable in a variety of formats. In addition to general curation of the published literature for the major model species, BioGRID undertakes themed curation projects in areas of particular relevance for biomedical sciences, such as the ubiquitin-proteasome system and various human disease-associated interaction networks. BioGRID curation is coordinated through an Interaction Management System (IMS) that facilitates the compilation of interaction records through structured evidence codes, phenotype ontologies, and gene annotation. The BioGRID architecture has been improved in order to support a broader range of interaction and post-translational modification types, to allow the representation of more complex multi-gene/protein interactions, to account for cellular phenotypes through structured ontologies, to expedite curation through semi-automated text-mining approaches, and to enhance curation quality control.


Subject(s)
Databases, Genetic , Gene Regulatory Networks , Protein Interaction Mapping , Arachidonic Acid/metabolism , Disease/genetics , Humans , Internet
18.
Database (Oxford) ; 2013: bat026, 2013.
Article in English | MEDLINE | ID: mdl-23674503

ABSTRACT

PhosphoGRID is an online database that curates and houses experimentally verified in vivo phosphorylation sites in the Saccharomyces cerevisiae proteome (www.phosphogrid.org). Phosphosites are annotated with specific protein kinases and/or phosphatases, along with the condition(s) under which the phosphorylation occurs and/or the effects on protein function. We report here an updated data set, including nine additional high-throughput (HTP) mass spectrometry studies. The version 2.0 data set contains information on 20 177 unique phosphorylated residues, representing a 4-fold increase from version 1.0, and includes 1614 unique phosphosites derived from focused low-throughput (LTP) studies. The overlap between HTP and LTP studies represents only ∼3% of the total unique sites, but importantly 45% of sites from LTP studies with defined function were discovered in at least two independent HTP studies. The majority of new phosphosites in this update occur on previously documented proteins, suggesting that coverage of phosphoproteins in the yeast proteome is approaching saturation. We will continue to update the PhosphoGRID data set, with the expectation that the integration of information from LTP and HTP studies will enable the development of predictive models of phosphorylation-based signaling networks. Database URL: http://www.phosphogrid.org/


Subject(s)
Databases, Protein , Phosphoproteins/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/metabolism , High-Throughput Screening Assays , Phosphorylation , Proteome/metabolism , Signal Transduction
20.
Nat Biotechnol ; 31(1): 34-5, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23302932
...