1.
PLoS Biol ; 19(12): e3001464, 2021 12.
Article in English | MEDLINE | ID: mdl-34871295

ABSTRACT

The UniProt knowledgebase is a public database for protein sequence and function, covering the tree of life and over 220 million protein entries. Now, the whole community can use a new crowdsourcing annotation system to help scale up UniProt curation and receive proper attribution for their biocuration work.


Subject(s)
Crowdsourcing/methods , Data Curation/methods , Molecular Sequence Annotation/methods , Amino Acid Sequence/genetics , Computational Biology/methods , Databases, Protein/trends , Humans , Literature , Proteins/metabolism , Stakeholder Participation
2.
Hum Mutat ; 40(6): 694-705, 2019 06.
Article in English | MEDLINE | ID: mdl-30840782

ABSTRACT

Understanding the association of genetic variation with its functional consequences in proteins is essential for the interpretation of genomic data and identifying causal variants in diseases. Integration of protein function knowledge with genome annotation can assist in rapidly comprehending genetic variation within complex biological processes. Here, we describe mapping UniProtKB human sequences and positional annotations, such as active sites, binding sites, and variants to the human genome (GRCh38) and the release of a public genome track hub for genome browsers. To demonstrate the power of combining protein annotations with genome annotations for functional interpretation of variants, we present specific biological examples in disease-related genes and proteins. Computational comparisons of UniProtKB annotations and protein variants with ClinVar clinically annotated single nucleotide polymorphism (SNP) data show that 32% of UniProtKB variants colocate with 8% of ClinVar SNPs. The majority of colocated UniProtKB disease-associated variants (86%) map to 'pathogenic' ClinVar SNPs. UniProt and ClinVar are collaborating to provide a unified clinical variant annotation for genomic, protein, and clinical researchers. The genome track hubs, and related UniProtKB files, are downloadable from the UniProt FTP site and discoverable as public track hubs at the UCSC and Ensembl genome browsers.


Subject(s)
Chromosome Mapping/methods , Databases, Genetic , Mutation, Missense , Proteins/chemistry , Binding Sites , Databases, Protein , Genetic Predisposition to Disease , Humans , Molecular Sequence Annotation , Polymorphism, Single Nucleotide , Protein Binding , Proteins/genetics , Proteins/metabolism , Software , Web Browser
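
The colocation comparison described in this abstract can be illustrated with a minimal sketch. It assumes variants from both sources have already been mapped to GRCh38 coordinates; the `Variant` type, the `colocated` helper, and all records are hypothetical toy values, not real UniProtKB or ClinVar data or the authors' pipeline.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int        # genomic coordinate (assumed to be on GRCh38)
    source_id: str  # hypothetical identifier from the originating resource


def colocated(a, b):
    """Return the variants in `a` that share a (chrom, pos) with any variant in `b`."""
    positions_b = {(v.chrom, v.pos) for v in b}
    return [v for v in a if (v.chrom, v.pos) in positions_b]


# Toy, made-up records used only to exercise the function.
protein_variants = [
    Variant("chr1", 100, "P1"),
    Variant("chr1", 250, "P2"),
    Variant("chr2", 40, "P3"),
]
clinical_snps = [Variant("chr1", 100, "C1"), Variant("chr3", 7, "C2")]

shared = colocated(protein_variants, clinical_snps)
fraction = len(shared) / len(protein_variants)
print(f"{len(shared)} of {len(protein_variants)} protein variants colocate ({fraction:.0%})")
```

Running the same function with the arguments swapped gives the share of the second set that colocates, which is how two different percentages can describe one overlap.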
4.
JCO Precis Oncol ; 2: 1-11, 2018 Nov.
Article in English | MEDLINE | ID: mdl-35135129

ABSTRACT

PURPOSE: We conducted usability studies on commercially available molecular diagnostic (MDX) test reports to identify strengths and weaknesses in content and form that drive clinical decision making. Given routine genomic testing in cancer medicine, oncologists must interpret MDX reports as well as evidence concerning clinical utility of biomarkers accurately for treatment or trial selection. This work aims to evaluate effectiveness of MDX reports in facilitating cancer treatment planning. METHODS: Fourteen clinicians at an academic tertiary care medical facility, with a wide range of experience in oncology and in the use of molecular testing, participated in this study. Three commercially available, widely used, Clinical Laboratory Improvement Amendments (CLIA)-certified, College of American Pathologists (CAP)-accredited test reports (labeled Laboratories A, B, and C) were used. Eye tracking, surveys, and think-aloud protocols were used to collect usability data for these MDX reports focusing on ease of comprehension and actionability. RESULTS: Clinicians found two primary areas in molecular diagnostic reports most useful for patient care: therapy options with benefit or lack of benefit to patients, including enrolling clinical trials; and pathogenic tumor molecular anomalies detected. Therapeutic implications and therapy classes such as US Food and Drug Administration-approved off-label, on-label, clinical trials were critical for decision making. However, all reports had usability and comprehension issues in these areas and could be improved. CONCLUSION: Focused usability studies can help drive our understanding of the clinical workflow for use of molecular diagnostic tests in cancer care. This in turn can have major effects on quality of care, outcomes, costs, and patient satisfaction. 
This study demonstrates the use of specific usability techniques (eye tracking and think-aloud protocols) to help clinical laboratories improve MDX report design in a precision oncology treatment setting.

5.
Bioinformatics ; 32(13): 2041-3, 2016 07 01.
Article in English | MEDLINE | ID: mdl-27153712

ABSTRACT

MOTIVATION: The enormous number of redundant sequenced genomes has hindered efforts to analyze and functionally annotate proteins. As the taxonomy of viruses is not uniformly defined, viral proteomes pose special challenges in this regard. Grouping viruses based on the similarity of their proteins at proteome scale can normalize against potential taxonomic nomenclature anomalies. RESULTS: We present Viral Reference Proteomes (Viral RPs), which are computed from complete virus proteomes within UniProtKB. Viral RPs based on 95, 75, 55, 35 and 15% co-membership in proteome similarity based clusters are provided. Comparison of our computational Viral RPs with UniProt's curator-selected Reference Proteomes indicates that the two sets are consistent and complementary. Furthermore, each Viral RP represents a cluster of virus proteomes that was consistent with virus or host taxonomy. We provide BLASTP search and FTP download of Viral RP protein sequences, and a browser to facilitate the visualization of Viral RPs. AVAILABILITY AND IMPLEMENTATION: http://proteininformationresource.org/rps/viruses/ CONTACT: chenc@udel.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Databases, Protein , Proteome/analysis , Viral Proteins/analysis , Amino Acid Sequence , Cluster Analysis , Computational Biology , Knowledge Bases
6.
Article in English | MEDLINE | ID: mdl-26396508

ABSTRACT

RATIONALE: Subtypes of cigarette smoke-induced disease affect different lung structures and may have distinct pathophysiological mechanisms. OBJECTIVE: To determine if proteomic classification of the cellular and vascular origins of sputum proteins can characterize these mechanisms and phenotypes. SUBJECTS AND METHODS: Individual sputum specimens from lifelong nonsmokers (n=7) and smokers with normal lung function (n=13), mucous hypersecretion with normal lung function (n=11), obstructed airflow without emphysema (n=15), and obstruction plus emphysema (n=10) were assessed with mass spectrometry. Data reduction, logarithmic transformation of spectral counts, and Cytoscape network-interaction analysis were performed. The original 203 proteins were reduced to the most informative 50. Sources were secretory dimeric IgA, submucosal gland serous and mucous cells, goblet and other epithelial cells, and vascular permeability. RESULTS: Epithelial proteins discriminated nonsmokers from smokers. Mucin 5AC was elevated in healthy smokers and chronic bronchitis, suggesting a continuum with the severity of hypersecretion determined by mechanisms of goblet-cell hyperplasia. Obstructed airflow was correlated with glandular proteins and lower levels of Ig joining chain compared to other groups. Emphysema subjects' sputum was unique, with high plasma proteins and components of neutrophil extracellular traps, such as histones and defensins. In contrast, defensins were correlated with epithelial proteins in all other groups. Protein-network interactions were unique to each group. CONCLUSION: The proteomes were interpreted as complex "biosignatures" that suggest distinct pathophysiological mechanisms for mucin 5AC hypersecretion, airflow obstruction, and inflammatory emphysema phenotypes. Proteomic phenotyping may improve genotyping studies by selecting more homogeneous study groups. 
Each phenotype may require its own mechanistically based diagnostic, risk-assessment, drug- and other treatment algorithms.


Subject(s)
Bronchitis, Chronic/metabolism , Mucin 5AC/metabolism , Pulmonary Disease, Chronic Obstructive/physiopathology , Pulmonary Emphysema/metabolism , Smoking/metabolism , Sputum/metabolism , Adult , Aged , Female , Forced Expiratory Volume , Humans , Immunoglobulin A, Secretory/blood , Male , Middle Aged , Mucus/metabolism , Proteomics
7.
J Proteome Res ; 14(6): 2707-13, 2015 Jun 05.
Article in English | MEDLINE | ID: mdl-25873244

ABSTRACT

The Clinical Proteomic Tumor Analysis Consortium (CPTAC), under the auspices of the National Cancer Institute's Office of Cancer Clinical Proteomics Research, is a comprehensive and coordinated effort to accelerate the understanding of the molecular basis of cancer through the application of proteomic technologies and workflows to clinical tumor samples with characterized genomic and transcript profiles. The consortium analyzes cancer biospecimens using mass spectrometry, identifying and quantifying the constituent proteins and characterizing each tumor sample's proteome. Mass spectrometry enables highly specific identification of proteins and their isoforms, accurate relative quantitation of protein abundance in contrasting biospecimens, and localization of post-translational protein modifications, such as phosphorylation, on a protein's sequence. The combination of proteomics, transcriptomics, and genomics data from the same clinical tumor samples provides an unprecedented opportunity for tumor proteogenomics. The CPTAC Data Portal is the centralized data repository for the dissemination of proteomic data collected by Proteome Characterization Centers (PCCs) in the consortium. The portal currently hosts 6.3 TB of data and includes proteomic investigations of breast, colorectal, and ovarian tumor tissues from The Cancer Genome Atlas (TCGA). The data collected by the consortium is made freely available to the public through the data portal.


Subject(s)
Biomedical Research , Databases, Protein , Neoplasm Proteins , Proteomics , Humans , Information Storage and Retrieval , Neoplasm Proteins/metabolism , Neoplasms/genetics , Neoplasms/metabolism
8.
Bioinformatics ; 31(6): 926-32, 2015 Mar 15.
Article in English | MEDLINE | ID: mdl-25398609

ABSTRACT

MOTIVATION: UniRef databases provide full-scale clustering of UniProtKB sequences and are utilized for a broad range of applications, particularly similarity-based functional annotation. Non-redundancy and intra-cluster homogeneity in UniRef were recently improved by adding a sequence length overlap threshold. Our hypothesis is that these improvements would enhance the speed and sensitivity of similarity searches and improve the consistency of annotation within clusters. RESULTS: Intra-cluster molecular function consistency was examined by analysis of Gene Ontology terms. Results show that UniRef clusters bring together proteins of identical molecular function in more than 97% of the clusters, implying that clusters are useful for annotation and can also be used to detect annotation inconsistencies. To examine coverage in similarity results, BLASTP searches against UniRef50 followed by expansion of the hit lists with cluster members demonstrated advantages compared with searches against UniProtKB sequences; the searches are concise (∼7 times shorter hit list before expansion), faster (∼6 times) and more sensitive in detection of remote similarities (>96% recall at e-value <0.0001). Our results support the use of UniRef clusters as a comprehensive and scalable alternative to native sequence databases for similarity searches and reinforce their reliability for use in functional annotation.


Subject(s)
Computational Biology , Databases, Protein , Dioxygenases/metabolism , Membrane Proteins/metabolism , Proteins/metabolism , Sequence Analysis, Protein , Software , AlkB Homolog 5, RNA Demethylase , Cluster Analysis , Dioxygenases/chemistry , Dioxygenases/genetics , Gene Ontology , Humans , Information Storage and Retrieval , Membrane Proteins/chemistry , Membrane Proteins/genetics , Molecular Sequence Annotation , Proteins/chemistry , Proteins/genetics
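
The "search representatives, then expand with cluster members" step in this abstract can be sketched as follows. This is only an illustration under stated assumptions: `expand_hits`, the cluster table, and the identifiers are hypothetical, and the search itself is taken as already done.

```python
def expand_hits(representative_hits, cluster_members):
    """Expand a ranked list of cluster representatives with their members.

    Keeps the original ranking, places each representative before its
    members, and drops duplicates.
    """
    expanded = []
    seen = set()
    for rep in representative_hits:
        for accession in [rep, *cluster_members.get(rep, [])]:
            if accession not in seen:
                seen.add(accession)
                expanded.append(accession)
    return expanded


# Toy, made-up identifiers; not real UniRef clusters.
clusters = {"R1": ["A", "B"], "R2": ["B", "C"]}
hits = ["R1", "R2"]
print(expand_hits(hits, clusters))
```

The short hit list is what gets searched and ranked; the expansion is a cheap lookup afterwards, which is consistent with the shorter and faster searches the abstract reports.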
9.
BMC Immunol ; 15: 61, 2014 Dec 09.
Article in English | MEDLINE | ID: mdl-25486901

ABSTRACT

BACKGROUND: Near universal administration of vaccines mandates intense pharmacovigilance for vaccine safety and a stringently low tolerance for adverse events. Reports of autoimmune diseases (AID) following vaccination have been challenging to evaluate given the high rates of vaccination, background incidence of autoimmunity, and low incidence and variable times for onset of AID after vaccinations. In order to identify biologically plausible pathways to adverse autoimmune events of vaccine-related AID, we used a systems biology approach to create a matrix of innate and adaptive immune mechanisms active in specific diseases, responses to vaccine antigens, adjuvants, preservatives and stabilizers, for the most common vaccine-associated AID found in the Vaccine Adverse Event Reporting System. RESULTS: This report focuses on Guillain-Barre Syndrome (GBS), Rheumatoid Arthritis (RA), Systemic Lupus Erythematosus (SLE), and Idiopathic (or immune) Thrombocytopenic Purpura (ITP). Multiple curated databases and automated text mining of PubMed literature identified 667 genes associated with RA, 448 with SLE, 49 with ITP and 73 with GBS. While all data sources provided valuable and unique gene associations, text mining using natural language processing (NLP) algorithms provided the most information but required curation to remove incorrect associations. Six genes were associated with all four AIDs. Thirty-three pathways were shared by the four AIDs. Classification of genes into twelve immune system related categories identified more "Th17 T-cell subtype" genes in RA than the other AIDs, and more "Chemokine plus Receptors" genes associated with RA than SLE. Gene networks were visualized and clustered into interconnected modules with specific gene clusters for each AID, including one in RA with ten C-X-C motif chemokines. 
The intersection of genes associated with GBS, GBS peptide auto-antigens, influenza A infection, and influenza vaccination created a subnetwork of genes that inferred a possible role for the MAPK signaling pathway in influenza vaccine related GBS. CONCLUSIONS: Results showing unique and common gene sets, pathways, immune system categories and functional clusters of genes in four autoimmune diseases suggest it is possible to develop molecular classifications of autoimmune and inflammatory events. Combining this information with cellular and other disease responses should greatly aid in the assessment of potential immune-mediated adverse events following vaccination.


Subject(s)
Autoimmune Diseases , Computer Simulation , Infection Control , Infections/immunology , Models, Immunological , Vaccination , Vaccines , Adaptive Immunity , Autoimmune Diseases/genetics , Autoimmune Diseases/immunology , Autoimmune Diseases/pathology , Humans , Infections/genetics , Infections/pathology , Vaccines/adverse effects , Vaccines/immunology
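
Finding genes associated with all four diseases, as reported in this abstract, amounts to a set intersection. The sketch below assumes each disease already has a curated gene set; the gene labels are placeholders, not the genes identified in the study.

```python
def shared_genes(gene_sets):
    """Return the genes present in every set of a non-empty {disease: set} mapping."""
    return set.intersection(*gene_sets.values())


# Placeholder gene labels, invented for illustration only.
gene_sets = {
    "RA": {"GENE_A", "GENE_B", "GENE_C"},
    "SLE": {"GENE_A", "GENE_C", "GENE_D"},
    "ITP": {"GENE_A", "GENE_C"},
    "GBS": {"GENE_C", "GENE_A", "GENE_E"},
}

common = shared_genes(gene_sets)
print(sorted(common))
```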
10.
J Am Med Inform Assoc ; 19(e1): e125-8, 2012 Jun.
Article in English | MEDLINE | ID: mdl-22323393

ABSTRACT

Quality control and harmonization of data is a vital and challenging undertaking for any successful data coordination center and a responsibility shared between the multiple sites that produce, integrate, and utilize the data. Here we describe a coordinated effort between scientists and data managers in the Cancer Family Registries to implement a data governance infrastructure consisting of both organizational and technical solutions. The technical solution uses a rule-based validation system that facilitates error detection and correction for data centers submitting data to a central informatics database. Validation rules comprise both standard checks on allowable values and a crosscheck of related database elements for logical and scientific consistency. Evaluation over a 2-year timeframe showed a significant decrease in the number of errors in the database and a concurrent increase in data consistency and accuracy.


Subject(s)
Breast Neoplasms , Colonic Neoplasms , Databases, Factual/standards , Registries/standards , Breast Neoplasms/epidemiology , Colonic Neoplasms/epidemiology , Databases, Factual/statistics & numerical data , Humans , Quality Control , Research Design , United States
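
The two kinds of validation rules this abstract describes (checks on allowable values and crosschecks between related elements) can be sketched as below. The field names, allowed values, and rules are hypothetical examples, not the registry's actual schema or rule set.

```python
# Hypothetical allowable-value rules; the real schema is not given in the abstract.
ALLOWED = {
    "sex": {"F", "M", "U"},
    "vital_status": {"alive", "deceased"},
}


def check_allowed(record):
    """Standard check: each coded field must hold one of its allowed values."""
    errors = []
    for field, allowed in ALLOWED.items():
        value = record.get(field)
        if value not in allowed:
            errors.append(f"{field}: value {value!r} is not allowed")
    return errors


def check_consistency(record):
    """Crosscheck: related fields must be logically consistent with each other."""
    errors = []
    birth = record.get("birth_year")
    diagnosis = record.get("diagnosis_year")
    if birth is not None and diagnosis is not None and diagnosis < birth:
        errors.append("diagnosis_year precedes birth_year")
    return errors


def validate(record):
    """Run all rules and return the list of error messages (empty if clean)."""
    return check_allowed(record) + check_consistency(record)


good = {"sex": "F", "vital_status": "alive", "birth_year": 1950, "diagnosis_year": 2001}
bad = {"sex": "X", "vital_status": "alive", "birth_year": 1980, "diagnosis_year": 1975}
print(validate(good))
print(validate(bad))
```

Returning messages instead of raising on the first failure lets a submitting site see every problem in a record at once, which suits a detect-and-correct workflow.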
11.
Bioinformatics ; 27(8): 1190-1, 2011 Apr 15.
Article in English | MEDLINE | ID: mdl-21478197

ABSTRACT

MOTIVATION: Identifier (ID) mapping establishes links between various biological databases and is an essential first step for molecular data integration and functional annotation. ID mapping allows diverse molecular data on genes and proteins to be combined and mapped to functional pathways and ontologies. We have developed comprehensive protein-centric ID mapping services providing mappings for 90 IDs derived from databases on genes, proteins, pathways, diseases, structures, protein families, protein interaction, literature, ontologies, etc. The services are widely used and have been regularly updated since 2006. AVAILABILITY: www.uniprot.org/mapping and proteininformationresource.org/pirwww/search/idmapping.shtml CONTACT: huang@dbi.udel.edu.


Subject(s)
Databases, Protein , Proteins/chemistry , Proteins/genetics , Software , Internet
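
Protein-centric ID mapping of the kind described in this abstract can be pictured as a lookup from (database, identifier) pairs to protein accessions. The sketch below uses an in-memory table with invented identifiers; it does not call the real mapping services, and `build_index` and `map_ids` are hypothetical helper names.

```python
from collections import defaultdict


def build_index(rows):
    """Index (database, external_id, accession) rows by (database, external_id)."""
    index = defaultdict(set)
    for database, external_id, accession in rows:
        index[(database, external_id)].add(accession)
    return index


def map_ids(index, database, ids):
    """Map external IDs to sorted accession lists; also report unmapped IDs."""
    mapped = {i: sorted(index[(database, i)]) for i in ids if (database, i) in index}
    unmapped = [i for i in ids if (database, i) not in index]
    return mapped, unmapped


# Invented identifiers for illustration; not real database entries.
rows = [
    ("GeneDB", "G1", "ACC_0001"),
    ("GeneDB", "G1", "ACC_0002"),  # one gene ID can map to several proteins
    ("GeneDB", "G2", "ACC_0003"),
    ("StructDB", "S1", "ACC_0001"),
]
index = build_index(rows)
mapped, unmapped = map_ids(index, "GeneDB", ["G1", "G2", "G9"])
print(mapped, unmapped)
```

Keeping a set per key makes the one-to-many case explicit, and returning the unmapped IDs separately avoids silently dropping inputs.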
12.
Methods Mol Biol ; 694: 323-39, 2011.
Article in English | MEDLINE | ID: mdl-21082443

ABSTRACT

High-throughput proteomic, microarray, protein interaction and other experimental methods all generate long lists of proteins and/or genes that have been identified or have varied in accumulation under the experimental conditions studied. These lists can be difficult for biologists to sort through and make sense of. Here we describe a next step in data analysis, a bottom-up approach to data integration: starting with protein sequence identifications, mapping them to a common representation of the protein, bringing in a wide variety of structural, functional, genetic, and disease information related to proteins derived from annotated knowledge bases, and then using this information to categorize the lists using Gene Ontology (GO) terms and mappings to biological pathway databases. We illustrate with examples how this can aid in identifying important processes from large complex lists.


Subject(s)
Databases, Protein , Proteins/analysis , Proteomics/methods , Bacillus anthracis/drug effects , Bacillus anthracis/metabolism , Bacterial Proteins/analysis , Magnesium/pharmacology , Metabolic Networks and Pathways/drug effects , Salmonella typhimurium/drug effects , Salmonella typhimurium/metabolism
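
The final categorization step in this abstract, grouping a protein list by annotation terms, can be sketched as a simple tally. The protein accessions and term labels below are placeholders, and the annotation table is assumed to have been built beforehand from a knowledge base.

```python
from collections import Counter


def categorize(proteins, annotations):
    """Count how many proteins in the list carry each annotation term."""
    counts = Counter()
    for protein in proteins:
        for term in annotations.get(protein, ()):
            counts[term] += 1
    return counts


# Placeholder accessions and term labels, invented for illustration.
annotations = {
    "ACC_1": ["TERM_transport", "TERM_stress_response"],
    "ACC_2": ["TERM_stress_response"],
    "ACC_3": ["TERM_metabolism"],
}
experiment_list = ["ACC_1", "ACC_2", "ACC_4"]  # ACC_4 has no annotation

counts = categorize(experiment_list, annotations)
print(counts.most_common())
```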
13.
Adv Bioinformatics ; : 423589, 2010.
Article in English | MEDLINE | ID: mdl-20369061

ABSTRACT

High-throughput "omics" technologies bring new opportunities for biological and biomedical researchers to ask complex questions and gain new scientific insights. However, the voluminous, complex, and context-dependent data maintained in heterogeneous and distributed environments, together with the lack of well-defined data standards and standardized nomenclature, pose a major challenge that requires advanced computational methods and bioinformatics infrastructures for integration, mining, visualization, and comparative analysis to facilitate data-driven hypothesis generation and biological knowledge discovery. In this paper, we present the challenges in high-throughput "omics" data integration and analysis, introduce a protein-centric approach for systems integration of large and heterogeneous high-throughput "omics" data including microarray, mass spectrometry, protein sequence, protein structure, and protein interaction data, and use a scientific case study to illustrate how one can use varied "omics" data from different laboratories to make useful connections that could lead to new biological knowledge.

14.
PLoS One ; 4(9): e7162, 2009 Sep 25.
Article in English | MEDLINE | ID: mdl-19779614

ABSTRACT

The NIAID (National Institute of Allergy and Infectious Diseases) Biodefense Proteomics program aims to identify targets for potential vaccines, therapeutics, and diagnostics for agents of concern in bioterrorism, including bacterial, parasitic, and viral pathogens. The program includes seven Proteomics Research Centers, generating diverse types of pathogen-host data, including mass spectrometry, microarray transcriptional profiles, protein interactions, protein structures and biological reagents. The Biodefense Resource Center (www.proteomicsresource.org) has developed a bioinformatics framework, employing a protein-centric approach to integrate and support mining and analysis of the large and heterogeneous data. Underlying this approach is a data warehouse with comprehensive protein + gene identifier and name mappings and annotations extracted from over 100 molecular databases. Value-added annotations are provided for key proteins from experimental findings using controlled vocabulary. The availability of pathogen and host omics data in an integrated framework allows global analysis of the data and comparisons across different experiments and organisms, as illustrated in several case studies presented here. (1) The identification of a hypothetical protein with differential gene and protein expressions in two host systems (mouse macrophage and human HeLa cells) infected by different bacterial (Bacillus anthracis and Salmonella typhimurium) and viral (orthopox) pathogens suggesting that this protein can be prioritized for additional analysis and functional characterization. (2) The analysis of a vaccinia-human protein interaction network supplemented with protein accumulation levels led to the identification of human Keratin, type II cytoskeletal 4 protein as a potential therapeutic target. 
(3) Comparison of complete genomes from pathogenic variants coupled with experimental information on complete proteomes allowed the identification and prioritization of ten potential diagnostic targets from Bacillus anthracis. The integrative analysis across data sets from multiple centers can reveal potential functional significance and hidden relationships between pathogen and host proteins, thereby providing a systems approach to basic understanding of pathogenicity and target identification.


Subject(s)
Computational Biology/methods , Host-Pathogen Interactions , Proteins/chemistry , Proteomics/methods , Animals , Bacillus anthracis/metabolism , Cluster Analysis , Databases, Protein , Gene Expression Profiling , Genetics , Genomics/methods , Humans , Mice , Protein Structure, Tertiary , Proteome