Results 1 - 20 of 93
1.
Database (Oxford) ; 2021, 2021 07 28.
Article in English | MEDLINE | ID: mdl-34318869

ABSTRACT

About 10% of human proteins have no annotated function in protein knowledge bases. A workflow to generate hypotheses for the function of these uncharacterized proteins has been developed, based on predicted and experimental information on protein properties, interactions, tissue expression, subcellular localization, conservation in other organisms, as well as phenotypic data in mutant model organisms. This workflow has been applied to seven uncharacterized human proteins (C6orf118, C7orf25, CXorf58, RSRP1, SMLR1, TMEM53 and TMEM232) as part of a course-based undergraduate research experience named Functionathon, organized at the University of Geneva to teach undergraduate students how to use biological databases and bioinformatics tools and interpret the results. C6orf118, CXorf58 and TMEM232 were proposed to be involved in cilia-related functions; TMEM53 and SMLR1 were proposed to be involved in lipid metabolism; and C7orf25 and RSRP1 were proposed to be involved in RNA metabolism and gene expression. Experimental strategies to test these hypotheses were also discussed. The results of this manual data mining study may contribute to the project recently launched by the Human Proteome Organization (HUPO) Human Proteome Project aiming to fill gaps in the functional annotation of human proteins. Database URL: http://www.nextprot.org.


Subject(s)
Data Mining , Proteome , Databases, Protein , Humans , Students , Workflow
2.
Nat Commun ; 11(1): 5301, 2020 10 16.
Article in English | MEDLINE | ID: mdl-33067450

ABSTRACT

The Human Proteome Organization (HUPO) launched the Human Proteome Project (HPP) in 2010, creating an international framework for global collaboration, data sharing, quality assurance and enhancing accurate annotation of the genome-encoded proteome. During the subsequent decade, the HPP established collaborations, developed guidelines and metrics, and undertook reanalysis of previously deposited community data, continuously increasing the coverage of the human proteome. On the occasion of the HPP's tenth anniversary, we here report a 90.4% complete high-stringency human proteome blueprint. This knowledge is essential for discerning molecular processes in health and disease, as we demonstrate by highlighting potential roles the human proteome plays in our understanding, diagnosis and treatment of cancers, cardiovascular and infectious diseases.


Subject(s)
Disease/genetics , Proteome/genetics , Human Genome Project , Humans , Proteome/chemistry , Proteome/metabolism , Proteomics
3.
Cancer Res ; 80(20): 4314-4323, 2020 10 15.
Article in English | MEDLINE | ID: mdl-32641416

ABSTRACT

Spread of cancer to the brain remains an unmet clinical need in spite of the increasing number of cases, most notably among patients with lung cancer, breast cancer, and melanoma. Although research on brain metastasis was considered a minor aspect in the past due to its untreatable nature and invariable lethality, limited but encouraging examples have now called this view into question, making the field more attractive for basic and clinical researchers. Evidence of its own biological identity (i.e., specific microenvironment) and particular therapeutic requirements (i.e., presence of blood-brain barrier, blood-tumor barrier, molecular differences with the primary tumor) are thought to be critical aspects that must be functionally exploited using preclinical models. We present the coordinated effort of 19 laboratories to compile comprehensive information related to brain metastasis experimental models. Each laboratory has provided details on the cancer cell lines they have generated or characterized as being capable of forming metastatic colonies in the brain, as well as principal methodologies of brain metastasis research. The Brain Metastasis Cell Lines Panel (BrMPanel) represents the first of its class and includes information about the cell line, how tropism to the brain was established, and the behavior of each model in vivo. These and other aspects described are intended to assist investigators in choosing the most suitable cell line for research on brain metastasis. The main goal of this effort is to facilitate research on this unmet clinical need, to improve models through a collaborative environment, and to promote the exchange of information on these valuable resources.


Subject(s)
Brain Neoplasms/pathology , Brain Neoplasms/secondary , Neoplasms, Experimental/pathology , Animals , Blood-Brain Barrier/drug effects , Cell Culture Techniques/methods , Cell Line, Tumor , Humans , Mice , Rats , Tropism , Tumor Microenvironment , Xenograft Model Antitumor Assays
4.
Bioinformatics ; 36(10): 3244-3245, 2020 05 01.
Article in English | MEDLINE | ID: mdl-31985787

ABSTRACT

SUMMARY: The Feature-Viewer is a lightweight library for the visualization of biological data mapped to a protein or nucleotide sequence. It is designed for ease of use while allowing full customization. The library is already used by several biological data resources and allows intuitive visual mapping of a full spectrum of sequence features for different usages. AVAILABILITY AND IMPLEMENTATION: The Feature-Viewer is open source, compatible with state-of-the-art development technologies and responsive, including for mobile viewing. Documentation and usage examples are available online.


Subject(s)
Computers , Software
5.
Nucleic Acids Res ; 48(D1): D261-D264, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31410491

ABSTRACT

The ABCD (for AntiBodies Chemically Defined) database is a repository of sequenced antibodies, integrating curated information about the antibody and its antigen with cross-links to standardized databases of chemical and protein entities. It is freely available to the academic community, accessible through the ExPASy server (https://web.expasy.org/abcd/). The ABCD database aims to help improve reproducibility in academic research by providing a unique, unambiguous identifier associated with each antibody sequence. It also makes it possible to determine rapidly whether a sequenced antibody is available for a given antigen.


Subject(s)
Antibodies/chemistry , Databases, Protein , Amino Acid Sequence , Antibodies/immunology , Antigens/chemistry , Antigens/immunology
6.
Int J Cancer ; 146(5): 1299-1306, 2020 03 01.
Article in English | MEDLINE | ID: mdl-31444973

ABSTRACT

Despite an increased awareness of the problem of cell line cross-contamination and misidentification, it remains a major source of erroneous experimental results in biomedical research. To prevent it, researchers are expected to frequently test the authenticity of the cell lines they are working on. STR profiling was selected as the international reference method to perform cell line authentication. While the experimental protocols and manipulations for generating an STR profile are well described, the available tools and workflows to analyze such data are lacking. The Cellosaurus knowledge resource aimed to improve the situation by compiling all the publicly available STR profiles from the literature and other databases. As a result, it grew to become the largest database in terms of human STR profiles, with 6,474 distinct cell lines having an associated STR profile (release July 31, 2019). Here we present CLASTR, the Cellosaurus STR similarity search tool enabling users to compare one or more STR profiles with those available in the Cellosaurus cell line knowledge resource. It aims to help researchers in the process of cell line authentication by providing numerous functionalities. The tool is publicly accessible on the SIB ExPASy server (https://web.expasy.org/cellosaurus-str-search) and its source code is available on GitHub under the GPL-3.0 license.
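
The abstract does not say how CLASTR scores the similarity of two STR profiles. As a rough illustration only, the sketch below uses a Tanimoto-style score (shared alleles divided by the union of alleles), a measure often used for this kind of comparison. The function name, the marker set and the allele values are all hypothetical and are not taken from the tool.

```python
from typing import Dict, Set

# A profile maps an STR marker name to the set of alleles called at that marker.
Profile = Dict[str, Set[str]]


def tanimoto_score(query: Profile, reference: Profile) -> float:
    """Return a percent similarity computed over markers present in both profiles.

    Assumption: score = shared / (query + reference - shared), counted in alleles.
    This is an illustrative choice, not a description of CLASTR's scoring.
    """
    shared_markers = query.keys() & reference.keys()
    n_query = sum(len(query[m]) for m in shared_markers)
    n_reference = sum(len(reference[m]) for m in shared_markers)
    n_shared = sum(len(query[m] & reference[m]) for m in shared_markers)
    denominator = n_query + n_reference - n_shared
    return 100.0 * n_shared / denominator if denominator else 0.0


# Hypothetical allele calls, made up for the example.
query = {"TH01": {"6", "9.3"}, "D5S818": {"11", "12"}, "TPOX": {"8"}}
reference = {"TH01": {"6", "9.3"}, "D5S818": {"11", "13"}, "TPOX": {"8", "11"}}
score = tanimoto_score(query, reference)
```

Restricting the comparison to markers typed in both profiles avoids penalizing a profile simply because fewer loci were genotyped; whether that is appropriate depends on the authentication guideline being followed.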


Subject(s)
Cell Line Authentication/methods , Data Mining/methods , Microsatellite Repeats/genetics , Animals , Biomarkers/analysis , Cell Line , DNA Fingerprinting , Databases, Factual , Dogs , Humans , Mice , Software
7.
Nucleic Acids Res ; 48(D1): D328-D334, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31724716

ABSTRACT

The neXtProt knowledgebase (https://www.nextprot.org) is an integrative resource providing both data on human proteins and the tools to explore them. In order to provide comprehensive and up-to-date data, we evaluate and add new data sets. We describe the incorporation of three new data sets that provide expression, function, protein-protein binary interaction, post-translational modifications (PTM) and variant information. New SPARQL query examples illustrating uses of the new data were added. neXtProt has continued to develop tools for proteomics. We have improved the peptide uniqueness checker and have implemented a new protein digestion tool. Together, these tools make it possible to determine which proteases can be used to identify trypsin-resistant proteins by mass spectrometry. In terms of usability, we have finished revamping our web interface and completely rewritten our API. Our SPARQL endpoint now supports federated queries. All the neXtProt data are available via our user interface, API, SPARQL endpoint and FTP site, including the new PEFF 1.0 format files. Finally, the data on our FTP site are now licensed under CC BY 4.0 to promote their reuse.
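
To make the idea of in silico protein digestion concrete, here is a minimal sketch. It assumes the commonly cited trypsin rule (cleave after K or R unless the next residue is P). It does not reproduce the neXtProt digestion tool, whose rules and options the abstract does not describe. The function name and the example sequence are hypothetical.

```python
from typing import List


def digest_trypsin(sequence: str, missed_cleavages: int = 0) -> List[str]:
    """Split a protein sequence into tryptic peptides.

    Assumed rule: cleave C-terminal to K or R, except when followed by P.
    Peptides spanning up to `missed_cleavages` skipped sites are also returned.
    """
    fragments: List[str] = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Trailing stretch after the last cleavage site.
        fragments.append(sequence[start:])

    peptides: List[str] = []
    for n_missed in range(missed_cleavages + 1):
        for j in range(len(fragments) - n_missed):
            peptides.append("".join(fragments[j:j + n_missed + 1]))
    return peptides


# Made-up sequence: the R followed by P is not cleaved.
peptides = digest_trypsin("AAKRPGGRCC")
```

A protein that yields no peptides of suitable length under this rule is the kind of case where digestion with a different protease may be worth simulating.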


Subject(s)
Databases, Protein , Knowledge Bases , Humans , Internet , Mass Spectrometry , Peptides/chemistry , Protein Kinases/chemistry , Protein Kinases/metabolism , Protein Processing, Post-Translational , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , Sequence Analysis, RNA , Software , Trypsin , User-Computer Interface
8.
Chem Res Toxicol ; 32(9): 1733-1736, 2019 09 16.
Article in English | MEDLINE | ID: mdl-31203605

ABSTRACT

Research in toxicology relies on in vitro models such as cell lines. These living models are prone to change and may be described in publications with insufficient information or quality control testing. This article sets out recommendations to improve the reliability of cell-based research.


Subject(s)
Cell Culture Techniques/standards , Cell Line , Models, Biological , Animals , Cell Line Authentication , Humans , Quality Control , Reproducibility of Results , Toxicology/methods , Toxicology/standards
10.
Elife ; 8, 2019 01 29.
Article in English | MEDLINE | ID: mdl-30693867

ABSTRACT

The use of misidentified and contaminated cell lines continues to be a problem in biomedical research. Research Resource Identifiers (RRIDs) should reduce the prevalence of misidentified and contaminated cell lines in the literature by alerting researchers to cell lines that are on the list of problematic cell lines, which is maintained by the International Cell Line Authentication Committee (ICLAC) and the Cellosaurus database. To test this assertion, we text-mined the methods sections of about two million papers in PubMed Central, identifying 305,161 unique cell-line names in 150,459 articles. We estimate that 8.6% of these cell lines were on the list of problematic cell lines, whereas only 3.3% of the cell lines in the 634 papers that included RRIDs were on the problematic list. This suggests that the use of RRIDs is associated with a lower reported use of problematic cell lines.


Subject(s)
Bibliometrics , Biomedical Research/standards , Cell Line Authentication/statistics & numerical data , Data Mining/methods , Cell Line , Humans , Periodicals as Topic , PubMed
11.
J Proteome Res ; 17(12): 4160-4170, 2018 12 07.
Article in English | MEDLINE | ID: mdl-30175587

ABSTRACT

The practice of data sharing in the proteomics field took off and quickly spread in recent years as a result of collective effort. Nowadays, most journal editors mandate the submission of the original raw mass spectra to one of the databases of the ProteomeXchange consortium. With the exception of large institutional initiatives such as PeptideAtlas or the GPMDB, few new studies are, however, based on the reanalysis of mass spectrometry data. A wealth of information is thus left unexploited in public databases and repositories. Here, we present the large-scale reanalysis of 41 publicly available data sets corresponding to experiments carried out on the HeLa cancer cell line using a custom workflow. In addition to the search for new post-translational modification sites and "missing proteins", our main goal is to identify single amino acid variants and evaluate their impact on protein expression and stability through the spectral counting quantification approach. The X!Tandem software was selected to perform the search of a total of 56 363 701 tandem mass spectra against a customized variant protein database, compiled by the application of the in-house MzVar tool on HeLa-specific somatic and genomic variants retrieved from the COSMIC cell line project. After filtering the resulting identifications with a 1% FDR threshold computed at the protein level, 49 466 unique peptides were identified in 7266 protein entries, allowing the validation of 5576 protein entries in accordance with the HPP guidelines version 2.1. A new "missing protein" was observed (FRAT2, NX_O75474, chromosome 10), and 189 new phosphorylation and 392 new protein N-terminal acetylation sites could be identified. Twenty-four variant peptides were also identified, corresponding to 21 variants in 21 proteins. For three of the nine heterozygous cases where both the variant peptide and its wild-type counterpart were detected, the application of a two-tailed sign test showed a significant difference in the abundance of the two peptide versions.
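
For readers unfamiliar with the two-tailed sign test mentioned above, the sketch below shows how such a p-value can be computed from paired comparisons. The pairing scheme is an assumption: it counts the data sets in which the variant peptide has more spectral counts than the wild-type peptide and the data sets in which it has fewer, with ties dropped. The abstract does not detail how the authors set up the test, and the counts used here are invented.

```python
from math import comb


def sign_test_two_tailed(n_positive: int, n_negative: int) -> float:
    """Two-tailed sign test p-value under the null hypothesis P(positive) = 0.5.

    n_positive / n_negative: number of pairs where the difference is
    positive / negative (tied pairs are assumed to have been discarded).
    """
    n = n_positive + n_negative
    if n == 0:
        return 1.0
    k = min(n_positive, n_negative)
    # Probability of observing k or fewer of the rarer sign under Binomial(n, 0.5).
    one_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * one_tail)


# Invented example: the variant peptide is more abundant in 9 of 10 data sets.
p_value = sign_test_two_tailed(n_positive=9, n_negative=1)
```

The sign test uses only the direction of each paired difference, which makes it a conservative option when spectral counts are small and their distribution is unknown.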


Subject(s)
Databases, Protein , Genetic Variation , Protein Processing, Post-Translational , Proteome/analysis , Acetylation , Amino Acid Sequence , Cell Line, Tumor , HeLa Cells , Humans , Phosphorylation , Proteomics/methods , Software
12.
J Proteome Res ; 17(12): 4211-4226, 2018 12 07.
Article in English | MEDLINE | ID: mdl-30191714

ABSTRACT

20,230 protein-coding genes have been predicted from the analysis of the human genome (neXtProt release 2018-01-17), and about 10% of them are still lacking functional annotation, either predicted by bioinformatics tools or captured from experimental reports. A systematic exploration of the available literature on uncharacterized human genes/proteins led to the proposal of functional annotations for 113 proteins and to the consolidation of a list of 1,862 uncharacterized human proteins. The advanced search functionality of neXtProt was used extensively in order to examine the landscape of the uncharacterized human proteome in terms of subcellular locations, protein-protein interactions, tissue expression, association with diseases, and 3D structure. Finally, deep data mining in various publicly available resources allowed building functional hypotheses for 26 uncharacterized human proteins validated at protein level (uPE1). These hypotheses cover the fields of cilia biology, male reproduction, metabolism, nervous system, immunity, inflammation, RNA metabolism, and chromatin biology. They will require experimental validation before they can be considered for annotation. Despite technological progress, the pace of human protein characterization studies is still slow. It could be accelerated by a better integration of existing knowledge resources and by initiating large collaborative projects involving specialists from different fields of biology. We hope that our analysis will help lay the groundwork for such collaborative approaches and will be exploited by the HUPO Human Proteome Project teams committed to characterizing uPE1 proteins.


Subject(s)
Molecular Sequence Annotation , Proteome/genetics , Computational Biology , Data Mining , Genome, Human/genetics , Humans , Methods , Proteome/analysis
13.
Hum Genomics ; 12(1): 36, 2018 07 11.
Article in English | MEDLINE | ID: mdl-29996917

ABSTRACT

BACKGROUND: Germline pathogenic variants in the breast cancer type 1 susceptibility gene BRCA1 are associated with a 60% lifetime risk for breast and ovarian cancer. This overall risk estimate is for all BRCA1 variants; obviously, not all variants confer the same risk of developing a disease. In cancer patients, loss of BRCA1 function in tumor tissue has been associated with an increased sensitivity to platinum agents and to poly-(ADP-ribose) polymerase (PARP) inhibitors. For clinical management of both at-risk individuals and cancer patients, it would be important that each identified genetic variant be associated with clinical significance. Unfortunately, for the vast majority of variants, the clinical impact is unknown. The availability of results from studies assessing the impact of variants on protein function may provide insight of crucial importance. RESULTS AND CONCLUSION: We have collected, curated, and structured the molecular and cellular phenotypic impact of 3654 distinct BRCA1 variants. The data was modeled in triple format, using the variant as a subject, the studied function as the object, and a predicate describing the relation between the two. Each annotation is supported by fully traceable evidence. The data was captured using standard ontologies to ensure consistency, and enhance searchability and interoperability. We have assessed the extent to which functional defects at the molecular and cellular levels correlate with the clinical interpretation of variants by ClinVar submitters. Approximately 30% of the ClinVar BRCA1 missense variants have some molecular or cellular assay available in the literature. Pathogenic variants (as assigned by ClinVar) have at least some significant functional defect in 94% of testable cases. Among the ClinVar benign variants for which the neXtProt Cancer variant portal has data, 77% show either no or only mild experimental functional defects. While this does not provide evidence for clinical interpretation of variants, it may provide some guidance for variants of unknown significance, in the absence of more reliable data. The neXtProt Cancer variant portal (https://www.nextprot.org/portals/breast-cancer) contains over 6300 observations at the molecular and/or cellular level for BRCA1 variants.
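
The triple format described above (variant as subject, studied function as object, a predicate linking them, plus supporting evidence) can be pictured with a small data structure. This is only a sketch: the class name, field names and all values are hypothetical placeholders, and it does not reflect the portal's actual schema or vocabulary.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VariantAnnotation:
    subject: str      # the variant
    predicate: str    # relation between the variant and the function
    object_term: str  # the studied function (ideally an ontology term)
    evidence: str     # traceable source supporting the statement


def annotations_for(variant: str, annotations: List[VariantAnnotation]) -> List[VariantAnnotation]:
    """Return every annotation whose subject is the given variant."""
    return [a for a in annotations if a.subject == variant]


# Placeholder records, invented for the example.
records = [
    VariantAnnotation("variant-A", "decreases", "function-term-1", "reference-1"),
    VariantAnnotation("variant-A", "has-no-impact-on", "function-term-2", "reference-2"),
    VariantAnnotation("variant-B", "decreases", "function-term-1", "reference-3"),
]
variant_a_records = annotations_for("variant-A", records)
```

Keeping the evidence on each triple, rather than on the variant as a whole, is what allows every individual statement to be traced back to its source.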


Subject(s)
BRCA1 Protein/genetics , Breast Neoplasms/genetics , Genetic Predisposition to Disease , Ovarian Neoplasms/genetics , Adult , Aged , BRCA1 Protein/chemistry , Breast Neoplasms/pathology , Computational Biology , Female , Genetic Variation , Germ-Line Mutation/genetics , Humans , Middle Aged , Ovarian Neoplasms/pathology , Protein Conformation
14.
J Biomol Tech ; 29(2): 25-38, 2018 07.
Article in English | MEDLINE | ID: mdl-29805321

ABSTRACT

The Cellosaurus is a knowledge resource on cell lines. It aims to describe all cell lines used in biomedical research. Its scope encompasses both vertebrates and invertebrates. Currently, information for >100,000 cell lines is provided. For each cell line, it provides a wealth of information, cross-references, and literature citations. The Cellosaurus is available on the ExPASy server (https://web.expasy.org/cellosaurus/) and can be downloaded in a variety of formats. Among its many uses, the Cellosaurus is a key resource to help researchers identify potentially contaminated/misidentified cell lines, thus contributing to improving the quality of research in the life sciences.


Subject(s)
Cell Line/classification , Computational Biology/methods , Databases, Factual , Humans , Software
15.
Cancers (Basel) ; 10(3), 2018 03 01.
Article in English | MEDLINE | ID: mdl-29494549

ABSTRACT

Protein kinases are a large family of enzymes catalyzing protein phosphorylation. The human genome contains 518 protein kinase genes, 478 of which belong to the classical protein kinase family and 40 are atypical protein kinases [...].

16.
Stem Cell Reports ; 10(1): 1-6, 2018 01 09.
Article in English | MEDLINE | ID: mdl-29320760

ABSTRACT

Unambiguous cell line authentication is essential to avoid loss of association between data and cells. The risk of losing references increases with the speed at which new human pluripotent stem cell (hPSC) lines are generated, exchanged, and implemented. Ideally, a single name should be used as a generally applied reference for each cell line to access and unify cell-related information across publications, cell banks, cell registries, and databases and to ensure scientific reproducibility. We discuss the needs and requirements for such a unique identifier and implement a standard nomenclature for hPSCs, which can be automatically generated and registered by the human pluripotent stem cell registry (hPSCreg). To avoid ambiguities in PSC-line referencing, we strongly urge publishers to demand registration and use of the standard name when publishing research based on hPSC lines.


Subject(s)
Biological Specimen Banks , Databases, Factual , Pluripotent Stem Cells , Registries , Terminology as Topic , Humans
18.
Hum Mutat ; 38(5): 485-493, 2017 05.
Article in English | MEDLINE | ID: mdl-28168870

ABSTRACT

Voltage-gated sodium channels are pore-forming transmembrane proteins that selectively allow sodium ions to flow across the plasma membrane according to the electrochemical gradient, thus mediating the rising phase of action potentials in excitable cells and playing key roles in physiological processes such as neurotransmission, skeletal muscle contraction, heart rhythm, and pain sensation. Genetic variations in the nine human genes encoding these channels are known to cause a large range of diseases affecting the nervous and cardiac systems. Understanding the molecular effect of genetic variations is critical for elucidating the pathologic mechanisms of known variations and for predicting the effect of newly discovered ones. To this end, we have created a Web-based tool, the Ion Channels Variants Portal, which compiles all variants characterized functionally in the human sodium channel genes. This portal describes 672 variants, each associated with at least one molecular or clinical phenotypic impact, for a total of 4,658 observations extracted from 264 different research articles. These data were captured as structured annotations using standardized vocabularies and ontologies, such as the Gene Ontology and the Ion Channel ElectroPhysiology Ontology. All these data are available to the scientific community via neXtProt at https://www.nextprot.org/portals/navmut.


Subject(s)
Computational Biology , Databases, Genetic , Mutation , Voltage-Gated Sodium Channels/genetics , Voltage-Gated Sodium Channels/metabolism , Animals , Computational Biology/methods , Electrophysiological Phenomena/genetics , Genetic Association Studies , Genetic Predisposition to Disease , Genotype , Humans , Molecular Sequence Annotation , Phenotype , Protein Domains , Severity of Illness Index , Software , Voltage-Gated Sodium Channels/chemistry , Web Browser
19.
Nucleic Acids Res ; 45(D1): D177-D182, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899619

ABSTRACT

The neXtProt human protein knowledgebase (https://www.nextprot.org) continues to add new content and tools, with a focus on proteomics and genetic variation data. neXtProt now has proteomics data for over 85% of the human proteins, as well as new tools tailored to the proteomics community. Moreover, the neXtProt release 2016-08-25 includes over 8000 phenotypic observations for over 4000 variations in a number of genes involved in hereditary cancers and channelopathies. These changes are presented in the current neXtProt update. All of the neXtProt data are available via our user interface and FTP site. We also provide API access and a SPARQL endpoint for more technical applications.


Subject(s)
Databases, Protein , Proteomics , Genetic Association Studies , Genetic Variation , Humans , Internet , Phenotype , Proteomics/methods , Software , Web Browser
20.
J Proteome Res ; 15(11): 3971-3978, 2016 11 04.
Article in English | MEDLINE | ID: mdl-27487287

ABSTRACT

Within the C-HPP, the Swiss and French teams are responsible for the annotation of proteins from chromosomes 2 and 14, respectively. neXtProt currently reports 1231 entries on chromosome 2 and 624 entries on chromosome 14; of these, 134 and 93 entries are still not experimentally validated and are thus considered as "missing proteins" (PE2-4), respectively. Among these entries, some may never be validated by conventional MS/MS approaches because of incompatible biochemical features. Others have already been validated but are still awaiting annotation. On the basis of information retrieved from the literature and from three of the main C-HPP resources (Human Protein Atlas, PeptideAtlas, and neXtProt), a subset of 40 theoretically detectable missing proteins (25 on chromosome 2 and 15 on chromosome 14) was defined for upcoming targeted studies in sperm samples. This list is proposed as a roadmap for the French and Swiss teams in the near future.


Subject(s)
Chromosomes, Human, Pair 14 , Chromosomes, Human, Pair 2 , Proteome/analysis , Computational Biology/trends , Data Mining/trends , Databases, Protein , France , Humans , Male , Spermatozoa/chemistry , Switzerland , Tandem Mass Spectrometry/standards