Results 1 - 20 of 22
1.
bioRxiv ; 2024 May 24.
Article in English | MEDLINE | ID: mdl-38826258

ABSTRACT

This article describes the Cell Maps for Artificial Intelligence (CM4AI) project and its goals, methods, standards, current datasets, software tools, status, and future directions. CM4AI is the Functional Genomics Data Generation Project in the U.S. National Institutes of Health's (NIH) Bridge2AI program. Its overarching mission is to produce ethical, AI-ready datasets of cell architecture, inferred from multimodal data collected for human cell lines, to enable transformative biomedical AI research.

2.
Article in English | MEDLINE | ID: mdl-38748859

ABSTRACT

While the primary sequences of human proteins have been cataloged for over a decade, determining how these are organized into a dynamic collection of multiprotein assemblies, with structures and functions spanning biological scales, is an ongoing venture. Systematic and data-driven analyses of these higher-order structures are emerging, facilitating the discovery and understanding of cellular phenotypes. At present, knowledge of protein localization and function has been primarily derived from manual annotation and curation in resources such as the Gene Ontology, which are biased toward richly annotated genes in the literature. Here, we envision a future powered by data-driven mapping of protein assemblies. These maps can capture and decode cellular functions through the integration of protein expression, localization, and interaction data across length scales and timescales. In this review, we focus on progress toward constructing integrated cell maps that accelerate the life sciences and translational research.

3.
bioRxiv ; 2024 Apr 29.
Article in English | MEDLINE | ID: mdl-38746239

ABSTRACT

Advancements in genomic and proteomic technologies have powered the use of gene and protein networks ("interactomes") for understanding genotype-phenotype translation. However, the proliferation of interactomes complicates the selection of networks for specific applications. Here, we present a comprehensive evaluation of 46 current human interactomes, encompassing protein-protein interactions as well as gene regulatory, signaling, colocalization, and genetic interaction networks. Our analysis shows that large composite networks such as HumanNet, STRING, and FunCoup are most effective for identifying disease genes, while smaller networks such as DIP and SIGNOR demonstrate strong interaction prediction performance. These findings provide a benchmark for interactomes across diverse network biology applications and clarify factors that influence network performance. Furthermore, our evaluation pipeline paves the way for continued assessment of emerging and updated interaction networks in the future.

4.
Methods Mol Biol ; 2500: 67-81, 2022.
Article in English | MEDLINE | ID: mdl-35657588

ABSTRACT

Proteoform Suite is an interactive software program for the identification and quantification of intact proteoforms from mass spectrometry data. Proteoform Suite identifies proteoforms observed by intact-mass (MS1) analysis. In intact-mass analysis, unfragmented experimental proteoforms are compared to a database of known proteoform sequences and to one another, searching for mass differences corresponding to well-known post-translational modifications or amino acids. Intact-mass analysis enables proteoforms observed in the MS1 data without MS/MS (MS2) fragmentation to be identified. Proteoform Suite further facilitates the construction and visualization of proteoform families, which are the sets of proteoforms derived from individual genes. Bottom-up peptide identifications and top-down (MS2) proteoform identifications can be integrated into the Proteoform Suite analysis to increase the sensitivity and accuracy of the analysis. Proteoform Suite is open source and freely available at https://github.com/smith-chem-wisc/proteoform-suite .
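The mass-difference comparison described in this abstract can be sketched as follows. This is a minimal illustration, not Proteoform Suite's implementation: the function name, the tolerance, and the four-entry shift table are assumptions chosen for the example, and the masses are made-up values.

```python
# Monoisotopic mass shifts (Da) for a few common modifications.
# Illustrative subset only; not the table Proteoform Suite actually uses.
KNOWN_SHIFTS = {
    "methylation": 14.01565,
    "oxidation": 15.99491,
    "acetylation": 42.01057,
    "phosphorylation": 79.96633,
}


def match_mass_difference(experimental_mass, theoretical_mass, tolerance_da=0.02):
    """Return the modifications whose mass shift explains the observed difference.

    tolerance_da is a hypothetical absolute tolerance in daltons.
    """
    delta = experimental_mass - theoretical_mass
    return [
        name
        for name, shift in KNOWN_SHIFTS.items()
        if abs(delta - shift) <= tolerance_da
    ]


# Made-up example: an observed mass about 42.01 Da heavier than the database entry.
theoretical = 11000.0
observed = 11042.0106
print(match_mass_difference(observed, theoretical))
```

The same comparison can be run between two experimental masses rather than against a database entry, which is how related proteoforms would be linked to one another.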


Subject(s)
Proteomics , Tandem Mass Spectrometry , Humans , Protein Processing, Post-Translational , Proteome/metabolism , Proteomics/methods , Software
5.
Nature ; 600(7889): 536-542, 2021 12.
Article in English | MEDLINE | ID: mdl-34819669

ABSTRACT

The cell is a multi-scale structure with modular organization across at least four orders of magnitude [1]. Two central approaches for mapping this structure, protein fluorescent imaging and protein biophysical association, each generate extensive datasets, but of distinct qualities and resolutions that are typically treated separately [2,3]. Here we integrate immunofluorescence images in the Human Protein Atlas [4] with affinity purifications in BioPlex [5] to create a unified hierarchical map of human cell architecture. Integration is achieved by configuring each approach as a general measure of protein distance, then calibrating the two measures using machine learning. The map, known as the multi-scale integrated cell (MuSIC 1.0), resolves 69 subcellular systems, of which approximately half are to our knowledge undocumented. Accordingly, we perform 134 additional affinity purifications and validate subunit associations for the majority of systems. The map reveals a pre-ribosomal RNA processing assembly and accessory factors, which we show govern rRNA maturation, and functional roles for SRRM1 and FAM120C in chromatin and RPS3A in splicing. By integration across scales, MuSIC increases the resolution of imaging while giving protein interactions a spatial dimension, paving the way to incorporate diverse types of data in proteome-wide cell maps.


Subject(s)
Chromosomes , Proteome , Antigens, Nuclear/genetics , Antigens, Nuclear/metabolism , Chromatin/genetics , Chromosomes/metabolism , Humans , Nuclear Matrix-Associated Proteins/metabolism , Proteome/metabolism , RNA, Ribosomal , RNA-Binding Proteins/genetics
6.
Anal Chem ; 93(26): 9119-9128, 2021 07 06.
Article in English | MEDLINE | ID: mdl-34165955

ABSTRACT

Proton-transfer reactions (PTRs) have emerged as a powerful tool for the study of intact proteins. When coupled with m/z-selective kinetic excitation, such as parallel ion parking (PIP), one can exert exquisite control over rates of reaction with a high degree of specificity. This allows one to "concentrate", in the gas phase, nearly all the signals from an intact protein charge state envelope into a single charge state, improving the signal-to-noise ratio (S/N) by 10× or more. While this approach has been previously reported, here we show that implementing these technologies on a 21 T FT-ICR MS provides a tremendous advantage for intact protein analysis. Advanced strategies for performing PTR with PIP were developed to complement this unique instrument, including subjecting all analyte ions entering the mass spectrometer to PTR and PIP. This experiment, which we call "PTR-MS1-PIP", generates a pseudo-MS1 spectrum derived from ions that are exposed to the PTR reagent and PIP waveforms but have not undergone any prior true mass filtering or ion isolation. The result is an extremely rapid and significant improvement in the spectral S/N of intact proteins. This permits the observation of many more proteoforms and reduces ion injection periods for subsequent tandem mass spectrometry characterization. Additionally, the product ion parking waveform has been optimized to enhance the PTR rate without compromise to the parking efficiency. We demonstrate that this process, called "rapid park", can improve reaction rates by 5-10× and explore critical factors discovered to influence this process. Finally, we demonstrate how coupling PTR-MS1 and rapid park provides a 10-fold reduction in ion injection time, improving the rate of tandem MS sequencing.


Subject(s)
Proteins , Protons , Indicators and Reagents , Ions , Tandem Mass Spectrometry
7.
Cell Syst ; 12(6): 622-635, 2021 06 16.
Article in English | MEDLINE | ID: mdl-34139169

ABSTRACT

Biological systems are by nature multiscale, consisting of subsystems that factor into progressively smaller units in a deeply hierarchical structure. At any level of the hierarchy, an ever-increasing diversity of technologies can be applied to characterize the corresponding biological units and their relations, resulting in large networks of physical or functional proximities, e.g., proximities of amino acids within a protein, of proteins within a complex, or of cell types within a tissue. Here, we review general concepts and progress in using network proximity measures as a basis for creation of multiscale hierarchical maps of biological systems. We discuss the functionalization of these maps to create predictive models, including those useful in translation of genotype to phenotype, along with strategies for model visualization and challenges faced by multiscale modeling in the near future. Collectively, these approaches enable a unified hierarchical approach to biological data, with application from the molecular to the macroscopic.


Subject(s)
Proteins , Systems Biology , Systems Biology/methods
8.
J Proteome Res ; 20(4): 1997-2004, 2021 04 02.
Article in English | MEDLINE | ID: mdl-33683901

ABSTRACT

MetaMorpheus is a free, open-source software program for the identification of peptides and proteoforms from data-dependent acquisition tandem MS experiments. There is inherent uncertainty in these assignments for several reasons, including the limited overlap between experimental and theoretical peaks, the m/z uncertainty, and noise peaks or peaks from coisolated peptides that produce false matches. False discovery rates provide only a set-wise approximation for incorrect spectrum matches. Here we implemented a binary decision tree calculation within MetaMorpheus to compute a posterior error probability, which provides a measure of uncertainty for each peptide-spectrum match. We demonstrate its utility for increasing identifications and resolving ambiguities in bottom-up, top-down, proteogenomic, and nonspecific digestion searches.
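The contrast between a set-wise false discovery rate and a per-match posterior error probability (PEP) can be illustrated with a short sketch. It relies on the general statistical relationship that the expected FDR of a set of matches equals the mean of their PEPs; this is not a description of how MetaMorpheus computes either quantity, and the PEP values below are invented for the example.

```python
def setwise_fdr(peps):
    """Estimate the FDR of a set of matches as the mean of their PEPs.

    Each PEP is the probability that one specific match is wrong, so the
    expected number of false matches in the set is the sum of the PEPs.
    """
    if not peps:
        raise ValueError("need at least one match")
    return sum(peps) / len(peps)


# Invented PEPs for four peptide-spectrum matches.
peps = [0.001, 0.002, 0.01, 0.30]
print(f"set-wise FDR estimate: {setwise_fdr(peps):.4f}")
print(f"least certain match has PEP {max(peps):.2f}")
```

In this made-up set the average error rate looks acceptable, yet one match is far less certain than the rest, which is the kind of per-match information a set-wise rate cannot convey.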


Subject(s)
Proteomics , Tandem Mass Spectrometry , Algorithms , Databases, Protein , Peptides , Probability , Software
9.
J Proteome Res ; 20(1): 317-325, 2021 01 01.
Article in English | MEDLINE | ID: mdl-33074679

ABSTRACT

Identification of proteoforms, the different forms of a protein, is important to understand biological processes. A proteoform family is the set of different proteoforms from the same gene. We previously developed the software program Proteoform Suite, which constructs proteoform families and identifies proteoforms by intact-mass analysis. Here, we have applied this approach to top-down proteomic data acquired at the National High Magnetic Field Laboratory 21 tesla Fourier transform ion cyclotron resonance mass spectrometer (data available on the MassIVE platform with identifier MSV000085978). We explored the ability to construct proteoform families and identify proteoforms from the high mass accuracy data that this instrument provides for a complex cell lysate sample from the MCF-7 human breast cancer cell line. There were 2830 observed experimental proteoforms, of which 932 were identified, 44 were ambiguous, and 1854 were unidentified. Of the 932 unique identified proteoforms, 766 were identified by top-down MS2 analysis at 1% false discovery rate (FDR) using TDPortal, and 166 were additional intact-mass identifications (∼4.7% calculated global FDR) made using Proteoform Suite. We recently published a proteoform level schema to represent ambiguity in proteoform identifications. We implemented this proteoform level classification in Proteoform Suite for intact-mass identifications, which enables users to determine the ambiguity levels and sources of ambiguity for each intact-mass proteoform identification.
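The counts reported in this abstract can be checked against one another with a few lines of arithmetic. The variable names are chosen for the example; the numbers are taken directly from the abstract.

```python
# Counts as reported in the abstract.
observed = 2830
identified, ambiguous, unidentified = 932, 44, 1854
top_down, intact_mass = 766, 166

# The three outcome categories should partition the observed proteoforms,
# and the two identification routes should partition the identified ones.
assert identified + ambiguous + unidentified == observed
assert top_down + intact_mass == identified

identified_fraction = identified / observed
print(f"identified: {identified_fraction:.1%} of observed proteoforms")
```

Both partitions hold, and roughly a third of the observed proteoforms were identified.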


Subject(s)
Cyclotrons , Proteomics , Fourier Analysis , Humans , Mass Spectrometry , Software
10.
J Proteome Res ; 19(8): 3510-3517, 2020 08 07.
Article in English | MEDLINE | ID: mdl-32584579

ABSTRACT

Cellular functions are performed by a vast and diverse set of proteoforms. Proteoforms are the specific forms of proteins produced as a result of genetic variations, RNA splicing, and post-translational modifications (PTMs). Top-down mass spectrometric analysis of intact proteins enables proteoform identification, including proteoforms derived from sequence cleavage events or harboring multiple PTMs. In contrast, bottom-up proteomics identifies peptides, which necessitates protein inference and does not yield proteoform identifications. We seek here to exploit the synergies between these two data types to improve the quality and depth of the overall proteomic analysis. To this end, we automated the large-scale integration of results from multiprotease bottom-up and top-down analyses in the software program Proteoform Suite and applied it to the analysis of proteoforms from the human Jurkat T lymphocyte cell line. We implemented the recently developed proteoform-level classification scheme for top-down tandem mass spectrometry (MS/MS) identifications in Proteoform Suite, which enables users to observe the level and type of ambiguity for each proteoform identification, including which of the ambiguous proteoform identifications are supported by bottom-up-level evidence. We used Proteoform Suite to find instances where top-down identifications aid in protein inference from bottom-up analysis and conversely where bottom-up peptide identifications aid in proteoform PTM localization. We also show the use of bottom-up data to infer proteoform candidates potentially present in the sample, allowing confirmation of such proteoform candidates by intact-mass analysis of MS1 spectra. The implementation of these capabilities in the freely available software program Proteoform Suite enables users to integrate large-scale top-down and bottom-up data sets and to utilize the synergies between them to improve and extend the proteomic analysis.


Subject(s)
Proteomics , Tandem Mass Spectrometry , Humans , Protein Processing, Post-Translational , Proteome/metabolism , Software
11.
ChemElectroChem ; 7(15): 3244-3252, 2020 Aug 03.
Article in English | MEDLINE | ID: mdl-33542892

ABSTRACT

Micromolded carbon paste electrodes are easily fabricated, disposable, and can be integrated into microfluidic devices to fabricate inexpensive sensors and biosensors. In this work, carbon paste microelectrodes were fabricated in poly(dimethylsiloxane) using micromolding techniques and were coupled to a microfluidic channel to fabricate electrogenerated chemiluminescent (ECL) sensors. ECL was generated using both the tris(2,2'-bipyridyl)ruthenium(II)-tripropylamine system and the hydrogen peroxide and luminol system. For each of these ECL systems, the sensor fabrication method was optimized, along with key experimental parameters (applied voltage, solution flow rate, buffer species and luminol concentration). The limit of detection (S/N = 3) for TPrA was ~2.4 µM with a linear range of 10-100 µM. For hydrogen peroxide the LOD was ~11 µM and the electrodes gave a linear response between 30 µM and 200 µM hydrogen peroxide. Electrodes containing glucose oxidase were fabricated using this new method, demonstrating that glucose could be indirectly detected via generation of hydrogen peroxide by the enzymatic reaction at the micromolded biosensor.

12.
J Proteome Res ; 18(10): 3671-3680, 2019 10 04.
Article in English | MEDLINE | ID: mdl-31479276

ABSTRACT

Complex human biomolecular processes are made possible by the diversity of human proteoforms. Constructing proteoform families, groups of proteoforms derived from the same gene, is one way to represent this diversity. Comprehensive, high-confidence identification of human proteoforms remains a central challenge in mass spectrometry-based proteomics. We have previously reported a strategy for proteoform identification using intact-mass measurements, and we have since improved that strategy by mass calibration based on search results, the use of a global post-translational modification discovery database, and the integration of top-down proteomics results with intact-mass analysis. In the present study, we combine these strategies for enhanced proteoform identification in total cell lysate from the Jurkat human T lymphocyte cell line. We collected, processed, and integrated three types of proteomics data (NeuCode-labeled intact-mass, label-free top-down, and multi-protease bottom-up) to maximize the number of confident proteoform identifications. The integrated analysis revealed 5950 unique experimentally observed proteoforms, which were assembled into 848 proteoform families. Twenty percent of the observed proteoforms were confidently identified at a 3.9% false discovery rate, representing 1207 unique proteoforms derived from 484 genes.


Subject(s)
Databases, Protein , Proteome , Proteomics/methods , Humans , Jurkat Cells , Mass Spectrometry , Peptide Hydrolases/analysis , Protein Isoforms , Protein Processing, Post-Translational
14.
Anal Chem ; 91(17): 10937-10942, 2019 09 03.
Article in English | MEDLINE | ID: mdl-31393705

ABSTRACT

Proteoforms, the primary effectors of biological processes, are the different forms of proteins that arise from molecular processing events such as alternative splicing and post-translational modifications. Heart diseases exhibit changes in proteoform levels, motivating the development of a deeper understanding of the heart proteoform landscape. Our recently developed two-dimensional top-down proteomics platform coupling serial size exclusion chromatography (sSEC) to reversed-phase chromatography (RPC) expanded coverage of the human heart proteome and allowed observation of high-molecular weight proteoforms. However, most of these observed proteoforms were not identified due to the difficulty in obtaining quality tandem mass spectrometry (MS2) fragmentation data for large proteoforms from complex biological mixtures on a chromatographic time scale. Herein, we sought to identify human heart proteoforms in this data set using an enhanced version of Proteoform Suite, which identifies proteoforms by intact mass alone. Specifically, we added a new feature to Proteoform Suite to determine candidate identifications for isotopically unresolved proteoforms larger than 50 kDa, enabling subsequent MS2 identification of important high-molecular weight human heart proteoforms such as lamin A (72 kDa) and trifunctional enzyme subunit α (79 kDa). With this new workflow for large proteoform identification, endogenous human cardiac myosin binding protein C (140 kDa) was identified for the first time. This study demonstrates the integration of our sSEC-RPC-MS proteomics platform with intact-mass analysis through Proteoform Suite to create a catalog of human heart proteoforms and facilitate the identification of large proteoforms in complex systems.


Subject(s)
Carrier Proteins/isolation & purification , Lamin Type A/isolation & purification , Mitochondrial Trifunctional Protein, alpha Subunit/isolation & purification , Myocardium/chemistry , Protein Processing, Post-Translational , Proteome/isolation & purification , Software , Alternative Splicing , Amino Acid Sequence , Carrier Proteins/chemistry , Carrier Proteins/metabolism , Chromatography, Gel , Chromatography, Reverse-Phase , Humans , Lamin Type A/chemistry , Lamin Type A/metabolism , Mitochondrial Trifunctional Protein, alpha Subunit/chemistry , Mitochondrial Trifunctional Protein, alpha Subunit/metabolism , Myocardium/metabolism , Proteome/chemistry , Proteome/metabolism , Proteomics/methods , Tandem Mass Spectrometry
15.
Proteomics ; 19(10): e1800361, 2019 05.
Article in English | MEDLINE | ID: mdl-31050378

ABSTRACT

A proteoform is a defined form of a protein derived from a given gene with a specific amino acid sequence and localized post-translational modifications. In top-down proteomic analyses, proteoforms are identified and quantified through mass spectrometric analysis of intact proteins. Recent technological developments have enabled comprehensive proteoform analyses in complex samples, and an increasing number of laboratories are adopting top-down proteomic workflows. In this review, some recent advances are outlined and current challenges and future directions for the field are discussed.


Subject(s)
Amino Acids/analysis , Mass Spectrometry , Protein Processing, Post-Translational , Proteome/analysis , Proteomics/methods , Animals , Computational Biology , Electrophoresis, Capillary , Humans , Programming Languages , Reproducibility of Results , Software
16.
J Proteome Res ; 17(10): 3526-3536, 2018 10 05.
Article in English | MEDLINE | ID: mdl-30180576

ABSTRACT

The development of effective strategies for the comprehensive identification and quantification of proteoforms in complex systems is a critical challenge in proteomics. Proteoforms, the specific molecular forms in which proteins are present in biological systems, are the key effectors of biological function. Thus, knowledge of proteoform identities and abundances is essential to unraveling the mechanisms that underlie protein function. We recently reported a strategy that integrates conventional top-down mass spectrometry with intact-mass determinations for enhanced proteoform identifications and the elucidation of proteoform families and applied it to the analysis of yeast cell lysate. In the present work, we extend this strategy to enable quantification of proteoforms, and we examine changes in the abundance of murine mitochondrial proteoforms upon differentiation of mouse myoblasts to myotubes. The integrated top-down and intact-mass strategy provided an increase of ∼37% in the number of identified proteoforms compared to top-down alone, which is in agreement with our previous work in yeast; 1779 unique proteoforms were identified using the integrated strategy compared to 1301 using top-down analysis alone. Quantitative comparison of proteoform differences between the myoblast and myotube cell types showed 129 observed proteoforms exhibiting statistically significant abundance changes (fold change >2 and false discovery rate <5%).
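The reported gain from the integrated strategy follows directly from the two identification counts. The helper name below is an assumption for the example; the counts are those given in the abstract.

```python
def percent_increase(new, old):
    """Relative increase of new over old, expressed in percent."""
    return (new - old) / old * 100.0


integrated = 1779   # proteoforms identified with top-down plus intact-mass
top_down_only = 1301  # proteoforms identified with top-down alone

gain = percent_increase(integrated, top_down_only)
print(f"{gain:.1f}% more identifications")
```

The result rounds to the ∼37% stated in the abstract.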


Subject(s)
Mitochondria/metabolism , Mitochondrial Proteins/metabolism , Proteome/metabolism , Proteomics/methods , Tandem Mass Spectrometry/methods , Animals , Cell Differentiation , Cell Line , Mice , Muscle Fibers, Skeletal/cytology , Muscle Fibers, Skeletal/metabolism , Myoblasts/cytology , Myoblasts/metabolism , Reproducibility of Results , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/metabolism
17.
Stem Cell Reports ; 10(2): 627-641, 2018 02 13.
Article in English | MEDLINE | ID: mdl-29358085

ABSTRACT

The heterochromatin protein 1 (HP1) family is involved in various functions with maintenance of chromatin structure. During murine somatic cell reprogramming, we find that early depletion of HP1γ reduces the generation of induced pluripotent stem cells, while late depletion enhances the process, with a concomitant change from a centromeric to nucleoplasmic localization and elongation-associated histone H3.3 enrichment. Depletion of heterochromatin anchoring protein SENP7 increased reprogramming efficiency to a similar extent as HP1γ, indicating the importance of HP1γ release from chromatin for pluripotency acquisition. HP1γ interacted with OCT4 and DPPA4 in HP1α and HP1β knockouts and in H3K9 methylation depleted H3K9M embryonic stem cell (ESC) lines. HP1α and HP1γ complexes in ESCs differed in association with histones, the histone chaperone CAF1 complex, and specific components of chromatin-modifying complexes such as DPY30, implying distinct functional contributions. Taken together, our results reveal the complex contribution of the HP1 proteins to pluripotency.


Subject(s)
Cellular Reprogramming/genetics , Chromatin/genetics , Induced Pluripotent Stem Cells/chemistry , Multiprotein Complexes/genetics , Animals , Chromatin/chemistry , Chromobox Protein Homolog 5 , Chromosomal Proteins, Non-Histone/chemistry , Chromosomal Proteins, Non-Histone/genetics , Endopeptidases/chemistry , Endopeptidases/genetics , Exoribonucleases , Histone-Lysine N-Methyltransferase/chemistry , Histone-Lysine N-Methyltransferase/genetics , Histones/genetics , Humans , Induced Pluripotent Stem Cells/cytology , Mice , Mice, Knockout , Multiprotein Complexes/chemistry , Nuclear Proteins/genetics , Octamer Transcription Factor-3/chemistry , Octamer Transcription Factor-3/genetics , Proteins/chemistry , Proteins/genetics , Repressor Proteins , Ribonucleases , Transcription Factors
18.
J Proteome Res ; 17(1): 568-578, 2018 01 05.
Article in English | MEDLINE | ID: mdl-29195273

ABSTRACT

We present an open-source, interactive program named Proteoform Suite that uses proteoform mass and intensity measurements from complex biological samples to identify and quantify proteoforms. It constructs families of proteoforms derived from the same gene, assesses proteoform function using gene ontology (GO) analysis, and enables visualization of quantified proteoform families and their changes. It is applied here to reveal systemic proteoform variations in the yeast response to salt stress.


Subject(s)
Proteomics/methods , Software , Fungal Proteins/analysis , Fungal Proteins/drug effects , Gene Ontology , Mass Spectrometry , Salts/pharmacology , Stress, Physiological/drug effects
19.
Anal Chem ; 90(2): 1325-1333, 2018 01 16.
Article in English | MEDLINE | ID: mdl-29227670

ABSTRACT

In top-down proteomics, intact proteins are analyzed by tandem mass spectrometry and proteoforms, which are defined forms of a protein with specific sequences of amino acids and localized post-translational modifications, are identified using precursor mass and fragmentation data. Many proteoforms that are detected in the precursor scan (MS1) are not selected for fragmentation by the instrument and therefore remain unidentified in typical top-down proteomic workflows. Our laboratory has developed the open source software program Proteoform Suite to analyze MS1-only intact proteoform data. Here, we have adapted it to provide identifications of proteoform masses in precursor MS1 spectra of top-down data, supplementing the top-down identifications obtained using the MS2 fragmentation data. Proteoform Suite performs mass calibration using high-scoring top-down identifications and identifies additional proteoforms using calibrated, accurate intact masses. Proteoform families, the set of proteoforms from a given gene, are constructed and visualized from proteoforms identified by both top-down and intact-mass analyses. Using this strategy, we constructed proteoform families and identified 1861 proteoforms in yeast lysate, yielding an approximately 40% increase over the original 1291 proteoform identifications observed using traditional top-down analysis alone.


Subject(s)
Mass Spectrometry/methods , Proteome/analysis , Proteomics/methods , Saccharomyces cerevisiae Proteins/analysis , Saccharomyces cerevisiae/chemistry , Software
20.
J Proteome Res ; 16(11): 4156-4165, 2017 11 03.
Article in English | MEDLINE | ID: mdl-28968100

ABSTRACT

A proteoform family is a group of related molecular forms of a protein (proteoforms) derived from the same gene. We have previously described a strategy to identify proteoforms and elucidate proteoform families in complex mixtures of intact proteins. The strategy is based upon measurements of two properties for each proteoform: (i) the accurate proteoform intact-mass, measured by liquid chromatography/mass spectrometry (LC-MS), and (ii) the number of lysine residues in each proteoform, determined using an isotopic labeling approach. These measured properties are then compared with those extracted from a catalog of theoretical proteoforms containing protein sequences and localized post-translational modifications (PTMs) for the organism under study. A match between the measured properties and those in the catalog constitutes an identification of the proteoform. In the present study, this strategy is extended by utilizing a global PTM discovery database and is applied to the widely studied model organism Escherichia coli, providing the most comprehensive elucidation of E. coli proteoforms and proteoform families to date.
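The two-property matching described in this abstract, intact mass plus lysine count compared against a catalog of theoretical proteoforms, can be sketched as below. This is an illustration under stated assumptions, not the authors' code: the class, the function, the ppm tolerance, and the catalog entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TheoreticalProteoform:
    accession: str     # hypothetical identifier
    mass: float        # monoisotopic mass in Da
    lysine_count: int  # number of lysine residues in the sequence


def identify(measured_mass, measured_lysines, catalog, tolerance_ppm=5.0):
    """Return catalog entries that agree on both mass and lysine count.

    tolerance_ppm is an assumed mass tolerance in parts per million.
    """
    hits = []
    for entry in catalog:
        ppm_error = abs(measured_mass - entry.mass) / entry.mass * 1e6
        if ppm_error <= tolerance_ppm and entry.lysine_count == measured_lysines:
            hits.append(entry)
    return hits


# Made-up catalog: P_A and P_B are nearly isobaric but differ in lysine count.
catalog = [
    TheoreticalProteoform("P_A", 10000.00, 8),
    TheoreticalProteoform("P_B", 10000.02, 11),
    TheoreticalProteoform("P_C", 12500.50, 8),
]
hits = identify(10000.01, 8, catalog)
print([h.accession for h in hits])
```

The example shows why the second property matters: mass alone cannot separate the two nearly isobaric entries, while the lysine count leaves a single candidate.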


Subject(s)
Escherichia coli/chemistry , Multigene Family , Protein Processing, Post-Translational , Proteomics/methods , Chromatography, Liquid , Databases, Protein , Lysine/analysis , Tandem Mass Spectrometry