Results 1 - 20 of 235
1.
Article in English | MEDLINE | ID: mdl-39006170

ABSTRACT

Proteoforms, which arise from post-translational modifications, genetic polymorphisms and RNA splice variants, play a pivotal role as drivers in biology. Understanding proteoforms is essential to unravel the intricacies of biological systems and bridge the gap between genotypes and phenotypes. By analysing whole proteins without digestion, top-down proteomics (TDP) provides a holistic view of the proteome and can decipher protein function, uncover disease mechanisms and advance precision medicine. This Primer explores TDP, including the underlying principles, recent advances and an outlook on the future. The experimental section discusses instrumentation, sample preparation, intact protein separation, tandem mass spectrometry techniques and data collection. The results section looks at how to decipher raw data, visualize intact protein spectra and unravel data analysis. Additionally, proteoform identification, characterization and quantification are summarized, alongside approaches for statistical analysis. Various applications are described, including the human proteoform project and biomedical, biopharmaceutical and clinical sciences. These are complemented by discussions on measurement reproducibility, limitations and a forward-looking perspective that outlines areas where the field can advance, including potential future applications.

2.
bioRxiv ; 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38915658

ABSTRACT

Studying protein isoforms is an essential step in biomedical research; at present, the main approach for analyzing proteins is via bottom-up mass spectrometry proteomics, which returns peptide identifications that are used indirectly to infer the presence of protein isoforms. However, the detection and quantification processes are noisy; in particular, peptides may be erroneously detected, and most peptides, known as shared peptides, are associated with multiple protein isoforms. As a consequence, studying individual protein isoforms is challenging, and inferred protein results are often abstracted to the gene-level or to groups of protein isoforms. Here, we introduce IsoBayes, a novel statistical method to perform inference at the isoform level. Our method enhances the information available, by integrating mass spectrometry proteomics and transcriptomics data in a Bayesian probabilistic framework. To account for the uncertainty in the measurement process, we propose a two-layer latent variable approach: first, we sample if a peptide has been correctly detected (or, alternatively, filter peptides); second, we allocate the abundance of such selected peptides across the protein(s) they are compatible with. This enables us, starting from peptide-level data, to recover protein-level data; in particular, we: i) infer the presence/absence of each protein isoform (via a posterior probability), ii) estimate its abundance (and credible interval), and iii) target isoforms where transcript and protein relative abundances significantly differ. We benchmarked our approach in simulations, and in two multi-protease real datasets: our method displays good sensitivity and specificity when detecting protein isoforms, its estimated abundances highly correlate with the ground truth, and it can detect changes between protein and transcript relative abundances. IsoBayes is freely distributed as a Bioconductor R package, and is accompanied by an example usage vignette.

3.
Anal Bioanal Chem ; 2024 Jun 15.
Article in English | MEDLINE | ID: mdl-38877149

ABSTRACT

Identification of O-glycopeptides from tandem mass spectrometry data is complicated by the near complete dissociation of O-glycans from the peptide during collisional activation and by the combinatorial explosion of possible glycoforms when glycans are retained intact in electron-based activation. The recent O-Pair search method provides an elegant solution to these problems, using a collisional activation scan to identify the peptide sequence and total glycan mass, and a follow-up electron-based activation scan to localize the glycosite(s) using a graph-based algorithm in a reduced search space. Our previous O-glycoproteomics methods with MSFragger-Glyco allowed for extremely fast and sensitive identification of O-glycopeptides from collisional activation data but had limited support for site localization of glycans and quantification of glycopeptides. Here, we report an improved pipeline for O-glycoproteomics analysis that provides proteome-wide, site-specific, quantitative results by incorporating the O-Pair method as a module within FragPipe. In addition to improved search speed and sensitivity, we add flexible options for oxonium ion-based filtering of glycans and support for a variety of MS acquisition methods and provide a comparison between all software tools currently capable of O-glycosite localization in proteome-wide searches.

4.
Proteomics ; 24(8): e2300234, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38487981

ABSTRACT

The identification of proteoforms by top-down proteomics requires both high quality fragmentation spectra and the neutral mass of the proteoform from which the fragments derive. Intact proteoform spectra can be highly complex and may include multiple overlapping proteoforms, as well as many isotopic peaks and charge states. The resulting lower signal-to-noise ratios for intact proteins complicate downstream analyses such as deconvolution. Averaging multiple scans is a common way to improve signal-to-noise, but mass spectrometry data contains artifacts unique to it that can degrade the quality of an averaged spectrum. To overcome these limitations and increase signal-to-noise, we have implemented outlier rejection algorithms to remove outlier measurements efficiently and robustly in a set of MS1 scans prior to averaging. We have implemented averaging with rejection algorithms in the open-source, freely available, proteomics search engine MetaMorpheus. Herein, we report the application of the averaging with rejection algorithms to direct injection and online liquid chromatography mass spectrometry data. Averaging with rejection algorithms demonstrated a 45% increase in the number of proteoforms detected in Jurkat T cell lysate. We show that the increase is due to improved spectral quality, particularly in regions surrounding isotopic envelopes.


Subject(s)
Proteome , Proteomics , Proteome/analysis , Proteomics/methods , Protein Processing, Post-Translational , Algorithms , Mass Spectrometry
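
The idea of averaging MS1 scans after rejecting outliers, as described in the abstract above, can be illustrated with a minimal sketch. It uses sigma clipping, which is one common outlier-rejection scheme; the abstract does not state which algorithms the authors implemented, so this is not a reproduction of the MetaMorpheus code. The function names, the `n_sigma` default, and the assumption that all scans have already been binned onto a shared m/z grid are hypothetical choices made for illustration.

```python
from statistics import mean, pstdev


def sigma_clipped_mean(values, n_sigma=1.5, max_iter=5):
    """Average values after iteratively dropping points that lie more than
    n_sigma population standard deviations from the current mean."""
    kept = list(values)
    for _ in range(max_iter):
        if len(kept) < 3:
            break  # too few points to estimate a spread
        center = mean(kept)
        spread = pstdev(kept)
        if spread == 0:
            break  # all remaining points agree
        survivors = [v for v in kept if abs(v - center) <= n_sigma * spread]
        if not survivors or len(survivors) == len(kept):
            break  # nothing rejected (or everything would be): stop
        kept = survivors
    return mean(kept)


def average_scans(scans, n_sigma=1.5):
    """scans: list of equal-length intensity lists on a shared m/z grid
    (hypothetical input layout). Returns one averaged intensity per bin."""
    return [sigma_clipped_mean(column, n_sigma) for column in zip(*scans)]


# Five toy scans with two m/z bins each; the last scan carries a spike in bin 0.
scans = [[10, 100], [11, 101], [9, 99], [10, 100], [500, 100]]
averaged = average_scans(scans)
```

With only a handful of scans the threshold has to be fairly tight: by Samuelson's inequality, no single value among n can lie more than sqrt(n - 1) population standard deviations from the mean, so with five scans a cutoff of 2 or more would never reject anything.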
6.
J Proteome Res ; 23(1): 149-160, 2024 01 05.
Article in English | MEDLINE | ID: mdl-38043095

ABSTRACT

Host RNA binding proteins recognize viral RNA and play key roles in virus replication and antiviral mechanisms. SARS-CoV-2 generates a series of tiered subgenomic RNAs (sgRNAs), each encoding distinct viral protein(s) that regulate different aspects of viral replication. Here, for the first time, we demonstrate the successful isolation of SARS-CoV-2 genomic RNA and three distinct sgRNAs (N, S, and ORF8) from a single population of infected cells and characterize their protein interactomes. Over 500 protein interactors (including 260 previously unknown) were identified as associated with one or more target RNA. These included protein interactors unique to a single RNA pool and others present in multiple pools, highlighting our ability to discriminate between distinct viral RNA interactomes despite high sequence similarity. Individual interactomes indicated viral associations with cell response pathways, including regulation of cytoplasmic ribonucleoprotein granules and posttranscriptional gene silencing. We tested the significance of three protein interactors in these pathways (APOBEC3F, PPP1CC, and MSI2) using siRNA knockdowns, with several knockdowns affecting viral gene expression, most consistently PPP1CC. This study describes a new technology for high-resolution studies of SARS-CoV-2 RNA regulation and reveals a wealth of new viral RNA-associated host factors of potential functional significance to infection.


Subject(s)
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , SARS-CoV-2/metabolism , Subgenomic RNA , RNA, Viral/genetics , RNA, Viral/metabolism , COVID-19/genetics , Virus Replication/genetics , Genomics , RNA-Binding Proteins/genetics
7.
Anal Chem ; 95(41): 15245-15253, 2023 10 17.
Article in English | MEDLINE | ID: mdl-37791746

ABSTRACT

Top-down proteomics, the tandem mass spectrometric analysis of intact proteoforms, is the dominant method for proteoform characterization in complex mixtures. While this strategy produces detailed molecular information, it also requires extensive instrument time per mass spectrum obtained and thus compromises the depth of proteoform coverage that is accessible on liquid chromatography time scales. Such a top-down analysis is necessary for making original proteoform identifications, but once a proteoform has been confidently identified, the extensive characterization it provides may no longer be required for a subsequent identification of the same proteoform. We present a strategy to identify proteoforms in tissue samples on the basis of the combination of an intact mass determination with a measured count of the number of cysteine residues present in each proteoform. We developed and characterized a cysteine tagging chemistry suitable for the efficient and specific labeling of cysteine residues within intact proteoforms and for providing a count of the cysteine amino acids present. On simple protein mixtures, the tagging chemistry yields greater than 98% labeling of all cysteine residues, with a labeling specificity of greater than 95%. Similar results are observed on more complex samples. In a proof-of-principle study, proteoforms present in a human prostate tumor biopsy were characterized. Observed proteoforms, each characterized by an intact mass and a cysteine count, were grouped into proteoform families (groups of proteoforms originating from the same gene). We observed 2190 unique experimental proteoforms, 703 of which were grouped into 275 proteoform families.


Subject(s)
Cysteine , Tandem Mass Spectrometry , Humans , Cysteine/metabolism , Tandem Mass Spectrometry/methods , Proteins/metabolism , Chromatography, Liquid , Proteomics/methods , Proteome/analysis , Protein Processing, Post-Translational
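
The identification strategy in the abstract above, pairing an intact mass with a measured cysteine count, can be sketched as a simple two-criterion lookup. This is only an illustration under stated assumptions: the `Proteoform` record, the `match_proteoforms` function, the 10 ppm tolerance, and the example masses are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proteoform:
    name: str
    mass: float       # theoretical monoisotopic mass in Da
    cysteines: int    # number of cysteine residues in the sequence


def count_cysteines(sequence):
    """Count cysteine residues in a one-letter amino acid sequence."""
    return sequence.count("C")


def match_proteoforms(observed_mass, observed_cysteines, candidates, tol_ppm=10.0):
    """Return candidates whose mass lies within tol_ppm of the observed mass
    AND whose cysteine count equals the measured count."""
    hits = []
    for candidate in candidates:
        ppm_error = abs(observed_mass - candidate.mass) / candidate.mass * 1e6
        if ppm_error <= tol_ppm and candidate.cysteines == observed_cysteines:
            hits.append(candidate)
    return hits


# Hypothetical database: A and B are indistinguishable by mass alone at 10 ppm.
database = [
    Proteoform("A", 10000.00, 2),
    Proteoform("B", 10000.05, 4),
    Proteoform("C", 12000.00, 2),
]
hits = match_proteoforms(10000.02, 4, database)
```

In this toy case both A and B fall inside the mass tolerance, and the cysteine count is what separates them, which is the kind of disambiguation the abstract attributes to the second measurement.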
8.
bioRxiv ; 2023 May 16.
Article in English | MEDLINE | ID: mdl-37293069

ABSTRACT

Host RNA binding proteins recognize viral RNA and play key roles in virus replication and antiviral defense mechanisms. SARS-CoV-2 generates a series of tiered subgenomic RNAs (sgRNAs), each encoding distinct viral protein(s) that regulate different aspects of viral replication. Here, for the first time, we demonstrate the successful isolation of SARS-CoV-2 genomic RNA and three distinct sgRNAs (N, S, and ORF8) from a single population of infected cells and characterize their protein interactomes. Over 500 protein interactors (including 260 previously unknown) were identified as associated with one or more target RNA at either of two time points. These included protein interactors unique to a single RNA pool and others present in multiple pools, highlighting our ability to discriminate between distinct viral RNA interactomes despite high sequence similarity. The interactomes indicated viral associations with cell response pathways including regulation of cytoplasmic ribonucleoprotein granules and posttranscriptional gene silencing. We validated the significance of five protein interactors predicted to exhibit antiviral activity (APOBEC3F, TRIM71, PPP1CC, LIN28B, and MSI2) using siRNA knockdowns, with each knockdown yielding increases in viral production. This study describes new technology for studying SARS-CoV-2 and reveals a wealth of new viral RNA-associated host factors of potential functional significance to infection.

9.
Am J Physiol Lung Cell Mol Physiol ; 325(1): L30-L44, 2023 07 01.
Article in English | MEDLINE | ID: mdl-37130807

ABSTRACT

Despite recent technological advances such as ex vivo lung perfusion (EVLP), the outcome of lung transplantation remains unsatisfactory with ischemic injury being a common cause for primary graft dysfunction. New therapeutic developments are hampered by limited understanding of pathogenic mediators of ischemic injury to donor lung grafts. Here, to identify novel proteomic effectors underlying the development of lung graft dysfunction, using bioorthogonal protein engineering, we selectively captured and identified newly synthesized glycoproteins (NewS-glycoproteins) produced during EVLP with unprecedented temporal resolution of 4 h. Comparing the NewS-glycoproteomes in lungs with and without warm ischemic injury, we discovered highly specific proteomic signatures with altered synthesis in ischemic lungs, which exhibited close association to hypoxia response pathways. Inspired by the discovered protein signatures, pharmacological modulation of the calcineurin pathway during EVLP of ischemic lungs offered graft protection and improved posttransplantation outcome. In summary, the described EVLP-NewS-glycoproteomics strategy delivers an effective new means to reveal molecular mediators of donor lung pathophysiology and offers the potential to guide future therapeutic development. NEW & NOTEWORTHY: This study developed and implemented a bioorthogonal strategy to chemoselectively label, enrich, and characterize newly synthesized (NewS-)glycoproteins during 4-h ex vivo lung perfusion (EVLP). Through this approach, the investigators uncovered specific proteomic signatures associated with warm ischemic injury in donor lung grafts. These signatures exhibit high biological relevance to ischemia-reperfusion injury, validating the robustness of the presented approach.


Subject(s)
Lung Transplantation , Reperfusion Injury , Humans , Perfusion , Proteomics , Warm Ischemia , Lung/metabolism , Reperfusion Injury/metabolism , Glycoproteins/metabolism
10.
Elife ; 12, 2023 04 11.
Article in English | MEDLINE | ID: mdl-37039476

ABSTRACT

Mutations in the ubiquitin (Ub) chaperone Ubiquilin 2 (UBQLN2) cause X-linked forms of amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD) through unknown mechanisms. Here, we show that aggregation-prone, ALS-associated mutants of UBQLN2 (UBQLN2ALS) trigger heat stress-dependent neurodegeneration in Drosophila. A genetic modifier screen implicated endolysosomal and axon guidance genes, including the netrin receptor, Unc-5, as key modulators of UBQLN2 toxicity. Reduced gene dosage of Unc-5 or its coreceptor Dcc/frazzled diminished neurodegenerative phenotypes, including motor dysfunction, neuromuscular junction defects, and shortened lifespan, in flies expressing UBQLN2ALS alleles. Induced pluripotent stem cells (iPSCs) harboring UBQLN2ALS knockin mutations exhibited lysosomal defects while inducible motor neurons (iMNs) expressing UBQLN2ALS alleles exhibited cytosolic UBQLN2 inclusions, reduced neurite complexity, and growth cone defects that were partially reversed by silencing of UNC5B and DCC. The combined findings suggest that altered growth cone dynamics are a conserved pathomechanism in UBQLN2-associated ALS/FTD.


Subject(s)
Amyotrophic Lateral Sclerosis , Frontotemporal Dementia , Humans , Amyotrophic Lateral Sclerosis/genetics , Frontotemporal Dementia/genetics , Axon Guidance , Adaptor Proteins, Signal Transducing/genetics , Adaptor Proteins, Signal Transducing/metabolism , Autophagy-Related Proteins/genetics , Autophagy-Related Proteins/metabolism , Mutation , Transcription Factors/genetics , Ubiquitins/metabolism , Netrin Receptors/genetics
11.
Anal Chem ; 95(18): 7087-7092, 2023 05 09.
Article in English | MEDLINE | ID: mdl-37093976

ABSTRACT

RNA-protein interactions are key to many aspects of cellular homeostasis and their identification is important to understanding cellular function. Multiple strategies have been developed for the RNA-centric characterization of RNA-protein complexes. However, these studies have all been done in immortalized cell lines that do not capture the complexity of heterogeneous tissue samples. Here, we develop hybridization purification of RNA-protein complexes followed by mass spectrometry (HyPR-MS) for use in tissue samples. We isolated both polyadenylated RNA and the specific long noncoding RNA MALAT1 and characterized their protein interactomes. These results demonstrate the feasibility of HyPR-MS in tissue for the multiplexed characterization of specific RNA-protein complexes.


Subject(s)
RNA, Long Noncoding , RNA, Long Noncoding/genetics , Cell Line , RNA, Messenger
12.
Vet Surg ; 52(5): 739-746, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37073566

ABSTRACT

OBJECTIVE: To determine whether one larger or two smaller diameter pins used for tibial tuberosity avulsion fracture (TTAF) stabilization provides greater axial tensile strength and stiffness when subjected to monotonic mechanical load to failure in normal skeletally mature canine cadavers. STUDY DESIGN: Paired ex vivo biomechanical study. SAMPLE POPULATION: Eleven pairs of adult cadaveric dog tibias. METHODS: Twenty-two tibias from 11 dogs were collected to model a TTAF. Each limb of a pair was randomly assigned a one or two-pin fixation. Tibias were subjected to monotonic, axial load to failure. Fixation stiffness, strength, and pin insertion angles were analyzed with parametric testing. Significance was set at p < .05. RESULTS: The mean strength of the single-pin fixation was 426.2 ± 50.5 N compared to two-pin fixation at 639.2 ± 173.5 N (p = .003). The mean stiffness of the single-pin fixation was 57.3 ± 18.7 N/mm and the two-pin fixation was 71.7 ± 20.5 N/mm (p = .029). The normalized ratio between one and two-pin fixation had a mean stiffness of 68% ± 25.8% and strength of 82.8% ± 24.6%. CONCLUSIONS: In an ex vivo cadaveric TTAF model, vertically aligned two-pin fixation offers greater strength and stiffness when compared to a single-pin fixation. CLINICAL SIGNIFICANCE: When repairing TTAF, surgeons should aim to apply two vertically aligned pins rather than a single pin for greater strength and stiffness.


Subject(s)
Dog Diseases , Fractures, Avulsion , Tibial Fractures , Dogs , Animals , Fractures, Avulsion/veterinary , Bone Nails/veterinary , Tibial Fractures/surgery , Tibial Fractures/veterinary , Tibia/surgery , Cadaver , Biomechanical Phenomena , Fracture Fixation/veterinary
13.
Methods Mol Biol ; 2426: 35-66, 2023.
Article in English | MEDLINE | ID: mdl-36308684

ABSTRACT

MetaMorpheus is a free and open-source software program dedicated to the comprehensive analysis of proteomic data. In bottom-up proteomics, protein samples are digested into peptides prior to chromatographic separation and tandem mass spectrometric analysis. The resulting fragmentation spectra are subsequently analyzed with search software programs to obtain peptide identifications and infer the presence of proteins in the samples. MetaMorpheus seeks to maximize the information gleaned from proteomic data through the use of (a) mass calibration, (b) post-translational modification discovery, (c) multiple search algorithms, which aid in the analysis of data from traditional, crosslinking, and glycoproteomic experiments, (d) isotope-based or label-free quantification, (e) multi-protease protein inference, and (f) spectral annotation and data visualization capabilities. This protocol provides detailed descriptions of how to use MetaMorpheus and how to customize data analysis workflows using MetaMorpheus tasks to meet the specific needs of the user.


Subject(s)
Data Analysis , Proteomics , Proteomics/methods , Software , Tandem Mass Spectrometry/methods , Peptides/chemistry , Proteins/chemistry , Algorithms , Databases, Protein
14.
Methods Mol Biol ; 2426: 303-313, 2023.
Article in English | MEDLINE | ID: mdl-36308694

ABSTRACT

The rapid and accurate quantification of peptides is a critical element of modern proteomics that has become increasingly challenging as proteomic data sets grow in size and complexity. We present here FlashLFQ, a computer program for high-speed label-free quantification of peptides and proteins following a search of bottom-up mass spectrometry data. FlashLFQ is approximately an order of magnitude faster than established label-free quantification methods and can quantify data-dependent analysis (DDA) search results from any proteomics search program. It is available as a graphical user interface program, a command line tool, a Docker image, and integrated into the MetaMorpheus search software.


Subject(s)
Proteins , Proteomics , Proteomics/methods , Proteins/chemistry , Peptides/chemistry , Software , Mass Spectrometry/methods
15.
Analyst ; 148(3): 475-486, 2023 Jan 31.
Article in English | MEDLINE | ID: mdl-36383138

ABSTRACT

Proteins are the key biological actors within cells, driving many biological processes integral to both healthy and diseased states. Understanding the depth of complexity represented within the proteome is crucial to our scientific understanding of cellular biology and to provide disease specific insights for clinical applications. Mass spectrometry-based proteomics is the premier method for proteome analysis, with the ability to both identify and quantify proteins. Although proteomics continues to grow as a robust field of bioanalytical chemistry, advances are still necessary to enable a more comprehensive view of the proteome. In this review, we provide a broad overview of mass spectrometry-based proteomics in general, and highlight four developing areas of bottom-up proteomics: (1) protein inference, (2) alternative proteases, (3) sample-specific databases and (4) post-translational modification discovery.


Subject(s)
Proteome , Proteomics , Proteomics/methods , Proteome/metabolism , Protein Processing, Post-Translational , Mass Spectrometry/methods , Peptide Hydrolases/metabolism
16.
J Proteome Res ; 21(11): 2609-2618, 2022 11 04.
Article in English | MEDLINE | ID: mdl-36206157

ABSTRACT

Tandem mass spectrometry (MS/MS) is widely employed for the analysis of complex proteomic samples. While protein sequence database searching and spectral library searching are both well-established peptide identification methods, each has shortcomings. Protein sequence databases lack fragment peak intensity information, which can result in poor discrimination between correct and incorrect spectrum assignments. Spectral libraries usually contain fewer peptides than protein sequence databases, which limits the number of peptides that can be identified. Notably, few post-translationally modified peptides are represented in spectral libraries. This is because few search engines can both identify a broad spectrum of PTMs and create corresponding spectral libraries. Also, programs that generate spectral libraries using deep learning approaches are not yet able to accurately predict spectra for the vast majority of PTMs. Here, we address these limitations through use of a hybrid search strategy that combines protein sequence database and spectral library searches to improve identification success rates and sensitivity. This software uses Global PTM Discovery (G-PTM-D) to produce spectral libraries for a wide variety of different PTMs. These features, along with a new spectrum annotation and visualization tool, have been integrated into the freely available and open-source search engine MetaMorpheus.


Subject(s)
Proteomics , Tandem Mass Spectrometry , Databases, Protein , Proteomics/methods , Tandem Mass Spectrometry/methods , Data Analysis , Software , Peptides/analysis , Peptide Library , Algorithms
17.
Elife ; 11, 2022 08 12.
Article in English | MEDLINE | ID: mdl-35959885

ABSTRACT

In eukaryotes, splice sites define the introns of pre-mRNAs and must be recognized and excised with nucleotide precision by the spliceosome to make the correct mRNA product. In one of the earliest steps of spliceosome assembly, the U1 small nuclear ribonucleoprotein (snRNP) recognizes the 5' splice site (5' SS) through a combination of base pairing, protein-RNA contacts, and interactions with other splicing factors. Previous studies investigating the mechanisms of 5' SS recognition have largely been done in vivo or in cellular extracts where the U1/5' SS interaction is difficult to deconvolute from the effects of trans-acting factors or RNA structure. In this work we used colocalization single-molecule spectroscopy (CoSMoS) to elucidate the pathway of 5' SS selection by purified yeast U1 snRNP. We determined that U1 reversibly selects 5' SS in a sequence-dependent, two-step mechanism. A kinetic selection scheme enforces pairing at particular positions rather than overall duplex stability to achieve long-lived U1 binding. Our results provide a kinetic basis for how U1 may rapidly surveil nascent transcripts for 5' SS and preferentially accumulate at these sequences rather than on close cognates.


Subject(s)
Ribonucleoprotein, U1 Small Nuclear , Saccharomyces cerevisiae , RNA Precursors/metabolism , RNA Splice Sites , RNA Splicing , Ribonucleoprotein, U1 Small Nuclear/metabolism , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Spliceosomes/metabolism
18.
Methods Mol Biol ; 2500: 1-4, 2022.
Article in English | MEDLINE | ID: mdl-35657582

ABSTRACT

The Human Proteoform Project is an ambitious international effort to accelerate the development of technologies for proteoform analysis and to establish comprehensive atlases of proteoforms for humans and model organisms. Proteoforms are the ultimate molecular effectors of function in biology and are thus central to understanding that function. Proteoform analysis as it is practiced today is almost exclusively accomplished by mass spectrometry (MS) and is rapidly advancing in its capabilities. This volume presents a beautiful snapshot of emerging technologies at the exciting frontier of MS-based proteoform analysis.


Subject(s)
Protein Processing, Post-Translational , Proteomics , Humans , Mass Spectrometry/methods , Proteomics/methods
19.
Methods Mol Biol ; 2500: 67-81, 2022.
Article in English | MEDLINE | ID: mdl-35657588

ABSTRACT

Proteoform Suite is an interactive software program for the identification and quantification of intact proteoforms from mass spectrometry data. Proteoform Suite identifies proteoforms observed by intact-mass (MS1) analysis. In intact-mass analysis, unfragmented experimental proteoforms are compared to a database of known proteoform sequences and to one another, searching for mass differences corresponding to well-known post-translational modifications or amino acids. Intact-mass analysis enables proteoforms observed in the MS1 data without MS/MS (MS2) fragmentation to be identified. Proteoform Suite further facilitates the construction and visualization of proteoform families, which are the sets of proteoforms derived from individual genes. Bottom-up peptide identifications and top-down (MS2) proteoform identifications can be integrated into the Proteoform Suite analysis to increase the sensitivity and accuracy of the analysis. Proteoform Suite is open source and freely available at https://github.com/smith-chem-wisc/proteoform-suite .


Subject(s)
Proteomics , Tandem Mass Spectrometry , Humans , Protein Processing, Post-Translational , Proteome/metabolism , Proteomics/methods , Software
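
The intact-mass comparison described in the abstract above, in which experimental proteoform masses are compared to one another in search of differences that match well-known modifications, can be sketched as follows. This is a simplified illustration, not the Proteoform Suite implementation: the function names, the 0.02 Da tolerance, and the short modification table are assumptions, although the listed monoisotopic mass shifts are the standard values for those modifications.

```python
from itertools import combinations

# Standard monoisotopic mass shifts (Da) for a few common modifications.
PTM_SHIFTS = {
    "methylation": 14.015650,
    "oxidation": 15.994915,
    "acetylation": 42.010565,
    "phosphorylation": 79.966331,
}


def explain_delta(delta, tol=0.02):
    """Return names of modifications whose mass shift matches |delta| within tol Da."""
    return [name for name, shift in PTM_SHIFTS.items()
            if abs(abs(delta) - shift) <= tol]


def link_proteoforms(masses, tol=0.02):
    """Compare every pair of observed intact masses and report
    (index_i, index_j, modification) for each explainable mass difference."""
    links = []
    for i, j in combinations(range(len(masses)), 2):
        for name in explain_delta(masses[j] - masses[i], tol):
            links.append((i, j, name))
    return links


# Three hypothetical observed masses: an unmodified form and two modified forms.
observed = [15000.000, 15042.011, 15079.966]
links = link_proteoforms(observed)
```

Linked masses of this kind are what would be grouped into a proteoform family in the sense of the abstract; a fuller treatment would also compare against a database of theoretical proteoforms and consider amino acid mass differences and combinations of modifications.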
20.
Genome Biol ; 23(1): 69, 2022 03 03.
Article in English | MEDLINE | ID: mdl-35241129

ABSTRACT

BACKGROUND: The detection of physiologically relevant protein isoforms encoded by the human genome is critical to biomedicine. Mass spectrometry (MS)-based proteomics is the preeminent method for protein detection, but isoform-resolved proteomic analysis relies on accurate reference databases that match the sample; neither a subset nor a superset database is ideal. Long-read RNA sequencing (e.g., PacBio or Oxford Nanopore) provides full-length transcripts which can be used to predict full-length protein isoforms. RESULTS: We describe here a long-read proteogenomics approach for integrating sample-matched long-read RNA-seq and MS-based proteomics data to enhance isoform characterization. We introduce a classification scheme for protein isoforms, discover novel protein isoforms, and present the first protein inference algorithm for the direct incorporation of long-read transcriptome data to enable detection of protein isoforms previously intractable to MS-based detection. We have released an open-source Nextflow pipeline that integrates long-read sequencing in a proteomic workflow for isoform-resolved analysis. CONCLUSIONS: Our work suggests that the incorporation of long-read sequencing and proteomic data can facilitate improved characterization of human protein isoform diversity. Our first-generation pipeline provides a strong foundation for future development of long-read proteogenomics and its adoption for both basic and translational research.


Subject(s)
Proteogenomics , Alternative Splicing , Humans , Protein Isoforms/genetics , Proteomics , Sequence Analysis, RNA/methods , Transcriptome