1.
Nat Commun ; 13(1): 744, 2022 02 08.
Article in English | MEDLINE | ID: mdl-35136070

ABSTRACT

The integration of genomics and proteomics data (proteogenomics) holds the promise of furthering the in-depth understanding of human disease. However, sample mix-up is a pervasive problem in proteogenomics because of the complexity of sample processing. Here, we present a pipeline for Sample Matching in Proteogenomics (SMAP) to verify sample identity and ensure data integrity. SMAP infers sample-dependent protein-coding variants from quantitative mass spectrometry (MS), and aligns the MS-based proteomic samples with genomic samples by two discriminant scores. Theoretical analysis with simulated data indicates that SMAP is capable of uniquely matching proteomic and genomic samples when ≥20% genotypes of individual samples are available. When SMAP was applied to a large-scale dataset generated by the PsychENCODE BrainGVEX project, 54 samples (19%) were corrected. The correction was further confirmed by ribosome profiling and chromatin sequencing (ATAC-seq) data from the same set of samples. Our results demonstrate that SMAP is an effective tool for sample verification in a large-scale MS-based proteogenomics study. SMAP is publicly available at https://github.com/UND-Wanglab/SMAP, and a web-based version can be accessed at https://smap.shinyapps.io/smap/.


Subject(s)
Datasets as Topic , Proteogenomics/methods , Chromatin Immunoprecipitation Sequencing , Data Analysis , Female , Humans , Male , Mass Spectrometry/methods , Mass Spectrometry/statistics & numerical data , Proteogenomics/statistics & numerical data , RNA-Seq , Software , Whole Genome Sequencing
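
The sample-matching idea in the SMAP abstract above can be illustrated with a minimal Python sketch. It assumes each proteomic sample has already been reduced to a set of inferred genotype calls, and it uses two illustrative scores: the fraction of concordant genotypes and the margin of the best match over the runner-up. The function names, the data layout, and both score definitions are assumptions made for illustration; they are not taken from the SMAP code or paper.

```python
from typing import Dict, Tuple

# Hypothetical layout: variant id -> alternate-allele count (0, 1 or 2).
Genotypes = Dict[str, int]


def concordance(inferred: Genotypes, reference: Genotypes) -> float:
    """Fraction of shared variants on which the two genotype sets agree."""
    shared = [v for v in inferred if v in reference]
    if not shared:
        return 0.0
    agree = sum(1 for v in shared if inferred[v] == reference[v])
    return agree / len(shared)


def match_sample(
    inferred: Genotypes, genomic_panel: Dict[str, Genotypes]
) -> Tuple[str, float, float]:
    """Return (best genomic sample id, its concordance, margin over runner-up)."""
    if not genomic_panel:
        raise ValueError("genomic_panel must contain at least one sample")
    scored = sorted(
        ((concordance(inferred, g), sid) for sid, g in genomic_panel.items()),
        reverse=True,
    )
    best_score, best_id = scored[0]
    runner_up = scored[1][0] if len(scored) > 1 else 0.0
    return best_id, best_score, best_score - runner_up


# Toy data (made up): the proteomic sample agrees with G1 on every shared variant.
panel = {
    "G1": {"v1": 0, "v2": 1, "v3": 2, "v4": 0},
    "G2": {"v1": 2, "v2": 1, "v3": 0, "v4": 1},
}
proteomic_calls = {"v1": 0, "v2": 1, "v3": 2}
best_id, score, margin = match_sample(proteomic_calls, panel)
```

A proteomic sample whose best match differs from its recorded label, with a clear margin, would be a candidate mix-up under this toy scheme.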
3.
J Proteome Res ; 17(11): 3681-3692, 2018 11 02.
Article in English | MEDLINE | ID: mdl-30295032

ABSTRACT

Modern mass spectrometry now permits genome-scale and quantitative measurements of biological proteomes. However, analysis of specific specimens is currently hindered by the incomplete representation of biological variability of protein sequences in canonical reference proteomes and the technical demands for their construction. Here, we report ProteomeGenerator, a framework for de novo and reference-assisted proteogenomic database construction and analysis based on sample-specific transcriptome sequencing and high-accuracy mass spectrometry proteomics. This enables the assembly of proteomes encoded by actively transcribed genes, including sample-specific protein isoforms resulting from non-canonical mRNA transcription, splicing, or editing. To improve the accuracy of protein isoform identification in non-canonical proteomes, ProteomeGenerator relies on statistical target-decoy database matching calibrated using sample-specific controls. Its current implementation includes automatic integration with MaxQuant mass spectrometry proteomics algorithms. We applied this method for the proteogenomic analysis of splicing factor SRSF2 mutant leukemia cells, demonstrating high-confidence identification of non-canonical protein isoforms arising from alternative transcriptional start sites, intron retention, and cryptic exon splicing as well as improved accuracy of genome-scale proteome discovery. Additionally, we report proteogenomic performance metrics for current state-of-the-art implementations of SEQUEST HT, MaxQuant, Byonic, and PEAKS mass spectral analysis algorithms. Finally, ProteomeGenerator is implemented as a Snakemake workflow within a Singularity container for one-step installation in diverse computing environments, thereby enabling open, scalable, and facile discovery of sample-specific, non-canonical, and neomorphic biological proteomes.


Subject(s)
Algorithms , Peptides/chemistry , Proteomics/methods , RNA, Messenger/genetics , Software , Transcriptome , Alternative Splicing , Amino Acid Sequence , Cell Line, Tumor , Humans , Leukocytes/metabolism , Leukocytes/pathology , Mass Spectrometry/statistics & numerical data , Molecular Sequence Annotation , Mutation , Peptide Mapping/statistics & numerical data , Peptides/classification , Peptides/isolation & purification , Proteogenomics/methods , Proteogenomics/statistics & numerical data , Proteome , RNA, Messenger/metabolism , Serine-Arginine Splicing Factors/genetics , Serine-Arginine Splicing Factors/metabolism
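
The ProteomeGenerator abstract above mentions statistical target-decoy database matching. As a rough illustration, the following Python sketch shows the generic target-decoy estimate of the false discovery rate (decoy hits divided by target hits above a score threshold). It does not reproduce ProteomeGenerator's calibration with sample-specific controls, and the function names and the `(score, is_decoy)` tuple layout are assumptions.

```python
from typing import List, Optional, Tuple

# Hypothetical layout: (search-engine score, True if the match is to a decoy).
PSM = Tuple[float, bool]


def fdr_at_threshold(psms: List[PSM], threshold: float) -> float:
    """Estimate FDR as (#decoys / #targets) among PSMs scoring >= threshold."""
    targets = sum(1 for s, is_decoy in psms if s >= threshold and not is_decoy)
    decoys = sum(1 for s, is_decoy in psms if s >= threshold and is_decoy)
    if targets == 0:
        return 0.0 if decoys == 0 else float("inf")
    return decoys / targets


def score_cutoff(psms: List[PSM], max_fdr: float = 0.01) -> Optional[float]:
    """Lowest observed score whose estimated FDR is at or below max_fdr."""
    best = None
    for t in sorted({s for s, _ in psms}, reverse=True):
        if fdr_at_threshold(psms, t) <= max_fdr:
            best = t
    return best


# Toy scores (made up) with two decoy hits among six PSMs.
toy_psms = [
    (10.0, False), (9.0, False), (8.0, False),
    (7.0, True), (6.0, False), (5.0, True),
]
cutoff_25 = score_cutoff(toy_psms, max_fdr=0.25)
cutoff_0 = score_cutoff(toy_psms, max_fdr=0.0)
```

Because the estimated FDR is not monotone in the threshold, the sketch scans every observed score rather than stopping at the first violation.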
4.
J Proteome Res ; 16(7): 2639-2644, 2017 07 07.
Article in English | MEDLINE | ID: mdl-28573858

ABSTRACT

The introduction of new standard formats, proBAM and proBed, improves the integration of genomics and proteomics information, thus aiding proteogenomics applications. These novel formats enable peptide spectrum matches (PSM) to be stored, inspected, and analyzed within the context of the genome. However, an easy-to-use and transparent tool to convert mass spectrometry identification files to these new formats is indispensable. proBAMconvert enables the conversion of common identification file formats (mzIdentML, mzTab, and pepXML) to proBAM/proBed using an intuitive interface. Furthermore, proBAMconvert enables information to be output both at the PSM and peptide levels and has a command line interface next to the graphical user interface. Detailed documentation and a completely worked-out tutorial are available at http://probam.biobix.be.


Subject(s)
Computational Biology/methods , Genome , Peptides/analysis , Proteogenomics/statistics & numerical data , User-Computer Interface , Algorithms , Animals , Chromosome Mapping/statistics & numerical data , Humans , Information Storage and Retrieval , Proteogenomics/methods
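
Placing a peptide identification "within the context of the genome", as the proBAMconvert abstract above describes, requires mapping a peptide's position in a protein onto genomic coordinates. The Python sketch below shows one simple way to do this for a forward-strand transcript, using 0-based, half-open intervals as in BED-style formats. It is not proBAMconvert's implementation; the function name, the input layout, and the forward-strand-only restriction are assumptions for illustration.

```python
from typing import List, Tuple

# Hypothetical layout: 0-based, half-open genomic interval of coding sequence.
Interval = Tuple[int, int]


def peptide_to_genomic_blocks(
    cds_exons: List[Interval], pep_start_aa: int, pep_len_aa: int
) -> List[Interval]:
    """Map a peptide to genomic blocks on the forward strand.

    cds_exons    -- coding intervals in transcript order
    pep_start_aa -- 0-based index of the peptide's first residue in the protein
    pep_len_aa   -- peptide length in residues
    """
    nt_start = pep_start_aa * 3            # first coding nucleotide of the peptide
    nt_end = nt_start + pep_len_aa * 3     # one past its last coding nucleotide
    blocks: List[Interval] = []
    offset = 0                             # coding nucleotides in earlier exons
    for ex_start, ex_end in cds_exons:
        ex_len = ex_end - ex_start
        lo = max(nt_start, offset)
        hi = min(nt_end, offset + ex_len)
        if lo < hi:                        # peptide overlaps this exon
            blocks.append((ex_start + lo - offset, ex_start + hi - offset))
        offset += ex_len
    return blocks


# Toy transcript (made up): two coding exons; the peptide spans the junction.
toy_exons = [(100, 110), (200, 230)]
toy_blocks = peptide_to_genomic_blocks(toy_exons, pep_start_aa=2, pep_len_aa=4)
```

A peptide that crosses an exon junction yields more than one block, which is why genome-aware formats record block structure rather than a single start and end.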