Results 1 - 10 of 10
1.
Mol Cell Proteomics ; : 100798, 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38871251

ABSTRACT

Rescoring of peptide spectrum matches originating from database search engines enabled by peptide property predictors is exceeding the performance of peptide identification from traditional database search engines. In contrast to the peptide spectrum match scores calculated by traditional database search engines, rescoring peptide spectrum matches generates scores based on comparing observed and predicted peptide properties, such as fragment ion intensities and retention times. These newly generated scores enable a more efficient discrimination between correct and incorrect peptide spectrum matches. This approach was shown to lead to substantial improvements in the number of confidently identified peptides, facilitating the analysis of challenging datasets in various fields such as immunopeptidomics, metaproteomics, proteogenomics, and single-cell proteomics. In this review, we summarize the key elements leading up to the recent introduction of multiple data-driven rescoring pipelines. We provide an overview of relevant post-processing rescoring tools, introduce prominent data-driven rescoring pipelines for various applications, and highlight limitations, opportunities and future perspectives of this approach and its impact on mass spectrometry-based proteomics.
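
The comparison of observed and predicted peptide properties described in the abstract above can be illustrated with a minimal sketch. It assumes the normalized spectral angle as the intensity-similarity measure and an absolute retention-time difference as a second feature. The abstract names neither choice, and the function names `spectral_angle` and `psm_features` are hypothetical.

```python
import math


def spectral_angle(observed, predicted):
    """Normalized spectral angle in [0, 1]; 1 means identical intensity patterns.

    Assumption: both vectors list intensities for the same fragment ions,
    in the same order.
    """
    if len(observed) != len(predicted):
        raise ValueError("intensity vectors must have the same length")
    dot = sum(o * p for o, p in zip(observed, predicted))
    norm_o = math.sqrt(sum(o * o for o in observed))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    if norm_o == 0.0 or norm_p == 0.0:
        return 0.0  # no signal to compare
    # Clamp to guard against floating-point drift outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_o * norm_p)))
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


def psm_features(obs_intensities, pred_intensities, obs_rt, pred_rt):
    """Hypothetical feature set for one peptide spectrum match (PSM)."""
    return {
        "spectral_angle": spectral_angle(obs_intensities, pred_intensities),
        "abs_rt_error": abs(obs_rt - pred_rt),
    }


features = psm_features([0.1, 0.8, 0.3], [0.2, 0.7, 0.4], obs_rt=35.2, pred_rt=34.0)
print(features)
```

In a rescoring pipeline, features of this kind would be passed, together with the original search engine score, to a post-processing tool that learns to separate correct from incorrect matches.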

2.
Anal Chem ; 2024 Feb 08.
Article in English | MEDLINE | ID: mdl-38329031

ABSTRACT

We present UniSpec, an attention-driven deep neural network designed to predict comprehensive collision-induced fragmentation spectra, thereby improving peptide identification in shotgun proteomics. Utilizing a training data set of 1.8 million unique high-quality tandem mass spectra (MS2) from 0.8 million unique peptide ions, UniSpec learned with a peptide fragmentation dictionary encompassing 7919 fragment peaks. Among these, 5712 are neutral loss peaks, with 2310 corresponding to modification-specific neutral losses. Remarkably, UniSpec can predict 73%-77% of fragment intensities based on our NIST reference library spectra, a significant leap from the 35%-45% coverage of only b and y ions. Comparative studies with Prosit elucidate that while both models are strong at predicting their respective fragment ion series, UniSpec particularly shines in generating more complex MS2 spectra with diverse ion annotations. The integration of UniSpec's predictions into shotgun proteomics data analysis boosts the identification rate of tryptic peptides by 48% at a 1% false discovery rate (FDR) and 60% at a more confident 0.1% FDR. Using UniSpec's predicted in-silico spectral library, the search results closely matched those from search engines and experimental spectral libraries used in peptide identification, highlighting its potential as a stand-alone identification tool. The source code and Python scripts are available on GitHub (https://github.com/usnistgov/UniSpec) and Zenodo (https://zenodo.org/records/10452792), and all data sets and analysis results generated in this work were deposited in Zenodo (https://zenodo.org/records/10052268).

3.
J Proteome Res ; 22(7): 2246-2255, 2023 07 07.
Article in English | MEDLINE | ID: mdl-37232537

ABSTRACT

The unbounded permutations of biological molecules, including proteins and their constituent peptides, present a dilemma in identifying the components of complex biosamples. Sequence search algorithms used to identify peptide spectra can be expanded to cover larger classes of molecules, including more modifications, isoforms, and atypical cleavage, but at the cost of false positives or false negatives due to the simplified spectra they compute from sequence records. Spectral library searching can help solve this issue by precisely matching experimental spectra to library spectra with excellent sensitivity and specificity. However, compiling spectral libraries that span entire proteomes is pragmatically difficult. Neural networks that predict complete spectra containing a full range of annotated and unannotated ions can be used to replace these simplified spectra with libraries of fully predicted spectra, including modified peptides. Using such a network, we created predicted spectral libraries that were used to rescore matches from a sequence search done over a large search space, including a large number of modifications. Rescoring improved the separation of true and false hits by 82%, yielding an 8% increase in peptide identifications, including a 21% increase in nonspecifically cleaved peptides and a 17% increase in phosphopeptides.


Subject(s)
Peptide Library , Proteome , Proteome/metabolism , Artificial Intelligence , Tandem Mass Spectrometry , Algorithms , Phosphopeptides , Databases, Protein , Software
5.
Solid State Nucl Magn Reson ; 111: 101701, 2021 02.
Article in English | MEDLINE | ID: mdl-33260039

ABSTRACT

The benefits of triple-resonance experiments for structure determination of macroscopically oriented membrane proteins by solid-state NMR are discussed. While double-resonance 1H/15N experiments are effective for structure elucidation of alpha-helical domains, extension of the method of oriented samples to more complex topologies and assessing side-chain conformations necessitates further development of triple-resonance (1H/13C/15N) NMR pulse sequences. Incorporating additional spectroscopic dimensions involving 13C spin-bearing nuclei, however, introduces essential complications arising from the wide frequency range of the 1H-13C dipolar couplings and 13C CSA (>20 kHz), and the presence of the 13C-13C homonuclear dipole-dipole interactions. The recently reported ROULETTE-CAHA pulse sequence, in combination with the selective z-filtering, can be used to evolve the structurally informative 1H-13C dipolar coupling arising from the aliphatic carbons while suppressing the signals from the carbonyl and methyl regions. Proton-mediated magnetization transfer under mismatched Hartmann-Hahn conditions (MMHH) can be used to correlate 13C and 15N nuclei in such triple-resonance experiments for the subsequent 15N detection. The recently developed pulse sequences are illustrated for an N-acetyl Leucine (NAL) single crystal and doubly labeled Pf1 coat protein reconstituted in magnetically aligned bicelles. An interesting observation is that in the case of 15N-labeled NAL measured at 13C natural abundance, the triple (1H/13C/15N) MMHH scheme predominantly gives rise to long-range intermolecular magnetization transfers from 13C to 15N spins; whereas direct Hartmann-Hahn 13C/15N transfer is entirely intramolecular. The presented developments advance NMR of oriented samples for structure determination of membrane proteins and liquid crystals.


Subject(s)
Membrane Proteins , Protons , Magnetic Resonance Imaging , Magnetic Resonance Spectroscopy/methods , Membrane Proteins/chemistry , Nuclear Magnetic Resonance, Biomolecular/methods
6.
J Magn Reson ; 317: 106794, 2020 08.
Article in English | MEDLINE | ID: mdl-32717619

ABSTRACT

High-resolution separated local field (SLF) experiments are employed in oriented-sample solid-state NMR to measure angular-dependent heteronuclear dipolar couplings for structure determination. While traditionally these experiments have been designed analytically by determining cycles of pulses with specific phases and durations to achieve cancellation of the homonuclear dipolar terms in the average Hamiltonian, recent work has introduced a computational approach to optimizing linewidths of the 1H-15N dipolar resonances. Accelerated by GPU processors, a computer algorithm searches for the optimal parameters by simulating numerous 1H-15N NMR spectra. This approach, termed ROULETTE, showed promising results by developing a new pulse sequence (ROULETTE-1.0) exhibiting 18% sharper mean linewidths than SAMPI4 for an N-acetyl Leucine (NAL) crystal. Herein, we expand on this previous work to improve the performance of the 1H-15N SLF experiment and extend the work beyond the original approach to new SLF experiments. The new algorithm, in addition to finding pulse durations and phases, now searches for the optimal on/off application scheme of radio frequency irradiation on each channel. This constitutes true de novo optimization, effectively optimizing every aspect of a pulse sequence instead of just phases and durations. With an improved ROULETTE algorithm, we have found a new 1H-15N pulse sequence, termed ROULETTE-2.0, yielding 32% sharper mean linewidths than SAMPI4 for NAL crystal at 500 MHz 1H frequency. Whereas both SAMPI4 and ROULETTE-1.0 have a window where the rf power on the I-channel is turned off, the new pulse sequence is entirely windowless. Furthermore, the reliability of the algorithm has been greatly improved in terms of avoiding false positives, i.e. well-performing pulse sequences in silico that fail to render narrow resonances in experiment.
The program has been extended to the 13Cα-1Hα SLF experiments, using a 6 subdwell architecture similar to the 1H-15N optimization. Compared to the PISEMA pulse sequence, the mean 13Cα-1Hα linewidth is 17% sharper for the new pulse sequence, termed ROULETTE-CAHA. In addition to superior performance, the work demonstrates the broad applicability of the algorithm and its adaptability to different NMR experiments and spin systems.

7.
Angew Chem Int Ed Engl ; 59(9): 3554-3557, 2020 02 24.
Article in English | MEDLINE | ID: mdl-31887238

ABSTRACT

In oriented-sample (OS) solid-state NMR of membrane proteins, the angular-dependent dipolar couplings and chemical shifts provide a direct input for structure calculations. However, so far only 1H-15N dipolar couplings and 15N chemical shifts have been routinely assessed in oriented 15N-labeled samples. The main obstacle for extending this technique to membrane proteins of arbitrary topology has remained in the lack of additional experimental restraints. We have developed a new experimental triple-resonance NMR technique, which was applied to uniformly doubly (15N, 13C)-labeled Pf1 coat protein in magnetically aligned DMPC/DHPC bicelles. The previously inaccessible 1Hα-13Cα dipolar couplings have been measured, which make it possible to determine the torsion angles between the peptide planes without assuming α-helical structure a priori. The fitting of three angular restraints per peptide plane and filtering by Rosetta scoring functions has yielded a consensus α-helical transmembrane structure for Pf1 protein.


Subject(s)
Membrane Proteins/chemistry , Nuclear Magnetic Resonance, Biomolecular , Carbon Isotopes/chemistry , Inovirus/metabolism , Isotope Labeling , Lipid Bilayers/chemistry , Lipid Bilayers/metabolism , Nitrogen Isotopes/chemistry , Viral Proteins/chemistry
8.
J Magn Reson ; 310: 106641, 2020 01.
Article in English | MEDLINE | ID: mdl-31734619

ABSTRACT

Separated Local Field (SLF) experiments have been routinely used for measuring 1H-15N heteronuclear dipolar couplings in oriented-sample solid-state NMR for structure determination of proteins. In the on-going pursuit of designing better-performing SLF pulse sequences (e.g. by increasing the number of subdwells, and varying the rf amplitudes and phases), analytical treatment of the relevant average Hamiltonian terms may become cumbersome and/or nearly impossible. Numerical simulations of NMR experiments using GPU processors can be employed to rapidly calculate spectra for moderately sized spin systems, which permit an efficient numeric optimization of pulse sequences by the Monte Carlo Simulated Annealing protocol. In this work, a computational strategy was developed to find the optimal phases and timings that substantially improve the 1H-15N dipolar linewidths over a broad range of dipolar couplings as compared to SAMPI4. More than 100 pulse sequences were developed de novo and tested on an N-acetyl Leucine crystal. Seventeen distinct pulse sequences were shown to produce sharper mean linewidths than SAMPI4. Overall, these pulse sequences have more variable parameters (involving non-quadrature phases) and do not involve symmetry between the odd and even dwells, which would likely preclude their rigorous analytical treatment. The top performing pulse sequence, termed ROULETTE-1, has 18% sharper mean linewidths than SAMPI4 when run on an N-acetyl Leucine crystal. This sequence was also shown to be robust over a broad range of 1H carrier frequencies and various crystal orientations. The performance of such an optimized pulse sequence was also illustrated on 15N Leucine-labeled Pf1 coat protein reconstituted in magnetically aligned bicelles. For the optimized pulse sequence the mean peak width was 14% sharper than SAMPI4, which in turn yielded a better signal to noise ratio, 20:1 vs. 17:1. 
This method is potentially extendable to de novo development of a variety of NMR experiments.


Subject(s)
Monte Carlo Method , Nuclear Magnetic Resonance, Biomolecular/methods , Algorithms , Bacteriophage Pf1/chemistry , Capsid Proteins/chemistry , Computer Simulation , Crystallization , Hydrogen , Leucine/analogs & derivatives , Leucine/chemistry , Nitrogen Isotopes
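
The Monte Carlo Simulated Annealing protocol named in the abstract above can be sketched in generic form. This is not the authors' implementation. The cost function `toy_cost` is a hypothetical stand-in for a simulated mean linewidth, and the geometric cooling schedule, Gaussian step size, and parameter count are illustrative assumptions.

```python
import math
import random


def simulated_annealing(cost, initial, step=0.1, t_start=1.0, t_end=1e-3,
                        n_iter=5000, seed=0):
    """Minimize cost(params) with Metropolis acceptance and geometric cooling."""
    rng = random.Random(seed)
    current = list(initial)
    current_cost = cost(current)
    best, best_cost = list(current), current_cost
    for i in range(n_iter):
        # Temperature decays geometrically from t_start to t_end.
        temperature = t_start * (t_end / t_start) ** (i / max(n_iter - 1, 1))
        # Propose a move: perturb one randomly chosen parameter.
        candidate = list(current)
        k = rng.randrange(len(candidate))
        candidate[k] += rng.gauss(0.0, step)
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
    return best, best_cost


def toy_cost(params):
    """Hypothetical objective: squared distance from an arbitrary target."""
    target = [0.5, -1.0, 2.0]
    return sum((p - t) ** 2 for p, t in zip(params, target))


best_params, best_value = simulated_annealing(toy_cost, [0.0, 0.0, 0.0])
print(best_params, best_value)
```

In a pulse-sequence setting, `params` would hold quantities such as phases and timings, and each cost evaluation would require a numerical spectrum simulation, which is the step the abstract describes accelerating on GPU processors.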
9.
J Biomol NMR ; 73(5): 229-244, 2019 May.
Article in English | MEDLINE | ID: mdl-31076969

ABSTRACT

Multidimensional solid-state NMR spectra of oriented membrane proteins can be used to infer the backbone torsion angles and hence the overall protein fold by measuring dipolar couplings and chemical shift anisotropies, which depend on the orientation of each peptide plane with respect to the external magnetic field. However, multiple peptide plane orientations can be consistent with a given set of angular restraints. This ambiguity is further exacerbated by experimental uncertainty in obtaining and interpreting such restraints. The previously developed algorithms for structure calculations using angular restraints typically involve a sequential walkthrough along the backbone to find the torsion angles between the consecutive peptide plane orientations that are consistent with the experimental data. This method is sensitive to experimental uncertainty in interpreting the peak positions of as low as ± 10 Hz, often yielding high structural RMSDs for the calculated structures. Here we present a significantly improved version of the algorithm which includes the fitting of several peptide planes at once in order to prevent propagation of error along the backbone. In addition, a protocol has been devised for filtering the structural solutions using Rosetta scoring functions in order to find the structures that both fit the spectrum and satisfy bioinformatics restraints. The robustness of the new algorithm has been tested using synthetic angular restraints generated from the known structures for two proteins: a soluble protein 2gb1 (56 residues), chosen for its diverse secondary structure elements, i.e. an alpha-helix and two beta-sheets, and a membrane protein 4a2n, from which the first two transmembrane helices (having a total of 64 residues) have been used. Extensive simulations have been performed by varying the number of fitted planes, experimental error, and the number of NMR dimensions. 
It has been found that simultaneously fitting two peptide planes always shifted the distribution of the calculated structures toward lower structural RMSD values as compared to fitting a single torsion-angle pair. For each protein, irrespective of the simulation parameters, Rosetta was able to distinguish the most plausible structures, often having structural RMSDs lower than 2 Å with respect to the original structure. This study establishes a framework for de-novo protein structure prediction using a combination of solid-state NMR angular restraints and bioinformatics.


Subject(s)
Magnetic Resonance Spectroscopy/methods , Membrane Proteins/chemistry , Protein Conformation
10.
J Magn Reson ; 293: 104-114, 2018 08.
Article in English | MEDLINE | ID: mdl-29920407

ABSTRACT

An automated technique for the sequential assignment of NMR backbone resonances of oriented protein samples has been developed and tested based on 15N-15N homonuclear exchange and spin-exchanged separated local-field spectra. By treating the experimental spectral intensity as a pseudopotential, the Monte-Carlo Simulated Annealing algorithm has been employed to seek lowest-energy assignment solutions over a large sampling space where direct enumeration would be unfeasible. The determined sequential assignments have been scored based on the positions of the crosspeaks resulting from the possible orders for the main peaks. This approach is versatile in terms of the parameters that can be specified to achieve the best-fit result. At a minimum the algorithm requires a continuous segment of the main-peak chemical shifts obtained from a uniformly labeled sample and a spin-exchanged experimental spectrum represented as a 2D matrix array. With selective labeling experiments, groups of chemical shifts corresponding to specific locations in the protein backbone can be fixed, thereby decreasing the sampling space. The output from the program consists of a list of top-score main peak assignments, which can be subjected to further scoring criteria until a consensus solution is found. The algorithm has first been tested on a synthetic spectrum with randomly generated chemical shifts and dipolar couplings for the main peaks. The original assignments have been successfully recovered for as many as 100 main peaks when residue-type information was used even in the presence of substantial spectral peak overlap. The algorithm was then applied to assigning two sets of experimental spectra to recover and confirm the previously established assignments in an automated fashion. For the 20-residue transmembrane domain of Pf1 coat protein reconstituted in magnetically aligned bicelles, the original assignment by Park et al. (2010) was recovered by the automated algorithm with additional input from 5 selectively labeled amino acid spectra. The second case considered was the 46-residue Pf1 bacteriophage from Thiriot et al. (2005) and Knox et al. (2010), of which 38 residues were fit. Automated fitting resulted in several possible assignments but not exactly the original assignment. By using a post-fitting filtering procedure based on the number of missed cross peaks and Pf1 helical structure, a consensus spectroscopic assignment is proposed covering 84% of the original assignment. While the automated assignment works best in spectra with well-resolved crosspeaks, it also tolerates substantial spectral crowding to yield reasonable assignments in the cases where ambiguity and degeneracy of possible assignment solutions are inevitable.


Subject(s)
Nuclear Magnetic Resonance, Biomolecular/methods , Proteins/chemistry , Algorithms , Amino Acids/chemistry , Automation , Bacteriophage Pf1/chemistry , Capsid Proteins/chemistry , Monte Carlo Method , Protein Conformation, alpha-Helical , Software