1.
Anal Chem ; 77(23): 7581-93, 2005 Dec 01.
Article in English | MEDLINE | ID: mdl-16316165

ABSTRACT

Algorithmic search engines bridge the gap between large tandem mass spectrometry data sets and the identification of proteins associated with biological samples. Improvements in these tools can greatly enhance biological discovery. We present a new scoring scheme for comparing tandem mass spectra with a protein sequence database. The MASPIC (Multinomial Algorithm for Spectral Profile-based Intensity Comparison) scorer converts an experimental tandem mass spectrum into an m/z profile of probability and then scores peak lists from potential candidate peptides using a multinomial distribution model. The MASPIC scoring scheme incorporates intensity, spectral peak density variations, and the m/z error distribution associated with peak matches into a multinomial distribution. The scoring scheme was validated on two standard protein mixtures and an additional set of spectra collected on a complex ribosomal protein mixture from Rhodopseudomonas palustris. The results indicate a 5-15% improvement over Sequest for high-confidence identifications. The performance gap grows as sequence database size increases. Additional tests on spectra from proteinase-K digest data showed similar performance improvements, demonstrating the advantages of using MASPIC for studying proteins digested with less specific proteases. All these investigations show MASPIC to be a versatile and reliable system for peptide tandem mass spectral identification.
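The general idea of scoring peak matches with a multinomial model can be illustrated with a minimal sketch. This is not the published MASPIC implementation: the intensity tiers, the window-based estimate of random-match probability, the use of a point probability rather than a tail probability, and every function name below are assumptions made for illustration only.

```python
import math

def multinomial_log_pmf(counts, probs):
    """Natural-log probability of `counts` under a multinomial with class probabilities `probs`."""
    n = sum(counts)
    log_p = math.lgamma(n + 1)
    for c, p in zip(counts, probs):
        log_p -= math.lgamma(c + 1)
        if c > 0:
            log_p += c * math.log(p)
    return log_p

def tier_peaks(peaks, n_tiers=3):
    """Assign each (mz, intensity) peak an intensity tier; tier 0 is the most intense."""
    ranked = sorted(peaks, key=lambda p: p[1], reverse=True)
    return [(mz, rank * n_tiers // len(ranked)) for rank, (mz, _) in enumerate(ranked)]

def random_match_probs(tiered, mz_min, mz_max, tol, n_tiers=3):
    """Chance that a random m/z falls within `tol` of a peak in each tier.

    Assumption: tolerance windows do not overlap. The last entry is the miss probability.
    """
    span = mz_max - mz_min
    probs = [0.0] * n_tiers
    for _, tier in tiered:
        probs[tier] += 2 * tol / span
    miss = 1.0 - sum(probs)
    if miss <= 0:
        raise ValueError("tolerance windows cover the whole m/z range")
    return probs + [miss]

def score_candidate(predicted_mzs, peaks, mz_min, mz_max, tol=0.5, n_tiers=3):
    """Return -log10 of the chance probability of the observed match pattern."""
    tiered = tier_peaks(peaks, n_tiers)
    probs = random_match_probs(tiered, mz_min, mz_max, tol, n_tiers)
    counts = [0] * (n_tiers + 1)
    for mz in predicted_mzs:
        tiers = [t for peak_mz, t in tiered if abs(peak_mz - mz) <= tol]
        # Credit the most intense matching tier, or the final "miss" class.
        counts[min(tiers) if tiers else n_tiers] += 1
    return -multinomial_log_pmf(counts, probs) / math.log(10)

# Hypothetical spectrum and candidate fragment lists.
spectrum = [(100.0, 50), (200.0, 10), (300.0, 90), (400.0, 5), (500.0, 70), (600.0, 20)]
good = score_candidate([300.1, 500.2, 100.0], spectrum, 50.0, 650.0)
poor = score_candidate([150.0, 250.0, 350.0], spectrum, 50.0, 650.0)
```

In this toy setup, a candidate whose predicted fragments land on intense peaks yields a match pattern that is unlikely by chance, so it receives a larger score than a candidate that matches nothing.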


Subject(s)
Peptides/analysis , Peptides/chemistry , Tandem Mass Spectrometry/methods , Algorithms , Amino Acid Sequence , Endopeptidase K/chemistry , Endopeptidase K/metabolism , Molecular Sequence Data , Rhodopseudomonas/chemistry , Ribosomes/chemistry
2.
Anal Chem ; 77(8): 2464-74, 2005 Apr 15.
Article in English | MEDLINE | ID: mdl-15828782

ABSTRACT

Database search identification algorithms, such as Sequest and Mascot, constitute powerful enablers for proteomic tandem mass spectrometry. We introduce DBDigger, an algorithm that reorganizes the database identification process to remove a problematic bottleneck. Typically such algorithms determine which candidate sequences can be compared to each spectrum. Instead, DBDigger determines which spectra can be compared to each candidate sequence, enabling the software to generate candidate sequences only once for each HPLC separation rather than for each spectrum. This reorganization also reduces the number of times a spectrum must be predicted for a particular candidate sequence and charge state. As a result, DBDigger can accelerate some database searches by more than an order of magnitude. In addition, the software offers features to reduce the performance degradation introduced by posttranslational modification (PTM) searching. DBDigger allows researchers to specify the sequence context in which each PTM is possible. In the case of CNBr digests, for example, modified methionine residues can be limited to occur only at the C-termini of peptides. Use of "context-dependent" PTM searching reduces the performance penalty relative to traditional PTM searching. We characterize the performance possible with DBDigger, showcasing MASPIC, a new statistical scorer. We describe the implementation of these innovations in the hope that other researchers will employ them for rapid and highly flexible proteomic database search.
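The reorganization described in this abstract, in which each candidate sequence is generated once and matched against all compatible spectra, can be sketched as follows. This is an illustrative assumption about how such a lookup might be written, not DBDigger's actual code: the function name, the tuple layouts, the mass tolerance, and the example masses are all hypothetical.

```python
from bisect import bisect_left, bisect_right

def spectra_per_candidate(candidates, spectra, tol):
    """Yield (peptide, [spectrum ids]) for spectra whose precursor mass is within `tol`.

    candidates: iterable of (peptide, neutral_mass), generated once per run.
    spectra:    list of (spectrum_id, precursor_neutral_mass).
    """
    # Sort the spectra once so that each candidate needs only two binary searches.
    ordered = sorted(spectra, key=lambda s: s[1])
    masses = [mass for _, mass in ordered]
    for peptide, mass in candidates:
        lo = bisect_left(masses, mass - tol)
        hi = bisect_right(masses, mass + tol)
        if lo < hi:
            yield peptide, [sid for sid, _ in ordered[lo:hi]]

# Hypothetical spectra and candidate masses (not real peptide masses).
spectra = [("s1", 1000.50), ("s2", 1500.74), ("s3", 1000.52)]
candidates = [("PEPTIDEA", 1000.51), ("PEPTIDEB", 2000.00), ("PEPTIDEC", 1500.70)]
matches = dict(spectra_per_candidate(candidates, spectra, tol=0.05))
```

Because `candidates` is consumed in a single pass, the digest that produces it runs once for the whole set of spectra rather than once per spectrum, which is the saving the abstract attributes to the reorganization.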


Subject(s)
Algorithms , Databases, Protein , Proteomics/methods , Amino Acid Sequence , Chromatography, High Pressure Liquid , Mass Spectrometry/methods , Molecular Sequence Data , Protein Processing, Post-Translational , Proteins/analysis , Proteins/chemistry , Proteins/metabolism , Ribosomal Proteins/analysis , Ribosomal Proteins/chemistry , Ribosomal Proteins/metabolism , Software
3.
Bioinformatics ; 19(15): 1952-63, 2003 Oct 12.
Article in English | MEDLINE | ID: mdl-14555629

ABSTRACT

MOTIVATION: Experimental methods capable of generating sets of co-regulated genes have become commonplace; however, recognizing the regulatory motifs responsible for this regulation remains difficult. As a result, computational detection of transcription factor binding sites in such data sets has been an active area of research. Most approaches have utilized either Gibbs sampling or greedy strategies to identify such elements in sets of sequences. These existing methods have varying degrees of success depending on the strength and length of the signals and the number of available sequences. We present a new deterministic iterative algorithm for regulatory element detection based on a Markov chain background. As in other methods, sequences in the entire genome and the training set are taken into account in order to discriminate against commonly occurring signals and produce patterns that are significant in the training set. RESULTS: The results of the algorithm compare favorably with existing tools on previously known and newly compiled data sets. The iteration-based search appears rather rigorous, not only finding the binding sites but also showing how the binding site stands out from the genomic background. The approach used to score the results is critical, and a discussion of various scoring schemes and options is also presented. Benchmarking of several methods shows that while most tools are good at detecting strong signals, Gibbs sampling algorithms give inconsistent results when the regulatory element signal becomes weak. A Markov chain-based background model alleviates the drawbacks of MAP (maximum a posteriori log likelihood) scores. AVAILABILITY: Available on request from the authors. SUPPLEMENTARY INFORMATION: Data and the results presented in this paper are available on the web at http://compbio.ornl.gov/mira/index.html
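To make the role of a Markov chain background concrete, the sketch below scores a candidate site by its log-odds under a position weight matrix versus a first-order Markov background. It is a generic illustration, not the authors' algorithm: the first-order model, the pseudocount, the uniform start probability, and all names are assumptions, and sequences are assumed to contain only A, C, G, and T.

```python
import math

ALPHABET = "ACGT"

def train_markov_background(sequences, pseudocount=1.0):
    """Estimate first-order transition probabilities P(next | prev) with pseudocounts."""
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return {
        a: {b: counts[a][b] / sum(counts[a].values()) for b in ALPHABET}
        for a in ALPHABET
    }

def background_log_prob(site, trans, start=0.25):
    """Log probability of `site` under the Markov background (uniform first base)."""
    log_p = math.log(start)
    for prev, nxt in zip(site, site[1:]):
        log_p += math.log(trans[prev][nxt])
    return log_p

def motif_log_prob(site, pwm):
    """Log probability of `site` under a position weight matrix (one dict per column)."""
    return sum(math.log(column[base]) for column, base in zip(pwm, site))

def log_odds(site, pwm, trans):
    """Positive values mean the site looks more like the motif than like background."""
    return motif_log_prob(site, pwm) - background_log_prob(site, trans)

# Hypothetical AT-rich background and a motif model that favors GCGC.
trans = train_markov_background(["ATATATATATAT", "TATATATA"])
pwm = [
    {b: (0.85 if b == favored else 0.05) for b in ALPHABET}
    for favored in "GCGC"
]
motif_like = log_odds("GCGC", pwm, trans)
background_like = log_odds("ATAT", pwm, trans)
```

A site such as `ATAT` is common in this toy background, so the Markov model assigns it a high background probability and its log-odds falls; this is the sense in which a background model discriminates against commonly occurring signals.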


Subject(s)
Algorithms , Gene Expression Profiling/methods , Gene Expression Regulation/genetics , Genes, Regulator/genetics , Oligonucleotide Array Sequence Analysis/methods , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Sequence Analysis, Protein/methods , Amino Acid Motifs/genetics , Binding Sites , Protein Structure, Tertiary , Reproducibility of Results , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism , Sensitivity and Specificity