Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 46
Filtrar
Mais filtros










Base de dados
Intervalo de ano de publicação
1.
J Proteome Res ; 23(6): 1983-1999, 2024 Jun 07.
Artigo em Inglês | MEDLINE | ID: mdl-38728051

RESUMO

In recent years, several deep learning-based methods have been proposed for predicting peptide fragment intensities. This study aims to provide a comprehensive assessment of six such methods, namely Prosit, DeepMass:Prism, pDeep3, AlphaPeptDeep, Prosit Transformer, and the method proposed by Guan et al. To this end, we evaluated the accuracy of the predicted intensity profiles for close to 1.7 million precursors (including both tryptic and HLA peptides) corresponding to more than 18 million experimental spectra procured from 40 independent submissions to the PRIDE repository that were acquired for different species using a variety of instruments and different dissociation types/energies. Specifically, for each method, distributions of similarity (measured by Pearson's correlation and normalized angle) between the predicted and the corresponding experimental b and y fragment intensities were generated. These distributions were used to ascertain the prediction accuracy and rank the prediction methods for particular types of experimental conditions. The effect of variables like precursor charge, length, and collision energy on the prediction accuracy was also investigated. In addition to prediction accuracy, the methods were evaluated in terms of prediction speed. The systematic assessment of these six methods may help in choosing the right method for MS/MS spectra prediction for particular needs.


Assuntos
Aprendizado Profundo , Humanos , Fragmentos de Peptídeos/química , Fragmentos de Peptídeos/análise , Espectrometria de Massas em Tandem/métodos , Espectrometria de Massas em Tandem/estatística & dados numéricos , Proteômica/métodos , Proteômica/estatística & dados numéricos
2.
J Am Soc Mass Spectrom ; 35(6): 1138-1155, 2024 Jun 05.
Artigo em Inglês | MEDLINE | ID: mdl-38740383

RESUMO

Having fast, accurate, and broad spectrum methods for the identification of microorganisms is of paramount importance to public health, research, and safety. Bottom-up mass spectrometer-based proteomics has emerged as an effective tool for the accurate identification of microorganisms from microbial isolates. However, one major hurdle that limits the deployment of this tool for routine clinical diagnosis, and other areas of research such as culturomics, is the instrument time required for the mass spectrometer to analyze a single sample, which can take ∼1 h per sample, when using mass spectrometers that are presently used in most institutes. To address this issue, in this study, we employed, for the first time, tandem mass tags (TMTs) in multiplex identifications of microorganisms from multiple TMT-labeled samples in one MS/MS experiment. A difficulty encountered when using TMT labeling is the presence of interference in the measured intensities of TMT reporter ions. To correct for interference, we employed in the proposed method a modified version of the expectation maximization (EM) algorithm that redistributes the signal from ion interference back to the correct TMT-labeled samples. We have evaluated the sensitivity and specificity of the proposed method using 94 MS/MS experiments (covering a broad range of protein concentration ratios across TMT-labeled channels and experimental parameters), containing a total of 1931 true positive TMT-labeled channels and 317 true negative TMT-labeled channels. The results of the evaluation show that the proposed method has an identification sensitivity of 93-97% and a specificity of 100% at the species level. Furthermore, as a proof of concept, using an in-house-generated data set composed of some of the most common urinary tract pathogens, we demonstrated that by using the proposed method the mass spectrometer time required per sample, using a 1 h LC-MS/MS run, can be reduced to 10 and 6 min when samples are labeled with TMT-6 and TMT-10, respectively. The proposed method can also be used along with Orbitrap mass spectrometers that have faster MS/MS acquisition rates, like the recently released Orbitrap Astral mass spectrometer, to further reduce the mass spectrometer time required per sample.


Assuntos
Algoritmos , Proteômica , Espectrometria de Massas em Tandem , Espectrometria de Massas em Tandem/métodos , Proteômica/métodos , Humanos , Bactérias/isolamento & purificação , Bactérias/química , Proteínas de Bactérias/análise , Proteínas de Bactérias/química , Proteínas de Bactérias/isolamento & purificação
3.
J Comput Biol ; 31(2): 175-178, 2024 02.
Artigo em Inglês | MEDLINE | ID: mdl-38301204

RESUMO

Although many user-friendly workflows exist for identifications of peptides and proteins in mass-spectrometry-based proteomics, there is a need of easy to use, fast, and accurate workflows for identifications of microorganisms, antimicrobial resistant proteins, and biomass estimation. Identification of microorganisms is a computationally demanding task that requires querying thousands of MS/MS spectra in a database containing thousands to tens of thousands of microorganisms. Existing software can't handle such a task in a time efficient manner, taking hours to process a single MS/MS experiment. Another paramount factor to consider is the necessity of accurate statistical significance to properly control the proportion of false discoveries among the identified microorganisms, and antimicrobial-resistant proteins, and to provide robust biomass estimation. Recently, we have developed Microorganism Classification and Identification (MiCId) workflow that assigns accurate statistical significance to identified microorganisms, antimicrobial-resistant proteins, and biomass estimation. MiCId's workflow is also computationally efficient, taking about 6-17 minutes to process a tandem mass-spectrometry (MS/MS) experiment using computer resources that are available in most laptop and desktop computers, making it a portable workflow. To make data analysis accessible to a broader range of users, beyond users familiar with the Linux environment, we have developed a graphical user interface (GUI) for MiCId's workflow. The GUI brings to users all the functionality of MiCId's workflow in a friendly interface along with tools for data analysis, visualization, and to export results.


Assuntos
Anti-Infecciosos , Espectrometria de Massas em Tandem , Espectrometria de Massas em Tandem/métodos , Fluxo de Trabalho , Software , Proteínas
4.
medRxiv ; 2024 Jan 22.
Artigo em Inglês | MEDLINE | ID: mdl-38313291

RESUMO

Objective: To investigate the relationship between vaccination rates and excess mortality during distinct waves of SARS-CoV-2 variant-specific infections, while considering a state's GDP per capita. Methods: We ranked U.S. states by vaccination rates and GDP and employed the CDC's excess mortality model for regression and odds ratio analysis. Results: Regression analysis reveals that both vaccination and GDP are significant factors related to mortality when considering the entire U.S. population. Notably, in wealthier states (with GDP above $65,000), excess mortality is primarily driven by slow vaccination rates, while in less affluent states, low GDP plays a major role. Odds ratio analysis demonstrates an almost twofold increase in mortality linked to the Delta and Omicron BA.1 virus variants in states with the slowest vaccination rates compared to those with the fastest (OR 1.8, 95% CI 1.7-1.9, p < 0.01). However, this gap disappeared in the post-Omicron BA.1 period. Conclusion: The interplay between slow vaccination and low GDP per capita drives high mortality.

5.
J Am Soc Mass Spectrom ; 33(6): 917-931, 2022 Jun 01.
Artigo em Inglês | MEDLINE | ID: mdl-35500907

RESUMO

Fast and accurate identifications of pathogenic bacteria along with their associated antibiotic resistance proteins are of paramount importance for patient treatments and public health. To meet this goal from the mass spectrometry aspect, we have augmented the previously published Microorganism Classification and Identification (MiCId) workflow for this capability. To evaluate the performance of this augmented workflow, we have used MS/MS datafiles from samples of 10 antibiotic resistance bacterial strains belonging to three different species: Escherichia coli, Klebsiella pneumoniae, and Pseudomonas aeruginosa. The evaluation shows that MiCId's workflow has a sensitivity value around 85% (with a lower bound at about 72%) and a precision greater than 95% in identifying antibiotic resistance proteins. In addition to having high sensitivity and precision, MiCId's workflow is fast and portable, making it a valuable tool for rapid identifications of bacteria as well as detection of their antibiotic resistance proteins. It performs microorganismal identifications, protein identifications, sample biomass estimates, and antibiotic resistance protein identifications in 6-17 min per MS/MS sample using computing resources that are available in most desktop and laptop computers. We have also demonstrated other use of MiCId's workflow. Using MS/MS data sets from samples of two bacterial clonal isolates, one being antibiotic-sensitive while the other being multidrug-resistant, we applied MiCId's workflow to investigate possible mechanisms of antibiotic resistance in these pathogenic bacteria; the results showed that MiCId's conclusions agree with the published study. The new version of MiCId (v.07.01.2021) is freely available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.


Assuntos
Proteômica , Espectrometria de Massas em Tandem , Antibacterianos/farmacologia , Bactérias/química , Farmacorresistência Bacteriana , Resistência Microbiana a Medicamentos , Escherichia coli , Humanos , Proteômica/métodos , Pseudomonas aeruginosa , Espectrometria de Massas em Tandem/métodos , Fluxo de Trabalho
6.
Nucleic Acids Res ; 50(2): e11, 2022 01 25.
Artigo em Inglês | MEDLINE | ID: mdl-34791389

RESUMO

The choice of guide RNA (gRNA) for CRISPR-based gene targeting is an essential step in gene editing applications, but the prediction of gRNA specificity remains challenging. Lack of transparency and focus on point estimates of efficiency disregarding the information on possible error sources in the model limit the power of existing Deep Learning-based methods. To overcome these problems, we present a new approach, a hybrid of Capsule Networks and Gaussian Processes. Our method predicts the cleavage efficiency of a gRNA with a corresponding confidence interval, which allows the user to incorporate information regarding possible model errors into the experimental design. We provide the first utilization of uncertainty estimation in computational gRNA design, which is a critical step toward accurate decision-making for future CRISPR applications. The proposed solution demonstrates acceptable confidence intervals for most test sets and shows regression quality similar to existing models. We introduce a set of criteria for gRNA selection based on off-target cleavage efficiency and its variance and present a collection of pre-computed gRNAs for human chromosome 22. Using Neural Network Interpretation methods, we show that our model rediscovers an established biological factor underlying cleavage efficiency, the importance of the seed region in gRNA.


Assuntos
Sistemas CRISPR-Cas , Aprendizado Profundo , Edição de Genes , Marcação de Genes , RNA Guia de Cinetoplastídeos/genética , Algoritmos , Edição de Genes/métodos , Marcação de Genes/métodos , Genômica/métodos , Humanos , Redes Neurais de Computação , Reprodutibilidade dos Testes
7.
Front Cell Infect Microbiol ; 11: 634215, 2021.
Artigo em Inglês | MEDLINE | ID: mdl-34381737

RESUMO

Bloodstream infections (BSIs), the presence of microorganisms in blood, are potentially serious conditions that can quickly develop into sepsis and life-threatening situations. When assessing proper treatment, rapid diagnosis is the key; besides clinical judgement performed by attending physicians, supporting microbiological tests typically are performed, often requiring microbial isolation and culturing steps, which increases the time required for confirming positive cases of BSI. The additional waiting time forces physicians to prescribe broad-spectrum antibiotics and empirically based treatments, before determining the precise cause of the disease. Thus, alternative and more rapid cultivation-independent methods are needed to improve clinical diagnostics, supporting prompt and accurate treatment and reducing the development of antibiotic resistance. In this study, a culture-independent workflow for pathogen detection and identification in blood samples was developed, using peptide biomarkers and applying bottom-up proteomics analyses, i.e., so-called "proteotyping". To demonstrate the feasibility of detection of blood infectious pathogens, using proteotyping, Escherichia coli and Staphylococcus aureus were included in the study, as the most prominent bacterial causes of bacteremia and sepsis, as well as Candida albicans, one of the most prominent causes of fungemia. Model systems including spiked negative blood samples, as well as positive blood cultures, without further culturing steps, were investigated. Furthermore, an experiment designed to determine the incubation time needed for correct identification of the infectious pathogens in blood cultures was performed. The results for the spiked negative blood samples showed that proteotyping was 100- to 1,000-fold more sensitive, in comparison with the MALDI-TOF MS-based approach. Furthermore, in the analyses of ten positive blood cultures each of E. coli and S. aureus, both the MALDI-TOF MS-based and proteotyping approaches were successful in the identification of E. coli, although only proteotyping could identify S. aureus correctly in all samples. Compared with the MALDI-TOF MS-based approaches, shotgun proteotyping demonstrated higher sensitivity and accuracy, and required significantly shorter incubation time before detection and identification of the correct pathogen could be accomplished.


Assuntos
Bacteriemia , Infecções Estafilocócicas , Bacteriemia/diagnóstico , Candida albicans , Escherichia coli , Humanos , Espectrometria de Massas por Ionização e Dessorção a Laser Assistida por Matriz , Infecções Estafilocócicas/diagnóstico , Staphylococcus aureus
8.
Sci Rep ; 11(1): 2997, 2021 02 04.
Artigo em Inglês | MEDLINE | ID: mdl-33542373

RESUMO

The rDNA clusters and flanking sequences on human chromosomes 13, 14, 15, 21 and 22 represent large gaps in the current genomic assembly. The organization and the degree of divergence of the human rDNA units within an individual nucleolar organizer region (NOR) are only partially known. To address this lacuna, we previously applied transformation-associated recombination (TAR) cloning to isolate individual rDNA units from chromosome 21. That approach revealed an unexpectedly high level of heterogeneity in human rDNA, raising the possibility of corresponding variations in ribosome dynamics. We have now applied the same strategy to analyze an entire rDNA array end-to-end from a copy of chromosome 22. Sequencing of TAR isolates provided the entire NOR sequence, including proximal and distal junctions that may be involved in nucleolar function. Comparison of the newly sequenced rDNAs to reference sequence for chromosomes 22 and 21 revealed variants that are shared in human rDNA in individuals from different ethnic groups, many of them at high frequency. Analysis infers comparable intra- and inter-individual divergence of rDNA units on the same and different chromosomes, supporting the concerted evolution of rDNA units. The results provide a route to investigate further the role of rDNA variation in nucleolar formation and in the empirical associations of nucleoli with pathology.


Assuntos
Cromossomos Humanos Par 22/genética , DNA Ribossômico/genética , Genoma Humano/genética , Região Organizadora do Nucléolo/genética , Nucléolo Celular/genética , Clonagem Molecular , Heterogeneidade Genética , Genômica , Humanos , Anotação de Sequência Molecular , Ribossomos/genética
9.
Proteomics ; 19(14): e1800367, 2019 07.
Artigo em Inglês | MEDLINE | ID: mdl-30908818

RESUMO

Mass spectrometry-based proteomics starts with identifications of peptides and proteins, which provide the bases for forming the next-level hypotheses whose "validations" are often employed for forming even higher level hypotheses and so forth. Scientifically meaningful conclusions are thus attainable only if the number of falsely identified peptides/proteins is accurately controlled. For this reason, RAId continued to be developed in the past decade. RAId employs rigorous statistics for peptides/proteins identification, hence assigning accurate P-values/E-values that can be used confidently to control the number of falsely identified peptides and proteins. The RAId web service is a versatile tool built to identify peptides and proteins from tandem mass spectrometry data. Not only recognizing various spectra file formats, the web service also allows four peptide scoring functions and choice of three statistical methods for assigning P-values/E-values to identified peptides. Users may upload their own protein database or use one of the available knowledge integrated organismal databases that contain annotated information such as single amino acid polymorphisms, post-translational modifications, and their disease associations. The web service also provides a friendly interface to display, sort using different criteria, and download the identified peptides and proteins. RAId web service is freely available at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/raid.


Assuntos
Bases de Dados de Proteínas , Espectrometria de Massas/métodos , Proteômica/métodos , Biologia Computacional
10.
PLoS One ; 13(6): e0199162, 2018.
Artigo em Inglês | MEDLINE | ID: mdl-29928000

RESUMO

Off-target oligoprobe's interaction with partially complementary nucleotide sequences represents a problem for many bio-techniques. The goal of the study was to identify oligoprobe sequence characteristics that control the ratio between on-target and off-target hybridization. To understand the complex interplay between specific and genome-wide off-target (cross-hybridization) signals, we analyzed a database derived from genomic comparison hybridization experiments performed with an Affymetrix tiling array. The database included two types of probes with signals derived from (i) a combination of specific signal and cross-hybridization and (ii) genomic cross-hybridization only. All probes from the database were grouped into bins according to their sequence characteristics, where both hybridization signals were averaged separately. For selection of specific probes, we analyzed the following sequence characteristics: vulnerability to self-folding, nucleotide composition bias, numbers of G nucleotides and GGG-blocks, and occurrence of probe's k-mers in the human genome. Increases in bin ranges for these characteristics are simultaneously accompanied by a decrease in hybridization specificity-the ratio between specific and cross-hybridization signals. However, both averaged hybridization signals exhibit growing trends along with an increase of probes' binding energy, where the hybridization specific signal increases significantly faster in comparison to the cross-hybridization. The same trend is evident for the S function, which serves as a combined evaluation of probe binding energy and occurrence of probe's k-mers in the genome. Application of S allows extracting a larger number of specific probes, as compared to using only binding energy. Thus, we showed that high values of specific and cross-hybridization signals are not mutually exclusive for probes with high values of binding energy and S. In this study, the application of a new set of sequence characteristics allows detection of probes that are highly specific to their targets for array design and other bio-techniques that require selection of specific probes.


Assuntos
Hibridização de Ácido Nucleico , Análise de Sequência com Séries de Oligonucleotídeos/métodos , Sequência de Bases , Bases de Dados Genéticas , Genoma , Humanos
11.
J Am Soc Mass Spectrom ; 29(8): 1721-1737, 2018 Aug.
Artigo em Inglês | MEDLINE | ID: mdl-29873019

RESUMO

Rapid and accurate identification and classification of microorganisms is of paramount importance to public health and safety. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is complicating correct microbial identification even in a simple sample due to the large number of candidates present. To properly untwine candidate microbes in samples containing one or more microbes, one needs to go beyond apparent morphology or simple "fingerprinting"; to correctly prioritize the candidate microbes, one needs to have accurate statistical significance in microbial identification. We meet these challenges by using peptide-centric representations of microbes to better separate them and by augmenting our earlier analysis method that yields accurate statistical significance. Here, we present an updated analysis workflow that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using 226 MS/MS publicly available data files (each containing from 2500 to nearly 100,000 MS/MS spectra) and 4000 additional MS/MS data files, that the updated workflow can correctly identify multiple microbes at the genus and often the species level for samples containing more than one microbe. We have also shown that the proposed workflow computes accurate statistical significances, i.e., E values for identified peptides and unified E values for identified microbes. Our updated analysis workflow MiCId, a freely available software for Microorganism Classification and Identification, is available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html . Graphical Abstract ᅟ.

12.
Nucleic Acids Res ; 46(13): 6712-6725, 2018 07 27.
Artigo em Inglês | MEDLINE | ID: mdl-29788454

RESUMO

Despite the key role of the human ribosome in protein biosynthesis, little is known about the extent of sequence variation in ribosomal DNA (rDNA) or its pre-rRNA and rRNA products. We recovered ribosomal DNA segments from a single human chromosome 21 using transformation-associated recombination (TAR) cloning in yeast. Accurate long-read sequencing of 13 isolates covering ∼0.82 Mb of the chromosome 21 rDNA complement revealed substantial variation among tandem repeat rDNA copies, several palindromic structures and potential errors in the previous reference sequence. These clones revealed 101 variant positions in the 45S transcription unit and 235 in the intergenic spacer sequence. Approximately 60% of the 45S variants were confirmed in independent whole-genome or RNA-seq data, with 47 of these further observed in mature 18S/28S rRNA sequences. TAR cloning and long-read sequencing enabled the accurate reconstruction of multiple rDNA units and a new, high-quality 44 838 bp rDNA reference sequence, which we have annotated with variants detected from chromosome 21 of a single individual. The large number of variants observed reveal heterogeneity in human rDNA, opening up the possibility of corresponding variations in ribosome dynamics.


Assuntos
Cromossomos Humanos Par 21 , DNA Ribossômico/química , Genes de RNAr , Variação Genética , Animais , Linhagem Celular , Clonagem Molecular , DNA Ribossômico/isolamento & purificação , DNA Espaçador Ribossômico/química , Humanos , Camundongos , Conformação de Ácido Nucleico , Região Organizadora do Nucléolo/química , RNA Ribossômico/química , RNA Ribossômico/metabolismo , Análise de Sequência de DNA
13.
BMC Res Notes ; 11(1): 182, 2018 Mar 15.
Artigo em Inglês | MEDLINE | ID: mdl-29544540

RESUMO

OBJECTIVE: RAId is a software package that has been actively developed for the past 10 years for computationally and visually analyzing MS/MS data. Founded on rigorous statistical methods, RAId's core program computes accurate E-values for peptides and proteins identified during database searches. Making this robust tool readily accessible for the proteomics community by developing a graphical user interface (GUI) is our main goal here. RESULTS: We have constructed a graphical user interface to facilitate the use of RAId on users' local machines. Written in Java, RAId_GUI not only makes easy executions of RAId but also provides tools for data/spectra visualization, MS-product analysis, molecular isotopic distribution analysis, and graphing the retrieval versus the proportion of false discoveries. The results viewer displays and allows the users to download the analyses results. Both the knowledge-integrated organismal databases and the code package (containing source code, the graphical user interface, and a user manual) are available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads/raid.html .


Assuntos
Biologia Computacional/métodos , Proteoma/análise , Proteômica/métodos , Software , Interface Usuário-Computador , Bases de Dados de Proteínas , Humanos , Internet , Espectrometria de Massas em Tandem/métodos
14.
RNA Biol ; 14(12): 1649-1654, 2017 12 02.
Artigo em Inglês | MEDLINE | ID: mdl-28722509

RESUMO

Comparison of mRNA and protein structures shows that highly structured mRNAs typically encode compact protein domains suggesting that mRNA structure controls protein folding. This function is apparently performed by distinct structural elements in the mRNA, which implies 'fine tuning' of mRNA structure under selection for optimal protein folding. We find that, during evolution, changes in the mRNA folding energy follow amino acid replacements, reinforcing the notion of an intimate connection between the structures of a mRNA and the protein it encodes, and the double encoding of protein sequence and folding in the mRNA.


Assuntos
Adaptação Biológica , Conformação de Ácido Nucleico , Biossíntese de Proteínas , Dobramento de Proteína , RNA Mensageiro/química , RNA Mensageiro/genética , Animais , Evolução Biológica , Humanos , Estabilidade de RNA , Seleção Genética , Relação Estrutura-Atividade
15.
Bioinformatics ; 32(17): i552-i558, 2016 09 01.
Artigo em Inglês | MEDLINE | ID: mdl-27587674

RESUMO

MOTIVATION: Target-specific hybridization depends on oligo-probe characteristics that improve hybridization specificity and minimize genome-wide cross-hybridization. Interplay between specific hybridization and genome-wide cross-hybridization has been insufficiently studied, despite its crucial role in efficient probe design and in data analysis. RESULTS: In this study, we defined hybridization specificity as a ratio between oligo target-specific hybridization and oligo genome-wide cross-hybridization. A microarray database, derived from the Genomic Comparison Hybridization (GCH) experiment and performed using the Affymetrix platform, contains two different types of probes. The first type of oligo-probes does not have a specific target on the genome and their hybridization signals are derived from genome-wide cross-hybridization alone. The second type includes oligonucleotides that have a specific target on the genomic DNA and their signals are derived from specific and cross-hybridization components combined together in a total signal. A comparative analysis of hybridization specificity of oligo-probes, as well as their nucleotide sequences and thermodynamic features was performed on the database. The comparison has revealed that hybridization specificity was negatively affected by low stability of the fully-paired oligo-target duplex, stable probe self-folding, G-rich content, including GGG motifs, low sequence complexity and nucleotide composition symmetry. CONCLUSION: Filtering out the probes with defined 'negative' characteristics significantly increases specific hybridization and dramatically decreasing genome-wide cross-hybridization. Selected oligo-probes have two times higher hybridization specificity on average, compared to the probes that were filtered from the analysis by applying suggested cutoff thresholds to the described parameters. A new approach for efficient oligo-probe design is described in our study. CONTACT: shabalin@ncbi.nlm.nih.gov or olga.matveeva@gmail.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Assuntos
Genoma , Hibridização de Ácido Nucleico , Análise de Sequência com Séries de Oligonucleotídeos , Razão Sinal-Ruído , Sondas de DNA , Perfilação da Expressão Gênica , Genômica , Oligonucleotídeos , Sensibilidade e Especificidade
16.
Nucleic Acids Res ; 44(22): 10898-10911, 2016 12 15.
Artigo em Inglês | MEDLINE | ID: mdl-27466388

RESUMO

Specific structures in mRNA modulate translation rate and thus can affect protein folding. Using the protein structures from two eukaryotes and three prokaryotes, we explore the connections between the protein compactness, inferred from solvent accessibility, and mRNA structure, inferred from mRNA folding energy (ΔG). In both prokaryotes and eukaryotes, the ΔG value of the most stable 30 nucleotide segment of the mRNA (ΔGmin) strongly, positively correlates with protein solvent accessibility. Thus, mRNAs containing exceptionally stable secondary structure elements typically encode compact proteins. The correlations between ΔG and protein compactness are much more pronounced in predicted ordered parts of proteins compared to the predicted disordered parts, indicative of an important role of mRNA secondary structure elements in the control of protein folding. Additionally, ΔG correlates with the mRNA length and the evolutionary rate of synonymous positions. The correlations are partially independent and were used to construct multiple regression models which explain about half of the variance of protein solvent accessibility. These findings suggest a model in which the mRNA structure, particularly exceptionally stable RNA structural elements, act as gauges of protein co-translational folding by reducing ribosome speed when the nascent peptide needs time to form and optimize the core structure.


Assuntos
Dobramento de Proteína , RNA Mensageiro/fisiologia , Animais , Composição de Bases , Humanos , Cinética , Modelos Lineares , Modelos Moleculares , Conformação de Ácido Nucleico , Biossíntese de Proteínas , Estrutura Secundária de Proteína , Proteínas/química , Proteínas/genética , Proteínas/metabolismo , Estabilidade de RNA , RNA Mensageiro/química , Termodinâmica , Transcriptoma
17.
J Am Soc Mass Spectrom ; 27(2): 194-210, 2016 Feb.
Artigo em Inglês | MEDLINE | ID: mdl-26510657

RESUMO

Correct and rapid identification of microorganisms is the key to the success of many important applications in health and safety, including, but not limited to, infection treatment, food safety, and biodefense. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is challenging correct microbial identification because of the large number of choices present. To properly disentangle candidate microbes, one needs to go beyond apparent morphology or simple 'fingerprinting'; to correctly prioritize the candidate microbes, one needs to have accurate statistical significance in microbial identification. We meet these challenges by using peptidome profiles of microbes to better separate them and by designing an analysis method that yields accurate statistical significance. Here, we present an analysis pipeline that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using MS/MS data of 81 samples, each composed of a single known microorganism, that the proposed pipeline can correctly identify microorganisms at least at the genus and species levels. We have also shown that the proposed pipeline computes accurate statistical significances, i.e., E-values for identified peptides and unified E-values for identified microorganisms. The proposed analysis pipeline has been implemented in MiCId, a freely available software for Microorganism Classification and Identification. MiCId is available for download at http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html . Graphical Abstract ᅟ.


Assuntos
Bactérias/classificação , Espectrometria de Massas em Tandem/métodos , Espectrometria de Massas em Tandem/estatística & dados numéricos , Bactérias/química , Bases de Dados Factuais , Escherichia coli/classificação , Peptídeos/análise , Peptídeos/química , Pseudomonas aeruginosa/classificação , Software
18.
Nucleic Acids Res ; 42(11): 7132-44, 2014 Jun.
Artigo em Inglês | MEDLINE | ID: mdl-24792168

RESUMO

Alternative splicing (AS), alternative transcription initiation (ATI) and alternative transcription termination (ATT) create the extraordinary complexity of transcriptomes and make key contributions to the structural and functional diversity of mammalian proteomes. Analysis of mammalian genomic and transcriptomic data shows that contrary to the traditional view, the joint contribution of ATI and ATT to the transcriptome and proteome diversity is quantitatively greater than the contribution of AS. Although the mean numbers of protein-coding constitutive and alternative nucleotides in gene loci are nearly identical, their distribution along the transcripts is highly non-uniform. On average, coding exons in the variable 5' and 3' transcript ends that are created by ATI and ATT contain approximately four times more alternative nucleotides than core protein-coding regions that diversify exclusively via AS. Short upstream exons that encompass alternative 5'-untranslated regions and N-termini of proteins evolve under strong nucleotide-level selection whereas in 3'-terminal exons that encode protein C-termini, protein-level selection is significantly stronger. The groups of genes that are subject to ATI and ATT show major differences in biological roles, expression and selection patterns.


Assuntos
Evolução Molecular , Isoformas de Proteínas/genética , Iniciação da Transcrição Genética , Terminação da Transcrição Genética , Animais , Variação Genética , Humanos , Camundongos , Proteoma , Transcriptoma
19.
J Am Soc Mass Spectrom ; 25(1): 57-70, 2014 Jan.
Artigo em Inglês | MEDLINE | ID: mdl-24254576

RESUMO

In this paper, we present Molecular Isotopic Distribution Analysis (MIDAs), a new software tool designed to compute molecular isotopic distributions with adjustable accuracies. MIDAs offers two algorithms, one polynomial-based and one Fourier-transform-based, both of which compute molecular isotopic distributions accurately and efficiently. The polynomial-based algorithm contains few novel aspects, whereas the Fourier-transform-based algorithm consists mainly of improvements to other existing Fourier-transform-based algorithms. We have benchmarked the performance of the two algorithms implemented in MIDAs with that of eight software packages (BRAIN, Emass, Mercury, Mercury5, NeutronCluster, Qmass, JFC, IC) using a consensus set of benchmark molecules. Under the proposed evaluation criteria, MIDAs's algorithms, JFC, and Emass compute with comparable accuracy the coarse-grained (low-resolution) isotopic distributions and are more accurate than the other software packages. For fine-grained isotopic distributions, we compared IC, MIDAs's polynomial algorithm, and MIDAs's Fourier transform algorithm. Among the three, IC and MIDAs's polynomial algorithm compute isotopic distributions that better resemble their corresponding exact fine-grained (high-resolution) isotopic distributions. MIDAs can be accessed freely through a user-friendly web-interface at http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/midas/index.html.


Assuntos
Isótopos/química , Espectrometria de Massas/métodos , Software , Algoritmos , Internet , Peso Molecular , Proteômica
20.
Front Genet ; 3: 163, 2012.
Artigo em Inglês | MEDLINE | ID: mdl-22952469

RESUMO

Small hairpin RNAs (shRNAs) became an important research tool in cell biology. Reliable design of these molecules is essential for the needs of large functional genomics projects. To optimize the design of efficient shRNAs, we performed comparative, thermodynamic, and correlation analyses of ~18,000 miR30-based shRNAs with known functional efficiencies, derived from the Sensor Assay project (Fellmann et al., 2011). We identified features of the shRNA guide strand that significantly correlate with the silencing efficiency and performed multiple regression analysis, using 4/5 of the data for training purposes and 1/5 for cross validation. A model that included the position-dependent nucleotide preferences was predictive in the cross-validation data subset (R = 0.39). However, a model, which in addition to the nucleotide preferences included thermodynamic shRNA features such as a thermodynamic duplex stability and position-dependent thermodynamic profile (dinucleotide free energy) was performing better (R = 0.53). Software "miR_Scan" was developed based upon the optimized models. Calculated mRNA target secondary structure stability showed correlation with shRNA silencing efficiency but failed to improve the model. Correlation analysis demonstrates that our algorithm for identification of efficient miR30-based shRNA molecules performs better than approaches that were developed for design of chemically synthesized siRNAs (R(max) = 0.36).

SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA
...