Results 1 - 14 of 14
1.
J Proteome Res ; 21(7): 1628-1639, 2022 07 01.
Article in English | MEDLINE | ID: mdl-35612954

ABSTRACT

Alternative splicing can lead to distinct protein isoforms. These can have different functions in specific cells and tissues or in different developmental stages. In this study, we explored whether transcripts assembled from long read, nanopore-based, direct RNA-sequencing (RNA-seq) could improve the identification of protein isoforms in human K562 cells. By comparing with Illumina-based short read RNA-seq, we showed that a large proportion of Ensembl transcripts (5949/14,326) and genes expressing alternatively spliced transcripts (486/2981) identified with long direct reads were missed by short paired-end reads. By co-analyzing proteomic and transcriptomic data, we also showed that some peptides (826/35,976), proteins (262/3215), and protein isoforms arising from distinct transcript variants (574/1212) identified with isoform-specific peptides via custom long-read-based databases were missed in Illumina-derived databases. Finally, we generated unequivocal peptide evidence for a set of protein isoforms and showed that long read, direct RNA-seq allows the discovery of novel protein isoforms not already in reference databases or custom databases built from short read RNA-seq data. Our analysis highlights the benefits of long read RNA-seq data in the generation of reference databases to increase tandem mass spectrometry (MS/MS) identification of protein isoforms.


Subjects
Proteomics , Tandem Mass Spectrometry , Alternative Splicing , Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/methods , Humans , Peptides/genetics , Peptides/metabolism , Protein Isoforms/genetics , Protein Isoforms/metabolism , RNA/metabolism , RNA Sequence Analysis , Tandem Mass Spectrometry/methods , Transcriptome
2.
Comput Struct Biotechnol J ; 19: 3810-3816, 2021.
Article in English | MEDLINE | ID: mdl-34285780

ABSTRACT

External DNA sequences can be inserted into an organism's genome either through natural processes such as gene transfer, or through targeted genome engineering strategies. Being able to robustly identify such foreign DNA is a crucial capability for health and biosecurity applications, such as anti-microbial resistance (AMR) detection or monitoring gene drives. This capability does not exist for poorly characterised host genomes or with limited information about the integrated sequence. To address this, we developed the INserted Sequence Information DEtectoR (INSIDER). INSIDER analyses whole genome sequencing data and identifies segments of potentially foreign origin by their significant shift in k-mer signatures. We demonstrate the power of INSIDER to separate integrated DNA sequences from normal genomic sequences on a synthetic dataset simulating the insertion of a CRISPR-Cas gene drive into wild-type yeast. As a proof-of-concept, we use INSIDER to detect the exact AMR plasmid in whole genome sequencing data from a Citrobacter freundii patient isolate. INSIDER streamlines the process of identifying integrated DNA in poorly characterised wild species or when the insert is of unknown origin, thus enhancing the monitoring of emerging biosecurity threats.
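
The idea of flagging segments by a shift in k-mer signature can be illustrated with a minimal sketch. This is not INSIDER's actual algorithm: the window size, the choice of k, the Euclidean distance, and all function names below are assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq, k=3):
    """Return normalized k-mer frequencies for a DNA string."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def profile_distance(p, q):
    """Euclidean distance between two sparse frequency profiles."""
    keys = set(p) | set(q)
    return sqrt(sum((p.get(key, 0.0) - q.get(key, 0.0)) ** 2 for key in keys))


def window_distances(genome, window=200, step=100, k=3):
    """Distance of each sliding window's profile from the genome-wide profile."""
    background = kmer_profile(genome, k)
    result = []
    for start in range(0, len(genome) - window + 1, step):
        local = kmer_profile(genome[start:start + window], k)
        result.append((start, profile_distance(local, background)))
    return result


# Toy data (hypothetical): an AT-rich "host" with a GC-rich 200 bp insert at position 400.
host = "AT" * 500
insert = "GC" * 100
genome = host[:400] + insert + host[400:]

scores = window_distances(genome)
top = max(scores, key=lambda item: item[1])
print("most divergent window starts at", top[0])
```

In this toy case the window that coincides with the insert has the most divergent profile. Real genomes would need a statistical threshold rather than a simple maximum.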

4.
Nat Biotechnol ; 39(11): 1453-1465, 2021 11.
Article in English | MEDLINE | ID: mdl-34140680

ABSTRACT

Existing compendia of non-coding RNA (ncRNA) are incomplete, in part because they are derived almost exclusively from small and polyadenylated RNAs. Here we present a more comprehensive atlas of the human transcriptome, which includes small and polyA RNA as well as total RNA from 300 human tissues and cell lines. We report thousands of previously uncharacterized RNAs, increasing the number of documented ncRNAs by approximately 8%. To infer functional regulation by known and newly characterized ncRNAs, we exploited pre-mRNA abundance estimates from total RNA sequencing, revealing 316 microRNAs and 3,310 long non-coding RNAs with multiple lines of evidence for roles in regulating protein-coding genes and pathways. Our study both refines and expands the current catalog of human ncRNAs and their regulatory interactions. All data, analyses and results are available for download and interrogation in the R2 web portal, serving as a basis for future exploration of RNA biology and function.


Subjects
MicroRNAs , Long Noncoding RNA , Humans , MicroRNAs/genetics , Long Noncoding RNA/genetics , Messenger RNA , Untranslated RNA/genetics , Transcriptome/genetics
5.
Transbound Emerg Dis ; 67(4): 1453-1462, 2020 Jul.
Article in English | MEDLINE | ID: mdl-32306500

ABSTRACT

Pre-clinical responses to fast-moving infectious disease outbreaks heavily depend on choosing the best isolates for animal models that inform diagnostics, vaccines and treatments. Current approaches are driven by practical considerations (e.g. first available virus isolate) rather than a detailed analysis of the characteristics of the virus strain chosen, which can lead to animal models that are not representative of the circulating or emerging clusters. Here, we suggest a combination of epidemiological, experimental and bioinformatic considerations when choosing virus strains for animal model generation. We discuss the currently chosen SARS-CoV-2 strains for international coronavirus disease (COVID-19) models in the context of their phylogeny as well as in a novel alignment-free bioinformatic approach. Unlike phylogenetic trees, which focus on individual shared mutations, this new approach assesses genome-wide co-developing functionalities and hence offers a more fluid view of the 'cloud of variances' that RNA viruses are prone to accumulate. This joint approach concludes that while the current animal models cover the existing viral strains adequately, there is substantial evolutionary activity that is likely not considered by the current models. Based on insights from the non-discrete alignment-free approach and experimental observations, we suggest isolates for future animal models.


Subjects
Computational Biology , Coronavirus Infections/epidemiology , Disease Outbreaks , Genomics , Pandemics/prevention & control , Viral Pneumonia/epidemiology , Animals , Betacoronavirus/genetics , Biological Evolution , COVID-19 , Animal Disease Models , Humans , Phylogeny , SARS-CoV-2
6.
Proteomics ; 19(17): e1800444, 2019 09.
Article in English | MEDLINE | ID: mdl-31328383

ABSTRACT

High-resolution MS/MS spectra of peptides can be deisotoped to identify monoisotopic masses of peptide fragments. The use of such masses should improve protein identification rates. However, deisotoping is not universally used and its benefits have not been fully explored. Here, MS2-Deisotoper, a tool for use prior to database search, is used to identify monoisotopic peaks in centroided MS/MS spectra. MS2-Deisotoper works by comparing the mass and relative intensity of each peptide fragment peak to every other peak of greater mass, and by applying a set of rules concerning mass and intensity differences. After comprehensive parameter optimization, it is shown that MS2-Deisotoper can improve the number of peptide spectrum matches (PSMs) identified by up to 8.2% and proteins by up to 2.8%. It is effective with SILAC and non-SILAC MS/MS data. The identification of unique peptide sequences is also improved, increasing the number of human proteoforms by 3.7%. Detailed investigation of results shows that deisotoping increases Mascot ion scores, improves FDR estimation for PSMs, and leads to greater protein sequence coverage. At a peptide level, it is found that the efficacy of deisotoping is affected by peptide mass and charge. MS2-Deisotoper can be used via a user interface or as a command-line tool.
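
The peak-comparison idea (each peak checked against heavier peaks using mass and intensity rules) can be sketched as follows. This is a simplified illustration, not the rule set of MS2-Deisotoper: the tolerance, the charge range, the intensity ratio, and the function name are all assumptions.

```python
C13_SPACING = 1.00335  # approximate 13C - 12C mass difference in Da


def deisotope(peaks, tol=0.01, max_charge=2, max_ratio=1.2):
    """Drop peaks that look like isotope peaks of a lighter peak.

    peaks: list of (mz, intensity) tuples from a centroided spectrum.
    A heavier peak is flagged when it sits one isotope spacing (for some
    charge state) above a lighter peak and is not much more intense.
    """
    peaks = sorted(peaks)
    isotopic = set()
    for i, (mz_i, int_i) in enumerate(peaks):
        for j in range(i + 1, len(peaks)):
            mz_j, int_j = peaks[j]
            delta = mz_j - mz_i
            if delta > C13_SPACING + tol:
                break  # peaks are sorted, so later ones are even further away
            for z in range(1, max_charge + 1):
                if abs(delta - C13_SPACING / z) <= tol and int_j <= max_ratio * int_i:
                    isotopic.add(j)
    return [p for idx, p in enumerate(peaks) if idx not in isotopic]


# Hypothetical spectrum: a 1+ isotope cluster at m/z 500, a 2+ pair at m/z 300,
# and an unrelated peak at m/z 650.3.
spectrum = [
    (500.0, 100.0), (501.00335, 55.0), (502.0067, 15.0),
    (300.0, 80.0), (300.5017, 40.0),
    (650.3, 30.0),
]
monoisotopic = deisotope(spectrum)
print(monoisotopic)
```

Only the lightest peak of each cluster and the unrelated peak survive. A production tool would also model expected isotope ratios as a function of mass, which this sketch ignores.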


Subjects
Carbon Isotopes/analysis , Isotope Labeling/methods , Nitrogen Isotopes/analysis , Peptide Fragments/analysis , Proteins/analysis , Software , Tandem Mass Spectrometry/statistics & numerical data , Algorithms , Carbon Isotopes/chemistry , Protein Databases , Humans , Nitrogen Isotopes/chemistry , Peptide Fragments/chemistry , Proteins/chemistry , Tandem Mass Spectrometry/methods
7.
Curr Protoc Bioinformatics ; 66(1): e71, 2019 06.
Article in English | MEDLINE | ID: mdl-30653846

ABSTRACT

Post-translational modifications (PTMs) of proteins act as key regulators of protein activity, including the regulation of protein-protein interactions (PPIs). However, exploring functional links between PTMs and PPIs can be difficult. PTMOracle is a Cytoscape app that facilitates the co-visualization and co-analysis of PTMs in the context of PPI networks. PTMOracle also allows extensive data to be integrated and co-analyzed, allowing the role of domains, motifs, and disordered regions to be considered. Here, we describe several PTMOracle protocols investigating complex PTM-associated relationships and their role in PPIs. This is assisted by OraclePainter for coloring proteins by the modifications present and visualizing these in the context of networks, by OracleTools for cross-matching PTMs with sequence feature for all nodes in the network, and by OracleResults for exploring specific proteins and visualizing their PTMs in the context of protein sequences. This unit aims to demonstrate how PTMOracle can be used to systematically explore network visualizations and generate testable hypotheses regarding the functional role of PTMs in PPIs, and how the results can be analyzed to better understand the regulatory role of PTMs in PPIs. © 2019 by John Wiley & Sons, Inc.


Subjects
Protein Interaction Maps , Post-Translational Protein Processing , Software , Acetylation , Amino Acids/metabolism , Histones/metabolism , Humans , Molecular Sequence Annotation , Phosphoproteins/metabolism , Phosphorylation , Phosphotyrosine/metabolism , Proteolysis , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/metabolism
8.
J Proteome Res ; 17(1): 359-373, 2018 01 05.
Article in English | MEDLINE | ID: mdl-29057651

ABSTRACT

The study of post-translational methylation is hampered by the fact that large-scale LC-MS/MS experiments produce high methylpeptide false discovery rates (FDRs). The use of heavy-methyl stable isotope labeling by amino acids in cell culture (heavy-methyl SILAC) can drastically reduce these FDRs; however, this approach is limited by a lack of heavy-methyl SILAC compatible software. To fill this gap, we recently developed MethylQuant. Here, using an updated version of MethylQuant, we demonstrate its methylpeptide validation and quantification capabilities and provide guidelines for its best use. Using reference heavy-methyl SILAC data sets, we show that MethylQuant predicts with statistical significance the true or false positive status of methylpeptides in samples of varying complexity, degree of methylpeptide enrichment, and heavy to light mixing ratios. We introduce methylpeptide confidence indicators, MethylQuant Confidence and MethylQuant Score, and demonstrate their strong performance in complex samples characterized by a lack of methylpeptide enrichment. For these challenging data sets, MethylQuant identifies 882 of 1165 true positive methylpeptide spectrum matches (i.e., >75% sensitivity) at high specificity (<2% FDR) and achieves near-perfect specificity at 41% sensitivity. We also demonstrate that MethylQuant produces high accuracy relative quantification data that are tolerant of interference from coeluting peptide ions. Together MethylQuant's capabilities provide a path toward routine, accurate characterizations of the methylproteome using heavy-methyl SILAC.
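
The sensitivity figure quoted above follows directly from the reported counts, and the FDR bound implies a ceiling on false positives. The short calculation below reproduces that arithmetic. The false-positive ceiling is derived here for illustration and is not a number reported in the abstract. The function names are hypothetical.

```python
from fractions import Fraction


def sensitivity(true_positives, all_true):
    """Fraction of true items that were recovered."""
    return Fraction(true_positives, all_true)


def fdr(false_positives, true_positives):
    """False discovery rate among accepted identifications."""
    return Fraction(false_positives, false_positives + true_positives)


sens = sensitivity(882, 1165)

# Largest number of false positives that keeps the FDR strictly below 2%
# when 882 true positives are accepted (illustrative derivation only).
max_fp = 0
while fdr(max_fp + 1, 882) < Fraction(2, 100):
    max_fp += 1

print(f"sensitivity = {float(sens):.1%}")
print(f"max false positives under 2% FDR = {max_fp}")
```

Exact fractions are used so that the boundary case (18 false positives gives exactly 2%) is not affected by floating-point rounding.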


Subjects
Methylation , Post-Translational Protein Processing , Proteomics/methods , Binding Sites , Isotope Labeling , Sensitivity and Specificity
9.
J Proteome Res ; 16(5): 1988-2003, 2017 05 05.
Article in English | MEDLINE | ID: mdl-28349685

ABSTRACT

Post-translational modifications of proteins (PTMs) act as key regulators of protein activity and of protein-protein interactions (PPIs). To date, it has been difficult to comprehensively explore functional links between PTMs and PPIs. To address this, we developed PTMOracle, a Cytoscape app for coanalyzing PTMs within PPI networks. PTMOracle also allows extensive data to be integrated and coanalyzed with PPI networks, allowing the role of domains, motifs, and disordered regions to be considered. For proteins of interest, or a whole proteome, PTMOracle can generate network visualizations to reveal complex PTM-associated relationships. This is assisted by OraclePainter for coloring proteins by modifications, OracleTools for network analytics, and OracleResults for exploring tabulated findings. To illustrate the use of PTMOracle, we investigate PTM-associated relationships and their role in PPIs in four case studies. In the yeast interactome and its rich set of PTMs, we construct and explore histone-associated and domain-domain interaction networks and show how integrative approaches can predict kinases involved in phosphodegrons. In the human interactome, a phosphotyrosine-associated network is analyzed but highlights the sparse nature of human PPI networks and lack of PTM-associated data. PTMOracle is open source and available at the Cytoscape app store: http://apps.cytoscape.org/apps/ptmoracle .


Subjects
Mobile Applications , Protein Interaction Maps , Post-Translational Protein Processing , Fungal Proteins , Humans , Phosphotransferases/metabolism , Phosphotyrosine/metabolism , Proteins , Yeasts
10.
Mol Cell Proteomics ; 15(3): 989-1006, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26699799

ABSTRACT

All large scale LC-MS/MS post-translational methylation site discovery experiments require methylpeptide spectrum matches (methyl-PSMs) to be identified at acceptably low false discovery rates (FDRs). To meet estimated methyl-PSM FDRs, methyl-PSM filtering criteria are often determined using the target-decoy approach. The efficacy of this methyl-PSM filtering approach has, however, yet to be thoroughly evaluated. Here, we conduct a systematic analysis of methyl-PSM FDRs across a range of sample preparation workflows (each differing in their exposure to the alcohols methanol and isopropyl alcohol) and mass spectrometric instrument platforms (each employing a different mode of MS/MS dissociation). Through (13)CD3-methionine labeling (heavy-methyl SILAC) of Saccharomyces cerevisiae cells and in-depth manual data inspection, accurate lists of true positive methyl-PSMs were determined, allowing methyl-PSM FDRs to be compared with target-decoy approach-derived methyl-PSM FDR estimates. These results show that global FDR estimates produce extremely unreliable methyl-PSM filtering criteria; we demonstrate that this is an unavoidable consequence of the high number of amino acid combinations capable of producing peptide sequences that are isobaric to methylated peptides of a different sequence. Separate methyl-PSM FDR estimates were also found to be unreliable due to prevalent sources of false positive methyl-PSMs that produce high peptide identity score distributions. Incorrect methylation site localizations, peptides containing cysteinyl-S-β-propionamide, and methylated glutamic or aspartic acid residues can partially, but not wholly, account for these false positive methyl-PSMs. Together, these results indicate that the target-decoy approach is an unreliable means of estimating methyl-PSM FDRs and methyl-PSM filtering criteria. We suggest that orthogonal methylpeptide validation (e.g. heavy-methyl SILAC or its offshoots) should be considered a prerequisite for obtaining high confidence methyl-PSMs in large scale LC-MS/MS methylation site discovery experiments and make recommendations on how to reduce methyl-PSM FDRs in samples not amenable to heavy isotope labeling. Data are available via ProteomeXchange with the data identifier PXD002857.
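
For readers unfamiliar with the target-decoy approach evaluated above, a minimal sketch of how a score cutoff is usually derived is given below. It uses the common decoys/targets estimate. The data layout and function name are assumptions, and the sketch does not reproduce the study's analysis. As the abstract stresses, such estimates were found to be unreliable for methyl-PSMs.

```python
def score_threshold(psms, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    psms: iterable of (score, is_decoy) pairs, one per peptide spectrum match.
    The FDR at a cutoff is estimated as (decoy hits) / (target hits) among
    all PSMs scoring at or above that cutoff. Returns None if no cutoff works.
    Tied scores are not treated specially in this sketch.
    """
    targets = decoys = 0
    best = None
    for score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            best = score
    return best


# Hypothetical PSM list: (search-engine score, matched a decoy sequence?)
psms = [(90, False), (80, False), (70, False), (60, True),
        (50, False), (40, False), (30, True)]

print(score_threshold(psms, max_fdr=0.25))  # cutoff tolerating 1 decoy per 4 targets
print(score_threshold(psms, max_fdr=0.0))   # strictest cutoff: no decoys accepted
```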


Subjects
Proteomics/methods , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/metabolism , False Positive Reactions , Methylation , Peptides/chemistry , Proteomics/instrumentation , Tandem Mass Spectrometry/instrumentation , Tandem Mass Spectrometry/methods
11.
J Proteome Res ; 14(12): 5038-47, 2015 Dec 04.
Article in English | MEDLINE | ID: mdl-26554900

ABSTRACT

In recent years, proteomic data have contributed to genome annotation efforts, most notably in humans and mice, and spawned a field termed "proteogenomics". Yeast, in contrast with higher eukaryotes, has a small genome, which has lent itself to simpler ORF prediction. Despite this, continual advances in mass spectrometry suggest that proteomics should be able to improve genome annotation even in this well-characterized species. Here we applied a proteogenomics workflow to yeast to identify novel protein-coding genes. Specific databases were generated, from intergenic regions of the genome, which were then queried with MS/MS data. This suggested the existence of several putative novel ORFs of <100 codons, one of which we chose to validate. Synthetic peptides, RNA-Seq analysis, and evidence of evolutionary conservation allowed for the unequivocal definition of a new protein of 78 amino acids encoded on chromosome X, which we dub YJR107C-A. It encodes a new type of domain, which ab initio modeling suggests as predominantly α-helical. We show that this gene is nonessential for growth; however, deletion increases sensitivity to osmotic stress. Finally, from the above discovery process, we discuss a generalizable strategy for the identification of short ORFs and small proteins, many of which are likely to be undiscovered.
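
A search database of candidate short ORFs from intergenic DNA can be built by six-frame translation. The sketch below assumes the standard genetic code and simple ATG-to-stop ORFs. The length limits, function names, and toy sequence are illustrative assumptions, not the workflow used in the study.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def short_orfs(seq, min_codons=2, max_codons=100):
    """Yield (strand, frame, peptide) for ATG..stop ORFs in all six frames."""
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            protein = "".join(
                CODON_TABLE.get(s[i:i + 3], "X")
                for i in range(frame, len(s) - 2, 3)
            )
            # Every piece except the last one ended at a stop codon.
            for segment in protein.split("*")[:-1]:
                start = segment.find("M")
                if start != -1:
                    peptide = segment[start:]
                    if min_codons <= len(peptide) < max_codons:
                        yield strand, frame, peptide


# Toy "intergenic" sequence containing one small ORF: ATG AAA TTT TAA.
found = list(short_orfs("CCATGAAATTTTAAGG"))
print(found)
```

The resulting peptide sequences would then be written to a FASTA file and queried with MS/MS data by a search engine, a step this sketch leaves out.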


Subjects
Genomics/methods , Open Reading Frames , Proteomics/methods , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism , Amino Acid Sequence , Genetic Databases , Gene Knockout Techniques , Molecular Sequence Data , Reproducibility of Results , RNA Sequence Analysis/methods , Tandem Mass Spectrometry
12.
J Proteome Res ; 14(9): 3541-54, 2015 Sep 04.
Article in English | MEDLINE | ID: mdl-25961807

ABSTRACT

Human proteome analysis now requires an understanding of protein isoforms. We recently published the PG Nexus pipeline, which facilitates high confidence validation of exons and splice junctions by integrating genomics and proteomics data. Here we comprehensively explore how RNA-seq transcriptomics data, and proteomic analysis of the same sample, can identify protein isoforms. RNA-seq data from human mesenchymal stem cells (hMSCs) were analyzed with our new TranscriptCoder tool to generate a database of protein isoform sequences. MS/MS data from matching hMSC samples were then matched against the TranscriptCoder-derived database, along with Ensembl and the neXtProt database. Querying the TranscriptCoder-derived or Ensembl database could unambiguously identify ∼450 protein isoforms, with isoform-specific proteotypic peptides, including candidate hMSC-specific isoforms for the genes DPYSL2 and FXR1. Where isoform-specific peptides did not exist, groups of nonisoform-specific proteotypic peptides could specifically identify many isoforms. In both the above cases, isoforms will be detectable with targeted MS/MS assays. Unfortunately, our analysis also revealed that some isoforms will be difficult to identify unambiguously as they do not have peptides that are sufficiently distinguishing. We covisualize mRNA isoforms and peptides in a genome browser to illustrate the above situations. Mass spectrometry data are available via ProteomeXchange (PXD001449).
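
The notion of an isoform-specific peptide can be made concrete with an in-silico tryptic digest. The sketch below is an illustration under simplifying assumptions (cleavage after K or R unless followed by P, no missed cleavages, a minimum peptide length of 6). The isoform sequences and function names are hypothetical, and this is not the TranscriptCoder implementation.

```python
import re
from collections import defaultdict


def tryptic_peptides(protein, min_len=6):
    """Digest a protein after K/R (not before P) and keep peptides >= min_len."""
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return {p for p in pieces if len(p) >= min_len}


def isoform_specific(isoforms, min_len=6):
    """Map each isoform name to the peptides found in that isoform only."""
    owners = defaultdict(set)
    for name, seq in isoforms.items():
        for pep in tryptic_peptides(seq, min_len):
            owners[pep].add(name)
    specific = {name: set() for name in isoforms}
    for pep, names in owners.items():
        if len(names) == 1:
            specific[next(iter(names))].add(pep)
    return specific


# Hypothetical isoforms of one gene: iso1 and iso2 differ in a middle exon,
# iso3 skips that exon entirely.
isoforms = {
    "iso1": "MAAAAAKCCCCCCRDDDDDDK",
    "iso2": "MAAAAAKEEEEEERDDDDDDK",
    "iso3": "MAAAAAKDDDDDDK",
}
specific = isoform_specific(isoforms)
print(specific)
```

Here iso3 has no specific peptide at all, which mirrors the situation described above where some isoforms cannot be identified unambiguously. A fuller treatment would also consider peptides that span the new exon junction and missed cleavages.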


Subjects
Proteomics , Messenger RNA/genetics , RNA Sequence Analysis , Cultured Cells , Codon , Exons , Humans , Open Reading Frames , Protein Isoforms/chemistry , Protein Isoforms/genetics , RNA Splicing , Tandem Mass Spectrometry
13.
J Proteome Res ; 13(1): 84-98, 2014 Jan 03.
Article in English | MEDLINE | ID: mdl-24152167

ABSTRACT

Direct links between proteomic and genomic/transcriptomic data are not frequently made, partly because of a lack of appropriate bioinformatics tools. To help address this, we have developed the PG Nexus pipeline. The PG Nexus allows users to covisualize peptides in the context of genomes or genomic contigs, along with RNA-seq reads. This is done in the Integrative Genomics Viewer (IGV). A Results Analyzer reports the precise base position where LC-MS/MS-derived peptides cover genes or gene isoforms, on the chromosomes or contigs where this occurs. In prokaryotes, the PG Nexus pipeline facilitates the validation of genes, where annotation or gene prediction is available, or the discovery of genes using a "virtual protein"-based unbiased approach. We illustrate this with a comprehensive proteogenomics analysis of two strains of Campylobacter concisus. For higher eukaryotes, the PG Nexus facilitates gene validation and supports the identification of mRNA splice junction boundaries and splice variants that are protein-coding. This is illustrated with an analysis of splice junctions covered by human phosphopeptides, and other examples of relevance to the Chromosome-Centric Human Proteome Project. The PG Nexus is open-source and available from https://github.com/IntersectAustralia/ap11_Samifier. It has been integrated into Galaxy and made available in the Galaxy tool shed.


Subjects
Genome , Proteomics , RNA Splicing , Messenger RNA/genetics , Transcriptome , Campylobacter/genetics , Humans , Mass Spectrometry , Phosphopeptides/genetics , Saccharomyces cerevisiae/genetics
14.
Genome Announc ; 1(1)2013 Jan.
Article in English | MEDLINE | ID: mdl-23409253

ABSTRACT

Campylobacter showae UNSWCD was isolated from a patient with Crohn's disease. Here we present a 2.1 Mb draft assembly of its genome.
