2.
J Proteome Res ; 23(1): 418-429, 2024 01 05.
Article in English | MEDLINE | ID: mdl-38038272

ABSTRACT

The inherent diversity of approaches in proteomics research has led to a wide range of software solutions for data analysis. These software solutions encompass multiple tools, each employing different algorithms for various tasks such as peptide-spectrum matching, protein inference, quantification, statistical analysis, and visualization. To enable an unbiased comparison of commonly used bottom-up label-free proteomics workflows, we introduce WOMBAT-P, a versatile platform designed for automated benchmarking and comparison. WOMBAT-P simplifies the processing of public data by utilizing the sample and data relationship format for proteomics (SDRF-Proteomics) as input. This feature streamlines the analysis of annotated local or public ProteomeXchange data sets, promoting efficient comparisons among diverse outputs. Through an evaluation using experimental ground truth data and a realistic biological data set, we uncover significant disparities and a limited overlap in the quantified proteins. WOMBAT-P not only enables rapid execution and seamless comparison of workflows but also provides valuable insights into the capabilities of different software solutions. These benchmarking metrics are a valuable resource for researchers in selecting the most suitable workflow for their specific data sets. The modular architecture of WOMBAT-P promotes extensibility and customization. The software is available at https://github.com/wombat-p/WOMBAT-Pipelines.
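
The "limited overlap in the quantified proteins" mentioned above can be made concrete with a pairwise set comparison. The following is a minimal sketch, assuming each workflow's output has already been reduced to a set of protein accessions. The Jaccard index, the workflow names, and the accessions are illustrative assumptions, not necessarily the metrics or data that WOMBAT-P reports.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two sets; defined here as 1.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_overlap(quantified):
    """Map each unordered pair of workflow names to the Jaccard index
    of the protein sets they quantified."""
    return {
        (w1, w2): jaccard(quantified[w1], quantified[w2])
        for w1, w2 in combinations(sorted(quantified), 2)
    }


# Hypothetical workflow names and protein accessions, for illustration only.
quantified = {
    "workflow_a": {"P1", "P2", "P3", "P4"},
    "workflow_b": {"P2", "P3", "P5"},
    "workflow_c": {"P3", "P4", "P5", "P6"},
}

overlap = pairwise_overlap(quantified)
for pair, score in overlap.items():
    print(pair, round(score, 3))
```

A value near 1 means two workflows quantified almost the same proteins; a value near 0 means they barely agree.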


Subjects
Benchmarking, Proteomics, Workflow, Software, Proteins, Data Analysis
3.
J Biomol Tech ; 34(3), 2023 Sep 30.
Article in English | MEDLINE | ID: mdl-37969874

ABSTRACT

Metaproteomics research using mass spectrometry data has emerged as a powerful strategy to understand the mechanisms underlying microbiome dynamics and the interaction of microbiomes with their immediate environment. Recent advances in sample preparation, data acquisition, and bioinformatics workflows have greatly contributed to progress in this field. In 2020, the Association of Biomolecular Resource Facilities Proteome Informatics Research Group launched a collaborative study to assess the bioinformatics options available for metaproteomics research. The study was conducted in 2 phases. In the first phase, participants were provided with mass spectrometry data files and were asked to identify the taxonomic composition and relative taxa abundances in the samples without supplying any protein sequence databases. The most challenging question asked of the participants was to postulate the nature of any biological phenomena that may have taken place in the samples, such as interactions among taxonomic species. In the second phase, participants were provided a protein sequence database composed of the species present in the sample and were asked to answer the same set of questions as for phase 1. In this report, we summarize the data processing methods and tools used by participants, including database searching and software tools used for taxonomic and functional analysis. This study provides insights into the status of metaproteomics bioinformatics in participating laboratories and core facilities.


Subjects
Proteome, Proteomics, Humans, Proteomics/methods, Software, Computational Biology, Databases, Protein
4.
J Biomol Tech ; 34(2), 2023 07 01.
Article in English | MEDLINE | ID: mdl-37435391

ABSTRACT

Despite the advantages of fewer missing values by collecting fragment ion data on all analytes in the sample as well as the potential for deeper coverage, the adoption of data-independent acquisition (DIA) in proteomics core facility settings has been slow. The Association of Biomolecular Resource Facilities conducted a large interlaboratory study to evaluate DIA performance in proteomics laboratories with various instrumentation. Participants were supplied with generic methods and a uniform set of test samples. The resulting 49 DIA datasets act as benchmarks and have utility in education and tool development. The sample set consisted of a tryptic HeLa digest spiked with high or low levels of 4 exogenous proteins. Data are available in MassIVE MSV000086479. Additionally, we demonstrate how the data can be analyzed by focusing on 2 datasets using different library approaches and show the utility of select summary statistics. These data can be used by DIA newcomers, software developers, or DIA experts evaluating performance with different platforms, acquisition settings, and skill levels.


Subjects
Benchmarking, Proteomics, Humans, Drugs, Generic, Educational Status, Gene Library
5.
J Proteome Res ; 22(3): 681-696, 2023 03 03.
Article in English | MEDLINE | ID: mdl-36744821

ABSTRACT

In recent years machine learning has made extensive progress in modeling many aspects of mass spectrometry data. We brought together proteomics data generators, repository managers, and machine learning experts in a workshop with the goals to evaluate and explore machine learning applications for realistic modeling of data from multidimensional mass spectrometry-based proteomics analysis of any sample or organism. Following this sample-to-data roadmap helped identify knowledge gaps and define needs. Being able to generate bespoke and realistic synthetic data has legitimate and important uses in system suitability, method development, and algorithm benchmarking, while also posing critical ethical questions. The interdisciplinary nature of the workshop informed discussions of what is currently possible and future opportunities and challenges. In the following perspective we summarize these discussions in the hope of conveying our excitement about the potential of machine learning in proteomics and to inspire future research.


Subjects
Machine Learning, Proteomics, Proteomics/methods, Algorithms, Mass Spectrometry
7.
J Proteome Res ; 22(2): 632-636, 2023 02 03.
Article in English | MEDLINE | ID: mdl-36693629

ABSTRACT

Data set acquisition and curation are often the most difficult and time-consuming parts of a machine learning endeavor. This is especially true for proteomics-based liquid chromatography (LC) coupled to mass spectrometry (MS) data sets, due to the high levels of data reduction that occur between raw data and machine learning-ready data. Since predictive proteomics is an emerging field, when predicting peptide behavior in LC-MS setups, each lab often uses unique and complex data processing pipelines in order to maximize performance, at the cost of accessibility and reproducibility. For this reason we introduce ProteomicsML, an online resource for proteomics-based data sets and tutorials across most of the currently explored physicochemical peptide properties. This community-driven resource makes it simple to access data in easy-to-process formats, and contains easy-to-follow tutorials that allow new users to interact with even the most advanced algorithms in the field. ProteomicsML provides data sets that are useful for comparing state-of-the-art machine learning algorithms, as well as providing introductory material for teachers and newcomers to the field alike. The platform is freely available at https://www.proteomicsml.org/, and we welcome the entire proteomics community to contribute to the project at https://github.com/ProteomicsML/ProteomicsML.


Subjects
Algorithms, Proteomics, Proteomics/methods, Reproducibility of Results, Peptides/analysis, Mass Spectrometry/methods, Software
8.
J Proteome Res ; 22(2): 514-519, 2023 02 03.
Article in English | MEDLINE | ID: mdl-36173614

ABSTRACT

It has long been known that biological species can be identified from mass spectrometry data alone. Ten years ago, we described a method and software tool, compareMS2, for calculating a distance between sets of tandem mass spectra, as routinely collected in proteomics. This method has seen use in species identification and mixture characterization in food and feed products, as well as other applications. Here, we present the first major update of this software, including a new metric, a graphical user interface and additional functionality. The data have been deposited to ProteomeXchange with dataset identifier PXD034932.


Subjects
Software, Tandem Mass Spectrometry, Tandem Mass Spectrometry/methods, Proteomics/methods, Algorithms
9.
J Am Soc Mass Spectrom ; 33(12): 2203-2214, 2022 Dec 07.
Article in English | MEDLINE | ID: mdl-36371691

ABSTRACT

Ultrahigh resolution mass spectrometry (UHR-MS) coupled with direct infusion (DI) electrospray ionization offers a fast solution for accurate untargeted profiling. Fourier transform ion cyclotron resonance (FT-ICR) mass spectrometers have been shown to produce a wealth of insights into complex chemical systems because they enable unambiguous molecular formula assignment even if the vast majority of signals is of unknown identity. Interlaboratory comparisons are required to apply this type of instrumentation in quality control (for food industry or pharmaceuticals), large-scale environmental studies, or clinical diagnostics. Extended comparisons employing different FT-ICR MS instruments with qualitative direct infusion analysis are scarce since the majority of detected compounds cannot be quantified. The extent to which observations can be reproduced by different laboratories remains unknown. We set up a preliminary study which encompassed a set of 17 laboratories around the globe, diverse in instrumental characteristics and applications, to analyze the same sets of extracts from commercially available standard human blood plasma and Standard Reference Material (SRM) for blood plasma (SRM1950), which were delivered at different dilutions or spiked with different concentrations of pesticides. The aim of this study was to assess the extent to which the outputs of differently tuned FT-ICR mass spectrometers, with different technical specifications, are comparable for setting the frames of a future DI-FT-ICR MS ring trial. We concluded that a cluster of five laboratories, with diverse instrumental characteristics, showed comparable and representative performance across all experiments, setting a reference to be used in a future ring trial on blood plasma.

11.
J Proteome Res ; 21(11): 2553-2554, 2022 11 04.
Article in English | MEDLINE | ID: mdl-36193949
12.
Anal Chem ; 94(44): 15464-15471, 2022 11 08.
Article in English | MEDLINE | ID: mdl-36281827

ABSTRACT

A major obstacle for reusing and integrating existing data is finding the data that is most relevant in a given context. The primary metadata resource is the scientific literature describing the experiments that produced the data. To stimulate the development of natural language processing methods for extracting this information from articles, we have manually annotated 100 recent open access publications in Analytical Chemistry as semantic graphs. We focused on articles mentioning mass spectrometry in their experimental sections, as we are particularly interested in the topic, which is also within the domain of several ontologies and controlled vocabularies. The resulting gold standard dataset is publicly available and directly applicable to validating automated methods for retrieving this metadata from the literature. In the process, we also made a number of observations on the structure and description of experiments and open access publication in this journal.


Subjects
Natural Language Processing, Semantics, Research Design, Chemistry, Analytic
13.
Toxicology ; 477: 153262, 2022 07.
Article in English | MEDLINE | ID: mdl-35868597

ABSTRACT

The zebrafish embryo (ZFE) is a promising alternative non-rodent model in toxicology, and initial studies suggested its applicability in detecting hepatic responses related to drug-induced liver injury (DILI). Here, we hypothesize that detailed analysis of underlying mechanisms of hepatotoxicity in ZFE contributes to the improved identification of hepatotoxic properties of compounds and to the reduction of rodents used for hepatotoxicity assessment. ZFEs were exposed to nine reference hepatotoxicants, targeted at induction of steatosis, cholestasis, and necrosis, and effects compared with negative controls. Protein profiles of the individual compounds were generated using LC-MS/MS. We identified differentially expressed proteins and pathways, but as these showed considerable overlap, phenotype-specific responses could not be distinguished. This led us to identify a set of common hepatotoxicity marker proteins. At the pathway level, these were mainly associated with cellular adaptive stress-responses, whereas single proteins could be linked to common hepatotoxicity-associated processes. Applying several stringency criteria to our proteomics data as well as information from other data sources resulted in a set of potential robust protein markers, notably Igf2bp1, Cox5ba, Ahnak, Itih3b.2, Psma6b, Srsf3a, Ces2b, Ces2a, Tdo2b, and Anxa1c, for the detection of adverse responses.


Subjects
Chemical and Drug Induced Liver Injury, Zebrafish, Animals, Biomarkers/metabolism, Chemical and Drug Induced Liver Injury/etiology, Chemical and Drug Induced Liver Injury/metabolism, Chromatography, Liquid, Liver, Proteome, RNA-Binding Proteins/metabolism, Tandem Mass Spectrometry, Zebrafish/physiology, Zebrafish Proteins/genetics
14.
J Proteome Res ; 21(4): 1204-1207, 2022 04 01.
Article in English | MEDLINE | ID: mdl-35119864

ABSTRACT

Machine learning is increasingly applied in proteomics and metabolomics to predict molecular structure, function, and physicochemical properties, including behavior in chromatography, ion mobility, and tandem mass spectrometry. These must be described in sufficient detail to apply or evaluate the performance of trained models. Here we look at and interpret the recently published and general DOME (Data, Optimization, Model, Evaluation) recommendations for conducting and reporting on machine learning in the specific context of proteomics and metabolomics.


Subjects
Metabolomics, Proteomics, Machine Learning, Metabolomics/methods, Proteomics/methods, Tandem Mass Spectrometry
15.
F1000Res ; 10: 897, 2021.
Article in English | MEDLINE | ID: mdl-34804501

ABSTRACT

Scientific data analyses often combine several computational tools in automated pipelines, or workflows. Thousands of such workflows have been used in the life sciences, though their composition has remained a cumbersome manual process due to a lack of standards for annotation, assembly, and implementation. Recent technological advances have returned the long-standing vision of automated workflow composition into focus. This article summarizes a recent Lorentz Center workshop dedicated to automated composition of workflows in the life sciences. We survey previous initiatives to automate the composition process, and discuss the current state of the art and future perspectives. We start by drawing the "big picture" of the scientific workflow development life cycle, before surveying and discussing current methods, technologies and practices for semantic domain modelling, automation in workflow development, and workflow assessment. Finally, we derive a roadmap of individual and community-based actions to work toward the vision of automated workflow development in the forthcoming years. A central outcome of the workshop is a general description of the workflow life cycle in six stages: 1) scientific question or hypothesis, 2) conceptual workflow, 3) abstract workflow, 4) concrete workflow, 5) production workflow, and 6) scientific results. The transitions between stages are facilitated by diverse tools and methods, usually incorporating domain knowledge in some form. Formal semantic domain modelling is hard and often a bottleneck for the application of semantic technologies. However, life science communities have made considerable progress here in recent years and are continuously improving, renewing interest in the application of semantic technologies for workflow exploration, composition and instantiation. Combined with systematic benchmarking with reference data and large-scale deployment of production-stage workflows, such technologies enable a more systematic process of workflow development than we know today. We believe that this can lead to more robust, reusable, and sustainable workflows in the future.


Subjects
Biological Science Disciplines, Computational Biology, Benchmarking, Software, Workflow
16.
J Proteome Res ; 20(10): 4640-4645, 2021 Oct 01.
Article in English | MEDLINE | ID: mdl-34523928

ABSTRACT

Science is full of overlooked and undervalued research waiting to be rediscovered. Proteomics is no exception. In this perspective, we follow the ripples from a 1960 study of Zuckerkandl, Jones, and Pauling comparing tryptic peptides across animal species. This pioneering work directly led to the molecular clock hypothesis and the ensuing explosion in molecular phylogenetics. In the decades following, proteins continued to provide essential clues on evolutionary history. While technology has continued to improve, contemporary proteomics has strayed from this larger biological context, rarely comparing species or asking how protein structure, function, and interactions have evolved. Here we recombine proteomics with molecular phylogenetics, highlighting the value of framing proteomic results in a larger biological context and how almost forgotten research, though technologically surpassed, can still generate new ideas and illuminate our work from a different perspective. Though it is infeasible to read all research published on a large topic, looking up older papers can be surprisingly rewarding when rediscovering a "gem" at the end of a long citation chain, aided by digital collections and perpetually helpful librarians. Proper literature study reduces unnecessary repetition and allows research to be more insightful and impactful by truly standing on the shoulders of giants. All data was uploaded to MassIVE (https://massive.ucsd.edu/) as dataset MSV000087993.


Subjects
Peptides, Proteomics, Animals, Phylogeny
17.
J Proteome Res ; 20(6): 3395-3399, 2021 06 04.
Article in English | MEDLINE | ID: mdl-33904308

ABSTRACT

While mass spectrometry still dominates proteomics research, alternative and potentially disruptive, next-generation technologies are receiving increased investment and attention. Most of these technologies aim at the sequencing of single peptide or protein molecules, typically labeling or otherwise distinguishing a subset of the proteinogenic amino acids. This note considers some theoretical aspects of these future technologies from a bottom-up proteomics viewpoint, including the ability to uniquely identify human proteins as a function of which and how many amino acids can be read, enzymatic efficiency, and the maximum read length. This is done through simulations under ideal and non-ideal conditions to set benchmarks for what may be achievable with future single-molecule sequencing technology. The simulations reveal, among other observations, that the best choice of reading N amino acids performs similarly to the average choice of N+1 amino acids, and that the discrimination power of the amino acids scales with their frequency in the proteome. The simulations are agnostic with respect to the next-generation proteomics platform, and the results and conclusions should therefore be applicable to any single-molecule partial peptide sequencing technology.
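
The idea of identifying proteins from reads that resolve only a subset of amino acids can be illustrated with a toy calculation. The following is a minimal sketch under ideal conditions (whole sequences, no digestion, no read errors, unlimited read length), not the simulation used in the study. The function names and the four short sequences are hypothetical.

```python
from collections import Counter


def reduced_read(sequence, readable, placeholder="x"):
    """Replace every residue that the platform cannot read with a placeholder."""
    return "".join(aa if aa in readable else placeholder for aa in sequence)


def fraction_unique(proteins, readable):
    """Fraction of proteins whose reduced read is shared with no other protein."""
    reads = {name: reduced_read(seq, readable) for name, seq in proteins.items()}
    counts = Counter(reads.values())
    unique = sum(1 for read in reads.values() if counts[read] == 1)
    return unique / len(proteins)


# Hypothetical toy "proteome"; real sequences would come from a FASTA file.
proteins = {
    "prot1": "MKCLA",
    "prot2": "MKALC",
    "prot3": "MRCLA",
    "prot4": "MKCLG",
}

for readable in ({"K"}, {"K", "C"}, {"K", "C", "A"}):
    print(sorted(readable), fraction_unique(proteins, readable))
```

In this toy set, each additional readable amino acid raises the fraction of uniquely identifiable sequences, which is the kind of trend such simulations quantify on a real proteome.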


Subjects
Proteome, Proteomics, Amino Acid Sequence, Humans, Mass Spectrometry, Peptides
19.
J Proteome Res ; 20(4): 2157-2165, 2021 04 02.
Article in English | MEDLINE | ID: mdl-33720735

ABSTRACT

The bio.tools registry is a main catalogue of computational tools in the life sciences. More than 17 000 tools have been registered by the international bioinformatics community. The bio.tools metadata schema includes semantic annotations of tool functions, that is, formal descriptions of tools' data types, formats, and operations with terms from the EDAM bioinformatics ontology. Such annotations enable the automated composition of tools into multistep pipelines or workflows. In this Technical Note, we revisit a previous case study on the automated composition of proteomics workflows. We use the same four workflow scenarios but instead of using a small set of tools with carefully handcrafted annotations, we explore workflows directly on bio.tools. We use the Automated Pipeline Explorer (APE), a reimplementation and extension of the workflow composition method previously used. Moving "into the wild" opens up an unprecedented wealth of tools and a huge number of alternative workflows. Automated composition tools can be used to explore this space of possibilities systematically. Inevitably, the mixed quality of semantic annotations in bio.tools leads to unintended or erroneous tool combinations. However, our results also show that additional control mechanisms (tool filters, configuration options, and workflow constraints) can effectively guide the exploration toward smaller sets of more meaningful workflows.
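
The principle that input and output annotations allow tools to be chained automatically can be sketched as a small search over data types. The following is a minimal illustration using breadth-first enumeration. It is not a reimplementation of APE, whose algorithm the abstract does not describe, and the tool names and data types are hypothetical stand-ins for bio.tools entries and EDAM terms.

```python
from collections import deque

# Hypothetical annotations: tool name -> (input data type, output data type).
TOOLS = {
    "peak_picker": ("raw_spectra", "peak_list"),
    "search_engine": ("peak_list", "peptide_ids"),
    "protein_inference": ("peptide_ids", "protein_ids"),
    "quantifier": ("protein_ids", "protein_abundances"),
    "spectrum_clustering": ("peak_list", "peak_list"),
}


def compose(tools, start, goal, max_len):
    """Enumerate tool sequences of at most max_len steps that turn `start`
    data into `goal` data, using each tool at most once per workflow."""
    workflows = []
    queue = deque([(start, [])])
    while queue:
        dtype, path = queue.popleft()
        if dtype == goal and path:
            workflows.append(path)
            continue
        if len(path) == max_len:
            continue
        for name, (inp, out) in sorted(tools.items()):
            # A tool can be appended when its input type matches the current data.
            if inp == dtype and name not in path:
                queue.append((out, path + [name]))
    return workflows


for workflow in compose(TOOLS, "raw_spectra", "protein_ids", max_len=4):
    print(" -> ".join(workflow))
```

The `max_len` limit plays the role of a simple workflow constraint: tightening it from 4 to 3 removes the variant that passes through the optional clustering step.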


Subjects
Proteomics, Software, Computational Biology, Registries, Workflow
20.
Bioinformatics ; 37(17): 2768-2769, 2021 Sep 09.
Article in English | MEDLINE | ID: mdl-33538780

ABSTRACT

SUMMARY: In mass spectrometry-based proteomics, accurate peptide masses improve identifications, alignment and quantitation. Getting the most out of any instrument therefore requires proper calibration. Here, we present a new stand-alone software, mzRecal, for universal automatic recalibration of data from all common mass analyzers using standard open formats and based on physical principles. AVAILABILITY AND IMPLEMENTATION: mzRecal is implemented in Go and freely available on https://github.com/524D/mzRecal. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
