Near Real-Time Processing of Proteomics Data Using Hadoop.

Hillman, Chris; Ahmad, Yasmeen; Whitehorn, Mark; Cobley, Andy.

Big Data ; 2(1): 44-9, 2014 Mar.

Article in English | MEDLINE | ID: mdl-27447310

ABSTRACT

This article presents a near real-time processing solution using MapReduce and Hadoop. The solution is aimed at some of the data management and processing challenges facing the life sciences community. Research into genes and their product proteins generates huge volumes of data that must be extensively preprocessed before any biological insight can be gained. In order to carry out this processing in a timely manner, we have investigated the use of techniques from the big data field. These are applied specifically to process data resulting from mass spectrometers in the course of proteomic experiments. Here we present methods of handling the raw data in Hadoop, and then we investigate a process for preprocessing the data using Java code and the MapReduce framework to identify 2D and 3D peaks.
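As a rough illustration of the map/reduce pattern the abstract describes, the sketch below picks 2D peaks in plain Python rather than in Java on Hadoop. It is not the authors' code. It assumes a 2D peak is a strict local intensity maximum along the m/z axis within one scan, and that each input record is a `(scan_id, mz, intensity)` tuple; the function names and the `min_intensity` parameter are hypothetical.

```python
from collections import defaultdict


def map_phase(records):
    """Map step: key each (scan_id, mz, intensity) reading by its scan."""
    for scan_id, mz, intensity in records:
        yield scan_id, (mz, intensity)


def shuffle(pairs):
    """Group mapped values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(scan_id, points, min_intensity=0.0):
    """Reduce step: report strict local intensity maxima along m/z for one scan.

    Assumption: a 2D peak is a point whose intensity exceeds both neighbours.
    """
    points = sorted(points)  # order readings by m/z
    peaks = []
    for i in range(1, len(points) - 1):
        mz, intensity = points[i]
        if (intensity >= min_intensity
                and intensity > points[i - 1][1]
                and intensity > points[i + 1][1]):
            peaks.append((mz, intensity))
    return scan_id, peaks


def find_2d_peaks(records, min_intensity=0.0):
    """Run map, shuffle, and reduce in-process and return {scan_id: peaks}."""
    grouped = shuffle(map_phase(records))
    return dict(reduce_phase(scan, pts, min_intensity)
                for scan, pts in grouped.items())
```

Because each scan is reduced independently, the same logic could in principle be distributed across many workers; a 3D extension would additionally compare peaks across neighbouring scans in retention time.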

Establishment of a protein frequency library and its application in the reliable identification of specific protein interaction partners.

Boulon, Séverine; Ahmad, Yasmeen; Trinkle-Mulcahy, Laura; Verheggen, Céline; Cobley, Andy; Gregor, Peter; Bertrand, Edouard; Whitehorn, Mark; Lamond, Angus I.

Mol Cell Proteomics ; 9(5): 861-79, 2010 May.

Article in English | MEDLINE | ID: mdl-20023298

ABSTRACT

The reliable identification of protein interaction partners and how such interactions change in response to physiological or pathological perturbations is a key goal in most areas of cell biology. Stable isotope labeling with amino acids in cell culture (SILAC)-based mass spectrometry has been shown to provide a powerful strategy for characterizing protein complexes and identifying specific interactions. Here, we show how SILAC can be combined with computational methods drawn from the business intelligence field for multidimensional data analysis to improve the discrimination between specific and nonspecific protein associations and to analyze dynamic protein complexes. A strategy is shown for developing a protein frequency library (PFL) that improves on previous use of static "bead proteomes." The PFL annotates the frequency of detection in co-immunoprecipitation and pulldown experiments for all proteins in the human proteome. It can provide a flexible and objective filter for discriminating between contaminants and specifically bound proteins and can be used to normalize data values and facilitate comparisons between data obtained in separate experiments. The PFL is a dynamic tool that can be filtered for specific experimental parameters to generate a customized library. It will be continuously updated as data from each new experiment are added to the library, thereby progressively enhancing its utility. The application of the PFL to pulldown experiments is especially helpful in identifying either lower abundance or less tightly bound specific components of protein complexes that are otherwise lost among the large, nonspecific background.
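The idea of a frequency-based filter can be sketched as follows. This is a minimal illustration, not the published PFL implementation: it assumes each experiment is simply a list of detected protein identifiers, defines frequency as the fraction of experiments in which a protein appears, and uses a hypothetical `max_frequency` cutoff to flag likely contaminants.

```python
from collections import Counter


def build_frequency_library(experiments):
    """Return {protein: fraction of experiments in which it was detected}."""
    total = len(experiments)
    if total == 0:
        return {}
    counts = Counter()
    for detected in experiments:
        counts.update(set(detected))  # count each protein once per experiment
    return {protein: n / total for protein, n in counts.items()}


def filter_candidates(hits, library, max_frequency=0.5):
    """Keep hits detected in at most max_frequency of past experiments.

    Proteins absent from the library are treated as frequency 0.0 (assumption).
    """
    return [p for p in hits if library.get(p, 0.0) <= max_frequency]
```

Rebuilding the library whenever a new experiment is appended mirrors the "continuously updated" behaviour the abstract describes, and restricting `experiments` to a subset (for example, one cell line) would correspond to a customized library.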

Subject(s)

Peptide Library , Protein Interaction Mapping/methods , Cell Line, Tumor , Databases, Protein , Humans , Isotope Labeling , Models, Biological , Multiprotein Complexes/metabolism , Protein Binding , Protein Subunits/metabolism , RNA Polymerase II/metabolism , Reproducibility of Results

NOPdb: Nucleolar Proteome Database--2008 update.

Ahmad, Yasmeen; Boisvert, François-Michel; Gregor, Peter; Cobley, Andy; Lamond, Angus I.

Nucleic Acids Res ; 37(Database issue): D181-4, 2009 Jan.

Article in English | MEDLINE | ID: mdl-18984612

ABSTRACT

An experimental data handling system has been created as an update to the previous Nucleolar Proteome Database (NOPdb3.0: http://www.lamondlab.com/NOPdb3.0/). This updated system is able to manage large data sets identified by multiple mass spectrometry and has been used to analyse highly purified preparations of human nucleoli from different cell lines. The newly created application includes a dynamic relational database, which is kept up to date by laboratory staff. The data are further annotated with information from specific external sources on the web, including the IPI and Gene Ontology databases. In addition, an Application Programming Interface provides external users with a portal to link into the nucleolar proteome database and hence, gain access to continually updated results. From the initial approximately 700 human proteins identified in the previous iteration of the NOPdb, we have now identified over 50 000 peptides contained in over 4500 human proteins from purified nucleoli, providing enhanced coverage of the nucleolar proteome.

Subject(s)

Cell Nucleolus/chemistry , Databases, Protein , Nuclear Proteins/chemistry , Humans , Internet , Mass Spectrometry , Peptides/chemistry , Proteome/chemistry
