1.
Bioinformatics ; 25(6): 787-94, 2009 Mar 15.
Article in English | MEDLINE | ID: mdl-19176550

ABSTRACT

MOTIVATION: Matching both the retention index (RI) and the mass spectrum of an unknown compound against a mass spectral reference library provides strong evidence for a correct identification of that compound. Data on retention indices are, however, available for only a small fraction of the compounds in such libraries. We propose a quantitative structure-RI model that enables the ranking and filtering of putative identifications of compounds for which the predicted RI falls outside a predefined window. RESULTS: We constructed multiple linear regression and support vector regression (SVR) models using a set of descriptors obtained with a genetic algorithm as variable selection method. The SVR model is a significant improvement over previous models built for structurally diverse compounds as it covers a large range (360-4100) of RI values and gives better prediction of isomer compounds. The hit list reduction varied from 41% to 60% and depended on the size of the original hit list. Large hit lists were reduced to a greater extent compared with small hit lists. AVAILABILITY: http://appliedbioinformatics.wur.nl/GC-MS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
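The window-based filtering described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Hit` record, the function names, the window width, and all numeric values below are hypothetical, and the predicted RI values are assumed to come from some already-fitted structure-RI model.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One candidate identification from a spectral library search (hypothetical record)."""
    name: str
    match_score: float   # spectral match score reported by the library search
    predicted_ri: float  # RI predicted for this candidate by a structure-RI model


def filter_hits(hits, observed_ri, window):
    """Drop hits whose predicted RI lies outside observed_ri +/- window.

    The surviving hits are ranked by absolute RI deviation, smallest first.
    """
    kept = [h for h in hits if abs(h.predicted_ri - observed_ri) <= window]
    return sorted(kept, key=lambda h: abs(h.predicted_ri - observed_ri))


def reduction_percent(n_before, n_after):
    """Percentage by which the hit list shrank after filtering."""
    if n_before == 0:
        return 0.0
    return 100.0 * (n_before - n_after) / n_before


# Illustrative values only; none of these numbers come from the paper.
hits = [
    Hit("candidate A", 0.90, 1200.0),
    Hit("candidate B", 0.85, 1500.0),
    Hit("candidate C", 0.80, 1230.0),
]
kept = filter_hits(hits, observed_ri=1210.0, window=50.0)
print([h.name for h in kept], reduction_percent(len(hits), len(kept)))
```

In this sketch the window is symmetric and fixed; in practice its width would presumably be chosen from the prediction error of the RI model.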


Subject(s)
Computational Biology/methods , Gas Chromatography-Mass Spectrometry/methods , Metabolomics/methods , Algorithms , Linear Models
2.
Bioinformatics ; 24(16): 1779-86, 2008 Aug 15.
Article in English | MEDLINE | ID: mdl-18562268

ABSTRACT

MOTIVATION: Recent research underlines the importance of fine-grained knowledge on protein localization. In particular, subcompartmental localization in the Golgi apparatus is important, for example, for the order of reactions performed in glycosylation pathways or the sorting functions of SNAREs, but is currently poorly understood. RESULTS: We assemble a dataset of type II transmembrane proteins with experimentally determined sub-Golgi localizations and use this information to develop a predictor based on the transmembrane domain of these proteins, making use of a dedicated protein-structure-based kernel in an SVM. Various applications demonstrate the power of our approach. In particular, comparison with a large set of glycan structures illustrates the applicability of our predictions on a 'glycomic' scale and demonstrates a significant correlation between sub-Golgi localization and the ordering of different steps in glycan biosynthesis. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Golgi Apparatus/metabolism , Models, Biological , Models, Chemical , Pattern Recognition, Automated/methods , SNARE Proteins/chemistry , SNARE Proteins/metabolism , Sequence Analysis, Protein/methods , Amino Acid Sequence , Artificial Intelligence , Computer Simulation , Molecular Sequence Data , Structure-Activity Relationship
3.
Bioinformatics ; 24(1): 26-33, 2008 Jan 01.
Article in English | MEDLINE | ID: mdl-18024974

ABSTRACT

MOTIVATION: Transcription factor interactions are the cornerstone of combinatorial control, which is a crucial aspect of the gene regulatory system. Understanding and predicting transcription factor interactions based on their sequence alone is difficult since they are often part of families of factors sharing high sequence identity. Given the scarcity of experimental data on interactions compared to available sequence data, however, it would be most useful to have accurate methods for the prediction of such interactions. RESULTS: We present a method consisting of a Random Forest-based feature-selection procedure that selects relevant motifs out of a set found using a correlated motif search algorithm. Prediction accuracy for several transcription factor families (bZIP, MADS, homeobox and forkhead) reaches 60-90%. In addition, we identified those parts of the sequence that are important for the interaction specificity, and show that these are in agreement with available data. We also used the predictors to perform genome-wide scans for interaction partners and recovered both known and putative new interaction partners.


Subject(s)
Models, Chemical , Pattern Recognition, Automated/methods , Protein Interaction Mapping/methods , Sequence Analysis, Protein/methods , Transcription Factors/chemistry , Amino Acid Sequence , Binding Sites , Combinatorial Chemistry Techniques/methods , Computer Simulation , Data Interpretation, Statistical , Molecular Sequence Data , Protein Binding