Results 1 - 6 of 6
1.
Article in English | MEDLINE | ID: mdl-24109769

ABSTRACT

Predicting the localization of a protein has become a useful practice for inferring its function. Most of the reported methods for predicting the subcellular localization of Gram-negative bacterial proteins have shown a low false positive rate. However, some subcellular compartments, such as "periplasm" and "extracellular medium", are difficult to predict and retain high false negative rates. In this paper, a method based on a representation derived from statistical contact potentials and the wavelet transform is presented. The wavelet-based method achieves high overall performance, keeping false positive and false negative rates low, particularly for periplasm and extracellular medium. The results suggest that contact potentials are a useful alternative for characterizing protein sequences.


Subject(s)
Bacterial Proteins/chemistry , Gram-Negative Bacteria , Amino Acid Sequence , Markov Chains , Molecular Sequence Annotation , Protein Transport , Sequence Analysis, Protein , Wavelet Analysis
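
As a rough illustration of the general idea in abstract 1 (turning a protein sequence into a numeric signal and then applying a wavelet transform), the sketch below encodes a short sequence with a made-up per-residue scale and applies one level of the Haar wavelet. The abstract does not say which wavelet or which potentials were used, so `TOY_SCALE`, `encode`, and `haar_step` are all hypothetical names and the numeric values are placeholders, not real statistical contact potentials.

```python
import math

# Hypothetical per-residue scores for illustration only; these are NOT
# real statistical contact potentials.
TOY_SCALE = {"A": 0.5, "G": 0.0, "L": 1.0, "K": -1.0}


def encode(sequence, scale):
    """Map each residue letter to its numeric score."""
    return [scale[residue] for residue in sequence]


def haar_step(signal):
    """Apply one level of the orthonormal Haar wavelet transform."""
    if len(signal) % 2:  # pad odd-length signals by repeating the last value
        signal = signal + [signal[-1]]
    root2 = math.sqrt(2.0)
    # Pairwise sums give the coarse approximation, pairwise differences the detail.
    approx = [(signal[i] + signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    return approx, detail


signal = encode("ALKG", TOY_SCALE)
approx, detail = haar_step(signal)
```

The approximation and detail coefficients could then serve as features for a classifier. Because this Haar step is orthonormal, the coefficients preserve the energy of the encoded signal.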
2.
Article in English | MEDLINE | ID: mdl-24110281

ABSTRACT

A comparative analysis of four multi-label classification methods is performed to determine the best topology for the problem of protein function prediction, using support vector machines as base classifiers. Comparisons are made in terms of the performance and computational cost of parallelized versions of the algorithms, to determine their applicability in high-throughput scenarios. Results show that the performance of the binary relevance strategy, together with a class-balancing technique, remains above that of several recently proposed techniques for the problem at hand, while incurring the lowest computational cost when parallelized. However, stacked classifiers and classifier chains can be conveniently used in pipelines, owing to the low number of false positives they report.


Subject(s)
Computational Biology , Proteins/metabolism , Algorithms , Databases, Protein , Embryophyta/metabolism , Proteins/classification , Support Vector Machine
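
The binary relevance strategy named in abstract 2 trains one independent binary classifier per label and returns every label whose classifier fires. The sketch below shows that decomposition only. To stay dependency-free it swaps the support vector machine base classifier for a simple nearest-centroid rule, and it includes no class balancing; the function names and toy data are assumptions made for illustration.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(points) for column in zip(*points)]


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def fit_binary_relevance(X, Y, labels):
    """Train one binary model per label (nearest centroid stands in for an SVM).

    Assumes every label has at least one positive and one negative example.
    """
    models = {}
    for label in labels:
        positives = [x for x, labs in zip(X, Y) if label in labs]
        negatives = [x for x, labs in zip(X, Y) if label not in labs]
        models[label] = (centroid(positives), centroid(negatives))
    return models


def predict_labels(models, x):
    """Return every label whose positive centroid is closer than its negative one."""
    return {
        label
        for label, (pos, neg) in models.items()
        if squared_distance(x, pos) < squared_distance(x, neg)
    }


# Toy feature vectors and label sets (hypothetical).
X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
Y = [{"a"}, {"a"}, {"b"}, {"a", "b"}]
models = fit_binary_relevance(X, Y, labels=["a", "b"])
```

Because the per-label models do not depend on one another, the loop in `fit_binary_relevance` is the part that could be distributed across workers in a parallelized version.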
3.
Article in English | MEDLINE | ID: mdl-23367187

ABSTRACT

Predicting the sub-cellular localization of a protein can provide useful information for uncovering its molecular functions. To this end, numerous prediction techniques have been developed, most of which focus on global information about the protein or on sequence alignments. However, several studies have shown that the functional nature of proteins is governed by conserved sub-sequence patterns known as domains. In this paper, an alternative methodology (PfamFeat) for Gram-positive bacterial sub-cellular localization was developed. PfamFeat is based on information provided by the Pfam database, which stores a series of HMM profiles describing common protein domains. The likelihood that a sequence was generated by a given HMM profile can be used to characterize sequences so that pattern recognition techniques can be applied. Success rates obtained with a simple one-nearest-neighbor classifier demonstrate that this method is competitive with popular sub-cellular prediction algorithms and constitutes a promising research direction.


Subject(s)
Gram-Positive Bacteria/metabolism , Subcellular Fractions/metabolism , Algorithms , Computational Biology
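
Abstract 3 describes each sequence by its likelihood under a set of HMM profiles and classifies it with a one-nearest-neighbor rule. The sketch below covers only the classification step, assuming the per-profile log-likelihoods have already been computed by some external tool. The feature values, class names, and the use of Euclidean distance are illustrative assumptions and are not taken from the paper.

```python
import math


def one_nearest_neighbor(training_set, query):
    """Return the label of the training vector closest to the query.

    training_set: list of (feature_vector, label) pairs, where each feature
    is a (hypothetical) log-likelihood of the sequence under one HMM profile.
    """
    _, label = min(training_set, key=lambda item: math.dist(item[0], query))
    return label


# Made-up log-likelihood vectors for two profiles; not real Pfam scores.
training_set = [
    ([-10.0, -50.0], "cell membrane"),
    ([-48.0, -12.0], "cytoplasm"),
]

predicted = one_nearest_neighbor(training_set, [-12.0, -45.0])
```

With real data the feature vector would have one entry per HMM profile, so the same function would apply unchanged to much longer vectors.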
4.
Article in English | MEDLINE | ID: mdl-22254467

ABSTRACT

Predicting the function of unknown proteins is one of the principal goals of computational biology. Knowing the subcellular localization of a protein furthers the understanding of its structure and molecular function. Numerous prediction techniques have been developed, usually focusing on global information about the protein. However, predictions can also be made by identifying functional sub-sequence patterns known as motifs. For the motif discovery problem, many methods require a predefined fixed window size and aligned sequences. To address these problems, we propose a method based on variable-length motif characterization and detection using the continuous wavelet transform (CWT) and a dissimilarity space representation. To analyze the motifs generated by our approach, we divide the entire dataset into training (60%) and validation (40%) sets. A support vector machine (SVM) classifier is used as the predictor on the validation set. The highest sensitivity and specificity (Sn = 82.58% and Sp = 92.86%), across 10-fold cross-validation, are obtained for endosome proteins. Average results of Sn = 74% and Sp = 75.58% are comparable to the current state of the art. For data sets with low sequence identity (< 40%), motif characterization and localization based on the CWT shows good performance and yields interpretable subsequences for each subcellular localization.


Subject(s)
Algorithms , Gene Expression Profiling/methods , Pattern Recognition, Automated/methods , Proteins/chemistry , Proteins/metabolism , Sequence Analysis, Protein/methods , Subcellular Fractions/metabolism , Amino Acid Sequence , Molecular Sequence Data , Software , Structure-Activity Relationship , Subcellular Fractions/chemistry , Support Vector Machine
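
Abstract 4 reports its results as sensitivity (Sn) and specificity (Sp). Under the standard definitions, Sn = TP / (TP + FN) and Sp = TN / (TN + FP). The sketch below computes both from binary labels. The function name and the toy labels are assumptions, and the numbers are unrelated to the paper's results.

```python
def sensitivity_specificity(y_true, y_pred):
    """Compute (Sn, Sp) for binary labels, where 1 is the positive class.

    Assumes y_true contains at least one positive and one negative example,
    otherwise one of the denominators would be zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Toy ground truth and predictions (hypothetical).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
sn, sp = sensitivity_specificity(y_true, y_pred)
```

In a multi-class localization problem these values are typically computed per class, treating one location as positive and all others as negative.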
5.
Article in English | MEDLINE | ID: mdl-21096466

ABSTRACT

An analysis of the predictability of subcellular locations is performed using simple pattern recognition techniques, in an attempt to capture the real dimensions of the problem at hand. Results show that some particular locations do not need high-complexity classification models to be predicted with high accuracy, and some partial biological explanations are formulated. All experiments were carried out on a set of Arabidopsis thaliana proteins, and classes were defined according to the plant GO slim.


Subject(s)
Arabidopsis Proteins/metabolism , Pattern Recognition, Automated/methods , Amino Acid Sequence , Arabidopsis Proteins/chemistry , Arabidopsis Proteins/classification , Databases, Protein , Protein Transport , Subcellular Fractions/metabolism
6.
Article in English | MEDLINE | ID: mdl-19162987

ABSTRACT

This paper presents a nonlinear approach to the analysis of time-frequency representations (TFRs), based on a statistical learning methodology, support vector regression (SVR), which, being a nonlinear framework, matches recent findings on the underlying dynamics of cardiac mechanical activity and phonocardiographic (PCG) recordings. The proposed methodology aims to model the estimated TFRs and to extract relevant features for classifying PCG recordings as normal or pathological (with murmur). The TFRs are modeled by means of SVR, and the distance between regressions is calculated through dissimilarity measures based on the dot product. Finally, a k-NN classifier is used for the classification stage, obtaining a validation performance of 97.85%.


Subject(s)
Heart Murmurs/diagnosis , Phonocardiography/statistics & numerical data , Adult , Artificial Intelligence , Biomedical Engineering , Case-Control Studies , Diagnosis, Computer-Assisted/statistics & numerical data , Fourier Analysis , Heart Murmurs/classification , Heart Murmurs/physiopathology , Humans , Nonlinear Dynamics , Regression Analysis , Signal Processing, Computer-Assisted
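
The final stage in abstract 6, a k-NN classifier driven by a dot-product-based dissimilarity, can be sketched as follows. The abstract does not specify the exact measure, so cosine dissimilarity (one minus the normalized dot product) is used here as one plausible choice. The vectors stand in for whatever representation the SVR models provide, and all names and values are hypothetical.

```python
import math
from collections import Counter


def cosine_dissimilarity(u, v):
    """One minus the normalized dot product; assumes non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def knn_predict(training_set, query, k=3):
    """Majority vote among the k training vectors least dissimilar to the query."""
    ranked = sorted(training_set, key=lambda item: cosine_dissimilarity(item[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy vectors standing in for regression-derived features (hypothetical).
training_set = [
    ([1.0, 0.1], "normal"),
    ([0.9, 0.2], "normal"),
    ([1.0, 0.0], "normal"),
    ([0.1, 1.0], "murmur"),
    ([0.2, 0.9], "murmur"),
    ([0.0, 1.0], "murmur"),
]

prediction = knn_predict(training_set, [0.8, 0.1], k=3)
```

An odd `k` avoids tied votes in the two-class case.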