Results 1 - 3 of 3
1.
Comput Biol Med ; 59: 64-72, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25679476

ABSTRACT

In this paper, a new method for remote protein homology detection is presented. Most discriminative methods concatenate the values extracted from physicochemical properties to build a model that separates homologous and non-homologous examples. Each discriminative method uses a specific strategy to represent the information extracted from the protein sequence and a different number of indices. Once the vector representation is obtained, support vector machines (SVMs) are usually used for classification. Most classification techniques are not suitable for remote homology detection because they do not cope well with high-dimensional datasets. We propose a method that reduces the high dimensionality of the vector representation using models that are defined at the 3D level. The models are then mapped from the protein primary sequence. The new method, called remote-C3D, is tested on the SCOP 1.53 and SCOP 1.55 datasets. It achieves higher accuracy than the composition-based methods and performance comparable to that of profile-based methods.


Subject(s)
Computational Biology/methods , Proteins/chemistry , Proteins/classification , Sequence Analysis, Protein/methods , Sequence Homology, Amino Acid , Databases, Protein , Models, Molecular , Support Vector Machine
2.
Comput Biol Med ; 45: 43-50, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24480162

ABSTRACT

A new method for remote protein homology detection, called support vector machine incorporating the context of physicochemical properties (SVM-CP), is presented. Recent discriminative methods are based on concatenating information extracted from each protein by considering several physicochemical properties. We show that some physicochemical properties reflect the functional or structural characteristics of each specific protein family, whereas others degrade the accuracy of the classification techniques. The research highlights the importance of the selection of physicochemical properties in remote homology detection. Most methods slide a window over every protein sequence to extract physicochemical information. This extraction is usually performed by giving the same importance to every value in the window, i.e., by averaging the physicochemical values in the observation window. SVM-CP instead assigns every residue in a sliding window a different weight, which reflects its importance or contribution to the representative value of the window. The SVM-CP method reaches a receiver operating characteristic (ROC) score of 0.93462, which is the highest value for a remote homology detection method based on sequence composition information.
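The contrast between uniform averaging and per-residue weighting in a sliding window can be sketched as follows. This is an illustrative sketch, not the SVM-CP implementation: the abstract does not say how the weights are derived, so the function name `window_values` and the triangular weights `[1, 2, 1]` are assumptions made for the example. The property used is the Kyte-Doolittle hydropathy scale, chosen only as a familiar physicochemical index.

```python
from typing import Dict, List, Sequence

# Kyte-Doolittle hydropathy index, used here only as an example property.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_values(
    seq: str,
    scale: Dict[str, float],
    weights: Sequence[float],
) -> List[float]:
    """Return one weighted mean of the property per window position.

    `weights` has one entry per window position (hypothetical choice;
    the abstract does not specify how weights are obtained).
    Equal weights reproduce the plain average over the window.
    """
    width = len(weights)
    total = float(sum(weights))
    if width == 0 or total == 0.0:
        raise ValueError("weights must be non-empty and sum to a non-zero value")
    values = [scale[residue] for residue in seq]
    return [
        sum(w * v for w, v in zip(weights, values[i:i + width])) / total
        for i in range(len(values) - width + 1)
    ]


if __name__ == "__main__":
    seq = "AILKG"
    print(window_values(seq, KYTE_DOOLITTLE, [1, 1, 1]))  # uniform average
    print(window_values(seq, KYTE_DOOLITTLE, [1, 2, 1]))  # centre residue counts double
```

With equal weights the function reduces to the averaging scheme the abstract attributes to most methods; any other weight vector makes some window positions contribute more to the representative value.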


Subject(s)
Computational Biology/methods , Proteins/chemistry , Sequence Analysis, Protein/methods , Sequence Homology, Amino Acid , Amino Acid Sequence , Chemical Phenomena , Molecular Sequence Data , ROC Curve , Support Vector Machine
3.
BMC Genomics ; 12: 506, 2011 Oct 14.
Article in English | MEDLINE | ID: mdl-21999602

ABSTRACT

BACKGROUND: Several studies have shown that genomes can be studied via a multifractal formalism. Recently, we used a multifractal approach to study the genetic information content of the Caenorhabditis elegans genome. Here we investigate the possibility that the human genome shows a behavior similar to that observed in the nematode. RESULTS: We report here multifractality in the human genome sequence. This behavior correlates strongly with the presence of Alu elements and, to a lesser extent, with CpG islands and (G+C) content. In contrast, little or no relationship was found for LINE, MIR, MER, and LTR elements or for DNA regions poor in genetic information. Gene function, clusters of orthologous genes, metabolic pathways, and exons tended to increase their frequencies with ranges of multifractality, and large gene families were located in genomic regions with varied multifractality. Additionally, a multifractal map and classification for human chromosomes are proposed. CONCLUSIONS: Based on these findings, we propose a descriptive non-linear model for the structure of the human genome, with some biological implications. This model reveals 1) a multifractal regionalization in which many regions that are far from equilibrium coexist, and 2) a non-linear organization with significant molecular and medical genetic implications for understanding the role of Alu elements in genome stability and in the structure of the human genome. Given the role of Alu sequences in gene regulation, genetic diseases, human genetic diversity, adaptation, and phylogenetic analyses, these quantifications are especially useful.
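For readers unfamiliar with the multifractal formalism, one common way to quantify it is through the generalized (Rényi) dimensions D_q of a measure built from the sequence. The sketch below is an assumption-laden illustration, not the procedure of the paper: the abstract does not state how the measure was constructed, so the chaos game representation, the corner assignment in `CORNERS`, and the function names are choices made for this example. It also gives only a single-scale estimate; practical analyses usually fit the scaling over several box sizes.

```python
import math
from collections import Counter
from typing import List, Tuple

# Hypothetical corner assignment for the chaos game representation (CGR).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq: str) -> List[Tuple[float, float]]:
    """Map a DNA sequence to points in the unit square (chaos game)."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0  # move halfway to the corner
        points.append((x, y))
    return points


def box_probabilities(points: List[Tuple[float, float]], k: int) -> List[float]:
    """Fraction of points in each occupied box of a 2^k x 2^k grid."""
    if not points:
        raise ValueError("no points to analyse")
    n = 2 ** k
    counts = Counter(
        (min(int(x * n), n - 1), min(int(y * n), n - 1)) for x, y in points
    )
    total = len(points)
    return [c / total for c in counts.values()]


def generalized_dimension(points: List[Tuple[float, float]], q: float, k: int) -> float:
    """Single-scale estimate of D_q at box size eps = 2^-k."""
    log_eps = math.log(2.0 ** -k)
    probs = box_probabilities(points, k)
    if q == 1:
        # Information dimension: limit of the general formula as q -> 1.
        return sum(p * math.log(p) for p in probs) / log_eps
    return math.log(sum(p ** q for p in probs)) / ((q - 1) * log_eps)


if __name__ == "__main__":
    pts = cgr_points("ACGTTGCAAGCTTAGC")
    for q in (0, 1, 2):
        print(q, generalized_dimension(pts, q, k=2))
```

A measure whose D_q stays constant as q varies behaves like a monofractal; a D_q that decreases with q is the usual signature of multifractality.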


Subject(s)
Fractals , Genome, Human , Alu Elements , Base Composition , Chromosome Mapping , Chromosomes, Human/genetics , CpG Islands , Databases, Genetic , Discriminant Analysis , Humans , Models, Genetic , Multigene Family , Sequence Analysis, DNA