
Sequence information gain based motif analysis.

Maynou, Joan; Pairó, Erola; Marco, Santiago; Perera, Alexandre.

BMC Bioinformatics ; 16: 377, 2015 Nov 09.

Article in English | MEDLINE | ID: mdl-26553056

ABSTRACT

BACKGROUND: The detection of regulatory regions in candidate sequences is essential for understanding the regulation of a particular gene and the mechanisms involved. This paper proposes a novel methodology based on information-theoretic metrics for finding regulatory sequences in promoter regions. RESULTS: This methodology (SIGMA) has been tested on genomic sequence data for Homo sapiens and Mus musculus. SIGMA has been compared with different publicly available alternatives for motif detection, such as MEME/MAST, Biostrings (Bioconductor package), MotifRegressor, and previous work such as Q-residuals projections or information-theoretic detectors. Comparative results, in the form of Receiver Operating Characteristic curves, show that, in 70% of the studied Transcription Factor Binding Sites, the SIGMA detector performs better and behaves more robustly than the methods compared, while having a similar computational time. The performance of SIGMA can be explained by its parametric simplicity in modelling the non-linear co-variability in the binding motif positions. CONCLUSIONS: Sequence Information Gain based Motif Analysis is a generalisation of a non-linear model for the detection of cis-regulatory sequences based on Information Theory. This generalisation allows us to detect transcription factor binding sites with maximum performance regardless of the covariability observed in the positions of the training set of sequences. SIGMA is freely available to the public at http://b2slab.upc.edu.
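The abstract refers to information-theoretic metrics and to co-variability between motif positions without giving formulas. As a rough illustration only, and not the SIGMA algorithm itself, the sketch below computes two standard quantities on a set of aligned sites: the Shannon information content of one position and the mutual information between two positions. The function names and the toy `sites` list are hypothetical.

```python
import math
from collections import Counter


def column(seqs, i):
    """Return the symbols found at position i across all aligned sequences."""
    return [s[i] for s in seqs]


def entropy(symbols):
    """Shannon entropy (in bits) of the empirical distribution of symbols."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_content(seqs, i, alphabet_size=4):
    """Information content of position i: maximum entropy minus observed entropy."""
    return math.log2(alphabet_size) - entropy(column(seqs, i))


def mutual_information(seqs, i, j):
    """Mutual information (in bits) between positions i and j."""
    a, b = column(seqs, i), column(seqs, j)
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))


# Hypothetical aligned binding sites: positions 0 and 3 vary together.
sites = ["ACGT", "ACGT", "TCGA", "TCGA"]

for pos in range(4):
    print(pos, information_content(sites, pos))
print("MI(0,3) =", mutual_information(sites, 0, 3))
```

A fully conserved position carries the maximum information content, while a non-zero mutual information between two positions indicates the kind of interdependence that a position-independent model cannot capture.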

Subject(s)

Algorithms , Genome , Genomics/methods , Nucleotide Motifs/genetics , Software , Transcription Factors/metabolism , Animals , Binding Sites/genetics , Humans , Mice , Nonlinear Dynamics , Protein Binding/genetics , ROC Curve

A subspace method for the detection of transcription factor binding sites.

Pairó, Erola; Maynou, Joan; Marco, Santiago; Perera, Alexandre.

Bioinformatics ; 28(10): 1328-35, 2012 May 15.

Article in English | MEDLINE | ID: mdl-22467907

ABSTRACT

MOTIVATION: The identification of the sites at which transcription factors (TFs) bind to deoxyribonucleic acid (DNA) is an important problem in molecular biology. Many computational methods have been developed for motif finding, most of them based on position-specific scoring matrices (PSSMs), which assume that positions within a binding site are independent. However, some experimental and computational studies demonstrate that interdependences between positions exist. RESULTS: In this article, we introduce a novel motif finding method which constructs a subspace based on the covariance of numerical DNA sequences. When a candidate sequence is projected into the modeled subspace, a threshold in the Q-residuals confidence allows us to predict whether this sequence is a binding site. Using the TRANSFAC and JASPAR databases, we compared our Q-residuals detector with existing PSSM methods. In most of the studied TF binding sites, the Q-residuals detector performs significantly better and faster than MATCH and MAST. Compared with Motifscan, a method which takes interdependences into account, the Q-residuals detector performs better when the number of available sequences is small.
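For context, the PSSM baseline that the abstract contrasts with can be sketched in a few lines. This is a generic log-odds PSSM under the position-independence assumption, not the Q-residuals detector and not the scoring used by MATCH or MAST; the pseudocount, the uniform background, and all names below are assumptions made for illustration.

```python
import math

ALPHABET = "ACGT"


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PSSM from aligned sites of equal length.

    Each column is scored independently of the others, which is exactly
    the independence assumption the abstract questions.
    """
    length = len(sites[0])
    pssm = []
    for i in range(length):
        col = [s[i] for s in sites]
        total = len(col) + pseudocount * len(ALPHABET)
        pssm.append({
            base: math.log2(((col.count(base) + pseudocount) / total) / background)
            for base in ALPHABET
        })
    return pssm


def score(pssm, candidate):
    """Sum the per-position log-odds scores of a candidate site."""
    return sum(col[base] for col, base in zip(pssm, candidate))


def best_hit(pssm, sequence):
    """Slide a window over the sequence; return (best score, offset)."""
    width = len(pssm)
    windows = [
        (score(pssm, sequence[k:k + width]), k)
        for k in range(len(sequence) - width + 1)
    ]
    return max(windows)


# Hypothetical training sites and a hypothetical promoter fragment.
training_sites = ["TATA", "TATA", "TATT", "TACA"]
pssm = build_pssm(training_sites)
print(best_hit(pssm, "GGCTATAGC"))
```

Because the total score is a plain sum over columns, no combination of bases at two positions can score differently from what each position contributes alone, which is the limitation that subspace or covariance-based detectors aim to address.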

Subject(s)

Algorithms , Nucleotide Motifs , Position-Specific Scoring Matrices , Transcription Factors/metabolism , Animals , Binding Sites , Humans , Protein Binding , Sequence Analysis, DNA/methods , Transcription Factors/chemistry , Transcription Factors/genetics

Classification of Sherry vinegars by combining multidimensional fluorescence, parafac and different classification approaches.

Callejón, Raquel M; Amigo, José Manuel; Pairo, Erola; Garmón, Sergio; Ocaña, Juan Antonio; Morales, Maria Lourdes.

Talanta ; 88: 456-62, 2012 Jan 15.

Article in English | MEDLINE | ID: mdl-22265526

ABSTRACT

Sherry vinegar is a much appreciated product from Jerez-Xérès-Sherry, Manzanilla de Sanlúcar and Vinagre de Jerez Protected Designation in southwestern Spain. Its complexity and extraordinary organoleptic properties are acquired thanks to its method of production, the so-called "criaderas y solera" ageing system. Three qualities of Sherry vinegar are distinguished according to ageing time in oak barrels: "Vinagre de Jerez" (minimum of 6 months), "Reserva" (at least 2 years) and "Gran Reserva" (at least 10 years). In the last few years, there has been an increasing need to develop rapid, inexpensive and effective analytical methods that require little sample manipulation for the analysis and characterization of Sherry vinegar. Fluorescence spectroscopy is emerging as a competitive technique for this purpose, since it provides in a few seconds an excitation-emission landscape that may be used as a fingerprint of the vinegar. Multi-way analysis, specifically Parallel Factor Analysis (PARAFAC), is a powerful tool for the simultaneous determination of fluorescent components, because it extracts the most relevant information from the data and allows robust models to be built. Moreover, the information obtained by PARAFAC can be used to build robust and reliable classification and discrimination models (e.g. by using Support Vector Machines and Partial Least Squares-Discriminant Analysis models). In this context, the aim of this work was to study the possibilities of multi-way fluorescence linked to PARAFAC and to classify the different Sherry vinegars according to their ageing. The results demonstrated that the proposed analytical and chemometric tools are a perfect combination for extracting relevant chemical information about the vinegars as well as for classifying and discriminating them according to their ageing.

Subject(s)

Acetic Acid/analysis , Food Technology , Acetic Acid/classification , Discriminant Analysis , Fluorescence , Multivariate Analysis , Regression Analysis , Spectrometry, Fluorescence/methods , Wine/analysis

MEET: motif elements estimation toolkit.

Pairó, Erola; Maynou, Joan; Vallverdú, Montserrat; Caminal, Pere; Marco, Santiago; Perera, Alexandre.

Annu Int Conf IEEE Eng Med Biol Soc ; 2011: 6483-6, 2011.

Article in English | MEDLINE | ID: mdl-22255823

ABSTRACT

MEET is an R package that integrates a set of algorithms for the detection of transcription factor binding sites (TFBS). The MEET R package includes five motif searching algorithms: MEME/MAST (Multiple Expectation-Maximization for Motif Elicitation), Q-residuals, MDscan (Motif Discovery scan), ITEME (Information Theory Elements for Motif Estimation) and MATCH. In addition, MEET allows the user to work with different alignment algorithms: MUSCLE (Multiple Sequence Comparison by Log-Expectation), ClustalW and MEME. The package can work in two modes, training and detection. The training mode allows the user to choose the best parameters of a detector. Once the parameters are chosen, the detection mode allows the user to analyze a genome for binding sites. Both modes can combine the different alignment and detection methods, offering multiple possibilities. Combining the alignment and detection algorithms makes it possible to compare detection models at the same level, without having to account for the differences produced during the alignment process. The MEET R package can be downloaded from http://sisbio.recerca.upc.edu/R/MEET_1.0.tar.gz.

Subject(s)

Computational Biology/methods , Algorithms , Amino Acid Motifs , Area Under Curve , Binding Sites , Genes, Fungal , Genome , Probability , Programming Languages , Promoter Regions, Genetic , ROC Curve , Saccharomyces cerevisiae/genetics , Sequence Alignment , Sequence Analysis, DNA , Sequence Analysis, Protein/methods , Software
