1.
Article in English | MEDLINE | ID: mdl-27723603

ABSTRACT

Characterizing genes with semantic information is an important step in describing gene products. Although the complete genomes of many organisms have already been sequenced, the biological functions of many of their genes are still unknown. Since experimentally studying the functions of those genes one by one would be infeasible, new computational methods for gene function inference are needed. We present here a novel computational approach for inferring the biological function of a set of genes with previously unknown function, given a set of genes with well-known information. This approach is based on the premise that genes with similar behaviour should be grouped together, which is known as the guilt-by-association principle. Thus, it is possible to take advantage of clustering techniques to obtain groups of unknown genes that are co-clustered with genes that have well-known semantic information (GO annotations). These well-known genes can therefore provide meaningful knowledge for inferring the unknown semantic information. We provide a method to explore the potential function of new genes according to those currently annotated. The results obtained indicate that the proposed approach could be a useful and effective tool for biologists to guide the inference of biological functions for recently discovered genes. Our work sets an important landmark in the field of identifying unknown gene functions through clustering, using an external source of biological input. A simple web interface to this proposal can be found at http://fich.unl.edu.ar/sinc/webdemo/gamma-am/.
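
The guilt-by-association step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' method: it assumes cluster labels have already been computed by some clustering algorithm, and the function name `infer_annotations`, its parameters, and the frequency-based ranking are hypothetical choices made only for illustration.

```python
from collections import Counter, defaultdict


def infer_annotations(cluster_of, known_terms, top_n=3):
    """Suggest GO terms for unannotated genes from annotated co-clustered genes.

    cluster_of:  dict mapping gene id -> cluster label (assumed precomputed)
    known_terms: dict mapping annotated gene id -> iterable of GO term ids
    """
    # Count how often each GO term occurs among annotated genes of each cluster.
    term_counts = defaultdict(Counter)
    for gene, terms in known_terms.items():
        if gene in cluster_of:
            term_counts[cluster_of[gene]].update(set(terms))

    # Rank the terms of the surrounding cluster for every unannotated gene.
    suggestions = {}
    for gene, cluster in cluster_of.items():
        if gene in known_terms:
            continue
        ranked = term_counts[cluster].most_common(top_n)
        suggestions[gene] = [term for term, _ in ranked]
    return suggestions


# Toy data: g3 and g5 have no annotation of their own.
cluster_of = {"g1": 0, "g2": 0, "g3": 0, "g4": 1, "g5": 1}
known_terms = {
    "g1": ["GO:0006412", "GO:0008150"],
    "g2": ["GO:0006412"],
    "g4": ["GO:0006950"],
}
suggestions = infer_annotations(cluster_of, known_terms)
```

Ranking by raw frequency is the simplest possible choice; a real tool would more plausibly weight terms by their enrichment in the cluster or by semantic similarity between GO terms.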


Subject(s)
Computational Biology/methods , Gene Ontology , Genes/physiology , Machine Learning , Transcriptome/physiology , Arabidopsis/genetics , Arabidopsis/metabolism , Cluster Analysis , Databases, Genetic , Gene Expression Profiling/methods , Genes/genetics , Models, Genetic , Molecular Sequence Annotation/methods , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Transcriptome/genetics
2.
Talanta ; 152: 45-53, 2016 May 15.
Article in English | MEDLINE | ID: mdl-26992494

ABSTRACT

Volatile profiles of 63 black and 38 green teas from different countries were analysed with Proton Transfer Reaction-Time of Flight-Mass Spectrometry (PTR-ToF-MS), for both tea leaves and tea infusions. The headspace volatile fingerprints were collected, and the tea classes and geographical origins were tracked with pattern recognition techniques. The high mass resolution achieved by the ToF mass analyser allowed determination of sum formulas and tentative identification of the mass peaks. The results provided successful separation of the black and green teas based on their headspace volatile emissions, both from the dry tea leaves and from their infusions. The volatile fingerprints were then used to build different classification models for discrimination of black and green teas according to their geographical origins. Two different cross-validation methods were applied and their effectiveness for origin discrimination was discussed. The classification models showed a separation of black and green teas according to geographical origin, with the errors occurring mostly between neighbouring countries.


Subject(s)
Camellia sinensis/chemistry , Mass Spectrometry , Protons , Tea/chemistry , Volatile Organic Compounds/analysis , Volatile Organic Compounds/chemistry , Food Quality , Geography , Time Factors
3.
Article in English | MEDLINE | ID: mdl-23929864

ABSTRACT

Clustering validation indexes are intended to assess the quality of clustering results. Many methods used to estimate the number of clusters rely on a validation index as a key element to find the correct answer. This paper presents a new validation index based on graph concepts, which has been designed to find arbitrarily shaped clusters by exploiting the spatial layout of the patterns and their clustering labels. This new clustering index is combined with a solid statistical detection framework, the gap statistic. The resulting method is able to find the right number of arbitrarily shaped clusters in diverse situations, as we show with examples where this information is available. A comparison with several relevant validation methods is carried out using artificial and gene expression data sets. The results are very encouraging, showing that the underlying structure in the data can be more accurately detected with the new clustering index. Our gene expression data results also indicate that this new index is stable under perturbation of the input data.
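
The gap statistic mentioned in this abstract can be sketched in Python as follows. The sketch assumes the standard formulation, in which the gap for k clusters is the mean of log(W_k) over B reference data sets minus the observed log(W_k), and the chosen k is the smallest one whose gap is at least the next gap minus its error term. Here W_k stands for whichever dispersion or validation index is plugged in; the graph-based index of the paper is not reproduced, and the function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def gap_and_error(log_w_obs, log_w_refs):
    """Return (gap, s) for one value of k.

    log_w_obs:  log of the index computed on the real data
    log_w_refs: logs of the index computed on B reference data sets
    """
    b = len(log_w_refs)
    gap = mean(log_w_refs) - log_w_obs
    # Population standard deviation, inflated to account for simulation error.
    s = pstdev(log_w_refs) * math.sqrt(1 + 1 / b)
    return gap, s


def select_k(stats):
    """Pick the smallest k with gap(k) >= gap(k+1) - s(k+1).

    stats: dict mapping consecutive k values -> (gap, s)
    """
    ks = sorted(stats)
    for k, k_next in zip(ks, ks[1:]):
        gap_next, s_next = stats[k_next]
        if stats[k][0] >= gap_next - s_next:
            return k
    return ks[-1]  # no k satisfied the rule; fall back to the largest tested
```

Generating the reference data sets (for example by sampling uniformly over the range of the data) and clustering them is left out, since the abstract does not specify those details.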


Subject(s)
Cluster Analysis , Gene Expression Profiling/methods , Genomics/methods , Algorithms , Computer Simulation , Databases, Genetic , Humans , Neoplasms/genetics , Neoplasms/metabolism , Reproducibility of Results
4.
BMC Bioinformatics ; 12: 2, 2011 Jan 04.
Article in English | MEDLINE | ID: mdl-21205299

ABSTRACT

BACKGROUND: The search for cluster structure in microarray datasets is a basic problem for the so-called "-omic sciences". A difficult problem in clustering is how to handle data with a manifold structure, i.e. data that are not shaped as compact clouds of points but instead form arbitrary shapes or paths embedded in a high-dimensional space, as could be the case for some gene expression datasets. RESULTS: In this work we introduce the Penalized k-Nearest-Neighbor-Graph (PKNNG) based metric, a new tool for evaluating distances in such cases. The new metric can be used in combination with most clustering algorithms. The PKNNG metric is based on a two-step procedure: first it constructs the k-Nearest-Neighbor-Graph of the dataset of interest using a low k-value, and then it adds edges with a highly penalized weight to connect the subgraphs produced by the first step. We discuss several possible schemes for connecting the different subgraphs, as well as penalization functions. We show clustering results on several public gene expression datasets and simulated artificial problems to evaluate the behavior of the new metric. CONCLUSIONS: In all cases the PKNNG metric shows promising clustering results. The use of the PKNNG metric can improve the performance of commonly used pairwise-distance based clustering methods to the level of more advanced algorithms. A great advantage of the new procedure is that researchers do not need to learn a new method: they can simply compute distances with the PKNNG metric and then, for example, use hierarchical clustering to produce an accurate and highly interpretable dendrogram of their high-dimensional data.
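
The two-step procedure described in this abstract can be sketched in Python as below. This is an illustration under stated assumptions, not the published implementation: the connection scheme (joining every pair of subgraphs through their closest pair of points) and the default penalty `d * exp(d)` are hypothetical picks among the "several possible schemes" the abstract mentions, and the function name `pknng_distances` is invented here.

```python
import heapq
import math
from itertools import combinations


def pknng_distances(points, k=2, penalty=lambda d: d * math.exp(d)):
    """All-pairs graph distances on a penalized k-nearest-neighbour graph."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Step 1: symmetric k-NN graph whose edges carry Euclidean lengths.
    graph = [dict() for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        for j in others[:k]:
            graph[i][j] = graph[j][i] = dist[i][j]

    # Label the connected subgraphs with a depth-first search.
    component = [-1] * n
    n_components = 0
    for start in range(n):
        if component[start] != -1:
            continue
        component[start] = n_components
        stack = [start]
        while stack:
            u = stack.pop()
            for v in graph[u]:
                if component[v] == -1:
                    component[v] = n_components
                    stack.append(v)
        n_components += 1

    # Step 2 (assumed scheme): link each pair of subgraphs through their
    # closest pair of points, using a heavily penalized edge weight.
    for a, b in combinations(range(n_components), 2):
        i, j = min(
            ((i, j) for i in range(n) if component[i] == a
             for j in range(n) if component[j] == b),
            key=lambda ij: dist[ij[0]][ij[1]],
        )
        graph[i][j] = graph[j][i] = penalty(dist[i][j])

    # The metric is the shortest-path length on the final graph (Dijkstra).
    def shortest_from(source):
        best = [math.inf] * n
        best[source] = 0.0
        heap = [(0.0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > best[u]:
                continue
            for v, w in graph[u].items():
                if d + w < best[v]:
                    best[v] = d + w
                    heapq.heappush(heap, (d + w, v))
        return best

    return [shortest_from(i) for i in range(n)]
```

The resulting matrix can be passed to any clustering method that accepts precomputed pairwise distances, which is the usage the abstract suggests for hierarchical clustering.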


Subject(s)
Algorithms , Gene Expression Profiling/methods , Cluster Analysis
5.
IEEE Trans Neural Netw ; 22(1): 37-51, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21062680

ABSTRACT

Many learning problems, including some critical real-world applications, may vary slowly over time. When facing this problem, it is desirable that the learning method can find the correct input-output function and also detect the change in the concept and adapt to it. We introduce the time-adaptive support vector machine (TA-SVM), which is a new method for generating adaptive classifiers, capable of learning concepts that change with time. The basic idea of TA-SVM is to use a sequence of classifiers, each one appropriate for a small time window but, in contrast to other proposals, learning all the hyperplanes in a global way. We show that the addition of a new term in the cost function of the set of SVMs (that penalizes the diversity between consecutive classifiers) produces a coupling of the sequence that allows TA-SVM to learn as a single adaptive classifier. We evaluate different aspects of the method using appropriate drifting problems. In particular, we analyze the regularizing effect of changing the number of classifiers in the sequence or adapting the strength of the coupling. A comparison with other methods in several problems, including the well-known STAGGER dataset and the real-world electricity pricing domain, shows the good performance of TA-SVM in all tested situations.
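
The coupling idea in this abstract can be made concrete with a small Python sketch of a cost function for a sequence of linear classifiers. The exact TA-SVM formulation is not given in the abstract, so the form below (standard margin and hinge-loss terms plus a squared-difference penalty between consecutive hyperplanes) is an assumption, and the names `coupled_svm_cost`, `c`, and `gamma` are hypothetical.

```python
def coupled_svm_cost(weights, biases, windows, c=1.0, gamma=1.0):
    """Cost of a sequence of linear classifiers, one per time window.

    weights: list of weight vectors (lists of floats)
    biases:  list of bias terms
    windows: list of time windows, each a list of (x, y) with y in {-1, +1}
    c:       weight of the hinge loss (assumed)
    gamma:   strength of the coupling between consecutive classifiers (assumed)
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    # Usual SVM margin term, summed over all classifiers in the sequence.
    margin_term = sum(0.5 * dot(w, w) for w in weights)

    # Hinge loss of each classifier on the samples of its own time window.
    hinge_term = sum(
        max(0.0, 1.0 - y * (dot(w, x) + b))
        for w, b, window in zip(weights, biases, windows)
        for x, y in window
    )

    # Coupling term: penalizes diversity between consecutive classifiers.
    coupling_term = 0.0
    for (w1, b1), (w2, b2) in zip(zip(weights, biases),
                                  zip(weights[1:], biases[1:])):
        diff = [q - p for p, q in zip(w1, w2)]
        coupling_term += dot(diff, diff) + (b2 - b1) ** 2

    return margin_term + c * hinge_term + gamma * coupling_term
```

With a large `gamma` all classifiers are pushed toward a single hyperplane, while `gamma = 0` decouples the windows entirely, which mirrors the regularizing effect of the coupling strength discussed in the abstract.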


Subject(s)
Algorithms , Artificial Intelligence , Neural Networks, Computer , Computer Simulation/standards , Pattern Recognition, Automated , Problem Solving , Software/standards , Solutions
6.
J Mass Spectrom ; 45(9): 1065-74, 2010 Sep.
Article in English | MEDLINE | ID: mdl-20690164

ABSTRACT

Proton transfer reaction-mass spectrometry (PTR-MS), a direct injection mass spectrometric technique based on an efficient implementation of chemical ionisation, allows for fast and high-sensitivity monitoring of volatile organic compounds (VOCs). The first implementations of PTR-MS, based on quadrupole mass analysers (PTR-Quad-MS), provided only the nominal mass of the ions measured and thus little chemical information. To partially overcome these limitations and improve the analytical capability of this technique, the coupling of proton transfer reaction ionisation with a time-of-flight mass analyser has recently been realised and commercialised (PTR-TOF-MS). Here we discuss the very first application of this new instrument to agro-industrial problems, and dairy science in particular. As a case study, we show that rapid PTR-TOF-MS fingerprinting coupled with data-mining methods can quickly verify whether the storage conditions of the milk affect the final quality of the cheese, and we provide relevant examples of better compound identification in comparison with previous PTR-MS implementations. In particular, 'Trentingrana' cheeses produced by four different procedures for milk storage are compared for both winter and summer production. It is indeed possible to build classification models with low prediction errors and to identify the chemical formula of the ion peaks used for classification, providing evidence of the role that this novel spectrometric technique can play in fundamental and applied agro-industrial themes.

7.
Int J Neural Syst ; 13(2): 103-9, 2003 Apr.
Article in English | MEDLINE | ID: mdl-12923923

ABSTRACT

We refine and complement a previously-proposed artificial neural network method for learning hidden signals forcing nonstationary behavior in time series. The method adds an extra input unit to the network and feeds it with the proposed profile for the unknown perturbing signal. The correct time evolution of this new input parameter is learned simultaneously with the intrinsic stationary dynamics underlying the series, which is accomplished by minimizing a suitably-defined error function for the training process. We incorporate here the use of validation data, held out from the training set, to accurately determine the optimal value of a hyperparameter required by the method. Furthermore, we evaluate this algorithm in a controlled situation and show that it outperforms other existing methods in the literature. Finally, we discuss a preliminary application to the real-world sunspot time series and link the obtained hidden perturbing signal to the secular evolution of the solar magnetic field.


Subject(s)
Algorithms , Artificial Intelligence , Behavior , Learning , Neural Networks, Computer , Computer Simulation , Feedback , Humans , Nonlinear Dynamics , Reproducibility of Results , Signal Processing, Computer-Assisted , Solar Activity , Time Factors