Results 1 - 5 of 5
1.
Metabolites ; 14(3)2024 Feb 29.
Article in English | MEDLINE | ID: mdl-38535309

ABSTRACT

This paper aimed at devising an intelligence-based method to select compounds that can distinguish between open-angle glaucoma patients, type 2 diabetes patients, and healthy controls. Taking the concentration of 188 compounds measured in the aqueous humour (AH) of patients and controls, linear discriminant analysis (LDA) was used to identify the right combination of compounds that could lead to accurate diagnosis. All possibilities, using the leave-one-out approach, were considered through ad hoc programming and in silico massive data production and statistical analysis. Our proof of concept led to the selection of four molecules: acetyl-ornithine (Ac-Orn), C3 acyl-carnitine (C3), diacyl C42:6 phosphatidylcholine (PC aa C42:6), and C3-DC (C4-OH) acyl-carnitine (C3-DC (C4-OH)) that, taken in combination, would lead to a 95% discriminative success. 100% success was obtained with a non-linear combination of the concentration of three of these four compounds. By discarding younger controls to adjust by age, results were similar although one control was misclassified as a diabetes patient. Methods based on the consideration of individual clinical chemical parameters have limitations in the ability to make a reliable diagnosis, stratify patients, and assess disease progression. Leveraging human AH metabolomic data, we developed a procedure that selects a minimal number of metabolites (3-5) and designs algorithms that maximize the overall accuracy evaluating both positive predictive (PPV) and negative predictive (NPV) values. Our approach of simultaneously considering the levels of a few metabolites can be extended to any other body fluid and has potential to advance precision medicine. Artificial intelligence is expected to use algorithms that use the concentration of three to five molecules to correctly diagnose diseases, also allowing stratification of patients and evaluation of disease progression. 
In addition, this significant advance shifts focus from a single-molecule biomarker approach to that of an appropriate combination of metabolites.
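The selection procedure described in this abstract (try every small combination of compounds and score each by leave-one-out classification) can be sketched as follows. This is only a minimal illustration, not the authors' code: a nearest-centroid classifier stands in for linear discriminant analysis to keep the example dependency-free, and the data, class labels, and function names are all hypothetical.

```python
from itertools import combinations

def nearest_centroid_predict(train_X, train_y, x):
    """Assign x to the class whose mean vector is closest (squared Euclidean).
    Stand-in for LDA; this is an assumption made for brevity."""
    sums, counts = {}, {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        dist = sum((acc[j] / counts[label] - x[j]) ** 2 for j in range(len(x)))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loo_accuracy(X, y, cols):
    """Leave-one-out accuracy using only the feature columns in cols."""
    hits = 0
    for i in range(len(X)):
        train_X = [[row[c] for c in cols] for k, row in enumerate(X) if k != i]
        train_y = [lab for k, lab in enumerate(y) if k != i]
        held_out = [X[i][c] for c in cols]
        hits += nearest_centroid_predict(train_X, train_y, held_out) == y[i]
    return hits / len(X)

def best_subset(X, y, size):
    """Exhaustively score every feature subset of the given size."""
    subsets = combinations(range(len(X[0])), size)
    return max(subsets, key=lambda cols: loo_accuracy(X, y, cols))

# Hypothetical toy data: 4 "compound concentrations" per sample.
# Columns 0 and 2 carry the class signal; columns 1 and 3 are noise.
X = [
    [1.0, 0.3, 1.1, 0.7], [1.2, 0.9, 0.9, 0.2], [0.8, 0.5, 1.0, 0.5],  # control
    [5.0, 0.4, 1.2, 0.6], [5.1, 0.8, 0.8, 0.3], [4.9, 0.6, 1.0, 0.9],  # glaucoma
    [5.0, 0.5, 5.0, 0.4], [5.2, 0.2, 5.1, 0.8], [4.8, 0.7, 4.9, 0.5],  # diabetes
]
y = ["control"] * 3 + ["glaucoma"] * 3 + ["diabetes"] * 3

best = best_subset(X, y, size=2)
print(best, loo_accuracy(X, y, best))
```

With 188 measured compounds the number of subsets grows quickly (for example, choosing 4 of 188 gives about 5 x 10^7 combinations), which is presumably why the abstract speaks of massive in silico data production.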

2.
BMC Bioinformatics ; 19(1): 432, 2018 Nov 19.
Article in English | MEDLINE | ID: mdl-30453885

ABSTRACT

BACKGROUND: Support vector machines (SVM) are a powerful tool to analyze data with a number of predictors approximately equal to or larger than the number of observations. However, originally, application of SVM to analyze biomedical data was limited because SVM was not designed to evaluate the importance of predictor variables. Creating predictor models based on only the most relevant variables is essential in biomedical research. Currently, substantial work has been done to allow assessment of variable importance in SVM models, but this work has focused on SVM implemented with linear kernels. The power of SVM as a prediction model is associated with the flexibility generated by use of non-linear kernels. Moreover, SVM has been extended to model survival outcomes. This paper extends the Recursive Feature Elimination (RFE) algorithm by proposing three approaches to rank variables based on non-linear SVM and SVM for survival analysis. RESULTS: The proposed algorithms allow visualization of each of the RFE iterations and, hence, identification of the most relevant predictors of the response variable. Using simulation studies based on time-to-event outcomes and three real datasets, we evaluate the three methods, based on pseudo-samples and kernel principal component analysis, and compare them with the original SVM-RFE algorithm for non-linear kernels. The three algorithms we proposed performed generally better than the gold standard RFE for non-linear kernels, when comparing the truly most relevant variables with the variable ranks produced by each algorithm in simulation studies. Generally, the RFE-pseudo-samples outperformed the other three methods, even when variables were assumed to be correlated in all tested scenarios. CONCLUSIONS: The proposed approaches can be implemented with accuracy to select variables and assess direction and strength of associations in analysis of biomedical data using SVM for categorical or time-to-event responses.
Variable selection and interpretation of the direction and strength of associations between predictors and outcomes can be carried out accurately with the proposed approaches, particularly the RFE-pseudo-samples approach, when analyzing biomedical data. These approaches perform better than the classical RFE of Guyon in realistic scenarios for the structure of biomedical data.


Subject(s)
Algorithms , Biomarkers, Tumor/genetics , Computer Graphics , Liver Cirrhosis, Biliary/mortality , Lung Neoplasms/mortality , Lymphoma, Large B-Cell, Diffuse/mortality , Support Vector Machine , Humans , Liver Cirrhosis, Biliary/genetics , Lung Neoplasms/genetics , Lymphoma, Large B-Cell, Diffuse/genetics , Survival Rate
3.
BMC Bioinformatics ; 17 Suppl 5: 205, 2016 Jun 06.
Article in English | MEDLINE | ID: mdl-27294256

ABSTRACT

BACKGROUND: Pathway expression is multivariate in nature. Thus, from a statistical perspective, to detect differentially expressed pathways between two conditions, methods for inferring differences between mean vectors need to be applied. Maximum mean discrepancy (MMD) is a statistical test to determine whether two samples are from the same distribution, its implementation being greatly simplified using the kernel method. RESULTS: An MMD-based test successfully detected the differential expression between two conditions, specifically the expression of a set of genes involved in certain fatty acid metabolic pathways. Furthermore, we exploited the ability of the kernel method to integrate data and successfully added hepatic fatty acid levels to the test procedure. CONCLUSION: MMD is a non-parametric test that acquires several advantages when combined with the kernelization of data: 1) the number of variables can be greater than the sample size; 2) omics data can be integrated; 3) it can be applied not only to vectors, but to strings, sequences and other common structured data types arising in molecular biology.
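A kernel-based MMD test of the kind described in this abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a Gaussian (RBF) kernel, the biased estimate of squared MMD, and a permutation test for the p-value. The data are simulated, and the function names, the kernel width `gamma`, and the number of permutations are hypothetical choices.

```python
import math
import random

def rbf(x, y, gamma):
    """Gaussian kernel between two expression vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def mmd2(X, Y, gamma=0.5):
    """Biased estimate of squared MMD:
    mean k(X, X) + mean k(Y, Y) - 2 * mean k(X, Y)."""
    kxx = sum(rbf(a, b, gamma) for a in X for b in X) / (len(X) * len(X))
    kyy = sum(rbf(a, b, gamma) for a in Y for b in Y) / (len(Y) * len(Y))
    kxy = sum(rbf(a, b, gamma) for a in X for b in Y) / (len(X) * len(Y))
    return kxx + kyy - 2 * kxy

def permutation_pvalue(X, Y, n_perm=200, gamma=0.5, seed=0):
    """Shuffle condition labels to approximate the null distribution of MMD^2."""
    rng = random.Random(seed)
    observed = mmd2(X, Y, gamma)
    pooled = list(X) + list(Y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mmd2(pooled[:len(X)], pooled[len(X):], gamma) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

# Simulated example: 20 samples per condition, 3 genes of one pathway.
# The second condition has its mean expression shifted by 2 units per gene.
data_rng = random.Random(1)
X = [[data_rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(20)]
Y = [[data_rng.gauss(2.0, 1.0) for _ in range(3)] for _ in range(20)]

stat = mmd2(X, Y)
p = permutation_pvalue(X, Y)
print(stat, p)
```

Because the statistic depends on the data only through kernel evaluations, the same code applies when the number of genes exceeds the number of samples, and other data types can be handled by swapping in a kernel defined on them.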


Subject(s)
Algorithms , Computational Biology/methods , Gene Expression , Animals , Diet , Fatty Acids/metabolism , Genomics , Liver/metabolism , Metabolomics , Mice , Mice, Knockout , Plant Oils/chemistry , Plant Oils/metabolism , Sunflower Oil
4.
BMC Syst Biol ; 8 Suppl 2: S6, 2014.
Article in English | MEDLINE | ID: mdl-25032747

ABSTRACT

BACKGROUND: Nowadays, combining the different sources of information to improve the biological knowledge available is a challenge in bioinformatics. Kernel-based methods are among the most powerful methods for integrating heterogeneous data types. Kernel-based data integration approaches consist of two basic steps: firstly the right kernel is chosen for each data set; secondly the kernels from the different data sources are combined to give a complete representation of the available data for a given statistical task. RESULTS: We analyze the integration of data from several sources of information using kernel PCA, from the point of view of reducing dimensionality. Moreover, we improve the interpretability of kernel PCA by adding to the plot the representation of the input variables that belong to any dataset. In particular, for each input variable or linear combination of input variables, we can represent the direction of maximum growth locally, which allows us to identify those samples with higher/lower values of the variables analyzed. CONCLUSIONS: The integration of different datasets and the simultaneous representation of samples and variables together give us a better understanding of biological knowledge.
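The two integration steps named in this abstract (one kernel per data set, then a combination of kernels) followed by kernel PCA can be sketched as below. This is a minimal illustration, not the authors' method: the kernels are combined by an unweighted sum, only the first component is extracted (by power iteration), and the representation of input variables on the plot is not covered. The two toy data sets, the kernel choices, and all function names are hypothetical.

```python
import math

def linear_kernel(X):
    return [[sum(a * b for a, b in zip(x, y)) for y in X] for x in X]

def rbf_kernel(X, gamma=0.5):
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
             for y in X] for x in X]

def add_kernels(K1, K2):
    """Sum of two kernel matrices over the same samples (still a valid kernel)."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(K1, K2)]

def center(K):
    """Center a symmetric kernel matrix in feature space."""
    n = len(K)
    row_mean = [sum(row) / n for row in K]
    total = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - row_mean[j] + total for j in range(n)]
            for i in range(n)]

def first_component_scores(Kc, n_iter=200):
    """Top eigenpair of the centered kernel by power iteration.
    Sample scores on the first kernel principal component are sqrt(eigval) * v."""
    n = len(Kc)
    v = [float(i + 1) for i in range(n)]  # non-constant start vector
    for _ in range(n_iter):
        w = [sum(Kc[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigval = sum(v[i] * sum(Kc[i][j] * v[j] for j in range(n)) for i in range(n))
    return [math.sqrt(eigval) * x for x in v], eigval

# Two hypothetical data sources measured on the same 6 samples
# (first 3 samples from one condition, last 3 from another).
expression = [[1.0, 0.2], [1.1, 0.1], [0.9, 0.3],
              [3.0, 0.2], [3.1, 0.3], [2.9, 0.1]]
fatty_acids = [[0.1, 0.0], [0.0, 0.2], [0.2, 0.1],
               [2.0, 2.1], [2.1, 1.9], [1.9, 2.0]]

K = add_kernels(linear_kernel(expression), rbf_kernel(fatty_acids))
Kc = center(K)
scores, eigval = first_component_scores(Kc)
print(scores)
```

In practice the kernels are often rescaled or weighted before being combined so that no single data source dominates; the unweighted sum here is only the simplest choice.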


Subject(s)
Computational Biology/methods , Nutrigenomics , Principal Component Analysis , Statistics as Topic
5.
Science ; 313(5794): 1773-5, 2006 Sep 22.
Article in English | MEDLINE | ID: mdl-16946033

ABSTRACT

Comparisons of recent with historical samples of chromosome inversion frequencies provide opportunities to determine whether genetic change is tracking climate change in natural populations. We determined the magnitude and direction of shifts over time (24 years between samples on average) in chromosome inversion frequencies and in ambient temperature for populations of the fly Drosophila subobscura on three continents. In 22 of 26 populations, climates warmed over the intervals, and genotypes characteristic of low latitudes (warm climates) increased in frequency in 21 of those 22 populations. Thus, genetic change in this fly is tracking climate warming and is doing so globally.


Subject(s)
Chromosome Inversion , Climate , Drosophila/genetics , Animals , Europe , Female , Genome, Insect , Geography , Greenhouse Effect , Male , South America , Temperature , Time Factors , United States