1.
Neuroimage ; 195: 215-231, 2019 07 15.
Article in English | MEDLINE | ID: mdl-30894334

ABSTRACT

Combining neuroimaging and clinical information for diagnosis, for example behavioral tasks and genetic characteristics, is potentially beneficial but presents challenges in finding the best data representation for the different sources of information. Their simple combination usually provides no improvement over using the best source alone. In this paper, we proposed a framework based on a recent multiple kernel learning algorithm called EasyMKL and investigated the benefits of this approach for diagnosing two different mental health diseases: the well-known Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, tackling the classification of Alzheimer's Disease (AD) patients versus healthy controls, and a second dataset tackling the classification of a heterogeneous group of depressed patients versus healthy controls. We used EasyMKL to combine a large number of basic kernels alongside a feature selection methodology, pursuing an optimal and sparse solution to facilitate interpretability. Our results show that the proposed approach, called EasyMKLFS, outperforms baselines (e.g. SVM and SimpleMKL), state-of-the-art random forests (RF) and feature selection (FS) methods.
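The core idea above can be illustrated with a minimal sketch: several base kernels computed on the same data are combined as a weighted sum and fed to a standard kernel classifier. The data, the RBF widths, and the fixed weights below are all illustrative assumptions; EasyMKL instead learns the weights by optimizing the margin distribution, and EasyMKLFS additionally performs feature selection.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data: two well-separated Gaussian blobs standing in for the two classes
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 4)), rng.normal(2, 1, (20, 4))])
y = np.array([0] * 20 + [1] * 20)

def rbf_kernel(X, gamma):
    # Gram matrix of the RBF kernel: exp(-gamma * ||x_i - x_j||^2)
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * d2)

# A small dictionary of base kernels (the paper combines a huge number of them)
base_kernels = [rbf_kernel(X, g) for g in (0.01, 0.1, 1.0)]

# Hypothetical hand-fixed convex weights; EasyMKL would learn these
weights = np.array([0.2, 0.5, 0.3])
K = sum(w * Kb for w, Kb in zip(weights, base_kernels))

# Train an SVM directly on the combined (precomputed) kernel
clf = SVC(kernel="precomputed").fit(K, y)
acc = clf.score(K, y)
```

The weighted sum of positive semidefinite kernels is itself a valid kernel, which is what makes this simple combination scheme well-defined.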


Subject(s)
Algorithms, Alzheimer Disease/diagnosis, Depression/diagnosis, Machine Learning, Neuroimaging/methods, Humans, Image Interpretation, Computer-Assisted/methods
2.
IEEE Trans Neural Netw Learn Syst ; 29(10): 4660-4671, 2018 10.
Article in English | MEDLINE | ID: mdl-29990207

ABSTRACT

When dealing with kernel methods, one has to decide which kernel and which hyperparameter values to use. Resampling techniques can address this issue, but these procedures are time-consuming. The problem is particularly challenging for structured data, in particular graphs, since several kernels for graph data have been proposed in the literature, but no clear relationship among them in terms of learning properties has been defined. In these cases, exhaustive search seems to be the only reasonable approach. Recently, the global Rademacher complexity (RC) and local Rademacher complexity (LRC), two powerful measures of the complexity of a hypothesis space, have been shown to be suited for studying kernel properties. In particular, the LRC can bound the generalization error of a hypothesis chosen in a space by disregarding those hypotheses that no learning procedure would select because of their high error. In this paper, we present a new approach to efficiently bound the RC of the space induced by a kernel, since its exact computation is an NP-hard problem. We then show for the first time that RC can be used to estimate the accuracy and expressivity of different graph kernels under different parameter configurations. Our claims are supported by experimental results on several real-world graph data sets.
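For the unit ball of the RKHS induced by a kernel with Gram matrix K, the empirical Rademacher complexity has the closed form E_sigma[(1/n) sqrt(sigma' K sigma)] and is classically bounded by sqrt(trace(K))/n. A minimal Monte Carlo sketch of this quantity (on an illustrative linear kernel, not the graph kernels or the efficient bound of the paper) looks as follows:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = rng.normal(size=(n, 3))

# Illustrative linear kernel; the paper studies graph kernels instead
K = X @ X.T

def rademacher_complexity(K, n_samples=2000, rng=rng):
    # Monte Carlo estimate of E_sigma[(1/n) * sqrt(sigma' K sigma)],
    # the empirical RC of the unit ball of the RKHS induced by K
    n = K.shape[0]
    total = 0.0
    for _ in range(n_samples):
        sigma = rng.choice([-1.0, 1.0], size=n)
        total += np.sqrt(max(sigma @ K @ sigma, 0.0)) / n
    return total / n_samples

rc = rademacher_complexity(K)
# Classical upper bound via Jensen's inequality: RC <= sqrt(trace(K)) / n
bound = np.sqrt(np.trace(K)) / n
```

The Monte Carlo estimate sits below the trace bound because sqrt is concave (Jensen's inequality); the paper's contribution is a bound that avoids this kind of brute-force estimation, whose exact computation is NP-hard.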

3.
BMC Bioinformatics ; 19(1): 23, 2018 01 25.
Article in English | MEDLINE | ID: mdl-29370760

ABSTRACT

BACKGROUND: The uncovering of genes linked to human diseases is a pressing challenge in molecular biology and precision medicine. This task is often hindered by the large number of candidate genes and by the heterogeneity of the available information. Computational methods for the prioritization of candidate genes can help to cope with these problems. In particular, kernel-based methods are a powerful resource for the integration of heterogeneous biological knowledge; however, their practical use is often precluded by their limited scalability. RESULTS: We propose Scuba, a scalable kernel-based method for gene prioritization. It implements a novel multiple kernel learning approach, based on a semi-supervised perspective and on the optimization of the margin distribution. Scuba is optimized to cope with strongly unbalanced settings where known disease genes are few and large-scale predictions are required. Importantly, it can efficiently deal both with a large number of candidate genes and with an arbitrary number of data sources. As a direct consequence of scalability, Scuba also integrates a new efficient strategy to select optimal kernel parameters for each data source. We performed cross-validation experiments and simulated a realistic usage setting, showing that Scuba outperforms a wide range of state-of-the-art methods. CONCLUSIONS: Scuba achieves state-of-the-art performance and has enhanced scalability compared to existing kernel-based approaches for genomic data. This method can be useful to prioritize candidate genes, particularly when their number is large or when input data is highly heterogeneous. The code is freely available at https://github.com/gzampieri/Scuba.
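The setting described above can be sketched in miniature: heterogeneous data sources are turned into kernels, combined, and each candidate gene is scored by its similarity to a small set of known disease genes. Everything here is an illustrative assumption; the feature matrices, source weights, and scoring rule stand in for Scuba's learned multiple kernel combination and margin-distribution optimization.

```python
import numpy as np

rng = np.random.default_rng(1)
n_genes = 100
# Hypothetical feature matrices from two heterogeneous data sources,
# e.g. expression profiles and network embeddings
X_expr = rng.normal(size=(n_genes, 10))
X_net = rng.normal(size=(n_genes, 5))

def cosine_kernel(X):
    # Linear kernel normalized to unit diagonal (cosine similarity)
    K = X @ X.T
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)

# Combine sources with hand-fixed weights; Scuba learns its combination
K = 0.6 * cosine_kernel(X_expr) + 0.4 * cosine_kernel(X_net)

# A few known disease genes: the strongly unbalanced, semi-supervised setting
seeds = [0, 1, 2]
candidates = [g for g in range(n_genes) if g not in seeds]

# Score each candidate by its mean kernel similarity to the seed set,
# then rank candidates from most to least promising
scores = {g: K[g, seeds].mean() for g in candidates}
ranking = sorted(candidates, key=scores.get, reverse=True)
```

Because each source contributes its own kernel, adding a new data type only adds one term to the combination, which is the property that lets this family of methods handle an arbitrary number of sources.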


Subject(s)
User-Computer Interface, Algorithms, Databases, Factual, Genome-Wide Association Study, Humans, Internet