1.
Front Genet ; 14: 1286800, 2023.
Article in English | MEDLINE | ID: mdl-38125750

ABSTRACT

Introduction: Multi-view data offer advantages over single-view data for characterizing individuals, which is crucial in precision medicine toward personalized prevention, diagnosis, or treatment follow-up. Methods: Here, we develop a network-guided multi-view clustering framework named netMUG to identify actionable subgroups of individuals. This pipeline first adopts sparse multiple canonical correlation analysis to select multi-view features possibly informed by extraneous data, which are then used to construct individual-specific networks (ISNs). Finally, the individual subtypes are automatically derived by hierarchical clustering on these network representations. Results: We applied netMUG to a dataset containing genomic data and facial images to obtain BMI-informed multi-view strata and showed how it could be used for a refined obesity characterization. Benchmark analysis of netMUG on synthetic data with known strata of individuals indicated its superior performance compared with both baseline and benchmark methods for multi-view clustering. The clustering derived from netMUG achieved an adjusted Rand index of 1 with respect to the synthesized true labels. In addition, the real-data analysis revealed subgroups strongly linked to BMI and genetic and facial determinants of these subgroups. Discussion: netMUG provides a powerful strategy, exploiting individual-specific networks to identify meaningful and actionable strata. Moreover, the implementation is easy to generalize to accommodate heterogeneous data sources or highlight data structures.
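
For readers unfamiliar with the metric quoted in this abstract, the following is a minimal sketch of the adjusted Rand index (ARI), written from its standard pair-counting definition. It is not taken from the netMUG implementation; the function name and the toy labels in the usage note are illustrative assumptions.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two partitions of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    if n < 2:
        # With fewer than two items there are no pairs to disagree on.
        return 1.0

    # Contingency counts: items shared by (true cluster, predicted cluster).
    cell_counts = Counter(zip(labels_true, labels_pred))
    true_sizes = Counter(labels_true)
    pred_sizes = Counter(labels_pred)

    # Number of item pairs that are together in both / in each partition.
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_true = sum(comb(c, 2) for c in true_sizes.values())
    sum_pred = sum(comb(c, 2) for c in pred_sizes.values())

    expected = sum_true * sum_pred / comb(n, 2)  # agreement expected by chance
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        # Degenerate case, e.g. both partitions place all items in one cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

An ARI of 1 means the two partitions agree exactly, up to a renaming of the cluster labels: `adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])` evaluates to 1.0, whereas values near 0 indicate agreement no better than chance.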

2.
Sci Rep ; 13(1): 19653, 2023 11 10.
Article in English | MEDLINE | ID: mdl-37949935

ABSTRACT

Personalised cancer screening before therapy paves the way toward improving diagnostic accuracy and treatment outcomes. Most approaches are limited to a single data type and do not consider interactions between features, leaving aside the complementary insights that multimodality and systems biology can provide. In this project, we demonstrate the use of graph theory for data integration via individual networks where nodes and edges are individual-specific. We showcase the consequences of early, intermediate, and late graph-based fusion of RNA-Seq data and histopathology whole-slide images for predicting cancer subtypes and severity. The methodology developed is as follows: (1) we create individual networks; (2) we compute the similarity between individuals from these graphs; (3) we train our model on the similarity matrices; (4) we evaluate the performance using the macro F1 score. Pros and cons of elements of the pipeline are evaluated on publicly available real-life datasets. We find that graph-based methods can increase performance over methods that do not study interactions. Additionally, merging multiple data sources often improves classification compared to models based on single data, especially through intermediate fusion. The proposed workflow can easily be adapted to other disease contexts to accelerate and enhance personalized healthcare.
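
Step (4) of the methodology listed in this abstract can be illustrated with a short sketch of the macro F1 score, i.e. the unweighted mean of the per-class F1 scores. This is a generic implementation of the metric's standard definition, not the authors' evaluation code; the function name and the example labels are assumptions made for illustration.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    # Every class seen in either the truth or the predictions gets equal weight.
    classes = set(y_true) | set(y_pred)
    per_class = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        # F1 = 2TP / (2TP + FP + FN); the denominator is never zero here
        # because each class occurs at least once in y_true or y_pred.
        per_class.append(2 * tp / (2 * tp + fp + fn))
    return sum(per_class) / len(per_class)
```

Because each class contributes equally regardless of its size, the macro average penalises poor performance on rare classes, which is one reason it is commonly used for imbalanced subtype prediction.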


Subjects
Neoplasms, Humans, Neoplasms/diagnosis, Neoplasms/genetics, Health Facilities, Multimodal Imaging, RNA-Seq, Records
3.
Bioinformatics ; 39(12), 2023 12 01.
Article in English | MEDLINE | ID: mdl-38001023

ABSTRACT

MOTIVATION: Large-scale clinical proteomics datasets of infectious pathogens, combined with antimicrobial resistance outcomes, have recently opened the door for machine learning models which aim to improve clinical treatment by predicting resistance early. However, existing prediction frameworks typically train a separate model for each antimicrobial and species in order to predict a pathogen's resistance outcome, resulting in missed opportunities for chemical knowledge transfer and generalizability. RESULTS: We demonstrate the effectiveness of multimodal learning over proteomic and chemical features by exploring two clinically relevant tasks for our proposed deep learning models: drug recommendation and generalized resistance prediction. By adopting this multi-view representation of the pathogenic samples and leveraging the scale of the available datasets, our models outperformed the previous single-drug and single-species predictive models by statistically significant margins. We extensively validated the multi-drug setting, highlighting the challenges in generalizing beyond the training data distribution, and quantitatively demonstrate how suitable representations of antimicrobial drugs constitute a crucial tool in the development of clinically relevant predictive models. AVAILABILITY AND IMPLEMENTATION: The code used to produce the results presented in this article is available at https://github.com/BorgwardtLab/MultimodalAMR.


Subjects
Anti-Bacterial Agents, Proteomics, Bacterial Drug Resistance, Machine Learning
4.
J Pharm Biomed Anal ; 236: 115690, 2023 Nov 30.
Article in English | MEDLINE | ID: mdl-37688907

ABSTRACT

Quantitative structure-retention relationship models (QSRR) have been utilized as an alternative to costly and time-consuming separation analyses and associated experiments for predicting retention time. However, achieving 100 % accuracy in retention prediction is unrealistic despite the existence of various tools and approaches. The limitations of vast data availability and time complexity hinder the use of most algorithms for retention prediction. Therefore, in this study, we examined and compared two approaches for modelling retention time using a dataset of small molecules with retention times obtained at multiple conditions, referred to as multi-targets (five pH levels: 2.7, 3.5, 5, 6.5, and 8 at gradient times of 20 min of mobile phase). The first approach involved developing separate models for predicting retention time at each condition (single-target approach), while the second approach aimed to learn a single model for predicting retention across all conditions simultaneously (multi-target approach). Our findings highlight the advantages of the multi-target approach over the single-target modelling approach. The multi-target models are more efficient in terms of size and learning speed compared to the single-target models. These retention prediction models offer two-fold benefits. Firstly, they enhance knowledge and understanding of retention times, identifying molecular descriptors that contribute to changes in retention behaviour under different pH conditions. Secondly, these approaches can be extended to address other multi-target property prediction problems, such as multi-quantitative structure Property(X) relationship studies (mt-QS(X)R).

5.
bioRxiv ; 2023 May 05.
Article in English | MEDLINE | ID: mdl-37205363

ABSTRACT

Multi-view data offer advantages over single-view data for characterizing individuals, which is crucial in precision medicine toward personalized prevention, diagnosis, or treatment follow-up. Here, we develop a network-guided multi-view clustering framework named netMUG to identify actionable subgroups of individuals. This pipeline first adopts sparse multiple canonical correlation analysis to select multi-view features possibly informed by extraneous data, which are then used to construct individual-specific networks (ISNs). Finally, the individual subtypes are automatically derived by hierarchical clustering on these network representations. We applied netMUG to a dataset containing genomic data and facial images to obtain BMI-informed multi-view strata and showed how it could be used for a refined obesity characterization. Benchmark analysis of netMUG on synthetic data with known strata of individuals indicated its superior performance compared with both baseline and benchmark methods for multi-view clustering. In addition, the real-data analysis revealed subgroups strongly linked to BMI and genetic and facial determinants of these classes. NetMUG provides a powerful strategy, exploiting individual-specific networks to identify meaningful and actionable strata. Moreover, the implementation is easy to generalize to accommodate heterogeneous data sources or highlight data structures.

6.
Brief Bioinform ; 24(2), 2023 03 19.
Article in English | MEDLINE | ID: mdl-36738256

ABSTRACT

Many problems in the life sciences can be reduced to a comparison of graphs. Although a multitude of graph-comparison techniques exist, they often assume prior knowledge about the partitioning or the number of clusters and fail to provide statistical significance of observed between-network heterogeneity. Addressing these issues, we developed an unsupervised workflow to identify groups of graphs from reliable network-based statistics. In particular, we first compute the similarity between networks via appropriate distance measures between graphs and use them in an unsupervised hierarchical algorithm to identify classes of similar networks. Then, to determine the optimal number of clusters, we recursively test for distances between two groups of networks. The test itself finds its inspiration in distance-wise ANOVA algorithms. Finally, we assess significance via the permutation of between-object distance matrices. Notably, the approach, which we call netANOVA, is flexible since users can choose multiple options to adapt to specific contexts and network types. We demonstrate the benefits and pitfalls of our approach via extensive simulations and an application to two real-life datasets. NetANOVA achieved high performance in many simulation scenarios while controlling type I error. On non-synthetic data, comparison against state-of-the-art methods showed that netANOVA is often among the top performers. There are many application fields, including precision medicine, for which identifying disease subtypes via individual-level biological networks improves prevention programs, diagnosis and disease monitoring.
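
The distance-based test and permutation step described in this abstract can be sketched as follows. This is a minimal illustration that assumes a PERMANOVA-style pseudo-F statistic and a single two-group split; the abstract does not give the exact statistic used by netANOVA, and the recursive testing over a clustering hierarchy is not reproduced. All names (`pseudo_f`, `permutation_p_value`, the toy data) are hypothetical.

```python
import random


def pseudo_f(dist, labels):
    """PERMANOVA-style pseudo-F from a square distance matrix and group labels."""
    n = len(labels)
    groups = {}
    for idx, g in enumerate(labels):
        groups.setdefault(g, []).append(idx)
    a = len(groups)
    if a < 2 or n <= a:
        raise ValueError("need at least two groups and more items than groups")

    # Sums of squares computed directly from pairwise distances.
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for members in groups.values():
        ss_within += sum(
            dist[i][j] ** 2
            for pos, i in enumerate(members)
            for j in members[pos + 1:]
        ) / len(members)
    if ss_within == 0:
        return float("inf")
    return ((ss_total - ss_within) / (a - 1)) / (ss_within / (n - a))


def permutation_p_value(dist, labels, n_perm=999, seed=0):
    """P-value from shuffling group labels over the fixed distance matrix
    (equivalent to permuting the rows and columns of the matrix)."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return (hits + 1) / (n_perm + 1)


# Toy data (assumption): ten "networks", each summarised by a single number,
# with the absolute difference standing in for a graph distance.
positions = [0, 1, 2, 3, 4, 10, 11, 12, 13, 14]
toy_dist = [[abs(x - y) for y in positions] for x in positions]
toy_labels = ["A"] * 5 + ["B"] * 5
```

In practice the distance matrix would come from a graph distance chosen for the network type at hand, and the test would be repeated at successive splits of the hierarchical clustering to decide where to stop.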


Subjects
Algorithms, Cluster Analysis, Computer Simulation, Workflow, Analysis of Variance
7.
Int J Mol Sci ; 23(24), 2022 Dec 08.
Article in English | MEDLINE | ID: mdl-36555213

ABSTRACT

A recurring issue in neuroepigenomic studies, especially in the context of neurodegenerative disease, is the use of (heterogeneous) bulk tissue, which generates noise during epigenetic profiling. A workable solution to this issue is to quantify epigenetic patterns in individually isolated neuronal cells using laser capture microdissection (LCM). For this purpose, we established a novel approach for targeted DNA methylation profiling of individual genes that relies on a combination of LCM and limiting dilution bisulfite pyrosequencing (LDBSP). Using this approach, we determined cytosine-phosphate-guanine (CpG) methylation rates of single alleles derived from 50 neurons that were isolated from unfixed post-mortem brain tissue. In the present manuscript, we describe the general workflow and, as a showcase, demonstrate how targeted methylation analysis of various genes, in this case, RHBDF2, OXT, TNXB, DNAJB13, PGLYRP1, C3, and LMX1B, can be performed simultaneously. By doing so, we describe an adapted data analysis pipeline for LDBSP, allowing one to include and correct CpG methylation rates derived from multi-allele reactions. In addition, we show that the efficiency of LDBSP on DNA derived from LCM neurons is similar to the efficiency obtained in previously published studies using this technique on other cell types. Overall, the method described here provides the user with a more accurate estimation of the DNA methylation status of each target gene in the analyzed cell pools, thereby adding further validity to this approach.


Subjects
Neurodegenerative Diseases, Humans, DNA Sequence Analysis/methods, DNA Methylation, Brain, High-Throughput Nucleotide Sequencing, Lasers, Molecular Chaperones, Apoptosis Regulatory Proteins
8.
BMC Bioinformatics ; 23(1): 57, 2022 Feb 01.
Article in English | MEDLINE | ID: mdl-35105309

ABSTRACT

Genes and gene products do not function in isolation but as components of complex networks of macromolecules through physical or biochemical interactions. Dependencies of gene mutations on genetic background (i.e., epistasis) are believed to play a role in understanding molecular underpinnings of complex diseases such as inflammatory bowel disease (IBD). However, the process of identifying such interactions is complex due to, for instance, the curse of high dimensionality, dependencies in the data, and non-linearity. Here, we propose a novel approach for robust and computationally efficient epistasis detection. We do so by first reducing dimensionality per gene via diffusion kernel principal components (kpc). Subsequently, kpc gene summaries are used for downstream analysis, including the construction of a gene-based epistasis network. We show that our approach is able to recover not only known IBD-associated genes but also additional genes of interest linked to this difficult gastrointestinal disease.


Subjects
Genetic Epistasis, Genome-Wide Association Study, Diffusion, Gene Regulatory Networks, Single Nucleotide Polymorphism
9.
Gigascience ; 11, 2022 02 04.
Article in English | MEDLINE | ID: mdl-35134928

ABSTRACT

BACKGROUND: Detecting epistatic interactions at the gene level is essential to understanding the biological mechanisms of complex diseases. Unfortunately, genome-wide interaction association studies involve many statistical challenges that make such detection hard. We propose a multi-step protocol for epistasis detection along the edges of a gene-gene co-function network. Such an approach reduces the number of tests performed and provides interpretable interactions while keeping type I error controlled. Yet, mapping gene interactions into testable single-nucleotide polymorphism (SNP)-interaction hypotheses, as well as computing gene pair association scores from SNP pair ones, is not trivial. RESULTS: Here we compare 3 SNP-gene mappings (positional overlap, expression quantitative trait loci, and proximity in 3D structure) and use the adaptive truncated product method to compute gene pair scores. This method is non-parametric, does not require a known null distribution, and is fast to compute. We apply multiple variants of this protocol to a genome-wide association study dataset on inflammatory bowel disease. Different configurations produced different results, highlighting that various mechanisms are implicated in inflammatory bowel disease, while at the same time, results overlapped with known disease characteristics. Importantly, the proposed pipeline also differs from a conventional approach where no network is used, showing the potential for additional discoveries when prior biological knowledge is incorporated into epistasis detection.
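
To make the idea of combining SNP-pair p-values into a gene-pair score more concrete, the sketch below implements a truncated product statistic with an adaptive choice of truncation threshold. It is a simplified illustration, not the protocol's implementation: the null distribution here is simulated from independent uniform p-values (an assumption; a permutation-based null would be needed to respect dependence between SNPs), and the thresholds, simulation count, and function names are hypothetical.

```python
import math
import random
from bisect import bisect_right


def truncated_log_product(p_values, tau):
    """Log of the product of all p-values <= tau (0.0 if none qualify)."""
    return sum(math.log(p) for p in p_values if p <= tau)


def adaptive_tpm(p_values, taus=(0.01, 0.05, 0.1), n_sim=2000, seed=0):
    """Combined p-value that adapts over several truncation thresholds."""
    if not all(0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    rng = random.Random(seed)
    m = len(p_values)

    # Row 0 holds the observed p-values; the other rows are null draws.
    # 1 - random() lies in (0, 1], which keeps log() defined.
    datasets = [list(p_values)] + [
        [1.0 - rng.random() for _ in range(m)] for _ in range(n_sim)
    ]

    # For every dataset, keep its smallest per-threshold p-value.
    min_p = [1.0] * len(datasets)
    for tau in taus:
        stats = [truncated_log_product(d, tau) for d in datasets]
        pooled = sorted(stats)
        for idx, w in enumerate(stats):
            # Smaller log-products are more extreme; rank within the pool.
            p_tau = bisect_right(pooled, w) / len(datasets)
            min_p[idx] = min(min_p[idx], p_tau)

    # Calibrate the observed minimum against the minima of all datasets.
    observed = min_p[0]
    return sum(1 for v in min_p if v <= observed) / len(datasets)
```

Ranking the observed and simulated datasets in one pool and then comparing the per-dataset minima avoids a second layer of simulation while still accounting for the search over thresholds.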


Subjects
Genetic Epistasis, Genome-Wide Association Study, Genome-Wide Association Study/methods, Phenotype, Single Nucleotide Polymorphism, Quantitative Trait Loci
10.
BioData Min ; 14(1): 16, 2021 Feb 19.
Article in English | MEDLINE | ID: mdl-33608043

ABSTRACT

BACKGROUND: In genome-wide association studies, the extent and impact of confounding due to population structure have been well recognized. Inadequate handling of such confounding is likely to lead to spurious associations, hampering replication and the identification of causal variants. Several strategies have been developed for protecting associations against confounding, the most popular of which is based on Principal Component Analysis. In contrast, the extent and impact of confounding due to population structure in gene-gene interaction (epistasis) association studies are much less investigated and understood. In particular, the role of nonlinear genetic population substructure in epistasis detection is largely under-investigated, especially outside a regression framework. METHODS: To identify causal variants in synergy and to improve the interpretability and replicability of epistasis results, we introduce three strategies based on a model-based multifactor dimensionality reduction approach for structured populations, namely MBMDR-PC, MBMDR-PG, and MBMDR-GC. RESULTS: Simulation results comparing the performance of various approaches show that in the presence of population structure MBMDR-PC and MBMDR-PG consistently better control type I error rate at the nominal level than MBMDR-GC. Moreover, our proposed three methods of population structure correction outperform MDR-SP in terms of statistical power. CONCLUSION: We demonstrate through extensive simulation studies the effect of various degrees of genetic population structure and relatedness on epistasis detection and propose appropriate remedial measures based on linear and nonlinear sample genetic similarity.
