Search | BVS Regional Portal (test)

Medical-informed machine learning: integrating prior knowledge into medical decision systems.

Sirocchi, Christel; Bogliolo, Alessandro; Montagna, Sara.

BMC Med Inform Decis Mak ; 24(Suppl 4): 186, 2024 Jun 28.

Article in English | MEDLINE | ID: mdl-38943085

ABSTRACT

BACKGROUND: Clinical medicine offers a promising arena for applying Machine Learning (ML) models. However, despite numerous studies employing ML in medical data analysis, only a fraction have impacted clinical care. This article underscores the importance of utilising ML in medical data analysis, recognising that ML alone may not adequately capture the full complexity of clinical data, thereby advocating for the integration of medical domain knowledge in ML.

METHODS: The study conducts a comprehensive review of prior efforts in integrating medical knowledge into ML and maps these integration strategies onto the phases of the ML pipeline, encompassing data pre-processing, feature engineering, model training, and output evaluation. The study further explores the significance and impact of such integration through a case study on diabetes prediction. Here, clinical knowledge, encompassing rules, causal networks, intervals, and formulas, is integrated at each stage of the ML pipeline, resulting in a spectrum of integrated models.

RESULTS: The findings highlight the benefits of integration in terms of accuracy, interpretability, data efficiency, and adherence to clinical guidelines. In several cases, integrated models outperformed purely data-driven approaches, underscoring the potential for domain knowledge to enhance ML models through improved generalisation. In other cases, the integration was instrumental in enhancing model interpretability and ensuring conformity with established clinical guidelines. Notably, knowledge integration also proved effective in maintaining performance under limited data scenarios.

CONCLUSIONS: By illustrating various integration strategies through a clinical case study, this work provides guidance to inspire and facilitate future integration efforts. Furthermore, the study identifies the need to refine domain knowledge representation and fine-tune its contribution to the ML model as the two main challenges to integration and aims to stimulate further research in this direction.
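As a rough illustration of what integrating a clinical formula and a clinical interval at the feature-engineering stage could look like, the sketch below derives two knowledge-based features from a patient record. It is not the authors' pipeline: the field names, the helper functions, and the choice of body mass index and a fasting-glucose cut-off of 126 mg/dL as the encoded knowledge are all assumptions made for this example.

```python
# Hypothetical sketch: encoding a clinical formula and a clinical interval
# as extra features. Field names and helpers are illustrative assumptions.

FASTING_GLUCOSE_CUTOFF_MG_DL = 126  # assumed cut-off chosen for this example


def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2


def add_knowledge_features(record):
    """Return a copy of the record with two knowledge-derived features added."""
    out = dict(record)
    # Formula-based feature: BMI computed from raw weight and height.
    out["bmi"] = bmi(record["weight_kg"], record["height_m"])
    # Interval-based feature: flag whether fasting glucose reaches the cut-off.
    out["glucose_at_or_above_cutoff"] = (
        record["fasting_glucose_mg_dl"] >= FASTING_GLUCOSE_CUTOFF_MG_DL
    )
    return out


if __name__ == "__main__":
    patient = {"weight_kg": 81.0, "height_m": 1.8, "fasting_glucose_mg_dl": 130}
    print(add_knowledge_features(patient))
```

The enriched records could then be passed to any classifier, so the model sees the domain knowledge as ordinary input columns rather than having to rediscover it from data.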

Subjects

Clinical Decision Support Systems, Machine Learning, Humans

Exploring machine learning for untargeted metabolomics using molecular fingerprints.

Sirocchi, Christel; Biancucci, Federica; Donati, Matteo; Bogliolo, Alessandro; Magnani, Mauro; Menotta, Michele; Montagna, Sara.

Comput Methods Programs Biomed ; 250: 108163, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38626559

ABSTRACT

BACKGROUND: Metabolomics, the study of substrates and products of cellular metabolism, offers valuable insights into an organism's state under specific conditions and has the potential to revolutionise preventive healthcare and pharmaceutical research. However, analysing large metabolomics datasets remains challenging, with available methods relying on limited and incompletely annotated metabolic pathways.

METHODS: This study, inspired by well-established methods in drug discovery, employs machine learning on metabolite fingerprints to explore how metabolite structure relates to responses in experimental conditions beyond known pathways, shedding light on metabolic processes. It evaluates the effectiveness of fingerprinting in representing metabolites, addressing challenges such as class imbalance, data sparsity, high dimensionality, duplicate structural encoding, and feature interpretability. Feature importance analysis is then applied to reveal key chemical configurations affecting classification, identifying related metabolite groups.

RESULTS: The approach is tested on two datasets: one on Ataxia Telangiectasia and another on endothelial cells under low oxygen. Machine learning on molecular fingerprints predicts metabolite responses effectively, and feature importance analysis aligns with known metabolic pathways, unveiling new affected metabolite groups for further study.

CONCLUSION: The presented approach leverages the strengths of drug discovery to address critical issues in metabolomics research and aims to bridge the gap between these two disciplines. This work lays the foundation for future research in this direction, possibly exploring alternative structural encodings and machine learning models.
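To make the duplicate structural encoding issue mentioned in the methods more concrete, here is a minimal sketch of one way such duplicates might be detected: metabolites whose binary fingerprints are identical are grouped together. The function name, the input layout (a dictionary mapping names to bit lists), and the toy values are assumptions for illustration only; the abstract does not state how the authors handled duplicates.

```python
# Hypothetical sketch: finding metabolites that share an identical fingerprint.
# The data layout and the toy bit vectors are illustrative assumptions.
from collections import defaultdict


def group_duplicate_fingerprints(fingerprints):
    """Return sorted name groups for every fingerprint shared by 2+ metabolites."""
    groups = defaultdict(list)
    for name, bits in fingerprints.items():
        # Lists are unhashable, so the bit vector is converted to a tuple key.
        groups[tuple(bits)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


if __name__ == "__main__":
    toy = {
        "metabolite_a": [1, 0, 1, 0],
        "metabolite_b": [0, 1, 1, 0],
        "metabolite_c": [1, 0, 1, 0],  # same encoding as metabolite_a
    }
    print(group_duplicate_fingerprints(toy))
```

Groups found this way could be merged or inspected before training, since a classifier cannot separate two inputs that are encoded identically.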

Subjects

Machine Learning, Metabolomics, Metabolomics/methods, Humans, Cell Line, Ataxia Telangiectasia/metabolism, Cell Hypoxia/physiology

Topological network features determine convergence rate of distributed average algorithms.

Sirocchi, Christel; Bogliolo, Alessandro.

Sci Rep ; 12(1): 21831, 2022 Dec 17.

Article in English | MEDLINE | ID: mdl-36528734

ABSTRACT

Gossip algorithms are message-passing schemes designed to compute averages and other global functions over networks through asynchronous and randomised pairwise interactions. Gossip-based protocols have drawn much attention for achieving robust and fault-tolerant communication while maintaining simplicity and scalability. However, the frequent propagation of redundant information makes them inefficient and resource-intensive. Most previous works have been devoted to deriving performance bounds and developing faster algorithms tailored to specific structures. In contrast, this study focuses on characterising the effect of topological network features on performance so that faster convergence can be engineered by acting on the underlying network rather than the gossip algorithm. The numerical experiments identify the topological limiting factors, the most predictive graph metrics, and the most efficient algorithms for each graph family and for all graphs, providing guidelines for designing and maintaining resource-efficient networks. Regression analyses confirm the explanatory power of structural features and demonstrate the validity of the topological approach in performance estimation. Finally, the high predictive capabilities of local metrics and the possibility of computing them in a distributed manner and at a low computational cost inform the design and implementation of a novel distributed approach for predicting performance from the network topology.
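The pairwise interaction described above can be sketched as a short simulation: at each step one edge is drawn at random and its two endpoints replace their values with their mean, which preserves the global sum while shrinking the spread. This is a generic randomised gossip averaging sketch, not necessarily any of the specific algorithms evaluated in the study; the function names, the two example topologies, and the round count are assumptions.

```python
# Minimal sketch of randomised pairwise gossip averaging on a graph.
# Function names, topologies, and parameters are illustrative assumptions.
import random


def gossip_average(values, edges, rounds, seed=0):
    """Run `rounds` pairwise averaging steps over randomly chosen edges."""
    rng = random.Random(seed)
    x = list(values)
    for _ in range(rounds):
        i, j = rng.choice(edges)       # pick one communicating pair
        mean = (x[i] + x[j]) / 2.0     # both endpoints adopt the pair mean
        x[i] = x[j] = mean
    return x


def spread(x):
    """Distance between the largest and smallest local estimate."""
    return max(x) - min(x)


def ring_edges(n):
    """Edges of a ring: node i is linked to node (i + 1) mod n."""
    return [(i, (i + 1) % n) for i in range(n)]


def complete_edges(n):
    """Edges of a complete graph on n nodes."""
    return [(i, j) for i in range(n) for j in range(i + 1, n)]


if __name__ == "__main__":
    start = [float(i) for i in range(10)]
    for name, edges in (("ring", ring_edges(10)), ("complete", complete_edges(10))):
        final = gossip_average(start, edges, rounds=500)
        print(name, "spread after 500 rounds:", spread(final))
```

Running the same number of rounds on different edge lists gives a simple way to observe how topology alone changes the remaining spread, which is the kind of effect the study characterises.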

Robust statistical modeling improves sensitivity of high-throughput RNA structure probing experiments.

Selega, Alina; Sirocchi, Christel; Iosub, Ira; Granneman, Sander; Sanguinetti, Guido.

Nat Methods ; 14(1): 83-89, 2017 Jan.

Article in English | MEDLINE | ID: mdl-27819660

ABSTRACT

Structure probing coupled with high-throughput sequencing could revolutionize our understanding of the role of RNA structure in regulation of gene expression. Despite recent technological advances, intrinsic noise and high sequence coverage requirements greatly limit the applicability of these techniques. Here we describe a probabilistic modeling pipeline that accounts for biological variability and biases in the data, yielding statistically interpretable scores for the probability of nucleotide modification transcriptome wide. Using two yeast data sets, we demonstrate that our method has increased sensitivity, and thus our pipeline identifies modified regions on many more transcripts than do existing pipelines. Our method also provides confident predictions at much lower sequence coverage levels than those recommended for reliable structural probing. Our results show that statistical modeling extends the scope and potential of transcriptome-wide structure probing experiments.

Subjects

Algorithms, High-Throughput Nucleotide Sequencing/methods, Statistical Models, RNA/chemistry, RNA/genetics, RNA Sequence Analysis/methods, Transcriptome/genetics, Base Pairing, Base Sequence, Computational Biology/methods, Humans, Nucleic Acid Conformation
