1.
Mach Learn ; 113(7): 3961-3997, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39221170

ABSTRACT

There is a growing interest in using reinforcement learning (RL) to personalize sequences of treatments in digital health to support users in adopting healthier behaviors. Such sequential decision-making problems involve decisions about when to treat and how to treat based on the user's context (e.g., prior activity level, location, etc.). Online RL is a promising data-driven approach for this problem as it learns based on each user's historical responses and uses that knowledge to personalize these decisions. However, to decide whether the RL algorithm should be included in an "optimized" intervention for real-world deployment, we must assess the data evidence indicating that the RL algorithm is actually personalizing the treatments to its users. Due to the stochasticity in the RL algorithm, one may get a false impression that it is learning in certain states and using this learning to provide specific treatments. We use a working definition of personalization and introduce a resampling-based methodology for investigating whether the personalization exhibited by the RL algorithm is an artifact of the RL algorithm's stochasticity. We illustrate our methodology with a case study by analyzing data from a physical activity clinical trial called HeartSteps, which included the use of an online RL algorithm. We demonstrate how our approach enhances data-driven truth-in-advertising of algorithm personalization both across all users and within specific users in the study.
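
A minimal sketch of the resampling idea, assuming hypothetical logged data (state, binary action, reward per decision time) and an illustrative between-state spread statistic; the paper's actual working definition of personalization and its resampling scheme may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def personalization_stat(states, actions, rewards):
    """Spread of estimated treatment effects across states -- an
    illustrative measure of how much the policy differentiates states."""
    effects = []
    for s in np.unique(states):
        m = states == s
        treated = rewards[m & (actions == 1)]
        control = rewards[m & (actions == 0)]
        if len(treated) and len(control):
            effects.append(treated.mean() - control.mean())
    return np.std(effects)

# Hypothetical logged data for one user: state id, action, reward.
states = rng.integers(0, 4, size=400)
actions = rng.integers(0, 2, size=400)
rewards = rng.normal(0.2 * actions * (states == 2), 1.0)  # signal in state 2 only

observed = personalization_stat(states, actions, rewards)

# Null: rewards carry no state-specific signal, so shuffling them shows how
# much apparent "personalization" arises from stochasticity alone.
null = np.array([
    personalization_stat(states, actions, rng.permutation(rewards))
    for _ in range(2000)
])
p_value = (1 + np.sum(null >= observed)) / (1 + len(null))
print(f"observed spread = {observed:.3f}, resampling p = {p_value:.3f}")
```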

2.
Explor Res Clin Soc Pharm ; 15: 100498, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39286030

ABSTRACT

Objective: This study aims to understand customer perceptions of community pharmacies using publicly available data from the Google Maps platform. Materials and methods: Python was used to scrape data with the Google Maps APIs. As a result, 17,237 reviews were collected from 512 pharmacies distributed over Riyadh city, Saudi Arabia. Logistic regression was conducted to test the relationships between multiple variables and the given score. In addition, sentiment analysis using the VADER (Valence Aware Dictionary and sEntiment Reasoner) model was conducted on written reviews, followed by cross-tabulation and chi-square tests. Results: The logistic regression model implies that a unit increase in the pharmacy score raises the odds of attaining a higher rating by approximately 3.734 times. The Mann-Whitney U test showed a notable and statistically significant difference between written and unwritten reviews (U = 39,928,072.5, p < 0.001). The Pearson chi-square test yielded a value of 2991.315 with 8 degrees of freedom (p < 0.001). Discussion: Our study found that the willingness of reviewers to write reviews depends on their perception. The study also provides a descriptive analysis of the sentiment results obtained with VADER, and the chi-square test indicates a significant relationship between rating scores and review sentiments. Conclusion: This study offers valuable findings on customer perception of community pharmacies using a new source of data.
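
A short sketch of the two analysis steps named above: scoring review text with the vaderSentiment package and testing rating-by-sentiment independence with SciPy. The contingency counts are invented for illustration, not the study's data.

```python
import numpy as np
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from scipy.stats import chi2_contingency

analyzer = SentimentIntensityAnalyzer()
print(analyzer.polarity_scores("Great service, the pharmacist was very helpful"))
# -> {'neg': ..., 'neu': ..., 'pos': ..., 'compound': ...}; reviews are commonly
# labelled positive/neutral/negative by thresholding 'compound' at +/-0.05.

# Hypothetical cross-tabulation of star rating (rows: 1-5) by sentiment
# (columns: negative, neutral, positive), standing in for the study's table.
table = np.array([
    [120, 30,  10],
    [ 60, 40,  20],
    [ 30, 50,  40],
    [ 15, 45,  90],
    [ 10, 25, 160],
])
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.3g}")
```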

3.
Heliyon ; 10(14): e33781, 2024 Jul 30.
Article in English | MEDLINE | ID: mdl-39113995

ABSTRACT

This research examines the unique Chinese approaches to implementing the Early Childhood Curriculum (ECC) in Shenzhen and Hong Kong, drawing on School-based Curriculum Development (SBCD) studies. A total of 200 administrators and teachers were interviewed, and the interview transcripts were examined, cross-checked, and assessed using document analysis and classroom observation. Drawing on these interviews, together with the document analysis and classroom observations, the influence of Chinese culture on ECC implementation is explored using Cultural-Historical Activity Theory (CHAT). An exploratory, inferential, and descriptive statistical approach evaluates the sociocultural mechanism of ECC in Chinese society. The proposed framework uses K-Nearest Neighbor (KNN) regression analysis to illustrate how social development leads to cultural fusion and conflicts. The overall sociocultural framework promotes cultural growth and inheritance in China's early childhood education settings.
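
A minimal KNN regression sketch with scikit-learn; the coded interview features and the cultural fusion-conflict outcome below are invented stand-ins, since the study's variables are not reproduced here.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(1)
# Hypothetical coded interview features (e.g., curriculum-adaptation ratings)
# and a numeric outcome representing cultural fusion vs. conflict.
X = rng.normal(size=(200, 5))
y = 0.8 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=0.2, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
knn = KNeighborsRegressor(n_neighbors=7).fit(X_tr, y_tr)
print(f"held-out R^2: {knn.score(X_te, y_te):.2f}")
```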

4.
Pharmaceutics ; 16(8)2024 Aug 17.
Article in English | MEDLINE | ID: mdl-39204427

ABSTRACT

The monoclonal antibody (mAb) manufacturing process comes with high profits and high costs, and thus mAb productivity is of vital importance. However, many factors can impact the cell culture process and lead to mAb productivity reduction. Nowadays, the biopharma industry is actively employing manufacturing information systems, which enable the integration of both inline and offline data. Although the volume of data is large, related data mining studies for mAb productivity improvement are rare. Therefore, a data-driven approach is proposed in this study to leverage both the inline and offline data of the cell culture process to discover the causes of mAb productivity reduction. The approach consists of four steps, namely data preprocessing, phase division, feature extraction and fusion, and cluster comparison. First, data quality issues are solved during the data preprocessing step. Next, the inline data are divided into several phases based on the moving-window k-nearest neighbor method. Then, the inline data features are extracted via functional data analysis and combined with the offline data features. Finally, the causes of mAb productivity reduction are identified using contrasting clusters via principal component analysis. A commercial-scale cell culture process case study is provided in this research to verify the effectiveness of the approach. Data from 35 batches were collected, and each batch contained nine inline variables and seven offline variables. The cause of mAb productivity reduction was identified as a lack of nutrients, and recommended actions were taken accordingly, which was subsequently confirmed by six validation batches.
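
A toy sketch of the phase-division step only, assuming a single inline signal and a moving-window novelty score built from k-nearest-neighbor distances; the paper's exact windowing and distance definitions are not reproduced.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def phase_change_windows(signal, window=20, k=5, z=2.0):
    """Illustrative moving-window score: mean distance from each window to
    its k nearest earlier windows; peaks suggest a phase boundary."""
    W = np.lib.stride_tricks.sliding_window_view(signal, window)
    scores = np.zeros(len(W))
    for i in range(k + 1, len(W)):
        nn = NearestNeighbors(n_neighbors=k).fit(W[:i])
        dist, _ = nn.kneighbors(W[i : i + 1])
        scores[i] = dist.mean()
    return np.where(scores > scores.mean() + z * scores.std())[0]

# Hypothetical inline variable: a level shift mimics a culture-phase change.
rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(4, 1, 300)])
print("candidate phase-change windows:", phase_change_windows(x)[:5])
```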

5.
Cell Syst ; 15(8): 679-693, 2024 Aug 21.
Article in English | MEDLINE | ID: mdl-39173584

ABSTRACT

Recent biological studies have been revolutionized in scale and granularity by multiplex and high-throughput assays. Profiling cell responses across several experimental parameters, such as perturbations, time, and genetic contexts, leads to richer and more generalizable findings. However, these multidimensional datasets necessitate a reevaluation of the conventional methods for their representation and analysis. Traditionally, experimental parameters are merged to flatten the data into a two-dimensional matrix, sacrificing crucial experiment context reflected by the structure. As Marshall McLuhan famously stated, "the medium is the message." In this work, we propose that the experiment structure is the medium in which subsequent analysis is performed, and the optimal choice of data representation must reflect the experiment structure. We review how tensor-structured analyses and decompositions can preserve this information. We contend that tensor methods are poised to become integral to the biomedical data sciences toolkit.


Subjects
Computational Biology; Humans; Computational Biology/methods; Animals; Algorithms
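
As an illustration of the tensor-structured analysis the review advocates, a minimal CP (PARAFAC) decomposition with the TensorLy library on a made-up genes x perturbations x time tensor; the axes and rank are arbitrary choices, not the authors' example.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

rng = np.random.default_rng(3)
# Hypothetical 3-way tensor: 50 genes x 8 perturbations x 6 time points.
tensor = tl.tensor(rng.normal(size=(50, 8, 6)))

# CP keeps the experiment structure instead of flattening it: each component
# gets one factor vector per experimental axis.
weights, factors = parafac(tensor, rank=3, n_iter_max=200)
for name, f in zip(["genes", "perturbations", "time"], factors):
    print(name, f.shape)  # (50, 3), (8, 3), (6, 3)
```
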
6.
Ther Innov Regul Sci ; 2024 Aug 21.
Article in English | MEDLINE | ID: mdl-39167298

ABSTRACT

While AI/ML methods were long considered experimental tools in clinical development, they are nowadays widely available. However, stakeholders in the health care industry still need to answer the question of which role these methods can realistically play and what standards should be adhered to. Clinical research in late-stage clinical development has particular requirements in terms of robustness, transparency, and traceability, and these standards should also be adhered to when applying AI/ML methods. Some formal regulatory guidance is currently available, but it is directed mainly at settings where a device or medical software is investigated. Here we focus on the application of AI/ML methods in late-stage clinical drug development, i.e., a setting where less guidance is currently available. We first summarize the available regulatory guidance and work done by regulatory statisticians, and then present an industry application in which the influence of extensive sets of baseline characteristics on the treatment effect is investigated by applying ML methods in a standardized manner, with intuitive graphical displays leveraging explainable AI methods. The paper aims at stimulating discussion of the role such analyses can play in general rather than advocating for a particular AI/ML method or indication where such methods could be meaningful.
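
A hedged sketch of one such explainable-AI display, using SHAP dependence plots on simulated trial data in which the treatment effect depends on a baseline covariate; the model, covariate names, and effect structure are all invented for illustration.

```python
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(4)
n = 1000
# Hypothetical trial data: three baseline covariates plus a 0/1 treatment arm.
X = np.column_stack([rng.normal(size=(n, 3)), rng.integers(0, 2, n)])
# Treatment effect varies with the first covariate (effect heterogeneity).
y = 0.5 * X[:, 3] * (1 + X[:, 0]) + 0.3 * X[:, 1] + rng.normal(0, 0.5, n)

model = GradientBoostingRegressor().fit(X, y)
shap_values = shap.TreeExplainer(model).shap_values(X)

# How the treatment contribution varies with a baseline characteristic --
# the kind of intuitive graphical display the application relies on.
shap.dependence_plot(3, shap_values, X,
                     feature_names=["age", "biomarker", "bmi", "treatment"],
                     interaction_index=0)
```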

7.
Article in English | MEDLINE | ID: mdl-39082872

ABSTRACT

Explorative data analysis (EDA) is a critical step in scientific projects, aiming to uncover valuable insights and patterns within data. Traditionally, EDA involves manual inspection, visualization, and various statistical methods. The advent of artificial intelligence (AI) and machine learning (ML) has the potential to improve EDA, offering more sophisticated approaches that enhance its efficacy. This review explores how AI and ML algorithms can improve feature engineering and selection during EDA, leading to more robust predictive models and data-driven decisions. Tree-based models, regularized regression, and clustering algorithms were identified as key techniques. These methods automate feature importance ranking, handle complex interactions, perform feature selection, reveal hidden groupings, and detect anomalies. Real-world applications include risk prediction in total hip arthroplasty and subgroup identification in scoliosis patients. Recent advances in explainable AI and EDA automation show potential for further improvement. The integration of AI and ML into EDA accelerates tasks and uncovers sophisticated insights. However, effective utilization requires a deep understanding of the algorithms, their assumptions, and limitations, along with domain knowledge for proper interpretation. As data continues to grow, AI will play an increasingly pivotal role in EDA when combined with human expertise, driving more informed, data-driven decision-making across various scientific domains. Level of Evidence: Level V - Expert opinion.
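
A small scikit-learn sketch of the tree-based feature-ranking step described above, on synthetic data, with permutation importance as the more model-agnostic cross-check such reviews recommend.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=500, n_features=12, n_informative=4,
                           random_state=0)
forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)

# Impurity-based ranking: fast, but biased toward high-cardinality features.
print("top by impurity importance:",
      np.argsort(forest.feature_importances_)[::-1][:4])

# Permutation importance on the same data as a cross-check.
perm = permutation_importance(forest, X, y, n_repeats=10, random_state=0)
print("top by permutation importance:",
      np.argsort(perm.importances_mean)[::-1][:4])
```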

8.
Front Public Health ; 12: 1362699, 2024.
Article in English | MEDLINE | ID: mdl-38584915

ABSTRACT

Correspondence analysis (CA) is a multivariate statistical and visualization technique. CA is extremely useful in analyzing either two- or multi-way contingency tables, representing some degree of correspondence between columns and rows. The CA results are visualized in easy-to-interpret "bi-plots," where the proximity of items (values of categorical variables) represents the degree of association between presented items. In other words, items positioned near each other are more associated than those located farther away. Each bi-plot has two dimensions, which are named during the analysis; the naming of dimensions adds a qualitative aspect to the analysis. Correspondence analysis may support medical professionals in finding answers to many important questions related to health, wellbeing, quality of life, and similar topics in a simpler, though more informal, way than more complex statistical or machine learning approaches. In that way, it can be used for dimension reduction and data simplification, clustering, classification, feature selection, knowledge extraction, visualization of adverse effects, or pattern detection.


Subjects
Biomedical Research; Quality of Life; Cluster Analysis; Machine Learning
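
A compact NumPy sketch of classical correspondence analysis (SVD of the standardized residuals of a contingency table); the age-by-wellbeing table is hypothetical.

```python
import numpy as np

def correspondence_analysis(table, n_dims=2):
    """Classical CA of a two-way contingency table."""
    P = table / table.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U * sv)[:, :n_dims] / np.sqrt(r)[:, None]     # row coordinates
    cols = (Vt.T * sv)[:, :n_dims] / np.sqrt(c)[:, None]  # column coordinates
    return rows, cols, (sv**2 / (sv**2).sum())[:n_dims]

# Hypothetical contingency table: age group (rows) x self-rated wellbeing.
table = np.array([[25, 40, 10],
                  [30, 35, 20],
                  [15, 30, 45]], dtype=float)
rows, cols, inertia = correspondence_analysis(table)
print("explained inertia per bi-plot axis:", np.round(inertia, 3))
```
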
9.
BMC Bioinformatics ; 25(1): 96, 2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38438881

ABSTRACT

BACKGROUND: Bisulfite sequencing detects and quantifies DNA methylation patterns, contributing to our understanding of gene expression regulation, genome stability maintenance, conservation of epigenetic mechanisms across divergent taxa, epigenetic inheritance and, eventually, phenotypic variation. Graphical representation of methylation data is crucial in exploring epigenetic regulation on a genome-wide scale in both plants and animals. This is especially relevant for non-model organisms with poorly annotated genomes and/or organisms whose genome sequences are not yet assembled at the chromosome level. Although bisulfite sequencing has been a technology of choice for profiling DNA methylation for many years, there are surprisingly few lightweight, robust standalone tools available for efficient graphical analysis of data in non-model systems, which significantly limits evolutionary studies and agrigenomics research. BSXplorer is a tool specifically developed to fill this gap and to assist researchers in exploratory data analysis and in visualising and interpreting bisulfite sequencing data more easily. RESULTS: BSXplorer provides in-depth graphical analysis of sequencing data, encompassing (a) profiling of methylation levels in metagenes or in user-defined regions using line plots, heatmaps, and summary-statistics charts, (b) comparative analyses of methylation patterns across experimental samples, methylation contexts, and species, and (c) identification of modules sharing similar methylation signatures at functional genomic elements. The tool processes methylation data quickly and offers API and CLI capabilities, along with the ability to create high-quality figures suitable for publication. CONCLUSIONS: BSXplorer facilitates efficient methylation data mining, contrasting and visualization, making it an easy-to-use package that is highly useful for epigenetic research.


Subjects
DNA Methylation; Epigenesis, Genetic; Sulfites; Animals; Sequence Analysis, DNA; Genomics
10.
Materials (Basel) ; 17(5)2024 Mar 05.
Article in English | MEDLINE | ID: mdl-38473679

ABSTRACT

Fine-grained soils present engineering challenges. Stabilization with marble powder has shown promise for improving their engineering properties, and understanding the temporal evolution of Unconfined Compressive Strength (UCS) and related geotechnical properties in stabilized soils can aid strength assessment. This study investigates the stabilization of fine-grained clayey soils using waste marble powder as an alternative binder. Laboratory experiments were conducted to evaluate the geotechnical properties of soil-marble powder mixtures, including Atterberg's limits, compaction characteristics, California Bearing Ratio (CBR), Indirect Tensile Strength (ITS), and UCS. The effects of various factors, such as curing time, molding water content, and composition ratios, on UCS were analyzed using Exploratory Data Analysis (EDA) techniques, including histograms, box plots, and statistical modeling. The results show that with 60% marble powder the CBR increased from 10.43 to 22.94% under unsoaked and from 4.68 to 12.46% under soaked conditions; ITS rose from 100 to 208 kN/m² with 60-75% marble powder; and UCS rose from 170 to 661 kN/m² after 28 days of curing at the optimum molding water content (22.5%) and composition ratio (60% marble powder). Statistical modeling yielded an R² of 0.954 and an RMSE of 29.82 kN/m² between predicted and experimental values. This study demonstrates the potential of utilizing waste marble powder as a sustainable and cost-effective binder for soil stabilization, transforming weak soils into viable construction materials.
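
A hedged sketch of the statistical-modeling step, fitting a quadratic response surface to invented mix-design data and reporting R² and RMSE in the same units; the study's actual model form and measurements are not reproduced.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(5)
# Hypothetical mix designs: [curing days, molding water %, marble powder %].
X = rng.uniform([0, 15, 0], [28, 30, 75], size=(80, 3))
ucs = (170 + 12 * X[:, 0] + 3 * X[:, 2]            # strength-gain terms
       - 2 * (X[:, 1] - 22.5) ** 2                 # optimum water content
       + rng.normal(scale=25, size=80))            # kN/m^2, invented noise

model = make_pipeline(PolynomialFeatures(2), LinearRegression()).fit(X, ucs)
pred = model.predict(X)
print(f"R^2 = {r2_score(ucs, pred):.3f}, "
      f"RMSE = {mean_squared_error(ucs, pred) ** 0.5:.1f} kN/m^2")
```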

11.
Comput Struct Biotechnol J ; 23: 483-490, 2024 Dec.
Article in English | MEDLINE | ID: mdl-38261941

ABSTRACT

INTRODUCTION: The intergovernmental organizations Organisation for Economic Co-operation and Development (OECD) and Interagency Coordinating Committee on the Validation of Alternative Methods (ICCVAM) have developed guidelines for the use of in vitro models for toxicological evaluation of chemicals. However, the presence of manual steps and the requirement of multiple tools for data analysis, apart from being costly and time-consuming, can inadvertently introduce errors by researchers. OBJECTIVES: We have developed the SAEDC platform (Technological Solution for Exploratory Data Analysis and Statistics for Cytotoxicity, in Portuguese), which enables analysis of cytotoxicity data from assays following OECD Guideline No. 129. METHODOLOGY: In vitro experimental data were used for comparison with the analysis methodology suggested in the Guideline. We analyzed 117 data sets covering chemicals from Category I to Unclassified according to the GHS classification. RESULTS: The four-parameter logistic (4PL) non-linear regression parameters calculated by the SAEDC platform showed no significant differences from the standard methodology in any of the data sets (p > 0.05). The coefficient of determination (R-squared) demonstrated not only a good fit of the 4PL model to the data but also significant similarity to values obtained by the conventional methodology. Finally, the SAEDC platform predicted LD50 values for the chemicals from IC50 values, using the Registry of Cytotoxicity (RC) regression models. CONCLUSION: The comparison with the standard data analysis methodology revealed that the SAEDC platform fulfills the requirements for cytotoxicity data analysis, generating reliable and accurate results with fewer steps performed by researchers. Using the SAEDC platform to obtain toxicity values can reduce analysis time compared to the standard methodology proposed by regulatory agencies. Thus, automating the analysis with the SAEDC platform has the potential to save time and resources for cytotoxicity researchers and laboratories while generating reliable results.
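
A minimal 4PL fit with SciPy on made-up concentration-viability data, illustrating the regression at the core of such platforms; the parameterization and starting values are assumptions, not SAEDC's implementation.

```python
import numpy as np
from scipy.optimize import curve_fit

def four_pl(x, bottom, top, ic50, hill):
    """Four-parameter logistic: viability as a function of concentration."""
    return bottom + (top - bottom) / (1 + (x / ic50) ** hill)

# Hypothetical cytotoxicity assay: concentrations (ug/mL) and % viability.
conc = np.array([0.1, 0.3, 1, 3, 10, 30, 100], dtype=float)
viab = np.array([99, 97, 90, 70, 40, 15, 5], dtype=float)

params, _ = curve_fit(four_pl, conc, viab, p0=[0, 100, 5, 1], maxfev=10000)
bottom, top, ic50, hill = params
r2 = 1 - np.var(viab - four_pl(conc, *params)) / np.var(viab)
print(f"IC50 = {ic50:.2f} ug/mL, R^2 = {r2:.3f}")
```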

12.
Heliyon ; 10(1): e23404, 2024 Jan 15.
Article in English | MEDLINE | ID: mdl-38169926

ABSTRACT

Demand charges are widely used for commercial and industrial consumers. These costs are often not well known, let alone the effects that photovoltaic (PV) generation can have on them. This work proposes a methodology to assess the effect of PV on reducing these charges and to optimise the power to be contracted, using techniques taken from exploratory data analysis. The methodology is applied to five case studies of industrial consumers from different sectors in Spain, finding savings between 5% and 11% of demand charges in industries with continuous operation and up to 28% in cases of discontinuous operation. These savings can be even greater if the maximum power that can be contracted is lower than the optimum. The demand charges in Spain consist of a fixed part proportional to the contracted power and a variable part depending on the power peaks exceeding it. Since the coincident and non-coincident models coexist for the variable part, a comparison is made between the two, finding that in the general case PV users can achieve higher savings with the coincident model.
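
A stylized sketch of the tariff structure described above (a fixed term on contracted power plus a penalty on peaks exceeding it) and a brute-force search for the optimal contracted power; all rates and the load/PV profiles are placeholders, not actual Spanish tariff values.

```python
import numpy as np

def annual_demand_charge(load_kw, contracted_kw,
                         fixed_rate=30.0, excess_rate=90.0):
    """Fixed part proportional to contracted power plus a charge on the
    largest peak above it (placeholder EUR/kW rates)."""
    excess = np.clip(load_kw - contracted_kw, 0, None)
    return fixed_rate * contracted_kw + excess_rate * excess.max()

rng = np.random.default_rng(6)
load = 400 + 80 * rng.random(8760)                        # hourly load, kW
pv = 120 * np.clip(np.sin(np.linspace(0, 2 * np.pi * 365, 8760)), 0, None)
net = np.clip(load - pv, 0, None)                         # load net of PV

candidates = np.arange(300, 601, 5)
best = min(candidates, key=lambda c: annual_demand_charge(net, c))
saving = (min(annual_demand_charge(load, c) for c in candidates)
          - annual_demand_charge(net, best))
print(f"optimal contracted power with PV = {best} kW, saving = {saving:.0f} EUR")
```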

13.
Environ Res ; 241: 117581, 2024 Jan 15.
Article in English | MEDLINE | ID: mdl-37967705

ABSTRACT

Plastic consumption and its end-of-life management pose a significant environmental footprint and are energy intensive. Waste-to-resources and prevention strategies have been promoted widely in Europe as countermeasures; however, their effectiveness remains uncertain. This study aims to uncover the environmental footprint patterns of the plastics value chain in the European Union Member States (EU-27) through exploratory data analysis with dimension reduction and grouping. Nine variables are assessed, ranging from socioeconomic and demographic factors to environmental impacts. Three clusters are formed according to similarity across these nine characteristics, with environmental impacts identified as the primary influencing variable in determining the clusters. Most countries belong to Cluster 0, consisting of 17 countries in 2014 and 18 countries in 2019; this cluster has a relatively low global warming potential (GWP), with an average value of 2.64 t CO2eq/cap in 2014 and 4.01 t CO2eq/cap in 2019. Among all the assessed countries, Denmark showed a significant change when assessed within the traits of the EU-27, moving from Cluster 1 (high GWP) in 2014 to Cluster 0 (low GWP) in 2019. The analysis of plastic packaging waste statistics in 2019 (data released in 2022) shows that, despite an increase in the recovery rate within the EU-27, the GWP has not been reduced, suggesting a rebound effect: the GWP tends to increase in correlation with the higher plastic waste amount. In contrast, other environmental impacts, like eutrophication, abiotic depletion, and acidification potential, are mitigated effectively via recovery, suppressing the adverse effects of an increase in plastic waste generation. The five-year-interval data analysis identified distinct clusters within a set of patterns, categorising them based on their similarities. The categorisation and managerial insights serve as a foundation for devising a focused mitigation strategy.


Subjects
Waste Management; Waste Management/methods; Europe; Product Packaging; Environment; Global Warming; Plastics; Recycling
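
A generic dimension-reduction-plus-grouping sketch (standardize, PCA, k-means) of the kind the study describes, on a random stand-in for the 27 x 9 country-indicator matrix; the real indicators and cluster memberships are not reproduced.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
# Hypothetical matrix: 27 member states x 9 indicators (socioeconomic,
# demographic, and environmental-impact variables).
X = rng.normal(size=(27, 9))
X[:6] += 2.5  # a high-impact group, so the clustering has structure to find

Z = StandardScaler().fit_transform(X)            # indicators on one scale
scores = PCA(n_components=2).fit_transform(Z)    # dimension reduction
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scores)
print("cluster sizes:", np.bincount(labels))
```
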
14.
Psychometrika ; 2023 Dec 12.
Article in English | MEDLINE | ID: mdl-38085454

ABSTRACT

Generalized structured component analysis (GSCA) is a structural equation modeling (SEM) procedure that constructs components as weighted sums of observed variables and examines their regression relationships in a confirmatory manner. This research proposes an exploratory version of GSCA, called exploratory GSCA (EGSCA). EGSCA is analogous to exploratory SEM (ESEM), an exploratory factor-based SEM procedure, and seeks the relationships between the observed variables and the components by orthogonal rotation of the parameter matrices. The indeterminacy of orthogonal rotation in GSCA is first shown as theoretical support for the proposed method. The whole EGSCA procedure is then presented, together with a new rotation algorithm specialized to EGSCA, which aims at simultaneous simplification of all parameter matrices. Two numerical simulation studies revealed that EGSCA with the subsequent rotation successfully recovered the true values of the parameter matrices and was superior to the existing GSCA procedure. EGSCA was applied to two real datasets, and the model suggested by EGSCA's results was shown to be better than the model proposed by previous research, demonstrating the effectiveness of EGSCA in model exploration.
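
For intuition, a standard varimax rotation (not the paper's specialized simultaneous-simplification algorithm) recovering a simple loading structure that has been mixed by an arbitrary orthogonal rotation:

```python
import numpy as np

def varimax(L, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonal rotation R maximizing the variance of squared loadings."""
    p, k = L.shape
    R, d = np.eye(k), 0.0
    for _ in range(max_iter):
        LR = L @ R
        G = L.T @ (LR**3 - (gamma / p) * LR @ np.diag((LR**2).sum(axis=0)))
        U, s, Vt = np.linalg.svd(G)
        R, d_new = U @ Vt, s.sum()
        if d_new < d * (1 + tol):
            break
        d = d_new
    return L @ R

# A 6x2 loading matrix with simple structure, mixed by a 30-degree rotation.
simple = np.array([[.8, 0], [.7, 0], [.9, 0], [0, .8], [0, .7], [0, .9]])
theta = np.pi / 6
mix = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
print(np.round(varimax(simple @ mix), 2))  # close to the simple structure
```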

15.
J Proteome Res ; 2023 Dec 12.
Article in English | MEDLINE | ID: mdl-38085827

ABSTRACT

PMart is a web-based tool for reproducible quality control, exploratory data analysis, statistical analysis, and interactive visualization of 'omics data, based on the functionality of the pmartR R package. The newly improved user interface supports more 'omics data types, additional statistical capabilities, and enhanced options for creating downloadable graphics. PMart supports the analysis of label-free and isobaric-labeled (e.g., TMT, iTRAQ) proteomics, nuclear magnetic resonance (NMR)- and mass spectrometry (MS)-based metabolomics, MS-based lipidomics, and ribonucleic acid sequencing (RNA-seq) transcriptomics data. At the end of a PMart session, a report is available that summarizes the processing steps performed and includes the pmartR R package functions used to execute the data processing. In addition, built-in safeguards in the backend code prevent users from applying methods that are inappropriate for a given 'omics data type. PMart is a user-friendly interface for conducting exploratory data analysis and statistical comparisons of 'omics data without programming.

16.
Sensors (Basel) ; 23(21)2023 Nov 04.
Article in English | MEDLINE | ID: mdl-37960666

ABSTRACT

In this paper, we propose a data classification and analysis method to estimate fire risk using facility data of thermal power plants. To estimate fire risk based on facility data, we divided facilities into three states (Steady, Transient, and Anomaly), categorized by their purposes and operational conditions. This method is designed to satisfy three requirements of fire protection systems for thermal power plants; for example, areas with fire risk must be identified, and fire risks should be classified and integrated into existing systems. We classified thermal power plants into turbine, boiler, and indoor coal shed zones, and each zone was subdivided into small pieces of equipment. The turbine, generator, oil-related equipment, hydrogen (H2), and boiler feed pump (BFP) were selected for the turbine zone, while the pulverizer and ignition oil were chosen for the boiler zone. We selected fire-related tags from Supervisory Control and Data Acquisition (SCADA) data and acquired sample data during a specific period for two thermal power plants, based on inspection of fire and explosion scenarios in thermal power plants over many years. We focused on crucial fire cases such as pool fires, 3D fires, and jet fires, and organized three fire hazard levels for each zone. Experimental analysis was conducted with these data sets using the proposed method for 500 MW and 100 MW thermal power plants. The data classification and analysis methods presented in this paper can provide indirect experience for data analysts who do not have domain knowledge about power plant fires and can also offer good inspiration for data analysts who need to understand power plant facilities.
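
An illustrative three-state labelling of a single SCADA tag using pandas rolling statistics; the thresholds, window length, and injected spike are invented, and the paper's actual per-equipment state definitions are richer than this.

```python
import numpy as np
import pandas as pd

def label_states(series, window=60, steady_std=0.5, anomaly_z=4.0):
    """Steady: low variability in the rolling window; Transient: elevated
    variability (e.g., start-up); Anomaly: far outside the operating band."""
    roll_std = series.rolling(window).std()
    z = (series - series.mean()).abs() / series.std()
    state = pd.Series("Transient", index=series.index)
    state[roll_std <= steady_std] = "Steady"
    state[z >= anomaly_z] = "Anomaly"
    return state

# Hypothetical BFP bearing-temperature tag sampled once per minute.
rng = np.random.default_rng(9)
temp = pd.Series(70 + np.cumsum(rng.normal(0, 0.05, 2000)))
temp.iloc[1500:1510] += 25  # injected spike standing in for a fire precursor
print(label_states(temp).value_counts())
```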

17.
J Pers Med ; 13(9)2023 Sep 21.
Article in English | MEDLINE | ID: mdl-37763188

ABSTRACT

Cardiovascular diseases (CVDs) account for a significant portion of global mortality, emphasizing the need for effective strategies. This study focuses on myocardial infarction, pulmonary thromboembolism, and aortic stenosis, aiming to empower medical practitioners with tools for informed decision making and timely interventions. Drawing from data at Hospital Santa Maria, our approach combines exploratory data analysis (EDA) and predictive machine learning (ML) models, guided by the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology. EDA reveals intricate patterns and relationships specific to cardiovascular diseases. ML models achieve accuracies above 80%, providing a 13-minute window to predict myocardial ischemia incidents and intervene proactively. This paper presents a proof of concept for real-time data and predictive capabilities in enhancing medical strategies.

18.
J Cheminform ; 15(1): 82, 2023 Sep 19.
Article in English | MEDLINE | ID: mdl-37726809

ABSTRACT

We report the major highlights of the School of Cheminformatics in Latin America, Mexico City, November 24-25, 2022. Six lectures, one workshop, and one roundtable with four editors were presented during an online public event with speakers from academia, big pharma, and public research institutions. One thousand one hundred eighty-one students and academics from seventy-nine countries registered for the meeting. As part of the meeting, advances in enumeration and visualization of chemical space, applications in natural product-based drug discovery, drug discovery for neglected diseases, toxicity prediction, and general guidelines for data analysis were discussed. Experts from ChEMBL presented a workshop on how to use the resources of this major compounds database used in cheminformatics. The school also included a round table with editors of cheminformatics journals. The full program of the meeting and the recordings of the sessions are publicly available at https://www.youtube.com/@SchoolChemInfLA/featured .

19.
PeerJ Comput Sci ; 9: e1528, 2023.
Article in English | MEDLINE | ID: mdl-37705643

ABSTRACT

Background: Electronic health records (EHRs) play a crucial role in healthcare decision-making by giving physicians insights into disease progression and suitable treatment options. Within EHRs, laboratory test results are frequently utilized for predicting disease progression. However, processing laboratory test results often poses challenges due to variations in units and formats. In addition, leveraging the temporal information in EHRs can improve the prediction of outcomes, prognoses, and diagnoses. Nevertheless, the irregular frequency of the data in these records necessitates data preprocessing, which can add complexity to time-series analyses. Methods: To address these challenges, we developed an open-source R package that facilitates the extraction of temporal information from laboratory records. The proposed lab package generates analysis-ready time-series data by segmenting the data into time-series windows and imputing missing values. Moreover, users can map local laboratory codes to the Logical Observation Identifiers Names and Codes (LOINC), an international standard. This mapping allows users to incorporate additional information, such as reference ranges and related diseases; the reference ranges provided by LOINC also enable results to be categorized as normal or abnormal. Finally, the analysis-ready time-series data can be further summarized using descriptive statistics and utilized to develop models using machine learning technologies. Results: Using the lab package, we analyzed data from MIMIC-III, focusing on newborns with patent ductus arteriosus (PDA). We extracted time-series laboratory records and compared the differences in test results between patients with and without 30-day in-hospital mortality, identifying significant variations in several laboratory test results 7 days after PDA diagnosis. Leveraging the analysis-ready time-series data, we trained a prediction model with the long short-term memory algorithm, achieving an area under the receiver operating characteristic curve of 0.83 for predicting 30-day in-hospital mortality in model training. These findings demonstrate the lab package's effectiveness in analyzing disease progression. Conclusions: The proposed lab package simplifies and expedites the workflow involved in laboratory record extraction. This tool is particularly valuable in assisting clinical data analysts in overcoming the obstacles associated with heterogeneous and sparse laboratory records.
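
Since the lab package itself is an R tool, here is a language-neutral pandas illustration of the windowing-plus-imputation idea it implements (one-day windows, forward fill); the records and column names are hypothetical and this is not the package's API.

```python
import numpy as np
import pandas as pd

# Hypothetical lab records: one row per test result, irregular timestamps.
records = pd.DataFrame({
    "patient": [1, 1, 1, 2, 2],
    "time": pd.to_datetime(["2020-01-01 06:00", "2020-01-01 20:00",
                            "2020-01-03 07:00", "2020-01-01 09:00",
                            "2020-01-02 10:00"]),
    "glucose": [5.4, np.nan, 6.1, 7.8, np.nan],
})

# Segment each patient's results into fixed one-day windows, average values
# within a window, then impute gaps by carrying the last value forward --
# the same analysis-ready shape the package produces.
windowed = (records
            .set_index("time")
            .groupby("patient")["glucose"]
            .resample("1D").mean()
            .groupby(level=0).ffill())
print(windowed)
```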

20.
Cancers (Basel) ; 15(16)2023 Aug 08.
Article in English | MEDLINE | ID: mdl-37627043

ABSTRACT

Machine learning (ML) models have become capable of making critical decisions on our behalf. Nevertheless, due to the complexity of these models, interpreting their decisions can be challenging, and humans cannot always control them. This paper provides explanations of decisions made by ML models in diagnosing four types of posterior fossa tumors: medulloblastoma, ependymoma, pilocytic astrocytoma, and brainstem glioma. The proposed methodology involves data analysis using kernel density estimation with Gaussian kernels to examine individual MRI features, an analysis of the relationships between these features, and a comprehensive analysis of ML model behavior. This approach offers a simple yet informative and reliable means of identifying and validating distinguishable MRI features for the diagnosis of pediatric brain tumors. By presenting a comprehensive analysis of the responses of the four pediatric tumor types to each other and to ML models in a single source, this study aims to bridge the knowledge gap in the existing literature concerning the relationship between ML and medical outcomes. The results highlight that employing a simple approach in the absence of very large datasets leads to significantly more pronounced and explainable outcomes, as expected. Additionally, the study demonstrates that the pre-analysis results consistently align with the outputs of the ML models and the clinical findings reported in the existing literature.
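
A small SciPy sketch of the per-feature kernel density step, comparing Gaussian KDEs of one hypothetical MRI feature for two tumour types and quantifying their overlap; the feature values are simulated, not patient data.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(10)
# Hypothetical ADC values (an MRI feature) for two tumour types.
medulloblastoma = rng.normal(0.65, 0.08, 40)
pilocytic_astrocytoma = rng.normal(1.45, 0.15, 40)

grid = np.linspace(0.3, 2.0, 200)
kde_m = gaussian_kde(medulloblastoma)(grid)
kde_p = gaussian_kde(pilocytic_astrocytoma)(grid)

# Overlap of the two densities: a rough indicator of how separable (and
# hence how diagnostic) this single feature is for the pair of tumours.
overlap = np.trapz(np.minimum(kde_m, kde_p), grid)
print(f"density overlap = {overlap:.3f}")
```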
