Results 1 - 20 of 81
1.
Heliyon ; 10(16): e35963, 2024 Aug 30.
Article in English | MEDLINE | ID: mdl-39247347

ABSTRACT

Ontologies play a pivotal role in knowledge representation across various artificial intelligence domains, serving as foundational frameworks for organizing data and concepts. However, the construction and evolution of ontologies frequently lead to logical contradictions that undermine their utility and accuracy. Typically, these contradictions are addressed using an Integer Linear Programming (ILP) model, which traditionally treats all formulas with equal importance, thereby neglecting the distinct impacts of individual formulas within minimal conflict sets. To advance this method, we integrate cooperative game theory to compute the Shapley value for each formula, reflecting its marginal contribution towards resolving logical contradictions. We further construct a graph-based representation of the ontology, enabling the extension of Shapley values to Myerson values. Subsequently, we introduce a Myerson-weighted ILP model that employs a lexicographic approach to eliminate logical contradictions in ontologies. The model ensures the minimum number of formula deletions, subsequently applying Myerson values to guide the prioritization of deletions. Our comparative analysis across 18 ontologies confirms that our approach not only preserves more graph edges than traditional ILP models but also quantifies formula contributions and establishes deletion priorities, presenting a novel approach to ILP-based contradiction resolution.
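As a concrete illustration of the Shapley computation underlying this approach, the sketch below enumerates all player orderings for a tiny cooperative game. The formula names and the conflict-set characteristic function are hypothetical stand-ins, not taken from the paper; real ontologies require the approximation and graph-restricted (Myerson) machinery the abstract describes.

```python
from itertools import permutations

def shapley_values(players, v):
    """Exact Shapley values by averaging each player's marginal
    contribution over all orderings (feasible only for small games)."""
    phi = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            phi[p] += v(with_p) - v(coalition)
            coalition = with_p
    return {p: phi[p] / len(orderings) for p in players}

# Hypothetical game: a set of formulas "resolves" a minimal conflict
# set if it contains at least one formula from that set.
conflict_sets = [{"f1", "f2"}, {"f1", "f3"}, {"f2", "f3"}]

def v(coalition):
    return sum(1 for cs in conflict_sets if cs & coalition)

phi = shapley_values(["f1", "f2", "f3"], v)
```

By symmetry all three formulas receive equal credit here; in an asymmetric ontology the values would rank deletion candidates, as in the Myerson-weighted ILP.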

2.
Article in English | MEDLINE | ID: mdl-39102136

ABSTRACT

In this study, six individual machine learning (ML) models and a stacked ensemble model (SEM) were used for daytime visibility estimation at Bangkok airport during the dry season (November-April) for 2017-2022. The individual ML models are random forest, adaptive boosting, gradient boosting, extreme gradient boosting, light gradient boosting machine, and cat boosting. The SEM was developed by combining the outputs of the individual models. Furthermore, the impact of factors affecting visibility was examined using the SHapley Additive exPlanation (SHAP) method, an interpretable ML technique rooted in cooperative game theory. The predictor variables include air pollutants, meteorological variables, and time-related variables. The light gradient boosting machine was identified as the most effective individual ML model. On an hourly time scale, it showed the best performance across three of four metrics, with ρ = 0.86, MB = 0, ME = 0.48 km (second lowest), and RMSE = 0.8 km. On a daily time scale, it performed the best on all evaluation metrics, with ρ = 0.92, MB = 0.0 km, ME = 0.3 km, and RMSE = 0.43 km. The SEM outperformed all the individual models across three of four metrics on an hourly time scale, with ρ = 0.88, MB = 0.0 km (second lowest), and RMSE = 0.75 km. On the daily scale, it performed the best, with ρ = 0.93, MB = 0.02 km, ME = 0.27 km, and RMSE = 0.4 km. The seasonal average original visibility (VISorig) and meteorologically normalized visibility (VISnorm) decreased from 2017 to 2021 but increased in 2022. The rate of decrease in VISorig was double that of VISnorm, which suggests that meteorology contributed to visibility degradation. The SHAP analysis identified relative humidity (RH), PM2.5, PM10, day of the season year (i.e., Julian day, JD), and O3 as the most important variables affecting visibility. At low RH, visibility is not sensitive to changes in RH. However, beyond a threshold, a negative correlation between RH and visibility emerges, potentially due to the hygroscopic growth of aerosols. The dependence of the Shapley values of PM2.5 and PM10 on RH, and the change in average visibility across RH intervals, also point to the effect of hygroscopic aerosol growth on visibility. A negative relationship was identified between visibility and both PM2.5 and PM10. Visibility is positively correlated with O3 at low to moderate concentrations, with diminishing impact at very high concentrations. JD is strongly negatively related to visibility during winter and weakly positively related later in summer. These findings suggest the feasibility of employing machine learning techniques for predicting visibility and understanding the factors influencing its fluctuations.
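SHAP-style attributions like those above can be approximated by Monte Carlo sampling over feature orderings. The sketch below does this for a toy visibility model that is linear in RH and PM2.5; the model, its coefficients, and the baseline values are illustrative assumptions, not the paper's trained ensemble.

```python
import random

def sampled_shap(f, x, baseline, n_samples=500, seed=0):
    """Monte Carlo Shapley attributions: average the change in f(z)
    when feature i flips from its baseline value to its actual value,
    over random feature orderings."""
    rng = random.Random(seed)
    d = len(x)
    phi = [0.0] * d
    idx = list(range(d))
    for _ in range(n_samples):
        rng.shuffle(idx)
        z = list(baseline)
        prev = f(z)
        for i in idx:
            z[i] = x[i]
            cur = f(z)
            phi[i] += cur - prev
            prev = cur
    return [p / n_samples for p in phi]

# Hypothetical model: visibility (km) falls with RH (%) and PM2.5 (µg/m3).
def visibility(z):
    rh, pm25 = z
    return 10.0 - 0.05 * rh - 0.2 * pm25

phi = sampled_shap(visibility, x=[80.0, 30.0], baseline=[50.0, 10.0])
```

Because the toy model is additive, the sampled attributions are exact and sum to the difference between the prediction and the baseline prediction (the SHAP efficiency property).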

3.
Sci Rep ; 14(1): 19622, 2024 Aug 23.
Article in English | MEDLINE | ID: mdl-39179618

ABSTRACT

Autoencoders are dimension reduction models in the field of machine learning which can be thought of as a neural network counterpart of principal components analysis (PCA). Due to their flexibility and good performance, autoencoders have been recently used for estimating nonlinear factor models in finance. The main weakness of autoencoders is that the results are less explainable than those obtained with the PCA. In this paper, we propose the adoption of the Shapley value to improve the explainability of autoencoders in the context of nonlinear factor models. In particular, we measure the relevance of nonlinear latent factors using a forecast-based Shapley value approach that measures each latent factor's contributions in determining the out-of-sample accuracy in factor-augmented models. Considering the interesting empirical instance of the commodity market, we identify the most relevant latent factors for each commodity based on their out-of-sample forecasting ability.

4.
Sensors (Basel) ; 24(15)2024 Jul 31.
Article in English | MEDLINE | ID: mdl-39124015

ABSTRACT

Federated learning is an effective approach for preserving data privacy and security, enabling machine learning in a distributed environment and promoting its development. However, an urgent problem is how to encourage active client participation in federated learning. The Shapley value, a classical concept in cooperative game theory, has been used for data valuation in machine learning services. Nevertheless, existing numerical evaluation schemes based on the Shapley value are impractical, as they necessitate additional model training, leading to increased communication overhead. Moreover, participants' data may exhibit Non-IID characteristics, posing a significant challenge to evaluating participant contributions: Non-IID data degrade the accuracy of the global model, weaken the marginal effect of individual participants, and lead to underestimated contribution measurements. Current work often overlooks the impact of this heterogeneity on model aggregation. This paper presents a fair federated learning contribution measurement scheme that avoids the need for additional model computations. By introducing a novel aggregation weight, it improves the accuracy of the contribution measurement. Experiments on the MNIST and Fashion MNIST datasets show that the proposed method can accurately compute the contributions of participants. Compared to existing baseline algorithms, model accuracy is significantly improved at a similar time cost.
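One way a heterogeneity-aware aggregation weight could look is sketched below: each client's usual data-size weight is discounted by a divergence score measuring how far its update strays from the others. This is a hypothetical scheme for illustration only, not the paper's actual weighting.

```python
import math

def aggregate(client_updates, sizes, divergences, beta=1.0):
    """FedAvg-style aggregation where each client's data-size weight
    is discounted exponentially by its divergence score (hypothetical
    heterogeneity-aware weighting, not the paper's exact formula)."""
    raw = [n * math.exp(-beta * d) for n, d in zip(sizes, divergences)]
    total = sum(raw)
    weights = [r / total for r in raw]
    dim = len(client_updates[0])
    return [sum(w * u[j] for w, u in zip(weights, client_updates))
            for j in range(dim)]

# Two clients with equal data and zero divergence reduce to a plain mean.
merged = aggregate([[1.0, 0.0], [3.0, 2.0]], sizes=[10, 10],
                   divergences=[0.0, 0.0])
```

With nonzero divergences, a Non-IID client's influence on the global model shrinks smoothly rather than being dropped outright.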

5.
Res Sq ; 2024 Jun 18.
Article in English | MEDLINE | ID: mdl-38946976

ABSTRACT

Objective: The aim of this study was to develop a predictive model for uncorrected/actual fluid intelligence scores in 9-10-year-old children using T1-weighted magnetic resonance imaging, and to explore the predictive performance of an autoencoder model based on reconstruction regularization for fluid intelligence in adolescents. Methods: We collected actual fluid intelligence scores and T1-weighted MRIs of 11,534 adolescents who completed baseline tasks from ABCD Data Release 3.0. A total of 148 ROIs were selected and 604 features were derived via FreeSurfer segmentation. The data were divided into training and testing sets in a ratio of 7:3. To predict fluid intelligence scores, we used autoencoder (AE), multilayer perceptron (MLP), and classic machine learning models, and compared their performance on the test set. In addition, we examined their performance across gender subpopulations, and we evaluated feature importance using the SHapley Additive exPlanations (SHAP) method. Results: The proposed model achieved optimal performance on the test set for predicting actual fluid intelligence scores (PCC = 0.209 ± 0.02, MSE = 105.212 ± 2.53). Autoencoders with reconstruction regularization were significantly more effective than MLPs and classical machine learning models. In addition, all models performed better on female adolescents than on male adolescents. Further analysis of relevant characteristics in the different populations suggested that this may be related to gender differences in the underlying mechanisms of fluid intelligence. Conclusions: We establish a weak but stable correlation between brain structural features and raw fluid intelligence using autoencoders. Future research may need to explore ensemble regression strategies that combine multiple machine learning algorithms on multimodal data to improve the prediction of fluid intelligence from neuroimaging features.

6.
Water Sci Technol ; 90(1): 156-167, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39007312

ABSTRACT

Model parameter estimation is a well-known inverse problem, as long as single-valued point data are available as observations of system performance. However, classical statistical methods, such as minimization of an objective function or maximum likelihood, are no longer straightforward when measurements are imprecise in nature. Typical examples of the latter include censored data and binary information. Here, we explore Approximate Bayesian Computation (ABC) as a simple method to perform model parameter estimation with such imprecise information. We demonstrate the method on a plain rainfall-runoff model and illustrate its advantages and shortcomings. Last, we outline the value of Shapley values for determining which types of observation contribute most to the parameter estimation and which are of minor importance.


Subject(s)
Bayes Theorem , Theoretical Models , Rain , Statistical Models
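The core of rejection ABC with imprecise observations fits in a few lines: draw parameters from the prior, simulate, and keep draws whose simulated output is consistent with the censored or binary information. The rainfall-runoff model below is a deliberately crude one-parameter stand-in, not the paper's model.

```python
import random

def abc_rejection(simulate, is_consistent, prior_sample, n_accept, seed=0):
    """Minimal rejection ABC: keep prior draws whose simulated output
    satisfies the (imprecise) observation predicate."""
    rng = random.Random(seed)
    accepted = []
    while len(accepted) < n_accept:
        theta = prior_sample(rng)
        if is_consistent(simulate(theta, rng)):
            accepted.append(theta)
    return accepted

# Toy rainfall-runoff: runoff = k * rain + noise, observed only as the
# censored statement "runoff exceeded 5 units" (illustrative values).
rain = 10.0
post = abc_rejection(
    simulate=lambda k, rng: k * rain + rng.gauss(0.0, 0.5),
    is_consistent=lambda y: y > 5.0,      # binary/censored information
    prior_sample=lambda rng: rng.uniform(0.0, 1.0),
    n_accept=200,
)
```

Even this crude bound shifts the accepted runoff coefficients well above the prior mean of 0.5, showing how binary information still constrains the parameter.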
7.
Bull Math Biol ; 86(8): 103, 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38980452

ABSTRACT

Phylogenetic diversity indices are commonly used to rank the elements in a collection of species or populations for conservation purposes. The derivation of these indices is typically based on some quantitative description of the evolutionary history of the species in question, which is often given in terms of a phylogenetic tree. Both rooted and unrooted phylogenetic trees can be employed, and there are close connections between the indices that are derived in these two different ways. In this paper, we introduce more general phylogenetic diversity indices that can be derived from collections of subsets (clusters) and collections of bipartitions (splits) of the given set of species. Such indices could be useful, for example, in case there is some uncertainty in the topology of the tree being used to derive a phylogenetic diversity index. As well as characterizing some of the indices that we introduce in terms of their special properties, we provide a link between cluster-based and split-based phylogenetic diversity indices that uses a discrete analogue of the classical link between affine and projective geometry. This provides a unified framework for many of the various phylogenetic diversity indices used in the literature based on rooted and unrooted phylogenetic trees, generalizations and new proofs for previous results concerning tree-based indices, and a way to define some new phylogenetic diversity indices that naturally arise as affine or projective variants of each other or as generalizations of tree-based indices.


Subject(s)
Biodiversity , Phylogeny , Genetic Models , Mathematical Concepts , Biological Evolution , Animals
8.
Sensors (Basel) ; 24(10)2024 May 08.
Article in English | MEDLINE | ID: mdl-38793838

ABSTRACT

Collaborative crowdsensing is a team collaboration model that harnesses the intelligence of a large network of participants, applied primarily in areas such as intelligent computing, federated learning, and blockchain. Unlike traditional crowdsensing, user recruitment in collaborative crowdsensing considers not only the individual capabilities of users but also their collaborative abilities. In this context, this paper models user interactions as a graph, transforming the recruitment challenge into a graph theory problem, and employs an enhanced Prim algorithm to identify optimal team members by finding the maximum spanning tree within the user interaction graph. After recruitment, the collaborative crowdsensing explored in this paper faces the challenge of unfair incentives caused by free-riding behavior. To address this, the paper introduces the MR-SVIM mechanism. The process begins with a Gaussian mixture model predicting the quality of users' tasks, combined with historical reputation values to calculate their direct reputation. Subsequently, to assess users' significance within the team, aggregation functions and an improved PageRank algorithm are employed for local and global influence evaluation, respectively. Indirect reputation is determined based on users' importance and their similarity with interacting peers. From the comprehensive reputation value derived from direct and indirect reputations, together with the collaborative capabilities among users, we formulate a contribution feature function. This function is applied within an enhanced Shapley value method to assess the relative contributions of each user, achieving a more equitable distribution of earnings. Finally, experiments conducted on real datasets validate the fairness of this mechanism.
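The recruitment step rests on a standard construction: a maximum spanning tree can be obtained by running Prim's algorithm with negated edge weights. The sketch below shows the plain (unenhanced) version on a small hypothetical collaboration graph; edge weights stand in for pairwise collaboration strength.

```python
import heapq

def max_spanning_tree(n, edges):
    """Prim's algorithm on negated weights: repeatedly add the
    heaviest edge connecting the tree to a new vertex."""
    adj = {u: [] for u in range(n)}
    for u, v, w in edges:
        adj[u].append((w, v))
        adj[v].append((w, u))
    visited = {0}
    heap = [(-w, 0, v) for w, v in adj[0]]
    heapq.heapify(heap)
    tree = []
    while heap and len(visited) < n:
        neg_w, u, v = heapq.heappop(heap)
        if v in visited:
            continue
        visited.add(v)
        tree.append((u, v, -neg_w))
        for w2, x in adj[v]:
            if x not in visited:
                heapq.heappush(heap, (-w2, v, x))
    return tree

# Hypothetical 4-user collaboration graph (u, v, collaboration weight).
edges = [(0, 1, 4), (0, 2, 3), (1, 2, 5), (1, 3, 2), (2, 3, 6)]
tree = max_spanning_tree(4, edges)
```

The resulting tree keeps the strongest collaborations (weights 4, 5, and 6 here), which is the team the recruitment step would select.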

9.
Sci Rep ; 14(1): 5179, 2024 Mar 02.
Article in English | MEDLINE | ID: mdl-38431737

ABSTRACT

This paper constructs a two-layer road data asset revenue allocation model based on a modified Shapley value approach. The first layer allocates revenue to three roles in the data value realization process: the original data collectors, the data processors, and the data product producers. It fully considers and appropriately adjusts the revenue allocation to each role based on data risk factors. The second layer determines the correction factors for different roles to distribute revenue among the participants within those roles. Finally, the revenue values of the participants within each role are synthesized to obtain a consolidated revenue distribution for each participant. Compared to the traditional Shapley value method, this model establishes a revenue allocation evaluation index system, uses entropy weighting and rough set theory to determine the weights, and adopts a fuzzy comprehensive evaluation and numerical analysis to assess the degree of contribution of participants. It fully accounts for differences in both the qualitative and quantitative contributions of participants, enabling a fairer and more reasonable distribution of revenues. This study provides new perspectives and methodologies for the benefit distribution mechanism in road data assets, which aid in promoting the market-based use of road data assets, and it serves as an important reference for the application of data assetization in the road transportation industry.

10.
J Environ Manage ; 356: 120467, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38484592

ABSTRACT

Urban flood risk assessment delivers invaluable information for flood management and for preventing the associated risks in urban areas. The present study prepares a flood risk map and evaluates low-impact development (LID) practices intended to decrease flood risk in Shiraz Municipal District 4, Fars province, Iran. First, flood vulnerability was investigated using multi-criteria decision-making (MCDM) models and several indices, including population density, building age, socio-economic conditions, floor area ratio, literacy, the elderly population, and the number of building floors. Then, thematic layers affecting urban flood hazard, including annual mean rainfall, land use, elevation, slope percentage, curve number, distance from channel, depth of groundwater, and channel density, were prepared in GIS. After a multicollinearity test, data mining models were used to create the urban flood hazard map, and the urban flood risk map was produced using ArcGIS 10.8. Evaluation of the vulnerability models using Boolean logic showed that the TOPSIS and VIKOR models were effective in identifying flood-vulnerable urban areas. The data mining models were also evaluated using ROC and precision-recall curves, which confirmed the accuracy of the random forest (RF) model. The importance of input variables was measured using the Shapley value, which showed that curve number, land use, and elevation were the most important variables in flood hazard modeling. According to the results, 37.8 percent of the area falls into the high and very high flood risk categories. The study used a stormwater management model (SWMM) to simulate node flooding and provide management scenarios for rainfall events with return periods ranging from 2 to 50 years and five rainstorm events. LID practices were found to be effective in flood management for rainfall events with a return period of less than 10 years, particularly for two-year events. However, the effectiveness of LID practices decreases as the return period increases. By applying a combined approach to a region covering approximately 10 percent of the total area of Shiraz Municipal District 4, a reduction of 2-22.8 percent in node flooding was achieved. Comparison of the data mining and MCDM models with a physical model revealed that more than 60% of flooded nodes fell into the "high" and "very high" risk categories of the RF-VIKOR and RF-TOPSIS risk models.


Subject(s)
Floods , Groundwater , Aged , Humans , Iran
11.
Artif Intell Med ; 150: 102810, 2024 04.
Article in English | MEDLINE | ID: mdl-38553149

ABSTRACT

Dysphonia is one of the early symptoms of Parkinson's disease (PD). Most existing methods use feature selection to find the optimal subset of voice features for all PD patients. Few have considered the heterogeneity between patients, which implies the need for patient-specific prediction models. However, building a patient-specific model faces the challenge of small sample size, which limits its generalization ability. Instance transfer is an effective way to solve this problem. Therefore, this paper proposes a patient-specific game-based transfer (PSGT) method for PD severity prediction. First, a selection mechanism chooses PD patients with disease trends similar to the target patient from the source domain, which reduces the risk of negative transfer. Then, the contribution of the transferred subjects and their instances to the disease estimation of the target subject is fairly evaluated by the Shapley value, which improves the interpretability of the method. Next, the proportion of valid instances in the transferred subjects is determined, and the instances with higher contributions are transferred to further reduce the difference between the transferred instance subset and the target subject. Finally, the selected subset of instances is added to the training set of the target subject, and the extended data are fed into a random forest to improve the performance of the method. The Parkinson's telemonitoring dataset is used to evaluate feasibility and effectiveness. The mean values of mean absolute error, root mean square error, and volatility obtained by predicting motor-UPDRS and total-UPDRS for target patients are 1.59, 1.95, 1.56 and 1.98, 2.54, 1.94, respectively. Experimental results show that PSGT outperforms the compared methods in both prediction error and stability.


Subject(s)
Parkinson Disease , Humans , Parkinson Disease/diagnosis , Machine Learning , Severity of Illness Index
12.
Inquiry ; 61: 469580231224823, 2024.
Article in English | MEDLINE | ID: mdl-38281114

ABSTRACT

Dramatic geographic variations in healthcare expenditures have been documented in developed countries, but little is known about such variations in the Chinese context or about their causes. This study aims to examine variations in healthcare expenditures among small areas and to determine the associations between demand- and supply-side factors and per capita inpatient expenditures. This cross-sectional study utilized hospital discharge data aggregated within delineated hospital service areas (HSAs) using the small-area analysis approach. Linear multivariate regression modeling with robust standard errors was used to estimate the sources of variation in per capita inpatient expenditures across HSAs for the years 2017 to 2019; the Shapley value decomposition method was used to measure the respective contributions of demand- and supply-side factors to these variations. Among 149 HSAs, demand factors explained most (87.4%) of the overall geographic variation. Each 1% increase in GDP per capita and in the urbanization rate was associated with a 0.099% and 0.9% increase in per capita inpatient expenditures, respectively, while each 1% increase in the share of females and in the unemployment rate was associated with a 0.7% and 0.4% reduction, respectively. On the supply side, for every additional hospital bed per 1000 population, per capita inpatient expenditures rose by 2.9%, while every 1% increase in the share of private hospitals was associated with a 0.4% decrease. A 10% decrease in the Herfindahl-Hirschman Index was associated with a 1.06% increase in per capita inpatient expenditures. This study suggests that demand-side factors account for most of the large geographic variation in per capita inpatient expenditures among HSAs, while supply-side factors also play an important role. The evaluation of geographic variations in per capita inpatient expenditures and their associated factors has great potential to provide an indirect way to identify underutilized or overutilized healthcare procedures.


Subject(s)
Delivery of Health Care , Health Expenditures , Female , Humans , Small-Area Analysis , Cross-Sectional Studies , Health Facilities
13.
Diagnostics (Basel) ; 13(23)2023 Nov 28.
Article in English | MEDLINE | ID: mdl-38066799

ABSTRACT

The aim of this study is to propose a new feature selection method based on the class-based contribution of Shapley values. For this purpose, a clinical decision support system was developed to assist doctors in diagnosing lung diseases from lung sounds. The developed systems, which are based on the Decision Tree Algorithm (DTA), classify five different cases: healthy and four disease states (URTI, COPD, pneumonia, and bronchiolitis). The most important reason for using a decision tree classifier instead of other high-performance classifiers such as CNNs and RNNs is that the class contributions of Shapley values can be observed directly with this classifier. The systems developed consist of either a single DTA classifier or five parallel DTA classifiers, each optimized for a binary classification such as healthy vs. others or COPD vs. others. Feature sets based on Power Spectral Density (PSD), Mel Frequency Cepstral Coefficients (MFCC), and statistical characteristics extracted from lung sound recordings were used in these classifications. The results indicate that employing features selected based on the class-based contribution of Shapley values, along with an ensemble (parallel) system, leads to improved classification performance compared to using either raw features alone or the traditional use of Shapley values.
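One plausible reading of class-based selection is sketched below: for each class, rank features by their mean absolute Shapley value on that class's samples, keep the top-k per class, and take the union. The per-class Shapley matrices here are made-up numbers for illustration; the paper's exact selection rule may differ.

```python
def class_based_selection(shap_by_class, k):
    """Select the union of the top-k features per class, ranked by
    mean |Shapley value| within that class (illustrative sketch)."""
    selected = set()
    for cls, rows in shap_by_class.items():
        n_feat = len(rows[0])
        mean_abs = [sum(abs(r[j]) for r in rows) / len(rows)
                    for j in range(n_feat)]
        top = sorted(range(n_feat), key=lambda j: mean_abs[j],
                     reverse=True)[:k]
        selected.update(top)
    return sorted(selected)

# Hypothetical per-sample Shapley values for two classes, 3 features.
shap_by_class = {
    "healthy": [[0.9, 0.1, 0.0], [0.7, 0.2, 0.1]],
    "COPD":    [[0.1, 0.0, 0.8], [0.2, 0.1, 0.6]],
}
features = class_based_selection(shap_by_class, k=1)
```

A feature that matters only for one class (feature 2 for COPD here) survives, whereas a global mean-|SHAP| ranking could drown it out.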

14.
BMC Public Health ; 23(1): 2328, 2023 11 24.
Article in English | MEDLINE | ID: mdl-38001411

ABSTRACT

BACKGROUND: The health of migrants has received significant global attention, and it is a particularly significant concern in China, which has the largest migrant population in the world. Analyzing data on samples from the Chinese population therefore holds practical significance; for instance, one can analyze in depth the factors affecting (1) the health records of residents in distinct regions and (2) the current state of family doctor contracts. This study explores the barriers to accessing these two health services and the variations in the effects and contribution magnitudes. METHODS: This study involved data from 138,755 individuals, extracted from the 2018 National Migration Population Health and Family Planning Dynamic Monitoring Survey database. The theoretical framework employed was the Anderson health service model. To investigate the features and determinants of basic public health service utilization among the migrant population across different regions of China, including the influence of enabling resources and demand factors, χ2 tests and binary logistic regression analyses were conducted. The Shapley value method was employed to assess the extent of influence of each factor. RESULTS: The utilization of the various service types varied among the migrant population, with significant regional disparities. The Shapley value decomposition highlighted variations in the mechanisms by which propensity characteristics, enabling resources, and demand factors influence the two health service types. Propensity characteristics and demand factors were the dimensions with the highest explanatory power; among them, health education for chronic disease prevention and treatment was the most influential factor.
CONCLUSION: To better meet the health needs of the migrant population, regional barriers need to be broken down, and the relevance and effectiveness of publicity and education need to be improved. Additionally, taking into account the education level, demographic characteristics, and mobility characteristics of the migrant population, along with the relevant health policies, the migrant population should be guided to maintain residents' health records and encouraged to sign contracts with family doctors more effectively, so as to promote the equalization of basic health services for the migrant population.


Subject(s)
Transients and Migrants , Humans , Delivery of Health Care , Health Services , Surveys and Questionnaires , China/epidemiology
15.
Chemosphere ; 342: 140153, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37714468

ABSTRACT

Compared with traditional chemical reaction-based detection methods, modeling-based prediction methods enable rapid, reagent-free air pollution detection from inexpensive multi-source data, allowing the air pollution situation to be understood quickly. In this study, a convolutional neural network (CNN) and a long short-term memory (LSTM) neural network are integrated into a CNN-LSTM time series prediction model to predict the concentration of PM2.5 and its chemical components (i.e., heavy metals, carbon components, and water-soluble ions) using meteorological data and air pollutants (PM2.5, SO2, NO2, CO, and O3). In the integrated CNN-LSTM model, the CNN uses convolutional and pooling layers to extract features from the data, whereas the powerful nonlinear mapping and learning capabilities of the LSTM enable the time series prediction of air pollution. The experimental results showed that the CNN-LSTM exhibited good generalization ability in the prediction of As, Cd, Cr, Cu, Ni, and Zn, with a mean R2 above 0.9. Mean R2 values predicted for PM2.5, Pb, Ti, EC, OC, SO42-, and NO3- ranged from 0.85 to 0.9. Shapley values showed that PM2.5, NO2, SO2, and CO had the greatest influence on the model's predicted heavy metal results. For water-soluble ions, the predicted results were dominantly influenced by PM2.5, CO, and humidity, while the prediction of the carbon fraction was affected mainly by the PM2.5 concentration. Additionally, several input variables for various components were eliminated without affecting the prediction accuracy of the model, with R2 between 0.70 and 0.84, thereby maximizing modeling efficiency and lowering operational costs. The fully trained model's predictions showed that most predicted components of PM2.5 were lower during January to March 2020 than in 2018 and 2019.
This study provides insight into improving the accuracy of modeling-based detection methods and promotes the development of integrated air pollution monitoring toward a more sustainable direction.

16.
J Environ Manage ; 346: 118949, 2023 Nov 15.
Article in English | MEDLINE | ID: mdl-37717391

ABSTRACT

Due to variations in economic scale, economic structure, and technological advancement across Chinese provinces and cities, the cost of air pollution reduction differs significantly between regions. The total reduction cost can therefore be decreased by capitalizing on these regional discrepancies through cooperative emission reduction. In this paper, taking NOx reduction in North China as an example, a regional cooperative reduction game (CRG) model was constructed to minimize the total cost of emission reduction while achieving future emission reduction targets. Fair allocation of the benefits from cooperation plays a crucial role in motivating regions to participate, so a comprehensive benefit allocation mechanism was proposed to achieve fair transfer compensation. The mechanism combines the consumption responsibility principle based on input-output theory with the Shapley value method from game theory. Compared to the cost before the optimized collaboration, the CRG model saves 20.36% and 13.71% of the total reduction cost in North China under the targets of 17.68% NOx reduction by 2025 and 66.44% NOx reduction by 2035, respectively, relative to 2020. This method can be employed in other regions to achieve air pollution reduction targets at minimum cost and to motivate inter-regional cooperation through this practical and fair form of transfer compensation.


Subject(s)
Air Pollutants , Air Pollution , Air Pollutants/analysis , Air Pollution/prevention & control , Air Pollution/analysis , China , Cities , Conservation of Natural Resources
17.
J Biomed Inform ; 144: 104438, 2023 08.
Article in English | MEDLINE | ID: mdl-37414368

ABSTRACT

Unpacking and comprehending how black-box machine learning algorithms (such as deep learning models) make decisions has been a persistent challenge for researchers and end-users. Explaining time-series predictive models is useful for high-stakes clinical applications, e.g., to determine how different variables and time points influence the clinical outcome. However, existing approaches to explaining such models are frequently tied to specific architectures or to data whose features have no time-varying component. In this paper, we introduce WindowSHAP, a model-agnostic framework for explaining time-series classifiers using Shapley values. WindowSHAP is intended to mitigate the computational complexity of calculating Shapley values for long time-series data as well as to improve the quality of explanations. It is based on partitioning a sequence into time windows. Under this framework, we present three distinct algorithms, Stationary, Sliding, and Dynamic WindowSHAP, each evaluated against the baseline approaches KernelSHAP and TimeSHAP using perturbation and sequence analysis metrics. We applied our framework to clinical time-series data from both a specialized clinical domain (Traumatic Brain Injury - TBI) and a broad clinical domain (critical care medicine). The experimental results demonstrate that, on the two quantitative metrics, our framework is superior at explaining clinical time-series classifiers while also reducing the complexity of computations. We show that for time-series data with 120 time steps (hours), merging 10 adjacent time points can reduce the CPU time of WindowSHAP by 80% compared to KernelSHAP. We also show that the Dynamic WindowSHAP algorithm focuses more on the most important time steps and provides more understandable explanations. As a result, WindowSHAP not only accelerates the calculation of Shapley values for time-series data, but also delivers higher-quality, more understandable explanations.
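The windowing idea described in this abstract can be illustrated with a minimal, self-contained sketch (not the authors' implementation): time steps are grouped into contiguous windows, and a Shapley value is computed per window by masking absent windows with a baseline series. The model `f`, the series, and the baseline below are illustrative assumptions; the published method uses KernelSHAP-style approximation rather than the exact enumeration shown here, which is only tractable for a handful of windows.

```python
from itertools import combinations
from math import factorial

import numpy as np

def window_shapley(f, x, baseline, n_windows):
    """Exact Shapley value per contiguous time window (stationary-window sketch).

    f        : model mapping a 1-D series to a scalar prediction
    x        : the series to explain, shape (T,)
    baseline : reference series, shape (T,) (e.g. a training mean)
    n_windows: number of contiguous windows to partition the T steps into
    """
    T = len(x)
    windows = np.array_split(np.arange(T), n_windows)

    def value(coalition):
        # Series with the windows in `coalition` taken from x, the rest from baseline.
        z = baseline.copy()
        for w in coalition:
            z[windows[w]] = x[windows[w]]
        return f(z)

    phi = np.zeros(n_windows)
    for i in range(n_windows):
        others = [j for j in range(n_windows) if j != i]
        for r in range(len(others) + 1):
            for S in combinations(others, r):
                # Standard Shapley coalition weight |S|!(n-|S|-1)!/n!
                w = factorial(r) * factorial(n_windows - r - 1) / factorial(n_windows)
                phi[i] += w * (value(S + (i,)) - value(S))
    return phi
```

For an additive model the window attributions simply recover each window's contribution, and the attributions always sum to `f(x) - f(baseline)` (efficiency property).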


Subject(s)
Algorithms ; Brain Injuries, Traumatic ; Humans ; Time Factors ; Benchmarking ; Brain Injuries, Traumatic/diagnosis ; Machine Learning
18.
Bioresour Technol ; 382: 129143, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37169206

ABSTRACT

In this study, machine learning algorithms and big data analysis were used to decipher the nitrogen removal rate (NRR) and response mechanisms of the anammox process under heavy-metal stress. Spearman correlation and statistical analysis revealed that Cr6+ had the strongest inhibitory effect on NRR compared to the other heavy metals. The established machine learning model (extreme gradient boosting) accurately predicted NRR with an accuracy > 99%, and the prediction error for new data points was mostly less than 20%. Additionally, feature analysis demonstrated that Cu2+ and Fe3+ had the strongest effects on the anammox process. According to the new insights from this study, Cr6+ and Cu2+ should be removed preferentially in anammox processes under heavy-metal stress. This study demonstrated the feasibility of applying machine learning and big data analysis to NRR prediction for the anammox process under heavy-metal stress.


Subject(s)
Metals, Heavy ; Nitrogen ; Denitrification ; Anaerobic Ammonia Oxidation ; Bioreactors ; Oxidation-Reduction ; Machine Learning ; Sewage
19.
Environ Sci Pollut Res Int ; 30(26): 69274-69288, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37131006

ABSTRACT

Traffic assignment in urban transport planning is the process of allocating traffic flows in a network. Traditionally, traffic assignment aims to reduce travel time or travel cost. As the number of vehicles increases and congestion drives up emissions, environmental issues in transportation are gaining more and more attention. The main objective of this study is to address traffic assignment in urban transport networks under an abatement-rate constraint. A traffic assignment model based on cooperative game theory is proposed, incorporating the influence of vehicle emissions. The framework consists of two parts. First, the performance model predicts travel time based on the Wardrop traffic equilibrium principle, which reflects system travel time: no traveler can obtain a lower travel time by unilaterally changing their path. Second, the cooperative game model ranks link importance by the Shapley value, which measures a link's average marginal utility contribution to all possible link coalitions that include it, and assigns traffic flow based on this contribution subject to a system vehicle-emission-reduction constraint. The proposed model shows that traffic assignment with an emission reduction rate of 20% allows more vehicles in the network than traditional models.
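The Shapley-value link ranking described in this abstract can be sketched on a toy three-link network (the network, link names, and characteristic function are illustrative assumptions, not the paper's model): a coalition of links is worth 1 if it contains a path from the origin to the destination, and each link's Shapley value is its average marginal contribution to that connectivity.

```python
from itertools import combinations
from math import factorial

def shapley(players, v):
    """Exact Shapley value of each player under characteristic function v."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for r in range(n):
            for S in combinations(others, r):
                # Standard coalition weight |S|!(n-|S|-1)!/n!
                w = factorial(r) * factorial(n - r - 1) / factorial(n)
                total += w * (v(set(S) | {p}) - v(set(S)))
        phi[p] = total
    return phi

# Toy network (assumed): A = s->a, B = a->t, C = a direct s->t link.
links = {"A": ("s", "a"), "B": ("a", "t"), "C": ("s", "t")}

def connected(coalition, src="s", dst="t"):
    """1.0 if the coalition's links contain a directed path src -> dst, else 0.0."""
    reachable, frontier = {src}, [src]
    while frontier:
        u = frontier.pop()
        for name in coalition:
            a, b = links[name]
            if a == u and b not in reachable:
                reachable.add(b)
                frontier.append(b)
    return 1.0 if dst in reachable else 0.0

phi = shapley(list(links), connected)
```

In this toy case the direct link C, which connects the origin and destination on its own, receives a Shapley value of 2/3, while A and B, which only help jointly, receive 1/6 each, and the values sum to the grand coalition's worth of 1.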


Subject(s)
Game Theory ; Models, Theoretical ; Transportation ; Vehicle Emissions/analysis ; China
20.
Sci Total Environ ; 882: 163572, 2023 Jul 15.
Article in English | MEDLINE | ID: mdl-37084908

ABSTRACT

Soil available water capacity (AWC) is a key soil function for human survival and well-being. However, its direct measurement is laborious and its spatial interpretation is complex. Digital soil mapping (DSM) techniques emerge as an alternative for the spatial modeling of soil properties. DSM commonly applies machine learning (ML) models with a high level of complexity. In this context, we aimed to produce a digital map of soil AWC, interpret the results of the Random Forest (RF) algorithm, and, in a case study, show that digital AWC maps can support agricultural planning in response to the local effects of climate change. We divided this research into two approaches. In the first, we performed DSM using 1857 sample points in a southeastern region of Brazil with laboratory-determined soil attributes, together with a pedotransfer function (PTF), remote sensing, and DSM techniques. In the second, the resulting digital AWC map and weather station data were used to calculate climatological soil water balances for the periods 1917-1946 and 1991-2020. The results showed that selecting covariates using Shapley values as a criterion contributed to model parsimony, yielding goodness-of-fit metrics of R2 = 0.72, RMSE = 16.72 mm m-1, CCC = 0.83, and bias = 0.53 over the validation set. The covariates contributing most to soil AWC prediction were the Landsat multitemporal images with bare-soil pixels, mean diurnal temperature range, and annual temperature range. Under current climate conditions, soil available water content (AW) increased during the dry period (April to August); May had the highest increase in AW (∼17 mm m-1) and September the largest decrease (∼14 mm m-1). The methodology provides support for AWC modeling at 30 m resolution, as well as insight into adapting crop growth periods to the effects of climate change.
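A climatological soil water balance of the kind used in the second approach can be sketched as a simple monthly bucket model (a Thornthwaite-Mather-style simplification; the function, initial condition, and numbers below are illustrative assumptions, not the study's procedure). In wet months the surplus recharges the soil up to its AWC; in dry months storage decays with the accumulated deficit.

```python
import math

def water_balance(precip, pet, awc):
    """Monthly bucket soil water balance (Thornthwaite-Mather-style sketch).

    precip, pet : monthly precipitation and potential evapotranspiration (mm)
    awc         : soil available water capacity (mm)
    Returns the soil water storage (mm) at the end of each month.
    """
    storage = awc  # assume the profile starts full (illustrative choice)
    series = []
    for p, e in zip(precip, pet):
        if p >= e:
            # Wet month: surplus recharges the soil, capped at capacity.
            storage = min(awc, storage + (p - e))
        else:
            # Dry month: storage decays exponentially with the monthly deficit.
            storage *= math.exp(-(e - p) / awc)
        series.append(storage)
    return series
```

For example, with `awc=100` mm, a wet month keeps the bucket full, two deficit months draw it down exponentially, and a following wet month refills it, so the series stays bounded between 0 and the mapped AWC value.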
