ABSTRACT
Monitoring the concentration of pigments like chlorophyll (Chl) in water bodies is a key task that contributes to their conservation. However, with existing sensor technology, measurement in real time and with enough frequency to ensure proper risk management is not completely feasible. In this work, with the concept of data-driven soft-sensing, three hydrophysical features are used together with three meteorological ones to estimate the concentration of Chl in two tributaries of the River Thames. Data-driven models, specifically neural networks, are used with three learning approaches: individual, centralized and federated. Data reduction scenarios are proposed in order to analyze the performance of each approach when less data is available. The best training results are usually obtained with the individual approach. However, federated learning provides better generalization ability. It was also observed that in most cases the results of the federated learning approach improve on those of the centralized one.
Subject(s)
Chlorophyll , Deep Learning , Chlorophyll/analysis , Chlorophyll A/analysis , Environmental Monitoring/methods , Neural Networks, Computer
ABSTRACT
In this work the applicability of an ensemble of population and machine learning models to predict the evolution of the COVID-19 pandemic in Spain is evaluated, relying solely on public datasets. First, using only incidence data, we trained machine learning models and adjusted classical ODE-based population models, which are especially suited to capturing long-term trends. As a novel approach, we then made an ensemble of these two families of models in order to obtain a more robust and accurate prediction. We then improved the machine learning models by adding more input features: vaccination, human mobility and weather conditions. However, these improvements did not translate to the overall ensemble, as the different model families also had different prediction patterns. Additionally, machine learning models degraded when new COVID variants appeared after training. Finally, we used Shapley Additive Explanation values to discern the relative importance of the different input features for the machine learning models' predictions. The conclusion of this work is that the ensemble of machine learning models and population models can be a promising alternative to SEIR-like compartmental models, especially given that the former do not need data from recovered patients, which are hard to collect and generally unavailable.
Subject(s)
COVID-19 , Pandemics , Humans , Spain/epidemiology , COVID-19/epidemiology , SARS-CoV-2 , Machine Learning , Forecasting
ABSTRACT
Openly sharing data with sensitive attributes and privacy restrictions is a challenging task. In this document we present the implementation of pyCANON, a Python library and command line interface (CLI) to check and assess the level of anonymity of a dataset through some of the most common anonymization techniques: k-anonymity, (α,k)-anonymity, ℓ-diversity, entropy ℓ-diversity, recursive (c,ℓ)-diversity, t-closeness, basic β-likeness, enhanced β-likeness and δ-disclosure privacy. For the case of more than one sensitive attribute, two approaches are proposed for evaluating these techniques. The main strength of this library is that it produces a full report of the parameters that are fulfilled for each of the techniques mentioned above, requiring only the set of quasi-identifiers and sensitive attributes. The methods implemented are presented together with the attacks they prevent, the description of the library, examples of the different functions' usage, as well as the impact and the possible applications that can be developed. Finally, some possible aspects to be incorporated in future updates are proposed.
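
To illustrate what checking two of the listed techniques involves, the following is a minimal sketch in plain Python using the standard textbook definitions of k-anonymity and (distinct) ℓ-diversity. It does not use or reproduce the pyCANON API; the function names, the toy records and the column names are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def equivalence_classes(rows, quasi_identifiers):
    """Group records that share the same values for all quasi-identifiers."""
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[qi] for qi in quasi_identifiers)
        classes[key].append(row)
    return classes


def k_anonymity(rows, quasi_identifiers):
    """Largest k for which the data is k-anonymous: the smallest class size."""
    classes = equivalence_classes(rows, quasi_identifiers)
    return min(len(group) for group in classes.values())


def l_diversity(rows, quasi_identifiers, sensitive_attribute):
    """Largest l for distinct l-diversity: the fewest distinct sensitive
    values found in any equivalence class."""
    classes = equivalence_classes(rows, quasi_identifiers)
    return min(
        len({row[sensitive_attribute] for row in group})
        for group in classes.values()
    )


# Hypothetical, already generalized toy dataset (assumption, not from the paper).
rows = [
    {"age": "30-39", "zip": "390**", "disease": "flu"},
    {"age": "30-39", "zip": "390**", "disease": "asthma"},
    {"age": "40-49", "zip": "391**", "disease": "flu"},
    {"age": "40-49", "zip": "391**", "disease": "flu"},
    {"age": "40-49", "zip": "391**", "disease": "diabetes"},
]

k = k_anonymity(rows, ["age", "zip"])
l = l_diversity(rows, ["age", "zip"], "disease")
print(f"k = {k}, l = {l}")
```

Both functions need only the quasi-identifiers and the sensitive attribute as input, which mirrors the requirement stated in the abstract. The sketch assumes a non-empty dataset and a single sensitive attribute.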