1.
Entropy (Basel) ; 26(5)2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38785636

ABSTRACT

Using information-theoretic quantities in practical applications with continuous data is often hindered by the fact that probability density functions need to be estimated in higher dimensions, which can become unreliable or even computationally unfeasible. To make these useful quantities more accessible, alternative approaches such as binned frequencies using histograms and k-nearest neighbors (k-NN) have been proposed. However, a systematic comparison of the applicability of these methods has been lacking. We wish to fill this gap by comparing kernel-density-based estimation (KDE) with these two alternatives in carefully designed synthetic test cases. Specifically, we wish to estimate the information-theoretic quantities: entropy, Kullback-Leibler divergence, and mutual information, from sample data. As a reference, the results are compared to closed-form solutions or numerical integrals. We generate samples from distributions of various shapes in dimensions ranging from one to ten. We evaluate the estimators' performance as a function of sample size, distribution characteristics, and chosen hyperparameters. We further compare the required computation time and specific implementation challenges. Notably, k-NN estimation tends to outperform other methods, considering algorithmic implementation, computational efficiency, and estimation accuracy, especially with sufficient data. This study provides valuable insights into the strengths and limitations of the different estimation methods for information-theoretic quantities. It also highlights the significance of considering the characteristics of the data, as well as the targeted information-theoretic quantity when selecting an appropriate estimation technique. These findings will assist scientists and practitioners in choosing the most suitable method, considering their specific application and available data. We have collected the compared estimation methods in a ready-to-use open-source Python 3 toolbox and, thereby, hope to promote the use of information-theoretic quantities by researchers and practitioners to evaluate the information in data and models in various disciplines.
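To make the comparison in this abstract concrete, the following is a minimal sketch of two of the estimator families it names, restricted to one dimension and checked against the closed-form entropy of a Gaussian. It is not taken from the authors' toolbox: the function names (`entropy_histogram`, `entropy_knn_1d`, `digamma_int`, `gaussian_entropy`) are hypothetical, the k-NN variant shown is the standard Kozachenko-Leonenko form, and the bin count and `k` are arbitrary illustrative choices. It assumes all samples are distinct.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def gaussian_entropy(sigma):
    """Closed-form differential entropy of N(mu, sigma^2), in nats."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)


def entropy_histogram(samples, n_bins):
    """Binned estimate of differential entropy (nats), equal-width bins."""
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in samples:
        # The sample maximum would index one past the end, so clamp it.
        idx = min(int((x - lo) / width), n_bins - 1)
        counts[idx] += 1
    n = len(samples)
    # -sum p_i * ln(p_i / width); dividing by width turns bin mass into density.
    return -sum((c / n) * math.log(c / (n * width)) for c in counts if c > 0)


def entropy_knn_1d(samples, k=4):
    """Kozachenko-Leonenko k-NN estimate of differential entropy (nats), 1-D.

    Assumes distinct samples (a zero neighbour distance would make log fail)
    and len(samples) > k.
    """
    xs = sorted(samples)
    n = len(xs)
    log_sum = 0.0
    for i, x in enumerate(xs):
        # In sorted 1-D data the k nearest neighbours lie within k positions.
        window = xs[max(0, i - k):i] + xs[i + 1:i + 1 + k]
        r = sorted(abs(x - y) for y in window)[k - 1]  # k-th neighbour distance
        log_sum += math.log(2.0 * r)  # 2r is the length of the 1-D ball
    return digamma_int(n) - digamma_int(k) + log_sum / n


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    print("closed form:", gaussian_entropy(1.0))
    print("histogram  :", entropy_histogram(data, n_bins=30))
    print("k-NN (k=4) :", entropy_knn_1d(data, k=4))
```

Varying the sample size, `n_bins`, and `k` in this sketch gives a small-scale feel for the hyperparameter sensitivity the abstract discusses; it says nothing about the higher-dimensional cases, where the estimators behave quite differently.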

2.
Sci Data ; 11(1): 170, 2024 Feb 05.
Article in English | MEDLINE | ID: mdl-38316782

ABSTRACT

Access to accurate spatio-temporal groundwater level data is crucial for sustainable water management in Chile. Despite this importance, a lack of unified, quality-controlled datasets has hindered large-scale groundwater studies. Our objective was to establish a comprehensive, reliable nationwide groundwater dataset. We curated over 120,000 records from 640 wells, spanning 1970-2021, provided by the General Water Resources Directorate. One notable enhancement to our dataset is the incorporation of elevation data. This addition allows for a more comprehensive estimation of groundwater elevation. Rigorous data quality analysis was executed through a classification scheme applied to raw groundwater level records. This resource is invaluable for researchers, decision-makers, and stakeholders, offering insights into groundwater trends to support informed, sustainable water management. Our study bridges a crucial gap by providing a dependable dataset for expansive studies, aiding water management strategies in Chile.

4.
Entropy (Basel) ; 23(6)2021 Jun 11.
Article in English | MEDLINE | ID: mdl-34208344

ABSTRACT

We develop a simple Quantile Spacing (QS) method for accurate probabilistic estimation of one-dimensional entropy from equiprobable random samples, and compare it with the popular Bin-Counting (BC) and Kernel Density (KD) methods. In contrast to BC, which uses equal-width bins with varying probability mass, the QS method uses estimates of the quantiles that divide the support of the data generating probability density function (pdf) into equal-probability-mass intervals. And, whereas BC and KD each require optimal tuning of a hyper-parameter whose value varies with sample size and shape of the pdf, QS only requires specification of the number of quantiles to be used. Results indicate, for the class of distributions tested, that the optimal number of quantiles is a fixed fraction of the sample size (empirically determined to be ~0.25-0.35), and that this value is relatively insensitive to distributional form or sample size. This provides a clear advantage over BC and KD since hyper-parameter tuning is not required. Further, unlike KD, there is no need to select an appropriate kernel-type, and so QS is applicable to pdfs of arbitrary shape, including those with discontinuous slope and/or magnitude. Bootstrapping is used to approximate the sampling variability distribution of the resulting entropy estimate, and is shown to accurately reflect the true uncertainty. For the four distributional forms studied (Gaussian, Log-Normal, Exponential and Bimodal Gaussian Mixture), expected estimation bias is less than 1% and uncertainty is low even for samples of as few as 100 data points; in contrast, for KD the small sample bias can be as large as -10% and for BC as large as -50%. We speculate that estimating quantile locations, rather than bin-probabilities, results in more efficient use of the information in the data to approximate the underlying shape of an unknown data generating pdf.
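The contrast drawn here between equal-width bins and equal-probability-mass intervals can be sketched as follows. This is a simplified plug-in estimate built on sample quantiles, not the authors' QS algorithm: the function name `entropy_equal_mass` and the default `fraction=0.3` (picked from the range quoted in the abstract) are assumptions, and this naive version is not expected to reach the accuracy reported for QS. It assumes distinct samples, so every interval has positive width.

```python
import math
import random
import statistics


def entropy_equal_mass(samples, fraction=0.3):
    """Entropy estimate (nats) from intervals of equal probability mass.

    The number of intervals is a fixed fraction of the sample size. Each
    interval is assigned mass 1/n_q, so its density is (1/n_q) / width.
    """
    n_q = max(2, round(fraction * len(samples)))
    # n_q - 1 interior cut points, linearly interpolated between order statistics.
    cuts = statistics.quantiles(samples, n=n_q, method="inclusive")
    edges = [min(samples)] + cuts + [max(samples)]
    mass = 1.0 / n_q
    # -sum over intervals of mass * ln(density)
    return -sum(mass * math.log(mass / (b - a)) for a, b in zip(edges, edges[1:]))


if __name__ == "__main__":
    rng = random.Random(1)
    gauss = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    expo = [rng.expovariate(1.0) for _ in range(1000)]
    # Closed forms: 0.5 * ln(2*pi*e) for N(0, 1) and 1 - ln(rate) for Exponential.
    print("Gaussian    estimate:", entropy_equal_mass(gauss),
          "closed form:", 0.5 * math.log(2.0 * math.pi * math.e))
    print("Exponential estimate:", entropy_equal_mass(expo),
          "closed form:", 1.0)
```

Because only the interval widths enter the sum, no kernel type or bin width has to be chosen, which is the practical advantage the abstract attributes to working with quantile locations.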

5.
Sci Total Environ ; 722: 137935, 2020 Jun 20.
Article in English | MEDLINE | ID: mdl-32208275

ABSTRACT

Floods driven by precipitation extremes, which make up an important proportion of streamflow but cause severe adverse impacts, have prompted the progressive implementation of ecological restoration (ER) strategies in the Loess Plateau (LP) of China. Knowledge of the linkage between climate variables (especially precipitation extremes) and streamflow generation has become more essential for advanced catchment management, as ER and climate variability have resulted in reduced streamflow and freshwater stress. Here, a partial least squares regression (PLSR) approach was used to investigate this issue in 16 main catchments of the LP over a reference period (1961-1979). Then, we quantified streamflow decline during the "Integrated Soil and Water Conservation" (1980-1999) and the "Grain for Green" (2000-2015) strategies by PLSR modeling. We found that the dominant climatic variables controlling annual streamflow include heavy precipitation amount and heavy precipitation days, maximum precipitation event amount, number of consecutive wet days, annual total precipitation (daily precipitation ≥1 mm), and effective precipitation amount (daily precipitation ≥5 mm). Further, the effect of precipitation extremes on streamflow generation is stronger in drier catchments. The impacts of precipitation extremes on streamflow generation can be strengthened by agricultural cultivation and weakened by revegetation (especially reforestation). Overall, we found that climate-driven annual streamflow decreased by 7.5 mm during 1980-1999 and by 5.6 mm during 2000-2015, in comparison to 1961-1979. The dominant cause of streamflow reduction was ER, with the contribution increasing from 59% in 1980-1999 to 82% in 2000-2015. The PLSR approach enables the identification of linkages between climate variables and streamflow generation, and the prediction of climate-driven streamflow. This study yields a greater understanding of the influences of climate variability and ER on streamflow change, and is helpful for identifying hydroclimatological trends and projections.
