1.
Entropy (Basel) ; 26(5)2024 Apr 28.
Article in English | MEDLINE | ID: mdl-38785618

ABSTRACT

This paper presents a comparative study of entropy estimation in a large-alphabet regime. A variety of entropy estimators have been proposed over the years, where each estimator is designed for a different setup with its own strengths and caveats. As a consequence, no estimator is known to be universally better than the others. This work addresses this gap by comparing twenty-one entropy estimators in the studied regime, starting with the simplest plug-in estimator and leading up to the most recent neural network-based and polynomial approximate estimators. Our findings show that the estimators' performance highly depends on the underlying distribution. Specifically, we distinguish between three types of distributions, ranging from uniform to degenerate distributions. For each class of distribution, we recommend the most suitable estimator. Further, we propose a sample-dependent approach, which again considers three classes of distribution, and report the top-performing estimators in each class. This approach provides a data-dependent framework for choosing the desired estimator in practical setups.
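
As an illustration of the simplest estimator named in the abstract, the plug-in estimator can be sketched as follows. This is a minimal sketch, not the paper's code: the function names are hypothetical, and the Miller-Madow bias correction is shown only as one familiar refinement, with no claim that it matches the paper's implementation of any of its twenty-one estimators.

```python
import math
from collections import Counter


def plugin_entropy(samples):
    """Plug-in (maximum-likelihood) entropy estimate in bits.

    Replaces the unknown probabilities with empirical frequencies.
    Assumes `samples` is a non-empty sequence of hashable symbols.
    """
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def miller_madow_entropy(samples):
    """Plug-in estimate plus the Miller-Madow correction (K - 1) / (2n).

    K is the number of distinct symbols observed; the correction is
    stated in nats, so it is divided by ln 2 to convert it to bits.
    """
    n = len(samples)
    k = len(set(samples))
    return plugin_entropy(samples) + (k - 1) / (2 * n * math.log(2))


# Small demonstration on a four-symbol sample.
sample = list("aabb")
h_plugin = plugin_entropy(sample)
h_mm = miller_madow_entropy(sample)
```

The plug-in estimate tends to underestimate entropy when the sample is small relative to the alphabet, which is the regime the abstract studies; the correction term grows with the number of observed symbols and shrinks with the sample size.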

2.
Sci Rep ; 14(1): 8089, 2024 04 06.
Article in English | MEDLINE | ID: mdl-38582940

ABSTRACT

Current global COVID-19 booster scheduling strategies mainly focus on vaccinating high-risk populations at predetermined intervals. However, these strategies overlook key data: the direct insights into individual immunity levels from active serological testing and the indirect information available either through sample-based sero-surveillance, or vital demographic, location, and epidemiological factors. Our research, employing an age-, risk-, and region-structured mathematical model of disease transmission based on COVID-19 incidence and vaccination data from Israel between 15 May 2020 and 25 October 2021, reveals that a more comprehensive strategy integrating these elements can significantly reduce COVID-19 hospitalizations without increasing existing booster coverage. Notably, the effective use of indirect information alone can considerably decrease COVID-19 cases and hospitalizations, without the need for additional vaccine doses. This approach may also be applicable in optimizing vaccination strategies for other infectious diseases, including influenza.


Subject(s)
COVID-19 , Influenza Vaccines , Humans , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19 Vaccines , Vaccination , Hospitalization
3.
Entropy (Basel) ; 26(4)2024 Apr 05.
Article in English | MEDLINE | ID: mdl-38667872

ABSTRACT

The paper addresses the problem of distinguishing the leading agents in the group. The problem is considered in the framework of classification problems, where the agents in the group select the items with respect to certain properties. The suggested method of distinguishing the leading agents utilizes the connectivity between the agents and the Rokhlin distance between the subgroups of the agents. The method is illustrated by numerical examples. The method can be useful in considering the division of labor in swarm dynamics and in the analysis of the data fusion in the tasks based on the wisdom of the crowd techniques.

4.
Article in English | MEDLINE | ID: mdl-36155469

ABSTRACT

Estimating the entropy of a discrete random variable is a fundamental problem in information theory and related fields. This problem has many applications in various domains, including machine learning, statistics, and data compression. Over the years, a variety of estimation schemes have been suggested. However, despite significant progress, most methods still struggle when the sample is small compared to the variable's alphabet size. In this work, we introduce a practical solution to this problem, which extends the work of McAllester and Stratos. The proposed scheme uses the generalization abilities of cross-entropy estimation in deep neural networks (DNNs) to improve entropy estimation accuracy. Furthermore, we introduce a family of estimators for related information-theoretic measures, such as conditional entropy and mutual information (MI). We show that these estimators are strongly consistent and demonstrate their performance in a variety of use cases. First, we consider large alphabet entropy estimation. Then, we extend the scope to MI estimation. Next, we apply the proposed scheme to conditional MI estimation, as we focus on independence testing tasks. Finally, we study a transfer entropy (TE) estimation problem. The proposed estimators demonstrate improved performance compared to existing methods in all of these setups.

5.
Entropy (Basel) ; 24(8)2022 Aug 22.
Article in English | MEDLINE | ID: mdl-36010832

ABSTRACT

This paper addresses the problem of detecting multiple static and mobile targets by an autonomous mobile agent acting under uncertainty. It is assumed that the agent is able to detect targets at different distances and that the detection includes errors of the first and second types. The goal of the agent is to plan and follow a trajectory that results in the detection of the targets in a minimal time. The suggested solution implements the approach of deep Q-learning applied to maximize the cumulative information gain regarding the targets' locations and minimize the trajectory length on the map with a predefined detection probability. The Q-learning process is based on a neural network that receives the agent location and current probability map and results in the preferred move of the agent. The presented procedure is compared with the previously developed techniques of sequential decision making, and it is demonstrated that the suggested novel algorithm strongly outperforms the existing methods.

6.
PLoS One ; 16(7): e0253865, 2021.
Article in English | MEDLINE | ID: mdl-34283839

ABSTRACT

BACKGROUND: Contact mixing plays a key role in the spread of COVID-19. Thus, mobility restrictions of varying degrees up to and including nationwide lockdowns have been implemented in over 200 countries. To appropriately target the timing, location, and severity of measures intended to encourage social distancing at a country level, it is essential to predict when and where outbreaks will occur, and how widespread they will be. METHODS: We analyze aggregated, anonymized health data and cell phone mobility data from Israel. We develop predictive models for daily new cases and the test positivity rate over the next 7 days for different geographic regions in Israel. We evaluate model goodness of fit using root mean squared error (RMSE). We use these predictions in a five-tier categorization scheme to predict the severity of COVID-19 in each region over the next week. We measure magnitude accuracy (MA), the extent to which the correct severity tier is predicted. RESULTS: Models using mobility data outperformed models that did not use mobility data, reducing RMSE by 17.3% when predicting new cases and by 10.2% when predicting the test positivity rate. The best set of predictors for new cases consisted of a 1-day lag of the past 7-day average of new cases, along with a measure of internal movement within a region. The best set of predictors for the test positivity rate consisted of a 3-day lag of the past 7-day average test positivity rate, along with the same measure of internal movement. Using these predictors, RMSE was 4.812 cases per 100,000 people when predicting new cases and 0.79% when predicting the test positivity rate. MA in predicting new cases was 0.775, and accuracy of prediction to within one tier was 1.0. MA in predicting the test positivity rate was 0.820, and accuracy to within one tier was 0.998. CONCLUSIONS: Using anonymized, macro-level human mobility data along with health data aids predictions of when and where COVID-19 outbreaks are likely to occur. Our method provides a useful tool for government decision makers, particularly in the post-vaccination era, when focused interventions are needed to contain COVID-19 outbreaks while mitigating the collateral damage from more global restrictions.


Subject(s)
COVID-19/diagnosis , COVID-19/epidemiology , Communicable Disease Control/methods , Humans , Israel
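
The two evaluation measures named in the abstract above can be sketched as follows. This is a hedged illustration only: the function names are hypothetical, and it assumes that magnitude accuracy is the fraction of regions whose predicted severity tier exactly matches the actual tier, which the abstract does not spell out. The `tolerance` parameter covers the "within one tier" variant.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def tier_accuracy(predicted_tiers, actual_tiers, tolerance=0):
    """Fraction of predictions within `tolerance` tiers of the truth.

    tolerance=0 counts exact tier matches (assumed here to correspond
    to magnitude accuracy); tolerance=1 counts predictions that are off
    by at most one tier.
    """
    hits = sum(
        abs(p - a) <= tolerance for p, a in zip(predicted_tiers, actual_tiers)
    )
    return hits / len(actual_tiers)


# Illustrative values only; these are not data from the study.
predicted = [1, 2, 3, 5]
actual = [1, 2, 4, 3]
exact = tier_accuracy(predicted, actual)
within_one = tier_accuracy(predicted, actual, tolerance=1)
```

Reporting both the exact and the within-one-tier figure, as the abstract does, separates near misses from predictions that land in a clearly wrong severity tier.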
7.
Sensors (Basel) ; 21(12)2021 Jun 21.
Article in English | MEDLINE | ID: mdl-34205774

ABSTRACT

Wireless body area networks (WBANs) have strong potential in the field of health monitoring. However, the energy consumption required for accurate monitoring determines the time between battery charges of the wearable sensors, which is a key performance factor (and can be critical in the case of implantable devices). In this paper, we study the inherent trade-off between the power consumption of the sensors and the probability of misclassifying a patient's health state. We formulate this trade-off as a dynamic problem, in which at each step, we can choose to activate a subset of sensors that provide noisy measurements of the patient's health state. We assume that the (unknown) health state follows a Markov chain, so our problem is formulated as a partially observable Markov decision problem (POMDP). We show that all the past measurements can be summarized as a belief state on the true health state of the patient, which allows tackling the POMDP problem as an MDP on the belief state. Then, we empirically study the performance of a greedy one-step look-ahead policy compared to the optimal policy obtained by solving the dynamic program. For that purpose, we use an open-source Continuous Glucose Monitoring (CGM) dataset of 232 patients over six months and extract the transition matrix and sensor accuracies from the data. We find that the greedy policy saves ≈50% of the energy costs while reducing the misclassification costs by less than 2% compared to the most accurate policy possible that always activates all sensors. Our sensitivity analysis reveals that the greedy policy remains nearly optimal across different cost parameters and a varying number of sensors. The results also have practical importance, because while the optimal policy is too complicated, a greedy one-step look-ahead policy can be easily implemented in WBAN systems.


Subject(s)
Blood Glucose Self-Monitoring , Wireless Technology , Algorithms , Blood Glucose , Humans , Policy
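
The belief-state idea in the abstract above (summarizing all past measurements as a probability distribution over the hidden health state) can be sketched with a standard Bayesian filter. This is a minimal sketch under stated assumptions: the two-state chain, the transition matrix, and the sensor likelihoods below are made-up illustrative values, and the function names are hypothetical rather than taken from the paper.

```python
def predict(belief, transition):
    """Propagate the belief one step through the Markov chain.

    new_belief[j] = sum_i belief[i] * transition[i][j]
    """
    n = len(belief)
    return [
        sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)
    ]


def update(belief, likelihood):
    """Bayes update with a noisy measurement.

    likelihood[s] is the probability of the observed reading given
    that the true state is s. The result is normalized to sum to 1.
    """
    unnormalized = [b * l for b, l in zip(belief, likelihood)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("observation has zero probability under the belief")
    return [u / total for u in unnormalized]


# Hypothetical two-state example: state 0 = normal, state 1 = abnormal.
transition = [[0.9, 0.1], [0.2, 0.8]]
prior = [0.5, 0.5]
predicted = predict(prior, transition)
# Assumed likelihood of an "abnormal" sensor reading in each state.
posterior = update(predicted, [0.2, 0.7])
```

A greedy one-step look-ahead policy of the kind the abstract describes would evaluate, for each candidate subset of sensors, the expected cost after one such predict-and-update step and pick the cheapest; that selection loop is omitted here.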
8.
Entropy (Basel) ; 23(2)2021 Feb 17.
Article in English | MEDLINE | ID: mdl-33671301

ABSTRACT

The history of information theory, as a mathematical principle for analyzing data transmission and information communication, was formalized in 1948 with the publication of Claude Shannon's famous paper "A Mathematical Theory of Communication" [...].

9.
Entropy (Basel) ; 22(5)2020 Apr 30.
Article in English | MEDLINE | ID: mdl-33286284

ABSTRACT

The paper considers the detection of multiple targets by a group of mobile robots that perform under uncertainty. The agents are equipped with sensors with positive and non-negligible probabilities of detecting the targets at different distances. The goal is to define the trajectories of the agents that can lead to the detection of the targets in minimal time. The suggested solution follows the classical Koopman's approach applied to an occupancy grid, while the decision-making and control schemes are conducted based on information-theoretic criteria. Sensor fusion in each agent and over the agents is implemented using a general Bayesian scheme. The presented procedures follow the expected information gain approach utilizing the "center of view" and the "center of gravity" algorithms. These methods are compared with a simulated learning method. The activity of the procedures is analyzed using numerical simulations.

10.
Entropy (Basel) ; 22(8)2020 Aug 18.
Article in English | MEDLINE | ID: mdl-33286674

ABSTRACT

Projects are rarely executed exactly as planned. Often, the actual duration of a project's activities differs from the planned duration, resulting in costs stemming from the inaccurate estimation of the activity's completion date. While monitoring a project at various inspection points is costly, it can lead to a better estimation of the project completion time, hence saving costs. Nonetheless, identifying the optimal inspection points is a difficult task, as it requires evaluating a large number of the project's path options, even for small-scale projects. This paper proposes an analytical method for identifying the optimal project inspection points by using information theory measures. We search for monitoring (inspection) points that can maximize the information about the project's estimated duration or completion time. The proposed methodology is based on a simulation-optimization scheme using a Monte Carlo engine that simulates potential activities' durations. An exhaustive search is performed of all possible monitoring points to find those with the highest expected information gain on the project duration. The proposed algorithm's complexity is little affected by the number of activities, and the algorithm can address large projects with hundreds or thousands of activities. Numerical experimentation and an analysis of various parameters are presented.

11.
Decis Support Syst ; 134: 113290, 2020 Jul.
Article in English | MEDLINE | ID: mdl-32501316

ABSTRACT

In this paper, we propose a comprehensive analytics framework that can serve as a decision support tool for HR recruiters in real-world settings in order to improve hiring and placement decisions. The proposed framework follows two main phases: a local prediction scheme for recruitments' success at the level of a single job placement, and a mathematical model that provides a global recruitment optimization scheme for the organization, taking into account multilevel considerations. In the first phase, a key property of the proposed prediction approach is the interpretability of the machine learning (ML) model, which in this case is obtained by applying the Variable-Order Bayesian Network (VOBN) model to the recruitment data. Specifically, we used a uniquely large dataset that contains recruitment records of hundreds of thousands of employees over a decade and represents a wide range of heterogeneous populations. Our analysis shows that the VOBN model can provide both high accuracy and interpretability insights to HR professionals. Moreover, we show that using the interpretable VOBN can lead to unexpected and sometimes counter-intuitive insights that might otherwise be overlooked by recruiters who rely on conventional methods. We demonstrate that it is feasible to predict the successful placement of a candidate in a specific position at a pre-hire stage and utilize predictions to devise a global optimization model. Our results show that in comparison to actual recruitment decisions, the devised framework is capable of providing a balanced recruitment plan while improving both diversity and recruitment success rates, despite the inherent trade-off between the two.

12.
Entropy (Basel) ; 21(7)2019 Jun 29.
Article in English | MEDLINE | ID: mdl-33267359

ABSTRACT

We propose a new algorithm called the context-based predictive information (CBPI) for estimating the predictive information (PI) between time series, by utilizing a lossy compression algorithm. The advantage of this approach over existing methods lies in sparse predictive information (SPI) conditions, where the ratio of informative to uninformative sequences is small. It is shown that the CBPI achieves a better PI estimation than benchmark methods by ignoring uninformative sequences while improving explainability by identifying the informative sequences. We also provide an implementation of the CBPI algorithm on a real dataset of large banks' stock prices in the U.S. In the last part of this paper, we show how the CBPI algorithm is related to the well-known information bottleneck in its deterministic version.

13.
J Bioinform Comput Biol ; 5(2B): 561-77, 2007 Apr.
Article in English | MEDLINE | ID: mdl-17636862

ABSTRACT

Variable order Markov models and variable order Bayesian trees have been proposed for the recognition of cis-regulatory elements, and it has been demonstrated that they outperform traditional models such as position weight matrices, Markov models, and Bayesian trees for the recognition of binding sites in prokaryotes. Here, we study to what degree variable order models can improve the recognition of eukaryotic cis-regulatory elements. We find that variable order models can improve the recognition of binding sites of all the studied transcription factors. To ease a systematic evaluation of different model combinations based on problem-specific data sets and allow genomic scans of cis-regulatory elements based on fixed and variable order Markov models and Bayesian trees, we provide the VOMBAT server to the public community.


Subject(s)
Algorithms , Chromosome Mapping/methods , Models, Genetic , Regulatory Elements, Transcriptional/genetics , Sequence Analysis, DNA/methods , Software , Transcription Factors/genetics , Bayes Theorem , Computer Simulation , Markov Chains , Models, Statistical , Pattern Recognition, Automated/methods
14.
BMC Bioinformatics ; 8: 111, 2007 Mar 30.
Article in English | MEDLINE | ID: mdl-17397530

ABSTRACT

BACKGROUND: The definition of a distance measure plays a key role in the evaluation of different clustering solutions of gene expression profiles. In this empirical study we compare different clustering solutions when using the Mutual Information (MI) measure versus the use of the well-known Euclidean distance and Pearson correlation coefficient. RESULTS: Relying on several public gene expression datasets, we evaluate the homogeneity and separation scores of different clustering solutions. It was found that the use of the MI measure yields a more significant differentiation among erroneous clustering solutions. The proposed measure was also used to analyze the performance of several known clustering algorithms. A comparative study of these algorithms reveals that their "best solutions" are ranked almost oppositely when using different distance measures, despite the correspondence found between these measures when analyzing the averaged scores of groups of solutions. CONCLUSION: In view of the results, further attention should be paid to the selection of a proper distance measure for analyzing the clustering of gene expression data.


Subject(s)
Algorithms , Artificial Intelligence , Cluster Analysis , Gene Expression Profiling/methods , Multigene Family/physiology , Oligonucleotide Array Sequence Analysis/methods
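
For readers unfamiliar with the Mutual Information measure compared in the abstract above, a plug-in MI computation between two discrete sequences can be sketched as follows. This is an illustrative sketch, not the study's procedure: it assumes the expression profiles have already been discretized into a small number of levels (the abstract does not say how), and the function name is hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in mutual information (in bits) of two paired sequences.

    I(X;Y) = sum over (x, y) of p(x,y) * log2( p(x,y) / (p(x) p(y)) ),
    with all probabilities replaced by empirical frequencies.
    """
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # c/n is p(x,y); c*n / (count_x * count_y) equals p(x,y)/(p(x)p(y)).
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


# Two identical binary profiles share one full bit of information;
# two independent-looking profiles share none.
identical = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
unrelated = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

Unlike Euclidean distance or Pearson correlation, this quantity is a similarity rather than a distance and captures non-linear dependence, which is one reason rankings of clustering solutions can differ between measures.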
15.
Nucleic Acids Res ; 34(Web Server issue): W529-33, 2006 Jul 01.
Article in English | MEDLINE | ID: mdl-16845064

ABSTRACT

Variable order Markov models and variable order Bayesian trees have been proposed for the recognition of transcription factor binding sites, and it could be demonstrated that they outperform traditional models, such as position weight matrices, Markov models and Bayesian trees. We develop a web server for the recognition of DNA binding sites based on variable order Markov models and variable order Bayesian trees offering the following functionality: (i) given datasets with annotated binding sites and genomic background sequences, variable order Markov models and variable order Bayesian trees can be trained; (ii) given a set of trained models, putative DNA binding sites can be predicted in a given set of genomic sequences and (iii) given a dataset with annotated binding sites and a dataset with genomic background sequences, cross-validation experiments for different model combinations with different parameter settings can be performed. Several of the offered services are computationally demanding, such as genome-wide predictions of DNA binding sites in mammalian genomes or sets of 10^4-fold cross-validation experiments for different model combinations based on problem-specific data sets. In order to execute these jobs, and in order to serve multiple users at the same time, the web server is attached to a Linux cluster with 150 processors. VOMBAT is available at http://pdw-24.ipk-gatersleben.de:8080/VOMBAT/.


Subject(s)
Genomics/methods , Regulatory Elements, Transcriptional , Software , Transcription Factors/metabolism , Algorithms , Bayes Theorem , Binding Sites , Internet , Markov Chains , User-Computer Interface