ABSTRACT
In settings where discussion topics are not statically assigned, such as microblogs, the topics of a given event need to be identified and separated. We approach the problem with a novel similarity measure computed between the major terms used in posts. The occurrences of these terms are periodically sampled from the post stream. The resulting time series are processed using marker-based stigmergy, i.e., a biologically inspired mechanism that aggregates scalar and temporal information. More precisely, each sample of a series generates a functional structure, called a mark, associated with some concentration. The concentrations disperse in a scalar space and evaporate over time. Multiple deposits from samples that are close in both time and value aggregate into a trail, which persists longer than an isolated mark. To measure the similarity between two time series, the Jaccard similarity coefficient between their trails is calculated. Discussion topics are derived from this similarity measure through clustering with Self-Organizing Maps and are represented as a colored term cloud. Structural parameters are tuned via an adaptation mechanism based on Differential Evolution. Experiments are carried out on a real-world scenario, and the resulting similarity is compared with Dynamic Time Warping (DTW) similarity.
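The mark, evaporation, and trail mechanism described above can be illustrated with a minimal sketch. It is not the authors' implementation. The triangular mark shape, the subtractive evaporation rule, the fuzzy (min/max) form of the Jaccard coefficient, and the names `triangular_mark`, `trail_snapshots`, `jaccard_similarity`, `width`, and `evaporation` are all assumptions introduced here for illustration, and the sample series are made up.

```python
def triangular_mark(center, width, grid):
    """Assumed mark shape: a triangle of peak intensity 1 centered on the sample."""
    return [max(0.0, 1.0 - abs(x - center) / width) for x in grid]


def trail_snapshots(series, grid, width=0.2, evaporation=0.25):
    """Return the trail after each sample: evaporate first, then deposit a mark.

    Evaporation is assumed to subtract a constant amount per step (floored at 0),
    so overlapping deposits accumulate and outlive an isolated mark.
    """
    trail = [0.0] * len(grid)
    snapshots = []
    for sample in series:
        mark = triangular_mark(sample, width, grid)
        trail = [max(0.0, t - evaporation) + m for t, m in zip(trail, mark)]
        snapshots.append(trail)
    return snapshots


def jaccard_similarity(series_a, series_b, grid, **params):
    """Fuzzy Jaccard coefficient: summed pointwise minima over summed maxima."""
    inter = union = 0.0
    for ta, tb in zip(trail_snapshots(series_a, grid, **params),
                      trail_snapshots(series_b, grid, **params)):
        inter += sum(min(a, b) for a, b in zip(ta, tb))
        union += sum(max(a, b) for a, b in zip(ta, tb))
    return inter / union if union > 0 else 1.0


# Hypothetical term-occurrence series, already normalized to [0, 1].
grid = [i / 50 for i in range(51)]
rising = [0.1, 0.2, 0.4, 0.7, 0.9]
similar = [0.1, 0.25, 0.4, 0.65, 0.9]
falling = [0.9, 0.7, 0.4, 0.2, 0.1]

print(jaccard_similarity(rising, similar, grid))
print(jaccard_similarity(rising, falling, grid))
```

Under these assumptions, two series whose samples stay close in time and value produce largely overlapping trails and a coefficient near 1, while series that diverge score lower. In the approach described above, a pairwise similarity of this kind is what the clustering step would consume.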
Subject(s)
Blogging, Cluster Analysis, Social Media, Algorithms, Biomimetics

ABSTRACT
Psychological research has found that human perception of randomness is biased. In particular, people consistently show the overalternating bias: they rate binary sequences of symbols (such as Heads and Tails in coin flipping) with an excess of alternation as more random than prescribed by the normative criterion of Shannon's entropy. Within data mining for medical applications, Marcellin proposed an asymmetric measure of entropy that may be well suited to account for this bias and to quantify subjective randomness. Using the Differential Evolution algorithm, we fitted Marcellin's entropy and Rényi's entropy (a generalized uncertainty measure that includes many other entropies as special cases) to experimental data found in the literature. We observed a better fit for Marcellin's entropy than for Rényi's entropy. The fitted asymmetric entropy measure also showed good predictive properties when applied to different datasets of randomness-related tasks. We concluded that Marcellin's entropy can be a parsimonious and effective measure of subjective randomness that can be useful in psychological research on randomness perception.
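To make the contrast between the measures concrete, the sketch below evaluates binary Shannon, Rényi, and asymmetric entropy at a few points. It is an illustration only, not the fitting procedure of the study. The asymmetric form used, h_w(p) = p(1 - p) / ((1 - 2w)p + w^2), is the expression commonly attributed to Marcellin, Zighed, and Ritschard and should be checked against the original source. Treating `p` as the proportion of alternations and setting `w = 0.6` are assumptions made here, not values taken from the abstract.

```python
import math


def shannon_entropy(p):
    """Binary Shannon entropy in bits; symmetric, with its maximum at p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def renyi_entropy(p, alpha):
    """Binary Renyi entropy of order alpha (alpha > 0); Shannon in the limit alpha -> 1."""
    if alpha == 1:
        return shannon_entropy(p)
    return math.log2(p ** alpha + (1 - p) ** alpha) / (1 - alpha)


def asymmetric_entropy(p, w):
    """Assumed asymmetric entropy: equals 1 at p = w, and 0 at p = 0 and p = 1."""
    return p * (1 - p) / ((1 - 2 * w) * p + w * w)


# Illustrative peak location only; not a fitted parameter from the study.
w = 0.6
for p in (0.4, 0.5, 0.6, 0.7):
    print(p, shannon_entropy(p), renyi_entropy(p, 2), asymmetric_entropy(p, w))
```

With `w` above 0.5, the asymmetric measure peaks at an alternation rate above one half, which is the qualitative shape the overalternating bias calls for. Shannon and Rényi entropy remain symmetric around 0.5 for any order.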