Search | VHL Regional Portal

Using full-text content to characterize and identify best seller books: A study of early 20th-century literature.

da Silva, Giovana D; Silva, Filipi N; de Arruda, Henrique F; E Souza, Bárbara C; Costa, Luciano da F; Amancio, Diego R.

PLoS One ; 19(4): e0302070, 2024.

Article in English | MEDLINE | ID: mdl-38669247

ABSTRACT

Artistic pieces can be studied from several perspectives, one example being their reception among readers over time. In the present work, we approach this topic from the standpoint of literary works, assessing in particular the task of predicting whether a book will become a best seller. Unlike previous approaches, we focused on the full content of books and considered visualization and classification tasks. We employed visualization for the preliminary exploration of the data structure and properties, involving SemAxis and linear discriminant analyses. To obtain quantitative and more objective results, we employed various classifiers. These approaches were applied to a dataset containing (i) books published from 1895 to 1923 and listed as best sellers in the Publishers Weekly Bestseller Lists and (ii) literary works published in the same period but not mentioned in those lists. Our comparison of methods revealed that the best result, obtained by combining a bag-of-words representation with a logistic regression classifier, was an average accuracy of 0.75 for both the leave-one-out and 10-fold cross-validations. This outcome highlights the difficulty of predicting the success of books with high accuracy, even using the full content of the texts. Nevertheless, our findings provide insights into the factors leading to the relative success of a literary work.

Subject(s)

Books; Books/history; History, 20th Century; Humans; Literature/history
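
The best-performing setup named in the abstract above (a bag-of-words representation, a logistic regression classifier, leave-one-out cross-validation) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' pipeline: the relative-frequency features, the gradient-descent settings, and the toy texts are all assumptions made for the example.

```python
import math
from collections import Counter


def bag_of_words(text, vocabulary):
    """Relative frequency of each vocabulary word in `text` (assumed feature choice)."""
    counts = Counter(text.lower().split())
    total = sum(counts.values()) or 1
    return [counts[word] / total for word in vocabulary]


def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic regression with plain batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return 1 (e.g. best seller) when the linear score is non-negative, else 0."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if z >= 0 else 0


def leave_one_out_accuracy(texts, labels):
    """Hold out each text once, train on the rest, and average the hits."""
    correct = 0
    for i in range(len(texts)):
        train_texts = texts[:i] + texts[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        # The vocabulary is built from the training fold only.
        vocabulary = sorted({w for t in train_texts for w in t.lower().split()})
        X = [bag_of_words(t, vocabulary) for t in train_texts]
        w, b = train_logistic(X, train_labels)
        correct += predict(w, b, bag_of_words(texts[i], vocabulary)) == labels[i]
    return correct / len(texts)


if __name__ == "__main__":
    # Toy, invented texts; labels: 1 = "best seller", 0 = "other".
    texts = ["love love heart", "war gun war", "heart love home", "gun war ship"]
    labels = [1, 0, 1, 0]
    print(leave_one_out_accuracy(texts, labels))
```

Building the vocabulary inside each fold keeps the held-out text from influencing the features, which is one common way to avoid leakage in this kind of evaluation.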

The dynamics of knowledge acquisition via self-learning in complex networks.

Lima, Thales S; de Arruda, Henrique F; Silva, Filipi N; Comin, Cesar H; Amancio, Diego R; Costa, Luciano da F.

Chaos ; 28(8): 083106, 2018 Aug.

Article in English | MEDLINE | ID: mdl-30180654

ABSTRACT

Studies regarding knowledge organization and acquisition are of great importance to understand areas related to science and technology. A common way to model the relationship between different concepts is through complex networks. In such representations, networks' nodes store knowledge and edges represent their relationships. Several studies that considered this type of structure and knowledge acquisition dynamics employed one or more agents to discover node concepts by walking on the network. In this study, we investigate a different type of dynamics adopting a single node as the "network brain." Such a brain represents a range of real systems such as the information about the environment that is acquired by a person and is stored in the brain. To store the discovered information in a specific node, the agents walk on the network and return to the brain. We propose three different dynamics and test them on several network models and on a real system, which is formed by journal articles and their respective citations. The results revealed that, according to the adopted walking models, the efficiency of self-knowledge acquisition has only a weak dependency on topology and search strategy.
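
The "network brain" idea described above can be illustrated with a small simulation. The abstract does not spell out the three proposed dynamics, so the sketch below uses one hypothetical variant: an agent leaves the brain node, performs a fixed-length uniform random walk, and the nodes it visited are then added to the brain's knowledge (the return trip is not simulated). The ring network, the function names, and all parameters are assumptions for illustration only.

```python
import random


def excursion(graph, brain, steps, rng):
    """Walk `steps` random moves starting at the brain; return the visited nodes."""
    node = brain
    seen = {brain}
    for _ in range(steps):
        node = rng.choice(graph[node])  # move to a uniformly chosen neighbor
        seen.add(node)
    return seen


def learn_network(graph, brain, steps_per_excursion, max_excursions, seed=0):
    """Repeat excursions, storing discoveries at the brain after each one.

    Returns the set of known nodes and the number of known nodes recorded
    before the first excursion and after each subsequent one.
    """
    rng = random.Random(seed)
    known = {brain}
    history = [len(known)]
    for _ in range(max_excursions):
        if len(known) == len(graph):
            break  # every node has been discovered
        known |= excursion(graph, brain, steps_per_excursion, rng)
        history.append(len(known))
    return known, history


if __name__ == "__main__":
    # Hypothetical test network: a ring of 10 nodes stored as adjacency lists.
    ring = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
    known, history = learn_network(ring, brain=0, steps_per_excursion=8, max_excursions=100)
    print(len(known), history)
```

Comparing the `history` curves across different graphs or walk rules is one simple way to probe how strongly acquisition efficiency depends on topology and search strategy.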

Topic segmentation via community detection in complex networks.

de Arruda, Henrique F; Costa, Luciano da F; Amancio, Diego R.

Chaos ; 26(6): 063120, 2016 Jun.

Article in English | MEDLINE | ID: mdl-27368785

ABSTRACT

Many real systems have been modeled in terms of network concepts, and written texts are a particular example of information networks. In recent years, the use of network methods to analyze language has allowed the discovery of several interesting effects, including the proposition of novel models to explain the emergence of fundamental universal patterns. While syntactical networks, one of the most prevalent networked models of written texts, display both scale-free and small-world properties, such a representation fails in capturing other textual features, such as the organization in topics or subjects. We propose a novel network representation whose main purpose is to capture the semantical relationships of words in a simple way. To do so, we link all words co-occurring in the same semantic context, which is defined in a threefold way. We show that the proposed representations favor the emergence of communities of semantically related words, and this feature may be used to identify relevant topics. The proposed methodology to detect topics was applied to segment selected Wikipedia articles. We found that, in general, our methods outperform traditional bag-of-words representations, which suggests that a high-level textual representation may be useful to study the semantical features of texts.

Subject(s)

Computer Simulation; Information Systems; Semantics
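
The general approach in the abstract above (link words that co-occur in the same context, then look for communities of related words) can be sketched as below. Two things are assumptions here rather than the authors' method: a "context" is taken to be a single sentence, whereas the paper defines it in three ways that the abstract does not detail, and communities are found with a simple deterministic label-propagation variant used as a stand-in for whatever community detection algorithm the paper employs.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_graph(contexts):
    """Link every pair of distinct words that appear in the same context."""
    graph = {}
    for context in contexts:
        words = sorted(set(context.lower().split()))
        for word in words:
            graph.setdefault(word, set())  # keep isolated words as nodes
        for a, b in combinations(words, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def label_propagation(graph, max_rounds=100):
    """Group nodes by repeatedly adopting the most common neighbor label.

    Nodes are visited in sorted order and ties go to the smallest label,
    so the result is deterministic.
    """
    labels = {node: node for node in graph}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):
            if not graph[node]:
                continue  # isolated node keeps its own label
            counts = Counter(labels[n] for n in graph[node])
            top = max(counts.values())
            best = min(label for label, c in counts.items() if c == top)
            if labels[node] != best:
                labels[node] = best
                changed = True
        if not changed:
            break
    communities = defaultdict(list)
    for node, label in labels.items():
        communities[label].append(node)
    return sorted(sorted(members) for members in communities.values())


if __name__ == "__main__":
    # Toy, invented sentences standing in for the contexts of a real article.
    sentences = ["cat dog bird", "dog cat bird", "sun moon star"]
    print(label_propagation(cooccurrence_graph(sentences)))
```

On real text, each resulting community would be read as a candidate topic, and a segment could be assigned to the community that contains most of its words.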
