Search | BVS Regional Portal

Dataset: Annotated soybean market news articles.

Reis Filho, Ivan José Dos; Coleti, Jamille de Campos; Marcacini, Ricardo Marcondes; Rezende, Solange Oliveira.

Data Brief ; 55: 110545, 2024 Aug.

Article in English | MEDLINE | ID: mdl-38952954

ABSTRACT

This dataset is a collection of soybean market news articles gathered by web scraping a Brazilian website. The articles span January 2015 to June 2023 and have been labeled as relevant or non-relevant. The labeling was conducted under the guidance of an agricultural economics expert, who collaborated with a group of nine individuals. Ten parameters were considered to assist participants in the labeling process. The dataset comprises approximately 11,000 news articles and serves as a resource for researchers interested in exploring trends in the soybean market. It can be used for tasks such as classification and natural language processing. It provides insights into labeled soybean market news and supports open science initiatives, facilitating further analysis within the research community.

On the enrichment of time series with textual data for forecasting agricultural commodity prices.

Reis Filho, Ivan José; Marcacini, Ricardo Marcondes; Rezende, Solange Oliveira.

MethodsX ; 9: 101758, 2022.

Article in English | MEDLINE | ID: mdl-35782724

ABSTRACT

Forecasting models in the financial market generally use quantitative time-series data. However, external factors such as weather events, economic crises, and the foreign exchange market can influence time-series data. This information is not explicit in the time series and can affect the prediction of the variable values. Textual data can be a source of knowledge about external factors and is potentially helpful for time-series forecasting models. Some studies have presented text mining techniques to combine textual and time-series data. However, the existing representations have limitations, such as the curse of dimensionality and sparse data. This work investigates the use of a finite set of domain-specific terms to address these problems by representing textual data in a low-dimensional space. We consider thirty-three keywords that are potentially important in the domain to enrich time series using text mining techniques. Four regression models were applied to the proposed representation to predict the future daily price of corn and soybeans. The experimental setup considers a real market scenario, in which a daily sliding window strategy and step-forward forecast were used. The proposed representation has better accuracy in some forecasting scenarios. The results indicate that text data are a promising alternative for enriching time-series representations and reducing uncertainty in forecasting models.

• We show an approach to enriching time series using domain-specific terms;
• The proposed representation combines quantitative data with qualitative market factors;
• Regression models learn a forecasting function from the enriched time series.
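The enrichment idea described in this abstract can be illustrated with a minimal sketch. It assumes a tiny hypothetical keyword list (the paper's thirty-three terms are not listed here), and the function names `keyword_counts` and `enriched_windows` are illustrative, not taken from the paper. Each training sample joins a window of lagged prices with keyword counts from that window's last day of news, and the target is the next day's price.

```python
import re
from collections import Counter

# Hypothetical keywords for illustration only; the paper's
# thirty-three domain terms are not given in the abstract.
KEYWORDS = ["drought", "export", "dollar"]


def keyword_counts(text, keywords=KEYWORDS):
    """Return how often each keyword occurs in one day's news text."""
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    return [tokens[k] for k in keywords]


def enriched_windows(prices, daily_texts, window=3):
    """Build (features, target) pairs with a daily sliding window.

    Features for day t: the `window` previous prices followed by the
    keyword counts of the news from day t - 1.
    Target: the price on day t (a one-step-ahead forecast).
    """
    samples = []
    for t in range(window, len(prices)):
        lagged = list(prices[t - window:t])
        text_features = keyword_counts(daily_texts[t - 1])
        samples.append((lagged + text_features, prices[t]))
    return samples


prices = [10.0, 11.0, 12.0, 13.0, 14.0]
texts = ["", "drought hits", "export export up", "dollar falls", "none"]
samples = enriched_windows(prices, texts, window=3)
```

The resulting feature vectors stay low-dimensional (window size plus number of keywords), and any regression model could be fitted on `samples`; which four regressors the authors used is not stated in the abstract.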

A network-based positive and unlabeled learning approach for fake news detection.

de Souza, Mariana Caravanti; Nogueira, Bruno Magalhães; Rossi, Rafael Geraldeli; Marcacini, Ricardo Marcondes; Dos Santos, Brucce Neves; Rezende, Solange Oliveira.

Mach Learn ; 111(10): 3549-3592, 2022.

Article in English | MEDLINE | ID: mdl-34815619

ABSTRACT

Fake news can spread rapidly among internet users and deceive a large audience. Because of these characteristics, it can have a direct impact on political and economic events. Machine learning approaches have been used to assist fake news identification. However, the spectrum of real news is broad and hard to characterize, and labeling is expensive because of the high update frequency. One-Class Learning (OCL) and Positive and Unlabeled Learning (PUL) therefore emerge as interesting approaches for content-based fake news detection, since they use a smaller set of labeled data than traditional machine learning techniques. In particular, network-based approaches are adequate for fake news detection since they allow information from different aspects of a publication to be incorporated into the problem modeling. In this paper, we propose a network-based approach based on Positive and Unlabeled Learning by Label Propagation (PU-LP), a one-class and transductive semi-supervised learning algorithm that performs classification by first identifying potential interest and non-interest documents in the unlabeled data and then propagating labels to classify the remaining unlabeled documents. We assessed the performance of our proposal considering homogeneous (only documents) and heterogeneous (documents and terms) networks. Our comparative analysis considered four OCL algorithms extensively employed in one-class text classification (k-Means, k-Nearest Neighbors Density-based, One-Class Support Vector Machine, and Dense Autoencoder), and another traditional PUL algorithm (Rocchio Support Vector Machine). The algorithms were evaluated on three news collections, considering balanced and extremely unbalanced scenarios. We used Bag-of-Words and Doc2Vec models to transform news into structured data.
Results indicated that PU-LP approaches are more stable and achieve better results than other PUL and OCL approaches in most scenarios, performing similarly to semi-supervised binary algorithms. Also, including terms in the news network yields better results, especially when news items are distributed in the feature space according to veracity and subject. Representing news with Doc2Vec achieved better results than the Bag-of-Words model for algorithms based on both the vector-space model and the document similarity network.
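The label propagation step mentioned in this abstract can be pictured with a simplified sketch. This is not the PU-LP algorithm itself: it assumes the interest and non-interest seed documents have already been identified, uses a plain neighbour-averaging rule on an unweighted document graph, and the function name `propagate_labels` is hypothetical.

```python
def propagate_labels(adjacency, seeds, iterations=50):
    """Spread seed labels over an undirected graph by neighbour averaging.

    adjacency: dict mapping each node to a list of neighbouring nodes.
    seeds: dict mapping seed nodes to +1.0 (interest) or -1.0 (non-interest).
    Returns a dict mapping every node to "interest" or "non-interest".
    A node whose final score is exactly 0 is reported as "non-interest".
    """
    scores = {node: seeds.get(node, 0.0) for node in adjacency}
    for _ in range(iterations):
        updated = {}
        for node, neighbours in adjacency.items():
            if node in seeds:
                # Seed scores are clamped to their known label.
                updated[node] = seeds[node]
            elif neighbours:
                updated[node] = sum(scores[n] for n in neighbours) / len(neighbours)
            else:
                updated[node] = 0.0
        scores = updated
    return {
        node: "interest" if score > 0 else "non-interest"
        for node, score in scores.items()
    }


# Toy chain of four documents: a - b - c - d
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
labels = propagate_labels(graph, {"a": 1.0, "d": -1.0})
```

In this toy chain, the unlabeled document next to the interest seed ends up labeled as interest and the one next to the non-interest seed as non-interest. A heterogeneous network, as evaluated in the paper, would add term nodes to `adjacency` alongside the document nodes.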

Learning to sense from events via semantic variational autoencoder.

Gôlo, Marcos Paulo Silva; Rossi, Rafael Geraldeli; Marcacini, Ricardo Marcondes.

PLoS One ; 16(12): e0260701, 2021.

Article in English | MEDLINE | ID: mdl-34941880

ABSTRACT

In this paper, we introduce the concept of learning to sense, which aims to emulate a complex characteristic of human reasoning: the ability to monitor and understand a set of interdependent events for decision-making processes. Event datasets are composed of textual data and spatio-temporal features that determine where and when a given phenomenon occurred. In learning to sense, related events are mapped closely to each other in a semantic vector space, thereby identifying that they contain similar contextual meaning. However, learning a semantic vector space that satisfies both textual similarities and spatio-temporal constraints is a crucial challenge for event analysis and sensing. This paper investigates a Semantic Variational Autoencoder (SVAE) to fine-tune pre-trained embeddings according to both textual and spatio-temporal events of the class of interest. Experiments involving more than one hundred sensors show that our SVAE outperforms a competitive one-class classification baseline. Moreover, our proposal provides desirable learning requirements to sense scenarios, such as visualization of the sensor decision function and heat maps with the sensor's geographic impact.

Subjects

Learning/physiology; Life Change Events; Perception/physiology; Stress, Psychological; Algorithms; Humans; Semantics
