Search | BVS Regional Portal

CafeteriaSA corpus: scientific abstracts annotated across different food semantic resources.

Cenikj, Gjorgjina; Valencic, Eva; Ispirova, Gordana; Ogrinc, Matevz; Stojanov, Riste; Korosec, Peter; Cavalli, Ermanno; Seljak, Barbara Korousic; Eftimov, Tome.

Database (Oxford) ; 2022, 2022 Dec 16.

Article in English | MEDLINE | ID: mdl-36526439

ABSTRACT

In recent decades, a great deal of work has been done on predictive modeling of issues related to human and environmental health. Resolving healthcare-related issues is made possible by several biomedical vocabularies and standards, which play a crucial role in understanding health information, together with a large amount of health-related data. However, despite the many resources available and the work done in the health and environmental domains, the food and nutrition domain lacks semantic resources, as well as interconnections between them. To address this, in the European Food Safety Authority-funded project CAFETERIA, we have developed the first annotated corpus of 500 scientific abstracts, which contains 6407 food entities annotated with regard to the Hansard taxonomy, 4299 with regard to FoodOn and 3623 with regard to SNOMED-CT. The CafeteriaSA corpus will enable the further development of natural language processing methods for extracting food information from scientific text. Database URL: https://zenodo.org/record/6683798#.Y49wIezMJJF.

Subjects

Natural Language Processing , Semantics , Humans , Information Storage and Retrieval , Databases, Factual

CafeteriaFCD Corpus: Food Consumption Data Annotated with Regard to Different Food Semantic Resources.

Ispirova, Gordana; Cenikj, Gjorgjina; Ogrinc, Matevz; Valencic, Eva; Stojanov, Riste; Korosec, Peter; Cavalli, Ermanno; Korousic Seljak, Barbara; Eftimov, Tome.

Foods ; 11(17), 2022 Sep 02.

Article in English | MEDLINE | ID: mdl-36076868

ABSTRACT

Despite the numerous studies in the last decade involving food and nutrition data, this domain remains low-resourced. Annotated corpora are very useful tools for researchers and domain experts, as well as for data scientists who analyze the data. In this paper, we present the process of annotating food consumption data (recipes) with semantic tags from different semantic resources: the Hansard taxonomy, the FoodOn ontology, the SNOMED CT terminology and the FoodEx2 classification system. FoodBase is an annotated corpus of food entities (recipes) that includes a curated version of 1000 instances, considered a gold standard. In this study, we use the curated version of FoodBase and two different annotation approaches: the NCBO annotator (for the FoodOn and SNOMED CT annotations) and the semi-automatic StandFood method (for the FoodEx2 annotations). The end result is a new version of the FoodBase gold standard, called the CafeteriaFCD (Cafeteria Food Consumption Data) corpus. This corpus contains food consumption data (recipes) annotated with semantic tags from the aforementioned four external semantic resources. With these annotations, data interoperability is achieved between five semantic resources from different domains. This resource can be further utilized for developing and training different information extraction pipelines using state-of-the-art NLP approaches for tracing knowledge about food safety applications.
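
As a rough illustration of what a recipe annotated against several semantic resources could look like in code, the following Python sketch models one food entity with optional tags per resource and counts how many entities each resource covers. It is not the corpus's actual file format or API: the `FoodEntity` class, the `coverage` helper, the example sentence and all tag values are hypothetical placeholders introduced only for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The four external resources named in the abstract.
RESOURCES = ("Hansard", "FoodOn", "SNOMED CT", "FoodEx2")


@dataclass
class FoodEntity:
    """One annotated food mention (hypothetical layout, not the corpus schema)."""
    text: str
    start: int  # character offset where the mention begins
    end: int    # character offset just past the mention
    # Maps a resource name to the tags assigned from that resource.
    tags: Dict[str, List[str]] = field(default_factory=dict)


def coverage(entities: List[FoodEntity]) -> Dict[str, int]:
    """Count, per resource, how many entities carry at least one tag."""
    return {r: sum(1 for e in entities if e.tags.get(r)) for r in RESOURCES}


# Made-up recipe sentence; every tag value below is a placeholder, not a real code.
recipe = "Mix the flour with milk."
entities = [
    FoodEntity("flour", 8, 13, {
        "Hansard": ["<hansard-tag>"],
        "FoodOn": ["<foodon-id>"],
    }),
    FoodEntity("milk", 19, 23, {
        "Hansard": ["<hansard-tag>"],
        "FoodOn": ["<foodon-id>"],
        "SNOMED CT": ["<snomed-id>"],
        "FoodEx2": ["<foodex2-code>"],
    }),
]

if __name__ == "__main__":
    for e in entities:
        # Offsets should point at the mention inside the recipe text.
        assert recipe[e.start:e.end] == e.text
    print(coverage(entities))
```

Keeping the tags in a per-resource mapping reflects the fact that not every entity is matched in every resource, which is consistent with the differing entity counts per resource reported for the CafeteriaSA corpus above.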
