GNI Corpus Version 1.0: Annotated Full-Text Corpus of Genomics & Informatics to Support Biomedical Information Extraction

So-Yeon OH; Ji-Hyeon KIM; Seo-Jin KIM; Hee-Jo NAM; Hyun-Seok PARK

Genomics & Informatics: 75-77, 2018.

Article in English | WPRIM | ID: wpr-716819

Responsible library: WPRO

ABSTRACT

Genomics & Informatics (NLM title abbreviation: Genomics Inform) is the official journal of the Korea Genome Organization. A text corpus for this journal, annotated with various levels of linguistic information, would be a valuable resource, because information extraction requires syntactic, semantic, and higher levels of natural language processing. In this study, we publish our new corpus, GNI Corpus version 1.0, extracted and annotated from the full texts of Genomics & Informatics using an NLTK (Natural Language Toolkit)-based text mining script. This preliminary version of the corpus could be used as a training and testing set for systems that serve a variety of functions in future biomedical text mining.

Subjects

Data Mining; Genome; Genomics; Informatics; Information Storage and Retrieval; Korea (Geographic); Linguistics; Natural Language Processing; Semantics

Keywords

biomedical text mining; corpus linguistics; text analytics

Full text: 1
Database: WPRIM
Main subject: Semantics / Natural Language Processing / Information Storage and Retrieval / Genome / Genomics / Informatics / Data Mining / Korea (Geographic) / Linguistics
Country/Region as subject: Asia
Language: En
Journal: Genomics & Informatics
Year of publication: 2018
Document type: Article
