Search | Global Index Medicus

Opinion: Strategy of Semi-Automatically Annotating a Full-Text Corpus of Genomics & Informatics

Hyun-Seok PARK.

Genomics & Informatics ; : e40, 2018.

Article in English | WPRIM | ID: wpr-739673

ABSTRACT

The biomedical text mining community needs an annotated corpus consisting of the full texts of biomedical journal articles. In response, a prototype version of the full-text corpus of Genomics & Informatics, called GNI version 1.0, has recently been published, with 499 annotated full-text articles available as a corpus resource. However, GNI needs to be updated, because the texts were shallow-parsed and annotated with several existing parsers. I list the issues associated with upgrading the annotations and give an opinion on the methodology for developing the next version of the GNI corpus, based on a semi-automatic strategy for linguistically richer corpus annotation.

Subject(s)

Genomics, Informatics

GNI Corpus Version 1.0: Annotated Full-Text Corpus of Genomics & Informatics to Support Biomedical Information Extraction

So-Yeon OH; Ji-Hyeon KIM; Seo-Jin KIM; Hee-Jo NAM; Hyun-Seok PARK.

Genomics & Informatics ; : 75-77, 2018.

Article in English | WPRIM | ID: wpr-716819

ABSTRACT

Genomics & Informatics (NLM title abbreviation: Genomics Inform) is the official journal of the Korea Genome Organization. A text corpus for this journal, annotated with various levels of linguistic information, would be a valuable resource, as information extraction requires syntactic, semantic, and higher levels of natural language processing. In this study, we publish our new corpus, called GNI Corpus version 1.0, extracted and annotated from the full texts of Genomics & Informatics using an NLTK (Natural Language Toolkit)-based text mining script. This preliminary version of the corpus could be used as a training and testing set for a system that serves a variety of functions in future biomedical text mining.

Subject(s)

Data Mining, Genome, Genomics, Informatics, Information Storage and Retrieval, Korea, Linguistics, Natural Language Processing, Semantics
