Results 1 - 8 of 8
1.
Article in English | WPRIM | ID: wpr-763804

ABSTRACT

Entity normalization, known as entity linking in the general domain, is an information extraction task that aims to annotate, or bind, words and expressions in raw text to semantic references, such as the concepts of an ontology. An ontology consists minimally of a formally organized vocabulary or hierarchy of terms that captures knowledge of a domain. Presently, machine-learning methods, often coupled with distributional representations, achieve good performance. However, they require large training datasets, which are not always available, especially for tasks in specialized domains. CONTES (CONcept-TErm System) is a supervised method that addresses entity normalization with ontology concepts using small training datasets. CONTES has some limitations: it does not scale well with very large ontologies, it tends to overgeneralize predictions, and it lacks valid representations for out-of-vocabulary words. Here, we assess different methods for reducing the dimensionality of the ontology representation. We also propose calibrating parameters to make the predictions more accurate, and a specific method to address the problem of out-of-vocabulary words.
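As a toy illustration of the entity normalization task itself (not the CONTES method), the sketch below links a text mention to the nearest ontology concept by cosine similarity over bag-of-words vectors; the concept IDs and labels are invented:

```python
from collections import Counter
from math import sqrt

def vectorize(text):
    """Bag-of-words vector as a sparse word-count dictionary."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def normalize(mention, ontology):
    """Link a text mention to the closest ontology concept label."""
    return max(ontology,
               key=lambda cid: cosine(vectorize(mention), vectorize(ontology[cid])))

# Hypothetical mini-ontology mapping concept IDs to labels.
onto = {
    "OBT:000036": "bacteria",
    "OBT:001480": "soil bacteria",
    "OBT:002762": "marine bacteria",
}
print(normalize("bacteria isolated from soil", onto))  # → OBT:001480
```

Supervised methods like the one described learn a mapping into the concept space from labeled examples instead of relying on surface overlap as this sketch does.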


Subject(s)
Dataset, Information Storage and Retrieval, Methods, Semantics, Vocabulary
2.
Article in English | WPRIM | ID: wpr-763805

ABSTRACT

In this paper, we investigate cross-platform interoperability for natural language processing (NLP) and, in particular, annotation of textual resources, with an eye toward identifying the design elements of annotation models and processes that are particularly problematic for, or amenable to, enabling seamless communication across different platforms. The study is conducted in the context of a specific annotation methodology, namely machine-assisted interactive annotation (also known as human-in-the-loop annotation). This methodology requires the ability to freely combine resources from different document repositories, access a wide array of NLP tools that automatically annotate corpora for various linguistic phenomena, and use a sophisticated annotation editor that enables interactive manual annotation coupled with on-the-fly machine learning. We consider three independently developed platforms, each of which utilizes a different model for representing annotations over text, and each of which performs a different role in the process.
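The central obstacle the abstract points to is that each platform represents annotations over text differently. One common interchange shape is stand-off annotation: character offsets into an unmodified text plus a label. The minimal sketch below illustrates what such a model must carry at minimum; the class and field names are illustrative, not any platform's actual schema:

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    """Stand-off annotation: character offsets into the source
    text plus a label, leaving the text itself untouched."""
    start: int
    end: int
    label: str

def annotated_spans(text, annotations):
    """Resolve offset-based annotations back to surface strings."""
    return [(a.label, text[a.start:a.end]) for a in annotations]

doc = "BRCA1 interacts with BARD1."
anns = [Annotation(0, 5, "Gene"), Annotation(21, 26, "Gene")]
print(annotated_spans(doc, anns))  # → [('Gene', 'BRCA1'), ('Gene', 'BARD1')]
```

Because offsets are relative to the raw text, converting between platforms reduces to agreeing on character counting and on the label vocabulary, which is precisely where interoperability tends to break down.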


Subject(s)
Linguistics, Machine Learning, Natural Language Processing
3.
Article in English | WPRIM | ID: wpr-763810

ABSTRACT

The total number of scholarly publications grows day by day, making it necessary to explore and use simple yet effective ways to expose their metadata. Schema.org supports adding structured metadata to web pages via markup, making it easier for both data providers and search engines to deliver the right search results. Bioschemas builds on the standards of schema.org, providing new types, properties, and guidelines for metadata, i.e., metadata profiles tailored to the Life Sciences domain. Here we present our proposed contribution to Bioschemas (from the project “Biotea”), which supports metadata contributions for scholarly publications via profiles and web components. Biotea comprises a semantic model that represents publications together with annotated elements recognized in the scientific text; our Biotea model has been mapped to schema.org following Bioschemas standards.
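As an illustration of the kind of markup involved, the sketch below builds a schema.org ScholarlyArticle record as JSON-LD. The property names (`@context`, `@type`, `name`, `author`, `datePublished`, `keywords`) come from schema.org, but the article values are placeholders, and the exact Bioschemas profile may require additional or different properties:

```python
import json

# Placeholder ScholarlyArticle metadata in schema.org JSON-LD style.
article = {
    "@context": "https://schema.org",
    "@type": "ScholarlyArticle",
    "name": "An example scholarly article",          # placeholder title
    "author": [{"@type": "Person", "name": "A. Author"}],  # placeholder author
    "datePublished": "2018",
    "keywords": ["semantic web", "text annotation"],
}

# Serialized JSON-LD of this shape can be embedded in a page's
# <script type="application/ld+json"> block for search engines to index.
print(json.dumps(article, indent=2))
```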


Subject(s)
Biological Science Disciplines, Search Engine, Semantics
4.
Article in English | WPRIM | ID: wpr-739673

ABSTRACT

There is a community-wide need for an annotated corpus consisting of the full texts of biomedical journal articles. In response, a prototype version of the full-text corpus of Genomics & Informatics, called GNI version 1.0, has recently been published, with 499 annotated full-text articles available as a corpus resource. However, GNI needs to be updated, as the texts were shallow-parsed and annotated with several existing parsers. I list issues associated with upgrading the annotations and give an opinion on the methodology for developing the next version of the GNI corpus, based on a semi-automatic strategy for linguistically richer corpus annotation.


Subject(s)
Genomics, Informatics
5.
Genomics & Informatics ; : 75-77, 2018.
Article in English | WPRIM | ID: wpr-716819

ABSTRACT

Genomics & Informatics (NLM title abbreviation: Genomics Inform) is the official journal of the Korea Genome Organization. A text corpus for this journal, annotated with various levels of linguistic information, would be a valuable resource, as the process of information extraction requires syntactic, semantic, and higher levels of natural language processing. In this study, we publish our new corpus, GNI Corpus version 1.0, extracted and annotated from the full texts of Genomics & Informatics with an NLTK (Natural Language ToolKit)-based text mining script. This preliminary version of the corpus could be used as a training and testing set for systems serving a variety of future biomedical text mining functions.
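As a rough, dependency-free illustration of what such a corpus-building script does, the sketch below splits text into sentences and counts token frequencies. In practice NLTK's `sent_tokenize` and `word_tokenize` would replace the naive regular expressions used here:

```python
import re
from collections import Counter

def sentences(text):
    """Naive sentence splitter; NLTK's sent_tokenize is the
    robust equivalent used by the actual script."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokens(sentence):
    """Naive word tokenizer standing in for nltk.word_tokenize."""
    return re.findall(r"[A-Za-z]+", sentence.lower())

text = ("Genomics & Informatics is the official journal. "
        "Its full texts form a corpus for text mining.")

# Token frequency table over the mini-corpus.
freq = Counter(t for s in sentences(text) for t in tokens(s))
print(freq.most_common(3))
```

The annotated corpus itself would layer further information (part-of-speech tags, named entities) on top of this tokenized base.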


Subject(s)
Data Mining, Genome, Genomics, Informatics, Information Storage and Retrieval, Korea (Geographic), Linguistics, Natural Language Processing, Semantics
6.
Article in Chinese | WPRIM | ID: wpr-619657

ABSTRACT

This paper takes as its data sources the reports and conference proceedings discussed by domain experts at the 2015-2016 International Biocuration Conferences, together with the research literature on biocuration and data curation indexed in PubMed Central over the past five years. Using the content analysis method, it analyzes and summarizes the research subjects of biocuration, focusing on the working mechanisms of biocuration; the construction, application, integration, and visualization of biomedical data standards; their review, editing, and application; and the mining of biomedical texts, in order to provide international experience for the development of biocuration in China.

7.
Article in Chinese | WPRIM | ID: wpr-482029

ABSTRACT

Five genes closely related to leukemia were detected and identified using COREMINE Medical, and the abstracts of related papers indexed in PubMed were analyzed with the biomedical text mining tool Chilibot, which showed that leukemia interacts with the five genes detected by COREMINE Medical.

8.
Genomics & Informatics ; : 99-106, 2004.
Article in English | WPRIM | ID: wpr-217504

ABSTRACT

In this paper we introduce PubMiner, an intelligent, machine-learning-based text mining system for extracting biological information from the literature. PubMiner employs natural language processing and machine-learning-based data mining techniques to extract useful biological information, such as protein-protein interactions, from the massive literature. The system recognizes biological terms such as genes, proteins, and enzymes, and extracts the interactions described among them in a document through natural language processing. The extracted interactions are further analyzed against a set of features of each entity, collected from related public databases, to infer additional interactions from the original ones. Both inferred and native interactions are presented to the user with links to the literature sources. The performance of entity and interaction extraction was tested on selected MEDLINE abstracts. The inference step was evaluated using the protein interaction data of S. cerevisiae (baker's yeast) from MIPS and SGD.
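A toy sketch of the interaction-extraction step described, reduced to a dictionary lookup plus a co-occurrence heuristic. The real system applies full natural language processing rather than this simplification, and the gene lexicon and verb list here are illustrative, not PubMiner's actual resources:

```python
import re

# Hypothetical gene lexicon; the described system draws its term
# dictionaries from public biological databases instead.
GENES = {"BRCA1", "BARD1", "TP53"}
INTERACTION_VERBS = {"interacts", "binds", "activates", "inhibits"}

def extract_interactions(sentence):
    """Report a gene pair when two known gene names co-occur in a
    sentence with an interaction verb (a crude stand-in for
    syntactic analysis of the interaction statement)."""
    words = re.findall(r"\w+", sentence)
    genes = [w for w in words if w in GENES]
    if len(genes) >= 2 and any(w.lower() in INTERACTION_VERBS for w in words):
        return [(genes[0], genes[1])]
    return []

print(extract_interactions("BRCA1 interacts with BARD1 in vivo."))
# → [('BRCA1', 'BARD1')]
```

The inference step described in the abstract would then take such extracted pairs and combine them with database-derived entity features to propose further, unobserved interactions.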


Subject(s)
Data Mining, Mining, Natural Language Processing, Machine Learning