Results 1 - 5 of 5
1.
Front Artif Intell ; 6: 1223924, 2023.
Article in English | MEDLINE | ID: mdl-37808622

ABSTRACT

In the field of automatic text simplification, assessing whether the meaning of the original text has been preserved during simplification is of paramount importance. Metrics relying on n-gram overlap may struggle with simplifications that replace complex phrases with simpler paraphrases. Evaluation metrics for meaning preservation based on large language models (LLMs) have been proposed, such as BertScore in machine translation or QuestEval in summarization. However, none has a strong correlation with human judgment of meaning preservation. Moreover, such metrics have not been assessed in the context of text simplification research. In this study, we present a meta-evaluation of several metrics we apply to measure content similarity in text simplification. We also show that the metrics are unable to pass two trivial, inexpensive content preservation tests. Another contribution of this study is MeaningBERT (https://github.com/GRAAL-Research/MeaningBERT), a new trainable metric designed to assess meaning preservation between two sentences in text simplification, and we show how it correlates with human judgment. To demonstrate its quality and versatility, we also present a compilation of datasets used to assess meaning preservation and benchmark our study against a large selection of popular metrics.
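
Illustrative note (not part of the abstract): the weakness of n-gram overlap on paraphrases can be seen in a minimal sketch. The function names and the two example sentences below are assumptions made for illustration; this is a generic clipped n-gram precision, not MeaningBERT or any metric evaluated in the study.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_overlap(source, candidate, n=2):
    """Fraction of candidate n-grams that also occur in source (clipped counts)."""
    src = ngrams(source.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # A Counter returns 0 for n-grams missing from the source.
    matched = sum(min(count, src[gram]) for gram, count in cand.items())
    return matched / total


# Hypothetical sentence pair: same meaning, simpler vocabulary.
original = "the physician administered the medication"
paraphrase = "the doctor gave the medicine"

print(ngram_overlap(original, original))         # identical text
print(ngram_overlap(original, paraphrase))       # bigram overlap with the paraphrase
print(ngram_overlap(original, paraphrase, n=1))  # unigram overlap with the paraphrase
```

In this toy case the paraphrase shares no bigram with the original although the meaning is unchanged, which is the kind of failure the abstract attributes to overlap-based metrics.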

2.
Front Artif Intell ; 5: 991242, 2022.
Article in English | MEDLINE | ID: mdl-36213165

ABSTRACT

Even in highly-developed countries, as many as 15-30% of the population can only understand texts written using a basic vocabulary. Their understanding of everyday texts is limited, which prevents them from taking an active role in society and making informed decisions regarding healthcare, legal representation, or democratic choice. Lexical simplification is a natural language processing task that aims to make text understandable to everyone by replacing complex vocabulary and expressions with simpler ones, while preserving the original meaning. It has attracted considerable attention in the last 20 years, and fully automatic lexical simplification systems have been proposed for various languages. The main obstacle for the progress of the field is the absence of high-quality datasets for building and evaluating lexical simplification systems. In this study, we present a new benchmark dataset for lexical simplification in English, Spanish, and (Brazilian) Portuguese, and provide details about data selection and annotation procedures, to enable compilation of comparable datasets in other languages and domains. As the first multilingual lexical simplification dataset, where instances in all three languages were selected and annotated using comparable procedures, this is the first dataset that offers a direct comparison of lexical simplification systems for three languages. To showcase the usability of the dataset, we adapt two state-of-the-art lexical simplification systems with differing architectures (neural vs. non-neural) to all three languages (English, Spanish, and Brazilian Portuguese) and evaluate their performances on our new dataset. For a fairer comparison, we use several evaluation measures which capture varied aspects of the systems' efficacy, and discuss their strengths and weaknesses. 
We find that a state-of-the-art neural lexical simplification system outperforms a state-of-the-art non-neural lexical simplification system in all three languages, according to all evaluation measures. More importantly, we find that state-of-the-art neural lexical simplification systems perform significantly better for English than for Spanish and Portuguese, raising the question of whether such an architecture can be used for successful lexical simplification in other languages, especially low-resource ones.

3.
Bioinformatics ; 36(6): 1872-1880, 2020 03 01.
Article in English | MEDLINE | ID: mdl-31730202

ABSTRACT

MOTIVATION: Biomedical literature is one of the most relevant sources of information for knowledge mining in the field of Bioinformatics. Although English is the most widely addressed language in the field, in recent years there has been growing interest from the natural language processing community in dealing with languages other than English. However, the availability of language resources and tools for appropriate treatment of non-English texts is lagging behind. Our research is concerned with the semantic annotation of biomedical texts in the Spanish language, which can be considered an under-resourced language where biomedical text processing is concerned. RESULTS: We have carried out experiments to assess the effectiveness of several methods for the automatic annotation of biomedical texts in Spanish. One approach is based on the linguistic analysis of Spanish texts and their annotation using an information retrieval and concept disambiguation approach. A second method takes advantage of a Spanish-English machine translation process to annotate English documents and transfer annotations back to Spanish. A third method combines both procedures. Our evaluation shows that a combined system has competitive advantages over the two individual procedures. AVAILABILITY AND IMPLEMENTATION: UMLSMapper (https://snlt.vicomtech.org/umlsmapper) and the annotation transfer tool (http://scientmin.taln.upf.edu/anntransfer/) are freely available for research purposes as web services and/or demos. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Natural Language Processing , Semantics , Information Storage and Retrieval
4.
Subj. procesos cogn ; 14(2): 247-259, Dec. 2010. tab, illus
Article in Spanish | LILACS | ID: lil-576377

ABSTRACT

We describe the application of natural language processing (NLP) technology to the analysis of subjective language. In particular, we concentrate on the problem of opinion classification of textual material extracted from business-related data sources. We study the derivation of sentiment values for words from the SentiWordNet lexical resource and use them for text interpretation to produce word-, sentence-, and text-based sentiment features for opinion classification. We use word-based and sentiment-based features to induce a classifier based on Support Vector Machines, achieving state-of-the-art results. We also show preliminary experiments in which using summaries before opinion classification provides a competitive advantage over using full documents when the documents are long and contain both subjective and non-subjective material.
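
Illustrative note (not part of the abstract): the step of turning word-level sentiment values into text-level features can be sketched roughly as follows. The lexicon entries, their scores, and all names are hypothetical; the values are not taken from SentiWordNet, and the feature set is an assumption rather than the one used in the article. Feature vectors of this kind would then be passed to a classifier such as a Support Vector Machine.

```python
# Toy lexicon: word -> (positive score, negative score).
# Values are invented for illustration and are NOT SentiWordNet scores.
TOY_LEXICON = {
    "profit": (0.5, 0.0),
    "growth": (0.5, 0.0),
    "strong": (0.75, 0.0),
    "loss": (0.0, 0.5),
    "weak": (0.0, 0.75),
}


def sentiment_features(text, lexicon=TOY_LEXICON):
    """Aggregate word-level sentiment scores into text-level features."""
    tokens = text.lower().split()
    scored = [lexicon[t] for t in tokens if t in lexicon]
    pos = sum(p for p, _ in scored)
    neg = sum(n for _, n in scored)
    # Share of tokens that the lexicon could score at all.
    coverage = len(scored) / len(tokens) if tokens else 0.0
    return {"pos": pos, "neg": neg, "polarity": pos - neg, "coverage": coverage}


features = sentiment_features("strong growth offset a weak quarter")
print(features)
```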


Subjects
Language , Natural Language Processing , Software , Psychology
5.
Subj. procesos cogn ; 14(2): 247-259, Dec. 2010. tab, illus
Article in Spanish | BINACIS | ID: bin-125395

ABSTRACT

We describe the application of natural language processing (NLP) technology to the analysis of subjective language. In particular, we concentrate on the problem of opinion classification of textual material extracted from business-related data sources. We study the derivation of sentiment values for words from the SentiWordNet lexical resource and use them for text interpretation to produce word-, sentence-, and text-based sentiment features for opinion classification. We use word-based and sentiment-based features to induce a classifier based on Support Vector Machines, achieving state-of-the-art results. We also show preliminary experiments in which using summaries before opinion classification provides a competitive advantage over using full documents when the documents are long and contain both subjective and non-subjective material.


Subjects
Psychology , Language , Software , Natural Language Processing