Search | VHL Regional Portal

1.

Comparison of various approaches to tagging for the inflectional Slovak language.

Benko, Lubomír; Munkova, Dasa; Pappová, Mária; Munk, Michal.

PeerJ Comput Sci ; 10: e2026, 2024.

Article in English | MEDLINE | ID: mdl-38855261

ABSTRACT

Morphological tagging provides essential insights into grammar, structure, and the mutual relationships of words within a sentence. Tagging text in a highly inflectional language is challenging because of word ambiguity. This research compares six automatic taggers for the inflectional Slovak language, seeking the most accurate tagger for literary and non-literary texts. Our results indicate that it is useful to distinguish literary from non-literary texts and then to choose a tagger based on the text style. For literary texts, UDPipe2 outperformed the others in seven out of nine examined tagset positions. Conversely, for non-literary texts, the RNNTagger exhibited the highest performance in eight out of nine examined tagset positions. The RNNTagger is recommended for both types of text because it best captures the inflection of the Slovak language, although UDPipe2 demonstrates higher accuracy for literary texts. Despite dataset size limitations, this study emphasizes the suitability of various taggers for inflectional languages such as Slovak.
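The abstract reports results per tagset position. As a minimal sketch of how such a per-position comparison could be computed, the Python function below scores each character position of a positional tag separately. The function name, the padding character, and the example tag strings are assumptions made for illustration; they are not taken from the paper and are not claimed to be valid Slovak tags.

```python
from typing import List


def position_accuracy(gold: List[str], pred: List[str], n_positions: int) -> List[float]:
    """Return, for each tag position, the share of tokens whose gold and
    predicted tags agree at that position."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of tags")
    if not gold:
        raise ValueError("at least one tag is required")

    hits = [0] * n_positions
    for g, p in zip(gold, pred):
        # Pad short tags so that every tag has n_positions characters
        # (assumption: "-" marks an unused position).
        g = g.ljust(n_positions, "-")
        p = p.ljust(n_positions, "-")
        for i in range(n_positions):
            if g[i] == p[i]:
                hits[i] += 1
    return [h / len(gold) for h in hits]


# Made-up positional tags for three tokens (illustration only).
gold_tags = ["SSms1", "VKesc", "AAfs4"]
pred_tags = ["SSms1", "VKdsc", "AAfs1"]

acc = position_accuracy(gold_tags, pred_tags, n_positions=5)
print([round(a, 3) for a in acc])  # [1.0, 1.0, 0.667, 1.0, 0.667]
```

Scoring positions separately shows which grammatical category a tagger gets wrong (here the third and fifth positions), which a single whole-tag accuracy would hide.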

2.

The use of residual analysis to improve the error rate accuracy of machine translation.

Benko, Lubomír; Munkova, Dasa; Munk, Michal; Benkova, Lucia; Hajek, Petr.

Sci Rep ; 14(1): 9293, 2024 Apr 23.

Article in English | MEDLINE | ID: mdl-38654050

ABSTRACT

The aim of the study is to compare two different approaches to machine translation, statistical and neural, using automatic MT metrics of error rate and residuals. We examined four available online MT systems (statistical Google Translate, neural Google Translate, and two European Commission MT tools, statistical mt@ec and neural eTranslation) through their products (MT outputs). We propose using residual analysis to improve the accuracy of machine translation error rate. Residuals represent a new approach to comparing the quality of statistical and neural MT outputs. The study provides new insights into evaluating machine translation quality from English and German into Slovak through automatic error rate metrics. In the category of prediction and syntactic-semantic correlativeness, statistical MT showed a significantly higher error rate than neural MT. Conversely, in the category of lexical semantics, neural MT showed a significantly higher error rate than statistical MT. The results indicate that relying solely on the reference when determining MT quality is insufficient. However, when combined with residuals, it offers a more objective view of MT quality and facilitates the comparison of statistical MT and neural MT.
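The abstract does not define how the residuals are computed. One simple reading, assumed here purely for illustration, treats a residual as the deviation of a segment's error rate from the mean error rate over all segments, and flags segments whose residual exceeds a multiple of the standard deviation. The threshold `k`, the function names, and the scores are hypothetical.

```python
from statistics import mean, pstdev
from typing import List


def residuals(scores: List[float]) -> List[float]:
    """Deviation of each segment-level score from the mean score."""
    m = mean(scores)
    return [s - m for s in scores]


def flag_outliers(scores: List[float], k: float = 2.0) -> List[int]:
    """Indices of segments whose absolute residual exceeds k standard
    deviations (k = 2.0 is an assumed threshold, not one from the paper)."""
    sd = pstdev(scores)
    if sd == 0:
        return []
    return [i for i, r in enumerate(residuals(scores)) if abs(r) > k * sd]


# Invented segment-level error rates for one MT system.
segment_error_rates = [0.30, 0.32, 0.28, 0.31, 0.90, 0.29]

print(flag_outliers(segment_error_rates))  # [4]
```

Under this reading, the flagged segments are the ones worth inspecting manually, since they deviate most from the system's typical error rate.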

3.

Evaluating automatic sentence alignment approaches on English-Slovak sentences.

Forgac, Frantisek; Munkova, Dasa; Munk, Michal; Kelebercova, Livia.

Sci Rep ; 13(1): 20123, 2023 Nov 17.

Article in English | MEDLINE | ID: mdl-37978270

ABSTRACT

Parallel texts represent a very valuable resource in many applications of natural language processing. The fundamental step in creating a parallel corpus is alignment. Sentence alignment is the task of finding the correspondence between source sentences and their equivalent translations in the target text. A number of automatic sentence alignment approaches have been proposed, including approaches based on neural networks; they can be divided into length-based, lexicon-based, and translation-based. In our study, we used five different aligners, namely Bilingual sentence aligner (BSA), Hunalign, Bleualign, Vecalign, and Bertalign. We evaluated the accuracy of Bertalign against the aligners employed to date, and compared all aligners with one another, for the English-Slovak language pair. We created a custom corpus consisting of texts collected in 2021 and 2022. Vecalign and Bertalign performed statistically significantly best, and BSA performed the worst. Hunalign and Bleualign achieved the same performance in terms of F1 score. However, Bleualign achieved the most diverse results in terms of performance.

Subjects

Language; Natural Language Processing; Slovakia; Neural Networks, Computer
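The aligners in the record above are compared by F1 score. A minimal sketch of how precision, recall, and F1 could be computed for sentence alignments follows, assuming strict matching: a predicted alignment counts as correct only if its source and target index groups are identical to a gold alignment. The data structure and the example alignments are assumptions for illustration, not the evaluation protocol of the paper.

```python
from typing import Iterable, Tuple

# An alignment links a group of source sentence indices to a group of
# target sentence indices, e.g. ((1, 2), (1,)) is a 2-to-1 alignment.
Alignment = Tuple[Tuple[int, ...], Tuple[int, ...]]


def alignment_prf(gold: Iterable[Alignment], pred: Iterable[Alignment]) -> Tuple[float, float, float]:
    """Strict precision, recall, and F1 over sets of alignments."""
    gold_set, pred_set = set(gold), set(pred)
    correct = len(gold_set & pred_set)
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Invented example: the aligner misses the 2-to-1 alignment and splits it.
gold = [((0,), (0,)), ((1, 2), (1,)), ((3,), (2,)), ((4,), (3,))]
pred = [((0,), (0,)), ((1,), (1,)), ((2,), ()), ((3,), (2,)), ((4,), (3,))]

p, r, f = alignment_prf(gold, pred)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.6 0.75 0.667
```

Strict matching penalizes a split many-to-one alignment twice (two wrong predictions and one missed gold link); more lenient schemes that give partial credit exist, but they are not shown here.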

4.

Communication models in a foreign language in relation to cognitive style category width and power distance.

Munkova, Dasa; Stranovska, Eva; Munk, Michal.

Front Psychol ; 14: 1272370, 2023.

Article in English | MEDLINE | ID: mdl-38259576

ABSTRACT

Introduction: Understanding how the category width of cognitive style and power distance affect language use across cultures is crucial for improving cross-cultural communication. We attempt to reveal how students of English as a foreign language, affected by a high-context culture, communicate in English. What models of foreign communicative competence do they create? Methods: We applied association rule analysis to find out how the category width of cognitive style affects foreign communicative competence in relation to culture and language. Results: The requester tends to be more formal and transfers conventional norms of the culture of the mother tongue into English, which mainly affects the use of alerters and external modifications of the head act of the request. Discussion: A broad categorizer, regardless of social distance, prefers to formulate the request in the conditional rather than the present tense form, in contrast to narrow categorizers who, in a situation of social proximity, prefer the request form in the present tense. A similar finding emerged for external modifications of the head act, where we observed an inversion between broad and narrow categorizers, mainly in the use of minimizers and mitigating devices.
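The Methods part of the abstract names association rule analysis. As a brief sketch of the two basic measures behind that technique, the code below computes the support and confidence of a rule "antecedent implies consequent" over a list of transactions. The item labels and the five transactions are invented toy data and do not reproduce any result from the study.

```python
from typing import FrozenSet, List, Set, Tuple


def support_confidence(
    transactions: List[Set[str]],
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
) -> Tuple[float, float]:
    """Support and confidence of the rule antecedent -> consequent."""
    if not transactions:
        raise ValueError("at least one transaction is required")
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_both / len(transactions)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Toy data: each set describes one request (labels are hypothetical).
requests = [
    {"broad", "conditional"},
    {"broad", "conditional"},
    {"broad", "present"},
    {"narrow", "present"},
    {"narrow", "conditional"},
]

sup, conf = support_confidence(requests, frozenset({"broad"}), frozenset({"conditional"}))
print(round(sup, 3), round(conf, 3))  # 0.4 0.667
```

Support measures how often the whole pattern occurs in the data, while confidence measures how often the consequent holds when the antecedent is present.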

5.

The role of automated evaluation techniques in online professional translator training.

Munkova, Dasa; Munk, Michal; Benko, Lubomír; Hajek, Petr.

PeerJ Comput Sci ; 7: e706, 2021.

Article in English | MEDLINE | ID: mdl-34712792

ABSTRACT

The rapid technologisation of translation has steered the translation industry towards machine translation, post-editing, subtitling services and video content translation. In addition, the pandemic situation associated with COVID-19 has rapidly increased the transfer of business and education to the virtual world. This situation has motivated us not only to look for new approaches to online translator training, which requires a different method than learning foreign languages, but in particular to look for new approaches to assessing translator performance within online educational environments. Translation quality assessment is a key task, as the concept of quality is closely linked to the concept of optimization. Automatic metrics are very good indicators of quality, but they do not provide sufficient and detailed linguistic information about translations or post-edited machine translations. However, using their residuals, we can identify the segments with the largest distances between the post-edited machine translations and machine translations, which allows us to focus on a more detailed textual analysis of suspicious segments. We introduce a unique online teaching and learning system, which is specifically "tailored" for online translators' training, and subsequently we focus on a new approach to assessing translators' competences using evaluation techniques: the metrics of automatic evaluation and their residuals. We show that the residuals of the metrics of accuracy (BLEU_n) and error rate (PER, WER, TER, CDER, and HTER) for machine translation post-editing are valid for translator assessment. Using the residuals of the metrics of accuracy and error rate, we can identify errors in post-editing (critical, major, and minor) and subsequently utilize them in more detailed linguistic analysis.
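Two of the error rate metrics named in the abstract, WER and PER, are simple enough to sketch directly. The code below uses common textbook formulations: WER is the word-level edit distance divided by the reference length, and PER ignores word order by comparing bags of words. These formulations and the example sentences are assumptions for illustration; the paper may use different variants or tooling. In the post-editing setting described above, the reference would be the post-edited translation and the hypothesis the raw machine translation.

```python
from collections import Counter
from typing import List


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def wer(ref: List[str], hyp: List[str]) -> float:
    """Word error rate: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def per(ref: List[str], hyp: List[str]) -> float:
    """Position-independent error rate (one common formulation)."""
    if not ref:
        raise ValueError("reference must not be empty")
    # Number of words shared by both sentences, ignoring order.
    matches = sum((Counter(ref) & Counter(hyp)).values())
    # Penalize hypotheses that are longer than the reference.
    surplus = max(0, len(hyp) - len(ref))
    return 1 - (matches - surplus) / len(ref)


reference = "the cat sat on the mat".split()   # e.g. post-edited text
hypothesis = "the cat is on mat".split()       # e.g. raw MT output

print(round(wer(reference, hypothesis), 3))  # 0.333
print(round(per(reference, hypothesis), 3))  # 0.333
```

Computed per segment, scores like these are the input from which segment-level residuals could then be derived.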
