1.
Lang Resour Eval ; 56(4): 1229-1268, 2022.
Article in English | MEDLINE | ID: mdl-35194415

ABSTRACT

Te reo Māori, the Indigenous language of Aotearoa New Zealand, is a distinctive feature of the nation's cultural heritage. This paper documents our efforts to build a corpus of 79,000 Māori-language tweets using computational methods. The Reo Māori Twitter (RMT) Corpus was created by targeting Māori-language users identified by the Indigenous Tweets website, pre-processing their data, and filtering out non-Māori tweets together with other sources of noise. Our motivation for creating such a resource is threefold: (1) it serves as a rich and unique dataset for linguistic analysis of te reo Māori on social media; (2) it can be used as training data to develop and augment Natural Language Processing (NLP) tools with robust, real-world Māori-language applications; and (3) it will potentially promote awareness of, and encourage positive interaction with, the growing community of Māori tweeters, thereby increasing the use and visibility of te reo Māori in an online environment. While the corpus captures data from 2007 to 2020, our analysis shows that the number of tweets in the RMT Corpus peaked in 2014 and the number of active tweeters peaked in 2017, although at least 600 users were still active in 2020. To the best of our knowledge, the RMT Corpus is the largest publicly available collection of social media data containing (almost) exclusively Māori text, making it a useful resource for language experts, NLP developers and Indigenous researchers alike. Supplementary Information: The online version contains supplementary material available at 10.1007/s10579-022-09580-w.

2.
Front Artif Intell ; 3: 15, 2020.
Article in English | MEDLINE | ID: mdl-33733134

ABSTRACT

Twitter constitutes a rich resource for investigating language contact phenomena. In this paper, we report findings from the analysis of a large-scale diachronic corpus of over one million tweets containing loanwords from te reo Māori, the Indigenous language spoken in New Zealand, into (primarily New Zealand) English. Our analysis focuses on hashtags comprising mixed-language resources (which we term hybrid hashtags), bringing together descriptive linguistic tools (investigating length, word class, and semantic domains of the hashtags) and quantitative methods (Random Forests and regression analysis). Our work has implications for language change and the study of loanwords (we argue that hybrid hashtags can be linked to loanword entrenchment), and for the study of language on social media (we challenge proposals of hashtags as "words," and show that hashtags have a dual discourse role: a micro-function within the immediate linguistic context in which they occur and a macro-function within the tweet as a whole).
