Pesquisa | Portal Regional da BVS

GlyReShot: A glyph-aware model with label refinement for few-shot Chinese agricultural named entity recognition.

Liu, Haitao; Song, Jihua; Peng, Weiming.

Heliyon ; 10(12): e32093, 2024 Jun 30.

Artigo em Inglês | MEDLINE | ID: mdl-38948047

RESUMO

Chinese agricultural named entity recognition (NER) has been studied with supervised learning for many years. However, considering the scarcity of public datasets in the agricultural domain, exploring this task in the few-shot scenario is more practical for real-world demands. In this paper, we propose a novel model named GlyReShot, integrating the knowledge of Chinese character glyph into few-shot NER models. Although the utilization of glyph has been proven successful in supervised models, two challenges still persist in the few-shot setting, i.e., how to obtain glyph representations and when to integrate them into the few-shot model. GlyReShot handles the two challenges by introducing a lightweight glyph representation obtaining module and a training-free label refinement strategy. Specifically, the glyph representations are generated based on the descriptive sentences by filling the predefined template. As most steps come before training, this module aligns well with the few-shot setting. Furthermore, by computing the confidence values for draft predictions, the refinement strategy selectively utilizes the glyph information only when the confidence values are relatively low, thus mitigating the influence of noise. Finally, we annotate a new agricultural NER dataset and the experimental results demonstrate effectiveness of GlyReShot for few-shot Chinese agricultural NER.

Improving Automated Essay Scoring by Prompt Prediction and Matching.

Sun, Jingbo; Song, Tianbao; Song, Jihua; Peng, Weiming.

Entropy (Basel) ; 24(9)2022 Aug 29.

Artigo em Inglês | MEDLINE | ID: mdl-36141091

RESUMO

Automated essay scoring aims to evaluate the quality of an essay automatically. It is one of the main educational application in the field of natural language processing. Recently, Pre-training techniques have been used to improve performance on downstream tasks, and many studies have attempted to use pre-training and then fine-tuning mechanisms in an essay scoring system. However, obtaining better features such as prompts by the pre-trained encoder is critical but not fully studied. In this paper, we create a prompt feature fusion method that is better suited for fine-tuning. Besides, we use multi-task learning by designing two auxiliary tasks, prompt prediction and prompt matching, to obtain better features. The experimental results show that both auxiliary tasks can improve model performance, and the combination of the two auxiliary tasks with the NEZHA pre-trained encoder produces the best results, with Quadratic Weighted Kappa improving 2.5% and Pearson's Correlation Coefficient improving 2% on average across all results on the HSK dataset.

CharAs-CBert: Character Assist Construction-Bert Sentence Representation Improving Sentiment Classification.

Chen, Bo; Peng, Weiming; Song, Jihua.

Sensors (Basel) ; 22(13)2022 Jul 03.

Artigo em Inglês | MEDLINE | ID: mdl-35808519

RESUMO

In the process of semantic capture, traditional sentence representation methods tend to lose a lot of global and contextual semantics and ignore the internal structure information of words in sentences. To address these limitations, we propose a sentence representation method for character-assisted construction-Bert (CharAs-CBert) to improve the accuracy of sentiment text classification. First, based on the construction, a more effective construction vector is generated to distinguish the basic morphology of the sentence and reduce the ambiguity of the same word in different sentences. At the same time, it aims to strengthen the representation of salient words and effectively capture contextual semantics. Second, character feature vectors are introduced to explore the internal structure information of sentences and improve the representation ability of local and global semantics. Then, to make the sentence representation have better stability and robustness, character information, word information, and construction vectors are combined and used together for sentence representation. Finally, the evaluation and verification are carried out on various open-source baseline data such as ACL-14 and SemEval 2014 to demonstrate the validity and reliability of sentence representation, namely, the F1 and ACC are 87.54% and 92.88% on ACL14, respectively.

Assuntos

Chara , Idioma , Reprodutibilidade dos Testes , Semântica , Análise de Sentimentos

RESUMO

RESUMO

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA