1.
Int J Data Sci Anal ; 15(3): 313-327, 2023.
Article in English | MEDLINE | ID: mdl-35730040

ABSTRACT

The COVID-19 infodemic has run rampant almost simultaneously with the outbreak of the pandemic. Many concerted efforts have been made to mitigate its negative effect on information credibility and data legitimacy. Existing work mainly focuses on fact-checking algorithms or multi-class labeling models that are less aware of the intrinsic characteristics of the language. Nor does it discuss how such representations can account for the common psycho-socio-behavior of information consumers. This work takes a data-driven analytical approach to (1) describe the prominent lexical and grammatical features of COVID-19 misinformation; (2) interpret the underlying (psycho-)linguistic triggers in terms of sentiment, power and activity based on the affective control theory; and (3) study the feature indexing for anti-infodemic modeling. The results show distinct language generalization patterns in misinformation, which favors evaluative terms and multimedia devices in delivering a negative sentiment. Such appeals are effective in arousing people's sympathy toward the vulnerable community and fomenting their spreading behavior.

2.
Database (Oxford) ; 2022, 2022 Nov 25.
Article in English | MEDLINE | ID: mdl-36426767

ABSTRACT

The Coronavirus Disease 2019 (COVID-19) pandemic has shifted the focus of research worldwide, and more than 10 000 new articles per month have concentrated on COVID-19-related topics. Considering this rapidly growing literature, the efficient and precise extraction of the main topics of COVID-19-relevant articles is of great importance. The manual curation of this information for biomedical literature is labor-intensive and time-consuming, and as such the procedure is insufficient and difficult to maintain. In response to these complications, the BioCreative VII community has proposed a challenging task, LitCovid Track, calling for a global effort to automatically extract semantic topics for COVID-19 literature. This article describes our work on the BioCreative VII LitCovid Track. We proposed the LitCovid Ensemble Learning (LCEL) method for the tasks and integrated multiple biomedical pretrained models to address the COVID-19 multi-label classification problem. Specifically, seven different transformer-based pretrained models were ensembled for the initialization and fine-tuning processes independently. To enhance the representation abilities of the deep neural models, diverse additional biomedical knowledge was utilized to facilitate the fruitfulness of the semantic expressions. Simple yet effective data augmentation was also leveraged to address the learning deficiency during the training phase. In addition, given the imbalanced label distribution of the challenging task, a novel asymmetric loss function was applied to the LCEL model, which explicitly adjusted the negative-positive importance by assigning different exponential decay factors and helped the model focus on the positive samples. After the training phase, an ensemble bagging strategy was adopted to merge the outputs from each model for final predictions. The experimental results show the effectiveness of our proposed approach, as LCEL obtains the state-of-the-art performance on the LitCovid dataset. 
Database URL: https://github.com/JHnlp/LCEL.


Subjects
COVID-19 , Humans , COVID-19/epidemiology , Databases, Factual , Semantics , Machine Learning
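The asymmetric loss mentioned in the abstract above can be illustrated with a minimal sketch. This is not the LCEL implementation: the function name, the default exponents `gamma_pos` and `gamma_neg`, and the omission of any probability margin are all assumptions made for illustration. The idea shown is only that a larger exponent on the negative term shrinks the contribution of easy negatives, so positive labels weigh relatively more.

```python
import math


def asymmetric_loss(probs, labels, gamma_pos=0.0, gamma_neg=4.0, eps=1e-8):
    """Mean asymmetric loss over predicted probabilities (illustrative sketch).

    probs  -- predicted probabilities in [0, 1], one per label
    labels -- ground-truth values, 1 for positive and 0 for negative
    gamma_pos, gamma_neg -- assumed decay exponents; setting both to 0
        reduces the loss to plain binary cross-entropy
    """
    total = 0.0
    for p, y in zip(probs, labels):
        if y == 1:
            # Positive term: down-weighted by (1 - p) ** gamma_pos.
            total += -((1.0 - p) ** gamma_pos) * math.log(max(p, eps))
        else:
            # Negative term: down-weighted by p ** gamma_neg, so confident
            # (easy) negatives contribute very little.
            total += -(p ** gamma_neg) * math.log(max(1.0 - p, eps))
    return total / len(probs)
```

With `gamma_neg=4.0`, a negative label predicted at probability 0.1 contributes far less than it would under plain cross-entropy, while a positive label with `gamma_pos=0.0` keeps its full cross-entropy weight.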
3.
BMC Bioinformatics ; 23(1): 259, 2022 Jun 29.
Article in English | MEDLINE | ID: mdl-35768777

ABSTRACT

BACKGROUND: The COVID-19 pandemic has increasingly accelerated the publication pace of scientific literature. How to efficiently curate and index this large amount of biomedical literature under the current crisis is of great importance. Previous literature indexing is mainly performed by human experts using Medical Subject Headings (MeSH), which is labor-intensive and time-consuming. Therefore, to alleviate the expensive time consumption and monetary cost, there is an urgent need for automatic semantic indexing technologies for the emerging COVID-19 domain. RESULTS: In this research, to investigate the semantic indexing problem for COVID-19, we first construct the new COVID-19 Semantic Indexing dataset, which consists of more than 80 thousand biomedical articles. We then propose a novel semantic indexing framework based on the multi-probe attention neural network (MPANN) to address the COVID-19 semantic indexing problem. Specifically, we employ a k-nearest neighbour based MeSH masking approach to generate candidate topic terms for each input article. We encode and feed the selected candidate terms as well as other contextual information as probes into the downstream attention-based neural network. Each semantic probe carries specific aspects of biomedical knowledge and provides informatively discriminative features for the input article. After extracting the semantic features at both term-level and document-level through the attention-based neural network, MPANN adopts a linear multi-view classifier to conduct the final topic prediction for COVID-19 semantic indexing. CONCLUSION: The experimental results suggest that MPANN promises to represent the semantic features of biomedical texts and is effective in predicting semantic topics for COVID-19 related biomedical articles.


Subjects
COVID-19 , Semantics , Humans , Medical Subject Headings , Neural Networks, Computer , Pandemics
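The k-nearest-neighbour candidate generation step described in the abstract above might look roughly like the following sketch. It is not the MPANN code: the bag-of-words cosine similarity, the function names, and the toy articles and labels are assumptions chosen to keep the example self-contained. It only shows the general idea of restricting the label space to terms already assigned to the most similar indexed articles.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def candidate_terms(query_text, indexed_articles, k=2):
    """Union of the index terms attached to the k most similar articles."""
    query = Counter(query_text.lower().split())
    ranked = sorted(
        indexed_articles,
        key=lambda art: cosine(query, Counter(art["text"].lower().split())),
        reverse=True,
    )
    candidates = set()
    for art in ranked[:k]:
        candidates.update(art["terms"])
    return candidates


# Toy, hypothetical data for illustration only.
TOY_ARTICLES = [
    {"text": "covid vaccine trial results", "terms": {"COVID-19 Vaccines", "Clinical Trial"}},
    {"text": "covid pandemic lockdown policy", "terms": {"Pandemics", "Quarantine"}},
    {"text": "protein folding simulation", "terms": {"Protein Folding"}},
]
```

Calling `candidate_terms("covid vaccine safety", TOY_ARTICLES, k=1)` keeps only the terms of the closest toy article; a downstream classifier would then score just those candidates instead of the full vocabulary.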
4.
PLoS One ; 17(1): e0260210, 2022.
Article in English | MEDLINE | ID: mdl-34982791

ABSTRACT

Leech's corpus-based comparison of English modal verbs from 1961 to 1992 showed the steep decline of all modal verbs together, which he ascribed to continuing changes towards a more equal and less authority-driven society. This study inspired many diachronic and synchronic studies, mostly on English modal verbs and largely assuming the correlation between the use of modal verbs and power relations. Yet, there are continuing debates on sampling design and the choices of corpora. In addition, this hypothesis has not been attested in any other language with comparable corpus size or examined with longitudinal studies. This study tracks the use of Chinese modal verbs from 1901 to 2009, covering the historical events of the New Culture Movement, the establishment of the PRC, the implementation of simplified characters and the completion and finalization of simplification of the Chinese writing system. We found that the usage of modal verbs did rise and fall during the last century, and for more complex reasons. We also demonstrated that our longitudinal end-to-end approach produces convincing analysis on English modal verbs that reconciles conflicting results in the literature adopting Leech's point-to-point approach.


Subjects
Language , Social Change , Big Data , China
5.
Behav Res Methods ; 54(2): 987-1009, 2022 Apr.
Article in English | MEDLINE | ID: mdl-34405389

ABSTRACT

In this article we present the Database of Word-Level Statistics for Mandarin Chinese (DoWLS-MAN). The database addresses the lack of agreement in phonological syllable segmentation specific to Mandarin by offering phonological features for each lexical item according to 16 schematic representations of the syllable (8 with tone and 8 without tone). Those lexical statistics that differ per phonological word and nonword due to changes in syllable segmentation are of the variant category and include subtitle lexical frequency, phonological neighborhood density measures, homophone density, and network science measures. The invariant characteristics consist of each item's lexical tone, phonological transcription, and syllable structure, among others. The goal of DoWLS-MAN is to provide researchers both the ability to choose stimuli that are derived from a segmentation schema that supports an existing model of Mandarin speech processing, and the ability to choose stimuli that allow for the testing of hypotheses on phonological segmentation according to multiple schemas. In an exploratory analysis we illustrate how multiple schematic representations of the phonological mental lexicon can aid in hypothesis generation, specifically in terms of phonological processing when reading Chinese orthography. Users of the database can search among over 92,000 words, over 1600 out-of-vocabulary Chinese characters, and 4300 phonological nonwords according to either Chinese orthography, pinyin, or ASCII phonetic script. Users can also generate a list of phonological words and nonwords according to user-defined ranges and categories of lexical characteristics. DoWLS-MAN is available to the public for search or download at https://dowls.site.


Subjects
Language , Phonetics , China , Humans , Reading , Vocabulary
6.
PLoS One ; 16(2): e0245984, 2021.
Article in English | MEDLINE | ID: mdl-33534795

ABSTRACT

This paper adopts models from epidemiology to account for the development and decline of neologisms based on internet usage. The research design focuses on the issue of whether a host-driven epidemic model is well-suited to explain human behavior regarding neologisms. We extracted the search frequency data from Google Trends that covers the ninety most influential Chinese neologisms from 2008-2016 and found that the majority of them possess a similar rapidly rising-decaying pattern. The epidemic model is utilized to fit the evolution of these internet-based neologisms. The epidemic model not only has good fitting performance to model the pattern of rapid growth, but also is able to predict the peak point in the neologism's life cycle. This result underlines the role of human agents in the life cycle of neologisms and supports the macro-theory that the evolution of human languages mirrors the biological evolution of human beings.


Subjects
Internet , Language , Models, Theoretical , China , Epidemics , Time Factors
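The rising-decaying pattern and peak prediction described in the abstract above can be illustrated with a generic epidemic curve. The abstract does not specify the equations of its host-driven model, so the sketch below substitutes a standard SIR model integrated with simple Euler steps; the parameter names `beta` and `gamma`, their values, and the step size are all assumptions. It shows only how such a model yields a single peak whose timing depends on the spread rate.

```python
def simulate_sir(beta, gamma, i0=0.001, steps=2000, dt=0.1):
    """Return the 'infected' fraction over time for a basic SIR model.

    beta  -- assumed adoption (transmission) rate of the neologism
    gamma -- assumed abandonment (recovery) rate
    i0    -- initial fraction of users already using the term
    """
    susceptible, infected = 1.0 - i0, i0
    curve = [infected]
    for _ in range(steps):
        new_users = beta * susceptible * infected * dt
        dropouts = gamma * infected * dt
        susceptible -= new_users
        infected += new_users - dropouts
        curve.append(infected)
    return curve


def peak_index(series):
    """Index of the maximum value in the series."""
    return max(range(len(series)), key=series.__getitem__)
```

In this sketch a larger `beta` moves the peak earlier, which is the kind of property one would compare against observed search-frequency curves when fitting the parameters.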
7.
Sci Rep ; 10(1): 596, 2020 Jan 14.
Article in English | MEDLINE | ID: mdl-31937800

ABSTRACT

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

8.
Sci Rep ; 9(1): 15984, 2019 Nov 05.
Article in English | MEDLINE | ID: mdl-31690737

ABSTRACT

We investigated network principles underlying mental search through a novel phonological verbal fluency task. Post exclusion, 95 native-language Mandarin speakers produced as many items that differed by a single lexical tone as possible within one minute. Their verbal productions were assessed according to several novel graded fluency measures, and network science measures that accounted for the structure, cohesion and interconnectedness of lexical items. A multivariate regression analysis of our participants' language backgrounds included their mono- or multi-lingual status, English proficiency, and fluency in other Chinese languages/dialects. Higher English proficiency predicted lower error rates and greater interconnectedness, while higher fluency in other Chinese languages/dialects revealed lower successive similarity and lower network coherence. This inverse relationship between English and other Chinese languages/dialects provides evidence of the restructuring of the phonological mental lexicon.


Subjects
Multilingualism , Adolescent , Adult , Female , Hong Kong , Humans , Language Tests , Male , Verbal Behavior , Young Adult
9.
PLoS One ; 14(2): e0211336, 2019.
Article in English | MEDLINE | ID: mdl-30785906

ABSTRACT

Modality exclusivity norms have been developed in different languages for research on the relationship between perceptual and conceptual systems. This paper sets up the first modality exclusivity norms for Chinese, a Sino-Tibetan language with semantics as its orthographically relevant level. The norms are collected through two studies based on Chinese sensory words. The experimental designs take into consideration the morpho-lexical and orthographic structures of Chinese. Study 1 provides a set of norms for Mandarin Chinese single-morpheme words in mean ratings of the extent to which a word is experienced through the five sense modalities. The degrees of modality exclusivity are also provided. The collected norms are further analyzed to examine how sub-lexical orthographic representations of sense modalities in Chinese characters affect speakers' interpretation of the sensory words. In particular, we found higher modality exclusivity rating for the sense modality explicitly represented by a semantic radical component, as well as higher auditory dominant modality rating for characters with transparent phonetic symbol components. Study 2 presents the mean ratings and modality exclusivity of coordinate disyllabic compounds involving multiple sense modalities. These studies open new perspectives in the study of modality exclusivity. First, links between modality exclusivity and writing systems have been established which has strengthened previous accounts of the influence of orthography in the processing of visual information in reading. Second, a new set of modality exclusivity norms of compounds is proposed to show the competition of influence on modality exclusivity from different linguistic factors and potentially allow such norms to be linked to studies on synesthesia and semantic transparency.


Subjects
Language , Adult , Auditory Perception , China , Female , Humans , Male , Taste Perception , Visual Perception
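The abstract above reports mean ratings, dominant modalities and degrees of modality exclusivity without stating a formula. A definition commonly used in the norming literature is the range of a word's mean ratings divided by their sum; the sketch below assumes that definition, and the function names and example ratings are hypothetical.

```python
def modality_exclusivity(ratings):
    """Range of the mean ratings divided by their sum (assumed definition).

    ratings -- mapping from sense modality to mean rating.
    Returns 0.0 for a fully multimodal word and 1.0 for a word
    experienced through a single modality only.
    """
    values = list(ratings.values())
    total = sum(values)
    if total == 0:
        raise ValueError("at least one rating must be non-zero")
    return (max(values) - min(values)) / total


def dominant_modality(ratings):
    """Modality with the highest mean rating."""
    return max(ratings, key=ratings.get)


# Hypothetical ratings for one word on a 0-5 scale.
EXAMPLE = {"visual": 4.0, "auditory": 1.0, "haptic": 2.0, "gustatory": 0.0, "olfactory": 1.0}
```

For `EXAMPLE`, the dominant modality is `"visual"` and the exclusivity is (4.0 - 0.0) / 8.0 = 0.5.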
10.
Front Psychol ; 9: 2110, 2018.
Article in English | MEDLINE | ID: mdl-30467487

ABSTRACT

This study explores the issues involving pragmatic inferences with prosodic cues. Although there is a well-established literature from multiple languages demonstrating how different pragmatic inferences can be applied to the same syntactic structure, few studies discuss whether prosody can determine types of alternative sets based on the same syntactic structure. In Mandarin Chinese, the same sentence containing a numeral-classifier phrase as a negative polarity item can be employed for two types of scalar inferences based on either the numeral or the noun. The sentence wo yi zhi mayi dou mei kan dao ("I didn't even see one ant") can induce two different scalar inferences: Quantity-contrast ('I did not see one ant, much less two ants, three ants, and so on' by drawing a contrast against the minimal quantity of one), and Type-contrast ('I did not see an ant, much less a dog, a cat, a human being, and so on' by drawing a contrast against the minimally surprising type, that of ants). Taking advantage of similar sentences with the syntactic structure and lexical items, our study examines whether prosodic conditions can guide people to choose pragmatic inferences from a set of options based on the same syntactic structure. The experiments of this study are designed to answer whether prosody interacts with contextual information in this grammatical structure. The results suggest that Mandarin speakers can use sentence prosody to determine which inference is intended, at least in experimental contexts that directly probe explicit awareness of prosody. Prosody does play a role in inducing scalar inferences, but contextual information can override the effects of prosody. Each prosodic pattern can evoke a specific set of scalar inferences, but quantity-contrast inferences are favored over type-contrast inferences. Our experiments show that prosodic prominence can serve as a linguistic cue to pragmatic inferences.
