Results 1 - 5 of 5
1.
Cogn Sci ; 48(1): e13402, 2024 01.
Article in English | MEDLINE | ID: mdl-38226686

ABSTRACT

Distinctive aspects of a culture are often reflected in the meaning and usage of words in the language spoken by bearers of that culture. Keywords such as душа (soul) in Russian, hati (heart) in Indonesian and Malay, and gezellig (convivial/cosy/fun) in Dutch are held to be especially culturally revealing, and scholars have identified a number of such keywords using careful linguistic analyses (Peeters, 2020b; Wierzbicka, 1990). Because keywords are expected to have different statistical properties than related words in other languages, we argue that a quantitative comparison of word usage across languages can help to identify cultural keywords. To support this claim, we describe a computational method that compares word frequencies across languages, and apply it to both linguistic corpora and word association data. The method identifies culturally specific words that range from "obvious" examples, such as Amsterdam in Dutch, to non-obvious yet independently proposed examples, such as hati (heart) in Indonesian. We show in addition that linguistic corpora and word association data provide converging evidence about culturally specific words. Our results therefore show how computational analyses and behavioral experiments can supplement the methods previously used by linguists to identify culturally salient words across languages.
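
To make the idea of comparing word frequencies across languages concrete, here is a minimal sketch. It is not the authors' method, which the abstract does not specify; the log-ratio score, the function names, and the toy token lists are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def relative_freqs(tokens):
    """Map each word to its share of all tokens in one corpus."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def log_ratio(word_a, freqs_a, word_b, freqs_b, floor=1e-9):
    """Log2 ratio of a word's relative frequency in corpus A to that of
    its translation in corpus B (hypothetical score, not from the paper).
    The floor avoids division by zero for unseen words."""
    fa = max(freqs_a.get(word_a, 0.0), floor)
    fb = max(freqs_b.get(word_b, 0.0), floor)
    return math.log2(fa / fb)


# Toy, made-up corpora: an Indonesian sample and an English sample.
indonesian = ["hati", "hati", "rumah", "hati"]
english = ["heart", "house", "house", "house"]

score = log_ratio("hati", relative_freqs(indonesian),
                  "heart", relative_freqs(english))
print(round(score, 3))
```

A positive score means the word is relatively more frequent than its translation counterpart; real use would need large corpora and a translation lexicon, neither of which is sketched here.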


Subjects
Language, Linguistics, Humans
2.
J Cheminform ; 13(1): 97, 2021 Dec 11.
Article in English | MEDLINE | ID: mdl-34895295

ABSTRACT

Chemical patents are a commonly used channel for disclosing novel compounds and reactions, and hence represent important resources for chemical and pharmaceutical research. Key chemical data in patents is often presented in tables. Both the number and the size of tables can be very large in patent documents. In addition, various types of information can be presented in tables in patents, including spectroscopic and physical data, or pharmacological use and effects of chemicals. Since images of Markush structures and merged cells are commonly used in these tables, their structure also shows substantial variation. This heterogeneity in content and structure of tables in chemical patents makes relevant information difficult to find. We therefore propose a new text mining task of automatically categorising tables in chemical patents based on their contents. Categorisation of tables based on the nature of their content can help to identify tables containing key information, improving the accessibility of information in patents that is highly relevant for new inventions. For developing and evaluating methods for the table classification task, we developed a new dataset, called CHEMTABLES, which consists of 788 chemical patent tables with labels of their content type. We introduce this data set in detail. We further establish strong baselines for the table classification task in chemical patents by applying state-of-the-art neural network models developed for natural language processing, including TabNet, ResNet and Table-BERT on CHEMTABLES. The best performing model, Table-BERT, achieves a performance of 88.66 micro-averaged F1 score on the table classification task. The CHEMTABLES dataset is publicly available at https://doi.org/10.17632/g7tjh7tbrj.3, subject to the CC BY NC 3.0 license. Code/models evaluated in this work are in a GitHub repository: https://github.com/zenanz/ChemTables
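
As an illustration of the micro-averaged score reported above (assuming it is an F1 score), the following sketch pools true positives, false positives, and false negatives over all classes before computing F1. The label names and predictions are hypothetical and are not taken from CHEMTABLES.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP and FN over every class,
    then compute a single F1 from the pooled counts."""
    labels = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    for label in labels:
        for truth, pred in zip(y_true, y_pred):
            if pred == label and truth == label:
                tp += 1
            elif pred == label:
                fp += 1
            elif truth == label:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical table labels, for illustration only.
gold = ["spectroscopic", "physical", "pharmacological", "spectroscopic"]
pred = ["spectroscopic", "pharmacological", "pharmacological", "physical"]
print(micro_f1(gold, pred))
```

When every table has exactly one gold label and one predicted label, as in this sketch, micro-averaged F1 coincides with plain accuracy.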

3.
Front Res Metr Anal ; 6: 654438, 2021.
Article in English | MEDLINE | ID: mdl-33870071

ABSTRACT

Chemical patents represent a valuable source of information about new chemical compounds, which is critical to the drug discovery process. Automated information extraction over chemical patents is, however, a challenging task due to the large volume of existing patents and the complex linguistic properties of chemical patents. The Cheminformatics Elsevier Melbourne University (ChEMU) evaluation lab 2020, part of the Conference and Labs of the Evaluation Forum 2020 (CLEF2020), was introduced to support the development of advanced text mining techniques for chemical patents. The ChEMU 2020 lab proposed two fundamental information extraction tasks focusing on chemical reaction processes described in chemical patents: (1) chemical named entity recognition, requiring identification of essential chemical entities and their roles in chemical reactions, as well as reaction conditions; and (2) event extraction, which aims at identification of event steps relating the entities involved in chemical reactions. The ChEMU 2020 lab received 37 team registrations and 46 runs. Overall, the performance of submissions for these tasks exceeded our expectations, with the top systems outperforming strong baselines. We further show the methods to be robust to variations in sampling of the test data. We provide a detailed overview of the ChEMU 2020 corpus and its annotation, showing that inter-annotator agreement is very strong. We also present the methods adopted by participants, provide a detailed analysis of their performance, and carefully consider the potential impact of data leakage on interpretation of the results. The ChEMU 2020 Lab has shown the viability of automated methods to support information extraction of key information in chemical patents.

4.
J Clin Epidemiol ; 104: 8-14, 2018 12.
Article in English | MEDLINE | ID: mdl-30075189

ABSTRACT

OBJECTIVES: We developed a free, online tool (CrowdCARE: crowdcare.unimelb.edu.au) to crowdsource research critical appraisal. The aim was to examine the validity of this approach for assessing the methodological quality of systematic reviews. STUDY DESIGN AND SETTING: In this prospective, cross-sectional study, a sample of systematic reviews (N = 71), of heterogeneous quality, was critically appraised using the Assessing the Methodological Quality of Systematic Reviews (AMSTAR) tool, in CrowdCARE, by five trained novice and two expert raters. After performing independent appraisals, experts resolved any disagreements by consensus (to produce an "expert consensus" rating, as the gold-standard approach). RESULTS: The expert consensus rating was within ±1 (on an 11-point scale) of the individual expert ratings for 82% of studies and was within ±1 of the mean novice rating for 79% of studies. There was a strong correlation (r² = 0.89, P < 0.0001) and very good concordance (κ = 0.67, 95% CI: 0.61-0.73) between the expert consensus rating and mean novice rating. CONCLUSION: Crowdsourcing can be used to assess the quality of systematic reviews. Novices can be trained to appraise systematic reviews and, on average, achieve a high degree of accuracy relative to experts. These proof-of-concept data demonstrate the merit of crowdsourcing, compared with current gold standards of appraisal, and the potential capacity for this approach to transform evidence-based practice worldwide by sharing the appraisal load.
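
Two of the agreement measures mentioned above, the share of ratings within ±1 and the squared correlation, can be sketched as follows. This is an illustrative sketch, not the study's analysis code; the ratings are made up, and the κ statistic is left out because the abstract does not say which variant was used.

```python
def within_one(expert, novice_mean):
    """Fraction of items where the mean novice rating lies
    within 1 point of the expert consensus rating."""
    hits = sum(1 for e, n in zip(expert, novice_mean) if abs(e - n) <= 1)
    return hits / len(expert)


def r_squared(xs, ys):
    """Squared Pearson correlation between two rating lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


# Made-up AMSTAR-style scores (0 to 10), for illustration only.
expert_consensus = [2, 5, 8, 10]
mean_novice = [3.0, 4.5, 6.5, 9.8]
print(within_one(expert_consensus, mean_novice))
print(r_squared(expert_consensus, mean_novice))
```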


Subjects
Crowdsourcing/methods, Research Design/standards, Cross-Sectional Studies, Evidence-Based Practice, Humans, Prospective Studies, Reproducibility of Results, Systematic Reviews as Topic
5.
J Biomed Semantics ; 9(1): 7, 2018 01 30.
Article in English | MEDLINE | ID: mdl-29382397

ABSTRACT

BACKGROUND: Relation extraction from biomedical publications is an important task in the area of semantic mining of text. Kernel methods for supervised relation extraction are often preferred over manual feature engineering methods, when classifying highly ordered structures such as trees and graphs obtained from syntactic parsing of a sentence. Tree kernels such as the Subset Tree Kernel and Partial Tree Kernel have been shown to be effective for classifying constituency parse trees and basic dependency parse graphs of a sentence. Graph kernels such as the All Path Graph kernel (APG) and Approximate Subgraph Matching (ASM) kernel have been shown to be suitable for classifying general graphs with cycles, such as the enhanced dependency parse graph of a sentence. In this work, we present a high performance Chemical-Induced Disease (CID) relation extraction system. We present a comparative study of kernel methods for the CID task and also extend our study to the Protein-Protein Interaction (PPI) extraction task, an important biomedical relation extraction task. We discuss novel modifications to the ASM kernel to boost its performance and a method to apply graph kernels for extracting relations expressed in multiple sentences. RESULTS: Our system for CID relation extraction attains an F-score of 60%, without using external knowledge sources or task-specific heuristics or rules. In comparison, the state-of-the-art Chemical-Disease Relation Extraction system achieves an F-score of 56% using an ensemble of multiple machine learning methods, which is then boosted to 61% with a rule-based system employing task-specific post-processing rules. For the CID task, graph kernels outperform tree kernels substantially, and the best performance is obtained with APG kernel that attains an F-score of 60%, followed by the ASM kernel at 57%. The performance difference between the ASM and APG kernels for CID sentence level relation extraction is not significant.
In our evaluation of ASM for the PPI task, ASM performed better than APG kernel for the BioInfer dataset, in the Area Under Curve (AUC) measure (74% vs 69%). However, for all the other PPI datasets, namely AIMed, HPRD50, IEPA and LLL, ASM is substantially outperformed by the APG kernel in F-score and AUC measures. CONCLUSIONS: We demonstrate a high performance Chemical Induced Disease relation extraction, without employing external knowledge sources or task specific heuristics. Our work shows that graph kernels are effective in extracting relations that are expressed in multiple sentences. We also show that the graph kernels, namely the ASM and APG kernels, substantially outperform the tree kernels. Among the graph kernels, we showed the ASM kernel as effective for biomedical relation extraction, with comparable performance to the APG kernel for datasets such as the CID-sentence level relation extraction and BioInfer in PPI. Overall, the APG kernel is shown to be significantly more accurate than the ASM kernel, achieving better performance on most datasets.


Subjects
Computer Graphics, Data Mining/methods, Protein Interaction Mapping