Results 1 - 6 of 6
1.
Biodivers Data J ; 10: e77025, 2022.
Article in English | MEDLINE | ID: mdl-35068979

ABSTRACT

VIETBIO [Innovative approaches to biodiversity discovery and characterisation in Vietnam] is a bilateral German-Vietnamese research and capacity building project focusing on the development and transfer of new methods and technology towards an integrated biodiversity discovery and monitoring system for Vietnam. Dedicated field training and testing of innovative methodologies were carried out in Cuc Phuong National Park as part of, and with support from, the project; this work produced the new biodiversity data and records made available in this article collection. VIETBIO is a collaboration between the Museum für Naturkunde Berlin - Leibniz Institute for Evolution and Biodiversity Science (MfN) and the Botanic Garden and Botanical Museum, Freie Universität Berlin (BGBM), on the German side, and the Vietnam National Museum of Nature (VNMN), the Institute of Ecology and Biological Resources (IEBR), the Southern Institute of Ecology (SIE) and the Institute of Tropical Biology (ITB) on the Vietnamese side; all four Vietnamese institutions belong to the Vietnam Academy of Science and Technology (VAST). The article collection "VIETBIO" (https://doi.org/10.3897/bdj.coll.63) reports original results of recent biodiversity recording and survey work undertaken in Cuc Phuong National Park, northern Vietnam, within the framework of the VIETBIO project. The collection consists of this "main" cover paper, which characterises the study area and the general project approaches and activities and gives an extensive overview of previous studies from this area, followed by individual papers on the higher taxa studied during the project. The main purpose is to make primary biodiversity records openly available, including several new and interesting findings for this biodiversity-rich conservation area.
All individual data papers, with their respective primary records, are expected to provide useful baselines for further taxonomic, phylogenetic, ecological and conservation-related studies on the respective taxa; they will therefore be maintained as separate datasets, each with its own GUID, to allow further updating.

2.
J Cheminform ; 13(1): 97, 2021 Dec 11.
Article in English | MEDLINE | ID: mdl-34895295

ABSTRACT

Chemical patents are a commonly used channel for disclosing novel compounds and reactions, and hence represent important resources for chemical and pharmaceutical research. Key chemical data in patents are often presented in tables, and both the number and the size of tables in patent documents can be very large. In addition, tables in patents can present various types of information, including spectroscopic and physical data, or the pharmacological use and effects of chemicals. Since images of Markush structures and merged cells are commonly used in these tables, their structure also varies substantially. This heterogeneity in the content and structure of tables in chemical patents makes relevant information difficult to find. We therefore propose a new text mining task: automatically categorising tables in chemical patents based on their contents. Categorising tables by the nature of their content can help to identify tables containing key information, improving the accessibility of information in patents that is highly relevant for new inventions. To develop and evaluate methods for this table classification task, we developed a new dataset, called CHEMTABLES, which consists of 788 chemical patent tables labelled with their content type. We describe this dataset in detail. We further establish strong baselines for table classification in chemical patents by applying state-of-the-art neural network models developed for natural language processing, including TabNet, ResNet and Table-BERT, to CHEMTABLES. The best-performing model, Table-BERT, achieves a micro-averaged F1 score of 88.66 on the table classification task. The CHEMTABLES dataset is publicly available at https://doi.org/10.17632/g7tjh7tbrj.3 under the CC BY NC 3.0 license. The code and models evaluated in this work are available in a GitHub repository at https://github.com/zenanz/ChemTables .
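The headline result above is a micro-averaged F1 score. As a minimal sketch (not the authors' evaluation code), micro-averaging pools true positives, false positives and false negatives over all classes before computing precision and recall. The label names below are hypothetical and are not the actual CHEMTABLES category names.

```python
from typing import Hashable, Sequence


def micro_f1(gold: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """Micro-averaged F1: pool TP, FP and FN over all classes, then compute F1."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = fp = fn = 0
    for label in set(gold) | set(pred):
        for g, p in zip(gold, pred):
            if p == label and g == label:
                tp += 1
            elif p == label:
                fp += 1  # predicted this class, but the gold label is another class
            elif g == label:
                fn += 1  # gold label is this class, but the prediction missed it
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical table labels, for illustration only.
gold = ["spectroscopic", "pharmacological", "physical", "spectroscopic"]
pred = ["spectroscopic", "physical", "physical", "spectroscopic"]
print(micro_f1(gold, pred))  # 0.75
```

If each table carries exactly one label, every misclassification counts once as a false positive and once as a false negative, so micro-averaged F1 coincides with accuracy in that setting.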

3.
Front Res Metr Anal ; 6: 654438, 2021.
Article in English | MEDLINE | ID: mdl-33870071

ABSTRACT

Chemical patents represent a valuable source of information about new chemical compounds, which is critical to the drug discovery process. Automated information extraction over chemical patents is, however, a challenging task due to the large volume of existing patents and the complex linguistic properties of chemical patents. The Cheminformatics Elsevier Melbourne University (ChEMU) evaluation lab 2020, part of the Conference and Labs of the Evaluation Forum 2020 (CLEF2020), was introduced to support the development of advanced text mining techniques for chemical patents. The ChEMU 2020 lab proposed two fundamental information extraction tasks focusing on chemical reaction processes described in chemical patents: (1) chemical named entity recognition, which requires identifying essential chemical entities and their roles in chemical reactions, as well as reaction conditions; and (2) event extraction, which aims to identify the event steps that relate the entities involved in chemical reactions. The ChEMU 2020 lab received 37 team registrations and 46 submitted runs. Overall, the performance of submissions for these tasks exceeded our expectations, with the top systems outperforming strong baselines. We further show that the methods are robust to variations in the sampling of the test data. We provide a detailed overview of the ChEMU 2020 corpus and its annotation, showing that inter-annotator agreement is very strong. We also present the methods adopted by participants, provide a detailed analysis of their performance, and carefully consider the potential impact of data leakage on the interpretation of the results. The ChEMU 2020 lab has shown the viability of automated methods to support the extraction of key information from chemical patents.

4.
BMC Bioinformatics ; 20(1): 72, 2019 Feb 12.
Article in English | MEDLINE | ID: mdl-30755172

ABSTRACT

BACKGROUND: Given the importance of relation or event extraction from biomedical research publications to support knowledge capture and synthesis, and the strong dependency of approaches to this information extraction task on syntactic information, it is valuable to understand which approaches to syntactic processing of biomedical text have the highest performance. RESULTS: We perform an empirical study comparing state-of-the-art traditional feature-based and neural network-based models for two core natural language processing tasks, part-of-speech (POS) tagging and dependency parsing, on two benchmark biomedical corpora, GENIA and CRAFT. To the best of our knowledge, there is no recent work making such comparisons in the biomedical context; in particular, no detailed analysis of neural models on these data is available. Experimental results show that, in general, the neural models outperform the feature-based models on both GENIA and CRAFT. We also perform a task-oriented evaluation to investigate the influence of these models on a downstream biomedical event extraction application, and show that better intrinsic parsing performance does not always imply better extrinsic event extraction performance. CONCLUSION: We have presented a detailed empirical study comparing traditional feature-based and neural network-based models for POS tagging and dependency parsing in the biomedical context, and have also investigated the influence of parser selection on a downstream biomedical event extraction task. AVAILABILITY OF DATA AND MATERIALS: We make the retrained models available at https://github.com/datquocnguyen/BioPosDep .


Subjects
Biomedical Research , Information Storage and Retrieval , Speech , Algorithms , Humans , Natural Language Processing , Neural Networks, Computer , Publications , Vocabulary
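The abstract contrasts intrinsic parsing performance with extrinsic event extraction performance but does not name the intrinsic metrics. Dependency parsers are commonly scored by unlabeled and labeled attachment scores (UAS/LAS); the sketch below assumes those standard definitions and uses a hypothetical three-token sentence. Real evaluations often also exclude punctuation tokens, which this sketch does not do.

```python
from typing import List, Tuple

# One entry per token: (index of the head token, dependency label).
# Token indices are 1-based and head 0 marks the root, as in CoNLL-style files.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) over aligned gold and predicted tokens.

    UAS counts tokens whose predicted head is correct; LAS additionally
    requires the dependency label to match.
    """
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equally long")
    n = len(gold)
    correct_heads = sum(g[0] == p[0] for g, p in zip(gold, pred))
    correct_arcs = sum(g == p for g, p in zip(gold, pred))
    return correct_heads / n, correct_arcs / n


# Hypothetical parse of "IL-2 activates NF-kB": every head is attached
# correctly, but the label on the third token is wrong.
gold = [(2, "nsubj"), (0, "root"), (2, "dobj")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod")]
uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.3f} LAS={las:.3f}")  # UAS=1.000 LAS=0.667
```

A parser can gain UAS/LAS on tokens that an event extraction system never consults, which is one plausible reading of why the intrinsic and extrinsic rankings can diverge.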
5.
Primates ; 58(3): 435-440, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28492971

ABSTRACT

Following the split of the silvered langurs of Indochina into two species based on molecular and phenotypic data, there is a need to reevaluate their distribution and update their conservation status. Here, we report the distribution and assess the population size of Germain's langur (Trachypithecus germaini) within its known range across Vietnam. We confirmed this species at six of seven survey sites in different habitats within three provinces in the Mekong Delta Region, including semi-evergreen forest at the Seven Mountains of An Giang Province, mangrove forest in Ngoc Hien and Nam Can Districts and Melaleuca forest in U Minh Ha National Park of Ca Mau Province, and limestone forest at Kien Luong Karst Area and semi-evergreen and evergreen forests at Phu Quoc National Park of Kien Giang Province. We found no evidence of this species in Mui Ca Mau National Park, Ca Mau Province where it was previously reported. We conservatively estimate that the total population of Germain's langurs in Vietnam consists of 362-406 individuals, with the largest population found in the Kien Luong Karst Area. Hunting and habitat loss are severely impacting Germain's langur, resulting in the extirpation of the population in Mui Ca Mau National Park and small, isolated populations in the Seven Mountains and Ngoc Hien and Nam Can Districts. However, the ability of this species to inhabit a wide range of forest types, and its increasing population sizes in Phu Quoc National Park and Kien Luong Karst Area, provide signs of hope that continued conservation actions may help in its long-term survival.


Subjects
Cercopithecidae , Colobinae , Conservation of Natural Resources , Animals , Ecosystem , Forests , Vietnam
6.
J Med Internet Res ; 18(8): e232, 2016 08 29.
Article in English | MEDLINE | ID: mdl-27573910

ABSTRACT

BACKGROUND: In public health surveillance, measuring how information enters and spreads through online communities may help us understand geographical variation in decision making associated with poor health outcomes. OBJECTIVE: Our aim was to evaluate the use of community structure and topic modeling methods as a process for characterizing the clustering of opinions about human papillomavirus (HPV) vaccines on Twitter. METHODS: The study examined Twitter posts (tweets) about HPV vaccines collected between October 2013 and October 2015. We tested Latent Dirichlet Allocation (LDA) and Dirichlet Multinomial Mixture (DMM) models for inferring the topics associated with tweets, and two methods for detecting the community structure of users from their social connections: community agglomeration (Louvain) and random-walk encoding (Infomap). We examined the alignment between community structure and topics using several common clustering alignment measures and introduced a statistical measure of alignment based on the concentration of specific topics within a small number of communities. Visualizations of the topics and of the alignment between topics and communities are presented to support the interpretation of the results in the context of public health communication and the identification of communities at risk of rejecting the safety and efficacy of HPV vaccines. RESULTS: We analyzed 285,417 tweets about HPV vaccines from 101,519 users connected by 4,387,524 social connections. Examining the alignment between community structure and the topics of tweets, we found that the Louvain community detection algorithm combined with DMM produced consistently higher alignment values, and that alignment was generally higher when the number of topics was lower.
After applying the Louvain method and DMM with 30 topics and grouping semantically similar topics in a hierarchy, we characterized 163,148 (57.16%) tweets as evidence and advocacy and 6244 (2.19%) tweets as describing personal experiences. Among the 4548 users who posted experiential tweets, 3449 (75.84%) were found in communities where the majority of tweets were about evidence and advocacy. CONCLUSIONS: The use of community detection in concert with topic modeling appears to be a useful way to characterize Twitter communities for the purpose of opinion surveillance in public health applications. Our approach may help identify online communities at risk of being influenced by negative opinions about public health interventions such as HPV vaccines.


Subjects
Internet/statistics & numerical data , Papillomavirus Vaccines , Public Health Surveillance/methods , Social Media/statistics & numerical data , Algorithms , Humans , Residence Characteristics/statistics & numerical data
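The abstract mentions "several common clustering alignment measures" without naming them. Normalized mutual information (NMI) is one widely used measure of this kind; the sketch below computes it from scratch, with arithmetic-mean normalization (the default in scikit-learn's `normalized_mutual_info_score`), assuming each tweet is labeled with its author's community and its inferred topic. The `topic_concentration` function is a hypothetical illustration of a concentration-style score (the share of one topic's tweets falling in its k best-represented communities); it is not the statistic the authors introduced.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    n = len(xs)
    px, py, joint = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in joint.items():
        pxy = c / n
        mi += pxy * math.log(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


def nmi(communities: Sequence[Hashable], topics: Sequence[Hashable]) -> float:
    """NMI with arithmetic-mean normalization: 2 * I(C;T) / (H(C) + H(T))."""
    if len(communities) != len(topics) or not communities:
        raise ValueError("inputs must be non-empty and equally long")
    hc, ht = entropy(communities), entropy(topics)
    if hc + ht == 0:
        return 1.0  # both partitions are a single cluster: trivially identical
    return 2 * mutual_information(communities, topics) / (hc + ht)


def topic_concentration(communities, topics, topic, k=1) -> float:
    """Hypothetical: share of one topic's tweets in its k top communities."""
    counts = Counter(c for c, t in zip(communities, topics) if t == topic)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for _, n in counts.most_common(k)) / total


# Hypothetical per-tweet labels (author community, tweet topic).
print(round(nmi(["A", "A", "B", "B"], ["ev", "ev", "exp", "exp"]), 3))  # 1.0
print(round(nmi(["A", "A", "B", "B"], ["ev", "exp", "ev", "exp"]), 3))  # 0.0
print(round(topic_concentration(["A", "A", "B", "C"], ["ev", "ev", "ev", "exp"], "ev"), 3))  # 0.667
```

NMI summarizes alignment over all topics at once, whereas a concentration-style score can flag a single topic that is confined to a few communities, which matches the motivation for introducing a topic-specific measure.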