Search | VHL Regional Portal

Extracting experimental parameter entities from scientific articles.

Farnsworth, Steele; Gurdin, Gabrielle; Vargas, Jorge; Mulyar, Andriy; Lewinski, Nastassja; McInnes, Bridget T.

J Biomed Inform ; 126: 103970, 2022 02.

Article in English | MEDLINE | ID: mdl-34920128

ABSTRACT

Systematic reviews are labor-intensive processes to combine all knowledge about a given topic into a coherent summary. Despite the high labor investment, they are necessary to create an exhaustive overview of current evidence relevant to a research question. In this work, we evaluate three state-of-the-art supervised multi-label sequence classification systems to automatically identify 24 different experimental design factors for the categories of Animal, Dose, Exposure, and Endpoint from journal articles describing the experiments related to toxicity and health effects of environmental agents. We then present an in-depth analysis of the results evaluating the lexical diversity of the design parameters with respect to model performance, evaluating the impact of tokenization and non-contiguous mentions, and finally evaluating the dependencies between entities within the category entities. We demonstrate that in general, algorithms that use embedded representations of the sequences outperform statistical algorithms, but that even these algorithms struggle with lexically diverse entities.
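The abstract does not say how lexical diversity was measured. As one illustrative possibility, the sketch below computes a type-token ratio per entity label: the number of distinct surface forms divided by the total number of mentions. The function name, the example mentions, and the choice of this particular measure are assumptions for illustration, not the authors' method.

```python
from collections import defaultdict


def lexical_diversity(mentions):
    """Return a type-token ratio per label (assumed measure, not from the paper).

    `mentions` is an iterable of (label, surface_text) pairs.
    """
    texts_by_label = defaultdict(list)
    for label, text in mentions:
        # Lowercase so that "Rat" and "rat" count as one surface form.
        texts_by_label[label].append(text.lower())
    return {
        label: len(set(texts)) / len(texts)
        for label, texts in texts_by_label.items()
    }


# Hypothetical annotated mentions; the labels reuse two category names
# from the abstract, the texts are invented.
example_mentions = [
    ("Animal", "rat"),
    ("Animal", "Rat"),
    ("Animal", "mouse"),
    ("Endpoint", "body weight"),
    ("Endpoint", "liver enzyme activity"),
]

diversity = lexical_diversity(example_mentions)
```

Under this measure, a label whose mentions are all distinct scores 1.0, while a label that repeats a small vocabulary scores closer to 0.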

Subject(s)

Algorithms , Natural Language Processing , Systematic Reviews as Topic

Identifying Chemical Reactions and Their Associated Attributes in Patents.

Mahendran, Darshini; Gurdin, Gabrielle; Lewinski, Nastassja; Tang, Christina; McInnes, Bridget T.

Front Res Metr Anal ; 6: 688353, 2021.

Article in English | MEDLINE | ID: mdl-34322654

ABSTRACT

Chemical patents are an essential source of information about novel chemicals and chemical reactions. However, with the increasing volume of such patents, mining information about these chemicals and chemical reactions has become a time-intensive and laborious endeavor. In this study, we present a system to extract chemical reaction events from patents automatically. Our approach consists of two steps: 1) named entity recognition (NER), the automatic identification of chemical reaction parameters from the corresponding text, and 2) event extraction (EE), the automatic classifying and linking of entities based on their relationships to each other. For our NER system, we evaluate bidirectional long short-term memory (BiLSTM)-based and bidirectional encoder representations from transformers (BERT)-based methods. For our EE system, we evaluate BERT-based, convolutional neural network (CNN)-based, and rule-based methods. We evaluate our NER and EE components independently and as an end-to-end system, reporting the precision, recall, and F1 score. Our results show that the BiLSTM-based method performed best at identifying the entities, and the CNN-based method performed best at extracting events.
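The precision, recall, and F1 score mentioned above can be illustrated with a minimal sketch. It assumes exact-match scoring over (start, end, label) spans, which the abstract does not specify. The function name, the span offsets, and the label names are hypothetical.

```python
def precision_recall_f1(gold, predicted):
    """Exact-match scores over (start, end, label) spans (assumed scheme)."""
    gold_set = set(gold)
    predicted_set = set(predicted)
    # A prediction counts as correct only if offsets and label both match.
    true_positives = len(gold_set & predicted_set)
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold and predicted annotations for one passage.
gold_spans = [(0, 7, "REAGENT"), (12, 20, "TEMPERATURE"), (25, 31, "YIELD")]
predicted_spans = [
    (0, 7, "REAGENT"),
    (12, 20, "TIME"),      # right offsets, wrong label: counted as an error
    (25, 31, "YIELD"),
    (40, 44, "SOLVENT"),   # spurious prediction
]

precision, recall, f1 = precision_recall_f1(gold_spans, predicted_spans)
```

Here two of the four predictions match a gold span, so precision is 2/4 and recall is 2/3. Relaxed or partial-match scoring would give different numbers for the same output.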

Analysis of Inter-Domain and Cross-Domain Drug Review Polarity Classification.

Gurdin, Gabrielle; Vargas, Jorge A; Maffey, Luke G; Olex, Amy L; Lewinski, Nastassja A; McInnes, Bridget T.

AMIA Jt Summits Transl Sci Proc ; 2020: 201-210, 2020.

Article in English | MEDLINE | ID: mdl-32477639

ABSTRACT

Individuals increasingly rely on social media to discuss health-related issues. One way to provide easier access to relevant information is through sentiment analysis: classifying text into polarity classes such as positive and negative. In this paper, we generated freely available datasets of WebMD.com drug reviews and star ratings for Common, Cancer, Depression, Diabetes, and Hypertension drugs. We explored four supervised learning models: Naive Bayes, Random Forests, Support Vector Machines, and Convolutional Neural Networks for the purpose of determining the polarity of drug reviews. We conducted inter-domain and cross-domain evaluations. We found that SVM obtained the highest F-measure on average and that cross-domain training produced results similar to or higher than those of models trained directly on their respective datasets.
