Results 1 - 2 of 2
1.
J Biomed Inform; 126: 103970, 2022 Feb.
Article in English | MEDLINE | ID: mdl-34920128

ABSTRACT

Systematic reviews are labor-intensive processes that combine all knowledge about a given topic into a coherent summary. Despite the high labor investment, they are necessary to create an exhaustive overview of the current evidence relevant to a research question. In this work, we evaluate three state-of-the-art supervised multi-label sequence classification systems that automatically identify 24 experimental design factors in the categories of Animal, Dose, Exposure, and Endpoint from journal articles describing experiments related to the toxicity and health effects of environmental agents. We then present an in-depth analysis of the results: we relate the lexical diversity of the design parameters to model performance, assess the impact of tokenization and non-contiguous mentions, and examine the dependencies between entities within each category. We demonstrate that, in general, algorithms using embedded representations of the sequences outperform statistical algorithms, but that even these algorithms struggle with lexically diverse entities.


Subject(s)
Algorithms, Natural Language Processing, Systematic Reviews as Topic
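Multi-label extractors like those evaluated above are commonly scored with micro-averaged F1, which pools true positives, false positives, and false negatives across all documents and labels. A minimal sketch of that metric (the label names below are hypothetical illustrations, not the paper's actual annotation scheme):

```python
from typing import Dict, Set

def micro_f1(gold: Dict[str, Set[str]], pred: Dict[str, Set[str]]) -> float:
    """Micro-averaged F1 over multi-label predictions: counts are pooled
    across every document and every label before computing P, R, and F1."""
    tp = fp = fn = 0
    for doc_id, gold_labels in gold.items():
        pred_labels = pred.get(doc_id, set())
        tp += len(gold_labels & pred_labels)   # correctly predicted labels
        fp += len(pred_labels - gold_labels)   # spurious labels
        fn += len(gold_labels - pred_labels)   # missed labels
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical design-factor labels in an Animal/Dose/Exposure/Endpoint scheme
gold = {"doc1": {"Animal:species", "Dose:level"},
        "doc2": {"Exposure:duration"}}
pred = {"doc1": {"Animal:species", "Endpoint:mortality"},
        "doc2": {"Exposure:duration"}}
print(round(micro_f1(gold, pred), 3))  # → 0.667
```

Because counts are pooled before averaging, frequent labels dominate the score, which is why lexically diverse, low-frequency entities can hurt micro-F1 less than per-label (macro) scores would show.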
2.
J Am Med Inform Assoc; 28(10): 2108-2115, 2021 Sep 18.
Article in English | MEDLINE | ID: mdl-34333635

ABSTRACT

OBJECTIVE: Clinical notes contain an abundance of important, but not readily accessible, information about patients. Systems that automatically extract this information rely on large amounts of training data, which are costly to create. Furthermore, such systems are developed disjointly, meaning that no information can be shared among task-specific systems. This bottleneck unnecessarily complicates practical application, reduces the performance of each individual solution, and adds the engineering debt of managing multiple information extraction systems. MATERIALS AND METHODS: We address these challenges by developing Multitask-Clinical BERT: a single deep learning model that simultaneously performs 8 clinical tasks spanning entity extraction, personal health information identification, language entailment, and similarity by sharing representations among tasks. RESULTS: We compare the performance of our multitasking information extraction system to state-of-the-art BERT sequential fine-tuning baselines. We observe a slight but consistent performance degradation in MT-Clinical BERT relative to sequential fine-tuning. DISCUSSION: These results suggest that learning a general clinical text representation capable of supporting multiple tasks loses the ability to exploit dataset- or clinical-note-specific properties, compared to a single, task-specific model. CONCLUSIONS: We find that our single system performs competitively with all state-of-the-art task-specific systems while offering substantial computational savings at inference.


Subject(s)
Information Storage and Retrieval, Natural Language Processing, Humans, Language
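The sharing scheme described in the abstract, often called hard parameter sharing, computes one shared representation per input and reuses it across every task-specific head, which is where the inference savings come from. A toy sketch of the idea (the class, the stand-in encoder, and the task names are illustrative assumptions, not MT-Clinical BERT's actual architecture):

```python
from typing import Dict, List

class SharedEncoderMultitask:
    """Hard parameter sharing in miniature: one shared encoder feeds several
    task-specific heads, so a single forward pass serves every task."""

    def __init__(self, tasks: List[str]):
        # One head per task; here a "head" is just a task-specific scale factor.
        self.heads = {task: i + 1.0 for i, task in enumerate(tasks)}

    def encode(self, text: str) -> float:
        # Stand-in for a transformer encoder: one shared scalar "representation".
        return float(len(text))

    def predict_all(self, text: str) -> Dict[str, float]:
        shared = self.encode(text)  # encoded once ...
        # ... then reused by every head, instead of re-encoding per task.
        return {task: shared * head for task, head in self.heads.items()}

model = SharedEncoderMultitask(["ner", "phi", "entailment", "similarity"])
print(model.predict_all("chest pain"))
```

The abstract's trade-off appears directly in this structure: the heads can only see what the one shared encoder produces, so per-task quirks of a dataset cannot be exploited, but the expensive encoding step runs once for all tasks.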