Pesquisa | Portal Regional da BVS

1.

Understanding the performance and reliability of NLP tools: a comparison of four NLP tools predicting stroke phenotypes in radiology reports.

Casey, Arlene; Davidson, Emma; Grover, Claire; Tobin, Richard; Grivas, Andreas; Zhang, Huayu; Schrempf, Patrick; O'Neil, Alison Q; Lee, Liam; Walsh, Michael; Pellie, Freya; Ferguson, Karen; Cvoro, Vera; Wu, Honghan; Whalley, Heather; Mair, Grant; Whiteley, William; Alex, Beatrice.

Front Digit Health ; 5: 1184919, 2023.

Artigo em Inglês | MEDLINE | ID: mdl-37840686

RESUMO

Background: Natural language processing (NLP) has the potential to automate the reading of radiology reports, but there is a need to demonstrate that NLP methods are adaptable and reliable for use in real-world clinical applications. Methods: We tested the F1 score, precision, and recall to compare NLP tools on a cohort from a study on delirium using images and radiology reports from NHS Fife and a population-based cohort (Generation Scotland) that spans multiple National Health Service health boards. We compared four off-the-shelf rule-based and neural NLP tools (namely, EdIE-R, ALARM+, ESPRESSO, and Sem-EHR) and reported on their performance for three cerebrovascular phenotypes, namely, ischaemic stroke, small vessel disease (SVD), and atrophy. Clinical experts from the EdIE-R team defined phenotypes using labelling techniques developed in the development of EdIE-R, in conjunction with an expert researcher who read underlying images. Results: EdIE-R obtained the highest F1 score in both cohorts for ischaemic stroke, ≥93%, followed by ALARM+, ≥87%. The F1 score of ESPRESSO was ≥74%, whilst that of Sem-EHR is ≥66%, although ESPRESSO had the highest precision in both cohorts, 90% and 98%. For F1 scores for SVD, EdIE-R scored ≥98% and ALARM+ ≥90%. ESPRESSO scored lowest with ≥77% and Sem-EHR ≥81%. In NHS Fife, F1 scores for atrophy by EdIE-R and ALARM+ were 99%, dropping in Generation Scotland to 96% for EdIE-R and 91% for ALARM+. Sem-EHR performed lowest for atrophy at 89% in NHS Fife and 73% in Generation Scotland. When comparing NLP tool output with brain image reads using F1 scores, ALARM+ scored 80%, outperforming EdIE-R at 66% in ischaemic stroke. For SVD, EdIE-R performed best, scoring 84%, with Sem-EHR 82%. For atrophy, EdIE-R and both ALARM+ versions were comparable at 80%. Conclusions: The four NLP tools show varying F1 (and precision/recall) scores across all three phenotypes, although more apparent for ischaemic stroke. If NLP tools are to be used in clinical settings, this cannot be performed "out of the box." It is essential to understand the context of their development to assess whether they are suitable for the task at hand or whether further training, re-training, or modification is required to adapt tools to the target task.

2.

The reporting quality of natural language processing studies: systematic review of studies of radiology reports.

Davidson, Emma M; Poon, Michael T C; Casey, Arlene; Grivas, Andreas; Duma, Daniel; Dong, Hang; Suárez-Paniagua, Víctor; Grover, Claire; Tobin, Richard; Whalley, Heather; Wu, Honghan; Alex, Beatrice; Whiteley, William.

BMC Med Imaging ; 21(1): 142, 2021 10 02.

Artigo em Inglês | MEDLINE | ID: mdl-34600486

RESUMO

BACKGROUND: Automated language analysis of radiology reports using natural language processing (NLP) can provide valuable information on patients' health and disease. With its rapid development, NLP studies should have transparent methodology to allow comparison of approaches and reproducibility. This systematic review aims to summarise the characteristics and reporting quality of studies applying NLP to radiology reports. METHODS: We searched Google Scholar for studies published in English that applied NLP to radiology reports of any imaging modality between January 2015 and October 2019. At least two reviewers independently performed screening and completed data extraction. We specified 15 criteria relating to data source, datasets, ground truth, outcomes, and reproducibility for quality assessment. The primary NLP performance measures were precision, recall and F1 score. RESULTS: Of the 4,836 records retrieved, we included 164 studies that used NLP on radiology reports. The commonest clinical applications of NLP were disease information or classification (28%) and diagnostic surveillance (27.4%). Most studies used English radiology reports (86%). Reports from mixed imaging modalities were used in 28% of the studies. Oncology (24%) was the most frequent disease area. Most studies had dataset size > 200 (85.4%) but the proportion of studies that described their annotated, training, validation, and test set were 67.1%, 63.4%, 45.7%, and 67.7% respectively. About half of the studies reported precision (48.8%) and recall (53.7%). Few studies reported external validation performed (10.8%), data availability (8.5%) and code availability (9.1%). There was no pattern of performance associated with the overall reporting quality. CONCLUSIONS: There is a range of potential clinical applications for NLP of radiology reports in health services and research. However, we found suboptimal reporting quality that precludes comparison, reproducibility, and replication. Our results support the need for development of reporting standards specific to clinical NLP studies.

Assuntos

Processamento de Linguagem Natural , Radiografia , Radiologia/normas , Conjuntos de Dados como Assunto , Humanos , Reprodutibilidade dos Testes , Relatório de Pesquisa/normas

3.

A systematic review of natural language processing applied to radiology reports.

Casey, Arlene; Davidson, Emma; Poon, Michael; Dong, Hang; Duma, Daniel; Grivas, Andreas; Grover, Claire; Suárez-Paniagua, Víctor; Tobin, Richard; Whiteley, William; Wu, Honghan; Alex, Beatrice.

BMC Med Inform Decis Mak ; 21(1): 179, 2021 06 03.

Artigo em Inglês | MEDLINE | ID: mdl-34082729

RESUMO

BACKGROUND: Natural language processing (NLP) has a significant role in advancing healthcare and has been found to be key in extracting structured information from radiology reports. Understanding recent developments in NLP application to radiology is of significance but recent reviews on this are limited. This study systematically assesses and quantifies recent literature in NLP applied to radiology reports. METHODS: We conduct an automated literature search yielding 4836 results using automated filtering, metadata enriching steps and citation search combined with manual review. Our analysis is based on 21 variables including radiology characteristics, NLP methodology, performance, study, and clinical application characteristics. RESULTS: We present a comprehensive analysis of the 164 publications retrieved with publications in 2019 almost triple those in 2015. Each publication is categorised into one of 6 clinical application categories. Deep learning use increases in the period but conventional machine learning approaches are still prevalent. Deep learning remains challenged when data is scarce and there is little evidence of adoption into clinical practice. Despite 17% of studies reporting greater than 0.85 F1 scores, it is hard to comparatively evaluate these approaches given that most of them use different datasets. Only 14 studies made their data and 15 their code available with 10 externally validating results. CONCLUSIONS: Automated understanding of clinical narratives of the radiology reports has the potential to enhance the healthcare process and we show that research in this field continues to grow. Reproducibility and explainability of models are important if the domain is to move applications into clinical use. More could be done to share code enabling validation of methods on different institutional data and to reduce heterogeneity in reporting of study properties allowing inter-study comparisons. Our results have significance for researchers in the field providing a systematic synthesis of existing work to build on, identify gaps, opportunities for collaboration and avoid duplication.

Assuntos

Sistemas de Informação em Radiologia , Radiologia , Humanos , Aprendizado de Máquina , Processamento de Linguagem Natural , Reprodutibilidade dos Testes

4.

Text mining brain imaging reports.

Alex, Beatrice; Grover, Claire; Tobin, Richard; Sudlow, Cathie; Mair, Grant; Whiteley, William.

J Biomed Semantics ; 10(Suppl 1): 23, 2019 11 12.

Artigo em Inglês | MEDLINE | ID: mdl-31711539

RESUMO

BACKGROUND: With the improvements to text mining technology and the availability of large unstructured Electronic Healthcare Records (EHR) datasets, it is now possible to extract structured information from raw text contained within EHR at reasonably high accuracy. We describe a text mining system for classifying radiologists' reports of CT and MRI brain scans, assigning labels indicating occurrence and type of stroke, as well as other observations. Our system, the Edinburgh Information Extraction for Radiology reports (EdIE-R) system, which we describe here, was developed and tested on a collection of radiology reports.The work reported in this paper is based on 1168 radiology reports from the Edinburgh Stroke Study (ESS), a hospital-based register of stroke and transient ischaemic attack patients. We manually created annotations for this data in parallel with developing the rule-based EdIE-R system to identify phenotype information related to stroke in radiology reports. This process was iterative and domain expert feedback was considered at each iteration to adapt and tune the EdIE-R text mining system which identifies entities, negation and relations between entities in each report and determines report-level labels (phenotypes). RESULTS: The inter-annotator agreement (IAA) for all types of annotations is high at 96.96 for entities, 96.46 for negation, 95.84 for relations and 94.02 for labels. The equivalent system scores on the blind test set are equally high at 95.49 for entities, 94.41 for negation, 98.27 for relations and 96.39 for labels for the first annotator and 96.86, 96.01, 96.53 and 92.61, respectively for the second annotator. CONCLUSION: Automated reading of such EHR data at such high levels of accuracies opens up avenues for population health monitoring and audit, and can provide a resource for epidemiological studies. We are in the process of validating EdIE-R in separate larger cohorts in NHS England and Scotland. The manually annotated ESS corpus will be available for research purposes on application.

Assuntos

Encéfalo/diagnóstico por imagem , Mineração de Dados , Neuroimagem , Relatório de Pesquisa , Registros Eletrônicos de Saúde , Humanos , Processamento de Linguagem Natural

5.

A validated natural language processing algorithm for brain imaging phenotypes from radiology reports in UK electronic health records.

Wheater, Emily; Mair, Grant; Sudlow, Cathie; Alex, Beatrice; Grover, Claire; Whiteley, William.

BMC Med Inform Decis Mak ; 19(1): 184, 2019 09 09.

Artigo em Inglês | MEDLINE | ID: mdl-31500613

RESUMO

BACKGROUND: Manual coding of phenotypes in brain radiology reports is time consuming. We developed a natural language processing (NLP) algorithm to enable automatic identification of brain imaging in radiology reports performed in routine clinical practice in the UK National Health Service (NHS). METHODS: We used anonymized text brain imaging reports from a cohort study of stroke/TIA patients and from a regional hospital to develop and test an NLP algorithm. Two experts marked up text in 1692 reports for 24 cerebrovascular and other neurological phenotypes. We developed and tested a rule-based NLP algorithm first within the cohort study, and further evaluated it in the reports from the regional hospital. RESULTS: The agreement between expert readers was excellent (Cohen's κ =0.93) in both datasets. In the final test dataset (n = 700) in unseen regional hospital reports, the algorithm had very good performance for a report of any ischaemic stroke [sensitivity 89% (95% CI:81-94); positive predictive value (PPV) 85% (76-90); specificity 100% (95% CI:0.99-1.00)]; any haemorrhagic stroke [sensitivity 96% (95% CI: 80-99), PPV 72% (95% CI:55-84); specificity 100% (95% CI:0.99-1.00)]; brain tumours [sensitivity 96% (CI:87-99); PPV 84% (73-91); specificity: 100% (95% CI:0.99-1.00)] and cerebral small vessel disease and cerebral atrophy (sensitivity, PPV and specificity all > 97%). We obtained few reports of subarachnoid haemorrhage, microbleeds or subdural haematomas. In 110,695 reports from NHS Tayside, atrophy (n = 28,757, 26%), small vessel disease (15,015, 14%) and old, deep ischaemic strokes (10,636, 10%) were the commonest findings. CONCLUSIONS: An NLP algorithm can be developed in UK NHS radiology records to allow identification of cohorts of patients with important brain imaging phenotypes at a scale that would otherwise not be possible.

Assuntos

Algoritmos , Registros Eletrônicos de Saúde , Processamento de Linguagem Natural , Neuroimagem , Radiologia , Adulto , Idoso , Estudos de Coortes , Feminino , Humanos , Masculino , Pessoa de Meia-Idade , Medicina Estatal , Acidente Vascular Cerebral/diagnóstico por imagem , Reino Unido , Adulto Jovem

6.

Scholarly Context Adrift: Three out of Four URI References Lead to Changed Content.

Jones, Shawn M; Van de Sompel, Herbert; Shankar, Harihar; Klein, Martin; Tobin, Richard; Grover, Claire.

PLoS One ; 11(12): e0167475, 2016.

Artigo em Inglês | MEDLINE | ID: mdl-27911955

RESUMO

Increasingly, scholarly articles contain URI references to "web at large" resources including project web sites, scholarly wikis, ontologies, online debates, presentations, blogs, and videos. Authors reference such resources to provide essential context for the research they report on. A reader who visits a web at large resource by following a URI reference in an article, some time after its publication, is led to believe that the resource's content is representative of what the author originally referenced. However, due to the dynamic nature of the web, that may very well not be the case. We reuse a dataset from a previous study in which several authors of this paper were involved, and investigate to what extent the textual content of web at large resources referenced in a vast collection of Science, Technology, and Medicine (STM) articles published between 1997 and 2012 has remained stable since the publication of the referencing article. We do so in a two-step approach that relies on various well-established similarity measures to compare textual content. In a first step, we use 19 web archives to find snapshots of referenced web at large resources that have textual content that is representative of the state of the resource around the time of publication of the referencing paper. We find that representative snapshots exist for about 30% of all URI references. In a second step, we compare the textual content of representative snapshots with that of their live web counterparts. We find that for over 75% of references the content has drifted away from what it was when referenced. These results raise significant concerns regarding the long term integrity of the web-based scholarly record and call for the deployment of techniques to combat these problems.

Assuntos

Internet , Publicações

7.

Use of the Edinburgh geoparser for georeferencing digitized historical collections.

Grover, Claire; Tobin, Richard; Byrne, Kate; Woollard, Matthew; Reid, James; Dunn, Stuart; Ball, Julian.

Philos Trans A Math Phys Eng Sci ; 368(1925): 3875-89, 2010 Aug 28.

Artigo em Inglês | MEDLINE | ID: mdl-20643682

RESUMO

We report on two JISC-funded projects that aimed to enrich the metadata of digitized historical collections with georeferences and other information automatically computed using geoparsing and related information extraction technologies. Understanding location is a critical part of any historical research, and the nature of the collections makes them an interesting case study for testing automated methodologies for extracting content. The two projects (GeoDigRef and Embedding GeoCrossWalk) have looked at how automatic georeferencing of resources might be useful in developing improved geographical search capacities across collections. In this paper, we describe the work that was undertaken to configure the geoparser for the collections as well as the evaluations that were performed.

8.

Automating curation using a natural language processing pipeline.

Alex, Beatrice; Grover, Claire; Haddow, Barry; Kabadjov, Mijail; Klein, Ewan; Matthews, Michael; Tobin, Richard; Wang, Xinglong.

Genome Biol ; 9 Suppl 2: S10, 2008.

Artigo em Inglês | MEDLINE | ID: mdl-18834488

RESUMO

BACKGROUND: The tasks in BioCreative II were designed to approximate some of the laborious work involved in curating biomedical research papers. The approach to these tasks taken by the University of Edinburgh team was to adapt and extend the existing natural language processing (NLP) system that we have developed as part of a commercial curation assistant. Although this paper concentrates on using NLP to assist with curation, the system can be equally employed to extract types of information from the literature that is immediately relevant to biologists in general. RESULTS: Our system was among the highest performing on the interaction subtasks, and competitive performance on the gene mention task was achieved with minimal development effort. For the gene normalization task, a string matching technique that can be quickly applied to new domains was shown to perform close to average. CONCLUSION: The technologies being developed were shown to be readily adapted to the BioCreative II tasks. Although high performance may be obtained on individual tasks such as gene mention recognition and normalization, and document classification, tasks in which a number of components must be combined, such as detection and normalization of interacting protein pairs, are still challenging for NLP systems.

Assuntos

Automação , Processamento de Linguagem Natural , Genes , Reprodutibilidade dos Testes

9.

Assisted curation: does text mining really help?

Alex, Beatrice; Grover, Claire; Haddow, Barry; Kabadjov, Mijail; Klein, Ewan; Matthews, Michael; Roebuck, Stuart; Tobin, Richard; Wang, Xinglong.

Pac Symp Biocomput ; : 556-67, 2008.

Artigo em Inglês | MEDLINE | ID: mdl-18229715

RESUMO

Although text mining shows considerable promise as a tool for supporting the curation of biomedical text, there is little concrete evidence as to its effectiveness. We report on three experiments measuring the extent to which curation can be speeded up with assistance from Natural Language Processing (NLP), together with subjective feedback from curators on the usability of a curation tool that integrates NLP hypotheses for protein-protein interactions (PPIs). In our curation scenario, we found that a maximum speed-up of 1/3 in curation time can be expected if NLP output is perfectly accurate. The preference of one curator for consistent NLP output and output with high recall needs to be confirmed in a larger study with several curators.

Assuntos

Bases de Dados Factuais , Armazenamento e Recuperação da Informação , Processamento de Linguagem Natural , Inteligência Artificial , Biologia Computacional , Mapeamento de Interação de Proteínas/estatística & dados numéricos

10.

Exploring the boundaries: gene and protein identification in biomedical text.

Finkel, Jenny; Dingare, Shipra; Manning, Christopher D; Nissim, Malvina; Alex, Beatrice; Grover, Claire.

BMC Bioinformatics ; 6 Suppl 1: S5, 2005.

Artigo em Inglês | MEDLINE | ID: mdl-15960839

RESUMO

BACKGROUND: Good automatic information extraction tools offer hope for automatic processing of the exploding biomedical literature, and successful named entity recognition is a key component for such tools. METHODS: We present a maximum-entropy based system incorporating a diverse set of features for identifying gene and protein names in biomedical abstracts. RESULTS: This system was entered in the BioCreative comparative evaluation and achieved a precision of 0.83 and recall of 0.84 in the "open" evaluation and a precision of 0.78 and recall of 0.85 in the "closed" evaluation. CONCLUSION: Central contributions are rich use of features derived from the training data at multiple levels of granularity, a focus on correctly identifying entity boundaries, and the innovative use of several external knowledge sources including full MEDLINE abstracts and web searches.

Assuntos

Pesquisa Biomédica/classificação , Genes , Literatura , Proteínas/classificação , Pesquisa Biomédica/métodos , Biologia Computacional/classificação , Biologia Computacional/métodos , Armazenamento e Recuperação da Informação/classificação , Armazenamento e Recuperação da Informação/métodos , Terminologia como Assunto

11.

A system for identifying named entities in biomedical text: how results from two evaluations reflect on both the system and the evaluations.

Dingare, Shipra; Nissim, Malvina; Finkel, Jenny; Manning, Christopher; Grover, Claire.

Comp Funct Genomics ; 6(1-2): 77-85, 2005.

Artigo em Inglês | MEDLINE | ID: mdl-18629295

RESUMO

We present a maximum entropy-based system for identifying named entities (NEs) in biomedical abstracts and present its performance in the only two biomedical named entity recognition (NER) comparative evaluations that have been held to date, namely BioCreative and Coling BioNLP. Our system obtained an exact match F-score of 83.2% in the BioCreative evaluation and 70.1% in the BioNLP evaluation. We discuss our system in detail, including its rich use of local features, attention to correct boundary identification, innovative use of external knowledge resources, including parsing and web searches, and rapid adaptation to new NE sets. We also discuss in depth problems with data annotation in the evaluations which caused the final performance to be lower than optimal.

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA