Search | Portal Regional da BVS

Performance of two large language models for data extraction in evidence synthesis.

Konet, Amanda; Thomas, Ian; Gartlehner, Gerald; Kahwati, Leila; Hilscher, Rainer; Kugley, Shannon; Crotty, Karen; Viswanathan, Meera; Chew, Robert.

Res Synth Methods ; 2024 Jun 19.

Article in English | MEDLINE | ID: mdl-38895747

ABSTRACT

Accurate data extraction is a key component of evidence synthesis and critical to valid results. The advent of publicly available large language models (LLMs) has generated interest in these tools for evidence synthesis and created uncertainty about the choice of LLM. We compare the performance of two widely available LLMs (Claude 2 and GPT-4) for extracting prespecified data elements from 10 published articles included in a previously completed systematic review. We use prompts and full study PDFs to compare the outputs from the browser versions of Claude 2 and GPT-4. GPT-4 required use of a third-party plugin to upload and parse PDFs. Accuracy was high for Claude 2 (96.3%). The accuracy of GPT-4 with the plugin was lower (68.8%); however, most of the errors were due to the plugin. Both LLMs correctly recognized when prespecified data elements were missing from the source PDF and generated correct information for data elements that were not reported explicitly in the articles. A secondary analysis demonstrated that, when provided selected text from the PDFs, Claude 2 and GPT-4 accurately extracted 98.7% and 100% of the data elements, respectively. Limitations include the narrow scope of the study PDFs used, that prompt development was completed using only Claude 2, and that we cannot guarantee the open-access articles were not used to train the LLMs. This study highlights the potential for LLMs to revolutionize data extraction but underscores the importance of accurate PDF parsing. For now, it remains essential for a human investigator to validate LLM extractions.

Data extraction for evidence synthesis using a large language model: A proof-of-concept study.

Gartlehner, Gerald; Kahwati, Leila; Hilscher, Rainer; Thomas, Ian; Kugley, Shannon; Crotty, Karen; Viswanathan, Meera; Nussbaumer-Streit, Barbara; Booth, Graham; Erskine, Nathaniel; Konet, Amanda; Chew, Robert.

Res Synth Methods ; 15(4): 576-589, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38432227

ABSTRACT

Data extraction is a crucial yet labor-intensive and error-prone part of evidence synthesis. To date, efforts to harness machine learning to make data extraction more efficient have fallen short of achieving sufficient accuracy and usability. With the release of large language models (LLMs), new possibilities have emerged to increase efficiency and accuracy of data extraction for evidence synthesis. The objective of this proof-of-concept study was to assess the performance of an LLM (Claude 2) in extracting data elements from published studies, compared with human data extraction as employed in systematic reviews. Our analysis utilized a convenience sample of 10 English-language, open-access publications of randomized controlled trials included in a single systematic review. We selected 16 distinct types of data, posing varying degrees of difficulty (160 data elements across 10 studies). We used the browser version of Claude 2 to upload the portable document format (PDF) file of each publication and then prompted the model for each data element. Across 160 data elements, Claude 2 demonstrated an overall accuracy of 96.3% with a high test-retest reliability (replication 1: 96.9%; replication 2: 95.0% accuracy). Overall, Claude 2 made 6 errors on 160 data items. The most common errors (n = 4) were missed data items. Importantly, Claude 2's ease of use was high; it required no technical expertise or labeled training data for effective operation (i.e., zero-shot learning). Based on findings of our proof-of-concept study, leveraging LLMs has the potential to substantially enhance the efficiency and accuracy of data extraction for evidence syntheses.
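
The figures reported in this abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the study: the function names are hypothetical, and it assumes percentages were rounded half-up to one decimal place. It recomputes the overall accuracy from 16 data types, 10 studies, and 6 errors, and then infers which counts of correct items would be consistent with the two replication percentages. Those counts are not stated in the abstract.

```python
from decimal import Decimal, ROUND_HALF_UP


def accuracy_percent(correct: int, total: int) -> Decimal:
    """Accuracy as a percentage, rounded half-up to one decimal place.

    Decimal is used because the built-in round() rounds ties to even,
    which would turn 96.25 into 96.2 rather than 96.3.
    """
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need total > 0 and 0 <= correct <= total")
    pct = Decimal(100 * correct) / Decimal(total)
    return pct.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def counts_matching(reported: str, total: int) -> list[int]:
    """All counts of correct items whose rounded accuracy equals `reported`."""
    target = Decimal(reported)
    return [c for c in range(total + 1) if accuracy_percent(c, total) == target]


# Values taken from the abstract.
DATA_TYPES = 16
STUDIES = 10
ERRORS = 6

total_elements = DATA_TYPES * STUDIES            # 160 data elements
overall = accuracy_percent(total_elements - ERRORS, total_elements)

# Inferred, not reported: counts consistent with each replication's accuracy.
replication_1 = counts_matching("96.9", total_elements)
replication_2 = counts_matching("95.0", total_elements)

print(f"overall accuracy: {overall}%")
print(f"replication 1 correct items (inferred): {replication_1}")
print(f"replication 2 correct items (inferred): {replication_2}")
```

Under the stated rounding assumption, 154 correct items out of 160 reproduces the reported 96.3%, and each replication percentage corresponds to exactly one possible count of correct items.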

Subjects

Machine Learning, Proof of Concept Study, Humans, Reproducibility of Results, Systematic Reviews as Topic, Randomized Controlled Trials as Topic, Algorithms, Information Storage and Retrieval/methods, Language, Software, Natural Language Processing, Research Design
