Distilling large language models for matching patients to clinical trials.

Nievas, Mauro; Basu, Aditya; Wang, Yanshan; Singh, Hrituraj

Affiliation

Nievas M; Triomics Research, Triomics, Inc., San Francisco, CA 94105, United States.
Basu A; Triomics Research, Triomics, Inc., Bengaluru, Karnataka 560102, India.
Wang Y; Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA 15260, United States.
Singh H; Triomics Research, Triomics, Inc., San Francisco, CA 94105, United States.

J Am Med Inform Assoc ; 31(9): 1953-1963, 2024 Sep 01.

Article in English | MEDLINE | ID: mdl-38641416

ABSTRACT

OBJECTIVE:

The objective of this study is to systematically examine the efficacy of both proprietary (GPT-3.5, GPT-4) and open-source large language models (LLMs) (LLAMA 7B, 13B, 70B) in the context of matching patients to clinical trials in healthcare.

MATERIALS AND METHODS:

The study employs a multifaceted evaluation framework, incorporating extensive automated and human-centric assessments along with a detailed error analysis for each model, and assesses LLMs' capabilities in analyzing patient eligibility against clinical trials' inclusion and exclusion criteria. To improve the adaptability of open-source LLMs, a specialized synthetic dataset was created using GPT-4, facilitating effective fine-tuning under constrained data conditions.

RESULTS:

The findings indicate that open-source LLMs, when fine-tuned on this limited and synthetic dataset, achieve performance parity with their proprietary counterparts, such as GPT-3.5.

DISCUSSION:

This study highlights the recent success of LLMs in the high-stakes domain of healthcare, specifically in patient-trial matching. The research demonstrates the potential of open-source models to match the performance of proprietary models when fine-tuned appropriately, addressing challenges like cost, privacy, and reproducibility concerns associated with closed-source proprietary LLMs.

CONCLUSION:

The study underscores the opportunity for open-source LLMs in patient-trial matching. To encourage further research and applications in this field, the annotated evaluation dataset and the fine-tuned LLM, Trial-LLAMA, are released for public use.

Subjects

Clinical Trials as Topic; Patient Selection; Humans; Programming Languages; Natural Language Processing

Keywords

GPT-3.5; GPT-4; LLAMA; clinical trial matching; distillation; large language models

Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Clinical Trials as Topic / Patient Selection | Limits: Humans | Language: En | Journal: J Am Med Inform Assoc / J. am. med. inform. assoc / Journal of the american medical informatics association | Journal subject: MEDICAL INFORMATICS | Publication year: 2024 | Document type: Article | Affiliation country: United States | Publication country: United Kingdom