Large language model produces high accurate diagnosis of cancer from end-motif profiles of cell-free DNA.

Liu, Jilei; Shen, Hongru; Chen, Kexin; Li, Xiangchun

Affiliation

Liu J; Key Laboratory of Cancer Prevention and Therapy, Tianjin Cancer Institute, Tianjin's Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Tianjin Medical University, Tianjin, 300060, China.
Shen H; Key Laboratory of Cancer Prevention and Therapy, Tianjin Cancer Institute, Tianjin's Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Tianjin Medical University, Tianjin, 300060, China.
Chen K; Department of Epidemiology and Biostatistics, Key Laboratory of Molecular Cancer Epidemiology of Tianjin, Key Laboratory of Cancer Prevention and Therapy, Tianjin's Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Ca
Li X; Key Laboratory of Cancer Prevention and Therapy, Tianjin Cancer Institute, Tianjin's Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Tianjin Medical University, Tianjin, 300060, China.

Brief Bioinform; 25(5), 2024 Jul 25.

Article in English | MEDLINE | ID: mdl-39222060

ABSTRACT

Instruction-tuned large language models (LLMs) demonstrate exceptional ability to align with human intentions. We present an LLM-based model, instruction-tuned LLM for assessment of cancer (iLLMAC), that can detect cancer using cell-free deoxyribonucleic acid (cfDNA) end-motif profiles. Developed on plasma cfDNA sequencing data from 1135 cancer patients and 1106 controls across three datasets, iLLMAC achieved an area under the receiver operating characteristic curve (AUROC) of 0.866 [95% confidence interval (CI), 0.773-0.959] for cancer diagnosis and 0.924 (95% CI, 0.841-1.0) for hepatocellular carcinoma (HCC) detection using 16 end-motifs. Performance increased with more motifs, reaching 0.886 (95% CI, 0.794-0.977) for cancer diagnosis and 0.956 (95% CI, 0.89-1.0) for HCC detection with 64 end-motifs. On an external testing set, iLLMAC achieved an AUROC of 0.912 (95% CI, 0.849-0.976) for cancer diagnosis and 0.938 (95% CI, 0.885-0.992) for HCC detection with 64 end-motifs, significantly outperforming the benchmarked methods. Furthermore, iLLMAC achieved high classification performance on datasets generated with bisulfite sequencing and 5-hydroxymethylcytosine sequencing. Our study highlights the effectiveness of LLM-based instruction-tuning for cfDNA-based cancer detection.
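
As an illustration of the input the abstract refers to, an end-motif profile can be sketched as the relative frequency of each short sequence observed at cfDNA fragment ends. The abstract does not state how the profiles were computed, which motif length was used, or how the 16 or 64 motifs were chosen, so everything below is an assumption: the function name `end_motif_profile`, the choice of 4-mers at the 5' end, and the toy fragments are all hypothetical and do not reproduce the authors' pipeline.

```python
from collections import Counter
from itertools import product


def end_motif_profile(fragments, k=4):
    """Return the relative frequency of each k-mer seen at fragment 5' ends.

    Assumption: motifs are the first k bases of each fragment; the
    record does not confirm this definition.
    """
    motifs = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter()
    for seq in fragments:
        motif = seq[:k].upper()
        # Skip fragments that are too short or contain ambiguous bases.
        if len(motif) == k and set(motif) <= set("ACGT"):
            counts[motif] += 1
    total = sum(counts.values())
    # Every possible motif gets an entry, so profiles are comparable
    # across samples even when some motifs are never observed.
    return {m: (counts[m] / total if total else 0.0) for m in motifs}


# Toy fragments, invented for illustration only.
toy_fragments = ["CCCAGTTT", "CCCATGCA", "CCAGGGTA", "NNNNACGT", "AC"]
profile = end_motif_profile(toy_fragments)
```

With k=4 the profile has 256 entries; a subset of these (for example 16 or 64 motifs) could then be selected as model input, although the selection rule is not given in this record.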

Subjects

Carcinoma, Hepatocellular; Cell-Free Nucleic Acids; Humans; Cell-Free Nucleic Acids/blood; Carcinoma, Hepatocellular/diagnosis; Carcinoma, Hepatocellular/genetics; Carcinoma, Hepatocellular/blood; Liver Neoplasms/diagnosis; Liver Neoplasms/genetics; Liver Neoplasms/blood; Neoplasms/diagnosis; Neoplasms/genetics; Neoplasms/blood; ROC Curve; Biomarkers, Tumor/genetics; Biomarkers, Tumor/blood; Nucleotide Motifs; DNA Methylation

Keywords

cell-free DNA; early cancer diagnosis; large language models

Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Carcinoma, Hepatocellular / Cell-Free Nucleic Acids | Limits: Humans | Language: En | Journal: Brief Bioinform | Journal subject: BIOLOGY / MEDICAL INFORMATICS | Publication year: 2024 | Document type: Article | Affiliation country: China | Publication country: United Kingdom
