Automatic text classification of drug-induced liver injury using document-term matrix and XGBoost.

Chen, Minjun; Wu, Yue; Wingerd, Byron; Liu, Zhichao; Xu, Joshua; Thakkar, Shraddha; Pedersen, Thomas J; Donnelly, Tom; Mann, Nicholas; Tong, Weida; Wolfinger, Russell D; Bao, Wenjun

Affiliation

Chen M; Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, United States.
Wu Y; Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, United States.
Wingerd B; JMP Statistical Discovery LLC, Cary, NC, United States.
Liu Z; Boehringer Ingelheim Pharmaceuticals, Inc., Ridgefield, CT, United States.
Xu J; Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, United States.
Thakkar S; Department of Pharmaceutical Sciences, University of Arkansas for Medical Sciences, Little Rock, AR, United States.
Pedersen TJ; JMP Statistical Discovery LLC, Cary, NC, United States.
Donnelly T; JMP Statistical Discovery LLC, Cary, NC, United States.
Mann N; Department of Mathematics, The University of North Carolina at Chapel Hill, Chapel Hill, NC, United States.
Tong W; Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, United States.
Wolfinger RD; JMP Statistical Discovery LLC, Cary, NC, United States.
Bao W; JMP Statistical Discovery LLC, Cary, NC, United States.

Front Artif Intell ; 7: 1401810, 2024.

Article in English | MEDLINE | ID: mdl-38887604

ABSTRACT

Introduction:

Regulatory agencies generate a vast amount of textual data during the review process. For example, drug labeling is a valuable resource that regulatory agencies, such as the U.S. Food and Drug Administration (FDA) and the European Medicines Agency (EMA), use to communicate drug safety and effectiveness information to healthcare professionals and patients. Drug labeling also serves as a resource for pharmacovigilance and drug safety research. Automated text classification would significantly improve the analysis of drug labeling documents and conserve reviewer resources.

Methods:

In this study, we used artificial intelligence to classify drug-induced liver injury (DILI)-related content in drug labeling documents, based on FDA's DILIrank dataset. We employed text mining and XGBoost models, and used the Preferred Terms of Medical queries for adverse event standards to simplify the elimination of common words and phrases while retaining standard medical terms for the FDA and EMA drug label datasets. We then constructed a document-term matrix using weights computed by Term Frequency-Inverse Document Frequency (TF-IDF) for each included word/term/token.
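The construction of a TF-IDF-weighted document-term matrix restricted to a retained vocabulary can be illustrated with a minimal sketch. This is not the authors' implementation: the vocabulary, the example sentences, and the function names are hypothetical, and the weighting shown (raw term count times ln(N / df)) is only one common TF-IDF variant, since the abstract does not state which one was used.

```python
import math
import re
from collections import Counter

# Hypothetical controlled vocabulary standing in for the retained medical terms.
VOCAB = ["hepatotoxicity", "jaundice", "hepatitis", "nausea"]

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def tfidf_matrix(docs, vocab):
    """Return a document-term matrix (one row per document) weighted by TF-IDF.

    Assumed weighting: tf = raw count, idf = ln(N / df). Terms that occur
    in no document get weight 0. Words outside `vocab` are ignored.
    """
    counts = [Counter(tokenize(d)) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = {t: sum(1 for c in counts if c[t] > 0) for t in vocab}
    idf = {t: math.log(n_docs / df[t]) if df[t] else 0.0 for t in vocab}
    return [[c[t] * idf[t] for t in vocab] for c in counts]

# Made-up label-like sentences, for illustration only.
docs = [
    "Cases of hepatotoxicity and jaundice have been reported.",
    "Nausea was the most common adverse reaction.",
    "Hepatitis and jaundice occurred; nausea was also reported.",
]
dtm = tfidf_matrix(docs, VOCAB)
```

Each row of `dtm` is the feature vector for one document; in a pipeline like the one described, such rows would be the input to a gradient-boosted tree classifier such as XGBoost.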

Results:

The automatic text classification model exhibited robust performance in predicting DILI, achieving cross-validation AUC scores exceeding 0.90 both for drug labels from the FDA and EMA and for literature abstracts from the Critical Assessment of Massive Data Analysis (CAMDA).
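For readers unfamiliar with the metric, the AUC can be read as the probability that a randomly chosen positive document receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it by direct pairwise comparison; the labels and scores are made-up toy values, not results from the study, and `roc_auc` is a hypothetical helper name.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for positive (e.g. DILI-related), 0 for negative.
    scores: predicted scores, higher meaning more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy held-out fold: one positive (score 0.6) is ranked below one negative (0.7).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

In cross-validation, a score of this kind would be computed on each held-out fold and then summarized across folds.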

Discussion:

The text mining and XGBoost functions demonstrated in this study can also be applied to other text processing and classification tasks.

Keywords

CatBoost; LightGBM; Matthews correlation coefficient (MCC); TF-IDF; XGBoost; anatomical therapeutic chemical classification (ATC); area under the curve

Full text: 1 Collection: 01-international Database: MEDLINE Language: English Journal: Front Artif Intell Year: 2024 Document type: Article Affiliation country: United States Country of publication: Switzerland
