Search | Index Medicus Global

Class-imbalance Prediction and High-dimensional Risk Factor Identification of Adverse Reactions of Traditional Chinese Medicine with Centralized Monitoring in Real-world Hospitals / 中国实验方剂学杂志

Feibiao XIE; Yehui PENG; Wei YANG; Jinfa TANG; Juan LIU; Weixia LI; Hui ZHANG; Dongyuan WU; Yali WU; Yuanming LENG; Xinghua XIANG.

Chinese Journal of Experimental Traditional Medical Formulae ; (24): 114-122, 2023.

Article in Chinese | WPRIM | ID: wpr-975163

Abstract

Objective: To predict adverse drug reactions (ADR) of traditional Chinese medicine (TCM) from high-dimensional, class-imbalanced data, and to classify and identify the risk factors affecting ADR occurrence, based on post-marketing safety data of TCM centrally monitored in real-world hospitals. Method: Ensemble clustering resampling combined with regularized Group Lasso regression was used to balance the high-dimensional, class-imbalanced ADR data; the balanced datasets were then integrated to predict ADR and to identify risk factors by category. Result: In a practical example applying the proposed method to monitoring data for a TCM injection, the accuracy, sensitivity, specificity and area under the receiver operating characteristic curve (AUC) of ADR prediction were all above 0.8 on the test set. Meanwhile, 40 risk factors affecting ADR occurrence were screened out from a total of 600 high-dimensional variables, and the effect of the risk factors on ADR occurrence was identified by classification weighting. The important risk factors were classified as follows: past history, medication information, names of combined drugs, disease status, number of combined drugs and personal data. Conclusion: For real-world data on rare ADR with a large number of clinical variables, this paper achieved accurate ADR prediction under high-dimensional and class-imbalanced conditions, and classified and identified the key risk factors and the clinical significance of their categories, so as to provide early risk warning for rational clinical drug use and combined drug use, as well as a scientific basis for the safety re-evaluation of post-marketing TCM.
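The balancing step described in this abstract can be illustrated with a minimal sketch. It is not the authors' method: it assumes binary labels with 1 marking the rare ADR class, and it swaps the paper's clustering-based resampling for a plain random partition of the majority class. The function name `balanced_subsets` and the toy label vector are hypothetical.

```python
import random


def balanced_subsets(labels, seed=0):
    """Build balanced index subsets from imbalanced binary labels.

    Every subset holds all minority (label 1) indices plus one disjoint
    chunk of the shuffled majority (label 0) indices. A model could then
    be fitted on each subset and the fits combined (not shown here).
    Assumption: a random partition stands in for clustering.
    """
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    if not minority or not majority:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    rng.shuffle(majority)
    # Number of subsets is chosen so each majority chunk is about
    # the size of the minority class.
    n_subsets = max(1, len(majority) // len(minority))
    chunks = [majority[k::n_subsets] for k in range(n_subsets)]
    return [sorted(minority + chunk) for chunk in chunks]


# Hypothetical toy data: 5 ADR cases among 100 records.
labels = [1] * 5 + [0] * 95
subsets = balanced_subsets(labels)
```

Each subset here contains the same minority records, so downstream results from the subsets are correlated; that is a property of this simplified scheme, not a claim about the paper.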

Simulation study on variable selection method for high-dimensional biomedical data / 西安交通大学学报(医学版)

Jingxian WANG; Peng ZHAO; Yemian LI; Yuhui YANG; Fangyao CHEN.

Journal of Xi'an Jiaotong University(Medical Sciences) ; (6): 628-632, 2021.

Article in Chinese | WPRIM | ID: wpr-1006702

Abstract

【Objective】 To compare the performance of five commonly used variable selection methods in screening high-dimensional biomedical data, to explore the effects of sample size and of association among candidate variables on screening results, and to provide evidence for developing variable selection strategies in high-dimensional biomedical data analysis. 【Methods】 Variable selection algorithms were implemented in the R programming language. The Monte Carlo method was used to simulate high-dimensional biomedical data under different conditions in order to evaluate and compare the performance of the variable selection methods. Performance was evaluated by the true positive rate and true negative rate of screening. 【Results】 For the specified high-dimensional data, variable selection performance improved for all methods as sample size increased, and the association between candidate variables affected the screening results. Simulation results indicated that the elastic net algorithm yielded the best screening performance, the LASSO algorithm ranked second, and the ridge algorithm did not work at all. 【Conclusion】 The elastic net algorithm is an ideal variable screening method for high-dimensional data.
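The evaluation criterion named in this abstract, the true positive rate and true negative rate of screening, can be sketched as follows. This is an illustration only, written in Python rather than the R used in the study. The function name `selection_rates`, the index-based encoding of variables and the toy numbers are assumptions.

```python
def selection_rates(true_vars, selected_vars, n_candidates):
    """Return (TPR, TNR) for one variable-selection run.

    true_vars:     indices of variables truly associated with the outcome
    selected_vars: indices kept by the selection method
    n_candidates:  total number of candidate variables (indexed from 0)
    """
    true_set = set(true_vars)
    selected = set(selected_vars)
    noise = set(range(n_candidates)) - true_set
    true_pos = len(true_set & selected)   # true variables that were kept
    true_neg = len(noise - selected)      # noise variables that were dropped
    tpr = true_pos / len(true_set) if true_set else float("nan")
    tnr = true_neg / len(noise) if noise else float("nan")
    return tpr, tnr


# Hypothetical run: 10 candidates, variables 0-2 are truly associated,
# and the method selected variables 0, 1 and 5.
tpr, tnr = selection_rates([0, 1, 2], [0, 1, 5], 10)
```

In a Monte Carlo setting these two rates would be averaged over many simulated datasets per condition.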

Radiomics as a Quantitative Imaging Biomarker: Practical Considerations and the Current Standpoint in Neuro-oncologic Studies / 대한핵의학회잡지

Ji-Eun PARK; Ho-Sung KIM.

Korean Journal of Nuclear Medicine ; : 99-108, 2018.

Article in English | WPRIM | ID: wpr-786980

Abstract

Radiomics uses high-dimensional imaging data to discover associations with diagnostic, prognostic or predictive endpoints, or for radiogenomics. It is an emerging field of study that can potentially depict intratumoral heterogeneity from quantitative and classified high-throughput data. The radiomics approach has an analytic pipeline in which imaging features are extracted, processed and analyzed. Special data handling is essential at the analysis stage because, unlike a single-biomarker approach, radiomics must deal with the problems of a high-dimensional biomarker. This article describes the potential role of radiomics in oncologic studies, the basic analytic pipeline, and the special handling of high-dimensional data needed to make the radiomics approach a tool for personalized medicine in oncology.

Subjects

Population Characteristics, Precision Medicine

An ensemble-based likelihood ratio approach for family-based genomic risk prediction / 浙江大学学报（英文版）（B辑：生物医学和生物技术）

Hui AN; Chang-Shuai WEI; Oliver WANG; Da-Hui WANG; Liang-Wen XU; Qing LU; Cheng-Yin YE.

Journal of Zhejiang University. Science. B ; (12): 935-947, 2018.

Article in English | WPRIM | ID: wpr-1010434

Abstract

OBJECTIVE: As one of the most popular designs used in genetic research, the family-based design is well recognized for its advantages, such as robustness against population stratification and admixture. With vast amounts of genetic data collected from family-based studies, there is great interest in studying the role of genetic markers in risk prediction. This study aims to develop a new statistical approach for family-based risk prediction analysis with improved prediction accuracy compared with existing methods based on family history. METHODS: In this study, we propose an ensemble-based likelihood ratio (ELR) approach, Fam-ELR, for family-based genomic risk prediction. Fam-ELR incorporates a clustered receiver operating characteristic (ROC) curve method to account for correlations among family samples, and uses a computationally efficient tree-assembling procedure for variable selection and model building. RESULTS: In simulations, Fam-ELR is robust across various underlying disease models and pedigree structures, and attains better performance than two existing family-based risk prediction methods. In a real-data application to a family-based genome-wide dataset of conduct disorder, Fam-ELR demonstrates its ability to integrate potential risk predictors and interactions into the model for improved accuracy, especially at the genome-wide level. CONCLUSIONS: Compared with existing approaches, such as the genetic risk-score approach, Fam-ELR is able to incorporate genetic variants with small or moderate marginal effects, and their interactions, into an improved risk prediction model. It is therefore a robust and useful approach for high-dimensional family-based risk prediction, especially for complex diseases with unknown or poorly understood etiology.

Subjects

Female, Humans, Male, Area Under Curve, Computer Simulation, Conduct Disorder/physiopathology, Family Health, Genetic Markers, Genetic Predisposition to Disease, Genetic Variation, Human Genome, Genome-Wide Association Study, Genomics, Likelihood Functions, Genetic Models, Odds Ratio, Pedigree, ROC Curve, Reproducibility of Results, Risk Factors

Introduction to Bayesian variable selection methods in high-dimensional omics data analysis / 中华流行病学杂志

Xiaoqiang DONG; Shuhong XU; Ran TAO; Tong WANG.

Chinese Journal of Epidemiology ; (12): 679-683, 2017.

Article in Chinese | WPRIM | ID: wpr-737706

Abstract

With the rapid development of genome sequencing technology and bioinformatics in recent years, it has become possible to measure thousands of omics variables that might be associated with disease progression, i.e., "high-dimensional data". A common feature of such omics data is that the number of variables p is usually greater than the number of observations n, and the independent variables are often highly correlated. It is therefore a great statistical challenge to identify truly meaningful variables from omics data. This paper summarizes Bayesian variable selection methods for the analysis of high-dimensional data.

Introduction to Bayesian variable selection methods in high-dimensional omics data analysis / 中华流行病学杂志

Xiaoqiang DONG; Shuhong XU; Ran TAO; Tong WANG.

Chinese Journal of Epidemiology ; (12): 679-683, 2017.

Article in Chinese | WPRIM | ID: wpr-736238

GraPT: Genomic InteRpreter about Predictive Toxicology

Jung-Hoon WOO; Yu-Rang PARK; Yong JUNG; Ji-Hun KIM; Ju-Han KIM.

Genomics & Informatics ; : 129-132, 2006.

Article in English | WPRIM | ID: wpr-61948

Abstract

Toxicogenomics has recently emerged in the field of toxicology, and the DNA microarray technique has become a common strategy in predictive toxicology, which studies the molecular mechanisms triggered by exposure to chemical or environmental stress. Although microarray experiments offer researchers extensive genomic information, the high-dimensional nature of the data often makes it hard to extract meaningful results. We therefore developed a toxicant enrichment analysis similar to the common enrichment approach. We also developed a web-based system, graPT, to support prediction of the toxic endpoints of experimental chemicals.

Subjects

Oligonucleotide Array Sequence Analysis, Toxicogenetics, Toxicology

Data Mining for High Dimensional Data in Drug Discovery and Development

Kwan-R LEE; Daniel-C PARK; Xiwu LIN; Sergio ESLAVA.

Genomics & Informatics ; : 65-74, 2003.

Article in English | WPRIM | ID: wpr-197484

Abstract

Data mining differs from traditional data analysis primarily in one important dimension, namely the scale of the data. That is why computer science principles, as well as statistical ones, are needed to extract information from large data sets. In this paper we briefly review data mining, its characteristics, typical data mining algorithms, and potential and ongoing applications of data mining in the biopharmaceutical industry. The distinguishing characteristics of data mining lie in its understandability, its scalability, its problem-driven nature, and its analysis of retrospective or observational data in contrast to experimentally designed data. At a high level one can identify three types of problems for which data mining is useful: description, prediction and search. The brief review of data mining algorithms includes decision trees and rules, nonlinear classification methods, memory-based methods, model-based clustering, and graphical dependency models. Application areas covered are discovery compound libraries, clinical trial and disease management data, genomics and proteomics, structural databases for candidate drug compounds, and other applications of pharmaceutical relevance.

Subjects

Classification, Data Mining, Dataset, Decision Trees, Disease Management, Drug Discovery, Genomics, Proteomics, Retrospective Studies, Statistics as Topic
