Survival analysis in breast cancer: evaluating ensemble learning techniques for prediction.

Buyrukoglu, Gonca.

PeerJ Comput Sci ; 10: e2147, 2024.

Article in English | MEDLINE | ID: mdl-39145224

ABSTRACT

Breast cancer is the most common form of cancer among women worldwide. Although breast cancer research and awareness have gained considerable momentum, there is still no single treatment because of disease heterogeneity. Survival data are of particular interest in breast cancer studies for understanding the disease's dynamic and complex trajectories. This study examines the most important covariates affecting disease progression. It uses the German Breast Cancer Study Group 2 (GBSG2) and the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) datasets. In both datasets, interest lies in relapse of the disease and the time at which relapse occurs. Three models, namely the Cox proportional hazards (PH) model, random survival forest (RSF), and conditional inference forest (Cforest), were employed to analyse the breast cancer datasets. The goal of this study is to apply these methods to the prediction of breast cancer progression and to compare their performance under two estimation methods: the bootstrap estimation and the bootstrap .632 estimation. Model performance was evaluated in terms of discrimination using the concordance index (C-index) and prediction error curves (pec). The Cox PH model has a lower C-index and a larger prediction error than the RSF and Cforest approaches for both datasets. The results for the GBSG2 and METABRIC datasets show that the RSF and Cforest algorithms provide non-parametric alternatives to the Cox PH model for estimating the survival probability of breast cancer patients.
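The two evaluation ideas named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `harrell_c_index` and `bootstrap_632` are hypothetical, the C-index shown is the common Harrell formulation with tied event times skipped for simplicity, and the .632 rule is the standard weighted combination of the apparent (training-set) estimate and the out-of-bag bootstrap estimate. Whether the study used exactly these variants is an assumption.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data (simplified sketch).

    times: observed follow-up times
    events: 1 if the event (e.g. relapse) was observed, 0 if censored
    risk_scores: higher score means higher predicted risk (earlier event)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times are skipped in this simplified sketch
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # earlier subject was censored: pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier event has the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def bootstrap_632(apparent, out_of_bag):
    """Standard .632 combination of apparent and out-of-bag estimates."""
    return 0.368 * apparent + 0.632 * out_of_bag


# Toy data (invented for illustration only).
times = [5, 10, 15, 20]
events = [1, 0, 1, 1]
risk = [0.9, 0.2, 0.6, 0.1]
c = harrell_c_index(times, events, risk)
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking of comparable pairs. The .632 weighting pulls the optimistic apparent estimate towards the more pessimistic out-of-bag estimate.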

Optimal zinc level and uncertainty quantification in agricultural soils via visible near-infrared reflectance and soil chemical properties.

Agyeman, Prince Chapman; Kebonye, Ndiye Michael; Khosravi, Vahid; Kingsley, John; Boruvka, Lubos; Vasát, Radim; Boateng, Charles Mario.

J Environ Manage ; 326(Pt A): 116701, 2023 Jan 15.

Article in English | MEDLINE | ID: mdl-36395645

ABSTRACT

Zinc (Zn) is a vital element required by all living creatures for optimal health and ecosystem functioning. Therefore, several researchers have modeled and mapped its occurrence and distribution in soils. Nonetheless, leveraging model predictive performance while coupling information derived from visible near-infrared (Vis-NIR) and soils (i.e. chemical properties) to estimate potentially toxic elements (PTEs) like Zn in agricultural soils is largely untapped. This study applies two methods to rapidly monitor Zn concentration in agricultural soil: first, Vis-NIR and machine learning algorithms (MLAs) (Context 1), and second, Vis-NIR, soil chemical properties (SCP), and MLAs (Context 2). For the Vis-NIR information, single and combined pretreatment methods were applied. The following MLAs were used: conditional inference forest (CIF), partial least squares regression (PLSR), M5 tree model (M5), extreme gradient boosting (EGB), and support vector machine regression (SVMR). For context 1, the results indicated that M5-MSC (M5 tree model-multiplicative scatter correction), with coefficient of determination (R²) = 0.72, root mean square error (RMSE) = 21.08 mg/kg, median absolute error (MdAE) = 13.69, and ratio of performance to interquartile range (RPIQ) = 1.63, was promising. Regarding context 2, CIF with spectral pretreatment and soil properties [CIF-DWTLOGMSC + SCP (conditional inference forest-discrete wavelet transformation-logarithmic transformation-multiplicative scatter correction-soil chemical properties)] yielded the best performance, with R² = 0.86, RMSE = 14.52 mg/kg, MdAE = 6.25, and RPIQ = 1.78. Across contexts 1 and 2, the CIF-DWTLOGMSC + SCP approach (context 2) gave the best Zn model for the agricultural soil. The uncertainty map revealed a low to high error distribution in context 1, and a low to moderate distribution in context 2 for all models except CIF, which had some patches with high uncertainty. We conclude that a multiple optimization approach for modeling Zn levels in agricultural soils is invaluable and may provide fast and reliable information needed for area-specific decision-making.
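The four accuracy measures reported in this abstract (R², RMSE, MdAE, RPIQ) can be computed as in the following Python sketch. It is an illustration under assumptions, not the authors' procedure: the function name `regression_metrics` is hypothetical, R² is taken as 1 minus the ratio of residual to total sum of squares, RPIQ is taken as the interquartile range of the observed values divided by RMSE, and the quartile convention (the "inclusive" method of Python's `statistics.quantiles`) may differ from the one used in the study.

```python
import math
import statistics


def regression_metrics(observed, predicted):
    """Return R2, RMSE, MdAE and RPIQ for paired observed/predicted values."""
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    rmse = math.sqrt(ss_res / n)
    # Quartiles of the observed values; the convention is an assumption.
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": rmse,
        "mdae": statistics.median([abs(r) for r in residuals]),
        "rpiq": (q3 - q1) / rmse,
    }


# Toy Zn concentrations in mg/kg (invented for illustration only).
observed = [10, 20, 30, 40, 50]
predicted = [12, 18, 33, 37, 50]
metrics = regression_metrics(observed, predicted)
```

Higher R² and RPIQ and lower RMSE and MdAE indicate a better model, which is the direction of improvement reported between context 1 and context 2.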

Subject(s)

Ecosystem , Soil , Uncertainty , Agriculture , Zinc

A comparative study of forest methods for time-to-event data: variable selection and predictive performance.

Liu, Yingxin; Zhou, Shiyu; Wei, Hongxia; An, Shengli.

BMC Med Res Methodol ; 21(1): 193, 2021 09 25.

Article in English | MEDLINE | ID: mdl-34563138

ABSTRACT

BACKGROUND: As a popular method in machine learning, the forest approach is an attractive alternative to the Cox model. Random survival forests (RSF) is the most popular survival forest method, but it has drawbacks, such as a selection bias towards covariates with many possible split points. Conditional inference forests (CIF) are known to reduce this selection bias through a two-step split procedure that uses hypothesis tests to separate variable selection from splitting, but the method is computationally expensive. The recently proposed random forests with maximally selected rank statistics (MSR-RF) appear to be a substantial improvement on RSF and CIF. METHODS: We used a simulation study and a real-data application to compare the prediction performance and variable selection performance of three survival forest methods: RSF, CIF, and MSR-RF. To evaluate variable selection, we combined all simulations to calculate how frequently the correct variables ranked at the top of the variable importance measures, where a higher frequency means better selection ability. We used the Integrated Brier Score (IBS) and the c-index to measure the prediction accuracy of all three methods. The smaller the IBS, the better the prediction. RESULTS: Simulations show that the three forest methods differ slightly in prediction performance. MSR-RF and RSF might perform better than CIF when there are only continuous or binary variables in the datasets. For variable selection, when there are multiple categorical variables in the datasets, the selection frequency of RSF seems to be the lowest in most cases. MSR-RF and CIF have higher selection rates, and CIF performs especially well with the interaction term. The fact that the degree of correlation among the variables has little effect on the selection frequency indicates that the three forest methods can handle correlated data. When there are only continuous variables in the datasets, MSR-RF performs better. When there are only binary variables in the datasets, RSF and MSR-RF have more advantages than CIF. When the variable dimension increases, MSR-RF and RSF seem to be more robust than CIF. CONCLUSIONS: All three methods show advantages in prediction performance and variable selection performance under different situations. The recently proposed MSR-RF methodology has practical value and is well worth popularizing. It is important to identify the appropriate method in practice according to the research aim and the nature of the covariates.
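The variable selection measure described in METHODS, the frequency with which the correct variables rank at the top of the variable importance measures across simulations, might be computed along the lines of this Python sketch. The abstract does not define "ranking top" precisely, so the top-k rule used here (k defaulting to the number of true variables) is an assumption, and the function name `top_k_selection_frequency` and the toy importance values are hypothetical.

```python
def top_k_selection_frequency(importance_runs, true_variables, k=None):
    """Fraction of simulation runs in which each true variable ranks in the top k.

    importance_runs: list of dicts mapping variable name -> importance score,
                     one dict per simulation run
    true_variables: names of the variables that truly affect the outcome
    k: size of the "top" set; defaults to the number of true variables
    """
    true_variables = list(true_variables)
    if k is None:
        k = len(true_variables)
    counts = {v: 0 for v in true_variables}
    for importances in importance_runs:
        # Rank variable names from most to least important in this run.
        ranked = sorted(importances, key=importances.get, reverse=True)
        top = set(ranked[:k])
        for v in true_variables:
            if v in top:
                counts[v] += 1
    n_runs = len(importance_runs)
    return {v: counts[v] / n_runs for v in true_variables}


# Two toy simulation runs (invented for illustration only).
runs = [
    {"x1": 0.50, "x2": 0.30, "x3": 0.10},
    {"x1": 0.40, "x2": 0.05, "x3": 0.20},
]
freq = top_k_selection_frequency(runs, ["x1", "x2"])
```

In this toy example `x1` is selected in both runs and `x2` in one of two, so a higher frequency corresponds to better selection ability, as stated in the abstract.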

Subject(s)

Machine Learning , Computer Simulation , Humans , Proportional Hazards Models
