Search | VHL Regional Portal

1.

Computational Efficient Approximations of the Concordance Probability in a Big Data Setting.

Oirbeek, Robin Van; Ponnet, Jolien; Baesens, Bart; Verdonck, Tim.

Big Data ; 2023 Jun 07.

Article in English | MEDLINE | ID: mdl-37289184

ABSTRACT

Performance measurement is an essential task once a statistical model is created. The area under the receiver operating characteristic curve (AUC) is the most popular measure for evaluating the quality of a binary classifier. In this case, the AUC is equal to the concordance probability, a frequently used measure to evaluate the discriminatory power of the model. Contrary to AUC, the concordance probability can also be extended to the situation with a continuous response variable. Due to the staggering size of data sets nowadays, determining this discriminatory measure requires a tremendous amount of costly computations and is hence immensely time consuming, particularly in the case of a continuous response variable. Therefore, we propose two estimation methods that calculate the concordance probability in a fast and accurate way and that can be applied to both the discrete and continuous setting. Extensive simulation studies show the excellent performance and fast computing times of both estimators. Finally, experiments on two real-life data sets confirm the conclusions of the artificial simulations.
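
The link between the AUC and the concordance probability can be illustrated with a small exact computation. The sketch below is not one of the two estimators proposed in the article; it is a naive pairwise count under one common definition (pairs with equal responses are skipped, tied scores count as half), and the function name is hypothetical. Its quadratic cost in the number of observations hints at why approximations are attractive for large data sets.

```python
from itertools import combinations


def concordance_probability(y_true, y_score):
    """Exact pairwise concordance: share of comparable pairs that the
    score ranks in the same order as the observed response.

    Assumptions (not taken from the article): pairs with equal
    responses are not comparable; tied scores count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    # Visits every pair once, hence O(n^2) comparisons.
    for (y_i, s_i), (y_j, s_j) in combinations(zip(y_true, y_score), 2):
        if y_i == y_j:
            continue  # equal responses: pair is not comparable
        comparable += 1
        if s_i == s_j:
            concordant += 0.5
        elif (s_i > s_j) == (y_i > y_j):
            concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Binary response: the value coincides with the AUC.
# 3 of the 4 comparable pairs are concordant.
print(concordance_probability([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))

# Continuous response: the same function applies unchanged.
print(concordance_probability([1.0, 2.0, 3.0], [0.1, 0.3, 0.2]))
```

With a binary response the comparable pairs are exactly the (negative, positive) pairs, which is why the result matches the AUC in that setting.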

2.

Social Network Analytics for Supervised Fraud Detection in Insurance.

Óskarsdóttir, María; Ahmed, Waqas; Antonio, Katrien; Baesens, Bart; Dendievel, Rémi; Donas, Tom; Reynkens, Tom.

Risk Anal ; 42(8): 1872-1890, 2022 08.

Article in English | MEDLINE | ID: mdl-33547691

ABSTRACT

Insurance fraud occurs when policyholders file claims that are exaggerated or based on intentional damages. This contribution develops a fraud detection strategy by extracting insightful information from the social network of a claim. First, we construct a network by linking claims with all their involved parties, including the policyholders, brokers, experts, and garages. Next, we establish fraud as a social phenomenon in the network and use the BiRank algorithm with a fraud-specific query vector to compute a fraud score for each claim. From the network, we extract features related to the fraud scores as well as the claims' neighborhood structure. Finally, we combine these network features with the claim-specific features and build a supervised model with fraud in motor insurance as the target variable. Although we build a model for only motor insurance, the network includes claims from all available lines of business. Our results show that models with features derived from the network perform well when detecting fraud and even outperform the models using only the classical claim-specific features. Combining network and claim-specific features further improves the performance of supervised learning models to detect fraud. The resulting model flags highly suspicious claims that need to be further investigated. Our approach provides a guided and intelligent selection of claims and contributes to a more effective fraud investigation process.
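
The score-propagation step described in this abstract can be sketched as a BiRank-style iteration on a bipartite claim-party network. This is a minimal illustration, not the article's configuration: the symmetric degree normalisation, the damping factor `alpha=0.85`, the fixed iteration count, the choice to place the query vector on the claim side only, and the toy network are all assumptions, and the function name is hypothetical.

```python
import math


def birank_scores(edges, n_claims, n_parties, query, alpha=0.85, n_iter=100):
    """BiRank-style propagation on a bipartite claim-party network.

    edges: list of (claim_index, party_index) links.
    query: prior weight per claim (here 1 for a known fraud, else 0).
    Returns one fraud score per claim.
    """
    claim_deg = [0] * n_claims
    party_deg = [0] * n_parties
    for c, p in edges:
        claim_deg[c] += 1
        party_deg[p] += 1

    total = sum(query)
    if total == 0:
        raise ValueError("query vector must contain a positive entry")
    q = [x / total for x in query]  # normalised query vector

    claim = q[:]
    for _ in range(n_iter):
        # Push claim scores to parties over degree-normalised edges.
        party = [0.0] * n_parties
        for c, p in edges:
            party[p] += claim[c] / math.sqrt(claim_deg[c] * party_deg[p])
        # Pull party scores back to claims and mix with the query vector.
        new_claim = [(1 - alpha) * x for x in q]
        for c, p in edges:
            new_claim[c] += alpha * party[p] / math.sqrt(claim_deg[c] * party_deg[p])
        claim = new_claim
    return claim


# Toy network: claims 0 and 1 share party 0; claims 2 and 3 share party 1.
# Only claim 0 is a known fraud.
edges = [(0, 0), (1, 0), (2, 1), (3, 1)]
scores = birank_scores(edges, n_claims=4, n_parties=2, query=[1, 0, 0, 0])
print(scores)
```

In this toy network claim 1 receives a positive score because it shares a party with the known fraudulent claim, while claims 2 and 3, which have no path to it, stay at zero.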

Subject(s)

Fraud , Insurance , Algorithms , Social Networking , United States

3.

BART: BAckward Regression Trimming.

Baesens, Bart.

Big Data ; 7(3): 207-213, 2019 09.

Article in English | MEDLINE | ID: mdl-31373828

Subject(s)

Models, Statistical , Big Data , Regression Analysis , Software

4.

An Interview with Bart Baesens, One of the Authors of Principles of Database Management.

Lemahieu, Wilfried; Vanden Broucke, Seppe; Baesens, Bart.

Big Data ; 6(2): 69-71, 2018 06.

Article in English | MEDLINE | ID: mdl-29924650

Subject(s)

Big Data , Books , Information Storage and Retrieval

5.

Profit-Based Model Selection for Customer Retention Using Individual Customer Lifetime Values.

Óskarsdóttir, María; Baesens, Bart; Vanthienen, Jan.

Big Data ; 6(1): 53-65, 2018 03.

Article in English | MEDLINE | ID: mdl-29570412

ABSTRACT

The goal of customer retention campaigns, by design, is to add value and enhance the operational efficiency of businesses. For organizations that strive to retain their customers in saturated, and sometimes fast moving, markets such as the telecommunication and banking industries, implementing customer churn prediction models that perform well and in accordance with the business goals is vital. The expected maximum profit (EMP) measure is tailored toward this problem by taking into account the costs and benefits of a retention campaign and estimating its worth for the organization. Unfortunately, the measure assumes fixed and equal customer lifetime value (CLV) for all customers, which has been shown to not correspond well with reality. In this article, we extend the EMP measure to take into account the variability in the lifetime values of customers, thereby basing it on individual characteristics. We demonstrate how to incorporate the heterogeneity of CLVs when CLVs are known, when their prior distribution is known, and when neither is known. By taking into account individual CLVs, our proposed approach of measuring model performance gives novel insights when deciding on a customer retention campaign. The method is dependent on the characteristics of the customer base and is compliant with modern business analytics and accommodates the data-driven culture that has manifested itself within organizations.

Subject(s)

Commerce , Consumer Behavior , Economic Competition , Algorithms , Efficiency, Organizational , Models, Statistical , United States

6.

Special Issue on Profit-Driven Analytics.

Baesens, Bart; Verbeke, Wouter; Bravo, Cristián.

Big Data ; 6(1): 1-2, 2018 03.

Article in English | MEDLINE | ID: mdl-29570413

Subject(s)

Commerce , Data Analysis , Income

7.

Call for Papers: Special Issue on Profit-Driven Analytics.

Baesens, Bart; Verbeke, Wouter; Bravo, Cristián.

Big Data ; 5(2): 69-70, 2017 06.

Article in English | MEDLINE | ID: mdl-28632440

8.

Call for Papers: Special Issue on Profit-Driven Analytics.

Baesens, Bart; Verbeke, Wouter; Bravo, Cristián.

Big Data ; 5(1): 3-4, 2017 03.

Article in English | MEDLINE | ID: mdl-28234016

9.

Monitoring care processes in the gynecologic oncology department.

Caron, Filip; Vanthienen, Jan; Vanhaecht, Kris; Limbergen, Erik Van; De Weerdt, Jochen; Baesens, Bart.

Comput Biol Med ; 44: 88-96, 2014 Jan.

Article in English | MEDLINE | ID: mdl-24377692

ABSTRACT

The care processes of healthcare providers are typically considered human-centric, flexible, evolving, complex and multi-disciplinary. Consequently, gaining insight into the dynamics of these care processes can be an arduous task. A novel event log based approach for extracting valuable medical and organizational information on past executions of the care processes is presented in this study. Care processes are analyzed with the help of a preferential set of process mining techniques in order to discover recurring patterns, analyze and characterize process variants and identify adverse medical events.

Subject(s)

Delivery of Health Care , Genital Diseases, Female/therapy , Models, Theoretical , Neoplasms/therapy , Female , Humans

10.

A process mining-based investigation of adverse events in care processes.

Caron, Filip; Vanthienen, Jan; Vanhaecht, Kris; Van Limbergen, Erik; Deweerdt, Jochen; Baesens, Bart.

Health Inf Manag ; 43(1): 16-25, 2014.

Article in English | MEDLINE | ID: mdl-27010685

ABSTRACT

This paper proposes the Clinical Pathway Analysis Method (CPAM) approach that enables the extraction of valuable organisational and medical information on past clinical pathway executions from the event logs of healthcare information systems. The method deals with the complexity of real-world clinical pathways by introducing a perspective-based segmentation of the date-stamped event log. CPAM enables the clinical pathway analyst to effectively and efficiently acquire a profound insight into the clinical pathways. By comparing the specific medical conditions of patients with the factors used for characterising the different clinical pathway variants, the medical expert can identify the best therapeutic option. Process mining-based analytics enables the acquisition of valuable insights into clinical pathways, based on the complete audit traces of previous clinical pathway instances. Additionally, the methodology is suited to assess guideline compliance and analyse adverse events. Finally, the methodology provides support for eliciting tacit knowledge and providing treatment selection assistance.

Subject(s)

Critical Pathways/standards , Data Mining , Hospital Information Systems/standards , Process Assessment, Health Care/standards , Algorithms , Information Storage and Retrieval

11.

Predictors of Gleason Score (GS) upgrading on subsequent prostatectomy: a single Institution study in a cohort of patients with GS 6.

Mehta, Vikas; Rycyna, Kevin; Baesens, Bart M M; Barkan, Güliz A; Paner, Gladell P; Flanigan, Robert C; Wojcik, Eva M; Venkataraman, Girish.

Int J Clin Exp Pathol ; 5(6): 496-502, 2012.

Article in English | MEDLINE | ID: mdl-22949931

ABSTRACT

BACKGROUND: Biopsy Gleason score (bGS) remains an important prognostic indicator for adverse outcomes in Prostate Cancer (PCA). In the light of recent studies purporting difference in prognostic outcomes for the subgroups of GS7 group (primary Gleason pattern 4 vs. 3), upgrading of a bGS of 6 to a GS≥7 has serious implications. We sought to identify pre-operative factors associated with upgrading in a cohort of GS6 patients who underwent prostatectomy. DESIGN: We identified 281 cases of GS6 PCA on biopsy with subsequent prostatectomies. Using data on pre-operative variables (age, PSA, biopsy pathology parameters), logistic regression models (LRM) were developed to identify factors that could be used to predict upgrading to GS≥7 on subsequent prostatectomy. A decision tree (DT) was constructed. RESULTS: 92 of 281 cases (32.7%) were upgraded on subsequent prostatectomy. LRM identified a model with two variables with statistically significant ability to predict upgrading, including pre-biopsy PSA (Odds Ratio 8.66; 2.03-37.49, 95% CI) and highest percentage of cancer at any single biopsy site (Odds Ratio 1.03, 1.01-1.05, 95% CI). This two-parameter model yielded an area under curve of 0.67. The decision tree was constructed using only 3 leaf nodes, with a test set classification accuracy of 70%. CONCLUSIONS: A simplistic model using clinical and biopsy data is able to predict the likelihood of upgrading of GS with an acceptable level of certainty. External validation of these findings along with development of a nomogram will aid in better stratifying the cohort of low risk patients as based on the GS.

Subject(s)

Adenocarcinoma/pathology , Prostatectomy , Prostatic Intraepithelial Neoplasia/pathology , Prostatic Neoplasms/pathology , Adenocarcinoma/blood , Adult , Aged , Biopsy , Cohort Studies , Decision Trees , Humans , Logistic Models , Male , Middle Aged , Neoplasm Grading , Neoplasm Invasiveness , Preoperative Period , Prognosis , Prostate-Specific Antigen/blood , Prostatic Intraepithelial Neoplasia/blood , Prostatic Neoplasms/blood , Risk Assessment/methods

12.

Guest editorial: special section on white box nonlinear prediction models.

Baesens, Bart; Martens, David; Setiono, Rudy; Zurada, Jacek M.

IEEE Trans Neural Netw ; 22(12): 2406-8, 2011 Dec.

Article in English | MEDLINE | ID: mdl-22167353

Subject(s)

Algorithms , Artificial Intelligence , Data Mining/methods , Databases, Factual , Nonlinear Dynamics , Computer Simulation

13.

Rule extraction from minimal neural networks for credit card screening.

Setiono, Rudy; Baesens, Bart; Mues, Christophe.

Int J Neural Syst ; 21(4): 265-76, 2011 Aug.

Article in English | MEDLINE | ID: mdl-21809474

ABSTRACT

While feedforward neural networks have been widely accepted as effective tools for solving classification problems, the issue of finding the best network architecture remains unresolved, particularly so in real-world problem settings. We address this issue in the context of credit card screening, where it is important to not only find a neural network with good predictive performance but also one that facilitates a clear explanation of how it produces its predictions. We show that minimal neural networks with as few as one hidden unit provide good predictive accuracy, while having the added advantage of making it easier to generate concise and comprehensible classification rules for the user. To further reduce model size, a novel approach is suggested in which network connections from the input units to this hidden unit are removed by a very straightforward pruning procedure. In terms of predictive accuracy, both the minimized neural networks and the rule sets generated from them are shown to compare favorably with other neural network based classifiers. The rules generated from the minimized neural networks are concise and thus easier to validate in a real-life setting.

Subject(s)

Algorithms , Artificial Intelligence , Neural Networks, Computer , Decision Making , Models, Statistical

14.

Morphometric signature differences in nuclei of Gleason pattern 4 areas in Gleason 7 prostate cancer with differing primary grades on needle biopsy.

Venkataraman, Girish; Rycyna, Kevin; Rabanser, Alexander; Heinze, Georg; Baesens, Bart M M; Ananthanarayanan, Vijayalakshmi; Paner, Gladell P; Barkan, Güliz A; Flanigan, Robert C; Wojcik, Eva M.

J Urol ; 181(1): 88-93; discussion 93-4, 2009 Jan.

Article in English | MEDLINE | ID: mdl-19012924

ABSTRACT

PURPOSE: Since clinically significant upgrading of the biopsy Gleason score has an adverse clinical impact, ancillary tools besides the visual determination of primary Gleason pattern are essential to aid in better risk stratification. MATERIALS AND METHODS: A total of 61 prostate biopsies were selected in patients with a diagnosis of Gleason score 7 prostatic adenocarcinoma, including 41 with primary Gleason pattern 3 and 20 with primary Gleason pattern 4. Slides from these tissues were stained using Feulgen stain, a nuclear DNA stain. Areas of Gleason pattern in all cases were analyzed for 40 nuclear morphometric descriptors of size, shape and chromatin using a CAS-200 system (BD). The primary outcome analyzed was the ability of morphometric features to identify visually determined primary Gleason pattern 4 on the biopsy. Data were analyzed using logistic regression as well as a C4.5 decision tree with and without preselection. RESULTS: Decision tree analysis yielded the best model. Automatic feature selection identified minimum nuclear diameter as the most discriminative feature in a 3-parameter model with 85% classification accuracy. Using a preselected 3-parameter model including minimum diameter, angularity and sum optical density the decision tree yielded a slightly lesser accuracy of around 79%. Bootstrap validation of logistic regression results revealed that there was no unique model that could significantly explain the variance in primary Gleason pattern status, although minimum nuclear diameter was the most frequently selected parameter. CONCLUSIONS: In this small cohort of patients with Gleason score 7 disease we report that Gleason pattern 4 nuclei from those with primary Gleason pattern 4 are generally larger with coarser chromatin compared with the Gleason pattern 4 in patients with primary Gleason pattern 3. These findings may aid in better risk stratification of the Gleason score 7 group by supplementing visual estimation of the percent of primary Gleason pattern 3 and 4 in the biopsy.

Subject(s)

Adenocarcinoma/pathology , Prostatic Neoplasms/pathology , Aged , Biopsy, Needle , Cell Nucleus/pathology , Humans , Male , Regression Analysis

15.

Minerva: sequential covering for rule extraction.

Huysmans, Johan; Setiono, Rudy; Baesens, Bart; Vanthienen, Jan.

IEEE Trans Syst Man Cybern B Cybern ; 38(2): 299-309, 2008 Apr.

Article in English | MEDLINE | ID: mdl-18348915

ABSTRACT

Various benchmarking studies have shown that artificial neural networks and support vector machines often have superior performance when compared to more traditional machine learning techniques. The main resistance against these newer techniques is based on their lack of interpretability: it is difficult for the human analyst to understand the reasoning behind these models' decisions. Various rule extraction (RE) techniques have been proposed to overcome this opacity restriction. These techniques are able to represent the behavior of the complex model with a set of easily understandable rules. However, most of the existing RE techniques can only be applied under limited circumstances, e.g., they assume that all inputs are categorical or can only be applied if the black-box model is a neural network. In this paper, we present Minerva, which is a new algorithm for RE. The main advantage of Minerva is its ability to extract a set of rules from any type of black-box model. Experiments show that the extracted models perform well in comparison with various other rule and decision tree learners.
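
The general idea of sequential covering applied to a black-box model can be sketched as follows. This is a generic textbook-style illustration, not the Minerva algorithm itself: the single-condition rule form `x[feature] >= threshold`, the precision-based rule selection, and all function names are assumptions made for the example. The black box is queried only for its predicted labels, which is what makes the procedure independent of the model type.

```python
def learn_one_rule(X, y):
    """Pick the rule 'x[feature] >= threshold' with the highest precision
    on the given examples (ties broken by number of positives covered)."""
    best_key, best_rule = None, None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            covered = [label for row, label in zip(X, y) if row[f] >= t]
            positives = sum(covered)
            if positives == 0:
                continue  # a useful rule must cover at least one positive
            key = (positives / len(covered), positives)
            if best_key is None or key > best_key:
                best_key, best_rule = key, (f, t)
    return best_rule


def sequential_covering(X, black_box):
    """Extract rules that mimic the positive predictions of black_box."""
    # Label the data with the black-box model, not with the true classes.
    remaining = [(row, black_box(row)) for row in X]
    rules = []
    while any(label for _, label in remaining):
        rows = [row for row, _ in remaining]
        labels = [label for _, label in remaining]
        f, t = learn_one_rule(rows, labels)
        rules.append((f, t))
        # Remove every example the new rule covers, then repeat.
        remaining = [(row, label) for row, label in remaining if not row[f] >= t]
    return rules


# Hypothetical black box: predicts 1 when either input is large.
def black_box(x):
    return int(x[0] >= 5 or x[1] >= 8)


X = [(a, b) for a in (1, 3, 5, 7) for b in (2, 4, 8)]
print(sequential_covering(X, black_box))
```

Each learned rule covers at least one remaining positive example, so the loop terminates; on this toy grid it recovers the two conditions of the hypothetical black box.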

Subject(s)

Algorithms , Artificial Intelligence , Decision Support Techniques , Information Storage and Retrieval/methods , Pattern Recognition, Automated/methods
