Search | VHL Regional Portal

1.

Visualization for Trust in Machine Learning Revisited: The State of the Field in 2023.

Chatzimparmpas, Angelos; Kucher, Kostiantyn; Kerren, Andreas.

IEEE Comput Graph Appl ; 44(3): 99-113, 2024.

Article in English | MEDLINE | ID: mdl-38294921

ABSTRACT

Visualization for explainable and trustworthy machine learning remains one of the most important and heavily researched fields within information visualization and visual analytics with various application domains, such as medicine, finance, and bioinformatics. After our 2020 state-of-the-art report comprising 200 techniques, we have persistently collected peer-reviewed articles describing visualization techniques, categorized them based on the previously established categorization schema consisting of 119 categories, and provided the resulting collection of 542 techniques in an online survey browser. In this survey article, we present the updated findings of new analyses of this dataset as of fall 2023 and discuss trends, insights, and eight open challenges for using visualizations in machine learning. Our results corroborate the rapidly growing trend of visualization techniques for increasing trust in machine learning models in the past three years, with visualization found to help improve popular model explainability methods and check new deep learning architectures, for instance.

2.

Visual analysis of blow molding machine multivariate time series data.

Musleh, Maath; Chatzimparmpas, Angelos; Jusufi, Ilir.

J Vis (Tokyo) ; 25(6): 1329-1342, 2022.

Article in English | MEDLINE | ID: mdl-35845181

ABSTRACT

The recent development in the data analytics field provides a boost in production for modern industries. Small-sized factories intend to take full advantage of the data collected by sensors used in their machinery. The ultimate goal is to minimize cost and maximize quality, resulting in an increase in profit. In collaboration with domain experts, we implemented a data visualization tool to enable decision-makers in a plastic factory to improve their production process. The tool is an interactive dashboard with multiple coordinated views supporting the exploration from both local and global perspectives. In summary, we investigate three different aspects: methods for preprocessing multivariate time series data, clustering approaches for the already refined data, and visualization techniques that aid domain experts in gaining insights into the different stages of the production process. Here we present our ongoing results grounded in a human-centered development process. We adopt a formative evaluation approach to continuously upgrade our dashboard design that eventually meets partners' requirements and follows the best practices within the field. We also conducted a case study with a domain expert to validate the potential application of the tool in the real-life context. Finally, we assessed the usability and usefulness of the tool with a two-layer summative evaluation that showed encouraging results.

3.

FeatureEnVi: Visual Analytics for Feature Engineering Using Stepwise Selection and Semi-Automatic Extraction Approaches.

Chatzimparmpas, Angelos; Martins, Rafael M; Kucher, Kostiantyn; Kerren, Andreas.

IEEE Trans Vis Comput Graph ; 28(4): 1773-1791, 2022 Apr.

Article in English | MEDLINE | ID: mdl-34990365

ABSTRACT

The machine learning (ML) life cycle involves a series of iterative steps, from the effective gathering and preparation of the data (including complex feature engineering processes) to the presentation and improvement of results, with various algorithms to choose from in every step. Feature engineering in particular can be very beneficial for ML, leading to numerous improvements such as boosting the predictive results, decreasing computational times, reducing excessive noise, and increasing the transparency behind the decisions taken during the training. Despite that, while several visual analytics tools exist to monitor and control the different stages of the ML life cycle (especially those related to data and algorithms), feature engineering support remains inadequate. In this paper, we present FeatureEnVi, a visual analytics system specifically designed to assist with the feature engineering process. Our proposed system helps users to choose the most important feature, to transform the original features into powerful alternatives, and to experiment with different feature generation combinations. Additionally, data space slicing allows users to explore the impact of features on both local and global scales. FeatureEnVi utilizes multiple automatic feature selection techniques; furthermore, it visually guides users with statistical evidence about the influence of each feature (or subsets of features). The final outcome is the extraction of heavily engineered features, evaluated by multiple validation metrics. The usefulness and applicability of FeatureEnVi are demonstrated with two use cases and a case study. We also report feedback from interviews with two ML experts and a visualization researcher who assessed the effectiveness of our system.

Subject(s)

Computer Graphics , Machine Learning , Algorithms

4.

StackGenVis: Alignment of Data, Algorithms, and Models for Stacking Ensemble Learning Using Performance Metrics.

Chatzimparmpas, Angelos; Martins, Rafael M; Kucher, Kostiantyn; Kerren, Andreas.

IEEE Trans Vis Comput Graph ; 27(2): 1547-1557, 2021 Feb.

Article in English | MEDLINE | ID: mdl-33048687

ABSTRACT

In machine learning (ML), ensemble methods (such as bagging, boosting, and stacking) are widely established approaches that regularly achieve top-notch predictive performance. Stacking (also called "stacked generalization") is an ensemble method that combines heterogeneous base models, arranged in at least one layer, and then employs another metamodel to summarize the predictions of those models. Although it may be a highly effective approach for increasing the predictive performance of ML, generating a stack of models from scratch can be a cumbersome trial-and-error process. This challenge stems from the enormous space of available solutions, with different sets of data instances and features that could be used for training, several algorithms to choose from, and instantiations of these algorithms using diverse parameters (i.e., models) that perform differently according to various metrics. In this work, we present a knowledge generation model, which supports ensemble learning with the use of visualization, and a visual analytics system for stacked generalization. Our system, StackGenVis, assists users in dynamically adapting performance metrics, managing data instances, selecting the most important features for a given data set, choosing a set of top-performant and diverse algorithms, and measuring the predictive performance. In consequence, our proposed tool helps users to decide between distinct models and to reduce the complexity of the resulting stack by removing overpromising and underperforming models. The applicability and effectiveness of StackGenVis are demonstrated with two use cases: a real-world healthcare data set and a collection of data related to sentiment/stance detection in texts. Finally, the tool has been evaluated through interviews with three ML experts.

5.

t-viSNE: Interactive Assessment and Interpretation of t-SNE Projections.

Chatzimparmpas, Angelos; Martins, Rafael M; Kerren, Andreas.

IEEE Trans Vis Comput Graph ; 26(8): 2696-2714, 2020 Aug.

Article in English | MEDLINE | ID: mdl-32305922

ABSTRACT

t-Distributed Stochastic Neighbor Embedding (t-SNE) for the visualization of multidimensional data has proven to be a popular approach, with successful applications in a wide range of domains. Despite their usefulness, t-SNE projections can be hard to interpret or even misleading, which hurts the trustworthiness of the results. Understanding the details of t-SNE itself and the reasons behind specific patterns in its output may be a daunting task, especially for non-experts in dimensionality reduction. In this article, we present t-viSNE, an interactive tool for the visual exploration of t-SNE projections that enables analysts to inspect different aspects of their accuracy and meaning, such as the effects of hyper-parameters, distance and neighborhood preservation, densities and costs of specific neighborhoods, and the correlations between dimensions and visual patterns. We propose a coherent, accessible, and well-integrated collection of different views for the visualization of t-SNE projections. The applicability and usability of t-viSNE are demonstrated through hypothetical usage scenarios with real data sets. Finally, we present the results of a user study where the tool's effectiveness was evaluated. By bringing to light information that would normally be lost after running t-SNE, we hope to support analysts in using t-SNE and making its results better understandable.
