Search | BVS Regional Portal

SliceTeller: A Data Slice-Driven Approach for Machine Learning Model Validation.

Zhang, Xiaoyu; Ono, Jorge Piazentin; Song, Huan; Gou, Liang; Ma, Kwan-Liu; Ren, Liu.

IEEE Trans Vis Comput Graph ; 29(1): 842-852, 2023 Jan.

Article in English | MEDLINE | ID: mdl-36179005

ABSTRACT

Real-world machine learning applications need to be thoroughly evaluated to meet critical product requirements for model release, to ensure fairness for different groups or individuals, and to achieve consistent performance in various scenarios. For example, in autonomous driving, an object classification model should achieve high detection rates under different conditions of weather, distance, etc. Similarly, in the financial setting, credit-scoring models must not discriminate against minority groups. These conditions or groups are called "data slices". In product MLOps cycles, product developers must identify such critical data slices and adapt models to mitigate data slice problems. Discovering where models fail, understanding why they fail, and mitigating these problems are therefore essential tasks in the MLOps life-cycle. In this paper, we present SliceTeller, a novel tool that allows users to debug, compare, and improve machine learning models driven by critical data slices. SliceTeller automatically discovers problematic slices in the data and helps the user understand why models fail. More importantly, we present an efficient algorithm, SliceBoosting, to estimate trade-offs when prioritizing the optimization over certain slices. Furthermore, our system empowers model developers to compare and analyze different model versions during model iterations, allowing them to choose the model version best suited to their applications. We evaluate our system with three use cases, including two real-world use cases of product development, to demonstrate the power of SliceTeller in the debugging and improvement of product-quality ML models.
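As a rough illustration of what evaluating a model per data slice can look like, the sketch below groups predictions by one attribute and flags slices whose accuracy falls well below the overall accuracy. It is not SliceTeller's slice-discovery method or the SliceBoosting algorithm; the function names, the `weather` attribute, the `margin` threshold, and the toy records are all assumptions made for this example.

```python
from collections import defaultdict


def slice_accuracy(records, slice_key):
    """Return accuracy per slice, where a slice is one value of `slice_key`."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        s = r[slice_key]
        total[s] += 1
        correct[s] += int(r["label"] == r["pred"])
    return {s: correct[s] / total[s] for s in total}


def underperforming_slices(records, slice_key, margin=0.1):
    """List slices whose accuracy is more than `margin` below overall accuracy.

    `margin` is a hypothetical threshold chosen for illustration only.
    """
    overall = sum(r["label"] == r["pred"] for r in records) / len(records)
    per_slice = slice_accuracy(records, slice_key)
    return sorted(s for s, acc in per_slice.items() if acc < overall - margin)


# Toy, made-up predictions sliced by a hypothetical "weather" attribute.
records = [
    {"weather": "clear", "label": 1, "pred": 1},
    {"weather": "clear", "label": 0, "pred": 0},
    {"weather": "clear", "label": 1, "pred": 1},
    {"weather": "clear", "label": 0, "pred": 0},
    {"weather": "rain", "label": 1, "pred": 0},
    {"weather": "rain", "label": 0, "pred": 0},
    {"weather": "fog", "label": 1, "pred": 0},
    {"weather": "fog", "label": 1, "pred": 0},
]

if __name__ == "__main__":
    print(slice_accuracy(records, "weather"))
    print(underperforming_slices(records, "weather"))
```

A single aggregate accuracy would hide the fact that the toy model fails entirely on one slice, which is the kind of gap slice-based validation is meant to expose.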

PipelineProfiler: A Visual Analytics Tool for the Exploration of AutoML Pipelines.

Ono, Jorge Piazentin; Castelo, Sonia; Lopez, Roque; Bertini, Enrico; Freire, Juliana; Silva, Claudio.

IEEE Trans Vis Comput Graph ; 27(2): 390-400, 2021 Feb.

Article in English | MEDLINE | ID: mdl-33048694

ABSTRACT

In recent years, a wide variety of automated machine learning (AutoML) methods have been proposed to generate end-to-end ML pipelines. While these techniques facilitate the creation of models, given their black-box nature, the complexity of the underlying algorithms, and the large number of pipelines they derive, they are difficult for developers to debug. It is also challenging for machine learning experts to select an AutoML system that is well suited for a given problem. In this paper, we present PipelineProfiler, an interactive visualization tool that allows the exploration and comparison of the solution space of machine learning (ML) pipelines produced by AutoML systems. PipelineProfiler is integrated with Jupyter Notebook and can be combined with common data science tools to enable a rich set of analyses of the ML pipelines, providing users with a better understanding of the algorithms that generated them as well as insights into how they can be improved. We demonstrate the utility of our tool through use cases where PipelineProfiler is used to better understand and improve a real-world AutoML system. Furthermore, we validate our approach by presenting a detailed analysis of a think-aloud experiment with six data scientists who develop and evaluate AutoML tools.

StatCast Dashboard: Exploration of Spatiotemporal Baseball Data.

Lage, Marcos; Ono, Jorge Piazentin; Cervone, Daniel; Chiang, Justin; Dietrich, Carlos; Silva, Claudio T.

IEEE Comput Graph Appl ; 36(5): 28-37, 2016.

Article in English | MEDLINE | ID: mdl-28113146

ABSTRACT

Major League Baseball (MLB) has a long history of providing detailed, high-quality data, leading to a tremendous surge in sports analytics research in recent years. In 2015, MLB.com released the StatCast spatiotemporal data-tracking system, which has been used in approximately 2,500 games since its inception to capture player and ball locations as well as semantically meaningful game events. This article presents a visualization and analytics infrastructure to help query and facilitate the analysis of this new tracking data. The goal is to go beyond descriptive statistics of individual plays, allowing analysts to study diverse collections of games and game events. The proposed system enables the exploration of the data using a simple querying interface and a set of flexible interactive visualization tools.
