Results 1 - 6 of 6
1.
IEEE Trans Vis Comput Graph ; 24(1): 256-266, 2018 Jan.
Article in English | MEDLINE | ID: mdl-28866555

ABSTRACT

Visualizing outliers in massive datasets requires statistical preprocessing to reduce the problem to a size manageable by rendering systems such as D3 or Plotly, or by analytic systems such as R or SAS. This paper presents a new algorithm, called hdoutliers, for detecting multidimensional outliers. It is unique in a) handling a mixture of categorical and continuous variables, b) handling big-p (many columns of data), c) handling big-n (many rows of data), d) handling outliers that mask other outliers, and e) treating unidimensional and multidimensional datasets consistently. Unlike ad hoc methods found in many machine learning papers, hdoutliers is based on a distributional model that allows outliers to be tagged with a probability. This critical feature reduces the likelihood of false discoveries.
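The idea of tagging outliers with a probability from a distributional model can be illustrated with a much-simplified sketch. It assumes purely continuous columns, min-max normalizes them, computes each point's nearest-neighbor distance, fits an exponential distribution to those distances by maximum likelihood, and flags points whose upper-tail probability falls below a hypothetical cutoff `alpha`. It deliberately omits what the abstract says makes hdoutliers distinctive (categorical variables, big-p, big-n, masking), and the function names are illustrative, not the hdoutliers API.

```python
import math
from typing import List, Sequence, Tuple


def normalize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Min-max scale every column to [0, 1]; constant columns become 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def nearest_neighbor_distances(points: Sequence[Sequence[float]]) -> List[float]:
    """Brute-force O(n^2) distance from each point to its nearest other point."""
    return [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]


def flag_outliers(rows: Sequence[Sequence[float]], alpha: float = 0.05) -> List[Tuple[int, float]]:
    """Return (row index, tail probability) for rows flagged as outliers.

    Simplified sketch, not hdoutliers: an exponential model is fitted to
    all nearest-neighbor distances, so masking outliers are not handled.
    """
    points = normalize(rows)
    dists = nearest_neighbor_distances(points)
    mean = sum(dists) / len(dists)
    if mean == 0:
        return []  # all points coincide, so nothing stands apart
    rate = 1.0 / mean  # maximum-likelihood estimate of the exponential rate
    flagged = []
    for i, d in enumerate(dists):
        p = math.exp(-rate * d)  # P(D >= d) under the fitted exponential
        if p < alpha:
            flagged.append((i, p))
    return flagged
```

For a tight cluster of five points plus one far-away point, such as `[(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (10.0, 10.0)]`, only index 5 is flagged. Because the exponential rate is estimated from every distance, including the outliers' own, several large outliers would inflate the mean and hide one another; that is the masking problem the abstract says hdoutliers addresses.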

2.
IEEE Trans Vis Comput Graph ; 20(12): 1624-32, 2014 Dec.
Article in English | MEDLINE | ID: mdl-26356876

ABSTRACT

Scagnostics (scatterplot diagnostics) were developed by Wilkinson et al., based on an idea of Paul and John Tukey, to discern meaningful patterns in large collections of scatterplots. The Tukeys' original idea was intended to overcome the impediments involved in examining large scatterplot matrices (the multiplicity of plots and the lack of detail). Wilkinson's implementation made it possible, for the first time, to compute scagnostics on many points as well as many plots. Unfortunately, scagnostics are sensitive to scale transformations. We illustrate the extent of this sensitivity and show how statistical transformations can be paired with scagnostics to reveal hidden structures in data that are not discernible in untransformed visualizations.
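To make the scale sensitivity concrete, the sketch below computes a simplified stand-in for an outlier-type scagnostic. It scales both axes to [0, 1], builds a Euclidean minimum spanning tree with Prim's algorithm, and reports the share of total tree length contributed by edges longer than q75 + 1.5 * IQR of the edge lengths. This is an assumed simplification, not Wilkinson's exact Outlying definition, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def normalize(points: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale each axis to [0, 1] (assumed preprocessing step)."""
    cols = list(zip(*points))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(p, lows, highs)]
        for p in points
    ]


def mst_edge_lengths(points: Sequence[Sequence[float]]) -> List[float]:
    """Prim's algorithm on the complete Euclidean graph, O(n^2)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge joining each vertex to the tree
    best[0] = 0.0
    edges = []
    for step in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if step > 0:
            edges.append(best[u])  # length of the edge that attached u
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return edges


def quantile(sorted_vals: List[float], q: float) -> float:
    """Linear-interpolation quantile of an already sorted, non-empty list."""
    pos = q * (len(sorted_vals) - 1)
    i = math.floor(pos)
    if i + 1 >= len(sorted_vals):
        return sorted_vals[-1]
    return sorted_vals[i] + (pos - i) * (sorted_vals[i + 1] - sorted_vals[i])


def outlying(points: Sequence[Sequence[float]]) -> float:
    """Simplified measure: share of MST length in edges beyond q75 + 1.5 * IQR."""
    edges = sorted(mst_edge_lengths(normalize(points)))
    if not edges:
        return 0.0
    q25, q75 = quantile(edges, 0.25), quantile(edges, 0.75)
    cutoff = q75 + 1.5 * (q75 - q25)
    total = sum(edges)
    return sum(e for e in edges if e > cutoff) / total if total > 0 else 0.0
```

With points `(2.0**k, 2.0**k)` for `k` in 0..8, the raw data score roughly 0.5, because the last tree edge dominates. After a log2 transform the points are evenly spaced, every edge has the same length, and the score drops to 0.0: the same underlying relationship reads as highly outlying or not outlying at all depending on scale.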

3.
IEEE Trans Vis Comput Graph ; 19(3): 470-83, 2013 Mar.
Article in English | MEDLINE | ID: mdl-23307611

ABSTRACT

We introduce a method (Scagnostic time series) and an application (TimeSeer) for organizing multivariate time series and for guiding interactive exploration through high-dimensional data. The method is based on nine characterizations of the 2D distributions of orthogonal pairwise projections on a set of points in multidimensional Euclidean space. These characterizations include measures such as density, skewness, shape, outliers, and texture. Working directly with these scagnostic measures, we can locate anomalous or interesting subseries for further analysis. Our application is designed to handle the doubly multivariate data series often found in the security, financial, social, and other sectors.
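One way to picture locating interesting subseries is to compute a single scagnostic on successive windows of a bivariate series and flag the windows whose score stands out. The sketch assumes Monotonic, taken here as the squared Spearman rank correlation, as the only measure, non-overlapping windows of a hypothetical `width`, and a hypothetical `cutoff`. TimeSeer's actual windowing and its nine measures are not reproduced.

```python
import math
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a constant window has no defined correlation
    return cov / math.sqrt(va * vb)


def monotonic(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Spearman correlation (Pearson correlation of the ranks)."""
    return pearson(ranks(x), ranks(y)) ** 2


def flag_windows(x: Sequence[float], y: Sequence[float],
                 width: int = 10, cutoff: float = 0.5) -> List[int]:
    """Start indices of windows whose Monotonic score falls below cutoff."""
    flagged = []
    for start in range(0, len(x) - width + 1, width):
        score = monotonic(x[start:start + width], y[start:start + width])
        if score < cutoff:
            flagged.append(start)
    return flagged
```

If `y` tracks `x = list(range(60))` exactly except for a shuffled stretch at indices 30 to 39, such as `[35, 31, 38, 30, 36, 33, 39, 32, 37, 34]`, only the window starting at 30 is flagged. A fuller treatment would compute all nine measures per window and look for windows that are unusual across the whole measure profile.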


Subjects
Algorithms , Computer Graphics , Image Interpretation, Computer-Assisted/methods , Imaging, Three-Dimensional/methods , Models, Statistical , Software , User-Computer Interface , Computer Simulation , Multivariate Analysis , Reproducibility of Results , Sensitivity and Specificity
4.
IEEE Trans Vis Comput Graph ; 18(2): 321-31, 2012 Feb.
Article in English | MEDLINE | ID: mdl-21383412

ABSTRACT

Scientists conducting microarray and other experiments use circular Venn and Euler diagrams to analyze and illustrate their results, but drawing these diagrams so that region areas are proportional to the data is difficult. As one solution to this problem, this paper introduces a statistical model for fitting area-proportional Venn and Euler diagrams to observed data. The model includes a statistical loss function and a minimization procedure that enable, for the first time, formal estimation of the area-proportional Venn/Euler model. A significance test of the null hypothesis is computed for the solution, and residuals from the model are available for inspection. As a result, the algorithm can be used for both exploration and inference on real data sets. A Java program implementing this algorithm is available under the Mozilla Public License. An R function, venneuler(), is available as a package on CRAN, and a plugin is available for Cytoscape.
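The abstract does not spell out the loss function, so the sketch below assumes one plausible form for the two-circle case. It computes the areas of the three disjoint regions (A only, B only, A and B) from the standard circle-intersection (lens) formula, regresses the observed region sizes on those fitted areas through the origin, and reports stress = sum((y - y_hat)^2) / sum(y^2). Treat the loss as an illustration, not the paper's exact definition; the function names are hypothetical and are not the venneuler package API.

```python
import math
from typing import List, Sequence


def _clamp(v: float) -> float:
    """Keep acos arguments inside [-1, 1] despite rounding error."""
    return max(-1.0, min(1.0, v))


def lens_area(r1: float, r2: float, d: float) -> float:
    """Intersection area of two circles with radii r1, r2 and centre distance d."""
    if d >= r1 + r2:
        return 0.0  # disjoint circles
    if d <= abs(r1 - r2):
        return math.pi * min(r1, r2) ** 2  # smaller circle lies inside the larger
    a1 = r1 * r1 * math.acos(_clamp((d * d + r1 * r1 - r2 * r2) / (2 * d * r1)))
    a2 = r2 * r2 * math.acos(_clamp((d * d + r2 * r2 - r1 * r1) / (2 * d * r2)))
    prod = (-d + r1 + r2) * (d + r1 - r2) * (d - r1 + r2) * (d + r1 + r2)
    return a1 + a2 - 0.5 * math.sqrt(max(0.0, prod))


def region_areas(r1: float, r2: float, d: float) -> List[float]:
    """Fitted areas of the disjoint regions [A only, B only, A and B]."""
    both = lens_area(r1, r2, d)
    return [math.pi * r1 ** 2 - both, math.pi * r2 ** 2 - both, both]


def stress(observed: Sequence[float], fitted: Sequence[float]) -> float:
    """Residual sum of squares of a through-origin fit, relative to sum(y^2)."""
    beta = sum(o * f for o, f in zip(observed, fitted)) / sum(f * f for f in fitted)
    residual = sum((o - beta * f) ** 2 for o, f in zip(observed, fitted))
    return residual / sum(o * o for o in observed)
```

Observed counts exactly proportional to `region_areas(1.0, 1.0, 1.0)` give a stress of essentially zero, while counts such as `[10.0, 10.0, 0.1]` (almost no overlap) give a clearly positive stress for the same circles. A minimization procedure would then adjust radii and centre distance to drive the stress down.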


Subjects
Algorithms , Computational Biology/methods , Computer Graphics , Models, Statistical , Internet , Software
5.
IEEE Trans Vis Comput Graph ; 16(6): 1044-52, 2010.
Article in English | MEDLINE | ID: mdl-20975142

ABSTRACT

An ongoing challenge for information visualization is how to deal with overplotting caused by ties or by the relatively limited visual field of display devices. A popular solution is to represent local data density with area (bubble plots, treemaps), color (heatmaps), or aggregation (histograms, kernel densities, pixel displays). All of these methods have at least one of three deficiencies: 1) magnitude judgments are biased because area and color have convex downward perceptual functions, 2) area, hue, and brightness have relatively restricted ranges of perceptual intensity compared to length representations, and/or 3) it is difficult to brush or link to individual cases when viewing aggregations. In this paper, we introduce a new technique for visualizing and interacting with datasets that preserves density information by stacking overlapping cases. The overlapping data can be points, lines, or other geometric elements, depending on the type of plot. We show applications of this stacking paradigm to real datasets and compare them with other techniques that deal with overplotting in high-dimensional displays.
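A one-dimensional illustration of stacking, in the spirit of a dot plot, follows. It snaps each value to the nearest multiple of a hypothetical bin `width`, then gives every case that lands in an already occupied bin the next stack level, so tied cases sit on top of one another instead of overplotting. The paper applies stacking to points, lines, and other elements in richer displays; this sketch covers only the simplest case, and its names are illustrative.

```python
from collections import defaultdict
from typing import DefaultDict, List, Sequence, Tuple


def stack_positions(values: Sequence[float], width: float) -> List[Tuple[int, int]]:
    """Return (bin index, stack level) for each value, in input order.

    A renderer could draw case i at x = bin * width and y = level * mark
    height, so every case stays individually visible and brushable.
    """
    occupancy: DefaultDict[int, int] = defaultdict(int)  # cases placed per bin
    positions = []
    for v in values:
        b = round(v / width)  # index of the nearest bin
        positions.append((b, occupancy[b]))
        occupancy[b] += 1
    return positions
```

For example, `stack_positions([1.0, 1.02, 1.04, 2.0], 0.1)` places the first three values in bin 10 at levels 0, 1, and 2, and the last value alone in bin 20. Stack height encodes density as length, which the abstract contrasts with area and color encodings, while each case keeps its own mark.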

6.
IEEE Trans Vis Comput Graph ; 12(6): 1363-72, 2006.
Article in English | MEDLINE | ID: mdl-17073361

ABSTRACT

We introduce a method for organizing multivariate displays and for guiding interactive exploration through high-dimensional data. The method is based on nine characterizations of the 2D distributions of orthogonal pairwise projections on a set of points in multidimensional Euclidean space. These characterizations include such measures as density, skewness, shape, outliers, and texture. Statistical analysis of these measures leads to ways for 1) organizing 2D scatterplots of points for coherent viewing, 2) locating unusual (outlying) marginal 2D distributions of points for anomaly detection, and 3) sorting multivariate displays based on high-dimensional data, such as trees, parallel coordinates, and glyphs.
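Organizing a scatterplot matrix by such measures amounts to scoring every pair of columns (each orthogonal pairwise projection) and sorting the plots by score. The sketch assumes a single measure, Monotonic, taken as the squared Spearman correlation and computed here without tie handling, whereas the method described uses nine characterizations. The function names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def monotonic(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Spearman correlation; assumes no ties within a column."""
    def ranks(v: Sequence[float]) -> List[float]:
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = float(rank)
        return r

    rx, ry = ranks(x), ranks(y)
    m = (len(rx) + 1) / 2  # mean rank
    cov = sum((a - m) * (b - m) for a, b in zip(rx, ry))
    var = sum((a - m) ** 2 for a in rx)  # equal for ry when there are no ties
    return (cov / var) ** 2 if var > 0 else 0.0


def rank_pairs(columns: Sequence[Sequence[float]]) -> List[Tuple[Tuple[int, int], float]]:
    """Score every column pair and sort from most to least monotonic."""
    scores = [
        ((i, j), monotonic(columns[i], columns[j]))
        for i, j in combinations(range(len(columns)), 2)
    ]
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

With columns `a = [1, ..., 8]`, `b = 2 * a`, and a shuffled `c = [5, 2, 8, 1, 7, 4, 6, 3]`, the pair `(0, 1)` is listed first with a score of 1.0, while both pairs involving `c` score 0.0. Sorting on a vector of measures rather than one score is what would let related plots be viewed together.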


Subjects
Algorithms , Computer Graphics , Data Interpretation, Statistical , Information Storage and Retrieval/methods , Models, Statistical , Multivariate Analysis , User-Computer Interface , Computer Simulation , Pattern Recognition, Automated/methods