Search | VHL Regional Portal

1.

KG4Vis: A Knowledge Graph-Based Approach for Visualization Recommendation.

Li, Haotian; Wang, Yong; Zhang, Songheng; Song, Yangqiu; Qu, Huamin.

IEEE Trans Vis Comput Graph ; 28(1): 195-205, 2022 Jan.

Article in English | MEDLINE | ID: mdl-34587080

ABSTRACT

Visualization recommendation or automatic visualization generation can significantly lower the barriers for general users to rapidly create effective data visualizations, especially for those users without a background in data visualization. However, existing rule-based approaches require tedious manual specification of visualization rules by visualization experts. Other machine learning-based approaches often work like black boxes, making it difficult to understand why a specific visualization is recommended and limiting the wider adoption of these approaches. This paper fills the gap by presenting KG4Vis, a knowledge graph (KG)-based approach for visualization recommendation. It does not require manual specification of visualization rules and can also guarantee good explainability. Specifically, we propose a framework for building knowledge graphs, consisting of three types of entities (i.e., data features, data columns and visualization design choices) and the relations between them, to model the mapping rules between data and effective visualizations. A TransE-based embedding technique is employed to learn the embeddings of both entities and relations of the knowledge graph from existing dataset-visualization pairs. Such embeddings intrinsically model the desirable visualization rules. Then, given a new dataset, effective visualizations can be inferred from the knowledge graph with semantically meaningful rules. We conducted extensive evaluations to assess the proposed approach, including quantitative comparisons, case studies and expert interviews. The results demonstrate the effectiveness of our approach.
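
The abstract above mentions a TransE-based embedding technique. As a rough illustration of the general TransE idea only (not the authors' implementation), a relation is modeled as a translation in embedding space, so a triple (head, relation, tail) is plausible when head + relation lies close to tail. The vectors, entity names, and the `rank_tails` helper below are hypothetical and hand-made for illustration; they are not learned embeddings.

```python
import math

def transe_score(head, relation, tail):
    """Negative L2 distance ||head + relation - tail||; higher means more plausible."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

def rank_tails(head, relation, candidates):
    """Sort candidate tail entities by TransE score, best first."""
    return sorted(
        candidates,
        key=lambda name: transe_score(head, relation, candidates[name]),
        reverse=True,
    )

# Hand-made toy vectors (assumption: for illustration only, not learned).
column = [1.0, 0.0]        # hypothetical data-column entity
uses_chart = [0.0, 1.0]    # hypothetical relation
designs = {"bar": [1.0, 1.0], "line": [2.0, 2.0], "scatter": [0.0, -1.0]}

ranking = rank_tails(column, uses_chart, designs)
print(ranking)
```

In this toy setting the candidate whose vector equals head + relation ranks first, which mirrors how a design choice could be inferred for a new data column once embeddings have been learned.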

2.

Incorporating World Knowledge to Document Clustering via Heterogeneous Information Networks.

Wang, Chenguang; Song, Yangqiu; El-Kishky, Ahmed; Roth, Dan; Zhang, Ming; Han, Jiawei.

KDD ; 2015: 1215-1224, 2015 Aug.

Article in English | MEDLINE | ID: mdl-26705504

ABSTRACT

One of the key obstacles in making learning protocols realistic in applications is the need to supervise them, a costly process that often requires hiring domain experts. We consider a framework that uses world knowledge as indirect supervision. World knowledge is general-purpose knowledge, which is not designed for any specific domain. The key challenges are then how to adapt the world knowledge to domains and how to represent it for learning. In this paper, we provide an example of using world knowledge for domain-dependent document clustering. We provide three ways to specify the world knowledge to domains by resolving the ambiguity of the entities and their types, and represent the data with world knowledge as a heterogeneous information network. Then we propose a clustering algorithm that can cluster multiple types and incorporate the sub-type information as constraints. In the experiments, we use two existing knowledge bases as our sources of world knowledge. One is Freebase, which is collaboratively collected knowledge about entities and their organizations. The other is YAGO2, a knowledge base that is automatically extracted from Wikipedia and maps knowledge to the linguistic knowledge base WordNet. Experimental results on two text benchmark datasets (20newsgroups and RCV1) show that incorporating world knowledge as indirect supervision can significantly outperform the state-of-the-art clustering algorithms as well as clustering algorithms enhanced with world knowledge features.

3.

KnowSim: A Document Similarity Measure on Structured Heterogeneous Information Networks.

Wang, Chenguang; Song, Yangqiu; Li, Haoran; Zhang, Ming; Han, Jiawei.

Proc IEEE Int Conf Data Min ; 2015: 1015-1020, 2015 Nov.

Article in English | MEDLINE | ID: mdl-27034626

ABSTRACT

As a fundamental task, measuring document similarity has a broad impact on document-based classification, clustering and ranking. Traditional approaches represent documents as bag-of-words and compute document similarities using measures like cosine, Jaccard, and Dice. However, entity phrases rather than single words in documents can be critical for evaluating document relatedness. Moreover, types of entities and links between entities/words are also informative. We propose a method to represent a document as a typed heterogeneous information network (HIN), where the entities and relations are annotated with types. Multiple documents can be linked by the words and entities in the HIN. Consequently, we convert the document similarity problem to a graph distance problem. Intuitively, there could be multiple paths between a pair of documents. We propose to use the meta-paths defined in the HIN to compute the distance between documents. Instead of burdening users with defining meaningful meta-paths, an automatic method is proposed to rank the meta-paths. Given the meta-paths associated with ranking scores, an HIN-based similarity measure, KnowSim, is proposed to compute document similarities. Using Freebase, a well-known world knowledge base, to conduct semantic parsing and construct HINs for documents, our experiments on 20Newsgroups and RCV1 datasets show that KnowSim generates impressively high-quality document clustering.
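
For context, the traditional bag-of-words measures named in the abstract above (cosine, Jaccard, and Dice) can be sketched as follows. This is a minimal illustration that assumes whitespace tokenization; it is not KnowSim, whose formula the abstract does not give. The two example strings are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between term-frequency vectors of two texts."""
    ca, cb = Counter(a.split()), Counter(b.split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0

def jaccard(a, b):
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def dice(a, b):
    """Twice the shared vocabulary size divided by the sum of both vocabulary sizes."""
    sa, sb = set(a.split()), set(b.split())
    total = len(sa) + len(sb)
    return 2 * len(sa & sb) / total if total else 0.0

doc_a = "new york times"
doc_b = "new york city"
print(cosine(doc_a, doc_b), jaccard(doc_a, doc_b), dice(doc_a, doc_b))
```

The two example strings share two of their three words, so all three measures report substantial overlap even though the phrases name different entities. That is the kind of limitation of single-word representations that the abstract points to.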

4.

ImageHive: interactive content-aware image summarization.

Tan, Li; Song, Yangqiu; Liu, Shixia; Xie, Lexing.

IEEE Comput Graph Appl ; 32(1): 46-55, 2012.

Article in English | MEDLINE | ID: mdl-24808292

ABSTRACT

ImageHive communicates information about an image collection by generating a summary image that preserves the relationships between images and avoids occluding their salient parts. It uses a constrained graph-layout algorithm first, to preserve image similarities and keep important parts visible, and then a constrained Voronoi tessellation algorithm to locally refine the layout and tile the image plane.

5.

TextFlow: towards better understanding of evolving topics in text.

Cui, Weiwei; Liu, Shixia; Tan, Li; Shi, Conglei; Song, Yangqiu; Gao, Zekai J; Tong, Xin; Qu, Huamin.

IEEE Trans Vis Comput Graph ; 17(12): 2412-21, 2011 Dec.

Article in English | MEDLINE | ID: mdl-22034362

ABSTRACT

Understanding how topics evolve in text data is an important and challenging task. Although much work has been devoted to topic analysis, the study of topic evolution has largely been limited to individual topics. In this paper, we introduce TextFlow, a seamless integration of visualization and topic mining techniques, for analyzing various evolution patterns that emerge from multiple topics. We first extend an existing analysis technique to extract three-level features: the topic evolution trend, the critical event, and the keyword correlation. Then a coherent visualization that consists of three new visual components is designed to convey complex relationships between them. Through interaction, the topic mining model and visualization can communicate with each other to help users refine the analysis result and gain insights into the data progressively. Finally, two case studies are conducted to demonstrate the effectiveness and usefulness of TextFlow in helping users understand the major topic evolution patterns in time-varying text data.

6.

Parallel spectral clustering in distributed systems.

Chen, Wen-Yen; Song, Yangqiu; Bai, Hongjie; Lin, Chih-Jen; Chang, Edward Y.

IEEE Trans Pattern Anal Mach Intell ; 33(3): 568-86, 2011 Mar.

Article in English | MEDLINE | ID: mdl-20421667

ABSTRACT

Spectral clustering algorithms have been shown to be more effective in finding clusters than some traditional algorithms, such as k-means. However, spectral clustering suffers from a scalability problem in both memory use and computational time when the size of a data set is large. To perform clustering on large data sets, we investigate two representative ways of approximating the dense similarity matrix. We compare one approach by sparsifying the matrix with another by the Nyström method. We then pick the strategy of sparsifying the matrix via retaining nearest neighbors and investigate its parallelization. We parallelize both memory use and computation on distributed computers. Through an empirical study on a document data set of 193,844 instances and a photo data set of 2,121,863, we show that our parallel algorithm can effectively handle large problems.

Subject(s)

Algorithms; Artificial Intelligence; Computer Communication Networks/instrumentation; Models, Statistical; Systems Integration; Cluster Analysis; Computer Simulation; Pattern Recognition, Automated/methods; Reproducibility of Results
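
The abstract of this record describes sparsifying the dense similarity matrix by retaining nearest neighbors. A minimal single-machine sketch of that step alone is given below; it does not cover the eigendecomposition or the parallelization the paper investigates. The Gaussian kernel, the `sigma` parameter, the symmetrization rule, and the function name `knn_sparse_similarity` are assumptions made for illustration.

```python
import math

def knn_sparse_similarity(points, t, sigma=1.0):
    """Return {(i, j): similarity}, keeping only each point's t nearest neighbors."""
    n = len(points)
    sparse = {}
    for i in range(n):
        # Distances from point i to every other point, nearest first.
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in dists[:t]:
            s = math.exp(-d * d / (2 * sigma ** 2))  # assumed Gaussian similarity
            sparse[(i, j)] = s
            sparse[(j, i)] = s  # symmetrize: keep an edge if either endpoint selects it
    return sparse

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
S = knn_sparse_similarity(points, t=1)
print(sorted(S))
```

With four points and t=1, only 4 of the 12 off-diagonal entries are stored, which is the memory saving that motivates sparsification on large data sets. This sketch still computes all pairwise distances, so it illustrates the storage pattern rather than a scalable neighbor search.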
