1.
IEEE Trans Neural Netw Learn Syst ; 33(2): 694-706, 2022 Feb.
Article in English | MEDLINE | ID: mdl-33108294

ABSTRACT

Negative sampling plays an important role in ranking-based recommender models. However, most existing sampling methods cannot generate informative item pairs of positive and negative instances, due to two limitations: 1) they treat only observed items as positive instances, ignoring both potential positive items (i.e., non-observed items that users may prefer) and the possibility that observed items are noisy, and 2) they fail to capture the relationship between positive and negative items during negative sampling, which may cause potential positive items to be selected unexpectedly as negatives. In this article, we introduce a dynamic sampling strategy to search for informative item pairs. Specifically, we first sample a positive instance from all items by leveraging the overall features of a user's observed items. We then strategically select a negative instance by considering its correlation with the sampled positive one. Formally, we propose an item pair generative adversarial network, named IPGAN, in which this sampling strategy is realized by two generative models, one for positive and one for negative instances. A discriminative model further ensures that the sampled item pairs are informative relative to the ground truth. Moreover, we propose a batch-training approach that enhances both user and item modeling by alleviating user-specific bias (noise); this approach also significantly accelerates model training compared with classical GAN methods for recommendation. Experimental results on three real-world datasets show that our approach outperforms other state-of-the-art approaches in terms of recommendation accuracy.
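The abstract describes the sampling strategy only at a high level and the entry includes no code. As a minimal illustrative sketch (not the authors' implementation), a correlated item-pair sampler paired with a pair discriminator might look as follows in PyTorch; all class names, dimensions, and the training step shown are assumptions:

```python
# Hedged sketch of adversarial item-pair sampling in the spirit of IPGAN.
# All names, shapes, and hyperparameters are illustrative, not from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_ITEMS, EMB_DIM = 1000, 64

class PairGenerator(nn.Module):
    """Samples a positive item from user features, then a negative item
    conditioned on the sampled positive, so the pair is correlated."""
    def __init__(self):
        super().__init__()
        self.pos_net = nn.Linear(EMB_DIM, NUM_ITEMS)       # models p(i+ | user)
        self.neg_net = nn.Linear(2 * EMB_DIM, NUM_ITEMS)   # models p(i- | user, i+)
        self.item_emb = nn.Embedding(NUM_ITEMS, EMB_DIM)

    def forward(self, user_feat):
        pos_logits = self.pos_net(user_feat)
        pos = torch.multinomial(F.softmax(pos_logits, dim=-1), 1).squeeze(-1)
        # Condition the negative distribution on the sampled positive item.
        ctx = torch.cat([user_feat, self.item_emb(pos)], dim=-1)
        neg_logits = self.neg_net(ctx)
        neg = torch.multinomial(F.softmax(neg_logits, dim=-1), 1).squeeze(-1)
        return pos, neg

class PairDiscriminator(nn.Module):
    """Scores whether a (user, positive, negative) triple looks informative
    relative to observed ground-truth interactions."""
    def __init__(self):
        super().__init__()
        self.item_emb = nn.Embedding(NUM_ITEMS, EMB_DIM)
        self.score = nn.Linear(3 * EMB_DIM, 1)

    def forward(self, user_feat, pos, neg):
        x = torch.cat([user_feat, self.item_emb(pos), self.item_emb(neg)], dim=-1)
        return self.score(x).squeeze(-1)

# Usage: one discriminator step on a batch of aggregated user feature vectors.
gen, disc = PairGenerator(), PairDiscriminator()
user_feat = torch.randn(32, EMB_DIM)   # stand-in for features of observed items
pos, neg = gen(user_feat)
d_loss = F.binary_cross_entropy_with_logits(
    disc(user_feat, pos, neg), torch.ones(32))  # placeholder target labels
```

In a full implementation the generator would be trained with policy gradients or a Gumbel-softmax relaxation, since the discrete multinomial sampling above does not propagate gradients.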

2.
IEEE Trans Image Process ; 30: 5477-5489, 2021.
Article in English | MEDLINE | ID: mdl-33950840

ABSTRACT

Vision-language research, which focuses on understanding visual content, language semantics, and the relationships between them, has attracted great interest. Video question answering (Video QA) is one of its typical tasks. Recently, several BERT-style pre-training methods have been proposed and shown to be effective on various vision-language tasks. In this work, we leverage the successful vision-language transformer structure to solve the Video QA problem. However, we do not pre-train it on any video data, because video pre-training requires massive computing resources and is hard to perform with only a few GPUs. Instead, our work aims to leverage image-language pre-training to help with video-language modeling through a shared module design. We further introduce an adaptive spatio-temporal graph to enhance vision-language representation learning: we adaptively refine the spatio-temporal tubes of salient objects according to their spatio-temporal relations, learned through a hierarchical graph-convolution process. This yields a set of fine-grained tube-level video object representations that serve as the visual inputs to the vision-language transformer module. Experiments on three widely used Video QA datasets show that our model achieves new state-of-the-art results.
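As with the previous entry, no code accompanies the abstract. The following is a hedged sketch of one way the described tube-level refinement could be realized, with attention-style learned affinities standing in for the paper's adaptive spatio-temporal relations; class names, layer counts, and shapes are illustrative assumptions:

```python
# Hedged sketch of graph convolution over object-tube features, loosely in the
# spirit of the adaptive spatio-temporal graph described above. Names and
# shapes are assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TubeGraphConv(nn.Module):
    """One adaptive graph-convolution layer: edge weights between tubes are
    learned from pairwise feature affinities, then features are propagated."""
    def __init__(self, dim):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.update = nn.Linear(dim, dim)

    def forward(self, tubes):                                # tubes: (B, N, D)
        # Learned, input-dependent adjacency over the N tubes.
        affinity = self.query(tubes) @ self.key(tubes).transpose(1, 2)
        adj = F.softmax(affinity / tubes.size(-1) ** 0.5, dim=-1)   # (B, N, N)
        # Propagate neighbor features and refine with a residual connection.
        return F.relu(self.update(adj @ tubes)) + tubes

# A small two-layer stack standing in for the hierarchical refinement; the
# output would serve as tube-level visual inputs to a vision-language
# transformer.
B, N, D = 2, 8, 256                       # batch, number of tubes, feature dim
tubes = torch.randn(B, N, D)
model = nn.Sequential(TubeGraphConv(D), TubeGraphConv(D))
refined = model(tubes)                    # (B, N, D) refined tube features
print(refined.shape)
```

The residual connection keeps the original tube features accessible while the learned adjacency mixes in relational context, a common design choice for refinement layers of this kind.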
