Visual Commonsense-Aware Representation Network for Video Captioning.
Article in English | MEDLINE | ID: mdl-38127607
ABSTRACT
Generating consecutive descriptions for videos, that is, video captioning, requires taking full advantage of visual representations throughout the generation process. Existing video captioning methods focus on exploring spatial-temporal representations and their relationships to make inferences. However, such methods only exploit the superficial associations contained in a video itself, without considering the intrinsic visual commonsense knowledge that exists across a video dataset, which may hinder their capacity to reason about and generate accurate descriptions. To address this problem, we propose a simple yet effective method, called the visual commonsense-aware representation network (VCRN), for video captioning. Specifically, we construct a Video Dictionary, a plug-and-play component, obtained by clustering all video features from the entire dataset into multiple cluster centers without additional annotation. Each center implicitly represents a visual commonsense concept in the video domain, which our proposed visual concept selection (VCS) component uses to obtain a video-related concept feature. Next, a concept-integrated generation (CIG) component is proposed to enhance caption generation. Extensive experiments on three public video captioning benchmarks, MSVD, MSR-VTT, and VATEX, demonstrate that our method achieves state-of-the-art performance, indicating its effectiveness. In addition, integrating our method into an existing video question answering (VideoQA) method improves that method's performance, further demonstrating the generalization capability of our approach. The source code has been released at https://github.com/zchoi/VCRN.
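The dictionary-construction and concept-selection steps described in the abstract can be sketched as follows. This is a minimal illustration only, assuming pooled per-video features and plain k-means clustering; the function names (build_video_dictionary, select_concepts) and parameters (num_concepts, top_k) are hypothetical and not taken from the paper, whose actual implementation is at the repository linked above.

import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans

def build_video_dictionary(video_features: torch.Tensor,
                           num_concepts: int = 512) -> torch.Tensor:
    # Cluster all (N, D) pooled video features from the dataset into
    # num_concepts centers; each center stands in for an implicit
    # visual-commonsense concept. No annotation is required.
    kmeans = KMeans(n_clusters=num_concepts, n_init=10, random_state=0)
    kmeans.fit(video_features.cpu().numpy())
    return torch.from_numpy(kmeans.cluster_centers_).float()  # (K, D)

def select_concepts(query: torch.Tensor,
                    dictionary: torch.Tensor,
                    top_k: int = 8) -> torch.Tensor:
    # A plausible form of visual concept selection: retrieve the top_k
    # centers most similar to the query video feature and fuse them by
    # softmax-normalized similarity into one video-related concept feature.
    sims = F.cosine_similarity(query.unsqueeze(0), dictionary)  # (K,)
    weights, idx = sims.topk(top_k)
    weights = F.softmax(weights, dim=0)
    return (weights.unsqueeze(1) * dictionary[idx]).sum(dim=0)  # (D,)

# Toy usage: 1000 videos with 256-d features, then one query video.
feats = torch.randn(1000, 256)
dictionary = build_video_dictionary(feats, num_concepts=64)
concept_feature = select_concepts(feats[0], dictionary, top_k=8)

In a full captioning model, the fused concept feature would then be injected into the decoder (the role the paper assigns to its concept-integrated generation component); how that fusion is done is not specified in the abstract.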

Full text: 1 Collection: 01-international Database: MEDLINE Language: English Journal: IEEE Trans Neural Netw Learn Syst Year: 2023 Document type: Article Country of publication: United States