
UNIMEMnet: Learning long-term motion and appearance dynamics for video prediction with a unified memory network.

Dai, Kuai; Li, Xutao; Luo, Chuyao; Chen, Wuqiao; Ye, Yunming; Feng, Shanshan.

Neural Netw; 168: 256-271, 2023 Nov.

Article in English | MEDLINE | ID: mdl-37774512

ABSTRACT

As a pixel-wise dense forecasting task, video prediction is challenging due to its high computational complexity, dramatic future uncertainty, and extremely complicated spatial-temporal patterns. Many deep learning methods have been proposed for the task and have brought significant improvements. However, they focus on modeling short-term spatial-temporal dynamics and fail to sufficiently exploit long-term ones. As a result, these methods tend to perform unsatisfactorily when long-term forecasts are required. In this article, we propose a novel unified memory network (UNIMEMnet) for long-term video prediction, which effectively exploits long-term motion-appearance dynamics and unifies short-term and long-term spatial-temporal dynamics in a single architecture. In UNIMEMnet, a dual-branch multi-scale memory module is designed to extract and preserve long-term spatial-temporal patterns. In addition, a short-term spatial-temporal dynamics module and an alignment and fusion module are devised to capture short-term motion-appearance dynamics and coordinate them with the long-term ones from the memory module. Extensive experiments on five video prediction datasets covering both synthetic and real-world scenarios validate the effectiveness of UNIMEMnet and its superiority over state-of-the-art methods.

Subject(s)

Motion, Uncertainty
