IEEE Trans Pattern Anal Mach Intell ; 45(10): 12408-12426, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37819806

ABSTRACT

Natural untrimmed videos provide rich visual content for self-supervised learning. Yet most previous efforts to learn spatio-temporal representations rely on manually trimmed videos, such as the Kinetics dataset (Carreira and Zisserman 2017), resulting in limited diversity in visual patterns and limited performance gains. In this work, we aim to improve video representations by leveraging the rich information in natural untrimmed videos. For this purpose, we propose learning a hierarchy of temporal consistencies in videos, i.e., visual consistency and topical consistency, corresponding respectively to clip pairs that tend to be visually similar when separated by a short time span, and clip pairs that share similar topics when separated by a long time span. Specifically, we present a Hierarchical Consistency (HiCo++) learning framework, in which visually consistent pairs are encouraged to share the same feature representations via contrastive learning, while topically consistent pairs are coupled through a topical classifier that distinguishes whether they are topic-related, i.e., whether they come from the same untrimmed video. Additionally, we introduce a gradual sampling algorithm for the proposed hierarchical consistency learning and demonstrate its theoretical superiority. Empirically, we show that HiCo++ not only generates stronger representations on untrimmed videos, but also improves representation quality when applied to trimmed videos. This contrasts with standard contrastive learning, which fails to learn powerful representations from untrimmed videos. Source code will be made available here.
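
The following is a minimal sketch of the two-level objective the abstract describes: a contrastive loss on short-span (visually consistent) clip pairs plus a binary classifier that decides whether two long-span clips come from the same untrimmed video (topical consistency). This is not the authors' implementation; the module names, the InfoNCE-style formulation, the classifier architecture, and all hyperparameters are illustrative assumptions.

# Sketch of a hierarchical consistency loss combining visual (contrastive)
# and topical (same-video classification) terms. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HierarchicalConsistencyLoss(nn.Module):
    """Contrastive loss on visually consistent clip pairs plus a binary
    'same untrimmed video?' classifier on topically consistent pairs."""

    def __init__(self, feat_dim=128, temperature=0.1, topic_weight=1.0):
        super().__init__()
        self.temperature = temperature
        self.topic_weight = topic_weight
        # Hypothetical topical classifier: takes two clip embeddings and
        # predicts whether they were sampled from the same untrimmed video.
        self.topic_head = nn.Sequential(
            nn.Linear(2 * feat_dim, feat_dim), nn.ReLU(), nn.Linear(feat_dim, 1)
        )

    def forward(self, z_a, z_b, z_topic_a, z_topic_b, same_video):
        # Visual consistency: InfoNCE over short-span clip pairs (z_a, z_b),
        # treating matching indices in the batch as positives.
        z_a = F.normalize(z_a, dim=1)
        z_b = F.normalize(z_b, dim=1)
        logits = z_a @ z_b.t() / self.temperature          # (B, B) similarities
        targets = torch.arange(z_a.size(0), device=z_a.device)
        visual_loss = F.cross_entropy(logits, targets)

        # Topical consistency: classify whether two long-span clips share a
        # topic, i.e. originate from the same untrimmed video.
        pair = torch.cat([z_topic_a, z_topic_b], dim=1)
        topic_logits = self.topic_head(pair).squeeze(1)
        topic_loss = F.binary_cross_entropy_with_logits(
            topic_logits, same_video.float()
        )

        return visual_loss + self.topic_weight * topic_loss


# Toy usage with random tensors standing in for backbone clip features.
if __name__ == "__main__":
    B, D = 8, 128
    loss_fn = HierarchicalConsistencyLoss(feat_dim=D)
    z_a, z_b = torch.randn(B, D), torch.randn(B, D)        # short-span pair
    z_ta, z_tb = torch.randn(B, D), torch.randn(B, D)      # long-span pair
    same_video = torch.randint(0, 2, (B,))                 # 1 = same untrimmed video
    print(loss_fn(z_a, z_b, z_ta, z_tb, same_video).item())

In this sketch the gradual sampling schedule mentioned in the abstract is omitted; it would govern how the short- and long-span pairs are drawn from each untrimmed video over the course of training.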
