MixIR: Mixing Input and Representations for Contrastive Learning.
Article in English | MEDLINE | ID: mdl-39141459
ABSTRACT
Recently, contrastive learning has shown significant progress in learning visual representations from unlabeled data. The core idea is to train the backbone to be invariant to different augmentations of an instance. While most methods only maximize the feature similarity between two augmented views, we further generate more challenging training samples and force the model to keep predicting the aggregated representation on these hard samples. In this article, we propose MixIR, a mixture-based approach built upon the traditional Siamese network. On the one hand, we feed two augmented images of an instance to the backbone and obtain the aggregated representation by taking the elementwise maximum of the two features. On the other hand, we take the mixture of these augmented images as input and expect the model prediction to be close to the aggregated representation. In this way, the model sees more varied samples of an instance and learns to predict invariant representations for them. Thus, the learned model is more discriminative than those of previous contrastive learning methods. Extensive experiments on large-scale datasets show that MixIR steadily improves the baseline and achieves results competitive with state-of-the-art methods. Our code is available at https://github.com/happytianhao/MixIR.
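The abstract describes two ingredients: an aggregated target built as the elementwise maximum of the features of two augmented views, and a mixed input whose prediction is pulled toward that target. The following is a minimal NumPy sketch of that objective, not the authors' implementation; the linear `backbone`, the fixed mixing coefficient `lam`, and the cosine-distance loss are all illustrative assumptions.

```python
import numpy as np

def backbone(x, W):
    # Stand-in encoder: a single linear projection (hypothetical;
    # the paper uses a deep Siamese backbone).
    return x @ W

def mixir_loss(x1, x2, W, lam=0.5):
    """Sketch of the MixIR objective for a batch of paired views x1, x2."""
    z1 = backbone(x1, W)
    z2 = backbone(x2, W)
    # Aggregated representation: elementwise maximum of the two features.
    target = np.maximum(z1, z2)
    # Mixed input: convex combination of the two augmented views.
    z_mix = backbone(lam * x1 + (1.0 - lam) * x2, W)
    # Pull the mixed prediction toward the aggregated target
    # (here via cosine distance, an illustrative choice).
    cos = np.sum(z_mix * target, axis=1) / (
        np.linalg.norm(z_mix, axis=1) * np.linalg.norm(target, axis=1) + 1e-8
    )
    return 1.0 - cos.mean()

rng = np.random.default_rng(0)
x1 = rng.normal(size=(8, 32))   # batch of view 1
x2 = rng.normal(size=(8, 32))   # batch of view 2
W = rng.normal(size=(32, 16))   # encoder weights
loss = mixir_loss(x1, x2, W)
```

In practice the loss would be minimized over the backbone parameters alongside the standard view-to-view similarity term, so the model stays invariant both across augmentations and across their mixtures.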

Full text: 1 Collection: 01-international Database: MEDLINE Language: En Journal: IEEE Trans Neural Netw Learn Syst Year: 2024 Document type: Article Country of publication: United States