Kernel Masked Image Modeling Through the Lens of Theoretical Understanding.
Article in En | MEDLINE | ID: mdl-39190525
ABSTRACT
Masked image modeling (MIM) is widely regarded as the state-of-the-art (SOTA) self-supervised learning (SSL) technique for visual pretraining. The impressive generalization ability of MIM has also paved the way for the remarkable success of large-scale vision foundation models. In this article, we further discuss the validity and advantages of implementing MIM techniques in reproducing kernel Hilbert spaces (RKHSs), and we associate the analysis with a novel MIM method named R-MIM (short for RKHS-MIM). Through the careful construction of an augmentation graph and the use of spectral decomposition techniques, we establish a systematic theoretical link between the generalization ability of the proposed R-MIM and the choice of kernel function used during training. Specifically, we conclude that both the local Lipschitz constant of the resulting R-MIM model and the corresponding expected pretraining error can have a strong composite effect on bounding the downstream task error, depending on the kernel choice. We demonstrate that, under mild mathematical assumptions, the R-MIM method is guaranteed to attain a lower error bound on downstream tasks than vanilla MIM techniques such as masked autoencoder (MAE) and SimMIM. Empirical results corroborate our theoretical analysis, showing the superior generalization of the proposed R-MIM and the theoretical link to kernel choices. The code is available at https://github.com/yurui-q/R-MIM.

Full text: 1 Collection: 01-internacional Database: MEDLINE Language: En Journal: IEEE Trans Neural Netw Learn Syst Year: 2024 Document type: Article Country of publication: United States
