
Digging Into Uncertainty-Based Pseudo-Label for Robust Stereo Matching.

Shen, Zhelun; Song, Xibin; Dai, Yuchao; Zhou, Dingfu; Rao, Zhibo; Zhang, Liangjun.

IEEE Trans Pattern Anal Mach Intell ; 45(12): 14301-14320, 2023 Dec.

Article in English | MEDLINE | ID: mdl-37590113

ABSTRACT

Due to domain differences and unbalanced disparity distributions across datasets, current stereo matching approaches are commonly limited to a specific dataset and generalize poorly to others. This domain-shift issue is usually addressed by substantial adaptation on costly target-domain ground-truth data, which cannot be easily obtained in practical settings. In this paper, we propose to dig into uncertainty estimation for robust stereo matching. Specifically, to balance the disparity distribution, we employ pixel-level uncertainty estimation to adaptively adjust the disparity search space of the next stage, thereby driving the network to progressively prune the space of unlikely correspondences. Then, to address the limited ground-truth data, an uncertainty-based pseudo-label is proposed to adapt the pre-trained model to the new domain, where pixel-level and area-level uncertainty estimation are used to filter out the high-uncertainty pixels of predicted disparity maps and generate sparse but reliable pseudo-labels to bridge the domain gap. Experimentally, our method shows strong cross-domain, adaptation, and joint generalization and obtains 1st place on the stereo task of the Robust Vision Challenge 2020. Additionally, our uncertainty-based pseudo-labels can be extended to train monocular depth estimation networks in an unsupervised way, even achieving performance comparable to supervised methods.
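The two uses of pixel-level uncertainty described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that uncertainty is measured as the variance of a per-pixel probability distribution over disparity candidates, that the next-stage search range is the mean plus or minus `k` standard deviations, and that pseudo-labels are kept only where the variance is at most `max_var`. The function names, `k`, `max_var`, and the disparity bounds are all hypothetical, and area-level uncertainty is not modeled.

```python
import math
from typing import List, Optional, Tuple


def disparity_and_uncertainty(probs: List[float],
                              candidates: List[float]) -> Tuple[float, float]:
    """Return the expected disparity and its variance for one pixel.

    Assumption: variance of the disparity distribution is the uncertainty.
    """
    mean = sum(p * d for p, d in zip(probs, candidates))
    var = sum(p * (d - mean) ** 2 for p, d in zip(probs, candidates))
    return mean, var


def next_search_range(mean: float, var: float, k: float = 2.0,
                      d_min: float = 0.0,
                      d_max: float = 192.0) -> Tuple[float, float]:
    """Narrow the next-stage search range around the current estimate.

    Assumption: the range is mean +/- k standard deviations, clamped to
    [d_min, d_max]; confident pixels therefore get a tighter range.
    """
    half_width = k * math.sqrt(var)
    return max(d_min, mean - half_width), min(d_max, mean + half_width)


def pseudo_labels(prob_volume: List[List[List[float]]],
                  candidates: List[float],
                  max_var: float = 1.0) -> List[List[Optional[float]]]:
    """Build a sparse pseudo-label map from per-pixel distributions.

    prob_volume[y][x] holds the probabilities over `candidates`.
    Pixels whose variance exceeds `max_var` are rejected (None).
    """
    labels: List[List[Optional[float]]] = []
    for row in prob_volume:
        out_row: List[Optional[float]] = []
        for probs in row:
            mean, var = disparity_and_uncertainty(probs, candidates)
            out_row.append(mean if var <= max_var else None)
        labels.append(out_row)
    return labels


if __name__ == "__main__":
    candidates = [0.0, 1.0, 2.0, 3.0]
    confident = [0.0, 1.0, 0.0, 0.0]      # sharply peaked distribution
    ambiguous = [0.25, 0.25, 0.25, 0.25]  # flat distribution
    print(pseudo_labels([[confident, ambiguous]], candidates))
    print(next_search_range(*disparity_and_uncertainty(ambiguous, candidates)))
```

In this sketch a sharply peaked distribution keeps its disparity as a pseudo-label and receives a narrow search range, while a flat distribution is dropped from the pseudo-label map and retains a wide range.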

Rethinking Training Strategy in Stereo Matching.

Rao, Zhibo; Dai, Yuchao; Shen, Zhelun; He, Renjie.

IEEE Trans Neural Netw Learn Syst ; 34(10): 7796-7809, 2023 Oct.

Article in English | MEDLINE | ID: mdl-35143404

ABSTRACT

In stereo matching, various learning-based approaches have shown impressive performance in solving traditional difficulties on multiple datasets. While most progress is obtained on a specific dataset with a dataset-specific network design, the effect of the training strategy on single-dataset and cross-dataset performance is often ignored. In this article, we analyze the relationship between different training strategies and performance by retraining some representative state-of-the-art methods (e.g., geometry and context network (GC-Net), pyramid stereo matching network (PSM-Net), and guided aggregation network (GA-Net)). According to our research, it is surprising that the performance of networks on single or cross datasets is significantly improved by pre-training and data augmentation without any particular structural requirement. Based on this discovery, we improve our previous non-local context attention network (NLCA-Net) to NLCA-Net v2, train it with the novel strategy, and concurrently rethink the training strategy of stereo matching. The quantitative experiments demonstrate that: 1) our model is capable of reaching top performance on both the single dataset and the multiple datasets with the same parameters in this study, and it also won 2nd place in the stereo task of the ECCV Robust Vision Challenge 2020 (RVC 2020); and 2) on small datasets (e.g., KITTI, ETH3D, and Middlebury), the model's generalization and robustness are significantly affected by pre-training and data augmentation, even exceeding the influence of the network structure in some cases. These observations challenge the conventional wisdom about network architectures at this stage. We expect these discoveries to encourage researchers to rethink the current paradigm of "excessive attention on the performance of a single small dataset" in stereo matching.
