1.
Article in English | MEDLINE | ID: mdl-37934642

ABSTRACT

This article presents a self-corrective network-based long-term tracker (SCLT) comprising a self-modulated tracking reliability evaluator (STRE) and a self-adjusting proposal postprocessor (SPPP). Targets in long-term sequences often undergo severe appearance variations. Existing long-term trackers typically update their models online to adapt to these variations, but inaccurate tracking results introduce cumulative error into the updated model, which can cause severe drift. A robust long-term tracker should therefore be self-corrective: it should judge whether a tracking result is reliable, and it should recapture the target when severe drift occurs due to serious challenges (e.g., full occlusion and out-of-view motion). To address the first issue, the STRE employs an effective tracking reliability classifier built on a modulation subnetwork. The classifier is trained on samples with pseudo labels generated by an adaptive self-labeling strategy. The adaptive self-labeling automatically labels, according to the statistical characteristics of the target state, the hard negative samples that existing trackers often neglect, and the network modulation mechanism guides the backbone network to learn more discriminative features without extra training data. To address the second issue, once the STRE is triggered, the SPPP applies a dynamic NMS to recapture the target promptly and accurately. In addition, the STRE and the SPPP show good transferability: their benefits carry over when combined with multiple baselines. Compared with the commonly used greedy NMS, the proposed dynamic NMS leverages an adaptive strategy to handle in-view and out-of-view conditions differently, and can thus select the most probable object box, which is essential for accurately updating the basic tracker online.
Extensive evaluations on four large-scale and challenging benchmark datasets (VOT2021LT, OxUvALT, TLP, and LaSOT) demonstrate the superiority of the proposed SCLT over a variety of state-of-the-art long-term trackers on all measures. Source code and demos are available at https://github.com/TJUT-CV/SCLT.
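The abstract contrasts the proposed dynamic NMS with greedy NMS but does not give details. The following is a minimal, hypothetical sketch of the idea under one plausible reading: when the target was recently in view, candidate scores are biased toward boxes overlapping the last confident location; when it was out of view, suppression falls back to plain score-ordered greedy NMS over the whole frame. The function names, the `dist_weight` parameter, and the overlap-bias heuristic are all assumptions for illustration, not the authors' actual algorithm.

```python
import numpy as np

def iou(a, b):
    # boxes as [x1, y1, x2, y2]
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def dynamic_nms(boxes, scores, last_box, in_view, iou_thr=0.5, dist_weight=0.3):
    """Hypothetical dynamic NMS: if the target was in view, bias scores
    toward candidates overlapping the last confident box; otherwise run
    plain greedy suppression over the whole frame."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    if in_view and last_box is not None:
        overlaps = np.array([iou(b, last_box) for b in boxes])
        scores = scores + dist_weight * overlaps
    keep = []
    order = scores.argsort()[::-1]  # highest (possibly biased) score first
    while order.size:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        ious = np.array([iou(boxes[i], boxes[j]) for j in rest])
        order = rest[ious < iou_thr]  # suppress heavily overlapping boxes
    return keep
```

The point of the adaptive bias is the one the abstract makes: after full occlusion or out-of-view motion, a purely greedy pick can lock onto a distractor, while conditioning the selection on the tracker's state keeps the model update anchored to the most probable target box.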

2.
Article in English | MEDLINE | ID: mdl-37141054

ABSTRACT

Cognitive research has found that humans accomplish event segmentation as a side effect of event anticipation. Inspired by this finding, we propose a simple yet effective end-to-end self-supervised learning framework for event segmentation/boundary detection. Unlike mainstream clustering-based methods, our framework exploits a transformer-based feature reconstruction scheme and detects event boundaries by reconstruction error. This is consistent with the fact that humans spot new events by leveraging the deviation between their predictions and what is actually perceived. Because frames at boundaries are semantically heterogeneous, they are difficult to reconstruct (generally yielding large reconstruction errors), which is favorable for event boundary detection. In addition, since reconstruction occurs at the semantic feature level rather than the pixel level, we develop a temporal contrastive feature embedding (TCFE) module to learn the semantic visual representation used for frame feature reconstruction (FFR). This procedure resembles humans building up experience in "long-term memory." Our goal is to segment generic events rather than localize specific ones, with a focus on accurate event boundaries. We therefore adopt the F1 score (Precision/Recall) as our primary evaluation metric for a fair comparison with previous approaches, and we also report the conventional frame-based mean over frames (MoF) and intersection over union (IoU) metrics. We thoroughly benchmark our work on four publicly available datasets and demonstrate substantially better results. The source code is available at https://github.com/wang3702/CoSeg.
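The core mechanism, reconstructing each frame's feature from its temporal context and flagging frames with large reconstruction error as boundaries, can be illustrated with a toy stand-in. The sketch below replaces the paper's transformer reconstructor with a simple neighbor-mean predictor and a mean-plus-sigma threshold; both simplifications, along with the function names and the `window` parameter, are assumptions made purely to show the error-peaks-at-boundaries idea.

```python
import numpy as np

def boundary_scores(features, window=2):
    """Toy stand-in for the transformer reconstructor: predict each frame
    feature as the mean of its temporal neighbours and score frames by the
    L2 reconstruction error. `features` has shape (T, D)."""
    T = len(features)
    scores = np.zeros(T)
    for t in range(T):
        lo, hi = max(0, t - window), min(T, t + window + 1)
        idx = [i for i in range(lo, hi) if i != t]
        recon = features[idx].mean(axis=0)
        scores[t] = np.linalg.norm(features[t] - recon)
    return scores

def detect_boundaries(scores, thresh_sigma=1.0):
    # flag frames whose error exceeds mean + k * std as event boundaries
    thr = scores.mean() + thresh_sigma * scores.std()
    return np.where(scores > thr)[0]
```

Frames inside a homogeneous event are well predicted by their neighbors, so their error is near zero; frames at a semantic transition mix two contexts and reconstruct poorly, which is exactly the property the abstract relies on.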

3.
IEEE Trans Med Imaging ; 39(9): 2904-2919, 2020 09.
Article in English | MEDLINE | ID: mdl-32167888

ABSTRACT

Vascular tree disentanglement and vessel type classification are two crucial steps of graph-based retinal artery-vein (A/V) separation. Existing approaches treat them as independent tasks, relying mostly on ad hoc rules (e.g., changes of vessel direction) and hand-crafted features (e.g., color, thickness) to handle each. However, we argue that the two tasks are highly correlated and should be handled jointly: knowing the A/V type can unravel highly entangled vascular trees, which in turn helps infer the types of connected vessels that are hard to classify from appearance alone. Designing features and models for the two tasks in isolation therefore often leads to a suboptimal A/V separation. In view of this, this paper proposes a multi-task Siamese network that learns the two tasks jointly and thus yields more robust deep features for accurate A/V separation. Specifically, we first introduce Convolution Along Vessel (CAV) to extract visual features by convolving a fundus image along vessel segments, and geometric features by tracking the direction of blood flow in vessels. The Siamese network is then trained on multiple tasks: i) classifying the A/V types of vessel segments using visual features only, and ii) estimating the similarity of every two connected segments by comparing their visual and geometric features, in order to disentangle the vasculature into individual vessel trees. Finally, the results of the two tasks mutually correct each other to produce the final A/V separation. Experimental results demonstrate that our method achieves accuracies of 94.7%, 96.9%, and 94.5% on the three major databases DRIVE, INSPIRE, and WIDE, respectively, outperforming recent state-of-the-art methods.
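The "mutual correction" step, where per-segment A/V predictions and pairwise same-tree similarities refine each other, can be sketched as simple label propagation over the vessel graph. Everything below is a hypothetical illustration: the function name, the `sim_thr` threshold, and the belief-averaging rule are assumptions, not the paper's actual procedure, which uses the learned Siamese similarities end to end.

```python
import numpy as np

def refine_av_labels(seg_probs, edges, sims, sim_thr=0.8, n_iters=3):
    """Hypothetical mutual correction: seg_probs[i] is the appearance-only
    probability that segment i is an artery; edges lists connected segment
    pairs; sims[k] is the learned similarity that the pair edges[k] belongs
    to the same vessel tree. Confidently similar neighbours pull each
    other's artery probability together."""
    probs = np.asarray(seg_probs, dtype=float).copy()
    for _ in range(n_iters):
        new = probs.copy()
        for (i, j), s in zip(edges, sims):
            if s >= sim_thr:
                # same tree implies same A/V type: average the two beliefs
                m = 0.5 * (probs[i] + probs[j])
                new[i], new[j] = m, m
        probs = new
    return (probs > 0.5).astype(int)  # 1 = artery, 0 = vein
```

This captures the correlation the abstract argues for: a segment that is ambiguous in appearance (probability near 0.5) inherits the type of a confidently classified segment on the same disentangled tree, while low-similarity junctions, where trees of different types cross, do not propagate labels.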


Subject(s)
Retinal Artery , Retinal Vein , Algorithms , Fundus Oculi , Retina , Retinal Vein/diagnostic imaging , Retinal Vessels/diagnostic imaging