ABSTRACT
Recent works on video salient object detection have demonstrated that directly transferring the generalization ability of image-based models to video data without modeling spatial-temporal information remains nontrivial and challenging. Considering both intraframe accuracy and interframe consistency of saliency detection, this article presents a novel cross-attention-based encoder-decoder model under the Siamese framework (CASNet) for video salient object detection. A baseline encoder-decoder model trained with the Lovász softmax loss function is adopted as a backbone network to guarantee the accuracy of intraframe salient object detection. Self- and cross-attention modules are incorporated into our model in order to preserve the saliency correlation and improve interframe salient object detection consistency. Extensive experimental results obtained by ablation analysis and cross-data set validation demonstrate the effectiveness of our proposed method. Quantitative results indicate that our CASNet model outperforms 19 state-of-the-art image- and video-based methods on six benchmark data sets.
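The abstract does not give implementation details, but the core of a cross-attention module between two frames can be sketched as scaled dot-product attention, where one frame's features act as queries and the other frame's features act as keys and values. The function names, feature shapes, and the omission of learned projection matrices below are all assumptions for illustration, not the paper's actual CASNet architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, key_feats):
    """Scaled dot-product cross-attention between two frames.

    query_feats: (N, d) flattened features of frame A.
    key_feats:   (M, d) flattened features of frame B.
    Returns (N, d) features of frame A re-expressed as attention-weighted
    combinations of frame B's features, which is one way interframe
    saliency correlation can be propagated. Learned query/key/value
    projections are omitted here for brevity (an assumption).
    """
    d = query_feats.shape[-1]
    scores = query_feats @ key_feats.T / np.sqrt(d)  # (N, M) affinities
    weights = softmax(scores, axis=-1)               # rows sum to 1
    return weights @ key_feats                       # (N, d) refined features

# Self-attention is the special case where both frames are the same:
def self_attention(feats):
    return cross_attention(feats, feats)
```

In a Siamese setup, the same backbone would extract `query_feats` and `key_feats` from two frames with shared weights, and the attended features would be fed to the decoder to encourage consistent saliency maps across frames.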
ABSTRACT
This paper focuses on event-triggered sequential fusion estimation for multi-sensor systems with correlated noises. An event-triggered communication mechanism is introduced to reduce unnecessary energy consumption. Considering that the measurement noises of different sensors are correlated with each other and also with the system noise of the previous step, an event-triggered sequential fusion estimation algorithm is proposed in the linear minimum variance sense. Admissible values of the correlation parameters are specified to ensure convergence of the designed fusion algorithm, and an upper bound on the estimation error covariance is given. A numerical example illustrates the effectiveness of the presented fusion algorithm.
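The abstract does not specify the triggering rule, so the sketch below illustrates the general idea with a simple send-on-delta trigger on a scalar Kalman filter: a sensor transmits its measurement only when it deviates from the last transmitted value by more than a threshold, and the estimator performs only the time update otherwise. The threshold rule, scalar model, and parameter values are assumptions for illustration; the paper's algorithm additionally accounts for cross-correlated sensor and system noises, which this sketch does not:

```python
import numpy as np

def event_triggered_kf(ys, delta, a=1.0, q=0.01, r=0.1):
    """Scalar Kalman filter with a send-on-delta event trigger.

    ys:    sequence of raw sensor measurements.
    delta: trigger threshold; a measurement is "transmitted" only when
           |y - last_sent| > delta (illustrative rule, not the paper's).
    Returns (estimates, number of transmitted measurements).
    """
    x, p = 0.0, 1.0          # initial state estimate and covariance
    last_sent = None
    estimates, sent = [], 0
    for y in ys:
        x, p = a * x, a * a * p + q          # time update (always runs)
        if last_sent is None or abs(y - last_sent) > delta:
            k = p / (p + r)                  # measurement update on trigger
            x, p = x + k * (y - x), (1.0 - k) * p
            last_sent = y
            sent += 1
        estimates.append(x)
    return np.array(estimates), sent
```

With a larger `delta`, fewer measurements are transmitted, trading estimation accuracy for communication energy; the paper's contribution is to characterize when such a scheme still converges despite the noise correlations.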