1.
IEEE Trans Image Process ; 33: 3145-3160, 2024.
Article in English | MEDLINE | ID: mdl-38656843

ABSTRACT

Multi-view subspace clustering (MVSC) has drawn significant attention in recent studies. In this paper, we propose a novel approach to MVSC. First, the new method preserves high-order neighbor information of the data, which captures essential and intricate underlying relationships that are not straightforwardly preserved by first-order neighbors. Second, we design log-based nonconvex approximations to both tensor rank and tensor sparsity, which are effective and more accurate than convex approximations. For the associated shrinkage problems, we provide theoretical results establishing closed-form solutions, and the convergence of the resulting algorithm is guaranteed by theoretical analysis. Moreover, the new approximations exhibit interesting shrinkage properties, which are likewise established theoretically. Extensive experimental results confirm the effectiveness of the proposed method.
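
As a rough illustration of what a log-based nonconvex shrinkage with a closed-form solution can look like, the sketch below uses the generic penalty lambda*log(eps + |x|) applied entrywise and to singular values; this is an assumed formulation for illustration, not necessarily the paper's exact approximation.

```python
import numpy as np

def log_shrink(y, lam, eps=1e-2):
    """Closed-form proximal operator of lam * log(eps + |x|) (assumed penalty).

    Solves argmin_x 0.5*(x - y)**2 + lam*log(eps + |x|) elementwise.
    Stationarity gives a quadratic in |x|; keep the larger root when it
    exists and beats x = 0, otherwise shrink to 0.
    """
    y = np.asarray(y, dtype=float)
    a = np.abs(y)
    disc = (a + eps) ** 2 - 4.0 * lam                      # quadratic discriminant
    root = np.where(disc > 0,
                    ((a - eps) + np.sqrt(np.maximum(disc, 0.0))) / 2.0,
                    0.0)
    root = np.maximum(root, 0.0)
    # keep the root only if it lowers the objective relative to x = 0
    obj_root = 0.5 * (root - a) ** 2 + lam * np.log(eps + root)
    obj_zero = 0.5 * a ** 2 + lam * np.log(eps)
    return np.sign(y) * np.where(obj_root < obj_zero, root, 0.0)

def log_rank_shrink(M, lam, eps=1e-2):
    """Apply the same shrinkage to singular values: a nonconvex surrogate
    for rank minimization (matrix analogue of the tensor case)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(log_shrink(s, lam, eps)) @ Vt
```

Because the log penalty grows slowly for large values, large singular values are shrunk far less than under the convex nuclear norm, which is the usual motivation for such nonconvex surrogates.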

2.
IEEE Trans Image Process ; 33: 338-353, 2024.
Article in English | MEDLINE | ID: mdl-38100339

ABSTRACT

Existing salient object detection methods can predict binary maps that highlight visually salient regions. However, they are limited in their ability to differentiate the relative importance of multiple objects and the relationships among them, which can lead to errors and reduced accuracy in downstream tasks that depend on the relative importance of multiple objects. To address this, this paper proposes a new paradigm for saliency ranking, which focuses entirely on ranking salient objects by their "importance order". While previous works have shown promising performance, they still face ill-posed problems. First, existing methods for generating saliency ranking ground-truth (GT) orders are unreasonable, since determining the correct ranking order is not well defined, resulting in false alarms. Second, training a ranking model remains challenging because most saliency ranking methods follow the multi-task paradigm, leading to conflicts and trade-offs among different tasks. Third, existing regression-based saliency ranking methods are overly complex because they rely on instance-mask-based saliency ranking orders; they require a significant amount of data to perform accurately and can be challenging to implement effectively. To solve these problems, this paper conducts an in-depth analysis of the causes and proposes a whole-flow processing paradigm for the saliency ranking task, covering "GT data generation", "network structure design", and "training protocol". The proposed approach outperforms existing state-of-the-art methods on the widely used SALICON set, as demonstrated by extensive experiments with fair and reasonable comparisons. The saliency ranking task is still in its infancy, and our proposed unified framework can serve as a fundamental strategy to guide future work. The code and data will be available at https://github.com/MengkeSong/Saliency-Ranking-Paradigm.

3.
IEEE Trans Vis Comput Graph ; 29(11): 4361-4371, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37788214

ABSTRACT

We present FineStyle, a novel framework for motion style transfer that generates expressive human animations in specific styles for virtual reality and vision applications. It incorporates semantic awareness, which improves motion representation and allows for precise and stylish animation generation. Existing motion style transfer methods fail to consider the semantic meaning behind the motion, resulting in limited control over the generated human animations. To address this, FineStyle introduces a new cross-modality fusion module called Dual Interactive-Flow Fusion (DIFF). As a first attempt of its kind, DIFF integrates motion style features and semantic flows, producing semantic-aware style codes for fine-grained motion style transfer. FineStyle uses an innovative two-stage semantic guidance approach that leverages semantic clues to enhance the discriminative power of both semantic and style features. At an early stage, a semantic-guided encoder introduces distinct semantic clues into the style flow. Then, at a fine stage, both flows are fused interactively, selecting the matched and critical clues from each flow. Extensive experiments demonstrate that FineStyle outperforms state-of-the-art methods in visual quality and controllability. By considering the semantic meaning behind motion style patterns, FineStyle allows for more precise control over motion styles. Source code and models are available at https://github.com/XingliangJin/Fine-Style.git.
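
The abstract does not detail DIFF's internals; as a purely illustrative sketch of what interactive fusion between a style flow and a semantic flow can look like, bidirectional cross-attention is one plausible form. All module names and shapes below are assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class TwoFlowInteractiveFusion(nn.Module):
    """Hypothetical interactive fusion of a style flow and a semantic flow.

    Each flow attends to the other (cross-attention in both directions),
    and the refined flows are concatenated into a fused code. This only
    illustrates the general idea of interactive two-flow fusion."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.style_to_sem = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.sem_to_style = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, style_feats, sem_feats):
        # style_feats, sem_feats: (batch, tokens, dim)
        style_ref, _ = self.style_to_sem(style_feats, sem_feats, sem_feats)
        sem_ref, _ = self.sem_to_style(sem_feats, style_feats, style_feats)
        fused = torch.cat([style_feats + style_ref, sem_feats + sem_ref], dim=-1)
        return self.proj(fused)  # a semantic-aware style code (illustrative)
```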

4.
IEEE Trans Image Process ; 31: 6649-6663, 2022.
Article in English | MEDLINE | ID: mdl-36260595

ABSTRACT

Recent research advances in salient object detection (SOD) can largely be attributed to ever-stronger multi-scale feature representations empowered by deep learning. Existing SOD deep models extract multi-scale features via off-the-shelf encoders and combine them via various carefully designed decoders. However, the kernel sizes in this commonly used pipeline are usually "fixed". In our new experiments, we have observed that kernels of small size are preferable in scenarios containing tiny salient objects, whereas large kernel sizes perform better for images with large salient objects. Inspired by this observation, this paper advocates "dynamic" scale routing as a new idea, yielding a generic plug-in that fits directly into existing feature backbones. The key technical innovations are two-fold. First, instead of using vanilla convolutions with fixed kernel sizes in the encoder, we propose the dynamic pyramid convolution (DPConv), which dynamically selects the best-suited kernel sizes w.r.t. the given input. Second, we provide a self-adaptive bidirectional decoder design to best accommodate the DPConv-based encoder. Its most significant highlight is the capability of routing between feature scales and their dynamic collection, making the inference process scale-aware. As a result, the proposed method advances the current SOTA performance. Both the code and dataset are publicly available at https://github.com/wuzhenyubuaa/DPNet.
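
The repository holds the actual DPConv implementation; the sketch below only illustrates the general idea of input-conditioned routing across parallel convolutions with different kernel sizes (the branch layout and gating head are assumptions, not the paper's design).

```python
import torch
import torch.nn as nn

class DynamicKernelConv(nn.Module):
    """Illustrative input-conditioned selection over kernel sizes 3/5/7.

    A small gating head predicts per-branch weights from globally pooled
    features, so small kernels can dominate for tiny objects and large
    kernels for large ones. Not the paper's exact DPConv."""
    def __init__(self, in_ch, out_ch, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, k, padding=k // 2) for k in kernel_sizes
        )
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(in_ch, len(kernel_sizes)),
        )

    def forward(self, x):
        weights = torch.softmax(self.gate(x), dim=1)           # (B, n_branches)
        outs = torch.stack([b(x) for b in self.branches], 1)   # (B, n, C, H, W)
        return (weights[:, :, None, None, None] * outs).sum(dim=1)
```

Because the gate is recomputed per input, the effective receptive field adapts to the image, which is the scale-aware behavior the abstract describes.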

5.
IEEE Trans Image Process ; 31: 6124-6138, 2022.
Article in English | MEDLINE | ID: mdl-36112559

ABSTRACT

Most existing RGB-D salient object detection (SOD) methods focus primarily on cross-modal and cross-level saliency fusion, which has proved to be efficient and effective. However, these methods still have a critical limitation: their fusion patterns, typically selective combination and its variants, depend too heavily on the network's non-linear adaptability. In such methods, the balance between RGB and D (depth) is formulated individually over intermediate feature slices, but the relation at the modality level may not be learned properly. The optimal RGB-D combination differs across RGB-D scenarios, and the exact complementary status is frequently determined by multiple modality-level factors, such as D quality, the complexity of the RGB scene, and the degree of harmony between them. Existing approaches may therefore find it difficult to achieve further performance breakthroughs, as their methodologies are comparatively insensitive to the modality level. To address this problem, this paper presents the Modality-aware Decoder (MaD). The critical technical innovations include a series of feature embedding, modality reasoning, and feature back-projecting and collecting strategies, all of which upgrade the widely used multi-scale and multi-level decoding process to be modality-aware. Our MaD achieves competitive performance against other state-of-the-art (SOTA) models without using any elaborate tricks in the decoder design. Codes and results will be publicly available at https://github.com/MengkeSong/MaD.

6.
IEEE Trans Image Process ; 30: 4238-4252, 2021.
Article in English | MEDLINE | ID: mdl-33819154

ABSTRACT

Human attention is an interactive activity between our visual system and our brain, using both low-level visual stimulus and high-level semantic information. Previous image salient object detection (SOD) studies conduct their saliency predictions via a multitask methodology in which pixelwise saliency regression and segmentation-like saliency refinement are conducted simultaneously. However, this multitask methodology has one critical limitation: the semantic information embedded in feature backbones might be degenerated during the training process. Our visual attention is determined mainly by semantic information, which is evidenced by our tendency to pay more attention to semantically salient regions even if these regions are not the most perceptually salient at first glance. This fact clearly contradicts the widely used multitask methodology mentioned above. To address this issue, this paper divides the SOD problem into two sequential steps. First, we devise a lightweight, weakly supervised deep network to coarsely locate the semantically salient regions. Next, as a postprocessing refinement, we selectively fuse multiple off-the-shelf deep models on the semantically salient regions identified by the previous step to formulate a pixelwise saliency map. Compared with the state-of-the-art (SOTA) models that focus on learning the pixelwise saliency in single images using only perceptual clues, our method aims at investigating the object-level semantic ranks between multiple images, of which the methodology is more consistent with the human attention mechanism. Our method is simple yet effective, and it is the first attempt to consider salient object detection as mainly an object-level semantic reranking problem.

7.
IEEE Trans Image Process ; 30: 3995-4007, 2021.
Article in English | MEDLINE | ID: mdl-33784620

ABSTRACT

We have witnessed growing interest in video salient object detection (VSOD) techniques in today's computer vision applications. In contrast with temporal information (which is still considered a rather unstable source), spatial information is more stable and ubiquitous and thus could influence our vision system more. As a result, current mainstream VSOD approaches infer their saliency primarily from the spatial perspective, treating temporal information as subordinate. Although this spatially focused methodology is effective in achieving a numeric performance gain, it has two critical limitations. First, to ensure the dominance of spatial information, its temporal counterpart remains inadequately used, even though in some complex video scenes the temporal information may be the only reliable data source, which is critical for deriving the correct VSOD. Second, both spatial and temporal saliency cues are often computed independently in advance and integrated later, while the interactions between them are omitted completely, resulting in saliency cues of limited quality. To combat these challenges, this paper advocates a novel spatiotemporal network whose key innovation is the design of its temporal unit. Compared with existing competitors (e.g., convLSTM), the proposed temporal unit has an extremely lightweight design that does not degrade its ability to sense temporal information. Furthermore, it fully enables the computation of temporal saliency cues that interact with their spatial counterparts, ultimately boosting the overall VSOD performance through mutual improvement between the two. The proposed method is easy to implement yet effective, achieving high-quality VSOD at 50 FPS in real-time applications.

8.
IEEE Trans Image Process ; 30: 2350-2363, 2021.
Article in English | MEDLINE | ID: mdl-33481710

ABSTRACT

Existing fusion-based RGB-D salient object detection methods usually adopt a bistream structure to strike a balance in the fusion trade-off between RGB and depth (D). While D quality usually varies across scenes, the state-of-the-art bistream approaches are depth-quality-unaware, which makes it substantially harder to achieve a complementary fusion status between RGB and D and leads to poor fusion results for low-quality D. Thus, this paper integrates a novel depth-quality-aware subnet into the classic bistream structure to assess the depth quality before conducting the selective RGB-D fusion. Compared to the SOTA bistream methods, the major advantage of our method is its ability to lessen the importance of low-quality, no-contribution, or even negative-contribution D regions during RGB-D fusion, achieving a much improved complementary status between RGB and D. Our source code and data are available online at https://github.com/qdu1995/DQSD.
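
The abstract does not specify the depth-quality-aware subnet, so the sketch below is only a hedged illustration of how a predicted quality score could gate depth features before fusion with RGB; the single scalar gate and all names are assumptions.

```python
import torch
import torch.nn as nn

class QualityGatedFusion(nn.Module):
    """Illustrative depth-quality-aware fusion: a tiny head scores the
    depth features, and low-quality depth is down-weighted before the
    fusion with RGB features. Not the paper's actual subnet."""
    def __init__(self, ch):
        super().__init__()
        self.quality_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(ch, 1), nn.Sigmoid(),     # quality score q in [0, 1]
        )
        self.fuse = nn.Conv2d(2 * ch, ch, kernel_size=3, padding=1)

    def forward(self, rgb_feat, depth_feat):
        q = self.quality_head(depth_feat).view(-1, 1, 1, 1)
        gated_depth = q * depth_feat            # suppress low-quality depth
        return self.fuse(torch.cat([rgb_feat, gated_depth], dim=1))
```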

9.
Knowl Based Syst ; 233, 2021 Dec 05.
Article in English | MEDLINE | ID: mdl-36059387

ABSTRACT

We introduce a new classifier for small-sample image data based on a two-dimensional discriminative regression approach. For a test example, our method estimates a discriminative representation from training examples, which accounts for the discriminativeness between classes and enables accurate derivation of categorical information. Unlike existing methods that vectorize image data, our method learns the representation from the two-dimensional features of the data, and thus the inherent spatial information of the data is fully exploited. This new type of two-dimensional discriminative regression, different from existing regression models, allows for building a highly effective and robust classifier for image data by explicitly incorporating discriminative information and inherent spatial information. We compare our method with several state-of-the-art classifiers for small-sample images, and the experimental results show the superior performance of the proposed method in classification accuracy as well as robustness to noise corruption.

10.
IEEE Trans Image Process ; 30: 458-471, 2021.
Article in English | MEDLINE | ID: mdl-33201813

ABSTRACT

Existing RGB-D salient object detection methods treat depth information as an independent component that complements RGB, and widely follow a bistream parallel network architecture. To selectively fuse the CNN features extracted from both RGB and depth into a final result, the state-of-the-art (SOTA) bistream networks usually consist of two independent subbranches: one for RGB saliency and the other for depth saliency. However, depth saliency is persistently inferior to RGB saliency because the RGB component is intrinsically more informative than the depth component. The bistream architecture thus easily biases the subsequent fusion procedure toward the RGB subbranch, leading to a performance bottleneck. In this paper, we propose a novel data-level recombination strategy that fuses RGB with D (depth) before deep feature extraction, cyclically converting the original 4-dimensional RGB-D input into DGB, RDB, and RGD. Then, a newly designed lightweight triple-stream network is applied to these reformulated data to achieve an optimal channel-wise complementary fusion status between RGB and D, achieving a new SOTA performance.
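
The data-level recombination itself is straightforward to illustrate. Assuming the input channels are ordered R, G, B, D, the following sketch forms the DGB, RDB, and RGD inputs by cyclically substituting depth for one color channel (the channel ordering is an assumption).

```python
import torch

def recombine_rgbd(rgbd):
    """rgbd: (B, 4, H, W) tensor, channels assumed ordered R, G, B, D.

    Returns the three recombined 3-channel inputs (DGB, RDB, RGD), each
    obtained by substituting depth for one color channel."""
    r, g, b, d = rgbd[:, 0:1], rgbd[:, 1:2], rgbd[:, 2:3], rgbd[:, 3:4]
    dgb = torch.cat([d, g, b], dim=1)
    rdb = torch.cat([r, d, b], dim=1)
    rgd = torch.cat([r, g, d], dim=1)
    return dgb, rdb, rgd
```

Each recombined tensor can then feed one stream of a triple-stream backbone, so depth is mixed with color at the data level rather than only at the feature-fusion stage.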

11.
IEEE Trans Vis Comput Graph ; 26(12): 3535-3545, 2020 Dec.
Article in English | MEDLINE | ID: mdl-32941153

ABSTRACT

2D image based salient object detection (SOD) has been extensively explored, while 360° omnidirectional image based SOD has received less research attention, and three major bottlenecks limit its performance. First, the currently available training data are insufficient for training a 360° SOD deep model. Second, the visual distortions in 360° omnidirectional images usually result in a large feature gap between 360° images and 2D images; consequently, stage-wise training, a widely used remedy for the training data shortage, becomes infeasible when conducting SOD in 360° omnidirectional images. Third, the existing 360° SOD approach follows a multi-task methodology that performs salient object localization and segmentation-like saliency refinement at the same time; this extremely large problem domain makes the training data shortage dilemma even worse. To tackle these issues, this paper divides 360° SOD into a multi-stage task, the key rationale of which is to decompose the original complex problem domain into sequential, easier sub-problems that demand only small-scale training data. Meanwhile, we learn how to rank the "object-level semantic saliency", aiming to locate salient viewpoints and objects accurately. Specifically, to alleviate the training data shortage problem, we have released a novel dataset named 360-SSOD, containing 1,105 360° omnidirectional images with manually annotated object-level saliency ground truth, whose semantic distribution is more balanced than that of the existing dataset. We have also compared the proposed method with 13 SOTA methods, and all quantitative results demonstrate its performance superiority.

12.
Article in English | MEDLINE | ID: mdl-32012011

ABSTRACT

In RGB-D saliency detection, depth information plays a critical role in distinguishing salient objects or foregrounds from cluttered backgrounds. As the complementary component to color information, the depth quality directly dictates the subsequent saliency detection performance. However, due to artifacts and the limitations of depth acquisition devices, the quality of the obtained depth varies tremendously across scenarios. Consequently, conventional selective fusion-based RGB-D saliency detection methods may suffer degraded detection performance in cases containing salient objects with low color contrast coupled with low depth quality. To solve this problem, we make an initial attempt to estimate additional high-quality depth information, denoted Depth+. Serving as a complement to the original depth, Depth+ is fed into our newly designed selective fusion network to boost the detection performance. To achieve this aim, we first retrieve a small group of images that are similar to the given input, and then build inter-image, nonlocal correspondences accordingly. Using these inter-image correspondences, the overall depth can be coarsely estimated by our newly designed depth-transferring strategy. Next, we build fine-grained, object-level correspondences coupled with a saliency prior to further improve the quality of the previous depth estimate. Compared to the original depth, the newly estimated Depth+ is potentially more informative for detection improvement. Finally, we feed both the original depth and the newly estimated Depth+ into our selective deep fusion network, whose key novelty is to achieve an optimal complementary balance to make better decisions toward improving saliency boundaries.

13.
Article in English | MEDLINE | ID: mdl-31449017

ABSTRACT

This paper proposes to utilize supervised deep convolutional neural networks to take full advantage of long-term spatial-temporal information in order to improve video saliency detection performance. Conventional methods, which use only temporally neighboring frames, can easily encounter transient failure cases when the spatial-temporal saliency clues are untrustworthy for a long period. To tackle this limitation, we first identify beyond-scope frames with trustworthy long-term saliency clues and then align them with the current problem domain for improved video saliency detection.

14.
IEEE Trans Image Process ; 26(7): 3156-3170, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28221994

ABSTRACT

This paper advocates a novel video saliency detection method based on spatial-temporal saliency fusion and low-rank coherency guided saliency diffusion. In sharp contrast to conventional methods, which conduct saliency detection locally in a frame-by-frame manner and can easily give rise to incorrect low-level saliency maps, this paper proposes to fuse the color saliency based on global motion clues in a batch-wise fashion. We also propose low-rank coherency guided spatial-temporal saliency diffusion to guarantee the temporal smoothness of saliency maps, along with a series of saliency boosting strategies designed to further improve saliency accuracy. First, the original long-term video sequence is equally segmented into many short-term frame batches, and the motion clues of each individual video batch are integrated and diffused temporally to facilitate the computation of color saliency. Then, based on the obtained saliency clues, inter-batch saliency priors are modeled to guide the low-level saliency fusion. After that, both the raw color information and the fused low-level saliency are regarded as low-rank coherency clues, which are employed to guide the spatial-temporal saliency diffusion with the help of an additional permutation matrix serving as an alternative rank selection strategy. This guarantees the robustness of the saliency map's temporal consistency and further boosts the accuracy of the computed saliency map. Moreover, we conduct extensive experiments on five publicly available benchmarks and make comprehensive, quantitative comparisons between our method and 16 state-of-the-art techniques. All the results demonstrate the superiority of our method in accuracy, reliability, robustness, and versatility.
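
The batch-wise low-rank coherency analysis with its permutation-matrix rank selection cannot be reconstructed from the abstract alone; the sketch below shows only the generic low-rank plus sparse decomposition that such coherency analyses commonly build on, using a basic robust-PCA-style iteration with common heuristic defaults (an illustrative building block, not the paper's algorithm).

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft(M, tau):
    """Elementwise soft thresholding: proximal operator of tau * l1 norm."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def rpca(D, lam=None, mu=None, iters=200, tol=1e-7):
    """Decompose D into a low-rank part L and a sparse part S (D ~ L + S)
    via a simple inexact augmented-Lagrangian loop. Illustrative only."""
    m, n = D.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    mu = mu if mu is not None else 1.25 / (np.linalg.norm(D, 2) + 1e-12)
    L = np.zeros_like(D); S = np.zeros_like(D); Y = np.zeros_like(D)
    for _ in range(iters):
        L = svt(D - S + Y / mu, 1.0 / mu)       # low-rank (coherent) component
        S = soft(D - L + Y / mu, lam / mu)      # sparse residual
        resid = D - L - S
        Y = Y + mu * resid                      # dual update
        if np.linalg.norm(resid) <= tol * np.linalg.norm(D):
            break
    return L, S
```

In a batch-wise setting, D would typically stack per-frame features of one short-term batch as columns, so the low-rank part captures what stays coherent across the batch.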

15.
IEEE Trans Image Process ; 24(8): 2303-2316, 2015 Aug.
Article in English | MEDLINE | ID: mdl-25700446

ABSTRACT

This paper advocates a novel multiscale, structure-sensitive saliency detection method, which can distinguish multilevel, reliable saliency in various natural pictures in a robust and versatile way. One key challenge in saliency detection is to guarantee that the entire salient object is characterized differently from the nonsalient background. To tackle this, our strategy is to design a structure-aware descriptor based on the intrinsic biharmonic distance metric. One benefit of introducing this descriptor is its ability to simultaneously integrate local and global structure information, which is extremely valuable for separating the salient object from the nonsalient background in a multiscale sense. Having devised such a powerful shape descriptor, the remaining challenge is to capture saliency so that salient subparts actually stand out among all possible candidates. Toward this goal, we conduct multilevel low-rank and sparse analysis in the intrinsic feature space spanned by the shape descriptors defined on over-segmented super-pixels. Since the low-rank property places greater emphasis on stronger similarities among super-pixels, we naturally obtain a scale space along the rank dimension. Multiscale saliency can then be obtained simply by computing differences among the low-rank components across the rank scale. We conduct extensive experiments on several public benchmarks and make comprehensive, quantitative comparisons between our method and existing state-of-the-art techniques. All the results demonstrate the superiority of our method in accuracy, reliability, robustness, and versatility.
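
The stated idea that multiscale saliency can be obtained by computing differences among low-rank components across the rank scale can be illustrated directly on a generic superpixel descriptor matrix; the descriptor construction and the chosen ranks below are assumptions for illustration.

```python
import numpy as np

def rank_scale_space(F, ranks=(1, 2, 4, 8, 16)):
    """F: (n_superpixels, n_features) matrix of shape descriptors.

    Build rank-k approximations for increasing k and return the successive
    row-wise differences; superpixels whose reconstructions change a lot
    across the rank scale are the multiscale saliency cue described above.
    Ranks larger than the actual matrix rank simply reuse all components."""
    U, s, Vt = np.linalg.svd(F, full_matrices=False)
    approx = [U[:, :k] @ np.diag(s[:k]) @ Vt[:k] for k in ranks]
    diffs = [np.linalg.norm(b - a, axis=1) for a, b in zip(approx, approx[1:])]
    return np.stack(diffs, axis=1)   # (n_superpixels, len(ranks) - 1)
```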
