Search | VHL Regional Portal

Box2Mask: Box-Supervised Instance Segmentation via Level-Set Evolution.

Li, Wentong; Liu, Wenyu; Zhu, Jianke; Cui, Miaomiao; Yu, Risheng; Hua, Xiansheng; Zhang, Lei.

IEEE Trans Pattern Anal Mach Intell ; 46(7): 5157-5173, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38319771

ABSTRACT

In contrast to fully supervised methods using pixel-wise mask labels, box-supervised instance segmentation takes advantage of simple box annotations, which has recently attracted increasing research attention. This paper presents a novel single-shot instance segmentation approach, namely Box2Mask, which integrates the classical level-set evolution model into deep neural network learning to achieve accurate mask prediction with only bounding box supervision. Specifically, both the input image and its deep features are employed to evolve the level-set curves implicitly, and a local consistency module based on a pixel affinity kernel is used to mine the local context and spatial relations. Two types of single-stage frameworks, i.e., CNN-based and transformer-based frameworks, are developed to empower the level-set evolution for box-supervised instance segmentation, and each framework consists of three essential components: instance-aware decoder, box-level matching assignment and level-set evolution. By minimizing the level-set energy function, the mask map of each instance can be iteratively optimized within its bounding box annotation. The experimental results on five challenging testbeds, covering general scenes, remote sensing, medical and scene text images, demonstrate the outstanding performance of our proposed Box2Mask approach for box-supervised instance segmentation. In particular, with the Swin-Transformer large backbone, our Box2Mask obtains 42.4% mask AP on COCO, which is on par with the recently developed fully mask-supervised methods.

Manifold adversarial training for supervised and semi-supervised learning.

Zhang, Shufei; Huang, Kaizhu; Zhu, Jianke; Liu, Yang.

Neural Netw ; 140: 282-293, 2021 Aug.

Article in English | MEDLINE | ID: mdl-33839600

ABSTRACT

We propose a new regularization method for deep learning based on the manifold adversarial training (MAT). Unlike previous regularization and adversarial training methods, MAT further considers the local manifold of latent representations. Specifically, MAT manages to build an adversarial framework based on how the worst perturbation could affect the statistical manifold in the latent space rather than the output space. Particularly, a latent feature space with the Gaussian Mixture Model (GMM) is first derived in a deep neural network. We then define the smoothness by the largest variation of Gaussian mixtures when a local perturbation is given around the input data point. On one hand, the perturbations are added in the way that would rough the statistical manifold of the latent space the worst. On the other hand, the model is trained to promote the manifold smoothness the most in the latent space. Importantly, since the latent space is more informative than the output space, the proposed MAT can learn a more robust and compact data representation, leading to further performance improvement. The proposed MAT is important in that it can be considered as a superset of one recently-proposed discriminative feature learning approach called center loss. We conduct a series of experiments in both supervised and semi-supervised learning on four benchmark data sets, showing that the proposed MAT can achieve remarkable performance, much better than those of the state-of-the-art approaches. In addition, we present a series of visualization which could generate further understanding or explanation on adversarial examples.

Subject(s)

Supervised Machine Learning/standards , Benchmarking

Dynamic Saliency-Aware Regularization for Correlation Filter-Based Object Tracking.

Feng, Wei; Han, Ruize; Guo, Qing; Zhu, Jianke; Wang, Song.

IEEE Trans Image Process ; 28(7): 3232-3245, 2019 Jul.

Article in English | MEDLINE | ID: mdl-30703022

ABSTRACT

With a good balance between tracking accuracy and speed, correlation filter (CF) has become one of the best object tracking frameworks, based on which many successful trackers have been developed. Recently, spatially regularized CF tracking (SRDCF) has been developed to remedy the annoying boundary effects of CF tracking, thus further boosting the tracking performance. However, SRDCF uses a fixed spatial regularization map constructed from a loose bounding box and its performance inevitably degrades when the target or background show significant variations, such as object deformation or occlusion. To address this problem, we propose a new dynamic saliency-aware regularized CF tracking (DSAR-CF) scheme. In DSAR-CF, a simple yet effective energy function, which reflects the object saliency and tracking reliability in the spatial-temporal domain, is defined to guide the online updating of the regularization weight map using an efficient level-set algorithm. Extensive experiments validate that the proposed DSAR-CF leads to better performance in terms of accuracy and speed than the original SRDCF.

Treelets Binary Feature Retrieval for Fast Keypoint Recognition.

Zhu, Jianke; Wu, Chenxia; Chen, Chun; Cai, Deng.

IEEE Trans Cybern ; 45(10): 2129-41, 2015 Oct.

Article in English | MEDLINE | ID: mdl-25398187

ABSTRACT

Fast keypoint recognition is essential to many vision tasks. In contrast to the classification-based approaches, we directly formulate the keypoint recognition as an image patch retrieval problem, which enjoys the merit of finding the matched keypoint and its pose simultaneously. To effectively extract the binary features from each patch surrounding the keypoint, we make use of treelets transform that can group the highly correlated data together and reduce the noise through the local analysis. Treelets is a multiresolution analysis tool, which provides an orthogonal basis to reflect the geometry of the noise-free data. To facilitate the real-world applications, we have proposed two novel approaches. One is the convolutional treelets that capture the image patch information locally and globally while reducing the computational cost. The other is the higher-order treelets that reflect the relationship between the rows and columns within image patch. An efficient sub-signature-based locality sensitive hashing scheme is employed for fast approximate nearest neighbor search in patch retrieval. Experimental evaluations on both synthetic data and the real-world Oxford dataset have shown that our proposed treelets binary feature retrieval methods outperform the state-of-the-art feature descriptors and classification-based approaches.

Retrieval-based face annotation by weak label regularized local coordinate coding.

Wang, Dayong; Hoi, Steven C H; He, Ying; Zhu, Jianke; Mei, Tao; Luo, Jiebo.

IEEE Trans Pattern Anal Mach Intell ; 36(3): 550-63, 2014 Mar.

Article in English | MEDLINE | ID: mdl-24457510

ABSTRACT

Auto face annotation, which aims to detect human faces from a facial image and assign them proper human names, is a fundamental research problem and beneficial to many real-world applications. In this work, we address this problem by investigating a retrieval-based annotation scheme of mining massive web facial images that are freely available over the Internet. In particular, given a facial image, we first retrieve the top $(n)$ similar instances from a large-scale web facial image database using content-based image retrieval techniques, and then use their labels for auto annotation. Such a scheme has two major challenges: 1) how to retrieve the similar facial images that truly match the query, and 2) how to exploit the noisy labels of the top similar facial images, which may be incorrect or incomplete due to the nature of web images. In this paper, we propose an effective Weak Label Regularized Local Coordinate Coding (WLRLCC) technique, which exploits the principle of local coordinate coding by learning sparse features, and employs the idea of graph-based weak label regularization to enhance the weak labels of the similar facial images. An efficient optimization algorithm is proposed to solve the WLRLCC problem. Moreover, an effective sparse reconstruction scheme is developed to perform the face annotation task. We conduct extensive empirical studies on several web facial image databases to evaluate the proposed WLRLCC algorithm from different aspects. The experimental results validate its efficacy. We share the two constructed databases "WDB" (714,454 images of 6,025 people) and "ADB" (126,070 images of 1,200 people) with the public. To further improve the efficiency and scalability, we also propose an offline approximation scheme (AWLRLCC) which generally maintains comparable results but significantly reduces the annotation time.

Subject(s)

Biometric Identification/methods , Databases, Factual , Face/anatomy & histology , Image Processing, Computer-Assisted/methods , Algorithms , Artificial Intelligence , Female , Humans , Internet , Male

Feature correlation hypergraph: exploiting high-order potentials for multimodal recognition.

Zhang, Luming; Gao, Yue; Hong, Chaoqun; Feng, Yinfu; Zhu, Jianke; Cai, Deng.

IEEE Trans Cybern ; 44(8): 1408-19, 2014 Aug.

Article in English | MEDLINE | ID: mdl-24184790

ABSTRACT

In computer vision and multimedia analysis, it is common to use multiple features (or multimodal features) to represent an object. For example, to well characterize a natural scene image, we typically extract a set of visual features to represent its color, texture, and shape. However, it is challenging to integrate multimodal features optimally. Since they are usually high-order correlated, e.g., the histogram of gradient (HOG), bag of scale invariant feature transform descriptors, and wavelets are closely related because they collaboratively reflect the image texture. Nevertheless, the existing algorithms fail to capture the high-order correlation among multimodal features. To solve this problem, we present a new multimodal feature integration framework. Particularly, we first define a new measure to capture the high-order correlation among the multimodal features, which can be deemed as a direct extension of the previous binary correlation. Therefore, we construct a feature correlation hypergraph (FCH) to model the high-order relations among multimodal features. Finally, a clustering algorithm is performed on FCH to group the original multimodal features into a set of partitions. Moreover, a multiclass boosting strategy is developed to obtain a strong classifier by combining the weak classifiers learned from each partition. The experimental results on seven popular datasets show the effectiveness of our approach.

A fast 2D shape recovery approach by fusing features and appearance.

Zhu, Jianke; Lyu, Michael R; Huang, Thomas S.

IEEE Trans Pattern Anal Mach Intell ; 31(7): 1210-24, 2009 Jul.

Article in English | MEDLINE | ID: mdl-19443920

ABSTRACT

In this paper, we present a fusion approach to solve the nonrigid shape recovery problem, which takes advantage of both the appearance information and the local features. We have two major contributions. First, we propose a novel progressive finite Newton optimization scheme for the feature-based nonrigid surface detection problem, which is reduced to only solving a set of linear equations. The key is to formulate the nonrigid surface detection as an unconstrained quadratic optimization problem that has a closed-form solution for a given set of observations. Second, we propose a deformable Lucas-Kanade algorithm that triangulates the template image into small patches and constrains the deformation through the second-order derivatives of the mesh vertices. We formulate it into a sparse regularized least squares problem, which is able to reduce the computational cost and the memory requirement. The inverse compositional algorithm is applied to efficiently solve the optimization problem. We have conducted extensive experiments for performance evaluation on various environments, whose promising results show that the proposed algorithm is both efficient and effective.

Subject(s)

Algorithms , Artificial Intelligence , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Information Storage and Retrieval/methods , Pattern Recognition, Automated/methods , Subtraction Technique , Models, Biological , Reproducibility of Results , Sensitivity and Specificity

A novel kernel-based maximum a posteriori classification method.

Xu, Zenglin; Huang, Kaizhu; Zhu, Jianke; King, Irwin; Lyu, Michael R.

Neural Netw ; 22(7): 977-87, 2009 Sep.

Article in English | MEDLINE | ID: mdl-19167865

ABSTRACT

Kernel methods have been widely used in pattern recognition. Many kernel classifiers such as Support Vector Machines (SVM) assume that data can be separated by a hyperplane in the kernel-induced feature space. These methods do not consider the data distribution and are difficult to output the probabilities or confidences for classification. This paper proposes a novel Kernel-based Maximum A Posteriori (KMAP) classification method, which makes a Gaussian distribution assumption instead of a linear separable assumption in the feature space. Robust methods are further proposed to estimate the probability densities, and the kernel trick is utilized to calculate our model. The model is theoretically and empirically important in the sense that: (1) it presents a more generalized classification model than other kernel-based algorithms, e.g., Kernel Fisher Discriminant Analysis (KFDA); (2) it can output probability or confidence for classification, therefore providing potential for reasoning under uncertainty; and (3) multi-way classification is as straightforward as binary classification in this model, because only probability calculation is involved and no one-against-one or one-against-others voting is needed. Moreover, we conduct an extensive experimental comparison with state-of-the-art classification methods, such as SVM and KFDA, on both eight UCI benchmark data sets and three face data sets. The results demonstrate that KMAP achieves very promising performance against other models.

Subject(s)

Algorithms , Artificial Intelligence , Information Storage and Retrieval , Pattern Recognition, Automated , Biometry , Discriminant Analysis , Face , Humans , Image Interpretation, Computer-Assisted , Nonlinear Dynamics , Normal Distribution

Robust regularized kernel regression.

Zhu, Jianke; Hoi, Steven C H; Lyu, Michael Rung-Tsong.

IEEE Trans Syst Man Cybern B Cybern ; 38(6): 1639-44, 2008 Dec.

Article in English | MEDLINE | ID: mdl-19022733

ABSTRACT

Robust regression techniques are critical to fitting data with noise in real-world applications. Most previous work of robust kernel regression is usually formulated into a dual form, which is then solved by some quadratic program solver consequently. In this correspondence, we propose a new formulation for robust regularized kernel regression under the theoretical framework of regularization networks and then tackle the optimization problem directly in the primal. We show that the primal and dual approaches are equivalent to achieving similar regression performance, but the primal formulation is more efficient and easier to be implemented than the dual one. Different from previous work, our approach also optimizes the bias term. In addition, we show that the proposed solution can be easily extended to other noise-reliable loss function, including the Huber- epsilon insensitive loss function. Finally, we conduct a set of experiments on both artificial and real data sets, in which promising results show that the proposed method is effective and more efficient than traditional approaches.

Subject(s)

Algorithms , Artificial Intelligence , Data Interpretation, Statistical , Models, Statistical , Pattern Recognition, Automated/methods , Regression Analysis , Computer Simulation

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL