1.
Article in English | MEDLINE | ID: mdl-37824319

ABSTRACT

Redundancy in convolutional neural networks (CNNs) makes it possible to remove some filters/channels with an acceptable performance drop. However, CNN training usually minimizes an accuracy-related loss with no attention paid to redundancy, so the redundancy is distributed randomly across all filters; removing any of them can then cause information loss and an accuracy drop, necessitating a fine-tuning step for recovery. In this article, we propose to manipulate the redundancy during training to facilitate network pruning. To this end, we propose centripetal SGD (C-SGD), a novel optimizer that makes some filters identical, producing an ideal redundancy pattern: such filters become purely redundant because of their duplicates, so removing them does not harm the network. As shown on CIFAR and ImageNet, C-SGD outperforms existing methods because the redundancy is better organized. C-SGD is also efficient: it is as fast as regular SGD, requires no fine-tuning, and can be applied simultaneously to all layers even in very deep CNNs. In addition, C-SGD can improve CNN accuracy by first training a model with the same architecture but wider layers and then squeezing it into the original width.
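As an illustration, here is a minimal PyTorch sketch of the centripetal update, assuming convolutional filters are flattened into rows of a 2-D weight tensor and that a clustering of filter indices is already given; the pull strength `eps` is an assumed hyperparameter, not a value from the article.

```python
import torch

def csgd_step(weights, grads, clusters, lr=0.01, eps=3e-4):
    """One centripetal SGD step (sketch): filters in the same cluster
    follow their averaged loss gradient and are additionally pulled
    toward the cluster mean, so they gradually become identical and
    hence removable without information loss."""
    with torch.no_grad():
        for idx in clusters:                   # idx: LongTensor of filter rows
            g_mean = grads[idx].mean(dim=0)    # shared gradient in the cluster
            w_mean = weights[idx].mean(dim=0)  # cluster centre
            weights[idx] -= lr * g_mean + eps * (weights[idx] - w_mean)
```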

2.
Lancet Digit Health ; 4(8): e584-e593, 2022 08.
Article in English | MEDLINE | ID: mdl-35725824

ABSTRACT

BACKGROUND: A large training dataset with high-quality annotations is necessary for building an accurate and generalisable deep learning system, which can be difficult and expensive to prepare in medical applications. We present a novel deep-learning-based system that requires no expert annotator, only weak annotations derived from diagnosis reports, yet delivers accurate and generalisable performance in detecting multiple head disorders from CT scans, including ischaemia, haemorrhage, tumours, and skull fractures. METHODS: Our system was developed on 104 597 head CT scans from the Chinese PLA General Hospital, with associated textual diagnosis reports. Without expert annotation, we used keyword matching on the reports to automatically generate disorder labels for each scan. The labels were inaccurate because of the unreliable annotator-free strategy and inexact because of scan-level annotation. We proposed RoLo, a novel weakly supervised learning algorithm with a noise-tolerant mechanism and a multi-instance learning strategy, to address these issues. RoLo was tested on retrospective (2357 scans from the Chinese PLA General Hospital), prospective (650 scans from the Chinese PLA General Hospital), cross-centre (1525 scans from the Brain Hospital of Hunan Province), cross-equipment (1484 scans from the Chinese PLA General Hospital), and cross-nation (CQ500 public dataset from India) test datasets. Four radiologists were tested on the prospective test dataset before and after viewing system recommendations to assess whether the system could improve diagnostic performance. FINDINGS: The area under the receiver operating characteristic curve for detecting the four disorder types was 0·976 (95% CI 0·976-0·976) for the retrospective, 0·975 (0·974-0·976) for the prospective, 0·965 (0·964-0·966) for the cross-centre, and 0·971 (0·971-0·972) for the cross-equipment test datasets, and 0·964 (0·964-0·966) for CQ500 (haemorrhage and fracture only). The system achieved performance similar to that of the four radiologists and helped to improve their sensitivity and specificity by 0·109 (95% CI 0·086-0·131) and 0·022 (0·017-0·026), respectively. INTERPRETATION: Without expert-annotated data, our system achieved accurate and generalisable performance for head disorder detection and improved the diagnostic performance of radiologists. Because of its accuracy and generalisability, our computer-aided diagnostic system could be used in clinical practice to improve the accuracy and efficiency of radiologists in different hospitals. FUNDING: National Key R&D Program of China, National Natural Science Foundation of China, and Beijing Natural Science Foundation.
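The annotator-free labelling step can be pictured with a small sketch; the keyword lists below are hypothetical, since the abstract does not give the actual terms, and real reports would also need negation handling.

```python
DISORDER_KEYWORDS = {
    "ischaemia":   ["ischaemia", "ischemia", "infarct"],
    "haemorrhage": ["haemorrhage", "hemorrhage", "bleed"],
    "tumour":      ["tumour", "tumor", "mass"],
    "fracture":    ["skull fracture", "fracture"],
}

def weak_labels(report_text):
    """Scan-level labels from keyword matching on a diagnosis report.
    Such labels are inherently inaccurate (no negation handling here)
    and inexact (whole-scan rather than slice-level), which is what
    RoLo's noise-tolerant, multi-instance design compensates for."""
    text = report_text.lower()
    return {d: int(any(k in text for k in kws))
            for d, kws in DISORDER_KEYWORDS.items()}
```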


Subject(s)
Deep Learning , Algorithms , Polyesters , Prospective Studies , Retrospective Studies
3.
Eur Radiol ; 32(4): 2235-2245, 2022 Apr.
Article in English | MEDLINE | ID: mdl-34988656

ABSTRACT

BACKGROUND: Major challenges for COVID-19 include the lack of a rapid diagnostic test, a suitable tool to monitor and predict a patient's clinical course, and an efficient way to share data among multiple centers. We thus developed a novel artificial intelligence system based on deep learning (DL) and federated learning (FL) for the diagnosis, monitoring, and prediction of a patient's clinical course. METHODS: CT imaging derived from 6 different multicenter cohorts was used for a stepwise diagnostic algorithm to diagnose COVID-19, with or without clinical data. Patients with more than 3 consecutive CT images were used to train the monitoring algorithm. FL was applied for decentralized refinement of independently built DL models. RESULTS: A total of 1,552,988 CT slices from 4804 patients were used. The model can diagnose COVID-19 from CT alone with an AUC of 0.98 (95% CI 0.97-0.99), outperforming the radiologists' assessment. We also successfully tested the incorporation of the DL diagnostic model into the FL framework. Its auto-segmentation analyses correlated well with those by radiologists, achieving a high Dice coefficient of 0.77. It can produce a predictive curve of a patient's clinical course if serial CT assessments are available. INTERPRETATION: The system diagnoses COVID-19 from CT with high consistency, with or without clinical data. Alternatively, it can be implemented on an FL platform, which could encourage data sharing in the future. It can also produce an objective predictive curve of a patient's clinical course for visualization. KEY POINTS: • CoviDet could diagnose COVID-19 based on chest CT with high consistency, outperforming the radiologists' assessment. Its auto-segmentation analyses correlated well with those by radiologists and could potentially monitor and predict a patient's clinical course if serial CT assessments are available. It can be integrated into the federated learning framework. • CoviDet can be used as an adjunct to aid clinicians with the CT diagnosis of COVID-19 and can potentially be used for disease monitoring; federated learning can potentially open opportunities for global collaboration.
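The decentralized refinement can be sketched with a generic federated-averaging step; the abstract does not specify the exact FL protocol, so size-weighted averaging of client model states is an assumption.

```python
import torch

def federated_average(client_states, client_sizes):
    """FedAvg-style aggregation sketch: each center trains locally on
    its own CT data and shares only model weights; the server averages
    them weighted by local dataset size."""
    total = float(sum(client_sizes))
    avg = {k: torch.zeros_like(v, dtype=torch.float32)
           for k, v in client_states[0].items()}
    for state, n in zip(client_states, client_sizes):
        for k, v in state.items():
            avg[k] += v.float() * (n / total)
    return avg
```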


Subject(s)
Artificial Intelligence , COVID-19 , Algorithms , Humans , Radiologists , Tomography, X-Ray Computed/methods
4.
IEEE Trans Cybern ; 52(3): 1798-1811, 2022 Mar.
Article in English | MEDLINE | ID: mdl-32525805

ABSTRACT

Typical image aesthetics assessment (IAA) models the generic aesthetics perceived by an "average" user. However, such generic models neglect the fact that aesthetic preferences vary significantly from person to person, so it is essential to address personalized IAA (PIAA). Since PIAA is a typical small sample learning (SSL) problem, existing PIAA models are usually built by fine-tuning well-established generic IAA (GIAA) models, which serve as prior knowledge. Nevertheless, prior knowledge based on "average aesthetics" fails to capture the aesthetic diversity of different people. In order to learn the prior knowledge shared when different people judge aesthetics (that is, to learn how people judge image aesthetics), we propose a PIAA method based on meta-learning with bilevel gradient optimization (BLG-PIAA), which is trained directly on individual aesthetic data and generalizes quickly to unknown users. The proposed approach consists of two phases: 1) meta-training and 2) meta-testing. In meta-training, the aesthetics assessment of each user is regarded as a task, and the training set of each task is divided into a support set and a query set. Unlike traditional methods that train a GIAA model based on average aesthetics, we train an aesthetic meta-learner by bilevel gradient updating from the support set to the query set across many users' PIAA tasks. In meta-testing, the meta-learner is fine-tuned on a small amount of aesthetic data from a target user to obtain the PIAA model. The experimental results show that the proposed method outperforms state-of-the-art PIAA methods, and the learned prior model of BLG-PIAA can be quickly adapted to unseen PIAA tasks.
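The bilevel gradient update can be sketched with a linear aesthetic scorer; the model, squared-error loss, and learning rates below are illustrative assumptions, not the paper's configuration.

```python
import torch

def blg_meta_step(w, tasks, inner_lr=0.01, outer_lr=0.001):
    """One bilevel (MAML-style) meta-update: adapt to each user's
    support set in the inner loop, evaluate the adapted weights on the
    query set, and update the shared prior in the outer loop."""
    meta_grad = torch.zeros_like(w)
    for (xs, ys), (xq, yq) in tasks:                # one task per user
        w0 = w.detach().clone().requires_grad_(True)
        inner = ((xs @ w0 - ys) ** 2).mean()        # support-set loss
        g, = torch.autograd.grad(inner, w0, create_graph=True)
        w_fast = w0 - inner_lr * g                  # inner (user-level) step
        outer = ((xq @ w_fast - yq) ** 2).mean()    # query-set loss
        meta_grad += torch.autograd.grad(outer, w0)[0]
    return w - outer_lr * meta_grad / len(tasks)    # outer (meta) step
```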


Subject(s)
Artificial Intelligence , Esthetics , Esthetics/psychology , Humans , Photography
5.
IEEE Trans Cybern ; 52(10): 10000-10013, 2022 Oct.
Article in English | MEDLINE | ID: mdl-33760749

ABSTRACT

Thanks to large-scale labeled training data, deep neural networks (DNNs) have achieved remarkable success in many vision and multimedia tasks. However, because of domain shift, the knowledge learned by well-trained DNNs does not generalize well to new domains or datasets with few labels. Unsupervised domain adaptation (UDA) studies the problem of transferring models trained on one labeled source domain to another unlabeled target domain. In this article, we focus on UDA in visual emotion analysis for both emotion distribution learning and dominant emotion classification. Specifically, we design a novel end-to-end cycle-consistent adversarial model, called CycleEmotionGAN++. First, we generate an adapted domain to align the source and target domains at the pixel level by improving CycleGAN with a multiscale structured cycle-consistency loss. During image translation, we propose a dynamic emotional semantic consistency loss to preserve the emotion labels of the source images. Second, we train a transferable task classifier on the adapted domain with feature-level alignment between the adapted and target domains. We conduct extensive UDA experiments on the Flickr-LDL and Twitter-LDL datasets for distribution learning, and on the ArtPhoto, Flickr, and Instagram datasets for emotion classification. The results demonstrate the significant improvements of the proposed CycleEmotionGAN++ over state-of-the-art UDA approaches.
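Two of the named loss terms can be sketched as follows; G_st, G_ts, and C_src are assumed generator and source-classifier modules, the adversarial and multiscale terms are omitted, and the loss weights are made up.

```python
import torch.nn.functional as F

def cycle_and_semantic_loss(G_st, G_ts, C_src, x_src, y_src,
                            lam_cyc=10.0, lam_sem=1.0):
    """Sketch of two CycleEmotionGAN++ ingredients: a cycle-consistency
    term (source -> adapted -> source) and an emotional semantic
    consistency term that keeps the adapted image's emotion label."""
    x_adapt = G_st(x_src)                         # source -> adapted domain
    cyc = F.l1_loss(G_ts(x_adapt), x_src)         # cycle consistency
    sem = F.cross_entropy(C_src(x_adapt), y_src)  # preserve emotion label
    return lam_cyc * cyc + lam_sem * sem
```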


Subject(s)
Neural Networks, Computer , Semantics , Emotions , Humans
6.
IEEE Trans Pattern Anal Mach Intell ; 44(10): 6729-6751, 2022 10.
Article in English | MEDLINE | ID: mdl-34214034

ABSTRACT

Images can convey rich semantics and induce various emotions in viewers. Recently, with the rapid advancement of emotional intelligence and the explosive growth of visual data, extensive research efforts have been dedicated to affective image content analysis (AICA). In this survey, we comprehensively review the development of AICA over the past two decades, focusing on state-of-the-art methods with respect to three main challenges: the affective gap, perception subjectivity, and label noise and absence. We begin with an introduction to the key emotion representation models widely employed in AICA and a description of available datasets for evaluation, with a quantitative comparison of label noise and dataset bias. We then summarize and compare the representative approaches to (1) emotion feature extraction, including both handcrafted and deep features; (2) learning methods for dominant emotion recognition, personalized emotion prediction, emotion distribution learning, and learning from noisy data or few labels; and (3) AICA-based applications. Finally, we discuss some challenges and promising research directions for the future, such as image content and context understanding, group emotion clustering, and viewer-image interaction.


Subject(s)
Algorithms , Emotions , Image Processing, Computer-Assisted
7.
IEEE Trans Image Process ; 30: 9179-9192, 2021.
Article in English | MEDLINE | ID: mdl-34739374

ABSTRACT

RGB-D saliency detection has received increasing attention in recent years. Many efforts have been devoted to this area, most of which try to integrate the multi-modal information, i.e., RGB images and depth maps, via various fusion strategies. However, some of them ignore the inherent difference between the two modalities, which leads to performance degradation in some challenging scenes. Therefore, in this paper, we propose a novel RGB-D saliency model, namely Dynamic Selective Network (DSNet), to perform salient object detection (SOD) in RGB-D images by taking full advantage of the complementarity between the two modalities. Specifically, we first deploy a cross-modal global context module (CGCM) to acquire high-level semantic information, which can be used to roughly locate salient objects. Then, we design a dynamic selective module (DSM) to dynamically mine the cross-modal complementary information between RGB images and depth maps, and to further optimize the multi-level and multi-scale information by executing gated and pooling based selection, respectively. Moreover, we conduct boundary refinement to obtain high-quality saliency maps with clear boundary details. Extensive experiments on eight public RGB-D datasets show that the proposed DSNet achieves competitive or better performance against 17 current state-of-the-art RGB-D SOD models.
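The gated selection can be pictured with a minimal module; the 1x1-convolution gate below is an assumed simplification of the DSM, not its actual architecture.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Minimal gated cross-modal selection in the spirit of DSNet's
    dynamic selective module: a learned gate decides, per position,
    how much RGB versus depth evidence to keep."""
    def __init__(self, ch):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(2 * ch, ch, 1), nn.Sigmoid())

    def forward(self, f_rgb, f_depth):
        g = self.gate(torch.cat([f_rgb, f_depth], dim=1))
        return g * f_rgb + (1 - g) * f_depth   # dynamic per-pixel selection
```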


Subject(s)
Algorithms , Semantics
8.
IEEE Trans Image Process ; 30: 7554-7566, 2021.
Article in English | MEDLINE | ID: mdl-34449360

ABSTRACT

Despite the great success achieved by prevailing binary local descriptors, they still suffer from two problems: 1) vulnerability to geometric transformations and 2) lack of an effective treatment of the highly correlated bits generated by directly applying image-hashing schemes. To tackle both limitations, we propose an unsupervised Transformation-invariant Binary Local Descriptor learning method (TBLD). Specifically, transformation invariance is ensured by projecting the original patches and their transformed counterparts into an identical high-dimensional feature space and an identical low-dimensional descriptor space simultaneously, while enforcing dissimilar image patches to have distinctive binary local descriptors. Moreover, to reduce correlations between bits, we propose a bottom-up learning strategy, termed the Adversarial Constraint Module, where low-coupling binary codes are introduced externally to guide the learning of binary local descriptors. With the aid of the Wasserstein loss, the framework is optimized to encourage the distribution of the generated binary local descriptors to mimic that of the introduced low-coupling binary codes, eventually making the former low-coupling as well. Experimental results on three benchmark datasets demonstrate the superiority of the proposed method over state-of-the-art methods. The project page is available at https://github.com/yoqim/TBLD.
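A sketch of the adversarial constraint under the Wasserstein loss follows; the critic network is assumed, and the weight clipping or gradient penalty needed for a true WGAN critic is omitted for brevity.

```python
def wgan_losses(critic, low_coupling_codes, descriptors):
    """Wasserstein-loss sketch of the Adversarial Constraint Module:
    the critic separates externally introduced low-coupling codes
    ("real") from learned descriptors ("fake"); minimizing g_loss pulls
    the descriptor distribution toward the low-coupling one."""
    d_loss = critic(descriptors).mean() - critic(low_coupling_codes).mean()
    g_loss = -critic(descriptors).mean()   # pushes descriptors toward "real"
    return d_loss, g_loss
```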

9.
IEEE Trans Image Process ; 30: 293-304, 2021.
Article in English | MEDLINE | ID: mdl-33186105

ABSTRACT

While convolutional neural networks (CNNs) have achieved overwhelming success in various vision tasks, their heavy computational cost and storage overhead limit practical use on mobile or embedded devices. Recently, compressing CNN models has attracted considerable attention, where pruning CNN filters, also known as channel pruning, has gained great research popularity due to its high compression rate. In this paper, a new channel pruning framework is proposed that can significantly reduce computational complexity while maintaining sufficient model accuracy. Unlike most existing approaches that seek to-be-pruned filters layer by layer, we argue that choosing appropriate layers for pruning is more crucial, as it can yield more complexity reduction with less performance drop. To this end, we utilize a long short-term memory (LSTM) network to learn the hierarchical characteristics of a network and generate a global pruning scheme. On top of it, we propose a data-dependent soft pruning method, dubbed Squeeze-Excitation-Pruning (SEP), which does not physically prune any filters but selectively excludes some kernels from the forward and backward propagations according to the pruning scheme. Compared with hard pruning, our soft pruning better retains the capacity and knowledge of the baseline model. Experimental results demonstrate that our approach achieves comparable accuracy even when reducing floating-point operations (FLOPs) by 70.1% for VGG and 47.5% for ResNet-56.
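Soft pruning can be sketched as channel masking; the mask selection itself (from the LSTM-generated scheme) is assumed given.

```python
import torch
import torch.nn as nn

class SoftPrunedConv(nn.Module):
    """Soft (SEP-style) channel masking sketch: channels flagged by the
    pruning scheme are excluded from computation via a zero mask rather
    than physically removed, so capacity can be restored later."""
    def __init__(self, conv, keep_mask):
        super().__init__()
        self.conv = conv
        # buffer so the mask follows .to(device) and is saved with the model
        self.register_buffer("mask", keep_mask.float().view(1, -1, 1, 1))

    def forward(self, x):
        return self.conv(x) * self.mask   # masked channels contribute zero
```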

10.
Article in English | MEDLINE | ID: mdl-32976101

ABSTRACT

Despite the thrilling success achieved by existing binary descriptors, most still suffer from three limitations: 1) vulnerability to geometric transformations; 2) inability to preserve the manifold structure when learning binary codes; and 3) no guarantee of finding the true match when multiple candidates share the same Hamming distance to a given query. Together these make binary descriptors less effective for large-scale visual recognition tasks. In this paper, we propose a novel learning-based feature descriptor, namely the Unsupervised Deep Binary Descriptor (UDBD), which learns transformation-invariant binary descriptors by projecting the original data and their transformed sets into a joint binary space. Moreover, we include an ℓ2,1-norm loss term in the binary embedding process to gain robustness against data noise and reduce the probability of mistakenly flipping bits of the binary descriptor; on top of it, a graph constraint is used to preserve the original manifold structure in the binary space. Furthermore, a weak bit mechanism is adopted to find the real match among candidates sharing the same minimum Hamming distance, thus enhancing matching performance. Extensive experimental results on public datasets show the superiority of UDBD in terms of matching and retrieval accuracy over the state of the art.
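The weak bit mechanism can be sketched as magnitude-weighted tie-breaking; treating the query's pre-binarization magnitudes as bit reliabilities is an assumption about the general idea, not the paper's exact rule.

```python
import numpy as np

def weak_bit_rerank(query_bits, query_mag, cand_bits):
    """Among candidates tied at the minimum Hamming distance, prefer the
    one whose mismatches fall on the query's weak (low-magnitude, hence
    unreliable) bits rather than on its confident bits."""
    mismatch = (cand_bits != query_bits).astype(float)  # (n_cand, n_bits)
    ham = mismatch.sum(axis=1)
    tied = np.flatnonzero(ham == ham.min())             # exact Hamming ties
    weighted = (mismatch * query_mag).sum(axis=1)       # penalize strong bits
    return tied[np.argmin(weighted[tied])]
```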

11.
Article in English | MEDLINE | ID: mdl-31995495

ABSTRACT

Traditional image aesthetics assessment (IAA) approaches mainly predict the average aesthetic score of an image. However, people tend to have different tastes in image aesthetics, largely determined by their subjective preferences. As an important subjective trait, personality is believed to be a key factor in modeling an individual's subjective preference. In this paper, we present a personality-assisted multi-task deep learning framework for both generic and personalized image aesthetics assessment. The proposed framework comprises two stages. In the first stage, a multi-task learning network with shared weights predicts both the aesthetics distribution of an image and the Big-Five (BF) personality traits of people who like the image. The generic aesthetics score of the image can be generated from the predicted aesthetics distribution. To capture a common representation of generic image aesthetics and personality traits, a Siamese network is trained jointly on aesthetics data and personality data. In the second stage, based on the predicted personality traits and generic aesthetics of an image, an inter-task fusion is introduced to generate an individual's personalized aesthetic score for the image. The performance of the proposed method is evaluated on two public image aesthetics databases. The experimental results demonstrate that the proposed method outperforms the state of the art in both generic and personalized IAA tasks.
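The inter-task fusion can be pictured as a linear correction of the generic score; W and b are hypothetical fusion parameters and the additive form is an assumption.

```python
import numpy as np

def personalized_score(aes_dist, bf_traits, W, b):
    """Inter-task fusion sketch: the generic score is the expectation of
    the predicted aesthetics distribution; a linear head on the Big-Five
    traits then shifts it toward the individual's taste."""
    scores = np.arange(1, len(aes_dist) + 1)   # e.g. rating bins 1..10
    generic = float(scores @ aes_dist)         # expected (generic) rating
    return generic + float(W @ bf_traits + b)  # personality-based correction
```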

12.
IEEE Trans Image Process ; 28(8): 3752-3765, 2019 Aug.
Article in English | MEDLINE | ID: mdl-30835225

ABSTRACT

Recent years have witnessed the success of deep convolutional neural networks for image classification and many related tasks. It should be pointed out, however, that existing training strategies assume a clean dataset for model learning. On elaborately constructed benchmark datasets, deep networks have yielded promising performance under this assumption. In real-world applications, by contrast, collecting sufficient clean training samples is burdensome and expensive, whereas collecting noisily labeled samples is economical and practical, especially with the rapidly increasing amount of visual data on the web. Unfortunately, the accuracy of current deep models may drop dramatically even with 5%-10% label noise. Making classification resistant to label noise has therefore become a crucial issue in data-driven deep learning approaches. In this paper, we propose a DEep COnfiDEnce network (DECODE) to address this issue. In particular, based on the distribution of mislabeled data, we adopt a confidence evaluation module that determines the confidence that a sample is mislabeled. With this confidence, we use a weighting strategy to assign different weights to different samples, so that the model pays less attention to low-confidence data, which is more likely to be noise. In this way, the deep model is more robust to label noise. DECODE is designed to be general, so it can easily be combined with existing methods. We conduct extensive experiments on several datasets, and the results validate that DECODE can improve the accuracy of deep models trained with noisy data.
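The weighting strategy can be sketched as follows; the power-law sharpening with gamma is an assumed form, with the confidence itself coming from the (not shown) evaluation module.

```python
import torch

def confidence_weighted_loss(losses, confidences, gamma=2.0):
    """Confidence-weighted training sketch in the spirit of DECODE:
    samples judged likely to be mislabeled get small weights, so the
    model attends less to probable label noise."""
    weights = confidences.clamp(0, 1) ** gamma   # low confidence -> ~0 weight
    return (weights * losses).sum() / weights.sum().clamp_min(1e-8)
```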

13.
Article in English | MEDLINE | ID: mdl-30452370

ABSTRACT

This paper proposes a deep hashing framework, namely Unsupervised Deep Video Hashing (UDVH), for large-scale video similarity search, with the aim of learning compact yet effective binary codes. UDVH produces hash codes in a self-taught manner by jointly integrating discriminative video representation with optimal code learning, where an efficient alternating approach is adopted to optimize the objective function. The key differences from most existing video hashing methods are that 1) UDVH is an unsupervised method that generates hash codes by cooperatively utilizing feature clustering and a specifically designed binarization that preserves the original neighborhood structure in the binary space; and 2) a specific rotation is developed and applied to the video features so that the variance of each dimension is balanced, facilitating the subsequent quantization step. Extensive experiments on three popular video datasets show that UDVH clearly outperforms the state of the art in terms of various evaluation metrics, making it practical in real-world applications.
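The variance-balancing idea can be pictured with a simple sketch; a random orthogonal rotation roughly equalizes per-dimension variance, whereas the paper learns a specific rotation, so this stands in only for the intuition.

```python
import numpy as np

def balance_variance(X, seed=0):
    """Rotate features so variance spreads more evenly across dimensions
    before binarization: after PCA-like representations, variance piles
    up in a few dimensions, which hurts per-dimension thresholding."""
    d = X.shape[1]
    rng = np.random.default_rng(seed)
    R, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthogonal R
    V = X @ R
    return V, R   # compare X.var(axis=0) with V.var(axis=0) to see the effect
```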

14.
IEEE Trans Cybern ; 48(11): 3218-3231, 2018 Nov.
Article in English | MEDLINE | ID: mdl-29990033

ABSTRACT

Detecting events from massive social media data can facilitate browsing, search, and monitoring of real-time events by corporations, governments, and users. The short, conversational, heterogeneous, and real-time characteristics of social media data pose great challenges for event detection. Existing approaches rely mainly on textual information, while the visual content of microblogs and the intrinsic correlations among the heterogeneous data are scarcely explored. To deal with these challenges, we propose a novel real-time event detection method that generates an intermediate semantic level from social multimedia data, named the microblog clique (MC), which captures the high correlations among different microblogs. The proposed method comprises three stages. First, the heterogeneous data in microblogs are formulated in a hypergraph structure, and a hypergraph cut groups highly correlated microblogs on the same topics into MCs, addressing the information inadequacy and data sparseness issues. Second, a bipartite graph is constructed over the generated MCs, and transfer cut partitioning is performed to detect events. Finally, for newly arriving microblogs, an incremental hypergraph is constructed based on the latest MCs to generate new MCs, which are classified by bipartite graph partitioning into existing events or new ones. Extensive experiments on the events in the Brand-Social-Net dataset demonstrate the superiority of the proposed method over state-of-the-art approaches.
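Stage one can be pictured as building a hypergraph incidence matrix; representing each microblog as a set of shared features (hashtags, users, visual clusters) is an illustrative assumption.

```python
import numpy as np

def hypergraph_incidence(microblogs, edge_features):
    """Incidence matrix H for the microblog hypergraph: H[i, j] = 1 if
    microblog i contains feature j, so each hyperedge connects all
    microblogs sharing that feature. Spectral partitioning (hypergraph
    cut) of H then groups correlated microblogs into MCs."""
    H = np.zeros((len(microblogs), len(edge_features)))
    for i, blog in enumerate(microblogs):      # blog: set of its features
        for j, feat in enumerate(edge_features):
            H[i, j] = float(feat in blog)
    return H
```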

15.
IEEE Trans Neural Netw Learn Syst ; 29(6): 2472-2487, 2018 06.
Article in English | MEDLINE | ID: mdl-28500009

ABSTRACT

To make multilabel classification with many classes more tractable, recent years have seen efforts devoted to label space dimension reduction (LSDR). Specifically, LSDR encodes high-dimensional label vectors into low-dimensional code vectors lying in a latent space, so that predictive models can be trained at much lower cost. At prediction time, it classifies any unseen instance by recovering a label vector from its predicted code vector via a decoding process. In this paper, we propose a novel method, End-to-End Feature-aware label space Encoding (E2FE), to perform LSDR. Instead of requiring an encoding function as in most previous works, E2FE directly learns a code matrix, formed by the code vectors of the training instances, in an end-to-end manner. Another distinct property of E2FE is its feature awareness, attributable to the fact that the code matrix is learned by jointly maximizing the recoverability of the label space and the predictability of the latent space. Based on the learned code matrix, E2FE trains predictive models to map instance features into code vectors, and also learns a linear decoding matrix for efficiently recovering the label vector of any unseen instance from its predicted code vector. Theoretical analyses show that both the code matrix and the linear decoding matrix can be learned efficiently. Moreover, like previous works, E2FE can be specified to learn an encoding function, and it can be extended with kernel tricks to handle nonlinear correlations between the feature space and the latent space. Comprehensive experiments on diverse benchmark datasets with many classes show consistent performance gains of E2FE over state-of-the-art methods.
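The prediction phase can be sketched in a few lines; the 0.5 threshold and the array shapes are assumptions.

```python
import numpy as np

def lsdr_decode(pred_code, D, threshold=0.5):
    """E2FE-style decoding sketch: a predicted low-dimensional code
    vector is mapped back through the learned linear decoding matrix D,
    then thresholded into a multilabel prediction."""
    label_scores = pred_code @ D          # (k,) @ (k, L) -> (L,) label space
    return (label_scores >= threshold).astype(int)
```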

16.
IEEE Trans Image Process ; 26(7): 3277-3290, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28436875

ABSTRACT

By transferring knowledge from the abundant labeled samples of known source classes, zero-shot learning (ZSL) makes it possible to train recognition models for novel target classes that have no labeled samples. Conventional ZSL approaches usually adopt a two-step recognition strategy: the test sample is first projected into an intermediary space, and recognition is then carried out by measuring the similarity between the sample and the target classes in that space. Because of this redundant intermediate transformation, information loss is unavoidable, degrading the performance of the overall system. Rather than adopting this two-step strategy, in this paper we propose a novel one-step recognition framework that performs recognition in the original feature space using directly trained classifiers. To address the lack of labeled samples for training supervised classifiers for the target classes, we propose to transfer samples from the source classes with pseudo labels assigned, where the transferred samples are selected according to their transferability and diversity. Moreover, to account for the unreliability of the pseudo labels of transferred samples, we modify the standard support vector machine formulation so that unreliable positive samples can be recognized and suppressed during training. The entire framework is fairly general, with possible extensions to several common ZSL settings. Extensive experiments on four benchmark datasets demonstrate the superiority of the proposed framework over state-of-the-art approaches in various settings.
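The modified SVM idea can be sketched with a reliability-weighted hinge loss; in the paper the unreliable positives are identified within training itself, so the fixed per-sample weights here are a simplification.

```python
import numpy as np

def weighted_hinge_loss(w, b, X, y, reliab, C=1.0):
    """Primal SVM objective where each (pseudo-labeled) sample's hinge
    loss is scaled by a reliability weight in [0, 1], so unreliable
    transferred positives contribute little to training."""
    margins = y * (X @ w + b)              # y in {-1, +1}
    hinge = np.maximum(0.0, 1.0 - margins)
    return 0.5 * (w @ w) + C * np.sum(reliab * hinge)
```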

17.
IEEE Trans Image Process ; 26(3): 1344-1354, 2017 Mar.
Article in English | MEDLINE | ID: mdl-28092559

ABSTRACT

Sparse representation and image hashing are powerful tools for data representation and image retrieval, respectively. Combinations of the two for scalable image retrieval, i.e., sparse hashing (SH) methods, have been proposed in recent years, with promising preliminary results. The core of these methods is a scheme that efficiently embeds (high-dimensional) image features into a low-dimensional Hamming space while preserving the similarity between features. Existing SH methods mostly focus on finding better sparse representations of images in the hash space. We argue that the anchor set used in the sparse representation is also crucial, a point unfortunately underestimated by prior art. To this end, we propose a novel SH method, termed Sparse Hashing with Optimized Anchor Embedding, that optimizes the placement of the anchors so that features can be better embedded and binarized. The central idea is to push the anchors far from the axes while preserving their relative positions, so that neighboring features receive similar hash codes. We formulate this idea as an orthogonality-constrained maximization problem and systematically develop an efficient, novel optimization framework. Extensive experiments on five benchmark image datasets demonstrate that our method outperforms several state-of-the-art related methods.
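The orthogonality-constrained optimization can be pictured with a projected-ascent sketch; the surrogate objective below (pushing coordinates off zero, then retracting to the orthogonal manifold by SVD) illustrates only the constraint handling and, unlike the paper's objective, does not preserve relative anchor positions.

```python
import numpy as np

def push_anchors_from_axes(A, lr=0.05, steps=50):
    """Projected ascent on the orthogonal manifold: take a step that
    moves anchor coordinates away from the axes, then retract A to the
    nearest orthogonal matrix via SVD."""
    for _ in range(steps):
        A = A + lr * np.sign(A)                      # push entries off zero
        U, _, Vt = np.linalg.svd(A, full_matrices=False)
        A = U @ Vt                                   # nearest orthogonal matrix
    return A
```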

18.
IEEE Trans Cybern ; 47(12): 4342-4355, 2017 Dec.
Article in English | MEDLINE | ID: mdl-28113531

ABSTRACT

For efficiently retrieving nearest neighbors from large-scale multiview data, hashing methods have recently been widely investigated, as they can substantially improve query speed. In this paper, we propose an effective probability-based semantics-preserving hashing (SePH) method to tackle the problem of cross-view retrieval. Considering the semantic consistency between views, SePH generates one unified hash code for all observed views of any instance. For training, SePH first transforms the given semantic affinities of the training data into a probability distribution and aims to approximate it with another distribution in Hamming space, by minimizing their Kullback-Leibler divergence. Specifically, the latter distribution is derived from the pairwise Hamming distances between the to-be-learnt hash codes of the training data. With the learnt hash codes, any kind of predictive model, such as linear ridge regression, logistic regression, or kernel logistic regression, can then be learnt as a hash function in each view for projecting the corresponding view-specific features into hash codes. As for out-of-sample extension, given any unseen instance, the learnt hash functions in its observed views can predict view-specific hash codes. By deriving or estimating the corresponding output probabilities of the predicted view-specific hash codes, a novel probabilistic approach is further proposed to determine a unified hash code from them. Extensive experiments on diverse benchmark datasets demonstrate that SePH is reasonable and effective.
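The training objective can be sketched directly; the (1 + d)^-1 form for the Hamming-space distribution is an assumption about its shape, with relaxed (real-valued) codes standing in for the to-be-learnt hash codes.

```python
import torch

def seph_kl_loss(codes, affinity, eps=1e-8):
    """KL(P || Q) sketch in the spirit of SePH: P comes from the given
    pairwise semantic affinities, Q from pairwise distances between
    relaxed codes; minimizing the divergence aligns the two."""
    d = torch.cdist(codes, codes, p=1)            # relaxed Hamming distances
    q = 1.0 / (1.0 + d)                           # distance -> similarity
    mask = ~torch.eye(len(codes), dtype=torch.bool)
    p = affinity[mask] / affinity[mask].sum()     # normalize off-diagonals
    q = q[mask] / q[mask].sum()
    return (p * ((p + eps).log() - (q + eps).log())).sum()
```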

19.
IEEE Trans Image Process ; 26(1): 107-118, 2017 Jan.
Article in English | MEDLINE | ID: mdl-27775517

ABSTRACT

With the dramatic development of the Internet, how to exploit large-scale retrieval techniques for multimodal web data has become one of the most popular yet challenging problems in computer vision and multimedia. Recently, hashing methods have been used for fast nearest neighbor search in large-scale data spaces by embedding high-dimensional feature descriptors into a similarity-preserving Hamming space of low dimension. Inspired by this, we introduce a novel supervised cross-modality hashing framework that generates unified binary codes for instances represented in different modalities. In the learning phase, each bit of a code is sequentially learned with a discrete optimization scheme that jointly minimizes its empirical loss based on a boosting strategy. In a bitwise manner, hash functions are then learned for each modality, mapping the corresponding representations into unified hash codes. We call this approach cross-modality sequential discrete hashing (CSDH); it effectively reduces the quantization errors that arise in the oversimplified rounding-off step and thus leads to high-quality binary codes. In the test phase, a simple fusion scheme is used to generate a unified hash code for final retrieval by merging the predicted hashing results for an unseen instance from its different modalities. CSDH has been systematically evaluated on three standard datasets, Wiki, MIRFlickr, and NUS-WIDE, and the results show that it significantly outperforms state-of-the-art multimodality hashing techniques.
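The test-phase fusion can be pictured as a bitwise majority vote over per-modality predictions; the abstract only calls it a simple fusion scheme, so the exact rule here is an assumption.

```python
import numpy as np

def fuse_codes(codes):
    """Merge per-modality predicted hash codes for one unseen instance
    into a unified code by bitwise majority vote; `codes` holds values
    in {-1, +1} with shape (n_modalities, n_bits), ties going to +1."""
    votes = np.sum(codes, axis=0)
    return np.where(votes >= 0, 1, -1)
```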
