1.
IEEE Trans Vis Comput Graph ; 29(2): 1330-1344, 2023 Feb.
Article in English | MEDLINE | ID: mdl-34529567

ABSTRACT

Grid collages (GClg) of small image collections are popular and useful in many applications, such as personal album management, online photo posting, and graphic design. In this article, we focus on how visual effects influence individual preferences through various arrangements of multiple images in such scenarios. We propose a novel balance-aware metric to bridge the gap between multi-image joint presentation and visual pleasure, incorporating findings from psychology into the field of grid collage. To capture user preference, the metric integrates a bonus mechanism tied to a user-specified special location in the grid and the uniqueness values of the subimages. An end-to-end reinforcement learning mechanism trains the model without tedious manual annotation. Experiments demonstrate that our metric evaluates GClg visual balance in line with human subjective perception and that the model generates visually pleasant GClg results comparable to manual designs.
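The notion of visual balance above can be illustrated with a toy metric (not the paper's learned balance-aware metric): score a grid layout by how close the centroid of the cells' visual weights sits to the layout center. The cell coordinates and weights below are illustrative.

```python
import numpy as np

def balance_score(weights, centers):
    """Toy balance metric: distance of the visual-weight centroid
    from the layout center, mapped to (0, 1] (1 = perfectly balanced)."""
    centroid = (weights[:, None] * centers).sum(axis=0) / weights.sum()
    layout_center = centers.mean(axis=0)
    return 1.0 / (1.0 + np.linalg.norm(centroid - layout_center))

# A 2x2 grid: cell centers in unit coordinates, one visual weight per image.
centers = np.array([[0.25, 0.25], [0.75, 0.25], [0.25, 0.75], [0.75, 0.75]])
even = balance_score(np.array([1.0, 1.0, 1.0, 1.0]), centers)
skewed = balance_score(np.array([4.0, 1.0, 1.0, 1.0]), centers)
print(even, skewed)  # equal weights score higher (better balanced)
```

Equal weights place the centroid exactly at the layout center, so the even arrangement scores 1.0 while the skewed one scores lower.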

2.
IEEE Trans Image Process ; 31: 3386-3398, 2022.
Article in English | MEDLINE | ID: mdl-35471883

ABSTRACT

Despite its exciting performance, the Transformer is criticized for its excessive parameter count and computation cost. However, compressing Transformers remains an open problem due to the internal complexity of their layer designs, i.e., Multi-Head Attention (MHA) and the Feed-Forward Network (FFN). To address this issue, we introduce Group-wise Transformation towards a universal yet lightweight Transformer for vision-and-language tasks, termed LW-Transformer. LW-Transformer applies Group-wise Transformation to reduce both the parameters and computations of the Transformer while preserving its two main properties, i.e., the efficient attention modeling on diverse subspaces of MHA and the expanding-scaling feature transformation of FFN. We apply LW-Transformer to a set of Transformer-based networks and quantitatively evaluate them on three vision-and-language tasks and six benchmark datasets. Experimental results show that, while saving a large number of parameters and computations, LW-Transformer achieves very competitive performance against the original Transformer networks on vision-and-language tasks. To examine its generalization ability, we also apply LW-Transformer to image classification, building its network on a recently proposed image Transformer called Swin-Transformer, where its effectiveness is likewise confirmed.
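The core idea of a group-wise transformation, replacing one large projection with several small per-group projections, can be sketched as follows. The dimensions and group count here are illustrative, not the paper's configuration:

```python
import numpy as np

def grouped_linear(x, weights):
    """Split features into groups and apply one small matrix per group."""
    groups = np.split(x, len(weights), axis=-1)
    return np.concatenate([g @ w for g, w in zip(groups, weights)], axis=-1)

d, g = 512, 8                        # feature dim, number of groups
rng = np.random.default_rng(0)
weights = [rng.standard_normal((d // g, d // g)) for _ in range(g)]

x = rng.standard_normal((4, d))      # a batch of 4 feature vectors
y = grouped_linear(x, weights)

dense_params = d * d                 # parameters of an ordinary linear layer
group_params = g * (d // g) ** 2     # parameters of the grouped version
print(y.shape)                       # (4, 512)
print(dense_params // group_params)  # 8x parameter reduction
```

With g groups, both parameter and multiply count drop by a factor of g, which is the source of the savings, at the cost of no mixing across groups within the layer.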

3.
IEEE Trans Pattern Anal Mach Intell ; 44(5): 2453-2467, 2022 May.
Article in English | MEDLINE | ID: mdl-33270558

ABSTRACT

Online image hashing, which processes large-scale data in a streaming fashion to update hash functions on the fly, has received increasing research attention recently. Most existing works exploit this problem under a supervised setting, i.e., using class labels to boost hashing performance, and suffer from defects in both adaptivity and efficiency: First, large numbers of training batches are required to learn up-to-date hash functions, which leads to poor online adaptivity. Second, the training is time-consuming, which contradicts the core need of online learning. In this paper, a novel supervised online hashing scheme, termed Fast Class-wise Updating for Online Hashing (FCOH), is proposed to address these two challenges by introducing a novel and efficient inner-product operation. To achieve fast online adaptivity, a class-wise updating method is developed that decomposes the binary code learning and alternately renews the hash functions in a class-wise fashion, removing the burden of large numbers of training batches. Quantitatively, this decomposition further yields at least 75 percent storage savings. To achieve online efficiency, we propose a semi-relaxation optimization that accelerates online training by treating different binary constraints independently. Without additional constraints or variables, the time complexity is significantly reduced. The scheme is also quantitatively shown to preserve past information well while updating the hash functions. Extensive experiments on three widely used datasets demonstrate that the collective effort of class-wise updating and semi-relaxation optimization delivers superior performance compared with various state-of-the-art methods.
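The inner-product operation that makes binary hashing efficient can be sketched generically (this illustrates standard Hamming ranking, not the FCOH update itself): with ±1 codes, the Hamming distance is a simple function of the inner product, so retrieval reduces to a matrix multiply.

```python
import numpy as np

def hash_codes(X, W):
    """Binary codes: sign of linear projections, encoded as +1/-1."""
    return np.where(X @ W >= 0, 1, -1)

rng = np.random.default_rng(1)
W = rng.standard_normal((16, 8))          # 16-dim features -> 8-bit codes
database = rng.standard_normal((100, 16))
query = rng.standard_normal((1, 16))

B = hash_codes(database, W)
q = hash_codes(query, W)

# With +/-1 codes, Hamming distance = (bits - inner product) / 2,
# so ranking the database needs only one matrix-vector product.
hamming = (B.shape[1] - (B @ q.T).ravel()) // 2
nearest = int(np.argmin(hamming))
print(nearest, hamming[nearest])
```

The projection matrix W here is random for illustration; an online scheme like FCOH would keep updating it as new labeled data streams in.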

4.
IEEE Trans Neural Netw Learn Syst ; 33(12): 7357-7366, 2022 12.
Article in English | MEDLINE | ID: mdl-34101606

ABSTRACT

Popular network pruning algorithms reduce redundant information by optimizing hand-crafted models, which may cause suboptimal performance and long filter-selection times. We instead introduce adaptive exemplar filters to simplify the algorithm design, resulting in an automatic and efficient pruning approach called EPruner. Inspired by the face recognition community, we run the Affinity Propagation message-passing algorithm on the weight matrices to obtain an adaptive number of exemplars, which then act as the preserved filters. EPruner breaks the dependence on training data in determining the "important" filters and admits a CPU implementation that finishes in seconds, an order of magnitude faster than GPU-based state-of-the-art methods. Moreover, we show that the weights of the exemplars provide a better initialization for fine-tuning. On VGGNet-16, EPruner achieves a 76.34% FLOPs reduction by removing 88.80% of parameters, with a 0.06% accuracy improvement on CIFAR-10. On ResNet-152, EPruner achieves a 65.12% FLOPs reduction by removing 64.18% of parameters, with only a 0.71% top-5 accuracy loss on ILSVRC-2012. Our code is available at https://github.com/lmbxmu/EPruner.


Subject(s)
Algorithms , Neural Networks, Computer
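The exemplar idea behind EPruner can be sketched with a simplified stand-in: the paper clusters filters with Affinity Propagation, which also determines how many exemplars to keep, whereas this toy version just keeps the k filters with the highest total cosine similarity to all others (a medoid-style heuristic) and fixes k by hand.

```python
import numpy as np

def select_exemplar_filters(W, k):
    """Simplified stand-in for Affinity Propagation: keep the k filters
    whose flattened weights have the highest total cosine similarity
    to all other filters. EPruner derives k automatically instead."""
    flat = W.reshape(W.shape[0], -1)
    flat = flat / np.linalg.norm(flat, axis=1, keepdims=True)
    similarity = flat @ flat.T                 # pairwise cosine similarity
    scores = similarity.sum(axis=1)            # total affinity per filter
    return np.sort(np.argsort(scores)[-k:])    # indices of preserved filters

rng = np.random.default_rng(0)
conv_weights = rng.standard_normal((64, 3, 3, 3))  # 64 filters of shape 3x3x3
keep = select_exemplar_filters(conv_weights, k=16)
pruned = conv_weights[keep]
print(pruned.shape)  # (16, 3, 3, 3)
```

Note that only the weights are consulted, which mirrors the abstract's point that no training data is needed to decide which filters survive.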
5.
IEEE Trans Pattern Anal Mach Intell ; 43(9): 2936-2952, 2021 09.
Article in English | MEDLINE | ID: mdl-33710952

ABSTRACT

Neural architecture search (NAS) has achieved unprecedented performance in various computer vision tasks. However, most existing NAS methods fall short in search efficiency and model generalizability. In this paper, we propose a novel NAS framework, termed MIGO-NAS, which aims to guarantee efficiency and generalizability in arbitrary search spaces. On the one hand, we formulate the search space as a multivariate probabilistic distribution, which is then optimized by a novel multivariate information-geometric optimization (MIGO). By approximating the distribution with a sampling, training, and testing pipeline, MIGO guarantees memory efficiency, training efficiency, and search flexibility. In addition, MIGO is the first method to decrease the estimation error of the natural gradient in a multivariate distribution. On the other hand, for a given set of constraints, the neural architectures are generated by a novel dynamic programming network generation (DPNG), which significantly reduces the training cost under various hardware environments. Experiments validate the advantages of our approach over existing methods in both accuracy and efficiency, i.e., a 2.39 test error on the CIFAR-10 benchmark and 21.7 on the ImageNet benchmark, with only 1.5 GPU hours and 96 GPU hours of search, respectively. Moreover, the searched architectures generalize well to other computer vision tasks, including object detection and semantic segmentation, i.e., a 25× FLOPs compression with a 6.4 mAP gain on the Pascal VOC dataset, and a 29.9× FLOPs compression with only a 1.41 percent performance drop on the Cityscapes dataset. The code is publicly available.
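The distribution-based search can be caricatured with a much simpler scheme: keep one categorical distribution over candidate operations per layer, sample architectures, score them, and shift probability mass toward operations appearing in high-reward samples. The operation set, reward proxy, and plain multiplicative update below are all illustrative; the paper uses multivariate information-geometric optimization with real training and validation, not this rule.

```python
import numpy as np

rng = np.random.default_rng(0)
ops = ["conv3x3", "conv5x5", "skip", "pool"]
layers, steps = 4, 200
probs = np.full((layers, len(ops)), 1.0 / len(ops))  # one categorical per layer

def reward(arch):
    # Toy proxy score; a real search would train and validate the network.
    return sum(1.0 for op in arch if op == "conv3x3")

for _ in range(steps):
    idx = np.array([rng.choice(len(ops), p=p) for p in probs])
    r = reward([ops[i] for i in idx])
    lr = 0.01 * r
    for layer, i in enumerate(idx):   # shift mass toward sampled ops
        probs[layer] *= (1 - lr)
        probs[layer, i] += lr

best = [ops[int(np.argmax(p))] for p in probs]
print(best)
```

Each update is a convex combination, so every row of `probs` stays a valid probability distribution throughout the search.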

6.
IEEE Trans Vis Comput Graph ; 27(4): 2298-2312, 2021 04.
Article in English | MEDLINE | ID: mdl-31647438

ABSTRACT

With the surge of images in the information era, people demand an effective and accurate way to access meaningful visual information, so effective and accurate communication of that information has become indispensable. In this article, we propose a content-based approach that automatically generates a clear and informative visual summarization of an image collection based on design principles and cognitive psychology. We first introduce a novel method for making representative and non-redundant summarizations of image collections, thereby ensuring data cleanliness and emphasizing important information. Then, we propose a tree-based algorithm with a two-step optimization strategy to generate the final layout, which operates as follows: (1) an initial layout is created by constructing a tree randomly based on the grouping results of the input image set; (2) the layout is refined through a coarse greedy adjustment, followed by gradient backpropagation drawing on the training procedure of neural networks. We demonstrate the usefulness and effectiveness of our method via extensive experimental results and user studies. Our visual summarization algorithm captures the main content of image collections more precisely and efficiently than alternative methods or commercial tools.

7.
IEEE Trans Pattern Anal Mach Intell ; 42(10): 2410-2422, 2020 Oct.
Article in English | MEDLINE | ID: mdl-31442969

ABSTRACT

In this paper, we address the problem of monocular depth estimation when only a limited number of training image-depth pairs are available. To achieve high regression accuracy, state-of-the-art estimation methods rely on CNNs trained with large numbers of image-depth pairs, which are prohibitively costly or even infeasible to acquire. Aiming to break the curse of such expensive data collection, we propose a semi-supervised adversarial learning framework that uses only a small number of image-depth pairs in conjunction with a large number of easily available monocular images to achieve high performance. In particular, we use one generator to regress the depth and two discriminators to evaluate the predicted depth: one inspects the image-depth pair while the other inspects the depth channel alone. The two discriminators provide feedback to the generator as losses that drive it toward more realistic and accurate depth predictions. Experiments show that the proposed approach can (1) improve most state-of-the-art models on the NYUD v2 dataset by effectively leveraging additional unlabeled data sources; (2) reach state-of-the-art accuracy when the training set is small, e.g., on the Make3D dataset; and (3) adapt well to an unseen new dataset (Make3D in our case) after training on an annotated dataset (KITTI in our case).
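The two-discriminator feedback can be sketched as a combined adversarial loss. The weighting `alpha`, the stub discriminators, and the log-loss form below are assumptions for illustration, not the paper's exact formulation:

```python
import math

def generator_loss(image, pred_depth, d_pair, d_depth, alpha=0.5):
    """Hypothetical combination of the two adversarial signals.
    d_pair scores an (image, depth) pair, d_depth scores the depth map
    alone; both return a realism probability in (0, 1). alpha is an
    assumed weighting, not a value from the paper."""
    loss_pair = -math.log(d_pair(image, pred_depth) + 1e-8)
    loss_depth = -math.log(d_depth(pred_depth) + 1e-8)
    return alpha * loss_pair + (1 - alpha) * loss_depth

# Stub discriminators standing in for trained networks:
d_pair = lambda img, depth: 0.9    # the pair looks realistic
d_depth = lambda depth: 0.2        # the depth map alone looks fake
loss = generator_loss(None, None, d_pair, d_depth)
print(loss)
```

Because the depth-only discriminator is unconvinced in this example, its term dominates the loss, which is exactly the kind of signal that would push the generator toward more plausible depth maps.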

8.
Oncologist ; 24(9): 1159-1165, 2019 09.
Article in English | MEDLINE | ID: mdl-30996009

ABSTRACT

BACKGROUND: Computed tomography (CT) is essential for pulmonary nodule detection in diagnosing lung cancer. As deep learning algorithms have recently been regarded as a promising technique in medical fields, we attempt to integrate a well-trained deep learning algorithm to detect and classify pulmonary nodules in clinical CT images. MATERIALS AND METHODS: Open-source data sets and multicenter data sets were used in this study. A three-dimensional convolutional neural network (CNN) was designed to detect pulmonary nodules and classify them as malignant or benign based on pathologically and laboratory-proven results. RESULTS: The sensitivity and specificity of this well-trained model were 84.4% (95% confidence interval [CI], 80.5%-88.3%) and 83.0% (95% CI, 79.5%-86.5%), respectively. Subgroup analysis of smaller nodules (<10 mm) demonstrated sensitivity and specificity similar to those for larger nodules (10-30 mm). Additional model validation was performed by comparing manual assessments by doctors of different ranks with those of the three-dimensional CNN; the performance of the CNN model was superior to manual assessment. CONCLUSION: As a companion diagnostic, the three-dimensional CNN with a deep learning algorithm may assist radiologists by providing accurate and timely information for diagnosing pulmonary nodules in regular clinical practice. IMPLICATIONS FOR PRACTICE: The three-dimensional convolutional neural network described in this article demonstrated both high sensitivity and high specificity in classifying pulmonary nodules regardless of diameter, as well as superiority over manual assessment. Although it warrants further improvement and validation in larger screening cohorts, its clinical application could facilitate and assist doctors in clinical practice.


Subject(s)
Deep Learning , Lung Neoplasms/diagnosis , Neural Networks, Computer , Radiographic Image Interpretation, Computer-Assisted/methods , Algorithms , Databases, Factual/statistics & numerical data , Female , Humans , Lung/diagnostic imaging , Lung/pathology , Lung Neoplasms/classification , Lung Neoplasms/diagnostic imaging , Male , Middle Aged , ROC Curve , Retrospective Studies , Tomography, X-Ray Computed/methods
9.
IEEE Trans Vis Comput Graph ; 24(12): 3019-3031, 2018 12.
Article in English | MEDLINE | ID: mdl-29990105

ABSTRACT

In this paper, we present a method for reconstructing the drawing process of Chinese brush paintings. We demonstrate the possibility of computing an artistically reasonable drawing order from a static brush painting that is consistent with the rules of art. We map the key principles of drawing composition to our computational framework, which first organizes the strokes in three stages and then optimizes stroke ordering with natural evolution strategies. Our system produces reasonable animated constructions of Chinese brush paintings with minimal or no user intervention. We test our algorithm on a range of input paintings with varying degrees of complexity and structure and then evaluate the results via a user study. We discuss the applications of the proposed system to painting instruction, painting animation, and image stylization, especially in the context of art teaching.

10.
IEEE Trans Image Process ; 25(3): 1152-1162, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26731765

ABSTRACT

In this paper, we present a novel algorithm that simultaneously accomplishes color quantization and dithering of images. This is achieved by minimizing a perception-based cost function that considers pixel-wise differences between filtered versions of the quantized image and the input image. We use edge-aware filters in defining the cost function to avoid mixing colors on opposite sides of an edge, and we weight the importance of each pixel according to its saliency. To rapidly minimize the cost function, we use a modified multi-scale iterated conditional modes (ICM) algorithm, which updates one pixel at a time while keeping the other pixels unchanged. As ICM is a local method, careful initialization is required to prevent termination at a local minimum far from the global one; to address this, we initialize ICM with a palette generated by a modified median-cut method. Compared with previous approaches, our method produces high-quality results with fewer visual artifacts while requiring significantly less computational effort.
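A stripped-down version of the ICM update can be sketched as follows; the cost here is plain squared color error plus a neighbor-smoothness penalty, standing in for the paper's edge-aware, saliency-weighted perceptual cost:

```python
import numpy as np

def icm_quantize(img, palette, iters=5, beta=0.1):
    """Iterated conditional modes (simplified): each pixel in turn takes
    the palette entry minimizing squared color error plus a penalty for
    disagreeing with its already-assigned 4-neighbors."""
    h, w, _ = img.shape
    labels = np.zeros((h, w), dtype=int)
    for _ in range(iters):
        for y in range(h):
            for x in range(w):
                costs = np.sum((palette - img[y, x]) ** 2, axis=1)
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        costs += beta * (np.arange(len(palette)) != labels[ny, nx])
                labels[y, x] = int(np.argmin(costs))
    return palette[labels]

rng = np.random.default_rng(0)
img = rng.random((8, 8, 3))                       # toy 8x8 RGB image in [0, 1)
palette = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0],
                    [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
out = icm_quantize(img, palette)
print(out.shape)  # (8, 8, 3)
```

The fixed palette here plays the role of the paper's median-cut initialization; since ICM only improves the cost locally, that starting palette largely determines which minimum it reaches.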

11.
IEEE Trans Vis Comput Graph ; 22(12): 2564-2578, 2016 12.
Article in English | MEDLINE | ID: mdl-26761821

ABSTRACT

Similar objects are ubiquitous and abundant in both natural and artificial scenes. Determining the visual importance of several similar objects in a complex photograph is a challenge for image understanding algorithms. This study aims to define the importance of similar objects in an image and to develop a method that can select the most important instances from the multiple similar objects in an input image. The task is challenging because multiple objects must be compared without adequate semantic information. We address this challenge by building an image database and designing an interactive system to measure object importance as judged by human observers. This ground truth is used to define a range of features related to the visual importance of similar objects, which are then fed to learning-to-rank and random forest models to rank the similar objects in an image. Importance predictions were validated on 5,922 objects, and the most important objects can be identified automatically. Factors related to composition (e.g., size, location, and overlap) are particularly informative, although clarity and color contrast also matter. We demonstrate the usefulness of similar-object importance in various applications, including image retargeting, image compression, image re-attentionizing, image admixture, and manipulation of blindness images.
