Results 1 - 20 of 42
1.
IEEE Trans Pattern Anal Mach Intell ; 45(11): 13083-13099, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37335789

ABSTRACT

While 3D visual saliency aims to predict the regional importance of 3D surfaces in agreement with human visual perception and has been well researched in computer vision and graphics, recent eye-tracking experiments show that state-of-the-art 3D visual saliency methods remain poor at predicting human fixations. Cues emerging from these experiments suggest that 3D visual saliency may be associated with 2D image saliency. This paper proposes a framework that combines a Generative Adversarial Network and a Conditional Random Field for learning the visual saliency of both a single 3D object and a scene composed of multiple 3D objects, using image saliency ground truth, in order to 1) investigate whether 3D visual saliency is an independent perceptual measure or just a derivative of image saliency and 2) provide a weakly supervised method for more accurately predicting 3D visual saliency. Through extensive experiments, we not only demonstrate that our method significantly outperforms state-of-the-art approaches, but also answer the question posed in the title of this paper.

2.
IEEE Trans Pattern Anal Mach Intell ; 45(1): 905-918, 2023 Jan.
Article in English | MEDLINE | ID: mdl-35104210

ABSTRACT

Face portrait line drawing is a unique style of art which is highly abstract and expressive. However, due to its high semantic constraints, many existing methods learn to generate portrait drawings from paired training data, which is costly and time-consuming to obtain. In this paper, we propose a novel method to automatically transform face photos into portrait drawings using unpaired training data, with two new features: our method can (1) learn to generate high-quality portrait drawings in multiple styles using a single network and (2) generate portrait drawings in a "new style" unseen in the training data. To achieve these benefits, we (1) propose a novel quality metric for portrait drawings which is learned from human perception, and (2) introduce a quality loss to guide the network toward generating better-looking portrait drawings. We observe that existing unpaired translation methods such as CycleGAN tend to embed invisible reconstruction information indiscriminately across the whole drawing, due to the significant information imbalance between the photo and portrait-drawing domains, which leads to important facial features being lost. To address this problem, we propose a novel asymmetric cycle mapping that enforces the reconstruction information to be visible and embedded only in selected facial regions. Together with localized discriminators for important facial regions, our method preserves all important facial features in the generated drawings. Generator dissection further shows that our model learns to incorporate face semantic information during drawing generation. Extensive experiments, including a user study, show that our model outperforms state-of-the-art methods.

3.
IEEE Trans Vis Comput Graph ; 28(2): 1317-1327, 2022 Feb.
Article in English | MEDLINE | ID: mdl-32755863

ABSTRACT

3D models are commonly used in computer vision and graphics. With the wider availability of mesh data, an efficient and intrinsic deep learning approach to processing 3D meshes is greatly needed. Unlike images, 3D meshes have irregular connectivity, requiring careful design to capture relations in the data. To utilize the topology information while staying robust under different triangulations, we propose to encode mesh connectivity using Laplacian spectral analysis, along with mesh feature aggregation blocks (MFABs) that can split the surface domain into local pooling patches and aggregate global information among them. We build a mesh hierarchy from fine to coarse using Laplacian spectral clustering, which is flexible under isometric transformations. Inside the MFABs there are pooling layers to collect local information and multi-layer perceptrons to compute vertex features of increasing complexity. To obtain the relationships among different clusters, we introduce a Correlation Net to compute a correlation matrix, which can aggregate the features globally by matrix multiplication with the cluster features. Our network architecture is flexible enough to be used on meshes with different numbers of vertices. We conduct several experiments, including shape segmentation and classification, and our method outperforms state-of-the-art algorithms for these tasks on the ShapeNet and COSEG datasets.
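The Laplacian spectral clustering step mentioned above can be illustrated in miniature. The sketch below is an illustration under assumed inputs, not the paper's implementation: it performs a two-way spectral partition of a small graph by splitting vertices on the sign of the Fiedler vector; a mesh vertex-adjacency graph could stand in for the toy cliques.

```python
import numpy as np

def fiedler_partition(adj):
    """Two-way spectral partition: split vertices by the sign of the
    Fiedler vector (eigenvector of the second-smallest eigenvalue of
    the graph Laplacian L = D - A)."""
    degree = np.diag(adj.sum(axis=1))
    laplacian = degree - adj
    eigvals, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues ascending
    fiedler = eigvecs[:, 1]
    return fiedler >= 0

# Toy graph: two 4-vertex cliques joined by a single weak bridge edge.
n = 8
adj = np.zeros((n, n))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                adj[i, j] = 1.0
adj[3, 4] = adj[4, 3] = 1.0  # bridge between the two cliques

labels = fiedler_partition(adj)  # True/False per vertex
```

The Fiedler vector changes sign exactly across the weakest cut, so the two cliques land in different partitions.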

4.
IEEE Trans Pattern Anal Mach Intell ; 43(10): 3462-3475, 2021 10.
Article in English | MEDLINE | ID: mdl-32310761

ABSTRACT

Despite significant effort and notable success of neural style transfer, it remains challenging for highly abstract styles, in particular line drawings. In this paper, we propose APDrawingGAN++, a generative adversarial network (GAN) for transforming face photos to artistic portrait drawings (APDrawings), which addresses substantial challenges including highly abstract style, different drawing techniques for different facial features, and high perceptual sensitivity to artifacts. To address these, we propose a composite GAN architecture that consists of local networks (to learn effective representations for specific facial features) and a global network (to capture the overall content). We provide a theoretical explanation for the necessity of this composite GAN structure by proving that any GAN with a single generator cannot generate artistic styles like APDrawings. We further introduce a classification-and-synthesis approach for lips and hair where different drawing styles are used by artists, which applies suitable styles for a given input. To capture the highly abstract art form inherent in APDrawings, we address two challenging operations-(1) coping with lines with small misalignments while penalizing large discrepancy and (2) generating more continuous lines-by introducing two novel loss terms: one is a novel distance transform loss with nonlinear mapping and the other is a novel line continuity loss, both of which improve the line quality. We also develop dedicated data augmentation and pre-training to further improve results. Extensive experiments, including a user study, show that our method outperforms state-of-the-art methods, both qualitatively and quantitatively.

5.
IEEE Trans Vis Comput Graph ; 27(1): 151-164, 2021 Jan.
Article in English | MEDLINE | ID: mdl-31329121

ABSTRACT

Recently, effort has been made to apply deep learning to the detection of mesh saliency. However, one major barrier is to collect a large amount of vertex-level annotation as saliency ground truth for training the neural networks. Quite a few pilot studies showed that this task is difficult. In this work, we solve this problem by developing a novel network trained in a weakly supervised manner. The training is end-to-end and does not require any saliency ground truth but only the class membership of meshes. Our Classification-for-Saliency CNN (CfS-CNN) employs a multi-view setup and contains a newly designed two-channel structure which integrates view-based features of both classification and saliency. It essentially transfers knowledge from 3D object classification to mesh saliency. Our approach significantly outperforms the existing state-of-the-art methods according to extensive experimental results. Also, the CfS-CNN can be directly used for scene saliency. We showcase two novel applications based on scene saliency to demonstrate its utility.

6.
PLoS One ; 15(9): e0239840, 2020.
Article in English | MEDLINE | ID: mdl-32970775

ABSTRACT

The association between alcohol outlets and violence has long been recognised, and is commonly used to inform policing and licensing policies (such as staggered closing times and zoning). Less investigated, however, is the association between violent crime and other urban points of interest, which, while associated with the city-centre alcohol consumption economy, are not themselves alcohol outlets. Here, machine learning (specifically, LASSO regression) is used to model the distribution of violent crime for the central 9 km² of ten large UK cities. Densities of 620 different point-of-interest (POI) types (sourced from Ordnance Survey) are used as predictors, with the 10 most explanatory variables being automatically selected for each city. Cross-validation is used to test the generalisability of each model. Results show that the inclusion of additional POI types produces a more accurate model, with significant increases in performance over a baseline univariate alcohol-outlet-only model. Analysis of the variables chosen for city-specific models shows potential candidates for new strategies on a per-city basis, while the combined-model variables show the general trend in the POI/violence association across the UK. Although alcohol outlets remain the best individual predictor of violence, other points of interest should also be considered when modelling the distribution of violence in city centres. The presented method could be used to develop targeted, city-specific initiatives that go beyond alcohol outlets and also consider other locations.
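As a hedged illustration of the modelling step, the sketch below fits a LASSO by proximal gradient descent (ISTA) on synthetic stand-in data, then reads off the automatically selected predictors. The data, penalty, and threshold are illustrative assumptions, not the study's actual POI densities or settings.

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_ista(X, y, lam=0.05, iters=2000):
    """Minimise (1/2n)||Xw - y||^2 + lam*||w||_1 by proximal gradient."""
    n, d = X.shape
    lr = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    w = np.zeros(d)
    for _ in range(iters):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - lr * grad, lr * lam)
    return w

# Synthetic stand-in: 200 grid cells, 25 POI-type densities, of which
# only 3 types truly drive the simulated "violence" response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 25))
true_w = np.zeros(25)
true_w[[2, 7, 11]] = [1.5, -2.0, 1.0]
y = X @ true_w + 0.05 * rng.normal(size=200)

w = lasso_ista(X, y)
selected = np.flatnonzero(np.abs(w) > 0.1)  # the LASSO-chosen predictors
```

The ℓ1 penalty drives irrelevant coefficients exactly to zero, which is what makes LASSO a variable-selection method rather than just a regulariser.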


Subject(s)
Crime/statistics & numerical data , Urban Population/statistics & numerical data , Cities/statistics & numerical data , Crime/classification , Housing/statistics & numerical data , Humans , Restaurants/statistics & numerical data , Spatio-Temporal Analysis , United Kingdom
7.
Biodivers Data J ; 8: e47051, 2020.
Article in English | MEDLINE | ID: mdl-32269476

ABSTRACT

Digitisation of natural history collections has evolved from creating databases for recording specimens' catalogue and label data to also capturing digital images of the specimens. This has been driven by several important factors, such as the need to increase global accessibility to specimens and to preserve the original specimens by limiting their manual handling. The size of the collections points to the need for high-throughput digitisation workflows. However, digital imaging of large numbers of fragile specimens is an expensive and time-consuming process that should be performed only once. To achieve this, the digital images produced need to be useful for the largest possible set of applications and have a potentially unlimited shelf life. The constraints on digitisation speed need to be balanced against the applicability and longevity of the images, which, in turn, depend directly on the quality of those images. As a result, the quality criteria that specimen images need to fulfil influence the design, implementation and execution of digitisation workflows. Different standards and guidelines for producing quality research images from specimens have been proposed; however, their actual adaptation to suit the needs of different types of specimens requires further analysis. This paper presents the digitisation workflow implemented by Meise Botanic Garden (MBG). This workflow is relevant because of its modular design, its strong focus on image quality assessment, its flexibility in combining in-house and outsourced digitisation, processing, preservation and publishing facilities, and its capacity to evolve by integrating alternative components from different sources. The design and operation of the digitisation workflow are described to showcase how they were derived, with particular attention to the built-in audit trail, which ensures the scalable production of high-quality specimen images and guarantees that new modules affect neither the speed of imaging nor the quality of the images produced.

8.
Article in English | MEDLINE | ID: mdl-32203021

ABSTRACT

Mesh color edit propagation aims to propagate color from a few color strokes to the whole mesh, which is useful for mesh colorization, color enhancement, color editing, etc. Compared with image edit propagation, luminance information is not available for 3D mesh data, so color edit propagation is more difficult on 3D meshes than on images, and far less research has been carried out. This paper proposes a novel solution based on sparse graph regularization. First, a few color strokes are interactively drawn by the user; the color is then propagated to the whole mesh by minimizing a sparse graph-regularized nonlinear energy function. The proposed method effectively measures geometric similarity over shapes by using a set of complementary multiscale feature descriptors, and effectively controls color bleeding via a sparse ℓ1 optimization rather than the quadratic minimization used in existing work. The proposed framework can be applied to interactive mesh colorization, mesh color enhancement and mesh color editing. Extensive qualitative and quantitative experiments show that the proposed method outperforms the state-of-the-art methods.

9.
IEEE Trans Pattern Anal Mach Intell ; 42(6): 1537-1544, 2020 Jun.
Article in English | MEDLINE | ID: mdl-31056488

ABSTRACT

Finding the informative subspaces of high-dimensional datasets is at the core of numerous applications in computer vision, where spectral-based subspace clustering is arguably the most widely studied method due to its strong empirical performance. Such algorithms first compute an affinity matrix to construct a self-representation for each sample using other samples as a dictionary. Sparsity and connectivity of the self-representation play important roles in effective subspace clustering. However, simultaneous optimization of both factors is difficult due to their conflicting nature, and most existing methods are designed to address only one factor. In this paper, we propose a post-processing technique to optimize both sparsity and connectivity by finding good neighbors. Good neighbors induce key connections among samples within a subspace and not only have large affinity coefficients but are also strongly connected to each other. We reassign the coefficients of the good neighbors and eliminate other entries to generate a new coefficient matrix. We show that the few good neighbors can effectively recover the subspace, and the proposed post-processing step of finding good neighbors is complementary to most existing subspace clustering algorithms. Experiments on five benchmark datasets show that the proposed algorithm performs favorably against the state-of-the-art methods with negligible additional computation cost.
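A much-simplified reading of the "good neighbor" idea can be sketched as a mutual top-k filter on the self-representation coefficient matrix: an entry survives only if each sample ranks the other among its largest coefficients. This is an assumption-laden simplification for intuition only, not the paper's actual post-processing criterion, which also considers connectivity among the neighbors themselves.

```python
import numpy as np

def mutual_topk(C, k=3):
    """Keep |C[i, j]| only when j is among i's k largest coefficients
    AND i is among j's: a simplified 'good neighbor' filter that prunes
    spurious cross-subspace connections."""
    A = np.abs(C)
    np.fill_diagonal(A, 0.0)
    topk = np.argsort(-A, axis=1)[:, :k]
    keep = np.zeros_like(A, dtype=bool)
    rows = np.repeat(np.arange(A.shape[0]), k)
    keep[rows, topk.ravel()] = True
    keep &= keep.T  # mutual: both directions must agree
    return np.where(keep, A, 0.0)

# Noisy block-diagonal self-representation: two 4-sample subspaces with
# strong in-subspace coefficients and weak cross-subspace noise.
rng = np.random.default_rng(1)
C = 0.05 * rng.random((8, 8))
C[:4, :4] += 0.5 + 0.5 * rng.random((4, 4))
C[4:, 4:] += 0.5 + 0.5 * rng.random((4, 4))
C_clean = mutual_topk(C, k=3)
```

On this toy matrix the filter zeroes every cross-subspace entry while retaining the in-subspace connections, which is the sparsity-plus-connectivity trade-off the abstract describes.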

10.
IEEE Trans Pattern Anal Mach Intell ; 42(6): 1394-1407, 2020 Jun.
Article in English | MEDLINE | ID: mdl-30762528

ABSTRACT

In this paper we develop a family of shape measures. All the measures from the family evaluate the degree to which a shape looks like a predefined convex polygon. A new approach to designing object-shape-based measures is applied. In most cases, such measures are defined by exploiting some shape property: the property is optimized (e.g., maximized or minimized) by a certain shape, and the new shape measure is defined on that basis. An illustrative example is the shape circularity measure, derived by exploiting the well-known result that the circle has the largest area among all shapes with the same perimeter. There are many more such examples (e.g., ellipticity, linearity, elongation, and squareness measures are some of them), as well as different approaches. In the approach applied here, no desired property is needed and no optimizing shape has to be found. We start from a desired convex polygon and develop the related shape measure. The method also allows a tuning parameter. Thus, there is a new two-fold family of shape measures, dependent on a predefined convex polygon and on a tuning parameter that controls the measure's behavior. The measures obtained range over the interval (0,1] and attain the maximal possible value, equal to 1, if and only if the measured shape coincides with the selected convex polygon used to develop the particular measure. All the measures are invariant with respect to translation, rotation, and scaling transformations. An extension of the method leads to a family of new shape convexity measures.
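The circularity example cited above has a standard closed form, 4πA/P², which equals 1 exactly for a circle (the isoperimetric maximum) and is below 1 for every other shape, and which is invariant to translation, rotation, and scaling as the abstract requires. A minimal sketch for polygonal shapes:

```python
import math

def polygon_circularity(points):
    """4*pi*area / perimeter**2: equals 1 for a circle, < 1 otherwise.
    Area via the shoelace formula; points are (x, y) vertices in order."""
    n = len(points)
    area = 0.0
    perimeter = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        area += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    area = abs(area) / 2.0
    return 4.0 * math.pi * area / perimeter ** 2

square = [(0, 0), (1, 0), (1, 1), (0, 1)]  # circularity = pi/4 ~ 0.785
# A regular 100-gon approximates a circle, so its score approaches 1.
ngon = [(math.cos(2 * math.pi * k / 100), math.sin(2 * math.pi * k / 100))
        for k in range(100)]
```

The paper's family generalizes this pattern from "looks like a circle" to "looks like a chosen convex polygon", without needing an optimized property at all.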

11.
IEEE Trans Neural Netw Learn Syst ; 31(8): 2832-2846, 2020 08.
Article in English | MEDLINE | ID: mdl-31199274

ABSTRACT

Class imbalance is a challenging problem in many classification tasks. It induces biased classification results for minority classes that contain fewer training samples than others. Most existing approaches aim to remedy the imbalanced number of instances among categories by resampling the majority and minority classes accordingly. However, the imbalance in the difficulty of recognizing different categories is also crucial, especially when distinguishing samples across many classes. For example, in the task of clinical skin disease recognition, several rare diseases have a small number of training samples, but they are easy to diagnose because of their distinct visual properties. On the other hand, some common skin diseases, e.g., eczema, are hard to recognize due to the lack of special symptoms. To address this problem, we propose a self-paced balance learning (SPBL) algorithm in this paper. Specifically, we introduce a comprehensive metric termed the complexity of an image category, which combines both sample number and recognition difficulty. First, the complexity is initialized using the model of the first pace, where a pace indicates one iteration in the self-paced learning paradigm. We then assign each class a penalty weight that is larger for more complex categories and smaller for easier ones, after which the curriculum is reconstructed by rearranging the training samples. Consequently, the model can iteratively learn discriminative representations by balancing the complexity in each pace. Experimental results on the SD-198 and SD-260 benchmark data sets demonstrate that the proposed SPBL algorithm performs favorably against the state-of-the-art methods. We also demonstrate the generalization capacity of the SPBL algorithm on various tasks, such as indoor scene image recognition and object classification.
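The complexity idea can be illustrated with a toy per-class weighting. The combination rule below, a weighted sum of sample scarcity and error rate with an arbitrary balance parameter, is an assumed stand-in for the paper's actual formulation; only the qualitative behavior (rare-but-easy classes get lower penalty weights than hard classes) is the point.

```python
import numpy as np

def class_complexity(counts, error_rates, alpha=0.2):
    """Combine sample scarcity and recognition difficulty into one score.
    alpha balances the two factors (an illustrative choice, not the
    paper's exact formulation)."""
    counts = np.asarray(counts, dtype=float)
    scarcity = 1.0 - counts / counts.max()    # fewer samples -> higher
    difficulty = np.asarray(error_rates)      # higher error  -> higher
    return alpha * scarcity + (1 - alpha) * difficulty

# Four hypothetical classes: common-hard, rare-easy, common-medium, rare-hard.
counts = np.array([500, 40, 200, 40])
errors = np.array([0.30, 0.02, 0.25, 0.60])
complexity = class_complexity(counts, errors)
weights = complexity / complexity.sum()  # normalized per-class penalty weights
```

With these toy numbers the rare-but-easy class receives a smaller penalty than the common-but-hard one, mirroring the skin-disease example in the abstract, while the rare-and-hard class is penalized most.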


Subject(s)
Algorithms , Machine Learning , Pattern Recognition, Automated/methods , Skin Diseases/diagnosis , Databases, Factual/statistics & numerical data , Humans , Pattern Recognition, Automated/statistics & numerical data
12.
IEEE Trans Vis Comput Graph ; 26(6): 2204-2218, 2020 06.
Article in English | MEDLINE | ID: mdl-30530330

ABSTRACT

An importance measure of 3D objects inspired by human perception has a range of applications, since people want computers to behave like humans in many tasks. This paper revisits a well-defined measure, the distinction of a 3D surface mesh, which indicates how important a region of a mesh is with respect to classification. We develop a method to compute it based on a classification network and a Markov Random Field (MRF). The classification network learns view-based distinction by handling multiple views of a 3D object. Using a classification network has the advantage of avoiding the training-data problem, which has become a major obstacle to applying deep learning to 3D object understanding tasks. The MRF estimates the parameters of a linear model for combining the view-based distinction maps. Experiments on several publicly accessible datasets show that the distinctive regions detected by our method are not only significantly different from those detected by methods based on handcrafted features, but also more consistent with human perception. We also compare it with other perceptual measures and quantitatively evaluate its performance in the context of two applications. Furthermore, due to the view-based nature of our method, we are able to easily extend mesh distinction to 3D scenes containing multiple objects.

13.
Article in English | MEDLINE | ID: mdl-31478853

ABSTRACT

State-of-the-art neural style transfer methods have demonstrated impressive results by training feed-forward convolutional neural networks or using an iterative optimization strategy. The image representation used in these methods, which contains two components, a style representation and a content representation, is typically based on high-level features extracted from pretrained classification networks. Because the classification networks are originally designed for object recognition, the extracted features often focus on the central object and neglect other details. As a result, the style textures tend to scatter over the stylized outputs and disrupt the content structures. To address this issue, we present a novel image stylization method that involves an additional structure representation. Our structure representation considers two factors: i) the global structure, represented by the depth map, and ii) the local structure details, represented by the image edges; together these effectively reflect the spatial distribution of all the components in an image as well as the structure of dominant objects. Experimental results demonstrate that our method achieves impressive visual results, particularly when processing images sensitive to structure distortion, e.g. images containing multiple objects at potentially different depths, or dominant objects with clear structures.

14.
Article in English | MEDLINE | ID: mdl-31034411

ABSTRACT

Given a reference colour image and a destination grayscale image, this paper presents a novel automatic colourisation algorithm that transfers colour information from the reference image to the destination image. Since the reference and destination images may contain content at different or even varying scales (due to changes in distance between objects and the camera), existing texture-matching-based methods can often perform poorly. We propose a novel cross-scale texture matching method to improve the robustness and quality of the colourisation results. Suitable matching scales are considered locally and then fused using a global optimisation that minimises both the matching errors and the spatial variation of scales. The minimisation is efficiently solved using a multi-label graph-cut algorithm. Since only low-level texture features are used, texture-matching-based colourisation can still produce semantically incorrect results, such as a meadow appearing above the sky. We consider a class of semantic violations in which the statistics of up-down relationships learnt from the reference image are violated, and propose an effective method to identify and correct unreasonable colourisation. Finally, a novel nonlocal ℓ1 optimisation framework is developed to propagate high-confidence micro-scribbles to regions of lower confidence to produce a fully colourised image. Qualitative and quantitative evaluations show that our method outperforms several state-of-the-art methods.

15.
IEEE Trans Image Process ; 28(8): 3973-3985, 2019 Aug.
Article in English | MEDLINE | ID: mdl-30843836

ABSTRACT

In this paper, we propose a unified framework that simultaneously discovers the number of clusters and groups the data points into clusters using subspace clustering. Real data distributed in a high-dimensional space can be disentangled into a union of low-dimensional subspaces, which can benefit various applications. To explore such intrinsic structure, state-of-the-art subspace clustering approaches often optimize a self-representation problem among all samples to construct a pairwise affinity graph for spectral clustering. However, a graph with pairwise similarities lacks robustness for segmentation, especially for samples that lie on the intersection of two subspaces. To address this problem, we design a hyper-correlation-based data structure termed the triplet relationship, which reveals high relevance and local compactness among three samples. The triplet relationship can be derived from the self-representation matrix and utilized to iteratively assign the data points to clusters. Based on the triplet relationship, we propose a unified optimizing scheme to automatically calculate clustering assignments. Specifically, we optimize a model selection reward and a fusion reward by simultaneously maximizing the similarity of triplets from different clusters while minimizing the correlation of triplets from the same cluster. The proposed algorithm also automatically reveals the number of clusters and fuses groups to avoid over-segmentation. Extensive experimental results on both synthetic and real-world datasets validate the effectiveness and robustness of the proposed method.

16.
Emotion ; 19(4): 746-750, 2019 Jun.
Article in English | MEDLINE | ID: mdl-30080075

ABSTRACT

Recent research has linked facial expressions to mind perception. Specifically, Bowling and Banissy (2017) found that ambiguous doll-human morphs were judged as more likely to have a mind when smiling. Herein, we investigate 3 key potential boundary conditions of this "expression-to-mind" effect. First, we demonstrate that face inversion impairs the ability of happy expressions to signal mindful states in static faces; however, inversion does not disrupt this effect for dynamic displays of emotion. Finally, we demonstrate that not all emotions have equivalent effects. Whereas happy faces generate more mind ascription compared to neutral faces, we find that expressions of disgust actually generate less mind ascription than those of happiness. (PsycINFO Database Record (c) 2019 APA, all rights reserved).


Subject(s)
Emotions/physiology , Facial Expression , Adult , Female , Humans , Male , Young Adult
17.
Sci Rep ; 8(1): 11901, 2018 08 09.
Article in English | MEDLINE | ID: mdl-30093680

ABSTRACT

There is a large body of historical documents that are too fragile to be opened or unrolled, making their contents inaccessible. Recent improvements in X-ray scanning technology and computer vision techniques make it possible to perform a "virtual" unrolling of such documents. We describe a novel technique to process a stack of 3D X-ray images to identify the surface of parchment scrolls, unroll them, and create a visualization of their written contents. Unlike existing techniques, ours can handle even challenging cases with minimal manual interaction. Our novel approach was deployed on two damaged historic scrolls, dating from the 15th and 16th centuries, from the manors of Bressingham and Diss Heywood. The former has become fused, probably due to exposure to moisture, and cannot be fully unrolled. The latter was severely burnt several hundred years ago, becoming thoroughly charred, heat-shrunken, and distorted, with all the sheets now brittle and fused together. Our virtual unrolling revealed text that had been hidden for centuries.

18.
IEEE Trans Image Process ; 27(4): 1914-1926, 2018 Apr.
Article in English | MEDLINE | ID: mdl-29990041

ABSTRACT

We develop a framework to virtually unroll, via a sequence of X-ray tomographic slices, fragile historical parchment scrolls that cannot be physically unfolded, thus providing easy access to parchments whose contents have remained hidden for centuries. The first step is to produce a topologically correct segmentation, which is challenging, as the parchment layers vary significantly in thickness, contain substantial interior texture and often stick together in places. For this purpose, our method starts by linking the broken layers in a slice using the topological structure propagated from the previously processed slice. To ensure topological correctness, we identify fused regions by detecting junction sections, and then match them using a global optimization efficiently solved by the blossom algorithm, taking into account the shape energy of the curves separating fused layers. The fused layers are then separated using as-parallel-as-possible curves connecting pairs of junction sections. To flatten the segmented parchment, pixels in different frames need to be put into alignment. This is achieved via a dynamic-programming-based global optimization, which minimizes the total matching distance and penalizes stretches. Eventually, the text of the parchment is revealed by ink projection. We demonstrate the effectiveness of our approach on challenging real-world data sets, including the water-damaged fifteenth-century Bressingham scroll.
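The frame-alignment step resembles classic dynamic-programming sequence alignment. The sketch below is a generic DTW-style recurrence with an assumed stretch penalty, not the paper's exact cost: it aligns two 1-D profiles while charging a fixed penalty each time one sequence repeats a position against the other.

```python
def align(a, b, stretch_penalty=0.1):
    """Dynamic-programming alignment of two 1-D profiles: minimise the
    total pairwise distance, with a penalty each time one sequence
    'stretches' (repeats a position) against the other."""
    n, m = len(a), len(b)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j - 1],               # one-to-one match
                cost[i - 1][j] + stretch_penalty,  # a stretches
                cost[i][j - 1] + stretch_penalty,  # b stretches
            )
    return cost[n][m]
```

Identical profiles align for free along the diagonal; a single repeated sample costs exactly one stretch penalty, which is the mechanism that discourages excessive warping between adjacent frames.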

19.
IEEE Trans Image Process ; 27(11): 5288-5302, 2018 Nov.
Article in English | MEDLINE | ID: mdl-29994213

ABSTRACT

For image retrieval methods based on bags of visual words, much attention has been paid to enhancing the discriminative power of the local features. Although retrieved images are usually similar to a query in minutiae, they may be significantly different from a semantic perspective, which can be effectively distinguished by convolutional neural networks (CNNs). Such images should not be considered relevant pairs. To tackle this problem, we propose to construct a dynamic match kernel by adaptively calculating the matching thresholds between query and candidate images based on the pairwise distance among deep CNN features. In contrast to the typical static match kernel, which is independent of the global appearance of retrieved images, the dynamic one leverages semantic similarity as a constraint for determining matches. Accordingly, we propose a semantic-constrained retrieval framework that incorporates the dynamic match kernel, which focuses on matched patches between relevant images and filters out those for irrelevant pairs. Furthermore, we demonstrate that the proposed kernel complements recent methods, such as Hamming embedding, multiple assignment, local descriptor aggregation, and graph-based re-ranking, while outperforming the static one under various settings on off-the-shelf evaluation metrics. We also evaluate the matched patches both quantitatively and qualitatively. Extensive experiments on five benchmark data sets and large-scale distractors validate the merits of the proposed method against the state-of-the-art methods for image retrieval.

20.
J Imaging ; 5(1)2018 Dec 21.
Article in English | MEDLINE | ID: mdl-34470180

ABSTRACT

Single-level principal component analysis (PCA) and multi-level PCA (mPCA) methods are applied here to a set of (2D frontal) facial images from a group of 80 Finnish subjects (34 male; 46 female) with two different facial expressions (smiling and neutral) per subject. Inspection of eigenvalues gives insight into the importance of different factors affecting shape, including biological sex, facial expression (neutral versus smiling), and all other variations. Biological sex and facial expression are shown to be reflected in the components at appropriate levels of the mPCA model. Dynamic 3D shape data for all phases of a smile made up a second dataset, sampled from 60 adult British subjects (31 male; 29 female). Modes of variation reflected the act of smiling at the correct level of the mPCA model. Seven phases of the dynamic smiles are identified: rest pre-smile, onset 1 (acceleration), onset 2 (deceleration), apex, offset 1 (acceleration), offset 2 (deceleration), and rest post-smile. A clear cycle is observed in the standardized scores at an appropriate level for mPCA and in single-level PCA. mPCA can be used to study static shapes and images, as well as dynamic changes in shape. It gave us much insight into the question "what's in a smile?".
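The single-level PCA side of the analysis can be sketched via the SVD, including the eigenvalue inspection the abstract mentions. The toy "landmark" data below is an assumption standing in for the facial shape vectors, with one dominant mode playing the role of, e.g., the smiling/neutral contrast; it is not the study's data or its multi-level extension.

```python
import numpy as np

def pca(X):
    """Single-level PCA via SVD: returns the components (as rows) and
    the fraction of total variance each component explains."""
    Xc = X - X.mean(axis=0)  # center each feature
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2 / (len(X) - 1)
    return Vt, var / var.sum()

# Toy data: 100 flattened 4-D "landmark" vectors dominated by one mode.
rng = np.random.default_rng(0)
scores = rng.normal(size=(100, 1))
direction = np.array([[1.0, 1.0, 0.0, -1.0]])  # hypothetical smile mode
X = scores @ direction + 0.05 * rng.normal(size=(100, 4))

components, explained = pca(X)
# explained[0] dominates: inspecting these fractions is the "inspection
# of eigenvalues" step used to judge the importance of each factor.
```

mPCA extends this by fitting separate covariance models at each level (e.g., between-subject vs. within-subject/expression) rather than one pooled model.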
