Results 1 - 16 of 16
1.
Article in English | MEDLINE | ID: mdl-38713571

ABSTRACT

Text-to-image generation models have significantly broadened the horizons of creative expression through the power of natural language. However, navigating these models to generate unique concepts, alter their appearance, or reimagine them in unfamiliar roles presents an intricate challenge. For instance, how can we exploit language-guided models to transpose an anime character into a different art style, or envision a beloved character in a radically different setting or role? This paper unveils a novel approach named DreamAnime, designed to provide this level of creative freedom. Using a minimal set of 2-3 images of a user-specified concept such as an anime character or an art style, we teach our model to encapsulate its essence through novel "words" in the embedding space of a pre-existing text-to-image model. Crucially, we disentangle the concepts of style and identity into two separate "words", thus providing the ability to manipulate them independently. These distinct "words" can then be pieced together into natural language sentences, promoting an intuitive and personalized creative process. Empirical results suggest that this disentanglement into separate word embeddings successfully captures a broad range of unique and complex concepts, with each word focusing on style or identity as appropriate. Comparisons with existing methods illustrate DreamAnime's superior capacity to accurately interpret and recreate the desired concepts across various applications and tasks. Code is available at https://github.com/chnshx/DreamAnime.
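To make the "new words in the embedding space" idea concrete, here is a minimal, hedged PyTorch sketch in the spirit of textual inversion: two free embedding vectors, one for identity and one for style, are optimized against a frozen generator. The prompt embeddings and the loss are placeholders, not DreamAnime's actual pipeline.

```python
# Minimal sketch (not the authors' code): learn two new "words" -- one for
# identity, one for style -- as free embedding vectors optimized while the
# text-to-image model stays frozen. `base_embeddings` and the loss below are
# hypothetical stand-ins for the frozen prompt tokens and the denoising loss.
import torch

emb_dim = 768
identity_word = torch.nn.Parameter(torch.randn(emb_dim) * 0.01)  # "<identity>"
style_word = torch.nn.Parameter(torch.randn(emb_dim) * 0.01)     # "<style>"
optimizer = torch.optim.AdamW([identity_word, style_word], lr=5e-4)

def embed_prompt(base_embeddings: torch.Tensor) -> torch.Tensor:
    """Splice the two learnable word embeddings into a prompt such as
    'a drawing of <identity> in <style>' whose other tokens stay frozen."""
    return torch.cat([base_embeddings, identity_word[None], style_word[None]], dim=0)

for step in range(1000):
    base_embeddings = torch.zeros(8, emb_dim)   # frozen prompt tokens (placeholder)
    prompt = embed_prompt(base_embeddings)
    loss = prompt.square().mean()               # stand-in for the real denoising loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Because only the two word vectors receive gradients, they can later be recombined freely in natural-language prompts, which is the disentanglement the abstract describes.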

2.
Article in English | MEDLINE | ID: mdl-38335081

ABSTRACT

Throughout history, static paintings have captivated viewers within display frames, yet the possibility of making these masterpieces vividly interactive remains intriguing. This research paper introduces 3DArtmator, a novel approach that aims to represent artforms in a highly interpretable stylized space, enabling 3D-aware animatable reconstruction and editing. Our rationale is to transfer the interpretability and 3D controllability of the latent space in a 3D-aware GAN to a stylized sub-space of a customized GAN, revitalizing the original artforms. To this end, the proposed two-stage optimization framework of 3DArtmator begins with discovering an anchor in the original latent space that accurately mimics the pose and content of a given art painting. This anchor serves as a reliable indicator of the local structure of the original latent space and therefore shares the same predefined editable expression vectors. In the second stage, we train a customized 3D-aware GAN specific to the input artform, while enforcing the preservation of the original latent local structure through a meticulous style-directional difference loss. This approach ensures the creation of a stylized sub-space that remains interpretable and retains 3D control. The effectiveness and versatility of 3DArtmator are validated through extensive experiments across a diverse range of art styles. With the ability to perform 3D reconstruction and editing of artforms while maintaining interpretability, 3DArtmator opens up new possibilities for artistic exploration and engagement.
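The sketch below is one plausible reading of a "directional difference" regularizer and is clearly not the paper's exact loss: it asks the customized generator to preserve, between pairs of latent codes, the same feature-space differences as the original generator, so edits defined in the original latent space keep working in the stylized sub-space. The two linear "generators" are toy stand-ins.

```python
# Minimal sketch (assumed formulation, not 3DArtmator's released loss).
import torch

def directional_difference_loss(G_orig, G_style, w_a, w_b):
    d_orig = G_orig(w_a) - G_orig(w_b)      # edit direction in the original space
    d_style = G_style(w_a) - G_style(w_b)   # same edit in the stylized sub-space
    return (d_style - d_orig).square().mean()

# Toy stand-ins for the two generators (real ones are 3D-aware GANs).
G_orig = torch.nn.Linear(64, 128)
G_style = torch.nn.Linear(64, 128)
loss = directional_difference_loss(G_orig, G_style, torch.randn(4, 64), torch.randn(4, 64))
```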

3.
IEEE Trans Image Process ; 32: 2636-2648, 2023.
Article in English | MEDLINE | ID: mdl-37115827

ABSTRACT

Using a sequence of discrete still images to tell a story or introduce a process has become a tradition in the field of digital visual media. With the surge in these media and the requirements of downstream tasks, acquiring their main topics or genres in a very short time is urgently needed. As a representative form of such media, comics have enjoyed a huge boom as they have gone digital. However, unlike natural images, comic images are divided into panels, and the images are not visually consistent from page to page. Therefore, existing works tailored for natural images perform poorly in analyzing comics. Since the identification of comic genres is tied to the overall story plotting, a long-term understanding that exploits the semantic interactions between multi-level comic fragments is needed. In this paper, we propose [Formula: see text]Comic, a Panel-Page-aware Comic genre classification model, which takes page sequences of comics as the input and produces class-wise probabilities. [Formula: see text]Comic utilizes detected panel boxes to extract panel representations and deploys self-attention to construct panel-page understanding, assisted with interdependent classifiers to model label correlation. We develop the first comic dataset for the task of comic genre classification with multi-genre labels. Experiments show that our approach outperforms state-of-the-art methods on related tasks. We also validate the extensibility of our network in the multi-modal scenario. Finally, we show the practicability of our approach by giving effective genre prediction results for whole comic books.
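As a hedged illustration of the overall layout (panel features contextualized by self-attention, then a multi-label genre head), here is a minimal PyTorch sketch. Panel detection and the paper's interdependent classifiers are omitted, and all dimensions and the class name are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's implementation).
import torch
import torch.nn as nn

class PanelPageGenreNet(nn.Module):
    def __init__(self, feat_dim=512, num_genres=10, num_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(feat_dim, num_genres)

    def forward(self, panel_feats):             # (batch, num_panels, feat_dim)
        ctx = self.encoder(panel_feats)          # panel-page self-attention
        book = ctx.mean(dim=1)                   # pool panels into a book-level vector
        return torch.sigmoid(self.head(book))    # class-wise genre probabilities

probs = PanelPageGenreNet()(torch.randn(2, 24, 512))   # 2 books, 24 panels each
```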

4.
IEEE Trans Vis Comput Graph ; 29(9): 4001-4014, 2023 Sep.
Article in English | MEDLINE | ID: mdl-35552137

ABSTRACT

Occluding effects have been frequently used to present weather conditions and environments in cartoon animations, such as rain, snow, moving leaves, and moving petals. While these effects greatly enrich the visual appeal of cartoon animations, they may also cause undesired occlusions on the content area, which significantly complicate the analysis and processing of the animations. In this article, we make the first attempt to separate the occluding effects and content for cartoon animations. The major challenge of this problem is that, unlike natural effects, which are realistic and small-sized, the effects in cartoons are usually stylistic and large-sized. Besides, effects in cartoons are manually drawn, so their motions are more unpredictable than those of realistic effects. To separate occluding effects and content for cartoon animations, we propose to leverage the difference in the motion patterns of the effects and the content, and capture the locations of the effects with a multi-scale flow-based effect prediction (MFEP) module. A dual-task learning system is designed to extract the effect video and reconstruct the effect-removed content video at the same time. We apply our method to a large number of cartoon videos with different content and effects. Experiments show that our method significantly outperforms existing methods. We further demonstrate how the separated effects and content facilitate the analysis and processing of cartoon videos through different applications, including segmentation, inpainting, and effect migration.
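The following is only an illustrative sketch of the dual-task idea, not the authors' MFEP design: a shared encoder feeds two heads, one predicting the occluding-effect layer and one the effect-removed content, with the obvious constraint that the two layers should re-compose into the input frame. Channel counts and the composition loss are assumptions.

```python
# Minimal sketch of a dual-task effect/content separator (illustrative only).
import torch
import torch.nn as nn

class DualTaskSeparator(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU())
        self.effect_head = nn.Conv2d(ch, 3, 3, padding=1)    # occluding effects (rain, petals, ...)
        self.content_head = nn.Conv2d(ch, 3, 3, padding=1)   # effect-removed content

    def forward(self, frame):
        h = self.encoder(frame)
        return self.effect_head(h), self.content_head(h)

frame = torch.rand(1, 3, 64, 64)
effect, content = DualTaskSeparator()(frame)
recompose_loss = (effect + content - frame).abs().mean()     # composition consistency
```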

5.
Article in English | MEDLINE | ID: mdl-36343000

ABSTRACT

Photorealistic multiview face synthesis from a single image is a challenging problem. Existing works mainly learn a texture mapping model from the source to the target faces. However, they rarely consider the geometric constraints on the internal deformation arising from pose variations, which causes a high level of uncertainty in face pose modeling and, hence, produces inferior results for large pose variations. Moreover, current methods typically suffer from an undesired loss of facial details due to the adoption of the de-facto standard encoder-decoder architecture without any skip connections (SCs). In this article, we directly learn and exploit geometric constraints and propose a fully deformable network to simultaneously model the deformations of both landmarks and faces for face synthesis. Specifically, our model consists of two parts: a deformable landmark learning network (DLLN) and a gated deformable face synthesis network (GDFSN). The DLLN converts an initial reference landmark to an individual-specific target landmark as delicate pose guidance for face rotation. The GDFSN adopts a dual-stream structure, with one stream estimating the deformation between two views in the form of convolution offsets according to the source pose and the converted target pose, and the other leveraging the predicted deformation offsets to create the target face. In this way, individual-aware pose changes are explicitly modeled in the face generator to cope with geometric transformations by adaptively focusing on pertinent regions of the source face. To compensate for offset estimation errors, we introduce a soft-gating mechanism for adaptive fusion between deformable features and primitive features. Additionally, a pose-aligned SC (PASC) is tailored to propagate low-level input features to the appropriate positions in the output features, further enhancing the facial details and identity preservation. Extensive experiments on six benchmarks show that our approach performs favorably against the state-of-the-art methods, especially with large pose changes. Code is available at https://github.com/cschengxu/FDFace.
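A minimal sketch of the kind of soft-gating fusion described above (an illustrative stand-in, not the released FDFace code): a learned per-pixel gate blends deformation-warped features with the primitive (unwarped) features so that offset-estimation errors can be down-weighted locally.

```python
# Minimal sketch of soft-gated feature fusion (assumed layer sizes).
import torch
import torch.nn as nn

class SoftGateFusion(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(2 * channels, channels, 3, padding=1), nn.Sigmoid())

    def forward(self, deformed, primitive):
        g = self.gate(torch.cat([deformed, primitive], dim=1))  # per-pixel gate in [0, 1]
        return g * deformed + (1.0 - g) * primitive

fused = SoftGateFusion()(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
```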

6.
Article in English | MEDLINE | ID: mdl-37015410

ABSTRACT

Converting a human portrait to anime style is a desirable but challenging problem. Existing methods fail to resolve this problem due to the large inherent gap between the two domains, which cannot be overcome by a simple direct mapping. For this reason, these methods struggle to preserve the appearance features of the original photo. In this paper, we discover an intermediate domain, the coser portrait (portraits of people cosplaying as anime characters), that helps bridge this gap. It alleviates the learning ambiguity and reduces the mapping difficulty in a progressive manner. Specifically, we start by learning the mapping between coser and anime portraits, and present a proxy-guided domain adaptation learning scheme with three progressive adaptation stages to shift the initial model to the human portrait domain. In this way, our model can generate visually pleasant anime portraits with well-preserved appearances from a given human portrait. Our model adopts a disentangled design by breaking down the translation problem into two specific subtasks of face deformation and portrait stylization. This further elevates the generation quality. Extensive experimental results show that our model achieves visually compelling translation with better appearance preservation and performs favorably against existing methods both qualitatively and quantitatively. Our code and datasets are available at https://github.com/NeverGiveU/PDA-Translation.

7.
IEEE Trans Image Process ; 30: 6700, 2021.
Article in English | MEDLINE | ID: mdl-34339368

ABSTRACT

In the above article [1], unfortunately, Fig. 5 was not displayed correctly with many empty images. The correct version is supplemented here.

8.
IEEE Trans Image Process ; 30: 6024-6035, 2021.
Article in English | MEDLINE | ID: mdl-34181543

ABSTRACT

Existing GAN-based multi-view face synthesis methods rely heavily on "creating" faces, and thus they struggle to reproduce faithful facial texture and fail to preserve identity when undergoing a large-angle rotation. In this paper, we combat this problem by dividing the challenging large-angle face synthesis into a series of easy small-angle rotations, each of which is guided by a face flow to maintain faithful facial details. In particular, we propose a Face Flow-guided Generative Adversarial Network (FFlowGAN) that is specifically trained for small-angle synthesis. The proposed network consists of two modules: a face flow module that computes a dense correspondence between the input and target faces, and a face synthesis module that uses this correspondence as strong guidance to emphasize salient facial texture. We apply FFlowGAN multiple times to progressively synthesize different views, so that facial features can be propagated to the target view from the very beginning. All these executions are cascaded and trained end-to-end with a unified back-propagation, ensuring that each intermediate step contributes to the final result. Extensive experiments demonstrate that the proposed divide-and-conquer strategy is effective, and our method outperforms the state-of-the-art on four benchmark datasets both qualitatively and quantitatively.
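As a hedged sketch of the correspondence-guided step each small-angle stage relies on, the code below warps a source image with a predicted dense flow field via grid sampling. The flow network itself and the GAN training are omitted; the function name and zero-flow test are illustrative.

```python
# Minimal sketch: flow-guided warping with torch.nn.functional.grid_sample.
import torch
import torch.nn.functional as F

def warp_with_flow(image, flow):
    """image: (B, C, H, W); flow: (B, 2, H, W) pixel offsets (dx, dy)."""
    b, _, h, w = image.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
    grid = torch.stack([xs, ys], dim=0).float().unsqueeze(0) + flow      # absolute coords
    grid_x = grid[:, 0] / (w - 1) * 2 - 1                                # normalize to [-1, 1]
    grid_y = grid[:, 1] / (h - 1) * 2 - 1
    return F.grid_sample(image, torch.stack([grid_x, grid_y], dim=-1), align_corners=True)

warped = warp_with_flow(torch.rand(1, 3, 64, 64), torch.zeros(1, 2, 64, 64))  # zero flow = identity
```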

9.
IEEE Trans Vis Comput Graph ; 27(1): 178-189, 2021 Jan.
Article in English | MEDLINE | ID: mdl-31352345

ABSTRACT

Deep learning has recently been demonstrated to be an effective tool for raster-based sketch simplification. Nevertheless, it remains challenging to simplify extremely rough sketches. We found that a simplification network trained with a simple loss, such as a pixel loss or a discriminator loss, may fail to retain the semantically meaningful details when simplifying a very sketchy and complicated drawing. In this paper, we show that, with a well-designed multi-layer perceptual loss, we are able to obtain aesthetic and neat simplification results that preserve semantically important global structures as well as fine details, without blurriness or excessive emphasis on local structures. To do so, we design a multi-layer discriminator by fusing all VGG feature layers to differentiate sketches from clean lines. The weights used in layer fusing are automatically learned via an intelligent adjustment mechanism. Furthermore, we evaluate our method against state-of-the-art methods through multiple experiments, including visual comparisons and an intensive user study.
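The sketch below is a hedged illustration of a multi-layer perceptual loss with learnable, softmax-normalized fusion weights over several VGG feature maps; it is not the paper's training code, and the chosen layer indices and L1 distance are assumptions.

```python
# Minimal sketch: learnable fusion of VGG feature-layer losses.
import torch
import torch.nn as nn
import torchvision.models as models

class MultiLayerPerceptualLoss(nn.Module):
    def __init__(self, layer_ids=(3, 8, 15, 22)):
        super().__init__()
        vgg = models.vgg16(weights=None).features.eval()   # pretrained weights omitted for brevity
        for p in vgg.parameters():
            p.requires_grad_(False)
        self.vgg = vgg
        self.layer_ids = set(layer_ids)
        self.layer_weights = nn.Parameter(torch.zeros(len(layer_ids)))  # learned fusing weights

    def forward(self, pred, target):
        losses, x, y = [], pred, target
        for i, layer in enumerate(self.vgg):
            x, y = layer(x), layer(y)
            if i in self.layer_ids:
                losses.append((x - y).abs().mean())
            if i >= max(self.layer_ids):
                break
        w = torch.softmax(self.layer_weights, dim=0)        # balance global structure vs. detail
        return (w * torch.stack(losses)).sum()

loss = MultiLayerPerceptualLoss()(torch.rand(1, 3, 128, 128), torch.rand(1, 3, 128, 128))
```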

10.
IEEE Trans Neural Netw Learn Syst ; 32(8): 3761-3769, 2021 Aug.
Article in English | MEDLINE | ID: mdl-32822308

ABSTRACT

With the explosive growth of action categories, zero-shot action recognition aims to extend a well-trained model to novel/unseen classes. To bridge the large knowledge gap between seen and unseen classes, in this brief, we visually associate unseen actions with seen categories in a visually connected graph, and the knowledge is then transferred from the visual feature space to the semantic space via grouped attention graph convolutional networks (GAGCNs). In particular, we extract visual features for all the actions, and a visually connected graph is built to attach seen actions to visually similar unseen categories. Moreover, the proposed grouped attention mechanism exploits the hierarchical knowledge in the graph so that the GAGCN can propagate the visual-semantic connections from seen actions to unseen ones. We extensively evaluate the proposed method on three data sets: HMDB51, UCF101, and NTU RGB+D. Experimental results show that the GAGCN outperforms state-of-the-art methods.
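For background, a minimal graph-convolution step over a visually connected graph is sketched below: features propagate from seen action nodes to visually connected unseen ones through a symmetrically normalized adjacency matrix. The grouped attention of the GAGCN is not reproduced, and all sizes are toy values.

```python
# Minimal sketch: one GCN propagation step over a visual-similarity graph.
import torch

def gcn_layer(features, adj, weight):
    """features: (N, F), adj: (N, N) visually connected graph, weight: (F, F')."""
    adj_hat = adj + torch.eye(adj.size(0))                    # add self-loops
    deg_inv_sqrt = adj_hat.sum(dim=1).rsqrt().diag()
    norm_adj = deg_inv_sqrt @ adj_hat @ deg_inv_sqrt          # symmetric normalization
    return torch.relu(norm_adj @ features @ weight)

x = torch.randn(6, 16)                   # 6 action nodes (seen + unseen), 16-d visual features
a = (torch.rand(6, 6) > 0.5).float()
a = torch.maximum(a, a.t())              # toy symmetric visual-similarity adjacency
out = gcn_layer(x, a, torch.randn(16, 8))
```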

11.
IEEE Trans Vis Comput Graph ; 24(5): 1705-1716, 2018 05.
Article in English | MEDLINE | ID: mdl-28436877

ABSTRACT

Most graphics hardware features memory to store textures and vertex data for rendering. However, because of the irreversible trend of increasing scene complexity, rendering a scene can easily reach the limit of memory resources. Thus, vertex data are preferably compressed, with the requirement that they can be decompressed during rendering. In this paper, we present a novel method that exploits existing hardware texture compression circuits to facilitate the decompression of vertex data in graphics processing units (GPUs). This built-in hardware allows real-time, random-order decoding of data. However, vertex data must be packed into textures, and careless packing arrangements can easily disrupt data coherence. Hence, we propose an optimization approach that finds the vertex data permutation minimizing the compression error. All of this results in fast and high-quality vertex data decompression for real-time rendering. To further improve the visual quality, we introduce vertex clustering to reduce the dynamic range of the data during quantization. Our experiments demonstrate the effectiveness of our method on various vertex data of 3D models during rendering, with the advantages of a minimized memory footprint and a high frame rate.
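The following NumPy sketch illustrates, under stated assumptions, why clustering helps quantization: each cluster of vertices is quantized against its own bounding box, which reduces the per-cluster dynamic range and hence the quantization error. It is not the paper's codec, and the toy clustering along x is purely illustrative.

```python
# Minimal sketch: per-cluster bounding-box quantization of vertex positions.
import numpy as np

def quantize_per_cluster(vertices, labels, bits=8):
    """vertices: (N, 3) float positions; labels: (N,) cluster id per vertex."""
    levels = (1 << bits) - 1
    quantized = np.empty_like(vertices)
    for c in np.unique(labels):
        idx = labels == c
        lo, hi = vertices[idx].min(axis=0), vertices[idx].max(axis=0)
        scale = np.where(hi > lo, hi - lo, 1.0)
        q = np.round((vertices[idx] - lo) / scale * levels)   # integer codes
        quantized[idx] = q / levels * scale + lo              # dequantized positions
    return quantized

verts = np.random.rand(1000, 3).astype(np.float32)
labels = (verts[:, 0] * 4).astype(int)                        # toy clustering along x
error = np.abs(quantize_per_cluster(verts, labels) - verts).max()
```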

12.
IEEE Trans Vis Comput Graph ; 24(8): 2440-2455, 2018 08.
Article in English | MEDLINE | ID: mdl-28650819

ABSTRACT

Superfluidity is a special state of matter exhibiting macroscopic quantum phenomena and acting like a fluid with zero viscosity. In such a state, superfluid vortices exist as phase singularities of the model equation with unique distributions. This paper presents novel techniques to aid the visual understanding of superfluid vortices based on the state-of-the-art nonlinear Klein-Gordon equation, which evolves a complex scalar field, giving rise to special vortex lattice/ring structures with dynamic vortex formation, reconnection, and Kelvin waves. By formulating a numerical model with theoretical physicists in superfluid research, we obtain high-quality superfluid flow data sets without noise-like waves, suitable for vortex visualization. By further exploring superfluid vortex properties, we develop a new vortex identification and visualization method: a novel mechanism based on velocity circulation to overcome phase singularity and an orthogonal-plane strategy to avoid ambiguity. Hence, our visualizations can help reveal various superfluid vortex structures and enable domain experts to perform related visual analyses, such as of the steady vortex lattice/ring structures and the dynamic vortex string interactions with reconnections and energy radiation, where the famous Kelvin waves and decaying vortex tangles were clearly observed. These visualizations have assisted physicists in verifying the superfluid model and exploring its dynamic behavior more intuitively.
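A hedged illustration of the circulation idea only (not the paper's full orthogonal-plane method): phase singularities of a complex scalar field can be located by summing wrapped phase differences around each 2x2 grid plaquette; a non-zero winding of about +/-2*pi marks a vortex core.

```python
# Minimal sketch: locate phase singularities via winding of the phase field.
import numpy as np

def vortex_winding(psi):
    """psi: 2D complex field; returns the integer winding number per plaquette."""
    phase = np.angle(psi)
    def wrap(d):                                   # wrap differences into (-pi, pi]
        return (d + np.pi) % (2 * np.pi) - np.pi
    d1 = wrap(phase[:-1, 1:] - phase[:-1, :-1])    # bottom edge
    d2 = wrap(phase[1:, 1:] - phase[:-1, 1:])      # right edge
    d3 = wrap(phase[1:, :-1] - phase[1:, 1:])      # top edge (reversed)
    d4 = wrap(phase[:-1, :-1] - phase[1:, :-1])    # left edge (reversed)
    return np.rint((d1 + d2 + d3 + d4) / (2 * np.pi)).astype(int)

x, y = np.meshgrid(np.linspace(-1, 1, 64), np.linspace(-1, 1, 64))
psi = x + 1j * y                                    # a single central vortex
cores = np.argwhere(vortex_winding(psi) != 0)
```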

13.
IEEE Trans Vis Comput Graph ; 23(5): 1479-1491, 2017 05.
Article in English | MEDLINE | ID: mdl-26915127

ABSTRACT

Traditional methods in graphics for simulating liquid-air dynamics under different scenarios usually employ separate approaches with sophisticated interface tracking/reconstruction techniques. In this paper, we propose a novel unified approach that easily and effectively produces a variety of liquid-air interface phenomena. These phenomena, such as complex surface splashes, bubble interactions, and surface tension effects, can co-exist in a single simulation and are created within the same computational framework. Such a framework is unique in that it is free from any complicated interface tracking/reconstruction procedures. Our approach is developed from the two-phase lattice Boltzmann method with the mean-field model, which provides a unified framework for interface dynamics but is numerically unstable under turbulent conditions. Considering the drawbacks of the existing approaches, we propose techniques to suppress oscillations for significant stability enhancement, and derive a new subgrid-scale model to further improve stability, faithfully preserving liquid-air interface details without excessive diffusion by taking the density variation into account. The whole framework is highly parallel, enabling a very efficient implementation. Comparisons with related approaches show the superiority of our method in producing stable simulations that preserve detail while involving multiple phases simultaneously. A set of animation results demonstrates the effectiveness of our method.
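For readers unfamiliar with the underlying numerics, here is a background sketch of one collide-and-stream step of a standard single-phase D2Q9 BGK lattice Boltzmann method, the kind of framework the two-phase mean-field model above builds on. It is not the paper's two-phase solver; relaxation time, grid size, and initialization are arbitrary.

```python
# Minimal sketch: single-phase D2Q9 BGK lattice Boltzmann step (background only).
import numpy as np

c = np.array([[0,0],[1,0],[0,1],[-1,0],[0,-1],[1,1],[-1,1],[-1,-1],[1,-1]])  # lattice velocities
w = np.array([4/9] + [1/9]*4 + [1/36]*4)                                     # lattice weights

def equilibrium(rho, u):
    cu = np.einsum('qd,dxy->qxy', c, u)                      # c_q . u
    usq = np.einsum('dxy,dxy->xy', u, u)
    return rho * w[:, None, None] * (1 + 3*cu + 4.5*cu**2 - 1.5*usq)

def lbm_step(f, tau=0.6):
    rho = f.sum(axis=0)
    u = np.einsum('qd,qxy->dxy', c, f) / rho
    f = f + (equilibrium(rho, u) - f) / tau                  # BGK collision
    for q in range(9):                                       # periodic streaming
        f[q] = np.roll(f[q], shift=tuple(c[q]), axis=(0, 1))
    return f

f = equilibrium(np.ones((32, 32)), 0.05 * np.random.randn(2, 32, 32))
for _ in range(10):
    f = lbm_step(f)
```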

14.
IEEE Trans Vis Comput Graph ; 23(8): 1910-1923, 2017 08.
Article in English | MEDLINE | ID: mdl-27323365

ABSTRACT

While ASCII art is a popular art form worldwide, automatically generating structure-based ASCII art from natural photographs remains challenging. The major challenge lies in extracting the perception-sensitive structure from the natural photographs so that a more concise ASCII art reproduction can be produced based on that structure. However, due to the excessive amount of texture in natural photos, extracting perception-sensitive structure is not easy, especially when the structure may be weak and lie within a texture region. Besides, to fit different target text resolutions, the amount of extracted structure should also be controllable. To tackle these challenges, we introduce a visual perception mechanism of non-classical receptive field modulation (non-CRF modulation) from physiological findings to this ASCII art application, and propose a new model of non-CRF modulation that can better separate weak structure from crowded texture and better control the scale of texture suppression. Thanks to our non-CRF model, more sensible ASCII art reproductions can be obtained. In addition, to produce more visually appealing ASCII art, we propose a novel optimization scheme to obtain the optimal placement of proportional-font characters. We apply our method to a rich variety of images, and visually appealing ASCII art is obtained in all cases.
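As a toy stand-in (not the paper's optimization), the sketch below shows the basic structure-matching step: an edge map is tiled into cells and each cell is replaced by the character whose toy glyph mask has the lowest mean squared difference. The paper's non-CRF modulation and proportional-font placement are far more elaborate; the glyph set here is a hand-made assumption.

```python
# Minimal sketch: fixed-width, structure-matching ASCII conversion of an edge map.
import numpy as np

CELL = 8
def glyph(ch):
    g = np.zeros((CELL, CELL))
    if ch == '|': g[:, CELL // 2] = 1
    elif ch == '-': g[CELL // 2, :] = 1
    elif ch == '/': g[np.arange(CELL)[::-1], np.arange(CELL)] = 1
    elif ch == '\\': g[np.arange(CELL), np.arange(CELL)] = 1
    return g                                             # ' ' stays empty

CHARS = [' ', '|', '-', '/', '\\']
GLYPHS = np.stack([glyph(c) for c in CHARS])

def to_ascii(edges):
    """edges: binary edge map with dimensions divisible by CELL."""
    rows = []
    for y in range(0, edges.shape[0], CELL):
        row = ''
        for x in range(0, edges.shape[1], CELL):
            cell = edges[y:y + CELL, x:x + CELL]
            row += CHARS[int(np.argmin(((GLYPHS - cell) ** 2).mean(axis=(1, 2))))]
        rows.append(row)
    return '\n'.join(rows)

print(to_ascii(np.eye(32)))                              # a diagonal line becomes '\' characters
```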

15.
Sensors (Basel) ; 16(4)2016 Apr 11.
Article in English | MEDLINE | ID: mdl-27077855

ABSTRACT

Estimation of the exterior orientation parameters (EOPs) using space resection plays an important role in topographic reconstruction for push broom scanners. However, existing models of space resection are highly sensitive to errors in the data. Unfortunately, for lunar imagery, the altitude data at the ground control points (GCPs) used for space resection are error-prone, so existing models fail to produce reliable EOPs. Motivated by the finding that, for push broom scanners, the angular rotations of the EOPs can be estimated independently of the altitude data, using only the geographic data at the GCPs, which are already provided, we divide the modeling of space resection into two phases. First, we estimate the angular rotations from the reliable geographic data using our proposed mathematical model. Then, with accurate angular rotations, the collinearity equations for space resection simplify into a linear problem, and the globally optimal solution for the spatial position of the EOPs can always be achieved. Moreover, a certainty term is integrated to penalize unreliable altitude data, increasing the error tolerance. Experimental results show that our model obtains more accurate EOPs and topographic maps than the existing space resection model, not only for simulated data but also for real data from Chang'E-1.
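To make the "linear second phase" concrete, here is a hedged sketch under a simple pinhole setup (not the paper's push-broom formulation): once the rotation matrix R is fixed, each GCP contributes two collinearity equations that are linear in the camera position S = (Xs, Ys, Zs), so S follows from ordinary least squares. Variable names are illustrative.

```python
# Minimal sketch: solve the camera position from collinearity once R is known.
import numpy as np

def solve_position(R, f, image_xy, ground_xyz):
    """R: 3x3 rotation, f: focal length, image_xy: (N,2), ground_xyz: (N,3)."""
    rows, rhs = [], []
    for (x, y), G in zip(image_xy, ground_xyz):
        a = x * R[2] + f * R[0]        # x-collinearity coefficients
        b = y * R[2] + f * R[1]        # y-collinearity coefficients
        rows += [a, b]
        rhs += [a @ G, b @ G]
    S, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return S

# Toy check: synthesize observations from a known pose and recover it.
R, f, S_true = np.eye(3), 1.0, np.array([0.0, 0.0, 10.0])
G = np.random.rand(6, 3) * 4
proj = (R @ (G - S_true).T).T
xy = -f * proj[:, :2] / proj[:, 2:3]
print(solve_position(R, f, xy, G))     # ~ [0, 0, 10]
```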

16.
Sensors (Basel) ; 15(2): 4326-52, 2015 Feb 12.
Article in English | MEDLINE | ID: mdl-25686317

ABSTRACT

We propose a novel biometric recognition method that identifies the inner knuckle print (IKP). It is robust to uncontrolled lighting conditions, pose variations, and low imaging quality. Such robustness is crucial for application on portable devices equipped with consumer-level cameras. We achieve this robustness by two means. First, we propose a novel feature extraction scheme that highlights the salient structure and suppresses incorrect and/or unwanted features. The extracted IKP features retain simple geometry and morphology and reduce the interference of illumination. Second, to counteract the deformation induced by different hand orientations, we propose a novel structure-context descriptor based on local statistics. To the best of our knowledge, we are the first to simultaneously consider illumination invariance and deformation tolerance for appearance-based low-resolution hand biometrics. Previous works operate under more restrictive settings, making strong assumptions about either the illumination conditions or the hand orientation. Extensive experiments demonstrate that our method outperforms state-of-the-art methods in terms of recognition accuracy, especially under uncontrolled lighting conditions and flexible hand orientations.
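The sketch below assumes a shape-context-style descriptor built from local statistics of extracted feature points; it is offered only to illustrate the general idea of a deformation-tolerant structure-context descriptor and is not the paper's exact formulation.

```python
# Minimal sketch: log-polar histogram descriptor over extracted feature points.
import numpy as np

def structure_context(points, n_r=4, n_theta=8):
    """points: (N, 2) coordinates of extracted IKP line features.
    Returns one log-polar histogram per point, normalized to sum to 1."""
    descs = []
    for p in points:
        rel = points - p
        d = np.hypot(rel[:, 0], rel[:, 1])
        keep = d > 0                                     # drop the point itself
        r_bin = np.clip(np.log1p(d[keep]) / np.log1p(d.max()) * n_r, 0, n_r - 1).astype(int)
        t_bin = ((np.arctan2(rel[keep, 1], rel[keep, 0]) + np.pi) / (2 * np.pi) * n_theta).astype(int) % n_theta
        hist = np.zeros((n_r, n_theta))
        np.add.at(hist, (r_bin, t_bin), 1)               # accumulate local statistics
        descs.append(hist.ravel() / max(keep.sum(), 1))
    return np.array(descs)

descs = structure_context(np.random.rand(50, 2) * 100)
```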
