Results 1 - 20 of 50
1.
Article in English | MEDLINE | ID: mdl-38502621

ABSTRACT

Cartoon animation is a popular form of visual entertainment worldwide; however, many classic animations were produced in a 4:3 aspect ratio that is incompatible with modern widescreen displays. Existing methods such as cropping lead to information loss, while retargeting causes distortion. Animation companies still rely on manual labor to renovate classic cartoon animations, which is tedious and labor-intensive but can yield higher-quality videos. Conventional extrapolation or inpainting methods tailored for natural videos struggle with cartoon animations because of the lack of texture in anime, which hampers motion estimation of the objects. In this paper, we propose a novel framework that automatically outpaints 4:3 anime to 16:9 via region-guided motion inference. Our core idea is to identify motion correspondences between frames within a sequence in order to reconstruct missing pixels. First, we estimate optical flow guided by region information to address the challenges posed by exaggerated movements and solid-color regions in cartoon animations. Next, frames are stitched to produce a pre-filled guide frame, offering structural clues for the extension of the optical flow maps. Finally, a voting and fusion scheme uses learned fusion weights to blend the aligned neighboring reference frames, yielding the final outpainted frame. Extensive experiments confirm the superiority of our approach over existing methods.
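As a rough illustration of the final voting-and-fusion step, the following Python sketch blends flow-aligned reference frames with per-pixel fusion weights and writes the result only into the missing border region. The function name, the softmax normalization of the weights, and all tensor shapes are assumptions made for illustration, not the paper's implementation.

```python
import numpy as np

def fuse_outpainting_frame(aligned_refs, weight_logits, missing_mask, guide):
    """Blend flow-aligned reference frames into the missing border region.

    aligned_refs : list of K arrays, each (H, W, 3), references warped to the
                   current frame by the (region-guided) optical flow.
    weight_logits: (K, H, W) per-pixel fusion scores (in the paper these would
                   be predicted by a network; here they are simply inputs).
    missing_mask : (H, W) bool, True where the widened frame has no content.
    guide        : (H, W, 3) the pre-filled guide frame.
    """
    refs = np.stack(aligned_refs, axis=0)                   # (K, H, W, 3)
    w = np.exp(weight_logits - weight_logits.max(axis=0))   # stable softmax
    w = w / w.sum(axis=0, keepdims=True)                    # (K, H, W)
    blended = (w[..., None] * refs).sum(axis=0)              # (H, W, 3)
    out = guide.copy()
    out[missing_mask] = blended[missing_mask]                 # fill only unknown pixels
    return out
```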

2.
Article in English | MEDLINE | ID: mdl-38261497

ABSTRACT

Colorizing anime line drawings is essential in animation creation but is usually a tedious and time-consuming manual task. Reference-based line drawing colorization provides an intuitive way to automatically colorize target line drawings using reference images. The prevailing approaches are based on generative adversarial networks (GANs), yet these methods still cannot generate high-quality results comparable to manually colored ones. In this paper, a new approach, AnimeDiffusion, is proposed for the automatic colorization of anime face line drawings via a hybrid diffusion scheme. This is the first attempt to utilize a diffusion model for reference-based colorization, which demands a high level of control over the image synthesis process. To this end, a hybrid end-to-end training strategy is designed, comprising phase 1, which trains the diffusion model with classifier-free guidance, and phase 2, which efficiently updates the color tone using a target reference color image. The model learns denoising and structure-capturing ability in phase 1 and more accurate color information in phase 2. With this hybrid training strategy, network convergence is accelerated and colorization performance is improved. AnimeDiffusion generates colorization results with semantic correspondence and color consistency. In addition, the model generalizes to some degree to line drawings with different line styles. To train and evaluate colorization methods, an anime face line drawing colorization benchmark dataset, containing 31,696 training images and 579 test images, is introduced and shared. Extensive experiments and user studies demonstrate that the proposed AnimeDiffusion outperforms state-of-the-art GAN-based methods and another diffusion-based model, both quantitatively and qualitatively.
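To make the two-phase idea concrete, here is a minimal PyTorch-style sketch. The tiny conditional denoiser, the classifier-free-guidance dropout, and especially the assumed form of the phase-2 color-tone update (reconstructing the reference-colored target from a lightly noised copy) are illustrative assumptions; the paper's actual architecture and losses are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDenoiser(nn.Module):
    """Stand-in denoiser: predicts noise from a noisy color image concatenated
    with the line drawing (1 channel) and the reference image (3 channels)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + 1 + 3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )
    def forward(self, x_t, line, ref):
        return self.net(torch.cat([x_t, line, ref], dim=1))

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def phase1_step(model, color, line, ref, p_drop=0.1):
    """Phase 1: standard noise-prediction loss; the reference conditioning is
    randomly dropped in the style of classifier-free guidance."""
    b = color.size(0)
    t = torch.randint(0, T, (b,))
    a = alphas_bar[t].view(b, 1, 1, 1)
    noise = torch.randn_like(color)
    x_t = a.sqrt() * color + (1 - a).sqrt() * noise
    if torch.rand(()) < p_drop:          # drop conditioning for CFG
        ref = torch.zeros_like(ref)
    return F.mse_loss(model(x_t, line, ref), noise)

def phase2_step(model, line, ref, t_small=50):
    """Phase 2 (assumed form): refine color tone by reconstructing the
    reference-colored target from a lightly noised version of itself."""
    b = ref.size(0)
    a = alphas_bar[torch.full((b,), t_small)].view(b, 1, 1, 1)
    noise = torch.randn_like(ref)
    x_t = a.sqrt() * ref + (1 - a).sqrt() * noise
    pred_noise = model(x_t, line, ref)
    x0_hat = (x_t - (1 - a).sqrt() * pred_noise) / a.sqrt()
    return F.l1_loss(x0_hat, ref)
```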

3.
IEEE Trans Cybern ; 54(5): 3299-3312, 2024 May.
Article in English | MEDLINE | ID: mdl-37471181

ABSTRACT

Automatic kidney and tumor segmentation from CT volumes is a critical prerequisite for diagnosis and surgical treatment (such as partial nephrectomy). However, it remains particularly challenging because kidneys and tumors often exhibit large scale variations, irregular shapes, and blurred boundaries. We propose a novel 3-D network, which we call 3DSN-Net, to comprehensively tackle these problems. Compared with existing solutions, it has two compelling characteristics. First, with a new scale-aware feature extraction (SAFE) module, 3DSN-Net is capable of adaptively selecting appropriate receptive fields according to the sizes of targets instead of indiscriminately enlarging them, which is particularly essential for improving segmentation accuracy for tumors with large scale variation. Second, we propose a novel yet efficient nonlocal context guidance (NCG) mechanism to capture global dependencies and thus handle the irregular shapes and blurred boundaries of kidneys and tumors. Instead of directly harnessing a 3-D NCG mechanism, which would cause the number of parameters to increase dramatically and make the network difficult to train with limited training data, we develop a 2.5D NCG mechanism based on projections of feature cubes, which achieves a tradeoff between segmentation accuracy and network complexity. We extensively evaluate 3DSN-Net on the widely used KiTS dataset, which contains many challenging kidney and tumor cases. Experimental results demonstrate that our solution, equipped with the scale-aware and NCG mechanisms, consistently outperforms state-of-the-art 3-D networks, particularly for tumor segmentation.


Subject(s)
Kidney; Neoplasms; Humans; Kidney/diagnostic imaging; Tomography, X-Ray Computed; Image Processing, Computer-Assisted
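For readers who want a concrete picture of a 2.5D nonlocal mechanism built on projections of a feature cube, the following PyTorch sketch mean-pools a 3-D feature volume onto its three orthogonal planes, runs a standard 2D non-local (self-attention) block on each plane, and broadcasts the responses back. The pooling and fusion choices are assumptions for illustration; this is not the exact 3DSN-Net module.

```python
import torch
import torch.nn as nn

class NonLocal2D(nn.Module):
    """Standard 2D non-local block: dot-product attention over spatial positions."""
    def __init__(self, c):
        super().__init__()
        self.theta = nn.Conv2d(c, c // 2, 1)
        self.phi = nn.Conv2d(c, c // 2, 1)
        self.g = nn.Conv2d(c, c // 2, 1)
        self.out = nn.Conv2d(c // 2, c, 1)
    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)    # (b, hw, c/2)
        k = self.phi(x).flatten(2)                       # (b, c/2, hw)
        v = self.g(x).flatten(2).transpose(1, 2)         # (b, hw, c/2)
        attn = torch.softmax(q @ k / (c // 2) ** 0.5, dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(b, c // 2, h, w)
        return x + self.out(y)

class NCG25D(nn.Module):
    """Sketch of 2.5D nonlocal context guidance: project the 3-D feature cube
    (by mean pooling) onto the three orthogonal planes, apply a 2D non-local
    block to each, then broadcast the responses back and add them to the cube."""
    def __init__(self, c):
        super().__init__()
        self.nl_xy, self.nl_xz, self.nl_yz = NonLocal2D(c), NonLocal2D(c), NonLocal2D(c)
    def forward(self, x):                  # x: (b, c, D, H, W)
        xy = self.nl_xy(x.mean(dim=2))     # (b, c, H, W)
        xz = self.nl_xz(x.mean(dim=3))     # (b, c, D, W)
        yz = self.nl_yz(x.mean(dim=4))     # (b, c, D, H)
        return x + xy.unsqueeze(2) + xz.unsqueeze(3) + yz.unsqueeze(4)
```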
4.
Article in English | MEDLINE | ID: mdl-37883263

ABSTRACT

Video plays an important role in computer graphics applications. Because of the heterogeneity of digital devices, video retargeting has become an essential function for enhancing the user viewing experience in such applications. In video retargeting research, preserving the relevant visual content, avoiding flickering, and keeping processing time low are the vital challenges. Extending image retargeting techniques to the video domain is difficult due to the high running time. Prior work on video retargeting mainly relies on time-consuming preprocessing to analyze frames. In addition, tolerance to different video content, preventing important objects from shrinking, and the ability to handle arbitrary ratios are limitations of these systems that still require investigation. In this paper, we present RETVI, an end-to-end method for retargeting videos to arbitrary aspect ratios. We eliminate the computational bottleneck of conventional approaches by designing RETVI with two modules, a content feature analyzer (CFA) and an adaptive deforming estimator (ADE). Extensive experiments and evaluations show that our system outperforms previous work in both quality and running time.

5.
Article in English | MEDLINE | ID: mdl-37307186

ABSTRACT

As the metaverse develops rapidly, 3D facial age transformation is attracting increasing attention and may bring many potential benefits to a wide variety of users, e.g., the creation of 3D aging figures, and 3D facial data augmentation and editing. Compared with 2D methods, 3D face aging is an underexplored problem. To fill this gap, we propose a new mesh-to-mesh Wasserstein generative adversarial network (MeshWGAN) with a multi-task gradient penalty to model a continuous, bi-directional 3D facial geometric aging process. To the best of our knowledge, this is the first architecture to achieve 3D facial geometric age transformation via real 3D scans. Because previous image-to-image translation methods cannot be directly applied to 3D facial meshes, which differ fundamentally from 2D images, we built a mesh encoder, decoder, and multi-task discriminator to facilitate mesh-to-mesh transformation. To mitigate the lack of 3D datasets containing children's faces, we collected scans from 765 subjects aged 5-17 and combined them with existing 3D face databases, yielding a large training dataset. Experiments show that our architecture can predict 3D facial aging geometries with better identity preservation and age closeness than trivial 3D baselines. We also demonstrate the advantages of our approach via various 3D face-related graphics applications. Our project will be publicly available at: https://github.com/Easy-Shu/MeshWGAN.
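For orientation, the sketch below shows the generic WGAN critic loss with gradient penalty that architectures of this kind build on. The critic signature, the stand-in tensors for mesh vertices, and the handling of the age label are assumptions; the paper's multi-task gradient penalty is not reproduced.

```python
import torch

def wgan_gp_critic_loss(critic, real, fake, age_label, lambda_gp=10.0):
    """Generic WGAN-GP critic loss. `real`/`fake` stand in for flattened mesh
    vertex tensors; the critic is assumed to also take a target age label
    (the multi-task aspect of the paper is omitted here)."""
    d_real = critic(real, age_label).mean()
    d_fake = critic(fake, age_label).mean()

    # Gradient penalty on random interpolates between real and fake samples.
    eps = torch.rand(real.size(0), *([1] * (real.dim() - 1)), device=real.device)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    d_interp = critic(interp, age_label)
    grads = torch.autograd.grad(d_interp.sum(), interp, create_graph=True)[0]
    gp = ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()

    return d_fake - d_real + lambda_gp * gp
```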

6.
Article in English | MEDLINE | ID: mdl-37030778

ABSTRACT

Image collage is a very useful tool for visualizing an image collection. Most existing methods and commercial applications for generating image collages are designed for simple shapes, such as rectangular and circular layouts, which greatly limits the use of image collages in artistic and creative settings. Although some methods can generate irregularly shaped image collages, they often suffer from severe image overlap and excessive blank space, which prevents them from serving as effective information communication tools. In this paper, we present a shape slicing algorithm and an optimization scheme that create image collages of arbitrary shapes in an informative and visually pleasing manner, given an input shape and an image collection. To overcome the challenge of irregular shapes, we propose a novel algorithm, called Shape-Aware Slicing, which partitions the input shape into cells based on the medial axis and a binary slicing tree. Shape-Aware Slicing, designed specifically for irregular shapes, takes human perception and shape structure into account to generate visually pleasing partitions. The layout is then optimized by analyzing the input images with the goal of maximizing the total salient regions of the images. To evaluate our method, we conduct extensive experiments and compare our results against previous work. The evaluations show that our algorithm can efficiently arrange image collections on irregular shapes and creates visually superior results compared with prior work and existing commercial tools.
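The recursion behind a binary slicing tree can be illustrated with a toy example. The sketch below splits an axis-aligned rectangle into one cell per weight, alternating cut direction at each level; the paper's Shape-Aware Slicing instead operates on irregular shapes via the medial axis, so this rectangle version only conveys the tree structure, and all names are hypothetical.

```python
def slice_region(x, y, w, h, weights, vertical=True):
    """Minimal binary-slicing-tree sketch: recursively split a rectangle into
    one cell per weight, each cell's area proportional to its weight."""
    if len(weights) == 1:
        return [(x, y, w, h)]
    k = len(weights) // 2
    left, right = sum(weights[:k]), sum(weights[k:])
    frac = left / (left + right)
    if vertical:   # vertical cut here, alternate direction at the next level
        return (slice_region(x, y, w * frac, h, weights[:k], False) +
                slice_region(x + w * frac, y, w * (1 - frac), h, weights[k:], False))
    else:
        return (slice_region(x, y, w, h * frac, weights[:k], True) +
                slice_region(x, y + h * frac, w, h * (1 - frac), weights[k:], True))

# Example: five images whose saliency scores determine their cell areas.
cells = slice_region(0, 0, 1.0, 1.0, [3, 1, 2, 2, 4])
```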

7.
Article in English | MEDLINE | ID: mdl-37021849

ABSTRACT

While video has long been a widespread form of visualization, the animation sequences within a video serve as a means of storytelling. Producing an animation requires intensive labor from skilled professional artists to obtain results that are plausible in both content and motion direction, especially for animations with complex content, multiple moving objects, and dense movement. This paper presents an interactive framework that generates new sequences according to the user's preferred starting frame. The critical difference between our approach and prior work or existing commercial applications is that our system produces novel sequences, starting from an arbitrary frame, that remain consistent in both content and motion direction. To achieve this, we first learn the feature correlations across the frame set of the given video through a proposed network called RSFNet. We then develop a novel path-finding algorithm, SDPF, which exploits the motion directions of the source video to estimate smooth and plausible sequences. Extensive experiments show that our framework can produce new animations for both cartoon and natural scenes and advances beyond prior work and commercial applications by enabling users to obtain more predictable results.

8.
IEEE Trans Vis Comput Graph ; 29(2): 1330-1344, 2023 Feb.
Article in English | MEDLINE | ID: mdl-34529567

ABSTRACT

Grid collages (GClg) of small image collections are popular and useful in many applications, such as personal album management, online photo posting, and graphic design. In this article, we focus on how visual effects influence individual preferences through various arrangements of multiple images in such scenarios. A novel balance-aware metric is proposed to bridge the gap between multi-image joint presentation and visual pleasure. The metric brings psychological findings into the field of grid collage. To capture user preference, a bonus mechanism related to a user-specified special location in the grid and the uniqueness values of the sub-images is integrated into the metric. An end-to-end reinforcement learning mechanism empowers the model without tedious manual annotations. Experiments demonstrate that our metric can evaluate GClg visual balance in line with human subjective perception, and the model can generate visually pleasant GClg results comparable to manual designs.

9.
IEEE Trans Vis Comput Graph ; 28(6): 2517-2529, 2022 Jun.
Article in English | MEDLINE | ID: mdl-33085618

ABSTRACT

Hypnotic line art is a modern art form in which narrow white curved ribbons, whose width and direction vary along each path over a black background, convey a keen sense of 3D objects' surface shapes and topological contours. However, manually creating such line art can be quite tedious and time-consuming. In this article, we present an interactive system that offers a What-You-See-Is-What-You-Get (WYSIWYG) scheme for producing hypnotic line art images by integrating and placing evenly spaced streamlines in tensor fields. Once the input picture has been segmented, the user only needs to sketch a few illustrative strokes to guide the construction of a tensor field for each part of the objects therein. Specifically, we propose a new method that controls, with great precision, the aesthetic layout and artistic drawing of an array of streamlines in each tensor field to emulate the style of hypnotic line art. Given several streamline parameters such as density, thickness, and sharpness, our system is capable of generating professional-level hypnotic line art. With great ease of use, it allows art designers to explore a wide variety of possibilities and obtain hypnotic line art results matching their own preferences.
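To give a flavor of streamline integration, the following Python sketch traces a single streamline through a 2D direction field with midpoint (RK2) steps. The evenly-spaced placement used in the paper would additionally reject points that come closer than a separation distance to existing lines; the example field and all names are hypothetical.

```python
import numpy as np

def trace_streamline(field, seed, step=0.5, max_steps=400):
    """Integrate one streamline: field(p) returns a unit direction at point p."""
    pts = [np.asarray(seed, dtype=float)]
    for _ in range(max_steps):
        p = pts[-1]
        d1 = field(p)
        d2 = field(p + 0.5 * step * d1)       # midpoint (RK2) direction
        pts.append(p + step * d2)
    return np.array(pts)

# Example field: circular flow around the origin, standing in for the tensor
# field constructed from the user's guiding strokes.
def circular(p):
    d = np.array([-p[1], p[0]])
    n = np.linalg.norm(d)
    return d / n if n > 1e-8 else np.array([1.0, 0.0])

line = trace_streamline(circular, seed=(1.0, 0.0))
```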

10.
IEEE Trans Vis Comput Graph ; 28(8): 2895-2908, 2022 Aug.
Article in English | MEDLINE | ID: mdl-33259303

ABSTRACT

Color design for 3D indoor scenes is a challenging problem because many factors need to be balanced. Although learning from images is a commonly adopted strategy, it may be better suited to natural scenes, in which objects tend to have relatively fixed colors. For interior scenes consisting mostly of man-made objects, creative yet reasonable color assignments are expected. We propose C3 Assignment, a system that provides diverse suggestions for interior color design while satisfying general global and local rules, including color compatibility, color mood, contrast, and user preference. We extend these constraints from the image domain to 3D and formulate 3D interior color design as an optimization problem. The design is carried out in an omnidirectional manner to ensure a comfortable experience when the inhabitant observes the interior scene from all possible positions and directions. We design a surrogate-assisted evolutionary algorithm to efficiently solve this highly nonlinear optimization problem for interactive applications, and we investigate the system's performance with respect to problem complexity, solver convergence, and suggestion diversity. Preliminary user studies have been conducted to validate the rule extension from 2D to 3D and to verify the system's usability.

11.
IEEE Trans Image Process ; 30: 6142-6155, 2021.
Article in English | MEDLINE | ID: mdl-34214036

ABSTRACT

Recently, Convolutional Neural Networks (CNNs) have achieved great improvements in blind image motion deblurring. However, most existing image deblurring methods require a large amount of paired training data and fail to maintain satisfactory structural information, which greatly limits their application scope. In this paper, we present an unsupervised image deblurring method based on a multi-adversarial optimized cycle-consistent generative adversarial network (CycleGAN). Although the original CycleGAN can handle unpaired training data well, the generated high-resolution images are prone to losing content and structural information. To solve this problem, we utilize a multi-adversarial mechanism based on CycleGAN for blind motion deblurring to generate high-resolution images iteratively. In this multi-adversarial manner, the hidden layers of the generator are gradually supervised, and implicit refinement is carried out to generate high-resolution images continuously. Meanwhile, we introduce a structure-aware mechanism that enhances the structure and detail retention ability of the multi-adversarial network by taking the edge map as guidance information and adding multi-scale edge constraint functions. Our approach not only avoids the strict need for paired training data and the errors caused by blur kernel estimation, but also better maintains structural information through multi-adversarial learning and the structure-aware mechanism. Comprehensive experiments on several benchmarks show that our approach outperforms state-of-the-art methods for blind image motion deblurring.
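A plausible form of a multi-scale edge constraint is sketched below in PyTorch: edge maps (here simple Sobel magnitudes) of two images are compared at several downsampling factors. Which image pairs the constraint is applied to in the paper's unpaired setting is not specified here; the function names and the choice of Sobel edges are assumptions.

```python
import torch
import torch.nn.functional as F

def sobel_edges(img):
    """Per-channel Sobel gradient magnitude, a simple stand-in for an edge map."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
    ky = kx.t()
    c = img.size(1)
    kx = kx.view(1, 1, 3, 3).repeat(c, 1, 1, 1).to(img)
    ky = ky.view(1, 1, 3, 3).repeat(c, 1, 1, 1).to(img)
    gx = F.conv2d(img, kx, padding=1, groups=c)
    gy = F.conv2d(img, ky, padding=1, groups=c)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

def multiscale_edge_loss(pred, guide, scales=(1, 2, 4)):
    """Multi-scale edge constraint: compare edge maps of the generated image
    and its guidance image at several scales."""
    loss = 0.0
    for s in scales:
        p = F.avg_pool2d(pred, s) if s > 1 else pred
        g = F.avg_pool2d(guide, s) if s > 1 else guide
        loss = loss + F.l1_loss(sobel_edges(p), sobel_edges(g))
    return loss / len(scales)
```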

12.
IEEE Trans Biomed Eng ; 68(8): 2540-2551, 2021 08.
Article in English | MEDLINE | ID: mdl-33417536

ABSTRACT

Visual understanding of the liver vessel anatomy of a living donor-recipient (LDR) pair can help surgeons optimize transplant planning by avoiding non-targeted arteries, which can cause severe complications. We propose to visually analyze the anatomical variants of the liver vessels to maximize similarity when searching for a suitable LDR pair. Liver vessels are segmented from computed tomography angiography (CTA) volumes by employing a cascade incremental learning (CIL) model. Our CIL architecture is able to find optimal solutions, which we use to update the model with liver vessel CTA images. A novel ternary-tree-based algorithm is proposed to map all possible liver vessel variants into their respective tree topologies. The tree topologies of the recipient's and donor's liver vessels are then used for appropriate matching. The proposed algorithm utilizes a set of predefined vessel tree variants, which are updated to maintain the maximum number of matching options by leveraging the accurate vessel segmentation results derived from the incremental learning ability of the CIL. We introduce a novel in-order digital string comparison to match the geometry of two anatomically varied trees. Experiments through visual illustrations and quantitative analysis demonstrate the effectiveness of our approach compared with the state of the art.


Subject(s)
Liver Transplantation; Living Donors; Angiography; Humans; Liver/diagnostic imaging; Tomography, X-Ray Computed
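The idea of comparing tree topologies via an in-order digital string can be sketched as follows. The node labeling, the treatment of absent children, and the crude position-wise similarity score are assumptions for illustration; the paper's matching procedure is more involved.

```python
class VesselNode:
    """Node of a ternary vessel tree: up to three child branches, each node
    labeled by a single character code (e.g. a vessel-segment type)."""
    def __init__(self, code, children=()):
        self.code = code
        self.children = list(children)   # at most 3 entries (may include None)

def inorder_string(node):
    """Serialize a ternary tree into an in-order digital string: left subtree,
    node, middle subtree, right subtree; '.' marks an absent child so that
    distinct topologies map to distinct strings."""
    if node is None:
        return "."
    kids = (node.children + [None, None, None])[:3]
    return (inorder_string(kids[0]) + node.code +
            inorder_string(kids[1]) + inorder_string(kids[2]))

def tree_similarity(a, b):
    """Crude score: fraction of positions at which the two strings agree."""
    sa, sb = inorder_string(a), inorder_string(b)
    n = max(len(sa), len(sb))
    matches = sum(x == y for x, y in zip(sa.ljust(n, "."), sb.ljust(n, ".")))
    return matches / n

donor = VesselNode("H", [VesselNode("L"), VesselNode("R")])
recipient = VesselNode("H", [VesselNode("L"), None, VesselNode("R")])
print(tree_similarity(donor, recipient))
```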
13.
IEEE Trans Vis Comput Graph ; 27(4): 2298-2312, 2021 04.
Article in English | MEDLINE | ID: mdl-31647438

ABSTRACT

With the surge of images in the information era, people demand an effective and accurate way to access meaningful visual information; accordingly, effective and accurate communication of information has become indispensable. In this article, we propose a content-based approach that automatically generates a clear and informative visual summarization of an image collection based on design principles and cognitive psychology. We first introduce a novel method for making representative and non-redundant summarizations of image collections, thereby ensuring data cleanliness and emphasizing important information. Then, we propose a tree-based algorithm with a two-step optimization strategy to generate the final layout, which operates as follows: (1) an initial layout is created by constructing a tree randomly based on the grouping results of the input image set; (2) the layout is refined through a coarse adjustment in a greedy manner, followed by gradient back-propagation, drawing on the training procedure of neural networks. We demonstrate the usefulness and effectiveness of our method via extensive experimental results and user studies. Our visual summarization algorithm captures the main content of image collections more precisely and efficiently than alternative methods and commercial tools.
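The second refinement step, gradient back-propagation over layout parameters, can be illustrated with a toy example: here, split fractions of a fixed left-deep slicing tree are optimized so that leaf areas match target proportions. The tree structure, the sigmoid parameterization, and the area-matching objective are all illustrative assumptions rather than the paper's actual layout model.

```python
import torch

def leaf_areas(splits):
    """Areas of the leaves of a left-deep slicing tree: at node i, a fraction
    sigmoid(splits[i]) of the remaining area is cut off as a leaf."""
    areas, remaining = [], torch.tensor(1.0)
    for s in splits:
        f = torch.sigmoid(s)
        areas.append(remaining * f)
        remaining = remaining * (1 - f)
    areas.append(remaining)
    return torch.stack(areas)

target = torch.tensor([0.4, 0.3, 0.2, 0.1])   # desired (e.g. saliency-driven) areas
splits = torch.zeros(3, requires_grad=True)    # coarse (greedy) initialization
opt = torch.optim.Adam([splits], lr=0.05)
for _ in range(300):                            # gradient refinement step
    opt.zero_grad()
    loss = ((leaf_areas(splits) - target) ** 2).sum()
    loss.backward()
    opt.step()
```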

14.
IEEE Trans Vis Comput Graph ; 26(8): 2546-2559, 2020 Aug.
Article in English | MEDLINE | ID: mdl-30676963

ABSTRACT

Depth of field (DOF) is widely used to deliver artistic effects in photography. However, existing post-processing techniques for rendering DOF effects introduce visual artifacts such as color leakage, blurring discontinuity, and partial occlusion problems, which limit the application of DOF. Traditionally, occluded pixels are ignored or poorly estimated, although they may make key contributions to the image. In this paper, we propose a new filtering approach that takes approximated occluded pixels into account to synthesize DOF effects for images. In our approach, images are separated into different layers based on depth. In addition, we utilize an adaptive PatchMatch method to estimate the intensities of occluded pixels, especially in the background region. We further propose a new multilayer-neighborhood optimization to estimate the contributions of occluded pixels and render the images. Finally, we apply a gathering filter to obtain rendered images with high-quality DOF effects. Multiple experiments show that our approach can handle color leakage, blurring discontinuity, and the partial occlusion problem while providing high-quality DOF rendering.
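A much-simplified layered DOF pass is sketched below: the image is split into depth layers, each layer is blurred with a radius that grows with its distance from the focal depth, and layers are composited far to near. The occluded-pixel estimation (PatchMatch) and the multilayer-neighborhood optimization from the paper are omitted; all parameters and names are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def layered_dof(image, depth, focus, n_layers=6, max_sigma=6.0):
    """Toy layered DOF: image (H,W,3) in [0,1], depth (H,W), focus in depth units."""
    bins = np.linspace(depth.min(), depth.max() + 1e-6, n_layers + 1)
    out = np.zeros_like(image, dtype=float)
    # far-to-near so nearer layers overwrite farther ones
    for i in range(n_layers - 1, -1, -1):
        mask = ((depth >= bins[i]) & (depth < bins[i + 1])).astype(float)[..., None]
        mid = 0.5 * (bins[i] + bins[i + 1])
        sigma = max_sigma * abs(mid - focus) / (depth.max() - depth.min() + 1e-6)
        layer = gaussian_filter(image * mask, sigma=(sigma, sigma, 0))
        weight = gaussian_filter(mask, sigma=(sigma, sigma, 0))
        out = np.where(weight > 1e-4, layer / np.maximum(weight, 1e-4), out)
    return np.clip(out, 0, 1)
```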

15.
IEEE Trans Vis Comput Graph ; 26(2): 1332-1346, 2020 Feb.
Article in English | MEDLINE | ID: mdl-30207961

ABSTRACT

Decomposing an image into shading and reflectance layers remains challenging due to the severely under-constrained nature of the problem. We present an approach based on illumination decomposition that recovers the intrinsic images without additional information, e.g., depth or user interaction. Our approach is based on the rationale that the shading component contains step and drift channels simultaneously. We decompose the illumination into two channels: the step shading, corresponding to sharp shading changes due to cast shadows or abrupt shape changes, and the drift shading, accounting for smooth shading variations due to gradual illumination changes or slow shape changes. Because this transformation turns the conventional assumption that shading is smooth into a reasonable prior, our model has advantages in handling real images, especially those with cast shadows or strong shape edges. We also apply a much stricter edge classifier along with a reinforcement process to enhance our method. We formulate the problem using a two-parameter energy function and split it into two energy functions corresponding to the reflectance and the step shading. Experiments on the MIT dataset, the IIW dataset, and the MPI Sintel dataset show the success of our approach over state-of-the-art methods.
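The flavor of such an energy can be conveyed with a toy formulation in which log I = log R + step + drift, the drift channel is encouraged to be smooth, the step channel to change sparsely, and the reflectance to be piecewise constant. The specific terms and weights below are illustrative assumptions; the paper's two-parameter energy and its split into reflectance and step-shading subproblems are more elaborate.

```python
import numpy as np

def grad_mag(x):
    """Simple forward-difference gradient magnitude (L1 of x/y differences)."""
    gx = np.diff(x, axis=1, append=x[:, -1:])
    gy = np.diff(x, axis=0, append=x[-1:, :])
    return np.abs(gx) + np.abs(gy)

def decomposition_energy(log_I, log_R, step, drift,
                         w_drift=1.0, w_step=0.1, w_refl=0.5):
    """Toy intrinsic-decomposition energy with a two-channel shading model."""
    data = np.sum((log_I - (log_R + step + drift)) ** 2)   # reconstruction
    drift_smooth = np.sum(grad_mag(drift) ** 2)            # drift varies slowly
    step_sparse = np.sum(grad_mag(step))                   # few sharp shading jumps
    refl_pc = np.sum(grad_mag(log_R))                      # piecewise-constant reflectance
    return data + w_drift * drift_smooth + w_step * step_sparse + w_refl * refl_pc
```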

16.
IEEE Trans Vis Comput Graph ; 25(9): 2763-2776, 2019 09.
Article in English | MEDLINE | ID: mdl-30047889

ABSTRACT

Relief is an art form partway between 3D sculpture and 2D painting. We present a novel approach for generating a texture-mapped high-relief model from a single brush painting. Our aim is to extract the brushstrokes from a painting and generate individual corresponding relief proxies, rather than recovering the exact depth map of the painting, which is a tricky computer vision problem requiring assumptions that are rarely satisfied. The relief proxies of the brushstrokes are then combined to form a 2.5D high-relief model. To extract brushstrokes from 2D paintings, we apply layer decomposition and stroke segmentation with imposed boundary constraints. The segmented brushstrokes preserve the style of the input painting. Through inflation and a displacement map for each brushstroke, the features of the brushstrokes are preserved in the resulting high-relief model of the painting. We demonstrate that our approach is able to produce convincing high reliefs from a variety of paintings (with humans, animals, flowers, etc.). As a secondary application, we show how our brushstroke extraction algorithm can be used for image editing. Note that our brushstroke extraction algorithm is specifically geared towards paintings in which each brushstroke is drawn very purposefully, such as Chinese paintings, Rosemaling paintings, etc.

17.
IEEE Trans Vis Comput Graph ; 24(2): 1114-1126, 2018 02.
Article in English | MEDLINE | ID: mdl-28129179

ABSTRACT

Stylizing a 3D model with characteristic shapes or appearances is common in product design, particularly in the design of 3D model merchandise, such as souvenirs, toys, furniture, and other stylized items. In this study, we propose a model stylization approach that combines a base model and a style model while preserving user-specified shape features of the base model and the attractive features of the style model, with limited assistance from the user. The two models are first combined at the topological level, using a tree-growing technique to search for all possible combinations of the two models. Second, the models are combined at the textural and geometric levels by employing a morphing technique. Results show that the proposed approach generates a variety of appealing models and allows users to control the diversity of the output models and adjust the degree of blending between the base and style models. The results are also experimentally compared with those of a recent work through a user study, which indicates that our results are more appealing, feature-preserving, and reasonable than those of the previous study. The proposed system allows product designers to easily explore design possibilities and assists novice users in creating their own stylized models.

18.
IEEE Trans Vis Comput Graph ; 24(2): 1103-1113, 2018 02.
Article in English | MEDLINE | ID: mdl-28141524

ABSTRACT

Escher transmutation is a graphic art that smoothly transforms one tile pattern into another tile pattern with dual perception. A classic example is the artwork called Sky and Water, in which a compelling figure-ground arrangement is applied to portray the transmutation of a bird in sky and a fish in water. The shape of a bird is progressively deformed and dissolves into the background while the background gradually reveals the shape of a fish. This paper introduces a system to create a variety of Escher-like transmutations, which includes the algorithms for initializing a tile pattern with dual figure-ground arrangement, for searching for the best matched shape of a user-specified motif from a database, and for transforming the content and shapes of tile patterns using a content-aware warping technique. The proposed system, integrating the graphic techniques of tile initialization, shape matching, and shape warping, allows users to create various Escher-like transmutations with minimal user interaction. Experimental results and conducted user studies demonstrate the feasibility and flexibility of the proposed system in Escher art generation.

19.
IEEE Trans Vis Comput Graph ; 23(5): 1534-1545, 2017 05.
Article in English | MEDLINE | ID: mdl-26930686

ABSTRACT

Ambiguous figure-ground images, mostly represented as binary images, are fascinating because they present viewers with the visual phenomenon of perceiving multiple interpretations of a single image. In one possible interpretation, the white region is seen as a foreground figure while the black region is treated as shapeless background; this perception can reverse instantly at any moment. In this paper, we investigate the theory behind this ambiguous perception and present an automatic algorithm to generate such images. We model the problem as a binary image composition using two object contours and approach it through a three-stage pipeline. The algorithm first performs partial shape matching to find a good partial contour match between the objects. This matching is based on a content-aware shape matching metric that captures features of ambiguous figure-ground images. We then combine the matched contours into a compound contour using adaptive contour deformation, followed by computing an optimal cropping window and image binarization for the compound contour that maximize the completeness of the object contours in the final composition. We have tested our system on a wide range of input objects and generated a large number of convincing examples with or without user guidance. The efficiency of our system and the quality of the results are verified through an extensive experimental study.
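A brute-force version of partial contour matching is sketched below: a fixed-length window is slid over two resampled closed contours and the pair of segments with the smallest aligned distance is returned. The L2 segment distance stands in for the paper's content-aware metric, which additionally scores figure-ground ambiguity cues; all names and the example shapes are hypothetical.

```python
import numpy as np

def best_partial_match(contour_a, contour_b, window=40):
    """Return (distance, start index in A, start index in B) of the best
    matching window between two closed contours given as (N, 2) point arrays."""
    def segment(c, i):
        idx = (np.arange(window) + i) % len(c)
        s = c[idx].astype(float)
        return s - s.mean(axis=0)                 # translation-invariant

    best = (np.inf, 0, 0)
    for i in range(len(contour_a)):
        sa = segment(contour_a, i)
        for j in range(len(contour_b)):
            d = np.linalg.norm(sa - segment(contour_b, j))
            if d < best[0]:
                best = (d, i, j)
    return best

# Example: match a rounded square's contour against a circle's (16 samples each).
t = np.linspace(0, 2 * np.pi, 16, endpoint=False)
circle = np.stack([np.cos(t), np.sin(t)], axis=1)
square = np.stack([np.clip(1.3 * np.cos(t), -1, 1), np.clip(1.3 * np.sin(t), -1, 1)], axis=1)
print(best_partial_match(square, circle, window=6))
```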

20.
IEEE Trans Vis Comput Graph ; 23(7): 1796-1808, 2017 07.
Article in English | MEDLINE | ID: mdl-27254869

ABSTRACT

We introduce an interactive, user-driven method to reconstruct high-relief 3D geometry from a single photo. In particular, we consider two novel but challenging reconstruction issues: i) common non-rigid objects whose shapes are organic rather than polyhedral or symmetric, and ii) double-sided structures, where the front and back sides of some curvy object parts are revealed simultaneously in the image. To address these issues, we develop a three-stage computational pipeline. First, we construct a 2.5D model from the input image by user-driven segmentation, automatic layering, and region completion, handling three common types of occlusion. Second, users can interactively mark up slope and curvature cues on the image to guide our constrained optimization model in inflating and lifting up the image layers; we provide a real-time preview of the inflated geometry to allow interactive editing. Third, we stitch and optimize the inflated layers to produce a high-relief 3D model. Compared with previous work, we can generate high-relief geometry with large viewing angles, handle complex organic objects with multiple occluded regions and varying shape profiles, and reconstruct objects with double-sided structures. Lastly, we demonstrate the applicability of our method on a wide variety of input images with humans, animals, flowers, etc.
