Show: 20 | 50 | 100
Results 1 - 20 of 48
1.
Article in English | MEDLINE | ID: mdl-38354074

ABSTRACT

Creating a vivid video from an event or scenario in our imagination is a truly fascinating experience. Recent advancements in text-to-video synthesis have unveiled the potential to achieve this with prompts only. While text is convenient for conveying the overall scene context, it may be insufficient for precise control. In this paper, we explore customized video generation by utilizing text as context description and motion structure (e.g. frame-wise depth) as concrete guidance. Our method, dubbed Make-Your-Video, involves joint-conditional video generation using a Latent Diffusion Model that is pre-trained for still image synthesis and then promoted for video generation with the introduction of temporal modules. This two-stage learning scheme not only reduces the computing resources required, but also improves performance by transferring the rich concepts available in image datasets into video generation. Moreover, we use a simple yet effective causal attention mask strategy to enable longer video synthesis, which effectively mitigates potential quality degradation. Experimental results show the superiority of our method over existing baselines, particularly in terms of temporal coherence and fidelity to users' guidance. In addition, our model enables several intriguing applications that demonstrate potential for practical usage. The code, model weights, and videos are publicly available at our project page: https://doubiiu.github.io/projects/Make-Your-Video/.
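The abstract names a causal attention mask for longer video synthesis but does not spell it out; as a hedged illustration (the function name and framing are mine, not the paper's), a causal mask over frames — each frame attends only to itself and earlier frames — can be built like this:

```python
import numpy as np

def causal_temporal_mask(num_frames):
    # True where attention is allowed: frame i may attend to frames j <= i,
    # so information never flows backward in time.
    return np.tril(np.ones((num_frames, num_frames), dtype=bool))

mask = causal_temporal_mask(4)
```

The same lower-triangular pattern is the standard way to keep autoregressive generation consistent when the clip is extended beyond its training length.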

2.
IEEE Trans Image Process ; 32: 4259-4274, 2023.
Article in English | MEDLINE | ID: mdl-37486835

ABSTRACT

Conventional social media platforms usually downscale high-resolution (HR) images to restrict their resolution to a specific size to save transmission/storage costs, which makes those visual details inaccessible to other users. To bypass this obstacle, recent invertible image downscaling methods jointly model the downscaling/upscaling problems and achieve impressive performance. However, they only consider fixed integer scale factors and may be inapplicable to the generic resolution-restriction downscaling tasks posed by social media platforms. In this paper, we propose an effective and universal Scale-Arbitrary Invertible Image Downscaling Network (AIDN), to downscale HR images with arbitrary scale factors in an invertible manner. Particularly, the HR information is embedded in the downscaled low-resolution (LR) counterparts in a nearly imperceptible form such that our AIDN can further restore the original HR images solely from the LR images. The key to supporting arbitrary scale factors is our proposed Conditional Resampling Module (CRM), which conditions the downscaling/upscaling kernels and sampling locations on both the scale factors and image content. Extensive experimental results demonstrate that our AIDN achieves top performance for invertible downscaling with both arbitrary integer and non-integer scale factors. Also, both quantitative and qualitative evaluations show our AIDN is robust to lossy image compression standards. The source code and trained models are publicly available at https://github.com/Doubiiu/AIDN.
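The CRM itself is learned, but the core geometric ingredient — mapping output pixels to generally non-integer source locations under an arbitrary scale factor — can be sketched as follows (names and the centre-aligned convention are illustrative, not taken from the paper):

```python
import numpy as np

def source_coords(out_len, scale):
    # Centre-aligned mapping from output pixel index to fractional source
    # coordinate. Non-integer scales yield non-integer sampling locations,
    # which is why the resampling kernels must be conditioned on the scale.
    return (np.arange(out_len) + 0.5) * scale - 0.5

# Downscale by 2.5x: 4 output pixels sample a 10-pixel source signal.
coords = source_coords(4, 2.5)
```

A fixed-integer-scale method never has to handle the fractional coordinates this mapping produces, which is exactly the gap AIDN targets.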

3.
Article in English | MEDLINE | ID: mdl-37220038

ABSTRACT

Traditional halftoning usually drops colors when dithering images with binary dots, which makes it difficult to recover the original color information. We propose a novel halftoning technique that converts a color image into a binary halftone from which the original version can be fully restored. Our base halftoning technique consists of two convolutional neural networks (CNNs) to produce the reversible halftone patterns, and a noise incentive block (NIB) to mitigate the flatness degradation issue of CNNs. Furthermore, to tackle the conflict between blue-noise quality and restoration accuracy in our base method, we propose a predictor-embedded approach to offload predictable information from the network, which in our case is the luminance information resembling the halftone pattern. This approach allows the network more flexibility to produce halftones with better blue-noise quality without compromising the restoration quality. Detailed studies on the multiple-stage training method and loss weightings have been conducted. We compare our predictor-embedded method with our base method regarding spectrum analysis of the halftones, halftone accuracy, restoration accuracy, and data embedding. Our entropy evaluation shows that our halftones contain less encoding information than those of the base method. The experiments show our predictor-embedded method gains more flexibility to improve the blue-noise quality of halftones while maintaining comparable restoration quality with a higher tolerance for disturbances.

4.
IEEE Trans Vis Comput Graph ; 29(7): 3226-3237, 2023 Jul.
Article in English | MEDLINE | ID: mdl-35239483

ABSTRACT

This work presents an innovative method for point set self-embedding, which encodes the structural information of a dense point set into its sparser version in a visual but imperceptible form. The self-embedded point set can function as the ordinary downsampled one and be visualized efficiently on mobile devices. Particularly, we can leverage the self-embedded information to fully restore the original point set for detailed analysis on remote servers. This task is challenging, since both the self-embedded point set and the restored point set should resemble the original one. To achieve a learnable self-embedding scheme, we design a novel framework with two jointly-trained networks: one to encode the input point set into its self-embedded sparse point set and the other to leverage the embedded information to invert it back to the original point set. Further, we develop a pair of up-shuffle and down-shuffle units in the two networks, and formulate loss terms to encourage shape similarity and a good point distribution in the results. Extensive qualitative and quantitative results demonstrate the effectiveness of our method on both synthetic and real-scanned datasets. The source code and trained models will be publicly available at https://github.com/liruihui/Self-Embedding.

5.
Article in English | MEDLINE | ID: mdl-36350869

ABSTRACT

Light fields are 4D scene representations that are typically structured as arrays of views or several directional samples per pixel in a single view. However, this highly correlated structure is not very efficient to transmit and manipulate, especially for editing. To tackle this issue, we propose a novel representation learning framework that can encode the light field into a single meta-view that is both compact and editable. Specifically, the meta-view is composed of three visual channels and a complementary meta channel that is embedded with geometric and residual appearance information. The visual channels can be edited using existing 2D image editing tools, before reconstructing the whole edited light field. To facilitate edit propagation against occlusion, we design a special editing-aware decoding network that consistently propagates the visual edits to the whole light field upon reconstruction. Extensive experiments show that our proposed method achieves competitive representation accuracy while enabling consistent edit propagation.

6.
IEEE Trans Vis Comput Graph ; 27(1): 178-189, 2021 Jan.
Article in English | MEDLINE | ID: mdl-31352345

ABSTRACT

Deep learning has recently been demonstrated as an effective tool for raster-based sketch simplification. Nevertheless, it remains challenging to simplify extremely rough sketches. We found that a simplification network trained with a simple loss, such as pixel loss or discriminator loss, may fail to retain the semantically meaningful details when simplifying a very sketchy and complicated drawing. In this paper, we show that, with a well-designed multi-layer perceptual loss, we are able to obtain aesthetic and neat simplification results preserving semantically important global structures as well as fine details without blurriness and excessive emphasis on local structures. To do so, we design a multi-layer discriminator by fusing all VGG feature layers to differentiate sketches and clean lines. The weights used in layer fusing are automatically learned via an intelligent adjustment mechanism. Furthermore, to evaluate our method, we compare it to state-of-the-art methods through multiple experiments, including visual comparison and an intensive user study.
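The fused multi-layer loss the abstract describes amounts to a weighted sum of per-layer feature distances; a minimal numpy caricature of that structure (the feature arrays stand in for VGG activations, and in the paper the per-layer weights are learned rather than fixed):

```python
import numpy as np

def multilayer_loss(feats_a, feats_b, layer_weights):
    # Weighted sum of per-layer mean-squared feature differences. Fusing
    # several layers lets the loss see both global structure (deep layers)
    # and fine detail (shallow layers) at once.
    return sum(w * float(np.mean((fa - fb) ** 2))
               for w, fa, fb in zip(layer_weights, feats_a, feats_b))

loss = multilayer_loss([np.ones((2, 2)), np.zeros(3)],
                       [np.zeros((2, 2)), np.zeros(3)],
                       layer_weights=[2.0, 1.0])
```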

7.
IEEE Trans Pattern Anal Mach Intell ; 43(12): 4491-4504, 2021 Dec.
Article in English | MEDLINE | ID: mdl-32750783

ABSTRACT

Unlike images, finding the desired video content in a large pool of videos is not easy due to the time cost of loading and watching. Most video streaming and sharing services provide the video preview function for a better browsing experience. In this paper, we aim to generate a video preview from a single image. To this end, we propose two cascaded networks, the motion embedding network and the motion expansion network. The motion embedding network aims to embed the spatio-temporal information into an embedded image, called video snapshot. On the other end, the motion expansion network is proposed to invert the video back from the input video snapshot. To hold the invertibility of motion embedding and expansion during training, we design four tailor-made losses and a motion attention module to make the network focus on the temporal information. In order to enhance the viewing experience, our expansion network involves an interpolation module to produce a longer video preview with a smooth transition. Extensive experiments demonstrate that our method can successfully embed the spatio-temporal information of a video into one "live" image, which can be converted back to a video preview. Quantitative and qualitative evaluations are conducted on a large number of videos to prove the effectiveness of our proposed method. In particular, statistics of PSNR and SSIM on a large number of videos show the proposed method is general, and it can generate a high-quality video from a single image.

8.
IEEE Trans Vis Comput Graph ; 24(5): 1705-1716, 2018 05.
Article in English | MEDLINE | ID: mdl-28436877

ABSTRACT

Most graphics hardware features memory to store textures and vertex data for rendering. However, because of the irreversible trend of increasing scene complexity, rendering a scene can easily reach the limit of memory resources. Thus, vertex data are preferably compressed, with the requirement that they can be decompressed during rendering. In this paper, we present a novel method to exploit existing hardware texture compression circuits to facilitate the decompression of vertex data in graphics processing units (GPUs). This built-in hardware allows real-time, random-order decoding of data. However, vertex data must be packed into textures, and careless packing arrangements can easily disrupt data coherence. Hence, we propose an optimization approach for the best vertex data permutation that minimizes compression error. All of these result in fast and high-quality vertex data decompression for real-time rendering. To further improve the visual quality, we introduce vertex clustering to reduce the dynamic range of data during quantization. Our experiments demonstrate the effectiveness of our method for various vertex data of 3D models during rendering, with the advantages of a minimized memory footprint and a high frame rate.
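Why vertex clustering helps quantization can be sketched directly: quantizing each cluster against its own value range shrinks the quantization step. This is a hedged illustration of the principle with uniform quantization (the paper's actual scheme and codec details may differ):

```python
import numpy as np

def quantize_per_cluster(values, labels, bits=8):
    # Quantise each cluster against its own [min, max] range; a narrower
    # per-cluster range means a smaller quantisation step and less error
    # than quantising against the global dynamic range.
    out = np.empty_like(values, dtype=float)
    levels = 2 ** bits - 1
    for c in np.unique(labels):
        m = labels == c
        lo, hi = values[m].min(), values[m].max()
        span = (hi - lo) or 1.0
        q = np.round((values[m] - lo) / span * levels)
        out[m] = q / levels * span + lo
    return out

# Two tight clusters far apart: even 2-bit codes reconstruct them exactly.
vals = np.array([0.0, 1.0, 100.0, 101.0])
recon = quantize_per_cluster(vals, np.array([0, 0, 1, 1]), bits=2)
```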

9.
IEEE Trans Vis Comput Graph ; 24(7): 2103-2117, 2018 07.
Article in English | MEDLINE | ID: mdl-28534776

ABSTRACT

Shading is a tedious process for artists involved in 2D cartoon and manga production, given the volume of content that artists have to prepare regularly on tight schedules. While we can automate shading production with the presence of geometry, it is impractical for artists to model the geometry for every single drawing. In this work, we aim to automate shading generation by analyzing the local shapes, connections, and spatial arrangement of wrinkle strokes in a clean line drawing. This way, artists can focus more on the design rather than the tedious manual editing work, and experiment with different shading effects under different conditions. To achieve this, we have made three key technical contributions. First, we model five perceptual cues by exploring relevant psychological principles to estimate the local depth profile around strokes. Second, we formulate stroke interpretation as a global optimization model that simultaneously balances the different interpretations suggested by the perceptual cues and minimizes the interpretation discrepancy. Lastly, we develop a wrinkle-aware inflation method to generate a height field for the surface to support the shading region computation. In particular, we enable the generation of two commonly-used shading styles: 3D-like soft shading and manga-style flat shading.

10.
IEEE Trans Vis Comput Graph ; 23(8): 1910-1923, 2017 08.
Article in English | MEDLINE | ID: mdl-27323365

ABSTRACT

While ASCII art is a worldwide popular art form, automatically generating structure-based ASCII art from natural photographs remains challenging. The major challenge lies in extracting the perception-sensitive structure from the natural photographs so that a more concise ASCII art reproduction can be produced based on the structure. However, due to the excessive amount of texture in natural photos, extracting perception-sensitive structure is not easy, especially when the structure may be weak and within the texture region. Besides, to fit different target text resolutions, the amount of extracted structure should also be controllable. To tackle these challenges, we introduce a visual perception mechanism of non-classical receptive field modulation (non-CRF modulation) from physiological findings to this ASCII art application, and propose a new model of non-CRF modulation which can better separate weak structure from crowded texture, and also better control the scale of texture suppression. Thanks to our non-CRF model, more sensible ASCII art reproductions can be obtained. In addition, to produce more visually appealing ASCII art, we propose a novel optimization scheme to obtain the optimal placement of proportional-font characters. We apply our method on a rich variety of images, and visually appealing ASCII art is obtained in all cases.

11.
Sensors (Basel) ; 15(2): 4326-52, 2015 Feb 12.
Article in English | MEDLINE | ID: mdl-25686317

ABSTRACT

We propose a novel biometric recognition method that identifies the inner knuckle print (IKP). It is robust enough to confront uncontrolled lighting conditions, pose variations and low imaging quality. Such robustness is crucial for its application on portable devices equipped with consumer-level cameras. We achieve this robustness by two means. First, we propose a novel feature extraction scheme that highlights the salient structure and suppresses incorrect and/or unwanted features. The extracted IKP features retain simple geometry and morphology and reduce the interference of illumination. Second, to counteract the deformation induced by different hand orientations, we propose a novel structure-context descriptor based on local statistics. To the best of our knowledge, we are the first to simultaneously consider illumination invariance and deformation tolerance for appearance-based low-resolution hand biometrics. Previous works used more restrictive settings, making strong assumptions about either the illumination conditions or the hand orientation. Extensive experiments demonstrate that our method outperforms the state-of-the-art methods in terms of recognition accuracy, especially under uncontrolled lighting conditions and flexible hand orientations.

12.
IEEE Trans Vis Comput Graph ; 19(11): 1808-19, 2013 Nov.
Article in English | MEDLINE | ID: mdl-24029902

ABSTRACT

Change blindness refers to the human inability to recognize large visual changes between images. In this paper, we present the first computational model of change blindness to quantify the degree of blindness between an image pair. It comprises a novel context-dependent saliency model and a measure of change, the former dependent on the site of the change, and the latter describing the amount of change. This saliency model in particular addresses the influence of background complexity, which plays an important role in the phenomenon of change blindness. Using the proposed computational model, we are able to synthesize changed images with desired degrees of blindness. User studies and comparisons to state-of-the-art saliency models demonstrate the effectiveness of our model.


Subject(s)
Computer Simulation , Image Processing, Computer-Assisted/methods , Models, Biological , Vision Disorders , Visual Perception/physiology , Adult , Algorithms , Analysis of Variance , Female , Humans , Male
13.
Med Eng Phys ; 35(7): 958-68, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23046972

ABSTRACT

The B-spline-based deformable model is commonly used in recovering three-dimensional (3D) cardiac motion from tagged MRI due to its compact description, localized continuity and control flexibility. However, existing approaches usually ignore the important well-known fact that myocardial tissue is incompressible. In this paper, we propose to reconstruct 3D cardiac motion from tagged MRI using an incompressible B-solid model. We demonstrate that cardiac motion recovery can be achieved with greater accuracy by considering both the smoothness and incompressibility of the myocardium. Specifically, our incompressible B-solid model is formulated as a 3D tensor product of B-splines, where each piece of B-spline represents a smooth and divergence-free displacement field of the myocardium with respect to the radial, longitudinal and circumferential directions, respectively. We further formulate the fitting of the incompressible B-solid model as an optimization problem and solve it with a two-stage algorithm. Finally, the 3D myocardium strains are obtained from the reconstructed incompressible displacement fields and visualized in a comprehensive way. The proposed method is evaluated on both synthetic and in vivo human datasets. Comparisons with state-of-the-art methods are also conducted to validate the proposed method. Experimental results demonstrate that our method has higher accuracy and more stable volume-preserving ability than previous methods, yielding an average displacement error of 0.21 mm and a mean Jacobian determinant of 1.029.
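Incompressibility means the deformation's Jacobian determinant should stay near 1 everywhere (the abstract reports a mean of 1.029). A hedged numpy sketch of that check for a discrete displacement field — this is the evaluation metric, not the paper's B-solid fitting itself:

```python
import numpy as np

def jacobian_determinant(u, spacing=1.0):
    # u has shape (3, X, Y, Z): a 3D displacement field. The deformation
    # gradient is F = I + du/dx; det(F) == 1 everywhere means the motion
    # is volume-preserving (incompressible).
    grads = np.stack([np.stack(np.gradient(u[i], spacing)) for i in range(3)])
    F = grads + np.eye(3)[:, :, None, None, None]
    return np.linalg.det(np.moveaxis(F, (0, 1), (-2, -1)))

# Sanity check: a uniform 10% dilation u_i = 0.1 * x_i has det(F) = 1.1**3.
x = np.stack(np.meshgrid(*[np.arange(4.0)] * 3, indexing="ij"))
det = jacobian_determinant(0.1 * x)
```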


Subject(s)
Algorithms , Heart/physiology , Imaging, Three-Dimensional/methods , Movement , Humans , Magnetic Resonance Imaging
14.
IEEE Trans Vis Comput Graph ; 18(11): 1836-48, 2012 Nov.
Article in English | MEDLINE | ID: mdl-22392711

ABSTRACT

Estimating illumination and deformation fields on textures is essential for both analysis and application purposes. Traditional methods for such estimation usually require complicated and sometimes labor-intensive processing. In this paper, we propose a new perspective on this problem and suggest a novel statistical approach which is much simpler and more efficient. Our experiments show that many textures in daily life are statistically invariant in terms of colors and gradients. Variations of such statistics can be assumed to be influenced by illumination and deformation. This implies that we can inversely estimate the spatially varying illumination and deformation according to the variation of the texture statistics. This enables us to decompose a texture photo into an illumination field, a deformation field, and an implicit texture that is illumination- and deformation-free, within a short period of time, and with minimal user input. By processing and recombining these components, a variety of synthesis effects, such as exemplar preparation, texture replacement, surface relighting, as well as geometry modification, can be well achieved. Finally, convincing results are shown to demonstrate the effectiveness of the proposed method.

15.
Phys Med Biol ; 56(19): 6291-310, 2011 Oct 07.
Article in English | MEDLINE | ID: mdl-21896965

ABSTRACT

The epicardial potential (EP)-targeted inverse problem of electrocardiography (ECG) has been widely investigated, as it has been demonstrated that EPs reflect underlying myocardial activity. It is a well-known ill-posed problem, as small noise in the input data may yield a highly unstable solution. Traditionally, L2-norm regularization methods have been proposed to solve this ill-posed problem. But the L2-norm penalty function inherently leads to considerable smoothing of the solution, which reduces the accuracy of distinguishing abnormalities and locating diseased regions. Directly using the L1-norm penalty function, however, may greatly increase computational complexity due to its non-differentiability. We propose an L1-norm regularization method in order to reduce the computational complexity and make rapid convergence possible. Variable splitting is employed to make the L1-norm penalty function differentiable, based on the observation that both positive and negative potentials exist on the epicardial surface. Then, the inverse problem of ECG is further formulated as a bound-constrained quadratic problem, which can be efficiently solved by gradient projection in an iterative manner. Extensive experiments conducted on both synthetic data and real data demonstrate that the proposed method can handle both measurement noise and geometry noise and obtain more accurate results than previous L2- and L1-norm regularization methods, especially when the noise is large.
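Gradient projection for a bound-constrained quadratic program is a standard iteration: take a gradient descent step, then clip back onto the box. A minimal sketch (not the paper's solver; the fixed step-size rule here is my own conservative choice for a symmetric positive-definite system):

```python
import numpy as np

def projected_gradient_qp(A, b, lo, hi, iters=500):
    # Minimise 0.5*x'Ax - b'x subject to lo <= x <= hi. Each iteration
    # takes a gradient step and projects onto the box by clipping.
    step = 1.0 / np.linalg.norm(A, 2)   # safe step for SPD A
    x = np.clip(np.zeros_like(b, dtype=float), lo, hi)
    for _ in range(iters):
        x = np.clip(x - step * (A @ x - b), lo, hi)
    return x

# Tiny example: unconstrained minimiser is [0.5, -1.5]; the box clips it.
A = np.array([[2.0, 0.0], [0.0, 2.0]])
x = projected_gradient_qp(A, np.array([1.0, -3.0]), lo=0.0, hi=10.0)
```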


Subject(s)
Electrocardiography/methods , Epicardial Mapping/methods , Heart Conduction System/diagnostic imaging , Image Processing, Computer-Assisted/methods , Algorithms , Computer Simulation , Heart Conduction System/pathology , Humans , Models, Cardiovascular , Radiography , Signal-To-Noise Ratio , Time Factors
16.
IEEE Trans Vis Comput Graph ; 17(10): 1499-509, 2011 Oct.
Article in English | MEDLINE | ID: mdl-21817170

ABSTRACT

Environment sampling is a popular technique for rendering scenes with distant environment illumination. However, the temporal consistency of animations synthesized under dynamic environment sequences has not been fully studied. This paper addresses this problem and proposes a novel method, namely spatiotemporal sampling, to fully exploit both the temporal and spatial coherence of environment sequences. Our method treats an environment sequence as a spatiotemporal volume and samples the sequence by stratifying the volume adaptively. For this purpose, we first present a new metric to measure the importance of each stratified volume. A stratification algorithm is then proposed to adaptively suppress the abrupt temporal and spatial changes in the generated sampling patterns. The proposed method is able to automatically adjust the number of samples for each environment frame and produce temporally coherent sampling patterns. Comparative experiments demonstrate the capability of our method to produce smooth and consistent animations under dynamic environment sequences.

17.
Neuroimage ; 54 Suppl 1: S180-8, 2011 Jan.
Article in English | MEDLINE | ID: mdl-20382235

ABSTRACT

The vestibular system is the sensory organ responsible for perceiving head rotational movements and maintaining postural balance of the human body. The objectives of this study are to propose an innovative computational technique capable of automatically segmenting the vestibular system and to analyze its geometrical features from high-resolution T2-weighted MR images. In this study, the proposed technique was used to test the hypothesis that the morphoanatomy of the vestibular system in adolescent idiopathic scoliosis (AIS) patients is different from that of healthy control subjects. The findings could contribute significantly to the understanding of the etiopathogenesis of AIS. The segmentation pipeline consisted of extraction of a region of interest, image pre-processing, K-means clustering, and surface smoothing. The geometry of this high-genus labyrinth structure was analyzed through automatic partition into genus-0 units and approximation using the best-fit circle and plane for each unit. The metrics of the best-fit planes and circles were taken as shape measures. The proposed technique was applied on a cohort of 20 right-thoracic AIS patients (mean age 14.7 years) and 20 age-matched healthy girls. The intermediate results were validated by subjective scoring. The results showed that the distance between the centers of the lateral and superior canals and the angle with vertex at the center of the posterior canal were significantly smaller in AIS than in healthy controls in the left-side vestibular system, with p=0.0264 and p=0.0200 respectively, but not in the right-side counterparts. The detected morphoanatomical changes are likely to be associated with the subclinical postural, vestibular and proprioceptive dysfunctions reported frequently in AIS. This study has demonstrated that the proposed method could be applied clinically in MRI-based morphoanatomy studies of the vestibular system.
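The best-fit plane for each genus-0 unit can be recovered by an SVD of the centred points; a hedged sketch of that single step (the partition into units is the study's contribution and is not reproduced here):

```python
import numpy as np

def best_fit_plane(points):
    # Least-squares plane through a 3D point cloud: the centroid lies on
    # the plane, and the normal is the direction of least variance, i.e.
    # the last right-singular vector of the centred points.
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    return centroid, vt[-1]

pts = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0], [0.5, 0.5, 0]])
centroid, normal = best_fit_plane(pts)
```

Angles between canal planes, as measured in the study, then follow from the dot products of the recovered normals.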


Subject(s)
Image Interpretation, Computer-Assisted/methods , Magnetic Resonance Imaging , Scoliosis/pathology , Vestibule, Labyrinth/pathology , Adolescent , Female , Humans
18.
IEEE Trans Vis Comput Graph ; 17(10): 1475-86, 2011 Oct.
Article in English | MEDLINE | ID: mdl-21149892

ABSTRACT

In this paper, we present a novel method to extract the motion of a dynamic object from a video that is captured by a handheld camera, and apply it to a 3D character. Unlike motion capture techniques, neither special sensors/trackers nor a controllable environment is required. Our system significantly automates motion imitation, which is traditionally conducted by professional animators via manual keyframing. Given the input video sequence, we track the dynamic reference object to obtain trajectories of both 2D and 3D tracking points. With them as constraints, we then transfer the motion to the target 3D character by solving an optimization problem to maintain the motion gradients. We also provide a user-friendly editing environment for users to fine-tune the motion details. As casual videos can be used, our system greatly increases the supply source of motion data. Examples of imitating various types of animal motion are shown.

19.
IEEE Trans Vis Comput Graph ; 17(1): 51-63, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21071787

ABSTRACT

Cube mapping is widely used in many graphics applications due to the availability of hardware support. However, it does not sample the spherical surface evenly. Recently, a uniform spherical mapping, isocube mapping, was proposed. It exploits the six-face structure used in cube mapping and samples the spherical surface evenly. Unfortunately, some texels in isocube mapping are not rectilinear. This nonrectilinear property may degrade the filtering quality. This paper proposes a novel spherical mapping, namely unicube mapping. It has the advantages of both cube mapping (exploitation of hardware and rectilinear structure) and isocube mapping (even sampling pattern). In the implementation, unicube mapping uses a simple function to modify the lookup vector before the conventional cube map lookup process. Hence, unicube mapping fully exploits the cube map hardware for real-time filtering and lookup. More importantly, its rectilinear partition structure allows a direct and real-time acquisition of the texture environment. This property facilitates dynamic environment mapping in real time.


Subject(s)
Algorithms , Computer Graphics , Computer-Aided Design , Environment Design , Image Interpretation, Computer-Assisted/methods , Imaging, Three-Dimensional/methods , User-Computer Interface , Equipment Design , Humans , Numerical Analysis, Computer-Assisted
20.
J Med Syst ; 34(3): 261-71, 2010 Jun.
Article in English | MEDLINE | ID: mdl-20503610

ABSTRACT

Realistic modeling of soft tissue deformation is crucial to virtual orthopedic surgery, especially orthopedic trauma surgery, which involves layered heterogeneous soft tissues. In this paper, a novel modeling framework for multilayered soft tissue deformation is proposed in order to facilitate the development of orthopedic surgery simulators. We construct our deformable model according to the layered structure of real human organs, which results in a multilayered model. The division of layers is based on the segmented Chinese Visible Human (CVH) dataset. This enhances the realism and accuracy of the simulation. For the sake of efficiency, we employ a 3D mass-spring system in our multilayered model. The nonlinear passive biomechanical properties of skin and skeletal muscle are achieved by introducing a bilinear elasticity scheme to the springs in the mass-spring system. To efficiently and accurately reproduce the biomechanical properties of certain human tissues, an optimization approach is employed in configuring the parameters of the springs. Experimental data from the biomechanics literature are used as benchmark references. With the employment of a Physics Processing Unit (PPU) and high-quality volume visualization, our framework is developed into an interactive and intuitive platform for virtual surgery training systems. Several experiments demonstrate the feasibility of the proposed framework in providing interactive and realistic deformation for orthopedic surgery simulation.
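A bilinear elasticity scheme means a spring's stiffness switches to a higher value beyond an elongation threshold, mimicking soft tissue that stiffens as it is stretched. A hedged scalar sketch of one spring's restoring force (the constants and the exact switching rule are illustrative, not the paper's calibrated values):

```python
def bilinear_spring_force(stretch, k_low, k_high, threshold):
    # Restoring force of a bilinear spring: stiffness k_low for small
    # elongation, k_high for the portion beyond the threshold. The force
    # always opposes the sign of the stretch.
    s = abs(stretch)
    if s <= threshold:
        magnitude = k_low * s
    else:
        magnitude = k_low * threshold + k_high * (s - threshold)
    return -magnitude if stretch > 0 else magnitude
```

In a mass-spring system this force would be accumulated per spring each timestep; the optimization step the abstract mentions would then tune k_low, k_high and the threshold against published stress-strain data.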


Subject(s)
Connective Tissue/surgery , Models, Biological , Orthopedic Procedures/education , User-Computer Interface , Visible Human Projects , Computer Simulation , Connective Tissue/anatomy & histology , Connective Tissue/pathology , Hong Kong , Humans , Orthopedic Procedures/methods , Orthopedics/education , Orthopedics/methods , Surgery, Computer-Assisted/methods