Results 1 - 4 of 4
1.
Article in English | MEDLINE | ID: mdl-38354074

ABSTRACT

Creating a vivid video from an event or scenario in our imagination is a truly fascinating experience. Recent advances in text-to-video synthesis have unveiled the potential to achieve this with prompts only. While text is convenient for conveying the overall scene context, it may be insufficient for precise control. In this paper, we explore customized video generation by utilizing text as context description and motion structure (e.g. frame-wise depth) as concrete guidance. Our method, dubbed Make-Your-Video, involves joint-conditional video generation using a Latent Diffusion Model that is pre-trained for still image synthesis and then promoted to video generation through the introduction of temporal modules. This two-stage learning scheme not only reduces the computing resources required but also improves performance by transferring the rich concepts available in image datasets to video generation. Moreover, we use a simple yet effective causal attention mask strategy to enable longer video synthesis, which effectively mitigates potential quality degradation. Experimental results show the superiority of our method over existing baselines, particularly in terms of temporal coherence and fidelity to users' guidance. In addition, our model enables several intriguing applications that demonstrate its potential for practical use. The code, model weights, and videos are publicly available at our project page: https://doubiiu.github.io/projects/Make-Your-Video/.
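The abstract does not detail the causal attention mask, but the core idea is standard: in temporal attention, each frame may attend only to itself and earlier frames. A minimal sketch (all names here are hypothetical, not the paper's implementation):

```python
def causal_mask(num_frames):
    """Build a boolean temporal attention mask: entry [i][j] is True
    when frame i is allowed to attend to frame j (i.e. j <= i)."""
    return [[j <= i for j in range(num_frames)]
            for i in range(num_frames)]

mask = causal_mask(4)
# Frame 0 attends only to itself; frame 3 attends to frames 0-3.
```

In an attention layer, positions where the mask is False would be set to minus infinity before the softmax, so later frames cannot influence earlier ones during autoregressive extension.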

2.
Article in English | MEDLINE | ID: mdl-32142436

ABSTRACT

Image composition is one of the most important applications in image processing. However, an inharmonious appearance between the spliced region and the background degrades the quality of the composite image. We therefore address the problem of image harmonization: given a spliced image and the mask of the spliced region, we try to harmonize the "style" of the pasted region with the background (the non-spliced region). Previous approaches have focused on learning this mapping directly with a neural network. In this work, we start from an empirical observation: the spliced image and the harmonized result differ only in the spliced region, while they share the same semantic information and appearance in the non-spliced region. Thus, in order to learn the feature maps of the masked region and the rest individually, we propose a novel attention module named the Spatial-Separated Attention Module (S2AM). Furthermore, we design a novel image harmonization framework by inserting the S2AM into the coarser low-level features of a U-Net structure in two different ways. Beyond image harmonization, we take a further step and harmonize composite images without a specific mask, building on the same observation. Experiments show that the proposed S2AM performs better than other state-of-the-art attention modules on our task. Moreover, we demonstrate the advantages of our model over other state-of-the-art image harmonization methods using criteria from multiple points of view.
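The spatial separation in S2AM rests on a simple masked-blend idea: transform features inside the mask while passing the background through unchanged. A toy sketch of that blend (the function names and the scalar "features" are hypothetical stand-ins for the module's learned transforms):

```python
def spatial_separated_blend(features, mask, fg_transform):
    """Apply fg_transform only inside the mask and pass the
    background through unchanged:
        out = mask * fg_transform(x) + (1 - mask) * x
    features and mask are 2-D lists of equal shape; mask in [0, 1]."""
    h, w = len(features), len(features[0])
    return [[mask[i][j] * fg_transform(features[i][j])
             + (1 - mask[i][j]) * features[i][j]
             for j in range(w)]
            for i in range(h)]

# Toy example: darken only the masked (spliced) region.
x = [[1.0, 2.0], [3.0, 4.0]]
m = [[1, 0], [0, 0]]
out = spatial_separated_blend(x, m, lambda v: v * 0.5)
# Only the top-left pixel (inside the mask) is transformed.
```

In the actual module, `fg_transform` would be a learned attention branch rather than a fixed function, but the masked/unmasked separation follows the same pattern.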

3.
Sci Data; 6(1): 226, 2019 Oct 22.
Article in English | MEDLINE | ID: mdl-31641123

ABSTRACT

Shells are very common objects in the world, often used for decoration, collections, academic research, and more. With tens of thousands of species, shells are not easy to identify manually, yet until now no one has proposed recognizing shells with machine learning techniques. We present a shell dataset containing 7,894 shell species with 29,622 samples, comprising 59,244 shell images in total for feature extraction and recognition. Three types of shell features, namely colour, shape, and texture, were extracted from 134 shell species with 10 samples each and then validated with two different classifiers: k-nearest neighbours (k-NN) and random forest. Since conchology is a mature field, we believe this dataset can serve as a valuable resource for automatic shell recognition. The extracted shell features are also useful for developing and optimizing new machine learning techniques. Furthermore, we hope more researchers will propose new methods to extract shell features and develop new classifiers based on this dataset, in order to improve the recognition performance for shell species.


Subjects
Exoskeleton, Animals, Machine Learning
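One of the two validation classifiers above, k-NN, can be sketched in a few lines: classify a query feature vector by majority vote among its k nearest training samples. The feature vectors and species names below are invented toy values, not taken from the dataset:

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest
    training samples under squared Euclidean distance.
    train: list of (feature_vector, label) pairs."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(vec, query)), label)
        for vec, label in train)
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical colour/shape/texture vectors for two species.
train = [([0.9, 0.1, 0.2], "conus"), ([0.8, 0.2, 0.3], "conus"),
         ([0.1, 0.9, 0.8], "cypraea"), ([0.2, 0.8, 0.9], "cypraea")]
knn_predict(train, [0.85, 0.15, 0.25])  # votes favour "conus"
```

A random forest would replace the distance-based vote with a vote over decision trees trained on the same feature vectors.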
4.
IEEE Comput Graph Appl; 39(2): 52-64, 2019.
Article in English | MEDLINE | ID: mdl-30530355

ABSTRACT

Research on novel-viewpoint synthesis is based mainly on multi-view input images. In this paper, we focus on a more challenging and ill-posed problem: synthesizing surrounding novel viewpoints from a single image. To achieve this goal, we design a full-resolution network to extract fine-scale image features, which helps prevent blurry artifacts. We also incorporate a pretrained relative-depth estimation network, so that three-dimensional information can be used to infer the flow field between the input and the target image. Since the depth network is trained on the depth order between pairs of objects, large-scale image features are also involved in our system. Finally, a synthesis layer not only warps the observed pixels to their desired positions but also hallucinates the missing pixels from other recorded pixels. Experiments show that our technique successfully synthesizes reasonable novel viewpoints surrounding the input where other state-of-the-art techniques fail.
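The warp step can be illustrated with a minimal backward warp: each output pixel samples the source image at a location offset by the flow field. This toy integer-offset version (with a trivial fallback standing in for the hallucination step) is an assumption-laden sketch, not the paper's synthesis layer:

```python
def warp_with_flow(image, flow):
    """Backward-warp a 2-D image with a per-pixel flow field.
    Output pixel (i, j) samples image[i + dy][j + dx], where
    (dy, dx) = flow[i][j]; out-of-range samples fall back to the
    source pixel (a crude stand-in for hallucinating missing pixels)."""
    h, w = len(image), len(image[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            dy, dx = flow[i][j]
            y, x = i + dy, j + dx
            if 0 <= y < h and 0 <= x < w:
                row.append(image[y][x])
            else:
                row.append(image[i][j])  # missing pixel: keep source
        out.append(row)
    return out

# A uniform flow of (0, 1) shifts the view one column to the left.
img = [[1, 2, 3], [4, 5, 6]]
flow = [[(0, 1)] * 3 for _ in range(2)]
warped = warp_with_flow(img, flow)
```

A real implementation would use sub-pixel flow with bilinear sampling and a learned network to fill disoccluded pixels, rather than integer offsets and a copy fallback.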
