Results 1 - 13 of 13
1.
Article in English | MEDLINE | ID: mdl-38090868

ABSTRACT

Blind face restoration (BFR) aims to recover high-quality (HQ) face images from low-quality (LQ) ones and usually resorts to facial priors to improve restoration performance. However, current methods still suffer from two major difficulties: 1) how to derive a powerful network architecture without extensive hand tuning and 2) how to capture complementary information from multiple facial priors in one network to improve restoration performance. To this end, we propose a face restoration searching network (FRSNet) to adaptively search for a suitable feature extraction architecture within our specified search space, which directly contributes to the restoration quality. On the basis of FRSNet, we further design our multiple facial prior searching network (MFPSNet) with a multiprior learning scheme. MFPSNet optimally extracts information from diverse facial priors and fuses the information into image features, ensuring that both external guidance and internal features are preserved. In this way, MFPSNet takes full advantage of semantic-level (parsing maps), geometric-level (facial heat maps), reference-level (facial dictionaries), and pixel-level (degraded images) information and, thus, generates faithful and realistic images. Quantitative and qualitative experiments show that MFPSNet performs favorably on both synthetic and real-world datasets against state-of-the-art (SOTA) BFR methods. The code is publicly available at: https://github.com/YYJ1anG/MFPSNet.

2.
IEEE Trans Pattern Anal Mach Intell ; 45(10): 12635-12649, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37310842

ABSTRACT

Vision transformers have shown great success on numerous computer vision tasks. However, their central component, softmax attention, prevents vision transformers from scaling up to high-resolution images, because both its computational complexity and its memory footprint are quadratic. Linear attention, which reorders the self-attention mechanism to mitigate a similar issue, was introduced in natural language processing (NLP), but directly applying existing linear attention to vision may not lead to satisfactory results. We investigate this problem and point out that existing linear attention methods ignore an inductive bias in vision tasks, i.e., 2D locality. In this article, we propose Vicinity Attention, a type of linear attention that integrates 2D locality. Specifically, for each image patch, we adjust its attention weight based on its 2D Manhattan distance from its neighbouring patches. In this way, we achieve 2D locality in linear complexity, where neighbouring image patches receive stronger attention than faraway patches. In addition, we propose a novel Vicinity Attention Block, comprising Feature Reduction Attention (FRA) and Feature Preserving Connection (FPC), to address a computational bottleneck of linear attention approaches, including our Vicinity Attention, whose complexity grows quadratically with respect to the feature dimension. The Vicinity Attention Block computes attention in a compressed feature space, with an extra skip connection to retrieve the original feature distribution. We experimentally validate that the block further reduces computation without degrading accuracy. Finally, to validate the proposed methods, we build a linear vision transformer backbone named Vicinity Vision Transformer (VVT). Targeting general vision tasks, we build VVT in a pyramid structure with progressively reduced sequence length. We perform extensive experiments on the CIFAR-100, ImageNet-1k, and ADE20K datasets to validate the effectiveness of our method. Our method has a slower growth rate of computational overhead than previous transformer-based and convolution-based networks as the input resolution increases. In particular, our approach achieves state-of-the-art image classification accuracy with 50% fewer parameters than previous approaches.
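As a rough illustration of the 2D locality bias this abstract describes, the sketch below weights attention scores by a decaying function of the 2D Manhattan distance between patch grid positions. It is written in the quadratic form for readability (the paper attains the same bias with linear complexity), and the exponential decay and its rate are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def manhattan_locality_weights(h, w, decay=0.1):
    """W[i, j] = exp(-decay * d(i, j)), where d is the 2D Manhattan
    distance between patches i and j on an h x w grid."""
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1)        # (h*w, 2)
    dist = np.abs(coords[:, None, :] - coords[None, :, :]).sum(-1)
    return np.exp(-decay * dist)

def vicinity_attention(q, k, v, h, w, decay=0.1):
    """Toy locality-weighted attention over h*w patches (quadratic form,
    for illustration only)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])                    # (N, N)
    weights = np.exp(scores - scores.max(-1, keepdims=True))   # stable softmax
    weights = weights * manhattan_locality_weights(h, w, decay)
    weights = weights / weights.sum(-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
h, w, d = 4, 4, 8
q = rng.standard_normal((h * w, d))
out = vicinity_attention(q, q, q, h, w)
print(out.shape)  # (16, 8)
```

Note that adjacent patches (Manhattan distance 1) receive a larger multiplicative weight than distant ones, which is the locality behaviour the abstract claims.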

3.
IEEE Trans Image Process ; 32: 3040-3053, 2023.
Article in English | MEDLINE | ID: mdl-37163394

ABSTRACT

In this paper, we address the problem of video-based rain streak removal by developing an event-aware multi-patch progressive neural network. Rain streaks in video exhibit correlations in both the temporal and spatial dimensions, which existing methods have difficulty modeling. Based on this observation, we propose a module that encodes events from neuromorphic cameras to facilitate deraining. Events are captured asynchronously at the pixel level, only when the intensity changes by a margin exceeding a certain threshold. Owing to this property, events carry considerable information about moving objects, including rain streaks passing through the scene across adjacent frames, and we show that exploiting them properly yields nontrivial gains in deraining performance. In addition, we develop a multi-patch progressive neural network: partitioning the input into patches provides varied receptive fields, and progressive learning across patch levels lets the model emphasize each level to a different extent. Extensive experiments show that our event-guided method outperforms state-of-the-art methods by a large margin on synthetic and real-world datasets.
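The event-triggering property described above can be sketched with a toy frame-pair model: a pixel emits an ON/OFF event when its log-intensity change exceeds a contrast threshold. Real event cameras fire asynchronously in continuous time; the `simulate_events` function and its threshold value are illustrative assumptions, not the paper's sensor model:

```python
import numpy as np

def simulate_events(prev_frame, next_frame, threshold=0.2, eps=1e-6):
    """Emit +1/-1 per pixel where the log-intensity change between two
    frames exceeds the contrast threshold, else 0."""
    delta = np.log(next_frame + eps) - np.log(prev_frame + eps)
    events = np.zeros_like(delta, dtype=np.int8)
    events[delta > threshold] = 1    # brightness increase (ON event)
    events[delta < -threshold] = -1  # brightness decrease (OFF event)
    return events

prev = np.full((4, 4), 0.5)
nxt = prev.copy()
nxt[1, 2] = 0.9   # e.g. a bright rain streak entering this pixel
events = simulate_events(prev, nxt)
print(events[1, 2], events[0, 0])  # 1 0
```

Because only the moving streak changes intensity, the event map is sparse and concentrated on the rain, which is why it is a useful cue for deraining.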

4.
Nano Lett ; 23(5): 1659-1665, 2023 Mar 08.
Article in English | MEDLINE | ID: mdl-36745111

ABSTRACT

The interfacial interaction of 2D materials with their substrate leads to striking surface faceting that affects their electronic properties. Here, we quantitatively study the orientation-dependent facet topographies observed on the catalyst under graphene using electron backscatter diffraction and atomic force microscopy. The originally flat catalyst surface transforms into two facets: a low-energy, low-index surface, e.g., (111), and a vicinal (high-index) surface. Molecular simulations reveal the critical role of graphene strain, besides anisotropic interfacial energy, in forming the observed topographies. These insights are applicable to other 2D/3D heterostructures.

5.
IEEE Trans Pattern Anal Mach Intell ; 45(3): 3968-3978, 2023 03.
Article in English | MEDLINE | ID: mdl-35687621

ABSTRACT

Recent deep face hallucination methods show stunning performance in super-resolving severely degraded facial images, even surpassing human ability. However, these algorithms are mainly evaluated on non-public synthetic datasets, so it is unclear how they perform on public face hallucination datasets. Meanwhile, most existing datasets do not adequately consider the distribution of races, which makes face hallucination methods trained on them biased toward specific races. To address these two problems, in this paper we build a public Ethnically Diverse Face dataset, EDFace-Celeb-1M, and design a benchmark task for face hallucination. Our dataset includes 1.7 million photos that cover different countries, with a relatively balanced race composition. To the best of our knowledge, it is the largest publicly available face hallucination dataset in the wild. Together with this dataset, the paper also contributes various evaluation protocols and provides a comprehensive analysis benchmarking existing state-of-the-art methods. The benchmark evaluations demonstrate the performance and limitations of state-of-the-art algorithms. https://github.com/HDCVLab/EDFace-Celeb-1M.


Subjects
Algorithms, Benchmarking, Humans, Hallucinations
6.
IEEE Trans Pattern Anal Mach Intell ; 45(1): 1287-1293, 2023 Jan.
Article in English | MEDLINE | ID: mdl-35130145

ABSTRACT

Video deraining is an important task in computer vision, as unwanted rain hampers the visibility of videos and deteriorates the robustness of most outdoor vision systems. Despite the significant success achieved in video deraining recently, two major challenges remain: 1) how to exploit the vast information among successive frames to extract powerful spatio-temporal features across both the spatial and temporal domains and 2) how to restore high-quality derained videos at high speed. In this paper, we present a new end-to-end video deraining framework, dubbed the Enhanced Spatio-Temporal Interaction Network (ESTINet), which considerably boosts current state-of-the-art video deraining quality and speed. ESTINet takes advantage of deep residual networks and convolutional long short-term memory, which can capture the spatial features and temporal correlations among successive frames at very little computational cost. Extensive experiments on three public datasets show that the proposed ESTINet is faster than its competitors while maintaining superior performance over state-of-the-art methods. https://github.com/HDCVLab/Enhanced-Spatio-Temporal-Interaction-Learning-for-Video-Deraining.

7.
IEEE Trans Image Process ; 30: 7608-7619, 2021.
Article in English | MEDLINE | ID: mdl-34469300

ABSTRACT

Rain streaks and raindrops are two natural phenomena that degrade image capture in different ways. Most existing deep deraining networks treat them as two distinct problems and address only one, and thus cannot deal adequately with both simultaneously. To address this, we propose a Dual Attention-in-Attention Model (DAiAM), which includes two DAMs for removing both rain streaks and raindrops. Inside the DAM, two attention maps attend to the heavily and lightly rainy regions, respectively, to guide the deraining process differently in the applicable regions. In addition, to further refine the result, a Differential-driven Dual Attention-in-Attention Model (D-DAiAM) is proposed with a "heavy-to-light" scheme that removes rain by addressing the unsatisfactorily derained regions. Extensive experiments on one public raindrop dataset, one public rain streak dataset, and our synthesized joint rain streak and raindrop (JRSRD) dataset demonstrate that the proposed method not only removes rain streaks and raindrops simultaneously but also achieves state-of-the-art performance on both tasks.
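The attend-by-region idea can be sketched as two spatial attention maps gating two region-specific branches whose outputs are blended. The branch operations below are placeholders for illustration, not the paper's actual DAM:

```python
import numpy as np

def dual_attention_fuse(feat, heavy_map, light_map):
    """Blend two region-specific branches using spatial attention maps.
    The scaling factors stand in for learned heavy-rain and light-rain
    processing; a real DAM would use trained sub-networks."""
    heavy_branch = feat * 0.2   # stand-in for aggressive rain removal
    light_branch = feat * 0.8   # stand-in for gentle rain removal
    return heavy_map * heavy_branch + light_map * light_branch

feat = np.ones((4, 4))
heavy = np.zeros((4, 4))
heavy[:2] = 1.0              # top half of the image: heavy rain
light = 1.0 - heavy          # bottom half: light rain
out = dual_attention_fuse(feat, heavy, light)
```

With complementary maps, each pixel is handled by the branch appropriate to its rain severity, which is the per-region guidance the abstract describes.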

8.
IEEE Trans Image Process ; 30: 7101-7111, 2021.
Article in English | MEDLINE | ID: mdl-34351860

ABSTRACT

Single-image super-resolution (SR) and multi-frame SR are two ways to super-resolve low-resolution images. Single-image SR generally handles each image independently and thus ignores the temporal information implied in continuing frames. Multi-frame SR is able to model the temporal dependency by capturing motion information, but it relies on neighbouring frames, which are not always available in the real world. Meanwhile, slight camera shake easily causes heavy motion blur in long-distance-shot low-resolution images. To address these problems, a Blind Motion Deblurring Super-Resolution Network (BMDSRNet) is proposed to learn dynamic spatio-temporal information from single static motion-blurred images. A motion-blurred image is an accumulation over time during the camera exposure; the proposed BMDSRNet learns the reverse process, using three streams to learn bidirectional spatio-temporal information based on well-designed reconstruction loss functions, to recover clean high-resolution images. Extensive experiments demonstrate that the proposed BMDSRNet outperforms recent state-of-the-art methods and is able to deal with image deblurring and SR simultaneously.

9.
IEEE Trans Image Process ; 30: 7419-7431, 2021.
Article in English | MEDLINE | ID: mdl-34403338

ABSTRACT

Images captured on snowy days suffer from noticeably degraded scene visibility, which degrades the performance of current vision-based intelligent systems. Removing snow from images is thus an important topic in computer vision. In this paper, we propose a Deep Dense Multi-Scale Network (DDMSNet) for snow removal that exploits semantic and depth priors. As images captured outdoors often share similar scenes, and their visibility varies with depth from the camera, such semantic and depth information provides a strong prior for snowy image restoration. We incorporate semantic and depth maps as input and learn semantic-aware and geometry-aware representations to remove snow. In particular, we first apply a coarse network to remove snow from the input images. The coarsely desnowed images are then fed into another network to obtain semantic and depth labels. Finally, DDMSNet learns semantic-aware and geometry-aware representations via a self-attention mechanism to produce the final clean images. Experiments on public synthetic and real-world snowy images verify the superiority of the proposed method, offering better results both quantitatively and qualitatively. https://github.com/HDCVLab/Deep-Dense-Multi-scale-Network.

10.
IEEE Trans Image Process ; 30: 5085-5095, 2021.
Article in English | MEDLINE | ID: mdl-33856992

ABSTRACT

Automatic hand-drawn sketch recognition is an important task in computer vision. However, the vast majority of prior works focus on exploring the power of deep learning to achieve better accuracy on complete and clean sketch images, and thus fail to achieve satisfactory performance when applied to incomplete or damaged sketch images. To address this problem, we first develop two datasets that contain different levels of scrawled and incomplete sketches. Then, we propose an angular-driven feedback restoration network (ADFRNet), which first detects the imperfect parts of a sketch and then refines them into high-quality images, to boost the performance of sketch recognition. By introducing a novel "feedback restoration loop" to pass information between the middle stages, the proposed model improves the quality of generated sketch images while avoiding the extra memory cost associated with popular cascading generation schemes. In addition, we employ a novel angular-based loss function to guide the refinement of sketch images and to learn a powerful discriminator in the angular space. Extensive experiments conducted on the proposed imperfect sketch datasets demonstrate that the proposed model efficiently improves the quality of sketch images and achieves superior performance over the current state-of-the-art methods.
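An angular-space loss of the general family the abstract alludes to can be sketched as the angle between feature vectors, so that direction rather than magnitude is compared. This is an assumption about the loss family only; the paper's exact angular formulation may differ:

```python
import numpy as np

def angular_loss(pred, target, eps=1e-8):
    """Angle (in radians) between two feature vectors: zero for parallel
    vectors regardless of scale, pi/2 for orthogonal ones."""
    cos = pred @ target / (np.linalg.norm(pred) * np.linalg.norm(target) + eps)
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

a = np.array([1.0, 0.0])
loss_same = angular_loss(a, np.array([2.0, 0.0]))   # parallel: near 0
loss_orth = angular_loss(a, np.array([0.0, 3.0]))   # orthogonal: near pi/2
```

Comparing directions makes the loss invariant to feature magnitude, which is one common motivation for working in an angular space.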

11.
IEEE Trans Image Process ; 28(1): 291-301, 2019 Jan.
Article in English | MEDLINE | ID: mdl-30176588

ABSTRACT

Camera shake or target movement often leads to undesired blur in videos captured by a hand-held camera. Despite the significant effort devoted to video deblurring research, two major challenges remain: 1) how to model the spatio-temporal characteristics across both the spatial domain (i.e., the image plane) and the temporal domain (i.e., neighboring frames) and 2) how to restore sharp image details beyond the conventionally adopted metric of pixel-wise error. In this paper, to address the first challenge, we propose a deblurring network (DBLRNet) for spatio-temporal learning by applying 3D convolution to both the spatial and temporal domains. Our DBLRNet is able to jointly capture the spatial and temporal information encoded in neighboring frames, which directly contributes to improved video deblurring performance. To tackle the second challenge, we use DBLRNet as the generator in a generative adversarial network (GAN) architecture and employ a content loss in addition to an adversarial loss for efficient adversarial training. The resulting network, which we name the deblurring GAN, is tested on two standard benchmarks and achieves state-of-the-art performance.

12.
J Acoust Soc Am ; 142(4): 1730, 2017 10.
Article in English | MEDLINE | ID: mdl-29092555

ABSTRACT

Gedeon streaming is known to considerably deteriorate the thermal efficiency of a traveling-wave thermoacoustic engine with a looped configuration. The time-averaged pressure drop induced by a jet pump can efficiently suppress Gedeon streaming. In this study, this suppression mechanism is investigated, with emphasis on the effects of the dimensionless rounding, the taper angle, and the cross-sectional area ratio. An experimental apparatus was set up to measure the time-averaged pressure drop induced by jet pumps in oscillatory flow. Controlled experiments and characterization reveal that the time-averaged pressure drop and working efficiency increase as the dimensionless rounding rises, when it is less than 0.15. For jet pumps with fixed opening areas, taper angles in the range from 3° to 9° produce a larger time-averaged pressure drop with higher working efficiency, and within this range the taper angle has little effect on performance. However, performance degrades as the taper angle increases beyond 9°. Moreover, when the taper angle ranges from 3° to 9°, the time-averaged pressure drop and working efficiency can be improved by increasing the cross-sectional area ratio.

13.
IEEE Trans Image Process ; 26(9): 4193-4203, 2017 Sep.
Article in English | MEDLINE | ID: mdl-28371777

ABSTRACT

One key challenge of facial expression recognition is capturing the dynamic variation of facial physical structure from videos. In this paper, we propose a part-based hierarchical bidirectional recurrent neural network (PHRNN) to analyze the facial expression information of temporal sequences. Our PHRNN models facial morphological variations and the dynamical evolution of expressions, which is effective for extracting "temporal features" based on facial landmarks (geometry information) from consecutive frames. Meanwhile, to complement the still appearance information, a multi-signal convolutional neural network (MSCNN) is proposed to extract "spatial features" from still frames. We use both recognition and verification signals as supervision to calculate different loss functions, which help increase the variation between different expressions and reduce the differences among identical expressions. This deep evolutional spatial-temporal network (composed of the PHRNN and MSCNN) extracts the partial-whole, geometry-appearance, and dynamic-still information, effectively boosting the performance of facial expression recognition. Experimental results show that this method largely outperforms state-of-the-art ones: on three widely used facial expression databases (CK+, Oulu-CASIA, and MMI), our method reduces the error rates of the previous best methods by 45.5%, 25.8%, and 24.4%, respectively.
