1.
Article in English | MEDLINE | ID: mdl-38349822

ABSTRACT

Blind image restoration (IR) is a common yet challenging problem in computer vision. Classical model-based methods and recent deep learning (DL)-based methods represent two different methodologies for this problem, each with its own merits and drawbacks. In this paper, we propose a novel blind image restoration method that aims to integrate the advantages of both. Specifically, we construct a general Bayesian generative model for blind IR that explicitly depicts the degradation process. In this model, a pixel-wise non-i.i.d. Gaussian distribution is employed to fit the image noise. It offers more flexibility than the simple i.i.d. Gaussian or Laplacian distributions adopted in most conventional methods, and can therefore handle the more complicated noise types contained in degraded images. To solve the model, we design a variational inference algorithm in which all the expected posterior distributions are parameterized as deep neural networks to increase their modeling capacity. Notably, this inference algorithm induces a unified framework that jointly handles degradation estimation and image restoration, with the degradation information estimated in the former task used to guide the latter IR process. Experiments on two typical blind IR tasks, image denoising and super-resolution, demonstrate that the proposed method outperforms the current state of the art. The source code is available at https://github.com/zsyOAOA/VIRNet.
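The pixel-wise non-i.i.d. noise assumption can be illustrated with a small sketch. This is not the paper's variational network, only the likelihood idea it builds on: a per-pixel noise map fits spatially varying noise better than a single global sigma. The image size, noise levels, and comparison sigma below are illustrative assumptions.

```python
import numpy as np

def gaussian_nll(residual, sigma):
    """Negative log-likelihood of residuals under a zero-mean Gaussian.

    `sigma` may be a scalar (i.i.d. noise) or an array the same shape as
    `residual` (pixel-wise non-i.i.d. noise)."""
    sigma = np.broadcast_to(np.asarray(sigma, dtype=float), residual.shape)
    return 0.5 * np.sum(np.log(2 * np.pi * sigma**2) + residual**2 / sigma**2)

rng = np.random.default_rng(0)
# Simulate spatially varying noise: weak in the left half, strong in the right.
sigma_map = np.where(np.arange(64) < 32, 0.05, 0.25)[None, :].repeat(64, axis=0)
noise = rng.normal(0.0, sigma_map)

nll_iid = gaussian_nll(noise, sigma=0.15)       # one global sigma for all pixels
nll_pixelwise = gaussian_nll(noise, sigma_map)  # matches the true noise map
assert nll_pixelwise < nll_iid  # the pixel-wise model explains the noise better
```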

2.
Opt Express ; 32(1): 879-890, 2024 Jan 01.
Article in English | MEDLINE | ID: mdl-38175110

ABSTRACT

Conventional optical microscopes generally provide blurry, indistinguishable images of subwavelength nanostructures. However, a wealth of intensity and phase information is hidden in the corresponding diffraction-limited optical patterns and can be used to recognize structural features such as size, shape, and spatial arrangement. Here, we apply a deep-learning framework to improve the spatial resolution of optical imaging for metal nanostructures with regular shapes but varied arrangements. A convolutional neural network (CNN) is constructed and pre-trained on optical images of randomly distributed gold nanoparticles as input, with the corresponding scanning-electron microscopy images as ground truth. The CNN then learns to recover diffraction-free super-resolution images of both regularly arranged nanoparticle dimers and randomly clustered nanoparticle multimers from their blurry optical images. The profiles and orientations of these structures can also be reconstructed accurately. Moreover, the same network is extended to deblur optical images of randomly cross-linked silver nanowires. Most sections of these intricate nanowire nets are recovered well, with only a slight discrepancy near their intersections. This deep-learning augmented framework opens new opportunities for computational super-resolution optical microscopy, with many potential applications in bioimaging and in nanoscale fabrication and characterization. It could also be applied to significantly enhance the resolving capability of low-magnification scanning-electron microscopy.
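The diffraction-limited forward model that such a network learns to invert can be approximated, for illustration only, as convolution with a Gaussian point-spread function; the PSF shape and scale below are assumptions, not the paper's optical calibration. Two point-like "nanoparticles" separated by less than the blur scale merge into one blob, which is exactly the ambiguity the CNN must resolve.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Toy ground truth: two point-like "nanoparticles" closer than the blur scale.
truth = np.zeros((64, 64))
truth[32, 28] = 1.0
truth[32, 36] = 1.0

# Diffraction-limited imaging approximated as convolution with a Gaussian PSF.
blurry = gaussian_filter(truth, sigma=4.0)

# The midpoint between the particles is now brighter than either particle's
# own location: the pair is unresolved in the "optical" image.
assert blurry[32, 32] > blurry[32, 28]
```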

3.
Article in English | MEDLINE | ID: mdl-36279330

ABSTRACT

Weight decay (WD) is a fundamental and practical regularization technique for improving the generalization of current deep learning models. However, it has been observed that WD does not work as effectively for adaptive optimization algorithms (such as Adam) as it does for SGD: the solution found by Adam with WD often generalizes unsatisfactorily. Although efforts have been made to mitigate this issue, the reason for the deficiency remains unclear. In this article, we first show that when using the Adam optimizer, the weight norm grows very quickly over the course of training, whereas with SGD it grows relatively slowly and tends to converge. This fast growth of the weight norm is adverse to WD; in consequence, the Adam optimizer loses its efficacy in finding solutions that generalize well. To resolve this problem, we propose to tailor Adam by introducing a regularization term on the adaptive learning rate, making it friendly to WD. Meanwhile, we introduce a first moment on the WD to further enhance the regularization effect. We show that the proposed method finds solutions with small norm that generalize better than those of SGD. We test the proposed method on general and fine-grained image classification tasks with different networks. Experimental results in all these cases substantiate the effectiveness of the proposed method in improving generalization. Specifically, the proposed method improves the test accuracy of Adam by a large margin and even improves the performance of SGD, by 0.84% on CIFAR-10 and 1.03% on CIFAR-100 with ResNet-50. The code of this article is publicly available at xxx.
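The interaction between Adam and weight decay can be sketched with decoupled (AdamW-style) decay, which is related to, but not the same as, the regularizer this article proposes; the learning rate, decay coefficient, and noisy-gradient setup below are illustrative assumptions. With pure gradient noise and no decay, Adam leaves the weight norm essentially unchecked; decoupled decay shrinks it.

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, decoupled_wd=0.0):
    """One Adam update; `decoupled_wd` applies weight decay directly to the
    weights (AdamW-style) instead of folding an L2 term into the gradient."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g**2
    m_hat = m / (1 - b1**t)                      # bias-corrected first moment
    v_hat = v / (1 - b2**t)                      # bias-corrected second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps) - lr * decoupled_wd * w
    return w, m, v

rng = np.random.default_rng(0)
w_plain = w_decay = rng.normal(size=100)
m1 = v1 = m2 = v2 = np.zeros(100)
for t in range(1, 2001):
    g = rng.normal(size=100)          # noisy gradients with no pull toward zero
    w_plain, m1, v1 = adam_step(w_plain, g, m1, v1, t)
    w_decay, m2, v2 = adam_step(w_decay, g, m2, v2, t, decoupled_wd=0.1)

# Decoupled decay keeps the weight norm under control; plain Adam does not.
assert np.linalg.norm(w_decay) < np.linalg.norm(w_plain)
```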

4.
Article in English | MEDLINE | ID: mdl-32310768

ABSTRACT

Exposure bracketing is crucial to high dynamic range imaging, but it is prone to halos for static scenes and ghosting artifacts for dynamic scenes. The recently proposed structural patch decomposition for multi-exposure fusion (SPD-MEF) achieves reliable deghosting performance, but suffers from visible halo artifacts and is computationally expensive. In addition, its relationship to other MEF methods is unclear. We show that, without explicitly performing structural patch decomposition, we arrive at an unnormalized version of SPD-MEF that enjoys a speed-up of roughly 30× and is closely related to pixel-level MEF methods as well as the standard two-layer decomposition method for MEF. Moreover, we develop a fast multi-scale SPD-MEF method that effectively reduces halo artifacts. Experimental results demonstrate the effectiveness of the proposed MEF method in terms of both speed and quality.
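For orientation, a minimal pixel-level MEF baseline (one of the method families the paper relates SPD-MEF to; not SPD-MEF itself) weights each exposure by its "well-exposedness", i.e. how close pixels are to mid-gray, then blends. The Gaussian weighting width is an illustrative assumption.

```python
import numpy as np

def fuse_exposures(stack, sigma=0.2):
    """Pixel-level multi-exposure fusion: weight each exposure by how close
    its pixels are to mid-gray, normalize the weights per pixel, and blend."""
    stack = np.asarray(stack, dtype=float)            # (n_exposures, H, W)
    weights = np.exp(-((stack - 0.5) ** 2) / (2 * sigma**2))
    weights /= weights.sum(axis=0, keepdims=True)     # per-pixel normalization
    return (weights * stack).sum(axis=0)

under = np.full((4, 4), 0.1)   # underexposed frame
mid = np.full((4, 4), 0.5)     # well-exposed frame
over = np.full((4, 4), 0.9)    # overexposed frame
fused = fuse_exposures([under, mid, over])

# The blend is dominated by the well-exposed frame.
assert abs(fused[0, 0] - 0.5) < 0.1
```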

5.
IEEE Trans Pattern Anal Mach Intell ; 42(4): 851-864, 2020 04.
Article in English | MEDLINE | ID: mdl-30596570

ABSTRACT

In many science and engineering fields that require computational models to predict certain physical quantities, we are often faced with selecting the best model under the constraint that only a small sample set can be physically measured. One such example is the prediction of human perception of visual quality, where sample images live in a high-dimensional space with enormous content variations. We propose a new methodology for model comparison named group maximum differentiation (gMAD) competition. Given multiple computational models, gMAD maximizes the chances of falsifying a "defender" model using the remaining models as "attackers". It exploits the sample space to find sample pairs that maximally differentiate the attackers while holding the defender fixed. Based on the results of the attacking-defending game, we introduce two measures, aggressiveness and resistance, to summarize how well each model attacks other models and defends against attacks from them, respectively. We demonstrate the gMAD competition on three examples: image quality, image aesthetics, and streaming video quality-of-experience. Although these examples focus on visually discriminable quantities, the gMAD methodology can be extended to many other fields, and it is especially useful when the sample space is large, physical measurement is expensive, and the cost of computational prediction is low.
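One attack round can be sketched as a search for the pair of samples that the attacker rates most differently while the defender scores them (nearly) equally; real gMAD searches a huge sample space rather than the short toy list below, and the models here are placeholder lambdas, not actual quality metrics.

```python
import numpy as np

def gmad_pair(samples, defender, attacker, tol=1e-6):
    """Find the sample pair the attacker rates most differently among pairs
    the defender scores (nearly) the same: one round of a gMAD competition."""
    d = np.array([defender(s) for s in samples])
    a = np.array([attacker(s) for s in samples])
    best, best_gap = None, -np.inf
    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            if abs(d[i] - d[j]) <= tol:      # defender sees the pair as equal
                gap = abs(a[i] - a[j])       # attacker disagrees by this much
                if gap > best_gap:
                    best, best_gap = (i, j), gap
    return best, best_gap

# Toy 2-D "samples": the defender only looks at the first coordinate,
# the attacker only at the second.
samples = [(0.5, 0.1), (0.5, 0.9), (0.5, 0.4), (0.8, 0.2)]
pair, gap = gmad_pair(samples, defender=lambda s: s[0], attacker=lambda s: s[1])
assert pair == (0, 1)   # the most attacker-differentiating defender-equal pair
```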

6.
IEEE Trans Neural Netw Learn Syst ; 31(4): 1070-1083, 2020 Apr.
Article in English | MEDLINE | ID: mdl-31226087

ABSTRACT

Multiview subspace learning (MSL), which aims to obtain a low-dimensional latent subspace from multiview data, has been widely used in practical applications. Most recent MSL approaches, however, assume only a simple independent identically distributed (i.i.d.) Gaussian or Laplacian noise for all views of the data, which largely underestimates the noise complexity of practical multiview data. In real cases, the noise across different views generally has three specific characteristics. First, within each view, the noise always has a complex configuration beyond a simple Gaussian or Laplacian distribution. Second, the noise distributions of different views are generally nonidentical and evidently distinct. Third, the noise across views is not independent but clearly correlated. Based on these observations, we construct a new MSL model that more faithfully and comprehensively accounts for all these noise characteristics. First, the noise in each view is modeled as a Dirichlet process (DP) Gaussian mixture model (DPGMM), which can fit a wider range of complex noise types than a conventional Gaussian or Laplacian. Second, the DPGMM parameters differ from view to view, encoding the "nonidentical" noise property. Third, the DPGMMs on all views share the same high-level priors through a hierarchical DP, encoding the "nonindependent" noise property. All of these ideas are incorporated into an integrated graphical model that can be appropriately solved by a variational Bayes algorithm. The superiority of the proposed method over current state-of-the-art MSL methods is verified by experiments on 3-D reconstruction simulations, multiview face modeling, and background subtraction.
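The first ingredient, a DP Gaussian mixture noise model, can be sketched with scikit-learn's `BayesianGaussianMixture` (a variational DP mixture; the paper's hierarchical coupling across views is not reproduced here, and the synthetic noise below is an assumption). The point is that the DP mixture infers how many components the noise needs, up to a cap, instead of forcing a single Gaussian.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Noise drawn from a 2-component mixture: mostly small Gaussian noise
# plus occasional large "outlier" noise, beyond a single Gaussian's reach.
rng = np.random.default_rng(0)
noise = np.concatenate([rng.normal(0, 0.1, 900), rng.normal(0, 2.0, 100)])

# A DP Gaussian mixture infers the number of needed components (up to a cap).
dpgmm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
    random_state=0,
).fit(noise.reshape(-1, 1))

effective = int(np.sum(dpgmm.weights_ > 0.02))  # components carrying real mass
assert effective >= 2   # the heavy-tailed structure is discovered
```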

7.
IEEE Trans Image Process ; 28(12): 6077-6090, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31217115

ABSTRACT

Detecting objects in surveillance videos is an important problem due to its wide applications in traffic control and public security. Existing methods tend to suffer performance degradation from false positives or misalignment. We propose a novel framework, the Foreground Gating and Background Refining Network (FG-BR Net), for surveillance object detection (SOD). To reduce false positives in background regions, a critical problem in SOD, we introduce a new module that first subtracts the background of a video sequence and then generates high-quality region proposals. Unlike previous background subtraction methods, which may wrongly remove static foreground objects from a frame, our model adds a feedback connection from the detection results to the background subtraction process to distill both static and moving objects in surveillance videos. Furthermore, we introduce another module, the background refining stage, to refine the detection results with more accurate localizations. Pairwise non-local operations are adopted to cope with the misalignments between the features of the original and background frames. Extensive experiments on real-world traffic surveillance benchmarks demonstrate the competitive performance of the proposed FG-BR Net. In particular, FG-BR Net ranks at the top among all methods on the hard and sunny subsets of the UA-DETRAC detection dataset, without any bells and whistles.
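The background-subtraction gate at the core of this kind of pipeline can be sketched as thresholded differencing against a background model; the feedback connection from detections and the proposal generation, which are the paper's contributions, are omitted, and the threshold below is an illustrative assumption.

```python
import numpy as np

def foreground_mask(frame, background, thresh=0.1):
    """Foreground gating as plain background subtraction: flag pixels that
    deviate from the background model by more than a threshold."""
    return np.abs(frame.astype(float) - background.astype(float)) > thresh

background = np.zeros((8, 8))
frame = background.copy()
frame[2:5, 3:6] = 0.8          # a "vehicle" enters the otherwise static scene

mask = foreground_mask(frame, background)
assert mask.sum() == 9         # the 3x3 object is gated as foreground
```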

8.
IEEE Trans Image Process ; 28(7): 3162-3176, 2019 Jul.
Article in English | MEDLINE | ID: mdl-30676960

ABSTRACT

Because they can cover a wide range of views, pan-tilt-zoom (PTZ) cameras have been widely deployed in visual surveillance systems. To achieve global-view perception of a surveillance scene, it is necessary to generate its panoramic background image, which can then be used for subsequent applications such as road segmentation and active tracking. However, few works have addressed this problem, partly due to the lack of a benchmark dataset and the high complexity of panoramic image generation for PTZ cameras. In this paper, we build, to the best of our knowledge for the first time, a benchmark PTZ camera dataset with multiple views, and derive a complete set of panoramic transformation formulas for PTZ cameras. We further propose a fast multi-band blending method to address the efficiency issue in panoramic image fusion and mosaicing. Some related panoramic transformations, such as cylindrical and overlooking transformations, are also developed. Our proposed approach exhibits impressive accuracy and efficiency in PTZ panorama generation as well as panoramic image mosaicing.
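For reference, a textbook cylindrical projection, one of the panoramic transformation families mentioned, can be sketched as below; the exact formulas the paper derives for PTZ geometry may differ, and the focal length used is an illustrative assumption.

```python
import numpy as np

def to_cylindrical(x, y, f):
    """Map image-plane coordinates (x, y) of a pinhole camera with focal
    length f onto a cylinder of radius f (a standard textbook form)."""
    theta = np.arctan2(x, f)           # angle around the cylinder axis
    h = y / np.sqrt(x**2 + f**2)       # normalized height on the cylinder
    return f * theta, f * h

# The principal point (image center) maps to the cylinder origin.
u, v = to_cylindrical(np.array(0.0), np.array(0.0), f=500.0)
assert abs(u) < 1e-9 and abs(v) < 1e-9
```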

9.
IEEE Trans Neural Netw Learn Syst ; 30(7): 2093-2107, 2019 Jul.
Article in English | MEDLINE | ID: mdl-30442621

ABSTRACT

The extreme learning machine (ELM) has attracted much attention over the past decade due to its fast learning speed and convincing generalization performance. However, a practical issue remains to be addressed when applying the ELM: the randomly generated hidden-node parameters, used without tuning, can lead to nonuniformly distributed hidden-node outputs and thus poor generalization. To address this deficiency, a novel activation function with an affine transformation (AT) on its input is introduced into the ELM, leading to an improved algorithm referred to in this paper as AT-ELM. The scaling and translation parameters of the AT activation function are computed from the maximum entropy principle so that the hidden-layer outputs approximately obey a uniform distribution. Applying the AT-ELM algorithm to nonlinear function regression shows its robustness to range scaling of the network inputs. Experiments on nonlinear function regression, real-world dataset classification, and benchmark image recognition demonstrate better performance for the AT-ELM than for the original ELM, the regularized ELM, and the kernel ELM. Recognition results on benchmark image datasets also show that the AT-ELM generally outperforms several other state-of-the-art algorithms.

10.
IEEE Trans Pattern Anal Mach Intell ; 40(7): 1726-1740, 2018 07.
Article in English | MEDLINE | ID: mdl-28767363

ABSTRACT

We propose an effective online background subtraction method that can be robustly applied to practical videos with variations in both foreground and background. Unlike previous methods, which often model the foreground as a Gaussian or Laplacian distribution, we model the foreground of each frame with a specific mixture-of-Gaussians (MoG) distribution that is updated online frame by frame. In particular, our MoG model in each frame is regularized by the foreground/background knowledge learned in previous frames, which makes it highly robust, stable, and adaptive to practical foreground and background variations. The proposed model can be formulated as a concise probabilistic MAP model that is readily solved by the EM algorithm. We further embed an affine transformation operator into the model; it adjusts automatically to fit a wide range of video background transformations and makes the method more robust to camera movements. Using a sub-sampling technique, the proposed method can be accelerated to process more than 250 frames per second on average, meeting the requirement of real-time background subtraction for practical video processing tasks. The superiority of the proposed method over state-of-the-art online and offline background subtraction methods is substantiated by extensive experiments on synthetic and real videos.
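The flavor of frame-by-frame MoG adaptation can be sketched with a classic per-pixel online update in the Stauffer-Grimson style; the paper instead places the MoG on the foreground and regularizes it with knowledge carried over from previous frames, which this sketch does not implement. The learning rate, match threshold, and initialization are illustrative assumptions.

```python
import numpy as np

def mog_update(x, means, vars_, weights, lr=0.05, match_thresh=2.5):
    """One online MoG update for a single pixel value `x`. Returns True if
    `x` matched an existing component (background), False otherwise."""
    dist = np.abs(x - means) / np.sqrt(vars_)
    k = np.argmin(dist)
    matched = dist[k] < match_thresh
    weights *= (1 - lr)                        # decay all component weights
    if matched:                                # adapt the matched component
        weights[k] += lr
        means[k] += lr * (x - means[k])
        vars_[k] += lr * ((x - means[k]) ** 2 - vars_[k])
    else:                                      # replace the weakest component
        k = np.argmin(weights)
        means[k], vars_[k], weights[k] = x, 1.0, lr
    weights /= weights.sum()
    return bool(matched)

means = np.array([0.0, 5.0])
vars_ = np.array([1.0, 1.0])
weights = np.array([0.5, 0.5])
for _ in range(50):                     # the pixel keeps seeing background ~0
    mog_update(0.0, means, vars_, weights)

assert not mog_update(10.0, means, vars_, weights)  # outlier flagged as foreground
assert weights[0] > 0.9                             # background mode dominates
```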
