1.
IEEE Trans Image Process ; 33: 285-296, 2024.
Article in English | MEDLINE | ID: mdl-38090850

ABSTRACT

Transformers have had a great impact on visual tracking thanks to their powerful representation learning capabilities. As model capacity grows, however, tracker speed tends to decrease gradually. Our work focuses on dealing with the massively redundant information in tracking sequences with the Saliency Region Tracker (SRTrack). SRTrack is a heuristic two-stage tracker consisting of a lightweight tracking stage and a saliency stage. The former handles simple tracking sequences, while the latter is designed to perform delicate tracking on challenging frames with more discriminative features. However, the two-stage design leads to feature extrapolation, creating inconsistencies between training and inference features. To mitigate this problem, we develop an attention scaling factor that guarantees model robustness while yielding a slight performance gain. Our SRTrack achieves a state-of-the-art 0.699 AUC while running at 61 FPS on LaSOT. Experiments on several large benchmarks demonstrate the high efficiency and accuracy of SRTrack.
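
The abstract does not specify how frames are routed between the two stages; the sketch below illustrates the general idea of a lightweight stage with a fallback to a heavier stage on hard frames. The confidence score, threshold, and tracker interfaces are illustrative assumptions, not SRTrack's actual design.

```python
def track_sequence(frames, light_tracker, saliency_tracker, conf_threshold=0.6):
    """Route each frame to a lightweight or a heavier 'saliency' stage.

    `light_tracker` and `saliency_tracker` are assumed to be callables returning
    (bounding_box, confidence); the threshold value is an illustrative choice.
    """
    boxes = []
    for frame in frames:
        box, conf = light_tracker(frame)
        if conf < conf_threshold:           # hard frame: fall back to the heavier stage
            box, conf = saliency_tracker(frame)
        boxes.append(box)
    return boxes
```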

2.
IEEE Trans Image Process ; 32: 3690-3701, 2023.
Article in English | MEDLINE | ID: mdl-37384474

ABSTRACT

Estimating the 3D structure of the drivable surface and surrounding environment is a crucial task for assisted and autonomous driving. It is commonly solved either by using 3D sensors such as LiDAR or by directly predicting the depth of points via deep learning. However, the former is expensive, and the latter does not exploit the geometric information of the scene. In this paper, instead of following existing methodologies, we propose the Road Planar Parallax Attention Network (RPANet), a new deep neural network for 3D sensing from monocular image sequences based on planar parallax, which takes full advantage of the omnipresent road plane geometry in driving scenes. RPANet takes a pair of images aligned by the homography of the road plane as input and outputs a γ map (the ratio of height to depth) for 3D reconstruction. The γ map induces a two-dimensional transformation between two consecutive frames: it encodes the planar parallax and, with the road plane serving as a reference, can be used to estimate the 3D structure by warping the consecutive frames. Furthermore, we introduce a novel cross-attention module to help the network better perceive the displacements caused by planar parallax. To verify the effectiveness of our method, we sample data from the Waymo Open Dataset and construct annotations related to planar parallax. Comprehensive experiments on the sampled dataset demonstrate the 3D reconstruction accuracy of our approach in challenging scenarios.
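
For readers unfamiliar with planar parallax, one standard form of the underlying geometric relation (notation ours, not necessarily the paper's) is:

\[
\gamma(\mathbf{p}) = \frac{h}{Z}, \qquad
\mathbf{p}_{\text{warp}} - \mathbf{p} \;\propto\; \gamma(\mathbf{p})\,\bigl(\mathbf{p} - \mathbf{e}\bigr),
\]

where h is the height of the 3D point above the road plane, Z its depth, p the pixel in the reference frame, p_warp its position after warping the second frame by the road-plane homography, and e the epipole; the proportionality constant depends on the camera translation and the distance to the plane. This is why a dense γ map, together with the road plane as a reference, suffices to recover the 3D structure.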

3.
Neural Netw ; 162: 557-570, 2023 May.
Article in English | MEDLINE | ID: mdl-36996687

ABSTRACT

Restoring high-quality images from raw data in low light is challenging due to the various kinds of noise caused by a limited photon count and the complicated Image Signal Processing (ISP) pipeline. Although several restoration and enhancement approaches have been proposed, they may fail in extreme conditions, such as imaging from short-exposure raw data. A first path-breaking attempt exploited the connection between pairs of short- and long-exposure raw data and output RGB images as the final results. However, that pipeline still suffers from blur and color distortion. To overcome these difficulties, we propose an end-to-end network that contains two effective subnets to jointly demosaic and denoise low-exposure raw images. While traditional ISP pipelines struggle to render such data at acceptable quality, short-exposure raw images can be better restored and enhanced by our model. For denoising, the proposed Short2Long raw restoration subnet outputs pseudo long-exposure raw data with little residual noise. Then, for demosaicing, the proposed Color-consistent RGB enhancement subnet generates corresponding RGB images with the desired attributes: sharpness, color vividness, good contrast, and little noise. By training the network in an end-to-end manner, our method avoids additional tuning by experts. Experiments on three raw-data datasets show good results. We also illustrate the effectiveness of each module and the strong generalization ability of the model.
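
A minimal sketch of the two-subnet arrangement described above is given below. Only the short → pseudo-long → RGB data flow is taken from the abstract; the module architectures, channel counts, and class names are placeholders, not the paper's implementation.

```python
import torch.nn as nn

class Short2LongRestoration(nn.Module):
    """Placeholder subnet: maps short-exposure raw to pseudo long-exposure raw."""
    def __init__(self, ch=4):                       # 4 packed Bayer channels (assumption)
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, ch, 3, padding=1),
        )
    def forward(self, raw_short):
        return self.body(raw_short)

class ColorConsistentEnhancement(nn.Module):
    """Placeholder subnet: demosaics the pseudo long-exposure raw into an RGB image."""
    def __init__(self, ch=4):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, 3, padding=1),
        )
    def forward(self, raw_long):
        return self.body(raw_long)

class JointPipeline(nn.Module):
    """End-to-end pipeline: denoise first, then demosaic/enhance."""
    def __init__(self):
        super().__init__()
        self.restore = Short2LongRestoration()
        self.enhance = ColorConsistentEnhancement()
    def forward(self, raw_short):
        pseudo_long = self.restore(raw_short)        # denoising stage
        rgb = self.enhance(pseudo_long)              # demosaicing / enhancement stage
        return pseudo_long, rgb
```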


Subject(s)
Algorithms , Image Enhancement , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Signal-To-Noise Ratio
4.
IEEE Trans Pattern Anal Mach Intell ; 44(7): 3590-3601, 2022 07.
Article in English | MEDLINE | ID: mdl-33621170

ABSTRACT

In neural networks, developing regularization algorithms to address overfitting is one of the major study areas. We propose a new approach to the regularization of neural networks based on the local Rademacher complexity, called LocalDrop. A new regularization function for both fully-connected networks (FCNs) and convolutional neural networks (CNNs), involving drop rates and weight matrices, is developed from a proposed upper bound on the local Rademacher complexity through strict mathematical deduction. The complexity analyses also cover dropout in FCNs and DropBlock in CNNs with keep-rate matrices in different layers. With the new regularization function, we establish a two-stage procedure to obtain the optimal keep-rate matrix and weight matrix for the whole training model. Extensive experiments demonstrate the effectiveness of LocalDrop in different models by comparing it with several algorithms and by studying the effects of different hyperparameters on the final performance.
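
The exact form of the LocalDrop regularizer is not given in the abstract; the sketch below only illustrates the general idea of per-unit keep rates combined with a weight- and keep-rate-dependent penalty that an outer stage can optimize. The penalty is an illustrative stand-in, not the paper's local Rademacher bound.

```python
import torch
import torch.nn as nn

class KeepRateDropout(nn.Module):
    """Dropout with a per-unit keep-rate vector (illustrative stand-in for LocalDrop)."""
    def __init__(self, num_units, init_keep=0.9):
        super().__init__()
        # keep rates are parameters so they can be tuned in an outer/alternating stage
        self.keep = nn.Parameter(torch.full((num_units,), init_keep))
    def forward(self, x):
        keep = self.keep.clamp(0.05, 1.0)
        if self.training:
            mask = torch.bernoulli(keep.detach().expand_as(x))
            return x * mask / keep            # inverted-dropout rescaling
        return x

def illustrative_penalty(weight, keep, lam=1e-4):
    """A keep-rate- and weight-dependent penalty (assumption, not the paper's bound)."""
    return lam * (keep.clamp(0.05, 1.0) * weight.pow(2).sum(dim=1)).sum()
```

In an alternating scheme, the weights would be trained with this penalty added to the task loss, and the keep rates then updated with the weights fixed, echoing the two-stage procedure mentioned above.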


Subject(s)
Algorithms , Neural Networks, Computer
5.
Annu Int Conf IEEE Eng Med Biol Soc ; 2021: 2972-2975, 2021 11.
Article in English | MEDLINE | ID: mdl-34891869

ABSTRACT

Cone-Beam Computed Tomography (CBCT) is an imaging modality used to acquire 3D volumetric images of the human body. CBCT plays a vital role in diagnosing dental diseases, especially cyst- or tumour-like lesions. Current computer-aided detection and diagnostic systems have demonstrated diagnostic value in a range of diseases; however, the capability of such deep learning methods on transmissive lesions has not been investigated. In this study, we propose an automatic method for the detection of transmissive lesions of the jawbones using CBCT images. We integrate a pre-trained DenseNet with pathological information to reduce the intra-class variation within a patient's images in the 3D volume (stack), which may otherwise affect the performance of the model. Our proposed method separates each CBCT stack into seven intervals based on disease manifestation. To evaluate the performance of our method, we created a new dataset containing CBCT data from 353 patients. A patient-wise image division strategy was employed to split the training and test sets. An overall lesion detection accuracy of 80.49% was achieved, outperforming the baseline DenseNet result of 77.18%. The results demonstrate the feasibility of our method for detecting transmissive lesions in CBCT images. Clinical relevance: the proposed strategy aims at providing automatic detection of transmissive lesions of the jawbones from CBCT images, which can reduce the workload of clinical radiologists, improve their diagnostic efficiency, and meet the preliminary requirements for diagnosing this kind of disease when radiologists are in short supply.
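
The patient-wise split mentioned above can be implemented with a grouped split so that slices from one patient never appear in both sets. A minimal sketch (variable names are illustrative; the paper's exact split procedure is not reproduced here):

```python
from sklearn.model_selection import GroupShuffleSplit

def patient_wise_split(image_paths, labels, patient_ids, test_size=0.2, seed=0):
    """Split so that all images of a given patient fall entirely in train or test."""
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train_idx, test_idx = next(splitter.split(image_paths, labels, groups=patient_ids))
    return train_idx, test_idx
```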


Subject(s)
Spiral Cone-Beam Computed Tomography , Cone-Beam Computed Tomography , Humans , Imaging, Three-Dimensional
6.
IEEE Trans Image Process ; 30: 1784-1798, 2021.
Article in English | MEDLINE | ID: mdl-33417551

ABSTRACT

Image inpainting is a challenging computer vision task that aims to fill in missing regions of corrupted images with realistic content. With the development of convolutional neural networks, many deep learning models have been proposed to solve image inpainting by learning information from a large amount of data. In particular, existing algorithms usually follow an encoding and decoding network architecture in which some operations with standard schemes are employed, such as static convolution, which only considers pixels on fixed grids, and a monotonous normalization style (e.g., batch normalization). However, these techniques are not well suited to the image inpainting task because the randomly corrupted regions in the input images tend to mislead the inpainting process and generate unreasonable content. In this paper, we propose a novel dynamic selection network (DSNet) to solve this problem in image inpainting tasks. The principal idea of the proposed DSNet is to distinguish the corrupted regions from the valid ones throughout the entire network architecture, which helps make full use of the information in the known area. Specifically, the proposed DSNet has two novel dynamic selection modules, namely, the validness migratable convolution (VMC) and regional composite normalization (RCN) modules, which share a dynamic selection mechanism that helps utilize valid pixels better. By replacing vanilla convolution with the VMC module, spatial sampling locations are dynamically selected in the convolution phase, resulting in a more flexible feature extraction process. In addition, the RCN module not only combines several normalization methods but also normalizes the feature regions selectively. Therefore, the proposed DSNet can produce realistic and finely detailed images by adaptively selecting features and normalization styles. Experimental results on three public datasets show that our proposed method outperforms state-of-the-art methods both quantitatively and qualitatively.
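
The VMC and RCN modules are not specified in detail in the abstract; as a rough analogue of convolving only on valid pixels, the sketch below shows a mask-aware convolution that re-weights each window by its fraction of valid pixels. This mirrors partial-convolution-style gating, not DSNet's exact operators.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskAwareConv(nn.Module):
    """Convolution re-weighted by the valid-pixel ratio in each window (illustrative)."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False)
        self.register_buffer("ones", torch.ones(1, 1, k, k))
        self.k2 = float(k * k)
    def forward(self, x, mask):
        # mask: 1 for valid pixels, 0 for corrupted ones, shape (N, 1, H, W)
        out = self.conv(x * mask)
        valid = F.conv2d(mask, self.ones, padding=self.conv.padding[0])
        out = out * (self.k2 / valid.clamp(min=1.0))   # rescale by the valid ratio
        new_mask = (valid > 0).float()                 # a window with any valid pixel
        return out, new_mask
```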

7.
IEEE Trans Cybern ; 51(2): 673-685, 2021 Feb.
Article in English | MEDLINE | ID: mdl-31021816

ABSTRACT

In this paper, we propose a novel nonlocal patch tensor-based visual data completion algorithm and analyze its potential problems. Our algorithm consists of two steps: the first initializes the image with triangulation-based linear interpolation, and the second groups similar nonlocal patches into a tensor and then applies the proposed tensor completion technique. Specifically, by treating a group of patch matrices as a tensor, we impose a low-rank constraint on the tensor through the recently proposed tensor nuclear norm. Moreover, we observe that after the first interpolation step the image becomes blurred, so the similar patches we find may not exactly match the reference. We name this problem "Patch Mismatch," and, to avoid the error it causes, we further decompose the patch tensor into a low-rank tensor and a sparse tensor, where the sparse tensor accounts for the mismatched horizontal strips in the grouped patches. Furthermore, our theoretical analysis shows that the error caused by Patch Mismatch can be decomposed into two components: one can be bounded under a reasonable assumption we call local patch similarity, and the other is lower than the corresponding error of matrix completion. Extensive experimental results on real-world datasets verify our method's superiority over state-of-the-art tensor-based image inpainting methods.
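
The tensor nuclear norm used above is typically minimized via singular value thresholding applied to the frontal slices in the Fourier domain (the t-SVD construction). A minimal numpy sketch of that proximal step is shown below; the patch grouping, the sparse term, and the full completion loop are omitted, and `tau` is just the shrinkage threshold.

```python
import numpy as np

def tsvd_shrink(tensor, tau):
    """Proximal operator of the t-SVD-based tensor nuclear norm (illustrative).

    Soft-thresholds the singular values of each frontal slice in the FFT domain
    along the third mode, then transforms back.
    """
    n1, n2, n3 = tensor.shape
    fft_t = np.fft.fft(tensor, axis=2)
    out = np.zeros_like(fft_t, dtype=complex)
    for k in range(n3):
        u, s, vh = np.linalg.svd(fft_t[:, :, k], full_matrices=False)
        s = np.maximum(s - tau, 0.0)                  # soft-threshold singular values
        out[:, :, k] = (u * s) @ vh
    return np.real(np.fft.ifft(out, axis=2))
```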

8.
Article in English | MEDLINE | ID: mdl-32386150

ABSTRACT

Synthetic visual data refers to data automatically rendered by mature computer graphics algorithms. With the rapid development of these techniques, we can now collect photo-realistic synthetic images with accurate pixel-level annotations without much effort. However, due to the domain gaps between synthetic data and real data, in terms of not only visual appearance but also label distribution, directly applying models trained on synthetic images to real ones can hardly yield satisfactory performance. Since collecting accurate labels for real images is laborious and time-consuming, developing algorithms that can learn from synthetic images is of great significance. In this paper, we propose a novel framework, named Active Pseudo-Labeling (APL), to reduce the domain gaps between synthetic images and real images. In the APL framework, we first predict pseudo-labels for the unlabeled real images in the target domain by actively adapting the style of the real images to the source domain. Specifically, the style of real images is adjusted via a novel task-guided generative model, and pseudo-labels are then predicted for these actively adapted images. Lastly, we fine-tune the source-trained model in the pseudo-labeled target domain, which helps to fit the distribution of the real data. Experiments on both semantic segmentation and object detection tasks with several challenging benchmark datasets demonstrate the superiority of our proposed method over existing state-of-the-art approaches.
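
Read schematically, the loop described above is: style-adapt the real images toward the synthetic (source) domain, predict pseudo-labels on the adapted images, then fine-tune on them. A sketch under those assumptions, with every component a placeholder rather than the paper's actual training recipe:

```python
def active_pseudo_labeling(source_model, style_adapter, real_images,
                           confidence_threshold=0.9, fine_tune_epochs=5):
    """Schematic APL loop (placeholder interfaces; confidence filtering is an assumption)."""
    pseudo_set = []
    for image in real_images:
        adapted = style_adapter(image)                  # shift real image toward source style
        prediction, confidence = source_model.predict(adapted)
        if confidence >= confidence_threshold:          # keep only confident pseudo-labels
            pseudo_set.append((image, prediction))
    source_model.fine_tune(pseudo_set, epochs=fine_tune_epochs)   # adapt to the target domain
    return source_model
```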

9.
Neural Netw ; 123: 261-272, 2020 Mar.
Article in English | MEDLINE | ID: mdl-31887686

ABSTRACT

Face alignment is a typical facial behavior analysis task in computer vision. However, the performance of face alignment degrades greatly when the face image is partially occluded. In order to achieve a better mapping between facial appearance features and shape increments, we propose a robust, occlusion-free face alignment algorithm in which a face de-occlusion module and a deep regression module are integrated into a cascaded deep generative regression model. The face de-occlusion module is a disentangled-representation-learning Generative Adversarial Network (GAN) that aims to locate occlusions and recover the genuine appearance from a partially occluded face image. The deep regression module enhances the facial appearance representation by utilizing the recovered faces to obtain more accurate regressors. Then, through the cascaded deep generative regression model, we recover the partially occluded face image and gradually achieve accurate localization of landmarks. Notably, the cascaded deep generative regression model can effectively locate occlusions and recover more genuine faces, which can further improve the performance of face alignment. Experimental results on four challenging occluded face datasets demonstrate that our method outperforms state-of-the-art methods.


Subject(s)
Biometric Identification/methods , Image Processing, Computer-Assisted/methods , Face/anatomy & histology , Humans
10.
Article in English | MEDLINE | ID: mdl-31369373

ABSTRACT

Domain adaptation approaches based on the covariate shift assumption usually utilize only one common transformation to align the marginal distributions while preserving the conditional distributions. However, a single common transformation may cause a loss of useful information, such as variances and neighborhood relationships, in both the source and target domains. To address this problem, we propose a novel method called homologous component analysis (HCA), in which we seek two different but homologous transformations that align the distributions with side information while preserving the conditional distributions. As it is hard to find a closed-form solution to the corresponding optimization problem, we solve it by means of the alternating direction method of multipliers (ADMM) in the context of Stiefel manifolds. We also provide a generalization error bound for domain adaptation in the semi-supervised case and show that two transformations can decrease this upper bound more than a single common transformation does. Extensive experiments on synthetic and real data show the effectiveness of the proposed method by comparing its classification accuracy with state-of-the-art methods, and numerical evidence on the chordal distance and Frobenius distance shows that the resulting optimal transformations are indeed different.
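
For reference, the two distances mentioned at the end are standard measures of how far apart two transformations are; with orthonormal bases U and V on the Stiefel manifold, one common definition is

\[
d_{\mathrm{chordal}}(U, V) \;=\; \frac{1}{\sqrt{2}}\,\bigl\lVert U U^{\top} - V V^{\top} \bigr\rVert_F,
\qquad
d_{F}(U, V) \;=\; \lVert U - V \rVert_F ,
\]

where the chordal distance compares the subspaces spanned by the transformations and the Frobenius distance compares the matrices entrywise. The paper's exact conventions may differ; this is the usual textbook form.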

11.
IEEE Trans Cybern ; 49(7): 2406-2419, 2019 Jul.
Article in English | MEDLINE | ID: mdl-29994036

ABSTRACT

A hyperspectral image (HSI) contains a large amount of spatial-spectral information, which poses an enormous challenge for traditional classification methods in discriminating land-cover types. Feature learning is very effective at improving classification performance. However, current feature learning approaches are mostly based on a simple intrinsic structure. To represent the complex intrinsic spatial-spectral structure of HSI, a novel feature learning algorithm, termed spatial-spectral hypergraph discriminant analysis (SSHGDA), is proposed on the basis of spatial-spectral information, discriminant information, and hypergraph learning. SSHGDA constructs a reconstruction between-class scatter matrix, a weighted within-class scatter matrix, an intraclass spatial-spectral hypergraph, and an interclass spatial-spectral hypergraph to represent the intrinsic properties of HSI. Then, in a low-dimensional space, a feature learning model is designed to compact the intraclass information and separate the interclass information. With this model, an optimal projection matrix can be obtained to extract the spatial-spectral features of HSI. SSHGDA can effectively reveal the complex spatial-spectral structures of HSI and enhance the discriminating power of features for land-cover classification. Experimental results on the Indian Pines and PaviaU HSI data sets show that SSHGDA achieves better classification accuracy than several state-of-the-art methods.

12.
IEEE Trans Cybern ; 49(4): 1440-1453, 2019 Apr.
Article in English | MEDLINE | ID: mdl-29994595

ABSTRACT

Semisupervised learning (SSL) methods have proved effective at alleviating the shortage of labeled samples by using a large number of unlabeled samples together with a small number of labeled samples. However, many traditional SSL methods are not robust when the labels contain substantial noise. To address this issue, in this paper we propose a robust graph-based SSL method based on the maximum correntropy criterion to learn a robust model with strong generalization. In detail, the graph-based SSL framework is improved by imposing supervised information on the regularizer, which strengthens the constraint on the labels and thus ensures that the predicted labels of each cluster are close to the true labels. Furthermore, the maximum correntropy criterion is introduced into the graph-based SSL framework to suppress label noise. Extensive image classification experiments demonstrate the generalization and robustness of the proposed SSL method.
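
For reference, the correntropy between two variables A and B that the maximum correntropy criterion maximizes is typically defined with a Gaussian kernel:

\[
V_\sigma(A, B) \;=\; \mathbb{E}\!\left[\kappa_\sigma(A - B)\right],
\qquad
\kappa_\sigma(e) \;=\; \exp\!\left(-\frac{e^2}{2\sigma^2}\right),
\]

and is estimated from samples as \(\hat{V}_\sigma = \frac{1}{N}\sum_{i=1}^{N}\kappa_\sigma(a_i - b_i)\). Because the Gaussian kernel saturates for large residuals, maximizing correntropy automatically down-weights points whose labels are badly corrupted, which is the robustness property exploited above.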

13.
IEEE Trans Cybern ; 48(1): 16-28, 2018 Jan.
Article in English | MEDLINE | ID: mdl-28113695

ABSTRACT

In hyperspectral remote sensing data mining, it is important to take into account both spectral and spatial information, such as the spectral signature, texture features, and morphological properties, to improve performance, e.g., image classification accuracy. From a feature representation point of view, a natural approach to this situation is to concatenate the spectral and spatial features into a single, high-dimensional vector and then apply a dimension reduction technique directly to that concatenated vector before feeding it into the subsequent classifier. However, multiple features from various domains have different physical meanings and statistical properties, so such concatenation does not efficiently exploit the complementary properties among the different features, which would otherwise help boost feature discriminability. Furthermore, it is also difficult to interpret the transformed results of the concatenated vector. Consequently, finding a physically meaningful, consensus low-dimensional feature representation of the original multiple features remains a challenging task. To address these issues, we propose a novel feature learning framework, namely a simultaneous spectral-spatial feature selection and extraction algorithm, for spectral-spatial feature representation and classification of hyperspectral images. Specifically, the proposed method learns a latent low-dimensional subspace by projecting the spectral-spatial features into a common feature space, where the complementary information is effectively exploited and, simultaneously, only the most significant original features are transformed. Encouraging experimental results on three publicly available hyperspectral remote sensing datasets confirm that our proposed method is effective and efficient.

14.
J Environ Manage ; 205: 85-98, 2018 Jan 01.
Article in English | MEDLINE | ID: mdl-28968590

ABSTRACT

Severe environmental and health impacts have been experienced in the Western Lake Erie Basin (WLEB) because of eutrophication and associated proliferation of harmful algae blooms. Efforts to improve water quality within the WLEB have been on-going for several decades. However, water quality improvements in the basin have not been realized as anticipated. In this study, factors affecting water quality within the WLEB were evaluated with a view to differentiating their impacts and informing further assessments in the basin. Over the long-term (1966-2015) and basin-wide, total annual precipitation increased significantly by about 2.4 mm/year while mean monthly streamflows also increased during the same period although the increase was not significant (p = 0.36). There was, however, a significant increase in spring streamflows during this period (p = 0.003). Patterns in water quality parameters showed significant reductions in total suspended solids (TSS) (p < 0.001) and total phosphorus (TP) (p = 0.018) while soluble reactive phosphorus (SRP) increased significantly (p < 0.001), and in particular from about 1995. Results of near-term (2005-2015) analysis showed a non-significant (p = 0.262) reduction in TSS concentrations of about 0.25 mg/L/year. TP concentrations did not vary substantially during the same period while a 0.11 mg/L/year increase in nitrate and a 0.001 mg/L/year increase in SRP were observed, with increases in nitrates being significant (p = 0.013). TP and SRP concentrations, however, remained high within the basin with daily values ranging between 0.03 and 1.84 mg/L and less than 0.002-0.52 mg/L, respectively. Basin-wide, both spring precipitation and spring streamflows increased significantly during the period 2005-2015 (p < 0.001). Overall, no substantial changes in land use were observed, suggesting that water quality responses might be attributable to management. Based on recent data, corn acreage in the basin and fertilizer applied to corn increased by 33% and 10% respectively. Combined Sewer Overflows (CSOs) and impoundments were also important factors due to their prevalence in the basin. Based on the analysis, changes in agricultural management, increase in spring precipitation, CSOs, legacy phosphorus, and the presence of dams were thought to present constraints to water quality improvements despite conservation efforts within the basin.


Subject(s)
Eutrophication , Water Quality , Environmental Monitoring , Lakes , Phosphorus , Quality Improvement , Water
15.
IEEE Trans Image Process ; 26(4): 1694-1707, 2017 Apr.
Article in English | MEDLINE | ID: mdl-28092540

ABSTRACT

Multi-label learning draws great interest in many real-world applications. It is highly costly for an oracle to assign many labels to a single instance. Meanwhile, it is also hard to build a good model without identifying discriminative labels. Can we reduce labeling costs and improve the ability to train a good model for multi-label learning at the same time? Active learning addresses the problem of scarce training samples by querying the most valuable samples to achieve better performance at little cost. In multi-label active learning, some research has queried the relevant labels with fewer training samples or queried all labels without identifying the discriminative information. None of these approaches can effectively handle outlier labels when measuring uncertainty. Since the maximum correntropy criterion (MCC) provides a robust analysis of outliers in many machine learning and data mining algorithms, in this paper we derive a robust multi-label active learning algorithm based on MCC by merging uncertainty and representativeness, and propose an efficient alternating optimization method to solve it. With MCC, our method can eliminate the influence of outlier labels that are not discriminative for measuring uncertainty. To further improve the information measurement, we merge uncertainty and representativeness with the predicted labels of unknown data. This not only enhances the uncertainty measure but also improves the similarity measurement of multi-label data with label information. Experiments on benchmark multi-label datasets show superior performance over state-of-the-art methods.

16.
IEEE Trans Cybern ; 47(4): 1017-1027, 2017 Apr.
Article in English | MEDLINE | ID: mdl-26992191

ABSTRACT

Deep networks have achieved excellent performance in learning representations from visual data. However, supervised deep models such as convolutional neural networks require large quantities of labeled data, which are very expensive to obtain. To solve this problem, this paper proposes an unsupervised deep network, called stacked convolutional denoising auto-encoders, which can map images to hierarchical representations without any label information. The network, optimized by layer-wise training, is constructed by stacking layers of denoising auto-encoders in a convolutional way. In each layer, high-dimensional feature maps are generated by convolving features of the lower layer with kernels learned by a denoising auto-encoder. The auto-encoder is trained on patches extracted from feature maps in the lower layer to learn robust feature detectors. To better train the large network, a layer-wise whitening technique is introduced into the model: before each convolutional layer, a whitening layer is embedded to sphere the input data. Through layers of mapping, raw images are transformed into high-level feature representations that boost the performance of the subsequent support vector machine classifier. The proposed algorithm is evaluated through extensive experiments and demonstrates superior classification performance to state-of-the-art unsupervised networks.
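
A minimal convolutional denoising auto-encoder of the kind stacked above might look as follows. Layer sizes, kernel size, noise level, and the omission of the whitening step are illustrative choices, not the paper's configuration; the point is the corrupt-then-reconstruct, layer-wise training pattern.

```python
import torch
import torch.nn as nn

class ConvDenoisingAE(nn.Module):
    """One layer of a stacked convolutional denoising auto-encoder (illustrative)."""
    def __init__(self, in_ch, hidden_ch=64, k=5, noise_std=0.1):
        super().__init__()
        self.noise_std = noise_std
        self.encoder = nn.Sequential(nn.Conv2d(in_ch, hidden_ch, k, padding=k // 2),
                                     nn.ReLU(inplace=True))
        self.decoder = nn.Conv2d(hidden_ch, in_ch, k, padding=k // 2)
    def forward(self, x):
        corrupted = x + self.noise_std * torch.randn_like(x)   # denoising corruption
        code = self.encoder(corrupted)
        recon = self.decoder(code)
        return code, recon

def train_layer(ae, loader, epochs=10, lr=1e-3):
    """Layer-wise training: reconstruct the clean input from its corrupted version."""
    opt = torch.optim.Adam(ae.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for x in loader:
            _, recon = ae(x)
            loss = loss_fn(recon, x)
            opt.zero_grad(); loss.backward(); opt.step()
```

After training one layer, its encoder outputs (the feature maps) would serve as the input to the next layer, yielding the stacked hierarchy described above.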

17.
IEEE Trans Cybern ; 47(1): 14-26, 2017 Jan.
Article in English | MEDLINE | ID: mdl-26595936

ABSTRACT

How can we find a general way to choose the most suitable samples for training a classifier, even with very limited prior information? Active learning, which can be regarded as an iterative optimization procedure, plays a key role in constructing a refined training set to improve classification performance in a variety of applications, such as text analysis, image recognition, and social network modeling. Although combining representativeness and informativeness of samples has been proven promising for active sampling, state-of-the-art methods perform well only under certain data structures. Can we, then, find a way to fuse the two active sampling criteria without any assumption on the data? This paper proposes a general active learning framework that effectively fuses the two criteria. Inspired by a two-sample discrepancy problem, triple measures are elaborately designed to guarantee that the queried samples not only possess the representativeness of the unlabeled data but also reveal the diversity of the labeled data. Any appropriate similarity measure can be employed to construct the triple measures. Meanwhile, an uncertainty measure is leveraged to generate the informativeness criterion, which can be carried out in different ways. Rooted in this framework, a practical active learning algorithm is proposed that exploits a radial basis function together with the estimated probabilities to construct the triple measures, and a modified best-versus-second-best strategy to construct the uncertainty measure. Experimental results on benchmark datasets demonstrate that our algorithm consistently achieves superior performance over state-of-the-art active learning algorithms.
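
The best-versus-second-best strategy mentioned above builds on the margin between the two largest predicted class probabilities. The plain (unmodified) BvSB score looks like this; the paper's modification and the RBF-based triple measures are not reproduced here.

```python
import numpy as np

def bvsb_uncertainty(probabilities):
    """Best-versus-second-best margin: a smaller margin means a more uncertain sample.

    `probabilities` has shape (n_samples, n_classes); returns a score in which
    larger values indicate more uncertain samples.
    """
    sorted_p = np.sort(probabilities, axis=1)
    margin = sorted_p[:, -1] - sorted_p[:, -2]   # best minus second-best probability
    return 1.0 - margin                          # invert so that larger = more uncertain
```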
