Results 1 - 20 of 28
1.
Article in English | MEDLINE | ID: mdl-38819971

ABSTRACT

Vision-Language Navigation (VLN) requires an agent to follow language instructions to reach a target position. A key factor for successful navigation is aligning the landmarks implied in the instruction with diverse visual observations. However, previous VLN agents fail to perform accurate modality alignment, especially in unexplored scenes, since they learn from limited navigation data and lack sufficient open-world alignment knowledge. In this work, we propose a new VLN paradigm, COrrectable LaNdmark DiScOvery via Large ModEls (CONSOLE). In CONSOLE, we cast VLN as an open-world sequential landmark discovery problem by introducing a novel correctable landmark discovery scheme based on two large models, ChatGPT and CLIP. Specifically, we use ChatGPT to provide rich open-world commonsense about landmark co-occurrence and conduct CLIP-driven landmark discovery based on these commonsense priors. To mitigate the noise in the priors caused by the lack of visual constraints, we introduce a learnable co-occurrence scoring module, which corrects the importance of each co-occurrence according to actual observations for accurate landmark discovery. We further design an observation enhancement strategy that combines our framework elegantly with different VLN agents, using the corrected landmark features to obtain enhanced observation features for action decisions. Extensive experimental results on multiple popular VLN benchmarks (R2R, REVERIE, R4R, RxR) show the significant superiority of CONSOLE over strong baselines. In particular, CONSOLE establishes new state-of-the-art results on R2R and R4R in unseen scenarios.

2.
Article in English | MEDLINE | ID: mdl-38512732

ABSTRACT

Self-supervised learning aims to learn representations that generalize effectively to downstream tasks. Many self-supervised approaches regard two views of an image as both the input and the self-supervised signal, assuming that either view contains the same task-relevant information and that the shared information is (approximately) sufficient for predicting downstream tasks. Recent studies show that discarding superfluous information not shared between the views can improve generalization. Hence, the ideal representation is sufficient for downstream tasks and contains minimal superfluous information; it is termed the minimal sufficient representation. One can learn this representation by maximizing the mutual information between the representation and the supervised view while eliminating superfluous information. Nevertheless, the computation of mutual information is notoriously intractable. In this work, we propose an objective termed the multi-view entropy bottleneck (MVEB) to learn the minimal sufficient representation effectively. MVEB simplifies minimal sufficient learning to maximizing both the agreement between the embeddings of two views and the differential entropy of the embedding distribution. Our experiments confirm that MVEB significantly improves performance. For example, it achieves a top-1 accuracy of 76.9% on ImageNet with a vanilla ResNet-50 backbone under linear evaluation. To the best of our knowledge, this is a new state-of-the-art result with ResNet-50.
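As a rough illustration of the objective's two terms, the sketch below scores a batch of paired view embeddings with a cosine-agreement term plus a Gaussian differential-entropy proxy computed from the batch covariance. The Gaussian proxy and all parameter choices are illustrative stand-ins, not the paper's estimator.

```python
import numpy as np

def mveb_surrogate_loss(z1, z2, eps=1e-6):
    """Illustrative two-term objective: view agreement (cosine) plus a
    Gaussian differential-entropy proxy of the embedding distribution.
    The Gaussian proxy is a hypothetical stand-in for MVEB's estimator."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    agreement = np.mean(np.sum(z1 * z2, axis=1))       # mean cosine similarity
    z = np.concatenate([z1, z2], axis=0)
    cov = np.cov(z, rowvar=False) + eps * np.eye(z.shape[1])
    d = cov.shape[0]
    # differential entropy of a Gaussian: 0.5 * log det(2*pi*e*cov)
    entropy = 0.5 * (d * np.log(2 * np.pi * np.e) + np.linalg.slogdet(cov)[1])
    return -(agreement + entropy)                      # minimized during training
```

Maximizing the agreement term pulls the two views together, while the entropy term keeps the embedding distribution spread out and prevents collapse.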

3.
Sci Rep ; 13(1): 20667, 2023 Nov 24.
Article in English | MEDLINE | ID: mdl-38001131

ABSTRACT

To address the engineering problem of roadway deformation and instability in the swelling soft rock that is widespread in the Kailuan mining area, the mineral composition and microstructure of the rock were characterized through scanning electron microscopy, X-ray diffraction, and uniaxial and conventional triaxial tests, and the softening and swelling behavior of the rock and the failure mechanism of the surrounding rock were identified. A combined support scheme of multi-level anchor bolts, bottom-corner pressure relief, and fractional grouting is proposed. The roadway support parameters were adjusted and optimized through FLAC3D numerical simulation, and the three support methods of multi-level anchor bolts, bottom-corner pressure relief, and fractional grouting were determined with optimized parameters. The results show that the total clay mineral content is 53-75%, and pores, fissures, and nanoscale and micron-scale interlayer gaps are well developed, providing penetrating channels for water to infiltrate and soften the surrounding rock. The three-level anchor, pressure-relief, and grouting support technology controls roof sag within 170 mm and floor heave within 210 mm; the bolts at each level are evenly loaded in tension, and the maximum stress and floor-heave displacement in the pressure-relief area are significantly reduced. The pressure-relief groove promotes the development of bottom-corner cracks, accelerates the redistribution of peripheral stress, and weakens the effect of high stress on the shallow area. Optimizing the grouting timing with time or displacement as the index fills the primary and excavation-induced cracks, blocks the swelling and softening effect of water on the rock mass, and achieves the dynamic unity of structural yielding and surrounding-rock modification, which provides guidance for the support control of soft rock roadways.

4.
Sensors (Basel) ; 23(20)2023 Oct 20.
Article in English | MEDLINE | ID: mdl-37896686

ABSTRACT

The precise detection of stratum interfaces is important for geological discontinuity recognition and roadway support optimization. In this study, a model for locating rock interfaces through change-point detection was proposed, and a drilling test on composite-strength mortar specimens was conducted. Using a logistic function and the particle swarm optimization algorithm, the drilling specific energy was modulated to detect the stratum interface. The results indicate that the drilling specific energy, after modulation by the logistic function, showed good noise immunity during stable drilling and good sensitivity during interface drilling; its average recognition error was 2.83 mm, lower than the error of 6.56 mm before modulation. The particle swarm optimization algorithm adaptively matched the drive parameters to the features of the drilling data, yielding a substantial 50.88% decrease in the recognition error rate. This study contributes to enhancing the perception accuracy of stratum interfaces and eliminating the potential danger of roof collapse.
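The logistic modulation step can be sketched as follows. The steepness `k` and midpoint `x0` are hypothetical stand-ins for the parameters the paper tunes with particle swarm optimization, and the change-point pick (largest jump of the modulated signal) is a simplified placeholder for the paper's detection model.

```python
import numpy as np

def modulate_dse(dse, k=1.0, x0=None):
    """Logistic modulation of drilling specific energy (sketch).
    k (steepness) and x0 (midpoint) are illustrative parameters."""
    if x0 is None:
        x0 = np.median(dse)            # a simple default midpoint
    return 1.0 / (1.0 + np.exp(-k * (dse - x0)))

def locate_interface(depths, dse, k=1.0):
    """Return the depth with the sharpest change of the modulated signal."""
    m = modulate_dse(dse, k=k)
    step = np.abs(np.diff(m))
    return depths[np.argmax(step) + 1]
```

The logistic squashing suppresses small fluctuations during stable drilling (flat tails of the sigmoid) while amplifying the transition when the specific energy crosses the midpoint at an interface.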

5.
IEEE Trans Pattern Anal Mach Intell ; 45(10): 12133-12147, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37200122

ABSTRACT

Despite the substantial progress of active learning for image recognition, instance-level active learning for object detection has lacked a systematic investigation. In this paper, we propose to unify instance uncertainty calculation with image uncertainty estimation for informative image selection, creating a multiple instance differentiation learning (MIDL) method for instance-level active learning. MIDL consists of a classifier prediction differentiation module and a multiple instance differentiation module. The former leverages two adversarial instance classifiers trained on the labeled and unlabeled sets to estimate the instance uncertainty of the unlabeled set. The latter treats unlabeled images as instance bags and re-estimates image-instance uncertainty using the instance classification model in a multiple instance learning fashion. By weighting the instance uncertainty with the instance class probability and the instance objectness probability under the total probability formula, MIDL unifies image uncertainty with instance uncertainty in a Bayesian framework. Extensive experiments validate that MIDL sets a solid baseline for instance-level active learning. On commonly used object detection datasets, it outperforms other state-of-the-art methods by significant margins, particularly when the labeled sets are small.
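The total-probability weighting described above can be sketched in a few lines; the normalized weighted sum used for pooling is an assumption for illustration, not the authors' exact formulation.

```python
import numpy as np

def image_uncertainty(instance_unc, class_prob, objectness):
    """Pool per-instance uncertainties into one image-level score by
    weighting each instance with its class probability times its
    objectness (sketch of a total-probability weighting; the pooling
    rule is a hypothetical simplification)."""
    w = class_prob * objectness
    w = w / w.sum()                    # normalize weights over instances
    return float((w * instance_unc).sum())
```

Instances that are likely background (low objectness) contribute little, so the image score is dominated by uncertain, object-like instances.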

6.
IEEE Trans Pattern Anal Mach Intell ; 45(8): 10027-10043, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37022275

ABSTRACT

Super-Resolution from a single motion-Blurred image (SRB) is a severely ill-posed problem due to the joint degradation of motion blur and low spatial resolution. In this article, we employ events to alleviate the burden of SRB and propose an Event-enhanced SRB (E-SRB) algorithm, which can generate a sequence of sharp and clear High-Resolution (HR) images from a single blurry Low-Resolution (LR) image. To this end, we formulate an event-enhanced degeneration model that considers low spatial resolution, motion blur, and event noise simultaneously. We then build an event-enhanced Sparse Learning Network (eSL-Net++) upon a dual sparse learning scheme in which both events and intensity frames are modeled with sparse representations. Furthermore, we propose an event shuffle-and-merge scheme that extends single-frame SRB to sequence-frame SRB without any additional training. Experimental results on synthetic and real-world datasets show that the proposed eSL-Net++ outperforms state-of-the-art methods by a large margin. Datasets, code, and more results are available at https://github.com/ShinyWang33/eSL-Net-Plusplus.

7.
IEEE Trans Image Process ; 31: 4023-4038, 2022.
Article in English | MEDLINE | ID: mdl-35679376

ABSTRACT

In recent years, image denoising has benefited greatly from deep neural networks. However, these models need large amounts of noisy-clean image pairs for supervision. Although there have been attempts to train denoising networks with only noisy images, existing self-supervised algorithms suffer from inefficient network training, heavy computational burden, or dependence on noise modeling. In this paper, we propose a self-supervised framework named Neighbor2Neighbor for deep image denoising. We develop a theoretical motivation and prove that, by designing specific samplers that generate training image pairs from only noisy images, we can train a self-supervised denoising network similar to one trained with clean-image supervision. In addition, we propose a regularizer, from the perspective of optimization, to narrow the gap between the self-supervised denoiser and the supervised denoiser. Based on these theoretical understandings, we present a very simple yet effective self-supervised training scheme: training image pairs are generated by random neighbor sub-samplers, and denoising networks are trained with a regularized loss. Moreover, we propose a training strategy named BayerEnsemble to adapt the Neighbor2Neighbor framework to raw image denoising. The proposed framework can benefit from the progress of state-of-the-art supervised denoising networks in architecture design, and it avoids heavy dependence on assumptions about the noise distribution. We evaluate the Neighbor2Neighbor framework through extensive experiments, including synthetic experiments with different noise distributions and real-world experiments under various scenarios. The code is available online: https://github.com/TaoHuang2018/Neighbor2Neighbor.
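The random neighbor sub-sampler at the heart of the scheme can be sketched as follows: each 2x2 cell of the noisy image contributes one pixel to each of two half-resolution sub-images, which then serve as the input/target training pair. The pair selection within each cell is a simplified reading of the paper, and the regularized loss and denoising network are omitted.

```python
import numpy as np

def neighbor_subsample(noisy, rng):
    """Generate a training pair from one noisy image by picking two
    different pixels inside every 2x2 cell (sketch of a
    Neighbor2Neighbor-style sampler)."""
    h, w = noisy.shape
    h2, w2 = h // 2, w // 2
    # reshape into (h2, w2, 4): the 4 pixels of each 2x2 cell
    cells = (noisy[:h2 * 2, :w2 * 2]
             .reshape(h2, 2, w2, 2)
             .transpose(0, 2, 1, 3)
             .reshape(h2, w2, 4))
    idx1 = rng.integers(0, 4, size=(h2, w2))
    offset = rng.integers(1, 4, size=(h2, w2))   # guarantees a different pixel
    idx2 = (idx1 + offset) % 4
    rows = np.arange(h2)[:, None]
    cols = np.arange(w2)[None, :]
    return cells[rows, cols, idx1], cells[rows, cols, idx2]
```

Because the two sub-images come from neighboring pixels, their clean contents are nearly identical while their noise realizations are independent, which is what lets one act as a supervision target for the other.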

8.
IEEE Trans Neural Netw Learn Syst ; 33(12): 7091-7100, 2022 Dec.
Article in English | MEDLINE | ID: mdl-34125685

ABSTRACT

We propose a novel network pruning approach based on preserving the information of pretrained network weights (filters). Network pruning with information preserving is formulated as a matrix sketch problem, which is efficiently solved by the off-the-shelf frequent directions method. Our approach, referred to as FilterSketch, encodes the second-order information of pretrained weights, which enables the representation capacity of pruned networks to be recovered with a simple fine-tuning procedure. FilterSketch requires neither training from scratch nor data-driven iterative optimization, leading to a reduction of several orders of magnitude in the time cost of pruning optimization. Experiments on CIFAR-10 show that FilterSketch reduces floating-point operations (FLOPs) by 63.3% and prunes 59.9% of the network parameters with negligible accuracy cost for ResNet-110. On ILSVRC-2012, it reduces FLOPs by 45.5% and removes 43.0% of the parameters with only a 0.69% accuracy drop for ResNet-50. Our code and pruned models can be found at https://github.com/lmbxmu/FilterSketch.
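The off-the-shelf frequent-directions sketch the method relies on can be illustrated with a generic textbook implementation (not the FilterSketch code): it streams the rows of a matrix A into a small buffer and, whenever the buffer fills, shrinks it with an SVD so that B^T B stays close to A^T A.

```python
import numpy as np

def frequent_directions(A, ell):
    """Frequent-directions matrix sketch: compress the n rows of A into
    a 2*ell-row sketch B with ||A^T A - B^T B||_2 small."""
    n, d = A.shape
    B = np.zeros((2 * ell, d))
    nz = 0                                  # number of occupied rows in B
    for row in A:
        if nz == 2 * ell:                   # buffer full: shrink via SVD
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            s2 = np.maximum(s ** 2 - s[ell - 1] ** 2, 0.0)
            B = np.zeros((2 * ell, d))
            B[:ell] = np.sqrt(s2[:ell])[:, None] * Vt[:ell]
            nz = ell
        B[nz] = row
        nz += 1
    return B
```

The sketch preserves exactly the second-order statistics (the Gram matrix) that the abstract says FilterSketch encodes for the pretrained filters.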

9.
IEEE Trans Image Process ; 30: 3908-3921, 2021.
Article in English | MEDLINE | ID: mdl-33750690

ABSTRACT

This paper presents a learning-based approach to synthesizing the view from an arbitrary camera position given a sparse set of images. A key challenge for this novel view synthesis arises from the reconstruction process, since the views from different input images may not be consistent due to obstructions in the light path. We overcome this by jointly modeling the epipolar property and occlusion in designing a convolutional neural network. We start by defining and computing the aperture disparity map, which approximates the parallax and measures the pixel-wise shift between two views. While this relates to free-space rendering and can fail near object boundaries, we further develop a warping confidence map to address pixel occlusion in these challenging regions. The proposed method is evaluated on diverse real-world and synthetic light field scenes, and it shows better performance than several state-of-the-art techniques.

10.
IEEE Trans Image Process ; 30: 2538-2548, 2021.
Article in English | MEDLINE | ID: mdl-33481714

ABSTRACT

Natural language moment localization aims to localize video clips according to a natural language description. The key to this challenging task lies in modeling the relationship between verbal descriptions and visual contents. Existing approaches often sample a number of clips from the video and individually determine how each of them is related to the query sentence. However, this strategy can fail dramatically, in particular when the query sentence refers to visual elements that appear outside of, or even distant from, the target clip. In this paper, we address this issue by designing an Interaction-Integrated Network (I2N), which contains a few Interaction-Integrated Cells (I2Cs). The idea lies in the observation that the query sentence not only describes the video clip but also contains semantic cues about the structure of the entire video. Based on this, I2Cs go one step beyond modeling short-term contexts in the time domain by encoding long-term video content into every frame feature. By stacking a few I2Cs, the resulting network, I2N, enjoys an improved ability of inference brought by both (I) multi-level correspondence between vision and language and (II) more accurate cross-modal alignment. When evaluated on the challenging video moment localization dataset DiDeMo, I2N outperforms the state-of-the-art approach by a clear margin of 1.98%. On two other challenging datasets, Charades-STA and TACoS, I2N also reports competitive performance.

11.
IEEE Trans Image Process ; 30: 2060-2071, 2021.
Article in English | MEDLINE | ID: mdl-33460378

ABSTRACT

Person re-identification is the crucial task of identifying pedestrians of interest across multiple surveillance camera views. For person re-identification, a pedestrian is usually represented with features extracted from a rectangular image region that inevitably contains the scene background, which introduces ambiguity in distinguishing different pedestrians and degrades accuracy. Thus, we propose an end-to-end foreground-aware network that discriminates the foreground from the background by learning a soft mask for person re-identification. In our method, in addition to the pedestrian ID as supervision for the foreground, we introduce the camera ID of each pedestrian image for background modeling. The foreground branch and the background branch are optimized collaboratively. By introducing a target attention loss, the pedestrian features extracted from the foreground branch become less sensitive to backgrounds, which greatly reduces the negative impact of changing backgrounds on pedestrian matching across different camera views. Notably, in contrast to existing methods, our approach does not require an additional dataset to train a human landmark detector or a segmentation model for locating the background regions. The experimental results on three challenging datasets, i.e., Market-1501, DukeMTMC-reID, and MSMT17, demonstrate the effectiveness of our approach.


Subjects
Biometric Identification/methods; Image Processing, Computer-Assisted/methods; Machine Learning; Algorithms; Humans; Pedestrians; Video Recording
12.
IEEE Trans Image Process ; 27(9): 4357-4366, 2018 Sep.
Article in English | MEDLINE | ID: mdl-29870353

ABSTRACT

In steerable filters, a filter of arbitrary orientation can be generated by a linear combination of a set of "basis filters." Steerable properties dominate the design of traditional filters, e.g., Gabor filters, and endow features with the capability of handling spatial transformations. However, such properties have not yet been well explored in deep convolutional neural networks (DCNNs). In this paper, we develop a new deep model, namely Gabor convolutional networks (GCNs, or Gabor CNNs), with Gabor filters incorporated into DCNNs so that the robustness of learned features against orientation and scale changes can be reinforced. By manipulating the basic element of DCNNs, i.e., the convolution operator, based on Gabor filters, GCNs can be easily implemented and are readily compatible with any popular deep learning architecture. We carry out extensive experiments to demonstrate the promising performance of our GCN framework, and the results show its superiority in recognizing objects, especially when scale and rotation changes take place frequently. Moreover, the proposed GCNs have far fewer network parameters to learn and can effectively reduce the training complexity of the network, leading to a more compact deep learning model while still maintaining high feature representation capacity. The source code can be found at https://github.com/bczhangbczhang.
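The modulation idea can be sketched as follows: generate a small bank of oriented Gabor kernels and multiply a learned convolution kernel elementwise by each of them, producing orientation-specific copies. The filter parameters and the single-kernel modulation are illustrative simplifications of the GCN convolution operator, not the authors' implementation.

```python
import numpy as np

def gabor_kernel(size, theta, sigma=2.0, lam=4.0, gamma=0.5, psi=0.0):
    """Real part of a 2-D Gabor filter at orientation theta
    (illustrative parameter defaults)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)    # rotate coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + gamma ** 2 * yr ** 2) / (2 * sigma ** 2))
    return envelope * np.cos(2 * np.pi * xr / lam + psi)

def gabor_modulate(weight, orientations=4):
    """Modulate one learned k x k kernel with a bank of Gabor filters,
    yielding orientation-specific kernels (sketch of the GCN idea)."""
    size = weight.shape[-1]
    thetas = np.pi * np.arange(orientations) / orientations
    bank = np.stack([gabor_kernel(size, t) for t in thetas])
    return weight[None] * bank                     # (orientations, k, k)
```

Because the orientation-specific kernels are derived from one learned kernel, the number of learned parameters does not grow with the number of orientations, matching the compactness claim above.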

13.
IEEE Trans Image Process ; 22(2): 778-89, 2013 Feb.
Article in English | MEDLINE | ID: mdl-23060336

ABSTRACT

Human detection in images is challenged by view and posture variation. In this paper, we propose a piecewise linear support vector machine (PL-SVM) method to tackle this problem. The motivation is to exploit a piecewise discriminative function to construct a nonlinear classification boundary that can discriminate multiview and multiposture human bodies from the backgrounds in a high-dimensional feature space. PL-SVM training is designed as an iterative procedure of feature space division and linear SVM training, aiming at the margin maximization of local linear SVMs. Each piecewise SVM model is responsible for a subspace, corresponding to a human cluster of a particular view or posture. Based on the PL-SVM, a cascaded detector is proposed with block orientation features and histogram of oriented gradient features. Extensive experiments show that, compared with several recent SVM methods, our method reaches the state of the art in both detection accuracy and computational efficiency, and it performs best when dealing with low-resolution human regions in cluttered backgrounds.
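A minimal sketch of the piecewise idea, assuming a k-means-style partition of the feature space and a toy sub-gradient linear SVM per piece; both are stand-ins for the paper's iterative division and margin-maximizing training, not its actual solver.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Tiny hinge-loss linear SVM trained by sub-gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    if len(y) == 0:
        return w, b
    for _ in range(epochs):
        mask = y * (X @ w + b) < 1          # margin violations
        w -= lr * (lam * w - (y[mask, None] * X[mask]).sum(0) / len(y))
        b -= lr * (-y[mask].sum() / len(y))
    return w, b

def train_pl_svm(X, y, pieces=2, kmeans_iters=10):
    """PL-SVM sketch: partition the feature space (farthest-point
    initialized k-means), fit one linear SVM per piece, and classify a
    query with the SVM of its nearest piece."""
    centers = [X[0]]
    for _ in range(pieces - 1):             # farthest-point initialization
        d2 = ((X[:, None] - np.array(centers)[None]) ** 2).sum(-1).min(1)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers, dtype=float)
    for _ in range(kmeans_iters):
        assign = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(pieces):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(0)
    models = [train_linear_svm(X[assign == j], y[assign == j])
              for j in range(pieces)]
    def predict(Q):
        a = ((Q[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        return np.array([np.sign(Q[i] @ models[a[i]][0] + models[a[i]][1])
                         for i in range(len(Q))])
    return predict
```

Each piece only has to separate one local cluster linearly, so the union of the pieces forms the nonlinear boundary the abstract describes.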


Subjects
Image Processing, Computer-Assisted/methods; Pattern Recognition, Automated/methods; Support Vector Machine; Activities of Daily Living; Animals; Databases, Factual; Humans; Video Recording
14.
IEEE Trans Image Process ; 21(9): 4180-9, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22645268

ABSTRACT

3-D technologies are considered the next generation of multimedia applications. Currently, one of the challenges faced by 3-D applications is the shortage of 3-D resources. To address this problem, many 3-D modeling methods have been proposed to recover 3-D geometry directly from 2-D images. However, existing single-view modeling methods either require intensive user interaction or are restricted to a specific kind of object. In this paper, we propose a novel 3-D modeling approach to recover 3-D geometry from a single image of a symmetric object with minimal user interaction. Symmetry is one of the most common properties of natural or man-made objects. Given a single view of a symmetric object, the user marks some symmetric lines and depth-discontinuity regions on the image. Our algorithm first finds a set of planes that approximately fit the object, and then a rough 3-D point cloud is generated by an optimization procedure. The occluded part of the object is further recovered using symmetry information. Experimental results on various indoor and outdoor objects show that the proposed system can obtain 3-D models from single images with only a little user interaction.

15.
IEEE Trans Pattern Anal Mach Intell ; 33(1): 3-15, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21088315

ABSTRACT

Three-dimensional object reconstruction from a single 2D line drawing is an important problem in computer vision. Many methods have been presented to solve this problem, but they usually fail when the geometric structure of a 3D object becomes complex. In this paper, a novel approach based on a divide-and-conquer strategy is proposed to handle the 3D reconstruction of a planar-faced complex manifold object from its 2D line drawing with hidden lines visible. The approach consists of four steps: 1) identifying the internal faces of the line drawing, 2) decomposing the line drawing into multiple simpler ones based on the internal faces, 3) reconstructing the 3D shapes from these simpler line drawings, and 4) merging the 3D shapes into one complete object represented by the original line drawing. A number of examples are provided to show that our approach can handle 3D reconstruction of more complex objects than previous methods.


Subjects
Algorithms; Image Processing, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Pattern Recognition, Automated/methods; Artificial Intelligence; Computer-Aided Design; Image Enhancement/methods
16.
IEEE Trans Pattern Anal Mach Intell ; 32(10): 1858-70, 2010 Oct.
Article in English | MEDLINE | ID: mdl-20724762

ABSTRACT

This paper proposes a new 3D face recognition approach, the Collective Shape Difference Classifier (CSDC), to meet practical application requirements, i.e., high recognition performance, high computational efficiency, and easy implementation. We first present a fast posture alignment method that is self-dependent and avoids registering an input face against every face in the gallery. Then, a Signed Shape Difference Map (SSDM) is computed between two aligned 3D faces as an intermediate representation for shape comparison. Based on the SSDMs, three kinds of features are used to encode both the local similarity and the change characteristics between facial shapes. The most discriminative local features are selected optimally by boosting and trained as weak classifiers to assemble three collective strong classifiers, namely CSDCs, with respect to the three kinds of features. Different schemes are designed for verification and identification to pursue high performance in both recognition and computation. Experiments carried out on FRGC v2 with the standard protocol yield three verification rates all better than 97.9 percent at a FAR of 0.1 percent, and rank-1 recognition rates above 98 percent. Each recognition against a gallery of 1,000 faces takes only about 3.6 seconds. These experimental results demonstrate that our algorithm is not only effective but also time efficient.


Subjects
Artificial Intelligence; Biometric Identification/methods; Face/anatomy & histology; Image Processing, Computer-Assisted/methods; Pattern Recognition, Automated/methods; Algorithms; Humans; Nose/physiology; Posture/physiology; Principal Component Analysis; ROC Curve
17.
IEEE Trans Image Process ; 19(9): 2254-64, 2010 Sep.
Article in English | MEDLINE | ID: mdl-20363679

ABSTRACT

Image segmentation plays an important role in computer vision and image analysis. In this paper, image segmentation is formulated as a labeling problem under a probability maximization framework. To estimate the label configuration, an iterative optimization scheme is proposed that alternately carries out maximum a posteriori (MAP) estimation and maximum likelihood (ML) estimation. The MAP estimation problem is modeled with Markov random fields (MRFs), and a graph cut algorithm is used to find the solution. The ML estimation is achieved by computing the means of region features under a Gaussian model. Our algorithm can automatically segment an image into regions with coherent textures or colors without needing to know the number of regions in advance. Its results match image edges very well and are consistent with human perception. Extensive comparisons with six state-of-the-art algorithms show that our algorithm performs best.
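The MAP/ML alternation can be illustrated on a toy gray image. Here ICM with a Potts smoothness prior stands in for the paper's graph-cut MAP solver, and the Gaussian region model is reduced to its mean; everything else about this sketch is a deliberate simplification.

```python
import numpy as np

def map_ml_segment(img, k=2, beta=1.0, iters=5):
    """Toy MAP/ML alternation for gray-image labeling: the ML step
    re-estimates region means under a Gaussian model; the MAP step uses
    ICM with a Potts smoothness prior (a stand-in for graph cuts)."""
    h, w = img.shape
    labels = np.digitize(img, np.quantile(img, np.linspace(0, 1, k + 1)[1:-1]))
    for _ in range(iters):
        # ML step: means of region features
        means = np.array([img[labels == c].mean() if np.any(labels == c)
                          else np.inf for c in range(k)])
        # MAP step (ICM): data term + Potts smoothness term
        new = labels.copy()
        for i in range(h):
            for j in range(w):
                costs = []
                for c in range(k):
                    smooth = sum(labels[ii, jj] != c
                                 for ii, jj in ((i - 1, j), (i + 1, j),
                                                (i, j - 1), (i, j + 1))
                                 if 0 <= ii < h and 0 <= jj < w)
                    costs.append((img[i, j] - means[c]) ** 2 + beta * smooth)
                new[i, j] = int(np.argmin(costs))
        labels = new
    return labels
```

The data term pulls each pixel toward the region whose mean matches its intensity, while the Potts term discourages isolated labels, mirroring the MRF prior in the abstract.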

18.
IEEE Trans Image Process ; 19(2): 533-44, 2010 Feb.
Article in English | MEDLINE | ID: mdl-19887313

ABSTRACT

This paper proposes a novel high-order local pattern descriptor, the local derivative pattern (LDP), for face recognition. LDP is a general framework for encoding directional pattern features based on local derivative variations. The nth-order LDP is proposed to encode the (n-1)th-order local derivative direction variations, which can capture more detailed information than the first-order local pattern used in the local binary pattern (LBP). Unlike LBP, which encodes the relationship between a central point and its neighbors, the LDP templates extract high-order local information by encoding various distinctive spatial relationships contained in a given local region. Both gray-level images and Gabor feature images are used to evaluate the comparative performance of LDP and LBP. Extensive experimental results on the FERET, CAS-PEAL, CMU-PIE, Extended Yale B, and FRGC databases show that the high-order LDP consistently performs much better than LBP for both face identification and face verification under various conditions.
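A sketch of a second-order LDP along the 0-degree direction, under one common reading of the encoding: compute the first-order horizontal derivative image, then set each of the 8 neighbor bits according to whether the neighbor's derivative disagrees in sign with the center's. The exact bit convention here is an assumption for illustration.

```python
import numpy as np

def ldp_0deg(img):
    """Second-order Local Derivative Pattern along the 0-degree direction
    (sketch). First-order derivative: d(p) = I(p) - I(p_right); each bit
    encodes sign disagreement between center and neighbor derivatives."""
    h, w = img.shape
    d = np.zeros_like(img, dtype=float)
    d[:, :-1] = img[:, :-1] - img[:, 1:]          # first-order derivative
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]  # 8-neighborhood, clockwise
    codes = np.zeros((h, w), dtype=np.uint8)
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            bits = 0
            for di, dj in offsets:
                bits = (bits << 1) | int(d[i, j] * d[i + di, j + dj] < 0)
            codes[i, j] = bits
    return codes
```

On a constant image every derivative is zero, so all codes are zero; edges and texture produce nonzero codes, which are then histogrammed per region to form the descriptor.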


Subjects
Algorithms; Biometric Identification/methods; Face/anatomy & histology; Databases, Factual; Humans
19.
IEEE Trans Image Process ; 19(4): 1087-96, 2010 Apr.
Article in English | MEDLINE | ID: mdl-20028634

ABSTRACT

Subspace learning techniques for face recognition have been widely studied in the past three decades. In this paper, we study the problem of general subspace-based face recognition under the scenarios with spatial misalignments and/or image occlusions. For a given subspace derived from training data in a supervised, unsupervised, or semi-supervised manner, the embedding of a new datum and its underlying spatial misalignment parameters are simultaneously inferred by solving a constrained l1 norm optimization problem, which minimizes the l1 error between the misalignment-amended image and the image reconstructed from the given subspace along with its principal complementary subspace. A byproduct of this formulation is the capability to detect the underlying image occlusions. Extensive experiments on spatial misalignment estimation, image occlusion detection, and face recognition with spatial misalignments and/or image occlusions all validate the effectiveness of our proposed general formulation for misalignment-robust face recognition.


Subjects
Algorithms; Artificial Intelligence; Biometric Identification/methods; Face/anatomy & histology; Image Interpretation, Computer-Assisted/methods; Databases, Factual; Humans; Reproducibility of Results
20.
IEEE Trans Image Process ; 18(10): 2153-66, 2009 Oct.
Article in English | MEDLINE | ID: mdl-19586823

ABSTRACT

This paper studies phase singularities (PSs) for image representation. We show that PSs calculated with Laguerre-Gauss filters contain important information and provide a useful tool for image analysis. PSs are invariant to image translation and rotation. We introduce several invariant features to characterize the core structures around PSs and analyze the stability of PSs under noise addition and scale change. We also study the characteristics of PSs in a scale space, which leads to a method for selecting key scales along phase singularity curves. We demonstrate two applications of PSs: object tracking and image matching. In object tracking, we use the iterative closest point algorithm to determine the correspondences of PSs between two adjacent frames; the use of PSs allows us to precisely determine the motions of tracked objects. In image matching, we combine PSs with the scale-invariant feature transform (SIFT) descriptor to handle the variations between two images and examine the proposed method on a benchmark database. The results indicate that our method finds more correct matching pairs with higher repeatability rates than some well-known methods.


Subjects
Algorithms; Artificial Intelligence; Image Interpretation, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Pattern Recognition, Automated/methods; Subtraction Technique; Image Enhancement/methods; Reproducibility of Results; Sensitivity and Specificity