1.
IEEE Trans Pattern Anal Mach Intell ; 45(12): 15665-15679, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37669204

ABSTRACT

End-to-end scene text spotting has made significant progress due to the intrinsic synergy between text detection and recognition. Previous methods commonly regard manual annotations such as horizontal rectangles, rotated rectangles, quadrangles, and polygons as a prerequisite, which are much more expensive than single-point annotations. Our new framework, SPTS v2, allows us to train high-performing text-spotting models using a single-point annotation. SPTS v2 retains the advantages of the auto-regressive Transformer through an Instance Assignment Decoder (IAD) that sequentially predicts the center points of all text instances within a single sequence, while a Parallel Recognition Decoder (PRD) recognizes the text in parallel, significantly reducing the required sequence length. The two decoders share the same parameters and are interactively connected by a simple but effective information-transmission process that passes gradients and information between them. Comprehensive experiments on various existing benchmark datasets demonstrate that SPTS v2 outperforms previous state-of-the-art single-point text spotters with fewer parameters while achieving 19× faster inference. Within the context of our SPTS v2 framework, our experiments suggest a potential preference for single-point representation in scene text spotting compared with other representations. Such an attempt provides a significant opportunity for scene text spotting applications beyond the realms of existing paradigms.
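As a rough illustration of the single-point sequence idea, center points can be quantized into discrete coordinate tokens and concatenated into one target sequence for an auto-regressive decoder. This is a sketch of the general technique, not the paper's exact scheme; the function name and bin count are assumptions.

```python
def points_to_sequence(points, width, height, bins=1000):
    """Quantize (x, y) text-center points into discrete coordinate
    tokens; the concatenated tokens form the target sequence that an
    auto-regressive decoder predicts one instance at a time.
    (Illustrative; bin count and token layout are assumptions.)"""
    seq = []
    for x, y in points:
        seq.append(int(x / width * (bins - 1)))   # x token in [0, bins)
        seq.append(int(y / height * (bins - 1)))  # y token in [0, bins)
    return seq

# Two text instances in a 640x480 image -> a 4-token sequence.
tokens = points_to_sequence([(320, 240), (64, 48)], 640, 480)
```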

2.
Natl Sci Rev ; 10(6): nwad115, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37292085

ABSTRACT

This paper presents a novel and efficient algorithm for Chinese historical document understanding, incorporating three key components: a multi-oriented text detector, a dual-path learning-based text recognizer, and a heuristic-based reading order predictor.

3.
IEEE Trans Neural Netw Learn Syst ; 34(11): 8503-8515, 2023 Nov.
Article in English | MEDLINE | ID: mdl-35226609

ABSTRACT

Large amounts of labeled data are urgently required for the training of robust text recognizers. However, collecting handwriting data of diverse styles, along with an immense lexicon, is considerably expensive. Although data synthesis is a promising way to relieve data hunger, two key issues of handwriting synthesis, namely style representation and content embedding, remain unsolved. To this end, we propose a novel method that can synthesize parameterized and controllable handwriting Styles for arbitrary-Length and Out-of-vocabulary text based on a Generative Adversarial Network (GAN), termed SLOGAN. Specifically, we propose a style bank that parameterizes specific handwriting styles as latent vectors, which are input to a generator as style priors to achieve the corresponding handwritten styles. Training the style bank requires only writer identification of the source images, rather than attribute annotations. Moreover, we embed the text content by providing an easily obtainable printed-style image, so that content diversity can be flexibly achieved by changing the input printed image. Finally, the generator is guided by dual discriminators to handle both handwriting characteristics that appear as separated characters and those that appear in a series of cursive joins. Our method can synthesize words that are not included in the training vocabulary, with a variety of new styles. Extensive experiments have shown that high-quality text images with great style diversity and a rich vocabulary can be synthesized using our method, thereby enhancing the robustness of the recognizer.

4.
IEEE Trans Cybern ; 52(2): 1021-1034, 2022 Feb.
Article in English | MEDLINE | ID: mdl-32459622

ABSTRACT

Filtering and propagation are two basic operations in image analysis and rendering, and they are also widely used in computer graphics and machine learning. However, filtering and propagation models have been based on diverse mathematical formulations, whose connections have not been fully understood. This article explores the properties of both filtering and propagation models from a partial differential equation (PDE) learning perspective. We propose a unified PDE learning framework based on nonlinear reaction-diffusion with a guided map, a graph Laplacian, and a reaction weight. It reveals that: 1) the guided map and reaction weight determine whether the PDE produces filtering or propagation diffusion, and 2) the kernel of the graph Laplacian controls the diffusion pattern. Based on the proposed PDE framework, we derive the mathematical relations between different models, including the learning-to-diffusion (LTD) model, label propagation, edit propagation, and edge-aware filtering. For practical verification, we apply the PDE framework to design diffusion operations with adaptive kernels to tackle the ill-posed problem of facial intrinsic image analysis (FIIA). A flexible task-aware FIIA system is built to achieve various facial rendering effects, such as face image relighting and de-lighting, artistic illumination transfer, and illumination-aware face swapping or transfiguring. Qualitative and quantitative experiments show the effectiveness and flexibility of task-aware FIIA and provide new insights into PDE learning for visual analysis and rendering.
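The filtering-versus-propagation distinction can be sketched numerically: one explicit update combines a graph-Laplacian diffusion term with a reaction term pulled toward a guided map. With zero reaction weight everywhere, the update is pure filtering (smoothing); anchoring a few nodes with nonzero reaction weight turns it into propagation from those seeds. This is a minimal toy sketch of the general idea, not the paper's formulation; all names and the toy chain graph are assumptions.

```python
import numpy as np

def diffusion_step(u, W, guide, reaction, lam=0.2):
    """One explicit step of guided reaction-diffusion on a graph.

    u        : (n,) current signal (e.g. pixel values)
    W        : (n, n) affinity matrix defining the graph Laplacian
    guide    : (n,) guided map pulling anchored nodes toward targets
    reaction : (n,) per-node reaction weight; 0 = pure filtering,
               nonzero entries anchor nodes and yield propagation
    """
    d = W.sum(axis=1)
    L = np.diag(d) - W                    # unnormalized graph Laplacian
    smooth = -lam * (L @ u)               # diffusion (smoothing) term
    react = reaction * (guide - u)        # reaction (data-fidelity) term
    return u + smooth + react

# Toy 1-D chain of 5 nodes: propagate two seed labels from the ends.
n = 5
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
guide = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
reaction = np.array([1.0, 0.0, 0.0, 0.0, 1.0])  # anchor only endpoints
u = np.zeros(n)
for _ in range(200):
    u = diffusion_step(u, W, guide, reaction)
# u now interpolates smoothly between the two anchored endpoints.
```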


Subject(s)
Algorithms , Image Processing, Computer-Assisted , Computer Graphics , Face/diagnostic imaging , Image Processing, Computer-Assisted/methods , Machine Learning
5.
IEEE Trans Pattern Anal Mach Intell ; 44(11): 8048-8064, 2022 11.
Article in English | MEDLINE | ID: mdl-34460364

ABSTRACT

End-to-end text spotting, which aims to integrate detection and recognition in a unified framework, has attracted increasing attention due to the simplicity of unifying the two complementary tasks. It remains an open problem, especially when processing arbitrarily-shaped text instances. Previous methods can be roughly categorized into two groups, character-based and segmentation-based, which often require character-level annotations and/or complex post-processing due to their unstructured output. Here, we tackle end-to-end text spotting by presenting Adaptive Bezier Curve Network v2 (ABCNet v2). Our main contributions are four-fold: 1) For the first time, we adaptively fit arbitrarily-shaped text with a parameterized Bezier curve, which, compared with segmentation-based methods, provides not only structured output but also a controllable representation. 2) We design a novel BezierAlign layer for extracting accurate convolutional features of a text instance of arbitrary shape, significantly improving recognition precision over previous methods. 3) Unlike previous methods, which often suffer from complex post-processing and sensitive hyper-parameters, our ABCNet v2 maintains a simple pipeline with non-maximum suppression (NMS) as the only post-processing step. 4) As the performance of text recognition closely depends on feature alignment, ABCNet v2 further adopts a simple yet effective coordinate convolution to encode the position of the convolutional filters, which leads to a considerable improvement with negligible computational overhead. Comprehensive experiments conducted on various bilingual (English and Chinese) benchmark datasets demonstrate that ABCNet v2 achieves state-of-the-art performance while maintaining very high efficiency. Moreover, as there is little work on quantization of text-spotting models, we quantize our models to improve the inference time of the proposed ABCNet v2, which can be valuable for real-time applications.
Code and model are available at: https://git.io/AdelaiDet.
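The Bezier parameterization at the heart of this approach can be illustrated with a least-squares fit of four cubic control points to ordered boundary samples. This is a sketch of the general curve-fitting technique under a uniform parameterization assumption, not the paper's implementation.

```python
from math import comb
import numpy as np

def bernstein(t, i, n=3):
    """Cubic Bernstein basis polynomial B_{i,3}(t)."""
    return comb(n, i) * t**i * (1 - t)**(n - i)

def fit_cubic_bezier(points):
    """Least-squares fit of 4 control points to ordered boundary
    points, assuming uniformly spaced curve parameters."""
    m = len(points)
    t = np.linspace(0.0, 1.0, m)
    B = np.array([[bernstein(tj, i) for i in range(4)] for tj in t])
    ctrl, *_ = np.linalg.lstsq(B, np.asarray(points, float), rcond=None)
    return ctrl                                   # (4, 2) control points

def eval_bezier(ctrl, t):
    """Evaluate the cubic Bezier curve at parameters t."""
    t = np.atleast_1d(t)
    B = np.array([[bernstein(tj, i) for i in range(4)] for tj in t])
    return B @ ctrl

# A cubic Bezier represents a quadratic exactly, so fitting points
# sampled from y = x^2 should reconstruct them with ~zero residual.
xs = np.linspace(0.0, 1.0, 20)
pts = np.stack([xs, xs**2], axis=1)
ctrl = fit_cubic_bezier(pts)
recon = eval_bezier(ctrl, np.linspace(0.0, 1.0, 20))
```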


Subject(s)
Algorithms , Benchmarking
6.
IEEE Trans Pattern Anal Mach Intell ; 44(10): 6472-6485, 2022 Oct.
Article in English | MEDLINE | ID: mdl-34101587

ABSTRACT

Handwritten signature verification is a challenging task because the signatures of a writer may be skillfully imitated by a forger. As skilled forgeries are generally difficult to acquire for training, in this paper we propose a deep learning-based dynamic signature verification framework, SynSig2Vec, to address the skilled forgery attack without training on any skilled forgeries. Specifically, SynSig2Vec consists of a novel learning-by-synthesis method for training and a 1-D convolutional neural network model, called Sig2Vec, for signature representation extraction. The learning-by-synthesis method first applies the Sigma Lognormal model to synthesize signatures with different distortion levels for genuine template signatures, and then learns to rank these synthesized samples in a learnable representation space based on average precision optimization. The representation space is provided by the proposed Sig2Vec model, which is designed to extract fixed-length representations from dynamic signatures of arbitrary length. Through this training method, the Sig2Vec model learns highly effective signature representations for verification. Our SynSig2Vec framework requires only genuine signatures for training, yet achieves state-of-the-art performance on the largest dynamic signature database to date, DeepSignDB, in both skilled forgery and random forgery scenarios. The source code of SynSig2Vec will be available at https://github.com/LaiSongxuan/SynSig2Vec.
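The average-precision criterion behind the ranking objective can be made concrete with plain AP over a ranked list; ranking lightly distorted (genuine-like) samples above heavily distorted ones maximizes it. This is an illustration of the metric itself, not the paper's differentiable optimization of it.

```python
def average_precision(scores, labels):
    """AP of a ranking: sort by descending score and average the
    precision at each position where a positive (label 1) appears."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, ap = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            ap += hits / rank           # precision at this positive
    return ap / max(1, sum(labels))

# Two genuine-like samples (label 1) and one heavy distortion (label 0);
# the second positive slips to rank 3, so AP = (1/1 + 2/3) / 2.
ap = average_precision([0.9, 0.8, 0.1], [1, 0, 1])
```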

7.
Sensors (Basel) ; 21(18)2021 Sep 16.
Article in English | MEDLINE | ID: mdl-34577433

ABSTRACT

Landing an unmanned aerial vehicle (UAV) autonomously and safely is a challenging task. Although the existing approaches have resolved the problem of precise landing by identifying a specific landing marker using the UAV's onboard vision system, the vast majority of these works are conducted in either daytime or well-illuminated laboratory environments. In contrast, very few researchers have investigated the possibility of landing in low-illumination conditions by employing various active light sources to lighten the markers. In this paper, a novel vision system design is proposed to tackle UAV landing in outdoor extreme low-illumination environments without the need to apply an active light source to the marker. We use a model-based enhancement scheme to improve the quality and brightness of the onboard captured images, then present a hierarchical-based method consisting of a decision tree with an associated light-weight convolutional neural network (CNN) for coarse-to-fine landing marker localization, where the key information of the marker is extracted and reserved for post-processing, such as pose estimation and landing control. Extensive evaluations have been conducted to demonstrate the robustness, accuracy, and real-time performance of the proposed vision system. Field experiments across a variety of outdoor nighttime scenarios with an average luminance of 5 lx at the marker locations have proven the feasibility and practicability of the system.

8.
Article in English | MEDLINE | ID: mdl-32857697

ABSTRACT

Scene text removal has attracted increasing research interest owing to its valuable applications in privacy protection, camera-based virtual reality translation, and image editing. However, existing approaches fall short in real applications, mainly because they were evaluated on synthetic or unrepresentative datasets. To fill this gap and facilitate this research direction, this paper proposes a real-world dataset called SCUT-EnsText that consists of 3,562 diverse images selected from public scene text reading benchmarks; each image is scrupulously annotated to provide visually plausible erasure targets. With SCUT-EnsText, we design a novel GAN-based model termed EraseNet that can automatically remove text from natural images. The model is a two-stage network consisting of a coarse-erasure sub-network and a refinement sub-network. The refinement sub-network improves the feature representation and refines the coarse outputs to enhance removal performance. Additionally, EraseNet contains a segmentation head for text perception and a local-global SN-Patch-GAN with spectral normalization (SN) on both the generator and discriminator to maintain training stability and the congruity of the erased regions. Extensive experiments are conducted on both the previous public dataset and the brand-new SCUT-EnsText. Our EraseNet significantly outperforms the existing state-of-the-art methods in terms of all metrics, with remarkably higher-quality results. The dataset and code will be made available at https://github.com/HCIILAB/SCUT-EnsText.
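The spectral normalization (SN) mentioned above constrains a weight matrix by its largest singular value, typically estimated with power iteration; dividing by that value bounds the layer's Lipschitz constant, which is what stabilizes GAN training. A minimal NumPy sketch of the standard technique (not the paper's code):

```python
import numpy as np

def spectral_normalize(W, iters=50):
    """Divide a weight matrix by its largest singular value,
    estimated by power iteration, so that ||W_sn||_2 == 1."""
    u = np.random.default_rng(0).normal(size=W.shape[0])
    for _ in range(iters):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v                  # dominant singular value estimate
    return W / sigma

# Singular values of W are 3 and 1, so normalization scales by 1/3.
W = np.array([[3.0, 0.0], [0.0, 1.0]])
W_sn = spectral_normalize(W)
```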

9.
Article in English | MEDLINE | ID: mdl-31794397

ABSTRACT

Scene text in the environment is complicated: it can appear in arbitrary fonts, sizes, and shapes. Although scene text detection has witnessed considerable progress in recent years, the detection of text with complex shapes, especially curved text, remains challenging. Datasets with adequate samples of curved text (and other irregularly shaped text) have been introduced only recently, and the performance of the reported methods on these datasets is unsatisfactory; detecting arbitrarily shaped text therefore remains an open challenge. This motivated us to propose the Mask Tightness Text Detector (Mask TTD) to improve text detection performance. Mask TTD uses a tightness prior and text frontier learning to enhance pixel-wise mask prediction. In addition, it achieves mutual promotion by integrating a branch for the polygonal boundary of each text region, which significantly improves the detection of arbitrarily shaped text. Experiments demonstrate that Mask TTD achieves state-of-the-art performance on existing curved text datasets (CTW1500, Total-Text, and CUTE80) and three common benchmark datasets (RCTW-17, MSRA-TD500, and ICDAR 2015). Notably, on CTW1500 our method outperforms previous methods especially at higher intersection-over-union (IoU) thresholds (16% higher than the next-best method at an IoU threshold of 0.8), which demonstrates its potential for tight text detection. Moreover, on the largest Chinese-based dataset, RCTW-17, Mask TTD outperforms other methods by a large margin in terms of both average precision and F-measure, showing its strong generalization ability.
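The effect of raising the IoU threshold can be made concrete with axis-aligned boxes (polygon IoU for curved text works analogously but requires polygon clipping): a detection that counts as a match at the common 0.5 threshold may fail at the tighter 0.8 threshold, which is why evaluation at high IoU rewards tight masks.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

pred = (0, 0, 10, 10)
gt_loose = (1, 1, 11, 11)   # IoU = 81/119 ~ 0.68: match at 0.5, not 0.8
gt_tight = (0, 0, 10, 11)   # IoU = 100/110 ~ 0.91: match even at 0.8
```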

10.
IEEE Trans Neural Netw Learn Syst ; 29(11): 5174-5184, 2018 11.
Article in English | MEDLINE | ID: mdl-29994078

ABSTRACT

Robotic control in a continuous action space has long been a challenging topic. This is especially true when controlling robots to solve compound tasks, as both basic skills and compound skills need to be learned. In this paper, we propose a hierarchical deep reinforcement learning algorithm to learn basic skills and compound skills simultaneously. In the proposed algorithm, compound skills and basic skills are learned by two levels of hierarchy. In the first level, each basic skill is handled by its own actor, overseen by a shared basic critic. In the second level, compound skills are learned by a meta critic by reusing basic skills. The proposed algorithm was evaluated on a Pioneer 3AT robot in three different navigation scenarios with fully observable tasks. The simulations were built in Gazebo 2 in a Robot Operating System (ROS) Indigo environment. The results show that the proposed algorithm can learn both high-performance basic skills and compound skills through the same learning process. The compound skills learned outperform those learned by a discrete-action-space deep reinforcement learning algorithm.

11.
IEEE Trans Neural Syst Rehabil Eng ; 26(3): 563-572, 2018 03.
Article in English | MEDLINE | ID: mdl-29522400

ABSTRACT

Detecting and analyzing the event-related potential (ERP) remains an important problem in neuroscience. Due to the low signal-to-noise ratio and complex spatio-temporal patterns of ERP signals, conventional methods usually rely on ensemble averaging for reliable detection, which may obliterate subtle but important information in individual trials. Inspired by deep learning methods, we propose a novel hybrid network termed ERP-NET. With its hybrid deep structure, the proposed network is able to learn complex spatial and temporal patterns from single-trial ERP signals. To verify the effectiveness of ERP-NET, we carried out several ERP detection experiments, in which the proposed model achieved state-of-the-art performance. The experimental results demonstrate that the patterns learned by ERP-NET are discriminative ERP components that properly characterize the ERP signals. More importantly, as an effective approach to single-trial analysis, ERP-NET is able to discover new ERP patterns, which is significant for neuroscience studies as well as BCI applications. Therefore, the proposed ERP-NET is a promising tool for research on ERP signals.


Subject(s)
Electroencephalography/methods , Evoked Potentials/physiology , Neural Networks, Computer , Algorithms , Brain-Computer Interfaces , Electroencephalography/instrumentation , Event-Related Potentials, P300/physiology , Humans , Prosthesis Design , Signal Processing, Computer-Assisted , Signal-To-Noise Ratio
12.
IEEE Trans Pattern Anal Mach Intell ; 40(8): 1903-1917, 2018 08.
Article in English | MEDLINE | ID: mdl-28767364

ABSTRACT

Online handwritten Chinese text recognition (OHCTR) is a challenging problem, as it involves a large-scale character set, ambiguous segmentation, and variable-length input sequences. In this paper, we exploit the outstanding capability of the path signature to translate online pen-tip trajectories into informative signature feature maps, successfully capturing the analytic and geometric properties of pen strokes with strong local invariance and robustness. A multi-spatial-context fully convolutional recurrent network (MC-FCRN) is proposed to exploit the multiple spatial contexts of the signature feature maps and generate a prediction sequence while completely avoiding the difficult segmentation problem. Furthermore, an implicit language model is developed to make predictions based on the semantic context within a predicted feature sequence, providing a new perspective for incorporating lexicon constraints and prior knowledge about a language into the recognition procedure. Experiments on two standard benchmarks, Dataset-CASIA and Dataset-ICDAR, yielded outstanding results, with correct rates of 97.50 and 96.58 percent, respectively, significantly better than the best results reported thus far in the literature.
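The path signature referenced above has a simple low-order form. For a piecewise-linear pen trajectory, the level-1 term is the total displacement and the level-2 term accumulates iterated integrals, which can be computed segment by segment with Chen's identity; the antisymmetric part of level 2 is the Lévy area. A sketch of just the first two levels (the paper's feature maps involve more machinery than this):

```python
import numpy as np

def signature_level2(path):
    """Level-1 and level-2 signature terms of a 2-D piecewise-linear
    path, accumulated over segments via Chen's identity."""
    path = np.asarray(path, float)
    inc = np.diff(path, axis=0)       # per-segment displacements
    s1 = np.zeros(2)                  # level 1: total displacement
    s2 = np.zeros((2, 2))             # level 2: iterated integrals
    for dx in inc:
        # Chen's identity: S2 <- S2 + S1 (x) dx + (dx (x) dx) / 2
        s2 = s2 + np.outer(s1, dx) + np.outer(dx, dx) / 2.0
        s1 = s1 + dx
    return s1, s2

# An L-shaped stroke: right then up. Its Lévy area (signed area between
# the path and its chord) is (s2[0,1] - s2[1,0]) / 2.
s1, s2 = signature_level2([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
```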

13.
Chem Commun (Camb) ; 53(28): 3986-3989, 2017 Apr 04.
Article in English | MEDLINE | ID: mdl-28337498

ABSTRACT

Palladium-catalyzed intermolecular amination of unactivated C(sp3)-H bonds was developed. Using NFSI as both the amino source and the oxidant, this protocol operates under mild conditions with excellent terminal selectivity and a broad substrate scope. Moreover, the directing group can be easily removed to produce 1,2-amino alcohols.

14.
IEEE Trans Neural Netw Learn Syst ; 27(6): 1392-404, 2016 06.
Article in English | MEDLINE | ID: mdl-25265635

ABSTRACT

With the rapid development of mobile devices and pervasive computing technologies, acceleration-based human activity recognition, a difficult yet essential problem in mobile apps, has received intensive attention recently. Different acceleration signals representing different activities, or even the same activity, have different attributes, which makes normalizing the signals difficult. We thus cannot directly compare these signals with each other, because they are embedded in a nonmetric space. Therefore, we present a nonmetric scheme that retains discriminative and robust frequency-domain information by developing a novel ensemble manifold rank preserving (EMRP) algorithm. EMRP simultaneously considers three aspects: 1) it encodes the local geometry using the ranking-order information of intraclass samples distributed on local patches; 2) it keeps the discriminative information by maximizing the margin between samples of different classes; and 3) it finds the optimal linear combination of the alignment matrices to approximate the intrinsic manifold embedded in the data. Experiments are conducted on the South China University of Technology naturalistic 3-D acceleration-based activity dataset and the naturalistic mobile-device-based human activity dataset to demonstrate the robustness and effectiveness of the new nonmetric scheme for acceleration-based human activity recognition.


Subject(s)
Human Activities , Algorithms , Humans , Pattern Recognition, Automated
15.
IEEE Trans Cybern ; 46(3): 756-65, 2016 Mar.
Article in English | MEDLINE | ID: mdl-25838536

ABSTRACT

Chinese character font recognition (CCFR) has received increasing attention as intelligent applications based on optical character recognition become popular. However, traditional CCFR systems do not handle noisy data effectively. By analyzing the basic strokes of Chinese characters in detail, we propose that font recognition on a single Chinese character is a sequence classification problem, which can be effectively solved by recurrent neural networks. For robust CCFR, we integrate a principal component convolution layer with the 2-D long short-term memory (2DLSTM) and develop the principal component 2DLSTM (PC-2DLSTM) algorithm. PC-2DLSTM considers two aspects: 1) the principal component convolution layer helps remove noise and obtain complete, rational font information, and 2) the 2DLSTM handles long-range contextual processing along the scan directions, which helps capture the contrast between character trajectory and background. Experiments on a frequently used CCFR dataset demonstrate the effectiveness of PC-2DLSTM compared with other state-of-the-art font recognition methods.

16.
IEEE Trans Cybern ; 45(2): 242-52, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25486658

ABSTRACT

In recent years, person reidentification has received growing attention with the increasing popularity of intelligent video surveillance, because person reidentification is critical for human tracking across multiple cameras. Recently, keep-it-simple-and-straightforward (KISS) metric learning has been regarded as a top-level algorithm for person reidentification. The covariance matrices of KISS are estimated by maximum likelihood (ML) estimation. It is known that discriminative learning based on the minimum classification error (MCE) is more reliable than classical ML estimation as the number of training samples increases. However, for small sample sizes, direct MCE KISS does not work well, because of the estimation error of the small eigenvalues. We therefore further introduce a smoothing technique to improve the estimates of the small eigenvalues of a covariance matrix. Our new scheme is termed minimum classification error KISS (MCE-KISS). We conduct thorough validation experiments on the VIPeR and ETHZ datasets, which demonstrate the robustness and effectiveness of MCE-KISS for person reidentification.
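The KISS metric and the eigenvalue smoothing can be sketched as follows: the metric is the difference of the inverse covariances of same-pair and different-pair feature differences, and smoothing clamps small eigenvalues before inversion so the inverse is not dominated by estimation noise. Function names and the floor rule here are illustrative assumptions, not the paper's exact smoothing scheme.

```python
import numpy as np

def smooth_cov(S, floor_ratio=0.01):
    """Clamp small eigenvalues of a covariance matrix to a floor
    (a fraction of the largest eigenvalue), stabilizing its inverse
    when the sample size is small."""
    w, V = np.linalg.eigh(S)
    w = np.maximum(w, floor_ratio * w.max())
    return (V * w) @ V.T

def kiss_metric(diffs_same, diffs_diff, floor_ratio=0.01):
    """KISS-style metric: difference of the inverse covariances of
    same-pair and different-pair feature-difference vectors."""
    S_s = smooth_cov(np.cov(diffs_same, rowvar=False), floor_ratio)
    S_d = smooth_cov(np.cov(diffs_diff, rowvar=False), floor_ratio)
    return np.linalg.inv(S_s) - np.linalg.inv(S_d)

def kiss_distance(M, x, y):
    d = np.asarray(x) - np.asarray(y)
    return float(d @ M @ d)

rng = np.random.default_rng(7)
diffs_same = rng.normal(0.0, 0.1, size=(500, 3))  # same-person pairs
diffs_diff = rng.normal(0.0, 1.0, size=(500, 3))  # cross-person pairs
M = kiss_metric(diffs_same, diffs_diff)
```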

17.
IEEE Trans Cybern ; 44(12): 2600-12, 2014 Dec.
Article in English | MEDLINE | ID: mdl-24710839

ABSTRACT

In this paper, we propose a unified facial beautification framework with respect to skin homogeneity, lighting, and color. A novel region-aware mask is constructed for skin manipulation, which can automatically select the edited regions with great precision. Inspired by state-of-the-art edit propagation techniques, we present an adaptive edge-preserving energy minimization model with a spatially variant parameter and a high-dimensional guided feature space for mask generation. Using region-aware masks, our method facilitates more flexible and accurate facial skin enhancement while considerably simplifying complex manipulations. In our beautification framework, a portrait is decomposed into smoothness, lighting, and color layers by an edge-preserving operator. Next, facial landmarks and significant features are extracted as input constraints for mask generation. After the three region-aware masks have been obtained, a user can perform facial beautification simply by adjusting the skin parameters. Furthermore, the combinations of parameters can be optimized automatically, depending on data priors and psychological knowledge. We performed both qualitative and quantitative evaluations of our method using faces of different genders, races, ages, poses, and backgrounds from various databases. The experimental results demonstrate that our technique is superior to previous methods and comparable to commercial systems, for example, PicTreat, Portrait+, and Portraiture.


Subject(s)
Face/anatomy & histology , Image Interpretation, Computer-Assisted/methods , Paintings , Pattern Recognition, Automated/methods , Photography/methods , Skin/anatomy & histology , Algorithms , Artificial Intelligence , Humans , Image Enhancement/methods , Reproducibility of Results , Sensitivity and Specificity
18.
IEEE Trans Cybern ; 43(5): 1406-17, 2013 Oct.
Article in English | MEDLINE | ID: mdl-23846511

ABSTRACT

With the rapid development of RGB-D sensors and the rapidly growing popularity of the low-cost Microsoft Kinect sensor, scene classification, a hard yet important problem in computer vision, has gained a resurgence of interest recently, because the depth information provided by the Kinect sensor opens an effective and innovative path to scene classification. In this paper, we propose a new scheme for scene classification that applies locality-constrained linear coding (LLC) to local SIFT features to represent the RGB-D samples and classifies scenes through the cooperation between a new rank preserving sparse learning (RPSL) based dimension reduction and a simple classification method. RPSL considers four aspects: 1) it preserves the rank-order information of the within-class samples in a local patch; 2) it maximizes the margin between the between-class samples on the local patch; 3) the L1-norm penalty is introduced to obtain the parsimony property; and 4) it models classification error minimization by utilizing least-squares error minimization. Experiments are conducted on the NYU Depth V1 dataset and demonstrate the robustness and effectiveness of RPSL for scene classification.


Subject(s)
Algorithms , Artificial Intelligence , Computer Peripherals , Imaging, Three-Dimensional/methods , Pattern Recognition, Automated/methods , Whole Body Imaging/methods , Computer Simulation , Computer Systems , Image Enhancement/instrumentation , Image Enhancement/methods , Transducers , Video Games , Whole Body Imaging/instrumentation
19.
IEEE Trans Cybern ; 43(6): 1747-54, 2013 Dec.
Article in English | MEDLINE | ID: mdl-23757592

ABSTRACT

As its clinical applications grow, 3-D ultrasound imaging is undergoing rapid technical development. Compared with 2-D ultrasound imaging, 3-D ultrasound imaging can provide improved qualitative and quantitative information for various clinical applications. In this paper, we propose a novel tracking method for a freehand 3-D ultrasound imaging system with improved portability, reduced degrees of freedom, and lower cost. We designed a sliding track with a linear position sensor attached, which transmitted positional data via a Bluetooth-based wireless communication module, resulting in a wireless spatial tracking modality. A traditional 2-D ultrasound probe fixed to the position sensor on the sliding track was used to obtain real-time B-scans, and the positions of the B-scans were simultaneously acquired while moving the probe along the track in a freehand manner. In the experiments, the proposed method was applied to ultrasound phantoms and real human tissues. The results demonstrated that the new system outperformed a previously developed freehand system based on a traditional six-degree-of-freedom spatial sensor in both phantom and in vivo studies, indicating its merit in clinical applications for human tissues and organs.


Subject(s)
Image Enhancement/instrumentation , Imaging, Three-Dimensional/instrumentation , Ultrasonography/instrumentation , Wireless Technology/instrumentation , Equipment Design , Equipment Failure Analysis , Humans , Phantoms, Imaging , Reproducibility of Results , Sensitivity and Specificity
20.
Ultrasonics ; 52(2): 266-75, 2012 Feb.
Article in English | MEDLINE | ID: mdl-21925692

ABSTRACT

OBJECTIVES: This paper introduces a new graph-based method for segmenting breast tumors in ultrasound (US) images. BACKGROUND AND MOTIVATION: Segmentation of breast tumors in US images is crucial for computer-aided diagnosis systems, but it has always been a difficult task due to artifacts inherent in US images, such as speckle and low contrast. METHODS: The proposed segmentation algorithm constructs a graph using improved neighborhood models. In addition, taking advantage of local statistics, a new pairwise region-comparison predicate that is insensitive to noise is proposed to determine whether any two adjacent subregions should be merged. RESULTS AND CONCLUSION: Experimental results show that the proposed method improves segmentation accuracy by 1.5-5.6% in comparison with three commonly used segmentation methods, and should be capable of segmenting breast tumors in US images.
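A statistics-based merge predicate of the kind described can be sketched by comparing the difference of two regions' mean intensities against the pooled spread of those means; because it aggregates over whole regions, isolated noisy pixels barely affect the decision. This is an illustrative thresholding rule under assumed region summaries, not the paper's exact predicate.

```python
import math

def merge_predicate(r1, r2, k=1.0):
    """Decide whether two adjacent regions should merge. Each region
    is summarized as (mean, variance, size); merge when the mean
    difference is within k standard errors of the pooled means.
    (Illustrative rule; k and the summaries are assumptions.)"""
    m1, v1, n1 = r1
    m2, v2, n2 = r2
    pooled = math.sqrt(v1 / n1 + v2 / n2)  # std. error of mean difference
    return abs(m1 - m2) <= k * pooled

# Two noisy samples of the same tissue merge; tumor vs background do not.
same_a = (100.0, 25.0, 50)   # (mean, variance, pixel count)
same_b = (100.8, 25.0, 50)
tumor  = (60.0, 25.0, 50)
```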


Subject(s)
Ultrasonography, Mammary/methods , Algorithms , Breast Neoplasms/diagnostic imaging , Female , Humans