1.
IEEE Trans Pattern Anal Mach Intell ; 45(6): 7123-7141, 2023 Jun.
Article in English | MEDLINE | ID: mdl-36417745

ABSTRACT

Scene text spotting is of great importance to the computer vision community due to its wide variety of applications. Recent methods attempt to introduce linguistic knowledge for challenging recognition rather than relying on pure visual classification. However, how to effectively model linguistic rules in end-to-end deep networks remains a research challenge. In this paper, we argue that the limited capacity of language models stems from 1) implicit language modeling; 2) unidirectional feature representation; and 3) a language model with noisy input. Correspondingly, we propose ABINet++, an autonomous, bidirectional, and iterative network for scene text spotting. First, the autonomous design enforces explicit language modeling by decoupling the recognizer into a vision model and a language model and blocking gradient flow between the two. Second, a novel bidirectional cloze network (BCN) is proposed as the language model, based on bidirectional feature representation. Third, we propose an iterative correction scheme for the language model that effectively alleviates the impact of noisy input. Additionally, based on an ensemble of the iterative predictions, a self-training method is developed that can learn effectively from unlabeled images. Finally, to strengthen ABINet++ on long text recognition, we aggregate horizontal features by embedding Transformer units inside a U-Net, and design a position-and-content attention module that integrates character order and content to attend to character features precisely. ABINet++ achieves state-of-the-art performance on both scene text recognition and scene text spotting benchmarks, consistently demonstrating the superiority of our method in various environments, especially on low-quality images. Moreover, extensive experiments in both English and Chinese show that a text spotter incorporating our language modeling method significantly improves in both accuracy and speed over commonly used attention-based recognizers. Code is available at https://github.com/FangShancheng/ABINet-PP.
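To make the decoupled, iterative design concrete, the following is a minimal PyTorch sketch of how a vision model and an explicit language model could be combined with blocked gradient flow and iterative correction, as the abstract describes. The module names, shapes, and the simple additive fusion are illustrative assumptions, not the authors' ABINet++ implementation.

```python
import torch
import torch.nn as nn

class IterativeVisionLanguageRecognizer(nn.Module):
    """Sketch of a decoupled vision/language recognizer: an explicit language
    model refines the vision model's prediction over several iterations, with
    gradient flow blocked between the two sub-models (hypothetical, not the
    authors' code)."""

    def __init__(self, vision_model, language_model, fusion, num_iters=3):
        super().__init__()
        self.vision_model = vision_model      # image -> (T, C) character logits
        self.language_model = language_model  # probabilities -> refined (T, C) logits
        self.fusion = fusion                  # combines vision and language outputs
        self.num_iters = num_iters

    def forward(self, images):
        vis_logits = self.vision_model(images)
        logits = vis_logits
        for _ in range(self.num_iters):
            # Detach so the language model is trained on (possibly noisy)
            # predictions rather than sharing gradients with the vision model.
            probs = logits.softmax(dim=-1).detach()
            lang_logits = self.language_model(probs)
            logits = self.fusion(vis_logits, lang_logits)
        return logits

# Toy usage with stand-in sub-models (T = sequence length, C = charset size).
T, C = 25, 37
vision = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 128, T * C), nn.Unflatten(1, (T, C)))
language = nn.Linear(C, C)   # stand-in for a bidirectional cloze network
fusion = lambda v, l: v + l  # stand-in for a learned gated fusion
model = IterativeVisionLanguageRecognizer(vision, language, fusion)
out = model(torch.randn(2, 3, 32, 128))
print(out.shape)  # torch.Size([2, 25, 37])
```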

2.
IEEE Trans Image Process ; 31: 5585-5598, 2022.
Article in English | MEDLINE | ID: mdl-35998166

ABSTRACT

The exploration of linguistic information promotes the development of scene text recognition. Benefiting from its strengths in parallel reasoning and global relationship capture, the transformer-based language model (TLM) has recently achieved dominant performance. Since the TLM is a structure decoupled from the recognition process, we argue that its capability is limited by low-quality visual predictions at its input. Specifically: 1) a visual prediction with low character-wise accuracy increases the correction burden on the TLM; 2) an inconsistent word length between the visual prediction and the original image provides wrong language-modeling guidance to the TLM. In this paper, we propose a Progressive scEne Text Recognizer (PETR) that improves the capability of the transformer-based language model by addressing these two problems. First, a Destruction Learning Module (DLM) is proposed to consider linguistic information in the visual context. DLM introduces the recognition of destructed images with disordered patches during training. By guiding the vision model to restore patch order and make word-level predictions on the destructed images, a visual prediction with high character-wise accuracy is obtained by exploiting the inner relationships among local visual patches. Second, a new Language Rectification Module (LRM) is proposed to optimize the word length and rectify the language guidance. By progressively applying LRM at different language-modeling steps, a novel progressive rectification network is constructed to handle extremely challenging cases (e.g., distortion and occlusion). By utilizing DLM and LRM, PETR enhances the capability of the transformer-based language model from a more general perspective, namely reducing the correction burden and rectifying the language-modeling guidance. Compared with parallel transformer-based methods, PETR obtains improvements of 1.0% and 0.8% on regular and irregular datasets, respectively, while introducing only 1.7M additional parameters. Extensive experiments on both English and Chinese benchmarks demonstrate that PETR achieves state-of-the-art results.
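As an illustration of the patch-destruction idea behind the Destruction Learning Module, the following is a minimal PyTorch sketch that splits a word image into a grid of patches, shuffles them, and returns the permutation so a vision model could be supervised to restore patch order. The grid size and tensor layout are assumptions; this is not the authors' PETR code.

```python
import torch

def destruct_image(img, grid=(2, 4), generator=None):
    """Split a word image (C, H, W) into a grid of patches, shuffle them, and
    return the destructed image plus the permutation used, so patch-order
    restoration can be supervised (illustrative sketch)."""
    c, h, w = img.shape
    gh, gw = grid
    ph, pw = h // gh, w // gw
    # Cut the image into gh * gw non-overlapping patches.
    patches = img.unfold(1, ph, ph).unfold(2, pw, pw)   # (c, gh, gw, ph, pw)
    patches = patches.reshape(c, gh * gw, ph, pw)
    perm = torch.randperm(gh * gw, generator=generator)
    shuffled = patches[:, perm]
    # Reassemble the shuffled patches into a full-size image.
    shuffled = shuffled.reshape(c, gh, gw, ph, pw).permute(0, 1, 3, 2, 4)
    destructed = shuffled.reshape(c, gh * ph, gw * pw)
    return destructed, perm

# Example: a 32x128 word image split into a 2x4 grid of 16x32 patches.
img = torch.randn(3, 32, 128)
destructed, perm = destruct_image(img)
print(destructed.shape, perm)  # torch.Size([3, 32, 128]) and a permutation of 0..7
```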

3.
Front Oncol ; 12: 901586, 2022.
Article in English | MEDLINE | ID: mdl-35686096

ABSTRACT

Background: Although deep learning systems (DLSs) have been developed to diagnose urine cytology, more evidence is required to prove whether such systems can also predict histopathology results. Methods: We retrospectively retrieved urine cytology slides and matched histological results. High-power-field panel images were annotated by a certified urological pathologist. A deep learning system was designed with a ResNet101 Faster R-CNN (faster region-based convolutional neural network). It was first built to spot cancer cells and was then used directly to predict the likelihood of tissue malignancy. Results: We retrieved 441 positive cases and 395 negative cases. Development involved 387 positive cases, accounting for 2,668 labeled cells, to train the DLS to spot cancer cells. The DLS was then used to predict the corresponding histopathology results. In an internal test set of 85 cases, the area under the curve (AUC) was 0.90 (95% CI 0.84-0.96), and the kappa score was 0.68 (95% CI 0.52-0.84), indicating substantial agreement. The F1 score was 0.56, sensitivity was 71% (95% CI 52%-85%), and specificity was 94% (95% CI 84%-98%). In an additional test set of 333 cases, the DLS achieved 0.25 false-positive cells per image. The AUC was 0.93 (95% CI 0.90-0.95), and the kappa score was 0.58 (95% CI 0.46-0.70), indicating moderate agreement. The F1 score was 0.66, sensitivity was 67% (95% CI 54%-78%), and specificity was 92% (95% CI 88%-95%). Conclusions: The deep learning system could predict whether malignancy was present from cytocentrifuged urine cytology images. The process was explainable, since the prediction of malignancy was based directly on the abnormal cells selected by the model and could be verified by examining those candidate cells in each image. Thus, this DLS was not just a tool for pathologists in cytology diagnosis; it simultaneously provided novel histopathologic insights for urologists.
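As a rough illustration of how per-cell detections could be turned into a case-level prediction, the following Python sketch aggregates the (box, score) outputs of a detector such as Faster R-CNN into a single malignancy likelihood while keeping the candidate cells for review. The max-score aggregation rule and the data layout are assumptions and do not reproduce the authors' exact pipeline.

```python
def slide_malignancy_score(detections, score_threshold=0.5):
    """Aggregate per-image cell detections into one case-level malignancy
    likelihood (hypothetical sketch).

    detections: list of per-image lists of (box, score) pairs produced by a
    cell detector. Returns the best score and the retained candidate cells,
    which can be reviewed to make the prediction explainable."""
    best = 0.0
    candidates = []
    for image_dets in detections:
        for box, score in image_dets:
            if score >= score_threshold:
                candidates.append((box, score))  # keep for visual verification
                best = max(best, score)
    return best, candidates

# Toy example: two high-power-field images, one containing a suspicious cell.
dets = [
    [((10, 12, 40, 44), 0.91)],
    [((5, 8, 30, 28), 0.20)],
]
score, cells = slide_malignancy_score(dets)
print(score, len(cells))  # 0.91 1
```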

4.
Neuroinformatics ; 16(3-4): 445-455, 2018 10.
Article in English | MEDLINE | ID: mdl-29350328

ABSTRACT

Reading Uyghur text from biomedical graphic images is a challenging problem due to the complex layout and cursive writing of Uyghur. In this paper, we propose a system that extracts text from Uyghur biomedical images and matches the text against a specific lexicon for semantic analysis. The proposed system has the following distinctive properties. First, it is an integrated system: it detects and crops Uyghur text lines using a single fully convolutional neural network, and then matches keywords from the lexicon with a well-designed matching network. Second, to train the matching network effectively, an online sampling method is applied that continually generates synthetic data. Finally, we propose a GPU acceleration scheme that lets the matching network match a complete Uyghur text line directly rather than a single window at a time. Experimental results on a benchmark dataset show that our method achieves an F-measure of 74.5%. Moreover, owing to the GPU acceleration scheme, the system remains efficient, with a running time of 0.5 s per image.
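As an illustration of matching a complete text line against a lexicon in one batched GPU operation rather than with a sliding window, the following PyTorch sketch scores a line embedding against all keyword embeddings with cosine similarity. The embedding dimensions, the similarity measure, and the threshold are assumptions, not the paper's matching network.

```python
import torch
import torch.nn.functional as F

def match_keywords(line_features, lexicon_features, threshold=0.8):
    """Score one text-line embedding (D,) against K lexicon keyword
    embeddings (K, D) in a single batched operation; return the best
    (index, score) pair above the threshold, else None (illustrative sketch)."""
    sims = F.cosine_similarity(line_features.unsqueeze(0), lexicon_features, dim=1)  # (K,)
    best_score, best_idx = sims.max(dim=0)
    return (best_idx.item(), best_score.item()) if best_score >= threshold else None

# Toy example with random embeddings for 1000 keywords of dimension 256.
device = "cuda" if torch.cuda.is_available() else "cpu"
lexicon = F.normalize(torch.randn(1000, 256, device=device), dim=1)
line = lexicon[42] + 0.05 * torch.randn(256, device=device)  # noisy copy of keyword 42
print(match_keywords(line, lexicon))
```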


Subject(s)
Biomedical Technology/methods; Data Mining/methods; Handwriting; Neural Networks, Computer; Pattern Recognition, Automated/methods; Semantics; China/ethnology; Databases, Factual; Humans