Your browser doesn't support javascript.
loading
Show: 20 | 50 | 100
Results 1 - 3 de 3
Filter
Add more filters










Database
Language
Publication year range
1.
IEEE Trans Pattern Anal Mach Intell ; 44(3): 1638-1652, 2022 03.
Article in English | MEDLINE | ID: mdl-32822292

ABSTRACT

Text instance as one category of self-described objects provides valuable information for understanding and describing cluttered scenes. The rich and precise high-level semantics embodied in the text could drastically benefit the understanding of the world around us. While most recent visual phrase grounding approaches focus on general objects, this paper explores extracting designated texts and predicting unambiguous scene text information, i.e., to accurately localize and recognize a specific targeted text instance in a cluttered image from natural language descriptions (referring expressions). To address this issue, first a novel recurrent dense text localization network (DTLN) is proposed to sequentially decode the intermediate convolutional representations of a cluttered scene image into a set of distinct text instance detections. Our approach avoids repeated text detections at multiple scales by recurrently memorizing previous detections, and effectively tackles crowded text instances in close proximity. Second, we propose a context reasoning text retrieval (CRTR) model, which jointly encodes text instances and their context information through a recurrent network, and ranks localized text bounding boxes by a scoring function of context compatibility. Third, a recurrent text recognition module is introduced to extend the applicability of aforementioned DTLN and CRTR models, via text verification or transcription. Quantitative evaluations on standard scene text extraction benchmarks and a newly collected scene text retrieval dataset demonstrate the effectiveness and advantages of our models for the joint scene text localization, retrieval, and recognition task.


Subject(s)
Algorithms , Semantics
2.
Article in English | MEDLINE | ID: mdl-31369378

ABSTRACT

Text instance provides valuable information for the understanding and interpretation of natural scenes. The rich, precise high-level semantics embodied in the text could be beneficial for understanding the world around us, and empower a wide range of real-world applications. While most recent visual phrase grounding approaches focus on general objects, this paper explores extracting designated texts and predicting unambiguous scene text segmentation mask, i.e. scene text segmentation from natural language descriptions (referring expressions) like orange text on a little boy in black swinging a bat. The solution of this novel problem enables accurate segmentation of scene text instances from the complex background. In our proposed framework, a unified deep network jointly models visual and linguistic information by encoding both region-level and pixel-level visual features of natural scene images into spatial feature maps, and then decode them into saliency response map of text instances. To conduct quantitative evaluations, we establish a new scene text referring expression segmentation dataset: COCO-CharRef. Experimental results demonstrate the effectiveness of the proposed framework on the text instance segmentation task. By combining image-based visual features with language-based textual explanations, our framework outperforms baselines that are derived from state-of-the-art text localization and natural language object retrieval methods on COCO-CharRef dataset.

3.
IEEE Trans Mob Comput ; 18(3): 702-714, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30774566

ABSTRACT

This paper presents a new holistic vision-based mobile assistive navigation system to help blind and visually impaired people with indoor independent travel. The system detects dynamic obstacles and adjusts path planning in real-time to improve navigation safety. First, we develop an indoor map editor to parse geometric information from architectural models and generate a semantic map consisting of a global 2D traversable grid map layer and context-aware layers. By leveraging the visual positioning service (VPS) within the Google Tango device, we design a map alignment algorithm to bridge the visual area description file (ADF) and semantic map to achieve semantic localization. Using the on-board RGB-D camera, we develop an efficient obstacle detection and avoidance approach based on a time-stamped map Kalman filter (TSM-KF) algorithm. A multi-modal human-machine interface (HMI) is designed with speech-audio interaction and robust haptic interaction through an electronic SmartCane. Finally, field experiments by blindfolded and blind subjects demonstrate that the proposed system provides an effective tool to help blind individuals with indoor navigation and wayfinding.

SELECTION OF CITATIONS
SEARCH DETAIL
...