Results 1 - 16 of 16
1.
IEEE Trans Pattern Anal Mach Intell ; 45(4): 4136-4151, 2023 Apr.
Article in English | MEDLINE | ID: mdl-35816538

ABSTRACT

Weakly-supervised temporal action localization (W-TAL) aims to classify and localize all action instances in untrimmed videos under only video-level supervision. Without frame-level annotations, it is challenging for W-TAL methods to clearly distinguish actions from background, which severely degrades action boundary localization and action proposal scoring. In this paper, we present an adaptive two-stream consensus network (A-TSCN) to address this problem. Our A-TSCN features an iterative refinement training scheme: a frame-level pseudo ground truth is generated and iteratively updated from a late-fusion activation sequence, and is used to provide frame-level supervision for improved model training. In addition, we introduce an adaptive attention normalization loss, which adaptively selects action and background snippets according to the video's attention distribution. By separating the attention values of the selected action and background snippets, the loss forces the predicted attention to act as a binary selection and promotes precise localization of action boundaries. Furthermore, we propose video-level and snippet-level uncertainty estimators that mitigate the adverse effects of learning from noisy pseudo ground truth. Experiments on the THUMOS14, ActivityNet v1.2, ActivityNet v1.3, and HACS datasets show that our A-TSCN outperforms current state-of-the-art methods and even achieves performance comparable to several fully-supervised methods.
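The separation idea behind the adaptive attention normalization loss can be illustrated with a short sketch. This is a minimal rendering that assumes a fixed number k of selected snippets per side; the paper's adaptive selection from the attention distribution is not reproduced, and all names are illustrative.

```python
# Minimal sketch of the attention-separation idea: push the attention of
# likely action snippets toward 1 and likely background snippets toward 0,
# so the predicted attention approaches a binary selection.
# NOTE: k is fixed here for illustration; A-TSCN selects snippets
# adaptively from the video's attention distribution.
import torch

def attention_separation_loss(attention: torch.Tensor, k: int) -> torch.Tensor:
    """attention: (T,) per-snippet attention values in [0, 1]."""
    top_k = torch.topk(attention, k).values                    # likely action
    bottom_k = torch.topk(attention, k, largest=False).values  # likely background
    return (1.0 - top_k).mean() + bottom_k.mean()
```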

2.
IEEE Trans Med Imaging ; 42(7): 2057-2067, 2023 Jul.
Article in English | MEDLINE | ID: mdl-36215346

ABSTRACT

Federated Learning (FL) is a machine learning paradigm in which many local nodes collaboratively train a central model while keeping the training data decentralized. This is particularly relevant for clinical applications, since patient data usually may not be transferred out of medical facilities, making FL necessary. Existing FL methods typically share model parameters or employ co-distillation to address the issue of unbalanced data distribution. However, they also require numerous rounds of synchronized communication and, more importantly, suffer from a privacy leakage risk. In this work, we propose a privacy-preserving FL framework that leverages unlabeled public data for one-way offline knowledge distillation. The central model is learned from local knowledge via ensemble attention distillation. Like existing FL approaches, our technique uses decentralized and heterogeneous local data; more importantly, it significantly reduces the risk of privacy leakage. Extensive experiments on image classification, segmentation, and reconstruction tasks demonstrate that our method achieves very competitive performance with more robust privacy preservation.


Subject(s)
Machine Learning , Privacy , Humans
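A hedged sketch of the one-way offline distillation loop described above: frozen local models act as teachers whose ensembled soft predictions on unlabeled public data supervise the central model, so no local parameters or patient data leave the nodes. Function and variable names are illustrative, and the attention-distillation term is omitted for brevity.

```python
# Sketch of one-way offline ensemble distillation on unlabeled public data.
# Local models are frozen teachers; only their outputs on public images
# reach the central (student) model.
import torch
import torch.nn.functional as F

def distill_step(central, local_models, public_batch, optimizer, T=2.0):
    with torch.no_grad():
        # Ensemble the frozen local models' soft predictions.
        teacher_logits = torch.stack(
            [m(public_batch) for m in local_models]).mean(dim=0)
    student_logits = central(public_batch)
    # Standard temperature-scaled distillation loss.
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean") * T * T
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```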
3.
Internet Things (Amst) ; 18: 100511, 2022 May.
Article in English | MEDLINE | ID: mdl-37521492

ABSTRACT

The use of face masks is an important way to fight the COVID-19 pandemic. In this paper, we envision the Smart Mask, an IoT-supported platform and ecosystem aimed at preventing and controlling the spread of COVID-19 and other respiratory viruses. The integration of sensing, materials, AI, wireless, IoT, and software enables real-time gathering of health data and detection of health-related events from the user as well as from their environment. At a larger scale, AI-based analysis of health data makes it possible to reduce medical costs through more accurate diagnoses and treatment plans; comparing personal data to large-scale public data, for example, enables drawing up a personal health trajectory. Key research problems and future research directions for smart respiratory protective equipment are identified. To study the concept, a Smart Mask prototype was developed together with an accompanying user application, backend, and health AI.

4.
IEEE Trans Neural Netw Learn Syst ; 33(11): 6494-6503, 2022 Nov.
Article in English | MEDLINE | ID: mdl-34086579

ABSTRACT

Modern convolutional neural network (CNN)-based object detectors focus on feature configuration during training but often ignore feature optimization during inference. In this article, we propose a new feature optimization approach to enhance features and suppress background noise in both the training and inference stages. We introduce a generic inference-aware feature filtering (IFF) module that can be easily combined with existing detectors, resulting in our iffDetector. Unlike conventional open-loop feature calculation approaches without feedback, the proposed IFF module performs closed-loop feature optimization by leveraging high-level semantics to enhance convolutional features. By applying the Fourier transform to analyze our detector, we prove that the IFF module acts as negative feedback that theoretically guarantees the stability of feature learning. IFF can be fused with CNN-based object detectors in a plug-and-play manner with little computational overhead. Experiments on the PASCAL VOC and MS COCO datasets demonstrate that our iffDetector consistently outperforms state-of-the-art methods by significant margins.
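As a rough illustration of closed-loop feature filtering, the sketch below gates backbone features with per-channel weights computed from a pooled semantic summary of those same features. This is a generic attention-style gate written for illustration only, not the paper's exact IFF formulation.

```python
# Illustrative sketch: high-level semantics (a pooled summary vector)
# feed back onto the convolutional features as per-channel weights,
# enhancing informative channels and suppressing background.
import torch
import torch.nn as nn

class FeatureFilter(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(channels, channels), nn.Sigmoid())

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (N, C, H, W)
        summary = features.mean(dim=(2, 3))              # (N, C) semantic summary
        weights = self.gate(summary)[:, :, None, None]   # (N, C, 1, 1) gates
        return features * weights                        # closed-loop reweighting
```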

5.
IEEE Trans Pattern Anal Mach Intell ; 43(11): 3799-3819, 2021 Nov.
Article in English | MEDLINE | ID: mdl-32365018

ABSTRACT

Charts are useful communication tools for presenting data in a visually appealing format that facilitates comprehension. Many studies have been dedicated to chart mining, the process of automatically detecting, extracting, and analyzing charts to reproduce the tabular data originally used to create them. By providing access to data that might not be available in other formats, chart mining facilitates the creation of many downstream applications. This paper presents a comprehensive survey of approaches across all components of the automated chart mining pipeline: (i) automated extraction of charts from documents; (ii) processing of multi-panel charts; (iii) automatic image classifiers to collect chart images at scale; (iv) automated extraction of data from each chart image, for popular chart types as well as selected specialized classes; (v) applications of chart mining; and (vi) datasets for training and evaluation, and the methods used to build them. Finally, we summarize the main trends found in the literature and provide pointers to areas for further research in chart mining.

6.
Article in English | MEDLINE | ID: mdl-30387730

ABSTRACT

Document image binarization classifies each pixel in an input document image as either foreground or background under the assumption that the document is pseudo-binary in nature. However, noise introduced during acquisition, or due to aging or handling of the document, can make binarization a challenging task. This paper presents a novel game-theory-inspired binarization technique for degraded document images. A two-player, non-zero-sum, non-cooperative game is designed at the pixel level to extract local information, which is then fed to a K-means algorithm to classify each pixel as foreground or background. We also present a preprocessing step that eliminates the intensity variation that often appears in the background and a post-processing step that refines the results. The method is tested on seven publicly available datasets, namely DIBCO 2009-14 and 2016. The experimental results show that GiB (Game theory Inspired Binarization) outperforms competing state-of-the-art methods in most cases.
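The final classification stage can be sketched as follows, assuming each pixel is described by a feature vector (here just its grayscale intensity; the paper feeds the local information extracted by the pixel-level game) and clustered into two groups with K-means.

```python
# Minimal sketch of K-means-based pixel classification for binarization.
# The per-pixel feature here is grayscale intensity only, as a stand-in
# for the game-derived local features used in the paper.
import numpy as np
from sklearn.cluster import KMeans

def binarize(image: np.ndarray) -> np.ndarray:
    """image: 2-D grayscale array; returns a boolean foreground mask."""
    pixels = image.reshape(-1, 1).astype(np.float64)
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(pixels)
    # Take the darker cluster as foreground (ink).
    means = [pixels[labels == k].mean() for k in (0, 1)]
    fg = int(np.argmin(means))
    return (labels == fg).reshape(image.shape)
```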

7.
IEEE Trans Image Process ; 25(9): 4444-4457, 2016 Sep.
Article in English | MEDLINE | ID: mdl-27362977

ABSTRACT

Blind image quality assessment (BIQA) research aims to develop a perceptual model that evaluates the quality of distorted images automatically and accurately without access to non-distorted reference images. State-of-the-art general-purpose BIQA methods can be classified into two categories according to the types of features used. The first uses handcrafted features that rely on the statistical regularities of natural images; these, however, are not suitable for images containing text and artificial graphics. The second uses learned features, which invariably require a large codebook or supervised codebook-updating procedures to obtain satisfactory performance; these are time-consuming and impractical. In this paper, we propose a novel general-purpose BIQA method based on high order statistics aggregation (HOSA), which requires only a small codebook. HOSA consists of three steps. First, local normalized image patches are extracted as local features on a regular grid, and a codebook containing 100 codewords is constructed by K-means clustering; in addition to the mean of each cluster, the diagonal covariance and coskewness (i.e., dimension-wise variance and skewness) of the clusters are also calculated. Second, each local feature is softly assigned to its several nearest clusters, and the differences of high order statistics (mean, variance, and skewness) between local features and the corresponding clusters are softly aggregated to build a global quality-aware image representation. Finally, support vector regression is adopted to learn the mapping between perceptual features and subjective opinion scores. The proposed method has been extensively evaluated on ten image databases with both simulated and realistic image distortions, and it shows highly competitive performance against state-of-the-art BIQA methods.
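A simplified sketch of the soft aggregation step, shown for the mean term only; the variance and skewness terms follow the same pattern against each cluster's diagonal variance and coskewness. The soft-weighting scheme and parameter names here are assumptions for illustration.

```python
# Sketch of the HOSA mean-difference aggregation: each local feature is
# softly assigned to its r nearest codewords, and weighted residuals
# against the cluster means are accumulated per codeword.
import numpy as np
from sklearn.cluster import KMeans

def hosa_mean_term(features: np.ndarray, kmeans: KMeans, r=5, beta=1.0):
    """features: (N, D) local descriptors; kmeans: fitted codebook.
    Returns a (K, D) aggregated mean-residual representation."""
    centers = kmeans.cluster_centers_                 # (K, D)
    agg = np.zeros_like(centers)
    for x in features:
        d2 = ((centers - x) ** 2).sum(axis=1)         # squared distances
        nearest = np.argsort(d2)[:r]                  # r nearest codewords
        w = np.exp(-beta * d2[nearest])
        w /= w.sum()                                  # soft assignment weights
        for k, wk in zip(nearest, w):
            agg[k] += wk * (x - centers[k])           # mean-difference residual
    return agg / max(len(features), 1)
```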

8.
IEEE Trans Pattern Anal Mach Intell ; 37(7): 1480-500, 2015 Jul.
Article in English | MEDLINE | ID: mdl-26352454

ABSTRACT

This paper analyzes, compares, and contrasts the technical challenges, methods, and performance of text detection and recognition research in color imagery. It summarizes the fundamental problems and enumerates factors that should be considered when addressing them. Existing techniques are categorized as either stepwise or integrated, and sub-problems are highlighted, including text localization, verification, segmentation, and recognition. Special issues associated with the enhancement of degraded text and the processing of video text and of multi-oriented, perspectively distorted, and multilingual text are also addressed. The categories and sub-categories of text are illustrated, benchmark datasets are enumerated, and the performance of the most representative approaches is compared. This review provides a fundamental comparison and analysis of the remaining problems in the field.

9.
IEEE Trans Image Process ; 21(7): 3129-38, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22410336

ABSTRACT

The goal of no-reference objective image quality assessment (NR-IQA) is to develop a computational model that predicts the human-perceived quality of distorted images accurately and automatically without any prior knowledge of reference images. Most existing NR-IQA approaches are distortion specific and are typically limited to one or two specific types of distortions. In most practical applications, however, information about the distortion type is not available. In this paper, we propose a general-purpose NR-IQA approach based on visual codebooks. A visual codebook consisting of Gabor-filter-based local features extracted from local image patches is used to capture the complex statistics of a natural image. The codebook encodes statistics by quantizing the feature space and accumulating histograms of patch appearances. The method does not assume any specific type of distortion; however, when evaluating images with a particular type of distortion, it does require examples with the same or a similar distortion for training. Experimental results demonstrate that quality scores predicted by our method are consistent with human-perceived image quality. The proposed method is comparable to state-of-the-art general-purpose NR-IQA methods and outperforms the full-reference image quality metrics peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) index on the Laboratory for Image and Video Engineering (LIVE) IQA database.
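The codebook encoding can be sketched in a few lines, assuming the Gabor-based patch features have already been extracted and a K-means codebook has been fit; the histogram of codeword assignments then serves as the image representation passed to a quality regressor.

```python
# Sketch of the codebook idea: quantize per-patch features against a
# learned codebook and accumulate a normalized histogram of codeword
# occurrences as the quality-aware image representation.
import numpy as np
from sklearn.cluster import KMeans

def codebook_histogram(patch_features: np.ndarray, codebook: KMeans) -> np.ndarray:
    """patch_features: (N, D) Gabor-based descriptors, one per patch."""
    words = codebook.predict(patch_features)          # quantize feature space
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)                # normalized appearance histogram
```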

10.
IEEE Trans Pattern Anal Mach Intell ; 31(11): 2015-31, 2009 Nov.
Article in English | MEDLINE | ID: mdl-19762928

ABSTRACT

As one of the most pervasive methods of individual identification and document authentication, signatures present convincing evidence and provide an important form of indexing for effective document image processing and retrieval in a broad range of applications. However, detection and segmentation of free-form objects such as signatures from cluttered backgrounds is currently an open document analysis problem. In this paper, we focus on two fundamental problems in signature-based document image retrieval. First, we propose a novel multiscale approach to jointly detecting and segmenting signatures from document images. Rather than focusing on local features that typically have large variations, our approach captures structural saliency using a signature production model and computes the dynamic curvature of 2D contour fragments over multiple scales. This detection framework is general and computationally tractable. Second, we treat the problem of signature retrieval in the unconstrained setting of translation, scale, and rotation invariant nonrigid shape matching. We propose two novel measures of shape dissimilarity based on anisotropic scaling and registration residual error, and we present a supervised learning framework for combining complementary shape information from different dissimilarity metrics using LDA. We quantitatively study state-of-the-art shape representations, shape matching algorithms, measures of dissimilarity, and the use of multiple instances as queries in document image retrieval. We further demonstrate our matching techniques in offline signature verification. Extensive experiments using large real-world collections of English and Arabic machine-printed and handwritten documents demonstrate the excellent performance of our approaches.


Subject(s)
Algorithms , Artificial Intelligence , Electronic Data Processing/methods , Handwriting , Image Interpretation, Computer-Assisted/methods , Information Storage and Retrieval/methods , Pattern Recognition, Automated/methods , Biometry/methods , Image Enhancement/methods , Reading , Reproducibility of Results , Sensitivity and Specificity , Subtraction Technique
11.
IEEE Trans Pattern Anal Mach Intell ; 31(2): 193-209, 2009 Feb.
Article in English | MEDLINE | ID: mdl-19110488

ABSTRACT

Resolution of different types of loops in handwritten script is a difficult task and an important step in many classic word recognition systems, writer modeling, and signature verification. When processing handwritten script, a great deal of ambiguity arises when strokes overlap, merge, or intersect. This paper presents a novel loop modeling and contour-based handwriting analysis method that improves loop investigation. We show excellent results on various loop resolution scenarios, including axial loop understanding and collapsed loop recovery. We demonstrate our approach on several realistic datasets of static binary images and compare against the ground truth of the genuine online signal.


Subject(s)
Algorithms , Artificial Intelligence , Electronic Data Processing/methods , Handwriting , Image Interpretation, Computer-Assisted/methods , Information Storage and Retrieval/methods , Pattern Recognition, Automated/methods , Computer Graphics , Documentation , Humans , Image Enhancement/methods , Numerical Analysis, Computer-Assisted , Online Systems , Reproducibility of Results , Sensitivity and Specificity , Signal Processing, Computer-Assisted , Subtraction Technique , User-Computer Interface
12.
IEEE Trans Pattern Anal Mach Intell ; 30(8): 1313-29, 2008 Aug.
Article in English | MEDLINE | ID: mdl-18566488

ABSTRACT

Text line segmentation in freestyle handwritten documents remains an open document analysis problem. Curvilinear text lines and small gaps between neighboring text lines present a challenge to algorithms developed for machine-printed or hand-printed documents. In this paper, we propose a novel approach based on density estimation and a state-of-the-art image segmentation technique, the level set method. From an input document image, we estimate a probability map in which each element represents the probability that the underlying pixel belongs to a text line. The level set method is then used to determine the boundaries of neighboring text lines by evolving an initial estimate. Unlike connected-component-based methods (e.g., [1], [2]), the proposed algorithm does not use any script-specific knowledge. Extensive quantitative experiments on freestyle handwritten documents with diverse scripts, such as Arabic, Chinese, Korean, and Hindi, demonstrate that our algorithm consistently outperforms previous methods [1]-[3]. Further experiments show that the proposed algorithm is robust to scale change, rotation, and noise.


Subject(s)
Algorithms , Artificial Intelligence , Documentation/methods , Electronic Data Processing/methods , Handwriting , Image Interpretation, Computer-Assisted/methods , Information Storage and Retrieval/methods , Pattern Recognition, Automated/methods , Image Enhancement/methods , Reproducibility of Results , Sensitivity and Specificity
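A minimal sketch of the probability-map stage, under the assumption that the density estimate amounts to anisotropic Gaussian smoothing of the ink pixels; the level set evolution that extracts the text line boundaries from this map is not reproduced here.

```python
# Sketch of a density-based text line probability map: smoothing ink
# pixels with a wide horizontal Gaussian links characters on the same
# line into a smooth per-pixel line-membership score.
import numpy as np
from scipy.ndimage import gaussian_filter

def text_line_probability_map(binary_doc: np.ndarray, sigma=(3.0, 15.0)):
    """binary_doc: 2-D array with 1 = ink; sigma = (rows, cols) widths."""
    density = gaussian_filter(binary_doc.astype(float), sigma=sigma)
    return density / (density.max() + 1e-12)          # normalize to [0, 1]
```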
13.
IEEE Trans Pattern Anal Mach Intell ; 30(4): 591-605, 2008 Apr.
Article in English | MEDLINE | ID: mdl-18276966

ABSTRACT

Compared to typical scanners, handheld cameras offer convenient, flexible, portable, and non-contact image capture, which enables many new applications and breathes new life into existing ones. However, camera-captured documents may suffer from distortions caused by non-planar document shape and perspective projection, which cause current OCR technologies to fail. We present a geometric rectification framework for restoring the frontal-flat view of a document from a single camera-captured image. Our approach estimates the 3D document shape from texture flow information obtained directly from the image, without requiring additional 3D/metric data or prior camera calibration. Our framework provides a unified solution for planar and curved documents and can be applied in many camera-based document analysis applications, especially mobile ones. Experiments show that our method produces results that are significantly more OCR-compatible than the original images.


Subject(s)
Documentation/methods , Electronic Data Processing/methods , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Imaging, Three-Dimensional/methods , Pattern Recognition, Automated/methods , Photography/methods , Algorithms , Artifacts , Artificial Intelligence , Reproducibility of Results , Sensitivity and Specificity
14.
IEEE Trans Pattern Anal Mach Intell ; 28(4): 643-9, 2006 Apr.
Article in English | MEDLINE | ID: mdl-16566512

ABSTRACT

In previous work on point matching, a set of points is often treated as an instance of a joint distribution to exploit global relationships in the point set. For nonrigid shapes, however, the local relationship among neighboring points is stronger and more stable than the global one. In this paper, we introduce the notion of a neighborhood structure for the general point matching problem. We formulate point matching as an optimization problem that preserves local neighborhood structures during matching. Our approach has a simple graph matching interpretation, where each point is a node in the graph and two nodes are connected by an edge if they are neighbors. The optimal match between two graphs is the one that maximizes the number of matched edges. Existing techniques are leveraged to search for an optimal solution, with the shape context distance used to initialize the graph matching, followed by relaxation labeling updates for refinement. Extensive experiments show the robustness of our approach under deformation, noise in point locations, outliers, occlusion, and rotation. It outperforms the shape context and TPS-RPM algorithms in most scenarios.


Subject(s)
Algorithms , Artificial Intelligence , Image Interpretation, Computer-Assisted/methods , Imaging, Three-Dimensional/methods , Pattern Recognition, Automated/methods , Signal Processing, Computer-Assisted , Subtraction Technique , Computer Graphics , Image Enhancement/methods , Information Storage and Retrieval/methods
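The matching objective has a compact form: build a k-nearest-neighbor graph on each point set and count the edges a candidate correspondence preserves. The sketch below shows only this objective; the shape-context initialization and relaxation labeling refinement used to optimize it are not included, and the helper names are illustrative.

```python
# Sketch of the neighborhood-preservation objective for point matching:
# two nodes are connected if they are k-nearest neighbors, and a match
# is scored by the number of edges it preserves across the two graphs.
import numpy as np
from scipy.spatial import cKDTree

def knn_edges(points: np.ndarray, k: int = 3) -> set:
    """points: (N, 2) coordinates; returns directed neighbor edges."""
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k + 1)              # nearest is the point itself
    return {(i, j) for i, row in enumerate(idx) for j in row[1:]}

def matched_edges(edges_a: set, edges_b: set, match: dict) -> int:
    """match: index in set A -> index in set B; higher is better."""
    return sum((match[i], match[j]) in edges_b
               for i, j in edges_a if i in match and j in match)
```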
15.
IEEE Trans Pattern Anal Mach Intell ; 27(5): 777-92, 2005 May.
Article in English | MEDLINE | ID: mdl-15875798

ABSTRACT

The detection of groups of parallel lines is important in applications such as form processing and text (handwriting) extraction from rule-lined paper. These tasks can be very challenging in degraded documents where the lines are severely broken. In this paper, we propose a novel model-based method that incorporates high-level context to detect these lines. After preprocessing (such as skew correction and text filtering), we use trained Hidden Markov Models (HMMs) to locate the optimal positions of all lines simultaneously on the horizontal or vertical projection profiles using Viterbi decoding. The algorithm is trainable, so it can be easily adapted to different application scenarios. Experiments on known-form processing and rule line detection show that our method is robust and achieves better results than other widely used line detection methods.


Subject(s)
Algorithms , Artificial Intelligence , Electronic Data Processing/methods , Handwriting , Image Interpretation, Computer-Assisted/methods , Information Storage and Retrieval/methods , Pattern Recognition, Automated/methods , Computer Graphics , Documentation/methods , Image Enhancement/methods , Markov Chains , Models, Statistical , Numerical Analysis, Computer-Assisted , Reproducibility of Results , Sensitivity and Specificity , Signal Processing, Computer-Assisted , Subtraction Technique
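A simplified sketch of Viterbi decoding on a projection profile, using a hypothetical two-state (line vs. gap) chain with Gaussian emissions; the paper trains HMMs that locate all lines jointly, which this reduced model only approximates.

```python
# Two-state Viterbi decoding over a 1-D projection profile: state 0
# models high-ink "line" bins, state 1 models low-ink "gap" bins.
# Emission means, variance, and the stay probability are illustrative.
import numpy as np

def viterbi_line_states(profile, means=(200.0, 10.0), var=50.0, stay=0.95):
    """profile: 1-D ink-count projection; returns a 0/1 state per bin."""
    log_emit = -((profile[:, None] - np.array(means)) ** 2) / (2 * var)
    log_trans = np.log(np.array([[stay, 1 - stay], [1 - stay, stay]]))
    T = len(profile)
    score = np.zeros((T, 2))
    back = np.zeros((T, 2), dtype=int)
    score[0] = log_emit[0]
    for t in range(1, T):
        cand = score[t - 1][:, None] + log_trans      # [prev state, cur state]
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_emit[t]
    states = np.zeros(T, dtype=int)
    states[-1] = score[-1].argmax()
    for t in range(T - 2, -1, -1):                    # backtrack best path
        states[t] = back[t + 1, states[t + 1]]
    return states                                     # 0 = line bin, 1 = gap bin
```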
16.
IEEE Trans Pattern Anal Mach Intell ; 26(3): 337-53, 2004 Mar.
Article in English | MEDLINE | ID: mdl-15376881

ABSTRACT

In this paper, we address the problem of identifying text in noisy document images. We focus in particular on segmenting and distinguishing between handwriting and machine-printed text because: 1) handwriting in a document often indicates corrections, additions, or other supplemental information that should be treated differently from the main content, and 2) the segmentation and recognition techniques required for machine-printed and handwritten text are significantly different. A novel aspect of our approach is that we treat noise as a separate class and model noise based on selected features. Trained Fisher classifiers are used to identify machine-printed text and handwriting from noise, and we further exploit context to refine the classification. A Markov Random Field (MRF)-based approach is used to model the geometrical structure of the printed text, handwriting, and noise to rectify misclassifications. Experimental results show that our approach is robust and can significantly improve page segmentation in noisy document collections.


Subject(s)
Algorithms , Artificial Intelligence , Electronic Data Processing/methods , Image Interpretation, Computer-Assisted/methods , Information Storage and Retrieval/methods , Pattern Recognition, Automated , Writing , Computer Graphics , Documentation , Image Enhancement/methods , Models, Statistical , Numerical Analysis, Computer-Assisted , Reading , Reproducibility of Results , Sensitivity and Specificity , Signal Processing, Computer-Assisted , Stochastic Processes , Subtraction Technique , User-Computer Interface
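The first classification stage can be sketched with an off-the-shelf Fisher (linear) discriminant, assuming block-level feature vectors have been extracted; the context-based and MRF refinements that follow in the paper are not shown, and the feature and label names are hypothetical.

```python
# Sketch of the Fisher-classifier stage: a linear discriminant separating
# machine-printed text, handwriting, and noise from per-region features.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def train_region_classifier(features: np.ndarray, labels: np.ndarray):
    """features: (N, D) per-region descriptors;
    labels: 0 = printed, 1 = handwriting, 2 = noise."""
    clf = LinearDiscriminantAnalysis()
    clf.fit(features, labels)
    return clf

# Usage: predictions = train_region_classifier(X, y).predict(X_new)
```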