Results 1 - 18 of 18
1.
IEEE Trans Pattern Anal Mach Intell ; 46(5): 3013-3030, 2024 May.
Article in English | MEDLINE | ID: mdl-38090825

ABSTRACT

Fast person re-identification (ReID) aims to search person images quickly and accurately. The main idea of recent fast ReID methods is hashing, which learns compact binary codes and performs fast Hamming-distance ranking with counting sort. However, very long codes (e.g., 2048 bits) are needed for high accuracy, which compromises search speed. In this work, we introduce a new solution for fast ReID by formulating a novel Coarse-to-Fine (CtF) hashing code search strategy, which uses short and long codes complementarily, achieving both faster speed and better accuracy. It uses shorter codes to coarsely rank broad matching similarities and longer codes to refine only a few top candidates for more accurate instance ReID. Specifically, we design an All-in-One (AiO) module together with a Distance Threshold Optimization (DTO) algorithm. In AiO, we simultaneously learn and enhance multiple codes of different lengths in a single model: it learns multiple codes in a pyramid structure and encourages shorter codes to mimic longer codes by self-distillation. DTO solves a complex threshold search problem with a simple optimization process, and the balance between accuracy and speed is easily controlled by a single parameter. It formulates the optimization target as an Fβ score that can be optimised via Gaussian cumulative distribution functions. Besides, we find that even a short code (e.g., 32 bits) still takes a long time to search against a large-scale gallery because of the O(n) time complexity. To solve this problem, we propose a gallery-size-free, latent-attributes-based One-Shot-Filter (OSF) strategy, always O(1) in time complexity, to quickly filter out most easy negative gallery images. Specifically, we design a Latent-Attribute-Learning (LAL) module supervised by a Single-Direction-Metric (SDM) loss. LAL is derived from principal component analysis (PCA), which keeps the largest variance in the shortest feature vector while enabling batch and end-to-end learning. Every logit of a feature vector represents a meaningful attribute. SDM is carefully designed for fine-grained attribute supervision, outperforming common metrics such as the Euclidean and cosine metrics. Experimental results on two datasets show that CtF+OSF is not only 2% more accurate but also 5× faster than contemporary hashing ReID methods. Compared with non-hashing ReID methods, CtF is 50× faster with comparable accuracy. OSF further speeds up CtF by another 2×, up to 10× in total, with almost no accuracy drop.
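The coarse-to-fine code search described above can be sketched in a few lines. This is a minimal illustration with random stand-in codes; the bit lengths, the candidate count k, and the data are assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n_gallery, short_bits, long_bits = 1000, 32, 2048

# Random stand-ins for learned binary hash codes.
gallery_short = rng.integers(0, 2, (n_gallery, short_bits), dtype=np.uint8)
gallery_long = rng.integers(0, 2, (n_gallery, long_bits), dtype=np.uint8)
query_short = rng.integers(0, 2, short_bits, dtype=np.uint8)
query_long = rng.integers(0, 2, long_bits, dtype=np.uint8)

def hamming(codes, q):
    # Hamming distance = number of differing bits.
    return np.count_nonzero(codes != q, axis=1)

def coarse_to_fine(k=50):
    # Coarse stage: rank the whole gallery with the cheap short code.
    coarse = np.argsort(hamming(gallery_short, query_short), kind="stable")[:k]
    # Fine stage: re-rank only the top-k candidates with the long code.
    fine = hamming(gallery_long[coarse], query_long)
    return coarse[np.argsort(fine, kind="stable")]

ranking = coarse_to_fine()
```

Only k long-code distances are computed per query instead of n, which is where the speed-up over a single long-code search comes from.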

2.
IEEE Trans Pattern Anal Mach Intell ; 42(7): 1770-1782, 2020 07.
Article in English | MEDLINE | ID: mdl-30843803

ABSTRACT

Most existing person re-identification (re-id) methods rely on supervised model learning from per-camera-pair, manually labelled pairwise training data. This leads to poor scalability in practical re-id deployments, due to the lack of exhaustive identity labelling of positive and negative image pairs for every camera pair. In this work, we present an unsupervised re-id deep learning approach capable of incrementally discovering and exploiting the underlying re-id discriminative information from automatically generated person tracklet data, end-to-end. We formulate an Unsupervised Tracklet Association Learning (UTAL) framework that jointly learns within-camera tracklet discrimination and cross-camera tracklet association in order to maximise the discovery of tracklet identity matching both within and across camera views. Extensive experiments demonstrate the superiority of the proposed model over state-of-the-art unsupervised learning and domain adaptation person re-id methods on eight benchmarking datasets.

3.
IEEE Trans Pattern Anal Mach Intell ; 40(8): 2009-2022, 2018 08.
Article in English | MEDLINE | ID: mdl-28796607

ABSTRACT

Zero-Shot Learning (ZSL) for visual recognition is typically achieved by exploiting a semantic embedding space. In such a space, both seen and unseen class labels as well as image features can be embedded, so that the similarity between them can be measured directly. In this work, we argue that the key to effective ZSL is to compute an optimal distance metric in the semantic embedding space. Existing ZSL works employ either Euclidean or cosine distances. However, in a high-dimensional space where the projected class labels (prototypes) are sparse, these distances are suboptimal, resulting in a number of problems including hubness and domain shift. To overcome these problems, a novel manifold distance computed on a semantic class prototype graph is proposed, which takes into account the rich intrinsic semantic structure, i.e., the semantic manifold, of the class prototype distribution. To further alleviate the domain shift problem, a new regularisation term is introduced into a ranking-loss-based embedding model. Specifically, the ranking loss objective is regularised by unseen class prototypes to prevent the projected object features from being biased towards the seen prototypes. Extensive experiments on four benchmarks show that our method significantly outperforms the state-of-the-art.
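The idea of replacing a raw Euclidean distance with a distance along the semantic manifold can be sketched as a shortest-path (geodesic) distance on a kNN graph over class prototypes. The prototypes, the neighbourhood size k, and the graph construction below are invented for illustration; the paper's prototype graph is built differently:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import cdist

rng = np.random.default_rng(1)
prototypes = rng.normal(size=(10, 5))  # 10 class prototypes in a 5-d semantic space

euclid = cdist(prototypes, prototypes)

# kNN graph over prototypes: keep only each prototype's k nearest edges.
k = 3
graph = np.zeros_like(euclid)  # zero entries mean "no edge" for csgraph
for i in range(len(prototypes)):
    nn = np.argsort(euclid[i])[1:k + 1]
    graph[i, nn] = euclid[i, nn]

# Manifold distance = shortest path through the prototype graph, so two
# prototypes are close only if the manifold connects them, not merely
# because they happen to be close in the raw embedding space.
manifold = shortest_path(graph, directed=False)
```

Ranking unseen-class prototypes by `manifold` rather than `euclid` is the kind of substitution the abstract argues for.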

4.
Int J Comput Vis ; 126(12): 1288-1310, 2018.
Article in English | MEDLINE | ID: mdl-30930537

ABSTRACT

Most existing person re-identification (re-id) methods are unsuitable for real-world deployment for two reasons: unscalability to large population sizes, and inadaptability over time. In this work, we present a unified solution to address both problems. Specifically, we propose to construct an identity regression space (IRS) based on embedding different training person identities (classes), and formulate re-id as a regression problem solved by identity regression in the IRS. The IRS approach is characterised by a closed-form solution with high learning efficiency and an inherent incremental learning capability with a human in the loop. Extensive experiments on four benchmarking datasets (VIPeR, CUHK01, CUHK03 and Market-1501) show that the IRS model not only outperforms state-of-the-art re-id methods, but is also more scalable to large re-id population sizes, rapidly updating the model and actively selecting informative samples with reduced human labelling effort.
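The closed-form flavour of identity regression can be sketched as ridge regression from image features to one-hot identity indicators. The synthetic data, dimensions, and regulariser below are placeholders; the actual IRS formulation and its incremental update are richer than this sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, c = 200, 64, 20  # training images, feature dimension, identities
X = rng.normal(size=(n, d))
y = rng.integers(0, c, n)
Y = np.eye(c)[y]  # one-hot identity indicator matrix

# Closed-form ridge solution: W = (X^T X + lam * I)^{-1} X^T Y.
lam = 0.1
W = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

# Re-id as regression: project features into the identity space and
# read off scores against the embedded identities.
scores = X @ W
train_acc = (scores.argmax(axis=1) == y).mean()
```

Because the solution is a single linear solve, newly labelled samples only require updating the accumulated X^T X and X^T Y terms, which is what makes cheap human-in-the-loop incremental learning plausible.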

5.
IEEE Trans Pattern Anal Mach Intell ; 38(3): 563-77, 2016 Mar.
Article in English | MEDLINE | ID: mdl-27046498

ABSTRACT

The problem of estimating subjective visual properties from images and video has attracted increasing interest. A subjective visual property is useful either on its own (e.g., image and video interestingness) or as an intermediate representation for visual recognition (e.g., a relative attribute). Due to its ambiguous nature, annotating the value of a subjective visual property for learning a prediction model is challenging. To make the annotation more reliable, recent studies employ crowdsourcing tools to collect pairwise comparison labels. However, using crowdsourced data also introduces outliers. Existing methods rely on majority voting to prune the annotation outliers/errors, and thus require a large number of pairwise labels to be collected. More importantly, as a local outlier detection method, majority voting is ineffective at identifying outliers that cause global ranking inconsistencies. In this paper, we propose a more principled way to identify annotation outliers by formulating the subjective visual property prediction task as a unified robust learning-to-rank problem, tackling outlier detection and learning to rank jointly. This differs from existing methods in that (1) the proposed method integrates local pairwise comparison labels together to minimise a cost that corresponds to global inconsistency of ranking order, and (2) the outlier detection and learning-to-rank problems are solved jointly. This not only leads to better detection of annotation outliers but also enables learning with extremely sparse annotations.
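The gap between local majority voting and a global consistency check can be illustrated with simple win-loss (Borda-style) scores, a crude stand-in for the paper's joint robust learning-to-rank objective. The comparison data below is invented:

```python
import numpy as np

# Pairwise labels (i, j) meaning "annotator judged item i above item j".
comparisons = [(0, 1), (1, 2), (0, 2), (2, 0), (1, 3), (2, 3)]
# (2, 0) contradicts the global order implied by the other labels, but no
# single pair is labelled twice, so local majority voting cannot flag it.

n = 4
net = np.zeros(n)  # wins minus losses as a crude global score
for i, j in comparisons:
    net[i] += 1
    net[j] -= 1

# A label is globally inconsistent if it runs against the global scores.
outliers = [(i, j) for i, j in comparisons if net[i] < net[j]]
```

Here the cyclic label (2, 0) is the only one flagged, even though every pair was annotated exactly once, which is the sparse-annotation regime the abstract targets.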

6.
IEEE Trans Pattern Anal Mach Intell ; 38(3): 591-606, 2016 Mar.
Article in English | MEDLINE | ID: mdl-27046499

ABSTRACT

Solving the problem of matching people across non-overlapping multi-camera views, known as person re-identification (re-id), has received increasing interest in computer vision. In a real-world application scenario, a watch-list (gallery set) of a handful of known target people is provided with very few images (in many cases only a single shot) per target. Existing re-id methods are largely unsuitable for this open-world re-id challenge because they are designed for (1) a closed-world scenario where the gallery and probe sets are assumed to contain exactly the same people, (2) person-wise identification whereby the model attempts to verify exhaustively against each individual in the gallery set, and (3) learning a matching model using multiple shots. In this paper, a novel transfer local relative distance comparison (t-LRDC) model is formulated to address the open-world person re-identification problem by one-shot group-based verification. The model is designed to mine and transfer useful information from a labelled open-world non-target dataset. Extensive experiments demonstrate that the proposed approach outperforms both non-transfer learning and existing transfer-learning-based re-id methods.


Subject(s)
Biometric Identification/methods , Biometric Identification/standards , Image Processing, Computer-Assisted/methods , Image Processing, Computer-Assisted/standards , Algorithms , Databases, Factual , Humans
7.
IEEE Trans Pattern Anal Mach Intell ; 38(12): 2501-2514, 2016 12.
Article in English | MEDLINE | ID: mdl-26829777

ABSTRACT

Current person re-identification (ReID) methods typically rely on single-frame imagery features, whilst ignoring the space-time information in image sequences that is often available in practical surveillance scenarios. Single-frame (single-shot) visual appearance matching is inherently limited for person ReID in public spaces due to the visual ambiguity and uncertainty arising from non-overlapping camera views, where changes in viewing conditions can cause significant variations in people's appearance. In this work, we present a novel model that automatically selects the most discriminative video fragments from noisy/incomplete image sequences of people, from which reliable space-time and appearance features can be computed, whilst simultaneously learning a video ranking function for person ReID. Using the PRID 2011, iLIDS-VID, and HDA+ image sequence datasets, we conducted extensive comparative evaluations to demonstrate the advantages of the proposed model over contemporary gait recognition, holistic image sequence matching, and state-of-the-art single-/multi-shot ReID methods.


Subject(s)
Algorithms , Biometric Identification/methods , Discriminant Analysis , Image Interpretation, Computer-Assisted/methods , Pattern Recognition, Automated/methods , Photography/methods , Video Recording/methods , Humans
8.
IEEE Trans Neural Netw Learn Syst ; 27(6): 1345-57, 2016 06.
Article in English | MEDLINE | ID: mdl-25622327

ABSTRACT

While clustering is usually an unsupervised operation, there are circumstances where we have access to prior beliefs that pairs of samples should (or should not) be assigned to the same cluster. Constrained clustering aims to exploit this prior belief as constraints (or weak supervision) to influence cluster formation so as to obtain a data structure more closely resembling human perception. Two important issues remain open: 1) how to exploit sparse constraints effectively, and 2) how to handle ill-conditioned/noisy constraints generated by imperfect oracles. In this paper, we present a novel pairwise similarity measure framework to address these issues. Specifically, in contrast to existing constrained clustering approaches that blindly rely on all features for constraint propagation, our approach searches for neighbourhoods driven by discriminative feature selection for more effective constraint diffusion. Crucially, we formulate a novel approach to handling the noisy constraint problem, which has been largely ignored in the constrained clustering literature. Extensive comparative results show that our method is superior to state-of-the-art constrained clustering approaches and can generally benefit existing pairwise similarity-based data clustering algorithms, such as spectral clustering and affinity propagation.
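How pairwise constraints can be injected into a similarity-based clusterer is illustrated below on toy data. This is the naive "blind propagation" baseline the abstract contrasts itself with, not the proposed method; the data, constraints, and two-cluster Fiedler split are all invented for the sketch:

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)),   # blob A
               rng.normal(3.0, 0.5, (20, 2))])  # blob B

A = np.exp(-cdist(X, X) ** 2)  # Gaussian affinity

# Weak supervision from an oracle: a must-link raises affinity to 1,
# a cannot-link cuts it to 0 (no feature selection, no noise handling).
for i, j in [(0, 1), (20, 21)]:   # must-link pairs
    A[i, j] = A[j, i] = 1.0
for i, j in [(0, 20)]:            # cannot-link pair
    A[i, j] = A[j, i] = 0.0

# Spectral clustering on the constrained affinity: split on the sign of
# the Fiedler vector of the normalised Laplacian.
d = A.sum(axis=1)
L = np.eye(len(X)) - A / np.sqrt(np.outer(d, d))
_, vecs = np.linalg.eigh(L)
labels = (vecs[:, 1] > 0).astype(int)
```

The paper's contribution is what replaces the two constraint loops above: propagating constraints through discriminatively selected feature neighbourhoods and down-weighting constraints that a noisy oracle got wrong.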

9.
IEEE Trans Pattern Anal Mach Intell ; 37(11): 2332-45, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26440271

ABSTRACT

Most existing zero-shot learning approaches exploit transfer learning via an intermediate semantic representation shared between an annotated auxiliary dataset and a target dataset with different classes and no annotation. A projection from a low-level feature space to the semantic representation space is learned from the auxiliary dataset and applied without adaptation to the target dataset. In this paper we identify two inherent limitations with these approaches. First, due to having disjoint and potentially unrelated classes, the projection functions learned from the auxiliary dataset/domain are biased when applied directly to the target dataset/domain. We call this problem the projection domain shift problem and propose a novel framework, transductive multi-view embedding, to solve it. The second limitation is the prototype sparsity problem which refers to the fact that for each target class, only a single prototype is available for zero-shot learning given a semantic representation. To overcome this problem, a novel heterogeneous multi-view hypergraph label propagation method is formulated for zero-shot learning in the transductive embedding space. It effectively exploits the complementary information offered by different semantic representations and takes advantage of the manifold structures of multiple representation spaces in a coherent manner. We demonstrate through extensive experiments that the proposed approach (1) rectifies the projection shift between the auxiliary and target domains, (2) exploits the complementarity of multiple semantic representations, (3) significantly outperforms existing methods for both zero-shot and N-shot recognition on three image and video benchmark datasets, and (4) enables novel cross-view annotation tasks.

10.
IEEE Trans Pattern Anal Mach Intell ; 36(2): 303-16, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24356351

ABSTRACT

The rapid development of social media sharing has created a huge demand for automatic media classification and annotation techniques. Attribute learning has emerged as a promising paradigm for bridging the semantic gap and addressing data sparsity via transferring attribute knowledge in object recognition and relatively simple action classification. In this paper, we address the task of attribute learning for understanding multimedia data with sparse and incomplete labels. In particular, we focus on videos of social group activities, which are particularly challenging and topical examples of this task because of their multimodal content and complex and unstructured nature relative to the density of annotations. To solve this problem, we 1) introduce a concept of semilatent attribute space, expressing user-defined and latent attributes in a unified framework, and 2) propose a novel scalable probabilistic topic model for learning multimodal semilatent attributes, which dramatically reduces requirements for an exhaustive accurate attribute ontology and expensive annotation effort. We show that our framework is able to exploit latent attributes to outperform contemporary approaches for addressing a variety of realistic multimedia sparse data learning tasks including: multitask learning, learning with label noise, N-shot transfer learning, and importantly zero-shot learning.


Subject(s)
Artificial Intelligence , Documentation/methods , Image Interpretation, Computer-Assisted/methods , Information Storage and Retrieval/methods , Pattern Recognition, Automated/methods , Photography/methods , Video Recording/methods , Algorithms , Reproducibility of Results , Sensitivity and Specificity
11.
IEEE Trans Pattern Anal Mach Intell ; 35(3): 653-68, 2013 Mar.
Article in English | MEDLINE | ID: mdl-22732661

ABSTRACT

Matching people across nonoverlapping camera views at different locations and different times, known as person reidentification, is both a hard and important problem for associating behavior of people observed in a large distributed space over a prolonged period of time. Person reidentification is fundamentally challenging because of the large visual appearance changes caused by variations in view angle, lighting, background clutter, and occlusion. To address these challenges, most previous approaches aim to model and extract distinctive and reliable visual features. However, seeking an optimal and robust similarity measure that quantifies a wide range of features against realistic viewing conditions from a distance is still an open and unsolved problem for person reidentification. In this paper, we formulate person reidentification as a relative distance comparison (RDC) learning problem in order to learn the optimal similarity measure between a pair of person images. This approach avoids treating all features indiscriminately and does not assume the existence of some universally distinctive and reliable features. To that end, a novel relative distance comparison model is introduced. The model is formulated to maximize the likelihood of a pair of true matches having a relatively smaller distance than that of a wrong match pair in a soft discriminant manner. Moreover, in order to maintain the tractability of the model in large scale learning, we further develop an ensemble RDC model. Extensive experiments on three publicly available benchmarking datasets are carried out to demonstrate the clear superiority of the proposed RDC models over related popular person reidentification techniques. The results also show that the new RDC models are more robust against visual appearance changes and less susceptible to model overfitting compared to other related existing models.
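The soft relative comparison at the core of RDC can be written as a logistic loss over distance differences. This sketches the objective only; the paper's iterative optimisation of the distance metric and the ensemble construction are omitted, and the example distances are invented:

```python
import numpy as np

def rdc_loss(d_pos, d_neg):
    # Likelihood that a true-match pair is closer than a wrong-match pair,
    # modelled softly: -log sigmoid(d_neg - d_pos)
    #               = log(1 + exp(d_pos - d_neg)).
    return np.log1p(np.exp(np.asarray(d_pos) - np.asarray(d_neg))).mean()

# A well-ordered pair set (true match closer than wrong match) yields a
# much lower loss than the reversed ordering.
good = rdc_loss([0.1], [2.0])
bad = rdc_loss([2.0], [0.1])
```

Minimising this loss over a parameterised distance (e.g., a Mahalanobis-style metric) pushes true-match distances below wrong-match distances in the soft discriminant manner the abstract describes.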


Subject(s)
Biometric Identification/methods , Image Processing, Computer-Assisted/methods , Algorithms , Humans , Machine Learning , Video Recording
12.
IEEE Trans Pattern Anal Mach Intell ; 34(4): 762-77, 2012 Apr.
Article in English | MEDLINE | ID: mdl-21844619

ABSTRACT

Context is critical for reducing the uncertainty in object detection. However, context modeling is challenging because there are often many different types of contextual information coexisting with different degrees of relevance to the detection of target object(s) in different images. It is therefore crucial to devise a context model to automatically quantify and select the most effective contextual information for assisting in detecting the target object. Nevertheless, the diversity of contextual information means that learning a robust context model requires a larger training set than learning the target object appearance model, which may not be available in practice. In this work, a novel context modeling framework is proposed without the need for any prior scene segmentation or context annotation. We formulate a polar geometric context descriptor for representing multiple types of contextual information. In order to quantify context, we propose a new maximum margin context (MMC) model to evaluate and measure the usefulness of contextual information directly and explicitly through a discriminant context inference method. Furthermore, to address the problem of context learning with limited data, we exploit the idea of transfer learning based on the observation that although two categories of objects can have very different visual appearance, there can be similarity in their context and/or the way contextual information helps to distinguish target objects from nontarget objects. To that end, two novel context transfer learning models are proposed which utilize training samples from source object classes to improve the learning of the context model for a target object class based on a joint maximum margin learning framework. Experiments are carried out on PASCAL VOC2005 and VOC2007 data sets, a luggage detection data set extracted from the i-LIDS data set, and a vehicle detection data set extracted from outdoor surveillance footage. 
Our results validate the effectiveness of the proposed models for quantifying and transferring contextual information, and demonstrate that they outperform related alternative context models.


Subject(s)
Algorithms , Pattern Recognition, Automated/methods , Visual Perception/physiology , Humans , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Pattern Recognition, Visual/physiology
13.
IEEE Trans Pattern Anal Mach Intell ; 34(9): 1799-813, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22184260

ABSTRACT

Activity modeling and unusual event detection in a network of cameras is challenging, particularly when the camera views are not overlapped. We show that it is possible to detect unusual events in multiple disjoint cameras as context-incoherent patterns through incremental learning of time delayed dependencies between distributed local activities observed within and across camera views. Specifically, we model multicamera activities using a Time Delayed Probabilistic Graphical Model (TD-PGM) with different nodes representing activities in different decomposed regions from different views and the directed links between nodes encoding their time delayed dependencies. To deal with visual context changes, we formulate a novel incremental learning method for modeling time delayed dependencies that change over time. We validate the effectiveness of the proposed approach using a synthetic data set and videos captured from a camera network installed at a busy underground station.

14.
IEEE Trans Pattern Anal Mach Intell ; 33(12): 2451-64, 2011 Dec.
Article in English | MEDLINE | ID: mdl-21519099

ABSTRACT

One of the most interesting and desired capabilities for automated video behavior analysis is the identification of rarely occurring and subtle behaviors. This is of practical value because dangerous or illegal activities often have few, or possibly only one, prior example to learn from, and are often subtle. Rare and subtle behavior learning is challenging for two reasons: (1) contemporary modeling approaches require more data and supervision than may be available, and (2) the most interesting and potentially critical rare behaviors are often visually subtle, occurring among more obvious typical behaviors or defined by only small spatio-temporal deviations from typical behaviors. In this paper, we introduce a novel weakly supervised joint topic model which addresses these issues. Specifically, we introduce a multiclass topic model with partially shared latent structure and associated learning and inference algorithms. These contributions permit modeling of behaviors from as few as one example, even without localization by the user and when occurring in clutter, and subsequent classification and localization of such behaviors online and in real time. We extensively validate our approach on two standard public-space datasets, where it clearly outperforms a batch of contemporary alternatives.

15.
Dalton Trans ; (39): 8237-47, 2009 Oct 21.
Article in English | MEDLINE | ID: mdl-19789776

ABSTRACT

Zirconium and hafnium complexes bearing new 1,2-ethanediyl- or 1,3-propanediyl-linked bis(beta-diketiminate) ligands, [{C(n)H(2n)-(BDI(Ar))(2)}MCl(2)] (Ar = 2,6-Me(2)-C(6)H(3), 2,6-Cl(2)-C(6)H(3), 2,6-(i)Pr(2)-C(6)H(3); M = Zr, n = 2 (4a-c), n = 3 (5a-c); M = Hf, n = 2 (6b)), were synthesized via the reaction of MCl(4).2THF with one equivalent of the dilithium salt of the corresponding ligand. Distorted trigonal prismatic and octahedral coordination geometries, as well as C(1)-symmetric structures, are found for the zirconium complexes in the solid state. Variable-temperature (1)H NMR spectra indicated the fluxional nature of these complexes in solution. Upon activation with methylaluminoxane (MAO), all of these complexes except the hafnium complex displayed moderate catalytic activities for ethylene polymerization. The 1,2-ethanediyl-linked complexes are generally more active than their 1,3-propanediyl-linked analogues. The substituents at the ortho-positions of the phenyl rings have different effects on the catalytic activities of the 1,2-ethanediyl-linked and 1,3-propanediyl-linked series. It is noteworthy that even at a low Al/Zr molar ratio of 500, the catalytic activities of these zirconium complexes could be retained. Polyethylenes with broad molecular weight distributions (MWD = 15.3-20.3) were produced, which might result from the fluxional character of the zirconium complexes. The linear structure of the obtained polyethylenes was further confirmed by (13)C NMR spectroscopy and DSC analysis.

16.
Dalton Trans ; (25): 3345-57, 2008 Jul 07.
Article in English | MEDLINE | ID: mdl-18560667

ABSTRACT

A series of aluminium alkyl complexes (BDI)AlEt(2) (3a-m) bearing symmetrical or unsymmetrical beta-diketiminate (BDI) ligand frameworks were obtained from the reaction of triethyl aluminium and the corresponding beta-diketimine. The monomeric structure of the aluminium complex 3k was confirmed by an X-ray diffraction study, which shows that the aluminium center is coordinated by both nitrogen donors of the chelating diketiminate ligand and the two ethyl groups in a distorted tetrahedral geometry. Attempts to synthesize beta-diketiminate aluminium alkoxide complexes by the reactions of the monochloride complex "(BDI-2a)AlMeCl" (4) with alkali salts of 2-propanol unexpectedly gave an aluminoxane, [(BDI-2a)AlMe](2)(micro-O) (7), as characterized by X-ray diffraction methods. Complexes 3a-m and [(2,6-(i)Pr(2)C(6)H(3)NCMe)(2)HC]AlEt(2) (8) were found to catalyze the ring-opening polymerization (ROP) of epsilon-caprolactone with moderate activities. The steric and electronic characteristics of the ancillary ligands have a significant influence on the polymerization performance of the corresponding aluminium complexes. The introduction of electron-donating substituents at the para-positions of the aryl rings in the ligand resulted in an apparent decrease in catalytic activity. Complex 3h showed the highest activity among the investigated aluminium complexes, owing to the high electrophilicity of the metal center induced by the meta-trifluoromethyl substituents on the aryl rings. Increasing the steric hindrance of the ligand by introducing ortho-substituents onto the phenyl moieties also decreased the catalytic activity. Although the viscosity-average molecular weights (M(eta)) of the obtained poly(caprolactone)s increased with monomer conversion, the ROPs of epsilon-caprolactone initiated by complexes 3a-m and 8 were not well controlled, as judged from the broad molecular weight distributions (PDI = M(w)/M(n) = 1.66-3.74) of the obtained polymers and the nonlinear relationship between molecular weight and monomer conversion.


Subject(s)
Aluminum/chemistry , Organometallic Compounds/chemical synthesis , Polyesters/chemistry , 2-Propanol/chemistry , Caproates/chemistry , Catalysis , Cyclization , Hydrocarbons, Chlorinated/chemistry , Hydrocarbons, Cyclic/chemistry , Hydrocarbons, Fluorinated/chemistry , Lactones/chemistry , Ligands , Models, Chemical , Molecular Weight , Viscosity , X-Ray Diffraction
17.
IEEE Trans Image Process ; 17(6): 873-86, 2008 Jun.
Article in English | MEDLINE | ID: mdl-18482883

ABSTRACT

Existing learning-based face super-resolution (hallucination) techniques generate high-resolution images of a single facial modality (i.e., at a fixed expression, pose and illumination) given one or a set of low-resolution face images as a probe. Here, we present a generalized approach based on a hierarchical tensor (multilinear) space representation for hallucinating high-resolution face images across multiple modalities, achieving generalization to variations in expression and pose. In particular, we formulate a unified tensor which can be reduced to two parts: a global image-based tensor for modeling the mappings among different facial modalities, and a local patch-based multiresolution tensor for incorporating high-resolution image details. For realistic hallucination of unregistered low-resolution faces contained in raw images, we develop an automatic face alignment algorithm capable of pixel-wise alignment by iteratively warping the probe face towards its projection in the space of training face images. Our experiments show not only performance superiority over existing benchmark face super-resolution techniques on single-modal face hallucination, but also the novelty of our approach in coping with multimodal hallucination and its robustness in automatic alignment under practical imaging conditions.


Subject(s)
Algorithms , Artificial Intelligence , Biometry/methods , Face/anatomy & histology , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Pattern Recognition, Automated/methods , Subtraction Technique , Humans , Reproducibility of Results , Sensitivity and Specificity
18.
IEEE Trans Pattern Anal Mach Intell ; 30(5): 893-908, 2008 May.
Article in English | MEDLINE | ID: mdl-18369257

ABSTRACT

This paper aims to address the problem of modelling video behaviour captured in surveillance videos for the applications of online normal behaviour recognition and anomaly detection. A novel framework is developed for automatic behaviour profiling and online anomaly sampling/detection without any manual labelling of the training dataset. The framework consists of the following key components: (1) A compact and effective behaviour representation method is developed based on discrete scene event detection. The similarity between behaviour patterns is measured by modelling each pattern using a Dynamic Bayesian Network (DBN). (2) Natural grouping of behaviour patterns is discovered through a novel spectral clustering algorithm with unsupervised model selection and feature selection on the eigenvectors of a normalised affinity matrix. (3) A composite generative behaviour model is constructed which is capable of generalising from a small training set to accommodate variations in unseen normal behaviour patterns. (4) A run-time accumulative anomaly measure is introduced to detect abnormal behaviour, while normal behaviour patterns are recognised when sufficient visual evidence has become available, based on an online Likelihood Ratio Test (LRT) method. This ensures robust and reliable anomaly detection and normal behaviour recognition in the shortest possible time. The effectiveness and robustness of our approach are demonstrated through experiments using noisy and sparse datasets collected from both indoor and outdoor surveillance scenarios. In particular, it is shown that a behaviour model trained using an unlabelled dataset is superior to those trained using the same but labelled dataset in detecting anomalies in unseen video. The experiments also suggest that our online LRT-based behaviour recognition approach is advantageous over the commonly used Maximum Likelihood (ML) method in differentiating ambiguities among different behaviour classes observed online.
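The run-time accumulative measure in component (4) can be sketched as a CUSUM-style sequential likelihood ratio test. The per-frame log-likelihoods and the threshold below are invented for illustration; in the paper they come from the learnt behaviour models:

```python
def online_lrt(loglik_normal, loglik_abnormal, threshold=2.0):
    # Accumulate the per-frame log-likelihood ratio and declare an anomaly
    # as soon as the accumulated evidence crosses the threshold, instead
    # of deciding from any single frame.
    s = 0.0
    for t, (ln, la) in enumerate(zip(loglik_normal, loglik_abnormal)):
        s = max(s + (la - ln), 0.0)  # CUSUM reset: discard negative evidence
        if s > threshold:
            return t  # frame index at which the anomaly is declared
    return None  # sequence accepted as normal
```

A sequence the normal model always explains better never triggers, while a sustained run of frames better explained by the abnormal model triggers as soon as enough evidence has accumulated, which is the "shortest possible time" property the abstract emphasises.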


Subject(s)
Algorithms , Artificial Intelligence , Image Interpretation, Computer-Assisted/methods , Information Storage and Retrieval/methods , Pattern Recognition, Automated/methods , Video Recording/methods , Image Enhancement/methods