Results 1 - 20 of 52
1.
J Affect Disord ; 350: 728-740, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38246281

ABSTRACT

BACKGROUND: The binary major depressive disorder (MDD) diagnosis is inadequate and should never be used in research.

AIMS: The study's objective is to explicate our novel precision nomothetic strategy for constructing depression models based on adverse childhood experiences (ACEs), lifetime and current phenome, and biomarker (atherogenicity indices) scores.

METHODS: This study assessed recurrence of illness (ROI: namely, recurrence of depressive episodes and suicidal behaviors, SBs), lifetime and current SBs, the phenome of depression, neuroticism, dysthymia, anxiety disorders, and lipid biomarkers including apolipoprotein (Apo)A, ApoB, free cholesterol, cholesteryl esters, triglycerides, and high-density lipoprotein cholesterol in 67 normal controls and 66 MDD patients. We computed atherogenic and reverse cholesterol transport indices.

RESULTS: We extracted one factor from (a) the lifetime phenome of depression, comprising ROI and traits such as neuroticism, dysthymia, and anxiety disorders, and (b) the phenome of the acute phase (based on depression, anxiety, and quality-of-life scores). Partial least squares (PLS) analysis showed that 55.7% of the variance in the lifetime + current phenome factor was explained by increased atherogenicity, neglect, and sexual abuse, while atherogenicity partially mediated the effects of neglect. Cluster analysis generated a cluster of patients with major dysmood disorder, which was externally validated by increased atherogenicity and characterized by increased scores on all clinical features.

CONCLUSIONS: The outcome of depression should not be represented as a binary variable (MDD or not), but rather as multiple dimensional scores based on biomarkers, ROI, subclinical depression traits, and lifetime and current phenome scores including SBs.
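The abstract names atherogenicity and reverse cholesterol transport indices without giving their formulas. The sketch below computes two ratios that are commonly used as atherogenicity markers in the lipid literature (Castelli Risk Index I and the ApoB/ApoA ratio); these specific formulas are illustrative assumptions, not necessarily the indices used in the study.

```python
# Hedged sketch: two commonly used atherogenicity ratios. The exact indices
# computed in the study are not stated in the abstract; these two (Castelli
# Risk Index I and the ApoB/ApoA ratio) are illustrative assumptions.

def castelli_risk_index_1(total_cholesterol, hdl_cholesterol):
    """Total cholesterol / HDL cholesterol (both in the same units)."""
    return total_cholesterol / hdl_cholesterol

def apob_apoa_ratio(apob, apoa):
    """ApoB / ApoA ratio, a widely used atherogenicity marker."""
    return apob / apoa

print(castelli_risk_index_1(200.0, 50.0))  # 4.0
```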


Subject(s)
Depressive Disorder, Major , Humans , Depressive Disorder, Major/diagnosis , Suicidal Ideation , Depression , Quality of Life , Biomarkers , Cholesterol
3.
IEEE Trans Pattern Anal Mach Intell ; 44(10): 7078-7092, 2022 10.
Article in English | MEDLINE | ID: mdl-34255625

ABSTRACT

Very deep Convolutional Neural Networks (CNNs) have greatly improved performance on various image restoration tasks. However, this comes at the price of an increasing computational burden, which limits their practical usage. We observe that some corrupted image regions are inherently easier to restore than others, since the distortion and content vary within an image. To leverage this, we propose Path-Restore, a multi-path CNN with a pathfinder that can dynamically select an appropriate route for each image region. We train the pathfinder using reinforcement learning with a difficulty-regulated reward, which is related to the performance, the complexity, and the difficulty of restoring a region. A policy mask is further investigated to jointly process all the image regions. We conduct experiments on denoising and mixed restoration tasks. The results show that our method achieves comparable or superior performance to existing approaches at less computational cost. In particular, Path-Restore is effective for real-world denoising, where the noise distribution varies across different regions of a single image. Compared to the state-of-the-art RIDNet [1], our method achieves comparable performance and runs 2.7x faster on the realistic Darmstadt Noise Dataset [2]. Models and code are available on the project page: https://www.mmlab-ntu.com/project/pathrestore/.
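The abstract couples performance, complexity, and region difficulty in the pathfinder's reward but does not give its exact form. A hypothetical reward of this shape might scale the restoration gain by region difficulty and subtract a cost penalty; the function name, weighting, and inputs below are all assumptions for illustration.

```python
def difficulty_regulated_reward(psnr_gain, path_cost, difficulty, cost_weight=0.1):
    """Hypothetical difficulty-regulated reward: the restoration gain is scaled
    by the region's difficulty, and the computational cost of the chosen path
    is penalized. The true reward in Path-Restore may differ; this sketch only
    illustrates how difficulty can regulate the performance/cost trade-off."""
    return difficulty * psnr_gain - cost_weight * path_cost

# An easy region (low difficulty) does not justify an expensive path,
# while a hard region earns a positive reward for the same path.
easy = difficulty_regulated_reward(psnr_gain=0.5, path_cost=4.0, difficulty=0.2)
hard = difficulty_regulated_reward(psnr_gain=0.5, path_cost=4.0, difficulty=1.0)
print(easy < 0 < hard)
```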


Subject(s)
Algorithms , Image Processing, Computer-Assisted , Image Processing, Computer-Assisted/methods , Neural Networks, Computer , Signal-To-Noise Ratio
4.
IEEE Trans Pattern Anal Mach Intell ; 44(4): 2004-2018, 2022 04.
Article in English | MEDLINE | ID: mdl-33108282

ABSTRACT

Although generative adversarial networks (GANs) have made significant progress in face synthesis, there is little understanding of what GANs learn in the latent representation that maps a random code to a photo-realistic image. In this work, we propose a framework called InterFaceGAN to interpret the disentangled face representation learned by state-of-the-art GAN models and study the properties of the facial semantics encoded in the latent space. We first find that GANs learn various semantics in linear subspaces of the latent space. After identifying these subspaces, we can realistically manipulate the corresponding facial attributes without retraining the model. We then conduct a detailed study of the correlation between different semantics and manage to better disentangle them via subspace projection, resulting in more precise control of attribute manipulation. Besides manipulating gender, age, expression, and the presence of eyeglasses, we can even alter the face pose and fix artifacts accidentally made by GANs. Furthermore, we perform an in-depth face identity analysis and a layer-wise analysis to evaluate the editing results quantitatively. Finally, we apply our approach to real face editing by employing GAN inversion approaches and by explicitly training feed-forward models on synthetic data established by InterFaceGAN. Extensive experimental results suggest that learning to synthesize faces spontaneously brings a disentangled and controllable face representation.
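The two operations the abstract describes, moving a latent code along a semantic boundary normal and disentangling two semantics via subspace projection, are simple vector arithmetic. A minimal sketch, assuming each semantic is represented by a unit-normal direction in latent space (the function names are ours, not from the paper):

```python
def edit_latent(z, n, alpha):
    """Linear latent editing as described for InterFaceGAN: move the code
    along the boundary normal of a semantic, z' = z + alpha * n."""
    return [zi + alpha * ni for zi, ni in zip(z, n)]

def project_out(n1, n2):
    """Subspace projection for disentanglement: remove from direction n1 its
    component along n2, yielding a direction that changes attribute 1 while
    (to first order) leaving attribute 2 unchanged."""
    dot = sum(a * b for a, b in zip(n1, n2))
    norm2 = sum(b * b for b in n2)
    return [a - (dot / norm2) * b for a, b in zip(n1, n2)]

# The projected direction is orthogonal to n2 by construction.
d = project_out([1.0, 1.0], [0.0, 1.0])
print(d)  # [1.0, 0.0]
```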


Subject(s)
Algorithms , Image Processing, Computer-Assisted , Head , Image Processing, Computer-Assisted/methods , Semantics
5.
IEEE Trans Pattern Anal Mach Intell ; 43(8): 2555-2569, 2021 Aug.
Article in English | MEDLINE | ID: mdl-32142417

ABSTRACT

For over four decades, the majority of work has addressed the problem of optical flow estimation using variational methods. With the advance of machine learning, some recent works have attempted to address the problem using convolutional neural networks (CNNs) and have shown promising results. FlowNet2 [1], the state-of-the-art CNN, requires over 160M parameters to achieve accurate flow estimation. Our LiteFlowNet2 outperforms FlowNet2 on the Sintel and KITTI benchmarks, while being 25.3 times smaller in model size and 3.1 times faster in running speed. LiteFlowNet2 is built on the foundation laid by conventional methods, with components that play roles corresponding to data fidelity and regularization in variational methods. We compute optical flow in a spatial-pyramid formulation, as in SPyNet [2], but through a novel lightweight cascaded flow inference, which provides high flow estimation accuracy through early correction with seamless incorporation of descriptor matching. Flow regularization is used to ameliorate the issue of outliers and vague flow boundaries through feature-driven local convolutions. Our network also has an effective structure for pyramidal feature extraction and embraces feature warping rather than image warping as practiced in FlowNet2 and SPyNet. Compared to LiteFlowNet [3], LiteFlowNet2 improves optical flow accuracy on Sintel Clean by 23.3 percent, Sintel Final by 12.8 percent, KITTI 2012 by 19.6 percent, and KITTI 2015 by 18.8 percent, while being 2.2 times faster. Our network protocol and trained models are made publicly available at https://github.com/twhui/LiteFlowNet2.

6.
IEEE Trans Pattern Anal Mach Intell ; 42(11): 2781-2794, 2020 11.
Article in English | MEDLINE | ID: mdl-31071017

ABSTRACT

Data for face analysis often exhibit highly-skewed class distribution, i.e., most data belong to a few majority classes, while the minority classes only contain a scarce amount of instances. To mitigate this issue, contemporary deep learning methods typically follow classic strategies such as class re-sampling or cost-sensitive training. In this paper, we conduct extensive and systematic experiments to validate the effectiveness of these classic schemes for representation learning on class-imbalanced data. We further demonstrate that more discriminative deep representation can be learned by enforcing a deep network to maintain inter-cluster margins both within and between classes. This tight constraint effectively reduces the class imbalance inherent in the local data neighborhood, thus carving much more balanced class boundaries locally. We show that it is easy to deploy angular margins between the cluster distributions on a hypersphere manifold. Such learned Cluster-based Large Margin Local Embedding (CLMLE), when combined with a simple k-nearest cluster algorithm, shows significant improvements in accuracy over existing methods on both face recognition and face attribute prediction tasks that exhibit imbalanced class distribution.
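The classic class re-sampling strategy the abstract refers to can be shown in a few lines: replicate minority-class examples with replacement until every class matches the size of the largest one. This is a sketch of the generic baseline, not the paper's CLMLE method.

```python
import random

def oversample_minority(samples, labels, seed=0):
    """Classic class re-sampling baseline for imbalanced data: oversample each
    minority class (with replacement) until all classes reach the size of the
    largest class. This is the baseline strategy the paper evaluates, not the
    proposed cluster-based large-margin embedding itself."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Keep all originals, then pad with random duplicates up to `target`.
        padded = xs + [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(padded)
        out_y.extend([y] * target)
    return out_x, out_y

xs, ys = oversample_minority([1, 2, 3, 4, 5], ["a", "a", "a", "a", "b"])
print(ys.count("a"), ys.count("b"))  # 4 4
```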

7.
IEEE Trans Pattern Anal Mach Intell ; 41(11): 2740-2755, 2019 Nov.
Article in English | MEDLINE | ID: mdl-30183621

ABSTRACT

We present a general and flexible video-level framework for learning action models in videos. This method, called temporal segment network (TSN), aims to model long-range temporal structure with a new segment-based sampling and aggregation scheme. This unique design enables the TSN framework to efficiently learn action models using whole videos. The learned models can be easily deployed for action recognition in both trimmed and untrimmed videos, with simple average pooling and multi-scale temporal window integration, respectively. We also study a series of good practices for implementing the TSN framework given limited training samples. Our approach obtains state-of-the-art performance on five challenging action recognition benchmarks: HMDB51 (71.0 percent), UCF101 (94.9 percent), THUMOS14 (80.1 percent), ActivityNet v1.2 (89.6 percent), and Kinetics400 (75.7 percent). In addition, using the proposed RGB difference as a simple motion representation, our method can still achieve competitive accuracy on UCF101 (91.0 percent) while running at 340 FPS. Furthermore, based on the proposed TSN framework, we won the video classification track of the ActivityNet challenge 2016 among 24 teams.
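The segment-based sampling and aggregation scheme above can be sketched directly: divide the video into equal-duration segments, draw one snippet per segment, and average the snippet-level class scores into a video-level consensus. This is a simplified illustration of the idea, not the published implementation.

```python
import random

def sample_snippets(num_frames, num_segments, rng=None):
    """TSN-style segment-based sampling: split the video into equal-duration
    segments and draw one snippet (frame index) uniformly from each, so the
    whole video is covered at a fixed cost regardless of its length."""
    rng = rng or random.Random(0)
    seg_len = num_frames / num_segments
    return [int(seg_len * k) + rng.randrange(max(1, int(seg_len)))
            for k in range(num_segments)]

def video_score(snippet_scores):
    """Aggregate snippet-level class scores by average pooling (the segmental
    consensus); each inner list holds one snippet's per-class scores."""
    n = len(snippet_scores)
    num_classes = len(snippet_scores[0])
    return [sum(s[c] for s in snippet_scores) / n for c in range(num_classes)]

idx = sample_snippets(num_frames=30, num_segments=3)
print(idx)  # one index from each of [0,10), [10,20), [20,30)
```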

8.
IEEE Trans Neural Netw Learn Syst ; 29(5): 1503-1513, 2018 05.
Article in English | MEDLINE | ID: mdl-28362590

ABSTRACT

Data imbalance is common in many vision tasks where one or more classes are rare. Without addressing this issue, conventional methods tend to be biased toward the majority class, with poor predictive accuracy for the minority class. These methods deteriorate further on small, imbalanced data with a large degree of class overlap. In this paper, we propose a novel discriminative sparse neighbor approximation (DSNA) method to ameliorate the effect of class imbalance during prediction. Specifically, given a test sample, we first traverse it through a cost-sensitive decision forest to collect a good subset of training examples in its local neighborhood. Then, we generate from this subset several class-discriminating but overlapping clusters and model each as an affine subspace. From these subspaces, the proposed DSNA iteratively seeks an optimal approximation of the test sample and outputs an unbiased prediction. We show that our method not only effectively mitigates the imbalance issue, but also allows the prediction to extrapolate to unseen data. The latter capability is crucial for achieving accurate prediction on small data sets with limited samples. The proposed imbalanced learning method can be applied to both classification and regression tasks at a wide range of imbalance levels. It significantly outperforms state-of-the-art methods that do not possess an imbalance handling mechanism, and is found to perform comparably or even better than recent deep learning methods while using hand-crafted features only.

9.
IEEE Trans Pattern Anal Mach Intell ; 40(8): 1845-1859, 2018 08.
Article in English | MEDLINE | ID: mdl-28809674

ABSTRACT

We propose a deep convolutional neural network (CNN) for face detection that leverages facial attribute-based supervision. We observe a phenomenon whereby part detectors emerge within a CNN trained to classify attributes from uncropped face images, without any explicit part supervision. This observation motivates a new method for finding faces that scores facial part responses by their spatial structure and arrangement. The scoring mechanism is data-driven and carefully formulated to handle challenging cases where faces are only partially visible. This consideration allows our network to detect faces under severe occlusion and unconstrained pose variation. Our method achieves promising performance on popular benchmarks including FDDB, PASCAL Faces, AFW, and WIDER FACE.


Subject(s)
Deep Learning , Face , Neural Networks, Computer , Pattern Recognition, Automated/methods , Artificial Intelligence/statistics & numerical data , Databases, Factual , Deep Learning/statistics & numerical data , Face/anatomy & histology , Humans , Pattern Recognition, Automated/statistics & numerical data , Supervised Machine Learning/statistics & numerical data
10.
IEEE Trans Pattern Anal Mach Intell ; 40(8): 1814-1828, 2018 08.
Article in English | MEDLINE | ID: mdl-28796610

ABSTRACT

Semantic segmentation tasks can be well modeled by a Markov Random Field (MRF). This paper addresses semantic segmentation by incorporating high-order relations and a mixture of label contexts into the MRF. Unlike previous works that optimized MRFs with iterative algorithms, we solve the MRF with a proposed Convolutional Neural Network (CNN), namely the Deep Parsing Network (DPN), which enables deterministic end-to-end computation in a single forward pass. Specifically, DPN extends a contemporary CNN to model unary terms, and additional layers are devised to approximate the mean field (MF) algorithm for pairwise terms. It has several appealing properties. First, unlike recent works that required many iterations of MF during back-propagation, DPN achieves high performance by approximating one iteration of MF. Second, DPN represents various types of pairwise terms, making many existing models its special cases. Furthermore, the pairwise terms in DPN provide a unified framework to encode rich contextual information in high-dimensional data, such as images and videos. Third, DPN makes MF easier to parallelize and speed up, thus enabling efficient inference. DPN is thoroughly evaluated on standard semantic image/video segmentation benchmarks, where a single DPN model yields state-of-the-art segmentation accuracies on the PASCAL VOC 2012, Cityscapes, and CamVid datasets.
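The single mean-field iteration that DPN approximates with convolutional layers can be written out for a toy pairwise MRF: each pixel's label distribution is re-estimated by softmax over its unary scores minus compatibility-weighted messages from its neighbors' current distributions. The variable names and the simple neighbor list are our simplifications, not the paper's formulation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def mean_field_step(unary, neighbors, q, compat, weight=1.0):
    """One mean-field update for a pairwise MRF (the operation DPN approximates
    in a single forward pass): for each pixel i and label l, subtract from the
    unary score the expected pairwise penalty under the neighbors' current
    label distributions q, then renormalize with a softmax."""
    num_labels = len(unary[0])
    new_q = []
    for i in range(len(unary)):
        scores = []
        for l in range(num_labels):
            msg = sum(compat[l][lp] * q[j][lp]
                      for j in neighbors[i] for lp in range(num_labels))
            scores.append(unary[i][l] - weight * msg)
        new_q.append(softmax(scores))
    return new_q

# Two pixels, two labels, Potts-style compatibility (penalize disagreement).
q1 = mean_field_step(unary=[[2.0, 0.0], [2.0, 0.0]],
                     neighbors=[[1], [0]],
                     q=[[0.5, 0.5], [0.5, 0.5]],
                     compat=[[0.0, 1.0], [1.0, 0.0]])
print(q1[0][0] > q1[0][1])  # True: label 0 dominates, as the unaries say
```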

11.
IEEE Trans Pattern Anal Mach Intell ; 39(7): 1320-1334, 2017 07.
Article in English | MEDLINE | ID: mdl-27392342

ABSTRACT

In this paper, we propose deformable deep convolutional neural networks for generic object detection. This new deep learning object detection framework has innovations in multiple aspects. In the proposed new deep architecture, a new deformation-constrained pooling (def-pooling) layer models the deformation of object parts with geometric constraints and penalties. A new pre-training strategy is proposed to learn feature representations that are more suitable for the object detection task and have good generalization capability. By changing the net structures and training strategies, and by adding and removing key components in the detection pipeline, we obtain a set of models with large diversity, which significantly improves the effectiveness of model averaging. The proposed approach improves the mean average precision obtained by RCNN [16], which was the state of the art, from 31% to 50.3% on the ILSVRC2014 detection test set. It also outperforms the winner of ILSVRC2014, GoogLeNet, by 6.1%. Detailed component-wise analysis is also provided through extensive experimental evaluation, offering a global view of the deep learning object detection pipeline.

12.
IEEE Trans Pattern Anal Mach Intell ; 38(5): 918-30, 2016 May.
Article in English | MEDLINE | ID: mdl-27046839

ABSTRACT

In this study, we show that landmark detection, or face alignment, is not a single, independent problem. Instead, its robustness can be greatly improved with auxiliary information. Specifically, we jointly optimize landmark detection together with the recognition of heterogeneous but subtly correlated facial attributes, such as gender, expression, and appearance attributes. This is non-trivial, since different attribute inference tasks have different learning difficulties and convergence rates. To address this problem, we formulate a novel tasks-constrained deep model, which not only learns the inter-task correlation but also employs dynamic task coefficients to facilitate optimization convergence when learning multiple complex tasks. Extensive evaluations show that the proposed task-constrained learning (i) outperforms existing face alignment methods, especially in dealing with faces with severe occlusion and pose variation, and (ii) reduces model complexity drastically compared to state-of-the-art methods based on cascaded deep models.

13.
IEEE Trans Pattern Anal Mach Intell ; 38(2): 295-307, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26761735

ABSTRACT

We propose a deep learning method for single image super-resolution (SR). Our method directly learns an end-to-end mapping between the low/high-resolution images. The mapping is represented as a deep convolutional neural network (CNN) that takes the low-resolution image as the input and outputs the high-resolution one. We further show that traditional sparse-coding-based SR methods can also be viewed as a deep convolutional network. But unlike traditional methods that handle each component separately, our method jointly optimizes all layers. Our deep CNN has a lightweight structure, yet demonstrates state-of-the-art restoration quality, and achieves fast speed for practical on-line usage. We explore different network structures and parameter settings to achieve trade-offs between performance and speed. Moreover, we extend our network to cope with three color channels simultaneously, and show better overall reconstruction quality.

14.
IEEE Trans Pattern Anal Mach Intell ; 38(10): 1997-2009, 2016 10.
Article in English | MEDLINE | ID: mdl-26660699

ABSTRACT

This paper proposes a hybrid convolutional network (ConvNet)-Restricted Boltzmann Machine (RBM) model for face verification. A key contribution of this work is to learn high-level relational visual features with rich identity similarity information. The deep ConvNets in our model start by extracting local relational visual features from two face images in comparison, which are further processed through multiple layers to extract high-level and global relational features. To keep enough discriminative information, we use the last hidden layer neuron activations of the ConvNet as features for face verification instead of those of the output layer. To characterize face similarities from different aspects, we concatenate the features extracted from different face region pairs by different deep ConvNets. The resulting high-dimensional relational features are classified by an RBM for face verification. After pre-training each ConvNet and the RBM separately, the entire hybrid network is jointly optimized to further improve the accuracy. Various aspects of the ConvNet structures, relational features, and face verification classifiers are investigated. Our model achieves the state-of-the-art face verification performance on the challenging LFW dataset under both the unrestricted protocol and the setting when outside data is allowed to be used for training.

15.
IEEE Trans Image Process ; 23(2): 810-22, 2014 Feb.
Article in English | MEDLINE | ID: mdl-26270920

ABSTRACT

Modeling the temporal structure of sub-activities is an important yet challenging problem in complex activity classification. This paper proposes a latent hierarchical model (LHM) to describe the decomposition of complex activity into sub-activities in a hierarchical way. The LHM has a tree-structure, where each node corresponds to a video segment (sub-activity) at certain temporal scale. The starting and ending time points of each sub-activity are represented by two latent variables, which are automatically determined during the inference process. We formulate the training problem of the LHM in a latent kernelized SVM framework and develop an efficient cascade inference method to speed up classification. The advantages of our methods come from: 1) LHM models the complex activity with a deep structure, which is decomposed into sub-activities in a coarse-to-fine manner and 2) the starting and ending time points of each segment are adaptively determined to deal with the temporal displacement and duration variation of sub-activity. We conduct experiments on three datasets: 1) the KTH; 2) the Hollywood2; and 3) the Olympic Sports. The experimental results show the effectiveness of the LHM in complex activity classification. With dense features, our LHM achieves the state-of-the-art performance on the Hollywood2 dataset and the Olympic Sports dataset.

16.
IEEE Trans Pattern Anal Mach Intell ; 36(11): 2199-213, 2014 Nov.
Article in English | MEDLINE | ID: mdl-26353061

ABSTRACT

Designing effective features is a fundamental problem in computer vision. However, it is usually difficult to achieve a good tradeoff between discriminative power and robustness. Previous works have shown that spatial co-occurrence can boost the discriminative power of features. However, existing co-occurrence features take little account of robustness and hence suffer from sensitivity to geometric and photometric variations. In this work, we study the transform invariance (TI) of co-occurrence features. Concretely, we formally introduce a Pairwise Transform Invariance (PTI) principle, propose a novel Pairwise Rotation Invariant Co-occurrence Local Binary Pattern (PRICoLBP) feature, and further extend it to incorporate multi-scale, multi-orientation, and multi-channel information. Unlike other LBP variants, PRICoLBP not only captures spatial context co-occurrence information effectively, but also possesses rotation invariance. We evaluate PRICoLBP comprehensively on nine benchmark data sets from five different perspectives: encoding strategy, rotation invariance, number of templates, speed, and discriminative power compared to other LBP variants. Furthermore, we apply PRICoLBP to six different but related applications (texture, material, flower, leaf, food, and scene classification) and demonstrate that PRICoLBP is efficient, effective, and achieves a well-balanced tradeoff between discriminative power and robustness.
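PRICoLBP builds its pairwise co-occurrence statistics on top of the standard 8-neighbor local binary pattern code. A minimal sketch of that building block only (the pairwise rotation-invariant pairing itself is not shown):

```python
def lbp_code(img, r, c):
    """Standard 8-neighbour local binary pattern code at pixel (r, c): each
    neighbour contributes one bit, set when its value is >= the centre value.
    This is only the LBP building block; PRICoLBP's pairwise co-occurrence
    and rotation-invariant encoding are layered on top of codes like this."""
    center = img[r][c]
    # Neighbours enumerated clockwise starting from the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code

# A bright top row sets the first three bits: 0b00000111 == 7.
print(lbp_code([[9, 9, 9], [0, 5, 0], [0, 0, 0]], 1, 1))  # 7
```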


Subject(s)
Image Processing, Computer-Assisted/methods , Pattern Recognition, Automated/methods , Algorithms , Databases, Factual , Food/classification , Plants/classification
17.
IEEE Trans Pattern Anal Mach Intell ; 36(4): 810-23, 2014 Apr.
Article in English | MEDLINE | ID: mdl-26353202

ABSTRACT

Image re-ranking, as an effective way to improve the results of web-based image search, has been adopted by commercial search engines such as Bing and Google. Given a query keyword, a pool of images is first retrieved based on textual information. By asking the user to select a query image from the pool, the remaining images are re-ranked based on their visual similarities with the query image. A major challenge is that the similarities of visual features do not correlate well with images' semantic meanings, which interpret users' search intentions. Recent works have proposed matching images in a semantic space, using attributes or reference classes closely related to the semantic meanings of images as the basis. However, learning a universal visual semantic space to characterize highly diverse images from the web is difficult and inefficient. In this paper, we propose a novel image re-ranking framework that automatically learns different semantic spaces for different query keywords offline. The visual features of images are projected into their related semantic spaces to obtain semantic signatures. At the online stage, images are re-ranked by comparing their semantic signatures obtained from the semantic space specified by the query keyword. The proposed query-specific semantic signatures significantly improve both the accuracy and efficiency of image re-ranking. The original visual features of thousands of dimensions can be projected to semantic signatures as short as 25 dimensions. Experimental results show a 25-40 percent relative improvement in re-ranking precision compared with state-of-the-art methods.
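The online stage described above reduces to two cheap operations: project each image's visual feature onto the query keyword's reference classifiers to get a short semantic signature, then rank pool images by signature distance. In this sketch the reference classifiers are plain dot products and the distance is L1, both simplifying assumptions.

```python
def semantic_signature(feature, reference_classifiers):
    """Project a high-dimensional visual feature into a short semantic
    signature: one score per reference classifier of the query-specific
    semantic space. Linear (dot-product) classifiers are a simplification."""
    return [sum(w * x for w, x in zip(clf, feature))
            for clf in reference_classifiers]

def rerank(query_sig, pool_sigs):
    """Re-rank pool images by L1 distance between semantic signatures;
    returns pool indices from most to least similar to the query."""
    def dist(sig):
        return sum(abs(a - b) for a, b in zip(query_sig, sig))
    return sorted(range(len(pool_sigs)), key=lambda i: dist(pool_sigs[i]))

clfs = [[1.0, 0.0], [0.0, 1.0]]
pool = [semantic_signature(f, clfs)
        for f in ([5.0, 5.0], [1.0, 0.0], [0.0, 0.0])]
print(rerank(semantic_signature([0.0, 0.0], clfs), pool))  # [2, 1, 0]
```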

18.
IEEE Trans Pattern Anal Mach Intell ; 36(8): 1586-99, 2014 Aug.
Article in English | MEDLINE | ID: mdl-26353340

ABSTRACT

Collective motions of crowds are common in nature and have attracted a great deal of attention in a variety of multidisciplinary fields. Collectiveness, which indicates the degree to which individuals act as a union, is a fundamental and universal measurement for various crowd systems. By quantifying the topological structures of the collective manifolds of a crowd, this paper proposes a descriptor of collectiveness and its efficient computation for the crowd and its constituent individuals. The Collective Merging algorithm is then proposed to detect collective motions from random motions. We validate the effectiveness and robustness of the proposed collectiveness on the system of self-driven particles as well as on other real crowd systems such as pedestrian crowds and bacteria colonies. We compare the collectiveness descriptor with human perception of collective motion and show their high consistency. As a universal descriptor, the proposed crowd collectiveness can be used to compare different crowd systems. It has a wide range of applications, such as detecting collective motions from crowd clutter, monitoring crowd dynamics, and generating maps of collectiveness for crowded scenes. A new Collective Motion Database, which consists of 413 video clips from 62 crowded scenes, is released to the public.
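To convey the intuition that aligned motion scores high and random motion scores low, here is a toy collectiveness measure: the mean pairwise cosine similarity of individual velocity vectors. The paper's descriptor is computed from the topological structure of neighbor graphs, not this global average, so this is only an illustrative stand-in.

```python
import math

def collectiveness(velocities):
    """Toy collectiveness score: mean pairwise cosine similarity of 2-D
    velocity vectors. Perfectly aligned crowds score 1.0; opposed or random
    motion scores lower. The paper's graph-based descriptor is more refined;
    this only illustrates the 'degree of acting as a union' idea."""
    def cos(u, v):
        nu, nv = math.hypot(u[0], u[1]), math.hypot(v[0], v[1])
        return (u[0] * v[0] + u[1] * v[1]) / (nu * nv)
    n = len(velocities)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(cos(velocities[i], velocities[j]) for i, j in pairs) / len(pairs)

print(collectiveness([(1.0, 0.0), (1.0, 0.0), (1.0, 0.0)]))  # 1.0
```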

19.
IEEE Trans Pattern Anal Mach Intell ; 35(6): 1397-409, 2013 Jun.
Article in English | MEDLINE | ID: mdl-23599054

ABSTRACT

In this paper, we propose a novel explicit image filter called guided filter. Derived from a local linear model, the guided filter computes the filtering output by considering the content of a guidance image, which can be the input image itself or another different image. The guided filter can be used as an edge-preserving smoothing operator like the popular bilateral filter [1], but it has better behaviors near edges. The guided filter is also a more generic concept beyond smoothing: It can transfer the structures of the guidance image to the filtering output, enabling new filtering applications like dehazing and guided feathering. Moreover, the guided filter naturally has a fast and nonapproximate linear time algorithm, regardless of the kernel size and the intensity range. Currently, it is one of the fastest edge-preserving filters. Experiments show that the guided filter is both effective and efficient in a great variety of computer vision and computer graphics applications, including edge-aware smoothing, detail enhancement, HDR compression, image matting/feathering, dehazing, joint upsampling, etc.
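The local linear model behind the guided filter, q = a*I + b fitted in each window and then averaged over the windows covering each sample, can be shown on 1-D signals; the 2-D filter in the paper follows the same derivation with box filters over image windows. This restriction to 1-D is ours, for brevity.

```python
def guided_filter_1d(guide, signal, radius, eps):
    """Guided filter restricted to 1-D signals, showing the local linear model
    q = a*I + b: in each window, a = cov(I, p) / (var(I) + eps) and
    b = mean(p) - a * mean(I); the per-window coefficients covering each
    sample are then averaged before producing the output."""
    n = len(signal)

    def win_mean(x, k):
        lo, hi = max(0, k - radius), min(n, k + radius + 1)
        return sum(x[lo:hi]) / (hi - lo)

    a, b = [], []
    for k in range(n):
        mi, mp = win_mean(guide, k), win_mean(signal, k)
        cov = win_mean([g * p for g, p in zip(guide, signal)], k) - mi * mp
        var = win_mean([g * g for g in guide], k) - mi * mi
        ak = cov / (var + eps)  # eps controls the edge-preserving strength
        a.append(ak)
        b.append(mp - ak * mi)

    # Average the coefficients of all windows that cover each sample.
    a_bar = [win_mean(a, k) for k in range(n)]
    b_bar = [win_mean(b, k) for k in range(n)]
    return [ab * g + bb for ab, bb, g in zip(a_bar, b_bar, guide)]

# A constant signal passes through unchanged (a = 0, b = the mean).
print(guided_filter_1d([2.0] * 5, [3.0] * 5, radius=1, eps=1e-6))
```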

20.
IEEE Trans Pattern Anal Mach Intell ; 35(2): 367-80, 2013 Feb.
Article in English | MEDLINE | ID: mdl-22529324

ABSTRACT

In this paper, we propose a framework of transforming images from a source image space to a target image space, based on learning coupled dictionaries from a training set of paired images. The framework can be used for applications such as image super-resolution and estimation of image intrinsic components (shading and albedo). It is based on a local parametric regression approach, using sparse feature representations over learned coupled dictionaries across the source and target image spaces. After coupled dictionary learning, sparse coefficient vectors of training image patch pairs are partitioned into easily retrievable local clusters. For any test image patch, we can fast index into its closest local cluster and perform a local parametric regression between the learned sparse feature spaces. The obtained sparse representation (together with the learned target space dictionary) provides multiple constraints for each pixel of the target image to be estimated. The final target image is reconstructed based on these constraints. The contributions of our proposed framework are three-fold. 1) We propose a concept of coupled dictionary learning based on coupled sparse coding which requires the sparse coefficient vectors of a pair of corresponding source and target image patches to have the same support, i.e., the same indices of nonzero elements. 2) We devise a space partitioning scheme to divide the high-dimensional but sparse feature space into local clusters. The partitioning facilitates extremely fast retrieval of closest local clusters for query patches. 3) Benefiting from sparse feature-based image transformation, our method is more robust to corrupted input data, and can be considered as a simultaneous image restoration and transformation process. Experiments on intrinsic image estimation and super-resolution demonstrate the effectiveness and efficiency of our proposed method.


Subject(s)
Algorithms , Artificial Intelligence , Image Interpretation, Computer-Assisted/methods , Information Storage and Retrieval/methods , Pattern Recognition, Automated/methods , Subtraction Technique , Computer Simulation , Image Enhancement/methods , Models, Biological , Models, Statistical , Reproducibility of Results , Sensitivity and Specificity