Results 1 - 17 of 17
1.
Sensors (Basel) ; 24(9)2024 Apr 24.
Article in English | MEDLINE | ID: mdl-38732817

ABSTRACT

Existing Retinex-based low-light image enhancement strategies focus heavily on crafting complex networks for Retinex decomposition but often result in imprecise estimations. To overcome the limitations of previous methods, we introduce a straightforward yet effective strategy for Retinex decomposition, dividing images into colormaps and graymaps as new estimations for reflectance and illumination maps. These maps are then enhanced separately using a diffusion model for improved restoration. Furthermore, we address the dual challenge of perturbation removal and brightness adjustment in illumination maps by incorporating brightness guidance. This guidance aids in precisely adjusting the brightness while eliminating disturbances, ensuring a more effective enhancement process. Extensive quantitative and qualitative experimental analyses demonstrate that our proposed method improves performance by approximately 4.4% on the LOL dataset compared to other state-of-the-art diffusion-based methods, while also validating the model's generalizability across multiple real-world datasets.
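As an illustration of the colormap/graymap split described above, here is a minimal sketch, assuming (our reading, not the authors' released code) that the graymap is a per-pixel max-over-channels illumination estimate and the colormap is the image normalized by it:

```python
import numpy as np

def decompose(img, eps=1e-4):
    """Split an RGB image (H, W, 3), float in [0, 1], into a graymap
    (illumination estimate) and a colormap (reflectance estimate).
    The max-over-channels graymap is an assumption; the paper may use
    a different luminance estimator."""
    graymap = img.max(axis=2, keepdims=True)   # (H, W, 1)
    colormap = img / (graymap + eps)           # per-channel ratio
    return graymap, colormap

def recompose(graymap, colormap):
    # Retinex-style reconstruction: I = R * L
    return colormap * graymap
```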

2.
IEEE Trans Pattern Anal Mach Intell ; 45(10): 12408-12426, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37819806

ABSTRACT

Natural untrimmed videos provide rich visual content for self-supervised learning. Yet most previous efforts to learn spatio-temporal representations rely on manually trimmed videos, such as the Kinetics dataset (Carreira and Zisserman 2017), resulting in limited diversity in visual patterns and limited performance gains. In this work, we aim to improve video representations by leveraging the rich information in natural untrimmed videos. For this purpose, we propose learning a hierarchy of temporal consistencies in videos, i.e., visual consistency and topical consistency, corresponding respectively to clip pairs that tend to be visually similar when separated by a short time span, and clip pairs that share similar topics when separated by a long time span. Specifically, we present a Hierarchical Consistency (HiCo++) learning framework, in which visually consistent pairs are encouraged to share the same feature representations through contrastive learning, while topically consistent pairs are coupled through a topical classifier that distinguishes whether they are topic-related, i.e., from the same untrimmed video. Additionally, we introduce a gradual sampling algorithm for the proposed hierarchical consistency learning and demonstrate its theoretical superiority. Empirically, we show that HiCo++ can not only generate stronger representations on untrimmed videos, but also improve the representation quality when applied to trimmed videos. This contrasts with standard contrastive learning, which fails to learn powerful representations from untrimmed videos. Source code will be made available.
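A minimal sketch of the two consistency terms, under our reading of the abstract: visually consistent short-span clip pairs are pulled together with an InfoNCE contrastive loss, while long-span pairs are fed to a binary topical classifier predicting whether they come from the same untrimmed video. All tensor shapes and the topic_head module are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def hico_losses(z_short_a, z_short_b, z_long_a, z_long_b,
                same_video, topic_head, tau=0.1):
    """z_short_*: embeddings of short-time-span clip pairs, shape (N, D)
    z_long_*:  embeddings of long-time-span clip pairs, shape (N, D)
    same_video: (N,) float tensor of 0/1 labels (1 = same untrimmed video)
    topic_head: small module mapping a concatenated pair to one logit"""
    # Visual consistency: InfoNCE over short-span pairs
    a = F.normalize(z_short_a, dim=1)
    b = F.normalize(z_short_b, dim=1)
    logits = a @ b.t() / tau                   # (N, N) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    l_visual = F.cross_entropy(logits, targets)

    # Topical consistency: classify whether the long-span pair is topic-related
    pair = torch.cat([z_long_a, z_long_b], dim=1)
    l_topic = F.binary_cross_entropy_with_logits(
        topic_head(pair).squeeze(1), same_video)
    return l_visual + l_topic
```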

3.
IEEE Trans Image Process ; 32: 3717-3731, 2023.
Article in English | MEDLINE | ID: mdl-37405882

ABSTRACT

Improving boundary segmentation results has recently attracted increasing attention in the field of semantic segmentation. Since existing popular methods usually exploit long-range context, the boundary cues become obscured in the feature space, leading to poor boundary results. In this paper, we propose a novel conditional boundary loss (CBL) for semantic segmentation to improve the performance at the boundaries. The CBL creates a unique optimization goal for each boundary pixel, conditioned on its surrounding neighbors. The conditional optimization of the CBL is simple yet effective. In contrast, most previous boundary-aware methods have difficult optimization goals or may cause potential conflicts with the semantic segmentation task. Specifically, the CBL enhances the intra-class consistency and inter-class difference by pulling each boundary pixel closer to its unique local class center and pushing it away from its different-class neighbors. Moreover, the CBL filters out noisy and incorrect information to obtain precise boundaries, since only surrounding neighbors that are correctly classified participate in the loss calculation. Our loss is a plug-and-play solution that can be used to improve the boundary segmentation performance of any semantic segmentation network. We conduct extensive experiments on ADE20K, Cityscapes, and Pascal Context, and the results show that applying the CBL to various popular segmentation networks can significantly improve the mIoU and boundary F-score performance.
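A simplified per-pixel sketch of the conditional objective, as we read it (the paper applies it densely over the feature map): pull a boundary pixel toward the local class center formed by its correctly classified same-class neighbors, and push it away from correctly classified different-class neighbors. The margin value is an assumption:

```python
import torch
import torch.nn.functional as F

def cbl_pixel(feat, nbr_feats, nbr_labels, nbr_correct, label, margin=1.0):
    """feat:        (D,) feature of one boundary pixel
    nbr_feats:   (K, D) features of its surrounding neighbors
    nbr_labels:  (K,) ground-truth labels of the neighbors
    nbr_correct: (K,) bool mask, True where the neighbor is correctly classified
    label:       ground-truth label of the boundary pixel"""
    same = nbr_correct & (nbr_labels == label)
    diff = nbr_correct & (nbr_labels != label)
    loss = feat.new_zeros(())
    if same.any():
        # pull toward the local class center of correct same-class neighbors
        center = nbr_feats[same].mean(dim=0)
        loss = loss + F.mse_loss(feat, center)
    if diff.any():
        # push away from correctly classified different-class neighbors
        d = (feat.unsqueeze(0) - nbr_feats[diff]).pow(2).sum(dim=1).sqrt()
        loss = loss + F.relu(margin - d).mean()
    return loss
```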

4.
Article in English | MEDLINE | ID: mdl-37037250

ABSTRACT

Recognizing human-object interaction (HOI) aims at inferring various relationships between actions and objects. Although great progress in HOI has been made, the long-tail problem and the combinatorial explosion problem remain practical challenges. To this end, we formulate HOI as a few-shot task to tackle both challenges and design a novel dynamic generation method to address this task. The proposed approach is called semantic-aware dynamic generation networks (SADG-Nets). Specifically, SADG-Net first assigns semantic-aware task representations to different batches of data, from which it generates dynamic parameters. It thereby obtains features that adaptively highlight intercategory discriminability and intracategory commonality. In addition, we design a dual semantic-aware encoder module (DSAE-Module), that is, verb-aware and noun-aware branches, to yield both action and object prototypes of HOI for each task space, which generalizes to novel combinations by transferring similarities among interactions. Extensive experimental results on two benchmark datasets, namely Humans Interacting with Common Objects (HICO)-FS and Trento Universal HOI (TUHOI)-FS, illustrate that our SADG-Net achieves superior performance over state-of-the-art approaches, demonstrating its effectiveness on few-shot HOI recognition.
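A toy sketch of the dynamic-generation idea, with all layer sizes assumed for illustration: a task representation conditions a generator that emits convolution weights, which are then applied to the task's features:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv(nn.Module):
    """Sketch of semantic-aware dynamic parameter generation (our reading):
    a task representation conditions the weights of a 1x1 conv applied to
    the per-task features. All sizes are illustrative, not from the paper."""
    def __init__(self, task_dim=128, channels=64):
        super().__init__()
        self.channels = channels
        self.gen = nn.Linear(task_dim, channels * channels)  # weight generator

    def forward(self, feats, task_repr):
        # feats: (B, C, H, W); task_repr: (task_dim,) for the current task
        w = self.gen(task_repr).view(self.channels, self.channels, 1, 1)
        return F.relu(F.conv2d(feats, w))

# usage: one task representation per episode/batch
layer = DynamicConv()
out = layer(torch.randn(4, 64, 7, 7), torch.randn(128))
```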

5.
IEEE Trans Cybern ; 53(3): 1641-1652, 2023 Mar.
Article in English | MEDLINE | ID: mdl-34506295

ABSTRACT

Human parsing is a fine-grained semantic segmentation task, which needs to understand human semantic parts. Most existing methods model human parsing as general semantic segmentation, which ignores the inherent relationships among hierarchical human parts. In this work, we propose a pose-guided hierarchical semantic decomposition and composition framework for human parsing. Specifically, our method includes a semantic maintained decomposition and composition (SMDC) module and a pose distillation (PD) module. SMDC progressively disassembles the human body to focus on the more concise regions of interest in the decomposition stage and then gradually assembles human parts under the guidance of pose information in the composition stage. Notably, SMDC maintains the atomic semantic labels during both stages to avoid the error propagation issue of the hierarchical structure. To further take advantage of the relationships among human parts, we introduce pose information as explicit guidance for the composition. However, the discrete structure prediction in pose estimation conflicts with the continuous-region requirement of human parsing. To this end, we design a PD module that broadcasts the maximum responses of pose estimation to form continuous structures in the manner of knowledge distillation. The experimental results on the Look into Person (LIP) and PASCAL-Person-Part datasets demonstrate the superiority of our method compared with state-of-the-art methods, that is, 55.21% mean Intersection over Union (mIoU) on LIP and 69.88% mIoU on PASCAL-Person-Part.
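A hedged sketch of the distillation step, assuming the "broadcast" is realized by a local max pooling that spreads each pose peak into a continuous region before matching (the kernel size is illustrative):

```python
import torch.nn.functional as F

def pose_distill_loss(student_maps, pose_heatmaps, k=5):
    """Sketch of the pose distillation idea (our interpretation):
    broadcast the sharp pose-estimation peaks into continuous regions via
    max pooling, then ask the parsing branch to match them.
    student_maps, pose_heatmaps: (B, J, H, W)"""
    broadcast = F.max_pool2d(pose_heatmaps, kernel_size=k,
                             stride=1, padding=k // 2)
    return F.mse_loss(student_maps, broadcast)
```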


Subject(s)
Semantics; Humans
6.
IEEE Trans Pattern Anal Mach Intell ; 45(6): 7319-7337, 2023 Jun.
Article in English | MEDLINE | ID: mdl-36355744

ABSTRACT

Person search aims at localizing and recognizing query persons from raw video frames; it combines two sub-tasks, i.e., pedestrian detection and person re-identification. The dominant approach, termed one-step person search, jointly optimizes detection and identification in a unified network and exhibits higher efficiency. However, major challenges remain: (i) conflicting objectives of multiple sub-tasks under the shared feature space, (ii) an inconsistent memory bank caused by the limited batch size, and (iii) underutilized unlabeled identities during identification learning. To address these issues, we develop an enhanced decoupled and memory-reinforced network (DMRNet++). First, we simplify the standard tightly coupled pipelines and establish a task-decoupled framework (TDF). Second, we build a memory-reinforced mechanism (MRM), with a slow-moving average of the network, to better encode the consistency of the memorized features. Third, considering the potential of unlabeled samples, we model the recognition process as semi-supervised learning. An unlabeled-aided contrastive loss (UCL) is developed to boost identification feature learning by exploiting the aggregation of unlabeled identities. Experimentally, the proposed DMRNet++ obtains mAP of 94.5% and 52.1% on the CUHK-SYSU and PRW datasets, respectively, exceeding most existing methods.
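A small sketch of a memory-reinforced identity bank under our reading of MRM: features come from a slow-moving-average encoder, and each identity slot is updated by an exponential moving average so the bank stays consistent despite small batches. Sizes and the momentum value are illustrative:

```python
import torch
import torch.nn.functional as F

class MomentumMemory:
    """Toy memory bank: one L2-normalized slot per labeled identity."""
    def __init__(self, num_ids, dim, momentum=0.9):
        self.bank = F.normalize(torch.randn(num_ids, dim), dim=1)
        self.m = momentum

    @torch.no_grad()
    def update(self, feats, ids):
        # feats: (B, D) momentum-encoder features; ids: (B,) identity indices
        feats = F.normalize(feats, dim=1)
        mixed = self.m * self.bank[ids] + (1 - self.m) * feats
        self.bank[ids] = F.normalize(mixed, dim=1)

    def contrastive_logits(self, feats, tau=0.07):
        # similarities against all memorized identities, for a softmax loss
        return F.normalize(feats, dim=1) @ self.bank.t() / tau
```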

7.
Article in English | MEDLINE | ID: mdl-32203019

ABSTRACT

Dynamic scene blur is usually caused by object motion, depth variation, and camera shake. Most existing methods solve this problem using image segmentation or fully end-to-end trainable deep convolutional neural networks that consider different object motions or camera shakes. However, these algorithms are less effective when depth variations are present. In this work, we propose a deep convolutional neural network that exploits the depth map for dynamic scene deblurring. Given a blurred image, we first extract the depth map and adopt a depth refinement network to restore the edges and structure in the depth map. To effectively exploit the depth map, we adopt a spatial feature transform layer to extract depth features and fuse them with the image features through scaling and shifting. Our image deblurring network thus learns to restore a clear image under the guidance of the depth map. With substantial experiments and analysis, we show that the depth information is crucial to the performance of the proposed model. Finally, extensive quantitative and qualitative evaluations demonstrate that the proposed model performs favorably against state-of-the-art dynamic scene deblurring approaches as well as conventional depth-based deblurring algorithms.
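The spatial feature transform step admits a compact sketch: two convolutions predict per-pixel scale and shift maps from the depth features, which then modulate the image features. Channel sizes here are assumptions:

```python
import torch.nn as nn

class SFTLayer(nn.Module):
    """Spatial feature transform: modulate image features with per-pixel
    scale and shift predicted from depth features (a sketch; channel
    sizes are illustrative)."""
    def __init__(self, img_ch=64, depth_ch=32):
        super().__init__()
        self.scale = nn.Conv2d(depth_ch, img_ch, 3, padding=1)
        self.shift = nn.Conv2d(depth_ch, img_ch, 3, padding=1)

    def forward(self, img_feat, depth_feat):
        # element-wise affine modulation guided by the depth map
        return img_feat * self.scale(depth_feat) + self.shift(depth_feat)
```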

8.
Article in English | MEDLINE | ID: mdl-31751272

ABSTRACT

We present an effective semi-supervised learning algorithm for single-image dehazing. The proposed algorithm applies a deep convolutional neural network (CNN) containing a supervised learning branch and an unsupervised learning branch. In the supervised branch, the deep neural network is constrained by supervised loss functions, namely mean squared error, perceptual, and adversarial losses. In the unsupervised branch, we exploit the properties of clean images via the sparsity of the dark channel and gradient priors to constrain the network. We train the proposed network on both synthetic data and real-world images in an end-to-end manner. Our analysis shows that the proposed semi-supervised learning algorithm is not limited to synthetic training datasets and generalizes well to real-world images. Extensive experimental results demonstrate that the proposed algorithm performs favorably against state-of-the-art single-image dehazing algorithms on both benchmark datasets and real-world images.
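For concreteness, a sketch of the dark channel prior used by the unsupervised branch: the dark channel is the per-pixel channel minimum followed by a local minimum filter, and penalizing its L1 norm encourages the sparsity that clean images exhibit (our hedged reading; the patch size is illustrative):

```python
import torch.nn.functional as F

def dark_channel(img, patch=15):
    """Dark channel of an RGB batch (B, 3, H, W) in [0, 1]: per-pixel
    channel minimum followed by a local minimum filter (min-pool)."""
    d = img.min(dim=1, keepdim=True).values
    return -F.max_pool2d(-d, kernel_size=patch, stride=1, padding=patch // 2)

def dcp_sparsity_loss(dehazed):
    # unsupervised prior: clean images tend to have a near-zero, sparse
    # dark channel, so penalize its L1 norm
    return dark_channel(dehazed).abs().mean()
```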

9.
Sensors (Basel) ; 19(18)2019 Sep 06.
Article in English | MEDLINE | ID: mdl-31500196

ABSTRACT

Most existing person re-identification methods focus on matching still person images across non-overlapping camera views. Despite their excellent performance in some circumstances, these methods still suffer from occlusion and changes in pose, viewpoint, or lighting. Video-based re-identification is a natural way to overcome these problems by exploiting space-time information from videos. One of the most challenging problems in video-based person re-identification is temporal alignment, in addition to spatial alignment. To address this problem, we propose an effective superpixel-based temporally aligned representation for video-based person re-identification, which represents a video sequence using only one walking cycle. In particular, we first build a candidate set of walking cycles by extracting motion information at the superpixel level, which is more robust than motion extracted at the pixel level. Then, from the candidate set, we propose an effective criterion to select the walking cycle that best matches the intrinsic periodicity of walking persons. Finally, we propose a temporally aligned pooling scheme to describe the video data in the selected walking cycle. In addition, to characterize the individual still images in the cycle, we propose a superpixel-based representation to improve spatial alignment. Extensive experimental results on three public datasets demonstrate the effectiveness of the proposed method compared with state-of-the-art approaches.
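A much-simplified sketch of periodicity-based cycle selection, assuming a 1-D per-frame motion-energy signal in place of the paper's superpixel-level motion information: the dominant gait frequency is found by FFT and one cycle is cut out around a trough. The frequency band and trough-alignment rule are assumptions:

```python
import numpy as np

def pick_walking_cycle(motion, fps=25, min_hz=0.5, max_hz=2.0):
    """motion: (T,) per-frame motion energy of the tracked person,
    assumed to span at least a few seconds of walking.
    Returns (start, end) frame indices of one estimated walking cycle."""
    motion = motion - motion.mean()
    freqs = np.fft.rfftfreq(len(motion), d=1.0 / fps)
    spec = np.abs(np.fft.rfft(motion))
    band = (freqs >= min_hz) & (freqs <= max_hz)     # plausible gait band
    f0 = freqs[band][np.argmax(spec[band])]          # dominant gait frequency
    cycle_len = min(int(round(fps / f0)), len(motion) - 1)
    start = int(np.argmin(motion[: len(motion) - cycle_len]))  # align to a trough
    return start, start + cycle_len
```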

10.
Article in English | MEDLINE | ID: mdl-31329554

ABSTRACT

Instance segmentation is a challenging computer vision problem that lies at the intersection of object detection and semantic segmentation. Motivated by plant image analysis in the context of plant phenotyping, a recently emerging application field of computer vision, this paper presents the Exemplar-Based Recursive Instance Segmentation (ERIS) framework. A three-layer probabilistic model is first introduced to jointly represent hypotheses, voting elements, instance labels, and their connections. A recursive optimization algorithm is then developed to infer the maximum a posteriori (MAP) solution, which handles one instance at a time by alternating among the three steps of detection, segmentation, and update. The proposed ERIS framework departs from previous works mainly in two respects. First, it is exemplar-based and model-free, achieving instance-level segmentation of a specific object class given only a handful of (typically fewer than 10) annotated exemplars. This merit enables its use in cases where no large manually labeled dataset is available for training strong classification models, as required by most existing methods. Second, instead of attempting to infer the solution in a single shot, which suffers from extremely high computational complexity, our recursive optimization strategy allows for reasonably efficient MAP inference in the full hypothesis space. In this work, the ERIS framework is instantiated for the specific application of plant leaf segmentation. Experiments are conducted on public benchmarks to demonstrate the superiority of our method in both effectiveness and efficiency in comparison with the state of the art.
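The recursive alternation can be summarized as a skeleton loop; detect, segment, and update are application-specific callables (e.g., for leaf segmentation), and the stopping threshold is illustrative:

```python
def eris(image, exemplars, detect, segment, update,
         max_instances=20, score_thresh=0.5):
    """Skeleton of the recursive detect/segment/update alternation.
    detect(state) -> (hypothesis, score) for the best remaining instance;
    segment(state, hypothesis) -> instance mask;
    update(state, mask) -> new state with that instance's evidence removed."""
    state = {"image": image, "exemplars": exemplars}
    masks = []
    for _ in range(max_instances):
        hypothesis, score = detect(state)
        if score < score_thresh:           # MAP search exhausted
            break
        mask = segment(state, hypothesis)  # handle one instance at a time
        state = update(state, mask)
        masks.append(mask)
    return masks
```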

11.
PLoS One ; 11(7): e0159355, 2016.
Article in English | MEDLINE | ID: mdl-27433940

ABSTRACT

BACKGROUND: Since the discovery of cell-free foetal DNA in the plasma of pregnant women, many non-invasive prenatal testing assays have been developed. In the area of skeletal dysplasia diagnosis, some PCR-based non-invasive prenatal testing assays have been developed to facilitate the ultrasound diagnosis of skeletal dysplasias caused by de novo mutations. However, skeletal dysplasias are a group of heterogeneous genetic diseases; PCR-based methods can hardly detect multiple genes or loci simultaneously, and the diagnosis rate is highly dependent on the accuracy of the ultrasound diagnosis. In this study, we investigated the feasibility of using targeted capture sequencing to detect foetal de novo pathogenic mutations responsible for skeletal dysplasia. METHODOLOGY/PRINCIPAL FINDINGS: Three families whose foetuses were affected by skeletal dysplasia and two control families whose foetuses were affected by other single-gene diseases were included in this study. Sixteen genes related to some common lethal skeletal dysplasias were selected for analysis, and probes were designed to capture the coding regions of these genes. Targeted capture sequencing was performed on the maternal plasma DNA, the maternal genomic DNA, and the paternal genomic DNA. The de novo pathogenic variants in the plasma DNA data were identified using a bioinformatics pipeline developed for low-frequency mutation detection and a strict variant interpretation strategy. The causal variants could be specifically identified in the plasma, and the results were identical to those obtained by sequencing amniotic fluid samples. Furthermore, a mean of 97% of foetal-specific alleles, i.e., alleles that are not shared by the maternal genomic DNA and the amniotic fluid DNA, were identified successfully in the plasma samples. CONCLUSIONS/SIGNIFICANCE: Our study shows that capture sequencing of maternal plasma DNA can be used for the non-invasive detection of de novo pathogenic variants. This method has the potential to facilitate the prenatal diagnosis of skeletal dysplasia.
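Schematically, the variant filtering amounts to keeping low-allele-fraction plasma calls that are absent from both parental genomes; the toy filter below illustrates the idea, with entirely illustrative thresholds (the actual pipeline involves error modeling and strict interpretation rules):

```python
def candidate_de_novo(plasma_calls, maternal_gt, paternal_gt,
                      min_af=0.03, max_af=0.25):
    """Toy filter for foetal de novo candidates in maternal plasma
    (a schematic of the described approach, not the authors' code).
    plasma_calls: dict variant -> allele fraction observed in plasma
    maternal_gt / paternal_gt: sets of variants seen in parental genomic DNA
    The allele-fraction window reflects that foetal DNA is a minor,
    low-fraction component of plasma; thresholds are illustrative."""
    return {
        v: af for v, af in plasma_calls.items()
        if min_af <= af <= max_af         # low-frequency signal
        and v not in maternal_gt          # absent from maternal genome
        and v not in paternal_gt          # absent from paternal genome
    }
```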


Subject(s)
Bone Diseases, Developmental/blood; Bone Diseases, Developmental/genetics; Craniofacial Abnormalities/blood; Craniofacial Abnormalities/genetics; DNA/blood; Prenatal Diagnosis; Alleles; Amniotic Fluid/chemistry; Bone Diseases, Developmental/pathology; Cell-Free System; Craniofacial Abnormalities/pathology; DNA/chemistry; Female; Fetus; Humans; Mutation; Polymerase Chain Reaction; Pregnancy; Sequence Analysis, DNA
12.
Sensors (Basel) ; 16(4)2016 Apr 15.
Article in English | MEDLINE | ID: mdl-27092505

ABSTRACT

Appearance representation and the observation model are the most important components in designing a robust visual tracking algorithm for video-based sensors. The exemplar-based linear discriminant analysis (ELDA) model has shown good performance in object tracking. Building on it, we improve the ELDA tracking algorithm with deep convolutional neural network (CNN) features and adaptive model updates. Deep CNN features have been successfully used in various computer vision tasks, but extracting CNN features from all of the candidate windows is time-consuming. To address this problem, a two-step CNN feature extraction method is proposed that separately computes the convolutional layers and the fully-connected layers. Owing to the strong discriminative ability of CNN features and the exemplar-based model, we update both the object and background models to improve their adaptivity and to handle the tradeoff between discriminative ability and adaptivity. An object updating method is proposed to select the "good" models (detectors), which are highly discriminative and uncorrelated with other selected models. Meanwhile, we build the background model as a Gaussian mixture model (GMM) to adapt to complex scenes; it is initialized offline and updated online. The proposed tracker is evaluated on a benchmark dataset of 50 video sequences with various challenges. It achieves the best overall performance among the compared state-of-the-art trackers, which demonstrates the effectiveness and robustness of our tracking algorithm.
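The two-step extraction can be sketched as follows: run the convolutional layers once on the whole frame, then evaluate only the fully-connected layers on pooled crops of the shared feature map, once per candidate window. The VGG-16 backbone (randomly initialized here) and crop size are our illustrative choices:

```python
import torch
import torch.nn.functional as F
import torchvision

backbone = torchvision.models.vgg16(weights=None)
conv = backbone.features                        # convolutional layers
fc = torch.nn.Sequential(torch.nn.Flatten(),    # fully-connected layers only
                         *backbone.classifier[:4])

frame = torch.randn(1, 3, 224, 224)
fmap = conv(frame)                              # computed ONCE per frame

def window_feature(fmap, y0, y1, x0, x1, out=7):
    # crop the shared feature map for one candidate window, pool to a
    # fixed size, and run only the cheap FC part
    crop = fmap[:, :, y0:y1, x0:x1]
    crop = F.adaptive_max_pool2d(crop, out)
    return fc(crop)

feat = window_feature(fmap, 0, 7, 0, 7)         # one candidate window
```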

13.
BMC Med Genet ; 17: 23, 2016 Mar 15.
Article in English | MEDLINE | ID: mdl-26980296

ABSTRACT

BACKGROUND: The identification of causative mutations is important for treatment decisions and genetic counseling of patients with disorders of sex development (DSD). Here, we designed a new assay based on targeted next-generation sequencing (NGS) to diagnose these genetically heterogeneous disorders. METHODS: All coding regions and flanking sequences of 219 genes implicated in DSD were designed to be included on a panel. A total of 45 samples were used for sex chromosome dosage validation by targeted sequencing using the NGS platform. Among these, 21 samples were processed to find the causative mutation. RESULTS: The sex chromosome dosages of all 45 samples in this assay were concordant with their corresponding karyotyping results. Among the 21 DSD patients, a total of 11 mutations in SRY, NR0B1, AR, CYP17A1, GK, CHD7, and SRD5A2 were identified, including five single nucleotide variants, three InDels, one in-frame duplication, one SRY-positive 46,XX, and one gross duplication with an estimated size of more than 427,038 bp containing NR0B1 and GK. We also identified six novel mutations: c.230_231insA in SRY, c.7389delA in CHD7, c.273C>G in NR0B1, and c.2158G>A, c.1825A>G, and c.2057_2065dupTGTGTGCTG in AR. CONCLUSIONS: Our assay was able to make a genetic diagnosis for eight DSD patients (38.1%), and identified variants of uncertain clinical significance in the other three cases (14.3%). Targeted NGS is therefore a comprehensive and efficient method to diagnose DSD. This work also expands the pathogenic mutation spectrum of DSD.
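As a schematic of the sex-chromosome dosage check, one can normalize on-target X and Y coverage against an autosomal baseline; the sketch below is a toy version (a real pipeline adds GC correction and batch normalization):

```python
import numpy as np

def sex_chromosome_dosage(cov, autosomes):
    """cov: dict chromosome -> mean on-target sequencing depth.
    Returns estimated X and Y copy numbers relative to a diploid
    autosomal baseline (toy normalization)."""
    baseline = np.mean([cov[c] for c in autosomes])  # depth per 2 copies
    x_copies = 2.0 * cov["chrX"] / baseline
    y_copies = 2.0 * cov["chrY"] / baseline
    return x_copies, y_copies

# e.g., values near (2, 0) suggest 46,XX and near (1, 1) suggest 46,XY
```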


Subject(s)
Disorders of Sex Development/genetics; High-Throughput Nucleotide Sequencing/methods; Mutation; Asian People/genetics; China; Disorders of Sex Development/diagnosis; Female; Genetic Testing; Humans; Male; Polymorphism, Single Nucleotide; Reproducibility of Results; Sequence Alignment; Sequence Analysis, DNA/methods; Sex Chromosomes/genetics; Sexual Development/genetics
14.
J Opt Soc Am A Opt Image Sci Vis ; 33(3): 404-15, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26974910

ABSTRACT

Salient object detection is very useful in a large variety of image- and vision-related applications. A recent trend in salient object detection is to explore novel top-down visual cues and combine them with bottom-up saliency to improve performance. However, a basic and important problem, i.e., how to effectively fuse multiple visual cues, has rarely been addressed in previous works. To this end, this paper presents a multicue fusion method using the cross-diffusion process (CDP) for salient object detection. The CDP algorithm is deployed to combine the affinity matrices constructed over individual visual cue channels, and the fused result is then embedded into a saliency propagation framework to accomplish salient object detection. Different from other multicue fusion strategies, our proposed approach allows for collaborative fusion: the individual visual cues to be fused are able to interact and exchange information with each other during the fusion procedure, which can correct the noise or corruption in the individual visual cue channels, leading to more robust and effective fusion results. Extensive experiments on publicly available datasets demonstrate the effectiveness and superior performance of our proposed method.
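A sketch of the cross-diffusion recipe, following the general form of such processes (the paper's exact normalization may differ): each cue's affinity matrix is iteratively diffused through the average of the other cues' matrices, so the channels exchange information:

```python
import numpy as np

def cross_diffusion(affinities, iters=20):
    """affinities: list of (n, n) nonnegative matrices, one per visual cue.
    Returns a fused affinity matrix for saliency propagation."""
    # row-normalize each affinity matrix into a transition matrix
    S = [a / a.sum(axis=1, keepdims=True) for a in affinities]
    A = [s.copy() for s in S]
    m = len(S)
    for _ in range(iters):
        # diffuse each channel through the average of the OTHER channels
        A = [S[i] @ (sum(A[j] for j in range(m) if j != i) / (m - 1)) @ S[i].T
             for i in range(m)]
    return sum(A) / m
```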

15.
J Opt Soc Am A Opt Image Sci Vis ; 32(2): 173-85, 2015 Feb 01.
Article in English | MEDLINE | ID: mdl-26366588

ABSTRACT

Recent methods based on midlevel visual concepts have shown promising capabilities in the human action recognition field, yet automatically discovering semantic entities such as action parts remains challenging. In this paper, we present a method for automatically discovering distinctive midlevel action parts from video for the recognition of human actions. We address this problem by learning and selecting a collection of discriminative and representative action part detectors directly from video data. We initially train a large collection of candidate exemplar linear discriminant analysis (exemplar-LDA) detectors from clusters obtained by clustering spatiotemporal patches in whitened space. To select the most effective detectors from this vast array of candidates, we propose novel coverage-entropy curves (CE curves) to evaluate a detector's capability of distinguishing actions. The CE curves characterize the correlation between the representative and discriminative power of detectors. In the experiments, we apply the mined part detectors as a visual vocabulary to the task of action recognition on four datasets: KTH, Olympic Sports, UCF50, and HMDB51. The experimental results demonstrate the effectiveness of the proposed method and show state-of-the-art recognition performance.
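A small sketch of how a coverage-entropy curve might be computed for one detector, under our reading of the criterion: at each firing threshold, record the fraction of training clips covered and the entropy of the class distribution among the covered clips (good part detectors cover many clips of few classes):

```python
import numpy as np

def coverage_entropy_curve(firings, labels, thresholds):
    """firings: (N,) detector scores on N training clips;
    labels: (N,) action class ids. Returns an array of
    (coverage, entropy) points, one per threshold."""
    curve = []
    for t in thresholds:
        hit = firings >= t
        coverage = hit.mean()
        if hit.any():
            _, counts = np.unique(labels[hit], return_counts=True)
            p = counts / counts.sum()
            entropy = -(p * np.log(p + 1e-12)).sum()
        else:
            entropy = 0.0
        curve.append((coverage, entropy))
    return np.array(curve)
```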

16.
PLoS One ; 9(5): e98447, 2014.
Article in English | MEDLINE | ID: mdl-24871350

ABSTRACT

This paper presents a novel object detection method using a single instance from the object category. Our method uses biologically inspired global scene context criteria to check whether each individual location of the image can be naturally replaced by the query instance, which indicates whether there is a similar object at that location. Unlike traditional detection methods that only look at individual locations for the desired objects, our method evaluates the consistency of the entire scene. It is therefore robust to large intra-class variations, occlusions, moderate pose variations, low-resolution conditions, background clutter, etc., and requires no offline training. The experimental results on four datasets and two video sequences clearly show the superior robustness of the proposed method, suggesting that global scene context is important for visual detection and localization.


Subject(s)
Algorithms; Models, Biological; Pattern Recognition, Automated/methods; Computer Simulation; Humans; Image Interpretation, Computer-Assisted/methods; Pattern Recognition, Visual/physiology; Visual Perception/physiology
17.
Opt Lett ; 37(1): 76-8, 2012 Jan 01.
Article in English | MEDLINE | ID: mdl-22212796

ABSTRACT

We present an instance-based attention model to predict where humans would look first when searching for an object instance, and we show its application in image synthesis. The proposed model learns configurational rules from a vast set of scene images described by global scene representations. The rules are then used to predict the focus of attention when searching for a given object instance with a specific scale and pose. Finally, the image synthesis results are obtained by placing the object instance into the scene at the position that attracts the most attention. Promising experimental results demonstrate the effectiveness of the proposed model.


Subject(s)
Attention; Vision, Ocular/physiology; Humans; Models, Biological; Photography; Probability