Results 1 - 20 of 21
1.
Behav Sci (Basel) ; 14(6)2024 Jun 19.
Article in English | MEDLINE | ID: mdl-38920840

ABSTRACT

Ensemble coding allows observers to form an average to represent a set of elements. However, it is unclear whether observers can extract an average from a cross-category set. Previous investigations of this issue using low-level stimuli yielded contradictory results. The current study addressed the issue by presenting high-level stimuli (i.e., a crowd of facial expressions) simultaneously (Experiment 1) or sequentially (Experiment 2) and asking participants to complete a member judgment task. The results showed that participants could extract average information from a group of cross-category facial expressions with a short perceptual distance. These findings demonstrate cross-category ensemble coding of high-level stimuli, contributing to the understanding of ensemble coding and offering directions for future research.

2.
Article in English | MEDLINE | ID: mdl-38900610

ABSTRACT

Thin-plate spline (TPS) is a principal warp that represents elastic, nonlinear transformations through control point motions. As the number of control points increases, the warp becomes more flexible but typically runs into a bottleneck of undesired issues, e.g., content distortion. In this paper, we explore generic applications of TPS in single-image-based warping tasks, such as rotation correction, rectangling, and portrait correction. To break this bottleneck, we propose the coupled thin-plate spline model (CoupledTPS), which iteratively couples multiple TPS transformations with limited control points into a more flexible and powerful transformation. Concretely, we first design an iterative search to predict new control points according to the current latent condition. Then, we present the warping flow as a bridge for coupling different TPS transformations, effectively eliminating the interpolation errors caused by multiple warps. Besides, in light of the laborious annotation cost, we develop a semi-supervised learning scheme to improve warping quality by exploiting unlabeled data. It is formulated through a dual transformation between the searched control points of unlabeled data and their graphic augmentation, yielding an implicit correction consistency constraint. Finally, we collect massive unlabeled data to demonstrate the benefit of our semi-supervised scheme in rotation correction. Extensive experiments demonstrate the superiority and universality of CoupledTPS over existing state-of-the-art (SoTA) solutions for rotation correction and beyond. The code and data will be available at https://github.com/nie-lang/CoupledTPS.
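For readers unfamiliar with the underlying warp, here is a minimal NumPy sketch of fitting and applying a single TPS to control-point motions. The kernel convention U(r) = r^2 log r and all names are illustrative assumptions, not the CoupledTPS implementation (which couples several such warps through a shared warping flow).

```python
# Minimal TPS sketch: solve for weights that map source control points onto
# destination points, then evaluate the warp at query points. Assumptions:
# kernel U(r) = r^2 log r; the standard [[K, P], [P^T, 0]] linear system.
import numpy as np

def fit_tps(src, dst):
    n = src.shape[0]
    r = np.linalg.norm(src[:, None, :] - src[None, :, :], axis=-1)
    K = np.where(r > 0, r**2 * np.log(r + 1e-12), 0.0)   # radial kernel
    P = np.hstack([np.ones((n, 1)), src])                # affine part
    A = np.zeros((n + 3, n + 3))
    A[:n, :n], A[:n, n:], A[n:, :n] = K, P, P.T
    b = np.vstack([dst, np.zeros((3, 2))])
    coef = np.linalg.solve(A, b)
    return coef[:n], coef[n:]                            # kernel weights, affine

def tps_warp(pts, src, w, affine):
    r = np.linalg.norm(pts[:, None, :] - src[None, :, :], axis=-1)
    K = np.where(r > 0, r**2 * np.log(r + 1e-12), 0.0)
    return K @ w + np.hstack([np.ones((len(pts), 1)), pts]) @ affine

src = np.random.rand(12, 2)
dst = src + 0.05 * np.random.randn(12, 2)
w, aff = fit_tps(src, dst)
warped = tps_warp(np.random.rand(100, 2), src, w, aff)
```

With few control points each solve stays cheap, which is why iterating several such warps (the coupling idea above) can add flexibility without a single over-parameterized fit.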

3.
IEEE Trans Image Process ; 33: 2676-2688, 2024.
Article in English | MEDLINE | ID: mdl-38530733

ABSTRACT

Accurate segmentation of lesions is crucial for the diagnosis and treatment of early esophageal cancer (EEC). However, neither traditional nor deep-learning-based methods to date can meet clinical requirements, with the mean Dice score - the most important metric in medical image analysis - hardly exceeding 0.75. In this paper, we present a novel deep learning approach for segmenting EEC lesions. Our method stands out for its uniqueness, as it relies solely on a single input image from a patient, forming the so-called "You-Only-Have-One" (YOHO) framework. On one hand, this "one-image-one-network" learning ensures complete patient privacy, as it does not use any images from other patients as training data. On the other hand, it avoids nearly all generalization-related problems, since each trained network is applied only to the input image it was trained on. In particular, we can push training toward "over-fitting" as much as possible to increase segmentation accuracy. Our technical contributions include an interaction with clinical doctors to utilize their expertise, a geometry-based data augmentation over a single lesion image to generate the training dataset (the biggest novelty), and an edge-enhanced UNet. We evaluated YOHO on a self-collected EEC dataset and achieved a mean Dice score of 0.888, much higher than existing deep-learning methods and thus a significant advance toward clinical application. The code and dataset are available at: https://github.com/lhaippp/YOHO.
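As a rough illustration of the geometry-based augmentation idea (the abstract's "biggest novelty"), the sketch below generates training pairs from one annotated image by random rotations and flips applied jointly to the image and its mask; the exact transform family used in YOHO is an assumption here.

```python
# Hedged sketch: turn one annotated lesion image into a training set using
# geometry-only transforms, in the spirit of "one-image-one-network".
import numpy as np
from scipy.ndimage import rotate

def augment_once(image, mask, rng):
    angle = rng.uniform(0, 360)
    img = rotate(image, angle, reshape=False, order=1)   # bilinear for image
    msk = rotate(mask, angle, reshape=False, order=0)    # nearest for labels
    if rng.random() < 0.5:                               # random horizontal flip
        img, msk = img[:, ::-1], msk[:, ::-1]
    return img, msk

rng = np.random.default_rng(0)
image = np.random.rand(256, 256)                         # dummy stand-ins
mask = (image > 0.5).astype(np.uint8)
dataset = [augment_once(image, mask, rng) for _ in range(100)]
```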


Subject(s)
Deep Learning , Esophageal Neoplasms , Humans , Esophageal Neoplasms/diagnostic imaging , Image Processing, Computer-Assisted
4.
IEEE Trans Image Process ; 32: 2879-2888, 2023.
Article in English | MEDLINE | ID: mdl-37195842

ABSTRACT

Not everyone has professional photography skills and sufficient shooting time, so captured images occasionally exhibit tilts. In this paper, we propose a new and practical task, named Rotation Correction, to automatically correct such tilt with high content fidelity when the rotation angle is unknown. This task can be easily integrated into image editing applications, allowing users to correct rotated images without any manual operation. To this end, we leverage a neural network to predict the optical flows that warp tilted images to be perceptually horizontal. However, pixel-wise optical flow estimation from a single image is severely unstable, especially for large-angle tilts. To enhance its robustness, we propose a simple but effective prediction strategy that forms a robust elastic warp. In particular, we first regress a mesh deformation that can be converted into robust initial optical flows. We then estimate residual optical flows to give the network the flexibility of pixel-wise deformation, further correcting the details of the tilted images. To establish an evaluation benchmark and train the learning framework, we present a comprehensive rotation correction dataset with a large diversity of scenes and rotation angles. Extensive experiments demonstrate that, even in the absence of an angle prior, our algorithm outperforms other state-of-the-art solutions that require this prior. The code and dataset are available at https://github.com/nie-lang/RotationCorrection.
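The coarse-to-fine warp described above can be pictured with a small sketch: a mesh deformation is upsampled into an initial dense flow, a residual flow refines it, and the image is backward-warped. This shows only the flow-composition mechanics under assumed shapes, not the paper's network.

```python
# Sketch of mesh-to-flow composition: coarse mesh motion -> dense initial
# flow -> plus residual flow -> backward warp. All sizes are assumptions.
import numpy as np
from scipy.ndimage import map_coordinates, zoom

def warp_with_flow(img, flow):
    """Backward-warp a grayscale image with a dense (H, W, 2) flow of
    (dy, dx) offsets."""
    H, W = img.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(np.float64)
    coords = [ys + flow[..., 0], xs + flow[..., 1]]
    return map_coordinates(img, coords, order=1, mode='nearest')

H, W = 240, 320
mesh_flow = np.random.randn(9, 9, 2) * 2.0          # coarse 9x9 mesh motion
init_flow = np.stack([zoom(mesh_flow[..., k], (H / 9, W / 9), order=1)
                      for k in range(2)], axis=-1)  # dense initial flow
residual = np.zeros((H, W, 2))                      # stand-in for predicted residual
out = warp_with_flow(np.random.rand(H, W), init_flow + residual)
```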

5.
Opt Express ; 31(5): 7900-7906, 2023 Feb 27.
Article in English | MEDLINE | ID: mdl-36859911

ABSTRACT

InGaAs/AlGaAs multiple quantum well lasers grown on silicon (001) by molecular beam epitaxy are demonstrated. By inserting InAlAs trapping layers into the AlGaAs cladding layers, misfit dislocations that would otherwise settle in the active region can be effectively transferred out of it. For comparison, the same laser structure without the InAlAs trapping layers was also grown. All as-grown materials were fabricated into Fabry-Perot lasers with the same cavity size of 20 × 1000 µm². The laser with trapping layers achieved a 2.7-fold reduction in threshold current density under pulsed operation (5 µs pulse width, 1% duty cycle) compared with its counterpart, and further realized room-temperature continuous-wave lasing with a threshold current of 537 mA, corresponding to a threshold current density of 2.7 kA/cm². At an injection current of 1000 mA, the single-facet maximum output power and slope efficiency were 45.3 mW and 0.143 W/A, respectively. This work demonstrates significantly improved performance of InGaAs/AlGaAs quantum well lasers monolithically grown on silicon and provides a feasible route to optimizing the InGaAs quantum well structure.
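The reported threshold figures are mutually consistent, as a quick check shows:

```python
# Consistency check: threshold current density = threshold current / cavity
# area, for the stated 20 um x 1000 um cavity.
I_th = 537e-3                      # threshold current, A
area_cm2 = (20e-4) * (1000e-4)     # 20 um x 1000 um converted to cm^2
J_th = I_th / area_cm2 / 1e3       # in kA/cm^2
print(f"{J_th:.2f} kA/cm^2")       # ~2.69, matching the reported 2.7 kA/cm^2
```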

6.
IEEE Trans Pattern Anal Mach Intell ; 45(3): 2849-2863, 2023 Mar.
Article in English | MEDLINE | ID: mdl-35536823

ABSTRACT

Homography estimation is a basic image alignment method used in many applications. It is usually done by extracting and matching sparse feature points, which are error-prone in low-light and low-texture images. On the other hand, previous deep homography approaches use either synthetic images for supervised learning or aerial images for unsupervised learning, both ignoring the importance of handling depth disparities and moving objects in real-world applications. To overcome these problems, we propose an unsupervised deep homography method with a new architecture design. In the spirit of the RANSAC procedure in traditional methods, we specifically learn an outlier mask to select only reliable regions for homography estimation. We calculate loss with respect to our learned deep features instead of directly comparing image content, as was done previously. To achieve unsupervised training, we also formulate a novel triplet loss customized for our network. We verify our method through comprehensive comparisons on a new dataset that covers a wide range of scenes with varying degrees of difficulty. Experimental results reveal that our method outperforms the state-of-the-art, including both deep solutions and feature-based solutions.
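A minimal sketch of the kind of masked feature-space triplet loss described above might look as follows; the feature extractor, the mask semantics, and the exact distance are assumptions, not the paper's precise formulation.

```python
# Hedged sketch: pull warped-image features toward the target while pushing
# the unwarped source away, counting only regions the mask deems reliable.
import torch

def masked_triplet_loss(f_warp, f_target, f_source, mask, margin=1.0):
    pos = (mask * (f_warp - f_target).abs()).mean()   # warped vs. target
    neg = (mask * (f_source - f_target).abs()).mean() # unwarped vs. target
    return torch.clamp(pos - neg + margin, min=0.0)

f = lambda: torch.randn(1, 64, 32, 32)   # stand-in deep feature maps
mask = torch.rand(1, 1, 32, 32)          # learned inlier/outlier weights
loss = masked_triplet_loss(f(), f(), f(), mask)
```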

7.
Behav Res Methods ; 55(5): 2353-2366, 2023 08.
Article in English | MEDLINE | ID: mdl-35931937

ABSTRACT

Human body movements are important for emotion recognition and social communication and have received extensive attention from researchers. In this field, emotional biological motion stimuli, as depicted by point-light displays, are widely used. However, the number of stimuli in existing material libraries is small, and standardized indicators are lacking, which limits experimental design and conduct. Therefore, based on our prior kinematic dataset, we constructed the Dalian Emotional Movement Open-source Set (DEMOS) using computational modeling. The DEMOS has three views (i.e., frontal 0°, left 45°, and left 90°) and comprises 2664 high-quality videos of emotional biological motion, each displaying one of happiness, sadness, anger, fear, disgust, or a neutral state. All stimuli were validated in terms of recognition accuracy, emotional intensity, and subjective movement. The objective movement for each expression was also calculated. The DEMOS can be downloaded for free from https://osf.io/83fst/ . To our knowledge, this is the largest multi-view emotional biological motion set based on the whole body. The DEMOS can be applied in many fields, including affective computing, social cognition, and psychiatry.


Subject(s)
Emotions , Happiness , Humans , Fear , Anger , Communication , Movement , Facial Expression
8.
IEEE Trans Pattern Anal Mach Intell ; 45(6): 7885-7899, 2023 Jun.
Article in English | MEDLINE | ID: mdl-36409814

ABSTRACT

In this paper, we introduce a new framework for unsupervised deep homography estimation. Our contributions are threefold. First, unlike previous methods that regress four offsets for a homography, we propose a homography flow representation, which can be estimated as a weighted sum of eight pre-defined homography flow bases. Second, considering that a homography has only eight degrees of freedom (DOFs), far fewer than the rank of the network features, we propose a Low Rank Representation (LRR) block that reduces the feature rank, so that features corresponding to the dominant motions are retained while others are rejected. Last, we propose a Feature Identity Loss (FIL) to enforce warp-equivariance of the learned image features, meaning that the result should be identical if the order of the warp operation and feature extraction is swapped. With this constraint, the unsupervised optimization becomes more effective and the learned features more stable. With global-to-local homography flow refinement, we also naturally generalize the proposed method to local mesh-grid homography estimation, which goes beyond the constraint of a single homography. Extensive experiments demonstrate the effectiveness of all the newly proposed components, and results show that our approach outperforms the state-of-the-art on the homography benchmark dataset both qualitatively and quantitatively. Code is available at https://github.com/megvii-research/BasesHomo.
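The core representation is easy to picture: the network predicts only eight scalar weights, and the dense homography flow is their weighted sum over eight pre-defined basis flows. A hedged sketch, with random stand-ins for the actual bases:

```python
# Sketch of the homography-flow idea: dense flow = weighted sum of 8 basis
# flows; only the 8 weights would come from the network. Basis construction
# here is a placeholder assumption, not the paper's derivation.
import torch

H, W = 64, 64
bases = torch.randn(8, H, W, 2)      # stand-in for the pre-defined flow bases
weights = torch.randn(8)             # would be predicted by the network
flow = torch.einsum('k,khwc->hwc', weights, bases)   # (H, W, 2) flow field
```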

9.
Brain Sci ; 12(12)2022 Dec 03.
Article in English | MEDLINE | ID: mdl-36552125

ABSTRACT

Although emotional expressions conveyed by the eye regions are processed efficiently, little is known regarding the relationship between emotional processing of isolated eye regions and temporal attention. In this study, we conducted three rapid serial visual presentation (RSVP) experiments with varying task demands (emotion discrimination, eye detection, eyes ignored) related to the first target (T1) to investigate how the perception of emotional valence in the eye region (T1: happy, neutral, fearful) impacts the identification of a second target (T2: neutral houses). Event-related potential (ERP) findings indicated that fearful stimuli reliably increased N170 amplitude regardless of the emotional relevance of task demands. The P3 component exhibited enhanced responses to happy and fearful stimuli in the emotion discrimination task and to happy eye regions in the eye detection task. Analysis of T2-related ERPs within the attentional blink period revealed that T2 houses preceded by fearful and happy stimuli elicited larger N2 and P3 amplitudes than those preceded by neutral stimuli only in the emotion discrimination task. Together, these findings indicate that attention to affective content conveyed by the eyes can not only amplify the perceptual analysis of emotional eye regions but also facilitate the processing of a subsequent target.

10.
Neuroimage ; 258: 119374, 2022 09.
Article in English | MEDLINE | ID: mdl-35700944

ABSTRACT

Humans can detect and recognize faces quickly, but there has been little research on the temporal dynamics with which different dimensions of face information are extracted. The present study aimed to investigate the time course of neural responses to the representation of different dimensions of face information, such as age, gender, emotion, and identity. We used support vector machine decoding to obtain representational dissimilarity matrices of event-related potential responses to different faces for each subject over time. In addition, we performed representational similarity analysis with model representational dissimilarity matrices that encoded the different dimensions of face information. Three significant findings were observed. First, the extraction of facial emotion began before that of facial identity and lasted longer, an effect specific to the right frontal region. Second, arousal was extracted before valence during the processing of facial emotional information. Third, different dimensions of face information exhibited representational stability during different periods. In conclusion, these findings reveal the precise temporal dynamics of multidimensional information processing in faces and provide strong support for computational models of emotional face perception.
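A minimal sketch of the representational similarity analysis step, assuming the neural RDM comes from pairwise decoding (e.g., 1 - decoding accuracy) and the model RDM encodes a single face dimension; this is the generic method, not the study's exact pipeline.

```python
# RSA sketch: correlate the upper triangles of a neural RDM and a model RDM
# (Spearman), as would be done per time point and per subject.
import numpy as np
from scipy.stats import spearmanr

def rsa_score(neural_rdm, model_rdm):
    iu = np.triu_indices_from(neural_rdm, k=1)   # unique condition pairs
    rho, _ = spearmanr(neural_rdm[iu], model_rdm[iu])
    return rho

n_cond = 12
neural_rdm = np.random.rand(n_cond, n_cond)      # dummy: 1 - decoding accuracy
neural_rdm = (neural_rdm + neural_rdm.T) / 2     # symmetrize
model_rdm = np.abs(np.subtract.outer(np.arange(n_cond), np.arange(n_cond)))
print(rsa_score(neural_rdm, model_rdm))
```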


Subject(s)
Facial Recognition , Arousal , Electroencephalography , Emotions , Evoked Potentials , Facial Expression , Facial Recognition/physiology , Humans
11.
IEEE Trans Image Process ; 31: 513-524, 2022.
Article in English | MEDLINE | ID: mdl-34874852

ABSTRACT

This paper proposes a solution based on Generative Adversarial Networks (GANs) for solving jigsaw puzzles. The problem assumes that an image is divided into equal square pieces and asks to recover the image from the information the pieces provide. Conventional jigsaw puzzle solvers often determine piece relationships from piece boundaries, ignoring important semantic information. In this paper, we propose JigsawGAN, a GAN-based auxiliary learning method for solving jigsaw puzzles with unpaired images (i.e., with no prior knowledge of the initial images). We design a multi-task pipeline that includes (1) a classification branch to classify jigsaw permutations and (2) a GAN branch to recover features into correctly ordered images. The classification branch is constrained by pseudo-labels generated according to the shuffled pieces. The GAN branch concentrates on the image semantic information: the generator produces natural images to fool the discriminator, while the discriminator distinguishes whether a given image belongs to the synthesized or the real target domain. The two branches are connected by a flow-based warp module that warps features into the correct order according to the classification results. The proposed method solves jigsaw puzzles more efficiently by utilizing semantic and boundary information simultaneously. Qualitative and quantitative comparisons against several representative jigsaw puzzle solvers demonstrate the superiority of our method.
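A hedged sketch of the pseudo-label setup: cut an image into a grid, shuffle the tiles with a known permutation, and use that permutation as the classification target. The grid size and details are assumptions, not JigsawGAN's exact configuration.

```python
# Sketch: build a (puzzle, permutation) training pair from one image; the
# permutation index serves as the pseudo-label for the classification branch.
import numpy as np

def make_puzzle(img, rng, grid=3):
    H, W = img.shape[:2]
    h, w = H // grid, W // grid
    tiles = [img[r*h:(r+1)*h, c*w:(c+1)*w]
             for r in range(grid) for c in range(grid)]
    perm = rng.permutation(grid * grid)          # known shuffle = pseudo-label
    rows = [np.hstack([tiles[p] for p in perm[r*grid:(r+1)*grid]])
            for r in range(grid)]
    return np.vstack(rows), perm

rng = np.random.default_rng(0)
puzzle, perm = make_puzzle(np.random.rand(96, 96, 3), rng)
```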

12.
Gastroenterol Res Pract ; 2021: 5682288, 2021.
Article in English | MEDLINE | ID: mdl-34868306

ABSTRACT

Ancylostomiasis is a fairly common small-bowel parasitic disease identified by capsule endoscopy (CE), for which no computer-aided clinical detection method has been established. We sought to develop an artificial intelligence system with a convolutional neural network (CNN) to automatically detect hookworms in CE images. We trained a deep CNN system based on a YOLO-V4 (You Only Look Once, version 4) detector using 11,236 CE images of hookworms. We assessed its performance by calculating the area under the receiver operating characteristic curve, sensitivity, specificity, and accuracy on an independent test set of 10,529 small-bowel images, including 531 images of hookworms. The trained CNN system required 403 seconds to evaluate the 10,529 test images. The area under the curve for the detection of hookworms was 0.972 (95% confidence interval (CI), 0.967-0.978). The sensitivity, specificity, and accuracy of the CNN system were 92.2%, 91.1%, and 91.2%, respectively, at a probability score cut-off of 0.485. In summary, we developed and validated a CNN-based system for detecting hookworms in CE images. By combining this high-accuracy, high-speed, and oversight-preventing system with other CNN systems, we hope it will become an important supplement for detecting intestinal abnormalities in CE images. This trial is registered with ChiCTR2000034546 (a clinical research study of artificial-intelligence-aided diagnosis of hookworms in the small intestine from capsule endoscope images).
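The reported evaluation protocol (AUC plus sensitivity/specificity at a fixed probability cut-off) can be reproduced generically in a few lines; the scores below are dummy stand-ins, not the study's data.

```python
# Sketch of the metric computation: AUC over probability scores, then
# sensitivity/specificity at the paper's 0.485 cut-off, on synthetic labels.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 1000)
y_score = np.clip(y_true * 0.4 + rng.random(1000) * 0.6, 0, 1)  # dummy scores

auc = roc_auc_score(y_true, y_score)
pred = (y_score >= 0.485).astype(int)
sens = (pred[y_true == 1] == 1).mean()   # true-positive rate
spec = (pred[y_true == 0] == 0).mean()   # true-negative rate
print(f"AUC={auc:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```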

13.
IEEE Trans Image Process ; 30: 8212-8221, 2021.
Article in English | MEDLINE | ID: mdl-34546922

ABSTRACT

In this paper, we present a new data-driven method for pixel-level scene text segmentation from a single natural image. Although scene text detection, i.e., producing a text region mask, has been well studied in the past decade, pixel-level text segmentation remains an open problem due to the lack of massive pixel-level labeled data for supervised training. To tackle this issue, we incorporate text region masks as auxiliary data for this task, considering that acquiring large-scale labeled text region masks is generally less expensive and time-consuming. Specifically, we propose a mutually guided network that produces a polygon-level mask in one branch and a pixel-level text mask in the other. The two branches' outputs serve as guidance for each other, and the whole network is trained via a semi-supervised learning strategy. Extensive experiments demonstrate the effectiveness of our mutually guided network, and experimental results show it outperforms the state-of-the-art in pixel-level scene text segmentation. We also demonstrate that the mask produced by our network can improve text recognition performance, in addition to enabling straightforward image editing applications.

14.
IEEE Trans Image Process ; 30: 6184-6197, 2021.
Article in English | MEDLINE | ID: mdl-34214040

ABSTRACT

Traditional feature-based image stitching technologies rely heavily on feature detection quality, often failing to stitch images with few features or low resolution. Learning-based image stitching solutions are rarely studied due to the lack of labeled data, making supervised methods unreliable. To address these limitations, we propose an unsupervised deep image stitching framework consisting of two stages: unsupervised coarse image alignment and unsupervised image reconstruction. In the first stage, we design an ablation-based loss to constrain an unsupervised homography network, which is more suitable for large-baseline scenes. Moreover, a transformer layer is introduced to warp the input images in the stitching-domain space. In the second stage, motivated by the insight that pixel-level misalignments can be eliminated to a certain extent at the feature level, we design an unsupervised image reconstruction network to eliminate the artifacts from features to pixels. Specifically, the reconstruction network is implemented with a low-resolution deformation branch and a high-resolution refinement branch, learning the deformation rules of image stitching and enhancing resolution simultaneously. To establish an evaluation benchmark and train the learning framework, a comprehensive real-world image dataset for unsupervised deep image stitching is presented and released. Extensive experiments demonstrate the superiority of our method over other state-of-the-art solutions. Even compared with supervised solutions, our image stitching quality is still preferred by users.

15.
IEEE Trans Image Process ; 30: 6420-6433, 2021.
Article in English | MEDLINE | ID: mdl-34232877

ABSTRACT

Occlusion is an inevitable and critical problem in unsupervised optical flow learning. Existing methods either treat occluded regions the same as non-occluded ones or simply remove them to avoid incorrect supervision. However, occluded regions can provide effective information for optical flow learning. In this paper, we present OIFlow, an occlusion-inpainting framework that makes full use of occluded regions. Specifically, a new appearance-flow network is proposed to inpaint occluded flows based on image content. Moreover, a boundary-dilated warp is proposed to handle occlusions caused by displacement beyond the image border. We conduct experiments on multiple leading optical flow benchmarks, including Flying Chairs, KITTI, and MPI-Sintel, demonstrating that our occlusion-handling framework significantly improves performance.
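One standard way to locate occlusions before inpainting them is a forward-backward consistency check, sketched below; OIFlow's exact criterion may differ, so treat this as an assumption.

```python
# Forward-backward occlusion sketch: pixels whose forward flow is not undone
# by the (warped) backward flow are marked occluded.
import numpy as np
from scipy.ndimage import map_coordinates

def occlusion_mask(flow_fw, flow_bw, thresh=1.5):
    H, W, _ = flow_fw.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(np.float64)
    coords = [ys + flow_fw[..., 0], xs + flow_fw[..., 1]]
    bw_warped = np.stack([map_coordinates(flow_bw[..., k], coords,
                                          order=1, mode='nearest')
                          for k in range(2)], axis=-1)
    err = np.linalg.norm(flow_fw + bw_warped, axis=-1)  # round-trip residual
    return err > thresh                                  # True where occluded

mask = occlusion_mask(np.zeros((40, 60, 2)), np.zeros((40, 60, 2)))
```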

16.
IEEE Trans Image Process ; 30: 6434-6445, 2021.
Article in English | MEDLINE | ID: mdl-34232880

ABSTRACT

The channel redundancy of convolutional neural networks (CNNs) results in large consumption of memory and computational resources. In this work, we design a novel Slim Convolution (SlimConv) module to boost the performance of CNNs by reducing channel redundancy. SlimConv consists of three main steps: Reconstruct, Transform, and Fuse. It reorganizes and fuses the learned features more efficiently, thereby compressing the model effectively. SlimConv is a plug-and-play architectural unit that can directly replace convolutional layers in CNNs. We validate its effectiveness through comprehensive experiments on leading benchmarks, including ImageNet, MS COCO2014, Pascal VOC2012 segmentation, and Pascal VOC2007 detection. The experiments show that SlimConv-equipped models consistently achieve better performance with lower memory and computational cost than their non-equipped counterparts. For example, ResNet-101 fitted with SlimConv achieves 77.84% top-1 classification accuracy on ImageNet with 4.87 GFLOPs and 27.96M parameters, nearly 0.5% higher accuracy while reducing computation by about 3 GFLOPs and parameters by 38%.
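As a picture of the plug-and-play idea, here is a heavily simplified PyTorch block in the Reconstruct/Transform/Fuse spirit that could stand in for a conv layer. It illustrates only the drop-in interface (same input and output channel count); it is not the published SlimConv.

```python
# Hedged sketch of a "slim" drop-in conv unit: re-weight channels
# (reconstruct), process two channel groups at different widths (transform),
# and merge back to the original channel count (fuse).
import torch
import torch.nn as nn

class SlimBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(channels, channels, 1),
                                  nn.Sigmoid())                 # reconstruct
        half = channels // 2
        self.t1 = nn.Conv2d(half, half, 3, padding=1)           # transform
        self.t2 = nn.Conv2d(half, half // 2, 3, padding=1)
        self.fuse = nn.Conv2d(half + half // 2, channels, 1)    # fuse

    def forward(self, x):
        a, b = torch.chunk(x * self.gate(x), 2, dim=1)  # split re-weighted maps
        b = torch.flip(b, dims=[1])                     # reverse channel order
        return self.fuse(torch.cat([self.t1(a), self.t2(b)], dim=1))

out = SlimBlock(64)(torch.randn(1, 64, 32, 32))         # drop-in shape check
```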

17.
IEEE Trans Neural Netw Learn Syst ; 32(10): 4362-4373, 2021 Oct.
Article in English | MEDLINE | ID: mdl-32941156

ABSTRACT

Visual question answering (VQA), which involves understanding an image and paired questions, has developed rapidly with the boost deep learning has given to related research fields such as natural language processing and computer vision. Existing work relies heavily on knowledge contained in the dataset; however, some questions require more professional cues beyond the dataset knowledge to be answered correctly. To address this issue, we propose a novel framework named the knowledge-based augmentation network (KAN) for VQA. We introduce object-related open-domain knowledge to assist question answering. Concretely, we extract more visual information from images and introduce a knowledge graph to provide the necessary common sense or experience for the reasoning process. For these two augmented inputs, we design an attention module that adjusts itself according to the specific question, so that the importance of external knowledge relative to detected objects can be balanced adaptively. Extensive experiments show that our KAN achieves state-of-the-art performance on three challenging VQA datasets, i.e., VQA v2, VQA-CP v2, and FVQA. In addition, our open-domain knowledge also benefits VQA baselines. Code is available at https://github.com/yyyanglz/KAN.
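A hedged sketch of question-conditioned balancing between detected-object features and external-knowledge features; the gating form, dimensions, and names are assumptions rather than KAN's actual module.

```python
# Sketch: attend over objects and over knowledge with the question as query,
# then adaptively gate between the two attended summaries.
import torch
import torch.nn.functional as F

q = torch.randn(1, 256)            # question embedding (assumed size)
obj = torch.randn(1, 36, 256)      # detected-object features
kn = torch.randn(1, 20, 256)       # retrieved knowledge embeddings

def attend(query, keys):
    scores = torch.einsum('bd,bnd->bn', query, keys) / keys.shape[-1] ** 0.5
    return torch.einsum('bn,bnd->bd', F.softmax(scores, dim=-1), keys)

obj_ctx, kn_ctx = attend(q, obj), attend(q, kn)
gate = torch.sigmoid((obj_ctx * kn_ctx).sum(-1, keepdim=True))  # scalar balance
fused = gate * obj_ctx + (1 - gate) * kn_ctx
```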

18.
IEEE Trans Image Process ; 30: 374-385, 2021.
Article in English | MEDLINE | ID: mdl-33186111

ABSTRACT

This paper proposes a solution for effectively handling salient regions in style transfer between unpaired datasets. Recently, Generative Adversarial Networks (GANs) have demonstrated their potential for translating images from a source domain X to a target domain Y in the absence of paired examples. However, such translation cannot guarantee high perceptual quality. Existing style transfer methods work well on relatively uniform content, but they often fail to capture geometric or structural patterns that typically belong to salient regions. Detail losses in structured regions and undesired artifacts in smooth regions are unavoidable even if each individual region is correctly transferred into the target style. In this paper, we propose SDP-GAN, a GAN-based network for solving these problems while generating pleasing style transfer results. We introduce a saliency network that is trained jointly with the generator. The saliency network has two functions: (1) providing constraints for the content loss to increase the penalty on salient regions, and (2) supplying saliency features to the generator to produce coherent results. Moreover, two novel losses are proposed to optimize the generator and saliency networks. The proposed method preserves details in important salient regions and improves overall image perceptual quality. Qualitative and quantitative comparisons against several leading prior methods demonstrate the superiority of our method.
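The first function of the saliency network suggests a loss of the following shape, in which reconstruction error is weighted up inside salient regions; this particular weighting form is an assumption, not SDP-GAN's exact loss.

```python
# Sketch of a saliency-weighted content loss: plain L1 elsewhere, up to
# (1 + lam) times the penalty where the saliency map is high.
import torch

def saliency_content_loss(pred, target, saliency, lam=2.0):
    weight = 1.0 + lam * saliency            # saliency assumed in [0, 1]
    return (weight * (pred - target).abs()).mean()

loss = saliency_content_loss(torch.rand(1, 3, 64, 64),
                             torch.rand(1, 3, 64, 64),
                             torch.rand(1, 1, 64, 64))
```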

19.
Article in English | MEDLINE | ID: mdl-32310770

ABSTRACT

Physically based rendering is widely used to generate photo-realistic images; it greatly impacts industry by providing appealing rendering, such as for entertainment and augmented reality, and academia by supplying large-scale, high-fidelity synthetic training data for data-hungry methods such as deep learning. However, physically based rendering relies heavily on ray tracing, which can be computationally expensive in complicated environments and hard to parallelize. In this paper, we propose an end-to-end deep-learning-based approach to generate physically based renderings efficiently. Our system consists of two stacked neural networks that together simulate the physical behavior of the rendering process and produce photo-realistic images. The first network, the shading network, predicts the optimal shading image from surface normals, depth, and illumination; the second network, the composition network, learns to combine the predicted shading image with the reflectance to generate the final result. Our approach is inspired by intrinsic image decomposition, making shading a physically reasonable intermediate supervision. Extensive experiments show that our approach is robust to noise thanks to a modified perceptual loss and even outperforms physically based rendering systems in complex scenes given a reasonable time budget.
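The intrinsic-decomposition view the pipeline rests on reduces to a per-pixel product, which a tiny sketch makes concrete (network details omitted; shapes assumed):

```python
# Intrinsic-image composition sketch: final image = reflectance (albedo)
# multiplied per pixel by shading, here with random stand-in maps.
import numpy as np

shading = np.random.rand(64, 64, 1)       # stand-in for the shading network output
reflectance = np.random.rand(64, 64, 3)   # material color, lighting-free
image = np.clip(reflectance * shading, 0.0, 1.0)
```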

20.
Sci Rep ; 10(1): 4103, 2020 03 05.
Article in English | MEDLINE | ID: mdl-32139758

ABSTRACT

Retention of a capsule endoscope (CE) in the stomach or the duodenal bulb during an examination is a troublesome problem that can force medical staff to spend several hours observing whether the CE has entered the descending segment of the duodenum (DSD). This paper investigated and evaluated a convolutional neural network (CNN) for automatic retention monitoring of the CE in the stomach or the duodenal bulb. A CNN system trained on 180,000 CE images of the DSD, stomach, and duodenal bulb was assessed for recognition accuracy by calculating the area under the receiver operating characteristic curve (ROC-AUC), sensitivity, and specificity. The AUC for distinguishing the DSD was 0.984. The sensitivity, specificity, positive predictive value, and negative predictive value of the CNN were 97.8%, 96.0%, 96.1%, and 97.8%, respectively, at a probability score cut-off of 0.42. The CNN-marked time of entry into the DSD deviated from the ground truth by less than ±8 min in 95.7% of cases (P < 0.01). These results indicate that a CNN for automatic retention monitoring of the CE in the stomach or duodenal bulb can serve as an efficient auxiliary measure in clinical practice.


Subject(s)
Capsule Endoscopes , Capsule Endoscopy/methods , Duodenum , Monitoring, Intraoperative/methods , Neural Networks, Computer , Stomach , Adolescent , Adult , Aged , Aged, 80 and over , Algorithms , Automation , Female , Foreign Bodies/prevention & control , Humans , Image Processing, Computer-Assisted , Male , Middle Aged , Retrospective Studies , Sensitivity and Specificity , Young Adult