Results 1 - 12 of 12
1.
Comput Biol Med ; 178: 108671, 2024 May 31.
Article in English | MEDLINE | ID: mdl-38870721

ABSTRACT

Medical image segmentation is a compelling fundamental problem and an important auxiliary tool for clinical applications. Recently, the Transformer has emerged as a valuable tool for addressing the limitations of convolutional neural networks (CNNs) by effectively capturing global relationships, and numerous hybrid architectures combining CNNs and Transformers have been devised to enhance segmentation performance. However, these architectures suffer from multilevel semantic feature gaps and fail to account for multilevel dependencies between space and channel. In this paper, we propose a hierarchical dependency Transformer for medical image segmentation, named HD-Former. First, we utilize a Compressed Bottleneck (CB) module to enrich shallow features and localize the target region. We then introduce the Dual Cross Attention Transformer (DCAT) module to fuse multilevel features and bridge the feature gap. In addition, we design the Broad Exploration Network (BEN), which cascades convolution and self-attention over different receptive fields to capture hierarchical, dense contextual semantic features both locally and globally. Finally, we exploit an uncertain multitask edge loss to adaptively map predictions to a consistent feature space, which optimizes segmentation edges. Extensive experiments on medical image segmentation with the ISIC, LiTS, Kvasir-SEG, and CVC-ClinicDB datasets demonstrate that HD-Former surpasses state-of-the-art methods in both subjective visual performance and objective evaluation. Code: https://github.com/barcelonacontrol/HD-Former.
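The core idea behind cross-attention fusion of multilevel features, as in the DCAT module, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation (see the linked repository for that); all shapes and names here are illustrative assumptions: tokens from a shallow level attend to tokens from a deeper level, so each shallow token aggregates deep semantic context.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values, d):
    """Attend from one feature level (queries) to another (keys/values)."""
    scores = queries @ keys_values.T / np.sqrt(d)   # (Nq, Nk) similarities
    weights = softmax(scores, axis=-1)              # each row sums to 1
    return weights @ keys_values                    # (Nq, d) fused tokens

rng = np.random.default_rng(0)
shallow = rng.standard_normal((16, 32))   # e.g. tokens from a shallow layer
deep = rng.standard_normal((64, 32))      # tokens from a deeper layer
fused = cross_attention(shallow, deep, 32)
```

A full transformer block would add learned query/key/value projections and a feed-forward layer on top of this attention core.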

2.
Front Neurorobot ; 18: 1383943, 2024.
Article in English | MEDLINE | ID: mdl-38817732

ABSTRACT

Introduction: Accurately counting the number of dense objects in an image, such as pedestrians or vehicles, is a challenging and practical task. Existing CNN-based density map regression methods are mainly used to count a single class of dense objects in a single scene. However, in complex traffic scenes, objects such as vehicles and pedestrians usually coexist, and multiple classes of dense objects need to be counted simultaneously. Methods: To address these issues, we propose a new feature-enhancement-based method for counting multiple types of dense objects, which enhances the features of the counted objects in complex traffic scenes to realize classification and regression counting of dense vehicles and people. The counting model consists of a regression subnet and a classification subnet. The regression subnet generates two-channel predicted density maps and mainly comprises an initial feature layer and a feature enhancement layer, where the feature enhancement layer strengthens both the classification features and the regression counting features of dense objects in complex traffic scenes. The classification subnet supervises the separation of dense vehicles and people into two feature channels to assist the regression counting task. Results: Our method is evaluated on the VisDrone+, ApolloScape+, and UAVDT+ datasets. The experimental results show that the method counts two kinds of dense objects simultaneously and outputs high-quality two-channel predicted density maps, with counting performance better than state-of-the-art counting networks for dense people and vehicles. Discussion: In future work, we will further improve the feature extraction ability of the model in complex traffic scenes to classify and count a wider variety of dense objects, such as cars, pedestrians, and non-motor vehicles.
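How a multi-channel density map yields per-class counts is a standard convention in this line of work and can be sketched briefly: each channel of the predicted map integrates (sums) to the count of one class. This is a generic illustration, not the paper's model; the toy map below is a hypothetical example.

```python
import numpy as np

def counts_from_density(density_map):
    """Per-class counts from a (C, H, W) density map: each channel
    sums to that class's estimated object count."""
    return density_map.reshape(density_map.shape[0], -1).sum(axis=1)

# toy two-channel map: channel 0 = vehicles, channel 1 = people
dm = np.zeros((2, 8, 8))
dm[0, 2, 3] = 1.0; dm[0, 5, 5] = 1.0   # two vehicle "blobs"
dm[1, 1, 1] = 1.0                      # one person
vehicles, people = counts_from_density(dm)
```

In practice each object is rendered as a Gaussian kernel rather than a single pixel, but the integral per channel still equals the count.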

3.
Comput Biol Med ; 174: 108374, 2024 May.
Article in English | MEDLINE | ID: mdl-38582003

ABSTRACT

Semi-supervised medical image segmentation aims to refine deep models with a small amount of labeled data and a large amount of unlabeled data. The efficiency of most semi-supervised methods based on voxel-level consistency learning is degraded by low-confidence voxels. In addition, voxel-level consistency learning fails to consider the spatial correlation between neighboring voxels. To encourage reliable voxel-level consistency learning, we propose a dual-teacher affine consistent uncertainty estimation method that filters out voxels with high uncertainty. Moreover, we design a spatially dependent mutual information module, which enhances the spatial dependence between neighboring voxels by maximizing the mutual information between the local voxel blocks predicted by the dual-teacher models and the student model, enabling consistency learning at the block level. On two benchmark medical image segmentation datasets, the Left Atrial Segmentation Challenge dataset and the BraTS-2019 dataset, our method achieves state-of-the-art performance in both quantitative and qualitative terms.
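The idea of filtering out high-uncertainty voxels with two teachers can be sketched in a few lines. This is a deliberately simplified proxy (teacher disagreement as uncertainty), not the paper's affine consistent uncertainty estimation; the threshold and probabilities below are hypothetical.

```python
import numpy as np

def reliable_mask(p_teacher1, p_teacher2, thresh=0.1):
    """Keep voxels where the two teachers' foreground probabilities agree;
    large disagreement serves as a simple uncertainty proxy, and those
    voxels are excluded from the consistency loss."""
    uncertainty = np.abs(p_teacher1 - p_teacher2)
    return uncertainty < thresh

p1 = np.array([0.90, 0.20, 0.55, 0.80])
p2 = np.array([0.85, 0.25, 0.20, 0.78])
mask = reliable_mask(p1, p2)
# the third voxel (0.55 vs 0.20) is filtered out as uncertain
```

The consistency loss would then be computed only over `mask`-selected voxels, so unreliable pseudo-labels do not pollute training.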


Subject(s)
Image Processing, Computer-Assisted , Humans , Image Processing, Computer-Assisted/methods , Neural Networks, Computer , Algorithms , Databases, Factual
4.
Sensors (Basel) ; 23(9)2023 Apr 27.
Article in English | MEDLINE | ID: mdl-37177519

ABSTRACT

The vehicle logo carries the vehicle's identity information, so vehicle logo detection (VLD) is of considerable practical significance. Although VLD has been studied for many years, the detection task remains difficult due to the small size of vehicle logos and background interference. To solve these problems, this paper proposes a VLD method based on the YOLO-T model and the correlation of the vehicle's spatial structure. To address the small size of vehicle logos, we propose a detection network called YOLO-T, which integrates multiple receptive fields and establishes a multi-scale detection structure suitable for VLD tasks; we also design an effective pre-training strategy to improve its detection accuracy. To address background interference, we use the positional correlation between the vehicle lights and the vehicle logo to extract the logo's region of interest, which both reduces the search area and weakens background interference. We have also labeled a new vehicle logo dataset named LOGO-17, which contains 17 categories of vehicle logos. Experimental results show that the proposed method achieves high detection accuracy and outperforms existing vehicle logo detection methods.

5.
Front Neurosci ; 17: 1153356, 2023.
Article in English | MEDLINE | ID: mdl-37077320

ABSTRACT

Medical image segmentation has long been a compelling and fundamental problem in neuroscience. It is an extremely challenging task due to the intense interference of irrelevant background information around the segmentation target. State-of-the-art methods fail to address long-range and short-range dependencies simultaneously, and commonly emphasize semantic characterization capability while ignoring the geometric detail implied in shallow feature maps, resulting in the loss of crucial features. To tackle these problems, we propose a Global-Local representation learning network for medical image segmentation, namely GL-Segnet. In the feature encoder, we utilize Multi-Scale Convolution (MSC) and Multi-Scale Pooling (MSP) modules to encode global semantic representation information at the shallow levels of the network, and multi-scale feature fusion operations enrich local geometric detail in a cross-level manner. Beyond that, we adopt a global semantic feature extraction module to filter out irrelevant background information. In the attention-enhancing decoder, we use an attention-based feature decoding module to refine the multi-scale fused features, which provides effective cues for attention decoding. We exploit the structural similarity between images and edge gradient information to propose a hybrid loss that improves segmentation accuracy. Extensive experiments on medical image segmentation with the GlaS, ISIC, Brain Tumors, and SIIM-ACR datasets demonstrate that GL-Segnet is superior to existing state-of-the-art methods in subjective visual performance and objective evaluation.
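The edge-gradient component of such a hybrid loss can be illustrated with a minimal finite-difference sketch. This is an assumption-laden stand-in, not GL-Segnet's actual loss: it simply penalizes the L1 distance between the spatial gradients of the predicted and ground-truth masks, so boundary misplacement is punished even when region overlap is high.

```python
import numpy as np

def gradient_l1(pred, target):
    """Edge term of a hybrid loss (illustrative): L1 distance between
    finite-difference gradients of prediction and ground truth."""
    gx_p, gy_p = np.gradient(pred)
    gx_t, gy_t = np.gradient(target)
    return np.abs(gx_p - gx_t).mean() + np.abs(gy_p - gy_t).mean()

target = np.zeros((16, 16)); target[4:12, 4:12] = 1.0  # square mask
perfect = target.copy()
shifted = np.roll(target, 2, axis=0)   # same shape, misplaced edges
```

A full hybrid loss would weight this edge term against a structural-similarity (SSIM) term and a region term such as Dice or cross-entropy.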

6.
IEEE Trans Biomed Eng ; 70(5): 1622-1633, 2023 05.
Article in English | MEDLINE | ID: mdl-36409812

ABSTRACT

OBJECTIVE: Autism spectrum disorder (ASD) affects nearly 1 in 44 children younger than 8 years old in the United States, and the situation may be even worse in remote areas of the world. However, it is difficult to utilize existing approaches to screen patients with ASD in remote areas due to the lack of professionals and high-tech instruments. Therefore, we develop a fast and accurate scalable method for screening children with ASD. METHODS: A deep weakly supervised artificial intelligence model is proposed for ASD screening based on the dynamic viewing patterns (DVP) over viewing time and visual stimuli. In training, we utilized a long short-term memory (LSTM) network to learn the mapping between the autoencoder-based encoded dynamic patterns and the labels. In testing, we fed the encoded DVP of each undiagnosed child into the trained network and predicted the diagnosis category based on the score on all stimuli. RESULTS: Based on the multi-center evaluation on 165 subjects (95 typically developing children and 70 children with ASD) aged 3-6 years from different areas of China, our method achieves an average recognition accuracy of 96.73% (sensitivity 96.85% and specificity 96.63%). CONCLUSION: The DVP is a discriminating attribute to identify the atypical performance of ASD. The DVP-based model is an effective platform for enhancing auxiliary ASD screening accuracy. SIGNIFICANCE: We validated the importance of dynamic information on between-group differences and classification. Additionally, the evaluation results suggest that the proposed model can provide an objective and accessible tool for scalable ASD screening applications.


Subject(s)
Autism Spectrum Disorder , Humans , Child , United States , Autism Spectrum Disorder/diagnosis , Artificial Intelligence , Learning , Recognition, Psychology
7.
Brain Sci ; 12(9)2022 Aug 27.
Article in English | MEDLINE | ID: mdl-36138880

ABSTRACT

Due to the complexity of medical imaging techniques and the high heterogeneity of glioma surfaces, segmentation of human gliomas is one of the most challenging tasks in medical image analysis. Current methods based on convolutional neural networks concentrate on feature extraction while ignoring the correlation between local and global features. In this paper, we propose a residual mix transformer fusion network, namely RMTF-Net, for brain tumor segmentation. In the feature encoder, a residual mix transformer encoder comprising a mix transformer and a residual convolutional neural network (RCNN) is proposed. The mix transformer provides an overlapping patch embedding mechanism to cope with the loss of patch boundary information. Moreover, a parallel fusion strategy based on the RCNN is utilized to obtain locally and globally balanced information. In the feature decoder, a global feature integration (GFI) module is applied, which enriches the context with global attention features. Extensive experiments on brain tumor segmentation with the LGG, BraTS2019, and BraTS2020 datasets demonstrate that RMTF-Net is superior to existing state-of-the-art methods in subjective visual performance and objective evaluation.
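Why overlapping patch embedding preserves boundary information is easy to see in a sketch: when the stride is smaller than the patch size, adjacent patches share border pixels, so no pixel transition falls only on a patch seam. The patch and stride values below are illustrative assumptions, not RMTF-Net's settings (and a real embedding would project patches with a strided convolution rather than ravel them).

```python
import numpy as np

def overlapping_patches(img, patch=7, stride=4):
    """Extract overlapping patches: stride < patch, so neighbouring
    patches share (patch - stride) border pixels per axis."""
    H, W = img.shape
    out = []
    for i in range(0, H - patch + 1, stride):
        for j in range(0, W - patch + 1, stride):
            out.append(img[i:i+patch, j:j+patch].ravel())
    return np.stack(out)   # (num_patches, patch*patch) token matrix

img = np.arange(15 * 15, dtype=float).reshape(15, 15)
tokens = overlapping_patches(img)   # 3 x 3 grid of 49-pixel patches
```

With non-overlapping patches (stride == patch), a structure straddling a patch border is split between tokens; the overlap here keeps it intact in at least one token.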

8.
Comput Intell Neurosci ; 2022: 6608448, 2022.
Article in English | MEDLINE | ID: mdl-35733557

ABSTRACT

Effective extraction and representation of action information are critical in action recognition. The majority of existing methods fail to recognize actions accurately because background changes interfere when the proportion of high-activity action areas is not reinforced, and because they rely on RGB frames alone or combined with optical flow. To solve these problems, a novel recognition method is proposed that uses action-sequence optimization and a two-stream fusion network with different modalities. The method is based on shot segmentation and dynamic weighted sampling; it reconstructs the video by reinforcing the proportion of high-activity action areas, eliminating redundant intervals, and extracting long-range temporal information. A two-stream 3D dilated neural network that integrates RGB and human skeleton features is also proposed. The skeleton information strengthens the deep representation of humans for robust processing and alleviates the interference of background changes, while the dilated CNN enlarges the receptive field of feature extraction. Compared with existing approaches, the proposed method achieves superior or comparable classification accuracy on the benchmark datasets UCF101 and HMDB51.
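How dilation enlarges the receptive field can be shown with a one-dimensional sketch (the paper's network is 3D, but the arithmetic is the same per axis): a kernel of size k with dilation d covers d*(k-1)+1 input samples while keeping k parameters. The signal and kernel below are hypothetical.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """1D dilated convolution, valid padding: with dilation d, a size-k
    kernel spans d*(k-1)+1 samples, enlarging temporal context without
    adding parameters."""
    k = len(kernel)
    span = dilation * (k - 1) + 1
    return np.array([
        sum(kernel[j] * x[i + j * dilation] for j in range(k))
        for i in range(len(x) - span + 1)
    ])

x = np.arange(10, dtype=float)
plain = dilated_conv1d(x, [1.0, 1.0, 1.0], dilation=1)    # spans 3 samples
dilated = dilated_conv1d(x, [1.0, 1.0, 1.0], dilation=3)  # spans 7 samples
```

Stacking layers with growing dilation (1, 2, 4, ...) yields an exponentially growing receptive field, which is what makes dilated CNNs attractive for long-range temporal modeling.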


Subject(s)
Pattern Recognition, Automated , Rivers , Humans , Neural Networks, Computer , Pattern Recognition, Automated/methods
9.
Sensors (Basel) ; 22(1)2021 Dec 31.
Article in English | MEDLINE | ID: mdl-35009827

ABSTRACT

An important area in a gathering place is a region that attracts people's constant attention and has evident visual features, such as a flexible stage or an open-air show. Finding such areas can help security supervisors locate abnormal regions automatically. Existing related methods lack an efficient means of finding important-area candidates in a scene and fail to judge whether a candidate attracts people's attention. To realize important-area detection, this study proposes a two-stage method with a novel multi-input attention network (MAN). The first stage, important area candidate generation, generates candidate important areas with an image-processing pipeline (K-means++, image dilation, median filtering, and the RLSA algorithm); the candidate areas are selected automatically for further analysis. The second stage, important area candidate classification, detects important areas among the candidates with the MAN. In particular, the MAN is designed as a multi-input network structure that fuses global and local image features to judge whether an area attracts people's attention. To enhance the representation of candidate areas, two modules (a channel attention module and a spatial attention module) are proposed on the basis of the attention mechanism; they rely mainly on multi-layer perceptrons and pooling operations to reconstruct the image features and provide a considerably more efficient representation. This study also contributes a new dataset, Gathering Place Important Area Detection, for testing the proposed two-stage method. Lastly, experimental results show that the proposed method performs well and correctly detects important areas.
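The "MLP plus pooling" channel attention pattern described above can be sketched in squeeze-and-excitation style. This is a generic illustration with made-up weight shapes, not the MAN's actual module: each channel is pooled to a scalar, a small MLP produces a per-channel gate in (0, 1), and the feature map is rescaled channel-wise.

```python
import numpy as np

def channel_attention(feat, w1, w2):
    """Channel attention sketch: global-average-pool each channel,
    pass through a bottleneck MLP, gate channels with a sigmoid."""
    pooled = feat.mean(axis=(1, 2))                 # (C,) channel statistics
    hidden = np.maximum(w1 @ pooled, 0.0)           # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))     # (C,) gates in (0, 1)
    return feat * gate[:, None, None]               # rescale each channel

rng = np.random.default_rng(1)
feat = rng.standard_normal((8, 16, 16))             # (C, H, W) feature map
w1 = rng.standard_normal((2, 8))                    # squeeze: 8 -> 2
w2 = rng.standard_normal((8, 2))                    # excite: 2 -> 8
out = channel_attention(feat, w1, w2)
```

A spatial attention module is the transposed idea: pool across channels to a single-channel map, then gate each spatial location.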


Subject(s)
Image Processing, Computer-Assisted , Neural Networks, Computer , Algorithms , Humans , Research Design
10.
Sensors (Basel) ; 19(20)2019 Oct 18.
Article in English | MEDLINE | ID: mdl-31635231

ABSTRACT

Vehicle Logo Recognition (VLR) is an important part of vehicle behavior analysis and can provide supplementary information for vehicle identification, an essential research topic in robotic systems. However, inaccurate extraction of vehicle logo candidate regions affects the accuracy of logo recognition, and existing methods have low recognition rates for most small vehicle logos and poor performance in complicated environments. A VLR method based on enhanced matching, constrained region extraction, and an SSFPD network is proposed in this paper to solve these problems. A constrained region extraction method based on segmentation of the car head and car tail is proposed to accurately extract the candidate region of the logo. An enhanced matching method is proposed to improve the detection of small objects; it augments each training image by copy-pasting small objects many times in the unconstrained region. A single deep neural network based on a reduced ResNeXt model and Feature Pyramid Networks, named the Single Shot Feature Pyramid Detector (SSFPD), is also proposed. The SSFPD uses the reduced ResNeXt to improve the classification performance of the network and retain more detailed information for small-sized vehicle logo detection, and it uses the Feature Pyramid Networks module to bring in more semantic context and build several high-level semantic feature maps, effectively improving recognition performance. Extensive evaluations have been made on self-collected and public vehicle logo datasets. The proposed method achieved 93.79% accuracy on the Common Vehicle Logos Dataset and 99.52% accuracy on another public dataset, outperforming existing methods.
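The copy-paste augmentation idea for small objects admits a very small sketch. This is a bare-bones illustration, not the paper's enhanced matching: the paste positions and patch below are hypothetical, and a real pipeline would also clone the corresponding bounding-box annotations and avoid occluding existing objects.

```python
import numpy as np

def copy_paste_augment(image, obj, positions):
    """Paste a small object patch at several free positions so the
    detector sees more small-object instances per training image."""
    out = image.copy()
    oh, ow = obj.shape[:2]
    for (y, x) in positions:
        out[y:y+oh, x:x+ow] = obj
    return out

img = np.zeros((32, 32), dtype=float)
logo = np.ones((4, 4), dtype=float)     # stand-in for a small logo patch
aug = copy_paste_augment(img, logo, [(2, 2), (10, 20), (25, 5)])
```

Repeating small instances this way counteracts the scarcity of small-object anchors that matchers see during training.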

11.
Sensors (Basel) ; 18(9)2018 Sep 16.
Article in English | MEDLINE | ID: mdl-30223598

ABSTRACT

Behavior analysis through posture recognition is an essential research topic in robotic systems. Maintaining an unhealthy sitting posture for a long time seriously harms human health and may even lead to lumbar disease, cervical disease, and myopia. Automatic vision-based detection of unhealthy sitting posture, as an example of posture detection in robotic systems, has become a hot research topic. However, existing methods focus only on extracting features of humans themselves, lack an understanding of the relevancies among objects in the scene, and hence fail to recognize some types of unhealthy sitting postures in complicated environments. To alleviate these problems, a scene recognition and semantic analysis approach to unhealthy sitting posture detection during screen reading is proposed in this paper. The key skeletal points of the human body are detected and tracked with a Microsoft Kinect sensor. Meanwhile, a deep learning method, Faster R-CNN, is used for scene recognition to accurately detect objects and extract relevant features. Our method then performs semantic analysis through Gaussian-mixture behavioral clustering for scene understanding. The relevant features of the scene and the skeletal features extracted from the human are fused into semantic features to discriminate various types of sitting postures. Experimental results demonstrate that our method accurately and effectively detects various types of unhealthy sitting postures during screen reading and avoids erroneous detections in complicated environments. Compared with existing methods, our method detects more types of unhealthy sitting postures, including types that existing methods cannot detect. It can potentially be applied and integrated as a medical assistance tool in robotic systems for health care and treatment.


Subject(s)
Posture , Reading , Robotics/methods , Screen Time , Semantics , Automation , Humans
12.
ScientificWorldJournal ; 2014: 364501, 2014.
Article in English | MEDLINE | ID: mdl-24757419

ABSTRACT

We propose two tampered-image detection methods based on the consistency of shadows. The first method, based on the texture consistency of shadows, targets the first kind of spliced image, in which a shadow as well as its main body is copied and pasted from another image. A suspicious region including shadow and non-shadow areas is first selected; texture features of the shadow region and the non-shadow region are then extracted; and finally a correlation function measures the similarity of the two texture features. By comparing the similarity, we can judge whether the image has been tampered with. Because this method fails on the second kind of spliced image, in which the main body, its shadow, and the surrounding regions are all copied and pasted from another image, a second method based on the light-source strength of shadows is proposed. Two suspicious shadow regions are first selected; an efficient method then estimates the light-source strength of each shadow; and finally the similarity of the two strengths is measured by a correlation function. By combining the two methods, we can detect forged images with shadows. Experimental results demonstrate that the proposed methods are effective despite using simplified models, compared with existing methods.
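The correlation-function similarity test at the heart of both methods can be sketched with the standard normalized correlation of two feature vectors. The feature vectors below are hypothetical stand-ins, not the paper's texture descriptors: a spliced shadow's texture features decorrelate from the surrounding region, while genuine features stay highly correlated even under brightness or contrast changes.

```python
import numpy as np

def normalized_correlation(f1, f2):
    """Normalized correlation of two feature vectors (cosine of the
    mean-centered vectors); values near 1 indicate consistent texture,
    low or negative values suggest splicing."""
    a = f1 - f1.mean()
    b = f2 - f2.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

shadow_feat = np.array([0.2, 0.5, 0.9, 0.4])
consistent = shadow_feat * 1.8 + 0.1      # same texture, rescaled lighting
foreign = np.array([0.9, 0.1, 0.2, 0.8])  # pasted from another image
```

Mean-centering and norm-dividing make the score invariant to additive brightness shifts and positive contrast scaling, which is why the genuine pair scores 1 despite the lighting change.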


Subject(s)
Image Processing, Computer-Assisted/methods , Image Enhancement/methods , Models, Theoretical