Results 1 - 6 of 6
1.
IEEE Trans Med Imaging; 42(2): 467-480, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36378797

ABSTRACT

Accurately delineating individual teeth and the gingiva in three-dimensional (3D) intraoral scanned (IOS) mesh data plays a pivotal role in many digital dental applications, e.g., orthodontics. Recent research shows that deep-learning-based methods can achieve promising results for 3D tooth segmentation; however, most of them rely on high-quality labeled datasets, which are usually small in scale because annotating IOS meshes requires intensive human effort. In this paper, we propose a novel self-supervised learning framework, named STSNet, to boost the performance of 3D tooth segmentation by leveraging large-scale unlabeled IOS data. The framework follows a two-stage training scheme, i.e., pre-training and fine-tuning. In pre-training, contrastive losses at three hierarchical levels, i.e., point level, region level, and cross level, are proposed for unsupervised representation learning on a set of predefined matched points from different augmented views. The pre-trained segmentation backbone is then fine-tuned in a supervised manner with a small number of labeled IOS meshes. With the same amount of annotated samples, our method achieves an mIoU of 89.88%, significantly outperforming its supervised counterparts. The performance gain becomes even more remarkable when only a small number of labeled samples is available. Furthermore, STSNet achieves better performance with only 40% of the annotated samples than the fully supervised baselines. To the best of our knowledge, this is the first attempt at unsupervised pre-training for 3D tooth segmentation, demonstrating its strong potential for reducing the human effort required for annotation and verification.
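As an illustration of the point-level contrastive idea mentioned in this abstract, the sketch below implements a standard InfoNCE loss over matched point features from two augmented views (PyTorch assumed); the function name, temperature value, and toy tensors are illustrative and not taken from the STSNet paper.

```python
# Minimal point-level contrastive (InfoNCE) loss over matched points from two
# augmented views. Assumes per-point feature vectors with known correspondences.
import torch
import torch.nn.functional as F

def point_infonce(feats_a, feats_b, temperature=0.07):
    """feats_a, feats_b: (N, C) features of N matched points from two views."""
    a = F.normalize(feats_a, dim=1)
    b = F.normalize(feats_b, dim=1)
    logits = a @ b.t() / temperature                     # (N, N) cosine similarities
    targets = torch.arange(a.size(0), device=a.device)   # matched pairs lie on the diagonal
    return F.cross_entropy(logits, targets)

# toy usage: 128 matched points with 64-dimensional embeddings
loss = point_infonce(torch.randn(128, 64), torch.randn(128, 64))
```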


Subjects
Prostheses and Implants, Surgical Mesh, Humans, Computer-Assisted Image Processing, Radionuclide Imaging, Supervised Machine Learning
2.
Comput Intell Neurosci; 2022: 1569911, 2022.
Article in English | MEDLINE | ID: mdl-36317074

ABSTRACT

Thanks to their high recognition rates and strong robustness, convolutional neural networks have become the mainstream method in the field of crop disease recognition. To address the problems of insufficient labeled samples, complex image backgrounds, and difficulty in extracting useful feature information, this study proposes a novel algorithm based on attention mechanisms and convolutional neural networks for cassava leaf disease recognition. Specifically, a combined data augmentation strategy is used to prevent the image dataset from having a single distribution, and PDRNet (plant disease recognition network), which combines channel and spatial attention mechanisms, is proposed. The algorithm is designed as follows. First, an attention module embedded in the network layers establishes long-range dependencies within each feature layer, strengthens key feature information, and suppresses interfering information such as background noise. Second, a stochastic depth strategy is formulated to accelerate the training and inference of the network. Finally, transfer learning is adopted to load pretrained weights into the proposed model, and the recognition accuracy is further enhanced through detailed parameter tuning and dynamic adjustment of the learning rate. Extensive comparative experiments demonstrate that the proposed algorithm delivers a recognition accuracy of 99.56% on the cassava disease image dataset, reaching the state of the art among CNN-based methods.
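A combined channel and spatial attention block of the kind described here could look roughly like the CBAM-style sketch below (PyTorch assumed); the module name, reduction ratio, and kernel size are illustrative assumptions, not the exact PDRNet design.

```python
# Channel attention (via pooled descriptors and a shared MLP) followed by
# spatial attention (via a 7x7 convolution over pooled channel maps).
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):                        # x: (B, C, H, W)
        b, c, _, _ = x.shape
        avg = x.mean(dim=(2, 3))                 # global average pooling -> (B, C)
        mx = x.amax(dim=(2, 3))                  # global max pooling -> (B, C)
        ca = torch.sigmoid(self.channel_mlp(avg) + self.channel_mlp(mx))
        x = x * ca.view(b, c, 1, 1)              # apply channel attention
        sa = torch.sigmoid(self.spatial_conv(
            torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        return x * sa                            # apply spatial attention

out = ChannelSpatialAttention(64)(torch.randn(2, 64, 56, 56))
```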


Subjects
Manihot, Neural Networks (Computer), Algorithms, Recognition (Psychology), Research Design
3.
Tomography; 8(2): 905-919, 2022 Mar 24.
Article in English | MEDLINE | ID: mdl-35448707

ABSTRACT

There is a growing demand for high-resolution (HR) medical images in both clinical and research applications. Image quality is inevitably traded off against acquisition time, which in turn affects patient comfort, examination cost, dose, and motion-induced artifacts. For many image-based tasks, increasing the apparent spatial resolution in the perpendicular (through-plane) direction to produce multi-planar reformats or 3D images is common practice. Single-image super-resolution (SR) based on deep learning is a promising technique for increasing the resolution of a 2D image, but there are few reports on 3D SR. Furthermore, perceptual loss has been proposed in the literature to capture textural details and edges better than pixel-wise loss functions, by comparing semantic distances in the high-dimensional feature space of a pre-trained 2D network (e.g., VGG); however, it is not clear how to generalize it to 3D medical images, and the implications of doing so are unclear. In this paper, we propose a framework called SOUP-GAN: Super-resolution Optimized Using Perceptual-tuned Generative Adversarial Network (GAN), which produces thinner slices (i.e., higher resolution along the 'Z' axis) with anti-aliasing and deblurring. The proposed method outperforms conventional resolution-enhancement methods and previous SR work on medical images in both qualitative and quantitative comparisons. Moreover, we examine the model's generalization to arbitrary user-selected SR ratios and imaging modalities. Our model shows promise as a novel 3D SR interpolation technique, with potential uses in both clinical and research settings.
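A minimal sketch of the perceptual-loss idea applied slice-wise, so that a pre-trained 2D VGG can score a 3D volume: the chosen VGG layers, slicing axis, and L1 feature distance are assumptions for illustration, not the exact SOUP-GAN formulation.

```python
# Slice-wise perceptual loss: flatten the depth axis into the batch, replicate
# the single channel to three, and compare VGG feature maps of SR vs. HR slices.
import torch
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights

vgg_features = vgg16(weights=VGG16_Weights.DEFAULT).features[:16].eval()  # downloads pretrained weights
for p in vgg_features.parameters():
    p.requires_grad_(False)

def perceptual_loss_3d(sr_vol, hr_vol):
    """sr_vol, hr_vol: (B, 1, D, H, W) volumes; features are compared slice by slice."""
    b, _, d, h, w = sr_vol.shape
    sr = sr_vol.permute(0, 2, 1, 3, 4).reshape(b * d, 1, h, w).repeat(1, 3, 1, 1)
    hr = hr_vol.permute(0, 2, 1, 3, 4).reshape(b * d, 1, h, w).repeat(1, 3, 1, 1)
    return F.l1_loss(vgg_features(sr), vgg_features(hr))

loss = perceptual_loss_3d(torch.rand(1, 1, 8, 64, 64), torch.rand(1, 1, 8, 64, 64))
```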


Subjects
Artifacts, Magnetic Resonance Imaging, Humans, Three-Dimensional Imaging, Motion (Physics)
4.
Med Phys; 49(6): 3692-3704, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35312077

ABSTRACT

PURPOSE: Automatic segmentation of medical lesions is a prerequisite for efficient clinical analysis. Segmentation algorithms for multimodal medical images have received much attention in recent years, and various strategies for multimodal combination (or fusion), such as probability theory, fuzzy models, belief functions, and deep neural networks, have been developed. In this paper, we propose the modality-weighted UNet (MW-UNet) and an attention-based fusion method to combine multimodal images for medical lesion segmentation. METHODS: MW-UNet is a multimodal fusion method based on UNet, but it uses shallower layers and fewer feature-map channels to reduce the number of network parameters, and it introduces a new multimodal fusion mechanism called fusion attention. The method combines feature maps in intermediate layers using a weighted-sum rule together with fusion attention. During training, all weight parameters are updated through backpropagation, like the other parameters of the network. We also incorporate residual blocks into MW-UNet to further improve segmentation performance. The agreement between the automatic multimodal lesion segmentations and the manual contours was quantified by (1) five metrics: Dice, 95% Hausdorff distance (HD95), volumetric overlap error (VOE), relative volume difference (RVD), and mean intersection over union (mIoU); and (2) the number of parameters and FLOPs, as measures of network complexity. RESULTS: The proposed method is evaluated on ZJCHD, a contrast-enhanced computed tomography (CECT) dataset for liver lesion segmentation collected at the Cancer Hospital of the University of Chinese Academy of Sciences (Zhejiang Cancer Hospital), Hangzhou, China. For accuracy evaluation, we use 120 patients with liver lesions from ZJCHD, of which 100 are used for fourfold cross-validation (CV) and 20 for a hold-out (HO) test. The mean Dice was 90.55 ± 14.44% and 89.31 ± 19.07% for the HO and CV tests, respectively. The corresponding HD95, VOE, RVD, and mIoU for the two tests are 1.95 ± 1.83 and 2.67 ± 3.35 mm, 13.11 ± 15.83 and 13.13 ± 18.52%, 12.20 ± 18.20 and 13.00 ± 21.82%, and 83.79 ± 15.83 and 82.35 ± 20.03%, respectively. Our method has 4.04 M parameters and 18.36 G FLOPs. CONCLUSIONS: The results show that our method performs well on multimodal liver lesion segmentation. It can easily be extended to other multimodal datasets and other networks for multimodal fusion, and it has the potential to provide doctors with multimodal annotations and to assist them in clinical diagnosis.
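The weighted-sum fusion with learnable, backpropagated weights could look roughly like the sketch below (PyTorch assumed); the softmax normalization and the module name are illustrative choices, not the paper's exact fusion-attention operator.

```python
# Learnable weighted-sum fusion of per-modality feature maps at an
# intermediate layer; the weights are ordinary parameters updated by backprop.
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    """Fuses a list of same-shaped feature maps, one per modality."""
    def __init__(self, num_modalities):
        super().__init__()
        self.weights = nn.Parameter(torch.zeros(num_modalities))  # learned during training

    def forward(self, feature_maps):             # list of (B, C, H, W) tensors
        w = torch.softmax(self.weights, dim=0)   # normalized modality weights
        return sum(wi * f for wi, f in zip(w, feature_maps))

fusion = WeightedFusion(num_modalities=2)
fused = fusion([torch.randn(1, 32, 64, 64), torch.randn(1, 32, 64, 64)])
```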


Subjects
Neural Networks (Computer), X-Ray Computed Tomography, Abdomen, Algorithms, Humans, Computer-Assisted Image Processing/methods, Liver, X-Ray Computed Tomography/methods
5.
Sensors (Basel); 21(3), 2021 Feb 02.
Article in English | MEDLINE | ID: mdl-33540831

ABSTRACT

Emotion recognition is of great importance for artificial intelligence, robotics, medicine, and other fields. Although many techniques have been developed for emotion recognition, with some success, they rely heavily on complicated and expensive equipment. Skin potential (SP) has long been known to correlate with human emotions, but it has been largely ignored due to a lack of systematic research. In this paper, we propose an emotion recognition method based on a single SP signal. First, we developed a portable wireless device to measure the SP signal between the middle finger and the left wrist. Then, a video-induction experiment was designed to elicit four typical emotions (happiness, sadness, anger, and fear) in 26 subjects. Using the device and the video-induction protocol, we obtained a dataset of 397 emotion samples. We extracted 29 features from each sample and used eight well-established algorithms to classify the four emotions based on these features. Experimental results show that the gradient-boosting decision tree (GBDT), logistic regression (LR), and random forest (RF) algorithms achieved the highest accuracy, 75%. This accuracy is similar to, or even better than, that of other methods that use multiple physiological signals. Our research demonstrates the feasibility of integrating the SP signal into existing physiological signals for emotion recognition.
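The classification stage described here, i.e., fitting standard classifiers to fixed-length feature vectors, could be sketched as follows with scikit-learn; the random placeholder features stand in for the paper's 397 samples × 29 SP features, and the cross-validation setup is an assumption.

```python
# Classify four emotion classes from hand-crafted feature vectors with three of
# the classifiers named in the abstract; the data here is random placeholder data.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(397, 29))        # 397 samples x 29 features (placeholder)
y = rng.integers(0, 4, size=397)      # four emotion classes

for name, clf in [("GBDT", GradientBoostingClassifier()),
                  ("LR", LogisticRegression(max_iter=1000)),
                  ("RF", RandomForestClassifier())]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: mean CV accuracy = {acc:.3f}")
```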


Subjects
Artificial Intelligence, Emotions, Skin, Algorithms, Humans, Logistic Models, Wireless Technology
6.
IEEE Trans Image Process; 27(9): 4529-4544, 2018 Sep.
Article in English | MEDLINE | ID: mdl-29993577

ABSTRACT

The past decade has witnessed the use of high-level features in saliency prediction for both videos and images. Unfortunately, existing saliency prediction methods handle only high-level static features, such as faces. In fact, high-level dynamic features (also called actions), such as speaking or head turning, are also extremely attractive to visual attention in videos. In this paper, we therefore propose a data-driven method for learning to predict the saliency of multiple-face videos by leveraging both static and dynamic high-level features. Specifically, we introduce an eye-tracking database containing the fixations of 39 subjects viewing 65 multiple-face videos. Through analysis of our database, we find a set of high-level features that cause a face to receive extensive visual attention. These include the static features of face size, center bias, and head pose, as well as the dynamic features of speaking and head turning. We then present techniques for extracting these high-level features. Afterwards, a novel model, the multiple hidden Markov model (M-HMM), is developed to enable the transition of saliency among faces. In our M-HMM, the saliency transition takes into account both the saliency states at previous frames and the observed high-level features at the current frame. Experimental results show that the proposed method is superior to other state-of-the-art methods in predicting visual attention on multiple-face videos. Finally, we shed light on a promising application of our saliency prediction method to locating regions of interest (ROIs) for video-conference compression with High Efficiency Video Coding (HEVC).
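As a rough illustration of the HMM-style saliency transition, the toy sketch below propagates a saliency distribution over faces from frame to frame, combining a transition prior with observation likelihoods derived from high-level features; it is only an illustration of the general hidden-Markov idea, not the paper's exact M-HMM formulation.

```python
# One forward step of an HMM over faces: predict with the transition matrix,
# weight by the observation likelihoods of the current frame, and renormalize.
import numpy as np

def update_saliency(prev_dist, transition, obs_likelihood):
    """prev_dist: (F,) saliency over F faces at the previous frame.
    transition: (F, F) row-stochastic matrix of attention moving between faces.
    obs_likelihood: (F,) likelihood of each face given current high-level
    features (e.g., speaking, head turning, face size)."""
    predicted = transition.T @ prev_dist        # forward (prediction) step
    posterior = predicted * obs_likelihood      # weight by observed features
    return posterior / posterior.sum()          # renormalize to a distribution

faces = 3
dist = np.full(faces, 1.0 / faces)
transition = np.full((faces, faces), 0.2) + 0.4 * np.eye(faces)
transition /= transition.sum(axis=1, keepdims=True)
dist = update_saliency(dist, transition, obs_likelihood=np.array([0.7, 0.2, 0.1]))
print(dist)
```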
