Search | BVS Regional Portal (test)

Enhancing surgical instrument segmentation: integrating vision transformer insights with adapter.

Wei, Meng; Shi, Miaojing; Vercauteren, Tom.

Int J Comput Assist Radiol Surg ; 19(7): 1313-1320, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38717737

ABSTRACT

PURPOSE: In surgical image segmentation, a major challenge is the extensive time and resources required to gather large-scale annotated datasets. Given the scarcity of annotated data in this field, our work aims to develop a model that achieves competitive performance when trained on limited datasets, while also enhancing model robustness across various surgical scenarios. METHODS: We propose a method that harnesses the strengths of pre-trained Vision Transformers (ViTs) and the data efficiency of convolutional neural networks (CNNs). Specifically, we demonstrate how a CNN segmentation model can be used as a lightweight adapter for a frozen ViT feature encoder. Our novel feature adapter uses cross-attention modules that merge the multiscale features derived from the CNN encoder with feature embeddings from the ViT, integrating the global insights of the ViT with the local information of the CNN. RESULTS: Extensive experiments demonstrate that our method outperforms current models in surgical instrument segmentation. Specifically, it achieves superior performance in binary segmentation on the Robust-MIS 2019 dataset, as well as in multiclass segmentation on the EndoVis 2017 and EndoVis 2018 datasets. It also demonstrates remarkable robustness in cross-dataset validation across these three datasets, along with the CholecSeg8k and AutoLaparo datasets. Ablation studies on these datasets confirm the efficacy of our novel adapter module. CONCLUSION: In this study, we presented a novel approach integrating ViT and CNN. Our feature adapter successfully combines the global insights of ViT with the local, multiscale spatial capabilities of CNN. This integration effectively overcomes data limitations in surgical instrument segmentation. The source code is available at: https://github.com/weimengmeng1999/AdapterSIS.git .

Subjects

Neural Networks, Computer , Humans , Surgical Instruments , Image Processing, Computer-Assisted/methods , Surgery, Computer-Assisted/methods
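The cross-attention fusion described in the abstract above can be illustrated with a minimal, single-head sketch in plain Python, in which each token of a flattened CNN feature map acts as a query over the frozen ViT patch embeddings. This is not the authors' AdapterSIS code: the projection matrices Wq, Wk and Wv, the use of a single head, and the absence of normalization layers are assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def cross_attention(cnn_tokens, vit_tokens, Wq, Wk, Wv):
    """Single-head cross-attention: CNN tokens query frozen ViT tokens.

    cnn_tokens: list of d_cnn-dim vectors (a flattened CNN feature map)
    vit_tokens: list of d_vit-dim vectors (frozen ViT patch embeddings)
    Wq: d x d_cnn; Wk, Wv: d x d_vit (hypothetical learned projections)
    Returns one d-dim fused vector per CNN token.
    """
    keys = [matvec(Wk, t) for t in vit_tokens]
    values = [matvec(Wv, t) for t in vit_tokens]
    d = len(Wq)
    fused = []
    for c in cnn_tokens:
        q = matvec(Wq, c)
        # Scaled dot-product score between this query and every ViT key.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Attention-weighted sum of the ViT values.
        fused.append([sum(w * v[i] for w, v in zip(weights, values)) for i in range(d)])
    return fused
```

In a practical adapter the fused output would typically be added back to the CNN features through a residual connection and computed with a tensor library; the explicit loops here only make the arithmetic visible.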

Boosting Zero-Shot Learning via Contrastive Optimization of Attribute Representations.

Du, Yu; Shi, Miaojing; Wei, Fangyun; Li, Guoqi.

IEEE Trans Neural Netw Learn Syst ; PP, 2023 Aug 01.

Article in English | MEDLINE | ID: mdl-37527321

ABSTRACT

Zero-shot learning (ZSL) aims to recognize classes that have no samples in the training set. One representative solution is to directly learn an embedding function that associates visual features with the corresponding class semantics in order to recognize new classes. Many methods build on this solution, and recent ones are especially keen on extracting rich features from images, e.g., attribute features. These attribute features are normally extracted within each individual image; however, the traits shared by features from different images that belong to the same attribute are not emphasized. In this article, we propose a new framework that boosts ZSL by explicitly learning attribute prototypes beyond images and contrastively optimizing them with attribute-level features within images. Besides the novel architecture, two elements are highlighted for attribute representations: a new prototype generation module (PM) is designed to generate attribute prototypes from attribute semantics, and a hard-example-based contrastive optimization scheme is introduced to reinforce attribute-level features in the embedding space. We explore two alternative backbones, CNN-based and transformer-based, to build our framework and conduct experiments on three standard benchmarks: Caltech-UCSD Birds-200-2011 (CUB), the SUN attribute database (SUN), and Animals with Attributes 2 (AwA2). Results on these benchmarks demonstrate that our method improves the state of the art by a considerable margin. Our code will be available at https://github.com/dyabel/CoAR-ZSL.git.
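A hard-example-based contrastive objective of the kind mentioned in the abstract above can be sketched as an InfoNCE-style loss that compares one attribute-level feature against all attribute prototypes but keeps only the most similar (hardest) negatives. The cosine similarity, the temperature tau, and the num_hard cutoff are illustrative assumptions, not the formulation used in CoAR-ZSL.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def hard_contrastive_loss(feature, prototypes, pos_idx, num_hard=2, tau=0.1):
    """InfoNCE-style loss restricted to the hardest negative prototypes.

    feature: attribute-level feature vector extracted from one image
    prototypes: list of attribute prototype vectors
    pos_idx: index of the prototype of the feature's own attribute
    num_hard: number of most-similar negatives to keep (hypothetical)
    tau: temperature (hypothetical value)
    """
    logits = [cosine(feature, p) / tau for p in prototypes]
    # Keep only the num_hard negatives most similar to the feature.
    negatives = sorted(
        (s for i, s in enumerate(logits) if i != pos_idx), reverse=True
    )[:num_hard]
    kept = [logits[pos_idx]] + negatives
    # Negative log-softmax of the positive among the kept logits (log-sum-exp trick).
    m = max(kept)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in kept))
    return log_sum_exp - logits[pos_idx]
```

Restricting the denominator to the hardest negatives concentrates the gradient on prototypes that are easily confused with the positive, which is the usual motivation for hard-example mining in contrastive learning.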

Facial Video-Based Remote Physiological Measurement via Self-Supervised Learning.

Yue, Zijie; Shi, Miaojing; Ding, Shuai.

IEEE Trans Pattern Anal Mach Intell ; 45(11): 13844-13859, 2023 Nov.

Article in English | MEDLINE | ID: mdl-37490386

ABSTRACT

Facial video-based remote physiological measurement aims to estimate remote photoplethysmography (rPPG) signals from human facial videos and then measure multiple vital signs (e.g., heart rate, respiration frequency) from the rPPG signals. Recent approaches achieve this by training deep neural networks, which normally require abundant facial videos and synchronously recorded photoplethysmography (PPG) signals for supervision. However, collecting such annotated corpora is difficult in practice. In this paper, we introduce a novel frequency-inspired self-supervised framework that learns to estimate rPPG signals from facial videos without the need for ground-truth PPG signals. Given a video sample, we first augment it into multiple positive/negative samples that contain similar/dissimilar signal frequencies to the original one. Specifically, positive samples are generated using spatial augmentation; negative samples are generated via a learnable frequency augmentation module, which performs a nonlinear signal frequency transformation on the input without excessively changing its visual appearance. Next, we introduce a local rPPG expert aggregation module to estimate rPPG signals from the augmented samples. It encodes complementary pulsation information from different face regions and aggregates it into one rPPG prediction. Finally, we propose a series of frequency-inspired losses, i.e., frequency contrastive loss, frequency ratio consistency loss, and cross-video frequency agreement loss, to optimize the rPPG signals estimated from multiple augmented video samples. We conduct rPPG-based heart rate, heart rate variability, and respiration frequency estimation on five standard benchmarks. The experimental results demonstrate that our method improves the state of the art by a large margin.
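Measuring heart rate from an estimated rPPG signal, the downstream step mentioned in the abstract above, is commonly done by locating the dominant frequency of the signal within a physiologically plausible band. The sketch below uses a naive discrete Fourier transform and an assumed band of 0.7 to 4 Hz (42 to 240 bpm); it is a generic post-processing example, not part of the self-supervised framework described in the paper.

```python
import math


def estimate_heart_rate(signal, fs, f_min=0.7, f_max=4.0):
    """Heart rate (beats per minute) from the dominant spectral peak of an rPPG signal.

    signal: list of rPPG samples; fs: sampling rate in Hz (e.g. the video frame rate)
    f_min, f_max: assumed plausible heart-rate band in Hz (42 to 240 bpm)
    """
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]  # remove the DC component
    best_f, best_power = None, -1.0
    for k in range(1, n // 2 + 1):
        f = k * fs / n  # centre frequency of DFT bin k
        if not f_min <= f <= f_max:
            continue
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        power = re * re + im * im
        if power > best_power:
            best_f, best_power = f, power
    if best_f is None:
        raise ValueError("no DFT bin falls inside the heart-rate band")
    return 60.0 * best_f
```

The frequency resolution is fs / n, so a 10 s window at 30 frames per second resolves heart rate only to 6 bpm; longer windows or spectral interpolation are the usual ways to refine this.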

Redesigning Multi-Scale Neural Network for Crowd Counting.

Du, Zhipeng; Shi, Miaojing; Deng, Jiankang; Zafeiriou, Stefanos.

IEEE Trans Image Process ; 32: 3664-3678, 2023.

Article in English | MEDLINE | ID: mdl-37384475

ABSTRACT

Perspective distortions and crowd variations make crowd counting a challenging task in computer vision. To tackle it, many previous works have used multi-scale architectures in deep neural networks (DNNs). Multi-scale branches can be either directly merged (e.g., by concatenation) or merged through the guidance of proxies (e.g., attention) in the DNNs. Despite their prevalence, these combination methods are not sophisticated enough to deal with the per-pixel performance discrepancy across multi-scale density maps. In this work, we redesign the multi-scale neural network by introducing a hierarchical mixture of density experts, which hierarchically merges multi-scale density maps for crowd counting. Within the hierarchical structure, an expert competition and collaboration scheme is presented to encourage contributions from all scales, and pixel-wise soft gating nets are introduced to provide pixel-wise soft weights for scale combinations in different hierarchies. The network is optimized using both the crowd density map and the local counting map, where the latter is obtained by local integration of the former. Optimizing both can be problematic because of their potential conflicts. We therefore introduce a new relative local counting loss based on relative count differences among hard-predicted local regions in an image, which proves to be complementary to the conventional absolute error loss on the density map. Experiments show that our method achieves state-of-the-art performance on five public datasets, i.e., ShanghaiTech, UCF_CC_50, JHU-CROWD++, NWPU-Crowd and Trancos. Our code will be available at https://github.com/ZPDu/Redesigning-Multi-Scale-Neural-Network-for-Crowd-Counting.
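The pixel-wise soft gating and the local counting map described in the abstract above can be sketched as follows: a per-pixel softmax over gate logits weights the density maps of each scale, and the local counting map is obtained by summing the merged density over windows. Non-overlapping k x k windows, a single hierarchy level, and externally supplied gate logits are assumptions for illustration; the paper's hierarchical mixture of experts and its loss functions are not reproduced.

```python
import math


def soft_gate_merge(density_maps, gate_logits):
    """Merge multi-scale density maps with per-pixel softmax weights.

    density_maps: list of S maps, each an H x W list of lists
    gate_logits: list of S maps of the same shape (e.g. output of a gating net)
    """
    S = len(density_maps)
    H, W = len(density_maps[0]), len(density_maps[0][0])
    merged = [[0.0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            logits = [gate_logits[s][y][x] for s in range(S)]
            m = max(logits)
            exps = [math.exp(v - m) for v in logits]
            z = sum(exps)
            # Soft weights for this pixel sum to 1 across scales.
            merged[y][x] = sum(e / z * density_maps[s][y][x] for s, e in enumerate(exps))
    return merged


def local_count_map(density, k):
    """Local integration: sum the density over non-overlapping k x k windows."""
    H, W = len(density), len(density[0])
    return [
        [
            sum(density[yy][xx]
                for yy in range(y, min(y + k, H))
                for xx in range(x, min(x + k, W)))
            for x in range(0, W, k)
        ]
        for y in range(0, H, k)
    ]
```

Because the gate weights sum to one at every pixel, the merged map stays on the same density scale as the individual experts, and the local counting map sums to the same total count as the density map it is derived from.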

A Selective Biogeography-Based Optimizer Considering Resource Allocation for Large-Scale Global Optimization.

Cui, Meiji; Li, Li; Shi, Miaojing.

Comput Intell Neurosci ; 2019: 1240162, 2019.

Article in English | MEDLINE | ID: mdl-31379932

ABSTRACT

Biogeography-based optimization (BBO), a recently proposed metaheuristic algorithm, has been successfully applied to many optimization problems due to its simplicity and efficiency. However, BBO suffers from the curse of dimensionality: its performance degrades rapidly as the dimensionality of the search space increases. In this paper, a selective migration operator is proposed to scale up the performance of BBO, and we name the resulting algorithm selective BBO (SBBO). The differential migration operator is selected heuristically to explore the global area as far as possible, whilst the normally distributed migration operator is chosen to exploit the local area. By means of heuristic selection, an appropriate migration operator can be used to search for the global optimum efficiently. Moreover, the strategy of cooperative coevolution (CC) is adopted to solve large-scale global optimization problems (LSOPs). To deal with the imbalanced contributions of subgroups to the whole solution in the context of CC, a more efficient computing resource allocation is proposed. Extensive experiments are conducted on the CEC 2010 benchmark suite for large-scale global optimization, and the results show the effectiveness and efficiency of SBBO compared with BBO variants and other representative algorithms for LSOPs. The results also confirm that the proposed computing resource allocation is vital to large-scale optimization within a limited computation budget.

Subjects

Algorithms , Computer Simulation , Computing Methodologies , Resource Allocation , Computer Simulation/economics , Heuristics , Normal Distribution , Problem Solving
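For readers unfamiliar with BBO, the migration step that SBBO modifies can be sketched with the classic linear migration model, in which better habitats (candidate solutions) have higher emigration rates mu and lower immigration rates lam. This is the textbook BBO operator under assumed maximum rates I = E = 1; the selective differential and normally distributed migration operators, cooperative coevolution, and resource allocation proposed in the paper are not shown.

```python
import random


def migration_rates(n, I=1.0, E=1.0):
    """Linear immigration (lam) and emigration (mu) rates, best habitat first."""
    mu = [E * (n - i) / n for i in range(n)]
    lam = [I * i / n for i in range(n)]
    return lam, mu


def migrate(population, rng):
    """One migration sweep of classic BBO.

    population: list of solution vectors sorted from best to worst fitness.
    Each coordinate of habitat i is replaced, with probability lam[i], by the
    same coordinate of a source habitat chosen by roulette wheel on mu.
    """
    n = len(population)
    lam, mu = migration_rates(n)
    new_pop = [list(h) for h in population]
    for i in range(n):
        for d in range(len(population[i])):
            if rng.random() < lam[i]:
                j = rng.choices(range(n), weights=mu)[0]
                new_pop[i][d] = population[j][d]
    return new_pop
```

Because the best habitat has an immigration rate of zero it is never overwritten, while poor habitats mostly import features from good ones; the paper's contribution replaces this plain copy with heuristically selected migration operators.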

Computerized tongue image segmentation via the double geo-vector flow.

Shi, Miao-Jing; Li, Guo-Zheng; Li, Fu-Feng; Xu, Chao.

Chin Med ; 9(1): 7, 2014 Feb 08.

Article in English | MEDLINE | ID: mdl-24507094

ABSTRACT

BACKGROUND: Visual inspection of the tongue is a diagnostic method in traditional Chinese medicine (TCM). Owing to variations in tongue features, such as color, texture, coating, and shape, it is difficult to precisely extract the tongue region in images. This study aims to quantitatively evaluate tongue diagnosis via automatic tongue segmentation. METHODS: Experiments were conducted using a clinical image dataset provided by the Laboratory of Traditional Medical Syndromes, Shanghai University of TCM. First, a clinical tongue image was refined by a saliency window. Second, we initialized the tongue area as an upper binary part and a lower level set matrix. Third, a double geo-vector flow (DGF) was proposed to detect the tongue edge and segment the tongue region in the image, such that the geodesic flow was evaluated in the lower part and the geo-gradient vector flow was evaluated in the upper part. RESULTS: The performance of the DGF was evaluated using 100 images. The DGF exhibited better results than other representative studies, with a true-positive volume fraction of 98.5%, a false-positive volume fraction of 1.51%, and a false-negative volume fraction of 1.42%. The errors between the automatic segmentation results and manual contours were 0.29 and 1.43% in terms of the standard boundary error metrics of Hausdorff distance and mean distance, respectively. CONCLUSIONS: By analyzing the time complexity of the DGF and evaluating its performance via standard boundary and area error metrics, we have shown the efficiency and effectiveness of the DGF for automatic tongue image segmentation.
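The area and boundary metrics reported in the abstract above can be computed as sketched below for binary masks stored as sets of pixel coordinates. Normalizing all three volume fractions by the ground-truth area is an assumption (definitions differ between papers; FPVF, for example, is sometimes normalized by the background area), and the Hausdorff distance here is computed in pixel units.

```python
import math


def volume_fractions(pred, truth):
    """Area-overlap metrics for binary masks given as sets of (row, col) pixels.

    All three fractions are normalized by the ground-truth area |truth|
    (an assumption; some papers normalize FPVF by the background area instead).
    """
    g = len(truth)
    tpvf = len(pred & truth) / g   # true-positive volume fraction
    fpvf = len(pred - truth) / g   # false-positive volume fraction
    fnvf = len(truth - pred) / g   # false-negative volume fraction
    return tpvf, fpvf, fnvf


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sets (e.g. contours)."""
    def directed(p, q):
        # Largest distance from a point of p to its nearest point in q.
        return max(min(math.dist(x, y) for y in q) for x in p)
    return max(directed(a, b), directed(b, a))
```

The brute-force Hausdorff computation is quadratic in the number of contour points, which is acceptable for single tongue contours but would normally be replaced by a distance transform for large masks.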

W-tree indexing for fast visual word generation.

Shi, Miaojing; Xu, Ruixin; Tao, Dacheng; Xu, Chao.

IEEE Trans Image Process ; 22(3): 1209-22, 2013 Mar.

Article in English | MEDLINE | ID: mdl-23192558

ABSTRACT

The bag-of-visual-words representation has been widely used in image retrieval and visual recognition. The most time-consuming step in obtaining this representation is visual word generation, i.e., assigning visual words to the corresponding local features in a high-dimensional space. Recently, structures based on multibranch trees and forests have been adopted to reduce the time cost. However, these approaches cannot perform well without a large number of backtrackings. In this paper, by considering the spatial correlation of local features, we significantly speed up the time-consuming visual word generation process while maintaining accuracy. In particular, visual words associated with certain structures frequently co-occur; hence, we can build a co-occurrence table for each visual word for a large-scale data set. By associating each visual word with a probability according to the corresponding co-occurrence table, we can assign a probabilistic weight to each node of a given index structure (e.g., a KD-tree or a K-means tree) in order to redirect the search path close to its global optimum within a small number of backtrackings. We carefully study the proposed scheme by comparing it with the Fast Library for Approximate Nearest Neighbors and the random KD-trees on the Oxford data set. Thorough experimental results suggest the efficiency and effectiveness of the new scheme.

Subjects

Algorithms , Documentation/methods , Image Interpretation, Computer-Assisted/methods , Natural Language Processing , Pattern Recognition, Automated/methods , Subtraction Technique , Symbolism , Artificial Intelligence , Image Enhancement/methods , Reproducibility of Results , Semantics , Sensitivity and Specificity
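The co-occurrence idea in the abstract above can be sketched as a table of conditional probabilities P(w | w') counted over spatially neighboring features, which then yields a prior score for each candidate word given the words already assigned nearby. The fixed neighborhood radius, the additive scoring rule in word_prior, and the toy feature format (word_id, x, y) are hypothetical; how the paper turns these probabilities into node weights for a KD-tree or K-means tree search is not reproduced.

```python
import math
from collections import defaultdict


def build_cooccurrence(images, radius):
    """Count how often two visual words appear within `radius` of each other.

    images: list of images, each a list of (word_id, x, y) local features
    Returns cond[w_prev][w] = P(w | a neighboring feature has word w_prev).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for feats in images:
        for i, (wi, xi, yi) in enumerate(feats):
            for j, (wj, xj, yj) in enumerate(feats):
                if i != j and math.hypot(xi - xj, yi - yj) <= radius:
                    counts[wi][wj] += 1
    cond = {}
    for w_prev, row in counts.items():
        total = sum(row.values())
        cond[w_prev] = {w: c / total for w, c in row.items()}
    return cond


def word_prior(cond, neighbor_words, vocab):
    """Hypothetical prior: sum P(w | neighbor) over already-assigned neighbors."""
    return {w: sum(cond.get(n, {}).get(w, 0.0) for n in neighbor_words) for w in vocab}
```

A prior of this form could be used to rank or re-weight the branches visited during tree search so that likely words are examined first, which is the general intuition behind reducing the number of backtrackings.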
