1.
Am J Pathol; 194(5): 721-734, 2024 May.
Article in English | MEDLINE | ID: mdl-38320631

ABSTRACT

Histopathology is the reference standard for pathology diagnosis, and has evolved with the digitization of glass slides [ie, whole slide images (WSIs)]. While trained histopathologists are able to diagnose diseases by examining WSIs visually, this process is time-consuming and prone to variability. To address these issues, artificial intelligence models are being developed to generate slide-level representations of WSIs, summarizing the entire slide as a single vector. This enables various computational pathology applications, including interslide search, multimodal training, and slide-level classification. Achieving expressive and robust slide-level representations hinges on the patch feature extraction and aggregation steps. This study proposed an additional binary patch grouping (BPG) step, a plugin that can be integrated into various slide-level representation pipelines, to enhance the quality of slide-level representation in bone marrow histopathology. BPG excludes patches with less clinical relevance through minimal interaction with the pathologist: a one-time human intervention for the entire process. This study further investigated domain-general versus domain-specific feature extraction models based on convolution and attention, and examined two different feature aggregation methods, with and without BPG, showing BPG's generalizability. The results showed that using BPG boosts the performance of WSI retrieval (mean average precision at 10) by 4% and improves WSI classification (weighted-F1) by 5% compared to not using BPG. Additionally, domain-general large models and parameterized pooling produced the best-quality slide-level representations.
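
The abstract does not spell out how BPG decides which patches to keep, so the following Python sketch is purely illustrative: it assumes the pathologist's one-time intervention amounts to marking a few reference patches, and that a patch is kept when its cosine similarity to any reference exceeds a threshold. The function names (bpg_filter, slide_representation), the threshold value, and the mean-pooling aggregation are hypothetical choices, not the paper's method.

# Hypothetical sketch of a binary patch grouping (BPG) filter, assuming the
# grouping criterion is cosine similarity to a small set of patches the
# pathologist marks as clinically relevant in a one-time interaction.
import numpy as np

def bpg_filter(patch_features: np.ndarray,
               reference_features: np.ndarray,
               threshold: float = 0.5) -> np.ndarray:
    """Return a boolean mask keeping patches similar to the references."""
    # L2-normalize so dot products become cosine similarities.
    p = patch_features / np.linalg.norm(patch_features, axis=1, keepdims=True)
    r = reference_features / np.linalg.norm(reference_features, axis=1, keepdims=True)
    # Score each patch by its best match among the reference patches.
    scores = (p @ r.T).max(axis=1)
    return scores >= threshold

def slide_representation(patch_features: np.ndarray, keep: np.ndarray) -> np.ndarray:
    """Aggregate the retained patch features into one slide-level vector.
    Mean pooling is shown; the study also evaluates parameterized pooling."""
    return patch_features[keep].mean(axis=0)

# Usage: 1,000 patch embeddings of dimension 768, 5 pathologist-chosen references.
patches = np.random.randn(1000, 768)
references = np.random.randn(5, 768)
mask = bpg_filter(patches, references, threshold=0.3)
slide_vec = slide_representation(patches, mask)   # shape: (768,)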


Subjects
Artificial Intelligence, Bone Marrow, Humans, Dietary Supplements, Pathologists
2.
Article in English | MEDLINE | ID: mdl-35857731

ABSTRACT

Convolutional neural networks (CNNs) have come to dominate vision-based deep neural network structures in both image and video models over the past decade. However, convolution-free vision Transformers (ViTs) have recently outperformed CNN-based models in image recognition. Despite this progress, building and designing video Transformers has not yet received the same research attention as image-based Transformers. While there have been attempts to build video Transformers by adapting image-based Transformers for video understanding, these Transformers still lack efficiency because of the large gap between CNN-based models and Transformers in parameter count and training settings. In this work, we propose three techniques to improve video understanding with video Transformers. First, to derive better spatiotemporal feature representations, we propose a new spatiotemporal attention scheme, termed synchronized spatiotemporal and spatial attention (SSTSA), which derives spatiotemporal features with temporal and spatial multiheaded self-attention (MSA) modules. It also preserves spatial attention through another spatial self-attention module running in parallel, resulting in an effective Transformer encoder. Second, a motion spotlighting module is proposed to embed the short-term motion of the consecutive input frames into the regular RGB input, which is then processed with a single-stream video Transformer. Third, a simple intraclass frame-interlacing method for the input clips is proposed, which serves as an effective video augmentation technique. Finally, our proposed techniques have been evaluated and validated with a set of extensive experiments in this study. Our video Transformer outperforms its previous counterparts on two well-known datasets, Kinetics400 and Something-Something-v2.
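
The abstract describes SSTSA only at a high level, so the PyTorch sketch below is one possible reading: temporal MSA followed by spatial MSA on the same tokens, with a parallel spatial-only self-attention branch, merged through a residual connection. The class name SSTSABlock, the token layout, and the way the branches are combined are assumptions, not the authors' implementation.

# Rough sketch of a synchronized spatiotemporal and spatial attention (SSTSA)
# block, assuming tokens are laid out as (batch, frames, patches, dim).
# The exact module ordering, normalization, and residual structure are not
# given in the abstract; this is an illustrative reading only.
import torch
import torch.nn as nn

class SSTSABlock(nn.Module):
    def __init__(self, dim: int = 768, heads: int = 12):
        super().__init__()
        self.temporal_msa = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.spatial_msa = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Parallel branch that preserves purely spatial attention.
        self.parallel_spatial = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, p, d = x.shape

        # Temporal attention: each spatial location attends across frames.
        xt = x.permute(0, 2, 1, 3).reshape(b * p, t, d)
        xt, _ = self.temporal_msa(xt, xt, xt)
        xt = xt.reshape(b, p, t, d).permute(0, 2, 1, 3)

        # Spatial attention on the temporally mixed tokens.
        xs = xt.reshape(b * t, p, d)
        xs, _ = self.spatial_msa(xs, xs, xs)
        spatiotemporal = xs.reshape(b, t, p, d)

        # Parallel spatial-only attention on the original tokens.
        xp = x.reshape(b * t, p, d)
        xp, _ = self.parallel_spatial(xp, xp, xp)
        spatial_only = xp.reshape(b, t, p, d)

        # Combine both paths with a residual connection (one plausible choice).
        return self.norm(x + spatiotemporal + spatial_only)

# Usage: batch of 2 clips, 8 frames, 196 patch tokens, 768-dim embeddings.
tokens = torch.randn(2, 8, 196, 768)
out = SSTSABlock()(tokens)   # shape: (2, 8, 196, 768)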
