Results 1 - 2 of 2
1.
Comput Methods Programs Biomed; 165: 13-23, 2018 Oct.
Article in English | MEDLINE | ID: mdl-30337068

ABSTRACT

BACKGROUND AND OBJECTIVE: Laparoscopic surgery offers the potential for video recording of the operation, which is important for technique evaluation, cognitive training, patient briefing and documentation. An effective way to represent video content is to extract a limited number of keyframes with semantic information. In this paper we present a novel method for keyframe extraction from individual shots of the operative video.

METHODS: The laparoscopic video was first segmented into video shots using an objectness model, which was trained to capture significant changes in the endoscope field of view. Each frame of a shot was then decomposed into three saliency maps in order to model the preference of human vision for regions with higher differentiation with respect to color, motion and texture. The accumulated responses from each map provided a 3D time series of saliency variation across the shot. The time series was modeled as a multivariate autoregressive process with hidden Markov states (HMMAR model). This approach allowed the temporal segmentation of the shot into a predefined number of states. A representative keyframe was extracted from each state based on the highest state-conditional probability of the corresponding saliency vector.

RESULTS: Our method was tested on 168 video shots extracted from various laparoscopic cholecystectomy operations in the publicly available Cholec80 dataset. Four state-of-the-art methodologies were used for comparison. The evaluation was based on two assessment metrics: the Color Consistency Score (CCS), which measures the color distance between the ground truth (GT) and the closest keyframe, and the Temporal Consistency Score (TCS), which considers the temporal proximity between the GT and the extracted keyframes. About 81% of the extracted keyframes matched the color content of the GT keyframes, compared to 77% for the second-best method. The TCS of the proposed and the second-best methods was close to 1.9 and 1.4, respectively.

CONCLUSIONS: Our results demonstrated that the proposed method yields superior performance in terms of content and temporal consistency with the ground truth. The extracted keyframes provide rich semantic information that may be used for various applications related to surgical video content representation, such as workflow analysis, video summarization and retrieval.
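A minimal sketch of the shot-level pipeline described above, for illustration only: per-frame color, motion and texture responses are accumulated into a 3-D time series, the series is segmented temporally with a hidden Markov model, and one keyframe is picked per state. The three responses here are crude proxies for the paper's saliency maps, hmmlearn's GaussianHMM stands in for the HMMAR model, and the state-posterior criterion approximates the state-conditional probability used in the paper; function names and parameters are illustrative assumptions, not the authors' implementation.

```python
# Sketch only: crude color/motion/texture responses -> 3-D time series ->
# Gaussian HMM temporal segmentation (stand-in for HMMAR) -> one keyframe per state.
import cv2
import numpy as np
from hmmlearn.hmm import GaussianHMM


def saliency_series(frames):
    """Return an (n_frames, 3) array of accumulated color/motion/texture responses."""
    series, prev_gray = [], None
    for frame in frames:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Color response: mean distance of each pixel from the average frame color.
        color = np.linalg.norm(frame.astype(np.float32) - frame.reshape(-1, 3).mean(0), axis=2).mean()
        # Motion response: mean absolute difference from the previous frame.
        motion = 0.0 if prev_gray is None else cv2.absdiff(gray, prev_gray).mean()
        # Texture response: mean Laplacian magnitude (local contrast).
        texture = np.abs(cv2.Laplacian(gray, cv2.CV_32F)).mean()
        series.append([color, motion, texture])
        prev_gray = gray
    return np.asarray(series, dtype=np.float64)


def extract_keyframes(frames, n_states=3, seed=0):
    """Segment a shot into n_states temporal states and pick one keyframe per state."""
    X = saliency_series(frames)
    hmm = GaussianHMM(n_components=n_states, covariance_type="diag",
                      n_iter=100, random_state=seed)
    hmm.fit(X)
    states = hmm.predict(X)            # most likely state per frame
    posteriors = hmm.predict_proba(X)  # state-membership probability per frame
    keyframes = []
    for s in range(n_states):
        idx = np.where(states == s)[0]
        if idx.size:
            # Frame with the highest posterior for its own state
            # (a proxy for the paper's state-conditional probability).
            keyframes.append(int(idx[np.argmax(posteriors[idx, s])]))
    return sorted(keyframes)
```

A shot would be passed in as a list of decoded BGR frames; as in the paper, the number of states is predefined per shot rather than estimated from the data.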


Subjects
Image Interpretation, Computer-Assisted/methods , Laparoscopy/methods , Video Recording/methods , Algorithms , Artificial Intelligence , Cholecystectomy, Laparoscopic/methods , Cholecystectomy, Laparoscopic/statistics & numerical data , Color , Databases, Factual , Humans , Laparoscopy/statistics & numerical data , Markov Chains , Motion , Pattern Recognition, Automated/methods , Video Recording/statistics & numerical data
2.
Int J Neural Syst; 17(4): 289-304, 2007 Aug.
Article in English | MEDLINE | ID: mdl-17696293

ABSTRACT

In this paper we propose a novel saliency-based computational model for visual attention. The model processes both top-down (goal-directed) and bottom-up information. Processing in the top-down channel creates the so-called skin conspicuity map and emulates the human visual search for faces. This is clearly a goal-directed task, but it is generic enough to be context-independent. Processing in the bottom-up channel follows the principles set by Itti et al., but deviates from them by computing the orientation, intensity and color conspicuity maps within a unified multi-resolution framework based on wavelet subband analysis. In particular, we apply a wavelet-based approach for efficient computation of the topographic feature maps. Given that wavelets and multiresolution theory are naturally connected, using wavelet decomposition to mimic the center-surround process in human vision is a natural choice. Our implementation goes further, however: we use the wavelet decomposition for inline computation of the features (such as orientation angles) that create the topographic feature maps. The bottom-up topographic feature maps and the top-down skin conspicuity map are then combined through a sigmoid function to produce the final saliency map. A prototype of the proposed model was realized on the TMDSDMK642-0E DSP platform as an embedded system, allowing real-time operation. For evaluation, in terms of perceived visual quality and compression improvement, an ROI-based video compression setup was used. Extended experiments with both MPEG-1 and low bit-rate MPEG-4 encoding were conducted, showing a significant improvement in video compression efficiency without perceived deterioration in visual quality.
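A minimal sketch of the map-combination idea, under loose assumptions: detail-subband energies from a wavelet decomposition (PyWavelets) stand in for the orientation, intensity and color conspicuity maps, a simple YCrCb threshold stands in for the learned skin conspicuity map, and the two are fused with a sigmoid. The skin-color bounds, the equal weighting and the sigmoid gain/offset are illustrative choices, not values from the paper.

```python
# Sketch only: wavelet-energy "bottom-up" map + crude skin "top-down" map,
# fused with a sigmoid in the spirit of the model's final combination stage.
import cv2
import numpy as np
import pywt


def bottom_up_map(gray, wavelet="db2", levels=3):
    """Sum normalized detail-subband energies across scales (center-surround proxy)."""
    h, w = gray.shape
    coeffs = pywt.wavedec2(gray.astype(np.float32), wavelet, level=levels)
    acc = np.zeros((h, w), dtype=np.float32)
    for detail in coeffs[1:]:                     # (cH, cV, cD) per decomposition level
        for band in detail:
            energy = cv2.resize(np.abs(band).astype(np.float32), (w, h),
                                interpolation=cv2.INTER_LINEAR)
            acc += energy / (energy.max() + 1e-6)
    return acc / (acc.max() + 1e-6)


def skin_map(bgr):
    """Top-down channel: a simple YCrCb skin mask blurred into a soft map."""
    ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)
    mask = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))  # heuristic skin bounds
    return cv2.GaussianBlur(mask.astype(np.float32) / 255.0, (31, 31), 0)


def saliency(bgr, gain=8.0, offset=0.5):
    """Fuse bottom-up and top-down maps with a sigmoid into a final saliency map."""
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    combined = 0.5 * bottom_up_map(gray) + 0.5 * skin_map(bgr)
    return 1.0 / (1.0 + np.exp(-gain * (combined - offset)))
```

The published model computes separate orientation, intensity and color conspicuity maps from the wavelet subbands and runs on a DSP for real-time ROI-based compression; this sketch only illustrates how the bottom-up and top-down maps can be fused into a single saliency map.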


Subjects
Attention/physiology , Models, Psychological , Neural Networks, Computer , Pattern Recognition, Visual/physiology , Humans , Pattern Recognition, Automated , Photic Stimulation/methods , Reaction Time/physiology , Video Recording