1.
IEEE Trans Pattern Anal Mach Intell ; 45(12): 14888-14904, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37669199

ABSTRACT

Video Super-Resolution (VSR) aims to restore high-resolution (HR) videos from low-resolution (LR) videos. Existing VSR techniques usually recover HR frames by extracting pertinent textures from nearby frames with known degradation processes. Despite significant progress, it remains a grand challenge to effectively extract and transmit high-quality textures from heavily degraded low-quality sequences affected by blur, additive noise, and compression artifacts. This work proposes a novel degradation-robust Frequency-Transformer (FTVSR++) for handling low-quality videos, which carries out self-attention in a combined space-time-frequency domain. First, video frames are split into patches, and each patch is transformed into spectral maps in which each channel represents a frequency band. This permits fine-grained self-attention on each frequency band, so that real visual texture can be distinguished from artifacts. Second, a novel dual frequency attention (DFA) mechanism is proposed to capture global and local frequency relations, which can handle the varied and complicated degradation processes found in real-world scenarios. Third, we explore different self-attention schemes for video processing in the frequency domain and find that a "divided attention" scheme, which conducts joint space-frequency attention before applying temporal-frequency attention, leads to the best video enhancement quality. Extensive experiments on three widely used VSR datasets show that FTVSR++ outperforms state-of-the-art methods on different low-quality videos by clear visual margins.
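The core idea of the abstract above, splitting frames into patches, mapping each patch to spectral maps so that each channel is a frequency band, and attending per band, can be illustrated with a minimal NumPy sketch. All function names and shapes here are illustrative assumptions, not the authors' implementation; the per-band attention is a toy single-channel version of the real multi-head mechanism.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (m + 0.5) * k / n)
    c[0] *= 1.0 / np.sqrt(2.0)
    return c

def patches_to_spectra(frame, p):
    """Split an HxW frame into p x p patches and 2D-DCT each one.

    Returns shape (num_patches, p*p): one spectral coefficient
    (frequency band) per channel, mirroring the per-band treatment
    of patches described in the abstract.
    """
    h, w = frame.shape
    c = dct_matrix(p)
    patches = (frame.reshape(h // p, p, w // p, p)
                    .transpose(0, 2, 1, 3)
                    .reshape(-1, p, p))
    # 2D DCT of each patch: C @ X @ C.T
    spectra = np.einsum('ij,njk,lk->nil', c, patches, c)
    return spectra.reshape(-1, p * p)

def band_attention(spectra):
    """Toy scaled dot-product self-attention across patches,
    applied independently within each frequency band (channel)."""
    out = np.empty_like(spectra)
    for band in range(spectra.shape[1]):
        x = spectra[:, band:band + 1]            # (num_patches, 1)
        scores = x @ x.T / np.sqrt(x.shape[1])   # patch-to-patch affinity
        weights = np.exp(scores - scores.max(axis=1, keepdims=True))
        weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
        out[:, band:band + 1] = weights @ x
    return out
```

For a 16x16 frame with 4x4 patches, `patches_to_spectra` yields a (16, 16) array: 16 patches, each described by 16 frequency bands, over which attention is then computed band by band.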

2.
Med Image Anal ; 86: 102770, 2023 05.
Article in English | MEDLINE | ID: mdl-36889206

ABSTRACT

PURPOSE: Surgical workflow and skill analysis are key technologies for the next generation of cognitive surgical assistance systems. These systems could increase the safety of the operation through context-sensitive warnings and semi-autonomous robotic assistance, or improve the training of surgeons via data-driven feedback. In surgical workflow analysis, an average precision of up to 91% has been reported for phase recognition on an open, single-center video dataset. In this work, we investigated the generalizability of phase recognition algorithms in a multicenter setting, including more difficult recognition tasks such as surgical action and surgical skill. METHODS: To achieve this goal, a dataset of 33 laparoscopic cholecystectomy videos from three surgical centers, with a total operation time of 22 h, was created. Labels included framewise annotation of seven surgical phases with 250 phase transitions, 5514 occurrences of four surgical actions, 6980 occurrences of 21 surgical instruments from seven instrument categories, and 495 skill classifications in five skill dimensions. The dataset was used in the 2019 international Endoscopic Vision challenge, sub-challenge for surgical workflow and skill analysis, in which 12 research teams trained and submitted machine learning algorithms for phase, action, and instrument recognition and/or skill assessment. RESULTS: F1-scores were achieved for phase recognition between 23.9% and 67.7% (n = 9 teams) and for instrument presence detection between 38.5% and 63.8% (n = 8 teams), but for action recognition only between 21.8% and 23.3% (n = 5 teams). The average absolute error for skill assessment was 0.78 (n = 1 team). CONCLUSION: Surgical workflow and skill analysis are promising technologies to support the surgical team, but as our comparison of machine learning algorithms shows, there is still room for improvement. This novel HeiChole benchmark can be used for comparable evaluation and validation of future work. In future studies, it is of utmost importance to create more open, high-quality datasets in order to allow the development of artificial intelligence and cognitive robotics in surgery.
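The F1-scores reported above combine per-class precision and recall over frame-wise labels. A minimal sketch of macro-averaged F1, an illustrative reimplementation, not the challenge's official evaluation script:

```python
import numpy as np

def macro_f1(y_true, y_pred, num_classes):
    """Macro-averaged F1: per-class F1, then an unweighted mean.

    Inputs are frame-wise labels, e.g. one surgical phase id per
    video frame.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    f1s = []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))  # true positives
        fp = np.sum((y_pred == c) & (y_true != c))  # false positives
        fn = np.sum((y_pred != c) & (y_true == c))  # false negatives
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(f1s))

# Toy phase sequence: two of six frames are misclassified.
score = macro_f1([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3)
```

Macro averaging weights every phase equally, which matters for surgical phases of very different durations.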


Subject(s)
Artificial Intelligence , Benchmarking , Humans , Workflow , Algorithms , Machine Learning
3.
Int J Comput Assist Radiol Surg ; 15(11): 1817-1824, 2020 Nov.
Article in English | MEDLINE | ID: mdl-33044734

ABSTRACT

PURPOSE: Automatic surgical skill assessment is an emerging field beneficial to both the efficiency and the quality of surgical education and practice. Prior works largely evaluate skills on elementary tasks performed in the simulation laboratory, which cannot fully reflect the variety of intraoperative circumstances in the real operating room. In this paper, we attempt to fill this gap by extending surgical skill assessment to a clinical dataset comprising fifty-seven in vivo surgeries. METHODS: To tackle the workflow and device constraints of the clinical setting, we propose a robust and non-interruptive surrogate for surgical skill, namely the clearness of operating field (COF), which shows strong correlation with overall skill and high inter-annotator consistency on our clinical data. An automatic model based on neural networks is then developed to regress surgical skill through the COF surrogate using only video as input. RESULTS: The automatic model achieves a Spearman's correlation of 0.595 with the ground truth of overall technical skill, which even exceeds the performance of junior surgeons. Moreover, an exploratory study is conducted to validate the skill predictions against the clinical outcomes of patients. CONCLUSION: Our results demonstrate that the COF surrogate is promising and that the approach is potentially applicable to clinical practice.
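The reported 0.595 Spearman's correlation measures rank agreement between predicted and ground-truth skill scores. A minimal NumPy sketch of the metric (assuming no tied scores; the paper's exact evaluation code is not shown here):

```python
import numpy as np

def spearman_rho(pred, truth):
    """Spearman's rank correlation: Pearson correlation of the ranks.

    Uses simple argsort-based ranking, which assumes no tied values
    (ties would require average ranks).
    """
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    rank_p = pred.argsort().argsort().astype(float)   # rank of each score
    rank_t = truth.argsort().argsort().astype(float)
    return float(np.corrcoef(rank_p, rank_t)[0, 1])

# Hypothetical example: predicted COF-based skill vs. expert ratings.
rho = spearman_rho([2.1, 3.5, 1.0, 4.2], [3, 4, 1, 5])
```

Because only ranks matter, the metric rewards a model that orders surgeons correctly even if its raw scores are on a different scale than the expert ratings.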


Subject(s)
Clinical Competence , Computer Simulation , Neural Networks, Computer , Operating Rooms , Workflow , Humans , Surgeons