Search | VHL Regional Portal

1.

Attention-Guided Disentangled Feature Aggregation for Video Object Detection.

Muralidhara, Shishir; Hashmi, Khurram Azeem; Pagani, Alain; Liwicki, Marcus; Stricker, Didier; Afzal, Muhammad Zeshan.

Sensors (Basel) ; 22(21)2022 Nov 07.

Article in English | MEDLINE | ID: mdl-36366281

ABSTRACT

Object detection is a computer vision task that involves localisation and classification of objects in an image. Video data implicitly introduces several challenges, such as blur, occlusion and defocus, making video object detection more challenging in comparison to still image object detection, which is performed on individual and independent images. This paper tackles these challenges by proposing an attention-heavy framework for video object detection that aggregates the disentangled features extracted from individual frames. The proposed framework is a two-stage object detector based on the Faster R-CNN architecture. The disentanglement head integrates scale, spatial and task-aware attention and applies it to the features extracted by the backbone network across all the frames. Subsequently, the aggregation head incorporates temporal attention and improves detection in the target frame by aggregating the features of the support frames. These include the features extracted from the disentanglement network along with the temporal features. We evaluate the proposed framework using the ImageNet VID dataset and achieve a mean Average Precision (mAP) of 49.8 and 52.5 using the backbones of ResNet-50 and ResNet-101, respectively. The improvement in performance over the individual baseline methods validates the efficacy of the proposed approach.

2.

A Comprehensive Survey of Depth Completion Approaches.

Khan, Muhammad Ahmed Ullah; Nazir, Danish; Pagani, Alain; Mokayed, Hamam; Liwicki, Marcus; Stricker, Didier; Afzal, Muhammad Zeshan.

Sensors (Basel) ; 22(18)2022 Sep 14.

Article in English | MEDLINE | ID: mdl-36146318

ABSTRACT

Depth maps produced by LiDAR-based approaches are sparse. Even high-end LiDAR sensors produce highly sparse depth maps, which are also noisy around the object boundaries. Depth completion is the task of generating a dense depth map from a sparse depth map. While the earlier approaches focused on directly completing this sparsity from the sparse depth maps, modern techniques use RGB images as a guidance tool to resolve this problem. Whilst many others rely on affinity matrices for depth completion. Based on these approaches, we have divided the literature into two major categories; unguided methods and image-guided methods. The latter is further subdivided into multi-branch and spatial propagation networks. The multi-branch networks further have a sub-category named image-guided filtering. In this paper, for the first time ever we present a comprehensive survey of depth completion methods. We present a novel taxonomy of depth completion approaches, review in detail different state-of-the-art techniques within each category for depth completion of LiDAR data, and provide quantitative results for the approaches on KITTI and NYUv2 depth completion benchmark datasets.

3.

Three-Dimensional Reconstruction from a Single RGB Image Using Deep Learning: A Review.

Khan, Muhammad Saif Ullah; Pagani, Alain; Liwicki, Marcus; Stricker, Didier; Afzal, Muhammad Zeshan.

J Imaging ; 8(9)2022 Aug 23.

Article in English | MEDLINE | ID: mdl-36135391

ABSTRACT

Performing 3D reconstruction from a single 2D input is a challenging problem that is trending in literature. Until recently, it was an ill-posed optimization problem, but with the advent of learning-based methods, the performance of 3D reconstruction has also significantly improved. Infinitely many different 3D objects can be projected onto the same 2D plane, which makes the reconstruction task very difficult. It is even more difficult for objects with complex deformations or no textures. This paper serves as a review of recent literature on 3D reconstruction from a single view, with a focus on deep learning methods from 2018 to 2021. Due to the lack of standard datasets or 3D shape representation methods, it is hard to compare all reviewed methods directly. However, this paper reviews different approaches for reconstructing 3D shapes as depth maps, surface normals, point clouds, and meshes; along with various loss functions and metrics used to train and evaluate these methods.

4.

Exploiting Concepts of Instance Segmentation to Boost Detection in Challenging Environments.

Hashmi, Khurram Azeem; Pagani, Alain; Liwicki, Marcus; Stricker, Didier; Afzal, Muhammad Zeshan.

Sensors (Basel) ; 22(10)2022 May 12.

Article in English | MEDLINE | ID: mdl-35632112

ABSTRACT

In recent years, due to the advancements in machine learning, object detection has become a mainstream task in the computer vision domain. The first phase of object detection is to find the regions where objects can exist. With the improvements in deep learning, traditional approaches, such as sliding windows and manual feature selection techniques, have been replaced with deep learning techniques. However, object detection algorithms face a problem when performed in low light, challenging weather, and crowded scenes, similar to any other task. Such an environment is termed a challenging environment. This paper exploits pixel-level information to improve detection under challenging situations. To this end, we exploit the recently proposed hybrid task cascade network. This network works collaboratively with detection and segmentation heads at different cascade levels. We evaluate the proposed methods on three complex datasets of ExDark, CURE-TSD, and RESIDE, and achieve a mAP of 0.71, 0.52, and 0.43, respectively. Our experimental results assert the efficacy of the proposed approach.

Subject(s)

Algorithms , Machine Learning , Face

5.

Contrastive Learning for 3D Point Clouds Classification and Shape Completion.

Nazir, Danish; Afzal, Muhammad Zeshan; Pagani, Alain; Liwicki, Marcus; Stricker, Didier.

Sensors (Basel) ; 21(21)2021 Nov 06.

Article in English | MEDLINE | ID: mdl-34770698

ABSTRACT

In this paper, we present the idea of Self Supervised learning on the shape completion and classification of point clouds. Most 3D shape completion pipelines utilize AutoEncoders to extract features from point clouds used in downstream tasks such as classification, segmentation, detection, and other related applications. Our idea is to add contrastive learning into AutoEncoders to encourage global feature learning of the point cloud classes. It is performed by optimizing triplet loss. Furthermore, local feature representations learning of point cloud is performed by adding the Chamfer distance function. To evaluate the performance of our approach, we utilize the PointNet classifier. We also extend the number of classes for evaluation from 4 to 10 to show the generalization ability of the learned features. Based on our results, embeddings generated from the contrastive AutoEncoder enhances shape completion and classification performance from 84.2% to 84.9% of point clouds achieving the state-of-the-art results with 10 classes.

6.

CasTabDetectoRS: Cascade Network for Table Detection in Document Images with Recursive Feature Pyramid and Switchable Atrous Convolution.

Hashmi, Khurram Azeem; Pagani, Alain; Liwicki, Marcus; Stricker, Didier; Afzal, Muhammad Zeshan.

J Imaging ; 7(10)2021 Oct 16.

Article in English | MEDLINE | ID: mdl-34677300

ABSTRACT

Table detection is a preliminary step in extracting reliable information from tables in scanned document images. We present CasTabDetectoRS, a novel end-to-end trainable table detection framework that operates on Cascade Mask R-CNN, including Recursive Feature Pyramid network and Switchable Atrous Convolution in the existing backbone architecture. By utilizing a comparativelyightweight backbone of ResNet-50, this paper demonstrates that superior results are attainable without relying on pre- and post-processing methods, heavier backbone networks (ResNet-101, ResNeXt-152), and memory-intensive deformable convolutions. We evaluate the proposed approach on five different publicly available table detection datasets. Our CasTabDetectoRS outperforms the previous state-of-the-art results on four datasets (ICDAR-19, TableBank, UNLV, and Marmot) and accomplishes comparable results on ICDAR-17 POD. Upon comparing with previous state-of-the-art results, we obtain a significant relative error reduction of 56.36%, 20%, 4.5%, and 3.5% on the datasets of ICDAR-19, TableBank, UNLV, and Marmot, respectively. Furthermore, this paper sets a new benchmark by performing exhaustive cross-datasets evaluations to exhibit the generalization capabilities of the proposed method.

7.

From IR Images to Point Clouds to Pose: Point Cloud-Based AR Glasses Pose Estimation.

Firintepe, Ahmet; Vey, Carolin; Asteriadis, Stylianos; Pagani, Alain; Stricker, Didier.

J Imaging ; 7(5)2021 Apr 27.

Article in English | MEDLINE | ID: mdl-34460676

ABSTRACT

In this paper, we propose two novel AR glasses pose estimation algorithms from single infrared images by using 3D point clouds as an intermediate representation. Our first approach "PointsToRotation" is based on a Deep Neural Network alone, whereas our second approach "PointsToPose" is a hybrid model combining Deep Learning and a voting-based mechanism. Our methods utilize a point cloud estimator, which we trained on multi-view infrared images in a semi-supervised manner, generating point clouds based on one image only. We generate a point cloud dataset with our point cloud estimator using the HMDPose dataset, consisting of multi-view infrared images of various AR glasses with the corresponding 6-DoF poses. In comparison to another point cloud-based 6-DoF pose estimation named CloudPose, we achieve an error reduction of around 50%. Compared to a state-of-the-art image-based method, we reduce the pose estimation error by around 96%.

8.

Survey and Performance Analysis of Deep Learning Based Object Detection in Challenging Environments.

Ahmed, Muhammad; Hashmi, Khurram Azeem; Pagani, Alain; Liwicki, Marcus; Stricker, Didier; Afzal, Muhammad Zeshan.

Sensors (Basel) ; 21(15)2021 Jul 28.

Article in English | MEDLINE | ID: mdl-34372351

ABSTRACT

Recent progress in deep learning has led to accurate and efficient generic object detection networks. Training of highly reliable models depends on large datasets with highly textured and rich images. However, in real-world scenarios, the performance of the generic object detection system decreases when (i) occlusions hide the objects, (ii) objects are present in low-light images, or (iii) they are merged with background information. In this paper, we refer to all these situations as challenging environments. With the recent rapid development in generic object detection algorithms, notable progress has been observed in the field of deep learning-based object detection in challenging environments. However, there is no consolidated reference to cover the state of the art in this domain. To the best of our knowledge, this paper presents the first comprehensive overview, covering recent approaches that have tackled the problem of object detection in challenging environments. Furthermore, we present a quantitative and qualitative performance analysis of these approaches and discuss the currently available challenging datasets. Moreover, this paper investigates the performance of current state-of-the-art generic object detection algorithms by benchmarking results on the three well-known challenging datasets. Finally, we highlight several current shortcomings and outline future directions.

Subject(s)

Deep Learning , Neural Networks, Computer , Algorithms , Humans

9.

Mixed reality applications in urology: Requirements and future potential.

Reis, Gerd; Yilmaz, Mehmet; Rambach, Jason; Pagani, Alain; Suarez-Ibarrola, Rodrigo; Miernik, Arkadiusz; Lesur, Paul; Minaskan, Nareg.

Ann Med Surg (Lond) ; 66: 102394, 2021 Jun.

Article in English | MEDLINE | ID: mdl-34040777

ABSTRACT

BACKGROUND: Mixed reality (MR), the computer-supported augmentation of a real environment with virtual elements, becomes ever more relevant in the medical domain, especially in urology, ranging from education and training over surgeries. We aimed to review existing MR technologies and their applications in urology. METHODS: A non-systematic review of current literature was performed using the PubMed-Medline database using the medical subject headings (MeSH) term "mixed reality", combined with one of the following terms: "virtual reality", "augmented reality", ''urology'' and "augmented virtuality". The relevant studies were utilized. RESULTS: MR applications such as MR guided systems, immersive VR headsets, AR models, MR-simulated ureteroscopy and smart glasses have enormous potential in education, training and surgical interventions of urology. Medical students, urology residents and inexperienced urologists can gain experience thanks to MR technologies. MR applications are also used in patient education before interventions. CONCLUSIONS: For surgical support, the achievable accuracy is often not sufficient. The main challenges are the non-rigid nature of the genitourinary organs, intraoperative data acquisition, online and multimodal registration and calibration of devices. However, the progress made in recent years is tremendous in all respects and the gap is constantly shrinking.

10.

SynPo-Net-Accurate and Fast CNN-Based 6DoF Object Pose Estimation Using Synthetic Training.

Su, Yongzhi; Rambach, Jason; Pagani, Alain; Stricker, Didier.

Sensors (Basel) ; 21(1)2021 Jan 05.

Article in English | MEDLINE | ID: mdl-33466293

ABSTRACT

Estimation and tracking of 6DoF poses of objects in images is a challenging problem of great importance for robotic interaction and augmented reality. Recent approaches applying deep neural networks for pose estimation have shown encouraging results. However, most of them rely on training with real images of objects with severe limitations concerning ground truth pose acquisition, full coverage of possible poses, and training dataset scaling and generalization capability. This paper presents a novel approach using a Convolutional Neural Network (CNN) trained exclusively on single-channel Synthetic images of objects to regress 6DoF object Poses directly (SynPo-Net). The proposed SynPo-Net is a network architecture specifically designed for pose regression and a proposed domain adaptation scheme transforming real and synthetic images into an intermediate domain that is better fit for establishing correspondences. The extensive evaluation shows that our approach significantly outperforms the state-of-the-art using synthetic training in terms of both accuracy and speed. Our system can be used to estimate the 6DoF pose from a single frame, or be integrated into a tracking system to provide the initial pose.

11.

Automatic Museum Audio Guide.

Vallez, Noelia; Krauss, Stephan; Espinosa-Aranda, Jose Luis; Pagani, Alain; Seirafi, Kasra; Deniz, Oscar.

Sensors (Basel) ; 20(3)2020 Jan 31.

Article in English | MEDLINE | ID: mdl-32023954

ABSTRACT

An automatic "museum audio guide" is presented as a new type of audio guide for museums. The device consists of a headset equipped with a camera that captures exhibit pictures and the eyes of things computer vision device (EoT). The EoT board is capable of recognizing artworks using features from accelerated segment test (FAST) keypoints and a random forest classifier, and is able to be used for an entire day without the need to recharge the batteries. In addition, an application logic has been implemented, which allows for a special highly-efficient behavior upon recognition of the painting. Two different use case scenarios have been implemented. The main testing was performed with a piloting phase in a real world museum. Results show that the system keeps its promises regarding its main benefit, which is simplicity of use and the user's preference of the proposed system over traditional audioguides.

12.

Eyes of Things.

Deniz, Oscar; Vallez, Noelia; Espinosa-Aranda, Jose L; Rico-Saavedra, Jose M; Parra-Patino, Javier; Bueno, Gloria; Moloney, David; Dehghani, Alireza; Dunne, Aubrey; Pagani, Alain; Krauss, Stephan; Reiser, Ruben; Waeny, Martin; Sorci, Matteo; Llewellynn, Tim; Fedorczak, Christian; Larmoire, Thierry; Herbst, Marco; Seirafi, Andre; Seirafi, Kasra.

Sensors (Basel) ; 17(5)2017 May 21.

Article in English | MEDLINE | ID: mdl-28531141

ABSTRACT

Embedded systems control and monitor a great deal of our reality. While some "classic" features are intrinsically necessary, such as low power consumption, rugged operating ranges, fast response and low cost, these systems have evolved in the last few years to emphasize connectivity functions, thus contributing to the Internet of Things paradigm. A myriad of sensing/computing devices are being attached to everyday objects, each able to send and receive data and to act as a unique node in the Internet. Apart from the obvious necessity to process at least some data at the edge (to increase security and reduce power consumption and latency), a major breakthrough will arguably come when such devices are endowed with some level of autonomous "intelligence". Intelligent computing aims to solve problems for which no efficient exact algorithm can exist or for which we cannot conceive an exact algorithm. Central to such intelligence is Computer Vision (CV), i.e., extracting meaning from images and video. While not everything needs CV, visual information is the richest source of information about the real world: people, places and things. The possibilities of embedded CV are endless if we consider new applications and technologies, such as deep learning, drones, home robotics, intelligent surveillance, intelligent toys, wearable cameras, etc. This paper describes the Eyes of Things (EoT) platform, a versatile computer vision platform tackling those challenges and opportunities.

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL