Results 1 - 20 of 24
1.
Sensors (Basel) ; 24(7)2024 Mar 28.
Article in English | MEDLINE | ID: mdl-38610375

ABSTRACT

Ultra-wideband (UWB) has gained increasing interest for providing real-time positioning to robots in GPS-denied environments. For a robot to act on this information, it also requires its heading, which UWB does not provide. To overcome this, either multiple tags are used to create a local reference frame attached to the robot, or a single tag is combined with ego-motion estimation from odometry or Inertial Measurement Unit (IMU) measurements. Both odometry and the IMU suffer from drift, and a magnetometer is commonly used to correct the heading drift; however, magnetometers tend to become unreliable in typical GPS-denied environments. To address this, a lightweight particle filter was designed to run in real time. The particle filter corrects the ego-motion heading and location drift using the UWB measurements over a moving-horizon time frame. The algorithm was evaluated offline using data sets collected with a ground robot under line-of-sight (LOS) and non-line-of-sight (NLOS) conditions. With four anchors in the LOS condition, an RMSE of 13 cm in position and 0.12 rad in heading was achieved. It is also shown that the filter can provide the robot with real-time position and heading information to act on in LOS conditions, and that it is robust in both experimental conditions.
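To illustrate why UWB ranges can recover an unobserved heading once the robot moves, here is a minimal particle-filter sketch. It is not the authors' moving-horizon filter: it assumes a hypothetical four-anchor layout, noise-free simulated ranges, a constant-velocity straight drive, Gaussian range likelihoods, and per-step systematic resampling. Particles whose heading is wrong drift away from the measured ranges and lose weight.

```python
import math
import random

# Hypothetical anchor layout in metres (not the paper's setup).
ANCHORS = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]


def ranges(x, y):
    """Euclidean distance from (x, y) to every UWB anchor."""
    return [math.hypot(x - ax, y - ay) for ax, ay in ANCHORS]


def predict(particles, v, omega, rng, pos_sigma=0.05, yaw_sigma=0.05):
    """Propagate (x, y, theta) particles with one noisy odometry step."""
    out = []
    for x, y, th in particles:
        th += omega + rng.gauss(0.0, yaw_sigma)
        out.append((x + v * math.cos(th) + rng.gauss(0.0, pos_sigma),
                    y + v * math.sin(th) + rng.gauss(0.0, pos_sigma),
                    th))
    return out


def update_and_resample(particles, measured, rng, range_sigma=0.1):
    """Weight by a Gaussian range likelihood, then resample systematically."""
    logw = [-sum((m - p) ** 2 for m, p in zip(measured, ranges(x, y)))
            / (2.0 * range_sigma ** 2) for x, y, _ in particles]
    top = max(logw)  # shift log-weights so exp() cannot underflow to all zeros
    w = [math.exp(lw - top) for lw in logw]
    total = sum(w)
    n = len(particles)
    u0 = rng.uniform(0.0, 1.0 / n)
    out, cum, i = [], w[0] / total, 0
    for k in range(n):
        target = u0 + k / n
        while cum < target and i < n - 1:  # guard against round-off at the end
            i += 1
            cum += w[i] / total
        out.append(particles[i])
    return out


def estimate(particles):
    """Mean position and circular-mean heading of the particle set."""
    n = len(particles)
    x = sum(p[0] for p in particles) / n
    y = sum(p[1] for p in particles) / n
    th = math.atan2(sum(math.sin(p[2]) for p in particles),
                    sum(math.cos(p[2]) for p in particles))
    return x, y, th


def run_demo(steps=30, n=1000, seed=0):
    """Simulate a straight drive whose heading is unknown to the filter."""
    rng = random.Random(seed)
    true = [2.0, 2.0, 0.5]
    parts = [(true[0] + rng.uniform(-0.5, 0.5),
              true[1] + rng.uniform(-0.5, 0.5),
              rng.uniform(-math.pi, math.pi)) for _ in range(n)]
    v, omega = 0.25, 0.0
    for _ in range(steps):
        true[0] += v * math.cos(true[2])
        true[1] += v * math.sin(true[2])
        parts = predict(parts, v, omega, rng)
        parts = update_and_resample(parts, ranges(true[0], true[1]), rng)
    return true, estimate(parts)


true_pose, est_pose = run_demo()
print(true_pose, est_pose)
```

The heading starts uniformly distributed, so the first updates only constrain position; heading becomes observable after a few steps of motion, which is the effect the paper exploits over its moving horizon.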

2.
Sensors (Basel) ; 24(5)2024 Feb 22.
Article in English | MEDLINE | ID: mdl-38474954

ABSTRACT

Generative models have the potential to revolutionize 3D extended reality. A primary obstacle is that augmented and virtual reality need real-time computing, and current state-of-the-art methods for random point cloud generation are not fast enough for these applications. We introduce a vector-quantized variational autoencoder model (VQVAE) that can synthesize high-quality point clouds in milliseconds. Unlike previous work on VQVAEs, our model offers a compact sample representation suitable for conditional generation and data exploration, with potential applications in rapid prototyping. We achieve this result by combining architectural improvements with an innovative approach to probabilistic random generation. First, we rethink current parallel point cloud autoencoder structures and propose several solutions to improve robustness, efficiency and reconstruction quality. Notable contributions in the decoder architecture include an innovative computation layer to process the shape semantic information, an attention mechanism that helps the model focus on different areas, and a filter to cover possible sampling errors. Second, we introduce a parallel sampling strategy for VQVAE models consisting of a double encoding system, in which a variational autoencoder learns to generate the complex discrete distribution of the VQVAE, not only allowing quick inference but also describing the shape with a few global variables. We compare the proposed decoder and our VQVAE model with established and concurrent work, and we validate each individual contribution in turn.
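The discrete bottleneck that a VQVAE relies on can be sketched as a nearest-neighbour lookup into a codebook. The sketch below shows only that generic quantization step with a toy codebook; the paper's double-encoding sampler and decoder are not reproduced, and during training such models typically pass gradients through this step with a straight-through estimator, which is omitted here.

```python
import numpy as np


def vector_quantize(z, codebook):
    """Replace each row of z (N, D) by its nearest codebook row (K, D).

    Returns the discrete code indices and the quantized vectors.
    """
    # Pairwise squared Euclidean distances, shape (N, K), via broadcasting.
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)
    return idx, codebook[idx]


codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
z = np.array([[0.1, 0.2], [4.0, 4.5], [0.9, 1.2]])
idx, zq = vector_quantize(z, codebook)
print(idx)  # [0 2 1]
```

The integer indices are the compact discrete representation; a second model (in the paper, a variational autoencoder) can then learn the distribution over these indices.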

3.
Sensors (Basel) ; 23(19)2023 Sep 28.
Article in English | MEDLINE | ID: mdl-37836959

ABSTRACT

High-quality data are of utmost importance for any deep-learning application. However, acquiring such data and their annotation is challenging. This paper presents a GPU-accelerated simulator that enables the generation of high-quality, perfectly labelled data for any Time-of-Flight sensor, including LiDAR. Our approach optimally exploits the 3D graphics pipeline of the GPU, significantly decreasing data generation time while preserving compatibility with all real-time rendering engines. The presented algorithms are generic and allow users to perfectly mimic the unique sampling pattern of any such sensor. To validate our simulator, two neural networks are trained for denoising and semantic segmentation. To bridge the gap between reality and simulation, a novel loss function is introduced that requires only a small set of partially annotated real data. It enables the learning of classes for which no labels are provided in the real data, hence dramatically reducing annotation efforts. With this work, we hope to provide means for alleviating the data acquisition problem that is pertinent to deep-learning applications.

4.
Sensors (Basel) ; 23(20)2023 Oct 10.
Article in English | MEDLINE | ID: mdl-37896466

ABSTRACT

Keystroke dynamics is a soft biometric based on the assumption that humans always type in uniquely characteristic manners. Previous works mainly focused on analyzing key press or release events. Unlike these methods, we explored a novel visual modality of keystroke dynamics for human identification using a single RGB-D sensor. To verify this idea, we created a dataset dubbed KD-MultiModal, which contains 243.2 K frames of RGB and depth images obtained by recording video of hands typing with a single RGB-D sensor. The dataset comprises RGB-D image sequences of 20 subjects (10 males and 10 females) typing sentences, with each subject typing around 20 sentences. Because only the hand and keyboard region contributes to person identification, we also propose methods of extracting Regions of Interest (RoIs) for each type of data. Unlike key press or release data, our dataset not only captures the velocity of pressing and releasing different keys and the typing style of specific keys or key combinations, but also contains rich information on hand shape and posture. To verify the validity of our proposed data, we adopted deep neural networks, including RGB-KD-Net, D-KD-Net, and RGBD-KD-Net, to learn distinguishing features from different data representations. Since sequences of point clouds can also be obtained from the depth images given the intrinsic parameters of the RGB-D sensor, we also studied the performance of human identification based on point clouds. Extensive experimental results showed that our idea works and that the proposed method based on RGB-D images performs best, achieving 99.44% accuracy on unseen real-world data. To inspire more researchers and facilitate relevant studies, the proposed dataset will be made publicly accessible together with the publication of this paper.
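Obtaining point clouds from depth images "given the intrinsic parameters" usually means pinhole back-projection. A minimal sketch, assuming an undistorted pinhole model with focal lengths `fx`, `fy` and principal point `cx`, `cy` in pixels, and zero depth marking invalid pixels (the sensor's actual calibration and distortion handling are not given in the abstract):

```python
import numpy as np


def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map into an (M, 3) point cloud.

    Uses the pinhole model X = (u - cx) Z / fx, Y = (v - cy) Z / fy.
    """
    h, w = depth.shape
    v, u = np.indices((h, w))  # v: row index, u: column index
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)  # row-major pixel order
    return pts[pts[:, 2] > 0]  # drop pixels with no valid depth


pts = depth_to_points(np.full((3, 3), 2.0), fx=1.0, fy=1.0, cx=1.0, cy=1.0)
print(pts[4], pts[5])  # principal-point pixel, then one pixel to its right
```

Applying this per frame turns an RGB-D sequence into the point-cloud sequence mentioned above.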


Subjects
Forensic Anthropology , Neural Networks, Computer , Humans , Posture , Biometry , Hand
5.
IEEE Trans Image Process ; 32: 4170-4184, 2023.
Article in English | MEDLINE | ID: mdl-37440397

ABSTRACT

End-to-end Long Short-Term Memory (LSTM) networks have been successfully applied to video summarization. However, the weaknesses of the LSTM model, namely poor generalization and inefficient representation learning for input nodes, limit its ability to carry out node classification efficiently within user-created videos. Given the power of Graph Neural Networks (GNNs) in representation learning, we adopted the Graph Information Bottleneck (GIB) to develop a Contextual Feature Transformation (CFT) mechanism that refines the temporal dual feature, yielding a semantic representation with attention alignment. Furthermore, a novel Salient-Area-Size-based spatial attention model is presented to extract frame-wise visual features, based on the observation that humans tend to focus on sizable and moving objects. Lastly, the semantic representation is embedded within attention alignment under the end-to-end LSTM framework to differentiate indistinguishable images. Extensive experiments demonstrate that the proposed method outperforms state-of-the-art (SOTA) methods.

6.
Sensors (Basel) ; 22(21)2022 Nov 02.
Article in English | MEDLINE | ID: mdl-36366136

ABSTRACT

In recent years, Vehicle Make and Model Recognition (VMMR) has attracted a lot of attention, as it plays a crucial role in Intelligent Transportation Systems (ITS). Accurate and efficient VMMR systems are required in real-world applications, including intelligent surveillance and autonomous driving. The paper introduces a new large-scale dataset and a novel deep learning paradigm for VMMR. The new dataset, dubbed Diverse large-scale VMM (DVMM), consists of image samples of the most popular vehicle brands operating in Europe. The proposed VMMR framework follows a two-branch architecture that performs make and model recognition, respectively. A two-stage training procedure and a novel decision module are proposed to process the make and model predictions and compute the final model prediction. In addition, a novel metric based on the true positive rate is proposed to compare the classification confusion of the proposed 2B-2S framework and the baseline methods. An extensive experimental validation demonstrates the generality, diversity, and practicality of the proposed DVMM dataset. The experimental results show that the proposed framework achieves 93.95% accuracy on the more diverse DVMM dataset and 95.85% accuracy on traditional VMMR datasets. The proposed two-branch approach outperforms the conventional one-branch approach for VMMR on small-, medium-, and large-scale datasets, providing lower vehicle model confusion and reduced inter-make ambiguity. The paper demonstrates the advantages of the proposed two-branch VMMR paradigm in terms of robustness and lower confusion relative to single-branch designs.


Subjects
Deep Learning , Research , Data Collection , Models, Biological , Intelligence
7.
Sensors (Basel) ; 22(19)2022 Oct 01.
Article in English | MEDLINE | ID: mdl-36236555

ABSTRACT

Dot-product attention is a powerful mechanism for capturing contextual information. Models built on top of it have achieved state-of-the-art performance in various domains, ranging from sequence modelling to visual tasks. However, the main bottleneck is the construction of the attention map, whose cost is quadratic in the number of tokens in the sequence. Consequently, efficient alternatives have been developed in parallel, but only recently have their performances been compared and contrasted. This study performs a comparative analysis of several efficient attention mechanisms in the context of a purely attention-based spatio-temporal forecasting model used for traffic prediction. Experiments show that these methods can reduce training times by up to 28% and inference times by up to 31%, while performance remains on par with the baseline.
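The quadratic bottleneck, and one way around it, can be shown side by side. The abstract does not name the mechanisms it compares; the second function below is one well-known family (kernelized "linear" attention with the feature map elu(x) + 1), chosen here purely as an illustration. It never forms the N x N attention map, so its cost grows linearly with sequence length.

```python
import numpy as np


def softmax_attention(Q, K, V):
    """Standard scaled dot-product attention; builds an (N, N) map."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V


def elu_plus_one(x):
    """Positive feature map elu(x) + 1 (overflow-safe form)."""
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def linear_attention(Q, K, V):
    """Kernelized attention: cost linear in the number of tokens N."""
    Qp, Kp = elu_plus_one(Q), elu_plus_one(K)
    KV = Kp.T @ V              # (d, d_v) summary, independent of N x N
    Z = Qp @ Kp.sum(axis=0)    # (N,) per-query normalizer
    return (Qp @ KV) / Z[:, None]


rng = np.random.default_rng(0)
Q, K = rng.normal(size=(16, 8)), rng.normal(size=(16, 8))
V = rng.normal(size=(16, 4))
print(softmax_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)
```

Both variants return a convex combination of the value rows for each query; they differ in how the (positive) weights are produced, which is why accuracy can stay close to the baseline while time drops.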


Subjects
Forecasting
8.
Sensors (Basel) ; 22(4)2022 Feb 10.
Article in English | MEDLINE | ID: mdl-35214252

ABSTRACT

The paper proposes a novel post-filtering method based on convolutional neural networks (CNNs) for quality enhancement of RGB/grayscale images and video sequences. The lossy images are encoded using common image codecs, such as JPEG and JPEG2000. The video sequences are encoded using previous and ongoing video coding standards, high-efficiency video coding (HEVC) and versatile video coding (VVC), respectively. A novel deep neural network architecture is proposed to estimate fine refinement details for full-, half-, and quarter-patch resolutions. The proposed architecture is built using a set of efficient processing blocks designed based on the following concepts: (i) the multi-head attention mechanism for refining the feature maps, (ii) the weight sharing concept for reducing the network complexity, and (iii) novel block designs of layer structures for multiresolution feature fusion. The proposed method provides substantial performance improvements compared with both common image codecs and video coding standards. Experimental results on high-resolution images and standard video sequences show that the proposed post-filtering method provides average BD-rate savings of 31.44% over JPEG and 54.61% over HEVC (x265) for RGB images, Y-BD-rate savings of 26.21% over JPEG and 15.28% over VVC (VTM) for grayscale images, and 15.47% over HEVC and 14.66% over VVC for video sequences.
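The savings above are reported as Bjøntegaard delta rate (BD-rate), the average bitrate difference between two rate-distortion curves at equal quality. A minimal sketch of the classic computation (cubic fit of log-rate versus PSNR, integrated over the overlapping PSNR range); the RD points below are made-up illustrations, and reference implementations may differ in interpolation details:

```python
import numpy as np


def bd_rate(rate_ref, psnr_ref, rate_test, psnr_test):
    """Average bitrate change (%) of test vs. reference at equal PSNR.

    Negative values are savings. Needs at least 4 RD points per curve.
    """
    p_ref = np.polyfit(psnr_ref, np.log(rate_ref), 3)
    p_test = np.polyfit(psnr_test, np.log(rate_test), 3)
    lo = max(min(psnr_ref), min(psnr_test))   # overlapping quality interval
    hi = min(max(psnr_ref), max(psnr_test))
    i_ref, i_test = np.polyint(p_ref), np.polyint(p_test)
    int_ref = np.polyval(i_ref, hi) - np.polyval(i_ref, lo)
    int_test = np.polyval(i_test, hi) - np.polyval(i_test, lo)
    avg_log_diff = (int_test - int_ref) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0


# Illustrative curves: the test codec needs 10% less rate at every PSNR.
rates = np.array([100.0, 200.0, 400.0, 800.0])
psnr = np.array([30.0, 33.0, 36.0, 39.0])
print(bd_rate(rates, psnr, 0.9 * rates, psnr))  # approximately -10.0
```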


Subjects
Data Compression , Deep Learning , Data Compression/methods , Neural Networks, Computer , Video Recording/methods
9.
Sensors (Basel) ; 22(4)2022 Feb 13.
Article in English | MEDLINE | ID: mdl-35214335

ABSTRACT

A novel low-power distributed Visual Sensor Network (VSN) system is proposed, which performs real-time collaborative barcode localization, tracking, and robust identification. Due to a dynamic triggering mechanism and efficient transmission protocols, communication is organized amongst the nodes themselves rather than being orchestrated by a single sink node, achieving lower congestion and significantly reducing the vulnerability of the overall system. Specifically, early detection of the moving barcode is achieved through a dynamic triggering mechanism. A hierarchical transmission protocol is designed, within which different communication protocols are used, depending on the type of data exchanged among nodes. Real-Time Transport Protocol (RTP) is employed for video communication, while the Transmission Control Protocol (TCP) and Long Range (LoRa) protocol are used for passing messages amongst the nodes in the VSN. Through an extensive experimental evaluation, we demonstrate that the proposed distributed VSN brings substantial advantages in terms of accuracy, power savings, and time complexity compared to an equivalent system performing centralized processing.


Subjects
Computer Communication Networks , Wireless Technology , Algorithms , Data Collection
10.
Article in English | MEDLINE | ID: mdl-34941512

ABSTRACT

The early diagnosis of cerebral palsy is an area that has recently seen significant multi-disciplinary research. Diagnostic tools such as the General Movements Assessment (GMA) have produced some very promising results. Automating these processes could, however, improve the accessibility of the assessment and also enhance the understanding of infant movement development. Previous works have established the viability of using pose-based features extracted from RGB video sequences to classify infant body movements based on the GMA. In this paper, we propose a series of new and improved features, and a feature fusion pipeline for this classification task. We also introduce the RVI-38 dataset, a series of videos captured as part of routine clinical care. Using this challenging dataset, we establish the robustness of several motion features for classification, subsequently informing the design of our proposed feature fusion framework based on the GMA. We evaluate our proposed framework's classification performance using both the RVI-38 dataset and the publicly available MINI-RGBD dataset. We also implement several other methods from the literature for direct comparison on these two independent datasets. Our experimental results and feature analysis show that our proposed pose-based method performs well across both datasets. The proposed features allow us to include finer detail than previous methods and to further model GMA-specific body movements. These new features also allow us to take advantage of additional body-part-specific information as a means of improving the overall classification performance, while retaining GMA-relevant, interpretable, and shareable features.


Subjects
Cerebral Palsy , Cerebral Palsy/diagnosis , Humans , Infant , Movement
11.
Sensors (Basel) ; 21(9)2021 May 07.
Article in English | MEDLINE | ID: mdl-34067191

ABSTRACT

In this paper, we propose a novel filtering method based on deep attention networks for the quality enhancement of light field (LF) images captured by plenoptic cameras and compressed using the High Efficiency Video Coding (HEVC) standard. The proposed architecture was built using efficient complex processing blocks and novel attention-based residual blocks. The network takes advantage of the macro-pixel (MP) structure specific to LF images and processes each reconstructed MP in the luminance (Y) channel. The input patch is represented as a tensor that collects, from an MP neighbourhood, four Epipolar Plane Images (EPIs) at four different angles. The experimental results on a common LF image database showed substantial improvements over HEVC in terms of the structural similarity index (SSIM), with average Y-Bjøntegaard Delta (BD)-rate savings of 36.57% and an average Y-BD-PSNR improvement of 2.301 dB. Performance increased further when the HEVC built-in filtering methods were skipped. The visual results illustrate that the enhanced image contains sharper edges and more texture details. The ablation study provides two robust solutions that reduce the inference time by 44.6% and the network complexity by 74.7%. The results demonstrate the potential of attention networks for the quality enhancement of LF images encoded by HEVC.

12.
Neural Netw ; 137: 43-53, 2021 May.
Article in English | MEDLINE | ID: mdl-33549982

ABSTRACT

Deep learning-based methods have been shown to achieve excellent results in a variety of domains; however, some important capabilities are still absent. Quality scalability is one of them. In this work, we introduce a novel and generic neural network layer, named MaskLayer. It can be integrated into any feedforward network, enabling quality scalability by design through the creation of embedded feature sets. These are obtained by imposing a specific structure on the feature vector during training. To further improve performance, a masked optimizer and a balancing gradient rescaling approach are proposed. Our experiments show that the cost of introducing scalability using MaskLayer remains limited. To demonstrate its generality and applicability, we integrated the proposed techniques into existing non-scalable networks for point cloud compression and semantic hashing, with excellent results. To the best of our knowledge, this is the first work presenting a generic solution able to achieve quality-scalable results within the deep learning framework.
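One way to read "embedded feature sets obtained by imposing a specific structure on the feature vector" is prefix masking: during training, each sample keeps only a random-length prefix of its features, so any prefix must be usable on its own. The sketch below illustrates that general idea only; the uniform choice of prefix length and the function names are assumptions, and MaskLayer's masked optimizer and gradient rescaling are not reproduced.

```python
import numpy as np


def random_prefix_mask(z, rng):
    """Training-time masking: each row keeps a random-length feature prefix.

    Assumption: prefix lengths are drawn uniformly from 1..D.
    """
    n, d = z.shape
    keep = rng.integers(1, d + 1, size=n)          # prefix length per sample
    mask = np.arange(d)[None, :] < keep[:, None]   # True for kept features
    return z * mask, keep


def truncate_features(z, keep):
    """Inference-time quality selection: keep the first `keep` features."""
    return z * (np.arange(z.shape[-1]) < keep)


z = np.arange(1, 13, dtype=float).reshape(3, 4)
masked, keep = random_prefix_mask(z, np.random.default_rng(0))
print(keep)
print(truncate_features(z, 2))
```

At inference, transmitting or storing a shorter prefix then trades quality for size without retraining.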


Subjects
Data Compression/methods , Deep Learning , Cloud Computing , Semantics
13.
Sensors (Basel) ; 21(1)2021 Jan 03.
Article in English | MEDLINE | ID: mdl-33401627

ABSTRACT

The paper proposes a novel instance segmentation method for traffic videos designed for deployment on real-time embedded devices. A novel neural network architecture is proposed using a multi-resolution feature extraction backbone and improved network designs for the object detection and instance segmentation branches. A novel post-processing method is introduced to reduce the false detection rate by evaluating the quality of the output masks. An improved network training procedure is proposed based on a novel label assignment algorithm. An ablation study on the speed-vs.-performance trade-off further modifies the two branches and replaces the conventional ResNet-based, performance-oriented backbone with a lightweight, speed-oriented design. The proposed architectural variations achieve real-time performance when deployed on embedded devices. The experimental results demonstrate that the proposed instance segmentation method for traffic videos outperforms YOLACT (You Only Look At CoefficienTs), the state-of-the-art real-time instance segmentation method. The proposed architecture achieves 31.57 average precision on the COCO dataset, while its speed-oriented variations reach up to 66.25 frames per second on the Jetson AGX Xavier module.

14.
IEEE Trans Image Process ; 30: 1072-1085, 2021.
Article in English | MEDLINE | ID: mdl-33290219

ABSTRACT

Multiview video allows dynamic imaging from multiple viewpoints to be presented simultaneously, enabling a broad range of immersive applications. This paper proposes a novel super-resolution (SR) approach for mixed-resolution (MR) multiview video, whereby the low-resolution (LR) videos produced by MR camera setups are up-sampled using the neighboring high-resolution (HR) videos. Our solution analyzes the statistical correlation between views of different resolutions and introduces a low-rank-prior-based SR optimization framework using local linear embedding and weighted nuclear norm minimization. The target HR patch is reconstructed by learning texture details from the neighboring HR camera views using local linear embedding. A low-rank constrained patch optimization solution is introduced to effectively restrain visual artifacts, and the alternating direction method of multipliers (ADMM) framework is used to solve the resulting optimization problem. Comprehensive experiments, including objective and subjective test metrics, demonstrate that the proposed method outperforms state-of-the-art SR methods for MR multiview video.
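Inside an ADMM loop, weighted nuclear norm minimization is usually handled by a proximal step that shrinks each singular value by its own weight. A minimal sketch of that generic step (weighted singular value thresholding), assuming non-decreasing weights, the case in which this closed form is known to be optimal; the paper's weight choice and patch grouping are not reproduced:

```python
import numpy as np


def weighted_svt(Y, weights):
    """Shrink singular values of Y: sigma_i -> max(sigma_i - w_i, 0).

    Assumes `weights` is aligned with the descending singular values and is
    non-decreasing, so large (structural) components are penalized least.
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_shrunk = np.maximum(s - weights, 0.0)
    return (U * s_shrunk) @ Vt  # scale columns of U, then recombine


Y = np.diag([5.0, 1.0, 0.5])
print(weighted_svt(Y, np.array([0.2, 2.0, 2.0])))
```

Small singular values, which mostly carry noise and artifacts, are zeroed, so the output is a low-rank approximation of the patch matrix.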

15.
Sensors (Basel) ; 20(21)2020 Oct 30.
Article in English | MEDLINE | ID: mdl-33143080

ABSTRACT

The paper presents a novel depth-estimation method for light-field (LF) images based on innovative multi-stereo matching and machine-learning techniques. In the first stage, a novel block-based stereo matching algorithm is employed to compute the initial estimation. The proposed algorithm is specifically designed to operate on any pair of sub-aperture images (SAIs) in the LF image and to compute the pair's corresponding disparity map. For the central SAI, a disparity fusion technique is proposed to compute the initial disparity map based on all available pairwise disparities. In the second stage, a novel pixel-wise deep-learning (DL)-based method for residual error prediction is employed to further refine the disparity estimation. A novel neural network architecture is proposed based on a new structure of layers. The proposed DL-based method is employed to predict the residual error of the initial estimation and to refine the final disparity map. The experimental results demonstrate the superiority of the proposed framework and reveal that the proposed method achieves an average improvement of 15.65% in root mean squared error (RMSE), 43.62% in mean absolute error (MAE), and 5.03% in structural similarity index (SSIM) over machine-learning-based state-of-the-art methods.

16.
Sensors (Basel) ; 20(3)2020 Feb 06.
Article in English | MEDLINE | ID: mdl-32041201

ABSTRACT

The Internet of Things (IoT) domain offers a wide spectrum of technologies for building IoT applications. Requirements vary from one application to another, making each IoT system unique. Each application demands a custom implementation to achieve an efficient, secure and cost-effective environment. Together, these applications pose a set of properties that cannot be addressed by an IoT network based on a single protocol. Such properties are achievable by designing a heterogeneous IoT system that integrates diverse IoT protocols and provides a network management solution to efficiently manage the system components. This paper proposes an IoT message-based communication model applied atop the IoT protocols in order to achieve functional scalability and network management transparency, agnostic to the employed communication protocol. The paper evaluates the proposed communication model and demonstrates its functional scalability in a heterogeneous IoT system. The experimental assessment compares the payload size of the proposed system with that of the LwM2M standard, a protocol designed specifically for IoT applications. In addition, the paper discusses the energy consumption introduced by the proposed model as well as the options available to reduce this impact.

17.
Sensors (Basel) ; 18(2)2018 Jan 23.
Article in English | MEDLINE | ID: mdl-29360798

ABSTRACT

Until recently, sub-GHz wireless communication technologies focused on low-bandwidth, long-range communication with large numbers of constrained devices. Although these characteristics suit many Internet of Things (IoT) applications, more demanding application requirements could not be met, and legacy Internet technologies such as Transmission Control Protocol/Internet Protocol (TCP/IP) could not be used. This has changed with the advent of the new IEEE 802.11ah Wi-Fi standard, which is much more suitable for reliable bidirectional communication and high-throughput applications over a wide area (up to 1 km). The standard offers great possibilities for network performance optimization through a number of configurable physical- and link-layer features. However, because the optimal configuration parameters depend on traffic patterns, the standard does not dictate how to determine them, and such a large number of configuration options can lead to sub-optimal or even incorrect configurations. Therefore, we investigated how two key mechanisms, Restricted Access Window (RAW) grouping and Traffic Indication Map (TIM) segmentation, influence scalability, throughput, latency and energy efficiency in the presence of bidirectional TCP/IP traffic. We considered both high-throughput video streaming traffic and large-scale reliable sensing traffic, and investigated TCP behavior in both scenarios when the link layer introduces long delays. This article presents the relations between attainable throughput per station and attainable number of stations, as well as the influence of RAW, TIM and TCP parameters on both. We found that up to 20 continuously streaming IP cameras can be reliably connected via IEEE 802.11ah with a maximum average data rate of 160 kbps, whereas 10 IP cameras can achieve average data rates of up to 255 kbps over 200 m. Up to 6960 stations transmitting every 60 s can be connected over 1 km with no lost packets. The presented results enable fine-tuning of RAW and TIM parameters for throughput-demanding reliable applications (e.g., video streaming, firmware updates) on the one hand, and for very dense, low-throughput reliable networks with bidirectional traffic on the other.
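RAW grouping limits contention by letting only a subset of stations access the channel in each slot. The sketch below shows the general idea with a deliberately simple, hypothetical mapping (contiguous AID ranges per group, round-robin slots within a group); it is not the exact slot-assignment rule defined in IEEE 802.11ah, whose details are not given in the abstract.

```python
import math
from collections import defaultdict


def raw_schedule(n_stations, n_groups, slots_per_group):
    """Map station AIDs 1..n_stations to (group, slot) pairs.

    Hypothetical mapping: contiguous AID blocks form RAW groups, and
    stations are spread round-robin over the slots of their group.
    """
    per_group = math.ceil(n_stations / n_groups)
    sched = defaultdict(list)
    for aid in range(1, n_stations + 1):
        group = (aid - 1) // per_group
        slot = (aid - 1) % slots_per_group
        sched[(group, slot)].append(aid)
    return dict(sched)


sched = raw_schedule(100, n_groups=4, slots_per_group=5)
print(len(sched), sched[(0, 0)])  # 20 slots; AIDs contending in group 0, slot 0
```

With 100 stations, 4 groups and 5 slots per group, each slot is shared by 5 stations instead of 100, which is the kind of contention reduction whose throughput, latency and energy trade-offs the study quantifies.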

18.
Opt Express ; 24(20): 23094-23108, 2016 Oct 03.
Article in English | MEDLINE | ID: mdl-27828375

ABSTRACT

Many robust phase unwrapping algorithms are computationally very time-consuming, making them impractical for handling large datasets or real-time applications. In this paper, we propose a generic framework using a novel wavelet transform that can be combined with many types of phase unwrapping algorithms. By inserting reversible modulo operators in the wavelet transform, the number of coefficients that need to be unwrapped is significantly reduced, which results in large computational gains. The algorithm is tested on various types of wrapped phase imagery, reporting speedup factors of up to 500. The source code of the algorithm is publicly available.
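The property that makes modulo operators compatible with phase unwrapping is that wrapping a difference of wrapped phases recovers the true phase difference whenever that difference lies in (-π, π). The sketch below shows this with the wrapping operator and basic 1D (Itoh) unwrapping; it is a baseline illustration, not the paper's wavelet framework.

```python
import numpy as np


def wrap(phi):
    """Wrap phase values to the interval [-pi, pi)."""
    return (phi + np.pi) % (2 * np.pi) - np.pi


def itoh_unwrap(wrapped):
    """1D unwrapping: integrate wrapped differences of the wrapped signal.

    Correct as long as neighbouring true phases differ by less than pi.
    """
    d = wrap(np.diff(wrapped))
    return np.concatenate(([wrapped[0]], wrapped[0] + np.cumsum(d)))


true_phase = np.linspace(0.0, 20.0, 200)   # smooth ramp, step of about 0.1 rad
wrapped = wrap(true_phase)
recovered = itoh_unwrap(wrapped)
print(np.max(np.abs(recovered - true_phase)))
```

Because the wrapped differences carry all the information needed, a transform that keeps coefficients "modulo 2π" can shrink the set of values that actually need unwrapping, which is where the reported speedups come from.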

19.
Opt Express ; 23(17): 22149-61, 2015 Aug 24.
Article in English | MEDLINE | ID: mdl-26368189

ABSTRACT

We propose a novel fast method for full-parallax computer-generated holograms (CGHs) with occlusion processing, suitable for volumetric data such as point clouds. A novel light wave propagation strategy relying on the sequential use of the wavefront recording plane method is proposed, which employs look-up tables to reduce the computational complexity of calculating the fields. A novel technique for occlusion culling with little additional computational cost is also introduced. Additionally, the method assigns a Gaussian distribution to the individual points in order to improve visual quality. Performance tests show that, for a full-parallax high-definition CGH, a speedup factor of more than 2,500 compared to the ray-tracing method can be achieved without hardware acceleration.

20.
IEEE Trans Image Process ; 21(4): 1934-49, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22203710

ABSTRACT

In the context of low-cost video encoding, distributed video coding (DVC) has recently emerged as a potential candidate for uplink-oriented applications. This paper builds on a concept of correlation channel (CC) modeling, which expresses the correlation noise as being statistically dependent on the side information (SI). Compared with classical side-information-independent (SII) noise modeling adopted in current DVC solutions, it is theoretically proven that side-information-dependent (SID) modeling improves the Wyner-Ziv coding performance. Anchored in this finding, this paper proposes a novel algorithm for online estimation of the SID CC parameters based on already decoded information. The proposed algorithm enables bit-plane-by-bit-plane successive refinement of the channel estimation leading to progressively improved accuracy. Additionally, the proposed algorithm is included in a novel DVC architecture that employs a competitive hash-based motion estimation technique to generate high-quality SI at the decoder. Experimental results corroborate our theoretical gains and validate the accuracy of the channel estimation algorithm. The performance assessment of the proposed architecture shows remarkable and consistent coding gains over a germane group of state-of-the-art distributed and standard video codecs, even under strenuous conditions, i.e., large groups of pictures and highly irregular motion content.
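Side-information-dependent (SID) modeling means the correlation-noise statistics are estimated separately for different SI values instead of once for the whole frame. The abstract does not state the noise distribution; the sketch assumes the Laplacian model common in DVC, whose parameter follows from the variance as α = sqrt(2 / σ²), and it estimates per-SI-value parameters offline from known (SI, residual) pairs. The paper's online, bit-plane-by-bit-plane estimation from decoded data is not reproduced.

```python
import math
from collections import defaultdict


def sid_laplacian_alphas(side_info, residuals):
    """Estimate a Laplacian alpha for each side-information value.

    Assumes zero-mean residuals, so E[r^2] serves as the variance, and
    that every SI value has at least one nonzero residual.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for s, r in zip(side_info, residuals):
        sums[s] += r * r
        counts[s] += 1
    return {s: math.sqrt(2.0 / (sums[s] / counts[s])) for s in sums}


si = [0, 0, 0, 0, 1, 1, 1, 1]
res = [1, -1, 1, -1, 2, -2, 2, -2]
print(sid_laplacian_alphas(si, res))  # larger noise for SI=1 -> smaller alpha
```

An SI-independent model would pool all residuals into a single α; keeping them separate lets the decoder trust the SI more where it is historically accurate.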


Subjects
Algorithms , Data Compression/methods , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Photography/methods , Signal Processing, Computer-Assisted , Video Recording/methods , Computer Graphics , Numerical Analysis, Computer-Assisted , Reproducibility of Results , Sensitivity and Specificity , Statistics as Topic