Results 1 - 18 of 18
1.
IEEE Trans Pattern Anal Mach Intell ; 44(1): 154-180, 2022 Jan.
Article in English | MEDLINE | ID: mdl-32750812

ABSTRACT

Event cameras are bio-inspired sensors that differ from conventional frame cameras: instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes and output a stream of events that encode the time, location, and sign of the brightness changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (on the order of µs), very high dynamic range (140 dB versus 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz), resulting in reduced motion blur. Hence, event cameras have large potential for robotics and computer vision in scenarios that are challenging for traditional cameras, such as low latency, high speed, and high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available, and the tasks that they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world.
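To make the data format concrete, here is a minimal sketch (with illustrative field names, not tied to any particular camera SDK) of the event stream such a sensor produces:

```python
from dataclasses import dataclass

@dataclass
class Event:
    t: float  # timestamp, typically in microseconds
    x: int    # pixel column
    y: int    # pixel row
    p: int    # polarity: +1 for a brightness increase, -1 for a decrease

# An event camera's output is a time-ordered stream of such tuples,
# consumed asynchronously rather than as fixed-rate frames.
stream = [Event(t=10.0, x=120, y=64, p=+1), Event(t=12.5, x=121, y=64, p=-1)]
for ev in stream:
    pass  # per-event processing goes here
```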


Subjects
Algorithms; Robotics; Neural Networks, Computer
2.
Sensors (Basel) ; 21(23), 2021 Nov 25.
Article in English | MEDLINE | ID: mdl-34883852

ABSTRACT

Event-based vision sensors show great promise for use in embedded applications requiring low-latency passive sensing at a low computational cost. In this paper, we present an event-based algorithm that relies on an Extended Kalman Filter for 6-degree-of-freedom sensor pose estimation. The algorithm updates the sensor pose event by event with low latency (worst case of less than 2 µs on an FPGA). Using a single handheld sensor, we test the algorithm on multiple recordings, ranging from a high-contrast printed planar scene to a more natural scene consisting of objects viewed from above. The pose is accurately estimated under rapid motions of up to 2.7 m/s. Thereafter, an extension to multiple sensors is described and tested, highlighting the improved performance of such a setup, as well as the integration with an off-the-shelf mapping algorithm to allow point-cloud updates of a 3D scene and broaden the potential applications of this visual odometry solution.
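The abstract does not spell out the filter's state and measurement models; the sketch below shows only the generic EKF measurement update that such a per-event filter would run, with all model-specific pieces left as assumed inputs:

```python
import numpy as np

def ekf_update(x, P, z, h, H, R):
    """One textbook EKF measurement update, applied once per event.
    x: state (e.g., a 6-DoF pose parameterization), P: covariance,
    z: measurement derived from the event, h: measurement function,
    H: Jacobian of h at x, R: measurement noise covariance.
    The specific models are assumptions, not the paper's."""
    y = z - h(x)                          # innovation
    S = H @ P @ H.T + R                   # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)        # Kalman gain
    x_new = x + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new
```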


Subjects
Algorithms; Motion (Physics)
3.
Front Neurosci ; 14: 135, 2020.
Article in English | MEDLINE | ID: mdl-32153357

ABSTRACT

We present the first purely event-based, energy-efficient approach for dynamic object detection and categorization with a freely moving event camera. Compared to traditional cameras, event-based object recognition systems are considerably behind in terms of accuracy and algorithmic maturity. To this end, this paper presents an event-based feature extraction method devised by accumulating local activity across the image frame and then applying principal component analysis (PCA) to the normalized neighborhood region. Subsequently, we propose a backtracking-free k-d tree mechanism for efficient feature matching that takes advantage of the low dimensionality of the feature representation. Additionally, the proposed k-d tree mechanism allows for feature selection to obtain a lower-dimensional object representation when hardware resources are too limited to implement PCA. Consequently, the proposed system can be realized on a field-programmable gate array (FPGA) device, leading to a high performance-to-resource ratio. The proposed system is tested on real-world event-based datasets for object categorization, showing superior classification performance compared to state-of-the-art algorithms. Additionally, we verified the real-time FPGA performance of the proposed object detection method, trained with limited data as opposed to deep learning methods, under a closed-loop aerial vehicle flight mode. We also compare the proposed object categorization framework to pre-trained convolutional neural networks using transfer learning and highlight the drawbacks of using frame-based sensors under dynamic camera motion. Finally, we provide critical insights into the influence of the feature extraction method and the classification parameters on system performance, which aids in adapting the framework to various low-power (less than a few watts) application scenarios.
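As a rough illustration of the described pipeline (not the authors' implementation), the sketch below projects a normalized activity patch onto a learned PCA basis and matches features with a standard k-d tree; note that SciPy's tree does backtrack, unlike the specialized backtracking-free mechanism proposed in the paper:

```python
import numpy as np
from scipy.spatial import cKDTree

def pca_descriptor(patch, basis):
    """Normalize a local activity patch and project it onto the top-k
    principal components (the basis would be learned offline)."""
    v = patch.flatten().astype(float)
    v = (v - v.mean()) / (v.std() + 1e-9)
    return basis @ v

# Toy PCA basis learned from stand-in training patches.
train = np.random.rand(500, 49)
train -= train.mean(axis=0)
_, _, Vt = np.linalg.svd(train, full_matrices=False)
basis = Vt[:4]                                   # top-4 components

features = np.vstack([pca_descriptor(np.random.rand(7, 7), basis)
                      for _ in range(100)])
tree = cKDTree(features)                         # cheap queries at low dimension
dist, idx = tree.query(features[0])              # nearest stored feature
```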

4.
IEEE Trans Pattern Anal Mach Intell ; 42(11): 2767-2780, 2020 Nov.
Article in English | MEDLINE | ID: mdl-31144625

ABSTRACT

We introduce a generic visual descriptor, termed the distribution-aware retinal transform (DART), that encodes structural context using log-polar grids for event cameras. The DART descriptor is applied to four different problems, namely object classification, tracking, detection, and feature matching: (1) the DART features are directly employed as local descriptors in a bag-of-words classification framework, with testing carried out on four standard event-based object datasets (N-MNIST, MNIST-DVS, CIFAR10-DVS, N-Caltech101); (2) extending the classification system, tracking is demonstrated using two key novelties: (i) statistical bootstrapping is leveraged with online learning to overcome the low-sample problem during one-shot learning of the tracker, and (ii) cyclical shifts are induced in the log-polar domain of the DART descriptor to achieve robustness to object scale and rotation variations; (3) to solve the long-term object tracking problem, an object detector is designed using the principle of cluster majority voting; the detection scheme is then combined with the tracker to yield a high intersection-over-union score against augmented ground-truth annotations on the publicly available event camera dataset; (4) finally, the event context encoded by DART greatly simplifies the feature correspondence problem, especially for spatio-temporal slices far apart in time, which had not been explicitly tackled in the event-based vision domain.
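The log-polar encoding at the heart of DART can be illustrated with a simple spatial histogram; this is a sketch in the descriptor's spirit, and the published construction has details not reproduced here:

```python
import numpy as np

def log_polar_descriptor(events_xy, center, n_rings=8, n_wedges=12, r_max=32.0):
    """Bin event locations around a center by log-radius and angle."""
    d = events_xy - center
    r = np.hypot(d[:, 0], d[:, 1])
    theta = np.arctan2(d[:, 1], d[:, 0])               # in [-pi, pi]
    keep = (r > 0) & (r < r_max)
    ring = np.floor(n_rings * np.log1p(r[keep]) / np.log1p(r_max)).astype(int)
    wedge = np.floor((theta[keep] + np.pi) / (2 * np.pi) * n_wedges).astype(int)
    wedge = np.clip(wedge, 0, n_wedges - 1)
    hist = np.zeros((n_rings, n_wedges))
    np.add.at(hist, (ring, wedge), 1.0)
    return hist / max(hist.sum(), 1.0)                 # normalized descriptor

xy = np.array([[10.0, 12.0], [11.0, 12.5], [40.0, 40.0]])
h = log_polar_descriptor(xy, center=np.array([12.0, 12.0]))
```

A rotation of the object then appears as a cyclic shift along the wedge axis, which is the property the tracking stage's cyclical shifts exploit.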

5.
IEEE Trans Neural Netw Learn Syst ; 31(9): 3649-3657, 2020 Sep.
Article in English | MEDLINE | ID: mdl-31714243

ABSTRACT

In this article, we present a systematic computational model to explore brain-based computation for object recognition. The model extracts temporal features embedded in address-event representation (AER) data and discriminates between different objects using spiking neural networks (SNNs). We use multispike encoding to extract the temporal features contained in the AER data. These temporal patterns are then learned through the tempotron learning rule. The presented model is implemented consistently within a temporal learning framework, where the precise timing of spikes is considered in both the feature-encoding and learning processes. A noise-reduction method is also proposed, which computes the correlation of an event with its surrounding spatial neighborhood based on the recently proposed time-surface technique. The model, evaluated on a wide spectrum of data sets (MNIST, N-MNIST, MNIST-DVS, AER Posture, and Poker Card), demonstrates superior recognition performance, especially on events with noise.
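For readers unfamiliar with the tempotron, the sketch below shows its standard decision rule, which the model's learning stage builds on; the kernel constants are illustrative defaults, not the paper's:

```python
import numpy as np

def tempotron_response(spike_times, weights, T=100.0, tau=15.0, tau_s=3.75, dt=0.5):
    """Maximum membrane potential of a tempotron: a weighted sum of
    PSP kernels K(s) = exp(-s/tau) - exp(-s/tau_s) for s > 0."""
    t = np.arange(0.0, T, dt)
    V = np.zeros_like(t)
    for w, times in zip(weights, spike_times):   # one spike list per afferent
        for ti in times:
            s = t - ti
            k = np.zeros_like(t)
            m = s > 0
            k[m] = np.exp(-s[m] / tau) - np.exp(-s[m] / tau_s)
            V += w * k
    return V.max()                               # classify by threshold crossing

# Two afferents: one excitatory, one inhibitory.
print(tempotron_response([[10.0, 30.0], [20.0]], [0.5, -0.2]))
```

Training then adjusts each weight in proportion to the kernel value at the time of maximum potential whenever a pattern is misclassified.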

6.
IEEE Trans Biomed Circuits Syst ; 12(4): 860-870, 2018 Aug.
Article in English | MEDLINE | ID: mdl-29994132

ABSTRACT

This paper describes a fully spike-based neural network for optical flow estimation from dynamic vision sensor data. A low-power embedded implementation of the method, which combines the Asynchronous Time-based Image Sensor with IBM's TrueNorth Neurosynaptic System, is presented. The sensor generates spikes with sub-millisecond resolution in response to scene illumination changes. These spikes are processed by a spiking neural network running on TrueNorth with 1-ms resolution to accurately determine the order and time difference of spikes from neighbouring pixels, and therefore infer the velocity. The spiking neural network is a variant of the Barlow-Levick method for optical flow estimation. The system is evaluated on two recordings for which ground-truth motion is available, and achieves an average endpoint error of 11% at an estimated power budget of under 80 mW for the sensor and computation.
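In its simplest form, the Barlow-Levick scheme underlying the network infers velocity from the spike-timing difference between neighbouring pixels; a toy version follows (the TrueNorth network realizes this with spiking delay lines rather than an explicit division):

```python
def barlow_levick_velocity(t_a, t_b, pixel_pitch=1.0):
    """If pixel A fires at t_a and its neighbour B fires at t_b, an
    edge moved from A to B with speed pitch / (t_b - t_a)."""
    dt = t_b - t_a
    if dt <= 0:
        return None                 # wrong direction or simultaneous spikes
    return pixel_pitch / dt         # pixels per unit time, in the A->B direction

print(barlow_levick_velocity(t_a=1000.0, t_b=1004.0))  # 0.25 px per time unit
```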


Subjects
Neural Networks, Computer; Algorithms; Models, Neurological
7.
IEEE Trans Biomed Circuits Syst ; 12(4): 927-939, 2018 Aug.
Article in English | MEDLINE | ID: mdl-29994268

ABSTRACT

Vision processing with dynamic vision sensors (DVSs) is becoming increasingly popular. This type of bio-inspired vision sensor does not record static images; DVS pixel activity instead depends on changes in light intensity. In this paper, we introduce a platform for object recognition with a DVS in which the sensor is installed on a moving pan-tilt unit in a closed loop with a recognition neural network. This neural network is trained to recognize objects observed by the DVS while the pan-tilt unit is moved to emulate micro-saccades. We show that performing more saccades in different directions provides more information about the object and therefore enables more accurate object recognition. However, in high-performance, low-latency platforms, each additional saccade adds latency and power consumption. Here, we show that the number of saccades can be reduced while keeping the same recognition accuracy by performing intelligent saccadic movements in a closed action-perception loop. We propose an algorithm for smart saccadic movement decisions that, on average, halves the number of saccades necessary to reach a predefined accuracy on the N-MNIST dataset. Additionally, we show that replacing this control algorithm with an artificial neural network that learns to control the saccades also halves the average number of saccades needed for N-MNIST recognition.
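The abstract does not detail the decision algorithm; one plausible reading, sketched below purely for illustration, is an uncertainty-driven controller that picks the saccade direction whose predicted class posterior is most peaked:

```python
import numpy as np

def next_saccade(posterior_per_direction):
    """Pick the saccade direction expected to reduce class uncertainty
    most, here measured by the entropy of each predicted posterior.
    This is an assumed stand-in, not the paper's algorithm."""
    def entropy(p):
        p = np.clip(p, 1e-12, 1.0)
        return -np.sum(p * np.log(p))
    return min(posterior_per_direction,
               key=lambda d: entropy(posterior_per_direction[d]))

# Directions mapped to predicted posteriors over the 10 N-MNIST classes.
post = {"up": np.full(10, 0.1),
        "right": np.array([0.9] + [0.1 / 9] * 9)}
print(next_saccade(post))            # -> "right"
```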


Subjects
Neural Networks, Computer; Algorithms; Biosensing Techniques; Humans; Vision, Ocular/physiology
8.
IEEE Trans Neural Netw Learn Syst ; 29(10): 5030-5044, 2018 Oct.
Article in English | MEDLINE | ID: mdl-29994752

ABSTRACT

As interest in event-based vision sensors for mobile and aerial applications grows, there is an increasing need for high-speed and highly robust algorithms for performing visual tasks using event-based data. Because event rate and network structure have a direct impact on the power consumed by such systems, it is important to explore the efficiency of the event-based encoding used by these sensors. The work presented in this paper is the first study focused solely on the effects of both spatial and temporal downsampling on event-based vision data, and it makes use of a variety of datasets chosen to fully explore and characterize the nature of downsampling operations. The results show that both spatial and temporal downsampling produce improved classification accuracy as well as a lower overall data rate. This finding is particularly relevant for bandwidth- and power-constrained systems. For a given network containing 1,000 hidden-layer neurons, the spatially downsampled systems achieved a best-case accuracy of 89.38% on N-MNIST, as opposed to 81.03% with no downsampling at the same hidden-layer size. On the N-Caltech101 dataset, the downsampled system achieved a best-case accuracy of 18.25%, compared with 7.43% with no downsampling. The results show that downsampling is an important preprocessing technique in event-based visual processing, especially for applications sensitive to power consumption and transmission bandwidth.
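The two operations under study can be written down directly; a minimal sketch with assumed factor values follows (merging events that collapse onto the same coarse cell, which is what realizes the data-rate reduction, is omitted for brevity):

```python
import numpy as np

def downsample_events(events, s_factor=2, t_factor=2000):
    """Spatial downsampling maps pixel coordinates onto a coarser grid;
    temporal downsampling quantizes timestamps to coarser bins.
    events: (N, 4) integer array of (t_us, x, y, p)."""
    out = events.copy()
    out[:, 1] = events[:, 1] // s_factor               # coarser x
    out[:, 2] = events[:, 2] // s_factor               # coarser y
    out[:, 0] = (events[:, 0] // t_factor) * t_factor  # coarser time
    return out

events = np.array([[10, 5, 7, 1], [2500, 6, 7, -1]])
print(downsample_events(events))
```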

9.
Front Neurosci ; 12: 118, 2018.
Article in English | MEDLINE | ID: mdl-29556172

ABSTRACT

Asynchronous event-based sensors, or "silicon retinae," are a new class of vision sensors inspired by biological vision systems. The output of these sensors often contains a significant number of noise events alongside the signal. Filtering these noise events is a common preprocessing step before using the data for tasks such as tracking and classification. This paper presents a novel spiking neural network-based approach to filtering noise events from data captured by an Asynchronous Time-based Image Sensor on a neuromorphic processor, the IBM TrueNorth Neurosynaptic System. The significant contribution of this work is demonstrating that our proposed filtering algorithm outperforms the traditional nearest-neighbor noise filter, achieving a higher signal-to-noise ratio (~10 dB higher) and retaining ~3X more signal-related events. In addition, for our envisioned application of object tracking and classification, under some parameter settings it can also generate some of the missing events in the spatial neighborhood of the signal for all classes of moving objects in the data, which is unattainable with the nearest-neighbor filter.
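For reference, the baseline the SNN filter is compared against is the conventional nearest-neighbor filter; a compact version follows, with dt_max as an assumed setting:

```python
import numpy as np

def nn_filter(events, width, height, dt_max=5000):
    """Keep an event only if a pixel in its 3x3 neighbourhood fired
    within the last dt_max microseconds. events: iterable of (t, x, y).
    This simplified variant counts the same pixel's previous event
    as support."""
    last = np.full((height, width), -np.inf)
    kept = []
    for t, x, y in events:
        lo_x, hi_x = max(x - 1, 0), min(x + 2, width)
        lo_y, hi_y = max(y - 1, 0), min(y + 2, height)
        if (t - last[lo_y:hi_y, lo_x:hi_x]).min() <= dt_max:
            kept.append((t, x, y))           # recent support nearby: keep
        last[y, x] = t
    return kept

print(nn_filter([(0, 10, 10), (100, 11, 10), (9000, 50, 50)], 64, 64))
```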

10.
Front Neurosci ; 11: 535, 2017.
Article in English | MEDLINE | ID: mdl-29051722

ABSTRACT

Compared to standard frame-based cameras, biologically inspired event-based sensors capture visual information with low latency and minimal redundancy. These event-based sensors are also far less prone to motion blur than traditional cameras and still operate effectively in high-dynamic-range scenes. However, classical frame-based algorithms are typically not suitable for event-based data, and new processing algorithms are required. This paper focuses on the problem of depth estimation from a stereo pair of event-based sensors. A fully event-based stereo depth estimation algorithm that relies on message passing is proposed. The algorithm not only considers the properties of a single event but also uses a Markov Random Field (MRF) to enforce constraints between nearby events, such as disparity uniqueness and depth continuity. The method is tested on five different scenes and compared to other state-of-the-art event-based stereo matching methods. The results show that the method detects more stereo matches than other methods, with each match having higher accuracy. The method can operate in an event-driven manner, where depths are reported for individual events as they are received, or the network can be queried at any time to generate a sparse depth frame representing its current state.
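To give a feel for the per-event matching cue that precedes the MRF regularization, here is a deliberately naive coincidence matcher on rectified streams; the thresholds are assumed, and the paper's message-passing formulation goes well beyond this:

```python
def coincidence_match(ev_left, right_events, dt_max=1000, max_disp=32):
    """Match a left-camera event (t, x, y) to right-camera events on the
    same epipolar line (same y after rectification) that occurred within
    dt_max microseconds; disparity = x_left - x_right."""
    t, x, y = ev_left
    candidates = [(abs(t - tr), x - xr)
                  for (tr, xr, yr) in right_events
                  if yr == y and 0 <= x - xr <= max_disp and abs(t - tr) <= dt_max]
    if not candidates:
        return None
    return min(candidates)[1]     # disparity of the most coincident candidate

print(coincidence_match((1000, 40, 12), [(990, 35, 12), (400, 20, 12)]))  # -> 5
```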

11.
IEEE Trans Pattern Anal Mach Intell ; 39(7): 1346-1359, 2017 Jul 1.
Article in English | MEDLINE | ID: mdl-27411216

ABSTRACT

This paper describes novel event-based spatio-temporal features called time-surfaces and how they can be used to create a hierarchical event-based pattern recognition architecture. Unlike existing hierarchical architectures for pattern recognition, the presented model relies on a time-oriented approach to extract spatio-temporal features from the asynchronously acquired dynamics of a visual scene. These dynamics are acquired using biologically inspired, frameless, asynchronous, event-driven vision sensors. Similarly to cortical structures, subsequent layers in our hierarchy extract increasingly abstract features using increasingly large spatio-temporal windows. The central concept is to use the rich temporal information provided by events to create contexts in the form of time-surfaces, which represent the recent temporal activity within a local spatial neighborhood. We demonstrate that this concept can be used robustly at all stages of an event-based hierarchical model. First-layer feature units operate on groups of pixels, while subsequent-layer feature units operate on the output of lower-level feature units. We report results on a previously published 36-class character recognition task and a four-class canonical dynamic card pip task, achieving near 100 percent accuracy on each. We introduce a new seven-class moving face recognition task, achieving 79 percent accuracy.
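The time-surface itself is simple to state: for each pixel in a neighbourhood, exponentially decay the time since its most recent event. A minimal sketch (the decay constant is assumed, and border handling is omitted):

```python
import numpy as np

def time_surface(last_t, x, y, t_now, radius=4, tau=50e3):
    """last_t: (H, W) array of each pixel's most recent event timestamp
    (in µs), initialized to -inf so unseen pixels decay to 0. Assumes
    (x, y) is at least `radius` pixels from the border."""
    patch = last_t[y - radius:y + radius + 1, x - radius:x + radius + 1]
    return np.exp(-(t_now - patch) / tau)   # values in (0, 1]; fresh events ~1

H, W = 128, 128
last_t = np.full((H, W), -np.inf)
last_t[64, 64] = 1000.0
print(time_surface(last_t, 64, 64, t_now=1050.0))
```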

12.
Front Neurosci ; 10: 184, 2016.
Article in English | MEDLINE | ID: mdl-27199646

ABSTRACT

The growing demands placed upon the field of computer vision have renewed the focus on alternative visual scene representations and processing paradigms. Silicon retinae provide an alternative means of imaging the visual environment and produce frame-free spatio-temporal data. This paper presents an investigation into event-based digit classification using N-MNIST, a neuromorphic dataset created with a silicon retina, and the Synaptic Kernel Inverse Method (SKIM), a learning method based on principles of dendritic computation. As this work represents the first large-scale, multi-class classification task performed using the SKIM network, it explores the different training patterns and output determination methods necessary to extend the original SKIM method to support multi-class problems. Applying SKIM networks to a real-world dataset, with the largest hidden-layer sizes and the largest number of simultaneously trained output neurons to date, the classification system achieved a best-case accuracy of 92.87% for a network containing 10,000 hidden-layer neurons. These results represent the highest accuracies achieved against the dataset to date and serve to validate the application of the SKIM method to event-based visual classification tasks. Additionally, the study found that using a square pulse as the supervisory training signal produced the highest accuracy for most output determination methods, while an exponential pattern is better suited to hardware implementations because it works with the simplest output determination method, based on the maximum value.
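The "simplest output determination method based on the maximum value" mentioned at the end can be stated in a few lines; a sketch over illustrative output traces:

```python
import numpy as np

def classify_by_max(output_traces):
    """Each output neuron produces a continuous trace over the
    presentation window; the predicted class is the neuron whose trace
    reaches the highest value. output_traces: (n_classes, n_steps)."""
    return int(np.argmax(output_traces.max(axis=1)))

traces = np.array([[0.1, 0.4, 0.2],     # output neuron for class 0
                   [0.0, 0.9, 0.3]])    # output neuron for class 1
print(classify_by_max(traces))          # -> 1
```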

13.
Front Neurosci ; 9: 437, 2015.
Article in English | MEDLINE | ID: mdl-26635513

ABSTRACT

Creating datasets for Neuromorphic Vision is a challenging task. A lack of available recordings from Neuromorphic Vision sensors means that data must typically be recorded specifically for dataset creation rather than collecting and labeling existing data. The task is further complicated by a desire to simultaneously provide traditional frame-based recordings to allow for direct comparison with traditional Computer Vision algorithms. Here we propose a method for converting existing Computer Vision static image datasets into Neuromorphic Vision datasets using an actuated pan-tilt camera platform. Moving the sensor rather than the scene or image is a more biologically realistic approach to sensing and eliminates timing artifacts introduced by monitor updates when simulating motion on a computer monitor. We present conversion of two popular image datasets (MNIST and Caltech101) which have played important roles in the development of Computer Vision, and we provide performance metrics on these datasets using spike-based recognition algorithms. This work contributes datasets for future use in the field, as well as results from spike-based algorithms against which future works can compare. Furthermore, by converting datasets already popular in Computer Vision, we enable more direct comparison with frame-based approaches.
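The conversion procedure amounts to recording a neuromorphic sensor while a pan-tilt unit sweeps it across each static image; a schematic trajectory follows, where the angles and the waypoint pattern are assumptions for illustration, not the published protocol:

```python
def saccade_waypoints(amplitude_deg=1.0):
    """A small sequence of pan-tilt waypoints: moving the sensor (not
    the image) causes intensity edges to generate events, avoiding
    monitor-refresh timing artifacts."""
    a = amplitude_deg
    return [(0.0, 0.0), (a, a), (a, -a), (0.0, 0.0)]   # (pan, tilt) degrees

for pan, tilt in saccade_waypoints():
    pass  # command the pan-tilt unit to each waypoint while recording events
```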

14.
Front Neurosci ; 9: 374, 2015.
Article in English | MEDLINE | ID: mdl-26528120

ABSTRACT

Neuromorphic Vision sensors have improved greatly since the first silicon retina was presented almost three decades ago. They have recently matured to the point where they are commercially available and can be operated by laymen. However, despite improved availability of sensors, there remains a lack of good datasets, while algorithms for processing spike-based visual data are still in their infancy. On the other hand, frame-based computer vision algorithms are far more mature, thanks in part to widely accepted datasets which allow direct comparison between algorithms and encourage competition. We are presented with a unique opportunity to shape the development of Neuromorphic Vision benchmarks and challenges by leveraging what has been learnt from the use of datasets in frame-based computer vision. Taking advantage of this opportunity, in this paper we review the role that benchmarks and challenges have played in the advancement of frame-based computer vision, and suggest guidelines for the creation of Neuromorphic Vision benchmarks and challenges. We also discuss the unique challenges faced when benchmarking Neuromorphic Vision algorithms, particularly when attempting to provide direct comparison with frame-based computer vision.

15.
IEEE Trans Pattern Anal Mach Intell ; 37(10): 2028-2040, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26353184

ABSTRACT

This paper introduces a spiking hierarchical model for object recognition which utilizes the precise timing information inherently present in the output of biologically inspired, asynchronous address-event representation (AER) vision sensors. The asynchronous nature of these systems frees computation and communication from the rigid, predetermined timing enforced by system clocks in conventional systems. Freedom from rigid timing constraints opens the possibility of using true timing to our advantage in computation. We show not only how timing can be used in object recognition, but also how it can in fact simplify computation. Specifically, we rely on a simple temporal winner-take-all rather than the more computationally intensive synchronous operations typically used in biologically inspired neural networks for object recognition. This approach to visual computation represents a major paradigm shift from conventional clocked systems and can find application in other sensory modalities and computational tasks. We showcase the effectiveness of the approach by achieving the highest accuracy reported to date (97.5% ± 3.5%) on a previously published four-class card pip recognition task, and an accuracy of 84.9% ± 1.9% on a new, more difficult 36-class character recognition task.
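The temporal winner-take-all primitive the model relies on is trivially cheap compared with synchronous alternatives: among competing feature neurons, the first to spike wins. A sketch:

```python
def temporal_wta(first_spike_time):
    """first_spike_time: dict of neuron name -> first spike time
    (None if the neuron never fired). The earliest spike wins; later
    spikes would be inhibited."""
    fired = {k: t for k, t in first_spike_time.items() if t is not None}
    return min(fired, key=fired.get) if fired else None

print(temporal_wta({"edge_0deg": 130.0, "edge_45deg": 112.5, "edge_90deg": None}))
# -> "edge_45deg": the earliest spike carries the strongest response
```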


Subjects
Neural Networks, Computer; Pattern Recognition, Automated/methods; Algorithms; Models, Neurological
16.
Front Neurosci ; 9: 522, 2015.
Article in English | MEDLINE | ID: mdl-26834547

ABSTRACT

Pose estimation is a fundamental step in many artificial vision tasks. It consists of estimating the 3D pose of an object with respect to a camera from the object's 2D projection. Current state-of-the-art implementations operate on images and are computationally expensive, especially for real-time applications: scenes with dynamics exceeding 30-60 Hz can rarely be processed in real time using conventional hardware. This paper presents a new method for event-based 3D object pose estimation, making full use of the high temporal resolution (1 µs) of asynchronous visual events output from a single neuromorphic camera. Given an initial estimate of the pose, each incoming event is used to update the pose by combining both 3D and 2D criteria. We show that the asynchronous high temporal resolution of the neuromorphic camera allows us to solve the problem in an incremental manner, achieving real-time performance at an update rate of several hundred kHz on a conventional laptop. We show that the high temporal resolution of neuromorphic cameras is a key feature for performing accurate pose estimation. Experiments are provided showing the performance of the algorithm on real data, including fast-moving objects, occlusions, and cases where the neuromorphic camera and the object are both in motion.
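The abstract's update loop (an initial pose, then a per-event refinement mixing 3D and 2D criteria) can be caricatured as a tiny descent step on the 2D reprojection error of the model point associated with each event; the sketch below is schematic only, with projection and correspondence left as assumed callables:

```python
import numpy as np

def incremental_pose_update(pose, event_px, project, nearest_model_point, lr=1e-3):
    """Nudge the pose so the model's nearest projected point moves
    toward the event pixel. `project(pose, p3d) -> 2D pixel` and
    `nearest_model_point(pose, px) -> 3D point` are assumed callables;
    the real method's 3D/2D update rules are more involved."""
    p3d = nearest_model_point(pose, event_px)

    def err(p):
        return np.sum((project(p, p3d) - event_px) ** 2)

    grad = np.zeros_like(pose)
    for i in range(len(pose)):           # numerical gradient of the 2D error
        dp = np.zeros_like(pose)
        dp[i] = 1e-6
        grad[i] = (err(pose + dp) - err(pose - dp)) / 2e-6
    return pose - lr * grad
```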

17.
IEEE Trans Neural Netw Learn Syst ; 24(8): 1239-1252, 2013 Aug.
Article in English | MEDLINE | ID: mdl-24808564

ABSTRACT

Recognition of objects in still images has traditionally been regarded as a difficult computational problem. Although modern automated methods for visual object recognition have achieved steadily increasing recognition accuracy, even the most advanced computational vision approaches are unable to match human performance. This has led to the creation of many biologically inspired models of visual object recognition, among them the hierarchical model and X (HMAX) model. HMAX is traditionally known to achieve high accuracy in visual object recognition tasks at the expense of significant computational complexity. Increasing complexity, in turn, increases computation time, reducing the number of images that can be processed per unit time. In this paper we describe how the computationally intensive, biologically inspired HMAX model for visual object recognition can be modified for implementation on a commercial field-programmable gate array (FPGA), specifically the Xilinx Virtex-6 ML605 evaluation board with an XC6VLX240T FPGA. We show that, with minor modifications to the traditional HMAX model, we can perform recognition on images of size 128 × 128 pixels at a rate of 190 images per second with less than a 1% loss in recognition accuracy in both binary and multiclass visual object recognition tasks.
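For orientation, the HMAX front end that dominates the computational cost consists of Gabor filtering (S1) followed by local max pooling (C1); a compact NumPy rendering with illustrative parameters follows (the paper's FPGA configuration differs):

```python
import numpy as np

def gabor_kernel(size=11, theta=0.0, lam=5.0, sigma=3.0, gamma=0.5):
    """S1-style Gabor filter; the S1 stage convolves the image with such
    kernels at several orientations and scales."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return (np.exp(-(xr**2 + gamma**2 * yr**2) / (2 * sigma**2))
            * np.cos(2 * np.pi * xr / lam))

def c1_max_pool(s1_response, pool=8):
    """C1 stage: local max pooling over the S1 response, conferring
    position tolerance (and most of the model's cost at scale)."""
    h = (s1_response.shape[0] // pool) * pool
    w = (s1_response.shape[1] // pool) * pool
    r = s1_response[:h, :w].reshape(h // pool, pool, w // pool, pool)
    return r.max(axis=(1, 3))

s1 = np.abs(np.random.rand(64, 64))   # stand-in for a Gabor-filtered image
print(c1_max_pool(s1).shape)          # (8, 8)
```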


Subjects
Models, Biological; Pattern Recognition, Automated; Pattern Recognition, Visual; Recognition, Psychology; Algorithms; Computer Simulation; Humans
18.
IEEE Trans Neural Netw ; 21(12): 1950-1962, 2010 Dec.
Article in English | MEDLINE | ID: mdl-20959265

ABSTRACT

Spiking neurons and spiking neural circuits are finding uses in a multitude of tasks such as robotic locomotion control, neuroprosthetics, visual sensory processing, and audition. The desired neural output is achieved through the use of complex neuron models, or by combining multiple simple neurons into a network. In either case, a means for configuring the neuron or neural circuit is required. Manual manipulation of parameters is both time consuming and non-intuitive due to the nonlinear relationship between parameters and the neuron's output. The complexity rises even further as the neurons are networked, and the systems often become mathematically intractable. In large circuits, the desired behavior and timing of action-potential trains may be known, but the timing of the individual action potentials is unknown and unimportant, whereas in single-neuron systems the timing of individual action potentials is critical. In this paper, we automate the process of finding parameters. To configure a single neuron, we derive a maximum-likelihood method for configuring a neuron model, specifically the Mihalas-Niebur neuron. Similarly, to configure neural circuits, we show how we use genetic algorithms (GAs) to configure parameters for a network of simple integrate-and-fire-with-adaptation neurons. The GA approach is demonstrated both in software simulation and in a hardware implementation on a reconfigurable custom very-large-scale integration chip.
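The GA side of the approach follows the usual evolutionary template; a skeleton with an assumed toy fitness follows (the real fitness would score simulated spike trains against the desired circuit behaviour):

```python
import numpy as np

rng = np.random.default_rng(0)

def genetic_search(fitness, n_params, pop_size=50, generations=100, sigma=0.1):
    """Evolve a population of parameter vectors: keep the best half,
    refill with mutated copies, return the best individual found."""
    pop = rng.normal(size=(pop_size, n_params))
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        elite = pop[np.argsort(scores)[-pop_size // 2:]]      # higher is better
        children = elite + rng.normal(scale=sigma, size=elite.shape)
        pop = np.vstack([elite, children])
    return pop[np.argmax([fitness(ind) for ind in pop])]

# Toy fitness: drive a squashed parameter mean toward a target of 0.3.
best = genetic_search(lambda p: -abs(np.tanh(p).mean() - 0.3), n_params=4)
print(best)
```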


Subjects
Action Potentials/physiology; Models, Neurological; Neurons/physiology; Algorithms; Computer Simulation; Neural Networks, Computer; Nonlinear Dynamics