Results 1 - 20 of 28
1.
IEEE Trans Pattern Anal Mach Intell; 45(1): 785-795, 2023 Jan.
Article in English | MEDLINE | ID: mdl-35196224

ABSTRACT

Dynamic Vision Sensor (DVS) event camera output includes uninformative background activity (BA) noise events that increase dramatically under dim lighting. Existing denoising algorithms are not effective under these high-noise conditions. Furthermore, it is difficult to quantitatively compare algorithm accuracy. This paper proposes a novel framework to better quantify BA denoising algorithms by measuring receiver operating characteristics with known mixtures of signal and noise DVS events. New datasets for stationary and moving camera applications of DVS in surveillance and driving are used to compare three new low-cost algorithms: Algorithm 1 checks distance to past events using a tiny fixed-size window and removes most of the BA while preserving most of the signal for stationary camera scenarios. Algorithm 2 uses a memory proportional to the number of pixels for improved correlation checking; compared with existing methods, it removes more noise while preserving more signal. Algorithm 3 uses a lightweight multilayer perceptron classifier driven by local event time surfaces to achieve the best accuracy over all datasets. The code and data are shared with the paper as DND21.
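
As an illustration of the spatiotemporal-correlation principle behind such denoisers, the sketch below shows a generic background-activity filter in Python: an event is kept only if a neighboring pixel fired recently. This is not the paper's Algorithm 1 (it uses a full per-pixel timestamp map rather than a tiny fixed-size window), and the event-tuple layout and correlation time are illustrative assumptions.

```python
import numpy as np

def ba_filter(events, width, height, tau_us=10_000):
    """Keep an event only if a pixel in its 3x3 neighborhood fired within tau_us.

    events: iterable of (t_us, x, y, polarity) tuples sorted by timestamp.
    Memory is one timestamp per pixel, unlike the tiny fixed-size window
    of the paper's Algorithm 1; tau_us is an illustrative value.
    """
    last_ts = np.full((height, width), -np.inf)
    kept = []
    for t, x, y, p in events:
        y0, y1 = max(0, y - 1), min(height, y + 2)
        x0, x1 = max(0, x - 1), min(width, x + 2)
        if (t - last_ts[y0:y1, x0:x1]).min() <= tau_us:
            kept.append((t, x, y, p))  # supported by a recent neighboring event
        last_ts[y, x] = t
    return kept
```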

2.
Article in English | MEDLINE | ID: mdl-35687629

ABSTRACT

Long short-term memory (LSTM) recurrent networks are frequently used for tasks involving time-sequential data, such as speech recognition. Unlike previous LSTM accelerators that exploit either spatial weight sparsity or temporal activation sparsity, this article proposes a new accelerator called "Spartus" that exploits spatio-temporal sparsity to achieve ultra-low-latency inference. Spatial sparsity is induced using a new column-balanced targeted dropout (CBTD) structured pruning method, producing structured sparse weight matrices for a balanced workload. The pruned networks running on Spartus hardware achieve weight sparsity levels of up to 96% and 94% with negligible accuracy loss on the TIMIT and Librispeech datasets. To induce temporal sparsity in LSTMs, we extend the previous DeltaGRU method to DeltaLSTM. Combining spatio-temporal sparsity with CBTD and DeltaLSTM saves on weight memory accesses and the associated arithmetic operations. The Spartus architecture is scalable and supports real-time online speech recognition when implemented on small and large FPGAs. The Spartus per-sample latency for a single DeltaLSTM layer of 1024 neurons averages 1 µs. Exploiting spatio-temporal sparsity on our test LSTM network using the TIMIT dataset leads to a 46× speedup of Spartus over its theoretical hardware performance, achieving 9.4-TOp/s effective batch-1 throughput and 1.1-TOp/s/W power efficiency.
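
The temporal-sparsity side of Spartus (the DeltaGRU/DeltaLSTM idea) reduces to a simple rule: an input or activation element is transmitted only when it has changed by more than a threshold since it was last transmitted, so the accelerator can skip the matching weight columns. Below is a minimal sketch of just the delta-encoding step, with an illustrative threshold and no claim to match the full DeltaLSTM cell.

```python
import numpy as np

def delta_encode(x_seq, threshold=0.1):
    """Temporal (delta) sparsification of an activation/input sequence.

    x_seq: iterable of 1-D arrays (one per timestep). Elements are emitted
    only when they differ from the last *transmitted* value by more than the
    threshold; zeros elsewhere let a Spartus-like accelerator skip the
    matching weight columns. The threshold is an illustrative value.
    """
    x_seq = [np.asarray(x, dtype=float) for x in x_seq]
    x_ref = np.zeros_like(x_seq[0])
    deltas = []
    for x_t in x_seq:
        change = x_t - x_ref
        mask = np.abs(change) > threshold
        delta = np.where(mask, change, 0.0)
        x_ref = np.where(mask, x_t, x_ref)   # update only transmitted elements
        deltas.append(delta)
    return np.stack(deltas)
```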

3.
IEEE Trans Pattern Anal Mach Intell; 44(1): 154-180, 2022 Jan.
Article in English | MEDLINE | ID: mdl-32750812

ABSTRACT

Event cameras are bio-inspired sensors that differ from conventional frame cameras: instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes and output a stream of events that encode the time, location, and sign of the brightness changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (on the order of µs), very high dynamic range (140 dB versus 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz), resulting in reduced motion blur. Hence, event cameras have a large potential for robotics and computer vision in scenarios that are challenging for traditional cameras, such as low latency, high speed, and high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available, and the tasks that they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world.
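
One of the event representations covered by this survey is the time surface: an exponentially decaying per-pixel map of the most recent event, often used as input to learning-based methods. A minimal sketch, with the event-tuple layout and decay constant chosen purely for illustration:

```python
import numpy as np

def events_to_time_surface(events, width, height, tau_us=50_000.0):
    """Exponentially decaying 'time surface' of the most recent event per pixel.

    events: (t_us, x, y, polarity) tuples with polarity in {-1, +1}, sorted by
    time. Pixels that never fired stay at 0. This is one of several event
    representations discussed in the survey; parameter names are our own.
    """
    ts = np.full((height, width), -np.inf)
    pol = np.zeros((height, width))
    t_ref = 0.0
    for t, x, y, p in events:
        ts[y, x] = t
        pol[y, x] = p
        t_ref = t                                  # reference time = latest event
    return pol * np.exp(-(t_ref - ts) / tau_us)    # exp(-inf) -> 0 for idle pixels
```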


Subjects
Algorithms, Robotics, Neural Networks (Computer)
4.
IEEE Trans Neural Netw Learn Syst; 30(3): 644-656, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30047912

ABSTRACT

Convolutional neural networks (CNNs) have become the dominant neural network architecture for solving many state-of-the-art (SOA) visual processing tasks. Even though graphics processing units are most often used in training and deploying CNNs, their power efficiency is less than 10 GOp/s/W for single-frame runtime inference. We propose a flexible and efficient CNN accelerator architecture called NullHop that implements SOA CNNs useful for low-power and low-latency application scenarios. NullHop exploits the sparsity of neuron activations in CNNs to accelerate the computation and reduce memory requirements. The flexible architecture allows high utilization of available computing resources across kernel sizes ranging from 1×1 to 7×7. NullHop can process up to 128 input and 128 output feature maps per layer in a single pass. We implemented the proposed architecture on a Xilinx Zynq field-programmable gate array (FPGA) platform and present results showing how our implementation reduces external memory transfers and compute time in five different CNNs, ranging from small networks up to the widely known large VGG16 and VGG19 CNNs. Post-synthesis simulations using Mentor Modelsim in a 28-nm process with a clock frequency of 500 MHz show that the VGG19 network achieves over 450 GOp/s. By exploiting sparsity, NullHop achieves an efficiency of 368%, maintains over 98% utilization of the multiply-accumulate units, and achieves a power efficiency of over 3 TOp/s/W in a core area of 6.3 mm². As further proof of NullHop's usability, we interfaced its FPGA implementation with a neuromorphic event camera for real-time interactive demonstrations.
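
To make the activation-sparsity argument concrete, the toy sketch below performs a "zero-skipping" 2-D convolution in which multiply-accumulates are issued only for nonzero input pixels, which is the effect NullHop obtains in hardware; the actual chip uses a compressed sparse activation encoding and tiling that this sketch does not attempt to model.

```python
import numpy as np

def zero_skipping_conv2d(activations, kernel):
    """Valid-mode 2-D convolution (CNN-style cross-correlation) that skips zeros.

    Only nonzero input activations generate multiply-accumulates, mimicking
    how sparsity reduces compute; a functional sketch, not NullHop's datapath.
    """
    H, W = activations.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    ys, xs = np.nonzero(activations)              # visit only nonzero pixels
    for y, x in zip(ys, xs):
        a = activations[y, x]
        for dy in range(kh):
            for dx in range(kw):
                oy, ox = y - dy, x - dx           # output positions this pixel feeds
                if 0 <= oy < out.shape[0] and 0 <= ox < out.shape[1]:
                    out[oy, ox] += a * kernel[dy, dx]
    return out
```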

5.
IEEE Trans Pattern Anal Mach Intell; 40(10): 2402-2412, 2018 Oct.
Article in English | MEDLINE | ID: mdl-29990121

ABSTRACT

Event cameras are bio-inspired vision sensors that output pixel-level brightness changes instead of standard intensity frames. These cameras do not suffer from motion blur and have a very high dynamic range, which enables them to provide reliable visual information during high-speed motions or in scenes characterized by high dynamic range. These features, along with a very low power consumption, make event cameras an ideal complement to standard cameras for VR/AR and video game applications. With these applications in mind, this paper tackles the problem of accurate, low-latency tracking of an event camera from an existing photometric depth map (i.e., intensity plus depth information) built via classic dense reconstruction pipelines. Our approach tracks the 6-DOF pose of the event camera upon the arrival of each event, thus virtually eliminating latency. We successfully evaluate the method in both indoor and outdoor scenes and show that, because of the technological advantages of the event camera, our pipeline works in scenes characterized by high-speed motion, which are still inaccessible to standard cameras.

6.
Front Neurosci; 12: 23, 2018.
Article in English | MEDLINE | ID: mdl-29479300

ABSTRACT

Event-driven neuromorphic spiking sensors such as the silicon retina and the silicon cochlea encode external sensory stimuli as asynchronous streams of spikes across different channels or pixels. Combining state-of-the-art deep neural networks with the asynchronous outputs of these sensors has produced encouraging results on some datasets but remains challenging. While the lack of effective spiking networks to process the spike streams is one reason, the other is that the pre-processing methods required to convert the spike streams to the frame-based features needed by deep networks still require further investigation. This work investigates the effectiveness of synchronous and asynchronous frame-based features generated using spike count and constant event binning, in combination with a recurrent neural network, for solving a classification task on the N-TIDIGITS18 dataset. This spike-based dataset consists of recordings from the Dynamic Audio Sensor, a spiking silicon cochlea, in response to the TIDIGITS audio dataset. We also propose a new pre-processing method which applies an exponential kernel to the output cochlea spikes so that the interspike timing information is better preserved. The results on the N-TIDIGITS18 dataset show that the exponential features perform better than the spike count features, with over 91% accuracy on the digit classification task. This accuracy corresponds to an improvement of at least 2.5% over the use of spike count features, establishing a new state of the art for this dataset.
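
The exponential-kernel pre-processing can be pictured as a per-channel trace that decays between spikes and is sampled at fixed frame boundaries, so interspike intervals shape the feature values (a plain spike count is the no-decay special case). A hedged sketch; the tuple layout, frame period, and time constant are illustrative rather than the paper's settings:

```python
import numpy as np

def exponential_features(spikes, num_channels, frame_period_us=5_000, tau_us=5_000):
    """Frame-based features from cochlea spikes using an exponential kernel.

    spikes: (t_us, channel) tuples sorted by time. Each spike adds 1 to its
    channel's trace; traces decay with time constant tau_us and are sampled
    at every frame boundary, so interspike timing shapes the features
    (unlike raw spike counts). Parameter values are illustrative.
    """
    trace = np.zeros(num_channels)
    last_t = 0.0
    frames, next_frame = [], float(frame_period_us)
    for t, ch in spikes:
        while t >= next_frame:                              # sample frames up to this spike
            frames.append(trace * np.exp(-(next_frame - last_t) / tau_us))
            next_frame += frame_period_us
        trace = trace * np.exp(-(t - last_t) / tau_us)      # decay all channels
        trace[ch] += 1.0                                    # add the new spike
        last_t = t
    return np.array(frames)
```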

7.
IEEE Trans Biomed Circuits Syst; 12(1): 123-136, 2018 Feb.
Article in English | MEDLINE | ID: mdl-29377801

ABSTRACT

Applications requiring detection of small visual contrast require high sensitivity. Event cameras can provide higher dynamic range (DR) and reduce data rate and latency, but most existing event cameras have limited sensitivity. This paper presents the results of SDAVIS192, a vision sensor fabricated in a 180-nm TowerJazz CIS process. It outputs temporal contrast dynamic vision sensor (DVS) events and conventional active pixel sensor frames. The SDAVIS192 improves on previous DAVIS sensors with higher sensitivity for temporal contrast. The temporal contrast thresholds can be set down to 1% for negative changes in logarithmic intensity (OFF events) and down to 3.5% for positive changes (ON events). This achievement is made possible by an in-pixel preamplification stage. The preamplifier reduces the effective intrascene DR of the sensor (70 dB for OFF and 50 dB for ON events), but an automated operating-region control allows up to at least 110-dB DR for OFF events. A second contribution of this paper is a characterization methodology for measuring DVS event detection thresholds that incorporates a measure of signal-to-noise ratio (SNR). At an average SNR of 30 dB, the DVS temporal contrast threshold fixed-pattern noise is measured to be 0.3%-0.8% temporal contrast. Results comparing monochrome and RGBW color-filter-array DVS events are presented. The higher sensitivity of SDAVIS192 makes this sensor potentially useful for calcium imaging, as shown in a recording from cultured neurons expressing the calcium-sensitive green fluorescent protein GCaMP6f.


Subjects
Color Perception, Image Processing (Computer-Assisted), Neuroimaging, Neurons/cytology, Optical Imaging, Animals, Cell Line, Image Processing (Computer-Assisted)/instrumentation, Image Processing (Computer-Assisted)/methods, Mice, Neuroimaging/instrumentation, Neuroimaging/methods, Neurons/metabolism, Optical Imaging/instrumentation, Optical Imaging/methods, Signal-to-Noise Ratio
8.
Entropy (Basel); 20(6), 2018 Jun 19.
Article in English | MEDLINE | ID: mdl-33265565

ABSTRACT

Taking inspiration from biology to solve engineering problems using the organizing principles of biological neural computation is the aim of the field of neuromorphic engineering. This field has demonstrated success in sensor-based applications (vision and audition) as well as in cognition and actuators. This paper focuses on mimicking the approaching-detection functionality of the retina, which is computed by one type of Retinal Ganglion Cell (RGC), and on its application to robotics. These RGCs transmit action potentials when an expanding object is detected. In this work we compare software and hardware logic (FPGA) implementations of this approaching function and the hardware latency when applied to robots as an attention/reaction mechanism. The visual input for these cells comes from an asynchronous event-driven Dynamic Vision Sensor, which leads to an end-to-end event-based processing system. The software model was developed in Java and runs with an average processing time per event of 370 ns on a NUC embedded computer. The output firing rate for an approaching object depends on the cell parameters that represent the number of input events needed to reach the firing threshold. For the hardware implementation, on a Spartan 6 FPGA, the processing time is reduced to 160 ns/event with the clock running at 50 MHz. The entropy has been calculated to demonstrate that the system is not totally deterministic in its response to approaching objects because of several bio-inspired characteristics. It has been measured that a Summit XL mobile robot can react to an approaching object in 90 ms, which can be used as an attentional mechanism. This is faster than similar event-based approaches in robotics and equivalent to human reaction latencies to visual stimuli.

9.
Front Neurosci; 10: 508, 2016.
Article in English | MEDLINE | ID: mdl-27877107

ABSTRACT

Deep spiking neural networks (SNNs) hold the potential for improving the latency and energy efficiency of deep neural networks through data-driven event-based computation. However, training such networks is difficult due to the non-differentiable nature of spike events. In this paper, we introduce a novel technique which treats the membrane potentials of spiking neurons as differentiable signals, where discontinuities at spike times are considered as noise. This enables an error backpropagation mechanism for deep SNNs that follows the same principles as in conventional deep networks, but works directly on spike signals and membrane potentials. Compared with previous methods relying on indirect training and conversion, our technique has the potential to capture the statistics of spikes more precisely. We evaluate the proposed framework on artificially generated events from the original MNIST handwritten digit benchmark, and also on the N-MNIST benchmark recorded with an event-based dynamic vision sensor, on which the proposed method reduces the error rate by a factor of more than three compared to the best previous SNN and also achieves a higher accuracy than a conventional convolutional neural network (CNN) trained and tested on the same data. We demonstrate in the context of the MNIST task that, thanks to their event-driven operation, deep SNNs (both fully connected and convolutional) trained with our method achieve accuracy equivalent to conventional neural networks. In the N-MNIST example, equivalent accuracy is achieved with about five times fewer computational operations.
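
For orientation, the sketch below shows the forward dynamics such a training scheme operates on: a non-leaky integrate-and-fire layer with reset-by-subtraction. The backward pass (propagating errors through the membrane potentials while treating spike discontinuities as noise) is not shown, and the layer details are a simplification rather than the paper's exact model.

```python
import numpy as np

def if_layer_forward(inputs, weights, threshold=1.0):
    """Forward pass of a non-leaky integrate-and-fire layer over T timesteps.

    inputs: array (T, n_in) of per-step input values; weights: (n_in, n_out).
    The membrane potential accumulates weighted input and is reset by
    subtraction whenever it crosses the threshold. Training would propagate
    gradients through this (differentiable) potential; only the forward
    dynamics are sketched here.
    """
    T = inputs.shape[0]
    v = np.zeros(weights.shape[1])
    spikes = np.zeros((T, weights.shape[1]))
    for t in range(T):
        v += inputs[t] @ weights
        fired = v >= threshold
        spikes[t] = fired.astype(float)
        v = np.where(fired, v - threshold, v)   # reset by subtraction
    return spikes
```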

11.
Front Neurosci; 10: 176, 2016.
Article in English | MEDLINE | ID: mdl-27199639

ABSTRACT

In this study we compare nine optical flow algorithms that locally measure the flow normal to edges, in terms of accuracy and computational cost. In contrast to conventional, frame-based motion flow algorithms, our open-source implementations compute optical flow based on address-events from a neuromorphic Dynamic Vision Sensor (DVS). For this benchmarking we created a dataset of two synthesized and three real samples recorded from a 240 × 180 pixel Dynamic and Active-pixel Vision Sensor (DAVIS). This dataset contains events from the DVS as well as conventional frames to support testing state-of-the-art frame-based methods. We introduce a new source of ground truth: in the special case that the perceived motion stems solely from a rotation of the vision sensor around its three camera axes, the true optical flow can be estimated using gyro data from the inertial measurement unit integrated with the DAVIS camera. This provides ground truth against which we can compare algorithms that measure optical flow by means of motion cues. An analysis of error sources led to the use of a refractory period, more accurate numerical derivatives, and a Savitzky-Golay filter to achieve significant improvements in accuracy. Our pure Java implementations of two recently published algorithms reduce computational cost by up to 29% compared to the original implementations. Two of the algorithms introduced in this paper further speed up processing by a factor of 10 compared with the original implementations, at equal or better accuracy. On a desktop PC, they run in real time on dense natural input recorded by a DAVIS camera.
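
The rotation-only ground truth follows directly from the gyro rates: for a purely rotating camera the image motion is independent of scene depth and is given by the standard rotational flow-field equations. A sketch assuming an ideal pinhole model with focal lengths (fx, fy) and principal point (cx, cy); sign conventions depend on how the camera and IMU axes are defined:

```python
import numpy as np

def rotational_flow(omega, fx, fy, cx, cy, width, height):
    """Ground-truth optical flow for a purely rotating camera.

    omega: angular rates (wx, wy, wz) in rad/s in the camera frame (e.g. from
    the IMU gyro). Returns per-pixel flow (u, v) in pixels/s using the
    standard rotational flow-field equations in normalized coordinates;
    lens distortion is ignored in this sketch.
    """
    wx, wy, wz = omega
    xs = (np.arange(width) - cx) / fx            # normalized image coordinates
    ys = (np.arange(height) - cy) / fy
    x, y = np.meshgrid(xs, ys)
    u = x * y * wx - (1 + x**2) * wy + y * wz    # normalized flow, x-component
    v = (1 + y**2) * wx - x * y * wy - x * wz    # normalized flow, y-component
    return u * fx, v * fy                        # back to pixels/s
```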

12.
Front Neurosci; 10: 49, 2016.
Article in English | MEDLINE | ID: mdl-26941595

ABSTRACT

Standardized benchmarks in Computer Vision have greatly contributed to the advance of approaches to many problems in the field. If we want to enhance the visibility of event-driven vision and increase its impact, we will need benchmarks that allow comparison among different neuromorphic methods as well as comparison to conventional Computer Vision approaches. We present datasets to evaluate the accuracy of frame-free and frame-based approaches for tasks of visual navigation. Similar to conventional Computer Vision datasets, we provide synthetic and real scenes, with the synthetic data created with graphics packages and the real data recorded using a mobile robotic platform carrying a dynamic and active-pixel vision sensor (DAVIS) and an RGB+Depth sensor. For both datasets the cameras move with a rigid motion in a static scene, and the data include the images, events, optic flow, 3D camera motion, and the depth of the scene, along with calibration procedures. Finally, we also provide simulated event data generated synthetically from well-known frame-based optical flow datasets.

13.
IEEE Trans Biomed Circuits Syst; 9(2): 207-216, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25879969

ABSTRACT

Optical flow sensors have been a long-running theme in neuromorphic vision sensors, which include circuits that implement the local background-intensity adaptation mechanism seen in biological retinas. This paper reports a bio-inspired optical motion sensor aimed at miniature robotic and aerial platforms. It combines a 20 × 20 continuous-time CMOS silicon retina vision sensor with a DSP microcontroller. The retina sensor has pixels with local gain control that adapt to background lighting. The system allows the user to validate various motion algorithms without building dedicated custom solutions. Measurements are presented to show that the system can compute global 2D translational motion from complex natural scenes using one particular algorithm: the image interpolation algorithm (I2A). With this algorithm, the system can compute global translational motion vectors at a sample rate of 1 kHz, for speeds up to ±1000 pixels/s, using less than 5 k instruction cycles (12 instructions per pixel) per frame. At a 1 kHz sample rate the DSP is 12% occupied with motion computation. The sensor is implemented as a 6 g PCB consuming 170 mW of power.
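
At its core, the image interpolation algorithm (I2A) models the current frame as the reference frame plus a weighted combination of copies of the reference shifted by ±Δ pixels, then solves a 2×2 least-squares system for the global shift. A simplified NumPy sketch (np.roll wraps at the borders and there is no iteration, both of which a real implementation would handle; the sign of the returned shift depends on the chosen convention):

```python
import numpy as np

def i2a_global_shift(ref, cur, delta=1):
    """Estimate the global translation (dx, dy), in pixels, that best maps
    ref onto cur under a first-order, I2A-style model cur ~ ref + dx*fx + dy*fy,
    where fx, fy are central differences built from +/-delta shifted copies."""
    fx = (np.roll(ref, -delta, axis=1) - np.roll(ref, delta, axis=1)) / (2.0 * delta)
    fy = (np.roll(ref, -delta, axis=0) - np.roll(ref, delta, axis=0)) / (2.0 * delta)
    d = cur - ref
    A = np.array([[np.sum(fx * fx), np.sum(fx * fy)],
                  [np.sum(fx * fy), np.sum(fy * fy)]])
    b = np.array([np.sum(fx * d), np.sum(fy * d)])
    dx, dy = np.linalg.solve(A, b)                # 2x2 normal equations
    return dx, dy
```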


Subjects
Image Interpretation (Computer-Assisted), Motion (Physics), Silicon/chemistry, Algorithms, Biomimetics, Models (Neurological), Retina, Vision (Ocular)
14.
IEEE Trans Neural Netw Learn Syst; 25(12): 2250-2263, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25420246

ABSTRACT

We propose a real-time hand gesture interface based on combining a stereo pair of biologically inspired event-based dynamic vision sensor (DVS) silicon retinas with neuromorphic event-driven postprocessing. Compared with conventional vision or 3-D sensors, the use of DVSs, which output asynchronous and sparse events in response to motion, eliminates the need to extract movements from sequences of video frames and allows significantly faster and more energy-efficient processing. In addition, the rate of input events depends on the observed movements and thus provides an additional cue for solving the gesture spotting problem, i.e., finding the onsets and offsets of gestures. We propose a postprocessing framework based on spiking neural networks that can process the events received from the DVSs in real time and provides an architecture for future implementation in neuromorphic hardware devices. The motion trajectories of moving hands are detected by spatiotemporally correlating the stereoscopically verged asynchronous events from the DVSs using leaky integrate-and-fire (LIF) neurons. Adaptive thresholds of the LIF neurons achieve the segmentation of trajectories, which are then translated into discrete and finite feature vectors. The feature vectors are classified with hidden Markov models, using a separate Gaussian mixture model for spotting irrelevant transition gestures. The disparity information from stereovision is used to adapt LIF neuron parameters to achieve recognition invariant to the distance of the user from the sensor, and also helps to filter out movements in the background of the user. Exploiting the high dynamic range of DVSs furthermore allows gesture recognition over a 60-dB range of scene illuminance. The system achieves recognition rates well over 90% under a variety of variable conditions with static and dynamic backgrounds with naïve users.


Subjects
Brain-Computer Interfaces, Computer Systems, Gestures, Pattern Recognition (Automated)/methods, Retina, Silicon/chemistry, Brain-Computer Interfaces/trends, Computer Systems/trends, Humans, Pattern Recognition (Automated)/trends, Photic Stimulation/methods, Retina/physiology
15.
IEEE Trans Biomed Circuits Syst; 8(4): 453-464, 2014 Aug.
Article in English | MEDLINE | ID: mdl-24216772

ABSTRACT

This paper proposes an integrated event-based binaural silicon cochlea system aimed at efficient spatial audition and auditory scene analysis. The cochlea chip has a matched pair of digitally calibrated 64-stage cascaded analog second-order filter banks with 512 pulse-frequency modulated (PFM) address-event representation (AER) outputs. The quality factors (Qs) of the channels are individually adjusted by local DACs. The 2P4M 0.35 µm CMOS chip consumes an average power of 14 mW including its integrated microphone preamplifiers and biasing circuits. Typical speech data rates are 10 k to 100 k events per second (eps), with peak output rates of 10 Meps. The event timing jitter is 2 µs for a 250 mVpp input. It is shown that the computational cost of an event-driven source localization application can be up to 40 times lower than that of a conventional cross-correlation approach.
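
To illustrate why per-event processing can undercut windowed cross-correlation for source localization, the toy sketch below estimates the interaural time difference (ITD) directly from paired AER spike timestamps; it is not the algorithm evaluated in the paper, and the plausibility bound is an assumed head-size limit.

```python
import numpy as np

def itd_from_events(left_ts, right_ts, max_itd_us=800.0):
    """Toy event-driven ITD estimate from left/right spike timestamps (in µs).

    For each left-ear spike, the nearest right-ear spike is found and the
    signed difference kept if it is physically plausible; the median of the
    kept differences is the ITD estimate. Work scales with the number of
    events rather than with a fixed correlation window.
    """
    right_ts = np.sort(np.asarray(right_ts, dtype=float))
    diffs = []
    for t in np.asarray(left_ts, dtype=float):
        i = np.searchsorted(right_ts, t)
        for j in (i - 1, i):                      # closest right spikes on each side
            if 0 <= j < len(right_ts):
                d = t - right_ts[j]
                if abs(d) <= max_itd_us:
                    diffs.append(d)
    return float(np.median(diffs)) if diffs else None
```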


Subjects
Equipment Design, Algorithms, Hearing, Signal Processing (Computer-Assisted), Silicon/chemistry
17.
Front Neurosci; 7: 223, 2013.
Article in English | MEDLINE | ID: mdl-24311999

ABSTRACT

Conventional vision-based robotic systems that must operate quickly require high video frame rates and consequently high computational costs. Visual response latencies are lower-bounded by the frame period, e.g., 20 ms for a 50 Hz frame rate. This paper shows how an asynchronous neuromorphic dynamic vision sensor (DVS) silicon retina is used to build a fast self-calibrating robotic goalie, which offers high update rates and low latency at low CPU load. Independent and asynchronous per-pixel illumination change events from the DVS signify moving objects and are used in software to track multiple balls. Motor actions to block the most "threatening" ball are based on measured ball positions and velocities. The goalie also sees its single-axis goalie arm and calibrates the motor output map during idle periods so that it can plan open-loop arm movements to desired visual locations. Blocking capability is about 80% for balls shot from 1 m from the goal even for the fastest shots, and approaches 100% accuracy when the ball does not beat the limits of the servo motor to move the arm to the necessary position in time. Running over standard USB buses under a standard preemptive multitasking operating system (Windows), the goalie robot achieves median update rates of 550 Hz, with latencies of 2.2 ± 2 ms from ball movement to motor command at a peak CPU load of less than 4%. Practical observations and measurements of USB device latency are provided.

18.
Front Neurosci; 7: 178, 2013.
Article in English | MEDLINE | ID: mdl-24115919

ABSTRACT

Deep Belief Networks (DBNs) have recently shown impressive performance on a broad range of classification problems. Their generative properties allow better understanding of the performance, and provide a simpler solution for sensor fusion tasks. However, because of their inherent need for feedback and parallel update of large numbers of units, DBNs are expensive to implement on serial computers. This paper proposes a method based on the Siegert approximation for Integrate-and-Fire neurons to map an offline-trained DBN onto an efficient event-driven spiking neural network suitable for hardware implementation. The method is demonstrated in simulation and by a real-time implementation of a 3-layer network with 2694 neurons used for visual classification of MNIST handwritten digits with input from a 128 × 128 Dynamic Vision Sensor (DVS) silicon retina, and sensory-fusion using additional input from a 64-channel AER-EAR silicon cochlea. The system is implemented through the open-source software in the jAER project and runs in real-time on a laptop computer. It is demonstrated that the system can recognize digits in the presence of distractions, noise, scaling, translation and rotation, and that the degradation of recognition performance by using an event-based approach is less than 1%. Recognition is achieved in an average of 5.8 ms after the onset of the presentation of a digit. By cue integration from both silicon retina and cochlea outputs we show that the system can be biased to select the correct digit from otherwise ambiguous input.

19.
Article in English | MEDLINE | ID: mdl-24110926

ABSTRACT

We report on a neuromorphic sound localization circuit which can enhance perceptual sensation in a hearing aid system. All elements are simple leaky integrate-and-fire neuron circuits with different parameters optimized to suppress the impact of synaptic circuit noise. The detection range and resolution of the proposed neuromorphic circuit are 500 µs and 5 µs, respectively. Our results show that the proposed technique can localize a sound pulse of extremely narrow duration (∼1 ms) with a real-time response.


Subjects
Hearing Aids, Neurons/physiology, Sound Localization/physiology, Action Potentials/physiology, Cochlea/physiology, Computer Simulation, Humans, Models (Neurological), Silicon, Sound, Synapses/physiology, Time Factors
20.
Front Neurosci; 7: 275, 2013.
Article in English | MEDLINE | ID: mdl-24478619

ABSTRACT

Mobile robots need to know the terrain in which they are moving for path planning and obstacle avoidance. This paper proposes the combination of a bio-inspired, redundancy-suppressing dynamic vision sensor (DVS) with a pulsed line laser to allow fast terrain reconstruction. Stable laser stripe extraction is achieved by exploiting the sensor's ability to capture the temporal dynamics in a scene. An adaptive temporal filter for the sensor output allows a reliable reconstruction of 3D terrain surfaces. Laser stripe extraction at pulsing frequencies of up to 500 Hz was achieved using a 3 mW line laser at a distance of 45 cm, with an event-based algorithm that exploits the sparseness of the sensor output. As a proof of concept, unstructured rapid-prototype terrain samples have been successfully reconstructed with an accuracy of 2 mm.
