Pesquisa | Portal Regional da BVS

Efficient SNN multi-cores MAC array acceleration on SpiNNaker 2.

Huang, Jiaxin; Kelber, Florian; Vogginger, Bernhard; Liu, Chen; Kreutz, Felix; Gerhards, Pascal; Scholz, Daniel; Knobloch, Klaus; Mayr, Christian G.

Front Neurosci ; 17: 1223262, 2023.

Artigo em Inglês | MEDLINE | ID: mdl-37609449

RESUMO

The potential low-energy feature of the spiking neural network (SNN) engages the attention of the AI community. Only CPU-involved SNN processing inevitably results in an inherently long temporal span in the cases of large models and massive datasets. This study introduces the MAC array, a parallel architecture on each processing element (PE) of SpiNNaker 2, into the computational process of SNN inference. Based on the work of single-core optimization algorithms, we investigate the parallel acceleration algorithms for collaborating with multi-core MAC arrays. The proposed Echelon Reorder model information densification algorithm, along with the adapted multi-core two-stage splitting and authorization deployment strategies, achieves efficient spatio-temporal load balancing and optimization performance. We evaluate the performance by benchmarking a wide range of constructed SNN models to research on the influence degree of different factors. We also benchmark with two actual SNN models (the gesture recognition model of the real-world application and balanced random cortex-like network from neuroscience) on the neuromorphic multi-core hardware SpiNNaker 2. The echelon optimization algorithm with mixed processors realizes 74.28% and 85.78% memory footprint of the original MAC calculation on these two models, respectively. The execution time of echelon algorithms using only MAC or mixed processors accounts for ≤ 24.56% of the serial ARM baseline. Accelerating SNN inference with algorithms in this study is essentially the general sparse matrix-matrix multiplication (SpGEMM) problem. This article explicitly expands the application field of the SpGEMM issue to SNN, developing novel SpGEMM optimization algorithms fitting the SNN feature and MAC array.

E-prop on SpiNNaker 2: Exploring online learning in spiking RNNs on neuromorphic hardware.

Rostami, Amirhossein; Vogginger, Bernhard; Yan, Yexin; Mayr, Christian G.

Front Neurosci ; 16: 1018006, 2022.

Artigo em Inglês | MEDLINE | ID: mdl-36518534

RESUMO

Introduction: In recent years, the application of deep learning models at the edge has gained attention. Typically, artificial neural networks (ANNs) are trained on graphics processing units (GPUs) and optimized for efficient execution on edge devices. Training ANNs directly at the edge is the next step with many applications such as the adaptation of models to specific situations like changes in environmental settings or optimization for individuals, e.g., optimization for speakers for speech processing. Also, local training can preserve privacy. Over the last few years, many algorithms have been developed to reduce memory footprint and computation. Methods: A specific challenge to train recurrent neural networks (RNNs) for processing sequential data is the need for the Back Propagation Through Time (BPTT) algorithm to store the network state of all time steps. This limitation is resolved by the biologically-inspired E-prop approach for training Spiking Recurrent Neural Networks (SRNNs). We implement the E-prop algorithm on a prototype of the SpiNNaker 2 neuromorphic system. A parallelization strategy is developed to split and train networks on the ARM cores of SpiNNaker 2 to make efficient use of both memory and compute resources. We trained an SRNN from scratch on SpiNNaker 2 in real-time on the Google Speech Command dataset for keyword spotting. Result: We achieved an accuracy of 91.12% while requiring only 680 KB of memory for training the network with 25 K weights. Compared to other spiking neural networks with equal or better accuracy, our work is significantly more memory-efficient. Discussion: In addition, we performed a memory and time profiling of the E-prop algorithm. This is used on the one hand to discuss whether E-prop or BPTT is better suited for training a model at the edge and on the other hand to explore architecture modifications to SpiNNaker 2 to speed up online learning. Finally, energy estimations predict that the SRNN can be trained on SpiNNaker2 with 12 times less energy than using a NVIDIA V100 GPU.

Automotive Radar Processing With Spiking Neural Networks: Concepts and Challenges.

Vogginger, Bernhard; Kreutz, Felix; López-Randulfe, Javier; Liu, Chen; Dietrich, Robin; Gonzalez, Hector A; Scholz, Daniel; Reeb, Nico; Auge, Daniel; Hille, Julian; Arsalan, Muhammad; Mirus, Florian; Grassmann, Cyprian; Knoll, Alois; Mayr, Christian.

Front Neurosci ; 16: 851774, 2022.

Artigo em Inglês | MEDLINE | ID: mdl-35431782

RESUMO

Frequency-modulated continuous wave radar sensors play an essential role for assisted and autonomous driving as they are robust under all weather and light conditions. However, the rising number of transmitters and receivers for obtaining a higher angular resolution increases the cost for digital signal processing. One promising approach for energy-efficient signal processing is the usage of brain-inspired spiking neural networks (SNNs) implemented on neuromorphic hardware. In this article we perform a step-by-step analysis of automotive radar processing and argue how spiking neural networks could replace or complement the conventional processing. We provide SNN examples for two processing steps and evaluate their accuracy and computational efficiency. For radar target detection, an SNN with temporal coding is competitive to the conventional approach at a low compute overhead. Instead, our SNN for target classification achieves an accuracy close to a reference artificial neural network while requiring 200 times less operations. Finally, we discuss the specific requirements and challenges for SNN-based radar processing on neuromorphic hardware. This study proves the general applicability of SNNs for automotive radar processing and sustains the prospect of energy-efficient realizations in automated vehicles.

Efficient Reward-Based Structural Plasticity on a SpiNNaker 2 Prototype.

Yan, Yexin; Kappel, David; Neumarker, Felix; Partzsch, Johannes; Vogginger, Bernhard; Hoppner, Sebastian; Furber, Steve; Maass, Wolfgang; Legenstein, Robert; Mayr, Christian.

IEEE Trans Biomed Circuits Syst ; 13(3): 579-591, 2019 06.

Artigo em Inglês | MEDLINE | ID: mdl-30932847

RESUMO

Advances in neuroscience uncover the mechanisms employed by the brain to efficiently solve complex learning tasks with very limited resources. However, the efficiency is often lost when one tries to port these findings to a silicon substrate, since brain-inspired algorithms often make extensive use of complex functions, such as random number generators, that are expensive to compute on standard general purpose hardware. The prototype chip of the second generation SpiNNaker system is designed to overcome this problem. Low-power advanced RISC machine (ARM) processors equipped with a random number generator and an exponential function accelerator enable the efficient execution of brain-inspired algorithms. We implement the recently introduced reward-based synaptic sampling model that employs structural plasticity to learn a function or task. The numerical simulation of the model requires to update the synapse variables in each time step including an explorative random term. To the best of our knowledge, this is the most complex synapse model implemented so far on the SpiNNaker system. By making efficient use of the hardware accelerators and numerical optimizations, the computation time of one plasticity update is reduced by a factor of 2. This, combined with fitting the model into to the local static random access memory (SRAM), leads to 62% energy reduction compared to the case without accelerators and the use of external dynamic random access memory (DRAM). The model implementation is integrated into the SpiNNaker software framework allowing for scalability onto larger systems. The hardware-software system presented in this paper paves the way for power-efficient mobile and biomedical applications with biologically plausible brain-inspired algorithms.

Assuntos

Encéfalo/fisiologia , Aprendizado de Máquina , Modelos Neurológicos , Redes Neurais de Computação , Software , Sinapses/fisiologia , Humanos

Memory-Efficient Deep Learning on a SpiNNaker 2 Prototype.

Liu, Chen; Bellec, Guillaume; Vogginger, Bernhard; Kappel, David; Partzsch, Johannes; Neumärker, Felix; Höppner, Sebastian; Maass, Wolfgang; Furber, Steve B; Legenstein, Robert; Mayr, Christian G.

Front Neurosci ; 12: 840, 2018.

Artigo em Inglês | MEDLINE | ID: mdl-30505263

RESUMO

The memory requirement of deep learning algorithms is considered incompatible with the memory restriction of energy-efficient hardware. A low memory footprint can be achieved by pruning obsolete connections or reducing the precision of connection strengths after the network has been trained. Yet, these techniques are not applicable to the case when neural networks have to be trained directly on hardware due to the hard memory constraints. Deep Rewiring (DEEP R) is a training algorithm which continuously rewires the network while preserving very sparse connectivity all along the training procedure. We apply DEEP R to a deep neural network implementation on a prototype chip of the 2nd generation SpiNNaker system. The local memory of a single core on this chip is limited to 64 KB and a deep network architecture is trained entirely within this constraint without the use of external memory. Throughout training, the proportion of active connections is limited to 1.3%. On the handwritten digits dataset MNIST, this extremely sparse network achieves 96.6% classification accuracy at convergence. Utilizing the multi-processor feature of the SpiNNaker system, we found very good scaling in terms of computation time, per-core memory consumption, and energy constraints. When compared to a X86 CPU implementation, neural network training on the SpiNNaker 2 prototype improves power and energy consumption by two orders of magnitude.

Reducing the computational footprint for real-time BCPNN learning.

Vogginger, Bernhard; Schüffny, René; Lansner, Anders; Cederström, Love; Partzsch, Johannes; Höppner, Sebastian.

Front Neurosci ; 9: 2, 2015.

Artigo em Inglês | MEDLINE | ID: mdl-25657618

RESUMO

The implementation of synaptic plasticity in neural simulation or neuromorphic hardware is usually very resource-intensive, often requiring a compromise between efficiency and flexibility. A versatile, but computationally-expensive plasticity mechanism is provided by the Bayesian Confidence Propagation Neural Network (BCPNN) paradigm. Building upon Bayesian statistics, and having clear links to biological plasticity processes, the BCPNN learning rule has been applied in many fields, ranging from data classification, associative memory, reward-based learning, probabilistic inference to cortical attractor memory networks. In the spike-based version of this learning rule the pre-, postsynaptic and coincident activity is traced in three low-pass-filtering stages, requiring a total of eight state variables, whose dynamics are typically simulated with the fixed step size Euler method. We derive analytic solutions allowing an efficient event-driven implementation of this learning rule. Further speedup is achieved by first rewriting the model which reduces the number of basic arithmetic operations per update to one half, and second by using look-up tables for the frequently calculated exponential decay. Ultimately, in a typical use case, the simulation using our approach is more than one order of magnitude faster than with the fixed step size Euler method. Aiming for a small memory footprint per BCPNN synapse, we also evaluate the use of fixed-point numbers for the state variables, and assess the number of bits required to achieve same or better accuracy than with the conventional explicit Euler method. All of this will allow a real-time simulation of a reduced cortex model based on BCPNN in high performance computing. More important, with the analytic solution at hand and due to the reduced memory bandwidth, the learning rule can be efficiently implemented in dedicated or existing digital neuromorphic hardware.

Characterization and compensation of network-level anomalies in mixed-signal neuromorphic modeling platforms.

Petrovici, Mihai A; Vogginger, Bernhard; Müller, Paul; Breitwieser, Oliver; Lundqvist, Mikael; Muller, Lyle; Ehrlich, Matthias; Destexhe, Alain; Lansner, Anders; Schüffny, René; Schemmel, Johannes; Meier, Karlheinz.

PLoS One ; 9(10): e108590, 2014.

Artigo em Inglês | MEDLINE | ID: mdl-25303102

RESUMO

Advancing the size and complexity of neural network models leads to an ever increasing demand for computational resources for their simulation. Neuromorphic devices offer a number of advantages over conventional computing architectures, such as high emulation speed or low power consumption, but this usually comes at the price of reduced configurability and precision. In this article, we investigate the consequences of several such factors that are common to neuromorphic devices, more specifically limited hardware resources, limited parameter configurability and parameter variations due to fixed-pattern noise and trial-to-trial variability. Our final aim is to provide an array of methods for coping with such inevitable distortion mechanisms. As a platform for testing our proposed strategies, we use an executable system specification (ESS) of the BrainScaleS neuromorphic system, which has been designed as a universal emulation back-end for neuroscientific modeling. We address the most essential limitations of this device in detail and study their effects on three prototypical benchmark network models within a well-defined, systematic workflow. For each network model, we start by defining quantifiable functionality measures by which we then assess the effects of typical hardware-specific distortion mechanisms, both in idealized software simulations and on the ESS. For those effects that cause unacceptable deviations from the original network dynamics, we suggest generic compensation mechanisms and demonstrate their effectiveness. Both the suggested workflow and the investigated compensation mechanisms are largely back-end independent and do not require additional hardware configurability beyond the one required to emulate the benchmark networks in the first place. We hereby provide a generic methodological environment for configurable neuromorphic devices that are targeted at emulating large-scale, functional neural networks.

Assuntos

Simulação por Computador , Sistemas Computacionais , Redes Neurais de Computação , Computadores , Desenho de Equipamento , Modelos Neurológicos , Neurônios/fisiologia , Software

VLSI Implementation of a 2.8 Gevent/s Packet-Based AER Interface with Routing and Event Sorting Functionality.

Scholze, Stefan; Schiefer, Stefan; Partzsch, Johannes; Hartmann, Stephan; Mayr, Christian Georg; Höppner, Sebastian; Eisenreich, Holger; Henker, Stephan; Vogginger, Bernhard; Schüffny, Rene.

Front Neurosci ; 5: 117, 2011.

Artigo em Inglês | MEDLINE | ID: mdl-22016720

RESUMO

State-of-the-art large-scale neuromorphic systems require sophisticated spike event communication between units of the neural network. We present a high-speed communication infrastructure for a waferscale neuromorphic system, based on application-specific neuromorphic communication ICs in an field programmable gate arrays (FPGA)-maintained environment. The ICs implement configurable axonal delays, as required for certain types of dynamic processing or for emulating spike-based learning among distant cortical areas. Measurements are presented which show the efficacy of these delays in influencing behavior of neuromorphic benchmarks. The specialized, dedicated address-event-representation communication in most current systems requires separate, low-bandwidth configuration channels. In contrast, the configuration of the waferscale neuromorphic system is also handled by the digital packet-based pulse channel, which transmits configuration data at the full bandwidth otherwise used for pulse transmission. The overall so-called pulse communication subgroup (ICs and FPGA) delivers a factor 25-50 more event transmission rate than other current neuromorphic communication infrastructures.

A comprehensive workflow for general-purpose neural modeling with highly configurable neuromorphic hardware systems.

Brüderle, Daniel; Petrovici, Mihai A; Vogginger, Bernhard; Ehrlich, Matthias; Pfeil, Thomas; Millner, Sebastian; Grübl, Andreas; Wendt, Karsten; Müller, Eric; Schwartz, Marc-Olivier; de Oliveira, Dan Husmann; Jeltsch, Sebastian; Fieres, Johannes; Schilling, Moritz; Müller, Paul; Breitwieser, Oliver; Petkov, Venelin; Muller, Lyle; Davison, Andrew P; Krishnamurthy, Pradeep; Kremkow, Jens; Lundqvist, Mikael; Muller, Eilif; Partzsch, Johannes; Scholze, Stefan; Zühl, Lukas; Mayr, Christian; Destexhe, Alain; Diesmann, Markus; Potjans, Tobias C; Lansner, Anders; Schüffny, René; Schemmel, Johannes; Meier, Karlheinz.

Biol Cybern ; 104(4-5): 263-96, 2011 May.

Artigo em Inglês | MEDLINE | ID: mdl-21618053

RESUMO

In this article, we present a methodological framework that meets novel requirements emerging from upcoming types of accelerated and highly configurable neuromorphic hardware systems. We describe in detail a device with 45 million programmable and dynamic synapses that is currently under development, and we sketch the conceptual challenges that arise from taking this platform into operation. More specifically, we aim at the establishment of this neuromorphic system as a flexible and neuroscientifically valuable modeling tool that can be used by non-hardware experts. We consider various functional aspects to be crucial for this purpose, and we introduce a consistent workflow with detailed descriptions of all involved modules that implement the suggested steps: The integration of the hardware interface into the simulator-independent model description language PyNN; a fully automated translation between the PyNN domain and appropriate hardware configurations; an executable specification of the future neuromorphic system that can be seamlessly integrated into this biology-to-hardware mapping process as a test bench for all software layers and possible hardware design modifications; an evaluation scheme that deploys models from a dedicated benchmark library, compares the results generated by virtual or prototype hardware devices with reference software simulations and analyzes the differences. The integration of these components into one hardware-software workflow provides an ecosystem for ongoing preparative studies that support the hardware design process and represents the basis for the maturity of the model-to-hardware mapping software. The functionality and flexibility of the latter is proven with a variety of experimental results.

Assuntos

Computadores , Modelos Teóricos , Sistema Nervoso

RESUMO

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA