Search | VHL Regional Portal

1.

SHA-256 Hardware Proposal for IoT Devices in the Blockchain Context.

Santos, Carlos E B; Silva, Lucileide M D da; Torquato, Matheus F; Silva, Sérgio N; Fernandes, Marcelo A C.

Sensors (Basel) ; 24(12), 2024 Jun 17.

Article in English | MEDLINE | ID: mdl-38931692

ABSTRACT

This work proposes an implementation of the SHA-256, the most common blockchain hash algorithm, on a field-programmable gate array (FPGA) to improve processing capacity and power saving in Internet of Things (IoT) devices to solve security and privacy issues. This implementation presents a different approach than other papers in the literature, using clustered cores executing the SHA-256 algorithm in parallel. Details about the proposed architecture and an analysis of the resources used by the FPGA are presented. The implementation achieved a throughput of approximately 1.4 Gbps for 16 cores on a single FPGA. Furthermore, it saved dynamic power, using almost 1000 times less compared to previous works in the literature, making this proposal suitable for practical problems for IoT devices in blockchain environments. The target FPGA used was the Xilinx Virtex 6 xc6vlx240t-1ff1156.
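As a rough software analogy to the clustered-core idea in this abstract, the sketch below hashes several independent messages concurrently with Python's standard `hashlib`. It is not the authors' FPGA architecture; the function names and the default of 16 workers (chosen only to mirror the 16 cores mentioned above) are assumptions for illustration.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def sha256_hex(message: bytes) -> str:
    """Return the SHA-256 digest of one message as a hex string."""
    return hashlib.sha256(message).hexdigest()


def hash_in_parallel(messages, workers=16):
    """Hash independent messages concurrently.

    Each worker plays the role of one hash core (an assumption made for
    illustration). Results are returned in the same order as the inputs.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(sha256_hex, messages))


if __name__ == "__main__":
    for digest in hash_in_parallel([b"abc", b"block header 1", b"block header 2"]):
        print(digest)
```

The parallelism here is across messages, since each SHA-256 computation is internally sequential from one block to the next.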

2.

Artificial cerebellum on FPGA: realistic real-time cerebellar spiking neural network model capable of real-world adaptive motor control.

Shinji, Yusuke; Okuno, Hirotsugu; Hirata, Yutaka.

Front Neurosci ; 18: 1220908, 2024.

Article in English | MEDLINE | ID: mdl-38726031

ABSTRACT

The cerebellum plays a central role in motor control and learning. Its neuronal network architecture, firing characteristics of component neurons, and learning rules at their synapses have been well understood in terms of anatomy and physiology. A realistic artificial cerebellum with mimetic network architecture and synaptic plasticity mechanisms may allow us to analyze cerebellar information processing in the real world by applying it to adaptive control of actual machines. Several artificial cerebellums have previously been constructed, but they require high-performance hardware to run in real-time for real-world machine control. Presently, we implemented an artificial cerebellum with the size of 10^4 spiking neuron models on a field-programmable gate array (FPGA) which is compact, lightweight, portable, and low-power-consumption. In the implementation three novel techniques are employed: (1) 16-bit fixed-point operation and randomized rounding, (2) fully connected spike information transmission, and (3) alternative memory that uses pseudo-random number generators. We demonstrate that the FPGA artificial cerebellum runs in real-time, and its component neuron models behave as those in the corresponding artificial cerebellum configured on a personal computer in Python. We applied the FPGA artificial cerebellum to the adaptive control of a machine in the real world and demonstrated that the artificial cerebellum is capable of adaptively reducing control error after sudden load changes. This is the first implementation and demonstration of a spiking artificial cerebellum on an FPGA applicable to real-world adaptive control. The FPGA artificial cerebellum may provide neuroscientific insights into cerebellar information processing in adaptive motor control and may be applied to various neuro-devices to augment and extend human motor control capabilities.
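Technique (1) above can be illustrated with a minimal sketch of stochastic rounding into a signed 16-bit fixed-point value. This is one common reading of "randomized rounding"; whether it matches the authors' exact scheme is an assumption, as are the function names and the choice of 8 fractional bits.

```python
import math
import random


def to_fixed_stochastic(x, frac_bits=8, rng=random):
    """Quantize x to a signed 16-bit fixed-point integer.

    The value is rounded up with probability equal to its fractional
    remainder, so the rounding error is zero on average.
    """
    scaled = x * (1 << frac_bits)
    lower = math.floor(scaled)
    remainder = scaled - lower          # in [0, 1)
    value = lower + (1 if rng.random() < remainder else 0)
    return max(-32768, min(32767, value))  # saturate to the int16 range


def from_fixed(value, frac_bits=8):
    """Convert a fixed-point integer back to a float."""
    return value / (1 << frac_bits)


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [to_fixed_stochastic(0.3, frac_bits=4, rng=rng) for _ in range(10000)]
    print(sum(samples) / len(samples) / 16)  # close to 0.3 on average
```

Compared with always rounding to nearest, this keeps small updates from being systematically lost when they are below one quantization step, which is the usual motivation for the technique in low-precision learning rules.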

3.

Hybrid Filtering Compensation Algorithm for Suppressing Random Errors in MEMS Arrays.

Liang, Siyuan; Guo, Tianyu; Chen, Rongrong; Li, Xuguang.

Micromachines (Basel) ; 15(5), 2024 Apr 24.

Article in English | MEDLINE | ID: mdl-38793131

ABSTRACT

To solve the high error phenomenon of microelectromechanical systems (MEMS) due to their poor signal-to-noise ratio, this paper proposes an online compensation algorithm wavelet threshold back-propagation neural network (WT-BPNN), based on a neural network and designed to effectively suppress the random error of MEMS arrays. The algorithm denoises MEMS and compensates for the error using a back propagation neural network (BPNN). To verify the feasibility of the proposed algorithm, we deployed it in a ZYNQ-based MEMS array hardware. The experimental results showed that the zero-bias instability, angular random wander, and angular velocity random wander of the gyroscope were improved by about 12 dB, 10 dB, and 7 dB, respectively, compared with the original device in static scenarios, and the dispersion of the output data was reduced by about 8 dB in various dynamic environments, which effectively verified the robustness and feasibility of the algorithm.

4.

Hardware Acceleration of Digital Pulse Shape Analysis Using FPGAs.

González, César; Ruiz, Mariano; Carpeño, Antonio; Piñas, Alejandro; Cano-Ott, Daniel; Plaza, Julio; Martinez, Trino; Villamarin, David.

Sensors (Basel) ; 24(9), 2024 Apr 25.

Article in English | MEDLINE | ID: mdl-38732830

ABSTRACT

The BC501A sensor is a liquid scintillator frequently used in nuclear physics for detecting fast neutrons. This paper describes a hardware implementation of digital pulse shape analysis (DPSA) for real-time analysis. DPSA is an algorithm that extracts the physically relevant parameters from the detected BC501A signals. The hardware solution is implemented in a MicroTCA system that provides the physical, mechanical, electrical, and cooling support for an AMC board (NAMC-ZYNQ-FMC) with a Xilinx Zynq UltraScale+ MPSoC. The Xilinx FPGA programmable logic implements a JESD204B interface to high-speed ADCs. The physical and datalink JESD204B layers are implemented using hardware description language (HDL), while the Xilinx high-level synthesis language (HLS) is used for the transport and application layers. The DPSA algorithm is a JESD204B application layer that includes a FIR filter and a constant fraction discriminator (CFD) function, a baseline calculation function, a peak detection function, and an energy calculation function. This architecture achieves an analysis mean time of less than 100 µs per signal with an FPGA resource utilization of about 50% of its most used resources. This paper presents a high-performance DPSA embedded system that interfaces with a 1 GS/s ADC and performs accurate calculations with relatively low latency.
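The processing stages named in this abstract (FIR filter, CFD, baseline, peak, energy) can be sketched in software as below. This is a generic textbook-style illustration, not the authors' HLS code: the moving-average coefficients, the CFD fraction and delay, the baseline window, and the use of a plain sample sum as the energy estimate are all assumptions.

```python
def moving_average_fir(samples, taps=4):
    """FIR low-pass with equal coefficients (early outputs use a shorter window)."""
    out = []
    for i in range(len(samples)):
        window = samples[max(0, i - taps + 1): i + 1]
        out.append(sum(window) / len(window))
    return out


def cfd_time(samples, fraction=0.5, delay=2):
    """Constant fraction discriminator on a positive pulse.

    Forms fraction * x[n] - x[n - delay] and returns the linearly
    interpolated index where that signal crosses from positive to
    non-positive, or None if there is no crossing.
    """
    shaped = [
        fraction * samples[i] - (samples[i - delay] if i >= delay else 0.0)
        for i in range(len(samples))
    ]
    for i in range(1, len(shaped)):
        if shaped[i - 1] > 0 >= shaped[i]:
            return (i - 1) + shaped[i - 1] / (shaped[i - 1] - shaped[i])
    return None


def analyze_pulse(samples, baseline_len=4):
    """Baseline, peak, energy and CFD timing for one digitized pulse."""
    baseline = sum(samples[:baseline_len]) / baseline_len
    corrected = [s - baseline for s in samples]
    peak_index = max(range(len(corrected)), key=corrected.__getitem__)
    return {
        "baseline": baseline,
        "peak_index": peak_index,
        "peak_amplitude": corrected[peak_index],
        "energy": sum(corrected),  # sum of baseline-corrected samples
        "cfd_time": cfd_time(corrected),
    }


if __name__ == "__main__":
    pulse = [10, 10, 10, 10, 14, 30, 50, 30, 18, 12, 10, 10]
    print(analyze_pulse(pulse))
```

The CFD timestamp depends on the pulse shape rather than its amplitude, which is why it is commonly preferred over a fixed threshold for timing pulses of varying height.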

5.

A Versatile Approach for Adaptive Grid Mapping and Grid Flex-Graph Exploration with a Field-Programmable Gate Array-Based Robot Using Hardware Schemes.

Basha, Mudasar; Siva Kumar, Munuswamy; Chinnaiah, Mangali Chinna; Lam, Siew-Kei; Srikanthan, Thambipillai; Divya Vani, Gaddam; Janardhan, Narambhatla; Hari Krishna, Dodde; Dubey, Sanjay.

Sensors (Basel) ; 24(9), 2024 Apr 26.

Article in English | MEDLINE | ID: mdl-38732882

ABSTRACT

Robotic exploration in dynamic and complex environments requires advanced adaptive mapping strategies to ensure accurate representation of the environments. This paper introduces an innovative grid flex-graph exploration (GFGE) algorithm designed for single-robot mapping. This hardware-scheme-based algorithm leverages a combination of quad-grid and graph structures to enhance the efficiency of both local and global mapping implemented on a field-programmable gate array (FPGA). This novel research work involved using sensor fusion to analyze a robot's behavior and flexibility in the presence of static and dynamic objects. A behavior-based grid construction algorithm was proposed for the construction of a quad-grid that represents the occupancy of frontier cells. The selection of the next exploration target in a graph-like structure was proposed using partial reconfiguration-based frontier-graph exploration approaches. The complete exploration method handles the data when updating the local map to optimize the redundant exploration of previously explored nodes. Together, the exploration handles the quadtree-like structure efficiently under dynamic and uncertain conditions with a parallel processing architecture. Integrating several algorithms into indoor robotics was a complex process, and a Xilinx-based partial reconfiguration approach was used to prevent computing difficulties when running many algorithms simultaneously. These algorithms were developed, simulated, and synthesized using the Verilog hardware description language on Zynq SoC. Experiments were carried out utilizing a robot based on a field-programmable gate array (FPGA), and the resource utilization and power consumption of the device were analyzed.

6.

An Edge Computing System with AMD Xilinx FPGA AI Customer Platform for Advanced Driver Assistance System.

Chi, Tsun-Kuang; Chen, Tsung-Yi; Lin, Yu-Chen; Lin, Ting-Lan; Zhang, Jun-Ting; Lu, Cheng-Lin; Chen, Shih-Lun; Li, Kuo-Chen; Abu, Patricia Angela R.

Sensors (Basel) ; 24(10), 2024 May 13.

Article in English | MEDLINE | ID: mdl-38793952

ABSTRACT

The convergence of edge computing systems with Field-Programmable Gate Array (FPGA) technology has shown considerable promise in enhancing real-time applications across various domains. This paper presents an innovative edge computing system design specifically tailored for pavement defect detection within the Advanced Driver-Assistance Systems (ADASs) domain. The system seamlessly integrates the AMD Xilinx AI platform into a customized circuit configuration, capitalizing on its capabilities. Utilizing cameras as input sensors to capture road scenes, the system employs a Deep Learning Processing Unit (DPU) to execute the YOLOv3 model, enabling the identification of three distinct types of pavement defects with high accuracy and efficiency. Following defect detection, the system efficiently transmits detailed information about the type and location of detected defects via the Controller Area Network (CAN) interface. This integration of FPGA-based edge computing not only enhances the speed and accuracy of defect detection, but also facilitates real-time communication between the vehicle's onboard controller and external systems. Moreover, the successful integration of the proposed system transforms ADAS into a sophisticated edge computing device, empowering the vehicle's onboard controller to make informed decisions in real time. These decisions are aimed at enhancing the overall driving experience by improving safety and performance metrics. The synergy between edge computing and FPGA technology not only advances ADAS capabilities, but also paves the way for future innovations in automotive safety and assistance systems.

7.

An Ultralow-Power Real-Time Machine Learning Based fNIRS Motion Artifacts Detection.

Ercan, Renas; Xia, Yunjia; Zhao, Yunyi; Loureiro, Rui; Yang, Shufan; Zhao, Hubin.

IEEE Trans Very Large Scale Integr VLSI Syst ; 32(4): 763-773, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38765316

ABSTRACT

Due to iterative matrix multiplications or gradient computations, machine learning modules often require a large amount of processing power and memory. As a result, they are often not feasible for use in wearable devices, which have limited processing power and memory. In this study, we propose an ultralow-power and real-time machine learning-based motion artifact detection module for functional near-infrared spectroscopy (fNIRS) systems. We achieved a high classification accuracy of 97.42%, low field-programmable gate array (FPGA) resource utilization of 38354 lookup tables and 6024 flip-flops, as well as low power consumption of 0.021 W in dynamic power. These results outperform conventional CPU support vector machine (SVM) methods and other state-of-the-art SVM implementations. This study has demonstrated that an FPGA-based fNIRS motion artifact classifier can be exploited while meeting low power and resource constraints, which are crucial in embedded hardware systems while keeping high classification accuracy.

8.

FPGA-based fast bin-ratio spiking ensemble network for radioisotope identification.

Xie, Shouyu; Jones, Edward; Zhang, Siru; Marsden, Edward; Baistow, Ian; Furber, Steve; Mitra, Srinjoy; Hamilton, Alister.

Neural Netw ; 176: 106332, 2024 Aug.

Article in English | MEDLINE | ID: mdl-38678831

ABSTRACT

In this work, we demonstrate the training, conversion, and implementation flow of an FPGA-based bin-ratio ensemble spiking neural network applied for radioisotope identification. The combination of techniques including learned step quantisation (LSQ) and pruning facilitated the implementation by compressing the network's parameters down to 30% yet retaining the accuracy of 97.04% with an accuracy loss of less than 1%. Meanwhile, the proposed ensemble network of 20 3-layer spiking neural networks (SNNs), which incorporates 1160 spiking neurons, only needs 334 µs for a single inference with the given clock frequency of 100 MHz. Under such optimisation, this FPGA implementation in an Artix-7 board consumes 157 µJ per inference by estimation.
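The figures quoted in this abstract can be related with simple arithmetic. The sketch below derives the clock-cycle count per inference and an implied average power, assuming (this is an assumption, not a statement from the abstract) that the 157 µJ is spent evenly over the 334 µs inference. The function names are illustrative.

```python
def cycles_per_inference(latency_s, clock_hz):
    """Number of clock cycles that fit into one inference."""
    return round(latency_s * clock_hz)


def average_power_w(energy_j, latency_s):
    """Average power if the energy is spent over the inference latency."""
    return energy_j / latency_s


if __name__ == "__main__":
    latency = 334e-6   # 334 µs per inference
    clock = 100e6      # 100 MHz
    energy = 157e-6    # 157 µJ per inference (estimated, per the abstract)
    print(cycles_per_inference(latency, clock))       # 33400 cycles
    print(round(average_power_w(energy, latency), 3)) # 0.47 W
```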

Subject(s)

Neural Networks, Computer; Neurons; Neurons/physiology; Action Potentials/physiology; Radioisotopes; Algorithms; Humans

9.

Design of efficient binary multiplier architecture using hybrid compressor with FPGA implementation.

Thamizharasan, V; Parthipan, V.

Sci Rep ; 14(1): 8492, 2024 Apr 11.

Article in English | MEDLINE | ID: mdl-38605103

ABSTRACT

Multipliers are essential components of the arithmetic functional units used in many signal processing applications, such as digital signal processors, image/video processing, machine learning, cryptography, and arithmetic and logic units (ALUs). Many multiplier designs have been proposed in recent years. Among them, the Vedic multiplier is a high-performance design used in signal and image processing applications. To improve the performance of this multiplier further, a novel multiplier using a hybrid compressor is proposed. The proposed hybrid compressor-based multiplier is designed and implemented on a field-programmable gate array (FPGA, Spartan-6). The synthesis results show that the speed of the proposed hybrid compressor-based multiplier improves on that of the array multiplier (35.83%), Wallace tree multiplier (34.58%), Vedic multiplier based on a carry look-ahead adder (CLA) (28.49%), Vedic multiplier based on a ripple carry adder (RCA) (20.65%), Booth multiplier (21.65%), Vedic multiplier based on a Han-Carlson adder (HCA) (20.10%), hybrid multiplier using a carry select adder (CSELA) (17.81%), and hybrid Vedic multiplier (7.15%).

10.

Flare: An FPGA-Based Full Precision Low Power CNN Accelerator with Reconfigurable Structure.

Xu, Yuhua; Luo, Jie; Sun, Wei.

Sensors (Basel) ; 24(7), 2024 Mar 31.

Article in English | MEDLINE | ID: mdl-38610450

ABSTRACT

Convolutional neural networks (CNNs) have significantly advanced various fields; however, their computational demands and power consumption have escalated, posing challenges for deployment in low-power scenarios. To address this issue and facilitate the application of CNNs in power constrained environments, the development of dedicated CNN accelerators is crucial. Prior research has predominantly concentrated on developing low precision CNN accelerators using code generated from high-level synthesis (HLS) tools. Unfortunately, these approaches often fail to efficiently utilize the computational resources of field-programmable gate arrays (FPGAs) and do not extend well to full precision scenarios. To overcome these limitations, we integrate vector dot products to unify the convolution and fully connected layers. By treating the row vector of input feature maps as the fundamental processing unit, we balance processing latency and resource consumption while eliminating data rearrangement time. Furthermore, an accurate design space exploration (DSE) model is established to identify the optimal design points for each CNN layer, and dynamic partial reconfiguration is employed to maximize each layer's access to computational resources. Our approach is validated through the implementation of AlexNet and VGG16 on 7A100T and ZU15EG platforms, respectively. We achieve an average convolutional layer throughput of 28.985 GOP/s and 246.711 GOP/s for full precision. Notably, the proposed accelerator demonstrates remarkable power efficiency, with a maximum improvement of 23.989 and 15.376 times compared to current state-of-the-art FPGA implementations.

11.

High-Performance Reconfigurable Pipeline Implementation for FPGA-Based SmartNIC.

Song, Xiaoyong; Lu, Rui; Guo, Zhichuan.

Micromachines (Basel) ; 15(4), 2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38675261

ABSTRACT

As the key module of programmable switches or the SmartNIC card, the packet processing pipeline undertakes the task of packet forwarding and processing. However, the current pipeline for the FPGA-based SmartNIC is inflexible, and the related reconfigurable commercial device designs are closed-source. To solve this problem, this paper proposes a high-performance reconfigurable pipeline design, which has fully reconfigurable match-action units, supporting various network functions by its flexible reconfiguration. The fields of the match key and the size of the match table can be reconfigured without recompiling the HDL code or modifying the hardware. The processing rules and action instructions for the pipeline can be dynamically installed by the configuration module at runtime. We implement our design on the Xilinx Alveo U200 board with a Virtex UltraScale+ XCU200-2FSGD2104E FPGA and show that the designed pipeline supports fast reconfiguration to implement new network functions and that the throughput of the designed pipeline reaches 100 Gbps with low latency.

12.

FPGA-Microprocessor Based Sensor for Faults Detection in Induction Motors Using Time-Frequency and Machine Learning Methods.

Osornio-Rios, Roque Alfredo; Cueva-Perez, Isaias; Alvarado-Hernandez, Alvaro Ivan; Dunai, Larisa; Zamudio-Ramirez, Israel; Antonino-Daviu, Jose Alfonso.

Sensors (Basel) ; 24(8), 2024 Apr 22.

Article in English | MEDLINE | ID: mdl-38676270

ABSTRACT

Induction motors (IM) play a fundamental role in the industrial sector because they are robust, efficient, and low-cost machines. Changes in the environment, installation errors, or modifications to working conditions can generate faults in induction motors. The trend on IM fault detection is focused on the design techniques and sensors capable of evaluating multiple faults with various signals using non-invasive analysis. The methodology is based on processing electric current signals by applying the short-time Fourier transform (STFT). Additionally, the computation of the mean and standard deviation of infrared thermograms is proposed as main indicators. The proposed system combines both parameters by means of Support Vector Machine and k-nearest-neighbor classifiers. The development of the diagnostic system was done with digital hardware implementations using a Xilinx PYNQ Z2 card that integrates an FPGA with a microprocessor, thus taking advantage of the acquisition and processing of digital signals and images in hardware. The proposed method has proved to be effective for the classification of healthy (HLT), misalignment (MAMT), unbalance (UNB), damaged bearing (BDF), and broken rotor bar (BRB) faults with an accuracy close to 99%.

13.

Real-Time Direction Judgment System for Dual-Frequency Laser Interferometer.

Zeng, Qilin; Chen, Wenwei; Du, Hua; Zhang, Wentao; Xiong, Xianming; Zhao, Zhengyi; Zhou, Fangjun; Guo, Xin; Xu, Le.

Sensors (Basel) ; 24(7), 2024 Mar 22.

Article in English | MEDLINE | ID: mdl-38610242

ABSTRACT

Current real-time direction judgment systems are inaccurate and insensitive, as well as limited by the sampling rate of analog-to-digital converters. To address this problem, we propose a dynamic real-time direction judgment system based on an integral dual-frequency laser interferometer and field-programmable gate array technology. The optoelectronic signals resulting from the introduction of a phase subdivision method based on the amplitude resolution of the laser interferometer when measuring displacement are analyzed. The proposed system integrates the optoelectronic signals to increase the accuracy of its direction judgments and ensures these direction judgments are made in real time by dynamically controlling the integration time. Several experiments were conducted to verify the performance of the proposed system. The results show that, compared with current real-time direction judgment systems, the proposed system makes accurate judgements during low-speed motions and can update directions within 0.125 cycles of the phase difference change at different speeds. Moreover, a sweep frequency experiment confirmed the system's ability to effectively judge dynamic directions. The proposed system is capable of accurate and real-time directional judgment during low-speed movements of a table in motion.

14.

Advanced direct torque control based on neural tree controllers for induction motor drives.

Aissa, Oualid; Reffas, Abderrahim; Krama, Abdelbasset; Benkercha, Rabah; Talhaoui, Hicham; Abu-Rub, Haitham.

ISA Trans ; 148: 92-104, 2024 May.

Article in English | MEDLINE | ID: mdl-38570257

ABSTRACT

This paper introduces a novel direct torque control approach based on the decision tree (T-DTC), employing artificial neural networks that are effectively trained to enhance accuracy and robustness. The main objective of T-DTC is the substantial reduction of flux and torque ripples inherent in the conventional DTC, ensuring effective control of the induction motor. The conventional hysteresis controllers for stator flux and electromagnetic torque are replaced by two advanced controllers named M5 Prime model trees. Additionally, the traditional switching table is substituted with a novel decision tree table utilizing the C4.5 classifier algorithm. The effectiveness of the proposed T-DTC strategy is demonstrated through simulation in MATLAB/Simulink and validated in real-time using an HIL platform based on OPAL-RT OP 5600 and Virtex 6 FPGA ML605. The results obtained demonstrate a notable improvement compared to existing techniques in the literature.

15.

ExaFlexHH: an exascale-ready, flexible multi-FPGA library for biologically plausible brain simulations.

Miedema, Rene; Strydis, Christos.

Front Neuroinform ; 18: 1330875, 2024.

Article in English | MEDLINE | ID: mdl-38680548

ABSTRACT

Introduction: In-silico simulations are a powerful tool in modern neuroscience for enhancing our understanding of complex brain systems at various physiological levels. To model biologically realistic and detailed systems, an ideal simulation platform must possess: (1) high performance and performance scalability, (2) flexibility, and (3) ease of use for non-technical users. However, most existing platforms and libraries do not meet all three criteria, particularly for complex models such as the Hodgkin-Huxley (HH) model or for complex neuron-connectivity modeling such as gap junctions. Methods: This work introduces ExaFlexHH, an exascale-ready, flexible library for simulating HH models on multi-FPGA platforms. Utilizing FPGA-based Data-Flow Engines (DFEs) and the dataflow programming paradigm, ExaFlexHH addresses all three requirements. The library is also parameterizable and compliant with NeuroML, a prominent brain-description language in computational neuroscience. We demonstrate the performance scalability of the platform by implementing a highly demanding extended-Hodgkin-Huxley (eHH) model of the Inferior Olive using ExaFlexHH. Results: Model simulation results show linear scalability for unconnected networks and near-linear scalability for networks with complex synaptic plasticity, with a 1.99 × performance increase using two FPGAs compared to a single FPGA simulation, and 7.96 × when using eight FPGAs in a scalable ring topology. Notably, our results also reveal consistent performance efficiency in GFLOPS per watt, further facilitating exascale-ready computing speeds and pushing the boundaries of future brain-simulation platforms. Discussion: The ExaFlexHH library shows superior resource efficiency, quantified in FLOPS per hardware resources, benchmarked against other competitive FPGA-based brain simulation implementations.
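The "near-linear scalability" claim in the Results can be checked with a short calculation of parallel efficiency, defined here as speedup divided by device count. The speedup values are taken from the abstract; the function name is illustrative.

```python
def parallel_efficiency(speedup, n_devices):
    """Fraction of ideal linear speedup achieved with n_devices."""
    return speedup / n_devices


if __name__ == "__main__":
    # Speedups reported in the abstract, relative to a single FPGA.
    for n_fpgas, speedup in [(2, 1.99), (8, 7.96)]:
        eff = parallel_efficiency(speedup, n_fpgas)
        print(f"{n_fpgas} FPGAs: {eff:.1%} of ideal")  # 99.5% in both cases
```

Both configurations come out at 99.5% of ideal, so the reported efficiency does not degrade between two and eight FPGAs.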

16.

Machine learning algorithms for FPGA Implementation in biomedical engineering applications: A review.

Altman, Morteza Babaee; Wan, Wenbin; Hosseini, Amineh Sadat; Arabi Nowdeh, Saber; Alizadeh, Masoumeh.

Heliyon ; 10(4): e26652, 2024 Feb 29.

Article in English | MEDLINE | ID: mdl-38434008

ABSTRACT

Field Programmable Gate Arrays (FPGAs) are integrated circuits that can be configured by the user after manufacturing, making them suitable for customized hardware prototypes, a feature not available in general-purpose processors or Application-Specific Integrated Circuits (ASICs). In this paper, we review the wide range of Machine Learning (ML) algorithms implemented on FPGAs to increase performance and capabilities in healthcare technology over 2001-2023. In particular, we focus on real-time ML algorithms targeted to FPGAs and hybrid System-on-a-chip (SoC) FPGA architectures for biomedical applications. We discuss how previous works have customized and optimized their ML algorithm and FPGA designs to address the putative embedded systems challenges of limited memory, hardware, and power resources while maintaining scalability to accommodate different network sizes and topologies. We provide a synthesis of articles implementing classifiers and regression algorithms, as they are significant algorithms that cover a wide range of ML algorithms used for biomedical applications. This article is written to inform the biomedical engineering and FPGA design communities to advance knowledge of FPGA-enabled ML accelerators for biomedical applications.

17.

Leveraging neuro-inspired AI accelerator for high-speed computing in 6G networks.

Lin, Chunxiao; Azmine, Muhammad Farhan; Liang, Yibin; Yi, Yang.

Front Comput Neurosci ; 18: 1345644, 2024.

Article in English | MEDLINE | ID: mdl-38449671

ABSTRACT

The field of wireless communication is currently being pushed to new boundaries with the emergence of 6G technology. This advanced technology requires substantially increased data rates and processing speeds while simultaneously requiring energy-efficient solutions for real-world practicality. In this work, we apply a neuroscience-inspired machine learning model called echo state network (ESN) to the critical task of symbol detection in massive MIMO-OFDM systems, a key technology for 6G networks. Our work encompasses the design of a hardware-accelerated reservoir neuron architecture to speed up the ESN-based symbol detector. The design is then validated through a proof of concept on the Xilinx Virtex-7 FPGA board in real-world scenarios. The experiment results show the great performance and scalability of our symbol detector design across a range of MIMO configurations, compared with traditional MIMO symbol detection methods like linear minimum mean square error. Our findings also confirm the performance and feasibility of our entire system, reflected in low bit error rates, low resource utilization, and high throughput.

18.

Efficient Neural Networks on the Edge with FPGAs by Optimizing an Adaptive Activation Function.

Jiang, Yiyue; Vaicaitis, Andrius; Dooley, John; Leeser, Miriam.

Sensors (Basel) ; 24(6), 2024 Mar 13.

Article in English | MEDLINE | ID: mdl-38544092

ABSTRACT

The implementation of neural networks (NNs) on edge devices enables local processing of wireless data, but faces challenges such as high computational complexity and memory requirements when deep neural networks (DNNs) are used. Shallow neural networks customized for specific problems are more efficient, requiring fewer resources and resulting in a lower latency solution. An additional benefit of the smaller network size is that it is suitable for real-time processing on edge devices. The main concern with shallow neural networks is their accuracy performance compared to DNNs. In this paper, we demonstrate that a customized adaptive activation function (AAF) can meet the accuracy of a DNN. We designed an efficient FPGA implementation for a customized segmented spline curve neural network (SSCNN) structure to replace the traditional fixed activation function with an AAF. We compared our SSCNN with different neural network structures such as a real-valued time-delay neural network (RVTDNN), an augmented real-valued time-delay neural network (ARVTDNN), and deep neural networks with different parameters. Our proposed SSCNN implementation uses 40% fewer hardware resources and no block RAMs compared to the DNN with similar accuracy. We experimentally validated this computationally efficient and memory-saving FPGA implementation of the SSCNN for digital predistortion of radio-frequency (RF) power amplifiers using the AMD/Xilinx RFSoC ZCU111. The implemented solution uses less than 3% of the available resources. The solution also enables an increase of the clock frequency to 221.12 MHz, allowing the transmission of wide bandwidth signals.

19.

Enhancing Embedded Object Tracking: A Hardware Acceleration Approach for Real-Time Predictability.

Zhang, Mingyang; Van Beeck, Kristof; Goedemé, Toon.

J Imaging ; 10(3), 2024 Mar 13.

Article in English | MEDLINE | ID: mdl-38535150

ABSTRACT

While Siamese object tracking has witnessed significant advancements, its hard real-time behaviour on embedded devices remains inadequately addressed. In many application cases, an embedded implementation should not only have a minimal execution latency, but this latency should ideally also have zero variance, i.e., be predictable. This study aims to address this issue by meticulously analysing real-time predictability across different components of a deep-learning-based video object tracking system. Our detailed experiments not only indicate the superiority of Field-Programmable Gate Array (FPGA) implementations in terms of hard real-time behaviour but also unveil important time predictability bottlenecks. We introduce dedicated hardware accelerators for key processes, focusing on depth-wise cross-correlation and padding operations, utilizing high-level synthesis (HLS). Implemented on a KV260 board, our enhanced tracker exhibits not only a speed up, with a factor of 6.6, in mean execution time but also significant improvements in hard real-time predictability by yielding 11 times less latency variation as compared to our baseline. A subsequent analysis of power consumption reveals our approach's contribution to enhanced power efficiency. These advancements underscore the crucial role of hardware acceleration in realizing time-predictable object tracking on embedded systems, setting new standards for future hardware-software co-design endeavours in this domain.

20.

Cosine convolutional neural network and its application for seizure detection.

Liu, Guoyang; Tian, Lan; Wen, Yiming; Yu, Weize; Zhou, Weidong.

Neural Netw ; 174: 106267, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38555723

ABSTRACT

Traditional convolutional neural networks (CNNs) often suffer from high memory consumption and redundancy in their kernel representations, leading to overfitting problems and limiting their application in real-time, low-power scenarios such as seizure detection systems. In this work, a novel cosine convolutional neural network (CosCNN), which replaces traditional kernels with the robust cosine kernel modulated by only two learnable factors, is presented, and its effectiveness is validated on the tasks of seizure detection. Meanwhile, based on the cosine lookup table and KL-divergence, an effective post-training quantization algorithm is proposed for CosCNN hardware implementation. With quantization, CosCNN can achieve a nearly 75% reduction in the memory cost with almost no accuracy loss. Moreover, we design a configurable cosine convolution accelerator on Field Programmable Gate Array (FPGA) and deploy the quantized CosCNN on Zedboard, proving the proposed seizure detection system can operate in real-time and low-power scenarios. Extensive experiments and comparisons were conducted using two publicly available epileptic EEG databases, the Bonn database and the CHB-MIT database. The results highlight the performance superiority of the CosCNN over traditional CNNs as well as other seizure detection methods.

Subject(s)

Electroencephalography; Epilepsy; Humans; Electroencephalography/methods; Seizures/diagnosis; Neural Networks, Computer; Epilepsy/diagnosis; Algorithms
