Search | BVS Regional Portal

Efficient Neural Networks on the Edge with FPGAs by Optimizing an Adaptive Activation Function.

Jiang, Yiyue; Vaicaitis, Andrius; Dooley, John; Leeser, Miriam.

Sensors (Basel); 24(6), 2024 Mar 13.

Article in English | MEDLINE | ID: mdl-38544092

ABSTRACT

The implementation of neural networks (NNs) on edge devices enables local processing of wireless data, but it faces challenges such as high computational complexity and memory requirements when deep neural networks (DNNs) are used. Shallow neural networks customized for specific problems are more efficient, requiring fewer resources and yielding a lower-latency solution. An additional benefit of the smaller network size is that it is suitable for real-time processing on edge devices. The main concern with shallow neural networks is their accuracy compared to that of DNNs. In this paper, we demonstrate that a customized adaptive activation function (AAF) can match the accuracy of a DNN. We designed an efficient FPGA implementation of a customized segmented spline curve neural network (SSCNN) structure that replaces the traditional fixed activation function with an AAF. We compared our SSCNN with other neural network structures, such as a real-valued time-delay neural network (RVTDNN), an augmented real-valued time-delay neural network (ARVTDNN), and deep neural networks with different parameters. Our proposed SSCNN implementation uses 40% fewer hardware resources and no block RAMs compared to the DNN with similar accuracy. We experimentally validated this computationally efficient and memory-saving FPGA implementation of the SSCNN for digital predistortion of radio-frequency (RF) power amplifiers using the AMD/Xilinx RFSoC ZCU111. The implemented solution uses less than 3% of the available resources. The solution also enables an increase in the clock frequency to 221.12 MHz, allowing the transmission of wide-bandwidth signals.
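The abstract does not spell out how the segmented spline curve is parameterized. As a rough illustration only, the Python sketch below implements one common reading of an adaptive activation function: a piecewise-linear curve whose knot positions are fixed and uniformly spaced, and whose knot values are the trainable parameters. The class name `SegmentedSplineActivation`, the clamping at the range ends, and the uniform spacing are assumptions, not details taken from the paper.

```python
class SegmentedSplineActivation:
    """Hypothetical piecewise-linear adaptive activation.

    Knot positions are fixed and uniformly spaced on [x_min, x_max]; the
    knot *values* are trainable (an assumption, not the paper's exact SSCNN).
    """

    def __init__(self, x_min, x_max, values):
        if len(values) < 2 or x_max <= x_min:
            raise ValueError("need at least two knots and x_max > x_min")
        self.x_min = x_min
        self.x_max = x_max
        self.values = list(values)
        self.step = (x_max - x_min) / (len(self.values) - 1)

    def _locate(self, x):
        """Return (segment index, fractional position within that segment)."""
        # Clamp inputs to the knot range; linear extrapolation is another option.
        x = min(max(x, self.x_min), self.x_max)
        # Uniform spacing turns the segment search into one scale-and-truncate
        # instead of a search over the knots.
        pos = (x - self.x_min) / self.step
        i = min(int(pos), len(self.values) - 2)
        return i, pos - i

    def __call__(self, x):
        # Linear interpolation between the two knots bounding x.
        i, t = self._locate(x)
        return self.values[i] + t * (self.values[i + 1] - self.values[i])

    def grad_values(self, x):
        """Gradient of the output with respect to each knot value (for training)."""
        i, t = self._locate(x)
        g = [0.0] * len(self.values)
        g[i] = 1.0 - t
        g[i + 1] = t
        return g


# Usage: start from a ReLU-like shape and let training adjust the knot values.
relu_like = SegmentedSplineActivation(-2.0, 2.0, [0.0, 0.0, 0.0, 1.0, 2.0])
# relu_like(0.5) == 0.5
```

Uniform knot spacing is the design choice that keeps evaluation cheap: only two neighboring knot values and one interpolation are needed per input, which suits a small fixed-point hardware datapath.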

High-performance transformation of protein structure representation from internal to Cartesian coordinates.

Bayati, Mahsa; Leeser, Miriam; Bardhan, Jaydeep P.

J Comput Chem; 41(24): 2104-2114, 2020 Sep 15.

Article in English | MEDLINE | ID: mdl-32686852

ABSTRACT

We present a highly parallel algorithm to convert internal coordinates of a polymeric molecule into Cartesian coordinates. Traditionally, converting the structures of polymers (e.g., proteins) from internal to Cartesian coordinates has been performed serially, due to an inherent linear dependency along the polymer chain. We show that this dependency can be removed using a tree-based concatenation of coordinate transforms between segments, and that the resulting algorithm can be parallelized efficiently on graphics processing units (GPUs). The conversion algorithm is applicable to protein engineering and to fitting protein structures to experimental data, and we observe an order-of-magnitude speedup using parallel processing on a GPU compared to serial execution on a CPU.

Subjects

Proteins/chemistry; Algorithms; Amino Acids/chemistry; Disulfides/chemistry; Molecular Dynamics Simulation; Monte Carlo Method; Protein Conformation; Structure-Activity Relationship
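The key observation in this abstract is that concatenating coordinate transforms is associative, so the running products along the chain can be computed in a logarithmic number of parallel rounds instead of strictly one after another. The sketch below illustrates that idea with a Hillis-Steele inclusive scan over 4x4 homogeneous matrices in plain Python. It is not the authors' GPU algorithm: the scan variant, the simplified `segment_transform` (a rotation about one axis plus a step along the bond), and all function names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def segment_transform(theta, length):
    """Hypothetical 4x4 homogeneous transform for one segment: rotate by theta
    about z, then step `length` along the rotated x-axis. Real internal
    coordinates also carry a bond angle and a torsion; this is simplified."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0, length * c],
            [s, c, 0.0, length * s],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def serial_prefix(ts):
    """Baseline: P_i = T_0 T_1 ... T_i, computed one product after another."""
    out = [ts[0]]
    for t in ts[1:]:
        out.append(matmul(out[-1], t))
    return out


def scan_prefix(ts):
    """Hillis-Steele inclusive scan: ceil(log2 n) rounds. Within a round every
    output depends only on the previous round, so each could be one GPU thread."""
    out = list(ts)
    step = 1
    while step < len(out):
        prev = out
        # Earlier segments stay on the left: matrix products do not commute.
        out = [prev[i] if i < step else matmul(prev[i - step], prev[i])
               for i in range(len(prev))]
        step *= 2
    return out


def positions(prefix):
    """Cartesian position of each segment end: the translation column of P_i."""
    return [(p[0][3], p[1][3], p[2][3]) for p in prefix]
```

The serial version needs n - 1 dependent products, while the scan needs only about log2(n) rounds of independent products, at the cost of more total multiplications; that trade of extra work for shorter dependency chains is what makes the approach attractive on a GPU.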

Fast reconstruction of 3D volumes from 2D CT projection data with GPUs.

Leeser, Miriam; Mukherjee, Saoni; Brock, James.

BMC Res Notes; 7: 582, 2014 Aug 30.

Article in English | MEDLINE | ID: mdl-25176282

ABSTRACT

BACKGROUND: Biomedical image reconstruction applications require producing high-fidelity images in or close to real time. We have implemented reconstruction of three-dimensional cone-beam computed tomography (CBCT) from two-dimensional projections. The algorithm takes projections of the target, weights and filters them, and then backprojects the data to create the final 3D volume. We have implemented the algorithm using several hardware and software approaches and taken advantage of different types of parallelism in modern processors. The two hardware platforms used are a Central Processing Unit (CPU) and a heterogeneous system combining a CPU and a GPU. On the CPU we implement serial MATLAB, parallel MATLAB, C, and parallel C with OpenMP extensions. These codes are compared against the heterogeneous versions written in CUDA-C and OpenCL. FINDINGS: Our results show that GPUs are particularly well suited to accelerating CBCT. Relative performance was evaluated on a mathematical phantom as well as on mouse data. Speedups of up to 200x are observed using an AMD GPU compared to a parallel version in C with OpenMP constructs. CONCLUSIONS: In this paper, we have implemented the Feldkamp-Davis-Kress algorithm, compatible with Fessler's image reconstruction toolbox, and tested it on different hardware platforms, including a CPU and a combined CPU-GPU system. Both NVIDIA and AMD GPUs have been used for performance evaluation. GPUs provide significant speedup over the parallel CPU version.

Subjects

Tomography, X-Ray Computed/methods; Algorithms
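The Feldkamp-Davis-Kress (FDK) algorithm named in this abstract weights each 2D projection, filters it row by row with a ramp filter, and then backprojects it into the volume. The Python sketch below covers only the first two steps for a single detector row. It assumes a flat detector rescaled to the isocenter plane, a source-to-isocenter distance `d_so`, and the spatial-domain Ram-Lak kernel; the function names are hypothetical, and this is not the paper's CUDA-C or OpenCL code.

```python
import math


def cosine_weight(row, v, d_so, du):
    """FDK pre-weighting of one detector row at height v.
    Assumes a flat detector rescaled to the isocenter plane, source-to-isocenter
    distance d_so, and detector pixel pitch du (all in the same length unit)."""
    n = len(row)
    out = []
    for i, p in enumerate(row):
        u = (i - (n - 1) / 2) * du  # detector coordinate, centered on the axis
        out.append(p * d_so / math.sqrt(d_so ** 2 + u ** 2 + v ** 2))
    return out


def ramp_kernel(n_taps, du):
    """Spatial-domain Ram-Lak kernel sampled at k = -n_taps..n_taps:
    1/(4 du^2) at k = 0, 0 for even k, -1/(k^2 pi^2 du^2) for odd k."""
    h = []
    for k in range(-n_taps, n_taps + 1):
        if k == 0:
            h.append(1.0 / (4 * du * du))
        elif k % 2 == 0:
            h.append(0.0)
        else:
            h.append(-1.0 / (k * k * math.pi ** 2 * du * du))
    return h


def ramp_filter(row, du):
    """Direct convolution of one weighted row with the ramp kernel
    (zero padding), scaled by the sample spacing du."""
    n = len(row)
    h = ramp_kernel(n - 1, du)  # long enough to cover every offset i - j
    out = []
    for i in range(n):
        acc = 0.0
        for j in range(n):
            acc += row[j] * h[(i - j) + (n - 1)]
        out.append(acc * du)
    return out
```

Each output sample of `ramp_filter` reads only the input row, so every sample (and every row of every projection) can be computed independently; that independence is the data parallelism a GPU exploits. Practical implementations usually filter in the frequency domain with an FFT rather than this O(n^2) direct convolution.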

The effect of temporal impulse response on experimental reduction of photon scatter in time-resolved diffuse optical tomography.

Valim, Niksa; Brock, James; Leeser, Miriam; Niedre, Mark.

Phys Med Biol; 58(2): 335-49, 2013 Jan 21.

Article in English | MEDLINE | ID: mdl-23257349

ABSTRACT

In recent years, new fast detector technology has driven significant renewed interest in time-resolved measurement of early photons as a means of improving imaging resolution in diffuse optical tomography and fluorescence-mediated tomography. In practice, selection of early photons results in significantly narrower instrument photon density sensitivity functions (PDSFs) than in the continuous-wave case, resulting in a better-conditioned reconstruction problem. In this work, we studied the quantitative impact of the instrument temporal impulse response function (TIRF) on experimental PDSFs in tissue-mimicking optical phantoms. We used a multimode fiber dispersion method to vary the system TIRF over a range of representative literature values. Substantial disagreement in PDSF width, by up to 40%, was observed between experimental measurements and Monte Carlo (MC) models of photon propagation over the range of TIRFs studied. On average, PDSFs were broadened by about 0.3 mm at the center plane of the 2 cm wide imaging chamber per 100 ps of the instrument TIRF at early times. Further, this broadening was comparable on both the source and detector sides. Results were confirmed by convolution of instrument TIRFs with MC simulations. These data also underscore the importance of correcting imaging PDSFs for the instrument TIRF when performing tomographic image reconstruction to ensure accurate data-model agreement.

Subjects

Image Enhancement/methods; Photons; Scattering, Radiation; Tomography, Optical/methods; Image Enhancement/instrumentation; Monte Carlo Method; Optical Fibers; Spectrometry, Fluorescence; Time Factors; Tomography, Optical/instrumentation
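This abstract confirms its measurements by convolving instrument TIRFs with Monte Carlo simulations. A minimal Python sketch of that convolution step follows. It assumes a Gaussian TIRF parameterized by its full width at half maximum (FWHM) and an MC result given as a list of photon counts per time bin; both the Gaussian shape and the function names are illustrative assumptions, not the paper's processing code.

```python
import math


def gaussian_tirf(fwhm_ps, dt_ps, half_width):
    """Hypothetical Gaussian instrument response, sampled every dt_ps over
    +/- half_width bins and normalized to unit area."""
    sigma = fwhm_ps / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    k = [math.exp(-0.5 * (i * dt_ps / sigma) ** 2)
         for i in range(-half_width, half_width + 1)]
    total = sum(k)
    return [v / total for v in k]


def convolve_same(signal, kernel):
    """Discrete convolution y[n] = sum_m h[m] x[n - m] with the kernel centered,
    zero padding, and output the same length as `signal`."""
    c = len(kernel) // 2
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for m, h in enumerate(kernel):
            j = n - (m - c)  # input sample that kernel tap m maps onto output n
            if 0 <= j < len(signal):
                acc += signal[j] * h
        out.append(acc)
    return out


# Usage: blur a (hypothetical) MC arrival-time histogram, 10 ps bins,
# with a 100 ps FWHM instrument response.
# measured = convolve_same(mc_counts, gaussian_tirf(100.0, 10.0, 30))
```

Because the kernel has unit area, the convolution preserves the total photon count (as long as the curve is padded well away from the edges) while redistributing it in time. Some later, more scattered photons are spread into the early time gates, which offers one intuitive picture of how a wider TIRF can broaden early-photon PDSFs.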

The Challenges of Writing Portable, Correct and High Performance Libraries for GPUs.

Leeser, Miriam; Yablonski, Devon; Brooks, Dana; King, Laurie Smith.

Comput Archit News; 39(4): 2-7, 2011 Sep 01.

Article in English | MEDLINE | ID: mdl-23807820

ABSTRACT

Graphics Processing Units (GPUs) are widely used to accelerate scientific applications. Many successes have been reported, with speedups of two or three orders of magnitude over serial implementations of the same algorithms. These speedups typically pertain to a specific implementation with fixed parameters mapped to a specific hardware implementation. The implementations are not designed to be easily ported to other GPUs, even from the same manufacturer. When the target hardware changes, the application must be re-optimized. In this paper we address a different problem. We aim to deliver working, efficient GPU code in a library that is downloaded and run by many different users. The challenge is to deliver efficiency independent of the individual user's parameters and without a priori knowledge of the hardware the user will employ. This problem requires a different set of tradeoffs than finding the best runtime for a single solution. Solutions must be adaptable to a range of different parameters, both to solve users' problems and to make the best use of the target hardware. Another issue is the integration of GPUs into a Problem Solving Environment (PSE) where the use of a GPU is almost invisible from the perspective of the user. Ease of use and smooth interaction with the existing user interface are important to our approach. We illustrate our solution by incorporating GPU processing into the SCIRun Biomedical PSE developed at the Scientific Computing and Imaging (SCI) Institute of the University of Utah. SCIRun allows scientists to interactively construct many different types of biomedical simulations. We use this environment to demonstrate the effectiveness of the GPU by accelerating time-consuming algorithms in the scientist's simulations. Specifically, we target the linear solver module, including the Conjugate Gradient, Jacobi, and MinRes solvers for sparse matrices.
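This abstract names Conjugate Gradient, Jacobi, and MinRes as the sparse solvers targeted in SCIRun. For readers unfamiliar with them, the pure-Python sketch below implements two of them, unpreconditioned CG and Jacobi, over a matrix stored in compressed sparse row (CSR) form. The CSR layout, function names, and stopping rules are assumptions for illustration; this is not SCIRun's or the authors' GPU implementation.

```python
def csr_matvec(indptr, indices, data, x):
    """y = A x for A stored in CSR form (indptr, indices, data)."""
    y = []
    for row in range(len(indptr) - 1):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y.append(acc)
    return y


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def conjugate_gradient(indptr, indices, data, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned CG; assumes A is symmetric positive definite."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x, with x = 0 initially
    p = list(r)  # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = csr_matvec(indptr, indices, data, p)
        alpha = rs_old / dot(p, ap)  # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old  # keeps the new direction A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def jacobi(indptr, indices, data, b, iters=200):
    """Jacobi iteration x <- x + D^{-1} (b - A x). Assumes a nonzero diagonal;
    converges, for example, when A is strictly diagonally dominant."""
    n = len(b)
    diag = [0.0] * n
    for row in range(n):
        for k in range(indptr[row], indptr[row + 1]):
            if indices[k] == row:
                diag[row] = data[k]
    x = [0.0] * n
    for _ in range(iters):
        ax = csr_matvec(indptr, indices, data, x)  # uses only the previous x
        x = [xi + (bi - axi) / di for xi, bi, axi, di in zip(x, b, ax, diag)]
    return x
```

The sparse matrix-vector product dominates both solvers, and each of its rows is independent, which is why it maps naturally onto one GPU thread (or group of threads) per row; the dot products in CG additionally need parallel reductions.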
