1.
Phys Imaging Radiat Oncol ; 27: 100484, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37664799

ABSTRACT

Background and purpose: Physiological motion impacts the dose delivered to tumours and vital organs in external beam radiotherapy, and particularly in particle therapy. The excellent soft-tissue demarcation of 4D magnetic resonance imaging (4D-MRI) could inform on intra-fractional motion, but long image reconstruction times hinder its use in online treatment adaptation. Here we employ techniques from high-performance computing to reduce 4D-MRI reconstruction times below two minutes, to facilitate their use in MR-guided radiotherapy.

Material and methods: Four patients with pancreatic adenocarcinoma were scanned with a radial stack-of-stars gradient echo sequence on a 1.5T MR-Linac. Fast parallelised open-source implementations of the extra-dimensional golden-angle radial sparse parallel algorithm were developed for central processing unit (CPU) and graphics processing unit (GPU) architectures. We assessed the impact of architecture, oversampling and respiratory binning strategy on 4D-MRI reconstruction time, and compared images against a MATLAB reference implementation using the structural similarity (SSIM) index. Scaling and bottlenecks for the different architectures were studied using multi-GPU systems.

Results: All reconstructed 4D-MRI were identical to the reference implementation (SSIM > 0.99). Images reconstructed with overlapping respiratory bins were sharper, at the cost of longer reconstruction times. The CPU + GPU implementation was over 17 times faster than the reference implementation, reconstructing images in 60 ± 1 s, and hyper-scaled using multiple GPUs.

Conclusion: Respiratory-resolved 4D-MRI reconstruction times can be reduced using high-performance computing methods for online workflows in MR-guided radiotherapy, with potential applications in particle therapy.
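The image comparison above relies on the SSIM index. As an illustrative sketch only (the study uses a full windowed SSIM implementation; this minimal single-window "global" SSIM in NumPy shows the formula's structure, with the conventional constants k1 = 0.01, k2 = 0.03 assumed):

```python
import numpy as np

def global_ssim(x: np.ndarray, y: np.ndarray, data_range: float = 1.0) -> float:
    """Global (single-window) SSIM between two equally shaped images."""
    c1 = (0.01 * data_range) ** 2  # stabilising constants
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return float(((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2))
                 / ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)))

# Identical images score exactly 1.0; mild noise lowers the score slightly.
rng = np.random.default_rng(0)
ref = rng.random((64, 64))
noisy = np.clip(ref + 0.01 * rng.standard_normal(ref.shape), 0.0, 1.0)
```

A practical comparison of reconstructed volumes would instead use a windowed implementation such as scikit-image's `structural_similarity`, applied slice by slice.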

2.
J Big Data ; 10(1): 95, 2023.
Article in English | MEDLINE | ID: mdl-37283690

ABSTRACT

Processing large-scale graphs is challenging due to the nature of the computation, which causes irregular memory access patterns. Managing such irregular accesses can cause significant performance degradation on both CPUs and GPUs. Recent research therefore proposes accelerating graph processing with Field-Programmable Gate Arrays (FPGAs). FPGAs are programmable hardware devices that can be fully customised to perform specific tasks in a highly parallel and efficient manner. However, FPGAs have a limited amount of on-chip memory that cannot hold an entire large graph. Because of this limit, data must be repeatedly transferred to and from the FPGA on-chip memory, so data-transfer time dominates computation time. One way to overcome this resource limitation is to employ a multi-FPGA distributed architecture with an efficient partitioning scheme, which aims to increase data locality and minimise communication between partitions. This work proposes an FPGA processing engine that overlaps, hides and customises all data transfers so that the FPGA accelerator is fully utilised. The engine is integrated into a framework for using FPGA clusters and can use an offline partitioning method to facilitate the distribution of large-scale graphs. The framework uses Hadoop at a higher level to map a graph to the underlying hardware platform: the higher computation layer gathers the blocks of data that have been pre-processed and stored on the host's file system and distributes them to a lower computation layer made of FPGAs. We show how graph partitioning combined with an FPGA architecture leads to high performance, even when the graph has millions of vertices and billions of edges.
In the case of the PageRank algorithm, widely used for ranking the importance of nodes in a graph, our implementation is the fastest compared to state-of-the-art CPU and GPU solutions, achieving a speedup of 13× versus 8× and 3× respectively. Moreover, on large-scale graphs the GPU solution fails due to memory limitations, while the CPU solution achieves a speedup of 12× compared to the 26× achieved by our FPGA solution. Other state-of-the-art FPGA solutions are 28 times slower than ours. When the size of a graph limits the performance of a single FPGA device, our performance model shows that using multiple FPGAs in a distributed system can further improve performance by about 12×. This highlights our implementation's efficiency for large datasets that do not fit in the on-chip memory of a hardware device.
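For readers unfamiliar with the benchmark workload, here is a minimal CPU reference for PageRank using plain power iteration. This is an illustrative sketch, not the FPGA engine; the damping factor 0.85 is the conventional choice and is assumed here:

```python
import numpy as np

def pagerank(edges, n, d=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank over a graph given as (src, dst) edge pairs."""
    out_deg = np.zeros(n)
    for s, _ in edges:
        out_deg[s] += 1
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        nxt = np.full(n, (1.0 - d) / n)
        # rank mass from dangling nodes (no out-edges) is spread uniformly
        nxt += d * r[out_deg == 0].sum() / n
        for s, t in edges:
            nxt[t] += d * r[s] / out_deg[s]
        done = np.abs(nxt - r).sum() < tol
        r = nxt
        if done:
            break
    return r

# Tiny 4-node example: node 3 receives links from every other node.
edges = [(0, 3), (1, 3), (2, 3), (3, 0)]
ranks = pagerank(edges, 4)
```

The irregular per-edge scatter in the inner loop (`nxt[t] += ...`) is exactly the memory-access pattern the abstract identifies as the bottleneck that FPGA dataflow engines reorganise.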

3.
Comput Biol Med ; 153: 106483, 2023 02.
Article in English | MEDLINE | ID: mdl-36621192

ABSTRACT

The COVID-19 pandemic spread rapidly worldwide and caused extensive loss of life and financial losses. Finding accurate, accessible, and inexpensive methods for diagnosing the disease has therefore challenged researchers. Several deep-learning strategies, such as transfer learning and ensemble learning, have been presented to automate the diagnosis of COVID-19 from images. However, these techniques cannot deal with noise and its propagation through the network layers. In addition, many of the datasets in use are imbalanced, and most techniques perform only binary classification, separating COVID-19 from normal cases. To address these issues, we use the blind/referenceless image spatial quality evaluator (BRISQUE) to filter out inappropriate data from the dataset. To increase the volume and diversity of the data, we merge two datasets. This combination allows multi-class classification among three states: normal, COVID-19, and pneumonia, including bacterial and viral types. A weighted multi-class cross-entropy is used to reduce the effect of data imbalance. In addition, a fuzzy fine-tuned Xception model is applied to reduce noise propagation through the layers. Quantitative analysis shows that our proposed model achieves 96.60% accuracy on the merged test set, which is more accurate than the previously mentioned state-of-the-art methods.
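A weighted multi-class cross-entropy of the kind described can be sketched as follows. The class counts below are hypothetical placeholders, and inverse-frequency weighting is one common choice; the paper's exact weighting scheme may differ:

```python
import numpy as np

def weighted_cross_entropy(probs, labels, class_weights):
    """Mean cross-entropy with a per-class weight applied to each sample.

    probs: (N, C) predicted class probabilities; labels: (N,) integer class ids.
    """
    eps = 1e-12  # guard against log(0)
    w = class_weights[labels]
    nll = -np.log(probs[np.arange(len(labels)), labels] + eps)
    return float(np.mean(w * nll))

# Inverse-frequency weights from (hypothetical) class counts:
# normal, COVID-19, pneumonia — the minority classes get larger weights.
counts = np.array([800, 100, 100])
weights = counts.sum() / (len(counts) * counts)

probs = np.array([[0.7, 0.2, 0.1],
                  [0.2, 0.6, 0.2]])
labels = np.array([0, 1])
loss = weighted_cross_entropy(probs, labels, weights)
```

Upweighting the minority classes makes mistakes on COVID-19 and pneumonia samples cost more than mistakes on the abundant normal class, which counteracts the imbalance the abstract describes.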


Subject(s)
COVID-19 , Humans , COVID-19/diagnosis , COVID-19 Testing , Entropy
4.
J Chem Theory Comput ; 13(11): 5265-5272, 2017 Nov 14.
Article in English | MEDLINE | ID: mdl-29019679

ABSTRACT

We demonstrate the use of dataflow technology in the computation of the correlation energy in molecules at the Møller-Plesset perturbation theory (MP2) level. Specifically, we benchmark density fitting (DF)-MP2 for as many as 168 atoms (in valinomycin) and show that speed-ups between 3 and 3.8 times can be achieved when compared to the MOLPRO package run on a single CPU. Acceleration is achieved by offloading the matrix multiplication steps in DF-MP2 to Dataflow Engines (DFEs). We project that the acceleration factor could be as much as 24 with the next generation of DFEs.
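The DF-MP2 step offloaded to the DFEs is the assembly of (ia|jb) integrals from density-fitting factors, which is a large matrix multiplication, followed by the energy contraction. A toy NumPy sketch with random placeholder tensors (the dimensions, the factors `B`, and the orbital energies here are hypothetical illustrations, not MOLPRO data):

```python
import numpy as np

rng = np.random.default_rng(0)
naux, nocc, nvirt = 20, 4, 8          # hypothetical auxiliary/orbital dimensions
B = 0.01 * rng.standard_normal((naux, nocc, nvirt))  # DF factors B^P_ia
eps_o = -1.0 - rng.random(nocc)       # occupied orbital energies (negative)
eps_v = 1.0 + rng.random(nvirt)       # virtual orbital energies (positive)

# The matrix-multiplication step offloaded to the DFEs:
# (ia|jb) = sum_P B^P_ia * B^P_jb
iajb = np.einsum('Pia,Pjb->iajb', B, B)

# MP2 correlation energy:
# E = sum_{iajb} (ia|jb) [2(ia|jb) - (ib|ja)] / (e_i + e_j - e_a - e_b)
denom = (eps_o[:, None, None, None] + eps_o[None, None, :, None]
         - eps_v[None, :, None, None] - eps_v[None, None, None, :])
e_mp2 = float(np.einsum('iajb,iajb->',
                        iajb * (2 * iajb - iajb.transpose(0, 3, 2, 1)),
                        1.0 / denom))
```

Because the occupied energies lie below the virtuals, every denominator is negative and the correlation energy comes out negative, as it must for real orbitals.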

5.
IEEE Trans Vis Comput Graph ; 22(4): 1377-86, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26780798

ABSTRACT

Latency - the delay between a user's action and the response to this action - is known to be detrimental to virtual reality. Latency is typically considered a discrete value characterising a delay, constant in time and space - but this characterisation is incomplete. Latency changes across the display during scan-out, and how it does so depends on the rendering approach used. In this study, we present an ultra-low-latency real-time ray-casting renderer for virtual reality, implemented on an FPGA. Our renderer has a latency of ~1 ms from 'tracker to pixel'. Its frameless nature means that the region of the display with the lowest latency immediately follows the scan-beam. This is in contrast to frame-based systems, such as those using typical GPUs, for which the latency increases as scan-out proceeds. Using a series of high- and low-speed videos of our system in use, we confirm its latency of ~1 ms. We examine how the renderer performs when driving a traditional sequential scan-out display on a readily available HMD, the Oculus Rift DK2. We contrast this with an equivalent apparatus built using a GPU. Using captured human head motion and a set of image quality measures, we assess the ability of these systems to faithfully recreate the stimuli of an ideal virtual reality system - one with a zero-latency tracker, renderer and display running at 1 kHz. Finally, we examine the results of these quality measures, and how each rendering approach is affected by velocity of movement and display persistence. We find that our system, with a lower average latency, more faithfully draws what the ideal virtual reality system would. Further, we find that with low display persistence the sensitivity to velocity of both systems is lowered, but that it is much lower for ours.
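The contrast between frame-based and frameless latency across scan-out can be illustrated with a toy timing model. The numbers below are illustrative assumptions, not measurements from the paper: in a frame-based pipeline a pixel's latency is roughly the rendering delay plus the time until the scan-beam reaches its line, whereas a frameless ray-caster samples the tracker just ahead of the beam:

```python
# Toy per-scanline latency model (milliseconds) for a 1080-line, 75 Hz display.
FRAME_TIME_MS = 1000.0 / 75.0          # ~13.3 ms per refresh
LINES = 1080

def frame_based_latency(line: int, render_ms: float = 13.3) -> float:
    """Tracker sampled once per frame; latency grows as scan-out proceeds."""
    return render_ms + FRAME_TIME_MS * line / LINES

def frameless_latency(line: int, pipeline_ms: float = 1.0) -> float:
    """Tracker sampled just before the beam reaches each display region."""
    return pipeline_ms

first = frame_based_latency(0)
last = frame_based_latency(LINES - 1)
```

In this model the frame-based latency spans roughly one to two frame times across the display, while the frameless renderer stays near its ~1 ms pipeline delay everywhere - the qualitative behaviour the abstract describes.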

6.
Article in English | MEDLINE | ID: mdl-19163384

ABSTRACT

A more structured and streamlined design of implants is now possible. In this paper we focus on implant processors, which lie at the heart of implantable systems. We present a real and representative biomedical-application scenario in which such a new processor can be employed. Using a suitably selected processor simulator, we monitor various operational aspects of the application. Findings on performance, cache behavior, branch prediction, power consumption, energy expenditure and instruction mixes are presented and analyzed. We conclude with an assessment of the suitability of such an implant processor and directions for future work.


Subject(s)
Biomedical Engineering/methods , Biomedical Engineering/trends , Cochlear Implants , Electrodes, Implanted , Hearing Loss/rehabilitation , Microcomputers , Algorithms , Computer Simulation , Computers , Electric Power Supplies , Equipment Design , Humans , Prosthesis Design