Search | VHL Regional Portal

In situ visualization of large-scale turbulence simulations in Nek5000 with ParaView Catalyst.

Atzori, Marco; Köpp, Wiebke; Chien, Steven W D; Massaro, Daniele; Mallor, Fermín; Peplinski, Adam; Rezaei, Mohamad; Jansson, Niclas; Markidis, Stefano; Vinuesa, Ricardo; Laure, Erwin; Schlatter, Philipp; Weinkauf, Tino.

J Supercomput ; 78(3): 3605-3620, 2022.

Article in English | MEDLINE | ID: mdl-35210696

ABSTRACT

In situ visualization on high-performance computing systems allows us to analyze simulation results that would otherwise be impossible, given the size of the simulation data sets and offline post-processing execution time. We develop an in situ adaptor for Paraview Catalyst and Nek5000, a massively parallel Fortran and C code for computational fluid dynamics. We perform a strong scalability test up to 2048 cores on KTH's Beskow Cray XC40 supercomputer and assess in situ visualization's impact on the Nek5000 performance. In our study case, a high-fidelity simulation of turbulent flow, we observe that in situ operations significantly limit the strong scalability of the code, reducing the relative parallel efficiency to only ≈ 21 % on 2048 cores (the relative efficiency of Nek5000 without in situ operations is ≈ 99 % ). Through profiling with Arm MAP, we identified a bottleneck in the image composition step (that uses the Radix-kr algorithm) where a majority of the time is spent on MPI communication. We also identified an imbalance of in situ processing time between rank 0 and all other ranks. In our case, better scaling and load-balancing in the parallel image composition would considerably improve the performance of Nek5000 with in situ capabilities. In general, the result of this study highlights the technical challenges posed by the integration of high-performance simulation codes and data-analysis libraries and their practical use in complex cases, even when efficient algorithms already exist for a certain application scenario.

Efficient iterative virtual screening with Apache Spark and conformal prediction.

Ahmed, Laeeq; Georgiev, Valentin; Capuccini, Marco; Toor, Salman; Schaal, Wesley; Laure, Erwin; Spjuth, Ola.

J Cheminform ; 10(1): 8, 2018 Mar 01.

Article in English | MEDLINE | ID: mdl-29492726

ABSTRACT

BACKGROUND: Docking and scoring large libraries of ligands against target proteins forms the basis of structure-based virtual screening. The problem is trivially parallelizable, and calculations are generally carried out on computer clusters or on large workstations in a brute force manner, by docking and scoring all available ligands. CONTRIBUTION: In this study we propose a strategy that is based on iteratively docking a set of ligands to form a training set, training a ligand-based model on this set, and predicting the remainder of the ligands to exclude those predicted as 'low-scoring' ligands. Then, another set of ligands are docked, the model is retrained and the process is repeated until a certain model efficiency level is reached. Thereafter, the remaining ligands are docked or excluded based on this model. We use SVM and conformal prediction to deliver valid prediction intervals for ranking the predicted ligands, and Apache Spark to parallelize both the docking and the modeling. RESULTS: We show on 4 different targets that conformal prediction based virtual screening (CPVS) is able to reduce the number of docked molecules by 62.61% while retaining an accuracy for the top 30 hits of 94% on average and a speedup of 3.7. The implementation is available as open source via GitHub ( https://github.com/laeeq80/spark-cpvs ) and can be run on high-performance computers as well as on cloud resources.

E-Science technologies in a workflow for personalized medicine using cancer screening as a case study.

Spjuth, Ola; Karlsson, Andreas; Clements, Mark; Humphreys, Keith; Ivansson, Emma; Dowling, Jim; Eklund, Martin; Jauhiainen, Alexandra; Czene, Kamila; Grönberg, Henrik; Sparén, Pär; Wiklund, Fredrik; Cheddad, Abbas; Pálsdóttir, Þorgerður; Rantalainen, Mattias; Abrahamsson, Linda; Laure, Erwin; Litton, Jan-Eric; Palmgren, Juni.

J Am Med Inform Assoc ; 24(5): 950-957, 2017 Sep 01.

Article in English | MEDLINE | ID: mdl-28444384

ABSTRACT

OBJECTIVE: We provide an e-Science perspective on the workflow from risk factor discovery and classification of disease to evaluation of personalized intervention programs. As case studies, we use personalized prostate and breast cancer screenings. MATERIALS AND METHODS: We describe an e-Science initiative in Sweden, e-Science for Cancer Prevention and Control (eCPC), which supports biomarker discovery and offers decision support for personalized intervention strategies. The generic eCPC contribution is a workflow with 4 nodes applied iteratively, and the concept of e-Science signifies systematic use of tools from the mathematical, statistical, data, and computer sciences. RESULTS: The eCPC workflow is illustrated through 2 case studies. For prostate cancer, an in-house personalized screening tool, the Stockholm-3 model (S3M), is presented as an alternative to prostate-specific antigen testing alone. S3M is evaluated in a trial setting and plans for rollout in the population are discussed. For breast cancer, new biomarkers based on breast density and molecular profiles are developed and the US multicenter Women Informed to Screen Depending on Measures (WISDOM) trial is referred to for evaluation. While current eCPC data management uses a traditional data warehouse model, we discuss eCPC-developed features of a coherent data integration platform. DISCUSSION AND CONCLUSION: E-Science tools are a key part of an evidence-based process for personalized medicine. This paper provides a structured workflow from data and models to evaluation of new personalized intervention strategies. The importance of multidisciplinary collaboration is emphasized. Importantly, the generic concepts of the suggested eCPC workflow are transferrable to other disease domains, although each disease will require tailored solutions.

Subject(s)

Breast Neoplasms/diagnosis , Computational Biology , Early Detection of Cancer , Precision Medicine , Prostatic Neoplasms/diagnosis , Workflow , Aged , Algorithms , Biomarkers, Tumor/analysis , Data Mining , Female , Humans , Male , Middle Aged , Models, Biological , Risk Assessment , Sweden

Large-scale virtual screening on public cloud resources with Apache Spark.

Capuccini, Marco; Ahmed, Laeeq; Schaal, Wesley; Laure, Erwin; Spjuth, Ola.

J Cheminform ; 9: 15, 2017.

Article in English | MEDLINE | ID: mdl-28316653

ABSTRACT

BACKGROUND: Structure-based virtual screening is an in-silico method to screen a target receptor against a virtual molecular library. Applying docking-based screening to large molecular libraries can be computationally expensive, however it constitutes a trivially parallelizable task. Most of the available parallel implementations are based on message passing interface, relying on low failure rate hardware and fast network connection. Google's MapReduce revolutionized large-scale analysis, enabling the processing of massive datasets on commodity hardware and cloud resources, providing transparent scalability and fault tolerance at the software level. Open source implementations of MapReduce include Apache Hadoop and the more recent Apache Spark. RESULTS: We developed a method to run existing docking-based screening software on distributed cloud resources, utilizing the MapReduce approach. We benchmarked our method, which is implemented in Apache Spark, docking a publicly available target receptor against [Formula: see text]2.2 M compounds. The performance experiments show a good parallel efficiency (87%) when running in a public cloud environment. CONCLUSION: Our method enables parallel Structure-based virtual screening on public cloud resources or commodity computer clusters. The degree of scalability that we achieve allows for trying out our method on relatively small libraries first and then to scale to larger libraries. Our implementation is named Spark-VS and it is freely available as open source from GitHub (https://github.com/mcapuccini/spark-vs).Graphical abstract.

Idle waves in high-performance computing.

Markidis, Stefano; Vencels, Juris; Peng, Ivy Bo; Akhmetova, Dana; Laure, Erwin; Henri, Pierre.

Phys Rev E Stat Nonlin Soft Matter Phys ; 91(1): 013306, 2015 Jan.

Article in English | MEDLINE | ID: mdl-25679738

ABSTRACT

The vast majority of parallel scientific applications distributes computation among processes that are in a busy state when computing and in an idle state when waiting for information from other processes. We identify the propagation of idle waves through processes in scientific applications with a local information exchange between the two processes. Idle waves are nondispersive and have a phase velocity inversely proportional to the average busy time. The physical mechanism enabling the propagation of idle waves is the local synchronization between two processes due to remote data dependency. This study provides a description of the large number of processes in parallel scientific applications as a continuous medium. This work also is a step towards an understanding of how localized idle periods can affect remote processes, leading to the degradation of global performance in parallel scientific applications.

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL