Karlsruhe 2024 – scientific programme

T: Fachverband Teilchenphysik

T 44: Data, AI, Computing 4 (workflow)

T 44.4: Talk

Tuesday, March 5, 2024, 16:45–17:00, Geb. 30.34: LTI

Parallelization and benchmarking of a Jupyter based HEP data analysis with Dask — •Karl Erik Bode, Michael Böhler, and Markus Schumacher — Institute of Physics, Albert-Ludwigs-University Freiburg, Freiburg, Germany

By combining the scientific Python software stack, Jupyter notebooks, and Dask, it is possible to scale an interactive HEP data analysis, both on local resources and on a computing cluster.

Vectorizing the reference analysis of the Higgs boson decay to two photons reduces the required compute time and makes it feasible to analyze larger data sets.
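
As an illustration of what vectorization means here, the following minimal sketch computes the di-photon invariant mass once with an explicit event loop and once with whole-array NumPy operations. It is not the authors' code: the function names, the toy kinematic ranges, and the random input are assumptions made for this example. The mass formula is the standard one for two massless particles, m² = 2 pT1 pT2 (cosh Δη − cos Δφ).

```python
import numpy as np


def diphoton_mass_loop(pt1, eta1, phi1, pt2, eta2, phi2):
    """Event-by-event loop: one Python iteration per event."""
    masses = []
    for i in range(len(pt1)):
        m2 = 2.0 * pt1[i] * pt2[i] * (
            np.cosh(eta1[i] - eta2[i]) - np.cos(phi1[i] - phi2[i])
        )
        masses.append(np.sqrt(m2))
    return np.array(masses)


def diphoton_mass_vectorized(pt1, eta1, phi1, pt2, eta2, phi2):
    """Same formula, applied to all events at once with array operations."""
    m2 = 2.0 * pt1 * pt2 * (np.cosh(eta1 - eta2) - np.cos(phi1 - phi2))
    return np.sqrt(m2)


# Toy input (assumption): random photon kinematics, pT in GeV.
rng = np.random.default_rng(seed=42)
n_events = 10_000
pt1, pt2 = rng.uniform(30.0, 100.0, size=(2, n_events))
eta1, eta2 = rng.uniform(-2.4, 2.4, size=(2, n_events))
phi1, phi2 = rng.uniform(-np.pi, np.pi, size=(2, n_events))

mass_loop = diphoton_mass_loop(pt1, eta1, phi1, pt2, eta2, phi2)
mass_vec = diphoton_mass_vectorized(pt1, eta1, phi1, pt2, eta2, phi2)
```

Both functions return the same values; the vectorized version moves the per-event loop out of the Python interpreter and into compiled NumPy routines, which is where the reduction in compute time comes from.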

Dask scales the vectorized algorithm to all CPU cores of the local machine and, at the same time, provides data structures that enable the analysis of data sets larger than memory.
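
A minimal sketch of this idea, assuming `dask.array` as the chunked data structure (the abstract does not state which Dask collection is used): the array is split into chunks, the computation is recorded lazily as a task graph, and the threaded scheduler processes the chunks in parallel so that the full array never has to be held in memory at once. The toy input, the chunk size, and the 105 to 160 GeV histogram range are assumptions made for this example.

```python
import numpy as np
import dask.array as da

n_events = 1_000_000
chunk = 100_000  # events per chunk; each chunk is processed independently

# Toy input (assumption): lazily generated random photon kinematics.
pt1 = da.random.uniform(30.0, 100.0, size=n_events, chunks=chunk)
pt2 = da.random.uniform(30.0, 100.0, size=n_events, chunks=chunk)
deta = da.random.uniform(-4.8, 4.8, size=n_events, chunks=chunk)
dphi = da.random.uniform(-np.pi, np.pi, size=n_events, chunks=chunk)

# Same vectorized formula as with NumPy; nothing is computed yet,
# Dask only builds a task graph over the chunks.
mass = da.sqrt(2.0 * pt1 * pt2 * (da.cosh(deta) - da.cos(dphi)))

# Histogram of the invariant mass in a hypothetical signal window (GeV).
counts_lazy, _ = da.histogram(mass, bins=55, range=(105.0, 160.0))

# Trigger the computation on the local thread pool.
counts = counts_lazy.compute(scheduler="threads")
```

Only the small histogram is materialized in memory; the per-event arrays exist chunk by chunk while the graph executes.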

Only minor changes are required to port this analysis setup from a laptop to a high-throughput computing (HTC) or high-performance computing (HPC) cluster.
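
One way such a port can look is sketched below, assuming `dask.distributed` on the laptop and the `dask_jobqueue` package on the clusters; the abstract does not say which deployment tools the authors use. The helper `make_cluster`, the target names, and all resource values are hypothetical placeholders. Only the cluster object changes between environments, while the analysis code that talks to the `Client` stays the same.

```python
from dask.distributed import Client, LocalCluster


def make_cluster(target="laptop"):
    """Return a Dask cluster for the given environment (hypothetical helper)."""
    if target == "laptop":
        # In-process workers on the local machine.
        return LocalCluster(
            n_workers=2,
            threads_per_worker=1,
            processes=False,
            dashboard_address=None,
        )
    if target == "htc":
        # Assumes an HTCondor batch system and dask_jobqueue being installed.
        from dask_jobqueue import HTCondorCluster

        cluster = HTCondorCluster(cores=1, memory="2GB", disk="2GB")
        cluster.scale(jobs=4)  # placeholder number of batch jobs
        return cluster
    if target == "hpc":
        # Assumes a SLURM batch system and dask_jobqueue being installed.
        from dask_jobqueue import SLURMCluster

        cluster = SLURMCluster(cores=8, memory="16GB", walltime="01:00:00")
        cluster.scale(jobs=4)  # placeholder number of batch jobs
        return cluster
    raise ValueError(f"unknown target: {target!r}")


# The analysis side is identical for every target.
cluster = make_cluster("laptop")
with Client(cluster) as client:
    total = client.submit(sum, [1, 2, 3]).result()
cluster.close()
```

Switching the argument from `"laptop"` to `"htc"` or `"hpc"` is the only change in this sketch; site-specific settings such as queues or accounts would still have to be added for a real cluster.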

This contribution introduces the software stack used to scale the algorithm from a single-threaded to a multi-threaded analysis. Finally, we discuss the performance improvement both on a typical laptop and on an HTC and an HPC cluster.

Keywords: Interactive Analysis; Cluster Computing
