T: Particle Physics Division (Fachverband Teilchenphysik)
T 44: Data, AI, Computing 4 (workflow)
T 44.4: Talk
Tuesday, March 5, 2024, 16:45–17:00, Geb. 30.34: LTI
Parallelization and benchmarking of a Jupyter based HEP data analysis with Dask — •Karl Erik Bode, Michael Böhler, and Markus Schumacher — Institute of Physics, Albert-Ludwigs-University Freiburg, Freiburg, Germany
By combining the scientific Python software stack, Jupyter notebooks, and Dask, an interactive HEP data analysis can be scaled both on local resources and on a computing cluster.
After vectorising the reference Higgs boson to di-photon decay analysis, the required compute time decreases and even larger data sets can be analysed.
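The gain from vectorisation can be illustrated with a toy example (a sketch with made-up data, not the authors' actual analysis code): the di-photon invariant mass is computed once in an event-by-event Python loop and once as whole-array NumPy operations.

```python
import numpy as np

# Hypothetical toy data: kinematics of the two leading photons per event.
rng = np.random.default_rng(42)
n_events = 100_000
pt1 = rng.uniform(30, 80, n_events)
pt2 = rng.uniform(20, 60, n_events)
eta1, eta2 = rng.uniform(-2.5, 2.5, (2, n_events))
phi1, phi2 = rng.uniform(-np.pi, np.pi, (2, n_events))

def diphoton_mass_loop(pt1, eta1, phi1, pt2, eta2, phi2):
    """Event-by-event Python loop: single threaded and slow."""
    masses = np.empty(len(pt1))
    for i in range(len(pt1)):
        masses[i] = np.sqrt(
            2.0 * pt1[i] * pt2[i]
            * (np.cosh(eta1[i] - eta2[i]) - np.cos(phi1[i] - phi2[i]))
        )
    return masses

def diphoton_mass_vectorised(pt1, eta1, phi1, pt2, eta2, phi2):
    """Same computation expressed as whole-array NumPy operations."""
    return np.sqrt(
        2.0 * pt1 * pt2 * (np.cosh(eta1 - eta2) - np.cos(phi1 - phi2))
    )

# Both implementations agree; the vectorised one avoids the
# per-event Python interpreter overhead.
masses = diphoton_mass_vectorised(pt1, eta1, phi1, pt2, eta2, phi2)
assert np.allclose(masses, diphoton_mass_loop(pt1, eta1, phi1, pt2, eta2, phi2))
```

The vectorised form is also the one that carries over directly to Dask, since Dask arrays implement the same NumPy-style operations chunk by chunk.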
With Dask, the vectorised algorithm can be scaled to utilise all CPU cores of the local machine; at the same time, Dask provides data structures that enable the analysis of data sets larger than memory.
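A minimal sketch of this idea (illustrative sizes and cuts, not taken from the actual analysis): a chunked Dask array stands in for an event-level quantity that may not fit in memory; the NumPy-style expression builds a lazy task graph, and `.compute()` evaluates the chunks in parallel on the local CPU cores via Dask's default threaded scheduler.

```python
import dask.array as da

# Chunked array: 50M values, processed in 1M-element chunks,
# so only a few chunks need to be in memory at any time.
x = da.random.uniform(0, 200, size=50_000_000, chunks=1_000_000)

# Lazy, NumPy-style selection and reduction; nothing runs yet.
selected = x[(x > 100) & (x < 160)]

# compute() executes the task graph chunk-wise across local cores.
mean_in_window = selected.mean().compute()
```

Because the expression syntax mirrors NumPy, the vectorised analysis code needs little or no change to run on Dask arrays.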
Only minor changes are required to port this analysis setup from a laptop to a High-Throughput Computing (HTC) or High-Performance Computing (HPC) cluster.
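In practice, the change is typically confined to how the Dask client is constructed: the analysis code itself is unchanged. The following sketch uses a local in-process client; the commented-out variant shows how a batch-system deployment might look with `dask-jobqueue` (the parameter values are hypothetical, not taken from the abstract).

```python
from dask.distributed import Client

# On a laptop: an in-process Dask client using local threads.
client = Client(processes=False, n_workers=1, threads_per_worker=2)

# On an HTC/HPC system, only this setup would change, e.g. with
# dask-jobqueue (hypothetical resource parameters):
#
#   from dask_jobqueue import HTCondorCluster
#   cluster = HTCondorCluster(cores=8, memory="16 GB", disk="10 GB")
#   cluster.scale(jobs=10)
#   client = Client(cluster)

# The analysis code submitted to the client stays identical either way.
result = client.submit(sum, range(10)).result()
client.close()
```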
This contribution introduces the software stack used to scale the algorithm from a single-threaded to a multi-threaded analysis. Finally, we discuss the performance improvements both on a typical laptop and on HTC and HPC clusters.
Keywords: Interactive Analysis; Cluster Computing