Karlsruhe 2024 – scientific programme
T: Fachverband Teilchenphysik
T 68: Data, AI, Computing 6 (ML utilities)
T 68.1: Talk
Wednesday, March 6, 2024, 16:00–16:15, Geb. 30.34: LTI
b-hive: a Model-Independent Machine Learning Training Framework for the CMS Experiment
•Mate Farkas¹, Niclas Eich², and Martin Erdmann³
¹mate.farkas@rwth-aachen.de, ²niclas.eich@rwth-aachen.de, ³martin.erdmann@physik.rwth-aachen.de
In high-energy physics (HEP), neural-network (NN) based algorithms have found many applications, such as quark-flavor identification of jets in experiments like the Compact Muon Solenoid (CMS) at the Large Hadron Collider (LHC) at CERN. However, complete training pipelines often face application-specific obstacles, such as processing many large ROOT files, providing the data to the model, and evaluating the result correctly. We have developed a framework called "b-hive" that combines state-of-the-art tools for HEP data processing and training in a Python-based ecosystem. The framework uses common Python packages such as law, coffea, and PyTorch, bundled in a conda environment to keep the setup simple. Subtasks such as dataset conversion, training, and evaluation are implemented as law tasks, so that built-in versioning and parametrization make trainings straightforward to reproduce. The framework has a modular structure in which individual components can be exchanged and selected via parameters, making b-hive suited not only for production tasks but also for network development and optimization. Furthermore, fundamental HEP requirements, such as the configuration of different physics processes, event-level information, and kinematic cuts, can be specified and steered in a single configuration without modifying the code itself.
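The pattern described in the abstract (pipeline steps as parametrized, versioned tasks steered by one configuration holding processes, event-level features, and kinematic cuts) can be illustrated with a minimal plain-Python sketch. This is not b-hive or law code: the classes `Cut`, `Config`, `Task`, `ConvertDataset`, and `Train`, their fields, and the `store/<Task>/<version>/<hash>` output scheme are hypothetical. The method names `requires` and `run` only loosely echo law/luigi naming conventions; this is not law's API. The idea shown is that hashing all task parameters into the output location makes identical settings reproduce identical outputs.

```python
"""Hypothetical sketch of config-steered, versioned pipeline tasks.

NOT b-hive or law code: all class names, fields and the output-path
scheme below are illustrative assumptions.
"""
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class Cut:
    """Hypothetical kinematic cut: low < jet[variable] < high (bounds optional)."""
    variable: str
    low: Optional[float] = None
    high: Optional[float] = None

    def passes(self, jet: dict) -> bool:
        value = jet[self.variable]
        if self.low is not None and value <= self.low:
            return False
        if self.high is not None and value >= self.high:
            return False
        return True


@dataclass(frozen=True)
class Config:
    """Hypothetical single configuration: processes, event features, cuts."""
    processes: tuple = ("ttbar", "qcd")
    event_features: tuple = ("n_pv",)
    cuts: tuple = (Cut("pt", low=20.0), Cut("abs_eta", high=2.5))


@dataclass(frozen=True)
class Task:
    """Base pipeline step whose output location depends on all its parameters."""
    config: Config
    version: str

    def output_path(self) -> str:
        # Serialize every parameter (including the nested config) deterministically,
        # so identical settings always map to the same output directory.
        blob = json.dumps(asdict(self), sort_keys=True)
        digest = hashlib.sha256(blob.encode()).hexdigest()[:12]
        return f"store/{type(self).__name__}/{self.version}/{digest}"


@dataclass(frozen=True)
class ConvertDataset(Task):
    def run(self, jets: list) -> list:
        # Keep only jets that pass every configured kinematic cut.
        return [j for j in jets if all(c.passes(j) for c in self.config.cuts)]


@dataclass(frozen=True)
class Train(Task):
    epochs: int = 10

    def requires(self) -> ConvertDataset:
        # The upstream step shares config and version but not training-only
        # parameters, so changing `epochs` does not invalidate converted data.
        return ConvertDataset(config=self.config, version=self.version)


train = Train(config=Config(), version="v1", epochs=20)
jets = [
    {"pt": 35.0, "abs_eta": 1.2},
    {"pt": 15.0, "abs_eta": 0.4},
    {"pt": 50.0, "abs_eta": 3.0},
]
print(train.requires().output_path())
print(train.requires().run(jets))  # → [{'pt': 35.0, 'abs_eta': 1.2}]
```

Changing only the training parameters (here `epochs`) yields a new `Train` output path while the `ConvertDataset` output stays the same, which is the kind of reuse and reproducibility that parameter-based versioning is meant to provide.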
Keywords: b-hive; CMS Experiment; Machine Learning; b-tagging; Python