Karlsruhe 2024 – scientific programme
T: Fachverband Teilchenphysik
T 68: Data, AI, Computing 6 (ML utilities)
T 68.3: Talk
Wednesday, March 6, 2024, 16:30–16:45, Geb. 30.34: LTI
MLProf: Automated resource profiling of machine learning models in CMS production — •Nathan Prouvost, Marcel Rieger, and Peter Schleper — University Hamburg, Hamburg, Germany
As LHC experiments and collaborations record ever larger amounts of data, the efficient use of computing resources is becoming more important. Machine learning models, which are growing in both number of applications and degree of specialization, can be difficult to include in the central reconstruction workflows of an experiment.
Within the CMS experiment at CERN, awareness of the time and memory budgets available to such models has grown in recent years. In addition, integrating machine learning models into the CMS core software can be laborious in itself, which impedes a fast feedback loop from performance measurements back to model development. A new tool called MLProf has therefore been created to automate the extraction of resource consumption metrics of machine learning models within the CMS software environment.
This presentation introduces MLProf and its main features, including automated runtime measurements for different batch sizes, different versions of the CMS software environment, and different inference engines.
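To illustrate what a runtime measurement over several batch sizes involves, here is a minimal generic sketch in Python. It is not MLProf's interface or implementation, which the abstract does not describe. The names `dummy_model` and `profile_batch_sizes`, and the parameters `n_features`, `n_calls` and `n_warmup`, are hypothetical and chosen only for illustration. The stand-in model simply sums its inputs; a real measurement would call an actual inference engine instead.

```python
import statistics
import time
from typing import Callable, Dict, List, Sequence


def dummy_model(batch: List[List[float]]) -> List[float]:
    """Hypothetical stand-in for an inference call: sums each input row."""
    return [sum(row) for row in batch]


def profile_batch_sizes(
    model: Callable[[List[List[float]]], List[float]],
    batch_sizes: Sequence[int],
    n_features: int = 16,
    n_calls: int = 20,
    n_warmup: int = 3,
) -> Dict[int, float]:
    """Return the median wall-clock time per call (in seconds) for each batch size."""
    results: Dict[int, float] = {}
    for batch_size in batch_sizes:
        # Build one constant input batch of the requested size.
        batch = [[1.0] * n_features for _ in range(batch_size)]
        # Warm-up calls are executed but not timed.
        for _ in range(n_warmup):
            model(batch)
        timings = []
        for _ in range(n_calls):
            start = time.perf_counter()
            model(batch)
            timings.append(time.perf_counter() - start)
        # The median is less sensitive to outliers than the mean.
        results[batch_size] = statistics.median(timings)
    return results


if __name__ == "__main__":
    for size, seconds in profile_batch_sizes(dummy_model, [1, 4, 16]).items():
        print(f"batch size {size:>3}: {seconds * 1e6:.1f} us per call")
```

Repeating the same loop for each software version or inference engine would give the kind of comparison the abstract mentions. How MLProf organizes these measurements is not stated here.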
Keywords: profiling; machine-learning; CMS; production; software