Dresden 2013 – Scientific Programme
HK: Physics of Hadrons and Nuclei Division
HK 34: Instrumentation
HK 34.3: Talk
Tuesday, 5 March 2013, 14:45–15:00, HSZ-405
Experience Report: System Management at the ALICE HLT Cluster — •Camilo Lara, Falco Vennedey, Jochen Ulrich, Stefan Böttger, Timo Breitner, and Udo Kebschull — Infrastruktur und Rechnersysteme in der Informationsverarbeitung (IRI), Institut für Informatik, Goethe-Universität Frankfurt am Main
The ALICE High-Level Trigger (HLT) cluster performs the first analysis and compression of the data from the ALICE experiment at CERN. The processing runs on hardware accelerators such as FPGAs and GPUs together with compute nodes built from commodity hardware. This mixture of accelerators and several node types increases the configuration and system management effort. To handle this effort, we use a combination of three tools: Chef for configuration management, Ganglia for real-time monitoring, and SysMES for unattended system management, i.e. automatic problem recognition and resolution. The tools minimize the manpower needed to administer the cluster by reducing the time needed to recognize and identify problems, or even by solving problems automatically. In this talk, we give an insight into our setup and report on the experience we have gained with the heterogeneous, on-line processing cluster during the last four years.
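The idea of automatic problem recognition and resolution can be illustrated with a minimal, generic sketch. It is not the SysMES implementation or its API: the `Rule` class, the `evaluate` function, the `restart_service` action, the node name, the metric name and the threshold are all hypothetical and chosen only for illustration. The sketch assumes that a monitoring system delivers one dictionary of metric values per node and that a rule fires when a metric exceeds a fixed threshold.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    """Hypothetical rule: if `metric` exceeds `threshold`, run `action`."""
    name: str
    metric: str
    threshold: float
    action: Callable[[str], str]


def restart_service(node: str) -> str:
    # Placeholder for a real remediation step; it only describes the intent.
    return f"restart service on {node}"


def evaluate(rules: List[Rule], node: str, metrics: Dict[str, float]) -> List[str]:
    """Return the results of all actions triggered by one node's metric sample."""
    triggered = []
    for rule in rules:
        value = metrics.get(rule.metric)
        # Metrics missing from the sample are skipped, not treated as failures.
        if value is not None and value > rule.threshold:
            triggered.append(rule.action(node))
    return triggered


# Illustrative rule set and metric sample (all values are assumptions).
rules = [Rule("high-load", "load_one", 16.0, restart_service)]
actions = evaluate(rules, "node01", {"load_one": 20.0, "mem_free": 1024.0})
```

Keeping the rules as plain data separates problem recognition (the threshold check) from problem resolution (the action), so new node types or accelerators would only need additional rules rather than changes to the evaluation loop.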