Program

All times are CEST.

Time Content PDF
09:00 – 09:10 Opening
09:10 – 10:00 Keynote presentation:
Monitoring and anomaly detection in CINECA’s supercomputing facility
Daniele Cesarini (CINECA, Italy)
PDF
10:00 – 10:30 Paper presentation:
Automatic Detection of HPC Job Inefficiencies at TU Dresden’s HPC center with PIKA
Frank Winkler and Andreas Knüpfer (TU Dresden, Germany)
PDF
10:30 – 11:00 Paper presentation:
A Fast Simulator to Enable HPC Scheduling Strategy Comparisons
Alex Wilkinson (UCL, United Kingdom), Jess Jones, Harvey Richardson, Tim Dykes and Utz-Uwe Haus (HPE, United Kingdom)
PDF
11:00 – 11:30 Coffee break
11:30 – 12:00 Invited talk:
Current and Future Monitoring at GWDG to Ensure Performant and Secure Operation
Hendrik Nolte (GWDG Göttingen, Germany)
PDF
12:00 – 12:30 Paper presentation:
ML-based methodology for HPC facilities supervision
Laetitia Anton, Sophie Willemot, Sebastien Gougeaud (CEA, France) and Soraya Zertal (Univ of Versailles, France)
PDF
12:30 – 12:55 Panel / participant discussion:
MODA as the foundational component of data center digital twins
Utz-Uwe Haus (HPE, Switzerland)
12:55 – 13:00 Closing

Keynote presentation:
Monitoring and anomaly detection in CINECA’s supercomputing facility
Daniele Cesarini (CINECA, Italy)

Supercomputing facilities treat energy efficiency and data center automation as crucial objectives in their research and deployment agendas. As modern high-performance computing systems become increasingly complex, automated and data-driven methodologies become essential to manage and maintain system availability. At CINECA, we have developed a comprehensive and adaptable monitoring framework, known as Examon, which monitors HPC systems and critical facility infrastructure. We are particularly focused on assessing the feasibility of using machine learning for job scheduling and power prediction, and deep learning for anomaly detection on compute nodes. In this presentation, we will discuss the latest advancements in Examon and how CINECA uses this tool in production on large-scale HPC systems and supercomputing facilities.