All times are CEST.

Time | Content | Speakers
---|---|---
09:00 – 09:10 | Opening | |
09:10 – 10:00 | Keynote presentation: Monitoring and anomaly detection in CINECA’s supercomputing facility | Daniele Cesarini (CINECA, Italy)
10:00 – 10:30 | Paper presentation: Automatic Detection of HPC Job Inefficiencies at TU Dresden’s HPC center with PIKA | Frank Winkler and Andreas Knüpfer (TU Dresden, Germany)
10:30 – 11:00 | Paper presentation: A Fast Simulator to Enable HPC Scheduling Strategy Comparisons | Alex Wilkinson (UCL, United Kingdom), Jess Jones, Harvey Richardson, Tim Dykes and Utz-Uwe Haus (HPE, United Kingdom)
11:00 – 11:30 | Coffee break | |
11:30 – 12:00 | Invited talk: Current and Future Monitoring at GWDG to Ensure Performant and Secure Operation | Hendrik Nolte (GWDG Göttingen, Germany)
12:00 – 12:30 | Paper presentation: ML-based methodology for HPC facilities supervision | Laetitia Anton, Sophie Willemot, Sebastien Gougeaud (CEA, France) and Soraya Zertal (Univ of Versailles, France)
12:30 – 12:55 | Panel / participant discussion: MODA as the foundational component of data center digital twins | Utz-Uwe Haus (HPE, Switzerland)
12:55 – 13:00 | Closing | |

Keynote presentation:
Monitoring and anomaly detection in CINECA’s supercomputing facility
Daniele Cesarini (CINECA, Italy)
Supercomputing facilities treat energy efficiency and data center automation as crucial objectives in their research and deployment agendas. As modern high-performance computing systems become increasingly complex, automated and data-driven methodologies become essential for managing systems and maintaining their availability. At CINECA, we have developed a comprehensive and adaptable monitoring framework, known as Examon, which monitors HPC systems and critical facility infrastructure. We are particularly focused on assessing the feasibility of using machine learning for job scheduling and power prediction, and deep learning for anomaly detection on compute nodes. In this presentation, we will discuss the latest advancements in Examon and how CINECA uses this tool in production on large-scale HPC systems and supercomputing facilities.