All times are CEST.
Time | Content
---|---
14:00 – 14:05 | Opening
14:05 – 14:55 | Keynote Presentation: Redefining HPC Observability: Integrating Monitoring, Modeling, and Meaning. Sarah Neuwirth (Johannes Gutenberg University, Germany)
14:55 – 15:20 | Paper Presentation: Monitoring Energy Consumption of Workloads on HPC Vega. Teo Prica (UM – University of Maribor, Slovenia / IZUM – Institute of Information Science, Slovenia) and Aleš Zamuda (UM – University of Maribor, Slovenia)
15:20 – 15:45 | Paper Presentation: Supporting HPC Users with LLview. Filipe Souza Mendes Guimarães, Aravind Sankaran and Wolfgang Frings (Forschungszentrum Jülich, Germany)
15:45 – 16:00 | Lightning Talk: Enabling Adaptive Power Control through HPC Job Power Prediction. Kevin Menear and Dmitry Duplyakin (National Renewable Energy Laboratory, United States)
16:00 – 16:30 | Coffee Break
16:30 – 16:45 | Lightning Talk: HPC Operational Data Analytics for Digital Twins. Jeff Hanson (Hewlett Packard Enterprise, United States)
16:45 – 17:10 | Paper Presentation: What Time Taught Us: Monitoring a Computing Technology Testbed Across Multiple Years. Eva Siegmann, David Carlson (Stony Brook University, United States), Nikolay Simakov (University at Buffalo, United States), Anthony Curtis, Alan Calder and Robert Harrison (Stony Brook University, United States)
17:10 – 17:35 | Paper Presentation: A Unified I/O Monitoring Framework Using eBPF. Mahendra Paipuri (CNRS, France)
17:35 – 18:00 | Paper Presentation: Duration-Informed Workload Scheduler. Daniela Loreti, Davide Leone and Andrea Borghesi (University of Bologna, Italy)
Keynote Presentation:
Redefining HPC Observability: Integrating Monitoring, Modeling, and Meaning
Sarah Neuwirth (Johannes Gutenberg University, Germany)
As HPC systems become increasingly heterogeneous and applications span simulation, AI, and data-intensive workflows, performance optimization is becoming more complex, and traditional monitoring alone is no longer sufficient. To fully understand and adapt these systems, we need to rethink observability as an integrated process that combines continuous monitoring, empirical modeling, and domain-relevant interpretation. This talk will explore how such a shift can transform the analysis, optimization, and operation of HPC systems at scale.
Drawing on practical experience and recent research, the talk will propose a framework that combines low-level telemetry with explainable modeling approaches, including I/O pattern detection, extended roofline analysis, ML-driven performance workflows, and benchmarking strategies. We will explore how to extract meaning from multiple data sources, foster reproducibility, and build trust in performance insights. The talk will conclude with a vision for self-aware HPC environments that can dynamically and responsibly adapt to evolving computational needs.