Program

All times are CEST.

Time             Content
14:00 – 14:05    Opening
14:05 – 14:55    Keynote Presentation:
                 Redefining HPC Observability: Integrating Monitoring, Modeling, and Meaning
                 Sarah Neuwirth (Johannes Gutenberg University, Germany)
14:55 – 15:20    Paper Presentation:
                 Monitoring Energy Consumption of Workloads on HPC Vega
                 Teo Prica (UM – University of Maribor, Slovenia / IZUM – Institute of Information Science, Slovenia) and Aleš Zamuda (UM – University of Maribor, Slovenia)
15:20 – 15:45    Paper Presentation:
                 Supporting HPC Users with LLview
                 Filipe Souza Mendes Guimarães, Aravind Sankaran and Wolfgang Frings (Forschungszentrum Jülich, Germany)
15:45 – 16:00    Lightning Talk:
                 Enabling Adaptive Power Control through HPC Job Power Prediction
                 Kevin Menear and Dmitry Duplyakin (National Renewable Energy Laboratory, United States)
16:00 – 16:30    Coffee Break
16:30 – 16:45    Lightning Talk:
                 HPC Operational Data Analytics for Digital Twins
                 Jeff Hanson (Hewlett Packard Enterprise, United States)
16:45 – 17:10    Paper Presentation:
                 What Time Taught Us: Monitoring a Computing Technology Testbed Across Multiple Years
                 Eva Siegmann, David Carlson (Stony Brook University, United States), Nikolay Simakov (University at Buffalo, United States), Anthony Curtis, Alan Calder and Robert Harrison (Stony Brook University, United States)
17:10 – 17:35    Paper Presentation:
                 A Unified I/O Monitoring Framework Using eBPF
                 Mahendra Paipuri (CNRS, France)
17:35 – 18:00    Paper Presentation:
                 Duration-Informed Workload Scheduler
                 Daniela Loreti, Davide Leone and Andrea Borghesi (University of Bologna, Italy)

Keynote Presentation:
Redefining HPC Observability: Integrating Monitoring, Modeling, and Meaning
Sarah Neuwirth (Johannes Gutenberg University, Germany)

As HPC systems become increasingly heterogeneous and applications span simulation, AI, and data-intensive workflows, performance optimization is becoming more complex, and traditional monitoring is no longer sufficient. To fully understand and adapt these systems, we need to rethink observability as an integrated process that combines continuous monitoring, empirical modeling, and domain-relevant interpretation. This talk will explore how such a shift can transform the analysis, optimization, and operation of HPC systems at scale.

Drawing on practical experience and recent research, this talk will propose a framework that combines low-level telemetry with explainable modeling approaches, including I/O pattern detection, extended roofline analysis, ML-driven performance workflows, and benchmarking strategies. We will explore how to extract meaning from multiple data sources, foster reproducibility, and build trust in performance insights. The talk will conclude with a vision for self-aware HPC environments that can dynamically and responsibly adapt to evolving computational needs.