Program

All times are CEST.

Time | Content | Slides
09:00 – 09:05 | Opening |
09:05 – 09:55 | Keynote Presentation: The Past, Current, and Future of HPC ODA. Woong Shin (Oak Ridge National Laboratory) | TBA
09:55 – 10:20 | Paper Presentation: Automatic Workload Characterization on Production HPC Systems via Roofline Telemetry. Bole Ma, Jan Eitzinger, Gerhard Wellein (Erlangen National High Performance Computing Center, Germany) | TBA
10:20 – 10:45 | Paper Presentation: Pre-runtime GPU Power Quantile Forecasting from Submission-Time Job Artifacts. Kevin Menear, Jazmin Green, Struan Clark, Sara Abril-Guevara, Olivia Hull, Weslley da Silva Pereira, Kristin Munch (National Laboratory of the Rockies, United States of America) | TBA
10:45 – 11:00 | Invited Talk: Data-Driven Orchestration: Leveraging Resource Utilization Models for Efficient Management in Traditional Scientific and AI/ML Computing Environments. Jim Brandt (Sandia National Laboratories, United States of America) | TBA
11:00 – 11:30 | Coffee Break |
11:30 – 11:55 | Paper Presentation: Evaluating Forecasting Techniques for Hardware Errors on a Large-scale HPC System. Kaiyuan Liao, Xiwei Xuan, Tanwi Mallick (University of California, Davis, United States of America), Kevin Brown (Argonne National Laboratory, United States of America) | TBA
11:55 – 12:20 | Paper Presentation: Centralised Dashboard for Continuous Benchmarking: From HPC Clusters to Quantum Processors. Filipe Guimaraes, Ashwin Kumar Karnad, Pit Steinbach, Thomas Breuer, Jhon Alejandro Montanez Barrera, Carlos Daniel Gonzalez Calaza, Wolfgang Frings (Forschungszentrum Jülich, Germany) | TBA
12:20 – 12:45 | Paper Presentation: Enhancing Security in HPC Systems: A Clustering Approach for Filtering Weak Signals. Laetitia Anton-Croquelois, Sophie Willemot, Ludovic Mustiere, Sebastien Gougeaud (CEA, France) | TBA
12:45 – 13:00 | Closing |

Keynote by Woong Shin
Oak Ridge National Laboratory (ORNL)

The Past, Current, and Future of HPC ODA

Abstract: Operational Data Analytics (ODA) is rapidly becoming a cornerstone of HPC operations, essential for tackling challenges such as energy efficiency, sustainability, and system resilience. However, despite advancements, ODA is far from a solved problem. The sheer scale and complexity of modern supercomputers demand breakthroughs in data collection, analytics, and AI/ML applications to transform raw telemetry into actionable intelligence. Drawing from years of experience spanning multiple supercomputing generations, this talk will explore the evolution of ODA, from its early roots in monitoring to its expanding role in predictive and prescriptive analytics. I will highlight key obstacles, lessons learned, and emerging opportunities, emphasizing that ODA is not merely an auxiliary function but a critical enabler of future HPC success. This keynote will serve as a call to action for the community to invest in ODA innovation, ensuring that HPC remains efficient, adaptive, and ready for the challenges ahead.

Bio: Woong Shin (Ph.D.) is an HPC systems engineer and computer scientist in the Analytics & AI Methods at Scale (AAIMS) Group at Oak Ridge National Laboratory (ORNL). He is involved in R&D and engineering activities around designing and improving system software and system architectures for HPC systems. He currently serves as the technical lead for developing and maintaining operational data analytics systems that provide near-real-time and long-term insights for the Oak Ridge Leadership Computing Facility. Woong started his career as a software engineer in the enterprise sector, working for Samsung and TmaxSoft (South Korea) on monitoring systems and business intelligence systems. Later in his career, he pursued academic training in system software, distributed systems, and computer architecture, specializing in NVRAM-based storage systems. He joined ORNL in 2017. He received his Ph.D. degree in electrical engineering and computer science (M.S. and Ph.D. integrated course) in 2017 from Seoul National University, South Korea. He earned his B.S. in computer science in 2003 from Korea University, South Korea.