
Barcelona Supercomputing Center
Resilience Design Patterns for High-Performance Computing
Pages
2
Time to read
2 mins
Publication
Language
English

This research article presents a structured approach to resilience in extreme-scale high-performance computing (HPC) systems. It outlines the challenges of reliability and fault management and proposes resilience design patterns as reusable solutions to them. The work aims to provide a comprehensive framework for evaluating and implementing resilience technologies, so that performance, power efficiency, and fault management can be optimized across hardware and software components.