Current Research

October 12th, 2016
2015-…: Resilience Design Patterns: A Structured Approach to Resilience at Extreme Scale

This project will increase the ability of scientific applications to reach accurate solutions in a timely and efficient manner. Using a novel design pattern concept, it identifies and evaluates repeatedly occurring resilience problems and coordinates solutions throughout high-performance computing hardware and software. [US Department of Energy Early Career Research Program]

2015-…: Catalog: Characterizing Faults, Errors, and Failures in Extreme-Scale Systems

This project identifies, categorizes and models the fault, error and failure properties of US Department of Energy high-performance computing (HPC) systems. It develops a fault taxonomy, catalog and models that capture the observed and inferred conditions in current systems and extrapolate this knowledge to exascale HPC systems. [US Department of Energy Resilience for Extreme Scale Supercomputing Systems Program]
