2013-…: MCREX – Monte Carlo Resilient Exascale Solvers
The next generation of computational science applications will require numerical solvers that are capable of high performance on proposed exascale platforms. In order to meet this goal, solvers must be resilient to soft and hard failures, provide high concurrency on heterogeneous hardware configurations, and retain numerical accuracy and efficiency. In light of these requirements, a natural avenue of inquiry would be to adapt the current stable of numerically efficient solvers to this new high-performance computing (HPC) regime. However, an alternative approach is to investigate different classes of algorithms that can address issues of resiliency naturally. This project investigates new stochastic methods for solving linear systems, otherwise termed Monte Carlo Resilient Exascale (MCREX) solvers. The family of methods builds on the sequential Monte Carlo work of Halton, 1962. While showing significant promise, this class of solvers has not made inroads into the broader computational science community. Our initially developed methods use Monte Carlo to accelerate a fixed-point iteration. Therefore, they are called Monte Carlo Synthetic Acceleration (MCSA). Preliminary work using MCSA has demonstrated that they are at least as efficient as Jacobi-preconditioned Conjugate Gradient (PCG) on sparse, SPD systems. These initial results demonstrate that, because MCSA does not require symmetry or positive definiteness, very good efficiency could be attained on non-symmetric systems, thus making MCSA an ideal solver in non-linear Newton schemes. Furthermore, Monte Carlo methods have the benefit of addressing resiliency in a natural way; soft errors can be treated as high variance samples and lost histories from processor failures can be easily discarded without affecting the quality of the solution.
- Office of Advanced Scientific Computing Research, Office of Science, U.S. Department of Energy