About Me

April 4th, 2013

Dr. Christian Engelmann is the Task Lead of the System Software Team in the Computer Science Research Group of the Computer Science and Mathematics Division at Oak Ridge National Laboratory, the U.S. Department of Energy’s largest multiprogram science and technology laboratory, with an annual budget of $1.6 billion. He has 12 years of experience in software research and development (R&D) for extreme-scale high-performance computing (HPC) systems, with a strong research funding and publication record. In collaboration with other laboratories and universities, his research aims at solving computer science challenges in HPC software, such as scalability, dependability, energy efficiency, and portability, for the largest current and future supercomputers in the world.

Dr. Engelmann’s primary expertise is in HPC resilience, i.e., providing efficiency and correctness in the presence of faults, errors, and failures through avoidance, masking, and recovery. As chair and member of several scientific committees and panels, including the U.S. DOE Technical Council on Resilience, he is a leading expert in the HPC resilience community. The term HPC resilience was coined in a co-authored whitepaper in 2009.

Dr. Engelmann’s secondary expertise is in HPC hardware/software co-design through lightweight simulation of future-generation extreme-scale systems with up to 134,217,728 (2^27) processor cores, studying the impact of hardware, system software, and parallel application properties on the key HPC system design factors: performance, resilience, and power consumption. His skills further include leading R&D teams, co-advising students, programming in C/C++, MPI, Fortran, and Java, and system administration.

Download NSF-style 2-page bio. Download full list of publications. Resume available upon request.

Contact Information

e-Mail: engelmannc@computer.org
Mail: P.O. Box 2008, Oak Ridge, TN 37831-6016, USA
Phone: +1 (865) 574-3132
Fax: +1 (865) 576-5491

View Christian Engelmann's profile on LinkedIn
View Christian Engelmann's profile on Facebook

Professional Accomplishments

8 Research grants ($15.6M, 2 as lead-PI)
8 Co-advised Master's theses
2 Mentored summer faculty
10 Direct reports over the past 7 years
Erdős number of 5

7 Peer-reviewed journal articles
34 Peer-reviewed conference papers
27 Peer-reviewed workshop papers
8 Peer-reviewed conference posters
910+ Total publication citations

36 Invited talks and seminars
83 Committees at 36 conference series
27 Article and book proposal reviews
11 Conference booth exhibitions
H-index of 16 / G-index of 27

Ongoing Research Activities

2012-…: HPC resilience co-design toolkit evaluating the resilience/power/performance cost/benefit trade-off of resilience solutions, identifying hardware/software resilience properties, and coordinating interfaces/responsibilities of individual hardware/software components …

Upcoming Deadlines

2013-09-01: 4th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA) 2013 at the 26th IEEE/ACM International Conference on High Performance Computing, Networking, Storage and Analysis (SC), Denver, CO, USA, November 18, 2013.
2013-05-31: 6th Workshop on Resiliency in High-Performance Computing (Resilience) 2013 at the 19th International European Conference on Parallel and Distributed Computing (Euro-Par), Aachen, Germany, August 26-30, 2013.

Recent Events

2013-02-12: Presentation of the sole-authored research paper, Investigating Operating System Noise in Extreme-Scale High-Performance Computing Systems using Simulation, at the 11th IASTED International Conference on Parallel and Distributed Computing and Networks (PDCN) 2013, Innsbruck, Austria.
2013-01-31: First meeting of the Technical Council on Resilience for the Advanced Scientific Computing Research Program at the Office of Science of the U.S. Department of Energy, Germantown, MD, USA.
2012-11-15: Chair of the Birds-of-a-Feather session on Resilience for Extreme-scale High-performance Computing at the 25th IEEE/ACM International Conference on High Performance Computing, Networking, Storage and Analysis (SC) 2012, Salt Lake City, UT, USA.
2012-11-15: Presentation by David Fiala of the co-authored research paper, Detection and Correction of Silent Data Corruption for Large-Scale High-Performance Computing, at the 25th IEEE/ACM International Conference on High Performance Computing, Networking, Storage and Analysis (SC) 2012, Salt Lake City, UT, USA.
2012-11-12: After its recent upgrade to 261,632 NVIDIA K20x accelerator cores and 298,592 AMD Opteron cores, ORNL’s Titan Cray XK7 supercomputer is ranked 1st in the Top 500 List of supercomputers with a LINPACK performance of 17.95 PFlops and 3rd in the Green 500 List of energy-efficient supercomputers.
2012-11-11: Program Chair of the Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA) at the 25th IEEE/ACM International Conference on High Performance Computing, Networking, Storage and Analysis (SC) 2012, Salt Lake City, UT, USA.

Important Peer-reviewed Journal Publications

  1. Christian Engelmann. Scaling To A Million Cores And Beyond: Using Light-Weight Simulation to Understand The Challenges Ahead On The Road To Exascale. Future Generation Computer Systems (FGCS), 2013. Elsevier B.V., Amsterdam, The Netherlands. To appear.
  2. Chao Wang, Frank Mueller, Christian Engelmann, and Stephen L. Scott. Proactive Process-Level Live Migration and Back Migration in HPC Environments. Journal of Parallel and Distributed Computing (JPDC), volume 72, number 2, pages 254-267, 2012. Elsevier B.V., Amsterdam, The Netherlands. ISSN 0743-7315.
  3. Xubin (Ben) He, Li Ou, Christian Engelmann, Xin Chen, and Stephen L. Scott. Symmetric Active/Active Metadata Service for High Availability Parallel File Systems. Journal of Parallel and Distributed Computing (JPDC), volume 69, number 12, pages 961-973, 2009. Elsevier B.V., Amsterdam, The Netherlands. ISSN 0743-7315.
  4. Christian Engelmann, Stephen L. Scott, Chokchai (Box) Leangsuksun, and Xubin (Ben) He. Symmetric Active/Active High Availability for High-Performance Computing System Services. Journal of Computers (JCP), volume 1, number 8, pages 43-54, 2006. Academy Publisher, Oulu, Finland. ISSN 1796-203X.

Important Peer-reviewed Conference Publications

  1. David Fiala, Frank Mueller, Christian Engelmann, Kurt Ferreira, Ron Brightwell, and Rolf Riesen. Detection and Correction of Silent Data Corruption for Large-Scale High-Performance Computing. In Proceedings of the 25th IEEE/ACM International Conference on High Performance Computing, Networking, Storage and Analysis (SC) 2012, pages 78:1-78:12, Salt Lake City, UT, USA, November 10-16, 2012. ACM Press, New York, NY, USA. ISBN 978-1-4673-0804-5. Acceptance rate 21.2% (100/472).
  2. James Elliott, Kishor Kharbas, David Fiala, Frank Mueller, Kurt Ferreira, and Christian Engelmann. Combining Partial Redundancy and Checkpointing for HPC. In Proceedings of the 32nd International Conference on Distributed Computing Systems (ICDCS) 2012, pages 615-626, Macau, SAR, China, June 18-21, 2012. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 978-0-7695-4685-8. ISSN 1063-6927. Acceptance rate 13% (71/515).
  3. Chao Wang, Sudharshan S. Vazhkudai, Xiaosong Ma, Fei Meng, Youngjae Kim, and Christian Engelmann. NVMalloc: Exposing an Aggregate SSD Store as a Memory Partition in Extreme-Scale Machines. In Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium (IPDPS) 2012, pages 957-968, Shanghai, China, May 21-25, 2012. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 978-0-7695-4675-9. Acceptance rate 21% (118/569).
  4. Swen Böhm and Christian Engelmann. xSim: The Extreme-Scale Simulator. In Proceedings of the International Conference on High Performance Computing and Simulation (HPCS) 2011, pages 280-286, Istanbul, Turkey, July 4-8, 2011. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 978-1-61284-383-4. Acceptance rate 28.1% (48/171).
  5. Chao Wang, Frank Mueller, Christian Engelmann, and Stephen L. Scott. Hybrid Checkpointing for MPI Jobs in HPC Environments. In Proceedings of the 16th IEEE International Conference on Parallel and Distributed Systems (ICPADS) 2010, pages 524-533, Shanghai, China, December 8-10, 2010. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 978-0-7695-4307-9. Acceptance rate 29.6% (77/188).
  6. Min Li, Sudharshan S. Vazhkudai, Ali R. Butt, Fei Meng, Xiaosong Ma, Youngjae Kim, Christian Engelmann, and Galen Shipman. Functional Partitioning to Optimize End-to-End Performance on Many-Core Architectures. In Proceedings of the 23rd IEEE/ACM International Conference on High Performance Computing, Networking, Storage and Analysis (SC) 2010, pages 1-12, New Orleans, LA, USA, November 13-19, 2010. ACM Press, New York, NY, USA. ISBN 978-1-4244-7559-9. Acceptance rate 19.8% (50/253).
