RIKEN Center for Computational Science Supercomputing Performance Research Team

Team Principal: Jens Domke (Ph.D.)

Research Summary

The complexity of modern supercomputers is steadily increasing. Previously, we were able to ride the wave of persistent transistor shrinking, as observed by G. Moore, and hence could focus on finding technological solutions for the ever-growing need for supercomputing performance. Nowadays, however, utilizing these machines effectively and efficiently is becoming ever more challenging.

To tackle these challenges, and to provide our HPC users with the best and fastest scientific instrument for their modelling and simulation of real-world phenomena, our team is applying, researching, and developing state-of-the-art methodologies to analyze hardware options. We are implementing novel performance monitoring and analysis tools and are conducting detailed performance studies of HPC architectures and software subsystems. Our team's mission is to bring performance to the masses. With the right tools, automatic performance tuning frameworks, and appropriate co-design, we can enhance the user experience on Fugaku and design the next Japanese flagship supercomputers to meet the needs of our domain experts and researchers, without requiring them to hold an advanced degree in computer science.

Main Research Fields

Informatics

Related Research Fields

Interdisciplinary Science & Engineering

Keywords

Performance Modelling and Predictions
Hardware/Software Co-Design for HPC
Architecture and Application Evaluations
Instrumentation and Monitoring Tools
Auto-Tuning and Portability

Selected Publications

Papers marked with an asterisk (*) are based on research conducted outside of RIKEN.

1. T.N. Truong, F. Trahay, J. Domke, A. Drozd, E. Vatai, J. Liao, M. Wahib, B. Gerofi,
"Why Globally Re-shuffle? Revisiting Data Shuffling in Large Scale Deep Learning,"
in Proceedings of the 36th IEEE International Parallel & Distributed Processing Symposium (IPDPS), (Lyon, France), IEEE Computer Society, May 2022.
2. J. Domke, E. Vatai, A. Drozd, P. Chen, Y. Oyama, L. Zhang, S. Salaria, D. Mukunoki, A. Podobas, M. Wahib, S. Matsuoka,
"Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws?,"
in Proceedings of the 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS), (Portland, Oregon, USA), IEEE Computer Society, May 2021.
3. M. Besta, J. Domke, M. Schneider, M. Konieczny, S.D. Girolamo, T. Schneider, A. Singla, T. Hoefler,
"High-Performance Routing with Multipathing and Path Diversity in Supercomputers and Data Centers,"
IEEE Transactions on Parallel and Distributed Systems, vol. 32, no. 4, pp. 943-959, 2021.
4. M. Wahib, H. Zhang, T.T. Nguyen, A. Drozd, J. Domke, L. Zhang, R. Takano, S. Matsuoka,
"Scaling Distributed Deep Learning Workloads beyond the Memory Capacity with KARMA,"
in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC’20, (Piscataway, NJ, USA), IEEE Press, Nov. 2020.
5. J. Domke, S. Matsuoka, I.R. Ivanov, Y. Tsushima, T. Yuki, A. Nomura, S. Miura, N. McDonald, D.L. Floyd, N. Dube,
"HyperX Topology: First at-scale Implementation and Comparison to the Fat-Tree,"
in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC’19, (Piscataway, NJ, USA), IEEE Press, Nov. 2019.
6. *J. Domke, K. Matsumura, M. Wahib, H. Zhang, K. Yashima, T. Tsuchikawa, Y. Tsuji, A. Podobas, S. Matsuoka,
"Double-precision FPUs in High-Performance Computing: an Embarrassment of Riches?,"
in Proceedings of the 33rd IEEE International Parallel & Distributed Processing Symposium (IPDPS), (Rio de Janeiro, Brazil), IEEE Computer Society, May 2019.
7. *S. Smith, C. Cromey, D.K. Lowenthal, J. Domke, N. Jain, J.J. Thiagarajan, A. Bhatele,
"Mitigating Inter-Job Interference Using Adaptive Flow-Aware Routing,"
in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC’18, (Piscataway, NJ, USA), IEEE Press, Nov. 2018.
8. *J. Domke and T. Hoefler,
"Scheduling-Aware Routing for Supercomputers,"
in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC’16, (Piscataway, NJ, USA), pp. 13:1-13:12, IEEE Press, 2016.
9. *J. Domke, T. Hoefler, and S. Matsuoka,
"Routing on the Dependency Graph: A New Approach to Deadlock-Free High-Performance Routing,"
in Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, HPDC ’16, (New York, NY, USA), pp. 3-14, ACM, 2016.
10. *J. Domke, T. Hoefler, and S. Matsuoka,
"Fail-in-place Network Design: Interaction Between Topology, Routing Algorithm and Failures,"
in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC’14, (Piscataway, NJ, USA), pp. 597-608, IEEE Press, 2014.

Lab Members

Principal investigator

Jens Domke: Team Principal

Core members

Seydou Ba: Research Scientist
Theresa Pollinger: Postdoctoral Researcher
Francesco Antici: Postdoctoral Researcher
Ivan Ivanov: Junior Research Associate

Careers

Position	Deadline
Seeking Research Scientists or Postdoctoral Researchers (K24069)	Open until filled

Contact Information

RIKEN Center for Computational Science (R-CCS) R503
7-1-26, Minatojima-minami-machi,
Chuo-ku, Kobe, Hyogo
650-0047, Japan
Email: jens.domke@riken.jp