Mar. 29, 2019 Perspectives Computing / Math

To exascale and beyond

The post-K supercomputer will be first off the rack in the exascale era, but will have to manage a post-Moore's law world, says Satoshi Matsuoka.

Illustration of a supercomputer. © lvcandy/Getty Images

In the early 2020s, when RIKEN switches on its post-K supercomputer, humanity will enter the exascale era. Post-K will likely be the fastest computer in the world, the first of a new generation of supercomputers operating at computational speeds in the region of exaflops, or one billion billion floating-point operations per second (FLOPS).

Developed at the RIKEN Center for Computational Science (R-CCS), post-K will be two orders of magnitude faster than its predecessor, the 10-petaflop-scale K computer. Post-K will also likely be the first in a wave of exascale computers from Japan, China, the United States and Europe expected to become operational in the early 2020s. These machines will be able to tackle real-world problems that exceed the capabilities of the K computer and its petascale cohort.
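
The jump in scale can be checked with a few lines of arithmetic. The sketch below takes only the 10-petaflop figure for the K computer and the "two orders of magnitude" factor from the text; the constant and function names are illustrative.

```python
# Unit prefixes expressed as floating-point operations per second.
PETA = 10**15
EXA = 10**18

k_computer_flops = 10 * PETA            # the "10-petaflop-scale" K computer
post_k_flops = k_computer_flops * 100   # two orders of magnitude faster


def in_exaflops(flops: int) -> float:
    """Convert a raw FLOPS figure to exaflops."""
    return flops / EXA


print(in_exaflops(post_k_flops))  # 1.0
```

In other words, a hundredfold increase over a 10-petaflop machine lands at one exaflop, which is where the term exascale comes from.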

For the first time, for example, we will be able to accurately model whole-city responses to natural disasters such as earthquakes¹. This has been difficult because there is no single scenario for planning an earthquake response; earthquakes can occur at different magnitudes and epicenters, so we have to run hundreds or thousands of scenarios to come up with realistic evacuation plans. It only becomes possible with exascale performance².

We also believe exascale computing will bring substantial breakthroughs in drug design. We will finally be able to accurately model drug interactions with whole cells rather than single proteins, and perform whole genome analyses to identify the best drugs for individual patients.

In fact, we expect that exascale computing will have applications ranging from improved climate modeling³ to elucidating the fundamental laws of the Universe. This will entail an era that I have dubbed the ‘Cambrian explosion of computing’ in which computing types will diversify immensely, a topic to which I will return.

End of Moore’s Law built into Post-K

Although post-K will be the first of the new wave of exascale machines, it will also arguably be the most advanced.

There are several strategies that contribute to what we think is a superior exascale computing design. For example, from the project’s inception, ‘co-design’ has been key to post-K’s creation. We have been actively engaging with the machine’s prospective users to anticipate their future high-performance computing needs.

To further maximize real-world usability, post-K adopted and extended the ARM instruction set architecture. ARM chips use small sets of simple, general instructions and already run many kinds of computers. ARM is the most popular instruction set architecture in the world and billions of ARM chips are produced annually, ranging from very small ones in embedded controllers, to cell phone chips, to very large ones in Internet servers and supercomputers, with a vast software portfolio to match. Post-K will sit at the pinnacle of this ARM ecosystem, offering performance that we believe will best all other general central processing units in the world⁴.

But the most crucial aspect of post-K’s advantage is that it has been built with the looming end of Moore’s law in mind.

Since the mid-1970s, computer-chip manufacturers have been able to double the density of transistors on a silicon chip every two years. This trend, known as Moore’s law, has driven extraordinary advances in computing. Compute speeds have increased seven orders of magnitude in the last 35 years.
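
The two growth figures in this paragraph can be compared directly. The sketch below is an illustration only: it assumes clean exponential growth, and the function names are hypothetical. It asks how much growth a two-year doubling period gives over 35 years, and what doubling period would be needed to reach seven orders of magnitude in the same span.

```python
import math


def growth_factor(years: float, doubling_period_years: float) -> float:
    """Multiplicative growth after `years` if a quantity doubles every period."""
    return 2.0 ** (years / doubling_period_years)


def implied_doubling_period(years: float, orders_of_magnitude: float) -> float:
    """Doubling period that yields 10**orders_of_magnitude growth over `years`."""
    doublings = orders_of_magnitude * math.log2(10)
    return years / doublings


density_gain = growth_factor(35, 2)            # transistor density alone
speed_period = implied_doubling_period(35, 7)  # seven orders in 35 years

print(f"density gain over 35 years: about 10^{math.log10(density_gain):.1f}")
print(f"implied doubling period for compute speed: {speed_period:.2f} years")
```

Under these assumptions, density doubling every two years accounts for a little over five orders of magnitude, while seven orders in 35 years corresponds to doubling roughly every year and a half. The text does not break down where the difference comes from.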

Today, Moore’s law is tailing off. Transistors are now so small we are approaching a fundamental physical size limit. Taking the last steps toward that limit is becoming increasingly difficult. The technology company Intel, which makes the microprocessors found in most personal computers, usually transitions to a smaller transistor size every two years, but the transition from its current 14-nanometer transistor size to 7 or 10 nanometers has taken significantly longer. The vast expense and technological challenges are forcing some semiconductor manufacturers out of the game.

Moving on to nanometer and beyond will require changes in fabrication technologies. Most people anticipate Moore’s law will end sometime between 2025 and 2030, before we reach the physical limits of transistors, because we can no longer afford to produce smaller transistors.

Post-K has been designed cognizant of the tailing off of Moore’s law. Rather than emphasizing speed via FLOPS, post-K will therefore focus on bandwidth, the ability to move data more quickly between transistors and memory⁵. We have concluded that most applications are already limited by bandwidth rather than FLOPS, so with post-K we are investing more into the bandwidth design to maintain a ‘system balance’ between compute speed and data transfer.
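
One common way to reason about this kind of "system balance" is the roofline model, which is not named in the text but captures the same idea: a kernel's attainable speed is capped either by peak arithmetic or by how fast data arrives from memory. The sketch below uses hypothetical machine and kernel numbers chosen purely for illustration; none of them are post-K specifications.

```python
def attainable_flops(peak_flops: float, bandwidth_bytes_per_s: float,
                     intensity_flops_per_byte: float) -> float:
    """Roofline bound: performance is capped by compute or by memory traffic."""
    return min(peak_flops, bandwidth_bytes_per_s * intensity_flops_per_byte)


def is_bandwidth_bound(peak_flops: float, bandwidth_bytes_per_s: float,
                       intensity_flops_per_byte: float) -> bool:
    """True when memory traffic, not arithmetic, sets the limit."""
    return bandwidth_bytes_per_s * intensity_flops_per_byte < peak_flops


# Hypothetical node: 3 TFLOPS peak, 1 TB/s memory bandwidth (illustrative only).
PEAK = 3.0e12
BANDWIDTH = 1.0e12

# Hypothetical kernels with assumed arithmetic intensity in FLOPs per byte moved.
kernels = {
    "stencil update": 0.25,
    "sparse matrix-vector product": 0.17,
    "dense matrix multiply": 30.0,
}

for name, intensity in kernels.items():
    bound = attainable_flops(PEAK, BANDWIDTH, intensity)
    limit = "bandwidth" if is_bandwidth_bound(PEAK, BANDWIDTH, intensity) else "compute"
    print(f"{name}: {bound / 1e12:.2f} TFLOPS attainable, {limit}-bound")
```

With these made-up numbers, the low-intensity kernels reach only a fraction of peak no matter how many FLOPS the chip offers, so raising bandwidth helps them far more than raising peak compute would.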

A Cambrian Explosion of computing

Global efforts to address the end of Moore’s law have taken off in the last three or four years, as it has become evident that we are starting to experience problems that industry will not be able to address by itself. If we are to prevent supercomputer development from stagnating—and supercomputer-reliant scientific research with it—it’s time to act.

At RIKEN, R-CCS-led research teams will explore two new directions, as part of four projects recently or soon to be launched.

The first project, an extension of post-K’s expanded-bandwidth concept, looks at how to boost conventional computing using parameters other than extra transistors. We’re looking into new technologies to move data around faster between transistors and memory—from new devices and architecture, to new algorithms and software to exploit new hardware. We can expect to increase performance from these innovations, and to keep seeing gains⁶. In October 2018, we launched a program looking at future processors as part of this project, with a 2026/7 time frame.

Another broad area we are delving into is to move beyond conventional computing and explore alternative compute models.

Two promising fields are artificial intelligence and machine learning. Traditionally, supercomputer simulation science involves first-principles physics and solving partial differential equations (PDEs), but that’s not the only way to model and understand the world. Pour water in a jar and shake it, and a 3-year-old will be able to tell you what the jar contains. The child is not solving PDEs; she is recognizing empirically that the liquid moves like water.

A computer can use artificial intelligence and deep learning to do the same thing. A good model can analyze a sequence of snapshots to anticipate what will happen next, extrapolating the next state. This approach can dramatically reduce the computational time and energy required, sometimes by four orders of magnitude. Machine learning and empirical modeling are behind the recent progress in self-driving cars, for example.
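
The idea of extrapolating the next state from snapshots can be shown with a deliberately tiny stand-in. The sketch below is not deep learning: it assumes a simple linear model, x[t+1] ≈ a·x[t] + b·x[t−1], fitted by least squares to synthetic snapshots of an oscillating signal. The function names and the data are hypothetical.

```python
import math


def fit_two_step_model(series):
    """Least-squares fit of x[t+1] ~ a*x[t] + b*x[t-1] from observed snapshots."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for prev, cur, nxt in zip(series, series[1:], series[2:]):
        s11 += cur * cur
        s12 += cur * prev
        s22 += prev * prev
        r1 += cur * nxt
        r2 += prev * nxt
    # Solve the 2x2 normal equations; det is assumed non-zero for this data.
    det = s11 * s22 - s12 * s12
    a = (r1 * s22 - r2 * s12) / det
    b = (r2 * s11 - r1 * s12) / det
    return a, b


def predict_next(series, a, b):
    """Extrapolate one step ahead from the last two snapshots."""
    return a * series[-1] + b * series[-2]


# Synthetic snapshots of an oscillation, standing in for observed data.
snapshots = [math.cos(0.3 * t) for t in range(20)]

a, b = fit_two_step_model(snapshots)
prediction = predict_next(snapshots, a, b)
truth = math.cos(0.3 * 20)

print(f"learned a={a:.4f}, b={b:.4f}")
print(f"predicted next state {prediction:.6f}, actual {truth:.6f}")
```

The model never sees the equation that generated the data, yet it predicts the next snapshot from past ones. Real empirical models for simulation are far larger and nonlinear, but the principle of learning a step-forward rule from observations is the same.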

For certain research applications, empirical approaches could reasonably replace first-principles simulations. The second project of the four is a deep learning project to investigate how to accelerate machine learning, in particular deep learning, on post-K and beyond.

Other alternatives to conventional compute models involve a much more significant rethink of supercomputer hardware. Quantum computing is one area of active research; neuromorphic computing, inspired by neuroscience, is another.

In a neuromorphic computer, each compute unit connects to multiple others, across circuits that evolve over time as the machine learns, like neural circuits in the brain. We think this type of computing could be combined effectively with empirical simulation science and have started a third project on this that will involve international workshops on neuromorphic computing applications.

The fourth project will examine how to combine some of these ideas. For example, the solution to some problems may suit the neuromorphic approach, while others are solved with traditional PDE solvers, but augmented with assumptions and massive bandwidths. It’s important to note that as supercomputing approaches become more varied, we will require machines that are equally heterogeneous, which presents yet another puzzle that we will also examine.

We sometimes call this heterogeneous era the ‘Cambrian explosion of computing’⁷, after the dramatic diversification of life forms that characterized the Cambrian period of life on Earth. Ultimately, as in nature, Darwinian processes may kick in and we may winnow down to the most viable of these diverse designs. But right now, as we search for solutions to the end of Moore’s law, it is the time to be really diverse.

Priority issues for the R-CCS

1. Innovative drug discovery infrastructure through functional control of biomolecular systems
2. Integrated computational life science to support personalized and preventive medicine
3. Development of integrated simulation systems for hazards and disasters induced by earthquakes and tsunamis
4. Advancement of meteorological and global environmental predictions utilizing observational ‘big data’
5. Development of new fundamental technologies for high-efficiency energy creation, conversion/storage and use
6. Accelerated development of innovative clean-energy systems
7. Creation of new functional devices and high-performance materials to support next-generation industries
8. Development of innovative design and production processes that lead the way for the manufacturing industry in the near future
9. Elucidation of the fundamental laws and evolution of the Universe

References

1. Ichimura, T., Fujita, K., Yamaguchi, T., Naruse, A., Wells, J. C., Schulthess, T. C., Straatsma, T. P., Zimmer, C. J., Martinasso, M., Nakajima, K. et al. A fast scalable implicit solver for nonlinear time-evolution earthquake city problem on low-ordered unstructured finite elements with artificial intelligence and transprecision computing. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '18) 49 (2018).
2. Matsuoka, S., Sato, H., Tatebe, O., Koibuchi, M., Fujiwara, I., Suzuki, S., Kakuta, M., Ishida, T., Akiyama, Y., Suzumura, T., et al. Extreme big data (EBD): Next generation big data infrastructure technologies towards Yottabyte/Year. Supercomputing Frontiers and Innovations 1(2), 89–107 (2014). doi: 10.14529/jsfi140206
3. Miyoshi, T., Kondo, K. & Terasaki, K. Big ensemble data assimilation in numerical weather prediction. Computer 48(11), 15–21 (2015). doi: 10.1109/MC.2015.332
4. Kodama, Y., Odajima, T. & Sato, M. Preliminary performance evaluation of application kernels using ARM SVE with multiple vector lengths. 2017 IEEE International Conference on Cluster Computing 677–684 (2017). doi: 10.1109/CLUSTER.2017.93
5. Ajima, Y., Kawashima, T., Okamoto, T., Shida, N., Hirai, K., Shimizu, T., Hiramoto, S., Ikeda, Y., Yoshikawa, T. et al. The Tofu Interconnect D. 2018 IEEE International Conference on Cluster Computing (CLUSTER) (2018). doi: 10.1109/CLUSTER.2018.00090
6. Matsuoka, S., Amano, H., Nakajima, K., Inoue, K., Kudoh, T., Maruyama, N., Taura, K., Iwashita, T., Katagiri, T., Hanawa, T. et al. From FLOPS to BYTES: disruptive change in high-performance computing towards the post-moore era. CF '16 Proceedings of the ACM International Conference on Computing Frontiers 274–281 (2016). doi: 10.1145/2903150.2906830
7. Matsuoka, S. Cambrian explosion of computing and big data in the post-moore era. HPDC '18 Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing 105 (2018). doi: 10.1145/3208040.3225055

About the Researcher

Satoshi Matsuoka, Director of the RIKEN R-CCS

Satoshi Matsuoka became director of the RIKEN Center for Computational Science (R-CCS) in 2018. The R-CCS conducts high-performance computing (HPC) research, hosts the K computer and is developing an ARM-based ‘exascale’ supercomputer, the post-K machine. Previously, Matsuoka led the development of the TSUBAME series of supercomputers at the Tokyo Institute of Technology, where he still holds a professorship. He won the ACM Gordon Bell Prize in 2011 and the IEEE Sidney Fernbach Award in 2014, which are among the most prestigious awards in his field. Matsuoka was Program Chair of the IEEE/ACM Supercomputing 2013 conference, probably the world’s most prominent HPC conference, and was the ACM Gordon Bell Prize selection chair in 2018.