A world-class academic IT service provider and supercomputing pioneer, the Leibniz Supercomputing Centre (Leibniz-Rechenzentrum, or LRZ) recently began work on a brand-new supercomputing system, soon to be one of the most powerful computers in the world. Built with a combination of technologies from Intel and Lenovo, and running on SUSE Linux Enterprise High Performance Computing, the newly built SuperMUC-NG is set to advance scientific research at a global level.

Overview

The Leibniz Supercomputing Centre (Leibniz-Rechenzentrum, or LRZ) is located on the campus in Garching, near Munich, Germany. LRZ is the IT service provider for all universities in the Munich area, as well as a growing number of research organizations throughout Bavaria. The organization supports ground-breaking research and education across a wide range of scientific disciplines by offering highly available, secure and energy-efficient services based on cutting-edge IT technology. LRZ also plays an important role as a member of the Gauss Centre for Supercomputing (GCS), delivering top-tier HPC services at the national and European level. LRZ is an institute of the Bavarian Academy of Sciences and Humanities.

The Challenge

Big data could hold the key to understanding the origin of the universe, the composition of matter itself and many more huge questions that have intrigued academics for centuries. To advance the frontiers of knowledge with ground-breaking research, scientists require access to high-performance computing environments that enable them to process vast amounts of complex data quickly and efficiently.

The Leibniz Supercomputing Centre (Leibniz-Rechenzentrum, or LRZ) is a world leader in its field, providing HPC and data centre resources for scientific research across the region of Bavaria. On top of this, LRZ is one of the three key players in the Gauss Centre for Supercomputing (GCS), Germany’s foremost supercomputing institution, which is committed to creating a consolidated HPC infrastructure that can be applied to a broad range of scientific and industrial research projects.

Until recently, the cutting-edge SuperMUC Petascale System was at the heart of LRZ’s operations. With more than 241,000 cores and a combined peak performance of more than 6.8 petaflops, it was one of the fastest supercomputers in the world, relying in part on SUSE Linux Enterprise.

Dr. Herbert Huber, Department Head of High Performance Systems at LRZ, explained: “We have used SUSE® solutions at LRZ for over two decades now. For us, the overriding benefit of their solutions is the compatibility they offer. We first implemented SUSE Linux Enterprise Server on our very first general-purpose Linux cluster system in the late 1990s, because we deemed it to be one of the best operating systems for performing standard HPC workflows: a quality which SUSE has reliably upheld over the years.

“On the supercomputing level, we first deployed SUSE Linux Enterprise High Performance Computing in 2006, due to its support for very large shared memory nodes, as well as its seamless interoperation with the HPC software stack and many commercial application software packages. The seamless compatibility of SUSE Linux Enterprise Server with our HPC software stack was and is a key factor in our decision to employ the SUSE operating system on our HPC systems.”

After seven years of top-flight technical research, LRZ determined that an upgrade to its HPC infrastructure was in order.

“In 2016, we began planning the SuperMUC Next Generation or SuperMUC-NG for short,” said Dr. Huber. “We were very satisfied with SUSE’s support for the existing SuperMUC system, but we knew that now was the time to significantly increase both the performance and energy efficiency of LRZ’s supercomputer.”

The creation of a new supercomputing system required an EU-wide public procurement process.

“We set up a competitive dialogue with two dialogue phases as the procurement process for SuperMUC-NG,” said Dr. Huber. “This eventually led us to our current HPC solution.”

SUSE Solution

Having undergone a rigorous procurement process for the SuperMUC-NG project, LRZ selected a joint hardware solution from Intel and Lenovo alongside SUSE. The SuperMUC-NG consists of ten ‘islands’ of computing nodes, with a combined total of 311,040 computing cores and a peak performance of 26.9 petaflops.

“The SuperMUC-NG is an evolutionary step up from its predecessor,” said Dr. Huber. “It consists of eight ‘thin’ node compute islands, one ‘fat’ node compute island, and one Input/Output island. Each island is connected via a non-blocking Omni-Path-1 network and the inter-island network connectivity has a blocking factor of 4.”

“In total, the compute islands contain 6,336 ‘thin’ nodes (with 96 gigabytes of RAM each) and 144 ‘fat’ nodes (with 768 gigabytes of RAM each). Each node contains two Skylake 8147 processors, each with 24 physical processing cores. The SuperMUC-NG has a total storage capacity of 70 petabytes, with a bandwidth of 500 gigabytes per second for its parallel file system. It represents a real advancement in our computing capacities.”
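
The core total quoted for SuperMUC-NG can be cross-checked against the node figures given here. The following is a minimal sketch, assuming that the 6,336 thin nodes and 144 fat nodes are system-wide totals and that each of the two processors per node has 24 physical cores. The constant and function names are illustrative and do not come from LRZ.

```python
# Cross-check of the SuperMUC-NG figures quoted in this article.
# Assumptions: node counts are system-wide totals, and each node has
# two processors with 24 physical cores each.
THIN_NODES = 6_336
FAT_NODES = 144
PROCESSORS_PER_NODE = 2
CORES_PER_PROCESSOR = 24
THIN_RAM_GB = 96
FAT_RAM_GB = 768


def total_cores() -> int:
    """Total physical cores across all thin and fat nodes."""
    nodes = THIN_NODES + FAT_NODES
    return nodes * PROCESSORS_PER_NODE * CORES_PER_PROCESSOR


def total_ram_gb() -> int:
    """Aggregate main memory in gigabytes across all compute nodes."""
    return THIN_NODES * THIN_RAM_GB + FAT_NODES * FAT_RAM_GB


if __name__ == "__main__":
    print(total_cores())   # 311040, matching the quoted 311,040 cores
    print(total_ram_gb())  # 718848 GB, roughly 719 terabytes
```

Under these assumptions the node counts reproduce the stated 311,040 cores exactly. The aggregate memory figure is derived here and is not quoted in the article.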

Intel, along with Lenovo and SUSE, assists LRZ in the long-term maintenance and support of the supercomputer, while also facilitating access to LRZ’s cloud computing service, which enables users to visualize and manage their large-scale data.

“The SuperMUC-NG is a significant achievement,” said Dr. Huber. “The broad compatibility of the SUSE operating system, our longstanding relationship with the company, and our own experience with the platform gave us real confidence that it would work well with the new hardware. We really appreciated the willingness of the companies Intel, Lenovo and SUSE to work together for this project.”

SuperMUC-NG will also allow LRZ to take part in a significant advance within the GCS, marking a major milestone in German supercomputing.

“In the academic and industrial research communities, a major new focus area is the efficient handling and processing of large-scale data produced by scientific simulations,” said Dr. Huber. “Some of this data requires long-term effort to evaluate and extract valuable results, of a kind which is only really viable at a very high level of supercomputing power. With the new SuperMUC-NG system, we will be able to link up with our fellow GCS members to process simulation data in ways not feasible for any one of us individually. We will also be able to collaborate even more closely to establish workflows that improve support for user, project, and data management, and explore new methods for data processing based on AI and machine learning. These new methods are just one part of our much broader research strategy here at LRZ, all of which is underpinned by the SuperMUC-NG.”

The Results

LRZ plans to begin the full rollout of the SuperMUC-NG in early 2019, but the organization has already begun to realize the potential of many ground-breaking scientific projects.

“We are currently in the planning phases for a number of scientific projects which make full use of the SuperMUC-NG’s expanded capabilities,” said Dr. Huber. “For instance, we are planning to work with the Bavarian Health Ministry to help monitor the levels of pollen in the air, and model future predictions to help hay fever sufferers. SuperMUC-NG will also allow more precise simulations of blood flow in aneurysms, of the airflow in the human respiratory tract, and of the effects of individual medicines upon patients.

“On top of all this, SuperMUC-NG enables us to gain a deeper insight into the origin and evolution of stars and galaxies and will help scientists to develop a richer understanding of the complex mathematical structure of matter itself. These hugely ambitious and innovative projects will only be technically feasible with the very high level of performance that SuperMUC-NG offers. The new storage capabilities of SuperMUC-NG empower our researchers to maintain important data sets across multiple generations of computational systems, and the SUSE operating system is scalable enough to accommodate any future hardware expansions.”

Once the SuperMUC-NG system has been fully rolled out, LRZ expects substantial improvements in system efficiency. The organization has already been officially commended for its work in the supercomputing field.

“Thanks to the improvements in system hardware, as well as our development of an energy-aware scheduling feature alongside Lenovo, we expect to see energy cost savings of around 35% over the lifetime of the SuperMUC-NG. That represents a significant efficiency saving, especially when operating at this scale. In fact, we recently received an official recognition from the Bavarian Ministry of Science for the scientific innovation that the SuperMUC-NG supports, and the SuperMUC-NG also received the HPCwire Editors’ Choice Award for Energy-Efficient HPC.”

In the future, LRZ plans to expand its hardware capacity still further, and take greater advantage of cloud storage technology.

“Our future plans involve even greater collaboration with GCS and further development of cloud computing resources to aid with storage and processing of scientific data,” said Dr. Huber. “We also have plans to expand our high-performance hardware landscape once we secure the budget to do so. We are even in the concept stages of building innovative new hardware solutions, including quantum computing and exascale systems, which we hope to implement within the next decade.”

Dr. Huber concluded: “Our chief purpose has always been to serve the scientific community, and with SuperMUC-NG and the ongoing support of SUSE, Intel and Lenovo, we can continue to facilitate the scientific breakthroughs which help drive human knowledge forward.”