The Leibniz Supercomputing Centre (Leibniz-Rechenzentrum, or LRZ) is located on the campus in Garching, near Munich, Germany. LRZ is the IT service provider for all universities in the Munich area, as well as a growing number of research organizations throughout Bavaria. The organization supports ground-breaking research and education across a wide range of scientific disciplines by offering highly available, secure and energy-efficient services based on cutting-edge IT technology. LRZ also plays an important role as a member of the Gauss Centre for Supercomputing (GCS), delivering top-tier high performance computing (HPC) services at the national and European level. LRZ is an institute of the Bavarian Academy of Sciences and Humanities.
A world-class academic IT service provider and supercomputing pioneer, the Leibniz Supercomputing Centre (Leibniz-Rechenzentrum, or LRZ) recently began work on a brand-new supercomputing system, soon to be one of the most powerful computers in the world. Built with a combination of technologies from Intel and Lenovo, and running on SUSE Linux Enterprise High Performance Computing (SLE HPC), the newly built SuperMUC-NG is set to advance scientific research at a global level.
Big data could hold the key to understanding the origin of the universe, the composition of matter itself and many more huge questions that have intrigued academics for centuries. To advance the frontiers of knowledge with groundbreaking research, scientists require access to HPC environments that enable them to process vast amounts of complex data quickly and efficiently.
LRZ is a world leader in its field, providing HPC and data center resources for scientific research across Bavaria. LRZ is also one of the three key players in the GCS, Germany’s foremost supercomputing institution, which is committed to creating a consolidated HPC infrastructure that can be applied to a broad range of scientific and industrial research projects.
Until recently, the cutting-edge SuperMUC Petascale System was at the heart of LRZ’s operations. With more than 241,000 cores and a combined peak performance of more than 6.8 petaflops, it was one of the fastest supercomputers in the world, relying in part on SLE.
Dr. Herbert Huber, department head of high performance systems at LRZ, explains: “We have used SUSE solutions at LRZ for over two decades now. For us, the overriding benefit of their solutions is the compatibility they offer. We first implemented SUSE Linux Enterprise Server (SLES) on our very first general-purpose Linux cluster system in the late 1990s, because we deemed it to be one of the best operating systems for performing standard HPC workflows: a quality which SUSE has reliably upheld over the years.
“On the supercomputing level, we first deployed SLE HPC in 2006, due to its support for very large, shared memory nodes, as well as its seamless interoperation with the HPC software stack and many commercial application software packages. The seamless compatibility of SLES with our HPC software stack was and is a key factor in our decision to employ the SUSE operating system on our HPC systems.”
After seven years of top-flight technical research, LRZ determined that an upgrade to its HPC infrastructure was in order. “In 2016, we began planning the SuperMUC Next Generation, or SuperMUC-NG for short,” says Dr. Huber. “We were very satisfied with SUSE’s support for the existing SuperMUC system, but we knew that now was the time to significantly increase both the performance and energy efficiency of LRZ’s supercomputer.”
The creation of a new supercomputing system required an EU-wide public procurement process.
“We set up a competitive dialogue with two dialogue phases as the procurement process for SuperMUC-NG,” says Dr. Huber. “This eventually led us to our current HPC solution.”
Having undergone a rigorous procurement process for the SuperMUC-NG project, LRZ selected a joint hardware solution from Intel and Lenovo alongside SUSE. The SuperMUC-NG consists of 10 ‘islands’ of computing nodes, with a combined total of 311,040 computing cores and a peak performance of 26.9 petaflops.
“The SuperMUC-NG is an evolutionary step up from its predecessor,” says Dr. Huber. “It consists of eight ‘thin’ node compute islands, one ‘fat’ node compute island, and one input/output island. Each island is connected via a non-blocking Omni-Path-1 network and the inter-island network connectivity has a blocking factor of 4.
“In total, the compute islands contain 6,336 ‘thin’ nodes (each with 96 gigabytes of RAM) and 144 ‘fat’ nodes (each with 768 gigabytes of RAM). Each node contains two Skylake 8174 processors, each with 24 physical processing cores. The SuperMUC-NG has a total storage capacity of 70 petabytes, with a bandwidth of 500 gigabytes per second for its parallel file system capabilities. It represents a real advancement in our computing capacities.”
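As a quick sanity check, the node figures quoted above can be tied back to the 311,040-core total given for the system. The short Python sketch below assumes the node counts are totals across all compute islands and that every node has two 24-core processors; the function and constant names are illustrative only and do not come from LRZ.

```python
# Figures quoted in the article; names are hypothetical, chosen for illustration.
THIN_NODES = 6_336        # 'thin' nodes across all compute islands (assumed total)
FAT_NODES = 144           # 'fat' nodes across all compute islands (assumed total)
THIN_RAM_GB = 96          # RAM per thin node, in gigabytes
FAT_RAM_GB = 768          # RAM per fat node, in gigabytes
SOCKETS_PER_NODE = 2      # two processors per node
CORES_PER_SOCKET = 24     # physical cores per processor


def total_cores(thin_nodes: int, fat_nodes: int, sockets: int, cores: int) -> int:
    """Return the number of physical cores across all compute nodes."""
    return (thin_nodes + fat_nodes) * sockets * cores


def total_ram_gb(thin_nodes: int, thin_ram: int, fat_nodes: int, fat_ram: int) -> int:
    """Return the aggregate node memory in gigabytes."""
    return thin_nodes * thin_ram + fat_nodes * fat_ram


cores = total_cores(THIN_NODES, FAT_NODES, SOCKETS_PER_NODE, CORES_PER_SOCKET)
ram_gb = total_ram_gb(THIN_NODES, THIN_RAM_GB, FAT_NODES, FAT_RAM_GB)

print(f"Compute nodes: {THIN_NODES + FAT_NODES:,}")  # 6,480
print(f"Physical cores: {cores:,}")                  # 311,040
print(f"Aggregate RAM: {ram_gb:,} GB")               # 718,848 GB
```

Under these assumptions the arithmetic reproduces the quoted core count exactly, and it puts the aggregate node memory at roughly 719 terabytes.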
Intel, along with Lenovo and SUSE, assists LRZ in the long term maintenance and support of the supercomputer, while also facilitating access to LRZ’s cloud computing service, which enables users to visualize and manage their large-scale data.
“The SuperMUC-NG is a significant achievement,” says Dr. Huber. “The broad compatibility of the SUSE operating system, our longstanding relationship with the company, and our own experience with the platform gave us real confidence that it would work well with the new hardware. We really appreciated the willingness of the companies Intel, Lenovo and SUSE to work together for this project.”
SuperMUC-NG will also allow LRZ to play its part in advancing the GCS, marking a milestone in German supercomputing.
“In the academic and industrial research communities, a major new focus area is the efficient handling and processing of large-scale data produced by scientific simulations,” says Dr. Huber. “Some of this data requires long-term effort to evaluate and extract valuable results, of a kind which is only really viable at a very high level of supercomputing power. With the new SuperMUC-NG system, we will be able to link up with our fellow GCS members to process simulation data in ways not feasible for any one of us individually. We will also be able to collaborate even more closely to establish workflows that improve support for user, project and data management, and explore new methods for data processing based on AI and machine learning. These new methods are just one part of our much broader research strategy here at LRZ, all of which is underpinned by the SuperMUC-NG.”
LRZ plans to begin the full rollout of the SuperMUC-NG in early 2019, but the organization has already begun to realize the potential of many ground-breaking scientific projects.
“We are currently in the planning phases for a number of scientific projects which make full use of the SuperMUC-NG’s expanded capabilities,” says Dr. Huber. “For instance, we are planning to work with the Bavarian Health Ministry to help monitor the levels of pollen in the air and model future predictions to help hay fever sufferers. SuperMUC-NG will also allow more precise simulations of blood flow in aneurysms, of the airflow in the human respiratory tract, and of the effects of individual medicines upon patients.
“On top of all this, SuperMUC-NG enables us to gain a deeper insight into the origin and evolution of stars and galaxies and will help scientists to develop a richer understanding of the complex mathematical structure of matter itself. These hugely ambitious and innovative projects will only be technically feasible with the very high level of performance that SuperMUC-NG offers. The new storage capabilities of SuperMUC-NG empower our researchers to maintain important data sets across multiple generations of computational systems, and the SUSE operating system is scalable enough to accommodate any future hardware expansions.”
Once the SuperMUC-NG system has been fully rolled out, LRZ expects substantial improvements in system efficiency. The organization has already been officially commended for its work in the supercomputing field.
“Thanks to the improvements in system hardware, as well as our development of an energy-aware scheduling feature alongside Lenovo, we expect to see energy cost savings of around 35% over the lifetime of the SuperMUC-NG. That represents a significant efficiency saving, especially when operating at this scale. In fact, we recently received an official recognition from the Bavarian Ministry of Science for the scientific innovation that the SuperMUC-NG supports, and the SuperMUC-NG also received the HPCwire Editors’ Choice Award for Energy-Efficient HPC.”
In the future, LRZ plans to expand its hardware capacity still further and take greater advantage of cloud storage technology.
“Our future plans involve even greater collaboration with GCS and further development of cloud computing resources to aid with storage and processing of scientific data,” says Dr. Huber. “We also have plans to expand our high-performance hardware landscape once we secure the budget to do so. We are even in the concept stages of building innovative new hardware solutions, including quantum computing and exascale systems, which we hope to implement within the next decade.”
Dr. Huber concludes: “Our chief purpose has always been to serve the scientific community, and with SuperMUC-NG and the ongoing support of SUSE, Intel and Lenovo, we can continue to facilitate the scientific breakthroughs which help drive human knowledge forward.”