Elastic, Scalable, and Efficient High-Performance Computing With Ampere® Altra, Altra Max and SUSE Linux Enterprise
For those of us interested in high-performance computing, high-throughput computing or AI/ML, this is a special week. SC22 is finally here, and its message, “HPC Accelerates”, resonates more than ever.
A key SC22 message is “Leveraging HPC, skilled minds employ innovative technologies to respond to the call – driven by data, simulating possibilities, and unlocking new solutions”. In this blog, I would like to add another component: cost effectiveness measured in terms of efficiency, scalability, elasticity as well as overall sustainability.
This blog article shows how Ampere Computing's Altra® family of processors, combined with the SUSE software stack, can yield a modern, open source, supported environment capable of analyzing the growing volumes of data that today’s high-performance environments require, and of doing so in an elastic, scalable and efficient manner.
Elasticity, Scalability, and Efficiency with Ampere® Altra®
The folks at Ampere Computing have developed Altra® – a family of processors designed for current and future cloud-native applications. It combines power efficiency and high performance. In a brief titled “Cloud Native Computing – Why data center operators should care about 128-core processors”, Ampere Computing makes a compelling case for efficiency, scalability and elasticity, showcasing three key features of the Altra® Max processors:
- Single-threaded execution – The availability of 128 cores allows for consistent performance over time across the processor. Translation: Efficient use of resources for all consumers of the processor.
- Maximum number of cores – 128 power and area-efficient cores for cloud server and cloud-native environments. Great performance per core while reducing system power consumption. Translation: in .
- High-speed, low-latency private caches – Large private caches accelerate the performance of each core’s individual workload, avoiding conflict between users for the same resources. This is key for cloud-native workloads where nearly all processes are executed privately in each core.
In short, Ampere Computing delivered a processor built for the future where each workload or microservice is executed as a single thread in its own core. According to Ampere, “it’s also more power efficient to deliver maximum throughput via balanced system performance across many cores and many users than to allow certain users to consume an unfair share of resources in a power-hungry manner while throttling others.”
Now, how does this translate to high-performance computing? Ampere Computing and SUSE have been running a series of benchmarks including application workloads with WRF and GROMACS.
WRF (Weather Research and Forecasting Model) is a numerical weather prediction (NWP) system designed to serve both atmospheric research and operational forecasting needs. (Source: Wikipedia)
GROMACS is a molecular dynamics package mainly designed for simulations of proteins, lipids, and nucleic acids. GROMACS can run on CPUs and GPUs. It’s free, open-source software available under the LGPL license.
WRF and GROMACS were selected since they don’t rely significantly on storage and can provide a good measure of CPU performance.
Converting products into solutions the open-source way with SUSE
SUSE, now with Rancher, offers the industry’s most adaptable, enterprise Linux operating system and the only open Kubernetes management platform.
The combination of our SUSE Linux Enterprise distribution and the Rancher Kubernetes management platform allows for the delivery of traditional or cloud-native high-performance and high-throughput solutions across multiple instruction set architectures (ISAs) such as aarch64 and x86_64.
- SUSE Linux Enterprise High Performance Computing is a highly scalable, high-performance, open-source operating system designed to utilize the power of parallel computing for modeling, simulation, and advanced analytics workloads. The HPC module provides a supported set of popular HPC tools and utilities that make managing and monitoring parallel computing environments easier. This includes (but is not limited to):
- Workload manager.
- Remote and parallel shells.
- Performance monitoring and measuring tools.
- Serial console monitoring tool.
- Cluster power management tools.
- A tool for discovering the machine hardware topology.
- Systems monitoring, including the monitoring of memory errors.
- Serial and parallel computational libraries providing the common standards BLAS, LAPACK.
- Various MPI implementations.
- Serial and parallel libraries for the HDF5 file format.
- SUSE Rancher is an open-source container management platform that unifies Kubernetes clusters to ensure consistent operations, workload management, and enterprise-grade security from core to cloud to edge. Key capabilities include (but are not limited to):
- Supporting any CNCF-certified Kubernetes distribution. For on-premises, SUSE offers and K3s. We also support all the public cloud distributions, including EKS, AKS, and GKE.
- Simplified multi-cluster operations including provisioning, version management, visibility and diagnostics, monitoring and alerting, and centralized audit.
- Easy to adopt shared tools and services with SUSE Rancher’s rich catalog for building, deploying, and scaling containerized applications, including app packaging, CI/CD, logging, monitoring, and service mesh.
Benchmarks and results
Michael Bennett from Ampere Computing and Bryan Gartner from SUSE presented the results of Ampere Computing’s benchmark tests with SUSE Linux Enterprise for HPC. Their presentation is titled “HPC Performance and Efficiency of Ampere Altra and Altra Max Cloud-Native Processors” and should be available at the SC22 site shortly.
The presentation provides a glimpse into the hardware and software test configuration, benchmark results and key findings from Ampere Computing. In my opinion, one of the key highlights was that running WRF and GROMACS workloads on Ampere Altra Max resulted in more nanoseconds of performance per dollar spent (based on a cloud-based workload cost/performance model) and lower total and peak power consumption. This is an interesting finding, particularly for large HPC deployments (cloud or on-premises), since power, or more importantly the amount of compute available per watt, can leave you with ‘stranded space’ in your server rack: you can’t add more compute because you ran out of power.
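To make the performance-per-dollar and ‘stranded space’ ideas concrete, here is a minimal Python sketch of the arithmetic. It assumes the “nanoseconds” figure means simulated time per day, the way molecular dynamics packages commonly report throughput. The `Node` fields, the function names, the 42U rack and every number in the example are hypothetical placeholders for illustration; none of them come from the Ampere and SUSE presentation.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One server type. All values are illustrative, not measured results."""
    name: str
    ns_per_day: float        # simulated nanoseconds per day (assumed metric)
    dollars_per_hour: float  # hypothetical cloud price per instance-hour
    watts: float             # hypothetical peak power draw per server
    rack_units: int = 1      # height of the server in rack units (U)


def ns_per_dollar(node: Node) -> float:
    """Simulated nanoseconds obtained for each dollar of instance time."""
    return node.ns_per_day / (node.dollars_per_hour * 24)


def servers_per_rack(node: Node, rack_power_watts: float, rack_units: int = 42) -> int:
    """Servers that fit in a rack, limited by power or by space, whichever runs out first."""
    by_power = int(rack_power_watts // node.watts)
    by_space = rack_units // node.rack_units
    return min(by_power, by_space)


def stranded_units(node: Node, rack_power_watts: float, rack_units: int = 42) -> int:
    """Rack units left empty because the power budget is exhausted before the space is."""
    used = servers_per_rack(node, rack_power_watts, rack_units) * node.rack_units
    return rack_units - used


# Two made-up server types sharing a made-up 15 kW rack power budget.
node_a = Node("lower-power", ns_per_day=100.0, dollars_per_hour=2.0, watts=500.0)
node_b = Node("higher-power", ns_per_day=90.0, dollars_per_hour=2.5, watts=800.0)

for node in (node_a, node_b):
    print(
        f"{node.name}: {ns_per_dollar(node):.2f} ns/$, "
        f"{servers_per_rack(node, 15_000)} servers per rack, "
        f"{stranded_units(node, 15_000)}U stranded"
    )
```

With these placeholder inputs the power budget, not the rack height, decides how many servers fit, which is the situation the ‘stranded space’ remark describes. Substituting your own measured throughput, pricing and power figures gives a first-order comparison; it is not a replacement for the cost/performance model used in the presentation.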
If performance, power, and efficiency are key to you as you plan your HPC infrastructure, Ampere Altra Max should be on your short list of deployment options. And SUSE Linux Enterprise for HPC can provide you with a fully supported HPC stack for your Ampere Altra Max platform.
Summary and Call to Action
In summary, high-performance and high-throughput computing customers can look to SUSE and Ampere Computing to provide a high-performance, power-efficient HPC stack with key open source software ingredients and enterprise support.
For more information on Ampere Altra Max with SUSE Linux Enterprise for HPC, please review Ampere Computing and SUSE’s SC22 presentation. You can also download the SUSE HPC stack and take it for a spin.