SUSE at SC23 – Empowering customers to build HPC and AI solutions “their way”, with a little help from our friends
As the website (https://sc23.supercomputing.org/) states, SC23 is “The 2023 International Conference for High Performance Computing, Networking, Storage and Analysis”. The conference description gives us a glimpse of the challenges associated with HPC and AI today. The hardware building blocks are a combination of processors, network and graphical accelerators, and high-speed storage. To “glue together” the hardware stack, a software-defined, fully enabled software infrastructure layer is needed.
This blog describes several products SUSE provides to build HPC and AI solutions and how we collaborate with our Silicon alliances to give our customers broader choices when deploying their own solutions.
Building blocks for HPC and AI.
The building blocks available to build HPC and AI infrastructures today are more varied than ever.
On the hardware infrastructure side:
Whereas in the past the major Silicon providers had their respective product “swim lanes” (e.g., CPUs, GPUs, networking), customers today face a growing set of options from the major providers (Arm, AMD, and Intel®, to name a few) through multiple delivery options (Edge, Data Center, or Cloud). Some products are performance “thoroughbreds” while others aim at “performance per watt” (power efficiency).
Server, storage, and networking equipment designers and manufacturers need to design new platforms that take advantage of the new Silicon pipeline. These designers need to address two competing demands: on one hand, building something ‘unique and differentiated’; on the other, achieving manufacturing economies of scale (and cost savings) by building a base platform capable of supporting different processor and accelerator options from the Silicon providers.
On the software infrastructure side:
There is a continued need for what I call ‘traditional’ high-performance computing (HPC), with applications running across multiple bare-metal nodes. The existing HPC application ecosystem continues to grow and flourish. Operating System platforms supporting these environments need to be able to recognize and use newer processors, GPUs, and SmartNICs. In some instances, the Silicon provider delivers new features ‘in-code’ and the Operating System provider needs to be able to consume and support these features in a timely fashion.
Artificial Intelligence workloads such as Machine Learning (ML), while available through multiple delivery options, tend to be consumed in cloud-native forms such as containers and can be managed via orchestration platforms such as Kubernetes.
And let’s not forget delivery options: Customers may be deploying their solution in the cloud, in a traditional HPC-type data center, or perhaps as an edge platform performing AI workloads.
Putting it all together – The value of partnerships:
Given all the options across hardware infrastructure, software infrastructure, and delivery, it can be somewhat daunting to put together the right HPC and/or AI solution, one that meets or exceeds requirements while reducing deployment and usage risks and costs.
It’s the relationships between Silicon designers and providers, the platform designers and manufacturers, and software infrastructure providers like SUSE that give customers options.
Through ongoing collaboration, new Silicon technologies are tested by companies like SUSE at the software level and then certified as part of hardware or cloud-based offerings in conjunction with independent hardware vendors (IHVs) and cloud service providers (CSPs), respectively. The results of testing and certification efforts provide customers with peace of mind and the ability to secure support for their deployed configurations.
SUSE’s efforts during the last twelve months:
When it comes to HPC and AI, SUSE has been busy for the last twelve months. Some of our accomplishments with the different Silicon alliances and IHV partners include (but are not limited to):
New performance tuning guide for SUSE Linux Enterprise Server for AMD EPYC™ processors.
AMD’s GPU Device Plugin is now available for easy consumption via the Rancher Marketplace.
SLES for Arm is available for Ampere Altra and Altra Max via CSPs or IHV platforms such as Hewlett Packard Enterprise’s RL300 Gen11 server.
Growing support for more components of the SUSE Rancher ecosystem on Arm64. Both RKE2 (experimental) and K3s (production) are available for the aarch64 architecture.
We continue to test and certify Intel® Xeon™ processors and associated upstream enablement as part of our SUSE Linux Enterprise Server development process. SUSE is also supporting HPE’s efforts in the upcoming Aurora supercomputer.
Intel-optimized plugins are now available via the Rancher Marketplace for customers looking to adopt Intel hardware accelerators into their cloud-native AI ecosystem.
Wrap-up: Working collaborations empowering customers with choice.
SUSE’s approach to customers is simple, yet valuable: We work with our partners to provide you with a broad set of options. We want you, the customer, to have choices based on open-source technologies that meet or exceed your requirements.
When it comes to HPC and AI, our engagement with Silicon partners is key. Having the fundamental building blocks fully enabled and supported empowers the rest of the ecosystem (IHVs, CSPs) to develop solutions based on SUSE and other open-source technologies.
Stop by our booth at SC23 and learn more about our products, solutions, and approaches to HPC and AI, whether at the edge, in the data center, or in the cloud. And allow us to learn from you by telling us what you would like to see from us in the future.