The Pittsburgh Supercomputing Center (PSC) competes to support scientific research from fluid dynamics to climate modeling and genomics. PSC won a National Science Foundation grant with a SUSE Linux Enterprise Server-based SGI UV 1000 cache-coherent shared-memory system. The system now hosts 1,316 users and 373 research projects at universities across the United States with unparalleled ease of use for rapidly testing new ideas.
The U.S. National Science Foundation (NSF) periodically issues solicitations for solutions in its Extreme Science and Engineering Discovery Environment (XSEDE), a US$121-million project that integrates digital resources and services for universities and research centers across the United States. The NSF maintains a rigorous selection process for resource providers, screening for solutions that offer tremendous capabilities, maximum productivity, the ability to share knowledge, and the power to make XSEDE the most advanced and capable digital cyberinfrastructure in the world.
PSC has a long history of success, a highly regarded reputation and people who are respected throughout the industry. It proposed a unique shared-memory supercomputing system that would be much faster and more efficient than previous distributed-memory systems. PSC also had a long relationship with SGI and knew the supercomputer maker was the only provider that could deliver the shared-memory capabilities it was looking for.
PSC selected SGI, the maker of the SGI UV 1000 system, as its partner in building the shared-memory foundation for its XSEDE proposal. Shared memory has a key advantage over distributed memory: all the processors can access all the memory directly, whereas distributed-memory systems require additional code to reach memory attached to other processors. Programming for a shared-memory machine is thus much easier, and processing is far faster and more efficient.
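The programming difference described above can be illustrated with a minimal Python sketch. It is not code from PSC or SGI, and production codes on such machines would more likely use models such as OpenMP or MPI. The function names `shared_memory_sum` and `message_passing_sum` are hypothetical, and threads with queues stand in for separate nodes purely for illustration. In the first function every worker reads the same data directly; in the second, each worker sees only what is explicitly sent to it, which is the "additional code" a distributed-memory program needs.

```python
import queue
import threading


def shared_memory_sum(data, workers=4):
    """Shared-memory style: every worker reads the same list directly."""
    partial = [0] * workers

    def work(i):
        # Direct access to the shared data; nothing is copied or sent.
        partial[i] = sum(data[i::workers])

    threads = [threading.Thread(target=work, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)


def message_passing_sum(data, workers=4):
    """Distributed-memory style: workers only see data explicitly sent to them."""
    inboxes = [queue.Queue() for _ in range(workers)]
    results = queue.Queue()

    def work(i):
        chunk = inboxes[i].get()   # explicit receive of this worker's data
        results.put(sum(chunk))    # explicit send of the partial result

    threads = [threading.Thread(target=work, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    # Extra code: split the data, copy each piece, and send it to its worker.
    for i, inbox in enumerate(inboxes):
        inbox.put(list(data[i::workers]))
    for t in threads:
        t.join()
    return sum(results.get() for _ in range(workers))
```

Both functions return the same total for the same input. The second needs extra partitioning, send and receive steps, and on real hardware those steps also cost time moving data between nodes.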
“SGI is a unique supplier of large, hardware cache-coherent shared-memory machines,” said Jim Kasdorf, director of special projects at PSC. “Software shared-memory approaches exist, but they are far less efficient. We knew when we decided on a shared-memory machine that SGI was the only choice.”
Selecting the operating system for the SGI supercomputer was an even easier choice. PSC has worked with SUSE since 2004, when SUSE provided the operating system for components of a Cray XT3 computer. When SGI designed its shared-memory supercomputer, SUSE Linux Enterprise Server was the only operating system that could provide the necessary support. “SUSE Linux Enterprise Server is the only distribution that supports the full capabilities of the SGI machine,” said Kasdorf. “It was a no-brainer for this application. We use it. We recommend it. SUSE has a newer kernel than other options, making it the best choice.”
The SUSE support team meets weekly with SGI to ensure its needs are met, and the SGI support team meets weekly with PSC. “They are very responsive. They do a very good job. They work very hard. The users are happy. And when the users are happy, we’re happy,” said Kasdorf.
SUSE Linux Enterprise Server supports this SGI shared-memory system that holds 256 blades, 4,096 processing cores and 32 terabytes of memory in two 16-terabyte partitions. This is the largest cache-coherent shared-memory system in the world. And the benefits to researchers are unparalleled. More than 1,300 users are taking advantage of the system for research in 373 projects, covering extreme-scale performance engineering, chemistry, fluid dynamics, the early universe, condensed matter, seismic analysis, nanomaterials, astrophysics, climate modeling and genomics. One example involves researchers who hope to build a diagnostic chip that may identify heart disease in humans. They have been screening more than 100,000 mutant mice to find heart defects, sequencing the genomes and comparing the results to the genome of a healthy mouse. With the PSC-SGI machine running SUSE Linux Enterprise Server, processing that had been taking almost two weeks was cut to less than eight hours.
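As a rough back-of-the-envelope check on the figures above, the following Python sketch derives per-blade and per-core values and an approximate speedup. It assumes that 1 terabyte equals 1,024 gigabytes, that "almost two weeks" is about 14 days, and that "less than eight hours" is taken as exactly 8 hours, so the results are estimates rather than measured numbers.

```python
# Figures quoted in the text
BLADES = 256
CORES = 4096
MEMORY_TB = 32
PARTITIONS = 2

# Derived values (assumption: 1 TB = 1024 GB)
cores_per_blade = CORES // BLADES                    # 16 cores per blade
memory_per_blade_gb = MEMORY_TB * 1024 // BLADES     # 128 GB per blade
memory_per_core_gb = MEMORY_TB * 1024 // CORES       # 8 GB per core
memory_per_partition_tb = MEMORY_TB // PARTITIONS    # 16 TB per partition

# Approximate speedup for the genome-comparison example
before_hours = 14 * 24          # assumption: "almost two weeks" ~ 336 hours
after_hours = 8                 # assumption: "less than eight hours" ~ 8 hours
speedup = before_hours / after_hours                 # roughly 42x

print(cores_per_blade, memory_per_blade_gb, memory_per_core_gb, speedup)
```

Under these assumptions the genome-comparison workload ran on the order of 40 times faster, and the 16-terabyte partition size matches the figure given in the text.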
A system such as this must be readily accessible to researchers all over the United States, yet the research must be kept secure. The NSF has been very successful in maintaining security, and this SGI system has never had a security incident. “We’ve been very successful in providing the security while providing the open access that’s necessary,” said Kasdorf. “SUSE is very good, and they stay very up to date on security.”
The extraordinary memory size, ease of programming, scalability and stability of the PSC system built on the SGI UV 1000 running SUSE Linux Enterprise Server have given scientists and engineers the ability to solve problems in ways that have never been possible before.