HPC
Linux Cluster
A Linux cluster is a connected array of Linux computers or nodes that work together and can be viewed and managed as a single system. Nodes are usually connected by fast local area networks, with each node running its own instance of Linux. Nodes may be physical or virtual machines, and they may be separated geographically. Each node includes storage capacity, processing power and I/O (input/output) bandwidth. Multiple redundant nodes of Linux servers may be connected as a cluster for high availability (HA) in a network or data center, where each node is capable of failure detection and recovery.
IT organizations use Linux clusters to reduce downtime and deliver high availability of IT services and mission-critical workloads. The redundancy of cluster components eliminates single points of failure. Linux clusters may be connected nodes of servers, storage devices or virtualized containers. A server cluster is a group of linked servers that work together to improve system performance, load balancing and service availability. If a server fails, other servers in the cluster can take over the functions and workloads of the failed server.
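To make the failover behavior concrete, the minimal sketch below shows heartbeat-based failure detection and workload takeover in plain Python. The node names, timeout and workload labels are made up, and production clusters rely on dedicated software such as Pacemaker and Corosync rather than code like this; treat it only as an illustration of the idea.

```python
import time

# Minimal illustration of heartbeat-based failure detection and failover.
# The node names, timeout and workloads are hypothetical; real clusters use
# dedicated software such as Pacemaker and Corosync for this job.

HEARTBEAT_TIMEOUT = 5.0   # seconds of silence before a node is presumed failed


class ClusterMonitor:
    def __init__(self, nodes):
        now = time.monotonic()
        self.last_heartbeat = {node: now for node in nodes}
        self.workloads = {node: [] for node in nodes}

    def record_heartbeat(self, node):
        """Called whenever a node reports that it is still alive."""
        self.last_heartbeat[node] = time.monotonic()

    def assign(self, node, workload):
        self.workloads[node].append(workload)

    def check_and_failover(self):
        """Detect silent nodes and move their workloads to a healthy peer."""
        now = time.monotonic()
        failed = [n for n, t in self.last_heartbeat.items()
                  if now - t > HEARTBEAT_TIMEOUT]
        healthy = [n for n in self.last_heartbeat if n not in failed]
        for node in failed:
            if self.workloads[node] and healthy:
                target = healthy[0]   # a real cluster would also balance load
                print(f"{node} failed; moving {self.workloads[node]} to {target}")
                self.workloads[target].extend(self.workloads[node])
                self.workloads[node] = []


monitor = ClusterMonitor(["node-a", "node-b"])
monitor.assign("node-a", "web-service")
monitor.record_heartbeat("node-b")
monitor.last_heartbeat["node-a"] -= 10   # simulate node-a falling silent
monitor.check_and_failover()             # reports that web-service moved to node-b
```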
Compared to a single computer, a Linux cluster can provide faster processing speed, larger storage capacity, better data integrity, greater reliability and wider availability of resources. Clusters are usually dedicated to specific functions, such as load balancing, high availability, high performance, storage or large-scale processing. Compared to a mainframe computer, a Linux cluster delivers comparable power and processing speed more cost-effectively. The networked nodes in a cluster also create an efficient, distributed infrastructure that prevents bottlenecks, thus improving performance. SUSE Linux Enterprise High Availability Extension provides open source HA clustering that can be deployed in physical or virtual environments.
Grid Computing
Grid computing is a distributed computing system formed by a network of independent computers in multiple locations. Grid computing links disparate, low-cost computers into one large infrastructure, harnessing their unused processing and other compute resources. Unlike high performance computing (HPC) and cluster computing, grid computing can assign a different task to each node. Each computer’s resources – CPUs, GPUs, applications, memory, data storage, etc. – can be shared as community resources. Load-sharing software or middleware evenly distributes tasks to several personal computers and servers on the grid. The nodes run independent tasks and are loosely linked by the Internet or low-speed networks, connecting to the grid directly or via scheduling systems. Grid environments are fault tolerant and have no single point of failure. If one node in the grid fails, many others are available to take over its load.
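The toy sketch below illustrates the scheduling idea on a single machine: a coordinator keeps a queue of independent work units and hands one to whichever worker asks next. The worker names and the analyze function are hypothetical stand-ins for real grid middleware and real workloads.

```python
import queue
import threading

# Toy illustration of grid-style scheduling: independent work units are handed
# to whichever worker asks for one next. The worker names and the analyze()
# function are placeholders for real grid middleware and real workloads.

work_units = queue.Queue()
for chunk_id in range(8):            # e.g. eight segments of a larger data set
    work_units.put(chunk_id)

results = []
results_lock = threading.Lock()


def analyze(chunk_id):
    """Placeholder for an independent computation on one data segment."""
    return chunk_id * chunk_id


def worker(name):
    while True:
        try:
            chunk = work_units.get_nowait()
        except queue.Empty:
            return                   # no more work; the node goes back to idle
        outcome = analyze(chunk)
        with results_lock:
            results.append((name, chunk, outcome))


threads = [threading.Thread(target=worker, args=(f"pc-{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)   # each chunk is processed exactly once, by whichever node was free
```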
Grid computing is best suited for applications in which many parallel computations can happen independently, without the need to communicate between processors. Most grid computing projects have no time dependency, and large projects typically deploy across many countries. Some grid projects use the idle power of computers, known as cycle-scavenging, and may run in the background for many weeks. The SETI@home project, a search for extraterrestrial intelligence, is one example of grid computing. Millions of PCs run a search program on a segmented piece of radio telescope data with no set completion date. Other examples of grid computing projects include genetics research, drug design, geophysical prospecting, mechanical engineering, movie special effects, financial analytics and cancer research.
Grid computing is generally not suited to traditional business applications because of its lack of end-to-end security and usage metering, plus concerns about licensing, privacy, reliability and latency. Government, scientific and financial organizations use grid computing for massive computational projects because it makes efficient use of idle computer resources, such as servers and desktops that sit unused after business hours. Using IP-based networks, a grid can link hundreds or thousands of inexpensive computers, sharing resources across diverse operating systems such as Linux, Solaris, UNIX and Windows. Many SUSE Linux partners offer HPC, clustering and grid computing solutions for more efficient processing power.
Mainframe
A mainframe is a high-capacity computer that often serves as the central data repository in an organization’s IT infrastructure. It is linked to users through less powerful devices such as workstations or terminals. Centralizing data in a single mainframe repository makes it easier to manage, update and protect the integrity of the data. Mainframes are generally used for large-scale computer processes that require higher availability and security than smaller-scale machines can provide. For example, the IBM z13 mainframe is capable of processing 2.5 billion transactions per day.
The original mainframes were housed in room-sized metal frames, commonly called “big iron.” In the past, a typical mainframe might have occupied 2,000 to 10,000 square feet. Newer mainframes are the size of a large refrigerator, occupying less than 10 square feet. Some computer manufacturers don’t use the term mainframe, calling any commercial-use computer a server; a “mainframe” is simply the largest type of server. In most cases, mainframe refers to computers that can support thousands of applications and input/output devices simultaneously, serving many thousands of users.
Long past their predicted extinction in 1996, mainframes are the only type of computing hardware that can handle the huge volumes of transactions used in many industries today, including banking, insurance, healthcare, government and retail. According to IBM, 80% of the world’s corporate data is still managed by mainframes. More than a quarter of the mainframe processing capacity that IBM ships is used to run Linux. Mainframes are ideal for server consolidation, where one mainframe can run as many as 100,000 virtual Linux servers. SUSE Linux Enterprise provides support for IBM z Systems mainframes.
Supercomputer
A supercomputer is a computer with a high level of performance compared to a general-purpose computer. Performance of a supercomputer is measured in floating-point operations per second (FLOPS) instead of million instructions per second (MIPS). Supercomputers contain tens of thousands of processors and can perform billions, or even trillions, of calculations per second. Some supercomputers can perform up to a hundred quadrillion FLOPS. Because information moves quickly between processors in a supercomputer (compared to distributed computing systems), supercomputers are ideal for real-time applications.
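As a rough illustration of how a FLOPS rating relates to processor counts, the short calculation below multiplies nodes, cores per node, clock frequency and floating-point operations per cycle to get a theoretical peak. Every figure is illustrative and does not describe any real machine.

```python
# Back-of-the-envelope theoretical peak for a hypothetical machine.
# All figures below are illustrative, not the specs of any real system.

nodes = 4_000                 # compute nodes in the machine
cores_per_node = 48           # CPU cores per node
clock_ghz = 2.5               # clock frequency in GHz
flops_per_cycle = 16          # e.g. wide vector units doing fused multiply-adds

peak_flops = nodes * cores_per_node * clock_ghz * 1e9 * flops_per_cycle
print(f"Theoretical peak: {peak_flops:.2e} FLOPS "
      f"({peak_flops / 1e15:.1f} petaFLOPS)")
# 4,000 * 48 * 2.5e9 * 16 = 7.68e15, i.e. about 7.7 petaFLOPS
```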
Supercomputers are used for data-intensive and computation-heavy scientific and engineering purposes such as quantum mechanics, weather forecasting, oil and gas exploration, molecular modeling, physical simulations, aerodynamics, nuclear fusion research and cryptanalysis. Early operating systems were custom-made for each supercomputer to increase its speed. In recent years, supercomputer architecture has moved away from proprietary, in-house operating systems to Linux. Although most supercomputers use a Linux-based operating system, each manufacturer optimizes its own Linux derivative for peak hardware performance. In 2017, half of the world’s top 50 supercomputers used SUSE Linux Enterprise Server.
The largest, most powerful supercomputers are actually multiple computers that perform parallel processing. Today, many academic and scientific research firms, engineering companies and large enterprises that require massive processing power are using cloud computing instead of supercomputers. High performance computing (HPC) via the cloud is more affordable, scalable and faster to upgrade than on-premises supercomputers. Cloud-based HPC architectures can expand, adapt and shrink as business needs demand. SUSE Linux Enterprise High Performance Computing allows organizations to leverage their existing hardware for HPC computations and data-intensive operations.
Deep Learning
Deep learning, also known as deep neural networks, is a machine learning method based on learning data representations rather than task-specific algorithms. Deep learning architectures are inspired by the structure of the brain. Improvements in mathematical formulas and increasingly powerful computers have enabled computer scientists to model many layers of virtual neurons. Deep learning software can be trained to recognize patterns in unstructured data, such as digital representations of sounds, images, video and text.
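The toy example below, written with NumPy, trains a hidden layer of eight virtual neurons plus an output neuron to recognize the XOR pattern. The architecture, learning rate and data are illustrative only; practical deep learning uses frameworks such as TensorFlow or PyTorch running on far larger data sets.

```python
import numpy as np

# A toy deep network: a hidden layer of eight virtual neurons plus an output
# neuron, trained to recognize the XOR pattern. The architecture, learning
# rate and data are illustrative only.

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # inputs
y = np.array([[0], [1], [1], [0]], dtype=float)               # target pattern

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)   # hidden layer weights and biases
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # output layer weights and biases


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


learning_rate = 0.5
for step in range(5000):
    # forward pass through both layers
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)
    # backward pass: gradients of the squared error
    d_output = (output - y) * output * (1 - output)
    d_hidden = (d_output @ W2.T) * hidden * (1 - hidden)
    # gradient-descent updates
    W2 -= learning_rate * hidden.T @ d_output
    b2 -= learning_rate * d_output.sum(axis=0)
    W1 -= learning_rate * X.T @ d_hidden
    b1 -= learning_rate * d_hidden.sum(axis=0)

print(output.round(3).ravel())   # moves toward [0, 1, 1, 0] as the pattern is learned
```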
Deep learning algorithms have been applied to computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation, bioinformatics and drug design. Unlike other machine learning techniques, the performance of deep learning neural networks does not plateau; it continues to improve as the system grows and receives more data input. Because deep learning excels at identifying patterns in unstructured data, enterprises can use it to unlock the value of data they already have, revealing patterns they can use to create or improve products and services. For example, a deep learning model that combines a customer’s search queries, browsing history, news preferences, and movie and TV show rankings can provide a more accurate purchase suggestion. Personalized recommendations on a shopping website can improve a retailer’s competitive advantage as well as its customer relationship management.
Some deep learning workloads make heavy use of CPUs, while others make heavy use of GPUs. The unpredictable duration of training neural networks (some models train in hours, while others take days) requires rapid, on-demand scaling of virtual machine pools. Kubernetes clusters offer a flexible, cost-effective option for training different types of deep learning workloads.
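As one possible illustration, the sketch below uses the official Kubernetes Python client to submit a training run as a Job that requests a single GPU. The container image, job name and namespace are hypothetical, and the cluster is assumed to have a working kubeconfig and a GPU device plugin installed.

```python
from kubernetes import client, config

# Sketch of submitting a training run as a Kubernetes Job that requests one GPU.
# The image, job name and namespace are hypothetical; a working kubeconfig and
# a GPU device plugin on the cluster are assumed.

config.load_kube_config()

container = client.V1Container(
    name="trainer",
    image="registry.example.com/models/trainer:latest",   # hypothetical image
    command=["python", "train.py"],
    resources=client.V1ResourceRequirements(
        limits={"nvidia.com/gpu": "1", "cpu": "4", "memory": "16Gi"}
    ),
)

job = client.V1Job(
    api_version="batch/v1",
    kind="Job",
    metadata=client.V1ObjectMeta(name="deep-learning-train"),
    spec=client.V1JobSpec(
        backoff_limit=2,
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(containers=[container], restart_policy="Never")
        ),
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
```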
Computer Cluster
A computer cluster is a set of connected computers (nodes) that work together as if they were a single, much more powerful machine. Unlike grid computing, where each node performs a different task, a computer cluster assigns the same task to every node. Nodes in a cluster are usually connected to each other through high-speed local area networks. Each node runs its own instance of an operating system. A computer cluster may range from a simple two-node system connecting two personal computers to a supercomputer with a cluster architecture. Computer clusters are often used for cost-effective high performance computing (HPC) and high availability (HA) by businesses of all sizes. If a single component fails in a computer cluster, the other nodes continue to provide uninterrupted processing.
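The single-machine sketch below stands in for that style of parallelism: the same task (summing squares over a range of numbers) is split into chunks and run on several workers at once, much as a cluster spreads work across nodes. The chunk sizes are arbitrary, and a real cluster would coordinate its nodes with MPI or a job scheduler rather than Python's multiprocessing module.

```python
from multiprocessing import Pool

# Single-machine stand-in for cluster-style parallelism: the same task (summing
# squares) is split into chunks and run in parallel. On a real cluster the
# workers would be separate nodes coordinated by MPI or a job scheduler.


def sum_of_squares(bounds):
    lo, hi = bounds
    return sum(n * n for n in range(lo, hi))


if __name__ == "__main__":
    chunks = [(i, i + 250_000) for i in range(0, 1_000_000, 250_000)]
    with Pool(processes=4) as pool:          # four workers, analogous to four nodes
        partial_sums = pool.map(sum_of_squares, chunks)
    print(sum(partial_sums))                 # same answer as the serial computation
```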
Compared to a single computer, a computer cluster can provide faster processing speed, larger storage capacity, better data integrity, greater reliability and wider availability of resources. Computer clusters are usually dedicated to specific functions, such as load balancing, high availability, high performance or large-scale processing. Compared to a mainframe computer, a cluster delivers comparable power and processing speed more cost-effectively. The networked nodes in a cluster also create an efficient, distributed infrastructure that prevents bottlenecks, thus improving performance.
Computer clustering relies on centralized management software that makes the nodes available as orchestrated servers. The right enterprise operating system can prevent application downtime with clustering that replicates data across multiple computer clusters and provides service failover across any distance with geo clustering. SUSE Linux Enterprise High Availability Extension can protect workloads across globally distributed data centers. It allows companies to deploy both physical and virtual Linux clusters across data centers, ensuring business continuity.
Machine Learning
Machine learning is a field of computer science that gives computers the ability to learn without being explicitly programmed. It evolved from the study of pattern recognition and computational learning theory in artificial intelligence. Machine learning focuses on the study of algorithms that can learn from data and make predictions (or decisions) based on models built from sample inputs.
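A minimal example of that idea appears below: a simple linear model is fitted to a handful of made-up sample inputs and then used to predict an outcome for an unseen input. The data points and variable names are invented purely for illustration.

```python
# Minimal illustration of "learning from data": fit a simple linear model to
# sample inputs, then predict on new data. The data points are made up.

xs = [1.0, 2.0, 3.0, 4.0, 5.0]          # sample inputs (e.g. years of usage)
ys = [2.1, 4.3, 6.2, 7.9, 10.1]         # observed outcomes

# Ordinary least squares for y = a*x + b, derived from the sample data alone.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
b = mean_y - a * mean_x

print(f"learned model: y = {a:.2f}*x + {b:.2f}")
print("prediction for x = 6:", a * 6 + b)   # the model generalizes to unseen input
```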
Machine learning also refers to a method of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns and make decisions with minimal human intervention. The ability to automatically apply complex mathematical calculations to big data has led to the development of machine learning applications such as self-driving cars, online shopping recommendations, email filtering, intruder prevention, fraud detection and natural language interfaces.
Machine learning is closely related to computational statistics, which also uses computers to make predictions. The growing volumes and varieties of available data, cheaper compute processing and more affordable data storage are driving new demand for machine learning. Enterprises are looking for ways to quickly and automatically produce models that can analyze more complex data and deliver more accurate results, thereby identifying new opportunities or avoiding risks. Using machine learning to build predictive models can help organizations make data-driven decisions without human intervention. A supercomputer or high performance computing (HPC) infrastructure is generally required to build machine learning applications. SUSE Linux Enterprise High Performance Computing allows companies to leverage underlying hardware to power their machine learning applications and data analysis.
Artificial Intelligence
Artificial intelligence (AI) is computer technology that makes it possible for machines to learn from experience, adjust to new inputs and perform human-like tasks and problem solving. AI can adapt through progressive learning algorithms to let the data do the programming. Just as algorithms can teach a computer how to play chess, an online shopping site can teach itself what product to recommend next. AI is commonly used in content delivery networks, military simulations, and data analysis of images and videos. Other examples of AI include chess-playing computers, self-driving cars and speech recognition by wireless devices.
Artificial intelligence relies on deep learning and natural language processing. Computers can be trained to accomplish specific tasks by processing large amounts of data and recognizing patterns in the data. AI automates repetitive learning and discovery through data input and analysis. Unlike the hardware-driven, robotic automation of manual tasks, AI can perform frequent, high-volume computerized tasks reliably without fatigue. AI is usually not sold as a standalone application but as a feature added to existing products, much as Siri was added to Apple devices. AI can be combined with large amounts of data to improve network security, investment analysis and medical diagnostics. The more data input, the more accurate AI learning becomes.
Artificial intelligence can be added to banking and finance applications, pharmaceutical development and research, shipping and transportation logistics, healthcare management, manufacturing operations and agriculture processes. A supercomputer or high performance computing (HPC) infrastructure is generally required to build AI into business applications. With a strong HPC operating system, computers can improve their ability to recognize patterns, iteratively learn from data analysis, and sharpen predictive analytics. SUSE Linux Enterprise High Performance Computing can leverage underlying hardware to power new data-intensive AI applications across many industries.
High-Performance Computing
High Performance Computing (HPC) is the IT practice of aggregating computing power to deliver more performance than a typical computer can provide. Originally used to solve complex scientific and engineering problems, HPC is now used by businesses of all sizes for data-intensive tasks. Companies in fields such as automotive engineering, pharmaceutical design, oil and gas exploration, renewable energy research, entertainment and media, financial analytics and consumer product manufacturing rely on HPC for scalable business computing.
An HPC system is typically a cluster of computers or nodes, with each node containing one to four processors and each processor containing two to four cores. A common cluster size in many businesses is between 16 and 64 nodes, with 64 to 256 cores. Linux is the dominant operating system for HPC installations. HPC applications usually require fast network and storage performance, large amounts of memory and very high compute capabilities. An HPC cluster running in the cloud can scale to larger numbers of parallel tasks than most on-premises environments. A cloud-based HPC cluster allows users to focus on their applications and research output instead of IT maintenance. HPC cloud systems only charge for the services clients actually use, so businesses can optimize costs without paying for idle compute capacity.
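As a quick sanity check on those figures, the few lines below compute total core counts for the smallest and largest of the cluster sizes just mentioned, assuming one processor with four cores per node (a configuration within the stated ranges).

```python
# Quick check of the cluster sizes mentioned above, assuming one plausible
# per-node configuration taken from the ranges in the text: one processor
# with four cores per node.

processors_per_node = 1
cores_per_processor = 4

for nodes in (16, 64):
    total_cores = nodes * processors_per_node * cores_per_processor
    print(f"{nodes} nodes -> {total_cores} cores")
# prints 16 nodes -> 64 cores and 64 nodes -> 256 cores, matching the range above
```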
HPC infrastructures are complex in both design and operation, involving a large set of interdependent hardware and software elements that must be precisely configured and seamlessly integrated across a growing number of compute nodes. SUSE Linux Enterprise for HPC is designed for scalability and includes tools that simplify configuration and management. SUSE Linux Enterprise Real Time is ideal for time-sensitive HPC applications.