The next gen platform for the edge: SUSE and Synadia Bring Two-Node High Availability to Kubernetes | SUSE Communities

SUSE and Synadia are partnering to deliver a native two-node option in K3s. This joint solution, powered by K3s and NATS.io, combines services and technology capable of changing the development and operational landscape at the edge.

Many leaders are struggling to define an edge strategy that can simultaneously leverage their existing infrastructure and enable innovation. The edge, often defined as data processed and maintained outside of a centralized, on-prem data center, is growing into a dominant market.

The challenge: Unfortunately, from a design perspective, what worked for the cloud won’t necessarily work for the edge. When solutions aren’t built to scale there, emerging technologies will outpace current capacity.


This is already happening as workloads move from the cloud or data center to the edge: the traditional communication layer of enterprise messaging systems doesn’t hold up. Many companies have invested in physical infrastructure at physical sites and, especially in industries like retail, medical, and automotive, have historically wanted HA with a limited two-node configuration: if one node fails, there is a backup.

However, modern replication standards require an odd number of systems (and certainly more than one!), and the hardware and software costs of a third node would often exceed the savings that true HA is meant to deliver.

Together, SUSE and Synadia’s open innovation approach aims to solve this problem by combining K3s and NATS to bring the next generation stack for edge developers.

The solution: We’re embarking on a mission to empower customers to leverage their existing two-node infrastructure and optimize their hardware budgets. This means achieving Kubernetes high availability (HA) with just two nodes, previously considered impossible because 1) HA best practices require three nodes, and 2) of the challenges associated with etcd, the distributed key-value store at the heart of Kubernetes.

Our approach leverages a combination of known constraints and the NATS messaging system to maintain system state safely and efficiently with only two nodes. While this isn’t entirely new ground (the concept of active recovery isn’t unheard of), the innovation lies in seamlessly integrating NATS with K3s/Kine and establishing a set of application-specific constraints under which HA can be preserved and potential split-brain scenarios resolved.
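To make the K3s/Kine piece concrete: K3s already lets you swap etcd for another backend through its `--datastore-endpoint` flag, which is served by Kine, and Kine includes a NATS-backed datastore addressed by a `nats://` URL. The sketch below is illustrative only; the hostname is hypothetical, and the exact URL options and embedded-NATS defaults of the joint two-node solution may differ from what stock Kine supports.

```shell
# Illustrative: start a K3s server with a NATS-backed Kine datastore
# instead of embedded etcd. "nats-server" is a placeholder hostname;
# consult the Kine NATS backend documentation for supported URL options.
k3s server \
  --datastore-endpoint="nats://nats-server:4222"
```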

Why now? The two-node problem stems from a previous generation of infrastructure decisions in which active/passive architectures were common, likely for cost reasons. On the flip side, the computer science perspective on data survivability elevates consensus protocols like Raft (used in modern distributed databases) and Paxos, in which a majority of cluster participants must agree on the integrity of the data. Consensus algorithms allow nodes in a cluster to fail without disruption to service.

But with two nodes, this option disappears. Consensus algorithms solve the split-brain problem, but who owns the data if the network is cut in half? There is a whole slew of problems if you’re not conscious of the implications. Nonetheless, legacy and cost-based use cases are driving the need for a two-node solution.
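The arithmetic behind why two nodes break majority consensus is simple, and worth seeing once. A quorum is the smallest majority of the cluster, so the number of failures a cluster can survive is whatever is left over after reserving a quorum:

```python
def quorum(n: int) -> int:
    """Smallest majority of an n-node cluster (Raft/Paxos-style)."""
    return n // 2 + 1

def tolerated_failures(n: int) -> int:
    """How many nodes can fail while a majority can still be formed."""
    return n - quorum(n)

for n in (1, 2, 3, 5):
    print(f"{n} nodes: quorum={quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
# 1 nodes: quorum=1, tolerates 0 failure(s)
# 2 nodes: quorum=2, tolerates 0 failure(s)
# 3 nodes: quorum=2, tolerates 1 failure(s)
# 5 nodes: quorum=3, tolerates 2 failure(s)
```

A two-node cluster needs both nodes to form a majority, so it tolerates zero failures, which is exactly why conventional guidance starts at three nodes.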

Many industries run computers in their stores where there is only ever an active/passive setup.

The benefits of a two-node HA solution

Economic Advantages:

  • Reduced Hardware Costs: Two-node deployments significantly decrease hardware expenses compared to traditional three-node setups. This is particularly relevant in resource-constrained environments or industries with specific hardware requirements (such as manufacturing, where fanless systems are essential).
  • Simplified Infrastructure Management: Fewer nodes translate to less infrastructure complexity, leading to reduced operational costs and easier management for IT teams.
  • Improved Space Utilization: In situations with physical space limitations, like edge locations, the compact two-node solution offers significant advantages. This is crucial for environments like oil rigs, charging stations, or other space constrained sites like hospital wards.
Technical Benefits:

  • Moving Beyond etcd Constraints: The open Kine interface provides the opportunity to inject alternative technologies that improve performance and scalability for edge deployments with limited resources. Embedding NATS within K3s as a default configuration simplifies both the developer and system administrator experience.
  • Flexibility and Adaptability: The interoperable, cross-industry-standard K3s + NATS stack allows for efficient deployment in diverse environments, catering to the needs of various use cases and accommodating existing infrastructure limitations. The same stack can also seamlessly extend to the rest of your edge-based applications by providing high-performance data streaming, request-reply, and object storage capabilities.
Reaching all stakeholders

The true impact of a K3s + NATS stack, complete with two-node HA configuration, lies in its ability to address the specific needs of various stakeholders. From the C-suite to developers, standardization at the edge allows companies to innovate as quickly as possible with the least amount of friction.

CXO-Level Decision Makers:

Executives involved in cloud and edge transformations can innovate at the edge without significant hardware investments by using the environments they already have.

  • Essential for Future Competitiveness: Accenture found that 83% of businesses believe edge computing is crucial for future competitiveness [1]. By embracing the K3s + NATS stack, leaders can accelerate adoption and be equipped for long-term success.
  • Bridging the Gap: The simplification and standardization of an edge stack will enable businesses to connect their digital core to the edge, unlocking the potential of real-time data and AI models executed outside of the cloud.
  • Disruptive Potential: Disrupt or be disrupted by someone acting quicker and leveraging data faster. By connecting the digital core to the edge, early adopters of edge solutions equip their businesses to succeed.

83% of businesses believe edge computing is crucial for future competitiveness.

Accenture, 2023 [1]

Engineers, Architects, and Developers:

In a world of dense complexity, we aim to simplify the delivery, sourcing, and understanding of technology. Engineers, architects, and developers who are trying to move quickly and solve problems elegantly will be happy to know that the K3s + NATS industry-standard edge stack they are using is tested and secure in their two-node constrained use cases.

  • Improved Resource Efficiency: Ideal for resource-constrained environments, the compact two-node solution optimizes space utilization and simplifies maintenance, especially with K3s as the unit of delivery, bringing together the benefits of Kubernetes and orchestration. A single-binary server deployment that encapsulates Kubernetes and properly handles the failure conditions of the two-node setup is incredibly powerful.
  • Enhanced Performance: Understanding the intricacies of two-node deployments is crucial for developers to avoid potential issues and ensure smooth operation. Workload management configurations affect how workloads are spun up and managed during failover: the two-node HA configuration seamlessly handles failover scenarios, preserving system state, storage, and message integrity.
  • Software Challenges: The software itself needs to handle failover scenarios, including system state, storage, and making sure messages don’t get lost. A three-node setup normally has load balancing, while a two-node configuration has active and passive designations. When the passive node becomes active, the workload it takes over is spun up from scratch with no knowledge of the current state. If software developers don’t understand how the cluster degrades, they can inadvertently cause issues later on. Solving the unique challenges that customers at the edge will increasingly face depends on developers’ ability to move beyond the status quo.
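One way to picture the degradation described above is as a tiny state machine. The sketch below is purely illustrative, not the actual K3s/NATS implementation: `peer_alive` stands in for a health probe, and `has_tiebreaker` stands in for some external arbiter (for example, a reachable gateway) that decides which node may stay active when the inter-node link is cut.

```python
from enum import Enum

class Role(Enum):
    ACTIVE = "active"
    PASSIVE = "passive"
    FENCED = "fenced"   # refuses writes to avoid split-brain

class Node:
    """Toy two-node active/passive state machine (illustrative only)."""

    def __init__(self, role: Role):
        self.role = role

    def on_probe(self, peer_alive: bool, has_tiebreaker: bool) -> Role:
        if peer_alive:
            return self.role            # normal operation, keep current role
        if has_tiebreaker:
            self.role = Role.ACTIVE     # peer unreachable but we hold the tiebreaker
        else:
            self.role = Role.FENCED     # can't prove the peer is dead: fence ourselves
        return self.role

# Peer failure: the passive node wins the tiebreaker and is promoted.
passive = Node(Role.PASSIVE)
assert passive.on_probe(peer_alive=False, has_tiebreaker=True) is Role.ACTIVE

# Network partition: the node that loses the tiebreaker fences itself
# rather than risk two active nodes writing divergent state.
active = Node(Role.ACTIVE)
assert active.on_probe(peer_alive=False, has_tiebreaker=False) is Role.FENCED
```

The key design point the bullets above make is visible here: the newly promoted node starts with no knowledge of the prior active node's in-flight state, which is why workload spin-up and message durability have to be handled explicitly.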
The challenges we look forward to addressing

While the core K3s/Kine-NATS integration is available for anyone to download, the two-node framework is a custom solution available as part of a joint engagement. We are actively looking to collaborate with early adopters to further define the two-node approach, and we are continuously working to address areas like automation, hardware, behavior tradeoffs, and security. We’re confident that this work will unlock the full potential of two-node Kubernetes.

  • Automation & Hardware: Currently, setup and installation require manual intervention, and further work is needed in areas like hardware integration, storage integration, and automatic clustering. Future collaboration with hardware vendors could transform how the nodes interact, for example by developing an out-of-band way to physically plug the two nodes together.
  • Behavior Tradeoffs: As technology becomes more complex while humans remain in the loop, maintaining simplicity is critical. For example, the system offers default probes and script-based user-defined probes for various environments, but further flexibility in probe behavior and state-machine decisions is needed. In practice, depending on the environments this solution is deployed in, we expect additional probes to emerge and preferred behaviors to change over time.
  • Security: Managing compromised assets and ensuring endpoint security in highly distributed environments are crucial challenges as the scale of edge deployments increases. With K3s and NATS, the consistency of technology and behavior in running mission-critical services from the data center to the edge will ensure that, as endpoints become more intelligent, we can continue to drive security to the edge.

In the long term, we’re committed to removing complexity from the technology landscape and staying ahead of the curve in a world driven by the need for faster and more accurate decisions. The future integration of machine learning and autotuning will further enable this technology to scale beyond the current limits.

Let’s build the future together

The edge is not going away and two-node configurations are the key to unlocking its transformative potential. Together, K3s and NATS create a future where businesses harness the power of data closer to its source, drive innovation, and achieve unprecedented levels of agility and efficiency.

This is not just a technological revolution – it’s the dawn of a new era where simplicity empowers progress and allows us to achieve the seemingly impossible.

Explore to find out more

NATS Slack

Rancher Slack

Synadia Rethink Connectivity

Come check out a demo at Kubernetes on Edge Day and KubeCon EMEA 2024!

[1] Accenture (2023). Leading with Edge Computing. Retrieved from https://www.accenture.com/content/dam/accenture/final/accenture-com/document-2/Accenture-Leading-With-Edge-Computing.pdf#zoom=40.