Best Practices for EKS Cluster Management on AWS Using SUSE Rancher Prime

This article provides best-practice guidance for greenfield and brownfield deployments of EKS clusters managed by SUSE Rancher Prime.

  • Understand Requirements:
    • Review Options for running Rancher Prime on AWS.
      • Read the blog: https://www.suse.com/c/options-for-running-rancher-on-aws/
      • Explore the Cloud Native SUSE Portfolio built from open source software with SLSA Level 3 certification.
        • SUSE Security = NeuVector + Kubewarden
        • SUSE Observability = StackState
        • SUSE Rancher Prime = Application Collection + Rancher + SUSE Security + SUSE Observability + RKE2 + k3s
        • SUSE Rancher Prime Hosted for AWS – Fully SUSE Supported Rancher Manager with 99.9% SLA uptime
        • SUSE Virtualization = Harvester
        • SUSE Storage = Longhorn
        • SUSE Rancher Suite = All of the above
      • Review the Installation Requirements in the Rancher product documentation.
  • Plan your EKS cluster size and node types based on your workload needs (adapted from AWS best practices).
    • Understand your workloads:
      • Application type:
        Identify the types of applications running on your cluster (e.g., web servers, databases, batch processing, machine learning).
      • Resource usage:
        Estimate the average and peak CPU, memory, and storage needs of each application pod.
      • Traffic patterns:
        Analyze expected traffic fluctuations and peak usage times to determine scaling requirements.
      • Latency sensitivity:
        Consider if your application has strict latency requirements, which might influence node selection.
    • Choose node types:
      • General purpose instances:
        For diverse workloads with balanced CPU and memory needs, consider “m” series instances (e.g., m5.large, m5.xlarge).
      • CPU-optimized instances:
        If your workloads are primarily CPU intensive, choose “c” series instances (e.g., c5.xlarge, c5.2xlarge).
      • Memory-optimized instances:
        For large data processing or in-memory applications, opt for “r” series instances (e.g., r5.large, r5.2xlarge).
      • GPU-optimized instances:
        If your workload involves machine learning or graphics rendering, use “p” series instances (e.g., p3.2xlarge, p3.8xlarge).
  • Determine initial cluster size:
    • Start small:
      Begin with a small cluster with a few nodes to test your application and optimize resource usage.
    • Consider redundancy:
      For high availability, distribute nodes across multiple Availability Zones (AZs).
    • Scaling strategy:
      Plan how to scale your cluster horizontally by adding nodes as needed using the Kubernetes Cluster Autoscaler.
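The sizing guidance above can be sketched with eksctl: a small managed node group spread across three AZs, with scaling bounds the Cluster Autoscaler can work within. Cluster name, region, and zones below are illustrative.

```shell
# Sketch: create a small, HA-ready EKS cluster (names/region are illustrative).
eksctl create cluster \
  --name demo-cluster \
  --region us-east-1 \
  --zones us-east-1a,us-east-1b,us-east-1c \
  --nodegroup-name general \
  --node-type m5.large \
  --nodes 3 --nodes-min 3 --nodes-max 6 \
  --managed

# The --nodes-min/--nodes-max bounds are applied to the underlying Auto
# Scaling group; the Kubernetes Cluster Autoscaler (deployed separately)
# scales the group within them as pod demand changes.
```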
  • Configure node groups:
    • Multiple node groups:
      Create separate node groups with different instance types to cater to specific application requirements.
    • Taints and tolerations:
      Use taints and tolerations to restrict pod scheduling to specific node types.
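As a sketch of the taint/toleration pattern, the commands below dedicate a hypothetical "gpu" node group to GPU workloads; the node group label, taint key, and image are illustrative.

```shell
# Taint every node in the hypothetical "gpu" node group so ordinary
# pods are not scheduled there:
kubectl taint nodes -l eks.amazonaws.com/nodegroup=gpu \
  workload=gpu:NoSchedule

# Pods that should land on those nodes must carry a matching toleration:
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-job
spec:
  tolerations:
    - key: workload
      operator: Equal
      value: gpu
      effect: NoSchedule
  containers:
    - name: main
      image: nvidia/cuda:12.4.1-base-ubuntu22.04
      command: ["nvidia-smi"]
EOF
```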
  • Example scenarios:
    • Web application with moderate traffic:
      Start with a small cluster using m5.large instances, utilizing the Cluster Autoscaler to scale based on real-time demand.
    • High-performance computing workload:
      Deploy a cluster with c5.xlarge or c5.2xlarge instances for optimal CPU performance.
    • Large-scale data processing:
      Use r5.xlarge or r5.2xlarge instances to handle large datasets with high memory requirements.
  • Decide on your networking strategy (VPC, subnets, security groups).
    • Network Segmentation:
      Divide your VPC into multiple subnets based on application functionality, ensuring isolation between different tiers.
    • Dedicated Security Groups:
      Create unique security groups for each application tier to enforce granular access controls; Kubernetes RBAC can be managed centrally through SUSE Rancher Prime.
    • Regular Review and Updates:
      Periodically review your security group rules and update them as needed to maintain a secure network.
    • Monitoring:
      Implement network monitoring tools to detect suspicious activity and identify potential security vulnerabilities.
      • SUSE Observability is included in SUSE Rancher Prime and the SUSE Rancher Suite.
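The per-tier security group pattern described above might look like the following sketch, where the group IDs and port are illustrative placeholders.

```shell
# Sketch: a dedicated security group per application tier.
aws ec2 create-security-group \
  --group-name app-tier-sg \
  --description "App tier - only reachable from the web tier" \
  --vpc-id vpc-0123456789abcdef0

# Allow only the web tier's security group (source-group) to reach the
# app tier's security group on port 8080:
aws ec2 authorize-security-group-ingress \
  --group-id sg-0aaaaaaaaaaaaaaaa \
  --protocol tcp --port 8080 \
  --source-group sg-0bbbbbbbbbbbbbbbb
```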
  • AWS Account Setup:
    • Create an AWS account if you don’t have one.
    • Configure AWS credentials with appropriate permissions for EKS and other AWS services.
    • Set up an IAM user and role with the necessary policies for Rancher Prime to operate.
  • Tooling:
    • Install and configure essential tools: AWS CLI, eksctl, kubectl, and Helm.
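On a Linux x86_64 workstation, installing the four tools might look like this sketch; check each project's documentation for current versions and checksums before running anything.

```shell
# AWS CLI v2
curl -sSL "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o awscliv2.zip
unzip -q awscliv2.zip && sudo ./aws/install

# eksctl
curl -sSL "https://github.com/eksctl-io/eksctl/releases/latest/download/eksctl_Linux_amd64.tar.gz" \
  | tar xz -C /tmp && sudo mv /tmp/eksctl /usr/local/bin

# kubectl (ideally pinned near your cluster's Kubernetes minor version)
curl -sSLO "https://dl.k8s.io/release/$(curl -sSL https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -m 0755 kubectl /usr/local/bin/kubectl

# Helm
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
```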
  • Networking:
    • Ensure the VPC, subnets, and security groups from your networking strategy are in place before creating the cluster.
  • EKS Cluster Creation:
    • Create the cluster with eksctl or the AWS console once the prerequisites above are met.
  • Kubernetes Version:
    • Choose a supported Kubernetes version that aligns with Rancher Prime’s compatibility.
    • Stay updated with the latest Kubernetes releases for security and features.

Rancher Prime Installation

  • Helm Chart:
    • Use the Helm chart from AWS Marketplace for installing Rancher Prime on your EKS cluster.
    • Customize the Helm chart values to match your environment and preferences.
    • See the Rancher documentation: Installing Rancher on Amazon EKS.
    • Optionally, you can create the SUSE Rancher Prime management deployment on EC2 directly. This will use RKE2 as the Kubernetes platform.
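A minimal Helm install might look like the sketch below. It uses the public rancher-latest chart repository as an example (the AWS Marketplace listing has its own repository URL), and the hostname is an illustrative placeholder.

```shell
# Sketch: install Rancher into its standard cattle-system namespace.
helm repo add rancher-latest https://releases.rancher.com/server-charts/latest
helm repo update

kubectl create namespace cattle-system
helm install rancher rancher-latest/rancher \
  --namespace cattle-system \
  --set hostname=rancher.example.com \
  --set replicas=3          # spread across nodes for HA
```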
  • High Availability:
    • Run Rancher with multiple replicas (three by default) spread across nodes in different AZs so the management plane survives a node or AZ failure.
  • Ingress:
    • Set up an Ingress controller (e.g., NGINX) to expose Rancher Prime externally.
    • Obtain and configure SSL certificates for secure access.
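Rancher's chart typically relies on cert-manager for certificate issuance and renewal; a sketch of that setup follows (the `crds.enabled` flag applies to recent cert-manager releases; check the Rancher support matrix for the version to use).

```shell
# Install cert-manager, which issues and renews Rancher's TLS certificate.
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager --create-namespace \
  --set crds.enabled=true

# Verify the Ingress object created by the Rancher chart:
kubectl -n cattle-system get ingress rancher
```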

Security Best Practices

  • Deploy SUSE Security:
    • Use SUSE Security (NeuVector) for runtime threat detection, network visibility, and admission control across your clusters.
  • IAM Roles and Policies:
    • Follow the principle of least privilege when assigning IAM roles and policies.
    • Regularly rotate your AWS access keys.
  • Network Security:
    • Implement security groups to control traffic flow to and from your EKS cluster.
  • Secrets Management:
    • Utilize AWS Secrets Manager to securely store and manage sensitive data.
    • Integrate Rancher Prime with Secrets Manager for Kubernetes secrets.
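One common integration pattern (an assumption here, not the only option) is to keep the secret of record in AWS Secrets Manager and sync it into the cluster with the External Secrets Operator. The secret name, value, and store name below are illustrative.

```shell
# Store the secret of record in AWS Secrets Manager:
aws secretsmanager create-secret \
  --name prod/app/db-password \
  --secret-string 'S3cureP@ss'

# Sync it into a Kubernetes Secret via a minimal ExternalSecret
# (requires the External Secrets Operator and a configured SecretStore):
cat <<'EOF' | kubectl apply -f -
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-password
spec:
  secretStoreRef:
    name: aws-secrets-manager   # a pre-configured SecretStore (assumed name)
    kind: SecretStore
  target:
    name: db-password           # resulting Kubernetes Secret
  data:
    - secretKey: password
      remoteRef:
        key: prod/app/db-password
EOF
```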
  • Pod Security Policies:
    • Enforce pod security policies to define security constraints for your workloads using Kubewarden, included with SUSE Rancher Prime.
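A Kubewarden setup might be sketched as below, using the charts from the public Kubewarden repository; the policy module tag is illustrative, so check the policy's registry for current versions.

```shell
# Install the Kubewarden stack (CRDs, controller, default policy server):
helm repo add kubewarden https://charts.kubewarden.io
helm repo update
helm install kubewarden-crds kubewarden/kubewarden-crds -n kubewarden --create-namespace
helm install kubewarden-controller kubewarden/kubewarden-controller -n kubewarden
helm install kubewarden-defaults kubewarden/kubewarden-defaults -n kubewarden

# Example policy: reject privileged pods cluster-wide.
cat <<'EOF' | kubectl apply -f -
apiVersion: policies.kubewarden.io/v1
kind: ClusterAdmissionPolicy
metadata:
  name: no-privileged-pods
spec:
  module: registry://ghcr.io/kubewarden/policies/pod-privileged:v0.3.2
  rules:
    - apiGroups: [""]
      apiVersions: ["v1"]
      resources: ["pods"]
      operations: ["CREATE", "UPDATE"]
  mutating: false
EOF
```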
  • Regular Security Audits:
    • Conduct regular security audits to identify and address vulnerabilities.

Monitoring and Logging

    • SUSE Observability
      • End-to-end monitoring, tracing, and historical data help reduce MTTR and prevent future downtime.
    • CloudWatch:
      • Integrate EKS with CloudWatch for monitoring cluster health and performance.
      • Set up alerts for critical events and metrics.
    • Logging:
      • Centralize your logs using a logging solution such as Elasticsearch or Fluentd.
      • Monitor logs for errors and potential security issues in SUSE Observability.
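The CloudWatch integration can be sketched with the managed EKS add-on plus an example alarm; the cluster name and alarm thresholds are illustrative.

```shell
# Enable the CloudWatch Observability add-on for Container Insights
# metrics and logs:
aws eks create-addon \
  --cluster-name demo-cluster \
  --addon-name amazon-cloudwatch-observability

# Example alarm on sustained high node CPU:
aws cloudwatch put-metric-alarm \
  --alarm-name eks-node-cpu-high \
  --namespace ContainerInsights \
  --metric-name node_cpu_utilization \
  --dimensions Name=ClusterName,Value=demo-cluster \
  --statistic Average --period 300 --evaluation-periods 2 \
  --threshold 80 --comparison-operator GreaterThanThreshold
```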

Backup and Disaster Recovery

    • etcd Backups:
      • On EKS, AWS manages and backs up the etcd control plane for you; for clusters where you run the control plane yourself (e.g., RKE2 on EC2), back up etcd regularly for cluster recovery.
      • Store any backups you take securely in an S3 bucket.
    • SUSE Rancher Prime Backups:
      • Use SUSE Rancher Prime’s backup and restore functionality to protect your configuration. Can be scheduled or run on demand.
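The scheduled-backup workflow can be sketched with the rancher-backup operator writing to S3; bucket, region, and credential secret names below are illustrative.

```shell
# Install the rancher-backup operator:
helm repo add rancher-charts https://charts.rancher.io
helm repo update
helm install rancher-backup-crd rancher-charts/rancher-backup-crd \
  -n cattle-resources-system --create-namespace
helm install rancher-backup rancher-charts/rancher-backup \
  -n cattle-resources-system

# Schedule nightly backups of the Rancher configuration to S3:
cat <<'EOF' | kubectl apply -f -
apiVersion: resources.cattle.io/v1
kind: Backup
metadata:
  name: nightly-rancher-backup
spec:
  schedule: "0 2 * * *"        # run daily at 02:00
  retentionCount: 7
  storageLocation:
    s3:
      bucketName: my-rancher-backups
      region: us-east-1
      credentialSecretName: s3-creds
      credentialSecretNamespace: default
      endpoint: s3.us-east-1.amazonaws.com
EOF
```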
    • Disaster Recovery Plan:
      • Develop a disaster recovery plan to ensure business continuity in case of an outage. DR for your SUSE Rancher Prime clusters can span multiple AZs and Regions.

Ongoing Management

    • Updates and Upgrades:
      • Stay up-to-date with the latest Rancher Prime and Kubernetes releases. Rancher Prime Hosted does this for you for the Rancher instances.
      • Follow the recommended upgrade procedures to minimize downtime.
    • Cluster Scaling:
      • Monitor your cluster resources and scale your nodes as needed.
      • Utilize auto-scaling to dynamically adjust your cluster size.
    • Cost Optimization:
      • Optimize your EKS cluster costs by right-sizing your nodes and using spot instances (SUSE Rancher Prime does this for you).
      • Monitor your AWS billing and identify areas for cost savings. SUSE Application Collection provides an enterprise build of OpenCost.

Additional Tips

  • Infrastructure as Code:
    • Use the SUSE Rancher Prime IaC tools.
    • Use tools like Terraform or CloudFormation to manage your AWS infrastructure.
    • This allows for reproducible deployments and easier management.
  • GitOps:
    • Implement GitOps with Fleet (included with SUSE Rancher Prime) for managing your Kubernetes configurations and applications.
    • This provides a version-controlled and auditable approach to deployments.
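A minimal Fleet deployment can be sketched as a GitRepo resource that continuously applies manifests from Git; the repository URL, branch, and paths are illustrative.

```shell
# Sketch: tell Fleet to watch a Git repository and deploy its manifests
# to downstream clusters.
cat <<'EOF' | kubectl apply -f -
apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: app-configs
  namespace: fleet-default
spec:
  repo: https://github.com/example-org/k8s-configs
  branch: main
  paths:
    - manifests/
  targets:
    - clusterSelector: {}    # empty selector targets all downstream clusters
EOF
```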
  • Support:
    • Leverage SUSE’s support resources.
Ted Jones is an architect on the Global Cloud Alliance team at SUSE focused on the Secure Container Platform domain.