SUSE OpenStack Cloud 8

Operations Guide

This guide provides a list of useful procedures for managing your SUSE OpenStack Cloud 8 cloud. The audience is the admin-level operator of the cloud.

Publication Date: 11/05/2018
1 Operations Overview
1.1 What is a cloud operator?
1.2 Tools provided to operate your cloud
1.3 Daily tasks
1.4 Weekly or monthly tasks
1.5 Semi-annual tasks
1.6 Troubleshooting
1.7 Common Questions
2 Tutorials
2.1 SUSE OpenStack Cloud Quickstart Guide
2.2 Log Management and Integration
2.3 Integrating Your Logs with Splunk
2.4 Integrating SUSE OpenStack Cloud with an LDAP System
3 Third-Party Integrations
3.1 Splunk Integration
3.2 Nagios Integration
3.3 Operations Bridge Integration
3.4 Monitoring Third-Party Components With Monasca
4 Managing Identity
4.1 The Identity Service
4.2 Supported Upstream Keystone Features
4.3 Understanding Domains, Projects, Users, Groups, and Roles
4.4 Identity Service Token Validation Example
4.5 Configuring the Identity Service
4.6 Retrieving the Admin Password
4.7 Changing Service Passwords
4.8 Reconfiguring the Identity Service
4.9 Integrating LDAP with the Identity Service
4.10 Keystone-to-Keystone Federation
4.11 Configuring Web Single Sign-On
4.12 Identity Service Notes and Limitations
5 Managing Compute
5.1 Managing Compute Hosts using Aggregates and Scheduler Filters
5.2 Using Flavor Metadata to Specify CPU Model
5.3 Forcing CPU and RAM Overcommit Settings
5.4 Enabling the Nova Resize and Migrate Features
5.5 Enabling ESX Compute Instance(s) Resize Feature
5.6 Configuring the Image Service
6 Managing ESX
6.1 Networking for ESXi Hypervisor (OVSvApp)
6.2 Validating the Neutron Installation
6.3 Removing a Cluster from the Compute Resource Pool
6.4 Removing an ESXi Host from a Cluster
6.5 Configuring Debug Logging
6.6 Making Scale Configuration Changes
6.7 Monitoring vCenter Clusters
6.8 Monitoring Integration with OVSvApp Appliance
7 Managing Block Storage
7.1 Managing Block Storage using Cinder
8 Managing Object Storage
8.1 Running the Swift Dispersion Report
8.2 Gathering Swift Data
8.3 Gathering Swift Monitoring Metrics
8.4 Using the Swift Command-line Client (CLI)
8.5 Managing Swift Rings
8.6 Configuring your Swift System to Allow Container Sync
9 Managing Networking
9.1 Configuring the SUSE OpenStack Cloud Firewall
9.2 DNS Service Overview
9.3 Networking Service Overview
10 Managing the Dashboard
10.1 Configuring the Dashboard Service
10.2 Changing the Dashboard Timeout Value
11 Managing Orchestration
11.1 Configuring the Orchestration Service
11.2 Autoscaling using the Orchestration Service
12 Managing Monitoring, Logging, and Usage Reporting
12.1 Monitoring
12.2 Centralized Logging Service
12.3 Metering Service (Ceilometer) Overview
13 System Maintenance
13.1 Planned System Maintenance
13.2 Unplanned System Maintenance
13.3 Cloud Lifecycle Manager Maintenance Update Procedure
14 Backup and Restore
14.1 Architecture
14.2 Architecture of the Backup/Restore Service
14.3 Default Automatic Backup Jobs
14.4 Enabling Default Backups of the Control Plane to an SSH Target
14.5 Changing Default Jobs
14.6 Backup/Restore Via the Horizon UI
14.7 Restore from a Specific Backup
14.8 Backup/Restore Scheduler
14.9 Backup/Restore Agent
14.10 Backup and Restore Limitations
14.11 Disabling Backup/Restore before Deployment
14.12 Enabling, Disabling and Restoring Backup/Restore Services
14.13 Backing up and Restoring Audit Logs
15 Troubleshooting Issues
15.1 General Troubleshooting
15.2 Control Plane Troubleshooting
15.3 Troubleshooting Compute Service
15.4 Network Service Troubleshooting
15.5 Troubleshooting the Image (Glance) Service
15.6 Storage Troubleshooting
15.7 Monitoring, Logging, and Usage Reporting Troubleshooting
15.8 Backup and Restore Troubleshooting
15.9 Orchestration Troubleshooting
15.10 Troubleshooting Tools
List of Examples
4.1 k2kclient.py

1 Operations Overview

A high-level overview of the processes related to operating a SUSE OpenStack Cloud 8 cloud.

1.1 What is a cloud operator?

When we talk about a cloud operator it is important to understand the scope of the tasks and responsibilities we are referring to. SUSE OpenStack Cloud defines a cloud operator as the person or group of people who will be administering the cloud infrastructure, which includes:

  • Monitoring the cloud infrastructure, resolving issues as they arise.

  • Managing hardware resources, adding/removing hardware due to capacity needs.

  • Repairing and, if needed, recovering from hardware issues.

  • Performing domain administration tasks, which involves creating and managing projects, users, and groups as well as setting and managing resource quotas.

1.2 Tools provided to operate your cloud

SUSE OpenStack Cloud provides the following tools which are available to operate your cloud:

Operations Console

Often referred to as the Ops Console, you can use this console to view data about your cloud infrastructure in a web-based graphical user interface (GUI) to make sure your cloud is operating correctly. By logging on to the console, SUSE OpenStack Cloud administrators can manage data in the following ways:

  • Triage alarm notifications in the central dashboard

  • Monitor the environment by giving priority to alarms that take precedence

  • Manage compute nodes and easily use a form to create a new host

  • Refine the monitoring environment by creating new alarms to specify a combination of metrics, services, and hosts that match the triggers unique to an environment

  • Plan for future storage by tracking capacity over time to predict with some degree of reliability the amount of additional storage needed

For more details on how to connect to and use the Operations Console, see Book “User Guide Overview”, Chapter 1 “Using the Operations Console”, Section 1.1 “Operations Console Overview”.

Dashboard

Often referred to as Horizon or the Horizon dashboard, you can use this console to manage resources on a domain and project level in a web-based graphical user interface (GUI). The following are some of the typical operational tasks that you may perform using the dashboard:

  • Creating and managing projects, users, and groups within your domain.

  • Assigning roles to users and groups to manage access to resources.

  • Setting and updating resource quotas for the projects.

For more details, see the following pages:

Command-line interface (CLI)

Each service within SUSE OpenStack Cloud provides a command-line client, such as the novaclient (sometimes referred to as the python-novaclient or nova CLI) for the Compute service, the keystoneclient for the Identity service, etc. There is also an effort in the OpenStack community to make a unified client, called the openstackclient, which will combine the available commands in the various service-specific clients into one tool. By default, we install each of the necessary clients onto the hosts in your environment for you to use.

You will find processes defined in our documentation that use these command-line tools. There is also a list of common cloud administration tasks which we have outlined which you can use the command-line tools to do. For more details, see Book “User Guide Overview”, Chapter 4 “Cloud Admin Actions with the Command Line”.
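
As a brief illustration of the difference between the service-specific clients and the unified openstackclient, the following commands list the same kinds of resources both ways. The credentials file name is an assumption; source whichever rc file your deployment provides before running them.

# Source admin credentials first (the file name varies by deployment).
source ~/service.osrc

# Service-specific clients, one per service:
nova list                 # Compute: list instances
cinder list               # Block Storage: list volumes

# Unified client, one tool with per-service subcommands:
openstack server list     # equivalent to "nova list"
openstack volume list     # equivalent to "cinder list"
openstack project list    # Identity: list projects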

1.3 Daily tasks

  • Ensure your cloud is running correctly: SUSE OpenStack Cloud is deployed as a set of highly available services to minimize the impact of failures. That said, hardware and software systems can fail. Detection of failures early in the process will enable you to address issues before they affect the broader system. SUSE OpenStack Cloud provides a monitoring solution, based on OpenStack’s Monasca, which provides monitoring and metrics for all OpenStack components and much of the underlying system, including service status, performance metrics, compute node, and virtual machine status. Failures are exposed via the Operations Console and/or alarm notifications. In the case where more detailed diagnostics are required, you can use a centralized logging system based on the Elasticsearch, Logstash, and Kibana (ELK) stack. This provides the ability to search service logs to get detailed information on behavior and errors.

  • Perform critical maintenance: To ensure your OpenStack installation is running correctly, provides the right access and functionality, and is secure, you should make ongoing adjustments to the environment. Examples of daily maintenance tasks include:

    • Add/remove projects and users. The frequency of this task depends on your policy.

    • Apply security patches (if released).

    • Run daily backups.

1.4 Weekly or monthly tasks

  • Do regular capacity planning: Your initial deployment will likely reflect the known near to mid-term scale requirements, but at some point your needs will outgrow your initial deployment’s capacity. You can expand SUSE OpenStack Cloud in a variety of ways, such as by adding compute and storage capacity.

To manage your cloud’s capacity, begin by determining the load on the existing system. OpenStack is a set of relatively independent components and services, so there are multiple subsystems that can affect capacity. These include control plane nodes, compute nodes, object storage nodes, block storage nodes, and an image management system. At the most basic level, you should look at the CPU used, RAM used, I/O load, and the disk space used relative to the amounts available. For compute nodes, you can also evaluate the allocation of resources to hosted virtual machines. This information can be viewed in the Operations Console. You can pull historical information from the monitoring service (OpenStack’s Monasca) by using its client or API. OpenStack also provides some ability to manage hosted resource utilization by using quotas for projects. You can track this usage over time to determine your growth trend so that you can project when you will need to add capacity.
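
For example, the following sketch pulls a utilization trend with the Monasca CLI and reviews a project's quota with the unified client. The metric name, hostname dimension, start time, and project name are illustrative only; list the available metrics first if you are unsure what your agents report.

# Confirm the metric exists, then pull measurements for one compute node.
monasca metric-list --name cpu.idle_perc
monasca measurement-list cpu.idle_perc 2018-01-01T00:00:00Z \
  --dimensions hostname=comp001-mgmt

# Review the quota allocated to a project to compare against actual usage.
openstack quota show my-project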

1.5 Semi-annual tasks

  • Perform upgrades: OpenStack releases new versions on a six-month cycle. In general, SUSE OpenStack Cloud will release new major versions annually with minor versions and maintenance updates more often. Each new release consists of both new functionality and services, as well as bug fixes for existing functionality.

Note

If you are planning to upgrade, this is also an excellent time to evaluate your existing capabilities, especially in terms of capacity (see Capacity Planning above).

1.6 Troubleshooting

As part of managing your cloud, you should be ready to troubleshoot issues, as needed. The following are some common troubleshooting scenarios and solutions:

How do I determine if my cloud is operating correctly now?: SUSE OpenStack Cloud provides a monitoring solution based on OpenStack’s Monasca service. This service provides monitoring and metrics for all OpenStack components, as well as much of the underlying system. By default, SUSE OpenStack Cloud comes with a set of alarms that provide coverage of the primary systems. In addition, you can define alarms based on threshold values for any metrics defined in the system. You can view alarm information in the Operations Console. You can also receive or deliver this information to others by configuring email or other mechanisms. Alarms provide information about whether a component failed and is affecting the system, and also what condition triggered the alarm.
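
As an illustration of the notification mechanisms mentioned above, the following sketch uses the Monasca CLI to create an email notification method and attach it to a new alarm definition. The names, address, and threshold expression are examples only.

# Create an email notification method (name and address are examples).
monasca notification-create ops-email EMAIL ops@example.com

# Create an alarm definition that triggers the notification when the
# threshold is crossed; use the notification ID returned above.
monasca alarm-definition-create "high cpu" "avg(cpu.user_perc) > 90" \
  --alarm-actions NOTIFICATION_ID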

How do I troubleshoot and resolve performance issues for my cloud?: There are a variety of factors that can affect the performance of a cloud system, such as the following:

  • Health of the control plane

  • Health of the hosting compute node and virtualization layer

  • Resource allocation on the compute node

If your cloud users are experiencing performance issues on your cloud, use the following approach:

  1. View the compute summary page on the Operations Console to determine if any alarms have been triggered.

  2. Determine the hosting node of the virtual machine that is having issues.

  3. On the compute hosts page, view the status and resource utilization of the compute node to determine if it has errors or is over-allocated.

  4. On the compute instances page you can view the status of the VM along with its metrics.

How do I troubleshoot and resolve availability issues for my cloud?: If your cloud users are experiencing availability issues, determine what your users are experiencing that indicates to them the cloud is down. For example, can they not access the Dashboard service (Horizon) console or APIs, indicating a problem with the control plane? Or are they having trouble accessing resources? Console/API issues would indicate a problem with the control plane. Use the Operations Console to view the status of services to see if there is an issue. However, if it is an issue of accessing a virtual machine, then also search the consolidated logs that are available in the ELK stack for errors related to the virtual machine and supporting networking.

1.7 Common Questions

To manage a cloud, how many administrators do I need?

A 24x7 cloud needs a 24x7 cloud operations team. If you already have a NOC, managing the cloud can be added to their workload.

A cloud with 20 nodes will need a part-time person. You can manage a cloud with 200 nodes with two people. As the number of nodes increases and processes and automation are put in place, you will need to increase the number of administrators, but the need is not linear. As an example, if you have 3000 nodes and 15 clouds you will probably need 6 administrators.

What skills do my cloud administrators need?

Your administrators should be experienced Linux admins. They should have experience in application management, as well as experience with Ansible. It is a plus if they have experience with Bash shell scripting and Python programming skills.

In addition, you will need networking engineers. A 3000 node environment will need two networking engineers.

What operations should I plan on performing daily, weekly, monthly, or semi-annually?

You should plan for operations by understanding what tasks you need to do daily, weekly, monthly, or semi-annually. The specific list of tasks that you need to perform depends on your cloud configuration, but should include the high-level tasks specified in Chapter 2, Tutorials.

2 Tutorials

This section contains tutorials for common tasks for your SUSE OpenStack Cloud 8 cloud.

2.1 SUSE OpenStack Cloud Quickstart Guide

2.1.1 Introduction

This document provides simplified instructions for installing and setting up a SUSE OpenStack Cloud. Use this quickstart guide to build testing, demonstration, and lab-type environments rather than production installations. When you complete this quickstart process, you will have a fully functioning SUSE OpenStack Cloud demo environment.

Note

These simplified instructions are intended for testing or demonstration. Instructions for production installations are in Book “Installing with Cloud Lifecycle Manager”.

2.1.2 Overview of components

The following are short descriptions of the components that SUSE OpenStack Cloud employs when installing and deploying your cloud.

Ansible.  Ansible is a powerful configuration management tool used by SUSE OpenStack Cloud to manage nearly all aspects of your cloud infrastructure. Most commands in this quickstart guide execute Ansible scripts, known as playbooks. You will run playbooks that install packages, edit configuration files, manage network settings, and take care of the general administration tasks required to get your cloud up and running.

Get more information on Ansible at https://www.ansible.com/.

Cobbler.  Cobbler is another third-party tool used by SUSE OpenStack Cloud to deploy operating systems across the physical servers that make up your cloud. Find more info at http://cobbler.github.io/.

Git.  Git is the version control system used to manage the configuration files that define your cloud. Any changes made to your cloud configuration files must be committed to the locally hosted git repository to take effect. Read more information on Git at https://git-scm.com/.

2.1.3 Preparation

Successfully deploying a SUSE OpenStack Cloud environment is a large endeavor, but it is not complicated. For a successful deployment, you must put a number of components in place before rolling out your cloud. Most importantly, a basic SUSE OpenStack Cloud requires the proper network infrastructure. Because SUSE OpenStack Cloud segregates the network traffic of many of its elements, if the necessary networks, routes, and firewall access rules are not in place, communication required for a successful deployment will not occur.

2.1.4 Getting Started

When your network infrastructure is in place, go ahead and set up the Cloud Lifecycle Manager. This is the server that will orchestrate the deployment of the rest of your cloud. It is also the server you will run most of your deployment and management commands on.

Set up the Cloud Lifecycle Manager

  1. Download the installation media

    Obtain a copy of the SUSE OpenStack Cloud installation media, and make sure that it is accessible by the server that you are installing it on. Your method of doing this may vary. For instance, some may choose to load the installation ISO on a USB drive and physically attach it to the server, while others may run the IPMI Remote Console and attach the ISO to a virtual disc drive.

  2. Install the operating system

    1. Boot your server, using the installation media as the boot source.

    2. Choose "install" from the list of options and choose your preferred keyboard layout, location, language, and other settings.

    3. Set the address, netmask, and gateway for the primary network interface.

    4. Create a root user account.

    Proceed with the OS installation. After the installation is complete and the server has rebooted into the new OS, log in with the user account you created.

  3. Configure the new server

    1. SSH to your new server, and set a valid DNS nameserver in the /etc/resolv.conf file.

    2. Set the environment variable LC_ALL:

      export LC_ALL=C

    You now have a server running SUSE Linux Enterprise Server (SLES). The next step is to configure this machine as a Cloud Lifecycle Manager.

  4. Configure the Cloud Lifecycle Manager

    The installation media you used to install the OS on the server also has the files that will configure your cloud. You need to mount this installation media on your new server in order to use these files.

    1. Using the URL that you obtained the SUSE OpenStack Cloud installation media from, run wget to download the ISO file to your server:

      wget INSTALLATION_ISO_URL
    2. Now mount the ISO in the /media/cdrom/ directory:

      sudo mount INSTALLATION_ISO /media/cdrom/
    3. Unpack the tar file found in the /media/cdrom/ardana/ directory where you just mounted the ISO:

      tar xvf /media/cdrom/ardana/ardana-x.x.x-x.tar
    4. Now you will install and configure all the components needed to turn this server into a Cloud Lifecycle Manager. Run the ardana-init.bash script from the uncompressed tar file:

      ~/ardana-x.x.x/ardana-init.bash

      The ardana-init.bash script prompts you to enter an optional SSH passphrase. This passphrase protects the RSA key used to SSH to the other cloud nodes. This is an optional passphrase, and you can skip it by pressing Enter at the prompt.

      The ardana-init.bash script automatically installs and configures everything needed to set up this server as the lifecycle manager for your cloud.

      When the script has finished running, you can proceed to the next step, editing your input files.

  5. Edit your input files

    Your SUSE OpenStack Cloud input files are where you define your cloud infrastructure and how it runs. The input files define options such as which servers are included in your cloud, the type of disks the servers use, and their network configuration. The input files also define which services your cloud will provide and use, the network architecture, and the storage backends for your cloud.

    There are several example configurations, which you can find on your Cloud Lifecycle Manager in the ~/openstack/examples/ directory.

    1. The simplest way to set up your cloud is to copy the contents of one of these example configurations to your ~/openstack/my_cloud/definition/ directory. You can then edit the copied files and define your cloud.

      cp -r ~/openstack/examples/CHOSEN_EXAMPLE/* ~/openstack/my_cloud/definition/
    2. Edit the files in your ~/openstack/my_cloud/definition/ directory to define your cloud.

  6. Commit your changes

    When you finish editing the necessary input files, stage them, and then commit the changes to the local Git repository:

    cd ~/openstack/ardana/ansible
    git add -A
    git commit -m "My commit message"
  7. Image your servers

    Now that you have finished editing your input files, you can deploy the configuration to the servers that will comprise your cloud.

    1. Image the servers. You will install the SLES operating system across all the servers in your cloud, using Ansible playbooks to trigger the process.

    2. The following playbook confirms that your servers are accessible over their IPMI ports, which is a prerequisite for the imaging process:

      ansible-playbook -i hosts/localhost bm-power-status.yml
    3. Now validate that your cloud configuration files have proper YAML syntax by running the config-processor-run.yml playbook:

      ansible-playbook -i hosts/localhost config-processor-run.yml

      If you receive an error when running the preceding playbook, one or more of your configuration files has an issue. Refer to the output of the Ansible playbook, and look for clues in the Ansible log file, found at ~/.ansible/ansible.log.

    4. The next step is to prepare your imaging system, Cobbler, to deploy operating systems to all your cloud nodes:

      ansible-playbook -i hosts/localhost cobbler-deploy.yml
    5. Now you can image your cloud nodes. You will use an Ansible playbook to trigger Cobbler to deploy operating systems to all the nodes you specified in your input files:

      ansible-playbook -i hosts/localhost bm-reimage.yml

      The bm-reimage.yml playbook performs the following operations:

      1. Powers down the servers.

      2. Sets the servers to boot from a network interface.

      3. Powers on the servers and performs a PXE OS installation.

      4. Waits for the servers to power themselves down as part of a successful OS installation. This can take some time.

      5. Sets the servers to boot from their local hard disks and powers on the servers.

      6. Waits for the SSH service to start on the servers and verifies that they have the expected host-key signature.

  8. Deploy your cloud

    Now that your servers are running the SLES operating system, it is time to configure them for the roles they will play in your new cloud.

    1. Prepare the Cloud Lifecycle Manager to deploy your cloud configuration to all the nodes:

      ansible-playbook -i hosts/localhost ready-deployment.yml

      NOTE: The preceding playbook creates a new directory, ~/scratch/ansible/next/ardana/ansible/, from which you will run many of the following commands.

    2. (Optional) If you are reusing servers or disks to run your cloud, you can wipe the disks of your newly imaged servers by running the wipe_disks.yml playbook:

      cd ~/scratch/ansible/next/ardana/ansible/
      ansible-playbook -i hosts/verb_hosts wipe_disks.yml

      The wipe_disks.yml playbook removes any existing data from the drives on your new servers. This can be helpful if you are reusing servers or disks. This action will not affect the OS partitions on the servers.

    3. Now it is time to deploy your cloud. Do this by running the site.yml playbook, which pushes the configuration you defined in the input files out to all the servers that will host your cloud.

      cd ~/scratch/ansible/next/ardana/ansible/
      ansible-playbook -i hosts/verb_hosts site.yml

      The site.yml playbook installs packages, starts services, configures network interface settings, sets iptables firewall rules, and more. Upon successful completion of this playbook, your SUSE OpenStack Cloud will be in place and in a running state. This playbook can take up to six hours to complete.

  9. SSH to your nodes

    Now that you have successfully run site.yml, your cloud will be up and running. You can verify connectivity to your nodes by connecting to each one by using SSH. You can find the IP addresses of your nodes by viewing the /etc/hosts file.

    For security reasons, you can only SSH to your nodes from the Cloud Lifecycle Manager. SSH connections from any machine other than the Cloud Lifecycle Manager will be refused by the nodes.

    From the Cloud Lifecycle Manager, SSH to your nodes:

    ssh <management IP address of node>

    Also note that SSH is limited to your cloud's management network. Each node has an address on the management network, and you can find this address by reading the /etc/hosts or server_info.yml file.
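
    For example, a quick way to look up a node's management address and connect to it from the Cloud Lifecycle Manager (the grep pattern is only an illustration; adjust it to your node naming):

    grep comp /etc/hosts          # find the management address of a compute node
    ssh MANAGEMENT_IP_OF_NODE     # connect from the Cloud Lifecycle Manager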

2.2 Log Management and Integration

2.2.1 Overview

SUSE OpenStack Cloud uses the ELK (Elasticsearch, Logstash, Kibana) stack for log management across the entire cloud infrastructure. This configuration facilitates simple administration as well as integration with third-party tools. This tutorial covers how to forward your logs to a third-party tool or service, and how to access and search the Elasticsearch log stores through API endpoints.

2.2.2 The ELK stack

The ELK logging stack consists of the Elasticsearch, Logstash, and Kibana elements:

  • Logstash.  Logstash reads the log data from the services running on your servers, and then aggregates and ships that data to a storage location. By default, Logstash sends the data to the Elasticsearch indexes, but it can also be configured to send data to other storage and indexing tools such as Splunk.

  • Elasticsearch.  Elasticsearch is the storage and indexing component of the ELK stack. It stores and indexes the data received from Logstash. Indexing makes your log data searchable by tools designed for querying and analyzing massive sets of data. You can query the Elasticsearch datasets from the built-in Kibana console, a third-party data analysis tool, or through the Elasticsearch API (covered later).

  • Kibana.  Kibana provides a simple and easy-to-use method for searching, analyzing, and visualizing the log data stored in the Elasticsearch indexes. You can customize the Kibana console to provide graphs, charts, and other visualizations of your log data.

2.2.3 Using the Elasticsearch API

You can query the Elasticsearch indexes through various language-specific APIs, as well as directly over the IP address and port that Elasticsearch exposes on your implementation. By default, Elasticsearch listens on localhost, port 9200. You can run queries directly from a terminal using curl. For example:

curl -XGET 'http://localhost:9200/_search?q=tag:yourSearchTag'

The preceding command searches all indexes for all data with the "yourSearchTag" tag.

You can also use the Elasticsearch API from outside the logging node. This method connects over the Kibana VIP address, port 5601, using basic http authentication. For example, you can use the following command to perform the same search as the preceding search:

curl -u kibana:<password> kibana_vip:5601/_search?q=tag:yourSearchTag

You can further refine your search to a specific index of data, in this case the "elasticsearch" index:

curl -XGET 'http://localhost:9200/elasticsearch/_search?q=tag:yourSearchTag'

The search API is RESTful, so responses are provided in JSON format. Here's a sample (though empty) response:

{
    "took":13,
    "timed_out":false,
    "_shards":{
        "total":45,
        "successful":45,
        "failed":0
    },
    "hits":{
        "total":0,
        "max_score":null,
        "hits":[]
    }
}
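
The raw JSON response can be hard to read in a terminal. The Elasticsearch search API also accepts parameters such as size (the number of hits to return) and pretty (human-readable formatting), so a slightly friendlier version of the earlier query looks like this:

curl -XGET 'http://localhost:9200/_search?q=tag:yourSearchTag&size=5&pretty'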

2.2.4 For More Information

You can find more detailed Elasticsearch API documentation at https://www.elastic.co/guide/en/elasticsearch/reference/current/search.html.

Review the Elasticsearch Python API documentation at http://elasticsearch-py.readthedocs.io/en/master/api.html.

Read the Elasticsearch Java API documentation at https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/index.html.

2.2.5 Forwarding your logs

You can configure Logstash to ship your logs to an outside storage and indexing system, such as Splunk. Setting up this configuration is as simple as editing a few configuration files, and then running the Ansible playbooks that implement the changes. Here are the steps.

  1. Begin by logging in to the Cloud Lifecycle Manager.

  2. Verify that the logging system is up and running:

    cd ~/scratch/ansible/next/ardana/ansible
    ansible-playbook -i hosts/verb_hosts logging-status.yml

    When the preceding playbook completes without error, proceed to the next step.

  3. Edit the Logstash configuration file, found at the following location:

    ~/openstack/ardana/ansible/roles/logging-server/templates/logstash.conf.j2

    Near the end of the Logstash configuration file, you will find a section for configuring Logstash output destinations. The following example demonstrates the changes necessary to forward your logs to an outside server: the added tcp block sets up a TCP connection to the destination server's IP address over port 5514.

    # Logstash outputs
        output {
          # Configure Elasticsearch output
          # http://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html
          elasticsearch {
            index => "%{[@metadata][es_index]}"
            hosts => ["{{ elasticsearch_http_host }}:{{ elasticsearch_http_port }}"]
            flush_size => {{ logstash_flush_size }}
            idle_flush_time => 5
            workers => {{ logstash_threads }}
          }
            # Forward Logs to Splunk on TCP port 5514 which matches the one specified in Splunk Web UI.
          tcp {
            mode => "client"
            host => "<Enter Destination listener IP address>"
            port => 5514
          }
        }

    Note that Logstash can forward log data to multiple sources, so there is no need to remove or alter the Elasticsearch section in the preceding file. However, if you choose to stop forwarding your log data to Elasticsearch, you can do so by removing the related section in this file, and then continue with the following steps.

  4. Commit your changes to the local git repository:

    cd ~/openstack/ardana/ansible
    git add -A
    git commit -m "Your commit message"
  5. Run the configuration processor to check the status of all configuration files:

    ansible-playbook -i hosts/localhost config-processor-run.yml
  6. Run the ready-deployment playbook:

    ansible-playbook -i hosts/localhost ready-deployment.yml
  7. Implement the changes to the Logstash configuration file:

    cd ~/scratch/ansible/next/ardana/ansible
    ansible-playbook -i hosts/verb_hosts logging-server-configure.yml

Please note that configuring the receiving service will vary from product to product. Consult the documentation for your particular product for instructions on how to set it up to receive log files from Logstash.
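
Before pointing Logstash at the real receiver, you may want to confirm that events actually arrive on the chosen TCP port. One simple way, assuming netcat is available on the receiving host, is to listen on the port and watch for incoming log lines (option syntax differs between netcat variants):

# On the destination host, listen on the forwarding port (5514 in the examples above).
nc -l 5514        # some netcat variants require: nc -l -p 5514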

2.3 Integrating Your Logs with Splunk

2.3.1 Integrating with Splunk

The SUSE OpenStack Cloud 8 logging solution provides a flexible and extensible framework to centralize the collection and processing of logs from all nodes in your cloud. The logs are shipped to a highly available and fault-tolerant cluster where they are transformed and stored for better searching and reporting. The SUSE OpenStack Cloud 8 logging solution uses the ELK stack (Elasticsearch, Logstash and Kibana) as a production-grade implementation and can support other storage and indexing technologies.

You can configure Logstash, the service that aggregates and forwards the logs to a searchable index, to send the logs to a third-party target, such as Splunk.

For how to integrate the SUSE OpenStack Cloud 8 centralized logging solution with Splunk, including the steps to set up and forward logs, please refer to Section 3.1, “Splunk Integration”.

2.4 Integrating SUSE OpenStack Cloud with an LDAP System

You can configure your SUSE OpenStack Cloud cloud to work with an outside user authentication source such as Active Directory or OpenLDAP. Keystone, the SUSE OpenStack Cloud identity service, functions as the first stop for any user authorization/authentication requests. Keystone can also function as a proxy for user account authentication, passing along authentication and authorization requests to any LDAP-enabled system that has been configured as an outside source. This type of integration lets you use an existing user-management system such as Active Directory and its powerful group-based organization features as a source for permissions in SUSE OpenStack Cloud.

Upon successful completion of this tutorial, your cloud will refer user authentication requests to an outside LDAP-enabled directory system, such as Microsoft Active Directory or OpenLDAP.

2.4.1 Configure your LDAP source

To configure your SUSE OpenStack Cloud cloud to use an outside user-management source, perform the following steps:

  1. Make sure that the LDAP-enabled system you plan to integrate with is up and running and accessible over the necessary ports from your cloud management network.

  2. Edit the /var/lib/ardana/openstack/my_cloud/config/keystone/keystone.conf.j2 file and set the following options:

    domain_specific_drivers_enabled = True
    domain_configurations_from_database = False
  3. Create a YAML file in the /var/lib/ardana/openstack/my_cloud/config/keystone/ directory that defines your LDAP connection. You can make a copy of the sample Keystone-LDAP configuration file, and then edit that file with the details of your LDAP connection.

    The following example copies the keystone_configure_ldap_sample.yml file and names the new file keystone_configure_ldap_my.yml:

    ardana > cp /var/lib/ardana/openstack/my_cloud/config/keystone/keystone_configure_ldap_sample.yml \
      /var/lib/ardana/openstack/my_cloud/config/keystone/keystone_configure_ldap_my.yml
  4. Edit the new file to define the connection to your LDAP source. This guide does not provide comprehensive information on all aspects of the keystone_configure_ldap.yml file. Find a complete list of Keystone/LDAP configuration file options at: https://github.com/openstack/keystone/blob/stable/pike/etc/keystone.conf.sample

    The following file illustrates an example Keystone configuration that is customized for an Active Directory connection.

    keystone_domainldap_conf:
    
        # CA certificates file content.
        # Certificates are stored in Base64 PEM format. This may be entire LDAP server
        # certificate (in case of self-signed certificates), certificate of authority
        # which issued LDAP server certificate, or a full certificate chain (Root CA
        # certificate, intermediate CA certificate(s), issuer certificate).
        #
        cert_settings:
          cacert: |
            -----BEGIN CERTIFICATE-----
    
            certificate appears here
    
            -----END CERTIFICATE-----
    
        # A domain will be created in MariaDB with this name, and associated with ldap back end.
        # Installer will also generate a config file named /etc/keystone/domains/keystone.<domain_name>.conf
        #
        domain_settings:
          name: ad
          description: Dedicated domain for ad users
    
        conf_settings:
          identity:
             driver: ldap
    
    
          # For a full list and description of ldap configuration options, please refer to
          # http://docs.openstack.org/liberty/config-reference/content/keystone-configuration-file.html.
          #
          # Please note:
          #  1. LDAP configuration is read-only. Configuration which performs write operations (i.e. creates users, groups, etc)
          #     is not supported at the moment.
          #  2. LDAP is only supported for identity operations (reading users and groups from LDAP). Assignment
          #     operations with LDAP (i.e. managing roles, projects) are not supported.
          #  3. LDAP is configured as non-default domain. Configuring LDAP as a default domain is not supported.
          #
    
          ldap:
            url: ldap://YOUR_COMPANY_AD_URL
            suffix: YOUR_COMPANY_DC
            query_scope: sub
            user_tree_dn: CN=Users,YOUR_COMPANY_DC
            user : CN=admin,CN=Users,YOUR_COMPANY_DC
            password: REDACTED
            user_objectclass: user
            user_id_attribute: cn
            user_name_attribute: cn
            group_tree_dn: CN=Users,YOUR_COMPANY_DC
            group_objectclass: group
            group_id_attribute: cn
            group_name_attribute: cn
            use_pool: True
            user_enabled_attribute: userAccountControl
            user_enabled_mask: 2
            user_enabled_default: 512
            use_tls: True
            tls_req_cert: demand
            # if you are configuring multiple LDAP domains, and LDAP server certificates are issued
            # by different authorities, make sure that you place certs for all the LDAP backend domains in the
            # cacert parameter as seen in this sample yml file so that all the certs are combined in a single CA file
            # and every LDAP domain configuration points to the combined CA file.
            # Note:
            # 1. Please be advised that every time a new ldap domain is configured, the single CA file gets overwritten
            # and hence ensure that you place certs for all the LDAP backend domains in the cacert parameter.
            # 2. There is a known issue on one cert per CA file per domain when the system processes
            # concurrent requests to multiple LDAP domains. Using the single CA file with all certs combined
            # shall get the system working properly.
    
            tls_cacertfile: /etc/keystone/ssl/certs/all_ldapdomains_ca.pem
  5. Add your new file to the local Git repository and commit the changes.

    ardana > cd ~/openstack
    ardana > git checkout site
    ardana > git add -A
    ardana > git commit -m "Adding LDAP server integration config"
  6. Run the configuration processor and deployment preparation playbooks to validate the YAML files and prepare the environment for configuration.

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost config-processor-run.yml
    ardana > ansible-playbook -i hosts/localhost ready-deployment.yml
  7. Run the Keystone reconfiguration playbook to implement your changes, passing the newly created YAML file as an argument to the -e@FILE_PATH parameter:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts keystone-reconfigure.yml \
      -e@/var/lib/ardana/openstack/my_cloud/config/keystone/keystone_configure_ldap_my.yml

    To integrate your SUSE OpenStack Cloud cloud with multiple domains, repeat these steps starting from Step 3 for each domain.
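
    As a quick sanity check after the reconfiguration, you can verify that the new domain is visible and that users are being read from the LDAP back end. The domain name ad matches the sample file above; source admin credentials appropriate to your deployment first (the exact rc file to source is an assumption here).

    ardana > openstack domain list
    ardana > openstack user list --domain ad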

3 Third-Party Integrations

3.1 Splunk Integration

This documentation demonstrates the possible integration between the SUSE OpenStack Cloud 8 centralized logging solution and Splunk including the steps to set up and forward logs.

The SUSE OpenStack Cloud 8 logging solution provides a flexible and extensible framework to centralize the collection and processing of logs from all of the nodes in a cloud. The logs are shipped to a highly available and fault tolerant cluster where they are transformed and stored for better searching and reporting. The SUSE OpenStack Cloud 8 logging solution uses the ELK stack (Elasticsearch, Logstash and Kibana) as a production grade implementation and can support other storage and indexing technologies. The Logstash pipeline can be configured to forward the logs to an alternative target if you wish.

3.1.1 What is Splunk?

Splunk is software for searching, monitoring, and analyzing machine-generated big data, via a web-style interface. Splunk captures, indexes and correlates real-time data in a searchable repository from which it can generate graphs, reports, alerts, dashboards and visualizations. It is commercial software (unlike Elasticsearch) and more details about Splunk can be found at https://www.splunk.com.

3.1.2 Configuring Splunk to receive log messages from SUSE OpenStack Cloud 8

This documentation assumes that you already have Splunk set up and running. For help with installing and setting up Splunk, refer to the Splunk Tutorial.

There are different ways in which a log message (or "event" in Splunk's terminology) can be shipped to Splunk. These steps will set up a TCP port (5514) where Splunk will listen for messages.

  1. On the Splunk web UI, click on the Settings menu in the upper right-hand corner.

  2. In the Data section of the Settings menu, click Data Inputs.

  3. Choose the TCP option.

  4. Click the New button to add an input.

  5. In the Port field, enter 5514 (or any other port number of your choice).

    Note

    If you are on a less secure network and want to restrict connections to this port, use the Only accept connection from field to restrict the traffic to a specific IP address.

  6. Click the Next button.

  7. Specify the Source Type by clicking on the Select button and choosing linux_messages_syslog from the list.

  8. Click the Review button.

  9. Review the configuration and click the Submit button.

  10. You should see a success message if everything went okay.

3.1.3 Forwarding log messages from SUSE OpenStack Cloud 8 Centralized Logging to Splunk

Once you have Splunk set up and configured to receive log messages, the final step is to configure SUSE OpenStack Cloud 8 to forward the logs to Splunk. These steps will show you how to do this.

  1. Log in to the Cloud Lifecycle Manager.

  2. Verify the status of the logging service to ensure everything is up and running:

    cd ~/scratch/ansible/next/ardana/ansible
    ansible-playbook -i hosts/verb_hosts logging-status.yml

    If everything is up and running, continue to the next step.

  3. Edit the logstash config file at the location below:

    ~/openstack/ardana/ansible/roles/logging-server/templates/logstash.conf.j2
  4. At the bottom of the file is a section for the Logstash outputs. You will need to add the details of your Splunk environment.

    The following example shows the placement of the added tcp block:

    # Logstash outputs
    #------------------------------------------------------------------------------
    output {
      # Configure Elasticsearch output
      # http://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html
      elasticsearch {
        index => "%{[@metadata][es_index]}"
        hosts => ["{{ elasticsearch_http_host }}:{{ elasticsearch_http_port }}"]
        flush_size => {{ logstash_flush_size }}
        idle_flush_time => 5
        workers => {{ logstash_threads }}
      }
       # Forward Logs to Splunk on TCP port 5514 which matches the one specified in Splunk Web UI.
     tcp {
       mode => "client"
       host => "<Enter Splunk listener IP address>"
       port => 5514
     }
    }
    Note

    If you are not planning on using the Splunk UI to parse your centralized logs, there is no need to forward your logs to Elasticsearch. Hence, you can comment out those lines in the Logstash outputs pertaining to Elasticsearch. However, you can continue to forward your centralized logs to multiple locations.

  5. Commit your changes to git:

    cd ~/openstack/ardana/ansible
    git add -A
    git commit -m "Logstash configuration change for Splunk integration"
  6. Run the configuration processor:

    cd ~/openstack/ardana/ansible
    ansible-playbook -i hosts/localhost config-processor-run.yml
  7. Update your deployment directory:

    cd ~/openstack/ardana/ansible
    ansible-playbook -i hosts/localhost ready-deployment.yml
  8. Complete this change with a reconfigure of the logging environment:

    cd ~/scratch/ansible/next/ardana/ansible
    ansible-playbook -i hosts/verb_hosts logging-configure.yml
  9. You can confirm via your Splunk UI that the logs have begun to forward.

3.1.4 Searching for log messages from the Splunk dashboard

To verify that your integration worked and to search the log messages that have been forwarded, navigate back to your Splunk dashboard. In the search field, use this string:

source="tcp:5514"

Find information on using the Splunk search tool at http://docs.splunk.com/Documentation/Splunk/6.4.3/SearchTutorial/WelcometotheSearchTutorial.

3.2 Nagios Integration

SUSE OpenStack Cloud cloud operators that are using Nagios or Icinga-based monitoring systems may wish to integrate them with the built-in monitoring infrastructure of SUSE OpenStack Cloud. Integrating with the existing monitoring processes and procedures will reduce support overhead and avoid duplication. This document describes the different approaches that can be taken to create a well-integrated monitoring dashboard using both technologies.

Note

This document refers to Nagios but the proposals will work equally well with Icinga, Icinga2, or other Nagios clone monitoring systems.

3.2.1 SUSE OpenStack Cloud monitoring and reporting

SUSE OpenStack Cloud comes with a monitoring engine (Monasca) and a separate management dashboard (Operations Console). Monasca is extremely scalable, designed to cope with the constant change in monitoring sources and services found in a cloud environment. Monitoring agents running on hosts (physical and virtual) submit data to the Monasca message bus via a RESTful API. Threshold and notification engines then trigger alarms when predefined thresholds are passed. Notification methods are flexible and extensible. Typical notification methods include generating emails or creating alarms in PagerDuty.

While extensible, Monasca is largely focused on monitoring cloud infrastructures rather than traditional environments such as server hardware, network links, switches, etc. For more details about the monitoring service, see Section 12.1, “Monitoring”.

The Operations Console (Ops Console) provides cloud administrators with a clear web interface to view alarm status, manage alarm workflow, and configure alarms and thresholds. For more details about the Ops Console, see Book “User Guide Overview”, Chapter 1 “Using the Operations Console”, Section 1.1 “Operations Console Overview”.

3.2.2 Nagios monitoring and reporting

Nagios is an industry-leading open source monitoring service with extensive plugins and agents. Nagios checks are either run directly from the monitoring server or run on a remote host via an agent, with results submitted back to the monitoring server. While Nagios has proven extremely flexible and scalable, it requires significant explicit configuration. Using Nagios to monitor guest virtual machines becomes more challenging because virtual machines are ephemeral: new virtual machines are created and destroyed regularly. Configuration automation tools (Chef, Puppet, Ansible, etc.) can create a more dynamic Nagios setup, but they still require the Nagios service to be restarted every time a new host is added.

A key benefit of Nagios style monitoring is that it allows for SUSE OpenStack Cloud to be monitored externally, from a user or service perspective. For example, checks can be created to monitor availability of all the API endpoints from external locations or even to create and destroy instances to ensure the entire system is working as expected.

3.2.3 Adding Monasca

Many private cloud operators already have existing monitoring solutions such as Nagios and Icinga. We recommend that you extend your existing solutions into Monasca or forward Monasca alerts to your existing solution to maximize coverage and reduce risk.

3.2.4 Integration Approaches

Integration between Nagios and Monasca can occur at two levels, at the individual check level or at the management interfaces. Both options are discussed in the following sections.

Running Nagios-style checks in the Monasca agents

The Monasca agent is installed on all SUSE OpenStack Cloud servers and includes the ability to execute Nagios-style plugins as well as its own plugin scripts. For this configuration, check plugins need to be installed on the required server and then added to the Monasca agent configuration under /etc/monasca/agent/conf.d. Take care with long-running plugins: plugins that take more than 10 seconds to run can prevent the Monasca agent from completing its own checks in the allotted time, stopping all client monitoring. Issues have been seen with hardware monitoring plugins that can take more than 30 seconds, and with plugins relying on name resolution when DNS services are not available. Details on the required Monasca configuration can be found at https://github.com/openstack/monasca-agent/blob/master/docs/Plugins.md#nagios-wrapper.
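
The exact configuration format is described in the Plugins.md document linked above. As a rough sketch only, with key names that should be verified against that upstream documentation, a nagios_wrapper entry under /etc/monasca/agent/conf.d might look like the following, pointing the agent at a standard Nagios plugin:

init_config:
  # Directories in which the Nagios check plugins are installed.
  check_path: /usr/lib/nagios/plugins:/usr/local/bin/nagios

instances:
  # Run the standard check_load plugin and report its result through the Monasca agent.
  - service_name: load
    check_command: check_load -w 5,4,3 -c 10,8,6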

Use Case:

  • Local host checking. As an operator I want to run a local monitoring check on my host to check physical hardware. Check status and alert management will be based around the Operations Console, not Nagios.

Limitation

  • As mentioned earlier, care should be taken to ensure checks do not introduce load or delays in the Monasca agent check cycle. Additionally, depending on the operating system the node is running, plugins or dependencies may not be available.

Using Nagios as a central dashboard

It is possible to create a Nagios-style plugin that queries the Monasca API endpoint for alarm status and creates Nagios alerts based on Monasca alarms and filters. Monasca alarms can appear in Nagios using two approaches: one lists checks by service and the other lists checks by physical host.

In the top section of the Nagios configuration, services can be created under a dummy host, monasca_endpoint. Each service retrieves all alarms based on defined dimensions. For example, the ardana-compute check returns all alarms with the compute (Nova) dimension.

In the bottom section, the physical servers making up the SUSE OpenStack Cloud cluster can be defined and checks can be run against them. For example, one check could examine the server hardware from the Nagios server using a third-party plugin, and another could retrieve all Monasca alarms related to that host.

To build this configuration, a custom Nagios plugin (see the example plugins at https://github.com/openstack/python-monascaclient/tree/stable/pike/examples) was created with the following options:

check_monasca -c CREDENTIALS -d DIMENSION -v VALUE

Examples:

To check alarms on test-ccp-comp001-mgmt you would use:

check_monasca -c service.osrc -d hostname -v test-ccp-comp001-mgmt

To check all Network related alarms, you would use:

check_monasca -c service.osrc -d service -v networking

Use Cases:

  • Multiple clouds: integrating SUSE OpenStack Cloud monitoring with existing monitoring capabilities, viewing Monasca alerts in Nagios, and fully integrating Monasca alarms with Nagios alarms and workflow.

  • In a predominantly Nagios or Icinga-based monitoring environment, Monasca alarm status can be integrated into existing processes and workflows. This approach works best for checks associated with physical servers running the SUSE OpenStack Cloud services.

  • With multiple SUSE OpenStack Cloud clusters, all of their alarms can be consolidated into a single view; the current version of the Operations Console supports only a single cluster.

Limitations

  • Nagios has a more traditional configuration model that requires checks to belong to predefined services and hosts, which is not well suited to highly dynamic cloud environments where the lifespan of virtual instances can be very short. One possible solution is Icinga2, which has an API for dynamically adding host and service definitions; the check plugin could be extended to create alarm definitions dynamically as they occur.

    The key disadvantage is that multiple alarms can appear as a single service. For example, suppose there are 3 warnings against one service. If the operator acknowledges this alarm and subsequently a 4th warning alarm occurs, it would not generate an alert and could get missed.

    Care has to be taken that alarms are not missed. If the defined checks only look for checks in an ALARM status, they will not report undetermined checks that might indicate other issues.

Using Operations Console as central dashboard

Nagios has the ability to run custom scripts in response to events. It is therefore possible to write a plugin to update Monasca whenever a Nagios alert occurs. The Operations Console could then be used as a central reporting dashboard for both Monasca and Nagios alarms. The external Nagios alarms can have their own check dimension and could be displayed as a separate group in the Operations Console.

Use Cases

  • Using the Operations Console as the central monitoring tool.

Limitations

  • Alarms cannot be acknowledged from the Operations Console, so Nagios could send repetitive notifications unless configured to take this into account.

SUSE OpenStack Cloud-specific Nagios Plugins

Several OpenStack plugin packages exist (see https://launchpad.net/ubuntu/+source/nagios-plugins-openstack) that are useful to run from external sources to ensure the overall system is working as expected. Monasca requires some OpenStack components to be working in order to work at all. For example, if Keystone were unavailable, Monasca could not authenticate client or console requests. An external service check could highlight this.

3.2.5 Common integration issues

Alarm status differences

Monasca and Nagios treat alarms and status in different ways and for the two systems to talk there needs to be a mapping between them. The following table details the alarm parameters available for each:

System    Status         Severity   Details
Nagios    OK                        Plugin returned OK with given thresholds
Nagios    WARNING                   Plugin returned WARNING based on thresholds
Nagios    CRITICAL                  Plugin returned CRITICAL alarm
Nagios    UNKNOWN                   Plugin failed
Monasca   OK                        No alarm triggered
Monasca   ALARM          LOW        Alarm state, LOW impact
Monasca   ALARM          MEDIUM     Alarm state, MEDIUM impact
Monasca   ALARM          HIGH       Alarm state, HIGH impact
Monasca   UNDETERMINED              No metrics received

In the plugin described here, the mapping was created with this flow:

Monasca OK -> Nagios OK
Monasca ALARM ( LOW or MEDIUM ) -> Nagios Warning
Monasca ALARM ( HIGH ) -> Nagios Critical

Alarm workflow differences

In both systems, alarms can be acknowledged in the dashboards to indicate they are being worked on (or ignored). Not all the scenarios above will provide the same level of workflow integration.

3.3 Operations Bridge Integration

The SUSE OpenStack Cloud 8 monitoring solution (Monasca) can easily be integrated with your existing monitoring tools. Integrating SUSE OpenStack Cloud 8 Monasca with Operations Bridge using the Operations Bridge Connector simplifies monitoring and managing events and topology information.

The integration provides the following functionality:

  • Forwarding of SUSE OpenStack Cloud Monasca alerts and topology to Operations Bridge for event correlation

  • Customization of forwarded events and topology

For more information about this connector, see https://software.microfocus.com/en-us/products/operations-bridge-suite/overview.

3.4 Monitoring Third-Party Components With Monasca

3.4.1 Monasca Monitoring Integration Overview

Monasca, the SUSE OpenStack Cloud 8 monitoring service, collects information about your cloud's systems, and allows you to create alarm definitions based on these measurements. The monasca-agent component collects metrics and forwards them to the monasca-api for further processing, such as metric storage and alarm thresholding.

With a small amount of configuration, you can use the detection and check plugins that are provided with your cloud to monitor integrated third-party components. In addition, you can write custom plugins and integrate them with the existing monitoring service.

Find instructions for customizing existing plugins to monitor third-party components in Section 3.4.4, “Configuring Check Plugins”.

Find instructions for installing and configuring new custom plugins in the Section 3.4.3, “Writing Custom Plugins”.

You can also use existing alarm definitions, as well as create new alarm definitions that relate to a custom plugin or metric. Instructions for defining new alarm definitions are in the Section 3.4.6, “Configuring Alarm Definitions”.

You can use the Operations Console and Monasca CLI to list all of the alarms, alarm-definitions, and metrics that exist on your cloud.

3.4.2 Monasca Agent

The Monasca agent (monasca-agent) collects information about your cloud using the installed plugins. The plugins are written in Python, and determine the monitoring metrics for your system, as well as the interval for collection. The default collection interval is 30 seconds, and we strongly recommend not changing this default value.

The following two types of custom plugins can be added to your cloud.

  • Detection Plugin. Determines whether the monasca-agent has the ability to monitor the specified component or service on a host. If successful, this type of plugin configures an associated check plugin by creating a YAML configuration file.

  • Check Plugin. Specifies the metrics to be monitored, using the configuration file created by the detection plugin.

Monasca-agent is installed on every server in your cloud, and provides plugins that monitor the following.

  • System metrics relating to CPU, memory, disks, host availability, etc.

  • Process health metrics (process, http_check)

  • SUSE OpenStack Cloud 8-specific component metrics, such as apache, rabbitmq, kafka, cassandra, etc.

Monasca is pre-configured with default check plugins and associated detection plugins. The default plugins can be reconfigured to monitor third-party components, and often only require small adjustments to adapt them to this purpose. Find a list of the default plugins here: https://github.com/openstack/monasca-agent/blob/master/docs/Plugins.md#detection-plugins

Often, a single check plugin will be used to monitor multiple services. For example, many services use the http_check.py detection plugin to detect the up/down status of a service endpoint. Often the process.py check plugin, which provides process monitoring metrics, is used as a basis for a custom process detection plugin.

More information about the Monasca agent can be found in the upstream monasca-agent repository documentation: https://github.com/openstack/monasca-agent

3.4.3 Writing Custom Plugins

When the pre-built Monasca plugins do not meet your monitoring needs, you can write custom plugins to monitor your cloud. After you have written a plugin, you must install and configure it.

When your needs dictate a very specific custom monitoring check, you must provide both a detection and check plugin.

The steps involved in configuring a custom plugin include running a detection plugin and passing any necessary parameters to it, so that the resulting check configuration file is created with all necessary data.

When using an existing check plugin to monitor a third-party component, a custom detection plugin is needed only if there is not an associated default detection plugin.

Check plugin configuration files

Each plugin needs a corresponding YAML configuration file with the same stem name as the plugin check file. For example, the plugin file http_check.py (in /usr/lib/python2.7/site-packages/monasca_agent/collector/checks_d/) should have a corresponding configuration file, http_check.yaml (in /etc/monasca/agent/conf.d/http_check.yaml). The stem name http_check must be the same for both files.

The YAML configuration file must be owned by the Monasca agent user, with read+write permission for the owner and read permission for the Monasca agent group; no other users should have access to the file. The following example shows correct permission settings for the file http_check.yaml.

ardana > ls -alt /etc/monasca/agent/conf.d/http_check.yaml
-rw-r----- 1 monasca-agent monasca 10590 Jul 26 05:44 http_check.yaml

A check plugin YAML configuration file has the following structure.

init_config:
    key1: value1
    key2: value2

instances:
    - name: john_smith
      username: john_smith
      password: 123456
    - name: jane_smith
      username: jane_smith
      password: 789012

In the above file structure, the init_config section allows you to specify any number of global key:value pairs. Each pair will be available on every run of the check that relates to the YAML configuration file.

The instances section allows you to list the instances that the related check will be run on. The check will be run once on each instance listed in the instances section. Ensure that each instance listed in the instances section has a unique name.

Custom detection plugins

Detection plugins should be written to perform checks that ensure that a component can be monitored on a host. Any arguments needed by the associated check plugin are passed into the detection plugin at setup (configuration) time. The detection plugin will write to the associated check configuration file.

When a detection plugin is successfully run in the configuration step, it will write to the check configuration YAML file. The configuration file for the check is written to the following directory.

/etc/monasca/agent/conf.d/

Writing a process detection plugin using the ServicePlugin class

The monasca-agent provides a ServicePlugin class that makes process detection monitoring easy.

Process check

The process check plugin generates metrics based on the process status for the specified process names. It generates process.pid_count metrics for the specified dimensions and, by default, a set of detailed process metrics for those dimensions.

The ServicePlugin class allows you to specify a list of process names to detect, and uses psutil to determine whether each process exists on the host. It then appends the process names to the process.yml configuration file, if they are not already present.

The following is an example of a process.py check ServicePlugin.

import logging

import monasca_setup.detection

log = logging.getLogger(__name__)


class MonascaTransformDetect(monasca_setup.detection.ServicePlugin):
    """Detect Monasca Transform daemons and set up configuration to monitor them."""

    def __init__(self, template_dir, overwrite=False, args=None):
        log.info("      Watching the monasca transform processes.")
        service_params = {
            'args': {},
            'template_dir': template_dir,
            'overwrite': overwrite,
            'service_name': 'monasca-transform',
            'process_names': ['monasca-transform', 'pyspark',
                              'transform/lib/driver']
        }
        super(MonascaTransformDetect, self).__init__(service_params)

Writing a Custom Detection Plugin using Plugin or ArgsPlugin classes

A custom detection plugin class should derive from either the Plugin or ArgsPlugin classes provided in the /usr/lib/python2.7/site-packages/monasca_setup/detection directory.

If the plugin parses command line arguments, the ArgsPlugin class is useful. The ArgsPlugin class derives from the Plugin class. The ArgsPlugin class has a method to check for required arguments, and a method to return the instance that will be used for writing to the configuration file with the dimensions from the command line parsed and included.

If the ArgsPlugin methods do not seem to apply, then derive directly from the Plugin class.

When deriving from these classes, the following methods should be implemented.

  • _detect - sets self.available = True when the component to be monitored is detected on the host.

  • build_config - writes the instance information to the configuration and returns the configuration.

  • dependencies_installed (a default implementation exists in ArgsPlugin, but not in Plugin) - returns True when the Python libraries that the plugin depends on are installed.

The following is an example custom detection plugin.

import ast
import logging

import monasca_setup.agent_config
import monasca_setup.detection

log = logging.getLogger(__name__)


class HttpCheck(monasca_setup.detection.ArgsPlugin):
    """Setup an http_check according to the passed in args.
       Despite being a detection plugin, this plugin does no detection and will be a noop without arguments.
       Expects space separated arguments, the required argument is url. Optional parameters include:
       disable_ssl_validation and match_pattern.
    """

    def _detect(self):
        """Run detection, set self.available True if the service is detected.
        """
        self.available = self._check_required_args(['url'])

    def build_config(self):
        """Build the config as a Plugins object and return.
        """
        config = monasca_setup.agent_config.Plugins()
        # No support for setting headers at this time
        instance = self._build_instance(['url', 'timeout', 'username', 'password',
                                         'match_pattern', 'disable_ssl_validation',
                                         'name', 'use_keystone', 'collect_response_time'])

        # Normalize any boolean parameters
        for param in ['use_keystone', 'collect_response_time']:
            if param in self.args:
                instance[param] = ast.literal_eval(self.args[param].capitalize())
        # Set some defaults
        if 'collect_response_time' not in instance:
            instance['collect_response_time'] = True
        if 'name' not in instance:
            instance['name'] = self.args['url']

        config['http_check'] = {'init_config': None, 'instances': [instance]}

        return config

Installing a detection plugin in the OpenStack version delivered with SUSE OpenStack Cloud

Install a plugin by copying it to the plugin directory (/usr/lib/python2.7/site-packages/monasca_agent/collector/checks_d/).

The plugin should have file permissions of read+write for the root user (the user that should also own the file) and read for the root group and all other users.

The following is an example of correct file permissions for the http_check.py file.

-rw-r--r-- 1 root root 1769 Sep 19 20:14 http_check.py

Detection plugins should be placed in the following directory.

/usr/lib/monasca/agent/custom_detect.d/

The detection plugin directory name should be accessed using the monasca_agent_detection_plugin_dir Ansible variable. This variable is defined in the roles/monasca-agent/vars/main.yml file.

monasca_agent_detection_plugin_dir: /usr/lib/monasca/agent/custom_detect.d/

Example: Add Ansible monasca_configure task to install the plugin. (The monasca_configure task can be added to any service playbook.) In this example, it is added to ~/openstack/ardana/ansible/roles/_CEI-CMN/tasks/monasca_configure.yml.

---
- name: _CEI-CMN | monasca_configure |
    Copy Ceilometer Custom plugin
  become: yes
  copy:
    src: ardanaceilometer_mon_plugin.py
    dest: "{{ monasca_agent_detection_plugin_dir }}"
    owner: root
    group: root
    mode: 0440

Custom check plugins

Custom check plugins generate metrics. Scalability should be taken into consideration on systems that will have hundreds of servers, as a large number of metrics can degrade performance by increasing disk, RAM, and CPU usage.

You may want to tune your configuration parameters so that less-important metrics are not monitored as frequently. When check plugins are configured (when they have an associated YAML configuration file) the agent will attempt to run them.

Checks should be able to run within the 30-second metric collection window. If your check runs a command, you should provide a timeout to prevent the check from running longer than the default 30-second window. You can use monasca_agent.common.util.timeout_command to set a timeout in your custom check plugin Python code.
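
For example, a check that shells out to an external command might wrap the call as follows (mirroring the usage in the Cassandra example later in this section); the command and the five-second timeout are illustrative.

from monasca_agent.common.util import timeout_command

# Run an external command with a timeout so the check cannot exceed the
# collection window; returns the command's stdout, stderr, and return code.
stdout, stderr, return_code = timeout_command(["/usr/bin/uptime"], 5)
if return_code != 0:
    # Handle the failure, for example by logging it or emitting a status metric.
    pass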

Find a description of how to write custom check plugins at https://github.com/openstack/monasca-agent/blob/master/docs/Customizations.md#creating-a-custom-check-plugin

Custom checks derive from the AgentCheck class located in the monasca_agent/collector/checks/check.py file. A check method is required.

Metrics should contain dimensions that make each item that you are monitoring unique (such as service, component, hostname). The hostname dimension is defined by default within the AgentCheck class, so every metric has this dimension.

A custom check will do the following.

  • Read the configuration instance passed into the check method.

  • Set dimensions that will be included in the metric.

  • Create the metric with gauge, rate, or counter types.

Metric Types:

  • gauge: Instantaneous reading of a particular value (for example, mem.free_mb).

  • rate: Measurement over a time period. The following equation can be used to define rate.

    rate=delta_v/float(delta_t)
  • counter: A count of events, with increment and decrement methods (for example, zookeeper.timeouts); see the sketch after this list.
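
The following short sketch shows how a custom check might emit all three metric types; the class name, metric names, and dimensions are illustrative, and it assumes the gauge, rate, and increment helpers provided by the AgentCheck base class.

import monasca_agent.collector.checks as checks


class ExampleMetricTypes(checks.AgentCheck):
    """Illustrative check emitting a gauge, a rate, and a counter metric."""

    def check(self, instance):
        # Dimensions identify what is being monitored; the hostname dimension
        # is added automatically by the AgentCheck base class.
        dimensions = self._set_dimensions({'service': 'example'}, instance)

        # gauge: instantaneous reading of a value.
        self.gauge('example.queue_depth', 42, dimensions=dimensions)

        # rate: change of a value over the collection interval.
        self.rate('example.requests_per_sec', 128, dimensions=dimensions)

        # counter: number of events since the last collection.
        self.increment('example.timeouts', dimensions=dimensions)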

The following is an example component check named SimpleCassandraExample.

import monasca_agent.collector.checks as checks
from monasca_agent.common.util import timeout_command

CASSANDRA_VERSION_QUERY = "SELECT version();"


class SimpleCassandraExample(checks.AgentCheck):

    def __init__(self, name, init_config, agent_config):
        super(SimpleCassandraExample, self).__init__(name, init_config, agent_config)

    @staticmethod
    def _get_config(instance):
        user = instance.get('user')
        password = instance.get('password')
        service = instance.get('service')
        timeout = int(instance.get('timeout'))

        return user, password, service, timeout

    def check(self, instance):
        user, password, service, timeout = self._get_config(instance)

        dimensions = self._set_dimensions({'component': 'cassandra', 'service': service}, instance)

        results, connection_status = self._query_database(user, password, timeout, CASSANDRA_VERSION_QUERY)

        if connection_status != 0:
            self.gauge('cassandra.connection_status', 1, dimensions=dimensions)
        else:
            # successful connection status
            self.gauge('cassandra.connection_status', 0, dimensions=dimensions)

    def _query_database(self, user, password, timeout, query):
        stdout, stderr, return_code = timeout_command(["/opt/cassandra/bin/vsql", "-U", user, "-w", password, "-A", "-R",
                                                       "|", "-t", "-F", ",", "-x"], timeout, command_input=query)
        if return_code == 0:
            # remove trailing newline
            stdout = stdout.rstrip()
            return stdout, 0
        else:
            self.log.error("Error querying cassandra with return code of {0} and error {1}".format(return_code, stderr))
            return stderr, 1

Installing a check plugin

The check plugin needs to have the same file permissions as the detection plugin. File permissions must be read+write for the root user (the user that should own the file), and read for the root group and all other users.

Check plugins should be placed in the following directory.

/usr/lib/monasca/agent/custom_checks.d/

The check plugin directory should be accessed using the monasca_agent_check_plugin_dir Ansible variable. This variable is defined in the roles/monasca-agent/vars/main.yml file.

monasca_agent_check_plugin_dir: /usr/lib/monasca/agent/custom_checks.d/

3.4.4 Configuring Check Plugins

When unit-testing, you can manually configure a plugin using the monasca-setup script that is installed with the monasca-agent.

Find a good explanation of configuring plugins here: https://github.com/openstack/monasca-agent/blob/master/docs/Agent.md#configuring

SSH to a node that has both the monasca-agent installed as well as the component you wish to monitor.

The following is an example command that configures a plugin that has no parameters (uses the detection plugin class name).

root # /usr/bin/monasca-setup -d ARDANACeilometer

The following is an example command that configures the apache plugin and includes related parameters.

root # /usr/bin/monasca-setup -d apache -a 'url=http://192.168.245.3:9095/server-status?auto'

If there is a change in the configuration, monasca-setup restarts the monasca-agent on the host so that the configuration is loaded.

After the plugin is configured, you can verify that the configuration file has your changes (see the next Verify that your check plugin is configured section).

Use the monasca CLI to see if your metric exists (see the Verify that metrics exist section).

Using Ansible modules to configure plugins in SUSE OpenStack Cloud 8

The monasca_agent_plugin module is installed as part of the monasca-agent role.

The following Ansible example configures the process.py check plugin using the Ceilometer detection plugin; it passes in only the name of the detection class.

- name: _CEI-CMN | monasca_configure |
    Run Monasca agent Cloud Lifecycle Manager specific ceilometer detection plugin
  become: yes
  monasca_agent_plugin:
    name: "ARDANACeilometer"

If a password or other sensitive data are passed to the detection plugin, the no_log option should be set to True. If the no_log option is not set to True, the data passed to the plugin will be logged to syslog.

The following Ansible example configures the Cassandra plugin and passes in related arguments.

 - name: Run Monasca Agent detection plugin for Cassandra
   monasca_agent_plugin:
     name: "Cassandra"
     args="directory_names={{ FND_CDB.vars.cassandra_data_dir }},{{ FND_CDB.vars.cassandra_commit_log_dir }} process_username={{ FND_CDB.vars.cassandra_user }}"
   when: database_type == 'cassandra'

The following Ansible example configures the Keystone endpoint using the http_check.py detection plugin. The class name httpcheck of the http_check.py detection plugin is used as the name.

- name: keystone-monitor | local_monitor |
    Setup active check on keystone internal endpoint locally
  become: yes
  monasca_agent_plugin:
    name: "httpcheck"
    args: "use_keystone=False \
           url=http://{{ keystone_internal_listen_ip }}:{{
               keystone_internal_port }}/v3 \
           dimensions=service:identity-service,\
                       component:keystone-api,\
                       api_endpoint:internal,\
                       monitored_host_type:instance"
  tags:
    - keystone
    - keystone_monitor

Verify that your check plugin is configured

All check configuration files are located in the following directory. You can see the plugins that are running by looking at the plugin configuration directory.

/etc/monasca/agent/conf.d/

When the monasca-agent starts up, all of the check plugins that have a matching configuration file in the /etc/monasca/agent/conf.d/ directory will be loaded.

If there are errors running the check plugin they will be written to the following error log file.

/var/log/monasca/agent/collector.log

You can change the monasca-agent log level by modifying the log_level option in the /etc/monasca/agent/agent.yaml configuration file, and then restarting the monasca-agent, using the following command.

root # service openstack-monasca-agent restart

You can debug a check plugin by running monasca-collector with the check option. The following is an example of the monasca-collector command.

tux > sudo /usr/bin/monasca-collector check CHECK_NAME

Verify that metrics exist

Begin by logging in to your deployer or controller node.

Run the following set of commands, including the monasca metric-list command. If the metric exists, it will be displayed in the output.

ardana > source ~/service.osrc
ardana > monasca metric-list --name METRIC_NAME

3.4.5 Metric Performance Considerations

Collecting metrics on your virtual machines can greatly affect performance. SUSE OpenStack Cloud 8 supports 200 compute nodes, with up to 40 VMs each. If your environment is managing the maximum number of VMs, adding a single metric per VM is the equivalent of adding 8000 metrics.

Because of the potential impact that new metrics have on system performance, consider adding only new metrics that are useful for alarm-definition, capacity planning, or debugging process failure.

3.4.6 Configuring Alarm Definitions

The monasca-api-spec, found here https://github.com/openstack/monasca-api/blob/master/docs/monasca-api-spec.md provides an explanation of Alarm Definitions and Alarms. You can find more information on alarm definition expressions at the following page: https://github.com/openstack/monasca-api/blob/master/docs/monasca-api-spec.md#alarm-definition-expressions.

When an alarm definition is defined, the monasca-threshold engine will generate an alarm for each unique instance of the match_by metric dimensions found in the metric. This allows a single alarm definition to dynamically handle the addition of new hosts.

There are default alarm definitions configured for all "process check" (process.py check) and "HTTP Status" (http_check.py check) metrics in the monasca-default-alarms role. The monasca-default-alarms role is installed as part of the Monasca deployment phase of your cloud's deployment. You do not need to create alarm definitions for these existing checks.

Third parties should create an alarm definition when they wish to alarm on a custom plugin metric. The alarm definition should only be defined once. Setting a notification method for the alarm definition is recommended but not required.

The following Ansible modules used for alarm definitions are installed as part of the monasca-alarm-definition role. This process takes place during the Monasca set up phase of your cloud's deployment.

  • monasca_alarm_definition

  • monasca_notification_method

The following examples, found in the ~/openstack/ardana/ansible/roles/monasca-default-alarms directory, illustrate how Monasca sets up the default alarm definitions.

Monasca Notification Methods

The monasca-api-spec, found at the following link, provides details about creating a notification: https://github.com/openstack/monasca-api/blob/master/docs/monasca-api-spec.md#create-notification-method

The following are supported notification types.

  • EMAIL

  • WEBHOOK

  • PAGERDUTY

The keystone_admin_tenant project is used so that the alarms will show up on the Operations Console UI.

The following file snippet shows variables from the ~/openstack/ardana/ansible/roles/monasca-default-alarms/defaults/main.yml file.

---
notification_address: root@localhost
notification_name: 'Default Email'
notification_type: EMAIL

monasca_keystone_url: "{{ KEY_API.advertises.vips.private[0].url }}/v3"
monasca_api_url: "{{ MON_AGN.consumes_MON_API.vips.private[0].url }}/v2.0"
monasca_keystone_user: "{{ MON_API.consumes_KEY_API.vars.keystone_monasca_user }}"
monasca_keystone_password: "{{ MON_API.consumes_KEY_API.vars.keystone_monasca_password | quote }}"
monasca_keystone_project: "{{ KEY_API.vars.keystone_admin_tenant }}"

monasca_client_retries: 3
monasca_client_retry_delay: 2

You can specify a single default notification method in the ~/openstack/ardana/ansible/roles/monasca-default-alarms/tasks/main.yml file. You can also add or modify the notification type and related details using the Operations Console UI or Monasca CLI.

The following is a code snippet from the ~/openstack/ardana/ansible/roles/monasca-default-alarms/tasks/main.yml file.

---
- name: monasca-default-alarms | main | Setup default notification method
  monasca_notification_method:
    name: "{{ notification_name }}"
    type: "{{ notification_type }}"
    address: "{{ notification_address }}"
    keystone_url: "{{ monasca_keystone_url }}"
    keystone_user: "{{ monasca_keystone_user }}"
    keystone_password: "{{ monasca_keystone_password }}"
    keystone_project: "{{ monasca_keystone_project }}"
    monasca_api_url: "{{ monasca_api_url }}"
  no_log: True
  tags:
    - system_alarms
    - monasca_alarms
    - openstack_alarms
  register: default_notification_result
  until: not default_notification_result | failed
  retries: "{{ monasca_client_retries }}"
  delay: "{{ monasca_client_retry_delay }}"

Monasca Alarm Definition

In the alarm definition "expression" field, you can specify the metric name and threshold. The "match_by" field is used to create a new alarm for every unique combination of the match_by metric dimensions.

Find more details on alarm definitions in the Monasca API documentation: https://github.com/openstack/monasca-api/blob/master/docs/monasca-api-spec.md#alarm-definitions-and-alarms

The following is a code snippet from the ~/openstack/ardana/ansible/roles/monasca-default-alarms/tasks/main.yml file.

- name: monasca-default-alarms | main | Create Alarm Definitions
  monasca_alarm_definition:
    name: "{{ item.name }}"
    description: "{{ item.description | default('') }}"
    expression: "{{ item.expression }}"
    keystone_token: "{{ default_notification_result.keystone_token }}"
    match_by: "{{ item.match_by | default(['hostname']) }}"
    monasca_api_url: "{{ default_notification_result.monasca_api_url }}"
    severity: "{{ item.severity | default('LOW') }}"
    alarm_actions:
      - "{{ default_notification_result.notification_method_id }}"
    ok_actions:
      - "{{ default_notification_result.notification_method_id }}"
    undetermined_actions:
      - "{{ default_notification_result.notification_method_id }}"
  register: monasca_system_alarms_result
  until: not monasca_system_alarms_result | failed
  retries: "{{ monasca_client_retries }}"
  delay: "{{ monasca_client_retry_delay }}"
  with_flattened:
    - monasca_alarm_definitions_system
    - monasca_alarm_definitions_monasca
    - monasca_alarm_definitions_openstack
    - monasca_alarm_definitions_misc_services
  when: monasca_create_definitions

In the following example ~/openstack/ardana/ansible/roles/monasca-default-alarms/vars/main.yml Ansible variables file, the alarm definition named Process Check sets the match_by variable with the following parameters.

  • process_name

  • hostname

monasca_alarm_definitions_system:
  - name: "Host Status"
    description: "Alarms when the specified host is down or not reachable"
    severity: "HIGH"
    expression: "host_alive_status > 0"
    match_by:
      - "target_host"
      - "hostname"
  - name: "HTTP Status"
    description: >
      "Alarms when the specified HTTP endpoint is down or not reachable"
    severity: "HIGH"
    expression: "http_status > 0"
    match_by:
      - "service"
      - "component"
      - "hostname"
      - "url"
  - name: "CPU Usage"
    description: "Alarms when CPU usage is high"
    expression: "avg(cpu.idle_perc) < 10 times 3"
  - name: "High CPU IOWait"
    description: "Alarms when CPU IOWait is high, possible slow disk issue"
    expression: "avg(cpu.wait_perc) > 40 times 3"
    match_by:
      - "hostname"
  - name: "Disk Inode Usage"
    description: "Alarms when disk inode usage is high"
    expression: "disk.inode_used_perc > 90"
    match_by:
      - "hostname"
      - "device"
    severity: "HIGH"
  - name: "Disk Usage"
    description: "Alarms when disk usage is high"
    expression: "disk.space_used_perc > 90"
    match_by:
      - "hostname"
      - "device"
    severity: "HIGH"
  - name: "Memory Usage"
    description: "Alarms when memory usage is high"
    severity: "HIGH"
    expression: "avg(mem.usable_perc) < 10 times 3"
  - name: "Network Errors"
    description: >
      "Alarms when either incoming or outgoing network errors are high"
    severity: "MEDIUM"
    expression: "net.in_errors_sec > 5 or net.out_errors_sec > 5"
  - name: "Process Check"
    description: "Alarms when the specified process is not running"
    severity: "HIGH"
    expression: "process.pid_count < 1"
    match_by:
      - "process_name"
      - "hostname"
  - name: "Crash Dump Count"
    description: "Alarms when a crash directory is found"
    severity: "MEDIUM"
    expression: "crash.dump_count > 0"
    match_by:
      - "hostname"

The preceding configuration would result in the creation of an alarm for each unique metric that matched the following criteria.

process.pid_count + process_name + hostname

Check that the alarms exist

Begin by using the following commands, including monasca alarm-definition-list, to check that the alarm definition exists.

ardana > source ~/service.osrc
ardana > monasca alarm-definition-list --name ALARM_DEFINITION_NAME

Then use either of the following commands to check that the alarm has been generated. A status of "OK" indicates a healthy alarm.

ardana > monasca alarm-list --metric-name METRIC_NAME

Or

ardana > monasca alarm-list --alarm-definition-id ID_FROM_ALARM-DEFINITION-LIST
Note

To see CLI options use the monasca help command.

Alarm state upgrade considerations

If the name of a monitoring metric changes or is no longer being sent, existing alarms will show the alarm state as UNDETERMINED. You can update an alarm definition as long as you do not change the metric name or dimension name values in the expression or match_by fields. If you find that you need to alter either of these values, you must delete the old alarm definitions and create new definitions with the updated values.

If a metric is never sent, but has a related alarm definition, then no alarms will exist for it. If you find that a metric is never sent, you should remove the related alarm definition.

When removing an alarm definition, the Ansible module monasca_alarm_definition supports the state "absent".

The following file snippet shows an example of how to remove an alarm definition by setting the state to absent.

- name: monasca-pre-upgrade | Remove alarm definitions
  monasca_alarm_definition:
    name: "{{ item.name }}"
    state: "absent"
    keystone_url: "{{ monasca_keystone_url }}"
    keystone_user: "{{ monasca_keystone_user }}"
    keystone_password: "{{ monasca_keystone_password }}"
    keystone_project: "{{ monasca_keystone_project }}"
    monasca_api_url: "{{ monasca_api_url }}"
  with_items:
    - { name: "Kafka Consumer Lag" }

An alarm exists in the OK state when the monasca threshold engine has seen at least one metric associated with the alarm definition and has not exceeded the alarm definition threshold.

3.4.7 OpenStack Integration of Custom Plugins into Monasca-Agent (if applicable)

Monasca-agent is an OpenStack open-source project; Monasca can also monitor non-OpenStack services. Third parties should install custom plugins into their SUSE OpenStack Cloud 8 system using the steps outlined in Section 3.4.3, “Writing Custom Plugins”. If the OpenStack community determines that a custom plugin is of general benefit, it may be added to openstack/monasca-agent so that it is installed with the monasca-agent. During the review process for openstack/monasca-agent there is no guarantee that code will be approved or merged by a deadline, and open-source contributors are expected to help with code reviews in order to get their code accepted. Once the changes are approved and integrated into openstack/monasca-agent, and that version of the monasca-agent is integrated with SUSE OpenStack Cloud 8, the third party can remove the custom plugin installation steps, since the plugins will be installed in the default monasca-agent venv.

Find the open-source repository for the monasca-agent here: https://github.com/openstack/monasca-agent

4 Managing Identity

The Identity service provides the structure for user authentication to your cloud.

4.1 The Identity Service

This topic explains the purpose and mechanisms of the identity service.

The SUSE OpenStack Cloud Identity service, based on the OpenStack Keystone API, is responsible for providing UserID authentication and access authorization to enable organizations to achieve their access security and compliance objectives and successfully deploy OpenStack. In short, the Identity Service is the gateway to the rest of the OpenStack services.

4.1.1 Which version of the Keystone Identity service should you use?

Use Identity API version 3.0. Identity API v2.0 is deprecated. Many features such as LDAP integration and fine-grained access control will not work with v2.0. Below are a few more questions you may have regarding versions.

Why does the Keystone identity catalog still show version 2.0?

Tempest tests still use the v2.0 API. They are in the process of migrating to v3.0. We will remove the v2.0 version once tempest has migrated the tests. The Identity catalog has v2.0 version just to support tempest migration.

Will the Keystone identity v3.0 API work if the identity catalog has only the v2.0 endpoint?

Identity v3.0 does not rely on the content of the catalog. It will continue to work regardless of the version of the API in the catalog.

Which CLI client should you use?

You should use the OpenStack CLI, not the Keystone CLI, because the Keystone CLI is deprecated and does not support the v3.0 API; only the OpenStack CLI supports the v3.0 API.

4.1.2 Authentication

The authentication function provides the initial login function to OpenStack. Keystone supports multiple sources of authentication, including a native or built-in authentication system. The Keystone native system can be used for all user management functions for proof of concept deployments or small deployments not requiring integration with a corporate authentication system, but it lacks some of the advanced functions usually found in user management systems such as forcing password changes. The focus of the Keystone native authentication system is to be the source of authentication for OpenStack-specific users required for the operation of the various OpenStack services. These users are stored by Keystone in a default domain; the addition of these IDs to an external authentication system is not required.

Keystone is more commonly integrated with external authentication systems such as OpenLDAP or Microsoft Active Directory. These systems are usually centrally deployed by organizations to serve as the single source of user management and authentication for all in-house deployed applications and systems requiring user authentication. In addition to LDAP and Microsoft Active Directory, support for integration with Security Assertion Markup Language (SAML)-based identity providers from companies such as Ping, CA, IBM, Oracle, and others is also nearly "production-ready".

Keystone also provides architectural support via the underlying Apache deployment for other types of authentication systems such as Multi-Factor Authentication. These types of systems typically require driver support and integration from the respective provider vendors.

Note

While support for Identity Providers and Multi-factor authentication is available in Keystone, it has not yet been certified by the SUSE OpenStack Cloud engineering team and is an experimental feature in SUSE OpenStack Cloud.

LDAP-compatible directories such as OpenLDAP and Microsoft Active Directory are recommended alternatives to using the Keystone local authentication. Both methods are widely used by organizations and are integrated with a variety of other enterprise applications. These directories act as the single source of user information within an organization. Keystone can be configured to authenticate against an LDAP-compatible directory on a per-domain basis.

Domains, as explained in Section 4.3, “Understanding Domains, Projects, Users, Groups, and Roles”, can be configured so that based on the user ID, an incoming user is automatically mapped to a specific domain. This domain can then be configured to authenticate against a specific LDAP directory. The user credentials provided by the user to Keystone are passed along to the designated LDAP source for authentication. This communication can be optionally configured to be secure via SSL encryption. No special LDAP administrative access is required, and only read-only access is needed for this configuration. Keystone will not add any LDAP information. All user additions, deletions, and modifications are performed by the application's front end in the LDAP directories. After a user has been successfully authenticated, he is then assigned to the groups, roles, and projects defined by the Keystone domain or project administrators. This information is stored within the Keystone service database.

Another form of external authentication provided by the Keystone service is via integration with SAML-based Identity Providers (IdP) such as Ping Identity, IBM Tivoli, and Microsoft Active Directory Federation Server. A SAML-based identity provider provides authentication that is often called "single sign-on". The IdP server is configured to authenticate against identity sources such as Active Directory and provides a single authentication API against multiple types of downstream identity sources. This means that an organization could have multiple identity storage sources but a single authentication source. In addition, if a user has logged into one such source during a defined session time frame, they do not need to re-authenticate within the defined session. Instead, the IdP will automatically validate the user to requesting applications and services.

A SAML-based IdP authentication source is configured with Keystone on a per-domain basis similar to the manner in which native LDAP directories are configured. Extra mapping rules are required in the configuration that define which Keystone group an incoming UID is automatically assigned to. This means that groups need to be defined in Keystone first, but it also removes the requirement that a domain or project admin assign user roles and project membership on a per-user basis. Instead, groups are used to define project membership and roles and incoming users are automatically mapped to Keystone groups based on their upstream group membership. This provides a very consistent role-based access control (RBAC) model based on the upstream identity source. The configuration of this option is fairly straightforward. IdP vendors such as Ping and IBM are contributing to the maintenance of this function and have also produced their own integration documentation. Microsoft Active Directory Federation Services (ADFS) is used for functional testing and future documentation.

The third Keystone-supported authentication source is known as Multi-Factor Authentication (MFA). MFA typically requires an external source of authentication beyond a login name and password, and can include options such as SMS text, a temporal token generator, a fingerprint scanner, etc. Each of these types of MFA are usually specific to a particular MFA vendor. The Keystone architecture supports an MFA-based authentication system, but this has not yet been certified or documented for SUSE OpenStack Cloud.

4.1.3 Authorization

The second major function provided by the Keystone service is access authorization that determines what resources and actions are available based on the UserID, the role of the user, and the projects that a user is provided access to. All of this information is created, managed, and stored by Keystone. These functions are applied via the Horizon web interface, the OpenStack command-line interface, or the direct Keystone API.

Keystone provides support for organizing users via three entities including:

Domains

Domains provide the highest level of organization. Domains are intended to be used as high-level containers for multiple projects. A domain can represent different tenants, companies or organizations for an OpenStack cloud deployed for public cloud deployments or represent major business units, functions, or any other type of top-level organization unit in an OpenStack private cloud deployment. Each domain has at least one Domain Admin assigned to it. This Domain Admin can then create multiple projects within the domain and assign the project admin role to specific project owners. Each domain created in an OpenStack deployment is unique and the projects assigned to a domain cannot exist in another domain.

Projects

Projects are entities within a domain that represent groups of users, each user role within that project, and how many underlying infrastructure resources can be consumed by members of the project.

Groups

Groups are an optional function and provide the means of assigning project roles to multiple users at once.

Keystone also provides the means to create and assign roles to groups of users or individual users. The role names are created and user assignments are made within Keystone. The actual function of a role is defined currently per each OpenStack service via scripts. When a user requests access to an OpenStack service, his access token contains information about his assigned project membership and role for that project. This role is then matched to the service-specific script and the user is allowed to perform functions within that service defined by the role mapping.

4.2 Supported Upstream Keystone Features

4.2.1 OpenStack upstream features that are enabled by default in SUSE OpenStack Cloud 8

The following supported Keystone features are enabled by default in the SUSE OpenStack Cloud 8 release.

Name                  | User/Admin | Note: API support only. No CLI/UI support
Implied Roles         | Admin      | https://blueprints.launchpad.net/keystone/+spec/implied-roles
Domain-Specific Roles | Admin      | https://blueprints.launchpad.net/keystone/+spec/domain-specific-roles

Implied roles

To allow for the practice of hierarchical permissions in user roles, this feature enables roles to be linked in such a way that they function as a hierarchy with role inheritance.

When a user is assigned a superior role, the user will also be assigned all roles implied by any subordinate roles. The hierarchy of the assigned roles will be expanded when issuing the user a token.
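
Implied roles are managed through the Identity v3 API (as noted above, there is no CLI/UI support). The following is a minimal sketch of creating a role inference rule with the API; the endpoint, token, and role IDs are placeholders.

import requests

# Placeholders: a valid cloud-admin token, the Keystone endpoint, and the
# IDs of the prior (superior) and implied (subordinate) roles.
KEYSTONE_URL = "http://keystone.example.com:5000/v3"
TOKEN = "ADMIN_TOKEN"
PRIOR_ROLE_ID = "PRIOR_ROLE_ID"
IMPLIED_ROLE_ID = "IMPLIED_ROLE_ID"

# Create the inference rule: anyone assigned the prior role is also treated
# as having the implied role when a token is issued.
resp = requests.put("{0}/roles/{1}/implies/{2}".format(
                        KEYSTONE_URL, PRIOR_ROLE_ID, IMPLIED_ROLE_ID),
                    headers={"X-Auth-Token": TOKEN})
resp.raise_for_status()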

Domain-specific roles

This feature extends the principle of implied roles to include a set of roles that are specific to a domain. At the time a token is issued, the domain-specific roles are not included in the token, however, the roles that they map to are.

4.2.2 OpenStack upstream features that are disabled by default in SUSE OpenStack Cloud 8

The following is a list of features which are fully supported in the SUSE OpenStack Cloud 8 release, but are disabled by default. Customers can run a playbook to enable the features.

Name                                                        | User/Admin     | Reason Disabled
Support multiple LDAP backends via per-domain configuration | Admin          | Needs explicit configuration.
WebSSO                                                      | User and Admin | Needs explicit configuration.
Keystone-to-Keystone (K2K) federation                       | User and Admin | Needs explicit configuration.
Fernet token provider                                       | User and Admin | Needs explicit configuration.
Domain-specific config in SQL                               | Admin          | Domain-specific configuration options can be stored in SQL instead of configuration files, using the new REST APIs.

Multiple LDAP backends for each domain

This feature allows identity backends to be configured on a domain-by-domain basis. Domains will be capable of having their own exclusive LDAP service (or multiple services). A single LDAP service can also serve multiple domains, with each domain in a separate subtree.

To implement this feature, individual domains will require domain-specific configuration files. Domains that do not implement this feature will continue to share a common backend driver.

WebSSO

This feature enables the Keystone service to provide federated identity services through a token-based single sign-on page. This feature is disabled by default, as it requires explicit configuration.

Keystone-to-Keystone (K2K) federation

This feature enables separate Keystone instances to federate identities among the instances, offering inter-cloud authorization. This feature is disabled by default, as it requires explicit configuration.

Fernet token provider

Provides tokens in the fernet format. This is an experimental feature and is disabled by default.

Domain-specific config in SQL

Using the new REST APIs, domain-specific configuration options can be stored in a SQL database instead of in configuration files.
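
As a rough illustration of the REST interface, the following sketch stores an LDAP identity configuration for a single domain; the endpoint, token, domain ID, and LDAP options are placeholders.

import requests

KEYSTONE_URL = "http://keystone.example.com:5000/v3"
TOKEN = "ADMIN_TOKEN"
DOMAIN_ID = "DOMAIN_ID"

# Store a domain-specific identity configuration in the SQL backend instead
# of a per-domain configuration file. The LDAP options are placeholders.
config = {
    "config": {
        "identity": {"driver": "ldap"},
        "ldap": {
            "url": "ldap://ldap.example.com",
            "user_tree_dn": "ou=Users,dc=example,dc=com",
        },
    }
}
resp = requests.put("{0}/domains/{1}/config".format(KEYSTONE_URL, DOMAIN_ID),
                    json=config,
                    headers={"X-Auth-Token": TOKEN})
resp.raise_for_status()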

4.2.3 OpenStack upstream features that have been specifically disabled in SUSE OpenStack Cloud 8

The following is a list of extensions which are disabled by default in SUSE OpenStack Cloud 8, according to Keystone policy.

Target Release | Name                 | User/Admin     | Reason Disabled
TBD            | Endpoint Filtering   | Admin          | This extension was implemented to facilitate service activation. However, due to lack of enforcement at the service side, this feature is only half effective right now.
TBD            | Endpoint Policy      | Admin          | This extension was intended to facilitate policy (policy.json) management and enforcement. This feature is useless right now due to lack of the needed middleware to utilize the policy files stored in Keystone.
TBD            | OAuth 1.0a           | User and Admin | Complexity in workflow and lack of adoption. Its alternative, Keystone Trust, is enabled by default and is used by Heat.
TBD            | Revocation Events    | Admin          | For PKI tokens only; PKI tokens are disabled by default due to usability concerns.
TBD            | OS CERT              | Admin          | For PKI tokens only; PKI tokens are disabled by default due to usability concerns.
TBD            | PKI Token            | Admin          | PKI tokens are disabled by default due to usability concerns.
TBD            | Driver level caching | Admin          | Driver level caching is disabled by default due to complexity in setup.
TBD            | Tokenless Authz      | Admin          | Tokenless authorization with X.509 SSL client certificates.
TBD            | TOTP Authentication  | User           | Not fully baked; has not been battle-tested.
TBD            | is_admin_project     | Admin          | No integration with the services.

4.3 Understanding Domains, Projects, Users, Groups, and Roles

The identity service uses these concepts for authentication within your cloud and these are descriptions of each of them.

The SUSE OpenStack Cloud 8 identity service uses OpenStack Keystone and the concepts of domains, projects, users, groups, and roles to manage authentication. This page describes how these work together.

4.3.1 Domains, Projects, Users, Groups, and Roles

Most large business organizations use an identity system such as Microsoft Active Directory to store and manage their internal user information. A variety of applications such as HR systems are, in turn, used to manage the data inside of Active Directory. These same organizations often deploy a separate user management system for external users such as contractors, partners, and customers. Multiple authentication systems are then deployed to support multiple types of users.

An LDAP-compatible directory such as Active Directory provides a top-level organization or domain component. In this example, the organization is called Acme. The domain component (DC) is defined as acme.com. Underneath the top level domain component are entities referred to as organizational units (OU). Organizational units are typically designed to reflect the entity structure of the organization. For example, this particular schema has 3 different organizational units for the Marketing, IT, and Contractors units or departments of the Acme organization. Users (and other types of entities like printers) are then defined appropriately underneath each organizational entity. The Keystone domain entity can be used to match the LDAP OU entity; each LDAP OU can have a corresponding Keystone domain created. In this example, both the Marketing and IT domains represent internal employees of Acme and use the same authentication source. The Contractors domain contains all external people associated with Acme. UserIDs associated with the Contractor domain are maintained in a separate user directory and thus have a different authentication source assigned to the corresponding Keystone-defined Contractors domain.

A public cloud deployment usually supports multiple, separate organizations. Keystone domains can be created to provide a domain per organization, with each domain configured to the underlying organization's authentication source. For example, the ABC company would have a Keystone domain created called "abc". All users authenticating to the "abc" domain would be authenticated against the authentication system provided by the ABC organization; in this case ldap://ad.abc.com.

4.3.2 Domains

A domain is a top-level container targeted at defining major organizational entities.

  • Domains can be used in a multi-tenant OpenStack deployment to segregate projects and users from different companies in a public cloud deployment or different organizational units in a private cloud setting.

  • Domains provide the means to identify multiple authentication sources.

  • Each domain is unique within an OpenStack implementation.

  • Multiple projects can be assigned to a domain but each project can only belong to a single domain.

  • Each domain and project have an assigned admin.

  • Domains are created by the "admin" service account and domain admins are assigned by the "admin" user.

  • The "admin" UserID (UID) is created during the Keystone installation, has the "admin" role assigned to it, and is defined as the "Cloud Admin". This UID is created using the "magic" or "secret" admin token found in the default keystone.conf file installed during the SUSE OpenStack Cloud Keystone installation. This secret token should be removed after installation and the "admin" password changed.

  • The "default" domain is created automatically during the SUSE OpenStack Cloud Keystone installation.

  • The "default" domain contains all OpenStack service accounts that are installed during the SUSE OpenStack Cloud keystone installation process.

  • No users other than the OpenStack service accounts should be assigned to the "default" domain.

  • Domain admins can be any UserID inside or outside of the domain.

4.3.3 Domain Administrator

A UID is a domain administrator for a given domain if that UID has a domain-scoped token scoped for the given domain. This means that the UID has the "admin" role assigned to it for the selected domain.

  • The Cloud Admin UID assigns the domain administrator role for a domain to a selected UID.

  • A domain administrator can create and delete local users who have authenticated against Keystone. These users will be assigned to the domain belonging to the domain administrator who creates the UserID.

  • A domain administrator can only create users and projects within her assigned domains.

  • A domain administrator can assign the "admin" role of their domains to another UID or revoke it; each UID with the "admin" role for a specified domain will be a co-administrator for that domain.

  • A UID can be assigned to be the domain admin of multiple domains.

  • A domain administrator can assign non-admin roles to any users and groups within their assigned domain, including projects owned by their assigned domain.

  • A domain admin UID can belong to projects within their administered domains.

  • Each domain can have a different authentication source.

  • The domain field is used during the initial login to define the source of authentication.

  • The "List Users" function can only be executed by a UID with the domain admin role.

  • A domain administrator can assign a UID from outside of their domain the "domain admin" role but it is assumed that the domain admin would know the specific UID and would not need to list users from an external domain.

  • A domain administrator can assign a UID from outside of their domain the "project admin" role for a specific project within their domain but it is assumed that the domain admin would know the specific UID and would not need to list users from an external domain.

4.3.4 Projects

The domain administrator creates projects within his assigned domain and assigns the project admin role for each project to a selected UID. A UID is a project administrator for a given project if that UID has a project-scoped token scoped for the given project. There can be multiple projects per domain. The project admin sets the project quota settings, adds/deletes users and groups to and from the project, and defines the user/group roles for the assigned project. Users can belong to multiple projects and have different roles on each project. Users are assigned to a specific domain and a default project. Roles are assigned per project.

4.3.5 Users and Groups

Each user belongs to one domain only. Domain assignments are defined either by the domain configuration files or by a domain administrator when creating a new, local (user authenticated against Keystone) user. There is no current method for "moving" a user from one domain to another. A user can belong to multiple projects within a domain with a different role assignment per project. A group is a collection of users. Users can be assigned to groups either by the project admin or automatically via mappings if an external authentication source is defined for the assigned domain. Groups can be assigned to multiple projects within a domain and have different roles assigned to the group per project. A group can be assigned the "admin" role for a domain or project. All members of the group will be an "admin" for the selected domain or project.
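
As a rough illustration of how these entities relate, the following python-keystoneclient sketch creates a domain, a project within it, and a user, and then grants the user a role on the project. The endpoint, credentials, names, and the role name are placeholders.

from keystoneauth1.identity import v3
from keystoneauth1 import session
from keystoneclient.v3 import client

# Placeholder cloud-admin credentials and Keystone endpoint.
auth = v3.Password(auth_url="http://keystone.example.com:5000/v3",
                   username="admin", password="ADMIN_PASSWORD",
                   project_name="admin",
                   user_domain_name="Default", project_domain_name="Default")
keystone = client.Client(session=session.Session(auth=auth))

# A domain is the top-level container for projects, users, and groups.
domain = keystone.domains.create(name="acme", description="Example domain")
project = keystone.projects.create(name="marketing", domain=domain)
user = keystone.users.create(name="jdoe", domain=domain, password="USER_PASSWORD")

# Role assignments are made per project (or per domain).
member_role = keystone.roles.list(name="member")[0]
keystone.roles.grant(member_role, user=user, project=project)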

4.3.6 Roles

Service roles represent the functionality used to implement the OpenStack role-based access control (RBAC) model, which is used to manage access to each OpenStack service. Roles are named and assigned per user or group for each project by the identity service. Role definition and policy enforcement are defined outside of the identity service, independently by each OpenStack service. The token generated by the identity service for each user authentication contains the role assigned to that user for a particular project. When a user attempts to access a specific OpenStack service, the role is parsed by the service, compared to the service-specific policy file, and then granted the resource access defined for that role by the service policy file.

Each service has its own service policy file with the /etc/[SERVICE_CODENAME]/policy.json file name format, where [SERVICE_CODENAME] represents a specific OpenStack service name. For example, the OpenStack Nova service would have a policy file called /etc/nova/policy.json. Service policy files can be modified and deployed to control nodes from the Cloud Lifecycle Manager. Administrators are advised to validate policy changes before checking them into the site branch of the local git repository and rolling the changes into production. Do not make changes to policy files without having a way to validate them.

The policy files are located at the following site branch locations on the Cloud Lifecycle Manager.

~/openstack/ardana/ansible/roles/GLA-API/templates/policy.json.j2
~/openstack/ardana/ansible/roles/ironic-common/files/policy.json
~/openstack/ardana/ansible/roles/KEYMGR-API/templates/policy.json
~/openstack/ardana/ansible/roles/heat-common/files/policy.json
~/openstack/ardana/ansible/roles/CND-API/templates/policy.json
~/openstack/ardana/ansible/roles/nova-common/files/policy.json
~/openstack/ardana/ansible/roles/CEI-API/templates/policy.json.j2
~/openstack/ardana/ansible/roles/neutron-common/templates/policy.json.j2

For test and validation, policy files can be modified in a non-production environment from the ~/scratch/ directory. For a specific policy file, run a search for policy.json. To deploy policy changes for a service, run the service specific reconfiguration playbook (for example, nova-reconfigure.yml). For a complete list of reconfiguration playbooks, change directories to ~/scratch/ansible/next/ardana/ansible and run this command:

ardana > ls | grep reconfigure

A read-only role named project_observer is explicitly created in SUSE OpenStack Cloud 8. Any user who is granted this role can use list_project.

4.4 Identity Service Token Validation Example

The following sequence illustrates the flow of typical Identity Service (Keystone) requests/responses between SUSE OpenStack Cloud services and the Identity service, and shows how Keystone issues and validates tokens to ensure the identity of the caller of each service. A minimal token-validation sketch follows the list.

  1. Horizon sends an HTTP authentication request with the user's credentials to Keystone.

  2. Keystone validates the credentials and replies with a token.

  3. Horizon sends a POST request, with the token, to Nova to start provisioning a virtual machine.

  4. Nova sends the token to Keystone for validation.

  5. Keystone validates the token.

  6. Nova forwards a request for an image, with the token attached.

  7. Glance sends the token to Keystone for validation.

  8. Keystone validates the token.

  9. Glance provides image-related information to Nova.

  10. Nova sends a request for networks to Neutron, with the token.

  11. Neutron sends the token to Keystone for validation.

  12. Keystone validates the token.

  13. Neutron provides network-related information to Nova.

  14. Nova reports the status of the virtual machine provisioning request.
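
The following is a minimal sketch of the validation call that a service makes in the steps above, using the Identity v3 API directly; the endpoint and tokens are placeholders.

import requests

KEYSTONE_URL = "http://keystone.example.com:5000/v3"


def validate_token(service_token, subject_token):
    """Ask Keystone whether subject_token is valid, as a service would do
    when it receives a request carrying a user token."""
    resp = requests.get(KEYSTONE_URL + "/auth/tokens",
                        headers={"X-Auth-Token": service_token,      # caller's own token
                                 "X-Subject-Token": subject_token})  # token being validated
    # 200 means the token is valid; 404 means it is invalid or expired.
    return resp.status_code == 200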

4.5 Configuring the Identity Service

4.5.1 What is the Identity service?

The SUSE OpenStack Cloud Identity service, based on the OpenStack Keystone API, provides UserID authentication and access authorization to help organizations achieve their access security and compliance objectives and successfully deploy OpenStack. In short, the Identity service is the gateway to the rest of the OpenStack services.

The identity service is installed automatically by the Cloud Lifecycle Manager (just after MySQL and RabbitMQ). When your cloud is up and running, you can customize Keystone in a number of ways, including integrating with LDAP servers. This topic describes the default configuration. See Section 4.8, “Reconfiguring the Identity Service” for changes you can implement. Also see Section 4.9, “Integrating LDAP with the Identity Service” for information on integrating with an LDAP provider.

4.5.2 Which version of the Keystone Identity service should you use?

You should use Identity API version 3.0. Identity API v2.0 has been deprecated. Many features, such as LDAP integration and fine-grained access control, will not work with v2.0. The following are a few questions you may have regarding versions.

Why does the Keystone identity catalog still show version 2.0?

Tempest tests still use the v2.0 API. They are in the process of migrating to v3.0. We will remove the v2.0 version once tempest has migrated the tests. The Identity catalog has version 2.0 just to support tempest migration.

Will the Keystone identity v3.0 API work if the identity catalog has only the v2.0 endpoint?

Identity v3.0 does not rely on the content of the catalog. It will continue to work regardless of the version of the API in the catalog.

Which CLI client should you use?

Use the OpenStack CLI, not the Keystone CLI, because the Keystone CLI is deprecated. The Keystone CLI does not support the v3.0 API; only the OpenStack CLI supports the v3.0 API.

4.5.3 Authentication

The authentication function provides the initial login function to OpenStack. Keystone supports multiple sources of authentication, including a native or built-in authentication system. You can use the Keystone native system for all user management functions for proof-of-concept deployments or small deployments not requiring integration with a corporate authentication system, but it lacks some of the advanced functions usually found in user management systems such as forcing password changes. The focus of the Keystone native authentication system is to be the source of authentication for OpenStack-specific users required to operate various OpenStack services. These users are stored by Keystone in a default domain; the addition of these IDs to an external authentication system is not required.

Keystone is more commonly integrated with external authentication systems such as OpenLDAP or Microsoft Active Directory. These systems are usually centrally deployed by organizations to serve as the single source of user management and authentication for all in-house deployed applications and systems requiring user authentication. In addition to LDAP and Microsoft Active Directory, support for integration with Security Assertion Markup Language (SAML)-based identity providers from companies such as Ping, CA, IBM, Oracle, and others is also nearly "production-ready."

Keystone also provides architectural support through the underlying Apache deployment for other types of authentication systems, such as multi-factor authentication. These types of systems typically require driver support and integration from the respective providers.

Note

While support for Identity providers and multi-factor authentication is available in Keystone, it has not yet been certified by the SUSE OpenStack Cloud engineering team and is an experimental feature in SUSE OpenStack Cloud.

LDAP-compatible directories such as OpenLDAP and Microsoft Active Directory are recommended alternatives to using Keystone local authentication. Both methods are widely used by organizations and are integrated with a variety of other enterprise applications. These directories act as the single source of user information within an organization. You can configure Keystone to authenticate against an LDAP-compatible directory on a per-domain basis.

Domains, as explained in Section 4.3, “Understanding Domains, Projects, Users, Groups, and Roles”, can be configured so that, based on the user ID, an incoming user is automatically mapped to a specific domain. You can then configure this domain to authenticate against a specific LDAP directory. User credentials provided by the user to Keystone are passed along to the designated LDAP source for authentication. You can optionally configure this communication to be secure through SSL encryption. No special LDAP administrative access is required, and only read-only access is needed for this configuration. Keystone will not add any LDAP information. All user additions, deletions, and modifications are performed by the application's front end in the LDAP directories. After a user has been successfully authenticated, that user is then assigned to the groups, roles, and projects defined by the Keystone domain or project administrators. This information is stored in the Keystone service database.

Another form of external authentication provided by the Keystone service is through integration with SAML-based identity providers (IdP) such as Ping Identity, IBM Tivoli, and Microsoft Active Directory Federation Server. A SAML-based identity provider provides authentication that is often called "single sign-on." The IdP server is configured to authenticate against identity sources such as Active Directory and provides a single authentication API against multiple types of downstream identity sources. This means that an organization could have multiple identity storage sources but a single authentication source. In addition, if a user has logged into one such source during a defined session time frame, that user does not need to reauthenticate within the defined session. Instead, the IdP automatically validates the user to requesting applications and services.

A SAML-based IdP authentication source is configured with Keystone on a per-domain basis similar to the manner in which native LDAP directories are configured. Extra mapping rules are required in the configuration that define which Keystone group an incoming UID is automatically assigned to. This means that groups need to be defined in Keystone first, but it also removes the requirement that a domain or project administrator assign user roles and project membership on a per-user basis. Instead, groups are used to define project membership and roles and incoming users are automatically mapped to Keystone groups based on their upstream group membership. This strategy provides a consistent role-based access control (RBAC) model based on the upstream identity source. The configuration of this option is fairly straightforward. IdP vendors such as Ping and IBM are contributing to the maintenance of this function and have also produced their own integration documentation. HPE is using the Microsoft Active Directory Federation Services (AD FS) for functional testing and future documentation.

The third Keystone-supported authentication source is known as multi-factor authentication (MFA). MFA typically requires an external source of authentication beyond a login name and password, and can include options such as SMS text, a temporal token generator, or a fingerprint scanner. Each of these types of MFAs are usually specific to a particular MFA vendor. The Keystone architecture supports an MFA-based authentication system, but this has not yet been certified or documented for SUSE OpenStack Cloud.

4.5.4 Authorization

Another major function provided by the Keystone service is access authorization that determines which resources and actions are available based on the UserID, the role of the user, and the projects that a user is provided access to. All of this information is created, managed, and stored by Keystone. These functions are applied through the Horizon web interface, the OpenStack command-line interface, or the direct Keystone API.

Keystone provides support for organizing users by using three entities:

Domains

Domains provide the highest level of organization. Domains are intended to be used as high-level containers for multiple projects. A domain can represent different tenants, companies, or organizations for an OpenStack cloud deployed for public cloud deployments or it can represent major business units, functions, or any other type of top-level organization unit in an OpenStack private cloud deployment. Each domain has at least one Domain Admin assigned to it. This Domain Admin can then create multiple projects within the domain and assign the project administrator role to specific project owners. Each domain created in an OpenStack deployment is unique and the projects assigned to a domain cannot exist in another domain.

Projects

Projects are entities within a domain that represent groups of users, each user role within that project, and how many underlying infrastructure resources can be consumed by members of the project.

Groups

Groups are an optional function and provide the means of assigning project roles to multiple users at once.

Keystone also makes it possible to create and assign roles to groups of users or individual users. Role names are created and user assignments are made within Keystone. The actual function of a role is currently defined for each OpenStack service via its policy file. When users request access to an OpenStack service, their access tokens contain information about their assigned project membership and role for that project. This role is then matched against the service-specific policy file, and users are allowed to perform the functions within that service that the policy file maps to that role.
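As a minimal sketch of how role creation and assignment look in practice (the role, project, and user names below are placeholders, not predefined values):

ardana > openstack role create SAMPLE_ROLE
ardana > openstack role add --project PROJECT_NAME --user USER_NAME SAMPLE_ROLE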

4.5.5 Default settings

Identity service configuration settings

The identity service configuration options are described in the OpenStack documentation on the Keystone Configuration Options page on the OpenStack site.

Default domain and service accounts

The "default" domain is automatically created during the installation to contain the various required OpenStack service accounts, including the following:

  • neutron

  • glance

  • swift-monitor

  • ceilometer

  • swift

  • monasca-agent

  • glance-swift

  • swift-demo

  • nova

  • monasca

  • logging

  • demo

  • heat

  • cinder

  • admin

These are required accounts and are used by the underlying OpenStack services. These accounts should not be removed or reassigned to a different domain. The "default" domain should be used only for these service accounts.

For details on how to create additional users, see Book “User Guide Overview”, Chapter 4 “Cloud Admin Actions with the Command Line”.
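If you want to confirm which service accounts exist in the default domain of your own deployment, one way (a sketch, using the service credentials file described in Section 4.6, “Retrieving the Admin Password”) is:

ardana > source ~/service.osrc
ardana > openstack user list --domain default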

4.5.6 Preinstalled roles

The following are the preinstalled roles. You can create additional roles with a user that has the "admin" role. Roles are defined on a per-service basis (more information is available at Manage projects, users, and roles on the OpenStack website).

admin

The "superuser" role. Provides full access to all SUSE OpenStack Cloud services across all domains and projects. This role should be given only to a cloud administrator.

_member_

A general role that enables a user to access resources within an assigned project, including creating, modifying, and deleting compute, storage, and network resources.

You can find additional information on these roles in each service policy stored in the /etc/PROJECT/policy.json files where PROJECT is a placeholder for an OpenStack service. For example, the Compute (Nova) service roles are stored in the /etc/nova/policy.json file. Each service policy file defines the specific API functions available to a role label.

4.6 Retrieving the Admin Password

The admin password is used to access the dashboard and Operations Console, and allows you to authenticate when using the command-line tools and the API.

In a default SUSE OpenStack Cloud 8 installation, a randomly generated password is created for the Admin user. These steps show you how to retrieve this password.

4.6.1 Retrieving the Admin Password

You can retrieve the randomly generated Admin password by using this command on the Cloud Lifecycle Manager:

ardana > cat ~/service.osrc

In this example output, the value for OS_PASSWORD is the Admin password:

ardana > cat ~/service.osrc
unset OS_DOMAIN_NAME
export OS_IDENTITY_API_VERSION=3
export OS_AUTH_VERSION=3
export OS_PROJECT_NAME=admin
export OS_PROJECT_DOMAIN_NAME=Default
export OS_USERNAME=admin
export OS_USER_DOMAIN_NAME=Default
export OS_PASSWORD=SlWSfwxuJY0
export OS_AUTH_URL=https://10.13.111.145:5000/v3
export OS_ENDPOINT_TYPE=internalURL
# OpenstackClient uses OS_INTERFACE instead of OS_ENDPOINT
export OS_INTERFACE=internal
export OS_CACERT=/etc/ssl/certs/ca-certificates.crt
export OS_COMPUTE_API_VERSION=2
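To verify the retrieved credentials, you can source this file and run any read-only command; for example (a minimal sketch):

ardana > source ~/service.osrc
ardana > openstack project list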

4.7 Changing Service Passwords

SUSE OpenStack Cloud provides a process for changing the default service passwords, including your admin user password, which you may want to do for security or other purposes.

You can easily change the inter-service passwords used for authenticating communications between services in your SUSE OpenStack Cloud deployment, promoting better compliance with your organization’s security policies. The inter-service passwords that can be changed include (but are not limited to) Keystone, MariaDB, RabbitMQ, Cloud Lifecycle Manager cluster, Monasca and Barbican.

The general process for changing the passwords is to:

  • Indicate to the configuration processor which password(s) you want to change, and optionally include the value of that password

  • Run the configuration processor to generate the new passwords (you do not need to run git add before this)

  • Run ready-deployment

  • Check your password name(s) against the tables included below to see which high-level credentials-change playbook(s) you need to run

  • Run the appropriate high-level credentials-change playbook(s)

4.7.1 Password Strength

Encryption passwords supplied to the configuration processor for use with Ansible Vault and for encrypting the configuration processor’s persistent state must have a minimum length of 12 characters and a maximum of 128 characters. Passwords must contain characters from each of the following three categories:

  • Uppercase characters (A-Z)

  • Lowercase characters (a-z)

  • Base 10 digits (0-9)

Service Passwords that are automatically generated by the configuration processor are chosen from the 62 characters made up of the 26 uppercase letters, the 26 lowercase letters, and the 10 numeric digits, with no preference given to any character or set of characters. The minimum and maximum lengths are determined by the specific requirements of individual services.

Important

Currently, you cannot use any special characters with Ansible Vault, Service Passwords, or vCenter configuration.

4.7.2 Telling the configuration processor which password(s) you want to change

In SUSE OpenStack Cloud 8, the configuration processor produces metadata about each of the passwords (and other variables) that it generates in the file ~/openstack/my_cloud/info/private_data_metadata_ccp.yml. A snippet of this file follows:

4.7.3 private_data_metadata_ccp.yml

metadata_proxy_shared_secret:
  metadata:
  - clusters:
    - cluster1
    component: nova-metadata
    consuming-cp: ccp
    cp: ccp
  version: '2.0'
mysql_admin_password:
  metadata:
  - clusters:
    - cluster1
    component: ceilometer
    consumes: mysql
    consuming-cp: ccp
    cp: ccp
  - clusters:
    - cluster1
    component: heat
    consumes: mysql
    consuming-cp: ccp
    cp: ccp
  - clusters:
    - cluster1
    component: keystone
    consumes: mysql
    consuming-cp: ccp
    cp: ccp
  - clusters:
    - cluster1
    - compute
    component: nova
    consumes: mysql
    consuming-cp: ccp
    cp: ccp
  - clusters:
    - cluster1
    component: cinder
    consumes: mysql
    consuming-cp: ccp
    cp: ccp
  - clusters:
    - cluster1
    component: glance
    consumes: mysql
    consuming-cp: ccp
    cp: ccp
  - clusters:
    - cluster1
    - compute
    component: neutron
    consumes: mysql
    consuming-cp: ccp
    cp: ccp
  - clusters:
    - cluster1
    component: horizon
    consumes: mysql
    consuming-cp: ccp
    cp: ccp
  version: '2.0'
mysql_barbican_password:
  metadata:
  - clusters:
    - cluster1
    component: barbican
    consumes: mysql
    consuming-cp: ccp
    cp: ccp
  version: '2.0'

For each variable, there is a metadata entry for each pair of services that use the variable including a list of the clusters on which the service component that consumes the variable (defined as "component:" in private_data_metadata_ccp.yml above) runs.

Note above that the variable mysql_admin_password is used by a number of service components, and the service that is consumed in each case is mysql, which in this context refers to the MariaDB instance that is part of the product.
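To see exactly which services consume a particular password before changing it, you can, for example, search the metadata file (a sketch; adjust the number of context lines and the password name as needed):

ardana > grep -A 10 'keystone_cinder_password:' ~/openstack/my_cloud/info/private_data_metadata_ccp.yml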

4.7.4 Steps to change a password

First, make sure that you have a copy of private_data_metadata_ccp.yml. If you do not, generate one by running the configuration processor:

ardana > cd ~/openstack/ardana/ansible
ardana > ansible-playbook -i hosts/localhost config-processor-run.yml

Make a copy of the private_data_metadata_ccp.yml file and place it into the ~/openstack/change_credentials directory:

ardana > cp ~/openstack/my_cloud/info/private_data_metadata_ccp.yml \
 ~/openstack/change_credentials/

Edit the copied file in ~/openstack/change_credentials, leaving only those passwords you intend to change. Delete all other entries from the file.

Important

If you leave other passwords in that file that you do not want to change, they will be regenerated and will no longer match those in use, which could disrupt operations.

Note

You must change passwords in batches, by the categories listed in Section 4.7.7, “Password change playbooks and tables”.

For example, the snippet below would result in the configuration processor generating new random values for keystone_backup_password, keystone_ceilometer_password, and keystone_cinder_password:

keystone_backup_password:
  metadata:
  - clusters:
    - cluster0
    - cluster1
    - compute
    component: freezer-agent
    consumes: keystone-api
    consuming-cp: ccp
    cp: ccp
  version: '2.0'
keystone_ceilometer_password:
  metadata:
  - clusters:
    - cluster1
    component: ceilometer-common
    consumes: keystone-api
    consuming-cp: ccp
    cp: ccp
  version: '2.0'
keystone_cinder_password:
  metadata:
  - clusters:
    - cluster1
    component: cinder-api
    consumes: keystone-api
    consuming-cp: ccp
    cp: ccp
  version: '2.0'

4.7.5 Specifying password value

Optionally, you can specify a value for the password by including a "value:" key and value at the same level as metadata:

keystone_backup_password:
  value: 'new_password'
  metadata:
  - clusters:
    - cluster0
    - cluster1
    - compute
    component: freezer-agent
    consumes: keystone-api
    consuming-cp: ccp
    cp: ccp
  version: '2.0'

Note that you can have multiple files in openstack/change_credentials. The configuration processor will only read files that end in .yml or .yaml.

Note

If you have specified a password value in your credential change file, you may want to encrypt it using ansible-vault. If you decide to encrypt with ansible-vault, make sure that you use the encryption key you have already used when running the configuration processor.

To encrypt a file using ansible-vault, execute the following commands, replacing CREDENTIAL_CHANGE_FILE.yml with the name of your credential change file (the file name must end in .yml or .yaml):

ardana > cd ~/openstack/change_credentials
ardana > ansible-vault encrypt CREDENTIAL_CHANGE_FILE.yml

Be sure to provide the encryption key when prompted. Note that if you specify the wrong ansible-vault password, the configuration processor will error out with a message like the following:

################################################## Reading Persistent State ##################################################

################################################################################
# The configuration processor failed.
# PersistentStateCreds: User-supplied creds file test1.yml was not parsed properly
################################################################################
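If you are unsure whether a credential change file was encrypted with the expected vault password, you can check it before rerunning the configuration processor (a sketch; CREDENTIAL_CHANGE_FILE.yml is a placeholder for your file name):

ardana > cd ~/openstack/change_credentials
ardana > ansible-vault view CREDENTIAL_CHANGE_FILE.yml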

4.7.6 Running the configuration processor to change passwords

The directory openstack/change_credentials is not managed by git, so to rerun the configuration processor to generate new passwords and prepare for the next deployment, enter the following commands:

ardana > cd ~/openstack/ardana/ansible
ardana > ansible-playbook -i hosts/localhost config-processor-run.yml
ardana > ansible-playbook -i hosts/localhost ready-deployment.yml
Note

The files that you placed in ~/openstack/change_credentials should be removed once you have run the configuration processor because the old password values and new password values will be stored in the configuration processor's persistent state.

Note that if you see output like the following after running the configuration processor:

################################################################################
# The configuration processor completed with warnings.
# PersistentStateCreds: User-supplied password name 'blah' is not valid
################################################################################

this tells you that the password name you have supplied, 'blah,' does not exist. A failure to correctly parse the credentials change file will result in the configuration processor erroring out with a message like the following:

################################################## Reading Persistent State ##################################################

################################################################################
# The configuration processor failed.
# PersistentStateCreds: User-supplied creds file test1.yml was not parsed properly
################################################################################

Once you have run the configuration processor to change passwords, an information file ~/openstack/my_cloud/info/password_change.yml similar to the private_data_metadata_ccp.yml is written to tell you which passwords have been changed, including metadata but not including the values.
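You can review this file to confirm which password names were changed, for example:

ardana > cat ~/openstack/my_cloud/info/password_change.yml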

4.7.7 Password change playbooks and tables

Once you have completed the steps above to change the password value(s) and prepared for the deployment that will actually switch over to the new passwords, you need to run some high-level playbooks. The passwords that can be changed are grouped into six categories. The tables below list the password names that belong in each category. The categories are:

Keystone

Playbook: ardana-keystone-credentials-change.yml

RabbitMQ

Playbook: ardana-rabbitmq-credentials-change.yml

MariaDB

Playbook: ardana-reconfigure.yml

Cluster

Playbook: ardana-cluster-credentials-change.yml

Monasca

Playbook: monasca-reconfigure-credentials-change.yml

Other

Playbook: ardana-other-credentials-change.yml

It is recommended that you change passwords in batches; in other words, run through a complete password change process for each batch of passwords, preferably in the above order. Once you have followed the process indicated above to change password(s), check the names against the tables below to see which password change playbook(s) you should run.

Changing identity service credentials

The following table lists identity service credentials you can change.

Keystone credentials
Password name
barbican_admin_password
barbican_service_password
keystone_admin_pwd
keystone_admin_token
keystone_backup_password
keystone_ceilometer_password
keystone_cinder_password
keystone_cinderinternal_password
keystone_demo_pwd
keystone_designate_password
keystone_freezer_password
keystone_glance_password
keystone_glance_swift_password
keystone_heat_password
keystone_magnum_password
keystone_monasca_agent_password
keystone_monasca_password
keystone_neutron_password
keystone_nova_password
keystone_octavia_password
keystone_swift_dispersion_password
keystone_swift_monitor_password
keystone_swift_password
logging_keystone_password
nova_monasca_password

The playbook to run to change Keystone credentials is ardana-keystone-credentials-change.yml. Execute the following commands to make the changes:

ardana > cd ~/scratch/ansible/next/ardana/ansible/
ardana > ansible-playbook -i hosts/verb_hosts ardana-keystone-credentials-change.yml

Changing RabbitMQ credentials

The following table lists the RabbitMQ credentials you can change.

RabbitMQ credentials
Password name
ops_mon_rmq_password
rmq_barbican_password
rmq_ceilometer_password
rmq_cinder_password
rmq_designate_password
rmq_keystone_password
rmq_magnum_password
rmq_monasca_monitor_password
rmq_nova_password
rmq_octavia_password
rmq_service_password

The playbook to run to change RabbitMQ credentials is ardana-rabbitmq-credentials-change.yml. Execute the following commands to make the changes:

ardana > cd ~/scratch/ansible/next/ardana/ansible/
ardana > ansible-playbook -i hosts/verb_hosts ardana-rabbitmq-credentials-change.yml

Changing MariaDB credentials

The following table lists the MariaDB credentials you can change.

MariaDB credentials
Password name
mysql_admin_password
mysql_barbican_password
mysql_clustercheck_pwd
mysql_designate_password
mysql_magnum_password
mysql_monasca_api_password
mysql_monasca_notifier_password
mysql_monasca_thresh_password
mysql_octavia_password
mysql_powerdns_password
mysql_root_pwd
mysql_service_pwd
mysql_sst_password
ops_mon_mdb_password
mysql_monasca_transform_password
mysql_nova_api_password
password

The playbook to run to change MariaDB credentials is ardana-reconfigure.yml. To make the changes, execute the following commands:

ardana > cd ~/scratch/ansible/next/ardana/ansible/
ardana > ansible-playbook -i hosts/verb_hosts ardana-reconfigure.yml

Changing cluster credentials

The following table lists the cluster credentials you can change.

Cluster credentials
Password name
haproxy_stats_password
keepalive_vrrp_password

The playbook to run to change cluster credentials is ardana-cluster-credentials-change.yml. To make changes, execute the following commands:

ardana > cd ~/scratch/ansible/next/ardana/ansible/
ardana > ansible-playbook -i hosts/verb_hosts ardana-cluster-credentials-change.yml

Changing Monasca credentials

The following table lists the Monasca credentials you can change.

Monasca credentials
Password name
mysql_monasca_api_password
mysql_monasca_persister_password
monitor_user_password
cassandra_monasca_api_password
cassandra_monasca_persister_password

The playbook to run to change Monasca credentials is monasca-reconfigure-credentials-change.yml. To make the changes, execute the following commands:

ardana > cd ~/scratch/ansible/next/ardana/ansible/
ardana > ansible-playbook -i hosts/verb_hosts monasca-reconfigure-credentials-change.yml

Changing other credentials

The following table lists the other credentials you can change.

Other credentials
Password name
logging_beaver_password
logging_api_password
logging_monitor_password
logging_kibana_password

The playbook to run to change these credentials is ardana-other-credentials-change.yml. To make the changes, execute the following commands:

ardana > cd ~/scratch/ansible/next/ardana/ansible/
ardana > ansible-playbook -i hosts/verb_hosts ardana-other-credentials-change.yml

4.7.8 Changing RADOS Gateway Credential

To change the Keystone credentials of the RADOS Gateway, follow the steps documented in Section 4.7, “Changing Service Passwords”, modifying the keystone_rgw_password entry in the private_data_metadata_ccp.yml file as described in Section 4.7.4, “Steps to change a password” or Section 4.7.5, “Specifying password value”.
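To locate the corresponding metadata entry to copy into your credential change file, you can, for example, search the metadata file (a sketch; the exact metadata returned depends on your control plane layout):

ardana > grep -A 10 'keystone_rgw_password:' ~/openstack/my_cloud/info/private_data_metadata_ccp.yml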

4.7.9 Immutable variables

The values of certain variables are immutable, which means that once they have been generated by the configuration processor they cannot be changed. These variables are:

  • barbican_master_kek_db_plugin

  • swift_hash_path_suffix

  • swift_hash_path_prefix

  • mysql_cluster_name

  • heartbeat_key

  • erlang_cookie

The configuration processor will not re-generate the values of the above passwords, nor will it allow you to specify a value for them. In addition to the above variables, the following are immutable in SUSE OpenStack Cloud 8:

  • All ssh keys generated by the configuration processor

  • All UUIDs generated by the configuration processor

  • metadata_proxy_shared_secret

  • horizon_secret_key

  • ceilometer_metering_secret

4.8 Reconfiguring the Identity Service

4.8.1 Updating the Keystone Identity Service

This topic explains configuration options for the Identity service.

SUSE OpenStack Cloud lets you perform updates on the following parts of the Identity service configuration:

4.8.2 Updating the Main Identity Service Configuration File

  1. The main Keystone Identity service configuration file (/etc/keystone/keystone.conf), located on each control plane server, is generated from the following template file located on a Cloud Lifecycle Manager: ~/openstack/my_cloud/config/keystone/keystone.conf.j2

    Modify this template file as appropriate. See Keystone Liberty documentation for full descriptions of all settings. This is a Jinja2 template, which expects certain template variables to be set. Do not change values inside double curly braces: {{ }}.

    Note

    SUSE OpenStack Cloud 8 has the following token expiration setting, which differs from the upstream value 3600:

    [token]
    expiration = 14400
  2. After you modify the template, commit the change to the local git repository, and rerun the configuration processor / deployment area preparation playbooks (as suggested in Book “Installing with Cloud Lifecycle Manager”, Chapter 11 “Using Git for Configuration Management”):

    ardana > cd ~/openstack
    ardana > git checkout site
    ardana > git add my_cloud/config/keystone/keystone.conf.j2
    ardana > git commit -m "Adjusting some parameters in keystone.conf"
    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost config-processor-run.yml
    ardana > ansible-playbook -i hosts/localhost ready-deployment.yml
  3. Run the reconfiguration playbook in the deployment area:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts keystone-reconfigure.yml
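If, for example, the setting you changed was the [token] expiration value shown above, a quick sanity check after the reconfigure completes is to request a token and inspect its expires field (a sketch, using the service credentials file described in Section 4.6; with the default value of 14400, the expiry should be roughly four hours in the future):

ardana > source ~/service.osrc
ardana > openstack token issue -f value -c expires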

4.8.3 Enabling Identity Service Features

To enable or disable Keystone features, do the following:

  1. Adjust respective parameters in ~/openstack/my_cloud/config/keystone/keystone_deploy_config.yml

  2. Commit the change into local git repository, and rerun the configuration processor/deployment area preparation playbooks (as suggested in Book “Installing with Cloud Lifecycle Manager”, Chapter 11 “Using Git for Configuration Management”):

    ardana > cd ~/openstack
    ardana > git checkout site
    ardana > git add my_cloud/config/keystone/keystone_deploy_config.yml
    ardana > git commit -m "Adjusting some WSGI or logging parameters for keystone"
    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost config-processor-run.yml
    ardana > ansible-playbook -i hosts/localhost ready-deployment.yml
  3. Run the reconfiguration playbook in the deployment area:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts keystone-reconfigure.yml

4.8.4 Fernet Tokens

SUSE OpenStack Cloud 8 supports UUID tokens by default. Fernet tokens are available as an experimental feature. You can switch from UUID tokens to Fernet tokens by following the steps described later in this section. The benefit of using Fernet tokens is that tokens are not persisted in a database, which is helpful if you want to deploy the Keystone Identity service as one master and multiple slaves; only roles, projects, and other details need to be replicated from master to slaves, not the token table. The tradeoff is the cost of token validation. According to our performance testing, although Fernet tokens perform slightly better than UUID tokens on token creation, token validation degrades by about 400% compared to UUID tokens. This performance degradation is caused mainly by database operations: each Fernet token validation results in 39 database queries, while a UUID token validation results in only two.

Note

Tempest does not work with Fernet tokens in SUSE OpenStack Cloud 8. If Fernet tokens are enabled, do not run token tests in Tempest.

Note

During reconfiguration when switching to a Fernet token provider or during Fernet key rotation, you may see a warning in keystone.log stating [fernet_tokens] key_repository is world readable: /etc/keystone/fernet-keys/. This is expected. You can safely ignore this message. For other Keystone operations, you will not see this warning. Directory permissions are actually set to 600 (read/write by owner only), not world readable.

Fernet token-signing key rotation is handled by a cron job, which is configured on one of the controllers. The controller with the Fernet token-signing key rotation cron job is also known as the Fernet Master node. By default, the Fernet token-signing key is rotated once every 24 hours. The Fernet token-signing keys are distributed from the Fernet Master node to the rest of the controllers at each rotation, so the Fernet token-signing keys are consistent across all the controllers at all times.

When enabling the Fernet token provider for the first time, specific steps are needed to set up the necessary mechanisms for Fernet token-signing key distribution.

  1. Set keystone_configure_fernet to True in ~/openstack/my_cloud/config/keystone/keystone_deploy_config.yml.

  2. Run the following commands to commit your change in Git and enable Fernet:

    ardana > git add my_cloud/config/keystone/keystone_deploy_config.yml
    ardana > git commit -m "enable Fernet token provider"
    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost config-processor-run.yml
    ardana > ansible-playbook -i hosts/localhost ready-deployment.yml
    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts keystone-deploy.yml
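After the deploy completes, a simple sanity check (a sketch, using the service credentials file described in Section 4.6) is to request a token; a Fernet token is a long opaque string, noticeably longer than the 32-character UUID tokens issued by the default provider:

ardana > source ~/service.osrc
ardana > openstack token issue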

When the Fernet token provider is enabled, a Fernet Master alarm definition is also created on Monasca to monitor the Fernet Master node. If the Fernet Master node is offline or unreachable, a CRITICAL alarm will be raised for the Cloud Admin to take corrective actions. If the Fernet Master node is offline for a prolonged period of time, Fernet token-signing key rotation will not be performed. This may introduce security risks to the cloud. The Cloud Admin must take immediate actions to resurrect the Fernet Master node.

4.9 Integrating LDAP with the Identity Service

4.9.1 Integrating with an external LDAP server

The Keystone identity service provides two primary functions: user authentication and access authorization. The user authentication function validates a user's identity. Keystone has a very basic user management system that can be used to create and manage user login and password credentials but this system is intended only for proof of concept deployments due to the very limited password control functions. The internal identity service user management system is also commonly used to store and authenticate OpenStack-specific service account information.

The recommended source of authentication is external user management systems such as LDAP directory services. The identity service can be configured to connect to and use external systems as the source of user authentication. The identity service domain construct is used to define different authentication sources based on domain membership. For example, a cloud deployment could consist of as few as two domains:

  • The default domain that is pre-configured for the service account users that are authenticated directly against the identity service internal user management system

  • A customer-defined domain that contains all user projects and membership definitions. This domain can then be configured to use an external LDAP directory such as Microsoft Active Directory as the authentication source.

SUSE OpenStack Cloud can support multiple domains for deployments that support multiple tenants. Multiple domains can be created with each domain configured to either the same or different external authentication sources. This deployment model is known as a "per-domain" model.

There are currently two ways to configure "per-domain" authentication sources:

  • File store – each domain configuration is created and stored in separate text files. This is the older and current default method for defining domain configurations.

  • Database store – each domain configuration can be created using either the identity service manager utility (recommended) or a Domain Admin API (from OpenStack.org), and the results are stored in the identity service MariaDB database. This database store is a new method introduced in the OpenStack Kilo release and now available in SUSE OpenStack Cloud.

Instructions for initially creating per-domain configuration files and then migrating to the Database store method via the identity service manager utility are provided below.

Important

We do not support enabling the LDAP connection pool (that is, use_pool: True) due to an upstream bug. The use_pool parameter must be present and must be set to False.

4.9.2 Set up domain-specific driver configuration - file store

To update the configuration for a specific LDAP domain:

  1. Ensure that the following configuration options are in the main configuration file template: ~/openstack/my_cloud/config/keystone/keystone.conf.j2

    [identity]
    domain_specific_drivers_enabled = True
    domain_configurations_from_database = False
  2. Create a YAML file that contains the definition of the LDAP server connection. The sample file below is already provided as part of the Cloud Lifecycle Manager in the Book “Installing with Cloud Lifecycle Manager”, Chapter 11 “Using Git for Configuration Management”. It is available on the Cloud Lifecycle Manager in the following file:

    ~/openstack/my_cloud/config/keystone/keystone_configure_ldap_sample.yml

    Save a copy of this file with a new name, for example:

    ~/openstack/my_cloud/config/keystone/keystone_configure_ldap_my.yml
    Note

    Please refer to the LDAP section of the Keystone configuration example for OpenStack for the full option list and description.

    Below are samples of YAML configurations for identity service LDAP certificate settings, optimized for Microsoft Active Directory server.

    Sample YAML configuration keystone_configure_ldap_my.yml

    ---
    keystone_domainldap_conf:
    
        # CA certificates file content.
        # Certificates are stored in Base64 PEM format. This may be entire LDAP server
        # certificate (in case of self-signed certificates), certificate of authority
        # which issued LDAP server certificate, or a full certificate chain (Root CA
        # certificate, intermediate CA certificate(s), issuer certificate).
        #
        cert_settings:
          cacert: |
            -----BEGIN CERTIFICATE-----
    
            certificate appears here
    
            -----END CERTIFICATE-----
    
        # A domain will be created in MariaDB with this name, and associated with ldap back end.
        # Installer will also generate a config file named /etc/keystone/domains/keystone.<domain_name>.conf
        #
        domain_settings:
          name: ad
          description: Dedicated domain for ad users
    
        conf_settings:
          identity:
             driver: ldap
    
    
          # For a full list and description of ldap configuration options, please refer to
          # https://github.com/openstack/keystone/blob/master/etc/keystone.conf.sample or
          # http://docs.openstack.org/liberty/config-reference/content/keystone-configuration-file.html.
          #
          # Please note:
          #  1. LDAP configuration is read-only. Configuration which performs write operations (i.e. creates users, groups, etc)
          #     is not supported at the moment.
          #  2. LDAP is only supported for identity operations (reading users and groups from LDAP). Assignment
          #     operations with LDAP (i.e. managing roles, projects) are not supported.
          #  3. LDAP is configured as non-default domain. Configuring LDAP as a default domain is not supported.
          #
          ldap:
            url: ldap://ad.hpe.net
            suffix: DC=hpe,DC=net
            query_scope: sub
            user_tree_dn: CN=Users,DC=hpe,DC=net
            user: CN=admin,CN=Users,DC=hpe,DC=net
            password: REDACTED
            user_objectclass: user
            user_id_attribute: cn
            user_name_attribute: cn
            group_tree_dn: CN=Users,DC=hpe,DC=net
            group_objectclass: group
            group_id_attribute: cn
            group_name_attribute: cn
            use_pool: False
            user_enabled_attribute: userAccountControl
            user_enabled_mask: 2
            user_enabled_default: 512
            use_tls: True
            tls_req_cert: demand
            # if you are configuring multiple LDAP domains, and LDAP server certificates are issued
            # by different authorities, make sure that you place certs for all the LDAP backend domains in the
            # cacert parameter as seen in this sample yml file so that all the certs are combined in a single CA file
            # and every LDAP domain configuration points to the combined CA file.
            # Note:
            # 1. Please be advised that every time a new ldap domain is configured, the single CA file gets overwritten
            # and hence ensure that you place certs for all the LDAP backend domains in the cacert parameter.
            # 2. There is a known issue on one cert per CA file per domain when the system processes
            # concurrent requests to multiple LDAP domains. Using the single CA file with all certs combined
            # shall get the system working properly*.
    
            tls_cacertfile: /etc/keystone/ssl/certs/all_ldapdomains_ca.pem
    
            # The issue is in the underlying SSL library. Upstream is not investing in python-ldap package anymore.
            # It is also not python3 compliant.
    keystone_domain_MSAD_conf:
    
        # CA certificates file content.
        # Certificates are stored in Base64 PEM format. This may be entire LDAP server
        # certificate (in case of self-signed certificates), certificate of authority
        # which issued LDAP server certificate, or a full certificate chain (Root CA
        # certificate, intermediate CA certificate(s), issuer certificate).
        #
        cert_settings:
          cacert: |
            -----BEGIN CERTIFICATE-----
    
            certificate appears here
    
            -----END CERTIFICATE-----
    
        # A domain will be created in MariaDB with this name, and associated with ldap back end.
        # Installer will also generate a config file named /etc/keystone/domains/keystone.<domain_name>.conf
        #
        domain_settings:
          name: msad
          description: Dedicated domain for msad users

        conf_settings:
          identity:
            driver: ldap
    
        # For a full list and description of ldap configuration options, please refer to
        # https://github.com/openstack/keystone/blob/master/etc/keystone.conf.sample or
        # http://docs.openstack.org/liberty/config-reference/content/keystone-configuration-file.html.
        #
        # Please note:
        #  1. LDAP configuration is read-only. Configuration which performs write operations (i.e. creates users, groups, etc)
        #     is not supported at the moment.
        #  2. LDAP is only supported for identity operations (reading users and groups from LDAP). Assignment
        #     operations with LDAP (i.e. managing roles, projects) are not supported.
        #  3. LDAP is configured as non-default domain. Configuring LDAP as a default domain is not supported.
        #
          ldap:
            # If the url parameter is set to ldap then typically use_tls should be set to True. If
            # url is set to ldaps, then use_tls should be set to False
            url: ldaps://10.16.22.5
            use_tls: False
            query_scope: sub
            user_tree_dn: DC=l3,DC=local
            # this is the user and password for the account that has access to the AD server
            user: administrator@l3.local
            password: OpenStack123
            user_objectclass: user
            # For a default Active Directory schema this is where to find the user name, openldap uses a different value
            user_id_attribute: userPrincipalName
            user_name_attribute: sAMAccountName
            group_tree_dn: DC=l3,DC=local
            group_objectclass: group
            group_id_attribute: cn
            group_name_attribute: cn
            # An upstream defect requires use_pool to be set false
            use_pool: False
            user_enabled_attribute: userAccountControl
            user_enabled_mask: 2
            user_enabled_default: 512
            tls_req_cert: allow
            # Referrals may contain urls that can't be resolved and will cause timeouts, ignore them
            chase_referrals: False
            # if you are configuring multiple LDAP domains, and LDAP server certificates are issued
            # by different authorities, make sure that you place certs for all the LDAP backend domains in the
            # cacert parameter as seen in this sample yml file so that all the certs are combined in a single CA file
            # and every LDAP domain configuration points to the combined CA file.
            # Note:
            # 1. Please be advised that every time a new ldap domain is configured, the single CA file gets overwritten
            # and hence ensure that you place certs for all the LDAP backend domains in the cacert parameter.
            # 2. There is a known issue on one cert per CA file per domain when the system processes
            # concurrent requests to multiple LDAP domains. Using the single CA file with all certs combined
            # shall get the system working properly.

            tls_cacertfile: /etc/keystone/ssl/certs/all_ldapdomains_ca.pem
  3. As suggested in Book “Installing with Cloud Lifecycle Manager”, Chapter 11 “Using Git for Configuration Management”, commit the new file to the local git repository, and rerun the configuration processor and ready deployment playbooks:

    ardana > cd ~/openstack
    ardana > git checkout site
    ardana > git add my_cloud/config/keystone/keystone_configure_ldap_my.yml
    ardana > git commit -m "Adding LDAP server integration config"
    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost config-processor-run.yml
    ardana > ansible-playbook -i hosts/localhost ready-deployment.yml
  4. Run the reconfiguration playbook in a deployment area, passing the YAML file created in the previous step as a command-line option:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts keystone-reconfigure.yml -e@~/openstack/my_cloud/config/keystone/keystone_configure_ldap_my.yml
  5. Follow these same steps for each LDAP domain with which you are integrating the identity service, creating a YAML file for each and running the reconfigure playbook once for each additional domain.

  6. Ensure that a new domain was created for LDAP (Microsoft AD in this example) and set environment variables for admin-level access:

    ardana > source keystone.osrc

    Get a list of domains

    ardana > openstack domain list

    As output here:

    +----------------------------------+---------+---------+----------------------------------------------------------------------+
    | ID                               | Name    | Enabled | Description                                                          |
    +----------------------------------+---------+---------+----------------------------------------------------------------------+
    | 6740dbf7465a4108a36d6476fc967dbd | heat    | True    | Owns users and projects created by heat                              |
    | default                          | Default | True    | Owns users and tenants (i.e. projects) available on Identity API v2. |
    | b2aac984a52e49259a2bbf74b7c4108b | ad      | True    | Dedicated domain for users managed by Microsoft AD server            |
    +----------------------------------+---------+---------+----------------------------------------------------------------------+
    Note

    LDAP domain is read-only. This means that you cannot create new user or group records in it.

  7. Once the LDAP user is granted the appropriate role, they can authenticate within the specified domain. Set environment variables for admin-level access:

    ardana > source keystone.osrc

    Get user record within the ad (Active Directory) domain

    ardana > openstack user show testuser1 --domain ad

    Note the output:

    +-----------+------------------------------------------------------------------+
    | Field     | Value                                                            |
    +-----------+------------------------------------------------------------------+
    | domain_id | 143af847018c4dc7bd35390402395886                                 |
    | id        | e6d8c90abdc4510621271b73cc4dda8bc6009f263e421d8735d5f850f002f607 |
    | name      | testuser1                                                        |
    +-----------+------------------------------------------------------------------+

    Now, get list of LDAP groups:

    ardana > openstack group list --domain ad

    Here you see testgroup1 and testgroup2:

    +------------------------------------------------------------------+------------+
    | ID                                                               | Name       |
    +------------------------------------------------------------------+------------+
    | 03976b0ea6f54a8e4c0032e8f756ad581f26915c7e77500c8d4aaf0e83afcdc6 | testgroup1 |
    | 7ba52ee1c5829d9837d740c08dffa07ad118ea1db2d70e0dc7fa7853e0b79fcf | testgroup2 |
    +------------------------------------------------------------------+------------+

    Create a new role. Note that the role is not bound to the domain.

    ardana > openstack role create testrole1

    Testrole1 has been created:

    +-------+----------------------------------+
    | Field | Value                            |
    +-------+----------------------------------+
    | id    | 02251585319d459ab847409dea527dee |
    | name  | testrole1                        |
    +-------+----------------------------------+

    Grant the user a role within the domain by executing the code below. Note that due to a current OpenStack CLI limitation, you must use the user ID rather than the user name when working with a non-default domain.

    ardana > openstack role add testrole1 --user e6d8c90abdc4510621271b73cc4dda8bc6009f263e421d8735d5f850f002f607 --domain ad

    Verify that the role was successfully granted, as shown here:

    ardana > openstack role assignment list --user e6d8c90abdc4510621271b73cc4dda8bc6009f263e421d8735d5f850f002f607 --domain ad
    +----------------------------------+------------------------------------------------------------------+-------+---------+----------------------------------+
    | Role                             | User                                                             | Group | Project | Domain                           |
    +----------------------------------+------------------------------------------------------------------+-------+---------+----------------------------------+
    | 02251585319d459ab847409dea527dee | e6d8c90abdc4510621271b73cc4dda8bc6009f263e421d8735d5f850f002f607 |       |         | 143af847018c4dc7bd35390402395886 |
    +----------------------------------+------------------------------------------------------------------+-------+---------+----------------------------------+

    Authenticate (get a domain-scoped token) as a new user with a new role. The --os-* command-line parameters specified below override the respective OS_* environment variables set by the keystone.osrc script to provide admin access. To ensure that the command below is executed in a clean environment, you may want to log out from the node and log in again.

    ardana > openstack --os-identity-api-version 3 \
                --os-username testuser1 \
                --os-password testuser1_password \
                --os-auth-url http://10.0.0.6:35357/v3 \
                --os-domain-name ad \
                --os-user-domain-name ad \
                token issue

    Here is the result:

    +-----------+------------------------------------------------------------------+
    | Field     | Value                                                            |
    +-----------+------------------------------------------------------------------+
    | domain_id | 143af847018c4dc7bd35390402395886                                 |
    | expires   | 2015-09-09T21:36:15.306561Z                                      |
    | id        | 6f8f9f1a932a4d01b7ad9ab061eb0917                                 |
    | user_id   | e6d8c90abdc4510621271b73cc4dda8bc6009f263e421d8735d5f850f002f607 |
    +-----------+------------------------------------------------------------------+
  8. Users can also have a project within the domain and get a project-scoped token. To accomplish this, set environment variables for admin level access:

    ardana > source keystone.osrc

    Then create a new project within the domain:

    ardana > openstack project create testproject1 --domain ad

    The result shows that it has been created:

    +-------------+----------------------------------+
    | Field       | Value                            |
    +-------------+----------------------------------+
    | description |                                  |
    | domain_id   | 143af847018c4dc7bd35390402395886 |
    | enabled     | True                             |
    | id          | d065394842d34abd87167ab12759f107 |
    | name        | testproject1                     |
    +-------------+----------------------------------+

    Grant the user a role with a project, re-using the role created in the previous example. Note that due to a current OpenStack CLI limitation, you must use user ID rather than user name when working with a non-default domain.

    ardana > openstack role add testrole1 --user e6d8c90abdc4510621271b73cc4dda8bc6009f263e421d8735d5f850f002f607 --project testproject1

    Verify that the role was successfully granted by generating a list:

    ardana > openstack role assignment list --user e6d8c90abdc4510621271b73cc4dda8bc6009f263e421d8735d5f850f002f607 --project testproject1

    The output shows the result:

    +----------------------------------+------------------------------------------------------------------+-------+----------------------------------+--------+
    | Role                             | User                                                             | Group | Project                          | Domain |
    +----------------------------------+------------------------------------------------------------------+-------+----------------------------------+--------+
    | 02251585319d459ab847409dea527dee | e6d8c90abdc4510621271b73cc4dda8bc6009f263e421d8735d5f850f002f607 |       | d065394842d34abd87167ab12759f107 |        |
    +----------------------------------+------------------------------------------------------------------+-------+----------------------------------+--------+

    Authenticate (get a project-scoped token) as the new user with a new role. The --os-* command-line parameters specified below override their respective OS_* environment variables set by keystone.osrc to provide admin access. To ensure that the command below is executed in a clean environment, you may want to log out from the node and log in again. Note that both the --os-project-domain-name and --os-user-domain-name parameters are needed to verify that both the user and the project are not in the default domain.

    ardana > openstack --os-identity-api-version 3 \
                --os-username testuser1 \
                --os-password testuser1_password \
                --os-auth-url http://10.0.0.6:35357/v3 \
                --os-project-name testproject1 \
                --os-project-domain-name ad \
                --os-user-domain-name ad \
                token issue

    Below is the result:

    +------------+------------------------------------------------------------------+
    | Field      | Value                                                            |
    +------------+------------------------------------------------------------------+
    | expires    | 2015-09-09T21:50:49.945893Z                                      |
    | id         | 328e18486f69441fb13f4842423f52d1                                 |
    | project_id | d065394842d34abd87167ab12759f107                                 |
    | user_id    | e6d8c90abdc4510621271b73cc4dda8bc6009f263e421d8735d5f850f002f607 |
    +------------+------------------------------------------------------------------+
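
    The same project-scoped authentication can be performed programmatically. The following is a minimal sketch that reuses the python-keystoneclient classes shown later in this guide (Example 4.1); the endpoint, user, password, project, and domain values are the example values from the commands above and must be replaced with your own.

    # get_project_scoped_token.py - illustrative sketch, not shipped with the product
    from keystoneclient.auth.identity import v3
    from keystoneclient import session

    # Same values as the --os-* parameters in the CLI example above (assumptions).
    auth = v3.Password(auth_url='http://10.0.0.6:35357/v3',
                       username='testuser1',
                       password='testuser1_password',
                       user_domain_name='ad',
                       project_name='testproject1',
                       project_domain_name='ad')

    sess = session.Session(auth=auth)
    # Requesting the token triggers the project-scoped authentication.
    print(sess.get_token())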

4.9.3 Set up or switch to domain-specific driver configuration using a database store

To make the switch, execute the steps below. Remember, you must have already set up the configuration for a file store as explained in Section 4.9.2, “Set up domain-specific driver configuration - file store”, and it must be working properly.

  1. Ensure that the following configuration options are set in the main configuration file, ~/openstack/my_cloud/config/keystone/keystone.conf.j2:

    [identity]
    domain_specific_drivers_enabled = True
    domain_configurations_from_database = True
    
    [domain_config]
    driver = sql
  2. Once the template is modified, commit the change to the local git repository, and rerun the configuration processor / deployment area preparation playbooks (as suggested at Using Git for Configuration Management):

    ardana > cd ~/openstack
    ardana > git checkout site
    ardana > git add -A

    Verify that the files have been added using git status:

    ardana > git status

    Then commit the changes:

    ardana > git commit -m "Use Domain-Specific Driver Configuration - Database Store: more description here..."

    Next, run the configuration processor and ready deployment playbooks:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost config-processor-run.yml
    ardana > ansible-playbook -i hosts/localhost ready-deployment.yml
  3. Run the reconfiguration playbook in a deployment area:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts keystone-reconfigure.yml
  4. Upload the domain-specific config files to the database if they have not been loaded. If they have already been loaded and you want to switch back to database store mode, then skip this upload step and move on to step 5.

    1. Go to one of the controller nodes where Keystone is deployed.

    2. Verify that the domain-specific driver configuration files are located in the configured directory (by default, /etc/keystone/domains) and follow the naming format keystone.<domain name>.conf. Then use the keystone-manage utility to load the domain-specific config files into the database. There are two options for uploading the files:

      1. Option 1: Upload all configuration files to the SQL database:

        ardana > keystone-manage domain_config_upload --all
      2. Option 2: Upload individual domain-specific configuration files by specifying the domain name one by one:

        ardana > keystone-manage domain_config_upload --domain-name <domain name>

        Here is an example:

        ardana > keystone-manage domain_config_upload --domain-name ad

        Note that the Keystone manager utility does not upload the domain-specific driver configuration file a second time for the same domain. For managing the domain-specific driver configuration in the database store, you may refer to OpenStack Identity API - Domain Configuration; a request-level sketch appears at the end of this procedure.

  5. Verify that the switched domain driver configuration for LDAP (Microsoft AD in this example) works properly with the database store. First, set the environment variables for admin-level access:

    ardana > source ~/keystone.osrc

    Get a list of domain users:

    ardana > openstack user list --domain ad

    Note the three users returned:

    +------------------------------------------------------------------+------------+
    | ID                                                               | Name       |
    +------------------------------------------------------------------+------------+
    | e7dbec51ecaf07906bd743debcb49157a0e8af557b860a7c1dadd454bdab03fe | testuser1  |
    | 8a09630fde3180c685e0cd663427e8638151b534a8a7ccebfcf244751d6f09bd | testuser2  |
    | ea463d778dadcefdcfd5b532ee122a70dce7e790786678961420ae007560f35e | testuser3  |
    +------------------------------------------------------------------+------------+

    Get user records within the ad domain:

    ardana > openstack user show testuser1 --domain ad

    Here testuser1 is returned:

    +-----------+------------------------------------------------------------------+
    | Field     | Value                                                            |
    +-----------+------------------------------------------------------------------+
    | domain_id | 143af847018c4dc7bd35390402395886                                 |
    | id        | e6d8c90abdc4510621271b73cc4dda8bc6009f263e421d8735d5f850f002f607 |
    | name      | testuser1                                                        |
    +-----------+------------------------------------------------------------------+

    Get a list of LDAP groups:

    ardana > openstack group list --domain ad

    Note that testgroup1 and testgroup2 are returned:

    +------------------------------------------------------------------+------------+
    | ID                                                               | Name       |
    +------------------------------------------------------------------+------------+
    | 03976b0ea6f54a8e4c0032e8f756ad581f26915c7e77500c8d4aaf0e83afcdc6 | testgroup1 |
    | 7ba52ee1c5829d9837d740c08dffa07ad118ea1db2d70e0dc7fa7853e0b79fcf | testgroup2 |
    +------------------------------------------------------------------+------------+
    Note

    The LDAP domain is read-only. This means that you cannot create new user or group records in it.
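
    For managing the domain-specific driver configuration stored in the database, the Identity v3 domain configuration API mentioned above can also be called directly. The following is a minimal sketch using the Python requests library; the Keystone endpoint, admin token, and domain ID are placeholder assumptions that you must replace with real values from your environment.

    # domain_config_api.py - illustrative sketch; endpoint, token, and domain ID are placeholders
    import json
    import requests

    KEYSTONE = 'http://10.0.0.6:35357/v3'                   # assumed admin endpoint
    TOKEN = 'ADMIN_TOKEN'                                    # an admin-scoped token
    DOMAIN_ID = '143af847018c4dc7bd35390402395886'           # the ad domain ID from the examples
    HEADERS = {'X-Auth-Token': TOKEN, 'Content-Type': 'application/json'}

    # Read the configuration currently stored in the database for this domain.
    r = requests.get('%s/domains/%s/config' % (KEYSTONE, DOMAIN_ID), headers=HEADERS)
    print(r.status_code, r.text)

    # Update a single option without re-uploading the whole file.
    patch = {'config': {'identity': {'driver': 'ldap'}}}
    r = requests.patch('%s/domains/%s/config' % (KEYSTONE, DOMAIN_ID),
                       headers=HEADERS, data=json.dumps(patch))
    print(r.status_code)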

4.9.4 Switching domain-specific driver configuration from a database store to a file store

Following is the procedure to switch a domain-specific driver configuration from a database store to a file store. It is assumed that:

  • The domain-specific driver configuration with a database store has been set up and is working properly.

  • Domain-specific driver configuration files with the format: keystone.<domain name>.conf have already been located and verified in the specific directory (by default, /etc/keystone/domains/) on all of the controller nodes.

  1. Ensure that the following configuration options are set in the main configuration file template in ~/openstack/my_cloud/config/keystone/keystone.conf.j2:

    [identity]
    domain_specific_drivers_enabled = True
    domain_configurations_from_database = False
    
    [domain_config]
    # driver = sql
  2. Once the template is modified, commit the change to the local git repository, and rerun the configuration processor / deployment area preparation playbooks (as suggested at Using Git for Configuration Management):

    ardana > cd ~/openstack
    ardana > git checkout site
    ardana > git add -A

    Verify that the files have been added using git status, then commit the changes:

    ardana > git status
    ardana > git commit -m "Domain-Specific Driver Configuration - Switch From Database Store to File Store: more description here..."

    Then run the configuration processor and ready deployment playbooks:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost config-processor-run.yml
    ardana > ansible-playbook -i hosts/localhost ready-deployment.yml
  3. Run the reconfiguration playbook in the deployment area:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts keystone-reconfigure.yml
  4. Verify that the switched domain driver configuration for LDAP (Microsoft AD in this example) works properly with the file store. First, set the environment variables for admin-level access:

    ardana > source ~/keystone.osrc

    Get a list of domain users:

    ardana > openstack user list --domain ad

    Here you see the three users:

    +------------------------------------------------------------------+------------+
    | ID                                                               | Name       |
    +------------------------------------------------------------------+------------+
    | e7dbec51ecaf07906bd743debcb49157a0e8af557b860a7c1dadd454bdab03fe | testuser1  |
    | 8a09630fde3180c685e0cd663427e8638151b534a8a7ccebfcf244751d6f09bd | testuser2  |
    | ea463d778dadcefdcfd5b532ee122a70dce7e790786678961420ae007560f35e | testuser3  |
    +------------------------------------------------------------------+------------+

    Get user records within the ad domain:

    ardana > openstack user show testuser1 --domain ad

    Here is the result:

    +-----------+------------------------------------------------------------------+
    | Field     | Value                                                            |
    +-----------+------------------------------------------------------------------+
    | domain_id | 143af847018c4dc7bd35390402395886                                 |
    | id        | e6d8c90abdc4510621271b73cc4dda8bc6009f263e421d8735d5f850f002f607 |
    | name      | testuser1                                                        |
    +-----------+------------------------------------------------------------------+

    Get a list of LDAP groups:

    ardana > openstack group list --domain ad

    Here are the groups returned:

    +------------------------------------------------------------------+------------+
    | ID                                                               | Name       |
    +------------------------------------------------------------------+------------+
    | 03976b0ea6f54a8e4c0032e8f756ad581f26915c7e77500c8d4aaf0e83afcdc6 | testgroup1 |
    | 7ba52ee1c5829d9837d740c08dffa07ad118ea1db2d70e0dc7fa7853e0b79fcf | testgroup2 |
    +------------------------------------------------------------------+------------+

    Note: The LDAP domain is read-only. This means that you cannot create new user or group records in it.

4.9.5 Update LDAP CA certificates

LDAP CA certificates may expire or otherwise stop working. Follow the steps below to update the LDAP CA certificates on the Identity service side.

  1. Locate the file keystone_configure_ldap_certs_sample.yml

    ~/openstack/my_cloud/config/keystone/keystone_configure_ldap_certs_sample.yml
  2. Save a copy of this file with a new name, for example:

    ~/openstack/my_cloud/config/keystone/keystone_configure_ldap_certs_all.yml
  3. Edit the file and specify the correct single file path for the LDAP CA certificates. This path must be consistent with the one defined in tls_cacertfile of the domain-specific configuration. Then populate or update the file with the LDAP CA certificates for all LDAP domains.

  4. As suggested in Book “Installing with Cloud Lifecycle Manager”, Chapter 11 “Using Git for Configuration Management”, add the new file to the local git repository:

    ardana > cd ~/openstack
    ardana > git checkout site
    ardana > git add -A

    Verify that the files have been added using git status and commit the file:

    ardana > git status
    ardana > git commit -m "Update LDAP CA certificates: more description here..."

    Then run the configuration processor and ready deployment playbooks:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost config-processor-run.yml
    ardana > ansible-playbook -i hosts/localhost ready-deployment.yml
  5. Run the reconfiguration playbook in the deployment area:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts keystone-reconfigure.yml -e@~/openstack/my_cloud/config/keystone/keystone_configure_ldap_certs_all.yml
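
    After the reconfigure completes, you can confirm that the new CA bundle actually validates the certificate presented by your LDAP server. The following is a minimal sketch that uses only the Python standard library; the LDAP host and the CA bundle path are assumptions and must match your environment (the path should be the one referenced by tls_cacertfile).

    # check_ldap_ca.py - illustrative check; host and CA path are assumptions
    import socket
    import ssl

    LDAP_HOST = 'ad.example.com'                              # your LDAP / AD server
    LDAP_PORT = 636                                           # LDAPS port
    CA_BUNDLE = '/etc/keystone/ssl/certs/ldap_cacert.pem'     # path referenced by tls_cacertfile

    context = ssl.create_default_context(cafile=CA_BUNDLE)
    with socket.create_connection((LDAP_HOST, LDAP_PORT), timeout=10) as sock:
        # wrap_socket raises ssl.SSLError if the chain does not validate against CA_BUNDLE.
        with context.wrap_socket(sock, server_hostname=LDAP_HOST) as tls:
            cert = tls.getpeercert()
            print('Certificate validates; expires:', cert['notAfter'])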

4.9.6 Limitations

SUSE OpenStack Cloud 8 domain-specific configuration:

  • No Global User Listing: Once domain-specific driver configuration is enabled, listing all users and listing all groups are not supported operations. Those calls require a specific domain filter and a domain-scoped token for the target domain.

  • You cannot have both a file store and a database store for domain-specific driver configuration in a single identity service instance. Once a database store is enabled within the identity service instance, any file store will be ignored, and vice versa.

  • The identity service provides a list limit configuration to globally set the maximum number of entities returned in an identity collection per request, but it does not support a per-domain list limit setting at this time.

  • Each time a new domain is configured with LDAP integration the single CA file gets overwritten. Ensure that you place certs for all the LDAP back-end domains in the cacert parameter. Detailed CA file inclusion instructions are provided in the comments of the sample YAML configuration file keystone_configure_ldap_my.yml (Section 4.9.2, “Set up domain-specific driver configuration - file store”).

  • LDAP is only supported for identity operations (reading users and groups from LDAP).

  • Keystone assignment operations from LDAP records, such as managing or assigning roles and projects, are not currently supported.

  • The SUSE OpenStack Cloud 'default' domain is pre-configured to store service account users and is authenticated locally against the identity service. Domains configured for external LDAP integration are non-default domains.

  • When using the current OpenStackClient CLI you must use the user ID rather than the user name when working with a non-default domain.

  • Each LDAP connection with the identity service is for read-only operations. Configurations that require identity service write operations (to create users, groups, etc.) are not currently supported.

SUSE OpenStack Cloud 8 API-based domain-specific configuration management

  • No GUI dashboard for domain-specific driver configuration management

  • API-based domain-specific configuration does not validate the type of an option.

  • API-based domain-specific configuration does not validate whether an option value is supported.

  • The API-based domain configuration method does not provide retrieval of the default values of domain-specific configuration options.

  • Status: Domain-specific driver configuration database store is a non-core feature for SUSE OpenStack Cloud 8.

Note

When integrating with an external identity provider, cloud security depends on the security of that identity provider. You should examine the security of the identity provider, in particular the SAML 2.0 token generation process, and decide which security properties you need to ensure adequate security of your cloud deployment. More information about SAML can be found at https://www.owasp.org/index.php/SAML_Security_Cheat_Sheet.

4.10 Keystone-to-Keystone Federation

This topic explains how you can use one instance of Keystone as an identity provider and one as a service provider.

4.10.1 What Is Keystone-to-Keystone Federation?

Identity federation lets you configure SUSE OpenStack Cloud to use existing identity management systems, such as an LDAP directory, as the source of user authentication. The Keystone-to-Keystone (K2K) federation function extends this concept to accessing resources in multiple, separate SUSE OpenStack Cloud clouds. You can configure each cloud to trust the authentication credentials of other clouds, so that users can authenticate with their home cloud and access authorized resources in another cloud without having to reauthenticate with the remote cloud. This function is sometimes referred to as "single sign-on" or SSO.

The SUSE OpenStack Cloud cloud that provides the initial user authentication is called the identity provider (IdP). The identity provider cloud can support domain-based authentication against external authentication sources including LDAP-based directories such as Microsoft Active Directory. The identity provider creates the user attributes, known as assertions, which are used to automatically authenticate users with other SUSE OpenStack Cloud clouds.

A SUSE OpenStack Cloud cloud that provides resources is called a service provider (SP). A service provider cloud accepts user authentication assertions from the identity provider and provides access to project resources based on the mapping file settings developed for each service provider cloud. The following are characteristics of a service provider:

  • Each service provider cloud has a unique set of projects, groups, and group role assignments that are created and managed locally.

  • The mapping file consists of a set of rules that define user group membership.

  • The mapping file makes it possible to automatically assign incoming users to a specific group. Project membership and access are defined by group membership.

  • Project quotas are defined locally by each service provider cloud.

Keystone-to-Keystone federation is supported and enabled in SUSE OpenStack Cloud 8 using configuration parameters in specific Ansible files. Instructions are provided to define and enable the required configurations.

Support for Keystone-to-Keystone federation is provided at the API level, and you must implement it in your own client code by calling the supported APIs. The python-keystoneclient library provides the APIs needed to access the K2K endpoints.

Example 4.1: k2kclient.py

The following k2kclient.py file is an example, and the request diagram Figure 4.1, “Keystone Authentication Flow” explains the flow of client requests.

import json
import os
import requests

import xml.dom.minidom

from keystoneclient.auth.identity import v3
from keystoneclient import session

class K2KClient(object):

    def __init__(self):
        # IdP auth URL
        self.auth_url = "http://192.168.245.9:35357/v3/"
        self.project_name = "admin"
        self.project_domain_name = "Default"
        self.username = "admin"
        self.password = "vvaQIZ1S"
        self.user_domain_name = "Default"
        self.session = requests.Session()
        self.verify = False
        # identity provider Id
        self.idp_id = "z420_idp"
        # service provider Id
        self.sp_id = "z620_sp"
        #self.sp_ecp_url = "https://16.103.149.44:8443/Shibboleth.sso/SAML2/ECP"
        #self.sp_auth_url = "https://16.103.149.44:8443/v3"

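    # Step 1: authenticate with the local (IdP) cloud and cache a project-scoped token.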
    def v3_authenticate(self):
        auth = v3.Password(auth_url=self.auth_url,
                           username=self.username,
                           password=self.password,
                           user_domain_name=self.user_domain_name,
                           project_name=self.project_name,
                           project_domain_name=self.project_domain_name)

        self.auth_session = session.Session(session=requests.session(),
                                       auth=auth, verify=self.verify)
        auth_ref = self.auth_session.auth.get_auth_ref(self.auth_session)
        self.token = self.auth_session.auth.get_token(self.auth_session)

    def _generate_token_json(self):
        return {
            "auth": {
                "identity": {
                    "methods": [
                        "token"
                    ],
                    "token": {
                        "id": self.token
                    }
                },
                "scope": {
                    "service_provider": {
                        "id": self.sp_id
                    }
                }
            }
        }

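    # Step 2: exchange the scoped token for a SAML2 ECP assertion targeted at the service provider.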
    def get_saml2_ecp_assertion(self):
        token = json.dumps(self._generate_token_json())
        url = self.auth_url + 'auth/OS-FEDERATION/saml2/ecp'
        r = self.session.post(url=url,
                              data=token,
                              verify=self.verify)
        if not r.ok:
            raise Exception("Something went wrong, %s" % r.__dict__)
        self.ecp_assertion = r.text

    def _get_sp_url(self):
        url = self.auth_url + 'OS-FEDERATION/service_providers/' + self.sp_id
        r = self.auth_session.get(
           url=url,
           verify=self.verify)
        if not r.ok:
            raise Exception("Something went wrong, %s" % r.__dict__)

        sp = json.loads(r.text)[u'service_provider']
        self.sp_ecp_url = sp[u'sp_url']
        self.sp_auth_url = sp[u'auth_url']

    def _handle_http_302_ecp_redirect(self, response, method, **kwargs):
        location = self.sp_auth_url + '/OS-FEDERATION/identity_providers/' + self.idp_id + '/protocols/saml2/auth'
        return self.auth_session.request(location, method, authenticated=False, **kwargs)

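    # Steps 3 and 4: look up the service provider's endpoints, present the assertion, and
    # capture the federated unscoped token from the X-Subject-Token header.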
    def exchange_assertion(self):
        """Send assertion to a Keystone SP and get token."""
        self._get_sp_url()
        print("SP ECP Url:%s" % self.sp_ecp_url)
        print("SP Auth Url:%s" % self.sp_auth_url)
        #self.sp_ecp_url = 'https://16.103.149.44:8443/Shibboleth.sso/SAML2/ECP'
        r = self.auth_session.post(
            self.sp_ecp_url,
            headers={'Content-Type': 'application/vnd.paos+xml'},
            data=self.ecp_assertion,
            authenticated=False, redirect=False)
        r = self._handle_http_302_ecp_redirect(r, 'GET',
            headers={'Content-Type': 'application/vnd.paos+xml'})
        self.fed_token_id = r.headers['X-Subject-Token']
        self.fed_token = r.text

if __name__ == "__main__":
    client = K2KClient()
    client.v3_authenticate()
    client.get_saml2_ecp_assertion()
    client.exchange_assertion()
    print('Unscoped token_id: %s' % client.fed_token_id)
    print('Unscoped token body:\n%s' % client.fed_token)

4.10.2 Setting Up a Keystone Provider

To set up Keystone as a service provider, follow these steps.

  1. Create a config file called k2k.yml with the following parameters and place it in any directory on your Cloud Lifecycle Manager, such as /tmp.

    keystone_trusted_idp: k2k
    keystone_sp_conf:
      shib_sso_idp_entity_id: <protocol>://<idp_host>:<port>/v3/OS-FEDERATION/saml2/idp
      shib_sso_application_entity_id: http://service_provider_uri_entityId
      target_domain:
        name: domain1
        description: my domain
      target_project:
        name: project1
        description: my project
      target_group:
        name: group1
        description: my group
      role:
        name: service
      idp_metadata_file: /tmp/idp_metadata.xml
      identity_provider:
        id: my_idp_id
        description: This is the identity service provider.
      mapping:
        id: mapping1
        rules_file: /tmp/k2k_sp_mapping.json
      protocol:
        id: saml2
      attribute_map:
        -
          name: name1
          id: id1

    The following are descriptions of each of the attributes.

    keystone_trusted_idp

      A flag that indicates whether this configuration is used for Keystone-to-Keystone or WebSSO. The value can be either k2k or adfs.

    keystone_sp_conf

      shib_sso_idp_entity_id

        The identity provider URI used as an entity Id to identify the IdP. You should use the following value: <protocol>://<idp_host>:<port>/v3/OS-FEDERATION/saml2/idp.

      shib_sso_application_entity_id

        The service provider URI used as an entity Id. It can be any URI here for Keystone-to-Keystone.

      target_domain

        The domain where the group will be created.

        name: Any domain name. If it does not exist, it will be created or updated.

        description: Any description.

      target_project

        The project scope of the group.

        name: Any project name. If it does not exist, it will be created or updated.

        description: Any description.

      target_group

        A group that will be created in target_domain.

        name: Any group name. If it does not exist, it will be created or updated.

        description: Any description.

      role

        A role that will be assigned on target_project. This role determines the permissions of the IdP user's scoped token on the service provider side.

        name: Must be an existing role.

      idp_metadata_file

        A reference to the IdP metadata file that validates the SAML2 assertion.

      identity_provider

        A supported IdP.

        id: Any Id. If it does not exist, it will be created or updated. This Id needs to be shared with the client so that the right mapping will be selected.

        description: Any description.

      mapping

        A mapping in JSON format that maps a federated user to a corresponding group.

        id: Any Id. If it does not exist, it will be created or updated.

        rules_file: A reference to the file that contains the mapping in JSON.

      protocol

        The supported federation protocol.

        id: Security Assertion Markup Language 2.0 (SAML2) is the only supported protocol for K2K.

      attribute_map

        A Shibboleth mapping that defines additional attributes to map attributes from the SAML2 assertion to the K2K mapping that the service provider understands. K2K does not require any additional attribute mapping.

        name: An attribute name from the SAML2 assertion.

        id: An Id that the preceding name will be mapped to.
  2. Create a metadata file that is referenced from k2k.yml, such as /tmp/idp_metadata.xml. The content of the metadata file comes from the identity provider and can be found in /etc/keystone/idp_metadata.xml.

    1. Create the mapping file that is referenced in k2k.yml as rules_file, for example /tmp/k2k_sp_mapping.json. The following is an example of the mapping file.

      [
        {
          "local": [
            {
              "user": {
                "name": "{0}"
              }
            },
            {
              "group": {
                 "name": "group1",
                 "domain":{
                   "name": "domain1"
                 }
              }
            }
          ],
          "remote":[{
            "type": "openstack_user"
          },
          {
            "type": "Shib-Identity-Provider",
            "any_one_of":[
               "https://idp_host:5000/v3/OS-FEDERATION/saml2/idp"
            ]
           }
          ]
         }
      ]

      You can find more information on how the K2K mapping works at http://docs.openstack.org.

  3. Go to ~/scratch/ansible/next/ardana/ansible and run the following playbook to enable the service provider:

    ardana > ansible-playbook -i hosts/verb_hosts keystone-reconfigure.yml -e@/tmp/k2k.yml
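
    After the playbook completes, you can optionally confirm that the identity provider, mapping, and protocol objects were created in the service provider's Keystone by querying the OS-FEDERATION API. The following is a minimal sketch using the Python requests library; the endpoint and admin token are placeholders, and my_idp_id is the example Id from k2k.yml above.

    # verify_sp_federation_setup.py - illustrative sketch; endpoint and token are placeholders
    import requests

    KEYSTONE = 'https://sp_host:5000/v3'
    HEADERS = {'X-Auth-Token': 'ADMIN_TOKEN'}

    for path in ('/OS-FEDERATION/identity_providers',
                 '/OS-FEDERATION/mappings',
                 '/OS-FEDERATION/identity_providers/my_idp_id/protocols'):
        r = requests.get(KEYSTONE + path, headers=HEADERS, verify=False)
        print(path, r.status_code)
        print(r.text)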

Setting Up an Identity Provider

To set up Keystone as an identity provider, follow these steps:

  1. Create a config file k2k.yml with the following parameters and place it in any directory on your Cloud Lifecycle Manager, such as /tmp. Note that the certificate and key here are excerpted for space.

    keystone_k2k_idp_conf:
        service_provider:
              -
                id: my_sp_id
                description: This is the service provider.
                sp_url: https://sp_host:5000
                auth_url: https://sp_host:5000/v3
        signer_cert: -----BEGIN CERTIFICATE-----
            MIIDmDCCAoACCQDS+ZDoUfrcIzANBgkqhkiG9w0BAQsFADCBjDELMAkGA1UEBhMC
            VVMxEzARBgNVBAgMCkNhbGlmb3JuaWExEjAQBgNVBAcMCVN1bm55dmFsZTEMMAoG
                ...
            nOpKEvhlMsl5I/tle
            -----END CERTIFICATE-----
        signer_key: -----BEGIN RSA PRIVATE KEY-----
            MIIEowIBAAKCAQEA1gRiHiwSO6L5PrtroHi/f17DQBOpJ1KMnS9FOHS
                ...

    The following are descriptions of each of the attributes under keystone_k2k_idp_conf.

    service_provider

    One or more service providers can be defined. If a service provider does not exist, it will be created or updated.

    id

    Any Id. If it does not exist, it will be created or updated. This Id needs to be shared with the client so that it knows where the service provider is.

    description

    Any description.

    sp_url

    Service provider base URL.

    auth_url

    Service provider auth URL.

    signer_cert

    Content of self-signed certificate that is embedded in the metadata file. We recommend setting the validity for a longer period of time, such as 3650 days (10 years).

    signer_key

    A private key that has a key size of 2048 bits.

  2. Create a private key and a self-signed certificate. The command-line tool, openssl, is required to generate the keys and certificates. If the system does not have it, you must install it.

    1. Create a private key of size 2048.

      ardana > openssl genrsa -out myidp.key 2048
    2. Generate a certificate request named myidp.csr. When prompted for the Common Name, enter the server's hostname.

      ardana > openssl req -new -key myidp.key -out myidp.csr
    3. Generate a self-signed certificate named myidp.cer.

      ardana > openssl x509 -req -days 3650 -in myidp.csr -signkey myidp.key -out myidp.cer
  3. Go to ~/scratch/ansible/next/ardana/ansible and run the following playbook to apply the identity provider configuration in Keystone:

    ardana > ansible-playbook -i hosts/verb_hosts keystone-reconfigure.yml -e@/tmp/k2k.yml
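
    Similarly, after the playbook completes you can optionally confirm that the service provider entries defined under keystone_k2k_idp_conf were registered in the identity provider's Keystone. A minimal sketch, with the endpoint and admin token as placeholders:

    # verify_idp_service_providers.py - illustrative sketch; endpoint and token are placeholders
    import requests

    KEYSTONE = 'https://idp_host:5000/v3'
    r = requests.get(KEYSTONE + '/OS-FEDERATION/service_providers',
                     headers={'X-Auth-Token': 'ADMIN_TOKEN'}, verify=False)
    # Each entry should show the sp_url and auth_url configured above (for example my_sp_id).
    print(r.status_code)
    print(r.text)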

4.10.3 Test It Out

You can use the script listed earlier, k2kclient.py (Example 4.1, “k2kclient.py”), as an example for the end-to-end flows. To run k2kclient.py, follow these steps:

  1. A few parameters must be changed in the beginning of k2kclient.py. For example, enter your specific URL, project name, and user name, as follows:

    # IdP auth URL
    self.auth_url = "http://idp_host:5000/v3/"
    self.project_name = "my_project_name"
    self.project_domain_name = "my_project_domain_name"
    self.username = "test"
    self.password = "mypass"
    self.user_domain_name = "my_domain"
    # identity provider Id that is defined in the SP config
    self.idp_id = "my_idp_id"
    # service provider Id that is defined in the IdP config
    self.sp_id = "my_sp_id"
  2. Install python-keystoneclient along with its dependencies.

  3. Run the k2kclient.py script. An unscoped token will be returned from the service provider.

At this point, the domain or project scope of the unscoped token can be discovered by sending the following requests:

ardana > curl -k -X GET -H "X-Auth-Token: unscoped token" \
 https://<sp_public_endpoint>:5000/v3/OS-FEDERATION/domains
ardana > curl -k -X GET -H "X-Auth-Token: unscoped token" \
 https://<sp_public_endpoint>:5000/v3/OS-FEDERATION/projects
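
Once you know a project ID, you can exchange the unscoped token for a project-scoped token by POSTing to the standard /v3/auth/tokens endpoint with the token authentication method. The following is a minimal sketch using the Python requests library; the service provider endpoint, the unscoped token, and the project ID are placeholders. The verify=False flag mirrors the curl -k option used above.

# rescope_federated_token.py - illustrative sketch; endpoint, token, and project ID are placeholders
import json
import requests

SP_AUTH_URL = 'https://sp_public_endpoint:5000/v3'
UNSCOPED_TOKEN = 'unscoped token'
PROJECT_ID = 'project id returned by /v3/OS-FEDERATION/projects'

body = {
    'auth': {
        'identity': {'methods': ['token'], 'token': {'id': UNSCOPED_TOKEN}},
        'scope': {'project': {'id': PROJECT_ID}}
    }
}
r = requests.post(SP_AUTH_URL + '/auth/tokens', data=json.dumps(body),
                  headers={'Content-Type': 'application/json'}, verify=False)
# The project-scoped token is returned in the X-Subject-Token header.
print(r.status_code, r.headers.get('X-Subject-Token'))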

4.10.4 Inside Keystone-to-Keystone Federation

K2K federation places a lot of responsibility on the user. The complexity is apparent from the following steps and from Figure 4.1, “Keystone Authentication Flow”.

  1. Users must first authenticate to their home or local cloud, or local identity provider Keystone instance to obtain a scoped token.

  2. Users must discover which service providers (or remote clouds) are available to them by querying their local cloud.

  3. For a given remote cloud, users must discover which resources are available to them by querying the remote cloud for the projects they can scope to.

  4. To talk to the remote cloud, users must first exchange, with the local cloud, their locally scoped token for a SAML2 assertion to present to the remote cloud.

  5. Users then present the SAML2 assertion to the remote cloud. The remote cloud applies its mapping for the incoming SAML2 assertion to map each user to a local ephemeral persona (such as groups) and issues an unscoped token.

  6. Users query the remote cloud for the list of projects they have access to.

  7. Users then rescope their token to a given project.

  8. Users now have access to the resources owned by the project.

The following diagram illustrates the flow of authentication requests.

Keystone Authentication Flow
Figure 4.1: Keystone Authentication Flow

4.10.5 Additional Testing Scenarios

The following tests assume one identity provider and one service provider.

Test Case 1: Any federated user in the identity provider maps to a single designated group in the service provider

  1. On the identity provider side:

    hostname=myidp.com
    username=user1
  2. On the service provider side:

    group=group1
    group_domain_name=domain1
    'group1' scopes to 'project1'
  3. Mapping used:

    testcase1_1.json

    [
      {
        "local": [
          {
            "user": {
              "name": "{0}"
            }
          },
          {
            "group": {
               "name": "group1",
               "domain":{
                 "name": "domain1"
               }
            }
          }
        ],
        "remote":[{
          "type": "openstack_user"
        },
        {
          "type": "Shib-Identity-Provider",
          "any_one_of":[
             "https://myidp.com:5000/v3/OS-FEDERATION/saml2/idp"
          ]
         }
        ]
       }
    ]
  4. Expected result: The federated user will scope to project1.

Test Case 2: A federated user in a specific domain in the identity provider maps to two different groups in the service provider

  1. On the identity provider side:

    hostname=myidp.com
    username=user1
    user_domain_name=Default
  2. On the service provider side:

    group=group1
    group_domain_name=domain1
    'group1' scopes to 'project1'
    group=group2
    group_domain_name=domain2
    'group2' scopes to 'project2'
  3. Mapping used:

    testcase1_2.json

    [
      {
        "local": [
          {
            "user": {
              "name": "{0}"
            }
          },
          {
            "group": {
               "name": "group1",
               "domain":{
                 "name": "domain1"
               }
            }
          }
        ],
        "remote":[{
          "type": "openstack_user"
        },
        {
          "type": "Shib-Identity-Provider",
          "any_one_of":[
             "https://myidp.com:5000/v3/OS-FEDERATION/saml2/idp"
          ]
         }
        ]
       },
      {
        "local": [
          {
            "user": {
              "name": "{0}"
            }
          },
          {
            "group": {
               "name": "group2",
               "domain":{
                 "name": "domain2"
               }
            }
          }
        ],
        "remote":[{
          "type": "openstack_user"
        },
        {
          "type": "openstack_user_domain",
          "any_one_of": [
              "Default"
          ]
        },
        {
          "type": "Shib-Identity-Provider",
          "any_one_of":[
             "https://myidp.com:5000/v3/OS-FEDERATION/saml2/idp"
          ]
         }
        ]
       }
    ]
  4. Expected result: The federated user will scope to both project1 and project2.

Test Case 3: A federated user with a specific project in the identity provider maps to a specific group in the service provider

  1. On the identity provider side:

    hostname=myidp.com
    username=user4
    user_project_name=test1
  2. On the service provider side:

    group=group4
    group_domain_name=domain4
    'group4' scopes to 'project4'
  3. Mapping used:

    testcase1_3.json

    [
      {
        "local": [
          {
            "user": {
              "name": "{0}"
            }
          },
          {
            "group": {
               "name": "group4",
               "domain":{
                 "name": "domain4"
               }
            }
          }
        ],
        "remote":[{
          "type": "openstack_user"
        },
        {
          "type": "openstack_project",
          "any_one_of": [
              "test1"
          ]
        },
        {
          "type": "Shib-Identity-Provider",
          "any_one_of":[
             "https://myidp.com:5000/v3/OS-FEDERATION/saml2/idp"
          ]
         }
        ]
       },
      {
        "local": [
          {
            "user": {
              "name": "{0}"
            }
          },
          {
            "group": {
               "name": "group5",
               "domain":{
                 "name": "domain5"
               }
            }
          }
        ],
        "remote":[{
          "type": "openstack_user"
        },
        {
          "type": "openstack_roles",
          "not_any_of": [
              "_member_"
          ]
        },
        {
          "type": "Shib-Identity-Provider",
          "any_one_of":[
             "https://myidp.com:5000/v3/OS-FEDERATION/saml2/idp"
          ]
         }
        ]
       }
    ]
  4. Expected result: The federated user will scope to project4.

Test Case 4: A federated user with a specific role in the identity provider maps to a specific group in the service provider

  1. On the identity provider side:

    hostname=myidp.com, username=user5, role_name=_member_
  2. On the service provider side:

    group=group5, group_domain_name=domain5, 'group5' scopes to 'project5'
  3. Mapping used:

    testcase1_3.json

    [
      {
        "local": [
          {
            "user": {
              "name": "{0}"
            }
          },
          {
            "group": {
               "name": "group4",
               "domain":{
                 "name": "domain4"
               }
            }
          }
        ],
        "remote":[{
          "type": "openstack_user"
        },
        {
          "type": "openstack_project",
          "any_one_of": [
              "test1"
          ]
        },
        {
          "type": "Shib-Identity-Provider",
          "any_one_of":[
             "https://myidp.com:5000/v3/OS-FEDERATION/saml2/idp"
          ]
         }
        ]
       },
      {
        "local": [
          {
            "user": {
              "name": "{0}"
            }
          },
          {
            "group": {
               "name": "group5",
               "domain":{
                 "name": "domain5"
               }
            }
          }
        ],
        "remote":[{
          "type": "openstack_user"
        },
        {
          "type": "openstack_roles",
          "not_any_of": [
              "_member_"
          ]
        },
        {
          "type": "Shib-Identity-Provider",
          "any_one_of":[
             "https://myidp.com:5000/v3/OS-FEDERATION/saml2/idp"
          ]
         }
        ]
       }
    ]
  4. Expected result: The federated user will scope to project5.

Test Case 5: Retain the previous scope for a federated user

  1. On the identity provider side:

    hostname=myidp.com, username=user1, user_domain_name=Default
  2. On the service provider side:

    group=group1, group_domain_name=domain1, 'group1' scopes to 'project1'
  3. Mapping used:

    testcase1_1.json

    [
      {
        "local": [
          {
            "user": {
              "name": "{0}"
            }
          },
          {
            "group": {
               "name": "group1",
               "domain":{
                 "name": "domain1"
               }
            }
          }
        ],
        "remote":[{
          "type": "openstack_user"
        },
        {
          "type": "Shib-Identity-Provider",
          "any_one_of":[
             "https://myidp.com:5000/v3/OS-FEDERATION/saml2/idp"
          ]
         }
        ]
       }
    ]
  4. Expected result: The federated user will scope to project1. Later, we would like to scope federated users who have the default domain in the identity provider to project2 in addition to project1.

  5. On the identity provider side:

    hostname=myidp.com, username=user1, user_domain_name=Default
  6. On the service provider side:

    group=group1
    group_domain_name=domain1
    'group1' scopes to 'project1'
    group=group2
    group_domain_name=domain2
    'group2' scopes to 'project2'
  7. Mapping used:

    testcase1_2.json

    [
      {
        "local": [
          {
            "user": {
              "name": "{0}"
            }
          },
          {
            "group": {
               "name": "group1",
               "domain":{
                 "name": "domain1"
               }
            }
          }
        ],
        "remote":[{
          "type": "openstack_user"
        },
        {
          "type": "Shib-Identity-Provider",
          "any_one_of":[
             "https://myidp.com:5000/v3/OS-FEDERATION/saml2/idp"
          ]
         }
        ]
       },
      {
        "local": [
          {
            "user": {
              "name": "{0}"
            }
          },
          {
            "group": {
               "name": "group2",
               "domain":{
                 "name": "domain2"
               }
            }
          }
        ],
        "remote":[{
          "type": "openstack_user"
        },
        {
          "type": "openstack_user_domain",
          "any_one_of": [
              "Default"
          ]
        },
        {
          "type": "Shib-Identity-Provider",
          "any_one_of":[
             "https://myidp.com:5000/v3/OS-FEDERATION/saml2/idp"
          ]
         }
        ]
       }
    ]
  8. Expected result: The federated user will scope to project1 and project2.

Test Case 6: Scope a federated user to a domain

  1. On the identity provider side:

    hostname=myidp.com, username=user1
  2. On the service provider side:

    group=group1, group_domain_name=domain1, 'group1' scopes to 'project1'
  3. Mapping used:

    testcase1_1.json

    [
      {
        "local": [
          {
            "user": {
              "name": "{0}"
            }
          },
          {
            "group": {
               "name": "group1",
               "domain":{
                 "name": "domain1"
               }
            }
          }
        ],
        "remote":[{
          "type": "openstack_user"
        },
        {
          "type": "Shib-Identity-Provider",
          "any_one_of":[
             "https://myidp.com:5000/v3/OS-FEDERATION/saml2/idp"
          ]
         }
        ]
       }
    ]
  4. Expected result:

    • The federated user will scope to project1.

    • User uses CLI/Curl to assign any existing role to group1 on domain1.

    • User uses CLI/Curl to remove project1 scope from group1.

  5. Final result: The federated user will scope to domain1.

Test Case 7: Test five remote attributes for mapping

  1. Test all five different remote attributes, as follows, with similar test cases as noted previously.

    • openstack_user

    • openstack_user_domain

    • openstack_roles

    • openstack_project

    • openstack_project_domain

    The attribute openstack_user does not make much sense for testing because it is mapped only to a specific username. The preceding test cases have already covered the attributes openstack_user_domain, openstack_roles, and openstack_project.

Note that similar tests have also been run for two identity providers with one service provider, and for one identity provider with two service providers.

4.10.6 Known Issues and Limitations

Keep the following points in mind:

  • When a user is disabled in the identity provider, a federated token already issued by the service provider remains valid until it expires, based on the Keystone token expiration setting.

  • An already issued federated token retains its scope until its expiration. Any changes in the mapping on the service provider will not impact the scope of an already issued federated token. For example, if an already issued federated token was mapped to group1, which has scope on project1, and the mapping is changed to group2, which has scope on project2, the previously issued federated token still has scope on project1.

  • Access to service provider resources is provided only through the python-keystoneclient CLI or the Keystone API. No Horizon web interface support is currently available.

  • Domains, projects, groups, roles, and quotas are created per the service provider cloud. Support for federated projects, groups, roles, and quotas is currently not available.

  • Keystone-to-Keystone federation and WebSSO cannot be configured by putting both sets of configuration attributes in the same config file; they will overwrite each other. Consequently, they need to be configured individually.

  • Scoping the federated user to a domain is not supported by default in the playbook. Please follow the steps at Section 4.10.7, “Scope Federated User to Domain”.

4.10.7 Scope Federated User to Domain

Use the following steps to scope a federated user to a domain:

  1. On the IdP side, set hostname=myidp.com and username=user1.

  2. On the service provider side, set: group=group1, group_domain_name=domain1, group1 scopes to project1.

  3. Mapping used: testcase1_1.json.

    testcase1_1.json

    [
      {
        "local": [
          {
            "user": {
              "name": "{0}"
            }
          },
          {
            "group": {
               "name": "group1",
               "domain":{
                 "name": "domain1"
               }
            }
          }
        ],
        "remote":[{
          "type": "openstack_user"
        },
        {
          "type": "Shib-Identity-Provider",
          "any_one_of":[
             "https://myidp.com:5000/v3/OS-FEDERATION/saml2/idp"
          ]
         }
        ]
       }
    ]
  4. Expected result: The federated user will scope to project1. Use CLI/Curl to assign any existing role to group1 on domain1. Use CLI/Curl to remove project1 scope from group1. (The equivalent API calls are sketched after this procedure.)

  5. Result: The federated user will scope to domain1.
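
The CLI/Curl operations in step 4 correspond to the standard Identity v3 role assignment calls shown below. This is a minimal sketch using the Python requests library; the endpoint, the admin token, and the group, role, domain, and project IDs are placeholders that you must look up first (for example with openstack role list, openstack group list --domain domain1, and so on).

# scope_group_to_domain.py - illustrative sketch; all IDs and the token are placeholders
import requests

KEYSTONE = 'https://sp_public_endpoint:5000/v3'
HEADERS = {'X-Auth-Token': 'ADMIN_TOKEN'}
GROUP_ID = 'id of group1'
ROLE_ID = 'id of an existing role'
DOMAIN_ID = 'id of domain1'
PROJECT_ID = 'id of project1'

# Assign the role to group1 on domain1 (grant domain scope).
r = requests.put('%s/domains/%s/groups/%s/roles/%s'
                 % (KEYSTONE, DOMAIN_ID, GROUP_ID, ROLE_ID),
                 headers=HEADERS, verify=False)
print('domain grant:', r.status_code)      # 204 on success

# Remove the role from group1 on project1 (drop the project scope).
r = requests.delete('%s/projects/%s/groups/%s/roles/%s'
                    % (KEYSTONE, PROJECT_ID, GROUP_ID, ROLE_ID),
                    headers=HEADERS, verify=False)
print('project revoke:', r.status_code)    # 204 on success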

4.11 Configuring Web Single Sign-On

This topic explains how to implement web single sign-on.

4.11.1 What is WebSSO?

WebSSO, or web single sign-on, is a method for web browsers to receive current authentication information from an identity provider system without requiring a user to log in again to the application displayed by the browser. Users initially access the identity provider web page and supply their credentials. If the user successfully authenticates with the identity provider, the authentication credentials are then stored in the user’s web browser and automatically provided to all web-based applications, such as the Horizon dashboard in SUSE OpenStack Cloud 8. If users have not yet authenticated with an identity provider or their credentials have timed out, they are automatically redirected to the identity provider to renew their credentials.

4.11.2 Limitations

  • The WebSSO function supports only Horizon web authentication. It is not supported for direct API or CLI access.

  • WebSSO works only with the Fernet token provider. See Section 4.8.4, “Fernet Tokens”.

  • The SUSE OpenStack Cloud WebSSO function was tested with Microsoft Active Directory Federation Services (AD FS). The instructions provided are pertinent to AD FS and are intended to provide a sample configuration for deploying WebSSO with an external identity provider. If you have a different identity provider such as Ping Identity or IBM Tivoli, consult with those vendors for specific instructions for those products.

  • Only WebSSO federation using the SAML method is supported in SUSE OpenStack Cloud 8. OpenID-based federation is not currently supported.

  • WebSSO has a change password option in User Settings, but note that this function is not accessible for users authenticating with external systems such as LDAP or SAML Identity Providers.

4.11.3 Enabling WebSSO

SUSE OpenStack Cloud 8 provides WebSSO support for the Horizon web interface. This support requires several configuration steps including editing the Horizon configuration file as well as ensuring that the correct Keystone authentication configuration is enabled to receive the authentication assertions provided by the identity provider.

The following workflow depicts how Horizon and Keystone support WebSSO if no current authentication assertion is available.

  1. Horizon redirects the web browser to the Keystone endpoint.

  2. Keystone automatically redirects the web browser to the correct identity provider authentication web page based on the Keystone configuration file.

  3. The user authenticates with the identity provider.

  4. The identity provider automatically redirects the web browser back to the Keystone endpoint.

  5. Keystone generates the required Javascript code to POST a token back to Horizon.

  6. Keystone automatically redirects the web browser back to Horizon and the user can then access projects and resources assigned to the user.

The following diagram provides more details on the WebSSO authentication workflow.

Note that the Horizon dashboard service never talks directly to the Keystone identity service until the end of the sequence, after the federated unscoped token negotiation has completed. The browser interacts with the Horizon dashboard service, the Keystone identity service, and AD FS on their respective public endpoints.

The following sequence of events is depicted in the diagram.

  1. The user's browser reaches the Horizon dashboard service's login page. The user selects AD FS login from the drop-down menu.

  2. The Horizon dashboard service issues an HTTP Redirect (301) to redirect the browser to the Keystone identity service's (public) SAML2 Web SSO endpoint (/auth/OS-FEDERATION/websso/saml2). The endpoint is protected by Apache mod_shib (shibboleth).

  3. The browser talks to the Keystone identity service. Because the user's browser does not have an active session with AD FS, the Keystone identity service issues an HTTP Redirect (301) to the browser, along with the required SAML2 request, to the AD FS endpoint.

  4. The browser talks to AD FS. AD FS returns a login form. The browser presents it to the user.

  5. The user enters credentials (such as username and password) and submits the form to AD FS.

  6. Upon successful validation of the user's credentials, AD FS issues an HTTP Redirect (301) to the browser, along with the SAML2 assertion, to the Keystone identity service's (public) SAML2 endpoint (/auth/OS-FEDERATION/websso/saml2).

  7. The browser talks to the Keystone identity service. The Keystone identity service validates the SAML2 assertion and issues a federated unscoped token. The Keystone identity service returns JavaScript code to be executed by the browser, along with the federated unscoped token in the headers.

  8. Upon execution of the JavaScript code, the browser is redirected to the Horizon dashboard service with the federated unscoped token in the header.

  9. The browser talks to the Horizon dashboard service with the federated unscoped token.

  10. With the unscoped token, the Horizon dashboard service talks to the Keystone identity service's (internal) endpoint to get a list of projects the user has access to.

  11. The Horizon dashboard service rescopes the token to the first project in the list. At this point, the user is successfully logged in.

4.11.4 Prerequisites

4.11.4.1 Creating AD FS metadata

For information about creating Active Directory Federation Services metadata, see the section To create edited AD FS 2.0 metadata with an added scope element of https://technet.microsoft.com/en-us/library/gg317734.

  1. On the AD FS computer, use a browser such as Internet Explorer to view https://<adfs_server_hostname>/FederationMetadata/2007-06/FederationMetadata.xml.

  2. On the File menu, click Save as, and then navigate to the Windows desktop and save the file with the name adfs_metadata.xml. Make sure to change the Save as type drop-down box to All Files (*.*).

  3. Use Windows Explorer to navigate to the Windows desktop, right-click adfs_metadata.xml, and then click Edit.

  4. In Notepad, insert the following XML in the first element. Before editing, the EntityDescriptor appears as follows:

    <EntityDescriptor ID="abc123" entityID="http://WIN-CAICP35LF2I.vlan44.domain/adfs/services/trust" xmlns="urn:oasis:names:tc:SAML:2.0:metadata">

    After editing, it should look like this:

    <EntityDescriptor ID="abc123" entityID="http://WIN-CAICP35LF2I.vlan44.domain/adfs/services/trust" xmlns="urn:oasis:names:tc:SAML:2.0:metadata" xmlns:shibmd="urn:mace:shibboleth:metadata:1.0">
  5. In Notepad, on the Edit menu, click Find. In Find what, type IDPSSO, and then click Find Next.

  6. Insert the following XML in this section. Before editing, the IDPSSODescriptor appears as follows:

    <IDPSSODescriptor protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol"><KeyDescriptor use="encryption">

    After editing, it should look like this:

    <IDPSSODescriptor protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol"><Extensions><shibmd:Scope regexp="false">vlan44.domain</shibmd:Scope></Extensions><KeyDescriptor use="encryption">
  7. Delete the metadata document signature section of the file (the ds:Signature element shown in the following code). Because you have edited the document, the signature will now be invalid. Before editing, the signature appears as follows:

    <EntityDescriptor ID="abc123" entityID="http://FSWEB.contoso.com/adfs/services/trust" xmlns="urn:oasis:names:tc:SAML:2.0:metadata" xmlns:shibmd="urn:mace:shibboleth:metadata:1.0">
    <ds:Signature xmlns:ds="http://www.w3.org/2000/09/xmldsig#">
        SIGNATURE DATA
    </ds:Signature>
    <RoleDescriptor xsi:type=…>

    After editing it should look like this:

    <EntityDescriptor ID="abc123" entityID="http://FSWEB.contoso.com/adfs/services/trust" xmlns="urn:oasis:names:tc:SAML:2.0:metadata" xmlns:shibmd="urn:mace:shibboleth:metadata:1.0">
    <RoleDescriptor xsi:type=…>
  8. Save and close adfs_metadata.xml.

  9. Copy adfs_metadata.xml to the Cloud Lifecycle Manager node in your preferred location. Here it is /tmp.

4.11.4.2 Setting Up WebSSO

Start by creating a config file adfs_config.yml with the following parameters and place it in any directory on your Cloud Lifecycle Manager, such as /tmp.

keystone_trusted_idp: adfs
keystone_sp_conf:
    idp_metadata_file: /tmp/adfs_metadata.xml
    shib_sso_application_entity_id: http://sp_uri_entityId
    shib_sso_idp_entity_id: http://default_idp_uri_entityId
    target_domain:
        name: domain1
        description: my domain
    target_project:
        name: project1
        description: my project
    target_group:
        name: group1
        description: my group
    role:
        name: service
    identity_provider:
        id: adfs_idp1
        description: This is the AD FS identity provider.
    mapping:
        id: mapping1
        rules_file: adfs_mapping.json
    protocol:
        id: saml2
    attribute_map:
        -
          name: http://schemas.xmlsoap.org/claims/Group
          id: ADFS_GROUP
        -
          name: urn:oid:1.3.6.1.4.1.5923.1.1.1.6
          id: ADFS_LOGIN

A sample config file like this exists in roles/KEY-API/files/samples/websso/keystone_configure_adfs_sample.yml. Here are some detailed descriptions for each of the config options:

keystone_trusted_idp: A flag to indicate if this configuration is used for WebSSO or K2K. The value can be either 'adfs' or 'k2k'.
keystone_sp_conf:
    shib_sso_idp_entity_id: The AD FS URI used as an entity Id to identify the IdP.
    shib_sso_application_entity_id: The Service Provider URI used as an entity Id. It can be any URI here for WebSSO, as long as it is unique to the SP.
    target_domain: The domain where the group will be created.
        name: Any domain name. If it does not exist, it will be created or updated.
        description: Any description.
    target_project: The project scope of the group.
        name: Any project name. If it does not exist, it will be created or updated.
        description: Any description.
    target_group: A group that will be created in 'target_domain'.
        name: Any group name. If it does not exist, it will be created or updated.
        description: Any description.
    role: A role that will be assigned on 'target_project'. This role determines the permissions of the IdP user's scoped token on the SP side.
        name: Must be an existing role.
    idp_metadata_file: A reference to the AD FS metadata file that validates the SAML2 assertion.
    identity_provider: An AD FS IdP.
        id: Any Id. If it does not exist, it will be created or updated. This Id needs to be shared with the client so that the right mapping will be selected.
        description: Any description.
    mapping: A mapping in JSON format that maps a federated user to a corresponding group.
        id: Any Id. If it does not exist, it will be created or updated.
        rules_file: A reference to the file that contains the mapping in JSON.
    protocol: The supported federation protocol.
        id: 'saml2' is the only supported protocol for WebSSO.
    attribute_map: A Shibboleth mapping that defines additional attributes to map attributes from the SAML2 assertion to the WebSSO mapping that the SP understands.
        -
          name: An attribute name from the SAML2 assertion.
          id: An Id that the preceding name will be mapped to.
  1. In the preceding config file, /tmp/adfs_config.yml, make sure the idp_metadata_file references the previously generated AD FS metadata file. In this case:

    idp_metadata_file: /tmp/adfs_metadata.xml
  2. Create the mapping file that is referenced from the preceding config file, for example /tmp/adfs_sp_mapping.json, and point rules_file at it (rules_file: /tmp/adfs_sp_mapping.json). The following is an example of the mapping file, which exists in roles/KEY-API/files/samples/websso/adfs_sp_mapping.json:

    [
        {
            "local": [{
                "user": {
                    "name": "{0}"
                }
            }],
            "remote": [{
                "type": "ADFS_LOGIN"
            }]
        },
        {
            "local": [{
                "group": {
                    "id": "GROUP_ID"
                }
            }],
            "remote": [{
                "type": "ADFS_GROUP",
                "any_one_of": [
                    "Domain Users"
                ]
            }]
        }
    ]

    You can find more details about how the WebSSO mapping works at http://docs.openstack.org. Also see Section 4.11.4.3, “Mapping rules” for more information.

  3. Go to ~/scratch/ansible/next/ardana/ansible and run the following playbook to enable WebSSO in the Keystone identity service:

    ansible-playbook -i hosts/verb_hosts keystone-reconfigure.yml -e@/tmp/adfs_config.yml
  4. Enable WebSSO in the Horizon dashboard service by setting horizon_websso_enabled flag to True in roles/HZN-WEB/defaults/main.yml and then run the horizon-reconfigure playbook:

    ardana > ansible-playbook -i hosts/verb_hosts horizon-reconfigure.yml
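
    The flag itself is a single YAML entry. A minimal sketch of the relevant excerpt of roles/HZN-WEB/defaults/main.yml follows; only the flag named above is shown, and the other settings in that file are left as they are:

    horizon_websso_enabled: True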

4.11.4.3 Mapping rules

Each IdP-SP pair has only one mapping. The last mapping that you configure is the one used; it overwrites the previous mapping setting. Therefore, if the example mapping adfs_sp_mapping.json is used, the following behavior is expected, because it maps the federated user only to the one group configured in keystone_configure_adfs_sample.yml.

  • Configure domain1/project1/group1 with mapping1; after WebSSO login to Horizon, you see project1.

  • Then reconfigure domain1/project2/group1 with mapping1; after WebSSO login to Horizon, you see project1 and project2.

  • Reconfigure domain3/project3/group3 with mapping1; after WebSSO login to Horizon, you only see project3, because the IdP mapping now maps the federated user to group3, which only has privileges on project3.

If you need a more complex mapping, you can use a custom mapping file, which needs to be specified in keystone_configure_adfs_sample.yml -> rules_file.

You can use different attributes of the AD FS user in order to map to different or multiple groups.

An example of a more complex mapping file is adfs_sp_mapping_multiple_groups.json, as follows.

adfs_sp_mapping_multiple_groups.json

[
  {
    "local": [
      {
        "user": {
          "name": "{0}"
        }
      },
      {
        "group": {
           "name": "group1",
           "domain":{
             "name": "domain1"
           }
        }
      }
    ],
    "remote":[{
      "type": "ADFS_LOGIN"
    },
    {
      "type": "ADFS_GROUP",
      "any_one_of":[
         "Domain Users"
      ]
     }
    ]
   },
  {
    "local": [
      {
        "user": {
          "name": "{0}"
        }
      },
      {
        "group": {
           "name": "group2",
           "domain":{
             "name": "domain2"
           }
        }
      }
    ],
    "remote":[{
      "type": "ADFS_LOGIN"
    },
    {
      "type": "ADFS_SCOPED_AFFILIATION",
      "any_one_of": [
          "member@contoso.com"
      ]
    }
    ]
   }
]

The adfs_sp_mapping_multiple_groups.json file must be used together with keystone_configure_mutiple_groups_sample.yml, which adds a new attribute to the Shibboleth attribute map. That file is as follows:

keystone_configure_mutiple_groups_sample.yml

#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
#
---

keystone_trusted_idp: adfs
keystone_sp_conf:
    identity_provider:
        id: adfs_idp1
        description: This is the AD FS identity provider.
    idp_metadata_file: /opt/stack/adfs_metadata.xml

    shib_sso_application_entity_id: http://blabla
    shib_sso_idp_entity_id: http://WIN-CAICP35LF2I.vlan44.domain/adfs/services/trust

    target_domain:
        name: domain2
        description: my domain

    target_project:
        name: project6
        description: my project

    target_group:
        name: group2
        description: my group

    role:
        name: admin

    mapping:
        id: mapping1
        rules_file: /opt/stack/adfs_sp_mapping_multiple_groups.json

    protocol:
        id: saml2

    attribute_map:
        -
          name: http://schemas.xmlsoap.org/claims/Group
          id: ADFS_GROUP
        -
          name: urn:oid:1.3.6.1.4.1.5923.1.1.1.6
          id: ADFS_LOGIN
        -
          name: urn:oid:1.3.6.1.4.1.5923.1.1.1.9
          id: ADFS_SCOPED_AFFILIATION

4.11.5 Setting up the AD FS server as the identity provider

For AD FS to be able to communicate with the Keystone identity service, you need to add the Keystone identity service as a trusted relying party for AD FS and also specify the user attributes that you want to send to the Keystone identity service when users authenticate via WebSSO.

For more information, see the Microsoft AD FS wiki, section "Step 2: Configure AD FS 2.0 as the identity provider and shibboleth as the Relying Party".

Log in to the AD FS server.

Add a relying party using metadata

  1. From Server Manager Dashboard, click Tools on the upper right, then ADFS Management.

  2. Right-click ADFS, and then select Add Relying Party Trust.

  3. Click Start, leaving the default option Import data about the relying party published online or on a local network selected.

  4. In the Federation metadata address field, type <keystone_publicEndpoint>/Shibboleth.sso/Metadata (your Keystone identity service metadata endpoint), and then click Next. You can also import the metadata from a file. Create a file with the output of the following curl command

    curl <keystone_publicEndpoint>/Shibboleth.sso/Metadata

    and then choose this file for importing the metadata for the relying party.

  5. In the Specify Display Name page, choose a proper name to identify this trust relationship, and then click Next.

  6. On the Choose Issuance Authorization Rules page, leave the default Permit all users to access the relying party selected, and then click Next.

  7. Click Next, and then click Close.

Edit claim rules for relying party trust

  1. The Edit Claim Rules dialog box should already be open. If not, in the ADFS center pane, under Relying Party Trusts, right-click your newly created trust, and then click Edit Claim Rules.

  2. On the Issuance Transform Rules tab, click Add Rule.

  3. On the Select Rule Template page, select Send LDAP Attributes as Claims, and then click Next.

  4. On the Configure Rule page, in the Claim rule name box, type Get Data.

  5. In the Attribute Store list, select Active Directory.

  6. In the Mapping of LDAP attributes section, create the following mappings.

    LDAP Attribute                      Outgoing Claim Type
    Token-Groups – Unqualified Names    Group
    User-Principal-Name                 UPN
  7. Click Finish.

  8. On the Issuance Transform Rules tab, click Add Rule.

  9. On the Select Rule Template page, select Send Claims Using a Custom Rule, and then click Next.

  10. In the Configure Rule page, in the Claim rule name box, type Transform UPN to epPN.

  11. In the Custom Rule window, type or copy and paste the following:

    c:[Type == "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/upn"]
    => issue(Type = "urn:oid:1.3.6.1.4.1.5923.1.1.1.6", Value = c.Value, Properties["http://schemas.xmlsoap.org/ws/2005/05/identity/claimproperties/attributename"] = "urn:oasis:names:tc:SAML:2.0:attrname-format:uri");
  12. Click Finish.

  13. On the Issuance Transform Rules tab, click Add Rule.

  14. On the Select Rule Template page, select Send Claims Using a Custom Rule, and then click Next.

  15. On the Configure Rule page, in the Claim rule name box, type Transform Group to epSA.

  16. In the Custom Rule window, type or copy and paste the following:

    c:[Type == "http://schemas.xmlsoap.org/claims/Group", Value == "Domain Users"]
    => issue(Type = "urn:oid:1.3.6.1.4.1.5923.1.1.1.9", Value = "member@contoso.com", Properties["http://schemas.xmlsoap.org/ws/2005/05/identity/claimproperties/attributename"] = "urn:oasis:names:tc:SAML:2.0:attrname-format:uri");
  17. Click Finish, and then click OK.

This list of Claim Rules is just an example and can be modified or enhanced based on the customer's necessities and AD FS setup specifics.

Create a sample user on the AD FS server

  1. From the Server Manager Dashboard, click Tools on the upper right, then Active Directory Users and Computers.

  2. Right-click Users, then New, and then User.

  3. Follow the on-screen instructions.

You can test the Horizon dashboard service "Login with ADFS" by opening a browser at the Horizon dashboard service URL and choosing Authenticate using: ADFS Credentials. You should be redirected to the ADFS login page and be able to log in to the Horizon dashboard service with your ADFS credentials.

4.12 Identity Service Notes and Limitations

4.12.1 Notes

This topic describes limitations of, and important notes pertaining to, the identity service.

Domains

  • Domains can be created and managed by the Horizon web interface, Keystone API and OpenStackClient CLI.

  • The configuration of external authentication systems requires the creation and usage of Domains.

  • All configurations are managed by creating and editing specific configuration files.

  • End users can authenticate to a particular project and domain via the Horizon web interface, Keystone API and OpenStackClient CLI.

  • A new Horizon login page that requires a Domain entry is now installed by default.

Keystone-to-Keystone Federation

  • Keystone-to-Keystone (K2K) Federation provides the ability to authenticate once with one cloud and then use these credentials to access resources on other federated clouds.

  • All configurations are managed by creating and editing specific configuration files.

Multi-Factor Authentication (MFA)

  • The Keystone architecture provides support for MFA deployments.

  • MFA provides the ability to deploy non-password-based authentication, for example, hardware tokens and text messages.

Hierarchical Multitenancy

  • Provides the ability to create sub-projects within a Domain-Project hierarchy.

4.12.2 Limitations

Authentication with external authentication systems (LDAP, Active Directory (AD) or Identity Providers)

  • No Horizon web portal support currently exists for the creation and management of external authentication system configurations.

Integration with LDAP services

SUSE OpenStack Cloud 8 domain-specific configuration:

  • No Global User Listing: Once domain-specific driver configuration is enabled, listing all users and listing all groups are not supported operations. Those calls require a specific domain filter and a domain-scoped token for the target domain.

  • You cannot have both a file store and a database store for domain-specific driver configuration in a single identity service instance. Once a database store is enabled within the identity service instance, any file store will be ignored, and vice versa.

  • The identity service allows a list limit configuration to globally set the maximum number of entities that will be returned in an identity collection per request but it does not support per-domain list limit setting at this time.

  • Each time a new domain is configured with LDAP integration the single CA file gets overwritten. Ensure that you place certs for all the LDAP back-end domains in the cacert parameter. Detailed CA file inclusion instructions are provided in the comments of the sample YAML configuration file keystone_configure_ldap_my.yml (see Section 4.9.2, “Set up domain-specific driver configuration - file store”).

  • LDAP is only supported for identity operations (reading users and groups from LDAP).

  • Keystone assignment operations from LDAP records, such as managing or assigning roles and projects, are not currently supported.

  • The SUSE OpenStack Cloud 'default' domain is pre-configured to store service account users and is authenticated locally against the identity service. Domains configured for external LDAP integration are non-default domains.

  • When using the current OpenStackClient CLI you must use the user ID rather than the user name when working with a non-default domain.

  • Each LDAP connection with the identity service is for read-only operations. Configurations that require identity service write operations (to create users, groups, etc.) are not currently supported.

SUSE OpenStack Cloud 8 API-based domain-specific configuration management

  • No GUI dashboard for domain-specific driver configuration management

  • API-based domain-specific configuration does not check the type of an option.

  • API-based domain-specific configuration does not check whether option values are supported.

  • The API-based domain configuration method does not provide retrieval of the default values of domain-specific configuration options.

  • Status: Domain-specific driver configuration database store is a non-core feature for SUSE OpenStack Cloud 8.

4.12.3 Keystone-to-Keystone federation

  • When a user is disabled in the identity provider, the issued federated token from the service provider still remains valid until the token is expired based on the Keystone expiration setting.

  • An already issued federated token will retain its scope until its expiration. Any changes in the mapping on the service provider will not impact the scope of an already issued federated token. For example, if an already issued federated token was mapped to group1, which has scope on project1, and the mapping is changed to group2, which has scope on project2, the previously issued federated token still has scope on project1.

  • Access to service provider resources is provided only through the python-keystone CLI client or the Keystone API. No Horizon web interface support is currently available.

  • Domains, projects, groups, roles, and quotas are created per the service provider cloud. Support for federated projects, groups, roles, and quotas is currently not available.

  • Keystone-to-Keystone federation and WebSSO cannot be configured by putting both sets of configuration attributes in the same config file; they will overwrite each other. Consequently, they need to be configured individually.

  • Scoping the federated user to a domain is not supported by default in the playbook. To enable it, see the steps in Section 4.10.7, “Scope Federated User to Domain”.

  • No Horizon web portal support currently exists for the creation and management of federation configurations.

  • All end user authentication is available only via the Keystone API and OpenStackClient CLI.

  • Additional information can be found at http://docs.openstack.org.

WebSSO

  • The WebSSO function supports only Horizon web authentication. It is not supported for direct API or CLI access.

  • WebSSO works only with the Fernet token provider. See Section 4.8.4, “Fernet Tokens”.

  • The SUSE OpenStack Cloud WebSSO function was tested with Microsoft Active Directory Federation Services (ADFS). The instructions provided are pertinent to ADFS and are intended to provide a sample configuration for deploying WebSSO with an external identity provider. If you have a different identity provider such as Ping Identity or IBM Tivoli, consult with those vendors for specific instructions for those products.

  • Only WebSSO federation using the SAML method is supported in SUSE OpenStack Cloud 8. OpenID-based federation is not currently supported.

  • WebSSO has a change password option in User Settings, but note that this function is not accessible for users authenticating with external systems such as LDAP or SAML Identity Providers.

Multi-factor authentication (MFA)

Hierarchical multitenancy

Missing quota information for compute resources

Note

An error message will appear on the default Horizon page if you are running a Swift-only deployment (no Compute service). In this configuration, you will not see any quota information for Compute resources and will see the following error message:

The Compute service is not installed or is not configured properly. No information is available for Compute resources.

This error message is expected, as no Compute service is configured for this deployment. Please ignore it.

The following performance benchmark is based on 150 concurrent requests run for 10-minute periods of stable load.

Operation          In SUSE OpenStack Cloud 8 (secs/request)   In SUSE OpenStack Cloud 8 3.0 (secs/request)
Token Creation     0.86                                       0.42
Token Validation   0.47                                       0.41

Because token creation operations do not happen as frequently as token validation operations, the longer token creation time is unlikely to cause a noticeable performance problem.

4.12.4 System cron jobs need setup

Keystone relies on two cron jobs to periodically flush expired tokens and clean up revocation events. The cron jobs appear on the system as follows:

1 1 * * * /opt/stack/service/keystone/venv/bin/keystone-manage token_flush
1 1,5,10,15,20 * * * /opt/stack/service/keystone/venv/bin/revocation_cleanup.sh

By default, the two cron jobs are enabled on controller node 1 only, not on the other two nodes. When controller node 1 is down or has failed for any reason, these two cron jobs must be manually set up on one of the other two nodes.
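
If controller node 1 is unavailable, a minimal sketch of recreating the jobs on another controller node follows. It assumes the jobs run from the root crontab; verify the owning user and the exact entries on controller node 1 (for example, with sudo crontab -l) before copying them.

tux > sudo crontab -e

Then add the two entries shown above:

1 1 * * * /opt/stack/service/keystone/venv/bin/keystone-manage token_flush
1 1,5,10,15,20 * * * /opt/stack/service/keystone/venv/bin/revocation_cleanup.sh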

5 Managing Compute

Information about managing and configuring the Compute service.

5.1 Managing Compute Hosts using Aggregates and Scheduler Filters

OpenStack Nova has the concepts of availability zones and host aggregates that enable you to segregate your Compute hosts. Availability zones are used to specify logical separation within your cloud based on the physical isolation or redundancy you have set up. Host aggregates are used to group compute hosts together based upon common features, such as operating system. For more information, see Scaling and Segregating your Cloud.

The Nova scheduler also has a filter scheduler, which supports both filtering and weighting to make decisions on where new compute instances should be created. For more information, see Filter Scheduler and Scheduling.

This topic shows you how to set up a Nova host aggregate and how to configure the filter scheduler to further segregate your compute hosts.

5.1.1 Creating a Nova Aggregate

These steps will show you how to create a Nova aggregate and how to add a compute host to it. You can run these steps on any machine that has the Nova client installed and has network access to your cloud environment. These requirements are met by the Cloud Lifecycle Manager.

  1. Log in to the Cloud Lifecycle Manager.

  2. Source the administrative credentials:

    ardana > source ~/service.osrc
  3. List your current Nova aggregates:

    ardana > nova aggregate-list
  4. Create a new Nova aggregate with this syntax:

    ardana > nova aggregate-create AGGREGATE-NAME

    If you wish to have the aggregate appear as an availability zone, then specify an availability zone with this syntax:

    ardana > nova aggregate-create AGGREGATE-NAME AVAILABILITY-ZONE-NAME

    So, for example, if you wish to create a new aggregate for your SUSE Linux Enterprise compute hosts and you wanted that to show up as the SLE availability zone, you could use this command:

    ardana > nova aggregate-create SLE SLE

    This would produce an output similar to this:

    +----+------+-------------------+-------+--------------------------+
    | Id | Name | Availability Zone | Hosts | Metadata                 |
    +----+------+-------------------+-------+--------------------------+
    | 12 | SLE  | SLE               |       | 'availability_zone=SLE'  |
    +----+------+-------------------+-------+--------------------------+
  5. Next, you need to add compute hosts to this aggregate. Start by listing your current hosts, limiting the output of this command to only the hosts running the Compute service, like this:

    ardana > nova host-list | grep compute
  6. You can then add host(s) to your aggregate with this syntax:

    ardana > nova aggregate-add-host AGGREGATE-NAME HOST
  7. Then you can confirm that this has been completed by listing the details of your aggregate:

    ardana > nova aggregate-details AGGREGATE-NAME

    You can also list out your availability zones using this command:

    ardana > nova availability-zone-list
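
    Putting these steps together, a worked example for the SLE aggregate created above might look like the following sketch; the host name doc-cp1-comp0001-mgmt is only an illustration, so substitute a Compute host from your own host list:

    ardana > nova aggregate-add-host SLE doc-cp1-comp0001-mgmt
    ardana > nova aggregate-details SLE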

5.1.2 Using Nova Scheduler Filters

The Nova scheduler has two filters, described here, that can help with differentiating between compute hosts.

AggregateImagePropertiesIsolation

Isolates compute hosts based on image properties and aggregate metadata. You can use commas to specify multiple values for the same property. The filter then ensures that at least one value matches.

AggregateInstanceExtraSpecsFilter

Checks that the aggregate metadata satisfies any extra specifications associated with the instance type. This filter uses the aggregate_instance_extra_specs scope in the flavor extra specs.

Note
Note

For details about other available filters, see Filter Scheduler.

Using the AggregateImagePropertiesIsolation Filter

  1. Log in to the Cloud Lifecycle Manager.

  2. Edit the ~/openstack/my_cloud/config/nova/nova.conf.j2 file and add AggregateImagePropertiesIsolation to the scheduler_default_filters list in the scheduler section, as shown in the following example:

    # Scheduler
    ...
    scheduler_available_filters = nova.scheduler.filters.all_filters
    scheduler_default_filters = AvailabilityZoneFilter,RetryFilter,ComputeFilter,
     DiskFilter,RamFilter,ImagePropertiesFilter,ServerGroupAffinityFilter,
     ServerGroupAntiAffinityFilter,ComputeCapabilitiesFilter,NUMATopologyFilter,
     AggregateImagePropertiesIsolation
    ...

    Optionally, you can also add these lines:

    aggregate_image_properties_isolation_namespace = <a prefix string>
    aggregate_image_properties_isolation_separator = <a separator character>

    (the separator defaults to .)

    If these are added, the filter will only match image properties starting with the name space and separator - for example, setting to my_name_space and : would mean the image property my_name_space:image_type=SLE matches metadata image_type=SLE, but an_other=SLE would not be inspected for a match at all.

    If these are not added all image properties will be matched against any similarly named aggregate metadata.

  3. Add image properties to the images that should be scheduled using the above filter (see the example after this procedure).

  4. Commit the changes to git:

    ardana > git add -A
    ardana > git commit -a -m "editing nova schedule filters"
  5. Run the configuration processor:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost config-processor-run.yml
  6. Run the ready deployment playbook:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost ready-deployment.yml
  7. Run the Nova reconfigure playbook:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts nova-reconfigure.yml
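
As a hedged sketch of step 3 above, assuming the namespace and separator shown in the example (my_name_space and :) and the SLE aggregate created earlier, the aggregate metadata and image property could be set as follows; IMAGE-ID is a placeholder, and the openstack client is used here for the image update:

ardana > nova aggregate-set-metadata SLE image_type=SLE
ardana > openstack image set --property my_name_space:image_type=SLE IMAGE-ID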

Using the AggregateInstanceExtraSpecsFilter Filter

  1. Log in to the Cloud Lifecycle Manager.

  2. Edit the ~/openstack/my_cloud/config/nova/nova.conf.j2 file and add AggregateInstanceExtraSpecsFilter to the scheduler_default_filters list in the scheduler section, as shown in the following example:

    # Scheduler
    ...
    scheduler_available_filters = nova.scheduler.filters.all_filters
     scheduler_default_filters = AvailabilityZoneFilter,RetryFilter,ComputeFilter,
     DiskFilter,RamFilter,ImagePropertiesFilter,ServerGroupAffinityFilter,
     ServerGroupAntiAffinityFilter,ComputeCapabilitiesFilter,NUMATopologyFilter,
     AggregateInstanceExtraSpecsFilter
    ...
  3. There is no additional configuration needed because the following is true:

    1. The filter assumes : is a separator

    2. The filter will match all simple keys in extra_specs plus all keys with a separator if the prefix is aggregate_instance_extra_specs - for example, image_type=SLE and aggregate_instance_extra_specs:image_type=SLE will both be matched against aggregate metadata image_type=SLE

  4. Add extra_specs to the flavors that should be scheduled according to the above (see the example after this procedure).

  5. Commit the changes to git:

    ardana > git add -A
    ardana > git commit -a -m "Editing nova scheduler filters"
  6. Run the configuration processor:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost config-processor-run.yml
  7. Run the ready deployment playbook:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost ready-deployment.yml
  8. Run the Nova reconfigure playbook:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts nova-reconfigure.yml
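
As a hedged sketch of step 4 above, using the SLE aggregate created earlier, the aggregate metadata and a matching flavor extra spec could be set as follows; the flavor name m1.small is only an illustration:

ardana > nova aggregate-set-metadata SLE image_type=SLE
ardana > nova flavor-key m1.small set aggregate_instance_extra_specs:image_type=SLE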

5.2 Using Flavor Metadata to Specify CPU Model

Libvirt is a collection of software used in OpenStack to manage virtualization. It has the ability to emulate a host CPU model in a guest VM. In SUSE OpenStack Cloud Nova, the ComputeCapabilitiesFilter limits this ability by checking the exact CPU model of the compute host against the requested compute instance model. It will only pick compute hosts that have the cpu_model requested by the instance model, and if the selected compute host does not have that cpu_model, the ComputeCapabilitiesFilter moves on to find another compute host that matches, if possible. Selecting an unavailable vCPU model may cause Nova to fail with no valid host found.

To assist, there is a Nova scheduler filter that captures cpu_models as a subset of a particular CPU family. The filter determines if the host CPU model is capable of emulating the guest CPU model by maintaining the mapping of the vCPU models and comparing it with the host CPU model.

There is a limitation when a particular cpu_model is specified with hw:cpu_model via a compute flavor: the cpu_mode will be set to custom. This mode ensures that a persistent guest virtual machine will see the same hardware no matter what host physical machine the guest virtual machine is booted on. This allows easier live migration of virtual machines. Because of this limitation, only some of the features of a CPU are exposed to the guest. Requesting particular CPU features is not supported.

5.2.1 Editing the flavor metadata in the Horizon dashboard

These steps can be used to edit a flavor's metadata in the Horizon dashboard to add the extra_specs for a cpu_model:

  1. Access the Horizon dashboard and log in with admin credentials.

  2. Access the Flavors menu by (A) clicking on the menu button, (B) navigating to the Admin section, and then (C) clicking on Flavors:

  3. In the list of flavors, choose the flavor you wish to edit and click on the entry under the Metadata column:

    Note

    You can also create a new flavor and then choose that one to edit.

  4. In the Custom field, enter hw:cpu_model and then click on the + (plus) sign to continue:

  5. Then you will want to enter the CPU model into the field that you wish to use and then click Save:
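
As an alternative to the Horizon steps above, the same extra spec can be set from the command line. A minimal sketch follows; FLAVOR-NAME and the SandyBridge model name are placeholders for your own flavor and for a CPU model that your compute hosts can emulate:

ardana > nova flavor-key FLAVOR-NAME set hw:cpu_model=SandyBridge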

5.3 Forcing CPU and RAM Overcommit Settings

SUSE OpenStack Cloud supports overcommitting of CPU and RAM resources on compute nodes. Overcommitting is a technique of allocating more virtualized CPUs and/or memory than there are physical resources.

The default settings for this are:

cpu_allocation_ratio (default: 16)

Virtual CPU to physical CPU allocation ratio, which affects all CPU filters. This configuration specifies a global ratio for CoreFilter. AggregateCoreFilter falls back to this configuration value if no per-aggregate setting is found (see the example after this table).

Note

This can be set per compute node. If set to 0.0, the value set on the scheduler node(s) will be used, which defaults to 16.0.

ram_allocation_ratio (default: 1.0)

Virtual RAM to physical RAM allocation ratio, which affects all RAM filters. This configuration specifies a global ratio for RamFilter. AggregateRamFilter falls back to this configuration value if no per-aggregate setting is found.

Note

This can be set per compute node. If set to 0.0, the value set on the scheduler node(s) will be used, which defaults to 1.5.

disk_allocation_ratio (default: 1.0)

This is the virtual disk to physical disk allocation ratio used by the disk_filter.py script to determine if a host has sufficient disk space to fit a requested instance. A ratio greater than 1.0 will result in over-subscription of the available physical disk, which can be useful for more efficiently packing instances created with images that do not use the entire virtual disk, such as sparse or compressed images. It can be set to a value between 0.0 and 1.0 in order to preserve a percentage of the disk for uses other than instances.

Note

This can be set per compute node. If set to 0.0, the value set on the scheduler node(s) will be used, which defaults to 1.0.
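
As referenced in the table above, per-aggregate ratios are read from host aggregate metadata. A hedged sketch follows; it assumes the AggregateCoreFilter and AggregateRamFilter scheduler filters are enabled and that AGGREGATE-NAME is one of your own aggregates:

ardana > nova aggregate-set-metadata AGGREGATE-NAME cpu_allocation_ratio=4.0
ardana > nova aggregate-set-metadata AGGREGATE-NAME ram_allocation_ratio=1.0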

5.3.1 Changing the overcommit ratios for your entire environment

If you wish to change the CPU and/or RAM overcommit ratio settings for your entire environment then you can do so via your Cloud Lifecycle Manager with these steps.

  1. Log in to the Cloud Lifecycle Manager.

  2. Edit the Nova configuration settings located in this file:

    ~/openstack/my_cloud/config/nova/nova.conf.j2
  3. Add or edit the following lines to specify the ratios you wish to use:

    cpu_allocation_ratio = 16
    ram_allocation_ratio = 1.0
  4. Commit your configuration to the Git repository (Book “Installing with Cloud Lifecycle Manager”, Chapter 11 “Using Git for Configuration Management”), as follows:

    ardana > cd ~/openstack/ardana/ansible
    ardana > git add -A
    ardana > git commit -m "setting Nova overcommit settings"
  5. Run the configuration processor:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost config-processor-run.yml
  6. Update your deployment directory:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost ready-deployment.yml
  7. Run the Nova reconfigure playbook:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts nova-reconfigure.yml

5.4 Enabling the Nova Resize and Migrate Features

The Nova resize and migrate features are disabled by default. If you wish to use these options, these steps will show you how to enable them in your cloud.

Both features are disabled by default because they require passwordless SSH access between Compute hosts, with the user having access to the file systems to perform the copy.

5.4.1 Enabling Nova Resize and Migrate

If you wish to enable these features, use these steps on your lifecycle manager. This will deploy a set of public and private SSH keys to the Compute hosts, allowing the nova user SSH access between each of your Compute hosts.

  1. Log in to the Cloud Lifecycle Manager.

  2. Run the Nova reconfigure playbook:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts nova-reconfigure.yml --extra-vars nova_migrate_enabled=true
  3. To ensure that the resize and migration options show up in the Horizon dashboard, run the Horizon reconfigure playbook:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts horizon-reconfigure.yml

5.4.2 Disabling Nova Resize and Migrate

This feature is disabled by default. However, if you have previously enabled it and wish to re-disable it, you can use these steps on your lifecycle manager. This will remove the set of public and private SSH keys that were previously added to the Compute hosts, removing the nova user's SSH access between each of your Compute hosts.

  1. Log in to the Cloud Lifecycle Manager.

  2. Run the Nova reconfigure playbook:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts nova-reconfigure.yml --extra-vars nova_migrate_enabled=false
  3. To ensure that the resize and migrate options are removed from the Horizon dashboard, run the Horizon reconfigure playbook:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts horizon-reconfigure.yml

5.5 Enabling ESX Compute Instance(s) Resize Feature

Resizing ESX compute instances is disabled by default. If you want to use this option, these steps will show you how to configure and enable it in your cloud.

The following feature is disabled by default:

  • Resize - this feature allows you to change the size of a Compute instance by changing its flavor. See the OpenStack User Guide for more details on its use.

5.5.1 Procedure

If you want to configure and enable the resize of ESX compute instance(s), perform the following steps:

  1. Log in to the Cloud Lifecycle Manager.

  2. Edit the ~/openstack/my_cloud/config/nova/nova.conf.j2 file to add the following parameter under the Policy section:

    # Policy
    allow_resize_to_same_host=True
  3. Commit your configuration:

    ardana > cd ~/openstack/ardana/ansible
    ardana > git add -A
    ardana > git commit -m "<commit message>"
  4. Run the configuration processor:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost config-processor-run.yml
  5. Update your deployment directory:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost ready-deployment.yml

    By default the nova resize feature is disabled. To enable nova resize, refer to Section 5.4, “Enabling the Nova Resize and Migrate Features”.

    By default an ESX console log is not set up. For more details about its setup, refer to VMware vSphere.

5.6 Configuring the Image Service

The image service, based on OpenStack Glance, works out of the box and does not need any special configuration. However, a few optional features detailed below, such as Glance image caching and the Glance copy-from feature, require additional configuration if you choose to use them.

Warning

Glance images are assigned IDs upon creation, either automatically or specified by the user. The ID of an image should be unique, so if a user assigns an ID which already exists, a conflict (409) will occur.

This only becomes a problem if users can publicize or share images with others. If users can share images AND cannot publicize images then your system is not vulnerable. If the system has also been purged (via glance-manage db purge) then it is possible for deleted image IDs to be reused.

If deleted image IDs can be reused then recycling of public and shared images becomes a possibility. This means that a new (or modified) image can replace an old image, which could be malicious.

If this is a problem for you, please contact Sales Engineering.

5.6.1 How to enable Glance image caching

In SUSE OpenStack Cloud 8, the Glance image caching option is not enabled by default. These steps will show you how to enable it.

The main benefits of using image caching are that the Glance service can return images faster and that less load is placed on other services to supply the image.

In order to use the image caching option you will need to supply a logical volume for the service to use for the caching.

If you wish to use the Glance image caching option, you will see the section below in your ~/openstack/my_cloud/definition/data/disks_controller.yml file. You will specify the mount point for the logical volume you wish to use for this.

  1. Log in to the Cloud Lifecycle Manager.

  2. Edit your ~/openstack/my_cloud/definition/data/disks_controller.yml file and specify the volume and mount point for your glance-cache. Here is an example:

    # Glance cache: if a logical volume with consumer usage glance-cache
    # is defined Glance caching will be enabled. The logical volume can be
    # part of an existing volume group or a dedicated volume group.
     - name: glance-vg
       physical-volumes:
         - /dev/sdx
       logical-volumes:
         - name: glance-cache
           size: 95%
           mount: /var/lib/glance/cache
           fstype: ext4
           mkfs-opts: -O large_file
           consumer:
             name: glance-api
             usage: glance-cache

    If you are enabling image caching during your initial installation, prior to running site.yml the first time, then continue with the installation steps. However, if you are making this change post-installation then you will need to commit your changes with the steps below.

  3. Commit your configuration to the Git repository (Book “Installing with Cloud Lifecycle Manager”, Chapter 11 “Using Git for Configuration Management”), as follows:

    ardana > cd ~/openstack/ardana/ansible
    ardana > git add -A
    ardana > git commit -m "My config or other commit message"
  4. Run the configuration processor:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost config-processor-run.yml
  5. Update your deployment directory:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost ready-deployment.yml
  6. Run the Glance reconfigure playbook:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts glance-reconfigure.yml

5.6.2 Allowing the Glance copy-from option in your environment

When creating images, one of the options you have is to copy the image from a remote location to your local Glance store. You do this by specifying the --copy-from option when creating the image. To use this feature, you need to ensure the following conditions are met (a usage sketch follows the procedure below):

  • The server hosting the Glance service must have network access to the remote location that is hosting the image.

  • There cannot be a proxy between Glance and the remote location.

  • The Glance v1 API must be enabled, as v2 does not currently support the copy-from function.

  • The http Glance store must be enabled in the environment, following the steps below.

Enabling the HTTP Glance Store

  1. Log in to the Cloud Lifecycle Manager.

  2. Edit the ~/openstack/my_cloud/config/glance/glance-api.conf.j2 file and add http to the list of Glance stores in the [glance_store] section as seen below in bold:

    [glance_store]
    stores = {{ glance_stores }}, http
  3. Commit your configuration to the Git repository (Book “Installing with Cloud Lifecycle Manager”, Chapter 11 “Using Git for Configuration Management”), as follows:

    ardana > cd ~/openstack/ardana/ansible
    ardana > git add -A
    ardana > git commit -m "My config or other commit message"
  4. Run the configuration processor:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost config-processor-run.yml
  5. Update your deployment directory:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost ready-deployment.yml
  6. Run the Glance reconfigure playbook:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts glance-reconfigure.yml
  7. Run the Horizon reconfigure playbook:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts horizon-reconfigure.yml
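
Once the conditions above are met and the http store is enabled, creating an image with --copy-from could look like the following sketch; the image name, formats, and URL are placeholders, and the v1 API is selected explicitly because copy-from is not supported by v2:

ardana > glance --os-image-api-version 1 image-create --name remote-image \
  --disk-format qcow2 --container-format bare \
  --copy-from http://images.example.com/remote-image.qcow2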

6 Managing ESX

Information about managing and configuring the ESX service.

6.1 Networking for ESXi Hypervisor (OVSvApp)

To provide network as a service for tenant VMs hosted on the ESXi hypervisor, a service VM called the OVSvApp VM is deployed on each ESXi hypervisor within a cluster managed by OpenStack Nova, as shown in the following figure.

The OVSvApp VM runs SLES as a guest operating system, and has Open vSwitch 2.1.0 or above installed. It also runs an agent called the OVSvApp agent, which is responsible for dynamically creating the port groups for the tenant VMs and for managing the OVS bridges, which contain the flows related to security groups and L2 networking.

To facilitate fault tolerance and mitigate data path loss for tenant VMs, the neutron-ovsvapp-agent-monitor process runs as part of the neutron-ovsvapp-agent service and is responsible for monitoring the Open vSwitch module within the OVSvApp VM. It also uses an nginx server to provide the health status of the Open vSwitch module to the Neutron server for mitigation actions. A systemd script keeps the neutron-ovsvapp-agent service alive.

When an OVSvApp service VM crashes, an agent monitoring mechanism starts a cluster mitigation process. You can mitigate data path traffic loss for VMs on the failed ESX host in that cluster by putting the failed ESX host into maintenance mode. This, in turn, triggers vCenter DRS to migrate tenant VMs to other ESX hosts within the same cluster, which ensures data path continuity of tenant VM traffic.

To View Cluster Mitigation

An administrator can view the cluster mitigation status using the following commands.

  1. neutron ovsvapp-mitigated-cluster-list

    Lists all the clusters where at least one round of host mitigation has happened.

    Example:

    neutron ovsvapp-mitigated-cluster-list
    +----------------+--------------+-----------------------+---------------------------+
    | vcenter_id     | cluster_id   | being_mitigated       | threshold_reached         |
    +----------------+--------------+-----------------------+---------------------------+
    | vcenter1       | cluster1     | True                  | False                     |
    | vcenter2       | cluster2     | False                 | True                      |
    +----------------+--------------+-----------------------+---------------------------+
  2. neutron ovsvapp-mitigated-cluster-show --vcenter-id <VCENTER_ID> --cluster-id <CLUSTER_ID>

    Shows the status of a particular cluster.

    Example :

    neutron ovsvapp-mitigated-cluster-show --vcenter-id vcenter1 --cluster-id cluster1
    +---------------------------+-------------+
    | Field                     | Value       |
    +---------------------------+-------------+
    | being_mitigated           | True        |
    | cluster_id                | cluster1    |
    | threshold_reached         | False       |
    | vcenter_id                | vcenter1    |
    +---------------------------+-------------+

    There can be instances where a triggered mitigation does not succeed and the Neutron server is not informed of the failure (for example, if the selected agent that had to mitigate the host goes down before finishing the task). In this case, the cluster will be locked. To unlock the cluster for further mitigations, use the update command.

  3. neutron ovsvapp-mitigated-cluster-update --vcenter-id <VCENTER_ID> --cluster-id <CLUSTER_ID>

    • Update the status of a mitigated cluster:

      Modify the values of being-mitigated from True to False to unlock the cluster.

      Example:

      neutron ovsvapp-mitigated-cluster-update --vcenter-id vcenter1 --cluster-id cluster1 --being-mitigated False
    • Update the threshold value:

      Update the threshold-reached value to True, if no further migration is required in the selected cluster.

      Example :

      neutron ovsvapp-mitigated-cluster-update --vcenter-id vcenter1 --cluster-id cluster1 --being-mitigated False --threshold-reached True

    REST API

    • curl -i -X GET http://<ip>:9696/v2.0/ovsvapp_mitigated_clusters \
        -H "User-Agent: python-neutronclient" -H "Accept: application/json" -H \
        "X-Auth-Token: <token_id>"

6.1.1 More Information

For more information on the Networking for ESXi Hypervisor (OVSvApp), see the following references:

6.2 Validating the Neutron Installation

You can validate that the ESX compute cluster is added to the cloud successfully using the following command:

# neutron agent-list

+------------------+----------------------+-----------------------+-------------------+-------+----------------+---------------------------+
| id               | agent_type           | host                  | availability_zone | alive | admin_state_up | binary                    |
+------------------+----------------------+-----------------------+-------------------+-------+----------------+---------------------------+
| 05ca6ef...999c09 | L3 agent             | doc-cp1-comp0001-mgmt | nova              | :-)   | True           | neutron-l3-agent          |
| 3b9179a...28e2ef | Metadata agent       | doc-cp1-comp0001-mgmt |                   | :-)   | True           | neutron-metadata-agent    |
| 3d756d7...a719a2 | Loadbalancerv2 agent | doc-cp1-comp0001-mgmt |                   | :-)   | True           | neutron-lbaasv2-agent     |
| 4e8f84f...c9c58f | Metadata agent       | doc-cp1-comp0002-mgmt |                   | :-)   | True           | neutron-metadata-agent    |
| 55a5791...c17451 | L3 agent             | doc-cp1-c1-m1-mgmt    | nova              | :-)   | True           | neutron-vpn-agent         |
| 5e3db8f...87f9be | Open vSwitch agent   | doc-cp1-c1-m1-mgmt    |                   | :-)   | True           | neutron-openvswitch-agent |
| 6968d9a...b7b4e9 | L3 agent             | doc-cp1-c1-m2-mgmt    | nova              | :-)   | True           | neutron-vpn-agent         |
| 7b02b20...53a187 | Metadata agent       | doc-cp1-c1-m2-mgmt    |                   | :-)   | True           | neutron-metadata-agent    |
| 8ece188...5c3703 | Open vSwitch agent   | doc-cp1-comp0002-mgmt |                   | :-)   | True           | neutron-openvswitch-agent |
| 8fcb3c7...65119a | Metadata agent       | doc-cp1-c1-m1-mgmt    |                   | :-)   | True           | neutron-metadata-agent    |
| 9f48967...36effe | OVSvApp agent        | doc-cp1-comp0002-mgmt |                   | :-)   | True           | ovsvapp-agent             |
| a2a0b78...026da9 | Open vSwitch agent   | doc-cp1-comp0001-mgmt |                   | :-)   | True           | neutron-openvswitch-agent |
| a2fbd4a...28a1ac | DHCP agent           | doc-cp1-c1-m2-mgmt    | nova              | :-)   | True           | neutron-dhcp-agent        |
| b2428d5...ee60b2 | DHCP agent           | doc-cp1-c1-m1-mgmt    | nova              | :-)   | True           | neutron-dhcp-agent        |
| c0983a6...411524 | Open vSwitch agent   | doc-cp1-c1-m2-mgmt    |                   | :-)   | True           | neutron-openvswitch-agent |
| c32778b...a0fc75 | L3 agent             | doc-cp1-comp0002-mgmt | nova              | :-)   | True           | neutron-l3-agent          |
+------------------+----------------------+-----------------------+-------------------+-------+----------------+---------------------------+

6.3 Removing a Cluster from the Compute Resource Pool

6.3.1 Prerequisites

Write down the hostnames and ESXi configuration IP addresses of the OVSvApp VMs of that ESX cluster before deleting the VMs. These IP addresses and hostnames will be used to clean up the Monasca alarm definitions.

Perform the following steps:

  1. Login to vSphere client.

  2. Select the OVSvApp node running on each ESXi host and click the Summary tab, as shown in the following example.

    Similarly you can retrieve the compute-proxy node information.

6.3.2 Removing an existing cluster from the compute resource pool

Perform the following steps to remove an existing cluster from the compute resource pool.

  1. Run the following command to check for the instances launched in that cluster:

    # nova list --host <hostname>
    +--------------------------------------+------+--------+------------+-------------+------------------+
    | ID                                   | Name | Status | Task State | Power State | Networks         |
    +--------------------------------------+------+--------+------------+-------------+------------------+
    | 80e54965-758b-425e-901b-9ea756576331 | VM1  | ACTIVE | -          | Running     | private=10.0.0.2 |
    +--------------------------------------+------+--------+------------+-------------+------------------+

    where:

    • hostname: Specifies hostname of the compute proxy present in that cluster.

  2. Delete all instances spawned in that cluster:

    # nova delete <server> [<server ...>]

    where:

    • server: Specifies the name or ID of the server(s).

    OR

    Migrate all instances spawned in that cluster.

    # nova migrate <server>
  3. Run the following playbooks to stop the Compute (Nova) and Networking (Neutron) services:

    ansible-playbook -i hosts/verb_hosts nova-stop --limit <hostname>;
    ansible-playbook -i hosts/verb_hosts neutron-stop --limit <hostname>;

    where:

    • hostname: Specifies hostname of the compute proxy present in that cluster.

6.3.3 Cleanup Monasca Agent for OVSvAPP Service

Perform the following procedure to clean up the Monasca agents for the ovsvapp-agent service.

  1. If the Monasca API is installed on a different node, copy the service.osrc file from the Cloud Lifecycle Manager to the Monasca API server.

    scp service.osrc $USER@ardana-cp1-mtrmon-m1-mgmt:
  2. SSH to the Monasca API server. You must SSH to each Monasca API server for cleanup.

    For example:

    ssh ardana-cp1-mtrmon-m1-mgmt
  3. Edit the /etc/monasca/agent/conf.d/host_alive.yaml file to remove the reference to the OVSvApp you removed. This requires sudo access.

    sudo vi /etc/monasca/agent/conf.d/host_alive.yaml

    A sample of host_alive.yaml:

    - alive_test: ping
      built_by: HostAlive
      host_name: esx-cp1-esx-ovsvapp0001-mgmt
      name: esx-cp1-esx-ovsvapp0001-mgmt ping
      target_hostname: esx-cp1-esx-ovsvapp0001-mgmt

    where host_name and target_hostname match the DNS name field in the vSphere client (refer to Section 6.3.1, “Prerequisites”).

  4. After removing the reference on each of the Monasca API servers, restart the monasca-agent on each of those servers by executing the following command.

    tux > sudo service openstack-monasca-agent restart
  5. With the OVSvApp references removed and the monasca-agent restarted, you can delete the corresponding alarm to complete the cleanup process. We recommend using the Monasca CLI, which is installed on each of your Monasca API servers by default. Execute the following command from the Monasca API server (for example: ardana-cp1-mtrmon-mX-mgmt).

    monasca alarm-list --metric-name host_alive_status --metric-dimensions hostname=<ovsvapp deleted>

    For example, you can execute the following command to get the alarm ID if the OVSvApp appears as in the preceding example.

    monasca alarm-list --metric-name host_alive_status --metric-dimensions hostname=MCP-VCP-cpesx-esx-ovsvapp0001-mgmt
    +--------------------------------------+--------------------------------------+-----------------------+-------------------+-------------------------------------------+----------+-------+-----------------+------+--------------------------+--------------------------+--------------------------+
    | id                                   | alarm_definition_id                  | alarm_definition_name | metric_name       | metric_dimensions                         | severity | state | lifecycle_state | link | state_updated_timestamp  | updated_timestamp        | created_timestamp        |
    +--------------------------------------+--------------------------------------+-----------------------+-------------------+-------------------------------------------+----------+-------+-----------------+------+--------------------------+--------------------------+--------------------------+
    | cfc6bfa4-2485-4319-b1e5-0107886f4270 | cca96c53-a927-4b0a-9bf3-cb21d28216f3 | Host Status           | host_alive_status | service: system                           | HIGH     | OK    | None            | None | 2016-10-27T06:33:04.256Z | 2016-10-27T06:33:04.256Z | 2016-10-23T13:41:57.258Z |
    |                                      |                                      |                       |                   | cloud_name: entry-scale-kvm-esx-mml       |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                       |                   | test_type: ping                           |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                       |                   | hostname: ardana-cp1-esx-ovsvapp0001-mgmt |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                       |                   | control_plane: control-plane-1            |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                       |                   | cluster: mtrmon                           |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                       |                   | observer_host: ardana-cp1-mtrmon-m1-mgmt  |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                       | host_alive_status | service: system                           |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                       |                   | cloud_name: entry-scale-kvm-esx-mml       |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                       |                   | test_type: ping                           |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                       |                   | hostname: ardana-cp1-esx-ovsvapp0001-mgmt |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                       |                   | control_plane: control-plane-1            |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                       |                   | cluster: mtrmon                           |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                       |                   | observer_host: ardana-cp1-mtrmon-m3-mgmt  |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                       | host_alive_status | service: system                           |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                       |                   | cloud_name: entry-scale-kvm-esx-mml       |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                       |                   | test_type: ping                           |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                       |                   | hostname: ardana-cp1-esx-ovsvapp0001-mgmt |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                       |                   | control_plane: control-plane-1            |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                       |                   | cluster: mtrmon                           |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                       |                   | observer_host: ardana-cp1-mtrmon-m2-mgmt  |          |       |                 |      |                          |                          |                          |
    +--------------------------------------+--------------------------------------+-----------------------+-------------------+-------------------------------------------+----------+-------+-----------------+------+--------------------------+--------------------------+--------------------------+
  6. Delete the Monasca alarm.

    monasca alarm-delete <alarm ID>

    For example:

    monasca alarm-delete cfc6bfa4-2485-4319-b1e5-0107886f4270
    Successfully deleted alarm

    After deleting the alarms and updating the monasca-agent configuration, those alarms will be removed from the Operations Console UI. You can log in to the Operations Console to verify the status.

6.3.4 Removing the Compute Proxy from Monitoring

Once you have removed the Compute proxy, the alarms against it will still trigger. To resolve this, perform the following steps.

  1. SSH to the Monasca API server. You must SSH to each Monasca API server for cleanup.

    For example:

    ssh ardana-cp1-mtrmon-m1-mgmt
  2. Edit the /etc/monasca/agent/conf.d/host_alive.yaml file to remove the reference to the Compute proxy you removed. This requires sudo access.

    sudo vi /etc/monasca/agent/conf.d/host_alive.yaml

    A sample host_alive.yaml entry:

    - alive_test: ping
      built_by: HostAlive
      host_name: MCP-VCP-cpesx-esx-comp0001-mgmt
      name: MCP-VCP-cpesx-esx-comp0001-mgmt ping
  3. Once you have removed the references on each of your Monasca API servers, execute the following command to restart the monasca-agent on each of those servers.

    tux > sudo service openstack-monasca-agent restart
  4. With the Compute proxy references removed and the monasca-agent restarted, delete the corresponding alarms to complete the cleanup process. We recommend using the Monasca CLI, which is installed on each of your Monasca API servers by default.

    monasca alarm-list --metric-dimensions hostname=<compute node deleted>

    For example, you can run the following command to get the alarm ID if the Compute proxy appears as in the preceding example.

    monasca alarm-list --metric-dimensions hostname=ardana-cp1-comp0001-mgmt
  5. Delete the Monasca alarm

    monasca alarm-delete <alarm ID>

6.3.5 Cleaning the Monasca Alarms Related to ESX Proxy and vCenter Cluster

Perform the following procedure:

  1. Using the ESX proxy hostname, execute the following command to list all alarms.

    monasca alarm-list --metric-dimensions hostname=COMPUTE_NODE_DELETED

    where COMPUTE_NODE_DELETED is the hostname taken from the vSphere client (refer to Section 6.3.1, “Prerequisites”).

    Note

    Make a note of all the alarm IDs displayed after executing the preceding command.

    For example, if the Compute proxy hostname is MCP-VCP-cpesx-esx-comp0001-mgmt:

    monasca alarm-list --metric-dimensions hostname=MCP-VCP-cpesx-esx-comp0001-mgmt
    ardana@R28N6340-701-cp1-c1-m1-mgmt:~$ monasca alarm-list --metric-dimensions hostname=R28N6340-701-cp1-esx-comp0001-mgmt
    +--------------------------------------+--------------------------------------+------------------------+------------------------+--------------------------------------------------+----------+-------+-----------------+------+--------------------------+--------------------------+--------------------------+
    | id                                   | alarm_definition_id                  | alarm_definition_name  | metric_name            | metric_dimensions                                | severity | state | lifecycle_state | link | state_updated_timestamp  | updated_timestamp        | created_timestamp        |
    +--------------------------------------+--------------------------------------+------------------------+------------------------+--------------------------------------------------+----------+-------+-----------------+------+--------------------------+--------------------------+--------------------------+
    | 02342bcb-da81-40db-a262-09539523c482 | 3e302297-0a36-4f0e-a1bd-03402b937a4e | HTTP Status            | http_status            | service: compute                                 | HIGH     | OK    | None            | None | 2016-11-11T06:58:11.717Z | 2016-11-11T06:58:11.717Z | 2016-11-10T08:55:45.136Z |
    |                                      |                                      |                        |                        | cloud_name: entry-scale-esx-kvm                  |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                        |                        | url: https://10.244.209.9:8774                   |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                        |                        | hostname: R28N6340-701-cp1-esx-comp0001-mgmt     |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                        |                        | component: nova-api                              |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                        |                        | control_plane: control-plane-1                   |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                        |                        | cluster: esx-compute                             |          |       |                 |      |                          |                          |                          |
    | 04cb36ce-0c7c-4b4c-9ebc-c4011e2f6c0a | 15c593de-fa54-4803-bd71-afab95b980a4 | Disk Usage             | disk.space_used_perc   | mount_point: /proc/sys/fs/binfmt_misc            | HIGH     | OK    | None            | None | 2016-11-10T08:52:52.886Z | 2016-11-10T08:52:52.886Z | 2016-11-10T08:51:29.197Z |
    |                                      |                                      |                        |                        | service: system                                  |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                        |                        | cloud_name: entry-scale-esx-kvm                  |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                        |                        | hostname: R28N6340-701-cp1-esx-comp0001-mgmt     |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                        |                        | control_plane: control-plane-1                   |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                        |                        | cluster: esx-compute                             |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                        |                        | device: systemd-1                                |          |       |                 |      |                          |                          |                          |
    +--------------------------------------+--------------------------------------+------------------------+------------------------+--------------------------------------------------+----------+-------+-----------------+------+--------------------------+--------------------------+--------------------------+
  2. Delete the alarm using the alarm IDs.

    monasca alarm-delete <alarm ID>

    This step must be performed for each alarm ID listed in the preceding step (Step 1).

    For example:

    monasca alarm-delete 1cc219b1-ce4d-476b-80c2-0cafa53e1a12

6.4 Removing an ESXi Host from a Cluster

This topic describes how to remove an existing ESXi host from a cluster and clean up the services for the OVSvApp VM.

Note

Before performing this procedure, wait until vCenter has migrated all the tenant VMs to other active hosts in the same cluster.

6.4.1 Prerequisite

Write down the hostname and ESXi configuration IP addresses of the OVSvApp VMs of that ESX cluster before deleting the VMs. These IP addresses and hostnames will be used to clean up Monasca alarm definitions.

  1. Login to vSphere client.

  2. Select the OVSvApp node running on the ESXi host and click the Summary tab.

6.4.2 Procedure

  1. Right-click the host and put it in maintenance mode. This automatically migrates all the tenant VMs except the OVSvApp VM.

  2. Cancel the maintenance mode task.

  3. Right-click the ovsvapp VM (IP Address) node, select Power, and then click Power Off.

  4. Right-click the node and then click Delete from Disk.

  5. Right-click the Host, and then click Enter Maintenance Mode.

  6. Disconnect the VM. Right-click the VM, and then click Disconnect.

The ESXi node is removed from the vCenter.

6.4.3 Clean up Neutron Agent for OVSvAPP Service

After removing the ESXi node from vCenter, perform the following procedure to clean up the Neutron agent for the ovsvapp-agent service.

  1. Login to Cloud Lifecycle Manager.

  2. Source the credentials.

    source service.osrc
  3. Execute the following command.

    neutron agent-list | grep <OVSvapp hostname>

    For example:

    neutron agent-list | grep MCP-VCP-cpesx-esx-ovsvapp0001-mgmt
    | 92ca8ada-d89b-43f9-b941-3e0cd2b51e49 | OVSvApp Agent      | MCP-VCP-cpesx-esx-ovsvapp0001-mgmt |                   | :-)   | True           | ovsvapp-agent             |
  4. Delete the OVSvAPP agent.

    neutron agent-delete <Agent ID>

    For example:

    neutron agent-delete 92ca8ada-d89b-43f9-b941-3e0cd2b51e49

If you have more than one host, perform the preceding procedure for all the hosts.

6.4.4 Clean up Monasca Agent for OVSvAPP Service

Perform the following procedure to clean up the Monasca agent configuration for the ovsvapp-agent service.

  1. If the Monasca API is installed on a different node, copy service.osrc from the Cloud Lifecycle Manager to the Monasca API server.

    scp service.osrc $USER@ardana-cp1-mtrmon-m1-mgmt:
  2. SSH to the Monasca API server. You must SSH to each Monasca API server for cleanup.

    For example:

    ssh ardana-cp1-mtrmon-m1-mgmt
  3. Edit the /etc/monasca/agent/conf.d/host_alive.yaml file to remove the reference to the OVSvApp you removed. This requires sudo access.

    sudo vi /etc/monasca/agent/conf.d/host_alive.yaml

    A sample host_alive.yaml entry:

    - alive_test: ping
      built_by: HostAlive
      host_name: MCP-VCP-cpesx-esx-ovsvapp0001-mgmt
      name: MCP-VCP-cpesx-esx-ovsvapp0001-mgmt ping
      target_hostname: MCP-VCP-cpesx-esx-ovsvapp0001-mgmt

    where host_name and target_hostname are taken from the DNS Name field in the vSphere client (refer to Section 6.4.1, “Prerequisite”).

  4. After removing the reference on each of the Monasca API servers, restart the monasca-agent on each of those servers by executing the following command.

    tux > sudo service openstack-monasca-agent restart
  5. With the OVSvAPP references removed and the monasca-agent restarted, you can delete the corresponding alarm to complete the cleanup process. We recommend using the Monasca CLI which is installed on each of your Monasca API servers by default. Execute the following command from the Monasca API server (for example: ardana-cp1-mtrmon-mX-mgmt).

    monasca alarm-list --metric-name host_alive_status --metric-dimensions hostname=<ovsvapp deleted>

    For example, you can run the following command to get the alarm ID if the OVSvApp appears as in the preceding example.

    monasca alarm-list --metric-name host_alive_status --metric-dimensions hostname=MCP-VCP-cpesx-esx-ovsvapp0001-mgmt
    +--------------------------------------+--------------------------------------+-----------------------+-------------------+-------------------------------------------+----------+-------+-----------------+------+--------------------------+--------------------------+--------------------------+
    | id                                   | alarm_definition_id                  | alarm_definition_name | metric_name       | metric_dimensions                         | severity | state | lifecycle_state | link | state_updated_timestamp  | updated_timestamp        | created_timestamp        |
    +--------------------------------------+--------------------------------------+-----------------------+-------------------+-------------------------------------------+----------+-------+-----------------+------+--------------------------+--------------------------+--------------------------+
    | cfc6bfa4-2485-4319-b1e5-0107886f4270 | cca96c53-a927-4b0a-9bf3-cb21d28216f3 | Host Status           | host_alive_status | service: system                           | HIGH     | OK    | None            | None | 2016-10-27T06:33:04.256Z | 2016-10-27T06:33:04.256Z | 2016-10-23T13:41:57.258Z |
    |                                      |                                      |                       |                   | cloud_name: entry-scale-kvm-esx-mml       |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                       |                   | test_type: ping                           |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                       |                   | hostname: ardana-cp1-esx-ovsvapp0001-mgmt |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                       |                   | control_plane: control-plane-1            |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                       |                   | cluster: mtrmon                           |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                       |                   | observer_host: ardana-cp1-mtrmon-m1-mgmt  |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                       | host_alive_status | service: system                           |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                       |                   | cloud_name: entry-scale-kvm-esx-mml       |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                       |                   | test_type: ping                           |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                       |                   | hostname: ardana-cp1-esx-ovsvapp0001-mgmt |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                       |                   | control_plane: control-plane-1            |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                       |                   | cluster: mtrmon                           |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                       |                   | observer_host: ardana-cp1-mtrmon-m3-mgmt  |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                       | host_alive_status | service: system                           |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                       |                   | cloud_name: entry-scale-kvm-esx-mml       |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                       |                   | test_type: ping                           |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                       |                   | hostname: ardana-cp1-esx-ovsvapp0001-mgmt |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                       |                   | control_plane: control-plane-1            |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                       |                   | cluster: mtrmon                           |          |       |                 |      |                          |                          |                          |
    |                                      |                                      |                       |                   | observer_host: ardana-cp1-mtrmon-m2-mgmt  |          |       |                 |      |                          |                          |                          |
    +--------------------------------------+--------------------------------------+-----------------------+-------------------+-------------------------------------------+----------+-------+-----------------+------+--------------------------+--------------------------+--------------------------+
  6. Delete the Monasca alarm.

    monasca alarm-delete <alarm ID>

    For example:

    monasca alarm-delete cfc6bfa4-2485-4319-b1e5-0107886f4270
    Successfully deleted alarm

    After deleting the alarms and updating the monasca-agent configuration, those alarms will be removed from the Operations Console UI. You can log in to the Operations Console to verify the status.

6.4.5 Clean up the entries of OVSvAPP VM from /etc/hosts

Perform the following procedure to clean up the entries of the OVSvApp VM from /etc/hosts.

  1. Login to Cloud Lifecycle Manager.

  2. Edit /etc/hosts.

    vi /etc/hosts

    For example, the MCP-VCP-cpesx-esx-ovsvapp0001-mgmt VM is present in /etc/hosts:

    192.168.86.17    MCP-VCP-cpesx-esx-ovsvapp0001-mgmt
  3. Delete the OVSvApp entries from /etc/hosts. For example, you can remove the entry non-interactively as shown below.
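
    A minimal sketch using sed, assuming the example entry shown above (requires root; adjust the hostname to your environment):

    # remove the stale OVSvApp entry from /etc/hosts
    sudo sed -i '/MCP-VCP-cpesx-esx-ovsvapp0001-mgmt/d' /etc/hosts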

6.4.6 Remove the OVSVAPP VM from the servers.yml and pass_through.yml files and run the Configuration Processor

Complete these steps from the Cloud Lifecycle Manager to remove the OVSvAPP VM:

  1. Log in to the Cloud Lifecycle Manager

  2. Edit servers.yml file to remove references to the OVSvAPP VM(s) you want to remove:

    ~/openstack/my_cloud/definition/data/servers.yml

    For example:

    - ip-addr: 192.168.86.17
      server-group: AZ1
      role: OVSVAPP-ROLE
      id: 6afaa903398c8fc6425e4d066edf4da1a0f04388
  3. Edit the ~/openstack/my_cloud/definition/data/pass_through.yml file to remove the OVSvApp VM references, using the server id from the previous step to find them.

    - data:
        vmware:
          vcenter_cluster: Clust1
          cluster_dvs_mapping: 'DC1/host/Clust1:TRUNK-DVS-Clust1'
          esx_hostname: MCP-VCP-cpesx-esx-ovsvapp0001-mgmt
          vcenter_id: 0997E2ED9-5E4F-49EA-97E6-E2706345BAB2
      id: 6afaa903398c8fc6425e4d066edf4da1a0f04388
  4. Commit the changes to git:

    git commit -a -m "Remove ESXi host <name>"
  5. Run the configuration processor. You may want to use the remove_deleted_servers and free_unused_addresses switches to free up the resources when running the configuration processor. See Book “Planning an Installation with Cloud Lifecycle Manager”, Chapter 7 “Other Topics”, Section 7.3 “Persisted Data” for more details.

    cd ~/openstack/ardana/ansible
    ansible-playbook -i hosts/localhost config-processor-run.yml -e remove_deleted_servers="y" -e free_unused_addresses="y"
  6. Update your deployment directory:

    cd ~/openstack/ardana/ansible
    ansible-playbook -i hosts/localhost ready-deployment.yml

6.4.7 Remove Distributed Resource Scheduler (DRS) Rules

Perform the following procedure to remove the DRS rules, which are added by the OVSvApp installer to ensure that the OVSvApp VM does not get migrated to other hosts.

  1. Login to vCenter.

  2. Right-click the cluster and select Edit Settings.

    A cluster settings page appears.

  3. Click DRS Groups Manager on the left hand side of the pop-up box. Select the group created for the deleted OVSvApp and click Remove.

  4. Click Rules on the left hand side of the pop-up box, select the checkbox for the deleted OVSvApp, and click Remove.

  5. Click OK.

6.5 Configuring Debug Logging

6.5.1 To Modify the OVSVAPP VM Log Level

To change the OVSVAPP log level to DEBUG, do the following:

  1. Log in to the Cloud Lifecycle Manager.

  2. Edit the file below:

    ~/openstack/ardana/ansible/roles/neutron-common/templates/ovsvapp-agent-logging.conf.j2
  3. Set the logging level value of the logger_root section to DEBUG, like this:

    [logger_root]
    qualname: root
    handlers: watchedfile, logstash
    level: DEBUG
  4. Commit your configuration to the Git repository (Book “Installing with Cloud Lifecycle Manager”, Chapter 11 “Using Git for Configuration Management”), as follows:

    cd ~/openstack/ardana/ansible
    git add -A
    git commit -m "My config or other commit message"
  5. Run the configuration processor:

    cd ~/openstack/ardana/ansible
    ansible-playbook -i hosts/localhost config-processor-run.yml
  6. Update your deployment directory:

    cd ~/openstack/ardana/ansible
    ansible-playbook -i hosts/localhost ready-deployment.yml
  7. Deploy your changes:

    cd ~/scratch/ansible/next/ardana/ansible
    ansible-playbook -i hosts/verb_hosts neutron-reconfigure.yml

6.5.2 To Enable OVSVAPP Service for Centralized Logging

To enable OVSVAPP Service for centralized logging:

  1. Log in to the Cloud Lifecycle Manager.

  2. Edit the file below:

    ~/openstack/my_cloud/config/logging/vars/neutron-ovsvapp-clr.yml
  3. Set the value of centralized_logging to true as shown in the following sample:

    logr_services:
      neutron-ovsvapp:
        logging_options:
        - centralized_logging:
            enabled: true
            format: json
            ...
  4. Commit your configuration to the Git repository (Book “Installing with Cloud Lifecycle Manager”, Chapter 11 “Using Git for Configuration Management”), as follows:

    cd ~/openstack/ardana/ansible
    git add -A
    git commit -m "My config or other commit message"
  5. Run the configuration processor:

    cd ~/openstack/ardana/ansible
    ansible-playbook -i hosts/localhost config-processor-run.yml
  6. Update your deployment directory:

    cd ~/openstack/ardana/ansible
    ansible-playbook -i hosts/localhost ready-deployment.yml
  7. Deploy your changes, specifying the hostname for your OVSvApp host:

    cd ~/scratch/ansible/next/ardana/ansible
    ansible-playbook -i hosts/verb_hosts neutron-reconfigure.yml --limit <hostname>

    The hostname of the node can be found in the list generated from the output of the following command:

    grep hostname ~/openstack/my_cloud/info/server_info.yml

6.6 Making Scale Configuration Changes

This procedure describes how to make the recommended configuration changes to achieve 8,000 virtual machine instances.

Note

In a scale environment for ESX compute, the vCenter proxy VM configuration must be increased to 8 vCPUs and 16 GB RAM. By default it is 4 vCPUs and 4 GB RAM.

  1. Change to the directory containing the nova.conf.j2 file:

    cd ~/openstack/ardana/ansible/roles/nova-common/templates
  2. Edit the DEFAULT section in the nova.conf.j2 file as below:

    [DEFAULT]
    rpc_response_timeout = 180
    service_down_time = 300
    report_interval = 30
  3. Commit your configuration:

    cd ~/openstack/ardana/ansible
    git add -A
    git commit -m "<commit message>"
  4. Prepare your environment for deployment:

    ansible-playbook -i hosts/localhost ready-deployment.yml
    cd ~/scratch/ansible/next/ardana/ansible
  5. Execute the nova-reconfigure playbook:

    ansible-playbook -i hosts/verb_hosts nova-reconfigure.yml

6.7 Monitoring vCenter Clusters

Remote monitoring of an activated ESX cluster is enabled through the Monasca vCenter plugin. The Monasca agent running on each ESX compute proxy node is configured with the vcenter plugin to monitor the cluster.

Alarm definitions are created with the default threshold values, and whenever a threshold limit is breached, the respective alarm states (OK/ALARM/UNDETERMINED) are generated.

The configuration file details are given below:

init_config: {}
instances:
  - vcenter_ip: <vcenter-ip>
    username: <vcenter-username>
    password: <vcenter-password>
    clusters: <[cluster list]>
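
A filled-in sketch of this configuration with hypothetical values (the vCenter IP and cluster name mirror examples used elsewhere in this section; the user name is illustrative):

init_config: {}
instances:
  # one instance entry per vCenter server; these values are illustrative only
  - vcenter_ip: 10.1.200.91
    username: monasca-monitor@vsphere.local
    password: <vcenter-password>
    clusters: ['Clust1']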

Metrics

The following metrics are posted to Monasca by the vCenter plugin:

  • vcenter.cpu.total_mhz

  • vcenter.cpu.used_mhz

  • vcenter.cpu.used_perc

  • vcenter.cpu.total_logical_cores

  • vcenter.mem.total_mb

  • vcenter.mem.used_mb

  • vcenter.mem.used_perc

  • vcenter.disk.total_space_mb

  • vcenter.disk.total_used_space_mb

  • vcenter.disk.total_used_space_perc

For example, to view the measurements for one of these metrics for a specific cluster:

monasca measurement-list --dimensions esx_cluster_id=domain-c7.D99502A9-63A8-41A2-B3C3-D8E31B591224 vcenter.disk.total_used_space_mb 2016-08-30T11:20:08

+----------------------------------------------+----------------------------------------------------------------------------------------------+-----------------------------------+------------------+-----------------+
| name                                         | dimensions                                                                                   | timestamp                         | value            | value_meta      |
+----------------------------------------------+----------------------------------------------------------------------------------------------+-----------------------------------+------------------+-----------------+
| vcenter.disk.total_used_space_mb             | vcenter_ip: 10.1.200.91                                                                      | 2016-08-30T11:20:20.703Z          | 100371.000       |                 |
|                                              | esx_cluster_id: domain-c7.D99502A9-63A8-41A2-B3C3-D8E31B591224                               | 2016-08-30T11:20:50.727Z          | 100371.000       |                 |
|                                              | hostname: MCP-VCP-cpesx-esx-comp0001-mgmt                                                    | 2016-08-30T11:21:20.707Z          | 100371.000       |                 |
|                                              |                                                                                              | 2016-08-30T11:21:50.700Z          | 100371.000       |                 |
|                                              |                                                                                              | 2016-08-30T11:22:20.700Z          | 100371.000       |                 |
|                                              |                                                                                              | 2016-08-30T11:22:50.700Z          | 100371.000       |                 |
|                                              |                                                                                              | 2016-08-30T11:23:20.620Z          | 100371.000       |                 |
+----------------------------------------------+-----------------------------------------------------------------------------------------------+-----------------------------------+------------------+-----------------+

Dimensions

Each metric has the following dimensions:

vcenter_ip

FQDN/IP address of the registered vCenter server

esx_cluster_id

clusterName.vCenter-id, as seen in the nova hypervisor-list output

hostname

ESX compute proxy name

Alarms

Alarms are created for monitoring CPU, memory, and disk usage for each activated cluster. The alarm definition details are as follows:

Name                     | Expression                              | Severity | Match_by
ESX cluster CPU Usage    | avg(vcenter.cpu.used_perc) > 90 times 3 | High     | esx_cluster_id
ESX cluster Memory Usage | avg(vcenter.mem.used_perc) > 90 times 3 | High     | esx_cluster_id
ESX cluster Disk Usage   | vcenter.disk.total_used_space_perc > 90 | High     | esx_cluster_id
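
To confirm these alarm definitions exist in your deployment, you can list them with the Monasca CLI on a node where it is installed (a quick check; the grep pattern is illustrative):

monasca alarm-definition-list | grep "ESX cluster"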

6.8 Monitoring Integration with OVSvApp Appliance

6.8.1 Processes Monitored with Monasca Agent

Using the Monasca agent, the following services are monitored on the OVSvApp appliance:

  • neutron_ovsvapp_agent service - This is the Neutron agent which runs in the appliance and enables networking for the tenant virtual machines.

  • Openvswitch - This service is used by the neutron_ovsvapp_agent service for enabling the datapath and security for the tenant virtual machines.

  • Ovsdb-server - This service is used by the neutron_ovsvapp_agent service.

If any of these three processes fails to run on the OVSvApp appliance, it will lead to network disruption for the tenant virtual machines. This is why they are monitored.

The monasca-agent periodically reports the status of these processes and metrics data (for example, 'load' - cpu.load_avg_1min, 'process' - process.pid_count, 'memory' - mem.usable_perc, 'disk' - disk.space_used_perc, 'cpu' - cpu.idle_perc) to the Monasca server.
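
For example, you can spot-check one of these metrics for an OVSvApp appliance with the Monasca CLI from a Monasca API node (a sketch; the hostname and start time are illustrative):

monasca measurement-list --dimensions hostname=MCP-VCP-cpesx-esx-ovsvapp0001-mgmt cpu.idle_perc <start time>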

6.8.2 How It Works

Once the vApp is configured and up, the monasca-agent will attempt to register with the Monasca server. After successful registration, the monitoring begins on the processes listed above and you will be able to see status updates on the server side.

The monasca-agent monitors the processes at the system level so, in the case of failures of any of the configured processes, updates should be seen immediately from Monasca.

To check the events from the server side, log into the Operations Console. For more details on how to use the Operations Console, see Book “User Guide Overview”, Chapter 1 “Using the Operations Console”, Section 1.1 “Operations Console Overview”.

7 Managing Block Storage

Information about managing and configuring the Block Storage service.

7.1 Managing Block Storage using Cinder

SUSE OpenStack Cloud Block Storage volume operations use the OpenStack Cinder service to manage storage volumes, which includes creating volumes, attaching/detaching volumes to Nova instances, creating volume snapshots, and configuring volumes.

SUSE OpenStack Cloud supports the following storage back ends for block storage volumes and backup datastore configuration:

  • Volumes

    • 3PAR FC or iSCSI; for more information, see Book “Installing with Cloud Lifecycle Manager”, Chapter 23 “Integrations”, Section 23.1 “Configuring for 3PAR Block Storage Backend”.

  • Backup

    • Swift

7.1.1 Setting Up Multiple Block Storage Backends

SUSE OpenStack Cloud supports setting up multiple block storage backends and multiple volume types.

Regardless of whether you have a single block storage backend or multiple backends defined in your cinder.conf.j2 file, you can create one or more volume types using the specific attributes associated with the backend. You can find details on how to do that for each of the supported backend types here:

  • Book “Installing with Cloud Lifecycle Manager”, Chapter 23 “Integrations”, Section 23.1 “Configuring for 3PAR Block Storage Backend”

7.1.2 Creating a Volume Type for your Volumes

Creating volume types allows you to create standard specifications for your volumes.

Volume types are used to specify a standard Block Storage back-end and a collection of extra specifications for your volumes. This allows an administrator to give users a variety of options while simplifying the process of creating volumes.

The tasks involved in this process are:

7.1.2.1 Create a Volume Type for your Volumes

The default volume type will be thin provisioned and will have no fault tolerance (RAID 0). You should configure Cinder to fully provision volumes, and you may want to configure fault tolerance. Follow the instructions below to create a new volume type which is fully provisioned and fault tolerant:

Perform the following steps to create a volume type using the Horizon GUI:

  1. Log in to the Horizon dashboard. See Book “User Guide Overview”, Chapter 3 “Cloud Admin Actions with the Dashboard” for details on how to do this.

  2. Ensure that you are scoped to your admin Project. Then under the Admin menu in the navigation pane, click on Volumes under the System subheading.

  3. Select the Volume Types tab and then click the Create Volume Type button to display a dialog box.

  4. Enter a unique name for the volume type and then click the Create Volume Type button to complete the action.

The newly created volume type will be displayed in the Volume Types list confirming its creation.
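
If you prefer the command line, a volume type can also be created with the OpenStack CLI from the Cloud Lifecycle Manager using admin credentials (a sketch; the type name is illustrative):

ardana > source ~/service.osrc
ardana > openstack volume type create my-fully-provisioned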

7.1.2.2 Associate the Volume Type to the Back-end

After the volume type(s) have been created, you can assign extra specification attributes to the volume types. Each Block Storage back-end option has unique attributes that can be used.

To map a volume type to a back-end, do the following:

  1. Log into the Horizon dashboard. See Book “User Guide Overview”, Chapter 3 “Cloud Admin Actions with the Dashboard” for details on how to do this.

  2. Ensure that you are scoped to your admin Project (for more information, see Section 4.10.7, “Scope Federated User to Domain”). Then under the Admin menu in the navigation pane, click on Volumes under the System subheading.

  3. Click the Volume Types tab to list the volume types.

  4. In the Actions column of the Volume Type you created earlier, click the drop-down option and select View Extra Specs which will bring up the Volume Type Extra Specs options.

  5. Click the Create button on the Volume Type Extra Specs screen.

  6. In the Key field, enter one of the key values in the table in the next section. In the Value box, enter its corresponding value. Once you have completed that, click the Create button to create the extra volume type specs.

Once the volume type is mapped to a back-end, you can create volumes with this volume type.
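
For example, once the mapping is in place, a volume of that type can be created from the CLI (a sketch; the type name, size, and volume name are illustrative):

ardana > openstack volume create --type my-fully-provisioned --size 10 test-volume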

7.1.2.3 Extra Specification Options for 3PAR

3PAR supports volume creation with additional attributes. These attributes can be specified using the extra specs options for your volume type. The administrator is expected to define appropriate extra specs for the 3PAR volume type as per the guidelines provided at http://docs.openstack.org/liberty/config-reference/content/hp-3par-supported-ops.html.

The following Cinder Volume Type extra-specs options enable control over the 3PAR storage provisioning type:

Key                            | Value                | Description
volume_backend_name            | volume backend name  | The name of the back-end to which you want to associate the volume type, which you also specified earlier in the cinder.conf.j2 file.
hp3par:provisioning (optional) | thin, full, or dedup | See OpenStack HPE 3PAR StoreServ Block Storage Driver Configuration Best Practices for more details.
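
As an alternative to setting these extra specs through Horizon, they can be applied with the OpenStack CLI (a sketch; the volume type name and backend name are illustrative and must match your cinder.conf.j2 configuration):

ardana > openstack volume type set --property volume_backend_name=3par_FC my-fully-provisioned
ardana > openstack volume type set --property hp3par:provisioning=full my-fully-provisioned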

7.1.3 Managing Cinder Volume and Backup Services

Important: Use Only When Needed

If the host running the cinder-volume service fails for any reason, it should be restarted as quickly as possible. Often, the host running Cinder services also runs high availability (HA) services such as MySQL and RabbitMQ. These HA services are at risk while one of the nodes in the cluster is down. If it will take a significant amount of time to recover the failed node, then you may migrate the cinder-volume service to one of the other controller nodes. When the node has been recovered you should migrate the cinder-volume service back to the original (default) node.

7.1.3.1 Migrating the cinder-volume service

Use the following steps to migrate the cinder-volume service.

  1. Log in to the Cloud Lifecycle Manager node.

  2. Determine the host index numbers for each of your control plane nodes. This host index number will be used in a later step. They can be obtained by running this playbook:

    cd ~/scratch/ansible/next/ardana/ansible
    ansible-playbook -i hosts/verb_hosts cinder-show-volume-hosts.yml

    Here is an example snippet showing the output for a single three node control plane, with the host index numbers shown in each Index message:

    TASK: [_CND-CMN | show_volume_hosts | Show Cinder Volume hosts index and hostname] ***
    ok: [ardana-cp1-c1-m1] => (item=(0, 'ardana-cp1-c1-m1')) => {
        "item": [
            0,
            "ardana-cp1-c1-m1"
        ],
        "msg": "Index 0 Hostname ardana-cp1-c1-m1"
    }
    ok: [ardana-cp1-c1-m1] => (item=(1, 'ardana-cp1-c1-m2')) => {
        "item": [
            1,
            "ardana-cp1-c1-m2"
        ],
        "msg": "Index 1 Hostname ardana-cp1-c1-m2"
    }
    ok: [ardana-cp1-c1-m1] => (item=(2, 'ardana-cp1-c1-m3')) => {
        "item": [
            2,
            "ardana-cp1-c1-m3"
        ],
        "msg": "Index 2 Hostname ardana-cp1-c1-m3"
    }
  3. Locate the control plane fact file for the control plane you need to migrate the service from. It will be located in the following directory:

    /etc/ansible/facts.d/

    These fact files use the following naming convention:

    cinder_volume_run_location_<control_plane_name>.fact
  4. Edit the fact file to include the host index number of the control plane node you wish to migrate the cinder-volume services to. For example, if they currently reside on your first controller node, host index 0, and you wish to migrate them to your second controller, you would change the value in the fact file to 1.

  5. If you are using data encryption on your Cloud Lifecycle Manager, ensure you have included the encryption key in your environment variables:

    export HOS_USER_PASSWORD_ENCRYPT_KEY=<encryption key>
  6. Once you have edited the control plane fact file, run the Cinder volume migration playbook for the control plane nodes involved in the migration. At a minimum this includes the node on which to start the cinder-volume manager and the node on which to stop it:

    cd ~/scratch/ansible/next/ardana/ansible
    ansible-playbook -i hosts/verb_hosts cinder-migrate-volume.yml --limit=<limit_pattern1,limit_pattern2>
    Note

    <limit_pattern> is the pattern used to limit the hosts that are selected to those within a specific control plane.

  7. Ensure that once your maintenance or other tasks are completed, you migrate the cinder-volume services back to their original node using these same steps.

8 Managing Object Storage

Information about managing and configuring the Object Storage service.

Managing your object storage environment includes tasks related to ensuring your Swift rings stay balanced. We discuss that and other topics in more detail in this section.

You can verify the Swift object storage operational status using commands and utilities. This section covers the following topics:

8.1 Running the Swift Dispersion Report

Swift contains a tool called swift-dispersion-report that can be used to determine whether your containers and objects have the three replicas they are supposed to have. This tool works by populating a percentage of partitions in the system with containers and objects (using swift-dispersion-populate) and then running the report to see if all the replicas of these containers and objects are in the correct place. For a more detailed explanation of this tool in OpenStack Swift, see the OpenStack Swift - Administrator's Guide.

8.1.1 Configuring the Swift dispersion populate

Once a Swift system has been fully deployed in SUSE OpenStack Cloud 8, you can set up swift-dispersion-report using the default parameters found in ~/openstack/ardana/ansible/roles/swift-dispersion/templates/dispersion.conf.j2. This populates 1% of the partitions on the system. If you are happy with this figure, proceed to step 2 below. Otherwise, follow step 1 to edit the configuration file.

  1. If you wish to change the dispersion coverage percentage then edit the value of dispersion_coverage in the ~/openstack/ardana/ansible/roles/swift-dispersion/templates/dispersion.conf.j2 file to the value you wish to use. In the example below we have altered the file to create 5% dispersion:

    ...
    [dispersion]
    auth_url = {{ keystone_identity_uri }}/v3
    auth_user = {{ swift_dispersion_tenant }}:{{ swift_dispersion_user }}
    auth_key = {{ swift_dispersion_password  }}
    endpoint_type = {{ endpoint_type }}
    auth_version = {{ disp_auth_version }}
    # Set this to the percentage coverage. We recommend a value
    # of 1%. You can increase this to get more coverage. However, if you
    # decrease the value, the dispersion containers and objects are
    # not deleted.
    dispersion_coverage = 5.0
  2. Commit your configuration to the Git repository (Book “Installing with Cloud Lifecycle Manager”, Chapter 11 “Using Git for Configuration Management”), as follows:

    ardana > git add -A
    ardana > git commit -m "My config or other commit message"
  3. Run the configuration processor:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost config-processor-run.yml
  4. Update your deployment directory:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost ready-deployment.yml
  5. Reconfigure the Swift servers:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts swift-reconfigure.yml
  6. Run this playbook to populate your Swift system for the health check:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts swift-dispersion-populate.yml

8.1.2 Running the Swift dispersion report

Check the status of the Swift system by running the Swift dispersion report with this playbook:

ardana > cd ~/scratch/ansible/next/ardana/ansible
ardana > ansible-playbook -i hosts/verb_hosts swift-dispersion-report.yml

The output of the report will look similar to this:

TASK: [swift-dispersion | report | Display dispersion report results] *********
ok: [padawan-ccp-c1-m1-mgmt] => {
    "var": {
        "dispersion_report_result.stdout_lines": [
            "Using storage policy: General ",
            "",
            "Queried 40 containers for dispersion reporting, 0s, 0 retries",
            "100.00% of container copies found (120 of 120)",
            "Sample represents 0.98% of the container partition space",
            "",
            "Queried 40 objects for dispersion reporting, 0s, 0 retries",
            "There were 40 partitions missing 0 copies.",
            "100.00% of object copies found (120 of 120)",
            "Sample represents 0.98% of the object partition space"
        ]
    }
}
...

In addition to being able to run the report above, there will be a cron-job running every 2 hours on the first proxy node of your system that will run dispersion-report and save the results to the following file:

/var/cache/swift/dispersion-report
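
To view the most recently saved results on that proxy node, for example:

tux > sudo cat /var/cache/swift/dispersion-report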

When interpreting the results you get from this report, we recommend consulting the Swift Administrator's Guide - Cluster Health.

8.2 Gathering Swift Data

The swift-recon command retrieves data from Swift servers and displays the results. To use this command, log on as a root user to any node which is running the swift-proxy service.

8.2.1 Notes

For help with the swift-recon command you can use this:

tux > sudo swift-recon --help
Warning

The --driveaudit option is not supported.

Warning

SUSE OpenStack Cloud does not support ec_type isa_l_rs_vand and ec_num_parity_fragments greater than or equal to 5 in the storage-policy configuration. This particular policy is known to harm data durability.

8.2.2 Using the swift-recon Command

The following command retrieves and displays disk usage information:

tux > sudo swift-recon --diskusage

For example:

tux > sudo swift-recon --diskusage
===============================================================================
--> Starting reconnaissance on 3 hosts
===============================================================================
[2015-09-14 16:01:40] Checking disk usage now
Distribution Graph:
 10%    3 *********************************************************************
 11%    1 ***********************
 12%    2 **********************************************
Disk usage: space used: 13745373184 of 119927734272
Disk usage: space free: 106182361088 of 119927734272
Disk usage: lowest: 10.39%, highest: 12.96%, avg: 11.4613798613%
===============================================================================

In the above example, the results for several nodes are combined together. You can also view the results from individual nodes by adding the -v option as shown in the following example:

tux > sudo swift-recon --diskusage -v
===============================================================================
--> Starting reconnaissance on 3 hosts
===============================================================================
[2015-09-14 16:12:30] Checking disk usage now
-> http://192.168.245.3:6000/recon/diskusage: [{'device': 'disk1', 'avail': 17398411264, 'mounted': True, 'used': 2589544448, 'size': 19987955712}, {'device': 'disk0', 'avail': 17904222208, 'mounted': True, 'used': 2083733504, 'size': 19987955712}]
-> http://192.168.245.2:6000/recon/diskusage: [{'device': 'disk1', 'avail': 17769721856, 'mounted': True, 'used': 2218233856, 'size': 19987955712}, {'device': 'disk0', 'avail': 17793581056, 'mounted': True, 'used': 2194374656, 'size': 19987955712}]
-> http://192.168.245.4:6000/recon/diskusage: [{'device': 'disk1', 'avail': 17912147968, 'mounted': True, 'used': 2075807744, 'size': 19987955712}, {'device': 'disk0', 'avail': 17404235776, 'mounted': True, 'used': 2583719936, 'size': 19987955712}]
Distribution Graph:
 10%    3 *********************************************************************
 11%    1 ***********************
 12%    2 **********************************************
Disk usage: space used: 13745414144 of 119927734272
Disk usage: space free: 106182320128 of 119927734272
Disk usage: lowest: 10.39%, highest: 12.96%, avg: 11.4614140152%
===============================================================================

By default, swift-recon uses the object-0 ring for information about nodes and drives. For some commands, it is appropriate to specify account, container, or object to indicate the type of ring. For example, to check the checksum of the account ring, use the following:

tux > sudo swift-recon --md5 account
===============================================================================
--> Starting reconnaissance on 3 hosts
===============================================================================
[2015-09-14 16:17:28] Checking ring md5sums
3/3 hosts matched, 0 error[s] while checking hosts.
===============================================================================
[2015-09-14 16:17:28] Checking swift.conf md5sum
3/3 hosts matched, 0 error[s] while checking hosts.
===============================================================================

8.3 Gathering Swift Monitoring Metrics

The swiftlm-scan command is the mechanism used to gather metrics for the Monasca system. These metrics are used to derive alarms. For a list of alarms that can be generated from this data, see Section 15.1.1, “Alarm Resolution Procedures”.

To view the metrics, use the swiftlm-scan command directly. Log on to the Swift node as the root user. The following example shows the command and a snippet of the output:

tux > sudo swiftlm-scan --pretty
. . .
  {
    "dimensions": {
      "device": "sdc",
      "hostname": "padawan-ccp-c1-m2-mgmt",
      "service": "object-storage"
    },
    "metric": "swiftlm.swift.drive_audit",
    "timestamp": 1442248083,
    "value": 0,
    "value_meta": {
      "msg": "No errors found on device: sdc"
    }
  },
. . .
Note

To make the JSON file easier to read, use the --pretty option.

The fields are as follows:

metric

Specifies the name of the metric.

dimensions

Provides information about the source or location of the metric. The dimensions differ depending on the metric in question. The following dimensions are used by swiftlm-scan:

  • service: This is always object-storage.

  • component: This identifies the component. For example, swift-object-server indicates that the metric is about the swift-object-server process.

  • hostname: This is the name of the node the metric relates to. This is not necessarily the name of the current node.

  • url: If the metric is associated with a URL, this is the URL.

  • port: If the metric relates to connectivity to a node, this is the port used.

  • device: This is the block device a metric relates to.

value

The value of the metric. For many metrics, this is simply the value of the metric. However, the value can also indicate a status: if value_meta contains a msg field, the value is a status. The following status values are used:

  • 0 - no error

  • 1 - warning

  • 2 - failure

value_meta

Additional information. The msg field is the most useful of this information.

8.3.1 Optional Parameters

You can focus on specific sets of metrics by using one of the following optional parameters:

--replication

Checks replication and health status.

--file-ownership

Checks that Swift owns its relevant files and directories.

--drive-audit

Checks for logged events about corrupted sectors (unrecoverable read errors) on drives.

--connectivity

Checks connectivity to various servers used by the Swift system, including:

  • Checks that this node can connect to all memcached servers

  • Checks that this node can connect to the Keystone service (only applicable if this is a proxy server node)

--swift-services

Checks that the relevant Swift processes are running.

--network-interface

Checks NIC speed and reports statistics for each interface.

--check-mounts

Checks that the node has correctly mounted drives used by Swift.

--hpssacli

If this server uses a Smart Array Controller, this checks the operation of the controller and disk drives.
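
For example, to focus on the connectivity checks only (a sketch combining one of the parameters above with the --pretty option shown earlier):

tux > sudo swiftlm-scan --connectivity --pretty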

8.4 Using the Swift Command-line Client (CLI)

The swift utility (or Swift CLI) is installed on the Cloud Lifecycle Manager node and also on all other nodes running the Swift proxy service. To use this utility on the Cloud Lifecycle Manager, you can use the ~/service.osrc file as a basis and then edit it with the credentials of another user if you need to.

ardana > cp ~/service.osrc ~/swiftuser.osrc

Then you can use your preferred editor to edit swiftuser.osrc so you can authenticate using the OS_USERNAME, OS_PASSWORD, and OS_PROJECT_NAME you wish to use. For example, if you would like to use the demo user that is created automatically for you, then it might look like this:

unset OS_DOMAIN_NAME
export OS_IDENTITY_API_VERSION=3
export OS_AUTH_VERSION=3
export OS_PROJECT_NAME=demo
export OS_PROJECT_DOMAIN_NAME=Default
export OS_USERNAME=demo
export OS_USER_DOMAIN_NAME=Default
export OS_PASSWORD=<password>
export OS_AUTH_URL=<auth_URL>
export OS_ENDPOINT_TYPE=internalURL
# OpenstackClient uses OS_INTERFACE instead of OS_ENDPOINT
export OS_INTERFACE=internal
export OS_CACERT=/etc/ssl/certs/ca-certificates.crt
export OS_COMPUTE_API_VERSION=2

You must use the appropriate password for the demo user and select the correct endpoint for the OS_AUTH_URL value, which should be in the ~/service.osrc file you copied.
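
For example, source the edited file so that subsequent swift commands are run with those credentials (this sketch assumes you saved the file as ~/swiftuser.osrc, as above):

ardana > source ~/swiftuser.osrc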

You can then examine the following account data using this command:

ardana > swift stat

Example showing an environment with no containers or objects:

ardana > swift stat
        Account: AUTH_205804d000a242d385b8124188284998
     Containers: 0
        Objects: 0
          Bytes: 0
X-Put-Timestamp: 1442249536.31989
     Connection: keep-alive
    X-Timestamp: 1442249536.31989
     X-Trans-Id: tx5493faa15be44efeac2e6-0055f6fb3f
   Content-Type: text/plain; charset=utf-8

Use the following command to create a container:

ardana > swift post CONTAINER_NAME

For example, to create a container named documents:

ardana > swift post documents

The newly created container now appears, but it contains no objects:

ardana > swift stat documents
         Account: AUTH_205804d000a242d385b8124188284998
       Container: documents
         Objects: 0
           Bytes: 0
        Read ACL:
       Write ACL:
         Sync To:
        Sync Key:
   Accept-Ranges: bytes
X-Storage-Policy: General
      Connection: keep-alive
     X-Timestamp: 1442249637.69486
      X-Trans-Id: tx1f59d5f7750f4ae8a3929-0055f6fbcc
    Content-Type: text/plain; charset=utf-8

Upload a document:

ardana > swift upload CONTAINER_NAME FILENAME

Example:

ardana > swift upload documents mydocument
mydocument

List objects in the container:

ardana > swift list CONTAINER_NAME

Example:

ardana > swift list documents
mydocument
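
To retrieve a copy of an object, use swift download. For example, using the container and object created above (the file is written to the current directory):

ardana > swift download documents mydocument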
Note

This is a brief introduction to the swift CLI. Use the swift --help command for more information. You can also use the OpenStack CLI, see openstack -h for more information.

8.5 Managing Swift Rings

Swift rings are a machine-readable description of which disk drives are used by the Object Storage service (for example, a drive is used to store account or object data). Rings also specify the policy for data storage (for example, defining the number of replicas). The rings are automatically built during the initial deployment of your cloud, with the configuration provided during setup of the SUSE OpenStack Cloud Input Model. For more information, see Book “Planning an Installation with Cloud Lifecycle Manager”, Chapter 5 “Input Model”.

After successful deployment of your cloud, you may want to change or modify the configuration for Swift. For example, you may want to add or remove Swift nodes, add additional storage policies, or upgrade the size of the disk drives. For instructions, see Section 8.5.5, “Applying Input Model Changes to Existing Rings” and Section 8.5.6, “Adding a New Swift Storage Policy”.

Note

The process of modifying or adding a configuration is similar to other configuration or topology changes in the cloud. Generally, you make the changes to the input model files at ~/openstack/my_cloud/definition/ on the Cloud Lifecycle Manager and then run Ansible playbooks to reconfigure the system.

Changes to the rings require several phases to complete. Therefore, you may need to run the playbooks several times over several days.

The following topics cover ring management.

8.5.1 Rebalancing Swift Rings

The Swift ring building process tries to distribute data evenly among the available disk drives. The data is stored in partitions. (For more information, see Book “Planning an Installation with Cloud Lifecycle Manager”, Chapter 11 “Modifying Example Configurations for Object Storage using Swift”, Section 11.10 “Understanding Swift Ring Specifications”.) If you, for example, double the number of disk drives in a ring, you need to move 50% of the partitions to the new drives so that all drives contain the same number of partitions (and hence same amount of data). However, it is not possible to move the partitions in a single step. It can take minutes to hours to move partitions from the original drives to their new drives (this process is called the replication process).

If you moved all partitions at once, there would be a period during which Swift expects to find partitions on the new drives even though the data has not yet been replicated there, so Swift could not return that data to the user. In the middle of replication, Swift cannot find all of the data, because some partitions have finished replicating while others are still in their old locations. It is therefore considered best practice to move only one replica at a time. If the replica count is 3, you could first move 16.6% of the partitions and then wait until all data has replicated. Then move another 16.6% of the partitions, wait again, and finally move the remaining 16.6%. For any given object, only one of the replicas is moved at a time.

8.5.1.1 Reasons to Move Partitions Gradually

Due to the following factors, you must move the partitions gradually:

  • Not all devices are of the same size. SUSE OpenStack Cloud 8 automatically assigns different weights to drives so that smaller drives store fewer partitions than larger drives.

  • The process attempts to keep replicas of the same partition in different servers.

  • Making a large change in one step (for example, doubling the number of drives in the ring) would result in a lot of network traffic due to the replication process, and system performance would suffer. There are two ways to mitigate this: make the change in smaller batches (for example, add or remove servers a few at a time), or use the weight-step attribute described in the next section to limit how much the ring changes in any single rebalance.

8.5.2 Using the Weight-Step Attributes to Prepare for Ring Changes

Swift rings are built during a deployment and this process sets the weights of disk drives such that smaller disk drives have a smaller weight than larger disk drives. When making changes in the ring, you should limit the amount of change that occurs. SUSE OpenStack Cloud 8 does this by limiting the weights of the new drives to a smaller value and then building new rings. Once the replication process has finished, SUSE OpenStack Cloud 8 will increase the weight and rebuild rings to trigger another round of replication. (For more information, see Section 8.5.1, “Rebalancing Swift Rings”.)

In addition, you should become familiar with how the replication process behaves on your system during normal operation. Before making ring changes, use the swift-recon command to determine the typical oldest replication times for your system. For instructions, see Section 8.5.4, “Determining When to Rebalance and Deploy a New Ring”.

In SUSE OpenStack Cloud, the weight-step attribute is set in the ring specification of the input model. The weight-step value specifies a maximum value for the change of the weight of a drive in any single rebalance. For example, if you add a drive of 4TB, you would normally assign a weight of 4096. However, if the weight-step attribute is set to 1024, then when you add that drive the weight is initially set to 1024. Each subsequent rebalance increases the weight by at most 1024, so the weight moves to 2048, then 3072, and finally reaches the target value of 4096.

The value of the weight-step attribute depends on the size of the drives, the number of servers being added, and how experienced you are with the replication process. A common starting value is 20% of the weight of an individual drive. For example, when adding 4TB drives with a final weight of 4096, a value of 820 would be appropriate. As you gain more experience with your system, you may increase or reduce this value.
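
For example, assuming a weight-step of 820 and a new 4TB drive with a final weight of 4096, successive rebalances would set the drive's weight to approximately 820, 1640, 2460, 3280, and finally 4096, so roughly five rebalance cycles (each followed by a replication wait) are needed before the drive carries its full share of data.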

8.5.2.1 Setting the weight-step attribute

Perform the following steps to set the weight-step attribute:

  1. Log in to the Cloud Lifecycle Manager.

  2. Edit the ~/openstack/my_cloud/definition/data/swift/rings.yml file containing the ring-specifications for the account, container, and object rings.

    Add the weight-step attribute to the ring in this format:

    - name: account
      weight-step: WEIGHT_STEP_VALUE
      display-name: Account Ring
      min-part-hours: 16
      ...

    For example, to set weight-step to 820, add the attribute like this:

    - name: account
      weight-step: 820
      display-name: Account Ring
      min-part-hours: 16
      ...
  3. Repeat step 2 for the other rings, if necessary (container, object-0, etc).

  4. Run the configuration processor:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost config-processor-run.yml
  5. Use the playbook to create a deployment directory:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost ready-deployment.yml
  6. To complete the configuration, use the ansible playbooks documented in Section 8.5.3, “Managing Rings Using Swift Playbooks”.

8.5.3 Managing Rings Using Swift Playbooks

The following table describes how playbooks relate to ring management.

All of these playbooks are run on the Cloud Lifecycle Manager from the ~/scratch/ansible/next/ardana/ansible directory.

Playbook / Description / Notes
swift-update-from-model-rebalance-rings.yml

There are two steps in this playbook:

  • Make delta

    It processes the input model and compares it against the existing rings. After comparison, it produces a list of differences between the input model and the existing rings. This is called the ring delta. The ring delta covers drives being added, drives being removed, weight changes, and replica count changes.

  • Rebalance

    The ring delta is then converted into a series of commands (such as add) to the swift-ring-builder program. Finally, the rebalance command is issued to the swift-ring-builder program.

This playbook performs its actions on the first node running the swift-proxy service. (For more information, see Section 15.6.2.4, “Identifying the Swift Ring Building Server”.) However, it also scans all Swift nodes to find the size of disk drives.

If there are no changes in the ring delta, the rebalance command is still executed to rebalance the rings. If min-part-hours has not yet elapsed or if no partitions need to be moved, new rings are not written.

swift-compare-model-rings.yml

There are two steps in this playbook:

  • Make delta

    This is the same as described for swift-update-from-model-rebalance-rings.yml.

  • Report

    This prints a summary of the proposed changes that will be made to the rings (that is, what would happen if you rebalanced).

The playbook reports any issues or problems it finds with the input model.

This playbook can be useful to confirm that there are no errors in the input model. It also allows you to check, when you change the input model, that the proposed ring changes are as expected. For example, if you have added a server to the input model, but this playbook reports that no drives are being added, you should determine the cause.

There is troubleshooting information related to the information that you receive in this report that you can view on this page: Section 15.6.2.3, “Interpreting Swift Input Model Validation Errors”.

swift-deploy.yml

swift-deploy.yml is responsible for installing software and configuring Swift on nodes. As part of installing and configuring, it runs the swift-update-from-model-rebalance-rings.yml and swift-reconfigure.yml playbooks.

This playbook is included in the ardana-deploy.yml and site.yml playbooks, so if you run either of those playbooks, the swift-deploy.yml playbook is also run.

swift-reconfigure.yml

swift-reconfigure.yml takes rings that the swift-update-from-model-rebalance-rings.yml playbook has changed and copies those rings to all Swift nodes.

Every time that you directly use the swift-update-from-model-rebalance-rings.yml playbook, you must copy these rings to the system using the swift-reconfigure.yml playbook. If you forget and run swift-update-from-model-rebalance-rings.yml twice, the process may move two replicas of some partitions at the same time.
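
For example, a typical manually-driven ring update consists of running the two playbooks back to back, which is the same sequence used in the rebalance phases described in Section 8.5.5, “Applying Input Model Changes to Existing Rings”:

ardana > cd ~/scratch/ansible/next/ardana/ansible
ardana > ansible-playbook -i hosts/verb_hosts swift-update-from-model-rebalance-rings.yml
ardana > ansible-playbook -i hosts/verb_hosts swift-reconfigure.yml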

8.5.3.1 Optional Ansible variables related to ring management

The following optional variables may be specified when running the playbooks outlined above. They are specified using the --extra-vars option.

Variable / Description and Use
limit_ring

Limit changes to the named ring. Other rings will not be examined or updated. This option may be used with any of the Swift playbooks. For example, to only update the object-1 ring, use the following command:

ardana > ansible-playbook -i hosts/verb_hosts swift-update-from-model-rebalance-rings.yml --extra-vars "limit_ring=object-1"
drive_detail

Used only with the swift-compare-model-rings.yml playbook. The playbook will include details of changes to every drive where the model and existing rings differ. If you omit the drive_detail variable, only summary information is provided. The following shows how to use the drive_detail variable:

ardana > ansible-playbook -i hosts/verb_hosts swift-compare-model-rings.yml --extra-vars "drive_detail=yes"

8.5.3.2 Interpreting the report from the swift-compare-model-rings.yml playbook

The swift-compare-model-rings.yml playbook compares the existing Swift rings with the input model and prints a report telling you how the rings and the model differ. Specifically, it will tell you what actions will take place when you next run the swift-update-from-model-rebalance-rings.yml playbook (or a playbook such as ardana-deploy.yml that runs swift-update-from-model-rebalance-rings.yml).

The swift-compare-model-rings.yml playbook makes no changes; it only produces an advisory report.

Here is an example output from the playbook. The report is between "report.stdout_lines" and "PLAY RECAP":

TASK: [swiftlm-ring-supervisor | validate-input-model | Print report] *********
ok: [ardana-cp1-c1-m1-mgmt] => {
    "var": {
        "report.stdout_lines": [
            "Rings:",
            "  ACCOUNT:",
            "    ring exists (minimum time to next rebalance: 8:07:33)",
            "    will remove 1 devices (18.00GB)",
            "    ring will be rebalanced",
            "  CONTAINER:",
            "    ring exists (minimum time to next rebalance: 8:07:35)",
            "    no device changes",
            "    ring will be rebalanced",
            "  OBJECT-0:",
            "    ring exists (minimum time to next rebalance: 8:07:34)",
            "    no device changes",
            "    ring will be rebalanced"
        ]
    }
}

The following describes the report in more detail:

Message / Description

ring exists

The ring already exists on the system.

ring will be created

The ring does not yet exist on the system.

no device changes

The devices in the ring exactly match the input model. There are no servers being added or removed and the weights are appropriate for the size of the drives.

minimum time to next rebalance

If this time is 0:00:00 and you run one of the Swift playbooks that update rings, the ring will be rebalanced.

If the time is non-zero, it means that not enough time has elapsed since the ring was last rebalanced. Even if you run a Swift playbook that attempts to change the ring, the ring will not actually rebalance. This time is determined by the min-part-hours attribute.

set-weight ardana-ccp-c1-m1-mgmt:disk0:/dev/sdc 8.00 > 12.00 > 18.63

The weight of disk0 (mounted on /dev/sdc) on server ardana-ccp-c1-m1-mgmt is currently set to 8.00 but should be 18.63 given the size of the drive. However, in this example, we cannot go directly from 8.00 to 18.63 because of the weight-step attribute. Hence, the proposed weight change is from 8.00 to 12.00.

This information is only shown when you use the drive_detail=yes argument when running the playbook.

will change weight on 12 devices (6.00TB)

The weight of 12 devices will be increased. This might happen for example, if a server had been added in a prior ring update. However, with use of the weight-step attribute, the system gradually increases the weight of these new devices. In this example, the change in weight represents 6TB of total available storage. For example, if your system currently has 100TB of available storage, when the weight of these devices is changed, there will be 106TB of available storage. If your system is 50% utilized, this means that when the ring is rebalanced, up to 3TB of data may be moved by the replication process. This is an estimate - in practice, because only one copy of a given replica is moved in any given rebalance, it may not be possible to move this amount of data in a single ring rebalance.

add: ardana-ccp-c1-m1-mgmt:disk0:/dev/sdc

The disk0 device will be added to the ardana-ccp-c1-m1-mgmt server. This happens when a server is added to the input model or if a disk model is changed to add additional devices.

This information is only shown when you use the drive_detail=yes argument when running the playbook.

remove: ardana-ccp-c1-m1-mgmt:disk0:/dev/sdc

The device is no longer in the input model and will be removed from the ring. This happens if a server is removed from the model, a disk drive is removed from a disk model or the server is marked for removal using the pass-through feature.

This information is only shown when you use the drive_detail=yes argument when running the playbook.

will add 12 devices (6TB)

There are 12 devices in the input model that have not yet been added to the ring. Usually this is because one or more servers have been added. In this example, this could be one server with 12 drives or two servers, each with 6 drives. The size in the report is the change in total available capacity. When the weight-step attribute is used, this may be a fraction of the total size of the disk drives. In this example, 6TB of capacity is being added. For example, if your system currently has 100TB of available storage, when these devices are added, there will be 106TB of available storage. If your system is 50% utilized, this means that when the ring is rebalanced, up to 3TB of data may be moved by the replication process. This is an estimate - in practice, because only one copy of a given replica is moved in any given rebalance, it may not be possible to move this amount of data in a single ring rebalance.

will remove 12 devices (6TB)

There are 12 devices in rings that no longer appear in the input model. Usually this is because one or more servers have been removed. In this example, this could be one server with 12 drives or two servers, each with 6 drives. The size in the report is the change in total removed capacity. In this example, 6TB of capacity is being removed. For example, if your system currently has 100TB of available storage, when these devices are removed, there will be 94TB of available storage. If your system is 50% utilized, this means that when the ring is rebalanced, approximately 3TB of data must be moved by the replication process.

min-part-hours will be changed

The min-part-hours attribute has been changed in the ring specification in the input model.

replica-count will be changed

The replica-count attribute has been changed in the ring specification in the input model.

ring will be rebalanced

This is always reported. Every time the swift-update-from-model-rebalance-rings.yml playbook is run, it will execute the swift-ring-builder rebalance command. This happens even if there were no input model changes. If the ring is already well balanced, the swift-ring-builder will not rewrite the ring.

8.5.4 Determining When to Rebalance and Deploy a New Ring

Before deploying a new ring, you must be sure that the change applied in the last ring update is complete (that is, all the partitions are in their correct location). There are several aspects to consider:

  • Is the replication system busy?

    You might want to postpone a ring change until after replication has finished. If the replication system is busy repairing a failed drive, a ring change will place additional load on the system. To check that replication has finished, use the swift-recon command with the --replication argument. (For more information, see Section 8.2, “Gathering Swift Data”.) The oldest completion time indicates how busy the replication process is. If it is more than 15 or 20 minutes ago, the object replication process is probably still very busy. The following example indicates that the oldest completion was 120 seconds ago, so the replication process is probably not busy:

    root # swift-recon --replication
    ===============================================================================
    --> Starting reconnaissance on 3 hosts
    ===============================================================================
    [2015-10-02 15:31:45] Checking on replication
    [replication_time] low: 0, high: 0, avg: 0.0, total: 0, Failed: 0.0%, no_result: 0, reported: 3
    Oldest completion was 2015-10-02 15:31:32 (120 seconds ago) by 192.168.245.4:6000.
    Most recent completion was 2015-10-02 15:31:43 (10 seconds ago) by 192.168.245.3:6000.
    ===============================================================================
  • Are there drive or server failures?

    A drive failure does not preclude deploying a new ring. In principle, there should be two copies elsewhere. However, another drive failure in the middle of replication might make data temporarily unavailable. If possible, postpone ring changes until all servers and drives are operating normally.

  • Has min-part-hours elapsed?

    The swift-ring-builder will refuse to build a new ring until the min-part-hours has elapsed since the last time it built rings. You must postpone changes until this time has elapsed.

    You can determine how long you must wait by running the swift-compare-model-rings.yml playbook, which will tell you how long until the min-part-hours has elapsed. For more details, see Section 8.5.3, “Managing Rings Using Swift Playbooks”.

    You can change the value of min-part-hours. (For instructions, see Section 8.5.7, “Changing min-part-hours in Swift”).

  • Is the Swift dispersion report clean?

    Run the swift-dispersion-report.yml playbook (as described in Section 8.1, “Running the Swift Dispersion Report”) and examine the results. If the replication process has not yet replicated partitions that were moved to new drives in the last ring rebalance, the dispersion report will indicate that some containers or objects are missing a copy.

    For example:

    There were 462 partitions missing one copy.

    Assuming all servers and disk drives are operational, the reason for the missing partitions is that the replication process has not yet managed to copy a replica into the partitions.

    You should wait an hour and rerun the dispersion report process and examine the report. The number of partitions missing one copy should have reduced. Continue to wait until this reaches zero before making any further ring rebalances.

    Note

    It is normal to see partitions missing one copy if disk drives or servers are down. If all servers and disk drives are mounted, and you did not recently perform a ring rebalance, you should investigate whether there are problems with the replication process. You can use the Operations Console to investigate replication issues.

    Important

    If there are any partitions missing two copies, you must reboot or repair any failed servers and disk drives as soon as possible. Do not shut down any Swift nodes in this situation. Assuming a replica count of 3, if you are missing two copies you are in danger of losing the only remaining copy.

8.5.5 Applying Input Model Changes to Existing Rings

This page describes a general approach for making changes to your existing Swift rings. This approach applies to actions such as adding and removing a server and replacing and upgrading disk drives, and must be performed as a series of phases, as shown below:

8.5.5.1 Changing the Input Model Configuration Files

The first step to apply new changes to the Swift environment is to update the configuration files. Follow these steps:

  1. Log in to the Cloud Lifecycle Manager.

  2. Set the weight-step attribute, as needed, for the nodes you are altering. (For instructions, see Section 8.5.2, “Using the Weight-Step Attributes to Prepare for Ring Changes”).

  3. Edit the configuration files as part of the Input Model as appropriate. (For general information about the Input Model, see Book “Planning an Installation with Cloud Lifecycle Manager”, Chapter 6 “Configuration Objects”, Section 6.14 “Networks”. For more specific information about the Swift parts of the configuration files, see Book “Planning an Installation with Cloud Lifecycle Manager”, Chapter 11 “Modifying Example Configurations for Object Storage using Swift”)

  4. Once you have completed all of the changes, commit your configuration to the local Git repository (for more information, see Book “Installing with Cloud Lifecycle Manager”, Chapter 11 “Using Git for Configuration Management”):

    ardana > git add -A
    ardana > git commit -m "commit message"
  5. Run the configuration processor:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost config-processor-run.yml
  6. Create a deployment directory:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost ready-deployment.yml
  7. Run the Swift playbook that will validate your configuration files and give you a report as an output:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts swift-compare-model-rings.yml
  8. Use the report to validate that the number of drives proposed to be added or deleted, or the weight change, is correct. Fix any errors in your input model. At this stage, no changes have been made to rings.

8.5.5.2 First phase of Ring Rebalance

To begin the rebalancing of the Swift rings, follow these steps:

  1. After going through the steps in the section above, deploy your changes to all of the Swift nodes in your environment by running this playbook:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts swift-deploy.yml
  2. Wait until replication has finished or min-part-hours has elapsed (whichever is longer). For more information, see Section 8.5.4, “Determining When to Rebalance and Deploy a New Ring”

8.5.5.3 Weight Change Phase of Ring Rebalance

At this stage, no changes have been made to the input model. However, when you set the weight-step attribute, the rings that were rebuilt in the previous rebalance phase have weights that are different than their target/final value. You gradually move to the target/final weight by rebalancing a number of times as described on this page. For more information about the weight-step attribute, see Section 8.5.2, “Using the Weight-Step Attributes to Prepare for Ring Changes”.

To begin the re-balancing of the rings, follow these steps:

  1. Rebalance the rings by running the playbook:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts swift-update-from-model-rebalance-rings.yml
  2. Run the reconfiguration:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts swift-reconfigure.yml
  3. Wait until replication has finished or min-part-hours has elapsed (whichever is longer). For more information, see Section 8.5.4, “Determining When to Rebalance and Deploy a New Ring”

  4. Run the following command and review the report:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts swift-compare-model-rings.yml --limit SWF*

    The following is an example of the output after executing the above command. In the example no weight changes are proposed:

    TASK: [swiftlm-ring-supervisor | validate-input-model | Print report] *********
    ok: [padawan-ccp-c1-m1-mgmt] => {
        "var": {
            "report.stdout_lines": [
                "Need to add 0 devices",
                "Need to remove 0 devices",
                "Need to set weight on 0 devices"
            ]
        }
    }
  5. When there are no proposed weight changes, proceed to the final phase.

  6. If there are proposed weight changes, repeat this phase.

8.5.5.4 Final Rebalance Phase

The final rebalance phase moves all replicas to their final destination.

  1. Rebalance the rings by running the playbook:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts swift-update-from-model-rebalance-rings.yml | tee /tmp/rebalance.log
    Note

    The tee command saves the output to /tmp/rebalance.log for later reference.

  2. Review the output from the previous step. If the output for all rings is similar to the following, the rebalance had no effect. That is, the rings are balanced and no further changes are needed. In addition, the ring files were not changed so you do not need to deploy them to the Swift nodes:

    "Running: swift-ring-builder /etc/swiftlm/cloud1/cp1/builder_dir/account.builder rebalance 999",
          "NOTE: No partitions could be reassigned.",
          "Either none need to be or none can be due to min_part_hours [16]."

    The text No partitions could be reassigned indicates that no further rebalances are necessary. If this is true for all the rings, you have completed the final phase.

    Note

    You must have allowed enough time to elapse since the last rebalance. As mentioned in the above example, min_part_hours [16] means that you must wait at least 16 hours since the last rebalance. If not, you should wait until enough time has elapsed and repeat this phase.

  3. Run the swift-reconfigure.yml playbook:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts swift-reconfigure.yml
  4. Wait until replication has finished or min-part-hours has elapsed (whichever is longer). For more information see Section 8.5.4, “Determining When to Rebalance and Deploy a New Ring”

  5. Repeat the above steps until the ring is rebalanced.

8.5.5.5 System Changes that Change Existing Rings

There are many system changes, ranging from adding servers to replacing drives, that might require you to rebuild and rebalance your rings.

Action / Process
Adding Server(s)
Removing Server(s)

In SUSE OpenStack Cloud, when you remove servers from the input model, the disk drives are removed from the ring - the weight is not gradually reduced using the weight-step attribute.

  • Remove servers in phases:

    • This reduces the impact of the changes on your system.

    • If your rings use Swift zones, ensure you remove the same number of servers for each zone at each phase.

Replacing Disk Drive(s)

When a drive fails, replace it as soon as possible. Do not attempt to remove it from the ring - this creates operator overhead. Swift will continue to store the correct number of replicas by handing off objects to other drives instead of the failed drive.

If the replacement disk drive is the same size as the original, no ring changes are required. You can confirm this by running the swift-update-from-model-rebalance-rings.yml playbook. It should report that no weight changes are needed.

For a single drive replacement, even if the drive is significantly larger than the original drives, you do not need to rebalance the ring (however, the extra space on the drive will not be used).

Upgrading Disk Drives

If the drives are a different size (for example, you are upgrading your system), proceed as follows:

  • If not already done, set the weight-step attribute

  • Replace drives in phases:

    • Avoid replacing too many drives at once.

    • If your rings use Swift zones, upgrade a number of drives in the same zone at the same time - not drives in several zones.

    • It is also safer to upgrade one server instead of drives in several servers at the same time.

    • Remember that the final size of all Swift zones must be the same, so you may need to replace a small number of drives in one zone, then a small number in second zone, then return to the first zone and replace more drives, etc.

8.5.6 Adding a New Swift Storage Policy

This page describes how to add an additional storage policy to an existing system. For an overview of storage policies, see Book “Planning an Installation with Cloud Lifecycle Manager”, Chapter 11 “Modifying Example Configurations for Object Storage using Swift”, Section 11.11 “Designing Storage Policies”.

To Add a Storage Policy

Perform the following steps to add the storage policy to an existing system.

  1. Log in to the Cloud Lifecycle Manager.

  2. Select a storage policy index and ring name.

    For example, if you already have object-0 and object-1 rings in your ring-specifications (usually in the ~/openstack/my_cloud/definition/data/swift/rings.yml file), the next index is 2 and the ring name is object-2.

  3. Select a user-visible name. This is the name you see when you examine container metadata, and the name you specify for the storage policy when you create a container. The name should be a single word (hyphens and dashes are allowed).

  4. Decide if this new policy will be the default for all new containers.

  5. Decide on other attributes such as partition-power and replica-count if you are using a standard replication ring. However, if you are using an erasure coded ring, you also need to decide on other attributes: ec-type, ec-num-data-fragments, ec-num-parity-fragments, and ec-object-segment-size. For more details on the required attributes, see Book “Planning an Installation with Cloud Lifecycle Manager”, Chapter 11 “Modifying Example Configurations for Object Storage using Swift”, Section 11.10 “Understanding Swift Ring Specifications”.

  6. Edit the ring-specifications attribute (usually in the ~/openstack/my_cloud/definition/data/swift/rings.yml file) and add the new ring specification. If this policy is to be the default storage policy for new containers, set the default attribute to yes.

    Note
    1. Ensure that only one object ring has the default attribute set to yes. If you set two rings as default, Swift processes will not start.

    2. Do not specify the weight-step attribute for the new object ring. Since this is a new ring there is no need to gradually increase device weights.

  7. Update the appropriate disk model to use the new storage policy (for example, the data/disks_swobj.yml file). The following sample shows that the object-2 has been added to the list of existing rings that use the drives:

    disk-models:
    - name: SWOBJ-DISKS
      ...
      device-groups:
      - name: swobj
        devices:
           ...
        consumer:
            name: swift
            attrs:
                rings:
                - object-0
                - object-1
                - object-2
      ...
    Note

    You must use the new object ring on at least one node that runs the swift-object service. If you skip this step and continue to run the swift-compare-model-rings.yml or swift-deploy.yml playbooks, they will fail with an error There are no devices in this ring, or all devices have been deleted, as shown below:

    TASK: [swiftlm-ring-supervisor | build-rings | Build ring (make-delta, rebalance)] ***
    failed: [padawan-ccp-c1-m1-mgmt] => {"changed": true, "cmd": ["swiftlm-ring-supervisor", "--make-delta", "--rebalance"], "delta": "0:00:03.511929", "end": "2015-10-07 14:02:03.610226", "rc": 2, "start": "2015-10-07 14:02:00.098297", "warnings": []}
    ...
    Running: swift-ring-builder /etc/swiftlm/cloud1/cp1/builder_dir/object-2.builder rebalance 999
    ERROR: -------------------------------------------------------------------------------
    An error has occurred during ring validation. Common
    causes of failure are rings that are empty or do not
    have enough devices to accommodate the replica count.
    Original exception message:
    There are no devices in this ring, or all devices have been deleted
    -------------------------------------------------------------------------------
  8. Commit your configuration:

    ardana > git add -A
    ardana > git commit -m "commit message"
  9. Run the configuration processor:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost config-processor-run.yml
  10. Create a deployment directory:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost ready-deployment.yml
  11. Validate the changes by running the swift-compare-model-rings.yml playbook:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts swift-compare-model-rings.yml

    If any errors occur, correct them. For instructions, see Section 15.6.2.3, “Interpreting Swift Input Model Validation Errors”. Then, re-run steps 5 - 10.

  12. Create the new ring (for example, object-2). Then verify the Swift service status and reconfigure the Swift node to use a new storage policy, by running these playbooks:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts swift-status.yml
    ardana > ansible-playbook -i hosts/verb_hosts swift-deploy.yml

After adding a storage policy, there is no need to rebalance the ring.
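
Once the new policy is deployed, you can create a container that uses it by passing the policy's user-visible name in the X-Storage-Policy header. This is a brief sketch; the policy name my-policy and the container name newcontainer are placeholders for your own values:

ardana > swift post -H "X-Storage-Policy: my-policy" newcontainer
ardana > swift stat newcontainer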

8.5.7 Changing min-part-hours in Swift

The min-part-hours parameter specifies the number of hours you must wait before Swift will allow a given partition to be moved. In other words, it constrains how often you perform ring rebalance operations. Before changing this value, you should get some experience with how long it takes your system to perform replication after you make ring changes (for example, when you add servers).

See Section 8.5.4, “Determining When to Rebalance and Deploy a New Ring” for more information about determining when replication has completed.

8.5.7.1 Changing the min-part-hours Value

To change the min-part-hours value, follow these steps:

  1. Log in to the Cloud Lifecycle Manager.

  2. Edit your ~/openstack/my_cloud/definition/data/swift/rings.yml file and change the value(s) of min-part-hours for the rings you desire. The value is expressed in hours and a value of zero is not allowed.

  3. Commit your configuration to the local Git repository (Book “Installing with Cloud Lifecycle Manager”, Chapter 11 “Using Git for Configuration Management”), as follows:

    ardana > cd ~/openstack/ardana/ansible
    ardana > git add -A
    ardana > git commit -m "My config or other commit message"
  4. Run the configuration processor:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost config-processor-run.yml
  5. Update your deployment directory:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost ready-deployment.yml
  6. Apply the changes by running this playbook:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts swift-deploy.yml

8.5.8 Changing Swift Zone Layout

Before changing the number of Swift zones or the assignment of servers to specific zones, you must ensure that your system has sufficient storage available to perform the operation. Specifically, if you are adding a new zone, you may need additional storage. There are two reasons for this:

  • You cannot simply change the Swift zone number of disk drives in the ring. Instead, you need to remove the server(s) from the ring and then re-add the server(s) with a new Swift zone number to the ring. At the point where the servers are removed from the ring, there must be sufficient spare capacity on the remaining servers to hold the data that was originally hosted on the removed servers.

  • The total amount of storage in each Swift zone must be the same. This is because new data is added to each zone at the same rate. If one zone has a lower capacity than the other zones, once that zone becomes full, you cannot add more data to the system – even if there is unused space in the other zones.

As mentioned above, you cannot simply change the Swift zone number of disk drives in an existing ring. Instead, you must remove and then re-add servers. This is a summary of the process:

  1. Identify appropriate server groups that correspond to the desired Swift zone layout.

  2. Remove the servers in a server group from the rings. This process may be protracted, either by removing servers in small batches or by using the weight-step attribute so that you limit the amount of replication traffic that happens at once.

  3. Once all the targeted servers are removed, edit the swift-zones attribute in the ring specifications to add or remove a Swift zone.

  4. Re-add the servers you had temporarily removed to the rings. Again you may need to do this in batches or rely on the weight-step attribute.

  5. Continue removing and re-adding servers until you reach your final configuration.

8.5.8.1 Process for Changing Swift Zones

This section describes the detailed process of reorganizing Swift zones. As a concrete example, we assume we start with a single Swift zone and the target is three Swift zones. The same general process would apply if you were reducing the number of zones as well.

The process is as follows:

  1. Identify the appropriate server groups that represent the desired final state. In this example, we are going to change the Swift zone layout as follows:

     Original Layout:

     swift-zones:
       - id: 1
         server-groups:
            - AZ1
            - AZ2
            - AZ3

     Target Layout:

     swift-zones:
       - id: 1
         server-groups:
            - AZ1
       - id: 2
         server-groups:
            - AZ2
       - id: 3
         server-groups:
            - AZ3

    The plan is to move servers from server groups AZ2 and AZ3 to a new Swift zone number. The servers in AZ1 will remain in Swift zone 1.

  2. If you have not already done so, consider setting the weight-step attribute as described in Section 8.5.2, “Using the Weight-Step Attributes to Prepare for Ring Changes”.

  3. Identify the servers in the AZ2 server group. You may remove all servers at once or remove them in batches. If this is the first time you have performed a major ring change, we suggest you remove one or two servers only in the first batch. When you see how long this takes and the impact replication has on your system you can then use that experience to decide whether you can remove a larger batch of servers, or increase or decrease the weight-step attribute for the next server-removal cycle. To remove a server, use steps 2-9 as described in Section 13.1.5.1.4, “Removing a Swift Node” ensuring that you do not remove the servers from the input model.

  4. This process may take a number of ring rebalance cycles until the disk drives are removed from the ring files. Once this happens, you can edit the ring specifications and add Swift zone 2 as shown in this example:

    swift-zones:
      - id: 1
        server-groups:
          - AZ1
          - AZ3
      - id: 2
        server-groups:
          - AZ2
  5. The server removal process in step #3 sets the "remove" attribute in the pass-through attribute of the servers in server group AZ2. Edit the input model files and remove this pass-through attribute. This signals to the system that the servers should be used the next time the rings are rebalanced (that is, the servers should be added back to the rings).

  6. Commit your configuration to the local Git repository (Book “Installing with Cloud Lifecycle Manager”, Chapter 11 “Using Git for Configuration Management”), as follows:

    ardana > cd ~/openstack/ardana/ansible
    ardana > git add -A
    ardana > git commit -m "My config or other commit message"
  7. Run the configuration processor:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost config-processor-run.yml
  8. Use the playbook to create a deployment directory:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost ready-deployment.yml
  9. Rebuild and deploy the Swift rings containing the re-added servers by running this playbook:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts swift-deploy.yml
  10. Wait until replication has finished. For more details, see Section 8.5.4, “Determining When to Rebalance and Deploy a New Ring”.

  11. You may need to continue to rebalance the rings. For instructions, see the "Final Rebalance Phase" steps in Section 8.5.5, “Applying Input Model Changes to Existing Rings”.

  12. At this stage, the servers in server group AZ2 are responsible for Swift zone 2. Repeat the process in steps #3-9 to remove the servers in server group AZ3 from the rings and then re-add them to Swift zone 3. The ring specifications for zones (step 4) should be as follows:

    swift-zones:
      - id: 1
        server-groups:
          - AZ1
      - id: 2
        server-groups:
          - AZ2
      - id: 3
        server-groups:
          - AZ3
  13. Once complete, all data should be dispersed across the Swift zones as specified in the input model (that is, the replicas of each object are located in the appropriate Swift zones).

8.6 Configuring your Swift System to Allow Container Sync

Swift has a feature where all the contents of a container can be mirrored to another container through background synchronization. Swift operators configure their system to allow/accept sync requests to/from other systems, and the user specifies where to sync their container to along with a secret synchronization key. For an overview of this feature, refer to OpenStack Swift - Container to Container Synchronization.

8.6.1 Notes and limitations

The container synchronization is done as a background action. When you put an object into the source container, it will take some time before it becomes visible in the destination container. Storage services will not necessarily copy objects in any particular order, meaning they may be transferred in a different order from the one in which they were created.

Container sync may not be able to keep up with a moderate upload rate to a container. For example, if the average object upload rate to a container is greater than one object per second, then container sync may not be able to keep the objects synced.

If container sync is enabled on a container that already has a large number of objects then container sync may take a long time to sync the data. For example, a container with one million 1KB objects could take more than 11 days to complete a sync.

You may operate on the destination container just like any other container -- adding or deleting objects -- including the objects that are in the destination container because they were copied from the source container. To decide how to handle object creation, replacement or deletion, the system uses timestamps to determine what to do. In general, the latest timestamp "wins". That is, if you create an object, replace it, delete it, and then re-create it, the destination container will eventually contain the most recently created object. However, if you also create and delete objects in the destination container, you may see some subtle behaviors, as follows:

  • If an object is copied to the destination container and then deleted, it remains deleted in the destination even though there is still a copy in the source container. If you modify the object (replace or change its metadata) in the source container, it will reappear in the destination again.

  • The same applies to a replacement or metadata modification of an object in the destination container -- the object will remain as-is unless there is a replacement or modification in the source container.

  • If you replace or modify metadata of an object in the destination container and then delete it in the source container, it is not deleted from the destination. This is because your modified object has a later timestamp than the object you deleted in the source.

  • If you create an object in the source container and, before the system has a chance to copy it to the destination, you also create an object of the same name in the destination, then the object in the destination is not overwritten by the source container's object.

Segmented objects

Segmented objects (objects larger than 5GB) will not work seamlessly with container synchronization. If the manifest object is copied to the destination container before the object segments, when you perform a GET operation on the manifest object, the system may fail to find some or all of the object segments. If your manifest and object segments are in different containers, do not forget that both containers must be synchronized and that the container name of the object segments must be the same on both source and destination.

8.6.2 Prerequisites

Container to container synchronization requires that SSL certificates are configured on both the source and destination systems. For more information on how to implement SSL, see Book “Installing with Cloud Lifecycle Manager”, Chapter 30 “Configuring Transport Layer Security (TLS)”.

8.6.3 Configuring container sync

Container to container synchronization requires that both the source and destination Swift systems involved be configured to allow/accept this. In the context of container to container synchronization, Swift uses the term cluster to denote a Swift system. Swift clusters correspond to Control Planes in OpenStack terminology.

Gather the public API endpoints for both Swift systems

Gather information about the external/public URL used by each system, as follows:

  1. On the Cloud Lifecycle Manager of one system, get the public API endpoint of the system by running the following commands:

    ardana > source ~/service.osrc
    ardana > openstack endpoint list | grep swift

    The output of the command will look similar to this:

    ardana > openstack endpoint list | grep swift
    | 063a84b205c44887bc606c3ba84fa608 | region0 | swift           | object-store    | True    | admin     | https://10.13.111.176:8080/v1/AUTH_%(tenant_id)s |
    | 3c46a9b2a5f94163bb5703a1a0d4d37b | region0 | swift           | object-store    | True    | public    | https://10.13.120.105:8080/v1/AUTH_%(tenant_id)s |
    | a7b2f4ab5ad14330a7748c950962b188 | region0 | swift           | object-store    | True    | internal  | https://10.13.111.176:8080/v1/AUTH_%(tenant_id)s |

    The portion that you want is the public endpoint up to, but not including, the AUTH part. In the above example, this is https://10.13.120.105:8080/v1. (An alternative, filtered command is shown after these steps.)

  2. Repeat these steps on the other Swift system so you have both of the public API endpoints for them.
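
As an alternative to grep, the OpenStack client can filter the listing down to the public object-store endpoint directly. This is a sketch; it assumes your python-openstackclient version supports the --service and --interface options:

ardana > source ~/service.osrc
ardana > openstack endpoint list --service object-store --interface public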

Validate connectivity between both systems

The Swift nodes running the swift-container service must be able to connect to the public API endpoints of each other for the container sync to work. You can validate connectivity on each system using these steps.

For the sake of the examples, we will use the terms source and destination to denote the nodes doing the synchronization.

  1. Log in to a Swift node running the swift-container service on the source system. You can determine this by looking at the service list in your ~/openstack/my_cloud/info/service_info.yml file for a list of the servers containing this service.

  2. Verify the SSL certificates by running this command against the destination Swift server:

    ardana > echo | openssl s_client -connect PUBLIC_API_ENDPOINT:8080 -CAfile /etc/ssl/certs/ca-certificates.crt

    If the connection was successful you should see a return code of 0 (ok) similar to this:

    ...
    Timeout   : 300 (sec)
    Verify return code: 0 (ok)
  3. Also verify that the source node can connect to the destination Swift system using this command:

    ardana > curl -k https://DESTINATION_IP_OR_HOSTNAME:8080/healthcheck

    If the connection was successful, you should see a response of OK.

  4. Repeat these verification steps on any system involved in your container synchronization setup.

Configure container to container synchronization

Both the source and destination Swift systems must be configured the same way, using sync realms. For more details on how sync realms work, see OpenStack Swift - Configuring Container Sync.

To configure one of the systems, follow these steps:

  1. Log in to the Cloud Lifecycle Manager.

  2. Edit the ~/openstack/my_cloud/config/swift/container-sync-realms.conf.j2 file and uncomment the sync realm section.

    Here is a sample showing this section in the file:

    #Add sync realms here, for example:
    # [realm1]
    # key = realm1key
    # key2 = realm1key2
    # cluster_name1 = https://host1/v1/
    # cluster_name2 = https://host2/v1/
  3. Add in the details for your source and destination systems. Each realm you define is a set of clusters that have agreed to allow container syncing between them. These values are case sensitive.

    Only one key is required. The second key is optional and can be provided to allow an operator to rotate keys if desired. The cluster entries must use option names with the prefix cluster_, and their values are the public API endpoints for the systems (see the filled-in example after this procedure).

  4. Commit the changes to git:

    ardana > git add -A
    ardana > git commit -a -m "Add node <name>"
  5. Run the configuration processor:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost config-processor-run.yml
  6. Update the deployment directory:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost ready-deployment.yml
  7. Run the Swift reconfigure playbook:

    ardana > cd ~/scratch/ansible/next/ardana/ansible/
    ardana > ansible-playbook -i hosts/verb_hosts swift-reconfigure.yml
  8. Run this command to validate that your container synchronization is configured:

    ardana > source ~/service.osrc
    ardana > swift capabilities

    Here is a snippet of the output showing the container sync information. This should be populated with your cluster names:

    ...
    Additional middleware: container_sync
     Options:
      realms: {u'INTRACLUSTER': {u'clusters': {u'THISCLUSTER': {}}}}
  9. Repeat these steps on any other Swift systems that will be involved in your sync realms.
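
The following shows what a filled-in realm section might look like. This is only a sketch: the realm name (realm1), the key, and the cluster names are values that both systems must agree on, and the URLs shown here are placeholders for the public API endpoints gathered earlier:

[realm1]
key = realm1key
# key2 = realm1key2
cluster_cluster1 = https://10.13.120.105:8080/v1/
cluster_cluster2 = https://192.0.2.50:8080/v1/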

8.6.4 Configuring Intra Cluster Container Sync

It is possible to use the Swift container sync functionality to sync objects between containers within the same Swift system. Swift is automatically configured to allow intra cluster container sync. Each Swift PAC server will have an intracluster container sync realm defined in /etc/swift/container-sync-realms.conf.

For example:

# The intracluster realm facilitates syncing containers on this system
[intracluster]
key = lQ8JjuZfO
# key2 =
cluster_thiscluster = http://SWIFT-PROXY-VIP:8080/v1/

The keys defined in /etc/swift/container-sync-realms.conf are used by the container-sync daemon to determine trust. In addition, the two containers to be synced need a separate shared key, defined in the metadata of both containers, to establish trust between each other.

  1. Create two containers, for example container-src and container-dst. In this example we will sync one way from container-src to container-dst.

    ardana > swift post container-src
    ardana > swift post container-dst
  2. Determine your Swift account. In the following example, it is AUTH_1234:

    ardana > swift stat
                                     Account: AUTH_1234
                                  Containers: 3
                                     Objects: 42
                                       Bytes: 21692421
    Containers in policy "erasure-code-ring": 3
       Objects in policy "erasure-code-ring": 42
         Bytes in policy "erasure-code-ring": 21692421
                                Content-Type: text/plain; charset=utf-8
                 X-Account-Project-Domain-Id: default
                                 X-Timestamp: 1472651418.17025
                                  X-Trans-Id: tx81122c56032548aeae8cd-0057cee40c
                               Accept-Ranges: bytes
  3. Configure container-src to sync to container-dst using a key specified by both containers. Replace KEY with your key.

    ardana > swift post -t '//intracluster/thiscluster/AUTH_1234/container-dst' -k 'KEY' container-src
  4. Configure container-dst to accept synced objects with this key

    ardana > swift post -k 'KEY' container-dst
  5. Upload objects to container-src. Within a few minutes, the objects should be automatically synced to container-dst.
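
For example, a quick way to exercise the sync (example.txt is a hypothetical local file):

ardana > swift upload container-src example.txt
ardana > swift list container-dst

After the sync interval has passed, example.txt should appear in the listing of container-dst.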

Changing the intracluster realm key

The intracluster realm key used by container sync to sync objects between containers in the same swift system is automatically generated. The process for changing passwords is described in Section 4.7, “Changing Service Passwords”.

The steps to change the intracluster realm key are as follows.

  1. On the Cloud Lifecycle Manager, create a file called ~/openstack/change_credentials/swift_data_metadata.yml with the contents included below. The consuming-cp and cp entries are the name of the control plane, specified in ~/openstack/my_cloud/definition/data/control_plane.yml, where the swift-container service is running.

    swift_intracluster_sync_key:
     metadata:
     - clusters:
       - swpac
       component: swift-container
       consuming-cp: control-plane-1
       cp: control-plane-1
     version: '2.0'
  2. Run the following commands

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost config-processor-run.yml
    ardana > ansible-playbook -i hosts/localhost ready-deployment.yml
  3. Reconfigure the swift credentials

    ardana > cd ~/scratch/ansible/next/ardana/ansible/
    ardana > ansible-playbook -i hosts/verb_hosts swift-reconfigure-credentials-change.yml
  4. Delete ~/openstack/change_credentials/swift_data_metadata.yml

    ardana > rm ~/openstack/change_credentials/swift_data_metadata.yml
  5. On a Swift PAC server, check that the intracluster realm key has been updated in /etc/swift/container-sync-realms.conf:

    # The intracluster realm facilitates syncing containers on this system
    [intracluster]
    key = aNlDn3kWK
  6. Update any containers using the intracluster container sync to use the new intracluster realm key

    ardana > swift post -k 'aNlDn3kWK' container-src
    ardana > swift post -k 'aNlDn3kWK' container-dst

9 Managing Networking

Information about managing and configuring the Networking service.

9.1 Configuring the SUSE OpenStack Cloud Firewall

The following instructions provide information about how to identify and modify the overall SUSE OpenStack Cloud firewall that is configured in front of the control services. This firewall is administered only by a cloud admin and is not available for tenant use for private network firewall services.

During the installation process, the configuration processor will automatically generate "allow" firewall rules for each server based on the services deployed and block all other ports. These are populated in ~/openstack/my_cloud/info/firewall_info.yml, which includes a list of all the ports by network, including the addresses on which the ports will be opened. This is described in more detail in Book “Planning an Installation with Cloud Lifecycle Manager”, Chapter 5 “Input Model”, Section 5.2 “Concepts”, Section 5.2.10 “Networking”, Section 5.2.10.5 “Firewall Configuration”.

The firewall_rules.yml file in the input model allows you to define additional rules for each network group. You can read more about this in Book “Planning an Installation with Cloud Lifecycle Manager”, Chapter 6 “Configuration Objects”, Section 6.15 “Firewall Rules”.

The purpose of this document is to show you how to make post-installation changes to the firewall rules if the need arises.

Important

This process is not to be confused with Firewall-as-a-Service (see Book “User Guide Overview”, Chapter 14 “Using Firewall as a Service (FWaaS)”), which is a separate service that enables the ability for SUSE OpenStack Cloud tenants to create north-south, network-level firewalls to provide stateful protection to all instances in a private, tenant network. This service is optional and is tenant-configured.

9.1.1 Making Changes to the Firewall Rules

  1. Log in to your Cloud Lifecycle Manager.

  2. Edit your ~/openstack/my_cloud/definition/data/firewall_rules.yml file and add the lines necessary to allow the port(s) needed through the firewall.

    In this example we are going to open up port range 5900-5905 to allow VNC traffic through the firewall:

      - name: VNC
        network-groups:
          - MANAGEMENT
        rules:
          - type: allow
            remote-ip-prefix: 0.0.0.0/0
            port-range-min: 5900
            port-range-max: 5905
            protocol: tcp
    Note

    The example above shows a remote-ip-prefix of 0.0.0.0/0, which opens the ports to all IP ranges. To be more secure, you can instead specify the CIDR of the network from which you will be making VNC connections (see the example after this procedure).

  3. Commit those changes to your local git:

    ardana > cd ~/openstack/ardana/ansible
    ardana > git add -A
    ardana > git commit -m "firewall rule update"
  4. Run the configuration processor:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost config-processor-run.yml
  5. Create the deployment directory structure:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost ready-deployment.yml
  6. Change to the deployment directory and run the osconfig-iptables-deploy.yml playbook to update your iptable rules to allow VNC:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts osconfig-iptables-deploy.yml

You can repeat these steps as needed to add, remove, or edit any of these firewall rules.
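
For example, a more restrictive variant of the VNC rule above limits access to a single administrative network. The CIDR shown is purely illustrative; substitute the network you actually connect from:

  - name: VNC
    network-groups:
      - MANAGEMENT
    rules:
      - type: allow
        remote-ip-prefix: 192.168.100.0/24
        port-range-min: 5900
        port-range-max: 5905
        protocol: tcp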

9.2 DNS Service Overview

SUSE OpenStack Cloud DNS service provides multi-tenant Domain Name Service with REST API management for domain and records.

Warning

The DNS Service is not intended to be used as an internal or private DNS service. The name records in DNSaaS should be treated as public information that anyone could query. There are controls to prevent tenants from creating records for domains they do not own. TSIG (Transaction SIGnature) ensures integrity during zone transfers to other DNS servers.

9.2.1 For More Information

9.2.2 Designate Initial Configuration

After the SUSE OpenStack Cloud installation has been completed, Designate requires initial configuration to operate.

9.2.2.1 Identifying Name Server Public IPs

Depending on the back-end, the method used to identify the name servers' public IPs will differ.

9.2.2.1.1 InfoBlox

With the InfoBlox back-end, InfoBlox acts as your public name servers; consult the InfoBlox management UI to identify the IPs.

9.2.2.1.2 DynECT or Akamai Back-end

Not applicable: Proceed to Section 9.2.2.1.5, “Registering Name Server Entries”.

9.2.2.1.3 PowerDNS or BIND Back-end

You can find the name server IPs in /etc/hosts by looking for the ext-api addresses, which are the addresses of the controllers. For example:

192.168.10.1 example-cp1-c1-m1-extapi
192.168.10.2 example-cp1-c1-m2-extapi
192.168.10.3 example-cp1-c1-m3-extapi
9.2.2.1.4 Creating Name Server A Records

Each name server requires a public name, for example ns1.example.com., to which Designate-managed domains will be delegated. There are two common locations where these names may be registered: within a zone hosted on Designate itself, or within a zone hosted on an external DNS service.

If you are using an externally managed zone for these names:

  1. For each name server public IP, create the necessary A records in the external system.

  2. Proceed to Section 9.2.2.1.5, “Registering Name Server Entries”.

If you are using a Designate-managed zone for these names:

  1. Proceed to Section 9.2.2.1.5, “Registering Name Server Entries” and when complete, continue with the remaining steps below.

  2. Create the zone in Designate which will contain the records:

    ardana > openstack zone create --email hostmaster@example.com example.com.
    +----------------+--------------------------------------+
    | Field          | Value                                |
    +----------------+--------------------------------------+
    | action         | CREATE                               |
    | created_at     | 2016-03-09T13:16:41.000000           |
    | description    | None                                 |
    | email          | hostmaster@example.com               |
    | id             | 23501581-7e34-4b88-94f4-ad8cec1f4387 |
    | masters        |                                      |
    | name           | example.com.                         |
    | pool_id        | 794ccc2c-d751-44fe-b57f-8894c9f5c842 |
    | project_id     | a194d740818942a8bea6f3674e0a3d71     |
    | serial         | 1457529400                           |
    | status         | PENDING                              |
    | transferred_at | None                                 |
    | ttl            | 3600                                 |
    | type           | PRIMARY                              |
    | updated_at     | None                                 |
    | version        | 1                                    |
    +----------------+--------------------------------------+
  3. For each name server public IP, create an A record. For example:

    ardana > openstack recordset create --records 192.168.10.1 --type A example.com. ns1.example.com.
    +-------------+--------------------------------------+
    | Field       | Value                                |
    +-------------+--------------------------------------+
    | action      | CREATE                               |
    | created_at  | 2016-03-09T13:18:36.000000           |
    | description | None                                 |
    | id          | 09e962ed-6915-441a-a5a1-e8d93c3239b6 |
    | name        | ns1.example.com.                     |
    | records     | 192.168.10.1                         |
    | status      | PENDING                              |
    | ttl         | None                                 |
    | type        | A                                    |
    | updated_at  | None                                 |
    | version     | 1                                    |
    | zone_id     | 23501581-7e34-4b88-94f4-ad8cec1f4387 |
    +-------------+--------------------------------------+
  4. When records have been added, list the record sets in the zone to validate:

    ardana > openstack recordset list example.com.
    +--------------+------------------+------+---------------------------------------------------+
    | id           | name             | type | records                                           |
    +--------------+------------------+------+---------------------------------------------------+
    | 2d6cf...655b | example.com.     | SOA  | ns1.example.com. hostmaster.example.com 145...600 |
    | 33466...bd9c | example.com.     | NS   | ns1.example.com.                                  |
    | da98c...bc2f | example.com.     | NS   | ns2.example.com.                                  |
    | 672ee...74dd | example.com.     | NS   | ns3.example.com.                                  |
    | 09e96...39b6 | ns1.example.com. | A    | 192.168.10.1                                      |
    | bca4f...a752 | ns2.example.com. | A    | 192.168.10.2                                      |
    | 0f123...2117 | ns3.example.com. | A    | 192.168.10.3                                      |
    +--------------+------------------+------+---------------------------------------------------+
  5. Contact your domain registrar to request that glue records be registered in the com. zone for the name server and public IP address pairs above. If you are using a sub-zone of an existing company zone (for example, ns1.cloud.mycompany.com.), the glue records must be placed in the mycompany.com. zone.
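
Once the glue records are in place, you can verify the delegation from outside the cloud with a standard DNS lookup tool such as dig; a minimal check, assuming example.com. is your zone:

ardana > dig +short NS example.com.
ardana > dig +short A ns1.example.com.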

9.2.2.1.5 Registering Name Server Entries
  1. Connect to the Cloud Lifecycle Manager, and source the service.osrc credentials.

  2. For each name server public name, register the name within Designate. For example:

    ardana > designate server-create --name ns1.example.com.
    +------------+--------------------------------------+
    | Field      | Value                                |
    +------------+--------------------------------------+
    | id         | d65a7522-a74a-4e0d-a461-76060e3eb656 |
    | created_at | 2016-03-09T13:19:12.000000           |
    | updated_at | None                                 |
    | name       | ns1.example.com.                     |
    +------------+--------------------------------------+
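
To confirm the entries were registered, you can list the name servers known to Designate (using the same designate client as above):

ardana > designate server-list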
9.2.2.1.6 For More Information

For additional DNS integration and configuration information, see the OpenStack Designate documentation at https://docs.openstack.org/designate/pike/index.html.

For more information on creating servers, domains and examples, see the OpenStack REST API documentation at https://developer.openstack.org/api-ref/dns/.

9.2.3 DNS Service Monitoring Support

Additional monitoring support for the DNS Service (Designate) has been added to SUSE OpenStack Cloud.

After running designate-stop.yml, the Networking section of the Operations Console shows alarms for all of the DNS Service (Designate) processes, such as designate-zone-manager, designate-api, designate-pool-manager, designate-mdns, and designate-central.

You can run designate-start.yml to start the DNS Services back up; the alarms will then change from red to green and be removed from the New Alarms panel of the Operations Console.
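
For reference, these playbooks are run from the deployment directory in the same way as the other service playbooks in this guide; a sketch (verify the playbook names in your deployment before running them):

ardana > cd ~/scratch/ansible/next/ardana/ansible
ardana > ansible-playbook -i hosts/verb_hosts designate-stop.yml
ardana > ansible-playbook -i hosts/verb_hosts designate-start.yml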

An example of the generated alarms from the Operations Console is provided below after running designate-stop.yml:

ALARM:  STATE:  ALARM ID:  LAST CHECK:  DIMENSION:
Process Check
0f221056-1b0e-4507-9a28-2e42561fac3e 2016-10-03T10:06:32.106Z hostname=ardana-cp1-c1-m1-mgmt,
service=dns,
cluster=cluster1,
process_name=designate-zone-manager,
component=designate-zone-manager,
control_plane=control-plane-1,
cloud_name=entry-scale-kvm

Process Check
50dc4c7b-6fae-416c-9388-6194d2cfc837 2016-10-03T10:04:32.086Z hostname=ardana-cp1-c1-m1-mgmt,
service=dns,
cluster=cluster1,
process_name=designate-api,
component=designate-api,
control_plane=control-plane-1,
cloud_name=entry-scale-kvm

Process Check
55cf49cd-1189-4d07-aaf4-09ed08463044 2016-10-03T10:05:32.109Z hostname=ardana-cp1-c1-m1-mgmt,
service=dns,
cluster=cluster1,
process_name=designate-pool-manager,
component=designate-pool-manager,
control_plane=control-plane-1,
cloud_name=entry-scale-kvm

Process Check
c4ab7a2e-19d7-4eb2-a9e9-26d3b14465ea 2016-10-03T10:06:32.105Z hostname=ardana-cp1-c1-m1-mgmt,
service=dns,
cluster=cluster1,
process_name=designate-mdns,
component=designate-mdns,
control_plane=control-plane-1,
cloud_name=entry-scale-kvm

HTTP Status
c6349bbf-4fd1-461a-9932-434169b86ce5 2016-10-03T10:05:01.731Z service=dns,
cluster=cluster1,
url=http://100.60.90.3:9001/,
hostname=ardana-cp1-c1-m3-mgmt,
component=designate-api,
control_plane=control-plane-1,
api_endpoint=internal,
cloud_name=entry-scale-kvm,
monitored_host_type=instance

Process Check
ec2c32c8-3b91-4656-be70-27ff0c271c89 2016-10-03T10:04:32.082Z hostname=ardana-cp1-c1-m1-mgmt,
service=dns,
cluster=cluster1,
process_name=designate-central,
component=designate-central,
control_plane=control-plane-1,
cloud_name=entry-scale-kvm

9.3 Networking Service Overview

SUSE OpenStack Cloud Networking is a virtual networking service that leverages the OpenStack Neutron service to provide network connectivity and addressing to SUSE OpenStack Cloud Compute service devices.

The Networking service also provides an API to configure and manage a variety of network services.

You can use the Networking service to connect guest servers or you can define and configure your own virtual network topology.

9.3.1 Installing the Networking service

SUSE OpenStack Cloud Network Administrators are responsible for planning the Neutron networking service and, once it is installed, for configuring it to meet the needs of their cloud network users.

9.3.2 Working with the Networking service

To perform tasks using the Networking service, you can use the dashboard, API or CLI.

9.3.3 Reconfiguring the Networking service

If you change any of the network configuration after installation, it is recommended that you reconfigure the Networking service by running the neutron-reconfigure playbook.

On the Cloud Lifecycle Manager:

ardana > cd ~/openstack/ardana/ansible
ardana > ansible-playbook -i hosts/verb_hosts neutron-reconfigure.yml

9.3.4 For more information

For information on how to operate your cloud we suggest you read the OpenStack Operations Guide. The Architecture section contains useful information about how an OpenStack Cloud is put together. However, SUSE OpenStack Cloud takes care of these details for you. The Operations section contains information on how to manage the system.

9.3.5 Neutron External Networks

9.3.5.1 External networks overview

This topic explains how to create a Neutron external network.

External networks provide access to the internet.

The typical use is to provide an IP address that can be used to reach a VM from an external network, which can be a public network such as the internet or a network that is private to an organization.

9.3.5.2 Using the Ansible Playbook

This playbook will query the Networking service for an existing external network, and then create a new one if you do not already have one. The resulting external network will have the name ext-net with a subnet matching the CIDR you specify in the command below.

If you need more granularity, for example to specify an allocation pool for the subnet, use the procedure in Section 9.3.5.3, “Using the NeutronClient CLI”.

ardana > cd ~/scratch/ansible/next/ardana/ansible
ardana > ansible-playbook -i hosts/verb_hosts neutron-cloud-configure.yml -e EXT_NET_CIDR=<CIDR>

The table below shows the optional switch that you can use as part of this playbook to specify environment-specific information:

Switch                     Description

-e EXT_NET_CIDR=<CIDR>

Optional. You can use this switch to specify the external network CIDR. If you choose not to use this switch, or use a wrong value, the VMs will not be accessible over the network.

This CIDR will be from the EXTERNAL VM network.

9.3.5.3 Using the NeutronClient CLI

For more granularity you can utilize the Neutron command line tool to create your external network.

  1. Log in to the Cloud Lifecycle Manager.

  2. Source the Admin creds:

    ardana > source ~/service.osrc
  3. Create the external network and then the subnet using the commands below. A concrete example with sample values follows this procedure.

    Creating the network:

    ardana > neutron net-create --router:external <external-network-name>

    Creating the subnet:

    ardana > neutron subnet-create EXTERNAL-NETWORK-NAME CIDR --gateway GATEWAY --allocation-pool start=IP_START,end=IP_END [--disable-dhcp]

    Where:

    Value                     Description
    external-network-name

    This is the name given to your external network. This is a unique value that you will choose. The value ext-net is usually used.

    CIDR

    Specifies the external network CIDR for the subnet. If you use an incorrect value, the VMs will not be accessible over the network.

    This CIDR will be from the EXTERNAL VM network.

    --gateway

    Optional switch to specify the gateway IP for your subnet. If this is not included then it will choose the first available IP.

    --allocation-pool start end

    Optional switch to specify a start and end IP address to use as the allocation pool for this subnet.

    --disable-dhcp

    Optional switch if you want to disable DHCP on this subnet. If this is not specified then DHCP will be enabled.
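
For example, a hypothetical invocation with concrete values (network name ext-net, CIDR 172.31.0.0/16, and an illustrative gateway and allocation pool):

ardana > neutron net-create --router:external ext-net
ardana > neutron subnet-create ext-net 172.31.0.0/16 --gateway 172.31.0.1 --allocation-pool start=172.31.0.10,end=172.31.0.250 --disable-dhcp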

9.3.5.4 Multiple External Networks

SUSE OpenStack Cloud provides the ability to have multiple external networks, by using the Network Service (Neutron) provider networks for external networks. You can configure SUSE OpenStack Cloud to allow the use of provider VLANs as external networks by following these steps.

  1. Do NOT include the neutron.l3_agent.external_network_bridge tag in the network_groups definition for your cloud. This results in the l3_agent.ini external_network_bridge being set to an empty value (rather than the traditional br-ex).

  2. Configure your cloud to use provider VLANs, by specifying the provider_physical_network tag on one of the network_groups defined for your cloud.

    For example, to run provider VLANS over the EXAMPLE network group: (some attributes omitted for brevity)

    network-groups:
    
      - name: EXAMPLE
        tags:
          - neutron.networks.vlan:
              provider-physical-network: physnet1
  3. After the cloud has been deployed, you can create external networks using provider VLANs.

    For example, using the Network Service CLI:

    1. Create external network 1 on vlan101

      neutron net-create --provider:network_type vlan --provider:physical_network physnet1 --provider:segmentation_id 101 ext-net1 --router:external true
    2. Create external network 2 on vlan102

      neutron net-create --provider:network_type vlan --provider:physical_network physnet1 --provider:segmentation_id 102 ext-net2 --router:external true

9.3.6 Neutron Provider Networks

This topic explains how to create a Neutron provider network.

A provider network is a virtual network created in the SUSE OpenStack Cloud environment that is consumed by SUSE OpenStack Cloud services. The distinctive element of a provider network is that it does not create a virtual router; rather, it depends on L3 routing that is provided by the infrastructure.

A provider network is created by adding the specification to the SUSE OpenStack Cloud input model. It consists of at least one network and one or more subnets.

9.3.6.1 SUSE OpenStack Cloud input model

The input model is the primary mechanism a cloud admin uses in defining a SUSE OpenStack Cloud installation. It exists as a directory with a data subdirectory that contains YAML files. By convention, any service that creates a Neutron provider network will create a subdirectory under the data directory and the name of the subdirectory shall be the project name. For example, the Octavia project will use Neutron provider networks so it will have a subdirectory named 'octavia' and the config file that specifies the neutron network will exist in that subdirectory.

├── cloudConfig.yml
├── data
│   ├── control_plane.yml
│   ├── disks_compute.yml
│   ├── disks_controller_1TB.yml
│   ├── disks_controller.yml
│   ├── firewall_rules.yml
│   ├── net_interfaces.yml
│   ├── network_groups.yml
│   ├── networks.yml
│   ├── neutron
│   │   └── neutron_config.yml
│   ├── nic_mappings.yml
│   ├── server_groups.yml
│   ├── server_roles.yml
│   ├── servers.yml
│   ├── swift
│   │   └── rings.yml
│   └── octavia
│       └── octavia_config.yml
├── README.html
└── README.md

9.3.6.2 Network/Subnet specification

The elements required in the input model for you to define a network are:

  • name

  • network_type

  • physical_network

Elements that are optional when defining a network are:

  • segmentation_id

  • shared

Required elements for the subnet definition are:

  • cidr

Optional elements for the subnet definition are:

  • allocation_pools which will require start and end addresses

  • host_routes which will require a destination and nexthop

  • gateway_ip

  • no_gateway

  • enable-dhcp

NOTE: Only IPv4 is supported at the present time.

9.3.6.3 Network details

The following table outlines the network values to be set, and what they represent.

Attribute          Required/Optional   Allowed Values         Usage
name               Required
network_type       Required            flat, vlan, vxlan      The type of desired network
physical_network   Required            Valid name             Name of the physical network that is overlaid with the virtual network
segmentation_id    Optional            vlan or vxlan ranges   VLAN id for vlan or tunnel id for vxlan
shared             Optional            True                   Shared by all projects or private to a single project

9.3.6.4 Subnet details

The following table outlines the subnet values to be set, and what they represent.

Attribute          Req/Opt    Allowed Values                    Usage
cidr               Required   Valid CIDR range                  For example, 172.30.0.0/24
allocation_pools   Optional   See allocation_pools table below
host_routes        Optional   See host_routes table below
gateway_ip         Optional   Valid IP addr                     Subnet gateway to other nets
no_gateway         Optional   True                              No distribution of gateway
enable-dhcp        Optional   True                              Enable DHCP for this subnet

9.3.6.5 ALLOCATION_POOLS details

The following table explains allocation pool settings.

Attribute   Req/Opt    Allowed Values   Usage
start       Required   Valid IP addr    First IP address in pool
end         Required   Valid IP addr    Last IP address in pool

9.3.6.6 HOST_ROUTES details

The following table explains host route settings.

Attribute     Req/Opt    Allowed Values   Usage
destination   Required   Valid CIDR       Destination subnet
nexthop       Required   Valid IP addr    Hop to take to destination subnet
Note

Multiple destination/nexthop values can be used.

9.3.6.7 Examples

The following examples show the configuration file settings for Neutron and Octavia.

Octavia configuration

This file defines the mapping. It does not need to be edited unless you want to change the name of your VLAN.

Path: ~/openstack/my_cloud/definition/data/octavia/octavia_config.yml

---
  product:
    version: 2

  configuration-data:
    - name: OCTAVIA-CONFIG-CP1
      services:
        - octavia
      data:
        amp_network_name: OCTAVIA-MGMT-NET

Neutron configuration

Input your network configuration information for your provider VLANs in neutron_config.yml found here:

~/openstack/my_cloud/definition/data/neutron/.

---
  product:
    version: 2

  configuration-data:
    - name:  NEUTRON-CONFIG-CP1
      services:
        - neutron
      data:
        neutron_provider_networks:
        - name: OCTAVIA-MGMT-NET
          provider:
            - network_type: vlan
              physical_network: physnet1
              segmentation_id: 2754
          cidr: 10.13.189.0/24
          no_gateway:  True
          enable_dhcp: True
          allocation_pools:
            - start: 10.13.189.4
              end: 10.13.189.252
          host_routes:
            # route to MANAGEMENT-NET
            - destination: 10.13.111.128/26
              nexthop:  10.13.189.5

9.3.6.8 Implementing your changes

  1. Commit the changes to git:

    ardana > git add -A
    ardana > git commit -a -m "configuring provider network"
  2. Run the configuration processor:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost config-processor-run.yml
  3. Update your deployment directory:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost ready-deployment.yml
  4. Then continue with your clean cloud installation.

  5. If you are only adding a Neutron provider network to an existing model, run the neutron-deploy.yml playbook:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts neutron-deploy.yml

9.3.6.9 Multiple Provider Networks

The physical network infrastructure must be configured to convey the provider VLAN traffic as tagged VLANs to the cloud compute nodes and network service network nodes. Configuration of the physical network infrastructure is outside the scope of the SUSE OpenStack Cloud 8 software.

SUSE OpenStack Cloud 8 automates the server networking configuration and the Network Service configuration based on information in the cloud definition. To configure the system for provider VLANs, specify the neutron.networks.vlan tag with a provider-physical-network attribute on one or more network groups. For example (some attributes omitted for brevity):

network-groups:

  - name: NET_GROUP_A
    tags:
      - neutron.networks.vlan:
          provider-physical-network: physnet1

  - name: NET_GROUP_B
    tags:
      - neutron.networks.vlan:
          provider-physical-network: physnet2

A network group is associated with a server network interface via an interface model. For example (some attributes omitted for brevity):

interface-models:
  - name: INTERFACE_SET_X
    network-interfaces:
      - device:
          name: bond0
        network-groups:
          - NET_GROUP_A
      - device:
          name: eth3
        network-groups:
          - NET_GROUP_B

A network group used for provider VLANs may contain only a single SUSE OpenStack Cloud network, because that VLAN must span all compute nodes and any Network Service network nodes/controllers (that is, it is a single L2 segment). The SUSE OpenStack Cloud network must be defined with tagged-vlan false, otherwise a Linux VLAN network interface will be created. For example:

networks:

  - name: NET_A
    tagged-vlan: false
    network-group: NET_GROUP_A

  - name: NET_B
    tagged-vlan: false
    network-group: NET_GROUP_B

When the cloud is deployed, SUSE OpenStack Cloud 8 will create the appropriate bridges on the servers, and set the appropriate attributes in the Neutron configuration files (for example, bridge_mappings).
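
For illustration only, the generated Open vSwitch agent configuration on a compute or network node might then contain a bridge mapping similar to the following; the bridge names here are hypothetical, as the actual names are chosen by the deployment:

[ovs]
bridge_mappings = physnet1:br-bond0,physnet2:br-eth3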

After the cloud has been deployed, create Network Service network objects for each provider VLAN. For example, using the Network Service CLI:

ardana > neutron net-create --provider:network_type vlan --provider:physical_network physnet1 --provider:segmentation_id 101 mynet101
ardana > neutron net-create --provider:network_type vlan --provider:physical_network physnet2 --provider:segmentation_id 234 mynet234

9.3.6.10 More Information

For more information on the Network Service command-line interface (CLI), see the OpenStack networking command-line client reference: http://docs.openstack.org/cli-reference/content/neutronclient_commands.html

9.3.7 Using IPAM Drivers in the Networking Service

This topic describes how to choose and implement an IPAM driver.

9.3.7.1 Selecting and implementing an IPAM driver

Beginning with the Liberty release, OpenStack networking includes a pluggable interface for the IP Address Management (IPAM) function. This interface creates a driver framework for the allocation and de-allocation of subnets and IP addresses, enabling the integration of alternate IPAM implementations or third-party IP Address Management systems.

There are three possible IPAM driver options:

  • Non-pluggable driver. This option is the default when the ipam_driver parameter is not specified in neutron.conf.

  • Pluggable reference IPAM driver. The pluggable IPAM driver interface was introduced in SUSE OpenStack Cloud 8 (OpenStack Liberty). It is a refactoring of the Kilo non-pluggable driver to use the new pluggable interface. The setting in neutron.conf to specify this driver is ipam_driver = internal.

  • Pluggable Infoblox IPAM driver. The pluggable Infoblox IPAM driver is a third-party implementation of the pluggable IPAM interface. The corresponding setting in neutron.conf to specify this driver is ipam_driver = networking_infoblox.ipam.driver.InfobloxPool.

    Note

    You can use either the non-pluggable IPAM driver or a pluggable one. However, you cannot use both.

9.3.7.2 Using the Pluggable reference IPAM driver

To use the Pluggable reference IPAM driver, the only parameter needed is ipam_driver. You can set it by locating the following commented line in the neutron.conf.j2 template (ipam_driver = internal), uncommenting it, and committing the file. After following the standard steps to deploy Neutron, Neutron will be configured to run using the Pluggable reference IPAM driver.

As stated, the file you must edit is neutron.conf.j2 on the Cloud Lifecycle Manager in the directory ~/openstack/my_cloud/config/neutron. Here is the relevant section where you can see the ipam_driver parameter commented out:

[DEFAULT]
  ...
  l3_ha_net_cidr = 169.254.192.0/18

  # Uncomment the line below if the Reference Pluggable IPAM driver is to be used
  # ipam_driver = internal
  ...

After uncommenting the line ipam_driver = internal, commit the file using git commit from the openstack/my_cloud directory:

ardana > git commit -a -m 'My config for enabling the internal IPAM Driver'

Then follow the steps to deploy SUSE OpenStack Cloud in the Book “Installing with Cloud Lifecycle Manager”, Preface “Installation Overview” appropriate to your cloud configuration.

Note

Currently there is no migration path from the non-pluggable driver to a pluggable IPAM driver because changes are needed to database tables and Neutron currently cannot make those changes.

9.3.7.3 Using the Infoblox IPAM driver

As suggested above, using the Infoblox IPAM driver requires changes to existing parameters in nova.conf and neutron.conf. If you want to use the Infoblox appliance, you will need to add the infoblox-ipam-agent service component to the service role containing the Neutron API server. To use the Infoblox appliance for IPAM, both the agent and the Infoblox IPAM driver are required. The infoblox-ipam-agent should be deployed on the same node where the neutron-server component is running. Usually this is a Controller node.

  1. Have the Infoblox appliance running on the management network (the Infoblox appliance admin or the datacenter administrator should know how to perform this step).

  2. Change the control plane definition to add infoblox-ipam-agent as a service in the controller node cluster (the added line is shown in the example below). Make the changes in control_plane.yml found here: ~/openstack/my_cloud/definition/data/control_plane.yml

    ---
      product:
        version: 2
    
      control-planes:
        - name: ccp
          control-plane-prefix: ccp
     ...
          clusters:
            - name: cluster0
              cluster-prefix: c0
              server-role: ARDANA-ROLE
              member-count: 1
              allocation-policy: strict
              service-components:
                - lifecycle-manager
            - name: cluster1
              cluster-prefix: c1
              server-role: CONTROLLER-ROLE
              member-count: 3
              allocation-policy: strict
              service-components:
                - ntp-server
    ...
                - neutron-server
                - infoblox-ipam-agent
    ...
                - designate-client
                - powerdns
          resources:
            - name: compute
              resource-prefix: comp
              server-role: COMPUTE-ROLE
              allocation-policy: any
  3. Modify the ~/openstack/my_cloud/config/neutron/neutron.conf.j2 file on the controller node to comment and uncomment the lines noted below to enable use with the Infoblox appliance:

    [DEFAULT]
                ...
                l3_ha_net_cidr = 169.254.192.0/18
    
    
                # Uncomment the line below if the Reference Pluggable IPAM driver is to be used
                # ipam_driver = internal
    
    
                # Comment out the line below if the Infoblox IPAM Driver is to be used
                # notification_driver = messaging
    
                # Uncomment the lines below if the Infoblox IPAM driver is to be used
                ipam_driver = networking_infoblox.ipam.driver.InfobloxPool
                notification_driver = messagingv2
    
    
                # Modify the infoblox sections below to suit your cloud environment
    
                [infoblox]
                cloud_data_center_id = 1
                # The name of the section below is formed as "infoblox-dc:<infoblox.cloud_data_center_id>"
                # If cloud_data_center_id is 1, then the section name is "infoblox-dc:1"

                [infoblox-dc:1]
                http_request_timeout = 120
                http_pool_maxsize = 100
                http_pool_connections = 100
                ssl_verify = False
                wapi_version = 2.2
                admin_user_name = admin
                admin_password = infoblox
                grid_master_name = infoblox.localdomain
                grid_master_host = 1.2.3.4
    
    
                [QUOTAS]
                ...
  4. Change nova.conf.j2 to replace the notification driver "messaging" with "messagingv2":

     ...
    
     # Oslo messaging
     notification_driver = log
    
     #  Note:
     #  If the infoblox-ipam-agent is to be deployed in the cloud, change the
     #  notification_driver setting from "messaging" to "messagingv2".
     notification_driver = messagingv2
     notification_topics = notifications
    
     # Policy
     ...
  5. Commit the changes:

    ardana > cd ~/openstack/my_cloud
    ardana > git commit -a -m 'My config for enabling the Infoblox IPAM driver'
  6. Deploy the cloud with the changes. Due to changes to the control_plane.yml, you will need to rerun the config-processor-run.yml playbook if you have run it already during the install process.

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost config-processor-run.yml
    ardana > ansible-playbook -i hosts/localhost ready-deployment.yml
    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts site.yml

9.3.7.4 Configuration parameters for using the Infoblox IPAM driver

Changes required in the notification parameters in nova.conf:

Parameter Name           Section in nova.conf   Default Value   Current Value       Description

notify_on_state_change   DEFAULT                None            vm_and_task_state   Send compute.instance.update notifications on
                                                                                    instance state changes. vm_and_task_state means
                                                                                    notify on VM and task state changes. Infoblox
                                                                                    requires the value to be vm_state (notify on VM
                                                                                    state change); thus NO CHANGE is needed for
                                                                                    Infoblox.

notification_topics      DEFAULT                empty list      notifications       NO CHANGE is needed for Infoblox. The Infoblox
                                                                                    installation guide requires the notification
                                                                                    topics to be "notifications".

notification_driver      DEFAULT                None            messaging           Change needed. The Infoblox installation guide
                                                                                    requires the notification driver to be
                                                                                    "messagingv2".

Changes to existing parameters in neutron.conf

Parameter Name        Section in neutron.conf   Default Value   Current Value                   Description

ipam_driver           DEFAULT                   None            None (parameter is undeclared   Pluggable IPAM driver to be used by the
                                                                in neutron.conf)                Neutron API server. For Infoblox, the value
                                                                                                is "networking_infoblox.ipam.driver.InfobloxPool".

notification_driver   DEFAULT                   empty list      messaging                       The driver used to send notifications from
                                                                                                the Neutron API server to the Neutron
                                                                                                agents. The installation guide for
                                                                                                networking-infoblox calls for the
                                                                                                notification_driver to be "messagingv2".

notification_topics   DEFAULT                   None            notifications                   No change needed. This row is included to
                                                                                                show the Neutron parameters described in
                                                                                                the installation guide for
                                                                                                networking-infoblox.

Parameters specific to the Networking Infoblox Driver. All the parameters for the Infoblox IPAM driver must be defined in neutron.conf.

Parameter Name          Section in neutron.conf              Default Value   Description

cloud_data_center_id    infoblox                             0               ID for selecting a particular grid from one or more grids
                                                                             to serve networks in the Infoblox back end
ipam_agent_workers      infoblox                             1               Number of Infoblox IPAM agent workers to run
grid_master_host        infoblox-dc:<cloud_data_center_id>   empty string    IP address of the grid master. WAPI requests are sent to
                                                                             the grid_master_host
ssl_verify              infoblox-dc:<cloud_data_center_id>   False           Whether WAPI requests sent over HTTPS require SSL
                                                                             verification
wapi_version            infoblox-dc:<cloud_data_center_id>   1.4             The WAPI version. The value should be 2.2.
admin_user_name         infoblox-dc:<cloud_data_center_id>   empty string    Admin user name to access the grid master or cloud
                                                                             platform appliance
admin_password          infoblox-dc:<cloud_data_center_id>   empty string    Admin user password
http_pool_connections   infoblox-dc:<cloud_data_center_id>   100
http_pool_maxsize       infoblox-dc:<cloud_data_center_id>   100
http_request_timeout    infoblox-dc:<cloud_data_center_id>   120

Figure: Nova compute sending notifications to the infoblox-ipam-agent.

9.3.7.5 Limitations

  • There is no IPAM migration path from the non-pluggable to a pluggable IPAM driver (https://bugs.launchpad.net/neutron/+bug/1516156). This means there is no way to reconfigure the Neutron database if you want to change Neutron to use a pluggable IPAM driver. Unless you change the default non-pluggable IPAM configuration to a pluggable driver at install time, you will have no other opportunity to make that change, because reconfiguring SUSE OpenStack Cloud 8 from the default non-pluggable IPAM configuration to a pluggable IPAM driver is not supported.

  • Upgrading from a previous version of SUSE OpenStack Cloud to SUSE OpenStack Cloud 8 in order to use a pluggable IPAM driver is not supported.

  • The Infoblox appliance does not allow for overlapping IPs. For example, only one tenant can have a CIDR of 10.0.0.0/24.

  • The Infoblox IPAM driver fails to create a subnet when no gateway-ip is supplied. For example, the command "neutron subnet-create ... --no-gateway ..." will fail.

9.3.8 Configuring Load Balancing as a Service (LBaaS)

SUSE OpenStack Cloud 8 LBaaS Configuration

Load Balancing as a Service (LBaaS) is an advanced networking service that allows load balancing of multi-node environments. It provides the ability to spread requests across multiple servers thereby reducing the load on any single server. This document describes the installation steps for LBaaS v1 (see prerequisites) and the configuration for LBaaS v1 and v2.

SUSE OpenStack Cloud 8 can support either LBaaS v1 or LBaaS v2 to allow for wide-ranging customer requirements. If you decide to utilize LBaaS v1, it is highly unlikely that you will be able to perform an online upgrade of the service to v2 afterwards, as the internal data structures are significantly different. Should you wish to attempt an upgrade, support will be needed from Sales Engineering and your chosen load balancer partner.

Warning

The LBaaS architecture is based on a driver model to support different load balancers. LBaaS-compatible drivers are provided by load balancer vendors including F5 and Citrix. A new software load balancer driver was introduced in the OpenStack Liberty release called "Octavia". The Octavia driver deploys a software load balancer called HAProxy. Octavia is the default load balancing provider in SUSE OpenStack Cloud 8 for LBaaS V2. Until Octavia is configured the creation of load balancers will fail with an error. Please refer to Book “Installing with Cloud Lifecycle Manager”, Chapter 32 “Configuring Load Balancer as a Service” document for information on installing Octavia.

Warning

Before upgrading to SUSE OpenStack Cloud 8, contact F5 and SUSE to determine which F5 drivers have been certified for use with SUSE OpenStack Cloud. Loading drivers not certified by SUSE may result in failure of your cloud deployment.

LBaaS v2, described in Book “Installing with Cloud Lifecycle Manager”, Chapter 32 “Configuring Load Balancer as a Service”, offers a software load balancing solution that supports both a highly available control plane and data plane. However, if an external hardware load balancer is selected, the cloud operator can achieve additional performance and availability.

LBaaS v1

Reasons to select this version.

  1. You must be able to configure LBaaS via Horizon.

  2. Your hardware load balancer vendor does not currently support LBaaS v2.

Reasons not to select this version.

  1. No active development is being performed on this API in the OpenStack community. (Security fixes are still being worked upon).

  2. It does not allow for multiple ports on the same VIP (for example, to support both port 80 and 443 on a single VIP).

  3. It will never be able to support TLS termination/re-encryption at the load balancer.

  4. It will never be able to support L7 rules for load balancing.

  5. LBaaS v1 will likely become officially deprecated by the OpenStack community at the Tokyo (October 2015) summit.

LBaaS v2

Reasons to select this version.

  1. Your vendor already has a driver that supports LBaaS v2. Many hardware load balancer vendors already support LBaaS v2 and this list is growing all the time.

  2. You intend to script your load balancer creation and management so a UI is not important right now (Horizon support will be added in a future release).

  3. You intend to support TLS termination at the load balancer.

  4. You intend to use the Octavia software load balancer (adding HA and scalability).

  5. You do not want to take your load balancers offline to perform subsequent LBaaS upgrades.

  6. You intend in future releases to need L7 load balancing.

Reasons not to select this version.

  1. Your LBaaS vendor does not have a v2 driver.

  2. You must be able to manage your load balancers from Horizon.

  3. You have legacy software which utilizes the LBaaS v1 API.

LBaaS v1 requires configuration changes prior to installation and is not recommended. LBaaS v2 is installed by default with SUSE OpenStack Cloud and requires minimal configuration to start the service.

Note

Only the LBaaS v2 API currently supports load balancer failover with Octavia. With LBaaS v1, or if Octavia is not deployed, a deleted load balancer will need to be manually recreated. The LBaaS v2 API includes automatic failover of a deployed load balancer with Octavia. More information about this driver can be found in Book “Installing with Cloud Lifecycle Manager”, Chapter 32 “Configuring Load Balancer as a Service”.

9.3.8.1 Prerequisites

SUSE OpenStack Cloud LBaaS v1

Installing LBaaS v1

Important

Using LBaaS v1 in a production environment is not recommended; use LBaaS v2 instead. If you do deploy LBaaS v1, the upgrade to LBaaS v2 is non-trivial and may require the use of professional services.

Note

If you need to run LBaaS v1 instead of the default LBaaS v2, you should make the appropriate preparations during SUSE OpenStack Cloud installation, since LBaaS v2 is the default. If you have chosen to install and use LBaaS v1, you will need to modify the control_plane.yml file and the neutron.conf.j2 file to use version 1.

Before you modify the control_plane.yml file, it is recommended that you back up the original version. Once you have backed it up, modify the control_plane.yml file.

  1. Edit ~/openstack/my_cloud/definition/data/control_plane.yml - depending on your installation the control_plane.yml file might be in a different location.

  2. In the section specifying the compute nodes (resources/compute), replace neutron-lbaasv2-agent with neutron-lbaas-agent; there will only be one occurrence in that file (see the sketch after this list).

  3. Save the modified file.

  4. Follow the steps in Book “Installing with Cloud Lifecycle Manager”, Chapter 11 “Using Git for Configuration Management” to commit and apply the changes.

  5. To test the installation follow the steps outlined in Book “Installing with Cloud Lifecycle Manager”, Chapter 32 “Configuring Load Balancer as a Service” after you have created a suitable subnet, see: Book “Installing with Cloud Lifecycle Manager”, Chapter 28 “UI Verification”, Section 28.4 “Creating an External Network”.
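
A sketch of the relevant resources/compute fragment of control_plane.yml after the change in step 2 (the surrounding entries shown here are illustrative; only the agent line changes):

  resources:
    - name: compute
      resource-prefix: comp
      server-role: COMPUTE-ROLE
      allocation-policy: any
      service-components:
        - ntp-client
        - nova-compute
        # replaced the default neutron-lbaasv2-agent with:
        - neutron-lbaas-agent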

SUSE OpenStack Cloud LBaaS v2

  1. SUSE OpenStack Cloud must be installed for LBaaS v2.

  2. Follow the instructions to install Book “Installing with Cloud Lifecycle Manager”, Chapter 32 “Configuring Load Balancer as a Service”

9.3.9 Load Balancer: Octavia Driver Administration

This document provides the instructions on how to enable and manage various components of the Load Balancer Octavia driver if that driver is enabled.

9.3.9.1 Monasca Alerts

The Monasca-agent has the following Octavia-related plugins:

  • Process checks – checks if octavia processes are running. When it starts, it detects which processes are running and then monitors them.

  • http_connect check – checks if it can connect to octavia api servers.

Alerts are displayed in the Operations Console. For more information see Book “User Guide Overview”, Chapter 1 “Using the Operations Console”, Section 1.1 “Operations Console Overview”.

9.3.9.2 Tuning Octavia Installation

Homogeneous Compute Configuration

Octavia works only with homogeneous compute node configurations. Currently, Octavia does not support multiple Nova flavors. If Octavia needs to be supported on multiple compute nodes, all the compute nodes should carry the same set of physnets (which will be used for Octavia).

Octavia and Floating IPs

Due to a Neutron limitation Octavia will only work with CVR routers. Another option is to use VLAN provider networks which do not require a router.

You cannot currently assign a floating IP address as the VIP (user facing) address for a load balancer created by the Octavia driver if the underlying Neutron network is configured to support Distributed Virtual Router (DVR). The Octavia driver uses a Neutron function known as allowed address pairs to support load balancer fail over.

There is currently a Neutron bug that prevents this function from working in a DVR configuration.

Octavia Configuration Files

The system comes pre-tuned and should not need any adjustments for most customers. If in rare instances manual tuning is needed, follow these steps:

Warning

Changes might be lost during SUSE OpenStack Cloud upgrades.

Edit the Octavia configuration files in my_cloud/config/octavia. It is recommended that any changes be made in all of the Octavia configuration files.

  • octavia-api.conf.j2

  • octavia-health-manager.conf.j2

  • octavia-housekeeping.conf.j2

  • octavia-worker.conf.j2

After the changes are made to the configuration files, redeploy the service.

  1. Commit changes to git.

    ardana > cd ~/openstack
    ardana > git add -A
    ardana > git commit -m "My Octavia Config"
  2. Run the configuration processor and ready deployment.

    ardana > cd ~/openstack/ardana/ansible/
    ardana > ansible-playbook -i hosts/localhost config-processor-run.yml
    ardana > ansible-playbook -i hosts/localhost ready-deployment.yml
  3. Run the Octavia reconfigure.

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts octavia-reconfigure.yml

Spare Pools

The Octavia driver provides support for creating spare pools of the HAProxy software installed in VMs. This means that instead of creating a new load balancer when loads increase, a create-load-balancer call will pull a load balancer from the spare pool. The spare pool feature consumes resources, so the number of load balancers in the spare pool is set to 0 by default, which disables the feature.

Reasons to enable a load balancing spare pool in SUSE OpenStack Cloud

  1. You expect a large number of load balancers to be provisioned all at once (puppet scripts, or ansible scripts) and you want them to come up quickly.

  2. You want to reduce the wait time a customer has while requesting a new load balancer.

To increase the number of load balancers in your spare pool, edit the Octavia configuration files by uncommenting spare_amphora_pool_size and setting it to the number of load balancers you would like to keep in the spare pool.

# Pool size for the spare pool
# spare_amphora_pool_size = 0
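
For example, to keep five load balancers warm in the spare pool (five is an illustrative value), the setting would become:

# Pool size for the spare pool
spare_amphora_pool_size = 5

After editing the configuration files, commit the change and rerun the reconfigure steps shown earlier in this section.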
Important

In SUSE OpenStack Cloud the spare pool cannot be used to speed up fail overs. If a load balancer fails in SUSE OpenStack Cloud, Octavia will always provision a new VM to replace that failed load balancer.

9.3.9.3 Managing Amphora

Octavia starts a separate VM for each load balancing function. These VMs are called amphora.

Updating the Cryptographic Certificates

Octavia uses two-way SSL encryption for communication between amphora and the control plane. Octavia keeps track of the certificates on the amphora and will automatically recycle them. The certificates on the control plane are valid for one year after installation of SUSE OpenStack Cloud.

You can check on the status of the certificate by logging into the controller node as root and running:

ardana > cd /opt/stack/service/octavia-SOME UUID/etc/certs/
ardana > openssl x509 -in client.pem -text -noout

This prints the certificate out where you can check on the expiration dates.
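
If you only need the expiration date, you can ask openssl for just that field:

ardana > openssl x509 -in client.pem -noout -enddate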

To renew the certificates, reconfigure Octavia. Reconfiguring causes Octavia to automatically generate new certificates and deploy them to the controller hosts.

On the Cloud Lifecycle Manager execute octavia-reconfigure:

ardana > cd ~/scratch/ansible/next/ardana/ansible
ardana > ansible-playbook -i hosts/verb_hosts octavia-reconfigure.yml

Accessing VM information in Nova

You can use openstack project list as an administrative user to obtain information about the tenant or project-id of the Octavia project. In the example below, the Octavia project has a project-id of 37fd6e4feac14741b6e75aba14aea833.

ardana > openstack project list
+----------------------------------+------------------+
| ID                               | Name             |
+----------------------------------+------------------+
| 055071d8f25d450ea0b981ca67f7ccee | glance-swift     |
| 37fd6e4feac14741b6e75aba14aea833 | octavia          |
| 4b431ae087ef4bd285bc887da6405b12 | swift-monitor    |
| 8ecf2bb5754646ae97989ba6cba08607 | swift-dispersion |
| b6bd581f8d9a48e18c86008301d40b26 | services         |
| bfcada17189e4bc7b22a9072d663b52d | cinderinternal   |
| c410223059354dd19964063ef7d63eca | monitor          |
| d43bc229f513494189422d88709b7b73 | admin            |
| d5a80541ba324c54aeae58ac3de95f77 | demo             |
| ea6e039d973e4a58bbe42ee08eaf6a7a | backup           |
+----------------------------------+------------------+

You can then use nova list --tenant <project-id> to list the VMs for the Octavia tenant. Take particular note of the IP address on the OCTAVIA-MGMT-NET; in the example below it is 172.30.1.11. For additional nova command-line options see Section 9.3.9.4, “For More Information”.

ardana > nova list --tenant 37fd6e4feac14741b6e75aba14aea833
+--------------------------------------+----------------------------------------------+----------------------------------+--------+------------+-------------+------------------------------------------------+
| ID                                   | Name                                         | Tenant ID                        | Status | Task State | Power State | Networks                                       |
+--------------------------------------+----------------------------------------------+----------------------------------+--------+------------+-------------+------------------------------------------------+
| 1ed8f651-de31-4208-81c5-817363818596 | amphora-1c3a4598-5489-48ea-8b9c-60c821269e4c | 37fd6e4feac14741b6e75aba14aea833 | ACTIVE | -          | Running     | private=10.0.0.4; OCTAVIA-MGMT-NET=172.30.1.11 |
+--------------------------------------+----------------------------------------------+----------------------------------+--------+------------+-------------+------------------------------------------------+
Important

The Amphora VMs do not have SSH or any other access. In the rare case that there is a problem with the underlying load balancer the whole amphora will need to be replaced.

Initiating Failover of an Amphora VM

Under normal operations, Octavia constantly monitors the health of the amphorae and automatically fails them over if there are any issues. This helps to minimize any potential downtime for load balancer users. There are, however, a few cases where a failover needs to be initiated manually:

  1. The load balancer has become unresponsive and Octavia has not detected an error.

  2. A new image has become available and existing load balancers need to start using the new image.

  3. The cryptographic certificates used to control the amphorae and/or the HMAC password used to verify their health information have been compromised. See Controller to Amphorae communications for more information.

To minimize the impact for end users, the existing load balancer is kept working until shortly before the new one has been provisioned. There will be a short interruption to the load balancing service, so keep that in mind when scheduling failovers. To initiate a failover, follow these steps (assuming the management IP from the previous step):

  1. Assign the IP to a SHELL variable for better readability.

    ardana > export MGM_IP=172.30.1.11
  2. Identify the port of the vm on the management network.

    ardana > neutron port-list | grep $MGM_IP
    | 0b0301b9-4ee8-4fb6-a47c-2690594173f4 |                                                   | fa:16:3e:d7:50:92 |
    {"subnet_id": "3e0de487-e255-4fc3-84b8-60e08564c5b7", "ip_address": "172.30.1.11"} |
  3. Disable the port to initiate a failover. Note that the load balancer will still function but can no longer be controlled by Octavia.

    Note

    Any changes made to the load balancer after disabling the port will result in errors.

    ardana > neutron port-update --admin-state-up False 0b0301b9-4ee8-4fb6-a47c-2690594173f4
    Updated port: 0b0301b9-4ee8-4fb6-a47c-2690594173f4
  4. You can check whether the amphora failed over with nova list --tenant <project-id>. This may take some time and in some cases may need to be repeated several times. A changed IP on the management network indicates that the failover has been successful.

    ardana > nova list --tenant 37fd6e4feac14741b6e75aba14aea833
    +--------------------------------------+----------------------------------------------+----------------------------------+--------+------------+-------------+------------------------------------------------+
    | ID                                   | Name                                         | Tenant ID                        | Status | Task State | Power State | Networks                                       |
    +--------------------------------------+----------------------------------------------+----------------------------------+--------+------------+-------------+------------------------------------------------+
    | 1ed8f651-de31-4208-81c5-817363818596 | amphora-1c3a4598-5489-48ea-8b9c-60c821269e4c | 37fd6e4feac14741b6e75aba14aea833 | ACTIVE | -          | Running     | private=10.0.0.4; OCTAVIA-MGMT-NET=172.30.1.12 |
    +--------------------------------------+----------------------------------------------+----------------------------------+--------+------------+-------------+------------------------------------------------+
Warning

Do not issue too many failovers at once. In a large installation you might be tempted to initiate several failovers in parallel, for instance to speed up an update of amphora images. This puts a strain on the Nova service, so depending on the size of your installation you may need to throttle the failover rate.

9.3.9.4 For More Information

For more information on the Nova command-line client, see the OpenStack Compute command-line client guide.

For more information on Octavia terminology, see the OpenStack Octavia Glossary.

9.3.10 Role-based Access Control in Neutron

This topic explains how to achieve more granular access control for your Neutron networks.

Previously in SUSE OpenStack Cloud, a network object was either private to a project or could be used by all projects. If the network's shared attribute was True, then the network could be used by every project in the cloud. If it was False, only the members of the owning project could use it. There was no way for the network to be shared by only a subset of the projects.

Neutron Role Based Access Control (RBAC) solves this problem for networks. Now the network owner can create RBAC policies that give network access to target projects. Members of a targeted project can use the network named in the RBAC policy the same way as if the network were owned by the project. Constraints are described in Section 9.3.10.10, “Limitations”.

With RBAC you can let another tenant use a network that you created, but as the owner of the network, you still need to create the subnet and the router for the network.

9.3.10.1 Creating a Network

ardana > openstack network create demo-net
+---------------------------+--------------------------------------+
| Field                     | Value                                |
+---------------------------+--------------------------------------+
| admin_state_up            | UP                                   |
| availability_zone_hints   |                                      |
| availability_zones        |                                      |
| created_at                | 2018-07-25T17:43:59Z                 |
| description               |                                      |
| dns_domain                |                                      |
| id                        | 9c801954-ec7f-4a65-82f8-e313120aabc4 |
| ipv4_address_scope        | None                                 |
| ipv6_address_scope        | None                                 |
| is_default                | False                                |
| is_vlan_transparent       | None                                 |
| mtu                       | 1450                                 |
| name                      | demo-net                             |
| port_security_enabled     | False                                |
| project_id                | cb67c79e25a84e328326d186bf703e1b     |
| provider:network_type     | vxlan                                |
| provider:physical_network | None                                 |
| provider:segmentation_id  | 1009                                 |
| qos_policy_id             | None                                 |
| revision_number           | 2                                    |
| router:external           | Internal                             |
| segments                  | None                                 |
| shared                    | False                                |
| status                    | ACTIVE                               |
| subnets                   |                                      |
| tags                      |                                      |
| updated_at                | 2018-07-25T17:43:59Z                 |
+---------------------------+--------------------------------------+
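
As noted above, the network owner also needs to create the subnet and a router for the network before sharing it. A minimal sketch, assuming the demo-net created above; the subnet name, CIDR, and router name are illustrative:

ardana > openstack subnet create --network demo-net --subnet-range 10.0.1.0/24 demo-subnet
ardana > openstack router create demo-router
ardana > openstack router add subnet demo-router demo-subnet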

9.3.10.2 Creating an RBAC Policy

Here we will create an RBAC policy where a member of the project called 'demo' shares the network with members of the project 'demo2'.

To create the RBAC policy, run:

ardana > openstack network rbac create  --target-project DEMO2-PROJECT-ID --type network --action access_as_shared demo-net

Here is an example where the DEMO2-PROJECT-ID is 5a582af8b44b422fafcd4545bd2b7eb5:

ardana > openstack network rbac create --target-project 5a582af8b44b422fafcd4545bd2b7eb5 \
  --type network --action access_as_shared demo-net

9.3.10.3 Listing RBACs

To list all the RBAC rules/policies, execute:

ardana > openstack network rbac list
+--------------------------------------+-------------+--------------------------------------+
| ID                                   | Object Type | Object ID                            |
+--------------------------------------+-------------+--------------------------------------+
| 0fdec7f0-9b94-42b4-a4cd-b291d04282c1 | network     | 7cd94877-4276-488d-b682-7328fc85d721 |
+--------------------------------------+-------------+--------------------------------------+

9.3.10.4 Listing the Attributes of an RBAC

To see the attributes of a specific RBAC policy, run:

ardana > openstack network rbac show POLICY-ID

For example:

ardana > openstack network rbac show 0fd89dcb-9809-4a5e-adc1-39dd676cb386

Here is the output:

+---------------+--------------------------------------+
| Field         | Value                                |
+---------------+--------------------------------------+
| action        | access_as_shared                     |
| id            | 0fd89dcb-9809-4a5e-adc1-39dd676cb386 |
| object_id     | c3d55c21-d8c9-4ee5-944b-560b7e0ea33b |
| object_type   | network                              |
| target_tenant | 5a582af8b44b422fafcd4545bd2b7eb5     |
| tenant_id     | 75eb5efae5764682bca2fede6f4d8c6f     |
+---------------+--------------------------------------+

9.3.10.5 Deleting an RBAC Policy

To delete an RBAC policy, run openstack network rbac delete passing the policy id:

ardana > openstack network rbac delete POLICY-ID

For example:

ardana > openstack network rbac delete 0fd89dcb-9809-4a5e-adc1-39dd676cb386

Here is the output:

Deleted rbac_policy: 0fd89dcb-9809-4a5e-adc1-39dd676cb386

9.3.10.6 Sharing a Network with All Tenants

Either the administrator or the network owner can make a network shareable by all tenants. In this example, the administrator makes the tenant network demo-shareall-net accessible by all tenants in the cloud.

To share a network with all tenants:

  1. Get a list of all projects

    ardana > source ~/service.osrc
    ardana > openstack project list

    which produces the list:

    +----------------------------------+------------------+
    | ID                               | Name             |
    +----------------------------------+------------------+
    | 1be57778b61645a7a1c07ca0ac488f9e | demo             |
    | 5346676226274cd2b3e3862c2d5ceadd | admin            |
    | 749a557b2b9c482ca047e8f4abf348cd | swift-monitor    |
    | 8284a83df4df429fb04996c59f9a314b | swift-dispersion |
    | c7a74026ed8d4345a48a3860048dcb39 | demo-sharee      |
    | e771266d937440828372090c4f99a995 | glance-swift     |
    | f43fb69f107b4b109d22431766b85f20 | services         |
    +----------------------------------+------------------+
  2. Get a list of networks:

    ardana > openstack network list

    This produces the following list:

    +--------------------------------------+-------------------+----------------------------------------------------+
    | id                                   | name              | subnets                                            |
    +--------------------------------------+-------------------+----------------------------------------------------+
    | f50f9a63-c048-444d-939d-370cb0af1387 | ext-net           | ef3873db-fc7a-4085-8454-5566fb5578ea 172.31.0.0/16 |
    | 9fb676f5-137e-4646-ac6e-db675a885fd3 | demo-net          | 18fb0b77-fc8b-4f8d-9172-ee47869f92cc 10.0.1.0/24   |
    | 8eada4f7-83cf-40ba-aa8c-5bf7d87cca8e | demo-shareall-net | 2bbc85a9-3ffe-464c-944b-2476c7804877 10.0.250.0/24 |
    | 73f946ee-bd2b-42e9-87e4-87f19edd0682 | demo-share-subset | c088b0ef-f541-42a7-b4b9-6ef3c9921e44 10.0.2.0/24   |
    +--------------------------------------+-------------------+----------------------------------------------------+
  3. Set the network you want to share to a shared value of True:

    ardana > openstack network set --share 8eada4f7-83cf-40ba-aa8c-5bf7d87cca8e

    You should see the following output:

    Updated network: 8eada4f7-83cf-40ba-aa8c-5bf7d87cca8e
  4. Check the attributes of that network by running the following command using the ID of the network in question:

    ardana > openstack network show 8eada4f7-83cf-40ba-aa8c-5bf7d87cca8e

    The output will look like this:

    +---------------------------+--------------------------------------+
    | Field                     | Value                                |
    +---------------------------+--------------------------------------+
    | admin_state_up            | UP                                   |
    | availability_zone_hints   |                                      |
    | availability_zones        |                                      |
    | created_at                | 2018-07-25T17:43:59Z                 |
    | description               |                                      |
    | dns_domain                |                                      |
    | id                        | 8eada4f7-83cf-40ba-aa8c-5bf7d87cca8e |
    | ipv4_address_scope        | None                                 |
    | ipv6_address_scope        | None                                 |
    | is_default                | None                                 |
    | is_vlan_transparent       | None                                 |
    | mtu                       | 1450                                 |
    | name                      | demo-net                             |
    | port_security_enabled     | False                                |
    | project_id                | cb67c79e25a84e328326d186bf703e1b     |
    | provider:network_type     | vxlan                                |
    | provider:physical_network | None                                 |
    | provider:segmentation_id  | 1009                                 |
    | qos_policy_id             | None                                 |
    | revision_number           | 2                                    |
    | router:external           | Internal                             |
    | segments                  | None                                 |
    | shared                    | False                                |
    | status                    | ACTIVE                               |
    | subnets                   |                                      |
    | tags                      |                                      |
    | updated_at                | 2018-07-25T17:43:59Z                 |
    +---------------------------+--------------------------------------+
  5. As the owner of the demo-shareall-net network, view the RBAC attributes for demo-shareall-net (id=8eada4f7-83cf-40ba-aa8c-5bf7d87cca8e) by first getting an RBAC list:

    ardana > echo $OS_USERNAME ; echo $OS_PROJECT_NAME
    demo
    demo
    ardana > openstack network rbac list

    This produces the list:

    +--------------------------------------+--------------------------------------+
    | id                                   | object_id                            |
    +--------------------------------------+--------------------------------------+
    | ...                                                                         |
    | 3e078293-f55d-461c-9a0b-67b5dae321e8 | 8eada4f7-83cf-40ba-aa8c-5bf7d87cca8e |
    +--------------------------------------+--------------------------------------+
  6. View the RBAC information:

    ardana > openstack network rbac show 3e078293-f55d-461c-9a0b-67b5dae321e8
    
    +---------------+--------------------------------------+
    | Field         | Value                                |
    +---------------+--------------------------------------+
    | action        | access_as_shared                     |
    | id            | 3e078293-f55d-461c-9a0b-67b5dae321e8 |
    | object_id     | 8eada4f7-83cf-40ba-aa8c-5bf7d87cca8e |
    | object_type   | network                              |
    | target_tenant | *                                    |
    | tenant_id     | 1be57778b61645a7a1c07ca0ac488f9e     |
    +---------------+--------------------------------------+
  7. With network RBAC, the owner of the network can also make the network shareable by all tenants. First create the network:

    ardana > echo $OS_PROJECT_NAME ; echo $OS_USERNAME
    demo
    demo
    ardana > openstack network create test-net

    The network is created:

    +---------------------------+--------------------------------------+
    | Field                     | Value                                |
    +---------------------------+--------------------------------------+
    | admin_state_up            | UP                                   |
    | availability_zone_hints   |                                      |
    | availability_zones        |                                      |
    | created_at                | 2018-07-25T18:04:25Z                 |
    | description               |                                      |
    | dns_domain                |                                      |
    | id                        | a4bd7c3a-818f-4431-8cdb-fedf7ff40f73 |
    | ipv4_address_scope        | None                                 |
    | ipv6_address_scope        | None                                 |
    | is_default                | False                                |
    | is_vlan_transparent       | None                                 |
    | mtu                       | 1450                                 |
    | name                      | test-net                             |
    | port_security_enabled     | False                                |
    | project_id                | cb67c79e25a84e328326d186bf703e1b     |
    | provider:network_type     | vxlan                                |
    | provider:physical_network | None                                 |
    | provider:segmentation_id  | 1073                                 |
    | qos_policy_id             | None                                 |
    | revision_number           | 2                                    |
    | router:external           | Internal                             |
    | segments                  | None                                 |
    | shared                    | False                                |
    | status                    | ACTIVE                               |
    | subnets                   |                                      |
    | tags                      |                                      |
    | updated_at                | 2018-07-25T18:04:25Z                 |
    +---------------------------+--------------------------------------+
  8. Create the RBAC policy. It is important that the asterisk is surrounded by single quotes to prevent the shell from expanding it to match all files in the current directory.

    ardana > openstack network rbac create --type network \
      --action access_as_shared --target-project '*' test-net

    Here are the resulting RBAC attributes:

    +---------------+--------------------------------------+
    | Field         | Value                                |
    +---------------+--------------------------------------+
    | action        | access_as_shared                     |
    | id            | 0b797cc6-debc-48a1-bf9d-d294b077d0d9 |
    | object_id     | a4bd7c3a-818f-4431-8cdb-fedf7ff40f73 |
    | object_type   | network                              |
    | target_tenant | *                                    |
    | tenant_id     | 1be57778b61645a7a1c07ca0ac488f9e     |
    +---------------+--------------------------------------+

9.3.10.7 Target Project (demo2) View of Networks and Subnets

Note that the owner of the network and subnet is not the tenant named demo2. Both the network and subnet are owned by the tenant demo. Members of demo2 cannot create subnets of the network. They also cannot modify or delete subnets owned by demo.

As the tenant demo2, you can get a list of neutron networks:

ardana > openstack network list
+--------------------------------------+-----------+--------------------------------------------------+
| id                                   | name      | subnets                                          |
+--------------------------------------+-----------+--------------------------------------------------+
| f60f3896-2854-4f20-b03f-584a0dcce7a6 | ext-net   | 50e39973-b2e3-466b-81c9-31f4d83d990b             |
| c3d55c21-d8c9-4ee5-944b-560b7e0ea33b | demo-net  | d9b765da-45eb-4543-be96-1b69a00a2556 10.0.1.0/24 |
   ...
+--------------------------------------+-----------+--------------------------------------------------+

And get a list of subnets:

ardana > openstack subnet list --network c3d55c21-d8c9-4ee5-944b-560b7e0ea33b
+--------------------------------------+---------+--------------------------------------+---------------+
| ID                                   | Name    | Network                              | Subnet        |
+--------------------------------------+---------+--------------------------------------+---------------+
| a806f28b-ad66-47f1-b280-a1caa9beb832 | ext-net | c3d55c21-d8c9-4ee5-944b-560b7e0ea33b | 10.0.1.0/24   |
+--------------------------------------+---------+--------------------------------------+---------------+

To show details of the subnet:

ardana > openstack subnet show d9b765da-45eb-4543-be96-1b69a00a2556
+-------------------+--------------------------------------------+
| Field             | Value                                      |
+-------------------+--------------------------------------------+
| allocation_pools  | {"start": "10.0.1.2", "end": "10.0.1.254"} |
| cidr              | 10.0.1.0/24                                |
| dns_nameservers   |                                            |
| enable_dhcp       | True                                       |
| gateway_ip        | 10.0.1.1                                   |
| host_routes       |                                            |
| id                | d9b765da-45eb-4543-be96-1b69a00a2556       |
| ip_version        | 4                                          |
| ipv6_address_mode |                                            |
| ipv6_ra_mode      |                                            |
| name              | sb-demo-net                                |
| network_id        | c3d55c21-d8c9-4ee5-944b-560b7e0ea33b       |
| subnetpool_id     |                                            |
| tenant_id         | 75eb5efae5764682bca2fede6f4d8c6f           |
+-------------------+--------------------------------------------+

9.3.10.8 Target Project: Creating a Port Using demo-net

The owner of the port is demo2. Members of the network owner project (demo) will not see this port.

Create a port on the shared network by passing the network ID with --network; the port name is arbitrary, demo2-port is used here only as an example:

ardana > openstack port create --network c3d55c21-d8c9-4ee5-944b-560b7e0ea33b demo2-port

This creates a new port:

+-----------------------+-----------------------------------------------------------------------------------------------------+
| Field                 | Value                                                                                               |
+-----------------------+-----------------------------------------------------------------------------------------------------+
| admin_state_up        | True                                                                                                |
| allowed_address_pairs |                                                                                                     |
| binding:vnic_type     | normal                                                                                              |
| device_id             |                                                                                                     |
| device_owner          |                                                                                                     |
| dns_assignment        | {"hostname": "host-10-0-1-10", "ip_address": "10.0.1.10", "fqdn": "host-10-0-1-10.openstacklocal."} |
| dns_name              |                                                                                                     |
| fixed_ips             | {"subnet_id": "d9b765da-45eb-4543-be96-1b69a00a2556", "ip_address": "10.0.1.10"}                    |
| id                    | 03ef2dce-20dc-47e5-9160-942320b4e503                                                                |
| mac_address           | fa:16:3e:27:8d:ca                                                                                   |
| name                  |                                                                                                     |
| network_id            | c3d55c21-d8c9-4ee5-944b-560b7e0ea33b                                                                |
| security_groups       | 275802d0-33cb-4796-9e57-03d8ddd29b94                                                                |
| status                | DOWN                                                                                                |
| tenant_id             | 5a582af8b44b422fafcd4545bd2b7eb5                                                                    |
+-----------------------+-----------------------------------------------------------------------------------------------------+

9.3.10.9 Target Project Booting a VM Using Demo-Net

Here the tenant demo2 boots a VM that uses the demo-net shared network:

ardana > openstack server create --flavor 1 --image $OS_IMAGE --nic net-id=c3d55c21-d8c9-4ee5-944b-560b7e0ea33b demo2-vm-using-demo-net-nic
+--------------------------------------+------------------------------------------------+
| Property                             | Value                                          |
+--------------------------------------+------------------------------------------------+
| OS-EXT-AZ:availability_zone          |                                                |
| OS-EXT-STS:power_state               | 0                                              |
| OS-EXT-STS:task_state                | scheduling                                     |
| OS-EXT-STS:vm_state                  | building                                       |
| OS-SRV-USG:launched_at               | -                                              |
| OS-SRV-USG:terminated_at             | -                                              |
| accessIPv4                           |                                                |
| accessIPv6                           |                                                |
| adminPass                            | sS9uSv9PT79F                                   |
| config_drive                         |                                                |
| created                              | 2016-01-04T19:23:24Z                           |
| flavor                               | m1.tiny (1)                                    |
| hostId                               |                                                |
| id                                   | 3a4dc44a-027b-45e9-acf8-054a7c2dca2a           |
| image                                | cirros-0.3.3-x86_64 (6ae23432-8636-4e...1efc5) |
| key_name                             | -                                              |
| metadata                             | {}                                             |
| name                                 | demo2-vm-using-demo-net-nic                    |
| os-extended-volumes:volumes_attached | []                                             |
| progress                             | 0                                              |
| security_groups                      | default                                        |
| status                               | BUILD                                          |
| tenant_id                            | 5a582af8b44b422fafcd4545bd2b7eb5               |
| updated                              | 2016-01-04T19:23:24Z                           |
| user_id                              | a0e6427b036344fdb47162987cb0cee5               |
+--------------------------------------+------------------------------------------------+

Run openstack server list:

ardana > openstack server list

See the VM running:

+-------------------+-----------------------------+--------+------------+-------------+--------------------+
| ID                | Name                        | Status | Task State | Power State | Networks           |
+-------------------+-----------------------------+--------+------------+-------------+--------------------+
| 3a4dc...a7c2dca2a | demo2-vm-using-demo-net-nic | ACTIVE | -          | Running     | demo-net=10.0.1.11 |
+-------------------+-----------------------------+--------+------------+-------------+--------------------+

List the ports attached to the VM:

ardana > neutron port-list --device-id 3a4dc44a-027b-45e9-acf8-054a7c2dca2a

The output shows the port:

+---------------------+------+-------------------+-------------------------------------------------------------------+
| id                  | name | mac_address       | fixed_ips                                                         |
+---------------------+------+-------------------+-------------------------------------------------------------------+
| 7d14ef8b-9...80348f |      | fa:16:3e:75:32:8e | {"subnet_id": "d9b765da-45...00a2556", "ip_address": "10.0.1.11"} |
+---------------------+------+-------------------+-------------------------------------------------------------------+

Run openstack port show:

ardana > openstack port show 7d14ef8b-9d48-4310-8c02-00c74d80348f
+-----------------------+-----------------------------------------------------------------------------------------------------+
| Field                 | Value                                                                                               |
+-----------------------+-----------------------------------------------------------------------------------------------------+
| admin_state_up        | True                                                                                                |
| allowed_address_pairs |                                                                                                     |
| binding:vnic_type     | normal                                                                                              |
| device_id             | 3a4dc44a-027b-45e9-acf8-054a7c2dca2a                                                                |
| device_owner          | compute:None                                                                                        |
| dns_assignment        | {"hostname": "host-10-0-1-11", "ip_address": "10.0.1.11", "fqdn": "host-10-0-1-11.openstacklocal."} |
| dns_name              |                                                                                                     |
| extra_dhcp_opts       |                                                                                                     |
| fixed_ips             | {"subnet_id": "d9b765da-45eb-4543-be96-1b69a00a2556", "ip_address": "10.0.1.11"}                    |
| id                    | 7d14ef8b-9d48-4310-8c02-00c74d80348f                                                                |
| mac_address           | fa:16:3e:75:32:8e                                                                                   |
| name                  |                                                                                                     |
| network_id            | c3d55c21-d8c9-4ee5-944b-560b7e0ea33b                                                                |
| security_groups       | 275802d0-33cb-4796-9e57-03d8ddd29b94                                                                |
| status                | ACTIVE                                                                                              |
| tenant_id             | 5a582af8b44b422fafcd4545bd2b7eb5                                                                    |
+-----------------------+-----------------------------------------------------------------------------------------------------+

9.3.10.10 Limitations

Note the following limitations of RBAC in Neutron.

  • Neutron network is the only supported RBAC Neutron object type.

  • The "access_as_external" action is not supported – even though it is listed as a valid action by python-neutronclient.

  • The neutron-api server will not accept action value of 'access_as_external'. The access_as_external definition is not found in the specs.

  • The target project users cannot create, modify, or delete subnets on networks that have RBAC policies.

  • The subnet of a network that has an RBAC policy cannot be added as an interface of a target tenant's router. For example, the command neutron router-interface-add tgt-tenant-router <sb-demo-net uuid> will error out.

  • The security group rules on the network owner do not apply to other projects that can use the network.

  • A user in the target project can boot VMs with a VNIC attached to the shared network. The user of the target project can assign a floating IP (FIP) to the VM. The target project must have security group rules that allow SSH and/or ICMP for VM connectivity.

  • Neutron RBAC creation and management are currently not supported in Horizon. For now, the Neutron CLI has to be used to manage RBAC rules.

  • An RBAC rule tells Neutron whether a tenant can access a network (Allow). Currently there is no DENY action.

  • Port creation on a shared network fails if --fixed-ip is specified in the neutron port-create command.

9.3.11 Configuring Maximum Transmission Units in Neutron

This topic explains how you can configure MTUs, what to look out for, and the results and implications of changing the default MTU settings. It is important to note that every network within a network group will have the same MTU.

Warning

An MTU change will not affect existing networks that have had VMs created on them. It will only take effect on new networks created after the reconfiguration process.

9.3.11.1 Overview

A Maximum Transmission Unit, or MTU, is the maximum packet size (in bytes) that a network device can handle or is configured to handle. There are a number of places in your cloud where MTU configuration is relevant: the physical interfaces managed and configured by SUSE OpenStack Cloud, the virtual interfaces created by Neutron and Nova for Neutron networking, and the interfaces inside the VMs.

SUSE OpenStack Cloud-managed physical interfaces

SUSE OpenStack Cloud-managed physical interfaces include the physical interfaces and the bonds, bridges, and VLANs created on top of them. The MTU for these interfaces is configured via the 'mtu' property of a network group. Because multiple network groups can be mapped to one physical interface, there may have to be some resolution of differing MTUs between the untagged and tagged VLANs on the same physical interface. For instance, if one untagged VLAN, vlan101 (with an MTU of 1500) and a tagged VLAN vlan201 (with an MTU of 9000) are both on one interface (eth0), this means that eth0 can handle 1500, but the VLAN interface which is created on top of eth0 (that is, vlan201@eth0) wants 9000. However, vlan201 cannot have a higher MTU than eth0, so vlan201 will be limited to 1500 when it is brought up, and fragmentation will result.

In general, a VLAN interface MTU must be lower than or equal to the base device MTU. If they are different, as in the case above, the MTU of eth0 can be overridden and raised to 9000, but in any case the discrepancy will have to be reconciled.
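
To see the MTU that an interface and its VLAN subinterface actually came up with, you can inspect them with ip link; the interface names below follow the example above and are illustrative:

tux > ip link show dev eth0
tux > ip link show dev vlan201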

Neutron/Nova interfaces

Neutron/Nova interfaces include the virtual devices created by Neutron and Nova during the normal process of realizing a Neutron network/router and booting a VM on it (qr-*, qg-*, tap-*, qvo-*, qvb-*, etc.). There is currently no support in Neutron/Nova for per-network MTUs in which every interface along the path for a particular Neutron network has the correct MTU for that network. There is, however, support for globally changing the MTU of devices created by Neutron/Nova (see network_device_mtu below). This means that if you want to enable jumbo frames for any set of VMs, you will have to enable it for all your VMs. You cannot just enable them for a particular Neutron network.

VM interfaces

VMs typically get their MTU via DHCP advertisement, which means that the dnsmasq processes spawned by the neutron-dhcp-agent advertise a particular MTU to the VMs. In SUSE OpenStack Cloud 8, the DHCP server advertises a 1400 MTU to all VMs via a forced setting in dnsmasq-neutron.conf. This is suboptimal for every network type (VXLAN, flat, VLAN, and so on), but it does prevent fragmentation of a VM's packets due to encapsulation.
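
The forced setting referred to here is the following line in dnsmasq-neutron.conf (26 is the DHCP option number for the interface MTU); it is removed as part of the jumbo frame procedure below:

dhcp-option-force=26,1400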

For instance, if you set the new *-mtu configuration options to a default of 1500 and create a VXLAN network, it will be given an MTU of 1450 (with the remaining 50 bytes used by the VXLAN encapsulation header) and will advertise a 1450 MTU to any VM booted on that network. If you create a provider VLAN network, it will have an MTU of 1500 and will advertise 1500 to booted VMs on the network. It should be noted that this default starting point for MTU calculation and advertisement is also global, meaning you cannot have an MTU of 8950 on one VXLAN network and 1450 on another. However, you can have provider physical networks with different MTUs by using the physical_network_mtus config option, but Nova still requires a global MTU option for the interfaces it creates, thus you cannot really take advantage of that configuration option.

9.3.11.2 Network settings in the input model

MTU can be set as an attribute of a network group in network_groups.yml. Note that this applies only to KVM. That setting means that every network in the network group will be assigned the specified MTU. The MTU value must be set individually for each network group. For example:

network-groups:
  - name: GUEST
    mtu: 9000
    ...

  - name: EXTERNAL-API
    mtu: 9000
    ...

  - name: EXTERNAL-VM
    mtu: 9000
    ...

        - name: EXTERNAL-VM
        mtu: 9000
        ...

9.3.11.3 Infrastructure support for jumbo frames

If you want to use jumbo frames, or frames with an MTU of 9000 or more, the physical switches and routers that make up the infrastructure of the SUSE OpenStack Cloud installation must be configured to support them. To realize the advantages, generally all devices in the same broadcast domain must have the same MTU.

If you want to configure jumbo frames on compute and controller nodes, then all switches joining the compute and controller nodes must have jumbo frames enabled. Similarly, the "infrastructure gateway" through which the external VM network flows, commonly known as the default route for the external VM VLAN, must also have the same MTU configured.

Generally, anything in the same VLAN or the same IP subnet can be considered to be in the same broadcast domain.

9.3.11.4 Enabling end-to-end jumbo frames for a VM

  1. Add an 'mtu' attribute to all the network groups in your model. Note that adding the MTU for the network groups will only affect the configuration for physical network interfaces.

    To add the mtu attribute, find the YAML file that contains your network-groups entry. We will assume it is network_groups.yml, unless you have changed it. Whatever the file is named, it will be found in ~/openstack/my_cloud/definition/data/.

    To edit these files, begin by checking out the site branch on the Cloud Lifecycle Manager node. You may already be on that branch. If so, you will remain there.

    ardana > cd ~/openstack/ardana/ansible
    ardana > git checkout site

    Then begin editing the files. In network_groups.yml, add mtu: 9000

    network-groups:
      - name: GUEST
        hostname-suffix: guest
        mtu: 9000
        tags:
          - neutron.networks.vxlan

    This sets the MTU of the physical interface managed by SUSE OpenStack Cloud 8 that has the GUEST network group tag assigned to it. The interface assignment can be found in the interfaces_set.yml file under the interface-models section.

  2. Next, edit the layer 2 agent config file, ml2_conf.ini.j2, found in ~/openstack/my_cloud/config/neutron/, to set path_mtu to 0; this ensures that global_physnet_mtu is used.

    [ml2]
    ...
    path_mtu = 0
  3. Next, edit neutron.conf.j2 found in ~/openstack/my_cloud/config/neutron/ to set advertise_mtu (to true) and global_physnet_mtu to 9000 under [DEFAULT]:

    [DEFAULT]
    ...
    advertise_mtu = True
    global_physnet_mtu = 9000

    This allows Neutron to advertise the optimal MTU to instances (based upon global_physnet_mtu minus the encapsulation size).

  4. Next, remove the "dhcp-option-force=26,1400" line from ~/openstack/my_cloud/config/neutron/dnsmasq-neutron.conf.j2.

  5. Commit your changes

    ardana > git add -A
    ardana > git commit -m "your commit message goes here in quotes"
  6. If SUSE OpenStack Cloud has not been deployed yet, perform the normal deployment and skip to step 8.

  7. Assuming it has been deployed already, continue here:

    Run the configuration processor:

    ardana > ansible-playbook -i hosts/localhost config-processor-run.yml

    and ready the deployment:

    ardana > ansible-playbook -i hosts/localhost ready-deployment.yml

    Then run the network_interface-reconfigure.yml playbook, changing directories first:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts network_interface-reconfigure.yml

    Then run neutron-reconfigure.yml:

    ardana > ansible-playbook -i hosts/verb_hosts neutron-reconfigure.yml

    Then nova-reconfigure.yml:

    ardana > ansible-playbook -i hosts/verb_hosts nova-reconfigure.yml

    Note: adding or changing network-group MTU settings will likely require a network restart during network_interface-reconfigure.yml.

  8. Follow the normal process for creating a Neutron network and booting a VM or two. In this example, if a VXLAN network is created and a VM is booted on it, the VM will have an MTU of 8950, with the remaining 50 bytes used by the VXLAN encapsulation header.

  9. Test and verify that the VM can send and receive jumbo frames without fragmentation. You can use ping. For example, to test an MTU of 9000 using VXLAN:

    ardana > ping -M do -s 8950 YOUR_VM_FLOATING_IP

    Substitute your actual floating IP address for YOUR_VM_FLOATING_IP.

9.3.11.5 Enabling Optimal MTU Advertisement Feature

To enable the optimal MTU feature, follow these steps:

  1. Edit ~/openstack/my_cloud/config/neutron/neutron.conf.j2 to remove the advertise_mtu variable under [DEFAULT]:

    [DEFAULT]
    ...
    advertise_mtu = False #remove this
  2. Remove the dhcp-option-force=26,1400 line from ~/openstack/my_cloud/config/neutron/dnsmasq-neutron.conf.j2

  3. If SUSE OpenStack Cloud has already been deployed, follow the remaining steps, otherwise follow the normal deployment procedures.

  4. Commit your changes

    ardana > git add -A
    ardana > git commit -m "your commit message goes here in quotes"
  5. Run the configuration processor:

    ardana > ansible-playbook -i hosts/localhost config-processor-run.yml
  6. Run ready deployment:

    ardana > ansible-playbook -i hosts/localhost ready-deployment.yml
  7. Run the network_interface-reconfigure.yml playbook, changing directories first:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts network_interface-reconfigure.yml
  8. Run neutron-reconfigure.yml:

    ardana > ansible-playbook -i hosts/verb_hosts neutron-reconfigure.yml
Important

If you are upgrading an existing deployment, take care to avoid creating an MTU mismatch between the network interfaces of preexisting VMs and those of VMs created after the upgrade. If there is an MTU mismatch, the new VMs (whose interfaces have an MTU of 1500 minus the underlay protocol overhead) will not have L2 connectivity with preexisting VMs (which have an MTU of 1400 due to dhcp-option-force).

9.3.12 Improve Network Performance with Isolated Metadata Settings

In SUSE OpenStack Cloud, Neutron currently sets enable_isolated_metadata = True by default in dhcp_agent.ini because several services require isolated networks (Neutron networks without a router). This has the effect of spawning a neutron-ns-metadata-proxy process on one of the controller nodes for every active Neutron network.

In environments that create many Neutron networks, these extra neutron-ns-metadata-proxy processes can quickly eat up a lot of memory on the controllers, which does not scale well.
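
To gauge the impact on a controller, you can count the metadata proxy processes currently running there; this is a generic check, not specific to SUSE OpenStack Cloud:

tux > ps -ef | grep [n]eutron-ns-metadata-proxy | wc -l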

For deployments that do not require isolated metadata (that is, they do not require the Platform Services and will always create networks with an attached router), you can set enable_isolated_metadata = False in dhcp_agent.ini to reduce Neutron memory usage on controllers, allowing a greater number of active Neutron networks.

Note that the dhcp_agent.ini.j2 template is found in ~/openstack/my_cloud/config/neutron on the Cloud Lifecycle Manager node. At install time, the edit can be made there and the standard deployment run. In an already deployed cloud, run the Neutron reconfiguration procedure outlined here:

  1. First check out the site branch:

    ardana > cd ~/openstack/my_cloud/config/neutron
    ardana > git checkout site
  2. Edit the dhcp_agent.ini.j2 file to change the enable_isolated_metadata = {{ neutron_enable_isolated_metadata }} line in the [DEFAULT] section to read:

    enable_isolated_metadata = False
  3. Commit the file:

    ardana > git add -A
    ardana > git commit -m "your commit message goes here in quotes"
  4. Run the ready-deployment.yml playbook from ~/openstack/ardana/ansible:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost ready-deployment.yml
  5. Then run the neutron-reconfigure.yml playbook, changing directories first:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts neutron-reconfigure.yml

9.3.13 Moving from DVR deployments to non_DVR

If you have an older deployment of SUSE OpenStack Cloud which is using DVR as a default and you are attempting to move to non_DVR, make sure you follow these steps:

  1. Remove all your existing DVR routers and their workloads. Make sure to remove interfaces, floating IPs, and gateways, if applicable.

    neutron router-interface-delete ROUTER-NAME SUBNET-NAME/SUBNET-ID
    neutron floatingip-disassociate FLOATINGIP-ID PRIVATE-PORT-ID
    neutron router-gateway-clear ROUTER-NAME
  2. Then delete the router.

    neutron router-delete ROUTER-NAME
  3. Before you create any non_DVR router, make sure that no l3-agents or metadata-agents are running on any compute host. You can run neutron agent-list to check whether any neutron-l3-agent is running on a compute host in your deployment.

    You must disable neutron-l3-agent and neutron-metadata-agent on every compute host by running the following commands:

    ardana > neutron agent-list
    +--------------------------------------+----------------------+--------------------------+-------------------+-------+----------------+---------------------------+
    | id                                   | agent_type           | host                     | availability_zone | alive | admin_state_up | binary                    |
    +--------------------------------------+----------------------+--------------------------+-------------------+-------+----------------+---------------------------+
    | 208f6aea-3d45-4b89-bf42-f45a51b05f29 | Loadbalancerv2 agent | ardana-cp1-comp0001-mgmt |                   | :-)   | True           | neutron-lbaasv2-agent     |
    | 810f0ae7-63aa-4ee3-952d-69837b4b2fe4 | L3 agent             | ardana-cp1-comp0001-mgmt | nova              | :-)   | True           | neutron-l3-agent          |
    | 89ac17ba-2f43-428a-98fa-b3698646543d | Metadata agent       | ardana-cp1-comp0001-mgmt |                   | :-)   | True           | neutron-metadata-agent    |
    | f602edce-1d2a-4c8a-ba56-fa41103d4e17 | Open vSwitch agent   | ardana-cp1-comp0001-mgmt |                   | :-)   | True           | neutron-openvswitch-agent |
    ...
    +--------------------------------------+----------------------+--------------------------+-------------------+-------+----------------+---------------------------+
    
    ardana > neutron agent-update 810f0ae7-63aa-4ee3-952d-69837b4b2fe4 --admin-state-down
    Updated agent: 810f0ae7-63aa-4ee3-952d-69837b4b2fe4

    ardana > neutron agent-update 89ac17ba-2f43-428a-98fa-b3698646543d --admin-state-down
    Updated agent: 89ac17ba-2f43-428a-98fa-b3698646543d
    Note

    Only L3 and Metadata agents were disabled.

  4. Once L3 and metadata neutron agents are stopped, follow steps 1 through 7 in the document Book “Planning an Installation with Cloud Lifecycle Manager”, Chapter 12 “Alternative Configurations”, Section 12.2 “Configuring SUSE OpenStack Cloud without DVR” and then run the neutron-reconfigure.yml playbook:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts neutron-reconfigure.yml

9.3.14 OVS-DPDK Support

SUSE OpenStack Cloud uses a version of Open vSwitch (OVS) that is built with the Data Plane Development Kit (DPDK) and includes a QEMU hypervisor which supports vhost-user.

The OVS-DPDK package modifies the OVS fast path, which is normally executed in kernel space, allowing it to run in userspace so there is no context switch to the kernel for processing network packets.

The EAL component of DPDK supports mapping the Network Interface Card (NIC) registers directly into userspace. DPDK provides a Poll Mode Driver (PMD) that can access the NIC hardware from userspace and uses polling instead of interrupts to avoid user-to-kernel transitions.

The PMD maps the shared address space of the VM that is provided by the vhost-user capability of QEMU. The vhost-user mode causes Neutron to create a Unix domain socket that allows communication between the PMD and QEMU. The PMD uses this in order to acquire the file descriptors to the pre-allocated VM memory. This allows the PMD to directly access the VM memory space and perform a fast zero-copy of network packets directly into and out of the VMs virtio_net vring.

This yields performance improvements in the time it takes to process network packets.
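
You can confirm that a running VM on a DPDK compute node is wired through vhost-user by inspecting its QEMU command line; this is a generic check and the exact arguments will vary by deployment:

tux > ps -ef | grep [q]emu | grep -o 'type=vhost-user[^ ]*'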

9.3.14.1 Usage considerations

The target of a DPDK-enabled Open vSwitch is VM performance, and because VMs run only on compute nodes, the following considerations are specific to compute nodes.

  1. You are required to configure hugepages (see Section 9.3.14.3, “Configuring Hugepages for DPDK in Neutron Networks”) in order to use DPDK with VMs. The memory to be used must be allocated at boot time, so you must know beforehand how many VMs will be scheduled on a node. Also, for NUMA considerations, you want those hugepages on the same NUMA node as the NIC. A VM maps its entire address space into hugepages.

  2. For maximum performance you must reserve logical cores for DPDK Poll Mode Driver (PMD) usage and for hypervisor (QEMU) usage. This keeps the Linux kernel from scheduling processes on those cores. The PMD threads will run at 100% CPU utilization because they poll the hardware instead of using interrupts. There will be at least two cores dedicated to PMD threads. Each VM will have a core dedicated to it, although for lower performance VMs can share cores.

  3. VMs can use the virtio_net or the virtio_pmd drivers. There is also a PMD for an emulated e1000.

  4. Only VMs that use hugepages can be successfully launched on a DPDK-enabled NIC. If there is a need to support both DPDK-based and non-DPDK-based VMs, an additional port managed by the Linux kernel must exist.

  5. OVS/DPDK does not support jumbo frames. Please review https://github.com/openvswitch/ovs/blob/branch-2.5/INSTALL.DPDK.md#restrictions for restrictions.

  6. The Open vSwitch firewall driver in the networking-ovs-dpdk repository is stateless, not a stateful driver that would use iptables and conntrack. In the past, the Neutron core team has declined to pull in stateless firewall drivers (https://bugs.launchpad.net/neutron/+bug/1531205). The native firewall driver is stateful, which is why conntrack was added to Open vSwitch, but this is not supported on DPDK and will not be until OVS 2.6.

9.3.14.2 For more information

See the sections that follow for more information.

9.3.14.3 Configuring Hugepages for DPDK in Neutron Networks

To take advantage of DPDK and its network performance enhancements, enable hugepages first.

With hugepages, physical RAM is reserved at boot time and dedicated to a virtual machine. Only that virtual machine and Open vSwitch can use this specifically allocated RAM. The host OS cannot access it. This memory is contiguous, and because of its larger size, reduces the number of entries in the memory map and number of times it must be read.

The hugepage reservation is made in /etc/default/grub, but this is handled by the Cloud Lifecycle Manager.

In addition to hugepages, CPU isolation is required to use DPDK. This is achieved with the 'isolcpus' kernel parameter in /etc/default/grub, which is also managed by the Cloud Lifecycle Manager using a new input model file.
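
After a compute node has been deployed and rebooted, you can confirm that the hugepage reservation and CPU isolation took effect with generic Linux checks (not specific to the Cloud Lifecycle Manager):

tux > cat /proc/cmdline
tux > grep -i huge /proc/meminfo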

The two new input model files introduced with this release to help you configure the necessary settings and persist them are:

  • memory_models.yml (for hugepages)

  • cpu_models.yml (for CPU isolation)

9.3.14.3.1 memory_models.yml

In this file you set your hugepage size along with the number of hugepages to allocate.

 ---
  product:
    version: 2

  memory-models:
    - name: COMPUTE-MEMORY-NUMA
      default-huge-page-size: 1G
      huge-pages:
        - size: 1G
          count: 24
          numa-node: 0
        - size: 1G
          count: 24
          numa-node: 1
        - size: 1G
          count: 48
9.3.14.3.2 cpu_models.yml
---
  product:
    version: 2

  cpu-models:

    - name: COMPUTE-CPU
      assignments:
       - components:
           - nova-compute-kvm
         cpu:
           - processor-ids: 3-5,12-17
             role: vm

       - components:
           - openvswitch
         cpu:
           - processor-ids: 0
             role: eal
           - processor-ids: 1-2
             role: pmd
9.3.14.3.3 NUMA memory allocation

As mentioned above, the memory used for hugepages is locked down at boot time by an entry in /etc/default/grub. As an admin, you can specify in the input model how to arrange this memory on NUMA nodes. It can be spread across NUMA nodes or you can specify where you want it. For example, if you have only one NIC, you would probably want all the hugepages memory to be on the NUMA node closest to that NIC.

If you do not specify the numa-node settings in the memory_models.yml input model file and use only the last entry indicating "size: 1G" and "count: 48" then this memory is spread evenly across all NUMA nodes.

Also note that the hugepage service runs once at boot time and then goes to an inactive state so you should not expect to see it running. If you decide to make changes to the NUMA memory allocation, you will need to reboot the compute node for the changes to take effect.
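
To see how the 1G hugepages were actually distributed across NUMA nodes after the reboot, you can read sysfs on the compute node; this is a standard Linux interface, not specific to SUSE OpenStack Cloud:

tux > cat /sys/devices/system/node/node*/hugepages/hugepages-1048576kB/nr_hugepages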

9.3.14.4 DPDK Setup for Neutron Networking

9.3.14.4.1 Hardware requirements
  • Intel-based compute node. DPDK is not available on AMD-based systems.

  • The following BIOS settings must be enabled for DL360 Gen9:

    1. Virtualization Technology

    2. Intel(R) VT-d

    3. PCI-PT (Also see Section 9.3.15.13, “Enabling PCI-PT on HPE DL360 Gen 9 Servers”)

  • Adequate host memory is needed to allow for hugepages. The examples below use 1G hugepages for the VMs.

9.3.14.4.2 Limitations
  • DPDK is supported on SLES only.

  • Applies to SUSE OpenStack Cloud 8 only.

  • The tenant network can be untagged VLAN or untagged VXLAN.

  • DPDK port names must be of the form 'dpdk<portid>', where the port ID is sequential and starts at 0.

  • There is no support for converting DPDK ports to non-DPDK ports without rebooting the compute node.

  • There is no security group support; userspace conntrack would be needed.

  • No jumbo frame support.

9.3.14.4.3 Setup instructions

These setup instructions and example model are for a three-host system: one controller with the Cloud Lifecycle Manager in the cloud control plane and two compute hosts.

  1. After the initial run of site.yml, all compute nodes must be rebooted to pick up the changes in grub for hugepages and isolcpus.

  2. Changes to non-uniform memory access (NUMA) memory, isolcpus, or network devices must be followed by a reboot of the compute nodes.

  3. Run sudo reboot to pick up the libvirt change and the hugepage/isolcpus grub changes:

    tux > sudo reboot
  4. Use the bash script below to configure Nova aggregates, Neutron networks, a new flavor, and so on; it will then spin up two VMs.

VM spin-up instructions

Before running the spin-up script you need a copy of the cirros image on your Cloud Lifecycle Manager node. You can manually scp a copy of the cirros image to the system, or copy it locally with wget like so:

ardana > wget http://download.cirros-cloud.net/0.3.4/cirros-0.3.4-x86_64-disk.img

Save the following shell script in the home directory and run it. This should spin up two VMs, one on each compute node.

Warning

Make sure to change all network-specific information in the script to match your environment.

#!/usr/bin/env bash

source service.osrc

######## register glance image
glance image-create --name='cirros' --container-format=bare --disk-format=qcow2 < ~/cirros-0.3.4-x86_64-disk.img

####### create nova aggregate and flavor for dpdk

MI_NAME=dpdk

nova aggregate-create $MI_NAME nova
nova aggregate-add-host $MI_NAME openstack-cp-comp0001-mgmt
nova aggregate-add-host $MI_NAME openstack-cp-comp0002-mgmt
nova aggregate-set-metadata $MI_NAME pinned=true

nova flavor-create $MI_NAME 6 1024 20 1
nova flavor-key $MI_NAME set hw:cpu_policy=dedicated
nova flavor-key $MI_NAME set aggregate_instance_extra_specs:pinned=true
nova flavor-key $MI_NAME set hw:mem_page_size=1048576

######## sec groups NOTE: no sec groups supported on DPDK.  This is in case we do non-DPDK compute hosts.
nova secgroup-add-rule default tcp 22 22 0.0.0.0/0
nova secgroup-add-rule default icmp -1 -1 0.0.0.0/0

########  nova keys
nova keypair-add mykey >mykey.pem
chmod 400 mykey.pem

######## create neutron external network
neutron net-create ext-net --router:external --os-endpoint-type internalURL
neutron subnet-create ext-net 10.231.0.0/19 --gateway_ip=10.231.0.1  --ip-version=4 --disable-dhcp  --allocation-pool start=10.231.17.0,end=10.231.17.255

########  neutron network
neutron net-create mynet1
neutron subnet-create mynet1 10.1.1.0/24 --name mysubnet1
neutron router-create myrouter1
neutron router-interface-add myrouter1 mysubnet1
neutron router-gateway-set myrouter1 ext-net
export MYNET=$(neutron net-list|grep mynet|awk '{print $2}')

######## spin up 2 VMs, 1 on each compute
nova boot --image cirros --nic net-id=${MYNET} --key-name mykey --flavor dpdk --availability-zone nova:openstack-cp-comp0001-mgmt vm1
nova boot --image cirros --nic net-id=${MYNET} --key-name mykey --flavor dpdk --availability-zone nova:openstack-cp-comp0002-mgmt vm2

######## create floating ip and attach to instance
export MYFIP1=$(nova floating-ip-create|grep ext-net|awk '{print $4}')
nova add-floating-ip vm1 ${MYFIP1}

export MYFIP2=$(nova floating-ip-create|grep ext-net|awk '{print $4}')
nova add-floating-ip vm2 ${MYFIP2}

nova list

9.3.14.5 DPDK Configurations

9.3.14.5.1 Base configuration

The following is specific to DL360 Gen9 and BIOS configuration as detailed in Section 9.3.14.4, “DPDK Setup for Neutron Networking”.

  • EAL cores - 1, isolate: False in cpu-models

  • PMD cores - 1 per NIC port

  • Hugepages - 1G per PMD thread

  • Memory channels - 4

  • Global rx queues - based on needs

9.3.14.5.2 Performance considerations common to all NIC types

Compute host core frequency

Host CPUs should be running at maximum performance. The following script sets that; note that in this case there are 24 cores, so modify the range to fit your environment. For an HP DL360 Gen9, the BIOS should be configured to use "OS Control Mode", which can be found on the iLO Power Settings page.

for i in `seq 0 23`; do echo "performance" > /sys/devices/system/cpu/cpu$i/cpufreq/scaling_governor; done

IO non-posted prefetch

The DL360 Gen9 should have the IO non-posted prefetch disabled. Experimental evidence shows this yields an additional 6-8% performance boost.

9.3.14.5.3 Multiqueue configuration

In order to use multiqueue, a property must be set on the Glance image and a setting must be applied inside the resulting VM. In this example we create a 4 vCPU flavor for DPDK using 1G hugepages.

MI_NAME=dpdk

nova aggregate-create $MI_NAME nova
nova aggregate-add-host $MI_NAME openstack-cp-comp0001-mgmt
nova aggregate-add-host $MI_NAME openstack-cp-comp0002-mgmt
nova aggregate-set-metadata $MI_NAME pinned=true

nova flavor-create $MI_NAME 6 1024 20 4
nova flavor-key $MI_NAME set hw:cpu_policy=dedicated
nova flavor-key $MI_NAME set aggregate_instance_extra_specs:pinned=true
nova flavor-key $MI_NAME set hw:mem_page_size=1048576

Then set the hw_vif_multiqueue_enabled property on the Glance image:

ardana > openstack image set --property hw_vif_multiqueue_enabled=true IMAGE-UUID

Once the VM is booted using the flavor above, set the number of combined rx and tx queues inside the VM equal to the number of vCPUs:

tux > sudo ethtool -L eth0 combined 4

On the hypervisor you can verify that multiqueue has been properly set by looking at the qemu process

-netdev type=vhost-user,id=hostnet0,chardev=charnet0,queues=4 -device virtio-net-pci,mq=on,vectors=10,

Here you can see that 'mq=on' and vectors=10. The formula for vectors is 2*num_queues+2
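
For example, a hypothetical one-liner to pull those values out of the running qemu command line (filter further by VM name or UUID if more than one qemu process is running):

tux > sudo ps -ef | grep qemu | grep -o -e 'mq=on' -e 'vectors=[0-9]*'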

9.3.14.6 Troubleshooting DPDK

9.3.14.6.1 Hardware configuration

Because there are several variations of hardware, it is up to you to verify that the hardware is configured properly.

  • Only Intel based compute nodes are supported. There is no DPDK available for AMD-based CPUs.

  • PCI-PT must be enabled for the NIC that will be used with DPDK.

  • When using Intel Niantic and the igb_uio driver, the VT-d must be enabled in the BIOS.

  • For DL360 Gen9 systems, the BIOS shared-memory feature must be disabled; see Section 9.3.15.13, “Enabling PCI-PT on HPE DL360 Gen 9 Servers”.

  • Adequate memory must be available for hugepage usage; see Section 9.3.14.3, “Configuring Hugepages for DPDK in Neutron Networks”.

  • Hyper-threading can be enabled but is not required for base functionality.

  • Determine the PCI slot that the DPDK NIC(s) are installed in so that you can identify the associated NUMA node (see the example after this list).

  • Only the Intel Haswell, Broadwell, and Skylake microarchitectures are supported. Intel Sandy Bridge is not supported.
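
For example, a minimal sketch for finding the NUMA node associated with a NIC's PCI slot; the bus address below is a placeholder, so use the address reported by lspci for your DPDK NIC:

# List Ethernet devices with their full PCI addresses
tux > lspci -D | grep -i ethernet
# Show the NUMA node the device is attached to (replace the address)
tux > cat /sys/bus/pci/devices/0000:04:00.0/numa_node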

9.3.14.6.2 System configuration
  • Only SLES12-SP3 compute nodes are supported.

  • If a NIC port is used with PCI-PT, SRIOV-only, or PCI-PT+SRIOV, then it cannot be used with DPDK; they are mutually exclusive. This is because DPDK depends on an OvS bridge, which does not exist if you use any combination of PCI-PT and SRIOV. You can use DPDK, SRIOV-only, and PCI-PT on different interfaces of the same server.

  • There is an association between the PCI slot for the NIC and a NUMA node. Make sure to use logical CPU cores that are on the NUMA node associated to the NIC. Use the following to determine which CPUs are on which NUMA node.

    ardana > lscpu
    
    Architecture:          x86_64
    CPU op-mode(s):        32-bit, 64-bit
    Byte Order:            Little Endian
    CPU(s):                48
    On-line CPU(s) list:   0-47
    Thread(s) per core:    2
    Core(s) per socket:    12
    Socket(s):             2
    NUMA node(s):          2
    Vendor ID:             GenuineIntel
    CPU family:            6
    Model:                 63
    Model name:            Intel(R) Xeon(R) CPU E5-2650L v3 @ 1.80GHz
    Stepping:              2
    CPU MHz:               1200.000
    CPU max MHz:           1800.0000
    CPU min MHz:           1200.0000
    BogoMIPS:              3597.06
    Virtualization:        VT-x
    L1d cache:             32K
    L1i cache:             32K
    L2 cache:              256K
    L3 cache:              30720K
    NUMA node0 CPU(s):     0-11,24-35
    NUMA node1 CPU(s):     12-23,36-47
9.3.14.6.3 Input model configuration
  • If you do not specify a driver for a DPDK device, the igb_uio driver is selected by default.

  • DPDK devices must be named dpdk<port-id> where the port-id starts at 0 and increments sequentially.

  • Tenant networks supported are untagged VXLAN and VLAN.

  • Jumbo Frames MTU does not work with DPDK OvS. An upstream patch is expected in OvS 2.6, but it cannot be backported due to the changes it relies upon.

  • Sample VXLAN model

  • Sample VLAN model

9.3.14.6.4 Reboot requirements

A reboot of a compute node must be performed in the following cases:

  1. After the initial site.yml play on a new OpenStack environment

  2. Changes to an existing OpenStack environment that modify the /etc/default/grub file, such as

    • hugepage allocations

    • CPU isolation

    • iommu changes

  3. Changes to a NIC port usage type, such as

    • moving from DPDK to any combination of PCI-PT and SRIOV

    • moving from DPDK to kernel based eth driver

9.3.14.6.5 Software configuration

The input model is processed by the Configuration Processor, which eventually results in changes to the OS. There are several files that should be checked to verify the proper settings were applied. In addition, after the initial site.yml play is run, all compute nodes must be rebooted in order to pick up changes to the /etc/default/grub file for hugepage reservation, CPU isolation, and iommu settings.

Kernel settings

Check /etc/default/grub for the following (a quick check is sketched after this list):

  1. hugepages

  2. CPU isolation

  3. that iommu is in passthru mode if the igb_uio driver is in use
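
A minimal sketch of such a check, assuming the default file locations; the exact hugepage, isolcpus, and iommu values depend on your input model:

# Kernel options configured for the next boot
tux > grep GRUB_CMDLINE_LINUX /etc/default/grub
# Kernel options the node actually booted with
tux > cat /proc/cmdline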

Open vSwitch settings

Check /etc/default/openvswitch-switch for:

  1. using the --dpdk option

  2. core 0 set aside for EAL and kernel to share

  3. cores assigned to PMD drivers, at least two for each DPDK device

  4. verify that memory is reserved with socket-mem option

  5. Once VNETCORE-2509 merges, also verify that the umask is 022 and the group is libvirt-qemu.

DPDK settings

  1. Check /etc/dpdk/interfaces for the correct DPDK devices (see the example below).
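
    A minimal check, assuming the default file location:

    tux > cat /etc/dpdk/interfaces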

9.3.14.6.6 DPDK runtime

All non-bonded DPDK devices will be added to individual OvS bridges. The bridges will be named br-dpdk0, br-dpdk1, etc. The name of the OvS bridge for bonded DPDK devices will be br-dpdkbond0, br-dpdkbond1, etc.
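
For example, to confirm that the expected bridges exist (output will vary with your configuration):

tux > sudo ovs-vsctl list-br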

  1. Since each PMD thread is in a polling loop, it will use 100% of the CPU. Thus for two PMDs you would expect to see the ovs-vswitchd process running at 200%. This can be verified by running

    ardana > top
    
    top - 16:45:42 up 4 days, 22:24,  1 user,  load average: 2.03, 2.10, 2.14
    Tasks: 384 total,   2 running, 382 sleeping,   0 stopped,   0 zombie
    %Cpu(s):  9.0 us,  0.2 sy,  0.0 ni, 90.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    KiB Mem:  13171580+total, 10356851+used, 28147296 free,   257196 buffers
    KiB Swap:        0 total,        0 used,        0 free.  1085868 cached Mem
    
      PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
     1522 root      10 -10 6475196 287780  10192 S 200.4  0.2  14250:20 ovs-vswitchd
  2. Verify that ovs-vswitchd is running with the --dpdk option:

    ps -ef | grep ovs-vswitchd
  3. PMD thread(s) are started when a DPDK port is added to an OvS bridge. Verify the port is on the bridge.

    tux > sudo ovs-vsctl show
  4. A DPDK port cannot be added to an OvS bridge unless it is bound to a driver. Verify that the DPDK port is bound.

    tux > sudo dpdk_nic_bind -s
  5. Verify that the proper number of hugepages is on the correct NUMA node

    tux > sudo virsh freepages --all

    or

    tux > sudo grep -R "" /sys/kernel/mm/hugepages/ /proc/sys/vm/*huge*
  6. Verify that the VM and the DPDK PMD threads have both mapped the same hugepage(s)

    # this will yield 2 process ids, use the 2nd one
    tux > sudo ps -ef | grep ovs-vswitchd
    tux > sudo ls -l /proc/PROCESS-ID/fd | grep huge
    
    # if running more than 1 VM you will need to figure out which one to use
    tux > sudo ps -ef | grep qemu
    tux > sudo ls -l /proc/PROCESS-ID/fd | grep huge
9.3.14.6.7 Errors

VM does not get fixed IP

  1. The DPDK Poll Mode Driver (PMD) communicates with the VM by directly accessing the VM's hugepages. If a VM is not created using hugepages (see Section 9.3.14.3, “Configuring Hugepages for DPDK in Neutron Networks”), there is no way for DPDK to communicate with the VM and the VM will never be connected to the network.

  2. It has been observed that DPDK communication with the VM fails if shared-memory is not disabled in the BIOS on DL360 Gen9 systems.

Vestiges of non-existent DPDK devices

  1. Incorrect input models that do not use the correct DPDK device name, or that do not use sequential port IDs starting at 0, may leave entries for non-existent devices in the OvS database. While this does not affect proper functionality, it may be confusing.

Startup issues

  1. Running the following will help diagnose startup issues with ovs-vswitchd:

    tux > sudo journalctl -u openvswitch.service --all

9.3.15 SR-IOV and PCI Passthrough Support

SUSE OpenStack Cloud supports both single-root I/O virtualization (SR-IOV) and PCI passthrough (PCIPT). Both technologies provide better network performance by improving network I/O, decreasing latency, and reducing processor overhead.

9.3.15.1 SR-IOV

A PCI-SIG Single Root I/O Virtualization and Sharing (SR-IOV) Ethernet interface is a physical PCI Ethernet NIC that implements hardware-based virtualization mechanisms to expose multiple virtual network interfaces that can be used by one or more virtual machines simultaneously. With SR-IOV based NICs, the traditional virtual bridge is no longer required. Each SR-IOV port is associated with a virtual function (VF).

When compared with a PCI passthrough Ethernet interface, an SR-IOV Ethernet interface:

  • Provides benefits similar to those of a PCI passthrough Ethernet interface, including lower latency packet processing.

  • Scales up more easily in a virtualized environment by providing multiple VFs that can be attached to multiple virtual machine interfaces.

  • Shares the same limitations, including the lack of support for LAG, QoS, ACL, and live migration.

  • Has the same requirements regarding the VLAN configuration of the access switches.

The process for configuring SR-IOV includes creating a VLAN provider network and subnet, then attaching VMs to that network.

9.3.15.2 PCI passthrough Ethernet interfaces

A passthrough Ethernet interface is a physical PCI Ethernet NIC on a compute node to which a virtual machine is granted direct access. PCI passthrough allows a VM to have direct access to the hardware without being brokered by the hypervisor. This minimizes packet processing delays but at the same time demands special operational considerations. For all purposes, a PCI passthrough interface behaves as if it were physically attached to the virtual machine. Therefore any potential throughput limitations coming from the virtualized environment, such as the ones introduced by internal copying of data buffers, are eliminated. However, by bypassing the virtualized environment, the use of PCI passthrough Ethernet devices introduces several restrictions that must be taken into consideration. They include:

  • no support for LAG, QoS, ACL, or host interface monitoring

  • no support for live migration

  • no access to the compute node's OVS switch

A passthrough interface bypasses the compute node's OVS switch completely, and is attached instead directly to the provider network's access switch. Therefore, proper routing of traffic to connect the passthrough interface to a particular tenant network depends entirely on the VLAN tagging options configured on both the passthrough interface and the access port on the switch (TOR).

The access switch routes incoming traffic based on a VLAN ID, which ultimately determines the tenant network to which the traffic belongs. The VLAN ID is either explicit, as found in incoming tagged packets, or implicit, as defined by the access port's default VLAN ID when the incoming packets are untagged. In both cases the access switch must be configured to process the proper VLAN ID, which therefore has to be known in advance.

9.3.15.3 Supported Intel 82599 Devices

Table 9.1: Intel 82599 devices supported with SRIOV and PCIPT
Vendor              Device   Title
Intel Corporation   10f8     82599 10 Gigabit Dual Port Backplane Connection
Intel Corporation   10f9     82599 10 Gigabit Dual Port Network Connection
Intel Corporation   10fb     82599ES 10-Gigabit SFI/SFP+ Network Connection
Intel Corporation   10fc     82599 10 Gigabit Dual Port Network Connection

9.3.15.4 SRIOV PCIPT configuration

If you plan to take advantage of SR-IOV support in SUSE OpenStack Cloud you will need to plan in advance to meet the following requirements:

  1. Use one of the supported NIC cards:

    • HP Ethernet 10Gb 2-port 560FLR-SFP+ Adapter (Intel Niantic). Product part number: 665243-B21 -- Same part number for the following card options:

      • FlexLOM card

      • PCI slot adapter card

  2. Identify the NIC ports to be used for PCI Passthrough devices and SRIOV devices from each compute node

  3. Ensure that:

    • SRIOV is enabled in the BIOS

    • HP Shared memory is disabled in the BIOS on the compute nodes.

    • The Intel boot agent is disabled on the compute nodes (Section 9.3.15.10, “Intel bootutils” can be used to do this)

    Note
    Note

    Because of Intel driver limitations, you cannot use a NIC port as an SRIOV NIC as well as a physical NIC. Using the physical function to carry the normal tenant traffic through the OVS bridge at the same time as assigning the VFs from the same NIC device as passthrough to the guest VM is not supported.

If the above prerequisites are met, then SR-IOV or PCIPT can be reconfigured at any time. There is no need to do it at install time.

9.3.15.5 Deployment use cases

The following are typical use cases that should cover your particular needs:

  1. A device on the host needs to be enabled for both PCI-passthrough and PCI-SRIOV during deployment. At run time, Nova decides whether to use physical functions or virtual functions depending on the vnic_type of the port used to boot the VM.

  2. A device on the host needs to be configured only for PCI-passthrough.

  3. A device on the host needs to be configured only for PCI-SRIOV virtual functions.

9.3.15.6 Input model updates

SUSE OpenStack Cloud 8 provides various options for the user to configure the network for tenant VMs. These options have been enhanced to support SRIOV and PCIPT.

The Cloud Lifecycle Manager input model changes to support SRIOV and PCIPT are as follows. If you were familiar with the configuration settings previously, you will notice these changes.

net_interfaces.yml: This file defines the interface details of the nodes. In it, the following fields have been added under the compute node interface section:

sriov_only:

Indicates that only SR-IOV be enabled on the interface. This should be set to true if you want to dedicate the NIC interface to support only SR-IOV functionality.

pci-pt:

When this value is set to true, it indicates that PCIPT should be enabled on the interface.

vf-count:

Indicates the number of VFs to be configured on a given interface.

In control_plane.yml, under the Compute resource, neutron-sriov-nic-agent has been added to the service components:

    resources:
      - name: Compute
        resource-prefix: Comp
        server-role: COMPUTE-ROLE
        allocation-policy: Any
        min-count: 0
        service-components:
          - ntp-client
          - nova-compute
          - nova-compute-kvm
          - neutron-l3-agent
          - neutron-metadata-agent
          - neutron-openvswitch-agent
          - neutron-lbaasv2-agent
          - neutron-sriov-nic-agent

nic_device_data.yml: This is the new file added with this release to support SRIOV and PCIPT configuration details. It contains information about the specifics of a nic, and is found here: ~/openstack/ardana/services/osconfig/nic_device_data.yml. The fields in this file are as follows.

  1. nic-device-types: The nic-device-types section contains the following key-value pairs:

    name:

    The name of the nic-device-types that will be referenced in nic_mappings.yml

    family:

    The name of the nic-device-families to be used with this nic_device_type

    device_id:

    Device ID as specified by the vendor for the particular NIC

    type:

    The value of this field can be "simple-port" or "multi-port". If a single bus address is assigned to more than one NIC, it is multi-port; if there is a one-to-one mapping between the bus address and the NIC, it is simple-port.

  2. nic-device-families: The nic-device-families section contains the following key-value pairs:

    name:

    The name of the device family that can be used for reference in nic-device-types.

    vendor-id:

    Vendor ID of the NIC

    config-script:

    A script file used to create the virtual functions (VF) on the Compute node.

    driver:

    Indicates the NIC driver that needs to be used.

    vf-count-type:

    This value can be either "port" or "driver".

    “port”:

    Indicates that the device supports per-port virtual function (VF) counts.

    “driver”:

    Indicates that all ports using the same driver will be configured with the same number of VFs, whether or not the interface model specifies a vf-count attribute for the port. If two or more ports specify different vf-count values, the config processor errors out.

    max-vf-count:

    This field indicates the maximum VFs that can be configured on an interface as defined by the vendor.

control_plane.yml: This file provides the information about the services to be run on a particular node. To support SR-IOV on a particular compute node, you must run neutron-sriov-nic-agent on that node.
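
After deployment, a hypothetical way to confirm that the agent was scheduled is to list the Neutron agents and look for the SR-IOV NIC agent entries on the intended compute hosts:

ardana > neutron agent-list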

Mapping the use cases with various fields in input model

Configuration                           vf-count        SR-IOV   PCIPT   OVS bridge   Can be NIC bonded   Use case
sriov-only: true                        Mandatory       Yes      No      No           No                  Dedicated to SRIOV
pci-pt: true                            Not specified   No       Yes     No           No                  Dedicated to PCI-PT
pci-pt: true                            Specified       Yes      Yes     No           No                  PCI-PT or SRIOV
pci-pt and sriov-only not specified     Specified       Yes      No      Yes          No                  SRIOV with PF used by host
pci-pt and sriov-only not specified     Not specified   No       No      Yes          Yes                 Traditional/usual use case

9.3.15.7 Mappings between nic_mappings.yml and net_interfaces.yml

The following diagram shows which fields in nic_mappings.yml map to corresponding fields in net_interfaces.yml:

9.3.15.8 Example Use Cases for Intel

  1. nic-device-types and nic-device-families for Intel 82599 with ixgbe as the driver.

    nic-device-types:
        - name: '8086:10fb'
          family: INTEL-82599
          device-id: '10fb'
          type: simple-port
    nic-device-families:
        # Niantic
        - name: INTEL-82599
          vendor-id: '8086'
          config-script: intel-82599.sh
          driver: ixgbe
          vf-count-type: port
          max-vf-count: 63
  2. net_interfaces.yml for the SRIOV-only use case:

    - name: COMPUTE-INTERFACES
       - name: hed1
         device:
           name: hed1
           sriov-only: true
           vf-count: 6
         network-groups:
          - GUEST1
  3. net_interfaces.yml for the PCIPT-only use case:

    - name: COMPUTE-INTERFACES
       - name: hed1
         device:
           name: hed1
           pci-pt: true
         network-groups:
          - GUEST1
  4. net_interfaces.yml for the SRIOV and PCIPT use case

     - name: COMPUTE-INTERFACES
        - name: hed1
          device:
            name: hed1
            pci-pt: true
            vf-count: 6
          network-groups:
          - GUEST1
  5. net_interfaces.yml for SRIOV and Normal Virtio use case

    - name: COMPUTE-INTERFACES
       - name: hed1
         device:
           name: hed1
           vf-count: 6
         network-groups:
          - GUEST1
  6. net_interfaces.yml for PCI-PT (hed1 and hed4 refer to the DUAL ports of the PCI-PT NIC)

        - name: COMPUTE-PCI-INTERFACES
          network-interfaces:
          - name: hed3
            device:
              name: hed3
            network-groups:
              - MANAGEMENT
              - EXTERNAL-VM
            forced-network-groups:
              - EXTERNAL-API
          - name: hed1
            device:
              name: hed1
              pci-pt: true
            network-groups:
              - GUEST
          - name: hed4
            device:
              name: hed4
              pci-pt: true
            network-groups:
              - GUEST

9.3.15.9 Launching Virtual Machines

Provisioning a VM with an SR-IOV NIC is a two-step process.

  1. Create a Neutron port with vnic_type = direct.

    ardana > neutron port-create $net_id --name sriov_port --binding:vnic_type direct
  2. Boot a VM with the created port-id (an optional binding check is sketched after this procedure).

    ardana > nova boot --flavor m1.large --image ubuntu_14.04 --nic port-id=$port_id test-sriov
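
Optionally, as a hypothetical sanity check, you can confirm how Neutron bound the port (these fields are visible to admin users; output depends on your driver):

ardana > neutron port-show sriov_port -F binding:vnic_type -F binding:vif_type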

Provisioning a VM with a PCI-PT NIC is a two-step process.

  1. Create two Neutron ports with vnic_type = direct-physical.

    ardana > neutron port-create net1 --name pci-port1 --vnic_type=direct-physical
    neutron port-create net1 --name pci-port2  --vnic_type=direct-physical
  2. Boot a VM with the created ports.

    ardana > nova boot --flavor 4 --image opensuse --nic port-id=pci-port1-port-id \
    --nic port-id=pci-port2-port-id vm1-pci-passthrough

If PCI-PT VM gets stuck (hangs) at boot time when using an Intel NIC, the boot agent should be disabled.

9.3.15.10 Intel bootutils

When Intel cards are used for PCI-PT, a tenant VM can get stuck at boot time. When this happens, you should download the Intel bootutils and use them to disable the boot agent.

  1. Download Preboot.tar.gz from https://downloadcenter.intel.com/download/19186/Intel-Ethernet-Connections-Boot-Utility-Preboot-Images-and-EFI-Drivers

  2. Untar the Preboot.tar.gz on the compute node where the PCI-PT VM is to be hosted.

  3. Go to ~/APPS/BootUtil/Linux_x64

    cd ~/APPS/BootUtil/Linux_x64

    and run following command

    ./bootutil64e -BOOTENABLE disable -all
  4. Boot the PCI-PT VM and it should boot without getting stuck.

    Note
    Note

    Even though the VM console shows the VM getting stuck at PXE boot, this is not related to the BIOS PXE settings.

9.3.15.11 Making input model changes and implementing PCI PT and SR-IOV

To implement the configuration you require, log in to the Cloud Lifecycle Manager node and update the Cloud Lifecycle Manager model files to enable SR-IOV or PCIPT, following the relevant use case explained above. You will need to edit

  • net_interfaces.yml

  • nic_device_data.yml

  • control_plane.yml

To make the edits,

  1. Check out the site branch of the local git repository and change to the correct directory:

    ardana > git checkout site
    ardana > cd ~/openstack/my_cloud/definition/data/
  2. Open each file in vim or another editor and make the necessary changes. Save each file, then commit to the local git repository:

    ardana > git add -A
    ardana > git commit -m "your commit message goes here in quotes"
  3. Here you will have the Cloud Lifecycle Manager enable your changes by running the necessary playbooks:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost config-processor-run.yml
    ardana > ansible-playbook -i hosts/localhost ready-deployment.yml
    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts site.yml
Note
Note

After running the site.yml playbook above, you must reboot the compute nodes that are configured with Intel PCI devices.

Note
Note

When a VM is running on an SRIOV port on a given compute node, reconfiguration is not supported.

You can set the number of virtual functions that must be enabled on a compute node at install time. You can update the number of virtual functions after deployment. If any VMs have been spawned before you change the number of virtual functions, those VMs may lose connectivity. Therefore, it is always recommended that if any virtual function is used by any tenant VM, you should not reconfigure the virtual functions. Instead, you should delete/migrate all the VMs on that NIC before reconfiguring the number of virtual functions.
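
For example, a hypothetical pre-check before changing the VF count on a node (the host name below is an example from the models in this guide):

ardana > nova hypervisor-servers openstack-cp-comp0001-mgmt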

9.3.15.12 Limitations

  • Security groups are not applicable for PCI-PT and SRIOV ports.

  • Live migration is not supported for VMs with PCI-PT and SRIOV ports.

  • Rate limiting (QoS) is not applicable on SRIOV and PCI-PT ports.

  • SRIOV/PCIPT is not supported for VxLAN network.

  • DVR is not supported with SRIOV/PCIPT.

  • For Intel cards, the same NIC cannot be used for both SRIOV and normal VM boot.

  • Current upstream OpenStack code does not support hot plugging of an SRIOV/PCIPT interface using the nova attach_interface command. See https://review.openstack.org/#/c/139910/ for more information.

  • Neutron port-update does not work when the admin state is down.

  • On SLES compute nodes with dual-port PCI-PT NICs, both ports must always be passed into the VM. It is not possible to split the dual port and pass through just a single port.

9.3.15.13 Enabling PCI-PT on HPE DL360 Gen 9 Servers

The HPE DL360 Gen 9 and HPE ProLiant systems with Intel processors use a region of system memory for sideband communication of management information. The BIOS sets up Reserved Memory Region Reporting (RMRR) to report these memory regions and devices to the operating system. There is a conflict between the Linux kernel and RMRR which causes problems with PCI passthrough (PCI-PT), which is needed for IOMMU use by DPDK. Note that this does not affect SR-IOV.

In order to enable PCI-PT on the HPE DL360 Gen 9 you must have a version of firmware that supports setting this and you must change a BIOS setting.

To begin, get the latest firmware and install it on your compute nodes.

Once the firmware has been updated:

  1. Reboot the server and press F9 (system utilities) during POST (power on self test)

  2. Choose System Configuration

  3. Select the NIC for which you want to enable PCI-PT

  4. Choose Device Level Configuration

  5. Disable the shared memory feature in the BIOS.

  6. Save the changes and reboot server

9.3.16 Installing the L2 Gateway Agent for the Networking Service

The L2 gateway is a service plug-in to the Neutron networking service that allows two L2 networks to be seamlessly connected to create a single L2 broadcast domain. The initial implementation provides for the ability to connect a virtual Neutron VxLAN network to a physical VLAN using a VTEP-capable HPE 5930 switch. The L2 gateway is to be enabled only for VxLAN deployments.

To begin L2 gateway agent setup, you need to configure your switch. These instructions use an HPE FlexFabric 5930 Switch Series switch.

9.3.16.1 Sample network topology (for illustration purposes)

When viewing the following network diagram, assume that the blue VNET has been created by the tenant and has been assigned a segmentation ID of 1000 (VNI 1000). The Cloud Admin is now connecting physical servers to this VNET.

Assume also that the blue L2 Gateway is created and points to the HW VTEPs, the physical ports, the VLAN, and whether it is an access or trunk port (tagged or untagged).

Note
Note

This example does not apply to distributed route networks, which use Distributed Virtual Routers (DVR).

9.3.16.2 Networks

The following diagram illustrates an example network configuration. It does not apply to distributed route networks, which use Distributed Virtual Routers (DVR).

L2 Gateway and 5930 Switch
Figure 9.1: L2 Gateway and 5930 Switch

An L2 gateway is useful in extending virtual networks (VxLAN) in a cloud onto physical VLAN networks. The L2 gateway switch converts VxLAN packets into VLAN packets and back again, as shown in the following diagram. This topic assumes a VxLAN deployment.

  • Management Network: 10.10.85.0/24

  • Data Network: 10.1.1.0/24

  • Tenant VM Network: 10.10.10.0/24

Note
Note

These IP ranges are used in the topology shown in the diagram for illustration only.

9.3.16.3 HPE 5930 switch configuration

  1. Telnet to the 5930 switch and provide your username and password.

  2. Go into system view:

    system-view
  3. Create the required VLANs and VLAN ranges:

    vlan 103
    vlan 83
    vlan 183
    vlan 1261 to 1270
  4. Assign an IP address to VLAN 103. This is used as a data path network for VxLAN traffic.

    interface vlan 103
    ip address 10.1.1.10 255.255.255.0
  5. Assign an IP address to VLAN 83. This is used as a hardware VTEP network.

    interface vlan 83
    ip address 10.10.83.3 255.255.255.0

    The 5930 switch has a fortygigE1/0/5 interface to which a splitter cable is connected that splits the network into four tengigEthernet (tengigEthernet1/0/5:1 to tengigEthernet1/0/5:4) interfaces:

    • tengigEthernet1/0/5:1 and tengigEthernet1/0/5:2 are connected to the compute node. This is required just to bring the interface up. In other words, in order to have the HPE 5930 switch work as a router, there should be at least one interface of that particular VLAN up. Alternatively, the interface can be connected to any host or network element.

    • tengigEthernet1/0/5:3 is connected to a baremetal server.

    • tengigEthernet1/0/5:4 is connected to controller 3, as shown in the figure L2 Gateway and 5930 Switch.

    The switch’s fortygigE1/0/6 interface to which the splitter cable is connected splits it into four tengigEthernet (tengigEthernet1/0/6:1 to tengigEthernet1/0/6:4) interfaces:

    • tengigEthernet1/0/6:1 is connected to controller 2

    • tengigEthernet1/0/6:2 is connected to controller 1

      Note: 6:3 and 6:4 are not used although they are available.

  6. Split the fortygigE 1/0/5 interface into tengig interfaces:

    interface fortygigE 1/0/5
    using tengig
    The interface FortyGigE1/0/5 will be deleted. Continue? [Y/N]: y
  7. Configure the Ten-GigabitEthernet1/0/5:1 interface:

    interface Ten-GigabitEthernet1/0/5:1
    port link-type trunk
    port trunk permit vlan 83

9.3.16.4 Configuring the Provider Data Path Network

  1. Configure the Ten-GigabitEthernet1/0/5:2 interface:

    interface Ten-GigabitEthernet1/0/5:2
    port link-type trunk
    port trunk permit vlan 103
    port trunk permit vlan 1261 to 1270
    port trunk pvid vlan 103
  2. Configure the Ten-GigabitEthernet1/0/5:4 interface:

    interface Ten-GigabitEthernet1/0/5:4
    port link-type trunk
    port trunk permit vlan 103
    port trunk permit vlan 1261 to 1270
    port trunk pvid vlan 103
  3. Configure the Ten-GigabitEthernet1/0/5:3 interface:

    interface Ten-GigabitEthernet1/0/5:3
    port link-type trunk
    port trunk permit vlan 183
    vtep access port
  4. Split the fortygigE 1/0/6 interface into tengig interfaces:

    interface fortygigE 1/0/6
    using tengig
    The interface FortyGigE1/0/6 will be deleted. Continue? [Y/N]: y
  5. Configure the Ten-GigabitEthernet1/0/6:1 interface:

    interface Ten-GigabitEthernet1/0/6:1
    port link-type trunk
    port trunk permit vlan 103
    port trunk permit vlan 1261 to 1270
    port trunk pvid vlan 103
  6. Configure the Ten-GigabitEthernet1/0/6:2 interface:

    interface Ten-GigabitEthernet1/0/6:2
    port link-type trunk
    port trunk permit vlan 103
    port trunk permit vlan 1261 to 1270
    port trunk pvid vlan 103
  7. Enable l2vpn:

    l2vpn enable
  8. Configure a passive TCP connection for OVSDB on port 6632:

    ovsdb server ptcp port 6632
  9. Enable OVSDB server:

    ovsdb server enable
  10. Enable a VTEP process:

    vtep enable
  11. Configure 10.10.83.3 as the VTEP source IP. This acts as a hardware VTEP IP.

    tunnel global source-address 10.10.83.3
  12. Configure the VTEP access port:

    interface Ten-GigabitEthernet1/0/5:3
    vtep access port
  13. Disable VxLAN tunnel mac-learning:

    vxlan tunnel mac-learning disable
  14. Display the current configuration of the 5930 switch and verify the configuration:

    display current-configuration

After switch configuration is complete, you can dump OVSDB to see the entries.

  • Run the ovsdb-client from any Linux machine reachable by the switch:

    ovsdb-client dump --pretty tcp:10.10.85.10:6632
    
    sdn@small-linuxbox:~$ ovsdb-client dump --pretty tcp:10.10.85.10:6632
    Arp_Sources_Local table
    _uuid locator src_mac
    ----- ------- -------
    
    Arp_Sources_Remote table
    _uuid locator src_mac
    ----- ------- -------
    
    Global table
    _uuid                                managers switches
    ------------------------------------ -------- --------------------------------------
    2c891edc-439b-4144-84d9-fa...bf []       [f5f4b43b-40bc-4640-b580-d4...88]
    
    Logical_Binding_Stats table
    _uuid bytes_from_local bytes_to_local packets_from_local packets_to_local
    ----- ---------------- -------------- ------------------ ----------------
    
    Logical_Router table
    _uuid description name static_routes switch_binding
    ----- ----------- ---- ------------- --------------
    
    Logical_Switch table
    _uuid description name tunnel_key
    ----- ----------- ---- ----------
    
    Manager table
    _uuid inactivity_probe is_connected max_backoff other_config status target
    ----- ---------------- ------------ ----------- ------------ ------ ------
    
    Mcast_Macs_Local table
    MAC _uuid ipaddr locator_set logical_switch
    --- ----- ------ ----------- --------------
    
    Mcast_Macs_Remote table
    MAC _uuid ipaddr locator_set logical_switch
    --- ----- ------ ----------- --------------
    
    Physical_Locator table
    _uuid dst_ip encapsulation_type
    ----- ------ ------------------
    
    Physical_Locator_Set table
    _uuid locators
    ----- --------
    
    Physical_Port table
    _uuid      description name                         port_flt_status vlan_bindings vlan_stats
    ---------- ----------- ---------------------------- --------------- ------------- ----------
    fda9...07e ""          "Ten-GigabitEthernet1/0/5:3" [UP]            {}            {}
    
    Physical_Switch table
    _uuid     desc... mgmnt_ips name       ports       sw_flt_status tunnel_ips     tunnels
    --------- ------- --------- ---------- ----------- ------------- -------------- -------
    f5f...688 ""      []        "L2GTWY02" [fda...07e] []            ["10.10.83.3"] []
    
    Tunnel table
    _uuid bfd_config_local bfd_config_remote bfd_params bfd_status local remote
    ----- ---------------- ----------------- ---------- ---------- ----- ------
    
    Ucast_Macs_Local table
    MAC _uuid ipaddr locator logical_switch
    --- ----- ------ ------- --------------
    
    Ucast_Macs_Remote table
    MAC _uuid ipaddr locator logical_switch
    --- ----- ------ ------- --------------

9.3.16.5 Enabling and Configuring the L2 Gateway Agent

  1. Update the input model (in control_plane.yml) to specify where you want to run the neutron-l2gateway-agent. For example, see the neutron-l2gateway-agent entry in the following YAML:

    ---
     product:
     version: 2
     control-planes:
     - name: cp
       region-name: region1
     failure-zones:
     - AZ1
       common-service-components:
         - logging-producer
         - openstack-monasca-agent
         - freezer-agent
         - stunnel
         - lifecycle-manager-target
     clusters:
     - name: cluster1
       cluster-prefix: c1
       server-role: ROLE-CONTROLLER
       member-count: 2
       allocation-policy: strict
     service-components:
    
     ...
     - neutron-l2gateway-agent
     ...
  2. Update l2gateway_agent.ini.j2. For example, here the IP address (10.10.85.10) must be the management IP address of your 5930 switch. Open the file in vi:

    ardana > vi ~/my_cloud/config/neutron/l2gateway_agent.ini.j2
  3. Then make the changes:

    [ovsdb]
    # (StrOpt) OVSDB server tuples in the format
    # <ovsdb_name>:<ip address>:<port>[,<ovsdb_name>:<ip address>:<port>]
    # - ovsdb_name: symbolic name that helps identifies keys and certificate files
    # - ip address: the address or dns name for the ovsdb server
    # - port: the port (ssl is supported)
    ovsdb_hosts = hardware_vtep:10.10.85.10:6632
  4. By default, the L2 gateway agent initiates a connection to OVSDB servers running on the L2 gateway switches. Set the attribute enable_manager to True if you want to change this behavior (to make L2 gateway switches initiate a connection to the L2 gateway agent). In this case, it is assumed that the Manager table in the OVSDB hardware_vtep schema on the L2 gateway switch has been populated with the management IP address of the L2 gateway agent and the port.

    #enable_manager = False
    #connection can be initiated by the ovsdb server.
    #By default 'enable_manager' value is False, turn on the variable to True
    #to initiate the connection from ovsdb server to l2gw agent.
  5. If the port that is configured with enable_manager = True is any port other than 6632, update the 2.0/services/neutron/l2gateway-agent.yml input model file with that port number:

    endpoints:
        - port: '6632'
          roles:
          - ovsdb-server
  6. Note: The following command can be used to set the Manager table on the switch from a remote system:

    sudo vtep-ctl --db=tcp:10.10.85.10:6632 set-manager tcp:10.10.85.130:6632
  7. For SSL communication, the command is:

    sudo vtep-ctl --db=tcp:10.10.85.10:6632 set-manager ssl:10.10.85.130:6632

    where 10.10.85.10 is the management IP address of the L2 gateway switch and 10.10.85.130 is the management IP of the host on which the L2 gateway agent runs.

    Therefore, in the above topology, this command has to be repeated for 10.10.85.131 and 10.10.85.132.

  8. If you are not using SSL, comment out the following:

    #l2_gw_agent_priv_key_base_path={{ neutron_l2gateway_agent_creds_dir }}/keys
    #l2_gw_agent_cert_base_path={{ neutron_l2gateway_agent_creds_dir }}/certs
    #l2_gw_agent_ca_cert_base_path={{ neutron_l2gateway_agent_creds_dir }}/ca_certs
  9. If you are using SSL, then rather than commenting out the attributes, specify the directory path of the private key, the certificate, and the CA cert that the agent should use to communicate with the L2 gateway switch which has the OVSDB server enabled for SSL communication.

    Make sure that the directory containing the files has 755 permissions, and that the files are owned by user root and group root with 644 permissions.

    Private key: The name should be the same as the symbolic name used above in the ovsdb_hosts attribute. The extension of the file should be ".key". With respect to the above example, the filename will be hardware_vtep.key.

    Certificate: The name should be the same as the symbolic name used above in the ovsdb_hosts attribute. The extension of the file should be ".cert". With respect to the above example, the filename will be hardware_vtep.cert.

    CA certificate: The name should be the same as the symbolic name used above in the ovsdb_hosts attribute. The extension of the file should be ".ca_cert". With respect to the above example, the filename will be hardware_vtep.ca_cert.

  10. To enable the HPE 5930 switch for SSL communication, execute the following commands:

    undo ovsdb server ptcp
    undo ovsdb server enable
    ovsdb server ca-certificate flash:/cacert.pem bootstrap
    ovsdb server certificate flash:/sc-cert.pem
    ovsdb server private-key flash:/sc-privkey.pem
    ovsdb server pssl port 6632
    ovsdb server enable
  11. Data from the OVSDB server with SSL can be viewed using the following command:

    ovsdb-client -C <ca-cert.pem> -p <client-private-key.pem> -c <client-cert.pem> \
    dump ssl:10.10.85.10:6632

9.3.16.6 Routing Between Software and Hardware - VTEP Networks

In order to allow L2 gateway switches to send VxLAN packets over the correct tunnels destined for the compute node and controller node VTEPs, you must ensure that the cloud VTEP (compute and controller) IP addresses are in a different network/subnet from that of the L2 gateway switches. You must also create a route between these two networks. This is explained below.

  1. In the following example of the input model file networks.yml, the GUEST-NET represents the cloud data VxLAN network. REMOTE-NET is the network that represents the hardware VTEP network.

    # networks.yml
     networks:
        - name: GUEST-NET
          vlanid: 103
          tagged-vlan: false
          cidr: 10.1.1.0/24
          gateway-ip: 10.1.1.10
          network-group: GUEST
    
        - name: REMOTE-NET
          vlanid: 183
          tagged-vlan: false
          cidr: 10.10.83.0/24
          gateway-ip: 10.10.83.3
          network-group: REMOTE
  2. The route must be configured between the two networks in the network-groups.yml input model file:

    # network_groups.yml
     network-groups:
        - name: REMOTE
          routes:
            - GUEST
    
        - name: GUEST
          hostname-suffix: guest
          tags:
            - neutron.networks.vxlan:
                tenant-vxlan-id-range: "1:5000"
          routes:
            - REMOTE
    Note
    Note

    Note that the IP route is configured on the compute node. Per this route, the HPE 5930 acts as a gateway that routes between the two networks.

  3. On the compute node, it looks like this:

    tux > sudo ip route
    10.10.83.0/24 via 10.1.1.10 dev eth4
  4. Run the following Ansible playbooks to apply the changes.

    config-processor-run.yml:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost config-processor-run.yml

    ready-deployment.yml:

    ardana > ansible-playbook -i hosts/localhost ready-deployment.yml

    ardana-reconfigure.yml:

    ardana > cd ~/scratch/ansible/next/ardana/ansible/
    ardana > ansible-playbook -i hosts/verb_hosts ardana-reconfigure.yml

Notes:

  • Make sure that the controller cluster is able to reach the management IP address of the L2 gateway switch. Otherwise, the L2 gateway agents running on the controllers will not be able to reach the gateway switches.

  • Make sure that the interface on the baremetal server connected to the 5930 switch is tagged (this is explained shortly).

9.3.16.7 Connecting a Bare-Metal Server to the HPE 5930 Switch

Because SUSE Linux Enterprise Server (the bare-metal server) is not aware of tagged packets, it is not possible for a virtual machine to communicate with the bare-metal box without additional configuration.

As the administrator, you must manually perform configuration changes to the interface in the HPE 5930 switch to which the baremetal server is connected so that the switch can send untagged packets. Either one of the following command sets can be used to do so:

Interface INTERFACE NUMBER
service-instance SERVICE-INSTANCE ID
encapsulation untagged
xconnect vsi VSI-NAME

Or:

Interface INTERFACE NUMBER
service-instance SERVICE-INSTANCE ID
encapsulation s-vid VLAN-ID
xconnect vsi VSI-NAME access-mode ethernet

There are two ways of configuring the baremetal server to communicate with virtual machines. If the switch sends tagged traffic, then the baremetal server should be able to receive the tagged traffic.

9.3.16.8 Configuration on a Bare-Metal Server

If the configuration changes mentioned previously are not made on the switch (to send untagged traffic to the bare-metal server), perform the following on the bare-metal server so that it can receive tagged traffic from the switch:

  • Bare-metal management IP is 10.10.85.129 on interface em1

  • Switch 5930 must be connected to baremetal on eth1

  • IP address must be set into tagged interface of eth1

  1. Create a tagged (VLAN 183) interface

    vconfig add eth1 183
  2. Assign the IP address (10.10.10.129) to the eth1.183 tagged interface (an IP from the subnet 10.10.10.0/24, since the VM (10.10.10.4) spawned on the compute node belongs to this subnet). An alternative using the ip command is sketched after these steps.

    ifconfig eth1.183 10.10.10.129/24
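
Alternatively, a minimal sketch using the ip command instead of vconfig/ifconfig, assuming the same interface, VLAN ID, and address as above:

# Create the VLAN 183 interface on eth1, assign the address, and bring it up
tux > sudo ip link add link eth1 name eth1.183 type vlan id 183
tux > sudo ip addr add 10.10.10.129/24 dev eth1.183
tux > sudo ip link set dev eth1.183 up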

9.3.16.9 NIC Bonding and IRF Configuration

With an L2 gateway deployment, NIC bonding can be enabled on compute nodes. For more details on NIC bonding, please refer to Book “Planning an Installation with Cloud Lifecycle Manager”, Chapter 6 “Configuration Objects”, Section 6.11 “Interface Models”. In order to achieve high availability, HPE 5930 switches can be configured to form a cluster using Intelligent Resilient Framework (IRF). Please refer to the HPE FlexFabric 5930 Switch Series configuration guide for details.

9.3.16.10 Scale Numbers Tested

  • Number of neutron port MACs tested on a single switch: 4000

  • Number of HPE 5930 switches tested: 2

  • Number of bare-metal servers connected to a single HPE 5930 switch: 100

  • Number of L2 gateway connections to different networks: 800

9.3.16.11 L2 Gateway Commands

These commands are not part of the L2 gateway deployment. They are to be executed after L2 gateway is deployed.

  1. Create Network

    ardana > neutron net-create net1
  2. Create Subnet

    ardana > neutron subnet-create net1 10.10.10.0/24
  3. Boot a tenant VM (nova boot --image IMAGE-ID --flavor 2 --nic net-id=NET_ID VM_NAME)

    nova boot --image 1f3cd49d-9239-49cf-8736-76bac5360489 --flavor 2 \
      --nic net-id=4f6c58b6-0acc-4e93-bb4c-439b38c27a23 VM

    Assume the VM was assigned the IP address 10.10.10.4

  4. Create the L2 gateway filling in your information here:

    neutron l2-gateway-create \
      --device name="SWITCH_NAME",interface_names="INTERFACE_NAME" \
      GATEWAY-NAME

    For this example:

    ardana > neutron l2-gateway-create \
      --device name="L2GTWY02",interface_names="Ten-GigabitEthernet1/0/5:3" \
      gw1

    Ping from the VM (10.10.10.4) to the bare-metal server, and from the bare-metal server (10.10.10.129) to the VM. The ping should not work, as no gateway connection has been created yet.

  5. Create l2 gateway Connection

    ardana > neutron l2-gateway-connection-create gw1 net1 --segmentation-id 183
  6. Ping from the VM (10.10.10.4) to the bare-metal server, and from the bare-metal server (10.10.10.129) to the VM. The ping should now work.

  7. Delete l2 gateway Connection

    ardana > neutron l2-gateway-connection-delete GATEWAY_ID/GATEWAY_NAME
  8. Ping from the VM (10.10.10.4) to the bare-metal server, and from the bare-metal server (10.10.10.129) to the VM. The ping should not work, as the L2 gateway connection was deleted.

9.3.17 Setting up VLAN-Aware VMs

Creating a VM with a trunk port will allow a VM to gain connectivity to one or more networks over the same virtual NIC (vNIC) through the use of VLAN interfaces in the guest VM. Connectivity to different networks can be added and removed dynamically through the use of subports. The network of the parent port will be presented to the VM as the untagged VLAN, and the networks of the child ports will be presented to the VM as the tagged VLANs (the VIDs of which can be chosen arbitrarily as long as they are unique to that trunk). The VM will send/receive VLAN-tagged traffic over the subports, and Neutron will mux/demux the traffic onto the subport's corresponding network. This is not to be confused with Section 9.3.18, “Enabling VLAN Transparent Networks”, in which a VM can pass VLAN-tagged traffic transparently across the network without interference from Neutron.

9.3.17.1 Terminology

  • Trunk: a resource that logically represents a trunked vNIC and references a parent port.

  • Parent port: a Neutron port that a Trunk is referenced to. Its network is presented as the untagged VLAN.

  • Subport: a resource that logically represents a tagged VLAN port on a Trunk. A Subport references a child port and consists of the <port>,<segmentation-type>,<segmentation-id> tuple. Currently only the 'vlan' segmentation type is supported.

  • Child port: a Neutron port that a Subport is referenced to. Its network is presented as a tagged VLAN based upon the segmentation-id used when creating/adding a Subport.

  • Legacy VM: a VM that does not use a trunk port.

  • Legacy port: a Neutron port that is not used in a Trunk.

  • VLAN-aware VM: a VM that uses at least one trunk port.

9.3.17.2 Trunk CLI reference

Command                 Action
network trunk create    Create a trunk.
network trunk delete    Delete a given trunk.
network trunk list      List all trunks.
network trunk show      Show information of a given trunk.
network trunk set       Add subports to a given trunk.
network subport list    List all subports for a given trunk.
network trunk unset     Remove subports from a given trunk.
network trunk set       Update trunk properties.

9.3.17.3 Enabling VLAN-aware VM capability

  1. Edit ~/openstack/my_cloud/config/neutron/neutron.conf.j2 to add the "trunk" service_plugin:

    service_plugins = {{ neutron_service_plugins }},trunk
  2. Edit ~/openstack/my_cloud/config/neutron/ml2_conf.ini.j2 to enable the noop firewall driver:

    [securitygroup]
    firewall_driver = neutron.agent.firewall.NoopFirewallDriver
    Note
    Note

    This is a manual configuration step because it must be made apparent that this step disables Neutron security groups completely. The default SUSE OpenStack Cloud firewall_driver is neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver, which does not implement security groups for trunk ports. Optionally, the SUSE OpenStack Cloud default firewall_driver may still be used (that is, skip this step), which would provide security groups for legacy VMs but not for VLAN-aware VMs. However, this mixed environment is not recommended. For more information, see Section 9.3.17.6, “Firewall issues”.

  3. Commit the configuration changes:

    ardana > git add -A
    ardana > git commit -m "Enable vlan-aware VMs"
    ardana > cd ~/openstack/ardana/ansible/
  4. If this is an initial deployment, continue the rest of normal deployment process:

    ardana > ansible-playbook -i hosts/localhost config-processor-run.yml
    ardana > ansible-playbook -i hosts/localhost ready-deployment.yml
    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts site.yml
  5. If the cloud has already been deployed and this is a reconfiguration:

    ardana > ansible-playbook -i hosts/localhost ready-deployment.yml
    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts neutron-reconfigure.yml

9.3.17.4 Use Cases

Creating a trunk port

Assume that a number of Neutron networks/subnets already exist: private, foo-net, and bar-net. This will create a trunk with two subports allocated to it. The parent port will be on the "private" network, while the two child ports will be on "foo-net" and "bar-net", respectively:

  1. Create a port that will function as the trunk's parent port:

    ardana > neutron port-create --name trunkparent private
  2. Create ports that will function as the child ports to be used in subports:

    ardana > neutron port-create --name subport1 foo-net
    ardana > neutron port-create --name subport2 bar-net
  3. Create a trunk port using the openstack network trunk create command, passing the parent port created in step 1 and child ports created in step 2:

    ardana > openstack network trunk create --parent-port trunkparent --subport port=subport1,segmentation-type=vlan,segmentation-id=1 --subport port=subport2,segmentation-type=vlan,segmentation-id=2 mytrunk
    +-----------------+-----------------------------------------------------------------------------------------------+
    | Field           | Value                                                                                         |
    +-----------------+-----------------------------------------------------------------------------------------------+
    | admin_state_up  | UP                                                                                            |
    | created_at      | 2017-06-02T21:49:59Z                                                                          |
    | description     |                                                                                               |
    | id              | bd822ebd-33d5-423e-8731-dfe16dcebac2                                                          |
    | name            | mytrunk                                                                                       |
    | port_id         | 239f8807-be2e-4732-9de6-c64519f46358                                                          |
    | project_id      | f51610e1ac8941a9a0d08940f11ed9b9                                                              |
    | revision_number | 1                                                                                             |
    | status          | DOWN                                                                                          |
    | sub_ports       | port_id='9d25abcf-d8a4-4272-9436-75735d2d39dc', segmentation_id='1', segmentation_type='vlan' |
    |                 | port_id='e3c38cb2-0567-4501-9602-c7a78300461e', segmentation_id='2', segmentation_type='vlan' |
    | tenant_id       | f51610e1ac8941a9a0d08940f11ed9b9                                                              |
    | updated_at      | 2017-06-02T21:49:59Z                                                                          |
    +-----------------+-----------------------------------------------------------------------------------------------+
    
    ardana > openstack network subport list --trunk mytrunk
    +--------------------------------------+-------------------+-----------------+
    | Port                                 | Segmentation Type | Segmentation ID |
    +--------------------------------------+-------------------+-----------------+
    | 9d25abcf-d8a4-4272-9436-75735d2d39dc | vlan              |               1 |
    | e3c38cb2-0567-4501-9602-c7a78300461e | vlan              |               2 |
    +--------------------------------------+-------------------+-----------------+

    Optionally, a trunk may be created without subports (they can be added later):

    ardana > openstack network trunk create --parent-port trunkparent mytrunk
    +-----------------+--------------------------------------+
    | Field           | Value                                |
    +-----------------+--------------------------------------+
    | admin_state_up  | UP                                   |
    | created_at      | 2017-06-02T21:45:35Z                 |
    | description     |                                      |
    | id              | eb8a3c7d-9f0a-42db-b26a-ca15c2b38e6e |
    | name            | mytrunk                              |
    | port_id         | 239f8807-be2e-4732-9de6-c64519f46358 |
    | project_id      | f51610e1ac8941a9a0d08940f11ed9b9     |
    | revision_number | 1                                    |
    | status          | DOWN                                 |
    | sub_ports       |                                      |
    | tenant_id       | f51610e1ac8941a9a0d08940f11ed9b9     |
    | updated_at      | 2017-06-02T21:45:35Z                 |
    +-----------------+--------------------------------------+

    A port that is already bound (that is, already in use by a VM) cannot be upgraded to a trunk port. The port must be unbound to be eligible for use as a trunk's parent port. When adding subports to a trunk, the child ports must be unbound as well.

Checking a port's trunk details

Once a trunk has been created, its parent port will show the trunk_details attribute, which consists of the trunk_id and list of subport dictionaries:

ardana > neutron port-show -F trunk_details trunkparent
+---------------+-------------------------------------------------------------------------------------+
| Field         | Value                                                                               |
+---------------+-------------------------------------------------------------------------------------+
| trunk_details | {"trunk_id": "bd822ebd-33d5-423e-8731-dfe16dcebac2", "sub_ports":                   |
|               | [{"segmentation_id": 2, "port_id": "e3c38cb2-0567-4501-9602-c7a78300461e",          |
|               | "segmentation_type": "vlan", "mac_address": "fa:16:3e:11:90:d2"},                   |
|               | {"segmentation_id": 1, "port_id": "9d25abcf-d8a4-4272-9436-75735d2d39dc",           |
|               | "segmentation_type": "vlan", "mac_address": "fa:16:3e:ff:de:73"}]}                  |
+---------------+-------------------------------------------------------------------------------------+

Ports that are not trunk parent ports will not have a trunk_details field:

ardana > neutron port-show -F trunk_details subport1
need more than 0 values to unpack

Adding subports to a trunk

Assuming a trunk and a new child port have already been created, the openstack network trunk set command with the --subport option adds one or more subports to the trunk.

  1. Run openstack network trunk set

    ardana > openstack network trunk set --subport port=subport3,segmentation-type=vlan,segmentation-id=3 mytrunk
  2. Run openstack network subport list

    ardana > openstack network subport list --trunk mytrunk
    +--------------------------------------+-------------------+-----------------+
    | Port                                 | Segmentation Type | Segmentation ID |
    +--------------------------------------+-------------------+-----------------+
    | 9d25abcf-d8a4-4272-9436-75735d2d39dc | vlan              |               1 |
    | e3c38cb2-0567-4501-9602-c7a78300461e | vlan              |               2 |
    | bf958742-dbf9-467f-b889-9f8f2d6414ad | vlan              |               3 |
    +--------------------------------------+-------------------+-----------------+
Note

The --subport option may be repeated multiple times in order to add multiple subports at a time.
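
For example, two hypothetical child ports, subport4 and subport5 (which must already exist as unbound ports), could be added in a single call; the names and VLAN IDs here are illustrative only:

ardana > openstack network trunk set \
  --subport port=subport4,segmentation-type=vlan,segmentation-id=4 \
  --subport port=subport5,segmentation-type=vlan,segmentation-id=5 \
  mytrunk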

Removing subports from a trunk

To remove a subport from a trunk, use the openstack network trunk unset command:

ardana > openstack network trunk unset --subport subport3 mytrunk

Deleting a trunk port

To delete a trunk port, use the openstack network trunk delete command:

ardana > openstack network trunk delete mytrunk

Once a trunk has been created successfully, its parent port may be passed to the nova boot command, which will make the VM VLAN-aware:

ardana > nova boot --image ubuntu-server --flavor 1 --nic port-id=239f8807-be2e-4732-9de6-c64519f46358 vlan-aware-vm
Note

A trunk cannot be deleted until its parent port is unbound. In practice, this means you must delete the VM that is using the trunk's parent port before you can delete the trunk.
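
For example, to tear down the VM started in the earlier nova boot example and then remove its trunk:

ardana > openstack server delete vlan-aware-vm
ardana > openstack network trunk delete mytrunk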

9.3.17.5 VLAN-aware VM network configuration

This section illustrates how to configure the VLAN interfaces inside a VLAN-aware VM based upon the subports allocated to the trunk port being used.

  1. Run openstack network subport list to see the VLAN IDs in use on the trunk port:

    ardana > openstack network subport list --trunk mytrunk
    +--------------------------------------+-------------------+-----------------+
    | Port                                 | Segmentation Type | Segmentation ID |
    +--------------------------------------+-------------------+-----------------+
    | e3c38cb2-0567-4501-9602-c7a78300461e | vlan              |               2 |
    +--------------------------------------+-------------------+-----------------+
  2. Run neutron port-show on the child port to get its mac_address:

    ardana > neutron port-show -F mac_address 08848e38-50e6-4d22-900c-b21b07886fb7
    +-------------+-------------------+
    | Field       | Value             |
    +-------------+-------------------+
    | mac_address | fa:16:3e:08:24:61 |
    +-------------+-------------------+
  3. Log into the VLAN-aware VM and run the following commands to set up the VLAN interface:

    tux > sudo ip link add link ens3 ens3.2 address fa:16:3e:11:90:d2 broadcast ff:ff:ff:ff:ff:ff type vlan id 2
    tux > sudo ip link set dev ens3.2 up
  4. Note the usage of the mac_address from step 2 and VLAN ID from step 1 in configuring the VLAN interface:

    tux > sudo ip link add link ens3 ens3.2 address fa:16:3e:11:90:d2 broadcast ff:ff:ff:ff:ff:ff type vlan id 2
  5. Trigger a DHCP request for the new VLAN interface to verify connectivity and retrieve its IP address. On an Ubuntu VM, this might be:

    tux > sudo dhclient ens3.2
    tux > sudo ip a
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc pfifo_fast state UP group default qlen 1000
        link/ether fa:16:3e:8d:77:39 brd ff:ff:ff:ff:ff:ff
        inet 10.10.10.5/24 brd 10.10.10.255 scope global ens3
           valid_lft forever preferred_lft forever
        inet6 fe80::f816:3eff:fe8d:7739/64 scope link
           valid_lft forever preferred_lft forever
    3: ens3.2@ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP group default qlen 1000
        link/ether fa:16:3e:11:90:d2 brd ff:ff:ff:ff:ff:ff
        inet 10.10.12.7/24 brd 10.10.12.255 scope global ens3.2
           valid_lft forever preferred_lft forever
        inet6 fe80::f816:3eff:fe11:90d2/64 scope link
           valid_lft forever preferred_lft forever

9.3.17.6 Firewall issues

The SUSE OpenStack Cloud default firewall_driver is neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver. This default does not implement security groups for VLAN-aware VMs, but it does implement security groups for legacy VMs. For this reason, it is recommended to disable Neutron security groups altogether when using VLAN-aware VMs. To do so, set:

firewall_driver = neutron.agent.firewall.NoopFirewallDriver

Doing this will prevent having a mix of firewalled and non-firewalled VMs in the same environment, but it should be done with caution because all VMs would be non-firewalled.
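
As a sketch only, assuming your deployment exposes this option in the Neutron agent's [securitygroup] section (the exact configuration file to edit depends on your cloud model), the setting would look like this:

[securitygroup]
firewall_driver = neutron.agent.firewall.NoopFirewallDriver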

9.3.18 Enabling VLAN Transparent Networks

VLAN transparent networks in SUSE OpenStack Cloud 8 allow guest VMs to communicate over tagged VLANs that are configured on the VMs themselves, without Neutron requiring any knowledge of those VLANs. VLAN transparency is only supported in clouds that are configured with DPDK; attempting to configure VLAN transparency in a non-DPDK cloud will result in an error message from the configuration processor.

9.3.18.1 Enabling VLAN transparency support

To enable VLAN transparency support, the entry vlan_transparent: True must be included in the configuration-data for Neutron (in the data/neutron/neutron_config.yml file of the cloud model). For example:

---
  product:
    version: 2

  configuration-data:
    - name:  NEUTRON-CONFIG-CP1
      services:
        - neutron
      data:
        vlan_transparent: True

9.3.18.2 Creating and using a VLAN transparent network

To create a network that supports VLAN transparency in Neutron, the flag --vlan-transparent True must be supplied at network creation time:

ardana > neutron net-create --vlan-transparent True  mynetwork
Created a new network:
+---------------------------+--------------------------------------+
| Field                     | Value                                |
+---------------------------+--------------------------------------+
| admin_state_up            | True                                 |
| availability_zone_hints   |                                      |
| availability_zones        |                                      |
| created_at                | 2016-09-22T15:22:21                  |
| description               |                                      |
| id                        | bec25a3c-974e-4875-97ff-54a71508c6fe |
| ipv4_address_scope        |                                      |
| ipv6_address_scope        |                                      |
| mtu                       | 1500                                 |
| name                      | mynetwork                            |
| provider:network_type     | vlan                                 |
| provider:physical_network | physnet2                             |
| provider:segmentation_id  | 3861                                 |
| router:external           | False                                |
| shared                    | False                                |
| status                    | ACTIVE                               |
| subnets                   |                                      |
| tags                      |                                      |
| tenant_id                 | f246417e37ee40ce9c4cb7f65ed697f6     |
| updated_at                | 2016-09-22T15:22:21                  |
| vlan_transparent          | True                                 |
+---------------------------+--------------------------------------+

As you will notice in the output above, the created network reports a value of True for the vlan_transparent field upon successful creation. Once the VLAN transparent network is created (and configured with a subnet), guest VMs that will communicate over tagged VLANs on the guests can be instantiated on that network.

Note that the guest VM images must have the 8021q kernel module enabled and loaded if they are to have tagged VLANs configured. The Ubuntu cloud images (available at https://cloud-images.ubuntu.com) have the 8021q kernel module enabled. Tagged VLANs can be configured manually on the individual guests if the configuration is not already baked into the image. Also note that Neutron will not provide any DHCP service for tagged VLAN configuration on guests, since Neutron is completely unaware of the guest VLANs as per the definition of VLAN transparency.

Here is a sample of the steps to configure a tagged VLAN on a guest VM:

tux > sudo ip link add link eth0 name vlan10 type vlan id 10
tux > sudo ip addr add 192.128.111.3/24 dev vlan10
tux > sudo ip link set vlan10 up

And here is what the interface configuration looks like after the above steps are performed:

tux > sudo ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc pfifo_fast state UP group default qlen 1000
    link/ether fa:16:3e:4b:ca:76 brd ff:ff:ff:ff:ff:ff
    inet 10.1.1.13/24 brd 10.1.1.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe4b:ca76/64 scope link
       valid_lft forever preferred_lft forever
3: vlan10@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP group default
    link/ether fa:16:3e:4b:ca:76 brd ff:ff:ff:ff:ff:ff
    inet 192.128.111.3/24 scope global vlan10
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe4b:ca76/64 scope link
       valid_lft forever preferred_lft forever

Guest VMs on a VLAN transparent network will be able to communicate with each other over their tagged VLANs. Support is also included for double-tagged VLANs on guest VMs.

10 Managing the Dashboard

Information about managing and configuring the Dashboard service.

10.1 Configuring the Dashboard Service

Horizon is the OpenStack service that serves as the basis for the SUSE OpenStack Cloud dashboards.

The dashboards provide a web-based user interface to SUSE OpenStack Cloud services including Compute, Volume Operations, Networking, and Identity.

Along the left side of the dashboard are sections that provide access to Project and Identity sections. If your login credentials have been assigned the 'admin' role you will also see a separate Admin section that provides additional system-wide setting options.

Across the top are menus to switch between projects and menus where you can access user settings.

10.1.1 Dashboard Service and TLS in SUSE OpenStack Cloud

By default, the Dashboard service is configured with TLS in the input model (ardana-input-model). You should not disable TLS in the input model for the Dashboard service. The normal use case for users is to have all services behind TLS, but users are given the freedom in the input model to take a service off TLS for troubleshooting or debugging. TLS should always be enabled for production environments.

Make sure that horizon_public_protocol and horizon_private_protocol are both set to https.

10.2 Changing the Dashboard Timeout Value

The default session timeout for the dashboard is 1800 seconds or 30 minutes. This is the recommended default and best practice for those concerned with security.

As an administrator, you can change the session timeout by changing the value of SESSION_TIMEOUT to any value less than or equal to 14400 seconds (four hours). Values greater than 14400 should not be used due to Keystone constraints.

Warning

Increasing the value of SESSION_TIMEOUT increases the risk of abuse.
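
For illustration only, a two-hour timeout would be expressed in the procedure below as:

SESSION_TIMEOUT = 7200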

10.2.1 How to Change the Dashboard Timeout Value

Follow these steps to change and commit the Horizon timeout value.

  1. Log in to the Cloud Lifecycle Manager.

  2. Edit the Dashboard config file at ~/openstack/my_cloud/config/horizon/local_settings.py and, if it is not already present, add a line for SESSION_TIMEOUT above the line for SESSION_ENGINE.

    Here is an example snippet:

    SESSION_TIMEOUT = <timeout value>
    SESSION_ENGINE = 'django.contrib.sessions.backends.db'
    Important

    Do not exceed the maximum value of 14400.

  3. Commit the changes to git:

    git add -A
    git commit -a -m "changed Horizon timeout value"
  4. Run the configuration processor:

    cd ~/openstack/ardana/ansible
    ansible-playbook -i hosts/localhost config-processor-run.yml
  5. Update your deployment directory:

    cd ~/openstack/ardana/ansible
    ansible-playbook -i hosts/localhost ready-deployment.yml
  6. Run the Dashboard reconfigure playbook:

    cd ~/scratch/ansible/next/ardana/ansible
    ansible-playbook -i hosts/verb_hosts horizon-reconfigure.yml

11 Managing Orchestration

Information about managing and configuring the Orchestration service, based on OpenStack Heat.

11.1 Configuring the Orchestration Service

Information about configuring the Orchestration service, based on OpenStack Heat.

The Orchestration service, based on OpenStack Heat, does not need any additional configuration to be used. This document describes some configuration options as well as reasons you may want to use them.

Heat Stack Tag Feature

Heat provides a feature called Stack Tags to allow attributing a set of simple string-based tags to stacks and optionally the ability to hide stacks with certain tags by default. This feature can be used for behind-the-scenes orchestration of cloud infrastructure, without exposing the cloud user to the resulting automatically-created stacks.

Additional details can be seen here: OpenStack - Stack Tags.

In order to use the Heat stack tag feature, you need to use the following steps to define the hidden_stack_tags setting in the Heat configuration file and then reconfigure the service to enable the feature.

  1. Log in to the Cloud Lifecycle Manager.

  2. Edit the Heat configuration file, at this location:

    ~/openstack/my_cloud/config/heat/heat.conf.j2
  3. Under the [DEFAULT] section, add a line for hidden_stack_tags. Example:

    [DEFAULT]
    hidden_stack_tags="<hidden_tag>"
  4. Commit the changes to your local git:

    ardana > cd ~/openstack/ardana/ansible
    ardana > git add --all
    ardana > git commit -m "enabling Heat Stack Tag feature"
  5. Run the configuration processor:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost config-processor-run.yml
  6. Update your deployment directory:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost ready-deployment.yml
  7. Reconfigure the Orchestration service:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts heat-reconfigure.yml

To begin using the feature, use these steps to create a Heat stack using the defined hidden tag. You will need to use credentials that have the Heat admin permissions. In the example steps below, we do this from the Cloud Lifecycle Manager using the admin credentials and a Heat template named heat.yaml (a minimal example template is shown after these steps):

  1. Log in to the Cloud Lifecycle Manager.

  2. Source the admin credentials:

    ardana > source ~/service.osrc
  3. Create a Heat stack using this feature:

    ardana > openstack stack create -f heat.yaml hidden-stack --tags hidden
  4. If you list your Heat stacks, your hidden one will not show unless you use the --show-hidden switch.

    Example, not showing hidden stacks:

    ardana > openstack stack list

    Example, showing the hidden stacks:

    ardana > openstack stack list --show-hidden
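
The contents of heat.yaml are not shown in this guide. As an illustration only, a minimal template that these steps could be run against (the resource and network names are hypothetical) might look like this:

heat_template_version: 2016-10-14

description: Minimal example stack used to demonstrate hidden stack tags

resources:
  example_net:
    type: OS::Neutron::Net
    properties:
      name: hidden-example-net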

11.2 Autoscaling using the Orchestration Service

Autoscaling is a process that can be used to scale up and down your compute resources based on the load they are currently experiencing to ensure a balanced load.

11.2.1 What is autoscaling?

Autoscaling is a process that can be used to scale up and down your compute resources based on the load they are currently experiencing to ensure a balanced load across your compute environment.

Important

Autoscaling is only supported for KVM.

11.2.2 How does autoscaling work?

The monitoring service, Monasca, monitors your infrastructure resources and generates alarms based on their state. The orchestration service, Heat, talks to the Monasca API and offers the capability to templatize the existing Monasca resources, which are the Monasca Notification and Monasca Alarm definition. Heat can configure certain alarms for the infrastructure resources (compute instances and block storage volumes) it creates and can expect Monasca to notify continuously if a certain evaluation pattern in an alarm definition is met.

For example, Heat can tell Monasca that it needs an alarm generated if the average CPU utilization of the compute instance in a scaling group goes beyond 90%.

As Monasca continuously monitors all the resources in the cloud, if it happens to see a compute instance spiking above 90% load as configured by Heat, it generates an alarm and in turn sends a notification to Heat. Once Heat is notified, it will execute an action that was preconfigured in the template. Commonly, this action will be a scale up to increase the number of compute instances to balance the load that is being taken by the compute instance scaling group.

Monasca sends a notification every 60 seconds while the alarm is in the ALARM state.

11.2.3 Autoscaling template example

The following Monasca alarm definition template snippet is an example of instructing Monasca to generate an alarm if the average CPU utilization in a group of compute instances exceeds 50%. If the alarm is triggered, it will invoke the up_notification webhook once the alarm evaluation expression is satisfied.

cpu_alarm_high:
  type: OS::Monasca::AlarmDefinition
  properties:
    name: CPU utilization beyond 50 percent
    description: CPU utilization reached beyond 50 percent
    expression:
      str_replace:
        template: avg(cpu.utilization_perc{scale_group=scale_group_id}) > 50 times 3
        params:
          scale_group_id: {get_param: "OS::stack_id"}
    severity: high
    alarm_actions:
      - {get_resource: up_notification }

The following Monasca notification template snippet is an example of creating a Monasca notification resource that will be used by the alarm definition snippet to notify Heat.

up_notification:
  type: OS::Monasca::Notification
  properties:
    type: webhook
    address: {get_attr: [scale_up_policy, alarm_url]}
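
The alarm_url address referenced above comes from a scaling policy resource that is not part of this snippet. As a sketch only (the resource names, image, flavor, and sizing values are illustrative, and the wiring of the scale_group dimension onto the instances is not shown), the policy and its scaling group could be defined along these lines:

scale_up_group:
  type: OS::Heat::AutoScalingGroup
  properties:
    min_size: 1
    max_size: 5
    resource:
      type: OS::Nova::Server
      properties:
        image: ubuntu-server
        flavor: m1.small

scale_up_policy:
  type: OS::Heat::ScalingPolicy
  properties:
    adjustment_type: change_in_capacity
    auto_scaling_group_id: {get_resource: scale_up_group}
    cooldown: 60
    scaling_adjustment: 1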

11.2.4 Monasca Agent configuration options

There is a Monasca Agent configuration option that controls when measurements from a newly created compute instance start to be reported.

The variable is monasca_libvirt_vm_probation which is set in the ~/openstack/my_cloud/config/nova/libvirt-monitoring.yml file. Here is a snippet of the file showing the description and variable:

# The period of time (in seconds) in which to suspend metrics from a
# newly-created VM. This is used to prevent creating and storing
# quickly-obsolete metrics in an environment with a high amount of instance
# churn (VMs created and destroyed in rapid succession).  Setting to 0
# disables VM probation and metrics will be recorded as soon as possible
# after a VM is created.  Decreasing this value in an environment with a high
# amount of instance churn can have a large effect on the total number of
# metrics collected and increase the amount of CPU, disk space and network
# bandwidth required for Monasca. This value may need to be decreased if
# Heat Autoscaling is in use so that Heat knows that a new VM has been
# created and is handling some of the load.
monasca_libvirt_vm_probation: 300

The default value is 300. This is the time in seconds that a compute instance must live before the Monasca libvirt agent plugin will send measurements for it. This is so that the Monasca metrics database does not fill with measurements from short lived compute instances. However, this means that the Monasca threshold engine will not see measurements from a newly created compute instance for at least five minutes on scale up. If the newly created compute instance is able to start handling the load in less than five minutes, then Heat autoscaling may mistakenly create another compute instance since the alarm does not clear.

If the default monasca_libvirt_vm_probation turns out to be an issue, it can be lowered. However, lowering it affects all compute instances, not just the ones used by Heat autoscaling, and it can increase the number of measurements stored in Monasca if there are many short-lived compute instances. Consider how often compute instances are created that live for less than the new value of monasca_libvirt_vm_probation. If few or no compute instances live for less than that period, the value can be decreased without causing issues. If many compute instances live for less than the monasca_libvirt_vm_probation period, then decreasing it can cause excessive disk, CPU and memory usage by Monasca.
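
For example, if autoscaled instances in your environment typically begin taking load within about two minutes, a lowered value (illustrative only) in ~/openstack/my_cloud/config/nova/libvirt-monitoring.yml might be:

monasca_libvirt_vm_probation: 120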

If you wish to change this value, follow these steps:

  1. Log in to the Cloud Lifecycle Manager.

  2. Edit the monasca_libvirt_vm_probation value in this configuration file:

    ~/openstack/my_cloud/config/nova/libvirt-monitoring.yml
  3. Commit your changes to the local git:

    ardana > cd ~/openstack/ardana/ansible
    ardana > git add --all
    ardana > git commit -m "changing Monasca Agent configuration option"
  4. Run the configuration processor:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost config-processor-run.yml
  5. Update your deployment directory:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost ready-deployment.yml
  6. Run this playbook to reconfigure the Nova service and enact your changes:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts nova-reconfigure.yml

12 Managing Monitoring, Logging, and Usage Reporting

Information about the monitoring, logging, and metering services included with your SUSE OpenStack Cloud.

12.1 Monitoring

The SUSE OpenStack Cloud Monitoring service leverages OpenStack Monasca, which is a multi-tenant, scalable, fault tolerant monitoring service.

12.1.1 Getting Started with Monitoring

You can use the SUSE OpenStack Cloud Monitoring service to monitor the health of your cloud and, if necessary, to troubleshoot issues.

Monasca data can be extracted and used for a variety of legitimate purposes, and different purposes require different forms of data sanitization or encoding to protect against invalid or malicious data. Any data pulled from Monasca should be considered untrusted data, so users are advised to apply appropriate encoding and/or sanitization techniques to ensure safe and correct usage and display of data in a web browser, database scan, or any other use of the data.

12.1.1.1 Monitoring Service Overview

12.1.1.1.1 Installation

The monitoring service is automatically installed as part of the SUSE OpenStack Cloud installation.

No specific configuration is required to use Monasca. However, you can configure the database for storing metrics as explained in Section 12.1.2, “Configuring the Monitoring Service”.

12.1.1.1.2 Differences Between Upstream and SUSE OpenStack Cloud Implementations

In SUSE OpenStack Cloud, the OpenStack monitoring service, Monasca, is included as the monitoring solution, with the exception of the following components, which are not included:

  • Transform Engine

  • Events Engine

  • Anomaly and Prediction Engine

Note

Icinga was supported in previous SUSE OpenStack Cloud versions but it has been deprecated in SUSE OpenStack Cloud 8.

12.1.1.1.3 Diagram of Monasca Service
12.1.1.1.4 For More Information

For more details on OpenStack Monasca, see Monasca.io.

12.1.1.1.5 Back-end Database

The monitoring service default metrics database is Cassandra, which is a highly-scalable analytics database and the recommended database for SUSE OpenStack Cloud.

You can learn more about Cassandra at Apache Cassandra.

12.1.1.2 Working with Monasca

Monasca-Agent

The monasca-agent is a Python program that runs on the control plane and resource nodes. It runs the defined checks and then sends the resulting data to the Monasca API. The checks that the agent runs include:

  • System Metrics: CPU utilization, memory usage, disk I/O, network I/O, and filesystem utilization on the control plane and resource nodes.

  • Service Metrics: the agent supports plugins such as MySQL, RabbitMQ, Kafka, and many others.

  • VM Metrics: CPU utilization, disk I/O, network I/O, and memory usage of hosted virtual machines on compute nodes. Full details of these can be found at https://github.com/openstack/monasca-agent/blob/master/docs/Plugins.md#per-instance-metrics.

For a full list of packaged plugins that are included with SUSE OpenStack Cloud, see Monasca Plugins.

You can further customize the monasca-agent to suit your needs; see Customizing the Agent.

12.1.1.3 Accessing the Monitoring Service

Access to the Monitoring service is available through a number of different interfaces.

12.1.1.3.1 Command-Line Interface

For users who prefer using the command line, there is the python-monascaclient, which is part of the default installation on your Cloud Lifecycle Manager node.

For details on the CLI, including installation instructions, see Python-Monasca Client
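
For example, after sourcing admin credentials on the Cloud Lifecycle Manager, a quick look at current alarms and at a specific metric might look like this (the metric name is just an example; output is omitted):

ardana > source ~/service.osrc
ardana > monasca alarm-list
ardana > monasca metric-list --name cpu.idle_perc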

Monasca API

If low-level access is desired, there is the Monasca REST API.

Full details of the Monasca API can be found on GitHub.

12.1.1.3.2 Operations Console GUI

You can use the Operations Console (Ops Console) for SUSE OpenStack Cloud to view data about your SUSE OpenStack Cloud infrastructure in a web-based graphical user interface (GUI) and ensure your cloud is operating correctly. By logging on to the console, SUSE OpenStack Cloud administrators can manage data in the following ways, including triaging alarm notifications:

  • Alarm Definitions and notifications now have their own screens and are collected under the Alarm Explorer menu item which can be accessed from the Central Dashboard. Central Dashboard now allows you to customize the view in the following ways:

    • Rename or re-configure existing alarm cards to include services different from the defaults

    • Create a new alarm card with the services you want to select

    • Reorder alarm cards using drag and drop

    • View all alarms that have no service dimension now grouped in an Uncategorized Alarms card

    • View all alarms that have a service dimension that does not match any of the other cards, now grouped in an Other Alarms card

  • You can also easily access alarm data for a specific component. On the Summary page for the following components, a link is provided to an alarms screen specifically for that component:

    • Compute Instances: Book “User Guide Overview”, Chapter 1 “Using the Operations Console”, Section 1.3 “Managing Compute Hosts”

    • Object Storage: Book “User Guide Overview”, Chapter 1 “Using the Operations Console”, Section 1.4 “Managing Swift Performance”, Section 1.4.4 “Alarm Summary”

12.1.1.3.3 Connecting to the Operations Console

To connect to Operations Console, perform the following:

  • Ensure your login has the required access credentials: Book “User Guide Overview”, Chapter 1 “Using the Operations Console”, Section 1.2 “Connecting to the Operations Console”, Section 1.2.1 “Required Access Credentials”

  • Connect through a browser: Book “User Guide Overview”, Chapter 1 “Using the Operations Console”, Section 1.2 “Connecting to the Operations Console”, Section 1.2.2 “Connect Through a Browser”

  • Optionally use a Host name OR virtual IP address to access Operations Console: Book “User Guide Overview”, Chapter 1 “Using the Operations Console”, Section 1.2 “Connecting to the Operations Console”, Section 1.2.3 “Optionally use a Hostname OR virtual IP address to access Operations Console”

Operations Console will always be accessed over port 9095.

12.1.1.3.4 For More Information

For more details about the Operations Console, see Book “User Guide Overview”, Chapter 1 “Using the Operations Console”, Section 1.1 “Operations Console Overview”.

12.1.1.4 Service Alarm Definitions

SUSE OpenStack Cloud comes with some predefined monitoring alarms for the services installed.

Full details of all service alarms can be found here: Section 15.1.1, “Alarm Resolution Procedures”.

Each alarm will have one of the following statuses:

  • Critical - Open alarms, identified by red indicator.

  • Warning - Open alarms, identified by yellow indicator.

  • Unknown - Open alarms, identified by gray indicator. Unknown will be the status of an alarm that has stopped receiving a metric. This can be caused by the following conditions:

    • An alarm exists for a service or component that is not installed in the environment.

    • An alarm exists for a virtual machine or node that previously existed but has been removed without the corresponding alarms being removed.

    • There is a gap between the last reported metric and the next metric.

  • Open - Complete list of open alarms.

  • Total - Complete list of alarms, may include Acknowledged and Resolved alarms.

When alarms are triggered it is helpful to review the service logs.

12.1.2 Configuring the Monitoring Service

The monitoring service, based on Monasca, allows you to configure an external SMTP server for email notifications when alarms trigger. You also have options for your alarm metrics database should you choose not to use the default option provided with the product.

In SUSE OpenStack Cloud you have the option to specify an SMTP server for email notifications and a database platform you want to use for the metrics database. These steps will assist in this process.

12.1.2.1 Configuring the Monitoring Email Notification Settings

The monitoring service, based on Monasca, allows you to configure an external SMTP server for email notifications when alarms trigger. In SUSE OpenStack Cloud, you have the option to specify an SMTP server for email notifications. These steps will assist in this process.

If you are going to use the email notification feature of the monitoring service, you must set the configuration options with valid email settings, including an SMTP server and valid email addresses. The email server is not provided by SUSE OpenStack Cloud; it must be specified in the configuration file described below. The email server must support SMTP.

12.1.2.1.1 Configuring monitoring notification settings during initial installation
  1. Log in to the Cloud Lifecycle Manager.

  2. To change the SMTP server configuration settings edit the following file:

    ~/openstack/my_cloud/definition/cloudConfig.yml
    1. Enter your email server settings. Here is an example snippet showing the configuration file contents; uncomment these lines before entering your environment details (a filled-in example is shown after these steps).

          smtp-settings:
          #  server: mailserver.examplecloud.com
          #  port: 25
          #  timeout: 15
          # These are only needed if your server requires authentication
          #  user:
          #  password:

      The following explains each of these values:

      Server (required)

      The server entry must be uncommented and set to a valid hostname or IP Address.

      Port (optional)

      If your SMTP server is running on a port other than the standard 25, then uncomment the port line and set it to your port.

      Timeout (optional)

      If your email server is heavily loaded, the timeout parameter can be uncommented and set to a larger value. 15 seconds is the default.

      User / Password (optional)

      If your SMTP server requires authentication, then you can configure user and password. Use double quotes around the password to avoid issues with special characters.

  3. To configure the sending email addresses, edit the following file:

    ~/openstack/ardana/ansible/roles/monasca-notification/defaults/main.yml

    Modify the following value to add your sending email address:

    email_from_addr
    Note

    The default value in the file is email_from_address: notification@exampleCloud.com which you should edit.

  4. [optional] To configure the receiving email addresses, edit the following file:

    ~/openstack/ardana/ansible/roles/monasca-default-alarms/defaults/main.yml

    Modify the following value to configure a receiving email address:

    notification_address
    Note

    You can also set the receiving email address via the Operations Console. Instructions for this are given in the section on changing notification settings after the initial installation, below.

  5. If your environment requires a proxy address then you can add that in as well:

    # notification_environment can be used to configure proxies if needed.
    # Below is an example configuration. Note that all of the quotes are required.
    # notification_environment: '"http_proxy=http://<your_proxy>:<port>" "https_proxy=http://<your_proxy>:<port>"'
    notification_environment: ''
  6. Commit your configuration to the local Git repository (see Book “Installing with Cloud Lifecycle Manager”, Chapter 11 “Using Git for Configuration Management”), as follows:

    ardana > cd ~/openstack/ardana/ansible
    ardana > git add -A
    ardana > git commit -m "Updated monitoring service email notification settings"
  7. Continue with your installation.
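
As an illustration, the smtp-settings block from step 2 might look like this after editing (all values here are examples only):

      smtp-settings:
        server: mailserver.examplecloud.com
        port: 25
        timeout: 15
        user: monasca-mail
        password: "examplepassword"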

12.1.2.1.2 Monasca and Apache Commons validator

The Monasca notification uses a standard Apache Commons validator to validate the configured SUSE OpenStack Cloud domain names before sending the notification over webhook. Monasca notification supports some non-standard domain names, but not all. See the Domain Validator documentation for more information: https://commons.apache.org/proper/commons-validator/apidocs/org/apache/commons/validator/routines/DomainValidator.html

You should ensure that any domains that you use are supported by IETF and IANA. As an example, .local is not listed by IANA and is invalid but .gov and .edu are valid.

Failure to use supported domains will generate an unprocessable exception in Monasca notification create:

HTTPException code=422 message={"unprocessable_entity":
{"code":422,"message":"Address https://myopenstack.sample:8000/v1/signal/test is not of correct format","details":"","internal_code":"c6cf9d9eb79c3fc4"}
12.1.2.1.3 Configuring monitoring notification settings after the initial installation

If you need to make changes to the email notification settings after your initial deployment, you can change the "From" address using the configuration files but the "To" address will need to be changed in the Operations Console. The following section will describe both of these processes.

To change the sending email address:

  1. Log in to the Cloud Lifecycle Manager.

  2. To configure the sending email addresses, edit the following file:

    ~/openstack/ardana/ansible/roles/monasca-notification/defaults/main.yml

    Modify the following value to add your sending email address:

    email_from_addr
    Note

    The default value in the file is email_from_address: notification@exampleCloud.com which you should edit.

  3. Commit your configuration to the local Git repository (Book “Installing with Cloud Lifecycle Manager”, Chapter 11 “Using Git for Configuration Management”), as follows:

    ardana > cd ~/openstack/ardana/ansible
    ardana > git add -A
    ardana > git commit -m "Updated monitoring service email notification settings"
  4. Run the configuration processor:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost config-processor-run.yml
  5. Update your deployment directory:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost ready-deployment.yml
  6. Run the Monasca reconfigure playbook to deploy the changes:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts monasca-reconfigure.yml --tags notification
    Note

    You may need to use the --ask-vault-pass switch if you opted for encryption during the initial deployment.

To change the receiving ("To") email address via the Operations Console after installation:

  1. Connect to and log in to the Operations Console. See Book “User Guide Overview”, Chapter 1 “Using the Operations Console”, Section 1.2 “Connecting to the Operations Console” for assistance.

  2. On the Home screen, click the menu represented by three horizontal lines.

  3. From the menu that slides in on the left side, click Home, and then Alarm Explorer.

  4. On the Alarm Explorer page, at the top, click the Notification Methods text.

  5. On the Notification Methods page, find the row with the Default Email notification.

  6. In the Default Email row, click the details (ellipsis) icon, then click Edit.

  7. On the Edit Notification Method: Default Email page, in Name, Type, and Address/Key, type in the values you want to use.

  8. On the Edit Notification Method: Default Email page, click Update Notification.

Important

Once the notification has been added, the Ansible playbook procedures described above will not change it.

12.1.2.2 Managing Notification Methods for Alarms

12.1.2.2.1 Enabling a Proxy for Webhook or Pager Duty Notifications

If your environment requires a proxy in order for communications to function, then these steps will show you how to enable one. These steps are only needed if you are using the webhook or PagerDuty notification methods.

These steps require access to the Cloud Lifecycle Manager in your cloud deployment, so you may need to contact your Administrator. You can make these changes during the initial configuration phase prior to the first installation, or you can modify an existing environment; the only difference is whether the additional steps at the end are needed.

  1. Log in to the Cloud Lifecycle Manager.

  2. Edit the ~/openstack/ardana/ansible/roles/monasca-notification/defaults/main.yml file and edit the line below with your proxy address values:

    notification_environment: '"http_proxy=http://<proxy_address>:<port>" "https_proxy=http://<proxy_address>:<port>"'
    Note

    There are single quotation marks around the entire value of this entry and then double quotation marks around the individual proxy entries. This formatting must exist when you enter these values into your configuration file.

  3. If you are making these changes prior to your initial installation then you are done and can continue on with the installation. However, if you are modifying an existing environment, you will need to continue on with the remaining steps below.

  4. Commit your configuration to the local Git repository (see Book “Installing with Cloud Lifecycle Manager”, Chapter 11 “Using Git for Configuration Management”), as follows:

    ardana > cd ~/openstack/ardana/ansible
    ardana > git add -A
    ardana > git commit -m "My config or other commit message"
  5. Run the configuration processor:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost config-processor-run.yml
  6. Generate an updated deployment directory:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost ready-deployment.yml
  7. Run the Monasca reconfigure playbook to enable these changes:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts monasca-reconfigure.yml --tags notification
12.1.2.2.2 Creating a New Notification Method
  1. Log in to the Operations Console. For more information, see Book “User Guide Overview”, Chapter 1 “Using the Operations Console”, Section 1.2 “Connecting to the Operations Console”.

  2. Use the navigation menu to go to the Alarm Explorer page:

  3. Select the Notification Methods menu and then click the Create Notification Method button:

  4. On the Create Notification Method window you will select your options and then click the Create Notification button.

    A description of each of the fields you use for each notification method:

    Name

    Enter a unique name value for the notification method you are creating.

    Type

    Choose a type. Available values are Webhook, Email, or Pager Duty.

    Address/Key

    Enter the value corresponding to the type you chose.
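
If you prefer the command line, an equivalent notification can typically also be created with the Monasca CLI; the name, type, and address below are examples only:

ardana > monasca notification-create MyWebhook WEBHOOK http://example.com/hook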
12.1.2.2.3 Applying a Notification Method to an Alarm Definition
  1. Log in to the Operations Console. For more information, see Book “User Guide Overview”, Chapter 1 “Using the Operations Console”, Section 1.2 “Connecting to the Operations Console”.

  2. Use the navigation menu to go to the Alarm Explorer page:

  3. Select the Alarm Definition menu which will give you a list of each of the alarm definitions in your environment.

  4. Locate the alarm you want to change the notification method for and click on its name to bring up the edit menu. You can use the sorting methods for assistance.

  5. In the edit menu, scroll down to the Notifications and Severity section where you will select one or more Notification Methods before selecting the Update Alarm Definition button:

  6. Repeat as needed until all of your alarms have the notification methods you desire.

12.1.2.3 Enabling the RabbitMQ Admin Console

The RabbitMQ Admin Console is off by default in SUSE OpenStack Cloud. You can turn on the console by following these steps:

  1. Log in to the Cloud Lifecycle Manager.

  2. Edit the ~/openstack/my_cloud/config/rabbitmq/main.yml file. Under the rabbit_plugins: line, uncomment the following:

    - rabbitmq_management
  3. Commit your configuration to the local Git repository (see Book “Installing with Cloud Lifecycle Manager”, Chapter 11 “Using Git for Configuration Management”), as follows:

    ardana > cd ~/openstack/ardana/ansible
    ardana > git add -A
    ardana > git commit -m "Enabled RabbitMQ Admin Console"
  4. Run the configuration processor:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost config-processor-run.yml
  5. Update your deployment directory:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost ready-deployment.yml
  6. Run the RabbitMQ reconfigure playbook to deploy the changes:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts rabbitmq-reconfigure.yml

To turn the RabbitMQ Admin Console off again, add the comment back and repeat steps 3 through 6.
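
For reference, a sketch of what the relevant portion of main.yml might look like with the plugin enabled (surrounding settings omitted):

rabbit_plugins:
  - rabbitmq_management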

12.1.2.4 Capacity Reporting and Monasca Transform

Capacity reporting is a new feature in SUSE OpenStack Cloud which provides cloud operators with overall capacity information (available, used, and remaining) via the Operations Console, so that they can ensure cloud resource pools have sufficient capacity to meet the demands of users. Cloud operators can also set thresholds and alarms to be notified when the thresholds are reached.

For Compute

  • Host Capacity - CPU/Disk/Memory: Used, Available and Remaining Capacity - for the entire cloud installation or by host

  • VM Capacity - CPU/Disk/Memory: Allocated, Available and Remaining - for the entire cloud installation, by host or by project

For Object Storage

  • Disk Capacity - Used, Available and Remaining Capacity - for the entire cloud installation or by project

In addition to overall capacity, roll up views with appropriate slices provide views by a particular project, or compute node. Graphs also show trends and the change in capacity over time.

12.1.2.4.1 Monasca Transform Features
  • Monasca Transform is a new component in Monasca which transforms and aggregates metrics using Apache Spark

  • Aggregated metrics are published to Kafka, are available to other Monasca components such as monasca-threshold, and are stored in the Monasca datastore

  • Cloud operators can set thresholds and set alarms to receive notifications when thresholds are met.

  • These aggregated metrics are made available to the cloud operators via Operations Console's new Capacity Summary (reporting) UI

  • Capacity reporting is a new feature in SUSE OpenStack Cloud which provides cloud operators with an overall capacity view (available, used and remaining) for Compute and Object Storage

  • Cloud operators can look at Capacity reporting via Operations Console's Compute Capacity Summary and Object Storage Capacity Summary UI

  • Capacity reporting allows cloud operators to ensure that cloud resource pools have sufficient capacity to meet the demands of users. See the table below for Service and Capacity Types.

  • A list of aggregated metrics is provided in Section 12.1.2.4.4, “New Aggregated Metrics”.

  • Capacity reporting metrics are aggregated and published every hour

  • In addition to the overall capacity, there are graphs which show the capacity trends over a time range (1 day, 7 days, 30 days, or 45 days)

  • Graphs showing the capacity trends by a particular project or compute host are also provided.

  • Monasca Transform is integrated with centralized monitoring (Monasca) and centralized logging

  • Flexible Deployment

  • Upgrade & Patch Support

Service / Type of Capacity / Description

Compute / Host Capacity
  CPU/Disk/Memory: Used, Available and Remaining Capacity - for entire cloud installation or by compute host

Compute / VM Capacity
  CPU/Disk/Memory: Allocated, Available and Remaining - for entire cloud installation, by host or by project

Object Storage / Disk Capacity
  Used, Available and Remaining Disk Capacity - for entire cloud installation or by project

Object Storage / Storage Capacity
  Utilized Storage Capacity - for entire cloud installation or by project

12.1.2.4.2 Architecture for Monasca Transform and Spark

Monasca Transform is a new component in Monasca. Monasca Transform uses Spark for data aggregation. Both Monasca Transform and Spark are depicted in the example diagram below.

You can see that the Monasca components run on the Cloud Controller nodes, and the Monasca agents run on all nodes in the Mid-scale Example configuration.

12.1.2.4.3 Components for Capacity Reporting
12.1.2.4.3.1 Monasca Transform: Data Aggregation Reporting

Monasca Transform is a new component which provides a mechanism to aggregate or transform metrics and publish new aggregated metrics to Monasca.

Monasca Transform is a data driven Apache Spark based data aggregation engine which collects, groups and aggregates existing individual Monasca metrics according to business requirements and publishes new transformed (derived) metrics to the Monasca Kafka queue.

Since the new transformed metrics are published as any other metric in Monasca, alarms can be set and triggered on the transformed metric, just like any other metric.

12.1.2.4.3.2 Object Storage and Compute Capacity Summary Operations Console UI

A new "Capacity Summary" tab for Compute and Object Storage will displays all the aggregated metrics under the