Maintaining Enterprise Linux Kernels
Forking the Linux kernel and using it as the basis of an Enterprise product is a challenging task. The pace of development in the upstream Linux kernel makes it hard to keep up with all the fixes that need to be backported. This article describes the process we use at SUSE to find and backport potentially required upstream fixes to our kernels.
How Enterprise Kernels are Built
To understand how we search for and find upstream fixes for our product kernels, it helps to know how our kernels come to life. The story begins months to years before the targeted release date, with the choice of a base kernel for the new product or service pack. The base kernel can be a new upstream Linux kernel version or the kernel of another product. In SUSE Linux Enterprise products, an odd-numbered service pack typically builds on top of the kernel from the previous service pack, while even-numbered service packs choose a new upstream Linux kernel version. From the chosen base, the fork is created as a new kernel branch in our internal git repository.
Once the kernel branch exists, the backporting starts. Backport requests come from multiple sources, for example from hardware vendors that want to have their latest or even upcoming devices supported by the new product. Some requests also come from within SUSE, to support new upstream kernel features or to improve performance. Sometimes a whole subsystem has to be lifted to a newer kernel version. For everything that is backported there is an upstream-first policy: we don’t backport patches unless they have been merged into the upstream Linux kernel, or have at least been accepted into a subsystem tree aimed at the next merge window.
Each upstream patch that is backported to our new kernel will get some annotations in the commit message. For example:
    From: firstname.lastname@example.org
    Date: Thu, 7 Sep 2017 19:02:30 +0100
    Subject: [PATCH 1/2] KVM: VMX: Do not BUG() on out-of-bounds guest IRQ
    Git-commit: 3a8b0677fc6180a467e26cc32ce6b0c09a32f9bb
    Patch-mainline: v4.14-rc1
    References: bsc#1058038, CVE-2017-1000252

    The value of the guest_irq argument to vmx_update_pi_irte() is
    ultimately coming from a KVM_IRQFD API call. Do not BUG() in
    vmx_update_pi_irte() if the value is out-of bounds. (Especially,
    since KVM as a whole seems to hang after that.) Instead, WARN_ONCE()
    if we find that we don't have a route for a certain IRQ (which can
    be out-of-bounds or within the array).

    Signed-off-by: Jan H. Schönherr <email@example.com>
    Acked-by: Joerg Roedel <firstname.lastname@example.org>
The important annotations are ‘Git-commit’, which contains the upstream git commit-id of the patch, and the last ‘Acked-by’ line. That line denotes the developer who did the backport.
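As an illustration of how these annotations can be read back by a script, here is a minimal Python sketch. It relies only on the tag names visible in the example above. The function name `parse_annotations` and the returned dictionary layout are hypothetical and are not taken from SUSE's actual tooling.

```python
import re

# Tag names as they appear in the example patch header above.
TAG_RE = re.compile(r"^(Git-commit|Patch-mainline|References|Acked-by):[ \t]*(.+)$")


def parse_annotations(patch_text):
    """Collect the annotation tags from the commit message of a backported patch.

    Hypothetical helper: the dictionary layout is an assumption made for
    this sketch.
    """
    tags = {}
    for line in patch_text.splitlines():
        match = TAG_RE.match(line.strip())
        if match:
            tags.setdefault(match.group(1), []).append(match.group(2).strip())

    references = " ".join(tags.get("References", []))
    return {
        "git_commit": tags.get("Git-commit", [None])[0],
        "patch_mainline": tags.get("Patch-mainline", [None])[0],
        # References are separated by commas and/or whitespace.
        "references": [r for r in re.split(r"[,\s]+", references) if r],
        # The last Acked-by line denotes the developer who did the backport.
        "backporter": tags.get("Acked-by", [None])[-1],
    }
```

Calling `parse_annotations()` on the text of a patch yields the upstream commit-id and the backporter, which are the two pieces of information the rest of the process depends on.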
Testing of the new kernel starts before all backports are done. It continues until the release date, and after that for each maintenance update kernel. During the release cycle testing gets broader: in the beginning it only runs internally at SUSE, while later in the process the kernel is also tested by partners.
After the release no major backports happen anymore. New backports are only allowed to fix bugs.
Finding Upstream Fixes for a Kernel Branch
The primary mechanisms that cause an upstream patch to be backported to released kernels are bug reports from customers and security incidents. When a customer reports a problem, we find the root cause and check whether the problem is already fixed upstream. If it is, we backport the fix; if it is not, we first send a patch upstream and then backport it to our product kernel.
That process has been working well for quite some time, but it has one drawback: the customer has already run into the problem before we have had a chance to fix it. To prevent customers from running into the problem in the first place, SUSE started to proactively backport kernel fixes from upstream Linux to its kernel branches.
But how do we find the fixes that need backporting? For that we look mainly at two sources:
- We consider upstream patches with ‘Fixes’ tags.
- If possible we also track upstream stable kernels.
To decide whether an upstream patch with a ‘Fixes’ tag is needed in one of our product kernels, we first need to create a list of upstream commit-ids that are in each of our maintained kernels. That commit-list consists of the commits in the base-kernel and all commits we backported to that kernel. Since each backported patch is annotated with its upstream commit-id we can easily find that information.
With the commit-list for each kernel we can start matching upstream commits with Fixes tags. When a Fixes tag in an upstream commit references a commit that is on the commit-list of one of our kernels, we put that upstream commit on the list of potential fixes for that kernel.
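The matching step can be sketched in a few lines of Python. This is a simplified illustration, not SUSE's actual implementation. The function names are hypothetical, and upstream commits are assumed to be available as `(commit_id, message)` pairs. Upstream Fixes tags usually carry an abbreviated commit-id, so the sketch compares by prefix. It also assumes that a fix already present on the commit-list does not need to be reported again.

```python
import re

# Upstream convention: 'Fixes: <abbreviated sha> ("subject line")'.
FIXES_RE = re.compile(r"^Fixes:[ \t]*([0-9a-f]{8,40})\b", re.IGNORECASE | re.MULTILINE)


def build_commit_list(base_commits, backported_commits):
    """Commit-list of one kernel: base-kernel commits plus backported ones."""
    return set(base_commits) | set(backported_commits)


def fixed_commits(message):
    """Return the (possibly abbreviated) commit-ids named in Fixes tags."""
    return [commit_id.lower() for commit_id in FIXES_RE.findall(message)]


def potential_fixes(upstream_commits, commit_list):
    """Select upstream commits whose Fixes tag points into commit_list.

    upstream_commits is an iterable of (commit_id, message) pairs.
    """
    result = []
    for commit_id, message in upstream_commits:
        if commit_id in commit_list:
            continue  # assumption: the fix itself is already in this kernel
        for short_id in fixed_commits(message):
            # Linear prefix scan; fine for a sketch, too slow for a
            # real kernel history without an index.
            if any(full_id.startswith(short_id) for full_id in commit_list):
                result.append(commit_id)
                break
    return result
```

A production version would need an index for the prefix lookup, since a kernel commit-list contains several hundred thousand entries.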
The second source of potential fixes is the list of upstream patches that are applied to stable kernels. For that we track the stable kernels that are close to the base-kernel we use, but we can also track other stable-kernels on a per-subsystem basis.
The potential fixes from the two sources are compiled into a list for each maintained kernel. After that list is created, it is split into groups of commits, which are sent to the kernel engineers who do the backports.
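Compiling and splitting the list could look roughly like the following Python sketch. All names are hypothetical. It assumes that commits which are already in the kernel, or which were blacklisted earlier, are dropped, and that some function `assignee_of` decides who receives a given commit.

```python
def compile_pending(fixes_tag_hits, stable_hits, commit_list, blacklist):
    """Merge both sources into one ordered, de-duplicated list for a kernel."""
    pending = []
    seen = set()
    for commit_id in list(fixes_tag_hits) + list(stable_hits):
        if commit_id in seen:
            continue  # reported by both sources
        if commit_id in commit_list or commit_id in blacklist:
            continue  # already backported, or rejected earlier
        seen.add(commit_id)
        pending.append(commit_id)
    return pending


def split_by_assignee(pending, assignee_of):
    """Group pending commits by the developer (or group) they go to."""
    groups = {}
    for commit_id in pending:
        groups.setdefault(assignee_of(commit_id), []).append(commit_id)
    return groups
```

Keeping the list ordered rather than using a plain set makes the resulting reports stable from one run to the next.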
Reporting Upstream Fixes to Individual Developers
Every fix that is reported will be evaluated by a developer and either backported to the kernel branches that need it or blacklisted, so that the fix is no longer considered. But who is the best person (or group) to report a fix to?
The answer is easy if the fix is for a patch that was backported by someone within SUSE as part of a service pack development cycle. In that case the person who backported the patch is tasked with reviewing the associated fix. The same happens with upstream fixes that are authored or committed by a SUSE employee.
Assigning fixes for patches that are part of the base-kernel is a bit more complicated. To that end we have introduced a maintainer model with an internal list of experts for most parts of the Linux kernel.
The approach is similar to the MAINTAINERS file in the upstream Linux kernel, but the file at SUSE is simpler. It only contains a list of people and several path-specs per entry. Each potential fix for the base-kernel is matched against the path-specs in the maintainers list and assigned to the best matching entry. The fix is reported to the developers listed in the matching entry.
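To illustrate the matching, here is a small Python sketch. The entries, addresses and path-specs are invented for the example, and the format of the real SUSE maintainers list is not shown here. "Best matching" is interpreted as "longest matching path-spec, summed over the files a fix touches", which is an assumption of this sketch.

```python
from fnmatch import fnmatchcase

# Hypothetical maintainers list: people plus several path-specs per entry.
MAINTAINERS = [
    {"people": ["iommu-dev@example.org"], "paths": ["drivers/iommu/"]},
    {"people": ["kvm-dev@example.org"], "paths": ["arch/x86/kvm/", "virt/kvm/"]},
    {"people": ["x86-dev@example.org"], "paths": ["arch/x86/"]},
    {"people": ["sched-dev@example.org"], "paths": ["kernel/sched/*.c"]},
]


def spec_matches(spec, path):
    """A spec ending in '/' covers a directory tree, anything else is a glob."""
    if spec.endswith("/"):
        return path.startswith(spec)
    return fnmatchcase(path, spec)


def best_entry(touched_files, maintainers=MAINTAINERS):
    """Return the entry whose path-specs match the touched files best."""
    best, best_score = None, 0
    for entry in maintainers:
        score = 0
        for path in touched_files:
            lengths = [len(s) for s in entry["paths"] if spec_matches(s, path)]
            if lengths:
                score += max(lengths)  # longer spec means more specific match
        if score > best_score:
            best, best_score = entry, score
    return best  # None if no entry covers any of the files
```

With these example entries, a fix touching `arch/x86/kvm/vmx.c` goes to the KVM entry rather than the more general x86 entry, because the KVM path-spec is the more specific match.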
But not all fixes can be assigned that way, because the SUSE maintainers list does not cover the whole kernel source tree. For the remaining fixes a heuristic is used. It is based on which source code files in the kernel source tree are touched by the backports of each developer. This is matched against the file(s) a fix touches.
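One possible shape of such a heuristic is sketched below in Python. The scoring rule is an assumption made for illustration: it counts how often each developer's backports touched the files of the fix. All names are hypothetical.

```python
from collections import Counter, defaultdict


def build_touch_index(backports):
    """Map each file to a count of backports per developer touching it.

    backports is an iterable of (developer, [touched files]) pairs.
    """
    index = defaultdict(Counter)
    for developer, files in backports:
        for path in files:
            index[path][developer] += 1
    return index


def guess_assignee(fix_files, index):
    """Pick the developer whose backports touched the fix's files most often."""
    scores = Counter()
    for path in fix_files:
        scores.update(index.get(path, {}))
    if not scores:
        return None  # nobody has backported anything to these files
    # Highest score wins; ties are broken alphabetically to stay deterministic.
    return min(scores, key=lambda developer: (-scores[developer], developer))
```

A fix for files that nobody has touched yields `None` here, so a real system would still need a fallback for that case.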
How Fixes are Reported
The fixes are reported by e-mail to each developer or group of developers. E-mail is well accepted among kernel developers because it doesn’t require any change to the developers’ work methods.
Once a week, every developer gets an e-mail with the fixes she should look at. For example, in the week I wrote this article I got this in my inbox:
    To: email@example.com
    Subject: [2019-10-08] Pending SUSE Kernel Fixes for firstname.lastname@example.org

    Hi,

    There are pending fixes you wrote/committed upstream or which are for
    patches you backported to SUSE kernels. Please have a look at the list
    below and backport or blacklist the patch(es) in there.

    Patch list (11 patches):
    ======================================================================

    34c0989c0531 iommu/amd: Fix pages leak in free_pagetable()
            Needed in SLE15-SP2

    [...]

    2a78f9962565 iommu/amd: Lock code paths traversing protection_domain->dev_list
            Needed in SLE12-SP5
            Needed in SLE15
            Needed in SLE15-SP2

    You can reply to this message if you have any questions or suggestions.

    Thanks a lot,

            Joerg
This is a single e-mail that developers can act on directly. In addition to the weekly e-mails, developers get an update (at most once a day) when a new fix is added to their list or a fix is removed from it. The reason for the update is that we don’t want important fixes to wait until the next weekly e-mail.
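The report and the update rule can be sketched as follows in Python. The layout imitates the example e-mail above. The function names, the exact indentation and the way the once-a-day limit is checked are assumptions of this sketch, not a description of the real reporting scripts.

```python
from datetime import date


def render_report(recipient, day, fixes):
    """Render headers and patch list for one developer.

    fixes is a list of (short_id, subject, [kernel branches]) tuples.
    """
    lines = [
        f"To: {recipient}",
        f"Subject: [{day.isoformat()}] Pending SUSE Kernel Fixes for {recipient}",
        "",
        f"Patch list ({len(fixes)} patches):",
        "=" * 70,
        "",
    ]
    for short_id, subject, branches in fixes:
        lines.append(f"{short_id} {subject}")
        lines.extend(f"        Needed in {branch}" for branch in sorted(branches))
        lines.append("")
    return "\n".join(lines)


def list_changes(previous, current):
    """Return (added, removed) commit-ids between two versions of a list."""
    prev, cur = set(previous), set(current)
    added = [c for c in current if c not in prev]
    removed = [c for c in previous if c not in cur]
    return added, removed


def update_due(previous, current, last_sent, today):
    """Send an update only if the list changed and none was sent today."""
    added, removed = list_changes(previous, current)
    changed = bool(added or removed)
    return changed and (last_sent is None or last_sent < today)
```

Comparing the stored list with the freshly computed one keeps the daily update cheap, because an e-mail is only generated when something actually changed.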
Experiences with Proactive Fixes Backporting
The proactive reporting of potential upstream fixes has been in place at SUSE for over three years now. During this time, the process has been continuously improved to its current state with the experience we gained along the way.
Building the internal maintainers list and matching fixes against its entries was the latest addition to the process. The developers are mostly happy with it now, and backporting fixes has become routine weekly work.
The ratio of fixes that are actually backported versus ones that get blacklisted is around 90% to 10% for more recent kernel branches. The kernel branch for SUSE Linux Enterprise Server 15 has received over 6300 proactively backported fixes, which fixed quite a few bugs before customers hit them. Overall, proactive backporting has become an integral part of our kernel development process over the last few years.