Reboot Reloaded: Patching the Linux Kernel Online | SUSE Communities

“This article is from The SUSE Insider, a technical quarterly publication for SUSE customers to help them get the most value from their SUSE solutions.”

Author: Vojtech Pavlik is director of SUSE Labs, a department within SUSE R&D focusing on core technologies and research. He is one of the creators of kGraft.

1. Why?

The reliance of mankind on computers to control critical activities like stock trading, flight control or nuclear power plant management is ever increasing.

These services must not fail or have outages. Redundant systems have been proposed and implemented to solve single, unpredictable, independent component failures. Redundant systems composed of components of independent origin are being used to prevent systemic errors that cause larger outages.

Hot-swappable components have been designed to allow replacing components without shutting down systems.

Live kernel patching is the software equivalent of hot-swappable physical components. It allows replacing a faulty function inside the kernel without taking the system offline.

2. When?

The three commonly used tiers of change management, from top to bottom:

  • Incident response
  • Emergency change
  • Scheduled change

In an incident, a system could be down or in the midst of being actively exploited, and a corrective action is needed immediately. In an emergency, there is an identified risk that the system might crash or have a known vulnerability to attack, requiring an expedited action without delay. Scheduled changes are typically improvements that can wait until a window when the system is not needed.

Live patching gives a quick solution to incidents and emergencies caused by kernel issues and, in effect, turns the resolution of such an issue into a scheduled change: the full kernel update can wait until the next maintenance window.

This is of utmost importance to customers who need PCI-DSS, SSAE-16, ISO-27001 or other compliance and security certifications that mandate a certain speed of incident response.

3. Who?

One typical use case for live patching is in-memory databases, where the cost of a reboot and, thus, the value of avoiding it is highest. The huge processing and analytics power of an in-memory database comes at a cost: loading multiple terabytes of data into memory upon reboot can take a good part of an hour even for the fastest storage systems. Redundancy and replication can avoid externally visible downtime, but even then the switch from one server to another is usually noticeable. Additionally, live patching could turn out to be much more cost-effective than owning a second, very large server that acts only as a backup.

Mission-critical infrastructure services are another use case. These typically are redundant, and the goal is to keep them fully redundant at all times. The redundancy is there to cope with failures. It is not a tool to be used by administrators routinely for introducing changes. Live patching can help by allowing IT to apply fixes without having to go through a reduced redundancy cycle.

Simulations and HPC (high performance computing) calculations with terabytes of data in flight and spread over thousands of systems often cannot afford to stop and save all that data to storage; nor is a rolling reboot of the whole HPC cluster advisable. Live patching can help to keep the calculation going if a bug in the kernel is causing instability in the cluster.

Massive deployments that a cloud provider or an online service would use present a similar case. Live patching helps save on update costs, allowing IT to apply fixes in seconds rather than hours or days to a large farm of servers.

4. What?

The SUSE Live Patching technology is called “kGraft.” It was designed to be fast, with no measurable interruption of service, and to be easy and transparent to use. Simply installing a kGraft RPM package patches the kernel; upgrading the RPM package to a newer version updates the kernel to the next patch level; and downgrading the RPM package downgrades the patch level. In all cases, the patch is applied live, in memory, yet it persists across reboots. Upon reboot, the kernel is patched in memory before the system boots; this provides a state identical to the one achieved through live patching.

To achieve this identical state, however, certain constraints had to be put on what kGraft can do. Most importantly, the scope of patches that will be available as live patches is limited to CVE vulnerabilities rated at CVSS level 6 and higher, and to bugs that could cause data corruption or severe system instability.

In addition, the fixes must be small in scope, replacing a limited number of kernel functions. This rules out whole-kernel upgrades using this method.

kGraft is available as SUSE Linux Enterprise Live Patching 12, a full-service offering with maintenance and support, providing live patch streams that allow IT to entirely avoid reboots for up to 12 months in one stretch.

5. How?

Let’s look at how kGraft works under the hood.

Basically, kGraft puts a “detour” sign (a CALL instruction) into a reserved space at the beginning of a function that contains a bug. This redirects the code flow to ftrace, an infrastructure for kernel tracing, which in turn calls into kGraft. Then kGraft decides which replacement function should be called instead. This is much more reliable than changing all call sites that want to execute the function so that they call the new one. There can be thousands of call sites inside the kernel, and identifying all of them is a tough, if not impossible, task, particularly given that the Linux kernel has a partially object-oriented architecture, and the address of the affected function may be stored in kernel data.
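The benefit of redirecting at function entry, rather than at each call site, can be illustrated with a small user-space model. This is only a sketch under stated assumptions: all names here (buggy_double, fixed_double, saved_ops and so on) are hypothetical, and a function-pointer slot stands in for the reserved space at the start of the function. The real mechanism rewrites machine code through ftrace.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*int_fn)(int);

/* Hypothetical detour slot: NULL models the untouched reserved space,
 * a non-NULL value models the CALL that leads to the replacement. */
static int_fn double_detour = NULL;

/* Original function with a bug: the result is off by one. */
int buggy_double(int x)
{
	if (double_detour)		/* "detour" sign at function entry */
		return double_detour(x);
	return 2 * x + 1;		/* buggy body */
}

/* Replacement function shipped by the patch. */
int fixed_double(int x)
{
	return 2 * x;
}

/* The function's address stored in data, as in an operations table.
 * Nobody has to find and update this pointer when patching. */
struct ops {
	int_fn op;
};

struct ops saved_ops = { .op = buggy_double };

void apply_patch(int_fn replacement)
{
	assert(replacement != NULL);
	double_detour = replacement;
}

void remove_patch(void)
{
	double_detour = NULL;
}
```

Because the detour sits inside the function itself, a call made through saved_ops.op reaches the replacement just as a direct call does, even though the stored address was never changed.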

6. Creating Patches

There are two fundamental ways to create live patches: manual and automated. Automated approaches save effort but tend to make patches larger than required and very hard to review for correctness. In any case, semantic analysis of the patch must be done by a human, which mostly negates the effort saved by automating patch generation.

kGraft provides tools for automation. However, experience has shown that creating patches manually allows users to produce higher quality patches that can be fully independently reviewed in source form and proven to do exactly what they are intended to.

Since kGraft replaces functions inside the kernel, a starting point is to identify which functions need to be replaced. This can be easily seen from the source code changes that need to be applied.

A shortened example of a kGraft patch looks like this:


#include <linux/module.h>
#include <linux/kgraft.h>

static bool kgr_new_capable(int cap)
{
	printk(KERN_DEBUG "we added a printk to capable()\n");
	return ns_capable(&init_user_ns, cap);
}

static struct kgr_patch patch = {
	.name = "sample_kgraft_patch",
	.owner = THIS_MODULE,
	.patches = { KGR_PATCH(capable, kgr_new_capable, true),
		     KGR_PATCH_END }
};

static int __init kgr_patcher_init(void)
{
	return kgr_patch_kernel(&patch);
}

static void __exit kgr_patcher_cleanup(void)
{
	kgr_patch_remove(&patch);
}

module_init(kgr_patcher_init);
module_exit(kgr_patcher_cleanup);

MODULE_LICENSE("GPL");

It starts by including the required header files and then defines the following:

  • A new version of a kernel function
  • A structure containing the description of the patch, as well as a list of functions to replace
  • The steps to initialize and clean up the module that uses the kGraft infrastructure to apply and remove the patch upon insertion and removal of the module into and from the Linux kernel

7. Caveats in Patch Creation

There are a number of stumbling blocks that a patch author (in the case of SUSE Linux Enterprise Live Patching, a SUSE developer) must be mindful of when creating a patch.

The first, very basic, is inlining. A C compiler can decide that a certain function is small enough that, instead of being called, it is worth embedding whole into the calling function. If the inlined function contains a bug, the bug is replicated into every function that it has been inlined into. This isn’t visible in the C source and is purely a compiler-internal decision. All the affected functions now need replacing, not just the original. There is a solution: the DWARF debug information that is built and archived together with the kernel contains everything needed to know the compiler’s inlining decisions. The patch author can use it to expand the list of functions that need to be replaced.
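Expanding the list of functions can be sketched as a small closure computation. The following is a user-space illustration only: the inline_record table is hypothetical sample data standing in for what would be extracted from DWARF, and names such as expand_targets are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One inlining decision: `callee` was embedded into `host`.
 * In practice such records would be derived from DWARF; here they
 * are supplied by hand as hypothetical data. */
struct inline_record {
	const char *callee;
	const char *host;
};

#define MAX_TARGETS 16

struct target_list {
	const char *names[MAX_TARGETS];
	size_t count;
};

static int list_contains(const struct target_list *l, const char *name)
{
	for (size_t i = 0; i < l->count; i++)
		if (strcmp(l->names[i], name) == 0)
			return 1;
	return 0;
}

static void list_add(struct target_list *l, const char *name)
{
	if (list_contains(l, name))
		return;
	assert(l->count < MAX_TARGETS);
	l->names[l->count++] = name;
}

/* Start from the function changed by the source fix and keep adding
 * every host that contains an inlined copy of a listed function. */
void expand_targets(const struct inline_record *recs, size_t nrecs,
		    const char *fixed_fn, struct target_list *out)
{
	out->count = 0;
	list_add(out, fixed_fn);

	/* The list grows while it is walked, which covers nested inlining. */
	for (size_t i = 0; i < out->count; i++)
		for (size_t r = 0; r < nrecs; r++)
			if (strcmp(recs[r].callee, out->names[i]) == 0)
				list_add(out, recs[r].host);
}
```

Given records saying that check_len was inlined into parse_hdr, and parse_hdr into rx_packet, a fix to check_len yields all three functions as replacement targets.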

Next, there can be unexported symbols. These are symbols used within a kernel object that aren’t available outside of its scope for linking. Using such a function from a patch directly is thus impossible and requires a trick: by using the kallsyms infrastructure of the Linux kernel, it is possible to obtain the addresses of all symbols, including unexported symbols. Such symbols then can be called via those addresses. For example:

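/* Assumed for illustration: static_fn() takes no arguments, returns int. */
typedef int (*static_fn_proto)(void);

/* Address of the unexported function, resolved at module load time. */
static static_fn_proto kgr_orig_static_fn;

int patched_fn(void)
{
	return kgr_orig_static_fn();
}

static int __init kgr_patcher_init(void)
{
	kgr_orig_static_fn =
		(static_fn_proto)kallsyms_lookup_name("static_fn");
	if (!kgr_orig_static_fn) {
		pr_err("kgr: function %s not resolved\n", "static_fn");
		return -ENOENT;
	}

	/* Apply the patch as in the previous example. */
	return kgr_patch_kernel(&patch);
}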

IPA-SRA, or interprocedural scalar replacement of aggregates, is a feature that is as dangerous as its name sounds. It’s a compiler optimization (developed at SUSE) that gives a significant performance boost, but it is also a disaster for patching. It can turn a CALL into a JMP when the call is the last statement of a function. It can transform arguments passed by reference into arguments passed by value if the value is never changed, and it can create multiple variants of a function with fewer arguments when assuming a specific constant value for the removed argument allows the function to be reduced significantly. Fortunately, this is all recorded in DWARF, the same as inlining, and only results in more effort for the patch author.
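The effect on function signatures can be approximated in plain C. The sketch below is hand-written and hypothetical: the compiler performs these transformations internally, and GCC typically marks the resulting clones with suffixes such as .isra or .constprop rather than with the C names used here. It only shows why a replacement written against the source-level prototype may not match what callers actually invoke.

```c
#include <assert.h>
#include <stddef.h>

struct request {
	int len;
	int flags;
};

/* As written in the source: the aggregate is passed by reference,
 * but only one scalar field is ever read and nothing is modified. */
int payload_size(const struct request *req, int header)
{
	return req->len - header;
}

/* Approximation of a clone where the pointer argument has been
 * replaced by the scalar value it would have loaded. */
int payload_size_isra(int len, int header)
{
	return len - header;
}

/* Approximation of a variant with one argument fewer, valid only
 * for callers that always pass header == 8. */
int payload_size_constprop(int len)
{
	return len - 8;
}

/* A caller as written in the source ... */
int handle_source(const struct request *req)
{
	return payload_size(req, 8);
}

/* ... and as it may effectively behave after optimization. */
int handle_optimized(const struct request *req)
{
	assert(req != NULL);
	return payload_size_constprop(req->len);
}
```

All variants compute the same result, but their calling conventions differ, so a live patch has to account for every variant the compiler actually emitted.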

8. Patching in Detail

As mentioned earlier, kGraft uses the ftrace framework for call redirection. Ftrace uses ‘gcc -pg -mfentry’ to generate calls to __fentry__() at the beginning of every function, replacing all those calls with “NOP” instructions at boot and reserving space for call redirection in the future. When required, the “NOP” is automatically replaced with a “CALL” to ftrace. kGraft then registers a tracer with ftrace, taking control when a redirection is needed. And that’s it: a function call is redirected.
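The chain from reserved slot to tracer to replacement can be modelled in user space. This is a simplified sketch, not the kernel implementation: struct fentry_slot, ftrace_entry and patch_tracer are hypothetical stand-ins, and an enum value replaces the actual NOP and CALL instruction bytes.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*int_fn)(int);

/* A registered tracer returns the function to run instead,
 * or NULL to let the original body continue. */
typedef int_fn (*tracer_fn)(int_fn traced);

enum slot_state { SLOT_NOP, SLOT_CALL };

/* One reserved slot per function, modelling the __fentry__ site. */
struct fentry_slot {
	enum slot_state state;
	tracer_fn tracer;
};

static struct fentry_slot capable_slot = { SLOT_NOP, NULL };

/* Stand-in for the tracing entry point reached through the CALL. */
static int_fn ftrace_entry(struct fentry_slot *slot, int_fn traced)
{
	return slot->tracer ? slot->tracer(traced) : NULL;
}

int old_capable(int cap)
{
	if (capable_slot.state == SLOT_CALL) {
		int_fn redirect = ftrace_entry(&capable_slot, old_capable);

		if (redirect)
			return redirect(cap);
	}
	return cap > 0;		/* original body */
}

static int new_capable(int cap)
{
	return cap > 1;		/* patched behaviour, arbitrary here */
}

/* Stand-in for the tracer registered by the patching code. */
static int_fn patch_tracer(int_fn traced)
{
	return traced == old_capable ? new_capable : NULL;
}

void enable_redirect(void)
{
	assert(capable_slot.state == SLOT_NOP);
	capable_slot.tracer = patch_tracer;
	capable_slot.state = SLOT_CALL;	/* the NOP becomes a CALL */
}
```

Before enable_redirect() runs, the slot check is the only overhead; afterwards, every call to old_capable() lands in new_capable().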

[Figure: function call flow before patching and after patching]

GCC’s “-mfentry” argument is unique to the x86-64 architecture. However, similar functionality is offered by “-mprofile-kernel” on the 64-bit POWER architecture, or by “-mhotpatch” on s390x. Supporting ftrace and, by extension, kGraft is thus possible across all major architectures, including AArch64 (ARM64).

9. The Final Hurdle

Using ftrace, it’s fairly straightforward to redirect a single function to a new version. But what happens when multiple functions must be changed simultaneously because they depend on each other? The dependency can take the form of a changed number or type of arguments, a changed return type or even a semantic change not covered by programming language syntax. In this case, we need a consistency model. kGraft uses a consistency model called “leave kernel / switch thread.” Its main virtue is that it causes no interruption of service and has no impact on the running system whatsoever.

In kGraft we want to avoid calling a new function from an old one and vice versa: if the function prototype has changed, this would cause a system crash. We achieve this by keeping a “universe” flag for each thread of execution, such as interrupts, user threads or kernel threads. Only when a thread reaches a safe point, where we know that no kernel function is being executed by that thread, can we switch the universe flag so that the thread starts executing new functions.

[Figure: universe flagging]

This safe point is the end of interrupt for interrupts, kernel exit/entry for userspace threads and the so-called freezer for kernel threads. After applying a patch, threads migrate one by one to the new universe as they pass through their respective safe points. No stopping of anything is needed, and once everyone is in the new universe, kGraft declares patching complete.
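The per-thread migration can be sketched as a small state machine. This is a user-space model with hypothetical names (struct thread, safe_point, call_patched); in the kernel, the flag lives in per-thread data and the decision is made inside the ftrace handler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NTHREADS 3

/* Hypothetical per-thread state: which universe the thread is in. */
struct thread {
	bool in_new_universe;
};

static struct thread threads[NTHREADS];
static bool patch_in_progress;

static int old_fn(void) { return 1; }
static int new_fn(void) { return 2; }

/* Redirection decision made on every call: a thread sees the new
 * function only after it has migrated to the new universe. */
int call_patched(const struct thread *t)
{
	return t->in_new_universe ? new_fn() : old_fn();
}

void start_patching(void)
{
	for (size_t i = 0; i < NTHREADS; i++)
		threads[i].in_new_universe = false;
	patch_in_progress = true;
}

/* Called when a thread passes its safe point (for example, a return
 * to user space), where it is executing no kernel function. */
void safe_point(struct thread *t)
{
	assert(t != NULL);
	if (patch_in_progress)
		t->in_new_universe = true;
}

/* Patching is complete once every thread has migrated. */
bool patching_complete(void)
{
	for (size_t i = 0; i < NTHREADS; i++)
		if (!threads[i].in_new_universe)
			return false;
	patch_in_progress = false;
	return true;
}
```

While the patch is in progress, two threads may run different versions of the same function at the same time, but no single thread ever mixes old and new functions within one pass through the kernel.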

But what if a thread never does anything and never passes a safe point? We call these threads “eternal sleepers.” They might be server daemons waiting for a request that never comes, gettys on consoles where no one ever logs in, or daemons that handle situations that never arise. They just wait for their cue and sleep inside the kernel forever.

Patching cannot be declared complete until even these threads are moved over to the new universe. kGraft has to wake them up. This is done by sending them a signal, “SIGKGRAFT.” This special signal wakes up the thread and causes it to attempt to exit the kernel to handle the signal, thus passing a safe point. At the safe point, kGraft catches the signal and returns the thread back to the kernel. The sleeping userspace application never notices, its thread is safely migrated, and success can be declared.

Many other consistency models are also being proposed and implemented. One is the Ksplice consistency model (now categorized as “leave patched set / switch kernel”), which achieves consistency simply by stopping the whole system for patching. Stopping isn’t enough, though. After stopping, every thread needs to be checked to determine whether it is executing any of the patched functions. If it is, the kernel is resumed and stopped later to try again. This model is as safe as kGraft’s, but could cause up to 40ms of interruption of service for each patch, and fails if eternal sleepers are in any of the patched functions.
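The stop-and-check step of such a model can be sketched as follows. This is a user-space illustration with hypothetical names: struct stack_snapshot stands in for the call stack of a stopped thread, and the frame names in the usage note are sample data only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 8

/* Hypothetical snapshot of one stopped thread's call stack. */
struct stack_snapshot {
	const char *frames[MAX_DEPTH];
	size_t depth;
};

/* True if any frame of the stopped thread is in a patched function. */
static bool stack_uses_patched(const struct stack_snapshot *s,
			       const char *const *patched, size_t npatched)
{
	assert(s->depth <= MAX_DEPTH);
	for (size_t f = 0; f < s->depth; f++)
		for (size_t p = 0; p < npatched; p++)
			if (strcmp(s->frames[f], patched[p]) == 0)
				return true;
	return false;
}

/* With the whole system stopped, patching may proceed only if no
 * thread is executing a patched function. Otherwise the caller has
 * to resume the system and try again later. */
bool safe_to_patch(const struct stack_snapshot *threads, size_t nthreads,
		   const char *const *patched, size_t npatched)
{
	for (size_t t = 0; t < nthreads; t++)
		if (stack_uses_patched(&threads[t], patched, npatched))
			return false;
	return true;
}
```

If one thread sleeps forever inside a function on the patched list, safe_to_patch() never returns true, which is the eternal-sleeper failure described above.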

10. Community / Upstream

SUSE is a community player. We are proud that all of the kernel work we do is shared with the Linux developer community. SUSE has been working to get kGraft into the upstream Linux kernel since publishing it in 2014. The release of the kGraft technology was followed by the release of kpatch by Red Hat a few weeks later; kpatch is mostly based on the Ksplice model.

Because there are two independent implementations of live patching, SUSE and Red Hat engineers are working on a joint project, now called “KLP” (Kernel Live Patching), for inclusion in the upstream kernel. It uses ideas from both implementations and has been merged into upstream kernel version 4.0. Its arrival coincided with the increase of the kernel major version to 4. The implementation is very basic at the moment, and SUSE and Red Hat are working together to extend it so that it can fully replace kGraft and kpatch.

When the joint project is complete, live patching will become a standard technology for Linux users.
