Live Patching Meltdown – SUSE Engineer’s research project (Part 2)
Following up on Part 1, which covered the key technical obstacles to live patching Meltdown, in this post I will give you some background on virtual address mappings in the context of the Meltdown vulnerability and look at patching kGraft itself!
Virtually mapped memory is a protection feature provided by the CPU, orthogonal to the privilege separation between user and kernel mode.
Whenever a memory access is made, be it from a user space program or from the kernel, the CPU treats the target address as a so-called “virtual” address. It maps that virtual address to a physical one by means of a special mapping structure, the page tables. Each process has its own such mapping and thus its own view of memory. In particular, for two different processes, the same virtual address can be (and usually is) mapped to separate regions of physical memory, which prevents the two from interfering with each other.
The page tables are organized in a tree structure and the kernel announces the current address mapping to the CPU by writing a pointer to the tree’s root into a special register (CR3). Usually it does so when switching between processes.
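To make the tree walk concrete, here is a minimal user-space sketch in C of x86_64 4-level paging with 4 KiB pages and 48-bit virtual addresses: nine address bits select an entry at each of the four levels, and the low twelve bits are the offset within the page. The table pool, the cr3 variable standing in for the register, and the map_page()/translate() helpers are hypothetical names for this illustration only; on real hardware the CPU performs the walk itself and the entries carry many more flag bits.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of x86_64 4-level paging: 4 KiB pages, 48-bit virtual
 * addresses, 512 entries per table. An entry holds the physical address of
 * the next-level table (or, at the last level, of the page frame) in bits
 * 12..51 plus a present bit; real entries carry more flag bits. */
#define ENTRIES    512
#define PRESENT    0x1ULL
#define ADDR_MASK  0x000ffffffffff000ULL

typedef uint64_t pte_t;

/* Hypothetical pool of "physical" table pages: page n sits at n * 4096. */
#define POOL_PAGES 16
static pte_t pool[POOL_PAGES][ENTRIES];
static uint64_t next_free = 1;      /* page 0 is the root table */

/* Stand-in for CR3: physical address of the root (PML4) table. */
static uint64_t cr3 = 0;

/* Entry index at a level: 3 = PML4 (bits 47..39), 2 = PDPT (38..30),
 * 1 = PD (29..21), 0 = PT (20..12). */
static unsigned table_index(uint64_t vaddr, int level)
{
    return (unsigned)((vaddr >> (12 + 9 * level)) & 0x1ff);
}

/* Walk the tree from the root; return 0 and store the physical address,
 * or -1 if an entry is not present (a page fault on real hardware). */
static int translate(uint64_t vaddr, uint64_t *paddr)
{
    uint64_t table = cr3;
    for (int level = 3; level >= 0; level--) {
        assert((table >> 12) < POOL_PAGES);
        pte_t e = pool[table >> 12][table_index(vaddr, level)];
        if (!(e & PRESENT))
            return -1;
        table = e & ADDR_MASK;      /* next table, or the frame at level 0 */
    }
    *paddr = table | (vaddr & 0xfff);   /* frame base plus page offset */
    return 0;
}

/* Map one 4 KiB page, allocating intermediate tables from the pool. */
static int map_page(uint64_t vaddr, uint64_t frame)
{
    uint64_t table = cr3;
    for (int level = 3; level > 0; level--) {
        pte_t *e = &pool[table >> 12][table_index(vaddr, level)];
        if (!(*e & PRESENT)) {
            if (next_free >= POOL_PAGES)
                return -1;          /* model ran out of table pages */
            *e = (next_free++ << 12) | PRESENT;
        }
        table = *e & ADDR_MASK;
    }
    pool[table >> 12][table_index(vaddr, 0)] = (frame & ADDR_MASK) | PRESENT;
    return 0;
}
```

For example, after map_page(0x7f0000001000, 0x42000), translating 0x7f0000001234 yields 0x42234, while the neighbouring page at 0x7f0000002000 is still unmapped and the walk fails.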
Along with each virtual-to-physical translation, the page tables also store some access rights; in particular, whether a mapped virtual address is accessible from user mode or only from kernel mode.
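On x86_64 this user/kernel access right is the U/S bit (bit 2) of a page table entry, and a user-mode access is only allowed if U/S is set at every level of the walk. The following sketch assumes the four entries used for a translation have already been looked up; access_allowed() is a hypothetical helper, and SMEP, SMAP, CR0.WP and the NX bit are deliberately ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86_64 page table entry flag bits (same low bits at every level). */
#define PTE_PRESENT 0x1ULL   /* bit 0: P, entry is valid */
#define PTE_RW      0x2ULL   /* bit 1: R/W, writes allowed */
#define PTE_USER    0x4ULL   /* bit 2: U/S, user-mode access allowed */

enum cpu_mode { KERNEL_MODE, USER_MODE };

/* entries[] holds the PML4E, PDPTE, PDE and PTE used for the translation.
 * A user-mode access needs U/S at every level (and R/W too for writes).
 * Simplified: kernel-mode accesses only need the present bits here. */
static bool access_allowed(const uint64_t entries[4], enum cpu_mode mode,
                           bool write)
{
    for (int i = 0; i < 4; i++) {
        if (!(entries[i] & PTE_PRESENT))
            return false;            /* not mapped: page fault */
        if (mode == USER_MODE) {
            if (!(entries[i] & PTE_USER))
                return false;        /* kernel-only mapping: page fault */
            if (write && !(entries[i] & PTE_RW))
                return false;        /* read-only for user mode */
        }
    }
    return true;
}
```

A kernel mapping (P and R/W set, U/S clear) is thus reachable from kernel mode but faults from user mode; this is exactly the check that Meltdown's speculative execution runs ahead of.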
On x86_64, the Linux kernel divides the available virtual address space into two regions: the lower range is dedicated to user space and the upper one to the kernel. Mappings in the lower, user space region usually differ between processes, while the kernel space mappings are always the same and kept synchronized across all page tables.
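With 4-level paging, the split falls at the canonical-address boundary: user space spans 0x0000000000000000 to 0x00007fffffffffff and the kernel region starts at 0xffff800000000000, which corresponds to top-level entries 0 to 255 and 256 to 511 respectively. The sketch below classifies addresses and shows how a new process's top-level table could share the kernel half with a reference table. classify() and init_process_pml4() are hypothetical; Linux's own page table setup is only roughly similar.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PTRS_PER_PML4     512
#define KERNEL_PML4_START 256   /* entries 256..511 cover the upper half */

typedef uint64_t pml4e_t;

enum region { USER_REGION, KERNEL_REGION, NON_CANONICAL };

/* With 48-bit addresses, bits 63..47 must all be equal ("canonical"). */
static enum region classify(uint64_t vaddr)
{
    uint64_t top = vaddr >> 47;          /* bits 63..47, 17 bits */
    if (top == 0)
        return USER_REGION;              /* 0x0000000000000000..0x00007fffffffffff */
    if (top == 0x1ffff)
        return KERNEL_REGION;            /* 0xffff800000000000..0xffffffffffffffff */
    return NON_CANONICAL;                /* the hole in between */
}

/* A new process starts with an empty user half, while the kernel half is
 * copied from a reference table. The copied entries point to the same
 * lower-level tables, so every process sees the same kernel mappings. */
static void init_process_pml4(pml4e_t *new_pml4, const pml4e_t *reference)
{
    memset(new_pml4, 0, KERNEL_PML4_START * sizeof(pml4e_t));
    memcpy(new_pml4 + KERNEL_PML4_START, reference + KERNEL_PML4_START,
           (PTRS_PER_PML4 - KERNEL_PML4_START) * sizeof(pml4e_t));
}
```

Because only top-level entries are copied and the lower levels are shared, a kernel mapping added below an existing top-level entry shows up in every process at once.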
Before KPTI, when the kernel was entered from user space, the current page tables were kept, and the raised privilege level allowed the kernel to access any of the kernel mappings mirrored in them.
If, on the other hand, an access to the kernel space region came from user mode, the CPU would detect this violation, courtesy of the access rights stored in the page tables, and raise a page fault exception. The problem with Meltdown is that although the access is ultimately denied, the CPU’s speculative execution may already have left indirectly observable side effects, such as filled caches, allowing an attacker to infer the actual contents of the memory.
The KPTI patch set mitigates this by providing, for each “full” page table, a stripped-down variant to be used while in user mode. These stripped-down variants, also called “shadow page tables” in the kernel code, mirror their counterparts’ user space region mappings but lack any from the kernel region. Well, except for the bare minimum required to run the kernel entry code and switch to the fully populated counterpart.
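A shadow top-level table could be derived from a full one roughly as follows. This is a simplified sketch, not the KPTI code: it assumes the minimal kernel mapping hangs off a single, hypothetical top-level slot (ENTRY_TRAMPOLINE_SLOT) whose entry points to a separate, sparsely populated lower-level tree rather than to the full kernel mappings.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PTRS_PER_PML4         512
#define KERNEL_PML4_START     256
#define ENTRY_TRAMPOLINE_SLOT 511   /* hypothetical slot for entry code */

typedef uint64_t pml4e_t;

/* Build the user-mode shadow of a full top-level table:
 *  - user half (0..255): mirrored verbatim, sharing the lower levels;
 *  - kernel half (256..511): cleared, except for one slot that points to
 *    a minimal tree mapping just the entry code (trampoline_entry). */
static void build_shadow(pml4e_t *shadow, const pml4e_t *full,
                         pml4e_t trampoline_entry)
{
    memcpy(shadow, full, KERNEL_PML4_START * sizeof(pml4e_t));
    memset(shadow + KERNEL_PML4_START, 0,
           (PTRS_PER_PML4 - KERNEL_PML4_START) * sizeof(pml4e_t));
    shadow[ENTRY_TRAMPOLINE_SLOT] = trampoline_entry;
}
```

Note that the full table's own entry in the trampoline slot is not copied: it would expose everything mapped below it, which is precisely what the shadow is meant to hide.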
Achieving global state
Once such a user space shadow page table has been created, it must be maintained and always kept consistent with the full counterpart it was derived from. Of course, this is achievable by conventional kGraft patching of all the relevant page table modifying sites in the kernel.
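If, as in the previous sketches, the user half of a shadow table points to the same lower-level tables as its full counterpart, then only writes to top-level entries have to be mirrored. A hypothetical patched setter could look like this; struct addr_space, set_pml4e_orig() and set_pml4e_patched() are illustrative names, not kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTRS_PER_PML4     512
#define KERNEL_PML4_START 256

typedef uint64_t pml4e_t;

/* Hypothetical address-space descriptor: the full top-level table plus an
 * optional shadow (NULL until one has been installed). */
struct addr_space {
    pml4e_t full[PTRS_PER_PML4];
    pml4e_t *shadow;
};

/* Original behaviour: only the full table is updated. */
static void set_pml4e_orig(struct addr_space *as, unsigned idx, pml4e_t val)
{
    assert(idx < PTRS_PER_PML4);
    as->full[idx] = val;
}

/* Live-patched replacement: user-half updates are mirrored into the shadow
 * so it never goes stale; kernel-half entries are kept out of it. */
static void set_pml4e_patched(struct addr_space *as, unsigned idx, pml4e_t val)
{
    assert(idx < PTRS_PER_PML4);
    as->full[idx] = val;
    if (as->shadow && idx < KERNEL_PML4_START)
        as->shadow[idx] = val;
}
```

Every such modifying site in the kernel would need a replacement of this kind; a single site left on the old implementation is enough to let the shadow drift.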
However, some extra steps are necessary to elevate kGraft’s per-task consistency model to a global one. Imagine that during the transition period there are two threads, A and B, which share a common virtual memory space. Further assume that A has already been switched to the new implementation while B hasn’t yet. If A installs a shadow page table prematurely, it can quickly become stale, because memory map modifications made by B won’t be propagated to it.
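The race can be replayed deterministically in a small simulation. Threads A and B share one address space; each task carries its own patch state, so A runs the new setter that mirrors user-half updates into the shadow while B still runs the old one. All names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTRS_PER_PML4     512
#define KERNEL_PML4_START 256

typedef uint64_t pml4e_t;

struct addr_space {
    pml4e_t full[PTRS_PER_PML4];
    pml4e_t *shadow;               /* NULL until installed */
};

/* Per-task patch state, as kGraft tracks it during a transition. */
struct task {
    bool patched;
    struct addr_space *as;         /* shared by threads A and B */
};

/* Which implementation runs depends on the calling task's state:
 * only the new code knows that a shadow table exists. */
static void set_pml4e(struct task *t, unsigned idx, pml4e_t val)
{
    assert(idx < PTRS_PER_PML4);
    t->as->full[idx] = val;
    if (t->patched && t->as->shadow && idx < KERNEL_PML4_START)
        t->as->shadow[idx] = val;
}

/* True if the shadow's user half still matches the full table. */
static bool shadow_consistent(const struct addr_space *as)
{
    if (!as->shadow)
        return true;
    for (unsigned i = 0; i < KERNEL_PML4_START; i++)
        if (as->shadow[i] != as->full[i])
            return false;
    return true;
}
```

After A installs the shadow and writes an entry, both tables agree; as soon as B writes another user-half entry through the old code, shadow_consistent() reports a stale shadow.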
What is needed is a way to query the global patch state from the live patch module, i.e. to ask whether each and every task has finished transitioning to the new implementation. Similarly, the live patch must be notified before a transition back to the unpatched state begins.
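In terms of a simple model, the two missing pieces are a predicate over all tasks and a hook that runs before the first task is switched back. Both functions below, as well as the global_patched flag, are hypothetical; they describe what the live patch needs, not an interface kGraft offers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-task patch state, as tracked during a transition. */
struct task {
    bool patched;
};

/* The live patch's own view of the global state (hypothetical). */
static bool global_patched;

/* Piece 1: has every task switched to the new implementation? Only then
 * would it be safe to install shadow page tables. */
static bool all_tasks_patched(const struct task *tasks, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!tasks[i].patched)
            return false;
    return true;
}

/* Piece 2: a notification that must arrive before any task reverts,
 * while every task still keeps the shadows up to date. */
static void on_revert_start(void)
{
    global_patched = false;        /* e.g. stop relying on shadow tables */
}

/* Simulated start of a revert: notify first, then flip the tasks. (kGraft
 * switches tasks lazily, one at a time; this model flips them all.) */
static void start_revert(struct task *tasks, size_t n, void (*notify)(void))
{
    notify();
    for (size_t i = 0; i < n; i++)
        tasks[i].patched = false;
}
```

The ordering inside start_revert() is the whole point: the notification has to precede the first task's switch back, not follow the last one.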
In its current implementation, kGraft doesn’t provide such functionality though. The question was: could it still be emulated somehow? It turned out it could, namely by kGraft-patching kGraft itself!
Once the kGraft core concludes that all tasks have been transitioned to the “patched” state, it invokes an internal housekeeping function, kgr_finalize(). Crucially, the task in whose context kgr_finalize() is called will itself have been transitioned to the new state, and thus the call will be redirected to kgr_finalize()’s live patch replacement, if there is one. The live patch can therefore provide its own implementation of kgr_finalize() that takes note of the global patch state. A similar trick works for tracing attempts to start a transition from the patched to the unpatched state.
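The trick can be modelled in a few lines of user-space C. The simulated core calls the finalize function in the context of a task that is itself patched, so the kGraft-style redirection picks the live patch's replacement, which records the global state before doing the original housekeeping. Everything here, including kgr_finalize_orig(), kgr_finalize_new() and the dispatch helper, is a hypothetical stand-in; the real kgr_finalize() and the way a replacement reaches the original code may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated stand-ins, not kGraft's real symbols or signatures. */
struct task {
    bool patched;
};

static bool live_patch_globally_active;   /* tracked by the live patch */
static int finalize_calls;                /* counts original housekeeping */

/* Original housekeeping function of the simulated kGraft core. */
static void kgr_finalize_orig(void)
{
    finalize_calls++;
}

/* Live patch replacement: being called at all means the core has just
 * concluded that every task is patched, so record that first. */
static void kgr_finalize_new(void)
{
    live_patch_globally_active = true;
    kgr_finalize_orig();           /* still do the original housekeeping */
}

/* kGraft-style redirection: the version that runs depends on the calling
 * task's own patch state. */
static void call_kgr_finalize(const struct task *current)
{
    if (current->patched)
        kgr_finalize_new();
    else
        kgr_finalize_orig();
}

/* Simulated core: once no task is left unpatched, finalize in the context
 * of `current`, which is necessarily patched by then. */
static void kgr_core_check(const struct task *tasks, int n,
                           const struct task *current)
{
    for (int i = 0; i < n; i++)
        if (!tasks[i].patched)
            return;                /* transition still in progress */
    call_kgr_finalize(current);
}
```

While one task is still unpatched, the core check does nothing; once the last task has transitioned, the replacement runs and the live patch learns that the transition is complete.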
In conclusion, by patching kGraft itself, a live patch can hook into its internal transitioning processes and track its own state.